There are many important open problems in applied computational statistics, many of which fall outside mainstream academic interests and are hence overlooked. Here I collect some of the research projects that I personally think would be particularly impactful in improving tools for practical applications. Unsurprisingly, there is a strong focus on the principled implementation of geometric methodologies.
At the moment I am not able to supervise any of these projects, but I am happy to answer occasional questions. I may also be interested in collaborating.
Step Size Optimization for Dynamic Hamiltonian Monte Carlo
Early implementations of Hamiltonian Monte Carlo used trajectories with static integration times and considered only the final point of each trajectory. In Optimizing The Integrator Step Size for Hamiltonian Monte Carlo colleagues and I derived a general criterion for the optimal integrator step size in these algorithms. Modern implementations of Hamiltonian Monte Carlo, in particular the one used in Stan, use dynamic integration times and consider all of the points within each trajectory. Scattered empirical results suggest that the step size optimization must be modified for these algorithms.
Can one generalize the optimality criterion for static implementations to dynamic implementations?
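As a toy illustration of the static setting, the sketch below estimates the average Metropolis acceptance probability of static Hamiltonian Monte Carlo as a function of step size. It keeps only the final point of each fixed-length leapfrog trajectory. This is not the analysis from the paper. The one-dimensional standard Gaussian target, the integration time, and the number of trials are assumptions chosen for illustration.

```python
import math
import random

def leapfrog(q, p, eps, n_steps):
    """Kick-drift-kick leapfrog for a 1D standard Gaussian, U(q) = q^2 / 2."""
    for _ in range(n_steps):
        p -= 0.5 * eps * q   # half kick, since grad U(q) = q
        q += eps * p         # full drift
        p -= 0.5 * eps * q   # half kick
    return q, p

def hamiltonian(q, p):
    return 0.5 * q * q + 0.5 * p * p

def average_acceptance(eps, integration_time=1.5, n_trials=2000, seed=1):
    """Mean acceptance probability of static HMC that keeps only the final point."""
    rng = random.Random(seed)
    n_steps = max(1, round(integration_time / eps))
    total = 0.0
    for _ in range(n_trials):
        q = rng.gauss(0.0, 1.0)  # initial state drawn from the target itself
        p = rng.gauss(0.0, 1.0)  # momentum drawn from the kinetic distribution
        q_new, p_new = leapfrog(q, p, eps, n_steps)
        delta_h = hamiltonian(q_new, p_new) - hamiltonian(q, p)
        total += min(1.0, math.exp(-delta_h))
    return total / n_trials

for eps in (0.1, 0.5, 1.0, 1.5):
    print(f"step size {eps:.1f}: average acceptance {average_acceptance(eps):.3f}")
```

Smaller step sizes raise the acceptance probability but require more gradient evaluations per trajectory, and an optimal step size has to balance these two effects. In dynamic implementations any point in the trajectory can be selected, so it is less clear what the analogous statistic should be.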
Geometric Ergodicity of Dynamic Hamiltonian Monte Carlo
In On the Geometric Ergodicity of Hamiltonian Monte Carlo colleagues and I identified important obstructions to the geometric ergodicity of static Hamiltonian Monte Carlo implementations. Empirical and theoretical evidence, however, indicates that many of these obstructions do not carry over to dynamic implementations of Hamiltonian Monte Carlo.
Can one identify obstructions to geometric ergodicity for dynamic implementations?
Note that many of the mathematical techniques used in the literature rely on bounding a Markov chain by a diffusion with known properties. These bounds have limited utility for Hamiltonian Monte Carlo, however, because it exploits non-diffusive Hamiltonian flow. Significant generalizations may require entirely new techniques, perhaps exploiting the topology of Hamiltonian sets and consequent properties like Poincaré recurrence and Hamiltonian chaos. See for example “Hamiltonian Chaos and Fractional Dynamics” by Zaslavsky as well as some interesting work in Rapid Mixing of Hamiltonian Monte Carlo on Strongly Log-Concave Distributions.
Divergences for Implicit Symplectic Integrators
Divergences occur when a symplectic integrator becomes unstable and the numerical trajectories stray away from the true trajectories they are attempting to simulate. In practice they are powerful diagnostics of the failure of Hamiltonian Monte Carlo methods to adequately explore a given target distribution. For more see Sections 5 and 6 of A Conceptual Introduction to Hamiltonian Monte Carlo.
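With an explicit integrator the check is cheap. You compute the energy error after every step and flag a divergence once it exceeds a threshold. The sketch below does this for leapfrog on a narrow one-dimensional Gaussian. The target, the function names, and the threshold `max_delta_h = 1000` are illustrative assumptions. For this target leapfrog is stable only when the step size is below 2 * sigma, so a step size of 0.25 with sigma = 0.1 should trigger a divergence.

```python
def grad_u(q, sigma):
    # Potential U(q) = q^2 / (2 sigma^2) for a 1D Gaussian with scale sigma.
    return q / sigma**2

def hamiltonian(q, p, sigma):
    return 0.5 * (q / sigma) ** 2 + 0.5 * p * p

def simulate(q, p, eps, n_steps, sigma, max_delta_h=1000.0):
    """Leapfrog trajectory that stops early and flags a divergence when the
    energy error exceeds max_delta_h (an assumed, tunable threshold).
    Returns (divergent, number_of_steps_taken)."""
    h0 = hamiltonian(q, p, sigma)
    for step in range(n_steps):
        p -= 0.5 * eps * grad_u(q, sigma)
        q += eps * p
        p -= 0.5 * eps * grad_u(q, sigma)
        if abs(hamiltonian(q, p, sigma) - h0) > max_delta_h:
            return True, step + 1  # numerical trajectory has strayed from the true one
    return False, n_steps

print(simulate(0.1, 1.0, eps=0.05, n_steps=100, sigma=0.1))  # (False, 100)
print(simulate(0.1, 1.0, eps=0.25, n_steps=100, sigma=0.1))  # step size above 2 * sigma
```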
Divergences are straightforward to identify when using explicit symplectic integrators, but they are more subtle for implicit symplectic integrators, in particular those required by Riemannian Hamiltonian Monte Carlo implementations. The problem is that each update of an implicit symplectic integrator requires solving an implicit equation for a fixed point, and these fixed-point solves may fail to converge. Indeed they are vulnerable to convergence failures in exactly the circumstances that encourage divergences, making it difficult to distinguish between a true divergence and a convergence failure.
Can one design a robust implementation of implicit symplectic integrators that doesn’t compromise the diagnostic power of divergences in Hamiltonian Monte Carlo?
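To see where the ambiguity enters, consider the implicit midpoint rule. It is a simple implicit symplectic integrator, used here as a stand-in for the generalized leapfrog of Riemannian Hamiltonian Monte Carlo; that substitution is an assumption for illustration. Each step is solved by fixed-point iteration, and the trajectory reports either a divergence or a solver failure. The target, tolerance, iteration cap, and threshold are all hypothetical choices.

```python
SIGMA = 0.1  # hypothetical narrow Gaussian target, U(q) = q^2 / (2 SIGMA^2)

def grad_u(q):
    return q / SIGMA**2

def hamiltonian(q, p):
    return 0.5 * (q / SIGMA) ** 2 + 0.5 * p * p

def implicit_midpoint_step(q, p, eps, tol=1e-10, max_iter=50):
    """One implicit midpoint step solved by fixed-point iteration.
    Returns (q_new, p_new, converged)."""
    q_new, p_new = q, p  # initial guess: the current state
    for _ in range(max_iter):
        q_mid = 0.5 * (q + q_new)
        p_mid = 0.5 * (p + p_new)
        q_next = q + eps * p_mid
        p_next = p - eps * grad_u(q_mid)
        if max(abs(q_next - q_new), abs(p_next - p_new)) < tol:
            return q_next, p_next, True
        q_new, p_new = q_next, p_next
    return q_new, p_new, False

def trajectory_status(q, p, eps, n_steps, max_delta_h=1000.0):
    """Classify a trajectory as "ok", "divergent", or "solver_failed"."""
    h0 = hamiltonian(q, p)
    for _ in range(n_steps):
        q, p, converged = implicit_midpoint_step(q, p, eps)
        if not converged:
            return "solver_failed"
        if abs(hamiltonian(q, p) - h0) > max_delta_h:
            return "divergent"
    return "ok"

print(trajectory_status(0.1, 1.0, 0.05, 20))
print(trajectory_status(0.1, 1.0, 0.3, 20))
```

On this Gaussian target the exact implicit midpoint update conserves the energy for any step size, because the rule preserves quadratic invariants. The failure at a step size of 0.3 therefore comes entirely from the fixed-point iteration, which stops contracting once the step size exceeds 2 * sigma. On less forgiving targets a genuine divergence and a solver failure can occur together, and a status check this simple cannot tell them apart.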
Dynamic Tuning of Higher-Order Symplectic Integrators
Typical implementations of Hamiltonian Monte Carlo use the leapfrog integrator, which is a second-order symplectic integrator. Higher-order symplectic integrators are more expensive per step, but their higher numerical accuracy has the potential to improve the overall performance of Hamiltonian Monte Carlo algorithms. The problem is that higher-order symplectic integrators come in families of possible configurations, which introduce additional tuning parameters. Unfortunately the performance of the resulting Hamiltonian Monte Carlo implementations can be extremely sensitive to these tunings, and the optimal tuning is itself sensitive to the integrator step size, further complicating matters. See [Adaptive multi-stage integrators for optimal energy conservation in molecular simulations](http://arxiv.org/abs/1512.03335v2) and Multi-stage splitting integrators for sampling with modified Hamiltonian Monte Carlo methods for some relevant discussion.
Can one design a comprehensive tuning method, covering both the integrator configuration and the step size, for implementations of Hamiltonian Monte Carlo that use higher-order symplectic integrators?
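The configuration-parameter problem already shows up in a very simple multi-stage family. The sketch below implements a one-parameter two-stage splitting integrator similar to those discussed in the multi-stage references above. This family is formally second order, so it illustrates the extra tuning parameter b rather than a higher-order method as such. With b = 1/4 it reduces exactly to two leapfrog steps of size eps / 2. The Gaussian target, the function names, and the particular values of b are assumptions chosen for illustration.

```python
def grad_u(q):
    return q  # standard Gaussian target, U(q) = q^2 / 2 (illustrative assumption)

def two_stage_step(q, p, eps, b):
    """kick(b*eps), drift(eps/2), kick((1-2b)*eps), drift(eps/2), kick(b*eps)."""
    p -= b * eps * grad_u(q)
    q += 0.5 * eps * p
    p -= (1.0 - 2.0 * b) * eps * grad_u(q)
    q += 0.5 * eps * p
    p -= b * eps * grad_u(q)
    return q, p

def leapfrog_step(q, p, eps):
    p -= 0.5 * eps * grad_u(q)
    q += eps * p
    p -= 0.5 * eps * grad_u(q)
    return q, p

def max_energy_error(q, p, eps, b, n_steps):
    """Largest |H - H0| seen along a trajectory of two-stage steps."""
    h0 = 0.5 * q * q + 0.5 * p * p
    worst = 0.0
    for _ in range(n_steps):
        q, p = two_stage_step(q, p, eps, b)
        worst = max(worst, abs(0.5 * q * q + 0.5 * p * p - h0))
    return worst

for b in (0.19, 0.21, 0.25):
    print(f"b = {b:.2f}: max |delta H| = {max_energy_error(1.0, 0.5, 1.0, b, 50):.2e}")
```

Even in this toy setting the energy error depends on both b and eps. That coupling is what a comprehensive tuning method would have to handle jointly.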
Implementing Adiabatic Monte Carlo
In Adiabatic Monte Carlo I introduced a generalization of Hamiltonian Monte Carlo capable of targeting multimodal distributions. Robust implementations of this method have much in common with robust implementations of Hamiltonian Monte Carlo, but there are a few key differences. The most problematic of these is that Hamiltonian Monte Carlo targets a single distribution, while Adiabatic Monte Carlo targets a smooth interpolation between two distributions. This difference means that the techniques used to compensate for the numerical error introduced by simulations in Hamiltonian Monte Carlo are not applicable to Adiabatic Monte Carlo.
Can one design an exact algorithmic correction to the numerical error introduced when simulating adiabatic flows in Adiabatic Monte Carlo?
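A schematic way to see the difficulty starts from how Hamiltonian Monte Carlo corrects numerical error. It uses a Metropolis acceptance probability built from a single, fixed Hamiltonian:

$$
a = \min\left(1, \exp\left[ H(q, p) - H(q', p') \right]\right), \qquad H(q, p) = -\log \pi(q) + K(p).
$$

Suppose, purely for illustration, that the interpolation is the geometric path between a base density $\pi_0$ and the target $\pi_1$. This is an assumption, and the construction in the paper may differ. Each intermediate distribution then has its own Hamiltonian:

$$
H_\beta(q, p) = -(1 - \beta) \log \pi_0(q) - \beta \log \pi_1(q) + K(p), \qquad \beta \in [0, 1].
$$

A trajectory that also moves $\beta$ starts and ends on different energy functions. There is therefore no single $H$ for the ratio above to compare.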
Generalizing Hamiltonian Monte Carlo to Novel Topological Spaces
Hamiltonian Monte Carlo is applicable to any smooth probability distribution defined on a smooth manifold, for example those that admit probability density functions with well-defined gradients. The methodology, however, is a special case of a more general approach that may extend to other kinds of spaces.
This more general approach focuses on building, from group actions, flows that preserve the given target distribution. In particular, these flows arise immediately from the orbits of any group whose Haar measure equals or marginalizes to the target distribution. For example, Hamiltonian Monte Carlo can be thought of as a method that constructs a symplectomorphism group that acts on the cotangent bundle of the target space. The symplectomorphic orbits trace out the Hamiltonian trajectories that generate the efficient exploration of the target space. For more discussion see Section 4.3 of The Geometric Foundations of Hamiltonian Monte Carlo.
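Here is a toy instance of the orbit picture. Take a one-dimensional standard Gaussian target with a standard Gaussian kinetic energy; the exact Hamiltonian flow is then a rotation of phase space. The flow maps compose as a one-parameter group, and each orbit is a circle of constant energy. Moving along orbits after resampling the momentum leaves the target marginal invariant. This is an illustrative sketch under those assumptions, not the general construction of the paper.

```python
import math
import random

def flow(t, q, p):
    """Exact flow of H(q, p) = q^2/2 + p^2/2 for time t: a rotation of (q, p) by angle t."""
    c, s = math.cos(t), math.sin(t)
    return c * q + s * p, c * p - s * q

def energy(q, p):
    return 0.5 * (q * q + p * p)

q0, p0 = 0.7, -1.2

# Group property: flowing for 1.1 and then for 0.4 matches flowing for 1.5.
print(flow(0.4, *flow(1.1, q0, p0)), flow(1.5, q0, p0))

# Every orbit is a circle of constant energy.
print(energy(q0, p0), energy(*flow(2.3, q0, p0)))

# Resample the momentum, then move a random distance along the orbit.
# Both moves preserve the joint Gaussian on (q, p), so q targets N(0, 1).
rng = random.Random(0)
q = 0.0
samples = []
for _ in range(20000):
    p = rng.gauss(0.0, 1.0)
    q, p = flow(rng.uniform(0.0, 2.0 * math.pi), q, p)
    samples.append(q)

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```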
Taking this group theoretic perspective may help to identify similar methods amenable to non-smooth spaces. For example, might the groups arising in tropical algebras provide the foundation of methods that flow between the topological configurations in phylogenetic spaces?
Can one identify group constructions that provide the foundations for generalizing Hamiltonian Monte Carlo to novel spaces?