Path integral control theory

Overview and history

Control theory is a branch of engineering that gives a formal description of how a system, such as a robot or animal, can move from its current state to a desired future state at minimal cost, where cost may mean time spent, energy spent, or any other quantity. Control theory is traditionally used to control industrial plants, airplanes, or missiles, but it is also the natural framework for modeling intelligent behavior in animals and robots. The mathematical formulation of deterministic control theory is very similar to that of classical mechanics; in fact, classical mechanics can be viewed as a special case of control theory.

Stochastic optimal control (SOC) theory uses the language of stochastic differential equations. The optimal control is usually computed from the Bellman equation, which is a partial differential equation. Solving this equation for high-dimensional systems is difficult in general, except in special cases, most notably the case of linear dynamics with quadratic control cost and the noiseless deterministic case. Therefore, despite its elegance and generality, SOC has not been used much in practice.

In (Fleming et al. 1982) it was observed that posterior inference in a certain class of diffusion processes can be mapped onto a stochastic optimal control problem. These so-called path integral (PI) control problems (Kappen 2005, Todorov 2006) represent a restricted class of non-linear control problems with arbitrary dynamics and state cost, but with dynamics that depend linearly on the control and a quadratic control cost. For this class of control problems, the Bellman equation can be transformed into a linear partial differential equation. The solutions for both the optimal control and the optimal cost-to-go can be expressed in closed form as a Feynman-Kac path integral. The path integral involves an expectation value with respect to a dynamical system; as a result, the optimal control can be estimated using Monte Carlo sampling or deterministic approximation methods. The path integral control method provides a deep link between control, inference, and statistical physics. The statistical physics view of control theory shows that qualitatively different control solutions exist for different noise levels, separated by phase transitions. See (Todorov 2009, Kappen 2011, Kappen et al. 2012) for earlier reviews and references.
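The Monte Carlo estimate can be illustrated by a minimal one-dimensional sketch. All choices below are our illustrative assumptions, not taken from the text: dynamics dx = u dt + sigma dW, cost phi(x_T) + (1/2) integral of u^2 dt, which gives the temperature lambda = sigma^2. Uncontrolled rollouts are weighted by exp(-phi(x_T)/lambda), and the optimal control at the initial state is read off as the weighted mean of the first noise increment.

```python
import numpy as np

# 1-D path integral control sketch (illustrative assumptions, see lead-in):
# dynamics dx = u dt + sigma dW, cost phi(x_T) + (1/2) int u^2 dt,
# so lambda = sigma^2.  The cost-to-go is the Feynman-Kac average of
# exp(-phi/lambda) over uncontrolled rollouts; the optimal control at
# (x0, t=0) is the weighted mean of the first noise increment.

rng = np.random.default_rng(0)
sigma, T, dt, N = 1.0, 1.0, 0.01, 50_000
steps = int(T / dt)
lam = sigma**2

phi = lambda x: 10.0 * (x - 1.0) ** 2       # terminal cost: reach x = 1

x0 = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=(N, steps))
xT = x0 + sigma * dW.sum(axis=1)            # uncontrolled endpoints (u = 0)

w = np.exp(-phi(xT) / lam)                  # Feynman-Kac weights
w /= w.sum()

u0 = sigma * (w @ dW[:, 0]) / dt            # optimal control estimate at x0
print(f"estimated optimal control at x0 = 0: u0 = {u0:.2f}")
```

For this linear-quadratic example the exact optimal control at x0 = 0 is 20/21, roughly 0.95, which the sample estimate should approach for large N.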

Equivalence between optimal control and optimal sampling

The Monte Carlo sampling method can be made more efficient using importance sampling. For the path integral control problem, importance sampling takes the form of a control signal that steers the particle to regions of low cost. One can use an arbitrary controller, provided that the trajectories are reweighted appropriately. Different importance samplers, using different controllers, all yield unbiased estimates of the optimal control, but they differ in their efficiency. One can show an intimate relation between optimal importance sampling and optimal control: the optimal control solution is the optimal sampler, and better samplers (in terms of effective sample size) are better controllers (in terms of control cost) (Thijssen et al. 2015). Mathematically, this idea is described by the Girsanov change of measure.
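The reweighting can be sketched for a minimal 1-D example (all choices here are our illustrative assumptions: dynamics dx = u dt + sigma dW, terminal cost 10(x-1)^2, lambda = sigma^2). Rollouts are generated under a feedback controller, and the Girsanov likelihood ratio is folded into the path weight w = exp(-(phi(x_T) + integral of (1/2)u^2 dt + sigma u dW)/lambda). Any controller yields an unbiased estimate of the optimal control; a controller closer to the optimum yields a larger effective sample size.

```python
import numpy as np

# Importance sampling for a 1-D path integral control problem
# (illustrative assumptions, see lead-in).  The path weight absorbs the
# terminal cost, the control cost, and the Girsanov change-of-measure term.

rng = np.random.default_rng(1)
sigma, T, dt, N = 1.0, 1.0, 0.01, 20_000
steps = int(T / dt)
lam = sigma**2
phi = lambda x: 10.0 * (x - 1.0) ** 2

def estimate(ctrl):
    """Estimate u*(x0=0, t=0) with rollouts driven by the controller ctrl."""
    x = np.zeros(N)
    S = np.zeros(N)                          # control cost + Girsanov terms
    for s in range(steps):
        u = ctrl(x, s * dt)
        dW = rng.normal(0.0, np.sqrt(dt), size=N)
        if s == 0:
            u_first, dW_first = u.copy(), dW
        S += 0.5 * u**2 * dt + sigma * u * dW
        x += u * dt + sigma * dW
    w = np.exp(-(phi(x) + S) / lam)
    ess = w.sum() ** 2 / (w**2).sum()        # effective sample size
    w /= w.sum()
    # unbiased estimate: current control plus weighted mean noise increment
    u_star = u_first[0] + sigma * (w @ dW_first) / dt
    return u_star, ess

controllers = {
    "uncontrolled": lambda x, t: np.zeros_like(x),
    "optimal (LQ)": lambda x, t: 20.0 * (1.0 - x) / (1.0 + 20.0 * (T - t)),
}
results = {name: estimate(c) for name, c in controllers.items()}
for name, (u_star, ess) in results.items():
    print(f"{name}: u* = {u_star:.2f}, ESS = {ess:.0f} of {N}")
```

For this linear-quadratic example the optimal feedback is known in closed form, so the second sampler is (up to discretization error) the perfect importance sampler: its weights are nearly constant and its effective sample size approaches N, while both samplers estimate the same optimal control.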

This allows the design of an iterative procedure, where in iteration i+1 one generates samples using the controller that was estimated in iteration i, increasing the efficiency of the sampling in each iteration. The controller estimated in iteration i is a function that maps states to actions. In general this is a complex, infinite-dimensional object, whose estimation would require an infinite number of samples. For a practical implementation, an important question is therefore how to represent this control function. In (Kappen, Ruiz 2015) we show how to learn a control function parametrized by a finite number of parameters using the so-called cross-entropy method. The controller is learned from self-generated data using a gradient method. The gradients involve statistics that are estimated from self-generated samples using importance sampling. The samples generated in the first importance sampling iteration provide very poor statistics for learning because the importance sampler is poor; with these samples, only a simple control function can be learned. However, the data generated with this simple control function yield much better statistics, so a more complex control function can be learned. During subsequent iterations, better and more complex controllers are learned, which generate increasingly efficient statistics that allow even better controllers to be learned. In the limit, the method may estimate the optimal control solution, provided that this solution can be expressed by the parametrized model.
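The iterative procedure can be sketched for a minimal 1-D example (our illustrative assumptions, not from the paper: dynamics dx = u dt + sigma dW, terminal cost 10(x-1)^2, lambda = sigma^2). The control function is parametrized by a finite set of time-varying feedback gains, u(x,t) = theta_t * (1 - x), and each iteration refits the gains by importance-weighted least squares on the self-generated rollouts, in the spirit of the cross-entropy update.

```python
import numpy as np

# Adaptive importance sampling for a 1-D path integral control problem
# (illustrative assumptions, see lead-in).  Iteration i+1 samples with the
# controller fitted in iteration i; the effective sample size grows as the
# parametrized controller approaches the optimal one.

rng = np.random.default_rng(3)
sigma, T, dt, N = 1.0, 1.0, 0.01, 20_000
steps = int(T / dt)
lam = sigma**2
phi = lambda x: 10.0 * (x - 1.0) ** 2

theta = np.zeros(steps)            # time-varying gains: u(x, t_s) = theta[s]*(1-x)
ess_hist = []
for it in range(4):
    x = np.zeros(N)
    S = np.zeros(N)                # control cost + Girsanov terms
    H = np.empty((steps, N))       # basis values h(x_s) = 1 - x_s
    dW = rng.normal(0.0, np.sqrt(dt), size=(steps, N))
    for s in range(steps):
        H[s] = 1.0 - x
        u = theta[s] * H[s]
        S += 0.5 * u**2 * dt + sigma * u * dW[s]
        x += u * dt + sigma * dW[s]
    w = np.exp(-(phi(x) + S) / lam)
    ess_hist.append(w.sum() ** 2 / (w**2).sum())
    w /= w.sum()
    # importance-weighted least-squares refit of each gain
    theta += sigma * ((H * dW) @ w) / (((H**2) @ w) * dt)
    print(f"iteration {it}: ESS = {ess_hist[-1]:.0f} of {N}")
```

The first iteration samples with theta = 0 and therefore has a small effective sample size; the gains learned from those poor statistics already steer the rollouts toward the low-cost region, and subsequent iterations refine them with rapidly improving statistics.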

Control and particle smoothing

The adaptive importance sampling procedure and the equivalence between optimal control and optimal sampling can be used in both directions. For stochastic time-series problems, one can derive an equivalent control formulation and compute approximations to the optimal control for use as importance sampling.

Particle smoothing methods are used for inference in stochastic time series based on noisy observations. We demonstrate how the smoothing problem can be mapped onto a path integral control problem. Subsequently, we use an adaptive importance sampling method to drastically improve the effective sample size of the posterior and the reliability of the estimates of the marginal smoothing distributions. This method has linear computational complexity in the number of particles and yields a feedback controller that makes it possible to sample efficiently from the joint smoothing distribution. We show that the proposed method gives more reliable estimates than state-of-the-art forward filtering backward smoothing methods in significantly less time. See (Ruiz et al. 2015).
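The mapping can be sketched as follows (a hedged illustration in our own notation, not taken from the text). Consider a diffusion prior over paths x_{0:T} and noisy observations y_k of the state at times t_k. Bayes' rule gives for the smoothing posterior over paths

\[
p(x_{0:T} \mid y_{1:K}) \;\propto\; p(x_{0:T}) \prod_{k=1}^{K} p(y_k \mid x_{t_k})
\;=\; p(x_{0:T}) \, \exp\!\Big( -\tfrac{1}{\lambda} \sum_{k=1}^{K} V_k(x_{t_k}) \Big),
\qquad V_k(x) = -\lambda \log p(y_k \mid x),
\]

which is precisely the path integral form: prior paths reweighted by the exponentiated state cost. The optimal controller of the corresponding path integral control problem is therefore the ideal proposal for sampling from the joint smoothing distribution, and any approximation to it can serve as an importance sampler.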


We develop path integral control methods for quadrotors in collaboration with UCL. We illustrate the approach in the video below. The video consists of three parts: the control of the holding pattern and the cat-and-mouse scenario in simulation, and real flight with 2 and 3 quadrotors. We are able to control up to 20 quadrotors (with 4 dynamical variables each), and probably more. We found no significant performance loss when moving from the simulator to the real quadrotors, but many other practical limitations made flying with more than 3 quadrotors difficult. We can thus conclude that the path integral control method is able to compute the state-dependent optimal control for realistic high-dimensional stochastic non-linear problems in real time.

We develop decentralized control strategies for UAVs. This is illustrated by the task where a number of UAVs must maintain a holding pattern. Oblivious3 and Oblivious10 are simulations of decentralized control without communication: each agent assumes that every other agent will move straight ahead, based on a measurement of its current position and velocity, and computes its own optimal control under that assumption. Communication3 and Communication10 are simulations of decentralized control with communication: each agent broadcasts its expected future trajectory to all other agents, and each agent computes its optimal control based on this information.
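The oblivious scheme can be sketched in a few lines. Everything below (single-integrator dynamics, the cost terms, all parameter values) is an illustrative assumption of ours, not the actual quadrotor model: each agent predicts straight-line motion for the others from their measured position and velocity, and then computes its own path integral control by sampling rollouts of its own dynamics only.

```python
import numpy as np

# Oblivious decentralized control sketch (illustrative assumptions, see
# lead-in).  Each agent plans against straight-line predictions of the
# others; no communication takes place.

rng = np.random.default_rng(2)
n, K, H, dt, sigma = 3, 500, 20, 0.1, 0.5
lam = sigma**2
angles = 2 * np.pi * np.arange(n) / n
slots = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # holding-pattern targets

pos = rng.normal(0.0, 1.0, size=(n, 2))    # measured positions
vel = rng.normal(0.0, 0.2, size=(n, 2))    # measured velocities

def plan(i):
    """Path integral control step for agent i under the oblivious assumption."""
    t = dt * np.arange(1, H + 1)
    # predict all other agents flying straight ahead over the horizon
    others = np.delete(pos[None] + t[:, None, None] * vel[None], i, axis=1)
    # uncontrolled rollouts of agent i (simplified single-integrator dynamics)
    dW = rng.normal(0.0, sigma * np.sqrt(dt), size=(K, H, 2))
    traj = pos[i] + np.cumsum(dW, axis=1)
    # path cost: track the assigned slot, softly penalize near-collisions
    track = np.sum((traj - slots[i]) ** 2, axis=(1, 2)) * dt
    dist = np.linalg.norm(traj[:, :, None] - others[None], axis=-1)
    avoid = np.sum(np.exp(-((dist / 0.3) ** 2)), axis=(1, 2)) * dt
    w = np.exp(-(track + 5.0 * avoid) / lam)
    w /= w.sum()
    return (w @ dW[:, 0]) / dt             # optimal control estimate at t = 0

u = np.stack([plan(i) for i in range(n)])
print("controls:", u)
```

The communicating variant replaces the straight-line prediction by the planned trajectories broadcast by the other agents; the rest of the computation is unchanged.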

Related research