Inference, neural networks and control

The course provides an advanced introduction to the modern view of Bayesian inference, with applications to neural networks and control theory.

Course material: The course is mainly based on

Bayesian Inference
- chapters of Information Theory, Inference and Learning Algorithms by David MacKay (referred to below as MK). This book can be downloaded from MacKay's web site.
- Pattern Recognition and Machine Learning, Christopher Bishop, Springer 2006, chapter 8.
- Further reading
  - Why use probabilities:
    - Cox, R. T. (1946). Probability, frequency and reasonable expectation. American journal of physics, 14, 1.
    - Jaynes, E. T. (2003). Probability theory: the logic of science. Cambridge university press.
  - Graphical models:
    - Cowell, R. G., Dawid, P., Lauritzen, S. L., & Spiegelhalter, D. J. (2007). Probabilistic networks and expert systems: Exact computational methods for Bayesian networks. Springer.
  - Approximate inference:
    - Kappen, H. J., & Rodriguez, F. B. (1998). Efficient learning in Boltzmann machines using linear response theory. Neural Computation, 10(5), 1137-1156.
    - Murphy, K. P., Weiss, Y., & Jordan, M. I. (1999, July). Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 467-475). Morgan Kaufmann Publishers Inc.
    - Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2001). Generalized belief propagation. Advances in neural information processing systems, 689-695.
    - Wainwright, M. J., & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2), 1-305.
    - Pelizzola, A. (2005). Cluster variation method in statistical physics and probabilistic graphical models. Journal of Physics A: Mathematical and General, 38(33), R309.
    - Mooij, J. M. (2008). Understanding and improving belief propagation.
Neural networks
- Supervised learning, perceptrons, multi-layered perceptrons
  - Perceptrons: sheets supervised learning 1
  - Multi-layered perceptron: sheets supervised learning 2
- Recurrent stochastic networks
  - Ergodic Markov Chains: reader chapter 3
  - Boltzmann Machines: sheets Boltzmann Machines, reader chapter 4
  - Attractor neural networks: sheets attractors, reader chapter 5
- Further reading
  - Dayan and Abbott. Theoretical Neuroscience (DA) chapters 7.6, 8.4, pg. 322-324
  - Hertz, Krogh and Palmer. Introduction to the Theory of Neural Computation (HKP) chapters 2, 5, 6 and 7.1
Control lectures:
- Chapter 1 of the book Dynamic Programming and Optimal Control by Dimitri Bertsekas: Copies 1a, Copies 1b (from the 1st edition; the 2nd edition is current).
- My ICML 2008 tutorial text, to be published in the book Inference and Learning in Dynamical Models (Cambridge University Press 2010), edited by David Barber, Taylan Cemgil and Silvia Chiappa.
- These are the slides that are used for the course.
- Further reading on path integral control theory
  - Theodorou et al., AISTATS 2010: application of path integral control to real robots.
  - van den Broek et al., JAIR 2008: a (model based) sampling approach to control a robot arm.
  - van den Broek et al., UAI 2010: extension of path integral control to risk sensitive control.
  - Kappen, Gomez, Opper, Machine Learning 2012: a formulation of path integral control as a special case of KL control and an application to multi-agent coordination.
Sparse regression:
- lasso slides
- These papers are on the Lasso: Pathwise coordinate optimization; Regularization Paths for Generalized Linear Models via Coordinate Descent
- This paper is on convergence of fixed point iterations: paper.pdf
- L0 slides
- This is a paper on spike and slab: George and McCulloch 1993
- This is a paper on the variational garrote: Kappen 2011

Presentation schedule:

Lecture	Topic	Material	Exercises
1	Probability, entropy and inference	MK 2,3	MK 2.4, 2.6 + continued, 2.7, 2.8 to be discussed in the class. Exercises: MK 2.10, 2.14, 2.16ab, 2.18, 2.19, 2.26 MK 3.3, 3.4, 3.15 to be discussed in the class. Exercises: MK 3.1, 3.2, 3.5, 3.6, (3.7 Bonus), 3.8, 3.9
2	Model comparison and Occam's razor; Graphical models	MK 28 Bishop Chapter 8 [slides] Skip 8.1.3	Exercises: MK 28.1, 28.2 only for model H_1, (28.3 Bonus) 8.1, 8.14 to be discussed in the class. Exercises: 8.3, 8.4, 8.7, 8.10, 8.11, 8.15
3	Approximate inference	cvm_sheets nips paper	Run Belief Propagation for a discrete optimization problem; see the Ising model exercise. NB: Download LIBDAI from: LIBDAI.
4	Networks of binary neurons Markov processes Boltzmann Machines Mean field approximation Linear response approximation	sheets Boltzmann Machines reader chapter 3, 4	reader chapter 3 exercise 2 reader chapter 4 exercises 1, 2
5	Monte Carlo Methods (2), HMC Bayesian inference with perceptron	MK 29.1-29.5 MK 29.6, 29.9, 30.1, 30.3 MK 38, 39, 41 slides	Exercise to compare MCMC with Belief Propagation on a discrete optimization problem; see the Ising model exercise. NB: Download LIBDAI from: LIBDAI. An example of Bayesian inference in perceptron learning using MCMC methods. The files (Matlab files and instructions) needed to do this exercise can be found here: [mcmc_mackay.tar.gz].
6	Exercises and recap so far		MK 2.10, MK 3.15 3.8 3.9 Example message passing in chain and loopy graph Example sequential dynamics Detailed balance, MF examples
7	Discrete time control dynamic programming Bellman equation	Bertsekas 2-5, 13-14, 18, 21-32 (2nd ed.) Bertsekas 2-5, 10-12, 16-27, 30-32 (1st ed.) Kappen ICML tutorial 1.2 slides up to 28	Ex: Carry out the calculations needed to verify that J0(1)=2.7 and J0(2)=2.818 in Bertsekas Example 3.2 on pg. 23-25 in Copies 1b extra exercise 1, 2a,b
8	Continuous time control Hamilton-Jacobi-Bellman Equation Pontryagin Minimum Principle Stochastic differential equations Stochastic optimal control LQ examples, Portfolio management	Kappen ICML tutorial 1.3, 1.4 slides up to 69	extra exercise 2a,b
9	Path integral control theory	Kappen ICML tutorial 1.5, 1.6, 1.7 slides up to 93	extra exercise 2c, 3
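
The backward recursion behind the lecture 7 exercise can be sketched in a few lines. This is only a sketch under stated assumptions: the problem data below (demand probabilities 0.1, 0.7, 0.2 for demand 0, 1, 2; storage capacity 2; horizon 3; cost u + (x + u - w)^2; zero terminal cost) are recalled from the inventory control example in Bertsekas and are not given on this page, so check them against Copies 1b. The names DEMAND, CAPACITY, HORIZON, stage_cost, next_state and backward_dp are illustrative.

```python
# Assumed problem data (recalled from the textbook example, not stated on this page).
DEMAND = {0: 0.1, 1: 0.7, 2: 0.2}   # P(w_k = w)
CAPACITY = 2                         # constraint x_k + u_k <= CAPACITY
HORIZON = 3                          # stages k = 0, 1, 2


def stage_cost(x, u, w):
    """Ordering cost u plus holding/shortage cost (x + u - w)^2."""
    return u + (x + u - w) ** 2


def next_state(x, u, w):
    """Excess demand is lost, so the stock never becomes negative."""
    return max(0, x + u - w)


def backward_dp():
    """Return J[k][x] for k = 0..HORIZON from the Bellman recursion."""
    states = range(CAPACITY + 1)
    J = [dict() for _ in range(HORIZON + 1)]
    J[HORIZON] = {x: 0.0 for x in states}        # zero terminal cost
    for k in reversed(range(HORIZON)):
        for x in states:
            # Minimise the expected stage cost plus cost-to-go over admissible orders u.
            J[k][x] = min(
                sum(p * (stage_cost(x, u, w) + J[k + 1][next_state(x, u, w)])
                    for w, p in DEMAND.items())
                for u in range(CAPACITY - x + 1)
            )
    return J


J = backward_dp()
print({x: round(v, 3) for x, v in J[0].items()})
```

Under these assumptions the recursion gives J0(1) = 2.7 and J0(2) = 2.818, the values quoted in the exercise. The exercise itself asks for the calculation by hand, so the code is best used as a check on the intermediate tables J2 and J1.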

If time permits:

Lecture	Topic	Material	Exercises
11	Path integral control theory MC Sampling solution Numerical examples (particle in a box, N joint arm, Robot learning)	Kappen ICML tutorial 1.7 slides up to 127	extra exercise 4, 5. Matlab code for n joint problem: Here is a directory of Matlab files that lets you run and inspect the variational approximation for the n joint stochastic control problem discussed in the tutorial text, section 1.6.7. Type tar xvf njoints.tar to unpack the directory and run file1.m. In file1.m you can select demo1 (3 joint arm) or demo2 (10 joint arm). You can also try larger n, but be sure to adjust eta for the smoothing of the variational fixed point equations. You can compare the results with exact computation (only recommended for 2 joints) by setting METHOD='exact'. There is also an implementation of importance sampling (does not work very well) and Metropolis-Hastings sampling (works well, but is not as stable as the variational approximation).
12	Lasso	lasso slides	Sparse regression computer exercise: Derive the sequential Gauss-Seidel update rule from lasso slides Eq. 1. Write your own Lasso method using coordinate descent. Test your algorithm on data set 1 (lasso data). Reproduce a figure similar to slide 17. Find the optimal value of gamma by cross validation. Compare the Lasso result with ridge regression (with the ridge regression parameter also optimized by cross validation). Consider the example of correlated inputs on slide 21. Reproduce these results with your software using data generated by correlated_data.m. Compute the input-output correlations b_i and use them to explain the observed phenomenon. Write a brief report on your findings and include your source code.
13	Spike and slab Variational Garrote	L0 slides George and McCulloch 1993 Kappen 2011
6	Ising model	MK 31	MK 31.1, 31.3
8a	Attractor neural networks	sheets attractors
5	Perceptrons	DA 8.4 sheets supervised 1 sheets supervised 2	DA 8.8, 8.9
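
For the lecture 12 computer exercise, the following is a minimal sketch of cyclic coordinate descent for the Lasso, not the reference solution. It assumes the objective (1/2n) ||y - X w||^2 + gamma ||w||_1; the normalisation in lasso slides Eq. 1 may differ, in which case gamma has to be rescaled. The function names soft_threshold and lasso_coordinate_descent and the toy data are illustrative, and plain Python lists are used so the sketch has no dependencies.

```python
def soft_threshold(a, gamma):
    """Soft-thresholding operator: sign(a) * max(|a| - gamma, 0)."""
    if a > gamma:
        return a - gamma
    if a < -gamma:
        return a + gamma
    return 0.0


def lasso_coordinate_descent(X, y, gamma, n_sweeps=200, tol=1e-10):
    """Minimise (1/2n) * ||y - X w||^2 + gamma * ||w||_1 (assumed objective).

    X is a list of n rows with d features each, y a list of n targets.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)                      # r = y - X w, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(d):
            if col_sq[j] == 0.0:
                continue                    # all-zero column: w_j stays at 0
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            w_new = soft_threshold(rho, gamma) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated weight.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = w_new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break                           # a full sweep changed nothing
    return w


# Toy data with orthogonal columns, generated from w = (2, 0.1) without noise.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
w = lasso_coordinate_descent(X, y, gamma=0.1)
print(w)
```

On this toy problem the large coefficient is shrunk from 2 to 1.8 and the small one is set exactly to zero, which is the sparsity effect the exercise asks you to reproduce. Because each weight is updated using the latest values of the others, the sweep is a Gauss-Seidel style iteration.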

Examination: