Machine Learning - autumn 2008

Course information

Please contact the lecturers if you want to take this course.

The course provides an advanced introduction to the modern view on machine learning, with emphasis on the Bayesian perspective. The course on machine learning is held every autumn and replaces the course on neurophysics. The course is intended for Master's students in physics as well as AI/computer science students with sufficient mathematical background. AI/computer science students are strongly recommended to take the course "Introduction to Pattern Recognition" before this course.

Course material: The course is mainly based on chapters of Information Theory, Inference and Learning Algorithms by David MacKay. The book can be downloaded from MacKay's website.

Format: The course consists of weekly sessions with student presentations (a beamer and laptop will be available) and exercises. All students are expected to have read the chapters and to have worked on the exercises of that week.

Place: Unless stated otherwise: colloquium room biophysics. At Geert Grooteplein 21 (prekliniek), follow route 126. When you leave the corridor and enter the wing, go through the doors and immediately turn left. See the route description.

Time:
Wednesdays 8:45 to 10:30 (starting with a kick-off meeting on Sept 3).
Exercises (office hours): Wednesdays 10:45 to 12:30.

Presentation schedule: Note that the schedule may change during the course. A detailed breakdown of the chapters to be presented will be discussed during the course.

Each entry lists the session number, date, topic, chapter(s), presenter and exercises.

1. 10 Sept: Probability, entropy and inference. Chapter 2. Presenter: Loek.
   Exercises 2.4, 2.6 (+ continued), 2.7 and 2.8 to be discussed in class.
   Recommended further exercises: 2.10, 2.14, 2.16ab, 2.18, 2.19, 2.26

2. 17 Sept: More about inference. Chapter 3. Presenter: Stijn.
   Exercises 3.3, 3.4 and 3.15 to be discussed in class.
   Recommended further exercises: 3.1, 3.2, 3.5, 3.7, 3.8, 3.9

3. 24 Sept: Exact inference in graphical models with discrete variables. Chapter 8 of Chris Bishop's book [pdf], up to the end of 8.3 (skip 8.1.3). Presenter: Tanneke.
   Exercises 8.1, 8.10 and 8.14 to be discussed in class.
   Recommended further exercises: 8.4, 8.7, 8.11, 8.13
   See Solutions to WWW exercises

4. 1 Oct: Inference in graphical models. Remaining part of the chapter, starting at 8.4. Presenter: Mohammad.
   Exercise 8.15 to be discussed in class.
   Recommended further exercises: 8.18, 8.27. Also derive a message-passing algorithm for a chain of nodes (not a chain graph!) with p(x_1, ..., x_n) = f_1(x_1, x_2) f_2(x_2, x_3) ... f_{n-1}(x_{n-1}, x_n); a sketch of such an algorithm is given below the schedule.
   See Solutions to WWW exercises

5. 8 Oct: Mixtures of Gaussians, maximum likelihood and clustering. Section 21.2 and Chapter 22. Presenter: Syed.
   Exercises 22.1, 22.2, 22.3 and 22.4 to be discussed in class.
   Recommended further exercises: 22.6, 22.8

   15 Oct: Fall break (no session).

6. 22 Oct: Some probability distributions, exact marginalization and the Laplace approximation. Chapters 23, 24 and 27. Presenter: Ali.
   Exercise: show that Eq. 23.7 is true. Hint: consider the transformation from polar coordinates (u1, u2) to Cartesian coordinates (z, x) in two dimensions, with 2 pi u1 the angle and sqrt(2 ln(1/u2)) the radius. Furthermore, use the relation between the probability distributions of transformed variables: if y = f(x) and p_x(x) is a probability distribution over x, then p_y(y) dy = p_x(x) dx. In more dimensions this becomes p_y(y) = p_x(x(y)) |det(dx/dy)|, with dx/dy the Jacobian matrix with elements dx_i/dy_j and det the determinant. (A numerical check of this transformation is given below the schedule.)
   Also Ex. 24.2.

7. 29 Oct: Model comparison and Occam's razor + MLPs. Chapters 28 and 44. Presenter: Benjamin.
8. 5 Nov: Monte Carlo Methods (1). Sections 29.1-29.5. Presenter: Pavol. Computer exercise (see below).
9. 12 Nov: Monte Carlo Methods (2), HMC. Sections 29.6, 30.1-3 and 30.6-7. Presenter: Wendy. Computer exercise (see below).
10. 19 Nov: Ising model. Chapter 31. Presenter: Bob.
11. 26 Nov: ICA. Chapter 34. Presenter: vacancy.
12. 3 Dec: Variational Methods. Chapter 33. Presenter: Henk.
13. 10 Dec: Topic, chapter and presenter TBA.
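
Note on the chain exercise in session 4: the sketch below, in Python, shows one possible forward/backward message-passing scheme for a chain p(x_1, ..., x_n) proportional to f_1(x_1, x_2) ... f_{n-1}(x_{n-1}, x_n). It is only an illustration of the idea (the factor matrices in the toy example are arbitrary); the exercise asks you to derive the algorithm yourself.

    import numpy as np

    # Minimal sketch of message passing on a chain with pairwise factors
    # f_k(x_k, x_{k+1}), each given as a nonnegative K x K matrix.
    # Forward messages summarize the factors f_1 .. f_{k-1};
    # backward messages summarize the factors f_k .. f_{n-1}.
    def chain_marginals(factors):
        n = len(factors) + 1                    # number of variables
        K = factors[0].shape[0]                 # number of states per variable
        fwd = [np.ones(K) for _ in range(n)]
        bwd = [np.ones(K) for _ in range(n)]
        for k in range(1, n):                   # forward pass
            fwd[k] = fwd[k - 1] @ factors[k - 1]
        for k in range(n - 2, -1, -1):          # backward pass
            bwd[k] = factors[k] @ bwd[k + 1]
        # the marginal of x_k is proportional to fwd[k] * bwd[k]
        marginals = [fwd[k] * bwd[k] for k in range(n)]
        return [m / m.sum() for m in marginals]

    # Toy example: three binary variables, two arbitrary pairwise factors.
    f = [np.array([[1.0, 2.0], [3.0, 1.0]]),
         np.array([[2.0, 1.0], [1.0, 4.0]])]
    print(chain_marginals(f))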
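
Note on the Eq. 23.7 exercise in session 6: assuming that Eq. 23.7 is the Gaussian sampling formula described by the hint (the Box-Muller transform), the derivation itself is the exercise, but the following short Python sketch can serve as a numerical sanity check of the result.

    import numpy as np

    # Numerical check of the transformation in the hint: with u1, u2 uniform
    # on (0, 1), take 2*pi*u1 as the angle and sqrt(2*ln(1/u2)) as the radius.
    rng = np.random.default_rng(0)
    u1, u2 = rng.uniform(size=(2, 100_000))
    radius = np.sqrt(2.0 * np.log(1.0 / u2))
    angle = 2.0 * np.pi * u1
    z, x = radius * np.cos(angle), radius * np.sin(angle)

    # If the transformation is correct, z and x are independent standard Gaussians.
    print(z.mean(), z.std())            # should be close to 0 and 1
    print(x.mean(), x.std())            # should be close to 0 and 1
    print(np.corrcoef(z, x)[0, 1])      # should be close to 0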


Computer exercise:
An example of Bayesian inference in perceptron learning using MCMC methods. The files (Matlab files and instructions) needed to do this exercise can be found here:
[mcmc_mackay.tar.gz]. It is recommended to do this exercise while studying the material of chapters 29 and 30. The results should be handed in before December 20. An illustrative sketch of the underlying idea is given below.
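
The provided files are in Matlab; purely as an illustration of the idea behind the exercise (and not the provided code), here is a minimal Python sketch of Metropolis sampling from the posterior over the weights of a single sigmoid neuron with a Gaussian prior, on made-up toy data. The step size, prior strength and burn-in are arbitrary choices.

    import numpy as np

    # Illustrative sketch: random-walk Metropolis sampling of the posterior over
    # the weights w of a logistic ("perceptron"-style) classifier, then averaging
    # the predictive probability over the posterior samples.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 2))                                  # toy inputs
    t = (X @ np.array([2.0, -1.0]) + 0.3 > 0).astype(float)       # toy labels
    X = np.c_[X, np.ones(len(X))]                                 # bias input

    def log_posterior(w, alpha=0.01):
        a = X @ w
        log_lik = np.sum(t * a - np.logaddexp(0.0, a))   # Bernoulli log-likelihood
        log_prior = -0.5 * alpha * (w @ w)               # Gaussian prior on w
        return log_lik + log_prior

    w = np.zeros(3)
    samples, step = [], 0.2
    for it in range(20_000):
        w_prop = w + step * rng.normal(size=3)                    # proposal
        if np.log(rng.uniform()) < log_posterior(w_prop) - log_posterior(w):
            w = w_prop                                            # Metropolis accept
        if it > 5_000 and it % 10 == 0:
            samples.append(w.copy())

    # Predictive probability for a new input: average the sigmoid over samples.
    x_new = np.array([0.5, 0.5, 1.0])
    p = np.mean([1.0 / (1.0 + np.exp(-(s @ x_new))) for s in samples])
    print("posterior predictive p(t=1 | x_new) =", round(p, 2))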


Note on Bishop fig. 8.34:
An undirected graph G is an I-map of a probability distribution P if all conditional independence statements that are true in G are also true in P. The trivial example is the complete graph, which has no true conditional independence statements. The minimal I-map is the I-map from which no link can be removed without destroying the I-map property, and is called a Markov network.

A directed acyclic graph G is an I-map of a probability distribution P if every d-separation statement that is true in G is also true in P. The trivial example is the complete graph, which has no true d-separation statements. The minimal I-map is the I-map from which no link can be removed without destroying the I-map property, and is called a Bayesian network (Pearl, 1988).

An undirected graph G is a D-map of a probability distribution P if all conditional independence statements that are true in P are also true in G. The trivial example is the empty graph (no links), in which all conditional independence statements are true.

A directed graph G is a D-map of a probability distribution P if all conditional independence statements that are true in P correspond to true d-separation statements in G. The trivial example is the empty graph (no links), in which all d-separation statements are true.

A graph G (undirected or directed) is a perfect map of P if it is both an I-map and a D-map of P.

The sets D and U illustrated in Bishop fig. 8.34 are the sets of distributions that have a directed perfect map and an undirected perfect map, respectively; both are proper subsets of the set of all probability distributions.
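
As a reminder of why neither D nor U contains the other, the two standard examples that Bishop discusses alongside this figure can be summarized as follows (a sketch only; see the chapter for the full argument):

    % (a) A distribution with a directed but no undirected perfect map:
    % the "v-structure"
    \[
      p(a,b,c) = p(a)\, p(b)\, p(c \mid a, b),
      \qquad a \perp b, \quad a \not\perp b \mid c,
    \]
    % whose perfect map is the directed graph a -> c <- b.
    %
    % (b) A distribution with an undirected but no directed perfect map:
    % a four-variable distribution whose only conditional independencies are
    \[
      A \perp C \mid \{B, D\}, \qquad B \perp D \mid \{A, C\},
    \]
    % whose perfect map is the undirected cycle A - B - C - D - A.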