- One-semester course (6 Euro points)
- Form: Lectures, student presentations, excercises
- For: Ma and PhD students
- When: Wednesdays, 8.45-10:30, starting 3 September, see below.
- Lecturers: Bert Kappen en Wim Wiegerinck

The course provides an advanced introduction to the modern view on machine learning with emphasis on the Bayesian perspective. The course on machine learning is held every autumn en replaces the course on neurophysics. The course is intended for Master's students in physics as well as AI/computer science students with sufficient mathematical background. For AI/computer science students it is highly recommended to to take the course ``Introduction to Pattern Recognition'' prior to this course.

**Course material:**
The course is mainly based on chapters of
Information Theory, Inference and Learning Algorithms from David
MacKay. This book can be downloaded from MacKay's web site.

**Format:** The course will be weekly sessions, consisting of
student presentations (beamer and laptop will be available) and making exercises. All students are expected to have read the chapters and have worked on the excercises of that week.

**Place:** Unless stated otherwise: colloquium room biophysics.
In Geert Grooteplein 21 (prekliniek) follow route 126. If you leave the corridor and go into the wing, go trhrough the doors and go left immediately.
See route description

**Time:**

Wednesdays 8:45 to 10:30 (starting with a kick-off meeting at sept 3)
Exercises (office-hours): Wednesdays 10:45 to 12:30

**Presentation schedule:**
Note that the schedule may change during the course. Detailed breakdown of the chapters to be presented will be discussed during the course.

Date | Topic | Chapter | Presenter | Exercises | ||

1 | 10 Sept | Probability, entropy and inference | 2 | Loek |
2.4, 2.6 + continued, 2.7, 2.8 to be discussed in the class.
Recommended further exercises: 2.10, 2.14, 2.16ab, 2.18, 2.19, 2.26 | |

2 | 17 Sept | More about inference | 3 | Stijn |
3.3, 3.4, 3.15 to be discussed in the class.
Recommended further exercises: 3.1, 3.2, 3.5, 3.7, 3.8, 3.9 | |

3 | 24 Sept | Exact inference in graphical models with discrete variables | Chapter 8 of Chris Bishops book
[pdf], until end of 8.3 Skip 8.1.3 |
Tanneke |
8.1, 8.10, 8.14 to be discussed in the class.
Recommended further exercises: 8.4, 8.7, 8.11, 8.13 See Solutions to WWW exercises | |

4 | 1 Oct | Inference in graphical models | Remaining part of the chapter, starting at 8.4 | Mohammad |
8.15, to be discussed in the class.
Recommended further exercises: 8.18, 8.27. Derive a message passing algorithm fior a chain of nodes (not a chain graph!) p(x_1, ..., x_n)=f_1(x_1,x_2)f_2(x_2,x_3)...f_n-1(x_n-1,x_n). See Solutions to WWW exercises | |

5 | 8 Oct | Mixtures of Gaussians, maximum likelihood and clustering | 21.2, 22 | Syed |
22.1, 22.2, 22.3, 22.4 to be discussed in the class. Recommended further exercises: 22.6, 22.8 | |

- | 15 Oct | Fall break | - | - | ||

6 | 22 Oct | Some probability distributions, exact marginalization and the Laplace approximation | 23, 24, 27 | Ali |
Show that Eq. 23.7 is true. Hint: consider the transformation from polar coordinates (u1,u2) to carthesian coordinates (z,x) in two dimensions
with 2 pi u1 the angle and sqrt(2ln (1/u2)) the radius. Use furthermore, the relation between probability distribution between transformed variables: if y=f(x) and p_x(x) is a probabability distribution over x, then p_y(y)dy=p_x (x) dx. In more dimensions this becomes p_y(y)=p_x(x(y))det(dx/dy), with dx/dy the Jacobian matrix with elements dx_i/dy_j and det the determinant.
Ex. 24.2 | |

7 | 29 Oct | Model comparison and Occam's raisor + MLPs | 28 + 44 | Benjamin | ||

8 | 5 Nov | Monte Carlo Methods (1) | 29.1-29.5 | Pavol | Computer exercise | |

9 | 12 Nov | Monte Carlo Methods (2), HMC | 29.6 ,30.1-3, 30.6-7 | Wendy | Computer exercise | |

10 | 19 Nov | Ising model | 31 | Bob | ||

11 | 26 Nov | ICA | 34 | Vacancy | ||

12 | 3 Dec | Variational Methods | 33 | Henk | ||

13 | 10 Dec | TBA | TBA | TBA |

**Computer exercise:**

An example of Baysian inference in perceptron
learning using MCMC methods. The files (Matlabfiles and instructions) needed to do this exercise
can be found here: [mcmc_mackay.tar.gz]. It is recommended to do this exercise while studying the material of chapters 29 and 30. The results should be handed in before december 20.

**Note on Bishop fig. 8.34:**

An undirected graph G is an I-map of a probability distribution P if
all conditional independent statements that are true in G are also true in P. The
trivial example
is the complete graph, which has no true conditional independence
statements.
The minimal I-map is the smallest I-map G and is called a Markov
network.

A directed acyclic graph G is an I-map of a probability distribution P if every d-separation condition true in G is also true in P. The trivial example is the complete graph, which has no true d-separation statements. The minimal I-map is the smallest I-map G and is called a Bayesian network (Pearl 1988).

An undirected graph G is a D-map of a probability distribution P if all conditional independent statements that are true in P are also true in G. The trivial example is the empty graph (no links), in which all conditional independence statements are true.

A directed graph G is a D-map of a probability distribution P if all conditional independent statements that are true in P correspond to true d-separation statements in G. The trivial example is the empty graph (no links), in which all d-separation statements are true.

G is an undirected or directed perfect map of P if it is both an I-map and a D-map.

The sets D and U illustrated in Bishop fig. 8.34 are the directed and undirected perfect maps and are proper subsets of all probability distributions.