Advanced machine learning - autumn 2024

Course information

Half semester course (6 ec).
Format: Lectures, excercises, computer exercises, student presentations
For: Master and PhD students in physics or mathematics
Teacher lectures: Bert Kappen
Teacher Practicals: Eduardo Dominguez and Aaron Villanueva and Peyman Najafi.

This course covers advanced topics in machine learning. The course is intended for Master's students in physics and mathematics. This course is the follow-up of the course Machine Learning

Format: The course will be weekly sessions. Emphasis is on learning the material through mathematical derivation and computer exercises.

Course material: The course uses the following material:

Information Theory, Inference and Learning Algorithms from David MacKay which can be downloaded free of charge (MK).
Bayesian Reasoning and Machine Learning from David Barber (BRML).
Introduction to the theory of neural computation. Hertz, Krogh and Palmer (HKP).
These notes
Here you find topics for the student presentations. In addition you may find interesting recent topics here
All material that is used during the lectures is summarized in these slides.pdf (NB these slides are frequently updated throughout the course)

Week Topic Material Exercises
1 45 Monte Carlo Methods (1)
Sampling means and variance, uniform sampling, sampling from multi variate Gaussian, importance sampling, rejection sampling MK chapter 29.1-29.5
BRML chapter 27 BRML exercise 27.1. Box-Muller method (3)
MK exercise 29.3. Show diffusion scales as sqrt(T) (1)
MK exercise 29.13. Importance sampling of one Gaussian with another (only the computer simulations, reproducing figs 29.20) (3)
2 46 Monte Carlo Methods (2)
Markov processes, ergodicity, Metropolis Hasting algorithm, Gibbs sampling, Hamilton Monte Carlo MK chapter 29.6
MK chapter 30.1, 30.3
handouts chapter 1
MK exercise 29.15. Gibbs sampling of posterior over mu, sigma given data (5). Hint: It is recommended to sample beta = 1/sigma^2 rather than sigma^2. But be aware that such a transformation affects the prior that you assume. For instance, if you assume a flat prior over sigma^2, this transforms to a non-flat prior over beta. For this exercise choose the prior over beta as 1/beta. This choice corresponds to a so-called non-informative prior that is flat in the log(sigma) domain. See also slides lecture 3 where we consider the variational approximation for this problem. (5)
MCMC exercises (10)

3 47 The Ising model
Phase transitions, critical slowing down, frustration, transfer matrix method
discrete optimization with simulated annealing. MK chapter 31
Further reading: HKP appendix A,
Sandvik 2018 section 5
SoKal 1999: Critical slowing down
Aarts and Korst, Simulated Annealing and Boltzmann Machines 1989 MK exercise 31.1. Relation entropy and free energy (2)
Simulated annealing exercise on spin glass (10)
4 48 Deterministic approximate inference for Bayesian posterior
Laplace approximation
Variational approximation MK chapter 33.1, 33.4, 33.5
Further reading: Barber Bishop 1998, Ensemble Learning in Bayesian Neural Networks Consider again the perceptron learning problem of Mackay chapter 39 and 41, for which we computed the posterior by sampling in week 2. This time, compute p(t=1|x,D,alpha) using the Laplace approximation and reproduce Mackay figure 41.11b.(7)

5 49 Deterministic approximate inference for the Ising model
Mean field approximation
Linear response correction, TAP
SK model
Belief propagation
MK 33.2, 33.3, BRML chapter 28.7
Further reading: Kappen, Spanjers, Mean field theory for asymmetric neural networks (1999), Physical Review E, 61:56585663. MF and BP in Ising model exercise (10)
6 50 Deterministic approximate inference for the Ising model
Convergence of BP, Factor graph version of BP, max-product BP, Applications of BP for compressed sensing and clustering
Further reading: Mooij, Kappen. Sufficient conditions for convergence of the sum-product algorithm (2007). IEEE Information Theory, 53:44224437
7 51 The statistical physics approach to machine learning
The replica symmetric solution for the SK model
The cavity method
Analysis of compressed sensing and random satisfiability using replica method and message passing algorithms
Further reading: Sherrington, D. and Kirkpatrick, S. (1975). Solvable model of Spin-Glass. Physical review letters, 35:17921796.
Kappen, H.J. An introduction to stochastic neural networks, in: Handbook of biological physics 2001, 517-552. pdf Reproduce the phase plot of the SK model in the replica symmetric approximation replica_SK.pdf (7)
8 5 The Boltzmann Machine
Quantum machine learning, Quantum Boltzmann machine MK chapter 43
handouts chapter 1-2 Boltzmann Machine Learning (10)
Here are the salamander retina data.
9 6 Control Theory
Quantum computing
control exercises 1,2a (5)

10 7 Transformers
Modern AI is rapidly changing society. Write a 1-2 page essay on what you think will be the implications of AI for our future. Your essay should express your personal point of view and based on rational arguments. Substantiate your views by reviewing existing literature, where you contrast different views. You may consider the following questions:

Do you believe that AI machines will be conscious one day? Give reasons why or why not.
What do you think are the implications of future AI machines for our society? Do these implications depend on whether these machines are conscious or not?
Suppose that one day a computer company announces that it has produced a conscious machine. Obviously, such a product would have a great impact on society and the statement that the machine is indeed conscious will be questioned by many people. How do you think our society should deal with this?
For your inspiration, I wrote a short essay. on this topic. You may agree with me or not, it does not matter. Sometimes, I think that the young generation may have a different opinion on this topic, and I would like to hear your arguments.
PS1: This assignment is in principle a group effort, ie. one essay per group. But if you have diverging views, you may also hand in your personal essay.
PS2: Please do not send me ChatGPT generated documents, because those I can generate myself. They will be discarted. (15)
11 week 8, 18 february Student presentations
Bas van Heumen, Donald Scheuring and Sjoerd Jansen. Model Comparison
12 week 9, 25 february Student presentations
Kelly Karremans Abhinav Ramireddy Isabel Stein , Neural data analysis and anaesthesia
Mirza Redzepagic, Marieke Van Vreeswijk en Vince Elter. Enhancing Variational Autoencoders: Addressing Latent Space Disentanglement, Generalization, and Prior Limitations
13 week 11, 11 march Student presentations
Stian, Simon, Ich'en. Pretraining neural networks
Timon Crouzen & Marko Bilobrk. Dimensionality reduction for tensors
13 week 11, 13 march Student presentations
Simon Reichert, Ayse Gul Yildiz, Yuri Weessies. Gaussian Processes
Florentin Seifert, Katherina Hemmo, David Ernstberger. Quantum optimal control

Examination:
There will be no final examination. The grade will be based entirely on the exercises and the student presentation of a research paper. You are expected to work in groups of 3 persons and you will be graded as a group. The final grade for each student is his group grade.

Each exercise counts for a number of points, indicated between brackets. The total number of points for the exercises is 68. For each exercise, hand in the code that can be run stand-alone. For the large (10 point) exercises, write a report:

the report should be well structured and clear and no longer than 8 pages. It summarizes the relevant theory and algorithm(s) (brief!). It itemizes a number of research questions that you address It contains a detailed description of the different numerical studies that you have performed with plots and specification of the parameter settings.

Precise, clear description of what you have done, with the used formulas and the argumentation of why you have done it; e.g. what measure of convergence you used, are there alternatives?
Figures with a detailed explanation of what the relevant results are
An analysis of the results (for example: is the result as expected? why?/why not? what does the result mean? etc.
Figures need to have caption explaining in detail what they show
It contains a summary of the main findings and conclusions.

You get points if you notice things: If something is different than you expected, first check if you didnt make an error. If the oddity remains, indicate it and try to explain what and why it happens.

Hand in the report and code digitally as a zipped directory to the TA

All assignments of lectures 1-5 should be handed in before the end of January 2024. The assignments of lecture 6-8 should be handed in before end March 2024.

In addition to the exercises, your group should prepare a 45 minute presentation on a modern topic in machine learning. There are several suggestions for student presentations but your group can also suggest another topic, such as for instance from the JSTAT ML special issues mentioned above. The topic of your group presentation, a list of the articles that you will be presenting, and preferred time slot should be sent to me before the end of the year. The student presentation counts for 20 points.

	Week	Topic	Material	Exercises
1	45	Monte Carlo Methods (1) Sampling means and variance, uniform sampling, sampling from multi variate Gaussian, importance sampling, rejection sampling	MK chapter 29.1-29.5 BRML chapter 27	BRML exercise 27.1. Box-Muller method (3) MK exercise 29.3. Show diffusion scales as sqrt(T) (1) MK exercise 29.13. Importance sampling of one Gaussian with another (only the computer simulations, reproducing figs 29.20) (3)
2	46	Monte Carlo Methods (2) Markov processes, ergodicity, Metropolis Hasting algorithm, Gibbs sampling, Hamilton Monte Carlo	MK chapter 29.6 MK chapter 30.1, 30.3 handouts chapter 1	MK exercise 29.15. Gibbs sampling of posterior over mu, sigma given data (5). Hint: It is recommended to sample beta = 1/sigma^2 rather than sigma^2. But be aware that such a transformation affects the prior that you assume. For instance, if you assume a flat prior over sigma^2, this transforms to a non-flat prior over beta. For this exercise choose the prior over beta as 1/beta. This choice corresponds to a so-called non-informative prior that is flat in the log(sigma) domain. See also slides lecture 3 where we consider the variational approximation for this problem. (5) MCMC exercises (10)
3	47	The Ising model Phase transitions, critical slowing down, frustration, transfer matrix method discrete optimization with simulated annealing.	MK chapter 31 Further reading: HKP appendix A, Sandvik 2018 section 5 SoKal 1999: Critical slowing down Aarts and Korst, Simulated Annealing and Boltzmann Machines 1989	MK exercise 31.1. Relation entropy and free energy (2) Simulated annealing exercise on spin glass (10)
4	48	Deterministic approximate inference for Bayesian posterior Laplace approximation Variational approximation	MK chapter 33.1, 33.4, 33.5 Further reading: Barber Bishop 1998, Ensemble Learning in Bayesian Neural Networks	Consider again the perceptron learning problem of Mackay chapter 39 and 41, for which we computed the posterior by sampling in week 2. This time, compute p(t=1\|x,D,alpha) using the Laplace approximation and reproduce Mackay figure 41.11b.(7)
5	49	Deterministic approximate inference for the Ising model Mean field approximation Linear response correction, TAP SK model Belief propagation	MK 33.2, 33.3, BRML chapter 28.7 Further reading: Kappen, Spanjers, Mean field theory for asymmetric neural networks (1999), Physical Review E, 61:56585663.	MF and BP in Ising model exercise (10)
6	50	Deterministic approximate inference for the Ising model Convergence of BP, Factor graph version of BP, max-product BP, Applications of BP for compressed sensing and clustering	Further reading: Mooij, Kappen. Sufficient conditions for convergence of the sum-product algorithm (2007). IEEE Information Theory, 53:44224437
7	51	The statistical physics approach to machine learning The replica symmetric solution for the SK model The cavity method Analysis of compressed sensing and random satisfiability using replica method and message passing algorithms	Further reading: Sherrington, D. and Kirkpatrick, S. (1975). Solvable model of Spin-Glass. Physical review letters, 35:17921796. Kappen, H.J. An introduction to stochastic neural networks, in: Handbook of biological physics 2001, 517-552. pdf	Reproduce the phase plot of the SK model in the replica symmetric approximation replica_SK.pdf (7)
8	5	The Boltzmann Machine Quantum machine learning, Quantum Boltzmann machine	MK chapter 43 handouts chapter 1-2	Boltzmann Machine Learning (10) Here are the salamander retina data.
9	6	Control Theory Quantum computing		control exercises 1,2a (5)
10	7	Transformers		Modern AI is rapidly changing society. Write a 1-2 page essay on what you think will be the implications of AI for our future. Your essay should express your personal point of view and based on rational arguments. Substantiate your views by reviewing existing literature, where you contrast different views. You may consider the following questions: Do you believe that AI machines will be conscious one day? Give reasons why or why not. What do you think are the implications of future AI machines for our society? Do these implications depend on whether these machines are conscious or not? Suppose that one day a computer company announces that it has produced a conscious machine. Obviously, such a product would have a great impact on society and the statement that the machine is indeed conscious will be questioned by many people. How do you think our society should deal with this? For your inspiration, I wrote a short essay. on this topic. You may agree with me or not, it does not matter. Sometimes, I think that the young generation may have a different opinion on this topic, and I would like to hear your arguments. PS1: This assignment is in principle a group effort, ie. one essay per group. But if you have diverging views, you may also hand in your personal essay. PS2: Please do not send me ChatGPT generated documents, because those I can generate myself. They will be discarted. (15)
11	week 8, 18 february	Student presentations Bas van Heumen, Donald Scheuring and Sjoerd Jansen. Model Comparison
12	week 9, 25 february	Student presentations Kelly Karremans Abhinav Ramireddy Isabel Stein , Neural data analysis and anaesthesia Mirza Redzepagic, Marieke Van Vreeswijk en Vince Elter. Enhancing Variational Autoencoders: Addressing Latent Space Disentanglement, Generalization, and Prior Limitations
13	week 11, 11 march	Student presentations Stian, Simon, Ich'en. Pretraining neural networks Timon Crouzen & Marko Bilobrk. Dimensionality reduction for tensors
13	week 11, 13 march	Student presentations Simon Reichert, Ayse Gul Yildiz, Yuri Weessies. Gaussian Processes Florentin Seifert, Katherina Hemmo, David Ernstberger. Quantum optimal control