Machine Learning - autumn 2018

Course information

The course provides an advanced introduction to machine learning. The course is intended for Master's students in physics as well as AI/computer science students with sufficient mathematical background. For AI/computer science students it is highly recommended to take the course Statistical machine learning prior to this course.
For physics and math students, this course is the follow-up to the bachelor course Inleiding Machine Learning (Introduction to Machine Learning).

Course material: All machine learning material is summarized in these slides (slides.pdf).

  • Perceptrons, multi-layered perceptrons: chapters 5 and 6 from Hertz, Krogh and Palmer, summarized in handouts chapter 3
  • Sparse regression:
  • Bayesian inference:
  • Control lectures:

Format: The course consists of weekly sessions, mainly taught by me. The emphasis is on learning the material through written and computer exercises.

Presentation schedule: Note that the schedule may change during the course. A detailed breakdown of the chapters to be presented will be discussed in class.

Exercises between brackets are important to understand and have solutions in the book; they do not count towards the grade. Extra exercises: 29, 31.
Schedule: each entry below lists the week number, the calendar week, the topic, the MacKay chapter / material, the weekly exercises and the computer exercises. Computer exercises must be handed in before the end of the course.
Week 1 (calendar week 36)
Topic: Supervised learning: perceptrons; gradient methods
Material: handouts chapter 3 (HKP chapters 5 and 6)
Weekly exercises: handouts chapter 3, Ex. 2, 3
Computer exercises:
  • Write a computer program that implements the perceptron learning rule. Take as data p random input vectors of dimension n with binary components. Take as outputs random assignments ±1. Take n=50 and test empirically that for p < 2n the rule converges almost always and for p > 2n it converges almost never. (A minimal sketch is given after this week's entry.)
  • Reconstruct the curve C(p,n) as a function of p for n=50 in the following way: for each p, construct a number of random learning problems and compute the fraction of these problems for which the perceptron learning rule converges. Plot this fraction versus p.
  • Gradient descent exercise
  • program template
  • MNIST data
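A minimal sketch of the perceptron convergence experiment in Python/NumPy; the number of trials per value of p and the cap on the number of learning epochs are my own assumptions, not part of the exercise.

    import numpy as np

    def perceptron_converges(p, n, max_epochs=1000, rng=None):
        """Run the perceptron learning rule on one random problem.
        Returns True if a weight vector classifying all patterns is found."""
        rng = np.random.default_rng() if rng is None else rng
        X = rng.choice([-1.0, 1.0], size=(p, n))   # p random binary input vectors
        y = rng.choice([-1.0, 1.0], size=p)        # random +/-1 output assignments
        w = np.zeros(n)
        for _ in range(max_epochs):
            errors = 0
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:             # pattern misclassified (or on the boundary)
                    w += yi * xi                   # perceptron learning rule
                    errors += 1
            if errors == 0:
                return True                        # converged: all patterns correct
        return False                               # no solution found within the epoch budget

    def convergence_fraction(p, n=50, trials=20):
        """Empirical fraction of random problems for which the rule converges, cf. C(p,n)."""
        return np.mean([perceptron_converges(p, n) for _ in range(trials)])

    if __name__ == "__main__":
        for p in range(10, 151, 10):               # sweep p around the transition at p = 2n = 100
            print(p, convergence_fraction(p))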
Week 2 (calendar week 37)
Topic: Gradient methods; MLPs, deep networks
Computer exercise: Write a multi-layered perceptron learning algorithm to classify the MNIST problem. Consider both the two-class problem of classifying the 3's versus the 7's and the 10-class problem of classifying all digits. Optimize the architecture by varying the number of hidden units and hidden layers. For the two-class problem, compare your results with logistic regression. For the 10-class problem, compare the quality of the solution with results reported in the literature. (A minimal sketch is given below.)
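A minimal one-hidden-layer sketch in Python/NumPy showing the overall structure of such a program. The loader load_mnist() is a placeholder for however you read the MNIST files linked above, and the hidden size, learning rate, batch size and number of epochs are arbitrary assumptions.

    import numpy as np

    def load_mnist():
        """Placeholder: return (X_train, y_train, X_test, y_test), with X of shape
        (num_samples, 784), values in [0, 1], and integer labels 0..9."""
        raise NotImplementedError

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def train_mlp(X, y, n_hidden=100, lr=0.1, epochs=20, batch=100, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        k = int(y.max()) + 1
        Y = np.eye(k)[y]                                  # one-hot targets
        W1 = rng.normal(0, 1/np.sqrt(d), (d, n_hidden))   # input -> hidden weights
        b1 = np.zeros(n_hidden)
        W2 = rng.normal(0, 1/np.sqrt(n_hidden), (n_hidden, k))
        b2 = np.zeros(k)
        for _ in range(epochs):
            perm = rng.permutation(n)
            for i in range(0, n, batch):
                idx = perm[i:i+batch]
                Xb, Yb = X[idx], Y[idx]
                H = np.tanh(Xb @ W1 + b1)                 # hidden activations
                P = softmax(H @ W2 + b2)                  # class probabilities
                dZ2 = (P - Yb) / len(idx)                 # cross-entropy gradient wrt output logits
                dW2, db2 = H.T @ dZ2, dZ2.sum(0)
                dH = dZ2 @ W2.T * (1 - H**2)              # backpropagation through tanh
                dW1, db1 = Xb.T @ dH, dH.sum(0)
                W1 -= lr * dW1; b1 -= lr * db1
                W2 -= lr * dW2; b2 -= lr * db2
        return W1, b1, W2, b2

    def predict(params, X):
        W1, b1, W2, b2 = params
        return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

Adding a second hidden layer or changing n_hidden changes only the weight initialization and the forward/backward pass, which is where the architecture comparison asked for above comes in.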
Week 3 (calendar week 38)
Topic: Sparse regression
Sparse regression computer exercise:
  • Write your own Lasso method using coordinate descent (a minimal sketch is given after this list).
  • Test your algorithm on data set 1 (lasso data). Reproduce a figure similar to slide 51. Find the optimal value of gamma by cross-validation. Compare the Lasso result with ridge regression (with the ridge regression parameter also optimized by cross-validation).
  • Consider the example of correlated inputs on slide 62. Reproduce these results with your software using data generated by correlated_data.m. Compute the input-output correlations b_i and use these to explain the observed phenomenon. Write a brief report on your findings and include your source code.
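A minimal sketch of Lasso coordinate descent in Python/NumPy, assuming the objective 0.5*||y - Xw||^2 + gamma*||w||_1 and columns of X that are not identically zero; the iteration cap and tolerance are my own choices. Cross-validation over gamma can be layered on top by splitting the data and comparing held-out squared error.

    import numpy as np

    def soft_threshold(rho, gam):
        """Soft-thresholding operator used in the Lasso coordinate update."""
        return np.sign(rho) * max(abs(rho) - gam, 0.0)

    def lasso_cd(X, y, gam, n_iter=200, tol=1e-6):
        """Coordinate descent for min_w 0.5*||y - Xw||^2 + gam*||w||_1."""
        n, d = X.shape
        w = np.zeros(d)
        col_sq = (X**2).sum(axis=0)              # precompute x_j^T x_j for each column
        r = y - X @ w                            # current residual
        for _ in range(n_iter):
            w_old = w.copy()
            for j in range(d):
                r += X[:, j] * w[j]              # remove coordinate j's contribution
                rho = X[:, j] @ r
                w[j] = soft_threshold(rho, gam) / col_sq[j]
                r -= X[:, j] * w[j]              # put it back with the updated value
            if np.max(np.abs(w - w_old)) < tol:
                break                            # coordinate updates have converged
        return w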
Week 4 (calendar week 39)
Topic: Probability, entropy and inference
Chapter: 2
Exercises 2.4, 2.6 (+ continuation), 2.7, 2.8 to be discussed in class.
Exercises: 2.10, 2.14, 2.16ab, 2.18, 2.19, 2.26
Week 5 (calendar week 40)
Topic: More about inference; model comparison and Occam's razor
Chapters: 3 and 28
Exercises 3.3, 3.4, 3.15 to be discussed in class.
Exercises: 3.1, 3.2, 3.5, 3.6, (3.7 if you like), 3.8, 3.9
Exercises: 28.1, 28.2 (only for model H_1), (28.3 if you like)
Week 6 (calendar week 41)
Topic: Monte Carlo methods (1)
Chapters: 29.1-29.5
Weekly exercise: 29.3
Computer exercise: the computer simulations of 29.13, reproducing figs 29.20, 29.15. (A generic Metropolis sketch is given below.)
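As a generic illustration of the Metropolis method from these chapters (not the specific target distribution of exercise 29.13), a minimal random-walk Metropolis sampler in Python/NumPy with an assumed Gaussian example target and an arbitrary proposal width.

    import numpy as np

    def metropolis(log_p, x0, n_samples=10000, step=1.0, seed=0):
        """Random-walk Metropolis: propose x' = x + eps, accept with prob min(1, p(x')/p(x))."""
        rng = np.random.default_rng(seed)
        x = x0
        samples = np.empty(n_samples)
        for t in range(n_samples):
            x_new = x + step * rng.normal()
            if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
                x = x_new                         # accept the proposal
            samples[t] = x                        # otherwise keep the current state
        return samples

    # Example: sample from a standard Gaussian and check the mean and variance.
    if __name__ == "__main__":
        s = metropolis(lambda x: -0.5 * x**2, x0=0.0)
        print(s.mean(), s.var())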
Week 7 (calendar week 42)
Topics: Markov processes, ergodicity; Monte Carlo methods (2), HMC; MCMC for the perceptron posterior
Chapters: 29.6, 30.1, 30.3
Exercises: 38, 39, 41
Computer exercise: An example of Bayesian inference in perceptron learning using MCMC methods. The files (Matlab files and instructions) needed to do this exercise can be found here: [mcmc_mackay.tar.gz]. Exercise to compare simulated annealing with iterative improvement on the Ising model; see simulated_annealing.zip. (A simulated annealing sketch is given below.)
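The Matlab files above are the intended starting point. Purely to illustrate the idea, here is a minimal Python/NumPy sketch of simulated annealing on a small 2D Ising model; the lattice size, cooling schedule and temperatures are arbitrary assumptions. Iterative improvement corresponds to running the same loop at zero temperature, i.e. accepting only moves with dE <= 0.

    import numpy as np

    def ising_energy_change(s, i, j):
        """Energy change of flipping spin (i, j) on a periodic L x L lattice with J = 1."""
        L = s.shape[0]
        nb = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
        return 2.0 * s[i, j] * nb

    def simulated_annealing(L=20, n_sweeps=2000, T0=3.0, T1=0.01, seed=0):
        rng = np.random.default_rng(seed)
        s = rng.choice([-1, 1], size=(L, L))          # random initial spin configuration
        temps = np.geomspace(T0, T1, n_sweeps)        # exponential cooling schedule
        for T in temps:
            for _ in range(L * L):                    # one sweep = L*L attempted flips
                i, j = rng.integers(L), rng.integers(L)
                dE = ising_energy_change(s, i, j)
                if dE <= 0 or rng.uniform() < np.exp(-dE / T):
                    s[i, j] *= -1                     # Metropolis acceptance rule
        return s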
Calendar week 43: no class
Calendar week 44: no class
Week 8 (calendar week 45)
Topic: Variational inference, Ising model, Boltzmann machines
Material: chapters 33 and 43; handouts chapters 1-2
Weekly exercises: handouts chapter 2, exercises 1a, 2, 3
Computer exercise: Write a computer program to implement the Boltzmann machine learning rule as given on pg. 21 of chapter 2. Use N=10 neurons and generate random binary patterns. Use these data to compute the clamped statistics (x_i x_j)_c and (x_i)_c. Use K=200 learning steps. In each learning step use T=500 steps of sequential stochastic dynamics to compute the free statistics (x_i x_j) and (x_i). Test the convergence by plotting the size of the change in weights versus iteration. (A minimal sketch of the sampling-based rule is given below.) A much more efficient learning method can be obtained by using mean field theory and the linear response correction. Build a classifier for the MNIST data based on the Boltzmann machine as described in 2.5.1.
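A minimal Python/NumPy sketch of the sampling-based learning rule described above, using +/-1 units, sequential Glauber dynamics for the free statistics, and the settings K=200 and T=500; the learning rate and the treatment of the thresholds are my own assumptions and should be checked against the handouts.

    import numpy as np

    def boltzmann_learning(patterns, K=200, T=500, eta=0.05, seed=0):
        """Sampling-based Boltzmann machine learning on +/-1 patterns (no hidden units).
        Weight gradient: <x_i x_j>_clamped - <x_i x_j>_free, and similarly for the thresholds."""
        rng = np.random.default_rng(seed)
        P, N = patterns.shape
        C_c = patterns.T @ patterns / P                    # clamped statistics <x_i x_j>_c
        m_c = patterns.mean(axis=0)                        # clamped statistics <x_i>_c
        w = np.zeros((N, N))
        theta = np.zeros(N)
        x = rng.choice([-1.0, 1.0], size=N)                # state of the free-running chain
        history = []
        for _ in range(K):
            C_f = np.zeros((N, N)); m_f = np.zeros(N)
            for _ in range(T):                             # T sweeps of sequential stochastic dynamics
                for i in range(N):
                    h = w[i] @ x - w[i, i] * x[i] + theta[i]
                    p_up = 1.0 / (1.0 + np.exp(-2.0 * h))  # P(x_i = +1 | rest) for +/-1 units
                    x[i] = 1.0 if rng.uniform() < p_up else -1.0
                C_f += np.outer(x, x); m_f += x
            C_f /= T; m_f /= T                             # free statistics <x_i x_j>, <x_i>
            dw = eta * (C_c - C_f); np.fill_diagonal(dw, 0.0)
            w += dw
            theta += eta * (m_c - m_f)
            history.append(np.abs(dw).max())               # size of the weight change per iteration
        return w, theta, history

Plotting the returned history against the iteration index gives the convergence check asked for above.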
Calendar week 46: no class
Week 9 (calendar week 47)
Topics: Variational inference for the Bayesian posterior; clustering, Gaussian mixtures and variational EM
Chapters: 21.2, 22.1, 23.3, 33.4-5; 20, 22.2, 22.3, 33.7
Computer exercises:
  • Write a computer algorithm that reproduces fig. 33.4.
  • Generalize the EM algorithm for the Gaussian mixture problem so that in each iteration the parameters p_k are also adapted (see slide 238, EM accompanying Ch. 33.7). (A minimal sketch is given after this list.)
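A minimal EM sketch in Python/NumPy for a mixture of spherical Gaussians in which the mixing proportions p_k are re-estimated in each iteration together with the means and variances; the spherical-covariance assumption and the initialization are simplifications of my own, so check the required model against slide 238 and Ch. 33.7.

    import numpy as np

    def em_gmm(X, k, n_iter=100, seed=0):
        """EM for a mixture of spherical Gaussians; the mixing proportions p_k are updated too."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        mu = X[rng.choice(n, k, replace=False)]           # initialize means at random data points
        var = np.full(k, X.var())                         # one variance per component
        p = np.full(k, 1.0 / k)                           # mixing proportions p_k
        for _ in range(n_iter):
            # E-step: responsibilities r[n, k] proportional to p_k * N(x_n | mu_k, var_k I)
            d2 = ((X[:, None, :] - mu[None, :, :])**2).sum(-1)
            log_r = np.log(p) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
            log_r -= log_r.max(axis=1, keepdims=True)
            r = np.exp(log_r)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: re-estimate p_k, means and variances from the responsibilities
            Nk = r.sum(axis=0)
            p = Nk / n                                    # the p_k update asked for in the exercise
            mu = (r.T @ X) / Nk[:, None]
            d2 = ((X[:, None, :] - mu[None, :, :])**2).sum(-1)
            var = (r * d2).sum(axis=0) / (d * Nk)
        return p, mu, var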
Week 10 (calendar week 48)
Topic: Variational Garrote
Week 11 (calendar week 49)
Topics: Discrete time control; dynamic programming; Bellman equation
Material: Bertsekas 2-5, 13-14, 18, 21-32 (2nd ed.); Bertsekas 2-5, 10-12, 16-27, 30-32 (1st ed.); Kappen ICML tutorial 1.2
Exercise: Carry out the calculations needed to verify that J0(1)=2.7 and J0(2)=2.818 in Bertsekas Example 3.2 on pg. 23 in Copies 1b. (A generic dynamic programming sketch is given below.)
Extra exercises: 1, 2a,b
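The verification asked for above is a backward dynamic programming recursion, J_k(x) = min_u [ g(x,u) + sum_{x'} P(x'|x,u) J_{k+1}(x') ]. Below is a generic finite-horizon sketch in Python/NumPy; the toy transition and cost arrays are invented for illustration only and are not the numbers of Bertsekas Example 3.2.

    import numpy as np

    def backward_dp(P, g, g_T, horizon):
        """Finite-horizon dynamic programming.
        P[u, x, x'] : transition probabilities under control u
        g[x, u]     : stage cost, g_T[x] : terminal cost.
        Returns the cost-to-go J[k, x] and an optimal policy pi[k, x]."""
        n_u, n_x, _ = P.shape
        J = np.zeros((horizon + 1, n_x))
        pi = np.zeros((horizon, n_x), dtype=int)
        J[horizon] = g_T
        for k in range(horizon - 1, -1, -1):              # backward in time (Bellman recursion)
            Q = g.T + P @ J[k + 1]                        # Q[u, x] = g(x,u) + E[J_{k+1}(x')]
            J[k] = Q.min(axis=0)
            pi[k] = Q.argmin(axis=0)
        return J, pi

    # Toy 2-state, 2-control example (numbers are arbitrary, for illustration only).
    if __name__ == "__main__":
        P = np.array([[[0.9, 0.1], [0.4, 0.6]],
                      [[0.3, 0.7], [0.8, 0.2]]])
        g = np.array([[1.0, 2.0], [0.5, 3.0]])            # g[x, u]
        J, pi = backward_dp(P, g, g_T=np.zeros(2), horizon=3)
        print(J[0], pi[0])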
Week 12 (calendar week 50)
Topics: Continuous time control; Hamilton-Jacobi-Bellman equation; Pontryagin minimum principle; stochastic differential equations; stochastic optimal control; LQ examples, portfolio management
Material: Kappen ICML tutorial 1.3, 1.4
Extra exercises: 2c, 3
Week 13 (calendar week 51)
Topic: Path integral control theory
Material: Kappen ICML tutorial 1.6; Thijssen and Kappen; Kappen and Ruiz
Extra exercises: 4 and 5
Week 14 (calendar week 2)
Topic: Overview of research at SNN Machine Learning
Week 15 (calendar week 3)
Topic: Presentation of the computer exercises
  • Ising model:
  • Boltzmann Machines:
  • MLP:
  • sparse regression
  • control theory


    Examination:
There will be no final examination. The grade will be based on the take-home computer exercises.

During one of the last lectures you will present your solution to one of these exercises. Hand in code that can be run stand-alone. In addition, write a report for each exercise; see the handleiding verslag (report guidelines, in Dutch). Everything should be handed in before the end of January 2019.