Supervision: Konstantinos Pitas

Project type: Semester project (master) / Master thesis

Available

Flat minima are claimed to be linked with good generalization in deep neural networks [1]. However, such links are so far motivated only by simple empirical correlations. A main obstacle to more formal results is accurately estimating the curvature of the loss function of a DNN at a given point. The Hessian of the loss with respect to the DNN parameters is a d x d matrix, where d is the number of parameters. This matrix is therefore huge, and various approximations are used to compute and store it [2][3].
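In practice, curvature can still be probed without ever materializing the d x d Hessian, through Hessian-vector products (Pearlmutter's trick), which cost only two backward passes. Below is a minimal sketch, assuming PyTorch is the framework used; the function and variable names are illustrative only, not part of the project.

import torch

def hessian_vector_product(loss, params, vec):
    # Returns H @ vec, where H is the Hessian of `loss` w.r.t. `params`,
    # computed via double backward without forming H explicitly.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    grad_dot_vec = torch.dot(flat_grad, vec)
    hvp = torch.autograd.grad(grad_dot_vec, params)
    return torch.cat([h.reshape(-1) for h in hvp])

# Tiny example: a 3-parameter quadratic loss with a known Hessian.
w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum() + w[0] * w[1]
v = torch.randn(3)
print(hessian_vector_product(loss, [w], v))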

In this project the student will empirically test, and explore ways to accurately characterize, the quality of various approximations to the Hessian matrix on small- and moderate-scale deep learning problems.
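As one illustrative way such quality could be quantified (an assumption for the sake of example, not a prescribed metric): on a problem small enough for the exact Hessian to fit in memory, a simple diagonal approximation can be scored by its relative Frobenius error. The data, model, and metric below are placeholders.

import torch

torch.manual_seed(0)
X, y = torch.randn(32, 5), torch.randn(32)

def loss_fn(w):
    # Mean-squared-error loss of a tiny linear model.
    return ((X @ w - y) ** 2).mean()

w = torch.randn(5)
H_exact = torch.autograd.functional.hessian(loss_fn, w)  # full 5 x 5 Hessian
H_diag = torch.diag(torch.diagonal(H_exact))             # diagonal approximation
rel_err = torch.linalg.norm(H_exact - H_diag) / torch.linalg.norm(H_exact)
print(f"relative Frobenius error of the diagonal approximation: {rel_err:.3f}")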

The student must be highly motivated and independent, with good knowledge of Python and of TensorFlow/Keras and/or PyTorch.

The project consists of 20% theory and 80% application, but it has links to a number of interesting problems in deep neural network theory.

[1] Keskar, Nitish Shirish, et al. "On large-batch training for deep learning: Generalization gap and sharp minima." arXiv preprint arXiv:1609.04836 (2016).

[2] Martens, James, and Roger Grosse. "Optimizing neural networks with Kronecker-factored approximate curvature." International Conference on Machine Learning. 2015.

[3] Mishkin, Aaron, et al. "SLANG: Fast structured covariance approximations for Bayesian deep learning with natural gradient." Advances in Neural Information Processing Systems. 2018.