Statistical Learning Theory and Sequential Prediction

**Prerequisites:** Probability Theory and Linear Algebra.

The "Algorithms" section will be overhauled this semester.

- Introduction. Overview of Problems in Learning, Estimation, Optimization
- Minimax Formulation
- Background Material: Stochastic Processes, Empirical Processes, Concentration and Deviation Inequalities
- Statistical Learning
  - Empirical Risk Minimization, Uniform Glivenko-Cantelli classes, Vapnik-Chervonenkis Dimension, Growth Function
  - Finite Class Lemma, Covering and Packing Numbers, Pollard's Bound
  - Chaining for Subgaussian Processes, Symmetrization, Rademacher Averages, Dudley's Bound
  - Combinatorial Dimensions, Vapnik-Chervonenkis-Sauer-Shelah Lemma, Lower Bounds
- Sequential Prediction and Decision Making
  - Prediction with Expert Advice, Exponential Weights Algorithm, Proof of von Neumann's Minimax Theorem
  - Sequential Minimax Theorem, Dual Representation of the Value, Martingale uGC, Finite Class Lemma
  - Symmetrization, Sequential Rademacher, Majorization for Martingales
  - Sequential Covering Numbers, Chaining, Dudley-type Bound
  - Combinatorial Dimensions, Analogue of V-C-S-S Lemma, Learnability in Supervised Setting
  - Algorithms for Non-Convex Problems: Halving for Finite Classes, SOA
  - Algorithms for Convex Problems: Mirror Descent, Follow the Leader, Follow the Regularized Leader
- From Sequential to Statistical Learning: Relationship Between the Minimax Values
- Optimality of Mirror Descent. Type and M-Type of a Banach Space
- Model Selection and Oracle Inequalities in Statistics and Online Learning
- Logarithmic Loss: Stochastic and Deterministic Settings, Redundancy-Capacity Theorem
- Decision Theory for Individual Sequences: Beyond Regret
- Blackwell's Approachability: Two Proofs (Geometric and Minimax)
- Prequential Statistics, Calibration of Forecasters, Testing
- Algorithmic Stability
- Aggregation of Estimators

#### Articles:

- Gábor Lugosi. Concentration-of-Measure Inequalities.

- Olivier Bousquet, Stéphane Boucheron, Gábor Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, 2003, pp. 169-207.
- Stéphane Boucheron, Olivier Bousquet, Gábor Lugosi. Theory of Classification: A Survey of Some Recent Advances. ESAIM: Probability and Statistics, 2005.

- Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 2012.

- Lecture Notes on Online Convex Optimization. UC Berkeley, 2008.

#### Books (not required):

- van der Vaart and Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics.
- Sara van de Geer. Empirical Processes in M-Estimation.
- N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games.
- Hiriart-Urruty and Lemaréchal. Fundamentals of Convex Analysis.