Statistical Learning Theory and Sequential Prediction
Time & Location: MW 3:00 - 4:30, Room G90 JMHH
Instructors: Alexander Rakhlin, Karthik Sridharan
This course will focus on theoretical aspects of Statistical Learning and Sequential Prediction. In the first part of the course, we will analyze learning with i.i.d. data using classical tools: concentration inequalities, random averages, covering numbers, and combinatorial parameters (VC dimension and the scale-sensitive dimension). We then turn to prediction of individual sequences and develop sequential analogues of many of these tools for learning in this setting. The latter part is based on recent research and offers many directions for further investigation. The minimax approach, which we emphasize throughout the course, offers a systematic way of comparing learning problems. Beyond the theoretical analysis, we will discuss learning algorithms and, in particular, an important connection between learning and optimization. Time permitting, we will make excursions into Information Theory and Game Theory, and show how our new tools yield a number of interesting results in these areas.
Prerequisites: Probability Theory and Linear Algebra.
These lecture notes are constantly evolving, so if your version says x<y today, it might say x>y tomorrow.
- Introduction. Overview of Problems in Learning, Estimation, Optimization
- Minimax Formulation
- Background Material: Stochastic Processes, Empirical Processes, Concentration and Deviation Inequalities
- Statistical Learning
- Empirical Risk Minimization, Uniform Glivenko-Cantelli Classes, Vapnik-Chervonenkis Dimension, Growth Function
- Finite Class Lemma, Covering and Packing Numbers, Pollard's Bound
- Chaining for Subgaussian Processes, Symmetrization, Rademacher Averages, Dudley's Bound
- Combinatorial Dimensions, Vapnik-Chervonenkis-Sauer-Shelah Lemma, Lower Bounds
- Sequential Prediction and Decision Making
- Prediction with Expert Advice, Exponential Weights Algorithm, Proof of von Neumann's Minimax Theorem
- Sequential Minimax Theorem, Dual Representation of the Value, Martingale uGC, Finite Class Lemma
- Symmetrization, Sequential Rademacher, Majorization for Martingales
- Sequential Covering Numbers, Chaining, Dudley-type Bound
- Combinatorial Dimensions, Analogue of V-C-S-S Lemma, Learnability in Supervised Setting
- Algorithms for Non-Convex Problems: Halving for Finite Classes, SOA
- Algorithms for Convex Problems: Mirror Descent, Follow the Leader, Follow the Regularized Leader
- From Sequential to Statistical Learning: Relationship Between the Minimax Values
- Optimality of Mirror Descent. Type and M-Type of a Banach Space
- Model Selection and Oracle Inequalities in Statistics and Online Learning
- Logarithmic Loss: Stochastic and Deterministic Settings, Redundancy-Capacity Theorem
- Decision Theory for Individual Sequences: Beyond Regret
- Blackwell's Approachability: Two Proofs (Geometric and Minimax)
- Prequential Statistics, Calibration of Forecasters, Testing
- Algorithmic Stability
- Aggregation of Estimators
Books (not required):
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics.
S. van de Geer. Empirical Processes in M-Estimation.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games.
J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis.
Other relevant courses:
Peter Bartlett at UC Berkeley (also the 2006 version)
Sham Kakade and Ambuj Tewari at TTI, UChicago
Maxim Raginsky at Duke
Shai Shalev-Shwartz at Hebrew U.
Dmitry Panchenko at MIT