Probability Inequalities
and Machine Learning

Final Projects: Due Wednesday, December 19 (Noon). You can send me your project reports by email.

End-of-Term Guest Lectures (in reverse chronological order)

Ricky Der "Large Margin Classifiers in Function Spaces" (link to Der and Lee "Large Margin Classifiers in Banach Spaces")

Abhishek Gupta on "Unsupervised distance metric learning using predictability". Link to Abhishek's paper with Dean Foster and Lyle Ungar.

Alex Braunstein on "Boosting and Algorithm Tests for Guessing the Sign of Daily Stock Returns" (NB: Alex gets 60-40 bets using logistic regression bagged 20 or so times --- sweet!).

Mikhail Traskin on "Random Forests in Machine Learning." Mikhail has prepared a wonderful web page with his presentation and references to the classics --- e.g., Breiman's originals.

Blake McShane "STILL MORE evidence contrary to the statistical view of boosting"

Ed George: "George On BART"

Ed spoke about his paper with Chipman and McCulloch, "Bayesian Additive Regression Trees" (BART).

Note: BART is the function "bart" in the R package BayesTree, which runs under R. You can get it at http://www.r-project.org/ (click on "CRAN", pick a mirror, and then click on "packages").

Exploration of BART in some application of interest to you would certainly make a nice final project!
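If you want a quick start, here is a minimal R sketch on simulated data. It is only an illustration of the kind of call involved; the argument and component names shown are my best recollection of the BayesTree interface, so check ?bart before relying on them.

    # Minimal BART sketch on simulated data; see ?bart for the exact interface.
    # install.packages("BayesTree")    # fetch the package from CRAN first
    library(BayesTree)

    set.seed(1)
    n <- 200
    x <- matrix(runif(2 * n), n, 2)                    # two predictors
    y <- sin(2 * pi * x[, 1]) + x[, 2] + rnorm(n, 0, 0.2)

    fit <- bart(x.train = x, y.train = y, x.test = x)  # posterior draws of f at the test points
    plot(y, colMeans(fit$yhat.test),                   # posterior mean fit vs. observed response
         xlab = "observed y", ylab = "BART fitted value")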

More about BART including family bios.

ML and Financial Applications

Over my next session or two I will go over some results and observations about the use of machine learning ideas in the design of investment strategies. Some of this is "known" and some is work in progress, but it should be amusing.

I'll build a resource page and keep adding items as the plan progresses.

Flash News: Job Listing and Interview this Friday (10/26)

I received mail from a former student, Shobhit Verma, who is now working for the CCG Strategies Group, the commercial quantitative group within Constellation Energy, the largest wholesale power trader in North America and one of the largest traders of natural gas. Candidates are requested to email their CVs to shobhit.verma@constellation.com as soon as possible to be considered for an interview this Friday, October 26.

Current Topic: Combining Experts, Beating Bandits

Webber on the Multi-Armed Bandit Problem (to be distributed in class)

Cesa-Bianchi and Lugosi
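To fix notation before the readings, here is a minimal R sketch of the exponentially weighted average (Hedge) forecaster, the starting point of the experts framework. The function name, the loss-matrix layout, and the tuning of eta are my own illustration under the standard assumption of losses in [0,1], not a transcription of the references above.

    # Exponentially weighted average forecaster over N experts with losses in [0,1].
    ew_forecaster <- function(loss, eta = 1) {
      # loss: matrix with one row per round and one column per expert
      N <- ncol(loss)
      n.rounds <- nrow(loss)
      w <- rep(1, N)                           # current expert weights
      total <- 0                               # cumulative loss of the forecaster
      for (t in 1:n.rounds) {
        p <- w / sum(w)                        # normalized weights = mixture over experts
        total <- total + sum(p * loss[t, ])    # forecaster suffers the mixture loss
        w <- w * exp(-eta * loss[t, ])         # downweight experts that did badly
      }
      c(forecaster = total, best.expert = min(colSums(loss)))
    }

    # Usage on random losses, with the usual tuning eta = sqrt(8 log(N) / n.rounds):
    loss <- matrix(runif(100 * 5), 100, 5)
    ew_forecaster(loss, eta = sqrt(8 * log(5) / 100))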

On Deck:

News Items:

Tentative List of Topics

Basic Resources

We'll use a few pieces of my SIAM monograph Probability Theory and Combinatorial Optimization, but you won't have to buy a copy. I'll distribute appropriate versions of the relevant chapters.

One resource that we will use almost in its entirety is Gabor Lugosi's ANU Lecture Notes.

Peter Bartlett's Berkeley CS 281B has a nice set of lecture notes. We will use at least some of these, and you might well want to look at them all. These notes are a nice example of what can be achieved with the "scribe" system that is so common in CS and EE. For some strange reason, this is hardly ever used in statistics. The notes from Bartlett's course that are most useful for us are those from Lectures 12 through 27.

We will also use several parts of Pascal Massart's Monograph Concentration Inequalities and Model Selection, which is beautiful, 370 pages long, and free for the downloading.

We'll also use parts of the wonderful book Convex Optimization by Stephen Boyd and Lieven Vandenberghe. You can also access the whole book online --- what good guys!

We won't be able to cover all of the details, but one of the motivations for what we will cover is to make accessible the paper AdaBoost is Consistent by Peter Bartlett and Mikhail Traskin.

Finally, in the context of the "hold-out" techniques of probability inequalities, we will go over the nice paper An Error Bound in the Sudakov-Fernique Inequality by Sourav Chatterjee.

Other Relevant Texts

General Considerations: This Is (Mostly) a Theory Course

This course is about the core theoretical tools and constructs of what is now understood to be the theory of machine learning. We will spend almost all of our time proving theorems.

Empirical testing is substantially left to other courses, but we will not completely ignore computational experience. Some exposure to the statistical language R may be helpful, but it is quickly learned by those with backgrounds in CS and engineering. Also, if you know Mathematica or Matlab, you can do your computational work in those environments.

Here we focus mostly on what one can prove. We will use combinatorics, linear algebra, and probability theory as our main tools. Our problems will come from machine learning, but our ideas, methods, and energy will come from mathematics. Still, if you want to become a grandmaster of machine learning, you will need to learn this stuff eventually --- even if in your heart-of-hearts you are a die-hard empiricist.

But if you really like theory --- computational, probabilistic, combinatorial, or statistical --- then this course is a natural for you.

The course is brand new, so of necessity it is experimental. Still, there is a rich body of knowledge that fits under our umbrella. We are at liberty to pick and choose the results that we find to be the most deeply beautiful and the most consistently useful.

I hope this course will be useful for a reasonably big slice of the UPenn statistics and machine learning community. All the work is motivated by machine learning and contributes to the theory of machine learning, but participants should understand up-front that this is mainly a probability course.

At least half of the course will be devoted to CONCENTRATION INEQUALITIES. These are relatively new inequalities --- most less than ten years old --- yet they have had pervasive impact on machine learning, combinatorics, and combinatorial optimization. If you are interested in machine learning, then sooner or later you will be very happy to have a serious understanding of concentration inequalities.
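To give a flavor of what such an inequality looks like, here is one standard example, the bounded differences (McDiarmid) inequality, stated here from memory rather than quoted from the course notes. If $X_1, \dots, X_n$ are independent and $f$ changes by at most $c_i$ when only its $i$-th argument is changed, then

\[
P\big( f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \ge t \big) \;\le\; \exp\!\Big( \frac{-2t^2}{\sum_{i=1}^n c_i^2} \Big).
\]

Bounds of this kind, which hold for remarkably general functions of independent variables, are exactly what one needs to control empirical risks and generalization errors.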

The course assumes that participants have an active familiarity with probability theory at the level of Stat 530 (or at least OPIM 930). The course will not make sense to you if you do not have an appropriate background in probability.

Deliverables?

I'm flexible. Let's see how many people show up and see what their backgrounds are. In theory, I would like the course to be accessible to people with a reasonably diverse set of backgrounds, so I can't be absolutely hard-nosed about homework, midterms, etc.

Still, I am not yet of the school where it is enough just to get together and sing camp songs. Here, to close the loop, you will have to show that you have honestly mastered a serious chunk of the material of the course.

Lame Course Web Page?

Hey, it's functional, and it's temporary. In due course this page may be replaced by a decent two-column CSS design, but it's not clear that the "more esthetic" display is worth even the modest increment to the maintenance overhead. We'll see.
