Probability Inequalities
and Machine Learning

Final Projects: Due Wednesday, December 19 (Noon). You can send me your project reports by email.

End-of-Term Guest Lectures (in reverse chronological order)

Ricky Der "Large Margin Classifiers in Function Spaces" (link to Der and Lee "Large Margin Classifiers in Banach Spaces")

Abhishek Gupta on "Unsupervised distance metric learning using predictability". Link to Abhishek's paper with Dean Foster and Lyle Ungar.

Alex Braunstein on "Boosting and Algorithm Tests for Guessing the Sign of Daily Stock Returns" (NB: Alex gets 60-40 bets using logistic regression bagged 20 or so times --- sweet!).

Mikhail Traskin on "Random Forests in Machine Learning." Mikhail has prepared a wonderful web page with his presentation and references to the classics --- e.g., Breiman's originals.

Blake McShane "STILL MORE evidence contrary to the statistical view of boosting"

Ed George: "George On BART"

Ed spoke about his paper with Chipman and McCulloch, "Bayesian Additive Regression Trees" (BART).

Note: BART is the function "bart" in the R package BayesTree, which runs under R. You can get it at http://www.r-project.org/ (click on "CRAN", pick a mirror, and then click on "packages").

Exploration of BART in some application of interest to you would certainly make a nice final project!
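If you want a quick start, here is a minimal R sketch on simulated data. It is only an illustration of the kind of call involved; the argument and component names shown are my best recollection of the BayesTree interface, so check ?bart before relying on them.

    # Minimal BART sketch on simulated data; see ?bart for the exact interface.
    # install.packages("BayesTree")    # fetch the package from CRAN first
    library(BayesTree)

    set.seed(1)
    n <- 200
    x <- matrix(runif(2 * n), n, 2)                    # two predictors
    y <- sin(2 * pi * x[, 1]) + x[, 2] + rnorm(n, 0, 0.2)

    fit <- bart(x.train = x, y.train = y, x.test = x)  # posterior draws of f at the test points
    plot(y, colMeans(fit$yhat.test),                   # posterior mean fit vs. observed response
         xlab = "observed y", ylab = "BART fitted value")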

More about BART including family bios.

ML and Financial Applications

Over my next session or two I will go over some results and observations about the use of machine learning ideas in the design of investment strategies. Some of this is "known" and some is work in progress, but it should be amusing.

I'll build a resource page and keep adding items as the plan progresses.

Flash News: Job Listing and Interview this Friday (10/26)

I received mail from a former student, Shobhit Verma, who is now working for the CCG Strategies Group, the commercial quantitative group within Constellation Energy, the largest wholesale power trader in North America and one of the largest traders of natural gas. Candidates are requested to email their CVs to shobhit.verma@constellation.com as soon as possible to be considered for an interview this Friday, October 26.

Current Topic: Combining Experts, Beating Bandits

Webber on the Multi-Armed Bandit Problem (to be distributed in class)

Cesa-Bianchi and Lugosi
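To fix notation before the readings, here is a minimal R sketch of the exponentially weighted average (Hedge) forecaster, the starting point of the experts framework. The function name, the loss-matrix layout, and the tuning of eta are my own illustration under the standard assumption of losses in [0,1], not a transcription of the references above.

    # Exponentially weighted average forecaster over N experts with losses in [0,1].
    ew_forecaster <- function(loss, eta = 1) {
      # loss: matrix with one row per round and one column per expert
      N <- ncol(loss)
      n.rounds <- nrow(loss)
      w <- rep(1, N)                           # current expert weights
      total <- 0                               # cumulative loss of the forecaster
      for (t in 1:n.rounds) {
        p <- w / sum(w)                        # normalized weights = mixture over experts
        total <- total + sum(p * loss[t, ])    # forecaster suffers the mixture loss
        w <- w * exp(-eta * loss[t, ])         # downweight experts that did badly
      }
      c(forecaster = total, best.expert = min(colSums(loss)))
    }

    # Usage on random losses, with the usual tuning eta = sqrt(8 log(N) / n.rounds):
    loss <- matrix(runif(100 * 5), 100, 5)
    ew_forecaster(loss, eta = sqrt(8 * log(5) / 100))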

On Deck:

News Items:

Tentative List of Topics

Basic Resources

We'll use a few pieces of my SIAM monograph Probability Theory and Combinatorial Optimization, but you won't have to buy a copy. I'll distribute appropriate versions of the relevant chapters.

One resource that we will use almost in its entirety is Gabor Lugosi's ANU Lecture Notes.

Peter Bartlett's Berkeley CS 281B has a nice set of lecture notes. We will use at least some of these, and you might well want to look at them all. These notes are a nice example of what can be achieved with the "scribe" system that is so common in CS and EE. For some strange reason, this is hardly ever used in statistics. The notes from Bartlett's course that are most useful for us are those from Lectures 12 through 27.

We will also use several parts of Pascal Massart's Monograph Concentration Inequalities and Model Selection, which is beautiful, 370 pages long, and free for the downloading.

We'll also use parts of the wonderful book Convex Optimization by Stephen Boyd and Lieven Vandenberghe. You can also access the whole book online --- what good guys!

We won't be able to cover all of the details, but one of the motivations for what we will cover is to make accessible the paper AdaBoost is Consistent by Peter Bartlett and Mikhail Traskin.

Finally, in the context of the "hold-out" techniques of probability inequalities, we will go over the nice paper An Error Bound in the Sudakov-Fernique Inequality by Sourav Chatterjee.

Other Relevant Texts

General Considerations: This Is (Mostly) a Theory Course

This course is about the core theoretical tools and constructs of what is now understood to be the theory of machine learning. We will spend almost all of our time proving theorems.

Empirical testing is substantially left to other courses, but we will not completely ignore computational experience. Some exposure to the statistical language R may be helpful, but it is quickly learned by those with backgrounds in CS and engineering. Also, if you know Mathematica or Matlab, you can do your computational work in those environments.

Here we focus mostly on what one can prove. We will use combinatorics, linear algebra, and probability theory as our main tools. Our problems will come from machine learning, but our ideas, methods, and energy will come from mathematics. Still, if you want to become a grandmaster of machine learning, you will need to learn this stuff eventually --- even if in your heart-of-hearts you are a die-hard empiricist.

But if you really like theory --- computational, probabilistic, combinatorial, or statistical --- then this course is a natural for you.

The course is brand new, so of necessity it is experimental. Still, there is a rich body of knowledge that fits under our umbrella. We are at liberty to pick and choose the results that we find to be the most deeply beautiful and the most consistently useful.

I hope this course will be useful for a reasonably big slice of the UPenn statistics and machine learning community. All the work is motivated by machine learning and contributes to the theory of machine learning, but participants should understand up-front that this is mainly a probability course.

At least half of the course will be devoted to CONCENTRATION INEQUALITIES. These are relatively new inequalities --- most less than ten years old --- yet they have had pervasive impact on machine learning, combinatorics, and combinatorial optimization. If you are interested in machine learning, then sooner or later you will be very happy to have a serious understanding of concentration inequalities.
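To give a flavor of what such an inequality looks like, here is one standard example, the bounded differences (McDiarmid) inequality, stated here from memory rather than quoted from the course notes. If $X_1, \dots, X_n$ are independent and $f$ changes by at most $c_i$ when only its $i$-th argument is changed, then

\[
P\big( f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \ge t \big) \;\le\; \exp\!\Big( \frac{-2t^2}{\sum_{i=1}^n c_i^2} \Big).
\]

Bounds of this kind, which hold for remarkably general functions of independent variables, are exactly what one needs to control empirical risks and generalization errors.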

The course assumes that participants have an active familiarity with probability theory at the level of Stat 530 (or at least OPIM 930). The course will not make sense to you if you do not have an appropriate background in probability.

Deliverables?

I'm flexible. Let's see how many people show up and see what their backgrounds are. In theory, I would like the course to be accessible to people with a reasonably diverse set of backgrounds, so I can't be absolutely hard-nosed about homework, midterms, etc.

Still, I am not yet of the school where it is enough just to get together and sing camp songs. Here, to close the loop, you will have to show that you have honestly mastered a serious chunk of the material of the course.

Lame Course Web Page?

Hey, it's functional, and it's temporary. In due course this page may be replaced by a decent two-column CSS design, but it's not clear that the "more esthetic" display is worth even the modest increment to the maintenance overhead. We'll see.
