and Machine Learning

Ed spoke about his paper with Chipman and McCulloch

Bayesian Additive Regression Trees (BART)

Note: BART is the function "bart" in the R package BayesTree, which runs under R. You can get it at http://www.r-project.org/ (click on "CRAN", pick a mirror, and then click on "packages").

Exploration of BART in some application of interest to you would certainly make a nice final project! More about BART, including family bios.

Over my next session or two I will go over some results and observations about the use of machine learning ideas in the design of investment strategies. Some of this is "known" and some is work in progress, but it should be amusing.

I'll build a resource page and keep adding items as the plan progresses.

I received mail from a former student, Shobhit Verma, who is now working for CCG Strategies Group, the commercial quantitative group within Constellation Energy, the largest wholesale power trader in North America and one of the largest traders of natural gas. Candidates are requested to email their CVs to **shobhit.verma@constellation.com** as soon as possible to be considered for an interview this Friday, October 26.

**Webber on the Multi-Armed Bandit Problem (to be distributed in class)**

**The course is at its halfway mark, so I'd better start doing some reasonably honest machine learning, or I fear a mutiny.**

Some Resources:

- Burges Tutorial on SVM
- Collins, Schapire, and Singer (2001) (Bregman distance as a unifying idea)

- We won't deal with it explicitly in class, but you will surely enjoy looking at the new paper by Mease and Wyner, "Evidence Contrary to the Statistical View of Boosting" --- it is sure to become a citation classic.
- We'll mention some of the literature on point separation. The field really is quite extensive, but there is a nice recent survey by Kaneko and Kano that gives the flavor of the enterprise.

- **Concentration Inequalities (Bounded Difference Martingale, Hoeffding, Bennett, McDiarmid, etc.)**
- **"Efron-Stein" Bounds and "Hold-Out-One" Methods**
- **Concentration: Applications in Learning Theory and Combinatorics**
- **Talagrand Inequalities (via Entropy and via Log Sobolev)**
- **Hypercontractivity and Tensorization of Bernoulli Random Variables**
- **Vapnik-Chervonenkis Dimension and Empirical Process**
- **Chaining Methods and Empirical Process**
- **Lindeberg Methods --- an Instance of "Easy Chaining"**
- **Gaussian Process, Slepian's Inequality, and OU Interpolation**
- **Consistency Results in Machine Learning**
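For orientation on the first topic above, here is the classical statement of Hoeffding's inequality. This statement is standard textbook material, not something spelled out on the original page:

```latex
% Hoeffding's inequality (classical form, included for orientation only).
% If X_1, \dots, X_n are independent with a_i \le X_i \le b_i, and
% S_n = X_1 + \cdots + X_n, then for every t > 0,
\[
  \mathbb{P}\bigl( S_n - \mathbb{E}[S_n] \ge t \bigr)
  \;\le\;
  \exp\!\left( \frac{-2\, t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right).
\]
```

The bounds of Bennett and McDiarmid listed above refine and generalize this pattern: a sum (or, for McDiarmid, any function with bounded differences) concentrates around its mean with sub-Gaussian tails.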

We'll use a few pieces of my SIAM monograph Probability Theory and Combinatorial Optimization, but you won't have to buy a copy. I'll distribute appropriate versions of the relevant chapters. One resource that we will use almost in its entirety is Gabor Lugosi's ANU Lecture Notes.

Peter Bartlett's Berkeley CS 281B has a nice set of lecture notes. We will use at least some of these, and you might well want to look at them all. These notes are a nice example of what can be achieved with the "scribe" system that is so common in CS and EE. For some strange reason, this system is hardly ever used in statistics. The notes from Bartlett's course that are most useful for us are those from Lecture 12 through Lecture 27.

We will also use several parts of Pascal Massart's monograph Concentration Inequalities and Model Selection, which is beautiful, 370 pages long, and free for the downloading.

We'll also use parts of the wonderful book Convex Optimization by Stephen Boyd and Lieven Vandenberghe. You can access the whole book online --- what good guys! We won't be able to cover all of the details, but one of the motivations for what we will cover is to make accessible the paper AdaBoost is Consistent by Peter Bartlett and Mikhail Traskin.

Finally, in the context of the "hold-out" techniques of probability inequalities, we will go over the nice paper An Error Bound in the Sudakov-Fernique Inequality by Sourav Chatterjee.

## Other Relevant Texts

- An Introduction to Computational Learning Theory (Michael J. Kearns and Umesh V. Vazirani)
- Gaussian Processes for Machine Learning (Carl Edward Rasmussen and Christopher K. I. Williams)
- Combinatorial Entropy and Uniform Limit Laws (J. Michael Steele)
- The Nature of Statistical Learning Theory (Vladimir N. Vapnik)
- In a Parallel Universe --- Peter Bartlett also has a nice page of well-chosen papers that make a good shopping list for presentation, summary, or poster presentations.

This course is about the core theoretical tools and constructs of what is now understood to be the theory of machine learning. We will spend almost all of our time proving theorems. Empirical testing is substantially left to other courses, but we will not completely ignore computational experience. Some exposure to the statistical language R may be helpful, but it is quickly learned by those with backgrounds in CS and engineering. Also, if you know Mathematica or Matlab, you can do your computational work in those environments.

Here we focus mostly on what one can prove.

We will use combinatorics, linear algebra, and probability theory as our main tools. Our problems will come from machine learning, but our ideas, methods, and energy will come from mathematics. Still, if you want to become a grandmaster of machine learning, you will need to learn this stuff eventually --- even if in your heart-of-hearts you are a die-hard empiricist.

But if you really like theory --- computational, probabilistic, combinatorial, or statistical --- then this course is a natural for you. The course is brand new, so of necessity, it is experimental. Still, there is a rich body of knowledge that fits under our umbrella. We are at liberty to pick and choose the results that we find to be the most deeply beautiful and the most consistently useful.

I hope this course will be useful for a reasonably big slice of the UPenn statistics and machine learning community. All the work is motivated by machine learning and contributes to the theory of machine learning, but participants should understand up-front that this is mainly a probability course.

At least half of the course will be devoted to CONCENTRATION INEQUALITIES. These are relatively new inequalities --- most are less than ten years old --- yet they have had a pervasive impact on machine learning, combinatorics, and combinatorial optimization. If you are interested in machine learning, then sooner or later you will be very happy to have a serious understanding of concentration inequalities.
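To make the concentration phenomenon concrete, here is a minimal Python sketch, not part of the course materials, that compares the empirical tail of a sum of fair Bernoulli variables against the Hoeffding bound. The choices of n, t, and the number of trials are illustrative only:

```python
import math
import random

def hoeffding_bound(n, t):
    # Hoeffding: for independent X_i in [0, 1] with S_n = X_1 + ... + X_n,
    # P(S_n - E[S_n] >= t) <= exp(-2 t^2 / n).
    return math.exp(-2.0 * t * t / n)

def empirical_tail(n, t, trials=20000, seed=0):
    # Monte Carlo estimate of P(S_n - E[S_n] >= t) for a sum of n fair
    # Bernoulli(1/2) draws, whose mean is n/2.
    rng = random.Random(seed)
    mean = n / 2.0
    hits = sum(
        1
        for _ in range(trials)
        if sum(rng.random() < 0.5 for _ in range(n)) - mean >= t
    )
    return hits / trials

n, t = 100, 15
emp = empirical_tail(n, t)
bound = hoeffding_bound(n, t)
# The true tail here is far smaller than the bound, which is the usual
# situation: Hoeffding trades sharpness for complete generality.
print(f"empirical tail ~ {emp:.4f}, Hoeffding bound = {bound:.4f}")
```

With these values the bound is exp(-4.5), roughly 0.011, while the simulated tail probability is an order of magnitude smaller; the point of the inequality is that the exponential decay in t holds uniformly, with no distributional assumptions beyond boundedness.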

The course assumes that participants have an active familiarity with probability theory at the level of Stat 530 (or at least OPIM 930). The course will not make sense to you if you do not have an appropriate background in probability.

I'm flexible. Let's see how many people show up and see what their backgrounds are. In theory, I would like the course to be accessible to people with a reasonably diverse set of backgrounds, so I can't be absolutely hard-nosed about homework, midterms, etc.

Still, I am not yet of the school where it is enough just to get together and sing camp songs. Here, to close the loop, you will have to show that you have honestly mastered a serious chunk of the material of the course.

Hey, it's functional, and it's temporary. In due course this page may be replaced by a decent two-column CSS design, but it's not clear that the "more esthetic" display is worth even the modest increment to the maintenance overhead. We'll see.

## Back to Steele's Home Page and List of Courses