... Topics in Probability and Optimization with a Focus on Optimal Stopping

## Course Blog Spring 2011

### Day 26: Last Class --- It's Celebration Time!

After a warm-up problem, we'll deal with our main business, which is an improved lower bound on the influence function. The more general punchline is that hyper-contractivity can crush some estimates that you might otherwise have thought to be best possible. There is plenty of juice left in this method and if you keep it clearly in mind, you may well pick some nice low-hanging fruit. I'll give you a proof that I think is a bit prettier than the one in Kalai and Safra, though the basic idea is the same.

We'll also take a moment to look back at the rather wide ranging set of topics that we have considered. I will revisit the "mind map" from Day One. To join in the celebration of the end of the course, please do take a moment to read back over the blog. It really has been a substantial voyage.

### Day 25: Influence, Thresholds and Concentration (part II)

We continued with the paper "Threshold Phenomena and Influence with some Perspectives from Mathematics, Computer Science, and Economics" by Kalai and Safra. In particular, we developed the machinery of the Walsh Transform: (1) Parseval formula, (2) Influence in terms of Walsh coefficients, and (3) our lower bound on the influence using plain vanilla Parseval.
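To make items (1) and (2) concrete, here is a minimal sketch that computes the Walsh coefficients of the three-voter majority function by brute force and checks both identities. The conventions are assumptions of the sketch, not necessarily those used in class: functions take values in {-1, +1}, the measure is uniform, and the influence of coordinate k is the probability that flipping k changes the outcome.

```python
from itertools import product, combinations

def walsh_coefficients(f, n):
    """Return {S: f_hat(S)} for f on {-1, +1}^n; S is a tuple of coordinates."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = 0
            for x in points:
                chi = 1
                for i in S:
                    chi *= x[i]          # Walsh character chi_S(x)
                total += f(x) * chi
            coeffs[S] = total / len(points)
    return coeffs

def influence(f, n, k):
    """P(f(x) != f(x with coordinate k flipped)) under the uniform measure."""
    points = list(product([-1, 1], repeat=n))
    flips = 0
    for x in points:
        y = list(x)
        y[k] = -y[k]
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / len(points)

def majority3(x):
    return 1 if sum(x) > 0 else -1

coeffs = walsh_coefficients(majority3, 3)
# Parseval: the squared coefficients sum to E[f^2], which is 1 here.
parseval_sum = sum(c * c for c in coeffs.values())
# Influence of k as the sum of squared coefficients over sets containing k.
influence_from_walsh = [sum(c * c for S, c in coeffs.items() if k in S) for k in range(3)]
influence_direct = [influence(majority3, 3, k) for k in range(3)]
```

With other normalizations (say, functions valued in {0, 1}) the same identities hold up to constant factors.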

For our side-bars and warm-up we looked at some lovely formulas related to the Gaussian distribution. In particular, we considered a formula for the Hermite polynomials that makes many of the key facts about those polynomials obvious. I also gave an example of the Wick product for four dependent Gaussians. This "embeds" the simple fact that a standard Gaussian has fourth moment 3 into a whole "manifold" of Gaussian identities.
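Here is a small sketch of the four-Gaussian identity, assuming it is the pairing formula (often credited to Isserlis and Wick): for a centered Gaussian vector, the expectation of a product is the sum over all pairings of the products of the paired covariances. The helper names and the sample covariance entries are made up for illustration.

```python
def pairings(items):
    """Yield every way to split an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def gaussian_moment(cov, indices):
    """E[X_{i1} ... X_{ik}] for a centered Gaussian vector with covariance cov."""
    if len(indices) % 2 == 1:
        return 0                         # odd moments vanish by symmetry
    total = 0
    for pairing in pairings(list(indices)):
        term = 1
        for i, j in pairing:
            term *= cov[i][j]
        total += term
    return total

# All four factors equal to one standard Gaussian: three pairings, each worth 1.
fourth_moment = gaussian_moment([[1]], [0, 0, 0, 0])

# Four dependent Gaussians; these covariance entries are only sample inputs
# for the pairing formula.
cov4 = [[4, 1, 2, 3],
        [1, 5, 1, 2],
        [2, 1, 6, 1],
        [3, 2, 1, 7]]
mixed_moment = gaussian_moment(cov4, [0, 1, 2, 3])
```

Taking all four indices equal collapses the three pairings into the single number 3, which is the sense in which the fourth moment sits inside the larger family.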

### Day 24: Influence, Thresholds and Concentration

As I mentioned on Wednesday, there has been amazing progress in the probabilistic understanding of "threshold" phenomena. Roughly, these are modeling situations where at some level of a parameter p there is a sudden shift --- all of a sudden things that were rare become common. The modern understanding of these things is less than ten years old, and the appearance of striking applications is now a regular occurrence. Failure to know these techniques puts one at an unnecessary disadvantage.

One of the key concepts is that of "influence". Basically, if no individual or small set of individuals in your model can determine the outcome of the event you are studying, then you can expect a sharp threshold.

### Sidebar: Some Nice Tao Essays

One of Tao's essays for beginners is "Ask yourself dumb questions, and answer them!" I think there is much wisdom in this essay that can serve through a lifetime. This essay chains to two others and these chain again. I love them all, but I would especially encourage you to consider "Know the limitations of your tools."

It is easy to get hung up on assertions of the form "the only way you can prove blah blah is by using blah blah." I'm often guilty of this, and I am trying to "coach myself" to get off that horse. In fact, there are remarkably few mathematical results of substance that cannot be obtained by strikingly different methods.

### Sidebar: Baby Gauss Contemplates the Dirichlet Kernel

One of the most useful (and most famous) formulas of Fourier analysis is the sine ratio formula for the Dirichlet Kernel. We'll look briefly at a geometric proof of this formula that is based on shifting and subtracting dots. This "symmetry" argument is of the kind much appreciated by baby Gauss.
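For reference, a quick numerical sanity check of the sine ratio formula. The sketch uses the unnormalized kernel, the sum of e^{ikx} for k from -n to n; some texts divide by 2π, which is a choice of convention and not something fixed by the discussion above.

```python
import math

def dirichlet_sum(n, x):
    """D_n(x) = sum_{k=-n}^{n} e^{ikx} = 1 + 2 * sum_{k=1}^{n} cos(kx)."""
    return 1 + 2 * sum(math.cos(k * x) for k in range(1, n + 1))

def dirichlet_ratio(n, x):
    """The sine ratio form, valid whenever sin(x/2) is nonzero."""
    return math.sin((n + 0.5) * x) / math.sin(x / 2)

# Largest disagreement over a few orders n and a few sample points x.
max_gap = max(
    abs(dirichlet_sum(n, x) - dirichlet_ratio(n, x))
    for n in range(0, 8)
    for x in (0.3, 1.0, 2.5, -1.7)
)
```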

### Day 23: Tsitsiklis Proof of An Index Theorem

I have nice pictures to coach us along in the Tsitsiklis Proof of the Gittins index theorem. The computations are indeed simple and instructive. You'll also see how the ideas of "an exchange," "exponential discounting," and "stationary strategies" are joined at the hip. Also, we'll see what moving to semi-Markov can buy you that plain vanilla Markov doesn't. Not to keep a secret; it's that one has more "clumping" possibilities in a semi-Markov setting.

### Sidebar: Coupling Revisited

Coupling is one of the most powerful of the "genuinely probabilistic" techniques. Here by "genuinely probabilistic" we mean something that works directly with random variables rather than with their analytical co-travelers (like distributions, densities, or characteristic functions). We have used coupling in our proof of the Le Cam inequality for the Poisson approximation of Binomial sums. If we have time, we'll look at some more applications, say to Markov chains.
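The Le Cam inequality is easy to check numerically. The sketch below does not reproduce the coupling proof; it just computes the exact law of a sum S of independent Bernoulli(p_i) variables and compares it with the Poisson law of the same mean, using the form of the inequality that bounds the sum over k of |P(S = k) - Poisson(k)| by twice the sum of the p_i squared. The function names and the test case are illustrative choices.

```python
import math

def poisson_binomial_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)     # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf

def le_cam_gap(ps):
    """Return (sum_k |P(S = k) - Poisson_lambda(k)|, 2 * sum p_i^2)."""
    lam = sum(ps)
    pmf = poisson_binomial_pmf(ps)
    poisson = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(len(pmf))]
    l1 = sum(abs(a - b) for a, b in zip(pmf, poisson))
    l1 += max(0.0, 1.0 - sum(poisson))   # Poisson mass beyond n, where S has none
    return l1, 2 * sum(p * p for p in ps)

l1_distance, le_cam_bound = le_cam_gap([0.1] * 20)
```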

### Sidebar: "Short Proofs"

Kai Lai Chung always had considerable skepticism about "short proofs." He did not fully make the case, but it is easy to do so. Sometimes a "short proof" is short just because the second player gets to rely on the context that has already been set up by the first player. This lets the second player glide over issues that the first player would have had to explain. If you look at Riesz's proof of the Riesz representation theorem, he takes more than five pages to do what now every analysis textbook does in a short paragraph. Riesz also published his paper in several variations!

Even when a short proof contains a genuinely new idea, it may not be all that valuable. One way to "poker guess" this is to realize that if the short proof really had jump (i.e. it really advanced our understanding of something) then the author would have not just given "a short proof of ..." but he would have gone on to prove the new result!

Even with all of this criticism of short proofs clearly in sight, I must admit that I am always a sucker for the "short proof" pitch!

Please provide both hardcopy and email me a PDF. The PDF should have the file name structure yourname900.pdf.

Here is some REPEATED ADVICE: (1) Get quickly to some fact that you find interesting --- with no baloney or "begats" to bog you down. You need to get to an honest and interesting statement of fact on the first page if at all possible. (2) Be very clear but understand that you are writing for a person with a solid mathematical education. (3) Look very hard at providing some honest coaching. If you can possibly do it, leave the reader feeling honestly educated about something that is worth learning.

For this one report, these things are more important than doing something "new" or showing that you have done something "hard." If such things happen along the way, that is great --- but the key deliverable is to honestly convey some "insight."

How to create a disaster? If there are errors or misstatements early in the paper, I simply will not be able to read it. You could be brilliant on page 3, but if I am lost (or disbelieving) on page 2 then your brilliance will go unnoticed. Proofread your paper carefully, and if you possibly can --- have a friend read it and then discuss it with you. By the way, your friend is unlikely to confess about all the places he got stuck. Insist that he point out at least a few of the places.

Let me underline one further point: Tell No Lies. If in doubt, leave it out. So, if you are writing something that still seems vague to you, you definitely want to attend to this "problem". Either work on it until it is no longer vague, or punt --- remove the assertion and go to some other part of your problem.

### Day 22. Gittins Index Theory

The main business of the day will be to introduce the simplest of the Gittins index theorems and collect some useful observations. We'll start out with some of the examples from Frostig and Weiss (1999). In particular, we'll look at the dynamic programming formulation of the 1.5 bandit problem and the retirement bandit problem. This gives a useful set of concepts.

Still, it turns out that it is painful to try to follow the whole exposition of Frostig and Weiss (1999). One problem is the overburdening of notation --- same or similar notations used for migrating concepts. A little of this is tolerable, but with too much we come to the "parable of the chairs."

So, for the real meat, we turn to the 1994 set up of Tsitsiklis. This set up is more general than Frostig and Weiss (1999), but the analysis is simpler and the proof is shorter. Moreover, all of the definitions are transparent in their relations and the notation is clear.
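As a toy illustration of what an "index" measures, here is a sketch for the most degenerate arm possible: one whose rewards are a known deterministic sequence. It assumes the usual ratio form of the index (best discounted reward per unit of discounted time, over stopping rules that play at least once), and for a deterministic arm the stopping rules are just fixed times. This is not the Tsitsiklis set up, only a warm-up calculation; rewards are assumed non-negative and zero after the listed values, so playing past the end of the list never helps.

```python
from fractions import Fraction

def deterministic_index(rewards, beta):
    """max over tau >= 1 of (sum_{t<tau} beta^t r_t) / (sum_{t<tau} beta^t)."""
    best = None
    num = Fraction(0)      # discounted reward collected so far
    den = Fraction(0)      # discounted time used so far
    weight = Fraction(1)   # beta^t
    for r in rewards:
        num += weight * r
        den += weight
        weight *= beta
        ratio = num / den
        if best is None or ratio > best:
            best = ratio
    return best

# A modest first reward that "unlocks" a large second one.
index = deterministic_index([1, 5, 0, 0], Fraction(1, 2))
```

Here the best stopping time is 2, so the index (7/3) exceeds the immediate reward of 1. That is the basic reason a myopic comparison of current rewards can go wrong.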

We may return to Frostig and Weiss (1999) for more "story." There are surely insights there, but the analysis has a lot of overhead.

### Sidebar: A Problem in Geometry

Our warm-up problem will be a three dimensional generalization of the Pythagorean theorem. The featured solution will provide you with some motivation to learn about wedge products.
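A numerical check, assuming the generalization in question is the one usually credited to de Gua: in a tetrahedron with three mutually perpendicular edges at one corner, the squared area of the face opposite that corner equals the sum of the squared areas of the other three faces. The cross product below is a three dimensional stand-in for the wedge product, and the edge lengths are arbitrary sample values.

```python
import math

def triangle_area(p, q, r):
    """Area of the triangle pqr in R^3: half the norm of a cross product."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return 0.5 * math.sqrt(sum(c * c for c in cross))

a, b, c = 2.0, 3.0, 6.0                                  # the three perpendicular edges
O, A, B, C = (0, 0, 0), (a, 0, 0), (0, b, 0), (0, 0, c)
leg_faces = [triangle_area(O, A, B), triangle_area(O, B, C), triangle_area(O, A, C)]
hypotenuse_face = triangle_area(A, B, C)
sum_of_squares = sum(area ** 2 for area in leg_faces)
```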

### Sidebar: A Robust Philosophy of Symmetry

We may not get to a second sidebar, but I have a few philosophical observations about "symmetry" that I would like to get into the conversation. I've hinted at this a few times with assertions like "symmetry is anything that is cool," or "a symmetry doesn't have to leave anything fixed; it just has to change things in a way that you can easily track." The new bit I'd like to add has to do with "breaking out of a lopsided situation." The context will be that of the remainder term in Taylor's series.

### Sidebar: Polya's List of Heuristics

In How to Solve It, Polya gives a "Short Dictionary of Heuristics":
Analogy
Cases
Extreme conditions
Patterns
Specialization
Symmetry
Variations
Working Backwards

My personal experience with these tells me that my weakest spots are "contradiction" and "patterns." All the other heuristics are constantly at my finger tips, but for some reason I consistently under use argument by contradiction. I tend to come to it only when forced, and this can be a terrible waste of time.

I also under use "patterns" if by this one means "inferences from examples." I am sure that I would be more effective with this if I were a more facile programmer. If you are a good programmer, you can really make hay with the "pattern" thesis. It's easy to prove stuff, if you always know what is true!

### Ultra Filters and Epsilon Management

Our brief discussion of ultra filters on Wednesday was based on the exposition by Tao. This piece is certainly worth a good long look. In particular, it sketches a proof of the existence of non-principal ultra filters where one uses nothing more than Zorn's lemma.

One of the gems from Tao's discussion is that one "does not need" the mysterious Transfer principle. In fact, going back and forth between the equivalent standard and non-standard assertions is not hard to do by "bare hands." Typically, the standard assertion implies the non-standard assertion just by checking the definitions. To get the reverse implication, one usually argues by contradiction. One assumes the standard assertion is false and then uses that "fact" to construct a counter-example to the non-standard assertion. Tao's proofs of Lemma 1 and Lemma 2 follow this pattern.

### Sidebar: More on Ultra Filters

Mickey has passed along another link to a nice exposition of ultra filters and their applications, including an interesting (close!) connection to the famous Arrow impossibility theorem. It also covers the classical applications to Schur's Addition Theorem and to Hindman's Cube Theorem.

### Sidebar (Financial): UST CDSs

Just as part of our discussion of the rationality of the world, I may briefly discuss a puzzle: "Who buys UST CDSs, and what are they thinking?" TLAs will be explained.

### Day 21: Making Sequential Choices

We'll be getting back to our main track: the theory that supports the making of sequential selections. This brings us to the change point problem (mentioned briefly last time) and the theory of multi armed bandits (which we'll start on today). I'll also mention some related problems of choice theory.

In many problems of sequential selection one either introduces discounting or geometric stopping. This is really a symmetrization step; the "group" is the one generated by "shifting time by n" and we set things up to preserve the problem under that "group action." This is the stuff that symmetry is made of. A question that greatly intrigues me is that of undoing the "damage" that is done by this symmetrization. I'll explain why this interests me. A quick answer is that any piece of general progress converts hundreds of "paper P's" into "paper P+1's."
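The two devices are the same thing in disguise, and a short exact computation shows it. The sketch assumes an independent geometric time T with P(T > t) = beta^t and checks that the expected total reward collected before T equals the beta-discounted sum. The reward sequence is an arbitrary example.

```python
from fractions import Fraction

def discounted_value(rewards, beta):
    """sum_t beta^t r_t for t = 0, 1, ..., len(rewards) - 1."""
    return sum(beta ** t * r for t, r in enumerate(rewards))

def geometric_stopping_value(rewards, beta):
    """E[sum_{t < T} r_t], where P(T = t) = (1 - beta) * beta^(t - 1) for t >= 1."""
    n = len(rewards)
    total = Fraction(0)
    partial = Fraction(0)
    for t in range(1, n + 1):
        partial += rewards[t - 1]
        total += (1 - beta) * beta ** (t - 1) * partial   # stop exactly at time t
    total += beta ** n * partial                          # T > n: everything was collected
    return total

rewards = [3, 1, 4, 1, 5]
beta = Fraction(2, 3)
value_discounted = discounted_value(rewards, beta)
value_stopped = geometric_stopping_value(rewards, beta)
```

The memoryless property of T is what makes the problem look the same after any time shift.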

### Warm-Up: The "Prove a Consequence" Heuristic

Suppose you want to prove Theorem A. You have done all the usual stuff (simplest variation you can find, examples done in detail, analogies listed, parts --- hypothesis, conclusion --- examined individually and varied individually and in pairs).

One more thing you can do is "Prove a Consequence of Theorem A." You might not think to do this since it may seem like building a house on sand. Still, the experience of many who have tried this idea is that it is more effective than one might ever have imagined. I'll give an illustration by calling on a theorem we all know --- it is due to Pythagoras.

### Sidebar: Galton-Watson Process Cantor Sets

If we need a little filler, there is a small construction that everyone should know. It is a variation on the Cantor set, but it is more powerful in some interesting ways. It's also very easy, so why not add it to your tool kit?

### Problem Solving Strategies

Tao has a nice list of problem solving strategies. Look over the list and see how many of these strategies have made an appearance of some sort in your project.

I was particularly intrigued by Strategy 8, where Tao says "One should try instead to draw a picture in which the hypotheses hold but for which the conclusion does not – in other words, a counterexample to the problem."

There are several psychological forces in play here. First, we have a natural "fear" of a counterexample since we usually "want" to prove our conjecture (and think we have good reasons for believing its truth). Second, as Tao observes, we can't expect to get much out of a picture where the hypothesis and the conclusion hold. Don Ornstein has used this process of "drawing counterexamples" with great success --- to prove positive theorems!

### Expositional Idea: Use an Appendix!

I can tell from the progress reports that several people are having a hard time finding their footing when it comes to being "simple and clear." This does not mean putting in stuff that everybody knows (like the definition of "convex" or of a "stopping time"). You should assume that I know everything that has been covered in the class --- this is true with very few exceptions.

But what if you want to be "sure" that you have all the background down? You can say something like: "Appendix A contains definitions from the theory of convex sets that may be unfamiliar." For example, there are "cones" and "pointed cones" and one can probably guess the difference. This is unusual enough to say in the text, but if you have a lot of such background material it is best "stuffed" into an appendix.

In general you want to do everything you can to be clear and professional. This means that you must quite quickly get to something that is "new" to a mature reader. For me a paper will be good if it teaches me something that I have some chance of using sometime. This is why one reads any paper!

Prospero:
Our revels now are ended. These our actors,
As I foretold you, were all spirits, and
Are melted into air, into thin air:
And like the baseless fabric of this vision,
The cloud-capp'd tow'rs, the gorgeous palaces,
The solemn temples, the great globe itself,
Yea, all which it inherit, shall dissolve,
And, like this insubstantial pageant faded,
Leave not a rack behind. We are such stuff
As dreams are made on; and our little life
Is rounded with a sleep.

The Tempest Act 4, scene 1, 148–158

### Day 20: Lagrange Duality, Saddle Points, and Games

The minimax theorem of the theory of zero-sum games actually predates the duality theorem of LP. Still, even before LP duality, people were aware of the relationship of the minimax theorem to the theory of saddle points where you have an F(x,y) that is convex in one variable and concave in the other. Of course in the game theory setting you have linearity in each variable (so both concavity and convexity in each variable).
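For the smallest interesting case, a 2x2 zero-sum game with no pure saddle point, the optimal mixed strategies come from "equalizing": each player mixes so that the opponent is indifferent between their two pure actions. Here is a minimal sketch of the resulting closed form; the payoff convention (entries are payments to the row player) and the function name are assumptions of the sketch.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Optimal mixes and value for the payoff matrix [[a, b], [c, d]]
    (integer payoffs to the row player), assuming no pure saddle point."""
    denom = a - b - c + d
    p = Fraction(d - c, denom)             # probability the row player uses row 1
    q = Fraction(d - b, denom)             # probability the column player uses column 1
    value = Fraction(a * d - b * c, denom)
    return p, q, value

p, q, value = solve_2x2(2, -1, -1, 1)
# Against p, both columns pay the same amount, and that amount is the value.
row_mix_vs_col1 = p * 2 + (1 - p) * (-1)
row_mix_vs_col2 = p * (-1) + (1 - p) * 1
```

Since the payoff is linear in each player's mix, it is trivially convex in one variable and concave in the other, and the pair (p, q) is the saddle point.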

Nowadays the grown-up way to think about these things is via the general (but simple and concrete!) theory of Lagrange Duality. This is beautifully discussed in Chapter 5 of Boyd and Vandenberghe (free and honestly available on line). They also have their lecture notes for their Stanford course, and we may look at their very nice Figure on page 5-15. There are also instructive variations on this picture that I will sketch in class. I greatly encourage you to browse through this book even if your interests are purely statistical.

We'll go over some high points of this theory, essentially covering the first four sections of Chapter 5. This will give us a very rich and useful way to think about zero sum games --- and much more. It will also give us a little more practice with convex analytics, which will be handy when we resume our exploration of Brenier maps. This is a new research area where I think you can still find "low hanging fruit."

### Micro-Sidebar: Hurwitz Binomial Theorem

If we need a warm-up problem, I'll mention some facts about Abel's binomial formula and a striking generalization due to Hurwitz. There are lovely probabilistic connections with this formula to the theory of branching processes, a subject that is already rather lovely itself. Bennies and Pitman have a delightful paper on this connection. I believe that there is also much new that can be done with this connection.
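For anyone who has not met it, here is an exact check of one common form of Abel's binomial formula: (x + y)^n equals the sum over k of C(n, k) x (x - kz)^(k-1) (y + kz)^(n-k), with the k = 0 term read as y^n. Other references state the formula with different sign or index conventions, so treat this as one version and not necessarily the one used in class. The Hurwitz generalization is not attempted here.

```python
from fractions import Fraction
from math import comb

def abel_sum(n, x, y, z):
    """sum_k C(n,k) * x * (x - k z)^(k-1) * (y + k z)^(n-k), with the k = 0 term y^n."""
    total = Fraction(y) ** n
    for k in range(1, n + 1):
        total += comb(n, k) * x * Fraction(x - k * z) ** (k - 1) * Fraction(y + k * z) ** (n - k)
    return total

# The parameter z drops out of the sum entirely; z = 0 is the ordinary binomial theorem.
lhs_values = [abel_sum(5, 2, 3, z) for z in (0, 1, 2, Fraction(1, 3))]
```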

### Sidebar: The World of Brunn-Minkowski

There is an excellent survey of the Brunn-Minkowski theorem which everyone should know about, even if it goes a bit beyond the natural attention span. We certainly won't get too deeply into it, but you might consider two pieces. First, there is a "flow chart" on the third page which shows you just how many rich connections there are in this theory. Also, there is a useful discussion of the Prekopa-Leindler inequality --- which slightly abstracts the Brunn-Minkowski theorem and which also leads to the simplest known proof of the Brunn-Minkowski inequality. The proof we will sketch in class is perhaps simpler, but it assumes that you have the Brenier map at your disposal. We justified the existence of this map, but the honest details require some irksome work on "regularity." Essentially one needs Rademacher's differentiation theorem --- and that's not so easy --- though it is classic and well worth learning if you have the time.

### Day 19: The Transportation Method and Related Problems

We left a long menu of items on our to-do list. The one that is most firmly in my head has to do with the "transportation problem" --- or the Monge coupling if we take the probabilist's point of view. I think this is a genuine "breakthrough" and it is still not widely understood by probabilists or statisticians. Thus, it offers a good shot at grabbing some "low hanging fruit." My discussion will mainly follow the nice exposition by Keith Ball.

I do owe you more on game theory and LP but I have to check some "dualizations" before I impose them on you. Please do solve Jiri's "Santa, Easter Bunny" problem. It will stimulate your brain. Hint: There is one action where Santa and Bunny put the same probability.

As a warm-up (of sorts) I will discuss the Helly theorem that I mentioned at the beginning of class last time. Oddly enough, this is an "interpolation" between Farkas lemma theory (with which we began our optimization discussion) and the classical Zero-Sum game theory results. My motivating source is a blog entry by Tao, but Tao kindly leaves us more than a few cracks to fill.

I'd also like to start the discussion of how fixed point theory contributes to both game theory and to tricky little things like refinements of the Hahn-Banach theorem (or Helly's theorem). All of these theorems have at their heart the sentiment that one either (1) has what one wants or (2) there is some canonical impediment (or obstruction). If this seems too vague, stare at Farkas's lemma again.

Also, of course, we have Gittins Index work to do. Thankfully we still have eight more days of class after today. We'll also go back to the Tao well to engage the "sparse inversion" story. Statisticians will be eating lunch on this for many years to come. I suspect that there are also opportunities for combinatorists. The "Hail Mary" pass in this business is an analogy to the LLL algorithm, which has its own "sparse" story. This idea could lead nowhere, or to a nice prize --- give it a shot.

### Sidebar: de Bruijn Sequences

In the "interesting object" category one can certainly include the "de Bruijn sequences". A de Bruijn sequence is a cycle of 0's and 1's that has all of the 2^n possible n-blocks as interval substrings. Puzzle: how small can you make the cycle? What is a good construction? What if you want strings from an alphabet other than just 0's and 1's? Justin has provided a nice link with applications of these magical critters to tomography.
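Spoiler alert for the puzzle. One well-known construction concatenates Lyndon words in lexicographic order; the sketch below is the standard recursive form of that idea and is offered as one possible answer, not as the construction from the linked article. Here k is the alphabet size and n the block length, and the helper `cyclic_blocks` exists only to check the result.

```python
def de_bruijn(k, n):
    """A cyclic sequence over {0, ..., k-1} in which every n-block appears exactly once."""
    a = [0] * (n + 1)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])    # emit a Lyndon word whose length divides n
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence

def cyclic_blocks(seq, n):
    """The set of n-blocks read around the cycle."""
    doubled = seq + seq[:n - 1]
    return {tuple(doubled[i:i + n]) for i in range(len(seq))}

binary_cycle = de_bruijn(2, 4)
ternary_cycle = de_bruijn(3, 2)
```

The cycle has length k^n, which is clearly the smallest possible since each starting position contributes one block.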

### Day 18: (Probably) Zero-Sum Games: Proofs and Applications.

Warm-up Problem. We'll look at Gua's Theorem and ask ourselves if it generalizes Rolle's Theorem or specializes Rolle's theorem. We'll face the same issue with LP, Farkas, and the fundamental theorem of zero-sum games.

Second Warm-up Problem. We'll see how to "bootstrap" a trivial interpolation inequality to get a "best possible" inequality. This is a super-generic trick that falls squarely in the "pay the rent category."

Sidebar: Linear Expected Time MST Algorithm. Can you compute the MST in linear time, if you allow randomization? Turns out that you can, though I suspect there are some subtleties to deal with. This link will put you on the trail.

### Day 18: Subadditivity and Poissonization

We'll continue the theme of last time and get the asymptotic behavior of the expected length of the traveling salesman path. This technique is remarkably general and if you keep it in mind, you'll find a day when it "pays the rent."

As time permits, I'll mention another application or two of the tensor power trick. I'll also introduce a way to think about functions (height and width) that sheds new light on the traditional L^p inequalities.

We may finish just a few minutes early since I have to get myself to the airport. Don't forget about your progress report.

### Progress Reports Due April 4

On Monday April 4, you should provide me with a progress report on your project. This should be 5 to 8 pages and it should "look like what you expect your final project to look like."

In particular, I hope that you can get directly to a mathematical problem and let the problem "speak for itself" --- i.e. we don't need a lot of fluffy "motivation." I also hope that you can provide something "interesting and concrete." For example, a trick, a formula, a representation, an inequality, an argument, or an algorithm that is honestly instructive.

For something to be "honestly instructive" it has to be presented clearly, and it has to convey something that you believe that more people should know and understand.

If "you believe" then it is almost certain that "I'll believe." You just have to check that you have been honest with yourself.

You don't need material that is "new" --- you just need material that is "worth knowing" --- whether it is old or new. When you find something that is "worth knowing" then it is usually pretty easy to lean on it until something new (and also worthwhile!) eventually squirms out. Conversely, there is not much point in panning for gold in an area where no one else has ever seen a nugget.

Also, don't forget to keep polishing your LaTeX skills. You surely want to use AMS LaTeX and all of the nice tools that it has for lovely matrix layouts, fonts, etc. They also have a brief on-line manual that rewards periodic review.

### An Amusement: Balancing a Table by Rotation

You can almost convince yourself that the intermediate value theorem implies that any four legged table with equal legs that sits on a smooth but uneven surface can always be brought to stable equilibrium by a rotation about the center of the table. Think about this; then see if your argument fits into the rigorizor. To see if you have perhaps omitted some condition, consult the nice (and non-trivial) paper Mathematical Table Turning Revisited.

### What I Forgot about the CLT

Gaposhkin proved that for any sequence of random variables that converge weakly to zero and whose squares converge weakly to one, there is a subsequence for which the CLT holds. This is in the CLT piece in the venerable Wikipedia. I found this very amusing, and then I remembered that it is (almost) a corollary of an infinitely general theorem of Aldous on "statutes".

### Markov Chains and MCMC

Whenever you have time you should poke around the monograph by Aldous and Fill on Markov chains. This very famous, never published book is filled with lovely ideas, discoveries, and facts --- that are too little known.

### Passport Ownership

We don't do data in this class, but if we did, there is an article in the Atlantic that discusses passport ownership as an econometric variable. It's amusing at the purely empirical level.

### Day 17: TSPs --- Variance and First Look at the Expected Value

This week will be spent using the variance bounds and the ideas of subadditivity to get results about some classic problems of combinatorial optimization. In particular, we'll first get a strong law for the traveling salesman problem. We'll then look at generalizations and refinements.

One of the tools that we'll consider in the analysis of the TSP is Hilbert's Space filling Curve.
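To see why a space filling curve is useful here, consider the heuristic "visit the points in the order in which the curve meets them." The sketch below uses the standard bit-manipulation recipe for the discrete Hilbert curve on a 2^order by 2^order grid; the grid resolution and the function names are choices made for this illustration, and nothing here is specific to the analysis we will do in class.

```python
def hilbert_point(order, d):
    """Cell (x, y) visited at step d by the Hilbert curve on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate or reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_tour(points, order=5):
    """Order points of the unit square by the position of their grid cell along the curve."""
    side = 1 << order
    rank = {hilbert_point(order, d): d for d in range(side * side)}

    def key(p):
        cell = (min(int(p[0] * side), side - 1), min(int(p[1] * side), side - 1))
        return rank[cell]

    return sorted(points, key=key)

cells = [hilbert_point(3, d) for d in range(64)]
```

The key property is that consecutive steps of the curve land in adjacent cells, so points that are close along the curve are close in the plane.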

Sidebar on Schur, ESS, and the Tensor Power Trick: Just as a little side dish, we'll perhaps look at a famous theorem on positive definite matrices due to I. Schur. The proof of this theorem gives one a powerful way to think about the covariance matrix of any Gaussian vectors. Curiously enough, the trick for proving Schur's theorem is related to the original proof of the ESS inequality. This is the "tensor" trick and it has many, many applications.

### Promises to Keep and The Inventory of Up-Coming Topics

Beginning on March 28 I'll be sure to keep some promises. In particular we'll take up a discussion of some problems in game theory (and applications in combinatorial optimization). We'll also develop some of the theory of the Gittins index. Each of these topics will get about a week of attention; perhaps more if they strike a chord. We'll look into our other inventory items.

Codes and Languages. We won't get to it for weeks, but some time in the not too distant future, I want to make some observations about codes and languages (in the CS sense). In particular, everyone needs to know the sphere packing bound for error correcting codes. To illustrate the surprising power of LP technology, we'll also look at the very clever improvements on these bounds that were made by Delsarte. My sense is that the Delsarte trick has many more applications than have been found to date. Moreover, if one steps up to semi-definite programming, one is definitely in the research domain where there are still lots of "easy pickings."
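The sphere packing (Hamming) bound is a one-line volume argument: if a code has minimum distance d, the Hamming balls of radius t = floor((d-1)/2) around the codewords are disjoint, so the number of codewords is at most q^n divided by the size of one ball. A minimal sketch, with a function name chosen for this illustration:

```python
from math import comb

def sphere_packing_bound(n, d, q=2):
    """Upper bound on the size of a q-ary code of length n and minimum distance d."""
    t = (d - 1) // 2                                        # radius of the disjoint balls
    ball = sum(comb(n, k) * (q - 1) ** k for k in range(t + 1))
    return q ** n // ball

hamming_7_3 = sphere_packing_bound(7, 3)      # attained by the [7,4] Hamming code
golay_23_7 = sphere_packing_bound(23, 7)      # attained by the binary Golay code
```

Codes that meet the bound exactly are called perfect, and they are rare, which is part of why improvements of the Delsarte type matter.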

Sparse Inversion. I have some comments about the ideas that Candes and Tao introduced concerning sparse inversion. This is the "breakthrough du jour" in statistics, and it is quite worth our time to see what it might say in problems that interest us. There are also a few observations about "the general theory of breakthroughs" that seem worth collecting.

More Powerful Second Moment Method. The second moment method is a famous device for providing lower bounds on probabilities. This is especially useful when one hopes to prove the existence of some combinatorial object. We can't cover all of the details, but I would like to mention a very instructive paper by Achlioptas and Peres that gives many insightful hints on how one can improve on the plain vanilla second moment method argument.

### Day 16: Techniques for Bounding a Variance

One of the most common tasks of probability theory is the bounding of a variance. This can be especially challenging when the random variable in question does not fit the standard model of a sum, a quadratic form, or even a U-Statistic. Fortunately, there are simple but powerful tools that often help.

We'll start off by discussing and proving the ESS inequality in several variations. In particular, we'll look at the interesting class of "self-bounding" inequalities and use this idea to give a nice bound on the variance of the LIS problem. This illustrates how "self bounding" can sometimes beat the plain vanilla ESS. Part of the charm of the self-bounding formulation is that its conclusion is precisely the condition that one needs for "stability about the mean" --- i.e. so one can say that the mean is representative of the whole distribution.
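Here is an exact, brute-force check of the plain vanilla inequality, reading ESS as the Efron-Stein-Steele inequality in its resampling form: Var f(X) is at most one half of the sum over i of E[(f(X) - f(X^(i)))^2], where X^(i) is X with coordinate i replaced by an independent copy. The Bernoulli setting and the "longest run" statistic are choices made for this sketch.

```python
from itertools import product
from fractions import Fraction

def efron_stein_check(f, n, p):
    """Exact (Var f(X), ESS bound) for X with i.i.d. Bernoulli(p) coordinates."""
    def weight(x):
        w = Fraction(1)
        for bit in x:
            w *= p if bit else 1 - p
        return w

    points = list(product([0, 1], repeat=n))
    mean = sum(weight(x) * f(x) for x in points)
    variance = sum(weight(x) * (f(x) - mean) ** 2 for x in points)

    bound = Fraction(0)
    for i in range(n):
        for x in points:
            for new_bit in (0, 1):                      # resample coordinate i
                y = x[:i] + (new_bit,) + x[i + 1:]
                w_new = p if new_bit else 1 - p
                bound += weight(x) * w_new * (f(x) - f(y)) ** 2
    return variance, bound / 2

def longest_run(x):
    """Length of the longest run of ones, a statistic that is far from being a sum."""
    best = cur = 0
    for bit in x:
        cur = cur + 1 if bit else 0
        best = max(best, cur)
    return best

run_variance, run_bound = efron_stein_check(longest_run, 4, Fraction(1, 2))
sum_variance, sum_bound = efron_stein_check(sum, 3, Fraction(1, 3))
```

For a plain sum the bound is attained with equality, which is a useful reminder that the inequality is tight in the linear case.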

We'll then look at the Gaussian Poincare inequality which is a general bound on the variance of a function of Gaussian random variables in terms of the L^2 norm of the gradient. This inequality has come up many places in probability and analysis. Most folks connect it with some reasonably sophisticated stuff like the Ornstein-Uhlenbeck semi-group, but (at least in an important special case) one can base an easy proof on the ESS inequality. I learned this trick from Gabor Lugosi, and the proof comes with a nice open problem.

There are powerful analogies between the Bernoulli random variables, Uniform random variables, and Gaussian random variables. To give one illustration of the analogy, we'll look at the simplest of the inequalities of the Poincare/Sobolev kind. This is Wirtinger's inequality. It is stone simple, yet it started a vast industry where hundreds toil today.

Finally, to close the loop on an earlier discussion, there is a lovely combinatorial way to see the vertical binomial sum generating function. In fact, the trick means that you don't have anything to "remember" --- and that is always a good thing. Moreover, the trick is almost infinitely general.

### Day 15 (3/14 is Pi Day!)

Monday is national Pi Day (3.14159...). I'll share one (or maybe two) of my favorite pi facts. If you have a favorite pi fact, do bring it to class. No real pies please.

Intermediate Counting. Everyone knows the basic techniques of counting but remarkably few people know the techniques of intermediate counting (much less advanced counting). I'll mention a few of these illustrated by the vertical binomial OGF (vs the horizontal OGF that is universally known). BTW, if I had just one book to take with me to a desert island, it would be Flajolet and Sedgwick "Analytic Combinatorics."
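A quick exact check, on the reading that the "vertical" OGF fixes k and sums C(n, k) x^n down a column of Pascal's triangle, giving x^k / (1 - x)^(k+1), while the "horizontal" one is the familiar (1 + x)^n. The sketch multiplies the truncated column series by (1 - x)^(k+1) and confirms that only x^k survives among the low-order coefficients.

```python
from math import comb

def vertical_ogf_check(k, terms=12):
    """Coefficients of x^0..x^(terms-1) in (sum_n C(n,k) x^n) * (1 - x)^(k+1)."""
    series = [comb(n, k) for n in range(terms)]                   # column k of Pascal's triangle
    factor = [(-1) ** j * comb(k + 1, j) for j in range(k + 2)]   # (1 - x)^(k+1)
    result = [0] * terms
    for i, s in enumerate(series):
        for j, c in enumerate(factor):
            if i + j < terms:        # truncated terms only affect higher degrees
                result[i + j] += s * c
    return result

coefficients = vertical_ogf_check(3)
```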

Long Common Subsequences, Subadditive Methods and the ESS Inequality. The subtitle pretty well tells the whole story. Still, the punch line is that the problem of LCS lets one illustrate two simple tools that completely crush a large class of problems, yet if you do not know these techniques you would be very likely to be completely stuck.
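For experimenting with the LCS problem, the textbook dynamic program is all you need. The sketch below also illustrates the structural fact behind the subadditive method: concatenating two pairs of strings can only help, so the LCS length is superadditive. The test strings are arbitrary examples.

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of s and t."""
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

s1, t1 = "ABCBDAB", "BDCABA"
s2, t2 = "XMJYAUZ", "MZJAWXU"
separate = lcs_length(s1, t1) + lcs_length(s2, t2)
joined = lcs_length(s1 + s2, t1 + t2)          # always at least `separate`
```

Taking expectations for random strings turns this into superadditivity of the expected LCS length, which is what drives the limit.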

Concentration Inequality Survey. Svante Janson has a succinct and highly readable survey of the basic facts about concentration that one needs to know to do probability and combinatorial optimization. We'll peek into this from time to time.

### Day 14 (Last Before the Break)

I'll discuss a problem from the probabilistic theory of Bin Packing. This is a well-studied subject, which I just recently learned has a powerful connection to a problem that I mentioned earlier in class --- the longest increasing subsequence problem.

I'll go over a nice argument from a paper P (Gnedin, 1999) which uses a "relaxation argument" that I certainly would have missed.

I'll do a little "proof mining" of this result. In particular, I'll calculate a variance that was not addressed in the paper. We may even push a little harder and trot out a martingale CLT to see if we can squeeze out a full limit theorem --- a decent start on a potential P+1, though not deep enough just yet.

There are many sources for martingale CLTs but certainly a pleasing one is McLeish (1974). This is a very nicely written paper --- which has served as P for many P+1's.

We also have sidebars to discuss. I especially want to point out (1) the balayage trick to "keep a mean and increase a variance", (2) the nice observation that conditional expectation "preserves concentration" --- this has applications to the hypergeometric distribution, and (3) perhaps the trick of Jacobowski-Kwapien --- which pre-dates "modern" concentration theory but which still may have some juice left in it.

### Day 13 (Monday February 28) More on the Li Model, then Sklar, Coupling ...

Micro-Sidebar. We'll have distinguished guests today; please welcome them and be sure to point out all the ways in which it is great to be a graduate student at Wharton and UPenn.

We'll start by collecting the project proposals and by venting some of the agonies and dilemmas that you may have run into in writing your proposals. I can tell ahead of time that there will be one "confusion" that will be almost universal: The temptation to talk about doing "something more than is in P" before having a serious understanding of "what's really done in P." Early in one's research life, this seems to be part of the human condition. This modest self-deception never really goes away, but we can all struggle against it together.

We will then look at a general theorem that is related to Li's approach to modeling dependence of the kind that one imagines in bond defaults. The result is called Sklar's theorem, and it is about the only theorem in the "theory of copulas." It is insanely simple, but it does shed some general light on how one might model dependence. As a favor to me, don't give this "theorem" more respect than it deserves.
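For reference, Sklar's theorem fits on one line. The notation here is chosen for this note and is not taken from the lecture: H is a joint distribution function with marginals F_1, ..., F_n, and C is the copula.

```latex
% Sklar's theorem (the symbols H, F_i, C are notation assumed for this note)
H(x_1,\dots,x_n) \;=\; C\bigl(F_1(x_1),\dots,F_n(x_n)\bigr),
\qquad C:[0,1]^n \to [0,1] \text{ a distribution function with uniform marginals.}
% When every F_i is continuous, C is unique and can be read off directly:
C(u_1,\dots,u_n) \;=\; H\bigl(F_1^{-1}(u_1),\dots,F_n^{-1}(u_n)\bigr).
```

So all the theorem says is that a joint distribution splits into its marginals plus a recipe C for gluing them together.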

The one leftover is to show how to simulate a multivariate Gaussian where all correlations are equal. This is easy with "exogenous" randomization, but it can be done with "just the randomness that you've got."
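Here is a minimal sketch of both constructions, under assumptions I should flag: the function names are made up for illustration, and the "no extra randomness" version uses the representation X_i = a Z_i + b (Z_1 + ... + Z_n), which is one natural choice and may not be the one used in class. Matching variance 1 and correlation rho gives a^2 = 1 - rho and n b^2 + 2ab = rho.

```python
import math
import random

def equicorrelation_coefficients(n, rho):
    """Return (a, b) so that X_i = a*Z_i + b*(Z_1+...+Z_n) has unit variance
    and pairwise correlation rho when the Z's are iid N(0,1)."""
    if n < 2 or not (-1.0 / (n - 1) <= rho <= 1.0):
        raise ValueError("need n >= 2 and -1/(n-1) <= rho <= 1")
    a = math.sqrt(1.0 - rho)
    # b is the nonnegative-discriminant root of n*b^2 + 2*a*b - rho = 0.
    b = (-a + math.sqrt(max(a * a + n * rho, 0.0))) / n
    return a, b

def sample_with_extra_factor(n, rho, rng):
    """Exogenous randomization: one extra Gaussian Z0 shared by all coordinates.
    Works only for 0 <= rho <= 1."""
    z0 = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n)]

def sample_without_extra_factor(n, rho, rng):
    """Uses only the n Gaussians we already have."""
    a, b = equicorrelation_coefficients(n, rho)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total = sum(z)
    return [a * zi + b * total for zi in z]
```

A side benefit of the second version is that it also handles negative correlations, down to the smallest feasible value -1/(n-1).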

This matter of manipulating randomness can lead to a discussion of several other ideas. One of these is "balayage" --- where one sweeps probability from some place to some place else. Another is coupling, and we may look at a lovely coupling proof of LeCam's inequality on Poisson approximation.

As time permits, we'll look at some sidebars (two good ones below) or discuss a nice example of "optimal stopping" where exogenous randomization gives a surprising result. I did mention this example before (very quickly), but now perhaps you will see the example in a different light.

### Sidebar: Prophet Inequality Techniques

There is a useful (but somewhat old --- 1992) survey of Prophet inequalities by Ted Hill that I would encourage you to review. I will probably discuss some of the ideas that he summarizes under "Techniques" beginning on page 197. I particularly like the "balayage" technique that one can use to replace random variables with ones that match on the mean but which otherwise are more "extreme". This copy is not well copied; it is a bit too elastic.

### Sidebar: Optimal Stopping of Averages

There is a nice American Mathematical Monthly paper that addresses in the simplest possible way some problems of optimal stopping, especially stopping of an average. There is a sports story attached to this kind of problem and I will tell it even though I really do not feel comfortable when "statistics becomes baseball."

### Day 12 (February 23) More on Your Report, MCMC, and The Li Model

I'll cover a few more of the items that I mention in the piece on my top suggestion on how to organize your report.

### Sidebar: MCMC and a Very Nice Example

Persi Diaconis has written a very nice survey of Markov Chain Monte Carlo. Please do read at least the first few pages. The piece was written for mathematicians, so it does aim to show the connections with a wide range of mathematical tools and ideas. There is also a nice piece from the American Mathematical Monthly that covers the history of MCMC.

### Copulas, the Li Model, and Scapegoats (Part I)

I know that some of the people in the class have an interest in mathematical models in finance, and I hope it will be fun for all if I spend some time discussing one of the more infamous models associated with the Financial Crisis of 2007-2009: the Li Model.

We can use a nice 2009 paper by Donnelly and Embrechts as our guide. Don't worry, we will not try to cover the 633 page report of the FCIC, though I may comment on some of its conclusions.

This material is a little off of our main trail. Still, we did do Black-Scholes time before last, and this material gets much more to the heart of many of the issues that the quants of 2011 need to face. We'll probably also benefit by having a restful break from the heavier mathematics of the Ito Calculus. Finally, there is much material here for "easier" projects.

The main brick in this wall is the idea of a copula --- an idea that is almost "trivial." Nevertheless, as we all know, it is the "trivial" stuff that has the most scope for application. Even though they have been a bit over-hyped from time to time, copulas are definitely "news you can use."

Copulas came into existence to address two issues. First, lots of people have a love for correlation coefficients. Second, once you leave Gaussian land, the plain vanilla correlation is an entity of "diminished capacity" --- perhaps not always nuts, but too nutty to be given driving privileges. Copulas contribute in a simple and straightforward way to the challenge of expressing dependence in non-Gaussian models --- that's the good news. We'll deal with the bad news later.
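To make the idea concrete, here is a hedged sketch of a one-factor Gaussian copula in the spirit of the Li model. The exponential marginals, the single common factor, and the names `normal_survival` and `default_times` are assumptions made for illustration; they are not taken from the Donnelly and Embrechts paper.

```python
import math
import random

def normal_survival(x):
    """P(Z > x) for a standard Gaussian Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def default_times(hazards, rho, rng):
    """One draw of default times T_i with Exp(hazards[i]) marginals, glued
    together by a Gaussian copula with common correlation rho in [0, 1]."""
    common = rng.gauss(0.0, 1.0)  # shared "market" factor (an assumption)
    times = []
    for lam in hazards:
        x = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        # U = Phi(x) is uniform, so T = -log(1 - U) / lam is exponential.
        # 1 - Phi(x) is computed directly as a survival probability.
        times.append(-math.log(normal_survival(x)) / lam)
    return times
```

The marginals stay exactly exponential no matter what rho is; only the dependence changes. That separation of "marginals" from "glue" is the whole point of a copula.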

### Matchings and Stars

There is a nice little graph theoretic result that echoes the arithmetic of the Erdos-Szekeres Theorem, but is really quite different. It is one of the simplest examples of a graphical Ramsey number, and everyone should know about these. (Recent examples A, B, C)

### Erdos

There are many articles about Erdos on the web; I'll just link to Bondy's piece. It reminded me of several things, one of which is Turan's theorem on the independence number of a graph. I may give a brief probabilistic proof of this some time in class.
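One well-known probabilistic argument (an assumption about which proof is meant) puts the vertices in a uniformly random order and keeps every vertex that comes before all of its neighbors. The kept set is independent, and vertex v is kept with probability 1/(d(v)+1), so some independent set has size at least the sum of 1/(d(v)+1). The sketch below checks this exactly on tiny graphs; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def degree_bound(adj):
    """Sum over vertices of 1/(degree + 1)."""
    return sum(Fraction(1, len(nbrs) + 1) for nbrs in adj.values())

def first_in_order_set(adj, order):
    """Vertices that precede all of their neighbors in the given ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return {v for v in adj if all(pos[v] < pos[w] for w in adj[v])}

def expected_size(adj):
    """Exact average size of first_in_order_set over all orderings.
    Brute force, so only sensible for very small graphs."""
    orders = list(permutations(adj))
    total = sum(len(first_in_order_set(adj, order)) for order in orders)
    return Fraction(total, len(orders))

# Example: the 5-cycle, where every vertex has degree 2.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
```

By convexity the degree sum is at least n/(d+1), where d is the average degree, which is the form of the bound usually attached to Turan's name.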

### Processors to Compute Max in One Round?

There is a nice exposition of Turan's Theorem that gives some geometrical applications and some applications to multiprocessor computations.

### Free Book: Meyn and Tweedie On-Line

You can legitimately access the very useful book by Meyn and Tweedie on line. If you have a Markov chain and you are stuck trying to show that it is recurrent (or has a stationary distribution, etc.) then this is the place to look for more techniques that go past the elementary ones.

I have written an information page about the projects. Your project proposals will be due before Spring break. Everyone is responsible for having read and understood the project requirements. I've also added a page that I hope explains how your project paper should differ from the papers you have read.

Please DO float your ideas past me. I will have specific suggestions, but you can anticipate that the suggestions will focus on certain points: (1) cutting the fluff --- keeping the "story" to the minimum (2) KISS: For a result or idea to be useful, it is virtually forced to be simple. Keep in mind how many people make a living with Chebyshev's inequality; the day you mastered it was a glorious day. (3) Understanding is just the first step --- and it takes a little time. (4) You can let go of looking for something "new" --- except as a new figure, new example, new computation, or other smaller piece of the story.

### Day 11 (February 21)

Just to add briefly to our discussion of the final paper, I have some suggestions about a creative, easy, instructive way to organize your report.

We'll look at a couple of different topics today. After this short detour we'll first deal with some old business ...

### Exploration of a Formula and The Zero'th Fourier Coef Trick

I discussed a general trick for writing a series as an integral. The benefits of an integral representation are many, e.g. you have natural transformations (like substitutions and integration by parts), you have natural estimations (from truncation, Schwarz inequality, etc.), and you have a body of polished approximation tools (Laplace approximation, method of stationary phase, etc.)
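As a small illustration, here is one common reading of the trick (an assumption, since these notes do not spell it out): a sum that equals the constant term of a Laurent polynomial f(z) can be written as the average of f over the unit circle. For example, the sum of the squares of C(n,k) is the constant term of (1+z)^n (1+1/z)^n, which equals |1+z|^(2n) on the circle.

```python
import cmath
import math

def zeroth_coefficient(f, num_points):
    """Average f over num_points equally spaced points of the unit circle.
    This approximates (1/(2*pi)) * integral of f(e^{i*theta}) d(theta), and
    it is exact for trigonometric polynomials of degree below num_points."""
    total = 0.0
    for k in range(num_points):
        z = cmath.exp(1j * 2.0 * math.pi * k / num_points)
        total += f(z)
    return total / num_points

def sum_of_squared_binomials(n):
    """sum_k C(n,k)^2 as the zeroth Fourier coefficient of |1+z|^(2n)."""
    return zeroth_coefficient(lambda z: abs(1 + z) ** (2 * n), 2 * n + 1)
```

The integral form makes the familiar identity with C(2n,n) easy to check numerically, and it is also the starting point for a Laplace-type estimate of the sum.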

### Discussion of P to P+1 and the "Problem-Solution" Framework.

I introduced the Erdos-Szekeres theorem on monotone subsequences, and gave a proof using "labeling" and "pigeonhole". This served as an illustration of P to P+1, since I have written many papers that grew out of Erdos-Szekeres. It also illustrated "Problem-Solution" because of the reasonably pleasing way one can begin with an example that illustrates how first to understand the problem and then solve it.
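The labeling step is easy to make concrete. In the sketch below (the function name is made up), each term gets the pair (length of the longest increasing subsequence ending there, length of the longest decreasing subsequence ending there). For distinct numbers no two terms can share a label, so ab+1 terms cannot all fit into the a-by-b grid of labels, and some term must start a run of length a+1 in one direction or b+1 in the other.

```python
def monotone_labels(seq):
    """Label position i with (inc, dec): the lengths of the longest increasing
    and the longest decreasing subsequences that end at seq[i]."""
    labels = []
    for i, x in enumerate(seq):
        inc = 1 + max((labels[j][0] for j in range(i) if seq[j] < x), default=0)
        dec = 1 + max((labels[j][1] for j in range(i) if seq[j] > x), default=0)
        labels.append((inc, dec))
    return labels

# Nine distinct terms can avoid a monotone run of length 4 ...
tight = [3, 2, 1, 6, 5, 4, 9, 8, 7]
# ... but any tenth term forces one.
forced = tight + [10]
```

The example `tight` fills the 3-by-3 grid of labels exactly, which shows that the bound ab+1 cannot be improved.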

### Distributed Networks and Optimization

Also, several people in the class are interested in optimization in distributed networks. This is an interesting and rapidly growing field. I'll illustrate the process of "fluff elimination" by introducing a simple interesting mathematical problem that is completely jargon free. In fact, I'll frame the example as a "problem." This is very often a useful way to get a reader (or listener) involved, and I encourage you to consider this approach in your paper.

I'll also argue for the "mathematics first" and "metaphors second" approach to exposition in applied mathematics. I am convinced that this is the way to be maximally clear. Still, this approach is not the most popular one, and I think that I know why.

A useful reference for some of the mathematical aspects of social networks is the recent survey of Duchi, Agarwal, and Wainwright. The mathematics here is related to earlier work on on-line optimization, especially the projected gradient algorithm of Zinkevich.
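For orientation, here is a minimal one-dimensional sketch of online projected gradient descent with step sizes proportional to 1/sqrt(t). The interval constraint, the function names, and the toy loss in the usage lines are assumptions for illustration, not details from the survey.

```python
import math

def project_to_interval(x, lo, hi):
    """Euclidean projection of x onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

def online_projected_gradient(gradients, x0, lo, hi, step=1.0):
    """Play a point, see this round's gradient, take a step, project back.

    gradients: one function per round, mapping the point played to a
    (sub)gradient of that round's convex loss at that point.
    Returns the list of points played."""
    x = project_to_interval(x0, lo, hi)
    played = []
    for t, grad in enumerate(gradients, start=1):
        played.append(x)
        eta = step / math.sqrt(t)  # decreasing step size
        x = project_to_interval(x - eta * grad(x), lo, hi)
    return played

# Toy usage: the same loss (x - 1)^2 in every round, on the interval [-2, 2].
points = online_projected_gradient([lambda x: 2.0 * (x - 1.0)] * 50, -2.0, -2.0, 2.0)
```

In the genuinely online setting the losses change from round to round, and the interesting statement is a regret bound of order sqrt(T) rather than convergence to a single minimizer.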

### Fact or Fiction?

"An average person has about 100,000 hairs on their scalp. Redheads have 90,000 while blonds have 140,000; brunettes fall in between." This is an amusing, unintuitive assertion. Do you believe it or not? Why?

### Day Ten (February 16)

Project time is getting closer, and I want to go over some resources and suggestions. More generally, we want to "start the conversation."

Your project proposal (3 pages in Latex, with the PDF of one "resource") will be due on Monday February 28 both by email and in hard copy (at class time).

The quality of your papers will be maximized if we have lots of interaction all along the process --- in class, by email, and in office hours.

Our main business of the day will be to add to our stochastic calculus tool kit. In particular, we'll look at Ito's formula for functions of time and space. This puts us so close to the Black-Scholes theory that we would be remiss if we did not give a heuristic derivation of the Black-Scholes PDE.
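For the record, here are the two formulas in standard notation. The symbols are assumptions of this note: B_t is Brownian motion, the stock follows dS_t = \mu S_t dt + \sigma S_t dB_t, r is the interest rate, and V(t,S) is the option value.

```latex
% Ito's formula for a smooth function f(t,x) of time and space
df(t,B_t) \;=\; f_t(t,B_t)\,dt \;+\; f_x(t,B_t)\,dB_t \;+\; \tfrac12 f_{xx}(t,B_t)\,dt .

% Black-Scholes PDE for the option value V(t,S)
V_t \;+\; \tfrac12 \sigma^2 S^2 V_{SS} \;+\; r S V_S \;-\; r V \;=\; 0 .
```

Notice that the drift \mu does not appear in the PDE. In the heuristic derivation it cancels when one holds the option against V_S shares of stock.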

This also leads us to the most important optimal stopping problems --- those associated with American options. It turns out that for call options, we never have an "early exercise." This turns out to be an easy convexity argument --- though one needs more than the Black-Scholes formula to get there.

The optimal stopping problem for American puts is not so easy. We'll put this within range of numerical solution, but --- alas, there is no "formula." We won't go "whole hog" into this theory, but it is "culturally" worth a look.
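One standard way to bring the American put "within range of numerical solution" is backward induction on a binomial tree. The sketch below uses the Cox-Ross-Rubinstein parametrization, which is my choice here and is not necessarily what we will use in class; the function name and arguments are hypothetical.

```python
import math

def binomial_price(s0, strike, r, sigma, maturity, steps, kind="put", american=True):
    """Option price on a Cox-Ross-Rubinstein tree by backward induction."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - down) / (up - down)  # risk-neutral up probability

    def payoff(s):
        return max(strike - s, 0.0) if kind == "put" else max(s - strike, 0.0)

    # Values at maturity; index j counts the number of up moves.
    values = [payoff(s0 * up ** j * down ** (steps - j)) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            value = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                # Stop (exercise) whenever the payoff beats continuing.
                value = max(value, payoff(s0 * up ** j * down ** (n - j)))
            values[j] = value
    return values[0]
```

Running this with `kind="call"` gives the same price whether or not early exercise is allowed, which is the numerical shadow of the convexity argument for calls. For puts the American price comes out strictly larger.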

On the rigorous side, we'll now start to saddle up to Bellman's principle and dynamic programming. This is an enormously fecund idea and the applications are highly modular. These are among my favorite features.

### Sidebar: Bandit Page

We'll take another look at the Bandit resource page. In particular, I have added a new survey paper that will be our basic resource.

### Sidebar: Writing

The project pages give some suggestions about writing that are relevant to this particular project. Those suggestions really do need your attention.

If you get interested in writing more generally, you may want to look at some resources on writing. I collected these a while ago, and I don't want to stuff your "to do" list. Still, at some point in your life you might say to yourself, "Hey, I might get a bit more credit for my efforts if I could write a little better." When that moment comes, this is a place to look.

I hope you will make writing a large part of the experience of doing this paper, but I don't want to pester you about writing skill. It really will be fine if you are honest and clear. For me, good notation is more important than good grammar. Caveat: Being honest and clear is damn hard.

### Day Nine (February 14) Valentine's Day!

As a micro-sidebar, I'll first review a formula from second year calculus that illustrates ways one can "think about a formula." We've been doing this with Ito's formula, but it may be instructive to take a step back and take a look at an even simpler formula.

The main business remaining from last time is to show that the optimal stopping rule for one-dimensional Brownian motion is the old stand-by: you stop when you hit the set GAMMA where the value function is equal to the pay-off function.

We'll then press on to optimal stopping problems for Brownian motion in two and higher dimensions. We'll need expanded versions of Ito's formula, and these turn out to give lovely connections to classical applied mathematics. We could spend weeks exploring these formulas, but we'll just have a taste.

We'll also discuss the "usual theorem" for optimal stopping in the new context of BM in higher dimension. Specifically, we'll characterize the value function and the optimal stopping rule for Brownian motion in dimension two (and higher). The pattern is now familiar, but we do have new work to do. Along the way, you'll see the real reason that super martingales are called super martingales.

Finally, I want to take a little time for more applications of the "interchange argument", including a "path following proof" for Birkhoff's theorem on doubly stochastic matrices.

### Information about Fall 2011: 955 Stochastic Calculus Will Be Offered!

It turns out (unexpectedly to me) that Stochastic Calculus and Financial Applications will be offered in the Fall term of 2011. This course provides an honest development of the stochastic calculus for Brownian motion and related processes. It is aimed squarely at financial applications --- and does a bunch of them, e.g. several derivations of the Black-Scholes theory including the widely applied Harrison-Kreps "risk neutral" formulation for asset pricing. Students who have had Statistics 530 will be appropriately prepared.

### Day Eight (February 9)

This is an Ito Formula Day. We mainly focus on how to use Ito's formula to solve problems and do concrete calculations, but I will also sketch the proof. In fact our proof will be completely valid "at one point" --- the more tedious part of the rigorous proof is to show that we can do all the points at the same time.

We'll see how Ito's formula gives new insight into the moments of a Gaussian random variable and new insights into Jensen's inequality. Most critically for the theory of optimal stopping, we'll see how it helps us characterize the excessive functions. This in turn helps us to characterize the value function of optimal stopping.
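As a sketch of the moment calculation (standard, though the details in class may differ), apply Ito's formula to f(x) = x^{2k}, take expectations so that the stochastic integral drops out, and iterate. Here m_{2k}(t) is shorthand for E[B_t^{2k}].

```latex
B_t^{2k} \;=\; 2k\int_0^t B_s^{2k-1}\,dB_s \;+\; k(2k-1)\int_0^t B_s^{2k-2}\,ds
\quad\Longrightarrow\quad
m_{2k}(t) \;=\; k(2k-1)\int_0^t m_{2k-2}(s)\,ds .

% Starting from m_0(t) = 1 and integrating repeatedly:
m_2(t) = t, \qquad m_4(t) = 3t^2, \qquad m_{2k}(t) = (2k-1)!!\; t^k .
```

Setting t = 1 recovers the even moments of a standard Gaussian, including the fourth moment 3.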

### Day Seven (February 7)

The first bit of business is to discuss how to extend our characterization of the optimal stopping time to countably infinite state spaces. Here we will have to be content with an epsilon-optimal strategy, but that is always good enough. This completes the "theory" that we need to show that our heuristic stopping rule for the secretary problem is indeed an optimum rule, though we will leave the final calculation until we revisit the secretary problem with the Bellman equation approach.

This completes our first "big picture" cut at the theory of optimal stopping in discrete time. We'll then start looking at the corresponding theory in continuous time, where we typically take Brownian motion as the underlying Markov process. We won't belabor the point, but in this setting the "potential theory" of Markov processes really is the "potential theory" of electrostatics.

As time permits, there are some sidebars to consider. In particular, I'd like to give the "interchange argument" for the rearrangement inequality. This kind of argument will show up again in the more complicated context of Bandit problems.

### Bandit Theory

I've posted some resources for our development of "Bandit Theory." This is a rich part of applied probability with many good applied stories --- e.g. problems like price discovery, choice of movie reviewers to trust, etc. --- in addition to sequential clinical trials (which is too bio-staty for my taste).

### Great Lottery Story

In 2003 a geological statistician (and cryptographer) cracked the Ontario tic-tac-toe lottery. A recent article in Wired Magazine gives a nice account of this discovery, including an explicit algorithm for winning 90% of the time. Note: This hack has been plugged.

### Kruskal's "Asymptotology"

I mentioned in class that there is a manuscript by Martin Kruskal that was for years a "Bible" for applied mathematicians --- especially graduate students in applied mathematics at Princeton. The manuscript is now a little dated; it came out in 1962 and even then it was always regarded as "informal." Still, a lot of people have made a living from it for many years. In fact, many continue to do so today.

In places the manuscript assumes more experience in physics and differential equations than we assume here, so you should skip page 4 and keep reading. Play around and you will get something from it no matter what your background. At a minimum, you probably want to learn about Newton diagrams and the idea of "dominant balance." There are many pieces of "philosophy" here that can pay the rent.

### Day Six (February 2 --- Groundhog's Day)

We back up a bit and take a more leisurely look at the set up for optimal stopping theory. In particular, we define superharmonic functions and excessive functions for Markov Chains. We'll look at several examples and ponder the idea that superharmonic functions are a natural generalization of concave functions. This idea has consequences not just for the theory of optimal stopping but also for graph theory and many other fields.

We then look at the most fundamental object in the theory of optimal stopping: The Value Function. Our main task is to prove the (easy but important) fact that the value function can be characterized as "the smallest excessive function that is at least as large as the pay-off function."

This characterization of the value function then gives us a recipe for an optimal stopping rule. In the case of a finite chain, we define the stopping set as the set where the objective function equals the value function, and we show that the first hitting time of this set provides a solution to the optimal stopping problem.
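For a finite chain the recipe can be tried out directly. The sketch below iterates v <- max(g, Pv) starting from v = g, which increases toward the smallest excessive function dominating g. The iteration scheme, the tolerance-based stopping, and the function names are illustrative assumptions, not the construction in Dynkin and Yuskevich.

```python
def value_function(P, g, tol=1e-12, max_iter=100000):
    """Approximate the smallest excessive majorant of the payoff g for the
    transition matrix P (a list of rows) by iterating v <- max(g, P v)."""
    n = len(g)
    v = list(g)
    for _ in range(max_iter):
        Pv = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_v = [max(g[i], Pv[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v

def stopping_set(v, g, tol=1e-9):
    """States where the value function equals the payoff."""
    return [i for i in range(len(g)) if abs(v[i] - g[i]) <= tol]

# Example: symmetric random walk on {0,...,4}, absorbed at both ends.
P = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
g = [0.0, 2.0, 0.0, 3.0, 0.0]
v = value_function(P, g)
```

For this walk the excessive functions are exactly the concave ones, so v is the smallest concave majorant of g, and the rule is to stop the first time the walk lands where v and g agree.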

This chunk of theory is called the "Potential Theoretic" approach to the optimal stopping problem, and we will have followed the development of Dynkin and Yuskevich "Markov Processes: Problems and Exercises" Chapter 3. We'll stick with their approach for a while, then we will take a look at an alternative method based on dynamic programming (backward induction) and Bellman's equation.

Sidebar: If we have a sidebar today it will be about the "transformation method" for proving inequalities. I use this in the CSMC to prove the fundamental theorem of majorization, but in retrospect it makes more sense to see this technique first in the simplest context: the proof of the AMGM inequality.

Amuse Bouche: The simplest family of non-commutative groups are the dihedral groups. These suggest some Markov chains that are just a bit more complicated than the Markov chains on a cycle --- so the superharmonic functions associated with the dihedral group are in some ways the simplest "generalization" of concave functions. Exploring this could make a good project.

### Day Five (January 31)

This was the day for the famous secretary problem. We first turned the focus on record times and found that this was a Markov chain. We then computed explicitly the probability of having made the best choice given that we stop at a record instant with index m. Then we invoked the heuristic principle of "don't take a step unless its expected value is bigger than what you have right now." In the end, we found a strategy with probability 1/e of choosing the best. We were left with some concern whether this really is the optimum or not. That takes us to the general theory of optimal stopping.
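The success probability of a cutoff rule can be computed exactly. In the sketch below (the names are hypothetical) we pass on the first r candidates and then take the first record. If the best candidate sits at position k > r, we win exactly when the best of the first k-1 is among the first r, which has probability r/(k-1).

```python
from fractions import Fraction

def success_probability(n, r):
    """P(we pick the best of n) when we pass on the first r candidates and
    then take the first one who beats everyone seen so far."""
    if r == 0:
        return Fraction(1, n)  # we simply take the first candidate
    return Fraction(r, n) * sum(Fraction(1, k - 1) for k in range(r + 1, n + 1))

def best_cutoff(n):
    """The cutoff r that maximizes the success probability, with its value."""
    r = max(range(n), key=lambda c: success_probability(n, c))
    return r, success_probability(n, r)
```

For large n the best cutoff is close to n/e and the success probability is close to 1/e, in line with what we found in class.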

Sidebar: Early in the class we discussed very briefly the "Marriage Lemma" --- one of the most useful of all combinatorial facts. Everyone should know a proof (or two) of this lemma.

### Day Four (January 26)

We want to master our three Wald identities, so for practice we will use them to prove some other useful results. One of these is sometimes called Kolmogorov's "Other" inequality. In turn, we'll use this inequality to give a proof of Kolmogorov's condensation theorem. This surprising and beautiful theorem was proved last term in Stat 530 using characteristic functions.

I'll then prove the famous Chung-Fuchs recurrence theorem. This is one of the most fundamental theorems about random walks. Our proof will not use much more than a few "symmetry" observations and a little help from the simplest of the Wald lemmas.

Shortly --- quite possibly today --- we'll begin the real march on real optimal stopping problems. Chapter 3 of Dynkin and Yuskevich "Markov Processes: Problems and Exercises" will be our guide. This path will take us well into February --- or further depending on how much detail we devote to Dynkin and Yuskevich and how many detours we take.

Variational Method Sidebar: In physical science one typically approaches optimization problems via a variational method such as Fermat's principle and its generalizations. In OR and CS one typically uses "step wise improvement" methods with no visible derivatives. Sometime (maybe today) I'll show how you can get an LP type result (Gordan's theorem of the Alternative) using a variational fact --- the approximate Fermat principle. The proof uses a very clever function related to the "soft max". This is a very fecund trick.

Farkas Sidebar: The Fundamental Theorem of Asset Pricing asserts that (1) either there exists a portfolio that is guaranteed to make money in the next period or (2) there exists a probability distribution on the states of nature (possible values of next period prices) such that the expected price vector next period is equal to the current price vector. With a little patience, you can show that this "financial theorem of the alternative" is a direct consequence of Farkas's Lemma. (see e.g. reference)

### Day Three (January 24)

To stir the pot for "optimal stopping" theory, I'll review a Martin Gardner type problem about amounts of money written on two cards. This problem has led to many loud arguments in mathematical lunch rooms around the world.

I then gave some reminders about martingale theory (Doob's Stopping Time Theorem, Doob's Maximal inequality). The more novel business was to discuss three members of the family of "Wald's Identities".

The results depend on Doob's Stopping Time Theorem to get us going, but to get serious identities, one has to make a few sharp observations that exploit independence. In particular, we'll find some orthogonality that one might not have expected to show up uninvited.
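For orientation, here are three identities that usually go by Wald's name. This is an assumption about which three are meant, and the integrability conditions are only indicated. Take S_n = X_1 + ... + X_n with the X_i independent and identically distributed with mean \mu, variance \sigma^2, and moment generating function \varphi(\theta) = E[e^{\theta X_1}], and let \tau be a stopping time with E[\tau] finite.

```latex
% First identity
E[S_\tau] \;=\; \mu\,E[\tau].
% Second identity
E\bigl[(S_\tau - \mu\tau)^2\bigr] \;=\; \sigma^2\,E[\tau].
% Exponential (fundamental) identity, under suitable conditions on \tau and \theta
E\bigl[e^{\theta S_\tau}\,\varphi(\theta)^{-\tau}\bigr] \;=\; 1 .
```

The second identity is where the uninvited orthogonality shows up: the cross terms vanish because each increment is independent of the decision to keep going.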

### Day Two (January 19)

First, we went over Tao's inductive proof of Farkas's Lemma. I then made the pitch that Farkas's Lemma makes some proofs "mechanical" because the resulting identity from the "second alternative" is so strong. This bit of philosophy was illustrated by a "mechanical" proof of the Duality Theorem for linear programming.

We then looked at the notion of a Cesaro average and pondered why one might be interested in results like Hardy's Tauberian theorem. The result is indeed useful in itself (it tells us something handy about averages!) but it also motivates the discussion of the discrete Taylor formula. We proved this (it's a two liner) and then applied it to obtain Hardy's theorem. The central idea was to "solve Taylor's Formula for the middle term." In a way, this is Newton's algorithm. This trick turns up in many "Big O to little o" arguments and it has saved my bacon in several papers.

### Day One (January 12)

The main business was the presentation of a mind-map of the course topics.

I then mentioned a little problem from 530 that illustrates some ideas about convergence of incompletely specified dynamical systems. In the little problem, the incompleteness came from the fact that instead of having an evolution equation we just had an inequality.

We then began a discussion of Farkas's Lemma. This is a core result of optimization theory and it also illustrates some very general mathematical ideas: (1) the structure of a "theorem of the alternative" and (2) a case where the "trivial obstruction" is the "only obstruction."
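One standard form of the lemma, which may differ in detail from the version we proved, reads as follows for a real m-by-n matrix A and a vector b in R^m.

```latex
% Farkas's Lemma: exactly one of the following two systems has a solution.
\text{(I)}\quad \exists\, x \in \mathbb{R}^n:\; Ax = b,\; x \ge 0
\qquad\text{or}\qquad
\text{(II)}\quad \exists\, y \in \mathbb{R}^m:\; A^{\mathsf T} y \ge 0,\; b^{\mathsf T} y < 0 .
```

A solution y of (II) is the "trivial obstruction" to (I): if both held, then 0 \le (A^{\mathsf T} y)^{\mathsf T} x = y^{\mathsf T} b < 0, a contradiction. The content of the lemma is that no other obstruction exists.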

If you are taking this year's version of Stat 900 and want the credit for a reading course, you need to set this up by January 25 (registrar's deadline). I would encourage you to handle this ASAP. You can set this up by contacting Tanya Winder, academic coordinator for the department of statistics (winder@wharton.upenn.edu).

## About the Course in Spring 2011 --- It's All New!

This course covers different topics each year. If this year's topics are of interest to you and you have already taken Stat 900, you can sign up for this year's course under an independent study number.

This is an advanced course but it is expected that students will come from diverse backgrounds and have a wide range of interests, so the actual technical requirements are kept pretty light. This is a course that is designed to be inclusive. People with very different skill sets should still find plenty of material that is both interesting and accessible. You will see that the course deliverables are designed to accommodate this diversity.

TIME AND PLACE: MW 10:30-11:50 JMHH 94

## What's Covered This Year?

This year we will deal with topics in both discrete time and continuous time. The overarching theme will be the relationship between probability and optimization. We will be interested both in algorithms and in concrete mathematical solutions (i.e. exact formulas --- when we can get them!).

In particular we will cover:

## Processes and Procedures:

There will not be regular homework or tests, but instead students are expected to develop a project. This need not be a genuine "research project" but it should reflect the skills that one needs to get up-close and personal with an honest research problem.

The main deliverable will be a final project paper of approximately twenty pages. There will be preliminary deliverables of a project proposal, a project progress report, and a half-hour classroom presentation on the material of the project. It is OK for you to "double purpose" some of this work (e.g. it can be part of a master's thesis, etc.) but you and I will need to negotiate the details.