# Statistics 530: Probability Theory (Steele)

## Problem 14 Comment (12/16)

In problem 14, by conv(X_n,...) I mean the convex hull of all of the random variables in the parentheses. That is, the set of all random variables that can be written as convex combinations of the random variables in the parentheses. I mentioned this in class.
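In symbols, the definition can be sketched as follows. This assumes the list in the parentheses is $X_n, X_{n+1}, \ldots$ and that convex combinations are finite (only finitely many nonzero weights); the weights $\lambda_k$ and the index $m$ are notation introduced here, not taken from the exam.

```latex
\[
\operatorname{conv}(X_n, X_{n+1}, \ldots)
  = \Bigl\{ \sum_{k=n}^{m} \lambda_k X_k \;:\; m \ge n,\ \lambda_k \ge 0,\
      \sum_{k=n}^{m} \lambda_k = 1 \Bigr\}.
\]
```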

## Problem 10 Comment (12/15)

I slightly refined the wording of problem 10 to avoid any ambiguity.

## Problem 4 Comment (12/7)

In problem 4, it is possible that S_n is negative, so the square root can be complex. For a careful, complete proof, you need to show that the complex part goes to zero in probability. This just requires a comment on the behavior of S_n.

## Final Exam Version 2.0

Final v2.0 PDF Latex Source Cover sheet PDF Latex Source. If you put the file mydefs.sty in the same directory as the exam and the cover sheet, everything will compile automagically.

Your solutions are due NOON DEC 18 ---PDF by email AND hard-copy in my mail box (unless you are out of town).

"This is the extraordinary thing about creativity: If just you keep your mind resting against the subject in a friendly but persistent way, sooner or later you will get a reward from your unconscious." --- John Cleese from his essay on creativity.

## Day 23

For the warm-up we'll look again at Kolmogorov's one series theorem, and we'll give the proof using characteristic functions. This will be amazingly automatic in comparison to the clever argument by Varadhan that we saw earlier in the year. It is often the case that, when they can be applied, the methods of characteristic functions are far more powerful and more straightforward than "real variable" methods.

There is a comment that connects the one-series theorem to McLeish's theorem and the martingale CLT, but we'll not revisit the McLeish theorem just yet. Instead we'll build a little more background with martingales. In particular, we'll do the Doob decomposition, which shows that any square integrable submartingale can be written as the sum of a martingale and a non-anticipating, non-decreasing process. This is the discrete time version of one of the main theorems of continuous time martingale theory, the famous Doob-Meyer decomposition.
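For reference, here is a sketch of the standard form of the decomposition. The notation ($X_n$ for the submartingale, $\mathcal{F}_n$ for the filtration, $M_n$ and $A_n$ for the two pieces) is introduced here and may differ from what we use in class.

```latex
\begin{align*}
A_0 &= 0, \qquad
A_n = \sum_{k=1}^{n} E\bigl[\, X_k - X_{k-1} \,\big|\, \mathcal{F}_{k-1} \bigr], \\
M_n &= X_n - A_n, \qquad\text{so that}\qquad X_n = M_n + A_n .
\end{align*}
```

Each summand of $A_n$ is non-negative by the submartingale property and is $\mathcal{F}_{k-1}$-measurable, so $A_n$ is non-decreasing and non-anticipating. A one-line check shows that $E[M_n - M_{n-1} \mid \mathcal{F}_{n-1}] = 0$.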

If we have time, we may also put the "Stein method" into play. This is really a constellation of methods, but we'll mainly pursue the Goldstein Zero Bias method.

## Nov 24 Compiling the Latex

I did not post the latex source in the past, so I forgot that I compiled the exam together with a file of definitions. If you put the file mydefs.sty in the same directory as the exam and the cover sheet, everything will compile automagically.

Final v 1.0 PDF Latex Source Cover sheet PDF Latex Source

## Nov 22 Happy Thanksgiving!

There are now a total of 12 problems --- but they are not posted. I'll post these at class time on Monday. Some problems will need material that we will cover in the last 4 days. Be sure to read the whole "Martingale" chapter in Durrett. Now, let's all go back to digesting cranberry sauce!

## Nov 21 4:52pm.

Some clarifications made to problems 2 and 6. Not enough to change the version number, but you may want to look and refresh your pdf.

## Day 22

There is a detail to be added to the proof of McLeish's Dependent CLT, and this motivates Slutsky's theorem --- which is trivial, but worth discussion.

To complete the discussion of McLeish's CLT, we'll see how it implies a CLT for martingales.

We'll then discuss coupling and I will give a proof of Le Cam's inequality, which is a nice quantitative version of the convergence theorem for Bernoulli sums (with differing p's).
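To see what the inequality says numerically, here is a minimal sketch that compares the exact law of a sum of independent Bernoulli(p_i) variables with the Poisson law of the same mean. It assumes the form of Le Cam's inequality in which the sum over k of |P(S = k) - Poisson(k)| is at most 2 times the sum of the p_i squared; the function names are hypothetical, chosen only for this illustration.

```python
import math


def bernoulli_sum_pmf(ps):
    """Exact pmf of S = X_1 + ... + X_n for independent Bernoulli(p_i)."""
    pmf = [1.0]  # law of the empty sum: point mass at 0
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # the new variable equals 0
            nxt[k + 1] += mass * p      # the new variable equals 1
        pmf = nxt
    return pmf


def poisson_pmf(lam, k):
    """Poisson(lam) probability of the value k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def l1_distance_to_poisson(ps):
    """Sum over k >= 0 of |P(S = k) - Poisson(lam)(k)| with lam = sum(ps)."""
    lam = sum(ps)
    pmf = bernoulli_sum_pmf(ps)
    n = len(ps)
    inside = sum(abs(pmf[k] - poisson_pmf(lam, k)) for k in range(n + 1))
    # S never exceeds n, so beyond n only the Poisson mass contributes.
    tail = 1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1))
    return inside + max(tail, 0.0)


def le_cam_bound(ps):
    """Right-hand side of the assumed form of Le Cam's inequality."""
    return 2.0 * sum(p * p for p in ps)
```

For example, with `ps = [0.1, 0.05, 0.2, 0.02, 0.15]` one can compare `l1_distance_to_poisson(ps)` with `le_cam_bound(ps)`; the bound is most informative when every p_i is small.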

If we have time we'll then start the discussion of Stein's Method for proving either Poisson Laws or CLT.

## Day 21

I'll first close the loop on Doob's strong-type maximal inequality. We won't revisit the proof of Hardy's inequality, but I may add a problem on the final that has a similar "punch line" --- i.e. how to make a martingale out of just a sequence of numbers.

The main order of business is to consider the CLT for martingales. We'll follow the plan of Don McLeish and first prove a very general result for dependent random variables. It shows us how if we can prove some very qualitative facts that are like the weak law of large numbers, then we have a CLT. These qualitative facts are easy for a martingale difference sequence, and that is how we get our CLT for martingales.

We'll talk further about the construction of martingales. In particular, we'll see how the Doob construction and Hoeffding's inequality give you concentration results for almost any function of independent random variables.

We'll also look at the problems on VERSION 1.0 of the final. I'll change the version number as I add problems from now until the end of classes. If I just fix a typo or add some qualifications, I'll just increment the decimal part of the version number (e.g. 1.0 becomes 1.1 if I make a little change).

Final v 1.0 PDF Latex Source Cover sheet PDF Latex Source

I'll also go over the instructions and the self-evaluation Cover sheet.

## Day 20

We'll first close the loop on some items started last time:

• More on conditional expectations
• Notion of a submartingale
• Doob's maximal inequality
• the weak type inequality
• the strong type inequality

I'll then give a new proof (never published) of Hardy's inequality. The punch line here is that you can construct martingales that will give you insight into something as fundamental as a sequence of real numbers.

## Day 19

First some good news: No more homeworks.

Second --- I will soon start posting problems that will add up to the final exam.

I'll post the first slice on Wednesday. Ultimately, there will be 15-20 problems that really amount to "more creative, more integrative" homeworks spread over a month. I'll go over all the "rules" on Wednesday.

The rest of the plan:

• I forgot to do the "DCT" for UI sequences last time, so we'll do it today.
• I'll then do the core of martingale theory
• Doob's martingale transform theorem ("Law of Conservation of Fairness")
• Notion of a stopping time and Doob's stopping time theorem
• The up-crossing inequality
• The martingale convergence theorem

Now that I write this out, it seems a bit much --- especially as we need to sprinkle in a selection of examples. Life is short, but that does not mean that we should rush through it. What does not get done today, can be done later.

### Inventory of Future Items

• I owe you a chf proof of the converse to Kolmogorov's one series theorem. It is quite instructive to compare this proof to Varadhan's "pure thought" proof that we did earlier.
• We will prove McLeish's CLT for dependent random variables and see that it gives us a CLT for martingales
• We will consider the basics of Stein's method for the CLT
• We'll use the Goldstein-Reinert Zero-Bias transformation to show how one can extract honest CLTs from the Stein equation.
• Gain some insight into the role of the Zero-Bias transformation in the stunning proof of Tyurin (2010) of the Berry-Esseen theorem with C<0.5.
• I'll start to introduce the very cool stochastic processes that we will study in 531 (Brownian motion, birth-death processes, diffusions, spatial point processes --- like the Poisson, etc.)
• We'll fill in the cracks with more information about special random variables (e.g. symmetric or exchangeable random variables.)
• I'll prove at least one version of Talagrand's convex distance concentration inequality, and more generally we'll discuss concentration inequalities --- results that have transformed probability theory and its applications over the last ten years.

There are just seven more lectures after today.

## Day 18

Here is the plain vanilla plan:

• Warm up problem: Every integrable random variable is "more than just integrable"
• Notion of uniform integrability and the adult-strength "DCT"
• Complete the discussion of conditional expectation with respect to a sigma-field.
• Jensen's inequality for conditional expectation
• Uniform integrability for the family of conditional expectations
• Martingales --- Defined and Illustrated
• My Three Favorite Martingales
• The Classic applications of those martingales: Ruin problems and times.
• After this? We'll head back into characteristic function land to prove McLeish's CLT which in turn will give us a CLT for martingales.
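The ruin problems in the list above can be previewed with simple random walk. The sketch below assumes $S_n$ is simple symmetric random walk started at 0 and that $T$ is the first time the walk hits $-a$ or $b$ for positive integers $a$ and $b$. All of this notation is introduced here, and whether these are the martingales meant above is an assumption. Optional stopping applied to the martingales $S_n$ and $S_n^2 - n$ gives the classical answers:

```latex
\begin{align*}
0 = E[S_T] &= b\,P(S_T = b) - a\,P(S_T = -a)
  &&\Longrightarrow\quad P(S_T = b) = \frac{a}{a+b}, \\
E[T] = E[S_T^2] &= b^2 \cdot \frac{a}{a+b} + a^2 \cdot \frac{b}{a+b}
  &&\Longrightarrow\quad E[T] = ab .
\end{align*}
```

The honest work is in justifying the optional stopping step, which is where uniform integrability earns its keep.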

## Day 17

The main business will be to discuss Lindeberg's Method, as opposed to his condition or his theorem.

Other items?

• The warm-up problem concerns symmetry and a polynomial studied by Euler some 200 years ago.
• We'll then discuss the kinds of data that "determine" the distribution of a random variable.
• This puts us in position to engage Lindeberg's method. I'll also make a pitch for using symbolic pictures (diagrams) to help one visualize a proof. This also has a "symmetry story."
• After getting a "poor man's" Berry-Esseen theorem, I'll discuss without proof the adult strength Berry-Esseen (with the 2010 constant).
• Any remaining time will go to the service of conditional expectation and the beginnings of martingale theory.
• We'll take the time to look at some very interesting conditional expectations and to ponder the power of projections onto subspaces.
• We're not done with characteristic functions, but the next place we apply them will be to martingales.

Homework 10. The Last of the Mohicans. Latex PDF

## Weekend HW 9 Notes:

On question one, I didn't intend anything fancy. You may assume that all of the expectations are finite.

On question three, to get started on the right foot, you should note that if X is a random variable that just takes on non-negative integer values, then its CHF is just a polynomial in exp(it), the polynomial has non-negative coefficients, and these coefficients have an immediate interpretation in terms of X.
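Written out, the observation looks like this. The sketch assumes $X$ takes values in $\{0, 1, \ldots, m\}$; the symbols $p_k$, $m$, and $z$ are notation introduced here.

```latex
\[
\varphi_X(t) = E\bigl[e^{itX}\bigr] = \sum_{k=0}^{m} p_k \, z^k ,
\qquad z = e^{it}, \qquad p_k = P(X = k) \ge 0 .
\]
```

If $X$ is unbounded, the same formula holds with a power series in $z$ in place of the polynomial.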

## Day 16

This day has Lindeberg written all over it. We'll first write down the famous "Lindeberg Condition" and extract some elementary consequences. We'll then look more at a generic characteristic function to see how the kind of information offered by the Lindeberg condition may help.
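For reference, here is one common formulation of the condition; the version we write down in class may be stated for triangular arrays instead. The sketch assumes independent $X_1, X_2, \ldots$ with $E X_k = 0$ and finite variances, and the symbols $s_n$ and $\varepsilon$ are notation introduced here.

```latex
\[
s_n^2 = \sum_{k=1}^{n} E\bigl[X_k^2\bigr], \qquad
\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n}
   E\bigl[\, X_k^2 \, \mathbf{1}\{ |X_k| > \varepsilon s_n \} \bigr] = 0
\quad \text{for every } \varepsilon > 0 .
\]
```

One elementary consequence is that no single summand dominates: $\max_{k \le n} E[X_k^2] / s_n^2 \to 0$.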

We'll then prove Lindeberg's CLT for non-identically distributed sequences of independent random variables. We'll comment on --- but not prove --- the Feller-Levy converse.

As time permits we'll look at some more specialized problems just to put the ideas in your head. One of these is the Khintchine inequality for Rademacher random variables. I'll make a general pitch for the detailed study of two-valued random variables (usually -1 and 1, or 0 and 1, but sometimes -a and b). These considerations lead us to the beautiful new subject of "influence".

We will also start a discussion of conditional expectation. We can't go much longer without introducing martingales --- one of the real "money balls" of our course.

Reminder: HW 9 Due Monday November 5 --- PDF ---- Latex Source It's short and sweet.

## Day 15 --- Oops, It Got Cancelled

Still, I would not want you to be left homeworkless. These problems are "unusual" --- they are more focused on problem solving than on the mastery of basic techniques. We'll get back to basics after the water has cleared.

HW 9 Due Monday November 5 --- PDF ---- Latex Source

See you Wednesday --- Halloween!

PS Needless to say, due to Sandy, HW8 is due on Wednesday, October 31.

## Day 14

How is today different than any other day? Today we prove the CLT!

You know the CLT from other courses, but this is the honest, adult version where we just assume IID with second moments --- and no more.

Our proof is based on characteristic functions (Fourier transforms). We have most of the tools in hand, but we will develop a few facts before the final argument.

But take the word "final" under advisement!

We'll prove many versions of the CLT before we are done. The Fourier approach is important for many reasons, but at the end of the day you may decide that the CLT is "true" for other reasons. The issue deserves to be engaged over time.
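To get a feel for the Fourier approach, here is a minimal numerical sketch for the simplest case. It assumes the summands are Rademacher variables (values -1 and 1 with probability 1/2 each), so the characteristic function of the standardized sum S_n / sqrt(n) is exactly cos(t / sqrt(n))^n. The function names are hypothetical.

```python
import math


def chf_standardized_rademacher_sum(t, n):
    """Characteristic function of S_n / sqrt(n) for n IID Rademacher summands.

    One Rademacher variable has characteristic function cos(t), and
    independence turns the sum into a product.
    """
    return math.cos(t / math.sqrt(n)) ** n


def chf_standard_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2.0)


def gap(t, n):
    """Absolute difference between the two characteristic functions at t."""
    return abs(chf_standardized_rademacher_sum(t, n) - chf_standard_normal(t))
```

For a fixed t, `gap(t, n)` shrinks as n grows. Pointwise convergence of characteristic functions is exactly what the continuity theorem converts into convergence in distribution.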

The CLT is a GRaVy problem, par excellence. Every week new CLTs are published; I expect to submit a couple before Christmas (and to see them published before next Christmas).

Naturally we'll also have our daily "challenge problem" --- and there will be some more minor observations.

New Homework: HW8 pdf and latex. This is due Monday October 29.

## Weekend Notes (10/20)

HW Up-Date:

Please refresh your pdf; there are eight problems. Number 8 is easy but it is the promised practice with the pi-lambda theorem.

Minor Note:

There is a nice piece on subsequences and compactness on the Trickipedia. While you are there, wander around. Although it has not grown as rapidly as I had hoped, the Trickipedia still has many fine articles. Some are quite easy for even calculus students to follow. Some may assume more background in some area than you have --- that's fine. Just explore and enjoy. These are honest articles written by experienced mathematicians who want to help you become a more effective problem solver. What could be nicer?

## Day 13

I don't think people really "got" the Levy continuity theorem in the last lecture, and I think I know why --- I jumped into the full theorem, and it just had too many moving parts. This created a kind of mental "buffer overflow".

I'll back up a little today and show those parts individually. In particular, we will look at a classic example of the dance: "compactness"+"determining condition"="convergence result". Anytime you want to show something converges, this is one of the ideas to have in mind. I always start with this plan since it is so often the quickest.

We'll also add to our characteristic function tool box. In particular we'll get the results that relate moments of the random variables to smoothness of the characteristic functions.

Naturally there will be a warm-up question from the series that asks "Is it a characteristic function?"

## Day 12

Homework 7 (pdf and latex source). Solutions are due Wednesday October 24 ... There is no class Monday October 22 because of Fall break (a strange --- relatively modern --- undergraduate innovation. It's really for Mom and Dad to check in on how the new college boy or girl is doing.)

There are two "big theorems" today:

• The Helly Selection Theorem (or the compactness theorem for distribution)
• Levy's Continuity Theorem.

There are also two hugely general principles in the "cloud":

• The behavior of the characteristic function at zero "controls" the tail probabilities of X, and vice-versa.
• Subsequence Argument+Uniqueness Argument=Convergence Argument.

Naturally we put these in the cloud because there are almost uncountably many interpretations of these principles. They guide a huge amount of mathematics.
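One concrete instance of the first principle is the following bound, sketched here in the form that appears in many textbooks. The symbols $\varphi$ (the characteristic function of $X$) and $u$ are notation introduced here.

```latex
\[
P\bigl( |X| > 2/u \bigr) \;\le\; \frac{1}{u} \int_{-u}^{u} \bigl( 1 - \varphi(t) \bigr) \, dt
\qquad \text{for every } u > 0 .
\]
```

The integral is real because the imaginary part of $\varphi$ is odd. If $\varphi$ is close to 1 near zero, the right side is small, so the tails of $X$ are small.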

Along the way we'll also add to our catalog of characteristic functions and our catalog of the general properties of characteristic functions. The warm up question for the day will be: "Is cos(x^2) a characteristic function?".

## HW6 Note

HW 6 version 1 is Due October 15. I have added two problems, so this is the final version. This time you must do your solutions in Latex. If you start with my source Latex you will have the problems already typed, have a page design, etc.

## Day Eleven

We will continue with our discussion of "transforms" --- although the characteristic function (or Fourier transform) will get almost all of our attention. Here are the general topics:

• Examples of characteristic functions and Laplace transforms (Better to think of these as building blocks rather than examples. An example you may see once --- but these guys you'll see everyday).
• Levy's inversion formula. This is our "big theorem" for the day. If we do it honestly and clearly, we've earned our keep.
• Other news you can use ... and which we'll develop shortly:
• Characteristic functions and smoothness
• Characteristic functions and moments
• Special role of the neighborhood of zero and the "tightness inequality."
• Helly-Bray Lemma and compactness in the space of distributions.
• Starting today we will make distributions part of our toolkit. In particular, we'll start talking about the convergence of distributions.
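For orientation, here is a sketch of one standard statement of the inversion formula from the list above. It assumes $\mu$ is the distribution of $X$, $\varphi$ is its characteristic function, and $a < b$; this notation is introduced here and the version proved in class may be phrased differently.

```latex
\[
\lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T}
   \frac{e^{-ita} - e^{-itb}}{it} \, \varphi(t) \, dt
 \;=\; \mu\bigl( (a, b) \bigr) + \tfrac{1}{2} \, \mu\bigl( \{a, b\} \bigr) .
\]
```

In particular, the characteristic function determines the distribution, which is the "uniqueness" half of most convergence arguments.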

We will use characteristic functions to prove the CLT, but the technology of characteristic functions buys you a lot more than just a proof of the CLT. In fact, proofs of the CLT that avoid characteristic functions may be "better".

The great thing about characteristic functions (and other transforms) is that they give you an alternative way to think about almost any problem where distributions are important. That's a lot of problems!

## Day Ten

With some big theorems behind us (SLLN and Kolmogorov's 1-Series Theorem) we have more of a sampler:

• More discussion of maximal inequalities, including a discussion of the reflection principle of SRW as a maximal inequality --- and a note on when constants matter --- and when they don't.
• Varadhan's proof of the converse of the Kolmogorov 1-series theorem. This theorem is traditionally proved with characteristic functions. It is interesting to see how Varadhan makes a variance "visible".
• The weak law of large numbers and the Weierstrass approximation theorem. This is a famous example of the application of probability theory to analysis. Later we may use an analogous trick to prove the Post-Widder inversion formula for the Laplace transform.
• Initial discussion of "transforms" --- Fourier, Laplace, and others, including the humble power series.
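The Weierstrass item in the list above can be previewed numerically. The sketch below assumes the usual Bernstein construction: approximate f(x) on [0, 1] by E f(S_n / n), where S_n is Binomial(n, x). The weak law says S_n / n concentrates near x, which is why the approximation works. The function names are hypothetical.

```python
import math


def bernstein(f, n, x):
    """Bernstein polynomial of f at x: E f(S_n / n) with S_n ~ Binomial(n, x)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def max_error(f, n, grid_size=101):
    """Largest gap |B_n f(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    grid = [j / (grid_size - 1) for j in range(grid_size)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)


def kink(x):
    """A continuous test function that is not differentiable at 1/2."""
    return abs(x - 0.5)
```

Comparing `max_error(kink, 20)` with `max_error(kink, 200)` shows the error shrinking, but slowly, roughly at the rate n^(-1/2) that Chebyshev's inequality suggests.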

## Day Nine

The first action will be to show that the SLLN follows rather quickly from the Kolmogorov "One Series Theorem."

We will then take a little excursion that is seldom taken at the 530-level. We'll prove another maximal inequality --- one due to Hopf. This inequality is often thought of as part of ergodic theory, but that is a bum rap.

It is a very general inequality, but it should not be punished for being good.

In fact, I will show that the Hopf maximal inequality gives us the very shortest path to the SLLN --- shorter than Etemadi, shorter than Kolmogorov.

This will be our third proof of the SLLN. We'll meet one more when we do martingales. Proving the SLLN is a great way to see how your tools work together.

Brevity is not all that matters. Everyone is better off knowing about Etemadi's truncation, and life without Kolmogorov's maximal inequality is unimaginable. Still, I would argue that the proof of the SLLN via the Hopf maximal inequality is also a highly principled proof.

It tells us about the "structure" of the SLLN and it suggests lots of other jobs we can do. It is "news you can use."

Since these views are not in any books, I am writing a little piece on this. What I have now is just a draft but in Section 2 I give a complete (hopefully beautiful) proof of the Hopf inequality. It's not different from Garsia's proof in any essential way, except nobody can remember Garsia's proof and no one can forget this one.

I'll post the occasional change to this manuscript so you can see how such things evolve. I'll probably ship it off some place in January, after it has grown and mellowed a bit. If you are motivated to make a nice picture to illustrate the proof of the Hopf Lemma, you'll get an acknowledgement in the paper (Whoopee!)

BTW, I encourage you to write "expositions" about anything that interests you. You don't have to think about publication --- but you can. The main benefit comes from explaining something very carefully to yourself.

## Day Eight

Homework 5 --- It's now live. HW5 is due Monday Oct 8. (I fixed a small typo in problem 1 on 10/1 at 1:17pm)

Here's the play book for Monday

• Kolmogorov's Maximal Inequality
• Idea of a stopping time (MAJOR IDEA)
• Idea of a maximal inequality (MAJOR IDEA)
• Organization of a proof
• Kolmogorov's Truncation --- How to guess it, How to use it.
• Kolmogorov's "One Series" Theorem
• Kronecker's Lemma
• Second Proof of the SLLN for IID RVs with finite mean.
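For reference, here are sketches of three statements from the play book in their usual textbook forms. The notation is introduced here: $X_1, X_2, \ldots$ are independent with mean zero and finite variances, $S_k = X_1 + \cdots + X_k$, and $(a_n)$, $(x_n)$ are real sequences.

```latex
\begin{align*}
&\text{Maximal inequality:} &&
  P\Bigl( \max_{1 \le k \le n} |S_k| \ge \lambda \Bigr)
  \le \frac{\operatorname{Var}(S_n)}{\lambda^2}
  \quad \text{for } \lambda > 0 . \\
&\text{One series theorem:} &&
  \sum_{n} \operatorname{Var}(X_n) < \infty
  \;\Longrightarrow\; \sum_{n} X_n \text{ converges almost surely.} \\
&\text{Kronecker's lemma:} &&
  0 < a_n \uparrow \infty \ \text{and}\ \sum_{n} \frac{x_n}{a_n} \text{ converges}
  \;\Longrightarrow\; \frac{1}{a_n} \sum_{k=1}^{n} x_k \to 0 .
\end{align*}
```

Note that the maximal inequality has the same right side as Chebyshev's inequality for $S_n$ alone; the maximum comes for free.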

Why Maximal Inequalities?

To a great many analysts, maximal functions and inequalities like the Kolmogorov maximal inequality are the "heart of the enterprise." If you've had a second course in real analysis, you've at least heard that Hardy's maximal inequality is the key to differentiation theory. If you've had a solid course in harmonic analysis you've at least heard that given Carleson's maximal inequality, the almost sure theorem for L^2 Fourier series is almost immediate.

If you don't know about these things, don't worry about them. I mention them only to underscore that in any conversation about almost sure convergence, the presence of a maximal inequality is almost inevitable. True, we did prove the SLLN without using a maximal inequality, but --- it is there --- and had been put there many years earlier. There is even a theorem by Sawyer (Annals of Math, 1966) that says, USH, that an almost sure convergence theorem implies the existence of a maximal inequality.

Why Integral Representations?

The proof I'll give of Kronecker's lemma is one I learned from Kallenberg's book. It is both beautiful and principled. Part of the principle in play here is that one should look for an integral representation --- even if there is no immediate need for one --- or even a place for one!

Why does this work? Part of the beauty is that there is some residual mystery to why integral representations are so powerful. I have a story to tell about this and ... the alphabet.

Cauchy Criterion?!

I'll also wax on about the brilliance of the Cauchy criterion for convergence. I'll bet there is something to this definition that has escaped your eye.

## Day Seven

For a warm-up and generic illustration, I'll first prove some simple SLLNs under "excessive" moment conditions. Our argument has the virtue of being extremely robust. It applies to much more than sums of independent random variables.

We'll prove the L^1 strong law of large numbers --- This is a fully professional, adult-strength theorem.

Our proof uses Etemadi's observation (circa 1974) that for non-negative random variables, it suffices to prove the SLLN along every exponentially growing subsequence. Basically we just follow our nose and remember the L^1 exponential truncation lemmas from the previous class. Doing the SLLN is an honest day's work, but it won't take us nearly that long.

While this proof of the SLLN is clearly in our heads, we'll take on a little "proof mining". It's good to get in the habit of doing this anytime you learn a significant new proof --- it's even a good idea with "insignificant" new proofs!

We'll also invest some time in useful minor topics. In particular, we'll look more at record times, derangements, and other interesting objects. BTW, the notion of an "interesting object" is a useful one from the perspective of "research strategy" --- in a way it was a favorite strategy of David Hilbert, though you only find that out by reading some really old papers that are far from probability theory.

We may even start musing about maximal inequalities, the more traditional workhorse for proving strong laws, but I don't want to rush this very important topic.

## Day Six

Homework No. 4 Due Monday October 1. (We're on a Monday to Monday schedule except for the bizarre Fall break day of October 22 and the traditional "Turkey Pardoning" weekend that gives November 21 a pass).

The plan for the day is to engage the various Laws of Large Numbers. We'll do a semi-professional weak law first to get in touch with our "truncation side." We'll then do a few decidedly amateurish strong laws, just to make sure that everyone understands the basic tricks of truncation, interpolation, and tail bounding. Think of this as the world you would inhabit if all you knew was Chebyshev's inequality and the Borel-Cantelli Lemma I.

We'll then start taking a little more sophisticated look at what one can extract from knowing that a random variable has a finite mean. We'll look at "slick" methods for relating tail probabilities and expectations, and we'll look at "robust" methods for doing the same. Both views have a place in your tool kit.
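Two standard relations of this kind, sketched here for a random variable $X \ge 0$ (the notation is introduced here):

```latex
\[
E[X] = \int_{0}^{\infty} P(X > t) \, dt ,
\qquad
\sum_{n=1}^{\infty} P(X \ge n) \;\le\; E[X] \;\le\; 1 + \sum_{n=1}^{\infty} P(X \ge n) .
\]
```

The sandwich on the right holds because $\sum_{n \ge 1} P(X \ge n) = E\lfloor X \rfloor$. It shows that a finite mean is the same thing as summable integer tail probabilities, which is exactly the form the Borel-Cantelli lemma wants.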

Some practice anticipates the computations in a Tao style proof of the SLLN. We'll do that proof on Wednesday.

We'll also explore some general meta-themes, like "proof mining". This is an idea that "pays the rent." I may not get to GRaVy but you can read about it below.

As always, there are nice things to discuss from the minor topics inventory. We'll catch these when we have a few extra minutes, but you can always read about them on your own.

Main Topics --- Inventory:

• Weak LLN for IID RVs with finite mean (truncation method)
• Moments and Tails (done slickly or robustly)
• Easy SLLNs using moments and interpolation (non-sophisticated versions)
• A big benefit of non-negative summands --- ultra thin interpolation.
• "Wald's Lemma" (version 1.0)
• Notion of a stopping time: A CENTRAL TOOL of 530.

Minor Topics --- Inventory:

• Record times and other Bernoulli variables in combinatorics
• The inclusion-exclusion principle (more sophisticated than you'd guess)
• Derangements (and a class of generic problems)
• Renyi's 0-1 Law, Bonferroni, and the Sieves (of Brunn and others).
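As a small preview of the inclusion-exclusion and derangement items, here is a minimal sketch that counts permutations with no fixed points in two ways: by the inclusion-exclusion formula and by brute force. The function names are hypothetical.

```python
import math
from itertools import permutations


def derangements_inclusion_exclusion(n):
    """Count derangements of n items as the sum over k of (-1)^k n!/k!."""
    total = 0
    for k in range(n + 1):
        sign = -1 if k % 2 else 1
        # n!/k! is an exact integer for 0 <= k <= n
        total += sign * (math.factorial(n) // math.factorial(k))
    return total


def derangements_brute_force(n):
    """Count permutations of range(n) with no fixed point by enumeration."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )
```

Dividing by n! gives the probability that a uniformly random permutation has no fixed point, and the alternating series shows that this probability converges to 1/e very quickly.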

Snag a free (author provided, legitimate) PDF of T. Tao's measure theory book. I recommend reading the problem solving strategies (page 210, et seq.) These are truly great. The only thing I'd ask you to keep in mind is that in 530 we are not so concerned about measure spaces like R^n as we are about more abstract (easier!) spaces. Thus, almost all of the book is tangential to the needs of 530 --- but the problem solving strategies are still priceless.

GRaVy:

This stands for "Generalizations", "Refinements" and "Variations" and this one word represents the way that 80% of day-to-day mathematics (and mathematical science, including statistics and computer science) gets created. The paradigm needs no modification in mathematical statistics, and the story for applied statistics requires only small modification.

The GRaVy model fits directly with the P to P+1 model. That is one reason why it is so wildly successful (or at least fecund --- but I'd even argue successful).

Of course it is interesting to speculate about the other 20% --- if indeed the residual percentage is that large. What might be in the 20%?

• Greenfield Projects --- projects that someone just invents. These are not common, but they happen regularly enough to get a name. Perhaps 5% of my papers have been greenfield projects, and I suspect that is high for the profession.
• P+Q=R. You have a paper P that mainly lives in one field (say statistics) and you have another paper Q that mainly lives in another field (say computational linguistics) and you work to see how P and Q interact to produce something that was not in either P or Q. This is a very powerful paradigm and the favorite of many statisticians and applied mathematicians. I've used it quite regularly. The plan even works within a field: P --- a paper in non-parametrics, Q --- a paper in graphical models, then R is ... well, it's something new and potentially interesting.

I've thought of perhaps a total of ten candidates for the genre list, but after just these five the opportunities for nice classification become less clear. My best shot at a sixth genre is the

• synthesis, or even just the survey.

In such a paper, one takes a great number of results and organizes them with some coherence --- and careful scholarship. It is a huge service when someone does this, and it happens far too rarely. Ironically, surveys and syntheses often turn out to be very highly cited papers, so it's totally false that "you don't get any credit" for doing a survey paper. I'd encourage everyone ON THE PLANET to write a nice survey paper this year!

A seventh genre I've definitely decided not to include is the ...

• Pollution Piece.

This genre includes work that makes a lot of unjustified (or poorly justified) claims. This kind of work slows down a field because the people who would prove the honest theorems won't invest effort in an activity where others claim to have done the work already. If you are a referee of a suspected Pollution Piece, I recommend that you reject it with extreme prejudice. If the claims can be backed up, the piece will appear later and in a much better form, so no harm is ever done.

Let me anticipate some flak here: Lots of "great work" was --- in retrospect --- from the Pollution Piece genre. This can happen when work exceeds the bounds of the foundations for that work. The favorite example for a probabilist would be much of the work of Paul Levy.

See if you concur: (a) this was great work, (b) it was right but not fully justified, (c) it was kept at a distance for a long time --- basically most of Levy's life. Only near (or after) the end of Levy's career did people like Doob and Hunt do the work needed to put everything on a firm foundation. Even then they worked in the shadow of others saying they were just dotting i's and crossing t's. Fortunately, they had the skill and integrity to do the right thing.

Oh, while agreeing that there is some great work in the Pollution Piece genre, let's keep in mind the old "exceptions and rules" issue. The vast majority of work in the Pollution Piece genre is just pollution --- assertions that more scholarly folks understood perfectly well but chose not to publish because they could not honestly "close the loop."

You see, after the first six genres things become very messy.

Also, few puppies are pure breeds. Right now I am working on a paper that I could regard as total greenfielding; it's a new model for a new phenomenon --- or a totally generic P to P+1; it modifies a known model in a way that is simple (but violent!).

## Day Five

Please consider how you did on HW2. If you have done poorly on HW1 and HW2 please take to heart what I have said and written about the required background for the course. It is not fun for people to attempt a class without honestly having the course prerequisites.

Don't lose track of the add and drop dates. A look at HW3 will show that HW1 and HW2 were just warm-ups.

Homework No.3. Due September 24

Big Picture

The main theme for today is concentration, especially concentration inequalities and their applications to theorems like the strong law of large numbers. Another basic theme is truncation. With these two tools one can prove many worthy results.

Concentration inequalities will never be far out of mind for the rest of our course. They make short work of many natural problems.

Main Topics --- Inventory:

• Warm-up. Concentration for Sums of Independent Rademachers.
• Hoeffding's Lemma and Inequality (one of several versions)
• Convexity and Linearity Play Well Together
• Convexity and "Extreme Thoughts"
• SLLN for IID Bounded RVs
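For the warm-up item in the list above, here is a minimal sketch that compares the exact upper tail of a sum S_n of n independent Rademacher variables with the Hoeffding-type bound exp(-t^2 / (2n)). That form of the bound is the standard one for Rademacher sums, but it may not be the version we state in class. The function names are hypothetical.

```python
import math


def rademacher_upper_tail(n, t):
    """Exact P(S_n >= t), where S_n is a sum of n IID Rademacher variables.

    S_n = 2B - n with B ~ Binomial(n, 1/2), so S_n >= t iff B >= (n + t) / 2.
    """
    k_min = max(math.ceil((n + t) / 2), 0)
    favorable = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return favorable / 2**n


def hoeffding_bound(n, t):
    """The bound exp(-t^2 / (2n)) on P(S_n >= t), for t >= 0."""
    return math.exp(-t * t / (2.0 * n))
```

For n = 100 and t = 20 the exact tail is a few percent while the bound is exp(-2). The bound is crude at this scale, but it has the right Gaussian shape and it holds for every n, not just in the limit.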

Continue with Chapter 2 of Durrett. By now you should have read through about page 73. As you read, don't skip over the exercises. You don't have to solve them all (unless you want to) but you should ponder them and play with the ones that interest you. Most of Rick's exercises have big hints that make the problems pretty easy.

As I remind myself of what Rick has written, I note that he has a little tendency to prove a more general result and then collect simpler (more natural!) results as corollaries. For example, he does the WLLN for triangular arrays before he does it for IID sequences.

I'd coach you to go the other way.

State the simplest version of the "theorem" that makes sense and prove that version first. Then look at your proof to see if you can extract from your proof: (1) a more general or (2) more precise version of your theorem.

I'd speculate that most "general theorems" are just simple theorems followed by some conscientious "proof mining". It's not a bold claim to assert that is the way that the world came to most of its collection of general results. Hilbert said that the path to mathematical discovery was through those examples that carry the germ of generality.

If you are interested in making your own contributions, I'd say that the "simple" to "general" path is the only way to go. The only caveat is that the words "simple" and "general" have many layers of interpretations. At this point --- why not take the simplest!

Take a look at Tao's note on concentration. Tao's view of the Hoeffding inequalities was very instructive to me. I had Hoeffding in my head as a "convexity result" and I was honestly surprised to see that a useful version could be had by a crude Taylor series method. This is obvious once seen, but I never asked myself the question: "Can I just get by with an expansion?" By golly, the next time I see some convexity proof, I will ask myself that question.
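
To illustrate the "just expand" idea in the simplest case, here is a term-by-term series comparison that bounds the moment generating function of a single fair sign. This is only a toy instance, written in notation introduced here ($\lambda$ is a real parameter), and it is not claimed to be the argument in Tao's note.

```latex
% Assumed setting: \varepsilon = \pm 1 with probability 1/2 each, so E e^{\lambda \varepsilon} = \cosh \lambda.
\[
\cosh \lambda
  = \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{(2k)!}
  \le \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{2^k\, k!}
  = e^{\lambda^2 / 2},
\qquad \text{since } (2k)! \ge 2 \cdot 4 \cdots (2k) = 2^k\, k! .
\]
```

No convexity is used here, only a comparison of coefficients. The same bound raised to the $n$-th power controls $E\, e^{\lambda S_n}$ for a sum $S_n$ of $n$ independent fair signs.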

The great benefit of a crude method is that it tends to be very robust. Tao's approach to Hoeffding makes inevitable the inequalities of Bennett and Bernstein (coming soon to a classroom near you).

PS: There may not be any real ordering of ideas from complex to simple. It's certainly open to discussion whether a convexity argument is more fundamental than a Taylor series argument. Much of the time I'd side with convexity as being the simpler. As they say in the negotiations classes: "Why not both?" The two methods have robustness of differing kinds, so "proof mining" can lead from them in different directions.

An Open Problem?

It's not common to mention an open problem on day five, but over the weekend, I learned a nice one from Yuval Peres. It is easy to describe. If you solve it you get an A+ in the course and exemption from all homeworks. (Some modest conditions apply.)

## Day Four

We begin by discussing the incoming Homework 2 and we may kick around some ideas about how to get the most out of the effort you spend on the homework. Please be sure to READ the comments on HW1 that are given below today's plan. Especially note the drop and add dates.

Homework No.3. Due September 24

There are seven problems on the new homework. I originally wrote eight, but I deleted one that turned fishy. Some of these problems may be challenging. They all illustrate basic techniques. Don't forget that the reading is an important part of the homework, but Chapter 2 is long. We are not following Durrett lock-step, but you can tell when the chapter gets past us.

Topics of the Day:

Here is our shopping list of topics. We'll get many done today, although we are unlikely to cover them all.

1. Inequalities of Holder and Jensen (This completes our "review")
2. The Second Borel Cantelli Lemma (just assuming pairwise independence)
3. Introduction to Dynkin's Pi-Lambda Theorem (a small taste for now)
4. Independence: More than just pairwise (where you can use Pi-Lambda)
5. Weak and Strong Laws for Coin Tosses (boosting Markov, bounding tails)
6. Another Baby Strong Law of Large Numbers (SLLN)
7. Concentration Inequalities: Keys to the Kingdom
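
For items 5 and 6, one common way to "boost Markov" for coin tosses is to pass from second to fourth moments. The sketch below uses symmetric signs for convenience; the notation ($\varepsilon_i$, $S_n$, $\epsilon$) is introduced here, and the version done in class may differ in details.

```latex
% Assumed notation: \varepsilon_1, \varepsilon_2, \dots are independent fair signs (\pm 1),
% S_n = \varepsilon_1 + \cdots + \varepsilon_n, and \epsilon > 0 is fixed.
\begin{align*}
E\, S_n^2 &= n,
  & P(|S_n| \ge n\epsilon) &\le \frac{E\, S_n^2}{n^2 \epsilon^2} = \frac{1}{n \epsilon^2}, \\
E\, S_n^4 &= n + 3n(n-1) \le 3n^2,
  & P(|S_n| \ge n\epsilon) &\le \frac{E\, S_n^4}{n^4 \epsilon^4} \le \frac{3}{n^2 \epsilon^4} .
\end{align*}
```

The first bound tends to zero, which gives the weak law, but it is not summable in $n$. The second bound is summable, so the first Borel-Cantelli lemma gives $P(|S_n| \ge n\epsilon \text{ infinitely often}) = 0$ for each $\epsilon > 0$, and hence $S_n/n \to 0$ almost surely.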

In traditional courses in probability theory, concentration inequalities are mentioned only late in the course. In this respect, Durrett's treatment largely reflects tradition.

There is a more modern approach that gets you to the action right away.

Almost from the beginning we'll keep an eye on concentration inequalities. They are central to the modern applications of probability theory, and, for many purposes, they effectively supplant traditional "limit theorems." We'll certainly see that they give us many limit theorems very handily. Even when they fall short, there are several standard patterns to bridge the gap between what they give and what we need.

If you want a "peek ahead" to material that starts where we are --- but which goes very quickly almost to the frontier --- look at the elegant class note by Tao on concentration. You'll be able to follow the first third or so of the note without any problem. We'll eventually cover the whole note, but we won't be in any rush about it.

DDC: Dynkin Donut Condition. Don't let me forget to explain!

Epsilon of Room: I mentioned this great theme when I gave our (long-winded) proof of the monotone convergence theorem. The Tickipedia has an inspirational list of generic ways to give yourself an epsilon of room. This list offers the possibility of saving you a month of time and effort on some part of your thesis! I'm taping it to my bathroom mirror. Note: This article is a little sophisticated; its charm may depend on the number of times you have been trapped in some wizard's force field where one of these magic tokens would get you out.

The vast majority of the students in the class did as expected on HW1. Specifically, they did a perfect --- or nearly perfect --- job.

A few students (actually eight students) did quite poorly, some getting just 2 points. If you are among these eight, you should understand that it is very likely that you do not have the background needed to succeed in this class.

If 530 is not appropriate for you at this time, you can always take it later when you have a better background in analysis. Also, if you must take some probability now, then there are alternatives to 530 that may suit your needs, such as Stat 510, OPIM 930, and other courses in other schools. Or, you may just prefer to take statistical computing, mathematical statistics, optimization, computer science, etc. There is no shortage of great courses to take.

It is dramatically easier to switch courses earlier rather than later, and if one waits too long it can be a disaster:

• Wharton's selection period (i.e. the add period) ENDS Sept 21 (check with your school, etc. for special rules).
• Wharton's drop period ENDS Oct 12.

You can find an enjoyable course in which you can excel. You delay your move at your own risk. You should act now!

## Day Three

One of the pleasing consequences of the MCT is the additivity of expectations. This is a property of Lebesgue's definition that one might not have expected from its definition as a supremum. In the course of proving the additivity of the expectation with the MCT we'll also write down an almost canonical representation of a non-negative random variable as a monotone limit of simple random variables. This is worth remembering.
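
For reference, the usual dyadic construction of such a monotone approximation is sketched below. I am assuming this is the representation meant; the symbols $X$ and $X_n$ are notation introduced here.

```latex
% Assumed notation: X \ge 0 is a random variable and X_n is its n-th dyadic approximation.
\[
X_n = \min\!\left( n,\; 2^{-n} \lfloor 2^n X \rfloor \right)
    = \sum_{k=0}^{n 2^n - 1} \frac{k}{2^n}\,
      \mathbf{1}\!\left\{ \frac{k}{2^n} \le X < \frac{k+1}{2^n} \right\}
      + n\, \mathbf{1}\{ X \ge n \},
\qquad 0 \le X_n \uparrow X .
\]
```

Each $X_n$ takes only finitely many values, so $E\, X_n$ is a finite sum and the MCT gives $E\, X_n \uparrow E\, X$. For non-negative $X$ and $Y$ the simple variables $X_n + Y_n$ increase to $X + Y$, and additivity for simple random variables then passes to the limit: $E(X + Y) = E\, X + E\, Y$.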

Next few topics (perhaps a bit more than one day's worth):

• Additivity of Expectation (an MCT benefit)
• Method of Indicators
• EN for integer valued N
• Boole's inequality
• Inclusion-Exclusion Identity
• Renyi's 0-1 law
• Inequalities of Markov and Chebyshev
• Borel Cantelli Lemma I
• Pairwise Independent Events
• Borel Cantelli Lemma II
• A "Wald Lemma" -- for fixed mean, Non-negative (possibly dependent) random variables, and sum size N that is independent of the summands (an "almost new" --- but still easy --- result... probably for next time.)
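
Several of the items above are one-line consequences of the method of indicators. The sketch below records the standard statements; the notation ($N$, $X$, $Y$, $A_i$, $a$, $t$) is introduced here and may differ from what is used in class.

```latex
% Assumed notation: N takes values in \{0, 1, 2, \dots\}; X \ge 0; Y has finite variance;
% A_1, A_2, \dots are events; a > 0 and t > 0 are constants.
\begin{align*}
E\, N &= E \sum_{k=1}^{\infty} \mathbf{1}\{ N \ge k \}
       = \sum_{k=1}^{\infty} P(N \ge k), \\
P(X \ge a) &= E\, \mathbf{1}\{ X \ge a \} \le E\!\left[ \frac{X}{a} \right] = \frac{E\, X}{a}
  && \text{(Markov)}, \\
P(|Y - E\, Y| \ge t) &= P\big( (Y - E\, Y)^2 \ge t^2 \big) \le \frac{\operatorname{Var}(Y)}{t^2}
  && \text{(Chebyshev)}, \\
P\Big( \bigcup_{i} A_i \Big) &= E \sup_i \mathbf{1}_{A_i} \le E \sum_i \mathbf{1}_{A_i} = \sum_i P(A_i)
  && \text{(Boole)} .
\end{align*}
```

The interchange of expectation and infinite sum in the first and last lines is justified by the MCT, since all of the summands are non-negative.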

Assorted elementary facts about "measurable" sets and functions may be collected along the way. For example, why is the sum of two random variables a random variable? Why is the limit of a sequence of random variables a random variable? We may also think about ways to show that a random variable has a finite expectation: Fatou and "Tail Bounding" are the main tools.

Someday soon we will look in more detail at the famous "Cantor Random Variable"; it gives us an example of a random variable without a density and without a probability mass function. It shows why one must let go of the way that expectations are defined in elementary courses.

More Comments on HW Solutions: Sloppy presentation is simply NOT ACCEPTABLE. If your presentation is not professionally executed, it is almost impossible to tell if your arguments are correct. Bizarrely enough, it's just like Mother said: "Neatness counts!" When the time comes to write real (publishable!) papers, you'll find huge benefits to knowing how to write your proofs clearly and honestly.

## Note: 7:45pm Wednesday (Sept. 5)

Yinglong Guo pointed out an improved statement of problem 4 in HW2. We want the first time we do better than the offer we got at time 1. It's a small fix, but it's made.

## Day Two

We'll complete the discussion of the MCT, Fatou, and DCT.

When we are done with the MCT, Fatou, and the DCT, we start to engage two themes that will be with us for the rest of the course: (a) modes of convergence and (b) independence.

You should complete your reading of Chapter 1 of Durrett (except the material on product measures). You should start reading Chapter 2. This is material that we will develop over the next several lectures.

Here are some comments that you should take into account in subsequent homework.

1. If you get a good "hint" from a book or the web, that is fine. Still write out your proof or solution from "first principles" (or whatever level is appropriate at the time) --- but add a line like: "This argument is based on a similar argument in Rudin, Principles of Analysis, page 52."

2. Please elevate your own levels of "self criticism." After you have written out a proof, look at it critically. Did you make use of the hypotheses? Did you "accidentally" prove more than is really true? What part of your argument is most unclear? Can you improve it?

3. See if you said "obviously" some place. This is usually a mistake. If something is really obvious, we just say it without comment. When we explicitly assert that something is obvious we are often trying to cover up for a guilty conscience. Similarly, if something is "easy," just do it. Don't be a bully. Don't try to make someone else do the work by asserting that "it is easy."

4. Is your solution ugly? Sometimes this is unavoidable, but if you elevate the esthetic demands that you place on your work, you will find the experience more enjoyable and more instructive. It is rare that an ugly proof is the "right proof" -- even if it is a correct proof. I will not knowingly assign a problem that only has an ugly solution. Most problems are chosen because they have beautiful solutions. There may be the occasional piece of "grunt work" but I'll try to keep it to a minimum.

For Homework 2, please give yourself enough time to do a good write-up. Also, expect that on some problems you may get stuck. This is a good thing. All real learning takes place as one struggles to get "unstuck." Also, do not expect your first idea to work all the time --- or even most of the time. A good problem typically requires that you "overcome some objection" --- i.e. you have an idea, you find that it does not quite work, then you see how with more thought you can make it work.

Homework 2 is due at class time on Monday September 17.

## Note: Saturday 9/8 6:30pm

Dan McCarthy notes that Problem 2 had an (innocent?) typo that I have fixed. There was an errant S_1 but S_\epsilon was needed. If you did the S_1 version, you'll get full credit. Still, you should note why the two cases are virtually identical. Perhaps you can even spy a nice generalization?

## Note: Thursday 9/6 at 9:20am.

Peichao Peng found a typo in Problem 4 of HW1. It has now been fixed.

## Day One

I will provide an introduction to our course and then get down to the real work without missing a beat. There is a formal syllabus, but you'll get a much clearer view of our plan from the "mind map" given in class.

This course is about Probability Theory. It is designed to provide a professional foundation for people who will be using probability on a daily (or at least weekly) basis for the rest of their lives. It is all about theorems, proofs, and the solving of theoretical problems. Naturally, the theory has many practical applications, but this course is for people who have a need to master the mathematical core of the subject, especially those people who need a rigorous understanding of the laws of large numbers, the central limit theorems, the applications of characteristic functions, and the theory of martingales. More topics are listed in the formal syllabus, but we will stick very closely to our key mission.

Topics for Day One

• The Probabilist's Trinity (and why it actually matters)
• Lebesgue's Miraculous Conception: or why I didn't invent his Integral
• Principled (but not slick) proof of MCT (This was not completed).
• Reading: Chapter 1 of Durrett

Homework No. 1 Due Monday September 10. BTW, if you think you've found a typo in a homework, send me an email, and I will post corrections as needed. This is also a reminder to check the web page to look for up-dates, corrections, etc.

Day One Questionnaire: Due Monday September 10.

Template for use by those students who will be latexing their homework. If you are a neat writer, you do not have to latex your homework, though in my experience latexed homeworks are almost always better organized and have fewer mistakes. Thus, I encourage you to latex your homework, but it's not a problem if you do the first few "by hand." Your solutions to the final exam MUST be in latex, so you don't want to put off acquiring --- and improving --- your latex skills.

### Homework Assignments In General

Homework assignments are typically due on Monday. The assignments are the core of the course. By and large, you should do the assignments individually, but conversation about the assignments is encouraged --- just don't forget that it is MUCH easier for any two people to solve a problem working together than it is to solve the problem working alone. For multiple reasons, I do not distribute solutions to the problems. If a problem is unsolved (or incorrectly solved) by a substantial fraction of students (e.g. 20%) then I will discuss the solution in class.

### Classroom Courtesy

Please be attentive to timeliness. I won't make a big deal out of tardiness if it is a one-time event, but if you are late more than once (or if you are ever substantially late), please send me an email with the reason.

Please: no food. It is OK to bring coffee, water, etc.

Please: no open laptops. I know open laptops are accepted in Engineering, but they distract me too much. Incidentally, I have written a little essay on this topic. It's not perfect, but it captures some of my reasoning and it may provide a basis for thoughtful engagement.

Please: Do your best not to fall into an IM or iPhone coma. If you must "look" or "respond" please keep it to 30 seconds.

Homework is an essential part of our course. When it is due, it is due. Hand in what you've got. Please provide neat and thoughtful presentations of your work. See the comments on latex above.

### Web Wonders --- and Karma

The web is now a fundamental part of human knowledge --- one could argue that it is the most fundamental part, Borg-like though that thought may be.

As a consequence, we would all just be talking silly-talk to suggest that you do not have the right to use the web in any way that your heart may desire.

As a practical matter, you may find the solution to a homework problem (or even an exam problem) on the web. If you find such information, just cite it completely and give me proper reassurance that you completely understand what you have found. You will get full credit. If you make such a discovery, please let me know ASAP. This deserves extra credit. I'll often want to post something about this on the blog.

As a courtesy, I'd ask that you not post solutions to problems on the web. There are plenty of sites where you can post anonymously, so I will never know who has done the posting. The only penalty for posting is the bad karma, but bad karma has a way of mattering. Unauthorized posting costs me time and costs future students access to high quality problems that I have edited, vetted, polished, and prepped.

### Teaching Assistant

Our course TA is Yang Jiang. Yang is responsible for grading your homework, and he can provide you with reasonable amounts of general coaching. He can provide modest hints on homework problems, but he cannot do your homework problems for you. Please see his website for his email contact information, office hours, and comments (but not solutions!) he may have posted about the homeworks (e.g. typos, common misunderstandings about the questions, suggestions about things to read, etc.)