A Course for Ph. D. Students

Course Blog Fall 2014

Day 16: More on Moments and Distributions

We'll first have a warm-up problem that illustrates (a) the use of Parseval's formula and (b) the principle of "smooth truncation." It also adds one more layer of knowledge about the Dirichlet integral.

As a second warm-up, we'll develop the CHF for the double exponential and the Cauchy. These will be used as examples later in the hour.
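For reference, the two CHFs are phi(t) = 1/(1 + t^2) for the double exponential density e^{-|x|}/2 and phi(t) = e^{-|t|} for the Cauchy density 1/(pi(1 + x^2)). Up to a constant, each CHF is the other distribution's density. Below is a minimal numerical sanity check that uses only the Python standard library. The Simpson-rule helper, the function names and the integration cutoffs are illustrative choices and are not part of the course materials. Because the Cauchy density has only 1/x^2 tails, truncating its integral costs some accuracy.

```python
import math


def simpson(f, a, b, n=40_000):
    """Composite Simpson rule for f on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def chf_symmetric(density, t, cutoff):
    """phi(t) = 2 * int_0^cutoff cos(t x) f(x) dx for a density f symmetric about 0."""
    return 2 * simpson(lambda x: math.cos(t * x) * density(x), 0.0, cutoff)


def double_exponential(x):
    return 0.5 * math.exp(-abs(x))


def cauchy(x):
    return 1.0 / (math.pi * (1.0 + x * x))


for t in (0.5, 1.0, 2.0):
    de = chf_symmetric(double_exponential, t, cutoff=40.0)
    ca = chf_symmetric(cauchy, t, cutoff=400.0)
    print(f"t={t}: double exp {de:.6f} vs {1 / (1 + t * t):.6f}, "
          f"Cauchy {ca:.6f} vs {math.exp(-t):.6f}")
```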

The main task is to complete the discussion of the "moment problem," i.e., when do (or don't) the moments of a random variable determine its distribution? The trick is to use infinitely many Taylor expansions.
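The classical cautionary example is the lognormal. Let f be the density of e^Z with Z standard normal. Then g(x) = f(x)(1 + sin(2 pi log x)) is also a probability density, and it has exactly the same moments, E[X^k] = e^{k^2/2} for k = 0, 1, 2, ....

To see why, substitute y = log x and then shift by k. The perturbation integral becomes e^{k^2/2} times the integral of phi(u) sin(2 pi u), which vanishes by symmetry. These moments grow so fast that the Taylor series of the ch.f., the sum over k of e^{k^2/2} (it)^k / k!, diverges for every t != 0. That is exactly where the Taylor-expansion route breaks down.

The following is a short numerical check, sketched with a hand-rolled Simpson rule; the function names are illustrative.

```python
import math


def simpson(f, a, b, n=20_000):
    """Composite Simpson rule for f on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def lognormal_moment(k, perturbed):
    """k-th moment of the standard lognormal density f, or of f(x)(1 + sin(2 pi log x)).

    With y = log x, x^k f(x) dx becomes e^{k y} phi(y) dy (phi = N(0,1) density);
    that integrand is centred at y = k, hence the window [k - 12, k + 12].
    """
    def integrand(y):
        value = math.exp(k * y - y * y / 2) / math.sqrt(2 * math.pi)
        if perturbed:
            value *= 1 + math.sin(2 * math.pi * y)
        return value

    return simpson(integrand, k - 12.0, k + 12.0)


for k in range(5):
    print(k, lognormal_moment(k, False), lognormal_moment(k, True), math.exp(k * k / 2))
```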

We then have a shopping list of possibilities which will stretch into next week:

  • Relationships between matchings in a graph and normal moments
  • Geometry of Cauchy's distribution (the random pistol range problem)
  • Introduction to Stable Laws
  • A problem solved by Evans and Zhou using Polya's theorem
  • CHFs and uniform integrability.
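On the first item, for Z ~ N(0,1) we have E[Z^{2k}] = (2k-1)!! = 1 * 3 * 5 * ... * (2k-1). The same number counts the perfect matchings of the complete graph on 2k labeled vertices, and the odd moments vanish just as an odd number of points has no perfect matching. Below is a brute-force check in Python, offered only as a sketch of the count; the route taken in lecture may differ, and the helper names are illustrative.

```python
import math


def perfect_matchings(vertices):
    """All perfect matchings of the complete graph on `vertices`, as lists of pairs."""
    if not vertices:
        return [[]]
    first, rest = vertices[0], vertices[1:]
    matchings = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in perfect_matchings(remaining):
            matchings.append([(first, partner)] + m)
    return matchings


def double_factorial(n):
    """n!! = n (n - 2) (n - 4) ..., with the empty product equal to 1."""
    return math.prod(range(n, 0, -2))


def normal_moment(p, half_width=12.0, n=24_000):
    """E[Z^p] for Z ~ N(0, 1), by Simpson's rule on [-half_width, half_width]."""
    h = 2 * half_width / n
    total = 0.0
    for j in range(n + 1):
        z = -half_width + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * z ** p * math.exp(-z * z / 2)
    return total * h / 3 / math.sqrt(2 * math.pi)


for k in range(1, 5):
    count = len(perfect_matchings(list(range(2 * k))))
    print(2 * k, count, double_factorial(2 * k - 1), round(normal_moment(2 * k), 6))
```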

In due course, we'll also explore Stein's method for the CLT. The essay by Chatterjee has much more than we will ever consider, but it gives a sense of the modern development of Stein's method. Take a peek if you are feeling ambitious.

Laplace Approximation

The Wikipedia article on the Laplace approximation is pretty good. In particular, the proof that is given there is honest and complete. What the Wiki covers is just the classic case, which does not quite include the Wallis integral, but once you know how to do one of these cases they are all very much the same.
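As a worked instance of the classic case, write n! = int_0^inf x^n e^{-x} dx and substitute x = n t. This gives n! = n^{n+1} int_0^inf e^{n (log t - t)} dt. The exponent f(t) = log t - t has its maximum at t = 1, with f(1) = -1 and f''(1) = -1. The leading Laplace approximation, int e^{M f} ~ e^{M f(t_0)} sqrt(2 pi / (M |f''(t_0)|)), is therefore Stirling's formula n! ~ sqrt(2 pi n) (n/e)^n.

The Python sketch below works on the log scale to avoid overflow and compares against math.lgamma. The gap should fall within Robbins' bounds, 1/(12n + 1) < log n! - log(Stirling) < 1/(12n). The helper names are illustrative.

```python
import math


def log_laplace_approx(M, f_max, f2_at_max):
    """log of exp(M f_max) * sqrt(2 pi / (M |f''|)): the leading-order Laplace
    approximation to int exp(M f(t)) dt when f has a single interior maximum."""
    return M * f_max + 0.5 * math.log(2 * math.pi / (M * -f2_at_max))


def log_factorial_laplace(n):
    """log n! via n! = n^(n+1) int_0^inf exp(n (log t - t)) dt.

    f(t) = log t - t peaks at t = 1 with f(1) = -1 and f''(1) = -1.
    """
    return (n + 1) * math.log(n) + log_laplace_approx(n, -1.0, -1.0)


for n in (1, 2, 5, 10, 100):
    gap = math.lgamma(n + 1) - log_factorial_laplace(n)
    print(f"n={n}: log n! - log(Stirling) = {gap:.3e}, 1/(12n) = {1 / (12 * n):.3e}")
```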

Day 15: Using the CLT and CHF Technology

This was a problem-oriented day where we looked at an eclectic set of questions and their solutions. One question involved a fixed point equation, and we solved it using the CLT. We then looked at a heuristic derivation of Parseval's Theorem, one of the most important identities in mathematics.
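Parseval's identity comes in several forms. The lecture's heuristic derivation may well aim at a different one, such as the L^2 version. One probabilistic form follows directly from Fubini: if X and Y are independent with ch.f.s phi_X and phi_Y, then E[phi_X(Y)] = E[e^{iXY}] = E[phi_Y(X)].

The sketch below checks this numerically in Python with X ~ N(0,1) and Y double exponential, and also against a closed form obtained by completing the square. The distributions, the cutoff at 12 and the helper names are illustrative choices.

```python
import math
from statistics import NormalDist


def simpson(f, a, b, n=20_000):
    """Composite Simpson rule for f on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def normal_chf(t):
    return math.exp(-t * t / 2)


def normal_density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def double_exp_chf(t):
    return 1 / (1 + t * t)


def double_exp_density(y):
    return 0.5 * math.exp(-abs(y))


# All integrands are even, so integrate over [0, 12] and double
# (the mass beyond 12 is negligible for these two distributions).
e_phi_x_of_y = 2 * simpson(lambda y: normal_chf(y) * double_exp_density(y), 0.0, 12.0)
e_phi_y_of_x = 2 * simpson(lambda x: double_exp_chf(x) * normal_density(x), 0.0, 12.0)
# Completing the square: E[phi_X(Y)] = e^{1/2} sqrt(2 pi) (1 - Phi(1)).
closed_form = math.exp(0.5) * math.sqrt(2 * math.pi) * (1 - NormalDist().cdf(1.0))
print(e_phi_x_of_y, e_phi_y_of_x, closed_form)
```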

We then discussed what it means for a sequence of characteristic functions to converge only in a neighborhood of zero. In the special case when the limit there is the constant 1, this turns out to be enough. This fact can be used to prove sophisticated versions of the Weak Law of Large Numbers. Finally, we began the discussion of the determination of a distribution by its moments.

HW 9 (PDF, Latex Source) is due on Monday, October 27.

Logistic Distribution, CHFs, and Elo++

We'll shortly look at some more "exotic" distributions and their characteristic function theory. While playing with the logistic distribution, I hit upon an interesting paper on chess rating systems and their predictive power. If you are interested in predicting the outcomes of contests, it is worth a look.

Mid-Term Rules

In an in-class exam, it is not "fair" to expect people to come up with the solution to an original problem. On the other hand, it is certainly fair to ask one to prove almost anything that has been proved in class. The "thing" probably can't be too "big," but any piece of the SLLN, the L^1 or L^2 maximal inequality, Levy's inversion formula, Levy's continuity theorem, or the Lindeberg CLT should be fair game. One can add more to the list, but you get the point. Also, definitions and theorem statements: oh, yeah, they are all fair game.

Day 14: CLTs --- Basic and Lindeberg

This is a "money ball" day. There is nothing more central to probability theory than the central limit theorem. This is material that everyone should absolutely master, and we cover the ground carefully.

After reviewing the ingredients, we bake the cake. First it is the basic CLT --- an honest adult version where we assume that we have second moments and nothing more.
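A quick simulation makes the statement concrete. The Python sketch below standardizes sums of n Uniform(0,1) variables, which have mean 1/2 and variance 1/12. It then compares the empirical value of P(Z_n <= x) with the normal CDF Phi(x).

The uniform summands, n = 30, the replication count, the seed and the function names are arbitrary illustrative choices. The theorem itself needs only a finite second moment.

```python
import math
import random
from statistics import NormalDist


def standardized_uniform_sum(n, rng):
    """(S_n - n/2) / sqrt(n/12) for S_n a sum of n independent Uniform(0, 1) draws."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)


def empirical_cdf(x, n, reps=20_000, seed=0):
    """Monte Carlo estimate of P(Z_n <= x) for the standardized sum above."""
    rng = random.Random(seed)
    return sum(standardized_uniform_sum(n, rng) <= x for _ in range(reps)) / reps


for x in (-1.0, 0.0, 1.0):
    print(x, empirical_cdf(x, n=30), NormalDist().cdf(x))
```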

We then consider the Lindeberg CLT in two phases: (a) understanding the Lindeberg set up and the Lindeberg condition and (b) proving the Lindeberg CLT. Pleasantly enough, the proof follows the pattern that was used for the basic CLT.

If time permits, there are buckets of smaller topics on our plate: entropy, Parseval's formula, and the relation between moments of random variables and derivatives of ch.f.s, perhaps including the solution of some recent HW problems.

Information Theoretical Entropy

It won't be for several days, but eventually we'll discuss the notion of entropy as it appears in information theory. As usual, the Wikipedia article is a little diffuse, but it is still worth a quick read. We'll only use entropy in one HW problem, but it is a brilliant entity that everyone should ponder to some extent.
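For a discrete distribution p = (p_1, ..., p_m), the Shannon entropy is H(p) = -sum_i p_i log p_i, with the convention 0 log 0 = 0. Measured in bits (base 2), it is largest for the uniform distribution, where it equals log_2 m. Here is a minimal Python helper for concreteness; the name and the validation tolerance are illustrative.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy -sum p log p of a probability vector, with 0 log 0 = 0."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability vector")
    return -sum(p * math.log(p) for p in probs if p > 0) / math.log(base)


print(entropy([0.5, 0.5]))       # fair coin
print(entropy([0.25] * 4))       # uniform on four points
print(entropy([0.7, 0.2, 0.1]))  # three points, not uniform
```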

Day 13. Applications of Levy's Continuity Theorem

We began with a warm-up problem that gave a basic inequality for the remainder in the Taylor approximation of e^{ix}. It is simple, but there are some subtleties to it. We then looked at what this inequality tells us for bounded random variables.
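One standard form of the inequality reads: |e^{ix} - sum_{m=0}^{n} (ix)^m/m!| <= min(|x|^{n+1}/(n+1)!, 2|x|^n/n!) for real x. The first term is the better bound for small |x| and the second for large |x|, with the crossover at |x| = 2(n + 1). The version in lecture may be stated slightly differently. The brute-force Python check below compares both sides over a grid; the grid and the function names are illustrative.

```python
import cmath
import math


def taylor_remainder(x, n):
    """|e^{ix} - sum_{m=0}^{n} (ix)^m / m!| for real x."""
    partial = sum((1j * x) ** m / math.factorial(m) for m in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)


def remainder_bound(x, n):
    """min(|x|^{n+1}/(n+1)!, 2|x|^n/n!)."""
    ax = abs(x)
    return min(ax ** (n + 1) / math.factorial(n + 1), 2 * ax ** n / math.factorial(n))


for x in (0.1, 1.0, 5.0, 20.0):
    print(x, [(round(taylor_remainder(x, n), 6), round(remainder_bound(x, n), 6))
              for n in range(3)])
```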

We then revisited Kolmogorov's one series theorem and contemplated its converse. This led us relatively quickly to the famous Kolmogorov Three Series Theorem (the K3S). Our main tool was the characteristic function of a bounded random variable. We looked at a "double triangle diagram" that captures the logic of the K3S. Such diagrams are a good way to get a long argument into one's head.
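For reference, here is one standard statement of the K3S, with the truncation level A fixed in advance. The notation in class may differ slightly.

```latex
Let $X_1, X_2, \ldots$ be independent, fix $A > 0$, and set $Y_n = X_n \mathbf{1}(|X_n| \le A)$.
Then $\sum_n X_n$ converges a.s. if and only if
\begin{align*}
  &\text{(i)}   && \sum_n P(|X_n| > A) < \infty, \\
  &\text{(ii)}  && \sum_n E[Y_n] \text{ converges}, \\
  &\text{(iii)} && \sum_n \operatorname{Var}(Y_n) < \infty.
\end{align*}
```

The "if" direction runs as follows:
- (iii) and the one series theorem make sum (Y_n - E[Y_n]) converge a.s.
- (ii) then gives a.s. convergence of sum Y_n.
- (i) and Borel-Cantelli show that X_n = Y_n for all large n, almost surely.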

Homework 8: The PDF and the Latex Source. This is due on Monday October 20.

Just as a general heads-up, the final exam will be a take-home that will be due on Wednesday, December 17, at noon in my mailbox (JMHH Suite 400, Department of Statistics). I will hand out the first tranche of the final before the Thanksgiving holiday, and I will then add problems until the last day of class, Monday, December 8.

Day 12. Levy's Continuity Theorem

For the warm-up we looked at a simple example of an argument that one can awkwardly call "uniqueness plus compactness gives convergence."

We then proved Levy's continuity theorem. The basic ingredients were (a) the "continuity at zero implies tail bound" inequality and (b) Helly's selection theorem, which we proved last time. The proof of Levy's continuity theorem then followed the "uniqueness plus compactness gives convergence" dance.
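One common form of the "continuity at zero implies tail bound" inequality is P(|X| > 2/u) <= (1/u) int_{-u}^{u} (1 - phi(t)) dt for u > 0. The constants in the course's version may differ.

The Python sketch below checks the inequality for two distributions whose tail probabilities have closed forms: the Cauchy, with phi(t) = e^{-|t|}, and the double exponential, with phi(t) = 1/(1 + t^2). The right side is computed by Simpson's rule. The helper names and the values of u are illustrative.

```python
import math


def tail_bound(phi, u, n=2_000):
    """(1/u) * int_{-u}^{u} (1 - phi(t)) dt by Simpson's rule, for a real, even ch.f. phi."""
    h = 2 * u / n
    total = 0.0
    for j in range(n + 1):
        t = -u + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * (1 - phi(t))
    return total * h / 3 / u


def cauchy_chf(t):
    return math.exp(-abs(t))


def double_exp_chf(t):
    return 1 / (1 + t * t)


def cauchy_tail(x):
    """P(|X| > x) for the standard Cauchy distribution."""
    return 1 - 2 * math.atan(x) / math.pi


def double_exp_tail(x):
    """P(|X| > x) for the density e^{-|x|}/2."""
    return math.exp(-x)


for u in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(f"u={u}: Cauchy {cauchy_tail(2 / u):.4f} vs {tail_bound(cauchy_chf, u):.4f}; "
          f"double exp {double_exp_tail(2 / u):.4f} vs {tail_bound(double_exp_chf, u):.4f}")
```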

Our first application of Levy's continuity theorem was the proof that all of the Polya type functions are actually characteristic functions. This result may look special, but it is very handy for answering many questions. It is especially useful for building examples.

Finally we discussed extreme points and Choquet's theorem. This is a rich but technical subject. Nevertheless, just knowing the basic ideas of the theory can add greatly to one's intuition. In fact, one often uses Choquet's theorem to guess what is true, but, once a conjecture has been framed, it is often easier to prove the conjecture by concrete means rather than to check the details needed to prove Choquet's theorem. Our algorithmic proof of Polya's theorem illustrated this idea.
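Here is one way to see the "concrete means" in the simplest case. It is offered as a sketch, not as a transcript of the algorithm from lecture.

Suppose phi is even with phi(0) = 1, piecewise linear on [0, infinity) with knots s_1 < ... < s_m, and zero beyond s_m. Then phi(t) = sum_k w_k (1 - |t|/s_k)^+, where w_k is s_k times the jump in slope at s_k. Convexity on [0, infinity) is exactly the statement that every w_k >= 0, and matching at t = 0 gives sum_k w_k = 1. Each tent (1 - |t|/s)^+ is the ch.f. of X/s when X has ch.f. (1 - |t|)^+. So phi is a convex combination of characteristic functions.

The function names below are illustrative.

```python
def tent(t, s):
    """(1 - |t|/s)^+, the ch.f. of X/s when X has ch.f. (1 - |t|)^+."""
    return max(0.0, 1.0 - abs(t) / s)


def piecewise_linear(t, knots, values):
    """Even function with phi(0) = 1, phi(knots[k]) = values[k], linear in between,
    and 0 beyond the last knot (so values[-1] should be 0)."""
    t = abs(t)
    xs, ys = [0.0] + list(knots), [1.0] + list(values)
    for i in range(len(knots)):
        if t <= xs[i + 1]:
            return ys[i] + (ys[i + 1] - ys[i]) * (t - xs[i]) / (xs[i + 1] - xs[i])
    return 0.0


def tent_weights(knots, values):
    """Weights w_k such that piecewise_linear(t) == sum_k w_k * tent(t, knots[k])."""
    xs, ys = [0.0] + list(knots), [1.0] + list(values)
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(knots))]
    slopes.append(0.0)  # the function is identically 0 after the last knot
    # w_k = s_k * (slope just right of s_k - slope just left of s_k)
    return [s * (slopes[k + 1] - slopes[k]) for k, s in enumerate(knots)]


knots, values = [0.5, 1.0, 2.0], [0.5, 0.3, 0.0]
weights = tent_weights(knots, values)
print(weights, sum(weights))  # nonnegative weights <=> convex on [0, infinity)
```

With a non-convex choice such as values = [0.5, 0.45, 0.0], one of the weights comes out negative. The representation still holds, but it is no longer a mixture.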

Choquet's Theorem

Sometime in the next few lectures, we'll bump into Choquet's theorem, a result that seems pretty obvious but which has some remarkable consequences. We won't cover Choquet's theorem in detail, but it is worth knowing about because it suggests many results that one can also prove "with bare hands." The Wikipedia article on Choquet's theorem assumes some familiarity with functional analysis, but it can still offer a useful introduction.

Day 11. Convergence in Distribution: Fourier View

We'll look at the notion of convergence in distribution of real-valued random variables, and a few words will be tucked in on "weak" convergence of random variables that take values in a metric space.

The main theorem in our sights is Levy's continuity theorem which we'll do next time. We set this up by proving the Helly selection theorem. This involved

  • The notion of tightness and illustration by example
  • Recollection of the diagonal argument
  • The proof of Helly's theorem.

After getting Helly's selection theorem, we used the Levy inversion formula to find a density that has a "tent" for its characteristic function. We'll also use this next time.
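The tent (1 - |t|)^+ is integrable, so the inversion formula for densities applies. It gives f(x) = (1/2 pi) int e^{-itx} (1 - |t|)^+ dt = (1 - cos x)/(pi x^2), and this density has the tent as its characteristic function.

Here is a rough numerical confirmation in Python. Writing 1 - cos x = 2 sin^2(x/2) avoids cancellation near x = 0. Because f has only 1/x^2 tails, the illustrative cutoff at |x| = 400 limits accuracy to roughly 10^{-3}, and the error is worst near t = 0 and t = 1.

```python
import math


def simpson(f, a, b, n=40_000):
    """Composite Simpson rule for f on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def polya_density(x):
    """(1 - cos x) / (pi x^2), written as (1/(2 pi)) * (sin(x/2) / (x/2))^2."""
    if x == 0.0:
        return 1 / (2 * math.pi)
    half = x / 2
    return (math.sin(half) / half) ** 2 / (2 * math.pi)


def chf(t, cutoff=400.0):
    # The density is even, so phi(t) = 2 * int_0^inf cos(t x) f(x) dx (truncated here).
    return 2 * simpson(lambda x: math.cos(t * x) * polya_density(x), 0.0, cutoff)


for t in (0.25, 0.5, 0.75, 1.5):
    print(t, round(chf(t), 4), max(0.0, 1 - abs(t)))
```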

In due course, we will develop the smoothness and moment relationships, density/characteristic function pairs, Polya's density, Polya type characteristic functions, the WLLN, and Kolmogorov's Three Series Theorem. We'll take one step at a time. This kind of activity will be with us for at least two more weeks.

HW 7 (and Latex source). Due Monday October 13.

Arzela-Ascoli Theorem

The Wikipedia article on the Arzela-Ascoli theorem is decent and reading this piece is good preparation for our proof of the continuity theorem for characteristic functions. You may find better discussions in standard textbooks, like Rudin.

A priori, the article on the Helly selection theorem would be more relevant. Unfortunately, the venerable Wiki pays too much attention to the bounded variation case, and we just need the simpler monotone case.

Coaching on HW 6 Problem 1

This problem really can teach you something about problem solving and I hate to give a hint that is so large that it gives the problem away. Still, some people are finding themselves going in circles so I offer some coaching:

  • There is no need to reinvent the wheel. You can use the Paley-Zygmund inequality; you don't have to rederive it.
  • If you stare at PZ, you'll see that it suffices to show E[S_n^2] < 5(E[S_n])^2.
  • This means that you need to estimate E[S_n^2] in terms of E[S_n]^2. This should give you the focus you need to keep from going in circles.
  • If you need one more suggestion, don't forget the relations you know for Var[S_n]. These put some cancellations into the game that can help.

Day 10. Levy's Inversion Formula

We've discussed Levy's inversion formula and we have developed the basic tool for its proof: the finite form of Dirichlet's Discontinuous Integral. Today we'll lay out the proof of Levy's formula and some corollaries. Further, we'll see how Levy's formula is part of "a theory," one of several that are embedded in what we call probability theory.
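Recall the formula: lim_{T -> inf} (1/2 pi) int_{-T}^{T} (e^{-ita} - e^{-itb})/(it) phi(t) dt = P(a < X < b) + (P(X = a) + P(X = b))/2. When X is symmetric, phi is real and even. The imaginary part of the integrand is then odd and drops out, and the integral reduces to (1/pi) int_0^T (sin(tb) - sin(ta))/t * phi(t) dt.

The Python sketch below evaluates that truncated integral for the standard normal and the Cauchy, where the answers are known in closed form. The cutoff T = 40 is an illustrative choice that is ample for these rapidly decaying ch.f.s, and the helper names are illustrative too.

```python
import math
from statistics import NormalDist


def simpson(f, a, b, n=40_000):
    """Composite Simpson rule for f on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def inversion_probability(phi, a, b, T=40.0):
    """(1/pi) * int_0^T (sin(tb) - sin(ta))/t * phi(t) dt, for a real, even ch.f. phi."""
    def integrand(t):
        if t == 0.0:
            return (b - a) * phi(0.0)  # limit of (sin(tb) - sin(ta))/t as t -> 0
        return (math.sin(t * b) - math.sin(t * a)) / t * phi(t)

    return simpson(integrand, 0.0, T) / math.pi


def normal_chf(t):
    return math.exp(-t * t / 2)


def cauchy_chf(t):
    return math.exp(-abs(t))


for a, b in ((-1.0, 1.0), (0.0, 2.0), (-0.5, 3.0)):
    exact_normal = NormalDist().cdf(b) - NormalDist().cdf(a)
    exact_cauchy = (math.atan(b) - math.atan(a)) / math.pi
    print(a, b, inversion_probability(normal_chf, a, b), exact_normal,
          inversion_probability(cauchy_chf, a, b), exact_cauchy)
```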

Other agenda items require a less sustained development, but they are important when the time comes to use characteristic functions. Specifically, we'll develop the relationships between characteristic functions and derivatives. We'll also look at other manifestations of the meta-principle that "the behavior of the ch.f. at 0 tells you about the behavior of the tail of the distribution, and vice versa." This principle will be an important guide for the next few days.
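Here is a small instance of both points. If E[X^2] < infinity, then phi''(0) = -E[X^2]. For the double exponential, phi(t) = 1/(1 + t^2) gives phi''(0) = -2, which matches E[X^2] = 2. For the Cauchy, phi(t) = e^{-|t|} has a corner at 0, and this is how the ch.f. reports the heavy tails: there is no mean, let alone a variance.

The central-difference sketch below in Python shows the contrast. The step sizes and the function names are illustrative.

```python
import math


def second_difference_at_zero(phi, h):
    """(phi(h) - 2 phi(0) + phi(-h)) / h^2, a central-difference estimate of phi''(0)."""
    return (phi(h) - 2 * phi(0.0) + phi(-h)) / (h * h)


def double_exp_chf(t):
    return 1 / (1 + t * t)


def cauchy_chf(t):
    return math.exp(-abs(t))


for h in (1e-1, 1e-2, 1e-3):
    # The first column settles near -2 = -E[X^2]; the second behaves like -2/h.
    print(h, second_difference_at_zero(double_exp_chf, h),
          second_difference_at_zero(cauchy_chf, h))
```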

Heads Up --- NNQ

There will be a No-Name-Quiz on Wednesday. Please review the most important results that we have covered in the last two weeks, and get them into short-term memory. They can never be part of long-term memory if they don't first make it to short-term memory!

The venerable Wikipedia does a decent biographical job for Dirichlet, but its discussion of Dirichlet's discontinuous integral only gets a B minus. Feel free to edit it.

In other business, there are two more proofs of the L^1 maximal inequality based on two lemmas of independent interest: the "leader lemma" and the "rising sun lemma." One of the reasons these interest me is that they are just facts about real numbers; it's hard to get closer to the bone.
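Here is one common formulation of the leader lemma. Given real numbers a_1, ..., a_n, call a_k a leader if a_k + a_{k+1} + ... + a_l >= 0 for some l >= k. Then the sum of all the leaders is nonnegative, because the leaders split into consecutive blocks, each with a nonnegative sum. The formulation used in class may differ in details such as strict versus weak inequality.

Since the lemma is just a fact about finite sequences, it is easy to test by brute force. The Python sketch below uses integer inputs so that rounding cannot interfere; the function names and the random test design are illustrative.

```python
import random


def leaders(a):
    """Indices k for which a[k] + ... + a[l] >= 0 for some l >= k."""
    found = []
    for k in range(len(a)):
        running = 0
        for l in range(k, len(a)):
            running += a[l]
            if running >= 0:
                found.append(k)
                break
    return found


def sum_of_leaders(a):
    return sum(a[k] for k in leaders(a))


rng = random.Random(2014)
worst = min(
    sum_of_leaders([rng.randint(-10, 10) for _ in range(rng.randint(1, 12))])
    for _ in range(10_000)
)
print("smallest sum of leaders seen:", worst)
```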

Day 9. Characteristic Functions and Dirichlet's Integral

For the warm-up problems, we'll look at examples of characteristic functions and Laplace transforms. We then look at some general properties of characteristic functions including the uniform continuity property and the positive definite property.

Next we look at the integral of sin x over x. We will cherry pick from the scholium on sin(x)/x but there will be lots left for you to read. An important goal will be to give a clear and complete discussion of Dirichlet's Integral. You will want to digest this pretty quickly; it will be needed next class, and you don't want it to be a mystery.
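A few numbers help make the discussion concrete. Write Si(T) = int_0^T sin(x)/x dx. Then Si(T) -> pi/2, with |Si(T) - pi/2| <= 2/T, which follows from integrating by parts on [T, infinity). The largest value of Si is Si(pi) ~ 1.8519. Since int_0^T sin(theta x)/x dx = sgn(theta) Si(|theta| T), these integrals are bounded uniformly in T and theta, which is the fact that makes the finite form useful.

The Python sketch below computes Si by Simpson's rule. The step density and the function name are illustrative choices.

```python
import math


def si(T, per_unit=200):
    """Si(T) = int_0^T sin(x)/x dx by Simpson's rule (about `per_unit` steps per unit length)."""
    n = max(2, 2 * math.ceil(per_unit * T / 2))  # an even number of subintervals
    h = T / n

    def sinc(x):
        return 1.0 if x == 0.0 else math.sin(x) / x

    total = sinc(0.0) + sinc(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * sinc(k * h)
    return total * h / 3


for T in (math.pi, 10.0, 100.0, 500.0):
    print(f"Si({T:.4f}) = {si(T):.6f}, pi/2 = {math.pi / 2:.6f}, bound 2/T = {2 / T:.4f}")
```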

As time permits, we'll discuss other properties of characteristic functions and look at some big picture issues. We'll be doing classical mathematics for a few days, and you want to make sure you can see the forest for the trees, however lovely the trees may be.

Homework 6 is due on Monday October 6. If you have not been using Latex yet, it is time to bite the bullet. From this point onward, Latex is required. You can save a little typing if you start with my Latex source for the problems. Be sure to read the text through section 3.3. Not everything can be covered in class.

Homework Policies

We're well into the homework process now, but it seems that some reminders are needed about HW policies. You are strongly encouraged to do the HWs individually. At least 80% should be done individually. If you collaborate with a fellow student on a problem or if you get a hint from a friend, you should acknowledge this with a written statement on your HW.

Scholium on Sin X over X

We will shortly need some information about the integral of sin(x)/x. We won't need a ton of information, but this topic offers a beautiful window on classical mathematics. You should take a peek if you have the time. I've written a brief and leisurely scholium on sin(x)/x, which has more than we need --- but you may still find it useful (even entertaining). Among other things, it gets the Laplace transform of sin (x) by five different methods.

Day 8. Proof of the L^1 Maximal Inequality

The warm-up problems will deal with the Laplace transform and the MGF. This is both to set up the work with transforms in Chapter 3 and to remind everyone about some basic calculus (e.g. the Gamma function, differentiation under the integral etc.)

The main task is to use first-step analysis to prove the maximal inequality that was introduced earlier. Naturally, we will follow the teaching note which I hope you have previewed.

As time permits we'll revisit the Paley-Zygmund argument and look at some of its variations. I have written some of this out in a note on lower bounds. We may not cover all of this, and we may revisit it from time to time.
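For reference, the Paley-Zygmund inequality says the following. If Z >= 0 has a finite second moment and 0 <= theta <= 1, then P(Z > theta E[Z]) >= (1 - theta)^2 (E[Z])^2 / E[Z^2].

Here is a small exact check in Python on a few discrete distributions. The test distributions and the function names are arbitrary illustrative choices. The "rare big value" case shows the bound can be attained at theta = 0.

```python
import math


def pz_sides(pmf, theta):
    """(P(Z > theta E Z), (1 - theta)^2 (E Z)^2 / E Z^2) for a pmf given as {value: prob}."""
    mean = sum(z * p for z, p in pmf.items())
    second = sum(z * z * p for z, p in pmf.items())
    lhs = sum(p for z, p in pmf.items() if z > theta * mean)
    rhs = (1 - theta) ** 2 * mean ** 2 / second
    return lhs, rhs


def binomial_pmf(n, p):
    return {k: math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


examples = {
    "Binomial(20, 0.1)": binomial_pmf(20, 0.1),
    "Binomial(5, 0.5)": binomial_pmf(5, 0.5),
    "rare big value": {0: 0.99, 100: 0.01},
}
for name, pmf in examples.items():
    for theta in (0.0, 0.5, 0.9):
        lhs, rhs = pz_sides(pmf, theta)
        print(f"{name}, theta={theta}: P = {lhs:.4f}, PZ lower bound = {rhs:.4f}")
```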

Day 7. An L^1 Maximal Inequality and a Proof of the SLLN

We'll have a couple of warm-up problems that use the moment generating function. One of these gives us a version of Hoeffding's Lemma. This leads in turn to a powerful concentration inequality that makes the SLLN for bounded random variables "trivial".
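For reference, Hoeffding's lemma says that if E[X] = 0 and a <= X <= b, then E[e^{lambda X}] <= exp(lambda^2 (b - a)^2 / 8) for all real lambda. Combine it with Markov's inequality and optimize over lambda. For IID X_i in [a, b] with mean mu, this gives P(S_n/n - mu >= t) <= exp(-2 n t^2 / (b - a)^2). The right side is summable in n. Applying Borel-Cantelli to both tails, for each fixed t > 0, yields the SLLN for bounded variables.

The Python sketch below checks the lemma on mean-zero two-point distributions, which are the extreme case in the usual convexity proof. The function names and the grid of lambda values are illustrative.

```python
import math


def two_point_mgf(a, b, lam):
    """E[exp(lam X)] for the mean-zero variable taking the values a < 0 < b."""
    p_b = -a / (b - a)  # P(X = b), chosen so that E[X] = 0
    return p_b * math.exp(lam * b) + (1 - p_b) * math.exp(lam * a)


def hoeffding_lemma_bound(a, b, lam):
    return math.exp(lam ** 2 * (b - a) ** 2 / 8)


def hoeffding_tail_bound(n, t, a, b):
    """Upper bound for P(S_n / n - mu >= t) when the X_i are IID with values in [a, b]."""
    return math.exp(-2 * n * t ** 2 / (b - a) ** 2)


for a, b in ((-1.0, 1.0), (-0.2, 0.8), (-3.0, 0.5)):
    worst = max(two_point_mgf(a, b, lam) / hoeffding_lemma_bound(a, b, lam)
                for lam in (k / 4 for k in range(-20, 21)))
    print(f"values {a}, {b}: largest MGF/bound ratio = {worst:.4f}")
print(hoeffding_tail_bound(n=1000, t=0.1, a=0.0, b=1.0))
```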

We introduced another maximal inequality, the L^1 maximal inequality. This has notable benefits over Kolmogorov's L^2 maximal inequality and it gives a very direct proof of the SLLN. We did not prove the L^1 maximal inequality but we'll get it next time.

We then looked at some applications of the technology of the weak law of large numbers. We gave Bernstein's proof of the Weierstrass approximation theorem and we started to look at the proof that the Laplace transform determines the function that was transformed. We did not complete the argument, but we reviewed some of the needed facts about the gamma family of densities.
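The probabilistic idea runs as follows. For f continuous on [0, 1], the Bernstein polynomial B_n f(x) = sum_k f(k/n) C(n,k) x^k (1 - x)^{n-k} equals E[f(S_n/n)] with S_n ~ Binomial(n, x). The variance of S_n/n is x(1 - x)/n <= 1/(4n) uniformly in x. Together with uniform continuity of f, the WLLN argument therefore gives B_n f -> f uniformly.

Here is a small Python sketch for f(x) = |x - 1/2|. The corner at x = 1/2 makes convergence slow there, roughly of order n^{-1/2}. The function names and the grid size are illustrative.

```python
import math


def bernstein(f, n, x):
    """B_n f(x) = E[f(S_n / n)] with S_n ~ Binomial(n, x)."""
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))


def sup_error(f, n, grid_size=200):
    """max |B_n f(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    xs = [i / grid_size for i in range(grid_size + 1)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)


def corner(x):
    return abs(x - 0.5)


for n in (25, 100, 400):
    print(n, sup_error(corner, n))
```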

HW 5 will be due on Monday, September 29.

HW 4 Note --- A Complete Graph of Inferences

In problem 4 of HW4, you will want to avoid use of the DCT if you want the benefit of a new proof of the DCT in problem 5. Here you can use the MCT, but perhaps you can even avoid the use of the MCT. Part of the point here is that you can essentially form a complete graph of the implications between the MCT, DCT, and Fatou. Any given book chooses an order, and starting with the MCT is most natural, but all 6 permutations can be executed.

Easiest Proof of SLLN?

We will shortly go over what I believe to be the easiest and most direct proof of the SLLN. If you want to read this ahead of time, please check out the teaching note. Also, if you have any feedback about this note, it would be useful to me. I'd appreciate anything from typos to "here is where I get stuck."

Day 6. Adult Strength SLLN

We begin with a NO NAME quiz. You will be asked to write down the answers to three questions that I will write on the blackboard. Two of the questions will ask you to give full and correct statements of some lemmas, theorems, or facts that we have developed in class. One question is about calculus.

We'll go from the quiz to two warm-up problems that should put firmly in mind a couple of ideas that we need later in the class. We then prove the SLLN under the assumption of IID random variables and just a finite first moment.

This takes a sustained argument with three slices:

  • A DCT slice --- pretty trivial but requiring some knowledge.
  • A BC lemma slice --- again easy, but requiring a knowledge of a calculation and a lemma.
  • A Kolmogorov One Series Slice. Here we use V1.1, and we have to do a nice calculation to make it tick. The warm-up problems help us here.
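For orientation, here is the kind of calculation that makes the one-series slice work, in the form found in standard textbooks; the version done in class may arrange it differently. With Y_n = X_n 1(|X_n| <= n), identical distribution and Tonelli give:

```latex
\sum_{k \ge m} \frac{1}{k^2} \le \frac{1}{m^2} + \int_m^\infty \frac{dx}{x^2} \le \frac{2}{m}
  \qquad (m = 1, 2, \ldots),

\sum_{n \ge 1} \operatorname{Var}\Big(\frac{Y_n}{n}\Big)
  \le \sum_{n \ge 1} \frac{E\big[X_1^2 \, \mathbf{1}(|X_1| \le n)\big]}{n^2}
  = E\Big[X_1^2 \sum_{n \ge \max(1, |X_1|)} \frac{1}{n^2}\Big]
  \le E\Big[\frac{2 X_1^2}{\max(1, |X_1|)}\Big]
  \le 2\, E|X_1| < \infty .
```

The one series theorem then gives a.s. convergence of sum_n (Y_n - E[Y_n])/n.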

This argument deserves to be mastered, and we'll take our time with it. If we do have time to spare, we'll revisit the proof of Levy's series theorem which we only sketched last time. We may also look at the highly flexible Paley-Zygmund argument.

Day 5. Series Theorems of Kolmogorov and Levy

We begin with a warm-up problem or two: The proof of Jensen's inequality is one of these.

We then put the Cauchy criterion into the language of random variables. In particular, we put the difference between convergence in probability and convergence with probability one into a tidy analytical box. Everyone needs to sort out the ways to express what it means for a sequence to be almost surely Cauchy, and everyone needs to see how this differs from the definition of a sequence being Cauchy in probability.
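In symbols, for partial sums S_n (or any sequence of random variables), the two notions differ only in where the supremum sits relative to P:

```latex
\text{almost surely Cauchy:} \quad
  \lim_{N \to \infty} P\Big(\sup_{m, n \ge N} |S_m - S_n| > \varepsilon\Big) = 0
  \quad \text{for every } \varepsilon > 0,

\text{Cauchy in probability:} \quad
  \lim_{N \to \infty} \sup_{m, n \ge N} P\big(|S_m - S_n| > \varepsilon\big) = 0
  \quad \text{for every } \varepsilon > 0.
```

The first condition implies the second. By completeness of the reals, they are equivalent to almost sure convergence and to convergence in probability, respectively.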

We use Kolmogorov's maximal inequality to prove Kolmogorov's One Series Theorem v1.0 and we note that the same argument gives us v1.1. We'll note how the Kronecker lemma then gives us the SLLN for IID random variables with a finite variance.

We'll then prove a curiously general maximal inequality due to Paul Levy, and we'll use this inequality to prove Levy's Series Theorem which says that for series of independent summands convergence in probability and convergence with probability one are equivalent.

Homework 4 is due Monday September 22.

Note: Some people did not do well on HW2 because they did not have deep enough experience with delta-epsilon proofs. This is not a deficit that one can make up with a little extra work; most people need a solid one-year course in analysis to succeed in 530. The demands on your analysis skill will start piling up very substantially, so, if you are not confident in your analysis skill, you should consider switching to auditor status.

9/11 Coaching Added

I added a tiny bit of Cauchy coaching to the end of the HW3 assignment. It may keep you from getting confused in one of the problems.

Day 4. More Techniques for the SLLN

We begin with three warm-up problems. They are motivated by what they suggest about problem solving. They also give us some facts that we'll need later.

We then give a proof of the SLLN for i.i.d. random variables with a finite variance. This is a "two trick" proof: (a) passing to a subsequence (b) using monotone interpolation to solve the original problem. We'll see many proofs of this theorem. This version is one of the simplest and most direct.

The task then becomes the proof of the SLLN under the most natural conditions where we only assume we have a finite first moment. We'll approach this by first considering infinite sums --- of real numbers and of random variables. This will lead us to the consideration of our first maximal inequality, Kolmogorov's "Weak Type L^2" maximal inequality.
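In the usual statement, suppose X_1, ..., X_n are independent with mean zero and finite variances, and S_k = X_1 + ... + X_k. Then P(max_{1 <= k <= n} |S_k| >= x) <= Var(S_n)/x^2 for x > 0. This is the same as the Chebyshev bound for the single variable S_n, but it controls the whole path.

The quick Python simulation below uses independent +/-1 steps. The step distribution, n = 100, the seed, the replication count and the function names are all illustrative choices.

```python
import random


def max_abs_partial_sum(n, rng):
    """max_{k <= n} |S_k| for a simple random walk with independent +/-1 steps."""
    s, best = 0, 0
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        best = max(best, abs(s))
    return best


def maximal_probability(n, x, reps=5_000, seed=0):
    """Monte Carlo estimate of P(max_{k <= n} |S_k| >= x)."""
    rng = random.Random(seed)
    return sum(max_abs_partial_sum(n, rng) >= x for _ in range(reps)) / reps


n = 100  # Var(S_n) = n for +/-1 steps
for x in (12, 15, 20):
    print(x, maximal_probability(n, x), n / x ** 2)
```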

You may need to review the Cauchy criterion. The venerable Wikipedia is a little lame this time but there is a useful discussion posted by an Oxford professor. It is not sophisticated but it is worth reading.

Day 3. First Look at Limit Laws

We'll revisit BCII and give a generalization. The proof illustrates two basic "tricks": The benefit of working with non-negative random variables and the juice one can get by "passing to subsequences." We'll see other versions of BCII as the course progresses.

We'll then prove a couple of easy versions of the Strong Law of Large Numbers. These results are not critical in and of themselves, but the techniques are very important. You can make a modest living with just the techniques that are covered today.

We may also prove a version of the Kolmogorov maximal inequality. The theory and applications of maximal inequalities form one of the big divides between elementary probability theory and graduate probability theory. Maximal inequalities are not particularly hard, but they create a shift in the sophistication of the conversation.

Homework 3 is due on Monday, September 15.

Comment on Homework

There were too many people who did not do the first homework. This could have happened for a variety of reasons, but it should not happen in the future. Please read the policies on homework. I can guarantee you that you will learn more by doing the homework than from any other part of the course.

Please also keep up with the blog. In particular, you should always be on the lookout for bug reports on the current homework.

Fatou's Lemma

The Wikipedia article on Fatou's Lemma is surprisingly good. In addition to the usual proof it gives a "direct proof" (without the MCT). It also gives a version with "changing measures" that was new to me. It looks handy. Finally, it discusses the "conditional version" which we will need toward the end of the semester. Some attention is needed to the difference between the (easier) probability spaces and the (only slightly harder) spaces that do not have total mass equal to 1.

Homework Comments 9/4 and 9/7

I corrected a bug in the last problem on Homework No. 2 and a bug in the hint. Please look at the corrected version. BTW, this is a reminder that when you see what looks like a bug on a problem you should (a) check the web page for a correction and (b) if there is no posted correction, then send me email so that I can sort things out.

Day 2. MCT, DCT, Fatou, Etc.

We look at the fundamental results of integration theory from the probabilist's point of view. After asking what one wants from an "expected value", we look at Lebesgue's answer --- and see why it is surprising. We'll then do "problem solving" to discover a proof of the monotone convergence theorem --- after first finding the 'baby' MCT. With the MCT in hand, Fatou's Lemma is easy. With Fatou in hand, the Dominated Convergence Theorem is easy.

We'll then look at how one can estimate some expectations and look at three very fundamental inequalities. As time permits, we'll revisit the second Borel Cantelli lemma, perhaps giving two proofs.

Homework No. 2 will be due on Monday September 8 at class time. This will put us into a regular schedule of new homework posted each Monday and due the following Monday.

Office Hours: I will have office hours on Monday and Wednesday, 3pm to 4pm. Our course TA Peichao Peng will have office hours Tuesday and Thursday 4:30-5:30.

Midterm Exam: Monday, November 10 (Class Time)

This is super-advanced notice of our midterm exam. I will give more details about the exam as the time gets closer, but the short version of the plan is that it should be about factual knowledge, not about cleverness or problem solving. Those skills are left for the homeworks and the take-home final. Since the midterm exam is about knowledge, no notes are permitted. The exam will count, but it will not count too heavily: it will have a 10% weight in the total grade. Full Disclosure: I will be at the INFORMS annual meeting for most of the week of November 10. There will be no class on November 12.

Day 1. Getting Right to Work

We will go over the plan for the course and then get right to work. The main idea is independence, and some simple questions lead to the need for some new tools. We'll also meet two of our most constant companions: the two Borel Cantelli lemmas. We'll give proofs of these, and then start looking at applications. In the course of events, you will be reminded of various ideas from real analysis, especially limsup and liminf.

Homework No. 1 is due on Wednesday September 3.

This website is the place where one checks in to find the current homework and all of the additional information about our course, including periodic postings of supplemental material.

You can look at the course syllabus for general information about the course as well as information about grading, homework, the midterm, and the final exam.

Please do review the course policies. I count on participants to read and follow these policies. They are quite reasonable, and it is awkward to have to single out an individual for not following our few rules.

Feel free to contact me if you have questions about the suitability of the course for you. In general the course will only be appropriate if you have had a solid background in real analysis, preferably at the graduate level.