A Course for Ph. D. Students

Course Blog Fall 2016

Day 25 Martingale Convergence Theorem: Applications

The main task is to make sure that everyone has mastered the martingale convergence theorem. We'll work on this mainly by doing applications. One of the best of these is the Kolmogorov "one series" theorem. We'll see that the martingale convergence theorem gives us a marvelously straightforward way to get to the essence of this theorem.

Don't forget that we will have our "Second Midterm" on Monday Dec 5. Anything in our course is fair game for the test, so it is useful to do a complete review. Still, for balance, one question will be "based" on Days 1 through 12, and five will be based on Days 13 through 25.

Day 24 Martingales, Stopping Times, Gambling Systems

We really did the twice deferred ABRACADABRA problem. We also developed Doob's System theorem, which tells us in a rigorous way that there is no way to beat a casino. We looked at examples of martingales and the "fairness principle," which we used to calculate the hitting probabilities associated with unbiased and biased random walks.
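As a small illustration of the fairness principle, here is a sketch of the hitting probabilities for a walk started at 0 with steps +1 (probability p) and -1 (probability 1 - p). The function name and the choice of barriers -a and +b are assumptions made for this example, not notation from class. It uses the two standard martingales: S_n in the unbiased case and (q/p)^{S_n} in the biased case.

```python
from fractions import Fraction


def prob_hit_b_before_minus_a(a: int, b: int, p: Fraction) -> Fraction:
    """P(walk started at 0 reaches +b before -a), steps +1 w.p. p and -1 w.p. 1-p."""
    q = 1 - p
    if p == q:
        # Unbiased case: S_n is a martingale, so 0 = E[S_T] = b*P - a*(1 - P).
        return Fraction(a, a + b)
    # Biased case: r**S_n with r = q/p is a martingale, and optional stopping
    # gives 1 = r**b * P + r**(-a) * (1 - P); solving for P yields the formula.
    r = q / p
    return (1 - r ** a) / (1 - r ** (a + b))


print(prob_hit_b_before_minus_a(3, 7, Fraction(1, 2)))
print(prob_hit_b_before_minus_a(3, 7, Fraction(3, 5)))
```

Exact rational arithmetic makes it easy to check that the answer is consistent with conditioning on the first step.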

The other big item was the proof of Doob's Up-Crossing inequality for supermartingales. We obtained from this inequality our first version of the famous Martingale Convergence theorem. There are several more versions that also follow immediately from this one.

There will be no more homeworks; the one coming in today is the last.

Hint on Last Homework

To show that the point masses are the only extreme points in the set of probability measures, it is instructive to first show that a probability measure that is given by a density cannot be an extreme point. To mimic this argument in general, you may want to review the Hahn decomposition theorem from real analysis. We mentioned it very briefly in class. In a nutshell it says that a signed measure can be written as the difference of two ordinary measures. Here, to go from your non-point measure mu to a signed measure, you can subtract a small multiple of Lebesgue measure --- that's one benefit of taking Omega to be [0,1]. An alternative method that does not use the Hahn decomposition argues by contradiction. Assume mu is extreme and assume that mu has two disjoint open neighborhoods that each have positive probability. Now, find a "local surgery" to get two measures whose average is mu.

Day 23 Some Pi-Lambda Applications, then Martingales

I'll mention a couple of pi-lambda applications, and --- with help from a fact on the last HW --- we'll give our first proof of Kolmogorov's zero-one law. We'll give a second proof after we have developed some martingale theory.

Our first martingale experience just uses the intuitive notion of a fair game to calculate the EXACT expected time until a monkey types ABRACADABRA.
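The fair game bookkeeping can be sketched in a few lines. This assumes independent, uniformly distributed keystrokes on a 26-letter alphabet; the function name is made up for the example. A fresh gambler arrives before each keystroke and bets everything on the pattern, letter by letter. The expected waiting time equals the total fortune held by the gamblers at the moment the pattern is completed.

```python
def expected_waiting_time(pattern: str, alphabet_size: int) -> int:
    """Expected number of keystrokes until `pattern` first appears, for
    i.i.d. uniform keystrokes over an alphabet of `alphabet_size` letters."""
    n = len(pattern)
    total = 0
    for k in range(1, n + 1):
        # The gambler who arrived k keystrokes before the end is still winning
        # exactly when the length-k prefix of the pattern equals its length-k
        # suffix; that gambler then holds alphabet_size**k.
        if pattern[:k] == pattern[n - k:]:
            total += alphabet_size ** k
    return total


abracadabra_time = expected_waiting_time("ABRACADABRA", 26)
print(abracadabra_time)
```

For ABRACADABRA the matching overlaps have lengths 1, 4, and 11, so the answer is 26^11 + 26^4 + 26.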

We'll then start making things rigorous. The first steps are (1) to introduce the notion of a stopping time and (2) to prove Doob's "systems" theorem. The latter explains why there is no system for beating roulette or other casino games. Shortly these ideas will lead us to an elegant generalization of Kolmogorov's maximal inequality due to Doob.

Day 22 Dynkin's Pi-Lambda Theorem and Applications

I've never found a text with a proof of the pi-lambda theorem that was satisfying. They are all the same; and they all seem like a sequence of checks that lack a direction --- or more critically --- a PICTURE. I think I have the right way to explain this proof. The idea is to use three pictures --- which are oddly all very similar. I hope you will engage this proof very actively. If it works, I may write up a class note on it, though I hate drawing pictures with a computer.

FYI: here are my sloppy but nice (!?) notes on Dynkin's pi-lambda theorem.

HW 11 (PDF and Latex Source). This is due on November 21.

Day 21 Completing Conditional Expectation; Introducing Martingales

We complete the Hilbert space construction of conditional expectations, and we take a first look at Martingales. We won't do the pi-lambda theorem until next week. I hope to give a more enlightening derivation of this theorem than one finds in the literature.

I will not be holding office hours today, but after 5pm I can answer your emails if you have questions.

Day 20 Conditional Expectations in Earnest

This is a foundational day devoted to the definition and construction of conditional expectations. To do the job, we'll need some background about Hilbert space projections, and this provides a good reminder that when we are working with square integrable random variables, Hilbert Space and Probability Theory are almost cognate theories.

Either today or shortly, we'll also introduce Dynkin's famous pi-lambda theorem. In many ways this is the "only" foundational theorem that the working probabilist ever needs to know. Of course, that's an overstatement, but it does help one calibrate how useful the theorem can be.

HW 10 (PDF and Latex Source). This is due on November 14. One should note that for one of these problems there is a "remarkably similar" problem in the text. Play as fairly as you can just by following the hints; these are already very substantial.

Day 18: Metrics on Distributions and Assorted Topics

Today's plan is eclectic. First, we'll look at the notion of metrics on distributions including the total variation distance, the Wasserstein distance, and Kolmogorov's distance. Convergence with respect to any one of these is enough to imply convergence in distribution, i.e. these topologies are stronger than the topology of convergence in distribution.

We spend most of our effort on the total variation distance in discrete spaces. In particular, we'll prove the "coupling identity" for TV distance, and we'll use the idea of coupling to prove an elegant "fixed n" version of the Poisson approximation theorem. The result is called LeCam's inequality.
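To see LeCam's inequality in action, here is a small numerical sketch. It takes the total variation distance to be half the l^1 distance between the two mass functions, and it compares a sum of independent Bernoulli(p_i) variables with a Poisson variable of the same mean; the bound is then the sum of the p_i^2. The success probabilities below are arbitrary values chosen for illustration.

```python
import math


def bernoulli_sum_pmf(probs):
    """Exact pmf of S = X_1 + ... + X_n for independent Bernoulli(p_i)."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf


def tv_to_poisson(probs):
    """Total variation distance between S and Poisson(sum of the p_i)."""
    lam = sum(probs)
    pmf = bernoulli_sum_pmf(probs)
    poisson = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(len(pmf))]
    # S puts no mass above n, so the Poisson tail beyond n counts in full.
    tail = max(0.0, 1.0 - sum(poisson))
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf, poisson)) + tail)


probs = [0.05, 0.1, 0.02, 0.08, 0.1]  # illustrative values only
tv = tv_to_poisson(probs)
lecam_bound = sum(p * p for p in probs)
print(tv, lecam_bound)
```

The point of the "fixed n" formulation is visible here: nothing is sent to infinity, and the bound holds for this particular list of probabilities.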

Finally, we'll add to the tools we need to begin the study of martingales. The first tool is "uniform integrability" and this also figures into some homeworks that are coming up. We also need to review some Hilbert space ideas that we will use shortly in the construction of the conditional expectation of a random variable with respect to a sigma-field.

Day 17: K's 3 Series Theorem and other CHF consequences

After a warm-up problem to be determined, we prove the famous Kolmogorov Three Series Theorem. The direct part of the theorem is just an easy consequence of Borel Cantelli and the (less famous) Kolmogorov 1 series theorem. It's the converse that I find to be clever, and it is here that one uses the theory of CHFs.

The next topic is a brief discussion of Polya-type distributions. These have a nice theory, but for us the main theme is that their study shows how one can build up from a specific example to a class of examples. We'll also discuss some basic theorems of convexity, especially Choquet's theorem, which in a way is the Prince of all integral representations.

We then have a variety of topics to choose from, but high on the list is the discussion of the number of cycles in a random permutation. This is also closely related to the number of records in a sequence of independent observations. Both of these applications are discussed in the text and they show nicely that Lindeberg's CLT has a very natural role in life.
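For a hands-on check of the cycle count, the sketch below enumerates all permutations of a small set and compares the exact average number of cycles with the harmonic number H_n = 1 + 1/2 + ... + 1/n, which is the standard value of the mean. The helper names are invented for this example, and brute force is only sensible for small n.

```python
from fractions import Fraction
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple with perm[i] = image of i."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles


def mean_cycles_exact(n):
    """Average cycle count over all n! permutations, by brute-force enumeration."""
    counts = [cycle_count(p) for p in permutations(range(n))]
    return Fraction(sum(counts), len(counts))


def harmonic(n):
    """H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


n = 6
print(mean_cycles_exact(n), harmonic(n))
```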

Homework 9: Here you have the PDF and the Latex Source. This is due on Monday November 7, the day before election day.

The venerable Wikipedia has an informative article on characteristic functions. It's not 100% correct, but if you take the statements as "suggestions subject to verification" then it is useful.

Day 16: Mostly on Moments, Distributions, and CHFs

As a second application of Levy, we prove a basic version of the Poisson Approximation theorem. We'll later prove a more powerful inequality that contains this approximation and much more.

We then discuss the "moment problem," e.g. when do (or don't) the moments of a random variable determine its distribution. In the positive direction, the simplest trick is to exploit analytic properties of the characteristic function. Here we don't make explicit use of complex analysis but we do show that there is a disk of fixed radius about each point where the infinite Taylor series holds. This is analytic function theory in everything but name.

In passing we used Stirling's formula. You have no doubt seen this many times in the past but it is possible that you have never worked through a proof. It's time you plugged that gap in your knowledge. One truly general approach is via Laplace's approximation method for integrals. Naturally, one begins with the Gamma function.
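Before working through a proof, it can help to watch the approximation numerically. The sketch below computes the ratio of n! to sqrt(2 pi n) (n/e)^n on the log scale, using the Gamma function via `math.lgamma`; the function name is chosen for this example.

```python
import math


def stirling_ratio(n: int) -> float:
    """n! divided by sqrt(2*pi*n) * (n/e)**n, computed on the log scale."""
    log_factorial = math.lgamma(n + 1)  # log Gamma(n + 1) = log n!
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    return math.exp(log_factorial - log_stirling)


for n in (1, 10, 100, 1000):
    print(n, stirling_ratio(n))
```

The ratio decreases toward 1, and the gap shrinks roughly like 1/(12n), which is the first correction term in the Stirling series.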

Laplace Approximation for Integrals

The Wikipedia article on the Laplace approximation is pretty good. In particular, the proof that is given there is honest and complete. What the Wiki covers is just the classic case, so a little extra work is needed to cover the Wallis integral. Still, once you know how to do one of these cases, they are all very much the same.

Class Note on Lower Bounds

Here is a brief class note on the Paley-Zygmund lower bound method and some related lower bounds. It's not a perfect model for your "four page paper" but the structure is right. At a minimum, you can start with the Latex Source as your wrapper.

Day 15: Lindeberg's CLT and CHF Technology

Our main goal is simple: Prove the Lindeberg CLT by the method of characteristic functions. This is a fully adult CLT that is ready for professional use for the rest of your life.

We'll take the time to do this carefully. Along the way we will get some valuable estimates and "tricks" that are of general applicability.

As time permits, we'll look at other things that one can do with CHFs. In particular, we'll eventually need to have the Polya distribution at our disposal. I'll also hand back and discuss the first mid-term exam.

Homework 8: The PDF and the Latex Source. This is due on Monday October 31. It's not very scary.

Logistic Distribution, CHFs, and Elo++

We'll shortly look at some more "exotic" distributions and their characteristic function theory. While playing with the logistic distribution, I hit upon an interesting paper on chess rating systems and their predictive power. If you are interested in the prediction of outcomes of contests, it is worth a look. It may be suggestive of election prediction technology. For example, you can treat the polling agencies as players and compute their ratings. This would be an empirical take on 538's ABC rankings.

Tao on the CLT

If you have any spare time at all, you should take a look at Tao's take on the CLT. You'll need to pay some dues to deal with slightly different notation etc., but your time is well spent.

Our discussion of the Lindeberg method will be loosely based on Tao's exposition, but first we'll prove Lindeberg's theorem (and other things) via the method of characteristic functions.

Nevertheless, the (post-modern version of the) "original" Lindeberg method is a valuable addition to one's toolkit. Curiously enough, Lindeberg's 1922 paper was almost impenetrable, and his method was saved from obscurity by H. Trotter and W. Feller. The more recent interpretations are even simpler than Feller's.

Day 14. Helly's Theorem and Levy's Continuity Theorem (Part II)

For the warm-up we look at a simple example of an argument that one can awkwardly call "uniqueness plus compactness gives convergence."

We then prove Helly's Theorem as a consequence of some lemmas that underscore some well-isolated components of the classic argument.

We then prove Levy's continuity theorem. The basic ingredients are (a) the "continuity at zero implies tail bound" inequality and (b) Helly's selection theorem.

The proof of Levy's continuity theorem then followed the "uniqueness plus compactness gives convergence" dance. There are untold applications of Levy's continuity theorem. We begin with a simple CLT (or two), and we may start the discussion of Polya type distributions.

Also, as time permits, we may discuss a basic inequality for the remainder in the Taylor approximation of e^{ix}. It is simple, but there are some subtleties to it.
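One common form of this remainder inequality, which may differ in detail from the version used in class, says that |e^{ix} - sum_{k=0}^{n} (ix)^k / k!| is at most min(|x|^{n+1}/(n+1)!, 2|x|^n/n!). The first term in the minimum is the better one for small |x| and the second is the better one for large |x|. The sketch below simply evaluates both sides; the function names are made up for the example.

```python
import cmath
import math


def taylor_remainder(x: float, n: int) -> float:
    """|e^{ix} - sum_{k=0}^{n} (ix)^k / k!|."""
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)


def remainder_bound(x: float, n: int) -> float:
    """min(|x|^{n+1}/(n+1)!, 2|x|^n/n!)."""
    ax = abs(x)
    return min(ax ** (n + 1) / math.factorial(n + 1), 2 * ax ** n / math.factorial(n))


for x in (0.1, 1.0, 5.0, 50.0):
    print(x, taylor_remainder(x, 2), remainder_bound(x, 2))
```

With n = 2 this is the estimate that drives the characteristic function proofs of the CLT: the cubic term handles small arguments and the quadratic term handles the tails.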

Homework 7: The PDF and the Latex Source. This HW is shorter than usual because of the Midterm. It is due on Monday October 24. Note: For problem 2 in this HW you should provide honest derivations, not just quote a result from some statistics book.

Mid-Term Day, Time, and Rules

On Monday October 17 we will have an in-class exam. In such an exam it is not "fair" to expect people to come up with the solution to a deeply original problem. On the other hand, it is certainly fair to ask one to prove almost anything that is close to what has been proved in class, or close to a previously assigned homework problem. Also, one is expected to have total mastery of all definitions and theorem statements. You should keep a keen eye open for any "distinctions" you can express for yourself. These often illuminate understanding.

The test will be closed book, closed notes, and without access to communication devices. Everyone is expected to be informed of the principles of Academic Integrity of the University of Pennsylvania.

In view of the mid-term test, there will be no homework due on October 17.

Day 12. Tightness and Levy's Continuity Theorem (Part I)

For the warm-up we look at some calculations using the Fourier inversion formula.

We then set up the proof of Levy's Continuity Theorem. In particular, we develop the notion of "tightness" of a collection of distributions (or random variables, or measures). The general discussion of tightness is completed with a nice "limsup" characterization of tightness.

We then work toward a criterion for tightness in terms of characteristic functions, and the key technical step is an interesting tail bound due to Levy. As a corollary of this proof we find a nice trick to deal with the (sometimes irksome) fact that characteristic functions are complex valued.

Finally we completed the proof of Levy's Tightness Criterion, which says that if a limit of characteristic functions is continuous in a neighborhood of zero, then the sequence has to be tight.

Next lecture we'll prove Levy's continuity theorem, and begin harvesting its consequences.

Day 11. Distributions, Convergence, and the Fourier View

We will continue to develop the ways in which characteristic functions help us understand distributions. We'll look first at a few elementary results, but a main task will be to develop the tools for proving convergence in distribution using characteristic functions.

The main theorem in our sights is Levy's continuity theorem, but some intermediate results and examples will need to be covered first. In particular, we'll need the notion of tightness and the Helly Selection Theorem --- which is a classic compactness result.

In due course, we will develop the smoothness and moment relationships, density characteristic function pairs, Polya's density, Polya type characteristic functions, WLLN, Kolmogorov 3 Series theorem, and various central limit theorems. This kind of activity will be with us for at least two more weeks.

Arzela-Ascoli Theorem

The Wikipedia article on the Arzela-Ascoli theorem is decent and reading this piece is good preparation for our proof of the continuity theorem for characteristic functions. Nevertheless, you may find better discussions in standard textbooks, such as Rudin.

A priori, the article on the Helly selection theorem would be more relevant. Unfortunately, this article is not well focused for applications in probability. In this case, the venerable Wiki pays too much attention to the bounded variation case, and we just need the simpler monotone case.

Day 10. Levy's Inversion Formula

We discussed Levy's inversion formula and we reviewed the basic tool for its proof --- the finite form of Dirichlet's Discontinuous Integral. We then laid out the proof of Levy's Formula with all of the details.

Other agenda items required a less sustained development, but they are important when the time comes to use characteristic functions. Specifically, we noted some of the relationships between characteristic functions and derivatives. It's on deck to look at other manifestations of the meta-principle that "the behavior of the ch.f. at 0 tells you about the behavior of the tail of the distribution, and vice-versa." This principle will be an important guide for the next few days.

Incidentally, the venerable Wikipedia does a decent biographical job for Dirichlet, but its discussion of Dirichlet's discontinuous integral only gets a B minus. Feel free to edit it.

On older business, there are two more proofs of the L^1 maximal inequality based on two lemmas of independent interest. These are the "leader lemma" and the "rising sun lemma". One of the reasons these interest me is that they are just facts about real numbers; it's hard to get closer to the bone.

Day 9. Characteristic Functions and Dirichlet's Integral

For the warm-up problems, we'll look at examples of characteristic functions and Laplace transforms. We then look at some general properties of characteristic functions including the uniform continuity property and the positive definite property.

Next we look at the integral of sin x over x. We will cherry pick from the scholium on sin(x)/x but there will be lots left for you to read. An important goal will be to give a clear and complete discussion of Dirichlet's Integral. You will want to digest this pretty quickly; it will be needed next class, and you don't want it to be a mystery.
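A quick numerical look at Dirichlet's integral may help fix the picture. The sketch below approximates the integral of sin(x)/x over [0, T] by composite Simpson's rule and compares it with pi/2. The step size and the function names are arbitrary choices for this example, and the code is no substitute for the proof.

```python
import math


def sinc(x: float) -> float:
    """sin(x)/x with the removable singularity at 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sine_integral(T: float, steps_per_unit: int = 200) -> float:
    """Composite Simpson approximation to the integral of sin(x)/x over [0, T]."""
    n = 2 * max(1, int(T * steps_per_unit / 2))  # Simpson needs an even count
    h = T / n
    total = sinc(0.0) + sinc(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * sinc(k * h)
    return total * h / 3


for T in (10.0, 100.0, 1000.0):
    print(T, sine_integral(T), math.pi / 2)
```

The values oscillate around pi/2 with an error on the order of 1/T, which is the slow, conditionally convergent behavior that makes the finite form of the integral the right tool for the inversion formula.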

As time permits, we'll discuss other properties of characteristic functions and look at some big picture issues. We'll be doing classical mathematics for a few days, and you want to make sure you can see the forest for the trees --- however lovely the trees may be.

Homework 6 is due on Monday October 10. If you have not been using Latex yet, it is time to bite the bullet. From this point onward, Latex is required. You can save a little typing if you start with my Latex source for the problems. Be sure to read the text through section 3.3. Not everything can be covered in class.

Coaching on HW 6 Problem 1

This problem really can teach you something about problem solving and I hate to give a hint that is so large that it gives the problem away. Still, in the past some people found themselves going in circles with this problem, so I offer some coaching:

  • There is no need to reinvent the wheel. You can use the Paley-Zygmund inequality; you don't have to rederive it.
  • If you stare at PZ, you'll see that it suffices to show E[S_n^2] < 5( E[S_n])^2.
  • This means that you need to estimate E[S_n^2] in terms of E[S_n]^2. This should give you the focus you need to keep from going in circles.
  • If you need one more suggestion, don't forget the relations you know for Var[S_n]. These put some cancellations into the game that can help.
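For reference, the Paley-Zygmund inequality mentioned in the first bullet is usually stated as follows for a non-negative random variable Z with a finite second moment. The symbols Z and theta are notation chosen here; they are not taken from the homework statement.

```latex
P\bigl(Z > \theta\, E[Z]\bigr) \;\ge\; (1-\theta)^2 \, \frac{(E[Z])^2}{E[Z^2]},
\qquad 0 \le \theta \le 1 .
```

With Z = S_n, an upper bound on E[S_n^2] in terms of (E[S_n])^2 turns directly into a lower bound on the probability, which is why the second bullet focuses on that ratio.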

Homework Policies

We're well into the homework process now, but it seems that some reminders are needed about HW policies. You are strongly encouraged to do the HWs individually. At least 80% should be done individually. If you collaborate with a fellow student on a problem or if you get a hint from a friend, you should acknowledge this by a written statement on your HW.

Heads Up --- NNQ

There will be a No-Name-Quiz on Monday. Please review the most important results that we have covered in the last few weeks, and get them into short-term memory. They can never be part of long-term memory if they don't first make it to short-term memory!

Day 8. Probability, Analysis, and Weak Laws

The warm-up problems will deal with the Laplace transform and the MGF. This is both to set up the work with transforms in Chapter 3 and to remind everyone about some basic calculus (e.g. the Gamma function, differentiation under the integral etc.)

There are then two main tasks. First we'll see how Chebyshev's inequality can be used to prove some theorems in analysis such as the Weierstrass approximation theorem and a version of the Laplace inversion formula. We'll then look at some "minor" problems in probability such as records and runs. These are nice to know about even if they do not constitute core theory.
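The probabilistic reading of the Weierstrass theorem is easy to try out. If S_n is Binomial(n, x), then the Bernstein polynomial B_n f(x) is just E[f(S_n / n)], and Chebyshev's inequality is what controls how far S_n / n can stray from x. The sketch below uses a test function with a kink at 1/2 and a finite grid, both chosen only for illustration.

```python
from math import comb


def bernstein(f, n: int, x: float) -> float:
    """E[f(S_n / n)] for S_n ~ Binomial(n, x): the n-th Bernstein polynomial of f at x."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(n + 1))


def sup_error(f, n: int, grid: int = 200) -> float:
    """Largest |B_n f(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    return max(abs(bernstein(f, n, j / grid) - f(j / grid)) for j in range(grid + 1))


def kink(x: float) -> float:
    """A continuous but non-smooth test function."""
    return abs(x - 0.5)


for n in (10, 40, 160):
    print(n, sup_error(kink, n))
```

The error shrinks slowly, on the order of n^{-1/2} for this function, which matches the standard deviation of S_n / n.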

We will shortly need some information about integrals --- especially integrals related to sin(x)/x. We won't need a ton of information, but this topic offers a beautiful window on classical mathematics. You should take a peek if you have the time. I've written a leisurely scholium on sin(x)/x. It has more than we need --- but you may still find it useful (even entertaining). Among other things, it gets the Laplace transform of sin(x) by five different methods.

Day 7. An L^1 Maximal Inequality and a Proof of the SLLN

We'll have a couple of warm-up problems that use the moment generating function. One of these will be used later to give us another version of Hoeffding's Lemma. I also have a symmetry story about the variance.

We then introduce another maximal inequality, the L^1 maximal inequality. It has notable benefits over Kolmogorov's L^2 maximal inequality, and it can be used to give a very direct and very easy proof of the SLLN. If you want to read about this ahead of time, you can check out a brief piece that I did for the American Mathematical Monthly.

As time permits, we'll consider some applications of the technology of the weak law of large numbers such as Bernstein's proof of Weierstrass approximation. We also have the Chung-Erdos lower bound on our to-do list.

HW 5 will be due on Monday, October 3.

HW 4 Note --- A Complete Graph of Inferences

In problem 4 of HW4, you will want to avoid use of the DCT if you want the benefit of a new proof of the DCT in problem 5. Here you can use the MCT, but perhaps you can even avoid the use of the MCT. Part of the point here is that you can essentially form a complete graph of the implications between the MCT, DCT, and Fatou. Any given book chooses an order, and starting with the MCT is most natural. Still, all 6 of the possible permutations can be done.

Day 6. Adult Strength SLLN

We'll do two warm-up problems that should put firmly in mind a couple of ideas that we need later in the class. We then prove the SLLN under the assumption of IID random variables and just a finite first moment.

This takes a sustained argument with three slices:

  • A DCT slice --- pretty trivial but requiring some knowledge.
  • A BC lemma slice --- again easy, but requiring a knowledge of a calculation and a lemma.
  • A Kolmogorov One Series Slice. Here we use V1.1, and we have to do a nice calculation to make it tick. The warm-up problems help us here.

This argument deserves to be mastered, and we'll take our time with it. If we do have time to spare, we'll prove Feller's Weak Law of Large Numbers. It illustrates the idea that one can make some progress even without a first moment.

Day 5. Series Theorems of Kolmogorov and Levy

We begin with a warm-up problem or two: The proof of Jensen's inequality is one of these.

We then put the Cauchy criterion into the language of random variables. In particular, we put the difference between convergence in probability and convergence with probability one into a tidy analytical box. Everyone needs to sort out the ways to express what it means for a sequence to be almost surely Cauchy, and everyone needs to see how this differs from the definition of a sequence being Cauchy in probability.

We recall Kolmogorov's maximal inequality and how we used it to prove Kolmogorov's One Series Theorem v1.0 and we note that the same argument gives us v1.1. We'll also review how we used the Kronecker lemma to get the SLLN for IID random variables with a finite variance.

We'll then prove a curiously general maximal inequality due to Paul Levy, and we'll use this inequality to prove Levy's Series Theorem which says that for series of independent summands convergence in probability and convergence with probability one are equivalent.

Homework 4 is due Monday September 26 --- the night of the first Presidential debate.

Note: Some people did not do well on HW2 because they did not have deep enough experience with delta-epsilon proofs. This is not a deficit that one can make up with a little extra work; most people need a solid one-year course in analysis to succeed in 930. The demands on your analysis skill will start piling up very substantially, so, if you are not confident in your analysis skill, you should consider switching to auditor status.

Coaching Added

I added a tiny bit of Cauchy coaching to the end of the HW3 assignment. It may keep you from getting confused in one of the problems.

Day 4. More Techniques for the SLLN

We begin with three warm-up problems. They are motivated by what they suggest about problem solving. They also give us some facts that we'll need later.

We then give a proof of the SLLN for i.i.d. random variables with a finite variance. This is a "two trick" proof: (a) passing to a subsequence (b) using monotone interpolation to solve the original problem. We'll see many proofs of this theorem. This version is one of the simplest and most direct.

The task then becomes the proof of the SLLN under the most natural conditions where we only assume we have a finite first moment. We'll approach this by first considering infinite sums --- of real numbers and of random variables. This will lead us to the consideration of our first maximal inequality, Kolmogorov's "Weak Type L^2" maximal inequality.
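Kolmogorov's maximal inequality bounds P(max_{k<=n} |S_k| >= lambda) by Var(S_n) / lambda^2 for sums of independent mean-zero summands. For the simple symmetric random walk the left side can be computed exactly, so one can see how much room the bound leaves. The sketch below does this by tracking the walk until it first touches either barrier; the values of n and lambda are arbitrary choices for the example.

```python
from fractions import Fraction


def prob_max_abs_at_least(n: int, lam: int) -> Fraction:
    """Exact P(max_{k<=n} |S_k| >= lam) for the simple symmetric random walk."""
    # alive[s] = probability of sitting at s without having touched +-lam so far.
    alive = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, mass in alive.items():
            for step in (-1, 1):
                t = s + step
                if abs(t) < lam:
                    nxt[t] = nxt.get(t, Fraction(0)) + mass / 2
        alive = nxt
    return 1 - sum(alive.values(), Fraction(0))


n, lam = 100, 25
exact = prob_max_abs_at_least(n, lam)
bound = Fraction(n, lam ** 2)  # Var(S_n) / lam^2, since Var(S_n) = n here
print(float(exact), float(bound))
```

The bound is the same one that Chebyshev gives for |S_n| alone; the content of the maximal inequality is that the maximum over the whole path costs nothing extra.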

You may need to review the Cauchy criterion. The venerable Wikipedia is a little lame this time but there is a useful discussion posted by an Oxford professor. It is not sophisticated but it is worth reading.

Note: I am going to McMaster University this afternoon, so there will be no office hours today (Wednesday, 9/13).

You are encouraged to dig deeply into the differences between rabbits and hares. This is not a frivolous exercise; it can change your life. It is a fine thing to be aware of distinctions --- or at least the possibility of distinctions.

Day 3. First Look at Limit Laws

We'll revisit BCII and give a generalization. The proof illustrates two basic "tricks": The benefit of working with non-negative random variables and the juice one can get by "passing to subsequences." We'll see other versions of BCII as the course progresses.

We'll then prove a couple of easy versions of the Strong Law of Large Numbers. These results are not critical in and of themselves, but the techniques are very important. You can make a modest living with just the techniques that are covered today.

We may also prove a version of the Kolmogorov maximal inequality. The theory and applications of maximal inequalities is one of the big divides between elementary probability theory and graduate probability theory. Maximal inequalities are not particularly hard, but they create a shift in the sophistication of the conversation.

Homework 3 is due on Monday, September 19.

Please also keep up with the blog. In particular, you should always be on the lookout for bug reports on the current homework.

Fatou's Lemma

The Wikipedia article on Fatou's Lemma is surprisingly good. In addition to the usual proof it gives a "direct proof" (without the MCT). It also gives a version with "changing measures" that was new to me. It looks handy. Finally, it discusses the "conditional version" which we will need toward the end of the semester. Some attention is needed to the difference between the (easier) probability spaces and the (only slightly harder) spaces that do not have total mass equal to 1.

Day 2. MCT, DCT, Fatou, Etc.

We look at the fundamental results of integration theory from the probabilist's point of view. After asking what one wants from an "expected value", we look at Lebesgue's answer --- and see why it is surprising. We'll then do "problem solving" to discover a proof of the monotone convergence theorem --- after first finding the 'baby' MCT. With the MCT in hand, Fatou's Lemma is easy. With Fatou in hand, the Dominated Convergence Theorem is easy.

We'll then look at how one can estimate some expectations and look at three very fundamental inequalities. As time permits, we'll revisit the second Borel Cantelli lemma, perhaps giving two proofs.

Homework No. 2 will be due on Monday September 12 at class time. This will put us into a regular schedule of new homework posted each Monday and due the following Monday.

Day 1. Getting Right to Work

We will go over the plan for the course and then get right to work. The main idea is independence, and some simple questions lead to the need for some new tools. We'll also meet two of our most constant companions: the two Borel Cantelli lemmas. We'll give proofs of these, and then start looking at applications. In the course of events, you will be reminded of various ideas from real analysis, especially limsup and liminf.

Homework No. 1 is due on Wednesday September 7. Note: If sometimes you see "530" keep in mind that this is the old number for 930. I'll catch this much of the time but not all of the time.

This website is the place where one checks in to find the current homework and all of the additional information about our course, including periodic postings of supplemental material.

You can look at the course syllabus for general information about the course as well as information about grading, homework, the midterm, and the final exam.

Please do review the course policies. I count on participants to read and follow these policies. They are quite reasonable, and it is awkward to have to single out an individual for not following our few rules.

Feel free to contact me if you have questions about the suitability of the course for you. In general the course will only be appropriate if you have had a solid background in real analysis, preferably at the graduate level.