\documentclass{amsart}
\usepackage{latexsym, bbm, html, enumerate, amssymb, amsmath}
\usepackage[abbr,dcucite]{harvard}
\usepackage[dvips]{graphicx}
\usepackage{setspace}
\usepackage{paralist}
\usepackage{ifpdf}
%\usepackage{wrapfig}
% Choose according to whether one wants to show or not the marginal notes in the ed package
\usepackage[show]{ed}
%\usepackage[hide]{ed}
\usepackage{tikz}
%\usetikzlibrary{snakes}
\newcommand{\subsectionnewline}{\mbox{}\medskip}
%\usepackage{showkeys}
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\theoremstyle{definition}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{example}[theorem]{Example}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{conjecture}[theorem]{Conjecture}
%\numberwithin{equation}{section} \numberwithin{theorem}{section}
\allowdisplaybreaks[1]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% New Commands %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Fields
\newcommand{\R}{\mathbb{R}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
%Probability operators (probability, expectation, sigma-algebras)
\renewcommand{\P}{\mathbb{P}}
\newcommand{\Ps}{\mathcal{P}}
\renewcommand{\SS}{\mathcal{S}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\M}{\mathcal{M}}
\newcommand{\G}{\mathcal{G}}
\newcommand{\U}{\mathcal{U}}
\newcommand{\A}{\mathbf{A}}
\newcommand{\I}{\mathbf{I}}
\renewcommand{\H}{\boldsymbol{H}}
\newcommand{\C}{\mathcal{C}}
\newcommand{\D}{\mathcal{D}}
%\newcommand{\Z}{\mathcal{Z}}
%Induction Hypothesis
\newcommand{\hyp}{\mathbf{H}}
%Random vectors
\newcommand{\B}{\boldsymbol{B}}
\newcommand{\Bq}{\boldsymbol{\tilde{B}}}
\newcommand{\X}{\boldsymbol{X}}
\newcommand{\Y}{\boldsymbol{Y}}
\renewcommand{\S}{\boldsymbol{S}}
\newcommand{\W}{\boldsymbol{W}}
%Vectors
\newcommand{\x}{\boldsymbol{x}}
\newcommand{\s}{\boldsymbol{s}}
\newcommand{\y}{\boldsymbol{y}}
\newcommand{\z}{\boldsymbol{z}}
\newcommand{\w}{\boldsymbol{w}}
\renewcommand{\t}{\boldsymbol{t}}
\renewcommand{\c}{\boldsymbol{c}}
\newcommand{\1}{\mathbbm{1}}
\newcommand{\0}{\boldsymbol{0}}
\DeclareMathOperator{\rank}{rank}
\DeclareMathOperator{\argmax}{argmax}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\Cov}{Cov}
\renewcommand{\a}{\boldsymbol{a}}
\renewcommand{\b}{\boldsymbol{b}}
\newcommand{\p}{\boldsymbol{p}}
\newcommand{\pig}{\boldsymbol{\pi}}
\newcommand{\q}{\boldsymbol{q}}
\providecommand{\norm}[1]{\left\lVert#1\right\rVert}
\providecommand{\abs}[1]{\left\lvert#1\right\rvert}
\providecommand{\cyc}[1]{\left\langle#1\right\rangle}
\providecommand{\set}[1]{\left\lbrace#1\right\rbrace}
\setlength{\marginparwidth}{0.75in}
\let\oldmarginpar\marginpar
\renewcommand\marginpar[1]{\-\oldmarginpar[\raggedleft\footnotesize #1]%
{\raggedright\footnotesize #1}}
% New Note Making Commands (Modified from Dean Foster}
\definecolor{mypurple}{rgb}{.3,0,.5}
\newcommand{\ale}[1]{\noindent{\textcolor{mypurple}{\{{\bf aa:} \em #1\}}}}
\newcommand{\jms}[1]{\noindent{\textcolor{blue}{\{{\bf jms:} \em #1\}}}}
\newcommand{\note}[1]{\noindent{\textcolor{red}{\{{\bf NOTE:} \em #1\}}}}
\newcommand{\old}[1]{\noindent{\textcolor{red}{\{{\bf OLD:} \em #1\}}}}
\begin{document}
\title[]%
{The Paley-Zygmund Argument \\ and Three Variations}
\author[]
{J. Michael Steele}
\thanks{J. M.
Steele: The Wharton School, Department of Statistics, Huntsman Hall
447, University of Pennsylvania, Philadelphia, PA 19104.
Email address: \texttt{steele@wharton.upenn.edu}}
\begin{abstract}
This is a class note on techniques for proving ``lower bounds.'' The leading example is the Paley-Zygmund argument. Closely related
examples include the Chung-Erd\"{o}s
inequality and even Cantelli's inequality (which flips to an upper bound). The Cramer-Rao inequality rounds out the list.
\end{abstract}
%\date{\today, \texttt{\jobname.tex}}
\maketitle
%\doublespacing
\section{Paley-Zygmund Argument}
Consider a nonnegative random variable $X$. It is natural to let $\E[X]$ define a ``unit of scale'' and to look at probabilities such as
$P(X \geq \theta \E[X])$ for $0 <\theta <1$. In combinatorial problems it is often important to get a lower bound on this probability.
The classic way to proceed uses the Paley-Zygmund argument, which is also called the second moment method.
The argument begins with a ``tautological'' identity that is determined by the scaled cut,
$$
X = X \1 (X < \theta \E[X]) + X \1( X \geq \theta \E[X]).
$$
From a trivial bound on the first term and Cauchy-Schwarz applied to the second term we have
$$
\E[X] \leq \theta \E[X] + \E[X^2]^{1/2} P( X \geq \theta \E[X])^{1/2},
$$
so, when we clear the expectations to the left and square, we have
\begin{equation}\label{eq:Paley-Zygmund-Ineq}
\frac{(1-\theta)^2 (\E[X])^2}{\E[X^2]} \leq P( X \geq \theta \E[X]).
\end{equation}
This simple inequality is at the heart of the ``probabilistic method'' which has been used by Erd\"os
and many others to solve some remarkable combinatorial problems, cf. \citeasnoun{AlonSpence1992}.
\citeasnoun{PaleyZygmund:PCPS1932a} introduced this argument, and shortly thereafter they used
a version for complex-valued random variables \cite{PaleyZygmund:PCPS1932b}.
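As a quick numerical sanity check of \eqref{eq:Paley-Zygmund-Ineq}, one can compute both sides exactly for a small discrete distribution. The following Python sketch (an illustration only; the Binomial example and parameter values are chosen here for convenience and are not part of the original note) does this for $X \sim \mathrm{Bin}(10, 0.3)$ and $\theta = 1/2$.

```python
from math import comb

# X ~ Binomial(n, p); compute its pmf exactly
n, p, theta = 10, 0.3, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))        # E[X] = np = 3.0
second = sum(k * k * pmf[k] for k in range(n + 1))  # E[X^2] = np(1-p) + (np)^2

# Paley-Zygmund lower bound versus the exact tail probability
bound = (1 - theta) ** 2 * mean ** 2 / second
tail = sum(pmf[k] for k in range(n + 1) if k >= theta * mean)
assert bound <= tail   # 0.2027... <= 0.8506...
```

Here the bound is far from tight, but in applications of the probabilistic method one typically only needs the tail probability to be bounded away from zero.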
\section{Chung-Erd\"{o}s Inequality}
Let $A_1, A_2, \ldots, A_n$ be events in a probability space. How can one get a lower bound on the probability
of the event $B=\cup A_i$ that at least one of these
events occurs? As in the Paley-Zygmund argument, one begins with a tautology,
$$
\sum_{i=1}^n \1_{A_i} = \1_B \sum_{i=1}^n \1_{A_i}.
$$
By Cauchy-Schwarz and squaring we then have
$$
\{\E(\sum_{i=1}^n \1_{A_i})\}^2 \leq P(B) \E\left[ \{\sum_{i=1}^n \1_{A_i}\}^2 \right].
$$
When we compute the expectations, we get the lower bound
\begin{equation}\label{eq:Chung-Erdos}
\frac{\sum_{i,j} P(A_i)P(A_j)}{\sum_{i,j} P(A_i \cap A_j)} \leq P\left( \cup_{i=1}^n A_i \right),
\end{equation}
where the sums are over all pairs of integers $1 \leq i \leq n$ and $1 \leq j \leq n$. Here, just to be clear about
the notation --- and to make it more intuitive that the lower bound is less than one --- we should note that if we set
$$
S=2 \sum_{i<j} P(A_i \cap A_j) + \sum_{i=1}^n P(A_i),
$$
then $S$ is exactly the denominator of \eqref{eq:Chung-Erdos}, since the diagonal terms satisfy $P(A_i \cap A_i) = P(A_i)$.
Moreover, if we write $N = \sum_{i=1}^n \1_{A_i}$, then the numerator of \eqref{eq:Chung-Erdos} is $\{\E[N]\}^2$ and the
denominator is $S = \E[N^2]$, so by Jensen's inequality the ratio never exceeds one.
\section{Cantelli's Inequality}
Cantelli's inequality refines Chebyshev's inequality for one-sided tails.
Let $Y$ be a random variable with $\E[Y]=0$ and $\E[Y^2] < \infty$. Fix $t>0$ and note
\begin{equation}\label{eq:SlipInPos}
0 \leq t =\E(t-Y) \leq \E[(t-Y) \1 (t-Y \geq 0)].
\end{equation}
By Cauchy-Schwarz we have
$$
0 \leq t \leq \{\E(t-Y)^2\}^{1/2}P(t\geq Y)^{1/2},
$$
so, when we square and recall $\E[Y]=0$, we have
$$
t^2 \leq \{\E(Y^2) + t^2\} \{1- P(Y > t)\}.
$$
Pure algebra then gives
$$
P(Y > t) \leq \frac{\E(Y^2)}{\E(Y^2)+t^2}.
$$
To make the comparison with Chebyshev's inequality, we take a random variable $X$ and let $Y=X-\E[X]$. The last inequality now gives
$$
P(X> t + \E[X]) \leq \frac{\Var [X]}{\Var[ X] + t^2}.
$$
This bound on the upper tail is better than Chebyshev's inequality because of the extra summand $\Var [X]$ in the denominator.
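The improvement over Chebyshev's inequality is easy to see numerically. The following Python sketch (an illustration only; the Binomial example and the value of $t$ are chosen here for convenience) compares the two bounds with the exact tail probability.

```python
from math import comb

# X ~ Binomial(n, p); the centered variable is Y = X - E[X]
n, p, t = 10, 0.3, 2.0
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = n * p           # E[X] = 3.0
var = n * p * (1 - p)  # Var[X] = 2.1

cantelli = var / (var + t**2)   # one-sided Cantelli bound
chebyshev = var / t**2          # Chebyshev bound on the same tail
tail = sum(pmf[k] for k in range(n + 1) if k > mean + t)  # P(X > E[X] + t)
assert tail <= cantelli <= chebyshev
```

Here Cantelli gives roughly $0.344$ while Chebyshev gives $0.525$, and the exact tail probability is smaller than both, as it must be.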
\section{Revisiting the Paley-Zygmund Inequality}
In \eqref{eq:SlipInPos} we used a trivial relation of the form
$\E[Z] \leq \E[Z \1 (Z \geq 0)]$ to get an unexpected refinement of Chebyshev's inequality,
and the same trick can be applied to other problems. In particular, it
can be used to refine the Paley-Zygmund inequality \eqref{eq:Paley-Zygmund-Ineq}, which one can also write as
\begin{equation}\label{eq:Paley-Zygumund-rev}
\frac{(1-\theta)^2 (\E[X])^2}{\Var[X] + (\E[X])^2} \leq \P( X \geq \theta \E[X]).
\end{equation}
To get the refinement we start with
$$
\E[X-\theta\E[X]]\leq \E[(X-\theta\E[X])\1(X-\theta\E[X]\geq 0) ].
$$
When we simplify the left side and apply Cauchy-Schwarz on the right side, we then have
$$
(1-\theta)\E[X]\leq \E[(X-\theta\E[X])^2]^{1/2}\P( X \geq \theta \E[X])^{1/2}.
$$
When we square both sides and simplify again, we get our new Paley-Zygmund inequality:
\begin{equation}\label{eq:Paley-Zygmund-II}
\frac{(1-\theta)^2 (\E[X])^2}{\Var[X] + (1-\theta)^2(\E[X])^2} \leq \P( X \geq \theta \E[X]).
\end{equation}
In view of the extra factor $(1-\theta)^2<1$ in the denominator, this bound is a tiny bit sharper than
the classic Paley-Zygmund inequality \eqref{eq:Paley-Zygumund-rev}.
The bound \eqref{eq:Paley-Zygmund-II} also has the benefit of being
sharp in the sense that one has equality when $X$ is a constant.
These are pleasing features, even though it is hard to imagine an honest problem where \eqref{eq:Paley-Zygmund-II}
does work that cannot be done by \eqref{eq:Paley-Zygumund-rev}.
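Still, the gap between the two bounds can be substantial, as a small numerical experiment shows. The following Python sketch (an illustration only; the Binomial example and $\theta=1/2$ are chosen here for convenience) evaluates \eqref{eq:Paley-Zygumund-rev} and \eqref{eq:Paley-Zygmund-II} side by side.

```python
from math import comb

# X ~ Binomial(n, p); compare the classic and refined Paley-Zygmund bounds
n, p, theta = 10, 0.3, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean, var = n * p, n * p * (1 - p)  # E[X] = 3.0, Var[X] = 2.1

classic = (1 - theta)**2 * mean**2 / (var + mean**2)
refined = (1 - theta)**2 * mean**2 / (var + (1 - theta)**2 * mean**2)
tail = sum(pmf[k] for k in range(n + 1) if k >= theta * mean)
assert classic <= refined <= tail
```

In this example the classic bound is about $0.203$ while the refined bound is about $0.517$, so the refinement is more than a ``tiny bit'' sharper here, even though the true tail probability $0.851$ remains well above both.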
\section{Cramer-Rao Inequality}
This inequality needs some terminology from mathematical statistics. We suppose that we have a family of densities
$\{f_\theta(x): \theta \in \Theta \}$, and we assume we have a function $\hat{\theta}: \R \rightarrow \R $ that we call an \emph{estimator}.
We also suppose that this estimator is \emph{unbiased}, by which we mean that if $X$ has the density $f_\theta$ then
$$
\E[\hat{\theta}(X)] =\theta \quad \text{or, equivalently,} \quad \int_\R \hat{\theta}(x)f_\theta(x) \, dx= \theta.
$$
If we differentiate the last identity with respect to $\theta$ (and assume, as one customarily does, that we may differentiate under the integral sign) we have
$$
\int_\R \hat{\theta}(x)\frac{d}{d\theta}f_\theta(x) \, dx= 1.
$$
Just by the definition of the density function we have for all $\theta$ that
$$
\int_\R f_\theta(x) \, dx= 1,
$$
and we can differentiate this to get
$$
\int_\R \frac{d}{d\theta} f_\theta(x) \, dx= 0=\int_\R \theta \frac{d}{d\theta} f_\theta(x) \, dx.
$$
Now, if we take the difference in the last two relations we get our desired identity,
$$
1=\int_\R (\hat{\theta}(x) -\theta)\frac{d}{d\theta} f_\theta(x) \, dx
=\int_\R (\hat{\theta}(x) -\theta)\frac{\frac{d}{d\theta} f_\theta(x)}{f_\theta(x)} f_\theta(x) \, dx,
$$
and the Cauchy-Schwarz inequality then gives us the bound,
$$
1 \leq
\int_\R (\hat{\theta}(x) -\theta)^2 f_\theta(x) \, dx \int_\R \left\{\frac{\frac{d}{d\theta} f_\theta(x)}{f_\theta(x)}\right\}^2 f_\theta(x) \, dx.
$$
The last integral has a name; it is called the expected Fisher information and it is denoted by $J(\theta)$. It is a bit like entropy and it is
typically easy to calculate. The first integral is just $\Var(\hat{\theta}(X))$, since $\hat{\theta}$ is unbiased.
When we divide by $J(\theta)$ we get
$$
\frac{1}{J(\theta)} \leq \Var(\hat{\theta}(X)),
$$
which is known as the Cramer-Rao inequality. It tells us that no unbiased estimator can have a smaller variance than
$1/J(\theta)$. This bound is the basis for a large part of what one knows about the efficiency of estimators. If you are working in the
class of unbiased estimators and if you attain the lower bound $1/J(\theta)$, then you know that no one can ever beat you ---
however hard they try. Moreover, $J(\theta)$ depends only on the model, not on the estimator. In this sense it is an \emph{a priori} lower bound,
and such bounds are like gold wherever they are found.
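A numerical illustration (a sketch only; the Gaussian location model is chosen here as an example and is not part of the original note): for the $N(\theta,1)$ density one can check that $J(\theta)=1$, and the unbiased estimator $\hat{\theta}(x)=x$ attains the Cramer-Rao bound since $\Var[X] = 1 = 1/J(\theta)$. The Python script below recovers $J(\theta)$ by numerical integration.

```python
import math

theta, h, dx = 0.7, 1e-5, 0.001

def f(x, th):
    # density of N(th, 1)
    return math.exp(-(x - th)**2 / 2) / math.sqrt(2 * math.pi)

# expected Fisher information J(theta) = int {(d/dtheta f)^2 / f} dx,
# with the theta-derivative approximated by a central difference
xs = [-10 + dx * i for i in range(20001)]   # grid on [-10, 10]
J = dx * sum(((f(x, theta + h) - f(x, theta - h)) / (2 * h))**2 / f(x, theta)
             for x in xs)

estimator_var = 1.0                   # variance of the estimator x -> x
assert abs(J - 1.0) < 1e-3            # for N(theta, 1), J(theta) = 1
assert estimator_var >= 1 / J - 1e-3  # the Cramer-Rao bound is attained
```

The same recipe applies to any smooth one-parameter family: replace the density, recompute $J(\theta)$, and compare $1/J(\theta)$ with the variance of a candidate unbiased estimator.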
\section{Further Results}
Another useful lower bound for the union of $n$ events has been given by \citeasnoun{deCaen1997}. This bound can be viewed as a special case of an
inner product inequality due to A. Selberg, which will be discussed later.
\bibliography{biblio}
\bibliographystyle{agsm}
\end{document}