
Le Cam's Inequality and Poisson Approximations

J. Michael Steele

1. INTRODUCTION. For the sum S_n of n independent, nonidentically distributed Bernoulli random variables X_i with P(X_i = 1) = p_i, Le Cam [20] established the remarkable inequality

    \sum_{k=0}^{\infty} \left| P(S_n = k) - e^{-\lambda} \lambda^k / k! \right| \le 2 \sum_{i=1}^{n} p_i^2,    (1.1)

where \lambda = p_1 + p_2 + \cdots + p_n.
Naturally, this inequality contains the classical Poisson limit law (just set p_i = \lambda/n and note that the right side simplifies to 2\lambda^2/n), but it also achieves a great deal more. In particular, Le Cam's inequality identifies the sum of the squares of the p_i as a quantity governing the quality of the Poisson approximation.
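The special case p_i = \lambda/n is easy to check numerically. The sketch below (with illustrative parameters, not taken from the paper) computes the exact l1 distance between the Binomial(n, \lambda/n) and Poisson(\lambda) laws and compares it with the bound 2\lambda^2/n from (1.1):

```python
from math import comb, exp

# Sketch with illustrative parameters: for identical p_i = lam/n the right
# side of (1.1) is 2*lam**2/n.  We compute the exact l1 distance between
# Binomial(n, lam/n) and Poisson(lam) and compare it with that bound.
def l1_binom_poisson(n, lam):
    p = lam / n
    pois = [exp(-lam)]                    # Poisson terms built recursively
    for k in range(1, n + 1):
        pois.append(pois[-1] * lam / k)
    binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    dist = sum(abs(b - q) for b, q in zip(binom, pois))
    dist += 1 - sum(pois)                 # Poisson mass beyond n
    return dist

for n in (10, 100, 1000):
    print(n, l1_binom_poisson(n, 2.0), 2 * 2.0**2 / n)
```

The distance visibly decays like 1/n, in line with the 2\lambda^2/n reading of (1.1).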
Le Cam's inequality also seems to be one of those facts that repeatedly calls to be proved and improved. Almost before the ink was dry on Le Cam's 1960 paper, an elementary proof was given by Hodges and Le Cam [18]. This proof was followed by numerous generalizations and refinements including contributions by Kerstan [19], Franken [15], Vervaat [30], Galambos [17], Freedman [16], Serfling [24], and Chen [11, 12]. In fact, for raw simplicity it is hard to find a better proof of Le Cam's inequality than that given in the survey of Serfling [25].
One purpose of this note is to provide a proof of Le Cam's inequality using some basic facts from matrix analysis. This proof is simple, but simplicity is not its raison d'être. It also serves as a concrete introduction to the semigroup method for approximation of probability distributions. This method was used in Le Cam [20], and it has been used again most recently by Deheuvels and Pfeifer [13] to provide impressively precise results.
The semigroup method is elegant and powerful, but it faces tough competition, especially from the coupling method and the Chen–Stein method. The literature of these methods is reviewed, and it is shown how they also lead to proofs of Le Cam's inequality.

2. MATRIX PROOF OF LE CAM'S INEQUALITY. If one is charged with the task of producing matrices that might help in understanding the distribution of the sum of n independent nonidentically distributed Bernoulli random variables, a little time and thought is likely to lead to n matrices P_i like the N \times N matrix
    P_i =
    \begin{pmatrix}
    1 - p_i & p_i & 0 & \cdots & 0 & 0 \\
    0 & 1 - p_i & p_i & \cdots & 0 & 0 \\
    0 & 0 & 1 - p_i & \cdots & 0 & 0 \\
    \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
    0 & 0 & 0 & \cdots & 1 - p_i & p_i \\
    0 & 0 & 0 & \cdots & 0 & 1 - p_i
    \end{pmatrix}.    (2.1)
This is almost a Markov transition matrix, except of course the last row of P_i does not sum to 1. A benefit of this choice of the P_i is that they can be written as

    P_i = (1 - p_i) I + p_i R,

where I is the N \times N identity matrix and R is the N \times N matrix with 1's on the first superdiagonal and 0's elsewhere. Since each of the P_i is just a linear combination of I and R, any pair of the P_i commute, and because the matrices P_i are so much like Markov transition matrices, their analysis is still reminiscent of the elementary theory of Markov chains.
In fact, by the usual considerations that attend the multiplication of Markov matrices, you can quickly convince yourself that for n < N the top row of the n-fold matrix product P_1 P_2 P_3 \cdots P_n is given by (P(S_n = 0), P(S_n = 1), P(S_n = 2), \ldots, P(S_n = n), 0, 0, \ldots, 0), i.e. the first n + 1 elements of the top row of P_1 P_2 \cdots P_n correspond precisely to the Bernoulli sum probabilities that we wish to estimate. Also at this point, it may be good to be reminded that N is arbitrary except for the constraint n < N, so the padded 0's can go on as far as we like.
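The top-row claim is easy to verify numerically; the sketch below (with illustrative p_i and N > n) multiplies the matrices and compares against the distribution of S_n obtained by direct convolution:

```python
import numpy as np

# Verifying the top-row claim with illustrative p_i and N > n: the first
# n+1 entries of the top row of P_1 P_2 ... P_n are P(S_n = 0), ..., P(S_n = n).
def bernoulli_matrix(p, N):
    return (1 - p) * np.eye(N) + p * np.eye(N, k=1)   # the matrix (2.1)

p = [0.1, 0.3, 0.2]
N = 8
prod = np.eye(N)
for pi in p:
    prod = prod @ bernoulli_matrix(pi, N)

dist = np.array([1.0])
for pi in p:
    dist = np.convolve(dist, [1 - pi, pi])   # law of S_n by direct convolution

print(prod[0, :len(dist)])   # agrees with dist; the remaining top-row entries are 0
```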
So far, we have found a matrix that helps us understand the Bernoulli sum probabilities P(S_n = k), and now we would like to find a matrix that is intimately connected with the Poisson distribution. Given some past experience with calculating matrix functions using the Jordan normal form, one can easily find candidates, but knowledge of Jordan forms is not required. One just needs to compute the exponential of P_i, or, better yet, compute the exponential of a simpler matrix closely connected with P_i.
When we write P_i = I + Q_i, we see Q_i has the pleasing form

    Q_i =
    \begin{pmatrix}
    -p_i & p_i & 0 & \cdots & 0 & 0 \\
    0 & -p_i & p_i & \cdots & 0 & 0 \\
    0 & 0 & -p_i & \cdots & 0 & 0 \\
    \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
    0 & 0 & 0 & \cdots & -p_i & p_i \\
    0 & 0 & 0 & \cdots & 0 & -p_i
    \end{pmatrix}
    = -p_i I + p_i R,    (2.2)
and the Poisson distribution emerges clearly when we compute \exp Q_i:

    \exp Q_i =
    \begin{pmatrix}
    e^{-p_i} & p_i e^{-p_i} & p_i^2 e^{-p_i}/2! & \cdots & p_i^{N-1} e^{-p_i}/(N-1)! \\
    0 & e^{-p_i} & p_i e^{-p_i} & \cdots & p_i^{N-2} e^{-p_i}/(N-2)! \\
    0 & 0 & e^{-p_i} & \cdots & \vdots \\
    \vdots & \vdots & \vdots & \ddots & p_i e^{-p_i} \\
    0 & 0 & 0 & \cdots & e^{-p_i}
    \end{pmatrix}
    = \sum_{r=0}^{N-1} p_i^r e^{-p_i} R^r / r!.    (2.3)

Note that the Q_i commute, so \exp(Q_i)\exp(Q_j) = \exp(Q_i + Q_j), and, in detail,

    \prod_{i=1}^{n} \exp(Q_i) = \exp\left( \sum_{i=1}^{n} Q_i \right) = \exp(-\lambda I + \lambda R),    (2.4)

where \lambda = p_1 + p_2 + \cdots + p_n.
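The identification in (2.3)–(2.4) of the top row of \exp(-\lambda I + \lambda R) with the Poisson(\lambda) probabilities can be checked directly; since R^N = 0, the matrix exponential series is a finite sum. The values of N and \lambda below are illustrative:

```python
import numpy as np
from math import exp, factorial

# Checking (2.3)-(2.4) numerically with illustrative N and lambda: R is
# nilpotent (R^N = 0), so exp(lam*R) is a finite sum, and
# exp(-lam*I + lam*R) = e^{-lam} * exp(lam*R).
N, lam = 12, 1.5
R = np.eye(N, k=1)
E = exp(-lam) * sum(np.linalg.matrix_power(R, r) * lam**r / factorial(r)
                    for r in range(N))
poisson = [exp(-lam) * lam**k / factorial(k) for k in range(N)]
print(np.allclose(E[0], poisson))   # True: the top row is the Poisson(lam) law
```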




The essence of the method is now fully revealed, and we see the proof of Le Cam's inequality boils down to comparing the top rows of \prod_{i=1}^{n} \exp Q_i and \prod_{i=1}^{n} P_i. This can be achieved most systematically by introducing matrix norms.
If A = (a_{ij}) is any matrix, we set

    \| A \| = \max_i \sum_j |a_{ij}|.    (2.5)

This recipe provides a bona fide matrix norm, and, in particular, one can easily check the relations \|AB\| \le \|A\| \|B\|, \|A + B\| \le \|A\| + \|B\|, and \|cA\| = c\|A\| for c \ge 0.
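The norm (2.5) is the maximum absolute row sum, and the norm values quoted below can be confirmed in a few lines; N and p here are illustrative:

```python
import numpy as np
from math import factorial

# The norm (2.5) is the maximum absolute row sum (numpy's infinity norm).
# Quick check of ||P_i|| = 1, ||Q_i|| = 2p, and ||exp Q_i|| <= 1 with
# illustrative N and p.
def row_norm(A):
    return np.abs(A).sum(axis=1).max()

N, p = 6, 0.25
I = np.eye(N)
R = np.eye(N, k=1)
P = (1 - p) * I + p * R
Q = P - I
expQ = np.exp(-p) * sum(np.linalg.matrix_power(R, r) * p**r / factorial(r)
                        for r in range(N))
print(row_norm(P), row_norm(Q), row_norm(expQ))   # 1, then 2p, then at most 1
```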
It is also easy to use the explicit formulas for P_i, Q_i, and \exp Q_i to compute their norms: \|P_i\| = 1, \|Q_i\| = 2p_i, and \|\exp Q_i\| \le 1. When we compare \prod P_i and \prod \exp Q_i using the norm defined by (2.5), we see that the top row attains the maximum, so we have the basic relation
    \sum_{k=0}^{N-1} \left| P(S_n = k) - e^{-\lambda} \lambda^k / k! \right| \le \left\| \prod_{i=1}^{n} P_i - \prod_{i=1}^{n} \exp Q_i \right\|.    (2.6)
Next, it is easy to check that

    P_1 P_2 \cdots P_n - \exp Q_1 \exp Q_2 \cdots \exp Q_n = (P_1 - \exp Q_1)(P_2 \cdots P_n) + (\exp Q_1)(P_2 \cdots P_n - \exp Q_2 \cdots \exp Q_n).    (2.7)

This identity virtually completes the proof. We just take norms, use the facts that \|P_2 \cdots P_n\| \le 1 and \|\exp Q_1\| \le 1, then repeat the process on the remaining (n-1)-fold product to obtain

    \| P_1 \cdots P_n - \exp Q_1 \cdots \exp Q_n \| \le \| P_1 - \exp Q_1 \| + \| \exp Q_2 \cdots \exp Q_n - P_2 \cdots P_n \| \le \sum_{i=1}^{n} \| P_i - \exp Q_i \|.    (2.8)

How should we bound \|P_i - \exp Q_i\|? Since \exp Q_i is defined by the expansion for e^x, we naturally look to Taylor's formula, but we should be careful enough to consider a finite expansion with a remainder term. For any smooth function f we have

    f(1) = f(0) + f'(0) + \int_0^1 (1 - u) f''(u) \, du,    (2.9)

so, if we let f(t) = e^{tQ}, the derivative calculations f(0) = I, f'(0) = Q, and f''(u) = Q^2 e^{uQ} yield

    e^{Q} = I + Q + \int_0^1 (1 - u) Q^2 e^{uQ} \, du.    (2.10)
Even for functions of matrices, integrals are just the limits of sums, so taking norms inside an integral only makes it larger, i.e. for any g(u, Q) we have \| \int g(u, Q) \, du \| \le \int \| g(u, Q) \| \, du. Also, just as we computed \exp Q_i explicitly in order to bound its norm, we can compute e^{uQ_i} explicitly to find \| e^{uQ_i} \| \le 1 for 0 \le u \le 1. Applying these observations to the Taylor representation (2.10), we find

    \| P_i - e^{Q_i} \| = \left\| \int_0^1 (1 - u) Q_i^2 e^{uQ_i} \, du \right\| \le \frac{1}{2} \| Q_i^2 \|.    (2.11)

This is just the tool needed to bound the right side of (2.8). Stringing together the identity (2.6) with inequalities (2.8) and (2.11), we find

    \sum_{k=0}^{N-1} \left| P(S_n = k) - e^{-\lambda} \lambda^k / k! \right| \le \sum_{i=1}^{n} \frac{1}{2} \| Q_i^2 \| \le 2 \sum_{i=1}^{n} p_i^2,    (2.12)

where the last inequality depended on \|Q_i^2\| \le \|Q_i\|^2 and our earlier explicit calculation that \|Q_i\| = 2p_i. Since n < N is our only restriction on N, we can let N \to \infty to obtain Le Cam's inequality.
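An end-to-end numerical check of the bound is straightforward; the unequal p_i below are illustrative values, not from the paper:

```python
import numpy as np
from math import exp, factorial

# End-to-end check of (2.12) with unequal, illustrative p_i: the l1 distance
# between the law of S_n and Poisson(lambda) sits below 2*sum(p_i^2).
p = np.array([0.05, 0.2, 0.1, 0.3, 0.15])
lam = p.sum()

dist = np.array([1.0])
for pi in p:
    dist = np.convolve(dist, [1 - pi, pi])       # exact law of S_n

K = 40                                           # Poisson tail beyond K is negligible here
poisson = np.array([exp(-lam) * lam**k / factorial(k) for k in range(K)])
dist = np.concatenate([dist, np.zeros(K - len(dist))])
l1 = np.abs(dist - poisson).sum()
print(l1, 2 * (p**2).sum())                      # the bound holds with room to spare
```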

3. THE SEMIGROUP METHOD HAS VIGOROUS COMPETITORS. Deheuvels and Pfeifer [13] provide a version of Le Cam's inequality that, in an asymptotic sense, has a solid claim on being the last word:

    \sum_{k=0}^{\infty} \left| P(S_n = k) - e^{-\lambda} \lambda^k / k! \right| \sim \left( \frac{2}{\pi e} \right)^{1/2} \left( \sum_{i=1}^{n} p_i^2 \right) \Big/ \left( \sum_{i=1}^{n} p_i \right),    (3.1)

provided \sum_{i=1}^{n} p_i \to \infty and \max(p_1, p_2, \ldots, p_n) \to 0 as n \to \infty. The essential ideas behind the proof of (3.1) have been seen in Section 2 in a basic form: one obtains an interpretation of the Bernoulli sum probabilities, introduces a semigroup (like \exp(tQ)), finds ways to bound an approximation (like e^{Q} - I - Q), and deals with the difference of two n-fold products. Variations on this pattern are visible in Le Cam [20] and Shur [26], and one can even see similar steps in Feller's exposition [14] of Trotter's [29] proof of Lindeberg's central limit theorem.
Through the explicit matrix exponentiation calculations used here, the semigroup method can be seen to be friendly as well as useful. Continued exploration of the method is likely to lead to deep and interesting results, but the semigroup method should not be oversold. There are competitors with considerable power.
The characteristic function method also has a role in Poisson approximations. In particular, the characteristic function method has been used by Rusenko [23] to obtain rates of approximation results for repeated samples taken without replacement, by Presman [22] to obtain refinements of Le Cam's inequality, and by Yakshyavicius [32] to provide an inequality like Le Cam's that is pertinent to classes of discrete distributions other than Bernoulli sums. Despite this litany, applications of the characteristic function method are infrequent in Poisson approximation, and it probably does not rank among the big three: the semigroup method, coupling, and the Chen–Stein method. Enough has been said about the first of these, and it is important to provide some sense of the promise inherent in the other two.

4. THE COUPLING METHOD. Because of its simplicity, the coupling method deserves to be reviewed first. For any two random variables X and Y, we begin by defining their variation distance by

    d(X, Y) = \sup_{A} \left| P(X \in A) - P(Y \in A) \right|.    (4.1)

For random variables that take values in Z^+, the metric d(\cdot, \cdot) has an easily proved alternative expression (cf. Serfling [25], p. 569) that reveals its relevance to Le Cam's inequality:

    d(X, Y) = \frac{1}{2} \sum_{k=0}^{\infty} \left| P(X = k) - P(Y = k) \right|.    (4.2)
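The equality of the sup form and the half-l1-sum form of the variation distance can be verified by brute force on a small support; the two pmfs below are arbitrary illustrative choices:

```python
from itertools import chain, combinations

# Brute-force check that the sup over all events A equals half the l1 sum,
# on a small support.  The two pmfs are arbitrary illustrative choices.
P = [0.5, 0.3, 0.2, 0.0]
Q = [0.4, 0.2, 0.2, 0.2]

support = range(4)
subsets = chain.from_iterable(combinations(support, r) for r in range(5))
sup_dist = max(abs(sum(P[k] for k in A) - sum(Q[k] for k in A)) for A in subsets)
half_l1 = 0.5 * sum(abs(P[k] - Q[k]) for k in support)
print(sup_dist, half_l1)   # the two values agree
```

The sup is attained at the event A = {k : P(X = k) > P(Y = k)}, which is why the two definitions coincide.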

The coupling method is based on the simple observation that for random variables X and Y defined on the same probability space, one has

    d(X, Y) \le P(X \ne Y).    (4.3)
A second observation that helps the coupling method work well with sums is that for S_n = \sum_{i=1}^{n} X_i and S_n^* = \sum_{i=1}^{n} Y_i one has

    d(S_n, S_n^*) \le \sum_{i=1}^{n} d(X_i, Y_i).    (4.4)
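Inequality (4.4), combined with (4.2), can be illustrated numerically for independent Bernoulli/Poisson coordinates; the p_i below are illustrative, and distributions are truncated at K terms:

```python
import numpy as np
from math import exp, factorial

# Illustrating (4.4) together with (4.2): X_i ~ Bernoulli(p_i) and
# Y_i ~ Poisson(p_i), with independent coordinates.  The p_i are
# illustrative and all distributions are truncated at K terms.
K = 30

def poisson_pmf(mu):
    return np.array([exp(-mu) * mu**k / factorial(k) for k in range(K)])

def d(P, Q):
    return 0.5 * np.abs(P - Q).sum()   # variation distance as in (4.2)

p = [0.1, 0.25, 0.15]
S = np.array([1.0])
Sstar = np.array([1.0])
for pi in p:
    S = np.convolve(S, [1 - pi, pi])                 # law of the Bernoulli sum
    Sstar = np.convolve(Sstar, poisson_pmf(pi))[:K]  # law of the Poisson sum
S = np.concatenate([S, np.zeros(K - len(S))])

coord_sum = sum(d(np.concatenate([[1 - pi, pi], np.zeros(K - 2)]), poisson_pmf(pi))
                for pi in p)
print(d(S, Sstar), coord_sum)   # the left value is the smaller one
```

Each coordinate distance works out to p_i(1 - e^{-p_i}), anticipating (4.5) below.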

From (4.2), (4.3), and (4.4) we see that a good plan for proving Le Cam's inequality consists of building n bivariate couples Z_i = (X_i, Y_i) such that the Z_i are independent, X_i is Bernoulli with parameter p_i, Y_i is Poisson with parameter \lambda_i = p_i, and P(X_i \ne Y_i) is as small as possible. This plan has been successfully pursued in Hodges and Le Cam [18], Freedman [16], Serfling [24], Brown [10], Ahmad [1], and Wang [31]. In fact, the couplings of Serfling, Brown, and Wang all satisfy

    d(X_i, Y_i) = P(X_i \ne Y_i) = p_i (1 - e^{-p_i}),    (4.5)

from which Le Cam's inequality (1.1) follows easily. As it happens, there is no difficulty in constructing variables that satisfy (4.5): just think how to simulate X_i and Y_i simultaneously using a single uniformly distributed random variable.
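One concrete single-uniform construction can be sketched as follows (a minimal simulation with an illustrative parameter and trial count; both coordinates are generated by inversion from the same uniform). The mismatch frequency should come out near p(1 - e^{-p}), the value in (4.5):

```python
import random
from math import exp

# A single uniform U drives both coordinates: X = 1 iff U > 1 - p gives a
# Bernoulli(p), and the Poisson(p) quantile of U gives Y.  The parameter
# and trial count are illustrative.
def couple(p, u):
    x = 1 if u > 1 - p else 0
    y, cdf, term = 0, exp(-p), exp(-p)   # Poisson inversion: smallest y with CDF(y) >= u
    while cdf < u:
        y += 1
        term *= p / y
        cdf += term
        if term < 1e-15:                 # guard against the floating-point tail
            break
    return x, y

random.seed(1)
p, trials = 0.3, 200_000
mismatch = sum(x != y for x, y in (couple(p, random.random()) for _ in range(trials)))
print(mismatch / trials, p * (1 - exp(-p)))   # the two values nearly agree
```

A short case analysis over the ranges of U confirms that P(X \ne Y) is exactly p(1 - e^{-p}) for this coupling.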

5. THE CHEN–STEIN METHOD. The Chen–Stein method may be the most powerful method for obtaining Poisson approximations, and it is often as easy to use as the coupling or semigroup methods, even though it may be more subtle conceptually. If one does not stop for motivation, one can say that the Chen–Stein method is based on the fact that for each \lambda > 0 and A \subset Z^+ there is a function x = x_{\lambda, A}: Z^+ \to R such that for any nonnegative integer-valued random variable T one has the identity:

    E\{ \lambda x(T + 1) - T x(T) \} = P(T \in A) - \sum_{k \in A} e^{-\lambda} \lambda^k / k!.    (5.1)

Actually, the left-hand side of (5.1) is a natural quantity to consider in the context of Poisson approximation, since by summation by parts one can check that E \lambda f(T + 1) = E T f(T) for any f, provided T is Poisson with parameter \lambda. The identity (5.1) was first developed by Chen [11], and some of its mystery can be removed by studying an analogous identity used by Stein [27] in the context of normal approximations. While it is a good exercise to solve (5.1) for x, all one really needs to know about x is that it is bounded and changes slowly. In particular, Barbour and Eagleson [5] sharpened earlier bounds of Chen [12] and showed that for all A and \lambda > 0:
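The summation-by-parts fact just quoted, E \lambda f(T+1) = E T f(T) for Poisson T, is easy to confirm by direct summation; the test function and \lambda below are arbitrary illustrative choices:

```python
from math import exp, factorial

# Checking the characterizing identity E[lam*f(T+1)] = E[T*f(T)] for
# T ~ Poisson(lam) by direct summation with a truncated tail.  The test
# function f and the value of lam are arbitrary illustrative choices.
lam, K = 2.0, 60
pmf = [exp(-lam) * lam**k / factorial(k) for k in range(K)]
f = lambda k: 1.0 / (1 + k)

lhs = sum(lam * f(k + 1) * pmf[k] for k in range(K))
rhs = sum(k * f(k) * pmf[k] for k in range(K))
print(lhs, rhs)   # equal up to the negligible truncated tail
```

Shifting the summation index k + 1 -> j turns the left sum into the right one term by term, which is the summation-by-parts argument in miniature.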

    \| x \| \le \min(1, 1.4 \lambda^{-1/2})    (5.2)

and

    \Delta x = \sup_{m} \left| x(m + 1) - x(m) \right| \le \lambda^{-1} (1 - e^{-\lambda}).    (5.3)
From these bounds it is easy to prove, and even sharpen, Le Cam's basic inequality (1.1). If we write W = S_n, W_j = S_n - X_j, \lambda = p_1 + p_2 + \cdots + p_n, and q_j = 1 - p_j, we can follow Chen [12] and obtain a second identity that together with (5.1) gives one virtually complete information about the Poisson approximation. We evaluate the left side of (5.1) as follows:
    E\{ \lambda x(W + 1) - W x(W) \} = \sum_{j=1}^{n} E\{ p_j x(W + 1) - X_j x(W) \}
        = \sum_{j=1}^{n} p_j E\{ x(W + 1) - x(W_j + 1) \}
        = \sum_{j=1}^{n} p_j \{ p_j E x(W_j + 2) + q_j E x(W_j + 1) - E x(W_j + 1) \}
        = \sum_{j=1}^{n} p_j^2 E\{ x(W_j + 2) - x(W_j + 1) \}.    (5.4)

From the Chen–Stein identity (5.1) and the Barbour–Eagleson bound on \Delta x, we see that (5.4) gives

    \sup_{A} \left| P(S_n \in A) - \sum_{k \in A} e^{-\lambda} \lambda^k / k! \right| \le \lambda^{-1} (1 - e^{-\lambda}) \sum_{j=1}^{n} p_j^2.    (5.5)
Since \lambda^{-1}(1 - e^{-\lambda}) \le 1, the identity (4.2) shows that inequality (5.5) is sharper than (1.1). Obviously, the Chen–Stein method is very powerful, though it is only now beginning to be well understood. A richer understanding of the method can be obtained by studying Arratia, Goldstein, and Gordon [2], Barbour [3], Barbour and Eagleson [5, 6, 7], Barbour and Hall [8], Barbour [4], and, of course, Stein [28]. A definitive study of Stein's method and its application to Poisson approximation has recently been given in the volume by Barbour, Holst, and Janson [9].
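For concreteness, one can compare the exact variation distance with the Chen–Stein bound (5.5) and with \sum p_i^2, which is (1.1) read on the d(\cdot,\cdot) scale via (4.2); the p_i below are illustrative:

```python
import numpy as np
from math import exp, factorial

# Comparing the exact variation distance with the Chen-Stein bound (5.5)
# and with sum(p_i^2), the (1.1) bound read through (4.2).  The p_i are
# illustrative values.
p = np.array([0.1, 0.3, 0.2, 0.25])
lam = p.sum()

dist = np.array([1.0])
for pi in p:
    dist = np.convolve(dist, [1 - pi, pi])       # exact law of S_n
K = 40
poisson = np.array([exp(-lam) * lam**k / factorial(k) for k in range(K)])
dist = np.concatenate([dist, np.zeros(K - len(dist))])

d_exact = 0.5 * np.abs(dist - poisson).sum()
chen_stein = (1 - exp(-lam)) / lam * (p**2).sum()
le_cam = (p**2).sum()
print(d_exact, chen_stein, le_cam)   # each value lies below the next
```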

6. CONCLUSION. Le Cam's inequality provides information on the quality of the Poisson approximation, but it also serves as a talisman that is able to charm concrete insights from general techniques. This survey relied on that second service to illustrate the semigroup method, coupling, and the Chen–Stein method. In the course of these illustrations, it has also been possible to survey most of the work on Poisson approximation since the review of Serfling [25], except for the cascade of work coming from the more refined developments of the Chen–Stein method that are dealt with in detail in the monograph of Barbour, Holst, and Janson [9].


REFERENCES

1. Ahmad, I. A. (1985). On the Poisson approximation of multinomial probabilities, Statist. Probab. Lett., 55–56.
2. Arratia, R., Goldstein, L., and Gordon, L. (1990). Two moments suffice for Poisson approximations: the Chen–Stein method, Statistical Science, 5, 403–434.
3. Barbour, A. D. (1982). Poisson convergence and random graphs, Math. Proc. Camb. Phil. Soc., 92, 349–359.
4. Barbour, A. D. (1987). Asymptotic expansions in the Poisson limit theorem, Ann. Probab., 15, 748–766.
5. Barbour, A. D. and Eagleson, G. K. (1983). Poisson approximation for some statistics based on exchangeable trials, Adv. Appl. Probab., 15, 585–600.
6. Barbour, A. D. and Eagleson, G. K. (1984). On the rate of Poisson convergence, Math. Proc. Camb. Phil. Soc., 95, 473–480.
7. Barbour, A. D. and Eagleson, G. K. (1986). Random association of symmetric arrays, Stoch. Anal. Appl., 4(3), 239–281.


8. Barbour, A. D. and Hall, P. (1984). On the rate of Poisson convergence, Math. Proc. Camb. Phil. Soc., 95, 473–480.
9. Barbour, A. D., Holst, L., and Janson, S. (1992). Poisson Approximation, Oxford University Press, New York, NY.
10. Brown, T. C. (1984). Poisson approximation and the definitions of the Poisson process, this MONTHLY, 91, 116–123.
11. Chen, L. H. Y. (1974). On the convergence of Poisson binomial to Poisson distributions, Ann. Probab., 2, 178–180.
12. Chen, L. H. Y. (1975). Poisson approximation for dependent trials, Ann. Probab., 3, 534–545.
13. Deheuvels, P. and Pfeifer, D. (1986). A semigroup approach to Poisson approximation, Ann. Probab., 14, 663–676.
14. Feller, W. (1971). An Introduction to Probability Theory and its Applications, Volume II, John Wiley and Sons: New York, NY.
15. Franken, P. (1964). Approximation der Verteilungen von Summen unabhängiger nichtnegativer ganzzahliger Zufallsgrößen durch Poissonsche Verteilungen, Math. Nachr., 23, 303–340.
16. Freedman, D. (1974). The Poisson approximation for dependent events, Ann. Probab., 2, 256–269.
17. Galambos, J. (1973). A general Poisson limit theorem of probability theory, Duke J. Math., 40, 581–586.
18. Hodges, J. L., Jr. and Le Cam, L. (1960). The Poisson approximation to the binomial distribution, Ann. Math. Statist., 31, 737–740.
19. Kerstan, J. (1964). Verallgemeinerung eines Satzes von Prochorow und Le Cam, Z. Wahrsch. Verw. Gebiete, 2, 173–179.
20. Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution, Pacific J. Math., 10, 1181–1197.
21. Le Cam, L. (1963). On the distribution of sums of independent random variables, Bernoulli, Bayes, Laplace: Proceedings of an International Research Seminar (J. Neyman and L. M. Le Cam, eds.), Springer: New York, NY, 179–202.
22. Presman, E. L. (1985). Approximation in variation of the distribution of a sum of independent Bernoulli variables with a Poisson law, Theory Probab. Appl., 30, 417–422.
23. Rusenko, N. (1985). On the approximation of the distribution of some statistics by the Poisson laws, Lith. Math. J., 25(4), 363–370.
24. Serfling, R. J. (1975). A general Poisson approximation theorem, Ann. Probab., 3, 726–731.
25. Serfling, R. J. (1978). Some elementary results on Poisson approximation in a sequence of Bernoulli trials, SIAM Rev., 20, 567–579.
26. Shur, M. G. (1984). The Poisson theorem and Markov chains, Theory Probab. Appl., 29(1), 124–126.
27. Stein, C. (1970). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proc. Sixth Berkeley Symp. Math. Statist. Probab., Vol. 2, 583–602, University of California Press.
28. Stein, C. (1986). Approximate Computation of Expectations, Inst. Math. Statist. Lecture Notes–Monograph Series, Vol. 7, Hayward, California.
29. Trotter, H. F. (1959). An elementary proof of the central limit theorem, Arch. Math., 10, 226–234.
30. Vervaat, W. (1969). Upper bounds for the distance in total variation between the binomial and the Poisson distribution, Statist. Neerlandica, 23, 79–86.
31. Wang, Y. H. (1986). Coupling methods in Poisson approximation, Canadian J. Statist., 14, 69–74.
32. Yakshyavicius, S. V. (1986). Poisson approximation of the binomial, Pascal, negative binomial and geometric distributions (in Russian), Litovsk. Mat. Sb., 26(1), 165–184.

Department of Statistics
Wharton School
University of Pennsylvania
Philadelphia, PA 19104
