Please see PDF version

The Annals of Probability
1977, Vol. 5, No. 3, 395-403


University of British Columbia

A limit theorem is established for the length of the longest chain of random values in Rd with respect to a partial ordering. The result is applied to a question raised by T. Robertson and F. T. Wright concerning the generalized empirical distribution function associated with the class of lower layers.

Introduction. There are two distinct roots to the work presented in this paper: one in mathematical statistics, and the other in combinatorial geometry. Since the geometrical *origin is historically prior, as well as the more succinct, we will consider it first.
In 1935, Erdds and Szekeres [3] proved that any sequence of n distinct real numbers has a subsequence of length at least ni which is either monotone increasing or monotone decreasing. S. Ulam later posed a probabilistic version of the result of Erdbs and Szekeres and made a conjecture which we will now describe.
Let Xi = (Ci('), Ci(2)), i = 1, 2, be a sequence of i.i.d. random variables which are uniformly distributed in [0, 11 x [0, 1], and let 1(n) be equal to the largest integer k such that there is a subsequence il < i, <, ik n such that
cp) < C,(1) < C,(1) and Q2)
1 ::~ C(2) < Q2
'1 2 k  '2  k

In a form equivalent to our description, Ulam conjectured that the sequence of random variables 1(n)lnf n = 1, 2, .. . converges in probability to a nonzero constant c. This conjecture was proved by Hammersley [4], and his estimate for c was sharpened by Kingman [7]. A very nice addition to this work was made in a comment by Kesten (see [7, page 903], and [5]) who proved that l(n)Ink converges with probability one.
In the present paper we present a natural extension of Ularn's problem to Rd, d > 2, and provide results on the limit behavior of the analogue to 1(n). These results are given in the first section of this paper and are presented in a form which makes them immediately applicable to a question in mathematical statistics raised by Richardson and Wright [10].
To state their problem we need to introduce the concept of a "lower layer." If X = (C(% C(2,' C(d) ) and y = (C), C . ), . ..' C(d)) are two points in Rd, we
write x < y if C' CM for i = 1, 2, d. A Borel subset A of Rd is called a

Received November 20, 1975.
AMS 1970 classification numbers. 60C05, 60F15, 60K99.
Key words and phrases. Monotone subsequences, lower layers, partial ordering, discrepancy functions, subadditive processes.



lower layer if for any y c A we have fx: x < yj c A. We will denote this class of subsets by 121. To check ones understanding of this definition, he should note that in R' a lower layer corresponds to the set of points which lie below the graph of a monotone decreasing function. Also we note that for any d : 1, the class of sets (oo, CM1 X (_00, C121] X ... x (oo, C(J1] are a small subclass of the class of lower layers, which we will denote by Q.
We denote by 1, the characteristic function of the set A and we denote by X,i= 1,2,3 ... a sequence of Li.d. random variables which take values in R`. Next we define random variables ir,, and ft. by

7r,, = SUP,,, 1 E" l 1,(Xi)  P(X, A)

= SUPAeQ 1 E?=1 'A(Xi) P(Xl A)
1 n
Of course, fr. is the classical KolmogorovStnirnov statistic and describes the discrepancy between the empirical distribution and the theoretical distribution function. For the generalized discrepancy 7r,,, Blum [1] proved that 7r,. > 0 with probability one provided the X, have a density (see also Dehardt [21). Now one of the deepest results on the classical discrepancy fr. is Kiefer's law of the iterated logarithm [6], and the question raised by Robertson and Wright [10] is whether a similar.result holds for 7r.. That is, does one have

lim"__ (2n)17r,. 1 with probability 1 ?
(log log n)k

The answer to this question for d ~t 3 is proved in the second section of this paper to be NO. The results provided there show that the rate of growth of 7r. is much greater than that of 2t.. The question of the possibility of a law of the iterated logarithm for 7r. in R' is still open.
In the final section of this paper we offer two easily stated conjectures whose solution would be of interest, either from the viewpoint of combinatorial geometry or mathematical statistics.

1. The probability of long chains. The section is concerned primarily with chains of points in M with respect to the partial ordering < defined by (CP C,, ... 1 Cd) < (Cl" CJ1 ... 1 Cd') if and only if C, t~ C,' for i = 1, 2, .. ., d. A set which is totally ordered by ~< is called a chain, and a set such that no two of its elements are comparable is called an antichain. The only fact we require from the theory of partial orderings is the Dilworth decomposition theorem:

If a partially ordered set P possesses no antichain of cardinal
m + 1, then it can be expressed as the union of m ' chains.

(For a brief and elegant proof of Dilworth's theorem see Tverberg [ 131)


To gain a quick view of the power of Dilworth's theorem one should note that applied to (R', <<) it immediately implies the result of Erdds and Szekeres mentioned in the introduction.
Now if {yl, y, . . ., y,,} is a set of points in R, we write Ad+(Y11 Y21 ... 1 Y.) for the cardinal of the largest subset of U1, y, . . ., y.) which is a chain. Similarly Ad(y, ys, . . . , y.) is the cardinal of the largest antichain of ly, y, . .  , yj. If X, i = 1, 2, .. . are independent random variables which are uniformly distributed in [0, 1]d we write

Ad+(n) = Ad+(X~, 4 X%)
Ad(n) = Ad(XD X11 ... 1 X.)

These random variables have direct relevance to a variety of problems in probability and statistics. The remainder of this section develops inequalities for these variables and establishes results on their limit behavior.

THEOREM 1. (Recursive inequalities). For d > 3 and any positive integers n, a and P we have

(1. 1) P(A,+(n) P(A+_,(n) a) nP(A.+(a)


(1.2) P(Ad+0) ;5 P)  P(A1 (n) a) nP(A.+(a)

PROOF. The proof depends upon constructing a set of functions which will be used to carry estimates from d  1 dimensions to estimates for d dimensions. We let Y1, Y2, . . ., y,, be a set of points in [0, 11'1 such that no two of the y, have equal first coordinate. By Dilworth's theorem we can decompose the set {yl, ys, y.} into Adl(y, y, y.) chains. We denote these chains by C, i = 1, 2, A d,(y,, y, y,,) and assume that as the index increases the first coordinate of the least element of the chain increases. Now we can define a set of A.~,(y, y, y.) functions 2,(y, y,, y.; Xp X2, X,) where the x, are distinct points in [0, 11. The 2, are defined as follows:

MY1, Y2, . . .' y.; xl, X2, .. .' Xj is the cardinality of the largest subset B, c C, such that for any two elements Y,,, Yj2 of B, such that y,, < y,, we have x,, < Xj2.

If Z,
.' Z1, z,. are i.i.d. random variables with uniform distribution in [0, 11
we write
20, Z) = 201, y.; Z1 9 Z2, Z.) .

We note that 2,(y, z) is a random variable which has a distribution that depends only on the cardinality of the chain C, Moreover, we note

(1.3) If AdLl(y, y2, . * ., y.) :9 a we have

P(2i(yl, ys, . . * , y.; z, z., .  ., z.) ~~t P) ;5 P(A.+(a)



The second basic observation is that

(1.4) P(A,+(n) ~~: P) = Sp,,,pW1) P(InaXi 21 k P) dy
where the maximum is taken over i = 1, 2, . . ., Aj1(y, Y21 ... 1 YO)'
These two observations follow directly from the definition of the 2, and with these observations under our belt the proof of the theorem proceeds by computation. To begin we have

(1.5) P(Ad+(n) k P) ;5 5. P(max, 2, dy + P(Adt,(n) k a)
where A = ly: A+Ay, y, . . ., y,,) :!s~ al, and we have used the fact that SM AMaxl 'i k P) dY :5 S A. dy = P(A+ (n) > a). Next we have d
(1.6) SA P(MaXi 1, k P) dY :_5 S A E i PR k P) dY
where the sum over i is for i from 1 to Aj,(y, Y2, y,,). Since A+
d (Y1,
Y2, y,,) < a on A we have for y e A that
(1.7) P(2i(yl, y,, . . ., y,,; z, z,  . ., z.) t~ P(A2+(a) ~t
by the application of inequality (1.3). This shows that (1.6) reduces to

(1.8) S A P(maxi 2 2: P) dy ;S nP(A,+ (a) ~t P)

where we have also applied the trivial fact that for ally, A y,,) n.
dl(Y1, Y2,
Now by applying inequality (1.8) to the basic (1.5) we obtain

(1.9) P(A,,+(n) nP(A.+(a) + P(A+,(n) a)

and the first inequality of the theorem is proved.
To prove the second half we note that

(1.10) P(Ad+ (n) t~ P(A+ 1(n) ;5 ce) + S B P(maxi Ii dy
where B = ly: A+ (y,, y, y.) > a} C:: [0, 1 ].(d1) and i 1, 2, Ad,(y,
Y2, y.). Further we have

(1. 5. P(max, 2, ~~ P) dy ~5 5. Ei P(2i dy :!~ nP(A,+(a) ;5 P)

by the same proof as that of (1.8). When this last inequality is applied to (1.10) the inequality of the second half of the theorem is proved.
To make use of the recursive inequalities of Theorem 1, both tails of the
distributions of A,+(n) must be estimated. We. obtain such estimates with
the following results. 1

LEMMA 1. For all positive integers n, we have
(1.12) P(A.+(n) 8(n)l) :5 r1
(1.13) P(A2+(n) n118) < r1
where r = (effil < 1.


PROOF. The method for proving the first inequality is a slight variant of a method due to Hammersley [4] which we include for completeness. Let X, = (X, (1), Xi(2) i = 1, 2, ..., n be i.i.d. random variables which are uniformly distributed in [0, 11`. Next let ki = Xj12) if X,") is the ith largest element of the
(1) (1)
set {Xl ' X, , . . ., X,(')}. Now choose r so that 2r < n and let v be the number
of sequences i, < i, < ... < i, < n such that X',, < ki, < ... < By the
uniform distribution of the A we have that

(1.14) E(I~) = E'I<'2<"", P(k'l > 'th ... > k,,)
But if A2+(Xl, X21 ... 'Xy) ~~t 2r then v so by Chebyshev's inequality and
(1. 14) we have
(1.15) P(A2+(X,' 4 X.) k 2r) :5~ (")(r!
By Stirling's formula applied to (1. 15) and by letting r 4[ni] we obtain inequality (1. 12) of the lemma.
For the second half of the lemma, we first observe that Dilworth's theorem
(Xl, X21 Xy) :!5; n118 im
implies that A2+(n)Aj(n) n. This shows that A2
piles A.+(X, X2, Xy) k 8(n)l so by the first half of the lemma we obtain
(1.16) P(A2(Xl, 4 ..., X.) ;5 n118) :!~ r%l 
But the distribution of A2(4 X21 . ., Xyj is the same as the distribution of
A.+(Xl, X2, . . ., X.) so (1. 16) implies the second half of the lemma.
The inequalities of Theorem 1 and Lemma 1 can now be applied to yield a results on the limit behavior of A,+(n).

THEOREM 2. For d > 2 we have with probability one that
(1.17) lim log,,_ A,+(n)/log n = 21d .

PROOF. We first consider d ~> 3. For any sequence of positive integers Ply P29 ... 1 Pd we have

(1.18) P(A,+(n) k A)  P(A+An) k pk_) :!~; nP(A2+(A1) k Pk)
by Theorem 1. On adding these inequalities for k = 3, 4, .. ., d we obtain

(1.19) P(Adl(n) ~~t Pd)  P(A2+(n) k P2) t~ n Ed=. P(A2+(Pk1) k A)
Now by choosing P2k 8(n)l and P,, ~t 8(p,1)l we have by Lemma 1 that
P(A,+(pk_) k pk) ;5 r(Pk1)l for k = 3, 4, . . ., d. So (1. 19) provides the estimate

P(Ad+(n) ~!: Pd) + n Zd=.
and consequently
(1.20) E.=, P(Ad+(n) 21n2'd) < oo .
The same procedure based on the second inequalities of Theorem 1 and Lemma 1

(1.21) E.=, P(A.+(n) ;5 21n21d) < 00


The BorelCantelli lemma thus gives us that 28 :!~ lim inf A,+(n)/n2'd < lim sup A,+(n)/n2'd < 21 with probability one. The result (1. 17) of Theorem 2 now follows by taking logarithms and dividing by log n. For the case d = 2 we can go directly to (1. 19) from Lemma 1 and thus obtain the analogous result to that obtained for d ~t 3, but as noted in the introduction more precise results have already been obtained for d = 2.

COROLLARY 1. For d ~. 2 we have

(1.22) lim inf logn_ Ad(nfflog n ~t 1  21d

with probability one.

PROOF. By Dilworth's theorem we have log Ad+(n) + log Aj(n) k log n so (1.22) is an immediate consequence of Theorem 1.

2. Application to discrepancy functions. We will now focus our attention on the discrepancy function for the class of lower layers defined in the introduction. The next result shows in a strong way that the law of the iterated logarithm is quite far from the nature of the 7rn in Rd, d ~~t 3.

THEOREm 3. Let X, i = 1, 2, ... be Li.d. uniformly distributed random variables in [0, 1 ]d, d > 2. For the generalized discrepancy 2r. = sup,, , 1(1 In) E in=, 1,(X,) P(X, c A)l where L is the class of lower layers we have

(2.1) 7r.  1 Ad(n)
and consequently

(2.2) lim inf._. n'"7rn ~t 29 a.s.

PROOF. For any A E L we can define for 0 < 0 < 1 another element of L by OA = {y: y = Ox, x e AI. We note that limo, P(X, c OA) = P(X, c A). Further, if we choose A such that A has piecewise linear boundary and such that Aj(n) of the elements of (X,(w), X.(w), Xn(w)} are boundary points of A then we have

(2.3) 1,(X,)  10,(X,) k Ad
By (2.3) we immediately obtain that

(2.4) wrn ~> 2`Ad(n)

Thus the first half of the theorem is proved. Finally. (2.2) follows from (2. 1), (1.20) and (1.21).
The preceding result shows that for d 2~ 3 there is no log log law for 7rn. This can also be obtained by a completely . different method as has recently been proved by W. Schmidt [12, page 3871. If 2 is used to denote Lebesgue measure in Rd, and X,, X2, X, is any sequence of points in R", W. Schmidt has proved that

(2.5) SUPAeL 1 ZiLl UX1)  2(A) ~i> Cdn11d
1 n 1


where cd is a constant depending only on d. The rate of growth of wrn given by Schmidt's result is not as great as that provided by (2.2) but, of course, that is more than to be expected since his result pertains to any sequence of points x, x, ... no matter how favorably chosen. In the case of d = 2 both Schmidt's result and the result of (2.2) are compatible with the log log law.
Our next inequality provides an upper bound for 7r. in the case of R'. This result was suggested by an inequality of Hlawka (see [8, 229233]) for the class of convex subsets of R`. Letting 'F. = SUPAeC 1(11n) E,=, 1,(x,)  I(A)l and W the class of convex subsets of [0, Ild, Hlawka proved essentially that ;f.
, "(~t.)lld. It would be futile for us to seek to prove that rn :!~ Wd'(t.)11d for all d ~~ 2 since by Kiefer's theorem fr. = 0((n1 log log n)i) whereas by (2.2) we know that 7r. is not Offlog log n)1n)111d if d ~~i 5. We therefore provide the analog for the class of lower layers of Hlawka's inequality only in the case d = 2.

THEOREM 4. For Rd with d = 2 we have the inequality

(2.6) itn < 7r, :~ 4(7t.)' .

PROOF. The first inequality of (2.6) is, of course, valid for d ~~: 2 and is trivial. To prove the second inequality we first define

(2.7) n*.' = SUPA. . 1 EJ"=~ 1 AXi)  P(X~ c A)l
where R is the class of all rectangles of the form [0, r] x P, PT Since any such
rectangle can be written as [0, r] x [0, P110, A X [0, P] we have t.' :!~ 2t,. Next let r be any positive integer and suppose A is a lower layer. It is easy to prove (but tedious to write down) that for any lower layer A there are polygons P, P, with the following properties:

(i) Each P, is the union of r rectangles of the form [0, a] x [jlr, (j + I)Irl, (ii) P, c A c f, and
(iii) P(X~ c P,\P~) = 11r.

From these properties we have that

max,=1,2 1 zi%=, lpj(xl)  P(X~ G Pi)rt,' :!~ 2rt.
1 n
Further by (iii) we have

1 E?=' 1,1(X,)  P(X~ C Pj + P(Xl c Pj  P(Xl c A) n

in= 1 1 A (X1)  'P(X~ c A) n

1,2(X,)  P( X, e Pj 4 P(X, e P.)  P(X, c A)

This shows as a consequence of (iii) that
1 En=, 1,(Xi)  P(X~ e A) 2rfr, +
n r



Since A was arbitrary we have proved that

(2.8) ir~ < 2rfr, + 1 for all r .

By taking r = [ft.1] the inequality (2.6) is established.
At this point a note should be made about the generality of the results given here. The assumption that the X, are uniformly distributed has been used through this paper only to avoid making repeated reductions to this case. Since the class of lower layers is preserved under mappings of R,' which are monotone in each coordinate, the study of arbitrary X, with a density can be reduced to the case of the uniform distribution (see e.g. Rosenblatt [ 111). Indeed, all of the results of this paper are valid under the assumption that the X, have a density.

3. Two conjectures. In this final section we offer two conjectures.

lim.A,,+(n)/n1" = c < oo .

lim,, M7rfflog log n)l = oo a.s. when d = 2.

The first conjecture is, of course, a stronger version of our Theorem 2 and is the "one rightful heir" of the beautiful theorem of HammersleyKingmanKesten which solves Ularn's problem. The second conjecture would complete the answer to the question raised by Richardson and Wright. One should note that Philipp [91 has proved the log log law in R' for the convex sets; so, although the second conjecture is not as central as the first, there is still, perhaps, something subtle to uncover.

[11 BLUM, J. (1955). on convergence of empirical distribution functions. Ann. Math. Statist. 26527529.
[21 DEHARDT,L(1971). Generalization of the GlivenkoCantelli theorem. Ann.Math.Statist. 42 20502055.
[31 ERDOS,P.andSZEKERES,GY.(1935). A combinatorial problem in geometry. Composito 2463470.
[4] HAMMERSLEY,LM.(1972). A few seedlings of research. Proc. Sixth BerkeleySymp. Math. Math. Statist. Prob. 345394, Univ. of California Press.
[51 HAMMERSLEY,LM.(1974). Postulates for subadditive processes. Ann. Probability 2 652680.
[61 KIEFER, J. (1961). On the large deviation of the empiric d.f. of vector chance variables and a law of the iterated logarithm. Pacific J. Math. 11649660.
[7] KINGMAN, J. F. C. (1973). Subadditive ergodic theory. Ann. Probability 1883909.
[8] KuIPERS, L. and NIEDERREITER, H. (1974). Uniform Distribution of Sequences. Wiley, Toronto.
[91 PHILIPP, W. (1973). Empirical distribution functions and uniform distribution mod 1. Diophantine Approximation and Its Applications (C. F. Osgood, ed.). Academic Press, New York.
[10] ROBERTSON, T. and WRIGHT, F. T. (1974). On the maximum likelihood estimation of stochastically ordered random variates. Ann. Statist. 2 528534.