Please see PDF version

The Annals of Probability
1980, Vol. 8, No. 6,10791084



Stanford University

Given a process (X,}, any permutation a:[ 1, n] 1, n] determines an order statistical event A (a) = (X,(,) < X.(2) < ... < X.(,)}. How many events A (a) are needed to form a union whose probability exceeds 1  E? This question is answered in the case of stationary ergodic processes with finite entropy.

1. Introduction. One of the key properties of independent, identically distributed continuous random variables Y, 1 :s i:s n, is that the order statistical events defined by

{CO: Yo(l) < Yo(2) < ... < Yo(.) }

have the same probability 11M for any permutation cr: [1, n]+ [1, nj. The main objective of this paper is to determine the extent to which this property is retained asymptotically for processes which are only assumed to be stationary and ergodic.
To set the problem precisely, we suppose that (X } ' % 1 is a strictly stationary, ergodic process
defined on the probability space (2, A, P). We will also use the representation ofsuch a process
by X(.) = f(TL+lo), 1 :5 i < , where T: 2 > 2 is an ergodic measure preserving
transformation and f: 2 > R is a measurable map. To avoid inessential messiness, we also
restrict attention to processes which satisfy the continuity property
(1.1) p(xl = Xi) = 01

Our approach to the analysis of the order statistical events is motivated by the ShannonMeMillan Breiman theorem, and particularly the phrasing of that result in terms of the equipartition property Q 1 ], page 13 5, [9], page 3 5 (6.3)). Loosely speaking, that phrasing tells one in terms of the entropy of Tjust how many sets of a certain type are needed to cover most of 2.
To establish a comparable result for the order statistical events, we let Qn(F) be defined for any F EE ~F by

Q.(F) X.(1) (W) < X.(2) (W) < < Xo(tt)(W), for some w EE F}

Here 1 S 1 denotes the cardinality of the set S, so Q.(F) is equal to the least number of order statistical events

A. = {CO: Xo(l)((0) < X(2)(W) < ... < Xo(tt)(W)} which one needs to cover F.

The quantity of main interest is now defined for E > 0 by

(1.2) Qn*(E) = minE.P(E),, Q.ffi\E),

80 Qn*(E) is the least number of A. which will cover a set of probability 1  E.
To familiarize Qn*(E), we note that if the (X,}',, are i.i.d. and satisfy (1.1), then Qn* is equal to the least integer greater than (1  E)M.
In this particular example {X,}',, is a process with infinite entropy, and Q,*, (E) is near its

Received December 13, 1978.
' Research of the second author was supported in part by the National Science Foundation (NSF MCS 7716974).
AMS 1970 subject class~fications. Primary 60G10; secondary 60005.
Key words andphrases. Order statistics, entropy, stationary processes, de Bruijn sequences, directed graphs, equipartition property.



a priori upper bound. One intuitively expects that Qn* (E) should be of smaller order than n! for processes with finite entropy. We show more precisely that in that case Qn*(E) is, in fact, exponentially smaller.

THEOREM 1. For any stationary ergodicprocess {X} ',, which hasfinite entropy and satisfies P(Xi = X,) = 0, i 0 j, there isfor any E > 0 a sequence ofpositive reals p. tending to zero for which

(1.3) Qn*(E):5 (n !),n, for all n ~ 1.
Before giving the complementing result, two comments are in order. In the first place we note there are many processes satisfying the hypotheses, since iff satisfies P (w : f '(w) = y} = 0 for ally E R, the condition P(X, = Xj) = 0, i 0j, trivially holds for any measure preserving T. Also, transformations of finite entropy not only abound but play key roles in such distinct subjects as statistical mechanics and the metrical theory of diophantine approximation Q11).
Second, we note (1.3) is equivalent to saying Qn*(E) = o(p'n!) for each p > 0. The phrasing of Theorem 1 was chosen in view of the next result which makes precise the sense in which Theorem 1 is best possible.

THEOREM 2. For any positive p. which tend to zero there is a stationary, ergodic process (X),', with P(X = Xj) = 0, i 0 j, which has zero entropy, and which satisfies

Qn*(E) k (n!)p,~ for infinitely many n and any E < 1.

The preceding theorem is easily seen to be a consequence of the next result which shows that the underlying T plays a surprisingly small role in determining Qn* (E).

THEOREM 3. Given any ergodic measure preserving transformation T on a nonatomic
probability space P), andgiven anyp,, > 0 tending to zero, there is anf: 2> [0, 1] which
(1.4) P({w:f'(w) = X}) = 0, V X E [0, 1],
(1.5) Qn*(e) k (n
for infinitely many n and any E < 1.

In the next section we give the proof of Theorem 1 as a consequence of a counting argument and the application of the ShannonMeMillanBreitnan theorem to an appropriately chosen partition.
The proof of Theorem 3 is more subtle and makes use of a generalization of a combinatorial structure known as de BruiJn sequences.
Since the construction provides a technique,for building copies of a finite sequence of independent random variables inside a general stationary process, the construction should be useful in a variety of problems.
Finally, in the fourth section a brief speculation on the theory of order statistical events is ventured.

2. The upper bound method. For any measure preserving transformation S: 2 > 2 and any partition 9 = {P,} of 2, the sets given by

(2.1) n n1
j0 (w: Si(w) EE Pi,} for some 1 :5 i, :S s will be called the npS name associated with the npS ~~ (ii, i2,


ij}. If S is ergodic and has entropy H(S) < cz, the ShannonMeMillanBreiman theorem says there is an no = no(E, a, S, p) such that for n k no there is a collection of 2` of the npS names whose union has probability at least 1  E.
Since P(X = Xj) = 0, i 0 j, the disjoint sets P. given by the permutations of { 1, 2, k}

P. = {W: X.0) < X(2) < ... < X(k)}

have union 2 (except for a set of measure zero). The partition p = (P,} can be related usefully to the possible orderings of {Xi}~,',.

LEMMA2.1. For any npTk nameA we have

Q.k(A):5 (nk)!1(k!)".

PROOF. First consider n = 2. The 2pTk name A has an associated sequence (ii, i2); and il determines the ordering of R, = {Xi(w), X2(w), . , Xt(W)}, while i2 determines the order of R2 = {Xk+l(W), Xk+2(W), ..., X2k(w)}. To count the possible orderings of R, u R2, we note the set R, determines k + 1 intervals (, X(,)), (X(I) X(2)), * * *, (X(.), ) where {X(Jk,, are the order statistics of (Xi}*,i. Since there are 2k ways of putting the order statistics of {Xj}?,A!k,l into the k + 1 intervals, we have

Q1t(A):5 2k) (2k)!/(k!)2. ( k
In general, we see for R. = Mk+ly X~2, * * *, X.+1)k} and 0:5j < n thatuj'o R., determines

(t + 1)k + 1 intervals into which the order statistics of R,, can be placed in (t + ')k ways. ( k
Making the sequential choices we have

Q.k(A):5 2k)(3k) ... (nk (nk)!(k!)",
( k k k

which completes the lemma. 0
To prove Theorem 1 we need to bound the number of np TA' names which are needed to cover a set of probability 1  E. We first note that any nkTp name is contained in some n Tkp name because for any alias Qj}pko ' one has

jkW E= p,,.,} nk1
n.n01 (w: T D nj0 (w: Tiw EE PJ.

The ShannonMeMillanBreiman theorem applied to the ergodic transformation T with entropy H(T) < a < oo says there is a collection W of 2,nk of the nkp T names whose union contains 2\E with P(E) < E for all n k no = no(E, p). By the preceding remark this also implies there is a collection W' of 2"nk of the np T* names with the same property.
We now see

Qnkffi\E):5 EA.=W' Qnk(A)s 2'"k(nk)!(k!)n

where the last inequality follows from Lemma 2. 1. For any k we can write m = nk + r with 1 :5 rs k one has

QM.E):s Qtn,1)k(E) 5 2'n')k((n + 1)k)!(k!)n1

provided n k no. k k
Since (n + 1)k!lm! :5 ((n + 1)k)~ and k! k k e ' we have

Qt(E)s m![2(*+')'((n + 1)k)k(kle)k (n+l)


and the fixed integer k was arbitrary, so

QM*(.E)  O(Pmm!)

for all r > 0. The implied constant depends not only on H(T) but on T through the no given by the SMB theorem. As noted earlier, this last relation is sufficient to imply Theorem 1.

3. The lower bound method. The first lemma required for Theorem 3 is the socalled strong form of Rohlin's lemma Q8], page 22) which provides a systematic method for applying combinatorial constructions to stationary processes.

LEMMA 3. 1. Suppose T is an ergodic, measure preserving transformation on a nonatomic probability space (2, A, P). For anyfinitepartition v= (H,, H2, .. ., H.) of a andfor any real E > 0 and integer m there is an E E Y with thefollowing properties:

(3.1) E, T1E,  . . , T'lE are disjoint;
(3.2) P(U.50' T1E) ~ 1 .E;

(3.3) P((T'E) n H.) = P(E)P(Hj), 0:5i
The second lemma we need is a graph theoretic result due to I. J. Good Q51, [61, page 95) which sharpens a wellknown result of Euler.

LEMMA 3.2. If G is a connected directed graph, and !f at each point of G there are the same number of arcs going out as coming in, then there is a directed cycle in G that goes through every arc of G in its given direction, and uses no arc twice.

As an application of Lemma 3.2, we will obtain the existence of what can be called sary de Bruijn sequences. To introduce these sequences, we recall the classic result of de Bruijn which says the following: given a positive integer n, there is a sequence of O's and I's of length N = 2n, say aia2a3 ... aN, such that the ntuples aa,+, ... a,.,,, are a complete list of all 2' of the ntuples with alphabet {0, 1}.
Here a, is understood to follow aN, etc. in the cycle. For example, when n = 3, the cycle (00010111) contains every 3tuple of O's and I's exactly once. We will use the following generalization where d is an alphabet of s letters and 2 is the set of all ordered ktuples of sy.

LEMMA 3.3. There is a sequence a, a2 ... aiv with N = sk of the elements ofc?y such that each element of 3 occurs exactly once in the set of ktuples (ar+l, ar+2, a,+k), where 0 5 r < sA' and at = a. !f t > s* and t  sk = u.

PRooF. We define a directed graph G whose edges are the ordered (k  1)tuples formed by elements of sY. We have an edge from (bi b2 ... bki) to (b l, ..., Mk1) provided b2 = b11, b3 = bl2 = bkj and b'ki is arbitrary. Every vertex has indegree and outdegree equal to S, so Lemma 3.2 implies there is a cycle which traverses the edges of G and uses each exactly once. From such a cycle the sequence of a, EE d given by the successive bk_1 is easily checked to satisfy the claim of the lemma.
The proof given of Lemma 3.3 is only a mild modification of the application given by Good [5] and which Bondy and Murty [2] page 181, relate to the design of an efficient computer drum. A completely different algorithmic proof of Lemma 3.3 was developed independently in recent work of Fredricksen and Mariorana 141.
We now proceed to prove Theorem 3 by applying the preceding lemmas infinitely many times.
We suppose now that PXi1, (t,}"1, (hi),2', are increasing sequences of positive integers and {E,} j_1 is a sequence of positive reals decreasing to zero. We will define a sequence of
functions go(co)  0, 91M, 92M, gk(w), where each gk(w) will be defined viapk, tk,


hk, Ek, and the preceding &(w), 0 :5 j < k. Also, we should remark that each of the gi(w) will assume only finitely many values.
For notational convenience we temporarily write p = Pk, t = th, h = hk, and E  Ek. We define si by (a: a = c,2', E, = 0 or 1, 1  r 5 p}, and note by Lemma 3.3 there is a sequence of Id 1' = VP elements of si which form a cycle in which each ordered ttuples with letters from d appears precisely once among the (ai+l, at+2, .. ., a,+t) for 0:5 1 < 20.
We now apply Lemma 3.1 to obtain an E such that for 0s k < 2h20 the sets TkE are disjoint and their union has probability at least 1  e. For the fmite partion 10'we take the partition given by the distinct values of the sum Eik1' g,(w)2 q' where qj = p, + P2 + + pi, 1 :sj < k.
We now define gk(w) = a, if w E T" for 1 :s is h2 tP and i =_ i'mod 2 tP. Finally, we take gh(w) = 0 if w is in none of the T'+IE, 1 s k  h20.
The whole point of this construction is that now by setting Rk equal to the union of T`E with 0s i < (h  1)VP and letting Pk(A) = P(A n Uk)IP(2k), we see that the random variables gA!(w), gk(T'w), . , g*(T"w) are independent in the probability space (2k, Pk, J%) where A = (A n 2k:A E= JW}. Moreover, these random variables are also independent when, conditioned on the afield given by the partition .V. To prove these assertions one only has to note that for any H E A'we have by (3. 1) and (3.3)

P((gk(w) = so21, gt(Tlw) = s12,',', ..., gk(7t+'co) = st+12PI1} n H)

(h  1)P(E)P(H)1(h  1)2PP(E) 2`PP(H).

Finally, we are able to definef(co) by letting qk =EJ', p., and setting

f(w) = E'k~. gk(w)21.

The sum representingf(w) converges for all w, and one finds no difficulty in checking that for Xj(w) = f(T"w) we have P(X, = Xj) = 0 for all i 0 j.
To prove Theorem 3 we suppose that the sequence p. 10 and E E (0, 1) are given, and we
will proceed to show that (p.}.',, {h,},ti, and (Q',_1 can be chosen so that Qn* (E)
n!pnn for infinitely many n.
For the intervals 1.  [s2qk1, (S + 1)2") with 0  s < 2q*` we define random variables
P, as the number of elements of J, = (i: X E 1, 1 :S i :5 n}. The ordering of {Xi} ir=o. is
completely determined by the ordering of except for at most those W E 17=
{w: gk(T'lw) = gk(TJ+'w), for some 0 5 i < j :5 n}. Using the Phindependence of the
gk(T'lw)} 1 :5 i < n for ns t and the conditional independence given Yewe have

PQX(1) < X(2) < ... < X.W} n H)
:S eA! + hV + P({X.(1) < X(2) < ... < X.(,,)) n H n 2k)

5 ek + hk1 + Pk((X.(1) < X(2) < ... < X.(.)} n H)

:5 ek + hk1 + PA!(_07) + Pt(H)(11~ m)1

Since there are only ( n ) places a tie can take place and the P*probability of any such tie is 2
2qk, we have

P(_17)s (1  Et  N1)1Ph(Y)  2 (n )21* < n22~1k.

Also, Pk(H) = P(H) for all H r= V by (3.3); so summing over .V we have
P(X.(1) < XW < ... < X~(~)):5 2q,,(Ek + hk') + n22 + Ipp.!)

q*, Er1 )r by
Setting r = 2 we have 80 P, = n. Hence, Il~(P,!) F(P. + 1) ~: F((n + r)lr the


convexity of log r(x), [71, page 285. We begin with the bounds (3.4) Q.*(E) ~ (1  E){max,P(X.(1) < 42) < ... <

(1  E)n! r(n + 1)r n + r + r(n + 1)(2_% 2 + rEk + rh k 1)
f C~ 1
Since r = 2 "*' is fixed (as we choose Ek, ph, 1k, and hk), we invoke Stirling's formula to obtain lr(n + 1)17((n + r)1r)` S (r + 1)' for n k no(r). Since p. 10, we can now choose t = th ~ no so that

(3.5) (r + l)`k 2p,'(1  E)~'.
Finally, we choose ek, hk, andpk so that

(3.6) {r(t + 1)(2p"2 + rEk + rhk`))1 k 2pt(l  E)`.

By the elementary inequality ll(a + b) ~. (V2)min (I1a, llb} applied to (3.4), (3.5), and (3.6) for each t = tk, k = 1, 2, ... we have
Qn*(E) ~ p",(n!) for n = ti, t2,
The proof of Theorem 3 is thus complete. 0

4. A brief speculation. The only two cases where one has a reasonably complete understanding of Qn*(E) are the case of continuous Li.d. random variables and now in the case of continuous stationary ergodic processes with finite entropy. These two cases are in a way polar opposites of generality, and many important classes of processes lie in between.
Since many basic probabilistic events are simply unions of the order statistical events, it would seem to be of considerable interest to discover those cases in which a precise understanding of Qn*(E) can be obtained.


[1) BILLINGSLEY, P. (1965). Ergodic Theory and Information. Wiley, New York.
[2] BONDY, J. A. and MURTY, U. S. R. (1976). Graph Theory with Applications. American Elsevier, New
[3] DE BRuijN, N. 0. (1946). A combinatorial problem. Nederl. Akad. Wetensch. Proc. Ser. A 49 758764; Indag. Math. 8 461467.
[41 FREDERICKSEN, H. and MAIORANA, J. (1978). Necklaces of beads in k colors and kary de Bruijn sequences. Discrete Math. 23 2072 10.
[51 GOOD, I. J. (1946). Normal recurring decimals. J. London Math. Soc. 21 167169.
161 HALL, M., JR. (1967) Combinatorial Theory. Blaisdell, Waltham, Mass.
171 MITRINovit, D. S. (1970). Analytic Inequalities. Springer, New York.
181 SHIELDS, P. (1973). The Theory ofBernoulli Shes. Univ. Chicago.
[91 SMORODINSKY, M. (197 1). Ergodic Theory, Entropy. Lecture Notes in Math. 214 Springer, New York.