VARIATIONS ON THE MONOTONE SUBSEQUENCE
11
THEME OF ERDOS AND SZEKERES*

J. MICHAEL STEELEt

Abstract. A review is given of the results on the length of the longest increasing subsequence and related problems. The review covers results on random and pseudorandom sequences as well as deterministic ones. Although most attention is given to previously published research, some new proofs and new results are given. In particular, some new phenomena are demonstrated for the monotonic subsequences of sections of sequences. A number of open problems from the literature are also surveyed.

Key words. Monotone subsequence, unimodal subsequence, partial ordering, limit theory, irrational numbers, derandornization, pseudorandom permutations.

AMS(MOS) subject classifications. 60C05,06A10

1. Introduction. The main purpose of this article is to review a number of developments that spring from the classic theorem of Erd6s and Szekeres (1935) which tells us that from a sequence of n2 + 1 distinct real numbers we can always extract a monotonic subsequence of length at least n + 1. Although the Erd6sSzekeres theorem is purely deterministic, the subsequent work is accompanied by a diverse collection of results that make contact with randomness, pseudorandomness, and the theory of algorithms.
Central to the stochastic work that evolved from the discovery of Erd6s and Szekeres is the theorem that tells us that for independent random variables Xi, 1 < i < oo, with a continuous distribution, the length In of the longest increasing subsequence in {Xl, X2, . . ., XJ,

I,, = max{k : Xi, < Xi~ < ...
satisfies
(1. 1) 1 i m In/ ,/n = 2 with probability one .
n co

The suggestion that 2 might be the right limiting constant was first put forth by Baer and Brock (1968) who had engaged in an interesting Monte Carlo study motivated by S. Ularn (1961), but the first rigorous progress is due to Hammersley (1972) who showed In/,/n_ converges in probability to a constant C > 0. The constant C proved to be difficult to determine, but Logan and Shepp (1977) first showed C > 2 by a sustained argument of the calculus of variations and then C < 2 was established by Vershik and Kerov

* Research supported in part by NSF Grant Number DMS 9211634 and Army Research Office Grant DAAL0391G0110.
1 Department of Statistics, University of Pennsylvania, The Wharton School, 3000 Steinberg HallDietrich Hal~ Philadelphia, PA 191046302.

ill

i

112 J. MICHAEL STEELE

(1977) using information about the Plancherel measure on Youngtableaux. The later result was subsequently simplified by Pilpel (1986) who also gave the bound valid for all n,
n
EIn E 1 / 07
j=1

More recently, Aldous and Diaconis (1993) have given an insightful proof that C = 2 by building on elementary aspects of the theory of interacting particle systems.
In the next section we sketch six (or more) proofs of the Erd6sSzekeres theorem. The central intention of reviewing these proofs is to see what each of the methods can tell us about combinatorical technique, but the very multiplicity of proofs of the Erd6sSzekeres theorem offers an indication that the result is not so special as one might suspect. Quite to the contrary, the Erd6sSzekeres theorem offers us a touchstone for understanding a variety of useful ideas. The third section looks at generalizations of (1. 1) to d > 2 and to structures like unimodality. Section four then develops the theory of monotone subsequences for pseudorandom sequences. That section gives particularly detailed information about the Weyl sequences, x, = na mod 1 for irrational a. Section five develops the theory of monotone subsequences for sections of an infinite sequence.
The final section comments briefly on open problems and on some broader themes that seem to be emerging in the relationship between probability and combinatorial optimization.

2. Six or more proofs. Perhaps the most widely quoted proof of the Erd6sSzekeres theorem is that of Hammersley (1972) which uses a visually compelling pigeonhole argument. The key idea is to place the elements of the sequence xl, x2,.. ., x,,, with m = n2 + 1 into a set of ordered columns by the following rules:
(a) let xl start the first column, and, for i > 1,
(b) if xi is greater than or equal to the value that is on top of a column, we put xi on top of the first such column, and
(c) otherwise start a new column with xi. The first point to notice about this construction is that the elements of any column correspond to an increasing subsequence. The second observation is that the only time we shift to a later column is when we have an item that is smaller than one of its predecessors. Thus, if there are k columns in the final structure, we can trace back from the last of these and find monotone decreasing subsequence of length k. Since n2 + 1 numbers are placed into the column structure, one must either have more than n columns or some column that has height greater than n. Either way, we find a monotone subsequence of length n + 1.
Hammersley's proof is charming, but the original proof of Erd6s and Szekeres (1935) can in some circumstances teach us more. To follow the

VARIATIONS ON THE MONOTONE SUBSEQUENCE 113

original plan, we first let f (n) denote the least integer such that any sequence of f (n) real numbers must contain a monotone subsequence of length n. One clearly has f (1) = 1, f (2) = 2, and, with a moments reflection, f (3) = 5. By using the construction that extends the example {3, 2, 1, 6, 5,4, 9, 8, 7} we see f (n) > (n  1)2, and the natural conjecture is that f (n) = (n  1)2 + 1. The method we use to prove this identity calls on a modest bit of geometry, but the combinatorial technique that it teaches best could be called the abundance principle: In many situations if a structure of a certain size must have a special substructure, then a somewhat larger structure must have many of the special substructures.
The ErMsSzekeres proof is quickly done at a blackboard, although a few more words are needed on paper. To show by induction that f (n) = (n  1)2 + 1, we begin by considering an integer b > 0 and a set of f (n) + b distinct points in R2. By applying the induction hypothesis b + 1 times we can find b + 1 distinct points that are terminal points of monotone sets of length n. For the moment we withhold our budget set B of b points, and we invoke the induction hypothesis on set P with cardinality f(n). The induction hypothesis gives us a monotone subsequence of length n. We remove the last point of this sequence from P, and we add one of the budget points from B to get a new set P' of cardinality f (n).

FIG. 2.1. The dots denote the points of S+, and the shaded region is the region D of points that are not comparable to the points of S+ in the up andtothe right order.

1

114 J. MICHAEL STEELE

Now, just to get started, suppose we took b = 2n, then since any terminal point must be associated with an decreasing sequence or an increasing sequence, we can suppose without loss of generality that there are n + 1 points S that are terminal points of increasing sequence of length n. If some two elements of S were in increasing order, we could extend one of the length n increasing sequences to a sequence of length n + 1. On the other hand if no two elements of S are in increasing order, then S constitutes a set of size n + 1 in decreasing order. We have therefore shown f (n + 1) < f (n) + 2n, which is enough to show that f (n) < n2 + 1, but we can do better by exercising a little more care.
This time we add just b = 2n  1 new points and get 2n terminal points. If we have n + 1 of either of the two types of terminals, then we can proceed as before to find a monotone sequence of length n + 1. Therefore we can suppose that there are exactly n terminal points of each type. If S+ and S denote the terminals of the increasing and decreasing sequences, then by the argument of the previous paragraph there is no loss in assuming that S+ forms a decreasing set (see Figure 2.1). Now if any point x E S were to be upandtotheright of any point of y E S+, then x would combine with the increasing sequence sending y to form an increasing sequence of length n + 1. Therefore all points of S are in the region D. These are an increasing sequence of length n, and the last of there points is majorized by some y e S+, telling us S+ U {y} would give the required subsequence.
More will be said later about the virtue of the Erd6sSzekeres proof, but first we want to recall what is perhaps the slickest and most systematic proof, the one due to Seidenberg (1959) and which is naturally suggested by dynamic programming. We take S = 1P1, P2,  . .' Pm } C R2 distinct points ordered by their xcoordinates, and we define ~o : 5  ~ Z x Z by letting ~p(p) = (s, t) if s is the length of the longest increasing subsequence terminating at p, and t is the length of the longest decreasing subsequence terminating at p. We are not too concerned with algorithms in this review, but at this point one may as well note that there is no problem in calculating ~P(Pk) in time 0(k) given ~o(pi), ~C(P2),.. ., ~C(Pk1) so the complete computation of W on S can be determined in time 0(m2), (cf. Friedman (1975)). Now, if S contains no monotone subsequence of length n, then ~o(S) ~~ {l, 2,..., n  1} x {l, 2,..., n  1}. But ~o is injective, since, if p, q e S and q follows p in xcoordinate order, then ~o(q) might have at least one coordinate larger than the corresponding coordinate of ~p(p). Hence we see that if m > (n  1)2 + 1 we find that S must contain a monotone subsequence of length n. In other words, we have f (n) ~~ (n  1)2 + 1 and thus complete our third proof of the Erd6sSzekeres Theorem.
The fourth proof we consider is one due to Blackwell (1971) that is not so systematic as that of Seidenberg (1959) or as general the original proof of Erd6s and Szekeres, but it serves well to make explicitly the connection to greedy algorithms.
If S = {X 1, X2,. . ., X,} is our set of r > nm distinct real numbers, we

VARIATIONS ON THE MONOTONE SUBSEQUENCE 115

say a monotone decreasing subsequence S' is leftmost if x' = xl and each term x~ of S' is equal to the next term of S' which is smaller than x~.
Thus S' is the consequence of applying a greedy algorithm to the sequence S.
If we successively apply this greedy process to the points that remain after removal of the leftmost decreasing subsequence we obtain a decomposition of S into S,, S2,. . ., St where each Si is a decreasing subsequence. The observation about this decomposition is that we can construct an increasing subsequence 1 a,, 2,. . ., at } of S by the following backward moving process:
1. Select at arbitrarily from St
2. For any j = t down to 1, select aj as any term in Sj that is smaller than aj.
Because of the definition of the Sj, 1 < j < i we can always complete the steps required in this process. We then find either an increasing set {a,, a2, .. , at} with t > n or else one of the decreasing subsequences Sj has cardinality bigger than m. In retrospect one can see that Blackwell's proof is almost isomorphic to Hammersley, though the associated picture and algorithmic feel are rather different.
The fifth proof is closely related to the one just given, but it still offers some useful distinctions. In the solution of Exercise 14.25, Lov~sz (1979) suggests that given a set S of n2 + 1 real numbers {Xl, X2, . . ., X0+1} one can define a useful partition AI, A2.... of S by taking Ak to be the set of all xj with 1 < j :5 & + 1 for which the longest increasing subsequence beginning with xj has length exactly equal to k. One can easily check from this definition that each of the sets Ak = {il < i2 < ... < i, } gives rise to k monotone decreasing subsequence xi, > xi, > ... > xi, and from this observation the Erd6sSzekeres theorem follows immediately. This last proof has the benefit of showing that any digraph with no directed path of length greater than k has chromatic number bounded by k.
The fifth proof is one that deserves serious consideration, but which would lead us too far afield for us to develop in detail. The central idea is that of the construction of Schensted (1961) that provides a onetoone correspondence between pairs of standard Young tableaux and the set of permutations. This correspondence as well as its application to the theorem of Erd6s and Szekeres and to algorithms for the determination of the longest monotone subsequence of a permutation are well described in Stanton and White (1986). The work of Schensted (1961) was substantially extended by Knuth (1970) to objects that go well beyond permutations. We do not know the extent to which the correspondence provided by Knuth (1970) might contribute to the central problems of this review.
Our final observation on the proofs is just to note that the Erd6sSzekeres theorem also follows from the well known decomposition theorem of Dilworth (1950) which says that any finite partially ordered set can be partitioned into k chains Cl, C2, .. ., Ck where k is the maximum cardinality

116 J. MICHAEL STEELE

of all antichains in S. Using our previous S with the upandtotheright ordering, we see that if there is no decreasing subsequence of length n, then k < n and

1C11 + 1C21 + ICki = IS1 > (n  1)2 + 1

implies that for some 1Cj I we have 1Cj I > n. Since Cj corresponds to an increasing sequence, we have the final proof.
Dilworth's theorem has itself been the subject of many further developments, some of which are directly connected to the issues engaged by this review. For a recent survey of the work related to Dilworth's theorem and sketches of several proofs of Dilworth's theorem, one should consult Bogart, Greene, and Kung (1990). A result that generalizes both the Dilworth decomposition theorem and the digraph theorem of the previous paragraph is the theorem of Gallai and Milgram (1960) which says that if a is the largest number of points in a digraph G that are not connected by any edge, then G can be covered by a directed paths.

3. Higher dimensions. The charm of the Erd6sSzekeres monotonicity theorem goes beyond the call for innovative proofs. There are several useful extensions and generalizations, though amazingly almost all of these have grown up a good many years after the original stimulus.
One natural issue concerns the generalization of the monotonicity the6rem to ddimensions. If we define a partial order on Rd by saying y << w if Y (Y1, Y2,..., Yd) and w = (Wj 7 W2,.. ., Wd) satisfy y, :5 wl, y2 < W2, Yd < Wd, then the natural variables of interest are A+(yl, Y2~  . , Yn), the length of the longest chain in the set {Y1, Y2 ~   ., Yn } C Rd and correspondingly A (yl, Y2,. . ., Yn), the cardinality of the largest antichain in {Yl, Y2, .. ., yn }. After tentative steps in Steele (1977) which were nevertheless enough to settle a conjecture of Robertson and Wright (1974), the definitive result was established by Bollob;is and Winkler (1988). Their main result is that for {Xi : 1 < i < oo}, independent and uniformly distributed on [0, 1]d, one has

lim A+(Xl,X2,,Xn)/n' 1d= Cd > 0
noo

where the limit holds with probability one and where Cd is a constant that depends on the dimension d > 1. Further, Bollob~s and Winkler (1988) also showed that the constants Cd satisfy the interesting relation

00 1
liM Cd = 1:  = e.
d oo n=0 nt

Since it is trivial that cl = 1 and since we know from Logan and Shepp
(1977) and Vershik and Kerov (1977) that C2 = 2, we thus have in hand
three instances that support the humorous but feasible speculation due to

VARIATIONS ON THE MONOTONE SUBSEQUENCE 117

Aldous (personal communication) that perhaps

d 1
Cd = E ~!
n=0 *

for all d > 2. The correctness of this interpolation and also the one based on Cd = (1 + 11(d + 1))d 1 are placed on slippery ground by simulations of P. Winkler and R. Silverstein (cf. Silverstein (1988)) that suggest that C3 is approximately 2.35, a value that does not agree well with either of the two candidates of 5/2 and 9/4. Still, as the investigators note the convergence in the simulations was very slow, and the value of 2.35 is the result of heuristic curve fitting. The determination Of Cd remains of interest from several points of view.

4. Totals and functionals of totals. Lifschitz and Pittel (1981) have studied the total number T, of increasing subsequences of n independent uniformly distributed random variables. Among other results they found

ET,  an1/4e 2,/n

where a = (2.,,/ire)1, and

E (T,2, flo n  1/4 exp (01 n 1/2)

where 31 2(2 + v/5_) 1/2 and Oo is equal to (207r(2 + .,/5) 1/2 exp (2 + ,/5_))1/2 0.016. These asymptotic results were obtained by analytic methods beginning with the easy formula

n 1
E(Tn) n
k=0

and the trickier

E (T,2, r n (k+l)/2+£ 1
1 ( k
k+l
Lifschitz and Pittel (1981) also proved that there is a constant Y such that as n  ~ oo

n logTn  Y

in probability and in mean. The exact determination of y remains an open problem, though one has the bounds

21og2 < y < 2.

118 J. MICHAEL STEELE

5. Unimodal subsequences. The most natural variation on the theme of monotone subsequences is perhaps that of unimodal subsequences, which are those that either increase to a certain point and then decrease, or else decrease to a point and increase thereafter. Formally, given a sequence S = {Xl, X2 .... X.} we are concerned with U,,(S) defined by U"(S) max(U~, Uj) where

U,t max{k:31 Xil < Xi2 < ... < Xii > Xii+1 > ... > xi, for some j}

and

Uj max{k:31T il > Xi2 > ... Xij < Xij+1 < ... < xi, for some j}

In a remarkable tourdeforce, Chung (1980) established the result that for any sequence S of n distinct reals, we have

(5.1) U,, :,~ [(3n  3/4) 1/2 _ 1/21,

and, moreover, this result is best possible in the sense that for any n > there is a sequence of distinct reals for which we have equality in (5.1). The complexity of the proof of this result is of a different order than that of the Erd6sSzekeres theorem, although there are important qualitative insights that Chung's proof shares with the dynamic programming of the Erd6sSzekeres theorem that was given in Section two. The proof provided by Chung calls instead on four functions parallel to the two components ~o of Section two, although instead of considering only one simple injective image contained in [1, n] x [1, n], Chung must consider several such images that are contained in more complex domains.
With the deterministic problem resolved, Chung (1980) posed the natural problem of determining the asymptotic behavior of Un (X,, X2,.. ., X,) when the Xi are independent and random variables with a common continuous distribution. This turned out to be much easier to resolve than the problem solved by Chung, and in Steele (1981) it was proved that we have that

Un 2 Y/2

with probability one. The corresponding constant when one permits k changes in the sense of the monotonicity was also found to be 2y/k.

6. Concentration inequalities. The length I,, of the longest increasing subsequence of n independent uniformly distributed random variables has the property of being rather tightly concentrated about its mean. One way to see this in terms of variance is to call on the EfronStein inequality which in the form given by Steele (1986) tells us that if F(yj, Y2,  . ., Yn 1)

VARIATIONS ON THE MONOTONE SUBSEQUENCE 119

is any function of n  1 variables then by introducing n random variables by applying F to the variables Xl, X2,  . ., X, with the i'th variable withheld Fi = F(Xl, X2 .... Xi1,Xi+b...,X,) and setting F
1 (F, + F2 + + Fn), we have
n

n
2
Var (F(Xl,X2,...,X,,1))
When we focus on In, we first note that

I(Xl, X2,.. ., Xn) :~ 1 + I(Xl, X2, ..., Xi 1, Xi+1 .... Xn) and

I(Xl,X2,...Xi1,Xi+i .... Xn) :S I(Xl ~ X2 Xn).

If we let An denote the number of sample points Xi that are in all of the increasing subsequence having maximum length Ln, we then find

n
(I(Xl, X2,..., Xn)  I(Xl, X2 .... 3Xi_l~Xi+l~...,Xn))2 < An

Since An < In and since the quadratic sum is decreased by replacing I(Xl, X2,... 1 Xn) by n1 (I(X2, X3, ..., Xn) + I(Xl, X2, Xn) ++ I(Xl, X2,..., Xn1) we have

n
j:(I(xl,X27 ... 3XibXi+2 .... Xn) _ 1)2 < I..

Taking expectations and applying the EfronStein inequality gives

Var In, :5 E1n < CVrn

where for any n > n(c) we can take C < 2 + E.
This bound on Var In1 was easily won, but despite its simplicity is enables us to circumvent some rather heavy analysis. In particular, the variance bound, monotonicity of In, and the asymptotics of the mean EIn  2V4n give us an easy proof that In/,/n_ 2 with probability one just by following the usual Chebyshev and subsequence arguments. This development may seem surprising since the strong law for In has served on several occasions as a key example of the effectiveness of subadditive ergodic theory (Durrett (1991) and Kingman (1973)). Here we should also note that Aldous (1993) gives a scandalously simple proof of the EfronStein inequality, making it easy enough to cover in even a first course in probability theory.

120 J. MICHAEL STEELE

Lower bounds on Var I,, have been obtained recently by B. Bollob~is and R. Pernantle who independently established that there is a c > 0 for which

Var In > cn 1/8 for all n > 3. In view of the bounds just reviewed one suspects that

lim log(Var In)/ log n = a n oo

for some 1/8 < a < 1/2. The existence of the limit may not be too difficult to establish, but substantial new insight will be required to determine the value of a.
The tail probabilities of In were first studied in Frieze (1991) using the bounded difference method. The bound obtained by Frieze was subsequently improved by Bollob~is and Brightwell (1992) who established that for all c > 0 there is a 0 0(c) > 0 such that for n > n(e), we have

P (IIn EIn 1 ~: n' /4+e) :5 exp(n'6).

The work of Bollob~s and Brightwell (1992) also considered the ddimen
sional increasing subsequences A+ (X,, X2, X,) = A+,, for Xi indepen
d d,
dent and uniformly distributed on the unit dcube.
THEOREm 6.1. For every d > 2, there is a constant Ad such that for all n > n(d) one has

P (JA+,,  EA+,, 1 ~~ \Adn 1/2d log n/ log log n < 80A2exp( _'X2)
d, d,

for all X m 'th 2 < A < n1/2d/ log log n.
Large deviation results like the last one have many consequences but a particularly valuable consequence in the present case is that one can extract a rate of convergence result for certain means. We let N have the Poisson distribution with mean n and define constants C,,d by

n11d EA+
d (Xl, X2, XN) = Cn,d.

The main result is that

Cd  Cn,d = 0 (n1/2d 10g3/2 n/ loglogn)

where c2 = 2 and the other constants Cd are those of the BollobAsWinkler theorem for d > 2 as viewed in Section 3.
The most precise result on the deviations of the length longest increasing subsequence I,, is due to Talagrand (1993) and emerges from the theory of abstract isoperimetric theorems for product measures developed in Talagrand (1988, 1991, 1993).

VARIATIONS ON THE MONOTONE SUBSEQUENCE 121

THEOREM 6.2. If M, denotes the median of I, then for all u > 0 we have

P (I,, ~~ M, + u) :5 2 exp ( u 2 1(4(M,.~  u)))

and

P(I, :5 M,  u) :5 2 exp(u 2 /(4M.))_

7. Pseudorandom sequences. If 0 < a < 1 is an irrational number and {X} denotes the fractional part of X, then the sequence of values determined by the fractional parts of the multiples of a given by {&}, {2a}, {3a},..., {n,,} are uniformly distributed in [0, 1] in the sense that if [a, b] C [0, 1] then the number of the integers k satisfying {ka} G [a, b] for 1 < k < n is asymptotic to (b  a)n as n+ oc. Bohl (1909) was evidently the first to show sequence of points {na} share this property with the independent uniformly distributed random variables Xn, 1 < n < 00, though the subsequent importance of this observation was certainly driven home by H. Weyl and G. Hardy. The project of exploring which properties of the X,, are shared by the {na} is a natural one, of which the results of Kesten (1960) and Beck (1991) are telling examples. The survey of Niederreiter (1978) provides an extensive review of the ways that pseudorandom numbers can parallel those that are honestly random and also articulates ways where pseudorandom sequences can be even more useful.
There are some very simple properties that make {Xn} and {na} seem
quite different, so there is particular charm to the fact that they can behave
quite similarly in the context of such a nonstandard issue as the length
of the longest increasing subsequence. If we let £+ (a) and C (a) denote
n n
respectively the longest increasing and longest decreasing subsequences of
{2a},..., {na}, then £+ and C turn out to have behavior that echos
n n
closely the behavior of their stochastic cousins. The first investigation of
this issue is given in Del Junco and Steele (1978) where it is proved using
discrepancy estimates that

,109 £n+ (0e) 1 109 £n (a) 1
2
log n 2 "u ogn

for all algebraic irrationals a and for a set of irrationals of [0, 1] measure one.
In Boyd and Steele (1978) a more precise understanding of £+(ce) was n
obtained by using the continued fraction expansion of a. It turns out that
n1/2 is the correct order of C and C if and only if a has bounded partial
n n
quotients. Moreover, when the partial quotient sequence of a is known one
can determine the precise range of £+/.,fn and C/,/n_. For example, in
n n
the eternally favorite special case of the golden ratio ao = (1 + ,/5)/2
we have

liminf £+(ao)/~,/n=2/5 1/4 = 1.337481 ... n noc

122 J. MICHAEL STEELE

and

limsup £(ao)v'n= 51/4 = 1.495349...
n
noo

There is a nonzero gap A(ao) between these two limits, and in fact there is no a for which one has a limit theorem precisely like that for independent random variables, but nevertheless we see there is a close connection between {na} and the genuinely random case.
In Boyd and Steele (1978) it is further proved that the gap A(a) is minimized precisely for ao = (1 + v/5_)/2. It is also proved there that for all irrational a we have

limsup ~+(a)£(a)/n = 2,
n n
noo

and for a with unbounded partial quotients

lim inf £+ (a)t (a)/ n = 1.
n n
noo

These results contrast with the analog' for a sequence of independent uniformly distributed random variables for which we have

1 i m In+ 1j, = 4 a.s.
nm

The behavior of the pseudorandom Weyl sequence {ce}, {2a}, {3c,},..., {nce} is not unique in the world of pseudorandom sequences. There is a remarkable sequence due to van der Corput (cf. Hammersley and Handscomb (1964)) that was invented in order to provide a sequence {xi EE [0, 11 1 < i < 00} that has an especially small discrepancy

n
Dn = sup 1[0,1(xi)  X
0
To define this sequence we first note that for any integer n > 0 there is a unique representation n = 1:'0 ai2' where ai EE {O, 1}. The nth element ~o2(n) of the van der Corput sequence of base 2 is given by "reflecting the expansion of n in the decimal point," that is we have

00
~P2(n) = E ai 2
i=o

The sequence is thus given by {1/2,1/4, 3/4,1/8,5/8,3/8, 7/8 .... }, and one can see that the sequence does indeed disperse itself in a charmingly uniform fashion. Bdjian and Faure (1977) have shown that there are sequences that are still more uniform but even there new sequences are conceptually quite close to the original idea.

VARIATIONS ON THE MONOTONE SUBSEQUENCE 123

Asymptotic equidistribution is perhaps the most basic feature of a sequence of independent uniformly distributed sequence, but one can check that the van der Corput sequence is far more uniform than a random sequence. This divergence from the behavior of independent uniform random variables is one of the facts that adds zest to the study of the behavior of the longest increasing subsequence Of {(P2(k) : 1 < k < n}. The basic results are the following:

liMSUP £+(~P2(1),(P2(2),...,~P2(n))/,/n=3/2
noo

and

liminf
nco

For any integer p, one can define a base p van der Corput sequence by expanding n in base p and letting ~op(n) be the analogous "reflection in the decimal poinC for p > 2 we have

limsup £+(~pp(l),~op(2),...,cpp(n))/vn = P1/2
n~oo

and

liminf 1)1/2.
n~oo

Because of the interest that has been devoted to the hard won constant
C2 = 2 for the limit of ~+ /V'n for the case of independent uniform variables,
n
we should record that the last limit tells us that

lim lim inf £+ (~op (1), ~pp (2),. (pp (n)) /v/n = 2.
p~oo ncc

8. Theory of subsequences of sections. Many combinatorial problems exhibit new behaviors when they are imbedded in an infinite sequence of nested problems. The most classical instance of this phenomenon is offered by the theory of irregularity of distribution (cf. Beck and Chen (1987)).
To see how this phenomenon is manifested in the context of the Erd6sSzekeres theorem we suppose we are given an infinite sequence of distinct reals S = {2~l~.T21 ... ) Xn Y ... }1 we focus on M(Xl, X2,..., Xn) = M,,(S) which denote the cardinality of the largest monotone subsequence of {.Tl) ~C2~ ... 5 Xn }. The Erd6sSzekeres theorem tells us that Mn ~: V/_n for all n > 1, and, this is the best one can say for any fixed n. Still, for any infinite sequence S of distinct reals we can do better than Vfn_ on infinitely many blocks of length n.
THEOREM 8.1. There is a consiant y > 1 such Mat

(8.1) liMSUP M(Xi+bXi+2.... ) Xi+n)/v/n_ > y.
i,noo

J. MICHAEL STEELE

Before embarking on the proof of this result we provide an example that shows that a doubly indexed result is what one required if the intention is to beat the Erd6sSzekeres theorem by a factor exceeding 1.We can construct an infinite sequence of integers {xl, X2 .... } such that for all of the finite sections {xl, X2,.. , XJ we have M(xl, X2,.. , Xn) = rv1'n]. This sequence can be constructed as a concatenation of blocks

Bk = {(_1)k+4 k + (_1)k+j (2k  j) : j = 0, 1, 2,..., 2k}

for k = 0,1,2,.... We note that IBki 2k+ 1, so 1Bol+ JB11++ JBk 1 = (k + 1)'. Also, we note that Bo {Q, B, = {5, 4, 3}, B2
{13, 12, 11, 10, 9} and B3 = {33,32, 31, 30, 29,28, 27}.
The proof of Theorem 8.1 depends on results that tell us that if M(xl,X2,,xn) and M(xl,X2, ... jX2n) are both small, then M(xn+l, xn+2,. .., X2n) must be exceptionally large. We will provide one such result as a consequence of the next lemma, but we first require some notation. Given any fixed sequence S, we let

a(k) = max{t 3xi, > xi, > ... > xi, with 1 < il < i2 < ... < it = k}
b(k) = max{t 3xi, < xi. < ... < xi, with 1 il < i2 < ... < it = k}
Cn(k) = max{t 3xi, > xi, > ... > xi, with k il < i2 < ... < it n}
dn(k) = max{t 3xi, < xi, < ... < xi, with k il < i2 < ... < ii, n}.

To illuminate these definitions, we note a(k) is the length of the longest decreasing sequence terminating with Xk, and Cn(k) is the length of the longest decreasing subsequence beginning with Xk and ending before the nth element of the sequence. Thus, the sum a(k) + Cn(k) is the length of the longest decreasing subsequence Of {Xl, X2, .. , Xn} that contains Xk.
LEMMA 8.1. For any 0 < E < 1/2 and n > no(E), if we have the bound
M(Xl, X2.... Xn) :5 (1 + then there exists 1 < k < n such that

a(k) ~: (1  6)vrn

and

b(k) > (1  6)vrn

provided that 62 > 2,..
Proof. The mapping k  (a(k), b(k)) is injective since if k < k' then we have a(k) < a(C) or b(k) < b(C), or both, because Xk' will extend at least one of the monotone sequences ending at Xk. The set of integers (s, t) with 1 < s, t < (1 + E)Vfni and min(s, t) < (1  6)Vfn_ has cardinality at most (1  6)2n + 2(e + 6)(1  6)n < (1 _ 62 + 2e)n and the n distinct points (a(k), b(k)) are among the lattice points [1, Mn]2 C [1, (1 + e.)Vrn]2, so for 61 > 2e. the pigeonhole principle yields the lemma. 0

VARIATIONS ON THE MONOTONE SUBSEQUENCE 125

PROPOSITION 8.1. For 0 < s. < 1/2 and all n > no(E), the inequalities

M(xl, X2 .... ~ Xn) :5 (1 + 1).~/'n
M(xl, X2 .... ~ X2n) :5 (1 + ).\/2n

2 . Mply
M(Xn+l 7 Xn+2 X2n) > (1  256)v/2n

provided 6' > 2E.
Proof. Even before beginning, we note that the factor of 25 given above can be improved, but it suffices for our main point and allows for simple computations. By the preceding lemma we have 1 < k < n such that

a(k) ~~ (1  6)vfn_ and b(k) ~: (1  6).,/n_.

By considering subsequences that go through Xk and continue from xj with n
M(xl, X2,..., X2n) > (1  6)vl"n + min{C2n(j), d2n(M,
so, by the bound on M(xl, X2, ..., X2n), we find
(8.2) min{c2.(j), d2n(j)} < (1 + SW2n  (1  6).,/n

for all n < j < 2n. Now, unless the conclusion of the proposition holds we
also have
(8.3) MaX{C2. (j), d2. (j)} :~ (1  256),/2n,

so the issue is to estimate the number of n < j < 2n that can satisfy (5.2) and (5.3). Since the mapping g + (C2n (j), d2n (j)) is injective, the proposition follows if there are fewer than n solutions of (5.2) and (5.3).
We need to count the number of positive lattice points (i, j) with

min{i, j} < {(1 + E)v/"2_  1 + 6}v/n_ and

max{i, j} < ,/2_(1  256)vfn.

A calculation shows that the area of the Lshaped region defined by these inequalities is bounded by n{l  6} < n. 0
To complete the proof of the main theorem of this section, we first note that setting 62 = 2e and solving 1 + s. = (1  256OvIf2 we are lead to a value of 6 = 0.117118. This tells us that in the theorem we can take any j < 1 + 62/2 < 1.00014.
The key problem that remains open at this point is the determination of the best possible value of j.

126 J. MICHAEL STEELE

9. Subsequences along cycles. We say that a sequence of integers (il, i2,. . ., ik) has d descents if it can be written as the concatination of d monotonic blocks but cannot be written as a concatination of fewer than d monotonic blocks.
For example, the sequence (5,9,11,2,3,8,4, 1) can be written as the concatination of four monotonic increasing blocks (5, 9, 11), (2, 3, 8), (4), (1) and thus the sequence has d = 4 descents. We let

£+ (d; X 1, X 2,.  ., Xn) =
maxf k : xi, < xi, < < xik where (iI, i2,. ik) has d descents}.

We define £(d;X1,X2 .... , x,,) as the corresponding maximum length ddescent monotone sequence of xi's. The purpose of introducing these quantities is that they lead to a quite natural analog of the Erd6sSzeleeres theorem.
THEOREM 9.1. For any n distinct real numbers, we have

£1(d; xl, X2.... ~ X')£ (d; X 1, X2, . . ., X,) > dn.

Proof. We first remark that the case d = 1 is the usual Erd6sSzekeres theorem. Further in the trivial case d = n, we have equality since
£ = n. Cl

10. Common ascending subsequences. If 7 and o. are two permutations of f l, 2,...,n}, we say they have a common ascending subsequence of length r if there are indices 1 < ii < i2 < ... < i, < n and 1 < il < h < ... < j, :5 n such that 7r(i,) = o(j.,) for all 1 < s < r. The notion of the longest common ascending subsequence A(7r, o) was introduced in Alon (1990) for the purpose of derandomizing the randomized maximum flow algorithm of Cheriyan and Hagerup (1989). The connection between A(7r, o.) and the central theme of this review is made most explicit by noting that A(7r, a) is equal to the length of the longest increasing subsequence of o 1 7r(l), (717r(2), . . ., o 1 7r(n). The main result of Alon (1990) is the following theorem.
THEOREM 10.1. For every two integers k and n with k > n, one can construct in time 0(kn), a sequence of permutations 7rl, 72, .., 7n Of {l, 2,..., n} such that any permutation a satisfies
1 k 2/1).

k A (o., iri) = 0 (n

The fact that this result is reasonably sharp can be deduced from the probabilistic results of the first section. For any fixed 7rl,7r2,..,7rn, we have for a random o. that

k
E 7r(o, 7ri) = EIn > (2  ),/n

VARIATIONS ON THE MONOTONE SUBSEQUENCE 127

for any e. > 0 and n > n(c). Combining this observation with Alon's theorem with k = n we find that there is a constant c > 0 such that the functional
n
An = maxmln \(0, 7ri)
n
satisfies

2/3
(2  c.) V1'n_ < An < c n

The main point of reviewing this information is to point out the problem of determining the true order of An.

11. Optimal sequential selection. One of the intriguing themes of sequential selection is that one sometimes does surprisingly well in making selections even without knowledge of the future or recourse to change past choices. One illustration of the theme is the "secretary problem" of Gilbert and Mosteller (1966) that tells us that given Xl, X2,. ., Xn independent and identically distributed random variables, there is a stopping time T
7,, such that

P(X, = max Xi) > el.
1
There turns out to be a parallel phenomenon that takes place in the theory of monotone subsequences.
The problem is to determine how well one can make a sequence of choices from a sequentially reveal set of independent random variables with a known common continuous distribution. To put this problem rigorously, we call a sequence of stopping times 1 < 7.1 < 72 < ... a policy if they are adapted to Xl, X2.... and if we have X,, < X,2 < ... < X,, < . . .. We let S denote the set of all policies. The main result about such policies is given in Samuels and Steele (1981).
THEOREM 11.1. For any sequence of independent random variables with continuous distribution F and associated set of policies S, we have

u, = sup E(maxf k : rk < n})  v"2n

as n ). oo.
This theorem thus tells us that there is a policy by which we can make sequential selections from X,, X2, .. ., Xn and obtain a monotone increasing subsequence of length that is asymptotic in expectation to \/2n. Since the best one could do with full knowledge Of f Xl, X2, .. ., Xn } is to obtain a subsequence with length that is asymptotic in expectation to 2V/n, we see that a "mortal" does worse than a "propheC by only a factor of ~/2.
It was observed by Burgess Davis (cf. Samuels and Steele (1981), Section 7), that if one lets 4 denote the expected value of length of the longest

128 J. MICHAEL STEELE

increasing subsequence that can be made sequentially from a random permutation of {l, 2,..., }, then one has £,,  u, and consequently t,  ~/2n. The importance of this remark comes from the fact that the "naturaP selection studied in Baer and Brock (1968) is just "sequentiaF' selection as studied here. Thus, the results of Samuels and Steele (1981) coupled with the key observation of Burgess Davis resolve the main problem posed in Baer and Brock (1968).

12. Open problems and concluding remarks. Erd6s (see Chung (1980), p. 278) has raised the question of determining the optimum values associated with various weighted versions of the monotone and kmodal subsequence problems. Specifically, let W = {(Wl, W2 .... Wn) : Wi > 0 and WI +W2 +...+W, = 1} and for w E W and for 0 < k let U(w) denote the set of kunimodal subsequences of w. The problem is to determine the values
r(n, k) = min max E wi
WEWUEU(W) WiEU

To illustrate, we note that it is easy to show that r(n, 0) :5 nl/', since by considering a perturbation of uniform weights that have the same ordering which yields a longest monotone subsequence of length ru 1/21, we see ,r(n, 0) < n1 rnl/21.
One surely suspects that 7(n, 0)v/n  1 as n  oo, but this has not yet been established, though it might be easy. By similar considerations using Chung's theorem, one sees that limsup 7(n, 1)V4n :5 v13 while we expect that actually r(n, 1)+ v/"3.
Erd6s posed the problem of determining the largest integer f (n) such that any sequence of m = f (n) distinct real numbers X 1, X2,  . ., X, can be decomposed into n monotonic sequences. Hanani (1957) proved that

f(n) = n(n + 3)/2.

A question posed by Erd6s (1973) for which there seems to have been no progress is the following:
Given Xl, X2, ..., Xn distinct real numbers determine

max )71 Xi
m iEM

where the maximum is over all subsets of indeces 1 < il < i2 < ... < ik 5 n such that xi,, xi,, . . . 1 xi. is monotone.

Acknowledgement. I am pleased to thank D. Aldous, P. Diaconis and M. Talagrand for making available prepublication copies of their work. I also thank L. Lov~sz for comments that contributed to the discussion in Section nine, S. Janson for comments that contributed to Section six, and P. Winkler for comments that contributed to Section three. P. Erd6s provided

VARIATIONS ON THE MONOTONE SUBSEQUENCE 129

the reference to the work of Hanani, as well as the inspiration to pursue this survey.

REFERENCES

[1] D. ALDOUS, Reversible Markov chains and random walks on yraphs, Monograph Draft, version of January 25, 1993, Department of Statistics, University of California, Berkeley, CA (1993).
[2] D. ALDOUS AND P. DIACONIS, Hammersley's interacting particle process and longest increasing subsequences, Technical Report, Depaxtment of Statistics, U.C. Berkeley, Berkeley, CA (1993).
[3] N. ALON, Generating pseudorandom permutations and maximum flow algorithms, Information Processing Letters 35 (1990), pp. 201204.
[4] N. ALON, B. BOLLOBAS, G. BRIGHTWELL AND S. JANSON, Linear extensions of a random partial order, to appear in Ann. Appl. Prob. 3 (1993).
[5] R.M. BAER AND P. BROCK, Natural sorting over permutation spaces, Math. Comp. 22 (1968), pp. 385510.
[6] J. BECK, Randomness of nvr2modl and a Ramsey property of the hyperbola, ColloqgiaMathematicaSocietatisJAnos Bolyai 60 Sets, Graphs, and Numbers, Budapest, Hungary (1991).
[7] J. BECK AND W.W.L. CHEN, Irregularities of Distribution, Cambridge University Press, 1987.
[81 R. BbIAN AND H. FAURE, Discr6pance de la suite de van der Corptt, C.R. Acad. Sci. Paris S6r. A 285 (1977), pp. 313316.
[91 P. BLACKWELL, An alternative proof of a theorem of Erdds and Szekeres, Amer. Mathematical Monthly 78 (1971), p. 273.
[10] K.P. BOGART, C. GREENE, AND J.P. KUNG, The impact of the chain decomposition theorem on classical combinatorics, in The Dilworth Theorems: Selected Papers of Robert P. Dilworth (K.P. Bogart, R. Freese, and J.P. Kung, eds.), BirkhSuser Publishers, Boston (1990), pp. 1929.
[11] P. BOHL, C1ber ein in der Theorie der siktiaren St6runger vorkommendes Problem, J. Reine U. Angew. Math. 135 (1909), pp. 189283.
[12] B. BOLLOBAS AND G. BRIGHTWELL, The height of a random partial order: Concentration of measure, Ann. Appl. Prob. 2 (1992), pp. 10091018.
[13] B. BOLLOBkS AND P.M. WINKLER, The longest chain among random points in Euclidean space, Proc. Amer. Math. Soc. 103 (1988), pp. 347353.
[14] G. BRIGHTWELL, Models of Random Partial Orders in Surveys in Combinatorics, 1993, London Mathematical Society Lecture Notes 187 (K. Walker, ed.), (1993).
[151 J. CHERIYAN AND T. HAGERUP, A randomized maximumflow algorithm, Proceedings of the MEEFOCS 1989 (1989), pp. 118123.
[161 F.P.K. CHUNG, On Unimodal Subsequences, J. Combinatorial Theory (A) 29 (1980), pp. 267279.
[17] J.G. VAN DER CORPUT, Verteilungs funktionen, IJI, Nederl. Akad. Wetensch. Proc. 38 (1935), pp. 813821, 10581066.
[18] A. DEL JUNCO AND J.M. STEELE, Hammersley's Law for the van der Corput Sequence: An Instance of Probability Theory for Pseudorandom Numbers, Annals of Probability 7 (2) (1979), pp. 267275.
[191 R. DURRETT, Probability: Theory and examples, Wadsworth, Brooks/Cole, Pacific Grove, CA (1991).
[20] R.P. DILWORTH, A Decomposition Theorem for Partially Ordered Sets, Annals of Mathematics 51 (1950), pp. 161165.
[21] P. ERD6S, Ko CHAO, AND R. RADO, Intersection theorems for systems of finite sets, Quarterly J. Mathematics Oxford Series (2) 12 (1961), pp. 313320.

130 J. MICHAEL STEELE

[22] P. ERD6S, Unsolved problems in Graph Theory and Combinatorial Analysis, Combinatorial Mathematics and It's Applications (Proc. Conf. Oxford 1969), Acadernic Press, London (1971), pp. 97109. Also in Paul Erdds: The Art of Counting, (J. Spencer, ed.), MIT Press, Cambridge, MA (1973).
[23] P. ERD6S AND R. RADO, Combinatorial theorems on classifications of subsets of a given set, Proc. London Math Soc. 3 (2) (1952), pp. 417439.
[24] P. ERD6S AND Gy. SWKERES, A combinatorial problem in geometry, Compositio Math. 2 (1935), pp. 463470.
[25] A.M. FFaEzE, On the length of the longest monotone subsequence in a random permutation, Ann. Appl. Prob. 1 (1991), pp. 301~305.
[26] T. GALLAI AND N. MILGRAM, Verallgemeinerung eines graph entheoretisch en Satzes von Reide, Acta Sci. Math. 21 (1960), pp. 181186.
[27] J.P. GILBERT, AND F. MOSTELLER, Recognizing the maximum of a sequence, J. Amer. Statist. Soc. 61 (1966), pp. 228249.
[281 i.M. HAMMERSLEY, A few seedlings of research, Proc. 6th Berkeley Symp. Math. Stat. Prob., U. of California Press (1972), pp. 345394.
[29] J.M. HAMMERSLEY AND D.C. HANDSCOMB, Monte Carlo Methods, Methuen Publishers, London (1964).
[301 H. HANANI, On the number of monotonic subsequence, Bull Res. Council Israel, Sec. F (1957), pp. 1113.
[31] H. KESTEN, Uniform distribution mod 1, Annals of Math. 71 (1960), pp. 445471.
[32] J.F.C. KINGMAN, Subadditive ergodic theory, Ann. Prob. 1 (1973), pp. 883909.
[33] D. KNUTH, Permutations, matrices, and generalized Young tableaux, Pacific J. Mathematics 34 (1970), pp. 709727.
[34] W. KRASKIEWicZ, Reduced decompositions in hyperoctahedral groups, C.R. Acad. Sci. Paris 309 (1989), pp. 903907.
[35] V. LIFSCHITZ AND B. PITTEL, The number of increasing subsequences of the random permutation, J. Comb. Theory Ser. A 31 (1981), pp. 120.
[361 B.F. LOGAN AND L.A. SHEPP, A variational problem for Young tableaux, Advances in Mathematics 26 (1977), pp. 206~222.
[371 L. Lovksz, Combinatorial Problems and Exercises, NorthHolland Publishing Company, Amsterdam (1979).
[38] H. NIEDERREITER, QuasiMonte Carlo methods and pseudorandom numbers, Bull. Amer. Math. Soc. 84 (1978), pp. 9571041
[391 S. PILPEL, Descending subsequences of random permutations, J. Comb. Theory, Ser. A 53 (1) (1990), pp. 96116.
[40] S. SAMUELS AND J.M. STEELE, Optimal Sequential Selection of a Monotone Subsequence from a random sample, Ann. Prob. 9 (1981), 937947.
[41] C. SCHENSTED, Longest increasing and decreasing subsequences, Canadian J. Math. 13 (1961), pp. 179191.
[421 A. SEIDENBERG, A simple proof of a theorem of Erd6s and Szekeres, J. London Math. Soc. 34 (1959), p. 352.
[431 R.S. SILVERSTEIN, Estimation of a constant in the theory of three dimensional random orders, BS/MS Thesis in Mathematics (1988), Emory University.
[44] D.STANTON, AND D. WHITE, Constructive Combinatorics, SpringerVerlag, New York (1986).
[45] J.M. STEELE, Limit properties of random variables associated with a partial ordering of Rd, Ann. Prob. 5 (1977), pp. 395403.
[46] J.M. STEELE, Long unimodal subsequences: A problem of F.R.K. Chung, Discrete Mathematics 33 (1981), pp. 223225.
[47] J.M. STEELE, An EfronStein inequality for nonsymmetric statistics, Annals of Statistics 14 (1986), pp. 753758.
[48] M. TALAGRAND, An isoperimetric theorem on the cube and the KhintchinKahane inequalities, Proc. Amer. Math. Soc. 104 (1988), pp. 905909.
[49] M. TALAGRAND, A new isoperimetric inequality for product measure and the tails of sums of independent random variables, Geometric and Functional Analysis, 1 (2) (1991), pp. 211223.

VARIATIONS ON THE MONOTONE SUBSEQUENCE 131

[50) M. TALAGRAND, Concentration of measure and isoperimetric inequalities in a product space, Technical Report, Department of Mathematics, Ohio State University, Columbus, OH (1993).
[51] W.T. TROTTER, Combinatorics and Partially Ordered Sets: Dimension Theory, The Johns Hopkins University Press, Baltimore (1992), pp. xiv + 307.
[52] S.M. ULAM, Monte Carlo calculations in problems of mathematical physics, in Modem Mathematics for the Engineer (E.F. Beckenbach, ed.) McGrawHill, New York (1961).
[53] A.M. VER~IK AND S.V. KERov, Asymptotics of the Plancherel measure of the symmetric group and the limiting form of Young tableaux, DokL Akad. Nauk. SSSR 233 (1977), pp. 10241028.
[54] A.M. VER~II< AND S.V. KEROV, Asymptotics of the largest and the typical dimensions of irreducible representations of the symmetric group, Functional Analysis and Its Applications 19 (1985), pp. 2131.
[551 D.B. WEST, Extremal Problems in partially ordered sets, in Ordered Sets: Proceedings of the Banff Conference (I. Rival, eds.) Reidel, Dordrecht and Boston (1982), pp. 473521.