Please see PDF version

Seedlings in the Theory
of Shortest Paths

J. Michael Steele


This article explores three developments that arise from the fundamental theorem of Beardwood, Halton, and Hammersley on the asymptotic behavior of the shortest path through n random points. The first development concerns the role of martingales in the theory of shortest paths, especially their role in large deviation inequalities. The second development concerns the use of Lipschitz spacefilling curves to obtain analytical bounds in the theory of the TSP, and it provides some bounds that refine those of Bartholdi and Platzman on the worst case performance of the spacefilling heuristic for the TSP. The final topic addresses the relationship between Karp's partitioning heuristic and the BHH theorem.

1. Introduction

In 1959 Beardwood, Halton, and Hammersley established the following theorem:
If Xi, 1 < i < oc) are independent identically distributed random variables with bounded support in Rd, then the length Ln under the usual Euclidean metric of the shortest path through the points {Xl, X2,..., Xn} satisfies

n(dl)/d Ln` Cd fOR d f (X)(d1)1d dx almost surely. (1.1)

Here, f (x) is the density of the absolutely continuous part of the distribution of the Xi.
This result has proved fruitful in most of the ways that axe open to a mathematical discovery. In particular, it has lead to interesting applications, provoked useful generalizations and inspired new techniques of analysis. The intention of this article is to review and contribute to three developments associated with the Beardwood, Halton, Hammersley theorem.
The first development concerns the extent to which (1.1) can be complemented by large deviation results. This exploration leads us to consider

278 Steele

some basic results of large deviations for martingales, particularly Azuma's inequality, for which we give two proofs. While exploring the relationship of the TSP martingale theory, we also examine the demands it places on results like the square function inequality of Burkholder, bounds on Hermite moments, and related ideas. In the course of the review we give new proofs of two inequalities of Rhee and Talagrand, and we examine essentially all of the available information concerning the tail of the probability distribution of L,,.

We next address the use of spacefilling curves in the analytical theory of the TSP. Such techniques are relatively new, but their simplicity and generality suggests that their use will grow. The fact that underlies this development is the existence of measure preserving transformations from [0, 1] onto [0, 1]d that are Lipschitz of order 11d. A basic objective of Section 4 is to review the background of a problem of Platzman and Bartholdi on the ratio of the length of the tour provided by the spacefilling heuristic and the length of an optimal tour is bounded independently of n.

The third development concerns the role of (1.1) in Karp's polynomial time partitioning algorithm for the TSP. This topic is addressed briefly, but two results are reviewed that will make clear how one can show the effectiveness of Karp's algorithm without resort to the full force of (1.1). 

In the concluding section, we discuss some open problems and promising research directions. Finally there are two appendices that stand somewhat apart from our basic themes. The first of these gives S. Lalley's previously unpublished proof of the Beardwood, Halton, Hammersley Theorem in d = 2 for random variables with the uniform distribution on [0, 112. This proof uses minimal machinery and illustrates a technique that is applicable to many related problems. The second appendix develops an inequality for martingales that R.E.A.C. Paley introduced for Walsh functions. Paley's old argument is examined for the suggestions it provides about how one might pursue large deviation inequalities for L,, without paying the price of bounds on WO norms as demanded by Azuma's inequality.

2. Martingale Bounds for the TSP

For Xi, 1 < i < oc, independent and uniformly distributed in [0, 1]d, the length L,, of the shortest path through {X,, X2,, X,,} is a random variable that we can show to be tightly concentrated about its mean. In d = 2, for example, we know that Var L,, is bounded independently of n. This fact is proved in Steele (1981b) by means of the jackknife inequality of Efron and Stein (1981), but one can provide a proof that offers considerably more potential for further development by following Rlice and Talagrand (1987) and introducing martingale arguments.


Seedlings in the Theory of Shortest Paths 279

If Fk is the orfield generated by {Xl, X2,..., Xk} and

di = E(L,, 1 Fi)  E(Ln 1 Fi1), (2.1)

then di, 1 < i < n, is a sequence of martingale differences that satisfy n
L,,  ELn di. (2.2)

This wellknown representation is available for any integrable random variable, but there are features that make it particularly effective for L,,. The most central of these is that the di can be related to the change that takes place in Ln as one of the Xi is changed. In this respect, the analysis of Ln by means of martingale differences comes to rely on calculations that are quite close to those that made the jackknife inequality effective. By working out the details of the LP theory associated with the martingale representation, we are led to some of the basic themes of martingale theory.
For each 1 < i < n, let L(~) denote the length of the shortest path n
through Si {Xl, X2 .... ~ Xi17 Xi, Xj+1,..., Xn} where the random variables {ħi < i < n} are independent, uniformly distributed and also independent of the variables in the set S = {Xi : 1 < i < n}. Since E(Ln(') 1 Fi) = E(Ln 1 Fi1), we have the key observation that
di = E(L.  L(" 1 Fi). (2.3)

Since one can build a path through Si by following the minimal path through S and making a detour from Xj to ħi and back for some 54 i, we have
L(')  L. < 2 min iXi  Xjj. (2.4a)
n  JJ7b
By the same reasoning but starting from the optimal tour for Si we have
L,,  LnO) < 2 min 1Xi  Xj 1, (2.4b)

and, moreover, the right hand sides of (2.4a) and (2.4b) both have the same distribution. Next we note that simple geometric considerations as applied in Steele (1981b) give us a bound on the tail of these distributions:

P(Tin IXi  Xj I ~: t) < Ae nld, t > 0, (2.5)

where A = Ad and B = Bd are constants that depend on d (but not on n or t). From (2.3), (2.4a,b), (2.5), and Jensen's inequality, we therefore find for any p > 1 that

Eldi1P < 4PpdlA(nB) p/d r(pld). (2.6)

i i



Finally, in terms of LP norms, we find from Stirling's formula that
lidilip:5 Cl(pIn)11d



where Cl is a constant that depends only on d > 2.
Inequality (2.7) is a basic one for the theory of the traveling salesman problem. In particular, since the di are orthogonal random variables, we find that if we restrict attention top = 2 and d = 2, then (2.7) completes the proof of the rather surprising uniform bound Var L,, < 2A2B1 mentioned 2
For large p inequality (2.7) is not as effective as one would hope since from (2.4a,b) it is already immediate that the norms lidiii,,,, are bounded by 2P1. Still, by applying the argument used in (2.6) to the conditional
. In particular,

then from (2.3) we find

probabilities (2.3), we can get a sharper bound on the if we relax the bounds (2.4a,b) to

I.L(')  L,, 1 ~~_ 2 min IX^ j  Xj 1 + 2 min IXi  Xj 1,
n j:j>i j:j>i

idil<2E{MiniħiXji}+2E Min.iXiXj11Xi (2.8)
jj>i {jj>%

Using ildili,>~ < 2d1/2 to deal with i = n, we thus can find a constant C2 that depends only on d > 2, so for all 1 < i < n we have

Ildill. :5 C2(n  i + l)1/d.



The beauty *of (2.9) is that it permits us to use traditional martingale techniques to obtain reasonably sharp large deviation inequalities on L,, EL,,. To develop one such inequality we first note that for any y > 0,

exy < cosh y + x sinh y for all 1x 1 < 1, (2.10)

because (2.10) trivially holds for x E {1, 0, 1}, e'y is convex, and the right hand side is linear in x. If we now let x = dilildiil,,,, and y = tlidill~., we find for 1 < k < n that

exp (t k di) :5 k (coshtildili,,. + di(sinhtlidill.)
E fl Ildill.

Taking expectations and using the fact that the di aremartingale differences gives us

E exp, (t di :5 cosh(tlidiii.)
t2 k 2


Seedlings in the Theory of Shortest Paths 281

where in the last inequality we used the elementary bound coshx < e 2/2 .
From (2.11) and the fact that the right hand bound is an even function of
t, we find for all t > 0 that

P(1 k di 1\ :5 2eAt exp t2 k Ildi 112 (2.12)
so letting t A 1:k 1 Ildi 112 we conclude
i= 00 )

,\2 Ild,112
A 2 exp 2 (2.13)

This inequality is valid for any martingale difference sequence {di}, and it is due to Azurna (1967). When we apply (2.13) to our particular {di} satisfying (2.1) we find a theorem which was established in the case for d = 2 in Rhee and Talagrand (1987).

THEOREm 2.1. There is a constant C3 depending only on d such that for all n > 1 and A > 0 we have

P (IL,,  ELn 1 > X) 2exp(C3A1/logn) if d = 2 (2.14)
2 exp( C3,\2 n (2d)/d) if d > 3.

The technique used to obtain Azuma's inequality (2.13) is apparently quite crude, and one might hope to do better in several ways. One natural idea is to try to generalize (2.10) to

exy < xf (y) + g(y), 1xl < 1, y ~~ 0, (2.15)

for f and g that might be more effective than sinh and cosh. To see why this idea does not succeed, we let x =.:tl in (2.15) and add the two resulting inequalities. We find that (2.15) forces the bound coshy :5 g(y), and thus no inequality like (2.15) serves us any better than that used in the argument leading to (2.13).
A second seedling concerning Azuma's technique and the TSP comes from viewing (2.10) as a separation of variables for the bivariate function exy. A classical approach to such separation might call on the generating
function for Hermite polynomials:
G(x, y) = e 2XYY 2 1: Hn(XW (2.16)
n=0 n!




282 Steele

This approach has not been developed very far, but it seems rich enough to deserve a brief digression. Because of the basic orthogonality relation

H.(x)H.(x)e _,2dX = V177 2 ' n! 6 .. n,

it is not difficult to give a condition on the L2 norm of & (L,  EL,) that implies a large deviation inequality of Gaussian type. In fact it suffices to assume that
EHn2(Z) < Ann!

for some constant A.
Before closing this digression on separation of variables in e'y, we should note that (2.16) is closely related to (2.10); in particular from (2.16) we easily find expressions for sinhy and coshy in terms of odd and even Hermite polynomial (see e.g. Section 8.957 of Gradshteyn and Ryzhik 1963). Still, because of special properties of Hermite polynomials such as their recursion relation, one might expect some progress through Lemma 2.1.
Returning to the direct exploration of large deviation inequalities, we should note their easy application to moments. Thus, we multiply (2.13) by pAP1 and integrate over [0, oc) to find for p > 1 that

k p k lid,112 p/2
EEdi :52pF(p/2){21: or, in terms of norms,

k /2 k lld,112 1/2 (2.17)
di :5 C4P' E  1
lip (i=l

where C4 is a universal constant which does not even depend on d. When (2.17) is specialized to {di} satisfying (2.3), we find from (2.9) that for all n > 1
JIL.  EL,, lip :5 C5P1/2(logn)1/2 if d= 2, (2.18)
{ C5P 1/2n(d2)/(2d) if d > 3.

Inequality (2.8) can be obtained in another way that also provides an interesting proof of Azuma's inequality. The key idea comes from work of Jakubowski and Kwapiefi (1979), and, in our context, the main point is that if we let rk(s) be the kth Rademacher function (i.e. rk(s) sign(sin2k7rS), 0 < s < 1) then

n rk(s)dk (W)
f (W, 8) = J1 (1 + iidkil. (2.19)


Seedlings in the Theory of Shortest Paths 283

is a density function with respect to the product measure dsdP. The identities that make (2.19) effective are

n akdk(W) n
E ildkil. E akrk(s)f(w, s) ds (2.20a)
k=l 10 k=l

1= 1 f (w, s) dP. (2.20b)

The proof of (2.20a) just requires expanding (2.19) and using the fact that the Rademacher functions {rk(S)}1 To get our second proof of Azuma's inequality we first apply Jensen's inequality in (2.20a), integrate with respect to P, and change order of integration:

E exp t akdk ildkil. E (11 exp t akrk(S)
E ( E ) f(w, s) ds
k=l k=l

foo exp (t n akrk(S)
0 E ) d.,
k=l n
ri cosh(tak). (2.21)

In the second line of (2.21) we used (2.20b), and in the last we used the fact that the rk are Bernoulli random variables. If we now let ak = 11 dk 11 ,, in (2.21), we find that (2.21) reduces to the same bound as (2.11), so one can complete the proof of Azuma's inequality just as before.
The direct application of the JakubowskiKwapiefi representation (2.20) also provides a route to LP bounds on E', di. Letting ak = Ildj. in (2.20a) we have

dk ildkli.rk(s))f(w, s) ds, (2.22)
0 E
k=l k=l

so if we raise both sides tjo the pth power, apply Jensen's inequality on the right and then use (2.20b), we have
E 1 n dklp < fliE lldkil,,ork(S)IP ds. (2.23)


284 Steele

Since the {rk} are independent Bernoulli random variables, we can apply Khintchine's inequality (Chow and Teicher 1978 or Haagerup 1982) to obtain

E~ dkl' < Ildk 112 ) p/2 (2,24)
 2 E  
k=l k=l

Comparison of (2.24) with (2.17) shows that (2.24) is not an essential improvement. Still, the approach via the representations seem to be a bit better, at least it simplified tracking the constant. An intriguing feature of both approaches is the appearance of the sum of squares of the L,,, norms. Possibly this quantity is really rooted in the large deviation problem, but more likely, it is a coincidental artifact of the approaches. In the next section we systematically pursue the relationship of moments and large deviations in the context of the TSP. By introducing a few additional martingale tools, we can extract almost all of the information available on the tails of behavior of L,,.

3. Large Deviations and Moment Inequalities

We begin with a lemma that must be classical. It reminds us that the hunt for large deviation inequalities of Gaussian type can be conducted by pursuing appropriate LP bounds. The interest in this observation comes from the fact that for some variables closely connected with L. those bounds are easily proved.

LEMMA 3. 1. For any random variable Z, a necessary and sufficient condi
tion that
P(IZI > t) < Ae BO' t > 0, (3.1)

for some constants A > 0 and B > 0 is that for all p > 1

lIZI1P :5 Cp112 (3.2)

for some constant C.

PROOF: If (3.1) holds, we multiply byptP1 and integrate as in (2.6) to obtain (3.2). For the converse, we just note by (3.2) and Markov's inequality that
p(IZI > t) :5 Cppp12 = eP logC+(1/2)plogpplogt

so, choosing p such that logp = 2(log t  log C)  1, or p = t2C2e1, yields
(3.1) with A = 1 and B = (2C2e)1. E

i i i

i i i


Seedlings in the Theory of Shortest Paths 285

A central theme in the theory of martingales is that for any martingale difference sequence {Yi, 1 < i < n} the square function,

Sn n Y2 1/2
(3.3) and the maximal function,

Mn su .11:yi~,
1:5k< i=1

share many properties with the underlying martingale

Mk Yi, 1 < k < n.

In particular, the inequalities of Doob and Burkholder tell us, among other things, that if any one of S,,, M,*,, or M,, is in LP for some 1 < p < oc then all three are in LP. The comparability of the moments S,, and M,, is particularly interesting for the theory of the TSP in R d because, as we see in the next lemma, the LPnorm of S,, can be bounded with enough precision to yield powerful large deviation inequalities. In fact, for d = 2 the resulting LP bound is good enough to guarantee a large deviation inequality of Gaussian type.

LEMMA 3.2. For the TSP martingale summands di of (2.1), we have for even integers p ~: 2 and any set S C {l, 2,..., n} that
(Y d?) 11 :5 C1P11d1S1112 n l/d' (3.4)
iES p
where C, is the same constant as given in (2.7) and ISI is the cardinality ors.
PROOF: We first expand and apply the generalized H81der inequality:
E (1:d?)p 1: 1: 1: Edi21d 2 d?
1 i2 1p
iES ilESi2ES i,ES

(Ed?P)VP(Ed?P)11P ... (Ed?P)11P.
Z1 Z2 %P
ilESi2ES i,ES

Next, using the bound from (2.7), together with Ildi112p < C1(2p/n) 1/d or Ed 2P < C2P(2p/n)2p/d, we find i
E (E d?)p < 151pC2P(2p/n) 2p/d,

i i


and hence for even integers p we conclude

d?) C~pl/d1511/2 n 11d.


This bound is of particular interest for d = 2 and S = {l, 2,. n}1 since it is then of the form required in Lemma 3.1, i.e.

d2 :5 CP1/2.
1/211 (3.5)

Thus for d = 2 the square function associated with the TSP martingale differences of (2.1) satisfies a large deviation inequality of Gaussian type (3.1).
One hope raised by (3.4) and (3.5) is that of extracting a Gaussian type large deviation inequality for L,, from that available for the square function (1: di2)1/2 associated with L,,. To assess this possibility we first recall the square function inequalities of Burkholder (1966, 1973):
For 1 < p < oe and any sequence of martingale differences Yi with associated square function S,, defined by (3.3), we have
(18P1/2 q)'IIS.lip < IIEYill :5 18.q 112P11S.11P (3.6)
i=1 p
where 11p + 11q = 1.
To see how (3.6) relates to the inequalities considered earlier, we note that we always have

n p/2
ISnIP lidiII2.) (3.7)

so, in particular, the second inequality of (3.3) gives us a bound like (2.16) which expresses the LP version of Azuma's inequality. In this instance there is a critical difference in that the factor p111 is inflated to p. Since large deviation results depend on the LP inequalities for large p, this change in the constant is a major concern.
Still, when d = 2 we can use Lemma 3.2 to get good bounds on the tail probabilities of Ln  EL,,. We will give two illustrations of this approach. The first consists of showing that the moment generating function of Ln ELn can be bounded independently of n.
To begin we note that for ltdi 1 < 1, the Taylor expansion of log(l + tdi) gives us
n 00
IP + tdi) = exp (I: ( 1 Mipktk 1k) (3.8)
i=1 k=l

Seedlings in the Theory of Shortest Paths 287
where Ok = d k + d k ++ d k .
1 2 n We next note for k > 3 that
n n
lok 1 .5 1: Ildj Ilk < E Ck (n  j + 1)k/2
j=1 j=1


1 Ej3/2 = Ck
1((3/2) (3.9)

where ~(s) so from (3.8) and (3.9) we find

n 00
exp (131t 12 p2t2) :5 11(1 + tdi) exp ((3/2) Cktk 1k

After taking expectations, we see

(plt _ 102t2) :5 eXP Cktk
E exp 2 ((3/2) 1 1k Offil (3.10)

so writing exp(,81t) = eXp(olt 02t2) exp(P2 t2) and applying Schwarz's
inequality gives

Eexp(flit) :5 0(2t) 1/2 (E exp(2P2 t2))1/2. (3.11)

By (3.5) and Lemma 3.1 we know there is a constant A > 0 not depending on n such that P(O21/2 > t) :5 Aet2/A ; hence, we. have for Iti < A` that

E exp(t02) :5 1 (3.12)
1  At

and the bound (3.11) does not depend upon n.
The uniform bound on the moment generating function given by (3.11) and (3.12) naturally give a large deviation bound. For reference purposes we record the following consequence of (3.11) and (3.12) that was first obtained in Rhee and Talagrand (1988a) by different means.

PROPOSITION 3. 1. For d = 2, there is a constant C such that for all n > 2 and t > 0
P(IL,,  EL,,l ~~ t) < Cect. (3.13)

A stronger result than (3.13) can be obtained by the use of Burkholders inequality. In fact, the following theorem seems to be about as much as one can obtain without going beyond the information on the TSP that is incorporated in (2.7) and (2.9).



288 Steele

THEOREm 3. 1. For d = 2, there is a constant C such that for all n > 1 and p > 1,
JIL.  EL.11p :~ CP1/2(logp)1/2. (3.14)

PROOF: We rely on the martingale representation (2.3) and split the representing sum into two terms,

JIL.  ELn lip = ll E di ilil: dilip + ~iiF dill (3.15)
i=1 lip otn p

for any 0 < a < 1. To the first summand we apply (2.17), the LP version of Azuma's inequality, and to the second we apply Burkholder's second inequality to find

JIL.  ELnilp :f~ C4P1/2 lidi 112 1/2 + 18pq 1/2 ll (iF 2) 1,211
00 di
Kan >an p
(3.16) Now we apply (2.9) to the first sum and (3.4) to the second,

:5 C1C4P 1/2 (n  i + 1)1 + lspql/2C,P1/2(1 _ a)1/2

C6P1/2 (log 1/(1 _ a)) 1/2 + c6P3/2 q 1/2(1 1/2. (3.17)

When we let (1  a)1/2 = p', we find (3.14).

COROLLARY. There is a constant B such that for d 2 we have

^Ln  ELni ~~ t) :5 2e Bt2/log(l+t) (3.18)

for all t > 0.

The proof of (3.18) from (3.14) follows just as in Lemma 3.1. This time the proper choice of p is t2 /(Clog t) where C is the constant of (3.8).
Inequality (3.18) was also first established in Rhee and Talagrand (1988a). Their proof grew out of the idea of interpolating between the d = 2 case of (2.14) where the tails have quadratic exponential behavior that depends on n, and on (3.13), where the bound is independent of n but is linear exponential. Rhee and Talagrand (1988a) bring these two bounds together to prove (3.18) by use of interpolation results from Bergh and Lofstrom (1976). The present proof via (3.8) is simpler than that of

Seedlings in the Theory of Shortest Paths 289

Rhee and Talagrand, at least so far it relies on methods that are familiar to probabilists. Still, even now, the Burkholder inequalities might not be regarded as completely commonplace tools, and the proof of (3.18) is not yet elementary.
The quest that has been traced here, the derivation of a Gaussian type large deviation bound for L,,  EL,,, has very recently come to fruition through Rhee and Talagrand (1988b). By combining their basic martingale approach with a barehanded investigation of the geometry of an nsample from [0, 1]', they show that one can indeed remove the logarithmic factor from (3.18). The resulting inequality for the TSP in d = 2 stands as both the natural end to a line of investigation and as a hard challenge. What can one say for d > 2? What other functionals permit a comparable analysis?

4. Analytical Bounds from Spacefilling Curves

For many problems concerning combinatorial optimization in Rd one can obtain useful bounds by appealing to the existence of a map 0 from [0, 11 onto [0, Ild that is Lip a with a = lld, i.e. 10(8) ~ 0(t)l < C18  tilld for a constant c and all 0 < s < t < 1. Moreover, Milne (1980) established that one can further require 0 to be measure preserving, and from our perspective, the benefit of that fact is that it lets us use spacefilling curve techniques to get probabilistic inequalities, at least in the case of uniformly distributed random variables.
For our first example we again consider the traveling salesman problem in Rd, but this time we take the cost of travel from x to y to be 1x  y1P, the pth power of the Euclidean distance. If 5 = {Xl, X2,. . ., X,,} is a set of n points in [0, 1]d , how can we bound L(S), the length of the shortest tour through the points of 5 under this metric, i.e. how can we bound
L(S) = min ix,(i)  X,(i+l) JP (4.1)

where the minimum is over all cyclic permutations?
Since 0 is a surjection, each xi E S C: [0, lld has a preimage yi E [0, 11. If we choose a cyclic permutation or',so that y,(,) :5 Ya(2) :5 * * * < Ya(n) i then a heuristic tour of the {xi} can be formed by visiting them in the order of the {yi}. For this heuristic we find
E10(y,w)  O(Y.'(i+l)T
E 1 Ya (i)  Ya (i+ 1) JP/
< cPn(dP) 1d (4.2)


290 Steele
where we applied 1161der's inequality and the fact that E ly,(i)  Y,(i+l) 1 is
bounded by 1. The key idea of (4.2), i.e. building a path through {x, 1 <
i < n} by visiting the points in the linear ordering of the {yi, 1 < i < n}l
is called the spacefilling heuristic. For application to the TSP, this idea
was first proposed by Bartholdi and Platzman (1982) and independently
by D.H. Fremlin (see e.g. F~emlin 1982). Both for heuristic algorithms
and analytic bounds, the idea of using a spacefilling map to exploit the
linear ordering, or simple geometry, of [0, 11 has many natural applications,
and the breadth of these variations can be seen by consulting the survey
by Bartholdi and Platzman (1988), the papers by Glass (1985) and Imai
(1986), or the recent thesis by Bertsimas (1988).
For p 1, inequality (4.2) recaptures the familiar 0(n (dl)ld ) bound,
but for p d it provides new information by providing a 0(1) bound. In
contrast, one only obtains the weaker inequality
L(S) < clogn. (4.3)
by classical arguments that rest on the fact that any set of n points in
[0, 11d contains a pair within enl/d of each other.
The argument used for sharper bound (4.2) was also applied in Steele
(1988) to show that the sum of the dth powers of the lengths of the edges of
a minimal spanning tree of n points in [0, 11d can be bounded independent
of n. For d = 2 the uniform boundedness of the sum of squares of the
edge lengths had been established earlier by Gilbert and Pollak (1968), but
their delicate geometric argument has no natural analogue for d > 2. In
contrast, the bound provided by the spacefilling heuristic works pleasantly
in all d > 2. For the spacefilling heuristic applied to the TSP the most
AY interesting problems concern the ratio of the length of the tour produced
by the spacefilling curve to the length of the optimal tour. In R' Platz
man and Bartholdi (1988) provided a bound of order 0(logn), and they
conjectured that there is a uniform bound on the ratio. Bertsimas and
Grigni (1989) settled the conjecture by giving an example that shows the
ratio can be as bad as clog n. The following special case of work in Steele
(1989) complements the results of Platzman and Bartholdi (1988) in a way
that may be useful in algorithmic applications. The proof does not require
any detailed properties of the spacefilling curve in order to provide ratio
bounds, except that the curve is measure preserving and is as smooth as
THEOREm 4.1. Let 0 be a measure preserving transformation of [0, 11 onto
[0, 112 that is Lipschitz of order a = 1/2, i.e.
10(x)  OW1 < C1X  Y1 1/2 (4.4)
for some c and all X, y E [0, 11. If & is the length of the path through the
points {Xl,X2,..., X.} C [0,112 that is constructed using the spacefilling


i i i

Seedlings in the Theory of Shortest Paths 291
heuristic based on 0, then for n > 2

H. :5 Ln { 1 + 2C2 log(M/~) } + 7rC2M, (4.5a)

where L,, is the length of the optimal path through {xl ~ X2 5 Xn }, m is the length of the longest edge in the heuristic path, and 9 is the average length of the edges in the optimal path.

H. + ire' + 2c' log n) Ln. (4.5b)

PROOF: We suppose the heuristic tour visits the points in the order Xl, X2, .... Xn, i.e. we suppose there are ti E [0, 11 such that ti :5 t2 :5 :5 t, with xi 0(4). For A > 0 we introduce two basic subsets of {l, 2, n  1} by

U(A) = {i : 14+1  41 > A, 1 < i < n}

V(A) = {i : 10(ti+l)  0(ti) I > A, 1 < i < n}. For i E V(A) inequality (4.4) implies

Cit, _ t,+111/2 > 10(t,+1) _ 0(tjI > A

so i E V(A) implies i E U(,X2/C2), i.e.

V(,\) C U(C2,X2). (4.6)

If g(A) is the cardinality of V(A), we also have
H. f 'g(A) dA, (4.7)

where m = maxlFor i E U(A) the intervals [ti, ti + A] are nonintersecting, so if we set

Ai = 0([ti, ti + A])

then since 0 preserves measure, each Ai has Lebesgue measure A = M(Ai) and Ai n Aj has measure zero for any pair i 0 j, i, j E U (A).
We let D(x, C) C [0, 1]2 denote the set of all points within distance x of the curve C, and let T, be an optimal tour of {Xl, X21 ... i Xn} with length L.. By (4.4) and the fact that each xi is somewhere on the path, we have for each i E 5 that

(CA1/2, T
Ai C D (4.8)

292 Steele

Naiman's inequality on the volume of tubes (e.g. Naiman 1986, or the easier version of the basic result given in Johnstone and Siegmund 1989) tells us that for any rectifiable curve C of length L one has

p (D(x, C)) < 2xL + irX2, (4.9)

for all x > 0. If f (A) denotes the cardinality U(A) we then have by (4.8) and (4.9) that

Af (A) U Ai 1A (D (CA1/2,T'j) :5 2cA 1/2 L,, + 7rC2,\'
E U (A)

f (A) 2cA 1/2 L,, + 1rC2. (4.10)

By (4.6) and (4.10) we find our basic bound

g(A) :5 2C2 L,,A1 + 7rC2. (4.11)

For any 0 < a < m, we can apply the trivial bound g(A) :5 n  1 for A E [0, a] and apply (4.11) for A E [a, m]; so, when we integrate in (4.7), we find
& :5 a(n  1) + 2C2 L,, log(m/a) + 7rc, (m  a). (4.12)

Finally, since L,, :5 Hn :5 (n  1)m we have for a = L,,/(n  1) = ~~
that a E 10, m), so we can let a = E in (4.12) to find (4.5a). To see that
(4.5b) follows from (4.5a) we just invoke the very crude bound m < L,,
and 9 = L./(n  1). m

The argument used in the proof of Theorem 4.1 uses several ideas from Barthold! and Platzman (1988), and it makes progress mainly by being systematic in the exploitation of the bound (4.9).
The next section deals more directly with the geometry and topology of spacefilling curves.

5. Schoenbergls Map and Srnoother Maps

Section 4 made use of smooth spacefilling curves, but it did not provide concrete examples. This section engages the problem of constructing space~ filling curves, especially curves that are m smooth as possible and that preserve Lebesgue measure. It also points out a topological barrier to the sharpening of Theorem 4.1.
We begin by considering a method of Schoenberg (1938) that gives perhaps the shortest classical example of a continuous map from [0, 11 onto



Seedlings in the Theory of Shortest Paths 293

[0, 112. Schoenberg's map is not as smooth as we need, but it points the way to a map that is both simpler and smoother. We first define a real valued even function f of period 2 by taking f (t) = 0 in (0, 1/3), f (t) = 1 in (l/3, 1), and making f (t) linear in (l/3,2/3). We then define Schoenberg's spacefilling curve by the explicit formulas

1 1 2t) + 1 4t) +
X(t) 2 f (t) + F2 f (3 ~3 f (3 (5. 1 a)

1 3t) +
Y(t) 2 f (3t) + F2 f (3 F3f(35t)+ (5.1b)

To prove the map t (x(t), y(t)) is surjective, we first note that if {ak} is any infinite sequence of O's and 1's, then a typical point in the Cantor set C C [0, 11 can be written uniquely as

to = 2ao + ~al + ~a2 (5.2)
3 32 33

By straightforward, but tedious, bounds one can also show that f can be used to extract the kth term in the ternary expansion of to, specifically

f(3kto) = ak. (5.3)

Now, given any (xo, yo) E [0, 1]2, we can use the binary expansion of xo and yo together with the explicit formulas (5.1) and (5.2) to write down a point in C that 0 maps to (xo, yo), so 0 is a surjection of [0, 11 onto [0, 112.
One important aspect of the explicit formulas (5.1) and (5.2) is their computational feasibility. Not only do we know that for every point (xo, yo) of [0, 1]2 that there exists a point of [0, 11 that maps onto (xo, yo), but we can also quickly compute a point t E C such that Offl = (xo, yo).
Now we need to assess the smoothness of Schoenberg's map Offl
(x(t), y(t)). By uniform convergence, we see 0 is continuous on [0, 11. In fact, it is easy to show there is an a so that 0 is in Lip a, and we can even determine the best value of a. First, just consider x(t) and note that f satisfies the two naive bounds 1 f (s)  f (t) 1 :5 31 s  t 1 and 1 f (t) 1 < 1. Thus we have for any n > 1 that

n 00
lx(s)  x(t) 1 :5 3 1: 2 k 13 2k28 3 2k2ti +2 E 2 k, (5.4)
k=l k=n+l so for all integers n we have

lx(s)  x(t)i = 0(js  t1(9/2)" + 2`). (5.5)


i i i

294 Steele

Finally, by choosing n to be the integer nearest  (1092 1 S  t 1092 9, We find lx(s)  x(t)i = O(Is  tia) where a = (21092 3)1.
Having achieved an a for which f E Lip a, we will show that f
Lip a' for any a' > a by using some elementary facts about HausdoriT dimension. In fact, we use the result of Hausdorff (1919) that the dimension of the Cantor ternary set equals log 2/ log 3. If we let N(c) be the least number of intervals {Ii} of length 2c, 0 < c < 1, that cover the Cantor set C, then in terms of N(E), the fact that C has Hausdorff dimension 1/ 1092 3 tells us that for any 5 > 0 there are constants A and B such that AE01 > N(c) > Bc0+6, where fl = 1/ 1092 3.
Now suppose 0 is any map of C onto [0, 1]2, and suppose that ip is also Lip a'. If A denotes Lebesgue measure in R2, then since the compact set ~b(C) covers [0, 112, and since we have a collection of N(E) intervals {Ii} of length 2E that cover Q we have

\(9P(TJ) :5 N(E)ir(cE")2 = O(N(e)E2,'). (5.6)

From (5.6) and the arbitrariness of 6 > 0, we conclude that 0 > 2a', i.e. a < 1/(21092 3) for any Lip a map of the Cantor set onto [0, 112. We have thus established that Schoenberg's spacefilling curve is precisely of smoothness type Lip a with a = 1/(21092 3).
Although Schoenberg's mapping is a rich source of insight, one has to put in considerable modification in order to attain the maximal level of smoothness that one can have. Still the Lip 1/2 measure preserving property is shared by several of the classical spacefilling curves, particularly those due to Hilbert and Lebesgue. For a proof of these features of the classical curves as well as some remarkable analytical applications of spacefilling curves, one can consult Milne (1980). Also, to show one cannot find a map smoother than Lip 1/2 from.[0, 11 onto [0, 112, we just use the fact that the Hausdorff dimension of [0, 11 is 1 and repeat the argument given for the lower bound of smoothness for Schoenberg's map.
There is nothing more we need to say about the construction of smooth spacefilling curves, but there are some final issues concerning the spacefilling heuristic and the topology of [0, 1]2. The bound on the ratio H,,1L. that was given in Section 4 really relied on bounding the ratio H,*,1L,' where

it,+, _ t,11/2

and 0(ti) = xi, 1 < i < n. To see a subtlety in this process, we first recall that the dimension theorem of general topology tells us that there is no continuous bijection between [0, 11 and [0, 1]2 (see e.g. Dugundji 1970, p.

Seedlings in the Theory of Shortest Paths 295

359). Thus, every continuous surjection must have a double point. The investigation of multiple points was pursued further by P61ya (1913) who gave a spacefilling curve with multiple points with multiplicity bounded by three. This explicit line of investigation was completed by Hurewicz (1933) who showed that any surjection of [0, 11 onto [0, 112 must have a triple point. These facts can be used to show that bounding of H,*, can be slippery.
For example, suppose (x, y) is the triple point guaranteed by Hurewiez
and therefore suppose we have t, < t2 < 4 with 0(ti) = (x, y). Now, if
81,i < 82,i < 33j and si,j ~ ti as j  ~ oo for each i E f 1, 2, 3} we have that
L3 = L 0(82,j), 0(83j))  0 as j+ oc. On the other hand,

H3* 8,'j _ 8, + 1,j 1 1/ 2 > 1214  41 (5.8)
for all sufficiently large j. We thus have that H3* IL3 can be made arbitrarily large, and, at first blush, this fact might seem to cast doubt on (4.5a) or (4.5b). There is no contradiction between (5.8) and the earlier bounds, but (5.8) nicely shows that one cannot rely too heavily on,H,*, for a detailed understanding of H,,.

6. Karp's Partitioning Algorithm

The Euclidean traveling salesman problem is the task of computing the shortest path through a set of points in Rd. As a computational challenge, the TSP has become an essential test problem for combinatorial optimization, and, as one can see by considering the range of techniques in The 7'~aveling Salesman Problem: A Guided Tour of Combinatorial Optimization (Lawler, et al. 1985), the TSP has provided the inspiration for some of the most fundamental developments in the field.
One such development took place when Karp (1976, 1977) used the Beardwood, Halton, Hammersley theorem to show how a simple paxtitioning algorithm yields a solution to the TSP that is (1) computable in polynomial time and (2) asymptotically optimal in an appropriate probabilistic sense. In this section, we will review Karpis basic idea and make a point that deserves to be more widely known. The asymptotic optimality of Karp's algorithm can be obtained independently of the Beardwood, Halton, Hammersley theorem. In fact, we will see that one can justify Karp's algorithm with results that are considerably less refined than the BHH theorem.
The simplest version of Karp's algorithm addresses the case of the uniform distribution, and, for ease of exposition, we will keep to that case. Let Xi, 1 < i < oo, be independent random variables with the uniform distribution in [0, 11d , and suppose k.,, is a sequence of integers that grows


296 Steele

more slowly than nl/d Karp's method for obtaining a path through the points of 5 = {Xl, X2, .. , Xn } is as follows:
(1) Partition [0, 11d into kd congruent subcubes {Qj}. n
(2) For each j, 1 < j :5 kd, calculate an optimal path Pj through the n
points s n Qj.
(3) Join the endpoints of the Pj to form a heuristic path H through all
the points of 5.
This description is incomplete until we specify kn, provide a method for finding the paths Pj, and spell out how the Pj are joined to form H. As it happens, virtually any reasonable choices will suffice. For example, we can calculate the Pj by complete enumeration of the possible orders of visiting the points of Qi n s, or we can use dynamic programming. Either of these methods will be fast enough to yield a polynomial time algorithm if k,, is chosen appropriately. Thus, for the moment, our concern is just with the effectiveness of the algorithm.
Still, we need to pick a specific rule concerning the connection of the partial paths. Thus, for each 1 < i < k d' we label the two end points of the partial path of s n Qi by ai and bi, and we connect bi to ai+ l where the Qi have been ordered lexicographically according to the vertex within each square that is lexicographically minimal. With these procedures assumed, one can show the following:

THEOREM 6A. If LK denotes the length of the path produced by Karp's n
d method and if kn is any unbounded increasing sequence such that n/ kn oo, then for any c > 0 we have

00 P{~>1+6 E Ln

The proof of (6.1) follows from the next two lemmas. The first guarantees that under the hypotheses on {Xi} that L,, cannot be too short.

LEMMA 6.1. There exist constants A > 0 and 0 < p < 1 such that for all n > 1, we have
P {Ln < An (dl)/d} :5 pn. (6.2)

The proof of (6.2) is easily achieved by dividing [0, 11d into n subcubes of volume l/n, applying standard occupancy results, and a little geometry. The second lemma is more challenging.

LEMMA 6.2. There is an r, depending only on n and kn such that for all n,
L,, < L K < L,, + rn (6.3)

Seedlings in the Theory of Shortest Paths 297
rn S C n (d2)/(d 1) kll(d 1) + k d1 (6.4)
n n
where the constant c depends on d and the sequence {kn}.

Remark. When kn = o(nl/d) then (6.4) says that r, = o(n (dl)ld). Since this result holds everywhere, (6.4) and (6.2) yield (6.1).
It is relatively easy to sketch a proof of (6.4). Let {Fi} be the set of faces of the k d subcubes {Qj}. We will use the optimal path P and some
additional edges in order to bound LK. If e is an edge of the optimal path,
P, we associate with e a set of points that we will call pierce points. If c is interior to some Qj, then the set of pierce points created by e is just the null set. On the other hand, if e = (a, b) where a E Q and b E Q' and Q and Q' are distinct subcubes, then e will create a set of two pierce points. In particular, if F and F' are the faces of Q and Q' that intersect the line from a to b, then p = e n F and p' = e n F' are called the pierce points associated with a and b, respectively.
We will now build a set of points that may have rather large cardinality, but that can be proved to lie on a relatively short path. First, note the set of pierce points has cardinality bounded by n since each Xi is associated with at most one pierce point. Next, to each face F of each cube Q, we associate a set SF consisting of (1) its 2 d1 subfaces of dimension zero (i.e. its vertices) and (2) its set of pierce points. For each F the set SF is contained in a d  1 dimensional cube of edge length k1, so, by classical n
bounds (e.g. Few 1955), there is a tour through the points of SF of length bounded by ck1 ISFJ(dl)l(d1),
n where 18F1 denotes the cardinality of SF.
Now consider the union of all of the tours through SF for all F together
with the optimal path P. This set of edges has the property that for each
j it contains a path that is contained in Qj and goes through all the points
of 5 that are contained in Qj.
We finally see that L K can be bounded by three terms: (1) the length n
of the edges in the optimal tour, (2) the sum of the edges needed to tour SF for all F, and (3) the cost of the edges required in Step 3 of Karp's heuristic. We thus see that

L K < Ln + CE k1 ISFI(d2)1(,11) + ek dl. (6.5)
n n n

The bound on r,, given in (6.4) now follows from (6.5) by H61der's inequality, the fact that the sum of the ISFI is 0(n), and the fact that there are 0(k d) faces of the cubes {Qi}.
To some extent, the preceding sketch follows the lines of Halton and Terada (1982) which one can consult for additional details. From the present perspective, the main point of interest is that one requires so little


298 Steele

probability theory. All one needs is the elementary occupancy theory in
Lemma 6.1.

7. Concluding Remarks

In 1959 the work of Beardwood, Halton and Hammersley was a singular event in the sense that prior to that date and for many years subsequent one finds no comparable work relating probability theory and combinatorial optimization. The power and beauty of the Beardwood, Halton, Hammersley theorem were immediately present, but considerable time needed to elapse before wide appreciation was possible. The key step in the process toward that appreciation is the work of Karp (1976). By connecting the asymptotic result of Beardwood, Halton, and Hammersley with the possibility of effective algorithms, Karp created an eager audience for both the original work and for results that complement it. In Karp and Steele (1985) and the recent thesis of Bertsimas (1988), one can find a review of that development. This article also provides a review, but here the focus is narrowed to the developing roles of martingale theory and of spacefilling curves.
The field of martingale inequalities is so rich that the applications in Sections 2 and 3 only offer a hint of future possibilities. Connections between martingale theory and problems like the TSP can be counted on to develop vigorously in the next few years.
Among the concrete problems that may, or may not, be attacked via martingales, the one that stands out most concerns the completion of our understanding of the tails of L,,  EL,, in d = 2. More broadly we would like to understand the ways in which L,,  EL,, behaves like a Gaussian random variable. In particular, we would like to know if L,,  EL,, converges in distribution to a Gaussian limit.
The force behind applications of the spacefilling heuristic is not as great as that behind maxtingale theory, but one can still expect vigorous activity. The strong interest in the geometry of fractals provides one motivation, but the fact that the heuristic is easily coded also helps. Even though the conjecture of Bartholdi and Platzman is formally settled by the example of Bertsimas and Grigni (1989), many questions remain. As suggested in Section 5, one can expect some more negative results. Nevertheless, one may be able to provide further positive results like Theorem 4.1 that are of use in practical problems.
John Hammersley coined the inviting phrase 'seedlings of research', and throughout his work one finds a generous willingness to reveal interesting ideas that still have room to grow. The intention of this article has been to try to live up to that tradition while engaging the shortest path through many points.

Seedlings in the Theory of Shortest Paths 299


I wish to thank S. Lalley for permitting me to include his proof of the critical case of the Beardwood, Halton, Hammersley theorem. 1 also thank
I. Karatzas and L. Platzman for their comments on parts of this article,
and I thank D. Bertsimas, D.H. Fremlin, I. Johnstone, WS. Rhee, and D.
Siegmund for making available their unpublished work. This research was
supported in part by National Science Foundation Grant # DMS8812868.


Azuma, K. (1967). Weighted sums of certain dependent random variables. T6hoku Mathematical Journal 19, 359367.
Bartholdi, J3. and Platzman, L.K. (1982). An 0(n log n) planar travelling salesman heuristic based on spacefilling curves. Operations Research Letters I (4), 121125.
 (1988). Heuristics based on spacefilling curves for combinatorial problems in Euclidean space. Management Science 34, 291305.
v Beardwood, 1, Halton, J.H., and Hammersley, J. (1959). The shortest path through many points. Proceedings of the Cambridge Philosophical Society 55, 299327.
Bergh, J. and Lofstrom, J. (1976). Interpolation Spaces: An Introduction. SpringerVerlag, New York.
Bertsimas, D. (1988). Probabilistic Combinatorial Optimization Problems. Ph. D. Thesis, Operationg Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Bertsimas, D. and Grigni, M. (1989). On the spacefilling curve heuristic for the Euclidean traveling salesman problem. Technical Report, Operations Research Center, Massachusetts Institute of Technology.
Bially, T. (1969). Spacefilling curves: their generation and their application to bandwidth reduction. WEE 7Vansactions on Information Theory 1115, 658664.
Burkholder, D.L. (1966). Martingale transforms. Annals of Mathematical Statistics 37, 14941504.
 (1973). Distribution function inequalities for martingales: The 1971 Wald Memorial Lectures. Annals of Probability 1, 1942.
Chow, Y.S. and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martingales, SpringerVerlag, New York.
Dugundji, J. (1970). Topology. Allyn and Bacon, Boston.
Efron, B. and Stein, C. (1981). The Jackknife estimate of variance. Annals of Statistics 9, 586596.
Few, L. (1955). The shortest path and the shortest road through n points in a region. Mathematika 2, 141144.
Fremlin, D.H. (1982). On the traveling salesman problem. University of Essex, Note of 6 May 1982, unpublished.

300 Steele

Gilbert, E.N. and Pollak, H.O. (1968). Random minimal trees. SIAM Journal of Applied Mathematics 16, 376387.
Glass, M. (1985). The application of spacefilling curves to some VLSI design problems. Unpublished manuscript, Fermi National Accelerator Laboratory, Batavia, Illinois.
Gradshteyn, I.S. and Ryzhik, I.M. (1963), Tables ofIntegrals, Sums, Series and Products. 4th edn., Fiznatgizdat, Moscow.
Haagerup, U. (1982). The best constants in the Khintchine inequality. Studia Mathematica 52, 231283.
Halton, J.11. and Terada, R. (1982). A fast algorithm for the Euclidean traveling salesman problem, optimal with probability one. SIAM Journal of Computing 11, 2846.
Hausdorff, F. (1919). Dimension und SuBeres MaT Mathematische Annalen 79, 157179.
Hurewicz, W. (1933). Cber dimensionser h6hende steitige Abbildungen. Journal ffir Reine und Angewandte Mathematik 169, 7178.
Imai, H. (1986). Worstcase analysis for planar matching and tour heuristics with bucketing techniques and spacefilling curves. Journal of the Operations Research Society of Japan 29, 4367.
Jakubowski, J. and Kwapiefi, S. (1979). On multiplicative systems of functions. Bulletin de L'Acad6mie Polonaise des Sciences 27, 689694.
Johnstone, I. and Siegmund, D. (1989). On Hotelling's formula for the volume of tubes and Naiman's inequality. Annals of Statistics 16, to appear.
Kakutani, S. (1986). Collected Works. Ed. R.R. Kallman, Birkhduser, Boston.
Karp, R.M. (1976). The probabilistic analysis of some new combinatorial search algorithms. In Algorithms and Complexity: New Directions and Recent Results, ed. J. T~aub, Academic Press, New York, 119.
 (1977). Probabilistic analysis of partitioning algorithms for the traveling salesman problem in the plane. Mathematics of Operations Research 2, 209224.
Karp, R.M. and Steele, J.M. (1985). Probabilistic analysis of heuristics. In The 71aveling Salesman Problem: A Guided Tour of Combinatorial Optimization, eds. E.L. Lawler et al., John Wiley and Sons, New York, 181206.
Lalley, S. (1984). Personal communication.
Lawler, E.L., et al. (1985). The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Editors, John Wiley and Sons, New York.
Milne, S.C. (1980). Peano. curves and smoothness of functions. Advances in Mathematics 35, 129157.
Naiman, D.Q. (1986). Conservative c onfidence bounds in curvilinear regression. Annals of Statistics 14, 896906.
Paley, R.E.A.C. (1932). A remarkable series of orthogonal functions (1). Proceedings of the London Mathematical Society 34, 241264.
Platzman, L.K. and Bartholdi, J.J. (1988). Spacefilling curves and the planar traveling salesman problem. Journal of the Association of Computing Machinery, to appear.
P61ya, G. (1913). Cber eine Peanosche Kurve. Bulletin de FAcaddmie de Sciences, Cracovie A 305313 (also George P61yals Collected Works 3, 18,


Seedlings in the Theory of Shortest Paths 301

eds. J. Hersch and G.C. Rota, MIT Press, Cambridge, Massachusetts).
Rhee, W.T. and Talagrand, M. (1987). Martingale inequalities and NPcomplete problems. Mathematics of Operations Research 12, 177181.
 (1988a). Martingale inequalities, interpolation, and NPcomplete problems. Mathematics of Operations Research, to appear.
 (1988b). A sharp deviation for the stochastic traveling salesman problem. fl v Annals ofProbability, to appear.
Schoenberg, I.J. (1938). On the Peano curve of Lebesgue. Bulletin of the American Mathematical Society 44, 519.
Steele, J.M. (1981a). Subadditive Euclidean functionals and nonlinear growth in geometric probability. Annals of Probability 9, 365376.
 (1981b). Complete convergence of short paths and Karp's algorithm for the TSP. Mathematics of Operations Research 6, 374378.
(1988). Probabilistic and worst case analyses of some classical problems of combinatorial optimization in Euclidean space. Technical Report, Program in Statistics and Operations Research, Princeton University.
 (1989). Efficacy of spacefilling heuristics in Euclidean combinatorial optimization. Technical Report, Program in Statistics and Operations Research, Princeton University.

Appendix I. Lalley's Proof of BHH

S. Lalley (1984) provided a remarkable proof of the most interesting case
of the Beardwood, Halton, Hammersley theorem. Lalley's previously unpublished proof wins the prize for using minimal machinery. Moreover, his proof serves as a model of the power of similarity arguments and provides a line of attack that is applicable to many other functionals. Let U,, U2.... be independent random variables each having the uni
form distribution on [0, 1]2, and let L. be the length of the shortest path
through U,, U2,, U.. Observe that L,, is nondecreasing in n. We are
to prove that nl/'L,, ~ C a.s. for a constant C E (0, oo). For this it
suffices to prove that if N(t), t > 0, is a Poisson process with rate 1, then
as t+ oc we have 1

t112LNffl , C almost surely. (AI. 1)

Partition the square [0, 112 into squares Q1, Q2,, Q.~ of side m1,
and define A(Qj) to be the length of the shortest path through 1Uj, U2, t
UN (t) } n Qi. It is easy to see t hat for each t > 0 and each m = 1, 2,
the random variables \m(Q1),,Xn(Q2) \(Qn2) are independent and
t t t
identically distributed. Moreover, mA'n t(Qi) has the same distribution as in2 LNffl. Finally, we note Var(LN(t)) < oe for each t > 0 as one can see from the trivial bound LNffl < 21/2Nffl.

302 Steele

LEMMA 1. For each t > 0 and each m 2,..., we have

M2 M2

6m + A' (Qi) :5 LN(t) :5 M1/5 + E A(Qi). (A1.2)
t t

PROOF: To prove the right inequality we only need to obtain a path through U,, U2,, UN(t). For 1 < i < M2 we first find the shortest path through {U,, U2,, UN(t)} n Qi then knit these m 2 paths together by joining endpoints in adjacent squares ordered in snake raster order. Since points in adjacent squares are not separated by a distance greater than
  1   ` A
1/5m the resulting path has length no greater than m"(5+zi=l t(Qi), establishing the right hand inequality. One should note that it does not hurt this bound if some of the sets {ui, u2,..., u,.,} n Qi are empty.
To prove the left hand inequality consider the shortest path Y through U1, U2,, UN(t). If the two endpoints of y do not lie in Ui 0Qi, extend the path y so that the endpoints of the extended path ~ lie in Ui aQi; this can be done in such a way that the length of ;y is bounded by 1fl + 2/m, where Ifl denotes the length of 1. Fix a square Qi. The intersection Qi n,;y
consists of a finite number of paths 71, 72 Yk in Qi each having its end
points on aQi. Clearly, each point in {U,, U2,, UN(t)} n Qi lies in Uj yj.
The paths yl, y2 yk may be joined together by cutting and pasting and
adding arcs )31 3k1 on aQi in such a way that no point in aQi lies
on more than one of Consequently, ANQj) 5 1yjI + 4/m.
Summing over i m 2 yields the left hand inequality.

LEMMA 2. For each t > 0,

m 2
lim M1 AMt(Qj) = ELNffl almost surely.

PROOF: This does not quite follow from the strong law of large numbers. But, since for each m the random variables mA,t(Q1),...,rn,\M
m M2t(Qm~)
are i.i.d. with the same distribution as LN(t), we have by Chebyshev's
inequality that

m  ELN(t) > < M262

The assertion therefore follows from the BorelCantelli lemma.

Seedlings in the Theory of Shortest Paths 303
LEMMA 3. There exists C E (0, oo) such that
. ELN(t) C.
11M ~1/2 =
t 00

PROOF: Take expectations in (A1.2) and use the fact that E(MA,' m t (Qi))
ELN(t) to obtain

6 + ELN(t) < ELN(M2t) < v/5~ + ELNffl m

It follows that for any c > 0 there exists t sufficiently large that

ELN( .. 2t) ELNW < E
Mti/2 tl/2 1

for all m = 1, 2,... . Since L,, is nondecreasing in n, this implies that

E (' ) E < liminf ELN(s)
tl/2  800 8112

< lim sup ELNW
800 81 2
tl/2 + E.

Since c > 0 is arbitrary, it follows that ELN(s)18 1/2 C as s+ oc) for some 0 < C < oc. To prove that C > 0, note that ELNffl+ w as t  oo, by an elementary argument. Choose t sufficiently large that ELN(t) > 4; then (AI.4) implies that

lim inf ELN( .. 2t) > 0.
M00 m

The proof of (AI.1) may now be completed. By (AI.2), for each t > 0 and

m LN(.2t)
6 + m' E,\m~.t(Qi) < 

VS + m' AM t (QJ,


so Lemma 2 implies that almost surely

6t 1/2 + ELNffl < lim inf LN(m2t)
tl/2  M00 Mt1/2

:5 lim sup LN(m2t)
MW Mt1/2

 t1/2 + ELN(t)
< v5 tl/2

Now Lemma 3 implies that if t is sufficiently large then almost surely

C  c < liminf LN(m2t)
M00 Mt1/2

< lim sup LN(M2t) M00 Mt1/2 < C+E.

Since L,, is nondecreasing in n it follows that

C  e < limin' LNW
._W /2

< lim su LNW
800 81/2
< C + E almost surely. Now (AI.1) follows by letting E  0.

Appendix II. Paley's Square Function Argument

This Appendix develops an argument for martingales that was introduced in Paley (1932) for Walsh functions. The only real changes made here to Paley's method axe those required to provide explicit bounds on the basic constant. As one should expect, the constant is not as sharp as that given in Burkholder (1973), but the reason for reviewing Paley's argument is rather to show how the maximal function can be used to bound LP norms of martingales. Other features of the proof are discussed at the end of the appendix.
Consider a martingale difference sequence {yi : 0 5 i < n} with yo =_ 0 and its associated martingale Mk = Y1 + Y2 + "' + Yk, 1 < k < n. To keep to the essentials, we will stick to the case of even integers p. We first

Seedlings in the Theory of Shortest Paths 305

compute the difference sequence of pth moments:

E{Mkv+1  Mkp} = E{(Mk + Yk+l)P  Mk"}

= E PYk+lMP1 + (P)y2+1Mkp2 ++ y p
2 k k+l}

= E (P) Y2+1 Mkp2 + (p 3 Mkp3 + ...
2 k 3) Yk+1 + Ykp+1


where only in the last inequality is the martingale property invoked. We then use H81der's inequality on the right hand side to bring the powers of Mk up to the same level. Specifically, for 3 < j < p  1 we use

Eyj . Y2+1MkP2)0
k+,MkP3 < (E (EyP )10 (AII.2)
k k+l

where 0 = (p  j)l(p  2). Since 0 < 0 < 1, inequality (AIL2) can be relaxed to
Eyj 'i < E Y2 Mf2 + Eykp+11 (AII.3)
k + 1 M  k+l
so we can crudely bound the sum of the binomial coefficients to find

2 1MP2
JE{Mf+1  Mkp} I < 2P { Eyk+ + Eyfk+1 (AII.4)

Finally, we sum over 0 < k < n to find

n n1
EMP:5 2PE y 2 max MP  2 + 2PE
M k ) 1 k=l k=0
Y2y/2}2/P 2)/P n Y2 p/2
2P {E( n (E 1max M.P) (p + 2PE k
E k k=l k=l

where in the first summand we used H61der's inequality, and in the second summand we used the elementary real variable inequality for p ~~ 2

ap + 2 2 2)p12.
1 aP2+...+aPn:5(al+a2+...+an

Our motivation for moving from (AII.4) to (AII.5) is to use Doob's maximal inequality, or rather its consequence for 1 < p < 00 that

E (,max 1 Mk JP) }'/p :5 q(EMP) V', (AIL6)




where q is the conjugate index to p (i.e. q = pl(p  1)). From (AII.6) we thus find

< q2P ~l( n 2) 1/2[ 11M.11 p2 + 2p 2 1/2 P. (AII.7)
11M.11pp E Yk p Yk
k=l p
k=l p

Finally, we note inequality (AII.7) is of the form xP < ay2Xp2 + byP which implies x < {(2a) 1/2 + (2b)1/P}y, so we find our modest version of Burkholder's inequality for even integers p:

n ) 11/21k
llMnlip :~ 01p 1~ (E Yk
k=l p

where ap :5 q 1/2 2(P+ 1)/2 +2 (p+l)/2 < q 1/22 (p+3)/2.



The constant ap is larger than the 18q 1/2p we know to be sufficient, so some comment seems needed to justify our enthusiasm for this morethanfifty year old argument. First, it uses very little about martingales; e.g. in (AIL1) we use a weak consequence of the definition, and the only other fact we need is a maximal inequality of Doob's type as given in (AII.6). Second, the differencing applied. to pth powers in (AIM) can be applied to other functions f of Mk, provided that f (Mk + yk)  f (Mk) can be bounded by a useful expression. Finally, since the argument is free of stopping times, its parts are amenable to more individual attention. In particular, the use of bounds on ilyillp, 1 < p < oc, can be tried out in AIL5, AIL6, or AII.7.

Added in Proof.. The idea of using a spacefilling curve to sequence visits to points in the square is evidently much older than recent references seem to indicate. From the comments of R. Adler in the Collected Works of 8. Kakutani (Kakutani 1986, V.II, p. 445), Kakutani had presented the idea as early as the spring of 1966.

Program in Statistics and Operations Research School of Engineering and Applied Science Princeton University Princeton New Jersey 08544.