
The Annals of Probability
1988, Vol. 16, No. 4, 1767-1787



Princeton University

Let $X_i$, $1 \le i < \infty$, denote independent random variables with values in $\mathbb{R}^d$, $d \ge 2$, and let $M_n$ denote the cost of a minimal spanning tree of a complete graph with vertex set $\{X_1, X_2, \ldots, X_n\}$, where the cost of an edge $(X_i, X_j)$ is given by $\psi(|X_i - X_j|)$. Here $|X_i - X_j|$ denotes the Euclidean distance between $X_i$ and $X_j$ and $\psi$ is a monotone function. For bounded random variables and $0 < \alpha < d$, it is proved that as $n \to \infty$ one has $M_n \sim c(\alpha, d)\, n^{(d-\alpha)/d} \int_{\mathbb{R}^d} f(x)^{(d-\alpha)/d}\,dx$ with probability 1, provided $\psi(x) \sim x^\alpha$ as $x \to 0$. Here $f(x)$ is the density of the absolutely continuous part of the distribution of the $\{X_i\}$.

1. Introduction. The main issue pursued here is the development of the probability theory for the minimal spanning tree of $n$ independent multivariate observations. For $x_i \in \mathbb{R}^d$, $1 \le i \le n$, we will be concerned with graphs $G$ which have vertex set $V = \{x_1, x_2, \ldots, x_n\}$ and edge set $E = \{(x_i, x_j)\colon 1 \le i < j \le n\}$. Here the length of an edge $e = (x_i, x_j) \in E$ will be denoted by $|e|$, where $|e| = |x_i - x_j|$ equals the Euclidean distance from $x_i$ to $x_j$.
The functional of interest is $M(x_1, x_2, \ldots, x_n)$, the weight of the minimal spanning tree of $V = \{x_1, x_2, \ldots, x_n\}$, where the weight assigned to edge $e$ is $\psi(|e|)$. More precisely, we focus on

(1.1)    $M(x_1, x_2, \ldots, x_n) = \min_T \sum_{e \in T} \psi(|e|),$

where the minimum is over all connected graphs $T$ with vertex set $V$. The weighting function $\psi\colon [0, \infty) \to [0, \infty)$ which is of most interest to us is $\psi(x) = x^\alpha$, where $0 < \alpha < d$. More general $\psi$ are still of interest and some of the analysis which follows assumes only that $\psi \ge 0$. When additional properties of $\psi$ are required, those properties are made explicit.
Any tree $T$ which attains the minimum in (1.1) will be called a minimal spanning tree (MST) for $V$. In almost all of the situations considered here the minimal spanning tree is unique; but, to avoid consideration of special cases, we never rely on that uniqueness.
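As a concrete illustration of the functional in (1.1), the following Python sketch (ours, not part of the original paper) computes $M(x_1, \ldots, x_n)$ for $\psi(x) = x^\alpha$ with Prim's algorithm; since a strictly increasing $\psi$ does not change the ordering of edge lengths, the minimizing tree is the ordinary Euclidean MST and one simply sums $\psi(|e|)$ over its edges.

```python
import math

def mst_weight(pts, alpha=1.0):
    """Weight of the minimal spanning tree of pts under psi(x) = x**alpha.

    Because x**alpha is increasing, the optimal tree is the ordinary
    Euclidean MST, so we grow it with Prim's algorithm (O(n^2)) and
    accumulate |e|**alpha over its edges.
    """
    n = len(pts)
    if n < 2:
        return 0.0
    best = [math.inf] * n          # best[i] = Euclidean distance from i to the tree
    best[0] = 0.0
    in_tree = [False] * n
    total = 0.0
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step:                   # the first vertex contributes no edge
            total += best[u] ** alpha
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(pts[u], pts[v]))
    return total
```

For the three corners $(0,0)$, $(1,0)$, $(0,1)$ the tree uses the two unit edges, so the weight is $2$ for any $\alpha$.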
Subsequent sections will develop several results which pertain to finite values of n, but the main result is the limit theorem:

THEOREM 1. Suppose $X_i$, $1 \le i < \infty$, are independent random variables with distribution $\mu$ having compact support in $\mathbb{R}^d$, $d \ge 2$. If the monotone

Received September 1986; revised December 1987.
Research supported in part by National Science Foundation Grant DMS-8414069.
AMS 1980 subject classifications. Primary 60F15; secondary 05C05.
Key words and phrases. Minimal spanning trees, subadditive processes.


1768 J. M. STEELE

function $\psi$ satisfies $\psi(x) \sim x^\alpha$ as $x \to 0$ for some $0 < \alpha < d$, then with probability 1

$\lim_{n \to \infty} n^{-(d-\alpha)/d} M(X_1, X_2, \ldots, X_n) = c(\alpha, d) \int_{\mathbb{R}^d} f(x)^{(d-\alpha)/d}\,dx.$

Here $f$ denotes the density of the absolutely continuous part of $\mu$ and $c(\alpha, d)$ denotes a strictly positive constant which depends only on the power $\alpha$ and the dimension $d$.

For the case of ordinary length, $\psi(x) = x$, Beardwood, Halton and Hammersley (1959) state that the preceding result follows from modifications of their analysis of the traveling salesman problem. The proof of Theorem 1 which is given here uses several tools which were not applied in Beardwood, Halton and Hammersley (1959). Perhaps largely for this reason, the proof given here for the general result is more direct and less taxing than the original analysis of Beardwood, Halton and Hammersley (1959) of the asymptotics of the traveling salesman problem. Still, many of the insights provided in Beardwood, Halton and Hammersley (1959) have helped guide this analysis of the minimal spanning tree.
One feature of Theorem 1 that should be noted is that if $\mu$ has bounded support and $\mu$ is singular with respect to Lebesgue measure, then we have with probability 1 that $M(X_1, X_2, \ldots, X_n) = o(n^{(d-\alpha)/d})$. Part of the appeal of this observation is the indication that the length of the minimal spanning tree is a measure of the dimension of the support of a distribution. This suggests that the asymptotic behavior of the minimal spanning tree might be a useful adjunct to the concept of dimension in the modeling applications and analysis of fractals; see, e.g., Mandelbrot (1977).
Theorem 1 is closely related to the theory of subadditive Euclidean functionals [Steele (1981a)], but there are some essential differences. One issue is that $M_n = M(X_1, X_2, \ldots, X_n)$ is not an almost surely increasing sequence of random variables. This fact forces subtleties on $M_n$ which are absent in the study of the traveling salesman problem, the Steiner tree problem and other monotone Euclidean functionals.
The technique applied to prove Theorem 1 was also useful in studying optimal triangulations and the directed traveling salesman problem [Steele (1982, 1986)]. Still, the minimal spanning tree function has its own unique features and, apparently, one cannot avoid developing some results which are special to the geometry of the minimal spanning tree. Many of the special features of the deterministic functional $M(x_1, x_2, \ldots, x_n)$ are expressed by the inequalities given in Section 2. The unified treatment of those inequalities is made possible by the systematic use of a distance counting function, and the pattern used in the analysis of the counting function might prove useful in other geometric problems.
In Sections 3 and 4, a proof of Theorem 1 is given under the hypothesis that $\mu$ is the uniform distribution on $[0, 1]^d$. A key tool used in the proof is the jackknife inequality of Efron and Stein (1981). That inequality states (approximately) that


Tukey's jackknife estimate of variance is conservative in expectation. From the point of view of combinatorial and geometric probability, the virtue of the Efron-Stein inequality is that it shows how to bound the variance of a multivariate function of independent variables when one understands how the function changes as one of the arguments of the function is varied.
Section 5 provides a general method for extending results for Euclidean functionals from the uniform case to arbitrary distributions. This extension method is easier and more direct than the methods used in either Beardwood, Halton and Hammersley (1959) or Steele (1981a). The idea behind the extension technique is closely linked to almost sure embedding, and an important tool in the technique is the embedding lemma of Strassen (1965).
The final section reports on those aspects of the probability theory of minimal spanning trees which could not be resolved by the methods of this paper. In particular, effort is made to lay out some open problems of interest.

2. A counting function and its applications. Let $T_0$ denote a minimal spanning tree of $\{x_1, x_2, \ldots, x_n\} \subset [0, 1]^d$, where the edge weighting function $\psi$ is taken to be $\psi_0(x) = x$. Also, let $\nu_d(x)$ denote the counting function of the edge lengths of $T_0$, i.e., $\nu_d(x)$ is equal to the cardinality of the set

(2.1)    $\{e \in T_0\colon |e| \ge x\}.$

Lemma 2.1 does a good job of capturing the most basic information on the distribution of the edge lengths. An interesting feature of the bound is that it does not depend on n.

LEMMA 2.1. There is a constant $\beta_d$ depending only on the dimension $d \ge 1$ such that

(2.2)    $\nu_d(x) \le \beta_d x^{-d}$

for all $0 < x < \infty$.

PROOF. We first note by the pigeonhole principle that there is a constant $\alpha_d$ such that from any set of $k$ points $\{x_1, x_2, \ldots, x_k\}$ in $[0, 1]^d$ one can select a pair of points $x_i$ and $x_j$ with $|x_i - x_j| \le \alpha_d k^{-1/d}$. With a little care, one can show that $\alpha_d = 2\sqrt{d}$ will suffice, but we will not need to be concerned with precise constants either here or subsequently.
We now suppose that $\{x_1, x_2, \ldots, x_n\}$ are any $n$ points in $[0, 1]^d$ and we let $E$ denote the set of edges of an MST for $\{x_1, x_2, \ldots, x_n\}$. We recall that Prim's algorithm for constructing an MST [see, e.g., Prim (1957) or Papadimitriou and Steiglitz (1982), page 273] can be expressed as follows: (1) initially join the two nearest points and (2) iteratively join the two nearest connected components. Now if $e_j$, $1 \le j < n$, are the edges of an MST for $\{x_1, x_2, \ldots, x_n\}$ constructed by Prim's algorithm and the edges are listed in the order chosen by the algorithm, we have the bound

$|e_j| \le \alpha_d (n - j + 1)^{-1/d},$


since, when $e_j$ is chosen, there are exactly $n - j + 1$ connected components of the forest produced up to that time by the algorithm.
Now if $e_{i_j}$, $1 \le j \le k$, are any $k$ edges of $E$, the fact that the edges are chosen in monotone increasing order gives us

(2.3)    $\sum_{j=1}^{k} |e_{i_j}| \le \sum_{j=n-k}^{n-1} |e_j| \le \alpha_d \sum_{i=2}^{k+1} i^{-1/d} \le \alpha_d' k^{(d-1)/d}$

for a new constant $\alpha_d'$. If we apply this last inequality to the set of $k = \nu_d(x)$ edges with length at least $x$, we have

(2.4)    $x\,\nu_d(x) \le \alpha_d' (\nu_d(x))^{(d-1)/d}.$

Upon division by $x\,(\nu_d(x))^{(d-1)/d}$, inequality (2.4) establishes Lemma 2.1 with $\beta_d = (\alpha_d')^d$. □

Lemma 2.2 is a generic application of inequality (2.2). It suggests the correct order of magnitude for sums of powers of edge lengths.

LEMMA 2.2. For any $\psi(x) = x^\gamma$, any minimal spanning tree $T$ of $\{x_1, x_2, \ldots, x_n\} \subset [0, 1]^d$ satisfies the inequalities:

(2.5)    $\sum_{e \in T} |e|^\gamma \le \beta'(\gamma, d)\, n^{(d-\gamma)/d}$ for $0 < \gamma < d$,

(2.6)    $\sum_{e \in T} |e|^d \le \beta'(d) \log n$

and

(2.7)    $\sum_{e \in T,\, |e| \ge y} |e|^\gamma \le \beta'(\gamma, d)\, y^{\gamma - d}$ for $0 < y < \infty$ and $0 < \gamma < d$.

Here $\beta'(\gamma, d)$ is a constant depending only on the parameter $\gamma$ and the dimension $d$, and $\beta'(d)$ is a constant depending only on $d$.

PROOF. For any $\lambda > 0$, we have for $0 < \gamma < d$,

$\sum_{e \in T} |e|^\gamma = \sum_{e \in T,\, |e| \le n^{-\lambda}} |e|^\gamma + \sum_{e \in T,\, |e| > n^{-\lambda}} |e|^\gamma,$

so, noting that $n - 1 - \nu_d(x)$ is the number of edges of the MST which are of length less than $x$, we can majorize the first sum by $n^{1-\lambda\gamma}$ and write the second sum as a Stieltjes integral to get

$\sum_{e \in T} |e|^\gamma \le n^{1-\lambda\gamma} + \int_{n^{-\lambda}}^{\sqrt{d}} x^\gamma \, d(n - 1 - \nu_d(x)).$

Integrating by parts, we find

$\sum_{e \in T} |e|^\gamma \le n^{1-\lambda\gamma} + n^{-\lambda\gamma}\,\nu_d(n^{-\lambda}) + \gamma \int_{n^{-\lambda}}^{\sqrt{d}} x^{\gamma-1}\,\nu_d(x)\,dx.$

Majorizing $\nu_d(x)$ by $\beta_d x^{-d}$ and integrating, we get

$\sum_{e \in T} |e|^\gamma \le n^{1-\lambda\gamma} + \beta_d\, n^{\lambda(d-\gamma)} + \beta_d\, \gamma (d - \gamma)^{-1} n^{\lambda(d-\gamma)}.$

Choosing $\lambda = 1/d$, we obtain (2.5). By analogous arguments, one can verify inequalities (2.6) and (2.7). □

REMARK. The preceding proof provides a bound on $\beta'(\gamma, d)$ which diverges to infinity as $\gamma \to d$. When $d = 2$, an explicit bound was proved by Gilbert and Pollak [(1968), page 16]. The method of Gilbert and Pollak uses the geometry of the plane extensively and their method appears to be difficult to generalize beyond $d = 2$.

In the jackknife estimates developed in Section 4, we need bounds on the change which takes place in the value of $M(x_1, x_2, \ldots, x_n)$ as a point is deleted from $\{x_1, x_2, \ldots, x_n\}$. Following the customary jackknife notation, we set $M(x_1, x_2, \ldots, \hat{x}_i, \ldots, x_n) \equiv M(x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$, i.e., a hat is used to signal a missing variable. Further, if $T$ is any graph, we let $N(i) = \{j\colon (x_i, x_j) \in T\}$, so $N(i)$ is the set of neighbors of $x_i$ in the graph determined by $T$. We can now bound the changes in the values of $M$ as the sample changes by adding or dropping points.

LEMMA 2.3. For any edge weight function $\psi$, we have

(2.8)    $M(x_1, x_2, \ldots, x_n) \le M(x_1, x_2, \ldots, \hat{x}_i, \ldots, x_n) + \min_{j\colon j \ne i} \psi(|x_i - x_j|)$

and, for any nondecreasing $\psi$, we also have

(2.9)    $M(x_1, x_2, \ldots, \hat{x}_i, \ldots, x_n) \le M(x_1, x_2, \ldots, x_n) + \sum_{j \in N(i)} \psi(2|x_i - x_j|).$

PROOF. To prove inequality (2.8), note that any tree which spans $\{x_1, x_2, \ldots, \hat{x}_i, \ldots, x_n\}$ can be completed to a connected graph spanning $\{x_1, x_2, \ldots, x_n\}$ by connecting the point $x_i$ to the point $x_j$, $j \ne i$, for which $\psi(|x_i - x_j|)$ is minimal.
To prove inequality (2.9), let $T$ be a minimal spanning tree of $\{x_1, x_2, \ldots, x_n\}$ and let $x'$ be an element $x_j$ of $N(i)$ such that $|x_j - x_i|$ is minimal. We get a connected graph spanning $\{x_1, x_2, \ldots, \hat{x}_i, \ldots, x_n\}$ by (1) taking the edges of $T$, (2) deleting all the edges incident to $x_i$, i.e., deleting the set $\{(x_i, x_j)\colon (x_i, x_j) \in T\}$ and (3) adding the set of edges which join $x'$ to the other neighbors of $x_i$, i.e., adding in the set $A = \{(x', x_j)\colon j \in N(i), x_j \ne x'\}$. This procedure implies the bound

$M(x_1, x_2, \ldots, \hat{x}_i, \ldots, x_n) \le M(x_1, x_2, \ldots, x_n) - \sum_{j \in N(i)} \psi(|x_i - x_j|) + \sum_{j \in N(i),\, x_j \ne x'} \psi(|x' - x_j|).$

By the triangle inequality and the definition of $x'$, we have $|x' - x_j| \le |x' - x_i| + |x_i - x_j| \le 2|x_i - x_j|$, so inequality (2.9) follows by the monotonicity of $\psi$. □
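Inequalities (2.8) and (2.9) are easy to test numerically. The Python sketch below (ours, for illustration only) recomputes the MST after deleting each point of a seeded sample and checks both bounds for $\psi(x) = x^{3/2}$.

```python
import math
import random

def mst(pts, alpha):
    """Return (MST weight under psi(x)=x**alpha, neighbor sets of the tree)."""
    n = len(pts)
    best = [math.inf] * n
    parent = [-1] * n
    in_tree = [False] * n
    nbrs = [set() for _ in range(n)]
    total = 0.0
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step:
            total += best[u] ** alpha
            nbrs[u].add(parent[u])
            nbrs[parent[u]].add(u)
        for v in range(n):
            if not in_tree[v]:
                dv = math.dist(pts[u], pts[v])
                if dv < best[v]:
                    best[v] = dv
                    parent[v] = u
    return total, nbrs

random.seed(2)
alpha = 1.5
pts = [(random.random(), random.random()) for _ in range(30)]
M_full, nbrs = mst(pts, alpha)
ok = True
for i in range(len(pts)):
    M_drop, _ = mst(pts[:i] + pts[i + 1:], alpha)
    # (2.8): restoring x_i costs at most one cheapest edge.
    add_back = min(math.dist(pts[i], pts[j]) ** alpha
                   for j in range(len(pts)) if j != i)
    ok = ok and M_full <= M_drop + add_back + 1e-9
    # (2.9): deleting x_i costs at most the doubled star at x_i.
    detour = sum((2.0 * math.dist(pts[i], pts[j])) ** alpha for j in nbrs[i])
    ok = ok and M_drop <= M_full + detour + 1e-9
```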

To bound sums such as that in inequality (2.9), one needs a bound on $|N(i)|$, the cardinality of the set $N(i)$ of neighbors of $x_i$.

LEMMA 2.4. If $\psi$ is strictly increasing, then there is a constant $D_d$ depending only on $d$ such that any MST in $\mathbb{R}^d$ has maximum vertex degree bounded by $D_d$.

In the case of $\psi_0(x) = x$, this lemma is well known; in fact, $D_d$ is bounded by $N_d$, the number of spherical caps of angle $60^\circ$ which are needed to cover the unit sphere in $\mathbb{R}^d$. To check this bound, one can note that if a vertex had degree greater than $N_d$, then two edges of the MST would meet at an angle less than $60^\circ$. Simple geometry would then contradict minimality.
To prove Lemma 2.4 for general monotone $\psi$, we first note by the monotonicity that we can choose edges for a minimal spanning tree with edge weighting function $\psi$ which are also the edges of a minimal spanning tree with edge weighting function $\psi_0(x) = x$. To see this, just consider building the two minimal spanning trees by the algorithm of Kruskal (1956), which (1) orders the edges of the complete graph in increasing order and (2) examines each edge in order and accepts an edge into the minimal tree provided it does not create a cycle with the previously chosen edges. By the monotonicity of $\psi$, the edge orderings given by $\psi$ and $\psi_0$ coincide, so if one constructs a minimal spanning tree using Kruskal's algorithm for $\psi$ and $\psi_0$, the sets of edges can be chosen so they will also coincide. [For more detail on Kruskal's algorithm, one can consult Papadimitriou and Steiglitz (1982), page 289.]
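The edge-order argument can be seen in miniature in code. The sketch below (ours) runs Kruskal's algorithm once with $\psi_0(x) = x$ and once with the increasing weight $\psi(x) = x^{0.7}$ and confirms that the accepted edge sets coincide, since only the ordering of edge costs enters the algorithm.

```python
import math
import random
from itertools import combinations

def kruskal_edges(pts, cost):
    """Kruskal's algorithm: scan edges in increasing cost and accept an
    edge unless it closes a cycle (union-find with path halving)."""
    parent = list(range(len(pts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    chosen = []
    edges = sorted(combinations(range(len(pts)), 2),
                   key=lambda e: cost(math.dist(pts[e[0]], pts[e[1]])))
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            chosen.append((i, j))
    return chosen

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(40)]
same = set(kruskal_edges(pts, lambda x: x)) == \
       set(kruskal_edges(pts, lambda x: x ** 0.7))
```

For a sample in general position the edge ordering has no ties, so the two runs make identical accept/reject decisions.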
The main result of this section is Lemma 2.5 which expresses a basic continuity of the minimal spanning tree functional.

LEMMA 2.5. There is a constant $\beta''(\alpha, d)$ such that for any two finite subsets $X$ and $X'$ of $[0, 1]^d$, we have for the weighting function $\psi(x) = x^\alpha$, $0 < \alpha < d$, that

(2.10)    $|M(X) - M(X')| \le \beta''(\alpha, d)\, |X \,\triangle\, X'|^{(d-\alpha)/d}.$

Here $X \triangle X'$ denotes the symmetric difference of the sets $X$ and $X'$, i.e., $(X \cup X') - (X \cap X')$.

PROOF. For any $x'$ in the set $X' - X$, we let $N(x')$ denote the set of neighbors of $x'$ in the MST of $X \cup X'$. By the same considerations used in the proof of (2.9), we obtain

$M(X) \le M(X \cup X') + \sum_{x' \in X' - X}\; \sum_{x \in N(x')} \psi(2|x' - x|).$

We will use Lemma 2.2 to bound this double sum, but some maneuvering is required. First consider the set of edges $E' = \{(x', x)\colon x \in N(x') \text{ and } x' \in X' - X\}$ and let $V'$ denote the set of all vertices which belong to an edge of $E'$. We know the cardinality $|V'|$ is bounded by $(D_d + 1)\,|X \,\triangle\, X'|$, since no vertex $x'$ can have degree bigger than $D_d$.
A necessary and sufficient condition due to Prim (1957) for an edge $e$ to belong to some minimal spanning tree of a graph is that the vertex set of the graph can be divided into two nontrivial complementary components $A$ and $A^c$ such that $e$ is the least weight edge which meets both $A$ and $A^c$. For $e$ contained in $\mathrm{MST}\{x_1, x_2, \ldots, x_n\} = E$, we can thus find $A \subset \{x_1, x_2, \ldots, x_n\}$ such that $e$ meets $A$ and $A^c$ and

$|e| = \min\{|e'|\colon e' \in E,\ e' \cap A \ne \emptyset,\ e' \cap A^c \ne \emptyset\}.$

Now if $e \in E'$, we also have $e \in E$, and hence we have the equality

$|e| = \min\{|e'|\colon e' \in E',\ e' \cap A \cap V' \ne \emptyset,\ e' \cap A^c \cap V' \ne \emptyset\},$

and we see that $e$ must also be in $\mathrm{MST}(V')$ by Prim's sufficient condition. We therefore have the bound

$\sum_{x' \in X' - X}\; \sum_{x \in N(x')} \psi(2|x - x'|) \le \sum_{e \in \mathrm{MST}(V')} \psi(2|e|),$

so, using Lemma 2.2 and the bound on the cardinality of $V'$, we have

(2.11)    $M(X) \le M(X \cup X') + 2^\alpha \beta'(\alpha, d)(D_d + 1)^{(d-\alpha)/d}\, |X \,\triangle\, X'|^{(d-\alpha)/d}.$

To obtain an inequality in the other direction, we build a spanning tree for $X \cup X'$ by taking minimal trees for $X'$ and $X - X'$ and joining them with a single edge. This construction gives

(2.12)    $M(X \cup X') \le M(X') + M(X - X') + \min_{x \in X - X',\, x' \in X'} \psi(|x - x'|).$

Since by inequality (2.5) we have the bound $\beta'(\alpha, d)\,|X - X'|^{(d-\alpha)/d}$ on $M(X - X')$ and since $\psi(|x - x'|)$ is majorized by $d^{\alpha/2}$, inequality (2.12) yields

(2.13)    $M(X \cup X') \le M(X') + \beta'(\alpha, d)\,|X - X'|^{(d-\alpha)/d} + d^{\alpha/2}.$

Now majorizing $M(X \cup X')$ in (2.11) by the bound given by (2.13) yields

(2.14)    $M(X) - M(X') \le \beta'(\alpha, d)\big(1 + 2^\alpha (D_d + 1)^{(d-\alpha)/d}\big)\,|X \,\triangle\, X'|^{(d-\alpha)/d} + d^{\alpha/2}.$

By the symmetry of $X$ and $X'$, we see that $|M(X) - M(X')|$ is also bounded by the right side of (2.14). Since we can either assume $|X \,\triangle\, X'| \ge 1$ or else inequality (2.10) is trivial, we see that inequality (2.14) establishes the claimed bound (2.10)


with a value of $\beta''(\alpha, d)$ which can be given by

$\beta''(\alpha, d) = \beta'(\alpha, d)\big(1 + 2^\alpha(D_d + 1)^{(d-\alpha)/d}\big) + d^{\alpha/2}. \qquad \Box$

3. Growth in mean. The asymptotics of $m_n = EM(X_1, X_2, \ldots, X_n)$, where the $X_i$ are independent and uniform on $[0, 1]^d$, can be established by a three-step technique: Poisson smoothing, analysis of an approximate recursion and a Tauberian argument for extracting information about $m_n$ from information about averages.
By $\pi$ we denote a Poisson point process on $\mathbb{R}^d$ with constant intensity equal to 1. For any bounded Borel set $A$, the set $\pi(A)$ is almost surely a finite set of points and $M(\pi(A))$ will denote the total weight of the minimal spanning tree of the finite set of points $\pi(A)$. Here, of course, the total weight of the minimal spanning tree is defined in terms of $\psi$ as expressed in (1.1).
We now let $\phi(t) = EM(\pi[0, t]^d)$; that is, $\phi(t)$ is the expected cost of the minimal spanning tree of the set of points $\pi[0, t]^d$. To make this explicit, we write $\pi[0, t]^d = \{X_1, X_2, \ldots, X_N\}$, where $N$ is a Poisson random variable with mean $t^d$, and we apply (1.1) to get the representation

$\phi(t) = E \min_T \sum_{(i,j) \in T} \psi(|X_i - X_j|),$

where the minimum is over all spanning trees of $\{X_1, X_2, \ldots, X_N\}$. The asymptotic behavior of $\phi$ will be obtained from Lemma 3.1.

LEMMA 3.1. If $\psi(x) = x^\alpha$ with $0 < \alpha < d$, then there is a constant $C = C(\alpha, d)$ such that

(3.1)    $\phi(t) \le m^d \phi(t/m) + C t^\alpha m^{d-\alpha}$

for all reals $t$, $0 < t < \infty$, and integers $m \ge 1$.

PROOF. We first partition $[0, t]^d$ into $m^d$ subcubes $Q_i$ of edge length $t/m$. Next, for each $i$ for which $\pi(Q_i) \ne \emptyset$, we choose one representative $Y_i$ from $\pi(Q_i)$. By inequality (2.5) scaled up to $[0, t]^d$, we can obtain a minimal spanning tree $T$ of the set $\{Y_i\colon 1 \le i \le m^d,\ \pi(Q_i) \ne \emptyset\}$ such that $\sum_{e \in T} |e|^\alpha \le t^\alpha \beta'(\alpha, d)(m^d)^{(d-\alpha)/d} = t^\alpha \beta'(\alpha, d)\, m^{d-\alpha}$. But now the tree $T$, together with the minimal spanning trees of all the $\pi(Q_i)$, must form a connected graph. The minimal spanning tree of $\pi([0, t]^d)$ has weight no greater than that of such a connected graph, so

$M(\pi[0, t]^d) \le \sum_{i=1}^{m^d} M(\pi(Q_i)) + \sum_{e \in T} |e|^\alpha.$

Taking expectations and using the fact that $EM(\pi(Q_i)) = \phi(t/m)$ completes the proof of the lemma with $C = \beta'(\alpha, d)$. □
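The partition construction in the proof is deterministic once the points are fixed, so the recursion can be checked directly: the per-cube trees plus a tree on one representative per occupied cube form a connected graph, and the global MST can cost no more. Here is a Python sketch (illustrative, with an arbitrary seeded sample standing in for the Poisson process):

```python
import math
import random

def mst_weight(pts, alpha):
    """MST weight under psi(x) = x**alpha (Prim, O(n^2))."""
    n = len(pts)
    if n < 2:
        return 0.0
    best = [math.inf] * n
    best[0] = 0.0
    in_tree = [False] * n
    total = 0.0
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step:
            total += best[u] ** alpha
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(pts[u], pts[v]))
    return total

random.seed(3)
t, m, alpha = 4.0, 4, 1.0
pts = [(t * random.random(), t * random.random()) for _ in range(120)]
# Sort the points into the m^d subcubes of edge length t/m.
cells = {}
for p in pts:
    key = tuple(min(int(c / (t / m)), m - 1) for c in p)
    cells.setdefault(key, []).append(p)
reps = [cell[0] for cell in cells.values()]   # one representative per occupied cell
# Per-cube trees plus a tree on the representatives give a connected graph,
# so the global MST weight can be no larger.
bound = sum(mst_weight(c, alpha) for c in cells.values()) + mst_weight(reps, alpha)
subadditive = mst_weight(pts, alpha) <= bound + 1e-9
```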

As a consequence of (3.1) and the continuity of $\phi$, we can show there is a constant $c(\alpha, d) \ge 0$ depending only on $d$ and $\alpha$ such that

(3.3)    $\phi(t) \sim c(\alpha, d)\, t^d$ as $t \to \infty$.

To prove this, we set $\tilde\phi(t) = \phi(t) + 2Ct^\alpha$ and note that if $m_0$ is chosen large enough to insure $2m_0^\alpha \le m_0^d$, then inequality (3.1) implies that for $m \ge m_0$ we have

$\tilde\phi(mt) \le m^d \tilde\phi(t).$

For any $\varepsilon > 0$, we can let $\beta = \liminf_{t \to \infty} \tilde\phi(t)/t^d$ and use the continuity of $\tilde\phi$ to find an interval $[t_0, t_1]$, $t_0 < t_1$, such that $\tilde\phi(t)/t^d \le \beta + \varepsilon$ for all $t \in [t_0, t_1]$. Setting $A = \{s\colon \tilde\phi(s)/s^d \le \beta + \varepsilon\}$, we note that the recursion for $\tilde\phi$ implies that $\bigcup_{m=m_0}^{\infty} [mt_0, mt_1] \subset A$. But since $mt_1 \ge (m+1)t_0$ for all $m \ge t_0(t_1 - t_0)^{-1} \equiv t^*$, we see that for $m$ greater than $\max(t^*, m_0)$ the intervals $[mt_0, mt_1]$ are overlapping. From this we see that $A$ must contain an infinite interval $[t^{**}, \infty)$. This implies $\limsup_{t \to \infty} \tilde\phi(t)/t^d \le \beta + \varepsilon$, and by the arbitrariness of $\varepsilon$ we obtain $\tilde\phi(t) \sim \beta t^d$. This result and the fact that $\alpha < d$ imply the asymptotic relation (3.3).
By the definition of $\phi(t)$ and the scaling property $t^\alpha M(x_1, x_2, \ldots, x_n) = M(tx_1, tx_2, \ldots, tx_n)$ of $M$, one can compute by conditional expectations that

(3.4)    $\phi(t) = t^\alpha \sum_{n=0}^{\infty} m_n e^{-t^d} t^{dn}/n!.$

Here $m_n = EM(X_1, X_2, \ldots, X_n)$, where the $X_i$ are independent and uniformly distributed on $[0, 1]^d$.
To extract the asymptotics of $m_n$ from (3.3) and (3.4), we will first prove that the sequence $m_n$ has a useful monotonicity property, specifically,

(3.5)    $n\, m_{n-1} \le (n + 2^{1+\alpha})\, m_n.$
To verify inequality (3.5), we sum inequality (2.9) over $1 \le i \le n$ to get

(3.6)    $\sum_{i=1}^{n} M(X_1, X_2, \ldots, \hat{X}_i, \ldots, X_n) \le n M(X_1, X_2, \ldots, X_n) + \sum_{i=1}^{n} \sum_{j \in N(i)} 2^\alpha |X_i - X_j|^\alpha.$

Next we note that the double sum exactly equals $2^{1+\alpha} M(X_1, X_2, \ldots, X_n)$, since each edge of the tree appears once for each of its two endpoints, so taking expectations in inequality (3.6) completes the proof of inequality (3.5).
The way we will use inequality (3.5) is through the fact that it implies

(3.7)    $n^k m_n \ge (n-1)^k m_{n-1}$ for all $n \ge 1$,

provided $k \ge 2^{1+\alpha}$.
The last tool required is the following differentiation lemma of Hardy and Littlewood [see, e.g., Widder (1946), pages 193-194].


LEMMA 3.2. If $f(x) \sim Ax^\gamma$ as $x \to 0$ and $f''(x) = O(x^{\gamma-2})$ as $x \to 0$, then $f'(x) \sim \gamma A x^{\gamma-1}$ as $x \to 0$.

We can now assemble the pieces to prove the main result of this section.

LEMMA 3.3. If $X_i$, $1 \le i < \infty$, are independent and uniformly distributed on $[0, 1]^d$ and $m_n = EM(X_1, X_2, \ldots, X_n)$, then there is a constant $c(\alpha, d)$ such that

(3.8)    $m_n \sim c(\alpha, d)\, n^{(d-\alpha)/d}$ as $n \to \infty$.

PROOF. By (3.3), (3.4) and the change of variables $t^d = u$, we have

(3.9)    $\sum_{n=0}^{\infty} m_n e^{-u} u^n/n! \sim c(\alpha, d)\, u^{(d-\alpha)/d}$ as $u \to \infty$.

Taking the Laplace transform of the left side of (3.9) and applying the Abelian theorem for Laplace transforms [Widder (1946), page 181], we obtain as $\lambda \to 0$ that

$\sum_{n=0}^{\infty} m_n (1 + \lambda)^{-n-1} \sim c(\alpha, d)\,\Gamma(2 - \alpha/d)\,\lambda^{-2+\alpha/d}$

or, equivalently,

(3.10)    $\sum_{n=0}^{\infty} m_n e^{-nx} \sim c(\alpha, d)\,\Gamma(2 - \alpha/d)\, x^{-2+\alpha/d}$ as $x \to 0$.

From Lemma 2.2 we know $m_n = O(n^{(d-\alpha)/d})$ and, by the Euler-Maclaurin formula, we have $\sum_{n=0}^{\infty} n^\gamma e^{-nx} \sim x^{-\gamma-1}\Gamma(\gamma + 1)$ as $x \to 0$; so, for all $k$,

(3.11)    $\sum_{n=0}^{\infty} n^k m_n e^{-nx} = O(x^{-k-2+\alpha/d})$ as $x \to 0$.

Finally, by the Hardy-Littlewood differentiation lemma applied successively beginning with (3.10) and using (3.11), we find for each integer $k$ as $x \to 0$ that

(3.12)    $\sum_{n=0}^{\infty} n^k m_n e^{-nx} \sim c(\alpha, d)\,\Gamma(2 - \alpha/d)(2 - \alpha/d)(3 - \alpha/d) \cdots (k + 1 - \alpha/d)\, x^{-k-2+\alpha/d}.$

By the substitution $e^{-x} = y$, we have as $y \to 1$ that

(3.13)    $\sum_{n=0}^{\infty} n^k m_n y^n \sim c(\alpha, d)\,\Gamma(2 - \alpha/d)(2 - \alpha/d)(3 - \alpha/d) \cdots (k + 1 - \alpha/d)\,(1 - y)^{-k-2+\alpha/d}.$

The Karamata Tauberian theorem [see, e.g., Feller (1971), page 447] applied to (3.13) now gives us information about the partial sums of the $n^k m_n$:

(3.14)    $\sum_{n=1}^{N} n^k m_n \sim c(\alpha, d)\,\Gamma(2 - \alpha/d) \prod_{i=1}^{k} (1 + i - \alpha/d)\, N^{k+2-\alpha/d}\big/\Gamma(k + 3 - \alpha/d).$

The point of this maneuvering is that the sequence $\{n^k m_n\}$ is increasing by (3.7) and, by a well-known lemma [cf. Apostol (1976), page 280], the monotonicity of the terms of a partial sum justifies carrying over the asymptotics to the individual terms, i.e.,

(3.15)    $n^k m_n \sim c(\alpha, d)\, n^{k+1-\alpha/d}.$

On dividing by $n^k$, the proof of Lemma 3.3 is complete. □

The repeated differentiation technique applied previously can be used in many problems where one needs to exploit an approximate monotonicity such as that expressed by inequality (3.5). The passage from the asymptotic relation (3.3) to (3.15) can be made a bit more quickly by using a more sophisticated Tauberian theorem, but building the derivation on Karamata's theorem is simple enough. One benefit of the path chosen is that the structure of $m_n$ is drawn out a bit more fully through the development of inequality (3.7). A third method for proving (3.8) can be based on (3.7), the fact that $m_n = O(n^{(d-\alpha)/d})$ by Lemma 2.2 and direct estimation of the Borel sum in (3.9).

4. Variance bounds. Efron and Stein (1981) established the useful fact that Tukey's jackknife estimate of variance is conservative in expectation. Together with the geometric lemmas of Section 2, this fact will provide effective bounds on $\operatorname{Var} M_n$.
Let $S(x_1, x_2, \ldots, x_{n-1})$ denote any symmetric function of $n - 1$ vectors $x_i \in \mathbb{R}^d$. If $X_i$, $1 \le i \le n$, are independent identically distributed random vectors in $\mathbb{R}^d$, we define $n + 1$ new random variables by

$S_i = S(X_1, X_2, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) \quad\text{and}\quad \bar{S} = n^{-1}\sum_{i=1}^{n} S_i.$

The Efron-Stein inequality says

(4.1)    $\operatorname{Var} S(X_1, X_2, \ldots, X_{n-1}) \le E \sum_{i=1}^{n} (S_i - \bar{S})^2.$

Since the right side of (4.1) is not decreased if $\bar{S}$ is replaced by any other variable, we can apply (4.1) in the MST problem with $S_i = M(X_1, X_2, \ldots, \hat{X}_i, \ldots, X_n)$ to obtain

$\operatorname{Var} M_{n-1} \le E \sum_{i=1}^{n} \big(S_i - M(X_1, X_2, \ldots, X_n)\big)^2.$
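The quantities in this bound are directly computable. The sketch below (an illustration of ours, not part of the argument) evaluates the leave-one-out values $S_i$ for a seeded sample and confirms that replacing $\bar{S}$ by $M_n$ can only increase the jackknife sum of squares, since the mean minimizes the sum of squared deviations.

```python
import math
import random

def mst_weight(pts, alpha=1.0):
    """MST weight under psi(x) = x**alpha (Prim, O(n^2))."""
    n = len(pts)
    if n < 2:
        return 0.0
    best = [math.inf] * n
    best[0] = 0.0
    in_tree = [False] * n
    total = 0.0
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step:
            total += best[u] ** alpha
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(pts[u], pts[v]))
    return total

random.seed(5)
n, alpha = 25, 1.0
pts = [(random.random(), random.random()) for _ in range(n)]
M_n = mst_weight(pts, alpha)
S = [mst_weight(pts[:i] + pts[i + 1:], alpha) for i in range(n)]  # leave-one-out
S_bar = sum(S) / n
jackknife = sum((s - S_bar) ** 2 for s in S)   # right side of (4.1)
proxy = sum((s - M_n) ** 2 for s in S)         # S_bar replaced by M_n
dominated = jackknife <= proxy + 1e-12
```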


For $X_i$ that are independent and uniformly distributed on $[0, 1]^d$, it will be easy to bound the preceding sum.
To avoid concern over irrelevant constants, we use the Vinogradov symbol $a_n \ll b_n$ to denote that $a_n \le C b_n$ for some $C$ not depending on $n$. By Lemma 2.3 and the bound $(\max(a, b))^2 \le a^2 + b^2$, we have

$\operatorname{Var} M_{n-1} \ll E \sum_{i=1}^{n} \min_{j\colon j \ne i} |X_i - X_j|^{2\alpha} + E \sum_{i=1}^{n} \Big(\sum_{j \in N(i)} |X_i - X_j|^\alpha\Big)^2.$

By elementary calculus, we can show for any $0 < p < \infty$ that $E \min_{j\colon j \ne i} |X_i - X_j|^p \ll n^{-p/d}$, so the first sum is majorized by $n^{1-2\alpha/d}$. Applying Schwarz's inequality to the second sum (recalling $|N(i)| \le D_d$) yields a total bound

$\operatorname{Var} M_{n-1} \ll n^{1-2\alpha/d} + E \sum_{i=1}^{n} \sum_{j \in N(i)} |X_i - X_j|^{2\alpha}.$

Since the last sum equals $2 E \sum_{e \in T} |e|^{2\alpha}$, where $T$ is a minimal spanning tree of $\{X_1, X_2, \ldots, X_n\}$, Lemma 2.2 shows that we have completed the proof of Lemma 4.1.

LEMMA 4.1. If $X_i$, $1 \le i < \infty$, are independent and uniform on $[0, 1]^d$, then

(4.2)    $\operatorname{Var} M_n \ll n^{1-2\alpha/d}$ for $0 < 2\alpha < d$,

(4.3)    $\operatorname{Var} M_n \ll \log n$ for $2\alpha = d$

and, for each $\alpha$, $d/2 < \alpha < \infty$,

(4.4)    $\operatorname{Var} M_n$ is uniformly bounded for all $n \ge 1$.

To move now toward a proof that $n^{-(d-\alpha)/d}(M_n - m_n)$ converges almost surely to 0, we set up a subsequence argument. For a real number $\lambda > 1$ which will be determined later, we define a subsequence of integers $n_k$ by letting $n_k = \lfloor k^\lambda \rfloor$, and we note by Lemma 4.1 that we have the set of three inequalities:

(4.5a)    $\operatorname{Var}(M_n/n^{(d-\alpha)/d}) \ll n^{-1}$ for $0 < \alpha < d/2$,

(4.5b)    $\operatorname{Var}(M_n/n^{(d-\alpha)/d}) \ll \log n/n$ for $\alpha = d/2$,

(4.5c)    $\operatorname{Var}(M_n/n^{(d-\alpha)/d}) \ll n^{-2+2\alpha/d}$ for $d/2 < \alpha < d$.

Under each of the conditions of (4.5a)-(4.5c), we see for $0 < \alpha < d$ that

$\sum_{k=1}^{\infty} \operatorname{Var}\{M_{n_k}/n_k^{(d-\alpha)/d}\} < \infty,$

provided also that $\lambda(-2 + 2\alpha/d) < -1$, i.e., provided $\lambda > \frac{1}{2}(1 - \alpha/d)^{-1}$. For such a value of $\lambda$, we see by the usual Borel-Cantelli argument that $M_{n_k}$ is asymptotic to $m_{n_k}$ with probability 1 as $k \to \infty$.
To push this relation toward one valid for the full sequence of integers, we will bound the variability of $M_n$ as $n$ varies through the interval $[n_k, n_{k+1})$, i.e., we bound $V_k = \max_{n_k \le n < n_{k+1}} |M_n - M_{n_k}|$. By Lemma 2.5 we have

$V_k \le \beta''(\alpha, d)\,|\{X_{n_k+1}, X_{n_k+2}, \ldots, X_{n_{k+1}}\}|^{(d-\alpha)/d} \ll k^{(\lambda-1)(d-\alpha)/d}.$

Hence we have with probability 1 that

(4.6)    $V_k/n_k^{(d-\alpha)/d} \ll k^{-(d-\alpha)/d}.$

From the identity

$M_n n^{-(d-\alpha)/d} - M_{n_k} n_k^{-(d-\alpha)/d} = (M_n - M_{n_k})\, n^{-(d-\alpha)/d} + M_{n_k}\big(n^{-(d-\alpha)/d} - n_k^{-(d-\alpha)/d}\big),$

the bound (4.6), the fact that $M_{n_k} \ll n_k^{(d-\alpha)/d}$ and the bound $|n^{-(d-\alpha)/d} - n_k^{-(d-\alpha)/d}| \ll k^{-1} n_k^{-(d-\alpha)/d}$ for $n_k \le n < n_{k+1}$, we have the relation

(4.7)    $|M_n n^{-(d-\alpha)/d} - M_{n_k} n_k^{-(d-\alpha)/d}| \ll k^{-(d-\alpha)/d} + k^{-1}.$

Finally, inequality (4.7) and the fact that $M_{n_k} \sim c(\alpha, d)\, n_k^{(d-\alpha)/d}$ with probability 1 complete the proof. Our result is summarized in Theorem 2.

THEOREM 2. If $X_i$, $1 \le i < \infty$, are uniformly distributed on $[0, 1]^d$ and $M_n$ is the weight of the MST of $\{X_1, X_2, \ldots, X_n\}$ using the edge weight function $\psi(x) = x^\alpha$ with $0 < \alpha < d$, then there is a constant $c(\alpha, d) > 0$ such that with probability 1,

(4.8)    $M_n \sim c(\alpha, d)\, n^{(d-\alpha)/d}$ as $n \to \infty$.

This theorem has been proved except for showing that $c(\alpha, d)$ is strictly positive. To see this, note that each $X_i$ is connected to some $X_j$, and hence

$M_n \ge \frac{1}{2} \sum_{i=1}^{n} \min\{|X_j - X_i|^\alpha\colon 1 \le j \le n,\ j \ne i\}.$

Since it is easy calculus to show there is a $c > 0$ such that

$E \min\{|X_j - X_i|^\alpha\colon 1 \le j \le n,\ j \ne i\} \ge c\, n^{-\alpha/d},$

the proof is complete.
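The nearest-neighbor lower bound is deterministic for any fixed sample and can be checked directly; in the following Python sketch (ours) the factor $\frac{1}{2}$ accounts for each MST edge being counted at most twice in the nearest-neighbor sum.

```python
import math
import random

def mst_weight(pts, alpha=1.0):
    """MST weight under psi(x) = x**alpha (Prim, O(n^2))."""
    n = len(pts)
    if n < 2:
        return 0.0
    best = [math.inf] * n
    best[0] = 0.0
    in_tree = [False] * n
    total = 0.0
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step:
            total += best[u] ** alpha
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(pts[u], pts[v]))
    return total

random.seed(11)
alpha = 1.0
pts = [(random.random(), random.random()) for _ in range(60)]
M_n = mst_weight(pts, alpha)
# Each point is joined to some other point in the tree, and each edge is
# counted at most twice when the nearest-neighbor costs are summed.
nn_sum = sum(min(math.dist(pts[i], pts[j]) ** alpha
                 for j in range(len(pts)) if j != i)
             for i in range(len(pts)))
lower_bound_holds = M_n >= 0.5 * nn_sum - 1e-9
```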

5. General extension principle. We now show how the result just obtained for uniform distributions can be extended to any bounded distribution. The main observation is that the minimal spanning tree functional has enough continuity to permit samples from a general distribution to be replaced by samples from a simpler class of distributions whose asymptotic analysis follows easily from the results for uniformly distributed variables. This approximation is made easier by using an almost sure embedding technique based on the following property of marginal and joint distributions.


LEMMA 5.1 [Strassen (1965)]. Suppose $P$ and $Q$ are probability measures on a bounded subset of $\mathbb{R}^d$ and suppose also that there is an $\varepsilon > 0$ such that $P$ and $Q$ satisfy $P(F) \le Q(F) + \varepsilon$ for all closed $F$. There is then a probability measure $\Lambda$ on the product space $\mathbb{R}^d \times \mathbb{R}^d$ such that

$\Lambda(\cdot\,, \mathbb{R}^d) = P(\cdot), \quad \Lambda(\mathbb{R}^d, \cdot\,) = Q(\cdot) \quad\text{and}\quad \Lambda\{(x, y)\colon x \ne y\} \le \varepsilon.$

For an elegant combinatorial proof of this lemma based on Hall's matching theorem, one can consult Dudley (1968 or 1976).
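In the finite, discrete case, a coupling of this kind reduces to the classical maximal-coupling construction: put as much mass as possible on the diagonal and spread the excesses independently. The following sketch (an illustration of ours with made-up distributions, not the measure-theoretic statement above) realizes a joint distribution whose off-diagonal mass equals the total variation distance.

```python
def maximal_coupling(P, Q):
    """Maximal coupling of two finite distributions (dicts atom -> prob).

    Returns (L, tv), where L is a joint distribution on pairs with
    marginals P and Q, and the off-diagonal mass L{x != y} equals the
    total variation distance tv = max_F (P(F) - Q(F)).
    """
    atoms = set(P) | set(Q)
    common = {x: min(P.get(x, 0.0), Q.get(x, 0.0)) for x in atoms}
    tv = 1.0 - sum(common.values())
    L = {(x, x): c for x, c in common.items() if c > 0.0}
    if tv > 1e-15:
        p_exc = {x: P.get(x, 0.0) - common[x] for x in atoms}
        q_exc = {y: Q.get(y, 0.0) - common[y] for y in atoms}
        for x, px in p_exc.items():
            for y, qy in q_exc.items():
                if px > 0.0 and qy > 0.0:
                    L[(x, y)] = L.get((x, y), 0.0) + px * qy / tv
    return L, tv

P = {'a': 0.5, 'b': 0.3, 'c': 0.2}   # hypothetical example distributions
Q = {'a': 0.4, 'b': 0.3, 'd': 0.3}
L, tv = maximal_coupling(P, Q)
```

Sampling $(X, X')$ from `L` gives $X \sim P$, $X' \sim Q$ and $P(X \ne X') = \mathrm{tv}$, which is exactly the coupling property used in the proof of Theorem 3 below.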
The proof of Theorem 1 begins by establishing the limit result for a special class of distributions which we call blocked distributions. These are probability measures on $[0, 1]^d$ of the form $g(x)\,dx + d\mu_s$, where $g(x) = \sum_{i=1}^{m} a_i 1_{Q_i}(x)$, the measure $\mu_s$ is purely singular and the $Q_i$, $1 \le i \le m < \infty$, are disjoint cubes with edges parallel to the axes. Here we recall that a measure $\mu_s$ on $[0, 1]^d$ is called a purely singular measure if $\mu_s([0, 1]^d) = \mu_s(A)$ for some measurable $A$ of Lebesgue measure 0. The next result points out a continuity condition which suffices to carry asymptotic results from the class of blocked distributions over to the class of bounded distributions.

THEOREM 3. Suppose that there is a constant $B$ not depending on $n$ such that $S(x_1, x_2, \ldots, x_n)$ satisfies the continuity condition

(5.1)    $|S(x_1, x_2, \ldots, x_n) - S(x_1', x_2', \ldots, x_n')| \le B\,|\{i\colon x_i \ne x_i'\}|^{(d-\alpha)/d}.$

Suppose also that for every sequence of i.i.d. random variables $\{X_i\}_{i \ge 1}$ distributed with a blocked distribution $\mu = \mu_s + g(x)\,dx$, we have with probability 1 that

(5.2)    $S(X_1, X_2, \ldots, X_n) \sim n^{(d-\alpha)/d}\, c(\alpha, d) \int g(x)^{(d-\alpha)/d}\,dx.$

One then has that with probability 1,

$S(X_1', X_2', \ldots, X_n') \sim n^{(d-\alpha)/d}\, c(\alpha, d) \int f(x)^{(d-\alpha)/d}\,dx,$

whenever $\{X_i'\}$ are independent and identically distributed with respect to any probability measure on $[0, 1]^d$ with an absolutely continuous part given by $f(x)\,dx$.

PROOF. If the $X_n'$ are distributed according to $f(x)\,dx + \mu_s$, where $\mu_s$ is singular, we take an approximation $g_m(x)\,dx + \mu_s$, where $g_m(x) = \sum_{i=1}^{m^d} a_i 1_{Q_i}(x)$. Here, each $Q_i$ is one of the subcubes obtained by partitioning $[0, 1]^d$ into $m^d$ parts and the $a_i$ are taken to be the averages $a_i = m^d \int_{Q_i} f(x)\,dx$. It is a traditional exercise to show that for such $a_i$ we have $\int |g_m(x) - f(x)|\,dx \to 0$ as $m \to \infty$. Finally, defining measures $P$ and $Q$ by

$P(A) = \int_A f(x)\,dx + \mu_s(A) \quad\text{and}\quad Q(A) = \int_A g_m(x)\,dx + \mu_s(A),$

we have $|P(A) - Q(A)| \le \int_A |f(x) - g_m(x)|\,dx \le \varepsilon$ for every $\varepsilon > 0$ and all $m \ge m_0(\varepsilon)$.


By Lemma 5.1, one can therefore define a probability measure $\Lambda$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $P$ and $Q$ such that $\Lambda\{(x, y)\colon x \ne y\} \le \varepsilon$.
We now define a sequence of random vectors in $\mathbb{R}^d \times \mathbb{R}^d$ by $(X_i, X_i')$, $1 \le i < \infty$, where $(X_i, X_i')$ is the $i$th vector of an independent sequence of random vectors with distribution given by the measure $\Lambda$. By the law of large numbers, we have with probability 1 that

(5.3)    $|\{i \le n\colon X_i \ne X_i'\}| \sim n\Lambda(\{(x, y)\colon x \ne y\}),$

and by the bound on the off-diagonal probability mass we have $n\Lambda(\{(x, y)\colon x \ne y\}) \le \varepsilon n$.

limsupn~ (du)ldlS(Xl, X2, ' X.)  S(Xl,' X2 X.1)l :T, Be(da)1d,
whence by (5.2) we have almost surely
limsupS(Xl, X2,..., XJn(da)1d
n cc

Be (da)ld + lim sup S(Xl', X2 X;)n (do~ld
n oo

Be (da)ld + C(ap d) fg.(X)(d a)ld C1X.
Since m and e > 0 are arbitrary, we consequently have

limsupS(Xl, X2,..., X.)n(du)1d < c(a, d) ff(X)(du)ld dx a.s.

Finally, since the limit inflmum can be dealt with in exactly the same way as the preceding Emit superior, the proof of this theorem is complete. Q

We now apply Theorem 3 to minimal spanning trees. By Lemma 2.5, inequality (5.1) is satisfied by the minimal spanning tree functional, so it only remains to verify the relation (5.2).
We consider a blocked distribution $g(x)\,dx + \mu_s$ and let $E$ denote the support of $\mu_s$. We will also denote Lebesgue measure by $\lambda(\cdot)$ or by $dx$, according to convenience. Since the Lebesgue measure of $E$ equals 0 and since $g$ is constant on a set of subcubes, we can find for any $\varepsilon > 0$ a partition $\{Q_i\}_{i \in I}$ of $[0, 1]^d$ into subcubes such that the following properties hold:

(5.4)    on each $Q_i$, $i \in I$, $g(\cdot)$ is constant,

(5.5)    for each $i \in I$, $l = \lambda(Q_i)^{1/d} < \varepsilon$,

(5.6)    $E \subset A \cup B$, where $A$ and $B$ are disjoint, $\lambda(A) = 0$, $P(X_1 \in A) = \mu_s(A) \le \varepsilon$, and $B = \bigcup_{j \in J} Q_j$, where $\lambda(B) = \sum_{j \in J} \lambda(Q_j) \le \varepsilon$.

We now set $C = [0, 1]^d - (A \cup B)$ and note that by tying together the minimal spanning trees of those sample points which lie, respectively, in the sets $A$, $B$ and $C$, we have

$M(X_1, X_2, \ldots, X_n) \le 2d^{\alpha/2} + M(\{X_i\colon X_i \in A\}) + M(\{X_i\colon X_i \in B\}) + M(\{X_i\colon X_i \in C\}),$

where the first term $2d^{\alpha/2}$ bounds the maximal cost needed to unite the three trees. Since $P(X_i \in A) \le \varepsilon$, the strong law of large numbers and Lemma 2.2 give the bound

(5.7)    $\limsup_{n \to \infty} M(\{X_i\colon X_i \in A\})\, n^{-(d-\alpha)/d} \le \beta'(\alpha, d)\, \varepsilon^{(d-\alpha)/d}.$

For the sample points in the set B, we use a more geometric bound. Specifically, we apply Lemma 2.2, rescaling and Hölder's inequality to obtain

(5.8) M({X_i: X_i ∈ B}) ≤ |J| d^{a/2} + Σ_{j∈J} M({X_i: X_i ∈ Q_j})
          ≤ |J| d^{a/2} + β'(a, d) l^a Σ_{j∈J} |{X_i: X_i ∈ Q_j}|^{(d-a)/d}
          ≤ |J| d^{a/2} + β'(a, d) l^a |J|^{a/d} n^{(d-a)/d}.

By the last bound given in (5.6), we have Σ_{j∈J} λ(Q_j) ≤ ε; so by the definition of l, we have |J| l^d ≤ ε, and inequality (5.8) finally yields

limsup_{n→∞} M({X_i: X_i ∈ B}) n^{-(d-a)/d} ≤ β'(a, d) ε^{a/d}.
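The Hölder step in (5.8) may be worth making explicit. Writing N_j = |{X_i: X_i ∈ Q_j}| and applying Hölder's inequality with the conjugate exponents d/a and d/(d−a):

```latex
\sum_{j \in J} N_j^{(d-a)/d}
  \;\le\; \Bigl(\sum_{j \in J} 1\Bigr)^{a/d}
          \Bigl(\sum_{j \in J} N_j\Bigr)^{(d-a)/d}
  \;\le\; |J|^{a/d}\, n^{(d-a)/d},
```

since the N_j count disjoint subsets of the n sample points.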

Writing J' for the complement of J in I, we now handle the more substantial contribution due to the points in C. We write Q'_j = Q_j − A and note C ⊂ ∪_{j∈J'} Q'_j, so

(5.9) M({X_i: X_i ∈ C}) ≤ d^{a/2} |J'| + Σ_{j∈J'} M({X_i: X_i ∈ Q'_j}).

Setting γ_j = ∫_{Q_j} g(x) dx = P(X_i ∈ Q'_j), we note g is constant on each member of the partition {Q_i}_{i∈I}, so we can write g(x) = γ_j l^{-d} for all x in Q_j. Conditional on the event {X_i ∈ Q'_j}, the random variable X_i has the uniform distribution on Q_j, so scaling, Theorem 2 and the law of large numbers tell us that with probability 1,

(5.10) M({X_i: X_i ∈ Q'_j}) ~ c(a, d) l^a |{X_i: X_i ∈ Q'_j}|^{(d-a)/d}
                            ~ c(a, d) l^a (γ_j n)^{(d-a)/d}.

When we sum the previous bounds on the contributions of A and B and the contribution expressed by (5.10), we have

(5.11) limsup_{n→∞} M(X_1, X_2, ..., X_n) n^{-(d-a)/d}
         ≤ β'(a, d){ε^{(d-a)/d} + ε^{a/d}} + c(a, d) Σ_{j∈J'} γ_j^{(d-a)/d} l^a.

Here one notes that γ_j^{(d-a)/d} l^a = (γ_j l^{-d})^{(d-a)/d} l^d, so the last sum equals ∫_{∪_{j∈J'} Q_j} g(x)^{(d-a)/d} dx. The arbitrariness of ε > 0 and the bound Σ_{j∈J} λ(Q_j) ≤ ε can now be applied in (5.11), permitting us to conclude that for any blocked distribution g(x) dx + μ_s, we have the upper bound

(5.12) limsup_{n→∞} M(X_1, X_2, ..., X_n) n^{-(d-a)/d} ≤ c(a, d) ∫ g(x)^{(d-a)/d} dx.
n oo
The corresponding lower bound on the limit inferior is a bit more subtle. For A, B and C defined as before, we have by Lemma 2.5 that

(5.13) M(X_1, X_2, ..., X_n) ≥ M({X_i: X_i ∈ B ∪ C}) − β''(a, d) |{X_i: X_i ∈ A}|^{(d-a)/d}.

Now, given a minimal spanning tree T of {X_i: X_i ∈ B ∪ C}, we take 𝒳 to be the set of elements of {X_i: X_i ∈ C} which are joined by an edge of T to an element of {X_i: X_i ∈ B}. The minimal spanning tree of {X_i: X_i ∈ B ∪ C} together with a minimal spanning tree of 𝒳 will contain a spanning tree of the set {X_i: X_i ∈ C}, so we have the crude bound

(5.14) M({X_i: X_i ∈ C}) ≤ M({X_i: X_i ∈ B ∪ C}) + M(𝒳).
To bound the cardinality |𝒳|, we note that each element of 𝒳 is either (1) an endpoint of an edge of T of length greater than γ or (2) a point of C within a distance γ of B. Thus

(5.15) |𝒳| ≤ ν_d(γ) + |{X_i: X_i ∈ C, min_{w∈B} |X_i − w| ≤ γ}|,

where ν_d(γ) denotes the bound on the number of endpoints of minimal spanning tree edges of length greater than γ. Since C is disjoint from B and the singular support of {X_i}, we have the bound

P(X_i ∈ C and min_{w∈B} |X_i − w| ≤ γ) ≤ ‖g‖_∞ |J| {(1 + 2γ/l)^d − 1} l^d,

where ‖g‖_∞ = sup_x g(x) < ∞. Also, by (5.6) we have Σ_{j∈J} λ(Q_j) = l^d |J| ≤ ε, so the preceding bound gives us for 0 < γ < l/2d that

(5.16) P(X_i ∈ C and min_{w∈B} |X_i − w| ≤ γ) ≤ 2e‖g‖_∞ γ l^{-1} ε d,

where we have used 1 + x ≤ e^x ≤ 1 + ex for 0 ≤ x ≤ 1.
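In detail, the elementary estimate behind (5.16) runs as follows, with x = 2γd/l ≤ 1:

```latex
\Bigl(1+\tfrac{2\gamma}{l}\Bigr)^{d}
  \;\le\; e^{2\gamma d/l}
  \;\le\; 1 + \frac{2e\gamma d}{l},
\qquad\text{so}\qquad
\|g\|_\infty\, |J|\, l^{d}\Bigl\{\Bigl(1+\tfrac{2\gamma}{l}\Bigr)^{d}-1\Bigr\}
  \;\le\; \|g\|_\infty\, \varepsilon \cdot \frac{2e\gamma d}{l}
  \;=\; 2e\|g\|_\infty \gamma\, l^{-1} \varepsilon\, d.
```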
By Lemma 2.2 applied to M(𝒳) and by the law of large numbers applied to inequality (5.15) and the probability bound (5.16), we have

(5.17) limsup_{n→∞} M(𝒳) n^{-(d-a)/d} ≤ β'(a, d)(2e‖g‖_∞ γ l^{-1} ε d)^{(d-a)/d}.

Also, by (5.13) and (5.14), we have

(5.18) M(X_1, X_2, ..., X_n) ≥ M({X_i: X_i ∈ C}) − M(𝒳) − β''(a, d) |{X_i: X_i ∈ A}|^{(d-a)/d}.

The last two terms have already been bounded; so, as before, our problem is reduced to calculating the contribution from C.


To begin, take a minimal spanning tree T of {X_i: X_i ∈ C} and, for each j ∈ J', let D_j denote the set of edges e ∈ T such that both endpoints of e are in Q_j. Also, let 𝒳_j denote the set of points in Q'_j which are joined by an edge of T to a point in the complement Q_j^c. Since D_j together with a minimal spanning tree of 𝒳_j will span {X_i: X_i ∈ Q'_j}, we have

(5.19) M({X_i: X_i ∈ Q'_j}) ≤ Σ_{e∈D_j} |e|^a + β'(a, d) l^a |𝒳_j|^{(d-a)/d},

which, after summing over j ∈ J', yields

(5.20) Σ_{j∈J'} M({X_i: X_i ∈ Q'_j}) ≤ M({X_i: X_i ∈ C}) + β'(a, d) l^a Σ_{j∈J'} |𝒳_j|^{(d-a)/d}.

To handle the last sum, we note as in (5.15) that

(5.21) Σ_{j∈J'} |𝒳_j| ≤ ν_d(γ) + Σ_{j∈J'} |{X_i: X_i ∈ Q_j, min_{w∈Q_j^c} |X_i − w| ≤ γ}|,

and, as in (5.16), we have for 0 < γ < l/2d that

(5.22) P(X_i ∈ Q_j and min_{w∈Q_j^c} |X_i − w| ≤ γ) ≤ 2e‖g‖_∞ γ d l^{d-1}.

By (5.21), (5.22), Hölder's inequality and the law of large numbers, we have

(5.23) limsup_{n→∞} (Σ_{j∈J'} |𝒳_j|^{(d-a)/d}) n^{-(d-a)/d} ≤ |J'| (2e‖g‖_∞ γ d l^{d-1})^{(d-a)/d}.

Now, combining inequality (5.20), inequality (5.23) and the fact that |J'| l^d ≤ 1, and since γ is arbitrary except for the constraint 0 < γ < l/2d, we can take γ to be as small as we like and conclude

(5.24) liminf_{n→∞} M({X_i: X_i ∈ C}) n^{-(d-a)/d} ≥ liminf_{n→∞} Σ_{j∈J'} M({X_i: X_i ∈ Q'_j}) n^{-(d-a)/d}.

Finally, for each Q_j, relation (5.10) says the limit of M({X_i: X_i ∈ Q'_j}) n^{-(d-a)/d} exists and equals c(a, d) γ_j^{(d-a)/d} l^a, so summing in (5.24), we have

(5.25) liminf_{n→∞} M({X_i: X_i ∈ C}) n^{-(d-a)/d} ≥ c(a, d) ∫_C g(x)^{(d-a)/d} dx.

Since the Lebesgue measure of C is at least 1 − ε, inequality (5.25) completes the proof of a lower bound on the limit inferior which complements the bound on the limit superior given in (5.12). We have therefore completed the proof of Theorem 3 under the restriction that ψ(x) = |x|^a. The conclusion of the proof of Theorem 3 under the more general assumption that ψ(x) ~ x^a as x → 0 comes from Lemma 5.2.

LEMMA 5.2. Suppose x_i, 1 ≤ i < ∞, is any sequence of points in R^d such that, for 0 < a < d, we have M'(x_1, x_2, ..., x_n) → ∞ as n → ∞, where

M'(x_1, x_2, ..., x_n) = min_T Σ_{e∈T} |e|^a.

If ψ is monotone and ψ(x) ~ x^a as x → 0, and

M(x_1, x_2, ..., x_n) = min_T Σ_{e∈T} ψ(|e|),

then we have

M(x_1, x_2, ..., x_n) − M'(x_1, x_2, ..., x_n) = o(M'(x_1, x_2, ..., x_n)) as n → ∞.

PROOF. If we choose the edges of the two minimal spanning trees of {x_1, x_2, ..., x_n} under the weight functions ψ(x) and |x|^a by using Prim's algorithm, we will obtain the same tree T because of the monotonicity of ψ. The absolute difference of the tree weights is therefore given by

Δ_n = |Σ_{e∈T} ψ(|e|) − Σ_{e∈T} |e|^a| ≤ Σ_{e∈T} ε(|e|) |e|^a,

where ε(x) = |ψ(x) − x^a| x^{-a} and ε(x) → 0 as x → 0. Since

(5.26) Δ_n ≤ sup_{0<x≤δ} ε(x) Σ_{e∈T: |e|≤δ} |e|^a + sup_{x>δ} ε(x) Σ_{e∈T: |e|>δ} |e|^a,

we see the second sum of (5.26) is bounded independently of n by inequality (2.7) of Lemma 2.2, and the first sum of (5.26) is bounded by

sup_{0<x≤δ} ε(x) M'(x_1, x_2, ..., x_n).

By our choice of δ > 0, we can make sup_{0<x≤δ} ε(x) as small as we like, so we see that (5.26) implies Δ_n = o(M'(x_1, x_2, ..., x_n)), as claimed. □
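The first step of the proof — that a monotone weight function leaves the tree chosen by Prim's algorithm unchanged — is easy to check numerically. The following sketch is our own illustration (the function name `prim_mst_edges` is not from the paper); it builds the MST of a random point set under two monotone weights and confirms the edge sets agree:

```python
import math
import random

def prim_mst_edges(points, weight):
    """Prim's algorithm on the complete graph over `points`, where edge
    (i, j) costs weight(|x_i - x_j|).  Returns the MST edge set."""
    n = len(points)
    # best[j] = (cost, i): cheapest edge joining vertex j to the current tree
    best = {j: (weight(math.dist(points[0], points[j])), 0) for j in range(1, n)}
    edges = set()
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, i = best.pop(j)
        edges.add((min(i, j), max(i, j)))
        for k in best:
            c = weight(math.dist(points[j], points[k]))
            if c < best[k][0]:
                best[k] = (c, j)
    return edges

# Because x -> x**0.5 is monotone, the ordering of edge costs is unchanged,
# so Prim's algorithm selects exactly the same tree.
random.seed(7)
pts = [(random.random(), random.random()) for _ in range(60)]
assert prim_mst_edges(pts, lambda x: x) == prim_mst_edges(pts, lambda x: x ** 0.5)
```

This invariance is what lets the proof compare the two tree weights edge by edge over a single tree T.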

6. Concluding remarks. The results given here concern the almost sure convergence of a sequence of normalized random variables to a constant c(a, d). The natural questions associated with such a result are: (1) Can the constant be determined? (2) Can the basic strong law be supplemented with a result which provides information about the rate of convergence? (3) Can the strong law be supplemented with distributional results such as a central limit theorem?

Concerning the value of the constants c(a, d), there are some analytic bounds and Monte Carlo estimates, but there is little hope for an exact analytic determination. Gilbert (1965) proved that c(1, 2) ≤ 2^{-1/2} ≈ 0.707 and cited experimental evidence that suggested c(1, 2) = 0.68 as a good approximation. Roberts (1968) proved that 0.5 ≤ c(1, 2) and 0.554 ≤ c(1, 3) ≤ 0.698. Through a more extensive Monte Carlo analysis, Roberts (1968) estimated that c(1, 2) = 0.656 with a standard deviation of 0.002 and c(1, 3) = 0.668 with a standard deviation of 0.002.
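A small Monte Carlo sketch in the spirit of these estimates may be useful for orientation. This is our own illustration, not Roberts' computation, and `mst_length` is a name we introduce; it computes M_n for ψ(x) = |x| by Prim's algorithm and prints the normalized value M_n / √n, which should settle near c(1, 2):

```python
import math
import random

def mst_length(points):
    """Total Euclidean length of the MST of `points` (Prim, O(n^2))."""
    n = len(points)
    visited = [False] * n
    best = [math.inf] * n  # cheapest known edge into the growing tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        j = min((k for k in range(n) if not visited[k]), key=best.__getitem__)
        visited[j] = True
        total += best[j]
        for k in range(n):
            if not visited[k]:
                best[k] = min(best[k], math.dist(points[j], points[k]))
    return total

random.seed(1)
for n in (200, 800):
    pts = [(random.random(), random.random()) for _ in range(n)]
    print(n, round(mst_length(pts) / math.sqrt(n), 3))  # ratio near 0.6-0.7
```

Even at these modest sample sizes the ratio is already in the neighborhood of the quoted estimates, though boundary effects bias small-n values.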
There are two issues concerning the possibility of more detailed information on the rate of convergence. First, what can be said about the rate of convergence of the normalized means EM_n n^{-(d-a)/d} to c(a, d)? Here experience in the area of subadditive processes suggests that progress is unlikely. On the other hand, several approaches are likely to give more detailed information about the asymptotic size of M_n − EM_n. For example, the Efron–Stein inequality can be used iteratively as in Steele (1981b) to obtain moment bounds on M_n − EM_n; e.g., with ψ(x) = |x| and d = 2, one can show that for each 1 ≤ p < ∞ there is a constant B_p < ∞ such that

(6.1) E(M_n − EM_n)^p ≤ B_p

for all n ≥ 1. Even stronger bounds on the tails of M_n − EM_n should be possible using the inequalities of Section 2 together with the interpolation technique of Rhee and Talagrand (1986), which has proved remarkably effective in the traveling salesman problem.
Concerning the possibility of a central limit theorem, there is encouraging progress owing to Ramey (1982), who proved that a central limit theorem would follow if one could verify a certain complicated Ansatz which expresses a type of conditional independence between distant parts of the minimal spanning tree. The required Ansatz is analogous to results in statistical mechanics which have been rigorously justified, but so far there is no complete proof of a central limit theorem for minimal spanning trees.
Outside of this set of classical questions, there is an intriguing and delicate problem raised by Robert Bland. For X_i i.i.d. U[0, 1]^d, Bland observed empirically that the sums Σ_{e∈T} |e|^d seem to converge as n → ∞. Bland's observations and conjecture provided basic motivation for the present analysis of power-weighted minimal spanning trees, and it is intriguing that the case a = d seems particularly resistive. This conjecture bears an interesting relation to the result of Frieze (1985) concerning the weight M_n of the MST for the complete graph with edge weights chosen independently from the uniform distribution on [0, 1]. Frieze proved that M_n converges in probability (and in expectation) to the constant ζ(3) = 1.202... . An earlier result of Timofeev (1984) only established that EM_n ≤ 3.29, but the method of Timofeev is still intriguing because of its generality.
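Bland's observation is easy to reproduce in miniature. The sketch below (our own illustration; `mst_power_sum` is a name we introduce) computes Σ_{e∈T} |e|^2 for uniform samples in the unit square at several sample sizes; since the MST as a graph depends only on the ordering of the Euclidean edge lengths, it suffices to build the Euclidean MST and sum the a-th powers of its edges:

```python
import math
import random

def mst_power_sum(points, a):
    """Sum of |e|**a over the edges of the Euclidean MST of `points`.
    The MST is the same graph for any power a > 0 (monotone weights),
    so Prim's algorithm under Euclidean distance suffices.  O(n^2)."""
    n = len(points)
    visited = [False] * n
    best = [math.inf] * n  # cheapest Euclidean edge into the growing tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        j = min((k for k in range(n) if not visited[k]), key=best.__getitem__)
        visited[j] = True
        total += best[j] ** a
        for k in range(n):
            if not visited[k]:
                best[k] = min(best[k], math.dist(points[j], points[k]))
    return total

random.seed(3)
for n in (100, 400, 1600):
    pts = [(random.random(), random.random()) for _ in range(n)]
    print(n, round(mst_power_sum(pts, 2.0), 3))  # sums stay O(1) as n grows
```

In line with Bland's conjecture for d = 2, the printed sums do not grow with n, in contrast to the n^{(d-a)/d} growth proved here for a < d.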

Acknowledgments. Persi Diaconis, Ingram Olkin, Dan Ramey, Geoffrey Grimmett and Joe Marhoul have been kind enough to comment on the Stanford Technical Report 212, which focused on ψ(x) = |x| and which formed the initial basis for part of this report. The motivation to study the problem of power-weighted edges was provided by Robert Bland. The present exposition has been improved by comments provided by Timothy Snyder and the referees.

REFERENCES

APOSTOL, T. M. (1976). Introduction to Analytic Number Theory. Springer, New York.
BEARDWOOD, J., HALTON, J. H. and HAMMERSLEY, J. M. (1959). The shortest path through many points. Proc. Cambridge Philos. Soc. 55 299-327.
DUDLEY, R. M. (1968). Distances of probability measures and random variables. Ann. Math. Statist. 39 1563-1572.
DUDLEY, R. M. (1976). Probabilities and Metrics: Convergence of Laws on Metric Spaces, with a View to Statistical Testing. Lecture Notes Series 45. Matematisk Institut, Aarhus Universitet.
EFRON, B. and STEIN, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586-596.
FELLER, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York.
FRIEZE, A. M. (1985). On the value of a random minimum spanning tree problem. Discrete Appl. Math. 10 47-56.
GILBERT, E. N. (1965). Random minimal trees. J. Soc. Indust. Appl. Math. 13 376-387.
GILBERT, E. N. and POLLAK, H. O. (1968). Steiner minimal trees. SIAM J. Appl. Math. 16 1-29.
KRUSKAL, J. B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Amer. Math. Soc. 7 48-50.
MANDELBROT, B. B. (1977). Fractals: Form, Chance, and Dimension. Freeman, San Francisco.
PAPADIMITRIOU, C. H. and STEIGLITZ, K. (1982). Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, N.J.
PRIM, R. C. (1957). Shortest connection networks and some generalizations. Bell System Tech. J. 36 1389-1401.
RAMEY, D. B. (1982). A nonparametric test of bimodality with application to cluster analysis. Thesis, Yale Univ.
RHEE, W. T. and TALAGRAND, M. (1986). Martingale inequalities, interpolation, and NP-complete problems. Math. Oper. Res. To appear.
ROBERTS, F. D. K. (1968). Random minimal trees. Biometrika 55 255-258.
STEELE, J. M. (1981a). Subadditive Euclidean functionals and nonlinear growth in geometric probability. Ann. Probab. 9 365-376.
STEELE, J. M. (1981b). Complete convergence of short paths and Karp's algorithm for the TSP. Math. Oper. Res. 6 374-378.
STEELE, J. M. (1982). Optimal triangulation of random samples in the plane. Ann. Probab. 10 548-553.
STEELE, J. M. (1986). Probabilistic algorithm for the directed traveling salesman problem. Math. Oper. Res. 11 343-350.
STRASSEN, V. (1965). The existence of probability measures with given marginals. Ann. Math. Statist. 36 423-439.
TIMOFEEV, E. A. (1984). Random minimal trees. Theory Probab. Appl. 29 134-141.
WIDDER, D. V. (1946). The Laplace Transform. Princeton Univ. Press, Princeton, N.J.