Probability in the Engineering and Informational Sciences, 2, 1988, 143156. Printed in the U.S.A.

PROBABILISTIC ANALYSIS OF

A GREEDY HEURISTIC FOR

EUCLIDEAN MATCHING

DAVID Avis'

School of Computer Science

McGill University

Montr6al, Qudbec

H3A 2K6, Canada

BURGEss DAVIS2

Department of Mathematics and Statistics

Purdue University

West Lafayette, Indiana 47907

J. MICHAEL STEELE 3

Program in Statistics and Operations Research

Princeton University

Princeton, New Jersey 08544

Given a collection of n points in the plane, the Euclidean matching problem

is the task of decomposing the collection into matched pairs connected by line

segments in such a way as to minimize the sum of all the segment lengths. The

greedy heuristic provides an approximate solution to the Euclidean matching

problem by successively matching the two closest unmatched points We study

the behavior of G, the sum of the lengths of the segments produced by the

greedy heuristic. In particular, it is proved that if the points are realized as n

independent observations from a common distribution with compact support,

then Gnl~n converges to a constant with probability one.

'Research support by N.S.E.R.C. Grant A3013 and F.C.A.C. Grant EQ1678.

2Research support in part by NSF Grant DMS8500998.

'Research supported in part by NSF Grant DMS8414069.

@ 1988 Cambridge University Press 02699648/88 $5.00 + .00 143

i

1

144 D. A vis, 8. Da vis, and J. Michael Steele

1. INTRODUCTION

For any set 1 Xl, X2, . . , X, 1 of n points in the Euclidean plane, a matching is a collection of [n121 disjoint pairs of points. By the weight of a matching we will denote the sum of the Euclidean distances between the elements of the pairs in the matching.

The best known methods for finding a minimal weight matching are implementations of Edmonds' algorithm and have running times proportional to n' (see, e.g., Papadimitriou and Steiglitz [101). Because of the relative slowness of the Edmonds' algorithm, substantial attention has been given to heuristic methods for obtaining matchings which are almost optimal (see, e.g., Avis [21, Iri, Murota, and Matsui [71, Reingold and Supowit [121, and Reingold and Tarjan [131).

One particularly appealing heuristic is the socalled greedy algorithm which successively matches the two closest unmatched pairs of points. Even naive implementations of the greedy heuristic are faster than Edmond's algorithm, but by making use of specialized data structure one can do much better. In particular, Bentley and Saxe [41 provided an implementation of the greedy algorithm which has a worst case running time of 0(n 3/2 log n).

If G, and OPTn denote the respective weights of the greedy and optimal matchings of an nset f xi, X2,.. . , Xn 1 C R2, there are several known relations between Gn and OPT, (see Avis [1]). Among the most interesting is the ratio bound of Reingold and Tarjan [13],

G,/OPT,:5 (413)n 10921.1.

By providing an explicit construction, Reingold and Tarjan [131 also established that inequality (1.1) cannot be improved. Still, as ' one might suspect, (1.1) is often too pessimistic, and in many problems one finds that G, and OPT, are of the same order.

In Papadimitriou [ 11 ], the behavior of OPT, was studied for points ( Xi 1 which were chosen at random from [0,112 , and that paper outlined how the technique used in Beardwood, Halton, and Hammersley [31 to study the traveling salesman problem, could be modified to prove

OPT, ~ COPTJn (1.2)

with probability one.

One consequence of the main theorem proved here is that one also has

Q, ~ CG'1n

with probability one. The result to be proved provides convergence which is stronger than convergence with probability one and deals with random variables with general distributions in Rd.

THEOREM 1. 1: For each integer d > 2, there is a positive constant kd such that if X,, X2.... are independent and identically distributed random variables

i i

i i i i i i

A GREEDY HEURISTIC FOR EUCLIDEAN MATCHING 145

with values in R' and bounded support, and if G, denotes the Euclidean edge weight of the matching attained by the greedy algorithm applied to t X,, X2 X, 1, then given c > 0

P n 0dWd Gn kd f(x) (d1)1ddX > E < 00.

n=l C fR

Here, f is the density with respect to ddimensional Lebesgue measure of the absolutely continuous part of the distribution of the Xi.

There are several observations one should make concerning Theorem 1. 1 before attending to the proof. First, one should note that since the Xi have compact support, we always have

f f(x) (d 1) 1d dx < cc.

Second, the BorelCantelli Lemma applied to the conclusion of Theorem 1. 1 implies that n (1d) IdG,, converges with probability one. The stronger version of convergence guaranteed by Theorem 1. 1 is called complete convergence, and it is precisely such convergence that is most relevant to the probabilistic analysis of algorithms like the partitioning method introducing by R.M. Karp for the traveling salesman problems (cf, Karp [81 and Karp and Steele [9]).

The third observation concerns the distributional assumptions on the f Xi 1. If the support of the 1 Xi 1 is purely singular (i.e., P(Xi EE A) = 1, where A is a set of Lebesque measure zero), then f is identically zero and Theorem 1. 1 tells us that G, = o (n (dl)ld) with probability one. Also, there is a small technical point which should not be ignored. If the (Xi 1 have a nontrivial singular part and a nontrivial absolutely continuous part, then the associated density f is not a probability density, since it does not integrate to 1.

Finally, since the essential ideas behind the proof of Theorem 1. 1 are already present when d = 2, we will restrict ourselves to that case in order to keep the notation uncluttered.

2. PRELIMINARY LEMMAS

Several nonprobabilistic results about the greedy matching algorithm will be needed. The best of these, Lemma 2.4, spells out a basic combinatorial smoothness of the greedy functional, but first we need some bound on the lengths of the edges chosen by the greedy heuristic.

LEmmA 2. 1: There is a constant c, such that for any square S of side s and any n points [xl,x2,...,x,,) C S, we have Ixi xjI < clsn 1/2 for some < j :5 n.

PROOF: If each xi were covered by a disc with center at xi and radius r, and, if all of the discs were nonintersecting, then each disc would cover at least

146 D. A vis, 8. Da vis, and J. Michael Steele

irr'14 of the square. The total area of the square covered by the discs then

would be at least n7rr'14, hence we have nirr'14 :!~ 1 and establish the lemma

with a crude cl = 41r 1/2 (see also Few [61). a

LEMMA 2.2: There is a constant c2 such that any k edges of a greedy matching which lie completely in a subsquare S of [0, 1] 2 of side s will have total weight at Most C2Ck S.

PROOF: List the edges el,e2,. . . ek which lie in S in the reverse order from

which they were chosen for the greedy matching. Note for any 1 :5 j < k that

when edge ej was chosen for the greedy matching, there were at least 2(k j) of

the Xi, namely, the endpoints of the rest of the ei, which were yet unmatched in

S. Thus, by Lemma 2. 1, the length of ej is bounded by cl s (,f2 (k j) + 2)1,

k

and the lemma follows, since E (~2 (k j) + 2) ' < 2 Ck.

j=1

LEMMA 2.3: For any 6 > 0, a greedy matching of [ xl, x2_ ., x, 1 C [0, 11' has at Most C3 62 edges with length as great as 6.

PROOF: If there are r edges as large as 8, then the total length of these edges is at least 7.8. Now, by Lemma 2.2, this total length of any 7. edges of the greedy

/2 C2/32, :5 C2. a

matching is bounded by c2i' . Hence, we have r :s 2 giving C3 2

We now have the tools in place to prove the main result of this section. LEMMA 2.4: There is a constant c4 such that 1 G (xl, x2,. . . , x,,) G (xl, x2, .. 9Xn+k)l :5 C4Ck.

PROOF: First let el, e2,. . . , e, be a list of edges of the complete graph on 1 X, 3 X2,. . . , Xn+k 1, and suppose that the ei are listed in order of increasing length. An algorithm which simultaneously and inductively constructs a greedy

matching A Of t Xl, X2,. . . , X~ 1 and a greedy matching B of [ xl, x2 X,,

can be given as follows: for i = 1 to s, do

(i) if ei has no endpoint in common with the edges of A, add ei to A,

(ii) if ei has no endpoint in common with the edges of B and no endpoint in D = ( Xn+ 1, Xn+2,. . . , Xn+k 1, add ei to B.

We now need to recall some of the structure which comes with a union of

two matchings like A and B. In the first place, an edge can be placed in both

A and B (such edges will be called double edges of A U B). Also, the union

of A and B may have an important structure called an alternating path. Here,

an alternating path is a path whose edges are alternately in the matchings A and

B. For example, the path through the vertices V1, P2, . ., Vk would be called an

alternating path in E A for odd i and (;,i, pi+l) E= B for even i (or vice

i

i i

A GREEDY HEURISTIC FOR EUCLIDEAN MATCHING 147

versa). By default, any path having only one edge will be considered as a bona fide alternating path. (Alternating paths are a basic tool of matching theory, see, e.g., Bondy and Murty Q51, pp. 7079.)

In order to make the structure of A U B explicit, we will verify by induction the following hypothesis.

At iteration i, all nondouble edges of A U B are either

(a) edges with an endpoint in D, or

(b) edges which belong to at least one alternating path terminating at a vertex in D. Moreover, this alternating path has edges with monotone decreasing length as one traverses the path toward the edge which meets D.

We will now verify the induction process. The hypothesis is valid for i = 1, so we assume the hypothesis is valid for i and consider edge ei, I which we can assume without loss to be a nondouble edge contained in A U B. If ei,l has an endpoint in D, condition (a) holds and the induction hypothesis is verified for i + 1. We are left with two cases:

Case 1: Suppose ei, I e A but ei, I §~ B, and ei, I has no endpoint in D. Since ei+l (Z B, one endpoint of ei+l must meet an edge ej, j :5 i, which is already in B but which is not in A. Since j < i and the edges are ordered, we have 1 ej 1 :s 1 ei 1 . Also, by the induction hypothesis, ej belongs to an alternating path of A U B having monotone edge lengths and terminating with an edge which has a vertex in D. Therefore, in this case, the induction hypothesis is verified.

Case 2: This concerns the possibility that ei+ l e B but ei+ l 0 A and ei+ l does not meet D. The induction step is identical to that of Case 1.

To complete the proof of the lemma, we have to bound the absolute value

of the difference Y, 1 e 1 E 1 e Since the edges which are in both A and

eE~A CEB

B will cancel, we only need to estimate the difference over those edges which belong to an alternating chain (of length k 1) with termination in D. If C is any such chain, we have that the difference

AC= E jel E jel

er=Anc er=gnc

is bounded in absolute value by max e e E= C), since the edge lengths of the edges in C are monotone. Moreover, each chain C must end in D and each vertex of D is in at most one edge of A, so the total number of alternating chains C is at most 1 D 1 . Since no two chains share an edge, we have that the sum E 1 Ac 1 is bounded by the sum of M 5 j D 1 = k edges. If MA of these c edges belong to A and MB belong to B, then we see by Lemma 2.2 that

148 D. A vis, 8. Da vis, and J. Michael Steele

E 1 AC 1 :5 C2 (MA112 + M8112). (2.1)

c

Since MA + MB < k, we see that Eq. (2. 1) completes the proof with C4 = 2 C2. E

With this basic lemma established, we are in a position to relate local properties of the greedy matching to global properties of the matching.

In the subsequent analysis, it will be essentialto know how breaking the unit square Q = [0,1]' into m' equal subsquares Qi, 1 < i:5 M2, changes a greedy matching. For any set K C [0, 112, we let aK denote the boundary of

M2

K, i.e., the points in the closure of K and K'. Next we let F U aQi aQ, i=1

i.e., F is the interior grating of the partition of Q given by the Qj. For 6 > 0, we will let F' denote the set of points of Q which are within 6 of F, i.e., F' is an interior grating thickened by 3. Similarly, we let Q,~ = Qi P.

The same method used in Lemma 2.4 can be used to relate the greedy matchings on the decomposition of Q. By simultaneously constructing all m 2 + 1 matchings of the Qi6, 1 :5 i:5 M2, and Q, we obtain the following:

LEmmA 2.5: Let A denote the greedy matching of ( xl, x2,. . . , xn 1 C Q and let B be the union of the M2 greedy matchings of txl,x2,...,xnj n QP, 1 :5 i:5 m 2 . The union A U B consists of (a) double edges, i.e., edges belonging to both A and B, and (b) edges which are part of a monotone alternating path which contains an edge of A which either (i) has an endpoint in F6 or (ii) has endpoints in two different Qp.

Here, one should recall that some of the socalled alternating chains may be of length one, i.e., chains consisting of a single edge are still regarded as honest alternating chains.

The next lemma shows that despite the a priori global nature of the greedy matching process, it can be approximately localized. To express this precisely, we let G(S) denote length of the greedy matching of (Xl,X2,..., x, 1 n s and allow the dependence of G (S) on n to be implicit.

LEmmA 2.6: Given 8 > 0, let 7. denote the number of edges in the greedy Jching of Q which have either (i) an endpoint within 3 of F, or (ii) length at least as great as 26. One then has the bound,

2

1/2

G (Q) G (Qi) < CS T (2.2)

M1

PRooF: We let A and B denote the matchings given in Lemma 2.5, and we note

that by the same considerations detailed in Lemma 2.4 the absolute value of

the difference E 1 e E 1 e 1 is majorized by

eE=A eE=B

Ylmax[lel: e E= C). (2.3)

c

A GREEDY HEURISTIC FOR EUCLIDEAN MATCHING 149

The total number of distinct edges in the sum given above is bounded by 7, so arguing as in Eq. (2. 1) of Lemma 2.4, we can bound the sum given by Eqs. (2.3) by C2.r 1/2, establishing the lemma with c5 = C2.

3. UNIFORMLY DISTRIBUTED CASE

For any two sequences of random variables [A,) and [B, 1, which are defined on the same probability space, we say [A, 1 and (B, 1 are equivalent and write

A, =_ B, provided that for any E > 0, we have P (1 A, B, 1 > E) < oo.

n=l

This relation is a bonafide equivalence, and we will particularly make use of transitivity, i.e., A, =_ B, and B, =_ C, imply A, =_ C,

In this section, X1,X2,..will stand for independent and identically distributed random variables uniformly distributed in the unit square of R 2

The main result of this section can be stated quite succinctly.

THEOREm 3. 1: There is a constant c, 0 < c < , such that

G (X,, X2,.. . , XJ 1ffin =_ c.

The proof of Theorem 3.1 requires several lemmas. The first of these spells out two simple properties of triangular arrays which prove to be very handy.

LEmmA 3. 1:

(i) Let Ykj, 1 :s k < oo, 1 :5 i < nk, be a triangular array of random variables. For each k, we assume the nk random variables Ykj, 1 < i:s nk, are independent and identically distributed. We let ttk denote the mean of Ykj and set E(Ykj 9k )4 = Ink. If we have liminf nklk > 0 and liMSUP Ink < 00, then

nk

Y, YkjInk 9k.

i=1

(ii) If Z1J2,. .is a sequence of Poisson random variables with any joint distribution, and if EZk = ak, then lim sup akIk < 00 implies (Zk ak)lk 0. PROOF: To cheek part (i), we first center the series Wki = Yki Uk and calcu

late the fourth moment,

nk nk

W

kiy] 2 2

E[ EWk4i + 3nk(nk 1)EWklIVk2.

By Schwartz's inequality, the second summand is bounded by nk(nk I)Mk, so

nk

E WkiInky] :5 21nk 2 Ink.

1

i 1

150 D. A vis, 8. Da vis, and J. Michael Steele

Now for any E > 0

P( WjIn, > E 5 E 4 E [( nk WkiInk )41 5 E 4 3n 2 Mk,

k

and Mknk2 converges because liminf nklk > 0 and liMSUP Mk < 00.

The second part of Lemma 3.1 follows from the first part by noting that

each of the Poisson random variables Zk can be represetned as the sum of k

independent Poisson variables Ykj, each having expectation aklk. The [ Ykj

then form a triangular array to which Lemma 3.1(i) can be applied. a

Returning to the proof of Theorem 3. 1, we let F (t) be a Poisson random variable with parameter t which is independent of the sequence [Xil. By Lemma 2.4, we have the bound

1 G (X,, X2,.. , Xj G (X, .... YXr(n))l :5 c4IF(n) ni 1/2 so, for any E > 0, we have

G (X,, X2,. . . , Xj G(Xl,_,Xr(,,))

,fn ~n

k=l (1

IP(n) n 2c~2

P n > E <

k=l ( 1

where the last inequality comes from applying Lemma 3. 1 (ii). Thus, to prove Theorem 3. 1, it suffices to prove

G(Xl, ..,Xr(n))14rn = C. (3.1)

Now, if R is any subregion of [0,112, and (xl,x2, ...,xn) C [0, 1] 2, let GR (xl, x2_ . . , xn) be the greedy matching of the xi which are elements of R. If S is a subsquare of [0,1]2 of side length s, then the number of elements of [ xl, x2,...,x,(,)) n s isPoissonwith parameter S2t, so by scaling we see Gs(Xl,_,Xr(,)) is equal in distribution to sG(Xl,.. ~xr(S2t)).

Now, if weleto(t) =EG(X,,...,Xr(t)) andb(t) =E[(G(X,,...,Xr(,)) 0(t))'], then from Lemma 2.4, we can see both (p and 6 are continuous and hence bounded on compact intervals. Finally, we recall that if (Si 1, 1 :s i < N are any subsets of [0, 1] 2 such that Si n Sj has measure zero for each i and j with i # j, then for any variables Zi, 1 < i < N, such that Zi is a function of the random point set (xl,x2,...,xr(.)i n si, then the variables IZil are independent.

Next, let M be an integer, let 0 (n) be the largest integer satisfying n10 (n)' > M, and note that n10 (n)2 > M as n > co. When we divide [0, 1] 2 into 0 (n)'

A GREEDY HEURISTIC FOR EUCLIDEAN MATCHING 151

equal squares S1,S2, ... 9SO(n)2~ we see by Lemma 3.1(i), the boundedness of 6 on compact intervals, and the continuity of 0, that we have

0(n)2

Gs,(Xl_ . .,Xr(,))10(n) =_ 0(n10(n )2) = O(M). (3.2)

We now need to relate this Poisson limit result to our original problem.

LEMMA 3.2: Given E > 0 there exists MO such that M ~t MO implies

P(~0(n)2

yj Yj Gs, (X,, . . . , Xr (,)) 1,1n G (X,, X2, . , X,) 1,1n > E < 00.

n=l i=1

(3.3)

PROOF: We first fix ?\ > 0, and put 6(n) = X/20(n); so, in the notation of Lemma 2.5, the area of F'(") does not exceed X. Next, let u (n) denote the number of edges of the greedy matching of ( X,, . . , Xr (n) 1 which have length as great as 25 (n), and let P (n) denote the number of points lying in F'("). By Lemma 2.3, we have u(n) < C3(26(n)) 2, so since 0(n)2 5 n1M, we have U0) < C3n1MX2. Also, P(n) is a Poisson random variable with mean not exceeding 2Xn, so by Lemma 3.1(ii), we have EP(v(n) > 3Xn) < oo. Thus, if M is chosen so large that C3 IM)~ < X, we have

P((u(n) + P(n)) 1/2 > (4Xn) 1/2 < 0,.

n=l

In view of Lemma 2.6 and the definitions of u and v, we see the proof of Lemma 3.2 is complete.

We can now return to the completion of the proof of Theorem 3. 1. By the equivalence relation (3.2), Lemma 3.2, the continuity of 0, and the fact that n10(n)2 _ M' we see that for any E > 0 there is an M, such that M k M, implies

C.

E P(IG(XI,X2,...,X,,)1~n O(M)1~M1 > E) < 00. (3.4)

n=l

One final (and slippery) observation is that Eq. (3.4) implies that O(M)1~Mconverges to a limit as M oo. To check this, note that if a = liminf O(M)1~M # limsup O(M)1~M = P one could not have Eq. (3.4) for E < (0 a)/3. Therefore, liM M 1/2.0 (M) = C < oo exists, and n 112Gn = C, COM_ pleting th6 proof of Theorem 3. 1.

4. GENERAL EXTENSION

We now extend the results on uniform distributions to random variables with compact support. The essence of the problem continues to be visible with d =

152 D. A vis, 8. Da vis, and J. Michael Steele

2, so we will restrict attention to that case. This time we let X1,X2,..be an infinite sequence of independent, identically distributed random variables with compact support in R' and denote the common distribution of the Xi by P.

We can assume without loss of generality that P has support in [0,112 , and we claim there is also no loss of generality in assuming

v f(x,y): x or y is rational) = 0. (4.1)

To check this, first note that there is an uncountable set A C IR such that the difference of any two elements of A is irrational. One way to build A is to let the singleton [ 11 be extended to a basis B for IR viewed as a vector space over and take the required set A to be B ( 11.

The uncountability of A and the fact that any differences of elements of A are irrational guarantees that there is an a EE A such that v [ (x, y): x a + r, r rational 1 is zero and a b E A such that P [(x, y): y = b + r, r rational equals zero. The translation of P by the pair (a, b) then satisfies Eq. (4. 1). Moreover, any further translation of P by rational numbers, or dilation of P by rational numbers, still satisfies Eq. (4. 1); in particular, any translation and dilation that maps the support of j, to [0,112.

We next denote the singular part of P by g. Let f stand for the density of the absolutely continuous part of g, and define our target limit constant k by k = c f f 1/2 (x) dx,

(0,1] 2

where c is the constant of Theorem 3. 1.

The main result of this section is the following. THEoREm 4.1: G(Xl,X2_. _X,)IVn =_ k. PROOF: As before, it suffices to show that

G (X,, . . . , Xr (,,) ) / ~\1n s k. (4.2)

We begin the proof of Eq. (4.2) by decomposing the square just as was done in Lemma 2. 5. Next, we define a piecewise constant approximation to f on Q = [0,1]2 by letting f, (x) = m' £i f for x in the interior of Qi and let

ting f (x) = 0 for any x not in the interior of some Qi. Also, for any finite set S, we let #S denote the number of elements of S.

Now, choose 0 < ii < 1 and pick an integer m (depending on n) which is sufficiently large to guarantee the following four approximation conditions:

(i) There exists a subcollection ( Q, i C E 1 of Qi, . . . , Q,2 such that #EIM2 < il and A ([0, 1] 2 H) < q, where H U Qi.

iEE

i

i

A GREEDY HEURISTIC FOR EUCLIDEAN MATCHING 153

f P12 f fM1/2 <

[0,1] 2 [0,11 2H

f if f.1 <

[0,1] 2

(iv) m > I/n.

We let Pn, = ~ be the probability measure with singular part tt and absolutely continuous part f,,, and note that for any compact set K we have 1 P (K) P (K)l < 1q. By the coupling lemma of Strassen [161, there will then exist a pair of random vectors (Zl, W,) such that Z1 has the distribution P, W, has the distribution P, and P(Z1 # W,) < 1q. We next let (ZI, W,)__ be independent copies of these vectors, and for each integer n, we let l` (n) be a Poisson random variable which is independent of the sequence f(Zi, Wi)).

Finally, we can lay out notation for the variables of basic interest:

Un = ( Z1, 4 (n) 1,

Pn = f W1, wr(n) 1,

V,k= VnQ, 1 :s k :5 m',

ek = vkn(Qk support g 1, and

A, = UnA V, where A denotes the symmetric difference.

Since the Zi have the same distribution as the Xi, the complete convergence of n 17 (n) n 1 to zero and Lemma 2.6 lead us to

G(Un)l~n =_ G(Xl,X2,. . .,Xn)lln, (4.3)

by the same path used in Section 3. Now, since An has the Poisson distribution with parameter not greater than 2nq, we see from Lemma 2.4 and Lemma 3. 1 (ii) that there is a constant 00 not depending on m such that

m

Y, P(JG(U.)1~n G(Pn)Iffinj > Poq) < 0.. (4.4)

n=l

Next, we claim that there is a constant 01 not depending on m such that

M2

P G (Pn) 1,1R E G (Vnk) 1,1n > < 00. (4.5)

n=l C k=l

To prove Eq. (4. 5), we first note that by the assumption made in Eq. (4. 1) we can choose 8 (n) = 6 to be so small that v (F,',,) < i?. The number of points of Vn in Fm4 then has the Poisson distribution with parameter n;,(Fan) < nq,

1

154 D. A vis, 8. Da vis, and J. Michael Steele

and by Lemma 2.3 the number of edges in the greedy matching which have length exceeding 25 does not exceed C3 (25) 2 . The proof of Eq. (4.5) is thus completed by applying Lemma 2.6 and arguing again just as in the proof of Lemma 3.2.

We next claim that there is a constant 02 not depending on m such that

(~ M2 n,2 1/2

E p Y, G ( Vk) 1,1n Y, G (Wk) 1,1n 02 n < 00. (4.6)

n=l k=l k=l

To prove Eq. (4.6), we first note that #Vk is a Poisson random variable with mean nP(Qk) and for each fixed n the variables Wk, 1 ::~ k ~~ M2, form an independent collection. By Lemma 2.2, scaling, and the elementary inequal

ity In, + + In, :s fs~ ~ + + ns . we have,

G (Vnk) 5 C2 M Vnk

kEE keE

# V'k.

:S C2 M ',,F#E ~A ~IEE

Now, E# Vnk is a Poisson random variable with parameter less than or equal to n, and the approximation condition (i) says (#E) "'lm < q, so Lemma 3. 1 (ii) gives

P n 1/2 E G(Vk,) > 2C2n < OD. (4.7)

n=l (1 kE=E

By a completely similar analysis, we also have

EP n 1/2 E G(Wnk) > 2c2n) < co. (4.8)

n=0 ( ~ keE

Lemmas 2.2 and 2.4 show that

Y, 1 G ( Vk,) G ( Wnk) 1 :5 c4 m V# ( Vk, Wnk)

k0E k~tE

~M2 # (Vk Wk

:5 C4M

k§~E

Now, since the sum Y, # (Vk Wk) is a Poisson random variable with mean

k0E bounded by nq, Lemma 3.1(ii) gives

G (V.k) G (Wk, > 2c4,o 1/2 < Co.

n=0 k" ko4E

Together with Eqs. (4.7) and (4.8), this last inequality completes the proof of Eq. (4.6).

1

1

A GREEDY HEURISTIC FOR EUCLIDEAN MATCHING 155

Finally, we note that

M2

E G(Wk)n'12 c fm 1/2

k=l f

follows from

= (C1M2 1/2,

G (Wk) )ak (4.9)

where Uk is the value of fm on Qk. Given that Xi E Qk, we know Xi is uniformly distributed on Qk, so the relation (4.9) follows from scaling, the fact

Itr2

that #Wk" has a Poisson distribution with parameter nUk Theorem 3. 1,

and Lemma 2.4.

5. OPEN PROBLEM AND RELATED WORK

The limit theory for greedy matchings has been established with most of the generality and precision one might expect. Still, there are some tenacious open problems.

It is not difficult to extend the restriction of Theorem 1.1 a little way beyond compact support, but it does not seem easy to provide a broad sufficient condition even if one restricts attention to absolutely continuous random variables.

A second problem of interest concerns matching in one dimension. By the natural onedimensional analogue of Lemma 2.1, one can show that Q, = 0(log(n)), and it is reasonable to conjecture that for Xi independent and identically distributed on U[O, 11, one has G, =_ c log n with probability 1. The tools used here are not sharp enough to deal with this question, and it is not even known if EG, )~ co and n+ cc. For a related problem concerning optimal triangulation, one may consult Steele [15].

These last two problems seem solvable, but there are other open issues associated with Q, which are less likely to be resolved. First, we do not know any analytical method to determine (or even estimate) the value of the constant kd which appears in our limit theorem, although one can get some idea of kd by MonteCarlo experiments. Second, we strongly expect that the approximate representation of G(Q) as a sum of independent and identically distributed random variables G(Qj) should be strong enough to permit the development of a central limit theorem. Still, it is not yet possible to extract the central theorem from Lemma 2.6, and the central limit problem is likely to be much more difficult than that lemma suggests. For related problems for a number of different functionals, one may consult Steele [141.

References

1. Avis, D. (198 1). Worst case bounds for the Euclidean matching problem. Computers and Math

ematical Applications 7:251257.

2. Avis, D. (1983). A survey of heuristics for the weighted matching problem. Networks

13:475493.

156 D. A vis, 8. Da vis, and J. Michael Steele

3. Beardwood, J., Halton, J.H., & Hammersley, J.M. (1959). The shortest path through many points. Proceedings of the Cambridge Philosophical Society 55:299327.

4. Bentley, J.L. & Saxe, J.13. (1980). Decomposable searching problems 1: static to dynamic transformations. Journal of Algorithms 1:301358.

5. Bondy, LA. & Murty, U.S.R. (1976). Graph Theory with Applications. New York: American Elsevier Publishing Co., Inc.

6. Few, L. (1955). The shortest road through n points in a region. Mathematika 2, 141~144.

7. Iri, M., Murota, K., & Matsui, S. (1981). Linear time heuristics for the minimum weight perfect matching problem on a plane. Information Proceedings Letters 12:206209.

8. Karp, R.M. (1977). Probabilistics analysis of partitioning algorithms for the traveling salesman problem in the plane. Mathematics of Operations Research 2:209~224.

9. Karp, R.M. & Steele, J.M. (1985). Probabilistic analysis of heuristics. In E.L. Lawler et al. (eds.), The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. NY: John Wiley and Sons, pp. 181~206.

10. Papadimitriou, C.H. & Steiglitz, K. (1982). Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: PrenticeHall.

11. Papadimitriou, C.H. (1977). The probabilistic analysis of matching heuristics. Proceedings of the 15 Allerton Conference on Communication Control and Computing, pp. 368~378.

12. Reingold, E.M. & Supowit, K.J. (1983). Probabilistic analysis of divide and conquer heuristics for minimum weighted Euclidean matchings. Networks 13:4966.

13. Reingold, E.M. & Tarjan, R.E. (1981). On a greedy heuristic for complete matching. SIAM Journal of Computing 10:676681.

14. Steele, J.M. (1981). Subadditive Euclidean functionals and nonlinear growth in geometric probability. Annals of Probability 9:365376.

15. Steele, J.M. (1982). Optimal triangulation of random samples in the plane. Annals of Probability 10:548553.

16. Strassen, V. (1965). The existence of probability measures with given marginals. Annals of Mathematical Statistics 36:423439.

i

1

1