Adt. Appl. Prob. 14, 5667 (1982)

Printed in N. Ireland

00018678/82/01005612$01.45

@ Applied Probability Trust 1982

STEINHAUS'S GEOMETRIC LOCATION PROBLEM FOR RANDOM SAMPLES IN THE PLANE

DORIT HOCHBAUM,* CarnegieMellon University J. MICHAEL STEELE,** Stanford University

Abstract

Let

Mn Xill"

min IIXi

i.s

where X,, l:5 i 5_ n, are i.i.d. and uniformly distributed in [0, 11'. It is proved that M. cn 1p12 a.s. for 1 :_5 p < 2. This result is motivated by recent developments in the theory of algorithms and the theory of subadditive processes as well as by a wellknown problem of H. Steinhaus.

LOCATION PROBLEM; kMEDIAN PROBLEM; PROBABILISTIC ALGORITHM; SUBADDITIVE PROCESS; SUBADDITIVE EUCLIDEAN FUNCTIONAL

1. Introduedon

The work of Steinhaus (1956) was apparently the first explicit treatment of the natural question 'How should one choose n points from a mass distributed in the plane so as to best represent the whole?' The main objective of this article is to treat a stochastic analogue of Steinhaus's problem.

One principal motivation for this stochastic analogue comes from developments in the theory of algorithms. The first of these is the discovery by Karp (1977) of an eflicient probabilistic algorithm for solving the traveling salesman problem. The second development was the proof of Papadirnitriou (1981) of the conjecture of Fisher and Hochbaum (1980) that the Tuclidean kmedian location problem' is NPcomplete.

More will be said about these algorithmic considerations in a later section, but we should first state our results. For any xi ER' and integers k and n we define

n

(1.1) M(k;Xl, X2, Xn) = min Y. min 11x, x, 11.

5:ISI=ki=i ies

Received 11 November 1980; revision received 31 March 1981.

*Postal address: Graduate School of Industrial Administration, Carnegie~Mellon University, Pittsburgh, PA 15213, U.S.A.

** Postal address: Department of Statistics, Stanford University, Stanford, CA 94305, U.S.A.

Research supported in part by the Office of Naval Research Contract N0001476C0475.

56

i

1

Steinhaus's geometric location problem for random samples in the plane 57

(Here the minimum is over all S C {Xl, X2, Xn} such that S has cardinality

IS1 = k.)

In words, one views S as a ' set of k 'centers', and each 'site' xi is served by its

nearest center xi E: S at a cost equal to llxixil. The quantity

M(k; Xl, X2, . ',Xn) is therefore the minimal cost attainable by an optimal

choice of k centers. This language is chosen in sympathy with the applications

which have been suggested in Bollobds (1973), Cornuejols, Fisher and

Nernhauser (1978), and Starret (1974).

Our main result is the following.

Theorem 1. If Xi, l:5 i < , are independent and uniform on [0, 112, then for any 0

(1.2) lim n"Mffna ]; X,, X2, Xn) = C~

n

with probability 1 for some constant 0 < C~ <

For reasons which will be discussed in the last section it will be useful to generalize this result slightly. For any 1 ~= p < we define

n

(1.3) M,(k;xl,X2,,X.)= Min Yminlic.xjiiP.

S:1Sl=ki=1 irs

Under the same hypotheses as in Theorem 1 we shall prove the following. Theorem 2. For any 1 1 p < 2 and 0 < a < 1 we have with probability 1 lim M, ([an 1; X,, X2, XJIn 1p12 = C."

n

for some constant 0 < Cp <

Naturally, it will suffice to prove just Theorem 2, and this proof will occupy the next three sections. In Section 2 we concentrate on the key combinatorial observations which will make the theorem possible. Section 3 contains a Tauberian argument which partially parallels that used in Steele (1981b), but with the significant difference that the present problem lacks the monotonicity previously relied on in the theory of subadditive Euclidean functionals (Steele (1981a,b)). The elementary Lemma 3.4 on the 'differentiation' of an asymptotic series is one of the devices introduced here which may prove useful in other nonmonotone problems.

The proof of Theorem 2 is completed in Section 4 with the help of the EfronStein jackknife inequality teamed with the combinatorial lemmas of Section 2 and classical arguments.

The last section briefly discusses the algorithmic application of this result. We also discuss the extension of our results to nonuniformly distributed random variables, and comment briefly on some open problems.

i

58 DORIT HOCHBAUM AND J. M. STEELE

2. Combinatorial and geometric lemmas

Most of the observations which are special to the location problem are contained in this section, but one can find some hidden generality since most subadditive Euclidean functionals have properties analogous to those that follow.

Lemma 2. 1. There is a constant such that for all Xi E [0, 1]2, 1 :5 i :5 n,

and k 2

p12

(2.1) M,(k;X1,X2,'

S2 (S + 1)2 [0, 1]2

Proof. Let an integer s be chosen so that = k , and divide into S2 cells of side s1. For each occupied cell Cj choose one element x'EE Cj

S, = {X,: 1 t~_ i:5 S2}.

and set i Since each cell has diameter J2s1 and since

IS'l::5 k, we have the generous bound

M, (k;Xl, X2, x.) n(J2s')P 2P"n(kA 1)P

which yields the lemma.

The next lemma will provide the combinatorial linchpin required in our variance bounds.

2

Lemma 2.2. There is a constant J3'= J3p' such that for all xi e R ' 1 :5 i :5 n, and k 1 2 the difTerence

MP(k; Xl, X2, xn) M,(k + 1; Xl, X2, Xn)Ap

satisfies the bounds

1p12

(2.2) 0 AP:5 O'nk

Proof. Without loss we can assume that all of the 11x, .~.IIP are diflerent. For any set of optimal centers S associated with M, (k + 1; Xl, X2, X.) we define for each i E S the set

N(i) ={j; IIXJ X,11

This gives the representation

p

MP(k + 1; Xl, X2, X.) IIX, XIII

ieS jeN(i)

We now note that

(2.3) #{iE:S: Y, ilx,.x1IIP>4M,,(k+l;xl,x2,...,x,.)/(k+l)}5(k+1)14

jeN(i)

and also that

(2.4) #{iES: IN(i)l ~:4n/(k + 1)}:5(k + 1)14.

i

i

i

Steinhaus's geometric location problem for random samples in the plane

This implies that for H defined by

H iES: 1 iixixiliP:540n(k+l)P"/(k+l) and IN(i)i~4nl(k+l)

jeN(i)

59

We have by (2.3), (2.4), and Lemma 2.1 that #M:(k + 1)/2.

This time we choose s so that S2< (k + 1)/2:!!i (s + 1)2 and again divide [0, 112 into S2 cells of side s1. Some cell must contain two elements of H, say x,, and Xi2'

We now define a suboptimal choice for the k centers by using S'= S\{xJ. We continue to serve all Ui,i,{xi:jE=N(i)} as before but now serve the elements of {xi : j E N(iJ} by Xi2 . The cost of serving {xi : j c N(iffl by xi. is

1 lixixi."ilp=

(2.5) icN(i,)

+ lixi, xi,l~p)

iE=N(i~)

2P140n(k + 1)P"` + (.~,/2sl)P(4n)/(k + 1)}.

For k ~t 2 the last expression can be bounded by O'nk 1p12 ; and, since (2.5) majorizes AP, this proves the second inequality in (2.2). The first inequality is trivial, so the lemma is complete.

One often finds it useful to know that in an optimal allocation no site is very far from a center.

Lemma 2.3. There is a constant 0" = 0" with the property that for any S

p

with IS1 = k which minimizes Yin=l mini,s 11.~c, Ycillp we have

1p12

'p'n k

8, 8p(k; Xl, X2, Y,,) max min lix, )~ilp

l~j:.n ieS

Proof. If we take S'= S U {x'} where x' is the element Of {X 1, X2, X.}

farthest from any element of S then we have

MP(k+l;xl, X2, x.):5MP(k;Xl,X2, X.)8p.

1p12

By Lemma 2.2 we then see that 8p S O'nk ' so we can take JT.

The next lemma will be useful in applying jackknife methods to obtain variance bounds. By the notation g. we mean that xi is to be omitted from the sample, thus {Xl, X2, ' ' . ' :~. ' . . . 5 XJ= {Xl, X2, ' * * ' Xii, Xi+l, XJ. Also, we let 1(y:t~: 8) be 1 or 0 accordingly as y S 8 holds or not.

Lemma. Setting 8=8p(k;X1,X2, ''.,x,) and mi =mini:joi iix,x,.IIP,we have

MP(k;Xl, X2, g xJ:5 MP (k;Xl, X2, XJ

(2.6) n

+ 2P8 1 (Iixi x' lip = 8)

and

(2.7) Mp(k;Xl,X2,''.,x.):5MP(k;xl,X2,'

,k x.) + 2P (ni, + 8).

i

60 DORIT HOCHBAUM AND J. M. STEELE

Proof. To prove the first inequality let S be an optimal choice of k centers. for {Xl, X2, * * . 1 x,}. If xi 0 S the inequality is trivial, so suppose xi c S. We now choose a suboptimal set S' of k centers for {Xl, X2, * * * , ~i, x,}. First let

N ={xi: ~iXi Xi~l

If N:~ {x,} take any x'E N\{xj} and set S'= (S\{xj}) U {x'}, but if N = {xi} just take any X'E {Xl, X2, ' ' ' , Ģi, ' ' * , Xn} and define S' as before.

We now serve {xl, x ~i, . . . , x.} by S' as follows. If N = {xi} then each

point is served by the same center as it was served by in S. If N7L {xi} then the

elements of N\{xj} are served by x', and the others are served as before. The

cost of this suboptimal choice is bounded by

n

M,(k; Xl, X2, Xn )+ ~lx, x'~IP 1(x, E N\{xj})

n

M, (k; Xl, X2, Xn) + 2PS 1Oxi xillp 8).

The second inequality (2.7) is easier. Let S be an optimal choice of centers for 1xl, X2, 9 ib 5 Xn} and note that x, can be served by an element of S at a cost less than

2P ~inlix,xillP+maxminllx,xillP :52P(mi+8).

j:j:p~i trS

3. Regular expectadons

For brevity we set M,, = Mp([an]; X, X2,. XJ. Our first objective is to show that EX, cn p12 . The method begins as in the classical approach taken by Beardwood, Halton, and Hammersley (1959) in the study of the traveling salesman problem. As noted in the introduction, the main novelty here is due to the necessity of overcoming the fact that M. fails to have the monotonicity M,,:_SM,l. The impact of this nonmonotonicity is even more strongly felt in the next section. (The desire to understand a subadditive Euclidean functional which failed to be monotone provided the second principle motivating this work.)

We now let 11 denote a Poisson point process on W. For any Borel A c R', II(A) will consist of a set of NA points uniformly distributed in A, where NA is itself a Poisson random variable with mean A (A), the Lebesgue measure of A.

Lemma 3. 1. Let A = [0, t]' and set 0 (t) = EMp QaNA J; II(A)), then for all integers m ~, 1 we have

(3.1) 0(t)SM20(t1M).

Steinhaus's geometric location problem for random samples in the plane 61

Proof. Let A be divided into m' cells Q of side t/m, then by the suboptimal

ity of local optimization we have

M2

M,GaNA1; II(AW5 MQaNDJ; ll(Q)).

On taking expectations and using the homogeneity of H we obtain (3.1).

Lemma 3.2. If 0(t) is any continuous function which satisfied (3.1) for all m then

(3.2) lim 0 (t)l t' = lim inf (p (t)l t2

Proof. By the continuity of 0 (t) and the definition C, =_ lim, inf (P (t)/t2 we can choose an interval (a, b) such that

,0 (ty t2 C." + 8

(3.3)

for all t E: (a, b). By (3. 1) we can conclude that also we have (3.3) for all

tEUZ=l(ma,mb). Since Im=(ma,mb) and I,,, intersect for all m~:

a (b a)', we see U' (ma, mb) contains (moa, and, therefore,

m

lil~ SUP 0 (t)l t2,< C~", +

which completes the proof.

Lemma 3.3. For l:5 p < 4 we have for x T 1 that

(3.4) (EM.)xC.,,]F(2p/2)/(1x)'"'

n1

and

n

2p12

(3.5) EMk C, (2 p/2)'n

k=l

Proof. Calculating 0(t) by conditioning and making a change of scale from

[0, t]2 to [0, 1]2 shows that (3.2) can be written out as

tp(EM

(3.6) 0(t)= )el't2'1n! C.,t 2.

By changing variables t2 = U we see that

1p12

(3.7) (EMn)euln!C.,,u

n=l

Now by the Abelian theorem for Borel summability (e.g. Doetsch (1943), p.

i

62 DORIT HOCHBAUM AND J. M. STEELE

191) and the fact that 1p/2>1 we have as x 1 that

(3.8) ~ (EM,,EM,~)xC.,,F(2p/2)/(1x)'"2.

n=l

Multiplying by (1 x)' then completes the proof of (3.4). Since EM, ~: 0 we get (3.5) from (3.4) by an immediate application of the Karamata Tauberian theorem (Feller (1971), p. 447).

We should now like to 'differentiate' (3.5) in order to obtain the asymptotics of EM,. Fortunately, the next lemma shows that this is (just barely) legitimate.

Lemma 3.4. If 1Z = 1 Mk cn'y for y > 1 and Mk+1 ~~ Mk BVI2 for some B and all k k 1 then

(3.9) m. cyn"'.

Proof. Let y > 1 be chosen and note that

jY2 n(y 1) mn B y2.

Mk Mn B 2: 2: j

n;gk;Syn n;Sk:Syn ( i.n k=ni=n

Dividing by n^l and using the EulerMaclaurin summation formula to handle the double sum gives

(y'y 1)c (y 1) lim. sup (ntnln'yl) (.1 1)'{yl(y'y 1) (y 1)}B.

Next dividing by y 1 and letting y 11 shows

,yc !, lim sup (mnln'y').

In a completely analogous way one can show that lim sup (mnln11) k_ yc by estimating the SUM Y,nlkln Mk where y < 1.

The next lemma is the main consequence of this section.

Lemma 3.5. For 1 p< 2 we have EM,, Q,,n 1p12 as n

Proof. We already have Y,"t. 1 EMi C,, (2 p/2)1 n'P/'. By Lemma 3.4 with 1 < y = 2 p/2 it suffices to show

(3.10) EM, , 1 ~t EMl B 1 P'2

for some B and all 1. By Lemma 2.4 (with sample size n + 1 and kn,l) and

by Lemma 2.2 (if [(1 + 1) a ] > [ la ]) we have

1+1

(3.11) M1kMI2P8 1(11XIXilip:58)([(1+1)all(la)])AP.

Here by Lemma 2.3, 8 5 p,,1[Iallp12 ; and by Lemma 2.2, Alp 1[1a]1p12.

p

i

Steinhaus's geometric location problem for random samples in the plane 63

By elementary estimates we then see that

1+1 lk

E 1 (1iXi xi lip < .5)

is bounded, so taking expectations in (3.11) yields (3.10).

4. Completion of the proof

The results of the preceding sections will now be brought together to prove

Theorem 2. The only new tool required is the recent result of Efron and Stein

(1981) which says that the jackknife estimate of variance is positively biased.

Explicitly, we first suppose that 8(xl, x, . * * ' X.,) is any symmetric function

of n 1 vectors xi. For each i we set Si = S(Xl, X2, Ģi, x,) and also set

In

S.= l/n , =, Si. If X, are any independent and identically distributed random vectors, the EfronStein jackknife inequality says that

n

2

(4.1) Var S(Xl, X2, X,) ~ E (si S.)

We shall now apply this inequality with the aid of the combinatorial bounds of Section 2.

Lemma 4. 1. If Xi, 1 i < are independent and uniformly distributed on [0, lF, then for a constant C not depending on n we have

(4.2) Var M, ([an]; X,, X2,, X.) =Var Mn:5 Cn1P.

Proof. We first note that if S is replaced by any other variable, the right side of (4.1) is only increased. Using (4.1) and Lemma 2.4 we now calculate (with 8 = 8,Qnal; X,, X2,. X,1)):

n+l

Var Mn E (Mp ([an]; X,, X2, X+1)

(4.3) Mp ([an]; X,, X2,. X,,))'

n+l n+l

E 2P 8 1 (IIXi Xi 11P S 8) + 2P8 + 2P min 11X. X. 11 P)2.

Replacing 8 by P"n[anl'p' = pn will by Lemma 2.3 only increase the right

p

side, so using Vinogradov's symbol to ignore irrelevant constants we have

2 1 (IIX. Xlilp g; P.)2 _ X1112p

IIX.

(4.4) Var Mn<< nE Pn + p2. + min

i i i i i

i

64 DORIT HOCHBAUM AND J. M. STEELE

Now, since p2,<< nP, it is an elementary calculation to show

2 (n+l 2

<< E n << 1

E (ny loxi pn))

and

IIX X11J2p << np.

E min

j:j:~l

These bounds and (4.4) imply (4.2) and complete the lemma.

We now note that Var (MnIn 1p12) << l/n and that this bound is not sharp enough to automatically imply complete convergence. It is therefore necessary to resort to a subsequence and maximal argument to prove Theorem 2.

By the bound (4.2), Lemma (3.5), the BorelCantelli lemma and Chebyshev's inequality one can easily show

(4.5) lim, Mn,/n k" = C, a.s.

nk

for the subsequence nk = [k'y] for any y > 1.

We now set

1p/2

Dk= max JM,,/nl,"'M,/nk 1

n, Zn

and note that to complete the proof of the theorem it suffices to show Dk 0

a.s. For this it certainly suffices to show E Ek= 1 D2k <

We set an = IMn,j/(n + 1)102 MnIn 1p12 j and note

112 2P12

an << IM.,,Mni/n + MnI n

p/2 By Lemma 2.1 we have Mn << n` ' and if [a (n + l)] = [an] the same estimates used in (4.4) will show E(Mn,l _ Mj2 <

Mn:5 M(k + 1; X,, X2,. X.) + J3'nk 1p12 and by Lemma (2.4) (and (2.3)) that

M(k + 1; X,, X2,.. ., XJ:55 M(k + 1; X,, X2,. X, X,1) n

+ 2P8 (1 11X XnAl" _= 8)

where 8 =_ O'P'(n + 1)(k + l)1p12 . Together these bounds and elementary calcu

lations show in the case that [a(n+l)]=[anj+1 that one again has

M

j2 2 << 2.

E(Mn,l << n1, and hence Ean l/n

i i

1 i i

i 1

Steinhaus's geometric location problem for random samples in the plane 65

The final calculation is that

2 2 2

ED2k = E(E a.)'= (n,,, nj (Y_ Ean ) << kl I n << k

where the three sums are each over the range nk ~1 n < nk+1.

This verifies E IZ= 1 D'k < and completes the proof of Theorem 2, except for verifying that indeed C, > 0. To show this last fact we set

Zn = 1 1 l(IIXi XJJ:5 PIVn)

n 1ii.

and note that easy calculations show

(4.6) E4 > O'iT/2 as n

and

(4.7) Var 4, > 0 as n

Since M. is the sum of n [na] elements of S = {J1Xi XilIP: 1 t5 i

(4.8) M. ~: (on' 1. 1(nZ. < (n [na])/2). (n [na])/2.

Taking expectations in (4.8) we have

(4.9) n p121 EM,, k p P^4 < (1 a)/2) . (1 a)12.

t! For p2 < (1 a)lir, Equations (4.6), (4.7) and Chebyshev's inequality will

suffice to show that the right side of (4.9) is bounded away from 0. This shows

C, > 0 and completes the proof.

5. Algori~c hnplications

The fact that the Kmedian problem has been proved by Papadimitriou (1981) to be NPhard means that it is extremely unlikely that there is an efficient algorithm for calculating the optimal choice of an centers from n sites (d. Karp (1972)). Therefore, since the Kmedian problem occurs in a variety of practical contexts, it seems quite desirable to find efficient algorithms which are capable of providing approximate optimal center selections.

The results of this article take a step toward this by providing an estimate for the value of an optimal selection. This value can be used in the construction of approximately optirnal probabilistic algorithms for the Kmedian in a manner which is completely parallel to the way the asymptotic optimal value provided by Beardwood, Halton, and Hammersley (1959) has been used by Karp (1977) in the study of the traveling salesman problem. One algorithm of this type for

i

1

i

66 DORIT HOCHBAUM AND J. M. STEELE

the Kmedian problem (but with K< log n) has already been constructed in Fisher and Hochbaum (1981).

6. Concluding remarks and open problems

One of the motives for investigating the functional

n

M,([an];X1,X2,,X.)= min IminliX,X,~IP

S:ISI=[n~1j=1 ic=S

for general p is the trite observation that as p > we have

M11P , min max min IIX, xi 11 = H..

p S:1SI=[n.] 1izgn jeS

The functional Hn is of independent interest and it was hoped that the present methods might throw some light on its probabilistic behavior. We now believe that nIHn converges in distribution, but we have no idea how this might be proved. Since our methods seem to require 15 p < 2 and are more pertinent to strong laws, an entirely new technique may be needed.

There are also basic open problems directly concerning Mn

MI, ([ozn 1; X,, X2, . . . , Xn). In particular, it seems almost certain that a result analogous to our Theorem 2 must hold when the Xi are independent, identically distributed, and bounded. The methods used in Steele (1981a,c) seem to fail to help in the location problem because of the difficulty of establishing the intermediate result for step densities.

Finally, there is the question of determining C,. This is usually hopeless, but perhaps not in this case. FejesT6th (1959) was able to determine the analogous constant in the original Steinhaus problem, and McClure (1976) has been able to extend the work of FejesT6th to other functionals and extremal problems.

References

BEARDWOOD, L, HALTON, H. J. AND HAMMERSLEY, J. M. (1959) The shortest path through many points. Proc. Camb. Phil. Soc. 55, 299327.

BOLLOBAs, B. (1973) Ite optimal arrangement of producers. J. London Math. Soc. (2) 6, 417421.

CoRNuEjoLs, G., FisHER, M. L. AND NEmHAusFR, G. L. (1978) Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms. Management $ci. 23, 789810.

DOETSCH, G. (1943) Theorie und Anwendung der Laplace Transformation. Dover, New York.

EFRoN, B. AND STEIN, C. (1981) The jackknife estimate of variance. Ann. Statist. 9, 586596.

FELLER, W. (1971) An Introduction to Probability Theory and its Applications, Vol. lI, 2nd edn. Wiley, New York.

FE1EST6TH, L. (1959) Sur la repr6sentation d'une population infinie par un nombre fini d'~16ments. Acta Math. Acad. Sci. Hungar. 10, 299304.

Steinhaus's geometric location problem for random samples in the plane 67

FISHER, M. L. AND H0CHBAUNI, D. S. (1980) Probabilistic analysis of the Kmedian problem. Math. Operat. Res. 5, 2734.

KARP, R. M. (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, ed. R. E. Miller and J. W. Thatcher, Plenum Press, New York, 85104.

KARP, R. M. (1977) Probabilistic analysis of partitioning algorithms for the traveling salesman problem in the plane. Math. Operat. Res. 2, 209224.

MCCLURE, D. E. (1976) Characterization and approximation of optimal plane partitions. Technical Report, Division of Applied Mathematics, Brown University.

PAPADINIrTRIOU, C. H. (1981) Worst case and probabilistic analysis of a geometric location problem. SIAM J. Computing 10, 542557.

STARRET, D. A. (1974) Principles of optimal location in a large homogeneous area. J. Econom. Theory 9, 418448.

STEELE, J. M. (1981a) Subadditive Euclidean functionals and nonlinear growth in geometric probability. Arm. Prob. 9, 365376.

STEELE, J. M. (1981b) Optimal triangulation of random samples in the plane. Arm. Prob.

STEELE, J. M. (1981c) Growth rates of minimal spanning trees of random samples in space. Z. Wahrscheinlichkeitsth.

STEINHAUS, H. (1956) Sur la division de corps mat6riels en parties. Bull. Acad. Polon. Sci. 49 801804.

1

1