Adv. Appl. Prob. 14, 56-67 (1982)
Printed in N. Ireland
0001-8678/82/010056-12$01.45
© Applied Probability Trust 1982

STEINHAUS'S GEOMETRIC LOCATION PROBLEM FOR RANDOM SAMPLES IN THE PLANE

DORIT HOCHBAUM,* Carnegie-Mellon University
J. MICHAEL STEELE,** Stanford University

Abstract

Let

M_n = min_{S : |S| = [αn]} Σ_{i=1}^n min_{x_j ∈ S} ||X_i - x_j||^p,

where X_i, 1 ≤ i ≤ n, are i.i.d. and uniformly distributed in [0, 1]^2. It is proved that M_n ~ cn^{1-p/2} a.s. for 1 ≤ p < 2. This result is motivated by recent developments in the theory of algorithms and the theory of subadditive processes as well as by a well-known problem of H. Steinhaus.

LOCATION PROBLEM; K-MEDIAN PROBLEM; PROBABILISTIC ALGORITHM; SUBADDITIVE PROCESS; SUBADDITIVE EUCLIDEAN FUNCTIONAL

1. Introduction

The work of Steinhaus (1956) was apparently the first explicit treatment of the natural question 'How should one choose n points from a mass distributed in the plane so as to best represent the whole?' The main objective of this article is to treat a stochastic analogue of Steinhaus's problem.
One principal motivation for this stochastic analogue comes from developments in the theory of algorithms. The first of these is the discovery by Karp (1977) of an efficient probabilistic algorithm for solving the traveling salesman problem. The second development was the proof by Papadimitriou (1981) of the conjecture of Fisher and Hochbaum (1980) that the 'Euclidean k-median location problem' is NP-complete.
More will be said about these algorithmic considerations in a later section, but we should first state our results. For any x_i ∈ R^2 and integers k and n we define

(1.1) M(k; x_1, x_2, ..., x_n) = min_{S : |S| = k} Σ_{i=1}^n min_{x_j ∈ S} ||x_i - x_j||.

** Postal address: Department of Statistics, Stanford University, Stanford, CA 94305, U.S.A.
Research supported in part by the Office of Naval Research Contract N00014-76-C-0475.



(Here the minimum is over all S ⊂ {x_1, x_2, ..., x_n} such that S has cardinality |S| = k.)
In words, one views S as a set of k 'centers', and each 'site' x_i is served by its nearest center x_j ∈ S at a cost equal to ||x_i - x_j||. The quantity M(k; x_1, x_2, ..., x_n) is therefore the minimal cost attainable by an optimal choice of k centers. This language is chosen in sympathy with the applications which have been suggested in Bollobás (1973), Cornuejols, Fisher and Nemhauser (1978), and Starrett (1974).
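For concreteness, the minimization (1.1) can be evaluated by brute force on small instances. The sketch below is ours, not part of the original analysis; the function name and the sample sites are purely illustrative.

```python
from itertools import combinations

def m_cost(points, k, p=1.0):
    """Brute-force value of (1.1): minimize, over all k-subsets S of the
    sites, the total cost of serving each site by its nearest center in S
    (distances raised to the power p; p = 1 gives M(k; x_1, ..., x_n))."""
    def d(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    best = float("inf")
    for S in combinations(points, k):
        best = min(best, sum(min(d(x, c) for c in S) ** p for x in points))
    return best

sites = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9), (0.85, 0.8), (0.5, 0.5)]
print(m_cost(sites, 2))  # two centers, one per cluster, serve cheaply
```

Since the centers are drawn from the sites themselves, the cost is nonincreasing in k and vanishes when k = n; the enumeration is exponential in k, which is exactly why the NP-completeness result mentioned above matters.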
Our main result is the following.

Theorem 1. If X_i, 1 ≤ i < ∞, are independent and uniform on [0, 1]^2, then for any 0 < α < 1

(1.2) lim_{n→∞} n^{-1/2} M([αn]; X_1, X_2, ..., X_n) = C_α

with probability 1 for some constant 0 < C_α < ∞.

For reasons which will be discussed in the last section it will be useful to generalize this result slightly. For any 1 ≤ p < ∞ we define

(1.3) M_p(k; x_1, x_2, ..., x_n) = min_{S : |S| = k} Σ_{i=1}^n min_{x_j ∈ S} ||x_i - x_j||^p.

Under the same hypotheses as in Theorem 1 we shall prove the following.

Theorem 2. For any 1 ≤ p < 2 and 0 < α < 1 we have with probability 1

lim_{n→∞} M_p([αn]; X_1, X_2, ..., X_n)/n^{1-p/2} = C_{p,α}

for some constant 0 < C_{p,α} < ∞.

Naturally, it will suffice to prove just Theorem 2, and this proof will occupy the next three sections. In Section 2 we concentrate on the key combinatorial observations which will make the theorem possible. Section 3 contains a Tauberian argument which partially parallels that used in Steele (1981b), but with the significant difference that the present problem lacks the monotonicity previously relied on in the theory of subadditive Euclidean functionals (Steele (1981a,b)). The elementary Lemma 3.4 on the 'differentiation' of an asymptotic series is one of the devices introduced here which may prove useful in other nonmonotone problems.
The proof of Theorem 2 is completed in Section 4 with the help of the Efron-Stein jackknife inequality teamed with the combinatorial lemmas of Section 2 and classical arguments.
The last section briefly discusses the algorithmic application of this result. We also discuss the extension of our results to nonuniformly distributed random variables, and comment briefly on some open problems.


2. Combinatorial and geometric lemmas

Most of the observations which are special to the location problem are contained in this section, but one can find some hidden generality since most subadditive Euclidean functionals have properties analogous to those that follow.

Lemma 2.1. There is a constant β = β_p such that for all x_i ∈ [0, 1]^2, 1 ≤ i ≤ n, and k ≥ 2,

(2.1) M_p(k; x_1, x_2, ..., x_n) ≤ βnk^{-p/2}.

Proof. Let an integer s be chosen so that s^2 ≤ k < (s + 1)^2, and divide [0, 1]^2 into s^2 cells of side s^{-1}. For each occupied cell C_j choose one element x'_j ∈ C_j and set S' = {x'_j : 1 ≤ j ≤ s^2}. Since each cell has diameter √2 s^{-1} and since |S'| ≤ k, we have the generous bound

M_p(k; x_1, x_2, ..., x_n) ≤ n(√2 s^{-1})^p ≤ 2^{p/2} n(k^{1/2} - 1)^{-p},

which yields the lemma.
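The grid construction in this proof is easy to check numerically. The sketch below (our own illustration, with invented names) builds the suboptimal centers of Lemma 2.1 and compares the resulting cost with the bound n(√2 s^{-1})^p.

```python
import math
import random

def grid_upper_bound(points, k, p=1.0):
    """Suboptimal centers from the proof of Lemma 2.1: choose s with
    s^2 <= k, cut [0,1]^2 into s^2 cells of side 1/s, and use one
    representative point from each occupied cell as a center."""
    s = math.isqrt(k)
    reps = {}
    for (x, y) in points:
        cell = (min(int(x * s), s - 1), min(int(y * s), s - 1))
        reps.setdefault(cell, (x, y))      # first point seen in the cell
    centers = list(reps.values())          # at most s^2 <= k centers
    return sum(min(((x - cx) ** 2 + (y - cy) ** 2) ** (0.5 * p)
                   for (cx, cy) in centers)
               for (x, y) in points)

random.seed(1)
n, k, p = 2000, 100, 1.0
pts = [(random.random(), random.random()) for _ in range(n)]
bound = n * (math.sqrt(2) / math.isqrt(k)) ** p   # n (sqrt(2)/s)^p
print(grid_upper_bound(pts, k, p), "<=", bound)
```

Every site lies within its own cell, hence within √2/s of that cell's representative, so the computed cost can never exceed the bound.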

The next lemma will provide the combinatorial linchpin required in our variance bounds.
Lemma 2.2. There is a constant β' = β'_p such that for all x_i ∈ [0, 1]^2, 1 ≤ i ≤ n, and k ≥ 2, the difference

Δ_p ≡ M_p(k; x_1, x_2, ..., x_n) - M_p(k + 1; x_1, x_2, ..., x_n)

satisfies the bounds

(2.2) 0 ≤ Δ_p ≤ β'nk^{-1-p/2}.
Proof. Without loss we can assume that all of the ||x_i - x_j||^p are different. For any set of optimal centers S associated with M_p(k + 1; x_1, x_2, ..., x_n) we define for each i ∈ S the set

N(i) = {j : ||x_j - x_i|| = min_{l ∈ S} ||x_j - x_l||}.

This gives the representation

M_p(k + 1; x_1, x_2, ..., x_n) = Σ_{i ∈ S} Σ_{j ∈ N(i)} ||x_j - x_i||^p.

We now note that

(2.3) #{i ∈ S : Σ_{j ∈ N(i)} ||x_j - x_i||^p > 4M_p(k + 1; x_1, x_2, ..., x_n)/(k + 1)} ≤ (k + 1)/4

and also that

(2.4) #{i ∈ S : |N(i)| ≥ 4n/(k + 1)} ≤ (k + 1)/4.


This implies that for H defined by

H = {i ∈ S : Σ_{j ∈ N(i)} ||x_j - x_i||^p ≤ 4βn(k + 1)^{-p/2}/(k + 1) and |N(i)| ≤ 4n/(k + 1)}

we have, by (2.3), (2.4), and Lemma 2.1, that #H ≥ (k + 1)/2.
This time we choose s so that s^2 < (k + 1)/2 ≤ (s + 1)^2 and again divide [0, 1]^2 into s^2 cells of side s^{-1}. Some cell must contain two elements of H, say x_{i_1} and x_{i_2}.
We now define a suboptimal choice for the k centers by using S' = S \ {x_{i_1}}. We continue to serve all ∪_{i ≠ i_1} {x_j : j ∈ N(i)} as before, but now serve the elements of {x_j : j ∈ N(i_1)} by x_{i_2}. The cost of serving {x_j : j ∈ N(i_1)} by x_{i_2} is

(2.5) Σ_{j ∈ N(i_1)} ||x_j - x_{i_2}||^p ≤ 2^p Σ_{j ∈ N(i_1)} (||x_j - x_{i_1}||^p + ||x_{i_1} - x_{i_2}||^p)
          ≤ 2^p {4βn(k + 1)^{-1-p/2} + (√2 s^{-1})^p (4n)/(k + 1)}.

For k ≥ 2 the last expression can be bounded by β'nk^{-1-p/2}; and, since (2.5) majorizes Δ_p, this proves the second inequality in (2.2). The first inequality is trivial, so the lemma is complete.

One often finds it useful to know that in an optimal allocation no site is very far from a center.

Lemma 2.3. There is a constant β'' = β''_p with the property that for any S with |S| = k which minimizes Σ_{i=1}^n min_{j ∈ S} ||x_i - x_j||^p we have

δ_p ≡ δ_p(k; x_1, x_2, ..., x_n) ≡ max_{1 ≤ j ≤ n} min_{i ∈ S} ||x_j - x_i||^p ≤ β''nk^{-1-p/2}.

Proof. If we take S' = S ∪ {x'}, where x' is the element of {x_1, x_2, ..., x_n} farthest from any element of S, then we have

M_p(k + 1; x_1, x_2, ..., x_n) ≤ M_p(k; x_1, x_2, ..., x_n) - δ_p.

By Lemma 2.2 we then see that δ_p ≤ β'nk^{-1-p/2}, so we can take β'' = β'.
The next lemma will be useful in applying jackknife methods to obtain variance bounds. By the notation x̂_i we mean that x_i is to be omitted from the sample; thus {x_1, x_2, ..., x̂_i, ..., x_n} = {x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_n}. Also, we let 1(y ≤ δ) be 1 or 0 according as y ≤ δ holds or not.

Lemma 2.4. Setting δ = δ_p(k; x_1, x_2, ..., x_n) and m_i = min_{j : j ≠ i} ||x_i - x_j||^p, we have

(2.6) M_p(k; x_1, x_2, ..., x̂_i, ..., x_n) ≤ M_p(k; x_1, x_2, ..., x_n) + 2^p δ Σ_{j=1}^n 1(||x_j - x_i||^p ≤ δ)

and

(2.7) M_p(k; x_1, x_2, ..., x_n) ≤ M_p(k; x_1, x_2, ..., x̂_i, ..., x_n) + 2^p (m_i + δ).


Proof. To prove the first inequality let S be an optimal choice of k centers for {x_1, x_2, ..., x_n}. If x_i ∉ S the inequality is trivial, so suppose x_i ∈ S. We now choose a suboptimal set S' of k centers for {x_1, x_2, ..., x̂_i, ..., x_n}. First let

N = {x_j : ||x_j - x_i|| = min_{l ∈ S} ||x_j - x_l||}.

If N ≠ {x_i} take any x' ∈ N \ {x_i} and set S' = (S \ {x_i}) ∪ {x'}, but if N = {x_i} just take any x' ∈ {x_1, x_2, ..., x̂_i, ..., x_n} and define S' as before.
We now serve {x_1, x_2, ..., x̂_i, ..., x_n} by S' as follows. If N = {x_i} then each point is served by the same center as it was served by in S. If N ≠ {x_i} then the elements of N \ {x_i} are served by x', and the others are served as before. The cost of this suboptimal choice is bounded by

M_p(k; x_1, x_2, ..., x_n) + Σ_{j=1}^n ||x_j - x'||^p 1(x_j ∈ N \ {x_i})
     ≤ M_p(k; x_1, x_2, ..., x_n) + 2^p δ Σ_{j=1}^n 1(||x_j - x_i||^p ≤ δ).

The second inequality (2.7) is easier. Let S be an optimal choice of centers for {x_1, x_2, ..., x̂_i, ..., x_n} and note that x_i can be served by an element of S at a cost less than

2^p (min_{j : j ≠ i} ||x_i - x_j||^p + max_j min_{l ∈ S} ||x_j - x_l||^p) ≤ 2^p (m_i + δ).
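On small samples both inequalities of Lemma 2.4 can be verified directly by enumeration. The sketch below is our own check, not part of the paper; note that for (2.7) we evaluate δ on the reduced sample, which is the quantity the proof's argument actually supplies, and small float tolerances are added to the assertions.

```python
import random
from itertools import combinations

def dist_p(a, b, p):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** (0.5 * p)

def mp_opt(points, k, p):
    """Exact M_p(k; x_1, ..., x_n) and an optimal center set, by enumeration."""
    best, best_S = float("inf"), None
    for S in combinations(points, k):
        c = sum(min(dist_p(a, s, p) for s in S) for a in points)
        if c < best:
            best, best_S = c, S
    return best, best_S

random.seed(0)
p, k = 1.0, 3
pts = [(random.random(), random.random()) for _ in range(8)]
full, S = mp_opt(pts, k, p)
# delta of Lemma 2.3: the worst service cost under the optimal centers S
delta = max(min(dist_p(a, s, p) for s in S) for a in pts)
for i, xi in enumerate(pts):
    rest = pts[:i] + pts[i + 1:]
    reduced, S_r = mp_opt(rest, k, p)
    close = sum(1 for xj in pts if dist_p(xj, xi, p) <= delta)
    m_i = min(dist_p(xi, xj, p) for xj in rest)
    delta_r = max(min(dist_p(a, s, p) for s in S_r) for a in rest)
    assert reduced <= full + 2 ** p * delta * close + 1e-9       # (2.6)
    assert full <= reduced + 2 ** p * (m_i + delta_r) + 1e-9     # (2.7)
print("inequalities (2.6) and (2.7) hold on this sample")
```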

3. A Tauberian argument

For brevity we set M_n = M_p([αn]; X_1, X_2, ..., X_n). Our first objective is to show that EM_n ~ cn^{1-p/2}. The method begins as in the classical approach taken by Beardwood, Halton, and Hammersley (1959) in the study of the traveling salesman problem. As noted in the introduction, the main novelty here is due to the necessity of overcoming the fact that M_n fails to have the monotonicity M_n ≤ M_{n+1}. The impact of this nonmonotonicity is even more strongly felt in the next section. (The desire to understand a subadditive Euclidean functional which fails to be monotone provided the second principal motivation for this work.)
We now let Π denote a Poisson point process on R^2. For any Borel A ⊂ R^2, Π(A) will consist of a set of N_A points uniformly distributed in A, where N_A is itself a Poisson random variable with mean λ(A), the Lebesgue measure of A.

Lemma 3.1. Let A = [0, t]^2 and set φ(t) = EM_p([αN_A]; Π(A)); then for all integers m ≥ 1 we have

(3.1) φ(t) ≤ m^2 φ(t/m).


Proof. Let A be divided into m^2 cells Q_i of side t/m; then by the suboptimality of local optimization we have

M_p([αN_A]; Π(A)) ≤ Σ_{i=1}^{m^2} M_p([αN_{Q_i}]; Π(Q_i)).

On taking expectations and using the homogeneity of Π we obtain (3.1).

Lemma 3.2. If φ(t) is any continuous function which satisfies (3.1) for all m, then

(3.2) lim_{t→∞} φ(t)/t^2 = lim inf_{t→∞} φ(t)/t^2.

Proof. By the continuity of φ(t) and the definition C ≡ lim inf_{t→∞} φ(t)/t^2 we can choose, for any ε > 0, an interval (a, b) such that

(3.3) φ(t)/t^2 ≤ C + ε

for all t ∈ (a, b). By (3.1) we can conclude that (3.3) also holds for all t ∈ ∪_{m=1}^∞ (ma, mb). Since I_m = (ma, mb) and I_{m+1} intersect for all m ≥ a(b - a)^{-1}, we see that ∪_{m=1}^∞ (ma, mb) contains (m_0 a, ∞) for some m_0, and, therefore,

lim sup_{t→∞} φ(t)/t^2 ≤ C + ε,

which completes the proof.

Lemma 3.3. For 1 ≤ p < 4 we have, as x ↑ 1,

(3.4) Σ_{n=1}^∞ (EM_n) x^n ~ C_{p,α} Γ(2 - p/2)/(1 - x)^{2-p/2}

and

(3.5) Σ_{k=1}^n EM_k ~ C_{p,α} (2 - p/2)^{-1} n^{2-p/2}.

Proof. Calculating φ(t) by conditioning and making a change of scale from [0, t]^2 to [0, 1]^2 shows that (3.2) can be written out as

(3.6) φ(t) = Σ_{n=1}^∞ t^p (EM_n) e^{-t^2} t^{2n}/n! ~ C_{p,α} t^2.

By changing variables t^2 = u we see that

(3.7) Σ_{n=1}^∞ (EM_n) e^{-u} u^n/n! ~ C_{p,α} u^{1-p/2}.
Now by the Abelian theorem for Borel summability (e.g. Doetsch (1943), p. 191) and the fact that 1 - p/2 > -1, we have, as x ↑ 1, that

(3.8) Σ_{n=1}^∞ (EM_n - EM_{n-1}) x^n ~ C_{p,α} Γ(2 - p/2)/(1 - x)^{1-p/2}.

Multiplying by (1 - x)^{-1} then completes the proof of (3.4). Since EM_n ≥ 0, we get (3.5) from (3.4) by an immediate application of the Karamata Tauberian theorem (Feller (1971), p. 447).
We should now like to 'differentiate' (3.5) in order to obtain the asymptotics of EM_n. Fortunately, the next lemma shows that this is (just barely) legitimate.

Lemma 3.4. If Σ_{k=1}^n m_k ~ cn^γ for γ > 1 and m_{k+1} ≥ m_k - Bk^{γ-2} for some B and all k ≥ 1, then

(3.9) m_n ~ cγn^{γ-1}.

Proof. Let y > 1 be chosen and note that

Σ_{n ≤ k ≤ yn} m_k ≥ Σ_{n ≤ k ≤ yn} (m_n - B Σ_{j=n}^{k} j^{γ-2}) ≥ n(y - 1) m_n - B Σ_{k=n}^{[yn]} Σ_{j=n}^{k} j^{γ-2}.

Dividing by n^γ and using the Euler-Maclaurin summation formula to handle the double sum gives

(y^γ - 1)c ≥ (y - 1) lim sup_n (m_n/n^{γ-1}) - (γ - 1)^{-1}{γ^{-1}(y^γ - 1) - (y - 1)}B.

Next dividing by y - 1 and letting y ↓ 1 shows

γc ≥ lim sup_n (m_n/n^{γ-1}).

In a completely analogous way one can show that lim inf_n (m_n/n^{γ-1}) ≥ γc by estimating the sum Σ_{yn ≤ k ≤ n} m_k where y < 1.
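A quick numerical illustration of Lemma 3.4 (our own, not from the paper): take m_k = k^{γ-1} with γ = 3/2, so the partial sums grow like n^γ/γ, giving c = 1/γ, and the hypothesis m_{k+1} ≥ m_k - Bk^{γ-2} holds trivially since m_k is increasing. The lemma then predicts m_n ~ cγn^{γ-1} = n^{γ-1}, which is exact for this sequence.

```python
gamma = 1.5
n = 100000
# partial sums of m_k = k**(gamma - 1): these grow like c * n**gamma
partial = sum(k ** (gamma - 1) for k in range(1, n + 1))
c = partial / n ** gamma                  # approaches 1/gamma = 2/3
predicted = c * gamma * n ** (gamma - 1)  # the lemma's asymptotic for m_n
actual = n ** (gamma - 1)                 # the true m_n
print(c, predicted / actual)
```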

The next lemma is the main consequence of this section.

Lemma 3.5. For 1 ≤ p < 2 we have EM_n ~ C_{p,α} n^{1-p/2} as n → ∞.

Proof. We already have Σ_{k=1}^n EM_k ~ C_{p,α} (2 - p/2)^{-1} n^{2-p/2}. By Lemma 3.4 with 1 < γ = 2 - p/2 it suffices to show

(3.10) EM_{l+1} ≥ EM_l - Bl^{-p/2}

for some B and all l. By Lemma 2.4 (with sample size l + 1 and k = [(l + 1)α]) and by Lemma 2.2 (if [(l + 1)α] > [lα]) we have

(3.11) M_{l+1} ≥ M_l - 2^p δ Σ_{j=1}^{l+1} 1(||X_j - X_{l+1}||^p ≤ δ) - ([(l + 1)α] - [lα]) Δ_p.

Here, by Lemma 2.3, δ ≤ β''_p (l + 1)[lα]^{-1-p/2}; and, by Lemma 2.2, Δ_p ≤ β'_p (l + 1)[lα]^{-1-p/2}.


By elementary estimates we then see that

E Σ_{j=1}^{l+1} 1(||X_j - X_{l+1}||^p ≤ δ)

is bounded, so taking expectations in (3.11) yields (3.10).

4. Completion of the proof

The results of the preceding sections will now be brought together to prove Theorem 2. The only new tool required is the recent result of Efron and Stein (1981) which says that the jackknife estimate of variance is positively biased. Explicitly, we first suppose that S(x_1, x_2, ..., x_{n-1}) is any symmetric function of n - 1 vectors x_i. For each i we set S_i = S(X_1, X_2, ..., X̂_i, ..., X_n) and also set S̄ = n^{-1} Σ_{i=1}^n S_i. If the X_i are any independent and identically distributed random vectors, the Efron-Stein jackknife inequality says that

(4.1) Var S(X_1, X_2, ..., X_{n-1}) ≤ E Σ_{i=1}^n (S_i - S̄)^2.

We shall now apply this inequality with the aid of the combinatorial bounds of Section 2.

Lemma 4.1. If X_i, 1 ≤ i < ∞, are independent and uniformly distributed on [0, 1]^2, then for a constant C not depending on n we have

(4.2) Var M_p([αn]; X_1, X_2, ..., X_n) ≡ Var M_n ≤ Cn^{1-p}.

Proof. We first note that if S̄ is replaced by any other variable, the right side of (4.1) is only increased. Using (4.1) and Lemma 2.4 we now calculate (with δ = δ_p([nα]; X_1, X_2, ..., X_{n+1})):

(4.3) Var M_n ≤ E Σ_{i=1}^{n+1} (M_p([αn]; X_1, X_2, ..., X_{n+1}) - M_p([αn]; X_1, X_2, ..., X̂_i, ..., X_{n+1}))^2
            ≤ E Σ_{i=1}^{n+1} (2^p δ Σ_{j ≠ i} 1(||X_j - X_i||^p ≤ δ) + 2^p δ + 2^p min_{j : j ≠ i} ||X_j - X_i||^p)^2.

Replacing δ by β''_p (n + 1)[αn]^{-1-p/2} ≡ ρ_n will by Lemma 2.3 only increase the right side, so using Vinogradov's symbol << to ignore irrelevant constants we have

(4.4) Var M_n << nE(ρ_n Σ_{j ≠ 1} 1(||X_j - X_1||^p ≤ ρ_n))^2 + nρ_n^2 + nE min_{j : j ≠ 1} ||X_j - X_1||^{2p}.


Now, since ρ_n^2 << n^{-p}, it is an elementary calculation to show

E(ρ_n Σ_{j ≠ 1} 1(||X_j - X_1||^p ≤ ρ_n))^2 << n^{-p}

and

E min_{j : j ≠ 1} ||X_j - X_1||^{2p} << n^{-p}.

These bounds and (4.4) imply (4.2) and complete the lemma.

We now note that Var(M_n/n^{1-p/2}) << 1/n, and that this bound is not sharp enough to automatically imply complete convergence. It is therefore necessary to resort to a subsequence and maximal argument to prove Theorem 2.
By the bound (4.2), Lemma 3.5, the Borel-Cantelli lemma, and Chebyshev's inequality one can easily show

(4.5) lim_{k→∞} M_{n_k}/n_k^{1-p/2} = C_{p,α} a.s.

for the subsequence n_k = [k^γ] for any γ > 1.
We now set

D_k = max_{n_k ≤ n < n_{k+1}} |M_n/n^{1-p/2} - M_{n_k}/n_k^{1-p/2}|

and note that to complete the proof of the theorem it suffices to show D_k → 0 a.s. For this it certainly suffices to show Σ_{k=1}^∞ ED_k^2 < ∞.
We set a_n = |M_{n+1}/(n + 1)^{1-p/2} - M_n/n^{1-p/2}| and note

a_n << |M_{n+1} - M_n|/n^{1-p/2} + M_n/n^{2-p/2}.

By Lemma 2.1 we have M_n << n^{1-p/2}, and if [α(n + 1)] = [αn] the same estimates used in (4.4) will show E(M_{n+1} - M_n)^2 << n^{-p}. Now we can also check that M_n cannot be much bigger than M_{n+1}. Setting k = [αn] we have by Lemma 2.2 that

Mn:5 M(k + 1; X,, X2,. X.) + J3'nk 1p12 and by Lemma (2.4) (and (2.3)) that

M(k + 1; X,, X2,.. ., XJ:55 M(k + 1; X,, X2,. X, X,1) n
+ 2P8 (1 11X  XnAl" _= 8)

where 8 =_ O'P'(n + 1)(k + l)1p12 . Together these bounds and elementary calcu
lations show in the case that [a(n+l)]=[anj+1 that one again has
M
j2 2 << 2.
E(Mn,l << n1, and hence Ean l/n


The final calculation is that

ED_k^2 ≤ E(Σ a_n)^2 ≤ (n_{k+1} - n_k)(Σ Ea_n^2) << (n_{k+1} - n_k) Σ n^{-2} << k^{-2},

where the three sums are each over the range n_k ≤ n < n_{k+1}.

This verifies Σ_{k=1}^∞ ED_k^2 < ∞ and completes the proof of Theorem 2, except for verifying that indeed C_{p,α} > 0. To show this last fact we set

Z_n = n^{-1} Σ_{1 ≤ i < j ≤ n} 1(||X_i - X_j|| ≤ ρ/√n)

and note that easy calculations show

(4.6) EZ_n → ρ^2 π/2 as n → ∞

and

(4.7) Var Z_n → 0 as n → ∞.

Since M_n is the sum of n - [nα] elements of S = {||X_i - X_j||^p : 1 ≤ i < j ≤ n}, we have

(4.8) M_n ≥ (ρn^{-1/2})^p · 1(nZ_n < (n - [nα])/2) · (n - [nα])/2.

Taking expectations in (4.8) we have

(4.9) n^{p/2-1} EM_n ≥ ρ^p P(Z_n < (1 - α)/2) · (1 - α)/2.

For ρ^2 < (1 - α)/π, Equations (4.6), (4.7) and Chebyshev's inequality will suffice to show that the right side of (4.9) is bounded away from 0. This shows C_{p,α} > 0 and completes the proof.

5. Algorithmic implications

The fact that the K-median problem has been proved by Papadimitriou (1981) to be NP-hard means that it is extremely unlikely that there is an efficient algorithm for calculating the optimal choice of [αn] centers from n sites (cf. Karp (1972)). Therefore, since the K-median problem occurs in a variety of practical contexts, it seems quite desirable to find efficient algorithms which are capable of providing approximately optimal center selections.
The results of this article take a step toward this by providing an estimate for the value of an optimal selection. This value can be used in the construction of approximately optimal probabilistic algorithms for the K-median problem in a manner which is completely parallel to the way the asymptotic optimal value provided by Beardwood, Halton, and Hammersley (1959) has been used by Karp (1977) in the study of the traveling salesman problem. One algorithm of this type for


the Kmedian problem (but with K< log n) has already been constructed in Fisher and Hochbaum (1981).

6. Concluding remarks and open problems

One of the motives for investigating the functional

M_p([αn]; X_1, X_2, ..., X_n) = min_{S : |S| = [αn]} Σ_{i=1}^n min_{X_j ∈ S} ||X_i - X_j||^p

for general p is the trite observation that as p → ∞ we have

M_p^{1/p} → min_{S : |S| = [αn]} max_{1 ≤ i ≤ n} min_{X_j ∈ S} ||X_i - X_j|| ≡ H_n.

The functional H_n is of independent interest, and it was hoped that the present methods might throw some light on its probabilistic behavior. We now believe that n^{1/2} H_n converges in distribution, but we have no idea how this might be proved. Since our methods seem to require 1 ≤ p < 2 and are more pertinent to strong laws, an entirely new technique may be needed.
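For tiny samples the functional H_n can at least be computed exactly by enumeration, which makes its definition as a covering radius concrete (our own sketch; it sheds no light on the limit law conjectured above).

```python
import random
from itertools import combinations

def h_n(points, k):
    """H_n: the smallest covering radius achievable with k centers
    chosen from the sample (the p -> infinity limit of M_p^{1/p})."""
    best = float("inf")
    for S in combinations(points, k):
        r = max(min(((a[0] - c[0]) ** 2 + (a[1] - c[1]) ** 2) ** 0.5
                    for c in S)
                for a in points)
        best = min(best, r)
    return best

random.seed(7)
n, alpha = 12, 0.25
pts = [(random.random(), random.random()) for _ in range(n)]
print(h_n(pts, int(alpha * n)))  # covering radius with [alpha n] = 3 centers
```

As with M_p, the radius is nonincreasing in k and vanishes when every sample point is a center; the enumeration cost again grows exponentially in k.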
There are also basic open problems directly concerning M_n ≡ M_p([αn]; X_1, X_2, ..., X_n). In particular, it seems almost certain that a result analogous to our Theorem 2 must hold when the X_i are independent, identically distributed, and bounded. The methods used in Steele (1981a,c) seem to fail to help in the location problem because of the difficulty of establishing the intermediate result for step densities.
Finally, there is the question of determining C_{p,α}. This is usually hopeless, but perhaps not in this case. Fejes Tóth (1959) was able to determine the analogous constant in the original Steinhaus problem, and McClure (1976) has been able to extend the work of Fejes Tóth to other functionals and extremal problems.

References

BEARDWOOD, J., HALTON, J. H. AND HAMMERSLEY, J. M. (1959) The shortest path through many points. Proc. Camb. Phil. Soc. 55, 299-327.
BOLLOBÁS, B. (1973) The optimal arrangement of producers. J. London Math. Soc. (2) 6, 417-421.
CORNUEJOLS, G., FISHER, M. L. AND NEMHAUSER, G. L. (1978) Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms. Management Sci. 23, 789-810.
DOETSCH, G. (1943) Theorie und Anwendung der Laplace-Transformation. Dover, New York.
EFRON, B. AND STEIN, C. (1981) The jackknife estimate of variance. Ann. Statist. 9, 586-596.
FELLER, W. (1971) An Introduction to Probability Theory and its Applications, Vol. II, 2nd edn. Wiley, New York.
FEJES TÓTH, L. (1959) Sur la représentation d'une population infinie par un nombre fini d'éléments. Acta Math. Acad. Sci. Hungar. 10, 299-304.


FISHER, M. L. AND HOCHBAUM, D. S. (1980) Probabilistic analysis of the K-median problem. Math. Operat. Res. 5, 27-34.
KARP, R. M. (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, ed. R. E. Miller and J. W. Thatcher, Plenum Press, New York, 85-104.
KARP, R. M. (1977) Probabilistic analysis of partitioning algorithms for the traveling salesman problem in the plane. Math. Operat. Res. 2, 209-224.
MCCLURE, D. E. (1976) Characterization and approximation of optimal plane partitions. Technical Report, Division of Applied Mathematics, Brown University.
PAPADIMITRIOU, C. H. (1981) Worst-case and probabilistic analysis of a geometric location problem. SIAM J. Computing 10, 542-557.
STARRETT, D. A. (1974) Principles of optimal location in a large homogeneous area. J. Econom. Theory 9, 418-448.
STEELE, J. M. (1981a) Subadditive Euclidean functionals and nonlinear growth in geometric probability. Ann. Prob. 9, 365-376.
STEELE, J. M. (1981b) Optimal triangulation of random samples in the plane. Ann. Prob.
STEELE, J. M. (1981c) Growth rates of minimal spanning trees of random samples in space. Z. Wahrscheinlichkeitsth.
STEINHAUS, H. (1956) Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. Cl. III 4, 801-804.
