The Traveling Salesman Problem

Edited by E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, D. B. Shmoys (g) 1985 John Wiley & Sons Ltd.

Probabilistic analysis of heuristics

R. M. Karp

University of California, Berkeley

J. M. Steele

Princeton University

1 INTRODUMON 181

2 PROBABiLisTic ANALYSIS OF THE EucLmEAN TSP 183

2.1 Some elementary facts about the TSP 183

2.2 The fixed dissection algorithm 185

2.3 Two asymptotic probabilistic results 186

2.4 Probabilistic background 186

2.5 A simple proof of the BHH theorem 188

2.6 Analysis of the fixed dissection algorithm 190

2.7 The Euclidean directed TSP 193

3 PRoBABILIsmc ANALYSIS OF THF AsymmEmc TSP 194

3.1 The assignment problem and the patching operation 195

3.2 A probabilistic bound on the largest cost in an optimal assignment 197

3.3 Analysis of the patching algorithm 199

3.4 Open questions 203

1 INTRODUCTION

One of the mysteries surrounding the TSP is the remarkably effective performance of simple heuristic solution methods. The properties of such heuristic methods are usually established empirically, simply by trying the methods and observing the quality of the results. The present chapter explores a complementary theoretical approach, in which it is assumed that problem instances are drawn from certain simple probability distributions, and it is then proven mathematically that particular solution methods are highly likely to yield nearoptimal solutions when the number of cities is large. This analysis also reveals that the cost of an optimal solution to a random TSP is sharply predictable from the parameters of the underlying probabilistic model.

Two principal probabilistic models are discussed: a Euclidean model, in which the cities are points in ddimensional Euclidean space and their

181

1

i

i

182 6 Probabilistic analysis of heuristics

locations are drawn independently from the uniform distribution over the ddimensional unit cube, and an asymmetric model, in which the elements of the distance matrix (cii) are drawn independently from the uniform distribution over [0, 11 and neither symmetry nor the triangle inequality is assumed. Section 2, which treats the Euclidean model, and Section 3, which treats the asymmetric model. can be read independently. The reader should also refer to Chapter 11, where a random graph model is considered, in which the input is a random graph and the object is to determine whether a Hamiltonian cycle exists.

Within each of our two models certain predictions will hold true with very high probability when the number of cities is very large. In the ddimensional Euclidean case it is predictable that the cost of an optimal

W111d

solution will be close to a certain constant times n ' ' where n is the number of cities, and that simple, efficient algorithms based on a 'divideandconquer' principle will yield nearoptimal solutions. In the asymmetric case it is predictable that a nearoptimal solution to the TSP can be obtained by patching together the cycles of an optimal solution to the assignment problem.

The study of random Euclidean TSPs was initiated in the pioneering paper by Beardwood, Halton & Hammersley [ 1959 1, where the following is proved:

Let {Xi}, 1 i < , be independent random variables uniformly distributed over the d dimensional unit cube, and let L, denote the Euclidean length of a shortest closed path which connects all the elements of {X,, X2,. . ., XJ. Then there is a constant c. such that, with probability 1 , lim. _ L. n (d 1)1d = c, We will give a new and brief proof of this classic result in Section 2.

The study of cellular dissection algorithms for the approximate solution of random TSPs in the plane was initiated by Karp [ 1976, 1977 1. The locations of the n cities are assumed to be drawn independently from the uniform distribution over the unit square. The algorithms dissect the unit square into rectangular cells, each containing a small number of cities, construct an optimal tour through the set of cities in each cell, and then patch these subtours together into a tour through all the cities. Karp proposed a fixed dissection method in which all the cells are congruent squares, and an adaptive dissection method in which the locations of the cities determine the dissection. Refinements of the analysis of the fixed dissection method are due to Weide [ 1978] and Steele f 198 1 ]. The method is extended to higher dimensions by Halton & Terada [ 1982 ].

Section 2 presents the principal dissection methods, discusses their execution times and derives theoretical bounds on the quality of the solutions they produce. In each of the methods there occurs a parameter s(n) giving the number of cells into which the unit ddimensional cube is dissected. A typical, but simple, consequence of the results proved there is the following:

Assume that s (n) = o (n), so that the average number of cities per cell

i

i

1

2 Probabilistic analysis of the Euelidean TSP 183

grows without bound as n>. Let L.F(Xl,X2_ X.) be the length of the tour produced by the fixed dissection method; then, with probability 1, lim,LFn Idllld= cd, where is the same constant that appears in the

n Cd

statement of the BeardwoodHaltonHammersley theorem. Thus we see that the length of the tour produced by the fixed dissection method has the same asymptotic behavior as the length of the optimal tour.

Section 3 is concerned with probabilistic properties of the asymmetric TSP. It is assumed that the distances cil are drawn independently from the uniform distribution over [0, 11. Then the minimum cost of a tour is given by min,{Yi ci.(i)}, where 7r ranges over the cyclic permutations of ~l, 2. . , n}. A closely related problem which can be solved in time 0(n') is the assignment problem: min.~Yi c~,(i)}, where o, ranges over the permutations of {l, 2n}. An approximation algorithm for the TSP is presented which first solves the assignment problem to obtain a permutation a, and then constructs a tour by patching together the cycles of o. Let T* be the cost of an optimal tour, and let T be the cost of the tour produced by the approximation algorithm. It is proven that E[(T T*)1V1 = 0(n 112 ). Thus the approximation algorithm tends to give a nearoptimal tour when n is very large.

2 PROBABILISTIC ANALYSIS OF THE EUCLIDEAN TSP

The Euclidean TSP is the problem of finding a closed path (tour) of minimum length through a given set of points in ddimensional Euclidean space. We conduct a probabilistic analysis of this problem on the assumption that the points are drawn independently from the uniform distribution over the ddimensional unit cube. The probabilistic analysis will be concerned both with the length of the optimal tour and with the lengths of the tours produced by certain dissection algorithms. Each of these algorithms dissects the unit cube into regions, constructs an optimal tour through the points in each region, and then patches these tours together to obtain a single tour through all the given points.

2.1 Some elementary facts about the TSP

It is clear that the shortest closed path through n given points is always a simple polygon (unless all the points are collinear) but it will be convenient in describing certain approximation algorithms to allow closed paths which make repeated visits to some of the given points. Such a closed path can easily be converted to a simple polygon of smaller length.

The following lemma, illustrated in Figure 6.1, will often be useful.

Lemma 1 Let V be a set of points and E a multiset (i.e., a set which may contain repeated elements) of line segments joining pairs of points in V. Suppose that any two points in V can be joined by a path consisting of line

Figure 6.1 A multiset of line segments which determines a closed walk

6 Probabilistic analysis of heuristics

segments from E. Then the following are equivalent:

(i) Every point in V is the endpoint of an even number of line segments in E. (ii) E can be decomposed into a union of edgedisjoint cycles.

(iii) The line segments in E determine a closed walk through all the points in V.

The following bound will be used repeatedly.

Lenuna 2 Let I,, be a set of n points in the ddimensional unit cube, d 2.

(d 1)1d W2)1(d1)

Then there is a closed walk through 1, of length dn + 6dn

where 8d depends only on d.

Proof The proof is by induction on d, with d = 2 as the basis.

Basis. Let A = W n 1121 . Let the unit square be dissected into horizontal strips 51, S2 .., Sp,1121 of width A. Adjoin to I. new points at the righthand ends of the boundaries between S, and S2, S3 and S4,. . ., and at the lefthand ends of the boundaries between S2 and S3, S4 and S,, . . ., and call the resulting set of points I,,. Construct a closed walk which visits the points in s, n i,,, in lefttoright order, then visits the points in S2nin' in righttoleft order, etc., finally returning to the initial point from the last point of the final strip. The length of this tour is

f n 1121 +,&(n + [ n 1121 ) +

2,/_n_+ 2 + ~2.

Induction step. Assuming the result for d, we prove it for d+ 1. Let

4=11[n 1/(d+1)l . Let the unit cube in Rd' be dissected into parallelopipeds

S1, S2.... 5 S[n"d's), by hyperplanes parallel to one of the coordinate axes

and a distance A apart. Adjoin to I,, an arbitrary new point on each of these

hyperplanes, and call the resulting set of points I'. Let ni =is, n I'l. For

n n

each i, construct a closed walk through the points in si n I,' as follows:

(i) Project each of the points in si n i, onto the hyperplane which forms the base of Si.

(ii) Find a closed walk of length dn ~d Md + 8dn ~d2)1(d1)

1 1 through the set of

2 Probabilistic analysis of the Eudidean TSP 185

projected points (this entails iterative use of the construction being described here).

(iii) Visit the points of sin i, in the same order as the corresponding

n

projected points are visited.

By Lemma 1, the union of the set of closed walks constructed in this way determines a closed walk through all the points in I', and hence through all the points in the subset I..

The length of this walk is

rn Md, 11

(dn ~dllld+ 8dn ~d2)1(d1)) + (n+ In 1/(d+l) pA.

A concavity argument shows that this bound is maximized subject to 1 ni n + 1/A when each ni is equal to n A + 1, in which case the bound becomes

(d (n A+ 1Yd1)1d+ 8d (n A+ 1)(d2)1(dl)) +(n + In Il(d+1)l

(d + 1)n dAl + 1) + 0(n (dl)/d

This completes the induction step. n

Lemma 2 is sufficient for our purposes, but stronger results are possible. Few [19551 has shown that there is a closed walk of length 2 112 n "2 + 1.75 through any set of n points in the unit square. For higher dimensions, tight upper bounds on the length of the shortest closed walk through n points in the unit hypercube are given by Moran [1982].

2.2 The fixed dissection algorithm

Any method for the exact solution of the TSP can be used in conjunction with various divideandconquer strategies to produce interesting approximation algorithms for the Euclidean TSP. An example is the following fixed dissection algorithm fKarp, 1977; Halton & Terada, 19821. Let the dimension d be fixed. Let m(n) be an integervalued nondecreasing function such that m(n )d is o (n). Let s (n) de note m (n )d.

Fixed dissection algorithm

Input: A set I,, consisting of n points in the unit ddimensional cube Q. Partition Q into s(n) congruent subcubes Q. Construct an optimal tour through each nonempty set i. n Qi. Select an arbitrary element Xi from each nonempty set i,, n Qi, and obtain a tour through {Xi} using the construction in the proof of Lemma 2.

The tours within the subcubes, together with the tour through {XJ., determine a closed walk through I..

186 6 Probabilistic analysis of heuristics

2.3 Two asymptotic probabilistic results

The Euclidean TSP will be analyzed on the assumption that the cities are drawn independently from the uniform distribution over the ddimensional unit cube. Let Xi, 1 i < , be independent identically distributed (i.i.d.) random variables with the uniform distribution on [0, 11`. Let I,=

{X,, X Xj. For any finite set S c [0, 11', let L(S) denote the length of

an optimal tour through S, and let L'(S) denote the length of the tour

produced by the fixed dissection algorithm.

Theorem 1 [Beardwood, Halton & Hammersley, 19591 With probability

lim L(I.)n`d1)1d= Cd. n

(Here, cd is a constant depending on the dimension d.) Theorem 1 will be referred to as the BHH theorem. Theorem 2 With probability 1,

F (d 1)1d

liffl L Qj n Cd.

Roughly stated, these theorems assert that the length of an optimal tour

through n random points is sharply predictable when n is large, and that the

fixed dissection method tends to give nearoptimal solutions when n is large.

2.4 Probabilistic background

In this subsection we collect the principal definitions and tools from probability theory that will be used in our analysis of the Euclidean TSP. We do expect that the reader be familiar with some basic facts from elementary probability theory, such as E[Y_Xil=JE[Xil and var[cX]=c'var[Xl. A detailed introduction to probability theory is given by Feller [ 1968

Markov's inequality

Let X be a nonnegative random variable with mean g. Then, for a 1, Pr[X > ag 1 < Va.

Chebyshev's inequality

Let X be a random variable with mean tt and variance u. Then Pr[IXli~ao,] 1/02.

EfronStein inequality [Efron & Stein, 19811

Let Xl, X2, ..., X. be i.i.d. random variables. Let S = S(yi, y2, ..., y~,) be any symmetric function of n1 random variables. Let Si=

2 Probabilistic analysis of the Eudidean TSP 187

S(Xl, X2, Xi1, X1,._ X.), and let Y be any random variable. Then

var[S(Xl, X2_ X.JJ'E[ (Si y)2

McDiarmid's inequality

Let X, X2,. . ., X, be nonnegative integer random variables which sum to n and have the multinomial joint distribution; i.e.,

Pr[Xl = n 1, X2 ~ n2,. . ., X, = ns 1 = n! sn

11 n,. 1 *

Let be nonnegative nondecreasing functions. Then

var[li., fi(Xi)]Yi=l var[fi(Xi)l.

Poisson process

The Poisson process on Rd with unit intensity formalizes the idea of scattering points at random in R d so that the average number of points per unit volume is 1. The process is defined as a random function rI which associates with every measurable set A C Rd a random set of points in A such that:

(i) A c: B => n (A) ~: FI (B);

(ii) the integer Ifl(A)l is a Poisson random variable with parameter A ii(A), where ji is Lebesgue measure;

(iii) n(A) and fl(B) are independent if g (An B)= 0;

(iv) conditioned on the event ~II(A)1 = k, the k elements of II(A) are independently and uniformly distributed in A.

A Tauberian theorem [Schmidt, 1925; Bingham, 19811.

If f (k) is monotone increasing, and if, as A lr=o f(k)e'A'Ik! cA',

a > 0, then as n > , f(n) cC.

Stochastic convergence

Let {YJ, 1 n < , be a sequence of random variables and let Y be a random variable. We say Y.+ Y in probability if, for every e > 0, we have

lim Pr[ 1 Y, Y~ > 0

n

A stronger notion than convergence in probability is that of almost sure convergence (also called convergence with probability 1). We say Y,, Y almost surely (a.s.) provided

Pr lim sup Y. = Y = lim inf Y. 1 .

1 n n 1=

1 i 1

i

188 6 Probabilistic analysis of heuristics

To reinforce understanding of the notion of almost sure convergence, the reader might focus on the fact that lim sup._ Y., lim. inf, Y. and Y are all random variables. To say Y,, > Y almost surely just means that with probability 1, all three of these random variables are equal.

The difference between convergence in probability and convergence aImost surely is an important one both in theory and in practice. Particularly in the area of probabilistic analysis of algorithms it is valuable to preserve the distinction. An exercise is given later to show that convergence a.s. implies convergence in probability, but not the converse.

A key tool in the proof of almost sure convergence is the following.

BorelCantelli lemma

If, for every E>O, Y_,=1 PrflY, Y>eJ<, then Y. > Y almost surely.

Complete convergence

The BorelCantelli lemma gives a sufficient condition for the almost sure convergence of Y,, to Y, but the condition is not necessary. When the condition

1 Pr[IY,,Y~>el

holds we say Y. converges completely to Y.

In the section which follows we give a simple proof of the BHH theorem which rests on a subadditivity argument and the EfronStein inequality. These same tools were pushed a bit harder by Steele [19811 to prove complete convergence in place of almost sure convergence.

2.5 A simple proof of the BHEL theorem

The notions of the preceding section will now be applied to give a simple proof of the BHH theorem.

Theorem 3 Suppose Xi, 1 i < , are i. i.d. with the uniform distribution on [0,11d. Jet J.={Xl,X2_.,XJ. Then L(IJIn(dlVd,cd almost surely, where cd is a constant depending on the dimension d.

Proof To prove this theorem we first get the asymptotics of the expected values, i.e., we show E[LQJ1cdrt (dl)ld as n For this purpose it is handy to first consider a related situation in which the points are distributed according to a Poisson process.

Ut rI denote the Poisson process on R d with unit intensity; in particular, rj([0, tld ) denotes the set of points that fall in the cube [0, tld. Let F(t)

E[L(II([0, tld))]. Then

F(t)= Pr[III([0, tld)i = nIE[L(II([0, tld)) J J11([0, tld)l = n I.

n=0

i

2 Probabilistic analysis of the Eudidean T8P 189

Recalling that 1IfflO, t1`)l has a Poisson distribution with parameter t', and noting that, by an obvious scaling argument, an optimal tour through n points drawn independently from the uniform distribution over [0, tl' has expected length tE[L(IJ], we 'obtain

F(O = i e'd t dn tE[L(IJ], (1)

n=0 ~!

so information about F(t) should give us information about E[L(Ij].

By decomposing [0, tId into congruent subcubes Qi of side t/m and applying the fixed dissection algorithm, we obtain

md

L QI ([ 0, tId))__ L(II(Q)) + t(dMd1 + SdMd(d2)1(d1)

The first terms come from the optimal tours within the subcubes. The second term is from the bound of Lemma 2 applied to a set of arbitrarily chosen points, one from each Qi with ll(Q):~ 0. Taking expectations,

F(t) __ Md F t ) + t(dMd1 + 8dMd(d2)1(d1)

(M

Setting t = ms, we obtain

Rms) F(s) d m 1/(d1)

(MS)d S d + 8d M=1,2 .....

S d S d1

This inequality, together with the fact that F(t) is monotone and F(t)l td is bounded, implies that F( t)ltd approaches a limit cd as t

Returning to (1) and letting U = td, we see

U _ CdU(d1)1d. E[L(IJIe'n!

Since E[L(IJI is monotone, the Tauberian theorem of Section 2.4 tells us at once that

(dl)ld

E[L(Ij] Cdn

To bound the variances of the variables LQj we apply the EfronStein

inequality with S(Xl, X2,. L(I.1) and Y = LQj. This gives

[ (jj)2

var[L(I,,1)1E (L({Xl, X2 ., Xi1, Xij_ . ., Xj)L

nE[L(I.1)L(I. )12.

Since an easy calculation shows

E[L(I~1)L(In )12 = 0(n 211) and hence

var[ L Qj 1 = 0 (n (d211d).

190 6 Probabilistic analysis of heuristics

This implies var[L(I,,)n(dl)ld 0(n'), and taking nk = k' we get

var (d Wd < 00.

k=l k

By Chebyshev's inequality, we have for any E> 0

Pr[IL(I.) E[L(I.)11 > en (d Wd E 2 var

so, by the asymptotics of E[LQJ], we have

(dl)ldl (dl)ld (d Wd)

Pr[IL(I.)crt > En Pr[IL(I.)/(n cl E 1 < 0C.

k k k

k=l k=l

The BorelCantelli lemma then says,

(d 1)1d

L (In,) cn k with probability 1 as k (2)

Since nk,l/nk 1 and L (Ink L (In) L (Ink ~ 1) for nk n < nk,,, (2) implies the desired result that L (In) cn (d Wd with probability 1 as n

2.6 Analysis of the fixed dissection algorithm

First we will recall our model and the procedure we have called the fixed dissection algorithm. By Xi, 1 i < N, we denote independent random variables with the uniform distribution on the unit cube Q in Rd. and L. = L. (X,, X2 XJ is just the length of the minimal tour through M, X2, . . . , XJ

The fixed dissection algorithm consists of dividing the cube Q into s congruent subcubes Q, solving the TSP within each subcube for the data Q, n m, x, .. , x.} = Di, crudely touring a set R consisting of representatives of all of the nonempty Di, and crudely deleting excess edges to convert the resulting closed walk to a simple polygon.

The main objective of this section is to prove two results which respectively assert the effectiveness and the efficiency of the fixed dissection algorithm. First we'consider the effectiveness and give an elementary proof (not using BHH) that the ratio L.'1L. converges completely to 1. Specifically, we prove the following.

Theorem 4 If cr is an unbounded increasing function of n and s = n/cr, then

F

Pr Ln ~1+e <,VE>O. n=l L.

The proof will be obtained by two reasonably easy lemmas, the first of which asserts that L. is unlikely to be small compared to n 1d111d The proof of the first lemma is one of our listed exercises.

Lemma 3 There exist constants A > 0 and 0 < p < 1 such that for all n 1,

Pr[L.

2 Probabilistic analysis of the Eudidean TSP 191

The second lemma shows that L'~' is bounded above by L. plus a quantity

W1)1d which is deterministically small compared to n

Lemma 4 There is a remainder r, such that

L,, L Fn Ln + r,,

and rn = 0(n (d Wd (7_ 1/Wd1)) ), where the 0 depends only on d (not on n or a).

Proof Let T denote the optimal tour through {X,, X Xj. For each

f ace Fil, 1 j 2d, of Q we consider the set of 'marks' where an edge of T

which connects a point of oin{XI,X2,...,X,,} to a point of Qkn

{X, X2, . .. , XJ, k 71 i, intersects Fjj. We let Mjj, 1 i s, 1 j 2d, denote

the sets of marks in Fil. Trivially, the cardinality of Mj, (denoted by 1Mii I) is

even and is bounded by 2 nj = 2 1 Qi n {x,, x, xjj, the total number of

points in Q.

We will now get an upper bound on L'. Recalling the definition of L' and n

the bound of Lemma 2, we get

LF (d Wd + 0 (5 (d 1)1d)}.

n L.(Q)+{ds

Our main task is to bound the sum above using parts of the optimal path and extra lengths of lower order. Let Q n T denote the segments of T contained in Q. From each nonempty Mjj, 1 j 2d, choose an arbitrary element as its representative. To obtain a tour through Qi n {X, X2,. . ., XJ, add segments to Qi n T as follows: for each nonempty Mjj, a tour through Mj,; a tour through the representatives of the nonempty sets M,,; for each nonempty Mii of even cardinality, a perfect matching of the elements of Mij; for each Mii of odd cardinality, a perfect matching of the elements of Mii other than the representative; a perfect matching of the representatives of the Mil of odd cardinality (there will be an even number of such representatives). Note that a perfect matching can be constructed by taking alternate edges of a tour through an even number of points. This procedure yields a walk through Qi n {X, X2~ .... Xj and the cost of this walk is bounded by

2d

length(T n Qi) + ((d 1) 1Mijlld2)/(d~) + Sd1 X, M3Md2))S11d

+{d(2d )(dl)ld + 0 ((2d )(dl)ld)}s 11d

2d

+ ((d 1) ~Mij~ld2)1(dl)+ 8d1 XJ (d3)1(d2))slld

+ W (2d )(dl)ld + o ((2d )(dl)ld)~ S_ Ild.

FK:

192 6 Probabilistic analysis of heuristics

Summing over all the cells of the dissection,

1Mii Ild2)1(d1) + 8d_l 1Mij l(d3Md2) lid

(d 1) S

+2{d(2d )(dl)ld+ 0Q2d )(dl)ld)}S(dl)ld.

Since Y5=ll:'.,1Mijl12n and the functions X W2ffid1) and X1d3)/W2) are concave, the above expression is bounded above by the value it assumes when each M,, is 2n/(sd). This gives

E L.(Q) L. + 0(n (d2)1(d1) S 1 1(d (d 1)) )+O(S (d 1)1d

The substitution n/o s completes the proof of the lemma. G

Proof of Theorem 4 This is now easily given:

P4L..>l+e Pr[L.

The first summand is summable since it is dominated by a term going to 0 geometrically, and the second term is summable since it is 0 for all sufficiently large n. n

The execution time of the fixed dissection method clearly has the same order as the time to solve all of the TSPs within the subcubes. We assume that dynamic programming [Bellman, 1962; Held & Karp, 19621 is used to solve the TSPs within the subcubes. Since dynamic programming solves an xcity TSP in time bounded by f(x) = AX2 2', the order of the execution time of the fixed dissection procedure is bounded by order of

S

T. f(ni)

where nj denotes the number of elements of the set Qi n{xl, x2.... 1 Xn}. We shall derive upper bounds on the mean and variance of Tn. As a first step we bound the mean and variance of f(n) for a fixed i. The random variable nj has a binomial distribution:

Pr[ni k 1 = (n)

k S) S)

Hence

(n) (,)k(l 1)nk

2 k

E[f(ni)l Ak 2

k=0 k S S

and

)21 2 4 k (n)(1)k(l lyk.

E[f(ni A k 4

k=0 k S S

Using the identity

(n)Xk

k(k1) ... (k i + 1) n(n 1) ... (n i + 1)xi(l + x)`

k=0 k

2 Probabilistic analysis of the Eudidean TSP 193

one obtains, after some algebraic manipulation, that, as n oo and s 00 in such a way that o,=n/s>oo, E[f(nj]4A0,2 e' and var[f(ni)l256A 104 e'. Since expectations add, E[ Tn 1 4Anoe and, by McDiarmid's inequality, var[ Tn 1 25 6A 2 no,3 e 3 . Thus, for the specific choice 0, = [log n 1, the expected execution time of the fixed dissection method is 0(n 2 log n) and the variance is 0(n'(log n)').

2.7 The Euclidean directed TSP

Karp [19771 posed the problem of formulating a probabilistic model of the directed TSP for which one can find an algorithm that runs in polynomial time with high probability. One such formulation was given by Steele [19851; although, as we will see, the available results are far less complete than for the undirected case.

To specify the model let Xi, 1 i < , be independent random variables with the uniform distribution on [0, 1f. As the vertex set of a random graph G. we take Vn = {X,, X2,. . . , Xn}. To define a set of directed arcs for G. we first suppose that for 1::zi

By a solution to the directed Euclidean TSP we mean here a path through V. which has minimum Euclidean length. We denote this length by D,

The results established by Steele [ 1985 ] are the following.

Theorem 5 There is a constant 0 < 0 < such that as n 00,

E[Dn]0,/n.

Theorem 6 There is a polynomial algorithm which provides a directed path through Vn which has length D* satisfying

E[D*j(l+e)E[D.1

n

for all e > 0 and n N().

These results easily generalize to [0, 11', but the extension to almost sure (or complete) convergence results seems to be considerably more difficult than in the undirected case. These extensions remain as open problems.

Exercises

1. Prove that the shortest closed walk through a set of n points in R", not all of which are collinear, is a simple polygon. 2. Prove Lemma 3.

194 6 Probabilistic analysis of heuristics

3. Give your own proof, independent of Lemma 2, of the following fact: Let S be a set of n points in the unit ddimensional cube. Then L(S)

W1)1d

adn ' where ad depends only on the dimension d.

4. Prove that for each dimension d there exist constants Ad > 0 and B, < 1 such that Pr[L(IJ

6. The strips method for constructing a tour through n random points in the unit square dissects the square into 1/4 horizontal strips of width A, and then follows a zigzag path, visiting the points in the first strip in lefttoright order, then the points in the second strip in righttoleft order, etc., finally returning to the initial point from the final point of the last strip. Prove that, when A is suitably chosen, the expected length of the tour produced by the strips method is 0.93Jn = 0(,rn).

7. Let S be a set of points in the unit square 0, and let Q be dissected into rectangles Q. Prove: Y_i L(s n Q) L(S) + 32 Y_ per(Q), where per(Qi) is the perimeter of Q.

8. Generalize the preceding result to d dimensions.

9. Prove the following scaling principle: Let S be a set of n points drawn independently from the uniform distribution over [0, tl'. Then E[L(S)]

tE[L(IJ].

10. Complete the proof that F(t)It' approaches a limit.

11. Prove: L(I.) L(I_1) 2 mini= 1,2,...,nl~iXn Xi 1}. Here 1X.Xil denotes the Euclidean length of the vector X,Xi.

3 PROBABILISTIC ANALYSIS OF THE ASYMMETRIC TSP

In this section we consider a very general form of the TSP, in which the distances between cities are given by an arbitrary nonnegative n X n matrix (cil). Neither symmetry (cii = cii) nor the triangle inequality (cil + Cik ~' Cik) is required. The cost of an optimal tour is min,{Yi ci,(i)}, where ir ranges over the cyclic permutations of {l, 2, . . . , n}.

The asymmetric TSP appears to be a tough nut to crack, since it is MPhard to construct a tour whose cost is within a constant factor of the cost of an optimal tour. Despite this evidence that the problem is difficult, there is a simple heuristic algorithm which, in most instances, will produce a nearoptimal tour. This heuristic is based on solving a related problem called the assignment problem, and then using certain patching operations to convert the solution of the assignment problem into a tour. The assignment problem can be stated as follows:

minimize Ci~(i),

where o, ranges over the permutations (not just the cyclic permutations) of {l, 2n}. The assignment problem can be solved in 0(n') steps.

3 Probabilistic analysis of the asymmetric TSP 195

In order to validate this patching algorithm we conduct a probabilistic analysis on the assumption that the distances cii are drawn independently from the uniform distribution over [0, 1]. Let A* be the cost of an optimal assignment for the n X n matrix (cii), let T* be the cost of an optimal tour, and let T be the cost of the tour produced by the patching algorithm. Then A* T* T. The inequality A* T* follows because the assignment problem is a relaxation of the TSP. The inequality T* T follows because T* is the cost of an optimal solution of the TSP, and T is the cost of a feasible solution.

When the cii are drawn independently from the uniform distribution over [0, 1], A*, T* and T become random variables. It is a surprising fact, first proved by Walkup [19701, that E[A*], the expected value of A*, remains bounded as n. Karp [19841 has proved that, E[A*j<2 for all n. Lazarus [19791 has proved that E[A*j_l+l/e+0(l/n). Computational experiments indicate that E[A*] is close to 1.6 when n is greater than 100.

Our main results here are that

E[TA*1<2.33n 1/2 and E[ T T* 0(n 1/2

F_ 1 =

Thus the expected cost of an optimal tour remains bounded as n and tends to be close to the cost of an optimal assignment. Moreover, the percentage difference between the cost of an optimal tour and the cost of the tour produced by the patching algorithm tends to be very small when n is large.

3.1 The assignment problem and the patching operation

Recall that the assignment problem asks for a permutation (r that minimizes Ii ci,(i), while the TSP asks for a cyclic permutation 7r that minimizes Ii C,,(n. Thus the assignment problem is a relaxation of the TSP, and A*, the cost of an optimal assignment, is a lower bound on V, the cost of an optimal tour. Computational experience indicates that this lower bound is often very tight. In Chapter 10, Balas and Toth report the following experiment.

'We generated 400 problems with 50n_250, with the costs independently drawn from a uniform distribution of the integers over the intervals [1, 1001 and [1, 10001, and solved both the AP and the TSP. We found that on the average v(AP)[=A*] was 99.2% of v(TSP)[=T*I. Furthermore, we found the bound to improve with problem size, in that for the problems with 50 n ~150 and 150n_250 the outcomes were 98.8% and 99.6%, respectively.'

Since an optimal assignment can be computed in 0(n 3) steps, and since A* tends to be a tight lower bound on V, it is natural to solve the assignment problem as a first step towards the exact or approximate solution

i

i

1

196 6 Probabilistic analysis of heuristics

of the TSP. Balas and Toth survey several rather successful branch and bound algorithms for the directed TSP based on the use of A* as a lower bound on T*. In the present chapter we explore a patching algorithm which uses an optimal assignment as the starting point for the construction of a nearoptimal tour. A similar algorithm was presented by Karp [1979], but the analysis presented here is simpler and the results are stronger.

The patching algorithm is based on a patching operation. To explain this

operation it is necessary to discuss the cycle structure of a permutation. With

any permutation r we may associate a directed graph with vertex set

~l, 2.. , n} and arc set 1 i = 1, 2.n}. Each connected compo

nent of this graph is a directed cycle. These components are called the cycles

of r and, of course, a cyclic permutation is a permutation that has only one

cycle.

In general cr, the permutation which is the optimal solution of the assignment problem, will have many cycles. The patching algorithm converts cr to a cyclic permutation by a sequence of patching operations, each of which joins two cycles together. Let 7. be a permutation, and let i and j be elements that occur in two distinct cycles C and D. Then the (i, j)patching operation, depicted in Figure 6.2, joins C and D into a new cycle by

TU)

Figure 6.2 The (i, j)patching operation

inserting the arcs (i,,r(j)) and (j,,r(i)) and deleting the arcs (i,,r(i)) and (j,,r(j)). This operation increases the cost of the permutation by

A(T, i, j) = Ci,(i) + Ci,(i) Ci,(i) Ci,(i).

The following is a statement of the patching algorithm.

Patching algorithm

begin

{a is the optimal assignment and r is the permutation currently being

processed}

while r is not a cyclic permutation do begin let D, and D2be the two longest cycles in r;

i

198 6 Probabilistic analysis of heuristics

Setting 1* = [lg(n + 3) 21, we have l{k 1 dc,(i, k) 1*}1 (n + 1)/2 and J{k 1 dc,(k, i)1*}1(n+ 1)/2. Note that if i is contained in either of the sets {k 1 dc,(i, k) 1*} and {k 1 d,,(k, i) 1*} then the lemma has been proved, and so we assume that this is not the case. Hence there exists a k:~ i such that d,(i,k)1* and dG(kj)_l*, and it follows that i lies in a cycle of length 21* = 2[lg(n + 3)l 4. E

Lemma 7 Let A = (a,,) be an n X n matrix of O's and l's whose elements are independent random variables such that, for each i, j, Pr[aii = 1] = 10 log n/n. Then Pr[A is not an expanding matrix]= 0(n').

Proof Call a set S ~~: {1, 2, . n} rowfaulty if ISI n/4 and jr(S)l 2 ISI, and columnfaulty if IS1 n/4 and jr(S)l 2 JS1. Then A fails to be an expanding matrix if and only if there is a rowfaulty or columnfaulty set. The expected number of rowfaulty sets is bounded above by

(n)( n p)k(~2k)

k~l k 2k

where p = 10 log n/ n. Since

(n)_ e ( n ) 2k n101%

_ (~_) k, _ ' ne and 1p

the above summation is bounded above by

1 /41 2 2 k.

V ine n e _ 10 20k/

k=, k k 2 n n

Since k 1 and n 20kIn n' when k n/4, this summation is .

Given the cost matrix (cii), define the 01 matrix A (C)= (a,,) by

1 if c,, < 10 log n,

aq = n

0 otherwise.

By Lemma 7, P4A (C) is an expanding matrix]= 1 0(n 2).

Let G' be fhe digraph with vertex set {1,2_n} and are set W1 j) 1 Cia(i) < 10 log n/n}. Then G' is an expanding digraph if and only if A(C) is an expanding matrix. Thus, Pr[G is an expanding digraph]= 10(n'). To complete the proof of Lemma 5, we need only prove the following lemma.

Lemma 8 Let o, denote the optimal assignment for C. If G' is an expanding digraph then, for all i, Ci.(i), c' (n).

3 Probabilistic analysis of the asymmetric TSP 1 199

Proof The proof is by reductio ad absurdum. Suppose G' is an expanding digraph and ci,(i)>a(n). By Lemma 6, G' contains a cycle of length t 2 [lg(n + 3)l 4 from i to i. Let the successive vertices along this cycle be

il, i2_ i, where i = il. Consider the permutation * given by:

'&(il) = 0G2), 15G2) = a03), 15(0UG1)

and

la(j) = a(i) for 0 {i 1, i2, 0.

Then

n n

Cia(i) Ci~(i) (ci,~(i~) + + C41M+ C4GO)

+

But, by the way G' was constructed, each term in the first summation on the righthand side is 10 log n/n. Also each term in the second summation is :_:,0, and ci,,(i,)>a(n). It follows that

n n 10t log n

n

contradicting the optimality of tr. n

3.3 Analysis of the patching algorithm

Our main goal in this section is to derive an upper bound on the expected cost of converting the optimal assignment to a tour, using the patching algorithm.

Call the n X n matrix (cii) exceptional if its optimal assignment includes an are of cost>a(n). By Lemma 5 the probability that a matrix is exceptional is 0(n'), and thus the patching costs associated with exceptional matrices cannot contribute more than 0(n') to the overall expected patching cost. Call a cost ci, small if c,,. a (n), and large otherwise. Associate with (cil) a matrix (E,i) defined as follows:

Cil if cii issmall,

oo if cii is large.

If (cii) is not exceptional then (cii) and (Eii) have the same optimal assignment. Thus, for purposes of bounding the overall expected patching cost, we may assume that the optimal assignment is always computed using (i~ii) instead of (c,,). This means that, once a cost is determined to be large, it is never involved in the computation of the optimal assignment a, and thus is not conditioned in any way by that computation. Thus, at the beginning of the patching process, the large costs may be treated as independent random variables, each of which is drawn from the uniform distribution over (a (n), 11.

Now consider the patching step when cycle C, of length r, is patched into

6 Probabillistic analysis of heuristics

cycle D, of length m. Every arc occurring in C or D is of nonnegative cost. The cost of each arc (i, j) between C and D is either small (a (n)) or large (> a (0

The costs cii of the large arcs between cycles C and D are independent random variables with the uniform distribution over (a(n), 11. This holds true at the beginning of the patching process, and none of the computations performed in previous patching steps involve these values, so it remains true at the present step. Thus, the cost of patching C into D is stochastically smaller than it would be if the costs of all arcs within C or D were 0 and the costs of all arcs between C and D were uniform over (a (n), 11. We analyze the patching algorithm under this pessimistic assumption.

Certain properties of the optimal assignment a will be important for our analysis. First note that, since the cii are drawn independently from a continuous distribution, it will be true with probability 1 that no two permutations have the same cost and, as a consequence, that the optimal assignment is unique. Also, the optimal assignment is equally likely to be any one of the n! permutations of {l, 2, . . , n}. To see this, define two n X n cost matrices to be equivalent if one of them can be obtained by permuting the columns of the other. Each matrix lies in exactly one equivalence class. Excluding events of probability 0 such as the occurrence of two equal columns, an equivalence class consists of n! equally likely matrices, and each of the n! permutations is the optimal assignment for exactly one of these matrices.

The fact that the optimal assignment is a random permutation is essential for our analysis. To give the reader a feeling for the cycle structure of a random permutation, we list the cycle structures of ten randomly generated permutations of 1000 elements. The cycle structure is given as (a,, a2, . . . , a,), meaning that the permutation has t cycles, and their respec

tive lengths are a, a21. .. , a, where a, ~ a2~<

(2, 2, 3, 25,49,919)

(1, 8, 9, 20, 24, 147, 78 1)

(3, 6, 6, 10, 17, 42, 107, 156, 653)

(1,1,6,16,58,70,75,298,475)

(1, 9, 13, 16, 17, 35, 40, 41, 828)

(1, 2, 3, 3, 21, 94, 139, 338, 399)

(2, 3, 5, 28, 117, 332, 513)

(1, 1, 10, 16, 95, 155, 722)

(1, 2, 997~

(45,246,709)

In these examples the number of cycles is small and very few elements lie in

short cycles. The following facts about random permutations indicate that

this tends to be true in general. In the following three facts let a denote a

random permutation of n elements.

3 Probabilistic analysis of the asymmetric TSP 201

Fact 1. The probability that element 1 lies in a cycle of length x is l/n, for x=1,2,. ._ n.

Fact 2. The expected number of cycles in a is

1 + + +. . .+ 1 log n.

2 3 n

Fact 3. The probability that exactly t elements lie in cycles of (T of length less than or equal to r is 111t1r] !.

Lemma 9 The conditional expectation of the total patching cost, given that the

optimal assignment has cycle structure (a,, a2, at), is

2(t 1)a (n) + 12.\/~

k = 1 ,lak (ak 11 + ak+2 + a,)

Proof The algorithm performs t 1 patching operations. For k = 1, 2, . . . , t 1, a step occurs in which a cycle D, of length ak gets patched into the longest cycle D21 which is of length ak+l + ak+2 +. . . +a,. As discussed above, we may assume pessimistically that the arcs within D, and D2 are of cost 0, and the arcs between D, and D2 are independent random variables, each of which is the sum of two independent random variables drawn from the uniform distribution over (a (n), 1]. A standard calculation shows that the expected value of the patching cost is

2a (n) + 12,/~

,lak (ak 11 + ak+2 + a,)

and, summing over all the patching steps, the total patching cost is

2(t1)a(n)+121/~

k~lwlak(ak+l+ak,2+...+a,)

Theorem 7 E[TA*]_2.33n"+On 112

Proof By Lemma 9,

E[TA*j_E 2(t1)a(n)+21,/7r

1 k=l v/ak(ak+l+. . .+a,)

where (a,, a2,. . ., a,) is the cycle length distribution of a random permutation of {1,2,...,n}, By Fact 2, E[fllogn so E[2(t1)a(n)]= 0((log n)'1n). If o, is a permutation with cycle structure (a,, a2,. . . , a,) then

k = 1 Jak (ak 11 +. + aj {Cir(C)dQn/2) Jr(C)max{r(C), m(C)}'

where C ranges over the cycles of cr, r(C) is the number of elements in cycle C, and m (C) is the number of elements in cycles of length greater than r(C).

i

i

1

0

202 6 Probabilistic analysis of heuristics

Conditioning on the event that cr contains a given cycle Q we compute

,E[

,/max {r(C), m (C)}

Let r = r(C).

Case 1. r < n"'. The cycles of a different from C form a random permutation a' of nr elements. Hence m(C) is distributed as nrM(Pi r, r), where the random variable M(n r, r) denotes the number of elements in cycles of length r in a random permutation of n r elements. Let x = M0i r, r)/(n r). Then

1 1

m ~(C) = (n=r /1 x

Noting that

1+2x for 0 x 0.85,

1x

we obtain

E[ P4x 0. 851 1 E[ 1 + 2x x0.85]

,/max{r(C), rn (C)}

+= P4x > 0.851

r(C)

E[1+2x Pr[x>0.851.

r r(C)

Recalling the definition of x and using Fact 3:

P4x >0.851 =Pr[M(n r, r)>0.85(n r)]

It/(n

When r

goes to 0 faster than D' for any positive constant D. Also, by Fact 1, E[ x 1 = r/(n r). Hence

1 1 (1+ 2,

E[ 1 '~ ~ + o(D

max{r(C), m (C)} ./n r n r

where D is an arbitrarily large constant.

Case 2. r n 0,6. In this case we use 1/,~max {r(C), tn (C)} llv'r.

Combining the bounds obtained in Cases 1 and 2 with the fact that the

3 Probabilisfic analysis of the asymmetric TSP 203

expected number of cycles of length r is 1.1r, we obtain

. L1 1

E 1

k =1 ~,/a,(akli + a,)]

1W21

312 2

1 2r

/n r n r

312) 1 112) 112+

r o(n <2.65n o (n

Finally, applying Lemma 9, we have E[TA*]2.33n 112+0 (n 112).

We close by noting the following corollary, which establishes that the percentage difference between T, the cost of the tour produced by the patching algorithm, and V, the cost of the optimal tour, tends to be very small when n is large.

Corofiary 1 E TT* 0(n 112

1 F1 =

Proof Since T* A* it suffices to prove that E VA* 0(n`2). This

1 A*

follows from three observations:

(i) E[ T A * 1 = 0 (n 1`) (Theorem 7). (ii) On all instances, T A * 2 n.

(iii) For every e > 0, Pr[A* < 1 c 1 goes exponentially to 0 as n > . This is

most easily seen by noting that A* yimini{cii}. E]

3.4 Open questions

We mention two variants of the random directed TSP for which it should be possible to conduct a probabilistic analysis of approximation algorithms based on patching. The first of these is the random undirected TSP, in which the matrix (cij) is symmetric, and the elements on or above the main diagonal are drawn independently from the uniform distribution over [0, 11. The second variant is the random directed TSP with repeated visits to cities permitted, so that, instead of tours, we deal with directed spanning walks.

It would also be of great interest to make a,probabilistic analysis of branch and bound methods for the optimal solution of the directed TSP. One common branch and bound method makes use of the fact that the optimal solution of the assignment problem provides a lower bound on the cost of an optimal tour. The method develops a tree of problem instances, each of which is obtained from the original instance by setting certain costs c,, to oo, thus excluding tours which use the arc Q, j). Given such a derived instance I, let o,(I) be the optimal solution of the assignment problem. If (7j) happens to be a cyclic permutation then it solves instances I of the

i

i

1

k

204 6 Probabilistic analysis of heuristics

TSP, and there is no need to create descendants of I in the tree of problem instances. If the permutation o,(I) is not cyclic then its shortest cycle determines its descendants. Suppose the shortest cycle of a(I) is (io~ i 1, . . . , ik l). Then o, (I) maps io to i 1, i 1 to i2, . . . , ik 1 to il. Every cyclic permutation must omit at leat one of these arcs. Accordingly, instance I has as its children instances I, I, . . . , Ik_1 where Ii is obtained by setting to W.

The general step of the branch and bound method is as follows. In the current tree of problem instances, let instance I be the leaf for which the cost of the optimal solution to the assignment problem is least. If o(I) is a cyclic permutation then it is the optimal tour for the original instance of the TSP. If a(I) is not cyclic then its shortest cycle determines k children which become leaves of the tree of instances. The process continues until a cyclic permutation is found.

Since the optimal assignment is equally likely to be any one of the n! permutations of {l, 2, . . . ' n}, and exactly (n 1)! of these permutations are cyclic, there is a l/n chance that the solution of the original assignment problem will be an optimal tour (this chance can be increased to approximately eln by setting the diagonal elements cii to , and thus eliminating permutations with fixed points). If the problem instances occurring in the branch and bound tree were independent random instances then, independently at each step, there would be a l/n chance of finding a cyclic permutation and terminating the branch and bound computation. Two papers have been published which make such an erroneous independence assumption and conclude thereby that the optimal tour can be found by branch and bound in polynomial expected time. Lenstra & Rinnooy Kan [19781 point out the error in one of these erroneous papers. A correct analysis of the branch and bound method remains to be made.

Exercises

12. Prove: For every matrix (q), T*_A*.

13. Prove: In a random permutation of {l, 2, n}, Pr[element 1 lies in a

cycle of length r] = lIn, for r = 1, 2,, . ., n.

14, Prove: The expected number of cycles in a random permutation of

2,'. n} is 1 + 1'+ 1 + lIn.

2 3

15. Prove: For every positive integer a, Pr[g(n)n/a]1/a!. Here the random variable g(n) denotes the length of a longest cycle in a random permutation of {l, 2, . . . , n}.

16. Let (cii) be an n X n matrix in which every diagonal element cii is equal to co, and the offdiagonal elements are drawn independently from the uniform distribution over [0, 11. Prove:

Pd the optimal assignment for (cii) is a cyclic permutation] el n.

17. Let X(N) be the minimum of N independent random variables, each of which is the minimum of two independent samples from the uniform

3 Probabilistic analysis of the asymmetric TSP 205

distribution over [0, 1]. Prove:

E[MNA< 1 411T

2 N

18. Prove or disprove: There exists a constant such that, for every e > 0,

Pr[IA* pl> cl+ 0 as n

Here A* is the cost of the optimal assignment for a matrix (cii) whose elements are drawn independently from the uniform distribution over [0, 11. 19. What happens to the distribution of (T T*)l T* when the elements of (ci) are drawn independently from the uniform distribution over [a, 11, a > 0?