Please see PDF version
The Objective Method: Probabilistic Combinatorial Optimization and Local Weak Convergence
David Aldousi and J. Michael Steele 2
Department of Statistics, University of California Berkeley, Berkeley, CA,
947203860 aldous@stat.berkeley.edu 2 Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 191046302 steele@wharton.upenn.edu
1 Introduction
This survey describes a general approach to a class of problems that arise in combinatorial probability and combinatorial optimization. Formally, the method is part of weak convergence theory, but in concrete problems the method has a flavor of its own. A characteristic element of the method is that it often calls for one to introduce a new, infinite, probabilistic object whose local properties inform us about the limiting properties of a sequence of finite problems.
The name objective method hopes to underscore the value of shifting ones attention to the new, large random object with fixed distributional properties and way from the sequence of objects with changing distributions. The new object always provides us with some new information on the asymptotic behavior of the original sequence, and, in the happiest cases, the constants associated with the infinite object even permit us to find the elusive limit constants for that sequence.
1.1 A Motivating Example: the Assignment Problem
The assignment problem for the n x n cost matrix (%) is the task of determining the permutation 7r : [n] ~> [n] that minimizes the total cost C1,,,(1) + C2,r(2) + ' .. + c~,,(n) of assigning one column to each row. This problem arises in many applied contexts, and it has spawned an extensive algorithmic literature. Here we are more specifically concerned with the value of the objective function
n
A~ = min
1r ci, W (i),
where the costs cij, 1 < i, j < n, are assumed to be independent and identically distributed random variables.
i
2 David Aldous and J. Michael Steele
Investigation of A,, seems to have begun in 1962, when Kurtzberg [421 used heuristic methods like greedy matching to obtain upper bounds on E[Al for uniformly distributed costs. Kurtzberg's bounds were of order log n, and these were not improved until 1979 when Walkup [641 showed that E[A,l is bounded independently of n, which was quite a surprising discovery at the time. Eight years later, Karp [381 introduced a new approach to the estimation of E~A,,] that was based on linear programming. Karp exploited the explicit bases that were known to be optimal for the assignment problem, and he obtained the elegant bound E[An] :5 2. Inspired in part by Karp's result, Dyer, Rieze and McDiarmid [191 developed a general bound for the objective function of linear programming problems with random costs, and they were able to recapture Karp's bound without recourse to special bases. A probabilist's interpretation of the DyerFYiezeMcDiarmid inequality forms the basis of Chapter 4 of Steele [601 where one can find further information on the early history of the assignment problem with random costs.
A new period in the development of the random assignment problem began in 1987 with the fascinating article of M6zard and Parisi [501 which offered a nonrigorous statistical mechanical argument for the assertion that
2
lim E[An] = , = ((2). (1.1)
n~ 6,
The desire for a rigorous proof of this limit has influenced much of the subsequent work on the assignment problem, and the critical first step is simply to show that actually E[An] converges as n > oo. Convergence would follow immediately if one could show that E[AA is nondecreasing, but it is still not known if E[An] is monotone. Nevertheless, in 1992 Aldous [31 used the objective method to show that E[Anj does indeed converge.
In 1998 Parisi [51] added further interest to the random assignment problem when he advanced the remarkable conjecture that for independent exponentially distributed random cost cij with mean 1, one has an exact formula: n
2
E[An] k (1.2)
k=l
A proof of Parisi's conjecture would immediately confirm the ((2) limit formula. moreover, there are good reasons to believe the truth of this stronger assertion. Specifically, Alm and Sorkin [91 proved the conjecture for all values of n up to n = 5. Unfortunately, even for such small values, the proofs are not easy, and Alm and Sorkin had to find their way through a swamp of special cases with the help of automated theorem proving computations. The conjectured formula for E[A,,l remains a tantalizing challenge.
Nevertheless, progress on the limit value for E[Anj has been more definitive. In 2001 by means of the objective method Aldous [4] finally proved the ((2) limit formula that M6zard and Parisi [50] first brought to light in 1987.
i
i
The Objective Method 3
1.2 A Stalking Horse: the Partial Matching Problem
One of the aims of this survey is to show how the objective method helps to solve problems such as the determination of the limiting value of E[A.], but the assignment problem itself is too burdened with individual nuance for it to serve as our basic guide. For this reason, we introduce a new problem, the maximal partial matching problem.
This problem does not have the long history of the random assignment problem, but it is especially well suited for illustrating the objective method. In particular, it builds on the theory of random trees, and the limit theory of such trees provides the prototype for the objective method. Also, the maximal partial matching problem leads inexorably to the notion of a distributional identity. Such identities have always had a role in probability theory, but, along with allied notions like selfsimilarity and subadditivity, distributional identities are now a topic of emerging importance.
The relevant identities for the maximal partial matching problem are also much more tractable than the corresponding identities for the random assignment problem, yet many of the same tools come into play. In particular, we will find that one often does well to begin by guessing the solution of a distributional identity. After one has a good guess, then classic tools like Kolmogorov's consistency theorem and contemporary tricks like "coupling from the pasC can be used to confirm the existence, uniqueness, and stability of the solutions of the distributional identity.
1.3 Organization of the Survey
The maximal partial matching problem provides our introductory case study, but before it can be addressed we need to deal with some foundational issues. In particular, Section 2 develops the notion of local weak convergence. Formally, this is nothing more than the weak convergence of probability measures on a certain metric space of "rooted geometric graphs," but the attending intuition differs substantially from the classic weak convergence theory. Section 2 also introduces the "standard construction," which is a general recipe for building rooted geometric graphs. This construction turns out to have a subtle, yet pervasive, influence on the applications of the objective method.
After we have dealt with the essential metric space formalities, we see what local weak convergence tells us about the simplest model for random trees. In particular, we review Grimmett's lemma, and we develop an understanding of the convergence of the large and the small parts of a random tree that has been cut at a randomly chosen edge. This analysis provides us with the essential distribution theory for random trees that is needed later.
The theory of the maximum partial matching problem developed in Section 3 is concrete and selfcontained. Nevertheless, it faces most of the issues that one meets in more complex applications of the objective method, and it offers the best introduction we know to the essential ideas.
i i
4 David Aldous and J. Michael Steele
In Section 4 we introduce the meanfield model of distance which is a physically motivated probability model designed to gain insight into problems for point processes in R d. In this model the distribution of some of interpoint distances of R' are captured precisely while other (hopefully less essential) interpoint distance distributions are distorted in comparison to Rd. The meanfield model leads us to the PWIT, or Poisson weighted infinite tree, which is arguably the most important infinite tree model. To illustrate the close connection of the theory of the PWIT and the objective method, we give a reasonably detailed proof of the ((3) theorem of Rieze.
The relationship of the PWIT to problems of combinatorial optimization is continued in Section 5 by developing the limit theory for the minimum cost C,, of a perfect matching of the complete graph K,, with independent edge weights ~, having a common distribution F. We provide a reasonably detailed (and hopefully friendly) sketch of the fact that E[C.] converges to ((2)/2 as n ~ oo when F is the exponential distribution with mean one; this result is the direct analog of the ((2) limit theorem for E[A,]. Understandably, some nontrivial details must be omitted from the sketch, but the section should still provide a useful introduction to Aldous's proof of the ~(2) limit theorem.
All of the problems in Sections 3 5 call on distance models with substantial intrinsic independence., for example, in Section 3 we have independent weights on every edge, and we face the infinite PoissonGaltonWatson tree in the limit, while in Sections 4 and 5, we have the tempered independence that one inherits from the meanfield model, and we face the PWIT in the limit. The problems of Section 6 are different; they deal directly with interpoint distances in R d rather than with edge weights that are blessed ex cathedra with independence properties, and in the limit we often need to deal with the Poisson process.
The probability theory of Euclidean combinatorial optimization has grown quite large, and in Section 6 we provide very few proofs. Nevertheless, we hope to provide a useful update of the survey of Steele [60]. In particular, we address progress on the minimal spanning tree problem, including a recent weak law of large numbers developed in Penrose and Yukich [531 which has close ties to the objective method. The section also contrasts the objective method and the subadditive method that has served for years as a principal workhorse in the probability theory of Euclidean combinatorial optimization. In the closing subsection, we break into entirely new territory and describe a remarkable new result of Benjamini and Schramm on the recurrence properties of the local weak limits of planar graphs.
In Section 7, the last section, we first summarize some of the circumstances that seem to be needed for the objective method to be successful. We then develop the background of an attractive conjecture on the independence number of a random regular graph. This problem is used in turn to illustrate several of the basic challenges that appear to be critical to the deeper development of the objective method.
i i
i
i
The Objective Method 5
Finally, we should note that even though our main intention is to provide a survey and a tutorial, this exposition also contains new results. In particular, the material in Section 3 on the maximum partial matching problem is new, including the basic limit theorem (Theorem 1) and the theorem that determines the limit constants (Theorem 2). Another new result is the Convergence Theorem for Minimal Spanning 'I~ees (Theorem 6). This theorem both generalizes and simplifies much earlier work; it is also applied several times during the course of the survey.
2 Geometric Graphs and Local Weak Convergence
Before the theory of local weak convergence can be brought facetoface with the concrete problems of combinatorial optimization, one is forced to introduce an appropriate complete metric space. This introduction has been made several times before on a purely ad hoc basis, but now there is enough cumulative experience to suggest a general framework that should suffice for most applications. After describing this framework, we will give it a quick test run by discussing a prototypical result from the limit theory of random trees Crimmett's lemma on the convergence of rooted Cayley trees.
2.1 Geometric Graphs
If G = (V, E) is a graph with a finite or countable vertex set V and a corre
sponding edge set E, then any function ~: E > (0, col can be used to define
a distance between vertices of G. Specifically, for any pair of.vertices u and
v, one just takes the distance between u and v to be the infirnum over all
paths between u and v of the sum of the lengths of the edges in the path.
Definition 1 (Geometric Graphs and the Two Classes G and 9,)
If G = (V, E) is a connected, undirected graph with a countable or infinite vertex set V and if ~ is an edge length function that makes G locally finite in the sense that for each vertex v and each real p < oo the number of vertices within distance g from v isfinite, then G is called a geometric graph. When there is also a distinguished vertex v, we say that G is a rooted geometric graph with root v. The set of geometric graphs will be denoted by 9, and the set of rooted geometric graphs will be denoted by G_
2.2 9, as a Metric Space
The set G, of rooted geometric graphs provides our basic workspace, but before honest work can begin we need to say what we mean for a sequence Gn of elements of G, to converge to a G in g.. The idea one wants to capture is that for large n, the rooted geometric graph G, looks very much like G in an arbitrarily large neighborhood of the root of G.
6 David Aldous and J. Michael Steele
To formalize this idea, we first recall that an isomorphism between graphs G = (V, E) and G' = (V', E) is a bijection 0 : V + V' such that (0(u), 0(v)) E E' if and only if (u, v) E E. Also, given any such isomorphism, one can extend the domain of 0 to E simply by defining 0(e) to be (0(u), 0 ()) for each e = (u, v) E E.
Finally, we say that two geometric graphs G = (V, E) and G' = (V', E') are isomorphic provided that (1) they are isomorphic as ordinary graphs and (2) there is a graph isomorphism 0 between G and G' that also preserves edge lengths (so f'(0(e)) = f(e) for all e E E). In the case of two rooted geometric graphs G = (V, E) and G' = (V, E'), we will say they are isomorphic provided that there is a graph isomorphism 0 that preserves edges lengths and that also maps the root of G to the root of G'.
Next we consider a special rooted geometric graph that one may view intuitively as the "neighborhood of radius p about the root" of the rooted geometric graph G. Specifically, for any p > 0 we let N,(G) denote the graph whose vertex set V,(G) is the set of vertices of G that are at a distance of at most p from the root of G and whose edge set consists of just those edges of G that have both vertices in V,(G), where, as before, the distance between any two vertices u and v in G is taken to be the infinuan over all paths between u and v of the sum of the lengths of the edges in the path. We again view N,(G) as an element of 9, whose the edge length function and root are just those of G. Finally, we say that Q > 0 is a continuity point of G if no vertex of G is exactly at a distance p from the root of G.
Definition 2 (Convergence in 9,) We say that G, converges to G,,,, in G, provided that for each continuity point Q of G,,~ there is an no = no (p, GJ such that for all n > no there exists a isomorphism y~,c from the rooted geometric graph N.(GJ to the rooted geometric graph N,(GJ such that for each edge e of NfflJ the length of yn,,(e) converges to the length of e as n ~ oc.
With a little work, one can show that this definition determines a topology that makes G, into a complete separable metric space. As a consequence, all of the usual tools of weak convergence theory apply to sequences of probability measures on g., and we may safely use the conventional notation and write
1A~d > 1A to mean that f dt,,, f dp
for each bounded continuous function f R.
2.3 Local Weak Convergence
The topology on the metric space G, turns out to give weak convergence of probability measures on 9. a local character that sharply differs from the traditional weak convergence such as one finds in the weak convergence of
i
The Objective Method 7
scaled random walk to Brownian motion. Weak convergence of measures on g, never involves any rescaling, and the special role of the neighborhoods N,(G) means that convergence in g, only informs us about behavior in the neighborhood of the root.
In practical terms, this means that weak convergence in 9. can tell us about local features such as the degree of the root, the length of the longest edge incident to the root, and so on; yet it cannot convey detailed information on a global quantity such as the length of the longest path. To underscore this difference, one sometimes speaks of weak convergence in g, as local weak convergence.
2.4 The Standard Construction
Most of the random processes considered here are associated with a standard construction that has some lasting consequences, even though it may seem rather bland at first. The construction begins with a probability measure that is concentrated on the subset of 9 consisting of geometric graphs with exactly n vertices. We then consider a random element G,, of 9 that is chosen according to this measure, and we choose a vertex X, at random according to the uniform distribution on the n vertices of G.. We then make G. into a random rooted geometric graph Gn[Xn] by distinguishing Xn as the root vertex. The distribution of Gn[Xn] is then a probability measure on the set of nvertex elements in the set of rooted geometric graphs. Finally, if a sequence {Gn [X,] : n = 1, 2, ...} of such 9,valued random variables converges weakly in 9,, to a G,valued random variable Q,., then we say that the distribution of G,,, is obtained by the standard construction.
One might think that virtually any measure on 9. might be obtained by the standard construction, but the measures given by the standard construction are not unconstrained. Later we will find that they must satisfy a modest symmetry property that we call involution invariance. This property may sometimes be used to rule out what would otherwise be a tempting candidate for a limiting object.
2.5 A Prototype: The Limit of Uniform Random Trees
A classical formula of Cayley tells us that if V is a set of n distinguishable elements, then the set S. of rooted trees with vertex set V has cardinality nnl. If Tn denotes a tree that is chosen from Sn according the uniform distribution, then one can show that T. converges in distribution to a random variable T that takes values in G,. Remarkably, one can characterize the distribution of T,,, by a direct construction that relies on the classical theory of branching processes.
THE INFINITE "SKELETON TREE" To~ OR, PGW~(1)
i
i i i i i
8 David Aldous and J. Michael Steele
To begin, we consider a GaltonWatson branching process with one progenitor and an offspring distribution that has the Poisson distribution with mean one. With probability one, any such branching process has a finite number of vertices, and the finite trees generated in this way are said to have the Poisson Galton Watson distribution with mean one. This distribution on the finite elements of 9, is denoted by PGW(I).
Now consider an infinite sequence of independent PGW(1) distributed trees To ~ T1 ~ T21 ... , and let vo, v,, v2.... denote their roots. Finally, to make this collection of trees into one infinite rooted tree, we add all of the edges {(Vi,Vi+i), 0 < i < c>o}, and we declare vo the root of the new infinite tree. The tree T. that one builds this way is said to have the PGW'(1) distribution, and it is illustrated in Figure 1.
0
1
UUU
1 1 /
0 0 0 00
0 0 00 0 00 0 0
Fig. 1. The "Skeleton Tree" T. also known as PGW(1). Here, one should note that only the root vo of T_ is labelled; the chimerical notations V1, V2, ... are only given to focus the eye on the unique path from the root to infinity. The existence of such a path is one of the most fundamental features of PGW (1).
One might imagine that T,,,, could be an undistinguished member of the multitude of infinite trees that anyone might paste together from finite trees, but this intuition is faulty. As the next lemma explains, the skeleton tree T. has an inevitable and highly distinguished role in the theory of random trees. We should note that in the next lemma, and throughout the survey, a graph without its own edge length function is given one by taking the length of each edge to be 1.
Lemma 1 (Crimmett's Lemma) The randomly rooted uniformly distributed labelled tree T. on n vertices converges weakly in 9, to the "skeleton tree" T,,,,, that is, one has
i
i
i
The Objective Method 9
T, T.as n ~ co.
This result was first formalized and proved in Grimmett[261, and further proofs and generalizations are given by Devroye [18] and Aldous [21. In some respects, Lemma 1 is the most fundamental fact about random trees under the model of uniform distribution. Grimmett's Lemma also has an interesting selfgeneralizing quality, and shortly we will see that the convergence in distribution of Tn actually implies that certain special parts of Tn MUSt converge to corresponding special parts of T,,..
This convergence of parts seems to be a recurring theme in the theory of local weak convergence. In particular, it provides one key to the appearance of the distributional identities that have had a crucial role in some of the most interesting applications of the objective method.
SPECIAL PARTS OF THE SKELETON TREE PM'(1)
Our first observation about the skeleton tree of Figure 1 is that it has a pleasantly recursive structure. From the root vo there is just one edge (vo, vJ that is part of the unique infinite path vo, ~ v, ~ ... of T., and if we delete that edge we find two subtrees one finite and one infinite. If T"'11 denotes the finite tree rooted at vo and T N9 denotes the infinite tree rooted at v,, then we see from the definition of T' that we can identify their distributions
,,, d ~ d pGW_(j).
TZ PGW(1) and Tb' =
00
This decomposition and the definition of weak convergence of probability measures on 9,, promptly suggest that one may give a more detailed interpretation of Grimmett's Lemma.
CONVERGENCE OF THE CORRESPONDING PARTS
Consider again the randomly rooted uniform tree T. on n vertices, and let r denote the root of Tn. Any child c of r determines an edge e = (r, c), and, when c is deleted from T, we let Tn, denote the remaining component of T. that contains c. We again view T,,, as a rooted geometric weighted graph by taking its root to be c.
Now, we let c* denote a child of r that is chosen at random from the set of all children of the root for which the cardinality of T., is maximal; in fact, for large n, one finds that with high probability there is a unique maximal subtree. If we remove the edge (r, c*) from Tn we get two subtrees that we now label T7" and T` according to their cardinality (and with any tie
n
being broken by a coin flip). Finally, we view Tn` and T".' as rooted trees
n
by letting the end point of the deleted edge determine the new roots.
Next, by our observations on the structure of the skeleton tree, we can give a more detailed view of the convergence that is guaranteed by Grimmett's Lemma. Specifically, Grimmett's Lemma automatically implies both the convergence of the small component
i
i
1
i
10 David Aldous and J. Michael Steele
T;~" '~ TZ` l PGW(I) and convergence of the large component
Tb'q d _~ T big pGW(1).
n ~~ W
Moreover, Grimmett's Lemma even implies the joint convergence
(T7', T big _~ d _~ (TZ11, Tb'g),
n 00
where the two processes TZ` PGW(1) and T big 1 PGW'(1) are
00
independent. This independence is often useful, and in the Section 3 its contribution is essential.
BEYOND THE SKELETON TREES
Many models for random trees have been introduced in combinatorics and the theory of algorithms (1471, [631), and for the majority of these models one finds that there is a natural weak limit. Moreover, the analysis of Aldous [2] shows that for a large class of these models the limit object is a random rooted tree which shares many of the qualitative features of the Skeleton Tree given in Grimmett's Lemma.
Specifically, one finds that each of these limit objects has a unique infinite path from the root, and, as we saw in our discussion of Grimmett's Lemma, when one removes this skeleton one finds a sequence of finite trees that have a useful description. In general, these finite trees are no longer independent and identically distributed; instead, this sequence trees is more closely described as a hidden Markov process. In fortunate circumstances, the Markov structure that one finds can serve as a useful substitute for the independence that we use here.
3 Maximal Weight Partial Matching on Random Trees
Like most methods, the objective method is best understood through the examination of a concrete example, and, for all the reasons mentioned in the introduction, we believe that the maximal weight partial matching problem is the right place to begin. It is tractable enough to be developed completely, yet rich enough to demonstrate the structural features of the objective method that are important in more complex problems such as the random assignment problem.
3.1 Weighted Matchings of Graphs in General
For any graph G = (V, E) a paTtial matching S is simply a subset of the set of edges of G such that no pair of edges of S share a common vertex. Any
1
i
The Objective Method 11
function w : E(C) _~ R may be viewed as a weight function on the edge set of G, and the weight of the partial matching S is defined simply as
def
W (S) = E w (e).
eES
If S* is a partial matching of G such that w(S*) is equal to the supremum of w(S) over all partial matchings of G, then S* is called a maximal weight partial matching of G. Such matchings are important in many problems of combinatorial optimization, and methods for computing maximum weight partial matchings have an important role in the theory of algorithms (cf. LovAsz and Plummer [461).
3.2 Our Case: Random Trees with Random Edge Weights
To each fixed finite rooted tree T and to each edge e of T, we now associate a random variable ~,. Moreover, we assume that the ensemble {~, : e E T} is independent and that ~, has distribution F for each e E T. We view e ~~ ~, as a weight function on T, and a T with this weight function will be called an Fweighted rooted tree. Finally, for each Fweighted rooted tree T we let M(T) denote the maximum weight over all partial matchings of T.
In particular, if T. denotes an Fweighted rooted tree that is chosen at random according to the uniform distribution on the set of nn1 rooted trees with nvertices, then we write M. as shorthand for M(Tn). The random variable M,, is therefore the maximum weight of a partial matching of a random nvertex Fweighted rooted tree, and it will be at the center of our attention for the rest of this section.
3.3 Two Obvious Guesses: One Right, One Wrong
When the edge weight distribution F has a finite mean, one immediately suspects that there is an asymptotic relationship of the form
E[M,,l yn as n > oo
where y = y(F) is a constant that depends only on F. What one further suspects is that even for the nicest choices of F the calculation of y might be an almost impossible task. By comparison, one can list perhaps a dozen problems where subadditive methods yield a similar asymptotic relationship, yet the limit constants have gone unknown for decades.
In the partial matching problem we face a happier circumstance. For continuous F the objective method not only yields a proof of the asymptotic relation, it also provides a concrete characterization of the limit constant. This characterization is not necessarily simple, but at least in the leading case of exponentially distributed edge weights it does lead to an explicit integral representation for y that can be calculated numerically.
i
1
12 David Aldous and J. Michael Steele
OUR AssUMPTIONS ON THE DISTRIBUTION F
The intuition that leads one to guess that E(MJ1n converges does not impose any constraint on the edgeweight distribution F, except the trivial constraint that the expectation of ~, F should be well defined. Nevertheless, here we will always assume that F is continuous and that P(~, > 0) = 1. these assumption guarantee that the maximal weight partial matching exists and is unique with probability one.
We do not believe that the continuity of F is needed for the convergence of E(MJ/n, but, without this assumption, the possibility of multiple optima would forced us to face many irritating complications. Since our main intention here is to demonstrate the fundamental features of the objective method in the simplest realistic light, the issue of discontinuous F is best left for another time.
3.4 Not Your Grandfather's Recursion
The first thought of anyone interested in the asymptoties of E[Mn] is to look for a relation between E[Mn] and the earlier expectations E[Mi], 1 < i < n. Here we will also hunt for a recurrence relation, but what we find differs radically from the recursions one commonly meets in discrete mathematics.
If we remove an edge e from the edge set of Tn, then T,, is broken into two connected components. We can then view these components as rooted trees where we take the old vertices of e to be our new roots. Next, we label these trees Tn"(e) and T"(e) according to their cardinality (with any tie n
being broken by taking the labels at random). This process is similar to the discussion of the refinement of Grimmett's Lemma, except that here the cut edge e can be any edge of Tn.
Now, we consider whether or not the edge e is in the maximal partial matching of Tn. If one does not use the edge e, then the maximum weight of a partial matching of Tn is also equal to
M (TZn` (e)) + M (T`q (e)). (3.3)
n
To go deeper we need some additional notation. If T is any weighted rooted tree, we define B(T) by letting M(T) B(T) denote the maximum weight of a partial matching of T where the root is not in an edge of the partial matching. By optimality of M(T) we see that B(T) > 0, and we think of B (T) as the bonus one gets from the option of being allowed to use edges that meet the root. With this notation, we see that the maximal weight of a partial matching that is required to use e is given by the sum
~,+{M(TZ""(e))B(TZ"")(e)}+{M(T"'(e))B(T"')(e)}. (3.4)
n n
When we compare the weights (3.3) and (3.4), we see that with probability one the edge e is in the maximum weight partial matching if and only if
i
i
1
The Objective Method 13
> B (T;~ "' (e) ) + B (T` g(e) ).
n (3.5)
This inclusion criterion naturally gives us a nice way to write M, as a sum over the edges of Tn. If we use 1 (A) to denote the indicator function for an event A, then the inclusion criterion (3.5), tells us that
Mn = 1: ~e l > B (T~"' (e) ) + B (Tn"(e) (3.6)
cETn
Now, if e denotes an edge chosen uniformly from the edge set of T,, we see from the sum (3.6) that the expectation E[Mn] can be written as
EIM~] = (n 1)E[~.1(~. > B(Tn"'(.))+B(T"(e)))].
n
Finally, since the distribution of ~. does not depend on e and since ~. is
independent of B (T7` (e)) and B (T"(e)), the last equation may be written
n
a bit more crisply as
E[Mn] = (n I)E[ ~1 (~ > B(Tn`(e)) + B(T`g(e)) (3.7)
n
where we understand that F and that ~ is independent of the pair
(B(T',"(e)), B(T big (e)
n
3.5 A Direct and Intuitive Plan
The representation for E[M,,] given by equation (3.7) is hardly a conventional recursion, yet it is still encouraging. The first factor n 1 hints loudly at the gross behavior one expects from E[Mn]. Moreover, from our discussion of the refinement of Grimmett's Lemma, one can guess that two tree processes Tn`(e) and T big (e) will converge in distribution, and this strongly suggests n that the second factor of formula (3.7) will converge to a constant as n+ 00.
Certainly, one should worry about the fact that B() is not a bounded continuous function on 9,; after all, B() is not even well defined on all of G.. Nevertheless, B() is well defined on the finite graphs of 9, and any optimist surely suspects it to nice enough for the random variables B(TZ,`(e)) and B(T`(e)) to inherit convergence in distribution from the weak convergence n of TZ"` (e) and T` (e). n
To refine these suspicions, we first note that TZ"(e) and T"g(e) differ n from the trees Tn` and T big in our discussion of Grimmett's Lemma only n in that the roots of TZ" (e) and T`1 (e) are determined by the random edge n e while the roots of T7"' and T` were determined by a random uniform n choice from the respective vertex sets.
As Figure 2 suggests, the root r of T7`(e) is not quite uniformly distributed on the vertex set of Tn when e is chosen at uniformly at random from the edge set of Tn. In fact, if n is odd, then there is always one vertex of
14 David Aldous and J. Michael Steele
T,, that has probability 0 of being the root of T;~"' (e), and, when n is even, then there are always two vertices that have probability 1/(2(n 1)) of being the root of Tn"(e). In both cases, all of the other vertices have probability 1/(n 1) of being the root of T7`(e), so, even though r is not uniformly distributed on the vertex set of Tn, it is almost uniformly distributed. In fact, the total variation distance between the distribution of r and the uniform distribution is always bounded by 1/2n, and for odd values of n this bound is exact.
Odd Valued n Even Valued n
1/7 1/7 1/6 1/6 1/8 1/8 1/7 1/7
0"" 0
1/7 1/6 1/8 1/8 1 /14 1/7
1/7 V1/7 0 1/6 1/8 1/8 1/14 1/7
1/7 1/7 1/6 1/6 1/8 1/8 1/7 1/7
Uniform Edge Biased Uniform Edge Biased
Fig. 2. In this figure, the value next to a vertex is the probability of that vertex being the root of T~"'. One should note that the uniform choice of an edge typically leads to a nonuniform distribution on the vertices.
Rom this observation and a traditional coupling argument, one finds that there is only a small change in the distribution of the bonus when the omitted edge is used to determine the root; specifically, for all n > 1 and all x E R we have the inequality
P(B(T7`(e)) 5 x) P(B(T7`) :5 x) 1 :5 (3.8)
2n
A parallel analysis of the root of T`(e) yields an identical bound for the n
distributions of the bonuses of the big tree components, and together these
bounds tell us that if the distribution F of ~ is not too longtailed, then the
asymptotic behavior of the right hand side of our exact formula (3.7) for
E[Mn] will not be changed if we replace the edge biased trees T7`(e) and
T"'(e) by their randomly rooted cousins T` and Tn` whose asymptotic
n n
behavior is well understood.
We now see that along the path to proving that E[M,,]/n converges, we have the natural intermediate task of showing that the bonuses B(T7`) and B (T") converge in distribution. Further, if we hope to have a serious chance
n
The Objective Method 15
of calculating the value of the limit of E[M,,11n, we also need a concrete characterization of the limiting distributions of B(TJ`) and B(T").
n
3.6 Characterization of the Limit of B(T;~"")
From our discussion of Grimmett's Lemma, we know that T~"" converges in distribution to a tree T with the Poisson GaltonWatson distribution PGW(1). When we view T as an Fweighted tree, then the bonus B(T) is well defined since T is almost surely finite. Moreover, one can check directly from the definition of weak convergence in 9, that we have
B(T~n"' ) d _+ B (T), (3.9)
so the real question is whether there is an efFective way to characterize the distribution of B(T). This is where the notion of a distributional identity enters the picture.
The basic idea is that the recursive definition of the tree T should translate into a useful selfreferencing identity for the distribution of B (T); we just need to put this idea to work. If C denotes the set of children of the root of T, then for each i E C there is a subtree T' of T that is determined by the descendants of i, and we may view T' as a rooted tree with root i. If ~i denotes the weight of the edge from the root of T to the child i, then the maximum weight of a matching on T that does not meet the root is given by the sum
M (T');
iEC
so, to get an identity for B(T) we just need to calculate the maximum weight of a partial matching that does meet the root.
If j E C, then the maximum weight for a partial matching that uses the edge from the root of T to j is equal to
~j+{M(Ti)B(Tj)}+ 1: M(T')=~jB(Ti)+Y:M(T'),
jEC, j94j iEC
so the maximal weight for a partial matching that does include the root is equal to
max {~j B (Tj)} + 1: M (P).
jEC iEC
We now see that the difference B(T) between the overall maximum and the constrained maximum Eic M(P) is given by
B(T) = max { 0, ~i B(P) : i E C}. (3.10)
Finally, the cardinality of C is Poisson with mean one, T 1 T' for all i E C, and all of the random variables on the right hand side of the identity
i
i
i
1
16 David Aldous and J. Michael Steele
(3.10) are independent, so we have before us a very powerful constraint on the distribution of B(T). In fact, we can cheek that any random variable that satisfies a distributional recursion like that defined by equation (3.10) must be equal in distribution to B(T).
FORMALIZATION
To formalize this assertion, we let F and G denote any two distributions, and we define DF(G) to be the distribution of the random variable
max{O, ~i Yi : 1 < i < N} (3.11)
where random variables in the collection {N, Yi, ~i : i = 1, 2.} are independent, N has the Poisson(l) distribution, and we have ~i F and Yi G for all 1 < i < oo. The next proposition tells us that one may characterize the distribution of the bonus B(T) as a fixedpoint of the mapping DF().
Proposition 1 If T is an Fweighted PGW(1) tree where F is continuous and FQ0, oo)) = 1, then the distribution
G(x) = P(B(T) < x) is the unique solution of the fixedpoint equation
DF(G) = G. (3.12)
Since the discussion leading up to this proposition shows that the distribution of B(T) is a solution of the fixedpoint equation, we see that the characterization will be complete if we show that any solution of equation (3.12) must be the distribution of B(T) for some Fweighted PGW(1) tree T. This will be done in a way that is particularly probabilistic.
A UNIQUENESS PROOF BY A PROBABILISTIC CONSTRUCTION
Let G denote a solution of the fixedpoint equation DF(G) = G, so our task is to show that G is the distribution function of B(T) where T is an Fweighted PGW(1) tree. Our plan is simply to use G to construct such a T, and Figure 3 suggests how we start.
In the first step, we just consider the tree consisting of a root and a set C of N children. We then attach a random variable Yi to each child i E C where the Yi are independent and Yi G. We then define Y by the identity
Y = max{O, Yi : i E C},
where the ~i are independent and ~i F. We describe this process by saying that the root has been expanded.
Next, take any unexpanded vertex v (at the second step this would be any of the elements of C) and expand v in the same way that we expanded the root. Specifically, we give v a set C(v) of children with cardinality N(v) =
i
i
The Objective Method 17
IC(v)l that is independent Poisson(l), attach a random variable Yi to each child i E C(v) where the Yi are independent and Yi G, and then define Yv by the identity
Y~ = ax{O, ~v,i Yi : i E C}, (3.13)
where as before the ~,,i are independent and ~v,i F.
y~, 1 y~, 2 y~,N()
Y2 YN
Y1 Y2 y~ YN y~l_~y,
y y
N Poisson(l) N(v) Poisson(l)
Yi ild. G Y~,i ild. G
y = max(O,~i Y, : 1 < i < N) Yv = max(O, ~v,i Yi : 1 < i < N(v))
Fig. 3. Expansion of the root and expansion of a descendant vertex.
When we expand the root, the fixedpoint property of G guarantees that Y has the distribution G. Later, when we expand v, the values of Yv and Y will be changed, but their distribution will not be changed. Thus, to complete the construction, we simply repeat the process of expanding any unexpanded vertex until no such vertices remain. The tree we construct in this way is precisely a GaltonWatson tree with the Poisson(l) offspring distribution, so, with probability one, the expansion process stops after finitely many steps. Moreover, even though each vertex expansion changes the values of the random variables that are attached to the vertices along the path back to the root, all of the distributions will be unchanged.
By this construction, we see that for any leaf ~ of the resulting tree T the value Ye associated to f by the value rule (3.13) is zero. The bonus B(T~) is also zero for any leaf, and the recursion (3.10) for the bonus is the same as the recursion that defines the associated values Y,, v E T. We therefore find that B(T) = Y, and consequently we see that any solution G of the fixedpoint equation DF(G) = G is equal to the distribution of B(T) for an Fweighted PGW(1) tree T.
AN ANALYTICAL QUESTION
This proof of Proposition 1 is quite natural given the way that we came to the fixedpoint equation DF(G) = G, but the uniqueness question for this
i
i
is David Aldous and J. Michael Steele
equation is in fact a purely analytical problem. It seems natural to ask if the uniqueness of the solution of DF(G) = G may be proved directly without recourse to the probabilistic interpretation of G, but we do not address that question here. For the moment the more pressing question is whether we might also be able to characterize B(T`) by a distributional identity like
n
the one we used to characterize B(TZ`%
3.7 Characterization of the Limit of B(T")
n
To be sure, one can begin just as before; specifically, we can note from our discussion of Grimmett's Lemma that T" converges weakly in G. to a random
n
tree, although this time the limit is T, the skeletal tree with distribution
PGW'(1). We would dearly love to conclude that
B(T"`) '+ B(T,,J, (3.14)
n
but as soon as one writes this equation trouble appears.
Since T,,, is almost surely infinite, one sees that the total weight of the maximal partial matching of T. is infinite. For the same reason, the total weight of the maximal partial matching that does not meet the root of T'>', is also infinite. The bottom line is that the bonus B(T.), which is nominally the difference of these two quantities, is not well defined.
Nevertheless, one should not lose heart. If indeed there is a finite random variable Z such that
B(T"") '.> Z, (3.15)
n
then we still may be able to use our intuition about the undefined quantity B(T,,,) to help us to understand Z. Moreover, if we successfully guess a distributional identity that characterizes Z, we may even be able to use the stability of that characterization to build an honest proof of the conjectured convergence (3.15).
GUESSING THE DISTRIBUTIONAL IDENTITY FOR Z
As physicists sometimes do, we cross our fingers and calculate with B (T,,,) even though it represents the difference of two infinite quantities. Since we only want to guess an identity for Z that will be subsequently justified by a rigorous argument, there is no cost to proceeding heuristically for the moment.
If we label T,,, as we did in Figure 1 and if we remove the edge (vO, v,), then we obtain two subtrees with roots vo and v,. If T denotes the subtree with root vo, then T has the PGW(1) distribution, and, if T' denotes the subtree with root v,, then V has the PGW'(1) distribution. By the
00
"definition" of B(TJ, the optimal partial matching of T. that does not match the root vo has weight M(T.) B(T..), and our first task is to get an alternative representation for this quantity.
i
The Objective Method 19
To begin, we let C denote the set of all of the children of vo except for the special child v, that is on the path from vo to infinity in T",. For each i E C we view the vertex i and its descendants as a subtree T' of T, and we take i as the root of T'. With this notation, we can write the weight of the optimal matching on T. that does not match the root vo to a second vertex and obtain the bookkeeping identity
M(T,J B(TJ = M(T') + 1: M(P). (3.16)
iec
The next step is to find an appropriate expression for the maximal weight of a partial matching of T.. that does meet the root. We first note the maximal weight of a partial matching of T,,, that contains the edge (vo,i) with j E C is given by
~j + {M(Ti) B(Ti)} + M(T') + 1: M(P)
iEC,iOi
which we may simplify to
~j B(Ti) } + M(V ) + 1: M(P).
00
iEC
Now, if ~ denotes the weight of the edge (vO, v,), then the maximal weight of a partial matching of T,,,, that contains the edge (V0, V1) can be written as
M(P) + ~ + {M(T' B(T'
00 00
iEC
so, choosing the best of all of the possibilities, we see that M(ToJ equals
1: M(P) + M(V ) + max { 0, max{~j B(Ti)}, B(T'
iEC jEC 00
When we subtract the value of M(TJ B(T,,,) given by equation (3.16), we find
B(T,J = max{O, max{~j B(Ti)}, B(T'~)}. j r=C
We know maXiEC{O, ~j B(Ti)} has the same distribution as B(T) by the basic distributional identity (3.10) for the bonus, and we also know that V0. and TO,, have the same distribution, so at last we have a nice, simple, and lamentably heuristic identity for B(T.):
) d
B(T,~ = max{B(T), B(T~)}.
CONVERSION TO AN HONEST PROPOSITION
i i
20 David Aldous and J. Michael Steele
For any given continuous distribution F, we know that the distribution G of the bonus B(T) for the Fweighted PM(I) tree T can be characterized as the unique solution of the fixedpoint equation DF(C) = G, and our heuristic derivation suggests that we can characterize the limiting distribution of B(T`) in a way that is perfectly analogous. Specifically, given any n distribution H and independent random variables
ZH, YG, and ~F,
we let 6F (H) denote the distribution of max{Y, Z}, and we consider the fixedpoint equation
DF (H) = H.
This time we do not know a priori that there is a solution to this equation, but we do expect the limiting distribution of B(T`Q) to exist, and for large n
n we also expect B(T`) to be much like B(T,,j. Therefore, we suspect
n
that the limiting distribution of B(T",') will satisfy 6F(H) = H. The next n proposition confirms these suspicions while also confirming the crucial fact that the solution of the fixedpoint equation 6F(H) = H is unique.
Proposition 2 For any continuous distribution F with support on [0, 00), the fixedpoint equation
DF(H) = H (3.17)
has a unique solution H,. moreover, one has
lim P(B(T") :5 x) = H(x) for all x E R. n no<)
Before proving this proposition, we will first show that it really does complete the program with which we began.
3.8 The Limit Theorem for Maximum Weight Partial Matchings
Our whole discussion of E[M,,l has been motivated by the formula
E[M.1 = (n 1)E[~l(~ > B(TZ"(e)) + B(T`.(e))) (3.18)
n
and now with the help of Proposition 1 and Proposition 2 we are ready to take the limit. If we assume that ~ has a finite expectation, then the integrands in formula (3.18) are uniformly integrable and they converge in distribution to ~l(~ > Y + Z), so without further concern we can take the limit under the expectation to obtain the main theorem of this section.
Theorem 1 If F is a continuous distribution with support on [0, oo) and a finite mean, then the maximum weight Mn of an Fweighted random tree on n vertices satisfies
lim 1 E[Mn] = E[~l(~ > Y + Z))l (3.19)
n oe n
The Objective Method 21
where the random variables ~, Y, and Z are independent,
~ F, Y G, and Z H,
and the distributions G and H are the unique solutions of the fixedpoint equations
DF (G) =G and .5F (H) = H. (3.20)
The expression E[~l(~ > Y + Z))] for the limit (3.19) in Theorem 1 always provides some structural insight into the value of the limit, but, since the distributions of Y an Z are only defined through the solutions of two fixedpoint equations, one might think that Theorem 1 is still rather abstract. The next result tells us that in one of the cases that matters most, the limit E[~l(~ > Y + Z))l may be calculated explicitly.
Theorem 2 If F(x) = 1 c' for x > 0, then the maximum weight Mn of a partial matching on an Fweighted random tree on n vertices satisfies
lim 1 E[Mn] = c(ey be') exp(cey ce('Y))se' dy ds,
n oa n OF 0
where the constants b and c are defined by equations (3.23) and (3.24) of Lemma 2. Moreover, by numerical integration one finds
lim 1 E[Mn] = 0.239583.... n oo n
To prove Theorem 2 we just need to solve two fixedpoint equations, and the lackofmemory property of the exponential distribution makes this easier than anyone has any right to expect. Later we will find several other situations like this one where distributional identities lead us to calculations that are mysteriously pleasant.
Lemma 2 If F(x) = 1 c' for x > 0, then the solutions G and H of the fixedpoint equations (3.20) are
G(y) = ep(cey) for y ~~. 0 (3.21)
and H(z) = (1 bez) exp(cez) for z > 0, (3.22)
where b and c are related by
b = C2/(C2 + 2c 1) (3.23)
and c is the unique strictly positive solution of
c 2 + ee = 1. (3.24)
Numerically, one finds b = 0.543353... and c = 0.714556....
i
i i
22 David Aldous and J. Michael Steele
To prove Lemma 2, we first consider Y G along with an independent pair of independent sequences {Yi : i = 1, 2,...} and {~i : i = 1, 2,...} with Y, G and ~i F. In terms of these variables, the equation DF(G) G simply means
P(Y < y) = P(~i Yi :5 y for all 1 < i < N) = exp(P(~ Y > y)
where the second identity follows by conditioning on the independent Poisson variable N. Now, if we take logarithms and apply the lackofmemory property of the exponential, then we find
log P(Y < Y) = P(~ > Y + Y) = eYP(~ > Y), (3.25)
and this identity gives us the explicit formula (3.21) when we set c equal to P(~ > Y).
Next, to get the determining equation (3.24) we only need to calculate
c = P(~ > Y) with help from the distribution (3.21) of Y; specifically, we
find
c M~ > Y) eYP(Y < y)dy = OF exp(ceY) eydy
exp(cz) dz = c` (1 e').
Finally, to get the distribution of Z, we use the defining relation (3.17) and the lackofmemory property of the exponential to see for z > 0 that
P(Z < Z) =P(Y < Z)P(~ Z < Z)
= exp(cez) P(~ :~ z + Z) = exp(ce') {1 e'P(~ > Z)},
so if we set b = P(~ > Z), then P(Z < z) = (1 be') exp(ce'), just as we hoped. Now, all that remains is to express b = P(~ > Z) as a function of c = P(~ > Y), and we begin by noting
b = P(~ > Z) = 1, P(Z < z)e'dz
= fw(l bez) exp(ce') e`dz = c b + b c 2 + beec.
0 C2 c
Now, if we use the identity (3.24) to replace e` by 1 _ C2, we see that the last equation may be solved for b to obtain the promised relation (3.23). Finally, we note that the uniqueness of c follows from the convexity of the map x 14 x 2 + e`. 3.9 Closing the Loop: Another Probabilistic Solution
The Objective Method 23
of a FixedPoint Equation
To complete our investigation of the maximal partial matching problem, we only need to prove Proposition 2. Specifically, we need to show that the fixedpoint equation
DF(H) = H
has a unique solution, and we need to show that this solution is the limiting distribution for the bonuses B(T big).
n
EXISTENCE FOLLOWS FROM GENERAL THEORY
The existence of a solution to the equation .5F (H) = H is almost obvious once one draws a connection between the fixedpoint equation and the theory of Markov chains. Specifically, if we set
K(z, A) = P (max{~ z, Y} EE A)
where F and Y G are independent, then K(x, A) defines a Markov transition kernel on R+. One then sees at that a distribution is a fixed point of DF if and only if it is a stationary distribution for the Markov chain with kernel K.
Finally, since the kernel K has the Feller property and since it is dominated by the distribution of max{~, Y}, the general theory of Markov chains on R+ tells us that there is a stationary distribution for K (cf. Meyn and Tweedie [491, Theorem 2.1.2).
UNIQUENESS OBSERVED AND CONFIRMED BY COUPLING
While we might also be able to show the uniqueness of the solution of the fixedpoint equation with help from Markov process theory, there is another approach via coupling that is particularly informative. Moreover, the argument anticipates a useful stability result that gives us just the tool we need to prove that B(T") converges in distribution.
n
To begin, we assume that the variables Y~ : 1 < m < oo } are independent with ~, F and Y,, G for all m. Next, simultaneously for each z > 0, we construct a Markov chain Z,(z), m = 1,2,... by taking Z0(z) = z and by setting
Z~n (z) = max{ Z._ 1 (z), Y. }for m > 1.
Now, for all z and z' in R+, we see that
ZmW ~~ Zm(z') implies Z,,i+I(Z) :5 Z~+1(Z'),
and from this antimonotone property we see that Z.(z) ~: Z,"(0) for all even values of m. Therefore, if we let
T = min{m odd : Z .. (0) = Y~
24 David Aldous and J. Michael Steele
then we see that ZT(z) = YT for all z. As a consequence, we see that T is a coupling time in the exceptionally strong sense that
Z. (z) = Z~ (0) for all z > 0 and all m > T; (3.26)
moreover, the definition of Z,,(0) gives us the simple bound
P (Z. (0) = Y. 1 Z. 1 (0) = z) ~1. P < Yj for all z > 0.
Now, if we set p = 1 P(~,,, < Yj then we can easily check that p < 1. To see why this is so, we first note that in a tree T that has exactly one edge with weight ~' the bonus B(T) is equal to ~'. Now, if T is a PGW(1) tree, then the probability that T has exactly one edge is equal to e', so for
Y A B(T) we have
P(Y E e'P(~' E
Now, ~ is independent of Y and has a continuous distribution, so we have the bound
p(y < e2p(~, :5
2e
Thus, we find p :5 1 1/2e < 1, so the bound
P(T > 2k + 1) :5 p" for all k > 0
gives us more than we need to show that T is finite with probability one.
Now, if H is the distribution function for a stationary measure for the kernel K and if Z H, then by stationarity we have Z.(Z) H for all n > 1. Therefore, by the bound on the coupling time, we find for all x E R that
IH(x) P(Z.(0) :5 x)i :S gk for all m > 2k + 1,
so we see that there is at most one stationary distribution.
A COUPLING CONNECTION
Here we should note that this simultaneous coupling argument shares several elements in common with the "coupling from the pasC technique that has become a popular tool in the theory of Markov chain Monte Carlo (cf. Propp and Wilson [541). In particular, the argument given above shows that for any random initial value X we have
IP(Z,(X) E A) P(Z E A)] :5 ok for all n > 2k + 1;
and, remarkably enough, one can even choose X so that it depends on the
elements of Y : m > 1}.
The Objective Method 25
3.10 From Coupling to Stability Thence to Convergence
If one repeats this coupling argument while relaxing the condition on the distribution of the variables Y,, so that they are only required to have a distribution that is close to G, one is led to bounds on distributions that can be viewed the solutions of an "approximate" fixedpoint equation. Any such stability result must face some notational clutter because many quantities are constantly changing, but the next lemma should be easy to parse if one keeps in mind that n is an index that increases as the distribution of Y,~ comes closer to G. Also, in the lemma we choose the time index m so that we can focus our attention on Xon, a quantity that one may view as the dependent variable at the "time" 0 when the "approximation quality" equals n.
Lemma 3 Fix the integer k > 1 and suppose for each n we have nonnegative random variables {X," : 2k + 1 < m < 0} that are linked to the variables {YIZ : 2k + 2 < m < 0} by the relation
X,n {~n _ XZ
', = max . _l,Y.} for allm = 2k+2,2k+3,...,0,
where the random variables {~,n,, : 2k + 2 < m < 0} are independent, have distribution F, and are independent from
Xn 2k+1 and {Y,,, : 2k + 2 < m < 0}.
Further, suppose that one has
(Ymn : 2k + 2 < m < 0) d (Ym : 2k + 2 < m < 0) as n ~ oo,
where the random variables {Ym m = 0, 1, 2,...} are independent and have distribution G. If H is the unique solution of
DF (H) = H, then we have the bound
lim sup 1 P(Xon :5 x) H(x) 1 :5 Qk for all X E R and k 2,
nw
where p is given by
p dF(x) dG(y) < 1.
y
FROM STABILITY TO CONVERGENCE
First, we let T,,,, denote an Fweighted PGW' (1) tree, and let VO, V1, V2.... denote the sequence of vertices along the unique path to infinity from the root vo according to the definition of PGW'(1) and the familiar Figure 1. Next,
26 David Aldous and J. Michael Steele
we let k > 1 be an arbitrary integer, and we note from the local weak convergence of T, to T.,, that there exists an integer N(k,w) so that for all n > N(k,w) there is a path 7r with 2k vertices and the following properties:
(1) the first vertex of 7r is the root v' of T,,
0
(2) if 7r is written as Vn _~ V, __+ Vn __+ V, then the deletion of
0 1 2 2k 1
the edges of 7r creates a forest of subtrees Tn, j 1, 2,, 2k 1, of Tn where
T n is rooted at C, and
(3) we have the weak convergence relation
(Tn : 0 < i < 2k 2) (T?' : 0 < i < 2k 2)
where all of the righthand components T', 1 < i < 2k 2, are independent PGW(1) trees.
Next, for each n > 1 and each 1 < i < 2k, we let M' denote the subtree of T, that is rooted at v!' that one obtains by removing the single edge (v~' 1, v!')
% t_ %
from the path 7r. We think of R7 as a "remainder subtree" that contains
n
everything "pasC v,!' 1 and to complete the picture we write RO Tn.
........ ..... .............. ..............
VTn
TO, T11
IR n
i+l:
V. V~ V!' v 1
0 1 `~
Fig. 4. The Residual Ri'+1
Finally, if we let ~,n denote the weight on edge (V7, Vn 1), then we may z i+ use exactly the same argument that gave us the heuristic derivation of the fixedpoint equation 5G(H) = H to show that
B(R n) = max{~!' B(R n 1), B(T~') } for all 0 < i < 2k 2. (3.27)
i % i+ %
In this case the argument is also rigorous since all of the trees involved are finite, and B() is well defined at each stage.
The Objective Method 27
Finally, we note that the recursion (3.27) sets up the direct application of Lemma 3. We only need to take m = i, to find that the lemma gives us
lim sup 1 P (B (TJ :5 x) H (x) 1 < p'.
nw
Since p < 1 and since k is arbitrary, the last inequality is more than we need to complete the proof of Proposition 2.
3.11 Looking Back: Perspective on a Case Study
When one looks back on the maximal partial matching problem, there are several themes that seem to be worth recording for future use. First of all, the inclusion criterion (3.5),
> B (TJ` (e) ) + B (T"`
n
helped drive the analysis almost from the beginning. In particular, this necessary and sufficient condition for the edge e to be in the maximal partial matching gave us the succinct representation (3.7) for E[M,,], which in turn encouraged us to study the limit theory of B(TJ`(e)) and B(T"q(e)).
n
The asymptotic analysis of B(Tj`(e)) was relatively straightforward, but the analysis of B (T" (e) ) forced us to work harder. The key to its be
n
havior turned out to be the random variable recursion (3.27),
B(R") = max{ ~in B(R'j), B(T7) } for all 0 < i < 2k 2,
i i+ Z
but this equation might have left us empty handed if we had not first studied its infinite analog. For the analog, we found there was a unique solution, and this was encouraging. Such equations often have "stabilized" versions, and this principle guided us to the proof of the weak convergence of B(T"(e)
n
One is unlikely to find too many problems that may be solved With a stepbystep application of the method used here, but we will soon see several other examples where most of the same elements are present. Certainly, the heuristic derivation and rigorous solution of a distributional identity emerges as a basic theme. Sometimes this identity is followed by stabilization, and at other times it is exploited to provide an analog of an inclusion rule similar to that provided by equation (3.5). Typically, the new inclusion rule would apply only in a more elaborate model that one constructs with help from the distributional identity.
We will see from the analysis of more complicated problems that the solution of the maximal partial matching problem is a hybrid; it is initiated and organized by the objective method, but it is completed with help from a traditional stochastic recursion. Nevertheless, the maximal partial matching problem offers persuasive evidence of the effectiveness of the objective method, and it provides a quick introduction to several of the method's main themes. In the next three sections, we hope to make this case more complete and more inclusive.
28 David Aldous and J. Michael Steele
4 The MeanField Model of Distance
In physics, and increasingly often in probability theory, one meets geometric problems where one tries to gain insight by building an analytic model that only captures part of the structure of the original set of interpoint distances. For example, instead of studying a random variable that depends on all of the interpoint distances of a point process in Rd, one may directly model just the distances from a given point to its successive neighbors and then study an appropriate analog of the original variable.
By preserving the distributions of the most essential interpoint distances while distorting the distribution of less important distances, one hopes to achieve a significant simplification while still gaining some insight into the phenomena originally of interest. Models that aim for this kind of simplification are commonly called mean field models following an earlier tradition in physics where complicated fields were replaced by simpler ones that were calibrated to match the average behavior of the more complicated fields.
In this section, we will describe several problems of probabilistic combinatorial optimization where meanfield models have been developed. These models have a natural kinship with the objective method, and local weak convergence turns out to be a most natural tool.
4.1 From Poisson Points in Rd to a Simple Distance Model
Consider for a moment a Poisson process 'P on R' that has uniform intensity 1/Vd where vd is the volume of the unit ball in Rd. For such a process the expected number of points in the unit ball is equal to one, and, if we look at the successive distances from the origin to the points of P, we have a new point process
0 < A1 < A2 < A3 < .... (4.28)
This process is again Poisson, although now we have nonconstant intensity, and the expected number of points in [0, x] equal to xd.
In a world where one is more concerned about the distances to one's nearest neighbors than about other distances, this simple reduction suggests a promising line of inquiry. Can one assign random distances to the edges of a complete graph K,, with n vertices so that for large n the successive distances from a point r in Kn will mimic the distances that one finds for the sequence of neighbors of 0 in the ddimensional Poisson process P?
A FINITE DISTANCE MODEL: THE dExPONENTIAL K,,
The answer to this question is affirmative, and it is easy to describe. For each edge e of Kn, we simply introduce an independent random variable
such that
~, 1 nl/dA,, or equivalently, P(~, ~: nl/dx) = e _Xd for x > 0. (4.29)
The Objective Method 29
Now, if we fix a vertex r of K,, and consider the distance from this root r to its nearest neighbor, next nearest neighbor, and so on, then one finds a set of n 1 increasing distances
0 < Dn < Dn < ... < Dn
1 2 nP
The key feature of the distribution (4.29) is that one may now easily prove that
> d ~ (Ai : i > 1) as n ~ oo. (4.30)
This useful result yields some geometric insight, but it is possible to go much further. One does not need long to guess that the distributional limit (4.30) may be strengthened to prove the local weak convergence of the finite distance models Kn. That is, one hopes to find that Kn satisfies the kind of limit theorem that we found for Tn in Section 2.
WHERE TO START
Anytime one hopes to prove a weak limit theorem in G,, the battle begins by searching out a reasonably explicit candidate for the limit object. Here for example, one should minimally note that any reasonable candidate for the local weak limit of Kn must have infinite degree at every vertex. Moreover, from the motivating limit (4.30) we also know that in the candidate object the distances from a vertex to its neighbors must behave like the successive values of the Poisson process (4.28).
After these easy observations, some deeper reflection is needed. In due course, one finds that the description of the candidate is almost complete., only one further observation is needed. Any candidate for the limit of graphs K. must be a tree despite the fact that Kn is more full of cycles than any other graph!
4.2 The Poisson Weighted Infinite Tree or, the PWIT
Even with a good limit object in mind, one needs to confirm that the candidate exists as a clearly defined process. Here this is easily done by an inductive procedure. One begins with a single root vertex that we denote by r. This root vertex is then given an infinite number of children, and the edges from the root to the children are assigned lengths according to a realization of a Poisson process (Qr : 1 < i < oo) with mean function x d . The children of the root are said to form generation one.
Now, recursively , each vertex v of generation k is given an infinite number of children, and the edges to these children of v are again assigned lengths according to an independent realization of a Poisson process (~Y : 1 < i < 00) with mean function x d . This procedure is then continued ad infinitum.
The resulting rooted infinite tree T is then a well defined random variable with values in 9,,. The tree T is said to be a Poisson weighted infinite tree, or, a PWIT. Still, the nickname intends no disrespect. PWITs have an inevitable
30 David Aldous and J. Michael Steele
role in the limit theory of trees, and the next theorem tells us that they are precisely the limits of appropriately weighted complete graphs on n vertices.
0.5 0.8 2.1
F9 1 F71
18/19 \3 7 0 9/12 \2 5 0 4/2 8 \4 1
E [1~] E [~ [~] n23 n31 n32 n33
Fig. 5. Part of a realization of a PWIT that shows just the first three children of each vertex; one needs to keep in mind that in the full PWIT each vertex has an infinite number of children. Here the values written next to the edges that descend from a fixed vertex are intended to reflect an independent realization of a Poisson process with mean function x d . Finally, the labels of the vertices in this figure are only for convenience; in a PWIT the vertices are unlabelled except for the root.
Theorem 3 (Convergence Theorem for KJ If K, is a randomly rooted complete graph on n vertices with independent edge lengths ~, that satisfy
P(~,~ ~~ nl/dx) = e'd for all x > 0,
then we have the local weak convergence
d
K~ ~ T as n4 oo. where T is a PWIT with edge weight mean function x d
This theorem from Aldous [3] permits one to study optimization problems for Kn in the same way that Grimmett's Lemma permits one to study optimization problems for the uniform random tree Tn. Nevertheless, there is one noteworthy distinction; the exponential weights on Kn are tied to the Poisson processes that generate the PWIT, but the Fweights for the edges of T, may be chosen with great freedom.
The Objective Method 31
We will not repeat the proof of Theorem 3 here, but we should indicate why the cycles of K, disappear in the limit. If we introduce a random variable that measures the length of the shortest cycle containing the root,
C,, = min {E ~, : C is a cycle of K,, that contains the root r 'EC then the key issue is that one can prove that
lim P(C. < 0 for all t > 0. (4.31)
noc
Intuitively this result says that in the limit there is no cycle of finite length that contains the root, and, with some attention to detail, this leads to a proof that the weak limit of K,, in G, must be a random tree.
THE DIMENSION PARAMETER AND A VOLUNTARY RESTRICTION
For the rest of this section, we will restrict our attention to the case when d = 1, and the Poisson process (4.28) of successive distances (Ai : 1 < i < oo) reduces to a plain vanilla Poisson process with constant rate 1. Nevertheless, we should emphasize that this restriction is voluntary; it is only taken to shorten and simplify our exposition.
For all of the problems studied in this section and the next, there are parallel results even for noninteger 0 < d < oo, and, in most cases, one will find complete proofs of these results in the references. Moreover, one should note that for the problems studied here, the logic of the analysis is identical for all values of 0 < d < oo, even though the explicit calculations often depend on the specific value. These calculations are almost always simplest for d = 1 where one has access to the lackofmemory property of the exponential.
4.3 The Cutoff Components of a Weighted Graph and a PWIT
For a moment, we consider a perfectly general weighted graph G, and we let G. denote the subgraph G. that one obtains from G by deleting all of the edges of length s or greater. Also, if v is a vertex of G we let c(G, v; s) denote the connected component of G, that contains v.
Now, if T is a dimensionone PWIT with root r we may view the component c(T, v; s) as a rooted tree with root r, and, rather pleasantly, this tree can identified with a GaltonWatson branching process with a single progenitor and an offspring distribution that is Poisson with mean 8. In other words, for all s > 0 we have
c(T, r; s) 1 PGW(s),
and this simple observation has several useful consequences.
In particular, if q(s) denotes the probability that the component c(T, r; s) is infinite, then classical formulas for the extinction probability of a branching
32 David Aldous and J. Michael Steele
process (such as those of Harris [30], p.7) tell us that we have q(s) = 0 when 0 < s < 1, and otherwise we find that q(s) is the unique strictly positive solution of
1 q(s) = exp(sq(s)) when s > 1. (4.32)
By inverting this relationship, one also sees that the value of 8 for which we have extinction probability 0 < q < 1 is given by
s(q) = q1 log(l q), (4.33)
a formula that will soon come in handy.
4.4 The Minimum Spanning Forests of an Infinite Graph
We want to define an analog of the minimal spanning tree for the PWIT and other infinite graphs, but some care must be taken. Naive attempts to mimic the usual greedy algorithms certainly will not work; in most cases, they can never get started.
Instead, we are guided by a wellknown criterion for an edge e = (V1, V2) with length s to be in the MST of a finite graph G with distinct edge weights. For such a graph, e is in the MST of G if and only if there does not exist a path in G from v, to V2 such that each edge of the path has weight less than s. This criterion is easy to prove, and it is occasionally useful in correctness proofs for MST algorithms, such as the five discussed in Section 4.4 of Matou~ek and NeWil [481.
The Objective Method 33
Definition 3 (Minimal Spanning Forest) The minimal spanning forest of an infinite graph G that has all distinct edge lengths is the subgraph MSF(G) of G with the same vertex set as G and with an edge set that contains each edge e = (V1, V2) of G for which
(1) c(G, vi; s) and C(G, V2; s) are disjoint, and
(2) c(G, v,., s) and C(G, V2; s) are not both infinite when s is taken to be the length of the edge e = (V1, V2) E G.
One can easily cheek that the graph MSF(G) defined this way will have no cycles, and, since it trivially spans the vertex set of G, we see that MSF(G) is certainly a spanning forest. Nevertheless, some work is needed to see why MSF(G) is the right analog to the MST of a finite graph and to justify the name minimal spanning forest.
A first step, one should note that all of the connected components of MSF(G) are infinite. To see this, suppose to the contrary that GO is a finite component of MSF(G) of minimal cardinality. Next, consider the shortest edge e = (V1, V2) between GO and its complement. Since one cannot get out of GO without using an edge length greater than s, we see that c(G, V, ; S) is contained in GO and hence that c(G, v,; s) is finite. Rom the definition of the MSF(G), we then have e E MSF(G), and this contradicts the minimality of the component Go.
Either c(G, v,; s) is finite
V2
V1
8
or C(C, V2; 8) is finite, or both are finite.
Fig. 6. The edge e (V1, V2) of G is in the minimal spanning forest if and only if at least one of the trees c(G, v,; s) and C(C, V2; 8) is finite when s = ~_
4.5 The Average Length Per Vertex of the MSF of a PWIT
If T is a PWIT with root r then the sum of the edges of MSF(T) that are incident to r can be written as
34 David Aldous and J. Michael Steele
SmsF(T) = 1: &l(e E MSF(T)),
e:rEe
and, if we think of each vertex as owning half of the length of each edge that meets the vertex, then it is reasonable to think of
A = 1 E[SMSF(T)l (4.34)
2
as an average length per vertex of the minimal spanning forest of T.
One might guess that A could be almost impossible to compute, but such a view underestimates the natural grace of the PWIT. The next lemma and its proof show that the PWIT supports some almost magical possibilities.
Lemma 4 The average cost A per vertex for the MSF of a P WIT satisfies
A = ((3) k3.
k=l
To prove the lemma, we first recall that if one conditions a Poisson process to have a point at s, then the set of complementary points is again a Poisson process. Thus, if we condition on the existence of an edge e of length s from the root to some child, then the subtrees obtained by cutting that edge are again independent PGW(s) trees.
The probability that at least one of these is finite is equal to 1 q2 (S), so this is also the probability that e is an edge in the minimal spanning forest and we find
A= 1f,(lq 2 (,)) d,. (4.35)
2
Calculation of this integral is quite a pleasing experience. We first integrate by parts and then we use the defining equation (4.32) for q(s) to find
1 2 1 1.82 1 11 10g2(1 q
A = 8 q(s)q(s) ds = q(s)q'(s) ds = ) dq,
2 fo 00 2 1 2 0 q
so the substitution u log(l q) lets us finish with
A = 1 1' U2 eu du = 1 U2 00 ku dU
2 1 e 2 Y7 e F3
0 OF k=l k=l
4.6 The Connection to Rieze's C(3) Theorem
The emergence of ((3) as the average cost per vertex of the minimal spanning forest of a PWIT is at least a little mysterious. Still, there is a related precedent, and when one sees ((3) in the context of trees, the famous ((3) theorem of Frieze [21] immediately comes to mind.
The Objective Method 35
Theorem 4 Let each edge e of the complete graph on n vertices be assigned an independent cost ~, with distribution F where F has a finite variance and where F(O) exists and is nonzero. If C~ls'(F) denotes the minimal cost of a spanning tree of this graph, then as n > oo, we have
C'M'ST p [C;~ST
n (F) ((3)IF'(0) and E (F)] , ((3)IF'(0). (4.36)
By a simple coupling argument, Steele [571 showed that the convergence in probability still holds without any moment assumptions, and later by a martingale argument Frieze and MeDiarmid [22] showed that convergence in probability can be strengthened to almost sure convergence.
Nevertheless, what matters here are simply the exponential costs, and to make the connection with our earlier calculations explicit, we note that Rieze's theorem implies that if the edge weight distribution & is exponential with mean n, then we also have
li 1 E [C,~4ST(F.)] = ((3). (4.37)
noc n
This limit says the average cost per vertex of the minimal spanning tree of the appropriately weighted complete graph tends to the same value as the expected average cost per vertex of the PWIT with d ~ 1. Conceivably, the reappearance of ((3) is only a coincidence, but it is not. Quite to the contrary, the limit (4.37) is a consequence of the fact that A = ((3).
THE LOCAL WEAK CONVERGENCE LINK
As a special case of Theorem 3, we know that for the complete graph K,, on n vertices with edge lengths that are independent and exponentially distributed with mean n, one has the local weak convergence
K. d ~ T as n > oo (4.38)
where T denotes a PWIT with d = 1. We will also see later in Theorem 6 of Section 5 that just local weak convergence (4.38) and the fact that the PWIT is an infinite graph with all distinct edge weights is enough to give us
(K,, MST(K.) ) d _~ (T, MSF(T)). (4.39)
This limit is typical of the way the theory of local weak convergence informs one about a problem in combinatorial optimization. At a conceptual level, the limit (4.39) answers almost any question one may have about the qualitative behavior of MST(K.) for large n, provided that the question meets one restriction; it be expressed in terms of a function of some neighborhood of the root of MST(Kn).
36 David Aldous and J. Michael Steele
A RANDOMIZED ROOT AND A TOTAL SUM
The expectation E[C~"'(F,,)] can be written as a function of the neighborhood of the root, yet this representation seems to be a lucky break. Eventually one calls on this break routinely, but at least once it seems useful to examine the full logical chain; we begin by noting
Q~ ST(F,') 1 Y7~,1(e E MST(KJ) c
so now if we let R denote a root chosen at random from the vertex set of K,, we also have
1 E[E~,1(e E MST(KJ)] E[ 1: ~,1(e c MST(K.))]
n c e:REe
and from the local weak convergence MSF(K,,) ~ MSF(T) we have
E ~~l(, e MST(KJ) 1: ~,1(e E MST(T)). (4.40)
e:REe e:REe
Now, once one checks uniform integrability (which is a not particularly easy, but which is a doable technical exercise), one finds from the basic distributional limit (4.40) that
E[ 1: ~,1(e E MST(Kn))] , E[ 1: ~,1(e E MST(T))] = 2((3),
e:REe e:REe
where the last equality comes from Lemma 4. IProm the links of the chain we
find 1 E[C1;~"(Fn)] ((3),
n
just as we hoped to show.
5 Minimal Cost Perfect Matchings
A perfect matching of a finite graph is a collection of disjoint edges with the property that at most one vertex of the graph fails to be contained in an edge of the collection. Perfect matchings are among the most intensively studied structures of graph theory, and they suggest many natural random variables. Here we focus on the prince among these, the minimal cost C. of a perfect matching of the complete graph K,, with independent edge weights ~, with distribution F.
The assignment problem discussed in the introduction is precisely the perfect matching problem for the bipartite graph K,,, and the analysis of C. is intimately connected to that of A.. Thus, the following theorem is a direct analog of the ((2) limit theorem for the expected assignment cost E[Anj, but here we rescale the Fweights to be consistent with our discussion of the local weak limit theory of K,,.
The Objective Method 37
Theorem 5 If C~ denotes the minimal cost of a perfect matching of the complete graph K,, on n vertices with independent edge costs ~,, having the exponential distribution with mean n, then
1 1
lim E[C,,l k 2 =
~((2). (5.41)
noo n 2
k=l
Given the earlier experience with the maximum partial matching on the weighted uniform tree Tn, one has every reason to expect that the local weak convergence of Kn should provide the key to the proof of this theorem. This view is well founded, but the earlier analyses only provide one with a proper orientation. There are many devils in the details, and a complete proof of Theorem 5 would take more space than we have here.
Nevertheless, we can provide the better part of onehalf of the proof; that is, we can provide most of the argument that proves the limit infirnum, of E[C.11n is not less than ((2)/2. While doing so, we can also underscore the main structural features of the full proof. In particular, we develop a basic distributional identity and solve the identity to find an unexpected connection to the logistic distribution. We can then use the distributional identity to construct a special stochastic process which can be used in turn to define an inclusion rule for the edges of a PWIT to be in a minimal cost matching. Finally, given just a few properties of the freshly constructed process, we can complete the analysis of the limit infimum.
In the course of this development, we can also complete a promise made in Section 2; specifically, we can formalize the involution invariance property that one finds for all limit objects that are given by the standard construction discussed in Section 2. Even though involution invariance seems perfectly innocuous, it has an unavoidable part in the proof of Theorem 5. In the earliest stage, it helps weed out defective candidates for the limit objects, and at a later stage it guides us to a proper understanding of the matchings on the PWIT that can be attained as limits of matching on K,.
5.1 A Natural Heuristic Which Fails for a Good Reason
The local weak convergence of K,, almost compels one to think of using good matchings on the PWIT to help in the hunt for good matchings on K,. The natural hope is that a perfect matching in the PWIT with minimal average cost per vertex should be close to a perfect matching of K. for large n; thus, weak convergence theory might set us on the trail of a proof of Theorem 5. This hope can be justified, but only after one restricts attention to an important subset of the matchings on the PWIT that satisfy a natural symmetry property. If one ignores this restriction, there are matchings on the PWIT that are simply too good to be tied to matchings on K,,.
38 David Aldous and J. Michael Steele
AN EXAMPLE: GREEDY MATCHING ON THE PWIT
In a finite weighted graph one reasonable way to hunt for a good matching is the global greedy algorithm where one picks the edge e with minimal weight, removes all of the edges incident to either endpoint of e, and proceeds recursively. Unfortunately, in a PWIT this method does not have a chance to start a PWIT has no edge of minimal cost.
A Symbolic PWIT and its Greedy Matching
The greedy matching
of a PWIT is not
involution invariant,
so we cannot use it
to help construct a
0 good matching on K,.
Fig. 7. To construct the greedy matching of a PWIT one takes the shortest edge e out of the root, removes all of the vertices incident to e, and proceeds recursively. The expected cost per vertex of the greedy matching is equal to onehalf, which beats the unbeatable limiting average cost of ((2)/2 per vertex for K..
In the PWIT, a reasonable alternative to the global greedy algorithm is the local greedy algorithm where one starts by choosing the edge of minimal cost that is incident to the root. In a PWIT such an edge e exists, and we even know that it has expected value equal to 1. One can then delete e, remove the edges incident to e, and proceed recursively along the infinite forest produced by this first step.
The local greedy algorithm produces a matching in the PWIT with an expected cost per vertex equal to onehalf, and this is intriguing since it is even smaller than the value ((2)/2 that we expected to show to be asymptotically optimal for the K,,,.
Unfortunately, the greedy matching for the PWIT simply cannot be the limit of a sequence of matchings for K,,. Moreover, one can even see this fact from first principles. The problem is that the greedy matching lacks a symmetry property that one must find in every limit object like the limit of the optimal matchings of K,,.
40 David Aldous and J. Michael Steele
fiG (e) = pc (v)
Tic (e') = pc
W Involution invariance
v would force
PG (V) = PG (W)
Fig. 8. Given C = (V, E) we let PG (v) denote the probability that v is the root of G, and if fi is the induced measure, we write ~G for the marginal measure given G. For any directed edge (v, w) of G, the measure ;~G puts mass pc (v) on the edge (v, w) and puts mass pc (w) on the edge (w, v). If one has involution invariance, then ;~G (V, W) = ;~G (w, v), and we see that pc (v) = pc (w). Thus, pc () is constant on each connected component of G whenever one has involution invariance.
Lemma 5 If ji is the probability distribution of G [X] where G [X] is given by the standard construction and C[X] is finite with probability one, then /A is involution invariant. Conversely, if the probability measure it is involution invariant and concentrated on the finite connected graphs in G, then JA is the distribution of a G,valued random variable G[Xl that is given by the standard construction.
To prove the direct implication, we first condition on the event that the underlying graph of G [X] is the fixed finite graph G. If G has n vertices and m undirected edges, then the marginal measure fiG puts mass l/n on each of the 2m directed edges of G, so we trivially get involution invariance. To prove the converse, one notes by the captioned discussion of Figure 8 that PG(.) is constant on each connected component of G. Finally, if we assume that G is connected, we see pc(.) is constant on G. This is precisely what it means to say that the root X of G is chosen uniformly from its vertex set, so the proof of the lemma is complete.
CONNECTION TO THE OBJECTS OF THE OBJECTIVE METHOD
One of the basic features of involution invariance is that it is preserved under local weak convergence in the metric space 9,. To prove this fact, one first shows that the topology on 9, extends in a natural way to a topology on ~. that makes ~. a complete separable metric space. Next one shows that the transformation p ~ fi is a continuous mapping with respect to weak convergence; specifically
d d
1An 1 Poo =~. /An Poo.
Finally, one shows that the involution map t ~ ~, is continuous.
The Objective Method 41
5.3 Involution Invariance and the Convergence of MSTs
The formalities of involution invariance give one no hint that the notion has much force, but the next theorem shows that involution invariance can be genuinely powerful. In fact, there is something fundamental about involution invariance, since, in a way, it replaces the spatial invariance that one uses constantly in the theory of homogeneous point processes. The pleasantly general Theorem 6 contains as special cases Rieze's ((3) theorem which was discussed earlier and the convergence theorem for MSTs of random samples in the cube which we will discuss in Section 6.
Theorem 6 (MST Convergence Theorem) Let G,,, denote a G,valued random variable such that with probability one Q, has infinitely many vertices and no two of the edges of C have the same length. Further, let {Gn : n = 1,2,...} denote a sequence of g~valued random variables such that for each n the distribution Of Gn is given by the standard construction and such that for each n the vertex set Of Gn has cardinality n with probability one. If
d
Gn ~ G,,. as n oo, (5.42)
then one has the joint weak convergence in 9,, x 9,,,
(Gn, MST(Gn) d _~ (G,,,, MSF(Q,.) (5.43)
Further, if N, denotes the degree of the root of MST(Gn) and N denotes the degree of the root of MSF(G,)~)
N,~ d ~~ N and E[Nn]+ EIN1 = 2, (5.44)
and, if Ln denotes the sum of lengths of the edges incident to the root of MST(Gn) and L denotes the corresponding quantities for MSF(G,'), then
Ln d ~ L. (5.45)
The applications of the limit (5.43) are easy to understand without working through the convergence proof. Nevertheless, a survey on local weak convergence should give at least one such proof in reasonable detail, and the limit (5.43) is a attractive candidate. The proof demonstrates several basic facets of involution invariance, and the net result is genuinely useful.
To begin the proof, we first note that the hypothesis (5.42) tells us the sequence {G,} is tight, and if we consider the sequence of pairs (G,, MST(G.) one can check that they determine a sequence {M,} of measures on 9, x 9. that is also tight. We then let p denote a measure on 9, x 9. that one obtains as the weak limit of a subsequence {/,.i }.
By Skorohod's embedding theorem, we may suppose the 9, x G.valued random variables (G,,,, MST(G.j) ) are all defined on a common probability space and that there is a 9,valued random variable S such that
42 David Aldous and J. Michael Steele
(G,j, MST(GJ) , (G,,,,, S) with probability one. (5.46)
From the definition of p as the weak limit of the sequence {pj }, we see that the proof of the limit (4.39) boils down to showing
S(w) = MSF(G,,J(w) with probability one. (5.47)
This identity will be proved in two steps one for each direction of the set inclusions.
THE FIRST INCLUSION is EASY
To show that MSF(G,,J(w) C: S(w) for almost all w, we begin by con~ sidering an edge e = (u, v) E MSF(G,,J that has length s. By the definition of the minimal spanning forest, we may assume without loss of generality that the component c(G,,, u; s) is finite. For convenience of notation, we will also assume that the subsequence nj is in fact the full sequence; when the argument is done, one will see that no generality was lost.
Now take a neighborhood H of u that is large enough to contain c(G., u; s) and note by the almost sure convergence of Gn to T that there is a finite N(W) such that for all n > N(w) there is an isomorphism between H and a neighborhood of the root of G, Also, since c(G., u; s) is finite, there is an E > 0 such that each edge of c(G, u; s) has length less than s .
If en = (U,, Vn) denotes the edge associated with c by the neighborhood isomorphism, then for large n the length of en is at least s E/2 and all edges of c(Gn, Un; s) have length strictly less than s /2. Therefore, there can be no path in G,, from Un to Vn that has all of its edges shorter than the length of c'. Thus, we find that there a finite N'(w) such that en is in MST(Gn) for all n > N'(w). Since MST(Gn) converges to S in G, we get our first inclusion
MSF(G~)(w) c S(w) with probability one. (5.48)
A MORE SUBTLE INCLUSION
To complete the proof we just need to show that the set difference
D(w) = S(w) \ MSF(G.)(w) (5.49)
is empty with probability one. By the general lemma below, we also see that it suffices to show that the number of edges in D that meet the root of G has expectation zero.
Lernma 6 Let G be an involution invariant element of G, with root r and let W be an involution invariant subset of the edges of G. If
E[ Y~ 1(e E W)]= 0,
e:r4~e
then W is empty with probability one.
The Objective Method 43
This lemma is proved in Aldous and Steele [7], and, since the way that it uses involution invariance is more simply illustrated by the proof of Lemma 7, we will not repeat the proof Lemma 6 here. Instead, we will go directly to its application.
Naturally, we want to apply Lemma 6 where W is taken to be the difference set D defined by equation (5.49), so our task is to estimate the corresponding expectation. If we let N,, denote the degree of root of G, in MST(G.) and let N denote the degree of the root of G,,, in MSF(G,.), then we have the usual Skorohod embedding and the inclusion relation (5.48) give us the bound
N < lim inf N, (5.50)
noo and the definitions of D, Nn and N also give us
Y7 1 (e E D) :5 (lim inf Nn) N. (5.51)
noc,
e:rEe
Since an nvertex tree has exactly n 1 edges, we always have the trivial limit
E[Nn] = 2(n 1)/n / 2,
so, by the bound (5.50) and Fatou's Lemma, we have
E[N] :5 E[lim inf N,,l :~ lim inf E[Nn] = 2.
noe n_00
This observation almost completes the proof of Theorem 6, and the next lemma does the rest. It is all we need to confirm that the expectation of the sum (5.51) is zero.
Lemma 7
E[N] = 2.
In fact, only half of the lemma remains to be proved. we just need to show E[N) > 2. To begin, we consider an edge e = (r, v) that is incident to the root r of MSF(G). If we write f for "finite" and oo for "infinite," then each such edge must be one of four possible types
(f, f ), (f, w), (oo, f ), and (co, ce)
according to the cardinality of the respective components of MSF(G,.) \ {e} that contain r and v. We will write Nff, Nf., N.f, and N.. for the numbers of edges of the corresponding types.
From the definition of the minimal spanning forest, we know that every vertex is in an infinite component of the MSF, so Nff is always zero. A more important observation is that unless r has infinite degree, the argument given
in the caption of Figure 9 gives us
2Nf. + N > 2. (5.53)
44 David Aldous and J. Michael Steele
The key step turns out to be a very simple involution invariance argument that shows that Nf. and N,,~f have the same expectation. Once this is confirmed, one can complete the proof of the lemma by noting that the bound (5.53) gives us
E[N] = E[Nf.] + E[N.f ] + E[N..]
= 2E[Nf~ + E[N,,~,,,] > 2.
1 ' \,V/, V
If the root is If the root is
a leaf not a leaf
Fig. 9. If the root of G. has finite degree, we are sure to have 2Nf_ + N_ ~: 2 because no edge of the MSF of G can connect two finite components of the MSF and because in the three remaining cases one always has either Nf_ ~: 1 or N~ > 2.
Finally, to establish the equality of E[Nf,,,l and E[N,,,,f], we first recall the measure fi that one uses to define involution invariance. For any directed edge e we have
E[Nf,~] = fi{e is of type (f, oo)}, and we also have
E[N,>,,f ] = fi{e is of type (oo,
= fi{e is of type (f, oo)},
where e is the reversal of e. Now, since G,,, is involution invariant, we have equality of reversals, so
fi{e is of type (f, o<))} = fA{e is if type (f, oo)},
and we find at last that E[N~fl = E[Nf,),j. This completes the proof of Lemma 7, and hence completes the proof of Theorem 6 as well.
The Objective Method 45
A THOUGHT ON INVOLUTION INVARIANCE
The way that involution invariance does its work is by telling us that "whatever is true from the perspective of the root r" is also true from the perspective of the root's neighbor v. In the case of a connected graph, one can chain this argument from vertex to vertex and conclude that "whatever is true from the perspective of one vertex is true from the perspective of any other vertex." This statement is far from precise, but the derivation of the identity E[N.fl = E[Nf.] suggests that it provides honest guidance.
RESULTs RELATED TO THE MST CONVERGENCE THEOREM
The MST Convergence Theorem given here is new, and it generality serves to underscores the benefit of working with the local weak convergence of sequences of 9,valued random variables. The simplicity of Theorem 6 should not conceal the fact that it contains as special cases both the basic convergence theory for MSTs for weighted complete graphs developed by Aldous [11 and the corresponding theory for random Euclidean points developed by Aldous and Steele [61. Avram and Bertsimas [121 independently studied both of these cases, and one further finds that their calculations are also much more transparent when viewed in the light of local weak convergence.
5.4 A Heuristic That Works by Focusing on the Unknown
The greedy algorithm gave us an excellent matching on the PWIT, but it lacked involution invariance and did not help a bit in our hunt for a good matching on K.. Moreover, the symmetry of the PWIT makes it hard to see how one can do better; almost any matching one might find by good guessing is likely to face the same failings that were met by the greedy matching. Nevertheless, a good involution invariant matching can be found, provided that one takes a hint from the analysis of the maximal partial matching problem.
The overarching plan is to consider the minimal cost per vertex of an involution invariant perfect matching on the PWIT and to treat this cost as an "unknown." We then begin the hunt for an equation that contains this unknown, and, in due course, heuristic considerations will lead us to a distributional identity. This identity can then be solved rigorously, and its solution can be used in turn to build a joint process with one marginal being a PWIT and another marginal being random variables that satisfy the distributional identity. These auxiliary variables can then used to construct an 9nclusion rule" that provides a test for an edge's membership in the matching on the PWIT.
One small wrinkle this time is that we will focus on a simple difference, rather than the more subtle bonus that we used before. Specifically, we focus on the difference
D(T) = C(T) C (T
46 David Aldous and J. Michael Steele
where C(T) is the minimal cost of a perfect matching of T and QT \ {r}) is the minimal cost of a perfect matching of the graph T \ {r}. Naturally, both C(T) and C(T \ {r}) are infinite, so D(T) is not really well defined. Nevertheless, in our analysis of the maximal partial matching problem we met a similar difficulty, and we still managed to prevail. We have every reason to hope for a similar outcome this time.
If j is a child of the root, then we let T.7 denote the descendants of j and we view Ti as a rooted graph with root j. We then introduce the analogous heuristic difference D(P), and note that even though D(T) and D(P) are not honestly defined, one can continue in the brave tradition of physical scientists and argue that nevertheless they should satisfy the distributional identity
D(T) = min { ~j D(P) (5.54)
1
To see this, we first note that the minimal cost of a matching that contains the edge (r, j) from the root r to the vertex j has cost
~j + C (Ti {j}) + 1: C (P) (5.55)
i:i96j
and that C(T) is just the minimum of all of these values. Also, we have the direct identity
C (T C(P). (5.56)
The difference of equations (5,55) and (5.56) is just ~j C(P) + C(Ti {j}), so in the end we see that
C (T) C (T \ {r}) = min { ~j C (P) + C (Ti (5.57)
1
5.5 A Distributional Identity with a Logistic Solution
Equation (5.54), or its expanded equivalent (5.57), motivates us to begin our rigorous analysis by first considering the distributional identity
X 1 min {~j xj} Oo < X < 00, (5.58)
lsi
which we interpret to be an identity for a distribution F that is fully specified by supposing (1) X F, (2) the Xi are independent, (3) Xi F for all i = 1, 2,..., and (4) the sequence {Xi} is independent of the process 0 < ~1 < ~2 < ... which is a Poisson process with constant intensity equal to one.
Naturally, one wants to know about the existence and uniqueness of a distribution F that satisfies this distributional identity. The next lemma provides an answer of striking simplicity.
The Objective Method 47
Lemma 8 (Logistic Solution of a Distributional Identity) The unique solution of the identity (5.58) is the logistic distribution
F(y) = 1/(1 e1), oo < y < oo (5.59)
corresponding to the densityfunction
f (y) = (ey/2 + CY12) 2, 00 < y < 00.
BETTER THAN A PROOF A DERIVATION
Rather than just give a direct verification of this lemma, one does better to see how the probability theory of the distributional identity leads one to discover the connection to the logistic distribution. Thus, for the moment, we assume that the distribution F satisfies the identity (5.58), and we will try to calculate F.
We first note that the distributional identity (5.58) asserts that
1 F(y) = P in {~i Xi} ~~ y
where the {Xi : i = 1, 2,...} on the righthand side are i.i.d. with distribution F and {~i : i = 1, 2, ...} is an independent Poisson process. Therefore, if we consider the collection of points P in R' given by { (~i, Xi) : i = 1, 2,...} we see that P is a Poisson process on (0, oo) x (oo, oo) with intensity dzdF. As a consequence, we have
1 F(y) =P(no point of 'P is in {(z, x) : z x < y}
= exp dz dF(x)
( < Y
= exp {1 F(z y)} dz)
= exp fwy {1 F(u)}du) .
When we differentiate this identity we find
F'(y) = (1 F(y)) (1 F(y)), (5.60)
and the symmetry of this equation implies that the density F() is symmetric.
As a consequence, we see that equation (5.60) is equivalent to
F'(y) = F(y)(1 F(y)),
which is easily seen to be separable. We can rewrite it as
dy = dF = dF + dF
F(1 F) F 1 F
48 David Aldous and J. Michael Steele
and solve by inspection to find
y = log F or F(y) = 1/(1 ey). 1 F
Thus, F is simply is the logistic distribution. It is famous for popping up in odd places, but its appearance here still a bit surprising. The remainder of our analysis does not call on any theory of the logistic distribution beyond the easily checked fact that it has variance 7r 2 /3.
5.6 A Stochastic Process that Constructs a Matching
With the explicit solution of the distributional identity (5.58) in our hands, the next step is to construct a joint process where the distributional identity (5.58) suggests an inclusion criterion that can be used to define a good matching on the PWIT. Specifically, we will construct simultaneously (1) a PWIT and (2) a special collection of random variables that are associated with the edges of the PWIT. The variables in this special collection are intended to mimic the differences D(Ti) that led us to the logistic recursion. To put this plan into action, we will need to introduce some notation that makes more explicit the consequences of cutting a PWIT at one of its edges. NOTATION FOR A SPLIT PWIT
When we take a PWIT and delete the edge (u, v) we get two disjoint trees that we can view as being rooted at u and v respectively. To help keep track of these trees (and the difference functionals that we will associate with them), we assign to each edge of the PWIT a pair of directed edges.; formally, we just distinguish between (u, v) and (v, u). Given any graph G we let Ec denote the set of edges obtained by directing the edges of G, and, if T is a PWIT with edge weights ~,, then we give the same weight to each of the directed edges associated with c. Finally, if e = (u, v) is a directed edge and X(e) is a random variable associated with that edge, we will often simply write X (u, v). In the case of the PWIT, the weight of the directed edge ~,, will also be written as ~(u, v). By definition we have ~(u, v) = ~(v,u), but, typically, X (u, v) and X (v, u) will be different.
Lemma 9 (MAple Tree Process) There exists a process (T, ~., X(e) : e E E7) where T is a PWIT with associated edge lengths {~, : e E Er} and where { X(e) : e E Er} is stochastic process indexed by the directed edges of T with the three following properties:
(i) for each directed edge e = (u, v) E Er we have
X (u, v) = min{~(v, w) X (v, w) : (v, w) E E1, w 7~ u}, (5.61)
(ii) for each e E ET that is directed away from the root of T the random variable X(e) has the logistic distribution, and
(iii) for each pair of oppositely directed edges (u, v) and (v, u) in Er, the random variables X (u, v) and X (v, u) are independent.
The Objective Method 49
............
T (v, u)
U V
T(u, v)
.......... ............
Fig. 10. When one cuts a PWIT at an edge, one obtains two trees that one can label by using the directed edges associated with the cut edge. The tree T(U, v) labelled by the directed edge (u, v) has the root v and contains all of the edges that follow v in the direction of (u, v).
The proof of Lemma 9 is easy and instructive; it shows how one can exploit a distributional identity to construct useful processes on trees and other graphs. To begin, we fix an integer g ~: 1 and then for each directed edge (u, v) with u in generation g and v in generation g + 1 (relative to the root), we create by flat an independent logistic random variable X (u, v). We next use the identity (5.61) to define X(e) for each edge within g generations of the root. This construction gives us a collection C, of random variables whose joint distribution satisfies properties (i), (ii) and (iii) for all vertices in the first g generations. Moreover, the distribution of the collection C, is equal to the marginal distribution of the larger collection C,+, restricted to the first g generations, so by the Kolmogorov consistency theorem there is a collection C,<, such that for each g the marginal distribution of C,,,, restricted to first g generations is equal to the distribution of C, 11
A USEFUL ADDENDUM TO THE TRIPLE TREE LEMMA
Once the triple process is in hand, one can go further, in fact, one can show that the process has some intrinsic independence properties that are not an explicit part of the construction. Moreover, we will find that this additional independence has a key role in our later computations.
First we take a fixed 0 < x < oo, and we condition on the event that there exist an edge a the root that has length x. We call this edge (r, v) and note that as in Figure 10, the edge determines two subtrees of the PWIT that one could label T(r, v) and T(v, r). As we have noted before, a Poisson process conditioned to have a point at x is again a Poisson process when this point is deleted, so by the definition of the PWIT we see that T(u, v) and T(v, u) are conditionally independent copies of the original PWIT. We therefore find an addendum to Lemma 9:
(iv) conditional on the existence of an edge (r, v) from the root having length x, the random variables X (r, v) and X (v, r) are independent random variables with the logistic distribution.
50 David Aldous and J. Michael Steele
Some of the double edges into and out of v
X (U, v) Infinitely many
............. v
edges are not drawn.
X (V, U)
Fig. 11. The edges that feature in the identity for X (u, v) are those directed edges out of v indicated by the solid arrows; note that the edge (v, u) is not one of these. Note also that the weight ~. of an edge does not depend on its direction, but the value of X does.
CONSTRUCTION OF AN EXCELLENT MATCHING ON T
For the triple process (T, ~, X) given by Lemma 9, one can use the additional information carried by the X process to specify a good matching on T. As before, the idea is to use an inclusion criterion that is motivated by the heuristic derivation of the distributional identity (5.58).
To define this matching, we first consider the map W : V > V on the set V of vertices of the PWIT that is given by W(v) = v* where v* is the vertex for which ~, X(e) is minimized when e = (v, v*). The next lemma confirms that the set of edges (v, v*) is a matching., we will subsequently show that it is a good one. More precisely, one can show that this matching is optimal in the sense that no other involution invariant matching of aPWIT can have a lower average expected length per edge.
Lemma 10 (Matching Lemma) The set of edges (v, v*) defined by the rule W(v) = v* is a matching on the special PWIT T.
To prove Lemma 10, we first note that it is exactly equivalent to showing that W is idempotent; that is, we must show that M)* = v for all v E V. Now, by the definition of W and the recursion (5.61) for X, we have
~(v, V*) X(V, V*) = min{~(V5 Y) X(V, Y) (V, Y) E T}
< minWV, Y) X(v, Y) (v, Y) E T, y 54 v*} = X(v*, v), or, a bit more simply
The Objective Method 51
~(v, V*) < X(V, V*) + X(v*' v). (5.62)
On the other hand, if z is a neighbor of v other than v*, then we have
(v, z) X (v, z) > min {~ (v, y) X (v, y) (v, y) E T}
min{~(Vi Y) X(v, Y) (v, Y) E 7, Y 0 Z}
X (Z, v).
This inequality tells us ~(v, z) > X (v, z) + X (z, v) whenever z 7~ v, so we see that v* is the unique neighbor of v that satisfies the inclusion criterion (5.62).
We can therefore see that we could have just as well defined ~o(v) as the unique v* satisfying the criterion (5.62). Finally, if one notes that both sides of the criterion (5.62) are symmetric we see that W(W(v)) = v, just as we hoped to show.
WISDOM OF A MATCHING RULE
Given the maneuvering that led us to the matching rule r* = W(r), one has a right to expect that there is no invariant matching that has a lower expected cost per vertex. The next lemma asserts that this expectation is justified.
Lemma 11 (Optimality on the PWIT) Let M denote any matching of 7_ that is involution invaiiant, If (r, s) is the edge of M that contains the root r of T, then one has
E[~(r, s)] ~~ E[~(r, (5.63)
The proof of Lemma 11 is unfortunately too lengthy to be included here, and even a convincing sketch would run beyond our boundaries, so for most details we must refer the reader to Proposition 18 of Aldous [41, which additionally proves that inequality (5.63) is strict except when the matching M is almost surely identical to the matching defined by W.
5.7 Calculation of a Limiting Constant: 1r2/6
If M~pt is the matching given by Lemma 10 and if r* = W(r) is the vertex that is matched to the root r by Mopt, then the expected value per edge of the weight of the M.pt is simply equal to EK(r, r*)]. This expected value turns out to be surprisingly easy to compute.
Lemma 12
*A W2
E[~(r, ' _r
52 David Aldous and J. Michael Steele
To prove the lemma we first fix 0 < x < oo and then we condition on the event Ax that there exists an edge (r, v) incident to the root of T that has length x. The inclusion criterion (5.62) then tells us r is matched to v if and only if we have
x < X(r, v) + X(v, r).
By the addendum to Lemma 9 we know that conditional on the event A.
the pair (X (r, v), X (v, r)) same distribution as a pair (X, Y) of independent
logistics, so by the uniform intensity of the Poisson process on [0, oo) we have
E[~(r, r*)] = xP(x < X + Y) dx = 1 E[(X + y)21(X + y > 0)].
10 ~ 2
Finally, the symmetry of the logistic distribution lets us rewrite the last expectation as
E[(X + y)2] E [X2] = 7F2 /6, 2
where in the last step we used the fact noted earlier that the logistic distribution has variance 7r 2 /3.
5.8 Passage from a PWIT Matching to a K, Matching
We are now in position to sketch the connections that complete the proof of Theorem 5, and we begin by noting that the observations in hand already give us one half of the theorem. For each n we let M, denote the optimal matching on K,, and we view M,, as a 9,valued random variable by setting the root of M,, to be the root of K,,. By the tightness of the joint process (M,,, K.) we know that for every subsequence of n = 1, 2, ... there is a further subsequence nj such that (Mj, K.j) converges weakly in G. x G.. By Skorohod's theorem and our earlier analysis of the weak limit of K,,, we may even assume that
Mj, K.j) (MJ) with probability one, (5.64)
where T is a PWIT and M is a matching on T. Moreover, from the fact that Mn is randomly rooted, we see that M is involution invariant.
Now, if c(MJ denotes the cost of matching the root in M. and c(M) denotes the cost of the edges that meets the root of T, then from the limit (5.64) and the definition of the topology on G, we have
c(M,J > c(M) with probability one.
From this limit, Fatou's Lemma, and Lemma 12 we then find
lim inf E[CMj)l ~~ E[c(M)l > 2 /6.
noo
Finally, by the generality of the subsequence {nj}, we therefore find in the notation of Theorem 5 that
The Objective Method 53
lim inf 2 E[C,,l ~~ 7r 2 /6. (5.65)
noo n
This proves half of the equality (5.41) of Theorein 5; the bad news is that this is the easy half.
THE HARDER HALF AND ITS Two PARTS
The harder half of Theorem 5 breaks into two natural steps. First one needs to show that there is a lowcost Efeasible matching for K,, for large values of n, and then one needs to show that such an Efeasible matching can be modified to provide an honest matching without substantially increasing the total cost.
To make this plan precise, we first recall that an feasible matching of a graph C with n vertices is a subset M* of edges of G with the property that each vertex of G is in some edge of M* and at most En elements of M* fail to be isolated from the other elements of M*. The fact that one can cheaply convert a lowcost Efeasible matching of K. to an honest lowcost complete matching of K,, is (almost) purely combinatorial. We do not deal here with that part of the argument, but one can refer to Proposition 2 of Aldous [31 for a proof of an analogous result in the context of bipartite matching.
HOW TO USE A MATCHING ON A PWIT To FIND ONE ON K~
We continue to work with the special PWIT ] that was constructed as part of Triple Tree Process of Lemma 9, and we let M denote the matching defined on T that was constructed in Lemma 10. Next, we fix a Q > 0, and we let N. (r, T) denote the neighborhood of radius p about the root r of 1. We view N, (r, 7_) as an element of 9, with root r, and we let N, (r, K.) denote the analogously defined Qneighborhood of the root of K,. Here one should note that N.(r, T) is always a tree since it is a subtree of T, but the corresponding neighborhood of K, given by Ne(r, KJ will not necessarily be a tree, Nevertheless, since p is fixed, the probability that N. (r, KJ is a tree will go to one as n becomes large.
Now, for any G E G, that is a feasible realization of N,(r, T) and for any w c 7, we let q(wIG) denote the conditional probability that r is matched to w in M given the event that the neighborhood Nc(r, T) is isomorphic to G as an element of G.. We will use q(wIG) to guide a sampling process on Kn that we can use to help build a matching.
Specifically, for any 1 < n < oo we define a subset M (g, n) of the directed edges of K,, by the following process: for each u E Kn we choose (u, v) with probability q(vl N,(u, K,)). Also, in just the same way, we define a subset M(&, oo) of the directed edges of ]_ that is, for each u EE T we choose the directed edge (u, v) with probability q (v IN, (u, T)). The key observation is that the event A(p, oo) defined by
A (g, oo) = { r is in exactly two edges of M (Q, oc)
1
54 David Aldous and J. Michael Steele
satisfies the identity
P (A (g, oo)) = 1 77 (Q) where 17 (p) 0 as p4 oo. (5.66)
Thus, if we collapse each directed loop {(u, v), (v, u)} of M (p, oo) to a single undirected edge, then for large 2 we have a high probability of obtaining an honest matching on T.
Now, we consider the analogous events for Kn. Specifically, we let A(p, n) denote the event that (1) the root r of Kn is in exactly two directed edges of M (p, n) and (2) these edges have the form (r, v) and (v, r) for some v. The local weak convergence of Kn to T and the definition of 77(p) then tells us that
lim P(A(p, n)) = 1 71(g). (5.67)
noo
Next, we let M' (p, n) denote the set of undirected edges one obtains by collapsing each directed loop Ev, w), (, v)} of M (p, n) into a single undirected edge. One can then show that the limit (5.67) implies that there is an E(Q) such that s(p) ~ 0 as p , oo for which we have
lim inf P(M'(p, n) is e(p)feasible) > 1 E(p). (5.68)
noo
Finally, to complete our sketch of the proof of Theorem 5, we just note that one can now use local weak convergence to relate the costs of the edges of M' (p, n) to the costs of those in the matching M on T, and, in due course, one can use this correspondence to show that the asymptotic mean cost of M' (Q, n) is at most n/2 times 1r2 /6. This is precisely what one was brought to expect from the calculation in Lemma 12, and thus it completes the sketch.
When one looks back on this construction, one can confirm that the basic idea of sampling from a locally defined conditional distribution is pleasantly general. In fact, there are many problems where a similar sampling method can be applied, and this method of conditional sampling seems to express one of the more fundamental ways that one can exploit local weak convergence. Further elaboration of the sampling construction is expected to be part of Aldous and Steele [71.
5.9 Finally Living Beyond One's Means
One of the original motivations for the study of the assignment problem was the discovery of Karp [371 that under the model of independent uniform costs % the value of A,, provides an impressively tight lower bound on the minimal cost of a tour of the vertex set of the complete directed graph where the cost of using the directed edge (i,j) is cij. This discovery led in turn to sustained exploration of algorithms that use the assignment problem and branch and bound techniques for the travelling salesman problem (cf. Balas and Toth[131), and it also motivated more detailed investigations of the precision of An as an approximation for the TSP cost (cf. Karp and Steele [391).
The Objective Method 55
While the objective method is particularly well focused on issues such as the convergence of the means E(AJ and the determination of this limit, it does not provide ally information on the variance of A,, or other measures of concentration of A,. For such information, other methods must be applied.
For example, Karp and Steele [391 used combinatorial and coupling methods to show that with high probability the greatest cost of an edge in an optimal assignment of size n is 0(log 2 n/n), and Talagrand [621 used this observation to illustrate one of his isoperimetric inequalities to show that there is a constant 0 < c < oo such that
Var(An) :~ c (log n)4 for all n > 1.
n
More recently, F~ieze and Sorkin [23] improved the bounds of Karp and Steele [39] to show that with high probability the longest edge in 0(log n/n), and the same technique was then further refined by Lee and Su [45] to show that there are positive constants cl and C2 such that
cin5/2 < Var(An) :5 G2 (log n) 2. (5.69)
n
Despite the wide gap between the bounds (5.69) they are the best that are currently known. Nevertheless, one certainly expects that in fact there is
2 > 0 such that a 2
Var(A.) as n ~ oo. (5.70)
n
Unfortunately, the objective method is powerless to help, and for the moment at least, the conjecture (5.70) seems to be far over the horizon.
56 David Aldous and J. Michael Steele
6 Problems in Euclidean Space
The objective method is especially well suited for many problems of Euclidean
combinatorial optimization. Typically, one begins with an optimization prob
lem that is driven by a sample {Xi : 1 < i < n} of independent random
variables with the uniform distribution on the unit dcube [0, Ild, so after
choosing a random "rooC I that is uniformly distributed on {1, 2, ..., n} and
independent of the {Xi}, one can take immediate advantage of the classical
fact that
n Ild(X, _ X, : 1 < i < n) d + Po as n ~ oo, (6.71)
where Po = P U {0} and TI is a homogenous Poisson process on R d with uniform intensity equal to 1.
Now, throughout this survey, we have interpreted the indicated convergence as local weak convergence as defined in Section 2, so it may seem unfair to call this result classical. Nevertheless, once one agrees to interpret the two sides of the limit (6.71) as rooted geometric graphs where the root is the vertex at the origin of Rd and where the edge lengths are given by Euclidean distances, then (up to an irrelevant rotation) local weak convergence (6.71) is in fact equivalent to the traditional weak convergence of point process theory.
The bottom line is that for problems of combinatorial optimization for points in R d' the Poisson limit (6.71) assumes the role that Grimmett's Lemma 1 played for the uniform trees and that Aldous's Theorem 3 played for the exponentially weighted complete graphs. Thus, for Euelidean problerns the first step of the objective method comes gratis; the game only begins with the introduction of an optimization problem.
6.1 A Motivating Problem
One of the early motivations for the objective method was a problem sparked by an empirical observation of R. Bland, who noted from simulations of an algorithm for the MST that for n independent uniformly distributed points in the unit square that one consistently finds that
1: 1e12,c>0 asnoc,
eEMST
where lel denotes the usual Euclidean distance from u to v when e = (u, v).
Stimulated by Bland's observation, Steele [58] proved that if the random vari
ables {Xi} are independent and have a distribution p with compact support,
then for all 0 < ce < d one actually has
n (ad)/d ~ ler ,(a, d) f(X)(dc,)ld dx a.s., (6.72)
e(=MST /Rd
where f is the density of the absolutely continuous part of IA and c(a,d) is a constant that depends only on a and d. This result goes beyond Bland's
The Objective Method 57
conjecture in several respects, but, ironically, the subadditive method used in Steele [581 falls short of covering the motivating case a = d. This is where the objective method entered the picture.
BENEFITS OF HINDSIGHT AND A PROPER POINT OF VIEW
Fifteen years ago it was not clear how one could make further progress on Bland's conjecture, but with the benefit of hindsight a good plan is obvious. One simply needs to extend the Poisson convergence (6.71) to the joint convergence the point sets and the corresponding spanning trees.
When this was done in Aldous and Steele [61, there was some mystery to emergence of a forest as the limit of a sequence of minimal spanning trees. Now the reasons for the appearance of the minimal spanning forest are understood more deeply, and, if one simply applies the MST Convergence Theorem of Section 5.3 to the Poisson limit (6.71), one obtains a theorem that explains most of the qualitative behavior of the MST of a random sample from the dcube.
Theorem 7 For the randomly rooted normalized sample
Sn = nIld(X, _ X, : 1 < i < n)