Please see PDF version
The Objective Method: Probabilistic Combinatorial Optimization and Local Weak Convergence
David Aldousi and J. Michael Steele 2
Department of Statistics, University of California Berkeley, Berkeley, CA,
947203860 aldous@stat.berkeley.edu 2 Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 191046302 steele@wharton.upenn.edu
1 Introduction
This survey describes a general approach to a class of problems that arise in combinatorial probability and combinatorial optimization. Formally, the method is part of weak convergence theory, but in concrete problems the method has a flavor of its own. A characteristic element of the method is that it often calls for one to introduce a new, infinite, probabilistic object whose local properties inform us about the limiting properties of a sequence of finite problems.
The name objective method hopes to underscore the value of shifting ones attention to the new, large random object with fixed distributional properties and way from the sequence of objects with changing distributions. The new object always provides us with some new information on the asymptotic behavior of the original sequence, and, in the happiest cases, the constants associated with the infinite object even permit us to find the elusive limit constants for that sequence.
1.1 A Motivating Example: the Assignment Problem
The assignment problem for the n x n cost matrix (%) is the task of determining the permutation 7r : [n] ~> [n] that minimizes the total cost C1,,,(1) + C2,r(2) + ' .. + c~,,(n) of assigning one column to each row. This problem arises in many applied contexts, and it has spawned an extensive algorithmic literature. Here we are more specifically concerned with the value of the objective function
n
A~ = min
1r ci, W (i),
where the costs cij, 1 < i, j < n, are assumed to be independent and identically distributed random variables.
i
2 David Aldous and J. Michael Steele
Investigation of A,, seems to have begun in 1962, when Kurtzberg [421 used heuristic methods like greedy matching to obtain upper bounds on E[Al for uniformly distributed costs. Kurtzberg's bounds were of order log n, and these were not improved until 1979 when Walkup [641 showed that E[A,l is bounded independently of n, which was quite a surprising discovery at the time. Eight years later, Karp [381 introduced a new approach to the estimation of E~A,,] that was based on linear programming. Karp exploited the explicit bases that were known to be optimal for the assignment problem, and he obtained the elegant bound E[An] :5 2. Inspired in part by Karp's result, Dyer, Rieze and McDiarmid [191 developed a general bound for the objective function of linear programming problems with random costs, and they were able to recapture Karp's bound without recourse to special bases. A probabilist's interpretation of the DyerFYiezeMcDiarmid inequality forms the basis of Chapter 4 of Steele [601 where one can find further information on the early history of the assignment problem with random costs.
A new period in the development of the random assignment problem began in 1987 with the fascinating article of M6zard and Parisi [501 which offered a nonrigorous statistical mechanical argument for the assertion that
2
lim E[An] = , = ((2). (1.1)
n~ 6,
The desire for a rigorous proof of this limit has influenced much of the subsequent work on the assignment problem, and the critical first step is simply to show that actually E[An] converges as n > oo. Convergence would follow immediately if one could show that E[AA is nondecreasing, but it is still not known if E[An] is monotone. Nevertheless, in 1992 Aldous [31 used the objective method to show that E[Anj does indeed converge.
In 1998 Parisi [51] added further interest to the random assignment problem when he advanced the remarkable conjecture that for independent exponentially distributed random cost cij with mean 1, one has an exact formula: n
2
E[An] k (1.2)
k=l
A proof of Parisi's conjecture would immediately confirm the ((2) limit formula. moreover, there are good reasons to believe the truth of this stronger assertion. Specifically, Alm and Sorkin [91 proved the conjecture for all values of n up to n = 5. Unfortunately, even for such small values, the proofs are not easy, and Alm and Sorkin had to find their way through a swamp of special cases with the help of automated theorem proving computations. The conjectured formula for E[A,,l remains a tantalizing challenge.
Nevertheless, progress on the limit value for E[Anj has been more definitive. In 2001 by means of the objective method Aldous [4] finally proved the ((2) limit formula that M6zard and Parisi [50] first brought to light in 1987.
i
i
The Objective Method 3
1.2 A Stalking Horse: the Partial Matching Problem
One of the aims of this survey is to show how the objective method helps to solve problems such as the determination of the limiting value of E[A.], but the assignment problem itself is too burdened with individual nuance for it to serve as our basic guide. For this reason, we introduce a new problem, the maximal partial matching problem.
This problem does not have the long history of the random assignment problem, but it is especially well suited for illustrating the objective method. In particular, it builds on the theory of random trees, and the limit theory of such trees provides the prototype for the objective method. Also, the maximal partial matching problem leads inexorably to the notion of a distributional identity. Such identities have always had a role in probability theory, but, along with allied notions like selfsimilarity and subadditivity, distributional identities are now a topic of emerging importance.
The relevant identities for the maximal partial matching problem are also much more tractable than the corresponding identities for the random assignment problem, yet many of the same tools come into play. In particular, we will find that one often does well to begin by guessing the solution of a distributional identity. After one has a good guess, then classic tools like Kolmogorov's consistency theorem and contemporary tricks like "coupling from the pasC can be used to confirm the existence, uniqueness, and stability of the solutions of the distributional identity.
1.3 Organization of the Survey
The maximal partial matching problem provides our introductory case study, but before it can be addressed we need to deal with some foundational issues. In particular, Section 2 develops the notion of local weak convergence. Formally, this is nothing more than the weak convergence of probability measures on a certain metric space of "rooted geometric graphs," but the attending intuition differs substantially from the classic weak convergence theory. Section 2 also introduces the "standard construction," which is a general recipe for building rooted geometric graphs. This construction turns out to have a subtle, yet pervasive, influence on the applications of the objective method.
After we have dealt with the essential metric space formalities, we see what local weak convergence tells us about the simplest model for random trees. In particular, we review Grimmett's lemma, and we develop an understanding of the convergence of the large and the small parts of a random tree that has been cut at a randomly chosen edge. This analysis provides us with the essential distribution theory for random trees that is needed later.
The theory of the maximum partial matching problem developed in Section 3 is concrete and selfcontained. Nevertheless, it faces most of the issues that one meets in more complex applications of the objective method, and it offers the best introduction we know to the essential ideas.
i i
4 David Aldous and J. Michael Steele
In Section 4 we introduce the meanfield model of distance which is a physically motivated probability model designed to gain insight into problems for point processes in R d. In this model the distribution of some of interpoint distances of R' are captured precisely while other (hopefully less essential) interpoint distance distributions are distorted in comparison to Rd. The meanfield model leads us to the PWIT, or Poisson weighted infinite tree, which is arguably the most important infinite tree model. To illustrate the close connection of the theory of the PWIT and the objective method, we give a reasonably detailed proof of the ((3) theorem of Rieze.
The relationship of the PWIT to problems of combinatorial optimization is continued in Section 5 by developing the limit theory for the minimum cost C,, of a perfect matching of the complete graph K,, with independent edge weights ~, having a common distribution F. We provide a reasonably detailed (and hopefully friendly) sketch of the fact that E[C.] converges to ((2)/2 as n ~ oo when F is the exponential distribution with mean one; this result is the direct analog of the ((2) limit theorem for E[A,]. Understandably, some nontrivial details must be omitted from the sketch, but the section should still provide a useful introduction to Aldous's proof of the ~(2) limit theorem.
All of the problems in Sections 3 5 call on distance models with substantial intrinsic independence., for example, in Section 3 we have independent weights on every edge, and we face the infinite PoissonGaltonWatson tree in the limit, while in Sections 4 and 5, we have the tempered independence that one inherits from the meanfield model, and we face the PWIT in the limit. The problems of Section 6 are different; they deal directly with interpoint distances in R d rather than with edge weights that are blessed ex cathedra with independence properties, and in the limit we often need to deal with the Poisson process.
The probability theory of Euclidean combinatorial optimization has grown quite large, and in Section 6 we provide very few proofs. Nevertheless, we hope to provide a useful update of the survey of Steele [60]. In particular, we address progress on the minimal spanning tree problem, including a recent weak law of large numbers developed in Penrose and Yukich [531 which has close ties to the objective method. The section also contrasts the objective method and the subadditive method that has served for years as a principal workhorse in the probability theory of Euclidean combinatorial optimization. In the closing subsection, we break into entirely new territory and describe a remarkable new result of Benjamini and Schramm on the recurrence properties of the local weak limits of planar graphs.
In Section 7, the last section, we first summarize some of the circumstances that seem to be needed for the objective method to be successful. We then develop the background of an attractive conjecture on the independence number of a random regular graph. This problem is used in turn to illustrate several of the basic challenges that appear to be critical to the deeper development of the objective method.
i i
i
i
The Objective Method 5
Finally, we should note that even though our main intention is to provide a survey and a tutorial, this exposition also contains new results. In particular, the material in Section 3 on the maximum partial matching problem is new, including the basic limit theorem (Theorem 1) and the theorem that determines the limit constants (Theorem 2). Another new result is the Convergence Theorem for Minimal Spanning 'I~ees (Theorem 6). This theorem both generalizes and simplifies much earlier work; it is also applied several times during the course of the survey.
2 Geometric Graphs and Local Weak Convergence
Before the theory of local weak convergence can be brought facetoface with the concrete problems of combinatorial optimization, one is forced to introduce an appropriate complete metric space. This introduction has been made several times before on a purely ad hoc basis, but now there is enough cumulative experience to suggest a general framework that should suffice for most applications. After describing this framework, we will give it a quick test run by discussing a prototypical result from the limit theory of random trees Crimmett's lemma on the convergence of rooted Cayley trees.
2.1 Geometric Graphs
If G = (V, E) is a graph with a finite or countable vertex set V and a corre
sponding edge set E, then any function ~: E > (0, col can be used to define
a distance between vertices of G. Specifically, for any pair of.vertices u and
v, one just takes the distance between u and v to be the infirnum over all
paths between u and v of the sum of the lengths of the edges in the path.
Definition 1 (Geometric Graphs and the Two Classes G and 9,)
If G = (V, E) is a connected, undirected graph with a countable or infinite vertex set V and if ~ is an edge length function that makes G locally finite in the sense that for each vertex v and each real p < oo the number of vertices within distance g from v isfinite, then G is called a geometric graph. When there is also a distinguished vertex v, we say that G is a rooted geometric graph with root v. The set of geometric graphs will be denoted by 9, and the set of rooted geometric graphs will be denoted by G_
2.2 9, as a Metric Space
The set G, of rooted geometric graphs provides our basic workspace, but before honest work can begin we need to say what we mean for a sequence Gn of elements of G, to converge to a G in g.. The idea one wants to capture is that for large n, the rooted geometric graph G, looks very much like G in an arbitrarily large neighborhood of the root of G.
6 David Aldous and J. Michael Steele
To formalize this idea, we first recall that an isomorphism between graphs G = (V, E) and G' = (V', E) is a bijection 0 : V + V' such that (0(u), 0(v)) E E' if and only if (u, v) E E. Also, given any such isomorphism, one can extend the domain of 0 to E simply by defining 0(e) to be (0(u), 0 ()) for each e = (u, v) E E.
Finally, we say that two geometric graphs G = (V, E) and G' = (V', E') are isomorphic provided that (1) they are isomorphic as ordinary graphs and (2) there is a graph isomorphism 0 between G and G' that also preserves edge lengths (so f'(0(e)) = f(e) for all e E E). In the case of two rooted geometric graphs G = (V, E) and G' = (V, E'), we will say they are isomorphic provided that there is a graph isomorphism 0 that preserves edges lengths and that also maps the root of G to the root of G'.
Next we consider a special rooted geometric graph that one may view intuitively as the "neighborhood of radius p about the root" of the rooted geometric graph G. Specifically, for any p > 0 we let N,(G) denote the graph whose vertex set V,(G) is the set of vertices of G that are at a distance of at most p from the root of G and whose edge set consists of just those edges of G that have both vertices in V,(G), where, as before, the distance between any two vertices u and v in G is taken to be the infinuan over all paths between u and v of the sum of the lengths of the edges in the path. We again view N,(G) as an element of 9, whose the edge length function and root are just those of G. Finally, we say that Q > 0 is a continuity point of G if no vertex of G is exactly at a distance p from the root of G.
Definition 2 (Convergence in 9,) We say that G, converges to G,,,, in G, provided that for each continuity point Q of G,,~ there is an no = no (p, GJ such that for all n > no there exists a isomorphism y~,c from the rooted geometric graph N.(GJ to the rooted geometric graph N,(GJ such that for each edge e of NfflJ the length of yn,,(e) converges to the length of e as n ~ oc.
With a little work, one can show that this definition determines a topology that makes G, into a complete separable metric space. As a consequence, all of the usual tools of weak convergence theory apply to sequences of probability measures on g., and we may safely use the conventional notation and write
1A~d > 1A to mean that f dt,,, f dp
for each bounded continuous function f R.
2.3 Local Weak Convergence
The topology on the metric space G, turns out to give weak convergence of probability measures on 9. a local character that sharply differs from the traditional weak convergence such as one finds in the weak convergence of
i
The Objective Method 7
scaled random walk to Brownian motion. Weak convergence of measures on g, never involves any rescaling, and the special role of the neighborhoods N,(G) means that convergence in g, only informs us about behavior in the neighborhood of the root.
In practical terms, this means that weak convergence in 9. can tell us about local features such as the degree of the root, the length of the longest edge incident to the root, and so on; yet it cannot convey detailed information on a global quantity such as the length of the longest path. To underscore this difference, one sometimes speaks of weak convergence in g, as local weak convergence.
2.4 The Standard Construction
Most of the random processes considered here are associated with a standard construction that has some lasting consequences, even though it may seem rather bland at first. The construction begins with a probability measure that is concentrated on the subset of 9 consisting of geometric graphs with exactly n vertices. We then consider a random element G,, of 9 that is chosen according to this measure, and we choose a vertex X, at random according to the uniform distribution on the n vertices of G.. We then make G. into a random rooted geometric graph Gn[Xn] by distinguishing Xn as the root vertex. The distribution of Gn[Xn] is then a probability measure on the set of nvertex elements in the set of rooted geometric graphs. Finally, if a sequence {Gn [X,] : n = 1, 2, ...} of such 9,valued random variables converges weakly in 9,, to a G,valued random variable Q,., then we say that the distribution of G,,, is obtained by the standard construction.
One might think that virtually any measure on 9. might be obtained by the standard construction, but the measures given by the standard construction are not unconstrained. Later we will find that they must satisfy a modest symmetry property that we call involution invariance. This property may sometimes be used to rule out what would otherwise be a tempting candidate for a limiting object.
2.5 A Prototype: The Limit of Uniform Random Trees
A classical formula of Cayley tells us that if V is a set of n distinguishable elements, then the set S. of rooted trees with vertex set V has cardinality nnl. If Tn denotes a tree that is chosen from Sn according the uniform distribution, then one can show that T. converges in distribution to a random variable T that takes values in G,. Remarkably, one can characterize the distribution of T,,, by a direct construction that relies on the classical theory of branching processes.
THE INFINITE "SKELETON TREE" To~ OR, PGW~(1)
i
i i i i i
8 David Aldous and J. Michael Steele
To begin, we consider a GaltonWatson branching process with one progenitor and an offspring distribution that has the Poisson distribution with mean one. With probability one, any such branching process has a finite number of vertices, and the finite trees generated in this way are said to have the Poisson Galton Watson distribution with mean one. This distribution on the finite elements of 9, is denoted by PGW(I).
Now consider an infinite sequence of independent PGW(1) distributed trees To ~ T1 ~ T21 ... , and let vo, v,, v2.... denote their roots. Finally, to make this collection of trees into one infinite rooted tree, we add all of the edges {(Vi,Vi+i), 0 < i < c>o}, and we declare vo the root of the new infinite tree. The tree T. that one builds this way is said to have the PGW'(1) distribution, and it is illustrated in Figure 1.
0
1
UUU
1 1 /
0 0 0 00
0 0 00 0 00 0 0
Fig. 1. The "Skeleton Tree" T. also known as PGW(1). Here, one should note that only the root vo of T_ is labelled; the chimerical notations V1, V2, ... are only given to focus the eye on the unique path from the root to infinity. The existence of such a path is one of the most fundamental features of PGW (1).
One might imagine that T,,,, could be an undistinguished member of the multitude of infinite trees that anyone might paste together from finite trees, but this intuition is faulty. As the next lemma explains, the skeleton tree T. has an inevitable and highly distinguished role in the theory of random trees. We should note that in the next lemma, and throughout the survey, a graph without its own edge length function is given one by taking the length of each edge to be 1.
Lemma 1 (Crimmett's Lemma) The randomly rooted uniformly distributed labelled tree T. on n vertices converges weakly in 9, to the "skeleton tree" T,,,,, that is, one has
i
i
i
The Objective Method 9
T, T.as n ~ co.
This result was first formalized and proved in Grimmett[261, and further proofs and generalizations are given by Devroye [18] and Aldous [21. In some respects, Lemma 1 is the most fundamental fact about random trees under the model of uniform distribution. Grimmett's Lemma also has an interesting selfgeneralizing quality, and shortly we will see that the convergence in distribution of Tn actually implies that certain special parts of Tn MUSt converge to corresponding special parts of T,,..
This convergence of parts seems to be a recurring theme in the theory of local weak convergence. In particular, it provides one key to the appearance of the distributional identities that have had a crucial role in some of the most interesting applications of the objective method.
SPECIAL PARTS OF THE SKELETON TREE PM'(1)
Our first observation about the skeleton tree of Figure 1 is that it has a pleasantly recursive structure. From the root vo there is just one edge (vo, vJ that is part of the unique infinite path vo, ~ v, ~ ... of T., and if we delete that edge we find two subtrees one finite and one infinite. If T"'11 denotes the finite tree rooted at vo and T N9 denotes the infinite tree rooted at v,, then we see from the definition of T' that we can identify their distributions
,,, d ~ d pGW_(j).
TZ PGW(1) and Tb' =
00
This decomposition and the definition of weak convergence of probability measures on 9,, promptly suggest that one may give a more detailed interpretation of Grimmett's Lemma.
CONVERGENCE OF THE CORRESPONDING PARTS
Consider again the randomly rooted uniform tree T. on n vertices, and let r denote the root of Tn. Any child c of r determines an edge e = (r, c), and, when c is deleted from T, we let Tn, denote the remaining component of T. that contains c. We again view T,,, as a rooted geometric weighted graph by taking its root to be c.
Now, we let c* denote a child of r that is chosen at random from the set of all children of the root for which the cardinality of T., is maximal; in fact, for large n, one finds that with high probability there is a unique maximal subtree. If we remove the edge (r, c*) from Tn we get two subtrees that we now label T7" and T` according to their cardinality (and with any tie
n
being broken by a coin flip). Finally, we view Tn` and T".' as rooted trees
n
by letting the end point of the deleted edge determine the new roots.
Next, by our observations on the structure of the skeleton tree, we can give a more detailed view of the convergence that is guaranteed by Grimmett's Lemma. Specifically, Grimmett's Lemma automatically implies both the convergence of the small component
i
i
1
i
10 David Aldous and J. Michael Steele
T;~" '~ TZ` l PGW(I) and convergence of the large component
Tb'q d _~ T big pGW(1).
n ~~ W
Moreover, Grimmett's Lemma even implies the joint convergence
(T7', T big _~ d _~ (TZ11, Tb'g),
n 00
where the two processes TZ` PGW(1) and T big 1 PGW'(1) are
00
independent. This independence is often useful, and in the Section 3 its contribution is essential.
BEYOND THE SKELETON TREES
Many models for random trees have been introduced in combinatorics and the theory of algorithms (1471, [631), and for the majority of these models one finds that there is a natural weak limit. Moreover, the analysis of Aldous [2] shows that for a large class of these models the limit object is a random rooted tree which shares many of the qualitative features of the Skeleton Tree given in Grimmett's Lemma.
Specifically, one finds that each of these limit objects has a unique infinite path from the root, and, as we saw in our discussion of Grimmett's Lemma, when one removes this skeleton one finds a sequence of finite trees that have a useful description. In general, these finite trees are no longer independent and identically distributed; instead, this sequence trees is more closely described as a hidden Markov process. In fortunate circumstances, the Markov structure that one finds can serve as a useful substitute for the independence that we use here.
3 Maximal Weight Partial Matching on Random Trees
Like most methods, the objective method is best understood through the examination of a concrete example, and, for all the reasons mentioned in the introduction, we believe that the maximal weight partial matching problem is the right place to begin. It is tractable enough to be developed completely, yet rich enough to demonstrate the structural features of the objective method that are important in more complex problems such as the random assignment problem.
3.1 Weighted Matchings of Graphs in General
For any graph G = (V, E) a paTtial matching S is simply a subset of the set of edges of G such that no pair of edges of S share a common vertex. Any
1
i
The Objective Method 11
function w : E(C) _~ R may be viewed as a weight function on the edge set of G, and the weight of the partial matching S is defined simply as
def
W (S) = E w (e).
eES
If S* is a partial matching of G such that w(S*) is equal to the supremum of w(S) over all partial matchings of G, then S* is called a maximal weight partial matching of G. Such matchings are important in many problems of combinatorial optimization, and methods for computing maximum weight partial matchings have an important role in the theory of algorithms (cf. LovAsz and Plummer [461).
3.2 Our Case: Random Trees with Random Edge Weights
To each fixed finite rooted tree T and to each edge e of T, we now associate a random variable ~,. Moreover, we assume that the ensemble {~, : e E T} is independent and that ~, has distribution F for each e E T. We view e ~~ ~, as a weight function on T, and a T with this weight function will be called an Fweighted rooted tree. Finally, for each Fweighted rooted tree T we let M(T) denote the maximum weight over all partial matchings of T.
In particular, if T. denotes an Fweighted rooted tree that is chosen at random according to the uniform distribution on the set of nn1 rooted trees with nvertices, then we write M. as shorthand for M(Tn). The random variable M,, is therefore the maximum weight of a partial matching of a random nvertex Fweighted rooted tree, and it will be at the center of our attention for the rest of this section.
3.3 Two Obvious Guesses: One Right, One Wrong
When the edge weight distribution F has a finite mean, one immediately suspects that there is an asymptotic relationship of the form
E[M,,l yn as n > oo
where y = y(F) is a constant that depends only on F. What one further suspects is that even for the nicest choices of F the calculation of y might be an almost impossible task. By comparison, one can list perhaps a dozen problems where subadditive methods yield a similar asymptotic relationship, yet the limit constants have gone unknown for decades.
In the partial matching problem we face a happier circumstance. For continuous F the objective method not only yields a proof of the asymptotic relation, it also provides a concrete characterization of the limit constant. This characterization is not necessarily simple, but at least in the leading case of exponentially distributed edge weights it does lead to an explicit integral representation for y that can be calculated numerically.
i
1
12 David Aldous and J. Michael Steele
OUR AssUMPTIONS ON THE DISTRIBUTION F
The intuition that leads one to guess that E(MJ1n converges does not impose any constraint on the edgeweight distribution F, except the trivial constraint that the expectation of ~, F should be well defined. Nevertheless, here we will always assume that F is continuous and that P(~, > 0) = 1. these assumption guarantee that the maximal weight partial matching exists and is unique with probability one.
We do not believe that the continuity of F is needed for the convergence of E(MJ/n, but, without this assumption, the possibility of multiple optima would forced us to face many irritating complications. Since our main intention here is to demonstrate the fundamental features of the objective method in the simplest realistic light, the issue of discontinuous F is best left for another time.
3.4 Not Your Grandfather's Recursion
The first thought of anyone interested in the asymptoties of E[Mn] is to look for a relation between E[Mn] and the earlier expectations E[Mi], 1 < i < n. Here we will also hunt for a recurrence relation, but what we find differs radically from the recursions one commonly meets in discrete mathematics.
If we remove an edge e from the edge set of Tn, then T,, is broken into two connected components. We can then view these components as rooted trees where we take the old vertices of e to be our new roots. Next, we label these trees Tn"(e) and T"(e) according to their cardinality (with any tie n
being broken by taking the labels at random). This process is similar to the discussion of the refinement of Grimmett's Lemma, except that here the cut edge e can be any edge of Tn.
Now, we consider whether or not the edge e is in the maximal partial matching of Tn. If one does not use the edge e, then the maximum weight of a partial matching of Tn is also equal to
M (TZn` (e)) + M (T`q (e)). (3.3)
n
To go deeper we need some additional notation. If T is any weighted rooted tree, we define B(T) by letting M(T) B(T) denote the maximum weight of a partial matching of T where the root is not in an edge of the partial matching. By optimality of M(T) we see that B(T) > 0, and we think of B (T) as the bonus one gets from the option of being allowed to use edges that meet the root. With this notation, we see that the maximal weight of a partial matching that is required to use e is given by the sum
~,+{M(TZ""(e))B(TZ"")(e)}+{M(T"'(e))B(T"')(e)}. (3.4)
n n
When we compare the weights (3.3) and (3.4), we see that with probability one the edge e is in the maximum weight partial matching if and only if
i
i
1
The Objective Method 13
> B (T;~ "' (e) ) + B (T` g(e) ).
n (3.5)
This inclusion criterion naturally gives us a nice way to write M, as a sum over the edges of Tn. If we use 1 (A) to denote the indicator function for an event A, then the inclusion criterion (3.5), tells us that
Mn = 1: ~e l > B (T~"' (e) ) + B (Tn"(e) (3.6)
cETn
Now, if e denotes an edge chosen uniformly from the edge set of T,, we see from the sum (3.6) that the expectation E[Mn] can be written as
EIM~] = (n 1)E[~.1(~. > B(Tn"'(.))+B(T"(e)))].
n
Finally, since the distribution of ~. does not depend on e and since ~. is
independent of B (T7` (e)) and B (T"(e)), the last equation may be written
n
a bit more crisply as
E[Mn] = (n I)E[ ~1 (~ > B(Tn`(e)) + B(T`g(e)) (3.7)
n
where we understand that F and that ~ is independent of the pair
(B(T',"(e)), B(T big (e)
n
3.5 A Direct and Intuitive Plan
The representation for E[M,,] given by equation (3.7) is hardly a conventional recursion, yet it is still encouraging. The first factor n 1 hints loudly at the gross behavior one expects from E[Mn]. Moreover, from our discussion of the refinement of Grimmett's Lemma, one can guess that two tree processes Tn`(e) and T big (e) will converge in distribution, and this strongly suggests n that the second factor of formula (3.7) will converge to a constant as n+ 00.
Certainly, one should worry about the fact that B() is not a bounded continuous function on 9,; after all, B() is not even well defined on all of G.. Nevertheless, B() is well defined on the finite graphs of 9, and any optimist surely suspects it to nice enough for the random variables B(TZ,`(e)) and B(T`(e)) to inherit convergence in distribution from the weak convergence n of TZ"` (e) and T` (e). n
To refine these suspicions, we first note that TZ"(e) and T"g(e) differ n from the trees Tn` and T big in our discussion of Grimmett's Lemma only n in that the roots of TZ" (e) and T`1 (e) are determined by the random edge n e while the roots of T7"' and T` were determined by a random uniform n choice from the respective vertex sets.
As Figure 2 suggests, the root r of T7`(e) is not quite uniformly distributed on the vertex set of Tn when e is chosen at uniformly at random from the edge set of Tn. In fact, if n is odd, then there is always one vertex of
14 David Aldous and J. Michael Steele
T,, that has probability 0 of being the root of T;~"' (e), and, when n is even, then there are always two vertices that have probability 1/(2(n 1)) of being the root of Tn"(e). In both cases, all of the other vertices have probability 1/(n 1) of being the root of T7`(e), so, even though r is not uniformly distributed on the vertex set of Tn, it is almost uniformly distributed. In fact, the total variation distance between the distribution of r and the uniform distribution is always bounded by 1/2n, and for odd values of n this bound is exact.
Odd Valued n Even Valued n
1/7 1/7 1/6 1/6 1/8 1/8 1/7 1/7
0"" 0
1/7 1/6 1/8 1/8 1 /14 1/7
1/7 V1/7 0 1/6 1/8 1/8 1/14 1/7
1/7 1/7 1/6 1/6 1/8 1/8 1/7 1/7
Uniform Edge Biased Uniform Edge Biased
Fig. 2. In this figure, the value next to a vertex is the probability of that vertex being the root of T~"'. One should note that the uniform choice of an edge typically leads to a nonuniform distribution on the vertices.
Rom this observation and a traditional coupling argument, one finds that there is only a small change in the distribution of the bonus when the omitted edge is used to determine the root; specifically, for all n > 1 and all x E R we have the inequality
P(B(T7`(e)) 5 x) P(B(T7`) :5 x) 1 :5 (3.8)
2n
A parallel analysis of the root of T`(e) yields an identical bound for the n
distributions of the bonuses of the big tree components, and together these
bounds tell us that if the distribution F of ~ is not too longtailed, then the
asymptotic behavior of the right hand side of our exact formula (3.7) for
E[Mn] will not be changed if we replace the edge biased trees T7`(e) and
T"'(e) by their randomly rooted cousins T` and Tn` whose asymptotic
n n
behavior is well understood.
We now see that along the path to proving that E[M,,]/n converges, we have the natural intermediate task of showing that the bonuses B(T7`) and B (T") converge in distribution. Further, if we hope to have a serious chance
n
The Objective Method 15
of calculating the value of the limit of E[M,,11n, we also need a concrete characterization of the limiting distributions of B(TJ`) and B(T").
n
3.6 Characterization of the Limit of B(T;~"")
From our discussion of Grimmett's Lemma, we know that T~"" converges in distribution to a tree T with the Poisson GaltonWatson distribution PGW(1). When we view T as an Fweighted tree, then the bonus B(T) is well defined since T is almost surely finite. Moreover, one can check directly from the definition of weak convergence in 9, that we have
B(T~n"' ) d _+ B (T), (3.9)
so the real question is whether there is an efFective way to characterize the distribution of B(T). This is where the notion of a distributional identity enters the picture.
The basic idea is that the recursive definition of the tree T should translate into a useful selfreferencing identity for the distribution of B (T); we just need to put this idea to work. If C denotes the set of children of the root of T, then for each i E C there is a subtree T' of T that is determined by the descendants of i, and we may view T' as a rooted tree with root i. If ~i denotes the weight of the edge from the root of T to the child i, then the maximum weight of a matching on T that does not meet the root is given by the sum
M (T');
iEC
so, to get an identity for B(T) we just need to calculate the maximum weight of a partial matching that does meet the root.
If j E C, then the maximum weight for a partial matching that uses the edge from the root of T to j is equal to
~j+{M(Ti)B(Tj)}+ 1: M(T')=~jB(Ti)+Y:M(T'),
jEC, j94j iEC
so the maximal weight for a partial matching that does include the root is equal to
max {~j B (Tj)} + 1: M (P).
jEC iEC
We now see that the difference B(T) between the overall maximum and the constrained maximum Eic M(P) is given by
B(T) = max { 0, ~i B(P) : i E C}. (3.10)
Finally, the cardinality of C is Poisson with mean one, T 1 T' for all i E C, and all of the random variables on the right hand side of the identity
i
i
i
1
16 David Aldous and J. Michael Steele
(3.10) are independent, so we have before us a very powerful constraint on the distribution of B(T). In fact, we can cheek that any random variable that satisfies a distributional recursion like that defined by equation (3.10) must be equal in distribution to B(T).
FORMALIZATION
To formalize this assertion, we let F and G denote any two distributions, and we define DF(G) to be the distribution of the random variable
max{O, ~i Yi : 1 < i < N} (3.11)
where random variables in the collection {N, Yi, ~i : i = 1, 2.} are independent, N has the Poisson(l) distribution, and we have ~i F and Yi G for all 1 < i < oo. The next proposition tells us that one may characterize the distribution of the bonus B(T) as a fixedpoint of the mapping DF().
Proposition 1 If T is an Fweighted PGW(1) tree where F is continuous and FQ0, oo)) = 1, then the distribution
G(x) = P(B(T) < x) is the unique solution of the fixedpoint equation
DF(G) = G. (3.12)
Since the discussion leading up to this proposition shows that the distribution of B(T) is a solution of the fixedpoint equation, we see that the characterization will be complete if we show that any solution of equation (3.12) must be the distribution of B(T) for some Fweighted PGW(1) tree T. This will be done in a way that is particularly probabilistic.
A UNIQUENESS PROOF BY A PROBABILISTIC CONSTRUCTION
Let G denote a solution of the fixedpoint equation DF(G) = G, so our task is to show that G is the distribution function of B(T) where T is an Fweighted PGW(1) tree. Our plan is simply to use G to construct such a T, and Figure 3 suggests how we start.
In the first step, we just consider the tree consisting of a root and a set C of N children. We then attach a random variable Yi to each child i E C where the Yi are independent and Yi G. We then define Y by the identity
Y = max{O, Yi : i E C},
where the ~i are independent and ~i F. We describe this process by saying that the root has been expanded.
Next, take any unexpanded vertex v (at the second step this would be any of the elements of C) and expand v in the same way that we expanded the root. Specifically, we give v a set C(v) of children with cardinality N(v) =
i
i
The Objective Method 17
IC(v)l that is independent Poisson(l), attach a random variable Yi to each child i E C(v) where the Yi are independent and Yi G, and then define Yv by the identity
Y~ = ax{O, ~v,i Yi : i E C}, (3.13)
where as before the ~,,i are independent and ~v,i F.
y~, 1 y~, 2 y~,N()
Y2 YN
Y1 Y2 y~ YN y~l_~y,
y y
N Poisson(l) N(v) Poisson(l)
Yi ild. G Y~,i ild. G
y = max(O,~i Y, : 1 < i < N) Yv = max(O, ~v,i Yi : 1 < i < N(v))
Fig. 3. Expansion of the root and expansion of a descendant vertex.
When we expand the root, the fixedpoint property of G guarantees that Y has the distribution G. Later, when we expand v, the values of Yv and Y will be changed, but their distribution will not be changed. Thus, to complete the construction, we simply repeat the process of expanding any unexpanded vertex until no such vertices remain. The tree we construct in this way is precisely a GaltonWatson tree with the Poisson(l) offspring distribution, so, with probability one, the expansion process stops after finitely many steps. Moreover, even though each vertex expansion changes the values of the random variables that are attached to the vertices along the path back to the root, all of the distributions will be unchanged.
By this construction, we see that for any leaf ~ of the resulting tree T the value Ye associated to f by the value rule (3.13) is zero. The bonus B(T~) is also zero for any leaf, and the recursion (3.10) for the bonus is the same as the recursion that defines the associated values Y,, v E T. We therefore find that B(T) = Y, and consequently we see that any solution G of the fixedpoint equation DF(G) = G is equal to the distribution of B(T) for an Fweighted PGW(1) tree T.
AN ANALYTICAL QUESTION
This proof of Proposition 1 is quite natural given the way that we came to the fixedpoint equation DF(G) = G, but the uniqueness question for this
i
i
is David Aldous and J. Michael Steele
equation is in fact a purely analytical problem. It seems natural to ask if the uniqueness of the solution of DF(G) = G may be proved directly without recourse to the probabilistic interpretation of G, but we do not address that question here. For the moment the more pressing question is whether we might also be able to characterize B(T`) by a distributional identity like
n
the one we used to characterize B(TZ`%
3.7 Characterization of the Limit of B(T")
n
To be sure, one can begin just as before; specifically, we can note from our discussion of Grimmett's Lemma that T" converges weakly in G. to a random
n
tree, although this time the limit is T, the skeletal tree with distribution
PGW'(1). We would dearly love to conclude that
B(T"`) '+ B(T,,J, (3.14)
n
but as soon as one writes this equation trouble appears.
Since T,,, is almost surely infinite, one sees that the total weight of the maximal partial matching of T. is infinite. For the same reason, the total weight of the maximal partial matching that does not meet the root of T'>', is also infinite. The bottom line is that the bonus B(T.), which is nominally the difference of these two quantities, is not well defined.
Nevertheless, one should not lose heart. If indeed there is a finite random variable Z such that
B(T"") '.> Z, (3.15)
n
then we still may be able to use our intuition about the undefined quantity B(T,,,) to help us to understand Z. Moreover, if we successfully guess a distributional identity that characterizes Z, we may even be able to use the stability of that characterization to build an honest proof of the conjectured convergence (3.15).
GUESSING THE DISTRIBUTIONAL IDENTITY FOR Z
As physicists sometimes do, we cross our fingers and calculate with B (T,,,) even though it represents the difference of two infinite quantities. Since we only want to guess an identity for Z that will be subsequently justified by a rigorous argument, there is no cost to proceeding heuristically for the moment.
If we label T,,, as we did in Figure 1 and if we remove the edge (vO, v,), then we obtain two subtrees with roots vo and v,. If T denotes the subtree with root vo, then T has the PGW(1) distribution, and, if T' denotes the subtree with root v,, then V has the PGW'(1) distribution. By the
00
"definition" of B(TJ, the optimal partial matching of T. that does not match the root vo has weight M(T.) B(T..), and our first task is to get an alternative representation for this quantity.
i
The Objective Method 19
To begin, we let C denote the set of all of the children of vo except for the special child v, that is on the path from vo to infinity in T",. For each i E C we view the vertex i and its descendants as a subtree T' of T, and we take i as the root of T'. With this notation, we can write the weight of the optimal matching on T. that does not match the root vo to a second vertex and obtain the bookkeeping identity
M(T,J B(TJ = M(T') + 1: M(P). (3.16)
iec
The next step is to find an appropriate expression for the maximal weight of a partial matching of T.. that does meet the root. We first note the maximal weight of a partial matching of T,,, that contains the edge (vo,i) with j E C is given by
~j + {M(Ti) B(Ti)} + M(T') + 1: M(P)
iEC,iOi
which we may simplify to
~j B(Ti) } + M(V ) + 1: M(P).
00
iEC
Now, if ~ denotes the weight of the edge (vO, v,), then the maximal weight of a partial matching of T,,,, that contains the edge (V0, V1) can be written as
M(P) + ~ + {M(T' B(T'
00 00
iEC
so, choosing the best of all of the possibilities, we see that M(ToJ equals
1: M(P) + M(V ) + max { 0, max{~j B(Ti)}, B(T'
iEC jEC 00
When we subtract the value of M(TJ B(T,,,) given by equation (3.16), we find
B(T,J = max{O, max{~j B(Ti)}, B(T'~)}. j r=C
We know maXiEC{O, ~j B(Ti)} has the same distribution as B(T) by the basic distributional identity (3.10) for the bonus, and we also know that V0. and TO,, have the same distribution, so at last we have a nice, simple, and lamentably heuristic identity for B(T.):
) d
B(T,~ = max{B(T), B(T~)}.
CONVERSION TO AN HONEST PROPOSITION
i i
20 David Aldous and J. Michael Steele
For any given continuous distribution F, we know that the distribution G of the bonus B(T) for the Fweighted PM(I) tree T can be characterized as the unique solution of the fixedpoint equation DF(C) = G, and our heuristic derivation suggests that we can characterize the limiting distribution of B(T`) in a way that is perfectly analogous. Specifically, given any n distribution H and independent random variables
ZH, YG, and ~F,
we let 6F (H) denote the distribution of max{Y, Z}, and we consider the fixedpoint equation
DF (H) = H.
This time we do not know a priori that there is a solution to this equation, but we do expect the limiting distribution of B(T`Q) to exist, and for large n
n we also expect B(T`) to be much like B(T,,j. Therefore, we suspect
n
that the limiting distribution of B(T",') will satisfy 6F(H) = H. The next n proposition confirms these suspicions while also confirming the crucial fact that the solution of the fixedpoint equation 6F(H) = H is unique.
Proposition 2 For any continuous distribution F with support on [0, 00), the fixedpoint equation
DF(H) = H (3.17)
has a unique solution H,. moreover, one has
lim P(B(T") :5 x) = H(x) for all x E R. n no<)
Before proving this proposition, we will first show that it really does complete the program with which we began.
3.8 The Limit Theorem for Maximum Weight Partial Matchings
Our whole discussion of E[M,,l has been motivated by the formula
E[M.1 = (n 1)E[~l(~ > B(TZ"(e)) + B(T`.(e))) (3.18)
n
and now with the help of Proposition 1 and Proposition 2 we are ready to take the limit. If we assume that ~ has a finite expectation, then the integrands in formula (3.18) are uniformly integrable and they converge in distribution to ~l(~ > Y + Z), so without further concern we can take the limit under the expectation to obtain the main theorem of this section.
Theorem 1 If F is a continuous distribution with support on [0, oo) and a finite mean, then the maximum weight Mn of an Fweighted random tree on n vertices satisfies
lim 1 E[Mn] = E[~l(~ > Y + Z))l (3.19)
n oe n
The Objective Method 21
where the random variables ~, Y, and Z are independent,
~ F, Y G, and Z H,
and the distributions G and H are the unique solutions of the fixedpoint equations
DF (G) =G and .5F (H) = H. (3.20)
The expression E[~l(~ > Y + Z))] for the limit (3.19) in Theorem 1 always provides some structural insight into the value of the limit, but, since the distributions of Y an Z are only defined through the solutions of two fixedpoint equations, one might think that Theorem 1 is still rather abstract. The next result tells us that in one of the cases that matters most, the limit E[~l(~ > Y + Z))l may be calculated explicitly.
Theorem 2 If F(x) = 1 c' for x > 0, then the maximum weight Mn of a partial matching on an Fweighted random tree on n vertices satisfies
lim 1 E[Mn] = c(ey be') exp(cey ce('Y))se' dy ds,
n oa n OF 0
where the constants b and c are defined by equations (3.23) and (3.24) of Lemma 2. Moreover, by numerical integration one finds
lim 1 E[Mn] = 0.239583.... n oo n
To prove Theorem 2 we just need to solve two fixedpoint equations, and the lackofmemory property of the exponential distribution makes this easier than anyone has any right to expect. Later we will find several other situations like this one where distributional identities lead us to calculations that are mysteriously pleasant.
Lemma 2 If F(x) = 1 c' for x > 0, then the solutions G and H of the fixedpoint equations (3.20) are
G(y) = ep(cey) for y ~~. 0 (3.21)
and H(z) = (1 bez) exp(cez) for z > 0, (3.22)
where b and c are related by
b = C2/(C2 + 2c 1) (3.23)
and c is the unique strictly positive solution of
c 2 + ee = 1. (3.24)
Numerically, one finds b = 0.543353... and c = 0.714556....
i
i i
22 David Aldous and J. Michael Steele
To prove Lemma 2, we first consider Y G along with an independent pair of independent sequences {Yi : i = 1, 2,...} and {~i : i = 1, 2,...} with Y, G and ~i F. In terms of these variables, the equation DF(G) G simply means
P(Y < y) = P(~i Yi :5 y for all 1 < i < N) = exp(P(~ Y > y)
where the second identity follows by conditioning on the independent Poisson variable N. Now, if we take logarithms and apply the lackofmemory property of the exponential, then we find
log P(Y < Y) = P(~ > Y + Y) = eYP(~ > Y), (3.25)
and this identity gives us the explicit formula (3.21) when we set c equal to P(~ > Y).
Next, to get the determining equation (3.24) we only need to calculate
c = P(~ > Y) with help from the distribution (3.21) of Y; specifically, we
find
c M~ > Y) eYP(Y < y)dy = OF exp(ceY) eydy
exp(cz) dz = c` (1 e').
Finally, to get the distribution of Z, we use the defining relation (3.17) and the lackofmemory property of the exponential to see for z > 0 that
P(Z < Z) =P(Y < Z)P(~ Z < Z)
= exp(cez) P(~ :~ z + Z) = exp(ce') {1 e'P(~ > Z)},
so if we set b = P(~ > Z), then P(Z < z) = (1 be') exp(ce'), just as we hoped. Now, all that remains is to express b = P(~ > Z) as a function of c = P(~ > Y), and we begin by noting
b = P(~ > Z) = 1, P(Z < z)e'dz
= fw(l bez) exp(ce') e`dz = c b + b c 2 + beec.
0 C2 c
Now, if we use the identity (3.24) to replace e` by 1 _ C2, we see that the last equation may be solved for b to obtain the promised relation (3.23). Finally, we note that the uniqueness of c follows from the convexity of the map x 14 x 2 + e`. 3.9 Closing the Loop: Another Probabilistic Solution
The Objective Method 23
of a FixedPoint Equation
To complete our investigation of the maximal partial matching problem, we only need to prove Proposition 2. Specifically, we need to show that the fixedpoint equation
DF(H) = H
has a unique solution, and we need to show that this solution is the limiting distribution for the bonuses B(T big).
n
EXISTENCE FOLLOWS FROM GENERAL THEORY
The existence of a solution to the equation .5F (H) = H is almost obvious once one draws a connection between the fixedpoint equation and the theory of Markov chains. Specifically, if we set
K(z, A) = P (max{~ z, Y} EE A)
where F and Y G are independent, then K(x, A) defines a Markov transition kernel on R+. One then sees at that a distribution is a fixed point of DF if and only if it is a stationary distribution for the Markov chain with kernel K.
Finally, since the kernel K has the Feller property and since it is dominated by the distribution of max{~, Y}, the general theory of Markov chains on R+ tells us that there is a stationary distribution for K (cf. Meyn and Tweedie [491, Theorem 2.1.2).
UNIQUENESS OBSERVED AND CONFIRMED BY COUPLING
While we might also be able to show the uniqueness of the solution of the fixedpoint equation with help from Markov process theory, there is another approach via coupling that is particularly informative. Moreover, the argument anticipates a useful stability result that gives us just the tool we need to prove that B(T") converges in distribution.
n
To begin, we assume that the variables Y~ : 1 < m < oo } are independent with ~, F and Y,, G for all m. Next, simultaneously for each z > 0, we construct a Markov chain Z,(z), m = 1,2,... by taking Z0(z) = z and by setting
Z~n (z) = max{ Z._ 1 (z), Y. }for m > 1.
Now, for all z and z' in R+, we see that
ZmW ~~ Zm(z') implies Z,,i+I(Z) :5 Z~+1(Z'),
and from this antimonotone property we see that Z.(z) ~: Z,"(0) for all even values of m. Therefore, if we let
T = min{m odd : Z .. (0) = Y~
24 David Aldous and J. Michael Steele
then we see that ZT(z) = YT for all z. As a consequence, we see that T is a coupling time in the exceptionally strong sense that
Z. (z) = Z~ (0) for all z > 0 and all m > T; (3.26)
moreover, the definition of Z,,(0) gives us the simple bound
P (Z. (0) = Y. 1 Z. 1 (0) = z) ~1. P < Yj for all z > 0.
Now, if we set p = 1 P(~,,, < Yj then we can easily check that p < 1. To see why this is so, we first note that in a tree T that has exactly one edge with weight ~' the bonus B(T) is equal to ~'. Now, if T is a PGW(1) tree, then the probability that T has exactly one edge is equal to e', so for
Y A B(T) we have
P(Y E e'P(~' E
Now, ~ is independent of Y and has a continuous distribution, so we have the bound
p(y < e2p(~, :5
2e
Thus, we find p :5 1 1/2e < 1, so the bound
P(T > 2k + 1) :5 p" for all k > 0
gives us more than we need to show that T is finite with probability one.
Now, if H is the distribution function for a stationary measure for the kernel K and if Z H, then by stationarity we have Z.(Z) H for all n > 1. Therefore, by the bound on the coupling time, we find for all x E R that
IH(x) P(Z.(0) :5 x)i :S gk for all m > 2k + 1,
so we see that there is at most one stationary distribution.
A COUPLING CONNECTION
Here we should note that this simultaneous coupling argument shares several elements in common with the "coupling from the pasC technique that has become a popular tool in the theory of Markov chain Monte Carlo (cf. Propp and Wilson [541). In particular, the argument given above shows that for any random initial value X we have
IP(Z,(X) E A) P(Z E A)] :5 ok for all n > 2k + 1;
and, remarkably enough, one can even choose X so that it depends on the
elements of Y : m > 1}.
The Objective Method 25
3.10 From Coupling to Stability Thence to Convergence
If one repeats this coupling argument while relaxing the condition on the distribution of the variables Y,, so that they are only required to have a distribution that is close to G, one is led to bounds on distributions that can be viewed the solutions of an "approximate" fixedpoint equation. Any such stability result must face some notational clutter because many quantities are constantly changing, but the next lemma should be easy to parse if one keeps in mind that n is an index that increases as the distribution of Y,~ comes closer to G. Also, in the lemma we choose the time index m so that we can focus our attention on Xon, a quantity that one may view as the dependent variable at the "time" 0 when the "approximation quality" equals n.
Lemma 3 Fix the integer k > 1 and suppose for each n we have nonnegative random variables {X," : 2k + 1 < m < 0} that are linked to the variables {YIZ : 2k + 2 < m < 0} by the relation
X,n {~n _ XZ
', = max . _l,Y.} for allm = 2k+2,2k+3,...,0,
where the random variables {~,n,, : 2k + 2 < m < 0} are independent, have distribution F, and are independent from
Xn 2k+1 and {Y,,, : 2k + 2 < m < 0}.
Further, suppose that one has
(Ymn : 2k + 2 < m < 0) d (Ym : 2k + 2 < m < 0) as n ~ oo,
where the random variables {Ym m = 0, 1, 2,...} are independent and have distribution G. If H is the unique solution of
DF (H) = H, then we have the bound
lim sup 1 P(Xon :5 x) H(x) 1 :5 Qk for all X E R and k 2,
nw
where p is given by
p dF(x) dG(y) < 1.
y
FROM STABILITY TO CONVERGENCE
First, we let T,,,, denote an Fweighted PGW' (1) tree, and let VO, V1, V2.... denote the sequence of vertices along the unique path to infinity from the root vo according to the definition of PGW'(1) and the familiar Figure 1. Next,
26 David Aldous and J. Michael Steele
we let k > 1 be an arbitrary integer, and we note from the local weak convergence of T, to T.,, that there exists an integer N(k,w) so that for all n > N(k,w) there is a path 7r with 2k vertices and the following properties:
(1) the first vertex of 7r is the root v' of T,,
0
(2) if 7r is written as Vn _~ V, __+ Vn __+ V, then the deletion of
0 1 2 2k 1
the edges of 7r creates a forest of subtrees Tn, j 1, 2,, 2k 1, of Tn where
T n is rooted at C, and
(3) we have the weak convergence relation
(Tn : 0 < i < 2k 2) (T?' : 0 < i < 2k 2)
where all of the righthand components T', 1 < i < 2k 2, are independent PGW(1) trees.
Next, for each n > 1 and each 1 < i < 2k, we let M' denote the subtree of T, that is rooted at v!' that one obtains by removing the single edge (v~' 1, v!')
% t_ %
from the path 7r. We think of R7 as a "remainder subtree" that contains
n
everything "pasC v,!' 1 and to complete the picture we write RO Tn.
........ ..... .............. ..............
VTn
TO, T11
IR n
i+l:
V. V~ V!' v 1
0 1 `~
Fig. 4. The Residual Ri'+1
Finally, if we let ~,n denote the weight on edge (V7, Vn 1), then we may z i+ use exactly the same argument that gave us the heuristic derivation of the fixedpoint equation 5G(H) = H to show that
B(R n) = max{~!' B(R n 1), B(T~') } for all 0 < i < 2k 2. (3.27)
i % i+ %
In this case the argument is also rigorous since all of the trees involved are finite, and B() is well defined at each stage.
The Objective Method 27
Finally, we note that the recursion (3.27) sets up the direct application of Lemma 3. We only need to take m = i, to find that the lemma gives us
lim sup 1 P (B (TJ :5 x) H (x) 1 < p'.
nw
Since p < 1 and since k is arbitrary, the last inequality is more than we need to complete the proof of Proposition 2.
3.11 Looking Back: Perspective on a Case Study
When one looks back on the maximal partial matching problem, there are several themes that seem to be worth recording for future use. First of all, the inclusion criterion (3.5),
> B (TJ` (e) ) + B (T"`
n
helped drive the analysis almost from the beginning. In particular, this necessary and sufficient condition for the edge e to be in the maximal partial matching gave us the succinct representation (3.7) for E[M,,], which in turn encouraged us to study the limit theory of B(TJ`(e)) and B(T"q(e)).
n
The asymptotic analysis of B(Tj`(e)) was relatively straightforward, but the analysis of B (T" (e) ) forced us to work harder. The key to its be
n
havior turned out to be the random variable recursion (3.27),
B(R") = max{ ~in B(R'j), B(T7) } for all 0 < i < 2k 2,
i i+ Z
but this equation might have left us empty handed if we had not first studied its infinite analog. For the analog, we found there was a unique solution, and this was encouraging. Such equations often have "stabilized" versions, and this principle guided us to the proof of the weak convergence of B(T"(e)
n
One is unlikely to find too many problems that may be solved With a stepbystep application of the method used here, but we will soon see several other examples where most of the same elements are present. Certainly, the heuristic derivation and rigorous solution of a distributional identity emerges as a basic theme. Sometimes this identity is followed by stabilization, and at other times it is exploited to provide an analog of an inclusion rule similar to that provided by equation (3.5). Typically, the new inclusion rule would apply only in a more elaborate model that one constructs with help from the distributional identity.
We will see from the analysis of more complicated problems that the solution of the maximal partial matching problem is a hybrid; it is initiated and organized by the objective method, but it is completed with help from a traditional stochastic recursion. Nevertheless, the maximal partial matching problem offers persuasive evidence of the effectiveness of the objective method, and it provides a quick introduction to several of the method's main themes. In the next three sections, we hope to make this case more complete and more inclusive.
28 David Aldous and J. Michael Steele
4 The MeanField Model of Distance
In physics, and increasingly often in probability theory, one meets geometric problems where one tries to gain insight by building an analytic model that only captures part of the structure of the original set of interpoint distances. For example, instead of studying a random variable that depends on all of the interpoint distances of a point process in Rd, one may directly model just the distances from a given point to its successive neighbors and then study an appropriate analog of the original variable.
By preserving the distributions of the most essential interpoint distances while distorting the distribution of less important distances, one hopes to achieve a significant simplification while still gaining some insight into the phenomena originally of interest. Models that aim for this kind of simplification are commonly called mean field models following an earlier tradition in physics where complicated fields were replaced by simpler ones that were calibrated to match the average behavior of the more complicated fields.
In this section, we will describe several problems of probabilistic combinatorial optimization where meanfield models have been developed. These models have a natural kinship with the objective method, and local weak convergence turns out to be a most natural tool.
4.1 From Poisson Points in Rd to a Simple Distance Model
Consider for a moment a Poisson process 'P on R' that has uniform intensity 1/Vd where vd is the volume of the unit ball in Rd. For such a process the expected number of points in the unit ball is equal to one, and, if we look at the successive distances from the origin to the points of P, we have a new point process
0 < A1 < A2 < A3 < .... (4.28)
This process is again Poisson, although now we have nonconstant intensity, and the expected number of points in [0, x] equal to xd.
In a world where one is more concerned about the distances to one's nearest neighbors than about other distances, this simple reduction suggests a promising line of inquiry. Can one assign random distances to the edges of a complete graph K,, with n vertices so that for large n the successive distances from a point r in Kn will mimic the distances that one finds for the sequence of neighbors of 0 in the ddimensional Poisson process P?
A FINITE DISTANCE MODEL: THE dExPONENTIAL K,,
The answer to this question is affirmative, and it is easy to describe. For each edge e of Kn, we simply introduce an independent random variable
such that
~, 1 nl/dA,, or equivalently, P(~, ~: nl/dx) = e _Xd for x > 0. (4.29)
The Objective Method 29
Now, if we fix a vertex r of K,, and consider the distance from this root r to its nearest neighbor, next nearest neighbor, and so on, then one finds a set of n 1 increasing distances
0 < Dn < Dn < ... < Dn
1 2 nP
The key feature of the distribution (4.29) is that one may now easily prove that
> d ~ (Ai : i > 1) as n ~ oo. (4.30)
This useful result yields some geometric insight, but it is possible to go much further. One does not need long to guess that the distributional limit (4.30) may be strengthened to prove the local weak convergence of the finite distance models Kn. That is, one hopes to find that Kn satisfies the kind of limit theorem that we found for Tn in Section 2.
WHERE TO START
Anytime one hopes to prove a weak limit theorem in G,, the battle begins by searching out a reasonably explicit candidate for the limit object. Here for example, one should minimally note that any reasonable candidate for the local weak limit of Kn must have infinite degree at every vertex. Moreover, from the motivating limit (4.30) we also know that in the candidate object the distances from a vertex to its neighbors must behave like the successive values of the Poisson process (4.28).
After these easy observations, some deeper reflection is needed. In due course, one finds that the description of the candidate is almost complete., only one further observation is needed. Any candidate for the limit of graphs K. must be a tree despite the fact that Kn is more full of cycles than any other graph!
4.2 The Poisson Weighted Infinite Tree or, the PWIT
Even with a good limit object in mind, one needs to confirm that the candidate exists as a clearly defined process. Here this is easily done by an inductive procedure. One begins with a single root vertex that we denote by r. This root vertex is then given an infinite number of children, and the edges from the root to the children are assigned lengths according to a realization of a Poisson process (Qr : 1 < i < oo) with mean function x d . The children of the root are said to form generation one.
Now, recursively , each vertex v of generation k is given an infinite number of children, and the edges to these children of v are again assigned lengths according to an independent realization of a Poisson process (~Y : 1 < i < 00) with mean function x d . This procedure is then continued ad infinitum.
The resulting rooted infinite tree T is then a well defined random variable with values in 9,,. The tree T is said to be a Poisson weighted infinite tree, or, a PWIT. Still, the nickname intends no disrespect. PWITs have an inevitable
30 David Aldous and J. Michael Steele
role in the limit theory of trees, and the next theorem tells us that they are precisely the limits of appropriately weighted complete graphs on n vertices.
0.5 0.8 2.1
F9 1 F71
18/19 \3 7 0 9/12 \2 5 0 4/2 8 \4 1
E [1~] E [~ [~] n23 n31 n32 n33
Fig. 5. Part of a realization of a PWIT that shows just the first three children of each vertex; one needs to keep in mind that in the full PWIT each vertex has an infinite number of children. Here the values written next to the edges that descend from a fixed vertex are intended to reflect an independent realization of a Poisson process with mean function x d . Finally, the labels of the vertices in this figure are only for convenience; in a PWIT the vertices are unlabelled except for the root.
Theorem 3 (Convergence Theorem for KJ If K, is a randomly rooted complete graph on n vertices with independent edge lengths ~, that satisfy
P(~,~ ~~ nl/dx) = e'd for all x > 0,
then we have the local weak convergence
d
K~ ~ T as n4 oo. where T is a PWIT with edge weight mean function x d
This theorem from Aldous [3] permits one to study optimization problems for Kn in the same way that Grimmett's Lemma permits one to study optimization problems for the uniform random tree Tn. Nevertheless, there is one noteworthy distinction; the exponential weights on Kn are tied to the Poisson processes that generate the PWIT, but the Fweights for the edges of T, may be chosen with great freedom.
The Objective Method 31
We will not repeat the proof of Theorem 3 here, but we should indicate why the cycles of K, disappear in the limit. If we introduce a random variable that measures the length of the shortest cycle containing the root,
C,, = min {E ~, : C is a cycle of K,, that contains the root r 'EC then the key issue is that one can prove that
lim P(C. < 0 for all t > 0. (4.31)
noc
Intuitively this result says that in the limit there is no cycle of finite length that contains the root, and, with some attention to detail, this leads to a proof that the weak limit of K,, in G, must be a random tree.
THE DIMENSION PARAMETER AND A VOLUNTARY RESTRICTION
For the rest of this section, we will restrict our attention to the case when d = 1, and the Poisson process (4.28) of successive distances (Ai : 1 < i < oo) reduces to a plain vanilla Poisson process with constant rate 1. Nevertheless, we should emphasize that this restriction is voluntary; it is only taken to shorten and simplify our exposition.
For all of the problems studied in this section and the next, there are parallel results even for noninteger 0 < d < oo, and, in most cases, one will find complete proofs of these results in the references. Moreover, one should note that for the problems studied here, the logic of the analysis is identical for all values of 0 < d < oo, even though the explicit calculations often depend on the specific value. These calculations are almost always simplest for d = 1 where one has access to the lackofmemory property of the exponential.
4.3 The Cutoff Components of a Weighted Graph and a PWIT
For a moment, we consider a perfectly general weighted graph G, and we let G. denote the subgraph G. that one obtains from G by deleting all of the edges of length s or greater. Also, if v is a vertex of G we let c(G, v; s) denote the connected component of G, that contains v.
Now, if T is a dimensionone PWIT with root r we may view the component c(T, v; s) as a rooted tree with root r, and, rather pleasantly, this tree can identified with a GaltonWatson branching process with a single progenitor and an offspring distribution that is Poisson with mean 8. In other words, for all s > 0 we have
c(T, r; s) 1 PGW(s),
and this simple observation has several useful consequences.
In particular, if q(s) denotes the probability that the component c(T, r; s) is infinite, then classical formulas for the extinction probability of a branching
32 David Aldous and J. Michael Steele
process (such as those of Harris [30], p.7) tell us that we have q(s) = 0 when 0 < s < 1, and otherwise we find that q(s) is the unique strictly positive solution of
1 q(s) = exp(sq(s)) when s > 1. (4.32)
By inverting this relationship, one also sees that the value of 8 for which we have extinction probability 0 < q < 1 is given by
s(q) = q1 log(l q), (4.33)
a formula that will soon come in handy.
4.4 The Minimum Spanning Forests of an Infinite Graph
We want to define an analog of the minimal spanning tree for the PWIT and other infinite graphs, but some care must be taken. Naive attempts to mimic the usual greedy algorithms certainly will not work; in most cases, they can never get started.
Instead, we are guided by a wellknown criterion for an edge e = (V1, V2) with length s to be in the MST of a finite graph G with distinct edge weights. For such a graph, e is in the MST of G if and only if there does not exist a path in G from v, to V2 such that each edge of the path has weight less than s. This criterion is easy to prove, and it is occasionally useful in correctness proofs for MST algorithms, such as the five discussed in Section 4.4 of Matou~ek and NeWil [481.
The Objective Method 33
Definition 3 (Minimal Spanning Forest) The minimal spanning forest of an infinite graph G that has all distinct edge lengths is the subgraph MSF(G) of G with the same vertex set as G and with an edge set that contains each edge e = (V1, V2) of G for which
(1) c(G, vi; s) and C(G, V2; s) are disjoint, and
(2) c(G, v,., s) and C(G, V2; s) are not both infinite when s is taken to be the length of the edge e = (V1, V2) E G.
One can easily cheek that the graph MSF(G) defined this way will have no cycles, and, since it trivially spans the vertex set of G, we see that MSF(G) is certainly a spanning forest. Nevertheless, some work is needed to see why MSF(G) is the right analog to the MST of a finite graph and to justify the name minimal spanning forest.
A first step, one should note that all of the connected components of MSF(G) are infinite. To see this, suppose to the contrary that GO is a finite component of MSF(G) of minimal cardinality. Next, consider the shortest edge e = (V1, V2) between GO and its complement. Since one cannot get out of GO without using an edge length greater than s, we see that c(G, V, ; S) is contained in GO and hence that c(G, v,; s) is finite. Rom the definition of the MSF(G), we then have e E MSF(G), and this contradicts the minimality of the component Go.
Either c(G, v,; s) is finite
V2
V1
8
or C(C, V2; 8) is finite, or both are finite.
Fig. 6. The edge e (V1, V2) of G is in the minimal spanning forest if and only if at least one of the trees c(G, v,; s) and C(C, V2; 8) is finite when s = ~_
4.5 The Average Length Per Vertex of the MSF of a PWIT
If T is a PWIT with root r then the sum of the edges of MSF(T) that are incident to r can be written as
34 David Aldous and J. Michael Steele
SmsF(T) = 1: &l(e E MSF(T)),
e:rEe
and, if we think of each vertex as owning half of the length of each edge that meets the vertex, then it is reasonable to think of
A = 1 E[SMSF(T)l (4.34)
2
as an average length per vertex of the minimal spanning forest of T.
One might guess that A could be almost impossible to compute, but such a view underestimates the natural grace of the PWIT. The next lemma and its proof show that the PWIT supports some almost magical possibilities.
Lemma 4 The average cost A per vertex for the MSF of a P WIT satisfies
A = ((3) k3.
k=l
To prove the lemma, we first recall that if one conditions a Poisson process to have a point at s, then the set of complementary points is again a Poisson process. Thus, if we condition on the existence of an edge e of length s from the root to some child, then the subtrees obtained by cutting that edge are again independent PGW(s) trees.
The probability that at least one of these is finite is equal to 1 q2 (S), so this is also the probability that e is an edge in the minimal spanning forest and we find
A= 1f,(lq 2 (,)) d,. (4.35)
2
Calculation of this integral is quite a pleasing experience. We first integrate by parts and then we use the defining equation (4.32) for q(s) to find
1 2 1 1.82 1 11 10g2(1 q
A = 8 q(s)q(s) ds = q(s)q'(s) ds = ) dq,
2 fo 00 2 1 2 0 q
so the substitution u log(l q) lets us finish with
A = 1 1' U2 eu du = 1 U2 00 ku dU
2 1 e 2 Y7 e F3
0 OF k=l k=l
4.6 The Connection to Rieze's C(3) Theorem
The emergence of ((3) as the average cost per vertex of the minimal spanning forest of a PWIT is at least a little mysterious. Still, there is a related precedent, and when one sees ((3) in the context of trees, the famous ((3) theorem of Frieze [21] immediately comes to mind.
The Objective Method 35
Theorem 4 Let each edge e of the complete graph on n vertices be assigned an independent cost ~, with distribution F where F has a finite variance and where F(O) exists and is nonzero. If C~ls'(F) denotes the minimal cost of a spanning tree of this graph, then as n > oo, we have
C'M'ST p [C;~ST
n (F) ((3)IF'(0) and E (F)] , ((3)IF'(0). (4.36)
By a simple coupling argument, Steele [571 showed that the convergence in probability still holds without any moment assumptions, and later by a martingale argument Frieze and MeDiarmid [22] showed that convergence in probability can be strengthened to almost sure convergence.
Nevertheless, what matters here are simply the exponential costs, and to make the connection with our earlier calculations explicit, we note that Rieze's theorem implies that if the edge weight distribution & is exponential with mean n, then we also have
li 1 E [C,~4ST(F.)] = ((3). (4.37)
noc n
This limit says the average cost per vertex of the minimal spanning tree of the appropriately weighted complete graph tends to the same value as the expected average cost per vertex of the PWIT with d ~ 1. Conceivably, the reappearance of ((3) is only a coincidence, but it is not. Quite to the contrary, the limit (4.37) is a consequence of the fact that A = ((3).
THE LOCAL WEAK CONVERGENCE LINK
As a special case of Theorem 3, we know that for the complete graph K,, on n vertices with edge lengths that are independent and exponentially distributed with mean n, one has the local weak convergence
K. d ~ T as n > oo (4.38)
where T denotes a PWIT with d = 1. We will also see later in Theorem 6 of Section 5 that just local weak convergence (4.38) and the fact that the PWIT is an infinite graph with all distinct edge weights is enough to give us
(K,, MST(K.) ) d _~ (T, MSF(T)). (4.39)
This limit is typical of the way the theory of local weak convergence informs one about a problem in combinatorial optimization. At a conceptual level, the limit (4.39) answers almost any question one may have about the qualitative behavior of MST(K.) for large n, provided that the question meets one restriction; it be expressed in terms of a function of some neighborhood of the root of MST(Kn).
36 David Aldous and J. Michael Steele
A RANDOMIZED ROOT AND A TOTAL SUM
The expectation E[C~"'(F,,)] can be written as a function of the neighborhood of the root, yet this representation seems to be a lucky break. Eventually one calls on this break routinely, but at least once it seems useful to examine the full logical chain; we begin by noting
Q~ ST(F,') 1 Y7~,1(e E MST(KJ) c
so now if we let R denote a root chosen at random from the vertex set of K,, we also have
1 E[E~,1(e E MST(KJ)] E[ 1: ~,1(e c MST(K.))]
n c e:REe
and from the local weak convergence MSF(K,,) ~ MSF(T) we have
E ~~l(, e MST(KJ) 1: ~,1(e E MST(T)). (4.40)
e:REe e:REe
Now, once one checks uniform integrability (which is a not particularly easy, but which is a doable technical exercise), one finds from the basic distributional limit (4.40) that
E[ 1: ~,1(e E MST(Kn))] , E[ 1: ~,1(e E MST(T))] = 2((3),
e:REe e:REe
where the last equality comes from Lemma 4. IProm the links of the chain we
find 1 E[C1;~"(Fn)] ((3),
n
just as we hoped to show.
5 Minimal Cost Perfect Matchings
A perfect matching of a finite graph is a collection of disjoint edges with the property that at most one vertex of the graph fails to be contained in an edge of the collection. Perfect matchings are among the most intensively studied structures of graph theory, and they suggest many natural random variables. Here we focus on the prince among these, the minimal cost C. of a perfect matching of the complete graph K,, with independent edge weights ~, with distribution F.
The assignment problem discussed in the introduction is precisely the perfect matching problem for the bipartite graph K,,, and the analysis of C. is intimately connected to that of A.. Thus, the following theorem is a direct analog of the ((2) limit theorem for the expected assignment cost E[Anj, but here we rescale the Fweights to be consistent with our discussion of the local weak limit theory of K,,.
The Objective Method 37
Theorem 5 If C~ denotes the minimal cost of a perfect matching of the complete graph K,, on n vertices with independent edge costs ~,, having the exponential distribution with mean n, then
1 1
lim E[C,,l k 2 =
~((2). (5.41)
noo n 2
k=l
Given the earlier experience with the maximum partial matching on the weighted uniform tree Tn, one has every reason to expect that the local weak convergence of Kn should provide the key to the proof of this theorem. This view is well founded, but the earlier analyses only provide one with a proper orientation. There are many devils in the details, and a complete proof of Theorem 5 would take more space than we have here.
Nevertheless, we can provide the better part of onehalf of the proof; that is, we can provide most of the argument that proves the limit infirnum, of E[C.11n is not less than ((2)/2. While doing so, we can also underscore the main structural features of the full proof. In particular, we develop a basic distributional identity and solve the identity to find an unexpected connection to the logistic distribution. We can then use the distributional identity to construct a special stochastic process which can be used in turn to define an inclusion rule for the edges of a PWIT to be in a minimal cost matching. Finally, given just a few properties of the freshly constructed process, we can complete the analysis of the limit infimum.
In the course of this development, we can also complete a promise made in Section 2; specifically, we can formalize the involution invariance property that one finds for all limit objects that are given by the standard construction discussed in Section 2. Even though involution invariance seems perfectly innocuous, it has an unavoidable part in the proof of Theorem 5. In the earliest stage, it helps weed out defective candidates for the limit objects, and at a later stage it guides us to a proper understanding of the matchings on the PWIT that can be attained as limits of matching on K,.
5.1 A Natural Heuristic Which Fails for a Good Reason
The local weak convergence of K,, almost compels one to think of using good matchings on the PWIT to help in the hunt for good matchings on K,. The natural hope is that a perfect matching in the PWIT with minimal average cost per vertex should be close to a perfect matching of K. for large n; thus, weak convergence theory might set us on the trail of a proof of Theorem 5. This hope can be justified, but only after one restricts attention to an important subset of the matchings on the PWIT that satisfy a natural symmetry property. If one ignores this restriction, there are matchings on the PWIT that are simply too good to be tied to matchings on K,,.
38 David Aldous and J. Michael Steele
AN EXAMPLE: GREEDY MATCHING ON THE PWIT
In a finite weighted graph one reasonable way to hunt for a good matching is the global greedy algorithm where one picks the edge e with minimal weight, removes all of the edges incident to either endpoint of e, and proceeds recursively. Unfortunately, in a PWIT this method does not have a chance to start a PWIT has no edge of minimal cost.
A Symbolic PWIT and its Greedy Matching
The greedy matching
of a PWIT is not
involution invariant,
so we cannot use it
to help construct a
0 good matching on K,.
Fig. 7. To construct the greedy matching of a PWIT one takes the shortest edge e out of the root, removes all of the vertices incident to e, and proceeds recursively. The expected cost per vertex of the greedy matching is equal to onehalf, which beats the unbeatable limiting average cost of ((2)/2 per vertex for K..
In the PWIT, a reasonable alternative to the global greedy algorithm is the local greedy algorithm where one starts by choosing the edge of minimal cost that is incident to the root. In a PWIT such an edge e exists, and we even know that it has expected value equal to 1. One can then delete e, remove the edges incident to e, and proceed recursively along the infinite forest produced by this first step.
The local greedy algorithm produces a matching in the PWIT with an expected cost per vertex equal to onehalf, and this is intriguing since it is even smaller than the value ((2)/2 that we expected to show to be asymptotically optimal for the K,,,.
Unfortunately, the greedy matching for the PWIT simply cannot be the limit of a sequence of matchings for K,,. Moreover, one can even see this fact from first principles. The problem is that the greedy matching lacks a symmetry property that one must find in every limit object like the limit of the optimal matchings of K,,.
40 David Aldous and J. Michael Steele
fiG (e) = pc (v)
Tic (e') = pc
W Involution invariance
v would force
PG (V) = PG (W)
Fig. 8. Given C = (V, E) we let PG (v) denote the probability that v is the root of G, and if fi is the induced measure, we write ~G for the marginal measure given G. For any directed edge (v, w) of G, the measure ;~G puts mass pc (v) on the edge (v, w) and puts mass pc (w) on the edge (w, v). If one has involution invariance, then ;~G (V, W) = ;~G (w, v), and we see that pc (v) = pc (w). Thus, pc () is constant on each connected component of G whenever one has involution invariance.
Lemma 5 If ji is the probability distribution of G [X] where G [X] is given by the standard construction and C[X] is finite with probability one, then /A is involution invariant. Conversely, if the probability measure it is involution invariant and concentrated on the finite connected graphs in G, then JA is the distribution of a G,valued random variable G[Xl that is given by the standard construction.
To prove the direct implication, we first condition on the event that the underlying graph of G [X] is the fixed finite graph G. If G has n vertices and m undirected edges, then the marginal measure fiG puts mass l/n on each of the 2m directed edges of G, so we trivially get involution invariance. To prove the converse, one notes by the captioned discussion of Figure 8 that PG(.) is constant on each connected component of G. Finally, if we assume that G is connected, we see pc(.) is constant on G. This is precisely what it means to say that the root X of G is chosen uniformly from its vertex set, so the proof of the lemma is complete.
CONNECTION TO THE OBJECTS OF THE OBJECTIVE METHOD
One of the basic features of involution invariance is that it is preserved under local weak convergence in the metric space 9,. To prove this fact, one first shows that the topology on 9, extends in a natural way to a topology on ~. that makes ~. a complete separable metric space. Next one shows that the transformation p ~ fi is a continuous mapping with respect to weak convergence; specifically
d d
1An 1 Poo =~. /An Poo.
Finally, one shows that the involution map t ~ ~, is continuous.
The Objective Method 41
5.3 Involution Invariance and the Convergence of MSTs
The formalities of involution invariance give one no hint that the notion has much force, but the next theorem shows that involution invariance can be genuinely powerful. In fact, there is something fundamental about involution invariance, since, in a way, it replaces the spatial invariance that one uses constantly in the theory of homogeneous point processes. The pleasantly general Theorem 6 contains as special cases Rieze's ((3) theorem which was discussed earlier and the convergence theorem for MSTs of random samples in the cube which we will discuss in Section 6.
Theorem 6 (MST Convergence Theorem) Let G,,, denote a G,valued random variable such that with probability one Q, has infinitely many vertices and no two of the edges of C have the same length. Further, let {Gn : n = 1,2,...} denote a sequence of g~valued random variables such that for each n the distribution Of Gn is given by the standard construction and such that for each n the vertex set Of Gn has cardinality n with probability one. If
d
Gn ~ G,,. as n oo, (5.42)
then one has the joint weak convergence in 9,, x 9,,,
(Gn, MST(Gn) d _~ (G,,,, MSF(Q,.) (5.43)
Further, if N, denotes the degree of the root of MST(Gn) and N denotes the degree of the root of MSF(G,)~)
N,~ d ~~ N and E[Nn]+ EIN1 = 2, (5.44)
and, if Ln denotes the sum of lengths of the edges incident to the root of MST(Gn) and L denotes the corresponding quantities for MSF(G,'), then
Ln d ~ L. (5.45)
The applications of the limit (5.43) are easy to understand without working through the convergence proof. Nevertheless, a survey on local weak convergence should give at least one such proof in reasonable detail, and the limit (5.43) is a attractive candidate. The proof demonstrates several basic facets of involution invariance, and the net result is genuinely useful.
To begin the proof, we first note that the hypothesis (5.42) tells us the sequence {G,} is tight, and if we consider the sequence of pairs (G,, MST(G.) one can check that they determine a sequence {M,} of measures on 9, x 9. that is also tight. We then let p denote a measure on 9, x 9. that one obtains as the weak limit of a subsequence {/,.i }.
By Skorohod's embedding theorem, we may suppose the 9, x G.valued random variables (G,,,, MST(G.j) ) are all defined on a common probability space and that there is a 9,valued random variable S such that
42 David Aldous and J. Michael Steele
(G,j, MST(GJ) , (G,,,,, S) with probability one. (5.46)
From the definition of p as the weak limit of the sequence {pj }, we see that the proof of the limit (4.39) boils down to showing
S(w) = MSF(G,,J(w) with probability one. (5.47)
This identity will be proved in two steps one for each direction of the set inclusions.
THE FIRST INCLUSION is EASY
To show that MSF(G,,J(w) C: S(w) for almost all w, we begin by con~ sidering an edge e = (u, v) E MSF(G,,J that has length s. By the definition of the minimal spanning forest, we may assume without loss of generality that the component c(G,,, u; s) is finite. For convenience of notation, we will also assume that the subsequence nj is in fact the full sequence; when the argument is done, one will see that no generality was lost.
Now take a neighborhood H of u that is large enough to contain c(G., u; s) and note by the almost sure convergence of Gn to T that there is a finite N(W) such that for all n > N(w) there is an isomorphism between H and a neighborhood of the root of G, Also, since c(G., u; s) is finite, there is an E > 0 such that each edge of c(G, u; s) has length less than s .
If en = (U,, Vn) denotes the edge associated with c by the neighborhood isomorphism, then for large n the length of en is at least s E/2 and all edges of c(Gn, Un; s) have length strictly less than s /2. Therefore, there can be no path in G,, from Un to Vn that has all of its edges shorter than the length of c'. Thus, we find that there a finite N'(w) such that en is in MST(Gn) for all n > N'(w). Since MST(Gn) converges to S in G, we get our first inclusion
MSF(G~)(w) c S(w) with probability one. (5.48)
A MORE SUBTLE INCLUSION
To complete the proof we just need to show that the set difference
D(w) = S(w) \ MSF(G.)(w) (5.49)
is empty with probability one. By the general lemma below, we also see that it suffices to show that the number of edges in D that meet the root of G has expectation zero.
Lernma 6 Let G be an involution invariant element of G, with root r and let W be an involution invariant subset of the edges of G. If
E[ Y~ 1(e E W)]= 0,
e:r4~e
then W is empty with probability one.
The Objective Method 43
This lemma is proved in Aldous and Steele [7], and, since the way that it uses involution invariance is more simply illustrated by the proof of Lemma 7, we will not repeat the proof Lemma 6 here. Instead, we will go directly to its application.
Naturally, we want to apply Lemma 6 where W is taken to be the difference set D defined by equation (5.49), so our task is to estimate the corresponding expectation. If we let N,, denote the degree of root of G, in MST(G.) and let N denote the degree of the root of G,,, in MSF(G,.), then we have the usual Skorohod embedding and the inclusion relation (5.48) give us the bound
N < lim inf N, (5.50)
noo and the definitions of D, Nn and N also give us
Y7 1 (e E D) :5 (lim inf Nn) N. (5.51)
noc,
e:rEe
Since an nvertex tree has exactly n 1 edges, we always have the trivial limit
E[Nn] = 2(n 1)/n / 2,
so, by the bound (5.50) and Fatou's Lemma, we have
E[N] :5 E[lim inf N,,l :~ lim inf E[Nn] = 2.
noe n_00
This observation almost completes the proof of Theorem 6, and the next lemma does the rest. It is all we need to confirm that the expectation of the sum (5.51) is zero.
Lemma 7
E[N] = 2.
In fact, only half of the lemma remains to be proved. we just need to show E[N) > 2. To begin, we consider an edge e = (r, v) that is incident to the root r of MSF(G). If we write f for "finite" and oo for "infinite," then each such edge must be one of four possible types
(f, f ), (f, w), (oo, f ), and (co, ce)
according to the cardinality of the respective components of MSF(G,.) \ {e} that contain r and v. We will write Nff, Nf., N.f, and N.. for the numbers of edges of the corresponding types.
From the definition of the minimal spanning forest, we know that every vertex is in an infinite component of the MSF, so Nff is always zero. A more important observation is that unless r has infinite degree, the argument given
in the caption of Figure 9 gives us
2Nf. + N > 2. (5.53)
44 David Aldous and J. Michael Steele
The key step turns out to be a very simple involution invariance argument that shows that Nf. and N,,~f have the same expectation. Once this is confirmed, one can complete the proof of the lemma by noting that the bound (5.53) gives us
E[N] = E[Nf.] + E[N.f ] + E[N..]
= 2E[Nf~ + E[N,,~,,,] > 2.
1 ' \,V/, V
If the root is If the root is
a leaf not a leaf
Fig. 9. If the root of G. has finite degree, we are sure to have 2Nf_ + N_ ~: 2 because no edge of the MSF of G can connect two finite components of the MSF and because in the three remaining cases one always has either Nf_ ~: 1 or N~ > 2.
Finally, to establish the equality of E[Nf,,,l and E[N,,,,f], we first recall the measure fi that one uses to define involution invariance. For any directed edge e we have
E[Nf,~] = fi{e is of type (f, oo)}, and we also have
E[N,>,,f ] = fi{e is of type (oo,
= fi{e is of type (f, oo)},
where e is the reversal of e. Now, since G,,, is involution invariant, we have equality of reversals, so
fi{e is of type (f, o<))} = fA{e is if type (f, oo)},
and we find at last that E[N~fl = E[Nf,),j. This completes the proof of Lemma 7, and hence completes the proof of Theorem 6 as well.
The Objective Method 45
A THOUGHT ON INVOLUTION INVARIANCE
The way that involution invariance does its work is by telling us that "whatever is true from the perspective of the root r" is also true from the perspective of the root's neighbor v. In the case of a connected graph, one can chain this argument from vertex to vertex and conclude that "whatever is true from the perspective of one vertex is true from the perspective of any other vertex." This statement is far from precise, but the derivation of the identity E[N.fl = E[Nf.] suggests that it provides honest guidance.
RESULTs RELATED TO THE MST CONVERGENCE THEOREM
The MST Convergence Theorem given here is new, and it generality serves to underscores the benefit of working with the local weak convergence of sequences of 9,valued random variables. The simplicity of Theorem 6 should not conceal the fact that it contains as special cases both the basic convergence theory for MSTs for weighted complete graphs developed by Aldous [11 and the corresponding theory for random Euclidean points developed by Aldous and Steele [61. Avram and Bertsimas [121 independently studied both of these cases, and one further finds that their calculations are also much more transparent when viewed in the light of local weak convergence.
5.4 A Heuristic That Works by Focusing on the Unknown
The greedy algorithm gave us an excellent matching on the PWIT, but it lacked involution invariance and did not help a bit in our hunt for a good matching on K.. Moreover, the symmetry of the PWIT makes it hard to see how one can do better; almost any matching one might find by good guessing is likely to face the same failings that were met by the greedy matching. Nevertheless, a good involution invariant matching can be found, provided that one takes a hint from the analysis of the maximal partial matching problem.
The overarching plan is to consider the minimal cost per vertex of an involution invariant perfect matching on the PWIT and to treat this cost as an "unknown." We then begin the hunt for an equation that contains this unknown, and, in due course, heuristic considerations will lead us to a distributional identity. This identity can then be solved rigorously, and its solution can be used in turn to build a joint process with one marginal being a PWIT and another marginal being random variables that satisfy the distributional identity. These auxiliary variables can then used to construct an 9nclusion rule" that provides a test for an edge's membership in the matching on the PWIT.
One small wrinkle this time is that we will focus on a simple difference, rather than the more subtle bonus that we used before. Specifically, we focus on the difference
D(T) = C(T) C (T
46 David Aldous and J. Michael Steele
where C(T) is the minimal cost of a perfect matching of T and QT \ {r}) is the minimal cost of a perfect matching of the graph T \ {r}. Naturally, both C(T) and C(T \ {r}) are infinite, so D(T) is not really well defined. Nevertheless, in our analysis of the maximal partial matching problem we met a similar difficulty, and we still managed to prevail. We have every reason to hope for a similar outcome this time.
If j is a child of the root, then we let T.7 denote the descendants of j and we view Ti as a rooted graph with root j. We then introduce the analogous heuristic difference D(P), and note that even though D(T) and D(P) are not honestly defined, one can continue in the brave tradition of physical scientists and argue that nevertheless they should satisfy the distributional identity
D(T) = min { ~j D(P) (5.54)
1
To see this, we first note that the minimal cost of a matching that contains the edge (r, j) from the root r to the vertex j has cost
~j + C (Ti {j}) + 1: C (P) (5.55)
i:i96j
and that C(T) is just the minimum of all of these values. Also, we have the direct identity
C (T C(P). (5.56)
The difference of equations (5,55) and (5.56) is just ~j C(P) + C(Ti {j}), so in the end we see that
C (T) C (T \ {r}) = min { ~j C (P) + C (Ti (5.57)
1
5.5 A Distributional Identity with a Logistic Solution
Equation (5.54), or its expanded equivalent (5.57), motivates us to begin our rigorous analysis by first considering the distributional identity
X 1 min {~j xj} Oo < X < 00, (5.58)
lsi
which we interpret to be an identity for a distribution F that is fully specified by supposing (1) X F, (2) the Xi are independent, (3) Xi F for all i = 1, 2,..., and (4) the sequence {Xi} is independent of the process 0 < ~1 < ~2 < ... which is a Poisson process with constant intensity equal to one.
Naturally, one wants to know about the existence and uniqueness of a distribution F that satisfies this distributional identity. The next lemma provides an answer of striking simplicity.
The Objective Method 47
Lemma 8 (Logistic Solution of a Distributional Identity) The unique solution of the identity (5.58) is the logistic distribution
F(y) = 1/(1 e1), oo < y < oo (5.59)
corresponding to the densityfunction
f (y) = (ey/2 + CY12) 2, 00 < y < 00.
BETTER THAN A PROOF A DERIVATION
Rather than just give a direct verification of this lemma, one does better to see how the probability theory of the distributional identity leads one to discover the connection to the logistic distribution. Thus, for the moment, we assume that the distribution F satisfies the identity (5.58), and we will try to calculate F.
We first note that the distributional identity (5.58) asserts that
1 F(y) = P in {~i Xi} ~~ y
where the {Xi : i = 1, 2,...} on the righthand side are i.i.d. with distribution F and {~i : i = 1, 2, ...} is an independent Poisson process. Therefore, if we consider the collection of points P in R' given by { (~i, Xi) : i = 1, 2,...} we see that P is a Poisson process on (0, oo) x (oo, oo) with intensity dzdF. As a consequence, we have
1 F(y) =P(no point of 'P is in {(z, x) : z x < y}
= exp dz dF(x)
( < Y
= exp {1 F(z y)} dz)
= exp fwy {1 F(u)}du) .
When we differentiate this identity we find
F'(y) = (1 F(y)) (1 F(y)), (5.60)
and the symmetry of this equation implies that the density F() is symmetric.
As a consequence, we see that equation (5.60) is equivalent to
F'(y) = F(y)(1 F(y)),
which is easily seen to be separable. We can rewrite it as
dy = dF = dF + dF
F(1 F) F 1 F
48 David Aldous and J. Michael Steele
and solve by inspection to find
y = log F or F(y) = 1/(1 ey). 1 F
Thus, F is simply is the logistic distribution. It is famous for popping up in odd places, but its appearance here still a bit surprising. The remainder of our analysis does not call on any theory of the logistic distribution beyond the easily checked fact that it has variance 7r 2 /3.
5.6 A Stochastic Process that Constructs a Matching
With the explicit solution of the distributional identity (5.58) in our hands, the next step is to construct a joint process where the distributional identity (5.58) suggests an inclusion criterion that can be used to define a good matching on the PWIT. Specifically, we will construct simultaneously (1) a PWIT and (2) a special collection of random variables that are associated with the edges of the PWIT. The variables in this special collection are intended to mimic the differences D(Ti) that led us to the logistic recursion. To put this plan into action, we will need to introduce some notation that makes more explicit the consequences of cutting a PWIT at one of its edges. NOTATION FOR A SPLIT PWIT
When we take a PWIT and delete the edge (u, v) we get two disjoint trees that we can view as being rooted at u and v respectively. To help keep track of these trees (and the difference functionals that we will associate with them), we assign to each edge of the PWIT a pair of directed edges.; formally, we just distinguish between (u, v) and (v, u). Given any graph G we let Ec denote the set of edges obtained by directing the edges of G, and, if T is a PWIT with edge weights ~,, then we give the same weight to each of the directed edges associated with c. Finally, if e = (u, v) is a directed edge and X(e) is a random variable associated with that edge, we will often simply write X (u, v). In the case of the PWIT, the weight of the directed edge ~,, will also be written as ~(u, v). By definition we have ~(u, v) = ~(v,u), but, typically, X (u, v) and X (v, u) will be different.
Lemma 9 (MAple Tree Process) There exists a process (T, ~., X(e) : e E E7) where T is a PWIT with associated edge lengths {~, : e E Er} and where { X(e) : e E Er} is stochastic process indexed by the directed edges of T with the three following properties:
(i) for each directed edge e = (u, v) E Er we have
X (u, v) = min{~(v, w) X (v, w) : (v, w) E E1, w 7~ u}, (5.61)
(ii) for each e E ET that is directed away from the root of T the random variable X(e) has the logistic distribution, and
(iii) for each pair of oppositely directed edges (u, v) and (v, u) in Er, the random variables X (u, v) and X (v, u) are independent.
The Objective Method 49
............
T (v, u)
U V
T(u, v)
.......... ............
Fig. 10. When one cuts a PWIT at an edge, one obtains two trees that one can label by using the directed edges associated with the cut edge. The tree T(U, v) labelled by the directed edge (u, v) has the root v and contains all of the edges that follow v in the direction of (u, v).
The proof of Lemma 9 is easy and instructive; it shows how one can exploit a distributional identity to construct useful processes on trees and other graphs. To begin, we fix an integer g ~: 1 and then for each directed edge (u, v) with u in generation g and v in generation g + 1 (relative to the root), we create by flat an independent logistic random variable X (u, v). We next use the identity (5.61) to define X(e) for each edge within g generations of the root. This construction gives us a collection C, of random variables whose joint distribution satisfies properties (i), (ii) and (iii) for all vertices in the first g generations. Moreover, the distribution of the collection C, is equal to the marginal distribution of the larger collection C,+, restricted to the first g generations, so by the Kolmogorov consistency theorem there is a collection C,<, such that for each g the marginal distribution of C,,,, restricted to first g generations is equal to the distribution of C, 11
A USEFUL ADDENDUM TO THE TRIPLE TREE LEMMA
Once the triple process is in hand, one can go further, in fact, one can show that the process has some intrinsic independence properties that are not an explicit part of the construction. Moreover, we will find that this additional independence has a key role in our later computations.
First we take a fixed 0 < x < oo, and we condition on the event that there exist an edge a the root that has length x. We call this edge (r, v) and note that as in Figure 10, the edge determines two subtrees of the PWIT that one could label T(r, v) and T(v, r). As we have noted before, a Poisson process conditioned to have a point at x is again a Poisson process when this point is deleted, so by the definition of the PWIT we see that T(u, v) and T(v, u) are conditionally independent copies of the original PWIT. We therefore find an addendum to Lemma 9:
(iv) conditional on the existence of an edge (r, v) from the root having length x, the random variables X (r, v) and X (v, r) are independent random variables with the logistic distribution.
50 David Aldous and J. Michael Steele
Some of the double edges into and out of v
X (U, v) Infinitely many
............. v
edges are not drawn.
X (V, U)
Fig. 11. The edges that feature in the identity for X (u, v) are those directed edges out of v indicated by the solid arrows; note that the edge (v, u) is not one of these. Note also that the weight ~. of an edge does not depend on its direction, but the value of X does.
CONSTRUCTION OF AN EXCELLENT MATCHING ON T
For the triple process (T, ~, X) given by Lemma 9, one can use the additional information carried by the X process to specify a good matching on T. As before, the idea is to use an inclusion criterion that is motivated by the heuristic derivation of the distributional identity (5.58).
To define this matching, we first consider the map W : V > V on the set V of vertices of the PWIT that is given by W(v) = v* where v* is the vertex for which ~, X(e) is minimized when e = (v, v*). The next lemma confirms that the set of edges (v, v*) is a matching., we will subsequently show that it is a good one. More precisely, one can show that this matching is optimal in the sense that no other involution invariant matching of aPWIT can have a lower average expected length per edge.
Lemma 10 (Matching Lemma) The set of edges (v, v*) defined by the rule W(v) = v* is a matching on the special PWIT T.
To prove Lemma 10, we first note that it is exactly equivalent to showing that W is idempotent; that is, we must show that M)* = v for all v E V. Now, by the definition of W and the recursion (5.61) for X, we have
~(v, V*) X(V, V*) = min{~(V5 Y) X(V, Y) (V, Y) E T}
< minWV, Y) X(v, Y) (v, Y) E T, y 54 v*} = X(v*, v), or, a bit more simply
The Objective Method 51
~(v, V*) < X(V, V*) + X(v*' v). (5.62)
On the other hand, if z is a neighbor of v other than v*, then we have
(v, z) X (v, z) > min {~ (v, y) X (v, y) (v, y) E T}
min{~(Vi Y) X(v, Y) (v, Y) E 7, Y 0 Z}
X (Z, v).
This inequality tells us ~(v, z) > X (v, z) + X (z, v) whenever z 7~ v, so we see that v* is the unique neighbor of v that satisfies the inclusion criterion (5.62).
We can therefore see that we could have just as well defined ~o(v) as the unique v* satisfying the criterion (5.62). Finally, if one notes that both sides of the criterion (5.62) are symmetric we see that W(W(v)) = v, just as we hoped to show.
WISDOM OF A MATCHING RULE
Given the maneuvering that led us to the matching rule r* = W(r), one has a right to expect that there is no invariant matching that has a lower expected cost per vertex. The next lemma asserts that this expectation is justified.
Lemma 11 (Optimality on the PWIT) Let M denote any matching of 7_ that is involution invaiiant, If (r, s) is the edge of M that contains the root r of T, then one has
E[~(r, s)] ~~ E[~(r, (5.63)
The proof of Lemma 11 is unfortunately too lengthy to be included here, and even a convincing sketch would run beyond our boundaries, so for most details we must refer the reader to Proposition 18 of Aldous [41, which additionally proves that inequality (5.63) is strict except when the matching M is almost surely identical to the matching defined by W.
5.7 Calculation of a Limiting Constant: 1r2/6
If M~pt is the matching given by Lemma 10 and if r* = W(r) is the vertex that is matched to the root r by Mopt, then the expected value per edge of the weight of the M.pt is simply equal to EK(r, r*)]. This expected value turns out to be surprisingly easy to compute.
Lemma 12
*A W2
E[~(r, ' _r
52 David Aldous and J. Michael Steele
To prove the lemma we first fix 0 < x < oo and then we condition on the event Ax that there exists an edge (r, v) incident to the root of T that has length x. The inclusion criterion (5.62) then tells us r is matched to v if and only if we have
x < X(r, v) + X(v, r).
By the addendum to Lemma 9 we know that conditional on the event A.
the pair (X (r, v), X (v, r)) same distribution as a pair (X, Y) of independent
logistics, so by the uniform intensity of the Poisson process on [0, oo) we have
E[~(r, r*)] = xP(x < X + Y) dx = 1 E[(X + y)21(X + y > 0)].
10 ~ 2
Finally, the symmetry of the logistic distribution lets us rewrite the last expectation as
E[(X + y)2] E [X2] = 7F2 /6, 2
where in the last step we used the fact noted earlier that the logistic distribution has variance 7r 2 /3.
5.8 Passage from a PWIT Matching to a K, Matching
We are now in position to sketch the connections that complete the proof of Theorem 5, and we begin by noting that the observations in hand already give us one half of the theorem. For each n we let M, denote the optimal matching on K,, and we view M,, as a 9,valued random variable by setting the root of M,, to be the root of K,,. By the tightness of the joint process (M,,, K.) we know that for every subsequence of n = 1, 2, ... there is a further subsequence nj such that (Mj, K.j) converges weakly in G. x G.. By Skorohod's theorem and our earlier analysis of the weak limit of K,,, we may even assume that
Mj, K.j) (MJ) with probability one, (5.64)
where T is a PWIT and M is a matching on T. Moreover, from the fact that Mn is randomly rooted, we see that M is involution invariant.
Now, if c(MJ denotes the cost of matching the root in M. and c(M) denotes the cost of the edges that meets the root of T, then from the limit (5.64) and the definition of the topology on G, we have
c(M,J > c(M) with probability one.
From this limit, Fatou's Lemma, and Lemma 12 we then find
lim inf E[CMj)l ~~ E[c(M)l > 2 /6.
noo
Finally, by the generality of the subsequence {nj}, we therefore find in the notation of Theorem 5 that
The Objective Method 53
lim inf 2 E[C,,l ~~ 7r 2 /6. (5.65)
noo n
This proves half of the equality (5.41) of Theorein 5; the bad news is that this is the easy half.
THE HARDER HALF AND ITS Two PARTS
The harder half of Theorem 5 breaks into two natural steps. First one needs to show that there is a lowcost Efeasible matching for K,, for large values of n, and then one needs to show that such an Efeasible matching can be modified to provide an honest matching without substantially increasing the total cost.
To make this plan precise, we first recall that an feasible matching of a graph C with n vertices is a subset M* of edges of G with the property that each vertex of G is in some edge of M* and at most En elements of M* fail to be isolated from the other elements of M*. The fact that one can cheaply convert a lowcost Efeasible matching of K. to an honest lowcost complete matching of K,, is (almost) purely combinatorial. We do not deal here with that part of the argument, but one can refer to Proposition 2 of Aldous [31 for a proof of an analogous result in the context of bipartite matching.
HOW TO USE A MATCHING ON A PWIT To FIND ONE ON K~
We continue to work with the special PWIT ] that was constructed as part of Triple Tree Process of Lemma 9, and we let M denote the matching defined on T that was constructed in Lemma 10. Next, we fix a Q > 0, and we let N. (r, T) denote the neighborhood of radius p about the root r of 1. We view N, (r, 7_) as an element of 9, with root r, and we let N, (r, K.) denote the analogously defined Qneighborhood of the root of K,. Here one should note that N.(r, T) is always a tree since it is a subtree of T, but the corresponding neighborhood of K, given by Ne(r, KJ will not necessarily be a tree, Nevertheless, since p is fixed, the probability that N. (r, KJ is a tree will go to one as n becomes large.
Now, for any G E G, that is a feasible realization of N,(r, T) and for any w c 7, we let q(wIG) denote the conditional probability that r is matched to w in M given the event that the neighborhood Nc(r, T) is isomorphic to G as an element of G.. We will use q(wIG) to guide a sampling process on Kn that we can use to help build a matching.
Specifically, for any 1 < n < oo we define a subset M (g, n) of the directed edges of K,, by the following process: for each u E Kn we choose (u, v) with probability q(vl N,(u, K,)). Also, in just the same way, we define a subset M(&, oo) of the directed edges of ]_ that is, for each u EE T we choose the directed edge (u, v) with probability q (v IN, (u, T)). The key observation is that the event A(p, oo) defined by
A (g, oo) = { r is in exactly two edges of M (Q, oc)
1
54 David Aldous and J. Michael Steele
satisfies the identity
P (A (g, oo)) = 1 77 (Q) where 17 (p) 0 as p4 oo. (5.66)
Thus, if we collapse each directed loop {(u, v), (v, u)} of M (p, oo) to a single undirected edge, then for large 2 we have a high probability of obtaining an honest matching on T.
Now, we consider the analogous events for Kn. Specifically, we let A(p, n) denote the event that (1) the root r of Kn is in exactly two directed edges of M (p, n) and (2) these edges have the form (r, v) and (v, r) for some v. The local weak convergence of Kn to T and the definition of 77(p) then tells us that
lim P(A(p, n)) = 1 71(g). (5.67)
noo
Next, we let M' (p, n) denote the set of undirected edges one obtains by collapsing each directed loop Ev, w), (, v)} of M (p, n) into a single undirected edge. One can then show that the limit (5.67) implies that there is an E(Q) such that s(p) ~ 0 as p , oo for which we have
lim inf P(M'(p, n) is e(p)feasible) > 1 E(p). (5.68)
noo
Finally, to complete our sketch of the proof of Theorem 5, we just note that one can now use local weak convergence to relate the costs of the edges of M' (p, n) to the costs of those in the matching M on T, and, in due course, one can use this correspondence to show that the asymptotic mean cost of M' (Q, n) is at most n/2 times 1r2 /6. This is precisely what one was brought to expect from the calculation in Lemma 12, and thus it completes the sketch.
When one looks back on this construction, one can confirm that the basic idea of sampling from a locally defined conditional distribution is pleasantly general. In fact, there are many problems where a similar sampling method can be applied, and this method of conditional sampling seems to express one of the more fundamental ways that one can exploit local weak convergence. Further elaboration of the sampling construction is expected to be part of Aldous and Steele [71.
5.9 Finally Living Beyond One's Means
One of the original motivations for the study of the assignment problem was the discovery of Karp [371 that under the model of independent uniform costs % the value of A,, provides an impressively tight lower bound on the minimal cost of a tour of the vertex set of the complete directed graph where the cost of using the directed edge (i,j) is cij. This discovery led in turn to sustained exploration of algorithms that use the assignment problem and branch and bound techniques for the travelling salesman problem (cf. Balas and Toth[131), and it also motivated more detailed investigations of the precision of An as an approximation for the TSP cost (cf. Karp and Steele [391).
The Objective Method 55
While the objective method is particularly well focused on issues such as the convergence of the means E(AJ and the determination of this limit, it does not provide ally information on the variance of A,, or other measures of concentration of A,. For such information, other methods must be applied.
For example, Karp and Steele [391 used combinatorial and coupling methods to show that with high probability the greatest cost of an edge in an optimal assignment of size n is 0(log 2 n/n), and Talagrand [621 used this observation to illustrate one of his isoperimetric inequalities to show that there is a constant 0 < c < oo such that
Var(An) :~ c (log n)4 for all n > 1.
n
More recently, F~ieze and Sorkin [23] improved the bounds of Karp and Steele [39] to show that with high probability the longest edge in 0(log n/n), and the same technique was then further refined by Lee and Su [45] to show that there are positive constants cl and C2 such that
cin5/2 < Var(An) :5 G2 (log n) 2. (5.69)
n
Despite the wide gap between the bounds (5.69) they are the best that are currently known. Nevertheless, one certainly expects that in fact there is
2 > 0 such that a 2
Var(A.) as n ~ oo. (5.70)
n
Unfortunately, the objective method is powerless to help, and for the moment at least, the conjecture (5.70) seems to be far over the horizon.
56 David Aldous and J. Michael Steele
6 Problems in Euclidean Space
The objective method is especially well suited for many problems of Euclidean
combinatorial optimization. Typically, one begins with an optimization prob
lem that is driven by a sample {Xi : 1 < i < n} of independent random
variables with the uniform distribution on the unit dcube [0, Ild, so after
choosing a random "rooC I that is uniformly distributed on {1, 2, ..., n} and
independent of the {Xi}, one can take immediate advantage of the classical
fact that
n Ild(X, _ X, : 1 < i < n) d + Po as n ~ oo, (6.71)
where Po = P U {0} and TI is a homogenous Poisson process on R d with uniform intensity equal to 1.
Now, throughout this survey, we have interpreted the indicated convergence as local weak convergence as defined in Section 2, so it may seem unfair to call this result classical. Nevertheless, once one agrees to interpret the two sides of the limit (6.71) as rooted geometric graphs where the root is the vertex at the origin of Rd and where the edge lengths are given by Euclidean distances, then (up to an irrelevant rotation) local weak convergence (6.71) is in fact equivalent to the traditional weak convergence of point process theory.
The bottom line is that for problems of combinatorial optimization for points in R d' the Poisson limit (6.71) assumes the role that Grimmett's Lemma 1 played for the uniform trees and that Aldous's Theorem 3 played for the exponentially weighted complete graphs. Thus, for Euelidean problerns the first step of the objective method comes gratis; the game only begins with the introduction of an optimization problem.
6.1 A Motivating Problem
One of the early motivations for the objective method was a problem sparked by an empirical observation of R. Bland, who noted from simulations of an algorithm for the MST that for n independent uniformly distributed points in the unit square that one consistently finds that
1: 1e12,c>0 asnoc,
eEMST
where lel denotes the usual Euclidean distance from u to v when e = (u, v).
Stimulated by Bland's observation, Steele [58] proved that if the random vari
ables {Xi} are independent and have a distribution p with compact support,
then for all 0 < ce < d one actually has
n (ad)/d ~ ler ,(a, d) f(X)(dc,)ld dx a.s., (6.72)
e(=MST /Rd
where f is the density of the absolutely continuous part of IA and c(a,d) is a constant that depends only on a and d. This result goes beyond Bland's
The Objective Method 57
conjecture in several respects, but, ironically, the subadditive method used in Steele [581 falls short of covering the motivating case a = d. This is where the objective method entered the picture.
BENEFITS OF HINDSIGHT AND A PROPER POINT OF VIEW
Fifteen years ago it was not clear how one could make further progress on Bland's conjecture, but with the benefit of hindsight a good plan is obvious. One simply needs to extend the Poisson convergence (6.71) to the joint convergence the point sets and the corresponding spanning trees.
When this was done in Aldous and Steele [61, there was some mystery to emergence of a forest as the limit of a sequence of minimal spanning trees. Now the reasons for the appearance of the minimal spanning forest are understood more deeply, and, if one simply applies the MST Convergence Theorem of Section 5.3 to the Poisson limit (6.71), one obtains a theorem that explains most of the qualitative behavior of the MST of a random sample from the dcube.
Theorem 7 For the randomly rooted normalized sample
Sn = nIld(X, _ X, : 1 < i < n)
of independent random variables with the uniform distribution on [0, 1]d, one has the joint local weak conveTyence
(5,,, MST(SJ) ~'> (P,,, MSF(P,,)). (6.73)
A QUESTION OF MEANS
On a conceptual level, the limit (6.73) tells one almost everything there is to say about MST(SJ for large n. Nevertheless, there are inevitably technical hurdles to be cleared when one applies Theorem 7 to a concrete problem. For example, to obtain the convergence of the mean of MST(Sn), one needs some information on the uniform integrability of the set of variables {MST(S.) : n = 1, 2.}.
Uniform integrability of this sequence was proved in Aldous and Steele [6] by direct estimates, and later the 9ens geometry" of the MST and spacefilling curves were used in Chapter 5 of Steele [601 to give more inclusive moment bounds. In the end, one finds for all p ~l, 1 and all d there is a constant fid > 0 such that for random uniform samples of size n from the dcube one has
1: leld ~ pd in LP as n ~ oo. cEMST
Moreover, one can identify the limit constant i6d in terms of the limit object by the formula
fid E leld 2 [Y
' ER
58 David Aldous and J. Michael Steele
where R is the set of edges of the MSF of Po that are incident to the root at the origin of Rd.
The objective method almost always gives one a characterization of the limit constant in terms of the limit object, and we have seen examples in Sections 3 through 5 where this characterization could be pressed to provide an exact determination of the limit constant. In this respect, Euclidean prob~ lems are different. Despite tantalizing characterizations such as that provided by formula (6.74), none of the limit constants of finite dimensional Euclidean optimization have been determined analytically. The reason for this failing seems to be that in the Euclidean case, one does not have the selfsimilarity that is crucial to the recursions that lead us to the limit constants in results like Theorem 2 and Theorem 5.
6.2 Far Away Places and Their Influence
Throughout the theory of Euclidean combinatorial optimization, one finds hints that tractable problems must be local in some sense; or, looked at the other way around, one finds that in tractable problems the points that are far away cannot have much influence on the problem's solution in the neighborhood of a nearby point. This intuition can be formalized in several ways, but the recently introduced notion of stability is perhaps most in tune with the philosophy of the objective method.
SUMS OF LOCAL CONTRIBUTIONS
To begin, we consider nonnegative functions of the form ~(x; X) where x E R' and X is a subset of Rd that is locally finite in the sense that each bounded subset of Rd contains only finitely many elements of X. Also, if a is a positive scalar and y Ei Rd we denote the usual dilations and translations of X by
aX :={ax:xEX} and y+X :={y+x :xE X}. If we assume ~ is translation invariant in the sense that
~(y + x; y + X) = ~(x; X) for all y and X, then the sum H~(X) defined by setting
(X) : = 1: ~ (X; X) (6.75)
xEX
is also translation invariant in the sense that H~(y + X) = H~ (X) for all y c Rd. Moreover, sums of this form (6.75) may be used to represent almost all of the functionals of interest in Euclidean combinatorial optimization.
For example, the length of the MST of X may be written as H~ (X) if we take ~(x., X) to be half of the sum of the edges of the MST that are incident to X, so
The Objective Method 59
2 E ]X Y1 1( (X, y) E MST(X)
YEX
For a less obvious example, consider the representation H~ (X) of the number of connected components of the nearest neighbor graph of X. In this case we can take
~(x., X) = 1/card(C.)
where Q, is the component of the nearest neighbor graph of X that contains the vertex x. Essentially all of the Euclidean functionals discussed in Steele [601 or Yukich [651 may be represented as H~(X) for an appropriate choice of ~(x; X).
STABLE FUNCTIONALS AND THE INFLUENCE FUNCTION
One benefit of the representation H~ (X) is that the summands ~(x; X) suggest a natural way to formalize the idea of local dependence. To explain this idea, we first take r > 0 and let B(x; r) denote the Euclidean ball of radius r about x, then, for any locally finite point set S C R d and any integer m > 1 we set
Tn) :== sup (ess sup,,,,{~(0,. (S n B(O,. m)) U A)}
nEN
and where ess sup,,,, denotes the essential supremum with respect to Lebesgue measure on Rd, and where A C Rd \ B(O; m) is a set of cardinality n. Analogously, we define
~(S; m) := inf (ess inf,ln{~(0'~ (S n B(O; m)) U A)}
nEN
and we think of and ~ (S; Tn) and ~(S; m) as measures of the changes that one may make by a iinite additions to S outside a ball of radius m. The interesting case occurs when these two possibilities are comparable for large
M.
Definition 5 (Stable Functionals and the Influence Function)
The functional ~ is said to be stable on the locally finite set S provided that one has
Iiin ~(S5 m) = lim ~(SI m). (6.76)
rn C10 M00
Moreover, when ~ is stable on S, we let ~. (S) denote the value of the common limit and we call (S) the influence function of ~ on 8.
The locally finite sets that are of most interest to us are naturally the realizations of homogeneous Poisson processes, and one should note that if P, is the homogeneous Poisson process with constant intensity r on R d then the sequence of random variables ffiP,; m) : m = 1, 2, ...} is nonincreasing and the sequence {~ (P,; m) : m = 1, 2, ...} is nondecreasing. Thus, both sequences
1
60 David Aldous and J. Michael Steele
converge almost surely, and we see that ~ is almost surely stable on P, if the two limits are equal with probability one. Many of the natural functionals ~(x, X) of Euclidean combinatorial optimization have this property, and there are some useful consequences of this fact.
A WEAK LAw OF LARGE NUMBERS
Now consider a sequence X,, X2,... of independent ddimensional random variables that have a common density f. Next, set X. = {Xl,,X.} and consider the scaled summand
~.(x; X) := ~(nl/dx; nl/dX n (6.77)
together with the corresponding scaled sums
(Xn) := 1: ~.(X; Xn) = 1: ~(n'/'x; nl/dy n
XEX. XEX~
When ~ is stable for almost every realization of any homogeneous Poisson process, these sums typically satisfy a weak of large numbers; in fact, this notion of stability was introduced in Penrose and Yukich [531 for the purpose of framing such weak laws. The next theorem is typical of the results obtained there.
Theorem 8 For each constant 0 < r < oo we suppose that ~ is almost surely stable for the homogenous Poisson process P, with intensity r, and we let ('P,) denote the corresponding influence function. If the sequence
~(nl/dX,; n'1dX,,) : 1 < n < oo } is uniformly integrable, (6.78)
where observations in the sample Xn = {X,, X2,._ Xn} are i. i. d. with density f on Rd, then the influence function satisfies
I[] = E[~.(P,)] < oo for each 0 < r < oo such that f (r) > 0, (6.79) and the normalized sums satisfy
n'fi~(Xn) I [f (x)]f (x)dx in L'. (6.80)
CONSEQUENCES OF HOMOGENEITY
Many of the functionals addressed by Theorem 8 have the property that there is a constant y > 0 such that
~(ax. aX) = a"~(x; X) for all a E R',
and, in this case the righthand side of the limit (6.80) is more simply written as
1
The Objective Method 61
E[e.(P1)] /Rd f(,)(d,y)/dd,. (6.81)
Moreover, when ~ is scale invariant, or homogeneous of order 0, the limit
(6.81) simply boils down to and the limit given by Theorem 8
does not depend on the density of underlying point set. Formula (6.81) also
makes it clear that although ~,, may at first seem subtle to define, it has an
inevitable place in the limit theory of the normalized sums H~.
CONFIRMATION OF STABILITY
Theorem 8 applies to many problems of computational and it provides limit laws for the minimal spanning tree, the knearest neighbors graph Q341, [20]), the Voronoi graph [321, the Delaunay graph, and the sphere of influence graphs Q271, [331). Nevertheless, before Theorem 8 can be applied, one needs to prove the associated summands ~(x, X) are indeed stable on P, for all 0 < r < 00.
In some cases, stability may be proved easily, but, in other cases, the proof may be quite challenging. For example, even the stability of the MST is not easy without the help of some powerful tools. For example, Penrose and Yukich [531 prove stability MST with help from the idea of a blocking set, which is a set of points that effectively isolates the MST in a neighborhood of the origin from any influence by points outside of a certain large disk. This basic structure was introduce by Kesten and Lee [40] (cf. Lemmas 3 and 5), and it provides one of the keys to the of the central limit theory for the MST as further developed by Lee ([431, [441).
A COX PROCESS COUPLING
Finally, we should note that the proof of Theorem 8 given in Penrose and Yukich [53] calls on an interesting coupling argument that is likely to have further applications. The key observation is that one may simulate samples that closely approximate the samples from a general density f with help from a Cox process that has a random but conditionally uniform intensity. This approximation provides one with a new way to convert results for uniform samples to results for nommiform samples, and, in the right circumstances, it seems to have advantages over older methods where one first approximates f by a locally flat density and then creates a coupling. Nevertheless, each method ultimately relies on some sort of smoothness of the sums H~.
6.3 Euclidean Methods and Some Observations in Passing
There are so many connections between percolation theory, minimum spanning trees, and the objective method that we cannot survey them in detail, but we do want to review a few developments that seem particularly informative. The first of these is a recent observation of Bezuidenhout, Crimmett, and L6filer [161 that draws a simple connection between MSTs and the notion of free energy that originates in statistical mechanics.
62 David Aldous and J. Michael Steele
MSTs, FREE ENERGY, AND PERCOLATION CONNECTIONS
To begin, if V is a subset of R' and a > 0, we let F(a, V) denote the graph with vertex set V and edge set consisting of pairs (X, y) of elements of V such that 1x yl < ce. Next, we let 'P denotes the homogeneous Poisson process on Rd with unit intensity, and take S(a) to be the size of the connected component of F(a, P U {0}) that contains 0, the origin of Rd, and, finally, we define K(Ce), the free energy of the Poisson process, by setting
r.(a) = E [11S(a)].
The free energy turns out to have a remarkably simple relationship to the empirical distribution of the edge lengths of the minimal spanning tree T', of the set Vn = Pn 1n, n]d. Specifically, if we consider the empirical distribution of the set of edge lengths of T,
F.(c,) = (JV.1 1)1 E 1(lei < c,),
e.ET,
then we have the succinct limit
lim Fn (a) = 1 r,(a) a.s. and in L'. (6.82)
n,>o
This engaging identity may be explained in just a few lines. The key observation is that if v(a,n) denotes the number of connected components of the graph F(ck,P n [n, n]d), then by counting the missing edges we have
F. (ce) = 1Vnj (a, n) (6.83)
~v~ 1 '
To finish off, one now just needs to note that the Ergodic Theorem for the Poisson process can be used to show
lim v(oz, n)/JVn] = E [11S(a)] := r,(a) a.s. and in V,
n~
so the limit (6.82) follows immediately from the identity (6.83).
Naturally, if one is to make good use of this observation more work is needed, and the real effort in Bezuidenhout, Grimmett, and Uffier [161 goes into the development of the analytical properties of the free energy and its analogs for Bernoulli site percolation. In that setting, they show that the free energy defect H(a) = 1 r,(a) is indeed a distribution function, and they also show that one has the convergence of the corresponding mth moments for all 0 < rn < oo,
/0 ~ a' dFn(ce) > 1000 a'dH(a) a.s. and in V.
1
The Objective Method 63
A SECOND PERCOLATION CONNECTION FOREST vs. TREE
When the minimal spanning forest for the Poisson process was introduce in Aldous and Steele [61, there was a strong suspicion that the minimal spanning forest might actually be a tree at least in low dimensions. This conjecture was confirmed in dimension d = 2 by Alexander [8], Example 2.8, but, the question remains open for dimension d > 3.
The suspicion now is that the MSF of the Poisson process is not a tree if the dimension is sufficiently high, and Theorem 2.5, part (ii), of Alexander [8] provides some indirect evidence for this conjecture. Because of results of Hara and Slade ([28], [291) that show one does not have percolation at the critical point for dimension d > 19 for Bernoulli bond percolation, Alexander's theorem implies that in high dimensions a related MSF does not have any doubly infinite paths. If one relies on the analogies between Bernoulli bond percolation, Bernoulli site percolation, and continuum percolation, Alexander's result suggests in turn that if the MSF of the Poisson process is a tree, then in high dimensions that tree must consist of a sequence of finite trees that are connected to one singly infinite path. Finally, for such a structure to cover all of the point of a Poisson process, this path would then be forced to wander around space in a way that does not seem feasible.
INCLUSION CRITERIA AND COMPUTATIONAL COMPLEXITY
Almost all of the probability problems of Euclidean combinatorial optimization may be framed in terms of a graph G, that depends on a random sample {Xl, X2., Xn} in R', and for many of these problems the subadditive method introduced by Beardwood, Halton and Haminersley [141 provides a competing technology to the objective method. Although the objective method has intuitive and analytical advantages in almost every case where it applies, there are still instances where the only concrete results have been found with the subadditive method.
For example, subadditive methods work quite naturally for traveling salesman problem (TSP), but when one tries to apply the objective method to the TSP one some deep and interesting questions emerge. Specifically, one needs to define an analog 7_ of the traveling salesman tour for the Poisson process in Rd, and in practical terms this means one needs to specify an inclusion criterion that tells one when an edge (x, y) is in T. We were able to provide such an inclusion criterion for the maximal matching problem, and classical results for the MST suggested the defining criterion for the minimal spanning forest. On the other hand, the theory of computational complexity suggest that one is unlikely to find an inclusion criterion that will provide a suitable analog for the TSP of the Poisson process. The basic intuition is that no problem that is NPcomplete for finite sets of points can be expected to have an inclusion criterion that provides a suitable limit object on the Poisson process.
64 David Aldous and J. Michael Steele
FURTHER COMPARISON TO SUBADDITIVE METHODS
In addition to the examples suggested by computational complexity, there are other cases where subadditive methods seem to have an advantage over the objective method. One such situation is provided by the asymptotic analysis of heuristic methods, and the greedy matching problem [111 is perhaps the most natural example.
Nevertheless, there are surely cases where subadditive methods have been used in the past, but where the objective now provides an easier or more informative approach. Confirmed cases on this list include the limit theory for MST with power weighted edges [58] and MST vertex degrees [61]; and, with some systematic effort, the list can probably be extended to include the theory of Euclidean semimatchings [591, optimal cost triangulations [561 and the Kmedian problem [35]. Moreover, there are many cases where the objective method quickly gives one the essential limit theory, yet subadditive methods appear to be awkward to apply. Here the list can be made as long as one likes, but one should certainly include the limit theory for Voronoi regions [32] and the sphere of influence graphs Q27], [33]).
6.4 Recurrence of Random Walks in Limits of Planar Graphs
Most of the local weak limits that we have surveyed here have their origins in combinatorial optimization, but the objective method and the theory of local weak convergence can also be applied to problems that are more purely probabilistic. One striking example is given by the recurrence theory for random walks on planar graphs recently given by Benjamini and Schramm [15].
Theorem 9 Suppose that G is a 9,valued random variable that is a weak limit in G~ of a sequence {Gn [X,] : n = 1, 2,...} of uniformly rooted random graphs where
(1) for each n > 1, the graph G. is planar with probability one, and
(2) max{deg(v) : V E Gn} 5 M < 00 for all n > 1, then with probability one, the random walk on the graph G(w) is recurrent.
This theorem is clearly part of the theory of local weak convergence, but the theorem and its proof differ greatly the other results in this survey. Even a sketch of the proof of Theorem 9 would take us far off our path, but we should at least note that one key to the proof is the remarkable Circle Packing Theorem of Koebe [411. This theorem says that if a finite graph G = (V, E) is planar, then there exist a collection C of circles in the plane with disjoint interiors and a onetoone correspondence 0 : V ~ C such that (X, Y) E E if and only if the circles 0(x) and 0(y) make contact.
From the perspective of this survey, there are two attractive features of Theorem 9. First, its very difference serves as a reminder that there must be other, as yet unimagined, areas where the theory of local weak convergence can play a natural role. Second, Theorem 9 suggest a useful generic question:
The Objective Method 65
What are the properties of a sequence of random rooted finite graphs that are preserved under local weak convergence? In many cases this question is bound to yield only pedestrian answers, but the BenjaminiSchramm Theorem shows that there are indeed cases when the generic question can lead one to remarkable results.
7 Limitations, Challenges, and Perspectives
This treatment of the objective method is part survey and part tutorial. The basic intention has been to make the objective method more accessible, while also illustrating its effectiveness on problems with some subtlety and substance. Any survey has an element of advocacy, and here we certainly hoped to show the objective method in a positive light. Nevertheless, one should be aware of the method's limitations.
SOME INTRINsic LimITATIONS
First, the objective method deals with reasonably complex entities such as the Skeleton Tree PGW' (1), the PWIT, or de rninimus the Poisson process in Rd. In many cases, one can make immediate and effective use of these offtheshelf structures, but in more original problems, one is forced to invent (or discover) new objects. Any such object begins life as the weak limit in G, of a sequence of random finite graphs with uniformly distributed random roots, but, if it is to be used with effect, one almost always needs an independent characterization. This highly constructive feature of the objective method means that it seldom provides short proofs. Ironically, this remains so even when the result that one wants to prove is "conceptually clear" from the point of view offered by the limiting object.
Second, when one focuses directly on a limit object, one tends to lose almost all trace of how one gets to that limit. As a consequence, the objective method is typically ill suited for obtaining information about rates of convergence. To be sure, there are special circumstances where rates may be found; for example, the recursion used in our analysis of the maximum partial matching problem provides at least some rate information. Nevertheless, this is an exception that reflects the hybrid nature of one particular problem.
Third, the theory of local weak convergence has a substantial limitation on the type of information that it can provide. Basically, the method only addresses those problems that may be reduced to a calculation in a neighborhood of a randomly chosen root. For example, we are able to address totalsumoflength problems only because of a lucky piece of arithmetic; the sum of the lengths of the edges in a graph with n vertices is just n times the expected value of half of the sum of the lengths of the edges incident to a randomly chosen root.
Many similar sounding problems are not so lucky. For example, consider the problem of calculating the expected length of the longest path in the
1
66 David Aldous and J. Michael Steele
MST of a uniform random sample of size n from the unit square. This problem sounds quite similar to the total length question, but, on reflection, one finds that it cannot be reduced to a calculation that just depends on a neighborhood of a random root; thus, it falls outside of the scope of the objective method. For the same reason, the objective method cannot help us show that the variance of A, is asymptotic to a 2 /n, nor can it help with more general measures of concentration. Local weak convergence is always a prisoner that cannot escape the neighborhoods of a random root.
A CURRENT CHALLENGE: THE LARGEST INDEPENDENT SET
The limitations on the objective method are substantial, but one still finds a wide range of challenges for which the objective method offers the most likely path for progress. Here we will just discuss one problem that we find especially attractive.
An independent set S in a graph C is collection of vertices such that no two elements of S are joined by an edge of G, and the independence number a(G) is the maximum cardinality of an independent set in G. The independence number is one of the most studied quantities of graph theory, and it also leads one to an interesting test of potential of the objective method.
For any integer k > 3 there is a standard and model for a random kregular graph with n vertices, and the properties of these graphs a studied in detail in the monographs of Bollob6s [171 and Janson, et.al. [361. Now, if one starts to think about the independence number of a random kregular graph Gk,n, one almost immediately comes to conjecture that there exists a constant 0Q > 0 such that
lim nlE[a(Gk,J] = ak. (7.84)
noc
The origins of this conjecture are lost in the mists of time, and no doubt it has occurred independently to many people. From the perspective of this survey, one of its interesting features is that it offers several parallels to the random assignment problem as that problem was understood circa 1986.
For example, no one has yet proved that indicated limit (7.84) actually exists yet no one doubts that it does. Moreover, some concrete limits have already been place on the possible size Of ak. Specifically, Frieze and Suen [241 studied the greedy algorithm for constructing an independent set for G3,n, and they have proved
liminfn'E[a(C3,,,)] ~~ 61og(3/2) 2 = 0.432 ... (7.85)
noc
Finally, the methods of statistical mechanics have already found a role. In particular, Hartmann and Weigt [31] have used the replica method to study the independence number in a model that is closely related to Gk,~.
When we view conjecture (7.84) from the perspective of the objective method, we do not need long to hit on a promising plan. In fact, we can
1
The Objective Method 67
take one step almost automatically, since it is reasonably obvious that Gk,. converges in G, to the randomly rooted infinite kregular tree T. Thus, the intrigue only begins once one starts to look for a large involution invariant independent subset S of T, and, remarkably enough, an attractive candidate comes to mind almost immediately.
To define our candidate S, we first flip a coin. If it comes up heads, we take the root r to be an element of S, and then we add to S every other vertex on the paths that descend from r, but if the coin comes up tails, we take all of the children of r to be in S and then add to S every other vertex on the paths that descend from these children. The set S is clearly an independent set in 7, and one can check that S is involution invariant. No checking is needed to see that S has density onehalf, and, since no independent set can have density larger than onehalf, the principles developed in Section 3 might seem to suggest that we are well on our way to proving that a, = 1/2 for all r.
Unfortunately, we meet a bump in the road. Our candidate S cannot be realized as the weak limit of a sequence of independent subsets Of G,,,k, even though S does satisfy the necessary condition of involution invariance. The problem with S is that it exhibits a extreme form of longrange dependence, and, in this particular case, one can exploit that dependence to show that S is not the weak limit of sequence independent subsets Of Gn,k as n+ oo,
Thus, we come back to the challenge of defining an independent subset T that can be realized as the limit of independent subsets of the sequence G,,k, n = 1, 2,.... One natural thought is that it may be possible to characterize ak as the maximum density of an involution invariant random independent set in T that does not exhibit some particular type of longrange dependence, but, so far, there are only the faintest hints how one might specify the precise form of dependence that one should rule out.
SOME FINAL PERSPECTIVES
One can never know how a mathematical theory will evolve, and even wellinformed guesses are still just guesses. Nevertheless, the objective method and the theory of local weak convergence have a certain inevitability to them that invites speculation.
At the most concrete level, the objective method helps one to decouple two basic tasks: the proof of the existence of a limit and the determination of a limit constant. This contrasts with analytic methods where techniques based on recursions and generating functions typically address these two tasks simultaneously. It also contrasts with subadditive methods where one may prove the existence of a limit and then be left without any serious indication of its numerical value.
At a more abstract level, the objective method helps one focus on the most essential elements of a complex limit theorem. Just m one finds in the traditional theory of weak convergence that Brownian motion appears as the limit of many different processes, one finds in the theory of local weak
68 David Aldous and J. Michael Steele
convergence that basic objects such as the skeleton tree and the PWIT appear as the limit of many different processes. These new structures do not have the amazing universality of Brownian motion, yet they do have some measure of universality. Moreover, even though these semidiscrete structures are simpler than Brownian motion, they still have substantial internal structure that can support concrete computation; the calculation in Lemma 4 showing that ((3) is the average length per vertex of the PWIT provides one example. Finally, the list of objects is surely far from exhausted, and perhaps, it has only been begun.
One surely expects to see new local weak limits with all the charm of the skeleton tree or the PWIT. Recent support for this view is offered by Angel and Schramm (2002) which proves local weak convergence of uniform random triangulations of the sphere with n vertices to a limit random triangulation of the plane and by Gamarnik (2002) where arguments motivated by the PWITbased analysis of the random assignment are applied to a linear programming relaxation of random KSAT.
Finally, we should note that the objective method is tightly bound with the theory of recursive distributional equations. For example, the fixed point equations (3.12) and (5.58) were at the heart of our analysis of the minimum cost perfect matching problems, and this situation seems to be typical. In fact, equations that characterize an unknown distribution by equality in distributional with a function of a collection of independent copies of itself arise in a wide variety of probabilistic settings. Aldous and Bandyopadhyay (2002) provide a survey of such equations, both with and without direct connections to the objective method.
Acknowledgements: The authors are pleased to thank Kenneth Alexander, Harry Kesten, Sungehul Lee, Mathew Penrose, Gordon Slade, Z. Su, and Joseph Yukich for kindly providing preprints, references, and comments. The research of David Aldous is supported by the NSF under Grant DMS9970901.
References
1. Aldous, D.J. (1990): A random tree model associated with random graphs.
Random Structures Algorithms, 1, 383402.
2. Aldous, D.J. (1991): Asymptotic fringe distributions for general families of
random trees. Arm. Appl. Probab. 1, 228266.
3. Aldous, D.J. (1992): Asymptotics in the random assignment problem. Probab.
Th. Rel. Fields, 93, 507534.
4. Aldous, D.J. (200l): The ~(2) limit in the random assignment problem. Random
Structures Algorithms, 18, 381418.
5. Aldous, D.J. and Bandyopadhyay, A. (2002): A Survey of Maxtype Recursive
Distributional Equations. Technical Report, U.C. Berkeley.
The Objective Method 69
6. Aldous, D.J. and Steele, J.M. (1992): Asymptotics of Euclidean Minimal Spanning Trees on Random Samples. Probab. Th. Rel. Fields, 92 247258.
7. Aldous, D.J. and Steele, J.11. (2002): The asymptotic essential uniqueness property for minimal spanning trees on random points, manuscipt in preparation.
8. Alexander, K.S. (1995) Percolation and minimal spanning forest in infinite graphs. Arm. Probab. 23, 87104.
9. Alm, S.E. and Sorkin, G.B. (2002): Exact expectations and distributions for the random assignment problem. Combinatorics, Probability, and Computing (to appear).
10. Angel, 0. and Schramm, 0. (2002): Uniform Infinite Planar Triangulations, arXiv:math.PR/0207153.
11. Avis, D., Davis, B., and Steele, J.M. (1988): Probabilistic analysis of a greedy heuristic for Euclidean matching. Probability in the Engineering and Information Sciences, 2, 143156.
12. Avram, F. and Bertsimas, D. (1992): The minimum spanning tree constant in geometric probability and under the independent model: a unified approach. Annals of Applied Probability, 2, 113130.
13. Balas, E. and Toth, P. (1985): Branch and bound methods. In The Traveling Salesman Problem: A Guided Tour of Combinatorial Optirnzation. Lawler, E.L, Lenstra, J.K, Rinnooy Kan, A.H.C., and Smoys, D.B. (eds), Wiley, NY.
14. Beardwood, 1, Halton, J.H., and Hammersley, J.M. (1959): The shortest path through many points. Proceedings of the Cambridge Philosphical Society, 55, 299327.
15. Benjamini, I. and Schramm, 0. (2001): Recurrence of distributional limits of finite planar graphs. Electronic Joural of Probability, 6, Paper No. 23, 113.
16. Bezuidenhout, C., Grimmett, Q, and L6filer, A. (1998): Percolation and minimal spanning trees. J. Statist. Phys., 92, 134.
17. Bollob19, B. (1985): Random Graphs. Academic Press, London.
18. Devroye, L. (1998): Branching processes and their application in the analysis of tree structures and tree algorithms. In M. Habib, editor, Probabilistic Methods for Algorithmic Discrete Mathematics, SpringerVerlag.
19. Dyer, M. E., Frieze, A. M., and MeDiarmid, C. (1986): On linear programs with random costs. Math. Programming, 35, 316.
20. Eppstein, D., Paterson, M.S. and Yao, F.F. (1997): On nearestneighbor graphs. Discrete Comput. Geom., 17, 263282.
21. Frieze, A.M. (1985): On the value of a random minimum spanning tree problem. Discrete Appl. Math., 10, 4756.
22. Frieze, A.M. and MeDiarmid, C3.11. (1989): On random minimum lenght spanning trees. Combinatorica, 9, 363374.
23. Frieze, A. and Sorkin, G.B. (2001): The probabilistic relationship between the assignment problem and asymmetric traveling salesman problems. Proceedings of SODA, ACM Publishers, 652660.
24. Frieze, A. and Suen, S. (1994): On the independence number of random cubic graphs. Random Structures and Algorithms, 5, 640664.
25. Gamarnik, D. (2202): Linear Phase Transition in Random Linear Constraint Satisfaction Problem. Technical Report: IBM TA. Watson Research Center.
26. Grimmett, C.R. (1980): Random labelled trees and their branching networks. J. Austral. Math. Soc. (Ser. A), 30, 229237.
27. Fiiredi, Z. (1995): The expected size of a random sphere of influence graph, Intuitive Geometry, Bolyai Mathematical Society, 6, 319326.
1
1
70 David Aldous and J. Michael Steele
28. Hara, T. and Slade, G. (1990): Meanfield critical behaviour for percolation in high dimensions. Commun. Math. Phys., 128, 333391.
29. Hara, T. and Slade, G. (1994): Meanfield behaviour and the lace expansion. Probability and Phase Transition (G. Grimmett, ed.), Kluwer, Dordrecht.
30. Harris, T.H. (1989): The Theory of Branching Processes. Dover Publications, New York.
31. Hartmann, A.K. and Weigt, M. (200l): Statistical mechanics perspective on the phase transition in vertex covering of finiteconnectivity random graphs. Theoretical Computer Science, 265, 199225.
32. Hayen, A. and Quine, M.P. (2000): The proportion of triangles in a PoissonVoronoi tessellation of the plane. Adv. in Appl. Probab., 32, 6774.
33. Hitczenko, R, Janson, S., and Yukich, J.E. (1999): On the variance of the random sphere of influence graph. Random Structures Algorithms, 14, 139152.
34. Henze, N. (1987): On the fraction of random points with specified nearest neighbor interrelations and degree of attraction. Adv. Appl. Prob., 19, 873895.
35. Hochbaum, D. and Steele, J.M. (1982): Steinhaus' geometric location problem for random samples in the plane. Advances in Applied Probability, 14, 5567.
36. Janson, S., Luezak, T., and Rucifiski (2000): Random Graphs. Wiley Interscience Publishers, New York.
37. Karp, R.M. (1979): A patching algorithm for the nonsymmetric traveling salesman problem. SIAM Journal on Computing, 8, 561573.
38. Karp, R.M. (1987): An upper bound on the expected cost of an optimal assignrnent. In: Discrete Algorithms and Complexity: Proceedings of the JapanU. S. Joint Seminar, Academic Press, New York.
39. Karp, R.M. and Steele, J.M. (1985): Probabilistic analysis of heuristics. In The Traveling Salesman Problem: A Guided Tour of Combinatorial Optirnzation. Lawler, E.L, Lenstra, J.K, Rinnooy Kan, A.H.G., and Smoys, D.B. (eds), Wiley, NY, 181205.
40. Kesten, H. and Lee, S. (1996): The central limit theorem for weighted minimal spanning trees on random points. Annals of Applied Probability, 6, 495527.
41. Koebe, P. (1936): Kontaktprobleme der Konformen Abbildung. Ber. Sschs. Akad. Wiss. Leipzig, Math.Phys. KI. 88, 141164.
42. Kurtzberg, J. M. (1962): On approximation methods for the assignment problem. J. Assoc. Comput. Mach., 9, 419439.
43. Lee, S. (1997): The central limit theorem for Euclidean minimal spanning trees I. Annals of Applied Probability, 7, 9961020.
44. Lee, S. (1999): The central limit theorem for Euclidean minimal spanning trees Il., Advances in Applied Probability, 31, 969984.
45. Lee, S. and Su, Z. (200l): On the fluctutation in the random assignment problem. Technical Report, Yonsei University Department of Mathematics, April 2001.
46. LovAsz, L. and Plummer, M.D. (1986): Matching Theory. Annals of Discrete Mathematics, vol. 29. NorthHolland Publishers. Amsterdam.
47. Mahmoud, H.M. (1992): Evolution of Random Search Trees, Wiley, New York, 1992.
48. Matougek, J. and Negtfil, Y. (1998): Discrete Mathematics, Oxford Universtity Press, Oxford.
The Objective Method 71
49. Meyn, S.P. and Tweedie, R.L. (1993): Markov Chains and Stochastic Stability, SpringerVerlag, New York.
50. M6zard, M. and Parisi, G. (1987): On the solution of the random link matching problem. J. Physique, 48, 14511459.
51. Parisi, G. (1998): A conjecture on radorn bipartite matching. ArXiv Condmat 9801176.
52. Penrose, M.D. (1996): The random minimal spanning tree in high dimensions. Ann of Probability, 24 19031925.
53. Penrose, M.D. and Yukich, J.E. (2002): Weak laws of large numbers geometric probability, Annals of Applied Probability, to appear.
54. Propp, J. and Wilson D. (1998): Coupling from the past: a user's guide, In D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability, number 41 in DIMACS Ser. Discrete Math. Theoret. Comp. Sci., pages 181192.
55. Steele, J.M. (1981): Subadditive Euclidean functionals and nonlinear growth in geometric probability. Annals of Probability, 9, 365376.
56. Steele, J.M. (1982): Optimal triangulation of random samples in the plane. Annals of Probability, 10, 548553.
57. Steele, J.M. (1987): On Frieze's ~(3) limit for the lenths of minimal spanning trees, Discrete Applied Mathematics, 18, 99103.
58. Steele, J.M. (1988): Growth rates of Euclidean minimal spanning trees with power weighted edges. Annals of Probability, 16, 17671787, 1988.
59. Steele, J.M. (1992): Euclidean semimatchings of random samples. Mathematical Programming, 53, 127146, 1992.
60. Steele, J.M. (1997): Probability Theory and Combinatorial Optimization, NSFCBMS Volume 69. Society for Industrial and Applied Mathematics, Philadelphia.
61. Steele, J.M., Shepp, L.A. J.M. Eddy, W. (1987): On the number of leaves of a Euelidean minimal spanning tree. J. Appl. Probab., 24, 809826.
62. Talagrand, M. (1995): Concentration of measure and isoperimetric inequalities in product spaces. Publ. Math. IHES, 81, 73205.
63. Vitter, J.S. and Flajolet, P. (1990): Analysis of algorithms and data structures. In Handbook of Theoretical Computer Science, volume A: Algorithms and Complexity (Chapter 9), NorthHolland, 431524.
64. Walkup, D. W. (1979): On the expected value of a random assignment problem. SIAM J. Comput., 8, 440442.
65. Yukich, J.E. (1998): Probability Theory of Classical Euclidean Optimization Problems, Lecture Notes in Mathematics, 1675, SpringerVerlag, New York.
Index
Aldous, D.J., 2, 8, 10, 30, 41, 44, 50, 9,5
52, 53, 55, 56, 62 g.,5
Alexander, K.S., 62, 67 as metric space, 5
Alm, S.E., 2 geometric graphs, 5
assignment problem, 1 greedy matching
variance conjecture, 54 on PWIT, 36
Avis, D., 63 Grimmett's Lemma, 8
Avram, R, 44 Grimmett, G., 5, 8, 13, 60, 61
Balas, E., 53 Halton, J.H., 62
Beardwood, 1, 62 Hammersley, J.M., 62
Benjamini, L, 4, 63 Hara, T., 62
Bertsimas, D., 44 Harris, T.H., 31
Bezuidenhout, C., 60, 61 Hartmann, A.K., 65
Bland, R., 55 Hayen, A., 60, 63
Bollobfis, B., 65 Henze, N., 60
bonus, 12 Hitczenko, R, 60, 63
computational complexity, 62 Hochbaum, D., 63
convergence in 6 inclusion criteria, 62
convergence in 6 independent set conjecture, 65
Cox process coupling, 60 influence function, 58
Davis, B., 63 involution invariance, 37
Devroye, L., 8 definition, 38
distributional identity, 16, 17, 45 MST convergence, 39
Dyer, M.E., 2 Janson, S., 60, 63, 65
Eppstein, D., 60
Euclidean MST, 55 K~, 28
Karp, R.M., 1, 53, 54
fixed point equation, 16 Kesten, H., 60
probabilistic solution, 22 Koebe, P., 63
Flajolet, P., 10
forest vs. tree, 61 Lee, S., 54, 60, 67
free energy, 61 local weak convergence, 1, 5, 6
Frieze's ~(3) theorem, 33 Lbffler, A., 60
Frieze, AM, 2, 4, 33, 34, 40,54, 65 logistic distribution, 45
Ffiredi, Z., 60, 63 Loviisz, L., 11
1
74 Index
Luezak, T., 65 definition, 29
length per vertex, 32
Mahmoud, H.M., 10 limit theorem, 29
matching
on K, 51 Quine, M.P., 60, 63
on PWIT, 49
matching lemma random planar graphs, 63
on PWIT, 49 random regular graphs, 65
Matou~ek, 1, 31 recurrence, 63
maximal weight partial matching, 10 recursion, 12
McDiarmid, G, 2, 34 rooted geometric graph, 5
meanfield model, 27 Rucifiski, 65
Meyn, S.P., 23
M6zard, M., 2 Schramm, 0., 4, 63
minimal spanning forest, 31 skeleton tree, 7
minimal spanning tree Slade, G., 62
in a graph, 31 Sorkin, C.B., 2, 54
MSF, 31 stability, 57
MST standard construction, 7
Euelidean space, 55 Steele, J.M., 2, 4, 34, 41, 44, 5356, 58,
in a graph, 31 62
MST convergence theorem, 40 Su, Z., 54
subadditive methods, 62
NeWil, Y., 31 Suen, S., 65
objective method, 1 Talagrand, M., 54
future, 66 T, 7
limitation, 64 Toth, R, 53
triple tree process, 47
Parisi's conjecture, 2 Tweedie, R.L., 23
Parisi, G., 2
partial matching problem, 2 Vitter, J.S., 10
Paterson, M.S., 60
Penrose, M.D., 4, 59, 60, 67 Walkup, D. W., 1
percolation, 61 weak law of large numbers
perfect matching for stable functionals, 59
minimal cost, 35 Weigt, M., 65
Plummer, M.D., 11 Wilson D., 24
PGW(I), 8
Propp, 1, 24 Yao, F.F., 60
PWIT Yukich, J.E., 4, 5860, 63, 67