

J. Michael Steele, Princeton University

1. Introduction

The purpose of this report is to show the applicability of some recent work in computer science to a problem of current interest in robust regression. In particular, point-line duality and line-sweep methods are applied to provide an algorithm for computing least median of squares regression lines. The algorithm which is obtained has a worst-case time complexity of O(n² log n) using only O(n) space. This report should be thought of as an introduction to the more detailed papers Steele and Steiger (1986) and Souvaine and Steele (1986).
First, there is a brief review of some notions of robust regression and the motivation for regression methods with a high breakdown point. The least median of squares regression method is introduced, and its computational issues are reviewed. The third section discusses point-line duality and its application to the problem of least median of squares regression. The fourth section develops the key observations of the sweepline method and sketches an algorithm for least median of squares regression.
The final section provides information about a second algorithm which is detailed in Souvaine and Steele (1986). The second algorithm uses more sophisticated data structures and has worst-case time complexity of only O(n²), but it suffers from an O(n²) space requirement. Additionally, the final section reviews some other recent work in computer science which can be expected to have an impact on work in computational statistics.

2. Notions of Robust Regression and Breakdown Point
The breakdown point of an estimator is (roughly) the smallest amount of contamination of the data which can cause the estimator to take on arbitrarily large values. The first extensive discussion of the breakdown point is given in Donoho and Huber (1983), but they trace the concept to the thesis of F.R. Hampel (1968). The discussion of Donoho and Huber lends much credence to the notion that the crude notion of breakdown is a very useful measure of robustness, especially as applied in small sample contexts.
Information about the breakdown properties of many robust regression proposals is provided in the key paper by Rousseeuw (1984) on Least Median of Squares regression (LMS regression), and LMS regression is found to compare favorably with all current proposals.
Formally, if we have data (xi, yi), 1 ≤ i ≤ n, and we wish to fit a line y = αx + β, we consider the set of residuals ri = yi − αxi − β associated with α and β. We then choose as estimates the α̂ and β̂ which minimize the figure of merit

    f(α, β) = median_{1 ≤ i ≤ n} ri².

One point to note is that if the median is replaced by the mean, then the corresponding figure of merit would lead to a rephrasing of the procedure of ordinary least squares.
The first considerations of the computation of LMS regression are in Rousseeuw (1984) and Leroy and Rousseeuw (1984), but the first formal study was given in Steele and Steiger (1986). For (xi, yi) in general position, it was proved that f(α, β) has at least cn² local minima. One consequence of this large number of local minima is that traditional local optimization methods will prove fruitless for determining α̂ and β̂.
One must confront a combinatorial problem, and Steele and Steiger (1986) showed that there is a reduction to a tractable combinatorial procedure which yields an exact minimization of f via a finite search with worst-case time complexity O(n³).
Souvaine and Steele (1986) give two faster algorithms for computing LMS regressions. The first of these depends upon the technique of point-line duality and the sweepline method. This algorithm was shown to have worst-case time complexity O(n² log n) with a space requirement of O(n). The second algorithm requires more complex data structures and needs space of size O(n²), but it is able to provide worst-case time complexity of O(n²). For the purpose of this exposition, perhaps the best objective is to review just the first of these two methods and suggest how such technology is used in LMS regression. Suggestions are also made that might be useful in related work.

3. Point-Line Duality and Its Application
The duality which will concern us here is given explicitly by the transformation T on points and lines which takes the point (a, b) to the line y = ax + b and which takes the line y = ax + b to the point (−a, b).
Under this transformation, the point P that is determined by two lines L1 and L2 is mapped to the line TP which is determined by the two points TL1 and TL2. Likewise, the line L determined by two points P1 and P2 is mapped to the point TL determined by the two lines TP1 and TP2. These relations lie at the base of duality, and they are shared by the classical transformation S of Poncelet which maps the point (a, b) to the line ax + by + 1 = 0 and maps the line ax + by + 1 = 0 to the point (a, b). It is interesting to note that S² = I, but T² ≠ I, where I is the identity mapping. This small loss of mathematical symmetry is well compensated by other properties of T which will be identified shortly.
In statistical work the point/line duality T has been used most recently in the work of Johnstone and Velleman (1985) which gives an algorithm for a different robust regression method, the resistant line. In data analysis and in simple linear regression, duality has also been used by Emerson and Hoaglin (1983), Dolby (1960), and Daniels (1954).
The benefit of the transformation T is that, in addition to the basic duality properties, it has some important order invariance properties. The first of these ordering properties is that the transformation T preserves the relationship of
"above" and "below" for points and lines. That is, if P is above L then TP is above TL. Secondly, the transformation T preserves the vertical distance between points and lines; TL is above TP by exactly the same distance that L is above P, etc. Understandably, this property is crucial in a problem where a median residual is to be minimized over a selection of lines. One simply has to have some form of isometric invariance in the duality if it is to prove at all useful.

The final important order invariance property of T is that if the slope of L is larger than the slope of L′, then the x-coordinate of TL is smaller than the x-coordinate of TL′. This relation is trivial from the definition, but it is still very useful in practical applications.
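These invariances can be checked numerically. In the sketch below (function names and the coefficient-pair representation are mine), T sends the point (a, b) to the line y = ax + b; for the order and distance invariances above to hold simultaneously, the line y = cx + d must be sent to the point (−c, d), and that is the convention checked here:

```python
def T_point(p):
    # T maps the point (a, b) to the line y = a*x + b, kept as the pair (a, b).
    return p

def T_line(l):
    # T maps the line y = c*x + d to the point (-c, d).
    c, d = l
    return (-c, d)

def height(point, line):
    # Signed vertical height of the point above the line.
    x, y = point
    c, d = line
    return y - (c * x + d)

P = (1.5, 4.0)            # a sample point (x, y)
L = (2.0, -3.0)           # the line y = 2x - 3

# Order and vertical distance are preserved: P sits above L by the same
# signed amount that the line T(P) sits above the point T(L).
assert height(P, L) == -height(T_line(L), T_point(P))

# Larger slope maps to smaller x-coordinate.
assert T_line((5.0, 0.0))[0] < T_line((2.0, 0.0))[0]

# T is not an involution: applying T twice reflects the slope.
assert T_line(T_point((1.5, 4.0))) == (-1.5, 4.0)
```

The three assertions mirror the three properties used in the sequel: order preservation, isometry in the vertical direction, and slope reversal.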
In order to phrase LMS regression as a finite problem which can be precisely dualized, it is worthwhile to introduce some terminology relevant to its local structure. First, for any α and β the line l_{α,β} = {(x, y): y = αx + β} defines residuals ri(α, β) which are functions of α and β. We can typically write ri(α, β) as ri without fear of confusion. We say the line l_{α,β} bisects three distinct points (x_{i_j}, y_{i_j}), j = 1, 2, 3, if all of the r_{i_j} are of the same magnitude r but not all have the same sign. If x_{i_1} < x_{i_2} < x_{i_3} and r_{i_1} = −r_{i_2} = r_{i_3}, we say l_{α,β} equioscillates with respect to the points.
It was proved in Steele and Steiger (1986) that the LMS regression line must be an equioscillating line. Since there are at most O(n³) such lines (one for each triple of data points), a naive algorithm would be to examine them all. Obviously, any algorithm faster than O(n³) must work differently.
Given an equioscillating line l_{α,β}, there are two related lines with slope α: the line L1 determined by the points P1 = (x_{i_1}, y_{i_1}) and P3 = (x_{i_3}, y_{i_3}), and the line L2 which goes through the point P2 = (x_{i_2}, y_{i_2}). A key property of the LMS regression line is that the number K of points between L1 and L2 must satisfy

    K = (n − 4)/2 if n is even, and K = (n − 5)/2 if n is odd,

provided n > 3 (cf. Steele and Steiger (1986), Main Lemma).
One simple method of determining the LMS regression line can now be given. For each triple of data points P1, P2, and P3 which are ordered by x-coordinate, we first determine lines L1 and L2 as above. Next we find such a triple with exactly K data points between L1 and L2 and such that the vertical distance between L1 and L2 is minimized. The LMS regression line is the line which goes exactly between L1 and L2.
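The simple method just described can be transcribed almost directly; the following unoptimized sketch (variable names mine, data hypothetical) performs the search over x-ordered triples:

```python
import statistics
from itertools import combinations

def lms_line_by_triples(points):
    # Direct transcription of the triple search described above (a sketch;
    # assumes general position and n > 3).  Returns (slope, intercept).
    pts = sorted(points)
    n = len(pts)
    K = (n - 4) // 2 if n % 2 == 0 else (n - 5) // 2
    best = None
    for (x1, y1), (x2, y2), (x3, y3) in combinations(pts, 3):
        a = (y3 - y1) / (x3 - x1)        # slope of L1 through P1 and P3
        b1 = y1 - a * x1                 # intercept of L1
        b2 = y2 - a * x2                 # intercept of the parallel L2 through P2
        lo, hi = min(b1, b2), max(b1, b2)
        between = sum(lo < y - a * x < hi for x, y in pts)
        if between == K and (best is None or hi - lo < best[0]):
            best = (hi - lo, a, (lo + hi) / 2)   # line midway between L1 and L2
    _, a, b = best
    return a, b

pts = [(0, 0.0), (1, 1.2), (2, 1.8), (3, 3.1), (4, 3.9), (5, 9.0)]
a, b = lms_line_by_triples(pts)
# The point (5, 9.0) is a gross outlier, yet the fitted line tracks the
# remaining points, and its median squared residual is tiny.
print(a, b, statistics.median((y - a * x - b) ** 2 for x, y in pts))
```

This brute-force form exists only to make the combinatorial characterization concrete; the duality and sweepline machinery of the following sections is what makes the search efficient.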
We can now see the rephrasing of our problem obtained by applying T to the data points and the lines they determine. The condition that there are K points between the lines L1 and L2 becomes the condition that there are K lines between the points TL1 and TL2. Further, TL1 is the point determined by the two lines TP1 and TP3; TL2 is the point on the line TP2 which has the same x-coordinate as TL1.

Our algorithmic problem in dual can now be expressed as follows:

    Given n lines Li, 1 ≤ i ≤ n, in general position in the plane with intersection points Pij, 1 ≤ i < j ≤ n, find that line L* and intersection point P* such that among all line-point pairs (L, P) which have exactly K of the Li cutting the vertical segment S joining L and P, the pair (L*, P*) has the smallest vertical distance.
The sweepline technique for solving this dualized problem will be given in the next section.

4. Sweep-Line Algorithm and Space Efficiency
The sweepline technique "sweeps" the plane by combinatorially moving a vertical "sweepline" from the left to the right. Souvaine and Steele (1986) provide what is probably the first application of this technique in the area of regression, but Shamos and Hoey (1976) and Bentley and Ottmann (1979) have shown that sweepline techniques are applicable to a variety of problems in computational geometry.
The sweepline algorithm requires two off-the-shelf data structures. The first of these (to be called LIST) will maintain our set of n lines in a geometrically meaningful order. LIST can be implemented as a simple linked list, but we will want to augment it with some additional pointers which will help LIST interact with the second data structure.

Our second structure, HEAP, will be used to store a certain subset of n − 1 out of the set of n(n − 1)/2 intersection points Pij. The structure we create will permit us to access the "smallest" element in time O(1) and permits insertion or deletion in time O(log n). The ordering we will use to give meaning to "smallest" is the ordering of the Pij by x-coordinate.
The initialization of these structures will show how they will be used and will help make the algorithm transparent. We begin by computing a value A such that all of the intersection points lie to the right of the vertical line L0: x = A. It would be trivial to do this task in time O(n²) by computing the values Pij successively and retaining only the leftmost. It is instructive (and illustrative of a basic technique) to see how this can be done in time O(n log n).
To find A fast, we first note that to the left of all of the intersection points the lines are themselves ordered by slope, i.e., the line with lowest slope is above all of the other lines, etc. This observation implies that if we first sort the lines Li, 1 ≤ i ≤ n, by slope, then the leftmost intersection point Pij must be an intersection of two lines which are adjacent in the list of lines sorted by slope. Thus, to find the leftmost intersection point we can successively compute the intersection points of adjacent elements of the slope-sorted list of lines, keep only the leftmost Pij, and proceed down the list.
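In code, the slope-sort computation of A might look like the following sketch (the coefficient-pair representation of lines is an assumption of the sketch):

```python
def leftmost_intersection_x(lines):
    # Find the abscissa A of the leftmost intersection point in O(n log n):
    # sort the lines by slope, then intersect only slope-adjacent pairs.
    # Lines are (slope, intercept) pairs in general position (distinct slopes).
    by_slope = sorted(lines)
    best_x = float("inf")
    for (a1, b1), (a2, b2) in zip(by_slope, by_slope[1:]):
        x = (b2 - b1) / (a1 - a2)     # x where a1*x + b1 == a2*x + b2
        best_x = min(best_x, x)
    return best_x

lines = [(1.0, 0.0), (2.0, -3.0), (-1.0, 5.0), (0.5, 1.0)]
print(leftmost_intersection_x(lines))   # the leftmost of all six crossings
```

Only n − 1 adjacent pairs are examined rather than all n(n − 1)/2, which is exactly the candidate-pruning idea reused in the sweep itself.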
This computation requires O(n log n) time because of the preliminary sorting of the lines by slope. We did not have to dig out a clever way of executing this step, but doing so sets up a similar idea which will be crucial for our overall time complexity.
We next introduce another vertical line L which we call the sweepline, and we initialize L to L0. The line L is not essential to the algorithm, but it helps articulate a program invariant which helps make verification of the algorithm clearer.
We initialize LIST by placing the Li into the list in decreasing order of L0-intercept. We will also augment the structure holding the Li by storing along with each of the Li a set of four pointers Up1, UpK, Down1, and DownK which will point to the lines which are 1 above Li, K above Li, and so forth in the ordering which is given by LIST. Since lines too near the top or bottom will not have lines satisfying the required conditions, we set the pointers to the null value to flag this situation.
A key geometric observation, parallel to the idea used to compute L0, is that from the whole set of n(n − 1)/2 intersection points Pij, 1 ≤ i < j ≤ n, the one which is nearest to the sweepline L must be determined by a pair of lines {Li, Lj} which are adjacent in an ordered list which was initialized by L0-intercept. This observation permits us to keep a set of candidates for the "next point to explore" which is at most of cardinality n − 1.
Formally, we pass through LIST and for each adjacent pair Li and Lj in LIST we install Pij in HEAP. We do this while respecting the ordering of the Pij by x-coordinate, and we end up with a heap of size n − 1 with the leftmost of the Pij installed at the root. As an important adjunct to the heap building process we make a double set of pointers which associate the elements of HEAP and the elements of LIST. Specifically, for each adjacent pair Li and Lj in LIST (and each corresponding intersection point Pij in HEAP) we create pointers from Li and Lj to Pij and pointers from Pij to Li and Lj.
This step completes our preprocessing and initialization phase, so we should make a preliminary tally of resources consumed. First, because of the sorting, LIST requires time O(n log n) to build and space O(n) to store. Since HEAP is also of size O(n), it can be created in time O(n) and space O(n) using well-known methods.
The algorithm for computing the LMS regression line is now almost trivial using the data structures we have built. The picture to keep in mind is that of sweeping the vertical line L from L0 until it has passed through all of the intersection points. What we will do is manage LIST and HEAP in such a way that (1) LIST always provides the ordering of the Li according to decreasing order of L-intercept, (2) the pointers stored in LIST remain consistent with their initial definition, and (3) HEAP always stores as its minimal element the intersection point Pij which has the smallest x-coordinate of any intersection point to the right of L.

5. Final Remarks
The sweepline method which was just discussed is easy to implement and has substantial generality. Still, it is worth understanding that there are more sophisticated methods for the systematic search of an arrangement, and there may be instances where the associated overheads (both computational and intellectual) are justified.
There is one particularly relevant data structure due to Chazelle (1984) called a hammock, which is used in Souvaine and Steele (1986) to provide an O(n²) time, O(n²) space algorithm for LMS regression. Moreover, there are data structures for searching higher dimensional arrangements due to Edelsbrunner (1983) which should be very useful in the study of LMS methods for multivariate regression (at least as a theoretical tool).
Finally, the technique of line sweep has been extended recently by Edelsbrunner and Guibas (1986) to what they call "topological sweeping of an arrangement." This is an extremely promising technology which will be sure to have an impact on computational statistics.

References

Aho, A.V., Hopcroft, J.E., and Ullman, J.D. (1974). The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA.

Bentley, J.L. and Ottmann, T.A. (1979). "Algorithms for reporting and counting geometric intersections," IEEE Trans. Comp., C-28, 643-647.

Chazelle, B. (1984). "Intersecting is easier than sorting," Proc. 16th ACM STOC, 125-135.

Chazelle, B., Guibas, L.J., and Lee, D.T. (1983). "The power of geometric duality," Proc. 24th IEEE FOCS, 217-225.

Daniels, H.E. (1954). "A distribution-free test for regression parameters," Annals of Mathematical Statistics, 25, 499-513.

Dobkin, D.P. and Souvaine, D.L. (1986). "Computational Geometry: A User's Guide," Advances in Robotics I: Algorithmic and Geometric Aspects of Robotics, J.T. Schwartz and C.K. Yap (Eds.), Lawrence Erlbaum Associates, Hillsdale, NJ.

Dolby, J.L. (1960). "Graphical procedures for fitting the best line to a set of points," Technometrics, 2, 477-481.

Donoho, D.L. and Huber, P.J. (1983). "The notion of breakdown point," A Festschrift for Erich L. Lehmann, P.J. Bickel, K. Doksum, and J.L. Hodges (Eds.), Wadsworth, Belmont, CA, 157-184.

Edelsbrunner, H., O'Rourke, J., and Seidel, R. (1983). "Constructing arrangements of lines and hyperplanes with applications," Proc. 24th IEEE FOCS, 83-91.



Edelsbrunner, H. and Guibas, L.G. (1986). "Topologically sweeping an arrangement," Technical Report, Stanford University, Department of Computer Science.

Emerson, J.D. and Hoaglin, D.C. (1983). "Resistant lines for y versus x," in D.C. Hoaglin, F. Mosteller, and J. Tukey (Eds.), Understanding Robust and Exploratory Data Analysis, John Wiley, New York.

Hampel, F.R. (1968). "Contributions to the Theory of Robust Estimation," Ph.D. Thesis, University of California, Berkeley.

Harary, F. (1972). Graph Theory, Addison-Wesley, Menlo Park, CA.

Johnstone, I.M. and Velleman, P.F. (1985). "The resistant line and related regression methods," J. Amer. Statist. Assoc., 80, 1041-1054.



Leroy, A. and Rousseeuw, P.J. (1984). "PROGRESS: A program for robust regression," Report 201, Centrum voor Statistiek en Operationeel Onderzoek, Univ. Brussels.

Rousseeuw, P.J. (1984). "Least median of squares regression," J. Amer. Statist. Assoc., 79, 871-880.

Shamos, M.I. and Hoey, D. (1976). "Geometric intersection problems," Proc. IEEE FOCS Conference, Houston, TX.

Souvaine, D.L. and Steele, J.M. (1986). "Time and space efficient algorithms for least median of squares regression," Technical Report, Program in Statistics and Operations Research, Princeton University.

Steele, J.M. and Steiger, W.L. (1986). "Algorithms and complexity for least median of squares regression," Discrete Applied Mathematics, 13, 509-517.



J. Michael Steele†, Princeton University


Gibbs models provide a flexible alternative to traditional methods for specifying statistical models. Because of the availability of cheap computation, the Gibbs alternative is now substantially competitive with traditional methods in contexts where one can be satisfied with system simulations, but Gibbs models are at some disadvantage in theoretical contexts because their analytical properties are more difficult to investigate than those of models built from independent variables.
The first purpose of this talk is to present a specific Gibbs model concerning random trees and show how MACSYMA proved of value in examining its theoretical properties. A second purpose is to initiate a broader discussion of the role of MACSYMA in statistical (and other scientific) research.

1. Introduction to Gibbs Models

Most traditional models in statistics are based on functions of independent random variables, or more generally, on variables whose joint distribution is specified by means of a covariance structure. Although such models serve us well in many contexts, there are numerous problems where physical or other phenomena exhibit a qualitative dependency which is not easily expressed by such techniques. It is in such cases that the flexibility provided by the method of Gibbs models can be especially useful.
To illustrate the idea of a Gibbs measure in as concrete a way as possible, we will concentrate on measures on finite sets S. In the description of Gibbs models it is useful to employ some evocative language. In particular, the set S will be thought of as the set of possible states of a physical system, and several notions of thermodynamics will guide our notation. The most essential ingredient of the method is the function f: S → R which we will call a potential function. The function f has several interpretations as either a measure of complexity or of likelihood, but the bottom line is that we use f to define a measure on S by the formula

    g_t(s) = exp(−f(s)/t) / Z(t).    (1.1)

Here, of course, the denominator Z(t) is chosen in just the way needed to make S have total mass one, i.e., g_t(S) = 1 and Z(t) = Σ_{s ∈ S} exp(−f(s)/t). The choice of t as the indexing parameter reflects the historical origins of g_t, where t carries the interpretation of temperature. From the perspective of statistics we should note that the change of variable θ = 1/t brings (1.1) into the form of an exponential family with natural parameter θ. To benefit from the physical intuition behind (1.1), one should note that as t → ∞ the measure g_t tends to the uniform distribution on S. This limit relation captures the notion that at high temperature any state of S is as likely as any other state. Conversely, as t → 0, the probability measure g_t becomes increasingly concentrated on the set {s: f(s) = min f(s′)}, the interpretation of which is that at low temperature a few especially low energy states become extremely likely.
Now, these stories from physics are all well and good, but how do they influence probability models which do not have direct physical interpretation? The main point of Steele (1987) is that combinatorial problems can often be associated with an intuitive Gibbs model with a direct combinatorial interpretation. In fact, the Gibbs paradigm does not need much pushing to be seen to be extremely useful in many situations which are quite removed from the usual problems of thermodynamics.
While the preceding remarks seem necessary to motivate our specific problem, the main purpose of the present discussion is to show how MACSYMA was useful in the theoretical exploration of a simple Gibbs model from combinatorics. A second purpose is to review for a statistical audience some of the literature on the use of symbolic computation in applied science.

2. Trees of Prüfer and Rényi

The specific Gibbs model which concerns us has its origin in the hunt for a tractable parametric family of random trees which can serve to model those trees which one finds in problems of computer science. To get some perspective we will recall some results concerning the uniform distribution on random trees.
One of the earliest results in the enumerative theory of combinatorics is the remarkable theorem of Cayley which says there are exactly n^{n−2} labeled trees on n vertices. When Rényi (1959) took up the modeling of random trees, it was natural to consider the uniform model where any one of Cayley's n^{n−2} labeled trees would have the same probability of being chosen. The object of main interest to Rényi was the number of leaves of a random tree, and to obtain a handle on this problem, Rényi relied on an elegant code for labeled trees due to Prüfer (1918).
Prüfer's simple code is directly applicable to many questions concerning trees. One builds a one-to-one correspondence between a labeled tree on n vertices P1, P2, ..., Pn and an (n − 2)-tuple with elements from {1, 2, ..., n} via the following recipe:

    First we find the leaf Pi with the highest value of the index i. We then delete Pi from the tree and record the index j of the unique vertex Pj to which Pi was adjacent. This process is repeated on the pruned tree until n − 2 vertices have been deleted (and two vertices are left remaining).
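The recipe transcribes into a few lines of code. The sketch below (helper name and adjacency-dictionary representation are mine) follows the text's variant, which removes the highest-indexed leaf at each step, whereas the usual statement of the Prüfer code removes the lowest:

```python
def prufer_style_code(adj):
    # adj: dict vertex -> set of neighbours for a labeled tree on
    # vertices 1..n.  Returns the (n-2)-tuple of recorded indices,
    # deleting the highest-indexed leaf at each step, per the recipe.
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    code = []
    for _ in range(len(adj) - 2):
        leaf = max(v for v, nb in adj.items() if len(nb) == 1)
        (j,) = adj.pop(leaf)                      # the leaf's unique neighbour
        adj[j].discard(leaf)
        code.append(j)
    return tuple(code)

# On the path 1-2-3-4: vertex 4 is the highest leaf, so we record its
# neighbour 3; on the pruned path 1-2-3 the highest leaf is 3, so we record 2.
print(prufer_style_code({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}))
```

Each of the n − 2 recorded entries can take any of n values, which is exactly the counting step behind Cayley's n^{n−2} formula discussed next.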


Such a sequence of recorded adjacency indices is now called the Prüfer code for the tree, and one can show without great difficulty that the coding provides a bona fide one-to-one correspondence. As a consequence of Prüfer's coding, the formula of Cayley for the number of trees on n vertices becomes obvious; our code has n − 2 places and each place can be held by n values, so n^{n−2} is the cardinality of the set of labeled trees.
Although it cuts a good story short, it suffices to say that Rényi used the Prüfer code to show that T(n, k), the number of labeled trees with k leaves and n vertices, must satisfy the polynomial identity

    Σ_{k=2}^{n−1} (k!/n!) T(n, k) (x)_{n−k} = x^{n−2},    (2.1)

where (x)_j = x(x − 1)···(x − j + 1) denotes the falling factorial.
This elegant expression permits one to make many deductions concerning the number of leaves of a tree chosen at random from the set of all n^{n−2} labeled trees. In particular, if we let L_n denote the number of leaves of a random tree, then one can easily obtain an asymptotic formula for the expected value of L_n:

    E L_n ~ n/e.    (2.2)

With more effort one can also obtain the variance relation:

    Var L_n ~ n(e − 2)/e².    (2.3)

Finally, Rényi obtained the central limit theorem for L_n by means of substituting x = 1 − it/√n into his key identity (2.1). Several technical problems needed to be resolved by Rényi as he moved from (2.1) to the asymptotics of the characteristic function of L_n, but his analytical expertise prevailed.
Now the interesting point comes upon us. Rényi's detailed analysis pertains to a model which is almost certainly too limited to apply to any of the empirical situations of interest! For example, suppose that for your set of observed trees all of the trees have about n/5 leaves, instead of n/e leaves. The uniform model would be inapplicable, and it seems to offer no place to turn. The question becomes, how can we obtain a parametric family of trees with a natural parameter closely related to the number of leaves?
The answer is simple using a Gibbs model. We just let S be the set of all labeled trees on n vertices, and we define f(s) to be the number of leaves of s ∈ S. If we set θ = −1/t, we can write (1.1) in the form

    g_θ(s) = exp(θ f(s) − ψ(θ)),    (2.4)

where ψ(θ) = log(Σ_{s ∈ S} exp(θ f(s))) and ψ′(θ) = E_θ L_n is the expected number of leaves of a random tree chosen according to the probability measure (2.4).
By varying our choice of θ we can make E_θ L_n range through the whole interval (2, n − 1), and thus we have a family of measures on trees which is essentially indexed by the expected number of leaves.
This model has a number of benefits. First, it is an exponential model, so many of the nicest results of mathematical statistics apply to the estimation of θ. Also, it is a Gibbs model, so the method of Metropolis can be used to generate random trees which satisfy the given law. Finally, as shown in Steele (1987), the random variables L_n are asymptotically normal.
The purpose of the next section is to illustrate how MACSYMA entered into the exploration of this model and how it helped lead to the resolution of the asymptotic normality of L_n under the Gibbs model.

3. Harper's Method and Computational Empiricism

The central limit theorem for L_n was obtained in Steele (1987) through the application of a technique which was introduced by Harper (1967) for the study of Stirling numbers. Harper's basic idea is so simple it can be discussed in a beginning probability course. Still, the method is quite powerful, and, even when it fails, it often brings a new analytical insight into the problem.
So, what is this nifty method? Suppose you wish to study a random variable Y with probability generating function h(x), and you consider the equation:

    h(x) = p0 + p1 x + p2 x² + ··· + pn x^n = 0.    (3.1)

Now, suppose for some reason you know that (3.1) has only real roots. Since the pi are non-negative, the roots are all non-positive, say −ri with ri ≥ 0, and h can then be factored into the form

    h(x) = Π_{i=1}^{n} (x + ri)/(1 + ri) = Π_{i=1}^{n} (pi x + qi),    (3.2)

with pi = 1/(1 + ri) and qi = ri/(1 + ri). Now, with this before us, virtually all probability questions about Y are easily resolved, since (3.2) says that Y is equal to the sum of n independent Bernoulli random variables.
Returning to the Gibbs model of random trees, Steele (1987) showed that only a little manipulating is required to establish that (1) Rényi's CLT, (2) a Berry-Esseen refinement of Rényi's CLT, and (3) the CLT for random trees under the general Gibbs model will all follow easily once one proves that the equation

    Σ_{k=2}^{n−1} T(n, k) x^k = 0    (3.3)

has only real roots.
The handles we have on (3.3) are that T(n, k) can be expressed in terms of Stirling numbers of the second kind by

    T(n, k) = S(n − 2, n − k) n!/k!    (3.4)

and the Stirling numbers satisfy the classical recursion

    S(n, k) = k S(n − 1, k) + S(n − 1, k − 1).

Once we learn enough about the geometry of zeros of polynomials, it is not hard to show (3.3) has only real roots; but before even starting to learn what needs to be known, it seems to be almost a psychological requirement to find out whether Harper's method really has a chance. After all, it is quite possible that (3.3) does have nontrivial complex roots.
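For readers who would rather reach for a general-purpose language, the same exploratory check can be sketched in Python (the helper names are mine, and numpy's eigenvalue-based root finder plays the role of a symbolic system's allroots):

```python
from functools import lru_cache
from math import factorial

import numpy as np

@lru_cache(maxsize=None)
def stirling2(n, k):
    # Classical recursion S(n, k) = k*S(n-1, k) + S(n-1, k-1).
    if k == 0:
        return 1 if n == 0 else 0
    if k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def p_coeffs(n):
    # Coefficients (highest power first) of
    #   p_n(x) = sum_{k=2}^{n-1} S(n-2, n-k)/k! * x^k,
    # which differs from (3.3) only by the root-preserving factor n!.
    return [stirling2(n - 2, n - k) / factorial(k)
            for k in range(n - 1, 1, -1)] + [0.0, 0.0]

for n in (5, 8):
    roots = np.roots(p_coeffs(n))
    # Empirically, every root is real and non-positive, just as
    # Harper's method requires.
    assert np.allclose(roots.imag, 0.0)
    assert np.all(roots.real <= 1e-9)
```

Larger n behaves the same way in the exploration reported below; the point of the sketch is only that the empirical check is cheap in any environment, symbolic or numeric.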


Fortunately, one can easily investigate (3.3) using MACSYMA, and the following interactive code should illustrate the point. (The c-lines are entered commands and the d-lines are MACSYMA's replies. The other notations should either be intuitive, or at least easily understood after reading the companion article by Rand (1987).)

    (c1) s[n,k] := s[n-1,k]*k + s[n-1,k-1];
    (d1) s[n,k] := s[n-1,k] k + s[n-1,k-1]

    (c2) for n:1 through 50 do (s[n,1]: 1);
    (d2) done

    (c3) for n:1 through 50 do (s[n,n]: 1);
    (d3) done

    (c5) t[n,k] := s[n-2,n-k]/factorial(k);
    (d5) t[n,k] := s[n-2,n-k]/k!

{Technical Remark: One should note that the t[n,k] differ from the values T(n,k) of (3.3) only by a factor of n!, a factor not impacting the location of roots.}

    (c6) p[n](x) := sum(x^j * t[n,j], j, 2, n-1);
    (d6) p[n](x) := sum(x^j t[n,j], j, 2, n-1)

    (c7) p[5](x);
    (d7) x^4/24 + x^3/2 + x^2/2

{Technical Remark: This differs from the probability generating function for L_n by a factor of n!/n^{n−2}, which for n = 5 is a reassuring 24/25.}

    (c8) allroots(%);
    (d8) [x = 0.0, x = 0.0, x = -1.101020514433644, x = -10.89897948556636]

    (c9) p[8](x);
    (d9) x^7/5040 + 31 x^6/720 + 3 x^5/4 + 65 x^4/24 + 5 x^3/2 + x^2/2

Allroots again can do its job and find that the roots are, to more places than we care to record, {0, 0, -0.27, -0.99, -3.23, -14.20, -198.28}. In the same way, we find for n = 15 that the roots of p[15](x) are {0, 0, -0.065, ..., -56488.80}, and again we find that all of the roots are indeed real.
Before leaving this example, or embarking upon any summary observations about symbolic computation, some points should be made concerning zero locations and their role in Gibbs models. First, we should note that a proof of the fact that (3.3) has only real roots is given in Steele (1987), where a key lemma is provided by a result due to Pólya and Schur (1914) which exhibits a polynomial-to-polynomial transformation which preserves the property of having only real roots. The interplay between Gibbs models and zero location is amusing in the combinatorial context, but there are also far more serious contributions with a similar flavor. Kac (1974) tells a charming story of how a result on zeros due to Pólya (1926) was instrumental in the work of C.N. Yang and T.D. Lee in the theory of phase transition of a lattice gas.

4. MACSYMA in Applications

The use of MACSYMA reported here has several properties which seem typical of cases where MACSYMA proves to be of real service in mathematical research. First, the assistance provided by MACSYMA came in the form of encouragement rather than proof. Second, MACSYMA became engaged with the problem through the exploration of a significant example, rather than by frontal assault on the general problem. Third, MACSYMA provided a convenient interactive environment for performing a calculation which could have been performed with far humbler tools.
There is no reason to apologize for any of these properties, and in fact, there is some power in embracing these features as the honest path of MACSYMA application. The problem considered here was amenable to a very straightforward application of MACSYMA as a tool of exploratory mathematics. Still, even in modest exploration, MACSYMA sometimes will buckle under the weight of a seemingly reasonable request. Fortunately, calculations in MACSYMA can be guided toward a problem along many different paths, and sometimes such guidance makes a big difference. For a study of such a case and its happy resolution, one should read the discussion provided in Campbell and Gardin (1982) concerning some symbolic determinants.
This report has emphasized the easy uses of MACSYMA, but MACSYMA can indeed support extensive development. The possibilities of such development are well brought out by Watanabe (1984), which reports on the successes and failures of a 1400-line MACSYMA ODE solver as applied to the 542 equations solved in Kamke.
In addition to personal exploration, the best place to learn the capabilities of MACSYMA is Rand's Computer Algebra in Applied Mathematics: An Introduction to MACSYMA. The examples in that volume guide one through a useful exploration of MACSYMA's facilities, but its principal benefit is that the focus is kept on the use of MACSYMA in scientific work, in contrast to professional involvement with symbolic computation. There are numerous surveys of the application of MACSYMA in science and engineering, and one can get a good feeling for the scope of MACSYMA applications from Engeler and Mäder (1985), Fitch (1979), and Pavelle (1985).
Finally, some comment should be made which acknowledges that MACSYMA is but one of many symbolic computational environments. Loos (1982) mentions that there are at least 60 computer algebra systems. Besides MACSYMA, the best known are probably muMATH, REDUCE, and SCRATCHPAD, and each of these has its own virtues.






own virtues. One possibly historic development in symbolic computation took place this month with the introduction of the Hewlett-Packard 28C, the first handheld calculator to do symbolic algebra and symbolic calculus. If history serves as a reliable guide, symbolic algebra will soon be as available as there are people with a desire to participate in its benefits.

† Research supported in part by National Science Foundation Grant #DMS-8414069.
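The root computations reported above are easy to reproduce outside MACSYMA. The following sketch (in Python with NumPy, not part of the original MACSYMA session) recomputes the roots of p[5] and p[8] from the displayed coefficients and confirms the claims made in the text: all roots are real and non-positive, and p[5](1) = 25/24, so the factor 24/25 mentioned in the Technical Remark does turn p[5] into a probability generating function.

```python
# Cross-check of the MACSYMA allroots computations using NumPy.
# Coefficient lists are transcribed from the displays (d7) and (d9),
# highest degree first.
import numpy as np

p5 = [1/24, 1/2, 1/2, 0, 0]                         # p[5](x) from (d7)
p8 = [1/5040, 31/720, 3/4, 65/24, 5/2, 1/2, 0, 0]   # p[8](x) from (d9)

# The Technical Remark after (d7): p[5](1) = 5^3/5! = 25/24, so
# (24/25) * p[5] is a probability generating function.
assert abs(np.polyval(p5, 1.0) - 25/24) < 1e-12

for name, coeffs in [("p[5]", p5), ("p[8]", p8)]:
    roots = np.roots(coeffs)
    # All roots should be real; any tiny imaginary parts are numerical noise.
    assert np.all(np.abs(roots.imag) < 1e-6)
    assert np.all(roots.real <= 1e-9)               # and non-positive
    print(name, np.sort(roots.real))
```

The printed roots agree with the values reported above: {0, 0, -1.1010..., -10.8989...} for p[5] and, to two places, {0, 0, -0.27, -0.99, -3.23, -14.20, -198.28} for p[8].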


Campbell, J.A. and Gardin, F. (1982). "Transformation of an intractable problem into a tractable problem: evaluation of a determinant in general variables," in Computer Algebra: EUROCAM '82 (G. Goos and J. Hartmanis, eds.), Lecture Notes in Computer Science 144, Springer-Verlag, New York.

Engeler, E. and Mäder, R. (1985). "Scientific computation: the integration of symbolic, numeric and graphic computation," in EUROCAL '85 (G. Goos and J. Hartmanis, eds.), Lecture Notes in Computer Science 203, Springer-Verlag, New York.

Fitch, J.P. (1979). "The application of symbolic algebra to physics - a case of creeping flow," in European Symposium on Symbolic and Algebraic Computation (G. Goos and J. Hartmanis, eds.), Lecture Notes in Computer Science 72, Springer-Verlag, New York.

Harper, L.H. (1967). "Stirling behavior is asymptotically normal," Ann. Math. Statist., 38, 410-414.

Kac, M. (1974). "Comments on 'Bemerkung über die Integraldarstellung der Riemannschen ζ-Funktion,'" pp. 424-426 (also George Pólya: Collected Papers Vol. II: Location of Zeros, pp. 243-255, R.P. Boas, ed., MIT Press, Cambridge, Massachusetts).

Kamke, E. (1959). Differentialgleichungen: Lösungsmethoden und Lösungen, Chelsea, New York.

Loos, R. (1982). "Introduction," in Computer Algebra: Symbolic and Algebraic Computation (Buchberger, B., Collins, G.E., and Loos, R., eds.), Computing Supplementum 4, Springer-Verlag.

Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. (1953). "Equation of state calculations by fast computing machines," J. Chem. Physics, 21, 1087-1092.

Pavelle, R. (1985). "MACSYMA: capabilities and applications to problems in engineering and the sciences," in EUROCAL '85 (G. Goos and J. Hartmanis, eds.), Lecture Notes in Computer Science 203, Springer-Verlag, New York.

Pólya, G. and Szegő, G. (1976). Problems and Theorems in Analysis II, Springer-Verlag, New York, p. 45.

Pólya, G. (1926). "Bemerkung über die Integraldarstellung der Riemannschen ζ-Funktion," Acta Math., 48, 305-317 (also George Pólya: Collected Papers Vol. II: Location of Zeros, pp. 243-255, R.P. Boas, ed., MIT Press, Cambridge, Massachusetts).

Pólya, G. and Schur, I. (1914). "Über zwei Arten von Faktorenfolgen in der Theorie der algebraischen Gleichungen," J. Reine Angew. Math., 144, 89-113 (also George Pólya: Collected Papers Vol. II: Location of Zeros, pp. 100-124, R.P. Boas, ed., MIT Press, Cambridge, Massachusetts).

Prüfer, H. (1918). "Neuer Beweis eines Satzes über Permutationen," Archiv für Mathematik und Physik, 27, 142.

Rand, R.H. (1987). "Computer algebra applications using MACSYMA," in this volume.

Rand, R.H. (1984). Computer Algebra in Applied Mathematics: An Introduction to MACSYMA, Pitman Advanced Publishing Program, Boston.

Rényi, A. (1959). "Some remarks on the theory of trees," MTA Mat. Kut. Int. Közl., 4, 73-85.

Steele, J.M. (1987). "Gibbs measures on combinatorial objects and the central limit theorem for an exponential family of random trees," Probability in the Engineering and Information Sciences, 1, 47-59.

Stoutemyer, D.R. (1985). "A preview of the next IBM-PC version of muMATH," in EUROCAL '85 (G. Goos and J. Hartmanis, eds.), Lecture Notes in Computer Science 203, Springer-Verlag, New York.

Watanabe, S. (1984). "An experiment toward a general quadrature for second order linear ordinary differential equations by symbolic computation," in EUROSAM '84 (G. Goos and J. Hartmanis, eds.), Lecture Notes in Computer Science 174, Springer-Verlag, New York.