Optimal Strategies for Second Guessers

J. MICHAEL STEELE and JAMES ZIDEK*
A model is given for a class of contests in which the participants try to guess (or estimate) unknown quantities, and the objective of each player is to come closer to the unknown quantities than an adversary. A general optimality result is proved that gives the best guessing rules for the second guesser. These rules are first calculated exactly in a certain hierarchical linear model, and then simpler approximate rules are given.

KEY WORDS: Guessing; Optimal strategies; Hierarchical linear model; Stein estimator; Posterior median.

*J. Michael Steele is Assistant Professor, Department of Statistics, Stanford University, Stanford, CA 94305. James Zidek is Professor, Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1W5, Canada. This work was supported in part by the Office of Naval Research under Grant N00014-76-C-0475 (NR-042-267) and the Army Research Office Grant DAAG-29-77-G-0031. The authors wish to thank R. Chacon, P. Diaconis, J. Kadane, I. Olkin, A. Pittinger, and D.C. Wu for their comments on earlier drafts of this article.

© Journal of the American Statistical Association, September 1980, Volume 75, Number 371, Theory and Methods Section

1. INTRODUCTION

The goal in many activities or contests is not necessarily to do well in any absolute sense, but merely to outperform an adversary. The objective of this article is to provide a model for such a contest, establish the optimality of certain procedures, and provide suitable approximations to these optimal procedures. But before yielding to the mathematics of the model, we wish to fix ideas with an anecdote.

Two statisticians, Bob and Mike, engaged in a contest to guess weights of people at a party. They agreed that Bob would always guess first. Mike would then guess, and finally the person in question would say who is closer. For example, for person number one Bob guessed 137 pounds. Mike then guessed 137.01 pounds, and the guest declared Mike the victor. The contest continued in a similar vein, and to Bob's dismay he won barely a quarter of the time.

It is intuitively clear that the second guesser has an advantage, and one of the results of Section 2 shows that this advantage is typically as large as the 75 percent obtained by Mike in the anecdote.

To continue the story, Bob was so stunned by defeat and eager for revenge that he elicited the assistance of a professional weight guesser. Mike agreed that since the new team was so powerful it should be willing to make all its guesses about the weights of the guests before Mike had to state any of his guesses. The team agreed to the proposed rule change, and Mike then proceeded to win even more convincingly than before.

The strategy used by Mike in the second case is naturally more sophisticated than the one he used when he was matched against an equal. This second strategy derives from a hierarchical linear model like that studied in Lindley and Smith (1972). It is also closely connected with the James-Stein estimator and was originally motivated by the "Batting Average" example of Efron and Morris (1973).

Our program begins by establishing in Section 2 a formal theory of guessing contests. We also give a simple but very general optimality result that forms the basis for the rest of the article.

The third section determines the exact optimal strategy for second guessing under a certain linear model. Practical approximations to this optimal strategy are worked out in Section 4. The final section gives a critical discussion of the various sources of difficulties inherent in applying this theory of guessing contests. While the main point of this article is to provide a tractable theory of guessing contests, we feel that the largest single point established is the approximate optimality of the simple rule given by (4.1).

2. HOTELLING'S STRATEGY

The structure of our guessing model can be described by a system of four $p$-vectors.

Target values: $(\theta_1, \theta_2, \ldots, \theta_p) = \theta$
First guess: $(X_1, X_2, \ldots, X_p) = X$
Second guesser's hunch: $(Y_1, Y_2, \ldots, Y_p) = Y$
Second guess: $(G_1, G_2, \ldots, G_p) = G$

The $\theta_i$ represent the real values to be guessed. The $X_i$ are guesses made by the person who goes first, and all these are assumed to be available to the second guesser before he acts. The $Y_i$ represent the second guesser's best estimate of the $\theta_i$. Finally, the $G_i$ are the guesses to be announced by the second guesser. Our principal task is to determine how $G$ should be based on $X$ and $Y$.

The objective of each player is to come closer to $\theta$ than his opponent, so we begin by setting

$$V(G, \theta) = \sum_{j=1}^{p} V_j(G, \theta) \tag{2.1}$$

where

$$V_j(G, \theta) = \begin{cases} 1 & \text{if } |G_j - \theta_j| < |X_j - \theta_j| \\ 0 & \text{otherwise.} \end{cases}$$
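As a concrete reading of the reward function (2.1), the following is a minimal Python sketch. The function name `reward` and the numerical values (in particular the true weight of 150 pounds) are assumptions made here purely for illustration; they do not come from the article.

```python
import numpy as np

def reward(G, X, theta):
    """V(G, theta) of (2.1): the number of coordinates in which the second
    guess G is strictly closer to theta than the first guess X."""
    G, X, theta = (np.asarray(a, dtype=float) for a in (G, X, theta))
    return int(np.sum(np.abs(G - theta) < np.abs(X - theta)))

# Anecdote-style example: Bob guesses 137, Mike guesses 137.01.
# The true weight (150) is hypothetical.
print(reward([137.01], [137.0], [150.0]))
```

With a true weight above 137 pounds the second guesser scores the point; with a true weight below 137 pounds he does not.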


Steele and Zidek: Strategies for Second Guessers 597

The strategic objective of the second guesser is therefore to maximize $E\,V(G, \theta)$; that is, the second guesser wishes to maximize the expected number of times his guesses come closer to the true values.

The only probabilistic assumptions to be made now are that $\theta$, $X$, and $Y$ have a joint distribution that is continuous. This assumption is made for convenience and avoids the ad hoc conventions required for dealing with ties.

Now let $\nu_i(X, Y)$ denote the median of the conditional distribution of $\theta_i$ given $X$ and $Y$. A key role in our guessing theory is played by the following strategy:

$$G_i^* = \begin{cases} X_i + \epsilon & \text{if } X_i < \nu_i(X, Y) \\ X_i - \epsilon & \text{otherwise.} \end{cases}$$

These strategies will subsequently be called Hotelling strategies since they were essentially put forward in Hotelling (1929, p. 51). There are broad differences between the present model and Hotelling's problem in location economics, but the relationship seems close enough to justify (or even require) the name. The main fact in this section is the following simple result:

Theorem 1: The Hotelling strategies are $\epsilon$-optimal; that is,

$$\lim_{\epsilon \to 0} E\,V(G^*, \theta) = \sup_{G} E\,V(G, \theta).$$

Proof. Since any guess $G_i$ must be on one side or the other of $X_i$, we have

$$P_{X,Y}(|G_i - \theta_i| < |X_i - \theta_i|) \le \max\{P_{X,Y}(\theta_i \le X_i),\; P_{X,Y}(\theta_i \ge X_i)\}.$$

The basic observation about $G_i^*$ is that

$$\lim_{\epsilon \to 0} P_{X,Y}(|G_i^* - \theta_i| < |X_i - \theta_i|) = \max\{P_{X,Y}(\theta_i \le X_i),\; P_{X,Y}(\theta_i \ge X_i)\}.$$

Taking expectations in the two preceding relations and summing over $1 \le i \le p$, the theorem is proved.

A compelling impediment to the use of Hotelling strategies is that they require the knowledge of the joint distribution of $\theta$, $X$, and $Y$, or at least the knowledge of $\nu_i(X, Y)$. The key task of the remainder of this article is to isolate some feasible circumstances in which this impediment can be overcome.

To begin, consider the strategies

$$\hat G_i = \begin{cases} X_i + \epsilon & \text{if } X_i < Y_i \\ X_i - \epsilon & \text{if } X_i > Y_i \end{cases}$$

where the second guesser places his guess just a bit to the side of the first guess in the direction of his own "hunch" $Y_i$.

In some cases one can show that these hunch-guided guesses are in fact Hotelling strategies. Certainly, if the vectors $(\theta_i, Y_i, X_i)$, $1 \le i \le p$, are independent and $\theta_i \mid Y_i \sim N(Y_i, \sigma_Y^2)$, $X_i \mid \theta_i, Y_i = X_i \mid \theta_i \sim N(\theta_i, \sigma_X^2)$, then $\nu_i(X, Y)$ is on the same side of $X_i$ as $Y_i$. This immediately implies that the Hotelling and hunch-guided strategies then coincide.

Without distributional assumptions on $\theta$, one can no longer speak of the optimality of a guessing strategy, but the following result points out a case in which the second guesser can still realize a substantial advantage.

Theorem 2 (Three-Quarter Theorem¹): If $\tilde X = X - \theta$ and $\tilde Y = Y - \theta$ are identically distributed, independent, and symmetric about zero, then the hunch-guided guess has probability $3/4$ of winning as $\epsilon \to 0$.

Proof. As $\epsilon \to 0$, the probability that the hunch-guided guesser loses is $P(\tilde Y < \tilde X < 0) + P(0 < \tilde X < \tilde Y)$. By symmetry and exchangeability this probability also equals

$$2P(0 < \tilde X < \tilde Y) = P(0 < \tilde X \text{ and } 0 < \tilde Y) = 1/4.$$

In a practical application of the three-quarter theorem the assumption of identical distributions might seem to pose some difficulties. It is reassuring that the result is quite robust. For example, assuming unbiased jointly normal guesses, the second guesser still wins with probability greater than .68 when $\operatorname{var} \tilde Y / \operatorname{var} \tilde X = 2.5$ and wins with probability greater than .59 when $\operatorname{var} \tilde Y / \operatorname{var} \tilde X = 10$. (These probabilities are easily confirmed by tables of the bivariate normal, e.g., Owen 1956.) The more detailed assessment of robustness in guessing competitions will be dealt with in a subsequent report, but one should note that an obvious aspect of nonrobustness under gross changes in the model of Theorem 2 is that the probability of the second guesser winning will tend to $1/2$ or $1$ according as $\operatorname{var} \tilde Y / \operatorname{var} \tilde X$ tends to $\infty$ or $0$.

3. GAUSSIAN GUESSING

Since Hotelling strategies have been shown to be optimal, one would naturally like to provide a class of models in which the strategies can be determined explicitly. The main result of this section is to give such explicit strategies under a multivariate normal model studied by Lindley (1971) and Lindley and Smith (1972).

We write $U \mid V$ for the conditional distribution of $U$ given $V$, $1_p$ for the row $p$-vector $(1, 1, \ldots, 1)$, and $I_p$ for the $p \times p$ identity matrix.

Our Gaussian model assumptions are the following:

$$\mu \sim N(\mu_0, \sigma_\mu^2), \qquad \theta \mid \mu \sim N(\mu 1_p, \sigma_\theta^2 I_p),$$

and

$$X, Y \mid \theta, \mu \sim N(\theta(I_p, I_p), \Gamma). \tag{3.1}$$

¹ A result equivalent to that given here was told to the first author in 1975 by R. Chacon and was known much earlier to R. Chacon and S. Kochen. The result was also known earlier to T. Cover in the form: Between two "equally matched" basketball teams the odds are 3 to 1 in favor of the team leading at the half.
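Returning to the three-quarter theorem of Section 2, the claim and its robustness to unequal variances can be checked numerically. The sketch below is illustrative only: it assumes independent normal errors, takes $\theta = 0$ without loss of generality, and uses the closed form $1 - \arctan(\sqrt{r})/\pi$ for the variance ratio $r = \operatorname{var}\tilde Y/\operatorname{var}\tilde X$, which is a standard orthant calculation supplied here rather than a formula from the article. All function names are hypothetical.

```python
import numpy as np

def hunch_guided_win_rate(var_ratio=1.0, n=200_000, eps=1e-9, seed=0):
    """Monte Carlo estimate of P(second guesser wins) for the hunch-guided rule.

    Assumes theta = 0, X - theta ~ N(0, 1) and Y - theta ~ N(0, var_ratio),
    independent of each other (assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)                  # first guesser's errors
    y = rng.normal(0.0, np.sqrt(var_ratio), n)   # second guesser's hunch errors
    g = np.where(x < y, x + eps, x - eps)        # step just past X toward the hunch
    return float(np.mean(np.abs(g) < np.abs(x)))  # closer to theta = 0 than X

def normal_win_probability(var_ratio):
    """Limiting win probability under the same independence assumption."""
    return 1.0 - np.arctan(np.sqrt(var_ratio)) / np.pi

for r in (1.0, 2.5, 10.0):
    print(r, hunch_guided_win_rate(r), normal_win_probability(r))
```

For equal variances the closed form reduces to exactly $3/4$, and it decreases toward $1/2$ as the hunch becomes less reliable, in line with the limiting behavior noted in the text.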


598 Journal of the American Statistical Association, September 1980
In (3.1), $\Gamma$ denotes the block-diagonal matrix

$$\Gamma = \begin{pmatrix} \sigma_X^2 I_p & 0 \\ 0 & \sigma_Y^2 I_p \end{pmatrix}.$$

The physical motives behind this model are that the true weights $\theta$ of the persons we see are viewed as independent realizations of a single fixed random process that was itself once drawn from a population of random processes. For example, the parameter $\mu$ can be viewed as a geographically fixed quantity determined at an earlier time by (random) immigrations. The assumption of normality is made partially out of traditional convenience, but also because it seems justifiable in the weight-guessing example. The structural model together with the normality lead uniquely to the Gaussian model specified in (3.1). The promised explicit determination of the Hotelling strategy is now possible.

Theorem 3: Under the preceding Gaussian model the Hotelling strategy is

$$G_i^* = \begin{cases} X_i + \epsilon & \text{if } X_i < \gamma\mu_0 + (1-\gamma)[\beta \bar X + (1-\beta)\bar Y] + (1-\alpha)[\beta(X_i - \bar X) + (1-\beta)(Y_i - \bar Y)] \\ X_i - \epsilon & \text{otherwise} \end{cases} \tag{3.2}$$

where

$$\alpha = \sigma_\theta^{-2}(\sigma_\theta^{-2} + \sigma_X^{-2} + \sigma_Y^{-2})^{-1},$$

$$\gamma = \tilde\sigma^{-2}(\tilde\sigma^{-2} + \sigma_X^{-2} + \sigma_Y^{-2})^{-1}, \qquad \tilde\sigma^2 = \sigma_\theta^2 + p\sigma_\mu^2,$$

and

$$\beta = \sigma_X^{-2}(\sigma_X^{-2} + \sigma_Y^{-2})^{-1}.$$

The proof of the preceding theorem depends on a multivariate calculation that we have deferred to the Appendix in order to take up directly the problem of interpreting the result.

The basic part is the mixture of means,

$$\gamma\mu_0 + (1-\gamma)[\beta\bar X + (1-\beta)\bar Y],$$

which is perturbed on trial $i$ by the "mixture" of residuals $\alpha \cdot 0 + (1-\alpha)[\beta(X_i - \bar X) + (1-\beta)(Y_i - \bar Y)]$. The coefficient $\beta = \sigma_X^{-2}(\sigma_X^{-2} + \sigma_Y^{-2})^{-1}$ appearing in these mixtures is near $0$, $1/2$, or $1$ accordingly as $\sigma_X^2/\sigma_Y^2$ is near $\infty$, $1$, or $0$. This ratio is one natural measure of the relative abilities of the two guessers, and this interpretation is reinforced by considering the extreme cases. When $\sigma_X^2/\sigma_Y^2 \to \infty$ the first guess is essentially ignored and when $\sigma_X^2/\sigma_Y^2 \to 0$ it is the hunch that is ignored. This last case is of particular interest since it corresponds to trying to outguess a far better informed adversary.

4. STEIN-GUIDED GUESSING

The strategies just derived have the drawback that they are functions of $\mu_0$, $\sigma_\mu^2$, $\sigma_\theta^2$, $\sigma_X^2$, and $\sigma_Y^2$. Although the magnitude of $\mu_0$ and of the relevant variance ratios may be sufficiently understood for some applications, the exact values of these quantities cannot generally be assumed to be known. The next objective is thus to derive reasonable estimates of the unknown mean and variances. One benefit of this analysis is a clearer understanding of the empirical fact (Efron and Morris 1973) that the Stein estimator performs well with respect to the reward function $V$.

A Bayesian approach to the estimations given before can be made along the lines suggested in Lindley and Smith (1972), but such a procedure can prove quite complex (cf. discussion by V. D. Barnett in Lindley and Smith 1972). The estimators considered here are based on an empirical Bayes procedure that seems both simple and sensible.

As before, we write $Z = (X, Y)$ and begin by transforming $Z$ into a canonical form. Next we recall that $1_p^T 1_p = p P^T \Delta P$, where $P$ denotes the $p \times p$ Helmert orthogonal matrix (cf. Bennett and Franklin 1954, p. 102) and $\Delta$ is the $p \times p$ matrix with 1 in the $(1, 1)$ position and all other entries zero. We define $(U, V)$ by

$$(U, V) = 2^{-1/2}(X, Y)\begin{pmatrix} I_p & I_p \\ I_p & -I_p \end{pmatrix}\begin{pmatrix} P^T & 0 \\ 0 & P^T \end{pmatrix}.$$

A straightforward calculation shows

$$(U, V) \sim N(\mu^*, \Sigma^*)$$

where

$$\mu^* = (2p)^{1/2}\mu_0(e, 0), \qquad e = (1, 0, \ldots, 0),$$

$$\Sigma^* = \tfrac{1}{2}\begin{pmatrix} \sigma_+^2 I_p & \sigma_-^2 I_p \\ \sigma_-^2 I_p & \sigma_+^2 I_p \end{pmatrix} + 2\sigma_\theta^2\begin{pmatrix} I_p & 0 \\ 0 & 0 \end{pmatrix} + 2p\sigma_\mu^2\begin{pmatrix} \Delta & 0 \\ 0 & 0 \end{pmatrix},$$

and

$$\sigma_+^2 = (\sigma_X^2 + \sigma_Y^2), \qquad \sigma_-^2 = (\sigma_X^2 - \sigma_Y^2).$$

From the canonical form shown one notes that $\sigma_\mu^2$ cannot be meaningfully estimated since there is only one degree of freedom available for its estimation.

We now turn to the analysis of important special cases that correspond to qualitatively different contexts.

Case A (Known Variances): We need to estimate only $\mu_0$, and this is done by maximizing the Type II likelihood (cf. Good 1965). This calculation follows easily from equation (A.1) of the Appendix, and the estimator obtained is

$$\hat\mu_0 = (1_{2p}\Sigma_{ZZ}^{-1}Z^T)(1_{2p}\Sigma_{ZZ}^{-1}1_{2p}^T)^{-1}.$$

This simplifies further to just

$$\hat\mu_0 = \beta\bar X + (1-\beta)\bar Y,$$

so the estimated Hotelling strategy becomes

$$G_i^* = \begin{cases} X_i + \epsilon & \text{if } X_i < \alpha[\beta\bar X + (1-\beta)\bar Y] + (1-\alpha)[\beta X_i + (1-\beta)Y_i] \\ X_i - \epsilon & \text{otherwise} \end{cases}$$

where the parameters $\alpha$ and $\beta$ are as specified in Theorem 3.



Case B: $\sigma_\mu^2 = 0$; $\sigma_X^2 = \sigma_Y^2 = \delta^2$ and $\sigma_\theta^2$ unknown. The canonical model simplifies to

$$(U, V) \sim N\left((2p)^{1/2}\mu_0(e, 0),\; \delta^2\begin{pmatrix} I_p & 0 \\ 0 & I_p \end{pmatrix} + 2\sigma_\theta^2\begin{pmatrix} I_p & 0 \\ 0 & 0 \end{pmatrix}\right)$$

and this time estimators are easily found without carrying out the likelihood maximization. We take the estimators of $\mu_0$, $\tau = (\delta^2 + 2\sigma_\theta^2)^{-1}$, and $\delta^2$, respectively, given by

$$\hat\mu_0 = (2p)^{-1/2}U_1, \qquad \hat\tau = p\Big(\sum_{i=2}^{p} U_i^2\Big)^{-1},$$

and

$$\hat\delta^2 = p^{-1}VV^T.$$

In terms of $X$ and $Y$ we then get

$$\hat\mu_0 = \tfrac{1}{2}(\bar X + \bar Y),$$

$$\hat\tau = p\left[\sum_{i=1}^{p}\left(\frac{X_i + Y_i}{\sqrt 2} - \frac{\bar X + \bar Y}{\sqrt 2}\right)^2\right]^{-1},$$

and

$$\hat\delta^2 = p^{-1}\sum_{i=1}^{p}\left(\frac{X_i - Y_i}{\sqrt 2}\right)^2.$$

The approximate Hotelling strategy for this case is therefore

$$G_i^* = \begin{cases} X_i + \epsilon & \text{if } X_i < \hat\alpha\,\tfrac{1}{2}(\bar X + \bar Y) + (1 - \hat\alpha)\,\tfrac{1}{2}(X_i + Y_i) \\ X_i - \epsilon & \text{otherwise} \end{cases}$$

where

$$\hat\alpha = \hat\delta^2\hat\tau = \left[\sum_{i=1}^{p}\left(\frac{X_i - Y_i}{\sqrt 2}\right)^2\right]\left[\sum_{i=1}^{p}\left(\frac{X_i + Y_i}{\sqrt 2} - \frac{\bar X + \bar Y}{\sqrt 2}\right)^2\right]^{-1}.$$

One should note that $\hat\alpha$ has a natural interpretation. It is just the ratio (between the guesser variance)/(between the trial variance). The optimal strategy favors using $\tfrac{1}{2}(\bar X + \bar Y)$ when $\hat\alpha$ is close to 1 and favors $\tfrac{1}{2}(X_i + Y_i)$ when $\hat\alpha$ is close to 0. Also, the strategy can be improved slightly by replacing $\hat\alpha$ by 1 if it happens that $\hat\alpha > 1$.

Case C: $\sigma_\mu^2 = 0$, $\sigma_Y^2 = \infty$, $\sigma_X^2$ known; $\sigma_\theta^2$ unknown. This is a case we feel to be of particular interest. Calculating as before, we find that a Stein estimator determines the approximate Hotelling strategy, but that it plays a cameo role since the strategy simplifies to just "betting on the $\bar X$ side of $X_i$." This simple result gives some theoretical justification to the otherwise somewhat mysterious empirical fact that $\bar X$ performs even better than the Stein estimator in terms of "gambler's" loss on the batting average data set (cf. Plackett's comment in Efron and Morris 1973, p. 416, and an easy computation).

The formal analysis begins as in Case B. Since $\sigma_Y^2 = \infty$, the $Y$ is uninformative and the analysis must rest on $X$. Also, since $\sigma_\mu^2 = 0$, the canonical form of the $X$'s marginal model can be simplified to

$$U^* \sim N(p^{1/2}\mu_0 e,\; (\sigma_X^2 + \sigma_\theta^2)I_p).$$

The obvious estimate of $\mu_0$ is given by $\hat\mu_0 = \bar X$, and if we require that the estimate, $\hat\tau_X$, of $\tau_X = (\sigma_X^2 + \sigma_\theta^2)^{-1}$ be unbiased,

$$\hat\tau_X = (p - 3)\Big[\sum_{i=1}^{p}(X_i - \bar X)^2\Big]^{-1}$$

becomes the natural choice and the estimated Hotelling strategy is

$$G_i^* = \begin{cases} X_i + \epsilon & \text{if } X_i < \hat\alpha^*\bar X + (1 - \hat\alpha^*)X_i \\ X_i - \epsilon & \text{otherwise} \end{cases}$$

where $\hat\alpha^* = \min\{1, \hat\alpha\}$ and

$$\hat\alpha = \sigma_X^2(p - 3)\Big[\sum_{i=1}^{p}(X_i - \bar X)^2\Big]^{-1}.$$

The direction of the guess on the side of $X_i$ is determined by $\hat\alpha^*\bar X + (1 - \hat\alpha^*)X_i$, which is precisely the Stein estimator as modified by Lindley (cf. James and Stein 1961 and the discussion following Lindley and Smith 1972).

Now since the convex combination of $\bar X$ and $X_i$ will always be on the same side of $X_i$ as $\bar X$, the estimated Hotelling strategy can be more simply written as just

$$G_i^* = \begin{cases} X_i + \epsilon & \text{if } X_i < \bar X \\ X_i - \epsilon & \text{otherwise.} \end{cases} \tag{4.1}$$

This is an extraordinarily simple procedure in a model that we feel may be realistic in several sporting and business contexts.

To assess the performance of this Stein-guided strategy, guessing trials were simulated for a variety of special cases. The $\theta_i$ were chosen as $\theta_i = 0$, $\theta_i = i$, and $\theta_i = $ [illegible] for each of $i = 1, 2, \ldots, p$ with $p = 10$ and then with $p = 100$; thus, in all, $3 \times 2 = 6$ cases were considered. As an illustration of the computation consider the case in which $\theta_i = i$ and $p = 100$. In this case 200 repetitions were made as follows:

1. $X$ was generated as $N(\theta, I)$ with $\theta = (1, 2, \ldots, 100)$.
2. $G_i^*$ was calculated by (4.1) with $\epsilon = 10^{-1}$.
3. $V$ was then calculated, and the process was repeated 200 times.
4. The 200 realizations of $V/100$ were used to estimate the density of $V/100$, the percentage of times the second player wins using the $G^*$ of (4.1).
5. This density was plotted in Figure A (in this case, the unshaded density in the middle graph).



[Figure A. Estimated Density of the Proportion of Second Guesser Wins Using the Stein-Guided Optimal Strategy (based on runs of 200 contests). Three panels, one for each choice of the $\theta_i$, each showing densities labeled $p = 10$ and $p = 100$. Horizontal axis: $x$ = proportion of $p$ trials won by the second guesser, from 0 to 1.0.]

From Figure A one learns by looking at the unshaded density in the top graph that when as many as 10 parameters growing like [illegible] are to be guessed the modal percentage of correct guesses made by the second guesser is about 95 percent. The general conclusions to be drawn from Figure A are

1. The more parameters to be guessed, the greater the advantage to the second guesser.
2. The more spread out the $\theta_i$ to be guessed, the more the advantage to the second guesser.

In Figure B, these conclusions are further examined by taking the $\theta_i$ themselves to be random. Here $p = 20$ was fixed throughout. First we took a realization of $\theta \sim N(0, 4I_{20})$. Then 200 of the $X$'s were generated with the same fixed underlying $\theta$ (just as in Figure A). The density of $V/20$ was estimated as before, and altogether 25 runs were made. The 25 runs produced remarkably similar estimates of the density of $V/20$, and the estimates from runs number 1, 17, and 21 were selected as indicative of the variability in the 25 runs.

[Figure B. Estimated Density of the Proportion of Second Guesser Wins in $p = 20$ Trials Using the Stein-Guided Optimal Strategy (three runs of 100 contests and combined runs of 2,500 contests). Curves labeled Run 1, Run 17, and Run 21. Horizontal axis: $x$ = proportion of $p = 20$ trials won by the second guesser, from 0 to 1.0.]

5. REALITIES OF APPLICATION

Almost any discussion of the preceding theory eventually turns to the problem of football betting, and it seems generally worthwhile to note why the theory is not applicable to that problem. A key reason is that the bookie (or person "setting the line") is not trying to estimate the actual point spread. The bookie is trying to produce a point spread that will produce a nearly equal number of takers on each side of the spread. The bookie is therefore not a first guesser in the sense of this article, and our theory naturally does not apply.

Consider instead two bookies of equal caliber, one of whom sets his line on Monday and the other on Tuesday (for the game on Sunday). If the Tuesday bookie wished to obtain only a more even distribution of customers on either side of his line than the Monday bookie, he should be able to do so in almost three-quarters of the games by using the hunch-guided guessing of Section 2. In this case, each bookie is a bona fide guesser of that spread $s$ that will evenly split the pool of bettors.

Since there are actually many games each week, the Tuesday bookie could actually outperform the Monday bookie by using the Stein-guided strategy of Section 4, particularly (4.1). The assumptions of (3.1) may not be applicable to the whole set of games; but if one considers only noncharismatic games outside the bookie's city, then (3.1) seems reasonable. (This is a stratification step to obtain increased homogeneity of the spreads to be guessed.)

The examples put forward before are in the long tradition of gedanken experimente, and the problem of producing a truly telling application remains open. An intriguing aspect of a theory of this nature is that it is only necessary to find one good application.

APPENDIX: PROOF OF THEOREM 3

By Theorem 1 the problem depends on the calculation of the posterior median $\nu_i(X, Y)$, which by the normality assumptions (3.1) coincides with the posterior mean. The argument given here for completeness is similar to those of Lindley (1971) and Lindley and Smith (1972). It depends on the well-known fact (cf. Anderson 1958, p. 27) that if $U$ and $V$ are jointly normal then

$$U \mid V \sim N(EU - (EV)B + VB,\; \Sigma_{U \cdot V})$$

where $B = \Sigma_{VV}^{-1}\Sigma_{VU}$, $\Sigma_{U \cdot V} = \Sigma_{UU} - \Sigma_{UV}\Sigma_{VV}^{-1}\Sigma_{VU}$, $\Sigma_{VV}$ denotes the covariance matrix of $V$, and so on.

Setting $Z = (X, Y)$ and applying the preceding identities to (3.1), we have

$$Z \sim N(\mu_0(1_p, 1_p),\; \Sigma_{ZZ}) \tag{A.1}$$

where

$$\Sigma_{ZZ} = \operatorname{diag}\{\sigma_X^2, \sigma_Y^2\} \otimes I_p + \sigma_\theta^2 J_2 \otimes I_p + \sigma_\mu^2 J_2 \otimes 1_p^T 1_p.$$


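Here $\otimes$ denotes the Kronecker matrix product and $J_2$ is the $2 \times 2$ matrix with all 1 entries. Further,

$$E(\theta \mid Z) = E(\theta) - E(Z)B + ZB$$

with $B = \Sigma_{ZZ}^{-1}\Sigma_{Z\theta}$,

$$\Sigma_{Z\theta} = \begin{pmatrix} \Sigma_{\theta\theta} \\ \Sigma_{\theta\theta} \end{pmatrix},$$

and $\Sigma_{\theta\theta} = \sigma_\theta^2 I_p + \sigma_\mu^2 1_p^T 1_p$.

We now determine $\Sigma_{ZZ}^{-1}$. First we note that

$$1_p^T 1_p = p P^T \Delta P$$

where $P$ denotes the Helmert orthogonal matrix (cf. Bennett and Franklin 1954, p. 102) and $\Delta$ is the $p \times p$ matrix with 1 in the $(1, 1)$ position and all remaining entries zero.

One then notes that

$$(I_2 \otimes P)\,\Sigma_{ZZ}\,(I_2 \otimes P^T)$$

reduces to just

$$\operatorname{diag}\{\sigma_X^2, \sigma_Y^2\} \otimes I_p + \sigma_\theta^2 J_2 \otimes I_p + p\sigma_\mu^2 J_2 \otimes \Delta$$

which makes it straightforward to show

$$\Sigma_{ZZ}^{-1} = (I_2 \otimes P^T)\,\Theta\,(I_2 \otimes P)$$

where

$$\Theta = \begin{pmatrix} \sigma_X^{-2} & 0 \\ 0 & \sigma_Y^{-2} \end{pmatrix} \otimes \begin{pmatrix} \gamma & 0 \\ 0 & \alpha I_{p-1} \end{pmatrix} + \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \otimes \begin{pmatrix} \sigma_X^{-2}\sigma_Y^{-2}\tilde\sigma^2\gamma & 0 \\ 0 & \sigma_X^{-2}\sigma_Y^{-2}\sigma_\theta^2\alpha I_{p-1} \end{pmatrix},$$

$\tilde\sigma^2 = \sigma_\theta^2 + p\sigma_\mu^2$, and $\gamma$ and $\alpha$ are as in Theorem 3.

From these results an explicit expression for $ZB$ is readily obtained. We first note

$$ZB = (XP^T, YP^T)\,\Theta\,(P^T, P^T)^T\,\Sigma_{\theta\theta}$$

and

$$\Theta\,(P^T, P^T)^T\,\Sigma_{\theta\theta} = (\sigma_X^{-2}, \sigma_Y^{-2})^T \otimes \begin{pmatrix} \gamma & 0 \\ 0 & \alpha I_{p-1} \end{pmatrix}(\sigma_\theta^2 I_p + p\sigma_\mu^2 \Delta)P.$$

Thus

$$ZB = (XP^T, YP^T)\left[(\sigma_X^{-2}, \sigma_Y^{-2})^T \otimes \begin{pmatrix} \gamma & 0 \\ 0 & \alpha I_{p-1} \end{pmatrix}(\sigma_\theta^2 I_p + p\sigma_\mu^2 \Delta)P\right] = (\sigma_X^{-2}X + \sigma_Y^{-2}Y)\Lambda \tag{A.2}$$

where

$$\Lambda = P^T \begin{pmatrix} \tilde\sigma^2\gamma & 0 \\ 0 & \sigma_\theta^2\alpha I_{p-1} \end{pmatrix} P.$$

Now represent $X$ as

$$X = \bar X 1_p + (X - \bar X 1_p).$$

Then we have

$$XP^T = p^{1/2}\bar X(1, 0, \ldots, 0) + (0, X_{(2)})$$

where

$$X_{(2)} = \left(\frac{X_1 - X_2}{\sqrt 2},\; \frac{X_1 + X_2 - 2X_3}{\sqrt 6},\; \ldots,\; \frac{X_1 + X_2 + \cdots + X_{p-1} - (p-1)X_p}{(p(p-1))^{1/2}}\right)$$

so

$$X\Lambda = \tilde\sigma^2\gamma\,p^{1/2}\bar X(1, 0, \ldots, 0)P + \sigma_\theta^2\alpha\,(0, X_{(2)})P = \tilde\sigma^2\gamma\,\bar X 1_p + \sigma_\theta^2\alpha\,(X - \bar X 1_p).$$

A similar result holds for $Y\Lambda$. Introducing the resulting expressions for $X\Lambda$ and $Y\Lambda$ in (A.2) yields

$$ZB = (\sigma_X^{-2}X + \sigma_Y^{-2}Y)\Lambda = (1-\gamma)[\beta\bar X + (1-\beta)\bar Y]1_p + (1-\alpha)[\beta(X - \bar X 1_p) + (1-\beta)(Y - \bar Y 1_p)] \tag{A.3}$$

since

$$1 - \gamma = \gamma\tilde\sigma^2(\sigma_X^{-2} + \sigma_Y^{-2})$$

and

$$1 - \alpha = \alpha\sigma_\theta^2(\sigma_X^{-2} + \sigma_Y^{-2}).$$

From the expression just obtained in (A.3), $E(Z)B$ and hence $E(\theta) - E(Z)B$ are easily found. To get $E(Z)B$, simply substitute $\mu_0 1_p$ for both $X$ and $Y$. This gives

$$E(Z)B = \gamma\tilde\sigma^2[\sigma_X^{-2} + \sigma_Y^{-2}]\mu_0 1_p.$$

At the same time $E(\theta) = \mu_0 1_p$, so we have

$$E(\theta) - E(Z)B = \gamma\mu_0 1_p. \tag{A.4}$$

The proof of Theorem 3 is now an immediate consequence of (A.3) and (A.4).

[Received September 1978. Revised March 1980.]

REFERENCES

Anderson, T.W. (1958), Introduction to Multivariate Statistical Analysis, New York: John Wiley & Sons.
Bennett, C.A., and Franklin, N.L. (1954), Statistical Analysis in Chemistry and the Chemical Industry, New York: John Wiley & Sons.
Efron, B., and Morris, C. (1973), "Combining Possibly Related Estimation Problems" (with discussion), Journal of the Royal Statistical Society, Ser. B, 35, 379-421.
Feller, W. (1966), An Introduction to Probability Theory and Its Applications (Vol. 2), New York: John Wiley & Sons.
Good, I.J. (1965), The Estimation of Probabilities, Cambridge, Mass.: MIT Press.
Hotelling, H. (1929), "Stability in Competition," Economic Journal, 39, 41-57.
James, W., and Stein, C. (1961), "Estimation With Quadratic Loss," Proceedings of the Fourth Berkeley Symposium on Probability and Statistics, 1, 361-379.
Lindley, D.V. (1971), "The Estimation of Many Parameters," in Foundations of Statistical Inference, eds. V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada, 435-455.
Lindley, D.V., and Smith, A.F.M. (1972), "Bayes Estimates for the Linear Model" (with discussion), Journal of the Royal Statistical Society, Ser. B, 34, 1-41.
Owen, D.B. (1956), "Tables for Computing Bivariate Normal Probabilities," Annals of Mathematical Statistics, 27, 1075-1090.
Stein, C. (1962), "Confidence Sets for the Mean of a Multivariate Normal Distribution" (with discussion), Journal of the Royal Statistical Society, Ser. B, 24, 265-296.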