Optimal Strategies for Second Guessers

J. MICHAEL STEELE and JAMES ZIDEK*

A model is given for a class of contests in which the participants try to guess (or estimate) unknown quantities, and the objective of each player is to come closer to the unknown quantities than an adversary. A general optimality result is proved that gives the best guessing rules for the second guesser. These rules are first calculated exactly in a certain hierarchical linear model, and then simpler approximate rules are given.

KEY WORDS: Guessing; Optimal strategies; Hierarchical linear model; Stein estimator; Posterior median.

*J. Michael Steele is Assistant Professor, Department of Statistics, Stanford University, Stanford, CA 94305. James Zidek is Professor, Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1W5, Canada. This work was supported in part by the Office of Naval Research under Grant N00014-76-C-0475 (NR-042-267) and the Army Research Office Grant DAAG-29-77-G-0031. The authors wish to thank R. Chacon, P. Diaconis, J. Kadane, I. Olkin, A. Pittinger, and D.C. Wu for their comments on earlier drafts of this article.

© Journal of the American Statistical Association, September 1980, Volume 75, Number 371, Theory and Methods Section

1. INTRODUCTION

The goal in many activities or contests is not necessarily to do well in any absolute sense, but merely to outperform an adversary. The objective of this article is to provide a model for such a contest, establish the optimality of certain procedures, and provide suitable approximations to these optimal procedures. But before yielding to the mathematics of the model, we wish to fix ideas with an anecdote.

Two statisticians, Bob and Mike, engaged in a contest to guess weights of people at a party. They agreed that Bob would always guess first. Mike would then guess, and finally the person in question would say who is closer. For example, for person number one Bob guessed 137 pounds. Mike then guessed 137.01 pounds, and the guest declared Mike the victor. The contest continued in a similar vein, and to Bob's dismay he won barely a quarter of the time.

It is intuitively clear that the second guesser has an advantage, and one of the results of Section 2 shows that this advantage is typically as large as the 75 percent obtained by Mike in the anecdote.

To continue the story, Bob was so stunned by defeat and eager for revenge that he elicited the assistance of a professional weight guesser. Mike agreed that since the new team was so powerful it should be willing to make all its guesses about the weights of the guests before Mike had to state any of his guesses. The team agreed to the proposed rule change, and Mike then proceeded to win even more convincingly than before.

The strategy used by Mike in the second case is naturally more sophisticated than the one he used when he was matched against an equal. This second strategy derives from a hierarchical linear model like that studied in Lindley and Smith (1972). It is also closely connected with the James-Stein estimator and was originally motivated by the "Batting Average" example of Efron and Morris (1973).

Our program begins by establishing in Section 2 a formal theory of guessing contests. We also give a simple but very general optimality result that forms the basis for the rest of the article.

The third section determines the exact optimal strategy for second guessing under a certain linear model. Practical approximations to this optimal strategy are worked out in Section 4. The final section gives a critical discussion of the various sources of difficulties inherent in applying this theory of guessing contests. While the main point of this article is to provide a tractable theory of guessing contests, we feel that the largest single point established is the approximate optimality of the simple rule given by (4.1).

2. HOTELLING'S STRATEGY

The structure of our guessing model can be described by a system of four p vectors.

Target values: $(\theta_1, \theta_2, \ldots, \theta_p) = \theta$
First guess: $(X_1, X_2, \ldots, X_p) = X$
Second guesser's hunch: $(Y_1, Y_2, \ldots, Y_p) = Y$
Second guess: $(G_1, G_2, \ldots, G_p) = G$

The $\theta_i$ represent the real values to be guessed. The $X_i$ are guesses made by the person who goes first, and all these are assumed to be available to the second guesser before he acts. The $Y_i$ represent the second guesser's best estimate of the $\theta_i$. Finally, the $G_i$ are the guesses to be announced by the second guesser. Our principal task is to determine how $G$ should be based on $X$ and $Y$.

The objective of each player is to come closer to $\theta$ than his opponent, so we begin by setting

\[ V(G, \theta) = \sum_{i=1}^{p} V_i(G, \theta), \tag{2.1} \]

where

\[ V_i(G, \theta) = \begin{cases} 1 & \text{if } |G_i - \theta_i| < |X_i - \theta_i| \\ 0 & \text{otherwise.} \end{cases} \]
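The reward (2.1) is straightforward to compute. As a rough illustration, the following Python sketch evaluates $V$ and replays the anecdote of Section 1 with two equally skilled guessers, the second placing each guess just beside the first guess on the side of his own estimate. The normal distributions, their parameters, the seed, and the names `reward` and `guess_beside` are assumptions made for this sketch and are not taken from the article.

```python
import numpy as np

def reward(G, X, theta):
    """V(G, theta) of (2.1): number of trials on which G is strictly closer to theta than X."""
    G, X, theta = (np.asarray(a, dtype=float) for a in (G, X, theta))
    return int(np.sum(np.abs(G - theta) < np.abs(X - theta)))

def guess_beside(X, Y, eps=1e-6):
    """Place each second guess eps away from X, on the side where the estimate Y lies."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    return np.where(X < Y, X + eps, X - eps)

# Assumed setup (hypothetical, for illustration only): normal weights and
# two guessers whose errors are independent with the same distribution.
rng = np.random.default_rng(0)
p = 100_000
theta = rng.normal(150.0, 25.0, size=p)       # true weights
X = theta + rng.normal(0.0, 10.0, size=p)     # first guesses (Bob)
Y = theta + rng.normal(0.0, 10.0, size=p)     # second guesser's own estimates (Mike)

G = guess_beside(X, Y)
win_rate = reward(G, X, theta) / p
print(f"second guesser wins {win_rate:.3f} of the trials")
```

With a small `eps` the second guess wins exactly when the target lies on the same side of the first guess as the second guesser's estimate, which is why only the direction of the move matters.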

The strategic objective of the second guesser is therefore to maximize $EV(G, \theta)$; that is, the second guesser wishes to maximize the expected number of times his guesses come closer to the true values.

The only probabilistic assumptions to be made now are that $\theta$, $X$, and $Y$ have a joint distribution that is continuous. This assumption is made for convenience and avoids the ad hoc conventions required for dealing with ties.

Now let $\nu_i(X, Y)$ denote the median of the conditional distribution of $\theta_i$ given $X$ and $Y$. A key role in our guessing theory is played by the following strategy:

\[ G_i^{\varepsilon} = \begin{cases} X_i + \varepsilon & \text{if } X_i < \nu_i(X, Y) \\ X_i - \varepsilon & \text{otherwise.} \end{cases} \]

These strategies will subsequently be called Hotelling strategies since they were essentially put forward in Hotelling (1929, p. 51). There are broad differences between the present model and Hotelling's problem in location economics, but the relationship seems close enough to justify (or even require) the name. The main fact in this section is the following simple result:

Theorem 1: The Hotelling strategies are $\varepsilon$ optimal; that is,

\[ \lim_{\varepsilon \to 0} EV(G^{\varepsilon}, \theta) = \sup_{G} EV(G, \theta). \]

Proof: Write $P_{X,Y}$ for conditional probability given $X$ and $Y$. Since any guess $G_i$ must be on one side or the other of $X_i$, we have

\[ P_{X,Y}(|G_i - \theta_i| < |X_i - \theta_i|) \le \max\{P_{X,Y}(\theta_i < X_i),\; P_{X,Y}(\theta_i \ge X_i)\}. \]

The basic observation about $G_i^{\varepsilon}$ is that

\[ \lim_{\varepsilon \to 0} P_{X,Y}(|G_i^{\varepsilon} - \theta_i| < |X_i - \theta_i|) = \max\{P_{X,Y}(\theta_i \le X_i),\; P_{X,Y}(\theta_i \ge X_i)\}. \]

Taking expectations in the two preceding relations and summing over $1 \le i \le p$, the theorem is proved.

A compelling impediment to the use of Hotelling strategies is that they require the knowledge of the joint distribution of $\theta$, $X$, and $Y$, or at least the knowledge of $\nu_i(X, Y)$. The key task of the remainder of this article is to isolate some feasible circumstances in which this impediment can be overcome.

To begin, consider the strategies

\[ G_i = \begin{cases} X_i + \varepsilon & \text{if } X_i < Y_i \\ X_i - \varepsilon & \text{otherwise,} \end{cases} \]

where the second guesser places his guess just a bit to the side of the first guess in the direction of his own "hunch" $Y_i$.

In some cases one can show that these hunch-guided guesses are in fact Hotelling strategies. Certainly, if the vectors $(\theta_i, Y_i, X_i)$, $1 \le i \le p$, are independent and $\theta_i \mid Y_i \sim N(Y_i, \sigma_Y^2)$, $X_i \mid \theta_i, Y_i = X_i \mid \theta_i \sim N(\theta_i, \sigma_X^2)$, then $\nu_i(X, Y)$ is on the same side of $X_i$ as $Y_i$. This immediately implies that the Hotelling and hunch-guided strategies then coincide.

Without distributional assumptions on $\theta$, one can no longer speak of the optimality of a guessing strategy, but the following result points out a case in which the second guesser can still realize a substantial advantage.

Theorem 2 (Three-Quarter Theorem)¹: If $\tilde{X} = X - \theta$ and $\tilde{Y} = Y - \theta$ are identically distributed, independent, and symmetric about zero, then the hunch-guided guess has probability 3/4 of winning as $\varepsilon \to 0$.

Proof: As $\varepsilon \to 0$, the probability that the hunch-guided guesser loses is $P(\tilde{Y} < \tilde{X} < 0) + P(0 < \tilde{X} < \tilde{Y})$, which by symmetry equals

\[ 2P(0 < \tilde{X} < \tilde{Y}) = P(0 < \tilde{X} \text{ and } 0 < \tilde{Y}) = 1/4. \]

¹ A result equivalent to that given here was told to the first author in 1975 by R. Chacon and was known much earlier to R. Chacon and S. Kochen. The result was also known earlier to T. Cover in the form: Between two "equally matched" basketball teams the odds are 3 to 1 in favor of the team leading at the half.

In a practical application of the three-quarter theorem the assumption of identical distributions might seem to pose some difficulties. It is reassuring that the result is quite robust. For example, assuming unbiased jointly normal guesses, the second guesser still wins with probability greater than .68 when $\operatorname{var}\tilde{Y}/\operatorname{var}\tilde{X} = 2.5$ and wins with probability greater than .59 when $\operatorname{var}\tilde{Y}/\operatorname{var}\tilde{X} = 10$. (These probabilities are easily confirmed by tables of the bivariate normal, e.g., Owen 1956.) The more detailed assessment of robustness in guessing competitions will be dealt with in a subsequent report, but one should note that an obvious aspect of nonrobustness under gross changes in the model of Theorem 2 is that the probability of the second guesser winning will tend to 1/2 or 1 according as $\operatorname{var}\tilde{Y}/\operatorname{var}\tilde{X}$ tends to $\infty$ or 0.

3. GAUSSIAN GUESSING

Since Hotelling strategies have been shown to be optimal, one would naturally like to provide a class of models in which the strategies can be determined explicitly. The main result of this section is to give such explicit strategies under a multivariate normal model studied by Lindley (1971) and Lindley and Smith (1972).

We write $U \mid V$ for the conditional distribution of $U$ given $V$, $1_p$ for the row p vector $(1, 1, \ldots, 1)$, and $I_p$ for the $p \times p$ identity matrix.

Our Gaussian model assumptions are the following:

\[ \theta \sim N(\mu_0 1_p,\; \sigma_\theta^2 I_p + \sigma_\mu^2 1_p^T 1_p) \]

and

\[ (X, Y) \mid \theta \sim N(\theta (I_p, I_p),\; \Gamma). \tag{3.1} \]
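As an informal aid to reading the Gaussian model, the sketch below draws $(\theta, X, Y)$ in two stages. It assumes the prior $\theta \sim N(\mu_0 1_p, \sigma_\theta^2 I_p + \sigma_\mu^2 1_p^T 1_p)$, which is the mean and covariance of $\theta$ used in the Appendix, and it takes the two guessers' errors to be independent with variances $\sigma_X^2$ and $\sigma_Y^2$. Drawing a shared random level `mu` first is one equivalent way to obtain that covariance; the function name `sample_model` and all numerical values are hypothetical.

```python
import numpy as np

def sample_model(p, mu0, var_mu, var_theta, var_x, var_y, rng):
    """Draw (theta, X, Y) once from the Gaussian model (3.1), in two stages."""
    mu = mu0 + np.sqrt(var_mu) * rng.standard_normal()         # shared level, variance var_mu
    theta = mu + np.sqrt(var_theta) * rng.standard_normal(p)   # targets around the shared level
    X = theta + np.sqrt(var_x) * rng.standard_normal(p)        # first guesses
    Y = theta + np.sqrt(var_y) * rng.standard_normal(p)        # second guesser's hunches
    return theta, X, Y

# Assumed parameter values, chosen only to exercise the code.
rng = np.random.default_rng(1)
draws = np.array([sample_model(3, 150.0, 4.0, 9.0, 1.0, 2.0, rng)[0]
                  for _ in range(50_000)])

emp_cov = np.cov(draws, rowvar=False)                     # empirical covariance of theta
target_cov = 9.0 * np.eye(3) + 4.0 * np.ones((3, 3))      # var_theta * I + var_mu * 1'1
print(np.round(emp_cov, 1))
```

The empirical covariance of the simulated $\theta$ should be close to `target_cov`: the shared level contributes the constant off-diagonal term, and the trial-to-trial variation contributes the diagonal term.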

In (3.1), $\Gamma$ is the block-diagonal matrix

\[ \Gamma = \begin{pmatrix} \sigma_X^2 I_p & 0 \\ 0 & \sigma_Y^2 I_p \end{pmatrix}. \]

The physical motives behind this model are that the true weights $\theta$ of the persons we see are viewed as independent realizations of a single fixed random process that was itself once drawn from a population of random processes. For example, the parameter $\mu$ can be viewed as a geographically fixed quantity determined at an earlier time by (random) immigrations. The assumption of normality is made partially out of traditional convenience, but also because it seems justifiable in the weight-guessing example. The structural model together with the normality lead uniquely to the Gaussian model specified in (3.1). The promised explicit determination of the Hotelling strategy is now possible.

Theorem 3: Under the preceding Gaussian model the Hotelling strategy is

\[ G_i^{*} = \begin{cases} X_i + \varepsilon & \text{if } X_i < \gamma\mu_0 + (1-\gamma)[\beta\bar{X} + (1-\beta)\bar{Y}] + (1-\alpha)[\beta(X_i - \bar{X}) + (1-\beta)(Y_i - \bar{Y})] \\ X_i - \varepsilon & \text{otherwise,} \end{cases} \tag{3.2} \]

where $\bar{X}$ and $\bar{Y}$ denote the averages of the $X_i$ and the $Y_i$,

\[ \gamma = \sigma^{-2}(\sigma^{-2} + \sigma_X^{-2} + \sigma_Y^{-2})^{-1}, \qquad \sigma^2 = \sigma_\theta^2 + p\sigma_\mu^2, \]

\[ \alpha = \sigma_\theta^{-2}(\sigma_\theta^{-2} + \sigma_X^{-2} + \sigma_Y^{-2})^{-1}, \]

and

\[ \beta = \sigma_X^{-2}(\sigma_X^{-2} + \sigma_Y^{-2})^{-1}. \]

The proof of the preceding theorem depends on a multivariate calculation that we have deferred to the Appendix in order to take up directly the problem of interpreting the result.

The basic part is the mixture of means,

\[ \gamma\mu_0 + (1-\gamma)[\beta\bar{X} + (1-\beta)\bar{Y}], \]

which is perturbed on trial $i$ by the "mixture" of residuals $\alpha \cdot 0 + (1-\alpha)[\beta(X_i - \bar{X}) + (1-\beta)(Y_i - \bar{Y})]$. The coefficient $\beta = \sigma_X^{-2}(\sigma_X^{-2} + \sigma_Y^{-2})^{-1}$ appearing in these mixtures is near 0, 1/2, or 1 accordingly as $\sigma_X^2/\sigma_Y^2$ is near $\infty$, 1, or 0. This ratio is one natural measure of the relative abilities of the two guessers, and this interpretation is reinforced by considering the extreme cases. When $\sigma_X^2/\sigma_Y^2 \to \infty$ the first guess is essentially ignored, and when $\sigma_X^2/\sigma_Y^2 \to 0$ it is the hunch that is ignored. This last case is of particular interest since it corresponds to trying to outguess a far better informed adversary.

4. STEIN-GUIDED GUESSING

The strategies just derived have the drawback that they are functions of $\mu_0$, $\sigma_\mu^2$, $\sigma_\theta^2$, $\sigma_X^2$, and $\sigma_Y^2$. Although the magnitude of $\mu_0$ and of the relevant variance ratios may be sufficiently understood for some applications, the exact values of these quantities cannot generally be assumed to be known. The next objective is thus to derive reasonable estimates of the unknown mean and variances. One benefit of this analysis is a clearer understanding of the empirical fact (Efron and Morris 1973) that the Stein estimator performs well with respect to the reward function $V$.

A Bayesian approach to the estimations given before can be made along the lines suggested in Lindley and Smith (1972), but such a procedure can prove quite complex (cf. discussion by V. D. Barnett in Lindley and Smith 1972). The estimators considered here are based on an empirical Bayes procedure that seems both simple and sensible.

As before, we write $Z = (X, Y)$ and begin by transforming $Z$ into a canonical form. Next we recall that $1_p^T 1_p = pP^T A P$, where $P$ denotes the $p \times p$ Helmert orthogonal matrix (cf. Bennett and Franklin 1954, p. 102) and $A$ is the $p \times p$ matrix with 1 in the (1, 1) position and all other entries zero. We define $(U, V)$ by

\[ (U, V) = 2^{-1/2}(X, Y) \begin{pmatrix} I_p & I_p \\ I_p & -I_p \end{pmatrix} \begin{pmatrix} P^T & 0 \\ 0 & P^T \end{pmatrix}. \]

A straightforward calculation shows

\[ (U, V) \sim N(\mu^{*}, \Sigma^{*}), \]

where

\[ \mu^{*} = (2p)^{1/2}\mu_0 (e, 0), \qquad e = (1, 0, \ldots, 0), \]

\[ \Sigma^{*} = \begin{pmatrix} \sigma_{+}^2 I_p & \sigma_{-}^2 I_p \\ \sigma_{-}^2 I_p & \sigma_{+}^2 I_p \end{pmatrix} + 2\sigma_\theta^2 \begin{pmatrix} I_p & 0 \\ 0 & 0 \end{pmatrix} + 2p\sigma_\mu^2 \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}, \]

and

\[ \sigma_{+}^2 = \tfrac{1}{2}(\sigma_X^2 + \sigma_Y^2), \qquad \sigma_{-}^2 = \tfrac{1}{2}(\sigma_X^2 - \sigma_Y^2). \]

From the canonical form shown one notes that $\sigma_\mu^2$ cannot be meaningfully estimated since there is only one degree of freedom available for its estimation.

We now turn to the analysis of important special cases that correspond to qualitatively different contexts.

Case A, Known Variances: We need to estimate only $\mu_0$, and this is done by maximizing the Type II likelihood (cf. Good 1965). This calculation follows easily from equation (A.1) of the Appendix, and the estimator obtained is

\[ \hat{\mu}_0 = (1_{2p}\Sigma_{ZZ}^{-1}Z^T)(1_{2p}\Sigma_{ZZ}^{-1}1_{2p}^T)^{-1}. \]

This simplifies further to just

\[ \hat{\mu}_0 = \beta\bar{x} + (1-\beta)\bar{y}, \]

so the estimated Hotelling strategy becomes

\[ G_i^{*} = \begin{cases} X_i + \varepsilon & \text{if } X_i < \alpha[\beta\bar{x} + (1-\beta)\bar{y}] + (1-\alpha)[\beta x_i + (1-\beta)y_i] \\ X_i - \varepsilon & \text{otherwise,} \end{cases} \]

where the parameters $\alpha$ and $\beta$ are as specified in Theorem 3.

Case B: $\sigma_\mu^2 = 0$; $\sigma_X^2 = \sigma_Y^2 = \delta^2$ and $\sigma_\theta^2$ unknown. The canonical model simplifies to

\[ (U, V) \sim N\left( (2p)^{1/2}\mu_0(e, 0),\; \delta^2 \begin{pmatrix} I_p & 0 \\ 0 & I_p \end{pmatrix} + 2\sigma_\theta^2 \begin{pmatrix} I_p & 0 \\ 0 & 0 \end{pmatrix} \right), \]

and this time estimators are easily found without carrying out the likelihood maximization. We take the estimators of $\mu_0$, $r = (\delta^2 + 2\sigma_\theta^2)^{-1}$, and $\delta^2$, respectively, given by

\[ \hat{\mu}_0 = (2p)^{-1/2}U_1, \qquad \hat{r} = p\Big(\sum_{i=2}^{p} U_i^2\Big)^{-1}, \qquad \hat{\delta}^2 = p^{-1}VV^T. \]

In terms of $X$ and $Y$ we then get

\[ \hat{\mu}_0 = \tfrac{1}{2}(\bar{x} + \bar{y}), \qquad \hat{r} = p\Big[\sum_{i=1}^{p}\Big(\frac{x_i + y_i}{\sqrt{2}} - \frac{\bar{x} + \bar{y}}{\sqrt{2}}\Big)^2\Big]^{-1}, \]

and

\[ \hat{\delta}^2 = \frac{1}{p}\sum_{i=1}^{p}\Big(\frac{x_i - y_i}{\sqrt{2}}\Big)^2. \]

The approximate Hotelling strategy for this case is therefore

\[ G_i^{*} = \begin{cases} X_i + \varepsilon & \text{if } X_i < \hat{\alpha}\cdot\tfrac{1}{2}(\bar{x} + \bar{y}) + (1 - \hat{\alpha})\cdot\tfrac{1}{2}(x_i + y_i) \\ X_i - \varepsilon & \text{otherwise,} \end{cases} \]

where

\[ \hat{\alpha} = \hat{\delta}^2\hat{r} = \Big[\sum_{i=1}^{p}(x_i - y_i)^2\Big]\Big[\sum_{i=1}^{p}\big((x_i + y_i) - (\bar{x} + \bar{y})\big)^2\Big]^{-1}. \]

One should note that $\hat{\alpha}$ has a natural interpretation. It is just the ratio (between the guesser variance)/(between the trial variance). The optimal strategy favors using $\tfrac{1}{2}(\bar{x} + \bar{y})$ when $\hat{\alpha}$ is close to 1 and favors $\tfrac{1}{2}(x_i + y_i)$ when $\hat{\alpha}$ is close to 0. Also, the strategy can be improved slightly by replacing $\hat{\alpha}$ by 1 if it happens that $\hat{\alpha} > 1$.

Case C: $\sigma_\mu^2 = 0$, $\sigma_Y^2 = \infty$, $\sigma_X^2$ known; $\sigma_\theta^2$ unknown. This is a case we feel to be of particular interest. Calculating as before, we find that a Stein estimator determines the approximate Hotelling strategy, but that it plays a cameo role since the strategy simplifies to just "betting on the $\bar{X}$ side of $X_i$." This simple result gives some theoretical justification to the otherwise somewhat mysterious empirical fact that $\bar{X}$ performs even better than the Stein estimator in terms of "gambler's" loss on the batting average data set (cf. Plackett's comment in Efron and Morris 1973, p. 416, and an easy computation).

The formal analysis begins as in Case B. Since $\sigma_Y^2 = \infty$, $Y$ is uninformative and the analysis must rest on $X$. Also, since $\sigma_\mu^2 = 0$, the canonical form of the $X$'s marginal model can be simplified to

\[ U^{*} \sim N(p^{1/2}\mu_0 e,\; (\sigma_X^2 + \sigma_\theta^2)I_p). \]

The obvious estimate of $\mu_0$ is given by $\hat{\mu}_0 = \bar{x}$, and if we require that the estimate, $\hat{r}_X$, of $r_X = (\sigma_X^2 + \sigma_\theta^2)^{-1}$ be unbiased,

\[ \hat{r}_X = (p - 3)\Big[\sum_{i=1}^{p}(x_i - \bar{x})^2\Big]^{-1} \]

becomes the natural choice and the estimated Hotelling strategy is

\[ G_i^{*} = \begin{cases} X_i + \varepsilon & \text{if } X_i < \hat{\alpha}^{*}\bar{x} + (1 - \hat{\alpha}^{*})X_i \\ X_i - \varepsilon & \text{otherwise,} \end{cases} \]

where $\hat{\alpha}^{*} = \min\{1, \hat{\alpha}\}$ and

\[ \hat{\alpha} = \sigma_X^2 (p - 3)\Big[\sum_{i=1}^{p}(x_i - \bar{x})^2\Big]^{-1}. \]

The direction of the guess on the side of $X_i$ is determined by $\hat{\alpha}^{*}\bar{x} + (1 - \hat{\alpha}^{*})X_i$, which is precisely the Stein estimator as modified by Lindley (cf. James and Stein 1961 and the discussion following Lindley and Smith 1972).

Now since the convex combination of $\bar{x}$ and $X_i$ will always be on the same side of $X_i$ as $\bar{x}$, the estimated Hotelling strategy can be more simply written as just

\[ G_i^{*} = \begin{cases} X_i + \varepsilon & \text{if } X_i < \bar{X} \\ X_i - \varepsilon & \text{otherwise.} \end{cases} \tag{4.1} \]

This is an extraordinarily simple procedure in a model that we feel may be realistic in several sporting and business contexts.

To assess the performance of this Stein-guided strategy, guessing trials were simulated for a variety of special cases. The $\theta_i$ were chosen as $\theta_i = 0$, $\theta_i = i$, and $\theta_i = i^2$ for each of $i = 1, 2, \ldots, p$ with $p = 10$ and then with $p = 100$; thus, in all, $3 \times 2 = 6$ cases were considered. As an illustration of the computation consider the case in which $\theta_i = i$ and $p = 100$. In this case 200 repetitions were made as follows:

1. $X$ was generated as $N(\theta, I)$ with $\theta = (1, 2, \ldots, 100)$.
2. $G_i^{*}$ was calculated by (4.1) with $\varepsilon = 10^{-1}$.
3. $V$ was then calculated, and the process was repeated 200 times.
4. The 200 realizations of $V/100$ were used to estimate the density of $V/100$, the percentage of times the second player wins using the $G^{*}$ of (4.1).
5. This density was plotted in Figure A (in this case, the unshaded density in the middle graph).

[Figure A. Estimated Density of the Proportion of Second Guesser Wins Using the Stein-Guided Optimal Strategy (based on runs of 200 contests). Three graphs, one for each choice of the $\theta_i$, each showing the estimated densities for p = 10 and p = 100. Horizontal axis: x, proportion of p trials won by the second guesser, from 0 to 1.0.]

From Figure A one learns by looking at the unshaded density in the top graph that when as many as 10 parameters growing like $i^2$ are to be guessed the modal percentage of correct guesses made by the second guesser is about 95 percent. The general conclusions to be drawn from Figure A are

1. The more parameters to be guessed, the greater the advantage to the second guesser.
2. The more spread out the $\theta_i$ to be guessed, the more the advantage to the second guesser.

In Figure B, these conclusions are further examined by taking the $\theta_i$ themselves to be random. Here p = 20 was fixed throughout. First we took a realization of $\theta \sim N(0, 4I_{20})$. Then 200 of the $X$'s were generated with the same fixed underlying $\theta$ (just as in Figure A). The density of $V/200$ was estimated as before, and altogether 25 runs were made. The 25 runs produced remarkably similar estimates of the density of $V/200$, and the estimates from runs number 1, 17, and 21 were selected as indicative of the variability in the 25 runs.

[Figure B. Estimated Density of the Proportion of Second Guesser Wins in p = 20 Trials Using the Stein-Guided Optimal Strategy (three runs of 100 contests and combined runs of 2,500 contests). Curves labeled Run 1, Run 17, and Run 21. Horizontal axis: x, proportion of p = 20 trials won by the second guesser, from 0 to 1.0.]

5. REALITIES OF APPLICATION

Almost any discussion of the preceding theory eventually turns to the problem of football betting, and it seems generally worthwhile to note why the theory is not applicable to that problem. A key reason is that the bookie (or person "setting the line") is not trying to estimate the actual point spread. The bookie is trying to produce a point spread that will produce a nearly equal number of takers on each side of the spread. The bookie is therefore not a first guesser in the sense of this article, and our theory naturally does not apply.

Consider instead two bookies of equal caliber, one of whom sets his line on Monday and the other on Tuesday (for the game on Sunday). If the Tuesday bookie wished to obtain only a more even distribution of customers on either side of his line than the Monday bookie, he should be able to do so in almost three-quarters of the games by using the hunch-guided guessing of Section 2. In this case, each bookie is a bona fide guesser of that spread s that will evenly split the pool of bettors.

Since there are actually many games each week, the Tuesday bookie could actually outperform the Monday bookie by using the Stein-guided strategy of Section 4, particularly (4.1). The assumptions of (3.1) may not be applicable to the whole set of games; but if one considers only noncharismatic games outside the bookie's city, then (3.1) seems reasonable. (This is a stratification step to obtain increased homogeneity of the spreads to be guessed.)

The examples put forward before are in the long tradition of gedanken experimente, and the problem of producing a truly telling application remains open. An intriguing aspect of a theory of this nature is that it is only necessary to find one good application.

APPENDIX: PROOF OF THEOREM 3

By Theorem 1 the problem depends on the calculation of the posterior median $\nu_i(X, Y)$, which by the normality assumptions (3.1) coincides with the posterior mean. The argument given here for completeness is similar to those of Lindley (1971) and Lindley and Smith (1972). It depends on the well-known fact (cf. Anderson 1958, p. 27) that if $U$ and $V$ are jointly normal then

\[ U \mid V \sim N(EU - (EV)B + VB,\; \Sigma_{UU \cdot V}), \]

where $B = \Sigma_{VV}^{-1}\Sigma_{VU}$, $\Sigma_{UU \cdot V} = \Sigma_{UU} - \Sigma_{UV}\Sigma_{VV}^{-1}\Sigma_{VU}$, $\Sigma_{VV}$ denotes the covariance matrix of $V$, and so on.

Setting $Z = (X, Y)$ and applying the preceding identities to (3.1), we have

\[ Z \sim N(\mu_0(1_p, 1_p),\; \Sigma_{ZZ}), \tag{A.1} \]

where

\[ \Sigma_{ZZ} = \operatorname{diag}\{\sigma_X^2, \sigma_Y^2\} \otimes I_p + \sigma_\theta^2 J_2 \otimes I_p + \sigma_\mu^2 J_2 \otimes 1_p^T 1_p. \]
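For readers who want to experiment with the covariance in (A.1), the sketch below assembles $\Sigma_{ZZ}$ with Kronecker products (`np.kron`), taking $J_2$ to be the $2 \times 2$ matrix of ones, and checks numerically the Helmert identity $1_p^T 1_p = pP^T A P$ recalled in Section 4. The explicit row formula used for the Helmert matrix is the standard one and is an assumption about the normalization intended here; the function names and parameter values are hypothetical.

```python
import numpy as np

def helmert(p):
    """Helmert orthogonal matrix: first row constant, row k+1 contrasts X_1..X_k with X_{k+1}."""
    P = np.zeros((p, p))
    P[0, :] = 1.0 / np.sqrt(p)
    for k in range(1, p):
        scale = np.sqrt(k * (k + 1))
        P[k, :k] = 1.0 / scale
        P[k, k] = -k / scale
    return P

def sigma_zz(p, var_mu, var_theta, var_x, var_y):
    """Covariance of Z = (X, Y) as written in (A.1)."""
    I_p, J_2, ones = np.eye(p), np.ones((2, 2)), np.ones((p, p))
    return (np.kron(np.diag([var_x, var_y]), I_p)
            + var_theta * np.kron(J_2, I_p)
            + var_mu * np.kron(J_2, ones))

# Assumed values, for illustration only.
p = 5
P = helmert(p)
A = np.zeros((p, p))
A[0, 0] = 1.0
S = sigma_zz(p, var_mu=4.0, var_theta=9.0, var_x=1.0, var_y=2.0)

print(np.allclose(P @ P.T, np.eye(p)))                  # P is orthogonal
print(np.allclose(p * P.T @ A @ P, np.ones((p, p))))    # 1'1 = p P'AP
```

Rotating `S` by `np.kron(np.eye(2), P)` replaces the matrix of ones by `p * A`, which is what makes the covariance easy to invert coordinate by coordinate.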

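In (A.1), $\otimes$ denotes the Kronecker matrix product and $J_2$ is the $2 \times 2$ matrix with all entries 1. Further,

\[ E(\theta \mid Z) = E(\theta) - E(Z)B + ZB \]

with

\[ B = \Sigma_{ZZ}^{-1}\Sigma_{Z\theta}, \qquad \Sigma_{Z\theta} = \begin{pmatrix} \Sigma_{\theta\theta} \\ \Sigma_{\theta\theta} \end{pmatrix}, \]

and $\Sigma_{\theta\theta} = \sigma_\theta^2 I_p + \sigma_\mu^2 1_p^T 1_p$.

We now determine $\Sigma_{ZZ}^{-1}$. First we note that

\[ 1_p^T 1_p = pP^T A P, \]

where $P$ denotes the Helmert orthogonal matrix (cf. Bennett and Franklin 1954, p. 102) and $A$ is the $p \times p$ matrix with 1 in the (1, 1) position and all remaining entries zero.

One then notes that

\[ (I_2 \otimes P)\Sigma_{ZZ}(I_2 \otimes P^T) \]

reduces to just

\[ \operatorname{diag}\{\sigma_X^2, \sigma_Y^2\} \otimes I_p + \sigma_\theta^2 J_2 \otimes I_p + p\sigma_\mu^2 J_2 \otimes A, \]

which makes it straightforward to show

\[ \Sigma_{ZZ}^{-1} = (I_2 \otimes P^T)\,\Theta\,(I_2 \otimes P), \]

where

\[ \Theta = \begin{pmatrix} \sigma_X^{-2} & 0 \\ 0 & \sigma_Y^{-2} \end{pmatrix} \otimes \begin{pmatrix} \gamma & 0 \\ 0 & \alpha I_{p-1} \end{pmatrix} + \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \otimes \begin{pmatrix} \sigma_X^{-2}\sigma_Y^{-2}\sigma^2\gamma & 0 \\ 0 & \sigma_X^{-2}\sigma_Y^{-2}\sigma_\theta^2\alpha I_{p-1} \end{pmatrix}, \]

where $\gamma$, $\alpha$, and $\sigma^2 = \sigma_\theta^2 + p\sigma_\mu^2$ are as in Theorem 3.

From these results an explicit expression for $ZB$ is readily obtained. We first note

\[ ZB = (XP^T, YP^T)\,\Theta\,(P^T, P^T)^T\Sigma_{\theta\theta} \]

and

\[ \Theta\,(P^T, P^T)^T\Sigma_{\theta\theta} = \left[ (\sigma_X^{-2}, \sigma_Y^{-2})^T \otimes \begin{pmatrix} \gamma\sigma^2 & 0 \\ 0 & \alpha\sigma_\theta^2 I_{p-1} \end{pmatrix} \right] P. \]

Thus

\[ ZB = (XP^T, YP^T)\left[ (\sigma_X^{-2}, \sigma_Y^{-2})^T \otimes \begin{pmatrix} \gamma\sigma^2 & 0 \\ 0 & \alpha\sigma_\theta^2 I_{p-1} \end{pmatrix} \right] P = (\sigma_X^{-2}X + \sigma_Y^{-2}Y)\Lambda, \tag{A.2} \]

where

\[ \Lambda = P^T \begin{pmatrix} \sigma^2\gamma & 0 \\ 0 & \sigma_\theta^2\alpha I_{p-1} \end{pmatrix} P. \]

Now represent $X$ as

\[ X = \bar{x}1_p + (X - \bar{x}1_p). \]

Then we have

\[ XP^T = p^{1/2}\bar{x}(1, 0, \ldots, 0) + (0, X^{(2)}), \]

where

\[ X^{(2)} = \left( \frac{X_1 - X_2}{\sqrt{2}},\; \frac{X_1 + X_2 - 2X_3}{\sqrt{6}},\; \ldots,\; \frac{X_1 + X_2 + \cdots + X_{p-1} - (p-1)X_p}{\sqrt{p(p-1)}} \right), \]

so

\[ X\Lambda = \sigma^2\gamma\,p^{1/2}\bar{x}(1, 0, \ldots, 0)P + \sigma_\theta^2\alpha\,(0, X^{(2)})P = \sigma^2\gamma\,\bar{x}1_p + \sigma_\theta^2\alpha\,(X - \bar{x}1_p). \]

A similar result holds for $Y\Lambda$. Introducing the resulting expressions for $X\Lambda$ and $Y\Lambda$ in (A.2) yields

\[ ZB = (\sigma_X^{-2}X + \sigma_Y^{-2}Y)\Lambda = (1-\gamma)[\beta\bar{x} + (1-\beta)\bar{y}]1_p + (1-\alpha)[\beta(X - \bar{x}1_p) + (1-\beta)(Y - \bar{y}1_p)], \tag{A.3} \]

since

\[ 1 - \gamma = \gamma\sigma^2(\sigma_X^{-2} + \sigma_Y^{-2}) \]

and

\[ 1 - \alpha = \alpha\sigma_\theta^2(\sigma_X^{-2} + \sigma_Y^{-2}). \]

From the expression just obtained in (A.3), $E(Z)B$ and hence $E(\theta) - E(Z)B$ are easily found. To get $E(Z)B$, simply substitute $\mu_0 1_p$ for both $X$ and $Y$. This gives

\[ E(Z)B = \gamma\sigma^2[\sigma_X^{-2} + \sigma_Y^{-2}]\mu_0 1_p. \]

At the same time $E(\theta) = \mu_0 1_p$, so we have

\[ E(\theta) - E(Z)B = \gamma\mu_0 1_p. \tag{A.4} \]

The proof of Theorem 3 is now an immediate consequence of (A.3) and (A.4).

[Received September 1978. Revised March 1980.]

REFERENCES

Anderson, T.W. (1958), Introduction to Multivariate Statistical Analysis, New York: John Wiley & Sons.

Bennett, C.A., and Franklin, N.L. (1954), Statistical Analysis in Chemistry and the Chemical Industry, New York: John Wiley & Sons.

Efron, B., and Morris, C. (1973), "Combining Possibly Related Estimation Problems" (with discussion), Journal of the Royal Statistical Society, Ser. B, 35, 379-421.

Feller, W. (1966), An Introduction to Probability Theory and Its Applications (Vol. 2), New York: John Wiley & Sons.

Good, I.J. (1965), The Estimation of Probabilities, Cambridge, Mass.: MIT Press.

Hotelling, H. (1929), "Stability in Competition," Economic Journal, 39, 41-57.

James, W., and Stein, C. (1961), "Estimation With Quadratic Loss," Proceedings of the Fourth Berkeley Symposium on Probability and Statistics, 1, 361-379.

Lindley, D.V. (1971), "The Estimation of Many Parameters," in Foundations of Statistical Inference, eds. V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada, 435-455.

Lindley, D.V., and Smith, A.F.M. (1972), "Bayes Estimates for the Linear Model" (with discussion), Journal of the Royal Statistical Society, Ser. B, 34, 1-41.

Owen, D.B. (1956), "Tables for Computing Bivariate Normal Probabilities," Annals of Mathematical Statistics, 27, 1075-1090.

Stein, C. (1962), "Confidence Sets for the Mean of a Multivariate Normal Distribution" (with discussion), Journal of the Royal Statistical Society, Ser. B, 24, 265-296.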