Models: Masterpieces and Lame Excuses

An All Too Common Example:

Two politicians are arguing and one of them says, “But you are lying!” The other one says, “Yes I am … but hear me out.” --- The West Wing's President Jed Bartlet (Martin Sheen), in the episode where there is an assassination attempt on the President.

Physical Examples

The general notion of a model is much richer than what statisticians commonly have in mind when they use the word. First consider two concrete examples:

One big benefit of an architectural model is that it sharpens our intuition. The architect has a mental image of a building before embarking on the creation of his scale model, but the act of building the model sharpens his vision. It provokes questions from fellow architects, and it lets other people in on the conversation. These other people include clients and bankers. Unless they have a clear vision of the project, it is unlikely to happen.

There is a second stage in the notion of a "scale model" that also has a bearing on statistical modeling. It is more abstract than a physical scale model, but it is still substantially more concrete than a statistical model.

One Step Toward Abstraction: Maps of Cities

Perhaps my favorite example of a model is a map, say a map of the city of Philadelphia.

Even the cheapest city map is likely to answer all the reasonable questions that one can imagine. How far is Independence Hall from South Street? How do you get to the airport? Where is the Italian market? The humble city map does all this and still fits neatly in your pocket (if properly folded).

A good tourist map can even answer questions about things that no longer exist, such as the location of Ben Franklin's house or Patrick Henry's leaflet printing shop.

There are also much more specialized maps. There are maps of bus routes, and there are maps of water mains. There are also maps that specify responsibilities for fire houses and police stations and maps that define political boundaries, such as school districts and "wards" (whatever a "ward" is).

These models --- our maps --- continue to call on proportionality to a physical entity, but they also take up tasks that cannot be "seen" in the physical entity.

Nice Examples, but So What? And Where Did the Devil Come In?

Statisticians use the word model in a way that diverges greatly from these examples, but my reason for starting this conversation has everything to do with statistical models --- and their cousins, stochastic models and economic models. The burr under my saddle comes from the old saw that everyone attributes to G.E.P. Box: "All models are wrong, but some are useful."

Whenever you hear this phrase, there is a good chance that you are about to be sold a bill of goods. Something should be done to drive this devilish nostrum from our midst. At least once in a while, someone should do what Martin Luther is alleged to have done when the devil appeared to him.

But It's True, Right?

Because there are wonderful models --- like city maps --- it's easy to agree that there are useful models.

Also, if you get confused, you can say that such models are "all wrong." This little square here on the map is not the same as Independence Hall. Look, it doesn't even have a door. It's just a mark on a piece of paper.

Darrell Huff had a way to deal with such reasoning. He asked, "Did someone change the conversation?"

If I say that a map is wrong, it means that a building is misnamed, or the direction of a one-way street is mislabeled. I never expected my map to recreate all of physical reality, and I only feel ripped off if my map does not correctly answer the questions that it claims to answer.

My maps of Philadelphia are useful. Moreover, except for a few that are out-of-date, they are not wrong.

Be Fair! Maps are Different

So, you say, "Yes, a map can be thought of as a model, but surely it would be more precise to say that a map is a 'visually enhanced database.' Such databases can be correct. These are not the kinds of models that Box had in mind."

I agree. Box was probably thinking of something humble like a regression model or a time series model. He may have also been thinking about physical models --- like gravitation, where Newton's model is useful even though it is wrong in light of Einstein's corrections.

There is an example at the other end of the continuum which I am not capable of describing, but which Richard Feynman has described beautifully. This is QED, quantum electrodynamics. It is a rich theory with a vast body of empirical predictions. It has been tested carefully for more than sixty years, and it has passed every test.

Is QED a wrong model? I'll need to dig out the right quotes from Feynman's book, but the overall impression from the book is that QED --- for all its incredible strangeness --- is simply true.

You can argue, "Hey, that's what Laplace would have said about Newtonian gravitation." Of course, you may be right. Still, QED has stood up to some marvelous tests.

Try to Focus on Stuff We Understand, OK?

I confess, I will never understand QED well enough to speak persuasively about it. Moreover, the people who might read this web page are also unlikely to understand QED well enough to be persuaded by the analogy. The conversation needs to be focused on things we all understand --- regressions and time series and such.

I promise to do just that.

For the moment, let's peek ahead to see where the conversation might go. It could become a long one.

Later Parts of the Conversation --- Adequacy or Fitness?

It is common to consider the adequacy of a model. Typically, a statistical model is considered to be adequate if --- after it has been calibrated to the data --- there are no departures from the statistical assumptions that one would call significant.
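
To make the usual recipe concrete, here is a minimal sketch in Python (my own illustration, not anything from Box or from the adequacy literature): calibrate an ordinary regression to the data, run a couple of standard residual diagnostics, and declare the fit "adequate" if nothing comes up significant. The simulated data, the particular tests, and the 0.05 cutoff are all assumptions of the sketch, not recommendations.

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    y = 2.0 + 1.5 * x + rng.normal(size=200)           # data that really do follow the model

    X = sm.add_constant(x)                             # calibrate the model to the data
    fit = sm.OLS(y, X).fit()

    _, p_norm = stats.shapiro(fit.resid)               # normality of the residuals
    _, p_het, _, _ = het_breuschpagan(fit.resid, X)    # constant variance of the residuals

    print(f"Shapiro-Wilk p = {p_norm:.2f}, Breusch-Pagan p = {p_het:.2f}")
    print("declared adequate" if min(p_norm, p_het) > 0.05 else "declared inadequate")

Notice that every verdict here is rendered on the same data that were used for the fitting.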

One can quibble about this definition. In particular, it seems naive to ignore the often shocking divergence between "in sample" and "out of sample" model performance. A better notion of "adequate" should perhaps involve out of sample testing.
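
The divergence is easy to exhibit. In the sketch below (again my own, with made-up data and an over-flexible polynomial standing in for an over-fit model), the error on the fitting sample looks flattering, while the error on fresh data from the same process is typically far worse.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(n):
        x = rng.uniform(-1.0, 1.0, n)
        y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n)   # signal plus noise
        return x, y

    x_fit, y_fit = simulate(30)      # the data used to calibrate the model
    x_new, y_new = simulate(1000)    # fresh data the model has never seen

    coef = np.polyfit(x_fit, y_fit, deg=12)                   # deliberately over-flexible

    mse_in = np.mean((np.polyval(coef, x_fit) - y_fit) ** 2)
    mse_out = np.mean((np.polyval(coef, x_new) - y_new) ** 2)

    print(f"in-sample MSE     : {mse_in:.3f}")                # looks fine
    print(f"out-of-sample MSE : {mse_out:.3f}")               # usually much larger

A notion of adequacy that scored the model on held-out data would catch this at once.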

There are further quibbles, some of which are part of the usual discussion of adequacy. For example, a model that fails to incorporate "all relevant information" might be regarded by some as inadequate. This criterion is not well defined, but still it catches a useful sentiment.

Personally, I think that the notion of adequacy should be replaced with that of Fitness for Purpose. Abba Krieger once mentioned the old saying: "The saw is sharp enough if it cuts the tree." This seems entirely sensible to me.

Statisticians have largely avoided the use of fitness for purpose. I think this may come from the near-universal conceit that we are designing tools for people with such diverse objectives that we could never hope to attend honestly to fitness for purpose. This strikes me as egotistical baloney.

The majority of published statistical methods hunger for one honest example.

If you ask the Foster-Stine-Wyner question: "Does this fancy new method really beat what you get from a thoughtful regression analysis?", then the number of "new statistical methods" that have a convincing answer to this question is embarrassingly small.
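
The question is also easy to put into practice. The sketch below (mine, on synthetic stand-in data; a genuinely thoughtful regression would use carefully chosen predictors rather than a bare linear fit) simply scores a plain least-squares regression and a fashionable off-the-shelf learner on the same held-out folds before anyone gets to declare victory.

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)   # stand-in data

    contenders = [("plain regression", LinearRegression()),
                  ("fancy new method", GradientBoostingRegressor(random_state=0))]

    for name, model in contenders:
        scores = cross_val_score(model, X, y, cv=10, scoring="r2")    # out-of-sample R^2
        print(f"{name:17s}: mean R^2 = {scores.mean():.3f} (sd {scores.std():.3f})")

If the fancy method cannot win even this modest contest, its claim to usefulness deserves some skepticism.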

In part this is due to our perverse incentive structure. Collectively, for some strange reason, we are more likely to reward creators of new methods than providers of sensible analyses.

Too many forget (or never absorbed) the point of view that Tukey often took: Keep your eye on the science and keep your statistical tools very simple. Very seldom will you lose very much, and you are almost certain to spend much less of your life talking nonsense.

Send me mail if you have comments on what I have written so far.

Topics On the List

An Important Addendum

I would be foolish to leave the impression (while this essay is still under construction) that I in any way have a negative view of the work of George Box. On the contrary, he has always been one of the most consistently sensible among us. You have only to read his ASA Presidential Remarks to confirm this --- if you have ever forgotten.

Models Again

"Statisticians, like artists, have the bad habit of falling in love with their models." --- G.E.P. Box again, but more recently quoted by Dick DeVeaux in his interesting talk "Math is music, Statistics is literature"

Moral? It is both cheaper and safer for artists to work from photos! --- JMS

Chris Chatfield, D. R. Cox --- And Much More to Come

In his comments on Chris Chatfield's 1995 RSS Paper, D.R. Cox notes:

"Finally it does not seem helpful just to say that all models are wrong. The very word model implies simplification and idealization. The idea that complex physical, biological or sociological systems can be exactly described by a few formulae is patently absurd. The construction of idealized representations that capture important stable aspects of such systems is, however, a vital part of general scientific analysis and statistical models, especially substantive ones (Cox, 1990), do not seem essentially different from other kinds of model."

Other Resources

The Only Law of Economics

"The only law of economics that I believe in is Hamming’s law, “You cannot consume what is not produced”. There is not another single, reliable law in all of economics I know of which is not either a tautology in mathematics, or else it is sometimes false. Hence when you do simulations in economics you have not the reliability you have in the hard sciences." --- Richard Hamming, Art of Doing Science and Engineering (p. 137)

Assuming Normality

Assuming normality means never having to say you don’t have enough data.

As you see, there are no inverted commas and no attribution, so I accept responsibility for the unrepentant rerun of Erich Segal’s mushy: “Love means never having to say you’re sorry.”

Macro Forecasting --- 70s and Now

"Our expectations for forecasting were quite appropriately revised downward in the 1970s and 1980s, and the ensuing humility has been good for all. The new humility, however, is not symptomatic of failure, just as the bravado of the 1960s was not symptomatic of success." --- Frank Diebold JEP 1998

Questions of Markets and Not-So-Small Change

What was the highest volume day on the NYSE? Ans: January 4, 2001, when 2,129,445,637 shares traded. What was the lowest volume day on the NYSE? Ans: March 16, 1830, when only 31 shares traded. Things change. [Added Sept. 2008: the old volume record has been beaten --- details to follow.]



Still Not Amused? Consider Post-Modern Statistics.