Replication and Reproducibility

Gary King's Replication Crusade

Gary King argues quite convincingly that replication deserves a much greater role in social science, and that an excellent way for a graduate student to begin a research career in social science is by replicating some previously published result.

Students in 956 are invited to read King's essay "How to Write A Publishable (Class) Paper." It is tilted toward social science, but it will be useful for all of us to discuss how King's advice can be modified for the use of statisticians.

A Fama-French Mini-Project

I don't know the full history of the sequence of papers that people have in mind when they say "Fama-French." A nice micro-project for 956 would be to sort this out --- it should take less than a day, and with a little Google luck perhaps much less. The first step would be to build a timeline with a one-paragraph summary of each paper, detailing what was added (or subtracted) at each step.

One aim of this project is to take a few steps toward a bolted-to-the-ground understanding of cross-sectional versus time series analyses. This is something that I have put off far too long.
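To make the cross-sectional versus time series distinction concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the single factor, the sample sizes, the noise levels, and the helper name ols_intercept_slope. It does not use the WRDS data, and it is not claimed to be the method of any particular Fama-French paper. It is a two-pass exercise in the style usually associated with Fama and MacBeth: first one regression per asset run over time, then one regression per month run across assets.

```python
import random
import statistics


def ols_intercept_slope(x, y):
    """One-regressor least squares: return (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(956)  # fixed seed so the run is repeatable

# Synthetic panel (hypothetical numbers): T months, N assets, one factor.
T, N = 240, 25
true_betas = [0.5 + i / (N - 1) for i in range(N)]      # evenly spaced, 0.5 to 1.5
factor = [rng.gauss(0.006, 0.04) for _ in range(T)]     # factor return each month
returns = [
    [true_betas[i] * factor[t] + rng.gauss(0.0, 0.02) for i in range(N)]
    for t in range(T)
]  # returns[t][i] is the return of asset i in month t

# Time series pass: for each asset, regress its T returns on the factor.
est_betas = []
for i in range(N):
    asset_returns = [returns[t][i] for t in range(T)]
    _, beta_i = ols_intercept_slope(factor, asset_returns)
    est_betas.append(beta_i)

# Cross-sectional pass: for each month, regress the N returns on the betas.
monthly_slopes = []
for t in range(T):
    _, slope_t = ols_intercept_slope(est_betas, returns[t])
    monthly_slopes.append(slope_t)

premium = statistics.fmean(monthly_slopes)
print(f"average cross-sectional slope: {premium:.4f}")
print(f"average factor return:         {statistics.fmean(factor):.4f}")
```

The time series pass asks how each asset moves with the factor over time; the cross-sectional pass asks, month by month, how returns line up against those sensitivities across assets. In this toy setup the average cross-sectional slope should land close to the average factor return, because the data were built that way.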

What puts this project on this page is that WRDS has "the" Fama-French data set. Here we need the "the" in quotes, because --- as noted above --- there is a sequence of papers that needs to be absorbed.

Reproducible Results --- At Least in Statistics

"... the entire analysis should be reproducible. In real science, this is hard. Redoing all the chemistry, or all the field work, or whatever is asking a lot. But in mathematical and computing sciences, like statistics, reproducibility is perfectly possible. It only takes will and knowledge to do it." --- Charlie Geyer, in his discussion of Sweave, a R function that coordinates Latex integration with R and guarantees that you will be able to replicate your results (and create an archive that lets others replicate your results).

Sweave implements one more piece of Don Knuth's "literate programming" paradigm. In time, fully documented computations will become standard practice --- at least for published computations. Self-documenting computations spread knowledge efficiently, and they can save billions of your own brain cells. Why not jump in now and get ahead of the game?
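Sweave does its work by weaving R code into a LaTeX source file. The sketch below is not Sweave and makes no claim about how Sweave works internally. It is only a small Python illustration of the underlying habit: keep the seed and inputs with the result so that anyone can rerun the computation and check that they get the same answer. The function names run_analysis and documented_run, and the toy "analysis" itself, are hypothetical.

```python
import hashlib
import json
import platform
import random
import statistics


def run_analysis(seed, n=1000):
    """Toy analysis: the mean of n seeded Gaussian draws."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(draws)


def documented_run(seed, n=1000):
    """Return the result together with what is needed to rerun it."""
    result = run_analysis(seed, n)
    record = {
        "seed": seed,
        "n": n,
        "result": result,
        "python": platform.python_version(),
    }
    # A digest of the inputs and the result makes silent changes easy to spot.
    payload = json.dumps({"seed": seed, "n": n, "result": result}, sort_keys=True)
    record["digest"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return record


first = documented_run(seed=956)
second = documented_run(seed=956)
print(json.dumps(first, indent=2))
print("identical on rerun:", first == second)
```

Rerunning with the same seed reproduces the record exactly, while changing the seed or the sample size changes the digest. A real project would also record package versions and the data source.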