One feature of hypothesis testing that is perhaps too often ignored is that one seldom tests just one hypothesis. Even in the most classical situations, one almost always makes multiple tests. In more modern investigations, such as those that are common in machine learning, bioinformatics, or even the econometrics of financial time series, one might test several thousand hypotheses.
Obviously such activities call for some rethinking of the traditional p=0.05 standards.
In fact, a large (and rambling) literature addresses this important issue. As our course continues, I will introduce what seem to me to be some of the main messages that emerge.
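To see why the p=0.05 standard needs rethinking, here is a minimal sketch of how quickly the chance of at least one false rejection grows with the number of tests. It assumes that all m null hypotheses are true and that the tests are independent, which is an idealization made only for illustration. The function name `familywise_error_rate` is my own and not from any of the papers.

```python
def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false rejection among m tests.

    Assumes (for illustration only) that all m null hypotheses are true
    and that the tests are independent, each run at level alpha.
    """
    # Each test avoids a false rejection with probability (1 - alpha),
    # so all m avoid one with probability (1 - alpha) ** m.
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(f"m = {m:5d}   P(at least one false rejection) = "
              f"{familywise_error_rate(m):.4f}")
```

Under these assumptions, ten tests at the 0.05 level already give roughly a 40% chance of at least one spurious "discovery," and with a hundred tests a false rejection is all but certain.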
A great place to start is with Holm (1979).
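For orientation, here is a rough sketch of the step-down procedure usually attributed to Holm (1979), as I understand it: sort the p-values, compare the k-th smallest to alpha/(m-k+1), and stop at the first one that fails. The function name `holm_reject` and the example p-values are made up for illustration, so please check the details against the paper itself.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Sketch of a Holm-style step-down rule (illustrative, not authoritative):
    visit the p-values from smallest to largest and compare the k-th
    smallest (k = 1, ..., m) to alpha / (m - k + 1); stop at the first failure.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank = k - 1
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are not rejected
    return reject


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to show the stopping rule.
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

With these made-up p-values, the two smallest (0.005 and 0.01) pass their thresholds of 0.0125 and about 0.0167, but 0.03 fails against 0.025. The procedure stops there, and 0.04 is never rejected even though it is below 0.05.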
It would be nice to have a simple survey or textbook presentation that gives one the current "big picture."
The best that I have found on the web is the tutorial by Lee and Whitmore. Unfortunately, these slides don't come with a sound track, so you will need to fill in a few blanks. Let me know if you find a more suitable introduction.
The false discovery rate (FDR) is a notion that is closely tied up with the methodology for testing multiple hypotheses. Eventually I expect to develop a page that specifically addresses FDR. Another related topic is "data snooping," an issue that is an eternal part of the conversation when one considers trading strategies. Of the papers noted below, only Romano and Wolf (2005) explicitly address econometric concerns, but I am sure that there are further relevant resources.
Hommel (1988) seems like a sensible place to start if you get interested in these issues. Wright (1992) is also well written and gives a slightly shifted perspective. Marcus, Peritz, and Gabriel (1976), which introduces the notion of a "closed set" of hypotheses, is warmly recommended by Bob Stine.
Wright, S. P. (1992) Adjusted P-Values for Simultaneous Inference, Biometrics 48, 1005--1013.