Tony Cai's Papers

Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks

Tony Cai and Wenguang Sun

Abstract: In large-scale multiple testing problems, data are often collected from heterogeneous sources, and hypotheses form groups that exhibit different characteristics. Conventional approaches, including pooled and separate analyses, fail to make efficient use of this external grouping information. We develop a compound decision-theoretic framework for testing grouped hypotheses and introduce an oracle procedure that minimizes the false non-discovery rate subject to a constraint on the false discovery rate. We show that the oracle procedure uniformly improves on both the pooled and separate analyses. We then propose a data-driven procedure that is shown to be asymptotically optimal. Simulation studies show that our procedures enjoy superior performance and yield the most accurate results compared with both the pooled and separate procedures. A real data example with grouped hypotheses is studied in detail using different methods. Both theoretical and numerical results demonstrate that exploiting external information about the sample can greatly improve the efficiency of a multiple testing procedure. The results also provide insight into how grouping information can be incorporated for optimal simultaneous inference.
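To make the idea of using grouping information concrete, here is a minimal sketch of a grouped local-fdr ranking rule in the spirit of the oracle procedure described above. It is not the authors' implementation. It assumes, for illustration only, that within each group the null is N(0,1), the alternative is N(mu1, 1), and the group's null proportion pi0 and alternative mean mu1 are known (the data-driven procedure in the paper would have to estimate such quantities). Each hypothesis gets a conditional local fdr computed within its own group; all values are then pooled, sorted, and the largest set whose average local fdr stays at or below alpha is rejected. The function names `conditional_lfdr` and `grouped_lfdr_test` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def conditional_lfdr(x: float, pi0: float, mu1: float) -> float:
    """P(null | x) within one group.

    Assumption (for illustration): null N(0,1), alternative N(mu1,1),
    and the group's null proportion pi0 is known.
    """
    null_part = pi0 * normal_pdf(x)
    alt_part = (1.0 - pi0) * normal_pdf(x, mean=mu1)
    return null_part / (null_part + alt_part)


def grouped_lfdr_test(
    z_by_group: Dict[str, Sequence[float]],
    params: Dict[str, Tuple[float, float]],
    alpha: float,
) -> List[Tuple[str, int]]:
    """Return (group, index) pairs rejected at nominal FDR level alpha.

    params[g] = (pi0, mu1) are hypothetical known group-specific parameters.
    """
    # Score every hypothesis with the local fdr of its own group.
    scored = []
    for g, zs in z_by_group.items():
        pi0, mu1 = params[g]
        for i, z in enumerate(zs):
            scored.append((conditional_lfdr(z, pi0, mu1), g, i))

    # Pool across groups and rank from most to least promising.
    scored.sort()

    # The running mean of ascending values never decreases, so we can stop
    # at the first j whose average local fdr exceeds alpha.
    running_sum = 0.0
    n_reject = 0
    for j, (lfdr, _, _) in enumerate(scored, start=1):
        running_sum += lfdr
        if running_sum / j > alpha:
            break
        n_reject = j
    return [(g, i) for _, g, i in scored[:n_reject]]


if __name__ == "__main__":
    z = {"A": [0.1, 3.2, 4.0], "B": [0.2, 2.5]}
    params = {"A": (0.9, 3.0), "B": (0.5, 2.0)}
    print(grouped_lfdr_test(z, params, alpha=0.1))
    # → [('A', 2), ('B', 1), ('A', 1)]
```

In this toy example, group B has a much larger share of non-nulls (pi0 = 0.5), so a moderate statistic such as 2.5 in group B ranks ahead of 3.2 in group A. A pooled analysis that ignores the groups would not make this distinction.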
Other related papers:
Sun, W. & Cai, T. (2007).
Oracle and adaptive compound decision rules for false discovery rate control.
Journal of the American Statistical Association 102, 901-912.
Jin, J. & Cai, T. (2007).
Estimating the null and the proportion of non-null effects in large-scale multiple comparisons.
Journal of the American Statistical Association 102, 495-506.
Cai, T., Jin, J. & Low, M. (2007).
Estimation and confidence sets for sparse normal mixtures.
The Annals of Statistics 35, 2421-2449.
Sun, W. & Cai, T. (2009).
Large-scale multiple testing under dependency.
Journal of the Royal Statistical Society, Series B 71, 393-424.
Cai, T. & Jin, J. (2010).
Optimal rates of convergence for estimating the null and proportion of non-null effects in large-scale multiple testing.
The Annals of Statistics 38, 100-145.
