Sample Size and Power Analysis for Sparse Signal Recovery in Genome-Wide Association Studies
Jichun Xie, Tony Cai and Hongzhe Li
Abstract:
Genome-wide association studies have successfully identified hundreds of novel genetic
variants associated with many complex human diseases. However, little rigorous work
has evaluated the statistical power for identifying these variants. In
this paper, we consider the problem of sparse signal identification in genome-wide
association studies and present two analytical frameworks for detailed analysis of the
statistical power for detecting and identifying the disease-associated variants. We present
an explicit sample size formula for achieving a given false non-discovery rate while
controlling the false discovery rate based on an optimal false discovery rate procedure.
The problem of sparse genetic variant recovery is also considered, and a boundary
condition is established in terms of sparsity and signal strength for almost exact recovery
of both disease-associated and non-disease-associated variants. A data-adaptive
procedure is proposed to achieve this bound. These results provide important tools for
sample size calculation and power analysis for large-scale multiple testing problems. The
analytical results are illustrated with a genome-wide association study of neuroblastoma.