False Discovery Rate Control for High-Dimensional Dependent Data with an Application to Large-Scale Genetic Association Studies
Jichun Xie, Tony Cai, John Maris and Hongzhe Li
Large-scale genetic association studies are increasingly used to identify novel genetic variants that confer susceptibility to complex traits, but there is little consensus on methods for analyzing such data. The most commonly used methods analyze one single-nucleotide polymorphism (SNP) at a time or analyze haplotypes, with a Bonferroni correction for multiple comparisons. Because the SNPs in typical genome-wide association studies (GWAS) are often in linkage disequilibrium (LD), at least locally, the Bonferroni correction often yields conservative error control and therefore lower statistical power. Motivated by the analysis of data from genetic association studies, we consider the problem of false discovery rate (FDR) control under the high-dimensional multivariate normal model. Using the compound decision rule framework, we develop an optimal joint oracle procedure and propose a marginal plug-in procedure to approximate it. We show that the marginal plug-in procedure is asymptotically optimal under mild conditions. Our results indicate that the multiple testing procedure developed under the independence model is not only valid but also asymptotically optimal for high-dimensional multivariate normal data under weak dependence. We evaluate various procedures in simulation studies and demonstrate the proposed procedure on a genome-wide association study of neuroblastoma (NB). The proposed procedure identified a few more genetic variants potentially associated with NB than the standard p-value-based FDR controlling procedure did.
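The abstract does not spell out either procedure, so the following is only a minimal sketch, under stated assumptions, of one common form a marginal procedure of this kind can take: rank tests by their marginal local FDR (Lfdr) and reject the largest set whose average Lfdr stays at or below the target level, compared against the standard p-value-based Benjamini-Hochberg (BH) procedure. Assumptions (none of these come from the paper): the z-statistics follow a two-group normal mixture with null N(0, 1) and alternative N(mu, 1), and the mixture parameters `p_nonnull` and `mu` are treated as known (an oracle), whereas a plug-in procedure would estimate them from data. The AR(1) noise is a hypothetical, crude stand-in for local LD, and all function names are illustrative.

```python
import math
import numpy as np

# Hypothetical illustration only: a two-group normal mixture with KNOWN
# parameters stands in for the estimated (plug-in) quantities.

def normal_pdf(z, mean=0.0):
    """Density of N(mean, 1) evaluated elementwise."""
    return np.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def marginal_lfdr(z, p_nonnull, mu):
    """Marginal local FDR, P(null | z_i), assuming
    z_i ~ (1 - p_nonnull) N(0, 1) + p_nonnull N(mu, 1)."""
    null_part = (1.0 - p_nonnull) * normal_pdf(z)
    alt_part = p_nonnull * normal_pdf(z, mu)
    return null_part / (null_part + alt_part)

def lfdr_step_up(lfdr, alpha):
    """Reject the k tests with the smallest Lfdr, where k is the largest
    number whose average Lfdr is <= alpha."""
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    reject = np.zeros(lfdr.size, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject

def benjamini_hochberg(pvals, alpha):
    """Standard p-value-based BH step-up procedure."""
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    passing = np.nonzero(pvals[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject

def ar1_noise(m, rho, rng):
    """Stationary AR(1) noise with N(0, 1) marginals (a crude LD stand-in)."""
    eps = rng.standard_normal(m)
    e = np.empty(m)
    e[0] = eps[0]
    for i in range(1, m):
        e[i] = rho * e[i - 1] + math.sqrt(1.0 - rho ** 2) * eps[i]
    return e

def simulate(m=5000, p_nonnull=0.1, mu=3.0, rho=0.5, alpha=0.05, seed=0):
    """Simulate weakly dependent z-statistics and apply both procedures."""
    rng = np.random.default_rng(seed)
    nonnull = rng.random(m) < p_nonnull
    z = mu * nonnull + ar1_noise(m, rho, rng)
    # One-sided p-values, matching the mu > 0 alternative.
    pvals = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])
    lfdr = marginal_lfdr(z, p_nonnull, mu)
    results = {}
    for name, rej in (("lfdr", lfdr_step_up(lfdr, alpha)),
                      ("bh", benjamini_hochberg(pvals, alpha))):
        false_disc = np.sum(rej & ~nonnull)
        results[name] = {
            "rejections": int(rej.sum()),
            "fdp": float(false_disc / max(rej.sum(), 1)),
            "true_positives": int(np.sum(rej & nonnull)),
        }
    return results

print(simulate())
```

The Lfdr rule ranks tests by the posterior probability of being null rather than by p-value alone, which is why it can use information about the alternative; in practice the oracle parameters would be replaced by estimates, and this sketch makes no claim about how the paper estimates them.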