Optimal Estimation of Simultaneous Signals Using Absolute Inner Product with Applications to Integrative Genomics
Rong Ma, Tony Cai, and Hongzhe Li
Abstract:
Integrating the summary statistics from a genome-wide association study and expression quantitative trait loci data provides a powerful way of identifying genes whose expression levels are potentially associated with complex diseases. We introduce a parameter called the T-score, which quantifies the genetic overlap between a gene and the disease phenotype. It is computed from the summary statistics and expressed through the mean values of two Gaussian sequences. Specifically, given two independent samples X_{n} ∼ N(θ, Σ_{1}) and Y_{n} ∼ N(μ, Σ_{2}), the T-score is defined as ∑_{i=1}^{n} |θ_{i} μ_{i}|. This is a nonsmooth functional that characterizes the amount of shared signal between the two absolute normal mean vectors |θ| and |μ|. Using approximation theory, we construct estimators and show that they are minimax rate-optimal and adaptive over various parameter spaces. Simulation studies demonstrate the superiority of the proposed estimators over existing methods. Lastly, the method is applied to an integrative analysis of heart failure genomics data sets, and we identify several genes and biological pathways that are potentially causal for human heart failure.
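The following is a minimal sketch, assuming NumPy, that evaluates the T-score functional ∑_{i=1}^{n} |θ_{i} μ_{i}| for known mean vectors. It also illustrates why the observations cannot simply be plugged in. The helper names `t_score` and `naive_plugin_mean` are hypothetical. The simulation assumes identity covariances (Σ_{1} = Σ_{2} = I) purely for illustration, and it is not the approximation-theory estimator described in the abstract. The naive plug-in ∑|X_{i} Y_{i}| is biased upward: by independence and Jensen's inequality, E|X_{i} Y_{i}| = E|X_{i}| E|Y_{i}| ≥ |θ_{i}||μ_{i}|. This bias is largest for coordinates where one of the means is zero.

```python
import numpy as np


def t_score(theta, mu):
    """T-score functional sum_i |theta_i * mu_i| for two known mean vectors."""
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if theta.shape != mu.shape:
        raise ValueError("theta and mu must have the same shape")
    return float(np.sum(np.abs(theta * mu)))


def naive_plugin_mean(theta, mu, reps=20000, seed=0):
    """Monte Carlo average of the naive plug-in sum_i |X_i * Y_i|.

    Illustrative assumption: identity covariances, i.e. X ~ N(theta, I)
    and Y ~ N(mu, I), drawn independently. This is NOT the paper's estimator.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Each row is one independent draw of the pair (X, Y).
    x = rng.normal(loc=theta, scale=1.0, size=(reps, theta.size))
    y = rng.normal(loc=mu, scale=1.0, size=(reps, mu.size))
    return float(np.abs(x * y).sum(axis=1).mean())


theta = np.array([3.0, 0.0, -2.0, 0.0])
mu = np.array([1.5, 4.0, 2.0, 0.0])

true_value = t_score(theta, mu)  # 3*1.5 + 0*4 + 2*2 + 0*0 = 8.5
plugin_value = naive_plugin_mean(theta, mu)
print(true_value, plugin_value)  # the plug-in average sits noticeably above 8.5
```

Only the first and third coordinates carry a shared signal. The noise in the second and fourth coordinates still inflates the plug-in value, so the example is consistent with the abstract's view of the T-score as a measure of shared signal.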