Optimal detection of weak positive latent dependence between two sequences of multiple tests
Dave Zhao, Tony Cai, and Hongzhe Li
This paper studies the problem of detecting dependence between two mixture distributions, motivated by questions arising in statistical genomics. The fundamental limits of detecting weak positive dependence are derived, and an oracle test statistic is proposed. It is shown that for mixture distributions whose components are stochastically ordered, the oracle test statistic is asymptotically optimal. Connections are drawn between dependency detection and signal detection, where the goal of the latter is to detect the presence of non-null components in a single mixture distribution. It is shown that the oracle test for dependency can also be used as a signal detection procedure in the two-sample setting, and that in this setting it can achieve detection even when detection using either sample alone is provably impossible. A nonparametric data-adaptive test statistic is then proposed, and its closed-form asymptotic distribution under the null hypothesis of independence is established. Simulations show that the adaptive procedure performs as well as the oracle test statistic, and that both can be more powerful than existing methods. In an application to the analysis of the shared genetic basis of psychiatric disorders, the adaptive test detects genetic relationships that other procedures do not.
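To make the setting concrete, the sketch below simulates two sequences of z-scores whose latent non-null indicators are positively dependent, then applies a simple permutation test of independence. Everything here is an illustrative assumption, not the oracle or adaptive statistic described above: the hypothetical two-group model (null N(0, 1), non-null N(mu, 1)), the function names `simulate_pairs`, `joint_exceedances`, and `permutation_pvalue`, and the fixed threshold `t`, which is an arbitrary choice.

```python
import random


def simulate_pairs(n, p11, px, py, mu, seed=0):
    """Draw n pairs of z-scores (x_i, y_i) from a hypothetical two-group model.

    Latent indicators a_i, b_i mark non-null tests: P(a_i=1)=px, P(b_i=1)=py,
    P(a_i=1, b_i=1)=p11. Positive latent dependence means p11 > px * py.
    Null z-scores are N(0, 1); non-null z-scores are N(mu, 1) (an assumption).
    """
    p10, p01 = px - p11, py - p11
    p00 = 1.0 - p11 - p10 - p01
    if min(p11, p10, p01, p00) < 0:
        raise ValueError("indicator probabilities do not form a valid joint law")
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        # Draw the latent pair (a, b) from its 2x2 joint distribution.
        u = rng.random()
        if u < p11:
            a, b = 1, 1
        elif u < p11 + p10:
            a, b = 1, 0
        elif u < p11 + p10 + p01:
            a, b = 0, 1
        else:
            a, b = 0, 0
        xs.append(rng.gauss(mu * a, 1.0))
        ys.append(rng.gauss(mu * b, 1.0))
    return xs, ys


def joint_exceedances(xs, ys, t):
    """Count indices where both z-scores exceed the threshold t."""
    return sum(1 for x, y in zip(xs, ys) if x > t and y > t)


def permutation_pvalue(xs, ys, t=2.0, n_perm=200, seed=1):
    """One-sided permutation p-value for H0: the two sequences are independent.

    Shuffling ys breaks the pairing between the sequences while keeping
    both marginal distributions fixed.
    """
    rng = random.Random(seed)
    observed = joint_exceedances(xs, ys, t)
    ys_perm = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys_perm)
        if joint_exceedances(xs, ys_perm, t) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

For example, `xs, ys = simulate_pairs(2000, p11=0.08, px=0.1, py=0.1, mu=3.0)` followed by `permutation_pvalue(xs, ys)` should give a small p-value, because shared non-null indices inflate the joint exceedance count. Setting `p11 = px * py` removes the latent dependence. A fixed-threshold count like this one is only a baseline. It is not claimed to attain the detection limits discussed in the abstract.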