Optimal Sparse Segment Identification with Application in Copy Number Variation Analysis
X. Jessie Jeng, Tony Cai, and Hongzhe Li
- Abstract: Motivated by DNA copy number analysis in genetics, we consider the problem of detecting and identifying sparse short segments in a long one-dimensional sequence of data with additive Gaussian white noise, where the number, lengths, and locations of the segments are unknown. We present a statistical characterization of the identifiable region of a segment, in which it is possible to reliably separate the segment from Gaussian noise. We develop an efficient likelihood ratio selection (LRS) procedure for identifying the segments and establish its asymptotic optimality, in the sense that the LRS can separate the signal segments from the noise as long as the signal segments can be estimated. The proposed method is demonstrated with simulations and with an analysis of a real data set on the identification of copy number variants based on high-density single nucleotide polymorphism (SNP) data. The results show that the LRS procedure can yield greater power for detecting the true segments than some standard signal identification methods.
- Paper: pdf file.
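
To make the idea of likelihood ratio selection concrete, here is a minimal sketch of one way such a procedure could work. It is not the authors' implementation. It assumes unit-variance Gaussian noise, scans every interval up to a maximum length, scores each interval by its sum divided by the square root of its length, and greedily keeps the highest-scoring intervals that do not overlap. The function name `lrs_sketch`, the parameter `max_len`, and the default threshold `sqrt(2 * log(n * max_len))` are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def lrs_sketch(x, max_len, threshold=None):
    """Greedy interval selection on a 1-D sequence (illustrative only).

    x         : list of floats, assumed to be signal plus N(0, 1) noise
    max_len   : longest interval length to scan (hypothetical parameter)
    threshold : cutoff on the absolute interval score; the default below
                is an assumed choice, not taken from the abstract
    Returns a sorted list of (start, end) pairs, with end exclusive.
    """
    n = len(x)
    if threshold is None:
        threshold = math.sqrt(2.0 * math.log(n * max_len))

    # Prefix sums let each interval sum be computed in constant time.
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)

    # Score every interval of length 1..max_len and keep those above the cutoff.
    candidates = []
    for start in range(n):
        for length in range(1, min(max_len, n - start) + 1):
            end = start + length
            score = (prefix[end] - prefix[start]) / math.sqrt(length)
            if abs(score) > threshold:
                candidates.append((abs(score), start, end))

    # Visit candidates from highest score down; accept an interval only if it
    # does not overlap one that was already accepted.
    candidates.sort(reverse=True)
    selected = []
    for _, start, end in candidates:
        if all(end <= s or start >= e for s, e in selected):
            selected.append((start, end))
    return sorted(selected)


if __name__ == "__main__":
    # Noiseless toy input: one elevated segment at positions 20..24.
    data = [0.0] * 50
    for i in range(20, 25):
        data[i] = 5.0
    print(lrs_sketch(data, max_len=8))
```

In this sketch the scan costs on the order of `n * max_len` score evaluations, which is why restricting attention to short intervals keeps the search cheap on long sequences.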