Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis
Tony Cai, X. Jessie Jeng, and Hongzhe Li
- Abstract: Motivated by copy number variation analysis based on next-generation sequencing data, we consider the problem of detecting and identifying sparse short segments hidden in an ultra-long linear sequence of data with an unspecified noise distribution. Based on a local median transformation, we propose a computationally efficient method called the robust segment identifier (RSI), which provides a robust and optimal solution for segment identification over a wide range of noise distributions. We theoretically quantify the conditions for detecting the segment signals and show that the RSI consistently estimates the signal segments whenever it is possible to detect their existence. We present simulations to demonstrate the effect of the data transformation and the efficiency of our method under different noise distributions. We also present results from an application to copy number variant analysis using next-generation sequencing data of the HapMap Yoruban sample NA19240 to further illustrate the theory and the methods.
- Paper: pdf file.
- Other related papers:
Jeng, X. J., Cai, T., and Li, H. (2010).
Optimal sparse segment identification with application in copy number variation analysis.
Journal of the American Statistical Association, to appear.