Lectures on Ultra High Dimensional Regression
at the Department of Biostatistics, Harvard University, April 16, 2010
The analysis of high-dimensional data, which now commonly arise in scientific investigations, poses many statistical challenges not present in smaller-scale studies. In these lectures I will discuss high-dimensional linear regression with large p and small n. This problem has attracted much recent interest in a number of fields, including applied mathematics, electrical engineering, and statistics. To provide the background and foundation for the main topics, we shall begin with a discussion of the high-dimensional Gaussian sequence model. We then consider the linear model y = Xβ + z, where the dimension p of the signal β is much larger than the number of observations n. It is now well understood that l1 minimization methods provide effective ways to carry out high-dimensional sparse regression. I will present an elementary and unified analysis of l1 minimization methods, including the Lasso and the Dantzig Selector, in three settings: noiseless, bounded error, and Gaussian noise. Time permitting, I will also discuss l1 minimization approaches to sparse precision matrix estimation.
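As a rough illustration of the large p, small n setting described above, the following sketch simulates y = Xβ + z with a sparse β and computes a Lasso estimate by minimizing 0.5*||y - Xb||_2^2 + lam*||b||_1. Everything here is an assumption made for illustration and is not taken from the lectures: the function names (soft_threshold, lasso_ista), the simulated design, the sizes n, p, k, the noise level, the choice of lam, and the solver itself (plain proximal gradient descent, often called ISTA).

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(X, y, b, lam):
    """Lasso objective 0.5 * ||y - X b||_2^2 + lam * ||b||_1."""
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

def lasso_ista(X, y, lam, n_iter=2000):
    """Approximately minimize the Lasso objective by proximal gradient (ISTA)."""
    p = X.shape[1]
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)          # gradient of the quadratic term
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, k = 50, 200, 5                       # p much larger than n, k nonzeros
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta = np.zeros(p)
beta[:k] = 5.0                             # sparse signal
sigma = 0.1
y = X @ beta + sigma * rng.standard_normal(n)

lam = sigma * np.sqrt(2 * np.log(p))       # a common order of magnitude for lam
beta_hat = lasso_ista(X, y, lam)

print("nonzero coefficients in estimate:", np.count_nonzero(beta_hat))
print("l2 estimation error:", np.linalg.norm(beta_hat - beta))
```

The fixed iteration count keeps the sketch short; a more careful implementation would stop on a convergence criterion, and the Dantzig Selector would instead require a linear programming solver.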
References and Slides:
- Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig Selector. The Annals of Statistics 37, 1705-1732.
- Cai, T., Liu, W. & Luo, X. (2011). A constrained l1 minimization approach to sparse precision matrix estimation. J. American Statistical Association 106, 594-607.
- Cai, T., Wang, L. & Xu, G. (2010). Shifting inequality and recovery of sparse signals. IEEE Transactions on Signal Processing 58, 1300-1308.
- Cai, T., Wang, L. & Xu, G. (2010). Stable recovery of sparse signals and an oracle inequality. IEEE Transactions on Information Theory 56, 3516-3522.
- Cai, T., Zhang, C.-H. & Zhou, H. (2010). Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics 38, 2118-2144.
- Candès, E. J. and Tao, T. (2007). The Dantzig Selector: Statistical estimation when p is much larger than n (with discussion). The Annals of Statistics 35, 2313-2351.
Harvard Lecture Slides