High-Dimensional Inference: Sparse Signal Recovery
and Covariance Matrix Estimation
IMS Medallion Lecture
Presented at the Joint Statistics Meetings in Washington DC, on August 3, 2009
The analysis of high-dimensional data, which now commonly arise in scientific investigations, poses many statistical challenges not present in smaller-scale studies. In this talk I will discuss two problems in high-dimensional inference: sparse signal recovery (compressed sensing, linear regression with large p and small n) and estimation of large covariance matrices.
Reconstructing a high-dimensional sparse signal based on a small number of measurements, possibly corrupted by noise, has attracted much recent interest in a number of fields including applied mathematics, electrical engineering, and statistics. Specifically, one considers the linear model y = Xβ + z, where the dimension of the signal β is much larger than the number of observations. In this setting the linear model is under-determined, so regularity conditions are needed. A commonly used framework is the so-called restricted isometry property (RIP), which essentially requires that every subset of columns of X of a certain cardinality behaves approximately like an orthonormal system. The method of l1 minimization provides an effective way to reconstruct sparse signals. It has been shown that l1 minimization can recover β with a small or zero error under various RIP conditions.
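As an illustration of the noiseless case of the model above, the following is a minimal sketch of l1 minimization (minimize the l1 norm of β subject to Xβ = y) written as a linear program. The function name basis_pursuit, the Gaussian design, and the dimensions n, p, k are assumptions chosen for the example, not taken from the talk.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(X, y):
    """Solve min ||b||_1 subject to X b = y as a linear program.

    Writes b = u - v with u, v >= 0, so that ||b||_1 = sum(u) + sum(v)
    at the optimum.
    """
    n, p = X.shape
    c = np.ones(2 * p)                 # objective: sum(u) + sum(v)
    A_eq = np.hstack([X, -X])          # equality constraint: X u - X v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:p], res.x[p:]
    return u - v


# Hypothetical example: p = 100 unknowns, n = 40 noiseless measurements,
# and a signal with k = 5 nonzero entries.
rng = np.random.default_rng(0)
n, p, k = 40, 100, 5
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.standard_normal(k)
y = X @ beta

beta_hat = basis_pursuit(X, y)
max_error = np.max(np.abs(beta_hat - beta))
print("largest coordinate-wise error:", max_error)
```

With a random Gaussian design of this size, a 5-sparse signal is typically recovered up to solver precision even though the system has far more unknowns than equations.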
In this talk I will give a concise and unified analysis of the constrained l1 minimization method in three settings: noiseless, bounded error, and Gaussian noise. The noiseless case, which is a purely mathematical problem, is of particular interest. It yields identifiability conditions, provides deep insight into the problem in the noisy cases, and has a close connection to the decoding of linear codes. Our analysis, which yields strong results, is surprisingly simple and elementary. At the heart of our simplified analysis is an elementary, yet highly useful, inequality called the Shifting Inequality.
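For orientation, one common way to write constrained l1 minimization so that it covers all three settings is sketched below. The notation (the constraint set B, the tolerances ε and λ) is an assumption borrowed from standard formulations in the compressed sensing literature and is not stated in this abstract.

```latex
% Constrained l1 minimization with a generic constraint set B:
\hat{\beta} = \arg\min_{\gamma \in \mathbb{R}^p} \|\gamma\|_1
\quad \text{subject to} \quad y - X\gamma \in \mathcal{B}.

% Noiseless case (z = 0):
\mathcal{B} = \{0\}.

% Bounded error (\|z\|_2 \le \varepsilon):
\mathcal{B} = \{ r : \|r\|_2 \le \varepsilon \}.

% Gaussian noise, with a Dantzig-selector-type constraint:
\mathcal{B} = \{ r : \|X^{\top} r\|_\infty \le \lambda \}.
```

In the Gaussian case the tolerance is typically chosen so that the true noise vector z lies in the constraint set with high probability, which reduces the analysis to that of the bounded cases.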
In addition to the RIP, another commonly used condition is the mutual incoherence property (MIP), which requires the pairwise correlations between columns of X to be small. We will also analyze l1 minimization under the MIP framework. We give a sharp MIP condition for stable recovery of sparse signals and derive oracle inequalities under the MIP condition.
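The quantity behind the MIP can be computed directly. The sketch below takes the mutual incoherence to be the largest absolute inner product between distinct columns of X after scaling each column to unit norm; the function name mutual_incoherence and the example matrices are assumptions made for illustration.

```python
import numpy as np


def mutual_incoherence(X):
    """Largest absolute correlation between two distinct columns of X."""
    Xn = X / np.linalg.norm(X, axis=0)   # scale each column to unit norm
    G = Xn.T @ Xn                        # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)             # ignore each column paired with itself
    return float(np.max(np.abs(G)))


# Orthonormal columns are perfectly incoherent.
mu_orth = mutual_incoherence(np.eye(4))

# Two parallel columns are maximally coherent.
mu_parallel = mutual_incoherence(np.array([[1.0, 2.0], [0.0, 0.0]]))

# A random Gaussian design with more columns than rows falls in between.
rng = np.random.default_rng(0)
mu_gauss = mutual_incoherence(rng.standard_normal((50, 200)))

print(mu_orth, mu_parallel, mu_gauss)
```

Unlike the RIP, which involves all column subsets of a given size, this quantity needs only the p-by-p Gram matrix, so it is easy to check for a given design.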
The covariance matrix plays a fundamental role in multivariate analysis. Estimation of the covariance matrix is needed in many statistical analyses and a wide range of applications, including microarray studies, fMRI analysis, risk management and portfolio allocation, and web search problems.
The sample covariance matrix often performs poorly in high-dimensional settings. A number of regularization methods have been introduced recently. Several existing methods and theoretical analyses essentially employ the strategy of reducing the matrix estimation problem to that of estimating vectors, with the aim of optimally estimating individual rows/columns separately. Asymptotic properties, including explicit rates of convergence, have been given. However, it is unclear whether any of these rates of convergence are optimal.
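A small simulation can make the first point concrete. The sketch below compares the sample covariance matrix with a banded version of it under the operator (spectral) norm. Banding is used here only as one familiar example of a regularization method and is not claimed to be the procedure analyzed in the talk; the AR(1)-type covariance, the dimensions, and the bandwidth k are all assumptions chosen for illustration.

```python
import numpy as np


def band(S, k):
    """Keep entries within k of the diagonal and set the rest to zero."""
    p = S.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S, 0.0)


def operator_norm(A):
    """Spectral norm, i.e. the largest singular value of A."""
    return float(np.linalg.norm(A, 2))


# Hypothetical setting: p = 200 variables, n = 50 observations, and a true
# covariance whose entries decay away from the diagonal as rho^|i - j|.
rng = np.random.default_rng(0)
n, p, rho, k = 50, 200, 0.5, 5
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

# Draw n observations from N(0, Sigma) via a Cholesky factor.
L = np.linalg.cholesky(Sigma)
data = rng.standard_normal((n, p)) @ L.T

S = np.cov(data, rowvar=False)           # sample covariance matrix (p x p)
err_sample = operator_norm(S - Sigma)
err_banded = operator_norm(band(S, k) - Sigma)

print("operator-norm error, sample covariance:", err_sample)
print("operator-norm error, banded estimator: ", err_banded)
```

Because p exceeds n, the sample covariance matrix is singular and its leading eigenvalues are inflated, so its operator-norm error is large; discarding the noisy entries far from the diagonal reduces the error substantially in this setting.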
I will discuss results on the optimal rate of convergence for estimating the covariance matrix as well as its inverse under the operator norm. The results indicate that optimal estimation of the rows/columns does not in general lead to optimal estimation of the matrix under the operator norm. As a vector estimator, our procedure has larger variance than squared bias for each row/column. Other risk measures, such as the Frobenius norm and the matrix l1 norm, are also considered. In particular, it is shown that the optimal procedures under the operator norm and the Frobenius norm are different, and consequently matrix estimation under the operator norm is fundamentally different from vector estimation.
A key step in obtaining the optimal rates of convergence is the derivation of the minimax lower bounds. The lower bounds are established using a testing argument, at the core of which is the construction of a collection of least favorable multivariate normal distributions. The technical analysis reveals new features that are quite different from those in the more conventional function/sequence estimation problems.
Covariance Matrix Estimation: