Computational and Statistical Boundaries for Submatrix Localization in a Large Noisy Matrix
Tony Cai, Tengyuan Liang and Alexander Rakhlin
Abstract:
The interplay between computational efficiency and statistical accuracy in high-dimensional inference has drawn increasing attention in the literature. In this paper, we study computational and statistical boundaries for submatrix localization. Given one observation of a signal submatrix (of magnitude λ and size k_{m} × k_{n}) contaminated with a noise matrix (of size m × n), we establish two transition thresholds for the signal-to-noise ratio λ/σ in terms of m, n, k_{m}, and k_{n}. The first threshold, SNR_{c}, corresponds to the computational boundary. Below this threshold, it is shown that no polynomial-time algorithm can succeed in identifying the submatrix, under the hidden clique hypothesis. We introduce an adaptive linear-time algorithm that identifies the submatrix with high probability when the signal strength is above the threshold SNR_{c}. The second threshold, SNR_{s}, captures the statistical boundary, below which no method can succeed with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. The results reveal an interesting phenomenon: SNR_{c} is always significantly larger than SNR_{s}, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.
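To make the observation model concrete, the following is a minimal simulation sketch: a rank-one signal submatrix of magnitude λ planted in Gaussian noise, localized here by a naive row/column-sum selection. The parameter values and the sum-thresholding step are illustrative assumptions for exposition; this is not the adaptive linear-time algorithm analyzed in the paper.

```python
import numpy as np

# Illustrative sketch of the submatrix localization model (assumed
# parameter values; the naive method below is NOT the paper's algorithm).
rng = np.random.default_rng(0)

m, n = 50, 60          # dimensions of the observed matrix
k_m, k_n = 10, 12      # dimensions of the signal submatrix
lam, sigma = 5.0, 1.0  # signal magnitude λ and noise level σ

rows = np.arange(k_m)  # true row support (first k_m rows, w.l.o.g.)
cols = np.arange(k_n)  # true column support

M = np.zeros((m, n))
M[np.ix_(rows, cols)] = lam                    # planted signal submatrix
X = M + sigma * rng.standard_normal((m, n))    # one noisy observation

# Naive localization: keep the k_m rows and k_n columns with largest sums.
row_hat = np.sort(np.argsort(X.sum(axis=1))[-k_m:])
col_hat = np.sort(np.argsort(X.sum(axis=0))[-k_n:])

print(np.array_equal(row_hat, rows), np.array_equal(col_hat, cols))
```

At this (deliberately strong) signal-to-noise ratio the naive method recovers the support; the paper's thresholds SNR_{c} and SNR_{s} characterize exactly how small λ/σ can be before polynomial-time methods, and then all methods, must fail.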