Numerical analysts frown upon regression calculations that manipulate the
cross-product matrix X'X. Why? In so-called ill-conditioned problems, the
inaccuracies of computer arithmetic lead to calculation errors.
Polynomial regression offers a classic case: the huge collinearity among powers of the
predictor eventually overwhelms the numerical range of the computer, and the calculation
gives the wrong answer. Forming X'X roughly squares the condition number of the problem,
so avoiding the X'X calculation improves the computer's accuracy.
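As a rough illustration (a sketch in Python/NumPy rather than LispStat, with a made-up polynomial design matrix), compare the condition number of X with that of X'X:

```python
# Illustrative sketch (NumPy, not LispStat): conditioning of X versus X'X
# for a polynomial design matrix. The data here are made up.
import numpy as np

x = np.linspace(1, 2, 50)                 # predictor values
X = np.vander(x, N=6, increasing=True)    # columns 1, x, x^2, ..., x^5
print(np.linalg.cond(X))                  # condition number of X
print(np.linalg.cond(X.T @ X))            # much larger: about the square of
                                          # cond(X) in exact arithmetic
```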
Perhaps a more important reason to look at alternative calculation strategies is the
understanding that they produce. We will look briefly at five matrix factorizations and
discuss how each factorization (or decomposition, as they are also called)
plays an important role in statistics:
- LU decompositions
- These lead to a recursive scheme for solving systems of linear equations and give a
first look at factorization. They get you started thinking about what is involved in
solving systems of equations, such as back-substitution (see the first sketch following
this list).
- Cholesky factorization
- The most common matrix square root, this is basically an LU decomposition
in which the factors are equal but transposed. The statistical application is
generating correlated normal samples, as in regression or time series (sketched
after the list).
- QR factorization
- Rather than factoring the matrix into two triangular forms, the QR performs a
Gram-Schmidt reduction into an orthogonal factor and a triangular factor. This is the
first factorization we have seen that works with non-square matrices, and it is a
popular choice for regression calculations (sketched after the list).
- Spectral representation
- Symmetric matrices (and, more generally, diagonalizable ones) can be represented
in a nice canonical form using eigenvectors and eigenvalues. The spectral
representation leads to a different version of the square root of a matrix. The
version in LispStat is limited to symmetric matrices (sketched after the list).
- Singular value decomposition
- This representation generalizes the notion of eigenvalues and eigenvectors to
non-square matrices. The resulting representation leads to a different type of
regression calculation (sketched after the list).
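The sketches below illustrate each factorization in Python with NumPy/SciPy rather than LispStat; the matrices and data in them are made up for illustration. First, the LU idea: once A = P L U, solving Ax = b reduces to a forward substitution with L followed by a back substitution with U.

```python
# Sketch of the LU decomposition (SciPy, not LispStat): factor A = P L U,
# then solve Ax = b by forward and back substitution with a made-up A and b.
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

P, L, U = lu(A)                                # A = P @ L @ U
y = solve_triangular(L, P.T @ b, lower=True)   # forward substitution
x = solve_triangular(U, y)                     # back substitution
print(np.allclose(A @ x, b))                   # True
```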
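Next, a sketch of the Cholesky application mentioned above: multiplying independent standard normals by the Cholesky factor of a target covariance matrix (here a made-up 2 x 2 matrix) produces correlated normal samples.

```python
# Sketch of generating correlated normals with a Cholesky factor
# (NumPy, not LispStat; the covariance matrix is made up).
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])          # target covariance
L = np.linalg.cholesky(Sigma)           # lower triangular, Sigma = L @ L.T
z = rng.standard_normal((2, 10000))     # independent N(0,1) draws
x = L @ z                               # columns now have covariance near Sigma
print(np.cov(x))
```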
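A sketch of regression via the QR factorization, using simulated data: with X = Q R, the normal equations X'X b = X'y collapse to the triangular system R b = Q'y, so X'X is never formed.

```python
# Sketch of least squares through the QR factorization
# (NumPy/SciPy, not LispStat; data are simulated).
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.standard_normal(n)

Q, R = np.linalg.qr(X)               # reduced QR: Q is n x 3, R is 3 x 3
b = solve_triangular(R, Q.T @ y)     # back substitution gives the coefficients
print(b)                             # near (1, 2, -0.5)
```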
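A sketch of the spectral representation of a symmetric matrix and the matrix square root it yields (again NumPy rather than LispStat, with a made-up matrix):

```python
# Sketch of the spectral (eigen) decomposition of a symmetric matrix
# and the square root it produces (NumPy, not LispStat).
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
vals, vecs = np.linalg.eigh(A)                  # A = V diag(vals) V'
root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T   # spectral square root
print(np.allclose(root @ root, A))              # True
```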
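Finally, a sketch of the singular value decomposition of a non-square matrix and one regression use of it: the least-squares coefficients come directly from the SVD factors (data simulated, NumPy rather than LispStat).

```python
# Sketch of the SVD of a non-square matrix used for least squares
# (NumPy, not LispStat; data are simulated).
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.standard_normal(100)

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V'
b = Vt.T @ ((U.T @ y) / s)                         # least-squares coefficients
print(b)                                           # near (1, -1, 0.5)
```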