Class 23: Choosing the Smoothing Parameter

Review from previous class

The previous class considered methods for computing smoothing splines as a particular case of smoothing methods.

Choosing the Smoothing Parameters

Lisp files for today's class

This is a yet more extended and revised draft. I promise, it's the last version!

Cross validation

Before going into smoothing methods too deeply, we can get a lot of ideas from the comparable problems in regression. We'll start here today.

Smoothing matrix

The smoothing matrix S, which plays the role of the hat matrix or projection matrix H in regression, is the key to understanding various properties of smoothing splines. Once you have a "fast" way to compute S, you can use it to see how the smoothers are all basically weighted moving averages (with so-called equivalent kernels) and to find an "expedient" way to compute the cross-validation sum of squares (CVSS).
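
As a rough illustration of these ideas, here is a minimal Python/NumPy sketch. It is not the Lisp code for the class, and it does not compute a true smoothing spline: it uses a discrete analogue (a second-difference penalty on equally spaced data, often called a Whittaker smoother), for which the smoothing matrix has the simple closed form S = (I + lam D'D)^(-1). The function names, the test data, and the value of lam are all assumptions made for the example. The sketch checks two things: each row of S is a set of weights that sums to one (the equivalent kernel for that point), and the leave-one-out residuals can be obtained from a single fit as (y_i - yhat_i)/(1 - S_ii), which is the usual shortcut for the CVSS.

```python
import numpy as np

def smoother_matrix(n, lam):
    """S = (I + lam * D'D)^{-1}, with D the (n-2) x n second-difference operator."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def cvss_shortcut(S, y):
    """Leave-one-out CV sum of squares computed from one fit."""
    resid = y - S @ y
    return float(np.sum((resid / (1.0 - np.diag(S))) ** 2))

def cvss_brute_force(y, lam):
    """Same quantity the slow way: refit n times, giving point i zero weight."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)
    K = lam * D.T @ D
    total = 0.0
    for i in range(n):
        w = np.ones(n)
        w[i] = 0.0
        W = np.diag(w)
        fit = np.linalg.solve(W + K, W @ y)  # penalized fit without y[i]
        total += (y[i] - fit[i]) ** 2
    return float(total)

# Hypothetical data: a noisy sine curve on an equally spaced grid.
rng = np.random.default_rng(0)
n, lam = 40, 5.0
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)

S = smoother_matrix(n, lam)
row_sums = S.sum(axis=1)       # each row is a weighted average
kernel_mid = S[n // 2]         # equivalent kernel at the middle point
print("row sums all one:", np.allclose(row_sums, 1.0))
print("CVSS shortcut   :", cvss_shortcut(S, y))
print("CVSS brute force:", cvss_brute_force(y, lam))
```

Plotting a row such as kernel_mid against x shows the weights the smoother places on neighboring observations. The brute-force loop is there only to check the shortcut; in practice one would use the shortcut alone.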

Effects of choosing the smoothing parameter

It can seem in examples that smoothing splines have little advantage over simpler methods, like polynomial regression. For relatively smooth functions, polynomial regression has much smaller MSE than smoothing splines.

The advantages only become visible when you have the right sort of test function, particularly one that has a jump. Jumps lead to the Gibbs phenomenon and lots of ripples that hurt global polynomial fits. Smoothing splines do quite a bit better in these cases.
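
The comparison described above can be tried in a small simulation. The following Python/NumPy sketch is only an illustration under stated assumptions: the smoother is a discrete second-difference penalty smoother standing in for the smoothing spline, and the polynomial degree, the value of lam, the noise level, and the two test functions are arbitrary choices, not the settings used in class. Neither tuning parameter is optimized, so the numbers indicate the direction of the effect rather than its size.

```python
import numpy as np

def smoother_matrix(n, lam):
    """S = (I + lam * D'D)^{-1}, with D the second-difference operator."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def simulate_mse(f, n=100, sigma=0.1, degree=5, lam=1.0, reps=50, seed=0):
    """Monte Carlo MSE, averaged over the design points, for two fits of f."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    truth = f(x)
    S = smoother_matrix(n, lam)
    err_poly = 0.0
    err_smooth = 0.0
    for _ in range(reps):
        y = truth + sigma * rng.standard_normal(n)
        poly_fit = np.polyval(np.polyfit(x, y, degree), x)  # global polynomial
        smooth_fit = S @ y                                  # penalized smoother
        err_poly += np.mean((poly_fit - truth) ** 2)
        err_smooth += np.mean((smooth_fit - truth) ** 2)
    return {"poly": err_poly / reps, "smooth": err_smooth / reps}

# A smooth test function and one with a jump at x = 0.5.
smooth_case = simulate_mse(lambda x: np.sin(2 * np.pi * x))
jump_case = simulate_mse(lambda x: (x > 0.5).astype(float))
print("smooth test function:     ", smooth_case)
print("test function with a jump:", jump_case)
```

With these settings the polynomial should come out ahead on the sine curve, where it has little bias and few parameters, and behind on the step, where its ripples around the jump dominate the error.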

Picking the smoothing parameter from data

Simulation lets you see how well a procedure can do, and how its MSE is determined by the smoothing parameter. What it does not show is how to choose this parameter from data, without the benefit of knowing the test function. Cross-validation becomes useful at this point. Though the calculations can be hard, they are easily done in a one-time setting. You are really only challenged if you need to simulate such an estimator, because the calculation must then be repeated for every simulated data set.
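
A minimal sketch of choosing the parameter from the data alone, again in Python/NumPy and again with a discrete second-difference penalty smoother as a stand-in for the smoothing spline. The grid of candidate values, the data, and the function names are assumptions for illustration. For each candidate lam the leave-one-out CVSS is computed from a single fit using the diagonal of S, and the value with the smallest CVSS is kept.

```python
import numpy as np

def smoother_matrix(n, lam):
    """S = (I + lam * D'D)^{-1}, with D the second-difference operator."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def choose_lambda(y, lambdas):
    """Return the candidate lam with the smallest leave-one-out CVSS."""
    n = len(y)
    scores = []
    for lam in lambdas:
        S = smoother_matrix(n, lam)
        resid = y - S @ y
        # Leave-one-out residuals from one fit: resid_i / (1 - S_ii).
        scores.append(np.sum((resid / (1.0 - np.diag(S))) ** 2))
    scores = np.array(scores)
    return lambdas[int(np.argmin(scores))], scores

# Hypothetical data; the true curve is never used by choose_lambda.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(100)

lambdas = 10.0 ** np.arange(-2, 7)   # coarse grid on a log scale
best_lam, scores = choose_lambda(y, lambdas)
for lam, score in zip(lambdas, scores):
    print(f"lam = {lam:10.2f}   CVSS = {score:.4f}")
print("chosen lam:", best_lam)
```

Each candidate requires forming and inverting an n x n matrix here, which is fine for one data set. Inside a simulation this search runs once per replication, which is where the cost starts to matter.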

Better things?

The problem with smoothing splines and other smoothers that have a single, "global" smoothing parameter is that they cannot adjust the level of smoothness to suit the underlying function. You either have a very smooth fit or a very rough one, but you cannot mix both. Wavelets, which is where we are ultimately headed, offer one approach to this problem (as do regression splines and the partitioning methods of Friedman).

Next time

Onward to a different type of smoothing based on orthogonal decomposition. We'll start with Fourier series, then move on to wavelets.