Class 23: Choosing the Smoothing Parameter


Review from previous class


Choosing the Smoothing Parameters

Lisp files for today's class

Handout
This is a yet more extended and revised draft. Promise, it's the last version!

Cross validation
Before going into smoothing methods too deeply, we can get a lot of ideas from the comparable problems in regression. We'll start here today.
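As a reminder of the regression version of the idea, here is a minimal sketch in Python with NumPy (not the Lisp code for the class). It assumes polynomial regression as the model, and the function names press_shortcut and press_refit, the simulated data, and the range of degrees are all made up for illustration. It computes the leave-one-out sum of squares from a single fit using the diagonal of the hat matrix H, then checks it against actually refitting with each point removed.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def press_shortcut(x, y, degree):
    """Leave-one-out sum of squares from one fit, via the hat matrix."""
    X = P.polyvander(x, degree)            # columns 1, x, ..., x^degree
    H = X @ np.linalg.pinv(X)              # hat matrix H = X (X'X)^{-1} X'
    resid = y - H @ y
    # Deleted residual for case i is e_i / (1 - h_ii)
    return float(np.sum((resid / (1.0 - np.diag(H))) ** 2))

def press_refit(x, y, degree):
    """Same quantity the slow way: drop each point, refit, predict it."""
    total = 0.0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        coef = P.polyfit(x[keep], y[keep], degree)
        total += (y[i] - P.polyval(x[i], coef)) ** 2
    return float(total)

# Hypothetical data: a smooth curve plus noise
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(3.0 * x) + 0.2 * rng.standard_normal(x.size)

# The degree plays the role of a (discrete) smoothing parameter
degrees = range(1, 7)
scores = {d: press_shortcut(x, y, d) for d in degrees}
best_degree = min(scores, key=scores.get)
print(best_degree, scores[best_degree], press_refit(x, y, best_degree))
```

The two functions should agree up to rounding error, which is the point: one fit plus the diagonal of H replaces n separate refits.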

Smoothing matrix
The smoothing matrix S, which plays the role of the hat matrix or projection matrix H in regression, is the key to understanding various properties of smoothing splines. Once you have a "fast" way to compute S, you can use it to understand how the smoothers are all basically weighted moving averages (with so-called equivalent kernels) and find an "expedient" way to compute the cross-validation sum of squares (CVSS).
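To make these two uses of S concrete, here is a small sketch in Python with NumPy. It is an assumption-laden stand-in, not the smoothing spline code used in class: it uses a discrete second-difference penalty on equally spaced points, so that S = (I + lam D'D)^{-1}, which behaves much like a cubic smoothing spline. The name smoother_matrix, the value of lam, and the simulated data are hypothetical.

```python
import numpy as np

def smoother_matrix(n, lam):
    """S = (I + lam * D'D)^{-1}, D the (n-2) x n second-difference matrix."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

rng = np.random.default_rng(0)
n, lam = 50, 5.0
x = np.linspace(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(n)

S = smoother_matrix(n, lam)
fit = S @ y                      # the smooth is linear in y

# Row i of S is the equivalent kernel at x_i: the weights that
# average the y's to give the fitted value there. Rows sum to one.
row_sums = S.sum(axis=1)
kernel_mid = S[n // 2]

# Expedient CVSS: one fit, then rescale residuals by 1 - S_ii
cvss_fast = float(np.sum(((y - fit) / (1.0 - np.diag(S))) ** 2))

# Brute-force CVSS: give point i zero weight, refit, predict it
D = np.diff(np.eye(n), n=2, axis=0)
cvss_slow = 0.0
for i in range(n):
    W = np.eye(n)
    W[i, i] = 0.0
    f_minus_i = np.linalg.solve(W + lam * D.T @ D, W @ y)
    cvss_slow += (y[i] - f_minus_i[i]) ** 2

print(cvss_fast, cvss_slow)
```

Plotting kernel_mid against x shows the weighted moving average directly, and comparing it with S[0] shows how the kernel changes shape near the boundary.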

Effects of choosing the smoothing parameter
It can seem in examples that smoothing splines have little advantage over simpler methods, like polynomial regression. For relatively smooth functions, polynomial regression has much smaller MSE than smoothing splines.

The advantages only become visible when you have the right sort of test function, particularly one that has a jump. Jumps lead to the Gibbs phenomenon and lots of ripples that hurt global polynomial fits. Smoothing splines do quite a bit better in these cases.
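A simulation along these lines can be sketched as follows, in Python with NumPy. Everything specific here is an assumption chosen for illustration: the two test functions, the polynomial degree, the value of lam, the noise level, and the discrete second-difference smoother used in place of a true smoothing spline. The results depend on those choices, so treat this as a template for experimenting rather than as evidence.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def smoother_matrix(n, lam):
    """Discrete spline-like smoother: S = (I + lam * D'D)^{-1}."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def simulate_mse(f, degree, lam, n=100, sigma=0.1, reps=200, seed=0):
    """Average squared error of a polynomial fit and of the smoother."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    truth = f(x)
    X = P.polyvander(x, degree)
    H = X @ np.linalg.pinv(X)          # hat matrix, global polynomial
    S = smoother_matrix(n, lam)        # smoothing matrix
    err_poly = 0.0
    err_smooth = 0.0
    for _ in range(reps):
        y = truth + sigma * rng.standard_normal(n)
        err_poly += np.mean((H @ y - truth) ** 2)
        err_smooth += np.mean((S @ y - truth) ** 2)
    return err_poly / reps, err_smooth / reps

def smooth_f(x):
    return np.sin(2.0 * x)             # a relatively smooth test function

def jump_f(x):
    return (x > 0.0).astype(float)     # a test function with a jump

mse_smooth_case = simulate_mse(smooth_f, degree=5, lam=10.0)
mse_jump_case = simulate_mse(jump_f, degree=5, lam=10.0)
print("smooth f (poly, smoother):", mse_smooth_case)
print("jump f   (poly, smoother):", mse_jump_case)
```

Varying degree and lam over a grid, rather than fixing them, gives a fairer comparison, since each method is then judged at its own best setting.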

Picking the smoothing parameter from data
Simulation lets you see how well a procedure can do and how its MSE depends on the smoothing parameter. What it does not show is how to choose this parameter from data, without the benefit of knowing the test function. This is where cross-validation becomes useful. The calculations can be hard, but they are easily done in a one-time setting. You really only become challenged if you need to simulate such an estimator, since the search over the smoothing parameter must then be repeated for every simulated sample.
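The one-time calculation might look like the following sketch, again in Python with NumPy and again using a discrete second-difference smoother as a hypothetical stand-in for the smoothing spline. The grid of lam values, the test function, and the noise level are arbitrary. Because the data are simulated, the sketch can also report the "oracle" choice that minimizes the true squared error, which is exactly what you do not have with real data.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Discrete spline-like smoother: S = (I + lam * D'D)^{-1}."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def cvss(y, S):
    """Cross-validation sum of squares from the diagonal of S."""
    fit = S @ y
    return float(np.sum(((y - fit) / (1.0 - np.diag(S))) ** 2))

rng = np.random.default_rng(2)
n = 100
x = np.linspace(0.0, 1.0, n)
truth = np.sin(4.0 * np.pi * x)            # known only in a simulation
y = truth + 0.3 * rng.standard_normal(n)

lams = 10.0 ** np.linspace(-2, 4, 25)      # grid on a log scale
cv_scores = np.array([cvss(y, smoother_matrix(n, lam)) for lam in lams])
true_sse = np.array([np.sum((smoother_matrix(n, lam) @ y - truth) ** 2)
                     for lam in lams])

lam_cv = lams[np.argmin(cv_scores)]        # chosen from the data alone
lam_oracle = lams[np.argmin(true_sse)]     # needs the test function
print("CV choice:", lam_cv, " oracle choice:", lam_oracle)
```

Wrapping the last few lines in a loop over many simulated samples shows where the cost comes from: every replication needs its own pass over the whole grid.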

Better things?
The problem with smoothing splines and other smoothers that have a single, "global" smoothing parameter is that they cannot adjust the level of smoothness to suit the underlying function. You get either a very smooth fit or a very rough one, but you cannot mix the two within the same fit. Wavelets, where we are ultimately headed, offer one approach to this problem (as do regression splines and the partitioning methods of Friedman).


Next time

Onward to a different type of smoothing based on orthogonal decomposition. We'll start with Fourier series, then move on to wavelets.