fhat: R^N --> R^N is linear,
and hence a hat matrix exists.
If the hat matrix is a projection, we denote it by H:
fhat <- H %*% y
If the hat matrix is not a projection, we denote it by S for 'smoother':
fhat <- S %*% y
+ In real applications the hat matrix is rarely formed explicitly,
but it is useful for understanding a smoothing method.
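For intuition, the hat matrix of ordinary least squares can be written down explicitly. A minimal standalone sketch (made-up toy data, not the mktg data used below): H = X (X'X)^{-1} X', and the fit from lm() agrees with H %*% y.

```r
# OLS hat matrix: H = X (X'X)^{-1} X'  (toy data, for illustration only)
set.seed(1)
x0 <- sort(runif(20)); y0 <- x0 + rnorm(20, sd=.1)
X  <- cbind(1, x0)                     # design matrix with intercept
H  <- X %*% solve(crossprod(X), t(X))  # X (X'X)^{-1} X'
all.equal(c(H %*% y0), unname(fitted(lm(y0 ~ x0))))  # TRUE
```

Here H is also a projection (H %*% H equals H), matching the 'H' notation above.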
Q: How can we obtain a hat matrix if the software only provides fits?
A: Apply the smoothing algorithm to all canonical basis vectors as y!
S[,j] <- smoothing.algorithm(x, y=j_th_basis_vector) # Pseudo-R-code
+ All smoothers have a complexity or bandwidth parameter.
They are linear smoothers only for a fixed choice of this parameter.
A data-driven choice of the parameter, e.g., by cross-validation, makes them non-linear.
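This can be checked numerically. A standalone sketch (toy data) using smooth.spline with fixed df as the linear smoother: for fixed df the fit respects superposition in y; with a data-driven choice (cv=TRUE) the same check would fail.

```r
# Linearity check for a smoother at FIXED complexity (df), toy data:
set.seed(2)
xs <- sort(runif(50)); ya <- rnorm(50); yb <- rnorm(50)
sm <- function(y) smooth.spline(xs, y, df=5)$y   # fitted values at xs
all.equal(sm(2*ya + 3*yb), 2*sm(ya) + 3*sm(yb))  # TRUE: superposition holds
```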
+ Some R infrastructure:
# Toy Data: The X and Y variables:
x1 <- scale(mktg[,'in.']); hist(x1, col="gray")
y1 <- scale(mktg[,'rev']); hist(y1, col="gray")
plot(x1, y1, pch=16)
# ==> Ties in x! It is important for smoothers to be able to handle ties.
# Parallel sort according to x:
ord <- order(x1)
x <- x1[ord]
y <- y1[ord]
N <- length(x)
rm(x1,y1)
# Generate the j'th canonical basis vector:
canon.basis.vec <- function(j,N) as.numeric(seq(N)==j)
# Generate smoother matrix:
make.S <- function(x, smoother, ...) {
N <- length(x)
S <- matrix(NA, nrow=N, ncol=N)
for(j in seq(N)) S[,j] <- smoother(x, y=canon.basis.vec(j,N), ...)
S
}
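A quick sanity check of this construction (standalone toy-data sketch, with smooth.spline at fixed df standing in for the smoother; S is built column-by-column exactly as in make.S): splines reproduce constants, so the rows of S should sum to 1.

```r
# Build S from canonical basis vectors and check that rows sum to 1:
set.seed(3)
xx <- sort(runif(30)); N <- length(xx)
sm <- function(x, y) smooth.spline(x, y, df=4)$y
S  <- sapply(seq(N), function(j) sm(xx, as.numeric(seq(N)==j)))
range(rowSums(S))   # both endpoints ~1: constants are reproduced
```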
# Plot a basis discretized to the observed x values:
plot.cols <- function(x, mat, type="l", col="blue", lwd=2, pch=16, cex=.5) {
ord <- order(x)
par(mfcol=c(ceiling(ncol(mat)/2),2), mar=c(2,2,1,1), mgp=c(1.5,.5,0))
for(j in seq(ncol(mat))) {
plot(x, mat[,j], type="n", ylab="", xlab="", ylim=range(c(0,mat[,j])))
abline(h=0, col="gray")
if(grepl("[bl]",type)) lines(x[ord], mat[ord,j], col=col, lwd=lwd)
if(grepl("[bp]",type)) points(x, mat[,j], col=col, pch=pch, cex=cex)
}
}
# Plot selected rows of a hat/smoother matrix
plot.rows <- function(sel, x, mat, type="l", col="red", lwd=2, pch=16, cex=.5) {
ord <- order(x)
x.rows <- mat[sel,]; x.loc <- x[sel]
par(mfcol=c(ceiling(length(sel)/2),2), mar=c(3,3,1,1), mgp=c(1.5,.5,0))
for(j in seq(along=sel)) {
plot(x, x.rows[j,], type="n", ylab="", xlab="", ylim=range(c(0,x.rows[j,])))
abline(h=0, col="gray"); abline(v=x.loc[j], col="gray")
if(grepl("[bl]",type)) lines(x[ord], x.rows[j,ord], col=col, lwd=lwd)
if(grepl("[bp]",type)) points(x, x.rows[j,], col=col, pch=pch, cex=cex)
}
}
. PROJECTION SMOOTHERS:
Expand X in a basis to form multiple predictors,
then use multiple regression. Examples:
+ POLYNOMIALS:
o Nice because they provide a hierarchy of complexity (the degree)
but they have unfortunate nonlocal behaviors.
o If you ever use polynomials, do not use monomial bases:
they can be numerically unstable.
Instead use orthogonal polynomial bases such as Hermite polynomials.
They have easy recurrence relations.
You can later orthogonalize them with respect to the discretization
of the observed x values.
o R - Monomials, Hermite polynomials, and empirically orthogonalized polynomials:
# Monomial basis, just for curiosity: Not recommended but works for low orders
x.mon <- sapply(0:5, function(j) x^j)
plot.cols(x, x.mon); rug(jitter(x,5))
# Orthogonalize empirically with a Q-R decomposition:
x.mon.orth <- qr.Q(qr(x.mon))
round(crossprod(x.mon.orth), 14) # Check orthogonality, same as t(...)%*%(...)
plot.cols(x, x.mon.orth)
# Recursion for Hermite polynomials:
x.herm <- matrix(NA, ncol=6, nrow=length(x))
x.herm[,1] <- 1
x.herm[,2] <- x
for(j in 3:ncol(x.herm)) x.herm[,j] <- x*x.herm[,j-1] - (j-2)*x.herm[,j-2] # He_n = x*He_{n-1} - (n-1)*He_{n-2}, degree n = j-1
plot.cols(x, x.herm)
# Orthogonalization with respect to the sampled points: use a Q-R decomposition
x.herm.orth <- qr.Q(qr(x.herm))
max(abs(x.herm.orth - x.mon.orth)) # The same; why?
plot.cols(x, x.herm.orth)
# Hat matrix
H.poly <- tcrossprod(x.herm.orth) # same as: x.herm.orth %*% t(x.herm.orth)
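Standalone check (toy x, not the mktg data) that a matrix of the form Q Q' with orthonormal columns is indeed a projection:

```r
# H = QQ' is symmetric and idempotent; its trace counts the basis columns.
set.seed(4)
xx <- sort(runif(40))
Q  <- qr.Q(qr(sapply(0:5, function(j) xx^j)))  # orthonormalized monomials
H  <- tcrossprod(Q)
max(abs(H %*% H - H))   # ~0: idempotent
sum(diag(H))            # 6, the number of basis functions (degrees of freedom)
```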
sel <- seq(1,1926,by=275) # Select rows in steps of 275
plot.rows(sel, x, H.poly) # Plot weights at 8 locations (8 rows)
rug(x) # Add a 'rug' to show where the x values fall.
# Note the vertical lines -- they indicate the location for which this row provides fhat(x).
# Discuss these traces!
# Would it be justified to call polynomial regression a 'local smoother'?
# ...
fhat.poly <- H.poly %*% y
windows()  # new plot device on MS Windows; use dev.new() on other platforms
plot(x, y, pch=16, cex=.5); lines(x, fhat.poly, lwd=2, col="green")
# Looks nice in spite of the unreasonable looking row weights.
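Cross-check (again on toy data): the projection fit Q Q' y is nothing but OLS regression on a degree-5 polynomial, so it should agree with lm():

```r
# QQ'y vs. lm() with a degree-5 polynomial basis: same column space, same fit.
set.seed(5)
xx <- sort(runif(40)); yy <- sin(2*pi*xx) + rnorm(40, sd=.2)
Q  <- qr.Q(qr(sapply(0:5, function(j) xx^j)))  # orthonormal degree-5 basis
all.equal(c(tcrossprod(Q) %*% yy),
          unname(fitted(lm(yy ~ poly(xx, 5)))))  # TRUE
```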
+ B-SPLINES REGRESSION ('B'=basis):
o Preferred over polynomials because they are more local.
o Idea: Place K (<