---------------------------------------------------------------- LECTURE 14: * Quiz: - What is yhat(xx)? ... - How do you find a stderr and stderr.est for yhat(xx)? compute var(yhat(xx)), estimate sigma with s - What guarantees does a CI for yhat(xx) make? P[ in CI ] = .95 - What are the assumptions? normal linear model is correct (finite sample "exact" inference) N large (asymptotic approximate inference) - What is the conceptual framework for forming a PI? - How do you find the width of a PI? - What are the sources of variation a PI has to account for? . dataset-to-dataset in yhat(xx) . data variation in y(xx) * New Material: - Problems with linear models: (1) E[y] != X beta for any beta (2) V[y] != sigma^2 I for any sigma (3) err !~ N(0,sigma^2) (4) leverage points: outlyingness in predictor space => Need . Detection: diagnostics => graphical methods, combined with inferential ideas . Fixes: better models (may not be possible) - Fixes: . If one can assume that all relevant predictors are at hand, and yet the model does not fit, try TRANSFORMATIONS and additional FUNCTIONAL TERMS such as interactions. . Box-Cox family of power/log-transformation: assuming y>0 Try power transformations, but Box & Cox proposed a power family that contains the logarithm as well: box.cox <- function(y, pow=1) if(pow != 0) return( (y^pow - 1)/pow ) else return( log(y) ) Illustration: y <- seq(0,3,length=201) plot(x=range(y), y=c(-3,1.5), type="n", xlab="y", ylab="box.cox(y)") abline(h=0, col="gray90"); abline(v=1, col="gray90") for(pow in seq(-2,2,by=.2)) { lines(y, box.cox(y,pow), col="gray50") points(0, box.cox(0,pow), pch=16, cex=.3, col="gray50") } pow <--1; lines(y, box.cox(y,pow), col="green", lwd=2) pow <- 0; lines(y, box.cox(y,pow), col="red", lwd=2) pow <- 1; lines(y, box.cox(y,pow), col="blue", lwd=2) pow <- 2; lines(y, box.cox(y,pow), col="orange", lwd=2) Properties: ascending, convex for pow>1, concave for pow<1, BC(1)=0 Ex.: Pick the transformation of y that yields the highest R2 Data example: Boston housing data (stay tuned for background) site <- "http://stat.wharton.upenn.edu/~buja/STAT-541/DATA/" boston <- read.table(paste(site,"boston.dat",sep=""), header=T) rownames(boston) <- paste(scan(paste(site,"boston.geography",sep=""), w=""), 1:nrow(boston)) boston[1:5,] Try a collection of Box-Cox parameters for(pow in c(2.0,1.0, 0.5, 0.0, -0.5, -1.0, -2.0)) cat("power =",pow,":\n R^2 =", summary(lm(box.cox(MEDV, pow) ~ ., data=boston))["r.squared"][[1]],"\n") Winner: pow=0 (but by how much?) (Could be used for transformations of predictors also: Box-Tiao) . The ultimate hammer for transformations: the ACE algorithm (Breiman-Friedman 1985, JASA) -- stay tuned! . Non-linear model terms: When it is likely that y = f(x1,x2,...) + eps with f(...) nonlinear, try: + interactions: y ~ x1 * x2 = y ~ x1 + x2 + x1:x2 => b1*x1 + b2*x2 + b12*x1*x2 = b1*x1 + (b2 + b12*x1)*x2 = b2*x2 + (b1 + b12*x2)*x1 the "effect" of x1 on y depends on level of x2 + Polynomial terms: y ~ x + I(x^2) amounts to a quadratic transformation of the predictor but with free parameters: y = b0 + b1*x + b2*x^2 + r (b2 is the "curvature parameter" for the transformation of x.) - Diagnostics: . R diagnostics plots: help(plot.lm) boston.lm <- lm(MEDV ~ ., data=boston) par(mfrow=c(2,2)) plot(boston.lm) . Refined: par(mfrow=c(2,2), mgp=c(1.8,.5,0), mar=c(3,3,2.5,2)) plot(boston.lm, pch=16, cex=.5, cex.axis=.8, cex.lab=.8) + Residuals vs fitted values: to see violations of ... + Normal q-q plot of stdized residuals: to see violations of ... + S-L plot of sqrt(|stdized residuals|): to see violations of ... + Stdized residuals vs leverage values P_{ii}: to see ... . Missing diagnostics plots in R: + Residuals versus order: to see "lurking" variables (such as: time, space,...) par(mfrow=c(1,1), mgp=c(1.8,.5,0), mar=c(3,3,2.5,2)) plot(resid(boston.lm), xlab="order", ylab="residuals", pch=16, cex=.5, cex.axis=.8, cex.lab=.8); abline(h=0, col="gray50") + Leverage plots: to see the influence of cases on ... source(paste(site,"leverage-plots.R",sep="")) leverage.plots(boston.lm, cex=.5) (Unlike JMP, this routine does not add the means back into the adjusted variables; this is why these plots are zero-centered.) - INFERENCE FOR DIAGNOSTICS PLOTS: . General idea: Q: How can we get an idea whether data that follow the fitted model look anything like the observed data? A: Create datasets from estimated model ------------------------ | "parametric bootstrap" | ------------------------ Parametric bootstrap for linear model: y* ~ N(X b, s^2 I) r <- resid(boston.lm) # actual residual vector RMSE <- function(model) sqrt( deviance(model) / model$df.resid ) X <- model.matrix(boston.lm) yhat <- fitted(boston.lm) s <- RMSE(boston.lm) N <- nrow(X) y.boot <- rnorm(N, m=yhat, s=s) # a simulated response vector r.boot <- resid(lm(y.boot ~ X)) # a simulated residual vector windows(wid=10, hei=5) par(mfrow=c(1,2), mgp=c(1.8,.5,0), mar=c(3,3,2,1)) qqnorm(r, main="Actual Data", pch=16, cex=.5) qqnorm(r.boot, main="Bootstrap Data", pch=16, cex=.5) # Animate the above! Standard R diagnostics for bootstrapped data: boston.boot.lm <- lm(y.boot ~ X) par(mfrow=c(2,2)) plot(boston.boot.lm, pch=16, cex=.6) The idea can be used - for other types of model checks as well, and - for any type of 'generative model'. ----------------------------------------------------------------