================================================================ LECTURE 8: * ORG: - HW 2 is graded, will be returned shortly. - HW 4 will be posted tomorrow, due Wed in a week (Use TA office hr Thu and instructor''s office hr Mon after class.) * RECAP: - Self-influence: measured by ... Pii - Standardized residuals: . Root cause of the need to standardize: Unlike errors, residuals are not ... . ri* = ri ... - Degrees of freedom related to fits and residuals = dimensions of ... = traces of ... - Unbiased estimate of sigma^2: ... . Required assumptions: ... and ... . What is the area of statistics that checks model assumptions? ... (Other than checks of model assumptions what else is lumped into this area? ... and ...) - Today we will need the formula for the slope of a simple linear regression without intercept: b = ------- |x|^2 * ROADMAP: - Algebra, geometry, and meaning of ADJUSTMENT (contd.) - Distribution theory for linear models - Inference for linear models ================================================================ * "ADJUSTMENT" AND ITS ALGEBRA: - Intuitive Example: 'Explain' student''s school performance. . Factors of interest: Measures of school quality (Class size, training of teachers, teacher salaries, physical plant,...) ==> Collect these predictors in a matrix X1. . Factors NOT of interest in the study but cannot be ignored: Demographics (Household income, home zip code, education level of parents, ...) ==> Collect these predictors in a matrix X2. . Outcome variables: SAT scores, grades, graduation vs dropping out, ... . Idea: Need to ADJUST the variables of interest for demographics. - Partitioning linear regression: Partition the predictor matrix: X = (X1,X2) (Nxp1, Nxp2) X1 = predictors of primary interest X2 = non-ignorable predictors not of primary interest Partition the LS estimator: b^T = (b1^T,b2^T) (p+1 = p1+p2) y = X b + r = X1 b1 + X2 b2 + r Compare with naive approach: y = X1 b1* + r* - Q: How are b1 and b1* related to each other? A: In general we only know that b1 and b1* are not the same. Reason: The following problems produce different b1 coefficients: | y - (X1 b1 + X2 b2) |^2 = min_b1,b2 | y - (X1 b1*) |^2 = min_b1* - Define orthogonal (LS) projections according to the partitions: P = projection onto column space of X P1 = projection onto column space of X1 P2 = projection onto column space of X2 (P1, P2 are NOT partitions of P; all are NxN) Question: When is P = P1 + P2 ? (see HW2, P7) - Definition: residual operations I-P2 = "adjustment operator" w.r.t. X2 y.2 = (I-P2) y is "y adjusted for X2" Intuitively: Removing the variation that can be "explained" by X1 or X2 "accounted for" Practical example from above: X1 = predictors describing school quality X2 = predictors describing demographics y = performance on SAT y.2 = performance on SAT adjusted for demographics Note: y.2 is a residual vector in a model with predictors X2. Upcoming surprise: We will need X1 adjusted for X2 !!!! X1.2 = (I-P2) X1 = Collection of predictors of interest (school quality) adjusted for demographics (X2). - Some higher trivialities: Assuming y = X1 b1 + X2 b2 + r is the LS decomposition based on both X1 and X2, note the following: r and the columns of X are orthogonal to each other, hence r and the columns of X1, X2 are ... ==> r is adjusted for both X1 and X2. P X = ... X P X1 = ... X1 P X2 = ... X2 P r = ... 0 (I-P) r = ... r (I-P) X = ... 0 P1 r = ... 0 P2 r = ... 0 (I-P1) r = ... r (I-P2) r = ... r (I-P1) X1 = ... 0 (I-P2) X2 = ... 0 - Adjust the LS decomposition y = X1 b1 + X2 b2 + r for X2 by applying I-P2 to both sides: (I-P2) y = (I-P2) X1 b1 + r (*****) ^^^^^^^^ ^^^^^^^^^ ^ X2-adjusted X2-adjusted Remains fixed under response y predictors X1 X2-adjustment Notation: y.2 = (I-P2) y 'y adjusted for X2' X1.2 = (I-P2) X1 'X1 adjusted for X2' The following is a LS decomposition: y.2 = X1.2 b1 + r It solves | y.2 - X1.2 b1 |^2 = min_b1 (***) (Why? Because ... and ... are orthogonal. Why are they orthogonal? ...) The solution b1 of (***) is the same as that of | y - X1 b1 - X2 b2 |^2 = min_b1,b2 (**) - Interpretation of b1: To get the LS regression coefficients b1 for the joint regression y on X1 and X2, one can: -------------------------------------------------- | | | Regress the X2-adjusted response, (I-P2)y, | | on the X2-adjusted X1, (I-P2)X1. | | | -------------------------------------------------- - Application to a special case: A single predictor adjusted for all else X1 = single column xj (Nx1) X2 = all columns of X except column xj (Nxp) => The adjusted equation (*****) amounts to a simple linear regression without intercept. Simplified notation: y.adj = (I-P2) y (y adjusted for all xk except xj) xj.adj = (I-P2) xj (xj adjusted for all xk except xj) bj = slope of simple lin. regr. of "y adjusted for all but xj" onto xj adjusted for all other predictors < y.adj, xj.adj> bj = ---------------- ("adjustment formula" for bj) |xj.adj|^2 Reminder: bj = j''th MULTIPLE regression coefficient . Elementary consequence: Does it matter for the value of bj if we add or drop some of the other variables? ... . Some weirdness: Adjusting y is actually not necessary. Adjusting xj is. Numerator of adjustment formula = < y.adj, x.adj> = <(I-P2)y, (I-P2)xj> = (Why? Why? Why?) = < y, xj.adj> bj = -------------- (the "weird adjustment formula" for bj) |xj.adj|^2 - Adjusted variables plot or "leverage plot": . A plot of y.adj vs. xj.adj shows what the LS criterion "sees" when determining bj. . Example: Fetch data from class website; after downloading to folder 'Data'... cars <- read.table("Data/CarModels2003-4.TXT", header=T, sep=",", na=".") names(cars); dim(cars) rownames(cars) <- paste(1:nrow(cars),as.character(cars[,"Model"])) apply(cars, 2, function(x) sum(is.na(x))) # there is an NA in "MPG.Highway" sel <- !is.na(cars[,"MPG.Highway"]) # to select non-NAs # multiple regression of response on two predictors: cars.lm <- lm(MPG.Highway ~ Weight.lbs + Horsepower, data=cars[sel,]) # adjusted response and other predictor: mpg.adj <- resid(lm(MPG.Highway ~ Horsepower, data=cars[sel,])) wgt.adj <- resid(lm(Weight.lbs ~ Horsepower, data=cars[sel,])) # compare coeff of Weight: same!! "proof" by example... lm(mpg.adj ~ wgt.adj); cars.lm plot(mpg.adj ~ wgt.adj, pch=16, cex=.5); abline(lm(mpg.adj ~ wgt.adj)) . "Leverage Plot" because it shows points that have high leverage for bj (every bj can have its own leverage points) . Canned function for leverage plots with optional point identification: source("http://www-stat.wharton.upenn.edu/~buja/STAT-541/leverage-plots.R") leverage.plots(model=cars.lm, ident=c(1,2), frame.sz=5) # stop with right click . Another use of leverage plots: predictor variables shrink when adjusted; leverage plots show the shrinkage in the horizontal width; one can then read off what estimated average difference in the response corresponds to the adjusted range (xxx work this out w example) . Note to future TAs of Stats classes: The JMP software provides leverage plots with all regression outputs. - Variance/stderr of bj: . Note about terminology: We call 'standard error' the true sdev of an estimate. We call 'standard error estimate' the estimated sdev of an estimate. (Most people say 'standard error' and mean 'standard error estimate'.) . Goal: Obtain Var[ bj ] from the adjustment formula for bj. Recall variance of a linear combination: Var[ ] = a^T V[y] a = sigma^2 |a|^2 Use a = xj.adj: Var[ ] = sigma^2 |xj.adj|^2 To obtain V[bj], you need to divide this by |xj.adj|^4: sigma^2 Var[bj] = ------------ |xj.adj|^2 --------------------------- | | | sigma | "Adjustment formula" | stderr[bj] = ---------- | for standard errors | |xj.adj| | of regression coeffs | | --------------------------- . Consequences for statistical inference: + What does |xj.adj| mean statistically speaking? Consider |xj.adj|/sqrt(N-1): ... sdev(xj.adj) + Do we like small or large values of |xj.adj| ? + How do you "see" |xj.adj| in the leverage plot? + What happens to |xj.adj| if we add a predictor that is highly correlated with xj? + How would you measure how collinear xj is with all other predictors? + See Mosteller & Tukey, p.326f, on the proxy problem ================================================================