================================================================

LECTURE 12:

* ORG:

  - HW 5 due Friday, Oct 22, noon
  - Reading/writing assignment yet to be posted, check back...

* RECAP AND REMINDERS:

  - Reminder:
      xj.adj = xj adjusted for all other predictors in the model
             = ... for all k (=/=j)
             = ... for all residual vectors

  - Linear algebra: If Q = (Q1,Q2) is an NxN orthogonal matrix decomposed
    into two blocks, Nx(p+1) and Nx(N-p-1), what are the following?
        Q1^T Q1 = ...
        Q2^T Q2 = ...
        Q1 Q1^T = ...
        Q2 Q2^T = ...
        Q1 Q1^T + Q2 Q2^T = ...

  - Assume: V[y] = sigma^2 I in R^N.
    . If u is a unit length vector in R^N, what is the variance V[<u,y>]? ...
    . If u1 and u2 are vectors in R^N, and u1 _|_ u2, what is
      Cov[<u1,y>, <u2,y>]? ...
    . If in addition y is multivariate normal, what can you say about the
      dependence between <u1,y> and <u2,y>? ...
    . Assuming y multivariate normal, N(0, sigma^2 I), and u0, u1, ..., up
      orthonormal, what can we say about the dependence between
          <u0,y>  and  (<u1,y>^2 + <u2,y>^2 + ... + <up,y>^2)/p ?
    . For kicks, what about the dependence between
          f(<u0,y>,<u1,y>)  and  g(<u2,y>,...,<up,y>) ?

  - Distribution theory: Assume y ~ N(mu, sigma^2 I).
    . If u in R^N is a unit vector, and if Q is Nxk with o.n. columns,
      Q^T Q = I_k, and if Q^T u = 0, and if Q^T mu = 0,
      what is the distribution of
          < u, y-mu > / sqrt( | Q^T y |^2 / k )  ???
    . What is the distribution of
          (y1-mu1) / sqrt(((y2-mu2)^2+...+(yN-muN)^2)/(N-1))  ???
    . What is the distribution of
          sum_i (yi-mui)/sqrt(N) / sqrt( sum_i (yi - mean(yi))^2 / (N-1))  ???

  - F-tests:
    . If t has a t-distribution with k degrees of freedom, what is the
      distribution of t^2 ?
    . If y ~ N(0, sigma^2 I), and if Q1 is of size Nxp1 and Q2 of size Nxp2,
      and if Q1 and Q2 both have o.n. columns, and if Q1^T Q2 = 0,
      what is the distribution of the following quantity and why?
          (|Q1^T y|^2 / p1) / (|Q2^T y|^2 / p2)
    . If y ~ N(beta0 + beta1*x1 + beta2*x2, sigma^2 I), and if r is the
      residual vector of the corresponding regression, what is the
      distribution of the following ratio and why?
          (|y.0 - r|^2 / 2) / (|r|^2 / (N-3))
    . If R2 is the R-square of a linear regression with p predictors, what is
      the distribution of the following ratio and why?
          (R2 / p) / ((1-R2) / (N-p-1))

* ROADMAP:

  - CIs and PIs for the response

================================================================

* CIS AND PIS FOR THE RESPONSE:

  - Inference for the response vs. prediction of the response: CIs vs PIs
    . Let xx be a row (!) from the design matrix X written as a column,
      or a new set of predictor values.
      Ex.: When modeling the car data with MPG = b0 + b1*WEIGHT + b2*HP
      we have xx = (1,WEIGHT,HP)^T.
      We might want to know something about y = MPG at xx = (1, 3000, 200)^T:
          yhat(xx) = <xx,b>   where b = (b0,b1,b2)^T.
    . Inference for the response is really inference for E[y(xx)] = <xx,beta>.
      We look for an interval that contains the true <xx,beta> with prob 95%.
      This interval is called a 'CI' for <xx,beta>.
      It is of the form yhat(xx) +- width.
    . Prediction of the response is
      + either point prediction of a future y(xx) using yhat(xx) = <xx,b>,
      + or interval prediction of a future y(xx) using yhat(xx) +- width.
      In the latter case the interval is called a 'PI'.
      Its gambling guarantee should be to catch future response values y
      at xx, based on past training data, with probability 95%.
      (Both future response values and past data are random!)
    . Q: Which is going to be wider, the CI or the PI?

  - Inference for E[y(xx)]:
    . CI = yhat(xx) +- 2*stderr.est(yhat(xx))
      + If the model is unbiased, E[yhat(xx)] = <xx,beta>; and with
        V[y] = sigma^2 I:
          Var[yhat(xx)] = Var[<xx,b>]
                        = Var[xx^T (X^T X)^{-1} X^T y]
                        = sigma^2 (xx^T (X^T X)^{-1} xx)
          ==> stderr.est(yhat(xx)) = s sqrt(xx^T (X^T X)^{-1} xx)
        Note that for xx = xi (= i'th row of X), this is
          stderr.est(yhat(xx)) = s sqrt(Pii),
        where Pii is the i'th diagonal element of the hat matrix
        P = X (X^T X)^{-1} X^T.
      + CI: 95%CI = yhat(xx) +- 2 stderr.est(yhat(xx))
            for true E[y(xx)] = mu(xx) = <xx,beta>
        Uses: Confidence bands around fitted lines in simple regression
      + Testing H0: <xx,beta> = mu0(xx)   (rarely done)
        Example: H0: E[MPG(WEIGHT=3000,HP=200)] = mu0(xx) = 20 mpg
          => xx = (1,3000,200)^T
             t = (yhat(xx) - mu0(xx)) / stderr.est(yhat(xx))
          Reject at ~5% significance when |t| > 2.
      + '2' in CIs and tests should really be: qt(p=.975, df=N-p-1)
      + Ex.: Simple linear regression with one predictor x:
        calculate (X^T X)^{-1} and plot CI(x) as a function of x.

  - Prediction intervals based on yhat(xx):
    . Let y(xx) be the random variable of FUTURE response values at xx,
      whereas yhat(xx) is obtained from PAST data.
    . Assume the model is correct wrt first and second moments, and
      normality, for past and future data.
    . Wanted: an interval around yhat(xx) that has a 95% chance of
      catching y(xx).
    . Approach: Find the distribution of y(xx) - yhat(xx).
        E[ y(xx) - yhat(xx) ] = 0
        V[ y(xx) - yhat(xx) ] = V[ y(xx) ] + V[ yhat(xx) ]
                                  (future y(xx) is independent of past data)
                              = sigma^2 + stderr(yhat(xx))^2
                              = sigma^2 + sigma^2*(xx^T (X^T X)^{-1} xx)
                              = sigma^2 * (1 + xx^T (X^T X)^{-1} xx)
      Assuming past and future response values are normally distributed:
                    y(xx) - yhat(xx)
        t = --------------------------------   has a t-distribution with df = N-p-1
            s*sqrt(1 + xx^T (X^T X)^{-1} xx)
      Reasons: The Gaussian assumption AND the numerator is independent
      of the denominator because
        - s is computed from the residuals of past data y,
        - yhat(xx) is computed from the projection of y onto the X-space,
        - y(xx) is future data at xx,
        => s is independent of yhat(xx) and y(xx)
        (recall: Cov[r,yhat] = 0 => under normality r and yhat are independent)
    . PI(xx) = yhat(xx) +- 2*s*sqrt( 1 + xx^T (X^T X)^{-1} xx )
      ('2' should be ...)

================================================================
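For the simple-regression exercise (one predictor x, calculate (X^T X)^{-1} and plot CI(x)), here is a worked version of the algebra. It is the standard textbook result, written with the notation of these notes plus S_xx = sum_i (x_i - xbar)^2, which is introduced here for convenience. The '2' again stands in for the t quantile.

```latex
X^T X = \begin{pmatrix} N & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},
\qquad
(X^T X)^{-1} = \frac{1}{N S_{xx}}
  \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & N \end{pmatrix},
\qquad
S_{xx} = \sum_i (x_i - \bar{x})^2 .

\text{For } xx = (1, x)^T: \quad
xx^T (X^T X)^{-1} xx
  = \frac{\sum_i x_i^2 - 2x \sum_i x_i + N x^2}{N S_{xx}}
  = \frac{S_{xx} + N (x - \bar{x})^2}{N S_{xx}}
  = \frac{1}{N} + \frac{(x - \bar{x})^2}{S_{xx}} ,

\mathrm{CI}(x) = \hat{y}(x) \pm 2\, s \sqrt{\frac{1}{N} + \frac{(x - \bar{x})^2}{S_{xx}}} .
```

The determinant step uses N sum_i x_i^2 - (sum_i x_i)^2 = N S_xx. The band is narrowest at x = xbar and widens as x moves away from it, which gives the familiar curved confidence band around the fitted line.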
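The CI and PI formulas above can be sketched in plain Python, assuming a small design matrix whose rows already include the leading 1. The helper names (`invert`, `quad_form`, `fit_ols`, `ci_and_pi`) and the toy data are hypothetical illustrations, not course code. The multiplier defaults to the notes' '2' and can be swapped for the exact t quantile.

```python
import math


def invert(A):
    """Invert a small nonsingular matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the n x n identity matrix: [A | I].
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: move the row with the largest |entry| in this column up.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    # The left block is now I, so the right block is A^{-1}.
    return [row[n:] for row in M]


def quad_form(v, A):
    """Return v^T A v."""
    k = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(k) for j in range(k))


def fit_ols(X, y):
    """Least squares via the normal equations; rows of X include the leading 1.
    Returns b, (X^T X)^{-1}, and s computed with df = N - p - 1."""
    N, p1 = len(X), len(X[0])  # p1 = p + 1 columns
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p1)] for i in range(p1)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p1)]
    XtX_inv = invert(XtX)
    b = [sum(XtX_inv[i][j] * Xty[j] for j in range(p1)) for i in range(p1)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    s = math.sqrt(sum(r * r for r in resid) / (N - p1))
    return b, XtX_inv, s


def ci_and_pi(xx, b, XtX_inv, s, crit=2.0):
    """yhat(xx), a CI for <xx,beta>, and a PI for a future y(xx).
    crit=2.0 mirrors the notes; qt(.975, N-p-1) is the exact multiplier."""
    yhat = sum(bj * xj for bj, xj in zip(b, xx))  # yhat(xx) = <xx, b>
    h = quad_form(xx, XtX_inv)                     # xx^T (X^T X)^{-1} xx
    ci = (yhat - crit * s * math.sqrt(h), yhat + crit * s * math.sqrt(h))
    pi = (yhat - crit * s * math.sqrt(1 + h), yhat + crit * s * math.sqrt(1 + h))
    return yhat, ci, pi


# Hypothetical toy data (not from the course): simple regression on x = 1..8.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
X = [[1.0, float(x)] for x in xs]
b, XtX_inv, s = fit_ols(X, ys)
yhat, ci, pi = ci_and_pi([1.0, 10.0], b, XtX_inv, s)
print(f"yhat={yhat:.3f}  CI={ci}  PI={pi}")
```

Because the PI half-width carries the extra "1 +" under the square root, the PI always contains the CI at the same xx. The normal-equations route is fine for a sketch like this. In practice a QR decomposition is numerically safer.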
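The notes say the '2' in CIs, tests, and PIs should really be qt(p=.975, df=N-p-1), which is R's t quantile. Outside R, a dependency-free stand-in can be sketched by integrating the t density numerically and bisecting on the CDF. This is only a teaching sketch (Simpson's rule plus bisection, both chosen here for simplicity). A statistics library's quantile function is the better choice when one is available.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, n=2000):
    """P(T <= t): composite Simpson's rule on [0, |t|] plus symmetry, P(T <= 0) = 1/2."""
    if t < 0:
        return 1.0 - t_cdf(-t, df, n)
    h = t / n  # n is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df, tol=1e-9):
    """Quantile of the t_df distribution by bisection (the role of R's qt(p, df))."""
    if p < 0.5:
        return -t_quantile(1 - p, df, tol)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 10, 30, 100):
    print(df, round(t_quantile(0.975, df), 4))
```

The printed values approach 1.96 as df grows. This is why '2' is a reasonable shortcut once N-p-1 is moderately large and a poor one when it is small.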
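The gambling guarantee of the PI, and the question of whether the CI or the PI is wider, can be checked by simulation. The following is a minimal sketch under assumed settings: a hypothetical true line y = 1 + 0.5 x, Gaussian errors, a fixed design x = 1..20, and the multiplier hard-coded to qt(.975, 18) = 2.100922 (df = N-p-1 with N = 20, p = 1). Each replication draws fresh past data, builds both intervals at x0, and then draws one future y(x0).

```python
import math
import random


def coverage_sim(n_reps=4000, N=20, x0=10.0, sigma=1.0, crit=2.100922, seed=1):
    """Fractions of replications in which a fresh future y(x0) falls inside
    the CI and inside the PI built from past data (hypothetical true model
    y = 1 + 0.5 x + e, e ~ N(0, sigma^2))."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(1, N + 1)]  # fixed design x = 1..N
    xbar = sum(xs) / N
    sxx = sum((x - xbar) ** 2 for x in xs)
    h0 = 1.0 / N + (x0 - xbar) ** 2 / sxx      # xx^T (X^T X)^{-1} xx for xx = (1, x0)
    hit_ci = hit_pi = 0
    for _ in range(n_reps):
        ys = [1.0 + 0.5 * x + rng.gauss(0.0, sigma) for x in xs]  # past data
        ybar = sum(ys) / N
        b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        b0 = ybar - b1 * xbar
        rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
        s = math.sqrt(rss / (N - 2))            # df = N - p - 1 with p = 1
        yhat0 = b0 + b1 * x0
        y_future = 1.0 + 0.5 * x0 + rng.gauss(0.0, sigma)         # new response at x0
        err = abs(y_future - yhat0)
        hit_ci += err <= crit * s * math.sqrt(h0)
        hit_pi += err <= crit * s * math.sqrt(1.0 + h0)
    return hit_ci / n_reps, hit_pi / n_reps


ci_cov, pi_cov = coverage_sim()
print(f"future y caught by CI: {ci_cov:.3f}, by PI: {pi_cov:.3f}")
```

The PI should catch the future response close to 95% of the time. The CI, which only targets <xx,beta>, catches it far less often because it ignores the sigma^2 of the new observation.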