================================================================

LECTURE 6:

* ORG:

  - Today's lecture is being videotaped because of special holidays.

  - We have a TA!  Name: Yuzhou Liu
    . He will be the grader from now on.
    . His office hrs: Thu 2-3pm, Rm JMHH 440

  - HW1: returned by email
    . Change extension from '.R' to '.txt' for viewing in an editor.
    . Apologies: The comments ('##AB: ') tend to be a little pedantic.
    . Recommendation: Study posted solutions to learn to 'think in R'.
    Review of solutions...

  - HW2: Will be graded and solutions posted when all submissions are in.

  - HW3: Due Fri, Oct 2, 12 noon
    Some instructions added at the top of the file.
    Please download a new copy and read the header.

* RECAP:

  - Instructor's HW: Draw a nice LS pic.
      LS.geom()   ## Download from class webpage, Lecture 5.

  - Linear model and estimation:
      y = X beta + eps = X b + r     (Greeks vs Romans)

  - What is random, and in what sense? ...

  - What are the stochastic 1st and 2nd order assumptions? ...
    . What do they mean? ...
    . Can we be sure the assumptions hold? ...
    . What can we do to get a sense of how realistic they are? ...

  - What is the relation between 'b' and 'beta'? ...
    . What assumptions are used to derive this? ...
    . What algebraic identity is used? ...

  - What are the 2nd order properties of 'b'? ...
    . What assumptions are used to derive them? ...
    . What algebraic identity is used? ...

  - In all this, what kind of variability is being described? ...

  - Recall the simulation we performed at the end of last class:
    . What did we simulate 'Nsim' times? ...
    . Explain the following code snippets:
        apply(bs, 2, mean)
        beta
      and:
        cov(bs)
        solve(t(X)%*%X)*sigma^2
    . In both cases, what happens as 'Nsim' --> Inf?
    . What do these plots illustrate?
        windows(height=5, width=10)   ## Windows-only; elsewhere use dev.new(height=5, width=10)
        par(mfrow=c(1,2), mgp=c(1.8,.5,0), mar=c(3,3,1,1))
        plot(cbind(x=X[,2],y=y), pch=16, cex=1, col="gray", ylim=c(-2,6))
        for(i in 1:100) abline(bs[i,])
        plot(bs, pch=16, cex=.5)
    .
      In practical data analysis, do we usually get to see
      what amounts to multiple 'simulations'? ...
      with the exception of ...

* ROADMAP:

  - Linear Model Analysis, Step 2: Variability of yhat and r
  - Sampling properties of the predictor matrix
    to answer this Q: Where is root-N hiding?
  - Degrees of freedom and dimensions of subspaces
  - Estimating the error variance 'sigma^2'

================================================================

* LINEAR MODEL ANALYSIS, STEP 2: VARIABILITY OF yhat AND r

  - There is variability in 'b' ...
    => There is variability in 'yhat' and 'r'.
        yhat = ... X b = ... P y
        r = y - yhat = ... (I-P) y

  - Strangeness about the randomness in yhat and r:
    Even though both are N-dimensional,
    . yhat can only range in ...
    . r    can only range in ...

  - Let, as usual:  P = X (X^T X)^{-1} X^T

    . First order:
        E[yhat] = ... E[ P y ]     = P E[y]     = P X beta     = X beta
        E[r]    = ... E[ (I-P) y ] = (I-P) E[y] = (I-P) X beta = 0
      What assumptions are used? ... 1st order model
      What assumptions are not used? ... 2nd order model

    . Second order:
        V[yhat]   = ... V[P y]     = P V[y] P         = P (sigma^2 I) P         = sigma^2 P
        V[r]      = ... V[(I-P) y] = (I-P) V[y] (I-P) = (I-P) (sigma^2 I) (I-P) = sigma^2 (I-P)
        V[yhat,r] = ... V[ Py, (I-P)y ] = P V[y,y] (I-P) = P (sigma^2 I) (I-P)  = 0
      (These steps use that P is symmetric and idempotent:
       P^T = P and P P = P, hence also P (I-P) = 0.)
      What assumptions are used? ... 2nd order model
      What assumptions are not used? ... 1st order model

  - Special cases:
    . Diagonal elements:
        Var[yhati] = ... sigma^2 Pii
        Var[ri]    = ... sigma^2 (1-Pii)
    . Off-diagonal elements (i != j):
        Cov[yhati,yhatj] = ... sigma^2 Pij
        Cov[ri,rj]       = ... sigma^2 (-Pij)

  - What variability is being described in the above equations? ...

  - Implications:
    . Residuals are generally [...]correlated
      even when the errors are assumed ... uncorrelated.
    . Residuals are generally [...hetero?]scedastic
      even when the errors are assumed ... homoscedastic.
    . Pairs of distinct fits and corresponding pairs of residuals
      are generally correlated with the [...same/opposite] sign.
    . The variance of residuals is generally ...
    . As N --> Inf,
      what happens to (X^T X) ?               ... [converges/diverges...]
      what happens to (X^T X)^{-1} ?          ... [converges/diverges...]
      what happens to X^T X/(N-1) ?           ... converges to the population cov matrix of the predictors
                                                  (strictly: for centered predictors; otherwise
                                                   to the matrix of second moments)
      what happens to ( X^T X/(N-1) )^{-1} ?  ... converges to the inverse of the pop cov matrix of predictors

    . Q: What does this formula say about the precision of estimates
         of regression coefficients in relation to the distribution
         of the predictor vectors?
      A: We gain precision of estimation with more data
         at the rate 1/(N-1) ~ 1/N for dataset-to-dataset variances
         (hence ~ 1/sqrt(N) for dataset-to-dataset standard deviations).
      Proof, with xn = n-th row of X (a row vector):
         V[b] = sigma^2 (X^T X)^{-1}
              = sigma^2 (sum_{n=1...N} xn^T xn)^{-1}
              = sigma^2 (sum_{n=1...N} xn^T xn / (N-1))^{-1} / (N-1)
              ~ sigma^2 V[predictors]^{-1} / (N-1)

    . Memorize:
        X^T X diverges/explodes                      (Why? Summing up squares)
        Vhat(x) = (X^T X)/(N-1) converges to V[x].   (Why? LLN: mean of squares)
        ==> V[b] = sigma^2 Vhat(x)^{-1} / (N-1) shrinks at the rate ~1/N.

================================================================
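
* SKETCH: CHECKING THE FORMULAS BY SIMULATION

  - The identities V[yhat] = sigma^2 P, V[r] = sigma^2 (I-P), V[yhat,r] = 0
    and the ~1/N rate of V[b] can be checked numerically, in the same
    spirit as the 'Nsim' simulation for 'b'. What follows is only a
    minimal sketch, not the code used in class: the design (N, Nsim,
    sigma, beta, a standard normal predictor) and all object names
    (yhats, rs, err.*, rates) are assumptions made up for illustration.

```r
## Sketch under made-up assumptions: fixed design, Gaussian errors.
set.seed(1)
N <- 20; Nsim <- 20000; sigma <- 0.5
X <- cbind(1, rnorm(N))                  # intercept + one predictor, held fixed
beta <- c(1, 2)
P <- X %*% solve(t(X) %*% X) %*% t(X)    # hat matrix

## Simulate Nsim datasets; store fits and residuals row by row.
yhats <- matrix(NA, Nsim, N)
rs    <- matrix(NA, Nsim, N)
for (i in 1:Nsim) {
  y    <- X %*% beta + rnorm(N, sd = sigma)
  yhat <- P %*% y
  yhats[i, ] <- yhat
  rs[i, ]    <- y - yhat
}

## Largest entrywise gap between simulated and theoretical covariances.
err.yhat  <- max(abs(cov(yhats)     - sigma^2 * P))
err.r     <- max(abs(cov(rs)        - sigma^2 * (diag(N) - P)))
err.cross <- max(abs(cov(yhats, rs)))    # theory: zero matrix
print(c(err.yhat = err.yhat, err.r = err.r, err.cross = err.cross))

## Rate in N: slope variance sigma^2 [(X^T X)^{-1}]_22, raw and times (N-1).
Ns <- c(50, 500, 5000)
rates <- sapply(Ns, function(n) {
  Xn <- cbind(1, rnorm(n))
  Vb <- sigma^2 * solve(t(Xn) %*% Xn)
  c(raw = Vb[2, 2], scaled = (n - 1) * Vb[2, 2])
})
colnames(rates) <- Ns
print(rates)
```

  - Reading the output: the three 'err' numbers should be small
    relative to sigma^2 = 0.25 and shrink further as Nsim grows.
    In 'rates', the 'raw' row should fall roughly tenfold from column
    to column, while the 'scaled' row should stay near sigma^2 / V[x]
    (here V[x] = 1 by the assumed choice of predictor).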