================================================================ LECTURE 5: * ORG: - Q: Does Lecture 6 need to be video-recorded? YES Reason: Special holidays next week - Lecture 4 posted - HW honor code posted: code of conduct for scientific writing - HW 1: Solutions posted Do not consult if you received an extension (honor code). Most everybody seems to have done well. - HW 2: due Fri noon; questions? Ask for an extension if necessary. This HW is the best litmus test for this course. - HW 3: Problems with 'dict' anyone? - HW 4 posted: Linear algebra and linear inference - HW 5 will most likely be another exercise in text analysis. - HW 6 will most likely be another exercise in linear statistical methods. - Q: Did anyone have a problem installing R-2.9.2? My Rgui blows up, even though R works under Emacs. I also had three warnings when installing R. * RECAP: - Multivariate Expectation Vectors and Variance/Covariance Matrices: . Given: Random vectors Y (px1) and Z (qx1) with joint distribution Example: (Y,Z) with a joint multivariate normal distribution . Definition: E[Y] = average of Y over infinitely many i.i.d. draws (LLN) technically: prob-weighted average/integral of Y(omega) . Properties: E[a*Y+b*Z] = ... linearity E[c] = ... c . Definition: V[Y,Z] = ave of (Y-E[Y])(Z-E[Z])^T over inf. iid draws of (Y,Z) = prob-weighted ave/integral of (Y-E[Y])(Z-E[Z])^T for each matrix component (i,j): (Yi-E[Yi])(Zj-E[Zj]) Special case: V[Y] = V[Y,Y] . Properties: V[Y,Z] = V[Z,Y]... V[a*Y1+b*Y2,Z] = ... V[c] = ... V[Y+c,Z+d] = ... V[aY+bZ] = ... V[AY,BZ] = A V[Y,Z] B^T... (A, B commensurate fixed coeff matrices) V[AY] = A V[Y] A^T ... - Linear Model Assumptions: . Ingredients: y = ... Nx1 random X = ... Nx(p+1) considred fixed/... beta = ... fixed unknown eps = ... . 'Trick' questions: beta = vector of LS estimates, Y/N? eps = residual vector, Y/N? . Assumptions: Structure: y = ... X beta + eps 1st order moments: E[eps] = ... 0 (Nx1) E[y] = ... X beta (Nx1) 2nd order moments: V[eps] = ... sigma^2 I V[y] = ... sigma^2 I . What do the following mean? What kind of means/vars/covs are these? E[eps] = ... V[eps] = ... * ROADMAP: Analysis of linear models - 1st and 2nd order properties of . LS estimate 'b' of 'beta' . 'yhat' and 'r' - Sampling properties of the predictor matrix X (even though we condition on X). ================================================================ * LINEAR MODEL ANALYSIS, STEP 1: VARIABILITY OF b - Reminder: The LS Estimate is b = (X^T X)^{-1} X^T y - Type of variability in b: ... b = (p+1)x1 random vector varying from response vector y to response vector y... - Making the linear model assumptions, it follows: E[b] = ... beta Actually, which part of the assumptions was used? ... 1st order => Unbiased estimator - Making the linear model assumptions, it follows: V[b] = ... sigma^2 (X^T X)^{-1} Actually, which part of the assumptions was used? ... 2nd order NOTE: V[b] is conditional on X (if X is random...) - What variability does V[b] describe? ... - What is the interpretation of the diagonal elements in V[b]? ... - Could one hope to estimate V[b]? ... - What would such estimates be good for? ... - Simulation example: simple linear regression with intercept ==> X is a 1-predictor matrix, hence of size Nx2. ## R: N <- 100 # Picking a sample size X <- cbind(x0=rep(1,N), x1=runif(N)) # Making up a predictor matrix sigma <- 1 # Assuming a 'true' error variance beta <- c(b0=1, b1=2) # Assuming a 'true' coeff vector y <- rnorm(N, m=X%*%beta, s=sigma) # Simulating a response vector b <- solve(t(X)%*%X)%*%t(X)%*%y; b ## Simulate and animate parallel universes/possible worlds/...: for(i in 1:1000) { y <- rnorm(N, m=X%*%beta, s=sigma) # Simulating a response vector b <- solve(t(X)%*%X)%*%t(X)%*%y; b plot(x=X[,2], y=y, pch=16, col="gray20", ylim=c(-2,6)) abline(b) for(j in 1:100000) {} # Crude way to slow down the animation } ## ==> Dataset-to-dataset (sampling) variability ## Simulate many response vectors (possible worlds, parallel universes,...): Nsim <- 10000 bs <- matrix(NA, nrow=Nsim, ncol=length(beta)) # Storage for coeff estimates colnames(bs) <- paste("b",0:(length(beta)-1),sep="") # Cosmetics for(isim in 1:Nsim) { y <- rnorm(N, m=X%*%beta, s=sigma) # Simulating a response vector bs[isim,] <- solve(t(X)%*%X)%*%t(X)%*%y if(isim%%100==0) cat(isim,"...") } ## Plot the 'sampling' or 'dataset-to-dataset' variability of b: plot(bs, pch=16, cex=.5) ## ==> Correlation between b0 and b1 !!! ## Compare: cov(bs) # The matrix of multivariate variance/covariance estimates of 'bs' solve(t(X)%*%X)*sigma^2 # What's this again? ## Some intuitions: draw some lines according to 'bs' and compare to plot of 'bs' windows(height=5, width=10); par(mfrow=c(1,2), mgp=c(1.8,.5,0), mar=c(3,3,1,1)) plot(cbind(x=X[,2],y=y), pch=16, cex=1, col="gray", ylim=c(-2,6)) # last y for(i in 1:100) abline(bs[i,]) # lines from first 100 simulations plot(bs, pch=16, cex=.5) # second plot: shows negative correlation ## How does the negative correlation on the right ## relate to the lines on the left? ## ... ================================================================