================================================================

LECTURE 4:

* RECAP:

  - R intro:
    . Basic data types
    . Composite data types
    . Loops and conditionals
    . Character/string/text analysis
    . Plotting
    . Distributions and their operations
    . Additional comments:
      * Distinguish 'if()' and 'ifelse()'
      * Coercion of matrices/arrays to vectors:
        column-major storage (like Fortran, unlike C)

  - Fundamental intuitions about statistical inference
    . Statistical inference deals with variability in ... estimates
    . Of what nature is this variability? ... dataset to dataset
    . What is strange about the statistical variability
      * in regression models? ... conditioning on X
      * in time series models? ...

  - Beginnings of linear model theory:
    . Ingredients:
      * Design/predictor matrix: X of size N x (p+1)
      * Response vector: y of size N x 1
      * Squared norm/length: |y|^2 = y1^2 + y2^2 + ... + yN^2
    . LS minimization: | y - X b |^2 = min_b
    . Estimates:
        b = argmin_b | y - X b |^2 = (X^T X)^{-1} X^T y   (?)
    . Orthogonal projections: needed to generate yhat and r
        yhat = X (X^T X)^{-1} X^T y     (any problems with this?)
               ------- P --------
        r    = y - yhat = (I-P) y
    . Express these quantities in terms of the above:
        RSS  = |r|^2
        RMSE = sqrt( |r|^2 / (N-p-1) )

* ROADMAP:

  - Prerequisites for linear model analysis:
    . Expectations of random vectors
    . Variance/covariance matrices of random vectors
  - Linear model assumptions:
    . first and second order,
    . but no normality yet,
    . conditional on X
  - First and second order properties of coefficient estimates b
  - First and second order properties of yhat and r
    and their implications for
    . model diagnostics and
    . degrees of freedom
  - The role of the predictor variance/covariance matrix
    and the LLN for coefficient estimates

* VECTOR EXPECTATIONS AND VARIANCE/COVARIANCE MATRICES

  - In preparation for linear model analysis we define
    vector expectations and variance/covariance matrices.
    Confusing fact:
    . In multivariate analysis, these describe within-dataset properties.
    . In regression analysis, these describe dataset-to-dataset properties.

  - Assume: Y ~ Ix1 random vector
            Z ~ Jx1 random vector
    with joint distribution ("observed on the same objects")
    Ex.: Y = (Height, Weight, Age, ...) of a person
         Z = (English Grade, Math Grade, SAT score, ...) of same person
    The Y- and Z-variables will generally be 'associated'.

  - EXPECTATION:
    . Definition: E[Y] = vector of E[Yi] (i=1...I)
    . Estimation:
      In Multivariate Analysis one assumes N i.i.d. draws yn from Y,
      where yn ~ Ix1 = n'th sample.
      Estimate of E[Y]: ybar = (sum_n yn)/N ~ Ix1
      Unfortunately in regression we only see one draw y of Y.
      Do you see what's confusing? ...
    . Affine transformation: if A is mxI and c is mx1, both constant, then
        E[AY + c] = A E[Y] + c
      Proof: Write out the components of AY and apply E[].
      Q: Why do we need this?
      A: Because we will analyze objects such as
           b    = (X^T X)^{-1} X^T y
           yhat = P y
    . Algebra: Assuming Y and Z are both Ix1 and a, b are constant scalars, then
        E[a*Y + b*Z] = a*E[Y] + b*E[Z]
      This property is called linearity of E[...].

  - COVARIANCE:
    . Definition: V[Y,Z] = E[ (Y-E[Y])(Z-E[Z])^T ]
                  V[Y]   = V[Y,Y]
    . Components: V[Y,Z]_ij = ...
                  V[Y]_ij   = ...
                  V[Y]_ii   = ...
    . What are the ranks of (Y-E[Y])(Z-E[Z])^T and (Y-E[Y])(Y-E[Y])^T ?
      Hence a covariance matrix is an average of ...
    . Sizes: V[Y,Z] is IxJ, V[Y] is IxI
    . Generalizing univariate V[X] = E[X^2] - E[X]^2:
        V[Y,Z] = E[ Y Z^T ] - E[Y] E[Z]^T
        V[Y]   = E[ Y Y^T ] - E[Y] E[Y]^T
    . Algebra: Assuming Y, Z are both Ix1, U is Kx1, and a, b are constant scalars:
        V[Y,Z]         = V[Z,Y]^T
        V[a*Y + b*Z,U] = a*V[Y,U] + b*V[Z,U]
        V[a*Y + b*Z]   = a^2 V[Y] + b^2 V[Z] + a*b*(V[Y,Z] + V[Z,Y])
      These properties are called, in order:
        'symmetry', 'distributive law', 'binomial expansion'
      (The cross term reduces to 2*a*b*V[Y,Z] only when V[Y,Z] is symmetric,
       as in the univariate case.)
    . Affine transformation: if A is mxI, B is kxJ, c is Ix1, d is Jx1,
      all constant, then
        -----------------------------
        | V[AY,BZ]   = A V[Y,Z] B^T |
        |                           |
        | V[AY]      = A V[Y] A^T   |
        |                           |
        | V[Y+c,Z+d] = V[Y,Z]       |
        -----------------------------
      (Exercise: Specialize the middle to a linear form,
       in particular to A = t(c(1,1,...,1)).)
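The affine rules for E[...] and V[...] can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses R's sample mean and sample covariance (colMeans(), cov()) as stand-ins for E[...] and V[...], for which the same identities hold exactly, and the sizes N, I, m, the matrix A, and the shift cc are arbitrary illustrative choices, not values from the lecture.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only

N <- 200; I <- 3; m <- 2          # illustrative sizes (assumptions)
Y <- matrix(rnorm(N * I), N, I)   # row n holds the draw y_n^T; columns are components of Y
Y[, 2] <- Y[, 2] + 0.5 * Y[, 1]   # make two components correlated

A  <- matrix(c(1, 0, 2, -1, 1, 1), nrow = m, ncol = I)  # constant m x I matrix (hypothetical)
cc <- c(10, -5)                                          # constant m x 1 shift (hypothetical)

Z <- sweep(Y %*% t(A), 2, cc, "+")   # row n holds (A y_n + c)^T

# Sample analogues of E[AY + c] = A E[Y] + c and V[AY + c] = A V[Y] A^T
mean_lhs <- colMeans(Z)
mean_rhs <- drop(A %*% colMeans(Y) + cc)
cov_lhs  <- cov(Z)
cov_rhs  <- A %*% cov(Y) %*% t(A)

print(all.equal(mean_lhs, mean_rhs))
print(all.equal(cov_lhs, cov_rhs, check.attributes = FALSE))

# Linear form A = t(c(1,1,...,1)): the variance of the sum of the components
ones       <- t(rep(1, I))                          # 1 x I row vector
var_of_sum <- var(rowSums(Y))                       # sample variance of the component sums
sum_of_cov <- drop(ones %*% cov(Y) %*% t(ones))     # 1^T Vhat 1, i.e. the sum of all entries
print(all.equal(var_of_sum, sum_of_cov))
```

The shift cc changes the means but drops out of the covariance, which is the sample counterpart of V[Y+c,Z+d] = V[Y,Z].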
    . Multivariate Analysis: Estimation of V[Y]
      * Given N i.i.d. samples yn drawn from Y
          Vhat[Y] = sum_n ( (yn-ybar)(yn-ybar)^T )/(N-1)
      * Affine transformation: zn = A yn + c
          Vhat[z] = A Vhat[y] A^T
        (Specialize A to a linear form: A ~ 1xI, i.e. A = a^T for an Ix1 vector a)

================================================================

* LINEAR MODELS:

  - In regression we observe only one single y-vector,
    but we will make assumptions about its expectation and covariance
    and see what follows from them.
    [Whether the assumptions are realistic for a given dataset
     needs to be checked with diagnostics.]

  - Linear model assumptions:
    . Convention: y is Nx1 (not Ix1)
      Compare asymptotics in
        Multivariate Analysis: y1,...,yN are Ix1 and N-->Inf
        Regression Analysis:   y is Nx1, and N-->Inf
    . Model in vector/matrix language:
        y = beta0 + beta1*x1 + beta2*x2 + ... + betap*xp + eps
          = X beta + eps
      where beta is (p+1)x1,
            y, xj, eps are Nx1,
            X = (1,x1,x2,...,xp) is Nx(p+1), with 1 the Nx1 vector of ones
      Note:
      + The response vector y is a random vector -- in spite of lower-case notation!
      + Predictors xj are NOT random but fixed and known
        (set or observed but conditioned on).
      + The error vector 'eps' is a random vector.
    . Stochastic assumptions:
        E[eps] = 0 :          On average [across worlds, universes, datasets]
                              the model is "correct" ["model is unbiased"].
        V[eps] = sigma^2 * I: Uncorrelated errors, constant variances
                              ("homoscedastic errors")
    . Equivalently:
        E[y] = X beta
        V[y] = sigma^2 * I
      [Distinguish: "unbiased estimator" vs "unbiased model"]
    . Note conditionality on X: the assumption of fixed X.
      A simulation would only sample eps-vectors,
      leaving X fixed, varying only y.
      (Why is this weird? In observational data...)
    . Q: What variability do E[...] and V[...] refer to?
      A: ... holding ... fixed

================================================================
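The simulation described above (sample only eps-vectors, keep X fixed, let only y vary) can be sketched in R as follows. Everything specific here is an assumption made for illustration: the sizes N and p, the coefficient vector beta, the value of sigma, the helper name sim_y, and the choice of centered uniform errors, which have mean 0 and variance sigma^2 but are not normal, in line with "no normality yet". Under E[eps] = 0 and V[eps] = sigma^2 * I one would expect the averages across simulated datasets to be close to X beta for y, to sigma^2 for the componentwise variances of y, and to beta for b.

```r
set.seed(4)                          # arbitrary seed, for reproducibility only

N <- 50; p <- 2                      # illustrative sizes (assumptions)
X <- cbind(1, rnorm(N), rnorm(N))    # design matrix, N x (p+1); drawn once, then held fixed
beta  <- c(1, 2, -1)                 # hypothetical true coefficients, (p+1) x 1
sigma <- 1.5                         # hypothetical error standard deviation

# One simulated dataset: only eps (and hence y) is random; X stays fixed.
# eps is uniform on [-sqrt(3), sqrt(3)] * sigma: mean 0, variance sigma^2, not normal.
sim_y <- function() {
  eps <- sigma * runif(N, -sqrt(3), sqrt(3))
  drop(X %*% beta + eps)
}

nsim <- 2000
Ymat <- replicate(nsim, sim_y())                 # N x nsim; column s is dataset s

# b = (X^T X)^{-1} X^T y for every dataset at once, via the normal equations
B <- solve(crossprod(X), crossprod(X, Ymat))     # (p+1) x nsim; column s is b for dataset s

y_mean <- rowMeans(Ymat)          # Monte Carlo estimate of E[y], length N
y_var  <- apply(Ymat, 1, var)     # Monte Carlo estimate of the diagonal of V[y]
b_mean <- rowMeans(B)             # average of b across datasets

print(max(abs(y_mean - drop(X %*% beta))))       # small compared with sigma
print(c(mean_y_var = mean(y_var), sigma2 = sigma^2))
print(round(rbind(beta = beta, b_mean = b_mean), 3))
```

Note that X is generated before the loop and never redrawn, so all dataset-to-dataset variability comes from eps alone.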