Class 4

What you need to have learnt from Class 3.

* Understand, interpret and distinguish the regression summaries (see the sketch after this list):
  * R-squared.
  * Root Mean Squared Error (RMSE).

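As a refresher, here is a minimal Python sketch of where these two summaries come from in a simple regression fit. It assumes the numpy and statsmodels libraries and uses made-up data; any real data set would do.

    import numpy as np
    import statsmodels.api as sm

    # Made-up data: Y roughly linear in X plus noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 50)
    y = 3 + 2 * x + rng.normal(0, 1.5, 50)

    results = sm.OLS(y, sm.add_constant(x)).fit()

    # R-squared: the fraction of the variation in Y explained by the regression.
    print("R-squared:", results.rsquared)

    # RMSE: the typical size of a residual, in the units of Y.
    print("RMSE:", np.sqrt(results.mse_resid))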
* The interpretation and the benefits of using a confidence interval for the slope.
* Two types of prediction, each with its own interval (range of feasible values), illustrated together with the slope interval in the sketch after this list:
  * Estimate a typical observation (conf curve: fit).
  * Predict a single new observation (conf curve: indiv).

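A sketch of these ideas, again assuming statsmodels and made-up data: the confidence interval for the slope, then the two intervals at a new X value (the narrower one for a typical observation, the wider one for a single new observation).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 50)
    y = 3 + 2 * x + rng.normal(0, 1.5, 50)
    results = sm.OLS(y, sm.add_constant(x)).fit()

    # 95% confidence interval for the intercept and the slope.
    print(results.conf_int(alpha=0.05))

    # Intervals at X = 5 (the row is [intercept term, X value]).
    pred = results.get_prediction(np.array([[1.0, 5.0]])).summary_frame(alpha=0.05)

    # mean_ci_*: interval for a typical observation (conf curve: fit).
    # obs_ci_*:  interval for a single new observation (conf curve: indiv).
    print(pred[["mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])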
* The dangers in extrapolating outside the range of your data (three sources; the first two are illustrated in the sketch after this list):
  * The uncertainty in our estimate of the true regression line.
  * The uncertainty due to the inherent variation of the data about the line.
  * The uncertainty due to the possibility that we should not be using a line in the first place (model misspecification)!

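The first two sources show up directly in the width of the intervals, which balloon as you move away from the data; no interval can protect you from the third. A sketch with the same style of made-up data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, 50)                 # the data live between 0 and 10
    y = 3 + 2 * x + rng.normal(0, 1.5, 50)
    results = sm.OLS(y, sm.add_constant(x)).fit()

    # Prediction interval width at the centre of the data and far outside it.
    for x_new in (5.0, 50.0):
        frame = results.get_prediction(np.array([[1.0, x_new]])).summary_frame(alpha=0.05)
        width = frame["obs_ci_upper"].iloc[0] - frame["obs_ci_lower"].iloc[0]
        print(f"x = {x_new}: prediction interval width = {width:.2f}")

    # The widening reflects sources 1 and 2; nothing in the output warns you
    # if a straight line is the wrong model out there (source 3).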
New material for Class 4.

* Making more realistic models with many X-variables: multiple regression analysis.
* The fundamental differences between simple and multiple regression:
  * The X-variables may be related to (correlated with) one another.
  * Consequence: looking at one X-variable at a time may present a misleading picture of the true relationship between Y and the X-variables.
  * The difference between marginal and partial slopes. Marginal: the slope of the regression line for one X-variable, ignoring the impact of all the others. Partial: the slope of the regression line for one X-variable, taking into account all the others. Recall the death penalty example from Stat603. A sketch contrasting the two follows this list.

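A sketch of the contrast, assuming numpy and statsmodels, with made-up data in which the two X-variables are deliberately correlated:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 200
    x1 = rng.normal(0, 1, n)
    x2 = 0.8 * x1 + rng.normal(0, 0.6, n)      # x2 is correlated with x1
    y = 1 + 2 * x1 - 3 * x2 + rng.normal(0, 1, n)

    # Marginal slope: regress Y on x1 alone, ignoring x2.
    marginal = sm.OLS(y, sm.add_constant(x1)).fit()
    print("marginal slope for x1:", marginal.params[1])

    # Partial slope: regress Y on both, so the x1 slope adjusts for x2.
    partial = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print("partial slope for x1: ", partial.params[1])

With correlated X-variables the two slopes can differ sharply, here even in sign, which is exactly the marginal-versus-partial distinction above.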
* Key graphics for multiple regression:
  * The leverage plot. A "partial tool": the analog of the scatterplot for simple regression. It lets you look at a large multiple regression one variable at a time, in a legitimate way (it controls for the other X-variables). A sketch follows this list. Potential uses:
    * Spot leveraged points.
    * Identify large residuals.
    * Diagnose systematic lack of fit, i.e. spot curvature that may suggest transformations.
    * Identify heteroscedasticity.

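JMP's leverage plot corresponds closely to what statsmodels calls an added-variable (partial regression) plot; a sketch, reusing the style of made-up data from the previous sketch:

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x1 = rng.normal(0, 1, 200)
    x2 = 0.8 * x1 + rng.normal(0, 0.6, 200)
    y = 1 + 2 * x1 - 3 * x2 + rng.normal(0, 1, 200)
    results = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

    # One added-variable (leverage) plot per X-variable: residuals of Y given
    # the other X's against residuals of this X given the other X's.
    # Columns 1 and 2 are x1 and x2 (column 0 is the intercept).
    fig = sm.graphics.plot_partregress_grid(results, exog_idx=[1, 2])
    fig.tight_layout()
    plt.show()

Leveraged points, large residuals, curvature and unequal spread all show up in these panels much as they would in a simple-regression scatterplot.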
  * The scatterplot matrix. A "marginal tool": presents all the two-variable (bivariate) relationships. A sketch follows this list. Potential uses:
    * Identify collinearity (correlation) between X-variables.
    * Identify marginal non-linear relationships between Y and the X-variables.
    * Determine which X-variables are marginally most significant (thin ellipses).

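A sketch of the marginal tool, assuming pandas and matplotlib, with the same style of made-up data:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(5)
    x1 = rng.normal(0, 1, 200)
    x2 = 0.8 * x1 + rng.normal(0, 0.6, 200)
    y = 1 + 2 * x1 - 3 * x2 + rng.normal(0, 1, 200)
    df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

    # Every pairwise (bivariate) scatterplot at once ...
    pd.plotting.scatter_matrix(df, figsize=(6, 6))
    plt.show()

    # ... and the matching correlations; a large |r| between X-variables
    # is the numerical face of collinearity.
    print(df.corr().round(2))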
* Facts to know (demonstrated in the sketch after this list):
  * R-squared never decreases (and essentially always increases) as you add variables to the model.
  * RMSE does not have to decrease as variables are added to the model; it can go up.

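Both facts are easy to demonstrate, for example with statsmodels and made-up data: adding a pure-noise X-variable cannot lower R-squared, but it can raise RMSE.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 30
    x = rng.uniform(0, 10, n)
    junk = rng.normal(0, 1, n)                 # pure noise, unrelated to Y
    y = 3 + 2 * x + rng.normal(0, 1.5, n)

    small = sm.OLS(y, sm.add_constant(x)).fit()
    big = sm.OLS(y, sm.add_constant(np.column_stack([x, junk]))).fit()

    print("R-squared:", small.rsquared, "->", big.rsquared)
    print("RMSE:     ", np.sqrt(small.mse_resid), "->", np.sqrt(big.mse_resid))

    # R-squared cannot go down because the bigger model can always reproduce the
    # smaller one. RMSE divides by n minus the number of coefficients, so a
    # useless variable can push it up even though R-squared crept up.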
* Model building philosophy in this course:
  * Keep it as simple as possible (parsimony).
  * Make sure everything is interpretable (especially any transformations).
  * After meeting the above criteria, go for the biggest R-squared, the smallest RMSE, and the model that makes the most sense (sensible signs on the regression slopes).



Richard Waterman
Mon Sep 16 21:27:53 EDT 1996