Class 4
What you need to have learnt from Class 3.
- Understand, interpret, and distinguish the regression summaries (see the sketch below):
  - R-squared.
  - Root Mean Squared Error (RMSE).
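A minimal Python sketch of how both summaries are computed. The data here are fabricated for illustration; only the two formulas are the point.

```python
import numpy as np

# Hypothetical data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(10, 50, 40)
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=x.size)

# Least-squares line: np.polyfit returns [slope, intercept] for degree 1.
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# R-squared: the fraction of the variation in y explained by the line.
r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean())**2)

# RMSE: the typical size of a residual, on n - 2 degrees of freedom
# (two estimated parameters: intercept and slope).
rmse = np.sqrt(np.sum(residuals**2) / (x.size - 2))

print(f"R-squared = {r_squared:.3f}, RMSE = {rmse:.3f}")
```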
- The interpretation and the benefits of using a confidence interval for the slope (see the sketch below).
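Continuing the sketch above, the standard 95% confidence interval for the slope, slope ± t × SE, computed by hand on the same fabricated data:

```python
import numpy as np
from scipy import stats

# Same hypothetical data as the previous sketch.
rng = np.random.default_rng(0)
x = np.linspace(10, 50, 40)
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=x.size)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)
n = x.size

# Standard error of the slope: RMSE / sqrt(sum of squared x-deviations).
rmse = np.sqrt(np.sum(residuals**2) / (n - 2))
se_b1 = rmse / np.sqrt(np.sum((x - x.mean())**2))

# 95% interval: slope +/- t * SE, with n - 2 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"slope = {b1:.3f}, "
      f"95% CI = ({b1 - t_crit * se_b1:.3f}, {b1 + t_crit * se_b1:.3f})")
```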
- Two types of prediction, each with its own interval (range of feasible values); see the sketch below:
  - Estimate a typical observation (conf curve: fit).
  - Predict a single new observation (conf curve: indiv).
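The two interval types side by side, in a statsmodels sketch on the same fabricated data. "Conf curve: fit" and "conf curve: indiv" are read here as the labels for the mean-response and single-observation intervals, respectively.

```python
import numpy as np
import statsmodels.api as sm

# Same hypothetical data; the point is the two interval types, not the numbers.
rng = np.random.default_rng(0)
x = np.linspace(10, 50, 40)
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=x.size)
fit = sm.OLS(y, sm.add_constant(x)).fit()

pred = fit.get_prediction([[1.0, 30.0]])  # row is [intercept, x] at x = 30

# "conf curve: fit": confidence interval for the average y at x = 30.
print(pred.conf_int())           # narrower: only uncertainty about the line
# "conf curve: indiv": prediction interval for one new y at x = 30.
print(pred.conf_int(obs=True))   # wider: also adds the scatter about the line
```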
- The dangers in extrapolating outside the range of your data (three sources; see the sketch below):
  - The uncertainty in our estimate of the true regression line.
  - The uncertainty due to the inherent variation of the data about the line.
  - The uncertainty due to the fact that maybe we should not be using a line in the first place (model misspecification)!
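A small demonstration on the same fabricated data (x observed only between 10 and 50): the prediction interval widens as we move away from the data, reflecting the first two sources. The third source is worse, since no interval accounts for the line itself being wrong out there.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data again: x is observed only between 10 and 50.
rng = np.random.default_rng(0)
x = np.linspace(10, 50, 40)
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=x.size)
fit = sm.OLS(y, sm.add_constant(x)).fit()

# Prediction-interval width at the center of the data vs. far outside it.
for x0 in (30.0, 100.0):
    lo, hi = fit.get_prediction([[1.0, x0]]).conf_int(obs=True)[0]
    print(f"x = {x0:5.1f}: interval width = {hi - lo:.2f}")
# The interval widens as x0 leaves the data; no interval, however wide,
# protects against the line being the wrong model far from the data.
```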
New material for Class 4.
- Making more realistic models with many X-variables: multiple regression analysis.
  - The fundamental differences between simple and multiple regression.
    - The X-variables may be related to (correlated with) one another.
    - Consequence: looking at one X-variable at a time may present a misleading picture of the true relationship between Y and the X-variables.
    - The difference between marginal and partial slopes. Marginal: the slope of the regression line for one X-variable, ignoring the impact of all the others. Partial: the slope of the regression line for one X-variable, taking into account all the others. Recall the death penalty example from Stat 603. (See the sketch below.)
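A sketch of the marginal/partial distinction on fabricated data (not the Stat 603 example): two strongly correlated X-variables, where the one-at-a-time slope and the multiple-regression slope for x2 tell very different stories.

```python
import numpy as np
import statsmodels.api as sm

# Fabricated illustration: x1 and x2 are strongly correlated.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # x2 tracks x1
y = 1.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Marginal slope of x2: simple regression of y on x2 alone.
marginal = sm.OLS(y, sm.add_constant(x2)).fit().params[1]

# Partial slope of x2: multiple regression of y on both x1 and x2.
partial = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[2]

print(f"marginal slope of x2 = {marginal:.2f}")  # near 0: x2 proxies for x1
print(f"partial slope of x2  = {partial:.2f}")   # near -1: the true effect
```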
  - Key graphics for multiple regression.
    - The leverage plot. A "partial tool": the analog of the scatterplot in simple regression. It lets you look at a large multiple regression one variable at a time, in a legitimate way (it controls for the other X-variables). Potential uses (see the sketch below):
      - Spot leveraged points.
      - Identify large residuals.
      - Diagnose systematic lack of fit, i.e. spot curvature, which may suggest transformations.
      - Identify heteroscedasticity.
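A rough Python analog of the leverage plot: statsmodels' added-variable (partial regression) plots, which likewise show Y against one X after adjusting both for the other X-variables. Fabricated data as above.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Fabricated data, as in the marginal/partial sketch above.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
y = 1.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# One panel per X-variable: y against that X, with the other X-variables
# regressed out of both axes, so each panel's slope is the partial slope.
fig = sm.graphics.plot_partregress_grid(fit, exog_idx=[1, 2])
plt.show()
```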
    - The scatterplot matrix. A "marginal tool": presents all the two-variable (bivariate) relationships. Potential uses (see the sketch below):
      - Identify collinearity (correlation) between X-variables.
      - Identify marginal non-linear relationships between Y and the X-variables.
      - Determine which X-variables are marginally most significant (thin ellipses).
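The scatterplot matrix is a one-liner in pandas; on the fabricated data above, the x1-vs-x2 panel makes the collinearity obvious.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Fabricated data, as above.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
y = 1.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
pd.plotting.scatter_matrix(df, figsize=(6, 6))  # every pairwise view at once
plt.show()
```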
  - Facts to know (see the sketch below).
    - R-squared always increases (or at worst stays the same) as you add variables to the model.
    - RMSE does not have to decrease as variables are added to the model.
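Both facts in one fabricated run: add a pure-noise X-variable and R-squared creeps up anyway, while RMSE, which pays for the wasted degree of freedom, need not improve.

```python
import numpy as np
import statsmodels.api as sm

# Fabricated data plus a junk predictor that is unrelated to y by construction.
rng = np.random.default_rng(2)
n = 25
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
junk = rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(x)).fit()
big = sm.OLS(y, sm.add_constant(np.column_stack([x, junk]))).fit()

for name, fit in [("x only", small), ("x + junk", big)]:
    rmse = np.sqrt(fit.mse_resid)   # mse_resid divides by n - (params fitted)
    print(f"{name:8s}: R-squared = {fit.rsquared:.4f}, RMSE = {rmse:.4f}")
```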
- Model building philosophy in this course.
  - Keep it as simple as possible (parsimony).
  - Make sure everything is interpretable (especially any transformations).
  - After meeting the above criteria, go for the biggest R-squared, the smallest RMSE, and the model that makes the most sense (signs on the regression slopes).
Richard Waterman
Mon Sep 16 21:27:53 EDT 1996