Class 5. Collinearity and hypothesis testing
What you need to have learnt from Class 4.
- What is multiple regression?
  - The model:
  - The picture:
- The interpretation of the partial slopes in multiple
  regression. Example: if we have two X variables, X1 and X2, then the
  partial slope of X1 is interpreted as "the change in Y for every
  one-unit change in X1, holding X2 constant".
- The essential difference between multiple regression and simple
  (one X) regression: in multiple regression the X's may be
  correlated, so the partial slope and the marginal slope of the same
  variable can differ and lead to different decisions.
- What makes a good model (it can depend on your
  objectives).
- What can be learnt from a leverage plot.
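The gap between marginal and partial slopes can be seen in a small simulation. This is only a sketch on made-up data, not an example from the course: the helper names (`cov`, `marginal_slope`, `partial_slopes`), the sample size and the coefficients 2 and -1 are all assumptions chosen for illustration. X2 is built to be highly correlated with X1, so the slope of X2 on its own has the opposite sign to its slope once X1 is held constant.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def marginal_slope(x, y):
    """Slope from the simple regression of y on x alone."""
    return cov(x, y) / cov(x, x)

def partial_slopes(x1, x2, y):
    """Slopes from the multiple regression of y on x1 and x2 together
    (two-predictor normal equations solved in closed form)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Simulated (hypothetical) data: x2 is x1 plus a little noise.
rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.5) for a in x1]
# True partial slopes are 2 for x1 and -1 for x2.
y = [2 * a - b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

b1, b2 = partial_slopes(x1, x2, y)
m2 = marginal_slope(x2, y)
print(f"marginal slope of X2 (X1 ignored):      {m2:+.2f}")
print(f"partial slope of X2 (X1 held constant): {b2:+.2f}")
```

The marginal slope of X2 comes out positive because X2 mostly rides along with X1, while the partial slope recovers the negative effect of X2 itself.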
New material for Class 5. Collinearity and Hypothesis testing
- Collinearity
  - Definition: correlation between the X-variables.
  - Consequence: it is difficult to establish which of the
    X-variables are most important (they all look the same). Visually,
    the regression plane becomes very unstable (sausage in space, legs
    on the table).
  - Diagnostics:
    - Thin ellipses in the scatterplot matrix (high correlation).
    - Counter-intuitive signs on the slopes.
    - Large standard errors on the slopes (there is
      little information about them).
    - Collapsed leverage plots.
    - High variance inflation factors (VIFs). A VIF measures the
      increase in the variance of a slope estimate due to collinearity.
    - Insignificant t-statistics even though the overall regression is
      significant (ANOVA F-test).
  - Fix-ups:
    - Ignore it. OK if the sole objective is prediction within the
      range of the data.
    - Combine collinear variables in a meaningful way.
    - Delete variables. OK if they are extremely correlated.
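To get a feel for the size of a variance inflation factor, here is a minimal sketch using the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one X-variable on the other X-variables. The function name `vif` is hypothetical. With exactly two X-variables, that R² is just the squared correlation between them, which is the case tabulated here.

```python
def vif(r_squared):
    """Variance inflation factor, given the R^2 from regressing
    one X-variable on all the other X-variables."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

# Two X-variables: R^2 equals the squared correlation between them.
for r in (0.0, 0.5, 0.9, 0.99):
    v = vif(r ** 2)
    print(f"corr = {r:4.2f}   VIF = {v:6.1f}   SE inflated by {v ** 0.5:4.1f}x")
```

The square root of the VIF is the factor by which the standard error of the slope grows, which ties this diagnostic to the "large standard errors" item in the list above.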
- Hypothesis testing in multiple regression. Three flavors. They all test
  whether slopes are equal to zero or not. They differ
  in the number of slopes we are looking at simultaneously.
  - Test a single regression coefficient (slope).
    - Look for the t-statistic.
    - The hypothesis test in English: does this variable add any
      explanatory power to the model that already includes all the
      other X-variables?
    - Small p-value says YES, big p-value says NO.
  - Test all the regression coefficients at once.
    - Look for the F-statistic in the ANOVA table.
    - The hypothesis test in English: do any of the X-variables in
      the model explain any of the variability in the Y-variable?
    - Small p-value says YES, big p-value says NO.
    - Note that the test does not identify which variables are
      important.
    - If you answer this question as NO then it's back to the
      drawing board: none of your variables are any good!
  - Test a subset of the regression coefficients (more than one,
    but not all of them): the partial F-test.
    - It's no use looking for this one on the output. You have to
      calculate it yourself. See the formula on p. 169 of the Bulk Pack.
    - The test in English: do any of the X-variables in the
      subset under consideration explain any of the variability in
      Y?
    - We use a rule of thumb for this one. If the partial F is
      less than 1 then you can be sure that the answer is NO. If it
      is greater than 4 then you can be sure that the answer is
      YES. If it is between 1 and 4 then we will let it be a
      judgment call.
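Since the partial F has to be calculated by hand, a short sketch may help. It uses the standard textbook form of the statistic, which compares the residual sum of squares (SSE) of the full model with that of the reduced model that drops the subset; the Bulk Pack formula may be written in different notation (for example in terms of R²). The function names and the numbers in the example are hypothetical, chosen only to show the arithmetic.

```python
def partial_f(sse_reduced, sse_full, n_tested, n_obs, n_predictors_full):
    """Partial F-statistic for dropping `n_tested` X-variables.

    sse_reduced: residual sum of squares without the subset
    sse_full:    residual sum of squares with all X-variables
    The full model is assumed to include an intercept.
    """
    df_resid_full = n_obs - n_predictors_full - 1
    numerator = (sse_reduced - sse_full) / n_tested
    denominator = sse_full / df_resid_full
    return numerator / denominator

def rule_of_thumb(f):
    """The class rule of thumb: below 1 is NO, above 4 is YES."""
    if f < 1:
        return "NO"
    if f > 4:
        return "YES"
    return "judgment call"

# Made-up numbers: 50 observations, 5 X-variables in the full model,
# testing a subset of 2 of them.
f = partial_f(sse_reduced=120.0, sse_full=100.0,
              n_tested=2, n_obs=50, n_predictors_full=5)
print(f"partial F = {f:.2f} -> {rule_of_thumb(f)}")
```

In this made-up case dropping the two variables raises the SSE from 100 to 120, giving a partial F of 4.4, so the rule of thumb says the subset does explain some of the variability in Y.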
- Must be able to answer this question: "why not do a whole
  bunch of t-tests rather than one partial F-test?" Answer: the
  partial F-test is an honest simultaneous test (see
  p. 149 of the Bulk Pack).
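One common way to see the problem with a bunch of separate t-tests (which may or may not be the argument given in the Bulk Pack) is to look at the chance of at least one false alarm. The sketch below assumes the tests are independent, which is a simplification: t-tests on slopes from the same regression are generally correlated. The function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent
    tests, each run at level alpha, when every slope is truly zero."""
    return 1 - (1 - alpha) ** m

for m in (1, 2, 5, 10):
    print(f"{m:2d} t-tests at 5%: "
          f"P(at least one false positive) = {familywise_error(0.05, m):.3f}")
```

Under this assumption, five t-tests at the 5% level already carry roughly a 23% chance of a spurious "significant" slope, whereas a single partial F-test keeps the overall error rate at the stated level.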
Richard Waterman
Wed Sep 18 21:45:23 EDT 1996