Class 5. Collinearity and hypothesis testing

What you need to have learnt from Class 4.

*
What is multiple regression?
*
The model:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon

*
The picture: Regression plane

*
The interpretation of the partial slopes in multiple regression. Example: if we have two X-variables X1 and X2, then the partial slope of X1 is interpreted as ``the change in Y for every one-unit change in X1, holding X2 constant''. (The sketch after this recap shows a fitted two-X model.)
*
The essential difference between multiple regression and simple (one-X) regression: in multiple regression the X's may be correlated, which means that looking at partial slopes versus marginal slopes can lead to different decisions.
*
What makes a good model (it can depend on your objectives).
*
What can be learnt from a leverage plot.
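
Below is a minimal Python sketch of fitting such a two-X model with statsmodels; it is not from the class, and the variable names (x1, x2, y) and the simulated data are made up purely for illustration. The later sketches in these notes reuse this data frame and fitted model.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # x1 and x2 are correlated
    y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)
    data = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

    fit = smf.ols("y ~ x1 + x2", data=data).fit()
    print(fit.params)    # partial slopes: each holds the other X constant
    print(fit.summary()) # t-statistics, overall F, R-squared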

New material for Class 5. Collinearity and hypothesis testing

*
Collinearity
*
Definition: correlation between the X-variables.
*
Consequence: it is difficult to establish which of the X-variables are most important (they all look the same). Visually the regression plane becomes very unstable (sausage in space, legs on the table).
*
Diagnostics:
*
Thin ellipses in the scatterplot matrix. (High correlation.)
*
Counter-intuitive signs on the slopes.
*
Large standard errors on the slopes (the data carry little information about them).
*
Collapsed leverage plots.
*
High Variance Inflation Factors (VIFs). The VIF measures the increase in the variance of a slope estimate due to collinearity. (A sketch for computing VIFs follows this list.)
*
Insignificant t-statistics even though the overall regression is significant (ANOVA F-test).
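
A sketch of the VIF diagnostic, continuing the simulated example above. Each VIF_j equals 1/(1 - R_j^2), where R_j^2 comes from regressing X_j on the other X's; a common rule of thumb flags VIFs above 10 as serious.

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X = sm.add_constant(data[["x1", "x2"]])
    for i, name in enumerate(X.columns):
        if name != "const":                     # skip the intercept column
            print(name, variance_inflation_factor(X.values, i))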

*
Fix ups:
*
Ignore it. OK if the sole objective is prediction within the range of the data.
*
Combine collinear variables in a meaningful way (see the sketch after this list).
*
Delete variables. OK if extremely correlated.
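
A sketch of the second fix-up, continuing the simulated example. The combination used here (the average of the two X's) is hypothetical; whether an average, a sum, or a difference is the meaningful combination depends on what the variables measure.

    data["x_avg"] = (data["x1"] + data["x2"]) / 2   # hypothetical combination
    combined = smf.ols("y ~ x_avg", data=data).fit()
    print(combined.summary())    # one stable slope replaces two unstable ones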

*
Hypothesis testing in multiple regression. Three flavors. They all test whether slopes are equal to zero or not. They differ in the number of slopes we are looking at simultaneously.
*
Test a single regression coefficient (slope).
*
Look for the t-statistic.
*
The hypothesis test in English: does this variable add any explanatory power to the model that already includes all the other X-variables?
*
Small p-value says YES, big p-value says NO.
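
Continuing the simulated example, the single-coefficient t-test can be read straight off the fitted model; each slope's test takes all the other X's as already in the model.

    print(fit.tvalues["x1"], fit.pvalues["x1"])   # H0: the slope of x1 is zero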

*
Test all the regression coefficients at once.
*
Look for the F-statistic in the ANOVA table.
*
The hypothesis test in English: do any of the X-variables in the model explain any of the variability in the Y-variable?
*
Small p-value says YES, big p-value says NO.
*
Note that the test does not identify which variables are important.
*
If the answer to this question is NO then it's back to the drawing board: none of your variables are any good!
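
Continuing the simulated example, the overall F-statistic and its p-value are also on the standard output.

    print(fit.fvalue, fit.f_pvalue)   # H0: every slope is zero at once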

*
Test a subset of the regression coefficients (more than one, but not all of them - the Partial F-test).
*
This one is not on the standard output; you have to calculate it yourself. See the formula on p. 169 of the Bulk Pack (a sketch of the calculation follows this block).
*
The test in English: do any of the X-variables in the subset under consideration explain any of the variability in Y?
*
We use a rule of thumb for this one. If the partial F is less than 1 then you can be sure that the answer is NO. If it is greater than 4 then you can be sure that the answer is YES. If it is between 1 and 4 then we will let it be a judgment call.
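
A sketch of the calculation, continuing the simulated example: fit the full model and the reduced model without the subset under test, then compare the nested fits. This is the standard nested-model partial F; check it against the Bulk Pack formula on p. 169 for the class's exact notation.

    import statsmodels.api as sm

    full = smf.ols("y ~ x1 + x2", data=data).fit()
    reduced = smf.ols("y ~ x1", data=data).fit()   # the subset under test is {x2}
    print(sm.stats.anova_lm(reduced, full))        # partial F and its p-value

    # The same statistic by hand:
    # F = [(SSE_reduced - SSE_full) / q] / [SSE_full / df_full]
    q = full.df_model - reduced.df_model           # number of slopes tested
    print(((reduced.ssr - full.ssr) / q) / (full.ssr / full.df_resid))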

*
Must be able to answer this question: ``why not do a whole bunch of t-tests rather than one partial F-test?'' Answer: the partial F-test is an honest simultaneous test (see p. 149 of the Bulk Pack; the simulation sketch below illustrates the point).
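
A rough Monte Carlo sketch of what ``honest'' means; everything here is simulated and made up for illustration. With many irrelevant X's, the rule ``reject if any t-test has p < .05'' fires far more often than 5% of the time, while a single simultaneous F-test on the same slopes holds its 5% level. (The subset here is all the slopes, so the partial F coincides with the overall F.)

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, k, reps = 50, 10, 2000
    any_t = f_rej = 0
    for _ in range(reps):
        X = sm.add_constant(rng.normal(size=(n, k)))
        y = rng.normal(size=n)                    # H0 is true: no X matters
        res = sm.OLS(y, X).fit()
        any_t += (res.pvalues[1:] < 0.05).any()   # some t-test fires by luck
        f_rej += res.f_pvalue < 0.05              # one honest simultaneous test
    print("any t significant:", any_t / reps)     # roughly 0.4
    print("F-test rejects:   ", f_rej / reps)     # roughly 0.05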



Richard Waterman
Wed Sep 18 21:45:23 EDT 1996