Class 5. Collinearity and hypothesis testing

What you need to have learnt from Class 4.

*
What is multiple regression?
*
The model:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon

*
The picture: Regression plane

*
The interpretation of the partial slopes in multiple regression. Example: if we have two X-variables X1 and X2, then the partial slope of X1 is interpreted as ``the change in Y for every one-unit change in X1, holding X2 constant''. (The sketch after this recap shows a fitted two-X model.)
*
The essential difference between multiple regression and simple (one-X) regression: in multiple regression the X's may be correlated, which means that looking at partial slopes versus marginal slopes can lead to different decisions.
*
What makes a good model (it can depend on your objectives).
*
What can be learnt from a leverage plot.
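
Below is a minimal Python sketch of fitting such a two-X model with statsmodels; it is not from the class, and the variable names (x1, x2, y) and the simulated data are made up purely for illustration. The later sketches in these notes reuse this data frame and fitted model.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # x1 and x2 are correlated
    y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)
    data = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

    fit = smf.ols("y ~ x1 + x2", data=data).fit()
    print(fit.params)    # partial slopes: each holds the other X constant
    print(fit.summary()) # t-statistics, overall F, R-squared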

New material for Class 5. Collinearity and hypothesis testing

*
Collinearity
*
Definition: correlation between the X-variables.
*
Consequence: it is difficult to establish which of the X-variables are most important (they all look the same). Visually the regression plane becomes very unstable (sausage in space, legs on the table).
*
Diagnostics:
*
Thin ellipses in the scatterplot matrix. (High correlation.)
*
Counter-intuitive signs on the slopes.
*
Large standard errors on the slopes (the data carry little information about them).
*
Collapsed leverage plots.
*
High Variance Inflation Factors (VIFs). The VIF measures the increase in the variance of a slope estimate due to collinearity. (A sketch for computing VIFs follows this list.)
*
Insignificant t-statistics even though the overall regression is significant (ANOVA F-test).
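
A sketch of the VIF diagnostic, continuing the simulated example above. Each VIF_j equals 1/(1 - R_j^2), where R_j^2 comes from regressing X_j on the other X's; a common rule of thumb flags VIFs above 10 as serious.

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X = sm.add_constant(data[["x1", "x2"]])
    for i, name in enumerate(X.columns):
        if name != "const":                     # skip the intercept column
            print(name, variance_inflation_factor(X.values, i))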

*
Fix ups:
*
Ignore it. OK if the sole objective is prediction within the range of the data.
*
Combine collinear variables in a meaningful way (see the sketch after this list).
*
Delete variables. OK if extremely correlated.
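
A sketch of the second fix-up, continuing the simulated example. The combination used here (the average of the two X's) is hypothetical; whether an average, a sum, or a difference is the meaningful combination depends on what the variables measure.

    data["x_avg"] = (data["x1"] + data["x2"]) / 2   # hypothetical combination
    combined = smf.ols("y ~ x_avg", data=data).fit()
    print(combined.summary())    # one stable slope replaces two unstable ones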

*
Hypothesis testing in multiple regression. Three flavors. They all test whether slopes are equal to zero or not. They differ in the number of slopes we are looking at simultaneously.
*
Test a single regression coefficient (slope).
*
Look for the t-statistic.
*
The hypothesis test in English: does this variable add any explanatory power to the model that already includes all the other X-variables?
*
Small p-value says YES, big p-value says NO.
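
Continuing the simulated example, the single-coefficient t-test can be read straight off the fitted model; each slope's test takes all the other X's as already in the model.

    print(fit.tvalues["x1"], fit.pvalues["x1"])   # H0: the slope of x1 is zero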

*
Test all the regression coefficients at once.
*
Look for the F-statistic in the ANOVA table.
*
The hypothesis test in English: do any of the X-variables in the model explain any of the variability in the Y-variable?
*
Small p-value says YES, big p-value says NO.
*
Note that the test does not identify which variables are important.
*
If the answer to this question is NO then it's back to the drawing board: none of your variables are any good!
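
Continuing the simulated example, the overall F-statistic and its p-value are also on the standard output.

    print(fit.fvalue, fit.f_pvalue)   # H0: every slope is zero at once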

*
Test a subset of the regression coefficients (more than one, but not all of them - the Partial F-test).
*
This one is not on the standard output; you have to calculate it yourself. See the formula on p. 169 of the Bulk Pack (a sketch of the calculation follows this block).
*
The test in English: do any of the X-variables in the subset under consideration explain any of the variability in Y?
*
We use a rule of thumb for this one. If the partial F is less than 1 then you can be sure that the answer is NO. If it is greater than 4 then you can be sure that the answer is YES. If it is between 1 and 4 then we will let it be a judgment call.
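
A sketch of the calculation, continuing the simulated example: fit the full model and the reduced model without the subset under test, then compare the nested fits. This is the standard nested-model partial F; check it against the Bulk Pack formula on p. 169 for the class's exact notation.

    import statsmodels.api as sm

    full = smf.ols("y ~ x1 + x2", data=data).fit()
    reduced = smf.ols("y ~ x1", data=data).fit()   # the subset under test is {x2}
    print(sm.stats.anova_lm(reduced, full))        # partial F and its p-value

    # The same statistic by hand:
    # F = [(SSE_reduced - SSE_full) / q] / [SSE_full / df_full]
    q = full.df_model - reduced.df_model           # number of slopes tested
    print(((reduced.ssr - full.ssr) / q) / (full.ssr / full.df_resid))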

*
Must be able to answer this question: ``why not do a whole bunch of t-tests rather than one partial F-test?'' Answer: the partial F-test is an honest simultaneous test (see p. 149 of the Bulk Pack; the simulation sketch below illustrates the point).
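
A rough Monte Carlo sketch of what ``honest'' means; everything here is simulated and made up for illustration. With many irrelevant X's, the rule ``reject if any t-test has p < .05'' fires far more often than 5% of the time, while a single simultaneous F-test on the same slopes holds its 5% level. (The subset here is all the slopes, so the partial F coincides with the overall F.)

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, k, reps = 50, 10, 2000
    any_t = f_rej = 0
    for _ in range(reps):
        X = sm.add_constant(rng.normal(size=(n, k)))
        y = rng.normal(size=n)                    # H0 is true: no X matters
        res = sm.OLS(y, X).fit()
        any_t += (res.pvalues[1:] < 0.05).any()   # some t-test fires by luck
        f_rej += res.f_pvalue < 0.05              # one honest simultaneous test
    print("any t significant:", any_t / reps)     # roughly 0.4
    print("F-test rejects:   ", f_rej / reps)     # roughly 0.05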



Richard Waterman
Wed Sep 18 21:45:23 EDT 1996