Class 2
Review from Class 1.
- Understand, interpret and distinguish
the regression summaries:
- R-squared.
- Root Mean Squared Error (RMSE).
- The interpretation and the benefits of using a confidence
interval for the slope.
- Two types of prediction, each with its own interval (range of
feasible values); both are computed in the sketch after this list:
- Estimate a typical observation (conf curve: fit).
- Predict a single new observation (conf curve: indiv).
- The dangers of extrapolating outside the range of your data
(three sources of uncertainty):
- The uncertainty in our estimate of the true regression line.
- The uncertainty due to the inherent variation of the data
about the line.
- The uncertainty due to the fact that maybe we should not be
using a line in the first place (model misspecification)!
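A minimal sketch of these Class 1 summaries in Python with
statsmodels, on made-up data (the coefficients, seed and x-value
below are hypothetical; any simple regression data would do):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)                 # simulated data
    x = np.linspace(0, 10, 50)
    y = 3.0 + 2.0 * x + rng.normal(scale=2.0, size=x.size)

    fit = sm.OLS(y, sm.add_constant(x)).fit()      # add_constant: intercept

    print(fit.rsquared)                            # R-squared
    print(np.sqrt(fit.mse_resid))                  # RMSE
    print(fit.conf_int(alpha=0.05))                # 95% CIs; second row is the slope

    # The two kinds of interval at a new x inside the data range:
    new = sm.add_constant(np.array([5.0]), has_constant="add")
    pred = fit.get_prediction(new)
    print(pred.conf_int())                         # typical observation (conf curve: fit)
    print(pred.conf_int(obs=True))                 # one new observation (conf curve: indiv)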
New material for Class 2.
- Making more realistic models with many X-variables - multiple
regression analysis.
- The fundamental differences between simple and multiple regression.
- The X-variables may be related (correlated) with one another.
- Consequence: looking at one X-variable at a time may present
a misleading picture of the true relationship between Y and the
X-variables.
- The difference between marginal and partial
slopes. Marginal: the slope of the regression line for one
X-variable, ignoring the impact of all the others. Partial: the
slope of the regression line for one X-variable, taking into
account all the others. (A simulated demonstration follows this
list.)
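A sketch of how marginal and partial slopes can disagree, using
simulated correlated X-variables (all names and coefficients here
are invented for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)       # x2 correlated with x1
    y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

    marginal = sm.OLS(y, sm.add_constant(x1)).fit()
    both = np.column_stack([x1, x2])
    partial = sm.OLS(y, sm.add_constant(both)).fit()

    print(marginal.params[1])   # about -0.4: ignoring x2 even flips the sign
    print(partial.params[1])    # near the true 2.0: x2 held fixed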
- Key graphics for multiple regression (both are sketched in code
after this list).
- The leverage plot. A "partial tool": the analog of the
scatterplot for simple regression. It lets you look at a large
multiple regression one variable at a time in a legitimate way
(it controls for the other X-variables). Potential uses:
- Spot leveraged points.
- Identify large residuals.
- Diagnose systematic lack of fit, i.e. spot curvature which may
suggest transformations.
- Identify heteroscedasticity.
- The scatterplot matrix. A "marginal tool": presents all the
two-variable (bivariate) relationships. Potential uses:
- Identify collinearity (correlation) between X-variables.
- Identify marginal non-linear relationships between Y and X-variables.
- Determine which X-variables are marginally most
significant (thin ellipses).
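A sketch of both graphics with pandas and statsmodels, on a
hypothetical DataFrame (the column names Y, X1, X2, X3 and the
simulated data are made up; statsmodels calls the leverage plot a
partial-regression, or added-variable, plot):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)                 # simulated data
    df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["X1", "X2", "X3"])
    df["Y"] = df["X1"] + 0.5 * df["X2"] + rng.normal(size=100)

    pd.plotting.scatter_matrix(df)                 # marginal tool: all bivariate plots

    fit = smf.ols("Y ~ X1 + X2 + X3", data=df).fit()
    sm.graphics.plot_partregress_grid(fit)         # partial tool: one leverage plot per X
    plt.show()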
- Facts to know.
- R-squared never decreases (in practice, always increases) as you
add variables to the model.
- RMSE does not have to decrease as variables are added, because
it adjusts for the degree of freedom each variable costs. (Both
facts are demonstrated in the sketch below.)
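A sketch of both facts, comparing a model before and after adding a
pure-noise X-variable (data simulated for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 30
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    noise = rng.normal(size=n)                     # a useless extra variable

    small = sm.OLS(y, sm.add_constant(x)).fit()
    big = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit()

    print(big.rsquared >= small.rsquared)          # always True
    print(np.sqrt(small.mse_resid), np.sqrt(big.mse_resid))
    # RMSE divides by n - (number of slopes) - 1, so it can go UP when
    # an added variable explains less than the degree of freedom it costs.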
- Model building philosophy in this course.
- Keep it as simple as possible (parsimony).
- Make sure everything is interpretable (especially any
transformations).
- After having met the above criteria, go for the biggest
R-squared, the smallest RMSE, and the model that makes the most
sense (sensible signs on the regression slopes).
Collinearity and Hypothesis testing
- Collinearity
- Definition: correlation between the X-variables.
- Consequence: it is difficult to establish which of the
X-variables are most important (they all carry nearly the same
information). Visually, the regression plane becomes very unstable.
- Diagnostics:
- Thin ellipses in the scatterplot matrix. (High correlation.)
- Counter-intuitive signs on the slopes.
- Large standard errors on the slopes (there's little
information about each slope individually).
- Collapsed leverage plots.
- High Variance Inflation Factors (VIFs): the factor by which the
variance of a slope estimate is inflated due to collinearity.
(Computed in the sketch after this list.)
- Insignificant t-statistics even though the overall regression
is significant (ANOVA F-test).
- Fix ups:
- Ignore it. OK if the sole objective is prediction within the
range of the data.
- Combine collinear variables in a meaningful way.
- Delete variables. OK if they are extremely correlated with the
ones you keep.
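A sketch of the VIF diagnostic with statsmodels, on simulated
collinear data (the names and the degree of correlation are made up):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(4)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)       # strongly collinear with x1
    X = sm.add_constant(np.column_stack([x1, x2]))

    for j in (1, 2):                               # skip the intercept column
        print(variance_inflation_factor(X, j))
    # VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on
    # the other X-variables; values far above 1 flag serious collinearity.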
- Hypothesis testing in multiple regression. Three flavors. They
all test whether slopes are equal to zero or not; they differ in
the number of slopes we are looking at simultaneously. (All three
are computed in the sketch at the end of this section.)
- Test a single regression coefficient (slope).
- Look for the t-statistic.
- The hypothesis test in English: does this variable add any
explanatory power to the model that already includes all the
other X-variables?
- Small p-value says YES, big p-value says NO.
- Test all the regression coefficients at once.
- Look for the F-statistic in the ANOVA table.
- The hypothesis test in English: do any of the X-variables in
the model explain any of the variability in the Y-variable?
- Small p-value says YES, big p-value says NO.
- Note that the test does not identify which variables are
important.
- If the answer here is NO then it's back to the drawing
board - none of your variables are any good!
- Test a subset of the regression coefficients (more than one,
but not all of them - the Partial F-test).
- You won't find this one on the output; you have to
calculate it yourself. See the formula on p.154 of the Bulk Pack.
- The test in English: do any of the X-variables in the
subset under consideration explain any of the variability in
Y?
- We use a rule of thumb for this one (because we are not using
F-tables). If the partial F is less than 1, you can be sure
that the answer is NO. If it is greater than 4, you can be sure
that the answer is YES. If it is between 1 and 4, it is a
judgment call.
- Must be able to answer this question: "why not do a whole
bunch of t-tests rather than one partial F-test?" Answer: running
many separate t-tests inflates the chance of a false positive,
whereas the partial F-test is an honest simultaneous test.
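A sketch of all three tests with statsmodels, on a hypothetical
DataFrame with columns Y, X1, X2, X3 (data simulated for
illustration; the subset tested here is X2 and X3):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(5)
    df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["X1", "X2", "X3"])
    df["Y"] = 2.0 * df["X1"] + rng.normal(size=100)

    full = smf.ols("Y ~ X1 + X2 + X3", data=df).fit()

    print(full.tvalues, full.pvalues)     # 1) one slope at a time (t-tests)
    print(full.fvalue, full.f_pvalue)     # 2) all slopes at once (ANOVA F)

    # 3) a subset of slopes (X2 and X3): the partial F-test, via a
    #    comparison of the reduced model with the full model.
    reduced = smf.ols("Y ~ X1", data=df).fit()
    print(anova_lm(reduced, full))        # F in the last row is the partial F

    q = 2                                 # number of slopes under test
    partial_F = ((reduced.ssr - full.ssr) / q) / full.mse_resid
    print(partial_F)                      # same number, by the textbook formula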
Examples
- Car89.jmp, p.111.
- Stocks.jmp, p.140.
Richard Waterman
Sun Aug 17 22:24:25 EDT 1997