Class 3

What you need to have learnt from Class 2.

*
The second rule of data analysis: always check the residuals.
*
The regression assumptions.
*
Consequences of assumption violations.
*
Diagnosing violations through residual plots.
*
Categorizing unusual points: leverage, residuals and influence.
*
Impact of unusual points on the regression.
*
How important is a point? Remove it and see how your decision changes.

New material for Class 3.

*
Understanding almost all the regression output
*
R-squared.
*
The proportion of variability in Y explained by the regression model.
*
Answers the question "how good is the fit?"

*
Root mean squared error (RMSE).
*
The spread of the points about the fitted model.
*
Answers the question "can you do good prediction?"
*
Write the variance of the errors $\epsilon_i$ as $\sigma^2$; then RMSE estimates $\sigma$.
*
Only meaningful when judged relative to the range of Y.
*
A rule-of-thumb 95% prediction interval: fitted line +/- 2 RMSE (only works within the range of the data).
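
The R-squared and RMSE definitions above, and the +/- 2 RMSE rule of thumb, can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function names are hypothetical, and RMSE is computed with n - 2 in the denominator, which is the usual convention for a one-predictor regression and is assumed here.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for a regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared_and_rmse(x, y):
    """R-squared: share of the variability in y explained by the line.
    RMSE: spread of the points about the fitted line."""
    intercept, slope = fit_line(x, y)
    n = len(x)
    y_bar = sum(y) / n
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    rmse = math.sqrt(sse / (n - 2))  # assumption: n - 2 degrees of freedom
    return r2, rmse

def rough_prediction_interval(x, y, x0):
    """Fitted value at x0 plus or minus 2 RMSE (rule of thumb only)."""
    if not min(x) <= x0 <= max(x):
        raise ValueError("x0 is outside the range of the data")
    intercept, slope = fit_line(x, y)
    _, rmse = r_squared_and_rmse(x, y)
    fitted = intercept + slope * x0
    return fitted - 2 * rmse, fitted + 2 * rmse

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2, rmse = r_squared_and_rmse(x, y)
low, high = rough_prediction_interval(x, y, 3)
print(f"R-squared = {r2:.2f}, RMSE = {rmse:.3f}")
print(f"Rough 95% prediction interval at X = 3: ({low:.2f}, {high:.2f})")
```

The range check in `rough_prediction_interval` reflects the caveat that the rule of thumb only works within the range of the data.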

*
Confidence interval for the slope.
*
Answers the question "is there any point in it all?"
*
If the CI contains 0, then 0 is a feasible value for the slope: the line may be flat, in which case X tells you nothing about Y.
*
The p-value associated with the slope tests the hypothesis Slope = 0 vs. Slope $\neq$ 0.
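
As a rough illustration of the slope interval, the sketch below computes the slope, its standard error (RMSE divided by the square root of the sum of squared deviations of X), and an approximate 95% interval of slope +/- 2 SE. The multiplier 2 is a rule-of-thumb assumption; an exact interval would use a t quantile with n - 2 degrees of freedom, which matters for small samples. The data and the function name are made up.

```python
import math

def slope_summary(x, y, multiplier=2.0):
    """Slope, SE(slope), and an approximate interval slope +/- multiplier * SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))
    se_slope = rmse / math.sqrt(sxx)
    low = slope - multiplier * se_slope
    high = slope + multiplier * se_slope
    return slope, se_slope, (low, high)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

slope, se_slope, (low, high) = slope_summary(x, y)
contains_zero = low <= 0.0 <= high
print(f"slope = {slope:.3f}, SE = {se_slope:.3f}")
print(f"approximate 95% CI: ({low:.3f}, {high:.3f}), contains 0: {contains_zero}")
```

If `contains_zero` is true, a flat line is a feasible description of the data.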

*
Two types of prediction (concentrate on the second)
*
Estimating an average, "where is the regression line?"

$$E(Y \mid X) = \beta_0 + \beta_1 X$$

Range of feasible values should reflect uncertainty in the true regression line.

*
Predicting a new observation, "where's a new point going to be?"

$$Y = \beta_0 + \beta_1 X + \epsilon$$

Range of feasible values should reflect uncertainty in the true regression line AND the variability of the points about the line.
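
The difference between the two kinds of prediction shows up in the standard errors. The textbook formulas for simple regression are given below as a sketch; the notation ($X_0$ for the X value of interest, $\bar{X}$ for the sample mean, $n$ for the sample size) is introduced here for illustration and is not from the notes.

```latex
% Estimating the average of Y at X_0: uncertainty in the line only
SE_{\text{average}}(X_0) = RMSE \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_i (X_i - \bar{X})^2}}

% Predicting a new observation at X_0: uncertainty in the line
% plus the variability of the points about the line (the extra 1)
SE_{\text{new}}(X_0) = RMSE \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_i (X_i - \bar{X})^2}}
```

When $n$ is large and $X_0$ is near $\bar{X}$, the second expression is close to RMSE itself, which is where the +/- 2 RMSE rule of thumb comes from. Both expressions grow as $X_0$ moves away from $\bar{X}$.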

*
What a difference a leveraged point can make.
   From pages 96 and 100.

     Measure      With outlier    Without outlier

     R-squared    0.78            0.075
     RMSE         3570            3634
     Slope        9.75            6.14
     SE(slope)    1.30            5.56

*
If someone comes to you with a great R-squared, it does not mean they have a great model; there may just be a highly leveraged point that the regression line fits well.
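
One way to see how the decision changes is to apply a rough slope +/- 2 SE interval to the two columns of the table above. The slope and SE values are copied from the table; the +/- 2 SE multiplier is a rule-of-thumb assumption rather than an exact t-based interval, and the function name is hypothetical.

```python
# Slope and SE(slope) copied from the table above (pages 96 and 100).
fits = {
    "with outlier": {"slope": 9.75, "se": 1.30},
    "without outlier": {"slope": 6.14, "se": 5.56},
}

def rough_interval(slope, se, multiplier=2.0):
    """Approximate 95% interval: estimate +/- 2 standard errors (rule of thumb)."""
    return slope - multiplier * se, slope + multiplier * se

results = {}
for label, fit in fits.items():
    low, high = rough_interval(fit["slope"], fit["se"])
    contains_zero = low <= 0.0 <= high
    results[label] = (low, high, contains_zero)
    print(f"{label}: ({low:.2f}, {high:.2f}), contains 0: {contains_zero}")
```

With the outlier, the rough interval runs from about 7.15 to 12.35 and excludes 0. Without it, the interval runs from about -4.98 to 17.26 and contains 0, so a flat line becomes feasible once the point is removed.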



Richard Waterman
Wed Sep 11 23:19:07 EDT 1996