Class 3
What you need to have learnt from Class 2.
- The second rule of data analysis: always check the residuals.
- The regression assumptions.
- Consequences of assumption violations.
- Diagnosing violations through residual plots.
- Categorizing unusual points: leverage, residuals, and influence.
- Impact of unusual points on the regression.
- How important is a point? Remove it and see how your decision changes.
New material for Class 3.
- Understanding almost all the regression output.
  - R-squared.
    - The proportion of variability in Y explained by the regression model.
    - Answers the question "how good is the fit?"
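
As a concrete sketch (the data below are made up for illustration, not taken from the class example), R-squared can be computed directly from two sums of squares: the total variability in Y, and the variability left over in the residuals.

```python
# Minimal sketch of R-squared for simple regression (illustrative data).

def fit_line(x, y):
    """Least-squares slope and intercept for simple regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def r_squared(x, y):
    slope, intercept = fit_line(x, y)
    ybar = sum(y) / len(y)
    ss_total = sum((yi - ybar) ** 2 for yi in y)   # total variability in Y
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))        # unexplained variability
    return 1 - ss_resid / ss_total                 # proportion explained

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(round(r_squared(x, y), 3))  # a nearly straight-line cloud: close to 1
```
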
  - Root mean squared error (RMSE).
    - The spread of the points about the fitted model.
    - Answers the question "can you do good prediction?"
    - Write the variance of the points about the line as σ²; then RMSE estimates σ.
    - Only a meaningful measure with respect to the range of Y.
    - A rule-of-thumb 95% prediction interval: the fitted line +/- 2 RMSE (only works within the range of the data).
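
A sketch of RMSE and the +/- 2 RMSE rule of thumb, on a toy dataset made up for illustration:

```python
import math

def rmse_fit(x, y):
    """Fit the least-squares line and return (RMSE, slope, intercept).
    RMSE = sqrt(SSE / (n - 2)); the n - 2 reflects the two estimated
    coefficients (intercept and slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2)), slope, intercept

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
s, slope, intercept = rmse_fit(x, y)

# Rule-of-thumb 95% prediction interval at x0 = 3 (inside the data range):
x0 = 3
fit = intercept + slope * x0
low, high = fit - 2 * s, fit + 2 * s
print(round(low, 2), round(high, 2))
```
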
  - Confidence interval for the slope.
    - Answers the question "is there any point in it all?"
    - If the CI contains 0, then 0 is a feasible value for the slope, i.e. the line may be flat, so X tells you nothing about Y.
    - The p-value associated with the slope tests the hypothesis Slope = 0 vs. Slope ≠ 0.
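
A sketch of the slope inference on made-up data. Note the shortcut: a rough 95% CI of slope +/- 2 standard errors is used here, where real regression output uses a t quantile.

```python
import math

def slope_inference(x, y):
    """Slope, SE(slope), a rough 95% CI, and the t-statistic for
    testing Slope = 0 vs. Slope != 0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))
    se = rmse / math.sqrt(sxx)                     # SE(slope)
    return slope, se, (slope - 2 * se, slope + 2 * se), slope / se

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
slope, se, ci, t = slope_inference(x, y)
print(ci[0] > 0)  # CI excludes 0: X really does tell us about Y
```
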
- Two types of prediction (concentrate on the second).
  - Estimating an average: "where is the regression line?" The range of feasible values should reflect uncertainty in the true regression line.
  - Predicting a new observation: "where is a new point going to be?" The range of feasible values should reflect uncertainty in the true regression line AND the variability of the points about the line.
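
The two standard errors can be compared directly on a toy dataset (made up for illustration). The interval for the average response reflects only uncertainty about the line; the interval for a new observation adds the scatter of points about the line, via the extra "1 +" term.

```python
import math

# Fit a least-squares line to illustrative data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(sse / (n - 2))

x0 = 4
# "Where is the regression line?" -- uncertainty about the line only:
se_mean = rmse * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
# "Where is a new point going to be?" -- line uncertainty PLUS scatter:
se_pred = rmse * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
print(se_mean < se_pred)  # the prediction interval is always the wider one
```
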
- What a difference a leveraged point can make. From pages 96 and 100:

| Measure    | With outlier | Without outlier |
|------------|--------------|-----------------|
| R-squared  | 0.78         | 0.075           |
| RMSE       | 3570         | 3634            |
| Slope      | 9.75         | 6.14            |
| SE(slope)  | 1.30         | 5.56            |
- If someone comes to you with a great R-squared, it does not mean they have a great model; maybe there is just a highly leveraged point that is well fit by the regression line.
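
The same effect shown in the table can be reproduced with any small made-up dataset: a cloud with essentially no X-Y relationship, plus one far-out point that the fitted line chases (the numbers here are illustrative, not the book's pages 96 and 100 example).

```python
def r_squared(x, y):
    """R-squared for a simple least-squares regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total

# A cloud with essentially no X-Y relationship...
x, y = [1, 2, 3, 4, 5], [3, 1, 4, 1, 5]
# ...plus one highly leveraged point that the line will chase.
x_out, y_out = x + [50], y + [60]

print(round(r_squared(x, y), 2))          # low: X explains little
print(round(r_squared(x_out, y_out), 2))  # high: driven by one point
```
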
Richard Waterman
Wed Sep 11 23:19:07 EDT 1996