
2.1 Plain vanilla

The plain residual $e_i$ and its plot are useful for checking how well the regression line fits the data, and in particular whether there is any systematic lack of fit, for example curvature.

But what value should count as a big residual?

*
Problem: $e_i$ retains the scale of the response variable ($Y$).
*
Answer: standardize by an estimate of the standard deviation of the residual.
*
We know $Var(y_i) = \sigma^2$, which is estimated by $(\mathrm{RMSE})^2$.
*
But $e_i = (y_i - \hat y_i)$, which involves more than just $y_i$.
*
It turns out that $Var(e_i) = \sigma^2 (1 - h_{ii})$.
*
Use the standardized residual, $s_i = e_i / (\mathrm{RMSE} \sqrt{1 - h_{ii}})$.
*
The quantity $h_{ii}$ is fundamental to regression.
*
A heuristic explanation of $h_{ii}$ (visually, we drag a single point upward and measure how far the regression line follows):
*
Think about $y_i$, the observed value, and $\hat y_i$, the estimated value (i.e. the point on the regression line).
*
For a fixed $x_i$, perturb $y_i$ a little bit. How much do you expect $\hat y_i$ to move?
*
If $\hat y_i$ moves as much as $y_i$, then clearly $y_i$ has the potential to drive the regression, so $y_i$ is leveraged.
*
If $\hat y_i$ hardly moves at all, then clearly $y_i$ has no chance of driving the regression.
*
In other words, $h_{ii}$ is the measure of "leverage".
*
More precisely

\begin{displaymath}h_{ii} = \frac{\partial \hat y_i}{\partial y_i},\end{displaymath}

and it depends only on the x-values.
*
Understanding leverage is essential in regression because leverage exposes the potential role of individual data points. Do you want your decision to be based on a single observation?
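
The ideas above can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original notes. It makes two assumptions: a simple linear regression with an intercept, and RMSE taken as $\sqrt{SSE/(n-2)}$. The data and the function names (`hat_diagonal`, `fit_line`, `standardized_residuals`) are made up for illustration. The sketch computes the leverages $h_{ii}$ and the standardized residuals $s_i$, then drags one point upward to see how far its fitted value follows.

```python
import numpy as np


def design_matrix(x):
    """Design matrix for a straight-line fit: a column of ones, then x."""
    return np.column_stack([np.ones_like(x, dtype=float), x])


def hat_diagonal(x):
    """Leverages h_ii, the diagonal of H = X (X'X)^{-1} X'.

    Only the x-values are used; y never enters.
    """
    X = design_matrix(x)
    XtX_inv = np.linalg.inv(X.T @ X)
    # Row-wise quadratic form x_i' (X'X)^{-1} x_i for each observation i.
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)


def fit_line(x, y):
    """Fitted values y_hat from the least squares line."""
    X = design_matrix(x)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def standardized_residuals(x, y):
    """s_i = e_i / (RMSE * sqrt(1 - h_ii)), with RMSE^2 = SSE / (n - 2) (assumed)."""
    e = y - fit_line(x, y)
    rmse = np.sqrt(np.sum(e ** 2) / (len(x) - 2))
    return e / (rmse * np.sqrt(1.0 - hat_diagonal(x)))


# Made-up data: the last point sits far from the other x-values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 12.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 20.0])

h = hat_diagonal(x)
s = standardized_residuals(x, y)

# Heuristic check: drag point i upward by delta and see how far y_hat_i follows.
i = 5
delta = 0.5
y_pert = y.copy()
y_pert[i] += delta
slope = (fit_line(x, y_pert)[i] - fit_line(x, y)[i]) / delta

print("leverages h_ii:        ", np.round(h, 3))
print("standardized residuals:", np.round(s, 3))
print("change in y_hat_i per unit change in y_i:", round(float(slope), 3))
```

Because the fitted values are linear in $y$, the ratio `slope` matches `h[i]` for any size of `delta`, and the isolated point at $x = 12$ carries by far the largest leverage.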


Richard Waterman
1999-09-20