Today's class.
Key idea. Regression is a sensitive procedure. It can be strongly influenced by errant observations. It would be good to have an alternative methodology that fits linear models but is not sensitive to outliers.
Analogy: Mean is very sensitive to outliers, whereas the median is not.
Goal: a regression procedure that is akin to using the median.
Answer: robust/resistant regression - a regression that gives lower weight to outliers.
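The mean/median analogy in one tiny demo (made-up numbers, just for illustration): one wild value drags the mean far away but leaves the median untouched.

```python
from statistics import mean, median

clean   = [10, 11, 12, 13, 14]
spoiled = [10, 11, 12, 13, 1000]   # last value replaced by a gross outlier

# mean jumps from 12 to 209.2; the median stays at 12 in both lists
print(mean(clean), mean(spoiled))
print(median(clean), median(spoiled))
```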
Varieties:
1. Regression that explicitly down-weights outliers.
Idea: run a regression, down-weight big residuals, rerun regression with weights based on residuals, keep going until convergence (estimates settle down). Called Iteratively Reweighted Least Squares (IRLS).
SPlus command rreg
Example:
coef(rreg(pre82price, post82price))
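The IRLS loop above can be sketched in a few lines. This is an illustration of the idea (fit, down-weight big residuals, refit, repeat until the estimates settle down), not the actual `rreg` algorithm; the Huber-style weight rule and MAD scale estimate are one common choice, and the data in the usage note are made up since the `pre82price`/`post82price` values are not shown here.

```python
def wls(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def irls(x, y, c=1.345, tol=1e-8, max_iter=100):
    """Fit, down-weight large residuals, refit; stop when the
    coefficients stop changing (convergence)."""
    a, b = wls(x, y, [1.0] * len(x))          # start from ordinary LS
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # robust scale: (upper) median absolute residual / 0.6745
        s = sorted(abs(r) for r in resid)[len(resid) // 2] / 0.6745 or 1.0
        # Huber-style weights: residuals beyond c*s get weight < 1
        w = [1.0 if abs(r) <= c * s else c * s / abs(r) for r in resid]
        a2, b2 = wls(x, y, w)
        if abs(a2 - a) + abs(b2 - b) < tol:
            break
        a, b = a2, b2
    return a2, b2
```

On data where most points follow y = 2x plus one gross outlier, the ordinary least-squares slope is pulled well above 2, while the IRLS slope stays close to it.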
2. Regression that chooses the line to minimize a quantity other than the sum of the squares of the residuals.
The one chosen today: minimize the sum of the absolute deviations - sometimes called L1 regression.
Good for heavy-tailed residual distributions.
Problem: there is no closed-form solution, no standard-error formulas, etc. - but we can always bootstrap to get a handle on uncertainty.
SPlus command l1fit
coef(l1fit(pre82price, post82price))
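A sketch of L1 regression for one predictor (not the `l1fit` algorithm, which is far more efficient): for small data sets one can use the fact that some line minimizing the sum of absolute deviations passes through at least two of the data points, and simply try every pair.

```python
from itertools import combinations

def l1_fit(x, y):
    """Brute-force least-absolute-deviations line for small data sets."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # vertical candidate line, skip
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sad = sum(abs(yi - (a + b * xi)) for xi, yi in zip(x, y))
        if best is None or sad < best[0]:
            best = (sad, a, b)
    return best[1], best[2]               # intercept, slope
```

Unlike least squares, moving one point far off a line that the other points sit on exactly leaves the L1 fit on that line: the outlier costs the same absolute deviation wherever the line tilts toward it, so it cannot outvote the majority.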
Bootstrap standard error for the slope:
summary(bootstrap(data = pollute,
    coef(l1fit(pre82price, post82price))[2], B = 1000))
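The pairs bootstrap behind that command can be sketched as: resample (x, y) pairs with replacement, refit, and take the standard deviation of the resampled slopes. The function names and the least-squares stand-in statistic below are illustrative, not the S-Plus `bootstrap()` API.

```python
import random
import statistics

def bootstrap_se_slope(x, y, fit, B=1000, seed=0):
    """Bootstrap SE of the slope; `fit` returns (intercept, slope)."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(xb)) < 2:
            continue                      # degenerate resample, skip
        slopes.append(fit(xb, yb)[1])
    return statistics.stdev(slopes)

def ols(x, y):
    """Plain least-squares fit, used here as the stand-in statistic."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b
```

Any fitting routine returning (intercept, slope) can be plugged in as `fit`, so the same resampling loop gives a standard error for the L1 slope even though no formula for it exists.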
Compare these to plain old least squares:
coef(lm(post82price ~ pre82price))