Class 14 Stat701 Fall 1997

Weighted Least Squares and Robust Regression.

Today's class.


* Monte Carlo simulations are a computer-intensive method for evaluating and comparing the properties of different decision-making regimes.
* Elements of a Monte Carlo study:
  * a virtual world model;
  * a procedure that operates on the virtual world.

* Pseudocode for the MC study:

for (i in 1:simsize) {
    create virtual world
    evaluate procedure on this virtual world
    store result of evaluation
}
look at long-run properties of the stored results

The law of large numbers ensures that, as simsize grows, averages of the stored results converge to their true long-run (population) values.
Major caveat: the virtual world model must be a reasonable approximation to reality.
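The pseudocode can be made concrete with a small stdlib-only Python sketch (Python is used only so the example runs anywhere; it is not the course's S-PLUS code). The virtual world here is an assumption chosen for illustration: 25 points from a normal distribution contaminated with 10% wide-variance points, true center 0. The procedures are the sample mean and the sample median, and the stored result is each one's squared error.

```python
import random
import statistics


def virtual_world(n, rng, contamination=0.1):
    """One simulated sample (illustrative assumption): mostly N(0, 1) noise,
    but each point comes from N(0, 10^2) with probability `contamination`.
    The true center is 0."""
    return [rng.gauss(0, 10) if rng.random() < contamination else rng.gauss(0, 1)
            for _ in range(n)]


def monte_carlo(simsize=2000, n=25, seed=1):
    rng = random.Random(seed)
    sq_err = {"mean": [], "median": []}
    for _ in range(simsize):
        sample = virtual_world(n, rng)                           # create virtual world
        sq_err["mean"].append(statistics.mean(sample) ** 2)      # evaluate procedures and
        sq_err["median"].append(statistics.median(sample) ** 2)  # store squared errors
    # Long-run property: mean squared error about the true center 0.
    return {name: statistics.mean(errs) for name, errs in sq_err.items()}


if __name__ == "__main__":
    print(monte_carlo())
```

Under these assumptions the median's long-run mean squared error comes out well below the mean's, which anticipates the robust regression material later in this class.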

Link to your problem

Example

The housing data set, code for the weighted least squares fit, and an auxiliary program for heteroscedasticity diagnostics (assumes you already have the Zinn functions).
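The linked code is not reproduced here. As a hedged sketch, weighted least squares for a single predictor minimizes the sum of w_i * (y_i - a - b*x_i)^2, which has a closed form in terms of weighted means. The function name `wls_line` is hypothetical; a common choice of weights under heteroscedasticity is w_i proportional to 1/Var(e_i), but that choice is up to the analyst.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x.

    Minimizes sum_i w_i * (y_i - a - b*x_i)**2. With all weights equal
    this reduces to ordinary least squares."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# A zero weight removes a point from the fit entirely:
print(wls_line([0, 1, 2, 3], [1, 3, 5, 100], [1, 1, 1, 0]))  # (1.0, 2.0)
```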


Introduction to "robust regression".

Key idea: least squares regression is a sensitive procedure; it can be strongly influenced by a few errant observations. It would be good to have an alternative methodology that fits linear models but is not sensitive to outliers.
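A small illustration of that sensitivity, using hypothetical data and stdlib Python: ten points lying exactly on y = 2x, then the same points with one errant response at x = 10.

```python
def ols_line(x, y):
    """Ordinary least squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


x = list(range(1, 11))
y_clean = [2.0 * xi for xi in x]   # points exactly on y = 2x
y_bad = y_clean[:-1] + [0.0]       # one errant observation at x = 10

print(ols_line(x, y_clean))  # slope is 2
print(ols_line(x, y_bad))    # slope drops to about 0.91 (= 75 / 82.5)
```

A single bad point out of ten more than halves the fitted slope.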

Analogy: the mean is very sensitive to outliers, whereas the median is not.

Goal: a regression procedure that is akin to using the median.

Answer: robust (resistant) regression, i.e. a regression that gives less weight to outliers.

Varieties:

1. Regression that explicitly down-weights outliers.

Idea: run a regression, down-weight the observations with big residuals, rerun the regression with weights based on those residuals, and keep going until convergence (the estimates settle down). This is called Iteratively Reweighted Least Squares (IRLS).

S-PLUS command: rreg

Example:
coef(rreg(pre82price, post82price))
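The IRLS loop described above can be sketched in stdlib Python. This is not rreg's implementation: the Huber weight function (tuning constant c = 1.345), the MAD residual scale and the convergence rule are assumptions chosen for illustration, and rreg's own default weight function may differ. Each step is an ordinary weighted least squares fit for one predictor; the data are hypothetical.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x (minimizes sum w*(y - a - b*x)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_weight(u, c=1.345):
    """Huber weight for a scaled residual u: 1 inside [-c, c], c/|u| outside."""
    return 1.0 if abs(u) <= c else c / abs(u)


def irls_line(x, y, tol=1e-8, max_iter=100):
    a, b = wls_line(x, y, [1.0] * len(x))       # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute deviation, divided by 0.6745
        # so that it estimates the standard deviation for normal errors.
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:                          # over half the points fit exactly
            break
        w = [huber_weight(r / scale) for r in resid]   # down-weight big residuals
        a_new, b_new = wls_line(x, y, w)        # rerun the regression with new weights
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:                           # estimates have settled down
            break
    return a, b


x = list(range(1, 11))
noise = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.2, -0.2, 0.0]
y = [2 * xi + e for xi, e in zip(x, noise)]
y[-1] = 0.0                                     # one errant observation

print(wls_line(x, y, [1.0] * len(x)))  # least squares: slope pulled well below 2
print(irls_line(x, y))                 # IRLS: slope back close to 2
```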

2. Regression that chooses the line to minimize a quantity other than the sum of the squares of the residuals.

The one chosen today minimizes the sum of the absolute deviations; it is sometimes called L1 regression (or least absolute deviations, LAD).

It works well when the residual distribution is heavy-tailed.

Problem: there is no closed-form solution (the fit has to be computed numerically, e.g. by linear programming) and no simple standard error formulae, but we can always bootstrap to get a handle on uncertainty.

S-PLUS command: l1fit

coef(l1fit(pre82price, post82price))
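The algorithm inside l1fit is not described in these notes. For a single predictor, a simple (if slow) way to compute an L1 fit exactly uses the fact that, when there are at least two distinct x values, some minimizing line passes through two of the data points. The stdlib Python sketch below, with the hypothetical name `l1_line`, searches all such lines; practical software solves the equivalent linear program instead.

```python
from itertools import combinations


def l1_line(x, y):
    """Least absolute deviations (L1) fit of y = a + b*x, one predictor.

    Exhaustive search over lines through pairs of points with distinct x;
    some L1-optimal line is among them. O(n^3), so small n only."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                            # vertical line: not a candidate
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sad = sum(abs(yk - a - b * xk) for xk, yk in zip(x, y))  # sum of |residuals|
        if best is None or sad < best[0]:
            best = (sad, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


x = list(range(1, 11))
y = [2.0 * xi for xi in x]
y[-1] = 0.0                  # one errant observation at x = 10
print(l1_line(x, y))         # (0.0, 2.0): the L1 line ignores the outlier
```

On the same hypothetical data, least squares gives a slope of about 0.91, while the L1 fit recovers y = 2x exactly.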

Bootstrap standard error for the slope:

summary(bootstrap(data = pollute,
                  coef(l1fit(pre82price, post82price))[2],
                  B = 1000))
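The call above resamples the data B times and recomputes the L1 slope each time; the standard deviation of those replicates is the bootstrap standard error. Below is a stdlib-only Python sketch of that case-resampling idea. `bootstrap_se` and `l1_slope` are hypothetical names, the data are illustrative, and redrawing any resample whose x values are all identical (where a slope is undefined) is a choice made for this sketch.

```python
import random
import statistics
from itertools import combinations


def l1_slope(x, y):
    """Slope of an L1 (least absolute deviations) line, by exhaustive search
    over lines through pairs of points with distinct x (small n only)."""
    best_sad, best_b = float("inf"), None
    for i, j in combinations(range(len(x)), 2):
        if x[i] != x[j]:
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            sad = sum(abs(yk - a - b * xk) for xk, yk in zip(x, y))
            if sad < best_sad:
                best_sad, best_b = sad, b
    return best_b


def bootstrap_se(x, y, statistic, B=1000, seed=0):
    """Bootstrap standard error of statistic(x, y), resampling (x, y)
    pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    while len(reps) < B:
        idx = [rng.randrange(n) for _ in range(n)]   # one bootstrap resample
        xb = [x[i] for i in idx]
        if len(set(xb)) < 2:                         # slope undefined: redraw
            continue
        reps.append(statistic(xb, [y[i] for i in idx]))
    return statistics.stdev(reps)


x = list(range(1, 11))
noise = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.2, -0.2, 0.0]
y = [2 * xi + e for xi, e in zip(x, noise)]
y[-1] = 0.0                                          # one errant observation
print(bootstrap_se(x, y, l1_slope, B=1000))
```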

Compare these to plain old least squares:

coef(lm(post82price ~ pre82price))



Richard Waterman
Wed Oct 22 22:27:21 EDT 1997