In Module 3 there will be an interpretation of R2 as "fraction of variance explained". For now we are content with interpreting R2 as a measure of the strength of the linear association between x and y:
the more elongated the best-fitting ellipse, the higher R2.
Recall how to draw density ellipses in JMP:
Analyze > Fit Y by X > pick X, Y; OK > red triangle > Density Ellipse > 0.95
(Figure: example scatterplots with density ellipses, one with R2 = 0.978 and one with R2 = 0.815.)
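A little linear algebra shows why elongation tracks R2. The Python sketch below assumes x and y have been standardized (mean 0, standard deviation 1), so their covariance matrix is [[1, r], [r, 1]] with eigenvalues 1 + r and 1 - r. The axes of a density ellipse are proportional to the square roots of these eigenvalues. Since R2 = r^2 in simple regression, the long-to-short axis ratio then depends on R2 alone. The helper `axis_ratio` is hypothetical, and in unstandardized units the picture also depends on how the axes are scaled.

```python
import math


def axis_ratio(r2):
    """Long-to-short axis ratio of a density ellipse for standardized x and y.

    Assumes the covariance matrix [[1, r], [r, 1]], with eigenvalues 1 + r and 1 - r.
    Axis lengths are proportional to the square roots of the eigenvalues; the
    ellipse level (e.g. 0.95) rescales both axes equally, so it cancels out.
    """
    r = math.sqrt(r2)  # |r|; in simple regression R2 = r**2
    if r >= 1:
        return math.inf  # all points on a line: the ellipse degenerates
    return math.sqrt((1 + r) / (1 - r))


for r2 in (0.0, 0.5, 0.815, 0.978):
    print(f"R2 = {r2:.3f}  long/short axis ratio = {axis_ratio(r2):.2f}")
```

For R2 = 0.815 the long axis comes out a bit over four times the short one; for R2 = 0.978 it is more than thirteen times as long.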
RMSE (root mean squared error): a measure of the quality of a fit, expressed on the scale (in the units) of the response.
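Both summaries can be computed by hand. The sketch below (Python, standard library only; the name `fit_line` is hypothetical) fits the least-squares line and returns b0, b1, R2 and RMSE. It takes R2 as the squared correlation of x and y, and it assumes the usual n - 2 denominator for RMSE because two coefficients are estimated. If your software's numbers differ slightly, check which convention it uses.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, R2, RMSE). R2 is the squared correlation of x and y;
    RMSE divides the residual sum of squares by n - 2 because two
    coefficients (b0 and b1) were estimated from the data.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate RMSE")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    r2 = sxy ** 2 / (sxx * syy)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))
    return b0, b1, r2, rmse


# Made-up toy data
b0, b1, r2, rmse = fit_line([1, 2, 3, 4], [1, 3, 2, 4])
print(b0, b1, r2, rmse)
```

Because RMSE is built from residuals, it carries the units of y: multiplying every y by 100 (say, dollars to cents) multiplies RMSE by 100 and leaves R2 unchanged.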
The above pictures give the impression that the three simulated
datasets are pretty good fakes of the real thing (top left picture).
The model seems to work quite nicely as a summary of the data.
(There are some small discrepancies between the
actual and the simulated data. The actual data seem to have a
little less variability than their simulated cousins. After
perusing the actual prices,
one sees a slight preponderance of round values,
such as multiples of $10 and $5. A good data forger would first round
the simulated prices to whole dollars and then, with small probability,
round some of them further to the nearest multiple of 10 or 5. Reduction of
variability due to rounding is typical for monetary
variables.)
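The forger's recipe can be sketched as follows. This is a hypothetical scheme, not a method from the notes: the probability `p_round = 0.2`, the even choice between multiples of 5 and 10, and the simulated mean and spread (150 and 25) are all made-up values for illustration. Python's `round` sends exact halves to the nearest even number, which is harmless here.

```python
import random


def forge_price(simulated, p_round=0.2, rng=random):
    """Round one simulated price the way a careful data forger might.

    Hypothetical scheme: round to whole dollars, then with probability
    p_round snap the price to the nearest multiple of 5 or 10 (chosen
    at random). p_round = 0.2 is an arbitrary illustrative value.
    """
    price = round(simulated)                 # step 1: whole dollars
    if rng.random() < p_round:               # step 2: only occasionally...
        base = rng.choice((5, 10))           # ...pick a round-number grid
        price = base * round(price / base)   # ...and snap to it
    return price


rng = random.Random(1)  # seeded for reproducibility
simulated = [rng.gauss(150, 25) for _ in range(10)]
print([forge_price(p, rng=rng) for p in simulated])
```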
response = signal + noise (general)
         = straight line + normal variability (special):

$$ y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2) $$

As always, Greek letters stand for true but unknown population numbers. The intercept β0 and slope β1 are estimated by their sample analogs b0 and b1, written in Roman letters.
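Simulating from the model makes the distinction between Greek and Roman letters concrete. We pick β0, β1 and σ ourselves (here the made-up values 10, 2 and 3), generate data, and check that b0 and b1 land near them. The sketch assumes independent normal errors with mean 0 and x spread uniformly over 0 to 10. The function names are hypothetical.

```python
import random


def simulate(beta0, beta1, sigma, xs, rng):
    """Draw y = beta0 + beta1*x + eps, with independent eps ~ N(0, sigma^2)."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


def estimate(xs, ys):
    """Least-squares estimates (b0, b1) of the intercept and slope."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


rng = random.Random(42)  # seeded for reproducibility
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = simulate(beta0=10.0, beta1=2.0, sigma=3.0, xs=xs, rng=rng)
b0, b1 = estimate(xs, ys)
print(f"b0 = {b0:.2f} (beta0 = 10), b1 = {b1:.2f} (beta1 = 2)")
```

With real data we only ever see the Roman letters; the simulation is the one setting where the Greek letters are known.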
In general we call an observation influential
if leaving it out changes at least one of the following quantities
substantially: b1, b0, R2, RMSE.
(What is "substantial change"? We're being vague.
Most of the time you know it when you see it.)
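The definition of influence translates directly into a leave-one-out loop: refit without each observation in turn and record how b0, b1, R2 and RMSE change. The sketch below (hypothetical function names, made-up data) only reports the changes; deciding which ones are substantial is left to the reader, as in the text. It assumes at least four observations and that dropping any one point still leaves more than one distinct x-value.

```python
import math


def fit_line(x, y):
    """Least-squares fit; returns b0, b1, R2 and RMSE (n - 2 denominator) in a dict."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    return {"b0": b0, "b1": b1, "R2": sxy ** 2 / (sxx * syy),
            "RMSE": math.sqrt(sse / (n - 2))}


def leave_one_out(x, y):
    """For each observation i, refit without it and report (reduced - full) per quantity."""
    full = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        reduced = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append({k: reduced[k] - full[k] for k in full})
    return changes


x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 12.0]  # made-up data; the last point is off the trend
for i, d in enumerate(leave_one_out(x, y)):
    print(i, {k: round(v, 3) for k, v in d.items()})
```

Here the last point sits well above the trend of the others, and dropping it produces by far the largest change in b1 as well as a large jump in R2.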
A leverage point (an observation whose x-value lies far from the others) will typically be influential by moving the slope b1,
and it also affects R2 (often driving it up).
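A small made-up example shows a leverage point at work. Five base points are fit, then a sixth point with an extreme x-value (x = 20) is added, once in line with the trend and once well below it. The helper name `slope_and_r2` is hypothetical.

```python
def slope_and_r2(x, y):
    """Least-squares slope b1 and R2 (squared correlation) for y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # made-up base data
print("base:           b1=%.2f R2=%.2f" % slope_and_r2(x, y))
# A single point far out in x that continues the trend ...
print("on-trend  x=20: b1=%.2f R2=%.2f" % slope_and_r2(x + [20], y + [20]))
# ... and one at the same x that does not.
print("off-trend x=20: b1=%.2f R2=%.2f" % slope_and_r2(x + [20], y + [5]))
```

In this example the on-trend point pushes R2 from 0.64 to about 0.98, while the off-trend point pulls it down to about 0.39 and flattens the slope from 0.8 to about 0.14. So a leverage point always has the power to move b1, but the direction of the R2 change depends on where the point lands relative to the trend.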
The best way to learn about influence and leverage is by playing
with the
leverage applet created by our colleague at the University of Chicago.
Work through the details of slides 2-12 through 2-23; they have excellent examples.