Class 7. Categorical variables: more than two groups.
What you need to have learnt from Class 6.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Two types of model.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/yellowball.gif)
- Parallel lines model: different intercepts - same slopes.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/yellowball.gif)
- Non-parallel lines: different intercepts and different slopes.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Two key facts in understanding the JMP output.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/yellowball.gif)
- JMP always makes comparisons to the "average" of the groups.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/yellowball.gif)
- JMP always leaves one group out - you figure out the missing
difference (easy).
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Non-parallel slopes, an interaction model.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Interaction: a three-variable concept (Y, X1, X2). Generic
description: the impact of X1 on Y depends on the value of X2.
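As a worked illustration of that generic description, here is the usual way an interaction is written as an equation. The coefficient symbols $\beta_0, \dots, \beta_3$ are notation assumed here for illustration, not taken from the bulk pack.

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
  = (\beta_0 + \beta_2 X_2) + (\beta_1 + \beta_3 X_2)\, X_1
$$

Grouping the terms this way shows that the slope on X1 is $\beta_1 + \beta_3 X_2$, so it changes with X2 unless $\beta_3 = 0$, in which case the lines are parallel.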
New material for today: more than two groups.
Example. Consider a variable with three groups (G1,G2,G3).
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Parallel lines regression: three lines, one for each group.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/yellowball.gif)
- Key fact: 3 groups, JMP gives 2 comparisons.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
- G1 to average.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
- G2 to average.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
- You work out G3: if G1 is 4 above average and G2 is 3 above
average then G3 must be 7 below average.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
- Rule: what number added to the others makes them all sum to zero?
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/yellowball.gif)
- A negative on a categorical variable coefficient -- BELOW par.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/yellowball.gif)
- A positive on a categorical variable coefficient -- ABOVE par.
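The sum-to-zero rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the reported comparisons are differences from the average; the function name `missing_effect` is hypothetical and is not part of JMP.

```python
def missing_effect(reported):
    """Return the left-out group's difference from average.

    Under sum-to-zero coding the differences for all groups add up to
    zero, so the missing one is minus the sum of the reported ones.
    """
    return -sum(reported)


# Example from the notes: G1 is 4 above average, G2 is 3 above average.
g3 = missing_effect([4, 3])
print(g3)  # -7, i.e. G3 is 7 below average
```

A negative result means the left-out group is below par, a positive result means it is above par.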
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Non-parallel lines: three different intercepts and three
different slopes.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Presenting categorical variable regression, an equation for
each group. Follows p.218 in the bulk pack.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
-
    Baseline: RunTime = 179.59 + 0.23 RunSize
    G1:       RunTime = (179.59 + 22.94) + (0.23 + 0.07) RunSize
    G2:       RunTime = (179.59 +  6.90) + (0.23 - 0.10) RunSize
    G3:       RunTime = (179.59 - 29.84) + (0.23 + 0.03) RunSize
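The arithmetic behind these group equations can be sketched as follows. This is a rough illustration, assuming G3 is the group left out of the output so that its differences are recovered by the sum-to-zero rule; the variable and function names are hypothetical.

```python
# Baseline line and the reported differences, as quoted in the notes.
baseline_intercept = 179.59
baseline_slope = 0.23
intercept_effects = {"G1": 22.94, "G2": 6.90}
slope_effects = {"G1": 0.07, "G2": -0.10}


def complete(effects, left_out):
    """Add the left-out group so that all differences sum to zero."""
    full = dict(effects)
    full[left_out] = -sum(effects.values())
    return full


def group_lines(b0, b1, int_eff, slope_eff, left_out):
    """Return {group: (intercept, slope)} for every group."""
    int_all = complete(int_eff, left_out)
    slope_all = complete(slope_eff, left_out)
    return {g: (b0 + int_all[g], b1 + slope_all[g]) for g in int_all}


lines = group_lines(baseline_intercept, baseline_slope,
                    intercept_effects, slope_effects, left_out="G3")
for group, (a, b) in lines.items():
    print(f"{group}: RunTime = {a:.2f} + {b:.2f} RunSize")
```

Run as written, this gives intercepts of 202.53, 186.49 and 149.75 and slopes of 0.30, 0.13 and 0.26 for G1, G2 and G3.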
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
- Is a difference significant? Look at the t-stat.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
- Are the differences significant? Look at a partial-F ("effect
test" in JMP).
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/blueball.gif)
- The partial-F:

$$
F = \frac{\left(R^2_{\text{BIG}} - R^2_{\text{SMALL}}\right) / (\text{number of variables in the subset})}{\left(1 - R^2_{\text{BIG}}\right) / (\text{number of observations} - \text{number of parameters in the big model, including the intercept})}
$$
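A small sketch of the partial-F calculation follows. The function name and all the numbers are hypothetical, chosen only to show the arithmetic: a three-group non-parallel model has 6 parameters (intercept, slope, two group terms, two interaction terms), and the subset tested here is the two interaction terms.

```python
def partial_f(r2_big, r2_small, n_subset, n_obs, n_params_big):
    """Partial-F statistic comparing a big model with a nested small one.

    n_subset     : number of variables in the subset being tested
    n_params_big : parameters in the big model, including the intercept
    """
    numerator = (r2_big - r2_small) / n_subset
    denominator = (1.0 - r2_big) / (n_obs - n_params_big)
    return numerator / denominator


# Hypothetical values, not taken from the bulk pack.
f_stat = partial_f(r2_big=0.50, r2_small=0.40,
                   n_subset=2, n_obs=60, n_params_big=6)
print(round(f_stat, 2))
```

With these made-up inputs the statistic is (0.10 / 2) / (0.50 / 54) = 5.4, which would then be compared with an F distribution on 2 and 54 degrees of freedom.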
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Strategy for when some groups are significantly different and
others are not: collapse the non-significant groups together.
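Collapsing groups amounts to relabelling the categorical variable before refitting. The sketch below is only an illustration: the helper `collapse_groups`, the merged label, and the choice of which groups to merge are all assumptions.

```python
def collapse_groups(labels, to_merge, merged_name):
    """Relabel every group listed in to_merge as a single combined group."""
    merge_set = set(to_merge)
    return [merged_name if g in merge_set else g for g in labels]


# Suppose G1 and G2 are not significantly different (hypothetical).
labels = ["G1", "G2", "G3", "G2", "G1"]
collapsed = collapse_groups(labels, ["G1", "G2"], "G1+G2")
print(collapsed)
```

The regression is then refitted with the collapsed variable, which here has two groups instead of three.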
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- More than one categorical variable - fine. (e.g. gender and
race). What does a parallel lines regression mean here? Take Y
as income and explain it in English.
![*](http://compstat.wharton.upenn.edu:8001/~waterman/icons/greenball.gif)
- Interaction with more than one variable - fine (see p. 209).
Richard Waterman
Wed Sep 25 22:11:13 EDT 1996