Class 6: Categorical variables with two groups.
What you need to have learnt from Class 5.
- Collinearity.
  - Correlation among the X's leads to an unstable regression plane.
  - The slope estimates have large standard errors, so there is great uncertainty about the true slopes.
  - This is not a problem if you only predict within the range of the data.
  - Know the consequences of collinearity.
  - Understand the collinearity diagnostics.
  - Be aware of the fixes.
- Hypothesis testing: three types of test.
  - Last-in test: the t-test.
  - ALL-at-once test: the ANOVA F-test.
  - Testing a subset of variables: the partial F-test.
- Some model building strategies.
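The collinearity diagnostics and the partial F-test can be sketched in a few lines. This is a minimal NumPy sketch on simulated data, not JMP's output: the helper names `ols_sse`, `vif` and `partial_f` are hypothetical, and the variance inflation factor (VIF), computed as 1/(1 - R^2) from regressing each X on the other X's, is assumed here as the collinearity diagnostic.

```python
import numpy as np

def ols_sse(X, y):
    """Least squares fit with an intercept; returns the residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    return float(resid @ resid)

def vif(X):
    """VIF for each column: 1 / (1 - R^2_j), where R^2_j comes from
    regressing column j on all the other columns."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)
        sst = float(((xj - xj.mean()) ** 2).sum())
        r2 = 1.0 - ols_sse(others, xj) / sst
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def partial_f(X_full, X_reduced, y):
    """Partial F-test for the columns of X_full that are not in X_reduced."""
    n = len(y)
    sse_full = ols_sse(X_full, y)
    sse_red = ols_sse(X_reduced, y)
    q = X_full.shape[1] - X_reduced.shape[1]   # number of variables being tested
    df_full = n - X_full.shape[1] - 1          # residual df of the full model (intercept included)
    return ((sse_red - sse_full) / q) / (sse_full / df_full)

# Hypothetical simulated data.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1, so x1 and x2 are collinear
x3 = rng.normal(size=n)              # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(size=n)

print(vif(X))                       # large for x1 and x2, close to 1 for x3
print(partial_f(X, X[:, [0]], y))   # tests x2 and x3 together
```

For a single variable the partial F equals the square of that variable's last-in t-statistic, which is one way to see how the three tests fit together.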
New material for today: including a categorical variable in a
regression. We start with two groups in the categorical variable;
more than two groups is covered in Class 7.
Key fact: When JMP compares two groups in a regression, the comparison
is between each group and the average of the two groups. In fact, JMP
reports only one group's comparison with the average, but that is
enough: if one group is, for example, three below the average, then
the other group must be three above the average.
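The key fact can be checked numerically. This sketch assumes the two groups are coded +1 and -1 (effect coding), which reproduces the comparison-to-the-average behaviour described above; which level JMP itself codes as +1 is not modelled here, and the data and the helper names `effect_code` and `fit` are hypothetical.

```python
import numpy as np

def effect_code(groups, plus_level):
    """Code a two-level categorical variable as +1 / -1 (effect coding)."""
    return np.array([1.0 if g == plus_level else -1.0 for g in groups])

def fit(design, y):
    """Least squares coefficients; the design matrix already contains an intercept column."""
    return np.linalg.lstsq(design, y, rcond=None)[0]

# Hypothetical noise-free data: group A has intercept 10, group B has intercept 4,
# and both share slope 2, so the true lines are parallel.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0])
groups = ["A"] * 5 + ["B"] * 5
y = np.where(np.array(groups) == "A", 10.0, 4.0) + 2.0 * x

g = effect_code(groups, plus_level="A")
design = np.column_stack([np.ones_like(x), x, g])
b0, b1, b2 = fit(design, y)

print(b0)       # intercept of the "average" line, about (10 + 4) / 2 = 7
print(b1)       # common slope, about 2
print(b2)       # group A sits about 3 above the average line, so B sits 3 below
print(2 * b2)   # height difference between the parallel lines, about 6
```

Because the codes are +1 and -1, the two group intercepts are b0 + b2 and b0 - b2, which is why the gap between the parallel lines is twice the categorical coefficient.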
- Parallel lines regression: allowing different intercepts for
the two groups.
  - Declare the variable as NOMINAL.
  - Add it just like any other X-variable.
    - Including the categorical variable lets you fit a separate
    line to each group so that you can compare them.
    - Recognize that the comparison is between each group and the
    average of the two groups.
    - Recognize that the lines are forced to be parallel.
  - The "slope" estimate on the categorical variable is the
  difference in estimated Y-value between one group and the average
  of the two groups.
  - The height difference between the parallel lines is given by
  twice the estimated slope for the categorical variable.
- Non-parallel lines regression: allowing different intercepts
and different slopes for each group.
  - Declare the categorical variable as NOMINAL.
  - Add it just like any other X-variable, but also add the cross
  product term (the categorical variable times the continuous X).
  Cross product terms are also known as interaction terms.
  - The "slope" on the categorical variable tells you the
  difference between intercepts, comparing each group's intercept to
  the average of the two intercepts.
  - The "slope" on the cross product term tells you the
  difference between slopes for the two groups, comparing each group's
  slope to the average of the two group slopes.
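The non-parallel lines model adds the cross product of the +1/-1 group code with X. The sketch below, on hypothetical noise-free data, shows how each coefficient maps back to the two groups' own intercepts and slopes. It uses the raw product g * x; if the software centers X inside the cross product, the categorical coefficient instead compares the groups at the mean of X rather than at X = 0.

```python
import numpy as np

# Hypothetical noise-free data: group A has intercept 10 and slope 3,
# group B has intercept 4 and slope 1, so the lines are not parallel.
x = np.tile([1.0, 2.0, 3.0, 4.0, 5.0], 2)
is_a = np.array([True] * 5 + [False] * 5)
y = np.where(is_a, 10.0 + 3.0 * x, 4.0 + 1.0 * x)

g = np.where(is_a, 1.0, -1.0)   # effect coding: A = +1, B = -1
design = np.column_stack([np.ones_like(x), x, g, g * x])   # g * x is the cross product term
b0, b1, b2, b3 = np.linalg.lstsq(design, y, rcond=None)[0]

# Recover each group's own line from the effect-coded estimates.
intercept_a, intercept_b = b0 + b2, b0 - b2
slope_a, slope_b = b1 + b3, b1 - b3

print(b0, b1)   # average intercept (about 7) and average slope (about 2)
print(b2, b3)   # A's intercept and slope relative to those averages (about 3 and 1)
```

The difference between the two intercepts is 2 * b2 and the difference between the two slopes is 2 * b3, mirroring the "twice the estimate" rule for parallel lines.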
Richard Waterman
Mon Sep 23 22:36:21 EDT 1996