Class 6: Categorical variables with two groups.

What you need to have learnt from Class 5.

* Collinearity.
  * Correlation of the X's leads to an unstable regression plane.
  * There is extreme uncertainty about the true slopes.
  * Collinearity is not a problem when predicting within the range of the data.
  * Know the consequences of collinearity.
  * Understand the collinearity diagnostics.
  * Be aware of the fix-ups.
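
As a reminder of what a collinearity diagnostic measures, here is a minimal Python sketch. It assumes the diagnostic in question is the variance inflation factor (VIF), which these notes do not name, and it uses made-up data. With only two X-variables, VIF = 1/(1 - r^2), where r is the correlation between them; the variance of each estimated slope is inflated by this factor relative to uncorrelated X's.

```python
import math

def correlation(u, v):
    """Sample correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two X-variables: 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical data: x2_close nearly duplicates x1, x2_spread does not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_close = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
x2_spread = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

vif_close = vif_two_predictors(x1, x2_close)
vif_spread = vif_two_predictors(x1, x2_spread)
print(f"VIF with nearly collinear X's: {vif_close:.1f}")
print(f"VIF with weakly related X's:   {vif_spread:.2f}")
```

A VIF near 1 means the X's carry separate information; a very large VIF is the numerical face of the unstable regression plane.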

* Hypothesis testing - three types of test.
  * Last-in test - the t-test.
  * All-at-once test - the ANOVA F-test.
  * Testing a subset of variables - the Partial F-test.
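
For reference, the Partial F statistic is usually written as shown here. The symbols are assumptions introduced for this sketch, not notation from the class: SSE_reduced and SSE_full are the residual sums of squares without and with the subset being tested, q is the number of variables in the subset, n is the sample size, and k is the number of X-variables in the full model.

```latex
F = \frac{\left(\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}}\right)/q}
         {\mathrm{SSE}_{\text{full}}/(n - k - 1)}
```

With q = 1 this equals the square of the last-in t-statistic, and with q = k it is the ANOVA F-test, so the three tests belong to one family.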


Some model building strategies.


New material for today: including a categorical variable in a regression.

Start with two groups in the categorical variable; more than two groups are covered in Class 7.

Key fact: When JMP compares two groups in a regression, it compares each group with the average of the two groups. JMP actually reports the comparison for only one group, but nothing is lost: if one group is, for example, three below the average, then the other group must be three above it.
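
The arithmetic behind the key fact can be sketched in a few lines of Python. The group names and the group averages of 7 and 13 are hypothetical, chosen only to match the "three below, three above" example.

```python
# Hypothetical group averages of Y, chosen to match the "three below" example.
group_means = {"A": 7.0, "B": 13.0}

# The reference point is the average of the two group averages.
average = sum(group_means.values()) / len(group_means)

# Each group's effect is its difference from that average.
effects = {g: m - average for g, m in group_means.items()}

print(average)
print(effects)

# The two effects always sum to zero, so reporting one of them is enough.
```

Note that the reference point is the average of the two group averages, so it does not depend on how many observations fall in each group.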

* Parallel lines regression - allowing different intercepts for the two groups.
  * Declare the categorical variable as NOMINAL.
  * Add it just like any other X-variable.
  * Including the categorical variable allows you to fit a separate line to each group so that you can compare them.
  * Recognize that the comparison is between each group and the average of the two groups.
  * Recognize that the lines are forced to be parallel.

* The "slope" estimate on the categorical variable is the difference in estimated Y-value between one group and the average of the two groups.
* The height difference between the parallel lines is twice the estimated slope for the categorical variable.
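
The "twice the slope" rule can be checked with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are made up and noise-free, the group is coded +1 for A and -1 for B (a coding that reproduces the comparison with the average), and `fit_least_squares` is a small hypothetical helper, not a JMP routine.

```python
def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [u - factor * v for u, v in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Hypothetical noise-free data: both groups share slope 0.5,
# group A has intercept 2 and group B has intercept 8.
data = ([("A", x, 2 + 0.5 * x) for x in (1, 2, 3, 4)]
        + [("B", x, 8 + 0.5 * x) for x in (2, 3, 5, 6)])

# Assumed coding: +1 for group A, -1 for group B.
code = {"A": 1.0, "B": -1.0}
rows = [[1.0, code[g], float(x)] for g, x, _ in data]
y = [yi for _, _, yi in data]

intercept, group_effect, slope = fit_least_squares(rows, y)
gap = 2 * group_effect  # height of line A minus height of line B
print(f"intercept={intercept:.2f} group_effect={group_effect:.2f} slope={slope:.2f}")
print(f"line A minus line B = {gap:.2f}")
```

The fitted intercept is the average of the two group intercepts, the group term is group A's distance from that average, and doubling it recovers the full gap between the lines.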

* Non-parallel lines regression - allowing different intercepts and different slopes for each group.
  * Declare the categorical variable as NOMINAL.
  * Add it just like any other X-variable, but also add the cross-product term. Cross-product terms are sometimes known as interaction terms.
  * The "slope" on the categorical variable tells you the difference between intercepts, comparing each group to the average of the two groups.
  * The "slope" on the cross-product term tells you the difference between slopes for the two groups, comparing each group slope to the average of the group slopes.
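
With two groups, a model containing the categorical variable, X, and their cross-product gives the same fitted lines as fitting each group separately. The Python sketch below uses that fact to show what the two "slopes" measure. It assumes made-up noise-free data, a +1/-1 coding with group A as +1, and a cross-product formed as the group code times the uncentered X; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical noise-free data: line A is y = 2 + 0.5x, line B is y = 8 + 1.5x.
xs_a = [1.0, 2.0, 3.0, 4.0]
xs_b = [2.0, 3.0, 5.0, 6.0]
int_a, slope_a = fit_line(xs_a, [2 + 0.5 * x for x in xs_a])
int_b, slope_b = fit_line(xs_b, [8 + 1.5 * x for x in xs_b])

# Re-express the two lines as "average" plus "group A's difference from average".
avg_intercept = (int_a + int_b) / 2
avg_slope = (slope_a + slope_b) / 2
group_term = int_a - avg_intercept  # "slope" on the categorical variable
cross_term = slope_a - avg_slope    # "slope" on the cross-product term

print(f"average line: y = {avg_intercept:.2f} + {avg_slope:.2f} x")
print(f"group A differs by {group_term:.2f} in intercept, {cross_term:.2f} in slope")
```

Group B differs from the average by the same amounts with the opposite sign, so the intercepts differ by twice the group term and the slopes by twice the cross-product term.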



Richard Waterman
Mon Sep 23 22:36:21 EDT 1996