
5.2 Interaction terms in categorical variables

Interaction is a three-variable concept: one Y and two X's, X1 and X2.

The impact of X1 on Y depends on the level of X2.

In the gold example, the impact of SP500 return on gold return depends on the date (Pre 1980 vs. Post 1980).

In formulae, denote the categorical variable by z, and let z = 1 for group 1 and z = -1 for group 2. The model is


\begin{displaymath}Av(Y\vert x,z) = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 x \times z.\end{displaymath}

Then for group 1:


\begin{displaymath}Av(Y\vert x,z = 1) = \beta_0 + \beta_1 x + \beta_2 \times 1+ \beta_3 x \times 1\end{displaymath}


\begin{displaymath}Av(Y\vert x,z = 1) = \beta_0 + \beta_2 + (\beta_1 + \beta_3) x,\end{displaymath}

and group 2

\begin{displaymath}Av(Y\vert x,z = -1) = \beta_0 + \beta_1 x + \beta_2 \times (-1) + \beta_3 x \times (-1)\end{displaymath}


\begin{displaymath}Av(Y\vert x,z = -1) = \beta_0 - \beta_2 + (\beta_1 - \beta_3) x.\end{displaymath}

Hence $2 \beta_2$ is the difference in intercepts and $2 \beta_3$ is the difference in slopes (group 1 minus group 2 in each case).
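The relationship between the coefficients and the two group lines can be checked numerically. The following is a minimal sketch, assuming synthetic data rather than the gold data set; the variable names and the true coefficient values are illustrative only. It fits the interaction model with the +1/-1 coding by least squares, fits a separate straight line to each group, and compares the differences in intercepts and slopes with $2 \beta_2$ and $2 \beta_3$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical values, not the gold data set).
n = 200
x = rng.normal(size=n)
z = np.where(np.arange(n) < n // 2, 1.0, -1.0)  # +1 for group 1, -1 for group 2
y = 1.0 + 0.5 * x + 0.3 * z - 0.2 * x * z + rng.normal(scale=0.1, size=n)

# Design matrix for Av(Y|x,z) = b0 + b1*x + b2*z + b3*x*z.
X = np.column_stack([np.ones(n), x, z, x * z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta


def fit_line(xg, yg):
    """Least-squares straight line; returns (intercept, slope)."""
    slope, intercept = np.polyfit(xg, yg, 1)
    return intercept, slope


# Separate straight-line fits within each group.
int1, slope1 = fit_line(x[z == 1], y[z == 1])
int2, slope2 = fit_line(x[z == -1], y[z == -1])

print("intercept difference:", int1 - int2, " 2*b2:", 2 * b2)
print("slope difference:    ", slope1 - slope2, " 2*b3:", 2 * b3)
```

Because the interaction model has its own intercept and slope for each group, the two printed pairs should agree up to floating-point error.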

When doing categorical variable regression, always check the residuals for each group separately.

Comparison boxplots are good for this.
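A per-group residual check might look like the sketch below. It assumes synthetic data in which group 2 is deliberately given noisier errors than group 1, which is a made-up scenario chosen only to show what the check can reveal. It computes the five numbers a boxplot displays (minimum, quartiles, maximum) for each group's residuals.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (hypothetical): group 2 has a larger error spread than group 1.
n = 200
x = rng.normal(size=n)
z = np.where(np.arange(n) < n // 2, 1.0, -1.0)
sigma = np.where(z == 1, 0.1, 0.4)
y = 1.0 + 0.5 * x + 0.3 * z - 0.2 * x * z + sigma * rng.normal(size=n)

# Fit the interaction model and compute residuals.
X = np.column_stack([np.ones(n), x, z, x * z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta


def five_number_summary(r):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    return np.percentile(r, [0, 25, 50, 75, 100])


summary1 = five_number_summary(resid[z == 1])
summary2 = five_number_summary(resid[z == -1])
iqr1 = summary1[3] - summary1[1]
iqr2 = summary2[3] - summary2[1]

print("group 1 residuals:", np.round(summary1, 3), "IQR:", round(iqr1, 3))
print("group 2 residuals:", np.round(summary2, 3), "IQR:", round(iqr2, 3))
```

Passing `resid[z == 1]` and `resid[z == -1]` to a plotting routine such as `matplotlib.pyplot.boxplot` would draw the comparison boxplots themselves; a clear difference in box heights suggests the error spread is not the same in the two groups.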

There are many different types of coding schemes for categorical variables. We will investigate them in more detail.

Example from the Gold data set.

Consider a Q-Q plot of one set of residuals against the other.
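The points of such a Q-Q plot can be computed as in the sketch below, which assumes two stand-in residual samples drawn at random (not the gold residuals) and a hypothetical helper named `qq_points`. Since the two groups may have different sizes, both samples are evaluated at a common set of probabilities.

```python
import numpy as np


def qq_points(a, b, num=50):
    """Matched quantiles of samples a and b at `num` common probabilities."""
    probs = (np.arange(1, num + 1) - 0.5) / num
    return np.quantile(a, probs), np.quantile(b, probs)


rng = np.random.default_rng(2)

# Stand-in residuals (hypothetical): group 2 has twice the spread of group 1.
resid1 = rng.normal(scale=1.0, size=300)
resid2 = rng.normal(scale=2.0, size=250)

q1, q2 = qq_points(resid1, resid2)

# Straight line through the Q-Q points: a slope near 1 and an intercept near 0
# would suggest the two residual distributions are similar.
slope, intercept = np.polyfit(q1, q2, 1)
print("Q-Q slope:", slope, "intercept:", intercept)
```

Plotting `q2` against `q1` gives the Q-Q plot. Points close to the line y = x indicate similar residual distributions in the two groups, while a slope well away from 1, as in this constructed example, points to unequal spread.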


Richard Waterman
1999-09-13