Class 11. Regression for a categorical response

What you need to have learnt from Class 10: comparing the mean of a response across the levels of two categorical variables (two-way ANOVA).

* Two basic models:
  * No interaction: the impact of X1 on Y does not depend on the level of X2.
  * Interaction: the impact of X1 on Y depends on the level of X2.

* Practical consequences:
  * If there is NO interaction, you can investigate the impact of each X by itself.
  * If there is interaction (judge its practical importance as well as its statistical significance), you must consider X1 and X2 together.

* Know and check the assumptions for ANOVA: independent observations, constant variance across groups, and approximately normal residuals.


New material for today: Regression for a categorical response (logistic regression).

* Objective: model a categorical (two-group) response.
* Example: how do gender and income affect the probability of purchasing a product?
* Problem: straight-line regression does not respect the range of the response. The response is categorical (coded 0/1), but a fitted line can predict values below 0 or above 1.
* Solution: model the probability that Y = 1, i.e. P(Y = 1 | X), in a special way.
  * Transform P(Y = 1) with the "logit" transform.
  * Then fit a straight-line regression to the logit of the probabilities (this respects the range of the data).
  * On the original (probability) scale the fitted line becomes a curve that gives the probability that Y = 1 for a fixed value of X. It looks like this:

[Figure: fitted logistic curve for the O-ring data]
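To see why fitting the line on the logit scale respects the range of the data, here is a minimal Python sketch. It borrows the Challenger coefficients quoted later in these notes (15.05 and -0.23) purely for illustration, and the helper name `inv_logit` is our own. The straight line itself runs well outside [0, 1], but passing it through the inverse logit always lands strictly between 0 and 1.

```python
import math

def inv_logit(z: float) -> float:
    """Map a logit value z in (-inf, inf) to a probability in (0, 1).

    1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z)).
    """
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative coefficients (the Challenger fit quoted later in these notes).
b0, b1 = 15.05, -0.23

for x in [30, 50, 70, 90]:
    line = b0 + b1 * x        # the straight line: unbounded
    prob = inv_logit(line)    # the curve: always strictly inside (0, 1)
    print(f"X = {x:3d}   line = {line:7.2f}   P(Y = 1) = {prob:.4f}")
```

The line takes values from about 8.15 down to about -5.65 over this range of X, which makes no sense as a probability; the curve never leaves (0, 1).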

* The logit is defined as logit(p) = ln(p/(1-p)). Example: logit(0.25) = ln(0.25/(1 - 0.25)) = ln(1/3) = -1.099.
* The three worlds of logistic regression:
  * The probabilities: this is where most people live. A probability lies in (0, 1).
  * The odds: this is where the gamblers live. Odds lie in (0, infinity).
  * The logit: this is where the model lives. The logit lies in (-infinity, infinity). A straight line B0 + B1 X also ranges over (-infinity, infinity), so it is the logit that we fit a line to.
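The definition and the three worlds can be checked numerically. This is a small Python sketch; the function names `odds` and `logit` are our own choices. It reproduces the logit(0.25) example and then shows a few probabilities side by side with their odds and logits, so you can see the three ranges: probabilities stay in (0, 1), odds in (0, infinity), and logits are negative below p = 0.5, zero at 0.5, and positive above.

```python
import math

def odds(p: float) -> float:
    """Odds that Y = 1: p / (1 - p), which lies in (0, infinity)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log odds: ln(p / (1 - p)), which lies in (-infinity, infinity)."""
    return math.log(odds(p))

# The worked example: logit(0.25) = ln(1/3)
print(round(logit(0.25), 3))  # -1.099

# The same probability viewed in each of the three worlds.
print("   p     odds    logit")
for p in [0.01, 0.25, 0.5, 0.75, 0.99]:
    print(f"{p:4.2f}  {odds(p):7.3f}  {logit(p):7.3f}")
```

Note the symmetry on the logit scale: logit(0.75) is exactly the negative of logit(0.25).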

* You must feel comfortable moving between the three worlds.
* Rules for moving between the worlds. For simplicity, write p for P(Y = 1 | X).
  * logit(p) = ln(p/(1-p))
  * p = exp(logit(p))/(1 + exp(logit(p)))   *** Key to getting back to the real world (probabilities).
  * odds(p) = p/(1-p)
  * odds(p) = exp(logit(p))   *** Key for interpretation.
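The four rules translate directly into code. A minimal Python sketch, with function names of our own choosing: it goes from a probability to the logit world and back, and checks that the two routes to the odds, p/(1-p) and exp(logit(p)), agree.

```python
import math

def logit(p: float) -> float:
    """Probability world -> logit world: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def prob_from_logit(z: float) -> float:
    """Logit world -> probability world: exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

def odds_from_logit(z: float) -> float:
    """Logit world -> odds world: exp(z)."""
    return math.exp(z)

# Round trip: start with a probability, go to the logit world and back.
p = 0.25
z = logit(p)
print(f"p = {p}, logit = {z:.4f}, back to p = {prob_from_logit(z):.4f}")

# The two routes to the odds agree (up to floating-point rounding).
print(p / (1.0 - p), odds_from_logit(z))
```

`prob_from_logit` follows the formula in the notes exactly; for very large logits (above roughly 709) `math.exp` overflows, and the equivalent form 1 / (1 + exp(-z)) is the safer choice.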

* Interpreting the output.
  * P-values are in the Prob>ChiSq column.
  * Main equation: logit(p) = B0 + B1 X.
  * If B1 = 0: no relationship between X and p.
  * If B1 > 0: as X goes up, p goes up.
  * If B1 < 0: as X goes up, p goes down.
  * B1: for every one-unit increase in X, the ODDS that Y = 1 are multiplied by a factor of exp(B1).
  * At X = -B0/B1 there is a 50% chance that Y = 1 (this is where logit(p) = B0 + B1 X = 0).
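The two interpretation rules can be verified with a short Python sketch. The coefficients b0 = -3.0 and b1 = 0.5 are hypothetical, chosen only for illustration, and the helper names are our own. The sketch shows that a one-unit increase in X multiplies the odds by exp(b1) no matter where you start, and that the probability is exactly one half where the logit is zero, at X = -b0/b1.

```python
import math

def prob(x: float, b0: float, b1: float) -> float:
    """P(Y = 1 | X = x) under the model logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p: float) -> float:
    """Odds that Y = 1: p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
b0, b1 = -3.0, 0.5

# A one-unit increase in X multiplies the odds by exp(b1), whatever X is.
for x in [2.0, 6.0, 10.0]:
    ratio = odds(prob(x + 1, b0, b1)) / odds(prob(x, b0, b1))
    print(f"X = {x}: odds ratio = {ratio:.4f}, exp(b1) = {math.exp(b1):.4f}")

# The 50% point is where the logit is zero: b0 + b1 * x = 0, so x = -b0 / b1.
x_half = -b0 / b1
print(f"X = {x_half}: P(Y = 1) = {prob(x_half, b0, b1)}")
```

Because exp(b1) does not depend on X, the odds ratio is a single number that summarizes the effect of X, whereas the change in the probability itself depends on where on the curve you are.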

*
Key calculation - based on the logistic regression output calculate a probability. Example: Challenger output on p.306.
*
logit(p) = 15.05 - 0.23 Temp.
*
Find the probability that Y = 1 (at least one failure) at a temperature of 31.
*
logit(p) = 15.05 - 0.23 * 31
*
logit(p) = 7.96.
*
p = exp(logit(p))/(1 + exp(logit(p)))
*
p = exp(7.96)/(1 + exp(7.96)) = 0.99965
*
There is a 99.965 percent chance of at least one failure at a temperature of 31 degrees.
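The Challenger calculation can be reproduced in a few lines of Python. This sketch uses only the coefficients as quoted above, rounded to two decimals; with those values the logit works out to 15.05 - 7.13 = 7.92, and results from the full-precision output could differ slightly in the last digits.

```python
import math

# Coefficients as quoted in the notes (rounded to two decimals).
b0, b1 = 15.05, -0.23
temp = 31

# Step 1: compute the logit from the fitted line.
z = b0 + b1 * temp

# Step 2: move back to the probability world.
p = math.exp(z) / (1.0 + math.exp(z))

print(f"logit(p) = {z:.2f}")
print(f"P(at least one failure at {temp} degrees) = {p:.5f}")
```

The same two steps (plug X into the line, then apply exp(logit)/(1 + exp(logit))) work for any temperature.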



Richard Waterman
Wed Oct 9 22:50:58 EDT 1996