Class 11. Regression for a categorical response

What you need to have learnt from Class 10: Comparing the mean
across two categorical variables.

- Two basic models:

- No interaction: the impact of X1 on Y does not depend on the
level of X2.

- Interaction: the impact of X1 on Y depends on the level of X2.

- Practical consequences:

- If NO interaction, then you can investigate the impact of
each X by itself.

- If there is interaction (consider practical importance as
well as statistical significance) then you must consider both X1
and X2 together.

- Know and check the assumptions for ANOVA.

New material for today: Regression for a categorical response
(logistic regression).

- Objective: model a categorical (2-group) response.

- Example: how do gender and income affect the probability that a
customer purchases a product?

- Problem: linear regression does not respect the range of the
response data. The response is categorical (0 or 1), but a fitted
line can give values below 0 or above 1.

- Solution: model the probability that Y = 1, i.e. P(Y = 1 | X), in a
special way.

- Transform P(Y = 1) with the "logit" transform.

- Now fit a straight line regression to the logit of the
probabilities (this respects the range of the data).

- On the original scale (probabilities) the fitted line becomes an
S-shaped curve: the curve gives the probability that Y = 1 for a fixed
value of X.

- The logit is defined as logit(p) = ln(p/(1-p)). Example:
logit(.25) = ln(.25/(1 - .25)) = ln(1/3) = -1.099.
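
The logit definition above can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name logit is simply a convenient label, and only the standard math module is assumed.

```python
import math

def logit(p):
    """Log-odds of a probability p, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Reproduce the worked example: logit(.25) = ln(1/3)
print(round(logit(0.25), 3))

# A 50% chance corresponds to a logit of zero
print(logit(0.5))

# Probabilities near 0 or 1 map to large negative or positive logits
print(logit(0.001), logit(0.999))
```

Probabilities below one half give negative logits and probabilities above one half give positive logits, which is why a straight line fitted on the logit scale never produces an impossible probability.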

- The three worlds of logistic regression.

- The probabilities: this is where most people
live. Probability lies in (0,1).

- The odds: this is where the gamblers live. Odds
lie in (0, Infinity).

- The logit: this is where the model lives. Logit
lies in (-Infinity, Infinity). Lines lie in (-Infinity,Infinity),
therefore fit a line to the logit.

- You must feel comfortable moving between the three worlds.

- Rules for moving between the worlds. For simplicity, write p for P(Y = 1 | X).

- logit(p) = ln(p/(1-p))

- p = exp(logit(p))/(1 + exp(logit(p))) *** Key to get back to
the real world.

- odds(p) = p/(1-p)

- odds(p) = exp(logit(p)) *** Key for interpretation.
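
The four rules above translate directly into small conversion functions. This is an illustrative Python sketch under the assumption that only the standard math module is available; the function names are made up for this example and do not come from any statistics package.

```python
import math

def odds_from_p(p):
    """odds(p) = p/(1-p)"""
    return p / (1.0 - p)

def logit_from_p(p):
    """logit(p) = ln(p/(1-p))"""
    return math.log(p / (1.0 - p))

def odds_from_logit(z):
    """odds(p) = exp(logit(p))"""
    return math.exp(z)

def p_from_logit(z):
    """p = exp(logit(p))/(1 + exp(logit(p))); fine for moderate z,
    but math.exp overflows for very large positive z."""
    return math.exp(z) / (1.0 + math.exp(z))

p = 0.25
z = logit_from_p(p)

print(odds_from_p(p))      # odds of 1 to 3, about 0.333
print(odds_from_logit(z))  # the same odds, reached via the logit
print(p_from_logit(z))     # back to 0.25, up to floating-point rounding
```

Going from p to the logit and back again should return the starting probability, which is a quick way to check that you are moving between the worlds correctly.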

- Interpreting the output.

- P-values are under the Prob>ChiSq column.

- Main equation: logit(p) = B0 + B1 X.

- B1 = 0. No relationship between X and p.

- B1 > 0. As X goes up p goes up.

- B1 < 0. As X goes up p goes down.

- B1: for every one-unit increase in X, the ODDS that Y = 1
are multiplied by a factor of exp(B1).

- At X = -B0/B1 there is a 50% chance that Y = 1 (this is where
logit(p) = B0 + B1 X = 0; it requires B1 not equal to 0).
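
To make the coefficient rules concrete, here is a small Python sketch. The values of b0 and b1 are hypothetical, chosen only for illustration; they do not come from any output in these notes. The sketch checks that a one-unit increase in X multiplies the odds by exp(B1), and that the probability is 50% where the logit is zero, i.e. where B0 + B1 X = 0.

```python
import math

def odds(b0, b1, x):
    """Odds that Y = 1 at a given X: exp(B0 + B1 X)."""
    return math.exp(b0 + b1 * x)

def prob(b0, b1, x):
    """Probability that Y = 1 at a given X."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients, for illustration only
b0, b1 = -3.0, 0.5

# Ratio of the odds at X = 5 to the odds at X = 4 (a one-unit increase)
odds_ratio = odds(b0, b1, 5.0) / odds(b0, b1, 4.0)
print(odds_ratio, math.exp(b1))  # the two numbers agree

# X value at which the logit is zero, so p = 0.5
x_half = -b0 / b1
print(x_half, prob(b0, b1, x_half))
```

Because b1 is positive here, the probability is below 50% to the left of x_half and above 50% to the right of it; with a negative b1 the direction reverses.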

- Key calculation - based on the logistic regression output
calculate a probability. Example: Challenger output on p.306.

- logit(p) = 15.05 - 0.23 Temp.

- Find the probability that Y = 1 (at least one failure) at a
temperature of 31 degrees.

- logit(p) = 15.05 - 0.23 * 31

- logit(p) = 15.05 - 7.13 = 7.92.

- p = exp(logit(p))/(1 + exp(logit(p)))

- p = exp(7.92)/(1 + exp(7.92)) = 0.99964

- There is a 99.964 percent chance of at least one failure
at a temperature of 31 degrees.
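
The key calculation can be repeated for any temperature with a few lines of Python. This is a sketch, not part of the original notes: it uses the rounded coefficients 15.05 and -0.23 exactly as printed in the fitted equation, so the last digits may differ slightly from a calculation based on full-precision software output. The function name failure_probability and the temperatures other than 31 are illustrative choices.

```python
import math

B0 = 15.05   # intercept, as printed in the fitted equation
B1 = -0.23   # coefficient of Temp, as printed in the fitted equation

def failure_probability(temp):
    """P(at least one failure) at the given temperature."""
    z = B0 + B1 * temp                       # logit(p) = B0 + B1 * Temp
    return math.exp(z) / (1.0 + math.exp(z)) # back to the probability scale

# 31 degrees is the case worked in the notes; the others are illustrative
for temp in (31, 50, 65, 80):
    print(temp, round(failure_probability(temp), 4))

# Temperature at which the chance of at least one failure is 50%
print(-B0 / B1)
```

Since the coefficient of Temp is negative, the printed probabilities fall as the temperature rises, and the probability at 31 degrees comes out above 0.999.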

Richard Waterman
Wed Oct 9 22:50:58 EDT 1996