Class 9. Comparing group means.
What you need to have learnt from Class 8: categorical variables
in regression.
- Tests: the null hypothesis is always that the differences are
  zero, that is, no difference between the groups. Three types of test:
  (a) Are any slope or intercept differences non-zero? (b) Are any
  slope differences non-zero? (c) Are any intercept differences non-zero?
  - Are any of the slope or intercept differences non-zero? (That is,
    does adding the categorical variable and its interaction buy us
    any explanatory power?) Use the partial F-test. You have to calculate
    this one yourself; see p. 232 of the Bulk Pack.
  - Are any of the slope differences non-zero? That is, do we need
    separate slopes (an interaction term)? Use the partial F-test as
    given on the interaction term in the "Effect Test" table.
  - Are any of the intercept differences non-zero? Given that we don't
    need the interaction, do we need separate intercepts? Use the
    partial F-test as given on the categorical variable term in the
    "Effect Test" table from a model excluding the interaction.
- When you use a partial F-test to compare a BIG model against a LITTLE
  model, the BIG model must include all the variables in the LITTLE
  model for the comparison to be valid. (Technical term: the little
  model is nested in the big model.)
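The partial F-test described above can be sketched numerically. This is a minimal sketch in Python; the residual sums of squares and degrees of freedom below are hypothetical, and in practice they come from your regression output (e.g. JMP's ANOVA tables for the two models).

```python
def partial_f(rss_little, rss_big, df_little, df_big):
    """Partial F statistic for a LITTLE model nested in a BIG model.

    rss_*: residual sums of squares; df_*: residual degrees of freedom.
    The BIG model must contain every variable in the LITTLE model.
    """
    extra_ss = rss_little - rss_big   # improvement from the extra terms
    extra_df = df_little - df_big     # number of extra parameters
    return (extra_ss / extra_df) / (rss_big / df_big)

# Hypothetical numbers: LITTLE model RSS = 120 on 97 df,
# BIG model RSS = 100 on 95 df.
f_stat = partial_f(120.0, 100.0, 97, 95)
print(f_stat)  # about 9.5 for these made-up numbers
```

A large partial F says the extra terms in the BIG model explain enough additional variation to be worth keeping.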
New material for today: ANOVA.
- Objective: compare means (of a Y variable) across different groups.
  Example: Is CEO compensation different between sectors?
- A single continuous Y variable and one categorical X variable.
- Recognize: X (the group variable) is categorical.
- Conceptually different from regression:
  - Regression usually has a model-building and prediction objective.
  - ANOVA has a group-comparison objective; no model building.
- Two basic questions:
  - Are the group means all the same, or are some significantly
    different? Look in the overall ANOVA table to answer this. The
    analysis is done from the "Fit Y by X" button.
  - If some are different (the first test does not tell you which),
    follow up and refocus the question: compare the groups to one
    another. Which ones are significantly different? Various
    comparison procedures:
    - Compare each pair, one at a time. BAD.
    - Compare all pairs at once. GOOD. Tukey's procedure.
    - Compare each group with the best. GOOD. Hsu's procedure.
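The first basic question above boils down to one overall F statistic: the variation between the group means measured against the variation within the groups. A hand-rolled sketch with made-up data for three groups:

```python
def anova_f(groups):
    """Overall F statistic for a one-way ANOVA on a list of groups."""
    n = sum(len(g) for g in groups)                 # total observations
    k = len(groups)                                 # number of groups
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: how far each group mean
    # sits from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: spread of the observations
    # around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up data: the middle group clearly sits higher than the others.
groups = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [10.0, 11.0, 12.0]]
print(anova_f(groups))  # a large F suggests some group means differ
```

In practice JMP computes this F and its p-value for you in the overall ANOVA table; the sketch just shows where the number comes from.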
- Critical issue to understand: why is comparing each pair, one pair
  at a time, BAD? Read pp. 252-254 in the Bulk Pack.
- The procedure that compares each pair, one pair at a time (a
  two-sample t-test), fails to take into account the number of
  comparisons we are making. If we make many comparisons, then by
  chance alone we tend to see something significant. (If we buy many
  lottery tickets we tend to win the lottery, even though any single
  ticket is unlikely to win.) No fishing.
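The lottery-ticket effect can be put in rough numbers. Assuming, purely for illustration, that the pairwise tests were independent (pairwise t-tests on shared data are not truly independent, so this is only an approximation), the chance of at least one false positive grows quickly with the number of groups:

```python
from math import comb

def chance_of_false_positive(k_groups, alpha=0.05):
    """Approximate chance of >= 1 false positive among all
    pairwise two-sample t-tests, each run at level alpha,
    treating the tests as independent (an approximation)."""
    m = comb(k_groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (3, 5, 10):
    print(k, comb(k, 2), round(chance_of_false_positive(k), 2))
```

With 10 groups there are 45 pairs, and the chance of declaring at least one spurious difference is roughly 90%, even when no real differences exist.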
- We want a procedure that adjusts for the number of comparisons made
  and also recognizes that the comparisons may be data driven. Tukey's
  and Hsu's procedures do just this: they are multiple comparison
  procedures with honest Type I error rates. (Recall: a Type I error is
  saying there's a difference when really there is not.) Honest means
  that when they declare a 5% error rate, there is a 5% chance of one
  or more errors in the entire set of comparisons, NOT a 5% chance of
  any particular comparison being wrong.
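Tukey's and Hsu's procedures rely on special distributions (such as the studentized range), so as a simpler stand-in, this sketch uses the Bonferroni correction to illustrate what an honest familywise rate means: spread the 5% error budget across all comparisons so the whole family of tests has at most a 5% chance of one or more Type I errors.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison threshold keeping the familywise Type I
    error rate at or below family_alpha (Bonferroni correction)."""
    return family_alpha / n_comparisons

# With 10 pairwise comparisons, each test must clear a stricter 0.5% bar.
per_test = bonferroni_alpha(0.05, 10)

# Hypothetical p-values from three of the comparisons:
p_values = [0.003, 0.02, 0.04]
print([p <= per_test for p in p_values])  # only the first survives
```

Note how 0.02 and 0.04 would look "significant" one test at a time but do not survive the honest threshold.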
- Multiple comparison procedures achieve honesty by making it
  harder to declare a difference significant.
- Assumptions: the p-values are credible only if the assumptions
  hold. Check by graphing the residuals:
  - Independent errors.
  - Same variance in each group.
  - Approximately normal.
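In class the assumption checks are graphical; as a rough numeric complement, a common rule of thumb for the equal-variance assumption is that the largest group standard deviation be no more than about twice the smallest. A sketch with made-up data:

```python
from math import sqrt

def group_sd(g):
    """Sample standard deviation of one group."""
    m = sum(g) / len(g)
    return sqrt(sum((x - m) ** 2 for x in g) / (len(g) - 1))

# Made-up residual-like data for three groups.
groups = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [10.0, 11.0, 13.0]]
sds = [group_sd(g) for g in groups]
ratio = max(sds) / min(sds)
print(ratio < 2)  # True here: the group spreads look similar enough
```

If the ratio is much larger than 2, the equal-variance assumption, and hence the ANOVA p-values, become suspect; the residual plots remain the primary check.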
- Dealing with JMP output for multiple comparisons. Two choices,
  which lead to exactly the same conclusions:
  - Use the graphical output (circle clicking).
  - Use the table output (reading numbers).
Richard Waterman
Wed Oct 2 21:48:17 EDT 1996