Class 4

What you need to know from last time

*The standard error of the sample mean; $\sigma/\sqrt{n}$, estimated by $s/\sqrt{n}$. It measures how accurate the sample mean is as an estimate of the population mean. No confusion - always use the formula (don't use the standard deviation of sample means)
*The Central Limit Theorem
*Tracking sample means and standard deviations: x-bar and s-charts. Setting control limits
*Confidence intervals. For example, an approx 95% CI for $\mu$ is given by $\bar{x} \pm 2 s/\sqrt{n}$. The standard error, through the width of the confidence interval, conveys the accuracy of $\bar{x}$
*Using a confidence interval to make a decision
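
The approximate 95% interval from the recap can be computed directly. This is a minimal sketch, assuming the usual form of the interval (sample mean plus or minus two standard errors, with the standard error estimated as s divided by the square root of n). The function name `approx_95_ci` and the data values are hypothetical, chosen only for illustration.

```python
import math
import statistics

def approx_95_ci(sample):
    """Approximate 95% CI for the population mean: x-bar +/- 2 * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    se = s / math.sqrt(n)             # estimated standard error of the sample mean
    return x_bar - 2 * se, x_bar + 2 * se

# Hypothetical measurements, for illustration only.
data = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3, 9.9, 10.1]
lower, upper = approx_95_ci(data)
print(f"approx 95% CI: ({lower:.3f}, {upper:.3f})")
```

A narrower interval means a smaller standard error, which is the sense in which the interval conveys how accurate the sample mean is.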

Today's class - two themes

*Sampling
*Begin decision making - hypothesis testing


Sampling

*Context; there is a target population - the group you wish to make inferences about. You draw a sample. Use the sample to make statements about the population.
*Sample must be representative of the population
*Sampling is the way to obtain reliable information in a cost-effective way (why not a census?)
*Objective; collect information. Issues:
*What to measure?
*How accurate do we need it?
*How often do we need it?
*Does it meet end user requirements?

Issues in sampling

*Representativeness
*Interviewer discretion
*Respondent discretion - non-response
*Key question: is the reason for non-response related to the attribute you are trying to measure? Illegal aliens/Census. Start-up companies/not in phone book. Library exit survey.

*Good samples; probability samples; each unit in the population has a known probability of being in the sample
*Simplest case; equal probability sample, each unit has the same chance of being in the sample
*Bad samples - the rest, convenience samples

The utopian sample for analysis

*You have a complete and accurate list of ALL the units in the target population (sampling frame)
*From this you draw an equal probability sample (generate a list of random numbers)
*Reality check; incomplete frame, impossible frame, practical constraints on the simple random sample (cost and time of sampling)
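
Drawing an equal probability sample from a complete frame can be sketched as follows. This assumes the frame fits in a list and that sampling is without replacement. The unit labels, frame size, sample size, and the function name `simple_random_sample` are all hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each unit equally likely to be chosen."""
    rng = random.Random(seed)         # a seed makes the draw reproducible
    return rng.sample(frame, n)       # sampling without replacement

# Hypothetical sampling frame: a complete list of 400 units.
frame = [f"unit-{i:03d}" for i in range(1, 401)]
sample = simple_random_sample(frame, 20, seed=1997)
print(sample[:5])
```

With 20 draws from 400 units, each unit has a 20/400 = 5% chance of ending up in the sample. The reality check above still applies: the code is only as good as the frame it is given.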

Precision considerations

*How large a sample do I need? p.117
*Focus on the confidence interval - choose a coverage rate (90%, 95%, 99%) and a margin of error (half the width). Typically trade off width against coverage rate: for a fixed sample size, higher coverage means a wider interval.
*Simple rule of thumb for a population proportion - if it's a 95% CI then use n = 1/(margin of error)^2.
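
The rule of thumb follows from the approximate 95% interval for a proportion: the margin of error is about 2 * sqrt(p(1 - p)/n), and p(1 - p) is at most 1/4 (at p = 0.5), so the margin is at most 1/sqrt(n). Solving for n gives n = 1/(margin of error)^2. The sketch below assumes this worst-case reasoning. The function name and the optional `p` argument are hypothetical additions for illustration.

```python
import math

def sample_size_for_proportion(margin_of_error, p=0.5):
    """Sample size for an approx 95% CI on a proportion with the given margin.

    Uses n = 4 * p * (1 - p) / margin^2; the default p = 0.5 is the worst
    case and reduces to the rule of thumb n = 1 / margin^2.
    """
    n = 4 * p * (1 - p) / margin_of_error ** 2
    # Round first to guard against floating-point noise, then round up.
    return math.ceil(round(n, 6))

for margin in (0.10, 0.05, 0.03):
    print(f"margin {margin:.2f}: n = {sample_size_for_proportion(margin)}")
```

Halving the margin of error quadruples the required sample size, which is why tight margins get expensive quickly.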

More issues

*The independence assumption. No overlap of information between observations.
*Finite population correction. What if population size is 400 and you sampled 399 of them?
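
The effect of the finite population correction can be sketched numerically. This assumes the usual textbook form of the correction, in which the standard error is multiplied by sqrt((N - n)/(N - 1)) for a population of size N and a sample of size n drawn without replacement. The notes do not state the formula, so treat it as an assumption; the function names are hypothetical.

```python
import math

def fpc_factor(population_size, sample_size):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

def standard_error_with_fpc(s, sample_size, population_size):
    """Standard error of the sample mean, s / sqrt(n), scaled by the FPC factor."""
    return s / math.sqrt(sample_size) * fpc_factor(population_size, sample_size)

# Sampling 399 of 400 units: almost nothing is left unknown.
print(f"N=400, n=399: factor = {fpc_factor(400, 399):.4f}")
# Sampling 100 of a million units: the correction is negligible.
print(f"N=1000000, n=100: factor = {fpc_factor(1_000_000, 100):.4f}")
```

When the sample is nearly the whole population the factor shrinks toward zero, so the uncorrected formula would badly overstate the uncertainty. When the population is much larger than the sample the factor is close to 1 and can be ignored.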

Examples


survey1.jmp A hotel customer satisfaction survey.

survey2.jmp More on the survey.


Making decisions

Hypothesis testing

*Deciding on one of two choices
*Null hypothesis: status quo
*Alternative hypothesis: the opposite (complement) of the null
*Example; jury trial. Null is Innocent. Alternative is Guilty
*Note - one is taken as true a priori
*Decision based on collecting data - the jury votes. If all 12 jurors vote guilty then convict, else acquit and declare NOT GUILTY. Note, do not declare innocent!
*Two types of error
*Innocent, but declare guilty (null true but go with alternative - Type I)
*Guilty, but declare not guilty (alternative true but go with null - Type II)
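
The two error types can be illustrated with a small simulation on a statistical version of the same decision. This is a sketch under stated assumptions, not a rule given in these notes: the null is that the population mean equals `mu0`, and the null is rejected when the sample mean is more than two standard errors from `mu0`. The data are simulated normal values, and all names and parameter values are hypothetical.

```python
import math
import random
import statistics

def rejects_null(sample, mu0):
    """Reject the null (mean == mu0) when x-bar is more than 2 SEs from mu0."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return abs(statistics.mean(sample) - mu0) > 2 * se

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, trials=2000, seed=0):
    """Fraction of simulated samples for which the null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if rejects_null(sample, mu0):
            rejections += 1
    return rejections / trials

# Type I: the null is true (true mean is mu0) but we go with the alternative.
type_1_rate = rejection_rate(true_mean=0.0)
# Type II: the alternative is true (true mean is 0.5) but we stay with the null.
type_2_rate = 1 - rejection_rate(true_mean=0.5)
print(f"Type I rate: {type_1_rate:.3f}, Type II rate: {type_2_rate:.3f}")
```

Raising the cutoff above two standard errors makes a Type I error rarer but a Type II error more common, which mirrors the trade-off between convicting the innocent and acquitting the guilty.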

*Price of the errors? Which is worse? (think capital trial)
*What should error rates be?
*Beyond all reasonable doubt - very small chance of incorrectly declaring guilty - small chance of a Type I error
*The preponderance of the evidence
*Criminal vs. civil court - context, cost dependent.



Richard Waterman
Wed Aug 13 21:54:12 EDT 1997