Stat 601, Fall 2001, Class 2

What you need to know from last time

: Summary measures; mean, median, variance, sd, IQR
: Graphical summaries/diagnostics; histogram, boxplot, normal quantile plot
: If approx normal then can use empirical rule
: What is the Empirical rule?
: Often data is approx normal - but not always

Today's class

: Tracking sample means and standard deviations: x-bar and s-charts. Setting control limits
: The standard error of the mean; $\sigma/\sqrt{n}$
: The Central Limit Theorem
: Confidence intervals
: Using a confidence interval to make a decision
: Assumptions and their role in analysis
: Ideas behind sampling

Monitoring the mean and variance of a process

Example

ShaftXtr.jmp

Objective

Monitor a production process assuming observations are independent.

: Achieve this by placing control limits
: How to choose limits - can use empirical rule on sample means
: In control: mean and variance stable over time
: Capable: process meets specs
: E.R. needs to know s.d. of the sample means
: SD of ${\overline X} = \sigma/\sqrt{n}$ where n is number of observations in sample mean
: Can use overall sample mean +/- 3 $\sigma/\sqrt{n}$ as "3 sigma limits"
: Chance that a particular sample mean falls outside these limits when the process is in control is 1 - 0.997 = 0.003 (from the E.R.), i.e. small
: Unlikely events signal something is wrong -> take action
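
A minimal sketch of the 3-sigma limits described above, assuming the process s.d. is known. The function names and all numbers are hypothetical illustrations, not values from ShaftXtr.jmp.

```python
import math

def xbar_limits(center, sigma, n, k=3.0):
    """Return (lower, upper) k-sigma control limits for means of n observations."""
    se = sigma / math.sqrt(n)          # s.d. of a sample mean
    return center - k * se, center + k * se

def out_of_control(sample_means, lower, upper):
    """Return the positions of sample means that fall outside the limits."""
    return [i for i, m in enumerate(sample_means) if m < lower or m > upper]

# Hypothetical process: overall mean 100, s.d. 4, samples of size 4
lower, upper = xbar_limits(center=100.0, sigma=4.0, n=4)
flags = out_of_control([99.2, 101.5, 107.1, 100.3], lower, upper)
print(lower, upper, flags)
```

Any flagged position is a sample mean that would be very unlikely if the process were in control, so it is a prompt to investigate.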

Standard error of the mean

Sample means are less variable than raw data
SE( $\overline{X}$ ) = $\sigma/\sqrt{n}$ where $\sigma$ is the true s.d. of a single observation and n is the number of observations in the sample mean

The Central Limit Theorem

Sample means are approximately normally distributed. (see p.66 of CaseBook)

: E( $\overline{X}$ ) = $\mu$ .
: Var( $\overline{X}$ ) = $\sigma^2/n$ .
: s.d.( $\overline{X}$ ) = SE( $\overline{X}$ ) = $\sigma/\sqrt{n}$ .

Because sample means are approx. normal can use Empirical Rule on them.
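
A small simulation can make the three facts above concrete. This is only a sketch: the choice of an Exponential(1) population (skewed, with $\mu = 1$ and $\sigma = 1$), the sample size, the number of repetitions and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=601):
    """Draw `reps` samples of size n from Exponential(1) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

MU, SIGMA = 1.0, 1.0                   # true mean and s.d. of Exponential(1)
n = 25
means = simulate_sample_means(n=n, reps=20000)

se_theory = SIGMA / math.sqrt(n)       # sigma / sqrt(n)
se_observed = statistics.stdev(means)  # s.d. of the simulated sample means
within_2se = sum(abs(m - MU) <= 2 * se_theory for m in means) / len(means)

print(f"theory SE = {se_theory:.3f}, observed s.d. of means = {se_observed:.3f}")
print(f"fraction of means within 2 SE of mu = {within_2se:.3f}")
```

Even though single observations are far from normal here, the fraction of sample means within 2 standard errors of $\mu$ should come out close to the 95% the Empirical Rule suggests.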

Control charts

Two types

: X-bar chart; track sample means
: s-chart; track sample standard deviations

Setting the control limits - two ways (JMP gives choice);

: From the engineer; use their specs to create limits
: From the data; use overall sample mean and overall sample variance plus the Empirical Rule to create limits (typically 3-sigma)
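
The "from the data" route can be sketched as follows for the X-bar chart. Assumptions: all samples have the same size, the overall s.d. of the pooled observations stands in for $\sigma$ (reasonable only if the process is in control), and the measurements are made up. This is not a description of how JMP computes its limits.

```python
import math
import statistics

def limits_from_data(subgroups, k=3.0):
    """Estimate (lower, center, upper) X-bar chart limits from equal-sized samples."""
    n = len(subgroups[0])                          # observations per sample mean
    all_obs = [x for group in subgroups for x in group]
    center = statistics.fmean(all_obs)             # overall sample mean
    s = statistics.stdev(all_obs)                  # overall sample s.d.
    se = s / math.sqrt(n)                          # estimated s.d. of a sample mean
    return center - k * se, center, center + k * se

# Hypothetical measurements: three samples of four observations each
subgroups = [
    [9.9, 10.1, 10.0, 10.2],
    [10.0, 9.8, 10.1, 10.1],
    [10.3, 9.9, 10.0, 9.7],
]
lower, center, upper = limits_from_data(subgroups)
print(f"{lower:.3f} {center:.3f} {upper:.3f}")
```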

Three examples

ShaftXtr.jmp A well behaved process -- in control.

CarSeam.jmp A process that fails to meet engineers' specs.

CompChip.jmp A process that breaks down.

Notes

: S-charts are usually one-sided in manufacturing
: Dealing with miracles; someone has to win the lottery but the same person should not win it three times in a row. (p.63 of CaseBook)
: Daily means, weekly means, monthly means or WHAT? (p.79 of CaseBook)

Confidence intervals

What is it?

: 1. A range of feasible values for an unknown population parameter, e.g. $\mu$ or $\sigma^2$
: 2. A statement conveying the confidence that the range of feasible values really does include the unknown population value

Where does it come from?

: Inverting the Empirical rule
: If 95% of the time the sample mean is within +/- 2 standard errors from $\mu$ , then 95% of the time the true $\mu$ is within +/- 2 standard errors from the sample mean

Why is it important?

: Move away from a single "estimate" to a range of values, which is more realistic
: Get to make the meta-level statement - our confidence about the first statement

How do I use it to make a decision?

: Example, is 812 a feasible value for the true mean?
: Answer: look to see if 812 lies in the confidence interval
: If it's in the interval then it's a feasible value
: If it's outside the interval then it is not feasible
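
The decision rule above, sketched with an approximate 95% interval of the sample mean +/- 2 standard errors. The summary statistics are hypothetical and are not taken from ShaftXtr.jmp; the function names are illustrative only.

```python
import math

def mean_ci(xbar, s, n, z=2.0):
    """Approximate CI for the population mean: xbar +/- z standard errors."""
    se = s / math.sqrt(n)              # estimated standard error of the mean
    return xbar - z * se, xbar + z * se

def is_feasible(value, interval):
    """A value is feasible if it lies inside the confidence interval."""
    lower, upper = interval
    return lower <= value <= upper

# Hypothetical sample: mean 815, s.d. 10, n = 100
interval = mean_ci(xbar=815.0, s=10.0, n=100)
print(interval, is_feasible(812, interval))
```

With these made-up numbers 812 falls outside the interval, so it would not be judged a feasible value for the true mean.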

ShaftXtr.jmp A confidence interval for the population mean.

CompPur.jmp A confidence interval for the intent to purchase.

Sampling

Introduction

Context; there is a target population - the group you wish to make inferences about. You draw a sample. Use the sample to make statements about the population.

Sample must be representative of the population

Sampling is the way to obtain reliable information in a cost-effective way (why not a census?)

Objective; collect information. Issues:

: What to measure?
: How accurate do we need it?
: How often do we need it?
: Does it meet end user requirements?

Issues in sampling

Representativeness

: Interviewer discretion
: Respondent discretion - non-response
: Key question: is the reason for non-response related to the attribute you are trying to measure? Illegal aliens/Census. Start-up companies/not in phone book. Library exit survey.

Good samples;

: Probability samples; each unit in the population has a known probability of being in the sample
: Simplest case; equal probability sample, each unit has the same chance of being in the sample
: Bad samples - the rest, convenience samples

The utopian sample for analysis

: You have a complete and accurate list of ALL the units in the target population (sampling frame)
: From this you draw an equal probability sample (generate a list of random numbers)
: Reality check; incomplete frame, impossible frame, practical constraints on the simple random sample (cost and time of sampling)
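
In the utopian case, drawing the equal probability sample is a one-liner. A sketch, assuming the sampling frame is simply a list of unit identifiers; the frame contents, sample size and seed are invented for illustration.

```python
import random

def draw_srs(frame, n, seed=None):
    """Draw a simple random sample of n distinct units from the sampling frame."""
    rng = random.Random(seed)
    return rng.sample(frame, n)        # sampling without replacement

# Hypothetical frame of 200 units
frame = [f"unit-{i:03d}" for i in range(1, 201)]
sample = draw_srs(frame, n=10, seed=601)
print(sample)
```

Every unit in the frame has the same chance, n / len(frame), of ending up in the sample. The practical difficulties listed in the reality check are about obtaining the frame, not about this step.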

Precision considerations

: How large a sample do I need? p.117
: Focus on the confidence interval - choose the coverage rate (90%, 95%, 99%) and the margin of error (half the width). Typically trade off width against coverage rate.
: Simple rule of thumb for a population proportion - if it's a 95% CI then use $n = 1/(\text{margin of error})^2$.
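
A sketch of where the rule of thumb comes from, assuming the usual approximate 95% interval for a proportion, $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$. The margin is widest at $\hat{p} = 0.5$, where it equals $1/\sqrt{n}$; solving for n gives the rule. Function names are illustrative.

```python
import math

def sample_size_for_proportion(margin_of_error):
    """Rule-of-thumb n for a 95% CI on a proportion: 1 / (margin of error)^2."""
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(1.0 / margin_of_error ** 2, 6))

def worst_case_margin(n):
    """Half-width of the approximate 95% CI at p = 0.5, the widest case."""
    return 2.0 * math.sqrt(0.5 * 0.5 / n)

for me in (0.10, 0.05, 0.03):
    n = sample_size_for_proportion(me)
    print(f"margin {me:.2f} -> n = {n}, worst-case margin = {worst_case_margin(n):.4f}")
```

Because the rule uses the worst case, the actual margin of error will be somewhat smaller when the true proportion is far from 0.5.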

Examples

Survey1.jmp A hotel customer satisfaction survey.

2001-09-06