Stat 601, Fall 2001, Class 2
-
- Summary measures; mean, median, variance, sd, IQR
-
- Graphical summaries/diagnostics; histogram, boxplot, normal quantile plot
-
- If approx normal then can use empirical rule
-
- What is the Empirical rule?
-
- Often data is approx normal - but not always
-
- Tracking sample means and standard deviations: x-bar and s-charts. Setting control limits
-
- The standard error of the mean
-
- The Central Limit Theorem
-
- Confidence intervals
-
- Using a confidence interval to make a decision
-
- Assumptions and their role in analysis
-
- Ideas behind sampling
ShaftXtr.jmp
Monitor a production process assuming observations are independent.
-
- Achieve this by placing control limits
-
- How to choose limits - can use empirical rule on sample means
-
- In control: mean and variance stable over time
-
- Capable: process meets specs
-
- To apply the Empirical Rule (E.R.) to sample means, need to know the s.d. of the sample means
-
- SD of $\bar{X}$ is $\sigma/\sqrt{n}$, where n is the number of observations in the sample mean
-
- Can use overall sample mean +/- 3 $\sigma/\sqrt{n}$ as "3 sigma limits"
-
- The chance that a particular observation falls outside these limits when the
process is in control is 1 - 0.997 = 0.003 (from the Empirical Rule), i.e. small
-
- Unlikely events signal something is wrong -> take action
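
As a rough sketch of the limits described above, the following Python computes a center line plus and minus three standard errors and flags the sample means that fall outside. The function names and all numbers (sigma, n, the sample means, the center) are illustrative assumptions, not values from ShaftXtr.jmp, and sigma is treated as known.

```python
import math

def three_sigma_limits(center, sigma, n):
    """Return (lower, upper) limits: center +/- 3 * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)  # s.d. of a mean of n independent observations
    return center - 3 * se, center + 3 * se

def out_of_control(sample_means, lower, upper):
    """Indices of sample means falling outside the control limits."""
    return [i for i, m in enumerate(sample_means) if m < lower or m > upper]

# Hypothetical numbers, for illustration only.
sigma = 2.0      # assumed s.d. of a single observation
n = 4            # observations per sample mean
center = 10.0    # assumed overall mean
sample_means = [10.1, 9.6, 10.4, 9.9, 13.5, 10.0]

lower, upper = three_sigma_limits(center, sigma, n)
flagged = out_of_control(sample_means, lower, upper)
print(lower, upper, flagged)
```

With these made-up values the standard error is 1, so the limits are 7 and 13, and only the fifth sample mean is flagged.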
- Sample means are less variable than raw data
- SE($\bar{X}$) = $\sigma/\sqrt{n}$, where $\sigma$ is the true s.d. of a single observation and n is the number of observations in the sample mean
-
- Sample means are approximately normally distributed. (see p.66 of CaseBook)
-
- E($\bar{X}$) = $\mu$.
-
- Var($\bar{X}$) = $\sigma^2/n$.
-
- s.d.($\bar{X}$) = SE($\bar{X}$) = $\sigma/\sqrt{n}$.
-
- Because sample means are approx. normal can use Empirical Rule on them.
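
The standard error of a sample mean follows in one line if the observations $X_1, \dots, X_n$ are independent with a common variance $\sigma^2$ (the independence assumption made for the monitoring setup). A short derivation under those assumptions:

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\sigma^2}{n^2}
   = \frac{\sigma^2}{n}, \\
\operatorname{SE}(\bar{X})
  &= \sqrt{\operatorname{Var}(\bar{X})}
   = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

The second equality is where independence is used: without it, covariance terms would appear in the sum.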
-
- Two types
-
- X-bar chart; track sample means
-
- s-chart; track sample standard deviations
-
- Setting the control limits - two ways (JMP gives choice);
-
- From the engineer; use their specs to create limits
-
- From the data; use overall sample mean and overall sample variance
plus the Empirical Rule to create limits (typically 3-sigma)
Two examples
ShaftXtr.jmp A well behaved process -- in control.
CarSeam.jmp A process that fails to meet the engineers' specs.
CompChip.jmp A process that breaks down.
-
- S-charts are usually one-sided in manufacturing
-
- Dealing with miracles; someone has to win the lottery but the same person should not win it three times in a row. (p.63 of CaseBook)
-
- Daily means, weekly means, monthly means or WHAT? (p.79 of CaseBook)
-
- What is it?
-
- 1. A range of feasible values for an unknown population parameter, e.g.
a population mean $\mu$ or a population proportion $p$
-
- 2. A statement conveying the confidence that the range of feasible values really does include the unknown population value
-
- Where does it come from?
-
- Inverting the Empirical rule
-
- If 95% of the time the sample mean is within +/- 2 standard errors of $\mu$,
then 95% of the time the true $\mu$ is within +/- 2 standard errors of the sample mean
-
- Why is it important?
-
- Move away from a single "estimate" to a range of values, which is more realistic
-
- Get to make the meta-level statement - our confidence
about the first statement
-
- How do I use it to make a decision?
-
- Example, is 812 a feasible value for the true mean?
-
- Answer: look to see if 812 lies in the confidence interval
-
- If it's in the interval then it's a feasible value
-
- If it's outside the interval then it is not feasible
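
The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the interval is built as the sample mean +/- 2 estimated standard errors, with the sample s.d. standing in for the unknown true s.d.; the function names and the data are hypothetical and do not come from ShaftXtr.jmp.

```python
import math
import statistics

def approx_95_ci(data):
    """Approximate 95% CI for the mean: x-bar +/- 2 * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated standard error
    return xbar - 2 * se, xbar + 2 * se

def is_feasible(value, interval):
    """A value is feasible if it lies inside the confidence interval."""
    lower, upper = interval
    return lower <= value <= upper

# Hypothetical measurements, for illustration only.
data = [810, 814, 816, 812, 818, 815, 813, 817, 811]
interval = approx_95_ci(data)
print(interval, is_feasible(812, interval))
```

For this made-up sample the interval runs from roughly 812.2 to 815.8, so 812 would be judged not feasible, while 814 would be.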
ShaftXtr.jmp A confidence interval for the population mean.
CompPur.jmp A confidence interval for the intent to purchase.
-
- Context; there is a target population -
the group you wish to make inferences about. You draw a sample.
Use the sample to make statements about the population.
-
- Sample must be representative of the population
-
- Sampling is the way to obtain reliable information in a cost-effective
way (why not a census?)
-
- Objective; collect information. Issues:
-
- What to measure?
-
- How accurate do we need it?
-
- How often do we need it?
-
- Does it meet end user requirements?
-
- Representativeness
-
- Interviewer discretion
-
- Respondent discretion - non-response
-
- Key question: is the reason for non-response related to the
attribute you are trying to measure? Illegal aliens/Census.
Start-up companies/not in phone book. Library exit survey.
-
- Good samples; probability samples;
each unit in the population has a known
probability of being in the sample
-
- Simplest case; equal probability sample, each unit has the same
chance of being in the sample
-
- Bad samples - the rest, convenience samples
-
- You have a complete and accurate list of ALL the units in the
target population (sampling frame)
-
- From this you draw an equal probability sample (generate a list of
random numbers)
-
- Reality check; incomplete frame, impossible frame, practical
constraints on the simple random sample (cost and time of sampling)
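
In the idealised setting above, drawing an equal probability sample from a frame is a short exercise. A minimal sketch in Python, assuming the frame is simply a list of unit IDs; the frame, the sample size, and the seed are hypothetical.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Draw k distinct units from the frame, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, k)

# Hypothetical sampling frame: unit IDs 1..1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=601)
print(sorted(sample)[:5])
```

Sampling is without replacement, so no unit appears twice; the practical difficulties noted above (an incomplete or impossible frame) are not addressed by the code.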
-
- How large a sample do I need? p.117
-
- Focus on the confidence interval - choose a coverage rate (90%, 95%,
99%) and a margin of error (half the width). Typically trade off width
against coverage rate.
-
- Simple rule of thumb for a population proportion - if it's a 95% CI then use n = 1/(margin of error)**2.
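
The rule of thumb can be motivated as follows: the 95% margin of error for a proportion is about 2*sqrt(p(1-p)/n), and since p(1-p) is at most 1/4 this is at most 1/sqrt(n); solving for n gives n = 1/(margin of error)**2. A small Python sketch of the calculation, with a hypothetical function name and rounding up to a whole number of respondents:

```python
import math

def sample_size_95(margin_of_error):
    """Rule-of-thumb n for a 95% CI on a proportion: n = 1 / margin**2."""
    if margin_of_error <= 0:
        raise ValueError("margin of error must be positive")
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(1.0 / margin_of_error**2, 6))

for moe in (0.10, 0.05, 0.03):
    print(moe, sample_size_95(moe))
```

Because the bound uses the worst case p = 1/2, the resulting n is conservative when the true proportion is far from one half.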
Survey1.jmp A hotel customer satisfaction survey.
2001-09-06