Stat 601, Fall 2001, Class 1
- Premise: all business becomes information driven
- Competitiveness: how you collect and exploit information to your advantage
- The challenges
- Most corporate data systems are not ready.
- Can they share information?
- What is the quality of the information going in?
- Most data techniques come from the empirical sciences; the world is not a lab.
- Cutting through vendor hype, info-topia.
- Defining metrics, abandoning gut rules of thumb is not a safe path for the manager.
- Communicating success, setting the right expectations.
- Recognize where and how information analysis can feed into your business
- Strategy driven by hard information?
- Change emphasis toward interpretation and practical application
- The importance of graphics in informing analyses
- Enjoy it
- Course material
- Grading/assessment
- TA's and office hours??
- Evaluations
- Computing
Metaphor; the spoken language, not the grammar.
- Material
- Classes 1-4. Understanding/measuring variability. Why it is important. Factor in variability/uncertainty to the decision making process
- Who cares? The Basel Accord
- Risky investments need higher reserves
- Need to measure risk, e.g. J.P. Morgan
- Risk == volatility of returns
- Volatility == variability
- The Four Book average - smoothing to reduce noise
- Classes 5-10. Regression/statistical modeling/forecasting/explaining variability
- Models
- Stock market
- Market share
- Real estate prices
- What's different? Our models explicitly incorporate variability: we don't just get to model the process, we get to say how good the model is. Meta-information: statements about the quality of information.
Value added - the idea of precision.
- What to get out of the course
- Perform statistical analysis - hands on
- No stats background - not math based
- PRACTICAL APPLIED MODERN STATISTICS
- Success in the course
- Learn the right questions to ask
- Critical evaluation of another's analysis
- Confidence to perform analysis/use tools
- Presentation and communication of results
- Guarantee: you will be faced with more data, not less.
This course is about evaluating, summarizing and leveraging information
- Key concept: Summarizing data
- Key tool: the Empirical rule
- Key graphics: Histogram and boxplot
- Key concept: Exploring data for sources of variation
- Key graphics: Multiple boxplots + time series charts
Graphic | Use
Box plot | Identification of outliers
Histogram | Shape of data, skewness, outliers
Normal quantile plot | Diagnostic for normality

 | CENTER | SPREAD
Sensitive to outliers | Mean | Variance/SD
Robust | Median | IQR
- Mean = average. True $\mu$, estimated $\bar{x}$
- Median = order the data, the one in the middle. Not standard.
- Variance = average squared distance from the mean. True $\sigma^2$, estimated $s^2$
- S.D. = $\sqrt{\text{variance}}$. True $\sigma$, estimated $s$
- IQR = 75th pctile - 25th pctile. Not standard.
- Symmetric bell shaped; mean $\approx$ median
- Right skew; mean greater than median
- Left skew; mean smaller than median
- Symmetric bell shaped - good news.
- Skewness - watch out!
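The summary measures above can be computed with Python's standard `statistics` module. This is a minimal sketch, not part of the course material: the data values are made up for illustration, and the helper name `summarize` is hypothetical. Note that `statistics.variance` divides by n - 1 (the estimate $s^2$), and that quartile conventions differ between packages, so the IQR may not match other software exactly.

```python
import statistics

def summarize(data):
    """Return center and spread measures for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),          # sensitive to outliers
        "median": statistics.median(data),      # robust
        "variance": statistics.variance(data),  # s^2, divides by n - 1
        "sd": statistics.stdev(data),           # s = sqrt(s^2)
        "iqr": q3 - q1,                         # robust
    }

# Made-up data: a roughly symmetric batch, then the same batch
# with one large value added to create right skew.
clean = [2, 4, 4, 5, 6, 7, 8]
with_outlier = clean + [100]

before = summarize(clean)
after = summarize(with_outlier)

for name in ("mean", "median", "sd", "iqr"):
    print(f"{name:>8}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

In this made-up batch a single large value pulls the mean well above the median and inflates the SD, while the median and IQR barely move. That is the right-skew pattern (mean greater than median) and the robust vs. sensitive distinction in one picture.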
If the data are bell shaped and symmetric, then we say they are approximately normal.
Key: the mean and standard deviation summarize the data efficiently in
these circumstances.
The EMPIRICAL RULE applies when the data are approximately normal.
Rule of thumb for normal data - it ties together the mean and standard
deviation ($\mu$ and $\sigma$) into a rule that establishes where most of the data should lie. If an observation is outside this range then it's an
``atypical'' observation; in J.P. Morgan's terminology an adverse market move.
Special one: $\mu \pm 1.65\sigma$ gives a 10% chance of
falling out of the range. That is 5% on each side (tail), one in 20 times
we see the lower event, about 1 trading day a month.
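The coverage figures behind the empirical rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper names (`coverage`, `typical_range`, `is_atypical`), the example daily return figures, and the figure of 21 trading days per month are assumptions made for illustration, not values from the course.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, s.d. 1

def coverage(k):
    """Probability that a normal observation lies within k s.d. of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

def typical_range(mean, sd, k):
    """Interval mean +/- k * sd."""
    return (mean - k * sd, mean + k * sd)

def is_atypical(x, mean, sd, k):
    """True if x falls outside mean +/- k * sd."""
    low, high = typical_range(mean, sd, k)
    return x < low or x > high

for k in (1, 1.65, 2, 3):
    print(f"within {k} s.d.: {coverage(k):.3f}")

# Lower tail only: chance of falling below mean - 1.65 s.d.
lower_tail = STANDARD_NORMAL.cdf(-1.65)

# Assumption: roughly 21 trading days in a month.
TRADING_DAYS_PER_MONTH = 21
print(f"expected lower-tail days per month: {lower_tail * TRADING_DAYS_PER_MONTH:.2f}")

# Hypothetical daily returns: mean 0, s.d. 1%. Is a -4% day atypical?
print(is_atypical(-0.04, mean=0.0, sd=0.01, k=1.65))
```

Under the normal assumption, the 1.65 s.d. range leaves about 5% in each tail, which over an assumed 21 trading days works out to roughly one lower-tail day a month. The calculation is only as good as the normality assumption; for skewed data it can be badly off.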
- Variables that explain structure in the data
- Segmentation/Aggregation in marketing
- The ``style'' of a portfolio manager
- Trends over time
- Looking for leads in the data to explain its structure
- Characterize the good prospects
- Capable process: meets design specs - engineering view.
- In control process: no trend in mean or variance - statistical view.
- In control - more of a monitoring concept
- An excellent way to view data, broken out by a second variable, e.g. sales by region, or employee evaluation by age.
- Look for: differences between the medians, and differences in length, center and spread.
- 1. Always, always plot your data
- 2. If it's recorded against time, plot it against time
- Summary measures
- Robust vs. Sensitive
- Empirical rule for mound shaped and symmetric data
- Ties together mean and s.d. to help define an ``unusual event''
- Disparate data may be approx. normal, e.g. GMAT and GM
- But not ALL data is normal, e.g. Eisner's compensation.
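The multiple-boxplot idea (data broken out by a second variable, such as sales by region) can be approximated numerically by computing the numbers each box would display, group by group. This is a rough sketch using only the standard library; the function name `boxplot_stats_by_group`, the region names, and the sales figures are all hypothetical. Quartile conventions vary between packages, so a plotting tool may draw slightly different boxes.

```python
import statistics
from collections import defaultdict

def boxplot_stats_by_group(records):
    """Group (label, value) pairs and return per-group boxplot numbers."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)

    summary = {}
    for label, values in groups.items():
        # Each group needs at least two values for quantiles() to work.
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary[label] = {
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
            "iqr": q3 - q1,  # length of the box
        }
    return summary

# Hypothetical sales figures by region.
sales = (
    [("East", v) for v in (10, 12, 14, 16, 18, 20, 22)]
    + [("West", v) for v in (20, 25, 30, 35, 40, 45, 50)]
)

by_region = boxplot_stats_by_group(sales)
for region, stats in by_region.items():
    print(f"{region}: median={stats['median']}, iqr={stats['iqr']}")
```

Comparing the medians shows a difference in center between groups, and comparing the IQRs shows a difference in spread. These are the same comparisons one makes by eye across side-by-side boxplots, and they do not replace actually plotting the data.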
2001-09-06