Stat 601, Fall 2001, Class 1
- Premise: all business becomes information driven
- Competitiveness: how you collect and exploit information to your advantage
- The challenges
- Most corporate data systems are not ready.
- Can they share information?
- What is the quality of the information going in?
- Most data techniques come from the empirical sciences; the world is not a lab.
- Cutting through vendor hype, info-topia.
- Defining metrics and abandoning gut rules of thumb is not a safe path for the manager.
- Communicating success, setting the right expectations.
- Recognize where and how information analysis can feed into your business
- Strategy driven by hard information?
- Change emphasis toward interpretation and practical application
- The importance of graphics in informing analyses
- Enjoy it
- Course material
- Grading/assessment
- TAs and office hours?
- Evaluations
- Computing. Metaphor: the spoken language, not the grammar.
- Material
- Classes 1-4: understanding and measuring variability, why it is important, and how to factor variability/uncertainty into the decision-making process
- Who cares? The Basel Accord
- Risky investments need higher reserves
- Need to measure risk, e.g. J.P. Morgan
- Risk == volatility of returns
- Volatility == variability
- The Four Book average: smoothing to reduce noise
- Classes 5-10: regression, statistical modeling, forecasting, explaining variability
- Models
- Stock market
- Market share
- Real estate prices
- What's different? Our models explicitly incorporate variability: we don't just get to model the process, we also get to say how good the model is. Meta-information: statements about the quality of information.
Value added: the idea of precision.
- What to get out of the course
- Perform statistical analysis, hands on
- No stats background needed; not math based
- PRACTICAL APPLIED MODERN STATISTICS
- Success in the course
- Learn the right questions to ask
- Critical evaluation of another's analysis
- Confidence to perform analysis/use tools
- Presentation and communication of results
- Guarantee: you will be faced with more data, not less.
This course is about evaluating, summarizing and leveraging information.
- Key concept: Summarizing data
- Key tool: the Empirical rule
- Key graphics: Histogram and boxplot
- Key concept: Exploring data for sources of variation
- Key graphics: Multiple boxplots + time series charts
Graphic              | Use
Box plot             | Identification of outliers
Histogram            | Shape of data, skewness, outliers
Normal quantile plot | Diagnostic for normality

                      | CENTER | SPREAD
Sensitive to outliers | Mean   | Variance/SD
Robust                | Median | IQR
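To see the table's robust vs. sensitive split in numbers, here is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical, chosen only to show the effect: one extreme value drags the mean and SD a long way, while the median and IQR barely move. Note that `statistics.quantiles` uses one particular percentile convention (its default "exclusive" method), so other software may report slightly different quartiles.

```python
import statistics

# Hypothetical data, for illustration only: seven values plus one extreme outlier
clean = [10, 11, 12, 12, 13, 14, 15]
with_outlier = clean + [100]

def iqr(data):
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(
        f"{label:>12}: "
        f"mean={statistics.mean(data):.2f} sd={statistics.stdev(data):.2f} "  # sensitive
        f"median={statistics.median(data):.2f} iqr={iqr(data):.2f}"           # robust
    )
```

In this made-up sample the median moves from 12 to 12.5 and the IQR from 3 to 3.5, while the mean nearly doubles and the SD grows by more than a factor of ten.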
- Mean = average. True $\mu$, estimated $\bar{x}$.
- Median = order the data and take the one in the middle. No standard notation.
- Variance = average squared distance from the mean. True $\sigma^2$, estimated $s^2$.
- S.D. = $\sqrt{\text{Variance}}$. True $\sigma$, estimated $s$.
- IQR = 75th percentile - 25th percentile. No standard notation.
- Symmetric bell shaped: mean $\approx$ median.
- Right skew: mean greater than median.
- Left skew: mean smaller than median.
- Symmetric bell shaped: good news.
- Skewness: watch out!
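The definitions above can be sketched with Python's standard `statistics` module, assuming the sample sits in a plain list. The `shape` helper and its `tol` cutoff are hypothetical, not from the notes: it only turns the mean-versus-median comparison into a rough label, which is no substitute for looking at the histogram.

```python
import statistics

def describe(data):
    """Sample versions of the summary measures listed above."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # 25th and 75th percentile cut points
    return {
        "mean": statistics.mean(data),            # x-bar, estimates mu
        "median": statistics.median(data),
        "variance": statistics.variance(data),    # s^2 (divides by n - 1)
        "sd": statistics.stdev(data),             # s = sqrt(s^2)
        "iqr": q3 - q1,
    }

def shape(data, tol=0.1):
    """Crude skewness check: compare mean and median.

    tol is an assumed cutoff: a mean-median gap smaller than tol * sd
    is treated as roughly symmetric.
    """
    d = describe(data)
    gap = d["mean"] - d["median"]
    if abs(gap) <= tol * d["sd"]:
        return "roughly symmetric"
    return "right skew" if gap > 0 else "left skew"

# Hypothetical samples, for illustration only
print(shape([1, 1, 2, 2, 3, 4, 10]))   # one long value on the right
print(shape([1, 7, 8, 8, 9, 9]))       # one long value on the left
print(shape([1, 2, 3, 4, 5]))
```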
If the data is bell shaped and symmetric, we say it is approximately normal.
Key: the mean and standard deviation summarize the data efficiently in
these circumstances.
The EMPIRICAL RULE applies when the data is approximately normal.
It is a rule of thumb for normal data: it ties together the mean and standard
deviation ($\bar{x}$ and $s$) into a rule that establishes where most of the data should lie (roughly 68% within $\bar{x} \pm s$ and 95% within $\bar{x} \pm 2s$). If an observation falls outside the range where most of the data should lie, it is an
"atypical" observation; in J.P. Morgan's terminology, an adverse market move.
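A minimal sketch of the empirical rule, assuming Python 3.8+ for `statistics.NormalDist`. The first part computes how much of a normal distribution lies within one, two and three SDs of the mean. The `atypical` helper and its default cutoff of two SDs are illustrative assumptions, and the sample is hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Share of a normal distribution lying within k standard deviations of the mean
std_normal = NormalDist()  # mean 0, sd 1
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within mean +/- {k} sd: {share:.1%}")

def atypical(data, k=2):
    """Return observations outside x-bar +/- k*s (k=2 is an assumed cutoff)."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) > k * s]

# Hypothetical sample, for illustration only
print(atypical([10, 11, 12, 12, 13, 14, 15, 40]))
```

With small samples, one extreme value also inflates $s$ itself, so a cross-check with the robust median and IQR is worth doing.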
Special one: $\bar{x} \pm 1.65\,s$ gives a 10% chance of
falling outside the range. That is 5% on each side (tail): one in 20 times
we see the lower event, about 1 trading day a month.
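The 1.65 arithmetic can be checked against the standard normal distribution. In the sketch below, the daily mean return, the daily SD and the figure of about 21 trading days a month are all hypothetical assumptions for illustration, not numbers from the notes or from J.P. Morgan.

```python
from statistics import NormalDist

Z = 1.65
lower_tail = NormalDist().cdf(-Z)   # chance of falling below x-bar - 1.65 s
both_tails = 2 * lower_tail         # chance of falling outside x-bar +/- 1.65 s
print(f"lower tail: {lower_tail:.4f}, both tails: {both_tails:.4f}")

# Hypothetical daily-return summary, for illustration only
mean_return = 0.0005   # x-bar: 0.05% per day (assumed)
sd_return = 0.01       # s: 1% per day, the "volatility" (assumed)
threshold = mean_return - Z * sd_return
print(f"adverse move: daily return below {threshold:.2%}")

# Assumes roughly 21 trading days in a month
print(f"expected adverse days per month: {21 * lower_tail:.2f}")
```

The lower tail comes out just under 5%, so with about 21 trading days a month one adverse day a month is the expected rate.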
- Variables that explain structure in the data
- Segmentation/Aggregation in marketing
- The "style" of a portfolio manager
- Trends over time
- Looking for leads in the data to explain its structure
- Characterize the good prospects
- Capable process: meets design specs (engineering view).
- In control process: no trend in mean or variance (statistical view).
- In control: more of a monitoring concept.
- Multiple boxplots: an excellent way to view data broken out by a second variable, e.g. sales by region, or employee evaluation by age.
- Look for differences between the medians, and differences in length, center and spread.
1. Always, always plot your data.
2. If it's recorded against time, plot it against time.
- Summary measures
- Robust vs. sensitive
- Empirical rule for mound-shaped and symmetric data
- Ties together mean and s.d. to help define an "unusual event"
- Disparate data may be approximately normal, e.g. GMAT scores and GM
- But not ALL data is normal, e.g. Eisner's compensation.
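Boxplots broken out by a second variable (sales by region, in the notes' example) are drawn from each group's median and quartiles. Here is a minimal numeric sketch of those ingredients using only Python's standard library; the `group_summary` helper and the region/sales records are hypothetical. It does not draw the plot; it only shows what to compare between groups: the medians and the spread.

```python
from collections import defaultdict
import statistics

def group_summary(records):
    """Median and IQR of each group: the numbers behind side-by-side boxplots."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    summary = {}
    for key, values in sorted(groups.items()):
        q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
        summary[key] = {"median": statistics.median(values), "iqr": q3 - q1}
    return summary

# Hypothetical (region, sales) records, for illustration only
records = [
    ("East", 120), ("East", 135), ("East", 128), ("East", 150), ("East", 131),
    ("West", 90), ("West", 160), ("West", 105), ("West", 175), ("West", 98),
]
for region, info in group_summary(records).items():
    print(f"{region}: median={info['median']:.1f}, IQR={info['iqr']:.1f}")
```

In this made-up data West has the lower median (105 vs. 131) but a far wider spread (IQR 73.5 vs. 18.5), exactly the kind of difference side-by-side boxplots make visible.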
2001-09-06