Class 2
What you need to know from last time
Summary measures: mean, median, variance, SD, IQR
Graphical summaries/diagnostics: histogram, boxplot, normal quantile plot
If approx normal then can use empirical rule
What is the Empirical rule?
Often data is approx normal - but not always
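As a reminder, the empirical rule says that for approximately normal data about 68% of observations lie within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD. The short Python sketch below (our own illustration, not course software; the function name normal_within_k_sd is hypothetical) recovers those numbers from the standard normal distribution using the identity P(|Z| <= k) = erf(k / sqrt(2)).

```python
import math

def normal_within_k_sd(k: float) -> float:
    """P(-k <= Z <= k) for a standard normal Z."""
    # For a standard normal, P(|Z| <= k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {normal_within_k_sd(k):.3f}")
```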
Two parts to today's class
Covariance and correlation
Tracking means and variances
Covariance and correlation
Summary measures for 2 variables.
Covariance - measuring the linear relationship
Correlation - measuring the linear relationship on a unitless scale: -1 <= Correlation <= 1
-1 is perfect negative dependence
+1 is perfect positive dependence
When X and Y are jointly (bivariate) normal, Correlation = 0 is equivalent to independence
Can compare two correlations, but not typically two covariances.
Corr(X,Y) = Cov(X,Y)/Sqrt{Var(X) * Var(Y)}
Cov(X,Y) = Corr(X,Y) * Sqrt{Var(X) * Var(Y)}
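A minimal Python sketch of these two formulas on sample data, using the n - 1 divisor for the sample covariance. The returns below are made-up numbers for illustration, not the StockRet.jmp data, and the function names are hypothetical. Rescaling one variable (say, returns in percent instead of fractions) changes the covariance but not the correlation, which is why correlations can be compared and covariances usually cannot.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Corr(X,Y) = Cov(X,Y) / Sqrt{Var(X) * Var(Y)}; Var(X) is Cov(X,X)."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

# Hypothetical returns, made up for illustration
x = [0.02, -0.01, 0.03, 0.00, 0.01]
y = [0.01, 0.00, 0.02, -0.01, 0.01]

c = sample_cov(x, y)
r = sample_corr(x, y)
# Recover the covariance from the correlation (second formula)
c_back = r * math.sqrt(sample_cov(x, x) * sample_cov(y, y))
print(c, r, c_back)

# Same X returns in percent: covariance scales by 100, correlation is unchanged
x_pct = [100 * v for v in x]
print(sample_cov(x_pct, y), sample_corr(x_pct, y))
```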
Key application: building low risk portfolios.
Idea: buying instruments that move in opposite directions can lower portfolio variability dramatically
Example
StockRet.jmp
Theory - population quantities - true but unknown
Practice - use sample statistics
Finance arithmetic
Average of sum is sum of averages
X is return on IBM, Y is return on Walmart
Portfolio is X + Y
E[X + Y] = E[X] + E[Y]
Variance of sum is sum of variances only if X and Y are uncorrelated
Var(X + Y) = Var(X) + Var(Y)
In general, variance of sum is sum of variances PLUS 2 * the sum of the covariances over all pairs
Var(X + Y) = Var(X) + Var(Y) + 2 * Cov(X,Y)
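To check these identities numerically, here is a Python sketch on made-up paired returns (hypothetical values, not from StockRet.jmp). With the sample versions of mean, variance and covariance (n - 1 divisor throughout), both identities hold exactly, up to floating-point rounding, for any data set.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_var(xs):
    # Variance is the covariance of a variable with itself
    return sample_cov(xs, xs)

x = [0.02, -0.01, 0.03, 0.00, 0.01]    # hypothetical returns on X
y = [-0.01, 0.02, -0.02, 0.01, 0.005]  # hypothetical returns on Y, tends to move against X
s = [a + b for a, b in zip(x, y)]      # portfolio X + Y

print(mean(s), mean(x) + mean(y))
print(sample_var(s), sample_var(x) + sample_var(y) + 2 * sample_cov(x, y))
# What you would get by wrongly ignoring the (negative) covariance
print(sample_var(x) + sample_var(y))
```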
All details in course pack
Toy example:
Two instruments X and Y.
Make a portfolio, with weights w1 and w2 = 1 - w1.
Say Var(X) = Var(Y) = 1.
How does the portfolio variance change with w1 and the correlation rho = Corr(X,Y)?
Download the image or copy it in class.
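For the toy example, the rules above extend to weighted sums: Var(w1 X + w2 Y) = w1^2 Var(X) + w2^2 Var(Y) + 2 w1 w2 Cov(X,Y). With Var(X) = Var(Y) = 1 the covariance equals the correlation, written rho here, so the portfolio variance is w1^2 + (1 - w1)^2 + 2 w1 (1 - w1) rho. The Python sketch below (function name hypothetical) tabulates this over a grid of w1 for a few values of rho.

```python
def portfolio_variance(w1: float, rho: float) -> float:
    """Variance of w1*X + (1 - w1)*Y when Var(X) = Var(Y) = 1 and Corr(X,Y) = rho."""
    w2 = 1.0 - w1
    # Var(w1 X + w2 Y) = w1^2 Var(X) + w2^2 Var(Y) + 2 w1 w2 Cov(X,Y), with Cov = rho here
    return w1**2 + w2**2 + 2 * w1 * w2 * rho

for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    row = [round(portfolio_variance(w1 / 10, rho), 3) for w1 in range(11)]
    print(f"rho={rho:+.1f}: {row}")
```

At w1 = 0.5 the variance falls from 1 (rho = +1) through 0.5 (rho = 0) to 0 (rho = -1). With rho = +1 the variance is 1 for every w1, since w1^2 + w2^2 + 2 w1 w2 = (w1 + w2)^2 = 1, so diversification helps only when the instruments are less than perfectly correlated.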
Part 2 of class
Monitoring the mean and variance of a process
Example
ShaftDia.jmp
Objective
Monitor a production process assuming observations are independent.
Achieve this by placing control limits
How to choose limits - can use empirical rule on sample means
Sample means are approx normal
(central limit theorem -- more later)
In control: mean and variance stable over time
Capable: process meets specs
E.R. needs to know the SD of the sample means
SD of the sample mean = sigma / Sqrt{n}, where sigma is the SD of a single observation
and n is the number of observations in each sample mean
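A quick simulation makes the formula SD(sample mean) = sigma / Sqrt{n} concrete. This Python sketch assumes normally distributed observations with a made-up mean of 10 and sigma = 2, draws many samples of size n = 4, and compares the SD of the resulting sample means with sigma / Sqrt{n} = 1. The function name and parameter values are illustrative assumptions.

```python
import random
import statistics

def sd_of_sample_means(mu, sigma, n, reps=20000, seed=1):
    """Simulate `reps` sample means of size n from Normal(mu, sigma); return their SD."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

sigma, n = 2.0, 4
simulated = sd_of_sample_means(10.0, sigma, n)
theory = sigma / n ** 0.5
print(f"simulated SD of means: {simulated:.3f}, sigma / sqrt(n): {theory:.3f}")
```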
Can use overall sample mean +/- 3 * sigma / Sqrt{n}
as "3 sigma limits"
Chance that a particular sample mean falls outside these limits when the process is
in control is 1 - .997 = .003 (from E.R.), i.e. small
Unlikely events signal something is wrong -> take action
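Here is a minimal Python sketch of 3-sigma limits for sample means. It assumes fixed-size subgroups, sets the center line at the overall mean of a baseline period believed to be in control, and estimates sigma by pooling the within-subgroup variances; that estimator is our assumption, and textbook control charts often use range-based estimates instead. The diameters are made-up numbers (not ShaftDia.jmp), and xbar_limits and flag_out_of_limits are hypothetical names.

```python
import math
import statistics

def xbar_limits(baseline):
    """Return (LCL, center, UCL): overall mean +/- 3 * sigma_hat / sqrt(n)."""
    n = len(baseline[0])
    if n < 2 or any(len(g) != n for g in baseline):
        raise ValueError("subgroups must all have the same size n >= 2")
    center = statistics.fmean(statistics.fmean(g) for g in baseline)
    # Pooled within-subgroup SD: square root of the average subgroup variance
    sigma_hat = math.sqrt(statistics.fmean(statistics.variance(g) for g in baseline))
    half_width = 3 * sigma_hat / math.sqrt(n)
    return center - half_width, center, center + half_width

def flag_out_of_limits(subgroups, limits):
    """Indices of subgroups whose sample mean falls outside [LCL, UCL]."""
    lcl, _, ucl = limits
    return [i for i, g in enumerate(subgroups)
            if not lcl <= statistics.fmean(g) <= ucl]

# Hypothetical shaft diameters, four measurements per subgroup
baseline = [
    [20.01, 19.99, 20.00, 20.02],
    [19.98, 20.00, 20.01, 19.99],
    [20.00, 20.02, 19.99, 20.01],
    [20.01, 20.00, 19.98, 20.00],
]
new = [
    [20.00, 20.01, 19.99, 20.00],  # looks in control
    [20.10, 20.12, 20.09, 20.11],  # mean has shifted upward
]

limits = xbar_limits(baseline)
print("LCL, center, UCL:", [round(v, 4) for v in limits])
print("flagged new subgroups:", flag_out_of_limits(new, limits))
```

Only the shifted subgroup is flagged: under in-control conditions a mean that far from the center line would be very unlikely, which is the signal to take action.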
Richard Waterman
Wed Aug 6 22:51:49 EDT 1997