Last time we discussed Q-Q plots, which plot OBSERVED quantiles against EXPECTED quantiles. As with most diagnostics, when the model holds, observed and expected should be about equal, so in a Q-Q plot we hope to see points that fall approximately along a straight line.
But what exactly is a quantile? Imagine taking lots of data, enough to be able to draw an extremely detailed histogram. Now "normalize" the histogram so that the area beneath it equals one. The function that describes the shape of the normalized histogram is called a probability density function (PDF), conventionally denoted f(x). The probability density function for the standard normal distribution looks like this:
The cumulative distribution function (CDF), often denoted F(x), gives the area under the density curve to the left of x. For the standard normal distribution it looks like this:
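To make the link between the two curves concrete, here is a minimal sketch that adds up the area under the standard normal density to the left of a point and compares it with the CDF. It uses `NormalDist` from Python's standard library; the helper `area_left_of`, its lower cutoff of -10, and the step count are illustrative choices, not part of the original discussion.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution


def area_left_of(x, lower=-10.0, steps=20_000):
    """Approximate the area under the density between `lower` and x (trapezoid rule).

    `lower` stands in for minus infinity; the density is negligible that far out.
    """
    width = (x - lower) / steps
    total = 0.5 * (std_normal.pdf(lower) + std_normal.pdf(x))
    for i in range(1, steps):
        total += std_normal.pdf(lower + i * width)
    return total * width


for x in (-1.0, 0.0, 1.0, 2.0):
    print(
        f"x = {x:+.1f}  f(x) = {std_normal.pdf(x):.4f}  "
        f"area to the left = {area_left_of(x):.4f}  F(x) = {std_normal.cdf(x):.4f}"
    )
```

The last two columns should agree to the printed precision, which is just another way of saying that the CDF is the accumulated area under the PDF.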
Now the quantiles come about by looking at the CDF graph the other way around. Rather than asking "how much area is there to the left of x?", the quantile function asks the reverse: "what x-value has a certain area to the left of it?" For example, we could ask "what x-value has 0.9 area to the left of it?" The answer to this question is precisely the 0.9 quantile, or the 90th percentile, of the standard normal distribution. From the graph above you can see that it's about 1.3, but of course if you want the exact number (about 1.2816) you get it from a computer.
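As a small illustration of "getting it from a computer", the sketch below computes the 0.9 quantile two ways: with the standard library's `NormalDist.inv_cdf`, and by searching the CDF for the x-value whose left-hand area is 0.9. The bisection helper `quantile_by_bisection` and its search interval are hypothetical, included only to show that the quantile function is the CDF read backwards.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def quantile_by_bisection(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Find the x with cdf(x) = p by repeatedly halving the interval [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if std_normal.cdf(mid) < p:
            lo = mid  # too little area to the left: move right
        else:
            hi = mid  # enough (or too much) area: move left
    return (lo + hi) / 2


q90 = std_normal.inv_cdf(0.9)
print(round(q90, 4))                         # 1.2816
print(round(quantile_by_bisection(0.9), 4))  # 1.2816
print(round(std_normal.cdf(q90), 4))         # 0.9, back where we started
```

Feeding the quantile back into the CDF returns the original area, which is exactly the inverse relationship described above.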
Finally, the Q-Q plot comes about by plotting the observed quantiles of the data against the expected quantiles from a standard normal distribution.
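The construction can be sketched in a few lines. This is only one way to do it: the function name `qq_points` is hypothetical, the simulated sample is made up for illustration, and the plotting positions (i - 0.5) / n are an assumed convention (statistical software offers several slightly different ones).

```python
import random
from statistics import NormalDist


def qq_points(data):
    """Return (expected, observed) quantile pairs for a normal Q-Q plot."""
    observed = sorted(data)  # the sorted data are the observed quantiles
    n = len(observed)
    std_normal = NormalDist()
    # Assumed convention: the i-th smallest value is matched with the
    # standard normal quantile at probability (i - 0.5) / n.
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(200)]  # simulated normal data
points = qq_points(sample)

# Print every 40th pair; plotting all pairs gives the Q-Q plot itself.
for expected, observed in points[::40]:
    print(f"expected {expected:+.3f}   observed {observed:.3f}")
```

Because the simulated data are normal with mean 10 and standard deviation 2, the pairs should fall close to a straight line with intercept near 10 and slope near 2. Marked curvature in such a plot would suggest the data are not normal.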