Quantiles and Q-Q plots

Last time we discussed Q-Q plots, which plot observed quantiles against expected quantiles. As with most diagnostics, when the model holds the observed and expected values should be about equal, so in a Q-Q plot we hope to see the points fall approximately along a straight line.

But what exactly is a quantile? Imagine taking lots of data, enough to be able to draw an extremely detailed histogram. Now "normalize" the histogram so that the area beneath it equals one. The function that describes the shape of the normalized histogram is called a probability density function and is conventionally denoted f(x). The probability density function for the standard normal distribution looks like this:

Now imagine adding up the area under the density to the left of a point x on the x-axis. The function that gives this area is called the Cumulative Distribution Function and is usually written F(x).
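The two ideas above, normalizing a histogram and adding up the area to the left of a point, can be tried out on simulated data. What follows is only a rough sketch: the sample size, the range from -5 to 5, the 80 bins and the helper name area_left_of are all choices made for this illustration, not part of the original notes.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
data = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # simulated standard normal data

# Assumed binning for this illustration: 80 bins of width 0.125 on [-5, 5).
lo, hi, n_bins = -5.0, 5.0, 80
width = (hi - lo) / n_bins

counts = [0] * n_bins
for x in data:
    if lo <= x < hi:
        k = min(n_bins - 1, int((x - lo) / width))
        counts[k] += 1

# "Normalize": divide each count by (number of points * bin width),
# so that the bar areas (height * width) add up to one.
n_in = sum(counts)
density = [c / (n_in * width) for c in counts]
total_area = sum(h * width for h in density)

def area_left_of(x):
    """Approximate F(x) by adding up the areas of the bars that lie entirely left of x."""
    k = max(0, min(n_bins, int((x - lo) / width)))
    return sum(h * width for h in density[:k])

print(total_area)         # very close to 1
print(area_left_of(0.0))  # close to 0.5 for symmetric data
print(area_left_of(1.0))  # larger, since more area lies to the left of 1
```

The bar heights play the role of f(x), and area_left_of plays the role of F(x). With more data and narrower bars, both approximations get closer to the smooth curves.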

The cumulative distribution function for the normal distribution looks like this:

As an example, think about the area to the left of 0 for the standard normal density above. Since the density is symmetric about 0 and its total area is one, there must be 0.5 worth of area to the left of 0. That is why the Cumulative Distribution Function (CDF) takes the value 0.5 at x = 0. That's just the way the CDF is defined: F(x) is the area to the left of x in the density graph.
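The claim that F(0) = 0.5 can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module for f and F. The trapezoid-rule helper area_under_density, its lower limit of -10 and its step count are assumptions made for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

def area_under_density(x, lower=-10.0, steps=20_000):
    """Trapezoid-rule area under the density f between `lower` and x.

    `lower` stands in for minus infinity; the area further left is negligible.
    """
    h = (x - lower) / steps
    total = 0.5 * (std_normal.pdf(lower) + std_normal.pdf(x))
    for i in range(1, steps):
        total += std_normal.pdf(lower + i * h)
    return total * h

print(area_under_density(0.0))  # area to the left of 0, added up by hand
print(std_normal.cdf(0.0))      # the same area, read directly from the CDF
```

Both numbers should agree to several decimal places, which is just the definition of the CDF at work: F(x) is the accumulated area under f to the left of x.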

Quantiles come about by reading the CDF graph the other way around. Rather than asking "how much area is there to the left of x?", the quantile function asks "what x-value has a given area to the left of it?" For example, we could ask "what x-value has 0.9 area to the left of it?" The answer to this question is precisely the 0.9 quantile, or 90th percentile, of the standard normal distribution. From the graph above you can see that it's about 1.3, but of course if you want the exact number you get it from a computer.
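Getting the exact number from a computer might look like the following. This is a minimal sketch using NormalDist.inv_cdf from Python's standard statistics module, which is one way of evaluating the quantile function. The variable names are chosen for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Quantile function: which x-value has area 0.9 to the left of it?
q90 = std_normal.inv_cdf(0.9)
print(round(q90, 4))  # 1.2816

# Going back through the CDF recovers the area we started from.
print(std_normal.cdf(q90))
```

The quantile function and the CDF undo one another: feeding the 0.9 quantile back into F returns 0.9, up to rounding.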

Finally, the Q-Q plot is formed by plotting the observed quantiles of the data against the expected quantiles from a standard normal distribution.
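To make the construction concrete, here is a sketch of how the points of a normal Q-Q plot could be computed. The notes do not say which probabilities to attach to the sorted data, so the rule (i - 0.5)/n used below is an assumption (statistical packages differ in this detail), and both the function name normal_qq_points and the small sample are made up for illustration.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Return (expected, observed) pairs for a normal Q-Q plot.

    Observed quantiles are the sorted data values. Expected quantiles are
    standard normal quantiles at the assumed probabilities (i - 0.5) / n.
    """
    n = len(data)
    observed = sorted(data)
    std_normal = NormalDist()
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

# Made-up sample, for illustration only.
sample = [2.1, -0.4, 0.3, 1.2, -1.5, 0.0, 0.8, -0.7]
for expected, observed in normal_qq_points(sample):
    print(f"{expected:8.3f} {observed:8.3f}")
```

Plotting the second column against the first gives the Q-Q plot. If the data really come from a normal distribution, the points should lie close to a straight line.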


Richard Waterman 09/10/97