11 Probability Models for Counts

R becomes more useful for the exercises in this chapter because it has built-in functions for a wide range of probability models, including the binomial and Poisson models described here. For each family of random variables, R includes the following three types of functions, identified by the leading letter (a fourth type, starting with “r”, generates random draws and is not needed here):

  • the probability distribution (or density function, starting with “d”),
  • the cumulative probability distribution (starting with “p”), and
  • the quantile function (starting with “q”).

For example, dbinom computes the binomial probability distribution \(P(X = x)\), pbinom the cumulative distribution \(P(X \le x)\), and qbinom the quantiles (a.k.a. percentiles). Given a probability \(p\), the quantile function finds the smallest value \(x\) such that \(P(X \le x) \ge p\); for a discrete random variable, exact equality \(P(X \le x) = p\) is usually not attainable. Similarly, dpois, ppois, and qpois compute these functions for a Poisson random variable. (R defines such functions for many other random variables, such as t-distributions and chi-squared distributions.)
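As a quick check of how the three functions fit together, here is a minimal sketch using a binomial model with \(n = 20\) and \(p = 0.25\). The variable names n, p, and q are only illustrative.

```r
n <- 20; p <- 0.25

# The cumulative probability is the running sum of the point probabilities
sum(dbinom(0:5, size = n, prob = p))   # P(X <= 5), adding P(X = 0), ..., P(X = 5)
pbinom(5, size = n, prob = p)          # the same probability from pbinom

# The quantile is the smallest x whose cumulative probability reaches the target
q <- qbinom(0.5, size = n, prob = p)   # median of the distribution
pbinom(q,     size = n, prob = p)      # at least 0.5
pbinom(q - 1, size = n, prob = p)      # less than 0.5
```

The last two lines show why the quantile of a discrete random variable is defined with an inequality: the cumulative distribution jumps past 0.5 rather than landing on it.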

To draw the density function of a discrete random variable like the binomial, the textbook presents the density in the “tethered balloon” style illustrated in R in Chapter 8. For example, here’s the probability distribution of a binomial random variable with parameters \(n = 20\) trials and \(p = 0.25\). The number of trials is specified in R by the size argument.

x <- 0:20
p_x <- dbinom(x, size=20, prob=0.25)             # P(X = x)
plot  (x, p_x, col='gray', type='h', bty='L', ylab="Binomial(20,0.25)")
points(x, p_x, pch=19)                        # solid circle

R also has a convenient way to draw the cumulative distribution. This function is tricky to draw otherwise because of the vertical jumps at the integers. (I would prefer that R omit the vertical line at the jump points, but that’s not what it does.)

x   <- 0:20
P_x <- pbinom(x, size=20, prob=0.25)    # P(X <= x)
plot  (x, P_x , type='s', bty='L', ylab="Binomial(20,0.25)")
points(x, P_x, pch=18)               # diamond shaped point

To see other choices for drawing points, look at the help information given for pch.
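If the vertical segments at the jumps of the cumulative distribution are unwanted, one alternative (not the textbook's approach, just a sketch) is to build a step function with stepfun and plot it with verticals=FALSE. The name F_x is introduced only for this illustration.

```r
x   <- 0:20
P_x <- pbinom(x, size=20, prob=0.25)     # P(X <= x)

# stepfun needs one more height than jump points: the value to the left of x[1]
F_x <- stepfun(x, c(0, P_x))

# Draw only the horizontal steps, with a solid point at the start of each step
plot(F_x, verticals=FALSE, pch=19, main="", xlab="x", ylab="Binomial(20,0.25)")
```

Because F_x is itself a function, it can be evaluated anywhere: F_x(4.5) returns \(P(X \le 4)\), the height of the step between 4 and 5.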

11.1 Analytics in R: Focus on Sales

The random variable is \(Y \sim Bi(n=9, p=0.8)\). Here’s the probability distribution of \(Y\), first listed in a table and then shown in a plot.

y   <- 0:9                          # possible values of the r.v.
p_y <- dbinom(y, size=9, prob=0.8)     # P(Y = y)

To show these neatly as a table, join the two vectors side by side to form a matrix, then assign names to the columns of the matrix.

dist <- cbind(y,p_y)
colnames(dist) <- c("y", "p(y)")
dist
##       y        p(y)
##  [1,] 0 0.000000512
##  [2,] 1 0.000018432
##  [3,] 2 0.000294912
##  [4,] 3 0.002752512
##  [5,] 4 0.016515072
##  [6,] 5 0.066060288
##  [7,] 6 0.176160768
##  [8,] 7 0.301989888
##  [9,] 8 0.301989888
## [10,] 9 0.134217728
plot(y, p_y, type='h', col='gray', bty='l')
points(y,p_y, pch=19)

It is easy to check that this is a probability distribution: the values of p_y are positive and sum to 1.

sum(p_y)
## [1] 1

The sought probability in the example is \(P(Y=6) \approx 0.1762\). (Be careful: this is the 7th item in the vector of probabilities because zero is the first value.)
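To avoid the off-by-one slip, one option is to select the probability by value rather than by position. The sketch below rebuilds y and p_y so that it runs on its own.

```r
y   <- 0:9
p_y <- dbinom(y, size=9, prob=0.8)

p_y[7]                        # by position: the 7th entry corresponds to y = 6
p_y[y == 6]                   # by value: pick the entry where y equals 6
dbinom(6, size=9, prob=0.8)   # or ask dbinom for that single probability
```

All three expressions return the same number.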

11.2 Analytics in R: Defects in Semiconductors

The Poisson distribution in this example is concentrated on small integers.

x   <- 0:6
p_x <- dpois(x, lambda=314/400)
cbind(x,p_x)   # show a table with the probabilities
##      x          p_x
## [1,] 0 0.4561197018
## [2,] 1 0.3580539659
## [3,] 2 0.1405361816
## [4,] 3 0.0367736342
## [5,] 4 0.0072168257
## [6,] 5 0.0011330416
## [7,] 6 0.0001482396
plot(x, p_x, type='h', col='gray', bty='l', xlab="x", ylab="p(x)")
points(x,p_x, pch=19)

Notice that the Poisson probabilities in this table sum to a little less than 1. That’s because every Poisson distribution puts some probability on every nonnegative integer, all the way out to infinity, and this table stops at 6.

sum(p_x)
## [1] 0.9999816

How much probability does the table leave out? Compute it directly with ppois.

1 - ppois(6, lambda=314/400)   # 1 - P(X <= 6)  = P(X > 6)
## [1] 1.840955e-05
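An equivalent way to get the upper tail, offered here as an alternative sketch, is the lower.tail argument of ppois. Setting lower.tail=FALSE returns \(P(X > x)\) directly, which avoids subtracting two nearly equal numbers when the tail is very small. The name lambda is introduced only for convenience.

```r
lambda <- 314/400

ppois(6, lambda=lambda, lower.tail=FALSE)   # P(X > 6), computed directly
1 - sum(dpois(0:6, lambda=lambda))          # same tail from the table's probabilities
```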