The EM algorithm is now a force of nature. It holds a powerful sway over the modeling and estimating energies of numerous statisticians, bioinformaticists, and machine learners.
It is an almost Herculean task to try to understand both the current scope and historical trajectory of the EM algorithm.
Also, at the end of the day, we are all more interested in what is new than what is old. An important resource for both is
Meng, X.-L. and van Dyk, D. (1997). "The EM Algorithm --- an Old Folk-song Sung to a Fast New Tune," J. R. Statist. Soc. B, 59 (3), 511-567.
Meng and van Dyk observe that while it is difficult to say who first sang the EM tune, there is no doubt about the artists who first put it on the top-ten list.
Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). "Maximum likelihood from incomplete data via the EM algorithm (with discussion)," J. R. Statist. Soc. B, 39, 1--38.
The paper that fixes the triangle inequality snafu in DLR is
Wu, C. F. J. (1983). "On the Convergence Properties of the EM Algorithm," Annals of Statistics, 11 (1), 95-103.