Shattered Sets --- A Story of a Colorful Term

The Concept

One notion that has become widely used in the theory of Vapnik-Chervonenkis dimension is that of a shattered set.

Definition: Given a collection F of subsets of a set S, we say that the finite subset A of S is shattered provided that every subset B contained in A can be written as intersection of A with an element of F.

Thus, a set of A of three points in general position in the plane S is shattered by the collection F of half-planes, but a set of four points A' in the plane is not shattered by the collection F of half-planes.

The relevance of this concept to uniform laws of large numbers is due to Vapnik and Chervonenkis. On the other hand, the notion had antecedents in both combinatorics (Sauer) and mathematical logic (Shelah) .

Ah, but the Term "Shattered Set" ...

At the risk of seeming immodest, I will stick my neck out in hopes of catching a few rays of reflected glory.

I believe that the term "shattered set" was introduced by me in my Ph.d thesis in 1975, the most relevant piece of which was published in the Annals of Probability in 1978.

Naturally, I am delighted that this evocative term has become part of the standard vocabulary, and, after discussion with some friends, it occurred to me that it might be fun to stir the pot a bit by staking a web claim to this little piece of machine learning history. I am pretty sure that there are no antecedents to my use of the term "shattered set" --- even in the Russian originals which I once inexpertly decoded. Still, as anyone who has dug into priority issues knows, the story is seldom simple. If there is something that I should add to what I have said, please let me know.

There's More to Every Story -- even a short one about Shattered Sets

In the Fall of 1975 I mentioned the Vapnik-Chervonenkis lemma to Erdös, and Erdös told me about an the paper of Sauer which seems to be the earliest of the sources for the VC lemma. Erdös also mentioned that there was related work of Shelah. I did not find the Shelah result (or could not see my way to it through the thicket of mathematical logic),so I ended up just citing Sauer (1972) in my '78 J. Comb. Theory paper where I give an "n-color" version of the Sauer-Shelah-Vapnik-Chervonenkis lemma.

Where Is the "Sauer-Shelah Lemma" Today ?

This theory has been rolling along for almost thirty years now, and it may be useful to do a survey that attends to the mathematics --- not just etymological curiosities. In particular, I mentioned several conjectures in my JCT paper, and it has been dog's years since I checked on their status.

Google Scholar makes this easier --- yet still not easy. Haussler and Long (1995) provide the beginnings of a survey, though principally as an introduction to the their new contributions. Even so, the paper of Haussler and Long is now more than ten years old.

For at least a while, I will continue to follow the trail of references to see if a survey might be in order. I'll also scan my thesis and post it to the web. I've been planning to do this for a long time, and now I have a motivation. Check back in a month or so for further developments.

References

  1. D. Haussler and P. M. Long, A generalization of Sauer's Lemma. Journal of Combinatorial Theory, Series A, 71(2) (1995) 219-240.
  2. S. Shelah, A combinatorial problem: stability and order for models and theories in infinitary languages, Pacific J. Math. 41 (1972), 247--261. MR 46:7018
  3. J.M. Steele, Empirical discrepancies and subadditive processes, Annals of Probability, 6 (1978), 118--227.
  4. J.M. Steele , Existence of submatrices with all possible columns, Journal of Combinatorial Theory, 24 (1978) 84--88.

Invitation

If you know the Vapnik-Chervonenkis papers in the original Russian, please let me know if what I have written here conforms to your understanding. Naturally, if anyone knows of a survey of the Sauer-Shelah Lemma and its tributaries, I would be delighted to learn of it..

 

BACK TO: J. Michael Steele Home Page

Back to Steele's Home Page or Surprise Me

[Key Words: Sauer Lemma, Sauer-Shelah Lemma, VC Dimension, Uniform Strong Laws, Shattered Set, Vapnik-Chervonenkis Lemma, Combinatorial Entropy, Subadditive Processes, induced subsets, Dart calculus]