Please see PDF version

Canad. Math. Bull. Vol. 21 (1), 1978



ABSTRACT. For any ergodic transformation T a set A of measure less than e is constructed with the property that for every finite set F there is a j = j(F) such that F c T1A. The basic tool used to prove this is a purely combinatorial result which says there is a small subset of {1, 2, . . . , n} which can be shifted a small amount to cover any k set in {j: 8n :s j 5 n}. Applications are given to the theory of combinatorial entropy.

I. Introduction. The primary objective of this paper is to prove the following:
If T is an ergodic transformation, then there is a set A of measure less than a such that for every finite set F there is a j such that Fa TiA.
In the first place the very definition of ergodicity provides us with the initial introduction to the theorem. For an ergodic T and any set A of positive measure, one has that the images TiA are sufficient to "catch" almost any single element set. The crux of the present result is that there is some (arbitrarily small) A whose images T1A are sufficient to "catch" any finite element set.
As this relation to ergodicity suggests, this type of finite covering property relates in severalways to the taxonomy of measure preserving transformations. In fact the aperiodicity of a transformation is equivalent in a sense described in the third section to the possession of the finite covering property.
The method of proof employed here also has byproducts which may prove to be of wider application. In particular the proof proceeds by first establishing some purely combinatorial. results, and then using Rohlin's Theorem as a means of applying the combinatorial facts to ergodic theory problems. The second section contains only combinatorial work and can be read indepen
4 1 dently of the remainder.
Finally, we give several applications of the finite covering property. Particularly, the relationship of Kolmogorov's entropy to the combinatorial entropy of Vapnik and Chervonenkis is considered.

11. A combinationd _ lemm. In the following, the notation [0, n] will indicate the set {j: 0 s jS n}. Similarly, we will use [0, r) for {j: 0  j < r} and

Received by the editors October 22, 1976 and, in revised form, July 14, 1977. Key words. Ergodic, aperiodic, entropy, combinatorial method. 85



86 J. M. STEELE [Match

[0, Sn] for {j: 0:5 jS Sn}, even when 8n is not an integer. As usual ISI denotes the number of elements of the set S.
We now have the main result of this section.

LEmmA II. l. Let k be a positive integer and 6, e, and c, be positive reals. Then there is an arbitrarily large n and a set S  [0, n] such that the following conditions hold:

(1) ISIM:5 s,
(2) For any k element set F c [8n, n] there is a natural number t such that F S + t with t:5 e,n.

Proof. We first write n = i! and r will be chosen in the course of the proof. The set S will be given as the union over j = 1, 2, . ., k of the sets Si defined by

Si ={r:rE(s(,r2j+ 1)!(72j% s(,r2j+1)!] for some se[l,T!1(72j+l)!]}

Now let nj < % < % ... < nk be the elements of the k set F. We need to check that there is a t 5 E2n such that S + t contains F. The integer t will be given as the sum of k integers tl, t2, tk which will be constructed to have the properties that

n,(tl+t2+. . .+tj)=si(T2j+l)!, Si > 0,
ti+l + ti+2 + tk (= [0, (T  2j)!).

By the definition of the Si we therefore have that n,  t EE Si for j = 1, 2, . ., k and hence F c: S+ t.
We now construct the tp First by choosing r ~t 318 we have 8n ~: 3(.r  1)! and hence from the fact that nj ~~t 8n we have

Sj(T1)!:5n,<(sl+l)(T1)! with s,3.
We then define
t, = nj  sl(,r  1)!

and note that 0 t t < (,r  1)! Now for any j we write inductively that

sj(72j+l)!5n,tlt2* . tj,<(sj+l)(i2j+l)!

and define t, by

ti = nj  t, t2
We now note that

0:5 t, <(T2j+ 1)!



and hence t,,, + tj+,+. +tk < k(T2j 1)!. By taking T~. 3k we have for j=1,2,. .., k1

ti,l + . . . + tk < (,r  2j)!

Thus we have t,+ t,+. + tk <(,rl)!+(,r2)! so n, tlt2
and hence sj~(.71)!1(72j+1)9>0, for j= 2, 3,. k. We have thus verified that S + t contains F. This has been done with t < (,r  1)! + (,r  2)! so t < C2n provided r is chosen with r > 2/62. Property (2) is thus completely verified.
For the fact that ISI/n < e we note that for j = 1, 2,. . ., k we have IS,1:~~ (,r2j)!7.!/(,r2j+l)!5(,r2k)!,r!/(72k+l)! so ISI(,r!)'lcl(,r2k+l). The last requirement we place on r is that it be so large that Q(,r2k+l) The difficulty of Lemma II.l. is tied to the finiteness of the interval [0, n]. For the infinite interval one can obtain analogous results more easily. Only Lemma II.1 will be used in the sequel, but we given an example of what a more precise, yet easier, result On [0, ).

THEOREM 11.2. There is a subset S c [0, ) with the following properties:
(1) iim, s n [0, nl/n = 0, i.e. S has zero density.
(2) For any disjoint finite sets F and G there is a positive t such that F c: S  t and Gn(st)=o.

Proof. Let C,, C Cf, where f = 2 k , be a list of the 2' district strings of 0
and 1 of length k. Let CO be a string of 2` zeros, and lk the string of O's and
l's obtained by writing CO followed by Q, C2 etc. Finally let 1 be the infinite
string obtained by writing. 12,13, etc. The set S is now given by

S = {r E [0, ): The rth element of 1 is a 1}.

It is easily checked that S has zero density due to the large blocks of zeros Co.
Now given any finite disjoint sets F and G there is an h such that F U G c
[0, h], and since

c = (a,, a2, a*), , aj = {1 if iEF
0 otherwise,

is equal to one of the blocks in 1 there is a t such that S  t contains F and is disjoint from G. Hence the construction is complete.
With these combinatorial observations, we can proceed to the main result.

III. Aperiodle transformations and the main result. We will be concerned with transformations T on a probability space (fl, 5, g) which are aperiodic. That is, we assume that for any n and any B E 5 with g (B) > 0 there is a Bc: B



88 J. M. STEELE [March

such that g(BAT n B ~) > 0. It will also be required that T is measure preserving, i.e. for A E 1 we have T`A E: 1 and g (T`A) = g (A). (Here, of course, A AB denotes A \ B U B \ A, the symmetric difference.)
Any ergodic transformation is aperiodic, and for an example of an aperiodic transformation which is not ergodic one can take the transformation on [0, 11 X [0, 11 defined by T(x, y) = ((x + a)mod 1, y) where a is irrational. The only fact which we use about aperiodic transformations is the following version of Rohlin's Theorem.

LEMMA III.l. Let T bean aperiodic transformation on anon atomic probability space. Given any e>0 and any integer n~tl, there is an Ee1 such that E, T'E, T 2 E,..., T n E are disjoint and g (U,~=0 TT) ~t 1  c.

The proof of this result is essentially contained in [1, p. 71] and is also a special case of [2, p. 282]. Our main reason for focusing on the class of aperiodic transformations is the availability of Lemma III.l. We now have the main result of this paper.

THEOREM III.l. Let T be an aperiodic, bimeasurable, measure preserving transformation on a nonatomic probability space. There is a set A of measure less than a > 0 such that for every finite set F there is a j = j(F) such that F c TjA.

Proof. By the combinatorial lemma we select Sk c [0, nk] such that
(4.1) 1Ski/nk:5e2 k , and
(4.2) For any k element set F c [n2 nj there is a positive integer t such that F.= Sk + t where t  n 2 k , n  nk.
Now by Rohlin's theorem we obtain a set Ek such that Ek, T'Ek,
T`Ek,, TnEk are disjoint and whose unio ' n has measure greater than
1~2 k . The basic building block of the construction is then defined by

(4.3) Ak = U TiEk'

We first observe that g(Ak):51Ski/nk:5e2 k so one has 9(UZ2Ak):5e.
Next we let B(k) be defined by

( n ), U (n2'
(4.4) B (k) = U TiEk U TJE).
10 j0

Since g (B (k)):52 k11 we have for B= n72 U '_1 B (k) that g (B)= 0. The set k
A required by the theorem can now be taken to be

(4.5) A=( 5 k) U ( ' PB).
k2 10

By the previous estimates g (A) 5 c so it remains, only to show that A can



"catch" any finite set F. First let F, = B n F and note that B c TjA for all j so F, c TjA for all j also. Next let F2 = F n B'. Since F2 c B' there is a k,, such that F2cB(k)' for all kko. Now for F2cB(k)' we define a subset of [n2 k n] by F={m:(TmE)nF2740} and recall there is a t, W5t:5n2 k such that F'c Sk + t. Consequently we have

(4.6) F2 c U T~E c U TmE c TAk.
mEF' MEsk+t

For this choice of t we therefore have F = F, U F2 c T`A so the theorem is complete.
An easy consequence of the preceding result is that the finite covering property actually characterizes the aperiodic transformations.

THEOREM 111.2. A bimeasurable, measure preserving transformation on a nonatomic probability space is aperiodic if and only if for e > 0 there is an A with ii(A) < c such that for every finite F there is a j  j(F) such that F TjA.

We have already proved that aperiodicity yields the finite covering property and the converse is a consequence of the following elementary lemma.

LEMMA 111.2. If for any e > 0 there is a A with g (A)< e such that g (U j=0 T1 'A) = 1 then T is aperiodic.

Proof. Suppose T is not aperiodic and let B E 1 denote an invariant set with 0 < g(B) 5 1 for which there is an n such that g(B'ATnB) = 0 for all Wc B, Ye 1. If A is chosen as in the hypothesis, then g(B A U 'i0 T`(A n B)) = 0 since B is invariant. But g(A n (B ATn(A n Bffl = 0 so 1£(UiOTI(AiIB)):5ng(AnB), and consequently g(B):sni£(AnB):5 nii(A). Since A can be chosen independently of n and arbitrarily small, we have a contradiction to g (B) 0 0. This contradiction proves the lemma.

IV. Applications to combinatorial entropy. If S is a class of subsets of 11 and X,, X2, .. . , Xn, . . . is a sequence of independent identically distributed random variables which take values in £1 we first define a new random variable as follows:

'AS (Xl, X2,.. , X.) = I{B: B = A n {xl, x, x,,}, A E S}1

i.e. &s(Xl,X2,..., Xj is the number of distinct subsets of the sample X,, X2, . . ., Xn obtained by intersection with an element of S.
The VapfflkChervonenkis entropy of the class S with respect to the random variables Xi is defined by the limit

h (S)= litn 1 log E(A (X,, X2,. Xj).
r n

This limit always exists, as proved by Vapnik and Chervonenkis [4] and [51.

90 J. M. STEELE [March

The most important consequence of this definition is that the random variables D. ~ SUPAES In` 1,~=, 1, (Xi)  P(Xl e A)l converges to zero in probability if and only if h(S) = 0. (Here 1A denotes the indicator of the set A.) This result was obtained in [4], and in [3] it was proved that, in fact, D, converges to zero a.s. if and only if h(S) = 0.
Now, given a transformation T it is especially reasonable to consider the class S defined by S = {T'A}i.o. Once this is done there is the hope of comparing the VapnikChervonenkis entropy of S with the Kolmogorov entropy of T and the partition (A, A'). One aspect of these hopes, that zero Kolmogorov entropy implies zero VC entropy, is dashed by the following.

THEOREM IVA. If T is any ergodic transformation there is an A such that .he VC entropy of S= {T`Arj1 is positive with respect to the i.i.d. random variables Xi i = 1, 2.... with uniform distribution (i.e. P(XicA) = tt(A)).

Proof. One simply notes that by Theorem 111.2. there is an A such that S = {TtAri0 covers almost every finite sample {X,, X2,..., Xn}. If A is alse chosen so that g(A)51 we have
(4.1) Dn =SUP n1 1A(Xi)P(X,A) 2:1

and hence by the main theorem of Vapnik and Chervonenkis we have h (S) 0 0. To complete the comparison of Kolmogorov entropy and VapnikChervonenkis entropy we add two observations due to M. Ellis (personal communication). The first is that a transformation with positive Kolmogorov entropy need not have positive VC entropy for all A, and the second is that a transformation with completely positive entropy (or Kautomorphism) must have positive VC entropy for all A.
To prove the first observation let S be a transformation on [0, 1] with positive Kentropy. If I is the identity on [0, 11, let T be defined by T = S X L T can be checked to have positive Kentropy and be aperiodic, but [0, 1 ]X 10, 21] is a fixed set under T which hence has VC entropy 0.
The transformation constructed above naturally has a factor of zero entropy, and if one rules out this pathology one has that all nontrivial A have positive VC entropy. Specifically, if T has no factors of zero entropy it is known that T must be mixing [1, 1211. But for T mixing one has the dfold product S=TxTX ... X T is ergodic, so for any A with 0 < g (A) < 1 one has for B=AXAX ... X A one has U', SiB has product measure 1. By the criterion of (4.1) one then sees A has positive VC entropy. This completes the proof of the second observation.


1. P. Billingsley, Ergodic Theory and Information, Wiley, New York, 1965.
2. P. R. Halmos, Lectures on Ergodic Theory, Chelsea Publishing, New York, 1956.



3. L. K. Jones and U. Krengel, On transformations without invariant measure, Adv. in Math., 12 (1974), 275295.
4. J. M. Steele, Combinatorial Entropy and Uniform Limit Laws, Stanford University (Thesis), 1975.
5. V. N. Vapnik and A. Ya. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory Prob. and Appl., 16 (1971), 264280.
6. , Teodya Raspoznavaniya Obrazov, Nauka, Moscow, 1974.