The Annals of Probability
1986, Vol. 14, No. 1, 326-335
FISHER INFORMATION AND DETECTION OF A EUCLIDEAN
PERTURBATION OF AN INDEPENDENT
STATIONARY PROCESS
BY J. MICHAEL STEELE
Princeton University
An independent stationary process $\{X_i\}_{i=1}^{\infty}$ in $\mathbb{R}^d$ is perturbed by a sequence of Euclidean motions to obtain a new process $\{Y_i\}_{i=1}^{\infty}$. Criteria are given for the singularity or equivalence of these processes. When the distribution of the $X$ process has finite Fisher information, the criteria are necessary and sufficient. Moreover, it is proved that it is exactly under the condition of finite Fisher information that the criteria are necessary and sufficient.
1. Introduction. The purpose of this article is to provide results which tell when an independent stationary process in $\mathbb{R}^d$, which has been perturbed by Euclidean motions, can be distinguished from the original process. The first results of this nature are due to Feldman (1961), Shepp (1965), and Rényi (1967). In particular, Shepp settled the question completely in the case of translations in $\mathbb{R}^1$.
Here we will obtain extensions of Shepp's results to $\mathbb{R}^d$, but, more pointedly, we extend the group of perturbations from translations to the whole group of proper rigid Euclidean motions (i.e., rotations, translations, and their compositions).
One benefit of this extension comes from having to spell out the proper analogue of finite Fisher information. A second benefit comes from the fact that the noncommuting perturbations studied here do not have a convenient harmonic analysis. This forces one to develop tools which are different from those used in the commutative case. A final benefit comes from seeing how several simple facts from the local theory of Lie groups can be put to work on a statistical problem.
Before stating the main results some notation needs to be developed. We will let $G$ denote any closed continuous subgroup of the group of rigid motions of $\mathbb{R}^d$. It is known that such a $G$ must actually be a differentiable manifold, and hence that there is a tangent space $T_eG$ at the identity $e$. The elements $A \in T_eG$ can be viewed as matrices, and for all $t$ one can define a new matrix $\exp(tA)$ by the converging sum $\sum_{n=0}^{\infty} (t^n/n!)A^n$, which we will denote by $p(t)$. The set $\{p(t): t \in \mathbb{R}\}$ can be verified to be a group, and it is called the one-dimensional subgroup generated by $A$.
Received December 1981; revised March 1985.
AMS 1980 subject classifications. Primary 60G30; secondary 60B15.
Key words and phrases. Fisher information, Kakutani's product theorem, product measures, Euclidean motions, singular processes, Hellinger integrals.
To concretize these notions and to develop some facts which will be used later, we now consider the important special case where $G$ is the full group of rigid motions on $\mathbb{R}^2$. The usual real parametrization of $G$ is given by the $3 \times 3$ matrices
$$R(\theta, a, b) = \begin{pmatrix} \cos\theta & -\sin\theta & a \\ \sin\theta & \cos\theta & b \\ 0 & 0 & 1 \end{pmatrix}, \qquad -\pi < \theta \le \pi, \quad -\infty < a, b < \infty.$$
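Here, by parametrizing $\mathbb{R}^2$ by the two-dimensional set in $\mathbb{R}^3$ given by
$$\{(x, y, 1)^T : -\infty < x, y < \infty\},$$
one sees that matrix multiplication by $R(\theta, a, b)$ corresponds to a rotation by $\theta$ followed by a translation by $a$ and $b$ along the $x$ and $y$ axes. A basis for $T_eG$ is given by
$$A_\theta = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad A_a = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad A_b = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},$$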
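and direct power series computation establishes
$$\exp(\theta A_\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \exp(a A_a) = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \exp(b A_b) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & b \\ 0 & 0 & 1 \end{pmatrix}.$$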
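The power series computation for the planar case can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `expm_series` and `rigid_motion` and the sample values of the parameters are illustrative and do not come from the paper.

```python
import numpy as np

def expm_series(M, terms=40):
    """Matrix exponential via the truncated power series sum_{n < terms} M^n / n!."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n          # term now holds M^n / n!
        result = result + term
    return result

# Basis of the tangent space at the identity, in homogeneous coordinates.
A_theta = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
A_a = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
A_b = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])

def rigid_motion(theta, a, b):
    """The 3 x 3 matrix of a rotation by theta followed by a translation by (a, b)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, a], [s, c, b], [0.0, 0.0, 1.0]])

# Arbitrary sample parameters (assumed for illustration only).
theta, a, b = 0.7, 1.5, -2.0
rot = expm_series(theta * A_theta)
tr_a = expm_series(a * A_a)
tr_b = expm_series(b * A_b)

# Rotation first, then the two translations, recovers the full parametrization.
composed = tr_a @ tr_b @ rot
```

Since $A_a$ and $A_b$ are nilpotent, their series terminate after the linear term, while the series for $A_\theta$ sums to the familiar rotation block.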
If $H = \{p(t): t \in \mathbb{R}\}$ is a one-parameter subgroup, we say a measure $\mu$ is invariant under $H$ if $\mu(p(t)B) = \mu(B)$ for all measurable $B$ and all $t$. Naturally, Lebesgue measure is invariant under any subgroup of the rigid motions. Also, note that if $\mu$ has a radially symmetric density (say in $\mathbb{R}^2$), then $\mu$ is invariant under the subgroup of rotations.
Finally, we recall that there is a neighborhood $N$ of the identity and an $\epsilon > 0$ such that each $g \in N$ can be written uniquely as $g = \exp(tA)$ for some $A \in T_eG$ with $\|A\| = 1$ and with $|t| < \epsilon$. Here the norm $\|\cdot\|$ is computed by expressing $A = (a_{ij})$ with respect to a fixed basis and taking $\|A\| = (\sum a_{ij}^2)^{1/2}$. For any $g \in N$ we define $\|g\| = t$ where $g = e^{tA}$ is the canonical representation of $g$. If $g \notin N$ we just take $\|g\| = 1$. Referring to the previous examples, we see that $\|A_\theta\| = \sqrt{2}$, so for the rotation $g = \exp(\theta A_\theta)$ we have $\|g\| = |\theta|\sqrt{2}$. Similarly, for $g = \exp(a A_a)$ we get $\|g\| = |a|$. [For the facts used in this paragraph and subsequently, the handiest reference seems to be Auslander and MacKenzie (1977), Chapter 7, pages 117-134.] With these conventions it is now possible to state the first results.
THEOREM 1. Suppose that $X_1, X_2, \ldots$ is a sequence of independent random variables with distribution $\mu$ on $\mathbb{R}^d$ which is not invariant under any continuous one-dimensional subgroup of rigid motions. If $g_i$ is any sequence of rigid motions converging to the identity, but such that $\sum_{i=1}^{\infty} \|g_i\|^2 = \infty$, then the processes $\{X_1, X_2, \ldots\}$ and $\{g_1X_1, g_2X_2, \ldots\}$ are mutually singular.
Before stating the next result, it is worth observing that the case of radial symmetry in $\mathbb{R}^2$ shows the necessity of ruling out invariant distributions $\mu$. The possibility of discrete symmetry underlies the necessity of restricting attention to $g_i$ converging to $e$.
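To state the second theorem we need the notion of finite Fisher information. To motivate our definition we recall that if $f(\cdot)$ is a smooth density on $\mathbb{R}$, the translation family $f_\theta(x) = f(x - \theta)$ has Fisher information
$$I = E_\theta\left(\frac{\partial}{\partial\theta}\log f_\theta(X)\right)^2 = \int_{-\infty}^{\infty} \left\{ f'(x)^2 / f(x) \right\} dx = 4\int_{-\infty}^{\infty} \left(\frac{d}{dx}\sqrt{f(x)}\right)^2 dx.$$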
This last equality suggests that we generalize the notion that the derivative of $h(x) = \sqrt{f(x)}$ is in $L^2$. For any continuous one-parameter subgroup $H = \{p(t): t \in \mathbb{R}\}$ we define an operator on $C_0^\infty(\mathbb{R}^d)$ by setting
$$(L\phi)(x) = \left.\frac{d}{dt}\,\phi(p(t)x)\right|_{t=0}.$$
This operator is called the infinitesimal operator associated with the subgroup $H$. For example, if $H$ is the subgroup of rotations in $\mathbb{R}^2$, then one can easily check that $L = x(\partial/\partial y) - y(\partial/\partial x)$.
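The rotation example can be verified numerically by comparing a finite-difference derivative along the rotation flow with the closed-form expression $x\,\partial\phi/\partial y - y\,\partial\phi/\partial x$. This is a minimal sketch assuming NumPy; the test function `phi`, the evaluation point, and the step size are hypothetical choices made only for illustration.

```python
import numpy as np

def phi(x, y):
    # Hypothetical smooth test function (not taken from the paper).
    return np.exp(-(x - 0.3) ** 2 - 2.0 * (y + 0.1) ** 2)

def rotate(t, x, y):
    """Counterclockwise rotation of the point (x, y) by angle t."""
    return x * np.cos(t) - y * np.sin(t), x * np.sin(t) + y * np.cos(t)

def L_by_flow(x, y, dt=1e-5):
    """Central difference approximation of d/dt phi(p(t)(x, y)) at t = 0."""
    xp, yp = rotate(dt, x, y)
    xm, ym = rotate(-dt, x, y)
    return (phi(xp, yp) - phi(xm, ym)) / (2.0 * dt)

def L_by_formula(x, y):
    """x * dphi/dy - y * dphi/dx, using the analytic partial derivatives of phi."""
    dphi_dx = -2.0 * (x - 0.3) * phi(x, y)
    dphi_dy = -4.0 * (y + 0.1) * phi(x, y)
    return x * dphi_dy - y * dphi_dx

x0, y0 = 0.8, -0.5
flow_value = L_by_flow(x0, y0)
formula_value = L_by_formula(x0, y0)
```

The two values agree up to the discretization error of the central difference, which is of order `dt**2`.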
The infinitesimal operators can be extended from $C_0^\infty$ in the usual way to the class of distributions (generalized functions) on $\mathbb{R}^d$. In particular, $Lh$ is well defined for any function $h$, although $Lh$ may not necessarily be a function. We can now give the main definition.
We say that a density $f$ on $\mathbb{R}^d$ has finite Fisher information, provided for $h = \sqrt{f}$ we have $Lh \in L^2(\mathbb{R}^d)$ for all infinitesimal operators associated with continuous one-parameter subgroups of rigid motions.
Here it is useful to recall that a distribution $\nu$ is said to be in $L^2$ provided $\sup \nu(\phi) < \infty$, where the supremum is taken over all $\phi \in C_0^\infty$ with $\|\phi\|_2 = 1$. We also note that such a $\nu$ must be in the dual of $L^2$, which is just $L^2$ again, so the statement $Lh \in L^2(\mathbb{R}^d)$ entails the conclusion that $Lh$ is a function and is in $L^2(\mathbb{R}^d)$ in the usual sense.
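As a sanity check on the one-dimensional identity that motivates the definition, one can compare the two expressions for the Fisher information of a translation family numerically. The sketch below assumes NumPy and uses the standard normal density as an illustrative choice (it is not an example from the paper); for that density both expressions should be close to 1.

```python
import numpy as np

# Grid wide enough that the Gaussian tails are negligible.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

# Standard normal density (assumed example) and its square root h = sqrt(f).
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
h = np.sqrt(f)

# Finite-difference derivatives of f and h on the grid.
f_prime = np.gradient(f, dx)
h_prime = np.gradient(h, dx)

# Integral of f'(x)^2 / f(x) and four times the integral of (d/dx sqrt f)^2,
# both approximated by Riemann sums.
fisher_score_form = float(np.sum(f_prime**2 / f) * dx)
fisher_sqrt_form = float(4.0 * np.sum(h_prime**2) * dx)
```

The square-root form avoids dividing by $f$, which is one practical reason the condition is phrased in terms of $h = \sqrt{f}$ being differentiable in the $L^2$ sense.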
The next result shows that $l^2$ Euclidean perturbations of distributions with finite Fisher information are equivalent. This result thus provides a large class of examples of a particularly strong type of quasi-invariance [cf. Feldman (1961)].
THEOREM 2. If $X_1, X_2, \ldots$ are independent with density $f$ on $\mathbb{R}^d$ with finite Fisher information, then the processes $\{X_1, X_2, \ldots\}$ and $\{g_1X_1, g_2X_2, \ldots\}$ are mutually absolutely continuous whenever $\sum_{k=1}^{\infty} \|g_k\|^2 < \infty$.
The final result shows that finite Fisher information is much more than a convenience in the detection problem. In fact, the class of distributions with finite Fisher information is precisely the class for which $l^2$ perturbations never permit certain detection.
THEOREM 3. If $\{X_1, X_2, \ldots\}$ is an independent stationary process on $\mathbb{R}^d$ and $\{X_1, X_2, \ldots\}$ and $\{g_1X_1, g_2X_2, \ldots\}$ are mutually absolutely continuous for all sequences $\{g_i\}$ such that $\sum_{i=1}^{\infty} \|g_i\|^2 < \infty$, then the $X_i$'s have a strictly positive density $f(x)$ with finite Fisher information.
The proofs of these theorems are given in the next three sections. The fifth section discusses some related literature and mentions some open problems.
2. Proof of Theorem 1. The main tool used in the proofs which follow is the theorem of Kakutani (1948) on the singularity and equivalence of product measures. If $\mu$ and $\nu$ are any two probability measures which are absolutely continuous with respect to a measure $m$, then the Hellinger integral $H(\mu, \nu)$ is defined by
$$H(\mu, \nu) = \int (fg)^{1/2}\, dm,$$
where $f = d\mu/dm$ and $g = d\nu/dm$ are the Radon-Nikodym derivatives. One can check without difficulty that $H(\mu, \nu)$ does not depend upon $m$ and that $0 \le H(\mu, \nu) \le 1$.
Now if $\mu_k$, $k = 1, 2, \ldots$ and $\nu_k$, $k = 1, 2, \ldots$ are any two sequences of probability measures such that $\mu_k$ and $\nu_k$ are mutually absolutely continuous for each $k$, then the theorem of Kakutani states that the infinite product measures
$$\mu = \prod_{k=1}^{\infty} \mu_k \quad \text{and} \quad \nu = \prod_{k=1}^{\infty} \nu_k$$
are either mutually singular or mutually absolutely continuous accordingly as
$$\prod_{k=1}^{\infty} H(\mu_k, \nu_k) = 0$$
or
$$\prod_{k=1}^{\infty} H(\mu_k, \nu_k) > 0.$$
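The dichotomy can be illustrated in the simplest classical setting of Gaussian translations, which is an assumed example and not one worked in the paper. For $\mu = N(0,1)$ and $\nu = N(a,1)$ the Hellinger integral has the closed form $e^{-a^2/8}$, so the infinite product is positive exactly when the shifts are square summable. The sketch below assumes NumPy; the function names and the two shift sequences $a_k = 1/k$ and $a_k = 1/\sqrt{k}$ are illustrative choices.

```python
import numpy as np

def hellinger_gaussian_shift(a):
    """Closed form H(N(0,1), N(a,1)) = exp(-a^2 / 8)."""
    return np.exp(-a**2 / 8.0)

def hellinger_numeric(a, half_width=12.0, n=24001):
    """Riemann-sum approximation of the integral of sqrt(f * g) for the same pair."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    g = np.exp(-0.5 * (x - a) ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(np.sqrt(f * g)) * dx)

def log_partial_product(shifts):
    """Log of the partial product of Hellinger integrals, i.e. the sum of -a_k^2 / 8."""
    return float(np.sum(-shifts**2 / 8.0))

k = np.arange(1, 10**6 + 1, dtype=float)

# Square-summable shifts: the log-product converges to -pi^2 / 48, so the product is positive.
log_prod_square_summable = log_partial_product(1.0 / k)

# Shifts that are not square summable: the log-product behaves like -(log n) / 8
# and decreases without bound, so the infinite product is zero.
log_prod_not_square_summable = log_partial_product(1.0 / np.sqrt(k))
log_prod_not_square_summable_short = log_partial_product(1.0 / np.sqrt(k[:1000]))
```

In the second case the partial products tend to zero only logarithmically fast, which is a reminder that singularity of the infinite product measures says nothing about how easily finitely many observations can be told apart.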
To prove Theorem 1 we first note that since the $g_i$ are converging to the identity $e$, there is no loss in assuming that all the $g_i$ are in the neighborhood $N$ where each $g \in N$ can be written as $g = \exp(tA)$ for a unique $A \in T_eG$ with $\|A\| = 1$ and a unique $t$, $0 \le t \le \epsilon < \infty$.
We now take a sequence $Z_i$ which are i.i.d. $N(0, I)$ and consider the sequences $\{X_i'\} = \{X_i + Z_i\}$ and $\{Y_i'\} = \{g_i(X_i + Z_i)\}$. It is intuitive that if $\{X_i'\}$ and $\{Y_i'\}$ are singular, then so are $\{X_i\}$ and $\{g_iX_i\}$. To establish this rigorously we first note that $\{g_i(X_i + Z_i)\} =_d \{g_iX_i + Z_i\}$ by the affine character of $g_i$ and the spherical symmetry of $Z_i$. By Kakutani's theorem we see therefore that the singularity of $\{X_i'\}$ and $\{Y_i'\}$ implies the singularity of $\{X_i + Z_i\}$ and $\{g_iX_i + Z_i\}$. Now, we use a Fubini argument. By the singularity of $\{X_k + Z_k\}$ and $\{g_kX_k + Z_k\}$ there is a measurable subset $B \subset \mathbb{R}^\infty$ such that the events $E_1 = \{\{X_k + Z_k\} \in B\}$ and $E_0 = \{\{g_kX_k + Z_k\} \in B\}$ have measures 1 and 0, respectively, under the product measure $P \times P'$, where $P$ is the measure on $\Omega$ given by the $\{X_k\}$ process and $P'$ is the measure on $\Omega'$ given by the $\{Z_k\}$ process. By Fubini's theorem there is a subset $\Omega_0' \subset \Omega'$ of $P'$ measure one such that for all $\omega' \in \Omega_0'$ the events $E_1(\omega') = \{\{X_k + Z_k(\omega')\} \in B\}$ and $E_0(\omega') = \{\{g_kX_k + Z_k(\omega')\} \in B\}$ have $P$ measures 1 and 0, respectively. We choose some fixed $\omega' \in \Omega_0'$ and define a new measurable subset $\tilde{B} \subset \mathbb{R}^\infty$ by $\tilde{B} = B - \{Z_k(\omega')\}$. We then have $P(\{X_k\} \in \tilde{B}) = 1$ and $P(\{g_kX_k\} \in \tilde{B}) = 0$, which proves that $\{X_k\}$ and $\{g_kX_k\}$ are singular.
Our objective is now to start computing Hellinger integrals in order to use the assumption $\sum_{k=1}^{\infty} \|g_k\|^2 = \infty$ to show $\{X_i'\}$ and $\{Y_i'\}$ are singular.
We let $f$ denote the common density of the $X_i'$ and consider the Hellinger integrals
$$H_k = \int \sqrt{f(x)f(g_k^{-1}x)}\, dx = \int h(x)h(e^{-tA}x)\, dx,$$
where $h(x) = \sqrt{f(x)}$ and $g_k = e^{tA}$. Setting $\phi_A(t) = \int h(x)h(e^{-tA}x)\, dx$ we note that $\phi_A(0) = 1$ and that it is not difficult to show that $\phi_A(\cdot)$ is infinitely differentiable.
By the change of variables $y = e^{-tA}x$ one also sees that $\phi_A(\cdot)$ is an even function and that consequently the two-term Taylor series with remainder is just
$$\phi_A(t) = 1 + \int_0^t \phi_A''(u)(t - u)\, du. \tag{2.1}$$
By computing the first derivative and then changing variables before computing the second, one has
$$\phi_A''(t) = -\int \{\nabla h(e^{tA}y) \cdot Ae^{tA}y\}\{\nabla h(y) \cdot Ay\}\, dy, \tag{2.2}$$
which simplifies for $t = 0$ to
$$\phi_A''(0) = -\int \{\nabla h(y) \cdot Ay\}^2\, dy = -\int \{Lh\}^2\, dy.$$
Now, if $\phi_A''(0) = 0$, we will show that $h$ is invariant under the one-parameter group $p(t) = e^{tA}$. To see this, first note that $\phi_A''(0) = 0$ implies $1 - \phi_A(t) = o(t^2)$ since $\phi_A(t)$ is even. But setting $0 = t_0 < t_1 < \cdots < t_n = t$ we have
$$\begin{aligned}
0 \le 1 - \int h(x)h(p(t)x)\, dx &= \sum_{k=1}^{n} \int h(x)\{h(p(t_{k-1})x) - h(p(t_k)x)\}\, dx \\
&\le \sum_{k=1}^{n} \left(\int \{h(p(t_{k-1})x) - h(p(t_k)x)\}^2\, dx\right)^{1/2} \\
&= \sum_{k=1}^{n} \left(\int \{h(p(t_{k-1} - t_k)x) - h(x)\}^2\, dx\right)^{1/2} \\
&= \sum_{k=1}^{n} \left(2 - 2\phi_A(t_{k-1} - t_k)\right)^{1/2}.
\end{aligned}$$
Now the fact that $1 - \phi_A(t) = o(t^2)$ shows the last term above can be made as small as we like. Thus $\int h(x)h(p(t)x)\, dx = 1$ for all $t$, and since $H(\nu, \mu) = 1$ only if $\nu$ and $\mu$ are equal, we see $f(x) = f(p(t)x)$ for almost every $x$. Our assumption that $f$ is not invariant under a one-parameter subgroup therefore implies that $\phi_A''(0) < 0$ for all $A \in T_eG$, $\|A\| = 1$.
By the continuity of $\phi_A''(t)$ as a function of $(t, A)$ and by the compactness of the set $K = \{(0, A): \|A\| = 1\}$, we obtain an open neighborhood $O$ containing $K$ and a $\delta > 0$ such that $\phi_A''(t) \le -\delta$ for all $(t, A) \in O$.
Applying this bound in the Taylor expansion (2.1) we see
$$\phi_A(t) \le 1 - \delta t^2/2 \tag{2.3}$$
for all $A$, $\|A\| = 1$ and all $t$ in an $\epsilon$ neighborhood of 0. Since the $g_k$ are by assumption converging to $e$, there is no loss in assuming $g_k = e^{t_kA_k}$ with $\|A_k\| = 1$ and $|t_k| = \|g_k\| \le \epsilon$. If $\mu_k$ is the measure corresponding to $\{X_k + Z_k\}$ and $\nu_k$ corresponds to $\{g_k(X_k + Z_k)\}$, then
$$\prod_{k=1}^{\infty} H(\mu_k, \nu_k) \le \prod_{k=1}^{\infty} \left\{1 - \delta\|g_k\|^2/2\right\}. \tag{2.4}$$
Since $\sum_{k=1}^{\infty} \|g_k\|^2 = \infty$ by assumption, the right side of (2.4) diverges to zero. By Kakutani's theorem this shows $\{X_i'\}$ and $\{Y_i'\}$ are singular and by the earlier reduction this completes the proof. $\square$
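3. Proof of Theorem 2. Let $\phi_\epsilon$ denote a normal density with mean zero and covariance matrix $\epsilon I$. Under the hypothesis of Theorem 2 it is easy to check that
$$\lim_{\epsilon \to 0} \int \sqrt{\phi_\epsilon * f(x) \cdot f(x)}\, dx = \int f(x)\, dx = 1.$$
We can therefore choose $\epsilon_k \downarrow 0$ so rapidly that
$$\prod_{k=1}^{\infty} \int \sqrt{\phi_{\epsilon_k} * f(x) \cdot f(x)}\, dx > 0. \tag{3.1}$$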
If the $Z_k \sim N(0, \epsilon_k I)$, $1 \le k < \infty$, are independent, then (3.1) and Kakutani's theorem will give
$$\{X_k\} \sim \{X_k + Z_k\} \tag{3.2}$$
and
$$\{g_kX_k\} \sim \{g_k(X_k + Z_k)\} \sim \{g_kX_k + Z_k\}. \tag{3.3}$$
Here $\sim$ is used to denote measure theoretic equivalence or mutual absolute continuity of the processes. One should note that the second equivalence of (3.3) comes from the fact that $g(X_k + Z_k) =_d gX_k + Z_k$, which is due to the spherical symmetry of $N(0, I)$.
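Now letting $h_\epsilon(x) = \sqrt{\phi_\epsilon * f(x)}$, $g = e^{tA}$ and $\phi(t, \epsilon, A) = \int h_\epsilon(x)h_\epsilon(e^{-tA}x)\, dx$, we see, as in (2.1) and (2.2), that
$$\phi(t, \epsilon, A) = 1 + \int_0^t \phi''(u, \epsilon, A)(t - u)\, du \tag{3.4}$$
and
$$\phi''(t, \epsilon, A) = -\int \{\nabla h_\epsilon(e^{tA}y) \cdot Ae^{tA}y\}\{\nabla h_\epsilon(y) \cdot Ay\}\, dy. \tag{3.5}$$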
Applying the Schwarz inequality to (3.5) and using the invariance of Lebesgue measure we have
$$|\phi''(t, \epsilon, A)| \le \int \{\nabla h_\epsilon(y) \cdot Ay\}^2\, dy. \tag{3.6}$$
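To bound this last integral, first reexpress it and then use Schwarz's inequality:
$$\int \{\nabla h_\epsilon(y) \cdot Ay\}^2\, dy = \frac{1}{4}\int \left(\frac{d}{dt}\,\phi_\epsilon * f(e^{tA}x)\right)^2 \Big/ \phi_\epsilon * f(e^{tA}x)\, dx$$
and
$$\left(\frac{d}{dt}\,\phi_\epsilon * f(e^{tA}x)\right)^2 = \left(\int \phi_\epsilon(y)\,\frac{d}{dt} f(e^{tA}x - y)\, dy\right)^2 \le \int \phi_\epsilon(y)\left(\frac{d}{dt} f(e^{tA}x - y)\right)^2 \Big/ f(e^{tA}x - y)\, dy \cdot \int f(e^{tA}x - y)\phi_\epsilon(y)\, dy.$$
Hence
$$\begin{aligned}
\int \{\nabla h_\epsilon(y) \cdot Ay\}^2\, dy &\le \frac{1}{4}\int\!\!\int \phi_\epsilon(y)\left(\frac{d}{dt} f(e^{tA}x - y)\right)^2 \Big/ f(e^{tA}x - y)\, dy\, dx \\
&= \frac{1}{4}\int \left(\frac{d}{dt} f(e^{tA}x)\right)^2 \Big/ f(e^{tA}x)\, dx \\
&= \int \{\nabla\sqrt{f(y)} \cdot Ay\}^2\, dy < \infty.
\end{aligned}$$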
The finiteness of the last integral naturally is just the application of the hypothesis of finite Fisher information.
Since the last integral is just a quadratic function of $A$, we have a uniform bound
$$\sup_{\|A\| = 1} \int \{\nabla\sqrt{f(y)} \cdot Ay\}^2\, dy = B < \infty,$$
which in (3.4) and (3.6) yields
$$1 - Bt^2/2 \le \phi(t, \epsilon, A) \tag{3.7}$$
for all $0 \le \epsilon < \infty$, $\|A\| = 1$, and $-\infty < t < \infty$. From (3.7) it follows immediately that $\prod_{k=1}^{\infty} \phi(t_k, \epsilon_k, A_k) > 0$ whenever $\sum_{k=1}^{\infty} \|g_k\|^2 < \infty$. By Kakutani's theorem we then have $\{X_i + Z_i\} \sim \{g_i(X_i + Z_i)\}$, so by the equivalences (3.2) and (3.3) the proof of Theorem 2 is complete. $\square$
4. Proof of Theorem 3. If $\mu$ is any measure on $\mathbb{R}^n$, we define the translated measure $\mu_a$, $a \in \mathbb{R}^n$, by $\mu_a(A) = \mu(A + a)$ for all Borel $A \subset \mathbb{R}^n$.
LEMMA 4.1 [cf. Shepp (1965), Lemma 5]. If $\mu \sim \mu_a$ for all $a \in \mathbb{R}^n$, then $\mu$ is absolutely continuous with respect to Lebesgue measure and corresponds to a strictly positive density.
PROOF. Letting $\lambda$ denote Lebesgue measure we have by Fubini's theorem that
$$\int_{\mathbb{R}^n} \mu(A + a)\, da = \lambda(A). \tag{4.1}$$
Now, if $\mu(A) = 0$, then $\mu_a \sim \mu$ for all $a$ implies $\mu(A + a) = 0$ for all $a$, which by (4.1) shows $\lambda(A) = 0$. The Radon-Nikodym theorem then shows $\mu$ has a density $f(\cdot)$.
Now, to prove Theorem 3 we suppose that $\{X_1, X_2, \ldots\}$ is equivalent to $\{g_1X_1, g_2X_2, \ldots\}$ for all $\{g_i\}_{i=1}^{\infty}$ such that $\sum_{i=1}^{\infty} \|g_i\|^2 < \infty$. Since we can take $g_1$ to be an arbitrary translation and then take $g_i = e$ for $i \ge 2$, we see from Lemma 4.1 that the $X_i$ must have a positive density $f$.
Next, let $L$ be any infinitesimal operator associated with a one-parameter subgroup $H$ of rigid motions. We have to show that $Lh \in L^2(\mathbb{R}^d)$, where $h = \sqrt{f}$.
There are two cases to consider: $H$ compact and $H$ noncompact. It is well known that if $H$ is compact, it must be conjugate to the subgroup of rotations in the $x_1x_2$ plane. Further, if $H$ is noncompact, it must be conjugate to the subgroup of translations along the $x_1$-axis.
We consider first the harder case of $H$ compact. Writing $H = \{e^{tA}: t \in \mathbb{R}\}$ we know there is a rigid motion $M$ so that
$$\exp(tA)M = M\exp(tA_\theta), \tag{4.2}$$
where $\exp(tA_\theta)$ corresponds to rotation in the $x_1x_2$ plane.
Now we let $h_0(x) = h(Mx)$ and calculate
$$\begin{aligned}
I(t) &= \int h(x)h(\exp(tA)x)\, dx = \int h(Mx)h(\exp(tA)Mx)\, dx \\
&= \int h(Mx)h(M\exp(tA_\theta)x)\, dx = \int h_0(x)h_0(\exp(tA_\theta)x)\, dx.
\end{aligned} \tag{4.3}$$
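We write $(x_1, x_2, \ldots, x_d) = (\rho\cos\theta, \rho\sin\theta, x_3, \ldots, x_d)$, and define $h_{00}(\theta, \rho, x_3, \ldots, x_d) = h_0(x_1, x_2, \ldots, x_d)$ to obtain
$$\begin{aligned}
I(t) &= \int h_0(x)h_0(\exp(tA_\theta)x)\, dx \\
&= \int h_{00}(\theta, \rho, x_3, \ldots, x_d)\, h_{00}(\theta + t, \rho, x_3, \ldots, x_d)\, \rho\, d\rho\, d\theta\, dx_3 \cdots dx_d.
\end{aligned} \tag{4.4}$$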
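Setting
$$\hat{h}_{00}(n, \rho, x_3, \ldots, x_d) = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{in\theta}\, h_{00}(\theta, \rho, x_3, \ldots, x_d)\, d\theta,$$
one obtains by Parseval's identity that
$$I(t) = \int \Big\{\sum_{n \in \mathbb{Z}} |\hat{h}_{00}(n, \rho, x_3, \ldots, x_d)|^2 \cos(nt)\Big\}\, \rho\, d\rho\, dx_3 \cdots dx_d \tag{4.5}$$
and
$$\begin{aligned}
1 - I(t) &= \int \Big\{\sum_{n \in \mathbb{Z}} |\hat{h}_{00}(n, \rho, x_3, \ldots, x_d)|^2 (1 - \cos(nt))\Big\}\, \rho\, d\rho\, dx_3 \cdots dx_d \\
&\ge \frac{t^2}{4}\int \Big\{\sum_{|nt| \le 1} n^2\, |\hat{h}_{00}(n, \rho, x_3, \ldots, x_d)|^2\Big\}\, \rho\, d\rho\, dx_3 \cdots dx_d.
\end{aligned} \tag{4.6}$$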
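The lower bound in (4.6) rests on an elementary inequality of the form $1 - \cos u \ge u^2/4$ for $|u| \le 1$, applied with $u = nt$ to the terms with $|nt| \le 1$ (the remaining terms are nonnegative and are simply dropped). The constant $1/4$ is taken here as it appears in (4.6). The following minimal sketch, assuming NumPy, checks the inequality on a fine grid; the grid size is an arbitrary illustrative choice.

```python
import numpy as np

# Fine grid on the interval |u| <= 1.
u = np.linspace(-1.0, 1.0, 200001)

# Use the identity 1 - cos(u) = 2 sin^2(u / 2), which avoids the cancellation
# that a direct evaluation of 1 - cos(u) suffers from near u = 0.
one_minus_cos = 2.0 * np.sin(u / 2.0) ** 2

# Gap between the two sides of the inequality; it should never be negative.
gap = one_minus_cos - u**2 / 4.0
min_gap = float(gap.min())
gap_at_endpoint = float(gap[-1])
```

The gap vanishes only at $u = 0$ and is largest at the endpoints $|u| = 1$, so the restriction to $|nt| \le 1$ leaves room to spare.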
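Now, we note that $Lh \in L^2$ if and only if
$$\frac{d}{d\theta}\, h_{00}(\theta, \rho, x_3, \ldots, x_d) \in L^2(\rho\, d\rho\, d\theta\, dx_3 \cdots dx_d).$$
Therefore, if $Lh \notin L^2$, the function
$$T(t) = \int \Big\{\sum_{|nt| \le 1} n^2\, |\hat{h}_{00}(n, \rho, x_3, \ldots, x_d)|^2\Big\}\, \rho\, d\rho\, dx_3 \cdots dx_d$$
satisfies $T(t) \to \infty$ as $t \to 0$.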
We can now use an elementary lemma on real sequences which is from Shepp (1965).
LEMMA 4.2. If $T(x) \to \infty$ as $x \to 0$, there exists a real sequence $a_k$ with $\sum a_k^2 < \infty$, but $\sum a_k^2\, T(a_k) = \infty$.
Applied to (4.6) this lemma shows that if $Lh \notin L^2$ there exist $\{a_k\} \in l^2$ such that $\prod_{k=1}^{\infty} I(a_k) = 0$. By Kakutani's theorem this says that the processes $\{X_1, X_2, \ldots\}$ and $\{\exp(a_1A)X_1, \exp(a_2A)X_2, \ldots\}$ are singular. This contradicts the hypothesis of Theorem 3, and therefore establishes the fact that $Lh \in L^2$ in the case that $H$ is compact. For the noncompact case, one performs a similar reduction to the case of a one-dimensional translation. After that reduction the proof can be completed just as above except that the Fourier transform replaces the Fourier series. $\square$
5. Final remarks. There is an intimate connection between equivalence under $l^2$ Euclidean perturbations and finite Fisher information. Shepp (1965) posed the question of determining the class of distributions $F$ which are equivalent for all $l^p$ translation perturbations with $p \ne 2$. This problem was settled definitively by Chatterji and Mandrekar (1977). It would be interesting to know if the results of Chatterji and Mandrekar (1977) can be extended to the case of $l^p$ Euclidean perturbations with $p \ne 2$.
Acknowledgment. I would like to thank F. Huffer, M. Shahshahani, and L. A. Shepp for their comments on an earlier draft of this manuscript. This research was supported in part by National Science Foundation Grants MCS-8024649 and DMS-8414069, and technical assistance from the Department of Statistics, University of Chicago.
REFERENCES
AUSLANDER, L. and MACKENZIE, R. E. (1977). Introduction to Differentiable Manifolds. Dover, New York.
CHATTERJI, S. D. and MANDREKAR, V. (1977). Quasi-invariant measures under translation. Math. Z. 154 19-29.
FELDMAN, J. (1961). Examples of non-Gaussian quasi-invariant distributions in a Hilbert space. Trans. Amer. Math. Soc. 99 342-349.
KAKUTANI, S. (1948). On equivalence of product measures. Ann. Math. 49 214-224.
RÉNYI, A. (1967). On some basic problems of statistics from the point of view of information theory. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 531-543. Univ. California Press.
SHEPP, L. A. (1965). Distinguishing a sequence of random variables from a translate of itself. Ann. Math. Statist. 36 1107-1112.
DEPARTMENT OF STATISTICS
FINE HALL
WASHINGTON ROAD
PRINCETON UNIVERSITY
PRINCETON, NEW JERSEY 08544