The Annals of Statistics 1978, Vol. 6, No. 4, 932-934

University of British Columbia
If $\hat f_n(x)$ is any estimator of the density $f(x)$, it is proved that the mean integrated square error is no better than $O(n^{-1})$.

1. Introduction. In his paper [5] on nonparametric density estimation, Wegman states that it would be interesting to show that there is no density estimator whose mean integrated square error has a rate better than $O(n^{-1})$. The object of this note is to prove such a result, making no arbitrary assumptions about the specific form of the estimator. This proof is given in Section 2. Our method applies to some other measures of error, as we point out in Section 3.
To be precise, a density estimator $\hat f_n(x) = \hat f_n(x; x_1, \ldots, x_n)$ is a sequence of Borel functions defined on $R^{n+1}$. If $X_1, X_2, \ldots$ are independent identically distributed random variables with density $f(x)$, then $\hat f_n(x; X_1, \ldots, X_n)$ provides an estimate for $f(x)$. The mean integrated square error is defined to be

(1) $\mathrm{MISE}(n) = E_f \int_{-\infty}^{\infty} (f(x) - \hat f_n(x))^2 \, dx ,$
where $E_f$ denotes expectation according to the density $f$.
Tartar and Kronmal [3] and Wegman [4, 5] give a nice review of the extensive literature on such estimators.

2. The main result.

THEOREM. For any density estimator $\hat f_n(x)$, there is a square integrable density $f$, and a constant $c > 0$ such that

(2) $E_f \int_{-\infty}^{\infty} (f(x) - \hat f_n(x))^2 \, dx \ge c/n ,$

for infinitely many $n$. Thus there is no density estimator for which MISE(n) is better than $O(n^{-1})$. In (2), $f$ can be chosen to be a normal density with mean zero.

Received March 1977; revised October 1977.
1 Supported in part by the National Research Council of Canada and the I. W. Killam Foundation.
AMS 1970 subject classifications. Primary 62G20; Secondary 62F20.
Key words and phrases. Nonparametric, density estimation, mean integrated square error, Cramér-Rao inequality.

PROOF. We shall introduce a parametric family of densities $f_\theta(x)$, and use $\hat f_n(x)$ to construct an estimator $\hat\theta_n$ for the parameter $\theta$. Specifically, if $n(x; 0, \sigma^2)$ is a normal density with mean 0, we define

(3) $\theta = \int_0^1 n(x; 0, \sigma^2) \, dx$

and let $J = \{\theta : 1 \le \sigma \le 2\}$, a closed interval. For each $\theta \in J$ there is a unique $\sigma \in [1, 2]$ for which (3) holds; we denote the corresponding density $n(x; 0, \sigma^2)$ by $f_\theta(x)$. Thus $\theta = \int_0^1 f_\theta(x) \, dx$. Let $\tilde f_n(x)$ be obtained from $\hat f_n(x)$ by truncating in the following way: $\tilde f_n(x) = \min(\max(\hat f_n(x), 0), 1)$, so that $0 \le \tilde f_n(x) \le 1$. We use this to construct the following estimator of $\theta$:

(4) $\hat\theta_n = \int_0^1 \tilde f_n(x) \, dx .$
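
As a numerical illustration of this construction, the following sketch computes $\theta$ from $\sigma$ through (3), clips a density estimate into [0, 1], and integrates it over [0, 1] to obtain the estimator in (4). It also evaluates the squared error of that estimator together with the two integrated square errors that the proof compares in (5). The Gaussian kernel estimate, its bandwidth 0.5, the sample size 100, and the midpoint rule with step 0.01 are assumptions made only for this example; the argument of the note applies to an arbitrary estimator. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density n(x; 0, sigma^2) of a mean-zero normal distribution."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def theta_of_sigma(sigma):
    """Integral of n(x; 0, sigma^2) over [0, 1], i.e. Phi(1/sigma) - 1/2."""
    return 0.5 * math.erf(1.0 / (sigma * math.sqrt(2.0)))


def truncate(value):
    """Clip an estimated density value into [0, 1] (the truncation step)."""
    return min(max(value, 0.0), 1.0)


def midpoint_integral(func, lo, hi, h=0.01):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    steps = int(round((hi - lo) / h))
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


def make_kernel_estimator(sample, bandwidth):
    """Assumed stand-in for the arbitrary estimator: a Gaussian kernel estimate."""
    n = len(sample)

    def f_hat(x):
        return sum(normal_pdf(x - xi, bandwidth) for xi in sample) / n

    return f_hat


random.seed(0)
sigma = 1.5  # any value in [1, 2]
sample = [random.gauss(0.0, sigma) for _ in range(100)]
f_hat = make_kernel_estimator(sample, bandwidth=0.5)

theta = theta_of_sigma(sigma)
theta_hat = midpoint_integral(lambda x: truncate(f_hat(x)), 0.0, 1.0)

# Squared error of the parameter estimate.
lhs = (theta - theta_hat) ** 2
# Integrated square error of the truncated estimate on [0, 1].
ise_01 = midpoint_integral(
    lambda x: (normal_pdf(x, sigma) - truncate(f_hat(x))) ** 2, 0.0, 1.0)
# Integrated square error of the untruncated estimate on a wide interval.
ise_full = midpoint_integral(
    lambda x: (normal_pdf(x, sigma) - f_hat(x)) ** 2, -12.0, 12.0)

print("theta range for sigma in [1, 2]:", theta_of_sigma(2.0), theta_of_sigma(1.0))
print("squared error of theta_hat:", lhs)
print("ISE on [0, 1] (truncated):", ise_01)
print("ISE on the line (untruncated):", ise_full)
```

The three printed error values should appear in nondecreasing order, up to quadrature error. Because $\theta$ is a decreasing function of $\sigma$, the interval $J$ runs from `theta_of_sigma(2.0)` to `theta_of_sigma(1.0)`.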

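The basic observation is that by Schwarz' inequality we have

(5) $(\theta - \hat\theta_n)^2 = \{\int_0^1 (f_\theta(x) - \tilde f_n(x)) \, dx\}^2 \le \int_0^1 (f_\theta(x) - \tilde f_n(x))^2 \, dx \le \int_{-\infty}^{\infty} (f_\theta(x) - \hat f_n(x))^2 \, dx .$

Thus, writing $E_\theta = E_{f_\theta}$,

(6) $E_\theta(\theta - \hat\theta_n)^2 \le E_\theta \int_{-\infty}^{\infty} (f_\theta(x) - \hat f_n(x))^2 \, dx .$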
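
The individual steps behind (5) can be spelled out as follows. This is a sketch in which $\hat f_n$ denotes the original estimator, $\tilde f_n$ its truncation to [0, 1], and $g$ is shorthand introduced here for their difference from $f_\theta$. The first line uses the definitions of $\theta$ and $\hat\theta_n$, the second is Schwarz' inequality applied against the constant function 1 on [0, 1], and the last two use the fact that $f_\theta$ itself takes values in [0, 1] when $\sigma \ge 1$, so that clipping $\hat f_n$ into [0, 1] cannot move it further from $f_\theta$.

```latex
\begin{align*}
\theta - \hat\theta_n
  &= \int_0^1 f_\theta(x)\,dx - \int_0^1 \tilde f_n(x)\,dx
   = \int_0^1 \bigl(f_\theta(x) - \tilde f_n(x)\bigr)\,dx, \\
\Bigl(\int_0^1 g(x)\cdot 1\,dx\Bigr)^2
  &\le \int_0^1 g(x)^2\,dx \int_0^1 1^2\,dx
   = \int_0^1 g(x)^2\,dx,
   \qquad g = f_\theta - \tilde f_n, \\
0 \le f_\theta(x) &\le \frac{1}{\sigma\sqrt{2\pi}} < 1
   \quad (\sigma \ge 1)
   \;\Longrightarrow\;
   \bigl|f_\theta(x) - \tilde f_n(x)\bigr| \le \bigl|f_\theta(x) - \hat f_n(x)\bigr|, \\
\int_0^1 \bigl(f_\theta - \tilde f_n\bigr)^2 dx
  &\le \int_0^1 \bigl(f_\theta - \hat f_n\bigr)^2 dx
   \le \int_{-\infty}^{\infty} \bigl(f_\theta - \hat f_n\bigr)^2 dx .
\end{align*}
```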
The theorem will thus be proved by exhibiting a $\theta^*$ such that

(7) $E_{\theta^*}(\theta^* - \hat\theta_n)^2 \ge c/n ,$

for infinitely many n.
By the Cramér-Rao inequality [6], page 188, one has

(8) $E_\theta(\theta - \hat\theta_n)^2 \ge B(\theta)^2 + (1 + B'(\theta))^2 / (n I(\theta)) ,$

where $B(\theta) = E_\theta(\hat\theta_n) - \theta$, and $I(\theta) = E_\theta((\partial/\partial\theta) \log f_\theta(X))^2$. The validity of (8) may be justified most easily by checking that the steps of the proof of (8) in [6], pages 182-188, are valid for the density $f_\theta$ used here. It is easily checked that $B(\theta)$, $B'(\theta)$ and $I(\theta)$ are continuous functions of $\theta$. Since $J$ is closed, we have $\sup_{\theta \in J} I(\theta) = M < \infty$. Let $J_1 = [a, b]$ be any closed interval in the interior of $J$, and let $n_1$ satisfy $n_1^{-1} \le (b - a)^2/8$. Let $S^2 = \sup_{\theta \in J_1} (1 + B'(\theta))^2$. If $S > 1/2$ then there is an interval $J_2 \subset J_1$ on which $(1 + B'(\theta))^2 \ge 1/8$ and thus (8) implies $E_\theta(\theta - \hat\theta_{n_1})^2 \ge 1/(8 n_1 M)$ for $\theta \in J_2$. On the other hand, if $S \le 1/2$ we can argue as follows: $B(b) - B(a) = B'(c)(b - a)$ for some $c$ with $a < c < b$. But $1 - |B'(c)| \le 1 + B'(c) \le S$, so $|B'(c)| \ge 1 - S \ge 1/2$ and thus

(9) $\max(|B(a)|, |B(b)|) \ge |B(b) - B(a)|/2 \ge (b - a)/4 .$

Let $\theta_1 = a$ or $b$ satisfy $|B(\theta_1)| = \max(|B(a)|, |B(b)|)$. Then (8), (9) and the choice of $n_1$ imply

(10) $E_{\theta_1}(\theta_1 - \hat\theta_{n_1})^2 \ge B(\theta_1)^2 \ge 1/(2 n_1) .$

By the continuity of $B(\theta)$, there is a closed interval $J_2 \subset J_1$ such that for $\theta \in J_2$ one has $E_\theta(\theta - \hat\theta_{n_1})^2 > 1/(4 n_1)$. Repeating the argument, we obtain a sequence of nested closed intervals $J_1 \supset J_2 \supset \cdots$ and a sequence of integers $n_1 < n_2 < \cdots$ such that $E_\theta(\theta - \hat\theta_{n_k})^2 \ge c/n_k$ for $\theta \in J_{k+1}$. Since the intersection $\bigcap_{k=1}^{\infty} J_k$ is nonempty, there is thus a $\theta^*$ for which (7) holds for $n = n_1, n_2, \ldots$. This completes the proof.

3. Further considerations. The application of Schwarz' inequality in (5) can be replaced by an application of Hölder's inequality to yield a priori lower bounds on the mean integrated $p$th power error for $p \ge 1$. We omit these for the sake of brevity.
For certain specific classes of estimators one may obtain more precise lower bounds. For example, Rosenblatt [2] shows that estimators of kernel type have MISE(n) no better than $O(n^{-4/5})$. Fryer [1] has made an empirical study of such estimators for small $n$, when $f$ is a normal density, and the results indicate that the rate predicted by Rosenblatt is attainable.
We wish to emphasize the fact that our result does not depend on any assumptions about the density $f$ or the estimator $\hat f_n$. Moreover, the proof shows that, even if $f$ were known to be normal, with known mean, no improvement on the rate $O(n^{-1})$ would be possible.
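
To make the last remark concrete, the following simulation sketch looks at the setting in which $f$ is known to be normal with mean zero and only the scale is unknown. It uses a plug-in estimator, namely the normal density whose variance is the sample mean of the squared observations; this estimator, the sample sizes, the replication count, and all function names are assumptions chosen for illustration and do not come from the note. The integrated square error between two mean-zero normal densities has a closed form, so no numerical integration is needed. If the rate $n^{-1}$ is the right order in this parametric setting, the printed values of $n \cdot \mathrm{MISE}(n)$ should stay roughly constant as $n$ grows.

```python
import math
import random


def ise_between_normals(s1, s2):
    """Exact integral over the line of (n(x; 0, s1^2) - n(x; 0, s2^2))^2."""
    root_pi = math.sqrt(math.pi)
    return (1.0 / (2.0 * root_pi * s1)
            + 1.0 / (2.0 * root_pi * s2)
            - 2.0 / math.sqrt(2.0 * math.pi * (s1 * s1 + s2 * s2)))


def mise_plugin(n, sigma, reps, rng):
    """Monte Carlo estimate of the MISE of the assumed plug-in normal estimator."""
    total = 0.0
    for _ in range(reps):
        # Variance estimate with known mean zero: average of squared observations.
        mean_sq = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)) / n
        total += ise_between_normals(sigma, math.sqrt(mean_sq))
    return total / reps


rng = random.Random(1)
scaled = {n: n * mise_plugin(n, 1.0, 2000, rng) for n in (50, 100, 200)}
for n, value in scaled.items():
    print(n, round(value, 4))
```

The simulation only illustrates that a rate of order $n^{-1}$ can be observed for one particular estimator and one density; the theorem concerns the lower bound, which holds for every estimator.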

[1] FRYER, M. J. (1976). Some errors associated with the nonparametric estimation of density functions. J. Inst. Math. Appl. 18 371-380.
[2] ROSENBLATT, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27 832-837.
[3] TARTAR, M. E. and KRONMAL, R. A. (1976). An introduction to the implementation and theory of nonparametric density estimation. Amer. Statist. 30 105-112.
[4] WEGMAN, E. J. (1972). Nonparametric probability density estimation: I. A summary of available methods. Technometrics 14 513-546.
[5] WEGMAN, E. J. (1972). Nonparametric probability density estimation: II. A comparison of estimation methods. J. Statist. Comp. and Simulation 1 225-245.
[6] ZACKS, S. (1971). The Theory of Statistical Inference. Wiley, Toronto.