DATA ANALYTIC TOOLS FOR CHOOSING TRANSFORMATIONS IN SIMPLE LINEAR REGRESSION

Richard D. De Veaux and J. Michael Steele, Princeton University
Program in Statistics and Operations Research, School of Engineering and Applied Science, Princeton, NJ 08544

ABSTRACT

Transformations of the regressor and/or the response in simple regression are often sought to increase linear association and to make residuals appear more nearly normally distributed with constant variance. The ACE (alternating conditional expectation) algorithm of Breiman and Friedman (1985) finds the transformations maximizing the correlation between the regressor and response, while the AVAS (additivity and variance stabilization) algorithm of Tibshirani (1988) uses a variance-stabilizing transformation of the response. An exploratory data tool, the bulging rule of Mosteller and Tukey (1977), is used to find specific functional forms for the relationships suggested by the ACE and AVAS algorithms. Data on the water content of soil are used to illustrate the procedure.

1. Introduction

Recently, two powerful methods for estimating optimal transformations for regression and correlation have been proposed. For data $(x_i, y_i)$, $1 \le i \le n$, the ACE algorithm of Breiman and Friedman finds transformations $f$ and $g$ such that the empirical correlation of the transformed data $f(x_i)$, $g(y_i)$, $1 \le i \le n$, is approximately maximized. The term ACE is an acronym for alternating conditional expectation. If $(X, Y)$ is a pair of jointly distributed random variables, one can define $f$ and $g$ as the limits of the functions $f_n$ and $g_n$ determined by taking $f_0(x) = x$, $g_0(y) = y$ and applying the recursions

$$f_{n+1}(X) = E\left(g_n(Y) \mid X\right)$$

and

$$g_{n+1}(Y) = \frac{E\left(f_n(X) \mid Y\right)}{\left[\operatorname{var}\left(E\left(f_n(X) \mid Y\right)\right)\right]^{1/2}}.$$

The $f$ and $g$ determined by this process can be shown to maximize $\operatorname{Corr}(f(X), g(Y))$ (subject to $\operatorname{var}(g(Y)) = 1$). Naturally, if the joint distribution of $X$ and $Y$ is not known, one cannot find $f$ and $g$ precisely by this method, but one can derive an empirically based algorithm as a natural modification of the theoretical algorithm, by replacing the conditional expectations by scatterplot smoothers. In their implementation, Breiman and Friedman used a refined version of the supersmoother of Friedman and Stuetzle (1982).

One problem of ACE is that the transformations, while maximizing linear association, may introduce heteroscedasticity in the response. The AVAS algorithm of Tibshirani (1988) is designed to alleviate this problem. It is similar to the ACE algorithm except that instead of using $g_{n+1}(y) = E(f_n(x) \mid y) / [\operatorname{var}(E(f_n(x) \mid y))]^{1/2}$ it uses the asymptotic variance-stabilizing transformation. (The details of the algorithm can be found in Tibshirani (1988).)

The ACE and AVAS algorithms can be useful as stand-alone tools for descriptive purposes. The result of each algorithm is two estimated functions $f(x_i)$ and $g(y_i)$, $1 \le i \le n$. However, it is often desirable or even necessary to obtain specific functional forms for $f$ and $g$, that is, to find functions which approximate $f$ and $g$ and retain the desirable properties of the ACE or AVAS transformations.

The purpose of this paper is to illustrate the use of the bulging rule of Mosteller and Tukey (1977) as an aid in finding an explicit functional form approximating the ACE and AVAS transformations. Additionally, we compare the differences in the transformations suggested in these algorithms. Data from an experiment on soil water diffusivity are used as an example. The parameter of interest, the diffusion coefficient, is a product of two functionals (a derivative and an integral) of $X$ and $Y$. By finding an explicit additive functional form $F(Y) = a + bG(X)$ we are able to calculate the functionals explicitly.

The article is organized as follows. The diffusion problem is more extensively described in Section 2. The procedure for finding the transformations is outlined in Section 3. In Section 4 we carry out the procedure on the experimental data in detail. Section 5 contains discussion and some concluding remarks.

2. Soil Water Diffusion Problem

The movement of water in a horizontal column of unsaturated soil is commonly modeled by means of the one-dimensional diffusion equation

$$\frac{\partial \theta}{\partial t} = \frac{\partial}{\partial x}\left[ D(\theta)\,\frac{\partial \theta}{\partial x} \right] \tag{2.1}$$

where $\theta$ is the water content of the soil, $t$ is time, $x$ is the position in the horizontal column, and $D(\theta)$ is the coefficient of soil water diffusivity at the moisture level $\theta$. Any two-variable function $\theta(x, t)$ that satisfies (2.1) can be shown (cf. Jost (1960, p. 31)) to be a function of the single


variable $\lambda = x/t^{1/2}$, which is often called the Boltzmann variable. After writing $\theta(\lambda)$ for the new function of one variable, one can check that $\theta(\lambda)$ satisfies the ordinary differential equation:

$$-\frac{\lambda}{2}\,\frac{d\theta}{d\lambda} = \frac{d}{d\lambda}\left[ D(\theta)\,\frac{d\theta}{d\lambda} \right] \tag{2.2}$$

From this equation one can then easily obtain the expression for $D(\theta)$ which underlies our approach to its estimation:

$$D(\theta) = -\frac{1}{2}\,\frac{d\lambda}{d\theta} \int_{\theta_0}^{\theta} \lambda(u)\,du, \tag{2.3}$$

where $\theta_0$ is the initial water content of the soil. The simultaneous appearance of both derivative and integral terms in this expression for $D(\theta)$ provides one of the most intriguing features of its estimation.
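
The passage from (2.2) to (2.3) can be sketched as follows. The sketch relies on assumptions that the text does not spell out: far from the water source the water content tends to its initial value $\theta_0$, the flux term $D(\theta)\,d\theta/d\lambda$ vanishes as $\lambda \to \infty$, and $\theta$ is strictly monotone in $\lambda$ so that $\lambda$ can be written as a function of $\theta$.

```latex
% Integrate (2.2) over [\lambda, \infty), assuming D(\theta)\,d\theta/ds \to 0 as s \to \infty:
\begin{align*}
-\frac{1}{2}\int_{\lambda}^{\infty} s\,\frac{d\theta}{ds}\,ds
  &= \left[ D(\theta)\,\frac{d\theta}{ds} \right]_{s=\lambda}^{s=\infty}
   = -D(\theta)\,\frac{d\theta}{d\lambda}. \\
% Substitute u = \theta(s); the limits s = \lambda, \infty become u = \theta, \theta_0:
\int_{\lambda}^{\infty} s\,\frac{d\theta}{ds}\,ds
  &= \int_{\theta}^{\theta_0} \lambda(u)\,du
   = -\int_{\theta_0}^{\theta} \lambda(u)\,du. \\
% Combine the two lines and divide by d\theta/d\lambda:
D(\theta)
  &= -\frac{1}{2}\,\frac{d\lambda}{d\theta}\int_{\theta_0}^{\theta} \lambda(u)\,du.
\end{align*}
```

Since water content falls as $\lambda$ grows, $d\lambda/d\theta$ is negative, and the leading minus sign makes $D(\theta)$ positive.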
The process that has been most widely used to estimate $D(\theta)$ is the transient-flow experiment of Bruce and Klute (1956). In that experiment, water is held at a constant head and permitted to infiltrate into a horizontal column containing air-dry soil. After a fixed time interval, the column is sectioned, and the water content of the individual sections is determined either by weighing or by other methods. The data of Clothier and Scotter (1982) on Manawatu sandy loam plotted in Figure 1 are typical of those obtained through horizontal infiltration experiments. They also give an indication of some of the inherent difficulties in estimating $D(\theta)$. For instance, many smoothing methods when applied to the data of Figure 1 would lead to a virtually useless estimate of the derivative of $\lambda$ with respect to $\theta$. We will return to the problem of estimating $D(\theta)$ in Section 4. For further details on the experiment and historical background the reader is referred to De Veaux and Steele (1989) and Clothier and Scotter (1982).

Figure 1. Scatterplot of the volumetric water content, $\theta$, versus the Boltzmann variable $\lambda$ (horizontal axis in units of $10^{-3}$ m/sec$^{1/2}$).

3. Estimation Process

The details of the method we propose are possibly best explained in the context of an example such as the analysis of $D(\theta)$ of Manawatu sandy loam. Moreover, one almost has to have an honest example in hand in order to detail the role of the tools we have used to assist our transformation choice: the bulging rule and the ACE and AVAS algorithms. With that said, it seems useful to have a top-down view of the proposed method. The four basic steps are the following:

Step 1. Find estimated transformations $f(\theta)$ and $g(\lambda)$ from the ACE and AVAS algorithms. (We use $\lambda$ for the regressor and $\theta$ for the response.) The transformed data values in both cases will exhibit a strong linear association. The $R^2$ values from the regressions of $f(\theta_i)$ on $g(\lambda_i)$ will be used as benchmarks against which we will compare the $R^2$ from more analytically tractable transformations.

Step 2. Use the so-called bulging rule of Mosteller and Tukey (1977) to suggest analytically tractable functions $F(\theta)$ and $G(\lambda)$ which retain the desirable properties of the functions found in Step 1.

Step 3. Perform the regression of $F(\theta)$ on $G(\lambda)$, and use diagnostic tools to assess the appropriateness of the linear model:

$$F(\theta) = a + b\,G(\lambda) + \epsilon. \tag{3.1}$$

Step 4. If functionals of $\theta$ or $\lambda$ need to be estimated, one can use (3.1) directly to obtain estimates.

As an example of Step 4, consider the case of the diffusion equation (2.1). After performing Steps 1-3, one would use the chain rule to extract from (3.1) an expression for $d\lambda/d\theta$ in terms of $\theta$. Then either analytic or numerical integration is used to determine the values of the definite integrals:

$$I(\theta) = \int_{\theta_0}^{\theta} \lambda(u)\,du \tag{3.2}$$

for all $\theta$ from $\theta_0$ up to the largest water content of interest. Finally the diffusion coefficient $D(\theta)$ is estimated by the expression

$$D(\theta) = -\frac{1}{2}\,\frac{d\lambda}{d\theta} \int_{\theta_0}^{\theta} \lambda(u)\,du \tag{3.3}$$

where the indicated derivative and integral are those determined previously.
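
Step 4 can be sketched numerically as follows. The sketch assumes the particular forms $F(\theta) = \theta^3$ and $G(\lambda) = e^{\lambda}$ that Section 4 arrives at. The function names and the demonstration coefficients `A_DEMO` and `B_DEMO` are hypothetical, chosen only so that $\lambda$ is real and decreasing in $\theta$; they are not the fitted values. Simpson's rule stands in for whichever integration method one prefers.

```python
import math


def lam_of_theta(theta, a, b):
    """lambda(theta) = G^{-1}((F(theta) - a) / b) for F = theta**3, G = exp."""
    return math.log((theta ** 3 - a) / b)


def dlam_dtheta(theta, a, b):
    """Chain rule on F(theta) = a + b*G(lambda): F'(theta) / (b * G'(lambda))."""
    lam = lam_of_theta(theta, a, b)
    return 3.0 * theta ** 2 / (b * math.exp(lam))


def integral_lam(theta0, theta, a, b, n=200):
    """Composite Simpson estimate of the integral of lambda(u) over [theta0, theta].

    n is the number of subintervals and must be even.
    """
    h = (theta - theta0) / n
    total = lam_of_theta(theta0, a, b) + lam_of_theta(theta, a, b)
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        total += weight * lam_of_theta(theta0 + k * h, a, b)
    return total * h / 3.0


def diffusivity(theta, theta0, a, b):
    """Combine the derivative and the integral as in the expression for D(theta)."""
    return -0.5 * dlam_dtheta(theta, a, b) * integral_lam(theta0, theta, a, b)


# Hypothetical coefficients for illustration only (not the paper's estimates).
A_DEMO, B_DEMO = 0.045, -1.2e-4
```

For example, `diffusivity(0.30, 0.10, A_DEMO, B_DEMO)` evaluates the estimate at $\theta = 0.30$ with $\theta_0 = 0.10$. Because both factors come from one closed-form fit, no numerical differentiation of noisy data is needed.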

4. An Example: Manawatu Sandy Loam

To understand the extent of the linear association between $\lambda$ and $\theta$ that can be achieved by marginal transformations, we examine the results of applying the ACE and AVAS algorithms. Even for the best choices of $f$ and $g$, the linear association between $\lambda$ and $\theta$ is imperfect. Still, the ACE transformed variables plotted in Figure 2 and the AVAS transformed variables plotted in Figure 3 exhibit substantially greater linear association than the plot of the untransformed variables given in Figure 1. (We have used the implementation of the empirical ACE algorithm due to L. Breiman which is incorporated in The Statistics Store (I.M. Schilling (1985)), and the implementation of AVAS obtained from R. Tibshirani (see Tibshirani (1988)).) When we measure the linear association of the transformed variables in terms of $R^2$, we find respectable values of $R^2 = .93$ for ACE and $R^2 = .92$ for AVAS. These values provide us with a benchmark, and, in fact, one of the principal benefits of the ACE and AVAS algorithms is that they provide a standard against which more analytically appealing transformations can be judged.
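
To make the benchmark step concrete, here is a minimal sketch of an empirical ACE iteration in the notation of Section 1 ($f$ for the regressor, $g$ for the response). It is not the implementation used for the results above. The running-mean smoother is a crude stand-in for the supersmoother, the update of `g` uses the freshly updated `f`, and all function names and the `span` and `iterations` defaults are assumptions of the sketch.

```python
import math
import random


def standardize(v):
    """Center to mean 0 and scale to variance 1."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in v) / n)
    return [(t - mean) / sd for t in v]


def running_mean(x, z, span=0.3):
    """Estimate E(z | x) at each x_i by averaging z over neighbours in x-order.

    This simple smoother is an assumption of the sketch, not the supersmoother.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    half = max(1, int(span * n / 2))
    fitted = [0.0] * n
    for rank, i in enumerate(order):
        lo, hi = max(0, rank - half), min(n, rank + half + 1)
        window = [z[order[j]] for j in range(lo, hi)]
        fitted[i] = sum(window) / len(window)
    return fitted


def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    su, sv = standardize(u), standardize(v)
    return sum(p * q for p, q in zip(su, sv)) / len(u)


def ace(x, y, iterations=20):
    """Alternate the two smoothed conditional expectations, starting from g_0(y) = y."""
    g = standardize(y)
    f = list(x)
    for _ in range(iterations):
        f = running_mean(x, g)                # f <- E(g(Y) | X)
        g = standardize(running_mean(y, f))   # g <- E(f(X) | Y), rescaled to variance 1
    return f, g
```

Given paired lists `x` and `y`, `correlation(*ace(x, y)) ** 2` plays the role of the benchmark $R^2$ against which a closed-form pair of transformations can be compared.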

Figure 2. Scatterplot of the ACE transformed $\theta_i$ versus the ACE transformed $\lambda_i$ (vertical axis $f(\theta)$, horizontal axis $g(\lambda)$). This plot is used to assess linearity of the ACE transformation.

Figure 3. Scatterplot of the AVAS transformed $\theta_i$ versus the AVAS transformed $\lambda_i$ (vertical axis $f(\theta)$, horizontal axis $g(\lambda)$). This plot is used to assess linearity of the AVAS transformation.

To aid the search for such surrogates for $f$ and $g$, the ACE and AVAS transformed variables are plotted against the untransformed variables to see if simpler functional forms might suffice. Figures 4 and 5 show the plots of $(\lambda_i, g(\lambda_i))$, $1 \le i \le n$, and $(\theta_i, f(\theta_i))$, $1 \le i \le n$, respectively, for both ACE and AVAS.

Figure 4. Scatterplot of the ACE and AVAS transformed $\lambda_i$ versus $\lambda_i$ that is used to suggest a power transformation approximating $g(\lambda)$. 1 = ACE, 2 = AVAS; standardized $e^{\lambda}$ is also shown.

Figure 5. Scatterplot of the ACE and AVAS transformed $\theta_i$ versus $\theta_i$ that is used to suggest a power transformation approximating $f(\theta)$. Notice that the bulge rule may not be directly applicable here. 1 = ACE, 2 = AVAS (standardized).

The hunt for analytically tractable replacements for $f$ and $g$ is further guided by the so-called bulging rule of Mosteller and Tukey (1977). Loosely speaking, the bulging rule suggests finding an outward normal to a smoothed plot of the data and using the signs of the normal components to guide one's choice of transformation. For example, Figure 4 exhibits a bulge where both the x and y components of the outward normal are positive. The bulging rule then suggests that the variable plotted on the horizontal axis should be transformed by moving up the scale of powers. In fact, the successive examination of plots of $(\lambda_i^{\alpha}, g(\lambda_i))$, $1 \le i \le n$, for larger values of $\alpha$ continues to suggest moving up the scale of powers, and we are thus led to consider the exponential transformation. For comparison, $e^{\lambda}$ (standardized to have mean 0 and variance 1) is also shown in Figure 4. The exponential appears to be a compromise between the transformations suggested by ACE and AVAS. An alternative approach to this exploratory search for an appropriate transformation would be to use the method of Box and Cox (1964).
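
The climb up the scale of powers can be mimicked numerically, as in the sketch below. It replaces the visual inspection of the bulge with a comparison of $R^2$ values, which is an assumption of the sketch rather than part of the bulging rule itself. The function names and the default list of powers are hypothetical.

```python
import math


def ladder(x, power):
    """Re-express x on the ladder of powers; power 0 is taken to be the logarithm."""
    if power == 0:
        return [math.log(t) for t in x]
    return [t ** power for t in x]


def r_squared(u, v):
    """Squared correlation, i.e. R^2 of the simple linear regression of v on u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv * suv / (suu * svv)


def climb(x, target, powers=(1, 2, 3, 4)):
    """R^2 between each re-expression of x and the target values, plus exp(x)."""
    scores = {p: r_squared(ladder(x, p), target) for p in powers}
    scores["exp"] = r_squared([math.exp(t) for t in x], target)
    return scores
```

Here `target` would hold the ACE or AVAS transformed values $g(\lambda_i)$. If the scores keep rising with the power and the exponential scores highest, the numbers tell the same story as the successive plots described above.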

When we begin a similar examination of the plot of $(\theta_i, f(\theta_i))$, $1 \le i \le n$, given in Figure 5, the bulging rule for reexpression diverges for ACE and AVAS. For the ACE transformation, there may be a modest indication that we might wish to send $\theta$ down the scale of powers, but the indication is not supported when tried. Fortunately, we have recourse to a second approach that does suggest an appropriate transformation, and we can consider the plot of $\theta_i$ versus $e^{\lambda_i}$, which is shown in Figure 6. After all, since we have settled on $e^{\lambda}$ as the surrogate for $g(\lambda)$, the principal remaining task is to determine a surrogate $F$ for $f$ such that the scatterplot $(F(\theta_i), e^{\lambda_i})$ is approximately linearized. The bulging rule applied to Figure 6 initially suggests that we consider a transformation $F$ that moves $\theta$ up the ladder of powers, and successive applications of the bulging rule eventually lead us to the choice of $F(\theta) = \theta^3$.

Figure 6. Scatterplot of $\theta_i$ versus $e^{\lambda_i}$. The bulge rule suggests going up the ladder for either $\theta$ or $\lambda$.

For the AVAS transformation in Figure 4, the bulging rule is directly applicable and suggests using $F(\theta) = \theta^3$ again. (The correlation between the AVAS $f(\theta_i)$ and $\theta_i^3$ is .999.) Thus, for this data set, we are led to the same transformation from both algorithms. Notice that $G(\lambda) = e^{\lambda}$ and $F(\theta) = \theta^3$ preserve the homoscedasticity of the AVAS transformations and the linearity of both ACE and AVAS. Strikingly, using $F(\theta) = \theta^3$ and $G(\lambda) = e^{\lambda}$ achieves an $R^2$ of .93 that meets the level of the optimal $R^2 = .93$ achieved by the ACE transformations. Moreover, when we consider the plot of $\theta_i^3$ versus $e^{\lambda_i}$ (both standardized) given in Figure 7, the visual impact of the linear association exhibited by this figure seems to compare well with that exhibited by the ACE transformed variables of Figure 2 and the AVAS transformed variables of Figure 3. On the basis of the quantitative evidence provided by comparing $R^2$'s, the subjective evidence provided by comparison of the scatterplots of Figures 2, 3, and 7, and the fact that both the ACE and AVAS algorithms suggested the same transformations, it seems appropriate to settle on the transformation choices of Figure 7.
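
With the transformations settled, Step 3 of the procedure is a simple linear regression of $\theta^3$ on $e^{\lambda}$. The sketch below shows one way to carry it out and to collect the quantities needed for a residual check. The function names and the returned dictionary layout are hypothetical, and the standard errors are the usual nominal ones under constant error variance.

```python
import math


def fit_line(u, v):
    """Ordinary least squares for v = a + b*u + error, with nominal standard errors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((p - mu) ** 2 for p in u)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    b = suv / suu
    a = mv - b * mu
    fitted = [a + b * p for p in u]
    residuals = [q - fq for q, fq in zip(v, fitted)]
    sse = sum(r * r for r in residuals)
    sst = sum((q - mv) ** 2 for q in v)
    s2 = sse / (n - 2)  # residual variance estimate
    return {
        "a": a,
        "b": b,
        "se_a": math.sqrt(s2 * (1.0 / n + mu * mu / suu)),
        "se_b": math.sqrt(s2 / suu),
        "r2": 1.0 - sse / sst,
        "fitted": fitted,
        "residuals": residuals,
    }


def fit_surrogate(lam, theta):
    """Regress F(theta) = theta**3 on G(lambda) = exp(lambda)."""
    return fit_line([math.exp(t) for t in lam], [t ** 3 for t in theta])
```

Plotting `residuals` against `fitted` gives the kind of diagnostic display used to judge homoscedasticity, and `r2` can be set beside the ACE and AVAS benchmarks.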

Figure 7. Scatterplot of $\theta_i^3$ versus $e^{\lambda_i}$ (standardized). Approximate linearity is achieved with this transformation, which should be compared with the ACE transformations of Figure 2.

Figure 8. Scatterplot of residuals versus predicted values for the model $\theta^3 = a + b e^{\lambda}$. Residuals appear to be approximately homoscedastic.

For the Manawatu sandy loam data our exploratory analysis has led us to an approximate relationship of the form

$$F(\theta) = a + b\,G(\lambda), \tag{4.1}$$

where $F(\theta) = \theta^3$ and $G(\lambda) = e^{\lambda}$. The coefficients in (4.1) can now be estimated by ordinary least squares, from which we obtain $a = 4.48 \times 10^{-2}$ and $b = -1.20 \times 10^{-4}$ with nominal standard errors of $5.30 \times 10^{-4}$ and $3.30 \times 10^{-6}$, respectively. In Figure 8, we show a plot of the predicted values, $\hat{\theta}_i^3$, versus the residuals that are obtained from fitting the model (4.1) by ordinary least squares. The residuals appear approximately homoscedastic, and we have no reason to be discontent with the estimates obtained by ordinary least squares. If the scatterplot of Figure 8 had exhibited greater heteroscedasticity, we would probably have elected to apply iteratively reweighted least squares, or a similarly directed technique.

As a final check on the reasonability of the fitted model, one should consider the fit in terms of the untransformed variables as exhibited in Figure 8. This plot has no flagrant defects; indeed it suggests that the procedure has been a reasonable one.

By differentiating (4.1) we find the general relationship

$$\frac{d\lambda}{d\theta} = \frac{b^{-1} F'(\theta)}{G'\left(G^{-1}\left((F(\theta) - a)/b\right)\right)}, \tag{4.2}$$

and, for the choices that were made by means of the exploratory analysis of the Manawatu sandy loam data, one finds a particularly simple net result:

$$\frac{d\lambda}{d\theta} = \frac{3\theta^2}{\theta^3 - a}. \tag{4.3}$$

In order to obtain $D(\theta)$ it remains only to determine the integral of $\lambda(\theta) = G^{-1}((F(\theta) - a)/b)$. For the Manawatu sandy loam data we find $\lambda(\theta) = \log((\theta^3 - a)/b)$, and the integral of $\lambda(\theta)$ can be determined analytically. For more details, including a discussion of interval estimates of $D(\theta)$, the reader is referred to De Veaux and Steele (1989).

5. Discussion

Both the ACE and AVAS algorithms were used to suggest analytic forms for a transformation of a regressor and response which would exhibit linearity and homoscedasticity. To aid the search for such functional forms, the bulging rule of Mosteller and Tukey (1977) was used when appropriate. The ACE transformation, while displaying a high degree of linearity ($R^2 = .93$), also showed nonconstant variance in the response. The AVAS transformations did nearly as well in terms of linearity ($R^2 = .92$) and the transformed response had nearly constant variance. For our data set, both the ACE and AVAS algorithms led to the same functional forms $G(\lambda) = e^{\lambda}$ and $F(\theta) = \theta^3$ for the regressor and response respectively. Functionals of the curves were directly attainable from the linear model $F(\theta) = a + bG(\lambda)$, which through residual analysis seemed
plausible. The success of the procedure in this case suggests that both the ACE and AVAS transformations should be considered as exploratory tools by the data analyst to suggest appropriate functional forms for transformation.

References

Box, G.E.P. and Cox, D.R. (1964). "An analysis of transformations," J. Royal Statist. Soc., B26, 211-243, discussion 244-252.

De Veaux, R.D. and Steele, J.M. (1989). "ACE guided transformation method for estimation of the coefficient of soil water diffusivity," Technometrics, (In press).

DuChateau, P.C., Nofziger, D.L., Ahuja, L.R., and Swartzendruber, D. (1972). "Experimental curves and rates of change from piecewise parabolic fits," Agron. J., 64, 538-542.

Friedman, J.H. and Stuetzle, W. (1982). "Smoothing of scatterplots," Technical Report Orion 3, Department of Statistics, Stanford University.

Jost, W. (1960). Diffusion in Solids, Liquids, and Gases 3rd ed.,