Find parameter estimates in an independent strata Generalized
Linear Model using projected score equations.
DESCRIPTION:
Produces an object of class "psglm" which contains
components of the solution to projected score equations
for a Generalized Linear Model.
USAGE:
psglm(nu = matrix(1, nrow = dim(y)[1], ncol = dim(y)[2]), y, x,
family = gaussian, weights = rep(1, length(y[, 1])), start = NULL,
expinfo = T, conditional = T, paramtest = NULL,
control = glm.control(epsilon = 0.0001, maxit = 100,...), ...)
REQUIRED ARGUMENTS:
y: a matrix of response variables with rows corresponding to
strata.
x: either a three dimensional array of covariates or a matrix
of covariates.
OPTIONAL ARGUMENTS:
nu: A matrix of trials for when y is binomial.
family: a family object - a list of functions and expressions
for defining the link and inverse functions etc. Cumulants
are obtained by turning this into a familytoo object.
weights: a vector of weights, one for each stratum.
start: a list of two components. The first contains initial
estimates for the parameters of interest -- the within
strata parameters, the second contains initial estimates
for the nuisance parameters -- the parameters associated
with the strata totals.
conditional: when set to T the projected score equations are
solved giving approximate conditional estimates. When set
to F the full maximum likelihood score equations are
solved.
expinfo: a flag to indicate if the computationally faster
maximum likelihood score information should be used to
solve the projected score equations. If set to F then the
projected score efficient information is calculated which
requires a numerical differentiation.
paramtest: a vector of integers denoting which parameters of
interest should be tested using a projected score test.
control: a list of iteration and algorithmic constants. See
glm.control for their names and default values. In
addition to the controls used in glm, psglm has to define
a step size for the numerical differentiations. This is
taken to be sqrt(eps).
VALUE:
An object of class "psglm".
DETAILS:
The output can be examined by print and summary. Some
checking is done of the input but not a major amount. In
particular no check is done for "conditionally
uninformative strata" which at present will cause the
estimation routine to run into trouble. These
uninformative strata could be removed in a pre-processing
step.
Parameter estimates for a Generalized Linear Model are
obtained by solving projected score equations -- resulting
in approximately conditional estimates. The solutions are
obtained by iterating between estimating the parameters of
interest and estimating the nuisance parameters using
either a Fisher scoring or a Newton-Raphson algorithm. The
data is assumed to be described by a two way array in
which the rows/strata are independent and conditional on
an individual stratum parameter, the within strata
observations are also independent. Standard examples are
conditional logistic regression in case control studies
and the one-parameter Rasch model.
At present only canonical links are allowed for.
Weights are available and have two uses. Firstly they are
valuable in reducing the dimensionality of some data sets;
identical strata (both response and covariates) can be
collapsed into a single stratum with a weight that
represents the number of collapsed strata. Secondly
dispersion parameters for the Gaussian and Gamma families
can be incorporated using weights equal to the inverse of
the dispersion parameter.
A projected score test is available for hypothesis
testing. Currently the null hypothesis is that the
selected parameters are equal to zero.
There are differences between psglm and glm output because
projected scores do not have to correspond to any
likelihood and therefore notions such as deviance do not
carry over in an unambiguous way. If there is a single
stratum, and "conditional" is set to F, then the parameter
estimates and standard errors from psglm and glm should be
identical. (One caveat: the link for the Gamma family is
taken to be -1/mu, so -psglm = glm.) Since psglm uses
higher order asymptotics, in particular the third and
fourth cumulant functions are needed, some of the common
methods used to implement glm fail in psglm. An example is
the way that binomial data is treated in glm -- the
binomial trials are subsumed into the weights because they
only enter the maximum likelihood score equations in a
linear manner. In psglm they enter the projected score
equation in a non-linear fashion and therefore can not be
factored out into the weights.
REFERENCES:
Waterman, R.P. and Lindsay, B.G. (1996). A simple and
accurate method for approximate conditional inference in
generalized linear models. JRSSB 58: 177-78.
Waterman, R.P. and Lindsay, B.G. (1996). The accuracy of
projected score methods in approximating conditional
scores. Biometrika 83: 1-14.
Small, C.G. and McLeish, D.L. (1989). Projection as a
method for increasing sensitivity and eliminating nuisance
parameters. Biometrika 76: 693-703.
SEE ALSO:
familytoo, cumulants.
EXAMPLES:
# Conditional estimation of item parameters in a Rasch model.
source("raschwei.q")
summary(psglm(y=y,x=z,family=binomial,weights=w,trace=T,
paramtest=c(1,2)),correlation=T)
# Estimating a log-odds ratio in a proportional hazards model
source("coxreg.q")
summary(psglm(nu=n,y=y,x=a,family=binomial,paramtest=c(1)))
# An exact conditional estimate of the log-odds ratio for paired
# Bernoulli trials. The estimate should be 0.5*log(4/7) and the
# estimated standard error 0.5* sqrt(1/7 + 1/4)
y <- matrix(c(1,0,0,1),byrow=T,ncol=2)
w <- c(4,7)
z <- matrix(c(1,-1),ncol = 1)
psglm(y = y,x = z,family=binomial,weights = w,expinfo=F)