Since residuals can be correlated, any already observed outcomes of the same individual can be informative for predicting that individual's unobserved values.
Assume that the data are sorted such that $Y_{ij} = y_{ij}$, $j = k+1, k+2, \dots, p$ are observed and $Y_{ij}$, $j = 1, 2, \dots, k$ are not. The special case of all outcomes being unobserved (new individual) is covered by $k = p$.
Let further $$ \Sigma_i(X_i, \theta) = \begin{pmatrix} \Sigma_i^{new,new}(X_i,\theta) & \Sigma_i^{new,old}(X_i,\theta)\\ \Sigma_i^{old,new}(X_i,\theta) & \Sigma_i^{old,old}(X_i,\theta)\end{pmatrix} $$
be a block decomposition, where $\Sigma_i^{new,new}(X_i,\theta) = ((\Sigma_i(X_i,\theta))_{j,l})_{j = 1 \dots k,\, l = 1 \dots k}$ and similarly for the other blocks.
Predictions can then be made based on the conditional distribution
$$ Y_{i, 1 \dots k} \,|\, X_i, Y_{i, k+1 \dots p} = y_{i, k+1 \dots p} \sim \mathcal{N}(\mu_i, A_i) $$
with
$$ \mu_i(\beta, \theta) = (X_i \beta)_{1 \dots k} + \Sigma_i^{new,old}(X_i,\theta) (\Sigma_i^{old,old}(X_i,\theta))^{-1} (y_{i, k+1 \dots p} - (X_i \beta)_{k+1 \dots p}) $$
and
$$ A_i(\beta, \theta) = \Sigma_i^{new,new}(X_i,\theta) - \Sigma_i^{new,old}(X_i,\theta) (\Sigma_i^{old,old}(X_i,\theta))^{-1} \Sigma_i^{old,new}(X_i,\theta). $$
Note that $A_i$ does not depend on $\beta$.
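As a numerical illustration (not the package's actual implementation), the block formulas above can be sketched in Python with NumPy; all inputs below are made-up example values:

```python
import numpy as np

def conditional_moments(mu, sigma, y_obs, k):
    """Conditional mean and covariance of the first k (unobserved, "new")
    outcomes given the remaining (observed, "old") outcomes y_obs."""
    s_nn = sigma[:k, :k]   # Sigma^{new,new}
    s_no = sigma[:k, k:]   # Sigma^{new,old}
    s_oo = sigma[k:, k:]   # Sigma^{old,old}
    # Solve linear systems instead of forming the explicit inverse.
    cond_mean = mu[:k] + s_no @ np.linalg.solve(s_oo, y_obs - mu[k:])
    cond_cov = s_nn - s_no @ np.linalg.solve(s_oo, s_no.T)
    return cond_mean, cond_cov

# Toy example: p = 2 outcomes, the first one unobserved (k = 1).
mu = np.array([0.0, 0.0])                   # X_i beta
sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])              # Sigma_i(X_i, theta)
m, a = conditional_moments(mu, sigma, y_obs=np.array([1.0]), k=1)
```

Because the two outcomes are positively correlated, observing $y = 1$ pulls the predicted mean up from 0 to 0.5 and shrinks the variance from 2 to 1.5.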
## `predict`

For implementing `predict()`, only $\hat\mu_i := \mu_i(\hat\beta, \hat\theta)$ is required.
For `predict(interval = "confidence")`, standard errors are additionally required. These could be derived using the delta method, since $\mu_i$ is a function of the estimated model parameters $\beta$ and $\theta$. This would require the Jacobian $\nabla \mu_i(\beta, \theta)|_{(\hat\beta, \hat\theta)}$ in addition to the estimated variance-covariance matrix $\hat{S}$ of the parameter estimate $(\hat\beta, \hat\theta)$. Standard errors for $\hat\mu_i$ are then given by the square root of the diagonal elements of $(\nabla \mu_i(\beta, \theta)|_{(\hat\beta, \hat\theta)})^\top \hat{S} \, (\nabla \mu_i(\beta, \theta)|_{(\hat\beta, \hat\theta)})$.
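For illustration, a minimal sketch of this delta-method computation, assuming the Jacobian and $\hat{S}$ are already available (the values below are placeholders, not outputs of a real fit):

```python
import numpy as np

def delta_method_se(jac, s_hat):
    """Standard errors of mu_hat, given the (k x q) Jacobian of mu_i with
    respect to (beta, theta) and the (q x q) covariance estimate S_hat."""
    cov_mu = jac @ s_hat @ jac.T   # delta-method covariance of mu_hat
    return np.sqrt(np.diag(cov_mu))

# Placeholder Jacobian and parameter covariance with k = q = 2.
jac = np.array([[1.0, 0.0],
                [0.0, 2.0]])
s_hat = 4.0 * np.eye(2)
se = delta_method_se(jac, s_hat)
```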
For `predict(interval = "prediction")` one would use the square root of the diagonal elements of $A_i(\hat\beta, \hat\theta)$ instead. The delta method could again be used to make the upper and lower boundaries reflect parameter estimation uncertainty. Alternatively, both intervals can be derived using a parametric bootstrap sample of the unrestricted parameters $\theta$. This would probably also be easier for the `interval = "prediction"` case.
Please note that for these intervals, we assume that the distribution is approximately normal: we use $\mu_{i,j}(\hat\beta, \hat\theta) \pm z_{\alpha} \sqrt{A_{i,j,j}(\hat\beta, \hat\theta)}$ to construct them, where $\mu_{i,j}(\hat\beta, \hat\theta)$ is the $j$th element of $\mu_i(\hat\beta, \hat\theta)$, and $A_{i,j,j}(\hat\beta, \hat\theta)$ is the $(j, j)$ element of $A_i(\hat\beta, \hat\theta)$.
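A sketch of this normal-approximation interval, using only the standard library for the quantile (the means and variances below are illustrative):

```python
from statistics import NormalDist
import numpy as np

def normal_interval(mu, var_diag, level=0.95):
    """Normal-approximation interval mu +/- z_alpha * sqrt(diagonal of A)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * np.sqrt(np.asarray(var_diag))
    return np.asarray(mu) - half, np.asarray(mu) + half

# Toy predicted means and diagonal elements of A_i.
lower, upper = normal_interval(mu=[1.0, 2.0], var_diag=[0.25, 1.0])
```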
With the conditional variance formula
$$ Var(Y_i) = Var(E(Y_i|\theta)) + E(Var(Y_i|\theta)) $$
and the conditional expectation and conditional variance already derived above as
$$ E(Y_i|\theta) = \mu_i(\beta, \theta) $$
and
$$ Var(Y_i|\theta) = A_i(\beta, \theta), $$
we can sample $\theta$ (obtaining the corresponding $\beta$ for each draw), then estimate $Var(E(Y_i|\theta))$ by the sample variance of the conditional means and $E(Var(Y_i|\theta))$ by the sample mean of the conditional variances.
If there are no observations for a subject, then the prediction is quite simple:
$$ Y_i = X_i \hat\beta. $$
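The law-of-total-variance computation above can be illustrated with a toy Monte Carlo sketch; the distributions and functional forms below are made up for illustration and are not the package's actual sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: theta is a single parameter, and the conditional moments
# are simple functions of it (in mmrm they would be mu_i and A_i).
theta_draws = rng.normal(loc=1.0, scale=0.1, size=5000)
cond_means = 2.0 * theta_draws        # E(Y_i | theta)
cond_vars = np.exp(theta_draws)       # Var(Y_i | theta)

# Var(Y_i) = Var(E(Y_i | theta)) + E(Var(Y_i | theta))
total_var = cond_means.var(ddof=1) + cond_vars.mean()
```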
To simulate responses from a fitted model, we distinguish two situations: simulation conditional on the $\theta$ and $\beta$ estimates, and marginal simulation.
Under the conditional simulation setting, the variance-covariance matrix and the expectation of $Y_i$ are already given in the Mathematical Derivations above. Please note that the implementation of the `predict` function only uses the diagonal elements of $A_i$; here, however, we need the full matrix $A_i$ to obtain correctly correlated simulated observations.
To simulate marginally, we take the variance of $\hat\theta$ and $\hat\beta$ into consideration. For each simulation, we first generate $\theta$, assuming it approximately follows a multivariate normal distribution. Then, conditional on the sampled $\theta$, we generate $\beta$, also assuming it approximately follows a multivariate normal distribution. With these $\theta$ and $\beta$ samples, we then proceed exactly as in the conditional simulation.
## `simulate`

To implement the `simulate` function, we first ensure that the expectation ($\mu$) and variance-covariance matrix ($A$) are generated in the `predict` function, for each of the subjects.
For `simulate(method = "conditional")`, we use the estimated $\theta$ and $\beta$ to construct $\mu$ and $A$ directly, and generate responses from the $N(\mu, A)$ distribution.
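For illustration, correlated responses can be drawn from $N(\mu, A)$ with the full matrix $A$ via a Cholesky factor (toy values below, not taken from a real fit):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
A = np.array([[1.0, 0.6],
              [0.6, 1.0]])   # full conditional covariance, not just its diagonal

L = np.linalg.cholesky(A)    # A = L L^T
# Each row of the standard normal matrix is one draw; multiplying by L^T
# on the right gives rows with covariance L L^T = A.
draws = mu + rng.standard_normal((10000, 2)) @ L.T
```

The empirical covariance of `draws` approaches $A$, so the simulated outcomes carry the intended within-subject correlation.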
For `simulate(method = "marginal")`, for each repetition of the simulation, we generate $\theta_{new}$ from the `mmrm` fit, where the estimate of $\theta$ and the variance-covariance matrix of $\theta$ are provided. Using the generated $\theta_{new}$, we then obtain $\beta_{new}$ and its variance-covariance matrix, with $\theta_{new}$ and the data used in the fit.
Then we sample $\beta$ as follows. We note that on the `C++` side we already have the robust Cholesky decomposition of the inverse of its asymptotic covariance matrix:
$$ cov(\hat\beta) = (X^\top W X)^{-1} = (L D L^\top)^{-1}. $$
Hence we make sure to report the lower triangular matrix $L$ and the diagonal matrix $D$ back to the `R` side, and afterwards we can generate $\beta$ samples as
$$ \beta_{sample} = \beta_{new} + L^{-\top} D^{-1/2} z_{sample}, $$
where $z_{sample}$ is drawn from the standard multivariate normal distribution, since
$$ cov(L^{-\top} D^{-1/2} z_{sample}) = L^{-\top} D^{-1/2} I_p D^{-1/2} L^{-1} = L^{-\top} D^{-1} L^{-1} = (L D L^\top)^{-1} = cov(\hat\beta). $$
We note that calculating $w = L^{-\top} D^{-1/2} z_{sample}$ is efficient via backwards solving $L^\top w = D^{-1/2} z_{sample}$, since $L^\top$ is upper triangular.
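A numerical sketch of this sampling step with toy $L$ and $D$ factors (in the package these come from the `C++` side):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy LDL^T factors of X^T W X: unit lower-triangular L and diagonal D.
L = np.array([[1.0, 0.0],
              [0.5, 1.0]])
d = np.array([4.0, 9.0])                # diagonal of D
beta_new = np.array([0.0, 0.0])

z = rng.standard_normal((20000, 2))
# Solve L^T w = D^{-1/2} z for each draw (a dedicated triangular solver
# would be used in practice); cov(w) = (L D L^T)^{-1}.
w = np.linalg.solve(L.T, (z / np.sqrt(d)).T).T
beta_samples = beta_new + w
```

The empirical covariance of `beta_samples` approaches $(L D L^\top)^{-1}$, matching the derivation above.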
Then we simulate the observations once with `simulate(method = "conditional", beta = beta_sample, theta = theta_new)`.
We pool all the repetitions together and thus obtain the marginal
simulation results.
## `predict` and `simulate` Results

We summarize the different options for the `predict` and `simulate` methods and explain how they relate to each other.
### `predict` options

`predict(type = "confidence")` gives the variance of the predictions conditional on the $\theta$ estimate, taking into account the uncertainty of estimating $\beta$. So here we ignore the uncertainty in estimating $\theta$. It also does not add the measurement error $\epsilon$. We can use this prediction when we are only interested in predicting the mean of the unobserved $y_i$, assuming the estimated $\theta$ as the true variance parameters.

`predict(type = "prediction")` in contrast takes into account the full uncertainty, including the variance of $\theta$ and the measurement error $\epsilon$. We can use this prediction when we are interested in reliable confidence intervals for the unobserved $y_i$ as well as the observed $y_i$, assuming we would like to predict repeated observations from the same subjects and time points.

### `simulate` options

`simulate(type = "conditional")` simulates observations while keeping the $\theta$ and $\beta$ estimates fixed. It adds measurement errors $\epsilon$ when generating the simulated values. Hence the mean of the simulated values will be within the confidence intervals from `predict(type = "confidence")`.

`simulate(type = "marginal")` simulates observations while taking into account the uncertainty of $\beta$ and $\theta$ through sampling from their asymptotic frequentist distributions. On top of that, it also adds measurement errors $\epsilon$. This hence uses the same distribution as `predict(type = "prediction")`.

## SAS
In SAS, from `proc mixed`, we are able to generate predictions using the `outp` argument in the `model` statement. For example:
```sas
PROC MIXED DATA = fev_data method=reml;
  CLASS RACE(ref = 'Asian') AVISIT(ref = 'VIS4') SEX(ref = 'Male') ARMCD(ref = 'PBO') USUBJID;
  MODEL FEV1 = ARMCD / ddfm=Satterthwaite solution chisq outp=pred;
  REPEATED AVISIT / subject=USUBJID type=un r rcorr;
  LSMEANS ARMCD / pdiff=all cl alpha=0.05 slice=AVISIT;
RUN;
```
However, there are some differences between the SAS implementation and our `mmrm` package, described as follows:
- Prediction of the conditional mean: `mmrm` and SAS both provide predicted means (conditional on other observations) for unobserved records. SAS also provides predicted means for observed records, while `mmrm` does not. The rationale is that in the `mmrm` package we want to be consistent with the notion of predictions conditional on the observed records, which means that observed records are observed and therefore there is no prediction uncertainty anymore.
- Prediction standard errors differ between `mmrm` and SAS. While in SAS the prediction standard error is conditional on the estimated variance parameters $\theta$, in `mmrm` the marginal prediction standard error is provided. The rationale is that in the `mmrm` package we want to take into account the full uncertainty about parameter estimates, including $\theta$.
- Prediction intervals in SAS are based on the $t$ distribution, while currently in `mmrm` we use the normal distribution. We will consider an extension towards using the $t$ distribution in the future and welcome feedback on this detail.