Title: Reference Based Multiple Imputation
Description: Implements standard and reference based multiple imputation methods for continuous longitudinal endpoints (Gower-Page et al. (2022) <doi:10.21105/joss.04251>). In particular, this package supports deterministic conditional mean imputation and jackknifing as described in Wolbers et al. (2022) <doi:10.1002/pst.2234>, Bayesian multiple imputation as described in Carpenter et al. (2013) <doi:10.1080/10543406.2013.834911>, and bootstrapped maximum likelihood imputation as described in von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>.
Authors: Craig Gower-Page [aut, cre], Alessandro Noci [aut], Marcel Wolbers [ctb], Isaac Gravestock [aut], F. Hoffmann-La Roche AG [cph, fnd]
Maintainer: Craig Gower-Page <[email protected]>
License: Apache License (>= 2)
Version: 1.3.1
Built: 2024-12-18 15:24:42 UTC
Source: https://github.com/insightsengineering/rbmi
Utility function to add a class to an object. Adds the new class after any existing classes.
add_class(x, cls)
x |
object to add a class to. |
cls |
the class to be added. |
Adjust trajectories due to the intercurrent event (ICE)
adjust_trajectories( distr_pars_group, outcome, ids, ind_ice, strategy_fun, distr_pars_ref = NULL )
distr_pars_group |
Named list containing the simulation parameters of the multivariate normal distribution assumed for the given treatment group. It contains the following elements:
|
outcome |
Numeric variable that specifies the longitudinal outcome. |
ids |
Factor variable that specifies the id of each subject. |
ind_ice |
A binary variable that takes value 1 if the corresponding visit is affected by the ICE and 0 otherwise. |
strategy_fun |
Function implementing trajectories after the intercurrent event (ICE). Must
be one of |
distr_pars_ref |
Optional. Named list containing the simulation parameters of the reference arm. It contains the following elements:
|
A numeric vector containing the adjusted trajectories.
Adjust trajectory of a subject's outcome due to the intercurrent event (ICE)
adjust_trajectories_single( distr_pars_group, outcome, strategy_fun, distr_pars_ref = NULL )
distr_pars_group |
Named list containing the simulation parameters of the multivariate normal distribution assumed for the given treatment group. It contains the following elements:
|
outcome |
Numeric variable that specifies the longitudinal outcome. |
strategy_fun |
Function implementing trajectories after the intercurrent event (ICE).
Must be one of |
distr_pars_ref |
Optional. Named list containing the simulation parameters of the reference arm. It contains the following elements:
|
outcome should be specified such that all-and-only the post-ICE observations (i.e. the observations to be adjusted) are set to NA.
A numeric vector containing the adjusted trajectory for a single subject.
This function takes multiple imputed datasets (as generated by
the impute()
function) and runs an analysis function on
each of them.
analyse( imputations, fun = ancova, delta = NULL, ..., ncores = 1, .validate = TRUE )
imputations |
An |
fun |
An analysis function to be applied to each imputed dataset. See details. |
delta |
A |
... |
Additional arguments passed onto |
ncores |
The number of parallel processes to use when running this function. Can also be a
cluster object created by |
.validate |
Should |
This function works by performing the following steps:
1. Extract a dataset from the imputations object.
2. Apply any delta adjustments as specified by the delta argument.
3. Run the analysis function fun on the dataset.
4. Repeat steps 1-3 across all of the datasets inside the imputations object.
5. Collect and return all of the analysis results.
The analysis function fun must take a data.frame as its first argument. All other options to analyse() are passed onto fun via .... fun must return a named list with each element itself being a list containing a single numeric element called est (or additionally se and df if you had originally specified method_bayes() or method_approxbayes()), i.e.:
myfun <- function(dat, ...) {
    mod_1 <- lm(data = dat, outcome ~ group)
    mod_2 <- lm(data = dat, outcome ~ group + covar)
    x <- list(
        trt_1 = list(
            est = coef(mod_1)[[group]],
            se = sqrt(vcov(mod_1)[group, group]),
            df = df.residual(mod_1)
        ),
        trt_2 = list(
            est = coef(mod_2)[[group]],
            se = sqrt(vcov(mod_2)[group, group]),
            df = df.residual(mod_2)
        )
    )
    return(x)
}
Please note that the vars$subjid column (as defined in the original call to draws()) will be scrambled in the data.frames that are provided to fun. This is to say they will not contain the original subject values and as such any hard coding of subject ids is strictly to be avoided.
By default fun is the ancova() function. Please note that this function requires that a vars object, as created by set_vars(), is provided via the vars argument, e.g. analyse(imputeObj, vars = set_vars(...)). Please see the documentation for ancova() for full details.
Please also note that the theoretical justification for the conditional mean imputation method (method = method_condmean() in draws()) relies on the fact that ANCOVA is a linear transformation of the outcomes. Thus care is required when applying alternative analysis functions in this setting.
The delta argument can be used to specify offsets to be applied to the outcome variable in the imputed datasets prior to the analysis. This is typically used for sensitivity or tipping point analyses. The delta dataset must contain the columns vars$subjid, vars$visit (as specified in the original call to draws()) and delta. Essentially this data.frame is merged onto the imputed dataset by vars$subjid and vars$visit and then the outcome variable is modified by:

imputed_data[[vars$outcome]] <- imputed_data[[vars$outcome]] + imputed_data[["delta"]]
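For illustration, a hedged sketch of this merge-and-add step in base R, using hypothetical data.frames imputed_data and delta_df (the object and column names here are assumptions for the example, not rbmi internals):

# Merge the delta dataset onto the imputed dataset by subject and visit
adjusted <- merge(imputed_data, delta_df, by = c("subjid", "visit"), all.x = TRUE)
# Rows without a matching delta entry receive no offset
adjusted$delta[is.na(adjusted$delta)] <- 0
# Apply the offset to the outcome variable
adjusted$outcome <- adjusted$outcome + adjusted$delta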
Please note that in order to provide maximum flexibility, the delta argument can be used to modify any/all outcome values, including those that were not imputed. Care must be taken when defining offsets. It is recommended that you use the helper function delta_template() to define the delta datasets as this provides utility variables such as is_missing which can be used to identify exactly which visits have been imputed.
To speed up the evaluation of analyse() you can use the ncores argument to enable parallelisation. Simply providing an integer will get rbmi to automatically spawn that many background processes to parallelise across. If you are using a custom analysis function then you need to ensure that any libraries or global objects required by your function are available in the sub-processes. To do this you need to use the make_rbmi_cluster() function, for example:

my_custom_fun <- function(...) <some analysis code>
cl <- make_rbmi_cluster(
    4,
    objects = list("my_custom_fun" = my_custom_fun),
    packages = c("dplyr", "nlme")
)
analyse(
    imputations = imputeObj,
    fun = my_custom_fun,
    ncores = cl
)
parallel::stopCluster(cl)
Note that there is significant overhead both with setting up the sub-processes and with transferring data back-and-forth between the main process and the sub-processes. As such, parallelisation of the analyse() function tends to only be worth it when you have > 2000 samples generated by draws(). Conversely, using parallelisation if your samples are smaller than this may lead to longer run times than just running it sequentially.
It is important to note that the implementation of parallel processing within analyse() has been optimised around the assumption that the parallel processes will be spawned on the same machine and not on a remote cluster. One such optimisation is that the required data is saved to a temporary file on the local disk from which it is then read into each sub-process. This is done to avoid the overhead of transferring the data over the network. Our assumption is that if you are at the stage where you need to parallelise your analysis over a remote cluster then you would likely be better off parallelising across multiple rbmi runs rather than within a single rbmi run.
Finally, if you are doing a tipping point analysis you can get a reasonable performance improvement by re-using the cluster between each call to analyse(), e.g.

cl <- make_rbmi_cluster(4)
ana_1 <- analyse(
    imputations = imputeObj,
    delta = delta_plan_1,
    ncores = cl
)
ana_2 <- analyse(
    imputations = imputeObj,
    delta = delta_plan_2,
    ncores = cl
)
ana_3 <- analyse(
    imputations = imputeObj,
    delta = delta_plan_3,
    ncores = cl
)
parallel::stopCluster(cl)
extract_imputed_dfs() for manually extracting imputed datasets.
delta_template() for creating delta data.frames.
ancova() for the default analysis function.
## Not run:
vars <- set_vars(
    subjid = "subjid",
    visit = "visit",
    outcome = "outcome",
    group = "group",
    covariates = c("sex", "age", "sex*age")
)

analyse(
    imputations = imputeObj,
    vars = vars
)

deltadf <- data.frame(
    subjid = c("Pt1", "Pt1", "Pt2"),
    visit = c("Visit_1", "Visit_2", "Visit_2"),
    delta = c(5, 9, -10)
)

analyse(
    imputations = imputeObj,
    delta = deltadf,
    vars = vars
)
## End(Not run)
Performs an analysis of covariance between two groups returning the estimated "treatment effect" (i.e. the contrast between the two treatment groups) and the least square means estimates in each group.
ancova( data, vars, visits = NULL, weights = c("counterfactual", "equal", "proportional_em", "proportional") )
data |
A |
vars |
A |
visits |
An optional character vector specifying which visits to
fit the ancova model at. If |
weights |
Character, either |
The function works as follows:
1. Select the first value from visits.
2. Subset the data to only the observations that occurred on this visit.
3. Fit a linear model as vars$outcome ~ vars$group + vars$covariates.
4. Extract the "treatment effect" & least square means for each treatment group.
5. Repeat steps 2-4 for all other values in visits.
If no value for visits is provided then it will be set to unique(data[[vars$visit]]).
In order to meet the formatting standards set by analyse() the results will be collapsed into a single list suffixed by the visit name, e.g.:

list(
    trt_visit_1 = list(est = ...),
    lsm_ref_visit_1 = list(est = ...),
    lsm_alt_visit_1 = list(est = ...),
    trt_visit_2 = list(est = ...),
    lsm_ref_visit_2 = list(est = ...),
    lsm_alt_visit_2 = list(est = ...),
    ...
)
Please note that "ref" refers to the first factor level of vars$group
which does not necessarily
coincide with the control arm. Analogously, "alt" refers to the second factor level of vars$group
.
"trt" refers to the model contrast translating the mean difference between the second level and first level.
If you want to include interaction terms in your model this can be done
by providing them to the covariates
argument of set_vars()
e.g. set_vars(covariates = c("sex*age"))
.
For weights = "counterfactual"
(the default) the lsmeans are obtained by
taking the average of the predicted values for each patient after assigning all patients
to each arm in turn.
This approach is equivalent to standardization or g-computation.
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", counterfactual = "<treatment>")
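As an illustration, a hedged sketch of this counterfactual averaging for a fitted lm() model mod with a treatment factor named group (the helper name and column names are assumptions for the example, not rbmi internals):

lsm_counterfactual <- function(mod, data, group = "group") {
    sapply(levels(data[[group]]), function(lvl) {
        data_cf <- data
        # Assign every patient to arm `lvl`, keeping all other covariates fixed
        data_cf[[group]] <- factor(lvl, levels = levels(data[[group]]))
        # Average the predicted values across all patients
        mean(predict(mod, newdata = data_cf))
    })
}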
Note that to ensure backwards compatibility with previous versions of rbmi, weights = "proportional" is an alias for weights = "counterfactual". To get results consistent with emmeans's weights = "proportional" please use weights = "proportional_em".
For weights = "equal"
the lsmeans are obtained by taking the model fitted
value of a hypothetical patient whose covariates are defined as follows:
Continuous covariates are set to mean(X)
Dummy categorical variables are set to 1/N
where N
is the number of levels
Continuous * continuous interactions are set to mean(X) * mean(Y)
Continuous * categorical interactions are set to mean(X) * 1/N
Dummy categorical * categorical interactions are set to 1/N * 1/M
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", weights = "equal")
For weights = "proportional_em"
the lsmeans are obtained as per weights = "equal"
except instead of weighting each observation equally they are weighted by the proportion
in which the given combination of categorical values occurred in the data.
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", weights = "proportional")
Note that this is not to be confused with weights = "proportional"
which is an alias
for weights = "counterfactual"
.
Performs an analysis of covariance. See ancova() for full details.
ancova_single( data, outcome, group, covariates, weights = c("counterfactual", "equal", "proportional_em", "proportional") )
data |
A |
outcome |
Character, the name of the outcome variable in |
group |
Character, the name of the group variable in |
covariates |
Character vector containing the name of any additional covariates to be included in the model as well as any interaction terms. |
weights |
Character, either |
group must be a factor variable with only 2 levels. outcome must be a continuous numeric variable.
For weights = "counterfactual"
(the default) the lsmeans are obtained by
taking the average of the predicted values for each patient after assigning all patients
to each arm in turn.
This approach is equivalent to standardization or g-computation.
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", counterfactual = "<treatment>")
Note that to ensure backwards compatibility with previous versions of rbmi, weights = "proportional" is an alias for weights = "counterfactual". To get results consistent with emmeans's weights = "proportional" please use weights = "proportional_em".
For weights = "equal"
the lsmeans are obtained by taking the model fitted
value of a hypothetical patient whose covariates are defined as follows:
Continuous covariates are set to mean(X)
Dummy categorical variables are set to 1/N
where N
is the number of levels
Continuous * continuous interactions are set to mean(X) * mean(Y)
Continuous * categorical interactions are set to mean(X) * 1/N
Dummy categorical * categorical interactions are set to 1/N * 1/M
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", weights = "equal")
For weights = "proportional_em"
the lsmeans are obtained as per weights = "equal"
except instead of weighting each observation equally they are weighted by the proportion
in which the given combination of categorical values occurred in the data.
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", weights = "proportional")
Note that this is not to be confused with weights = "proportional"
which is an alias
for weights = "counterfactual"
.
## Not run:
iris2 <- iris[iris$Species %in% c("versicolor", "virginica"), ]
iris2$Species <- factor(iris2$Species)
ancova_single(iris2, "Sepal.Length", "Species", c("Petal.Length * Petal.Width"))
## End(Not run)
A dataset containing data from a publicly available example data set from an antidepressant clinical trial. The dataset is available on the website of the Drug Information Association Scientific Working Group on Estimands and Missing Data. As per that website, the original data are from an antidepressant clinical trial with four treatments: two doses of an experimental medication, a positive control, and placebo; it was published in Goldstein et al (2004). To mask the real data, week 8 observations were removed and two arms were created: the original placebo arm and a "drug arm" created by randomly selecting patients from the three non-placebo arms.
antidepressant_data
A data.frame with 608 rows and 11 variables:
- PATIENT: patient IDs.
- HAMATOTL: total score on the Hamilton Anxiety Rating Scale.
- PGIIMP: Patient's Global Impression of Improvement Rating Scale.
- RELDAYS: number of days between visit and baseline.
- VISIT: post-baseline visit. Has levels 4, 5, 6, 7.
- THERAPY: the treatment group variable. It is equal to PLACEBO for observations from the placebo arm, or DRUG for observations from the active arm.
- GENDER: patient's gender.
- POOLINV: pooled investigator.
- BASVAL: baseline outcome value.
- HAMDTL17: Hamilton 17-item rating scale value.
- CHANGE: change from baseline in the Hamilton 17-item rating scale.
The relevant endpoint is the Hamilton 17-item rating scale for depression (HAMD17), for which baseline and weeks 1, 2, 4, and 6 assessments are included. Study drug discontinuation occurred in 24% of subjects from the active drug arm and 26% from placebo. All data after study drug discontinuation are missing and there is a single additional intermittent missing observation.
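The dataset ships with the package and can be loaded and inspected in the usual way, e.g.:

data("antidepressant_data", package = "rbmi")
table(antidepressant_data$VISIT)  # post-baseline visits 4, 5, 6, 7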
Goldstein, Lu, Detke, Wiltse, Mallinckrodt, Demitrack. Duloxetine in the treatment of depression: a double-blind placebo-controlled comparison with paroxetine. J Clin Psychopharmacol 2004;24: 389-399.
Takes a delta dataset and adjusts the outcome variable by adding the corresponding delta.
apply_delta(data, delta = NULL, group = NULL, outcome = NULL)
data |
|
delta |
|
group |
character vector of variables in both |
outcome |
character, name of the outcome variable in |
Create an analysis object
Creates an analysis object ensuring that all components are correctly defined.
as_analysis(results, method, delta = NULL, fun = NULL, fun_name = NULL)
results |
A list of lists containing the analysis results for each imputation.
See |
method |
The method object as specified in |
delta |
The delta dataset used. See |
fun |
The analysis function that was used. |
fun_name |
The character name of the analysis function (used for printing purposes). |
This function takes a data.frame and attempts to convert it into a simple ASCII format suitable for printing to the screen. It is assumed that all variable values have an as.character() method in order to cast them to character.
as_ascii_table(dat, line_prefix = " ", pcol = NULL)
dat |
Input dataset to convert into an ASCII table |
line_prefix |
Symbols to prefix in front of every line of the table |
pcol |
name of column to be handled as a p-value. Sets the value to <0.001 if the value is 0 after rounding |
Utility function to set an objects class.
as_class(x, cls)
x |
object to set the class of. |
cls |
the class to be set. |
Reduces any character string longer than x characters down to an x-character string ending in "...".
as_cropped_char(inval, crop_at = 30, ndp = 3)
inval |
a single element value |
crop_at |
character limit |
ndp |
Number of decimal places to display |
Convert object to dataframe
as_dataframe(x)
x |
a data.frame-like object to be converted |
Utility function to convert a "data.frame-like" object to an actual data.frame.
Create a draws object
Creates a draws object, which is the final output of a call to draws().
as_draws(method, samples, data, formula, n_failures = NULL, fit = NULL)
method |
A |
samples |
A list of |
data |
R6 |
formula |
Fixed effects formula object used for the model specification. |
n_failures |
Absolute number of failures of the model fit. |
fit |
If |
A draws object which is a named list containing the following:
- data: R6 longdata object containing all relevant input data information.
- method: A method object as generated by either method_bayes(), method_approxbayes() or method_condmean().
- samples: list containing the estimated parameters of interest. Each element of samples is a named list containing the following:
  - ids: vector of characters containing the ids of the subjects included in the original dataset.
  - beta: numeric vector of estimated regression coefficients.
  - sigma: list of estimated covariance matrices (one for each level of vars$group).
  - theta: numeric vector of transformed covariances.
  - failed: Logical. TRUE if the model fit failed.
  - ids_samp: vector of characters containing the ids of the subjects included in the given sample.
- fit: if method_bayes() is chosen, returns the MCMC Stan fit object. Otherwise NULL.
- n_failures: absolute number of failures of the model fit. Relevant only for method_condmean(type = "bootstrap"), method_approxbayes() and method_bmlmi().
- formula: fixed effects formula object used for the model specification.
This function creates the object that is returned from impute()
. Essentially
it is a glorified wrapper around list()
ensuring that the required elements have been
set and that the class is added as expected.
as_imputation(imputations, data, method, references)
imputations |
A list of |
data |
A |
method |
A |
references |
A named vector. Identifies the references to be used when generating the
imputed values. Should be of the form |
Converts a string of 0's and 1's into index positions of the 1's padding the results by 0's so they are all the same length
as_indices(x)
x |
a character vector whose values are all either "0" or "1". All elements of the vector must be the same length |
i.e. as_indices(c("1101", "0001")) -> list(c(1,2,4,999), c(4,999,999,999))
Converts a design matrix + key variables into a common format. In particular this function does the following:
- Renames all covariates as V1, V2, etc to avoid issues of special characters in variable names
- Ensures all key variables are of the right type
- Inserts the outcome, visit and subjid variables into the data.frame, naming them as outcome, visit and subjid
- If provided, will also insert the group variable into the data.frame, named as group
as_mmrm_df(designmat, outcome, visit, subjid, group = NULL)
designmat |
a |
outcome |
a numeric vector. The outcome value to be regressed on in the MMRM model. |
visit |
a character / factor vector. Indicates which visit the outcome value occurred on. |
subjid |
a character / factor vector. The subject identifier used to link separate visits that belong to the same subject. |
group |
a character / factor vector. Indicates which treatment group the patient belongs to. |
Derives the MMRM model formula from the structure of mmrm_df. Returns a formula object of the form:
as_mmrm_formula(mmrm_df, cov_struct)
mmrm_df |
an mmrm |
cov_struct |
Character - The covariance structure to be used, must be one of |
outcome ~ 0 + V1 + V2 + V4 + ... + us(visit | group / subjid)
Expand a data.frame into a design matrix
Expands out a data.frame using a formula to create a design matrix. Key details are that it will always place the outcome variable into the first column of the returned object.
as_model_df(dat, frm)
dat |
a data.frame |
frm |
a formula |
The outcome column may contain NA's, but none of the other variables listed in the formula should contain missing values.
Converts a string list of variables into a formula object
as_simple_formula(outcome, covars)
outcome |
character (length 1 vector). Name of the outcome variable |
covars |
character (vector). Name of covariates |
A formula
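A minimal sketch of the conversion (an illustrative helper, not the package's exact implementation):

# Build `outcome ~ covar1 + covar2 + ...` from character inputs
as_simple_formula_sketch <- function(outcome, covars) {
    stats::as.formula(paste(outcome, "~", paste(covars, collapse = " + ")))
}
as_simple_formula_sketch("outcome", c("visit", "group"))  # outcome ~ visit + group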
Converts a numeric value of length 1 into a 1-dimensional array. This is to avoid type errors that are thrown by Stan when length-1 numeric vectors are provided by R for stan::vector inputs.
as_stan_array(x)
x |
a numeric vector |
Collapse multiple categorical variables into distinct unique categories, e.g. as_strata(c(1,1,2,2,2,1), c(5,6,5,5,6,5)) would return c(1,2,3,3,4,1).
as_strata(...)
... |
numeric/character/factor vectors of the same length |
## Not run:
as_strata(c(1,1,2,2,2,1), c(5,6,5,5,6,5))
## End(Not run)
Performs an assertion check to ensure that a vector of variables exists within a data.frame as expected.
assert_variables_exist(data, vars)
data |
a data.frame |
vars |
a character vector of variable names |
Provided a vector of variable names, this function converts any character variables into factors. It has no effect on numeric or existing factor variables.
char2fct(data, vars = NULL)
data |
A data.frame |
vars |
a character vector of variables in |
Check the quality of the MCMC draws from the posterior distribution by checking whether the relative ESS is sufficiently large.
check_ESS(stan_fit, n_draws, threshold_lowESS = 0.4)
stan_fit |
A |
n_draws |
Number of MCMC draws. |
threshold_lowESS |
A number in |
check_ESS() works as follows:
1. Extract the ESS from stan_fit for each parameter of the model.
2. Compute the relative ESS (i.e. the ESS divided by the number of draws).
3. Check whether for any of the parameters the relative ESS is lower than threshold_lowESS. If for at least one parameter the relative ESS is below the threshold, a warning is thrown.
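A minimal sketch of this check, assuming ess holds the per-parameter effective sample sizes already extracted from the stanfit object (the helper name and extraction step are illustrative):

check_ess_sketch <- function(ess, n_draws, threshold_lowESS = 0.4) {
    rel_ess <- ess / n_draws  # relative ESS per parameter
    if (any(rel_ess < threshold_lowESS, na.rm = TRUE)) {
        warning("The relative ESS is low for at least one parameter")
    }
    invisible(rel_ess)
}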
A warning message in case of detected problems.
Check that:
There are no divergent iterations.
The Bayesian Fraction of Missing Information (BFMI) is sufficiently low.
The number of iterations that saturated the max treedepth is zero.
Please see rstan::check_hmc_diagnostics()
for details.
check_hmc_diagn(stan_fit)
stan_fit |
A |
A warning message in case of detected problems.
Diagnostics of the MCMC
check_mcmc(stan_fit, n_draws, threshold_lowESS = 0.4)
stan_fit |
A |
n_draws |
Number of MCMC draws. |
threshold_lowESS |
A number in |
Performs checks of the quality of the MCMC. See check_ESS()
and check_hmc_diagn()
for details.
A warning message in case of detected problems.
Adapt covariance matrix in reference-based methods. Used for the Copy Increments in Reference (CIR) and Jump To Reference (JTR) methods to adapt the covariance matrix to different pre-deviation and post-deviation covariance structures. See Carpenter et al. (2013).
compute_sigma(sigma_group, sigma_ref, index_mar)
sigma_group |
the covariance matrix with dimensions equal to |
sigma_ref |
the covariance matrix with dimensions equal to |
index_mar |
A logical vector indicating which visits meet the MAR assumption for the subject (i.e. FALSE identifies the observations that occur after a non-MAR intercurrent event (ICE)). |
Carpenter, James R., James H. Roger, and Michael G. Kenward. "Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation." Journal of Biopharmaceutical statistics 23.6 (2013): 1352-1371.
Convert a list of imputation_list_single() objects to an imputation_list_df() object (i.e. a list of imputation_df() objects).
convert_to_imputation_list_df(imputes, sample_ids)
imputes |
a list of |
sample_ids |
A list with 1 element per required imputation_df. Each element must contain a vector of "ID"s which correspond to the subjects to be included in that imputation_df. |

This function is best illustrated by an example:

imputes = list(
    imputation_list_single(
        id = "Tom",
        imputations = matrix(
            imputation_single_t_1_1, imputation_single_t_1_2,
            imputation_single_t_2_1, imputation_single_t_2_2,
            imputation_single_t_3_1, imputation_single_t_3_2
        )
    ),
    imputation_list_single(
        id = "Harry",
        imputations = matrix(
            imputation_single_h_1_1, imputation_single_h_1_2
        )
    )
)

sample_ids <- list(
    c("Tom", "Harry", "Tom"),
    c("Tom")
)

Then the output would be:

imputation_list_df(
    imputation_df(
        imputation_single_t_1_1,
        imputation_single_h_1_1,
        imputation_single_t_2_1
    ),
    imputation_df(
        imputation_single_t_1_2,
        imputation_single_h_1_2,
        imputation_single_t_2_2
    ),
    imputation_df(
        imputation_single_t_3_1
    ),
    imputation_df(
        imputation_single_t_3_2
    )
)

Note that the different repetitions (i.e. the value set for D) are grouped together sequentially.
Calculates a delta value based upon a baseline delta value and a post ICE scaling coefficient.
d_lagscale(delta, dlag, is_post_ice)
delta |
a numeric vector. Determines the baseline amount of delta to be applied to each visit. |
dlag |
a numeric vector. Determines the scaling to be applied
to |
is_post_ice |
logical vector. Indicates whether a visit is "post-ICE" or not. |
See delta_template()
for full details on how this calculation is performed.
Create a delta data.frame template
Creates a data.frame in the format required by analyse() for applying a delta adjustment.
delta_template(imputations, delta = NULL, dlag = NULL, missing_only = TRUE)
imputations |
an |
delta |
|
dlag |
|
missing_only |
Logical, if |
To apply a delta adjustment the analyse() function expects a delta data.frame with 3 variables: vars$subjid, vars$visit and delta (where vars is the object supplied in the original call to draws() as created by the set_vars() function).
This function will return a data.frame with the aforementioned variables with one row per subject per visit. If the delta argument to this function is NULL then the delta column in the returned data.frame will be 0 for all observations.
If the delta argument is not NULL then delta will be calculated separately for each subject as the cumulative sum of delta multiplied by the scaling coefficient dlag, based upon how many visits after the subject's intercurrent event (ICE) the visit in question is. This is best illustrated with an example:
Let delta = c(5,6,7,8) and dlag = c(1,2,3,4) (i.e. assuming there are 4 visits) and let's say that the subject had an ICE on visit 2. The calculation would then be as follows:

v1 v2 v3 v4
--------------
 5  6  7  8  # delta assigned to each visit
 0  1  2  3  # lagged scaling starting from the first visit after the subject's ICE
--------------
 0  6 14 24  # delta * lagged scaling
--------------
 0  6 20 44  # cumulative sum of delta to be applied to each visit
That is to say the subject would have a delta offset of 0 applied for visit 1, 6 for visit 2, 20 for visit 3 and 44 for visit 4. As a comparison, let's say that the subject instead had their ICE on visit 3; the calculation would then be as follows:

v1 v2 v3 v4
--------------
 5  6  7  8  # delta assigned to each visit
 0  0  1  2  # lagged scaling starting from the first visit after the subject's ICE
--------------
 0  0  7 16  # delta * lagged scaling
--------------
 0  0  7 23  # cumulative sum of delta to be applied to each visit
In terms of practical usage, let's say that you wanted a delta of 5 to be used for all post-ICE visits regardless of their proximity to the ICE visit. This can be achieved by setting delta = c(5,5,5,5) and dlag = c(1,0,0,0). For example, let's say a subject had their ICE on visit 1; then the calculation would be as follows:

v1 v2 v3 v4
--------------
 5  5  5  5  # delta assigned to each visit
 1  0  0  0  # lagged scaling starting from the first visit after the subject's ICE
--------------
 5  0  0  0  # delta * lagged scaling
--------------
 5  5  5  5  # cumulative sum of delta to be applied to each visit
Another way of using these arguments is to set delta to be the difference in time between visits and dlag to be the amount of delta per unit of time. For example, let's say that we have visits on weeks 1, 5, 6 & 9 and that we want a delta of 3 to be applied for each week after an ICE. This can be achieved by setting delta = c(0,4,1,3) (the difference in weeks between each visit) and dlag = c(3, 3, 3, 3). For example, let's say we have a subject who had their ICE on week 5 (i.e. visit 2); then the calculation would be:

v1 v2 v3 v4
--------------
 0  4  1  3  # delta assigned to each visit
 0  0  3  3  # lagged scaling starting from the first visit after the subject's ICE
--------------
 0  0  3  9  # delta * lagged scaling
--------------
 0  0  3 12  # cumulative sum of delta to be applied to each visit
i.e. on week-6 (1 week after the ICE) they have a delta of 3 and on week-9 (4 weeks after the ICE) they have a delta of 12.
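The examples above all follow the same arithmetic, which can be reproduced with a short hedged sketch (the helper name and its ice_visit argument, i.e. the first visit affected by the ICE, are illustrative, not rbmi internals):

d_lagscale_sketch <- function(delta, dlag, ice_visit) {
    scaling <- rep(0, length(delta))
    post <- seq(ice_visit, length(delta))
    scaling[post] <- dlag[seq_along(post)]  # lagged scaling from the first affected visit onwards
    cumsum(delta * scaling)                 # cumulative sum of delta * scaling
}
d_lagscale_sketch(c(5, 6, 7, 8), c(1, 2, 3, 4), ice_visit = 2)  # 0 6 20 44
d_lagscale_sketch(c(0, 4, 1, 3), c(3, 3, 3, 3), ice_visit = 3)  # 0 0 3 12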
Please note that this function also returns several utility variables so that the user can create their own custom logic for defining what delta should be set to. These additional variables include:
- is_mar: if the observation was missing, would it be regarded as MAR? This variable is set to FALSE for observations that occurred after a non-MAR ICE, otherwise it is set to TRUE.
- is_missing: is the outcome variable for this observation missing.
- is_post_ice: does the observation occur after the patient's ICE as defined by the data_ice dataset supplied to draws().
- strategy: what imputation strategy was assigned to this subject.
The design and implementation of this function is largely based upon the same functionality as implemented in the so-called "five macros" by James Roger. See Roger (2021).
Roger, James. Reference-based mi via multivariate normal rm (the “five macros” and miwithd), 2021. URL https://www.lshtm.ac.uk/research/centres-projects-groups/missing-data#dia-missing-data.
## Not run:
delta_template(imputeObj)
delta_template(imputeObj, delta = c(5,6,7,8), dlag = c(1,2,3,4))
## End(Not run)
draws
fits the base imputation model to the observed outcome data
according to the given multiple imputation methodology.
According to the user's method specification, it returns either draws from the posterior distribution of the
model parameters as required for Bayesian multiple imputation or frequentist parameter estimates from the
original data and bootstrapped or leave-one-out datasets as required for conditional mean imputation.
The purpose of the imputation model is to estimate model parameters
in the absence of intercurrent events (ICEs) handled using reference-based imputation methods.
For this reason, any observed outcome data after ICEs, for which reference-based imputation methods are
specified, are removed and considered as missing for the purpose of estimating the imputation model, and for
this purpose only. The imputation model is a mixed model for repeated measures (MMRM) that is valid
under a missing-at-random (MAR) assumption.
It can be fit using maximum likelihood (ML) or restricted ML (REML) estimation,
a Bayesian approach, or an approximate Bayesian approach according to the user's method specification.
The ML/REML approaches and the approximate Bayesian approach support several possible covariance structures,
while the Bayesian approach based on MCMC sampling supports only an unstructured covariance structure.
In any case the covariance matrix can be assumed to be the same or different across each group.
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)

## S3 method for class 'approxbayes'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)

## S3 method for class 'condmean'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)

## S3 method for class 'bmlmi'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)

## S3 method for class 'bayes'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
data |
A |
data_ice |
A |
vars |
A |
method |
A |
ncores |
A single numeric specifying the number of cores to use in creating the draws object.
Note that this parameter is ignored for |
quiet |
Logical, if |
draws performs the first step of the multiple imputation (MI) procedure: fitting the base imputation model. The goal is to estimate the parameters of interest needed for the imputation phase (i.e. the regression coefficients and the covariance matrices from a MMRM model).
The function distinguishes between the following methods:
- Bayesian MI based on MCMC sampling: draws returns the draws from the posterior distribution of the parameters using a Bayesian approach based on MCMC sampling. This method can be specified by using method = method_bayes().
- Approximate Bayesian MI based on bootstrapping: draws returns the draws from the posterior distribution of the parameters using an approximate Bayesian approach, where the sampling from the posterior distribution is simulated by fitting the MMRM model on bootstrap samples of the original dataset. This method can be specified by using method = method_approxbayes().
- Conditional mean imputation with bootstrap re-sampling: draws returns the MMRM parameter estimates from the original dataset and from n_samples bootstrap samples. This method can be specified by using method = method_condmean() with argument type = "bootstrap".
- Conditional mean imputation with jackknife re-sampling: draws returns the MMRM parameter estimates from the original dataset and from each leave-one-subject-out sample. This method can be specified by using method = method_condmean() with argument type = "jackknife".
- Bootstrapped Maximum Likelihood MI: draws returns the MMRM parameter estimates from a given number of bootstrap samples needed to perform random imputations of the bootstrapped samples. This method can be specified by using method = method_bmlmi().
Bayesian MI based on MCMC sampling has been proposed in Carpenter, Roger, and Kenward (2013) who first introduced reference-based imputation methods. Approximate Bayesian MI is discussed in Little and Rubin (2002). Conditional mean imputation methods are discussed in Wolbers et al (2022). Bootstrapped Maximum Likelihood MI is described in Von Hippel & Bartlett (2021).
The argument data contains the longitudinal data. It must have at least the following variables:
- subjid: a factor vector containing the subject ids.
- visit: a factor vector containing the visit the outcome was observed on.
- group: a factor vector containing the group that the subject belongs to.
- outcome: a numeric vector containing the outcome variable. It might contain missing values.
Additional baseline or time-varying covariates must be included in data.
data must have one row per visit per subject. This means that incomplete outcome data must be set as NA instead of having the related row missing. Missing values in the covariates are not allowed.
If data is incomplete then the expand_locf() helper function can be used to insert any missing rows, using Last Observation Carried Forward (LOCF) imputation to impute the covariate values. Note that LOCF is generally not a principled imputation method and should only be used when appropriate for the specific covariate.
Please note that there is no special provisioning for the baseline outcome values. If you do not want baseline observations to be included in the model as part of the response variable then these should be removed in advance from the outcome variable in data. At the same time, if you want to include the baseline outcome as a covariate in the model, then this should be included as a separate column of data (as any other covariate).
Character covariates will be explicitly cast to factors. If you use a custom analysis function that requires specific reference levels for the character covariates (for example in the computation of the least square means) then you are advised to manually cast your character covariates to factor in advance of running draws().
The argument data_ice contains information about the occurrence of ICEs. It is a data.frame with 3 columns:
- Subject ID: a character vector containing the ids of the subjects that experienced the ICE. This column must be named as specified in vars$subjid.
- Visit: a character vector containing the first visit after the occurrence of the ICE (i.e. the first visit affected by the ICE). The visits must be equal to one of the levels of data[[vars$visit]]. If multiple ICEs happen for the same subject, then only the first non-MAR visit should be used. This column must be named as specified in vars$visit.
- Strategy: a character vector specifying the imputation strategy to address the ICE for this subject. This column must be named as specified in vars$strategy.
Possible imputation strategies are:
- "MAR": Missing At Random.
- "CIR": Copy Increments in Reference.
- "CR": Copy Reference.
- "JR": Jump to Reference.
- "LMCF": Last Mean Carried Forward.
For explanations of these imputation strategies, see Carpenter, Roger, and Kenward (2013), Cro et al (2020), and Wolbers et al (2022). Please note that user-defined imputation strategies can also be set; an illustrative sketch of a data_ice data.frame follows below.
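For illustration, a hedged sketch of such a data_ice data.frame, assuming vars$subjid = "subjid", vars$visit = "visit" and vars$strategy = "strategy" (the subject ids and visit levels are hypothetical):

data_ice <- data.frame(
    subjid = c("Pt1", "Pt2"),
    visit = c("Visit_2", "Visit_3"),  # first visit affected by the ICE
    strategy = c("JR", "CIR")         # Jump to Reference / Copy Increments in Reference
)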
The data_ice argument is necessary at this stage since (as explained in Wolbers et al (2022)) the model is fitted after removing the observations which are incompatible with the imputation model, i.e. any observed data on or after data_ice[[vars$visit]] that are addressed with an imputation strategy different from MAR are excluded from the model fit. However such observations will not be discarded from the data in the imputation phase (performed with the function impute()). To summarize, at this stage only pre-ICE data and post-ICE data that is after ICEs for which MAR imputation is specified are used.
If the data_ice argument is omitted, or if a subject doesn't have a record within data_ice, then it is assumed that all of the relevant subject's data is pre-ICE and as such all missing visits will be imputed under the MAR assumption and all observed data will be used to fit the base imputation model. Please note that the ICE visit cannot be updated via the update_strategy argument in impute(); this means that subjects who didn't have a record in data_ice will always have their missing data imputed under the MAR assumption even if their strategy is updated.
The vars argument is a named list that specifies the names of key variables within data and data_ice. This list is created by set_vars() and contains the following named elements:
- subjid: name of the column in data and data_ice which contains the subject ids variable.
- visit: name of the column in data and data_ice which contains the visit variable.
- group: name of the column in data which contains the group variable.
- outcome: name of the column in data which contains the outcome variable.
- covariates: vector of characters which contains the covariates to be included in the model (including interactions, which are specified as "covariateName1*covariateName2"). If no covariates are provided the default model specification of outcome ~ 1 + visit + group will be used. Please note that the group*visit interaction is not included in the model by default.
- strata: covariates used as stratification variables in the bootstrap sampling. By default only vars$group is set as stratification variable. Needed only for method_condmean(type = "bootstrap") and method_approxbayes().
- strategy: name of the column in data_ice which contains the subject-specific imputation strategy.
In our experience, Bayesian MI (method = method_bayes()) with a relatively low number of samples (e.g. n_samples below 100) frequently triggers Stan warnings about R-hat such as "The largest R-hat is X.XX, indicating chains have not mixed". In many instances this warning might be spurious, i.e. standard diagnostic analysis of the MCMC samples does not indicate any issues and results look reasonable. Increasing the number of samples to e.g. above 150 usually gets rid of the warning.
A draws object which is a named list containing the following:
- data: R6 longdata object containing all relevant input data information.
- method: A method object as generated by either method_bayes(), method_approxbayes() or method_condmean().
- samples: list containing the estimated parameters of interest. Each element of samples is a named list containing the following:
  - ids: vector of characters containing the ids of the subjects included in the original dataset.
  - beta: numeric vector of estimated regression coefficients.
  - sigma: list of estimated covariance matrices (one for each level of vars$group).
  - theta: numeric vector of transformed covariances.
  - failed: Logical. TRUE if the model fit failed.
  - ids_samp: vector of characters containing the ids of the subjects included in the given sample.
- fit: if method_bayes() is chosen, returns the MCMC Stan fit object. Otherwise NULL.
- n_failures: absolute number of failures of the model fit. Relevant only for method_condmean(type = "bootstrap"), method_approxbayes() and method_bmlmi().
- formula: fixed effects formula object used for the model specification.
James R Carpenter, James H Roger, and Michael G Kenward. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics, 23(6):1352–1371, 2013.
Suzie Cro, Tim P Morris, Michael G Kenward, and James R Carpenter. Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide. Statistics in Medicine, 39(21):2815–2842, 2020.
Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data, Second Edition. John Wiley & Sons, Hoboken, New Jersey, 2002. [Section 10.2.3]
Marcel Wolbers, Alessandro Noci, Paul Delmar, Craig Gower-Page, Sean Yiu, Jonathan W. Bartlett. Standard and reference-based conditional mean imputation. https://arxiv.org/abs/2109.11162, 2022.
Von Hippel, Paul T and Bartlett, Jonathan W. Maximum likelihood multiple imputation: Faster imputations and consistent standard errors without posterior draws. 2021.
method_bayes(), method_approxbayes(), method_condmean(), method_bmlmi() for setting method.
set_vars() for setting vars.
expand_locf() for expanding data in case of missing rows.
For more details see the quickstart vignette: vignette("quickstart", package = "rbmi").
This is a utility function that attempts to evaluate a call to mmrm
managing any warnings or errors that are thrown. In particular
this function attempts to catch any warnings or errors and instead
of surfacing them it will simply add an additional element failed
with a value of TRUE. This allows for multiple calls to be made
without the program exiting.
eval_mmrm(expr)
expr |
An expression to be evaluated. Should be a call to |
This function was originally developed for use with glmmTMB, which needed more hand-holding and dropping of false-positive warnings. It is not as important now but is kept around in case we need to catch false-positive warnings again in the future.
## Not run:
eval_mmrm({
    mmrm::mmrm(formula, data)
})
## End(Not run)
Expand missing data.frame rows
These functions are essentially wrappers around base::expand.grid() to ensure that missing combinations of data are inserted into a data.frame, with imputation/fill methods for updating covariate values of newly created rows.
expand(data, ...)

fill_locf(data, vars, group = NULL, order = NULL)

expand_locf(data, ..., vars, group, order)
data |
dataset to expand or fill in. |
... |
variables and the levels that should be expanded out (note that duplicate entries of levels will result in multiple rows for that level). |
vars |
character vector containing the names of variables that need to be filled in. |
group |
character vector containing the names of variables to group
by when performing LOCF imputation of |
order |
character vector containing the names of additional variables to sort the |
The draws() function makes the assumption that all subjects and visits are present in the data.frame and that all covariate values are non-missing; expand(), fill_locf() and expand_locf() are utility functions to support users in ensuring that their data.frames conform to these assumptions.
expand() takes vectors for expected levels in a data.frame and expands out all combinations, inserting any missing rows into the data.frame. Note that all "expanded" variables are cast as factors.
fill_locf() applies LOCF imputation to named covariates to fill in any NAs created by the insertion of new rows by expand() (though do note that no distinction is made between existing NAs and newly created NAs). Note that the data.frame is sorted by c(group, order) before performing the LOCF imputation; the data.frame will be returned in the original sort order however.
expand_locf() is a simple composition function of fill_locf() and expand(), i.e. fill_locf(expand(...)).
The fill_locf() function performs last observation carried forward imputation. A natural consequence of this is that it is unable to impute missing observations if the observation is the first value for a given subject / grouping. These values are deliberately not imputed as doing so risks silent errors in the case of time-varying covariates. One solution is to first use expand_locf() on just the visit variable and time-varying covariates and then merge on the baseline covariates afterwards, i.e.:
library(dplyr)

dat_expanded <- expand(
    data = dat,
    subject = c("pt1", "pt2", "pt3", "pt4"),
    visit = c("vis1", "vis2", "vis3")
)

dat_filled <- dat_expanded %>%
    left_join(baseline_covariates, by = "subject")
## Not run:
dat_expanded <- expand(
    data = dat,
    subject = c("pt1", "pt2", "pt3", "pt4"),
    visit = c("vis1", "vis2", "vis3")
)

dat_filled <- fill_locf(
    data = dat_expanded,
    vars = c("Sex", "Age"),
    group = "subject",
    order = "visit"
)

## Or

dat_filled <- expand_locf(
    data = dat,
    subject = c("pt1", "pt2", "pt3", "pt4"),
    visit = c("vis1", "vis2", "vis3"),
    vars = c("Sex", "Age"),
    group = "subject",
    order = "visit"
)
## End(Not run)
Takes a string potentially including model terms like * and : and extracts out the individual variables.
extract_covariates(x)
x |
string of variable names potentially including interaction terms |
i.e. c("v1", "v2", "v2*v3", "v1:v2")
becomes c("v1", "v2", "v3")
Set to NA outcome values that would be MNAR if they were missing (i.e. which occur after an ICE handled using a reference-based imputation strategy)
extract_data_nmar_as_na(longdata)
longdata |
R6 |
A data.frame containing longdata$get_data(longdata$ids), but with MNAR outcome values set to NA.
Extract draws from a stanfit object
Extract draws from a stanfit object and convert them into lists. The function rstan::extract() returns the draws for a given parameter as an array. This function calls rstan::extract() to extract the draws from a stanfit object and then converts the arrays into lists.
extract_draws(stan_fit)
stan_fit |
A |
A named list of length 2 containing:
- beta: a list of length equal to the number of draws, containing the draws from the posterior distribution of the regression coefficients.
- sigma: a list of length equal to the number of draws, containing the draws from the posterior distribution of the covariance matrices. Each element of the list is a list with length equal to 1 if same_cov = TRUE or equal to the number of groups if same_cov = FALSE.
Takes an imputation object as generated by imputation_df() and uses this to extract a completed dataset from a longdata object as created by longDataConstructor(). Also applies a delta transformation if a data.frame is provided to the delta argument. See analyse() for details on the structure of this data.frame.
Subject IDs in the returned data.frame are scrambled, i.e. they are not the original values.
extract_imputed_df(imputation, ld, delta = NULL, idmap = FALSE)
imputation |
An imputation object as generated by |
ld |
A |
delta |
Either |
idmap |
Logical. If |
A data.frame.
Extracts the imputed datasets contained within an imputations
object generated
by impute()
.
extract_imputed_dfs( imputations, index = seq_along(imputations$imputations), delta = NULL, idmap = FALSE )
imputations |
An |
index |
The indexes of the imputed datasets to return. By default,
all datasets within the |
delta |
A |
idmap |
Logical. The subject IDs in the imputed |
A list of data.frames equal in length to the index
argument.
delta_template()
for creating delta data.frames.
## Not run:
extract_imputed_dfs(imputeObj)
extract_imputed_dfs(imputeObj, c(1:3))
## End(Not run)
Extracts the beta and sigma coefficients from an MMRM model created
by mmrm::mmrm()
.
extract_params(fit)
fit |
an object created by |
fit_mcmc() fits the base imputation model using a Bayesian approach. This is done through an MCMC method that is implemented in Stan and is run by using the function rstan::sampling(). The function returns the draws from the posterior distribution of the model parameters and the stanfit object. Additionally it performs multiple diagnostic checks of the chain and returns warnings in case of any detected issues.
fit_mcmc(designmat, outcome, group, subjid, visit, method, quiet = FALSE)
designmat |
The design matrix of the fixed effects. |
outcome |
The response variable. Must be numeric. |
group |
Character vector containing the group variable. |
subjid |
Character vector containing the subjects IDs. |
visit |
Character vector containing the visit variable. |
method |
A |
quiet |
Specify whether the stan sampling log should be printed to the console. |
The Bayesian model assumes a multivariate normal likelihood function and weakly-informative priors for the model parameters: in particular, uniform priors are assumed for the regression coefficients and inverse-Wishart priors for the covariance matrices. The chain is initialized using the REML parameter estimates from MMRM as starting values.
The function performs the following steps:
1. Fit the MMRM using a REML approach.
2. Prepare the input data for the MCMC fit as described in the data{} block of the Stan file. See prepare_stan_data() for details.
3. Run the MCMC according to the input arguments, using the REML parameter estimates from step 1 as starting values.
4. Perform diagnostic checks of the MCMC. See check_mcmc() for details.
5. Extract the draws from the model fit.
The chains perform method$n_samples draws by keeping one draw every method$burn_between iterations. Additionally the first method$burn_in iterations are discarded. The total number of iterations will then be method$burn_in + method$burn_between * method$n_samples. The purpose of method$burn_in is to ensure that the samples are drawn from the stationary distribution of the Markov Chain. The method$burn_between thinning aims to keep the draws uncorrelated from each other.
A named list composed of the following:
samples
: a named list containing the draws for each parameter. It corresponds to the output of extract_draws()
.
fit
: a stanfit
object.
Fits a MMRM model allowing for different covariance structures using mmrm::mmrm()
.
Returns a list
of key model parameters beta
, sigma
and an additional element failed
indicating whether or not the fit failed to converge. If the fit did fail to converge
beta
and sigma
will not be present.
fit_mmrm(
    designmat,
    outcome,
    subjid,
    visit,
    group,
    cov_struct = c("us", "ad", "adh", "ar1", "ar1h", "cs", "csh", "toep", "toeph"),
    REML = TRUE,
    same_cov = TRUE
)
designmat |
a |
outcome |
a numeric vector. The outcome value to be regressed on in the MMRM model. |
subjid |
a character / factor vector. The subject identifier used to link separate visits that belong to the same subject. |
visit |
a character / factor vector. Indicates which visit the outcome value occurred on. |
group |
a character / factor vector. Indicates which treatment group the patient belongs to. |
cov_struct |
a character value. Specifies which covariance structure to use. Must be one of |
REML |
logical. Specifies whether restricted maximum likelihood should be used |
same_cov |
logical. Used to specify if a shared or individual covariance matrix should be
used per |
Generate data for a single group
generate_data_single(pars_group, strategy_fun = NULL, distr_pars_ref = NULL)
pars_group |
A |
strategy_fun |
Function implementing trajectories after the intercurrent event (ICE).
Must be one of |
distr_pars_ref |
Optional. Named list containing the simulation parameters of the reference arm. It contains the following elements:
|
A data.frame
containing the simulated data. It includes the following variables:
id
: Factor variable that specifies the id of each subject.
visit
: Factor variable that specifies the visit of each assessment. Visit 0
denotes
the baseline visit.
group
: Factor variable that specifies which treatment group each subject belongs to.
outcome_bl
: Numeric variable that specifies the baseline outcome.
outcome_noICE
: Numeric variable that specifies the longitudinal outcome assuming
no ICEs.
ind_ice1
: Binary variable that takes value 1
if the corresponding visit is
affected by ICE1 and 0
otherwise.
dropout_ice1
: Binary variable that takes value 1
if the corresponding visit is
affected by the drop-out following ICE1 and 0
otherwise.
ind_ice2
: Binary variable that takes value 1
if the corresponding visit is affected
by ICE2.
outcome
: Numeric variable that specifies the longitudinal outcome including ICE1, ICE2
and the intermittent missing values.
Creates a Stack()
object and populates the stack with bootstrap
samples based upon method$n_samples.
get_bootstrap_stack(longdata, method, stack = Stack$new())
longdata |
A |
method |
A |
stack |
A |
Takes parameters for a multivariate normal distribution and observed values to calculate the conditional distribution for the unobserved values.
get_conditional_parameters(pars, values)
pars |
a |
values |
a vector of observed values to condition on, must be same length as |
A list with the conditional distribution parameters:
mu
- The conditional mean vector.
sigma
- The conditional covariance matrix.
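A minimal sketch of a call; the assumption here (not stated above) is that NA entries in values mark the unobserved outcomes:

pars <- list(
    mu = c(1, 2),
    sigma = matrix(c(1, 0.5, 0.5, 1), nrow = 2)
)
# Condition the second (unobserved) value on the first being observed at 1.5
get_conditional_parameters(pars, values = c(1.5, NA))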
This function creates the default delta template (1 row per subject per visit)
and extracts all the utility information that users need to define their own logic
for defining delta. See delta_template()
for full details.
get_delta_template(imputations)
imputations |
an imputations object created by |
Fits the base imputation model using an ML/REML approach on a given number of bootstrap samples as
specified by method$n_samples.
Returns the parameter estimates from the model fit.
get_draws_mle(
    longdata,
    method,
    sample_stack,
    n_target_samples,
    first_sample_orig,
    use_samp_ids,
    failure_limit = 0,
    ncores = 1,
    quiet = FALSE
)
longdata |
R6 |
method |
A |
sample_stack |
A stack object containing the subject ids to be used on each mmrm iteration. |
n_target_samples |
Number of samples needed to be created |
first_sample_orig |
Logical. If |
use_samp_ids |
Logical. If |
failure_limit |
Number of failed samples that are allowed before throwing an error |
ncores |
Number of processes to parallelise the job over |
quiet |
Logical, If |
This function takes a Stack
object which contains multiple lists of patient ids. The function
takes this Stack and pulls a set of ids and then constructs a dataset consisting of just these
patients (i.e. potentially a bootstrap or a jackknife sample).
The function then fits an MMRM model to this dataset to create a sample object. The function
repeats this process until n_target_samples
have been reached. If more than failure_limit
samples fail to converge then the function throws an error.
After reaching the desired number of samples the function generates and returns a draws object.
A draws
object which is a named list containing the following:
data
: R6 longdata
object containing all relevant input data information.
method
: A method
object as generated by either method_bayes()
,
method_approxbayes()
or method_condmean()
.
samples
: list containing the estimated parameters of interest.
Each element of samples
is a named list containing the following:
ids
: vector of characters containing the ids of the subjects included in the original dataset.
beta
: numeric vector of estimated regression coefficients.
sigma
: list of estimated covariance matrices (one for each level of vars$group
).
theta
: numeric vector of transformed covariances.
failed
: Logical. TRUE
if the model fit failed.
ids_samp
: vector of characters containing the ids of the subjects included in the given sample.
fit
: if method_bayes()
is chosen, returns the MCMC Stan fit object. Otherwise NULL
.
n_failures
: absolute number of failures of the model fit.
Relevant only for method_condmean(type = "bootstrap")
, method_approxbayes()
and method_bmlmi()
.
formula
: fixed effects formula object used for the model specification.
Extract the Effective Sample Size (ESS) from a stanfit
object.
get_ESS(stan_fit)
stan_fit |
A |
A named vector containing the ESS for each parameter of the model.
Compute pooled point estimates, standard error and degrees of freedom according to the Von Hippel and Bartlett formula for Bootstrapped Maximum Likelihood Multiple Imputation (BMLMI).
get_ests_bmlmi(ests, D)
ests |
numeric vector containing estimates from the analysis of the imputed datasets. |
D |
numeric representing the number of imputations between each bootstrap sample in the BMLMI method. |
ests
must be provided in the following order: the first D elements relate to analyses from
random imputation of one bootstrap sample. The second set of D elements (i.e. from D+1 to 2*D)
relate to the second bootstrap sample and so on.
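For illustration, a sketch with made-up estimates for B = 2 bootstrap samples and D = 3 imputations each:

# Order: bootstrap 1 (d1, d2, d3), then bootstrap 2 (d1, d2, d3)
ests <- c(1.02, 0.98, 1.05, 1.10, 1.07, 1.12)
get_ests_bmlmi(ests, D = 3)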
a list containing point estimate, standard error and degrees of freedom.
Von Hippel, Paul T and Bartlett, Jonathan W. Maximum likelihood multiple imputation: Faster imputations and consistent standard errors without posterior draws. 2021.
Simulate a realistic example dataset using simulate_data()
with hard-coded
values of all the input arguments.
get_example_data()
get_example_data()
simulates a 1:1 randomized trial of
an active drug (intervention) versus placebo (control) with 100 subjects per
group and 6 post-baseline assessments (bi-monthly visits until 12 months).
One intercurrent event corresponding to treatment discontinuation is also simulated.
Specifically, data are simulated under the following assumptions:
The mean outcome trajectory in the placebo group increases linearly from 50 at baseline (visit 0) to 60 at visit 6, i.e. the slope is 10 points/year.
The mean outcome trajectory in the intervention group is identical to the placebo group up to visit 2. From visit 2 onward, the slope decreases by 50% to 5 points/year.
The covariance structure of the baseline and follow-up values in both groups is implied by a random intercept and slope model with a standard deviation of 5 for both the intercept and the slope, and a correlation of 0.25. In addition, an independent residual error with standard deviation 2.5 is added to each assessment.
The probability of study drug discontinuation after each visit is calculated according to a logistic model which depends on the observed outcome at that visit. Specifically, a visit-wise discontinuation probability of 2% and 3% in the control and intervention group, respectively, is specified in case the observed outcome is equal to 50 (the mean value at baseline). The odds of discontinuation are simulated to increase by +10% for each +1 point increase of the observed outcome.
Study drug discontinuation is simulated to have no effect on the mean trajectory in the placebo group. In the intervention group, subjects who discontinue follow the slope of the mean trajectory from the placebo group from that time point onward. This is compatible with a copy increments in reference (CIR) assumption.
Study drop-out at the study drug discontinuation visit occurs with a probability of 50% leading to missing outcome data from that time point onward.
simulate_data()
, set_simul_pars()
Creates a Stack()
object and populates the stack with jackknife
(leave-one-out) samples based upon the subjects in longdata.
get_jackknife_stack(longdata, method, stack = Stack$new())
longdata |
A |
method |
A |
stack |
A |
get_mmrm_sample
fits the base imputation model using an ML/REML approach.
Returns the parameter estimates from the fit.
get_mmrm_sample(ids, longdata, method)
ids |
vector of characters containing the ids of the subjects. |
longdata |
R6 |
method |
A |
A named list of class sample_single
. It contains the following:
ids
vector of characters containing the ids of the subjects included in the original dataset.
beta
numeric vector of estimated regression coefficients.
sigma
list of estimated covariance matrices (one for each level of vars$group
).
theta
numeric vector of transformed covariances.
failed
logical. TRUE
if the model fit failed.
ids_samp
vector of characters containing the ids of the subjects included in the given sample.
Takes a design matrix with multiple rows per subject and returns a dataset
with 1 row per subject with a new column pgroup
indicating which group
the patient belongs to (based upon their missingness pattern and treatment group)
get_pattern_groups(ddat)
ddat |
a |
The column is_avail
must be a character or numeric 0
or 1
Takes a dataset of pattern information and creates a summary dataset of it with just 1 row per pattern
get_pattern_groups_unique(patterns)
patterns |
A |
The column pgroup
must be a numeric vector indicating which pattern group the patient belongs to
The column pattern
must be a character string of 0
's or 1
's. It must be identical for all
rows within the same pgroup
The column group
must be a character / numeric vector indicating which covariance group the observation
belongs to. It must be identical within the same pgroup
Returns the elements expected to be contained in the analyse object depending on what analysis method was specified.
get_pool_components(x)
x |
Character name of the analysis method, must be one of
either |
Takes patient level data and beta coefficients and expands them
to get a patient specific estimate for the visit distribution parameters
mu
and sigma
. Returns the values in a specific format
which is expected by downstream functions in the imputation process
(namely list(list(mu = ..., sigma = ...), list(mu = ..., sigma = ...))
).
get_visit_distribution_parameters(dat, beta, sigma)
dat |
Patient level dataset, must be 1 row per visit. Column order must be in the same order as beta. The number of columns must match the length of beta |
beta |
List of model beta coefficients. There should be 1 element for each sample
e.g. if there were 3 samples and the models each had 4 beta coefficients then this argument
should be of the form |
sigma |
List of sigma. Must have the same number of entries as |
Returns a list defining the imputation strategies to be used to create the multivariate normal distribution parameters by merging those of the source group and reference group per patient.
getStrategies(...)
... |
User defined methods to be added to the return list. Input must be a function. |
By default Jump to Reference (JR), Copy Reference (CR), Copy Increments in Reference (CIR), Last Mean Carried Forward (LMCF) and Missing at Random (MAR) are defined.
The user can define their own strategy functions (or overwrite the pre-defined ones)
by specifying a named input to the function i.e. NEW = function(...) ...
.
The only exception is MAR, which cannot be overwritten.
All user defined functions must take 3 inputs: pars_group
, pars_ref
and
index_mar
. pars_group
and pars_ref
are both lists with elements mu
and sigma
representing the multivariate normal distribution parameters for
the subject's current group and reference group respectively. index_mar
will be
a logical vector specifying which visits the subject met the MAR assumption
at. The function must return a list with elements mu
and sigma
. See the implementation
of strategy_JR()
for an example.
## Not run: 
getStrategies()
getStrategies(
    NEW = function(pars_group, pars_ref, index_mar) code,
    JR = function(pars_group, pars_ref, index_mar) more_code
)
## End(Not run)
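To make the three-input contract concrete, the following is a minimal, purely illustrative custom strategy (not a statistically principled one; the name NEW is hypothetical) that borrows the reference group's means after the last MAR visit and uses the reference covariance throughout:

NEW <- function(pars_group, pars_ref, index_mar) {
    mu <- pars_group$mu
    mu[!index_mar] <- pars_ref$mu[!index_mar]  # reference means post-ICE
    list(mu = mu, sigma = pars_ref$sigma)      # reference covariance
}
strategies <- getStrategies(NEW = NEW)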
Utility function to see if an object has a particular class. Useful when we don't know how many other classes the object may have.
has_class(x, cls)
x |
the object we want to check the class of. |
cls |
the class we want to know if it has or not. |
TRUE
if the object has the class.
FALSE
if the object does not have the class.
A wrapper around if() else()
to prevent unexpected
interactions between ifelse()
and factor variables
ife(x, a, b)
x |
True / False |
a |
value to return if True |
b |
value to return if False |
By default ifelse()
will convert factor variables to their
numeric values, which is often undesirable. This convenience
function avoids that problem.
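A brief sketch of the problem being avoided:

x <- factor(c("a", "b"))
ifelse(TRUE, x, x)  # returns the underlying integer code 1, dropping the factor
ife(TRUE, x, x)     # returns the factor unchanged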
Create a valid imputation_df
object.
imputation_df(...)
... |
a list of |
A container for multiple imputation_df's
imputation_list_df(...)
... |
objects of class |
A collection of imputation_singles()
grouped by a single subjid ID.
imputation_list_single(imputations, D = 1)
imputations |
a list of |
D |
the number of repetitions that were performed, which determines how many columns the imputation matrix should have. |
Create a valid imputation_single
object.
imputation_single(id, values)
id |
a character string specifying the subject id. |
values |
a numeric vector indicating the imputed values. |
impute()
creates imputed datasets based upon the data and options specified in
the call to draws()
. One imputed dataset is created per "sample" created by
draws()
.
impute(
    draws,
    references = NULL,
    update_strategy = NULL,
    strategies = getStrategies()
)

## S3 method for class 'random'
impute(
    draws,
    references = NULL,
    update_strategy = NULL,
    strategies = getStrategies()
)

## S3 method for class 'condmean'
impute(
    draws,
    references = NULL,
    update_strategy = NULL,
    strategies = getStrategies()
)
draws |
A |
references |
A named vector. Identifies the references to be used for reference-based
imputation methods. Should be of the form |
update_strategy |
An optional |
strategies |
A named list of functions. Defines the imputation functions to be used.
The names of the list should mirror the values specified in |
impute()
uses the imputation model parameter estimates, as generated by draws()
, to
first calculate the marginal (multivariate normal) distribution of a subject's longitudinal
outcome variable
depending on their covariate values.
For subjects with intercurrent events (ICEs) handled using non-MAR methods, this marginal distribution
is then updated depending on the time of the first visit affected by the ICE,
the chosen imputation strategy and the chosen reference group as described in Carpenter, Roger, and Kenward (2013).
The subject's imputation distribution used for imputing missing values is then defined as
their marginal distribution conditional on their observed outcome values.
One dataset is generated per set of parameter estimates provided by draws()
.
The exact manner in how missing values are imputed from this conditional imputation distribution depends
on the method
object that was provided to draws()
, in particular:
Bayes & Approximate Bayes: each imputed dataset contains 1 row per subject & visit from the original dataset with missing values imputed by taking a single random sample from the conditional imputation distribution.
Conditional Mean: each imputed dataset contains 1 row per subject & visit from the
bootstrapped or jackknife dataset that was used to generate the corresponding parameter
estimates in draws()
. Missing values are imputed by using the mean of the conditional
imputation distribution. Please note that the first imputed dataset refers to the conditional
mean imputation on the original dataset whereas all subsequent imputed datasets refer to
conditional mean imputations for bootstrap or jackknife samples, respectively, of the original data.
Bootstrapped Maximum Likelihood MI (BMLMI): it performs D
random imputations of each bootstrapped
dataset that was used to generate the corresponding parameter estimates in draws()
. A total number of
B*D
imputed datasets is provided, where B
is the number of bootstrapped datasets. Missing values
are imputed by taking a random sample from the conditional imputation distribution.
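For example, using the method_bmlmi() defaults documented later in this reference:

method <- method_bmlmi(B = 20, D = 2)
# draws() fits 20 bootstrap models; impute() then yields B * D = 40 imputed datasets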
The update_strategy
argument can be used to update the imputation strategy that was
originally set via the data_ice
option in draws()
. This avoids having to re-run the draws()
function when changing the imputation strategy in certain circumstances (as detailed below).
The data.frame
provided to update_strategy
argument must contain two columns,
one for the subject ID and another for the imputation strategy, whose names are the same as
those defined in the vars
argument as specified in the call to draws()
. Please note that this
argument only allows you to update the imputation strategy and not other arguments such as the
time of the first visit affected by the ICE.
A key limitation of this functionality is
that one can only switch between a MAR and a non-MAR strategy (or vice versa) for subjects without
observed post-ICE data. The reason for this is that such a change would affect whether the post-ICE data is included
in the base imputation model or not (as explained in the help to draws()
).
As an example, if a subject had their ICE on "Visit 2"
but had observed/known values for "Visit 3" then the function will throw an error
if one tries to switch the strategy from MAR to a non-MAR strategy. In contrast, switching from
a non-MAR to a MAR strategy, whilst valid, will raise a warning as not all usable data
will have been utilised in the imputation model.
James R Carpenter, James H Roger, and Michael G Kenward. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics, 23(6):1352–1371, 2013. [Section 4.2 and 4.3]
## Not run: 
impute(
    draws = drawobj,
    references = c("Trt" = "Placebo", "Placebo" = "Placebo")
)

new_strategy <- data.frame(
    subjid = c("Pt1", "Pt2"),
    strategy = c("MAR", "JR")
)

impute(
    draws = drawobj,
    references = c("Trt" = "Placebo", "Placebo" = "Placebo"),
    update_strategy = new_strategy
)
## End(Not run)
This function performs the imputation for a single subject at a time implementing the
process as detailed in impute()
.
impute_data_individual(
    id,
    index,
    beta,
    sigma,
    data,
    references,
    strategies,
    condmean,
    n_imputations = 1
)
id |
Character string identifying the subject. |
index |
The sample indexes which the subject belongs to e.g |
beta |
A list of beta coefficients for each sample, i.e. |
sigma |
A list of the sigma coefficients for each sample split by group i.e.
|
data |
A |
references |
A named vector. Identifies the references to be used when generating the
imputed values. Should be of the form |
strategies |
A named list of functions. Defines the imputation functions to be used.
The names of the list should mirror the values specified in |
condmean |
Logical. If |
n_imputations |
When |
Note that this function performs all of the required imputations for a subject at the same time, i.e. if a subject is included in samples 1,3,5,9 then all imputations (using sample-dependent imputation model parameters) are performed in one step in order to avoid having to look up a subject's covariates and expanding them to a design matrix multiple times (which would be more computationally expensive). The function also supports a subject belonging to the same sample multiple times, i.e. 1,1,2,3,5,5, as will typically occur for bootstrapped datasets.
This is the workhorse function that implements most of the functionality of impute.
See the user level function impute()
for further details.
impute_internal(
    draws,
    references = NULL,
    update_strategy,
    strategies,
    condmean
)
draws |
A |
references |
A named vector. Identifies the references to be used for reference-based
imputation methods. Should be of the form |
update_strategy |
An optional |
strategies |
A named list of functions. Defines the imputation functions to be used.
The names of the list should mirror the values specified in |
condmean |
logical. If TRUE will impute using the conditional mean values, if FALSE will impute by taking a random draw from the multivariate normal distribution. |
Draws a random sample from a multivariate normal distribution.
impute_outcome(conditional_parameters, n_imputations = 1, condmean = FALSE)
conditional_parameters |
a list with elements |
n_imputations |
numeric representing the number of random samples from the multivariate
normal distribution to be performed. Default is |
condmean |
should conditional mean imputation be performed (as opposed to random sampling) |
Utility function used to replicate purrr::transpose(). Turns a list inside out.
invert(x)
x |
list |
Takes a list of elements and creates a new list containing 1 entry per unique element value containing the indexes of which original elements it occurred in.
invert_indexes(x)
x |
list of elements to invert and calculate index from (see details). |
This function's purpose is best illustrated by an example:
input:
list(c("A", "B", "C"), c("A", "A", "B"))
becomes:
list("A" = c(1, 2, 2), "B" = c(1, 2), "C" = 1)
Returns TRUE if a value is either NULL, NA or "". In the case of a vector, all values must be NULL/NA/"" for x to be regarded as absent.
is_absent(x, na = TRUE, blank = TRUE)
x |
a value to check if it is absent or not |
na |
do NAs count as absent |
blank |
do blanks i.e. "" count as absent |
Returns TRUE if x is a character or factor vector.
is_char_fact(x)
x |
a character or factor vector |
Returns TRUE if x is a length-1 character vector.
is_char_one(x)
x |
a character vector |
Returns TRUE
if the package is being developed on, i.e. you have a local copy of the
source code which you are actively editing.
Returns FALSE
otherwise.
is_in_rbmi_development()
The main use of this function is in parallel processing to indicate whether the sub-processes need to load the current development version of the code or whether they should load the main installed package on the system.
Returns TRUE if x is a character, numeric or factor vector.
is_num_char_fact(x)
x |
a character, numeric or factor vector |
Returns a vector after applying last observation carried forward (LOCF) imputation.
locf(x)
x |
a vector. |
## Not run: 
locf(c(NA, 1, 2, 3, NA, 4))  # Returns c(NA, 1, 2, 3, 3, 4)
## End(Not run)
A longdata
object allows for efficient storage and recall of longitudinal datasets for use in
bootstrap sampling. The object works by de-constructing the data into lists based upon subject id
thus enabling efficient lookup.
The object also handles multiple other operations specific to rbmi
such as defining whether an
outcome value is MAR / Missing or not as well as tracking which imputation strategy is assigned
to each subject.
It is recognised that this object's functionality is fairly overloaded and it is hoped that this can be split out into more area-specific objects / functions in the future. Further additions of functionality to this object should be avoided if possible.
data
The original dataset passed to the constructor (sorted by id and visit)
vars
The vars object (list of key variables) passed to the constructor
visits
A character vector containing the distinct visit levels
ids
A character vector containing the unique ids of each subject in self$data
formula
A formula expressing how the design matrix for the data should be constructed
strata
A numeric vector indicating which strata each corresponding value of
self$ids
belongs to.
If no stratification variable is defined this will default to 1 for all subjects
(i.e. same group).
This field is only used as part of the self$sample_ids()
function to enable
stratified bootstrap
sampling
ice_visit_index
A list indexed by subject storing the index number of the first visit affected by the ICE. If there is no ICE then it is set equal to the number of visits plus 1.
values
A list indexed by subject storing a numeric vector of the original (unimputed) outcome values
group
A list indexed by subject storing a single character
indicating which imputation group the subject belongs to as defined
by self$data[id, self$ivars$group].
It is used
to determine what reference group should be used when imputing the subject's data.
is_mar
A list indexed by subject storing logical values indicating
if the subject's outcome values are MAR or not.
This list is defaulted to TRUE for all subjects & outcomes and is then
modified by calls to self$set_strategies().
Note that this does not indicate which values are missing; this variable
is TRUE for outcome values that either occurred before the ICE visit
or are post the ICE visit and have an imputation strategy of MAR
strategies
A list indexed by subject storing a single character
value indicating the imputation
strategy assigned to that subject. This list is defaulted to "MAR"
for all subjects and is then
modified by calls to either self$set_strategies()
or self$update_strategies()
strategy_lock
A list indexed by subject storing a single
logical value indicating whether a
patient's imputation strategy is locked or not. If a strategy is
locked it means that it can't change
from MAR to non-MAR. Strategies can be changed from non-MAR to MAR, though
this will trigger a warning.
Strategies are locked if the patient is assigned a MAR strategy and
has non-missing values after their ICE date. This list is populated by a call to
self$set_strategies()
.
indexes
A list indexed by subject storing a numeric vector of
indexes which specify which rows in the
original dataset belong to this subject i.e. to recover the full data
for subject "pt3" you can use
self$data[self$indexes[["pt3"]],]
. This may seem redundant over filtering
the data directly,
however it enables efficient bootstrap sampling of the data i.e.
indexes <- unlist(self$indexes[c("pt3", "pt3")])
self$data[indexes, ]
This list is populated during the object initialisation.
is_missing
A list indexed by subject storing a logical vector indicating whether the corresponding outcome of a subject is missing. This list is populated during the object initialisation.
is_post_ice
A list indexed by subject storing a logical vector
indicating whether the corresponding
outcome of a subject is post the date of their ICE. If no ICE data has
been provided this defaults to FALSE
for all observations. This list is populated by a call to self$set_strategies()
.
get_data()
Returns a data.frame
based upon required subject IDs. Replaces missing
values with new ones if provided.
longDataConstructor$get_data( obj = NULL, nmar.rm = FALSE, na.rm = FALSE, idmap = FALSE )
obj
Either NULL
, a character vector of subject IDs or an
imputation list object. See details.
nmar.rm
Logical value. If TRUE
will remove observations that are
not regarded as MAR (as determined from self$is_mar
).
na.rm
Logical value. If TRUE
will remove outcome values that are
missing (as determined from self$is_missing
).
idmap
Logical value. If TRUE
will add an attribute idmap
which
contains a mapping from the new subject ids to the old subject ids. See details.
If obj
is NULL
then the full original dataset is returned.
If obj
is a character vector then a new dataset consisting of just those subjects is
returned; if the character vector contains duplicate entries then that subject will be
returned multiple times.
If obj
is an imputation_df
object (as created by imputation_df()
) then the
subject ids specified in the object will be returned and missing values will be filled
in by those specified in the imputation list object. i.e.
obj <- imputation_df(
    imputation_single(id = "pt1", values = c(1, 2, 3)),
    imputation_single(id = "pt1", values = c(4, 5, 6)),
    imputation_single(id = "pt3", values = c(7, 8))
)
longdata$get_data(obj)
Will return a data.frame
consisting of all observations for pt1
twice and all of the
observations for pt3
once. The first set of observations for pt1
will have missing
values filled in with c(1,2,3)
and the second set will be filled in by c(4,5,6)
. The
length of the values must be equal to sum(self$is_missing[[id]])
.
If obj
is not NULL
then all subject IDs will be scrambled in order to ensure that they
are unique,
i.e. if pt2
is requested twice then this process guarantees that each set of observations
will have a unique subject ID number. The idmap
attribute (if requested) can be used
to map from the new ids back to the old ids.
A data.frame
.
add_subject()
This function decomposes a patient's data from self$data
and populates
all the corresponding lists i.e. self$is_missing
, self$values
, self$group
, etc.
This function is only called upon the object's initialisation.
longDataConstructor$add_subject(id)
id
Character subject id that exists within self$data
.
validate_ids()
Throws an error if any element of ids
is not within the source data self$data
.
longDataConstructor$validate_ids(ids)
ids
A character vector of ids.
TRUE
sample_ids()
Performs random stratified sampling of patient ids (with replacement). Each patient has an equal weight of being picked within their strata (i.e. it is not dependent on how many non-missing visits they had).
longDataConstructor$sample_ids()
Character vector of ids.
extract_by_id()
Returns a list of key information for a given subject. Is a convenience wrapper to save having to manually grab each element.
longDataConstructor$extract_by_id(id)
id
Character subject id that exists within self$data
.
update_strategies()
Convenience function to run self$set_strategies(dat_ice, update = TRUE); kept for legacy reasons.
longDataConstructor$update_strategies(dat_ice)
dat_ice
A data.frame
containing ICE information; see impute()
for the format of this data.frame.
set_strategies()
Updates the self$strategies
, self$is_mar
, self$is_post_ice
variables based upon the provided ICE
information.
longDataConstructor$set_strategies(dat_ice = NULL, update = FALSE)
dat_ice
a data.frame
containing ICE information. See details.
update
Logical, indicates that the ICE data should be used as an update. See details.
See draws()
for the specification of dat_ice
if update=FALSE
.
See impute()
for the format of dat_ice
if update=TRUE
.
If update=TRUE
this function ensures that MAR strategies cannot be changed to non-MAR in the presence
of post-ICE observations.
check_has_data_at_each_visit()
Ensures that all visits have at least 1 observed "MAR" observation. Throws an error if this criterion is not met. This is to ensure that the initial MMRM can be resolved.
longDataConstructor$check_has_data_at_each_visit()
set_strata()
Populates the self$strata
variable. If the user has specified stratification variables,
the first visit is used to determine the value of those variables. If no stratification variables
have been specified then everyone is defined as being in strata 1.
longDataConstructor$set_strata()
new()
Constructor function.
longDataConstructor$new(data, vars)
data
longitudinal dataset.
vars
an ivars
object created by set_vars()
.
clone()
The objects of this class are cloneable with this method.
longDataConstructor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Calculates the design vector as required to generate the lsmean
and standard error. ls_design_equal
calculates it by
applying an equal weight per covariate combination whilst
ls_design_proportional
applies weighting proportional
to the frequency in which the covariate combination occurred
in the actual dataset.
ls_design_equal(data, frm, fix)
ls_design_counterfactual(data, frm, fix)
ls_design_proportional(data, frm, fix)
data |
A data.frame |
frm |
Formula used to fit the original model |
fix |
A named list of variables with fixed values |
Estimates the least square means from a linear model. The exact implementation / interpretation depends on the weighting scheme; see the weighting section for more information.
lsmeans(
    model,
    ...,
    .weights = c("counterfactual", "equal", "proportional_em", "proportional")
)
model |
A model created by |
... |
Fixes specific variables to specific values i.e.
|
.weights |
Character, either |
For weights = "counterfactual"
(the default) the lsmeans are obtained by
taking the average of the predicted values for each patient after assigning all patients
to each arm in turn.
This approach is equivalent to standardization or g-computation.
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", counterfactual = "<treatment>")
Note that to ensure backwards compatibility with previous versions of rbmi
weights = "proportional"
is an alias for weights = "counterfactual"
.
To get results consistent with emmeans
's weights = "proportional"
please use weights = "proportional_em"
.
For weights = "equal"
the lsmeans are obtained by taking the model fitted
value of a hypothetical patient whose covariates are defined as follows:
Continuous covariates are set to mean(X)
Dummy categorical variables are set to 1/N
where N
is the number of levels
Continuous * continuous interactions are set to mean(X) * mean(Y)
Continuous * categorical interactions are set to mean(X) * 1/N
Dummy categorical * categorical interactions are set to 1/N * 1/M
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", weights = "equal")
For weights = "proportional_em"
the lsmeans are obtained as per weights = "equal"
except instead of weighting each observation equally they are weighted by the proportion
in which the given combination of categorical values occurred in the data.
In comparison to emmeans
this approach is equivalent to:
emmeans::emmeans(model, specs = "<treatment>", weights = "proportional")
Note that this is not to be confused with weights = "proportional"
which is an alias
for weights = "counterfactual"
.
Regardless of the weighting scheme any named arguments passed via ...
will
fix the value of the covariate to the specified value.
For example, lsmeans(model, trt = "A")
will fix the dummy variable trtA
to 1
for all patients (real or hypothetical) when calculating the lsmeans.
See the references for similar implementations as done in SAS and
in R via the emmeans
package.
https://CRAN.R-project.org/package=emmeans
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_glm_details41.htm
## Not run: 
mod <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
lsmeans(mod)
lsmeans(mod, Species = "virginica")
lsmeans(mod, Species = "versicolor")
lsmeans(mod, Species = "versicolor", Petal.Length = 1)
## End(Not run)
Create an rbmi ready cluster.
make_rbmi_cluster(ncores = 1, objects = NULL, packages = NULL)
ncores |
Number of parallel processes to use or an existing cluster to make use of |
objects |
a named list of objects to export into the sub-processes |
packages |
a character vector of libraries to load in the sub-processes. |
## Not run: 
# Basic usage
make_rbmi_cluster(5)

# User objects + libraries
VALUE <- 5
myfun <- function(x) {
    x + day(VALUE)  # From lubridate::day()
}
make_rbmi_cluster(5, list(VALUE = VALUE, myfun = myfun), c("lubridate"))

# Using an already created cluster
cl <- parallel::makeCluster(5)
make_rbmi_cluster(cl)
## End(Not run)
These functions determine what methods rbmi
should use when creating
the imputation models, generating imputed values and pooling the results.
method_bayes(
    burn_in = 200,
    burn_between = 50,
    same_cov = TRUE,
    n_samples = 20,
    seed = NULL
)

method_approxbayes(
    covariance = c("us", "ad", "adh", "ar1", "ar1h", "cs", "csh", "toep", "toeph"),
    threshold = 0.01,
    same_cov = TRUE,
    REML = TRUE,
    n_samples = 20
)

method_condmean(
    covariance = c("us", "ad", "adh", "ar1", "ar1h", "cs", "csh", "toep", "toeph"),
    threshold = 0.01,
    same_cov = TRUE,
    REML = TRUE,
    n_samples = NULL,
    type = c("bootstrap", "jackknife")
)

method_bmlmi(
    covariance = c("us", "ad", "adh", "ar1", "ar1h", "cs", "csh", "toep", "toeph"),
    threshold = 0.01,
    same_cov = TRUE,
    REML = TRUE,
    B = 20,
    D = 2
)
burn_in |
a numeric that specifies how many observations should be discarded prior to extracting actual samples. Note that the sampler is initialized at the maximum likelihood estimates and a weakly informative prior is used thus in theory this value should not need to be that high. |
burn_between |
a numeric that specifies the "thinning" rate i.e. how many observations should be discarded between each sample. This is used to prevent issues associated with autocorrelation between the samples. |
same_cov |
a logical, if |
n_samples |
a numeric that determines how many imputed datasets are generated.
In the case of |
seed |
deprecated. Please use |
covariance |
a character string that specifies the structure of the covariance
matrix to be used in the imputation model. Must be one of |
threshold |
a numeric between 0 and 1, specifies the proportion of bootstrap datasets that can fail to produce valid samples before an error is thrown. See details. |
REML |
a logical indicating whether to use REML estimation rather than maximum likelihood. |
type |
a character string that specifies the resampling method used to perform inference
when a conditional mean imputation approach (set via |
B |
a numeric that determines the number of bootstrap samples for |
D |
a numeric that determines the number of random imputations for each bootstrap sample.
Needed for |
In the case of method_condmean(type = "bootstrap")
there will be n_samples + 1
imputation models and datasets generated as the first sample will be based on
the original dataset whilst the other n_samples
samples will be
bootstrapped datasets. Likewise, for method_condmean(type = "jackknife")
there will
be length(unique(data$subjid)) + 1
imputation models and datasets generated. In both cases this is
represented by n + 1
being displayed in the print message.
The user is able to specify different covariance structures using the covariance
argument. Currently supported structures include:
Unstructured ("us"
) (default)
Ante-dependence ("ad"
)
Heterogeneous ante-dependence ("adh"
)
First-order auto-regressive ("ar1"
)
Heterogeneous first-order auto-regressive ("ar1h"
)
Compound symmetry ("cs"
)
Heterogeneous compound symmetry ("csh"
)
Toeplitz ("toep"
)
Heterogeneous Toeplitz ("toeph"
)
For full details please see mmrm::cov_types()
.
Note that at present Bayesian methods only support the unstructured covariance structure.
In the case of method_condmean(type = "bootstrap")
, method_approxbayes()
and method_bmlmi()
repeated
bootstrap samples of the original dataset are taken with an MMRM fitted to each sample.
Due to the randomness of these sampled datasets, as well as limitations in the optimisers
used to fit the models, it is not uncommon that estimates for a particular dataset can't
be generated. In these instances rbmi
is designed to throw out that bootstrapped dataset
and try again with another. However to ensure that these errors are due to chance and
not due to some underlying misspecification in the data and/or model a tolerance limit
is set on how many samples can be discarded. Once the tolerance limit has been reached
an error will be thrown and the process aborted. The tolerance limit is defined as
ceiling(threshold * n_samples)
. Note that for the jackknife method estimates need to be
generated for all leave-one-out datasets and as such an error will be thrown if
any of them fail to fit.
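For instance, with the defaults threshold = 0.01 and n_samples = 20 shown in the usage above:

ceiling(0.01 * 20)  # tolerance limit of 1 discarded sample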
Please note that at the time of writing (September 2021) Stan is unable to produce reproducible samples across different operating systems even when the same seed is used. As such care must be taken when using Stan across different machines. For more information on this limitation please consult the Stan documentation https://mc-stan.org/docs/2_27/reference-manual/reproducibility-chapter.html
Simple wrapper around lapply
and parallel::clusterApplyLB
to abstract away
the logic of deciding which one to use
par_lapply(cl, fun, x, ...)
cl |
Cluster created by |
fun |
Function to be run |
x |
object to be looped over |
... |
extra arguments passed to |
Calculates confidence intervals based upon a parametric distribution.
parametric_ci(point, se, alpha, alternative, qfun, pfun, ...)
point |
The point estimate. |
se |
The standard error of the point estimate. If using a non-"normal" distribution this should be set to 1. |
alpha |
The type 1 error rate, should be a value between 0 and 1. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". |
qfun |
The quantile function for the assumed distribution i.e. |
pfun |
The CDF function for the assumed distribution i.e. |
... |
additional arguments passed on |
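A hedged sketch of a call assuming a normal reference distribution (so qnorm / pnorm as the quantile and CDF functions); for a t-distribution the degrees of freedom would be passed via ...:

# Two-sided 95% interval around a point estimate of 1.5 with standard error 0.3
parametric_ci(
    point = 1.5,
    se = 0.3,
    alpha = 0.05,
    alternative = "two.sided",
    qfun = qnorm,
    pfun = pnorm
)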
Pool analysis results obtained from the imputed datasets
pool(
    results,
    conf.level = 0.95,
    alternative = c("two.sided", "less", "greater"),
    type = c("percentile", "normal")
)

## S3 method for class 'pool'
as.data.frame(x, ...)

## S3 method for class 'pool'
print(x, ...)
results |
an analysis object created by |
conf.level |
confidence level of the returned confidence interval. Must be a single number between 0 and 1. Default is 0.95. |
alternative |
a character string specifying the alternative hypothesis,
must be one of |
type |
a character string of either |
x |
a |
... |
not used. |
The calculation used to generate the point estimate, standard errors and
confidence interval depends upon the method specified in the original
call to draws()
; In particular:
method_approxbayes()
& method_bayes()
both use Rubin's rules to pool estimates
and variances across multiple imputed datasets, and the Barnard-Rubin rule to pool
degrees of freedom; see Little & Rubin (2002).
method_condmean(type = "bootstrap")
uses percentile or normal approximation;
see Efron & Tibshirani (1994). Note that for the percentile bootstrap, no standard error is
calculated, i.e. the standard errors will be NA
in the object / data.frame
.
method_condmean(type = "jackknife")
uses the standard jackknife variance formula;
see Efron & Tibshirani (1994).
method_bmlmi
uses the pooling procedure for Bootstrapped Maximum Likelihood MI (BMLMI).
See Von Hippel & Bartlett (2021).
Bradley Efron and Robert J Tibshirani. An introduction to the bootstrap. CRC press, 1994. [Section 11]
Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data, Second Edition. John Wiley & Sons, Hoboken, New Jersey, 2002. [Section 5.4]
Von Hippel, Paul T and Bartlett, Jonathan W. Maximum likelihood multiple imputation: Faster imputations and consistent standard errors without posterior draws. 2021.
Get point estimate, confidence interval and p-value using the normal approximation.
pool_bootstrap_normal(est, conf.level, alternative)
est |
a numeric vector of point estimates from each bootstrap sample. |
conf.level |
confidence level of the returned confidence interval. Must be a single number between 0 and 1. Default is 0.95. |
alternative |
a character string specifying the alternative hypothesis,
must be one of |
The point estimate is taken to be the first element of est. The remaining n-1 values of est are then used to generate the confidence intervals.
Get point estimate, confidence interval and p-value using
percentiles. Note that quantile "type=6" is used,
see stats::quantile()
for details.
pool_bootstrap_percentile(est, conf.level, alternative)
est |
a numeric vector of point estimates from each bootstrap sample. |
conf.level |
confidence level of the returned confidence interval. Must be a single number between 0 and 1. Default is 0.95. |
alternative |
a character string specifying the alternative hypothesis,
must be one of |
The point estimate is taken to be the first element of est
. The remaining
n-1 values of est
are then used to generate the confidence intervals.
Dispatches pool methods based upon results object class. See
pool()
for details.
pool_internal(results, conf.level, alternative, type, D)

## S3 method for class 'jackknife'
pool_internal(results, conf.level, alternative, type, D)

## S3 method for class 'bootstrap'
pool_internal(
    results,
    conf.level,
    alternative,
    type = c("percentile", "normal"),
    D
)

## S3 method for class 'bmlmi'
pool_internal(results, conf.level, alternative, type, D)

## S3 method for class 'rubin'
pool_internal(results, conf.level, alternative, type, D)
results |
a list of results i.e. the |
conf.level |
confidence level of the returned confidence interval. Must be a single number between 0 and 1. Default is 0.95. |
alternative |
a character string specifying the alternative hypothesis,
must be one of |
type |
a character string of either |
D |
numeric representing the number of imputations between each bootstrap sample in the BMLMI method. |
Prepare input data to run the Stan model.
Creates / calculates all the required inputs as required by the data{}
block of the MMRM Stan program.
prepare_stan_data(ddat, subjid, visit, outcome, group)
ddat |
A design matrix |
subjid |
Character vector containing the subject IDs. |
visit |
Vector containing the visits. |
outcome |
Numeric vector containing the outcome variable. |
group |
Vector containing the group variable. |
The group
argument determines which covariance matrix group the subject belongs to. If you
want all subjects to use a shared covariance matrix then set group to "1" for everyone.
A stan_data
object. A named list as per data{}
block of the related Stan file. In particular it returns:
N - The number of rows in the design matrix
P - The number of columns in the design matrix
G - The number of distinct covariance matrix groups (i.e. length(unique(group))
)
n_visit - The number of unique outcome visits
n_pat - The total number of pattern groups (as defined by missingness patterns & covariance group)
pat_G - Index for which Sigma each pattern group should use
pat_n_pt - number of patients within each pattern group
pat_n_visit - number of non-missing visits in each pattern group
pat_sigma_index - rows/cols from Sigma to subset on for the pattern group (padded by 0's)
y - The outcome variable
Q - design matrix (after QR decomposition)
R - R matrix from the QR decomposition of the design matrix
Print analysis object.
## S3 method for class 'analysis'
print(x, ...)
x |
An |
... |
Not used. |
Print draws object.
## S3 method for class 'draws'
print(x, ...)
x |
A |
... |
not used. |
Print imputation object.
## S3 method for class 'imputation'
print(x, ...)
x |
An |
... |
Not used. |
Object is initialised with the total number of iterations that are expected to occur.
The user can then update the object with the add
method to indicate how many more iterations
have just occurred.
Every time step
* 100% of iterations have occurred a message is printed to the console.
Use the quiet
argument to prevent the object from printing anything at all.
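A minimal usage sketch based on the constructor and add() method described below:

pl <- progressLogger$new(n_max = 200, step = 0.1)
for (i in seq_len(200)) {
    pl$add(1)  # prints a progress message every 10% of the 200 iterations
}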
step
real, percentage of iterations to allow before printing the progress to the console
step_current
integer, the total number of iterations completed since progress was last printed to the console
n
integer, the current number of completed iterations
n_max
integer, total number of expected iterations to be completed acts as the denominator for calculating progress percentages
quiet
logical, holds whether or not to print anything
new()
Create progressLogger object
progressLogger$new(n_max, quiet = FALSE, step = 0.1)
n_max
integer, sets field n_max
quiet
logical, sets field quiet
step
real, sets field step
add()
Records that n
more iterations have been completed;
this will add that number to the current step count (step_current
) and will
print a progress message to the log if the step limit (step
) has been reached.
This function will do nothing if quiet
has been set to TRUE
progressLogger$add(n)
n
the number of successfully complete iterations since add()
was last called
print_progress()
method to print the current state of progress
progressLogger$print_progress()
clone()
The objects of this class are cloneable with this method.
progressLogger$clone(deep = FALSE)
deep
Whether to make a deep clone.
Determines the (not necessarily unique) quantile (type=6) of "est" which gives a value of 0. From this, the p-value corresponding to the percentile bootstrap is derived via inversion.
pval_percentile(est)
est |
a numeric vector of point estimates from each bootstrap sample. |
The p-value for H_0: theta=0 vs H_A: theta>0 is the value alpha
for which q_alpha = 0
.
If there is at least one estimate equal to zero it returns the largest alpha
such that q_alpha = 0
.
If all bootstrap estimates are > 0 it returns 0; if all bootstrap estimates are < 0 it returns 1. Analogous
reasoning is applied for the p-value for H_0: theta=0 vs H_A: theta<0.
A named numeric vector of length 2 containing the p-value for H_0: theta=0 vs H_A: theta>0
("pval_greater"
) and the p-value for H_0: theta=0 vs H_A: theta<0 ("pval_less"
).
QR decomposition as defined in the Stan user's guide (section 1.2).
QR_decomp(mat)
mat |
A matrix to perform the QR decomposition on. |
Constructs a character representation of the random effects formula
for fitting an MMRM for subject by visit in the format required for mmrm::mmrm()
.
random_effects_expr(
    cov_struct = c("us", "ad", "adh", "ar1", "ar1h", "cs", "csh", "toep", "toeph"),
    cov_by_group = FALSE
)
cov_struct |
Character - The covariance structure to be used, must be one of |
cov_by_group |
Boolean - Whether or not to use separate covariances per group level |
For example, assuming the user specified a covariance structure of "us" and that no groups were provided, this will return:
us(visit | subjid)
If cov_by_group
is set to TRUE
then this indicates that separate covariance matrices
are required per group and as such the following will be returned:
us( visit | group / subjid )
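That is, under the stated assumptions:

random_effects_expr("us", cov_by_group = FALSE)
# "us(visit | subjid)"
random_effects_expr("us", cov_by_group = TRUE)
# "us( visit | group / subjid )"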
Define settings that modify the behaviour of the rbmi
package
Each of the following is the name of an option that can be set via:
options(<option_name> = <value>)
rbmi.cache_dir
Default = tools::R_user_dir("rbmi", which = "cache")
Directory to store compiled Stan models in to avoid having to re-compile.
If the environment variable RBMI_CACHE_DIR
has been set this will be used
as the default value.
Note that if you are running rbmi in multiple R processes at the same time
(that is say multiple calls to Rscript
at once) then there is a theoretical
risk of the processes breaking each other as they attempt to read/write to the
same cache folder at the same time. To avoid this potential issue it is recommended
to set the cache directory to a unique folder for each R session e.g.
options("rbmi.cache_dir" = tempdir(check = TRUE))
set_options()
## Not run: 
options(rbmi.cache_dir = "some/directory/path")
## End(Not run)
This function silences all warnings, errors & messages and instead returns a list containing the results (if no error occurred) plus any warning, error and message texts as character vectors.
record(expr)
expr |
An expression to be executed |
A list containing:
results - The object returned by expr or list() if an error was thrown
warnings - NULL or a character vector if warnings were thrown
errors - NULL or a string if an error was thrown
messages - NULL or a character vector if messages were produced
## Not run: 
record({
    x <- 1
    y <- 2
    warning("something went wrong")
    message("O nearly done")
    x + y
})
## End(Not run)
Utility function used to replicate purrr::reduce. Recursively applies a function to a list of elements until only 1 element remains.
recursive_reduce(.l, .f)
.l |
list of values to apply a function to |
.f |
function to apply to each element of the list in turn |
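A minimal sketch of this recursion, assuming .f is a binary function:
recursive_reduce_sketch <- function(.l, .f) {
    if (length(.l) == 1) return(.l[[1]])
    # combine the first two elements, then recurse on the shortened list
    recursive_reduce_sketch(c(list(.f(.l[[1]], .l[[2]])), .l[-(1:2)]), .f)
}
recursive_reduce_sketch(list(1, 2, 3, 4), `+`)
# [1] 10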
This function takes a data.frame with variables visit, outcome & subjid. It then removes all rows for a given subjid if they don't have any non-missing values for outcome.
remove_if_all_missing(dat)
dat |
a data.frame. |
Compute degrees of freedom according to the Barnard-Rubin formula.
rubin_df(v_com, var_b, var_t, M)
v_com |
Positive number representing the degrees of freedom in the complete-data analysis. |
var_b |
Between-variance of point estimate across multiply imputed datasets. |
var_t |
Total-variance of point estimate according to Rubin's rules. |
M |
Number of imputations. |
The computation takes into account limit cases where there is no missing data (i.e. the between-variance var_b is zero) or where the complete-data degrees of freedom are set to Inf. Moreover, if v_com is given as NA, the function returns Inf.
Degrees of freedom according to the Barnard-Rubin formula. See Barnard and Rubin (1999).
Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
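A hedged sketch of the formula (the package additionally guards the limit cases described above):
rubin_df_sketch <- function(v_com, var_b, var_t, M) {
    if (is.na(v_com)) return(Inf)
    if (var_b == 0) return(v_com)          # no missing data
    lambda <- (1 + 1 / M) * var_b / var_t  # fraction of missing information
    v_old <- (M - 1) / lambda^2            # classical (Rubin 1987) df
    if (is.infinite(v_com)) return(v_old)
    v_obs <- (v_com + 1) / (v_com + 3) * v_com * (1 - lambda)
    1 / (1 / v_old + 1 / v_obs)            # Barnard-Rubin combination
}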
Pool together the results from M complete-data analyses according to Rubin's rules. See details.
rubin_rules(ests, ses, v_com)
ests |
Numeric vector containing the point estimates from the complete-data analyses. |
ses |
Numeric vector containing the standard errors from the complete-data analyses. |
v_com |
Positive number representing the degrees of freedom in the complete-data analysis. |
rubin_rules applies Rubin's rules (Rubin, 1987) for pooling together the results from a multiple imputation procedure. The pooled point estimate est_point is the average across the point estimates from the complete-data analyses (given by the input argument ests). The total variance var_t is the sum of two terms representing the within-variance and the between-variance (see Little-Rubin (2002)). The function also returns df, the estimated pooled degrees of freedom according to Barnard-Rubin (1999), which can be used for inference based on the t-distribution.
A list containing:
est_point: the pooled point estimate according to Little-Rubin (2002).
var_t: total variance according to Little-Rubin (2002).
df: degrees of freedom according to Barnard-Rubin (1999).
Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data, Second Edition. John Wiley & Sons, Hoboken, New Jersey, 2002. [Section 5.4]
rubin_df() for the degrees of freedom estimation.
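A compact sketch of these rules, reusing the illustrative rubin_df_sketch() helper from above:
rubin_rules_sketch <- function(ests, ses, v_com) {
    M <- length(ests)
    est_point <- mean(ests)               # pooled point estimate
    var_w <- mean(ses^2)                  # within-imputation variance
    var_b <- stats::var(ests)             # between-imputation variance
    var_t <- var_w + (1 + 1 / M) * var_b  # total variance
    list(est_point = est_point, var_t = var_t, df = rubin_df_sketch(v_com, var_b, var_t, M))
}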
Performs a stratified bootstrap sample of IDs, ensuring the return vector is the same length as the input vector.
sample_ids(ids, strata = rep(1, length(ids)))
ids |
vector to sample from |
strata |
strata indicator; ids are sampled within each stratum, ensuring that the number sampled from each stratum is maintained |
## Not run: 
sample_ids(c("a", "b", "c", "d"), strata = c(1, 1, 2, 2))
## End(Not run)
Create a sample_list object
Given a list of sample_single objects generated by sample_single(), creates a sample_list object and validates it.
sample_list(...)
... |
A list of sample_single objects. |
Sample random values from the multivariate normal distribution
sample_mvnorm(mu, sigma)
mu |
mean vector |
sigma |
covariance matrix |
Samples multivariate normal variables by multiplying univariate random normal variables by the Cholesky decomposition of the covariance matrix. If mu is length 1 then rnorm is used instead.
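A minimal sketch of this sampling scheme:
sample_mvnorm_sketch <- function(mu, sigma) {
    if (length(mu) == 1) return(stats::rnorm(1, mean = mu, sd = sqrt(sigma)))
    # for z ~ N(0, I), z %*% chol(sigma) has covariance t(R) %*% R = sigma
    mu + as.vector(stats::rnorm(length(mu)) %*% chol(sigma))
}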
Create an object of class sample_single
Creates an object of class sample_single, which is a named list containing the input parameters, and validates it.
sample_single(
    ids,
    beta = NA,
    sigma = NA,
    theta = NA,
    failed = any(is.na(beta)),
    ids_samp = ids
)
ids |
Vector of characters containing the ids of the subjects included in the original dataset. |
beta |
Numeric vector of estimated regression coefficients. |
sigma |
List of estimated covariance matrices (one for each level of vars$group). |
theta |
Numeric vector of transformed covariances. |
failed |
Logical. TRUE if the model fit failed. |
ids_samp |
Vector of characters containing the ids of the subjects included in the given sample. |
A named list of class sample_single. It contains the following:
ids
vector of characters containing the ids of the subjects included in the original dataset.
beta
numeric vector of estimated regression coefficients.
sigma
list of estimated covariance matrices (one for each level of vars$group).
theta
numeric vector of transformed covariances.
failed
logical. TRUE if the model fit failed.
ids_samp
vector of characters containing the ids of the subjects included in the given sample.
Scales a design matrix so that all non-categorical columns have a mean of 0 and a standard deviation of 1.
The object initialisation is used to determine the relevant mean and SD's to scale by and then the scaling (and un-scaling) itself is performed by the relevant object methods.
Un-scaling is done on linear model Beta and Sigma coefficients. For this purpose the first column of the dataset to be scaled is assumed to be the outcome variable, with all other variables assumed to be post-transformation predictor variables (i.e. all dummy variables have already been expanded).
centre
Vector of column means. The first value is the outcome variable, all other variables are the predictors.
scales
Vector of column standard deviations. The first value is the outcome variable, all other variables are the predictors.
new()
Uses dat to determine the relevant column means and standard deviations to use when scaling and un-scaling future datasets. Implicitly assumes that new datasets have the same column order as dat.
scalerConstructor$new(dat)
dat
A data.frame or matrix. All columns must be numeric (i.e. dummy variables must have already been expanded out). Categorical columns (as determined by those whose values are entirely 1 or 0) will not be scaled. This is achieved by setting the corresponding values of centre to 0 and scale to 1.
scale()
Scales a dataset so that all continuous variables have a mean of 0 and a standard deviation of 1.
scalerConstructor$scale(dat)
dat
A data.frame or matrix whose columns are all numeric (i.e. dummy variables have all been expanded out) and whose columns are in the same order as the dataset used in the initialization function.
unscale_sigma()
Unscales a sigma value (or matrix) as estimated by a linear model using a design matrix scaled by this object. This function only works if the first column of the initialisation data.frame was the outcome variable.
scalerConstructor$unscale_sigma(sigma)
sigma
A numeric value or matrix.
A numeric value or matrix
unscale_beta()
Unscales a beta value (or vector) as estimated by a linear model using a design matrix scaled by this object. This function only works if the first column of the initialization data.frame was the outcome variable.
scalerConstructor$unscale_beta(beta)
beta
A numeric vector of beta coefficients as estimated from a linear model.
A numeric vector.
clone()
The objects of this class are cloneable with this method.
scalerConstructor$clone(deep = FALSE)
deep
Whether to make a deep clone.
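An illustrative usage sketch (scalerConstructor is an internal class, so the return types and preserved column names here are assumptions): scale a dataset whose first column is the outcome, fit a linear model on the scaled data, then map the estimates back to the original scale.
dat <- data.frame(y = rnorm(50, 10, 3), x1 = rnorm(50), x2 = rnorm(50))
sc <- scalerConstructor$new(dat)
dat_s <- as.data.frame(sc$scale(dat))
fit <- lm(y ~ x1 + x2, data = dat_s)
beta <- sc$unscale_beta(coef(fit))                # coefficients on the original scale
sigma2 <- sc$unscale_sigma(summary(fit)$sigma^2)  # residual variance on the original scale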
This function provides the input arguments for each study group needed to simulate data with simulate_data(). simulate_data() generates data for a two-arm clinical trial with longitudinal continuous outcomes and two intercurrent events (ICEs). ICE1 may be thought of as a discontinuation from study treatment due to study drug or condition related (SDCR) reasons. ICE2 may be thought of as a discontinuation from study treatment due to uninformative study drop-out, i.e. due to not study drug or condition related (NSDRC) reasons; outcome data after ICE2 are always missing.
set_simul_pars(
    mu,
    sigma,
    n,
    prob_ice1 = 0,
    or_outcome_ice1 = 1,
    prob_post_ice1_dropout = 0,
    prob_ice2 = 0,
    prob_miss = 0
)
mu |
Numeric vector describing the mean outcome trajectory at each visit (including baseline) assuming no ICEs. |
sigma |
Covariance matrix of the outcome trajectory assuming no ICEs. |
n |
Number of subjects belonging to the group. |
prob_ice1 |
Numeric vector that specifies the probability of experiencing ICE1 (discontinuation from study treatment due to SDCR reasons) after each visit for a subject with observed outcome at that visit equal to the mean at baseline (mu[1]). |
or_outcome_ice1 |
Numeric value that specifies the odds ratio of experiencing ICE1 after each visit corresponding to a +1 higher value of the observed outcome at that visit. |
prob_post_ice1_dropout |
Numeric value that specifies the probability of study drop-out following ICE1. If a subject is simulated to drop-out after ICE1, all outcomes after ICE1 are set to missing. |
prob_ice2 |
Numeric that specifies an additional probability that a post-baseline visit is affected by study drop-out. Outcome data at the subject's first simulated visit affected by study drop-out and all subsequent visits are set to missing. This generates a second intercurrent event ICE2, which may be thought of as treatment discontinuation due to NSDRC reasons with subsequent drop-out. If, for a subject, both ICE1 and ICE2 are simulated to occur, then it is assumed that only the earlier of them counts. In case both ICEs are simulated to occur at the same time, it is assumed that ICE1 counts. This means that a single subject can experience either ICE1 or ICE2, but not both of them. |
prob_miss |
Numeric value that specifies an additional probability for a given post-baseline observation to be missing. This can be used to produce "intermittent" missing values which are not associated with any ICE. |
For the details, please see simulate_data().
A simul_pars object which is a named list containing the simulation parameters.
This function is used to define the names of key variables within the data.frames that are provided as input arguments to draws() and ancova().
set_vars(
    subjid = "subjid",
    visit = "visit",
    outcome = "outcome",
    group = "group",
    covariates = character(0),
    strata = group,
    strategy = "strategy"
)
subjid |
The name of the "Subject ID" variable. A length 1 character vector. |
visit |
The name of the "Visit" variable. A length 1 character vector. |
outcome |
The name of the "Outcome" variable. A length 1 character vector. |
group |
The name of the "Group" variable. A length 1 character vector. |
covariates |
The name of any covariates to be used in the context of modeling. See details. |
strata |
The name of any stratification variable to be used in the context of bootstrap sampling. See details. |
strategy |
The name of the "strategy" variable. A length 1 character vector. |
In both draws() and ancova() the covariates argument can be specified to indicate which variables should be included in the imputation and analysis models respectively. If you wish to include interaction terms these need to be manually specified, i.e. covariates = c("group*visit", "age*sex"). Please note that the use of the I() function to inhibit the interpretation/conversion of objects is not supported.
Currently strata is only used by draws() in combination with method_condmean(type = "bootstrap") and method_approxbayes() in order to allow for the specification of stratified bootstrap sampling. By default strata is set equal to the value of group as it is assumed most users will want to preserve the group size between samples. See draws() for more details.
Likewise, currently the strategy argument is only used by draws() to specify the name of the strategy variable within the data_ice data.frame. See draws() for more details.
## Not run: 
# Using CDISC variable names as an example
set_vars(
    subjid = "usubjid",
    visit = "avisit",
    outcome = "aval",
    group = "arm",
    covariates = c("bwt", "bht", "arm * avisit"),
    strategy = "strat"
)
## End(Not run)
Generate data for a two-arm clinical trial with longitudinal continuous outcomes and two intercurrent events (ICEs). ICE1 may be thought of as a discontinuation from study treatment due to study drug or condition related (SDCR) reasons. ICE2 may be thought of as a discontinuation from study treatment due to uninformative study drop-out, i.e. due to not study drug or condition related (NSDRC) reasons; outcome data after ICE2 are always missing.
simulate_data(pars_c, pars_t, post_ice1_traj, strategies = getStrategies())
pars_c |
A simul_pars object as generated by set_simul_pars(). It specifies the simulation parameters of the control arm. |
pars_t |
A simul_pars object as generated by set_simul_pars(). It specifies the simulation parameters of the treatment arm. |
post_ice1_traj |
A string which specifies how observed outcomes occurring after ICE1 are simulated. Must target a function included in strategies. |
strategies |
A named list of functions. Default equal to getStrategies(). |
The data generation works as follows:
Generate outcome data for all visits (including baseline) from a multivariate normal distribution with parameters pars_c$mu and pars_c$sigma for the control arm and parameters pars_t$mu and pars_t$sigma for the treatment arm, respectively. Note that for a randomized trial, outcomes have the same distribution at baseline in both treatment groups, i.e. one should set pars_c$mu[1] = pars_t$mu[1] and pars_c$sigma[1,1] = pars_t$sigma[1,1].
Simulate whether ICE1 (study treatment discontinuation due to SDCR reasons) occurs after each visit according to parameters pars_c$prob_ice1 and pars_c$or_outcome_ice1 for the control arm and pars_t$prob_ice1 and pars_t$or_outcome_ice1 for the treatment arm, respectively.
Simulate drop-out following ICE1 according to pars_c$prob_post_ice1_dropout and pars_t$prob_post_ice1_dropout.
Simulate an additional uninformative study drop-out with probabilities pars_c$prob_ice2 and pars_t$prob_ice2 at each visit. This generates a second intercurrent event ICE2, which may be thought of as treatment discontinuation due to NSDRC reasons with subsequent drop-out. The simulated time of drop-out is the subject's first visit which is affected by drop-out; data from this visit and all subsequent visits are consequently set to missing. If, for a subject, both ICE1 and ICE2 are simulated to occur, then it is assumed that only the earlier of them counts. In case both ICEs are simulated to occur at the same time, it is assumed that ICE1 counts. This means that a single subject can experience either ICE1 or ICE2, but not both of them.
Adjust trajectories after ICE1 according to the given assumption expressed with the post_ice1_traj argument. Note that only post-ICE1 outcomes in the intervention arm can be adjusted. Post-ICE1 outcomes from the control arm are not adjusted.
Simulate additional intermittent missing outcome data as per arguments pars_c$prob_miss and pars_t$prob_miss.
The probability of the ICE after each visit is modeled according to the following logistic regression model:
~ 1 + I(visit == 0) + ... + I(visit == n_visits-1) + I((x-alpha))
where:
n_visits is the number of visits (including baseline).
alpha is the baseline outcome mean.
The term I((x-alpha)) specifies the dependency of the probability of the ICE on the current outcome value.
The corresponding regression coefficients of the logistic model are defined as follows: the intercept is set to 0, the coefficients corresponding to discontinuation after each visit for a subject with outcome equal to the mean at baseline are set according to parameters pars_c$prob_ice1 (pars_t$prob_ice1), and the regression coefficient associated with the covariate I((x-alpha)) is set to log(pars_c$or_outcome_ice1) (log(pars_t$or_outcome_ice1)).
Please note that the baseline outcome cannot be missing nor be affected by any ICEs.
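Under this model the implied ICE probability at a given visit can be sketched as follows (prob_ice1_visit and alpha are illustrative stand-ins for the visit-specific probability and the baseline mean):
prob_ice_at_visit <- function(x, prob_ice1_visit, or_outcome_ice1, alpha) {
    gamma_v <- stats::qlogis(prob_ice1_visit)  # visit-specific intercept on the logit scale
    stats::plogis(gamma_v + log(or_outcome_ice1) * (x - alpha))
}
# e.g. a subject 2 points above a baseline mean of 10, OR = 1.2, base probability 0.05:
prob_ice_at_visit(x = 12, prob_ice1_visit = 0.05, or_outcome_ice1 = 1.2, alpha = 10)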
A data.frame containing the simulated data. It includes the following variables:
id: Factor variable that specifies the id of each subject.
visit: Factor variable that specifies the visit of each assessment. Visit 0 denotes the baseline visit.
group: Factor variable that specifies which treatment group each subject belongs to.
outcome_bl: Numeric variable that specifies the baseline outcome.
outcome_noICE: Numeric variable that specifies the longitudinal outcome assuming no ICEs.
ind_ice1: Binary variable that takes value 1 if the corresponding visit is affected by ICE1 and 0 otherwise.
dropout_ice1: Binary variable that takes value 1 if the corresponding visit is affected by the drop-out following ICE1 and 0 otherwise.
ind_ice2: Binary variable that takes value 1 if the corresponding visit is affected by ICE2 and 0 otherwise.
outcome: Numeric variable that specifies the longitudinal outcome including ICE1, ICE2 and the intermittent missing values.
Simulate drop-out
simulate_dropout(prob_dropout, ids, subset = rep(1, length(ids)))
prob_dropout |
Numeric that specifies the probability that a post-baseline visit is affected by study drop-out. |
ids |
Factor variable that specifies the id of each subject. |
subset |
Binary variable that specifies the subset of observations that could be affected by drop-out. |
subset can be used to specify outcome values that cannot be affected by the drop-out. By default subset will be set to 1 for all values except those corresponding to the baseline outcome, since baseline is assumed not to be affected by drop-out. Even if subset is specified by the user, the values corresponding to the baseline outcome are still hard-coded to be 0.
A binary vector of length equal to the length of ids that takes value 1 if the corresponding outcome is affected by study drop-out.
Simulate intercurrent event
simulate_ice(outcome, visits, ids, prob_ice, or_outcome_ice, baseline_mean)
outcome |
Numeric variable that specifies the longitudinal outcome for a single group. |
visits |
Factor variable that specifies the visit of each assessment. |
ids |
Factor variable that specifies the id of each subject. |
prob_ice |
Numeric vector that specifies for each visit the probability of experiencing the ICE after the current visit for a subject with outcome equal to the mean at baseline. If a single numeric is provided, then the same probability is applied to each visit. |
or_outcome_ice |
Numeric value that specifies the odds ratio of the ICE corresponding to a +1 higher value of the outcome at the visit. |
baseline_mean |
Mean outcome value at baseline. |
The probability of the ICE after each visit is modeled according to the following logistic regression model:
~ 1 + I(visit == 0) + ... + I(visit == n_visits-1) + I((x-alpha))
where:
n_visits is the number of visits (including baseline).
alpha is the baseline outcome mean set via argument baseline_mean.
The term I((x-alpha)) specifies the dependency of the probability of the ICE on the current outcome value.
The corresponding regression coefficients of the logistic model are defined as follows: the intercept is set to 0, the coefficients corresponding to discontinuation after each visit for a subject with outcome equal to the mean at baseline are set according to parameter prob_ice, and the regression coefficient associated with the covariate I((x-alpha)) is set to log(or_outcome_ice).
A binary variable that takes value 1 if the corresponding outcome is affected by the ICE and 0 otherwise.
Creates a longitudinal dataset in the format that rbmi was designed to analyse.
simulate_test_data(
    n = 200,
    sd = c(3, 5, 7),
    cor = c(0.1, 0.7, 0.4),
    mu = list(
        int = 10,
        age = 3,
        sex = 2,
        trt = c(0, 4, 8),
        visit = c(0, 1, 2)
    )
)

as_vcov(sd, cor)
n |
the number of subjects to sample. Total number of observations returned is thus n * length(sd). |
sd |
the standard deviations for the outcome at each visit. i.e. the square root of the diagonal of the covariance matrix for the outcome |
cor |
the correlation coefficients between the outcome values at each visit. See details. |
mu |
the coefficients to use to construct the mean outcome value at each visit. Must be a named list with elements int, age, sex, trt & visit. See details. |
The number of visits is determined by the size of the variance covariance matrix. i.e. if 3 standard deviation values are provided then 3 visits per patient will be created.
The covariates in the simulated dataset are produced as follows:
Patients' age is sampled at random from a N(0,1) distribution.
Patients' sex is sampled at random with a 50/50 split.
Patients' group is sampled at random but fixed so that each group has n/2 patients.
The outcome variable is sampled from a multivariate normal distribution, see below for details
The mean for the outcome variable is derived as:
outcome = Intercept + age + sex + visit + treatment
The coefficients for the intercept, age and sex are taken from mu$int, mu$age and mu$sex respectively, all of which must be a length 1 numeric. Treatment and visit coefficients are taken from mu$trt and mu$visit respectively and must either be of length 1 (i.e. a constant effect across all visits) or equal to the number of visits (as determined by the length of sd). I.e. if you wanted a treatment slope of 5 and a visit slope of 1 you could specify:
mu = list(..., "trt" = c(0, 5, 10), "visit" = c(0, 1, 2))
The correlation matrix is constructed from cor as follows. Let cor = c(a, b, c, d, e, f); then the correlation matrix would be:
1  a  b  d
a  1  c  e
b  c  1  f
d  e  f  1
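A small usage sketch; three visits are implied here because three standard deviations are supplied:
dat <- simulate_test_data(
    n = 100,
    sd = c(3, 5, 7),
    cor = c(0.1, 0.7, 0.4)
)
head(dat)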
Sort a data.frame
Sorts a data.frame (ascending by default) based upon variables within the dataset.
sort_by(df, vars = NULL, decreasing = FALSE)
df |
data.frame |
vars |
character vector of variables |
decreasing |
logical, whether the sort order should be descending or ascending (default). Can be either a single logical value (in which case it is applied to all variables) or a vector which is the same length as vars. |
## Not run: 
sort_by(iris, c("Sepal.Length", "Sepal.Width"), decreasing = c(TRUE, FALSE))
## End(Not run)
Transform an array into a list of arrays, where the listing is performed on a given dimension.
split_dim(a, n)
a |
Array with number of dimensions at least 2. |
n |
Positive integer. Dimension of a along which the split is performed. |
For example, if a is a 3 dimensional array and n = 1, then split_dim(a, n) returns a list of 2 dimensional arrays (i.e. a list of matrices), where each element of the list is a[i, , ] and i takes values from 1 to the length of the first dimension of the array.
Example:
inputs:
a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12), dim = c(3,2,2))
which means that:
a[1,,]      a[2,,]      a[3,,]
[,1] [,2]   [,1] [,2]   [,1] [,2]
  1    7      2    8      3    9
  4   10      5   11      6   12
n <- 1
The output of res <- split_dim(a, n) is a list of 3 elements:
res[[1]]    res[[2]]    res[[3]]
[,1] [,2]   [,1] [,2]   [,1] [,2]
  1    7      2    8      3    9
  4   10      5   11      6   12
A list of length dim(a)[n] of arrays with number of dimensions equal to the number of dimensions of a minus 1.
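In base R (>= 3.6), asplit() performs an equivalent split, shown here as a hedged point of comparison:
a <- array(1:12, dim = c(3, 2, 2))
res <- asplit(a, 1)  # list of 3 matrices, each equal to a[i, , ]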
Split a flat list of imputation_single() into multiple imputation_df()'s by ID
split_imputations(list_of_singles, split_ids)
list_of_singles |
A list of imputation_single() objects. |
split_ids |
A list with 1 element per required split. Each element must contain a vector of "ID"s which correspond to the imputation_single() IDs. |
This function converts a list of imputations from being structured per patient to being structured per sample, i.e. it converts
obj <- list(
    imputation_single("Ben", numeric(0)),
    imputation_single("Ben", numeric(0)),
    imputation_single("Ben", numeric(0)),
    imputation_single("Harry", c(1, 2)),
    imputation_single("Phil", c(3, 4)),
    imputation_single("Phil", c(5, 6)),
    imputation_single("Tom", c(7, 8, 9))
)

index <- list(
    c("Ben", "Harry", "Phil", "Tom"),
    c("Ben", "Ben", "Phil")
)
Into:
output <- list(
    imputation_df(
        imputation_single(id = "Ben", values = numeric(0)),
        imputation_single(id = "Harry", values = c(1, 2)),
        imputation_single(id = "Phil", values = c(3, 4)),
        imputation_single(id = "Tom", values = c(7, 8, 9))
    ),
    imputation_df(
        imputation_single(id = "Ben", values = numeric(0)),
        imputation_single(id = "Ben", values = numeric(0)),
        imputation_single(id = "Phil", values = c(5, 6))
    )
)
This is a simple stack object offering add / pop functionality
stack
A list containing the current stack
add()
Adds content to the end of the stack (must be a list)
Stack$add(x)
x
content to add to the stack
pop()
Retrieve content from the stack
Stack$pop(i)
i
the number of items to retrieve from the stack. If there are fewer than i items left on the stack it will just return everything that is left.
clone()
The objects of this class are cloneable with this method.
Stack$clone(deep = FALSE)
deep
Whether to make a deep clone.
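An illustrative usage sketch (Stack is an internal class; the exact pop semantics are an assumption here):
s <- Stack$new()
s$add(list("a", "b", "c"))
s$pop(2)  # retrieve two items (or fewer if the stack runs out)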
Returns a vector of TRUE/FALSE for each element of x indicating whether it contains any element in subs, i.e.
str_contains(c("ben", "tom", "harry"), c("e", "y"))
[1]  TRUE FALSE  TRUE
str_contains(x, subs)
x |
character vector |
subs |
a character vector of substrings to look for |
These functions are used to implement various reference based imputation strategies by combining a subject's own distribution with that of a reference distribution, based upon which of their visits failed to meet the Missing-at-Random (MAR) assumption.
strategy_MAR(pars_group, pars_ref, index_mar)
strategy_JR(pars_group, pars_ref, index_mar)
strategy_CR(pars_group, pars_ref, index_mar)
strategy_CIR(pars_group, pars_ref, index_mar)
strategy_LMCF(pars_group, pars_ref, index_mar)
pars_group |
A list of parameters for the subject's group. See details. |
pars_ref |
A list of parameters for the subject's reference group. See details. |
index_mar |
A logical vector indicating which visits meet the MAR assumption for the subject. I.e. this identifies the observations after a non-MAR intercurrent event (ICE). |
pars_group and pars_ref must both be a list containing elements mu and sigma. mu must be a numeric vector and sigma must be a symmetric square covariance matrix with dimensions equal to the length of mu and index_mar, e.g.
list(
    mu = c(1, 2, 3),
    sigma = matrix(c(4, 3, 2, 3, 5, 4, 2, 4, 6), nrow = 3, ncol = 3)
)
Users can define their own strategy functions and include them via the strategies argument to impute() using getStrategies(). That being said, the following strategies are available "out of the box":
Missing at Random (MAR)
Jump to Reference (JR)
Copy Reference (CR)
Copy Increments in Reference (CIR)
Last Mean Carried Forward (LMCF)
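A minimal sketch of a user-defined strategy: this one simply keeps the subject's own distribution, which is equivalent to MAR (the return structure mirrors the pars_group input described above):
my_strategy <- function(pars_group, pars_ref, index_mar) {
    # a real strategy would combine pars_group and pars_ref based on index_mar
    pars_group
}
# It could then be supplied via, e.g.:
# impute(draws_obj, references, strategies = getStrategies(MYSTRAT = my_strategy))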
Utility function used to replicate str_pad. Adds white space to either end of a string to get it to equal the desired length.
string_pad(x, width)
x |
string |
width |
desired length |
Takes an imputation_df object and transposes it, e.g.
list(
    list(id = "a", values = c(1, 2, 3)),
    list(id = "b", values = c(4, 5, 6))
)
transpose_imputations(imputations)
imputations |
An |
becomes
list(
    ids = c("a", "b"),
    values = c(1, 2, 3, 4, 5, 6)
)
Transposes a Results object (as created by analyse()) in order to group the same estimates together into vectors.
transpose_results(results, components)
results |
A list of results. |
components |
a character vector of components to extract (i.e. "est" & "se"). |
Essentially this function takes an object of the format:
x <- list(
    list(
        "trt1" = list(est = 1, se = 2),
        "trt2" = list(est = 3, se = 4)
    ),
    list(
        "trt1" = list(est = 5, se = 6),
        "trt2" = list(est = 7, se = 8)
    )
)
and produces:
list(
    trt1 = list(est = c(1, 5), se = c(2, 6)),
    trt2 = list(est = c(3, 7), se = c(4, 8))
)
Transposes samples generated by draws() so that they are grouped by subjid instead of by sample number.
transpose_samples(samples)
samples |
A list of samples generated by draws(). |
This function is used to perform assertions that an object conforms to its expected structure and no basic assumptions have been violated. Will throw an error if checks do not pass.
validate(x, ...)
x |
object to be validated. |
... |
additional arguments to pass to the specific validation method. |
Validates analysis results generated by analyse().
validate_analyse_pars(results, pars)
results |
A list of results generated by the analysis |
pars |
A list of expected parameters in each of the analysis lists. |
Validate a longdata object
validate_datalong(data, vars)
validate_datalong_varExists(data, vars)
validate_datalong_types(data, vars)
validate_datalong_notMissing(data, vars)
validate_datalong_complete(data, vars)
validate_datalong_unifromStrata(data, vars)
validate_dataice(data, data_ice, vars, update = FALSE)
data |
a data.frame containing the longitudinal data. |
vars |
a vars object as created by set_vars(). |
data_ice |
a data.frame containing the intercurrent event (ICE) data. |
update |
logical, indicates if the ICE data is being set for the first time or if an update is being applied |
These functions are used to validate various different parts of the longdata object to be used in draws(), impute(), analyse() and pool(). In particular:
validate_datalong_varExists - Checks that each variable listed in vars actually exists in the data.
validate_datalong_types - Checks that the types of each key variable are as expected, i.e. that visit is a factor variable.
validate_datalong_notMissing - Checks that none of the key variables (except the outcome variable) contain any missing values.
validate_datalong_complete - Checks that data is complete, i.e. there is 1 row for each subject * visit combination; e.g. that nrow(data) == length(unique(subjects)) * length(unique(visits)).
validate_datalong_unifromStrata - Checks to make sure that any variables listed as stratification variables do not vary over time, e.g. that subjects don't switch between stratification groups.
Compares the user-provided strategies to those that are required (the reference). Will throw an error if not all values of the reference have been defined.
validate_strategies(strategies, reference)
strategies |
named list of strategies. |
reference |
list or character vector of strategies that need to be defined. |
Will throw an error if there is an issue, otherwise will return TRUE.
Validate analysis objects
Validates the return object of the analyse() function.
## S3 method for class 'analysis'
validate(x, ...)
x |
An analysis object generated by analyse(). |
... |
Not used. |
Validate draws object
Validates a draws object.
## S3 method for class 'draws'
validate(x, ...)
x |
A draws object generated by draws(). |
... |
Not used. |
Validate is_mar for a given subject
Checks that the longitudinal data for a patient are divided into MAR followed by non-MAR data; a non-MAR observation followed by a MAR observation is not allowed.
## S3 method for class 'is_mar'
validate(x, ...)
x |
Object of class is_mar. |
... |
Not used. |
Will error if there is an issue, otherwise will return TRUE.
Validate vars
Checks that the required variable names are defined within vars and are of appropriate datatypes.
## S3 method for class 'ivars'
validate(x, ...)
x |
named list indicating the names of key variables in the source dataset |
... |
not used |
Checks to ensure that the user-specified references are expected values (i.e. those found within the source data).
## S3 method for class 'references'
validate(x, control, ...)
x |
named character vector. |
control |
factor variable (should be the group variable from the source dataset). |
... |
Not used. |
Will error if there is an issue, otherwise will return TRUE.
Validate sample_list object
Validates a sample_list object.
## S3 method for class 'sample_list'
validate(x, ...)
x |
A sample_list object. |
... |
Not used. |
Validate sample_single object
Validates a sample_single object.
## S3 method for class 'sample_single'
validate(x, ...)
x |
A sample_single object. |
... |
Not used. |
Validate a simul_pars object
Validates a simul_pars object.
## S3 method for class 'simul_pars'
validate(x, ...)
x |
A simul_pars object as generated by set_simul_pars(). |
... |
Not used. |
Validate a stan_data object
Validates a stan_data object.
## S3 method for class 'stan_data'
validate(x, ...)
x |
A stan_data object. |
... |
Not used. |