This vignette shows the general purpose and syntax of the
tern.rbmi
R package. The tern.rbmi
provides an
interface for Reference Based Multiple Imputation (rbmi
)
within the tern framework. For details of the rbmi
package,
please see Reference Based
Multiple Imputation (rbmi). The basic usage of rbmi
core functions is described in the quickstart
vignette:
tern.rbmi
The rbmi
package consists of 4 core functions (plus
several helper functions) which are typically called in sequence:
draws()
- fits the imputation models and stores their
parametersimpute()
- creates multiple imputed datasetsanalyse()
- analyses each of the multiple imputed
datasetspool()
- combines the analysis results across imputed
datasets into a single statisticWe use a publicly available example dataset from an antidepressant
clinical trial of an active drug versus placebo. The relevant endpoint
is the Hamilton 17-item depression rating scale (HAMD17
)
which was assessed at baseline and at weeks 1, 2, 4, and 6. Study drug
discontinuation occurred in 24% of subjects from the active drug and 26%
of subjects from placebo. All data after study drug discontinuation are
missing and there is a single additional intermittent missing
observation.
library(tern.rbmi)
#> Loading required package: rbmi
#> Loading required package: tern
#> Loading required package: rtables
#> Loading required package: formatters
#>
#> Attaching package: 'formatters'
#> The following object is masked from 'package:base':
#>
#> %||%
#> Loading required package: magrittr
#>
#> Attaching package: 'rtables'
#> The following object is masked from 'package:utils':
#>
#> str
#> Registered S3 method overwritten by 'tern':
#> method from
#> tidy.glm broom
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data <- antidepressant_data
levels(data$THERAPY) <- c("PLACEBO", "DRUG") # This is important! The order defines the computation order later
missing_var <- "CHANGE"
vars <- list(
id = "PATIENT",
visit = "VISIT",
expand_vars = c("BASVAL", "THERAPY"),
group = "THERAPY"
)
covariates <- list(
draws = c("BASVAL*VISIT", "THERAPY*VISIT"),
analyse = c("BASVAL")
)
data <- data %>%
dplyr::select(PATIENT, THERAPY, VISIT, BASVAL, THERAPY, CHANGE) %>%
dplyr::mutate(dplyr::across(.cols = vars$id, ~ as.factor(.x))) %>%
dplyr::arrange(dplyr::across(.cols = c(vars$id, vars$visit)))
# Use expand_locf to add rows corresponding to visits with missing outcomes to the dataset
data_full <- do.call(
expand_locf,
args = list(
data = data,
vars = c(vars$expand_vars, vars$group),
group = vars$id,
order = c(vars$id, vars$visit)
) %>%
append(lapply(data[c(vars$id, vars$visit)], levels))
)
data_full <- data_full %>%
dplyr::group_by(dplyr::across(vars$id)) %>%
dplyr::mutate(!!vars$group := Filter(Negate(is.na), .data[[vars$group]])[1])
# there are duplicates - use first value
data_full <- data_full %>%
dplyr::group_by(dplyr::across(c(vars$id, vars$group, vars$visit))) %>%
dplyr::slice(1) %>%
dplyr::ungroup()
# need to have a single ID column
data_full <- data_full %>%
tidyr::unite("TMP_ID", dplyr::all_of(vars$id), sep = "_#_", remove = FALSE) %>%
dplyr::mutate(TMP_ID = as.factor(TMP_ID))
Set the imputation strategy to MAR for each patient with at least one missing observation.
The rbmi::draws()
function fits the imputation models
and stores the corresponding parameter estimates or Bayesian posterior
parameter draws. The three main inputs to the rbmi::draws()
function are:
Define the names of key variables in our dataset and the covariates
included in the imputation model using rbmi::set_vars()
.
Note that the covariates argument can also include interaction
terms.
debug_mode <- FALSE
draws_vars <- rbmi::set_vars(
outcome = missing_var,
visit = vars$visit,
group = vars$group,
covariates = covariates$draws
)
draws_vars$subjid <- "TMP_ID"
Define which imputation method to use, then create samples for the
imputation parameters by running the draws()
function.
draws_method <- method_bayes()
draws_obj <- rbmi::draws(
data = data_full,
data_ice = data_ice,
vars = draws_vars,
method = draws_method
)
#> Compiling Stan model please wait...
#>
#> SAMPLING FOR MODEL 'rbmi_mmrm' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.000272 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.72 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 1200 [ 0%] (Warmup)
#> Chain 1: Iteration: 120 / 1200 [ 10%] (Warmup)
#> Chain 1: Iteration: 201 / 1200 [ 16%] (Sampling)
#> Chain 1: Iteration: 320 / 1200 [ 26%] (Sampling)
#> Chain 1: Iteration: 440 / 1200 [ 36%] (Sampling)
#> Chain 1: Iteration: 560 / 1200 [ 46%] (Sampling)
#> Chain 1: Iteration: 680 / 1200 [ 56%] (Sampling)
#> Chain 1: Iteration: 800 / 1200 [ 66%] (Sampling)
#> Chain 1: Iteration: 920 / 1200 [ 76%] (Sampling)
#> Chain 1: Iteration: 1040 / 1200 [ 86%] (Sampling)
#> Chain 1: Iteration: 1160 / 1200 [ 96%] (Sampling)
#> Chain 1: Iteration: 1200 / 1200 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.509 seconds (Warm-up)
#> Chain 1: 1.803 seconds (Sampling)
#> Chain 1: 2.312 seconds (Total)
#> Chain 1:
#> Warning in fit_mcmc(designmat = model_df_scaled[, -1, drop = FALSE], outcome = model_df_scaled[, : The largest R-hat is 1.51, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
The next step is to use the parameters from the imputation model to
generate the imputed datasets. This is done via the
rbmi::impute()
function. The function only has two key
inputs: the imputation model output from rbmi::draws()
and
the reference groups relevant to reference-based imputation methods. Its
usage is thus:
The next step is to run the analysis model on each imputed dataset.
This is done by defining an analysis function and then calling
rbmi::analyse()
to apply this function to each imputed
dataset.
# Define analysis model
analyse_fun <- ancova
ref_levels <- levels(impute_obj$data$group[[1]])
names(ref_levels) <- c("ref", "alt")
analyse_obj <- rbmi::analyse(
imputations = impute_obj,
fun = analyse_fun,
vars = rbmi::set_vars(
subjid = "TMP_ID",
outcome = missing_var,
visit = vars$visit,
group = vars$group,
covariates = covariates$analyse
)
)
The rbmi::pool()
function can be used to summarize the
analysis results across multiple imputed datasets to provide an overall
statistic with a standard error, confidence intervals and a p-value for
the hypothesis test of the null hypothesis that the effect is equal to
0.
Finally create output with rtables
and tern
packages
library(broom)
df <- tidy(pool_obj)
df
#> group est se_est lower_cl_est upper_cl_est est_contr se_contr
#> 1 ref -1.615820 0.4862316 -2.575771 -0.6558685 NA NA
#> 2 alt -1.707626 0.4749573 -2.645319 -0.7699335 -0.09180645 0.6826279
#> 3 ref -4.184840 0.6556643 -5.479872 -2.8898079 NA NA
#> 4 alt -2.802217 0.6379757 -4.062190 -1.5422444 1.38262257 0.9228398
#> 5 ref -6.314670 0.7149585 -7.728227 -4.9011127 NA NA
#> 6 alt -4.182772 0.6969350 -5.560583 -2.8049602 2.13189810 1.0224376
#> 7 ref -7.540040 0.8166518 -9.158407 -5.9216735 NA NA
#> 8 alt -4.805389 0.7652133 -6.318709 -3.2920684 2.73465169 1.1223964
#> lower_cl_contr upper_cl_contr p_value relative_reduc visit conf_level
#> 1 NA NA NA NA 4 0.95
#> 2 -1.4394968 1.255884 0.89317724 0.05681725 4 0.95
#> 3 NA NA NA NA 5 0.95
#> 4 -0.4402417 3.205487 0.13609395 -0.33038842 5 0.95
#> 5 NA NA NA NA 6 0.95
#> 6 0.1088194 4.154977 0.03904710 -0.33761039 6 0.95
#> 7 NA NA NA NA 7 0.95
#> 8 0.5128870 4.956416 0.01626812 -0.36268396 7 0.95
Final product, reshape rbmi
final results to nicely
formatted rtable
object.
basic_table() %>%
split_cols_by("group", ref_group = levels(df$group)[1]) %>%
split_rows_by("visit", split_label = "Visit", label_pos = "topleft") %>%
summarize_rbmi() %>%
build_table(df)
#> Visit ref alt
#> —————————————————————————————————————————————————————————————————————————
#> 4
#> Adjusted Mean (SE) -1.616 (0.486) -1.708 (0.475)
#> 95% CI (-2.576, -0.656) (-2.645, -0.770)
#> Difference in Adjusted Means (SE) -0.092 (0.683)
#> 95% CI (-1.439, 1.256)
#> Relative Reduction (%) 5.7%
#> p-value (RBMI) 0.8932
#> 5
#> Adjusted Mean (SE) -4.185 (0.656) -2.802 (0.638)
#> 95% CI (-5.480, -2.890) (-4.062, -1.542)
#> Difference in Adjusted Means (SE) 1.383 (0.923)
#> 95% CI (-0.440, 3.205)
#> Relative Reduction (%) -33.0%
#> p-value (RBMI) 0.1361
#> 6
#> Adjusted Mean (SE) -6.315 (0.715) -4.183 (0.697)
#> 95% CI (-7.728, -4.901) (-5.561, -2.805)
#> Difference in Adjusted Means (SE) 2.132 (1.022)
#> 95% CI (0.109, 4.155)
#> Relative Reduction (%) -33.8%
#> p-value (RBMI) 0.0390
#> 7
#> Adjusted Mean (SE) -7.540 (0.817) -4.805 (0.765)
#> 95% CI (-9.158, -5.922) (-6.319, -3.292)
#> Difference in Adjusted Means (SE) 2.735 (1.122)
#> 95% CI (0.513, 4.956)
#> Relative Reduction (%) -36.3%
#> p-value (RBMI) 0.0163