---
title: "PSA and correlated endpoints (bootstrap approach)"
author: "Roche"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: references.bib
csl: biomedicine.csl
vignette: >
%\VignetteIndexEntry{PSA and correlated endpoints (bootstrap approach)}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 9,
fig.height = 5
)
```
# Introduction
Parametric survival models are often the preferred method of extrapolating survival in economic modeling. When doing such modeling it is important to account for uncertainty in the estimates.
The flexsurvPlus package allows the handling of uncertainty in estimates primarily through boot strapping. The primary use of these bootstrap samples is to be used in probabilistic sensitivity analyses (PSA) in economic models. This is simpler to execute than the the traditional method in Excel using the variance covariance matrix and the cholesky decomposition as we now just need to sample from a list of parameters.
In addition this enables correlations between different endpoints to be preserved in the excel models. For example, if we expect that time to progression correlates positively with survival, then the random draws should also reflect this correlation, rather than just being drawn independently, which could lead to unrealistic iterations with large OS but minimal PFS, and vice versa. This vignette illustrates this with a simple example.
# Set up packages and data
## Install packages
The following packages are required to run this example:
```{r message=FALSE}
library(flexsurvPlus)
library(tibble)
library(dplyr)
library(boot)
library(ggplot2)
```
## Read in the data
To perform survival analyses, patient level data is required for the survival endpoints.
This example uses a standard simulated data set
(adtte). There is no standard naming that is
needed for this package however, there are some set variables that are needed:
* Time - a numeric variable
* Event - a binary variable (event=1, censor=0)
In this example, we use overall survival (OS) and progression-free survival (PFS).
For the purposes of this example we will look at only a single arm trial.
```{r}
# simulate data with a medium correlation between PFS & OS on the patient level
adtte <- sim_adtte(seed = 2020, # for reproducibility
rho = 0.6 # defines a correlation on underlying survival times
)
# subset OS data for reference arm and rename
OS_data <- adtte %>%
filter(PARAMCD=="OS", ARMCD == "A") %>%
transmute(USUBJID,
OS_days = AVAL,
OS_event = 1- CNSR
)
# subset PFS data for reference arm and rename
PFS_data <- adtte %>%
filter(PARAMCD=="PFS", ARMCD == "A") %>%
transmute(USUBJID,
PFS_days = AVAL,
PFS_event = 1- CNSR
)
analysis_data <- left_join(OS_data, PFS_data, by = c("USUBJID"))
```
The Kaplan-Meier curves are plotted below.
```{r}
km.est.PFS <- survfit(Surv(PFS_days, PFS_event) ~ 1,
data = analysis_data)
km.est.OS <- survfit(Surv(OS_days, OS_event) ~ 1,
data = analysis_data)
par(mfrow=c(1,2))
# Plot Kaplan-Meier
plot(km.est.PFS,
xlab = "Time (Days)", # x-axis label
ylab = "Progression-free survival", # y-axis label
xlim = c(0, 800))
plot(km.est.OS,
xlab = "Time (Days)", # x-axis label
ylab = "Overall survival", # y-axis label
xlim = c(0, 800))
par(mfrow=c(1,1))
```
# Bootstrapping the endpoints
## Fitting the models to the original data
Before bootstrapping we can estimate the models using the original data. In this example we have used OS and PFS. For speed of execution and illustration of concept we only fit a weibull model for PFS and a gamma models for OS here but all models discussed in runPSM could be specified with a single call.
```{r}
psm_PFS <- runPSM(
data = analysis_data, # the dataset
time_var = "PFS_days", # the time variable
event_var = "PFS_event", # the event variable coded as 1 for event
model.type = "One arm",
distr = "weibull", # for speed we only fit a single model but multiple could be specified,
int_name = "A" # needed for one arm even as essential meta data
)
psm_OS <- runPSM(
data = analysis_data, # the dataset
time_var = "OS_days", # the time variable
event_var = "OS_event", # the event variable coded as 1 for event
model.type = "One arm", # again for speed only a single model is specified
distr = "gamma",
int_name = "A" # needed for one arm even as essential meta data
)
```
We can use the bootPSM function with the boot package to generate bootstrap samples.
## Ignoring correlation between endpoints
If we are not concerned about correlations between endpoints we can generate independent samples for OS and PFS as shown below.
```{r message = FALSE}
# fix a seed for the sampling
set.seed(2358)
# define a number of samples to generate. For illustration 100 are taken
# In practice a larger number is preferable.
n.sim <- 100
PSM_bootstraps_PFS <- do.call(boot, args = c(psm_PFS$config, statistic = bootPSM, R = n.sim))
PSM_bootstraps_OS <- do.call(boot, args = c(psm_OS$config, statistic = bootPSM, R = n.sim))
```
We can see these are using different bootstrap samples by looking at the bootstrap indexes. This is equivalent to performing PSA for OS and PFS independently in an Excel based model using two independent covariance matrices.
```{r}
index_PFS <- boot.array(PSM_bootstraps_PFS, indices = TRUE)
index_OS <- boot.array(PSM_bootstraps_OS, indices = TRUE)
all(index_OS == index_PFS)
```
## Considering correlation between endpoints
If we want to maintain correlations between PFS and OS we need to use the same bootstrap samples for both the PFS and OS analysis. The easiest way to do this is to reuse the same seed before calling the boot function for the second endpoint. This use of a common seed means the same underlying random sampling process will be performed leading to a matching bootstrap sample.
```{r message = FALSE}
# for illustration and speed only 100 samples used
n.sim <- 100
# define a seed that will affect the random numbers used by boot for sampling
set.seed(2020)
PSM_bootstraps_PFScor <- do.call(boot, args = c(psm_PFS$config, statistic = bootPSM, R = n.sim))
# reuse the same seed that will affect the random numbers used by boot for sampling
set.seed(2020)
PSM_bootstraps_OScor <- do.call(boot, args = c(psm_OS$config, statistic = bootPSM, R = n.sim))
```
We can check these are indeed using the same bootstrap samples by comparing the indexes selected. This means the associated bootstrap distributions for PFS and OS should remain correlated.
```{r}
# this returns the indexes selected in each sample
index_PFScor <- boot.array(PSM_bootstraps_PFScor, indices = TRUE)
index_OScor <- boot.array(PSM_bootstraps_OScor, indices = TRUE)
# as desired all match between the two sampled sets
all(index_OScor == index_PFScor)
```
## Plotting impact of accounting for correlations between endpoints
To illustrate the impact of accounting for correlations between each endpoint in such PSA we will look at plotting the extrapolated mean PFS and mean OS for the reference arm. To do this we need to calculate the mean PFS and mean OS for the reference for each sample.
```{r}
# first we need to preprocess the data a little bit
# first extract the bootstrapped parameters into tibbles
bootsamples_PFS <- as_tibble(PSM_bootstraps_PFS$t)
bootsamples_OS <- as_tibble(PSM_bootstraps_OS$t)
bootsamples_PFScor <- as_tibble(PSM_bootstraps_PFScor$t)
bootsamples_OScor <- as_tibble(PSM_bootstraps_OScor$t)
# then add column names so can identify model and parameter more easily
colnames(bootsamples_PFS) <- names(PSM_bootstraps_PFS$t0)
colnames(bootsamples_OS) <- names(PSM_bootstraps_OS$t0)
colnames(bootsamples_PFScor) <- names(PSM_bootstraps_PFScor$t0)
colnames(bootsamples_OScor) <- names(PSM_bootstraps_OScor$t0)
# we can then calculate extrapolated means for each sample
mean_PFS <- with(bootsamples_PFS, flexsurv::mean_weibull(scale = onearm.weibull.scale.int, shape = onearm.weibull.shape.int))
mean_OS <- with(bootsamples_OS, flexsurv::mean_gamma(shape = onearm.gamma.shape.int, rate = onearm.gamma.rate.int))
mean_PFScor <- with(bootsamples_PFScor, flexsurv::mean_weibull(scale = onearm.weibull.scale.int, shape = onearm.weibull.shape.int))
mean_OScor <- with(bootsamples_OScor, flexsurv::mean_gamma(shape = onearm.gamma.shape.int, rate = onearm.gamma.rate.int))
# we can now combine in a tibble for plotting
means_both <- tibble(PFS = mean_PFS, OS = mean_OS, sample_id = 1:n.sim, Bootstrap = "uncorrelated")
means_bothcor <- tibble(PFS = mean_PFScor, OS = mean_OScor, sample_id = 1:n.sim, Bootstrap = "correlated")
plot_data <- rbind(means_both, means_bothcor)
head(plot_data)
```
We can now plot this dataset and as expected the correlated bootstrap approach leads to mean estimates for PFS and OS that preserve the correlations in the underlying data. Here we use all bootstrap samples but in the context of an excel economic model we would perform PSA by randomly sampling sample_ID and using the same sample_ID for both PFS and OS in a single PSA run.
```{r}
plot_data %>%
ggplot(aes(x = PFS, y = OS, color = Bootstrap)) +
theme_bw() +
geom_point() +
stat_ellipse() +
coord_cartesian(xlim = c(150, 250), ylim = c(750, 1750)) +
xlab("Mean (weibull) PFS days") +
ylab("Mean (gamma) OS days") +
ggtitle("Each point is a bootstrap sample")
```