---
title: "STEM compatibility"
author: "Roche"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
csl: biomedicine.csl
vignette: >
%\VignetteIndexEntry{STEM compatibility}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 9,
fig.height = 5
)
```
# Introduction
This vignette describes the use of the convSTEM function which enables backwards compatibility with STEM economic models which use an alternative formulation of parametric survival models. For most users this vignette can be safely ignored.
The following packages are required to run this example:
```{r message=FALSE}
library(flexsurvPlus)
library(tibble)
library(dplyr)
library(boot)
```
## Read in the data
To perform survival analyses, patient level data is required for the survival endpoints.
This example uses a standard simulated data set
(adtte). There is no standard naming that is
needed for this package however, there are some set variables that are needed:
* Time - a numeric variable
* Event - a binary variable (event=1, censor=0)
In this example, we use only progression-free survival (PFS).
```{r}
# simulate data with a medium correlation between PFS & OS on the patient level
adtte <- sim_adtte(seed = 2020, # for reproducibility
rho = 0.6 # defines a correlation on underlying survival times
)
# subset PFS data and rename
PFS_data <- adtte %>%
filter(PARAMCD=="PFS") %>%
transmute(USUBJID, ARMCD,
AVAL,
event = 1 - CNSR #needs to be coded as 1 for event
)
```
# Fitting models
This section uses runPSM as described in other vignettes to estimate survival models. As described in other vignettes flexsurvPlus primarily uses bootstrapping to handle uncertainty with the bootPSM function in conjunction with boot.
```{r message = FALSE}
# For speed only 4 models are considered for this example but
# all models could be considered
psm_PFS_expweib <- runPSM(data = PFS_data,
time_var = "AVAL",
event_var ="event",
model.type = c("Common shape",
"Separate"),
distr = c('exp',
'weibull'),
strata_var = "ARMCD",
int_name = "A",
ref_name = "B")
# Apply bootstrapping for illustration 100 samples are taken
# In practice a larger number maybe preferable.
psm_boot_PFS_expweib <- do.call(boot, args = c(psm_PFS_expweib$config, statistic = bootPSM, R = 100))
```
# Converting to a STEM format
This is done using convSTEM. Depending on if a runPSM object or a bootPSM object or both are provided different quantities will be estimated.
```{r message = FALSE}
# only supplying x
PFS_conv_x <- convSTEM(x = psm_PFS_expweib)
# only supplying samples
PFS_conv_samples <- convSTEM(samples = psm_boot_PFS_expweib)
# supplying both x and samples from the same set of models
PFS_conv_both <- convSTEM(x = psm_PFS_expweib, samples = psm_boot_PFS_expweib)
```
# The output of convSTEM
convSTEM returns a list of four data frames containing parameters converted to match those reported by an existing SAS macro.
* stem_param The parameter estimates
* stem_cov Estimates for a covariance matrix for the parameters derived from the bootstrap samples.
* stem_modsum A summary of the model fit statistics.
* stem_boot A data frame of converted boot strap estimates.
## stem_param
stem_param contains the parameter estimates and is calculated if either x or samples is provided.
```{r message = FALSE}
names(PFS_conv_x$stem_param)
PFS_conv_x$stem_param %>%
transmute(Model, Dist, Param, Estimate)
PFS_conv_samples$stem_param %>%
transmute(Model, Dist, Param, Estimate)
```
## stem_cov
stem_cov contains estimates for a covariance matrix for the parameters and is only derived if samples is provided.
```{r message = FALSE}
names(PFS_conv_x$stem_cov)
# note is empty here as samples were not provided
nrow(PFS_conv_x$stem_cov)
PFS_conv_samples$stem_cov %>%
transmute(Model, Dist, rowParam, colParam, rowNum, colNum, CovEst)
```
The matrix is described by row and column numbers so can be back converted to a matrix if desired.
```{r message = FALSE}
# select a single model from the data frame
weibull_common_data <- PFS_conv_samples$stem_cov %>%
filter(Model == "Common shape", Dist == "Weibull") %>%
transmute(rowParam, colParam, rowNum, colNum, CovEst)
weibull_common_data
# can see is ordered by row so can convert to a matrix
weibull_common_matrix <- matrix(data = weibull_common_data$CovEst,
nrow = 3, ncol = 3, byrow = TRUE)
# applying names
rownames(weibull_common_matrix) <- colnames(weibull_common_matrix) <- weibull_common_data$colParam[1:3]
weibull_common_matrix
```
## stem_modsum
stem_modsum contains a summary of the model fit statistics and is only produced if x is provided.
```{r message = FALSE}
names(PFS_conv_x$stem_modsum)
# note is empty here as x was not provided
nrow(PFS_conv_samples$stem_modsum)
PFS_conv_x$stem_modsum %>%
transmute(Model, Dist, Status, AIC, AIC_SAS, BIC, BIC_SAS)
```
## stem_boot
stem_boot contains a data frame of converted boot strap estimates and is only derived if samples is provided.
```{r message = FALSE}
names(PFS_conv_samples$stem_boot)
# note is empty here as samples were not provided
nrow(PFS_conv_x$stem_boot)
PFS_conv_samples$stem_boot %>%
transmute(Model, Dist, BootSample, Param, Estimate) %>%
head()
```
# Details on conversions
To illustrate the conversions will fit a full set of models to the data.
```{r message = FALSE}
psm_PFS_all <- runPSM(data=PFS_data,
time_var="AVAL",
event_var="event",
model.type= c("Common shape",
"Independent shape",
"Separate"),
distr = c('exp',
'weibull',
'gompertz',
'lnorm',
'llogis',
'gengamma',
'gamma',
'genf'),
strata_var = "ARMCD",
int_name="A",
ref_name = "B")
# only supplying x
PFS_conv_all <- convSTEM(x = psm_PFS_all)
# To see what the converted parameters are can look at stem_param
PFS_conv_all$stem_param %>%
group_by(Model, Dist) %>%
summarise(Params = paste(Param, collapse = ", ")) %>%
knitr::kable()
```
## Conversion of parameters
Note in general independent shape models were not estimated by STEM macro so are converted for backwards compatibility as if if they were estimated by separate models. This has implications for how AIC and BIC should be interpreted in these cases.
### Conversion of Exponential models
flexsurvPlus uses the parameter rate ($b$). For common shape, independent shape and separate models these can be denoted as $b_R$ for reference and $b_I$ for intervention.
The SAS STEM macro reports INTERCEPT and TX(INTERVENTION) which are converted as shown below.
#### Seperate models
* Reference arm
* INTERCEPT = $-\log(b_R)$
* Intervention arm
* INTERCEPT = $-\log(b_I)$
#### Common shape models
* INTERCEPT = $-\log(b_R)$
* TX(INTERVENTION) = $-(\log(b_I) - \log(b_R))$
#### Excel formula
To derive survival in Excel the following formula are used:
* `lambda = EXP(-INTERCEPT)` and/or
* `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)))`
* `Survival = EXP(-lambda*time)`
### Conversion of Weibull models
flexsurvPlus uses the parameters scale ($b$) and shape ($a$). For common shape, independent shape and separate models these can be denoted as $b_R, a_R$ for reference and $b_I, a_I$ for intervention. For common shape models $a_R=a_I$.
The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below.
#### Seperate models
* Reference arm
* INTERCEPT = $\log(b_R)$
* SCALE = $\frac{1}{a_R}$
* Intervention arm
* INTERCEPT = $\log(b_I)$
* SCALE = $\frac{1}{a_I}$
#### Common shape models
* INTERCEPT = $\log(b_R)$
* TX(INTERVENTION) = $\log(b_I) - \log(b_R)$
* SCALE = $\frac{1}{a_R}$
#### Excel formula
To derive survival in Excel the following formula are used:
* `lambda = EXP(-INTERCEPT/SCALE)` and/or
* `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)/SCALE))`
* `gamma = 1/SCALE`
* `Survival = EXP(-lambda*time^gamma)`
### Conversion of Gompertz models
flexsurvPlus uses the parameters rate ($b$) and shape ($a$). For common shape, independent shape and separate models these can be denoted as $b_R, a_R$ for reference and $b_I, a_I$ for intervention. For common shape models $a_R=a_I$.
The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below.
#### Seperate models
* Reference arm
* INTERCEPT = $-\log(b_R)$
* SCALE = $a_R$
* Intervention arm
* INTERCEPT = $-\log(b_I)$
* SCALE = $a_I$
#### Common shape models
* INTERCEPT = $-\log(b_R)$
* TX(INTERVENTION) = $-(\log(b_I) - \log(b_R))$
* SCALE = $a_R$
#### Excel formula
To derive survival in Excel the following formula are used:
* `lambda = EXP(-INTERCEPT)` and/or
* `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)))`
* `gamma = SCALE`
* `Survival = EXP((lambda/gamma)*(1-EXP(gamma*time)))`
### Conversion of Log-logistic models
flexsurvPlus uses the parameters scale ($b$) and shape ($a$). For common shape, independent shape and separate models these can be denoted as $b_R, a_R$ for reference and $b_I, a_I$ for intervention. For common shape models $a_R=a_I$.
The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below.
#### Seperate models
* Reference arm
* INTERCEPT = $\log(b_R)$
* SCALE = $\frac{1}{a_R}$
* Intervention arm
* INTERCEPT = $\log(b_I)$
* SCALE = $\frac{1}{a_I}$
#### Common shape models
* INTERCEPT = $-\log(b_R)$
* TX(INTERVENTION) = $\log(b_I) - \log(B_R)$
* SCALE = $\frac{1}{a_R}$
#### Excel formula
To derive survival in Excel the following formula are used:
* `lambda = EXP(-INTERCEPT/SCALE)` and/or
* `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)/SCALE))`
* `gamma = 1/SCALE`
* `Survival = 1 / (1 + lambda*time^gamma)`
### Conversion of Log-normal models
flexsurvPlus uses the parameters meanlog ($\mu$) and sdlog ($\sigma$). For common shape, independent shape and separate models these can be denoted as $\mu_R, \sigma_R$ for reference and $\mu_I, \sigma_I$ for intervention. For common shape models $\sigma_R=\sigma_I$.
The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below.
#### Seperate models
* Reference arm
* INTERCEPT = $\mu_R$
* SCALE = $\sigma_R$
* Intervention arm
* INTERCEPT = $\mu_I$
* SCALE = $\sigma_I$
#### Common shape models
* INTERCEPT = $\mu_R$
* TX(INTERVENTION) = $\mu_I - \mu_R$
* SCALE = $\sigma_R$
#### Excel formula
To derive survival in Excel the following formula are used:
* `lambda = INTERCEPT` and/or
* `lambda = INTERCEPT+TX(INTERVENTION)`
* `gamma = SCALE`
* `Survival = 1-NORMDIST(((LN(time)-lambda)/gamma),0,1,TRUE)`
### Conversion of Generalized Gamma models
flexsurvPlus uses the parameters mu ($\mu$), sigma ($\sigma$) and $Q$. For common shape, independent shape and separate models these can be denoted as $\mu_R, \sigma_R, Q_R$ for reference and $\mu_I, \sigma_I, Q_I$ for intervention. For common shape models $\sigma_R=\sigma_I$ and $Q_R=$Q_I$.
The SAS STEM macro reports INTERCEPT, TX(INTERVENTION), SCALE and SHAPE which are converted as shown below.
#### Seperate models
* Reference arm
* INTERCEPT = $\mu_R$
* SCALE = $\sigma_R$
* SHAPE = $Q_R$
* Intervention arm
* INTERCEPT = $\mu_I$
* SCALE = $\sigma_I$
* SHAPE = $Q_I$
#### Common shape models
* INTERCEPT = $\mu_R$
* TX(INTERVENTION) = $\mu_I - \mu_R$
* SCALE = $\sigma_R$
* SHAPE = $Q_R$
#### Excel formula
To derive survival in Excel the following formula are used:
* `lambda = (EXP(-INTERCEPT)^(SHAPE/SCALE))/(SHAPE^2)` and/or
* `lambda = (EXP(-(INTERCEPT+TX(INTERVENTION))^(SHAPE/SCALE))/(SHAPE^2)`
* `gamma = 1/(SHAPE^2)`
* `delta = SHAPE/SCALE`
* `Survival = IF(gamma>0, 1-GAMMADIST(lambda*time^(delta),gamma,1,TRUE), 1-( 1-GAMMADIST(lambda*time^(delta),gamma,1,TRUE)))`
### Conversion of Gamma models
flexsurvPlus uses the parameters Shape ($a$) and Rate ($b$). For common shape, independent shape and separate models these can be denoted as $a_R, b_R$ for reference and $a_I, b_I$ for intervention. For common shape models $a_R=a_I$.
The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below. This is equivalent to the parameters used for the Generalized Gamma when the constraint that $\sigma = Q$ or identically that SCALE = SHAPE is applied.
#### Seperate models
* Reference arm
* INTERCEPT = $-\log(\frac{b_R}{a_R})$
* SCALE = $\frac{1}{\sqrt{a_R}}$
* Intervention arm
* INTERCEPT = $-\log(\frac{b_I}{a_I})$
* SCALE = $\frac{1}{\sqrt{a_I}}$
#### Common shape models
* INTERCEPT = $-\log(\frac{b_R}{a_R})$
* TX(INTERVENTION) = $-(\log(b_I) - \log(b_R))$
* SCALE = $\frac{1}{\sqrt{a_R}}$
#### Excel formula
To derive survival in Excel the following formula are used:
* `lambda = (EXP(-INTERCEPT))/(SCALE^2)` and/or
* `lambda = (EXP(-(INTERCEPT+TX(INTERVENTION)))/(SCALE^2)`
* `gamma = 1/(SCALE^2)`
* `Survival = IF(gamma>0, 1-GAMMADIST(lambda*time,gamma,1,TRUE), 1- (1-GAMMADIST(lambda*time,gamma,1,TRUE)))`
## Conversion of AIC and BIC
As the existing SAS STEM macro estimated models with log(time) as a response variable while flexsurv and flexsurvPlus estimate models with time as the response variable there are differences in the log-likelihood between the two software packages. These differences do not affect the parameter estimates or ranking of fit but do affect the AIC and BIC reported by a constant.
This constant is the sum of the events log(time). As such can convert by adding this constant to the maximum log-likelihood and by extension double the sum of the events log(time) to AIC or BIC.
```{r message = FALSE}
# This constant is estimated from the sum of the log of times for patients who have an event
PFS_data %>%
filter(event == 1) %>% # select events only
transmute(log_event_time = log(AVAL)) %>% # difference in log time
summarise(2 * sum(log_event_time))
# So for common shape models (using all data) the difference is 4274.312
# Can be seen in results from convSTEM here
modsum <- PFS_conv_all$stem_modsum %>%
transmute(Model, Dist, AIC, AIC_SAS, BIC, BIC_SAS)
# note as the separate and common shape models use different data these are not comparable
modsum %>%
mutate(delta_AIC = AIC - AIC_SAS,
delta_BIC = AIC - AIC_SAS)
```