--- title: "STEM compatibility" author: "Roche" date: "`r Sys.Date()`" output: rmarkdown::html_vignette csl: biomedicine.csl vignette: > %\VignetteIndexEntry{STEM compatibility} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 9, fig.height = 5 ) ``` # Introduction This vignette describes the use of the convSTEM function which enables backwards compatibility with STEM economic models which use an alternative formulation of parametric survival models. For most users this vignette can be safely ignored. The following packages are required to run this example: ```{r message=FALSE} library(flexsurvPlus) library(tibble) library(dplyr) library(boot) ``` ## Read in the data To perform survival analyses, patient level data is required for the survival endpoints. This example uses a standard simulated data set (adtte). There is no standard naming that is needed for this package however, there are some set variables that are needed: * Time - a numeric variable * Event - a binary variable (event=1, censor=0) In this example, we use only progression-free survival (PFS). ```{r} # simulate data with a medium correlation between PFS & OS on the patient level adtte <- sim_adtte(seed = 2020, # for reproducibility rho = 0.6 # defines a correlation on underlying survival times ) # subset PFS data and rename PFS_data <- adtte %>% filter(PARAMCD=="PFS") %>% transmute(USUBJID, ARMCD, AVAL, event = 1 - CNSR #needs to be coded as 1 for event ) ``` # Fitting models This section uses runPSM as described in other vignettes to estimate survival models. As described in other vignettes flexsurvPlus primarily uses bootstrapping to handle uncertainty with the bootPSM function in conjunction with boot. ```{r message = FALSE} # For speed only 4 models are considered for this example but # all models could be considered psm_PFS_expweib <- runPSM(data = PFS_data, time_var = "AVAL", event_var ="event", model.type = c("Common shape", "Separate"), distr = c('exp', 'weibull'), strata_var = "ARMCD", int_name = "A", ref_name = "B") # Apply bootstrapping for illustration 100 samples are taken # In practice a larger number maybe preferable. psm_boot_PFS_expweib <- do.call(boot, args = c(psm_PFS_expweib$config, statistic = bootPSM, R = 100)) ``` # Converting to a STEM format This is done using convSTEM. Depending on if a runPSM object or a bootPSM object or both are provided different quantities will be estimated. ```{r message = FALSE} # only supplying x PFS_conv_x <- convSTEM(x = psm_PFS_expweib) # only supplying samples PFS_conv_samples <- convSTEM(samples = psm_boot_PFS_expweib) # supplying both x and samples from the same set of models PFS_conv_both <- convSTEM(x = psm_PFS_expweib, samples = psm_boot_PFS_expweib) ``` # The output of convSTEM convSTEM returns a list of four data frames containing parameters converted to match those reported by an existing SAS macro. * stem_param The parameter estimates * stem_cov Estimates for a covariance matrix for the parameters derived from the bootstrap samples. * stem_modsum A summary of the model fit statistics. * stem_boot A data frame of converted boot strap estimates. ## stem_param stem_param contains the parameter estimates and is calculated if either x or samples is provided. ```{r message = FALSE} names(PFS_conv_x$stem_param) PFS_conv_x$stem_param %>% transmute(Model, Dist, Param, Estimate) PFS_conv_samples$stem_param %>% transmute(Model, Dist, Param, Estimate) ``` ## stem_cov stem_cov contains estimates for a covariance matrix for the parameters and is only derived if samples is provided. ```{r message = FALSE} names(PFS_conv_x$stem_cov) # note is empty here as samples were not provided nrow(PFS_conv_x$stem_cov) PFS_conv_samples$stem_cov %>% transmute(Model, Dist, rowParam, colParam, rowNum, colNum, CovEst) ``` The matrix is described by row and column numbers so can be back converted to a matrix if desired. ```{r message = FALSE} # select a single model from the data frame weibull_common_data <- PFS_conv_samples$stem_cov %>% filter(Model == "Common shape", Dist == "Weibull") %>% transmute(rowParam, colParam, rowNum, colNum, CovEst) weibull_common_data # can see is ordered by row so can convert to a matrix weibull_common_matrix <- matrix(data = weibull_common_data$CovEst, nrow = 3, ncol = 3, byrow = TRUE) # applying names rownames(weibull_common_matrix) <- colnames(weibull_common_matrix) <- weibull_common_data$colParam[1:3] weibull_common_matrix ``` ## stem_modsum stem_modsum contains a summary of the model fit statistics and is only produced if x is provided. ```{r message = FALSE} names(PFS_conv_x$stem_modsum) # note is empty here as x was not provided nrow(PFS_conv_samples$stem_modsum) PFS_conv_x$stem_modsum %>% transmute(Model, Dist, Status, AIC, AIC_SAS, BIC, BIC_SAS) ``` ## stem_boot stem_boot contains a data frame of converted boot strap estimates and is only derived if samples is provided. ```{r message = FALSE} names(PFS_conv_samples$stem_boot) # note is empty here as samples were not provided nrow(PFS_conv_x$stem_boot) PFS_conv_samples$stem_boot %>% transmute(Model, Dist, BootSample, Param, Estimate) %>% head() ``` # Details on conversions To illustrate the conversions will fit a full set of models to the data. ```{r message = FALSE} psm_PFS_all <- runPSM(data=PFS_data, time_var="AVAL", event_var="event", model.type= c("Common shape", "Independent shape", "Separate"), distr = c('exp', 'weibull', 'gompertz', 'lnorm', 'llogis', 'gengamma', 'gamma', 'genf'), strata_var = "ARMCD", int_name="A", ref_name = "B") # only supplying x PFS_conv_all <- convSTEM(x = psm_PFS_all) # To see what the converted parameters are can look at stem_param PFS_conv_all$stem_param %>% group_by(Model, Dist) %>% summarise(Params = paste(Param, collapse = ", ")) %>% knitr::kable() ``` ## Conversion of parameters Note in general independent shape models were not estimated by STEM macro so are converted for backwards compatibility as if if they were estimated by separate models. This has implications for how AIC and BIC should be interpreted in these cases. ### Conversion of Exponential models flexsurvPlus uses the parameter rate ($b$). For common shape, independent shape and separate models these can be denoted as $b_R$ for reference and $b_I$ for intervention. The SAS STEM macro reports INTERCEPT and TX(INTERVENTION) which are converted as shown below. #### Seperate models * Reference arm * INTERCEPT = $-\log(b_R)$ * Intervention arm * INTERCEPT = $-\log(b_I)$ #### Common shape models * INTERCEPT = $-\log(b_R)$ * TX(INTERVENTION) = $-(\log(b_I) - \log(b_R))$ #### Excel formula To derive survival in Excel the following formula are used: * `lambda = EXP(-INTERCEPT)` and/or * `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)))` * `Survival = EXP(-lambda*time)` ### Conversion of Weibull models flexsurvPlus uses the parameters scale ($b$) and shape ($a$). For common shape, independent shape and separate models these can be denoted as $b_R, a_R$ for reference and $b_I, a_I$ for intervention. For common shape models $a_R=a_I$. The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below. #### Seperate models * Reference arm * INTERCEPT = $\log(b_R)$ * SCALE = $\frac{1}{a_R}$ * Intervention arm * INTERCEPT = $\log(b_I)$ * SCALE = $\frac{1}{a_I}$ #### Common shape models * INTERCEPT = $\log(b_R)$ * TX(INTERVENTION) = $\log(b_I) - \log(b_R)$ * SCALE = $\frac{1}{a_R}$ #### Excel formula To derive survival in Excel the following formula are used: * `lambda = EXP(-INTERCEPT/SCALE)` and/or * `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)/SCALE))` * `gamma = 1/SCALE` * `Survival = EXP(-lambda*time^gamma)` ### Conversion of Gompertz models flexsurvPlus uses the parameters rate ($b$) and shape ($a$). For common shape, independent shape and separate models these can be denoted as $b_R, a_R$ for reference and $b_I, a_I$ for intervention. For common shape models $a_R=a_I$. The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below. #### Seperate models * Reference arm * INTERCEPT = $-\log(b_R)$ * SCALE = $a_R$ * Intervention arm * INTERCEPT = $-\log(b_I)$ * SCALE = $a_I$ #### Common shape models * INTERCEPT = $-\log(b_R)$ * TX(INTERVENTION) = $-(\log(b_I) - \log(b_R))$ * SCALE = $a_R$ #### Excel formula To derive survival in Excel the following formula are used: * `lambda = EXP(-INTERCEPT)` and/or * `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)))` * `gamma = SCALE` * `Survival = EXP((lambda/gamma)*(1-EXP(gamma*time)))` ### Conversion of Log-logistic models flexsurvPlus uses the parameters scale ($b$) and shape ($a$). For common shape, independent shape and separate models these can be denoted as $b_R, a_R$ for reference and $b_I, a_I$ for intervention. For common shape models $a_R=a_I$. The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below. #### Seperate models * Reference arm * INTERCEPT = $\log(b_R)$ * SCALE = $\frac{1}{a_R}$ * Intervention arm * INTERCEPT = $\log(b_I)$ * SCALE = $\frac{1}{a_I}$ #### Common shape models * INTERCEPT = $-\log(b_R)$ * TX(INTERVENTION) = $\log(b_I) - \log(B_R)$ * SCALE = $\frac{1}{a_R}$ #### Excel formula To derive survival in Excel the following formula are used: * `lambda = EXP(-INTERCEPT/SCALE)` and/or * `lambda = EXP(-(INTERCEPT+TX(INTERVENTION)/SCALE))` * `gamma = 1/SCALE` * `Survival = 1 / (1 + lambda*time^gamma)` ### Conversion of Log-normal models flexsurvPlus uses the parameters meanlog ($\mu$) and sdlog ($\sigma$). For common shape, independent shape and separate models these can be denoted as $\mu_R, \sigma_R$ for reference and $\mu_I, \sigma_I$ for intervention. For common shape models $\sigma_R=\sigma_I$. The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below. #### Seperate models * Reference arm * INTERCEPT = $\mu_R$ * SCALE = $\sigma_R$ * Intervention arm * INTERCEPT = $\mu_I$ * SCALE = $\sigma_I$ #### Common shape models * INTERCEPT = $\mu_R$ * TX(INTERVENTION) = $\mu_I - \mu_R$ * SCALE = $\sigma_R$ #### Excel formula To derive survival in Excel the following formula are used: * `lambda = INTERCEPT` and/or * `lambda = INTERCEPT+TX(INTERVENTION)` * `gamma = SCALE` * `Survival = 1-NORMDIST(((LN(time)-lambda)/gamma),0,1,TRUE)` ### Conversion of Generalized Gamma models flexsurvPlus uses the parameters mu ($\mu$), sigma ($\sigma$) and $Q$. For common shape, independent shape and separate models these can be denoted as $\mu_R, \sigma_R, Q_R$ for reference and $\mu_I, \sigma_I, Q_I$ for intervention. For common shape models $\sigma_R=\sigma_I$ and $Q_R=$Q_I$. The SAS STEM macro reports INTERCEPT, TX(INTERVENTION), SCALE and SHAPE which are converted as shown below. #### Seperate models * Reference arm * INTERCEPT = $\mu_R$ * SCALE = $\sigma_R$ * SHAPE = $Q_R$ * Intervention arm * INTERCEPT = $\mu_I$ * SCALE = $\sigma_I$ * SHAPE = $Q_I$ #### Common shape models * INTERCEPT = $\mu_R$ * TX(INTERVENTION) = $\mu_I - \mu_R$ * SCALE = $\sigma_R$ * SHAPE = $Q_R$ #### Excel formula To derive survival in Excel the following formula are used: * `lambda = (EXP(-INTERCEPT)^(SHAPE/SCALE))/(SHAPE^2)` and/or * `lambda = (EXP(-(INTERCEPT+TX(INTERVENTION))^(SHAPE/SCALE))/(SHAPE^2)` * `gamma = 1/(SHAPE^2)` * `delta = SHAPE/SCALE` * `Survival = IF(gamma>0, 1-GAMMADIST(lambda*time^(delta),gamma,1,TRUE), 1-( 1-GAMMADIST(lambda*time^(delta),gamma,1,TRUE)))` ### Conversion of Gamma models flexsurvPlus uses the parameters Shape ($a$) and Rate ($b$). For common shape, independent shape and separate models these can be denoted as $a_R, b_R$ for reference and $a_I, b_I$ for intervention. For common shape models $a_R=a_I$. The SAS STEM macro reports INTERCEPT, TX(INTERVENTION) and SCALE which are converted as shown below. This is equivalent to the parameters used for the Generalized Gamma when the constraint that $\sigma = Q$ or identically that SCALE = SHAPE is applied. #### Seperate models * Reference arm * INTERCEPT = $-\log(\frac{b_R}{a_R})$ * SCALE = $\frac{1}{\sqrt{a_R}}$ * Intervention arm * INTERCEPT = $-\log(\frac{b_I}{a_I})$ * SCALE = $\frac{1}{\sqrt{a_I}}$ #### Common shape models * INTERCEPT = $-\log(\frac{b_R}{a_R})$ * TX(INTERVENTION) = $-(\log(b_I) - \log(b_R))$ * SCALE = $\frac{1}{\sqrt{a_R}}$ #### Excel formula To derive survival in Excel the following formula are used: * `lambda = (EXP(-INTERCEPT))/(SCALE^2)` and/or * `lambda = (EXP(-(INTERCEPT+TX(INTERVENTION)))/(SCALE^2)` * `gamma = 1/(SCALE^2)` * `Survival = IF(gamma>0, 1-GAMMADIST(lambda*time,gamma,1,TRUE), 1- (1-GAMMADIST(lambda*time,gamma,1,TRUE)))` ## Conversion of AIC and BIC As the existing SAS STEM macro estimated models with log(time) as a response variable while flexsurv and flexsurvPlus estimate models with time as the response variable there are differences in the log-likelihood between the two software packages. These differences do not affect the parameter estimates or ranking of fit but do affect the AIC and BIC reported by a constant. This constant is the sum of the events log(time). As such can convert by adding this constant to the maximum log-likelihood and by extension double the sum of the events log(time) to AIC or BIC. ```{r message = FALSE} # This constant is estimated from the sum of the log of times for patients who have an event PFS_data %>% filter(event == 1) %>% # select events only transmute(log_event_time = log(AVAL)) %>% # difference in log time summarise(2 * sum(log_event_time)) # So for common shape models (using all data) the difference is 4274.312 # Can be seen in results from convSTEM here modsum <- PFS_conv_all$stem_modsum %>% transmute(Model, Dist, AIC, AIC_SAS, BIC, BIC_SAS) # note as the separate and common shape models use different data these are not comparable modsum %>% mutate(delta_AIC = AIC - AIC_SAS, delta_BIC = AIC - AIC_SAS) ```