--- title: "Survival Analysis with visR" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Survival Analysis with visR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction This tutorial illustrates a typical use case in clinical development - the analysis of time to a certain event (e.g., death) in different groups. Typically, data obtained in randomized clinical trials (RCT) can be used to estimate the overall survival of patients in one group (e.g., treated with drug X) vs another group (e.g., treated with drug Y) and thus determine if there is a treatment difference. For a more thorough introduction to Survival Analysis, we recommend the following [tutorial](https://bioconnector.github.io/workshops/r-survival.html). In this example, we will work with patient data from NCCTG Lung Cancer dataset that is part of the `survival` package. Another vignette presents an example using a data set following the [CDISC ADaM standard](https://www.cdisc.org/standards/foundational/adam/adam-basic-data-structure-bds-time-event-tte-analyses-v1-0). ```{r imports, echo=TRUE, warning=FALSE, message=FALSE} library(ggplot2) library(visR) ``` ## Global Document Setup ```{r globalSetup} # Metadata Title DATASET <- paste0("NCCTG Lung Cancer Dataset (from survival package ", packageVersion("survival"), ")") # Save original options() old <- options() # Global formatting options options(digits = 3) # Global ggplot settings theme_set(theme_bw()) # Global table settings options(DT.options = list(pageLength = 10, language = list(search = 'Filter:'), scrollX = TRUE)) lung_cohort <- survival::lung # Change gender to be a factor and rename some variables to make output look nicer lung_cohort <- lung_cohort %>% dplyr::mutate(sex = as.factor(ifelse(sex == 1, "Male", "Female"))) %>% dplyr::rename(Age = "age", Sex = "sex", Status = "status", Days = "time") # Restore original options() options(old) ``` ## Cohort Overview (Table one) Visualizing tables, like the table one or risk tables, is a two-step process in visR . First, a data.frame (or tibble) is created by a `get_XXX()` function (e.g. `get_tableone()`). Secondly, the data.frame can be displayed by calling the function `render()`. The advantage of this process is that data summaries can be created, used and adjusted throughout an analysis, while at every step data summaries can be displayed or even be downloaded. Populations are usually displayed as a so-called table one. Function `get_tableone` creates a tibble that includes populations summaries. ```{r table1_get_default} # Select variables of interest and change names to look nicer lung_cohort_tab1 <- lung_cohort %>% dplyr::select(Age, Sex) # Create a table one tab1 <- visR::get_tableone(lung_cohort_tab1) # Render the tableone visR::render(tab1, title = "Overview over Lung Cancer patients", datasource = DATASET) ``` Function `render` nicely displays the tableone. Additionally, visR includes a wrapper function to create and display a `tableone` in only one function call. ```{r table1_render_default} # Use wrapper functionality to create and display a tableone visR::tableone(lung_cohort_tab1, title = "Overview over Lung Cancer patients", datasource = DATASET) ``` Creating and visualizing a tableone with default settings is very simple and can be done with one line of code. However, there are further customization options. In both the get and the wrapper functions, a stratifier can be defined and the column displaying total information can be removed. ```{r table1_get_options} # Create and render a tableone with a stratifier and without displaying the total visR::tableone(lung_cohort_tab1, strata = "Sex", overall = FALSE, title = "Overview over Lung Cancer patients", datasource = DATASET) ``` visR's `render` supports three different rendering engines to be as flexible as possible. By default, `render` uses `gt`. Additional engines are `datatable` (`dt`) to include easy downloading options... ```{r table1_render_options_dt} # Create and render a tableone with with dt as an engine visR::tableone(lung_cohort_tab1, strata = "Sex", overall = FALSE, title = "Overview over Lung Cancer patients", datasource = DATASET, engine = "dt") ``` ...and `kable` for flexible displaying in various output formats (`html` by default, `latex` supported). ```{r table1_render_options_kable} # Create and render a tableone with with kable as an engine and html as output format visR::tableone(lung_cohort_tab1, strata = "Sex", overall = FALSE, title = "Overview over Lung Cancer patients", datasource = DATASET, engine = "kable", output_format="html") ``` Called with `html` as an output format, a `html` view is displayed; called with `latex` a string containing latex code is printed. ## Time-to-event analysis ### Survival estimation visR provides a wrapper function to estimate a Kaplan-Meier curve and several functions to visualize the results. This wrapper function is compatible with `%>%` and purrr::map functions without losing traceability of the dataset name. ```{r km_est} # Select variables of interest and change names to look nicer lung_cohort_survival <- lung_cohort %>% dplyr::select(Age, Sex, Status, Days) # For the survival estimate, the censor must be 0 or 1 lung_cohort_survival$Status <- lung_cohort_survival$Status - 1 # Estimate the survival curve lung_suvival_object <- lung_cohort_survival %>% visR::estimate_KM(strata = "Sex", CNSR = "Status", AVAL = "Days") lung_suvival_object ``` ### Survival visualization There are two frequently used ways to estimate time-to-event data: As a risk table and as a Kaplan-Meier curve. In principle, visR allows to either visualize a risk table and a Kaplan-Meier curve separately, or both together in one plot. #### Displaying the risktable Creating and visualizing a risk table separately works in the exact same way as for the tableone (above): First, `get_risktable()` creates a tibble with risk information that can still be changed. Secondly, the risk table can be rendered to be displayed. ```{r km_tab} # Create a risktable rt <- visR::get_risktable(lung_suvival_object) # Display the risktable visR::render(rt, title = "Overview over survival rates of Lung Cancer patients", datasource = DATASET) ``` The risktable is only one piece of information that can be extracted from a survival object with a `get_XXX` to then be rendered. ```{r km_tab_options_1} # Display a summary of the survival estimate visR::render(lung_suvival_object %>% visR::get_summary(), title = "Summary", datasource = DATASET) ``` ```{r km_tab_options_2} # Display test statistics associated with the survival estimate visR::render(lung_suvival_object %>% visR::get_pvalue(), title = "P-values", datasource = DATASET) ``` ```{r km_tab_options_3} # Display qunatile information of the survival estimate visR::render(lung_suvival_object %>% visR::get_quantile(), title = "Quantile Information", datasource = DATASET) ``` ```{r km_tab_options_4} # Display a cox model estimate associated with the survival estimate visR::render(lung_suvival_object %>% visR::get_COX_HR(), title = "COX estimate", datasource = DATASET) ``` #### Plotting the Kaplan-Meier Alternatively, the survival data can be plotted as a Kaplan-Meier curve. In `visR`, a plot is in most cases a ggplot object and adapting the plot follows the general principle of creating a plot and then adding visual contents step-by-step. ```{r km_plot_1} # Create and display a Kaplan-Meier from the survival object gg <- visR::visr(lung_suvival_object) gg ``` ```{r km_plot_2} # Add a confidence interval to the Kaplan-Meier and display the plot gg %>% visR::add_CI() ``` ```{r km_plot_3} # Add a confidence interval and the censor ticks to the Kaplan-Meier and display the plot gg %>% visR::add_CI() %>% visR::add_CNSR(shape = 3, size = 2) ``` visR includes a wrapper function to create a risktable and then add it directly to a Kaplan-Meier plot. ```{r km_add} # Add a confidence interval and the censor ticks and a risktable to the Kaplan-Meier and display the plot gg %>% visR::add_CI() %>% visR::add_CNSR(shape = 3, size = 2) %>% visR::add_risktable() ``` ## Competing Risks In addition to classic right-censored data, the {visR} package supports the estimation of time-to-event outcomes in the presence of competing events. The package wraps the [{tidycmprsk}](https://mskcc-epi-bio.github.io/tidycmprsk/) package, and exports functions for cumulative incidence estimation and visualization. The function `estimate_cuminc()` estimates the cumulative incidence of the competing event or outcome of interest. The syntax is nearly identical to `estimate_KM()`; however, the outcome status variable (passed to the `CNSR=` argument) must be a factor where the first level indicates censoring, the second level the competing event of interest, and subsequent levels are the other competing events. Visualization functions, `visr()`, `add_CI()`, `add_CNSR()`, and `add_risktable()` share the same syntax as the Kaplan-Meier variants. ```{r cuminc_1} visR::estimate_cuminc( tidycmprsk::trial, strata = "trt", CNSR = "death_cr", AVAL = "ttdeath" ) %>% visR::visr( legend_position = "bottom", x_label = "Months from Treatment", y_label = "Risk of Death" ) %>% visR::add_CI() %>% visR::add_risktable(statlist = c("n.risk", "cum.event")) ```