tern functions
Every function in the tern
package is designed to have a
certain structure that can cooperate well with every user’s need, while
maintaining a consistent and predictable behavior. This document will
guide you through an example function in the package, explaining the
purpose of many of its building blocks and how they can be used.
Because we worked on it recently, we will use summarize_change() as the
example. This function calculates the change from a baseline value for a
given variable. A realistic example can be found in LBT03 from the
TLG-catalog.
summarize_change() is the main function that is available to the user.
You can find lists of these functions in ?tern::analyze_functions. All of
them are built around the rtables::analyze() function, which is the core
analysis function in rtables. These wrapper functions call specific
analysis functions (always written as a_*) that are meant to handle the
statistic functions (always written as s_*) and format the results with
the rtables::in_row() function. We can summarize this structure as
follows:
summarize_change() (1)-> a_change_from_baseline() (2)-> [s_change_from_baseline() + rtables::in_row()]
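The layering above can be sketched in base R. This is only an illustration of the call chain, not tern's implementation: summarize_demo(), a_demo(), and s_demo() are hypothetical names, and sprintf() stands in for the formatting that rtables::in_row() performs.

```r
# Level 3 stand-in: a statistics function returns a named list of raw values.
s_demo <- function(x, na_rm = TRUE, ...) {
  if (na_rm) x <- x[!is.na(x)]
  list(n = length(x), mean = if (length(x) > 0) mean(x) else NA_real_)
}

# Level 2 stand-in: an analysis function picks statistics and formats them.
a_demo <- function(x,
                   .stats = c("n", "mean"),
                   .formats = c(n = "%d", mean = "%.1f"),
                   ...) {
  stats <- s_demo(x, ...)[.stats]
  vapply(.stats, function(nm) sprintf(.formats[[nm]], stats[[nm]]), character(1))
}

# Level 1 stand-in: the user-facing wrapper collects arguments and delegates.
summarize_demo <- function(x, na_rm = TRUE, ...) {
  a_demo(x, na_rm = na_rm, ...)
}

res <- summarize_demo(c(9, 6, NA))
print(res)
```

Arguments without a leading dot travel down to the statistics function, while the dotted ones stop at the analysis level, which mirrors the scheme described later in this document.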
The main questions that may arise concern the handling of NA values.
Data set and library loading.
library(dplyr)
library(tern)

## Fabricate dataset
dta_test <- data.frame(
  USUBJID = rep(1:6, each = 3),
  AVISIT = rep(paste0("V", 1:3), 6),
  ARM = rep(LETTERS[1:3], rep(6, 3)),
  AVAL = c(9:1, rep(NA, 9))
) %>%
  mutate(ABLFLL = AVISIT == "V1") %>%
  group_by(USUBJID) %>%
  mutate(
    BLVAL = AVAL[ABLFLL],
    CHG = AVAL - BLVAL
  ) %>%
  ungroup()
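As a sanity check on the fabricated data, the same derivation can be sketched in base R without dplyr. The names dta_check and bl are illustrative only; the logic assumes exactly one baseline record per subject, as in the data above.

```r
dta_check <- data.frame(
  USUBJID = rep(1:6, each = 3),
  AVISIT = rep(paste0("V", 1:3), 6),
  ARM = rep(LETTERS[1:3], rep(6, 3)),
  AVAL = c(9:1, rep(NA, 9))
)

# Flag the baseline visit, then look up each subject's baseline value.
dta_check$ABLFLL <- dta_check$AVISIT == "V1"
bl <- dta_check[dta_check$ABLFLL, c("USUBJID", "AVAL")]
dta_check$BLVAL <- bl$AVAL[match(dta_check$USUBJID, bl$USUBJID)]

# Change from baseline; rows with missing AVAL or BLVAL stay NA.
dta_check$CHG <- dta_check$AVAL - dta_check$BLVAL

print(head(dta_check, 3))
```

Subjects 4 to 6 have only missing values, which is why arm C is empty and arm B contributes a single subject in the tables that follow.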
Classic use of summarize_change().
fix_layout <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("AVISIT")

# Dealing with NAs: na_rm = TRUE
fix_layout %>%
  summarize_change(
    "CHG",
    variables = list(value = "AVAL", baseline_flag = "ABLFLL")
  ) %>%
  build_table(dta_test) %>%
  print()
#>               A               B               C
#> ————————————————————————————————————————————————
#> V1
#>   n           2               1               0
#>   Mean (SD)   7.50 (2.12)     3.00 (NA)       NA
#>   Median      7.50            3.00            NA
#>   Min - Max   6.00 - 9.00     3.00 - 3.00     NA
#> V2
#>   n           2               1               0
#>   Mean (SD)   -1.00 (0.00)    -1.00 (NA)      NA
#>   Median      -1.00           -1.00           NA
#>   Min - Max   -1.00 - -1.00   -1.00 - -1.00   NA
#> V3
#>   n           2               1               0
#>   Mean (SD)   -2.00 (0.00)    -2.00 (NA)      NA
#>   Median      -2.00           -2.00           NA
#>   Min - Max   -2.00 - -2.00   -2.00 - -2.00   NA
# Dealing with NAs: na_rm = FALSE
fix_layout %>%
  summarize_change(
    "CHG",
    variables = list(value = "AVAL", baseline_flag = "ABLFLL"),
    na_rm = FALSE
  ) %>%
  build_table(dta_test) %>%
  print()
#>               A               B               C
#> ————————————————————————————————————————————————
#> V1
#>   n           2               1               0
#>   Mean (SD)   7.50 (2.12)     3.00 (NA)       NA
#>   Median      7.50            3.00            NA
#>   Min - Max   6.00 - 9.00     3.00 - 3.00     NA
#> V2
#>   n           2               1               0
#>   Mean (SD)   -1.00 (0.00)    -1.00 (NA)      NA
#>   Median      -1.00           -1.00           NA
#>   Min - Max   -1.00 - -1.00   -1.00 - -1.00   NA
#> V3
#>   n           2               1               0
#>   Mean (SD)   -2.00 (0.00)    -2.00 (NA)      NA
#>   Median      -2.00           -2.00           NA
#>   Min - Max   -2.00 - -2.00   -2.00 - -2.00   NA
# changing the NA string (it is done on all levels)
fix_layout %>%
  summarize_change(
    "CHG",
    variables = list(value = "AVAL", baseline_flag = "ABLFLL"),
    na_str = "my_na"
  ) %>%
  build_table(dta_test) %>%
  print()
#>               A               B               C
#> ———————————————————————————————————————————————————
#> V1
#>   n           2               1               0
#>   Mean (SD)   7.50 (2.12)     3.00 (my_na)    my_na
#>   Median      7.50            3.00            my_na
#>   Min - Max   6.00 - 9.00     3.00 - 3.00     my_na
#> V2
#>   n           2               1               0
#>   Mean (SD)   -1.00 (0.00)    -1.00 (my_na)   my_na
#>   Median      -1.00           -1.00           my_na
#>   Min - Max   -1.00 - -1.00   -1.00 - -1.00   my_na
#> V3
#>   n           2               1               0
#>   Mean (SD)   -2.00 (0.00)    -2.00 (my_na)   my_na
#>   Median      -2.00           -2.00           my_na
#>   Min - Max   -2.00 - -2.00   -2.00 - -2.00   my_na
.formats, .labels, and .indent_mods depend on the names of .stats: each
is a named vector keyed by statistic name.
Here is how you can change the default formatting.
# changing n count format and label and indentation
fix_layout %>%
  summarize_change("CHG",
    variables = list(value = "AVAL", baseline_flag = "ABLFLL"),
    .stats = c("n", "mean"), # reducing the number of stats for visual appreciation
    .formats = c(n = "xx.xx"),
    .labels = c(n = "NnNn"),
    .indent_mods = c(n = 5), na_str = "nA"
  ) %>%
  build_table(dta_test) %>%
  print()
#>                    A      B      C
#> —————————————————————————————————————
#> V1
#>             NnNn   2.00   1.00   0.00
#>   Mean             7.5    3.0    nA
#> V2
#>             NnNn   2.00   1.00   0.00
#>   Mean             -1.0   -1.0   nA
#> V3
#>             NnNn   2.00   1.00   0.00
#>   Mean             -2.0   -2.0   nA
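The name-based matching used in the call above can be illustrated with plain named vectors. This is a sketch under assumptions: resolve_by_stat() and the default_formats values are hypothetical and are not taken from tern's internals.

```r
# Hypothetical defaults, one entry per available statistic.
default_formats <- c(n = "xx.", mean = "xx.x", sd = "xx.x")

# User input: the selected statistics and a partial override keyed by name.
.stats <- c("n", "mean")
.formats <- c(n = "xx.xx")

# Keep only the selected statistics, then apply overrides whose names match.
resolve_by_stat <- function(defaults, overrides, stats) {
  out <- defaults[stats]
  hits <- intersect(names(overrides), stats)
  out[hits] <- overrides[hits]
  out
}

res <- resolve_by_stat(default_formats, .formats, .stats)
print(res)
```

An override whose name is not in .stats is simply ignored in this sketch, which is one reason the names of .formats, .labels, and .indent_mods need to line up with .stats.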
What if I want something special for the format?
# using a custom format function for the n count
fix_layout %>%
  summarize_change("CHG",
    variables = list(value = "AVAL", baseline_flag = "ABLFLL"),
    .stats = c("n", "mean"), # reducing the number of stats for visual appreciation
    .formats = c(n = function(x, ...) as.character(x * 100)) # Note you need ...!!!
  ) %>%
  build_table(dta_test) %>%
  print()
#>          A      B      C
#> —————————————————————————
#> V1
#>   n      200    100    0
#>   Mean   7.5    3.0    NA
#> V2
#>   n      200    100    0
#>   Mean   -1.0   -1.0   NA
#> V3
#>   n      200    100    0
#>   Mean   -2.0   -2.0   NA
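The need for ... in a format function can be illustrated in base R. The sketch assumes that the caller passes extra named arguments along with the value; apply_format() and its output argument are hypothetical stand-ins for whatever the rendering code actually forwards.

```r
# Hypothetical caller: passes the value plus an extra named argument.
apply_format <- function(fmt, x) fmt(x, output = "ascii")

# Accepts and ignores anything extra.
fmt_ok <- function(x, ...) as.character(x * 100)

# No ... in the signature, so any extra argument is an error.
fmt_strict <- function(x) as.character(x * 100)

ok <- apply_format(fmt_ok, 2)
strict <- tryCatch(
  apply_format(fmt_strict, 2),
  error = function(e) "error: unused argument"
)

print(ok)
print(strict)
```

Under that assumption, a format function without ... fails as soon as the caller adds an argument, while the version with ... keeps working.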
Adding a custom statistic (and custom format):
# adding a custom statistic together with its own format
fix_layout %>%
  summarize_change(
    "CHG",
    variables = list(value = "AVAL", baseline_flag = "ABLFLL"),
    .stats = c("n", "my_stat" = function(df, ...) {
      a <- mean(df$AVAL, na.rm = TRUE)
      b <- list(...)$.N_row # It has access to all `?rtables::additional_fun_params`
      a / b
    }),
    .formats = c("my_stat" = function(x, ...) sprintf("%.2f", x))
  ) %>%
  build_table(dta_test)
#>             A      B      C
#> ————————————————————————————
#> V1
#>   n         2      1      0
#>   my_stat   1.25   0.50   NA
#> V2
#>   n         2      1      0
#>   my_stat   1.08   0.33   NA
#> V3
#>   n         2      1      0
#>   my_stat   0.92   0.17   NA
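To see how the custom statistic picks its inputs out of ..., the function can be called by hand. This is a sketch: the data frame df_v1_a and the explicit .N_row = 6 are assumptions chosen to be consistent with the V1, arm A cell above, and .var is passed only to show that unused extras are ignored.

```r
my_stat <- function(df, ...) {
  a <- mean(df$AVAL, na.rm = TRUE)
  b <- list(...)$.N_row # retrieved by name from the extra arguments
  a / b
}

# Hand-built stand-in for the V1 records of arm A.
df_v1_a <- data.frame(AVAL = c(9, 6))

val <- my_stat(df_v1_a, .N_row = 6, .var = "CHG")
print(sprintf("%.2f", val))
```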
In all of these layers there are specific parameters that need to be
available, and, while rtables has multiple ways to handle formatting and
NA values, we had to decide how to correctly handle these and any
additional arguments. We use the following scheme:
Level 1: summarize_change(): all parameters without a starting dot .*
are used or added to extra_args. Specifically, here we handle NA values
by always using inclNAs = TRUE in rtables::analyze(). This passes NA
values through to the analysis function a_*. Follow the way na_rm is
used in summarize_change(), and you will see how to retrieve it from ...
only where you need it. In this case, that is only at the summary()
level. na_str, instead, is set only at the top level (in the
rtables::analyze() call). We may want to make it statistic-dependent in
the future, but we still need to think about how to accomplish that. We
add the rtables::additional_fun_params to the analysis function to make
them available as ... in the next level. Note that any of them can be
retrieved by name from ..., for example with list(...)[["na_rm"]].
Level 2: a_change_from_baseline(): all parameters starting with a dot .
are ideally used or passed on to lower-level functions from here.
Mainly, .stats, .formats, .labels, and .indent_mods are used only at
this level. We also bring forward extra_afun_params to the ... list for
the statistical function. Notice the handling of additional parameters
in the do.call() function.
Level 3 and beyond: s_* functions. In this case, s_summary is ultimately
used, and its result is brought back into the main a_* function.
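The argument flow across the three levels can be sketched in base R. This is not tern's code: s_stats_demo() and a_stats_demo() are hypothetical stand-ins, and extra_afun_params is modeled here as a plain list that is merged into the do.call() argument list.

```r
# Level 3 stand-in: pulls na_rm out of ... only where it is needed.
s_stats_demo <- function(x, ...) {
  na_rm <- list(...)[["na_rm"]]
  if (is.null(na_rm)) na_rm <- TRUE # fallback when the caller did not pass it
  list(n = sum(!is.na(x)), mean = mean(x, na.rm = na_rm))
}

# Level 2 stand-in: consumes the dotted arguments and forwards the rest.
a_stats_demo <- function(x,
                         .stats = c("n", "mean"),
                         ...,
                         extra_afun_params = list()) {
  args <- c(list(x = x), extra_afun_params, list(...))
  do.call(s_stats_demo, args)[.stats]
}

with_rm <- a_stats_demo(
  c(9, 6, NA),
  .stats = "mean",
  na_rm = TRUE,
  extra_afun_params = list(.N_row = 6)
)
without_rm <- a_stats_demo(c(9, 6, NA), .stats = "mean", na_rm = FALSE)

print(with_rm)
print(without_rm)
```

Building the argument list first and calling do.call() once keeps the statistics function unaware of which extras exist; it only looks up the names it cares about.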