--- title: "Programming Concepts and Conventions" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Programming Concepts and Conventions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(dplyr) library(admiral) library(admiraldev) ``` # Introduction This vignette aims to discuss some of the common programming concepts and conventions that have been adopted within the `{admiral}` family of packages. It is intended to be a user-facing version of the [Programming Strategy](https://pharmaverse.github.io/admiraldev/articles/programming_strategy.html) vignette, but users can also read the latter after becoming more familiar with the package to expand further on any topics of interest. For some of common `{admiral}` FAQ, visit the corresponding FAQ page provided in the same drop down menu as this vignette. # Input and Output It is expected that the input dataset is not grouped. Otherwise an error is issued. The output dataset is ungrouped. The observations are not ordered in a dedicated way. In particular, the order of the observations of the input dataset may not be preserved. # `{admiral}` Functions and Options As a general principle, the behavior of the `{admiral}` functions is only determined by their input, not by any global object, i.e. all inputs like datasets, variable names, options, etc. must be provided to the function by arguments. Correspondingly, in general functions do not have any side-effects like creating or modifying global objects, printing, writing files, etc. An exception to the above principle is found in our approach to package options (see `get_admiral_option()` and `set_admiral_options()`), which allow for user-defined defaults on commonly used function arguments. For instance, the option `subject_keys` is currently pre-defined as `exprs(STUDYID, USUBJID)`, but can be modified using `set_admiral_options(subject_keys = exprs(...))` at the top of a script. For a full discussion on admiral Inputs, Outputs and Options, see [this section](https://pharmaverse.github.io/admiraldev/articles/programming_strategy.html#input-output-and-side-effects) on our developer-facing Programming Strategy. # Handling of Missing Values {#missing} When using the `{haven}` package to read SAS datasets into R, SAS-style character missing values, i.e. `""`, are *not* converted into proper R `NA` values. Rather they are kept as is. This is problematic for any downstream data processing as R handles `""` just as any other string. Thus, before any data manipulation is being performed SAS blanks should be converted to R `NA`s using `{admiral}`'s `convert_blanks_to_na()` function, e.g. ```r dm <- haven::read_sas("dm.sas7bdat") %>% convert_blanks_to_na() ``` Note that any logical operator being applied to an `NA` value *always* returns `NA` rather than `TRUE` or `FALSE`. ```{r} visits <- c("Baseline", NA, "Screening", "Week 1 Day 7") visits != "Baseline" ``` The only exception is `is.na()` which returns `TRUE` if the input is `NA`. ```{r} is.na(visits) ``` Thus, to filter all visits which are not `"Baseline"` the following condition would need to be used. ```{r} visits != "Baseline" | is.na(visits) ``` Also note that most aggregation functions, like `mean()` or `max()`, also return `NA` if any element of the input vector is missing. ```{r} mean(c(1, NA, 2)) ``` To avoid this behavior one has to explicitly set `na.rm = TRUE`. ```{r} mean(c(1, NA, 2), na.rm = TRUE) ``` This is very important to keep in mind when using `{admiral}`'s aggregation functions such as `derive_summary_records()`. For handling of `NA`s in sorting variables see [Sort Order](generic.html#sort_order). # Expressions in Scripts {#exprs} ## Quoting and Unquoting: Introducing `expr()`, `exprs()`, `!!` and `!!!` ### `expr()` and `exprs()` `expr()` is a function from the `{rlang}` package, which is used to create an **expression**. The expression is not evaluated - rather, it is passed on to the derivation function which evaluates it in its own environment. `exprs()` is the plural version of `expr()`, so it accepts multiple comma-separated items and returns a list of expressions. ```{r, eval = TRUE} library(rlang) adae <- data.frame(USUBJID = "XXX-1", AEDECOD = "HEADACHE") # Return the adae object adae # Return an expression expr(adae) ``` When used within the contest of an `{admiral}` derivation function, `expr()` and `exprs()` allow the function to evaluate the expressions in the context of the input dataset. As an example, `expr()` and `exprs()` allow users to pass variable names of datasets to the function without wrapping them in quotation marks. The expressions framework is powerful because users are able to intuitively "inject code" into `admiral` functions (through the function parameters) using very similar syntax as if they were writing open code, with the exception possibly being an outer `exprs()` wrapper. For instance, in the `derive_vars_merged()` call below, the user is merging `adsl` with `ex` and is able to filter `ex` prior to the merge using an expression passed to the `filter_add` parameter. Because `filter_add` accepts expressions, the user has full power to filter their dataset as they please. In the same vein, the user is able to create any new variables they wish after the merge using the `new_vars` argument, to which they pass a list of expressions containing "standard" R code. ``` {r, eval = FALSE} derive_vars_merged( adsl, dataset_add = ex, filter_add = !is.na(EXENDTM), by_vars = exprs(STUDYID, USUBJID), new_vars = exprs( TRTEDTM = EXENDTM, TRTETMF = EXENTMF, COMPTRT = if_else(!is.na(EXENDTM), "Y", "N") ), order = exprs(EXENDTM), mode = "last" ) ``` ### Bang-Bang (`!!`) and Bang-Bang-Bang (`!!!`) {#unquoting} Sometimes you may want to construct an expression using other, pre-existing expressions. However, it's not immediately clear how to achieve this because expressions inherently pause evaluation of your code before it's executed: ```{r, eval = TRUE} a <- expr(2) b <- expr(3) expr(a + b) # NOT 2 + 3 ``` This is where `!!` (bang-bang) comes in: provided again by the `{rlang}` package, it allows you to inject the contents of an expression into another expression, meaning that by using `!!` you can modify the code inside an expression before R evaluates it. By using `!!` you are **unquoting** an expression, i.e. evaluating it before you pass it onwards. ```{r, eval = TRUE} expr(!!a + !!b) ``` You can see an example of where `!!` comes in handy within `{admiral}` code in [Common Pitfall 1](#pitfall1), where the contents of an expression is unquoted so that it can be passed to `derive_vars_merged()`. `!!!` (bang-bang-bang) is the plural version of `!!` and can be used to unquote a list of expressions: ```{r, eval = TRUE} exprs(!!!list(a, b)) ``` Within `{admiral}`, this operator can be useful if we need to unquote a list of variables (stored as expressions) to use them inside of an `{admiral}` or even `{dplyr}` call. One example is the `{admiral}` subject keys: ```{r, eval = TRUE} get_admiral_option("subject_keys") ``` If we want to use the subject keys stored within this `{admiral}` option to subset a dataset, we need to use `!!!` to unquote this list. Let's construct a dummy example to illustrate the point: ```{r, eval = TRUE, error = TRUE} adcm <- data.frame(STUDYID = "XXX", USUBJID = "XXX-1", CMTRT = "ASPIRIN") adcm # This doesn't work as we are not unquoting the subject keys adcm %>% select(get_admiral_option("subject_keys")) # This works because we are unquoting the subject keys adcm %>% select(!!!get_admiral_option("subject_keys")) ``` You can see another example of `!!!` in action in [this line](https://github.com/pharmaverse/admiral/blob/c5dd8fb196c2cd3f27a6b02ec88fa1abd6b24d60/inst/templates/ad_adex.R#L162) of the `{admiral}` `ADEX` template script, where it is used to dynamically control the by variables passed to an `{admiral}` function. ### Summary In summary, although the expressions framework may seem slightly clunky and mysterious to begin with, it allows for such power and flexibility that it forms a key part of the `{admiral}` package. For a comprehensive treatment of expressions, see [Chapter 18](https://adv-r.hadley.nz/expressions.html) and [Chapter 19](https://adv-r.hadley.nz/quasiquotation.html) of the Advanced R textbook. Chapter 19 specifically covers in much more detail the concept of unquoting. ## Common pitfalls Expressions are very powerful, but this can also lead to misunderstandings about their functionality. Let's set up some dummy data to explore common issues that new (or experienced!) programmers may encounter when dealing with expressions. ```{r, eval = TRUE, echo = TRUE} library(dplyr, warn.conflicts = FALSE) library(admiral) vs <- tribble( ~USUBJID, ~VSTESTCD, ~VISIT, ~VSSTRESN, ~VSSTRESU, ~VSDTC, "01-1301", "WEIGHT", "SCREENING", 82.1, "kg", "2013-08-29", "01-1301", "WEIGHT", "WEEK 2", 81.19, "kg", "2013-09-15", "01-1301", "WEIGHT", "WEEK 4", 82.56, "kg", "2013-09-24", "01-1302", "BMI", "SCREENING", 20.1, "kg/m2", "2013-08-29", "01-1302", "BMI", "WEEK 2", 20.2, "kg/m2", "2013-09-15", "01-1302", "BMI", "WEEK 4", 19.9, "kg/m2", "2013-09-24" ) dm <- tribble( ~USUBJID, ~AGE, "01-1301", 18 ) ``` ### 1. Mistakenly passing something that isn't an expression to an argument {#pitfall1} When writing more complex `{admiral}` code it can be easy to mistakenly pass the wrong input to an argument that expects an expression. For example, the code below fails because `my_expression` is not an expression - it is the name of an object in the global environment containing an expression. ```{r, eval = TRUE, error = TRUE} my_expression <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING") derive_vars_merged( dm, dataset_add = select(vs, USUBJID, VSTESTCD, VISIT), by_vars = exprs(USUBJID), filter_add = my_expression ) ``` To fix this code, we need to [unquote](#unquoting) `my_expression` so that the expression that it is holding is passed correctly to `derive_vars_merged()`: ```{r, eval = TRUE, error = FALSE} derive_vars_merged( dm, dataset_add = select(vs, USUBJID, VSTESTCD, VISIT), by_vars = exprs(USUBJID), filter_add = !!my_expression ) ``` ### 2. Forgetting that expressions must be evaluable in the dataset In a similar vein to above, even if an actual expression *is* passed as an argument, you must make sure that it can be evaluated within the dataset of interest. This may seem trivial, but it is a common pitfall because expressions delay evaluation of code and so can delay the identification of issues. For instance, consider this example: ```{r, eval = TRUE, error = TRUE} filter_vs_and_merge <- function(my_expression) { derive_vars_merged( dm, dataset_add = select(vs, USUBJID, VSTESTCD, VISIT), by_vars = exprs(USUBJID), filter_add = !!my_expression ) } # This works filter_vs_and_merge(expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING")) # This fails filter_vs_and_merge(expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING" & VSTPT == "PREDOSE")) ``` The second call fails because hidden within the expression is a mention of `VSTPT`, which was dropped from `vs` in `filter_vs_and_merge()`. # See also - [Programming Strategy](https://pharmaverse.github.io/admiraldev/articles/programming_strategy.html)