---
title: "Creating ADBCVA"
output:
  rmarkdown::html_vignette:
vignette: >
  %\VignetteIndexEntry{Creating ADBCVA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(admiraldev)
```

# Introduction

This article describes creating an ADBCVA ADaM with Best-Corrected Visual Acuity (BCVA) data for ophthalmology endpoints. It is to be used in conjunction with the article on [creating a BDS dataset from SDTM](https://pharmaverse.github.io/admiral/articles/bds_finding.html). As such, derivations and processes that are not specific to ADBCVA are absent, and the user is invited to consult the aforementioned article for guidance.

**Note**: *All examples assume CDISC SDTM and/or ADaM format as input unless
otherwise specified.*

## Dataset Contents

As the name ADBCVA implies, `{admiralophtha}` suggests to populate ADBCVA solely with BCVA records from the OE SDTM.

## Required Packages

The examples of this vignette require the following packages.

```{r, warning=FALSE, message=FALSE}
library(dplyr)
library(admiral)
library(pharmaversesdtm)
library(admiraldev)
library(admiralophtha)
```

# Programming Workflow

* [Initial Set Up of ADBCVA ](#setup)
* [Deriving LogMAR Score Parameters](#logmar)
* [Further Derivations of Standard BDS Variables](#further)
* [Deriving Analysis Value Categories for Snellen Scores](#avalcats)
* [Deriving Criterion Flags for BCVA Change](#critflags)
* [Additional Notes](#notes)
* [Example Script](#example)

## Initial set up of ADBCVA {#setup}

As with all BDS ADaM datasets, one should start from the OE SDTM, where only the BCVA records are of interest. For the purposes of the next two sections, we shall be using the `{admiral}` OE and ADSL test data. We will also require a lookup table for the mapping of parameter codes.

**Note**: to simulate an ophthalmology study, we add a randomly generated `STUDYEYE` variable to ADSL, but in practice `STUDYEYE` will already have been derived using `derive_var_studyeye()`.

```{r}
data("oe_ophtha")
data("admiral_adsl")

# Add STUDYEYE to ADSL to simulate an ophtha dataset
adsl <- admiral_adsl %>%
  as.data.frame() %>%
  mutate(STUDYEYE = sample(c("LEFT", "RIGHT"), n(), replace = TRUE)) %>%
  convert_blanks_to_na()

oe <- convert_blanks_to_na(oe_ophtha) %>%
  ungroup()

# ---- Lookup table ----
param_lookup <- tibble::tribble(
  ~OETESTCD, ~OECAT, ~OESCAT, ~AFEYE, ~PARAMCD, ~PARAM, ~PARAMN,
  "VACSCORE", "BEST CORRECTED VISUAL ACUITY", "OVERALL EVALUATION", "Study Eye", "SBCVA", "Study Eye Visual Acuity Score (letters)", 1, # nolint
  "VACSCORE", "BEST CORRECTED VISUAL ACUITY", "OVERALL EVALUATION", "Fellow Eye", "FBCVA", "Fellow Eye Visual Acuity Score (letters)", 2, # nolint
)
```

Following this setup, the programmer can start constructing ADBCVA. The first step is to subset OE to only BCVA parameters and merge with ADSL. This is required for two reasons: firstly, `STUDYEYE` is crucial in the mapping of `AFEYE` and `PARAMCD`'s. Secondly, the treatment start date (`TRTSDT`) is also a prerequisite for the derivation of variables such as Analysis Day (`ADY`).

```{r}
adsl_vars <- exprs(TRTSDT, TRTEDT, TRT01A, TRT01P, STUDYEYE)

adbcva <- oe %>%
  filter(
    OETESTCD %in% c("VACSCORE")
  ) %>%
  derive_vars_merged(
    dataset_add = adsl,
    new_vars = adsl_vars,
    by_vars = get_admiral_option("subject_keys")
  )
```

The next item of business is to derive `AVAL`, `AVALU`, and `DTYPE`. In this example, due to the small number of parameters their derivation is trivial. `AFEYE` is also created in this step using the function `derive_var_afeye()`.

```{r}
adbcva <- adbcva %>%
  mutate(
    AVAL = OESTRESN,
    AVALU = "letters",
    DTYPE = NA_character_
  ) %>%
  derive_var_afeye(loc_var = OELOC, lat_var = OELAT)
```

Moving forwards, `PARAM` and `PARAMCD` can be assigned using `derive_vars_merged()` from `{admiral}` and the lookup table `param_lookup` generated above.

```{r}
adbcva <- adbcva %>%
  derive_vars_merged(
    dataset_add = param_lookup,
    new_vars = exprs(PARAM, PARAMCD),
    by_vars = exprs(OETESTCD, AFEYE),
    filter_add = PARAMCD %in% c("SBCVA", "FBCVA")
  )
```

## Deriving LogMAR Score Parameters {#logmar}

Often ADBCVA datasets contain derived records for BCVA in LogMAR units. This can easily be achieved as follows using `derive_param_computed()`. The conversion of units is done using `convert_etdrs_to_logmar()`. Two separate calls are required due to the parameters being split by study and fellow eye. Once these extra parameters are added, all the records that will be in the end dataset are now present, so `AVALC` and day/date variables such as `ADY` and `ADT` can be derived.

```{r}
adbcva <- adbcva %>%
  derive_param_computed(
    by_vars = c(
      get_admiral_option("subject_keys"),
      exprs(VISIT, VISITNUM, OEDY, OEDTC, AFEYE, !!!adsl_vars)
    ),
    parameters = c("SBCVA"),
    set_values_to = exprs(
      AVAL = convert_etdrs_to_logmar(AVAL.SBCVA),
      PARAMCD = "SBCVALOG",
      PARAM = "Study Eye Visual Acuity LogMAR Score",
      DTYPE = NA_character_,
      AVALU = "LogMAR"
    )
  ) %>%
  derive_param_computed(
    by_vars = c(
      get_admiral_option("subject_keys"),
      exprs(VISIT, VISITNUM, OEDY, OEDTC, AFEYE, !!!adsl_vars)
    ),
    parameters = c("FBCVA"),
    set_values_to = exprs(
      AVAL = convert_etdrs_to_logmar(AVAL.FBCVA),
      PARAMCD = "FBCVALOG",
      PARAM = "Fellow Eye Visual Acuity LogMAR Score",
      DTYPE = NA_character_,
      AVALU = "LogMAR"
    )
  ) %>%
  mutate(AVALC = as.character(AVAL)) %>%
  derive_vars_dt(
    new_vars_prefix = "A",
    dtc = OEDTC,
    flag_imputation = "none"
  ) %>%
  derive_vars_dy(reference_date = TRTSDT, source_vars = exprs(ADT))
```

Importantly, the above calls to `derive_param_computed()` list the SDTM variables `VISIT`, `VISITNUM`, `OEDY` and `OEDTC` as `by_vars` for the function. This is because they will be necessary to derive ADaM variables such as `AVISIT` and `ADY` in successive steps. Once all the ADaM variables which require them are derived, the SDTM variables should be set to missing for the derived records, as per ADaM standards:

```{r}
adbcva <- adbcva %>%
  mutate(
    VISIT = ifelse(PARAMCD %in% c("SBCVALOG", "FBCVALOG"), NA_character_, VISIT),
    VISITNUM = ifelse(PARAMCD %in% c("SBCVALOG", "FBCVALOG"), NA, VISITNUM),
    OEDY = ifelse(PARAMCD %in% c("SBCVALOG", "FBCVALOG"), NA, OEDY),
    OEDTC = ifelse(PARAMCD %in% c("SBCVALOG", "FBCVALOG"), NA_character_, OEDTC)
  )
```

## Further Derivations of Standard BDS Variables {#further}

The user is invited to consult the article on [creating a BDS dataset from SDTM](https://pharmaverse.github.io/admiral/articles/bds_finding.html) to learn how to add standard BDS variables to ADBCVA. Henceforth, for the purposes of this article, the following sections use the ADBCVA dataset generated by the corresponding `{admiralophtha}` template program as a starting point.

**Note**: This dataset already comes with some criterion flags and analysis value categorisation variables, so for illustration purposes these are removed.

```{r}
data("admiralophtha_adbcva")

adbcva <- admiralophtha_adbcva %>%
  select(-starts_with("CRIT"), -starts_with("AVALCA"))
```

## Deriving Analysis Value Categories for Snellen Scores {#avalcats}
Some ophthalmology studies may desire to subdivide BCVA records according to which Snellen category they fall into (eg, 20/320, 20/100, 20/20 etc). This is best done through the use of `AVALCATx`/`AVALCAxN` variable pairs. Currently, `{admiralophtha}` does not provide specific functionality to create `AVALCATx`/`AVALCAxN` pairs, although this may be included in future releases of the package. With the current toolset, the suggested approach to derive such variables is to:

* Create a lookup table which assigns numeric equivalents (i.e. `AVALCAxN`) to Snellen categories.
* Create a format function to map each `AVAL` to a numeric category.
* Add `AVALCAxN` through a mutate statement using the format function.
* Add `AVALCATx` using `derive_vars_merged` in combination with the lookup table.

```{r}
avalcat_lookup <- tibble::tribble(
  ~PARAMCD, ~AVALCA1N, ~AVALCAT1,
  "SBCVA", 1000, "< 20/800",
  "SBCVA", 800, "20/800",
  "SBCVA", 640, "20/640",
  "SBCVA", 500, "20/500",
  "SBCVA", 400, "20/400",
  "SBCVA", 320, "20/320",
  "SBCVA", 250, "20/250",
  "SBCVA", 200, "20/200",
  "SBCVA", 160, "20/160",
  "SBCVA", 125, "20/125",
  "SBCVA", 100, "20/100",
  "SBCVA", 80, "20/80",
  "SBCVA", 63, "20/63",
  "SBCVA", 50, "20/50",
  "SBCVA", 40, "20/40",
  "SBCVA", 32, "20/32",
  "SBCVA", 25, "20/25",
  "SBCVA", 20, "20/20",
  "SBCVA", 16, "20/16",
  "SBCVA", 12, "20/12",
  "SBCVA", 1, "> 20/12",
)

avalcat_lookup <- avalcat_lookup %>%
  mutate(PARAMCD = "FBCVA") %>%
  rbind(avalcat_lookup)

format_avalcat1n <- function(param, aval) {
  case_when(
    param %in% c("SBCVA", "FBCVA") & aval >= 0 & aval <= 3 ~ 1000,
    param %in% c("SBCVA", "FBCVA") & aval >= 4 & aval <= 8 ~ 800,
    param %in% c("SBCVA", "FBCVA") & aval >= 9 & aval <= 13 ~ 640,
    param %in% c("SBCVA", "FBCVA") & aval >= 14 & aval <= 18 ~ 500,
    param %in% c("SBCVA", "FBCVA") & aval >= 19 & aval <= 23 ~ 400,
    param %in% c("SBCVA", "FBCVA") & aval >= 24 & aval <= 28 ~ 320,
    param %in% c("SBCVA", "FBCVA") & aval >= 29 & aval <= 33 ~ 250,
    param %in% c("SBCVA", "FBCVA") & aval >= 34 & aval <= 38 ~ 200,
    param %in% c("SBCVA", "FBCVA") & aval >= 39 & aval <= 43 ~ 160,
    param %in% c("SBCVA", "FBCVA") & aval >= 44 & aval <= 48 ~ 125,
    param %in% c("SBCVA", "FBCVA") & aval >= 49 & aval <= 53 ~ 100,
    param %in% c("SBCVA", "FBCVA") & aval >= 54 & aval <= 58 ~ 80,
    param %in% c("SBCVA", "FBCVA") & aval >= 59 & aval <= 63 ~ 63,
    param %in% c("SBCVA", "FBCVA") & aval >= 64 & aval <= 68 ~ 50,
    param %in% c("SBCVA", "FBCVA") & aval >= 69 & aval <= 73 ~ 40,
    param %in% c("SBCVA", "FBCVA") & aval >= 74 & aval <= 78 ~ 32,
    param %in% c("SBCVA", "FBCVA") & aval >= 79 & aval <= 83 ~ 25,
    param %in% c("SBCVA", "FBCVA") & aval >= 84 & aval <= 88 ~ 20,
    param %in% c("SBCVA", "FBCVA") & aval >= 89 & aval <= 93 ~ 16,
    param %in% c("SBCVA", "FBCVA") & aval >= 94 & aval <= 97 ~ 12,
    param %in% c("SBCVA", "FBCVA") & aval >= 98 ~ 1
  )
}

adbcva <- adbcva %>%
  mutate(AVALCA1N = format_avalcat1n(param = PARAMCD, aval = AVAL)) %>%
  derive_vars_merged(
    avalcat_lookup,
    by = exprs(PARAMCD, AVALCA1N)
  )
```

The resulting output is shown below (limited to the first patient only):

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adbcva %>% filter(USUBJID == "01-701-1015"),
  display_vars = exprs(
    USUBJID, PARAMCD, AVAL, AVALCAT1, AVALCA1N
  )
)
```
## Deriving Criterion Flags for BCVA Change {#critflags}

`{admiralophtha}` suggests the use of criterion flag variable pairs (`CRITx`/`CRITxFL`) to program BCVA endpoints such as *Avoiding a loss of x letters*  or *Gain of y letters* or *Gain of between x and y letters* (relative to baseline or other basetypes). The package provides the function `derive_var_bcvacritxfl()` to program these endpoints efficiently and consistently. In terms of the logic to apply to the variable `CHG`, the endpoints fall into three classes, which can be represented by inequalities:

* Class 1: `CHG` value lying inside a range, `a <= CHG <= b`.
* Class 2: `CHG` value below an upper limit, `CHG <= a`.
* Class 3: `CHG` value above a lower limit, `CHG => b`.

By using `derive_var_bcvacritxfl()`, the ADaM programmer can implement all three types of endpoint at once. This is achieved by feeding the appropriate ranges, upper limits and lower limits to the `bcva_ranges`, `bcva_uplims` and `bcva_lowlims` arguments of the function. For instance, let's suppose that the endpoints of interest are:

* *Gain of between 5 and 10 letters relative to baseline* (Class 1: `5 <= CHG <= 10`)
* *Gain of 25 letters or fewer relative to baseline* (Class 2: `CHG <= 25`)
* *Loss of 5 letters or more relative to baseline* (Class 2: `CHG <= -5`)
* *Gain of 15 letters or more relative to baseline* (Class 3: `CHG >= 15`)
* *Loss of 10 letters or fewer relative to baseline* (Class 3: `CHG >= -10`).

Then, the following call will implement criterion variable/flag pairs for the endpoints above. The `CRITx` variables will automatically encode the correct inequality. Note that that `restrict_derivation()` is wrapped around the call so as to only derive the variables for the relevant parameters. In this way, the `filter` argument can be altered to restrict derivation to only relevant records. Note also that the argument `crit_var = exprs(CHG)` has to be specified so that the criterion flags are derived with respect to the correct variable.

```{r}
adbcva <- adbcva %>% restrict_derivation(
  derivation = derive_var_bcvacritxfl,
  args = params(
    crit_var = exprs(CHG),
    bcva_ranges = list(c(5, 10)),
    bcva_uplims = list(25, -5),
    bcva_lowlims = list(15, -10)
  ),
  filter = PARAMCD %in% c("SBCVA", "FBCVA")
)
```

The resulting output is shown below (limited to the first patient only):

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adbcva %>%
    filter(USUBJID == "01-701-1015") %>%
    select(USUBJID, PARAMCD, AVAL, CHG, starts_with("CRIT"))
)
```

It is also possible to assign significance to the "x" in `CRITxFL`. For instance, one could designate all criterion flags of Class 1 as `CRIT1yFL`, Class 2 as `CRIT2yFL`, and Class 3 as `CRIT3yFL`. The argument `critxfl_index` allows a simple implementation of this in conjunction with three separate calls to `derive_var_bcvacritxfl()`:

```{r}
adbcva <- adbcva %>%
  restrict_derivation(
    derivation = derive_var_bcvacritxfl,
    args = params(
      crit_var = exprs(CHG),
      bcva_ranges = list(c(5, 10)),
      critxfl_index = 10
    ),
    filter = PARAMCD %in% c("SBCVA", "FBCVA")
  ) %>%
  restrict_derivation(
    derivation = derive_var_bcvacritxfl,
    args = params(
      crit_var = exprs(CHG),
      bcva_uplims = list(25, -5),
      critxfl_index = 20
    ),
    filter = PARAMCD %in% c("SBCVA", "FBCVA")
  ) %>%
  restrict_derivation(
    derivation = derive_var_bcvacritxfl,
    args = params(
      crit_var = exprs(CHG),
      bcva_lowlims = list(15, -10),
      critxfl_index = 30
    ),
    filter = PARAMCD %in% c("SBCVA", "FBCVA")
  )
```
  
## Additional Notes
  
  * When interpreting endpoints such as *Loss of 5 letters or fewer relative to baseline*, it is implicitly assumed in this article that this also includes the case where letters are *gained*, so that the inequality reads `CHG >= -5`. One would then use the `bcva_lowlims = list(-5)` argument of `derive_var_bcvacritxfl()`  to program such an endpoint. If this is not the case, i.e. one wishes to exclude cases of letter gains, then the inequality of interest would instead be `-5 <= CHG <= -1`. Importantly, `derive_var_bcvacritxfl()` could still be used, but with the argument `bcva_ranges = list(c(-5, -1))`.
  
  * This vignette extensively showcases the use of `derive_var_bcvacritxfl()` applied to the variable `CHG`, but through the argument `crit_var` the function can also be used to create criterion flag relative to other variables (e.g. `crit_var = exprs(AVAL)` for `AVAL`).
  
## Example Script {#example}
  
  ADaM | Sample Code 
  ---- | -------------- 
  ADBCVA | [ad_adbcva.R](https://github.com/pharmaverse/admiralophtha/blob/main/inst/templates/ad_adbcva.R)