Creating an OCCDS ADaM

Introduction

This article describes creating an OCCDS ADaM. Examples are currently presented and tested in the context of ADAE. However, the examples could be applied to other OCCDS ADaMs such as ADCM, ADMH, ADDV, etc.

Note: All examples assume CDISC SDTM and/or ADaM format as input unless otherwise specified.

Programming Workflow

Read in Data

To start, all data frames needed for the creation of ADAE should be read into the environment. This will be a company-specific process. Some of the data frames needed may be AE and ADSL.

For example purposes, the CDISC Pilot SDTM and ADaM datasets, which are included in {pharmaversesdtm} and {admiral}, are used.

library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(lubridate)

ae <- pharmaversesdtm::ae
adsl <- admiral::admiral_adsl
ex_single <- admiral::ex_single

ae <- convert_blanks_to_na(ae)

At this step, it may be useful to join ADSL to your AE domain as well. Only the ADSL variables used for derivations are selected at this step. The rest of the relevant ADSL variables would be added later.

adsl_vars <- exprs(TRTSDT, TRTEDT, TRT01A, TRT01P, DTHDT, EOSDT)

adae <- derive_vars_merged(
  ae,
  dataset_add = adsl,
  new_vars = adsl_vars,
  by_vars = exprs(STUDYID, USUBJID)
)

USUBJID      AESEQ  AETERM                                AESTDTC     TRTSDT      TRTEDT      TRT01A               TRT01P               DTHDT  EOSDT
01-701-1015  1      APPLICATION SITE ERYTHEMA             2014-01-03  2014-01-02  2014-07-02  Placebo              Placebo              NA     2014-07-02
01-701-1015  2      APPLICATION SITE PRURITUS             2014-01-03  2014-01-02  2014-07-02  Placebo              Placebo              NA     2014-07-02
01-701-1015  3      DIARRHOEA                             2014-01-09  2014-01-02  2014-07-02  Placebo              Placebo              NA     2014-07-02
01-701-1023  3      ATRIOVENTRICULAR BLOCK SECOND DEGREE  2012-08-26  2012-08-05  2012-09-01  Placebo              Placebo              NA     2012-09-02
01-701-1023  1      ERYTHEMA                              2012-08-07  2012-08-05  2012-09-01  Placebo              Placebo              NA     2012-09-02
01-701-1023  2      ERYTHEMA                              2012-08-07  2012-08-05  2012-09-01  Placebo              Placebo              NA     2012-09-02
01-701-1023  4      ERYTHEMA                              2012-08-07  2012-08-05  2012-09-01  Placebo              Placebo              NA     2012-09-02
01-703-1086  1      APPLICATION SITE IRRITATION           2012-09-13  2012-09-02  2012-12-04  Xanomeline Low Dose  Xanomeline Low Dose  NA     2012-12-24
01-703-1086  2      APPLICATION SITE IRRITATION           2012-09-13  2012-09-02  2012-12-04  Xanomeline Low Dose  Xanomeline Low Dose  NA     2012-12-24
01-703-1086  3      APPLICATION SITE IRRITATION           2012-09-13  2012-09-02  2012-12-04  Xanomeline Low Dose  Xanomeline Low Dose  NA     2012-12-24
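
To make the merge step more concrete, the following base R sketch shows the kind of left join that this step amounts to conceptually: every AE record is kept and only the selected ADSL variables are attached by the key variables. It is not how admiral implements derive_vars_merged(); the data frames ae_toy and adsl_toy and their values are made up for illustration.

```r
# Toy inputs (hypothetical values, for illustration only)
ae_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("01", "01", "02"),
  AESEQ = c(1, 2, 1),
  stringsAsFactors = FALSE
)
adsl_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("01", "02", "03"),
  TRTSDT = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  AGE = c(60, 55, 70),
  stringsAsFactors = FALSE
)

keys <- c("STUDYID", "USUBJID")

# ADSL must hold one record per subject, otherwise the join would duplicate AE records
stopifnot(!anyDuplicated(adsl_toy[keys]))

# Left join: keep all AE records, add only the ADSL variable needed for derivations
adae_toy <- merge(ae_toy, adsl_toy[c(keys, "TRTSDT")], by = keys, all.x = TRUE)
adae_toy
```

Subject "03" has no AE records and therefore does not appear in the result, and AGE is not carried over because it was not selected.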

Derive/Impute End and Start Analysis Date/time and Relative Day

This part derives ASTDTM, ASTDT, ASTDY, AENDTM, AENDT, and AENDY. The function derive_vars_dtm() can be used to derive ASTDTM and AENDTM where ASTDTM could be company-specific. ASTDT and AENDT can be derived from ASTDTM and AENDTM, respectively, using function derive_vars_dtm_to_dt(). derive_vars_dy() can be used to create ASTDY and AENDY.

adae <- adae %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    min_dates = exprs(TRTSDT)
  ) %>%
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = exprs(DTHDT, EOSDT)
  ) %>%
  derive_vars_dtm_to_dt(exprs(ASTDTM, AENDTM)) %>%
  derive_vars_dy(
    reference_date = TRTSDT,
    source_vars = exprs(ASTDT, AENDT)
  )

USUBJID      AESTDTC     AEENDTC     ASTDTM      ASTDT       ASTDY  AENDTM               AENDT       AENDY
01-701-1015  2014-01-03  NA          2014-01-03  2014-01-03  2      NA                   NA          NA
01-701-1015  2014-01-03  NA          2014-01-03  2014-01-03  2      NA                   NA          NA
01-701-1015  2014-01-09  2014-01-11  2014-01-09  2014-01-09  8      2014-01-11 23:59:59  2014-01-11  10
01-701-1023  2012-08-26  NA          2012-08-26  2012-08-26  22     NA                   NA          NA
01-701-1023  2012-08-07  2012-08-30  2012-08-07  2012-08-07  3      2012-08-30 23:59:59  2012-08-30  26
01-701-1023  2012-08-07  NA          2012-08-07  2012-08-07  3      NA                   NA          NA
01-701-1023  2012-08-07  2012-08-30  2012-08-07  2012-08-07  3      2012-08-30 23:59:59  2012-08-30  26
01-703-1086  2012-09-13  2013-01-02  2012-09-13  2012-09-13  12     2013-01-02 23:59:59  2013-01-02  123
01-703-1086  2012-09-13  2013-01-02  2012-09-13  2012-09-13  12     2013-01-02 23:59:59  2013-01-02  123
01-703-1086  2012-09-13  2013-01-02  2012-09-13  2012-09-13  12     2013-01-02 23:59:59  2013-01-02  123
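
In the example data all dates are complete, so the effect of date_imputation is not visible. As a rough illustration of what "first" and "last" imputation mean for partial dates, here is a simplified base R sketch. The helper impute_partial_date() is hypothetical and is not part of admiral; it handles only the date part, assumes valid ISO 8601 partial dates of the form YYYY, YYYY-MM, or YYYY-MM-DD, and ignores highest_imputation, min_dates, and max_dates.

```r
# Hypothetical helper: impute missing month/day of a partial ISO 8601 date
impute_partial_date <- function(dtc, date_imputation = c("first", "last")) {
  date_imputation <- match.arg(date_imputation)
  imputed <- vapply(dtc, function(x) {
    # Only "YYYY", "YYYY-MM", and "YYYY-MM-DD" are handled in this sketch
    if (is.na(x) || !grepl("^[0-9]{4}(-[0-9]{2}){0,2}$", x)) {
      return(NA_character_)
    }
    parts <- strsplit(x, "-", fixed = TRUE)[[1]]
    year <- parts[1]
    month <- if (length(parts) >= 2) {
      parts[2]
    } else if (date_imputation == "first") {
      "01"
    } else {
      "12"
    }
    day <- if (length(parts) == 3) {
      parts[3]
    } else if (date_imputation == "first") {
      "01"
    } else {
      # Last day of the month = day before the first day of the next month
      first_of_month <- as.Date(paste(year, month, "01", sep = "-"), format = "%Y-%m-%d")
      format(seq(first_of_month, by = "month", length.out = 2)[2] - 1, "%d")
    }
    paste(year, month, day, sep = "-")
  }, character(1), USE.NAMES = FALSE)
  as.Date(imputed, format = "%Y-%m-%d")
}

impute_partial_date(c("2014-02", "2014", "2014-01-09", NA), date_imputation = "last")
```

With "last", a missing day becomes the last day of the month and a missing month becomes December; with "first", both become 01. Complete dates are returned unchanged.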

See also Date and Time Imputation.
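
The relative day values in the table above follow the usual CDISC convention that there is no day 0: the reference date itself is day 1 and the day before it is day -1. A minimal base R sketch of that rule, using a hypothetical helper relative_day() that is not an admiral function:

```r
# Hypothetical helper: relative day without a day 0
relative_day <- function(date, reference_date) {
  diff <- as.integer(date - reference_date)
  # On or after the reference date: add 1; before it: keep the negative difference
  ifelse(diff >= 0, diff + 1L, diff)
}

trtsdt <- as.Date("2014-01-02")
relative_day(as.Date(c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-11")), trtsdt)
# → -1  1  2 10
```

The last two values match ASTDY = 2 and AENDY = 10 for subject 01-701-1015 above.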

Derive Durations

The function derive_vars_duration() can be used to create the variables ADURN and ADURU.

adae <- adae %>%
  derive_vars_duration(
    new_var = ADURN,
    new_var_unit = ADURU,
    start_date = ASTDT,
    end_date = AENDT
  )

USUBJID      AESTDTC     AEENDTC     ASTDT       AENDT       ADURN  ADURU
01-701-1015  2014-01-03  NA          2014-01-03  NA          NA     NA
01-701-1015  2014-01-03  NA          2014-01-03  NA          NA     NA
01-701-1015  2014-01-09  2014-01-11  2014-01-09  2014-01-11  3      DAYS
01-701-1023  2012-08-26  NA          2012-08-26  NA          NA     NA
01-701-1023  2012-08-07  2012-08-30  2012-08-07  2012-08-30  24     DAYS
01-701-1023  2012-08-07  NA          2012-08-07  NA          NA     NA
01-701-1023  2012-08-07  2012-08-30  2012-08-07  2012-08-30  24     DAYS
01-703-1086  2012-09-13  2013-01-02  2012-09-13  2013-01-02  112    DAYS
01-703-1086  2012-09-13  2013-01-02  2012-09-13  2013-01-02  112    DAYS
01-703-1086  2012-09-13  2013-01-02  2012-09-13  2013-01-02  112    DAYS
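
The durations in the table count both the start and the end day, i.e. end date minus start date plus one. A short base R sketch of that arithmetic, with a hypothetical helper duration_days() standing in for the day-based case only:

```r
# Hypothetical helper: duration in days, counting both start and end day by default
duration_days <- function(start_date, end_date, add_one = TRUE) {
  as.integer(end_date - start_date) + as.integer(add_one)
}

duration_days(as.Date("2014-01-09"), as.Date("2014-01-11"))
# → 3
duration_days(as.Date("2012-09-13"), as.Date("2013-01-02"))
# → 112
duration_days(as.Date("2014-01-03"), as.Date(NA))
# → NA
```

A missing end date gives a missing duration, which is why ongoing events have ADURN and ADURU set to NA.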

Derive ATC variables

The function derive_vars_atc() can be used to derive ATC Class Variables.

It helps to add Anatomical Therapeutic Chemical class variables from FACM to ADCM.

The expected result is the input dataset with ATC variables added.

cm <- tibble::tribble(
  ~STUDYID,  ~USUBJID,       ~CMGRPID, ~CMREFID,  ~CMDECOD,
  "STUDY01", "BP40257-1001", "14",     "1192056", "PARACETAMOL",
  "STUDY01", "BP40257-1001", "18",     "2007001", "SOLUMEDROL",
  "STUDY01", "BP40257-1002", "19",     "2791596", "SPIRONOLACTONE"
)
facm <- tibble::tribble(
  ~STUDYID,  ~USUBJID,       ~FAGRPID, ~FAREFID,  ~FATESTCD,  ~FASTRESC,
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC1CD", "N",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC2CD", "N02",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC3CD", "N02B",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC4CD", "N02BE",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC1CD", "D",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC2CD", "D10",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC3CD", "D10A",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC4CD", "D10AA",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC1CD", "D",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC2CD", "D07",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC3CD", "D07A",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC4CD", "D07AA",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC1CD", "H",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC2CD", "H02",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC3CD", "H02A",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC4CD", "H02AB",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC1CD", "C",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC2CD", "C03",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC3CD", "C03D",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC4CD", "C03DA"
)

derive_vars_atc(cm, dataset_facm = facm, id_vars = exprs(FAGRPID))
#> # A tibble: 5 × 9
#>   STUDYID USUBJID      CMGRPID CMREFID CMDECOD       ATC1CD ATC2CD ATC3CD ATC4CD
#>   <chr>   <chr>        <chr>   <chr>   <chr>         <chr>  <chr>  <chr>  <chr> 
#> 1 STUDY01 BP40257-1001 14      1192056 PARACETAMOL   N      N02    N02B   N02BE 
#> 2 STUDY01 BP40257-1001 18      2007001 SOLUMEDROL    D      D10    D10A   D10AA 
#> 3 STUDY01 BP40257-1001 18      2007001 SOLUMEDROL    D      D07    D07A   D07AA 
#> 4 STUDY01 BP40257-1001 18      2007001 SOLUMEDROL    H      H02    H02A   H02AB 
#> 5 STUDY01 BP40257-1002 19      2791596 SPIRONOLACTO… C      C03    C03D   C03DA
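
Conceptually, this derivation turns the long FACM records into one row per ATC group and joins them to CM by subject and reference ID. The following base R sketch illustrates that idea with reshape() and merge(); it is an assumption-laden illustration on made-up data (facm_toy, cm_toy), not the way derive_vars_atc() is implemented.

```r
# Toy inputs (hypothetical, reduced to two ATC levels)
cm_toy <- data.frame(
  USUBJID = "BP40257-1001",
  CMREFID = "2007001",
  CMDECOD = "SOLUMEDROL",
  stringsAsFactors = FALSE
)
facm_toy <- data.frame(
  USUBJID = "BP40257-1001",
  FAGRPID = rep(c("1", "2"), each = 2),
  FAREFID = "2007001",
  FATESTCD = rep(c("CMATC1CD", "CMATC2CD"), times = 2),
  FASTRESC = c("D", "D10", "D", "D07"),
  stringsAsFactors = FALSE
)

# Long to wide: one row per subject, reference ID, and ATC group
wide <- reshape(
  facm_toy,
  idvar = c("USUBJID", "FAREFID", "FAGRPID"),
  timevar = "FATESTCD",
  direction = "wide"
)
# "FASTRESC.CMATC1CD" becomes "ATC1CD", and so on
names(wide) <- sub("^FASTRESC\\.CM", "", names(wide))

# Join to CM; a medication with several ATC groups yields several records
result <- merge(
  cm_toy, wide,
  by.x = c("USUBJID", "CMREFID"),
  by.y = c("USUBJID", "FAREFID"),
  all.x = TRUE
)
result$FAGRPID <- NULL
result
```

This is why SOLUMEDROL appears three times in the output above: it has three ATC groups in FACM.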

Derive Planned and Actual Treatment

TRTA and TRTP must match at least one value of the character treatment variables in ADSL (e.g., TRTxxA/TRTxxP, TRTSEQA/TRTSEQP, TRxxAGy/TRxxPGy).

An example of a simple implementation for a study without periods could be:

adae <- mutate(adae, TRTP = TRT01P, TRTA = TRT01A)

count(adae, TRTP, TRTA, TRT01P, TRT01A)
#> # A tibble: 2 × 5
#>   TRTP                TRTA                TRT01P              TRT01A           n
#>   <chr>               <chr>               <chr>               <chr>        <int>
#> 1 Placebo             Placebo             Placebo             Placebo         10
#> 2 Xanomeline Low Dose Xanomeline Low Dose Xanomeline Low Dose Xanomeline …     6

For studies with periods see the “Visit and Period Variables” vignette.

Derive Date/Date-time of Last Dose

The function derive_vars_joined() can be used to derive the last dose date before the start of the event.

ex_single <- derive_vars_dtm(
  ex_single,
  dtc = EXSTDTC,
  new_vars_prefix = "EXST",
  flag_imputation = "none"
)

adae <- derive_vars_joined(
  adae,
  ex_single,
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(LDOSEDTM = EXSTDTM),
  join_vars = exprs(EXSTDTM),
  join_type = "all",
  order = exprs(EXSTDTM),
  filter_add = (EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) & !is.na(EXSTDTM),
  filter_join = EXSTDTM <= ASTDTM,
  mode = "last"
)

USUBJID      AEDECOD                               AESEQ  AESTDTC     AEENDTC     ASTDT       AENDT       LDOSEDTM
01-701-1015  APPLICATION SITE ERYTHEMA             1      2014-01-03  NA          2014-01-03  NA          2014-01-03
01-701-1015  APPLICATION SITE PRURITUS             2      2014-01-03  NA          2014-01-03  NA          2014-01-03
01-701-1015  DIARRHOEA                             3      2014-01-09  2014-01-11  2014-01-09  2014-01-11  2014-01-09
01-701-1023  ATRIOVENTRICULAR BLOCK SECOND DEGREE  3      2012-08-26  NA          2012-08-26  NA          2012-08-26
01-701-1023  ERYTHEMA                              1      2012-08-07  2012-08-30  2012-08-07  2012-08-30  2012-08-07
01-701-1023  ERYTHEMA                              2      2012-08-07  NA          2012-08-07  NA          2012-08-07
01-701-1023  ERYTHEMA                              4      2012-08-07  2012-08-30  2012-08-07  2012-08-30  2012-08-07
01-703-1086  APPLICATION SITE IRRITATION           1      2012-09-13  2013-01-02  2012-09-13  2013-01-02  2012-09-13
01-703-1086  APPLICATION SITE IRRITATION           2      2012-09-13  2013-01-02  2012-09-13  2013-01-02  2012-09-13
01-703-1086  APPLICATION SITE IRRITATION           3      2012-09-13  2013-01-02  2012-09-13  2013-01-02  2012-09-13
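
The join logic can be read as: for each event, look at all doses of the same subject, keep those that started on or before the event start, and take the latest one. A minimal base R sketch of that logic on made-up data (ae_toy and ex_toy are hypothetical; dose filtering on EXDOSE/EXTRT is omitted):

```r
ae_toy <- data.frame(
  USUBJID = c("01", "01", "02"),
  ASTDTM = as.POSIXct(
    c("2020-01-05 00:00:00", "2020-01-01 00:00:00", "2020-02-10 00:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)
ex_toy <- data.frame(
  USUBJID = c("01", "01", "02"),
  EXSTDTM = as.POSIXct(
    c("2020-01-02 08:00:00", "2020-01-04 08:00:00", "2020-02-01 08:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Start with a missing last dose date-time for every event
ae_toy$LDOSEDTM <- as.POSIXct(rep(NA, nrow(ae_toy)), tz = "UTC")

for (i in seq_len(nrow(ae_toy))) {
  # Doses of the same subject that started on or before the event start
  doses <- ex_toy$EXSTDTM[
    ex_toy$USUBJID == ae_toy$USUBJID[i] &
      !is.na(ex_toy$EXSTDTM) &
      ex_toy$EXSTDTM <= ae_toy$ASTDTM[i]
  ]
  # The latest of them is the last dose; events before any dose stay NA
  if (length(doses) > 0) {
    ae_toy$LDOSEDTM[i] <- max(doses)
  }
}
ae_toy
```

The second event starts before the subject's first dose, so its LDOSEDTM stays missing.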

Derive Severity, Causality, and Toxicity Grade

The variables ASEV, AREL, and ATOXGR can be added using simple dplyr::mutate() assignments, if no imputation is required.

adae <- adae %>%
  mutate(
    ASEV = AESEV,
    AREL = AEREL
  )

Derive Treatment Emergent Flag

To derive the treatment emergent flag TRTEMFL, one can call derive_var_trtemfl(). In the example below, we use 30 days in the flag derivation.

adae <- adae %>%
  derive_var_trtemfl(
    trt_start_date = TRTSDT,
    trt_end_date = TRTEDT,
    end_window = 30
  )

USUBJID      TRTSDT      TRTEDT      AESTDTC     ASTDT       TRTEMFL
01-701-1015  2014-01-02  2014-07-02  2014-01-03  2014-01-03  Y
01-701-1015  2014-01-02  2014-07-02  2014-01-03  2014-01-03  Y
01-701-1015  2014-01-02  2014-07-02  2014-01-09  2014-01-09  Y
01-701-1023  2012-08-05  2012-09-01  2012-08-26  2012-08-26  Y
01-701-1023  2012-08-05  2012-09-01  2012-08-07  2012-08-07  Y
01-701-1023  2012-08-05  2012-09-01  2012-08-07  2012-08-07  Y
01-701-1023  2012-08-05  2012-09-01  2012-08-07  2012-08-07  Y
01-703-1086  2012-09-02  2012-12-04  2012-09-13  2012-09-13  Y
01-703-1086  2012-09-02  2012-12-04  2012-09-13  2012-09-13  Y
01-703-1086  2012-09-02  2012-12-04  2012-09-13  2012-09-13  Y
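
For complete dates, the core of the 30-day rule used above can be sketched in base R as "event start on or after treatment start and no later than treatment end plus the window". The helper flag_trtem() below is hypothetical and deliberately simplified: records with any missing date are left unflagged here, which may differ from how derive_var_trtemfl() treats missing dates, and severity or grade changes are not considered.

```r
# Hypothetical helper: simplified treatment-emergent rule for complete dates
flag_trtem <- function(start_date, trt_start_date, trt_end_date, end_window = 0) {
  in_window <- !is.na(start_date) & !is.na(trt_start_date) & !is.na(trt_end_date) &
    start_date >= trt_start_date &
    start_date <= trt_end_date + end_window
  ifelse(in_window, "Y", NA_character_)
}

trtsdt <- as.Date("2012-09-02")
trtedt <- as.Date("2012-12-04")
astdt <- as.Date(c("2012-09-01", "2012-09-13", "2013-01-03", "2013-01-04"))

flag_trtem(astdt, trtsdt, trtedt, end_window = 30)
# → NA  "Y" "Y" NA
```

The third event starts exactly 30 days after the end of treatment and is still flagged; the fourth starts one day later and is not.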

To derive the on-treatment flag (ONTRTFL) in an ADaM dataset with a single occurrence date, we use derive_var_ontrtfl().

The expected result is the input dataset with an additional column named ONTRTFL with a value of "Y" or NA.

If you also want to check an end date, you can add the end_date argument. In this scenario you can set span_period = TRUE if occurrences that started prior to drug intake and were ongoing or ended after that time should be considered on-treatment.

bds1 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-02-24"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-01-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2019-12-31"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds1,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT
)
#> # A tibble: 3 × 5
#>   USUBJID ADT        TRTSDT     TRTEDT     ONTRTFL
#>   <chr>   <date>     <date>     <date>     <chr>  
#> 1 P01     2020-02-24 2020-01-01 2020-03-01 Y      
#> 2 P02     2020-01-01 2020-01-01 2020-03-01 Y      
#> 3 P03     2019-12-31 2020-01-01 2020-03-01 <NA>

bds2 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-07-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-04-30"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2020-03-15"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds2,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT,
  ref_end_window = 60
)
#> # A tibble: 3 × 5
#>   USUBJID ADT        TRTSDT     TRTEDT     ONTRTFL
#>   <chr>   <date>     <date>     <date>     <chr>  
#> 1 P01     2020-07-01 2020-01-01 2020-03-01 <NA>   
#> 2 P02     2020-04-30 2020-01-01 2020-03-01 Y      
#> 3 P03     2020-03-15 2020-01-01 2020-03-01 Y

bds3 <- tibble::tribble(
  ~ADTM,              ~TRTSDTM,           ~TRTEDTM,           ~TPT,
  "2020-01-02T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA,
  "2020-01-01T12:00", "2020-01-01T12:00", "2020-03-01T12:00", "PRE",
  "2019-12-31T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA
) %>%
  mutate(
    ADTM = ymd_hm(ADTM),
    TRTSDTM = ymd_hm(TRTSDTM),
    TRTEDTM = ymd_hm(TRTEDTM)
  )
derive_var_ontrtfl(
  bds3,
  start_date = ADTM,
  ref_start_date = TRTSDTM,
  ref_end_date = TRTEDTM,
  filter_pre_timepoint = TPT == "PRE"
)
#> # A tibble: 3 × 5
#>   ADTM                TRTSDTM             TRTEDTM             TPT   ONTRTFL
#>   <dttm>              <dttm>              <dttm>              <chr> <chr>  
#> 1 2020-01-02 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 <NA>  Y      
#> 2 2020-01-01 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 PRE   <NA>   
#> 3 2019-12-31 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 <NA>  <NA>
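
The examples above only use a start date. To illustrate the idea behind checking an end date together with span_period, here is a simplified base R sketch. The helper flag_ontrt() is hypothetical and only mirrors the rule described in the text (started before treatment and ongoing or ended on or after treatment start); it assumes non-missing start and reference dates and is not the implementation of derive_var_ontrtfl().

```r
# Hypothetical helper: on-treatment rule with an optional "spanning" case
flag_ontrt <- function(start_date, end_date, ref_start_date, ref_end_date,
                       ref_end_window = 0, span_period = FALSE) {
  window_end <- ref_end_date + ref_end_window
  # Occurrence starts inside the on-treatment window
  starts_inside <- start_date >= ref_start_date & start_date <= window_end
  # Occurrence starts before treatment and is ongoing (missing end date)
  # or ends on or after the treatment start
  spans <- start_date < ref_start_date &
    (is.na(end_date) | end_date >= ref_start_date)
  on_trt <- starts_inside | (span_period & spans)
  ifelse(!is.na(on_trt) & on_trt, "Y", NA_character_)
}

astdt <- as.Date(c("2019-12-15", "2019-12-15", "2019-12-15", "2020-02-01"))
aendt <- as.Date(c("2020-01-10", "2019-12-20", NA, "2020-02-05"))
trtsdt <- as.Date("2020-01-01")
trtedt <- as.Date("2020-03-01")

flag_ontrt(astdt, aendt, trtsdt, trtedt, span_period = FALSE)
# → NA  NA  NA  "Y"
flag_ontrt(astdt, aendt, trtsdt, trtedt, span_period = TRUE)
# → "Y" NA  "Y" "Y"
```

Only the spanning case changes the result: the first and third occurrences start before treatment but overlap with it, so they are flagged once span_period is switched on.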

Derive Occurrence Flags

The function derive_var_extreme_flag() can help derive variables such as AOCCIFL, AOCCPIFL, AOCCSIFL, and AOCCzzFL.

If grades were collected, the following can be used to flag first occurrence of maximum toxicity grade.

adae <- adae %>%
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars = exprs(USUBJID),
      order = exprs(desc(ATOXGR), ASTDTM, AESEQ),
      new_var = AOCCIFL,
      mode = "first"
    ),
    filter = TRTEMFL == "Y"
  )

Similarly, ASEV can also be used to derive the occurrence flags, if severity is collected. In this case, the variable will need to be recoded to a numeric variable. Flag first occurrence of most severe adverse event:

adae <- adae %>%
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars = exprs(USUBJID),
      order = exprs(
        as.integer(factor(
          ASEV,
          levels = c("DEATH THREATENING", "SEVERE", "MODERATE", "MILD")
        )),
        ASTDTM, AESEQ
      ),
      new_var = AOCCIFL,
      mode = "first"
    ),
    filter = TRTEMFL == "Y"
  )

USUBJID      ASTDTM      ASEV      AESEQ  TRTEMFL  AOCCIFL
01-701-1015  2014-01-03  MILD      1      Y        Y
01-701-1015  2014-01-03  MILD      2      Y        NA
01-701-1015  2014-01-09  MILD      3      Y        NA
01-701-1023  2012-08-07  MODERATE  2      Y        Y
01-701-1023  2012-08-07  MILD      1      Y        NA
01-701-1023  2012-08-07  MILD      4      Y        NA
01-701-1023  2012-08-26  MILD      3      Y        NA
01-703-1086  2012-09-13  SEVERE    3      Y        Y
01-703-1086  2012-09-13  MODERATE  2      Y        NA
01-703-1086  2012-09-13  MILD      1      Y        NA
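
The flagging rule can be read as: among treatment-emergent records, sort each subject's events by severity (most severe first), then start date, then sequence number, and flag the first one. A base R sketch of that idea on made-up data (adae_occ is hypothetical; this is not how derive_var_extreme_flag() is implemented):

```r
adae_occ <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  AESEQ = c(1, 2, 3, 1, 2),
  ASTDT = as.Date(c("2020-01-03", "2020-01-05", "2020-01-09", "2020-02-01", "2020-02-02")),
  ASEV = c("MILD", "SEVERE", "SEVERE", "MODERATE", "MILD"),
  TRTEMFL = c("Y", "Y", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# Recode severity so that a smaller number means "more severe"
sev_rank <- as.integer(factor(adae_occ$ASEV, levels = c("SEVERE", "MODERATE", "MILD")))

# Restrict to treatment-emergent records, then sort within subject
eligible <- which(adae_occ$TRTEMFL %in% "Y")
ord <- eligible[order(
  adae_occ$USUBJID[eligible],
  sev_rank[eligible],
  adae_occ$ASTDT[eligible],
  adae_occ$AESEQ[eligible]
)]

# The first sorted record per subject gets the flag
first_rows <- ord[!duplicated(adae_occ$USUBJID[ord])]
adae_occ$AOCCIFL <- NA_character_
adae_occ$AOCCIFL[first_rows] <- "Y"
adae_occ
```

For subject "02" the MODERATE event is not treatment emergent, so the MILD event is flagged instead.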

Derive Query Variables

For deriving query variables SMQzzNAM, SMQzzCD, SMQzzSC, SMQzzSCN, or CQzzNAM the derive_vars_query() function can be used. As input it expects a queries dataset, which provides the definition of the queries. See Queries dataset documentation for a detailed description of the queries dataset. The create_query_data() function can be used to create queries datasets.

The following example shows how to derive query variables for Standardized MedDRA Queries (SMQs) in ADAE.

queries <- admiral::queries

PREFIX  GRPNAME                                  GRPID     SCOPE   SCOPEN  SRCVAR   TERMCHAR                                    TERMNUM
CQ01    Dermatologic events                      NA        NA      NA      AELLT    APPLICATION SITE ERYTHEMA                   NA
CQ01    Dermatologic events                      NA        NA      NA      AELLT    APPLICATION SITE PRURITUS                   NA
CQ01    Dermatologic events                      NA        NA      NA      AELLT    ERYTHEMA                                    NA
CQ01    Dermatologic events                      NA        NA      NA      AELLT    LOCALIZED ERYTHEMA                          NA
CQ01    Dermatologic events                      NA        NA      NA      AELLT    GENERALIZED PRURITUS                        NA
SMQ02   Immune-Mediated Hypothyroidism           20000160  BROAD   1       AEDECOD  BIOPSY THYROID GLAND ABNORMAL               NA
SMQ02   Immune-Mediated Hypothyroidism           20000160  BROAD   1       AEDECOD  BLOOD THYROID STIMULATING HORMONE ABNORMAL  NA
SMQ02   Immune-Mediated Hypothyroidism           20000160  NARROW  2       AEDECOD  BIOPSY THYROID GLAND INCREASED              NA
SMQ03   Immune-Mediated Guillain-Barre Syndrome  20000131  NARROW  2       AEDECOD  GUILLAIN-BARRE SYNDROME                     NA
SMQ03   Immune-Mediated Guillain-Barre Syndrome  20000131  NARROW  2       AEDECOD  MILLER FISHER SYNDROME                      NA

adae1 <- tibble::tribble(
  ~USUBJID, ~ASTDTM, ~AETERM, ~AESEQ, ~AEDECOD, ~AELLT, ~AELLTCD,
  "01", "2020-06-02 23:59:59", "ALANINE AMINOTRANSFERASE ABNORMAL",
  3, "Alanine aminotransferase abnormal", NA_character_, NA_integer_,
  "02", "2020-06-05 23:59:59", "BASEDOW'S DISEASE",
  5, "Basedow's disease", NA_character_, 1L,
  "03", "2020-06-07 23:59:59", "SOME TERM",
  2, "Some query", "Some term", NA_integer_,
  "05", "2020-06-09 23:59:59", "ALVEOLAR PROTEINOSIS",
  7, "Alveolar proteinosis", NA_character_, NA_integer_
)

adae_query <- derive_vars_query(dataset = adae1, dataset_queries = queries)
USUBJID ASTDTM AETERM AESEQ AEDECOD AELLT AELLTCD SMQ02NAM SMQ02CD SMQ02SC SMQ02SCN SMQ03NAM SMQ03CD SMQ03SC SMQ03SCN SMQ05NAM SMQ05CD SMQ05SC SMQ05SCN CQ01NAM CQ04NAM CQ04CD CQ06NAM CQ06CD
01 2020-06-02 23:59:59 ALANINE AMINOTRANSFERASE ABNORMAL 3 Alanine aminotransferase abnormal NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
02 2020-06-05 23:59:59 BASEDOW’S DISEASE 5 Basedow’s disease NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA Immune-Mediated Colitis 10009888
03 2020-06-07 23:59:59 SOME TERM 2 Some query Some term NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
05 2020-06-09 23:59:59 ALVEOLAR PROTEINOSIS 7 Alveolar proteinosis NA NA NA NA NA NA NA NA NA NA Immune-Mediated Pneumonitis 20000042 NARROW 2 NA NA NA NA NA

Similarly to SMQ, the derive_vars_query() function can be used to derive Standardized Drug Groupings (SDG).

sdg <- tibble::tribble(
  ~PREFIX, ~GRPNAME,          ~GRPID, ~SCOPE,  ~SCOPEN, ~SRCVAR,   ~TERMCHAR,          ~TERMNUM,
  "SDG01", "Diuretics",           11, "BROAD", 1,       "CMDECOD", "Diuretic 1",       NA,
  "SDG01", "Diuretics",           11, "BROAD", 1,       "CMDECOD", "Diuretic 2",       NA,
  "SDG02", "Corticosteroids",     12, "BROAD", 1,       "CMDECOD", "Corticosteroid 1", NA,
  "SDG02", "Corticosteroids",     12, "BROAD", 1,       "CMDECOD", "Corticosteroid 2", NA,
  "SDG02", "Corticosteroids",     12, "BROAD", 1,       "CMDECOD", "Corticosteroid 3", NA
)
adcm <- tibble::tribble(
  ~USUBJID, ~ASTDTM,               ~CMDECOD,
  "01",     "2020-06-02 23:59:59", "Diuretic 1",
  "02",     "2020-06-05 23:59:59", "Diuretic 1",
  "03",     "2020-06-07 23:59:59", "Corticosteroid 2",
  "05",     "2020-06-09 23:59:59", "Diuretic 2"
)
adcm_query <- derive_vars_query(adcm, sdg)

USUBJID  ASTDTM               CMDECOD           SDG01NAM   SDG01CD  SDG01SC  SDG01SCN  SDG02NAM         SDG02CD  SDG02SC  SDG02SCN
01       2020-06-02 23:59:59  Diuretic 1        Diuretics  11       BROAD    1         NA               NA       NA       NA
02       2020-06-05 23:59:59  Diuretic 1        Diuretics  11       BROAD    1         NA               NA       NA       NA
03       2020-06-07 23:59:59  Corticosteroid 2  NA         NA       NA       NA        Corticosteroids  12       BROAD    1
05       2020-06-09 23:59:59  Diuretic 2        Diuretics  11       BROAD    1         NA               NA       NA       NA
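
At its core, the query derivation is a lookup: for every query prefix, the value of the source variable named in SRCVAR is matched against the query's terms, and the group name is filled in where a term matches. The following base R sketch shows only that lookup for the name variable. The data (queries_toy, adcm_toy) and the group "Antihistamines" are made up, and the sketch assumes one source variable per prefix and character terms only; the code, scope, and numeric term handling of derive_vars_query() are left out.

```r
queries_toy <- data.frame(
  PREFIX = c("SDG01", "SDG01", "SDG02"),
  GRPNAME = c("Diuretics", "Diuretics", "Antihistamines"),
  SRCVAR = "CMDECOD",
  TERMCHAR = c("Diuretic 1", "Diuretic 2", "Antihistamine 1"),
  stringsAsFactors = FALSE
)
adcm_toy <- data.frame(
  USUBJID = c("01", "02", "03"),
  CMDECOD = c("Diuretic 2", "Antihistamine 1", "Some other drug"),
  stringsAsFactors = FALSE
)

for (prefix in unique(queries_toy$PREFIX)) {
  query <- queries_toy[queries_toy$PREFIX == prefix, ]
  # Position of each record's term within this query's terms (NA if no match)
  hit <- match(adcm_toy[[query$SRCVAR[1]]], query$TERMCHAR)
  # <PREFIX>NAM holds the group name for matching records, NA otherwise
  adcm_toy[[paste0(prefix, "NAM")]] <- query$GRPNAME[hit]
}
adcm_toy
```

Each prefix produces its own set of variables, which is why records that match no term of a query have NA in all of that query's columns.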

Add the ADSL variables

If needed, the other ADSL variables can now be added:

adae <- adae %>%
  derive_vars_merged(
    dataset_add = select(adsl, !!!negate_vars(adsl_vars)),
    by_vars = exprs(STUDYID, USUBJID)
  )

USUBJID      AEDECOD                               ASTDTM      DTHDT  RFSTDTC     RFENDTC     AGE  AGEU   SEX
01-701-1015  APPLICATION SITE ERYTHEMA             2014-01-03  NA     2014-01-02  2014-07-02  63   YEARS  F
01-701-1015  APPLICATION SITE PRURITUS             2014-01-03  NA     2014-01-02  2014-07-02  63   YEARS  F
01-701-1015  DIARRHOEA                             2014-01-09  NA     2014-01-02  2014-07-02  63   YEARS  F
01-701-1023  ERYTHEMA                              2012-08-07  NA     2012-08-05  2012-09-02  64   YEARS  M
01-701-1023  ERYTHEMA                              2012-08-07  NA     2012-08-05  2012-09-02  64   YEARS  M
01-701-1023  ERYTHEMA                              2012-08-07  NA     2012-08-05  2012-09-02  64   YEARS  M
01-701-1023  ATRIOVENTRICULAR BLOCK SECOND DEGREE  2012-08-26  NA     2012-08-05  2012-09-02  64   YEARS  M
01-703-1086  APPLICATION SITE IRRITATION           2012-09-13  NA     2012-09-02  2012-12-24  71   YEARS  M
01-703-1086  APPLICATION SITE IRRITATION           2012-09-13  NA     2012-09-02  2012-12-24  71   YEARS  M
01-703-1086  APPLICATION SITE IRRITATION           2012-09-13  NA     2012-09-02  2012-12-24  71   YEARS  M

Derive Analysis Sequence Number

The function derive_var_obs_number() can be used to derive the ASEQ variable, which ensures that subject records within the dataset are unique.

For example, there can be multiple records present in ADCM for a single subject with the same ASTDTM and CMSEQ variables. But these records still differ at ATC level:

adcm <- tibble::tribble(
  ~USUBJID,       ~ASTDTM,          ~CMSEQ, ~CMDECOD,         ~ATC1CD, ~ATC2CD, ~ATC3CD, ~ATC4CD,
  "BP40257-1001", "2013-07-05 UTC", "14",   "PARACETAMOL",    "N",     "N02",   "N02B",  "N02BE",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "D",     "D10",   "D10A",  "D10AA",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "D",     "D07",   "D07A",  "D07AA",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "H",     "H02",   "H02A",  "H02AB",
  "BP40257-1002", "2012-12-15 UTC", "19",   "SPIRONOLACTONE", "C",     "C03",   "C03D",  "C03DA"
)

adcm_aseq <- adcm %>%
  derive_var_obs_number(
    by_vars    = exprs(USUBJID),
    order      = exprs(ASTDTM, CMSEQ, ATC1CD, ATC2CD, ATC3CD, ATC4CD),
    new_var    = ASEQ,
    check_type = "error"
  )

USUBJID       ASTDTM          CMSEQ  CMDECOD         ATC1CD  ATC2CD  ATC3CD  ATC4CD  ASEQ
BP40257-1001  2013-07-05 UTC  14     PARACETAMOL     N       N02     N02B    N02BE   1
BP40257-1001  2013-08-15 UTC  18     SOLUMEDROL      D       D07     D07A    D07AA   2
BP40257-1001  2013-08-15 UTC  18     SOLUMEDROL      D       D10     D10A    D10AA   3
BP40257-1001  2013-08-15 UTC  18     SOLUMEDROL      H       H02     H02A    H02AB   4
BP40257-1002  2012-12-15 UTC  19     SPIRONOLACTONE  C       C03     C03D    C03DA   1
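
The sequence number is simply the position of each record within its subject after sorting by the order variables, with a check that those variables identify every record uniquely. A base R sketch of this idea on made-up data (adcm_seq is hypothetical, and only one ATC level is used to keep it short):

```r
adcm_seq <- data.frame(
  USUBJID = c("1002", "1001", "1001", "1001"),
  ASTDT = as.Date(c("2012-12-15", "2013-08-15", "2013-07-05", "2013-08-15")),
  CMSEQ = c(19, 18, 14, 18),
  ATC2CD = c("C03", "D10", "N02", "D07"),
  stringsAsFactors = FALSE
)

order_vars <- c("USUBJID", "ASTDT", "CMSEQ", "ATC2CD")

# Comparable in spirit to check_type = "error": stop if records are not unique
stopifnot(!anyDuplicated(adcm_seq[order_vars]))

# Sort by subject and order variables, then number the records within each subject
adcm_seq <- adcm_seq[do.call(order, unname(as.list(adcm_seq[order_vars]))), ]
adcm_seq$ASEQ <- ave(seq_len(nrow(adcm_seq)), adcm_seq$USUBJID, FUN = seq_along)
adcm_seq
```

Without ATC2CD in the order variables, the two records with CMSEQ 18 on 2013-08-15 would be tied and the uniqueness check would fail.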

Add Labels and Attributes

Adding labels and attributes for SAS transport files is supported by the following packages:

  • metacore: establish a common foundation for the use of metadata within an R session.

  • metatools: enable the use of metacore objects. Metatools can be used to build datasets or enhance columns in existing datasets as well as checking datasets against the metadata.

  • xportr: functionality to associate all metadata information to a local R data frame, perform dataset-level validation checks, and convert into a transport v5 file (xpt).

NOTE: All these packages are in the experimental phase, but the vision is to have them associated with an End to End pipeline under the umbrella of the pharmaverse. An example of applying metadata and performing associated checks can be found at the pharmaverse E2E example.

Example Scripts

ADaM  Sourcing Command
ADAE  use_ad_template("ADAE")
ADCM  use_ad_template("ADCM")