Introduction

This vignette describes how to create questionnaire ADaMs. Although questionnaire data is collected in a single SDTM dataset (QS), it usually does not make sense to create a single ADQS dataset for all questionnaire analyses. For example, a univariate analysis of scores by visit requires different variables than a time-to-event analysis. Therefore, this vignette does not provide a programming workflow for a complete dataset but instead provides examples for deriving common types of questionnaire parameters.

At the moment, {admiral} does not provide functions or metadata for specific questionnaires, nor functionality for handling the vast number of questionnaires and related parameters, e.g. a metadata structure for storing parameter definitions and functions for reading such metadata. We plan to provide these in future releases.

Note: All examples assume CDISC SDTM and/or ADaM format as input unless otherwise specified.

Required Packages

The examples of this vignette require the following packages.

library(dplyr)
library(tidyr)
library(tibble)
library(stringr) # str_detect() is used in the filter conditions below
library(admiral)

Example Data

In this vignette we use the example data from the CDISC ADaM Supplements (Generalized Anxiety Disorder 7-Item Version 2 (GAD-7), Geriatric Depression Scale Short Form (GDS-SF))¹:

qs <- admiral::example_qs

STUDYID	USUBJID	QSSEQ	QSTESTCD	QSTEST	VISIT	VISITNUM	QSDTC	QSORRES	QSCAT	QSSTRESN
STUDYX	P0001	1	GAD0201	GAD02-Feeling Nervous Anxious or On Edge	VISIT 1	1	2012-11-16	More than half the days	GAD-7 V2	2
STUDYX	P0001	2	GAD0202	GAD02-Not Able to Stop/Control Worrying	VISIT 1	1	2012-11-16	More than half the days	GAD-7 V2	2
STUDYX	P0001	3	GAD0203	GAD02-Worrying Too Much About Things	VISIT 1	1	2012-11-16	More than half the days	GAD-7 V2	2
STUDYX	P0001	4	GAD0204	GAD02-Trouble Relaxing	VISIT 1	1	2012-11-16	More than half the days	GAD-7 V2	2
STUDYX	P0001	5	GAD0205	GAD02-Being Restless Hard to Sit Still	VISIT 1	1	2012-11-16	Nearly every day	GAD-7 V2	3
STUDYX	P0001	6	GAD0206	GAD02-Becoming Easily Annoyed/Irritable	VISIT 1	1	2012-11-16	Nearly every day	GAD-7 V2	3
STUDYX	P0001	7	GAD0207	GAD02-Feel Afraid/Something Awful Happen	VISIT 1	1	2012-11-16	Several days	GAD-7 V2	1
STUDYX	P0001	8	GAD0208	GAD02-Total Score	VISIT 1	1	2012-11-16	15	GAD-7 V2	15
STUDYX	P0001	9	GAD0201	GAD02-Feeling Nervous Anxious or On Edge	UNSCHEDULED VISIT 5.01	501	2013-04-15	More than half the days	GAD-7 V2	2
STUDYX	P0001	10	GAD0202	GAD02-Not Able to Stop/Control Worrying	UNSCHEDULED VISIT 5.01	501	2013-04-15	More than half the days	GAD-7 V2	2

adsl <- tribble(
  ~STUDYID, ~USUBJID, ~SITEID, ~ITTFL, ~TRTSDT,                      ~DTHCAUS,
  "STUDYX",  "P0001",     13L,    "Y", lubridate::ymd("2012-11-16"), NA_character_,
  "STUDYX",  "P0002",     11L,    "Y", lubridate::ymd("2012-11-16"), "PROGRESSIVE DISEASE"
)

STUDYID	USUBJID	SITEID	ITTFL	TRTSDT	DTHCAUS
STUDYX	P0001	13	Y	2012-11-16	NA
STUDYX	P0002	11	Y	2012-11-16	PROGRESSIVE DISEASE

Original Items

The original items, i.e. the answers to the questionnaire questions, can be handled in the same way as in a BDS finding ADaM. For example:

adqs <- qs %>%
  # Add ADSL variables
  derive_vars_merged(
    dataset_add = adsl,
    new_vars = exprs(TRTSDT, DTHCAUS),
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%
  # Add analysis parameter variables
  mutate(
    PARAMCD = QSTESTCD,
    PARAM = QSTEST,
    PARCAT1 = QSCAT,
    AVALC = QSORRES,
    AVAL = QSSTRESN
  ) %>%
  # Add timing variables
  derive_vars_dt(new_vars_prefix = "A", dtc = QSDTC) %>%
  derive_vars_dy(reference_date = TRTSDT, source_vars = exprs(ADT)) %>%
  mutate(
    AVISIT = if_else(ADT <= TRTSDT, "BASELINE", VISIT),
    AVISITN = if_else(ADT <= TRTSDT, 0, VISITNUM)
  )

USUBJID	PARAMCD	PARAM	PARCAT1	AVALC	AVAL	ADY	AVISIT
P0001	GAD0201	GAD02-Feeling Nervous Anxious or On Edge	GAD-7 V2	More than half the days	2	1	BASELINE
P0001	GAD0202	GAD02-Not Able to Stop/Control Worrying	GAD-7 V2	More than half the days	2	1	BASELINE
P0001	GAD0203	GAD02-Worrying Too Much About Things	GAD-7 V2	More than half the days	2	1	BASELINE
P0001	GAD0204	GAD02-Trouble Relaxing	GAD-7 V2	More than half the days	2	1	BASELINE
P0001	GAD0205	GAD02-Being Restless Hard to Sit Still	GAD-7 V2	Nearly every day	3	1	BASELINE
P0001	GAD0206	GAD02-Becoming Easily Annoyed/Irritable	GAD-7 V2	Nearly every day	3	1	BASELINE
P0001	GAD0207	GAD02-Feel Afraid/Something Awful Happen	GAD-7 V2	Several days	1	1	BASELINE
P0001	GAD0208	GAD02-Total Score	GAD-7 V2	15	15	1	BASELINE
P0001	GAD0201	GAD02-Feeling Nervous Anxious or On Edge	GAD-7 V2	More than half the days	2	151	UNSCHEDULED VISIT 5.01
P0001	GAD0202	GAD02-Not Able to Stop/Control Worrying	GAD-7 V2	More than half the days	2	151	UNSCHEDULED VISIT 5.01

We handle unscheduled visits as normal visits. For deriving visits based on time windows, see Visit and Period Variables. For flagging values to be used for analysis, see derive_var_extreme_flag().

Transformed Items

Please note that in the example data the numeric values of the answers are mapped in SDTM (QSSTRESN) such that they can be used for deriving scores. Depending on the question, QSORRES == "YES" is mapped to QSSTRESN = 0 or QSSTRESN = 1. If the QSSTRESN values are not ready to be used for deriving scores and require transformation, it is recommended to keep QSSTRESN in the ADaM dataset for traceability and to store the transformed value in AVAL, because AVAL is what the score calculation uses.

It may also be necessary to transform the range of the numeric values of the original items, for example if a scale is derived as the average but the ranges of the contributing items differ. In this case the values could be linearly transformed to a unified range like [0, 100]. The computation function transform_range() can be used for the transformation.
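
As a rough illustration of the linear transformation described above, the following base R sketch maps two items with different ranges to [0, 100]. The helper rescale_linear() and the item values are hypothetical and not part of {admiral}; in a real program transform_range() would take this role.

```r
# Hypothetical helper (not an {admiral} function): linearly map values from
# source_range to target_range
rescale_linear <- function(x, source_range, target_range) {
  (x - source_range[1]) / (source_range[2] - source_range[1]) *
    (target_range[2] - target_range[1]) + target_range[1]
}

# Made-up answers: one item on a 1-5 scale, one item on a 0-3 scale
item_a <- c(1, 3, 5, NA)
item_b <- c(0, 1, 3, 2)

# After the transformation both items share the range [0, 100]
item_a_100 <- rescale_linear(item_a, source_range = c(1, 5), target_range = c(0, 100))
item_b_100 <- rescale_linear(item_b, source_range = c(0, 3), target_range = c(0, 100))
```

Missing answers stay missing, so the decision on how many non-missing items are required is left to the score derivation.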

Scales and Scores

Scales and scores are often derived as the sum or the average across a subset of the items. For the GAD-7 questionnaire, the total score is derived as the sum. The derive_summary_records() function with sum() can be used to derive it as a new parameter. Regular expressions, as in the example below, may be helpful for selecting the parameters to be summarized. In the example we derive a separate ADaM dataset for each questionnaire. Depending on the analysis needs, an ADaM may also contain more than one questionnaire or all questionnaires.

adgad7 <- adqs %>%
  # Select records to keep in the GAD-7 ADaM
  filter(PARCAT1 == "GAD-7 V2") %>%
  derive_summary_records(
    dataset = .,
    dataset_add = .,
    by_vars = exprs(STUDYID, USUBJID, AVISIT, ADT, ADY, TRTSDT, DTHCAUS),
    # Select records contributing to total score
    filter_add = str_detect(PARAMCD, "GAD020[1-7]"),
    set_values_to = exprs(
      AVAL = sum(AVAL, na.rm = TRUE),
      PARAMCD = "GAD02TS",
      PARAM = "GAD02-Total Score - Analysis"
    )
  )

USUBJID	PARAMCD	PARAM	AVAL	ADY	AVISIT
P0001	GAD0201	GAD02-Feeling Nervous Anxious or On Edge	2	1	BASELINE
P0001	GAD0202	GAD02-Not Able to Stop/Control Worrying	2	1	BASELINE
P0001	GAD0203	GAD02-Worrying Too Much About Things	2	1	BASELINE
P0001	GAD0204	GAD02-Trouble Relaxing	2	1	BASELINE
P0001	GAD0205	GAD02-Being Restless Hard to Sit Still	3	1	BASELINE
P0001	GAD0206	GAD02-Becoming Easily Annoyed/Irritable	3	1	BASELINE
P0001	GAD0207	GAD02-Feel Afraid/Something Awful Happen	1	1	BASELINE
P0001	GAD0208	GAD02-Total Score	15	1	BASELINE
P0001	GAD02TS	GAD02-Total Score - Analysis	15	1	BASELINE
P0001	GAD0201	GAD02-Feeling Nervous Anxious or On Edge	2	151	UNSCHEDULED VISIT 5.01
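
The record selection and summation can be illustrated in base R with the baseline values of subject P0001 shown above. This is only a sketch of the logic: grepl() stands in for str_detect(), and the vectors are typed in by hand rather than taken from the dataset.

```r
# Baseline records of P0001 as shown in the output above
paramcd <- c(
  "GAD0201", "GAD0202", "GAD0203", "GAD0204",
  "GAD0205", "GAD0206", "GAD0207", "GAD0208"
)
aval <- c(2, 2, 2, 2, 3, 3, 1, 15)

# The regular expression selects the seven items and excludes the collected
# total score (GAD0208)
is_item <- grepl("GAD020[1-7]", paramcd)

# Derived total score; it agrees with the collected total score
total <- sum(aval[is_item], na.rm = TRUE)
```

Note that sum() with na.rm = TRUE treats missing items as zero, so a rule for a minimum number of answered items would have to be added separately if the questionnaire requires one.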

For the GDS-SF questionnaire, the total score is defined as the average of the item values transformed to the range [0, 15] and rounded up to the next integer. If more than five items are missing, the total score is considered missing. This parameter can be derived with compute_scale() and derive_summary_records():

adgdssf <- adqs %>%
  # Select records to keep in the GDS-SF ADaM
  filter(PARCAT1 == "GDS SHORT FORM") %>%
  derive_summary_records(
    dataset = .,
    dataset_add = .,
    by_vars = exprs(STUDYID, USUBJID, AVISIT, ADT, ADY, TRTSDT, DTHCAUS),
    # Select records contributing to total score
    filter_add = str_detect(PARAMCD, "GDS02[01][0-9]"),
    set_values_to = exprs(
      AVAL = compute_scale(
        AVAL,
        source_range = c(0, 1),
        target_range = c(0, 15),
        min_n = 10
      ) %>%
        ceiling(),
      PARAMCD = "GDS02TS",
      PARAM = "GDS02- Total Score - Analysis"
    )
  )

USUBJID	PARAMCD	PARAM	AVAL	ADY	AVISIT
P0001	GDS0201	GDS02-Satisfied With Life	0	1	BASELINE
P0001	GDS0202	GDS02-Dropped Activities and Interests	1	1	BASELINE
P0001	GDS0203	GDS02-Life Is Empty	0	1	BASELINE
P0001	GDS0204	GDS02-Bored Often	1	1	BASELINE
P0001	GDS0205	GDS02-Good Spirits Most of Time	1	1	BASELINE
P0001	GDS0206	GDS02-Afraid of Something Bad Happening	1	1	BASELINE
P0001	GDS0207	GDS02-Feel Happy Most of Time	1	1	BASELINE
P0001	GDS0208	GDS02-Often Feel Helpless	1	1	BASELINE
P0001	GDS0209	GDS02-Prefer to Stay Home	0	1	BASELINE
P0001	GDS0210	GDS02-Memory Problems	1	1	BASELINE
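
The scoring rule can also be spelled out in a few lines of base R. The helper gds_total() below is hypothetical and only mirrors the rule as described in the text (average of the answered items, stretched from [0, 1] to [0, 15], rounded up, missing if fewer than 10 items are answered); it is not the {admiral} implementation, and the item values are made up.

```r
# Hypothetical helper mirroring the rule described in the text
gds_total <- function(items, min_n = 10) {
  answered <- items[!is.na(items)]
  # Fewer than min_n answered items: the score is missing
  if (length(answered) < min_n) {
    return(NA_real_)
  }
  # Average on the [0, 1] scale, stretched to [0, 15], rounded up
  ceiling(mean(answered) * 15)
}

# 15 made-up item values, two of them missing
items <- c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, NA, 0, 1, NA, 0)
score <- gds_total(items)

# Six missing answers leave only nine items, so no score is derived
score_missing <- gds_total(c(rep(1, 9), rep(NA, 6)))
```

With 8 of 13 answered items equal to 1, the average is 8/13, which corresponds to about 9.23 on the [0, 15] scale and is rounded up to 10.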

After deriving the scores by visit, the baseline and change from baseline variables can be derived:

adgdssf <- adgdssf %>%
  # Flag baseline records (last before treatment start)
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars = exprs(STUDYID, USUBJID, PARAMCD),
      order = exprs(ADT),
      new_var = ABLFL,
      mode = "last"
    ),
    filter = !is.na(AVAL) & ADT <= TRTSDT
  ) %>%
  # Derive baseline and change from baseline variables
  derive_var_base(
    by_vars = exprs(STUDYID, USUBJID, PARAMCD),
    source_var = AVAL,
    new_var = BASE
  ) %>%
  # Calculate CHG for post-baseline records
  # The decision on how to populate pre-baseline and baseline values of CHG is left to producer choice
  restrict_derivation(
    derivation = derive_var_chg,
    filter = AVISITN > 0
  ) %>%
  # Calculate PCHG for post-baseline records
  # The decision on how to populate pre-baseline and baseline values of PCHG is left to producer choice
  restrict_derivation(
    derivation = derive_var_pchg,
    filter = AVISITN > 0
  ) %>%
  # Derive sequence number
  derive_var_obs_number(
    by_vars = exprs(STUDYID, USUBJID),
    order = exprs(PARAMCD, ADT),
    check_type = "error"
  )

USUBJID	PARAMCD	PARAM	AVISIT	AVAL	BASE	CHG	PCHG
P0001	GDS0201	GDS02-Satisfied With Life	BASELINE	0	0	NA	NA
P0001	GDS0201	GDS02-Satisfied With Life	VISIT 2	0	0	0	NA
P0001	GDS0201	GDS02-Satisfied With Life	UNSCHEDULED 2.01	0	0	0	NA
P0001	GDS0201	GDS02-Satisfied With Life	VISIT 3	NA	0	NA	NA
P0001	GDS0201	GDS02-Satisfied With Life	VISIT 4	0	0	0	NA
P0001	GDS0202	GDS02-Dropped Activities and Interests	BASELINE	1	1	NA	NA
P0001	GDS0202	GDS02-Dropped Activities and Interests	VISIT 2	0	1	-1	-100
P0001	GDS0202	GDS02-Dropped Activities and Interests	UNSCHEDULED 2.01	0	1	-1	-100
P0001	GDS0202	GDS02-Dropped Activities and Interests	VISIT 3	NA	1	NA	NA
P0001	GDS0202	GDS02-Dropped Activities and Interests	VISIT 4	0	1	-1	-100
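
The arithmetic behind CHG and PCHG can be sketched in base R. The sketch assumes that the percent change is computed relative to the absolute baseline value and is left missing for a baseline of zero, which is consistent with the output above; the data frame is a made-up excerpt for a single parameter.

```r
# Made-up records of one subject and parameter
visits <- data.frame(
  AVISITN = c(0, 2, 3, 4),
  AVAL = c(1, 0, NA, 0),
  BASE = 1
)

# Only post-baseline records are populated, as in the filter AVISITN > 0
post_baseline <- visits$AVISITN > 0

# Change from baseline
visits$CHG <- ifelse(post_baseline, visits$AVAL - visits$BASE, NA_real_)

# Percent change from baseline (assumed to be missing if BASE is zero)
visits$PCHG <- ifelse(
  post_baseline & visits$BASE != 0,
  visits$CHG / abs(visits$BASE) * 100,
  NA_real_
)
```

A missing AVAL, as at AVISITN 3 here, propagates to both CHG and PCHG.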

Time to Deterioration/Improvement

Because time-to-event parameters require specific variables like CNSR, STARTDT, and EVNTDESC, it makes sense to create a separate time-to-event dataset for them. However, it might be useful to create flags or categorization variables in ADQS. For example:

# Create AVALCATx lookup table
avalcat_lookup <- exprs(
  ~PARAMCD, ~condition, ~AVALCAT1, ~AVALCAT1N,
  "GDS02TS", AVAL <= 5, "Normal", 0L,
  "GDS02TS", AVAL <= 10 & AVAL > 5, "Possible Depression", 1L,
  "GDS02TS", AVAL > 10, "Likely Depression", 2L
)
# Create CHGCAT1 lookup table
chgcat_lookup <- exprs(
  ~condition, ~CHGCAT1,
  AVALCAT1N > BASECA1N, "WORSENED",
  AVALCAT1N == BASECA1N, "NO CHANGE",
  AVALCAT1N < BASECA1N, "IMPROVED"
)

adgdssf <- adgdssf %>%
  derive_vars_cat(
    definition = avalcat_lookup,
    by_vars = exprs(PARAMCD)
  ) %>%
  derive_var_base(
    by_vars = exprs(STUDYID, USUBJID, PARAMCD),
    source_var = AVALCAT1,
    new_var = BASECAT1
  ) %>%
  derive_var_base(
    by_vars = exprs(STUDYID, USUBJID, PARAMCD),
    source_var = AVALCAT1N,
    new_var = BASECA1N
  ) %>%
  derive_vars_cat(
    definition = chgcat_lookup
  )

USUBJID	PARAMCD	PARAM	AVISIT	AVAL	AVALCAT1	CHGCAT1
P0001	GDS02TS	GDS02- Total Score - Analysis	BASELINE	10	Possible Depression	NO CHANGE
P0001	GDS02TS	GDS02- Total Score - Analysis	VISIT 2	8	Possible Depression	NO CHANGE
P0001	GDS02TS	GDS02- Total Score - Analysis	UNSCHEDULED 2.01	8	Possible Depression	NO CHANGE
P0001	GDS02TS	GDS02- Total Score - Analysis	VISIT 3	7	Possible Depression	NO CHANGE
P0001	GDS02TS	GDS02- Total Score - Analysis	VISIT 4	3	Normal	IMPROVED
P0001	GDS0215	GDS02-Most People Better Off Than You	BASELINE	0	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	VISIT 2	0	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	UNSCHEDULED 2.01	0	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	VISIT 3	0	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	VISIT 4	0	NA	NA
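
To make the categorization rules concrete, the following base R sketch applies the same cut points to the GDS02TS values of subject P0001 shown above. It is only an illustration of the logic with cut() and ifelse() and assumes that the first record is the baseline record; it is not a replacement for derive_vars_cat().

```r
# GDS02TS values of P0001 as shown in the output above (baseline first)
aval <- c(10, 8, 8, 7, 3)

# Intervals (-Inf, 5], (5, 10], (10, Inf) coded as 0, 1, 2
avalcat1n <- cut(aval, breaks = c(-Inf, 5, 10, Inf), labels = FALSE) - 1L
avalcat1 <- c("Normal", "Possible Depression", "Likely Depression")[avalcat1n + 1L]

# Baseline category (assumption: the first record is the baseline record)
baseca1n <- avalcat1n[1]

# Compare each category with the baseline category
chgcat1 <- ifelse(
  avalcat1n > baseca1n, "WORSENED",
  ifelse(avalcat1n == baseca1n, "NO CHANGE", "IMPROVED")
)
```

The result reproduces the AVALCAT1 and CHGCAT1 values in the output above: four records with "NO CHANGE" followed by one "IMPROVED" record.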

Then a time to deterioration parameter can be derived by:

# Define event
deterioration_event <- event_source(
  dataset_name = "adqs",
  filter = PARAMCD == "GDS02TS" & CHGCAT1 == "WORSENED",
  date = ADT,
  set_values_to = exprs(
    EVNTDESC = "DEPRESSION WORSENED",
    SRCDOM = "ADQS",
    SRCVAR = "ADT",
    SRCSEQ = ASEQ
  )
)

# Define censoring at last assessment
last_valid_assessment <- censor_source(
  dataset_name = "adqs",
  filter = PARAMCD == "GDS02TS" & !is.na(CHGCAT1),
  date = ADT,
  set_values_to = exprs(
    EVNTDESC = "LAST ASSESSMENT",
    SRCDOM = "ADQS",
    SRCVAR = "ADT",
    SRCSEQ = ASEQ
  )
)

# Define censoring at treatment start (for subjects without assessment)
start <- censor_source(
  dataset_name = "adsl",
  date = TRTSDT,
  set_values_to = exprs(
    EVNTDESC = "TREATMENT START",
    SRCDOM = "ADSL",
    SRCVAR = "TRTSDT"
  )
)

adgdstte <- derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl, adqs = adgdssf),
  start_date = TRTSDT,
  event_conditions = list(deterioration_event),
  censor_conditions = list(last_valid_assessment, start),
  set_values_to = exprs(
    PARAMCD = "TTDEPR",
    PARAM = "Time to depression"
  )
) %>%
  derive_vars_duration(
    new_var = AVAL,
    start_date = STARTDT,
    end_date = ADT
  )

USUBJID	PARAMCD	PARAM	AVAL	CNSR	EVNTDESC	SRCDOM	SRCVAR
P0001	TTDEPR	Time to depression	90	1	LAST ASSESSMENT	ADQS	ADT
P0002	TTDEPR	Time to depression	30	0	DEPRESSION WORSENED	ADQS	ADT
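
The event and censoring logic for a single subject can be sketched in base R as follows. The assessment dates and flags are made up, and the sketch assumes that the duration counts both the start day and the end day, which is consistent with AVAL = 90 for a last assessment at ADY = 90 in the output above.

```r
# Made-up assessments for one subject without any worsening
trtsdt <- as.Date("2012-11-16")
adt <- as.Date(c("2012-11-16", "2012-12-15", "2013-02-13"))
worsened <- c(FALSE, FALSE, FALSE)

if (any(worsened)) {
  # Event: first assessment with a worsening
  event_date <- min(adt[worsened])
  cnsr <- 0
} else {
  # Censoring: last available assessment
  event_date <- max(adt)
  cnsr <- 1
}

# Duration in days, counting both the start day and the end day (assumption)
aval <- as.numeric(event_date - trtsdt) + 1
```

For a subject without any assessment, the censoring at treatment start would apply instead, which this sketch does not cover.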

Time to Confirmed/Definitive Deterioration/Improvement

The derivation of confirmed/definitive deterioration/improvement parameters is very similar to that of the unconfirmed deterioration parameters, except that the event is not based on CHGCATy but on a confirmation flag variable. This confirmation flag can be derived by derive_var_joined_exist_flag(). For example, the following call flags deteriorations that are confirmed by a second assessment at least seven days later:

adgdssf <- adgdssf %>%
  derive_var_joined_exist_flag(
    dataset_add = adgdssf,
    by_vars = exprs(USUBJID, PARAMCD),
    order = exprs(ADT),
    new_var = CDETFL,
    join_vars = exprs(CHGCAT1, ADY),
    join_type = "after",
    filter_join = CHGCAT1 == "WORSENED" &
      CHGCAT1.join == "WORSENED" &
      ADY.join >= ADY + 7
  )

USUBJID	PARAMCD	PARAM	ADY	CHGCAT1	CDETFL
P0001	GDS02TS	GDS02- Total Score - Analysis	1	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	30	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	43	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	58	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	90	IMPROVED	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	1	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	30	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	43	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	58	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	90	NA	NA
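
Since subject P0001 has no deterioration, the flag is never set in the output above. The following base R sketch shows the same confirmation rule on made-up data where some records are flagged. It only illustrates the condition in filter_join for one subject and parameter and is not how derive_var_joined_exist_flag() is implemented.

```r
# Made-up assessments for one subject and parameter, ordered by ADY
ady <- c(1, 30, 33, 58, 90)
chgcat1 <- c("NO CHANGE", "WORSENED", "WORSENED", "NO CHANGE", "WORSENED")

# For each record, check whether any later record confirms the deterioration
cdetfl <- vapply(seq_along(ady), function(i) {
  later <- seq_along(ady) > i
  confirmed <- chgcat1[i] == "WORSENED" &&
    any(chgcat1[later] == "WORSENED" & ady[later] >= ady[i] + 7)
  if (confirmed) "Y" else NA_character_
}, character(1))
```

The worsening at day 30 is not confirmed by the one at day 33 (only three days later) but by the one at day 90. The worsening at day 90 has no later assessment and therefore stays unconfirmed.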

For flagging deteriorations at two consecutive assessments or considering death due to progression at the last assessment as confirmation, the tmp_obs_nr_var argument is helpful:

# Flagging deterioration at two consecutive assessments
adgdssf <- adgdssf %>%
  derive_var_joined_exist_flag(
    dataset_add = adgdssf,
    by_vars = exprs(USUBJID, PARAMCD),
    order = exprs(ADT),
    new_var = CONDETFL,
    join_vars = exprs(CHGCAT1),
    join_type = "after",
    tmp_obs_nr_var = tmp_obs_nr,
    filter_join = CHGCAT1 == "WORSENED" &
      CHGCAT1.join == "WORSENED" &
      tmp_obs_nr.join == tmp_obs_nr + 1
  ) %>%
  # Flagging deterioration confirmed by
  # - a second deterioration at least 7 days later or
  # - deterioration at the last assessment and death due to progression
  derive_var_joined_exist_flag(
    .,
    dataset_add = .,
    by_vars = exprs(USUBJID, PARAMCD),
    order = exprs(ADT),
    new_var = CDTDTHFL,
    join_vars = exprs(CHGCAT1, ADY),
    join_type = "all",
    tmp_obs_nr_var = tmp_obs_nr,
    filter_join = CHGCAT1 == "WORSENED" & (
      CHGCAT1.join == "WORSENED" & ADY.join >= ADY + 7 |
        tmp_obs_nr == max(tmp_obs_nr.join) & DTHCAUS == "PROGRESSIVE DISEASE")
  )

USUBJID	PARAMCD	PARAM	ADY	CHGCAT1	CONDETFL	CDTDTHFL
P0001	GDS02TS	GDS02- Total Score - Analysis	1	NO CHANGE	NA	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	30	NO CHANGE	NA	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	43	NO CHANGE	NA	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	58	NO CHANGE	NA	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	90	IMPROVED	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	1	NA	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	30	NA	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	43	NA	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	58	NA	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	90	NA	NA	NA

For definitive deterioration (deterioration at all following assessments), summary functions like all() can be used in the filter condition:

adgdssf <- adgdssf %>%
  derive_var_joined_exist_flag(
    dataset_add = adgdssf,
    by_vars = exprs(USUBJID, PARAMCD),
    order = exprs(ADT),
    new_var = DEFDETFL,
    join_vars = exprs(CHGCAT1),
    join_type = "after",
    filter_join = CHGCAT1 == "WORSENED" & all(CHGCAT1.join == "WORSENED")
  )

USUBJID	PARAMCD	PARAM	ADY	CHGCAT1	DEFDETFL
P0001	GDS02TS	GDS02- Total Score - Analysis	1	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	30	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	43	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	58	NO CHANGE	NA
P0001	GDS02TS	GDS02- Total Score - Analysis	90	IMPROVED	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	1	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	30	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	43	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	58	NA	NA
P0001	GDS0215	GDS02-Most People Better Off Than You	90	NA	NA
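
The rule for definitive deterioration can be sketched in base R on made-up data in the same spirit. The sketch assumes that a record without any later assessment is not flagged, because there is nothing to confirm it; whether this matches the intended analysis rule needs to be checked against the analysis plan.

```r
# Made-up change categories for one subject and parameter, ordered by date
chgcat1 <- c("NO CHANGE", "WORSENED", "NO CHANGE", "WORSENED", "WORSENED")
n <- length(chgcat1)

defdetfl <- vapply(seq_len(n), function(i) {
  later <- chgcat1[seq_len(n) > i]
  # Definitive: worsened now and at every following assessment
  # (assumption: at least one following assessment is required)
  definitive <- chgcat1[i] == "WORSENED" &&
    length(later) > 0 &&
    all(later == "WORSENED")
  if (definitive) "Y" else NA_character_
}, character(1))
```

Here only the fourth record is flagged: the second record is followed by a "NO CHANGE" record, and the last record has no following assessment.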

The time-to-event parameter can be derived in the same way as for the unconfirmed parameters (see Time to Deterioration/Improvement).

Worst/Best Answer

This class of parameters can be used when the worst answer of a set of yes/no answers should be selected. For example, if yes/no answers for “No sleep”, “Waking up more than three times”, “More than 30 minutes to fall asleep” are collected, a parameter for the worst sleeping problems could be derived. In the example, “no sleeping problems” is assumed if all questions were answered with “no”.

adsp <- adqs %>%
  filter(PARCAT1 == "SLEEPING PROBLEMS") %>%
  derive_extreme_event(
    by_vars = exprs(USUBJID, AVISIT),
    tmp_event_nr_var = event_nr,
    order = exprs(event_nr, ADY, QSSEQ),
    mode = "first",
    events = list(
      event(
        condition = PARAMCD == "SP0101" & AVALC == "YES",
        set_values_to = exprs(
          AVALC = "No sleep",
          AVAL = 1
        )
      ),
      event(
        condition = PARAMCD == "SP0102" & AVALC == "YES",
        set_values_to = exprs(
          AVALC = "Waking up more than three times",
          AVAL = 2
        )
      ),
      event(
        condition = PARAMCD == "SP0103" & AVALC == "YES",
        set_values_to = exprs(
          AVALC = "More than 30 mins to fall asleep",
          AVAL = 3
        )
      ),
      event(
        condition = all(AVALC == "NO"),
        set_values_to = exprs(
          AVALC = "No sleeping problems",
          AVAL = 4
        )
      ),
      event(
        condition = TRUE,
        set_values_to = exprs(
          AVALC = "Missing",
          AVAL = 99
        )
      )
    ),
    set_values_to = exprs(
      PARAMCD = "SP01WSP",
      PARAM = "Worst Sleeping Problems"
    )
  )

USUBJID	PARAMCD	PARAM	AVISIT	AVALC
P0001	SP0101	SP01-No sleep at all	BASELINE	NO
P0001	SP0102	SP01-Waking up more than three times	BASELINE	NO
P0001	SP0103	SP01-More than 30 mins to fall asleep	BASELINE	YES
P0001	SP01WSP	Worst Sleeping Problems	BASELINE	More than 30 mins to fall asleep
P0001	SP0101	SP01-No sleep at all	VISIT 2	NO
P0001	SP0102	SP01-Waking up more than three times	VISIT 2	NO
P0001	SP0103	SP01-More than 30 mins to fall asleep	VISIT 2	NO
P0001	SP01WSP	Worst Sleeping Problems	VISIT 2	No sleeping problems
P0001	SP0101	SP01-No sleep at all	VISIT 4	NO
P0001	SP0102	SP01-Waking up more than three times	VISIT 4	YES
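
The priority logic of the events can be sketched for a single visit in base R. The answers below are made up, and the sketch only mimics the order of the events defined above (first "YES" in priority order, then all "NO", otherwise missing); it does not use derive_extreme_event().

```r
# Made-up answers at one visit, in the priority order of the events above
answers <- c(SP0101 = "NO", SP0102 = "YES", SP0103 = "YES")
labels <- c(
  SP0101 = "No sleep",
  SP0102 = "Waking up more than three times",
  SP0103 = "More than 30 mins to fall asleep"
)

worst <- if (any(answers == "YES", na.rm = TRUE)) {
  # The first "YES" in priority order wins
  unname(labels[which(answers == "YES")[1]])
} else if (all(!is.na(answers) & answers == "NO")) {
  "No sleeping problems"
} else {
  # Neither a "YES" answer nor all questions answered with "NO"
  "Missing"
}
```

Although both SP0102 and SP0103 were answered with "YES", the result is "Waking up more than three times" because it comes first in the priority order.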

Completion

Parameters for completion, like “at least 90% of the questions were answered”, can be derived by derive_summary_records().

adgdssf <- adgdssf %>%
  derive_summary_records(
    dataset_add = adgdssf,
    filter_add = str_detect(PARAMCD, "GDS02[01][0-9]"),
    by_vars = exprs(USUBJID, AVISIT),
    set_values_to = exprs(
      AVAL = sum(!is.na(AVAL)) / 15 >= 0.9,
      PARAMCD = "COMPL90P",
      PARAM = "Completed at least 90% of questions?",
      AVALC = if_else(AVAL == 1, "YES", "NO")
    )
  )

USUBJID	PARAMCD	PARAM	AVISIT	AVALC
P0001	COMPL90P	Completed at least 90% of questions?	BASELINE	YES
P0001	COMPL90P	Completed at least 90% of questions?	UNSCHEDULED 2.01	YES
P0001	COMPL90P	Completed at least 90% of questions?	VISIT 2	YES
P0001	COMPL90P	Completed at least 90% of questions?	VISIT 3	NO
P0001	COMPL90P	Completed at least 90% of questions?	VISIT 4	YES
P0001	GDS0201	GDS02-Satisfied With Life	BASELINE	YES
P0001	GDS0201	GDS02-Satisfied With Life	VISIT 2	YES
P0001	GDS0201	GDS02-Satisfied With Life	UNSCHEDULED 2.01	YES
P0001	GDS0201	GDS02-Satisfied With Life	VISIT 3	NA
P0001	GDS0201	GDS02-Satisfied With Life	VISIT 4	YES

Please note that the denominator may depend on the answers of some of the questions. For example, a given questionnaire might direct someone to go from question #4 directly to question #8 based on their response to question #4, because questions #5, #6 and #7 would not apply in that case.
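
A variable denominator of this kind can be sketched in base R as follows. The skip rule (questions 5 to 7 apply only if question 4 was not answered with "NO") and the answers are hypothetical and serve only to illustrate the idea.

```r
# Made-up answers to eight questions; questions 5-7 were skipped
answers <- c("NO", "YES", "NO", "NO", NA, NA, NA, "YES")

# Hypothetical skip rule: questions 5-7 do not apply if question 4 is "NO"
applicable <- rep(TRUE, length(answers))
if (!is.na(answers[4]) && answers[4] == "NO") {
  applicable[5:7] <- FALSE
}

# Completion rate among the applicable questions only
completion_rate <- sum(!is.na(answers[applicable])) / sum(applicable)
compl90p <- if (completion_rate >= 0.9) "YES" else "NO"
```

With a fixed denominator of eight the completion rate would be 5/8 and the subject would count as not completed, whereas all five applicable questions were in fact answered.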

If missed visits need to be taken into account, the expected records can be added to the input dataset by calling derive_expected_records():

# Create dataset with expected visits and parameters (GDS0201 - GDS0215)
parm_visit_ref <- crossing(
  tribble(
    ~AVISIT,    ~AVISITN,
    "BASELINE",        0,
    "VISIT 2",         2,
    "VISIT 3",         3,
    "VISIT 4",         4,
    "VISIT 5",         5
  ),
  tibble(PARAMCD = sprintf("GDS02%02d", seq(1, 15)))
)

adgdssf <- adgdssf %>%
  derive_expected_records(
    dataset_ref = parm_visit_ref,
    by_vars = exprs(USUBJID),
    set_values_to = exprs(
      filled_in = 1
    )
  ) %>%
  derive_summary_records(
    dataset = .,
    dataset_add = .,
    filter_add = str_detect(PARAMCD, "GDS02[01][0-9]"),
    by_vars = exprs(USUBJID, AVISIT),
    set_values_to = exprs(
      AVAL = all(!is.na(AVAL)),
      PARAMCD = "COMPLALL",
      PARAM = "Completed all questions?",
      AVALC = if_else(AVAL == 1, "YES", "NO")
    )
  ) %>%
  filter(is.na(filled_in)) %>%
  select(-filled_in)

USUBJID	PARAMCD	PARAM	AVISIT	AVALC
P0001	COMPL90P	Completed at least 90% of questions?	BASELINE	YES
P0001	COMPL90P	Completed at least 90% of questions?	UNSCHEDULED 2.01	YES
P0001	COMPL90P	Completed at least 90% of questions?	VISIT 2	YES
P0001	COMPL90P	Completed at least 90% of questions?	VISIT 3	NO
P0001	COMPL90P	Completed at least 90% of questions?	VISIT 4	YES
P0001	COMPLALL	Completed all questions?	BASELINE	YES
P0001	COMPLALL	Completed all questions?	UNSCHEDULED 2.01	YES
P0001	COMPLALL	Completed all questions?	VISIT 2	YES
P0001	COMPLALL	Completed all questions?	VISIT 3	NO
P0001	COMPLALL	Completed all questions?	VISIT 4	YES

¹ The example QS data (example_qs) is included in the admiral package.

Creating Questionnaire ADaMs