--- title: "Queries Dataset Documentation" output: rmarkdown::html_vignette: toc: false vignette: > %\VignetteIndexEntry{Queries Dataset Documentation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(admiraldev) ``` # Introduction To support the safety analysis, it is quite common to define specific grouping of events. One of the most common ways is to group events or medications by a specific medical concept such as a Standard MedDRA Queries (SMQs) or WHO-Drug Standardized Drug Groupings (SDGs). To help with the derivation of these variables, the {admiral} function `derive_vars_query()` can be used. This function takes as input the dataset (`dataset`) where the grouping must occur (e.g `ADAE`) and a dataset containing the required information to perform the derivation of the grouping variables (`dataset_queries`). The dataset passed to the `dataset_queries` argument of the `derive_vars_query()` function can be created by the `create_query_data()` function. For SMQs and SDGs company-specific functions for accessing the SMQ and SDG database need to be passed to the `create_query_data()` function (see the description of the `get_terms_fun` argument for details). This vignette describes the expected structure and content of the dataset passed to the `dataset_queries` argument in the `derive_vars_query()` function. # Structure of the Query Dataset ## Variables Variable | Scope | Type | Example Value ------- | ----- | ------ | ----- **PREFIX** | The prefix used to define the grouping variables | Character | `"SMQ01"` **GRPNAME** | The value provided to the grouping variables name| Character | `"Immune-Mediated Guillain-Barre Syndrome"` **SRCVAR** | The variable used to define the grouping. Used in conjunction with TERMCHAR | Character | `"AEDECOD"` **TERMCHAR** | A term used to define the grouping. Used in conjunction with SRCVAR | Character | `"GUILLAIN-BARRE SYNDROME"` **TERMNUM** | A code used to define the grouping. Used in conjunction with SRCVAR | Integer | `10018767` GRPID | Id number of the query. This could be a SMQ identifier | Integer | `20000131` SCOPE | Scope (Broad/Narrow) of the query | Character | `BROAD`, `NARROW`, `NA` SCOPEN | Scope (Broad/Narrow) of the query | Integer | `1`, `2`, `NA` VERSION | The version of the dictionary | Character | `"20.1"` **Bold variables** are required in `dataset_queries`: an error is issued if any of these variables is missing. `TERMCHAR` is only REQUIRED if there is character variable named in `SRCVAR`. `TERMNUM` is only REQUIRED if there is numeric variable named in `SRCVAR`. When `SRCVAR` contains both character and numeric variables, then both `TERMCHAR` and `TERMNUM` are required. Other variables are optional. The `VERSION` variable is not used by `derive_vars_query()` but can be used to check if the dictionary version of the queries dataset and the analysis dataset are in line. ## Required Content Each row must be unique within the dataset. As described above, the variables `PREFIX`, `GRPNAME`, `SRCVAR`, `TERMCHAR` and `TERMNUM` are required. The combination of these variables will allow the creation of the grouping variable. ### Input + `PREFIX` must be a character string starting with 2 or 3 letters, followed by a 2-digits number (e.g. "CQ01"). + `GRPNAME` must be a non missing character string and it must be unique within `PREFIX`. + `SRCVAR` must be a non missing character string. + Each value in `SRCVAR` represents a variable from `dataset` used to define the grouping variables (e.g. `AEDECOD`,`AEBODSYS`, `AELLTCD`). + The function `derive_vars_query()` will check that each value given in `SRCVAR` has a corresponding variable in the input `dataset` and issue an error otherwise. + Different `SRCVAR` variables may be specified within a `PREFIX`. + `TERMCHAR` must be a character string. This **must** be populated if `TERMNUM` is missing. + `TERMNUM` must be an integer. This **must** be populated if `TERMCHAR` is missing. ### Output + `PREFIX` will be used to create the grouping variable appending the suffix "NAM". This variable will now be referred to as `ABCzzNAM`: the name of the grouping variable. + E.g. `PREFIX == "SMQ01"` will create the `SMQ01NAM` variable. + For each `PREFIX`, a new `ABCzzNAM` variable is created in `dataset`. + [`GRPNAME`]{#GRPNAME} will be used to populate the corresponding `ABCzzNAM` variable. + `SRCVAR` will be used to identify the variables from `dataset` used to perform the grouping (e.g. `AEDECOD`,`AEBODSYS`, `AELLTCD`). + `TERMCHAR` (for character variables), `TERMNUM` (for numeric variables) will be used to identify the records meeting the criteria in `dataset` based on the variable defined in `SRCVAR`. + **Result:** + For each record in `dataset`, where the variable defined by `SRCVAR` match a term from the `TERMCHAR` (for character variables) or `TERMNUM` (for numeric variables) in the `datasets_queries`, `ABCzzNAM` is populated with `GRPNAME`. + Note: The type (numeric or character) of the variable defined in `SRCVAR` is checked in `dataset`. If the variable is a character variable (e.g. `AEDECOD`), it is expected that `TERMCHAR` is populated, if it is a numeric variable (e.g. `AEBDSYCD`), it is expected that `TERMNUM` is populated, otherwise an error is issued. ### Example In this example, one standard MedDRA query (`PREFIX = "SMQ01"`) and one customized query (`PREFIX = "CQ02"`) are defined to analyze the adverse events. + The standard MedDRA query variable `SMQ01NAM` [`PREFIX`] will be populated with "Standard Query 1" [`GRPNAME`] if any preferred term (`AEDECOD`) [`SRCVAR`] in `dataset` is equal to "AE1" or "AE2" [`TERMCHAR`] + The customized query (`CQ02NAM`) [`PREFIX`] will be populated with "Query 2" [`GRPNAME`] if any Low Level Term Code (`AELLTCD`) [`SRCVAR`] in `dataset` is equal to 10 [`TERMNUM`] or any preferred term (`AEDECOD`) [`SRCVAR`] in `dataset` is equal to "AE4" [`TERMCHAR`]. #### Query Dataset (`ds_query`) PREFIX | GRPNAME | SRCVAR | TERMCHAR | TERMNUM | ------- | ----- | ------ | ----- | ----- | ----- SMQ01| Standard Query 1 | AEDECOD | AE1 | SMQ01| Standard Query 1 | AEDECOD | AE2 | CQ02| Query 2 | AELLTCD | | 10 CQ02| Query 2 | AEDECOD | AE4| #### Adverse Event Dataset (`ae`) USUBJID | AEDECOD | AELLTCD ------- | ----- | ------ 0001 | AE1 | 101 | 0001 | AE3 | 10| | 0001 | AE4 | 120 | 0001 | AE5 | 130 | #### Output Dataset Generated by calling `derive_vars_query(dataset = ae, dataset_queries = ds_query)`. USUBJID | AEDECOD | AELLTCD | SMQ01NAM |CQ02NAM ------- | ----- | ------ | ----- | ----- 0001 | AE1 | 101 | Standard Query 1 | 0001 | AE3 | 10| | Query 2 0001 | AE4 | 120 | | Query 2 0001 | AE5 | 130 | | Subject 0001 has one event meeting the Standard Query 1 criteria (`AEDECOD = "AE1"`) and two events meeting the customized query (`AELLTCD = 10` and `AEDECOD = "AE4"`). ## Optional Content When standardized MedDRA Queries are added to the dataset, it is expected that the name of the query (`ABCzzNAM`) is populated along with its number code (`ABCzzCD`), and its Broad or Narrow scope (`ABCzzSC`). The following variables can be added to `queries_datset` to derive this information. ### Input + `GRPID` must be an integer. + `SCOPE` must be a character string. Possible values are: "BROAD", "NARROW" or `NA`. + `SCOPEN` must be an integer. Possible values are: `1`, `2` or `NA`. ### Output + `GRPID`, `SCOPE` and `SCOPEN` will be used in the same way as `GRPNAME` [(see here)](#GRPNAME) and will help in the creation of the `ABCzzCD`, `ABCzzSC` and `ABCzzSCN` variables. ### Output Variables These variables are optional and if not populated in `dataset_queries`, the corresponding output variable will not be created: PREFIX | GRPNAME | GRPID | SCOPE |SCOPEN | **Variables created** ------- | ----- | ------ | ----- | ----- | ----- SMQ01| Query 1 | XXXXXXXX | NARROW | 2 | `SMQ01NAM`, `SMQ01CD`, `SMQ01SC`, `SMQ01SCN` SMQ02| Query 2 | XXXXXXXX |BROAD | | `SMQ02NAM`, `SMQ02CD`, `SMQ02SC` SMQ03| Query 3 | XXXXXXXX | |1 | `SMQ03NAM`, `SMQ03CD`, `SMQ03SCN` SMQ04| Query 4 | XXXXXXXX | | | `SMQ04NAM`, `SMQ04CD` SMQ05| Query 5| | | | `SMQ05NAM`