The purpose of the sdtmchecks package is to help detect and investigate potential analysis-relevant issues in SDTM data. It does this with a set of data check functions that are intended to be generalizable, actionable, and meaningful for analysis.
To start using sdtmchecks, first install it with `install.packages("sdtmchecks")`. Then load the package with `library(sdtmchecks)`. To open the package's help index, run `help(package = "sdtmchecks")`.
The package comes with the `sdtmchecksmeta` dataset, which contains metadata on each check function, including the function name, category, priority, and descriptions. Each function is given a category (Cross Therapeutic Area `ALL`, Oncology `ONC`, Covid-19 `COVID`, Patient Reported Outcomes `PRO`, Ophthalmology `OPHTH`) and a priority (High, Medium, Low). Here is a 10-row excerpt of the `check`, `title`, `category`, `priority`, and `domains` columns:
```
## # A tibble: 10 × 5
##    check                              title            category priority domains
##    <chr>                              <chr>            <chr>    <chr>    <chr>
##  1 check_ae_aeacn_ds_disctx_covid     COVID AE trt di… COVID    Low      ae, ds
##  2 check_ae_aeacnoth                  AE AEACNOTH mul… ALL      Low      ae
##  3 check_ae_aeacnoth_ds_disctx        AE AEACNOTx Dis… ALL      Low      ae, ds
##  4 check_ae_aeacnoth_ds_stddisc_covid COVID AE study … COVID    Low      ae, ds
##  5 check_ae_aedecod                   AE Missing PT    ALL      High     ae
##  6 check_ae_aedthdtc_aesdth           AE Death Date v… ALL      High     ae
##  7 check_ae_aedthdtc_ds_death         DS Death Dates … ALL      High     ae, ds
##  8 check_ae_aelat                     AE AELAT Missing OPHTH    High     ae
##  9 check_ae_aeout                     AE Death Outcome ALL      High     ae
## 10 check_ae_aeout_aeendtc_aedthdtc    Fatal AE Resolu… ALL      High     ae
```
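To see how the checks are distributed across categories and priorities, you can cross-tabulate the metadata with base R's `table()`. The sketch below builds a hypothetical stand-in, `meta`, from the ten rows printed above so that it runs without the package; with sdtmchecks loaded you would use `sdtmchecksmeta` in its place.

```r
# Hypothetical stand-in for sdtmchecksmeta: the ten rows printed above.
# With sdtmchecks loaded, use sdtmchecksmeta instead of meta.
meta <- data.frame(
  check = c("check_ae_aeacn_ds_disctx_covid", "check_ae_aeacnoth",
            "check_ae_aeacnoth_ds_disctx", "check_ae_aeacnoth_ds_stddisc_covid",
            "check_ae_aedecod", "check_ae_aedthdtc_aesdth",
            "check_ae_aedthdtc_ds_death", "check_ae_aelat",
            "check_ae_aeout", "check_ae_aeout_aeendtc_aedthdtc"),
  category = c("COVID", "ALL", "ALL", "COVID", "ALL",
               "ALL", "ALL", "OPHTH", "ALL", "ALL"),
  priority = c("Low", "Low", "Low", "Low", "High",
               "High", "High", "High", "High", "High"),
  stringsAsFactors = FALSE
)

# Number of checks in each category/priority combination
counts <- table(category = meta$category, priority = meta$priority)
print(counts)

# Names of the high-priority checks that apply across therapeutic areas
high_all <- meta$check[meta$priority == "High" & meta$category == "ALL"]
print(high_all)
```

On the full `sdtmchecksmeta`, the same `table()` call would also show the Medium priority and the `ONC` and `PRO` categories that this excerpt does not contain.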
Let’s work through an example using `check_ae_ds_partial_death_dates(AE, DS)`. This check flags records with partial death dates (i.e. dates shorter than 10 characters) in AE and DS. If any are found, the check returns `FALSE` with attributes containing the flagged records (`data`) and a brief message explaining the result (`msg`). If no issues are detected, the check returns `TRUE`.
The example uses this AE data:

```
##   USUBJID AEDECOD   AEDTHDTC
## 1       1     AE1 2017-01-01
## 2       2     AE2       2017
## 3       3     AE3       <NA>
```

and this DS data:

```
##   USUBJID       DSSCAT DSDECOD    DSSTDTC
## 1       4 STUDY DISCON   DEATH 2018-01-01
## 2       5 STUDY DISCON   DEATH 2017-03-03
## 3       6 STUDY DISCON   DEATH 2018-01-02
## 4       7 STUDY DISCON   DEATH    2016-10
```

Running `check_ae_ds_partial_death_dates(AE, DS)` on them returns:

```
## [1] FALSE
## attr(,"msg")
## [1] "There are 2 patients with partial death dates. "
## attr(,"data")
##   USUBJID       DSSCAT DSDECOD DSSTDTC AEDECOD AEDTHDTC
## 1       2         <NA>    <NA>    <NA>     AE2     2017
## 2       7 STUDY DISCON   DEATH 2016-10    <NA>     <NA>
```
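The logic of this check, and the `TRUE`/`FALSE`-with-attributes convention, can be sketched in base R. `partial_death_dates_sketch()` below is a hypothetical re-implementation, not the package source: it assumes a date is partial when its non-missing, non-blank value is shorter than 10 characters, and that only DS records with `DSDECOD == "DEATH"` are inspected. Run on the example data above, it flags the same two patients.

```r
# Hypothetical sketch of a partial-death-date check; not the sdtmchecks source.
partial_death_dates_sketch <- function(AE, DS) {
  # A complete ISO 8601 date (YYYY-MM-DD) has 10 characters. Missing and blank
  # values (blank is how SAS missing character values are often read) are skipped.
  is_partial <- function(x) !is.na(x) & x != "" & nchar(x) < 10

  ae_bad <- AE[is_partial(AE$AEDTHDTC), c("USUBJID", "AEDECOD", "AEDTHDTC")]
  # Assumption: only DS death records are inspected
  ds_bad <- DS[DS$DSDECOD %in% "DEATH" & is_partial(DS$DSSTDTC),
               c("USUBJID", "DSSCAT", "DSDECOD", "DSSTDTC")]

  n_pat <- length(unique(c(ae_bad$USUBJID, ds_bad$USUBJID)))
  if (n_pat == 0) {
    return(TRUE)  # nothing flagged
  }

  # Combine flagged records; columns missing from one domain are filled with NA
  flagged <- merge(ds_bad, ae_bad, by = "USUBJID", all = TRUE)

  res <- FALSE
  attr(res, "msg") <- paste0("There are ", n_pat,
                             " patients with partial death dates. ")
  attr(res, "data") <- flagged
  res
}

# The example data printed above
AE <- data.frame(USUBJID = 1:3,
                 AEDECOD = c("AE1", "AE2", "AE3"),
                 AEDTHDTC = c("2017-01-01", "2017", NA),
                 stringsAsFactors = FALSE)
DS <- data.frame(USUBJID = 4:7,
                 DSSCAT = "STUDY DISCON",
                 DSDECOD = "DEATH",
                 DSSTDTC = c("2018-01-01", "2017-03-03", "2018-01-02", "2016-10"),
                 stringsAsFactors = FALSE)

res <- partial_death_dates_sketch(AE, DS)
isTRUE(res)        # FALSE, so something was flagged
attr(res, "msg")   # the explanatory message
attr(res, "data")  # the flagged records
```

The same `isTRUE()` and `attr()` calls are a convenient way to inspect the result of any check that follows this return convention.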
Running all the checks on your data is easy: just use the `run_all_checks()` function. This function assumes that all of your SDTM datasets are objects in your global environment, e.g. `ae`, `dm`, `ex`, etc.
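If your SDTM domains are stored as one `.sas7bdat` file per domain, you could load them all in a loop instead of writing one `haven::read_sas()` call per file. The helper below, `load_sdtm_dir()`, is hypothetical (not part of sdtmchecks); it assumes each file is named after its domain (e.g. `AE.sas7bdat`) and assigns it to a lowercase object such as `ae`. The `reader` argument defaults to `haven::read_sas()` but can be replaced with another reader.

```r
# Hypothetical helper: read every .sas7bdat file in a folder and assign each one
# to an object named after the file (lowercased), e.g. AE.sas7bdat -> ae.
load_sdtm_dir <- function(path,
                          reader = function(f) haven::read_sas(f),
                          envir = globalenv()) {
  files <- list.files(path, pattern = "\\.sas7bdat$", full.names = TRUE,
                      ignore.case = TRUE)
  obj_names <- tolower(tools::file_path_sans_ext(basename(files)))
  for (i in seq_along(files)) {
    assign(obj_names[i], reader(files[i]), envir = envir)
  }
  # Return the names of the objects that were created
  invisible(obj_names)
}

# Usage (the path is a placeholder):
# load_sdtm_dir("path/to/sdtm")
```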
```r
library(sdtmchecks)

# Read data to your global environment
ae = haven::read_sas("path/to/ae.sas7bdat")
ds = haven::read_sas("path/to/ds.sas7bdat")

# Run the checks and save as an object called "myreport"
myreport = run_all_checks(metads = sdtmchecksmeta,
                          priority = c("High", "Medium", "Low"), # subset checks based on priority
                          type = c("ALL", "ONC", "COVID", "PRO", "OPHTH"), # subset checks based on category
                          verbose = TRUE)

class(myreport) # results in a list object
names(myreport) # each check result is saved in a slot of the list
myreport[["check_ae_aedecod"]] # investigate the results of a check
```
The `run_all_checks()` function also lets you subset on category or priority. For example, to run only the high-priority oncology checks:
```r
myreport = run_all_checks(metads = sdtmchecksmeta,
                          priority = c("High"),
                          type = c("ONC"),
                          verbose = TRUE)
```
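Subsetting by priority and category amounts to filtering rows of the metadata. The sketch below illustrates that idea with a hypothetical helper, `subset_meta()`, and a small stand-in metadata frame built from rows of the table above; it is not the package's implementation, and it assumes the `type` codes match the values in the `category` column.

```r
# Hypothetical helper: keep metadata rows whose priority and category match.
subset_meta <- function(meta,
                        priority = c("High", "Medium", "Low"),
                        type = c("ALL", "ONC", "COVID", "PRO", "OPHTH")) {
  meta[meta$priority %in% priority & meta$category %in% type, , drop = FALSE]
}

# Small stand-in metadata (rows taken from the table above)
meta <- data.frame(
  check = c("check_ae_aedecod", "check_ae_aelat",
            "check_ae_aeacnoth", "check_ae_aeacn_ds_disctx_covid"),
  category = c("ALL", "OPHTH", "ALL", "COVID"),
  priority = c("High", "High", "Low", "Low"),
  stringsAsFactors = FALSE
)

high <- subset_meta(meta, priority = "High")
high$check

covid_low <- subset_meta(meta, priority = "Low", type = "COVID")
covid_low$check
```

With this stand-in, `subset_meta(meta, type = "ONC")` returns zero rows, since none of the excerpted checks are oncology checks.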
You can also choose specific checks to run. Here’s a way to get started with a set of checks that should work fairly well for most datasets:
```r
library(sdtmchecks)
library(dplyr) # provides %>% and filter()

# Read data to your global environment
ae = haven::read_sas("path/to/ae.sas7bdat")
cm = haven::read_sas("path/to/cm.sas7bdat")
dm = haven::read_sas("path/to/dm.sas7bdat")

# Subset to checks that should work OK for most datasets
metads = sdtmchecksmeta %>%
  filter(check %in% c("check_ae_aedecod",
                      "check_ae_aetoxgr",
                      "check_ae_dup",
                      "check_cm_cmdecod",
                      "check_cm_missing_month",
                      "check_dm_age_missing",
                      "check_dm_usubjid_dup",
                      "check_dm_armcd"))

myreport = run_all_checks(metads = metads,
                          verbose = TRUE)
```
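When picking checks by hand, it can help to keep only those whose required domains you actually have loaded. The sketch below assumes the `domains` column lists lowercase domain names separated by commas, as in the metadata table above; `checks_for_domains()` is a hypothetical helper, and the stand-in metadata uses rows from that table.

```r
# Hypothetical helper: return the checks whose required domains are all available.
checks_for_domains <- function(meta, available) {
  needed <- strsplit(meta$domains, ",\\s*")
  ok <- vapply(needed, function(d) all(d %in% available), logical(1))
  meta$check[ok]
}

# Stand-in metadata (rows taken from the table above)
meta <- data.frame(
  check = c("check_ae_aedecod", "check_ae_aedthdtc_ds_death",
            "check_ae_aeout", "check_ae_aeacnoth_ds_disctx"),
  domains = c("ae", "ae, ds", "ae", "ae, ds"),
  stringsAsFactors = FALSE
)

# Only AE, CM and DM are loaded, so checks that also need DS are dropped
checks_for_domains(meta, available = c("ae", "cm", "dm"))
```

In practice, `available` could be derived from the objects you have loaded, for example `intersect(ls(), c("ae", "cm", "dm", "ds", "ex"))`.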
You can then write the results out to an xlsx file for easy sharing.