Introduction

This article describes how to cut study SDTM data using a modular approach, which leaves room for any further study- or project-specific customization.

Programming Flow

Read in Data
Create DCUT Dataset
Preprocess Datasets
Specify Cut Types
Patient Cut
Date Cut
DM Cut
Apply Cut
Output Final List of Cut Datasets

Read in Data

To start, all SDTM data to be cut needs to be stored in a list.

library(datacutr)
library(admiraldev)
library(dplyr)
library(lubridate)
library(stringr)
library(purrr)
library(rlang)

source_data <- list(
  ds = datacutr_ds, dm = datacutr_dm, ae = datacutr_ae, sc = datacutr_sc,
  lb = datacutr_lb, fa = datacutr_fa, ts = datacutr_ts
)

Create DCUT Dataset

The next step is to create the DCUT dataset containing the datacut date and description.

dcut <- create_dcut(
  dataset_ds = source_data$ds,
  ds_date_var = DSSTDTC,
  filter = DSDECOD == "RANDOMIZATION",
  cut_date = "2022-06-04",
  cut_description = "Clinical Cutoff Date"
)

USUBJID	DCUTDTC	DCUTDTM	DCUTDESC
AB12345-001	2022-06-04	2022-06-04 23:59:59	Clinical Cutoff Date
AB12345-002	2022-06-04	2022-06-04 23:59:59	Clinical Cutoff Date
AB12345-003	2022-06-04	2022-06-04 23:59:59	Clinical Cutoff Date
AB12345-004	2022-06-04	2022-06-04 23:59:59	Clinical Cutoff Date
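
To make the shape of DCUT concrete, here is a minimal base R sketch of the idea. It is not the `create_dcut()` implementation. It assumes that the filter selects one DS record per patient, that patients whose DS date falls after the cut date are left out, and that the cut datetime is the cut date set to the end of the day, as the `DCUTDTM` values above suggest. The toy `ds` data frame and the helper name `sketch_dcut` are hypothetical.

```r
# Hypothetical toy DS data; not the datacutr_ds test dataset.
ds <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-005"),
  DSDECOD = c("RANDOMIZATION", "RANDOMIZATION", "RANDOMIZATION"),
  DSSTDTC = c("2022-01-10", "2022-02-15", "2022-07-01"),
  stringsAsFactors = FALSE
)

# Assumed logic: keep filtered patients whose DS date is on or before the cut date.
sketch_dcut <- function(ds, cut_date, cut_description) {
  keep <- ds$DSDECOD == "RANDOMIZATION" &
    as.Date(ds$DSSTDTC) <= as.Date(cut_date)
  data.frame(
    USUBJID = ds$USUBJID[keep],
    DCUTDTC = cut_date,
    # End-of-day datetime, kept as text here to keep the sketch simple
    DCUTDTM = paste(cut_date, "23:59:59"),
    DCUTDESC = cut_description,
    stringsAsFactors = FALSE
  )
}

dcut_sketch <- sketch_dcut(ds, "2022-06-04", "Clinical Cutoff Date")
print(dcut_sketch)
```

In this toy data the third patient is randomized after the cut date, so under the stated assumption they do not appear in the result.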

Preprocess Datasets

If any datasets need pre-processing, this should be done next. FA, for example, has multiple date variables, so we derive a single temporary date variable that takes FASTDTC when it is populated and FADTC otherwise.

source_data$fa <- source_data$fa %>%
  mutate(DCUT_TEMP_FAXDTC = case_when(
    FASTDTC != "" ~ FASTDTC,
    FADTC != "" ~ FADTC,
    TRUE ~ NA_character_
  ))

USUBJID	FASTDTC	FADTC	DCUT_TEMP_FAXDTC
AB12345-001		2022-06-01	2022-06-01
AB12345-002	2022-06-30		2022-06-30
AB12345-003		2022-07-01	2022-07-01
AB12345-004	2022-05-04		2022-05-04
AB12345-005		2022-12-01	2022-12-01
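
If a dataset has more than two candidate date variables, the same "first populated value wins" rule can be written once and reused. The following is a minimal base R sketch of that rule; the helper name `first_non_empty` is hypothetical and is not part of datacutr.

```r
# Hypothetical helper: element-wise, return the first value that is
# neither NA nor an empty string, checking the arguments in order.
first_non_empty <- function(...) {
  candidates <- list(...)
  result <- rep(NA_character_, length(candidates[[1]]))
  for (values in candidates) {
    # Only fill positions that are still missing and where this candidate is populated
    fill <- is.na(result) & !is.na(values) & values != ""
    result[fill] <- values[fill]
  }
  result
}

# Toy vectors standing in for FASTDTC and FADTC
fastdtc <- c("", "2022-06-30", NA, "")
fadtc <- c("2022-06-01", "", "2022-07-01", "")
faxdtc <- first_non_empty(fastdtc, fadtc)
print(faxdtc)
```

The order of the arguments sets the priority, which mirrors the order of the conditions in the `case_when()` call above.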

Specify Cut Types

Next we’ll specify the cut type for each dataset (patient cut, date cut or no cut) and, in the case of a date cut, which date variable should be used.

patient_cut_list <- c("sc", "ds")

date_cut_list <- rbind(
  c("ae", "AESTDTC"),
  c("lb", "LBDTC"),
  c("fa", "DCUT_TEMP_FAXDTC")
)

no_cut_list <- list(ts = source_data$ts)
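
Because the cut types are specified by hand, it is easy to leave a dataset out or list it twice. The following base R sketch shows one possible sanity check, assuming that every dataset should be assigned to exactly one cut type and that DM is handled separately. The check and its variable names are a hypothetical addition, not part of datacutr.

```r
# Dataset names as used in this article, written out so the sketch is self-contained.
all_domains <- c("ds", "dm", "ae", "sc", "lb", "fa", "ts")
patient_cut <- c("sc", "ds")
date_cut <- c("ae", "lb", "fa")
no_cut <- c("ts")
special <- c("dm") # handled separately by the DM cut

assigned <- c(patient_cut, date_cut, no_cut, special)

# Datasets with no cut type, and datasets listed under more than one cut type
unassigned <- setdiff(all_domains, assigned)
duplicated_domains <- assigned[duplicated(assigned)]

stopifnot(length(unassigned) == 0, length(duplicated_domains) == 0)
```

In the workflow above, the hard-coded vectors would correspond to `names(source_data)`, `patient_cut_list`, `date_cut_list[, 1]` and `names(no_cut_list)`.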

Patient Cut

Next we’ll apply the patient cut.

patient_cut_data <- lapply(
  source_data[patient_cut_list], pt_cut,
  dataset_cut = dcut
)

This adds temporary flag variables indicating which observations will be removed. For example, for SC, where patient AB12345-005 is not in DCUT and is therefore flagged:

USUBJID	SCORRES	DCUT_TEMP_REMOVE
AB12345-001	A	NA
AB12345-002	B	NA
AB12345-003	C	NA
AB12345-004	D	NA
AB12345-005	E	Y
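
The flagging rule behind this output can be sketched in base R as follows. This illustrates the idea only, on toy data matching the table above, and is not the `pt_cut()` implementation: an observation is flagged when its patient does not appear in DCUT.

```r
# Toy SC data matching the table above
sc <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  SCORRES = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Patients present in DCUT (001 to 004)
dcut_ids <- sprintf("AB12345-%03d", 1:4)

# Flag observations for patients who are not in DCUT
sc$DCUT_TEMP_REMOVE <- ifelse(sc$USUBJID %in% dcut_ids, NA_character_, "Y")
print(sc)
```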

Date Cut

Next we’ll apply the date cut.

date_cut_data <- pmap(
  .l = list(
    dataset_sdtm = source_data[date_cut_list[, 1]],
    sdtm_date_var = syms(date_cut_list[, 2])
  ),
  .f = date_cut,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)

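This again adds temporary flag variables indicating which observations will be removed. For example, for AE, observations dated after the cut datetime are flagged, as are those for a patient with no cut datetime: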

USUBJID	AETERM	AESTDTC	DCUT_TEMP_SDTM_DATE	DCUT_TEMP_DCUTDTM	DCUT_TEMP_REMOVE
AB12345-001	AE1	2022-06-01	2022-06-01	2022-06-04 23:59:59	NA
AB12345-002	AE2	2022-06-30	2022-06-30	2022-06-04 23:59:59	Y
AB12345-003	AE3	2022-07-01	2022-07-01	2022-06-04 23:59:59	Y
AB12345-004	AE4	2022-05-04	2022-05-04	2022-06-04 23:59:59	NA
AB12345-005	AE5	2022-12-01	2022-12-01	NA	Y
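
The comparison behind this output can be sketched in base R. This is a simplified illustration on toy data matching the table above, not the `date_cut()` implementation. It assumes complete dates only, so it does not cover partial dates, and it assumes that observations for patients without a cut datetime are flagged.

```r
# Toy AE and DCUT data matching the tables above
ae <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  AESTDTC = c("2022-06-01", "2022-06-30", "2022-07-01", "2022-05-04", "2022-12-01"),
  stringsAsFactors = FALSE
)
dcut <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:4),
  DCUTDTM = as.POSIXct("2022-06-04 23:59:59", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Look up each patient's cut datetime (NA when the patient is not in DCUT)
cut_dtm <- dcut$DCUTDTM[match(ae$USUBJID, dcut$USUBJID)]
event_dtm <- as.POSIXct(ae$AESTDTC, tz = "UTC")

# Flag when the event is after the cut, or when there is no cut datetime at all
ae$DCUT_TEMP_REMOVE <- ifelse(is.na(cut_dtm) | event_dtm > cut_dtm, "Y", NA_character_)
print(ae)
```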

DM Cut

Then we’ll apply the special DM cut, which also handles the death-related variables.

dm_cut <- special_dm_cut(
  dataset_dm = source_data$dm,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)

This adds temporary variables flagging patients to be removed and any death records that would change as a result of applying the datacut:

USUBJID	DTHFL	DTHDTC	DCUT_TEMP_REMOVE	DCUT_TEMP_DTHDT	DCUT_TEMP_DCUTDTM	DCUT_TEMP_DTHCHANGE
AB12345-001	Y	2022-06-01	NA	2022-06-01	2022-06-04 23:59:59	NA
AB12345-002	NA	NA	NA	NA	2022-06-04 23:59:59	NA
AB12345-003	Y	2022-07-01	NA	2022-07-01	2022-06-04 23:59:59	Y
AB12345-004	NA	NA	NA	NA	2022-06-04 23:59:59	NA
AB12345-005	Y	2022-12-01	Y	2022-12-01	NA	NA
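
The two flags in this output can be sketched in base R. This illustrates the logic suggested by the table and is not the `special_dm_cut()` implementation. It assumes complete death dates, that a patient is flagged for removal when they have no cut datetime, and that a death record is flagged as changing when the death date falls after the cut datetime.

```r
# Toy DM and DCUT data matching the tables above
dm <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  DTHFL = c("Y", NA, "Y", NA, "Y"),
  DTHDTC = c("2022-06-01", NA, "2022-07-01", NA, "2022-12-01"),
  stringsAsFactors = FALSE
)
dcut <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:4),
  DCUTDTM = as.POSIXct("2022-06-04 23:59:59", tz = "UTC"),
  stringsAsFactors = FALSE
)

cut_dtm <- dcut$DCUTDTM[match(dm$USUBJID, dcut$USUBJID)]
death_dtm <- as.POSIXct(dm$DTHDTC, tz = "UTC")

# Patients without a cut datetime are flagged for removal
dm$DCUT_TEMP_REMOVE <- ifelse(is.na(cut_dtm), "Y", NA_character_)

# Deaths after the cut are flagged so the death variables can be blanked later
death_after_cut <- !is.na(cut_dtm) & !is.na(death_dtm) & death_dtm > cut_dtm
dm$DCUT_TEMP_DTHCHANGE <- ifelse(death_after_cut, "Y", NA_character_)
print(dm)
```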

Apply Cut

The next step is to create the RMD report, which summarizes which patients and observations will be cut, and then to apply the cut, stripping out all observations flagged for removal. Only the apply step is shown here.

cut_data <- purrr::map(
  c(patient_cut_data, date_cut_data, list(dm = dm_cut)),
  apply_cut,
  dcutvar = DCUT_TEMP_REMOVE,
  dthchangevar = DCUT_TEMP_DTHCHANGE
)
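
To show what applying the cut amounts to, here is a base R sketch on the flagged DM data from the previous step. It is not the `apply_cut()` implementation. It assumes that flagged observations are dropped, that DTHFL and DTHDTC are set to blank where the death change flag is set, and that all `DCUT_TEMP_` variables are removed afterwards. The helper name `sketch_apply_cut` is hypothetical.

```r
# Toy flagged DM data matching the DM Cut output above
dm_flagged <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  DTHFL = c("Y", NA, "Y", NA, "Y"),
  DTHDTC = c("2022-06-01", NA, "2022-07-01", NA, "2022-12-01"),
  DCUT_TEMP_REMOVE = c(NA, NA, NA, NA, "Y"),
  DCUT_TEMP_DTHCHANGE = c(NA, NA, "Y", NA, NA),
  stringsAsFactors = FALSE
)

sketch_apply_cut <- function(data) {
  # Blank the death variables where the death falls after the cut (assumed behaviour)
  if ("DCUT_TEMP_DTHCHANGE" %in% names(data)) {
    changed <- !is.na(data$DCUT_TEMP_DTHCHANGE) & data$DCUT_TEMP_DTHCHANGE == "Y"
    data$DTHFL[changed] <- ""
    data$DTHDTC[changed] <- ""
  }
  # Drop observations flagged for removal
  removed <- !is.na(data$DCUT_TEMP_REMOVE) & data$DCUT_TEMP_REMOVE == "Y"
  data <- data[!removed, , drop = FALSE]
  # Drop the temporary variables
  data[, !startsWith(names(data), "DCUT_TEMP_"), drop = FALSE]
}

dm_cut_sketch <- sketch_apply_cut(dm_flagged)
print(dm_cut_sketch)
```

Datasets without a death change flag, such as AE or SC, pass through the same helper and only have their flagged rows and temporary variables removed.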

Output Final List of Cut Datasets

Lastly, we create the final list of all the cut SDTM data, adding in the SDTM datasets where no cut was needed and the DCUT dataset itself.

final_data <- c(cut_data, no_cut_list, list(dcut = dcut))
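
As a quick check on the result, the row counts before and after the cut can be compared per dataset. The following base R sketch uses small hypothetical lists in place of `source_data` and `final_data`; the object names `before`, `after` and `counts` are illustrative only.

```r
# Hypothetical stand-ins for source_data and final_data
before <- list(ae = data.frame(x = 1:5), ts = data.frame(x = 1:3))
after <- list(
  ae = data.frame(x = 1:2),
  ts = data.frame(x = 1:3),
  dcut = data.frame(x = 1:4) # present only after the cut, so not compared
)

# One row per source dataset with the number of observations removed
counts <- data.frame(
  dataset = names(before),
  n_before = unname(vapply(before, nrow, integer(1))),
  n_after = unname(vapply(after[names(before)], nrow, integer(1))),
  stringsAsFactors = FALSE
)
counts$n_removed <- counts$n_before - counts$n_after
print(counts)
```

A dataset assigned to the no-cut list should show zero observations removed, which gives a simple way to spot a dataset that was placed in the wrong cut type.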
