Title: | Utilities to Output CDISC SDTM/ADaM XPT Files |
---|---|
Description: | Tools to build CDISC compliant data sets and check for CDISC compliance. |
Authors: | Eli Miller [aut, cre], Ben Straub [aut], Zelos Zhu [aut], Ethan Brockmann [aut], Vedha Viyash [aut], Andre Verissimo [aut], Sophie Shapcott [aut], Celine Piraux [aut], Kangjie Zhang [aut], Adrian Chan [aut], Sadchla Mascary [aut], Atorus/GSK JPT [cph] |
Maintainer: | Eli Miller <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.1.9000 |
Built: | 2024-12-10 22:21:55 UTC |
Source: | https://github.com/atorus-research/xportr |
An example dataset containing subject level data
data("adsl_xportr")
adsl_xportr
A data frame with 306 rows and 51 columns:
Study Identifier
Unique Subject Identifier
Subject Identifier for the Study
Subject Reference Start Date/Time
Subject Reference End Date/Time
Date/Time of First Study Treatment
Date/Time of Last Study Treatment
Date/Time of Informed Consent
Date/Time of End of Participation
Date/Time of Death
Subject Death Flag
Study Site Identifier
Age
Age Units
Sex
Race
Ethnicity
Planned Arm Code
Description of Planned Arm
Actual Arm Code
Description of Actual Arm
Country
Date/Time of Collection
Study Day of Collection
Planned Treatment for Period 01
Actual Treatment for Period 01
Datetime of First Exposure to Treatment
Time of First Exposure Imputation Flag
Datetime of Last Exposure to Treatment
Time of Last Exposure Imputation Flag
Date of First Exposure to Treatment
Date of Last Exposure to Treatment
Total Treatment Duration (Days)
Screen Failure Date
End of Study Date
End of Study Status
Final Retrieval Visit Date
Date of Randomization
Date of Death
Date of Death Imputation Flag
Relative Day of Death
Elapsed Days from Last Dose to Death
Date Last Known Alive
Safety Population Flag
Pooled Race Group 1
Pooled Age Group 1
Geographic Region 1
Last Dose to Death - Days Elapsed Group 1
Death Within 30 Days of Last Trt Flag
Death After 30 Days from Last Trt Flag
Death Within 30 Days of First Trt Flag
Dataset created by admiral::use_ad_template("adsl")
Example Dataset Specification
data("dataset_spec")
dataset_spec
A data frame with 1 row and 9 columns:
chr: Dataset
chr: Dataset description
chr: Dataset class
lgl: Logical, indicating if there's a specific structure
chr: Purpose of the dataset
chr: Join Key variables in the dataset
chr: Indicates if the dataset is repeating
lgl: Reference Data
chr: Additional comment
Example Dataset Variable Specification
data("var_spec")
var_spec
A data frame with 216 rows and 19 columns:
Order of variable
Dataset
Variable
Variable Label
Data Type
Variable Length
Significant Digits
Variable Format
Mandatory Variable Flag
Variable Assigned Value
Variable Codelist
Common Variable Flag
Variable Origin
Pages
Variable Method
Variable Predecessor
Variable Role
Comment
Developer Notes
Wrapper to apply all core xportr functions and write xpt
xportr( .df, var_metadata = NULL, df_metadata = NULL, domain = NULL, verbose = NULL, path, strict_checks = FALSE )
.df |
A data frame of CDISC standard. |
var_metadata |
A data frame containing variable level metadata |
df_metadata |
A data frame containing dataset level metadata. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
verbose |
The action this function takes when an action is taken on the dataset or function validation finds an issue. See 'Messaging' section for details. Options are 'stop', 'warn', 'message', and 'none' |
path |
Path where the transport file will be written. The file name without its extension will be used as the xpt name. |
strict_checks |
If TRUE, xpt validation will report errors and not write out the dataset. If FALSE, xpt validation will report warnings and continue with writing out the dataset. Defaults to FALSE |
Returns the input dataframe invisibly
data("adsl_xportr", "dataset_spec", "var_spec")
adsl <- adsl_xportr

library(magrittr)

test_dir <- tempdir()
pipeline_path <- file.path(test_dir, "adslpipe.xpt")
xportr_path <- file.path(test_dir, "adslxptr.xpt")

# Lower-case the spec column names and rename them to the defaults xportr expects
dataset_spec_low <- setNames(dataset_spec, tolower(names(dataset_spec)))
names(dataset_spec_low)[[2]] <- "label"

var_spec_low <- setNames(var_spec, tolower(names(var_spec)))
names(var_spec_low)[[5]] <- "type"

adsl %>%
  xportr_metadata(var_spec_low, "ADSL", verbose = "none") %>%
  xportr_type() %>%
  xportr_length() %>%
  xportr_label() %>%
  xportr_order() %>%
  xportr_format() %>%
  xportr_df_label(dataset_spec_low) %>%
  xportr_write(pipeline_path)

# `xportr()` can be used to apply a whole pipeline at once
xportr(
  adsl,
  var_metadata = var_spec_low,
  df_metadata = dataset_spec_low,
  domain = "ADSL",
  verbose = "none",
  path = xportr_path
)
Assigns dataset label from a dataset level metadata to a given data frame. This is stored in the 'label' attribute of the dataframe.
xportr_df_label(.df, metadata = NULL, domain = NULL, metacore = deprecated())
.df |
A data frame of CDISC standard. |
metadata |
A data frame containing dataset level metadata. See 'Metadata' section for details. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
metacore |
Data frame with label attributes.
The argument passed in the 'metadata' argument can either be a metacore object, or a data.frame containing the data listed below. If metacore is used, no changes to options are required.
For data.frame 'metadata' arguments two columns must be present:
Domain Name - passed as the 'xportr.df_domain_name' option. Default: "dataset". This is the column subset by the 'domain' argument in the function.
Label Name - passed as the 'xportr.df_label' option. Default: "label". Character values to update the 'label' attribute of the dataframe. This is passed to haven::write_xpt to note the label.
adsl <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  SITEID = c(001, 002, 003),
  AGE = c(63, 35, 27),
  SEX = c("M", "F", "M")
)

metadata <- data.frame(
  dataset = c("adsl", "adae"),
  label = c("Subject-Level Analysis", "Adverse Events Analysis")
)

adsl <- xportr_df_label(adsl, metadata, domain = "adsl")
Assigns a SAS format from a variable level metadata to a given data frame. If no format is found for a given variable, it is set as an empty character vector. This is stored in the 'format.sas' attribute.
xportr_format( .df, metadata = NULL, domain = NULL, verbose = NULL, metacore = deprecated() )
.df |
A data frame of CDISC standard. |
metadata |
A data frame containing variable level metadata. See 'Metadata' section for details. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
verbose |
The action this function takes when an action is taken on the dataset or function validation finds an issue. See 'Messaging' section for details. Options are 'stop', 'warn', 'message', and 'none' |
metacore |
Data frame with SASformat attributes for each variable.
This function carries out a series of basic checks to ensure the formats being applied make sense.
Note, the 'type' of message that is generated will depend on the value passed to the verbose argument: with 'stop' producing an error, 'warn' producing a warning, or 'message' producing a message. A value of 'none' will not output any messages.
If the variable has a suffix of DT, DTM, or TM (indicating a numeric date/time variable) then a message will be shown if there is no format associated with it.
If a variable is character then a message will be shown if there is no $ prefix in the associated format.
If a variable is character then a message will be shown if the associated format has greater than 31 characters (excluding the $).
If a variable is numeric then a message will be shown if there is a $ prefix in the associated format.
If a variable is numeric then a message will be shown if the associated format has greater than 32 characters.
All formats will be checked against a list of formats considered 'standard' as part of an ADaM dataset. Note, however, this list is not exhaustive (it would not be feasible to check all the functions within the scope of this package). If the format is not found in the 'standard' list, then a message is created advising the user to check.
Format Name | w Values | d Values |
---|---|---|
w.d | 1 - 32 | ., 0 - 31 |
$w. | 1 - 200 | |
DATEw. | ., 5 - 11 | |
DATETIMEw. | 7 - 40 | |
DDMMYYw. | ., 2 - 10 | |
HHMM. | | |
MMDDYYw. | ., 2 - 10 | |
TIMEw. | ., 2 - 20 | |
WEEKDATEw. | ., 3 - 37 | |
YYMMDDw. | ., 2 - 10 | |
B8601DAw. | ., 8 - 10 | |
B8601DTw.d | ., 15 - 26 | ., 0 - 6 |
B8601TM. | | |
IS8601DA. | | |
IS8601TM. | | |
E8601DAw. | ., 10 | |
E8601DNw. | ., 10 | |
E8601DTw.d | ., 16 - 26 | ., 0 - 6 |
E8601DXw. | ., 20 - 35 | |
E8601LXw. | ., 20 - 35 | |
E8601LZw. | ., 9 - 20 | |
E8601TMw.d | ., 8 - 15 | ., 0 - 6 |
E8601TXw. | ., 9 - 20 | |
E8601TZw.d | ., 9 - 20 | ., 0 - 6 |
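The prefix and length checks described in this section can be sketched in base R. This is only an illustration of the rules as stated here, not xportr's internal implementation; the helper name `check_sas_format()` and its return value are assumptions made for the example.

```r
# Hypothetical helper (not part of xportr): returns a character vector
# describing the issues found for a single variable's SAS format.
check_sas_format <- function(format, is_character) {
  issues <- character(0)
  has_dollar <- startsWith(format, "$")
  if (is_character) {
    if (!has_dollar) {
      issues <- c(issues, "character variable: format lacks a $ prefix")
    }
    # The length limit for character formats excludes the leading $
    if (nchar(sub("^\\$", "", format)) > 31) {
      issues <- c(issues, "character variable: format exceeds 31 characters")
    }
  } else {
    if (has_dollar) {
      issues <- c(issues, "numeric variable: format has a $ prefix")
    }
    if (nchar(format) > 32) {
      issues <- c(issues, "numeric variable: format exceeds 32 characters")
    }
  }
  issues
}

ok_numeric <- check_sas_format("DATE9.", is_character = FALSE)
bad_numeric <- check_sas_format("$CHAR10.", is_character = FALSE)
bad_character <- check_sas_format("CHAR10.", is_character = TRUE)
```

Here `ok_numeric` is empty, while `bad_numeric` and `bad_character` each carry one issue about the `$` prefix.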
The argument passed in the 'metadata' argument can either be a metacore object, or a data.frame containing the data listed below. If metacore is used, no changes to options are required.
For data.frame 'metadata' arguments three columns must be present:
Domain Name - passed as the 'xportr.domain_name' option. Default: "dataset". This is the column subset by the 'domain' argument in the function.
Format Name - passed as the 'xportr.format_name' option. Default: "format". Character values to update the 'format.sas' attribute of the column. This is passed to haven::write to note the format.
Variable Name - passed as the 'xportr.variable_name' option. Default: "variable". This is used to match columns in '.df' argument and the metadata.
adsl <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  BRTHDT = c(1, 1, 2)
)

metadata <- data.frame(
  dataset = c("adsl", "adsl"),
  variable = c("USUBJID", "BRTHDT"),
  format = c(NA, "DATE9.")
)

adsl <- xportr_format(adsl, metadata, domain = "adsl")
Assigns variable label from a variable level metadata to a given data frame. This function will detect if a label is longer than 40 characters, which isn't allowed in XPT v5. If a label isn't present for a variable, it will be assigned an empty character value. Labels are stored in the 'label' attribute of the column.
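The 40-character limit can be checked by hand with base R before any metadata is applied. The sketch below is independent of xportr; the label values are made up for illustration.

```r
# Hypothetical labels; the second one is longer than 40 characters
labels <- c(
  USUBJID = "Unique Subject Identifier",
  DTH30FL = "Flag for death occurring within thirty days of last treatment"
)

# Names of the variables whose label exceeds the XPT v5 limit
too_long <- names(labels)[nchar(labels) > 40]
too_long
```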
xportr_label( .df, metadata = NULL, domain = NULL, verbose = NULL, metacore = deprecated() )
.df |
A data frame of CDISC standard. |
metadata |
A data frame containing variable level metadata. See 'Metadata' section for details. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
verbose |
The action this function takes when an action is taken on the dataset or function validation finds an issue. See 'Messaging' section for details. Options are 'stop', 'warn', 'message', and 'none' |
metacore |
Data frame with label attributes for each variable.
label_log() is the primary messaging tool for xportr_label(). If there are any columns present in the '.df' that are not noted in the metadata, they cannot be assigned a label and a message will be generated noting the number of variables that have not been assigned a label.
If variables were not found in the metadata and the value passed to the 'verbose' argument is 'stop', 'warn', or 'message', a message will be generated detailing the variables that were missing in metadata.
The argument passed in the 'metadata' argument can either be a metacore object, or a data.frame containing the data listed below. If metacore is used, no changes to options are required.
For data.frame 'metadata' arguments three columns must be present:
Domain Name - passed as the 'xportr.domain_name' option. Default: "dataset". This is the column subset by the 'domain' argument in the function.
Variable Name - passed as the 'xportr.variable_name' option. Default: "variable". This is used to match columns in '.df' argument and the metadata.
Variable Label - passed as the 'xportr.label' option. Default: "label". These character values update the 'label' attribute of the column. This is passed to haven::write to note the label.
adsl <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  SITEID = c(001, 002, 003),
  AGE = c(63, 35, 27),
  SEX = c("M", "F", "M")
)

metadata <- data.frame(
  dataset = "adsl",
  variable = c("USUBJID", "SITEID", "AGE", "SEX"),
  label = c("Unique Subject Identifier", "Study Site Identifier", "Age", "Sex")
)

adsl <- xportr_label(adsl, metadata, domain = "adsl")
Assigns the SAS length to a specified data frame, either from a metadata object or based on the calculated maximum data length. If a length isn't present for a variable the length value is set to maximum data length for character columns, and 8 for non-character columns. This value is stored in the 'width' attribute of the column.
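The fallback rule described above (maximum data length for character columns, 8 otherwise) can be sketched in base R. The helper `default_length()` is hypothetical and only mirrors the rule as stated here; it is not how xportr computes lengths internally.

```r
# Hypothetical helper: fallback length for one column when no length is
# available in the metadata.
default_length <- function(x) {
  if (is.character(x)) {
    # Longest value in the column; NA values are ignored
    max(0L, nchar(x), na.rm = TRUE)
  } else {
    8L
  }
}

adsl <- data.frame(
  USUBJID = c("1001", "1002", "1003"),
  SEX = c("M", "F", "M"),
  AGE = c(63, 35, 27),
  stringsAsFactors = FALSE
)

lengths_by_column <- vapply(adsl, default_length, integer(1))
lengths_by_column
```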
xportr_length( .df, metadata = NULL, domain = NULL, verbose = NULL, length_source = c("metadata", "data"), metacore = deprecated() )
.df |
A data frame of CDISC standard. |
metadata |
A data frame containing variable level metadata. See 'Metadata' section for details. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
verbose |
The action this function takes when an action is taken on the dataset or function validation finds an issue. See 'Messaging' section for details. Options are 'stop', 'warn', 'message', and 'none' |
length_source |
Choose the assigned length from either metadata or data. Permitted Values: "metadata", "data" |
metacore |
Data frame with SAS default length attributes for each variable.
length_log is the primary messaging tool for xportr_length. If there are any columns present in the '.df' that are not noted in the metadata, they cannot be assigned a length and a message will be generated noting the number of variables that have not been assigned a length.
If variables were not found in the metadata and the value passed to the 'verbose' argument is 'stop', 'warn', or 'message', a message will be generated detailing the variables that were missing in the metadata.
The argument passed in the 'metadata' argument can either be a {metacore} object, or a data.frame containing the data listed below. If metacore is used, no changes to options are required.
For data.frame 'metadata' arguments three columns must be present:
Domain Name - passed as the 'xportr.domain_name' option. Default: "dataset". This is the column subset by the 'domain' argument in the function.
Variable Name - passed as the 'xportr.variable_name' option. Default: "variable". This is used to match columns in '.df' argument and the metadata.
Variable Length - passed as the 'xportr.length' option. Default: "length". These numeric values update the 'width' attribute of the column. This is passed to haven::write to note the variable length.
adsl <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  BRTHDT = c(1, 1, 2)
)

metadata <- data.frame(
  dataset = c("adsl", "adsl"),
  variable = c("USUBJID", "BRTHDT"),
  length = c(10, 8)
)

adsl <- xportr_length(adsl, metadata, domain = "adsl", length_source = "metadata")
Sets metadata and/or domain for a dataset in a way that can be accessed by other xportr functions. If used at the start of an xportr pipeline, it removes the need to set metadata and domain at each step individually. For details on the format of the metadata, see the 'Metadata' section for each function in question.
xportr_metadata(.df, metadata = NULL, domain = NULL, verbose = NULL)
.df |
A data frame of CDISC standard. |
metadata |
A data frame containing variable level metadata. See 'Metadata' section for details. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
verbose |
The action this function takes when an action is taken on the dataset or function validation finds an issue. See 'Messaging' section for details. Options are 'stop', 'warn', 'message', and 'none' |
.df dataset with metadata and domain attributes set
metadata <- data.frame(
  dataset = "test",
  variable = c("Subj", "Param", "Val", "NotUsed"),
  type = c("numeric", "character", "numeric", "character"),
  format = NA,
  order = c(1, 3, 4, 2)
)

adlb <- data.frame(
  # c() is needed here: as.character(123, 456, 789) would only convert 123
  Subj = as.character(c(123, 456, 789)),
  Different = c("a", "b", "c"),
  Val = c("1", "2", "3"),
  Param = c("param1", "param2", "param3")
)

xportr_metadata(adlb, metadata, "test")

library(magrittr)

adlb %>%
  xportr_metadata(metadata, "test") %>%
  xportr_type() %>%
  xportr_order()
There are two mechanisms for working with options for xportr. One is the options() function, which is part of base R, and the other is the xportr_options() function, which is in the xportr package. The reason for these two mechanisms has to do with legacy code and scoping.
The options() function sets options globally, for the duration of the R process. The getOption() function retrieves the value of an option. All xportr related options of this type are prefixed with "xportr.".
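The base R mechanism can be tried without loading xportr at all, since options() accepts any name. The sketch below sets one of the "xportr."-prefixed options, reads it back, and then restores the previous state; the value "data_label" is just an example.

```r
# options() returns the previous values, which makes restoring easy
old <- options(xportr.df_label = "data_label")

# Read the option back; this is the value set above
current <- getOption("xportr.df_label")
current

# Restore whatever was set before (or unset the option again)
options(old)
```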
xportr_options(...)
... |
Options to set, with the form name = value, or a character vector of option names to retrieve. |
options()
defaults to "dataset"
The name of the domain "name" column in dataset metadata.
defaults to "label"
The column noting the dataset label in dataset metadata.
defaults to "dataset"
The name of the domain "name" column in variable metadata.
defaults to "variable"
The name of the variable "name" in variable metadata.
defaults to "type"
The name of the variable type column in variable metadata.
defaults to "label"
The name of the variable label column in variable metadata.
defaults to "length"
The name of the variable length column in variable metadata.
defaults to "order"
The name of the variable order column in variable metadata.
defaults to "format"
The name of the variable format column in variable metadata.
defaults to "none"
The default argument for the 'verbose' argument for xportr_format.
defaults to "none"
The default argument for the 'verbose' argument for xportr_label.
defaults to "none"
The default argument for the 'verbose' argument for xportr_length.
defaults to "none"
The default argument for the 'verbose' argument for xportr_type.
defaults to "character"
The default character vector used to explicitly coerce R classes to character XPT types.
defaults to c("character", "char", "text", "date", "posixct", "posixt", "datetime", "time", "partialdate", "partialtime", "partialdatetime", "incompletedatetime", "durationdatetime", "intervaldatetime")
The default character vector used to explicitly coerce R classes to character XPT types.
defaults to c("integer", "numeric", "num", "float")
The default character vector used to explicitly coerce R classes to numeric XPT types.
defaults to c("integer", "float", "numeric", "posixct", "posixt", "time", "date", "hms")
The default character vector used to explicitly coerce R classes to numeric XPT types.
xportr_options()
As an alternative to options(), the xportr_options() function can be used to set the options.
The xportr_options() function also returns the current options when a character vector of option keys is passed into it. If nothing is passed into it, it returns the state of all xportr options.
xportr_options("xportr.df_label")
xportr_options(xportr.df_label = "data_label", xportr.label = "custom_label")
xportr_options(c("xportr.label", "xportr.df_label"))
xportr_options()
The dplyr::arrange() function is used to order the columns of the dataframe. Any variables that are missing an order value are appended to the end of the dataframe after all of the variables that have an order.
xportr_order( .df, metadata = NULL, domain = NULL, verbose = NULL, metacore = deprecated() )
.df |
A data frame of CDISC standard. |
metadata |
A data frame containing variable level metadata. See 'Metadata' section for details. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
verbose |
The action this function takes when an action is taken on the dataset or function validation finds an issue. See 'Messaging' section for details. Options are 'stop', 'warn', 'message', and 'none' |
metacore |
Dataframe that has been re-ordered according to spec
var_ord_msg() is the primary messaging tool for xportr_order(). There are two primary messages that are output from var_ord_msg(). The first is the "moved" variables. These are the variables that were not found in the metadata file and moved to the end of the dataset. A message will be generated noting the number, if any, of variables that were moved to the end of the dataset. If any variables were moved, and the 'verbose' argument is 'stop', 'warn', or 'message', a message will be generated detailing the variables that were moved.
The second primary message is the number of variables that were in the dataset, but not in the correct order. A message will be generated noting the number, if any, of variables that have been reordered. If any variables were reordered, and the 'verbose' argument is 'stop', 'warn', or 'message', a message will be generated detailing the variables that were reordered.
The argument passed in the 'metadata' argument can either be a metacore object, or a data.frame containing the data listed below. If metacore is used, no changes to options are required.
For data.frame 'metadata' arguments three columns must be present:
Domain Name - passed as the 'xportr.domain_name' option. Default: "dataset". This is the column subset by the 'domain' argument in the function.
Variable Name - passed as the 'xportr.variable_name' option. Default: "variable". This is used to match columns in '.df' argument and the metadata.
Variable Order - passed as the 'xportr.order_name' option. Default: "order". These values are used to arrange the order of the variables. If the values of order metadata are not numeric, they will be coerced to numeric to prevent alphabetical sorting of numeric values.
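Why the coercion matters, and how unmatched variables end up at the end, can be sketched in base R. This is a simplified illustration and not the dplyr-based implementation; the data frame and metadata below are made up.

```r
# Order values stored as text sort alphabetically, so "10" lands before "2"
order_chr <- c("1", "10", "2")
alphabetical <- sort(order_chr)
numeric_sorted <- sort(as.numeric(order_chr))

df <- data.frame(
  BRTHDT = 1, EXTRA = "x", USUBJID = 1001, STUDYID = "mid987650",
  stringsAsFactors = FALSE
)
metadata <- data.frame(
  variable = c("USUBJID", "STUDYID", "BRTHDT"),
  order = c("2", "1", "10"),
  stringsAsFactors = FALSE
)

# Variables in the spec, sorted by their numeric order value
in_spec <- metadata$variable[order(as.numeric(metadata$order))]
in_spec <- in_spec[in_spec %in% names(df)]

# Variables missing from the spec are appended at the end
not_in_spec <- setdiff(names(df), in_spec)
df_ordered <- df[, c(in_spec, not_in_spec)]
names(df_ordered)
```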
adsl <- data.frame(
  BRTHDT = c(1, 1, 2),
  STUDYID = c("mid987650", "mid987650", "mid987650"),
  TRT01A = c("Active", "Active", "Placebo"),
  USUBJID = c(1001, 1002, 1003)
)

metadata <- data.frame(
  dataset = c("adsl", "adsl", "adsl", "adsl"),
  variable = c("STUDYID", "USUBJID", "TRT01A", "BRTHDT"),
  order = 1:4
)

adsl <- xportr_order(adsl, metadata, domain = "adsl")
This function is deprecated. Please use the argument max_size_gb in the function xportr_write() instead.
Per the FDA Study Data Technical Conformance Guide (https://www.fda.gov/media/88173/download) section 3.3.2, dataset file sizes shouldn't exceed 5 GB. If datasets are large enough, they should be split based on a variable. For example, laboratory readings in ADLB can be split by LBCAT to separate hematology and chemistry data.
This function will tell xportr_write() to split the data frame based on the variable passed in split_by. When written, the file name will be prepended with a number for uniqueness. These files should be noted in the Reviewer Guides, per CDISC guidance, to document how the files were split.
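The idea of splitting a dataset by a category variable can be illustrated with base R's split(). This sketch only shows the grouping step on a made-up data frame; it does not reproduce how xportr_write() names or writes the resulting files.

```r
adlb <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  LBCAT = c("HEMATOLOGY", "HEMATOLOGY", "CHEMISTRY"),
  stringsAsFactors = FALSE
)

# One data frame per LBCAT value, returned as a named list
parts <- split(adlb, adlb$LBCAT)

part_names <- names(parts)
part_rows <- vapply(parts, nrow, integer(1))
part_rows
```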
xportr_split(.df, split_by = NULL)
.df |
A data frame of CDISC standard. |
split_by |
A quoted variable that will be passed to |
A data frame with an additional attribute added so xportr_write() knows how to split the data frame.
adlb <- data.frame(
  USUBJID = c(1001, 1002, 1003),
  LBCAT = c("HEMATOLOGY", "HEMATOLOGY", "CHEMISTRY")
)

adlb <- xportr_split(adlb, "LBCAT")
XPT v5 datasets only have data types of character and numeric. xportr_type() attempts to collapse R classes to those two XPT types. The 'xportr.character_types' option is used to explicitly collapse the class of a column to character using as.character(). Similarly, 'xportr.numeric_types' will collapse a column to a numeric type. (See xportr_options() for default values of these options.) If no type is passed for a variable, it is assumed to be numeric and coerced with as.numeric().
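The collapsing rule can be sketched in base R as a simple lookup followed by a coercion. The helper `coerce_to_xpt()` is hypothetical, and the type vector is a shortened version of the documented default; treating every non-character type as numeric is a simplification of what xportr_type() does.

```r
# Shortened, illustrative list of metadata types treated as character
character_types <- c("character", "char", "text")

# Hypothetical helper: collapse a column to one of the two XPT types
coerce_to_xpt <- function(x, type) {
  if (tolower(type) %in% character_types) {
    as.character(x)
  } else {
    # Anything else is treated as numeric in this sketch
    as.numeric(x)
  }
}

val <- coerce_to_xpt(c("1", "2", "3"), "numeric")
subj <- coerce_to_xpt(c(123, 456, 789), "Char")
```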
xportr_type( .df, metadata = NULL, domain = NULL, verbose = NULL, metacore = deprecated() )
.df |
A data frame of CDISC standard. |
metadata |
A data frame containing variable level metadata. See 'Metadata' section for details. |
domain |
Appropriate CDISC dataset name, e.g. ADAE, DM. Used to subset the metadata object. |
verbose |
The action this function takes when an action is taken on the dataset or function validation finds an issue. See 'Messaging' section for details. Options are 'stop', 'warn', 'message', and 'none' |
metacore |
Certain care should be taken when using timing variables. R serializes dates based on a reference date of 01/01/1970 where XPT uses 01/01/1960. This can result in dates being 10 years off when outputting from R to XPT if you're using a date class. For this reason, xportr will try to determine what should happen with variables that appear to be used to denote time.
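The size of the offset between the two reference dates can be computed directly in base R. The example date below is arbitrary.

```r
r_origin <- as.Date("1970-01-01")
sas_origin <- as.Date("1960-01-01")

# Ten years including the leap days of 1960, 1964 and 1968
offset_days <- as.numeric(r_origin - sas_origin)

d <- as.Date("2024-12-10")
days_since_1970 <- as.numeric(d)               # how R stores the date
days_since_1960 <- days_since_1970 + offset_days # the same date counted from 1960
```

A date written out as its raw R number without this adjustment would be read back about 10 years too early by a tool that counts from 1960.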
Returns the modified table.
type_log() is the primary messaging tool for xportr_type(). The number of column types that mismatch the reported type in the metadata, if any, is reported by xportr_type(). If there are any type mismatches, and the 'verbose' argument is 'stop', 'warn', or 'message', each mismatch will be detailed with the actual type in the data and the type noted in the metadata.
The argument passed in the 'metadata' argument can either be a metacore object, or a data.frame containing the data listed below. If metacore is used, no changes to options are required.
For data.frame 'metadata' arguments four columns must be present:
Domain Name - passed as the 'xportr.domain_name' option. Default: "dataset". This is the column subset by the 'domain' argument in the function.
Variable Name - passed as the 'xportr.variable_name' option. Default: "variable". This is used to match columns in '.df' argument and the metadata.
Variable Type - passed as the 'xportr.type_name' option. Default: "type". This is used to note the XPT variable "type"; the options are numeric or character.
(Option only) Character Types - The list of classes that should be explicitly coerced to a XPT Character type. Default: c("character", "char", "text", "date", "posixct", "posixt", "datetime", "time", "partialdate", "partialtime", "partialdatetime", "incompletedatetime", "durationdatetime", "intervaldatetime")
(Option only) Numeric Types - The list of classes that should be explicitly coerced to a XPT numeric type. Default: c("integer", "numeric", "num", "float")
metadata <- data.frame(
  dataset = "test",
  variable = c("Subj", "Param", "Val", "NotUsed"),
  type = c("numeric", "character", "numeric", "character")
)

.df <- data.frame(
  # c() is needed here: as.character(123, 456, 789) would only convert 123
  Subj = as.character(c(123, 456, 789)),
  Different = c("a", "b", "c"),
  Val = c("1", "2", "3"),
  Param = c("param1", "param2", "param3")
)

df2 <- xportr_type(.df, metadata, "test")
Writes a local data frame into SAS transport file of version 5. The SAS transport format is an open format, as is required for submission of the data to the FDA.
xportr_write( .df, path, max_size_gb = NULL, metadata = NULL, domain = NULL, strict_checks = FALSE, label = deprecated() )
Variable and dataset labels are stored in the "label" attribute.
SAS formats are stored in the "SASformat" attribute.
SAS types are based on the metadata attribute.
A data frame. xportr_write() returns the input data invisibly.
The argument passed in the 'metadata' argument can either be a metacore object, or a data.frame containing the data listed below. If metacore is used, no changes to options are required.
For data.frame 'metadata' arguments two columns must be present:
Domain Name - passed as the 'xportr.df_domain_name' option. Default: "dataset". This is the column subset by the 'domain' argument in the function.
Label Name - passed as the 'xportr.df_label' option. Default: "label". Character values to update the 'label' attribute of the dataframe. This is passed to haven::write_xpt to note the label.
adsl <- data.frame(
  # c() is needed here: as.character(123, 456, 789) would only convert 123
  SUBL = as.character(c(123, 456, 789)),
  DIFF = c("a", "b", "c"),
  VAL = c("1", "2", "3"),
  PARAM = c("param1", "param2", "param3")
)

var_spec <- data.frame(
  dataset = "adsl",
  label = "Subject-Level Analysis Dataset",
  data_label = "ADSL"
)

xportr_write(adsl,
  path = paste0(tempdir(), "/adsl.xpt"),
  domain = "adsl",
  metadata = var_spec,
  strict_checks = FALSE
)
Function used to validate dataframes before they are sent to haven::write_xpt for writing.
xpt_validate(data)
data |
Dataset to be exported as xpt file |
Returns a character vector of failed conditions