| Title: | A Centralized Metadata Object Focus on Clinical Trial Data Programming Workflows |
|---|---|
| Description: | Create an immutable container holding metadata for the purpose of better enabling programming activities and functionality of other packages within the clinical programming workflow. |
| Authors: | Liam Hobby [aut, cre], Christina Fillmore [aut] (ORCID: <https://orcid.org/0000-0003-0595-2302>), Bill Denney [aut], Maya Gans [aut] (ORCID: <https://orcid.org/0000-0002-5452-6089>), Ashley Tarasiewicz [aut], Mike Stackhouse [aut] (ORCID: <https://orcid.org/0000-0001-6030-723X>), Tamara Senior [aut], GSK/Atorus JPT [cph, fnd] |
| Maintainer: | Liam Hobby <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.0 |
| Built: | 2026-05-21 09:53:37 UTC |
| Source: | https://github.com/atorus-research/metacore |
These functions check whether values (e.g., labels, formats) that should be consistent for a variable across all datasets are actually consistent.

    check_inconsistent_labels(metacore)
    check_inconsistent_types(metacore)
    check_inconsistent_formats(metacore)

| metacore | metacore object to check |

If all variables are consistent, a message is returned. If there are inconsistencies, a message and a dataset of the variables with inconsistencies are returned.

    ## EXAMPLE WITH DUPLICATES
    # Loads in a metacore obj called metacore
    load(metacore_example("pilot_ADaM.rda"))
    check_inconsistent_labels(metacore)
    check_inconsistent_types(metacore)

    ## EXAMPLE WITHOUT DUPLICATES
    # Loads in a metacore obj called metacore
    load(metacore_example("pilot_SDTM.rda"))
    check_inconsistent_labels(metacore)
    check_inconsistent_formats(metacore)
    check_inconsistent_types(metacore)
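
The idea behind these checks can be pictured with a small base R sketch. It is an illustration only, not the package's implementation: the `var_meta` table and its column names are hypothetical stand-ins for variable-level metadata.

```r
# Hypothetical variable-level metadata: one row per dataset/variable pair.
# The column names are illustrative, not metacore's internal layout.
var_meta <- data.frame(
  dataset  = c("ADSL", "ADAE", "ADSL", "ADAE"),
  variable = c("USUBJID", "USUBJID", "AGE", "AGE"),
  label    = c("Unique Subject Identifier", "Unique Subject Identifier",
               "Age", "Age (years)"),
  stringsAsFactors = FALSE
)

# Count the distinct labels recorded for each variable.
n_labels <- tapply(var_meta$label, var_meta$variable,
                   function(x) length(unique(x)))

# Variables with more than one distinct label are inconsistent.
inconsistent <- names(n_labels)[n_labels > 1]

if (length(inconsistent) == 0) {
  message("No inconsistent labels found")
} else {
  message("Inconsistent labels found")
  print(var_meta[var_meta$variable %in% inconsistent, ])
}
```

Here `AGE` carries two different labels, so it is the only variable reported.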
Column Validation Function

    check_structure(.data, col, func, any_na_acceptable, nm)

| .data | the dataframe to check the column for |
| col | the column to test |
| func | the function to use to assert column structure |
| any_na_acceptable | Boolean; whether the column may contain missing values |
| nm | name of column to check (for warning and error clarification) |
Check Words in Column

    check_words(..., col)

| ... | permissible words in the column |
| col | the column to check for specific words |
This function creates a table from Excel sheets. It is mainly used internally for building spec readers, but is exported so that others who need to build spec readers can use it.

    create_tbl(doc, cols, context)

| doc | list of sheets from an Excel doc |
| cols | vector of regular expressions used to select a dataset based on which columns it has. If the vector is named, the columns are also renamed |
| context | Provides the calling context for better error messaging to the user |
dataset (or list of datasets if not specific enough)
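
The column-matching behaviour described above can be sketched in base R. This is a simplified illustration under assumptions, not the package's code: the sheets in `doc`, the patterns in `cols`, and the helpers `has_all_cols()` and `pick_cols()` are all hypothetical.

```r
# Hypothetical sheets, standing in for the list read from a spec document.
doc <- list(
  Datasets  = data.frame(Dataset = "AE", Label = "Adverse Events",
                         stringsAsFactors = FALSE),
  Variables = data.frame(Dataset = "AE", Variable = "AETERM", Length = 200,
                         stringsAsFactors = FALSE)
)

# Named vector of regular expressions; the names become the new column names.
cols <- c(dataset = "[D|d]ataset", variable = "[V|v]ariable",
          length = "[L|l]ength")

# TRUE when every pattern matches at least one column name of the sheet.
has_all_cols <- function(sheet, patterns) {
  all(vapply(patterns, function(p) any(grepl(p, names(sheet))), logical(1)))
}

# Keep only the sheets that contain every requested column.
matches <- Filter(function(sheet) has_all_cols(sheet, cols), doc)

# Keep the first matching column for each pattern and rename it.
pick_cols <- function(sheet, patterns) {
  idx <- vapply(patterns, function(p) grep(p, names(sheet))[1], integer(1))
  out <- sheet[, idx, drop = FALSE]
  names(out) <- names(patterns)
  out
}

result <- lapply(matches, pick_cols, patterns = cols)
```

Only the `Variables` sheet has all three columns, so `result` holds a single renamed table. If several sheets matched, the list would contain more than one element, which mirrors the "list of datasets if not specific enough" return value.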
Given a path, this function converts the define XML to a DataDef/Metacore object.

    define_to_metacore(path, quiet = deprecated(), verbose = "message")

Metacore/DataDef object
Returns the control term (a vector for permitted values and a tibble for code lists) for a given variable. The dataset can optionally be specified if the control terminology differs between datasets.

    get_control_term(metacode, variable, dataset = NULL)

| metacode | metacore object |
| variable | A variable name to get the controlled terms for. This can be either a string or the bare name of the variable |
| dataset | A dataset name. This is not required if there is only one set of control terminology across all datasets |

a vector for permitted values and a 2-column tibble for codelists

    ## Not run:
    meta_ex <- spec_to_metacore(metacore_example("p21_mock.xlsx"))
    get_control_term(meta_ex, QVAL, SUPPAE)
    get_control_term(meta_ex, "QVAL", "SUPPAE")
    ## End(Not run)

Returns the dataset keys for a given dataset

    get_keys(metacode, dataset)

| metacode | metacore object |
| dataset | A dataset name |

a 2-column tibble with dataset key variables and key sequence

    ## Not run:
    meta_ex <- spec_to_metacore(metacore_example("p21_mock.xlsx"))
    get_keys(meta_ex, "AE")
    get_keys(meta_ex, AE)
    ## End(Not run)

Is DatasetMeta object

    is_DatasetMeta(x)

| x | object to check |

TRUE if DatasetMeta, FALSE if not

    load(metacore_example("pilot_ADaM.rda"))
    adsl <- select_dataset(metacore, "ADSL", quiet = TRUE)
    is_DatasetMeta("DUMMY") # Expect FALSE
    is_DatasetMeta(metacore) # Expect FALSE
    is_DatasetMeta(adsl) # Expect TRUE

Is metacore object

    is_metacore(x)

| x | object to check |

TRUE if metacore, FALSE if not

    # Loads in a metacore obj called metacore
    load(metacore_example("pilot_ADaM.rda"))
    is_metacore(metacore)

load metacore object

    load_metacore(path = NULL)

| path | location of the metacore object to load into memory |

metacore object in memory
R6 Class wrapper to create your own metacore object

    metacore(
      ds_spec = tibble(dataset = character(), structure = character(), label = character()),
      ds_vars = tibble(dataset = character(), variable = character(), keep = NULL,
        mandatory = logical(), key_seq = integer(), order = integer(),
        core = character(), supp_flag = logical()),
      var_spec = tibble(variable = character(), label = character(), length = integer(),
        type = character(), common = character(), format = character()),
      value_spec = tibble(dataset = character(), variable = character(), where = character(),
        type = character(), sig_dig = integer(), code_id = character(),
        origin = character(), derivation_id = integer()),
      derivations = tibble(derivation_id = integer(), derivation = character()),
      codelist = tibble(code_id = character(), name = character(), type = character(),
        codes = list()),
      supp = tibble(dataset = character(), variable = character(), idvar = character(),
        qeval = character()),
      quiet = deprecated(),
      verbose = "message"
    )

metacore comes bundled with a number of sample files in its inst/extdata
directory. This function makes them easy to access. When testing or writing
examples in other packages, it is best to use the 'pilot_ADaM.rda' example, as
it loads fastest.

    metacore_example(file = NULL)

| file | Name of file. If |

    metacore_example()
    metacore_example("mock_spec.xlsx")

Select method to subset by a single dataframe

    MetaCore_filter(value)

| value | the dataframe to subset by |

Given a path to a file, this function reads in all sheets of an Excel file.

    read_all_sheets(path)

| path | string of the file path |

a list of datasets
save metacore object

    save_metacore(metacore_object, path = NULL)

| metacore_object | the metacore object in memory to save to disk |
| path | file path and file name to save metacore object |

an .rda file
Subset a metacore object to a single dataset

    select_dataset(
      .data,
      dataset,
      simplify = FALSE,
      quiet = deprecated(),
      verbose = "message"
    )

a filtered subset of the metacore object
This function takes the location of an Excel specification document and reads it in as a metacore object. At the moment it only supports specifications in the Pinnacle 21 format, but the section-level spec builders can be used as building blocks for bespoke specification documents.

    spec_to_metacore(
      path,
      quiet = deprecated(),
      where_sep_sheet = TRUE,
      verbose = "message"
    )

given a spec document it returns a metacore object

    # Run `spec_to_metacore` with `verbose = "collapse"`
    spec_path <- metacore_example("p21_mock.xlsx")
    metacore <- spec_to_metacore(
      path = spec_path,
      verbose = "collapse"
    )

    # Run `spec_to_metacore` with `verbose = "warn"`
    metacore <- spec_to_metacore(
      path = spec_path,
      verbose = "warn"
    )

Check the type of spec document

    spec_type(path)

| path | file location as a string |

returns string indicating the type of spec document
Creates the codelist table from a list of datasets (optionally filtered by the
sheet input). The named vectors *_cols are used to determine which is the
correct sheet and to rename the columns.

    spec_type_to_codelist(
      doc,
      codelist_cols = c(code_id = "ID", name = "[N|n]ame", code = "^[C|c]ode|^[T|t]erm", decode = "[D|d]ecode"),
      permitted_val_cols = NULL,
      dict_cols = c(code_id = "ID", name = "[N|n]ame", dictionary = "[D|d]ictionary", version = "[V|v]ersion"),
      sheets = NULL,
      simplify = FALSE
    )

| doc | Named list of datasets |
| codelist_cols | Named vector of column names that make up the codelist. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
| permitted_val_cols | Named vector of column names that make up the permitted values. The column names can be regular expressions for more flexibility. This is optional and can be left as NULL if there isn't a permitted value sheet |
| dict_cols | Named vector of column names that make up the dictionary values. The column names can be regular expressions for more flexibility. This is optional and can be left as NULL if there isn't a dictionary sheet |
| sheets | Optional, regular expressions of the sheets |
| simplify | Boolean value; if TRUE, code/decode pairs that are all equal are converted to a permitted value list. FALSE by default |

a dataset formatted for the metacore object
Other spec builders:
spec_type_to_derivations(),
spec_type_to_ds_spec(),
spec_type_to_ds_vars(),
spec_type_to_value_spec(),
spec_type_to_var_spec()
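
The effect of the `simplify` argument of spec_type_to_codelist() can be illustrated with a base R sketch. It assumes a simplified representation of code lists: the `codelists` list and the helper `simplify_codelist()` are hypothetical and do not reflect the package's internals.

```r
# Hypothetical code lists: each has a code column and a decode column.
codelists <- list(
  NY  = data.frame(code = c("N", "Y"), decode = c("No", "Yes"),
                   stringsAsFactors = FALSE),
  ARM = data.frame(code = c("Placebo", "Drug A"),
                   decode = c("Placebo", "Drug A"),
                   stringsAsFactors = FALSE)
)

# If every decode equals its code, the decodes carry no extra information,
# so the list can be reduced to a plain vector of permitted values.
simplify_codelist <- function(cl) {
  if (all(cl$code == cl$decode)) cl$code else cl
}

simplified <- lapply(codelists, simplify_codelist)
```

In this sketch `NY` stays a two-column table, while `ARM` collapses to a character vector of permitted values.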
Creates the derivation table from a list of datasets (optionally filtered by
the sheet input). The named vector cols is used to determine which is the
correct sheet and renames the columns. The derivation will be used for
"derived" origins, the comments for "assigned" origins, and predecessor for
"predecessor" origins.

    spec_type_to_derivations(
      doc,
      cols = c(derivation_id = "ID", derivation = "[D|d]efinition|[D|d]escription"),
      sheet = "Method|Derivations?",
      var_cols = c(dataset = "[D|d]ataset|[D|d]omain", variable = "[N|n]ame|[V|v]ariables?",
        origin = "[O|o]rigin", predecessor = "[P|p]redecessor", comment = "[C|c]omment")
    )

| doc | Named list of datasets |
| cols | Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
| sheet | Regular expression for the sheet name |
| var_cols | Named vector of the name(s) of the origin, predecessor and comment columns. These do not have to be on the specified sheet. |

a dataset formatted for the metacore object
Other spec builders:
spec_type_to_codelist(),
spec_type_to_ds_spec(),
spec_type_to_ds_vars(),
spec_type_to_value_spec(),
spec_type_to_var_spec()
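
The origin-dependent choice of derivation text that spec_type_to_derivations() describes can be sketched in base R. The sketch makes simplifying assumptions: the `vars` table is hypothetical, and the `method` column holds the derivation text directly rather than an ID that is looked up on a separate sheet.

```r
# Hypothetical variable-level rows with origin, predecessor and comment.
vars <- data.frame(
  variable    = c("AGE", "TRTSDT", "STUDYID"),
  origin      = c("Predecessor", "Derived", "Assigned"),
  predecessor = c("DM.AGE", NA, NA),
  comment     = c(NA, NA, "Set to the protocol number"),
  method      = c(NA, "Date of first exposure to treatment", NA),
  stringsAsFactors = FALSE
)

# Pick the source of the derivation text according to the origin.
origin <- tolower(vars$origin)
vars$derivation <- ifelse(
  origin == "derived", vars$method,
  ifelse(origin == "assigned", vars$comment,
         ifelse(origin == "predecessor", vars$predecessor, NA_character_))
)
```

Each row ends up with exactly one derivation string, taken from the column that matches its origin.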
Creates the ds_spec from a list of datasets (optionally filtered by the sheet
input). The named vector cols is used to determine which is the correct
sheet and renames the columns.

    spec_type_to_ds_spec(
      doc,
      cols = c(dataset = "[N|n]ame|[D|d]ataset|[D|d]omain", structure = "[S|s]tructure",
        label = "[L|l]abel|[D|d]escription"),
      sheet = NULL
    )

| doc | Named list of datasets |
| cols | Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
| sheet | Regular expression for the sheet name |

a dataset formatted for the metacore object
Other spec builders:
spec_type_to_codelist(),
spec_type_to_derivations(),
spec_type_to_ds_vars(),
spec_type_to_value_spec(),
spec_type_to_var_spec()
Creates the ds_vars from a list of datasets (optionally filtered by the sheet
input). The named vector cols is used to determine which is the correct
sheet and renames the columns.

    spec_type_to_ds_vars(
      doc,
      cols = c(dataset = "[D|d]ataset|[D|d]omain", variable = "[V|v]ariable [[N|n]ame]?|[V|v]ariables?",
        order = "[V|v]ariable [O|o]rder|[O|o]rder", mandatory = "[K|k]eep|[M|m]andatory"),
      key_seq_sep_sheet = TRUE,
      key_seq_cols = c(dataset = "Dataset", key_seq = "Key Variables"),
      sheet = "[V|v]ar|Datasets"
    )

| doc | Named list of datasets |
| cols | Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
| key_seq_sep_sheet | A Boolean to indicate if the key sequence is on a separate sheet. If set to false add the key_seq column name to the |
| key_seq_cols | named vector to get the key_sequence for each dataset |
| sheet | Regular expression for the sheet names |

a dataset formatted for the metacore object
Other spec builders:
spec_type_to_codelist(),
spec_type_to_derivations(),
spec_type_to_ds_spec(),
spec_type_to_value_spec(),
spec_type_to_var_spec()
Creates the value_spec from a list of datasets (optionally filtered by the
sheet input). The named vector cols is used to determine which is the
correct sheet and renames the columns.

    spec_type_to_value_spec(
      doc,
      cols = c(dataset = "[D|d]ataset|[D|d]omain", variable = "[N|n]ame|[V|v]ariables?",
        origin = "[O|o]rigin", type = "[T|t]ype", code_id = "[C|c]odelist|Controlled Term",
        sig_dig = "[S|s]ignificant", where = "[W|w]here", derivation_id = "[M|m]ethod",
        predecessor = "[P|p]redecessor"),
      sheet = NULL,
      where_sep_sheet = TRUE,
      where_cols = c(id = "ID", where = c("Variable", "Comparator", "Value")),
      var_sheet = "[V|v]ar"
    )

| doc | Named list of datasets |
| cols | Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
| sheet | Regular expression for the sheet name |
| where_sep_sheet | Boolean value to control whether the where information is in a separate dataset. If the where information is on a separate sheet, set to true and provide the column information with the |
| where_cols | Named list with an id and where field. All columns in the where field will be collapsed together |
| var_sheet | Name of the sheet with the variable information on it. Metacore expects each variable to have a row in the value_spec. Because many specifications only have information in the value tab, this is added. If the information already exists in the value tab of your specification, set to NULL |

a dataset formatted for the metacore object
Other spec builders:
spec_type_to_codelist(),
spec_type_to_derivations(),
spec_type_to_ds_spec(),
spec_type_to_ds_vars(),
spec_type_to_var_spec()
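
The way the `where` columns of spec_type_to_value_spec() are "collapsed together" can be pictured with a base R sketch. The `where_tbl` contents are hypothetical, and the sketch only shows plain concatenation; whether the package further transforms the comparator is not stated here.

```r
# Hypothetical where-clause sheet with an ID and three where columns.
where_tbl <- data.frame(
  ID         = c("WC.ADLB.PARAMCD.ALT", "WC.ADLB.PARAMCD.AST"),
  Variable   = c("PARAMCD", "PARAMCD"),
  Comparator = c("EQ", "EQ"),
  Value      = c("ALT", "AST"),
  stringsAsFactors = FALSE
)

# Columns that together make up the where clause.
where_cols <- c("Variable", "Comparator", "Value")

# Collapse the where columns row by row into a single string.
where_tbl$where <- do.call(paste, c(where_tbl[where_cols], sep = " "))
```

Each row now carries one `where` string that can be joined back to the value-level metadata through the ID column.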
Creates the var_spec from a list of datasets (optionally filtered by the sheet
input). The named vector cols is used to determine which is the correct
sheet and renames the columns. (Note: the keep column will be converted to logical.)

    spec_type_to_var_spec(
      doc,
      cols = c(variable = "[N|n]ame|[V|v]ariables?", length = "[L|l]ength", label = "[L|l]abel",
        type = "[T|t]ype", dataset = "[D|d]ataset|[D|d]omain", format = "[F|f]ormat"),
      sheet = "[V|v]ar"
    )

| doc | Named list of datasets |
| cols | Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
| sheet | Regular expression for the sheet name |

a dataset formatted for the metacore object
Other spec builders:
spec_type_to_codelist(),
spec_type_to_derivations(),
spec_type_to_ds_spec(),
spec_type_to_ds_vars(),
spec_type_to_value_spec()
This function is a wrapper around the functions is_metacore and
is_DatasetMeta.
This function is not intended to be called directly by the user. It is
used as a guard clause in many features of the {metatools} package that are
intended only to be used with the subsetted Metacore object of class type
DatasetMeta. If either of the wrapped functions returns FALSE,
execution is stopped and an appropriate error message is displayed.

    verify_DatasetMeta(metacore)

| metacore | An object whose class type needs to be checked. |

Logical: TRUE if the class type of metacore is DatasetMeta,
otherwise abort with errors.

    load(metacore_example("pilot_ADaM.rda"))
    adsl <- select_dataset(metacore, "ADSL", quiet = TRUE)
    ## Not run:
    verify_DatasetMeta("DUMMY") # Expect error
    verify_DatasetMeta(metacore) # Expect error
    ## End(Not run)
    verify_DatasetMeta(adsl) # Expect valid, i.e., return TRUE

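
The guard-clause pattern that verify_DatasetMeta() supports can be sketched in base R. Everything here is a hypothetical stand-in: the class names, the helpers `is_meta()` and `is_subsetted()`, and the downstream function `describe_dataset()` are illustrative, and real code would call the package's own checks instead.

```r
# Hypothetical stand-ins for the two wrapped checks.
is_meta      <- function(x) inherits(x, "Metacore")
is_subsetted <- function(x) inherits(x, "DatasetMeta")

# Guard: stop with a clear message unless both checks pass.
verify_subsetted <- function(x) {
  if (!is_meta(x)) {
    stop("Expected a metacore object.", call. = FALSE)
  }
  if (!is_subsetted(x)) {
    stop("Expected a subsetted (DatasetMeta) object.", call. = FALSE)
  }
  TRUE
}

# A hypothetical downstream function that starts with the guard clause.
describe_dataset <- function(x) {
  verify_subsetted(x)
  "ok to proceed"
}

# Placeholder objects: a full object and a single-dataset subset.
full_like <- structure(list(), class = "Metacore")
adsl_like <- structure(list(), class = c("DatasetMeta", "Metacore"))

ok  <- describe_dataset(adsl_like)
err <- tryCatch(describe_dataset(full_like),
                error = function(e) conditionMessage(e))
```

Placing the check on the first line keeps the body of the downstream function free of class handling, and the caller gets the error before any work is done.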
Reads in a define XML and creates a code_list table. The code_list table is a nested tibble where each row is a code list or permitted value list. The code column contains a vector or a tibble, depending on whether it is a permitted value list or a code list.

    xml_to_codelist(doc)

| doc | xml document |

a tibble containing the code list and permitted value information
Other xml builder:
xml_to_derivations(),
xml_to_ds_spec(),
xml_to_ds_vars(),
xml_to_value_spec(),
xml_to_var_spec()
This reads in an XML document and gets all the derivations/comments. These can be cross-referenced to variables using the derivation_id's.

    xml_to_derivations(doc)

| doc | xml document |

dataframe with derivation id's and derivations
Other xml builder:
xml_to_codelist(),
xml_to_ds_spec(),
xml_to_ds_vars(),
xml_to_value_spec(),
xml_to_var_spec()
Creates a dataset specification, which has the domain name and label for each dataset

    xml_to_ds_spec(doc)

| doc | xml document |

data frame with the data set specifications
Other xml builder:
xml_to_codelist(),
xml_to_derivations(),
xml_to_ds_vars(),
xml_to_value_spec(),
xml_to_var_spec()
Creates the ds_vars table, which acts as a key between the datasets and the var spec

    xml_to_ds_vars(doc)

| doc | xml document |

data frame with the dataset and variables
Other xml builder:
xml_to_codelist(),
xml_to_derivations(),
xml_to_ds_spec(),
xml_to_value_spec(),
xml_to_var_spec()
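
The role of ds_vars as a key between the datasets and the var spec can be shown with a small base R join. The two tables below are hypothetical, cut-down versions of the real ones, with only the columns needed for the illustration.

```r
# Hypothetical ds_vars: which variables belong to which dataset.
ds_vars <- data.frame(
  dataset  = c("DM", "DM", "AE"),
  variable = c("USUBJID", "AGE", "USUBJID"),
  stringsAsFactors = FALSE
)

# Hypothetical var_spec: one row of attributes per variable.
var_spec <- data.frame(
  variable = c("USUBJID", "AGE"),
  label    = c("Unique Subject Identifier", "Age"),
  stringsAsFactors = FALSE
)

# Look up the variable attributes for a single dataset via ds_vars.
dm_vars <- merge(ds_vars[ds_vars$dataset == "DM", ], var_spec,
                 by = "variable")
```

Because variable attributes live once in var_spec, a variable shared by several datasets (here `USUBJID`) does not need to be described repeatedly.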
Takes a define XML and pulls out the value-level metadata, including codelist_id's, defines_id's, and where clauses. There is one row per variable except when there is a where clause, in which case there is one row per value.

    xml_to_value_spec(doc)

| doc | xml document |

tibble with the value level information
Other xml builder:
xml_to_codelist(),
xml_to_derivations(),
xml_to_ds_spec(),
xml_to_ds_vars(),
xml_to_var_spec()
Takes a define XML and returns a dataset with specifications for each variable. The variable will just be the variable, unless the specification for that variable differs between datasets.

    xml_to_var_spec(doc)

| doc | define xml document |

data frame with variable, length, label columns
Other xml builder:
xml_to_codelist(),
xml_to_derivations(),
xml_to_ds_spec(),
xml_to_ds_vars(),
xml_to_value_spec()