Title: | A Centralized Metadata Object Focus on Clinical Trial Data Programming Workflows |
---|---|
Description: | Create an immutable container holding metadata for the purpose of better enabling programming activities and functionality of other packages within the clinical programming workflow. |
Authors: | Christina Fillmore [aut, cre] , Maya Gans [aut] , Ashley Tarasiewicz [aut], Mike Stackhouse [aut] , Tamara Senior [aut], GSK/Atorus JPT [cph, fnd] |
Maintainer: | Christina Fillmore <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2024-10-29 03:00:58 UTC |
Source: | https://github.com/atorus-research/metacore |
This function checks for vector types and accepted words
check_columns(ds_spec, ds_vars, var_spec, value_spec, derivations, codelist, supp)
ds_spec |
dataset specification |
ds_vars |
dataset variables |
var_spec |
variable specification |
value_spec |
value specification |
derivations |
derivation information |
codelist |
codelist information |
supp |
supp information |
These functions check whether values (e.g., labels, formats) that should be consistent for a variable across all data are actually consistent.
check_inconsistent_labels(metacore)
check_inconsistent_types(metacore)
check_inconsistent_formats(metacore)
metacore |
metacore object to check |
If all variables are consistent it will return a message. If there are inconsistencies it will return a message and a dataset of the variables with inconsistencies.
## EXAMPLE WITH DUPLICATES
# Loads in a metacore obj called metacore
load(metacore_example("pilot_ADaM.rda"))
check_inconsistent_labels(metacore)
check_inconsistent_types(metacore)

## EXAMPLE WITHOUT DUPLICATES
# Loads in a metacore obj called metacore
load(metacore_example("pilot_SDTM.rda"))
check_inconsistent_labels(metacore)
check_inconsistent_formats(metacore)
check_inconsistent_types(metacore)
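To make the idea of an inconsistency concrete, the following base R sketch flags variables that carry more than one distinct label across datasets. It is an illustration only, not the package's implementation; the data frame `var_meta` and its contents are made up for the example.

```r
# Hypothetical variable-level metadata: one row per dataset/variable pair
var_meta <- data.frame(
  dataset  = c("ADSL", "ADAE", "ADSL", "ADAE"),
  variable = c("USUBJID", "USUBJID", "AGE", "AGE"),
  label    = c("Unique Subject Identifier", "Unique Subject Identifier",
               "Age", "Age (years)"),
  stringsAsFactors = FALSE
)

# Count the distinct labels recorded for each variable
n_labels <- tapply(var_meta$label, var_meta$variable,
                   function(x) length(unique(x)))

# A variable is inconsistent when it carries more than one distinct label
inconsistent <- names(n_labels)[n_labels > 1]
inconsistent_rows <- var_meta[var_meta$variable %in% inconsistent, ]
print(inconsistent_rows)
```

The same pattern applies to types and formats by swapping the column being counted.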
Column Validation Function
check_structure(.data, col, func, any_na_acceptable, nm)
.data |
the dataframe to check the column for |
col |
the column to test |
func |
the function to use to assert column structure |
any_na_acceptable |
Boolean indicating whether the column can have missing values |
nm |
name of column to check (for warning and error clarification) |
Check Words in Column
check_words(..., col)
... |
permissible words in the column |
col |
the column to check for specific words |
This function creates a table from Excel sheets. It is mainly used internally for building spec readers, but is exported so that others who need to build spec readers can use it.
create_tbl(doc, cols)
doc |
list of sheets from an Excel document |
cols |
Vector of regular expressions used to get a dataset based on which columns it has. If the vector is named, it will also rename the columns |
A dataset (or a list of datasets if cols is not specific enough)
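The selection-and-rename idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code behind `create_tbl()`: the list `doc`, the helpers `has_all_cols()` and `pick_cols()`, and the sheet contents are all hypothetical.

```r
# Hypothetical stand-in for a list of sheets read from a spec document
doc <- list(
  Datasets  = data.frame(Dataset = "AE", Label = "Adverse Events"),
  Variables = data.frame(Dataset = "AE", Variable = "AETERM", Length = 200)
)

# Named patterns: names are the new column names, values are the regex
cols <- c(dataset = "[D|d]ataset", variable = "[V|v]ariable",
          length = "[L|l]ength")

# TRUE when every pattern matches at least one column name of the sheet
has_all_cols <- function(sheet, patterns) {
  all(vapply(patterns, function(p) any(grepl(p, names(sheet))), logical(1)))
}

# Keep only the sheets that satisfy every pattern
matches <- Filter(function(sheet) has_all_cols(sheet, cols), doc)

# Select the first column matching each pattern and rename it
pick_cols <- function(sheet, patterns) {
  idx <- vapply(patterns, function(p) grep(p, names(sheet))[1], integer(1))
  out <- sheet[, idx, drop = FALSE]
  names(out) <- names(patterns)
  out
}

tbl <- pick_cols(matches[[1]], cols)
print(tbl)
```

If more than one sheet satisfied every pattern, `matches` would hold several data frames, which mirrors the "list of datasets" case in the return value.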
Given a path, this function converts the define xml to a DataDef Object
define_to_metacore(path, quiet = FALSE)
path |
location of the define xml as a string |
quiet |
Option to load quietly; this suppresses warnings, but not errors |
DataDef Object
Returns the control term (a vector for permitted values and a tibble for code lists) for a given variable. The dataset can optionally be specified if the control terminology differs between datasets.
get_control_term(metacode, variable, dataset = NULL)
metacode |
metacore object |
variable |
A variable name to get the controlled terms for. This can either be a string or just the name of the variable |
dataset |
A dataset name. This is not required if there is only one set of control terminology across all datasets |
a vector for permitted values and a 2-column tibble for codelists
## Not run:
meta_ex <- spec_to_metacore(metacore_example("p21_mock.xlsx"))
get_control_term(meta_ex, QVAL, SUPPAE)
get_control_term(meta_ex, "QVAL", "SUPPAE")
## End(Not run)
Returns the dataset keys for a given dataset
get_keys(metacode, dataset)
metacode |
metacore object |
dataset |
A dataset name |
a 2-column tibble with dataset key variables and key sequence
## Not run:
meta_ex <- spec_to_metacore(metacore_example("p21_mock.xlsx"))
get_keys(meta_ex, "AE")
get_keys(meta_ex, AE)
## End(Not run)
Is metacore object
is_metacore(x)
x |
object to check |
TRUE if metacore, FALSE if not
# Loads in a metacore obj called metacore
load(metacore_example("pilot_ADaM.rda"))
is_metacore(metacore)
load metacore object
load_metacore(path = NULL)
path |
location of the metacore object to load into memory |
metacore object in memory
R6 Class wrapper to create your own metacore object
metacore(
  ds_spec = tibble(dataset = character(), structure = character(), label = character()),
  ds_vars = tibble(dataset = character(), variable = character(), keep = logical(),
    key_seq = integer(), order = integer(), core = character(), supp_flag = logical()),
  var_spec = tibble(variable = character(), label = character(), length = integer(),
    type = character(), common = character(), format = character()),
  value_spec = tibble(dataset = character(), variable = character(), where = character(),
    type = character(), sig_dig = integer(), code_id = character(), origin = character(),
    derivation_id = integer()),
  derivations = tibble(derivation_id = integer(), derivation = character()),
  codelist = tibble(code_id = character(), name = character(), type = character(),
    codes = list()),
  supp = tibble(dataset = character(), variable = character(), idvar = character(),
    qeval = character())
)
ds_spec |
contains each dataset in the study, with the labels for each |
ds_vars |
information on which variables are in each dataset, plus dataset-specific variable information |
var_spec |
variable information that is shared across all datasets |
value_spec |
parameter-specific information; because the data are long, the specs for WBC might differ from those for HGB |
derivations |
contains the derivations; this allows different variables to have the same derivation |
codelist |
contains the code/decode information |
supp |
contains the idvar and qeval information for supplemental variables |
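The tables described above are linked by shared columns: `variable` ties `ds_vars` to `var_spec`, and `derivation_id` ties `value_spec` to `derivations`. The base R sketch below illustrates those links with plain data frames. The rows are hypothetical, only a subset of the columns from the usage is included, and it does not construct an actual metacore object.

```r
# Hypothetical two-variable study; column names follow the usage above
ds_vars <- data.frame(
  dataset  = c("ADSL", "ADSL"),
  variable = c("USUBJID", "SEX"),
  key_seq  = c(1L, NA),
  stringsAsFactors = FALSE
)
var_spec <- data.frame(
  variable = c("USUBJID", "SEX"),
  label    = c("Unique Subject Identifier", "Sex"),
  type     = c("text", "text"),
  stringsAsFactors = FALSE
)
value_spec <- data.frame(
  dataset       = c("ADSL", "ADSL"),
  variable      = c("USUBJID", "SEX"),
  code_id       = c(NA, "SEX"),
  derivation_id = c(1L, 2L),
  stringsAsFactors = FALSE
)
derivations <- data.frame(
  derivation_id = c(1L, 2L),
  derivation    = c("DM.USUBJID", "DM.SEX"),
  stringsAsFactors = FALSE
)

# Dataset-specific information joined to the shared variable information
adsl_vars <- merge(ds_vars, var_spec, by = "variable")

# Value-level rows joined to the text of their derivation
adsl_derivs <- merge(value_spec, derivations, by = "derivation_id")

print(adsl_vars)
print(adsl_derivs[, c("dataset", "variable", "derivation")])
```

Keeping derivations in their own table is what lets several variables point at the same `derivation_id` without repeating the derivation text.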
metacore comes bundled with a number of sample files in its inst/extdata
directory. This function makes them easy to access. When testing or writing
examples in other packages, it is best to use the 'pilot_ADaM.rda' example as
it loads fastest.
metacore_example(file = NULL)
file |
Name of file. If |
metacore_example()
metacore_example("mock_spec.xlsx")
Select method to subset by a single dataframe
MetaCore_filter(value)
value |
the dataframe to subset by |
Given a path to a file, this function reads in all sheets of an Excel file.
read_all_sheets(path)
path |
string of the file path |
a list of datasets
save metacore object
save_metacore(metacore_object, path = NULL)
metacore_object |
the metacore object in memory to save to disk |
path |
file path and file name to save metacore object |
an .rda file
Subset a metacore object to a single dataset
select_dataset(.data, dataset, simplify = FALSE)
.data |
the metacore object of dataframes |
dataset |
the specific dataset to subset by |
simplify |
return a single dataframe |
a filtered subset of the metacore object
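A short usage sketch, assuming the metacore package is installed and that the bundled `pilot_ADaM.rda` example contains an `ADSL` dataset (an assumption not stated in this section). The block is skipped when the package is unavailable.

```r
# Assumes the metacore package is installed; skipped otherwise
if (requireNamespace("metacore", quietly = TRUE)) {
  library(metacore)

  # Loads in a metacore obj called metacore
  load(metacore_example("pilot_ADaM.rda"))

  # Subset to one dataset; the result is a filtered metacore object
  adsl_meta <- select_dataset(metacore, "ADSL")
  print(is_metacore(adsl_meta))

  # With simplify = TRUE a single data frame is returned instead
  adsl_df <- select_dataset(metacore, "ADSL", simplify = TRUE)
  print(names(adsl_df))
}
```

The object form is convenient when downstream functions expect a metacore object, while the simplified form suits quick inspection.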
This function takes the location of an Excel specification document and reads it in as a metacore object. At the moment it only supports specifications in the Pinnacle 21 format, but the section-level spec builders can be used as building blocks for bespoke specification documents.
spec_to_metacore(path, quiet = FALSE, where_sep_sheet = TRUE)
path |
string of file location |
quiet |
Option to load quietly; this suppresses warnings, but not errors |
where_sep_sheet |
Option to indicate whether the where information is in a separate sheet, as in older P21 specs, or in a single sheet, as in newer P21 specs |
given a spec document it returns a metacore object
Check the type of spec document
spec_type(path)
path |
file location as a string |
returns string indicating the type of spec document
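As a rough sketch of how the section-level pieces fit together, the block below checks the spec type, reads every sheet, and runs one builder with its default arguments. It assumes the metacore package is installed and that the defaults of `spec_type_to_ds_spec()` suit the bundled `p21_mock.xlsx` file; a bespoke spec would likely need its own `cols` and `sheet` values.

```r
# Assumes the metacore package is installed; skipped otherwise
if (requireNamespace("metacore", quietly = TRUE)) {
  library(metacore)

  path <- metacore_example("p21_mock.xlsx")

  # String describing the type of spec document
  print(spec_type(path))

  # List of datasets, one per sheet
  doc <- read_all_sheets(path)
  print(names(doc))

  # Build just the dataset-level table with the default column patterns
  ds_spec <- spec_type_to_ds_spec(doc)
  print(ds_spec)
}
```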
Creates the codelist from a list of datasets (optionally filtered by the
sheet input). The named vectors *_cols are used to determine which is the
correct sheet and to rename the columns.
spec_type_to_codelist(
  doc,
  codelist_cols = c(code_id = "ID", name = "[N|n]ame", code = "^[C|c]ode|^[T|t]erm",
    decode = "[D|d]ecode"),
  permitted_val_cols = NULL,
  dict_cols = c(code_id = "ID", name = "[N|n]ame", dictionary = "[D|d]ictionary",
    version = "[V|v]ersion"),
  sheets = NULL,
  simplify = FALSE
)
doc |
Named list of datasets |
codelist_cols |
Named vector of column names that make up the codelist. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
permitted_val_cols |
Named vector of column names that make up the permitted values. The column names can be regular expressions for more flexibility. This is optional and can be left as NULL if there isn't a permitted value sheet |
dict_cols |
Named vector of column names that make up the dictionary values. The column names can be regular expressions for more flexibility. This is optional and can be left as NULL if there isn't a permitted value sheet |
sheets |
Optional, regular expressions of the sheets |
simplify |
Boolean value; if TRUE, code/decode pairs that are all equal are converted to a permitted value list. FALSE by default |
a dataset formatted for the metacore object
Other spec builders: spec_type_to_derivations(), spec_type_to_ds_spec(), spec_type_to_ds_vars(), spec_type_to_value_spec(), spec_type_to_var_spec()
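To see how the default `codelist_cols` patterns behave, the base R sketch below tests them against a made-up set of column headers (the `headers` vector is hypothetical). One detail worth knowing: inside square brackets `|` is a literal character, so `[N|n]ame` matches `Name`, `name`, and also `|ame`.

```r
# Default patterns copied from the usage of spec_type_to_codelist()
codelist_cols <- c(code_id = "ID", name = "[N|n]ame",
                   code = "^[C|c]ode|^[T|t]erm", decode = "[D|d]ecode")

# Hypothetical column headers from a codelist sheet
headers <- c("ID", "Name", "Term", "Decoded Value", "Order")

# For each pattern, list the headers it matches
matched <- lapply(codelist_cols, function(p) grep(p, headers, value = TRUE))
print(matched)

# The pipe inside a character class is matched literally
print(grepl("[N|n]ame", "|ame"))
```

Because the patterns are unanchored unless they start with `^`, a header such as "Codelist ID" would also match the `code_id` pattern, which is one reason to tighten the patterns for a bespoke spec.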
Creates the derivation table from a list of datasets (optionally filtered by
the sheet input). The named vector cols is used to determine which is the
correct sheet and renames the columns. The derivation will be used for
"derived" origins, the comments for "assigned" origins, and predecessor for
"predecessor" origins.
spec_type_to_derivations(
  doc,
  cols = c(derivation_id = "ID", derivation = "[D|d]efinition|[D|d]escription"),
  sheet = "Method|Derivations?",
  var_cols = c(dataset = "[D|d]ataset|[D|d]omain", variable = "[N|n]ame|[V|v]ariables?",
    origin = "[O|o]rigin", predecessor = "[P|p]redecessor", comment = "[C|c]omment")
)
doc |
Named list of datasets |
cols |
Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
sheet |
Regular expression for the sheet name |
var_cols |
Named vector of the name(s) of the origin, predecessor and comment columns. These do not have to be on the specified sheet. |
a dataset formatted for the metacore object
Other spec builders: spec_type_to_codelist(), spec_type_to_ds_spec(), spec_type_to_ds_vars(), spec_type_to_value_spec(), spec_type_to_var_spec()
Creates the ds_spec from a list of datasets (optionally filtered by the sheet
input). The named vector cols is used to determine which is the correct
sheet and renames the columns.
spec_type_to_ds_spec(
  doc,
  cols = c(dataset = "[N|n]ame|[D|d]ataset|[D|d]omain", structure = "[S|s]tructure",
    label = "[L|l]abel|[D|d]escription"),
  sheet = NULL
)
doc |
Named list of datasets |
cols |
Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
sheet |
Regular expression for the sheet name |
a dataset formatted for the metacore object
Other spec builders: spec_type_to_codelist(), spec_type_to_derivations(), spec_type_to_ds_vars(), spec_type_to_value_spec(), spec_type_to_var_spec()
Creates the ds_vars from a list of datasets (optionally filtered by the sheet
input). The named vector cols is used to determine which is the correct
sheet and renames the columns.
spec_type_to_ds_vars(
  doc,
  cols = c(dataset = "[D|d]ataset|[D|d]omain",
    variable = "[V|v]ariable [[N|n]ame]?|[V|v]ariables?",
    order = "[V|v]ariable [O|o]rder|[O|o]rder", keep = "[K|k]eep|[M|m]andatory"),
  key_seq_sep_sheet = TRUE,
  key_seq_cols = c(dataset = "Dataset", key_seq = "Key Variables"),
  sheet = "[V|v]ar|Datasets"
)
doc |
Named list of datasets |
cols |
Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
key_seq_sep_sheet |
A boolean to indicate if the key sequence is on a separate sheet. If set to false add the key_seq column name to the |
key_seq_cols |
Named vector used to get the key sequence for each dataset |
sheet |
Regular expression for the sheet names |
a dataset formatted for the metacore object
Other spec builders: spec_type_to_codelist(), spec_type_to_derivations(), spec_type_to_ds_spec(), spec_type_to_value_spec(), spec_type_to_var_spec()
Creates the value_spec from a list of datasets (optionally filtered by the
sheet input). The named vector cols is used to determine which is the
correct sheet and renames the columns.
spec_type_to_value_spec(
  doc,
  cols = c(dataset = "[D|d]ataset|[D|d]omain", variable = "[N|n]ame|[V|v]ariables?",
    origin = "[O|o]rigin", type = "[T|t]ype", code_id = "[C|c]odelist|Controlled Term",
    sig_dig = "[S|s]ignificant", where = "[W|w]here", derivation_id = "[M|m]ethod",
    predecessor = "[P|p]redecessor"),
  sheet = NULL,
  where_sep_sheet = TRUE,
  where_cols = c(id = "ID", where = c("Variable", "Comparator", "Value")),
  var_sheet = "[V|v]ar"
)
doc |
Named list of datasets |
cols |
Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
sheet |
Regular expression for the sheet name |
where_sep_sheet |
Boolean value to control if the where information is in a separate dataset. If the where information is on a separate sheet, set to true and provide the column information with the |
where_cols |
Named list with an id and where field. All columns in the where field will be collapsed together |
var_sheet |
Name of the sheet with the variable information on it. Metacore expects each variable to have a row in the value_spec. Because many specifications only have information in the value tab, this is added. If the information already exists in the value tab of your specification, set this to NULL |
a dataset formatted for the metacore object
Other spec builders: spec_type_to_codelist(), spec_type_to_derivations(), spec_type_to_ds_spec(), spec_type_to_ds_vars(), spec_type_to_var_spec()
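The `where_cols` argument says that all columns in the where field are collapsed together. A minimal base R sketch of that idea follows. The sheet contents are hypothetical, and joining the fields with a single space is an assumption; the package may format the collapsed string differently.

```r
# Hypothetical where-clause sheet using the default where_cols column names
where_sheet <- data.frame(
  ID         = c("WC.ADLB.HGB", "WC.ADLB.WBC"),
  Variable   = c("PARAMCD", "PARAMCD"),
  Comparator = c("EQ", "EQ"),
  Value      = c("HGB", "WBC"),
  stringsAsFactors = FALSE
)

where_fields <- c("Variable", "Comparator", "Value")

# Paste the where fields together row by row (space separator is an assumption)
where_tbl <- data.frame(
  id    = where_sheet$ID,
  where = do.call(paste, where_sheet[where_fields]),
  stringsAsFactors = FALSE
)
print(where_tbl)
```

The resulting `id` column is what would let each collapsed where string be matched back to the value-level rows that reference it.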
Creates the var_spec from a list of datasets (optionally filtered by the sheet
input). The named vector cols is used to determine which is the correct
sheet and renames the columns. (Note: the keep column will be converted to logical)
spec_type_to_var_spec(
  doc,
  cols = c(variable = "[N|n]ame|[V|v]ariables?", length = "[L|l]ength",
    label = "[L|l]abel", type = "[T|t]ype", dataset = "[D|d]ataset|[D|d]omain",
    format = "[F|f]ormat"),
  sheet = "[V|v]ar"
)
doc |
Named list of datasets |
cols |
Named vector of column names. The column names can be regular expressions for more flexibility, but the names must follow the given pattern |
sheet |
Regular expression for the sheet name |
a dataset formatted for the metacore object
Other spec builders: spec_type_to_codelist(), spec_type_to_derivations(), spec_type_to_ds_spec(), spec_type_to_ds_vars(), spec_type_to_value_spec()
Reads in a define xml and creates a code_list table. The code_list table is a nested tibble where each row is a code list or permitted value list. The code column contains a vector or a tibble, depending on whether it is a permitted value list or a code list.
xml_to_codelist(doc)
doc |
xml document |
a tibble containing the code list and permitted value information
Other xml builder: xml_to_derivations(), xml_to_ds_spec(), xml_to_ds_vars(), xml_to_value_spec(), xml_to_var_spec()
This reads in an xml document and gets all the derivations/comments. These can be cross-referenced to variables using the derivation_id's.
xml_to_derivations(doc)
doc |
xml document |
dataframe with derivation id's and derivations
Other xml builder: xml_to_codelist(), xml_to_ds_spec(), xml_to_ds_vars(), xml_to_value_spec(), xml_to_var_spec()
Creates a dataset specification, which has the domain name and label for each dataset
xml_to_ds_spec(doc)
doc |
xml document |
data frame with the data set specifications
Other xml builder: xml_to_codelist(), xml_to_derivations(), xml_to_ds_vars(), xml_to_value_spec(), xml_to_var_spec()
Creates the ds_vars table, which acts as a key between the datasets and the var spec
xml_to_ds_vars(doc)
doc |
xml document |
data frame with the dataset and variables
Other xml builder: xml_to_codelist(), xml_to_derivations(), xml_to_ds_spec(), xml_to_value_spec(), xml_to_var_spec()
Takes a define xml and pulls out the value level metadata including codelist_id's, defines_id's, and where clause. There is one row per variable except when there is a where clause, at which point there is one row per value.
xml_to_value_spec(doc)
doc |
xml document |
tibble with the value level information
Other xml builder: xml_to_codelist(), xml_to_derivations(), xml_to_ds_spec(), xml_to_ds_vars(), xml_to_var_spec()
Takes a define xml and returns a dataset with specifications for each variable. The variable will just be the variable, unless the specification for that variable differs between datasets.
xml_to_var_spec(doc)
doc |
define xml document |
data frame with variable, length, label columns
Other xml builder: xml_to_codelist(), xml_to_derivations(), xml_to_ds_spec(), xml_to_ds_vars(), xml_to_value_spec()