Title: | Find Clinical Trial Sites Under-Reporting Adverse Events |
---|---|
Description: | Monitoring of Adverse Event (AE) reporting in clinical trials is important for patient safety. Sites that are under-reporting AEs can be detected using Bootstrap-based simulations that simulate overall AE reporting. Based on the simulation an AE under-reporting probability is assigned to each site in a given trial (Koneswarakantha 2021 <doi:10.1007/s40264-020-01011-5>). |
Authors: | Bjoern Koneswarakantha [aut, cre, cph], F. Hoffmann-La Roche Ltd [cph] |
Maintainer: | Bjoern Koneswarakantha <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.1 |
Built: | 2024-12-01 07:58:13 UTC |
Source: | https://github.com/openpharma/simaerep |
Internal function called by check_df_visit().
aggr_duplicated_visits(df_visit)
df_visit |
dataframe with columns: study_id, site_number, patnum, visit, n_ae |
df_visit corrected
Internal function used by all functions that accept df_visit as a parameter. Checks for NA columns, numeric visits and AEs, implicitly missing and duplicated visits.
check_df_visit(df_visit)
df_visit |
dataframe with columns: study_id, site_number, patnum, visit, n_ae |
corrected df_visit
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6
)

df_visit$study_id <- "A"

df_visit_filt <- df_visit %>%
  dplyr::filter(visit != 3)

df_visit_corr <- check_df_visit(df_visit_filt)
3 %in% df_visit_corr$visit
nrow(df_visit_corr) == nrow(df_visit)

df_visit_corr <- check_df_visit(dplyr::bind_rows(df_visit, df_visit))
nrow(df_visit_corr) == nrow(df_visit)
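The two repairs described for check_df_visit() (duplicated visits and implicitly missing visits) can be illustrated with a minimal base R sketch. It is not the package's implementation. The toy data, the helper fill_patient(), the choice to keep the maximum n_ae for duplicates, and the carry-forward fill for missing visits are all assumptions made for illustration.

```r
# Hypothetical toy data: patient P1 lacks visit 2, patient P2 has visit 1 twice.
df_visit <- data.frame(
  study_id    = "A",
  site_number = "S1",
  patnum      = c("P1", "P1", "P2", "P2", "P2"),
  visit       = c(1, 3, 1, 1, 2),
  n_ae        = c(0, 2, 1, 1, 1)
)

# Step 1: collapse duplicated visits, keeping the maximum cumulative AE count
# (assumption of this sketch, the package may resolve duplicates differently).
df_dedup <- aggregate(
  n_ae ~ study_id + site_number + patnum + visit,
  data = df_visit,
  FUN = max
)

# Step 2: expand each patient to the full visit sequence 1..max(visit) and
# carry the last observed cumulative AE count forward (also an assumption).
fill_patient <- function(df_pat) {
  visits <- seq_len(max(df_pat$visit))
  n_ae <- df_pat$n_ae[match(visits, df_pat$visit)] # NA where a visit is missing
  for (i in which(is.na(n_ae))) {
    n_ae[i] <- if (i == 1) 0 else n_ae[i - 1]
  }
  data.frame(
    study_id    = df_pat$study_id[1],
    site_number = df_pat$site_number[1],
    patnum      = df_pat$patnum[1],
    visit       = visits,
    n_ae        = n_ae
  )
}

df_corr <- do.call(rbind, lapply(split(df_dedup, df_dedup$patnum), fill_patient))
rownames(df_corr) <- NULL
df_corr
```

After both steps every patient has exactly one row per visit, which is the property the checks in the example above test for.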
Correct under-reporting probabilities using p.adjust.
eval_sites(df_sim_sites, method = "BH", under_only = TRUE, ...)
df_sim_sites |
dataframe generated by |
method |
character, passed to stats::p.adjust(), if NULL eval_sites_deprecated() is used instead, Default = "BH" |
under_only |
logical, compute under-reporting probabilities only. Default: TRUE |
... |
use to pass r_sim_sites parameter to eval_sites_deprecated() |
dataframe with the following columns:
study identification
site identification
median(max(visit)) * 0.75
mean AE at visit_med75 site level
mean AE at visit_med75 study level
p-value as returned by poisson.test
bootstrapped probability for having mean_ae_site_med75 or lower
adjusted p-values
adjusted bootstrapped probability for having mean_ae_site_med75 or lower
probability under-reporting as 1 - pval_adj, poisson.test (use as benchmark)
probability under-reporting as 1 - prob_low_adj, bootstrapped (use)
site_aggr, sim_sites, p.adjust
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5, frac_site_with_ur = 0.4, ur_rate = 0.6)

df_visit$study_id <- "A"

df_site <- site_aggr(df_visit)

df_sim_sites <- sim_sites(df_site, df_visit, r = 100)

df_eval <- eval_sites(df_sim_sites)
df_eval
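The adjustment step can be reproduced in isolation with stats::p.adjust(). The sketch below uses made-up bootstrapped probabilities (the values in prob_low are hypothetical) and derives the adjusted probability and the under-reporting probability as 1 - prob_low_adj, following the column descriptions above.

```r
# Hypothetical bootstrapped probabilities for five sites of one study.
prob_low <- c(0.001, 0.02, 0.04, 0.30, 0.90)

# Benjamini-Hochberg adjustment, the default method of eval_sites().
prob_low_adj <- stats::p.adjust(prob_low, method = "BH")

# Under-reporting probability as described for the returned dataframe.
prob_low_prob_ur <- 1 - prob_low_adj

data.frame(prob_low, prob_low_adj, prob_low_prob_ur)
```

The adjustment is applied within a set of sites that are tested together, so sites with a small raw probability are penalised more when many sites are evaluated at once.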
Internal function called by check_df_visit().
exp_implicit_missing_visits(df_visit)
df_visit |
dataframe with columns: study_id, site_number, patnum, visit, n_ae |
df_visit corrected
Get Portfolio configuration from a dataframe aggregated on
patient level with max_ae and max_visit. Will filter studies with only a few
sites and patients and will anonymize IDs. Portfolio configuration can be
used by sim_test_data_portfolio to generate data for an artificial portfolio.
get_config( df_site, min_pat_per_study = 100, min_sites_per_study = 10, anonymize = TRUE, pad_width = 4 )
df_site |
dataframe aggregated on patient level with max_ae and max_visit |
min_pat_per_study |
minimum number of patients per study, Default: 100 |
min_sites_per_study |
minimum number of sites per study, Default: 10 |
anonymize |
logical, Default: TRUE |
pad_width |
padding width for newly created IDs, Default: 4 |
dataframe with the following columns:
study identification
mean AE per visit per study
site
standard deviation of maximum patient visits per site
mean of maximum patient visits per site
number of patients
sim_test_data_study
get_config
sim_test_data_portfolio
sim_ur_scenarios
get_portf_perf
df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit1$study_id <- "A"

df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.2, ur_rate = 0.1)
df_visit2$study_id <- "B"

df_visit <- dplyr::bind_rows(df_visit1, df_visit2)

df_site_max <- df_visit %>%
  dplyr::group_by(study_id, site_number, patnum) %>%
  dplyr::summarise(max_visit = max(visit), max_ae = max(n_ae), .groups = "drop")

df_config <- get_config(df_site_max)
df_config

df_portf <- sim_test_data_portfolio(df_config)
df_portf

df_scen <- sim_ur_scenarios(df_portf, extra_ur_sites = 2, ur_rate = c(0.5, 1))
df_scen

df_perf <- get_portf_perf(df_scen)
df_perf
Test function, test applicability of poisson test, by calculating
the bootstrapped probability of obtaining a specific p-value or lower, use
in combination with sim_studies().
get_ecd_values(df_sim_studies, df_sim_sites, val_str)
df_sim_studies |
dataframe, generated by |
df_sim_sites |
dataframe, generated by |
val_str |
c("prob_low","pval") |
Trains an ecdf function for each study based on the results
of sim_studies().
dataframe with the following columns:
study identification
site identification
median(max(visit)) * 0.75
mean AE at visit_med75 site level
mean AE at visit_med75 study level
p-value as returned by poisson.test
p-value as returned by poisson.test
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5, frac_site_with_ur = 0.4, ur_rate = 0.3)

df_visit$study_id <- "A"

df_site <- site_aggr(df_visit)

df_sim_sites <- sim_sites(df_site, df_visit, r = 100)

df_sim_studies <- sim_studies(
  df_site = df_site,
  df_visit = df_visit,
  r = 3,
  parallel = FALSE,
  poisson_test = TRUE,
  prob_lower = TRUE
)

get_ecd_values(df_sim_studies, df_sim_sites, "prob_low")
get_ecd_values(df_sim_studies, df_sim_sites, "pval")
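The idea of training an ecdf on simulated values and evaluating it at observed site values can be sketched with stats::ecdf(). The vectors pval_sim and pval_site below are hypothetical stand-ins for values from sim_studies() and sim_sites(). This is an illustration of the technique, not the package code.

```r
# Hypothetical p-values from simulated studies, evenly spread for clarity.
pval_sim <- (1:100) / 100

# Train the empirical cumulative distribution function on the simulated values.
ecd <- stats::ecdf(pval_sim)

# Hypothetical observed site p-values.
pval_site <- c(0.05, 0.5)

# Fraction of simulated p-values that are equal to or lower than the observed ones.
pval_ecd <- ecd(pval_site)
pval_ecd
```

If the test statistic is well calibrated, the ecdf value stays close to the observed value, which is the applicability check described above.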
Internal Function used by sim_sites()
get_pat_pool_config(df_visit, df_site, min_n_pat_with_med75 = 1)
df_visit |
dataframe |
df_site |
dataframe as created by site_aggr() |
min_n_pat_with_med75 |
minimum number of patients with visit_med_75 for simulation, Default: 1 |
For simulating a study we need to configure the study patient pool to match the configuration of the sites
dataframe
df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 5, frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit1$study_id <- "A"

df_visit2 <- sim_test_data_study(n_pat = 1000, n_sites = 3, frac_site_with_ur = 0.2, ur_rate = 0.1)
df_visit2$study_id <- "B"

df_visit <- dplyr::bind_rows(df_visit1, df_visit2)

df_site <- site_aggr(df_visit)

df_config <- get_pat_pool_config(df_visit, df_site)
df_config
Performance as true positive rate (tpr as tp/P) on the basis of desired false positive rates (fpr as fp/N).
get_portf_perf(df_scen, stat = "prob_low_prob_ur", fpr = c(0.001, 0.01, 0.05))
df_scen |
dataframe as returned by |
stat |
character denoting the column name of the under-reporting statistic, Default: 'prob_low_prob_ur' |
fpr |
numeric vector specifying false positive rates, Default: c(0.001, 0.01, 0.05) |
dataframe
sim_test_data_study
get_config
sim_test_data_portfolio
sim_ur_scenarios
get_portf_perf
df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit1$study_id <- "A"

df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.2, ur_rate = 0.1)
df_visit2$study_id <- "B"

df_visit <- dplyr::bind_rows(df_visit1, df_visit2)

df_site_max <- df_visit %>%
  dplyr::group_by(study_id, site_number, patnum) %>%
  dplyr::summarise(max_visit = max(visit), max_ae = max(n_ae), .groups = "drop")

df_config <- get_config(df_site_max)
df_config

df_portf <- sim_test_data_portfolio(df_config)
df_portf

df_scen <- sim_ur_scenarios(df_portf, extra_ur_sites = 2, ur_rate = c(0.5, 1))
df_scen

df_perf <- get_portf_perf(df_scen)
df_perf
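One way to read "true positive rate at a desired false positive rate" is sketched below in base R. The statistic values are invented, and deriving the threshold as a quantile of the statistic among sites without under-reporting is an assumption of this sketch. get_portf_perf() may determine thresholds differently.

```r
# Hypothetical under-reporting statistic for 100 sites without under-reporting (N).
stat_neg <- (0:99) / 100

# Hypothetical statistic for 4 sites with simulated under-reporting (P).
stat_pos <- c(0.5, 0.97, 0.999, 1)

# Desired false positive rate.
fpr <- 0.05

# Threshold above which at most a share of `fpr` negatives is flagged.
thresh <- stats::quantile(stat_neg, probs = 1 - fpr, names = FALSE)

# Observed false positive rate (fp / N) and true positive rate (tp / P).
fpr_obs <- mean(stat_neg > thresh)
tpr <- mean(stat_pos > thresh)

c(thresh = thresh, fpr_obs = fpr_obs, tpr = tpr)
```

Repeating this for each value in the fpr vector gives one tpr per desired false positive rate.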
Internal function used by site_aggr() and plot_visit_med75(),
returns mean AE development from visit 0 to visit_med75.
get_site_mean_ae_dev(df_visit, df_pat, df_site)
df_visit |
dataframe |
df_pat |
dataframe as returned by pat_aggr() |
df_site |
dataframe as returned by site_aggr() |
dataframe
Internal function used by site_aggr().
get_visit_med75(df_pat, method = "med75_adj", min_pat_pool = 0.2)
df_pat |
dataframe as returned by |
method |
character, one of c("med75", "med75_adj") defining method for defining evaluation point visit_med75 (see details), Default: "med75_adj" |
min_pat_pool |
double, minimum ratio of patients available for sampling. Determines the maximum visit_med75 value, see Details. Default: 0.2 |
dataframe
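The unadjusted evaluation point, described elsewhere in this manual as median(max(visit)) * 0.75, can be worked through by hand. The sketch below uses invented maximum visits for one site and rounds the result to a whole visit, which is an assumption. The "med75_adj" method and the min_pat_pool constraint are not reproduced here.

```r
# Hypothetical maximum visit per patient at one site.
max_visit <- c(10, 12, 8, 20, 16)

# Evaluation point: 75% of the median of the maximum patient visits.
visit_med75 <- round(stats::median(max_visit) * 0.75)

# Patients that have been on the study long enough to reach the evaluation point.
n_pat_with_med75 <- sum(max_visit >= visit_med75)

c(visit_med75 = visit_med75, n_pat_with_med75 = n_pat_with_med75)
```

Here the median is 12, so visit_med75 is 9 and four of the five patients qualify.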
internal function
is_orivisit(x)
x |
object |
logical
internal function
is_simaerep(x)
x |
object |
logical
like rank() with ties.method = "max", works on tbl objects
max_rank(df, col, col_new)
df |
dataframe |
col |
character, column name to rank by |
col_new |
character column name for rankings |
This is needed for the Hochberg p-value adjustment. We need to assign the higher rank when multiple sites have the same p-value.
df <- tibble::tibble(s = c(1, 2, 2, 2, 5, 10)) %>%
  dplyr::mutate(
    rank = rank(s, ties.method = "max")
  )

df %>%
  max_rank("s", "max_rank")

# Database
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dplyr::copy_to(con, df, "df")
max_rank(dplyr::tbl(con, "df"), "s", "max_rank")
DBI::dbDisconnect(con)
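The "max" tie rule has a simple counting interpretation: the rank of a value equals the number of values that are less than or equal to it. The base R sketch below shows this equivalence on the same vector. The counting formulation is given only as an illustration of why the rule can be expressed with table operations, not as the way max_rank() is implemented.

```r
s <- c(1, 2, 2, 2, 5, 10)

# Reference: base R ranking where ties receive the highest rank of the group.
rank_max <- rank(s, ties.method = "max")

# Equivalent counting formulation: number of values <= the current value.
rank_count <- vapply(s, function(v) sum(s <= v), integer(1))

rbind(s, rank_max, rank_count)
```

Both approaches assign rank 4 to each of the three tied values of 2.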
Internal S3 object, stores lazy reference to original visit data.
orivisit(df_visit, call = NULL, env = parent.frame())
df_visit |
dataframe with original visit data |
call |
optional, provide call, Default: NULL |
env |
optional, provide environment of original visit data, Default: parent.frame() |
Saves variable name of original visit data, checks whether it can be retrieved from parent environment and stores summary. Original data can be retrieved using as.data.frame(x).
orivisit object
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6
)

df_visit$study_id <- "A"

visit <- orivisit(df_visit)

object.size(df_visit)
object.size(visit)

as.data.frame(visit)
Internal function used by site_aggr() and plot_visit_med75(), adds the maximum visit for each patient.
pat_aggr(df_visit)
df_visit |
dataframe |
dataframe
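A patient-level aggregation of this kind can be sketched in base R with tapply(). The toy data and the output column names max_visit and max_ae are assumptions borrowed from the get_config() description. The actual columns returned by pat_aggr() may differ.

```r
# Hypothetical visit-level data for two patients.
df_visit <- data.frame(
  patnum = c("P1", "P1", "P2"),
  visit  = c(1, 2, 1),
  n_ae   = c(0, 1, 2)
)

# Maximum visit and maximum cumulative AE count per patient.
max_visit <- tapply(df_visit$visit, df_visit$patnum, max)
max_ae <- tapply(df_visit$n_ae, df_visit$patnum, max)

df_pat <- data.frame(
  patnum    = names(max_visit),
  max_visit = as.vector(max_visit),
  max_ae    = as.vector(max_ae)
)
df_pat
```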
Internal function for sim_sites, filters all visits greater than
max_visit_med75_study and returns a dataframe with one column for studies
and one column with nested patient data.
pat_pool(df_visit, df_site)
df_visit |
dataframe, created by |
df_site |
dataframe created by |
dataframe with nested pat_pool column
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6
)

df_visit$study_id <- "A"

df_site <- site_aggr(df_visit)

df_pat_pool <- pat_pool(df_visit, df_site)
df_pat_pool
This plot is meant to supplement the package documentation.
plot_dots( df, nrow = 10, ncols = 10, col_group = "site", thresh = NULL, color_site_a = "#BDBDBD", color_site_b = "#757575", color_site_c = "gold3", color_high = "#00695C", color_low = "#25A69A", size_dots = 10 )
df |
dataframe, cols = c('site', 'patients', 'n_ae') |
nrow |
integer, number of rows, Default: 10 |
ncols |
integer, number of columns, Default: 10 |
col_group |
character, grouping column, Default: 'site' |
thresh |
numeric, threshold to determine color of mean_ae annotation, Default: NULL |
color_site_a |
character, hex color value, Default: '#BDBDBD' |
color_site_b |
character, hex color value, Default: '#757575' |
color_site_c |
character, hex color value, Default: 'gold3' |
color_high |
character, hex color value, Default: '#00695C' |
color_low |
character, hex color value, Default: '#25A69A' |
size_dots |
integer, Default: 10 |
ggplot object
study <- tibble::tibble(
  site = LETTERS[1:3],
  patients = c(list(seq(1, 50, 1)), list(seq(1, 40, 1)), list(seq(1, 10, 1)))
) %>%
  tidyr::unnest(patients) %>%
  dplyr::mutate(n_ae = as.integer(runif(min = 0, max = 10, n = nrow(.))))

plot_dots(study)
This plot is meant to supplement the package documentation.
plot_sim_example( substract_ae_per_pat = 0, size_dots = 10, size_raster_label = 12, color_site_a = "#BDBDBD", color_site_b = "#757575", color_site_c = "gold3", color_high = "#00695C", color_low = "#25A69A", title = TRUE, legend = TRUE, seed = 5 )
substract_ae_per_pat |
integer, subtract aes from patients at site C, Default: 0 |
size_dots |
integer, Default: 10 |
size_raster_label |
integer, Default: 12 |
color_site_a |
character, hex color value, Default: '#BDBDBD' |
color_site_b |
character, hex color value, Default: '#757575' |
color_site_c |
character, hex color value, Default: 'gold3' |
color_high |
character, hex color value, Default: '#00695C' |
color_low |
character, hex color value, Default: '#25A69A' |
title |
logical, include title, Default: TRUE |
legend |
logical, include legend, Default: TRUE |
seed |
pass seed for simulations, Default: 5 |
Uses plot_dots() and adds 2 simulation panels. Uses a made-up
site config with three sites A, B, C, simulating site C.
ggplot
plot_sim_example(size_dots = 5)
This plot is meant to supplement the package documentation.
plot_sim_examples(substract_ae_per_pat = c(0, 1, 3), ...)
substract_ae_per_pat |
integer, Default: c(0, 1, 3) |
... |
parameters passed to plot_sim_example() |
This function is a wrapper for plot_sim_example()
ggplot
plot_sim_examples(size_dot = 3, size_raster_label = 10)
plot_sim_examples()
Most suitable visual representation of the AE under-reporting statistics.
plot_study( df_visit, df_site, df_eval, study, df_al = NULL, n_sites = 16, pval = FALSE, prob_col = "prob_low_prob_ur" )
df_visit |
dataframe, created by |
df_site |
dataframe created by |
df_eval |
dataframe created by |
study |
study |
df_al |
dataframe containing study_id, site_number, alert_level_site, alert_level_study (optional), Default: NULL |
n_sites |
integer number of most at risk sites, Default: 16 |
pval |
logical, show p-value, Default: FALSE |
prob_col |
character, denotes probability column, Default: "prob_low_prob_ur" |
Left panel shows mean AE reporting per site (lightblue and darkblue lines) against mean AE reporting of the entire study (golden line). Single sites are plotted in descending order by AE under-reporting probability on the right panel in which grey lines denote cumulative AE count of single patients. Grey dots in the left panel plot indicate sites that were picked for single plotting. AE under-reporting probability of dark blue lines crossed threshold of 95%. Numbers in the upper left corner indicate the ratio of patients that have been used for the analysis against the total number of patients. Patients that have not been on the study long enough to reach the evaluation point (visit_med75) will be ignored.
ggplot
df_visit <- sim_test_data_study(n_pat = 1000, n_sites = 10, frac_site_with_ur = 0.2, ur_rate = 0.15, max_visit_sd = 8)

df_visit$study_id <- "A"

df_site <- site_aggr(df_visit)

df_sim_sites <- sim_sites(df_site, df_visit, r = 100)

df_eval <- eval_sites(df_sim_sites)

plot_study(df_visit, df_site, df_eval, study = "A")
Plots cumulative AEs against visits for patients at sites of given study and compares against visit_med75.
plot_visit_med75( df_visit, df_site = NULL, study_id_str, n_sites = 6, min_pat_pool = 0.2, verbose = TRUE )
df_visit |
dataframe |
df_site |
dataframe, as returned by |
study_id_str |
character, specify study in study_id column |
n_sites |
integer, Default: 6 |
min_pat_pool |
double, minimum ratio of patients available for sampling. Determines the maximum visit_med75 value, see Details. Default: 0.2 |
verbose |
logical, Default: TRUE |
ggplot
df_visit <- sim_test_data_study(n_pat = 120, n_sites = 6, frac_site_with_ur = 0.4, ur_rate = 0.6)

df_visit$study_id <- "A"

df_site <- site_aggr(df_visit)

plot_visit_med75(df_visit, df_site, study_id_str = "A", n_sites = 6)
generic plot function for simaerep objects
## S3 method for class 'simaerep'
plot( x, ..., study = NULL, what = "ur", n_sites = 16, df_visit = NULL, env = parent.frame() )
x |
simaerep object |
... |
additional parameters passed to plot_study() or plot_visit_med75() |
study |
character specifying study to be plotted, Default: NULL |
what |
one of c("ur", "med75"), specifying whether to plot site AE under-reporting or visit_med75 values, Default: 'ur' |
n_sites |
number of sites to plot, Default: 16 |
df_visit |
optional, pass original visit data if it cannot be retrieved from parent environment, Default: NULL |
env |
optional, pass environment from which to retrieve original visit data, Default: parent.frame() |
see plot_study() and plot_visit_med75()
ggplot object
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6
)

df_visit$study_id <- "A"

aerep <- simaerep(df_visit)

plot(aerep, what = "ur", study = "A")
plot(aerep, what = "med75", study = "A")
Internal function used by sim_sites().
poiss_test_site_ae_vs_study_ae(site_ae, study_ae, visit_med75)
site_ae |
vector with AE numbers |
study_ae |
vector with AE numbers |
visit_med75 |
integer |
Sets pvalue = 1 if the mean AE of the site is greater than the mean AE of the study or if the test gives an error.
pval
poiss_test_site_ae_vs_study_ae(
  site_ae = c(5, 3, 3, 2, 1, 6),
  study_ae = c(9, 8, 7, 9, 6, 7, 8),
  visit_med75 = 10
)

poiss_test_site_ae_vs_study_ae(
  site_ae = c(11, 9, 8, 6, 3),
  study_ae = c(9, 8, 7, 9, 6, 7, 8),
  visit_med75 = 10
)
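A comparison of this kind can be sketched with stats::poisson.test(), which compares two event rates when given two counts and two time bases. The helper name poisson_pval_sketch() is hypothetical, and using the number of patients times visit_med75 as the time base is an assumption of this sketch rather than a confirmed detail of the package.

```r
poisson_pval_sketch <- function(site_ae, study_ae, visit_med75) {
  # A site that reports more AEs than the study is not under-reporting.
  if (mean(site_ae) > mean(study_ae)) {
    return(1)
  }

  # Total AE counts for site and study.
  counts <- c(sum(site_ae), sum(study_ae))

  # Time base as patients times visits (assumption of this sketch).
  time_base <- c(length(site_ae), length(study_ae)) * visit_med75

  # Fall back to a p-value of 1 if the test cannot be computed.
  tryCatch(
    stats::poisson.test(counts, time_base)$p.value,
    error = function(e) 1
  )
}

pval_low <- poisson_pval_sketch(c(5, 3, 3, 2, 1, 6), c(9, 8, 7, 9, 6, 7, 8), visit_med75 = 10)
pval_high <- poisson_pval_sketch(c(11, 9, 8, 9, 8), c(9, 8, 7, 9, 6, 7, 8), visit_med75 = 10)

c(pval_low = pval_low, pval_high = pval_high)
```

The first site reports clearly fewer AEs per patient than the study and receives a small p-value, while the second site reports more and is assigned 1 without testing.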
Internal function called by sim_sites.
Collect AEs per patient at visit_med75 for site and study as a vector of
integers.
prep_for_sim(df_site, df_visit)
df_site |
dataframe created by |
df_visit |
dataframe, created by |
dataframe
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.2
)

df_visit$study_id <- "A"

df_site <- site_aggr(df_visit)

df_prep <- prep_for_sim(df_site, df_visit)
df_prep
Internal function used by sim_sites()
prob_lower_site_ae_vs_study_ae( site_ae, study_ae, r = 1000, parallel = FALSE, under_only = TRUE )
site_ae |
vector with AE numbers |
study_ae |
vector with AE numbers |
r |
integer, denotes number of simulations, default = 1000 |
parallel |
logical, toggles parallel processing on and off, Default: FALSE |
under_only |
compute under-reporting probabilities only, default = TRUE |
sets pvalue=1 if mean AE site is greater than mean AE study
pval
prob_lower_site_ae_vs_study_ae(
  site_ae = c(5, 3, 3, 2, 1, 6),
  study_ae = c(9, 8, 7, 9, 6, 7, 8),
  parallel = FALSE
)
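The bootstrapped probability can be approximated with a few lines of base R. In the sketch below the helper name prob_lower_sketch() is hypothetical, and pooling site and study patients before resampling is an assumption. The sketch only illustrates the resampling idea of drawing as many patients as the site has and counting how often the simulated mean is as low as the observed one.

```r
prob_lower_sketch <- function(site_ae, study_ae, r = 1000) {
  # A site that reports more AEs than the study is assigned 1.
  if (mean(site_ae) > mean(study_ae)) {
    return(1)
  }

  # Patient pool to resample from (assumption: site and study combined).
  pool <- c(site_ae, study_ae)

  # Draw r simulated sites of the same size and record their mean AE count.
  sim_means <- replicate(
    r,
    mean(sample(pool, size = length(site_ae), replace = TRUE))
  )

  # Share of simulations with a mean equal to or lower than the observed one.
  mean(sim_means <= mean(site_ae))
}

set.seed(1)
prob_low <- prob_lower_sketch(c(5, 3, 3, 2, 1, 6), c(9, 8, 7, 9, 6, 7, 8))
prob_high <- prob_lower_sketch(c(11, 9, 8, 9, 8), c(9, 8, 7, 9, 6, 7, 8))

c(prob_low = prob_low, prob_high = prob_high)
```

A low value means that a randomly composed site rarely reports as few AEs as the observed site.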
Internal utility function.
purrr_bar( ..., .purrr, .f, .f_args = list(), .purrr_args = list(), .steps, .slow = FALSE, .progress = TRUE )
... |
iterable arguments passed to .purrr |
.purrr |
purrr or furrr function |
.f |
function to be executed over iterables |
.f_args |
list of arguments passed to .f, Default: list() |
.purrr_args |
list of arguments passed to .purrr, Default: list() |
.steps |
integer number of iterations |
.slow |
logical slows down execution, Default: FALSE |
.progress |
logical, show progress bar, Default: TRUE |
Call still needs to be wrapped in with_progress or with_progress_cnd().
result of function passed to .f
# purrr::map
progressr::with_progress(
  purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5)
)

# purrr::walk
progressr::with_progress(
  purrr_bar(rep(0.25, 5), .purrr = purrr::walk, .f = Sys.sleep, .steps = 5)
)

# progress bar off
progressr::with_progress(
  purrr_bar(
    rep(0.25, 5), .purrr = purrr::walk, .f = Sys.sleep, .steps = 5, .progress = FALSE
  )
)

# purrr::map2
progressr::with_progress(
  purrr_bar(
    rep(1, 5), rep(2, 5),
    .purrr = purrr::map2,
    .f = `+`,
    .steps = 5,
    .slow = TRUE
  )
)

# purrr::pmap
progressr::with_progress(
  purrr_bar(
    list(rep(1, 5), rep(2, 5)),
    .purrr = purrr::pmap,
    .f = `+`,
    .steps = 5,
    .slow = TRUE
  )
)

# define function within purrr_bar() call
progressr::with_progress(
  purrr_bar(
    list(rep(1, 5), rep(2, 5)),
    .purrr = purrr::pmap,
    .f = function(x, y) {
      paste0(x, y)
    },
    .steps = 5,
    .slow = TRUE
  )
)

# with mutate
progressr::with_progress(
  tibble::tibble(x = rep(0.25, 5)) %>%
    dplyr::mutate(x = purrr_bar(x, .purrr = purrr::map, .f = Sys.sleep, .steps = 5))
)
Internal function called by sim_sites after prep_for_sim.
sim_after_prep( df_sim_prep, r = 1000, poisson_test = FALSE, prob_lower = TRUE, progress = FALSE, under_only = TRUE )
df_sim_prep |
dataframe as returned by |
r |
integer, denotes number of simulations, default = 1000 |
poisson_test |
logical, calculates poisson.test pvalue |
prob_lower |
logical, calculates probability for getting a lower value |
progress |
logical, display progress bar, Default: FALSE |
under_only |
logical, compute under-reporting probabilities only. Default: TRUE |
dataframe
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.2
)

df_visit$study_id <- "A"

df_site <- site_aggr(df_visit)

df_prep <- prep_for_sim(df_site, df_visit)

df_sim <- sim_after_prep(df_prep)
df_sim
Calculate prob_lower for study sites using table operations
sim_inframe(df_visit, r = 1000, df_site = NULL)
df_visit |
Data frame with columns: study_id, site_number, patnum, visit, n_ae. |
r |
Integer or tbl_object, number of repetitions for bootstrap simulation. Pass a tbl object referring to a table with one column and as many rows as desired repetitions. Default: 1000. |
df_site |
dataframe as returned by |
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6
)

df_visit$study_id <- "A"

df_sim <- sim_inframe(df_visit)

df_eval <- eval_sites(df_sim)
df_eval
internal function called by simulate_scenarios()
sim_scenario(n_ae_site, n_ae_study, frac_pat_with_ur, ur_rate)
n_ae_site |
integer vector |
n_ae_study |
integer vector |
frac_pat_with_ur |
double |
ur_rate |
double |
list
sim_scenario(c(5, 5, 5, 5), c(8, 8, 8, 8), 0.2, 0.5)
sim_scenario(c(5, 5, 5, 5), c(8, 8, 8, 8), 0.75, 0.5)
sim_scenario(c(5, 5, 5, 5), c(8, 8, 8, 8), 1, 0.5)
sim_scenario(c(5, 5, 5, 5), c(8, 8, 8, 8), 1, 1)
sim_scenario(c(5, 5, 5, 5), c(8, 8, 8, 8), 0, 0.5)
sim_scenario(c(5, 5, 5, 5), c(8, 8, 8, 8), 2, 0.5)
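What an under-reporting scenario does to the AE counts of a site can be illustrated with a small base R helper. The function apply_ur_sketch() is hypothetical. Clamping frac_pat_with_ur to the range 0 to 1, rounding the number of affected patients, and reducing the first patients in the vector are assumptions of this sketch and may not match sim_scenario().

```r
apply_ur_sketch <- function(n_ae_site, frac_pat_with_ur, ur_rate) {
  # Keep the fraction within [0, 1] (assumption of this sketch).
  frac <- min(max(frac_pat_with_ur, 0), 1)

  # Number of patients whose AE counts are reduced.
  n_ur <- round(length(n_ae_site) * frac)
  idx <- seq_len(n_ur)

  # Reduce the AE count of the affected patients by the under-reporting rate.
  n_ae_site[idx] <- n_ae_site[idx] * (1 - ur_rate)
  n_ae_site
}

apply_ur_sketch(c(5, 5, 5, 5), frac_pat_with_ur = 0.75, ur_rate = 0.5)
apply_ur_sketch(c(5, 5, 5, 5), frac_pat_with_ur = 1, ur_rate = 1)
apply_ur_sketch(c(5, 5, 5, 5), frac_pat_with_ur = 0, ur_rate = 0.5)
```

With three of four patients affected at a rate of 0.5, the site mean drops from 5 to 3.125, which is the kind of shift the detection statistics are then evaluated on.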
Collects the number of AEs of all eligible patients that meet visit_med75 criteria of site. Then calculates poisson.test pvalue and bootstrapped probability of having a lower mean value.
sim_sites( df_site, df_visit, r = 1000, poisson_test = TRUE, prob_lower = TRUE, progress = TRUE, check = TRUE, under_only = TRUE )
df_site |
dataframe created by |
df_visit |
dataframe, created by |
r |
integer, denotes number of simulations, default = 1000 |
poisson_test |
logical, calculates poisson.test pvalue |
prob_lower |
logical, calculates probability for getting a lower value |
progress |
logical, display progress bar, Default = TRUE |
check |
logical, perform data check and attempt repair with |
under_only |
logical, compute under-reporting probabilities only. Default: TRUE |
dataframe with the following columns:
study identification
site identification
number of patients at site
median(max(visit)) * 0.75
number of patients at site with med75
mean AE at visit_med75 site level
mean AE at visit_med75 study level
number of patients at study with med75 excl. site
p-value as returned by poisson.test
bootstrapped probability for having mean_ae_site_med75 or lower
sim_sites, site_aggr, pat_pool, prob_lower_site_ae_vs_study_ae, poiss_test_site_ae_vs_study_ae, prep_for_sim
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.2
)
df_visit$study_id <- "A"
df_site <- site_aggr(df_visit)
df_sim_sites <- sim_sites(df_site, df_visit, r = 100)
df_sim_sites %>%
  knitr::kable(digits = 2)
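The two statistics that sim_sites() reports can be sketched in base R. This is a minimal illustration under stated assumptions, not the package implementation: sketch_prob_lower() is a hypothetical helper, the AE counts are made up, and the two-sample poisson.test() call is only one plausible way to compare the site rate with the study rate.

```r
# Hypothetical helper (not part of simaerep): bootstrapped probability of
# observing a mean AE count equal to or lower than the one seen at the site.
sketch_prob_lower <- function(site_ae, study_ae, r = 1000) {
  mean_site <- mean(site_ae)
  # each replicate draws as many patients as the site has, with replacement,
  # from the pool of eligible patients at the other sites
  sim_means <- replicate(
    r,
    mean(sample(study_ae, length(site_ae), replace = TRUE))
  )
  # share of replicates with a mean equal to or lower than the site mean
  mean(sim_means <= mean_site)
}

set.seed(1)
site_ae  <- c(1, 2, 0, 1)              # made-up AE counts at visit_med75, site
study_ae <- c(5, 6, 4, 7, 5, 6, 8, 5)  # made-up AE counts, other sites

prob_low <- sketch_prob_lower(site_ae, study_ae, r = 200)

# Assumed form of the Poisson comparison: event counts and number of patients
# for site and study passed as a two-sample rate comparison.
pval <- poisson.test(
  c(sum(site_ae), sum(study_ae)),
  c(length(site_ae), length(study_ae))
)$p.value
```

Because every patient in the made-up study pool has more AEs than the site mean, no bootstrap replicate can reach the site mean and prob_low is zero, which is the pattern expected for a strongly under-reporting site.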
Test function: tests the applicability of the Poisson test by calculating the bootstrapped probability of obtaining a specific p-value or lower. Use in combination with get_ecd_values().
sim_studies( df_visit, df_site, r = 100, poisson_test = TRUE, prob_lower = TRUE, r_prob_lower = 1000, under_only = TRUE, parallel = FALSE, keep_ae = FALSE, min_n_pat_with_med75 = 1, studies = NULL, .progress = TRUE )
df_visit |
dataframe |
df_site |
dataframe |
r |
integer, denotes number of simulations, Default: 100 |
poisson_test |
logical, calculates poisson.test pvalue, Default: TRUE |
prob_lower |
logical, calculates probability for getting a lower value, Default: TRUE |
r_prob_lower |
integer, denotes number of simulations for prob_lower value calculation, Default: 1000 |
under_only |
compute under-reporting probabilities only, default = TRUE |
parallel |
logical, see examples for registering parallel processing framework, Default: FALSE |
keep_ae |
logical, keep AE numbers in output dataframe; increases memory use by roughly 30 percent, Default: FALSE |
min_n_pat_with_med75 |
integer, min number of patients with med75 at site to simulate, Default: 1 |
studies |
vector with study names, Default: NULL |
.progress |
logical, show progress bar |
Here we simulate study replicates by bootstrap resampling, maintaining the same number of sites, patients and visit_med75. For each replicate we then calculate the probability of obtaining the same or a lower mean_ae count and the p-value using poisson.test.
adds column with simulated probabilities for equal or lower mean_ae at visit_med75
dataframe
df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 5, frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit1$study_id <- "A"
df_visit2 <- sim_test_data_study(n_pat = 1000, n_sites = 3, frac_site_with_ur = 0.2, ur_rate = 0.1)
df_visit2$study_id <- "B"
df_visit <- dplyr::bind_rows(df_visit1, df_visit2)
df_site <- site_aggr(df_visit)
sim_studies(df_visit, df_site, r = 3, keep_ae = TRUE)
## Not run:
# parallel processing -------------------------
library(future)
future::plan(multisession)
sim_studies(df_visit, df_site, r = 3, keep_ae = TRUE, parallel = TRUE)
future::plan(sequential)
## End(Not run)
helper function for sim_test_data_study()
sim_test_data_patient( .f_sample_max_visit = function() rnorm(1, mean = 20, sd = 4), .f_sample_ae_per_visit = function(max_visit) rpois(max_visit, 0.5) )
.f_sample_max_visit |
function used to sample the maximum number of visits, Default: function() rnorm(1, mean = 20, sd = 4) |
.f_sample_ae_per_visit |
function used to sample the AEs for each visit, Default: function(max_visit) rpois(max_visit, 0.5) |
vector containing cumulative aes
replicate(5, sim_test_data_patient())
replicate(5, sim_test_data_patient(
  .f_sample_ae_per_visit = function(x) rpois(x, 1.2))
)
replicate(5, sim_test_data_patient(
  .f_sample_max_visit = function() rnorm(1, mean = 5, sd = 5))
)
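The sampling idea behind the two default functions can be sketched in base R. This is an illustration only: sketch_patient() is a hypothetical helper, and rounding the sampled maximum visit and enforcing at least one visit are assumptions, not documented behaviour of sim_test_data_patient().

```r
# Hypothetical helper (not part of simaerep): cumulative AE counts of one patient.
sketch_patient <- function(max_visit_mean = 20,
                           max_visit_sd = 4,
                           ae_per_visit_mean = 0.5) {
  # sample the maximum visit from a normal distribution
  # (assumption: round to a whole number and keep at least one visit)
  max_visit <- max(1, round(rnorm(1, mean = max_visit_mean, sd = max_visit_sd)))
  # sample the number of new AEs at each visit from a Poisson distribution
  ae_per_visit <- rpois(max_visit, ae_per_visit_mean)
  # return the cumulative AE count per visit
  cumsum(ae_per_visit)
}

set.seed(1)
aes <- sketch_patient()
aes
```

The result has one element per visit and never decreases, which matches the "vector containing cumulative aes" described as the return value.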
Simulate visit level data from a portfolio configuration.
sim_test_data_portfolio( df_config, df_ae_rates = NULL, parallel = FALSE, progress = TRUE )
df_config |
dataframe as returned by |
df_ae_rates |
dataframe with ae rates. Default: NULL |
parallel |
logical activate parallel processing, see details, Default: FALSE |
progress |
logical, Default: TRUE |
Uses sim_test_data_study. We use the furrr package to implement parallel processing as these simulations can take a long time to run. For this to work we need to specify the plan for how the code should run, e.g. plan(multisession, workers = 3).
dataframe with the following columns:
study identification
mean AE per visit per study
site
standard deviation of maximum patient visits per site
mean of maximum patient visits per site
number of patients
visit number
cumulative sum of AEs
sim_test_data_study, get_config, sim_test_data_portfolio, sim_ur_scenarios, get_portf_perf
df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit1$study_id <- "A"
df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.2, ur_rate = 0.1)
df_visit2$study_id <- "B"
df_visit <- dplyr::bind_rows(df_visit1, df_visit2)
df_site_max <- df_visit %>%
  dplyr::group_by(study_id, site_number, patnum) %>%
  dplyr::summarise(max_visit = max(visit), max_ae = max(n_ae), .groups = "drop")
df_config <- get_config(df_site_max)
df_config
df_portf <- sim_test_data_portfolio(df_config)
df_portf
df_scen <- sim_ur_scenarios(df_portf, extra_ur_sites = 2, ur_rate = c(0.5, 1))
df_scen
df_perf <- get_portf_perf(df_scen)
df_perf
Evenly distributes a given number of patients across a given number of sites, then simulates the AE development of each patient, reducing the number of reported AEs for patients assigned to AE-under-reporting sites.
sim_test_data_study( n_pat = 1000, n_sites = 20, frac_site_with_ur = 0, ur_rate = 0, max_visit_mean = 20, max_visit_sd = 4, ae_per_visit_mean = 0.5, ae_rates = NULL )
n_pat |
integer, number of patients, Default: 1000 |
n_sites |
integer, number of sites, Default: 20 |
frac_site_with_ur |
fraction of AE under-reporting sites, Default: 0 |
ur_rate |
AE under-reporting rate, will lower mean AE per visit used to simulate patients at sites flagged as AE-under-reporting. Negative values will simulate over-reporting. Default: 0 |
max_visit_mean |
mean of the maximum number of visits of each patient, Default: 20 |
max_visit_sd |
standard deviation of maximum number of visits of each patient, Default: 4 |
ae_per_visit_mean |
mean ae per visit per patient, Default: 0.5 |
ae_rates |
vector with visit-specific AE rates, Default: NULL |
The maximum visit number will be sampled from a normal distribution with characteristics derived from max_visit_mean and max_visit_sd, while the AEs per visit will be sampled from a Poisson distribution described by ae_per_visit_mean.
tibble with columns site_number, patnum, is_ur, max_visit_mean, max_visit_sd, ae_per_visit_mean, visit, n_ae
set.seed(1)
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5)
df_visit[which(df_visit$patnum == "P000001"),]
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5, frac_site_with_ur = 0.2, ur_rate = 0.5)
df_visit[which(df_visit$patnum == "P000001"),]
ae_rates <- c(0.7, rep(0.5, 8), rep(0.3, 5))
sim_test_data_study(n_pat = 100, n_sites = 5, ae_rates = ae_rates)
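The way patients are spread across sites and how ur_rate lowers the mean AE rate can be sketched in base R. This is a rough illustration, not the code of sim_test_data_study(): which sites get flagged (here simply the first ones) and the rounding of the number of flagged sites are assumptions made for the example.

```r
n_pat <- 10
n_sites <- 5
frac_site_with_ur <- 0.4
ur_rate <- 0.5
ae_per_visit_mean <- 0.5

# assign patients to sites in turn, so site sizes differ by at most one patient
site_idx <- rep(seq_len(n_sites), length.out = n_pat)

# assumption: the first round(n_sites * frac_site_with_ur) sites are flagged
n_ur_sites <- round(n_sites * frac_site_with_ur)
is_ur <- site_idx <= n_ur_sites

# patients at flagged sites are simulated with a lowered mean AE per visit;
# a negative ur_rate would raise the mean and simulate over-reporting
ae_mean <- ifelse(is_ur, ae_per_visit_mean * (1 - ur_rate), ae_per_visit_mean)

data.frame(site_idx, is_ur, ae_mean)
```

With these inputs each site receives two patients, and the four patients at the two flagged sites are simulated with a mean of 0.25 AEs per visit instead of 0.5.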
We remove a fraction of AEs from a specific site.
sim_ur(df_visit, study_id, site_number, ur_rate)
df_visit |
dataframe |
study_id |
character |
site_number |
character |
ur_rate |
double |
We determine the absolute number of AEs per patient for removal, then remove them at the first visit. We intentionally allow fractions.
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit$study_id <- "A"
df_ur <- sim_ur(df_visit, "A", site_number = "S0001", ur_rate = 0.35)
# Example cumulated AE for first patient with 35% under-reporting
df_ur[df_ur$site_number == "S0001" & df_ur$patnum == "P000001",]$n_ae
# Example cumulated AE for first patient with no under-reporting
df_visit[df_visit$site_number == "S0001" & df_visit$patnum == "P000001",]$n_ae
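The removal step can be illustrated for a single patient in base R. This is a sketch under assumptions rather than the body of sim_ur(): the AE counts are made up, and clamping at zero when the first visits hold fewer AEs than the amount to be removed is an assumption of this example.

```r
# made-up cumulative AE counts of one patient, one value per visit
n_ae <- c(1, 1, 2, 4, 4)
ur_rate <- 0.35

# absolute number of AEs to remove for this patient; fractions are allowed
n_ae_remove <- max(n_ae) * ur_rate

# removing AEs at the first visit shifts the whole cumulative curve down
# (assumption: counts cannot drop below zero)
n_ae_ur <- pmax(n_ae - n_ae_remove, 0)
n_ae_ur
```

The final cumulative count falls from 4 to 2.6, a reduction of 35%, and the fractional result shows why non-integer AE counts can appear in the output.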
Use with simulated portfolio data to generate under-reporting stats for specified scenarios.
sim_ur_scenarios( df_portf, extra_ur_sites = 3, ur_rate = c(0.25, 0.5), r = 1000, poisson_test = FALSE, prob_lower = TRUE, parallel = FALSE, progress = TRUE, site_aggr_args = list(), eval_sites_args = list(), check = TRUE )
df_portf |
dataframe as returned by |
extra_ur_sites |
numeric, set maximum number of additional under-reporting sites, see details. Default: 3 |
ur_rate |
numeric vector, set under-reporting rates for scenarios. Default: c(0.25, 0.5) |
r |
integer, denotes number of simulations, default = 1000 |
poisson_test |
logical, calculates poisson.test pvalue |
prob_lower |
logical, calculates probability for getting a lower value |
parallel |
logical, use parallel processing see details, Default: FALSE |
progress |
logical, show progress bar, Default: TRUE |
site_aggr_args |
named list of parameters passed to
|
eval_sites_args |
named list of parameters passed to
|
check |
logical, perform data check and attempt repair with |
The function applies under-reporting scenarios to each site. It reduces the number of AEs by a given under-reporting rate (ur_rate) for all patients at the site and adds the corresponding under-reporting statistics. Since the under-reporting probability is also affected by the number of other sites that are under-reporting, we additionally calculate under-reporting statistics in a scenario where additional under-reporting sites are present. For this we use the median number of patients per site at the study to calculate the final number of patients for which we lower the AEs in a given under-reporting scenario.
We use the furrr package to implement parallel processing as these simulations can take a long time to run. For this to work we need to specify the plan for how the code should run, e.g. plan(multisession, workers = 18).
dataframe with the following columns:
study identification
site identification
number of patients at site
number of patients at site with visit_med75
median(max(visit)) * 0.75
mean AE at visit_med75 site level
mean AE at visit_med75 study level
number of patients at site with visit_med75 at study excl site
additional sites with under-reporting patients
ratio of patients in study that are under-reporting
under-reporting rate
p-value as returned by poisson.test
bootstrapped probability for having mean_ae_site_med75 or lower
adjusted p-values
adjusted bootstrapped probability for having mean_ae_site_med75 or lower
probability under-reporting as 1 - pval_adj, poisson.test (use as benchmark)
probability under-reporting as 1 - prob_low_adj, bootstrapped (use)
sim_test_data_study, get_config, sim_test_data_portfolio, sim_ur_scenarios, get_portf_perf
df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit1$study_id <- "A"
df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10, frac_site_with_ur = 0.2, ur_rate = 0.1)
df_visit2$study_id <- "B"
df_visit <- dplyr::bind_rows(df_visit1, df_visit2)
df_site_max <- df_visit %>%
  dplyr::group_by(study_id, site_number, patnum) %>%
  dplyr::summarise(max_visit = max(visit), max_ae = max(n_ae), .groups = "drop")
df_config <- get_config(df_site_max)
df_config
df_portf <- sim_test_data_portfolio(df_config)
df_portf
df_scen <- sim_ur_scenarios(df_portf, extra_ur_sites = 2, ur_rate = c(0.5, 1))
df_scen
df_perf <- get_portf_perf(df_scen)
df_perf
Simulate AE under-reporting probabilities.
simaerep( df_visit, r = 1000, check = TRUE, under_only = TRUE, visit_med75 = TRUE, inframe = FALSE, progress = TRUE, mult_corr = TRUE, param_site_aggr = list(method = "med75_adj", min_pat_pool = 0.2), param_sim_sites = list(r = 1000, poisson_test = FALSE, prob_lower = TRUE), param_eval_sites = list(method = "BH"), env = parent.frame() )
df_visit |
Data frame with columns: study_id, site_number, patnum, visit, n_ae. |
r |
Integer or tbl_object, number of repetitions for bootstrap simulation. Pass a tbl object referring to a table with one column and as many rows as desired repetitions. Default: 1000. |
check |
Logical, perform data check and attempt repair with
|
under_only |
Logical, compute under-reporting probabilities only.
Supersedes under_only parameter passed to |
visit_med75 |
Logical, should evaluation point visit_med75 be used. Default: TRUE. |
inframe |
Logical, only table operations to be used; does not require visit_med75. Compatible with dbplyr supported database backends. |
progress |
Logical, display progress bar. Default: TRUE. |
mult_corr |
Logical, multiplicity correction, Default: TRUE |
param_site_aggr |
List of parameters passed to |
param_sim_sites |
List of parameters passed to |
param_eval_sites |
List of parameters passed to |
env |
Optional, provide environment of original visit data. Default: parent.frame(). |
Executes site_aggr(), sim_sites(), and eval_sites() on original visit data and stores all intermediate results. Stores lazy reference to original visit data for facilitated plotting using generic plot(x).
A simaerep object.
site_aggr(), sim_sites(), eval_sites(), orivisit(), plot.simaerep()
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6
)
df_visit$study_id <- "A"
aerep <- simaerep(df_visit)
aerep
str(aerep)
# In-frame table operations
simaerep(df_visit, inframe = TRUE, visit_med75 = FALSE, under_only = FALSE)$df_eval
simaerep(df_visit, inframe = TRUE, visit_med75 = TRUE, under_only = FALSE)$df_eval
# Database example
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
df_r <- tibble::tibble(rep = seq(1, 1000))
dplyr::copy_to(con, df_visit, "visit")
dplyr::copy_to(con, df_r, "r")
tbl_visit <- dplyr::tbl(con, "visit")
tbl_r <- dplyr::tbl(con, "r")
simaerep(tbl_visit, r = tbl_r, inframe = TRUE, visit_med75 = FALSE, under_only = FALSE)$df_eval
simaerep(tbl_visit, r = tbl_r, inframe = TRUE, visit_med75 = TRUE, under_only = FALSE)$df_eval
DBI::dbDisconnect(con)
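The multiplicity correction selected through mult_corr and param_eval_sites = list(method = "BH") can be illustrated with base R's p.adjust(). This is a sketch with made-up bootstrapped probabilities for five sites. It assumes the adjustment is applied across all sites of a study and that the under-reporting probability is one minus the adjusted value, as the output column descriptions elsewhere in this manual suggest.

```r
# made-up bootstrapped probabilities of a lower mean AE count, one per site
prob_low <- c(0.001, 0.02, 0.3, 0.8, 1)

# Benjamini-Hochberg adjustment across the sites of one study
prob_low_adj <- p.adjust(prob_low, method = "BH")

# under-reporting probability as one minus the adjusted probability
prob_low_prob_ur <- 1 - prob_low_adj

data.frame(prob_low, prob_low_adj, prob_low_prob_ur)
```

After adjustment the two most suspicious sites keep under-reporting probabilities of 0.995 and 0.95, while the remaining sites fall to 0.5 or 0.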
Calculates visit_med75, n_pat_with_med75 and mean_ae_site_med75
site_aggr(df_visit, method = "med75_adj", min_pat_pool = 0.2, check = TRUE)
df_visit |
dataframe with columns: study_id, site_number, patnum, visit, n_ae |
method |
character, one of c("med75", "med75_adj"), method for defining the evaluation point visit_med75 (see Details), Default: "med75_adj" |
min_pat_pool |
double, minimum ratio of patients available for sampling. Determines the maximum visit_med75 value, see Details. Default: 0.2 |
check |
logical, perform data check and attempt repair with check_df_visit(), computationally expensive on large data sets. Default: TRUE |
To determine the visit number at which we evaluate AE reporting, we take the maximum visit of each patient at the site and take the median. We then multiply by 0.75, which gives a cut-off point determining which patients will be evaluated. Of the patients we evaluate, we take the minimum of all maximum visits, which ensures that we use the highest visit number possible without excluding more patients from the analysis. To ensure that the sampling pool for that visit is large enough, we limit the visit number by the 80% quantile of maximum visits of all patients in the study.
dataframe with the following columns:
study identification
site identification
number of patients, site level
adjusted median(max(visit)) * 0.75 see Details
number of patients that meet visit_med75 criterion, site level
mean AE at visit_med75, site level
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6
)
df_visit$study_id <- "A"
df_site <- site_aggr(df_visit)
df_site %>%
  knitr::kable(digits = 2)
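The evaluation point described in the Details of site_aggr() can be traced for a single site in base R. This is a sketch of the described steps, not the package code: sketch_visit_med75() is a hypothetical helper, the maximum visits are made up, and the use of >= for the cut-off and floor() for the quantile cap are assumptions.

```r
# Hypothetical helper (not part of simaerep): adjusted visit_med75 for one site.
sketch_visit_med75 <- function(max_visit_site, max_visit_study,
                               min_pat_pool = 0.2) {
  # cut-off: 75% of the median of the patients' maximum visits at the site
  cutoff <- median(max_visit_site) * 0.75
  # patients that reach the cut-off are evaluated
  eligible <- max_visit_site[max_visit_site >= cutoff]
  # highest visit that still includes every eligible patient
  visit_adj <- min(eligible)
  # cap by the (1 - min_pat_pool) quantile of maximum visits in the study,
  # so that enough patients remain in the sampling pool
  cap <- floor(quantile(max_visit_study, probs = 1 - min_pat_pool, names = FALSE))
  min(visit_adj, cap)
}

max_visit_site  <- c(10, 12, 16, 20, 24)                  # made-up site data
max_visit_study <- c(max_visit_site, 8, 14, 18, 22, 30)   # made-up study data

sketch_visit_med75(max_visit_site, max_visit_study)
```

Here the median maximum visit is 16, the cut-off is 12, four patients are eligible and their lowest maximum visit is 12. The study-level cap of 22 does not bind, so the evaluation point is visit 12.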
with_progress.
Internal function. Use instead of with_progress within custom functions with progress bars.
with_progress_cnd(ex, progress = TRUE)
ex |
expression |
progress |
logical, Default: TRUE |
This wrapper adds a progress parameter to with_progress so that we can control the progress bar in the user-facing functions. The progress bar only shows in interactive mode.
No return value, called for side effects
if (interactive()) {
  with_progress_cnd(
    purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5),
    progress = TRUE
  )
  with_progress_cnd(
    purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5),
    progress = FALSE
  )
  # wrap a function with progress bar with another call with progress bar
  f1 <- function(x, progress = TRUE) {
    with_progress_cnd(
      purrr_bar(x, .purrr = purrr::walk, .f = Sys.sleep, .steps = length(x), .progress = progress),
      progress = progress
    )
  }
  # inner progress bar blocks outer progress bar
  progressr::with_progress(
    purrr_bar(
      rep(rep(1, 3),3), .purrr = purrr::walk, .f = f1, .steps = 3,
      .f_args = list(progress = TRUE)
    )
  )
  # inner progress bar turned off
  progressr::with_progress(
    purrr_bar(
      rep(list(rep(0.25, 3)), 5), .purrr = purrr::walk, .f = f1, .steps = 5,
      .f_args = list(progress = FALSE)
    )
  )
}