The respectables package provides a framework to
For this vignette we load the respectables and
dplyr packages:
library(respectables)
library(dplyr)
Note that the respectables package is still under
development.
Let's start by defining a simple dataset dm with a single
variable id.
gen_id <- function(n) {
  # seq_len() also behaves correctly for n = 0
  paste0("id-", seq_len(n))
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
Note that the argument n is supplied by
respectables; in this case it is equal to N.
We can reuse the recipe dm_recipe to create a
different dataset:
gen_table_data(N = 5, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
## 3 id-3
## 4 id-4
## 5 id-5
We will now add the variables height and
weight to the dm recipe:
gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.805600 56.77960
## 2 id-2 1.764667 62.44193
Note that we used random number generators in gen_hw,
hence rerunning gen_table_data will give different
values:
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.588430 55.20884
## 2 id-2 1.750001 70.22846
We will now continue our dm example by defining the
variable age which, for illustrative purposes, depends
on height.
gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height * 25)
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight age
## 1 id-1 1.859828 69.07735 46.49569
## 2 id-2 1.765107 56.10640 44.12768
Note that respectables creates the arguments
n and .df on the fly. It also determines the
evaluation order of the variables from the dependency
structure. That is, respectables does not guarantee that
the resulting data frame is built row by row in the order
given in the recipe.
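To illustrate what dependency-based ordering means, here is a minimal base R sketch. It is not the respectables implementation: the list deps and the helper resolve_order are hypothetical names introduced only for this illustration, and the approach (repeatedly picking variables whose dependencies are already available) is just one simple way to obtain a valid order.

```r
# Hypothetical recipe, reduced to variable names and their dependencies.
# The entries are deliberately listed out of order.
deps <- list(
  age    = "height",
  id     = character(0),
  height = character(0)
)

# Repeatedly pick the variables whose dependencies are already resolved.
resolve_order <- function(deps) {
  done <- character(0)
  while (length(done) < length(deps)) {
    pending <- setdiff(names(deps), done)
    is_ready <- vapply(deps[pending], function(d) all(d %in% done), logical(1))
    ready <- pending[is_ready]
    if (length(ready) == 0) stop("circular or unsatisfiable dependencies")
    done <- c(done, ready)
  }
  done
}

eval_order <- resolve_order(deps)
eval_order
```

In this sketch age is evaluated last even though it is listed first, because it depends on height.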
To make variable-generating functions configurable, we can specify their arguments in the recipe:
gen_color <- function(n, colors = grDevices::colors()) {
  # grDevices:: avoids a recursive default, since the argument shadows colors()
  data.frame(color = sample(colors, n, replace = TRUE))
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args,
  "color", no_deps, gen_color, list(colors = c("blue", "red"))
)
gen_table_data(N = 4, recipe = dm_recipe)
## id height weight age color
## 1 id-1 1.587398 52.16781 39.68496 red
## 2 id-2 1.859781 59.13455 46.49453 blue
## 3 id-3 1.623077 62.95267 40.57693 red
## 4 id-4 1.800500 61.27754 45.01250 red
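One way to picture how func_args could reach the generating function is a do.call that combines the framework-supplied n with the user-supplied list. The sketch below uses only base R; call_generator is a hypothetical helper and not part of respectables, whose internals may differ.

```r
# Generator with a configurable argument, similar to the recipe row above.
gen_color <- function(n, colors = c("green", "yellow")) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# Hypothetical helper: combine the framework-supplied `n` with the
# user-supplied `func_args` and call the generator.
call_generator <- function(func, n, func_args = list()) {
  do.call(func, c(list(n = n), func_args))
}

res <- call_generator(gen_color, n = 4,
                      func_args = list(colors = c("blue", "red")))
res
```

Because the arguments are passed by name, the entries of func_args should match the formal argument names of the generating function.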
The miss_recipe argument of gen_table_data
can be used to inject missing values as the last step of data
generation. That is, the data generation recipe is executed
first and the missing values are injected afterwards. Hence, all
variables are available at execution time and the .df
argument is supplied to func.
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}
dm_na_recipe <- tribble(
  ~variables, ~func, ~func_args,
  "age", gen_alternate_na, no_args
)
gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)
## id height weight age color
## 1 id-1 1.704042 57.45607 NA red
## 2 id-2 1.560078 65.20829 39.00194 red
## 3 id-3 1.652178 51.83063 NA red
## 4 id-4 1.934419 69.74841 48.36047 blue
Note that this currently only works with one variable per row in the missing recipe. We are still working on support for more complex missing-data structures.
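The effect of such a missing recipe can be reproduced by hand in base R. The sketch below assumes, consistent with the output above, that TRUE entries in the logical vector mark the rows to be set to NA; the data frame df is made up for the illustration.

```r
# Made-up data standing in for the generated dm table.
df <- data.frame(id = paste0("id-", 1:4), age = c(40, 42, 44, 46))

# Same rule as gen_alternate_na: TRUE marks the rows to blank out.
mask <- rep(c(TRUE, FALSE), length.out = nrow(df))

# Assumed behaviour: TRUE entries in the mask become NA in the target variable.
df$age[mask] <- NA
df
```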
For this example we create a data frame aseq with the
variable seqterm being
c("step 1", ..., "step i"), where i is
extracted from the variable id.
dm <- gen_table_data(N = 3, recipe = dm_recipe)
# grow dataset: one row per id and sequence number
gen_seq <- function(.db) {
  dm <- .db$dm
  # numeric suffix of the id, e.g. "id-3" gives 3
  ni <- as.numeric(substring(dm$id, 4))
  df_grow <- data.frame(
    id = rep(dm$id, ni),
    # sequence() always returns a plain vector, even if all ni are equal
    seq = sequence(ni)
  )
  left_join(dm, df_grow, by = "id")
}
aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func, ~func_args,
  "dm", "id", gen_seq, no_args
)
gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}
aseq_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "seqterm", "seq", gen_seq_term, no_args
)
gen_reljoin_table(
  joinrec = aseq_scf_recipe,
  tblrec = aseq_recipe,
  db = list(dm = dm)
)
## id height weight age color seq seqterm
## 1 id-1 1.569424 60.41217 39.23560 blue 1 step 1
## 2 id-2 1.674346 61.49738 41.85864 red 1 step 1
## 3 id-2 1.674346 61.49738 41.85864 red 2 step 2
## 4 id-3 1.742860 62.44902 43.57150 blue 1 step 1
## 5 id-3 1.742860 62.44902 43.57150 blue 2 step 2
## 6 id-3 1.742860 62.44902 43.57150 blue 3 step 3
The steps here are:
1. Use joinrec to grow a new data frame, say A, possibly from db.
2. Call gen_table_data with the following arguments:
   - A for df
   - tblrec for recipe
   - miss_recipe
Note that this functionality is under development. Currently
aseq_scf_recipe needs to be a tibble with one row, and the
foreign_key is currently not used.
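The two steps can be mimicked in base R without the package. This is only a sketch of the idea under the assumptions of the example above (the row count per id is taken from the numeric suffix of id); the small dm table is made up, and the code does not reflect how gen_reljoin_table is implemented.

```r
# Made-up parent table standing in for dm.
dm <- data.frame(id = paste0("id-", 1:3), height = c(1.6, 1.7, 1.8))

# Step 1: grow a data frame A from dm, repeating each row ni times.
ni <- as.numeric(substring(dm$id, 4))
A <- dm[rep(seq_len(nrow(dm)), ni), , drop = FALSE]
A$seq <- sequence(ni)
rownames(A) <- NULL

# Step 2: derive the new variable from the grown data frame.
A$seqterm <- paste("step", A$seq)
A
```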
dplyr
This section needs further work.
Let's map the following code into respectables
recipes:
iris %>%
  mutate(SPECIES = toupper(Species)) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
There are multiple ways to map this to the
respectables framework. One of them is:
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}
rcp <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "SPECIES", "Species", gen_toupper, list(varname = "Species")
)
gen_table_data(recipe = rcp, df = iris) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
Note that in gen_toupper we use the ellipsis ...
to absorb unused arguments such as n.
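The role of the ellipsis can be checked in plain R, independently of respectables. In the sketch below, strict_toupper is a hypothetical variant without ... that is included only to show the contrast.

```r
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

# `n` is not a formal argument, so it is absorbed by `...` and ignored.
out <- gen_toupper(varname = "Species", .df = head(iris, 3), n = 3)
out

# Hypothetical variant without `...`: the extra argument raises an error.
strict_toupper <- function(varname, .df) {
  toupper(.df[[varname]])
}
err <- tryCatch(
  strict_toupper(varname = "Species", .df = head(iris, 3), n = 3),
  error = function(e) conditionMessage(e)
)
err
```

Accepting ... lets the same function be called whether or not the caller passes n.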