The respectables package provides a framework to generate data frames from recipes.
For this vignette we load the respectables and dplyr packages:
library(respectables)
library(dplyr)
Note that the respectables package is still under development.
Let's start by defining a simple dataset dm with a single variable id.
# Generate n ids of the form "id-1", ..., "id-n"
gen_id <- function(n) {
  paste0("id-", seq_len(n))
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func,  ~func_args,
  "id",       no_deps,       gen_id, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
Note that the argument n is supplied by respectables; in this case it is equal to N.
We can reuse the recipe dm_recipe to create a different dataset:
gen_table_data(N = 5, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
## 3 id-3
## 4 id-4
## 5 id-5
We will now add the variables height and weight to the dm recipe:
gen_hw <- function(n) {
  # BMI is at least 17; weight then follows from weight = bmi * height^2
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}

dm_recipe <- tribble(
  ~variables,            ~dependencies, ~func,  ~func_args,
  "id",                  no_deps,       gen_id, no_args,
  c("height", "weight"), no_deps,       gen_hw, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.766413 54.70709
## 2 id-2 1.680255 61.17568
Note that gen_hw uses random number generators, hence rerunning gen_table_data gives different values:
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.903912 85.25352
## 2 id-2 1.884228 75.13749
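If reproducible values are needed, a common approach in R is to fix the seed of the random number generator before generating data. The sketch below uses base R only, with a simplified stand-in for gen_hw (the name gen_height is hypothetical). Whether respectables offers its own seed handling is not covered here.

```r
# Simplified stand-in for a random generator function (hypothetical name).
gen_height <- function(n) {
  runif(n, min = 1.5, max = 1.95)
}

# Fixing the seed before each call makes the draws repeatable.
set.seed(123)
first <- gen_height(2)

set.seed(123)
second <- gen_height(2)

# Same seed, same draws.
identical(first, second)
```

In practice a single set.seed() call at the top of a script is usually enough to make the whole run repeatable.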
We will now continue our dm example by defining the variable age, which for illustrative purposes depends on height.
gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height * 25)
}

dm_recipe <- tribble(
  ~variables,            ~dependencies, ~func,   ~func_args,
  "id",                  no_deps,       gen_id,  no_args,
  c("height", "weight"), no_deps,       gen_hw,  no_args,
  "age",                 "height",      gen_age, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
## id height weight age
## 1 id-1 1.755060 57.03430 43.87651
## 2 id-2 1.888836 64.41491 47.22091
Note that respectables creates the arguments n and .df on the fly. Also, respectables determines the evaluation order of the variables based on the dependency structure. That is, respectables does not guarantee that the resulting data frame is built by processing the recipe row by row.
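To illustrate what a dependency-based evaluation order means, here is a minimal base R sketch that orders recipe rows so that each row comes after the rows producing its dependencies. The helper resolve_order is hypothetical and is not how respectables is implemented; it only shows the idea.

```r
# Hypothetical helper, not part of respectables: returns row indices in an
# order where every row comes after the rows that produce its dependencies.
resolve_order <- function(variables, dependencies) {
  done <- character(0)   # variables generated so far
  ord <- integer(0)      # row indices in evaluation order
  remaining <- seq_along(variables)
  while (length(remaining) > 0) {
    # A row is ready once all of its dependencies have been generated.
    is_ready <- vapply(
      remaining,
      function(i) all(dependencies[[i]] %in% done),
      logical(1)
    )
    ready <- remaining[is_ready]
    if (length(ready) == 0) {
      stop("circular or unsatisfied dependencies")
    }
    ord <- c(ord, ready)
    done <- c(done, unlist(variables[ready]))
    remaining <- setdiff(remaining, ready)
  }
  ord
}

# Recipe rows listed in an order that cannot be evaluated top to bottom:
# "age" comes first but depends on "height".
variables <- list("age", "id", c("height", "weight"))
dependencies <- list("height", character(0), character(0))

eval_order <- resolve_order(variables, dependencies)
eval_order
```

Here the row for age is evaluated last, even though it is listed first, because height must exist before age can be derived.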
If we want configurable variable-generating functions, we can specify their arguments in the recipe:
# The default is qualified with grDevices:: because a default of
# `colors = colors()` refers to itself and fails when the argument is omitted.
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

dm_recipe <- tribble(
  ~variables,            ~dependencies, ~func,     ~func_args,
  "id",                  no_deps,       gen_id,    no_args,
  c("height", "weight"), no_deps,       gen_hw,    no_args,
  "age",                 "height",      gen_age,   no_args,
  "color",               no_deps,       gen_color, list(colors = c("blue", "red"))
)

gen_table_data(N = 4, recipe = dm_recipe)
## id height weight age color
## 1 id-1 1.693552 58.90777 42.33879 red
## 2 id-2 1.833927 57.60692 45.84817 blue
## 3 id-3 1.540379 49.29668 38.50947 blue
## 4 id-4 1.691383 57.73142 42.28457 red
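One plausible way to think about func_args is that the entries of the list are passed to the function alongside the arguments that the framework supplies. The base R sketch below mimics this with do.call. It is an assumption made for illustration, not a description of the respectables internals, and the names fixed_args and user_args are hypothetical.

```r
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# Arguments the framework would supply (assumed here to be just n).
fixed_args <- list(n = 4)
# Arguments taken from the func_args column of the recipe.
user_args <- list(colors = c("blue", "red"))

# Call the generator with both sets of arguments combined.
res <- do.call(gen_color, c(fixed_args, user_args))
res
```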
The miss_recipe argument of gen_table_data can be used to inject missing values as the last step of data creation. That is, first the data generation recipe is executed and then the missing data is injected. Hence, all variables are available at execution time and the .df argument is supplied to func.
# Returns a logical vector with one entry per row; TRUE marks a value to be set to NA
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}

dm_na_recipe <- tribble(
  ~variables, ~func,            ~func_args,
  "age",      gen_alternate_na, no_args
)

gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)
## id height weight age color
## 1 id-1 1.541134 44.53334 NA blue
## 2 id-2 1.793726 70.39314 44.84316 blue
## 3 id-3 1.562529 53.84893 NA blue
## 4 id-4 1.935932 79.35835 48.39830 red
Note that this currently only works with one variable per row in the missing recipe. We are still working on this feature to allow the definition of more complex missing-data structures.
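The effect of such a missing-data function can be reproduced by hand in base R: the function returns a logical vector and the values at the TRUE positions are replaced by NA. The sketch below uses a small hand-made data frame (the age values are made up for illustration) and does not call respectables.

```r
# Small hand-made data frame standing in for generated data.
df <- data.frame(
  id = paste0("id-", 1:4),
  age = c(40, 45, 38, 42)
)

# Logical mask with one entry per row; TRUE marks a value to be set to NA.
gen_alternate_na <- function(.df) {
  rep(c(TRUE, FALSE), length.out = nrow(.df))
}

mask <- gen_alternate_na(df)
df$age[mask] <- NA
df
```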
For this example we create a data frame aseq with the variable seqterm being c("step 1", ..., "step i"), where i is extracted from the variable id.
dm <- gen_table_data(N = 3, recipe = dm_recipe)

# grow dataset: repeat each row of dm i times, where i is the number in id
gen_seq <- function(.db) {
  dm <- .db$dm
  ni <- as.numeric(substring(dm$id, 4))
  df_grow <- data.frame(
    id = rep(dm$id, ni),
    seq = sequence(ni)  # 1, ..., i for each id
  )
  left_join(dm, df_grow, by = "id")
}

aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func,   ~func_args,
  "dm",         "id",         gen_seq, no_args
)

gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}

aseq_recipe <- tribble(
  ~variables, ~dependencies, ~func,        ~func_args,
  "seqterm",  "seq",         gen_seq_term, no_args
)

gen_reljoin_table(joinrec = aseq_scf_recipe, tblrec = aseq_recipe, db = list(dm = dm))
## id height weight age color seq seqterm
## 1 id-1 1.758112 65.20014 43.95280 blue 1 step 1
## 2 id-2 1.588006 69.84311 39.70016 blue 1 step 1
## 3 id-2 1.588006 69.84311 39.70016 blue 2 step 2
## 4 id-3 1.772379 59.71679 44.30948 red 1 step 1
## 5 id-3 1.772379 59.71679 44.30948 red 2 step 2
## 6 id-3 1.772379 59.71679 44.30948 red 3 step 3
The steps here are:
1. Use joinrec to grow a new data frame, say A, possibly from db.
2. Call gen_table_data with the following arguments:
   - A for df
   - tblrec for recipe
   - miss_recipe
Note that this functionality is under development. Currently aseq_scf_recipe needs to be a tibble with one row, and foreign_key is not yet used.
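The two steps can also be traced by hand. The base R sketch below first grows a data frame A from a small db list and then derives a further variable from A. It is only an illustration of the flow and does not use respectables; the helper grow and the contents of db are hypothetical.

```r
# A minimal database with a single table dm.
db <- list(dm = data.frame(id = paste0("id-", 1:3)))

# Step 1: grow a new data frame A from db, with i rows for the id numbered i.
grow <- function(.db) {
  dm <- .db$dm
  ni <- as.numeric(substring(dm$id, 4))
  data.frame(id = rep(dm$id, ni), seq = sequence(ni))
}
A <- grow(db)

# Step 2: derive further variables from A.
A$seqterm <- paste("step", A$seq)
A
```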
dplyr
This section needs further work.
Let's map the following code into respectables recipes:
iris %>%
  mutate(SPECIES = toupper(Species)) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
There are multiple ways to map this to the respectables framework, for example:
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

rcp <- tribble(
  ~variables, ~dependencies, ~func,       ~func_args,
  "SPECIES",  "Species",     gen_toupper, list(varname = "Species")
)

gen_table_data(recipe = rcp, df = iris) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
Note that in gen_toupper we use the ellipsis ... to absorb unused arguments such as n.
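The role of the ellipsis can be checked in base R by calling a function with an extra argument n, once with ... in the signature and once without. The function names with_dots and without_dots are hypothetical, and the do.call invocation only stands in for however the framework calls the function.

```r
# Same body, with and without an ellipsis in the signature.
with_dots <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}
without_dots <- function(varname, .df) {
  toupper(.df[[varname]])
}

# Arguments including an n that neither function uses.
args <- list(varname = "Species", .df = head(iris, 2), n = 2)

# Works: the ellipsis silently absorbs n.
res <- do.call(with_dots, args)

# Fails: without the ellipsis, n is an unused argument.
err <- tryCatch(
  do.call(without_dots, args),
  error = function(e) conditionMessage(e)
)

res
err
```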