The respectables package provides a framework to generate synthetic data from recipes.
For this vignette we load the respectables and dplyr packages:
library(respectables)
library(dplyr)
Note that the respectables package is still under development.
Let's start by defining a simple dataset dm with a single variable id.
gen_id <- function(n) {
  paste0("id-", 1:n)
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
Note that the argument n is supplied by respectables; in this case it equals N.
We can reuse the recipe dm_recipe to create a different dataset:
gen_table_data(N = 5, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
## 3 id-3
## 4 id-4
## 5 id-5
We will now add the variables height and weight to the dm recipe:
gen_hw <- function(n) {
  # BMI of at least 17, then derive weight from a uniformly drawn height
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.547805 49.27803
## 2 id-2 1.592881 51.36388
Note that we used random number generators in gen_hw, hence rerunning gen_table_data gives different values:
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.906048 69.70237
## 2 id-2 1.543742 44.50299
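If reproducible values are needed, one option is to fix the random seed before generating the data. The sketch below uses a base R stand-in for gen_hw (gen_hw_base is a hypothetical helper introduced here so that the example runs without respectables or dplyr) and shows that the same seed yields the same draws. Since the generating functions use R's global random number generator, the same approach presumably carries over to gen_table_data, but that is an assumption.

```r
# Base R stand-in for gen_hw (hypothetical helper, not part of respectables)
gen_hw_base <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  height <- runif(n, min = 1.5, max = 1.95)
  data.frame(height = height, weight = bmi * height^2)
}

set.seed(123)
first <- gen_hw_base(2)

set.seed(123)
second <- gen_hw_base(2)

identical(first, second)  # TRUE: same seed, same draws
```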
We will now continue our dm example by defining the variable age, which for illustrative purposes depends on height.
gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height * 25)
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight age
## 1 id-1 1.568570 60.02290 39.21426
## 2 id-2 1.658362 55.46995 41.45905
Note that respectables creates the arguments n and .df on the fly. Also, respectables determines the evaluation order of the variables from the dependency structure. That is, respectables does not guarantee that the resulting data frame is built row by row in the order given in the recipe.
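To illustrate how an evaluation order can be derived from the dependencies column, here is a minimal base R sketch. It is not the respectables implementation: resolve_order is a hypothetical helper that repeatedly picks the recipe rows whose dependencies have already been created.

```r
# Hypothetical helper: order recipe rows so that every row comes after
# the rows that create its dependencies (not the respectables internals).
resolve_order <- function(variables, dependencies) {
  done <- character(0)               # variables created so far
  ord <- integer(0)                  # row indices in evaluation order
  remaining <- seq_along(variables)
  while (length(remaining) > 0) {
    is_ready <- vapply(
      remaining,
      function(i) all(dependencies[[i]] %in% done),
      logical(1)
    )
    ready <- remaining[is_ready]
    if (length(ready) == 0) stop("circular or unmet dependencies")
    ord <- c(ord, ready)
    done <- c(done, unlist(variables[ready]))
    remaining <- setdiff(remaining, ready)
  }
  ord
}

# Rows deliberately listed out of order: age depends on height
variables    <- list("age", "id", c("height", "weight"))
dependencies <- list("height", character(0), character(0))

eval_order <- resolve_order(variables, dependencies)
eval_order  # 2 3 1
```

The age row is listed first but evaluated last, because height only becomes available once the third row has run.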
To make variable-generating functions configurable, we can specify their arguments in the recipe:
# The default is written as grDevices::colors() because a default of
# colors = colors() would refer to the argument itself and fail when used
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args,
  "color", no_deps, gen_color, list(colors = c("blue", "red"))
)
gen_table_data(N = 4, recipe = dm_recipe)
## id height weight age color
## 1 id-1 1.601608 48.29782 40.04019 blue
## 2 id-2 1.692776 55.71318 42.31939 red
## 3 id-3 1.928869 72.33984 48.22171 blue
## 4 id-4 1.938519 65.40852 48.46298 blue
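Conceptually, the entries of func_args are passed to func together with the arguments that respectables creates itself. A rough base R sketch of such a call, assuming do.call semantics (call_generator is a hypothetical helper, not a respectables function):

```r
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# Hypothetical helper: combine the framework-supplied n with the
# user-supplied arguments from the recipe
call_generator <- function(func, n, func_args = list()) {
  do.call(func, c(list(n = n), func_args))
}

res <- call_generator(
  gen_color,
  n = 4,
  func_args = list(colors = c("blue", "red"))
)
nrow(res)                             # 4
all(res$color %in% c("blue", "red"))  # TRUE
```

With an empty func_args list the function falls back to its own defaults, which is the role no_args appears to play in the recipes above.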
The miss_recipe argument of gen_table_data can be used to inject missing values as the last step of data creation. That is, the data generation recipe is executed first and the missing data is injected afterwards. Hence, all variables are available at execution time and the .df argument is supplied to func.
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}
dm_na_recipe <- tribble(
  ~variables, ~func, ~func_args,
  "age", gen_alternate_na, no_args
)
gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)
## id height weight age color
## 1 id-1 1.685651 58.74241 NA red
## 2 id-2 1.768599 57.62706 44.21496 blue
## 3 id-3 1.570489 48.85073 NA blue
## 4 id-4 1.811559 61.47564 45.28899 red
Note that this currently only works with one variable per row in the missing recipe. We are still working on this feature to allow more complex missing-data structures to be defined.
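The output above suggests that the logical vector returned by the function marks the rows to be set to NA (TRUE means missing). Under that assumption, the injection step can be sketched in base R as follows; inject_na is a hypothetical helper and not the package's implementation.

```r
gen_alternate_na <- function(.df) {
  rep(c(TRUE, FALSE), length.out = nrow(.df))
}

# Hypothetical helper: set `variable` to NA wherever the mask is TRUE
inject_na <- function(.df, variable, func) {
  mask <- func(.df)
  stopifnot(is.logical(mask), length(mask) == nrow(.df))
  .df[[variable]][mask] <- NA
  .df
}

dm_example <- data.frame(id = paste0("id-", 1:4), age = c(40, 42, 48, 45))
dm_na <- inject_na(dm_example, "age", gen_alternate_na)
dm_na$age  # NA 42 NA 45
```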
For this example we create a data frame aseq with the variable seqterm taking the values c("step 1", ..., "step i"), where i is extracted from the variable id.
dm <- gen_table_data(N = 3, recipe = dm_recipe)
# grow dataset: id-i contributes i rows, numbered 1 to i
gen_seq <- function(.db) {
  dm <- .db$dm
  ni <- as.numeric(substring(dm$id, 4))
  df_grow <- data.frame(
    id = rep(dm$id, ni),
    seq = sequence(ni)
  )
  left_join(dm, df_grow, by = "id")
}
aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func, ~func_args,
  "dm", "id", gen_seq, no_args
)
gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}
aseq_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "seqterm", "seq", gen_seq_term, no_args
)
gen_reljoin_table(
  joinrec = aseq_scf_recipe,
  tblrec = aseq_recipe,
  db = list(dm = dm)
)
## id height weight age color seq seqterm
## 1 id-1 1.700983 67.94144 42.52457 red 1 step 1
## 2 id-2 1.736638 79.05012 43.41595 red 1 step 1
## 3 id-2 1.736638 79.05012 43.41595 red 2 step 2
## 4 id-3 1.770105 61.37452 44.25262 blue 1 step 1
## 5 id-3 1.770105 61.37452 44.25262 blue 2 step 2
## 6 id-3 1.770105 61.37452 44.25262 blue 3 step 3
The steps here are:
1. Use joinrec to grow a new data frame, say A, possibly from db.
2. Call gen_table_data with the following arguments:
   - A for df
   - tblrec for recipe
   - miss_recipe
Note that this functionality is under development. Currently, aseq_scf_recipe needs to be a tibble with one row, and foreign_key is not yet used.
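The two steps can be mimicked in base R to make the flow concrete. This sketch is only an illustration under simplifying assumptions (a single join recipe row, no missing-data step) and does not call respectables; grow_by_id and add_seqterm are hypothetical names.

```r
# Parent table, standing in for db$dm
dm_parent <- data.frame(id = paste0("id-", 1:3), stringsAsFactors = FALSE)

# Step 1: grow a new data frame A from the parent table
grow_by_id <- function(db) {
  dm <- db$dm
  ni <- as.numeric(substring(dm$id, 4))  # "id-3" contributes 3 rows
  grown <- data.frame(id = rep(dm$id, ni), seq = sequence(ni))
  merge(dm, grown, by = "id")
}

# Step 2: add variables to A that depend on its existing columns
add_seqterm <- function(.df) {
  cbind(.df, seqterm = paste("step", .df$seq))
}

A <- grow_by_id(list(dm = dm_parent))
aseq <- add_seqterm(A)
aseq
```

Keeping the growing step separate from the variable-generating step means the second step can reuse the same kind of recipe function as in the earlier examples.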
dplyr
This section needs further work.
Let's map the following code into respectables recipes:
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
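A plain data-frame transformation that yields the output above could look like the following. This is an assumed reconstruction, written in base R so that it runs without extra packages; the dplyr form is given in the comment.

```r
# Base R version; with dplyr loaded, the equivalent pipeline would be
# iris %>% mutate(SPECIES = toupper(Species)) %>% head()
iris_upper <- transform(iris, SPECIES = toupper(Species))
head(iris_upper)
```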
There are multiple ways to map this to the respectables framework.
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}
rcp <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "SPECIES", "Species", gen_toupper, list(varname = "Species")
)
gen_table_data(recipe = rcp, df = iris) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
Note that in gen_toupper we use the ellipsis ... to absorb unused arguments such as n.
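The effect of the ellipsis can be seen in isolation with base R. In this sketch the caller always passes n and .df, mimicking what respectables is described as doing above; the exact call made by the package is an assumption, and gen_toupper_strict is a hypothetical variant added for comparison.

```r
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

# Same function without the ellipsis (hypothetical, for comparison)
gen_toupper_strict <- function(varname, .df) {
  toupper(.df[[varname]])
}

args <- list(n = nrow(iris), .df = iris, varname = "Species")

# n is silently absorbed by ...
with_dots <- head(do.call(gen_toupper, args), 3)
with_dots

# Without ..., the extra argument n triggers an "unused argument" error
without_dots <- tryCatch(
  do.call(gen_toupper_strict, args),
  error = function(e) "error"
)
without_dots
```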