The respectables package provides a framework to generate synthetic data from recipes.
For this vignette we load the respectables and dplyr packages:
library(respectables)
library(dplyr)
Note that the respectables package is still under development.
Let's start by defining a simple dataset dm with a single variable id.
gen_id <- function(n) {
  paste0("id-", 1:n)
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
Note that the argument n is supplied by respectables; in this case it is equal to N.
We can reuse the recipe dm_recipe to create a different dataset:
gen_table_data(N = 5, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
## 3 id-3
## 4 id-4
## 5 id-5
We will now add the variables height and weight to the dm recipe:
gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.943792 79.91244
## 2 id-2 1.791271 67.27335
Note that we used random number generators in gen_hw, hence rerunning gen_table_data will give different values:
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.600315 53.34039
## 2 id-2 1.591516 49.22176
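If reproducible output is needed, one option is to fix the seed of R's random number generator before generating data. The sketch below uses only base R and a simplified stand-in for gen_hw (the name gen_hw_base is hypothetical). It assumes, without having verified it, that gen_table_data draws from the same global random number stream.

```r
# Simplified base-R stand-in for gen_hw (hypothetical helper, no dplyr needed)
gen_hw_base <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  height <- runif(n, min = 1.5, max = 1.95)
  data.frame(height = height, weight = bmi * height^2)
}

# Setting the same seed before each call yields the same random draws
set.seed(123)
first <- gen_hw_base(2)

set.seed(123)
second <- gen_hw_base(2)

identical(first, second)
```

Without the calls to set.seed, the two data frames would almost surely differ, as in the output above.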
We now continue our dm example by defining the variable age which, for illustrative purposes, depends on height.
gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height * 25)
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight age
## 1 id-1 1.853473 59.02684 46.33681
## 2 id-2 1.557912 56.24207 38.94780
Note that respectables creates the arguments n and .df on the fly. Also, respectables determines the evaluation order of the variables from the dependency structure. That is, respectables does not guarantee that the resulting data frame is built row by row in the order given in the recipe.
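To illustrate what a dependency-based evaluation order means, here is a minimal base-R sketch. It is not the implementation used by respectables. The helper resolve_order is hypothetical, and it simplifies the recipe to one variable per entry.

```r
# Hypothetical helper: order variables so that each comes after its dependencies.
# `deps` is a named list: names are variables, values are their dependencies.
resolve_order <- function(deps) {
  done <- character(0)
  todo <- names(deps)
  while (length(todo) > 0) {
    # a variable is ready once all of its dependencies have been generated
    ready <- todo[vapply(todo, function(v) all(deps[[v]] %in% done), logical(1))]
    if (length(ready) == 0) stop("circular or unsatisfiable dependencies")
    done <- c(done, ready)
    todo <- setdiff(todo, ready)
  }
  done
}

# "age" is listed first but depends on "height", so it is scheduled last
deps <- list(age = "height", id = character(0), height = character(0))
resolve_order(deps)
```

In this sketch the position of a row in the recipe only matters among variables whose dependencies are already satisfied.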
To make variable-generating functions configurable, we can specify their arguments in the recipe:
gen_color <- function(n, colors = grDevices::colors()) {
  # the default is namespaced: `colors = colors()` would be a recursive default argument
  data.frame(color = sample(colors, n, replace = TRUE))
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args,
  "color", no_deps, gen_color, list(colors = c("blue", "red"))
)
gen_table_data(N = 4, recipe = dm_recipe)
## id height weight age color
## 1 id-1 1.853894 70.14849 46.34736 red
## 2 id-2 1.866296 65.30558 46.65740 red
## 3 id-3 1.807959 58.55209 45.19899 red
## 4 id-4 1.604780 50.24193 40.11949 blue
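One plausible way to think about func_args is that its entries are appended to the arguments the framework supplies (such as n) before the function is called. The base-R sketch below shows that idea with do.call. The assembly of the call is an assumption for illustration, and gen_color_sketch is a hypothetical name; the internals of respectables may differ.

```r
# Base-R sketch: combine a framework-supplied `n` with user-supplied arguments
gen_color_sketch <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

func_args <- list(colors = c("blue", "red"))  # as it would appear in the recipe
N <- 4

# Assumption: the call is assembled roughly like this
out <- do.call(gen_color_sketch, c(list(n = N), func_args))
out
```

With an empty argument list, the function would fall back to its default and sample from all colour names known to R.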
The miss_recipe argument of gen_table_data can be used to inject missing values as the last step of data creation. That is, the data generation recipe is executed first and the missing data is injected afterwards. Hence, all variables are available at execution time and the .df argument is supplied to func.
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}

dm_na_recipe <- tribble(
  ~variables, ~func, ~func_args,
  "age", gen_alternate_na, no_args
)
gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)
## id height weight age color
## 1 id-1 1.828758 72.83832 NA red
## 2 id-2 1.807926 67.21645 45.19816 blue
## 3 id-3 1.540891 41.21060 NA red
## 4 id-4 1.626210 68.30714 40.65525 blue
Note that this currently works only with one variable per row in the missing recipe. We are still working on this feature to allow the definition of more complex missing-data structures.
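The effect of a missing recipe row can be mimicked in base R: the function returns a logical vector, and TRUE marks the values to be replaced by NA. The sketch below uses made-up toy data and assumes the mask is applied to the variable named in the recipe; it does not reproduce the internals of respectables.

```r
# Toy data standing in for the generated table (values are made up)
df <- data.frame(id = paste0("id-", 1:4), age = c(46, 45, 38, 40))

# Same masking rule as gen_alternate_na: TRUE marks a value to be set to NA
gen_alternate_na <- function(.df) {
  rep(c(TRUE, FALSE), length.out = nrow(.df))
}

mask <- gen_alternate_na(df)
df$age[mask] <- NA  # assumption: the mask is applied to the recipe's variable
df
```

Rows 1 and 3 end up with a missing age, which matches the alternating pattern in the output above.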
For this example we create a data frame aseq with the variable seqterm taking the values c("step 1", ..., "step i"), where i is extracted from the variable id.
dm <- gen_table_data(N = 3, recipe = dm_recipe)

# grow dataset
gen_seq <- function(.db) {
  dm <- .db$dm
  ni <- as.numeric(substring(dm$id, 4))
  df_grow <- data.frame(
    id = rep(dm$id, ni),
    seq = unlist(sapply(ni, seq, from = 1))
  )
  left_join(dm, df_grow, by = "id")
}

aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func, ~func_args,
  "dm", "id", gen_seq, no_args
)

gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}

aseq_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "seqterm", "seq", gen_seq_term, no_args
)

gen_reljoin_table(
  joinrec = aseq_scf_recipe,
  tblrec = aseq_recipe,
  db = list(dm = dm)
)
## id height weight age color seq seqterm
## 1 id-1 1.829745 69.46447 45.74364 red 1 step 1
## 2 id-2 1.872380 70.51194 46.80949 blue 1 step 1
## 3 id-2 1.872380 70.51194 46.80949 blue 2 step 2
## 4 id-3 1.806641 58.39891 45.16601 blue 1 step 1
## 5 id-3 1.806641 58.39891 45.16601 blue 2 step 2
## 6 id-3 1.806641 58.39891 45.16601 blue 3 step 3
The steps here are:
1. Use joinrec to grow a new data frame, say A, possibly from db.
2. Call gen_table_data with the following arguments:
   - A for df
   - tblrec for recipe
   - miss_recipe
Note that this functionality is under development. Currently, aseq_scf_recipe needs to be a tibble with one row, and foreign_key is not used.
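The two steps can be traced by hand in base R. This is only a sketch of the idea with a made-up parent table. It uses merge and sequence in place of the recipe functions and is not how gen_reljoin_table is implemented.

```r
# A small parent table (made-up values)
dm <- data.frame(id = paste0("id-", 1:3), stringsAsFactors = FALSE)

# Step 1: grow a new data frame A from dm (rows 1, ..., i for each "id-i")
ni <- as.numeric(substring(dm$id, 4))
grown <- data.frame(
  id = rep(dm$id, ni),
  seq = sequence(ni),
  stringsAsFactors = FALSE
)
A <- merge(dm, grown, by = "id")

# Step 2: derive the new variable from A, as the table recipe would
A$seqterm <- paste("step", A$seq)
A
```

The result has one row for id-1, two for id-2 and three for id-3, mirroring the structure of the output above.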
dplyr
This section needs further work.
Let's map the following code into respectables recipes:
iris %>%
  mutate(SPECIES = toupper(Species)) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
There are multiple ways to map this to the respectables framework; one of them is:
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

rcp <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "SPECIES", "Species", gen_toupper, list(varname = "Species")
)

gen_table_data(recipe = rcp, df = iris) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
Note that in gen_toupper we use the ellipsis ... to absorb unused arguments such as n.
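The role of the ellipsis can be checked in base R without respectables. In the sketch below, do.call stands in for the framework calling the function with an extra argument n, and gen_toupper_strict is a hypothetical variant without the ellipsis.

```r
# With `...`, extra arguments such as `n` are silently absorbed
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

args <- list(
  n = 3,
  .df = data.frame(Species = c("setosa", "virginica", "versicolor")),
  varname = "Species"
)
res <- do.call(gen_toupper, args)
res

# Without `...`, the same call fails because `n` is an unused argument
gen_toupper_strict <- function(varname, .df) {
  toupper(.df[[varname]])
}
err <- tryCatch(
  {
    do.call(gen_toupper_strict, args)
    NULL
  },
  error = function(e) e
)
inherits(err, "error")
```

Adding the ellipsis is therefore a simple way to keep a generating function usable when it does not need every argument it is handed.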