The respectables package provides a framework to generate synthetic data frames from recipes.
For this vignette we load the respectables and dplyr packages:
library(respectables)
library(dplyr)
Note that the respectables package is still under development.
Let's start by defining a simple dataset dm with a single variable id.
# Generate n identifiers of the form "id-1", ..., "id-n"
gen_id <- function(n) {
  paste0("id-", seq_len(n))
}
dm_recipe <- tribble(
  ~variables, ~dependencies, ~func,  ~func_args,
  "id",       no_deps,       gen_id, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
Note that the argument n is defined by respectables; in this case it is equal to N.
We can use the recipe dm_recipe again to create a different dataset:
gen_table_data(N = 5, recipe = dm_recipe)
## id
## 1 id-1
## 2 id-2
## 3 id-3
## 4 id-4
## 5 id-5
We will now add the variables height and weight to the dm recipe:
# Draw a BMI-like value (at least 17) and a height, then derive weight = bmi * height^2
gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}
dm_recipe <- tribble(
  ~variables,            ~dependencies, ~func,  ~func_args,
  "id",                  no_deps,       gen_id, no_args,
  c("height", "weight"), no_deps,       gen_hw, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.835681 61.37383
## 2 id-2 1.584224 44.35310
Note that we used random number generators in gen_hw, hence rerunning gen_table_data will give different values:
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight
## 1 id-1 1.704511 69.89425
## 2 id-2 1.673766 47.69598
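If reproducible values are needed, R's usual approach is to fix the random seed before generating. The sketch below uses a base R stand-in for gen_hw (the name gen_hw_base is hypothetical) and does not call gen_table_data itself, so whether the package draws additional random numbers internally is not covered here.

```r
# Base R stand-in for the gen_hw generator from the text (hypothetical name).
gen_hw_base <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  height <- runif(n, min = 1.5, max = 1.95)
  data.frame(height = height, weight = bmi * height^2)
}

# Resetting the seed before each call reproduces the same draws.
set.seed(123)
first <- gen_hw_base(2)
set.seed(123)
second <- gen_hw_base(2)
identical(first, second)  # TRUE

# Without resetting the seed, the next call continues the random stream.
third <- gen_hw_base(2)
```

Calling set.seed once at the top of a script is usually enough to make the whole run repeatable.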
We will now continue our dm example by defining the variable age, which for illustrative purposes depends on height.
gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height * 25)
}
dm_recipe <- tribble(
  ~variables,            ~dependencies, ~func,   ~func_args,
  "id",                  no_deps,       gen_id,  no_args,
  c("height", "weight"), no_deps,       gen_hw,  no_args,
  "age",                 "height",      gen_age, no_args
)
gen_table_data(N = 2, recipe = dm_recipe)
## id height weight age
## 1 id-1 1.668165 48.96687 41.70413
## 2 id-2 1.547495 43.62934 38.68739
Note that respectables creates the arguments n and .df on the fly. Also, respectables determines the evaluation order of the variables based on the dependency structure. That is, respectables does not guarantee that the resulting data frame is built in the row order of the recipe.
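To illustrate what dependency-based ordering means, here is a minimal base R sketch that resolves an evaluation order from variables and their dependencies. It is not the algorithm respectables uses internally; the function resolve_order and the two input lists are hypothetical and only mirror the shape of the variables and dependencies columns.

```r
# Hypothetical recipe columns, deliberately listed with "age" first.
recipe_vars <- list("age", "id", c("height", "weight"))
recipe_deps <- list("height", character(0), character(0))

# Repeatedly pick every row whose dependencies have already been generated.
resolve_order <- function(vars, deps) {
  done <- character(0)
  eval_order <- integer(0)
  remaining <- seq_along(vars)
  while (length(remaining) > 0) {
    is_ready <- vapply(
      remaining,
      function(i) all(deps[[i]] %in% done),
      logical(1)
    )
    ready <- remaining[is_ready]
    if (length(ready) == 0) stop("circular or unmet dependencies")
    eval_order <- c(eval_order, ready)
    done <- c(done, unlist(vars[ready]))
    remaining <- setdiff(remaining, ready)
  }
  eval_order
}

resolve_order(recipe_vars, recipe_deps)  # 2 3 1
```

Row 1 (age) is evaluated last even though it comes first, because height must exist before age can be computed.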
To make variable-generating functions configurable, we can specify their arguments in the recipe:
# The default is namespaced: a default of colors = colors() would refer to itself
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}
dm_recipe <- tribble(
  ~variables,            ~dependencies, ~func,     ~func_args,
  "id",                  no_deps,       gen_id,    no_args,
  c("height", "weight"), no_deps,       gen_hw,    no_args,
  "age",                 "height",      gen_age,   no_args,
  "color",               no_deps,       gen_color, list(colors = c("blue", "red"))
)
gen_table_data(N = 4, recipe = dm_recipe)
## id height weight age color
## 1 id-1 1.766169 59.86787 44.15423 red
## 2 id-2 1.918571 86.07405 47.96427 blue
## 3 id-3 1.548031 50.23885 38.70077 blue
## 4 id-4 1.697221 54.30586 42.43053 red
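One way to picture how the func_args entry reaches the generating function is a do.call that merges the framework-supplied n with the user-supplied arguments. This is a sketch under that assumption, not a description of the respectables internals.

```r
# Same idea as gen_color in the text, with a namespaced default.
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# Arguments as they would appear in the func_args column of the recipe.
func_args <- list(colors = c("blue", "red"))

# Assumed calling convention: n comes from the framework, the rest from func_args.
out <- do.call(gen_color, c(list(n = 4), func_args))
out
```

With no_args in the recipe, the same call would fall back to the function's default colour set.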
The miss_recipe argument in gen_table_data can be used to inject missing values as the last step when creating data with gen_table_data. That is, the data generation recipe is executed first and the missing values are injected afterwards. Hence, all variables are available at execution time and the .df argument is supplied to func.
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}
dm_na_recipe <- tribble(
  ~variables, ~func,            ~func_args,
  "age",      gen_alternate_na, no_args
)
gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)
## id height weight age color
## 1 id-1 1.913277 73.47471 NA red
## 2 id-2 1.676949 60.91223 41.92373 blue
## 3 id-3 1.561641 51.86744 NA red
## 4 id-4 1.756622 81.29693 43.91554 blue
Note that this currently works only with one variable per row in the missing recipe. We are still working on this feature to allow more complex missing-data structures.
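The injection step can be pictured as applying a logical mask to one column of an already generated data frame. The base R sketch below assumes, based on the output above, that TRUE marks a value to be replaced by NA; the small data frame is made up for illustration.

```r
# Made-up data standing in for an already generated table.
df <- data.frame(id = paste0("id-", 1:4), age = c(41, 42, 43, 44))

# Same mask generator as in the text: TRUE, FALSE, TRUE, FALSE, ...
gen_alternate_na <- function(.df) {
  rep(c(TRUE, FALSE), length.out = nrow(.df))
}

# Assumed semantics: positions where the mask is TRUE become NA.
mask <- gen_alternate_na(df)
df$age[mask] <- NA
df$age  # NA 42 NA 44
```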
For this example we create a data frame aseq with the variable seqterm being c("step 1", ..., "step i"), where i is extracted from the variable id.
dm <- gen_table_data(N = 3, recipe = dm_recipe)
# grow dataset
gen_seq <- function(.db) {
  dm <- .db$dm
  ni <- as.numeric(substring(dm$id, 4))
  df_grow <- data.frame(
    id = rep(dm$id, ni),
    seq = unlist(sapply(ni, seq, from = 1))
  )
  left_join(dm, df_grow, by = "id")
}
aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func,   ~func_args,
  "dm",         "id",         gen_seq, no_args
)
gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}
aseq_recipe <- tribble(
  ~variables, ~dependencies, ~func,        ~func_args,
  "seqterm",  "seq",         gen_seq_term, no_args
)
gen_reljoin_table(joinrec = aseq_scf_recipe, tblrec = aseq_recipe, db = list(dm = dm))
## id height weight age color seq seqterm
## 1 id-1 1.874792 62.68412 46.86981 blue 1 step 1
## 2 id-2 1.777130 53.97085 44.42826 blue 1 step 1
## 3 id-2 1.777130 53.97085 44.42826 blue 2 step 2
## 4 id-3 1.702817 75.99336 42.57044 blue 1 step 1
## 5 id-3 1.702817 75.99336 42.57044 blue 2 step 2
## 6 id-3 1.702817 75.99336 42.57044 blue 3 step 3
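The row-growing logic in gen_seq can be tried in isolation with base R only. The sketch below uses a hard-coded id vector instead of the dm table and sequence() in place of the sapply call; both are choices made for this illustration.

```r
# Ids as produced by gen_id for N = 3.
ids <- paste0("id-", 1:3)

# The number after "id-" decides how many rows each id receives.
ni <- as.numeric(substring(ids, 4))

# Repeat each id ni times and number the repeats 1..ni within each id.
grown <- data.frame(id = rep(ids, ni), seq = sequence(ni))
grown$seqterm <- paste("step", grown$seq)
grown
```

This yields six rows: one for id-1, two for id-2 and three for id-3, matching the shape of the output above.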
The steps here are:
1. Use joinrec to grow a new data frame, say A, possibly from db.
2. Call gen_table_data with the following arguments:
   - A for df
   - tblrec for recipe
   - miss_recipe
Note that this functionality is under development. Currently, aseq_scf_recipe needs to be a tibble with one row, and foreign_key is not used.
dplyr
This section needs further work.
Let's map the following code into respectables recipes:
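Judging from the output that follows, the dplyr code in question is presumably along these lines. This is a reconstruction from that output, not the original chunk; it uses only dplyr's mutate and base R's toupper and head.

```r
library(dplyr)

# Add an upper-case copy of Species and show the first six rows.
result <- iris %>%
  mutate(SPECIES = toupper(Species)) %>%
  head()
result
```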
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
There are multiple ways to map this to the respectables framework.
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}
rcp <- tribble(
  ~variables, ~dependencies, ~func,       ~func_args,
  "SPECIES",  "Species",     gen_toupper, list(varname = "Species")
)
gen_table_data(recipe = rcp, df = iris) %>%
  head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1 5.1 3.5 1.4 0.2 setosa SETOSA
## 2 4.9 3.0 1.4 0.2 setosa SETOSA
## 3 4.7 3.2 1.3 0.2 setosa SETOSA
## 4 4.6 3.1 1.5 0.2 setosa SETOSA
## 5 5.0 3.6 1.4 0.2 setosa SETOSA
## 6 5.4 3.9 1.7 0.4 setosa SETOSA
Note that in gen_toupper we use the ellipsis ... to absorb unused arguments such as n.
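The effect of the ellipsis can be checked without the package. The sketch below assumes the framework passes n alongside the other arguments, roughly as a do.call would; strict_toupper is a hypothetical variant without ... added for contrast.

```r
# With ...: extra arguments such as n are silently ignored.
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

# Without ...: any argument the function does not declare is an error.
strict_toupper <- function(varname, .df) {
  toupper(.df[[varname]])
}

# Assumed argument set: the recipe's func_args plus .df and n.
args <- list(varname = "Species", .df = head(iris, 3), n = 3)

with_dots <- do.call(gen_toupper, args)
strict_result <- tryCatch(do.call(strict_toupper, args), error = function(e) e)

with_dots                        # "SETOSA" "SETOSA" "SETOSA"
inherits(strict_result, "error") # TRUE
```

Declaring ... therefore lets a generating function accept only the arguments it needs while remaining callable with the full set.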