---
title: "Reformatting"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Reformatting}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(dunlin)
```
## Introduction
Reformatting in `dunlin` consists in replacing predetermined values by another in particular variables for selected
tables of a data set stored.
This is performed in two steps:
1. A Reformatting Map (`rule` object) is created which specifies the correspondence between the old and the new values
2. The reformatting itself is performed with the `reformat()` function.
## The Formatting Map Structure
The Reformatting Map is a `rule` object inheriting from `character`.
Its names are the new values to be used, and its values are the old values to be used.
```{r}
rule(A = "a", B = c("c", "d"))
```
This rule will replace "a" with "A", replace "c" or "d" with "B".
## Calling `reformat`
`reformat` is a generic supports reformatting of `character` or `factor`. Reformatting for
other types of variables is meaningless. `reformat` will also preserve the attributes of the
original data, e.g. the data type or labels will be unchanged.
An example of reformatting `character` can be
```{r}
r <- rule(A = "a", B = c("c", "d"))
reformat(c("a", "c", "d", NA), r)
```
We can see that the `NA` values are not changed.
Now we test the factor reformatting:
```{r}
r <- rule(A = "a", B = c("c", "d"))
reformat(factor(c("a", "c", "d", NA)), r)
```
The `NA` values are also not changed.
However, if we including reformatting for the `NA`, there is something different:
```{r}
r <- rule(A = "a", C = NA, B = c("c", "d"))
reformat(factor(c("a", "c", "d", NA)), r)
```
By default, the level replacing `NA` is set as the last one. This can be changed by setting `.na_last = FALSE`.
```{r}
r <- rule(A = "a", C = NA, B = c("c", "d"))
reformat(factor(c("a", "c", "d", NA)), r, .na_last = FALSE)
```
For `list` of `data.frames`, the `format` argument is actually a nested list of rule.
The first layer indicates the table names, the second layer indicates the variables in that table.
Reformatting is only available for columns of characters or factors, reformatting columns of another types will result in a warning.
#### Example
```{r}
df1 <- data.frame(
"char" = c("", "b", NA, "a", "k", "x"),
"fact" = factor(c("f1", "f2", NA, NA, "f1", "f1"), levels = c("f2", "f1")),
"logi" = c(NA, FALSE, TRUE, NA, FALSE, NA)
)
df2 <- data.frame(
"char" = c("a", "b", NA, "a", "k", "x"),
"fact" = factor(c("f1", "f2", NA, NA, "f1", "f1"))
)
db <- list(df1 = df1, df2 = df2)
attr(db$df1$char, "label") <- "my label"
rule_map <- list(
df1 = list(
char = rule("Empty" = "", "B" = "b", "Not Available" = NA),
fact = rule("F1" = "f1"),
logi = rule()
),
df2 = list(
char = rule("Empty" = "", "A" = "a", "Not Available" = NA)
)
)
res <- reformat(db, rule_map, .na_last = TRUE)
res
```
## Rule Attributes
The behavior of a rule can be further refined using special mapping values.
* `.to_NA` convert the specified character to `NA` at the end of the process.
```{r}
r <- rule(A = "a", B = c("c", "d"), .to_NA = c("x"))
reformat(c("a", "c", "d", NA, "x"), r)
```
* `.drop` specifies whether unused levels should be dropped.
```{r}
# With drop = FALSE
obj <- factor(c("a", "c", "d", NA), levels = c("d", "c", "a", "Not used"))
r <- rule(A = "a", B = c("c", "d"))
reformat(obj, r)
# With drop = TRUE
obj <- factor(c("a", "c", "d", NA), levels = c("d", "c", "a", "Not used"))
r <- rule(A = "a", B = c("c", "d"), .drop = TRUE)
reformat(obj, r)
```
Note that behavior of the rule can be overridden using the corresponding arguments in `reformat`.
```{r}
r <- rule(A = "a", B = c("c", "d"), .to_NA = c("x"), .drop = TRUE)
obj <- factor(c("a", "c", "d", NA, "x", "y"), levels = c("d", "c", "a", "Not used", "x", "y"))
reformat(obj, r)
# Override
reformat(obj, r, .to_NA = "y", .drop = FALSE)
```