---
title: "Univariate Statistics Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Univariate Statistics Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(tidytlg)
```

## Introduction

The `univar` function is called to produce univariate-type summary statistics for numeric variables. A typical example of using the `univar` function is to create a `tbl` chunk as shown below for summarizing N, MEAN (SD), MEDIAN, RANGE, IQ Range for the `Age` variable in adsl.

```{r, message=FALSE}
tbl <- cdisc_adsl %>%
  univar(colvar = "TRT01PN",
         rowvar = "AGE",
         statlist = statlist(c("N", "MEANSD", "MEDIAN", "RANGE", "IQRANGE")),
         decimal = 0,
         row_header = "Age (Years)")

knitr::kable(tbl)
```


## Customizing Univariate Statistics

Besides the 5 standard univariate statistics shown above that are often required in the demographic tables, you can pick any univariate statistics from the table below and arrange them in a character vector for passing to the `statlist` argument.

| Statlist   |          Description          |
|------------|:-----------------------------:|
| N          | number of non-missing values  |
| SUM        |              sum              |
| MEAN       |             mean              |
| GeoMEAN    |        geometric mean         |
| SD         |      standard deviation       |
| SE         |        standard error         |
| CV         |   coefficient of variation    |
| GSD        | geometric standard deviation  |
| GSE        |   geometric standard error    |
| MEANSD     |   mean (standard deviation)   |
| MEANSE     |     mean (standard error)     |
| MEDIAN     |            median             |
| MIN        |            minimum            |
| MAX        |            maximum            |
| RANGE      |             range             |
| Q1         |         1st quartile          |
| Q3         |         3rd quartile          |
| IQRANGE    |     inter-quartile range      |
| MEDRANGE   |        median (range)         |
| MEDIQRANGE | median (inter-quartile range) |
| MEAN_CI    |        mean (95% C.I.)        |
| GeoMEAN_CI |   geometric mean (95% C.I.)   |

A customized example is shown below for displaying N, Mean (95% C.I.), and Geometric Mean (95% C.I.) for the `Age` variable in adsl.

```{r}
tbl <- cdisc_adsl %>%
  univar(colvar = "TRT01PN",
         rowvar = "AGE",
         statlist = statlist(c("N", "MEAN_CI", "GeoMEAN_CI")),
         decimal = 0,
         row_header = "Age (Years)")

knitr::kable(tbl)
```

## Decimal Precision

The decimal precision to be used in display of univariate statistics is comprised of two pieces.  The base decimal precision is what controls the base number of decimals to be used, this can be set using the `decimal` argument.  The precision extra is what controls the difference between the precision used for different statistics, this is controlled using the option `tidytlg.precision.extra`.  The precision extra is the amount precision will need to be adjusted from the base precision for each different statistic. The default of the precision extra is set by following our table and listing conventions: Range has a precision extra of 0, Mean and Median have a precision extra of 1, SD has a precision extra of 2. To see a full list of precision extra defaults, please type `options("tidytlg.precision.extra")` in your console. An example function call of `univar` is shown below for presenting the data using a base decimal value of 2.

```{r}
tbl <- cdisc_adsl %>%
  univar(colvar = "TRT01PN",
         rowvar = "BMIBL",
         decimal = 2,
         row_header = "Age (Years)")

knitr::kable(tbl)
```

## Data Driven Precision

While static precision is useful in some cases, data driven precision is also available.  This is controlled using the `precisionby`, `precisionon`, and `decimal` arguments.  `precisionby` tells the function the variable(s) the user would like to compute the precision using.  This could be variables such as PARAMCD if the precision is to be varied between parameter. `precisionon` is the variable that should be used when calculating how many base decimal places are present in the data.  The last piece to data drive precision is the `decimal` argument which gives us a cap for base precision values.  This can be used to help avoid unnecessarily long decimals in your final output.

A customized example is shown below for presenting the univariate summary of vital signs data using PARAMCD as the by variable. In addition, we would like the precision to be data driven and varied by parameter, which can be achieved by setting `precisionby = "PARAMCD"`.


```{r}
tbl <- cdisc_advs %>%
  univar(colvar = "TRTAN",
         rowvar = "AVAL",
         rowbyvar = "PARAMCD",
         precisionby = "PARAMCD",
         decimal = 4)

knitr::kable(tbl)
```

While data driven precision is usually done with a by variable it doesn't always have to.  The `precisionon` argument can be used to calculate data driven precision on a single variable.  This might be useful if a table template is going to be used multiple times or if multiple parts of the table are using a similar call but need to have different data driven precision.  The following example uses the variable CHG to calculate precision, similar to the above example we still use `decimal = 4` to cap our decimal spaces at 4.

```{r}
tbl <- cdisc_advs %>%
  filter(PARAMCD == "SYSBP") %>%
  univar(colvar = "TRTAN",
         rowvar = "CHG",
         precisionon = "CHG",
         decimal = 4)

knitr::kable(tbl)
```

Another use case for the `precisionon` argument could be if you need to calculate the summary on one variable but use another for precision for table output formatting.  The following example uses both `precisionby` and `precisionon` to show how they can be used together to make special tables.  For this table, we are creating an element of the table that summarizes AVAL but uses CHG to calculate precision.  This allows us to have consistent formatting throughout the table even though the two variables may have different precision.  We also calculate precision by PARAMCD since the output table will be presented using that as a by variable.

```{r}
tbl <- cdisc_advs %>%
  filter(PARAMCD == "SYSBP") %>%
  univar(colvar = "TRTAN",
         rowvar = "AVAL",
         rowbyvar = "PARAMCD",
         precisionby = "PARAMCD",
         precisionon = "CHG",
         decimal = 4)

knitr::kable(tbl)
```