Package 'hermes' reference manual

Title:	Preprocessing, analyzing, and reporting of RNA-seq data
Description:	Provides classes and functions for quality control, filtering, normalization and differential expression analysis of pre-processed `RNA-seq` data. Data can be imported from `SummarizedExperiment` as well as `matrix` objects and can be annotated from `BioMart`. Genes can be filtered to exclude those with too low expression or without the required annotations, and samples can be filtered for sufficient correlation with other samples or a sufficient total number of reads. The standard normalization methods including `cpm`, `rpkm` and `tpm` can be used, and `DESeq2` as well as `voom` differential expression analyses are available.
Authors:	Daniel Sabanés Bové [aut, cre], Namrata Bhatia [aut], Stefanie Bienert [aut], Benoit Falquet [aut], Haocheng Li [aut], Jeff Luong [aut], Lyndsee Midori Zhang [aut], Alex Richardson [aut], Simona Rossomanno [aut], Tim Treis [aut], Mark Yan [aut], Naomi Chang [aut], Chendi Liao [aut], Carolyn Zhang [aut], Joseph N. Paulson [aut], F. Hoffmann-La Roche AG [cph, fnd]
Maintainer:	Daniel Sabanés Bové <[email protected]>
License:	Apache License 2.0
Version:	1.7.2.9005
Built:	2025-03-25 06:08:43 UTC
Source:	https://github.com/insightsengineering/hermes

`hermes` Package

Description

hermes facilitates preprocessing, analyzing, and reporting of RNA-seq data.

Author(s)

Maintainer: Daniel Sabanés Bové [email protected]

Authors:

Namrata Bhatia
Stefanie Bienert
Benoit Falquet
Haocheng Li
Jeff Luong
Lyndsee Midori Zhang [email protected]
Alex Richardson
Simona Rossomanno
Tim Treis
Mark Yan
Naomi Chang
Chendi Liao [email protected]
Carolyn Zhang
Joseph N. Paulson [email protected]

Other contributors:

F. Hoffmann-La Roche AG [copyright holder, funder]

Add Quality Flags

Description

The function add_quality_flags() adds quality flag information to an AnyHermesData object:

low_expression_flag: for each gene, counts how many samples don't pass a minimum expression Counts per Million (CPM) threshold. If too many, then it flags this gene as a "low expression" gene.
tech_failure_flag: first calculates the Pearson correlation matrix of the sample-wise CPM values, resulting in a matrix measuring the correlation between samples. Then compares the average correlation per sample with a threshold: if it is too low, the sample is flagged as a "technical failure" sample.
low_depth_flag: computes the library size (total number of counts) per sample. If this number is too low, the sample is flagged as a "low depth" sample.

Separate helper functions are internally used to create the flags, and separate getter functions allow easy access to the quality control flags in an object.

Usage

add_quality_flags(object, control = control_quality(), overwrite = FALSE)

h_low_expression_flag(object, control = control_quality())

h_low_depth_flag(object, control = control_quality())

h_tech_failure_flag(object, control = control_quality())

get_tech_failure(object)

get_low_depth(object)

get_low_expression(object)

Arguments

`object`	(`AnyHermesData`) input.
`control`	(`list`) list of settings (thresholds etc.) used to compute the quality control flags, produced by `control_quality()`.
`overwrite`	(`flag`) whether previously added flags may be overwritten.

Details

While object already has the variables mentioned above as part of the rowData and colData (as this is enforced by the validation method for AnyHermesData), they are usually still NA after the initial object creation.
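
As a rough illustration of the three flag definitions in the Description, the following base R sketch recomputes them on a toy counts matrix. It is not the package's implementation: the matrix, the threshold values, the comparison directions at the boundaries, and the inclusion of the diagonal in the average correlation are all assumptions made for illustration.

```r
# Toy counts matrix (hypothetical data): 4 genes in rows, 5 samples in columns.
counts <- matrix(
  c(500, 550, 520, 480,  0,
    300, 330, 310, 290,  0,
    200, 220, 220, 180, 10,
      0,   0,   0,   0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("s", 1:5))
)

# Assumed thresholds, named after the control_quality() arguments.
min_cpm <- 1
min_cpm_prop <- 0.25
min_corr <- 0.5
min_depth <- 100

# Library size per sample and Counts per Million (CPM).
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Low expression: too few samples reach the minimum CPM for this gene.
low_expression_flag <- rowMeans(cpm >= min_cpm) < min_cpm_prop

# Technical failure: the average Pearson correlation of a sample is too low.
tech_failure_flag <- colMeans(cor(cpm, method = "pearson")) < min_corr

# Low depth: the library size is below the minimum depth.
low_depth_flag <- lib_size < min_depth

low_expression_flag
tech_failure_flag
low_depth_flag
```

In this toy data the gene g4 has no counts at all and the sample s5 has a very small library dominated by one gene, so these are the candidates for flagging.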

Value

The input object with added quality flags.

Functions

h_low_expression_flag(): creates the low expression flag for genes given control settings.
h_low_depth_flag(): creates the low depth (library size) flag for samples given control settings.
h_tech_failure_flag(): creates the technical failure flag for samples given control settings.
get_tech_failure(): get the technical failure flags for all samples.
get_low_depth(): get the low depth failure flags for all samples.
get_low_expression(): get the low expression failure flags for all genes.

Examples

# Adding default quality flags to an `AnyHermesData` object.
object <- hermes_data
result <- add_quality_flags(object)
which(get_tech_failure(result) != get_tech_failure(object))
head(get_low_expression(result))
head(get_tech_failure(result))
head(get_low_depth(result))

# It is possible to overwrite flags if needed, which will trigger a message.
result2 <- add_quality_flags(result, control_quality(min_cpm = 1000), overwrite = TRUE)

# Separate calculation of low expression flag.
low_expr_flag <- h_low_expression_flag(
  object,
  control_quality(min_cpm = 500, min_cpm_prop = 0.9)
)
length(low_expr_flag) == nrow(object)
head(low_expr_flag)

# Separate calculation of low depth flag.
low_depth_flag <- h_low_depth_flag(object, control_quality(min_depth = 5))
length(low_depth_flag) == ncol(object)
head(low_depth_flag)

# Separate calculation of technical failure flag.
tech_failure_flag <- h_tech_failure_flag(object, control_quality(min_corr = 0.35))
length(tech_failure_flag) == ncol(object)
head(tech_failure_flag)
head(get_tech_failure(object))
head(get_low_depth(object))
head(get_low_expression(object))

Checks Whether All Missing

Description

Internal function to check whether a whole vector is NA.

Usage

all_na(x)

Arguments

`x`	(`vector`) vector to check.

Value

Corresponding flag.

Annotation Accessor and Setter

Description

These methods access and set the gene annotations stored in an AnyHermesData object.

Usage

## S4 method for signature 'AnyHermesData'
annotation(object, ...)

.row_data_annotation_cols

## S4 replacement method for signature 'AnyHermesData,DataFrame'
annotation(object) <- value

Arguments

`object`	(`AnyHermesData`) object to access the annotations from.
`...`	not used.
`value`	(`DataFrame`) what should the annotations be replaced with.

Format

The annotation column names are available in the exported character vector .row_data_annotation_cols.

Value

The S4Vectors::DataFrame with the gene annotations:

symbol
desc
chromosome
size

Note

When trying to replace the required annotations with completely missing values for any genes, a warning will be given and the corresponding gene IDs will be saved in the attribute annotation.missing.genes. Note also that additional annotations beyond the required ones may be supplied and will be stored.

Examples

object <- hermes_data
head(annotation(object))

Standard Assertion Arguments

Description

The documentation for this function lists all the conventional arguments used in the additional checkmate assertions.

Arguments

`x`	an object to check.
`null.ok`	(`flag`) whether `x` may also be `NULL`.
`.var.name`	(`string`) name of the checked object to print in assertions; defaults to the heuristic implemented in `checkmate::vname()`.
`add`	(`AssertCollection` or `NULL`) collection to store assertion messages, see `checkmate::AssertCollection`.
`info`	(`string`) extra information to be included in the message for the `testthat` reporter, see `testthat::expect_that()`.
`label`	(`string`) name of the checked object to print in messages. Defaults to the heuristic implemented in `checkmate::vname()`.

Additional Assertions for `assert_that`

Description

We provide additional assertion functions which can be used together with assertthat::assert_that().

We provide additional assertion functions which can be used together with the checkmate functions. These are described in individual help pages linked below.

Usage

is_class(x, class2)

is_hermes_data(x)

is_counts_vector(x)

is_list_with(x, elements)

one_provided(one, two)

is_constant(x)

Arguments

`x`	an object to check.
`class2`	(`character` or class definition) the class to which `x` could belong.
`elements`	(`character`) names of elements which should be in the list `x`.
`one`	first input.
`two`	second input.

Value

Depending on the function prefix.

assert_ functions return the object invisibly if successful, and otherwise throw an error message.
check_ functions return TRUE if successful, otherwise a string with the error message.
test_ functions just return TRUE or FALSE.

Functions

is_class(): checks the class.
is_hermes_data(): checks whether x is an AnyHermesData object.
is_counts_vector(): checks for a vector of counts (positive integers).
is_list_with(): checks for a list containing elements.
one_provided(): checks that exactly one of the two inputs one, two is not NULL.
is_constant(): checks whether the vector x is constant (only supports numeric, factor, character, logical). NAs are removed first.
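
To make two of the less obvious checks above concrete, here is a minimal base R sketch of the logic described for one_provided() and is_constant(). The function names with the `_sketch` suffix are hypothetical stand-ins, not the hermes functions, and the handling of a vector that contains only NA values is an assumption.

```r
# Hypothetical stand-ins, not the hermes functions themselves.
one_provided_sketch <- function(one, two) {
  # Exactly one of the two inputs is non-NULL.
  xor(is.null(one), is.null(two))
}

is_constant_sketch <- function(x) {
  # Drop missing values first, then check for a single distinct value.
  # Assumption: a vector with no non-missing values is not treated as constant.
  x <- x[!is.na(x)]
  length(unique(x)) == 1L
}

one_provided_sketch(10, 10)
one_provided_sketch(10, NULL)
is_constant_sketch(c(1, 2))
is_constant_sketch(c(NA, 1))
is_constant_sketch(factor(c("a", "a")))
```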

Examples

# Assert a general class.
a <- 5
is_class(a, "character")

# Assert an `AnyHermesData` object.
is_hermes_data(hermes_data)
is_hermes_data(42)

# Assert a counts vector.
a <- 5L
is_counts_vector(a)

# Assert a list containing certain elements.
b <- list(a = 5, b = 3)
is_list_with(b, c("a", "c"))
is_list_with(b, c("a", "b"))

# Assert that exactly one of two arguments is provided.
a <- 10
b <- 10
one_provided(a, b)
one_provided(a, NULL)

# Assert a constant vector.
is_constant(c(1, 2))
is_constant(c(NA, 1))
is_constant(c("a", "a"))
is_constant(factor(c("a", "a")))

All Standard Plots in Default Setting

Description

This generates all standard plots at their default settings: histogram and q-q plot of library sizes, density plot of the (log) counts distributions, boxplot of the number of non-zero expressed genes per sample, and a stacked barplot of low expression genes by chromosome.

Usage

## S4 method for signature 'AnyHermesData'
autoplot(object)

Arguments

`object`	(`AnyHermesData`) input.

Value

A list with the ggplot objects from draw_libsize_hist(), draw_libsize_qq(), draw_libsize_densities(), draw_nonzero_boxplot() and draw_genes_barplot() functions with default settings.

Examples

result <- hermes_data
autoplot(result)

Principal Components Analysis Calculation

Description

The calc_pca() function performs principal components analysis of the gene count vectors across all samples.

A corresponding autoplot() method can then visualize the results.

Usage

calc_pca(object, assay_name = "counts", n_top = NULL)

Arguments

`object`	(`AnyHermesData`) input.
`assay_name`	(`string`) name of the assay to use.
`n_top`	(`count` or `NULL`) filter criteria based on number of genes with maximum variance.

Details

PCA should be performed after filtering out low quality genes and samples, as well as normalization of counts.
In addition, genes with constant counts across all samples are excluded from the analysis internally in calc_pca(). Centering and scaling are also applied internally.
Plots can be obtained with the ggplot2::autoplot() function with the corresponding method from the ggfortify package to plot the results of a principal components analysis saved in a HermesDataPca object. See ggfortify::autoplot.prcomp() for details.
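
The steps described in these Details (dropping constant genes, optionally keeping the most variable genes, then centering and scaling) can be sketched with stats::prcomp() alone. This is only an approximation under stated assumptions: the toy matrix is simulated, and the way the top genes are selected here (largest variance first) is an assumed reading of `n_top`, not the package's code.

```r
set.seed(1)
# Hypothetical normalized assay: 6 genes in rows, 8 samples in columns.
x <- matrix(
  rnorm(48),
  nrow = 6,
  dimnames = list(paste0("g", 1:6), paste0("s", 1:8))
)
x["g6", ] <- 3  # a constant gene, which cannot be scaled to unit variance

# Exclude genes that are constant across all samples.
gene_var <- apply(x, 1, var)
x <- x[gene_var > 0, , drop = FALSE]

# Optionally keep only the n_top genes with the largest variance.
n_top <- 3
top_genes <- names(sort(apply(x, 1, var), decreasing = TRUE))[seq_len(n_top)]
x_top <- x[top_genes, , drop = FALSE]

# Samples become observations; genes become centered and scaled variables.
pca <- prcomp(t(x_top), center = TRUE, scale. = TRUE)
summary(pca)
```

Transposing the matrix matters here: prcomp() treats rows as observations, so the samples must be in rows to obtain one score per sample and component.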

Value

A HermesDataPca object which is an extension of the stats::prcomp class.

Examples

object <- hermes_data %>%
  add_quality_flags() %>%
  filter() %>%
  normalize()

result <- calc_pca(object, assay_name = "tpm")
summary(result)

result1 <- calc_pca(object, assay_name = "tpm", n_top = 500)
summary(result1)

# Plot the results.
autoplot(result)
autoplot(result, x = 2, y = 3)
autoplot(result, variance_percentage = FALSE)
autoplot(result, label = TRUE, label.repel = TRUE)

Concatenate and Print with Newline

Description

This function concatenates inputs like cat() and prints them with newline.

Usage

cat_with_newline(...)

Arguments

`...`	inputs to concatenate.

Value

None, only used for the side effect of producing the concatenated output in the R console.

Examples

cat_with_newline("hello", "world")

Column Binding of `AnyHermesData` Objects

Description

This method combines AnyHermesData objects with the same ranges but different samples (columns in assays).

Arguments

`...`	(`AnyHermesData`) objects to column bind.

Value

The combined AnyHermesData object.

Note

Note that this just inherits SummarizedExperiment::cbind,SummarizedExperiment-method(). When binding an AnyHermesData object with a SummarizedExperiment::SummarizedExperiment object, the result will be a SummarizedExperiment::SummarizedExperiment object (the more general class).
Note that the combined object needs to have unique sample IDs (column names).

Examples

a <- hermes_data[, 1:10]
b <- hermes_data[, 11:20]
result <- cbind(a, b)
class(result)

Check for proportion

Description

Check whether x is a (single) proportion.

Usage

check_proportion(x, null.ok = FALSE)

assert_proportion(
  x,
  null.ok = FALSE,
  .var.name = checkmate::vname(x),
  add = NULL
)

test_proportion(x, null.ok = FALSE)

expect_proportion(x, null.ok = FALSE, info = NULL, label = vname(x))

Arguments

`x`	an object to check.
`null.ok`	(`flag`) whether `x` may also be `NULL`.
`.var.name`	(`string`) name of the checked object to print in assertions; defaults to the heuristic implemented in `checkmate::vname()`.
`add`	(`AssertCollection` or `NULL`) collection to store assertion messages, see `checkmate::AssertCollection`.
`info`	(`string`) extra information to be included in the message for the `testthat` reporter, see `testthat::expect_that()`.
`label`	(`string`) name of the checked object to print in messages. Defaults to the heuristic implemented in `checkmate::vname()`.

Value

TRUE if successful, otherwise a string with the error message.
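
For readers unfamiliar with the check_ convention, the following base R sketch mimics the documented return value (TRUE, or a string with the error message). The function name, the message texts, and the choice of the closed interval from 0 to 1 are assumptions for illustration; the real function may word its messages and treat the boundaries differently.

```r
# Hypothetical re-implementation in base R, not the hermes function.
check_proportion_sketch <- function(x, null.ok = FALSE) {
  if (is.null(x)) {
    if (null.ok) {
      return(TRUE)
    }
    return("Must be a single number, not 'NULL'")
  }
  # Assumption: a proportion is one non-missing number in the closed interval [0, 1].
  ok <- is.numeric(x) && length(x) == 1L && !is.na(x) && x >= 0 && x <= 1
  if (ok) TRUE else "Must be a single number between 0 and 1"
}

check_proportion_sketch(0.25)
check_proportion_sketch(1.5)
check_proportion_sketch(NULL, null.ok = TRUE)
```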

Examples

check_proportion(0.25)

Sample Variables with Selected Gene Information

Description

This obtains the sample variables of a HermesData object together with selected gene information.

Usage

col_data_with_genes(object, assay_name, genes)

Arguments

`object`	(`AnyHermesData`) input experiment.
`assay_name`	(`string`) which assay to use.
`genes`	(`GeneSpec`) which genes or which gene signature should be extracted.

Value

The combined data set, where the additional attribute gene_cols contains the names of the columns obtained by extracting the genes information.

Note

The class of the returned data set will depend on the class of colData, so usually will be S4Vectors::DFrame.

Examples

result <- col_data_with_genes(hermes_data, "counts", gene_spec("GeneID:1820"))
tail(names(result))
result$GeneID.1820

Mean Z-score Gene Signature

Description

This helper function returns the mean Z-score for each sample from an assay stored as a matrix.

Usage

colMeanZscores(x)

Arguments

`x`	(`matrix`) containing numeric data with genes in rows and samples in columns, no missing values are allowed.

Value

A numeric vector containing the mean Z-score values for each column in x.
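
One plausible reading of a mean Z-score signature is sketched below in base R: each gene is standardized across samples, and the resulting Z-scores are then averaged over the genes within each sample. The toy matrix is hypothetical, and how the package treats genes that are constant across samples is not covered here (in this sketch they would produce NaN).

```r
# Hypothetical toy matrix: 2 genes in rows, 3 samples in columns.
x <- matrix(
  c( 1,  2,  3,
    10, 30, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("g1", "g2"), c("s1", "s2", "s3"))
)

# Standardize each gene across samples (scale() works on columns, hence t()).
z <- t(scale(t(x)))

# Average the gene-wise Z-scores within each sample.
mean_z <- colMeans(z)
mean_z
```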

Examples

object <- hermes_data %>%
  add_quality_flags() %>%
  filter() %>%
  normalize() %>%
  assay("counts")

colMeanZscores(object)

First Principal Component (PC1) Gene Signature

Description

This helper function returns the first principal component from an assay stored as a matrix.

Usage

colPrinComp1(x, center = TRUE, scale = TRUE)

Arguments

`x`	(`matrix`) containing numeric data with genes in rows and samples in columns, no missing values are allowed.
`center`	(`flag`) whether the variables should be zero centered.
`scale`	(`flag`) whether the variables should be scaled to have unit variance.

Value

A numeric vector containing the principal component values for each column in x.

Examples

object <- hermes_data %>%
  add_quality_flags() %>%
  filter() %>%
  normalize() %>%
  assay("counts")

colPrinComp1(object)

Connection to `BioMart`

Description

connect_biomart() creates a connection object of class ConnectionBiomart which contains the biomaRt object of class biomaRt::Mart and the prefix of the object which is used downstream for the query.

Usage

connect_biomart(prefix = c("ENSG", "GeneID"), version = NULL)

Arguments

`prefix`	(`string`) gene ID prefix.
`version`	(`string` or `NULL`) optional `Ensembl` version to use. If `NULL` the latest available release is used.

Details

This connects to the Ensembl data base of BioMart for human genes. A specific version can be optionally chosen to ensure reproducibility of results once a new release is available, as accessed data might then change.

Value

ConnectionBiomart object.

Examples

if (interactive()) {
  connection <- connect_biomart("ENSG")
}

Control Settings for Counts Normalization

Description

This control function allows for easy customization of the normalization settings.

Usage

control_normalize(
  log = TRUE,
  lib_sizes = NULL,
  prior_count = 1,
  fit_type = "parametric"
)

Arguments

`log`	(`flag`) whether `log2` values are returned, otherwise original scale is used.
`lib_sizes`	(`NULL` or `counts`) library sizes, if `NULL` the vector with the sum of the counts for each of the samples will be used.
`prior_count`	(non-negative `number`) average count to be added to each observation to avoid taking log of zero, used only when `log = TRUE`.
`fit_type`	(`string`) method to estimate dispersion parameters in Negative Binomial model, used only when `normalize()` methods include `vst` and/or `rlog`. See `estimateDispersions` for details.

Value

List with the above settings used to perform the normalization procedure.

Note

To be used with the normalize() function.

Examples

control_normalize()
control_normalize(log = FALSE, lib_sizes = rep(1e6L, 20))

Control for Specified Quality Flags

Description

Control function which specifies the quality flag settings. One or more settings can be customized; settings that are not specified are left at their defaults.

Usage

control_quality(
  min_cpm = 1,
  min_cpm_prop = 0.25,
  min_corr = 0.5,
  min_depth = NULL
)

Arguments

`min_cpm`	(non-negative `number`) minimum Counts per Million (`CPM`) for each gene within the sample.
`min_cpm_prop`	(`proportion`) minimum proportion of samples with acceptable `CPM` of certain gene for low expression flagging.
`min_corr`	(`proportion`) minimum Pearson correlation coefficient of `CPM` between samples for technical failure flagging.
`min_depth`	(non-negative `count` or `NULL`) minimum library depth for low depth flagging. If `NULL`, this will be calculated as the first quartile minus 1.5 times the inter-quartile range of the library size (depth) of all samples. (So anything below the usual lower boxplot whisker would be too low.)
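
The default rule for `min_depth` can be worked through by hand. The sketch below applies "first quartile minus 1.5 times the inter-quartile range" to hypothetical library sizes; it assumes R's default quantile definition, which may differ slightly from the hinges that a boxplot uses for its whiskers.

```r
# Hypothetical library sizes (total counts per sample).
lib_size <- c(s1 = 1000, s2 = 1100, s3 = 1050, s4 = 950, s5 = 10)

# First quartile minus 1.5 times the inter-quartile range.
q1 <- unname(quantile(lib_size, 0.25))
min_depth <- q1 - 1.5 * IQR(lib_size)
min_depth

# Samples whose library size falls below this value would be "low depth".
lib_size < min_depth
```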

Value

List with the above criteria to flag observations.

Note

To be used with the add_quality_flags() function.

Examples

# Default settings.
control_quality()

# One or more settings can be customized.
control_quality(min_cpm = 5, min_cpm_prop = 0.001)

Generic Function for Correlation Calculations

Description

New generic function to calculate correlations for one or two objects.

Usage

correlate(object, ...)

Arguments

`object`	input of which the class will be used to decide the method.
`...`	additional arguments.

Value

Corresponding object that contains the correlation results.

Examples

sample_cors <- correlate(hermes_data)
autoplot(sample_cors)

pca_sample_var_cors <- correlate(calc_pca(hermes_data), hermes_data)
autoplot(pca_sample_var_cors)

Correlation between Sample Counts of `AnyHermesData`

Description

The correlate() method can calculate the correlation matrix between the sample vectors of counts from a specified assay. This produces a HermesDataCor object, which is an extension of a matrix with additional quality flags in the slot flag_data (containing the tech_failure_flag and low_depth_flag columns describing the original input samples).
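
The core of this calculation is a sample-by-sample correlation matrix. A minimal base R sketch on simulated counts (hypothetical data, without the quality flags that a HermesDataCor object additionally carries) looks like this:

```r
set.seed(2)
# Hypothetical counts assay: 50 genes in rows, 4 samples in columns.
counts <- matrix(
  rpois(200, lambda = 20),
  nrow = 50,
  dimnames = list(NULL, paste0("s", 1:4))
)

# cor() correlates columns, so the result is a sample-by-sample matrix.
sample_cor <- cor(counts, method = "pearson")
dim(sample_cor)
round(sample_cor, 2)
```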

An autoplot() method can then produce the corresponding heatmap.

Usage

## S4 method for signature 'AnyHermesData'
correlate(object, assay_name = "counts", method = "pearson", ...)

## S4 method for signature 'HermesDataCor'
autoplot(
  object,
  flag_colors = c(`FALSE` = "green", `TRUE` = "red"),
  cor_colors = circlize::colorRamp2(c(0, 0.5, 1), c("red", "yellow", "green")),
  ...
)

Arguments

`object`	(`AnyHermesData`) object to calculate the correlation.
`assay_name`	(`string`) the name of the assay to use.
`method`	(`string`) the correlation method, see `stats::cor()` for details.
`...`	other arguments to be passed to `ComplexHeatmap::Heatmap()`.
`flag_colors`	(named `character`) a vector that specifies the colors for `TRUE` and `FALSE` flag values.
`cor_colors`	(`function`) color scale function for the correlation values in the heatmap, produced by `circlize::colorRamp2()`.

Value

A HermesDataCor object.

Functions

autoplot(HermesDataCor): This autoplot() method uses the ComplexHeatmap::Heatmap() function to plot the correlations between samples saved in a HermesDataCor object.

Examples

object <- hermes_data

# Calculate the sample correlation matrix.
correlate(object)

# We can specify another correlation coefficient to be calculated.
result <- correlate(object, method = "spearman")

# Plot the correlation matrix.
autoplot(result)

# We can customize the heatmap.
autoplot(result, show_column_names = FALSE, show_row_names = FALSE)

# Including changing the axis label text size.
autoplot(
  result,
  row_names_gp = grid::gpar(fontsize = 8),
  column_names_gp = grid::gpar(fontsize = 8)
)

Correlation of Principal Components with Sample Variables

Description

This correlate() method analyses the correlations (in R2 values) between all sample variables in an AnyHermesData object and the principal components of the samples.

A corresponding autoplot() method can then visualize the results in a heatmap.

Usage

## S4 method for signature 'HermesDataPca'
correlate(object, data)

## S4 method for signature 'HermesDataPcaCor'
autoplot(
  object,
  cor_colors = circlize::colorRamp2(c(-1, 0, 1), c("blue", "white", "red")),
  ...
)

Arguments

`object`	(`HermesDataPca`) input. It can be generated using `calc_pca()` function on `AnyHermesData`.
`data`	(`AnyHermesData`) input that was used originally for the PCA.
`cor_colors`	(`function`) color scale function for the correlation values in the heatmap, produced by `circlize::colorRamp2()`.
`...`	other arguments to be passed to `ComplexHeatmap::Heatmap()`.

Value

A HermesDataPcaCor object with R2 values for all sample variables.
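
One common way to obtain an R2 value between a principal component and a sample variable is to regress the component scores on that variable and read off the coefficient of determination. The sketch below does this with stats::lm() on hypothetical scores; whether the package computes its R2 values in exactly this way is an assumption.

```r
# Hypothetical PC scores for 4 samples (rows) on 2 components (columns).
pcs <- cbind(
  PC1 = c(1.1, 0.9, -1.0, -1.0),
  PC2 = c(1.0, -1.0, 1.0, -1.0)
)
# Hypothetical sample variable with two groups.
group <- factor(c("a", "a", "b", "b"))

# R2 of a linear model regressing each component on the sample variable.
r2 <- apply(pcs, 2, function(pc) summary(lm(pc ~ group))$r.squared)
r2
```

Here PC1 separates the two groups almost perfectly while PC2 does not vary between them, which is the kind of contrast the heatmap is meant to reveal.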

Functions

autoplot(HermesDataPcaCor): This plot method uses the ComplexHeatmap::Heatmap() function to visualize a HermesDataPcaCor object.

Examples

object <- hermes_data %>%
  add_quality_flags() %>%
  filter() %>%
  normalize()

# Perform PCA and then correlate the principal components with the sample variables.
object_pca <- calc_pca(object)
result <- correlate(object_pca, object)

# Visualize the correlations in a heatmap.
autoplot(result)

# We can also choose to not reorder the columns.
autoplot(result, cluster_columns = FALSE)

# We can also choose break-points for color customization.
autoplot(
  result,
  cor_colors = circlize::colorRamp2(
    c(-0.5, -0.25, 0, 0.25, 0.5, 0.75, 1),
    c("blue", "green", "purple", "yellow", "orange", "red", "brown")
  )
)

Counts Accessor and Setter

Description

These methods access and set the counts assay in an AnyHermesData object.

Usage

## S4 method for signature 'AnyHermesData'
counts(object, ...)

## S4 replacement method for signature 'AnyHermesData,matrix'
counts(object, ..., withDimnames = TRUE) <- value

Arguments

`object`	(`AnyHermesData`) object to access the counts from.
`...`	not used.
`withDimnames`	(`flag`) setting `withDimnames = FALSE` in the setter (`counts<-`) is required when the `dimnames` on the supplied counts assay are not identical to the `dimnames` on the `AnyHermesData` object; it does not influence actual assignment of `dimnames` to the assay (they're always stored as-is).
`value`	(`matrix`) what should the counts assay be replaced with.

Value

The counts assay.

Methods (by class)

counts(object = AnyHermesData) <- value:

Examples

a <- hermes_data
result <- counts(a)
class(result)
head(result)
counts(a) <- counts(a) + 100L
head(counts(a))

Cutting a Numeric Vector into a Factor of Quantile Bins

Description

This function transforms a numeric vector into a factor corresponding to the quantile intervals. The intervals are left-open and right-closed.

Usage

cut_quantile(x, percentiles = c(1/3, 2/3), digits = 0)

Arguments

`x`	(`numeric`) the continuous variable values which should be cut into quantile bins. `NA` values are not taken into account when computing quantiles and are attributed to the `NA` interval.
`percentiles`	(`proportions`) the required percentiles for the quantile intervals to be generated. Duplicated values are removed.
`digits`	(`integer`) the precision to use when formatting the percentages.

Value

The factor with a description of the available quantiles as levels.
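
The binning described above can be approximated with stats::quantile() and cut(). In this sketch the function name and the label format are hypothetical, the lowest interval is closed on the left so that the minimum is not lost, and tied quantiles (which make cut() fail on duplicated breaks) are not handled.

```r
# Hypothetical sketch, not the hermes implementation.
cut_quantile_sketch <- function(x, percentiles = c(1 / 3, 2 / 3), digits = 0) {
  # Add the end points and drop duplicated percentiles.
  probs <- sort(unique(c(0, percentiles, 1)))
  # NA values are ignored when computing the quantiles.
  breaks <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pct <- paste0(round(probs * 100, digits), "%")
  n <- length(pct)
  # Left-open, right-closed intervals labeled by their percentages.
  labels <- paste0("(", pct[-n], ",", pct[-1], "]")
  labels[1] <- sub("^\\(", "[", labels[1])
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE, right = TRUE)
}

cut_quantile_sketch(c(2, 4, 6, 8, NA, 10, 12))
```

NA inputs stay NA in the resulting factor, matching the statement that they are not taken into account when computing the quantiles.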

Examples

set.seed(452)
x <- runif(10, -10, 10)
cut_quantile(x, c(0.33333333, 0.6666666), digits = 4)

x[1:4] <- NA
cut_quantile(x)

Conversion of Eligible Columns to Factor Variables in a `DataFrame`

Description

This utility function converts all eligible character and logical variables in a S4Vectors::DataFrame to factor variables. All factor variables get amended with an explicit missing level. This includes both NA and empty strings.

Usage

df_cols_to_factor(data, omit_columns = NULL, na_level = "<Missing>")

Arguments

`data`	(`DataFrame`) input `S4Vectors::DataFrame`.
`omit_columns`	(`character` or `NULL`) which columns should be omitted from the possible conversion to factor and explicit missing level application.
`na_level`	(`string`) explicit missing level to be used for factor variables.

Value

The modified data.

Note

Required rowData and colData variables are never converted, to ensure proper downstream behavior. These are automatically omitted if found in data and therefore do not need to be specified in omit_columns.

Examples

dat <- colData(summarized_experiment)
any(vapply(dat, is.character, logical(1)))
any(vapply(dat, is.logical, logical(1)))
dat_converted <- df_cols_to_factor(dat)
any(vapply(dat_converted, function(x) is.character(x) || is.logical(x), logical(1)))

Differential Expression Analysis

Description

The diff_expression() function performs differential expression analysis using a method of preference.

A corresponding autoplot() method visualizes the results as a volcano plot.

Usage

diff_expression(object, group, method = c("voom", "deseq2"), ...)

## S4 method for signature 'HermesDataDiffExpr'
autoplot(object, adj_p_val_thresh = 0.05, log2_fc_thresh = 2.5)

Arguments

`object`	(`AnyHermesData`) input. Note that this function only uses the original counts for analysis, so this does not need to be normalized.
`group`	(`string`) name of factor variable with 2 levels in `colData(object)`. These 2 levels will be compared in the differential expression analysis.
`method`	(`string`) method for differential expression analysis, see details below.
`...`	additional arguments passed to the helper function associated with the selected method.
`adj_p_val_thresh`	(`proportion`) threshold on the adjusted p-values (y-axis) to flag significance.
`log2_fc_thresh`	(`number`) threshold on the absolute log2 fold-change (x-axis) to flag up- or down-regulation of transcription.

Details

Possible method choices are:

voom: uses limma::voom(), see h_diff_expr_voom() for details.
deseq2: uses DESeq2::DESeq(), see h_diff_expr_deseq2() for details.

Value

A HermesDataDiffExpr object which is a data frame with the following columns for each gene in the HermesData object:

log2_fc (the estimate of the log2 fold change between the 2 levels of the provided factor)
stat (the test statistic, which one depends on the method used)
p_val (the raw p-value)
adj_p_val (the multiplicity adjusted p-value)

Functions

autoplot(HermesDataDiffExpr): generates a volcano plot for a HermesDataDiffExpr object.

Note

We provide the df_cols_to_factor() utility function that makes it easy to convert the colData() character and logical variables to factors, so that they can be subsequently used as group inputs. See the example.
To avoid a warning when using deseq2, it can be necessary to specify fitType = "local" as an additional argument. This can be the case, for example, when only a few samples are present, so that the default parametric dispersion estimation does not work.

Examples

object <- hermes_data %>%
  add_quality_flags() %>%
  filter()

# Convert character and logical to factor variables in `colData`,
# including the below used `group` variable.
colData(object) <- df_cols_to_factor(colData(object))
res1 <- diff_expression(object, group = "SEX", method = "voom")
head(res1)
res2 <- diff_expression(object, group = "SEX", method = "deseq2")
head(res2)

# Pass method arguments to the internally used helper functions.
res3 <- diff_expression(object, group = "SEX", method = "voom", robust = TRUE, trend = TRUE)
head(res3)
res4 <- diff_expression(object, group = "SEX", method = "deseq2", fitType = "local")
head(res4)

# Create the corresponding volcano plots.
autoplot(res1)
autoplot(res3)
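
The two autoplot() thresholds can be read as a simple classification rule on the result columns. The sketch below applies such a rule to a made-up data frame with log2_fc and adj_p_val columns. The values, the regulation column and the use of strict inequalities are assumptions for illustration, not the package's plotting code.

```r
# Hypothetical results for four genes.
res <- data.frame(
  log2_fc = c(3.1, -2.8, 0.4, 2.9),
  adj_p_val = c(0.01, 0.001, 0.02, 0.20)
)
adj_p_val_thresh <- 0.05
log2_fc_thresh <- 2.5

# A gene is flagged only if it is both significant and strongly changed.
significant <- res$adj_p_val < adj_p_val_thresh
res$regulation <- ifelse(
  significant & res$log2_fc > log2_fc_thresh, "up",
  ifelse(significant & res$log2_fc < -log2_fc_thresh, "down", "none")
)
res
```

The third gene is significant but has a small fold change, and the fourth has a large fold change but is not significant, so neither is flagged.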

Barplot for Gene Expression Percentiles

Description

This produces a barplot of the gene expression counts, categorized into two or three categories based on custom defined percentiles.

Usage

draw_barplot(
  object,
  assay_name,
  x_spec,
  facet_var = NULL,
  fill_var = NULL,
  percentiles = c(1/3, 2/3)
)

Arguments

`object`	(`AnyHermesData`) input.
`assay_name`	(`string`) selects assay from input.
`x_spec`	(`GeneSpec`) gene specification for the x-axis.
`facet_var`	(`string` or `NULL`) optional faceting variable, taken from input sample variables.
`fill_var`	(`string` or `NULL`) optional fill variable, taken from input sample variables.
`percentiles`	(`vector`) lower and upper percentiles used to categorize the gene counts into two or three categories.

Value

The ggplot barplot.

Examples

object <- hermes_data

g <- genes(object)

draw_barplot(
  object,
  assay_name = "counts",
  x_spec = gene_spec(g[1]),
  facet_var = "SEX",
  fill_var = "AGE18"
)

draw_barplot(
  object,
  assay_name = "counts",
  x_spec = gene_spec(g[1:3], colMedians, "Median"),
  facet_var = "SEX",
  fill_var = "AGE18"
)

draw_barplot(
  object,
  assay_name = "counts",
  x_spec = gene_spec(g[1:3], colMeans, "Mean"),
  facet_var = "SEX",
  fill_var = "AGE18",
  percentiles = c(0.1, 0.9)
)

Boxplot for Gene Expression Values

Description

This produces boxplots of the gene expression values of a single gene, multiple genes or a gene signature.

Usage

draw_boxplot(
  object,
  assay_name,
  genes,
  x_var = NULL,
  color_var = NULL,
  facet_var = NULL,
  violin = FALSE,
  jitter = FALSE
)

h_draw_boxplot_df(object, assay_name, genes, x_var, color_var, facet_var)

Arguments

`object`	(`AnyHermesData`) input.
`assay_name`	(`string`) selects assay from input for the y-axis.
`genes`	(`GeneSpec`) for which genes or which gene signature to produce boxplots.
`x_var`	(`string` or `NULL`) optional stratifying variable for the x-axis, taken from input sample variables.
`color_var`	(`string` or `NULL`) optional color variable, taken from input sample variables.
`facet_var`	(`string` or `NULL`) optional faceting variable, taken from input sample variables.
`violin`	(`flag`) whether to draw a violin plot instead of a boxplot.
`jitter`	(`flag`) whether to add jittered original data points.

Value

The ggplot boxplot.

Functions

h_draw_boxplot_df(): Helper function to prepare the data frame required for plotting.

Examples

object <- hermes_data
draw_boxplot(
  object,
  assay_name = "counts",
  genes = gene_spec(c(A = genes(object)[1])),
  violin = TRUE
)

object2 <- object %>%
  add_quality_flags() %>%
  filter() %>%
  normalize()
draw_boxplot(
  object2,
  assay_name = "tpm",
  x_var = "SEX",
  genes = gene_spec(setNames(genes(object2)[1:10], 1:10), fun = colMeans),
  facet_var = "RACE",
  color_var = "AGE18",
  jitter = TRUE
)

draw_boxplot(
  object,
  assay_name = "counts",
  x_var = "SEX",
  genes = gene_spec(genes(object)[1:3]),
  jitter = TRUE,
  facet_var = "AGE18"
)

draw_boxplot(
  object,
  assay_name = "counts",
  genes = gene_spec(c(A = "GeneID:11185", B = "GeneID:10677")),
  violin = TRUE
)

Stacked Barplot of Low Expression Genes by Chromosome

Description

This creates a barplot of chromosomes for the AnyHermesData object with the proportions of low expression genes.

Usage

draw_genes_barplot(
  object,
  chromosomes = c(seq_len(22), "X", "Y", "MT"),
  include_others = TRUE
)

Arguments

`object`	(`AnyHermesData`) input.
`chromosomes`	(`character`) names of the chromosomes which should be displayed.
`include_others`	(`flag`) option to show the chromosomes not in `chromosomes` as "Others".

Value

The ggplot object with the barplot.

Examples

object <- hermes_data

# Display chromosomes 1-22, X, Y, and MT. Other chromosomes are displayed in "Others".
# To increase readability, we can flip the coordinate axes.
draw_genes_barplot(object) + coord_flip()

# Alternatively we can also rotate the x-axis tick labels.
draw_genes_barplot(object) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

# Display chromosomes 1 and 2. Other chromosomes are displayed in "Others".
draw_genes_barplot(object, chromosomes = c("1", "2"))

# Display chromosomes 1 and 2 only.
draw_genes_barplot(object, chromosomes = c("1", "2"), include_others = FALSE)

Heatmap for Gene Expression Counts

Description

This produces a heatmap of the chosen assay and groups by various sample variables.

Usage

draw_heatmap(
  object,
  assay_name,
  color_extremes = c(0.01, 0.99),
  col_data_annotation = NULL,
  ...
)

Arguments

`object`	(`AnyHermesData`) input.
`assay_name`	(`string`) selects assay from input.
`color_extremes`	(`numeric`) min and max percentiles to inform the color scheme of the heatmap as blue and red respectively.
`col_data_annotation`	(`character` or `NULL`) optional grouping variable(s), taken from input sample variables.
`...`	additional arguments to pass to `ComplexHeatmap::Heatmap()`.

Value

The ComplexHeatmap::Heatmap heatmap.

Examples

result <- hermes_data %>%
  normalize(methods = "voom") %>%
  add_quality_flags() %>%
  filter(what = "genes")

draw_heatmap(
  object = result[1:10, ],
  assay_name = "counts",
  col_data_annotation = "COUNTRY"
)

draw_heatmap(
  object = result[1:10, ],
  assay_name = "counts",
  color_extremes = c(0.001, 0.999),
  col_data_annotation = "AGEGRP"
)

Density Plot of (Log) Counts Distributions

Description

This creates a density plot of the (log) counts distributions of the AnyHermesData object where each line on the plot corresponds to a sample.

Usage

draw_libsize_densities(object, log = TRUE)

Arguments

`object`	(`AnyHermesData`) input.
`log`	(`flag`) whether the counts should be log transformed (log2).

Value

The ggplot object with the density plot.

Examples

result <- hermes_data
draw_libsize_densities(result)
draw_libsize_densities(result, log = FALSE)

Histogram of Library Sizes

Description

This creates a histogram of the library sizes of the AnyHermesData object.

Usage

draw_libsize_hist(object, bins = 30L, fill = "darkgrey")

Arguments

`object`	(`AnyHermesData`) input.
`bins`	(`count`) number of evenly distributed groups desired.
`fill`	(`string`) fill color of the bars.

Value

The ggplot object with the histogram.

Examples

result <- hermes_data
draw_libsize_hist(result)
draw_libsize_hist(result, bins = 10L, fill = "blue")
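
Assuming the usual definition of a library size as the total count per sample, the quantity being binned can be computed in base R as follows. The counts matrix is simulated, and hist(plot = FALSE) only stands in for the binning step; this is not the package's own code.

```r
# Simulated counts: 8 genes (rows) by 5 samples (columns).
set.seed(1)
counts <- matrix(
  rpois(40, lambda = 50),
  nrow = 8, ncol = 5,
  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5))
)

# Library size per sample, assumed here to be the column sum of the counts.
lib_sizes <- colSums(counts)

# Bin the library sizes without drawing anything.
binned <- hist(lib_sizes, breaks = 3, plot = FALSE)
binned$counts
```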

Q-Q Plot of Library Sizes

Description

This creates a Q-Q plot of the library sizes of the AnyHermesData object.

Usage

draw_libsize_qq(object, color = "grey", linetype = "dashed")

Arguments

`object`	(`AnyHermesData`) input.
`color`	(`string`) color of Q-Q line.
`linetype`	(`string`) line type of Q-Q line.

Value

The ggplot object with the Q-Q Plot.

Examples

result <- hermes_data
draw_libsize_qq(result)
draw_libsize_qq(result, color = "blue", linetype = "solid")

# We can also add sample names as labels.
library(ggrepel)
draw_libsize_qq(result) + geom_text_repel(label = colnames(result), stat = "qq")
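
The coordinates behind a normal Q-Q plot can be inspected with stats::qqnorm(). The sketch below uses made-up library sizes and assumes a comparison against the normal distribution, which may differ in detail from what draw_libsize_qq() draws.

```r
# Hypothetical library sizes for five samples.
lib_sizes <- c(s1 = 1200, s2 = 1500, s3 = 900, s4 = 2000, s5 = 1100)

# plot.it = FALSE returns the coordinates instead of drawing them:
# x holds the theoretical normal quantiles, y the observed values.
qq <- qqnorm(lib_sizes, plot.it = FALSE)
data.frame(theoretical = qq$x, sample = qq$y)
```

Samples that deviate strongly from the reference line in such a plot have unusually small or large library sizes relative to the rest.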

Boxplot of Non-Zero Genes

Description

This draws a boxplot, with overlaid data points, of the number of non-zero expressed genes per sample.

Usage

draw_nonzero_boxplot(object, position = position_jitter(0.2), alpha = 0.25)

Arguments

`object`	(`AnyHermesData`) input.
`position`	(`Position`) specifies x-axis position of points, e.g. for jittering.
`alpha`	(`proportion`) specifies transparency of points.

Value

The ggplot object with the boxplot.

Examples

# Default boxplot.
result <- hermes_data
draw_nonzero_boxplot(result)

# Reusing the same position for labeling.
library(ggrepel)
pos <- position_jitter(0.5)
draw_nonzero_boxplot(result, position = pos) +
  geom_text_repel(aes(label = samples(result)), position = pos)
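
The plotted quantity, the number of genes with non-zero counts per sample, can be sketched in base R on a small made-up counts matrix. This illustrates the statistic only and is not the package's code.

```r
# Hypothetical counts: 3 genes (rows) by 3 samples (columns).
counts <- matrix(
  c(0, 5, 3,
    2, 0, 0,
    1, 1, 1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:3))
)

# Count, for each sample, the genes with a strictly positive count.
n_nonzero <- colSums(counts > 0)
n_nonzero
```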

Scatterplot for Gene Expression Values

Description

This produces a scatterplot of two genes or gene signatures.

Usage

draw_scatterplot(
  object,
  assay_name,
  x_spec,
  y_spec,
  color_var = NULL,
  facet_var = NULL,
  smooth_method = c("lm", "loess", "none")
)

Arguments

`object`	(`AnyHermesData`) input.
`assay_name`	(`string`) selects assay from input.
`x_spec`	(`GeneSpec`) gene specification for the x-axis.
`y_spec`	(`GeneSpec`) gene specification for the y-axis.
`color_var`	(`string` or `NULL`) optional color variable, taken from input sample variables.
`facet_var`	(`string` or `NULL`) optional faceting variable, taken from input sample variables.
`smooth_method`	(`string`) smoothing method to use, either linear regression line (`lm`), local polynomial regression (`loess`) or `none`.

Value

The ggplot scatterplot.

Examples

object <- hermes_data
g <- genes(object)

draw_scatterplot(
  object,
  assay_name = "counts",
  facet_var = NULL,
  x_spec = gene_spec(c(A = g[1])),
  y_spec = gene_spec(g[2]),
  color_var = "RACE"
)

object2 <- object %>%
  add_quality_flags() %>%
  filter() %>%
  normalize()
g2 <- genes(object2)

draw_scatterplot(
  object2,
  assay_name = "tpm",
  facet_var = "SEX",
  x_spec = gene_spec(g2[1:10], colMeans, "Mean"),
  y_spec = gene_spec(g2[11:20], colMedians, "Median"),
  smooth_method = "loess"
)

Example `ExpressionSet` Data

Description

This example data can be used to try out conversion of a Biobase::ExpressionSet object into a HermesData object.

Usage

expression_set

Format

A Biobase::ExpressionSet object with 20 samples covering 5085 features (Entrez gene IDs).

Source

This is an artificial dataset designed to resemble real data.

Extra Variable Names Accessor Methods

Description

The methods access the names of the variables in colData() and rowData() of the object which are not required by design. So these can be additional sample or patient characteristics, or gene characteristics.

Usage

extraColDataNames(x, ...)

## S4 method for signature 'AnyHermesData'
extraColDataNames(x, ...)

extraRowDataNames(x, ...)

## S4 method for signature 'AnyHermesData'
extraRowDataNames(x, ...)

Arguments

`x`	(`AnyHermesData`) object.
`...`	not used.

Value

The character vector with the additional variable names in either colData() or rowData().

Examples

object <- hermes_data
extraColDataNames(object)
extraRowDataNames(object)

Filter `AnyHermesData` on Subset Passing Default QC Flags

Description

This filters an AnyHermesData object using the default QC flags and required annotations.

Usage

filter(object, ...)

## S4 method for signature 'AnyHermesData'
filter(object, what = c("genes", "samples"), annotation_required = "size")

## S4 method for signature 'data.frame'
filter(object, ...)

## S4 method for signature 'ts'
filter(object, ...)

Arguments

`object`	(`AnyHermesData`) object to filter.
`...`	additional arguments.
`what`	(`character`) specify whether to apply the filter on `genes` and / or `samples`.
`annotation_required`	(`character`) names of required `annotation` columns for genes. Only used when `genes` are filtered.

Details

Only genes without low expression (low_expression_flag) and samples without low depth (low_depth_flag) or technical failure (tech_failure_flag) remain in the returned filtered object.
Also required gene annotation columns can be specified, so that genes which are not complete for these columns are filtered out. By default this is the size column, which is needed for default normalization of the object.

Value

The filtered AnyHermesData object.

Note

The internal implementation cannot use the subset() method since that requires non-standard evaluation of arguments.

Examples

a <- hermes_data
dim(a)

# Filter genes and samples on default QC flags.
result <- filter(a)
dim(result)

# Filter only genes without low expression.
result <- filter(a, what = "genes")

# Filter only samples without low depth or technical failure.
result <- filter(a, what = "samples")

# Filter only genes, and require certain annotations to be present.
result <- filter(a, what = "genes", annotation_required = c("size"))
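
The filtering rule from the Details section can be sketched on a plain matrix with logical flag vectors. The flag values and the counts matrix below are made up, and the subsetting is an illustration of the rule rather than the method's internal implementation.

```r
# Hypothetical counts: 4 genes (rows) by 3 samples (columns).
counts <- matrix(
  1:12,
  nrow = 4, ncol = 3,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Hypothetical quality flags, one per gene or per sample.
low_expression_flag <- c(FALSE, TRUE, FALSE, FALSE)
low_depth_flag <- c(FALSE, FALSE, TRUE)
tech_failure_flag <- c(FALSE, TRUE, FALSE)

# Keep genes without low expression, and samples with neither flag set.
keep_genes <- !low_expression_flag
keep_samples <- !(low_depth_flag | tech_failure_flag)
filtered <- counts[keep_genes, keep_samples, drop = FALSE]
dim(filtered)
```

Using drop = FALSE keeps the result a matrix even when only a single sample remains.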

`GeneSpec` Constructor

Description

Creates a new GeneSpec object.

Usage

gene_spec(genes = NULL, fun = NULL, fun_name = deparse(substitute(fun)))

Arguments

`genes`	(named `character` or `NULL`) the gene IDs, where the names are used as labels if available.
`fun`	(`function` or `NULL`) summary function. If `NULL` is used then multiple genes are not summarized but returned as a matrix from the `extract` method.
`fun_name`	(`string`) name of the summary function.

Value

A new GeneSpec object.

Examples

gene_spec("GeneID:11185")
gene_spec(c("GeneID:11185", "GeneID:10677", "GeneID:101928428"), fun = colMeans)

Gene IDs Accessor

Description

Access the gene IDs, i.e. row names, of an AnyHermesData object with a nicely named accessor method.

Usage

genes(object)

## S4 method for signature 'AnyHermesData'
genes(object)

Arguments

`object`	(`AnyHermesData`) input.

Value

The character vector with the gene IDs.

Examples

a <- hermes_data
genes(a)

R6 Class Representing a Gene (Signature) Specification

Description

A GeneSpec consists of the gene IDs (possibly named with labels), the summary function and the name of the summary function.

Methods

Method `new()`

Creates a new GeneSpec object.

Usage

GeneSpec$new(genes = NULL, fun = NULL, fun_name = deparse(substitute(fun)))

Arguments

genes: (named character or NULL)
the gene IDs, where the names are used as labels if available.
fun: (function or NULL)
summary function. If NULL is used then multiple genes are not summarized but returned as a matrix from the extract method.
fun_name: (string)
name of the summary function.

Returns

A new GeneSpec object.

Method `get_genes()`

Returns the genes.

Usage

GeneSpec$get_genes()

Method `get_gene_labels()`

Returns the gene labels (substituted by gene IDs if not available).

Usage

GeneSpec$get_gene_labels(genes = self$get_genes())

Arguments

genes: (character)
for which subset of genes the labels should be returned.

Method `returns_vector()`

Predicate indicating whether the extract method returns a vector or not.

Usage

GeneSpec$returns_vector()

Method `get_label()`

Returns a string which can be used e.g. for plot labels.

Usage

GeneSpec$get_label(genes = self$get_genes())

Arguments

genes: (character)
for which subset of genes the labels should be returned.

Method `extract()`

Extract the gene values from an assay as specified.

Usage

GeneSpec$extract(assay)

Arguments

assay: (matrix)
original matrix with rownames containing the specified genes.

Returns

Either a vector with one value per column, or a matrix with multiple genes in the rows.

Method `extract_data_frame()`

Extract the gene values as a data.frame.

Usage

GeneSpec$extract_data_frame(assay)

Arguments

assay: (matrix)
original matrix with rownames containing the specified genes.

Returns

A data.frame with the genes in the columns and the samples in the rows.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

GeneSpec$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

# Minimal specification if only one gene is used.
x_spec <- gene_spec("GeneID:1820")

# Using multiple genes with a signature.
x_spec <- gene_spec(c("GeneID:1820", "GeneID:52"), fun = colMeans)
x_spec <- gene_spec(c("GeneID:1820", "GeneID:52"), fun = colPrinComp1)
x_spec$returns_vector()
x_spec$get_genes()
x_spec$get_gene_labels()
x_spec$get_label()

# Using multiple genes with partial labels, without a signature.
x_spec <- gene_spec(c(A = "GeneID:1820", "GeneID:52"))
x_spec$returns_vector()
x_spec$get_gene_labels()

# Use the gene specification to extract genes from a matrix.
mat <- matrix(
  data = rpois(15, 10),
  nrow = 3, ncol = 5,
  dimnames = list(c("GeneID:1820", "GeneID:52", "GeneID:523"), NULL)
)
x_spec$extract(mat)

# We can also extract these as a `data.frame`.
x_spec$extract_data_frame(mat)
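
What a summary function such as colMeans does to a gene signature can be seen with plain matrix operations. This sketch mimics the idea behind extract() for two genes on a deterministic matrix; it does not use the GeneSpec class, and the variable names are illustrative.

```r
# Deterministic matrix with gene IDs as row names and 5 unnamed samples.
mat <- matrix(
  1:15,
  nrow = 3, ncol = 5,
  dimnames = list(c("GeneID:1820", "GeneID:52", "GeneID:523"), NULL)
)
genes <- c("GeneID:1820", "GeneID:52")

# Without a summary function: one row per selected gene.
selected <- mat[genes, , drop = FALSE]

# With a summary function: one value per sample (column).
signature <- colMeans(selected)
signature
```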

Finding All Duplicates in Vector

Description

Unlike duplicated(), this function also flags the first occurrence of a duplicated value as TRUE.

Usage

h_all_duplicated(x)

Arguments

`x`	a vector or a data frame or an array or `NULL`.

Value

Logical vector flagging all occurrences of duplicate values as TRUE.

Examples

h_all_duplicated(c("a", "a", "b"))
duplicated(c("a", "a", "b"))
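
The same behavior can be obtained in base R by combining a forward and a backward duplicated() scan. This is one possible way to get the result and not necessarily how h_all_duplicated() is implemented; all_dup is an illustrative variable name.

```r
x <- c("a", "a", "b")

# duplicated() alone misses the first occurrence; scanning from the end
# as well flags every element that occurs more than once.
all_dup <- duplicated(x) | duplicated(x, fromLast = TRUE)
all_dup
```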

Conversion to Factors with Explicit Missing Level in a `data.frame`

Description

This helper function converts all character and logical variables to factor variables in a data.frame. It also sets an explicit missing data level for all factor variables that have at least one NA. Empty strings are handled as NA.

Usage

h_df_factors_with_explicit_na(data, na_level = "<Missing>")

Arguments

`data`	(`data.frame`) input data with at least one column.
`na_level`	(`string`) explicit missing level to be used.

Value

The modified data.

Examples

dat <- data.frame(
  a = c(NA, 2),
  b = c("A", NA),
  c = c("C", "D"),
  d = factor(c(NA, "X")),
  e = factor(c("Y", "Z"))
)
h_df_factors_with_explicit_na(dat)
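
The conversion rule described above can be sketched column by column in base R. The helper explicit_na() below is hypothetical and written for illustration only; it follows the description (character and logical columns become factors, empty strings count as missing, factors with NA get an explicit level) but is not the package's implementation.

```r
# Hypothetical helper, written for illustration only.
explicit_na <- function(x, na_level = "<Missing>") {
  if (is.character(x) || is.logical(x)) {
    x <- as.character(x)
    x[which(x == "")] <- NA # treat empty strings as missing
    x <- factor(x)
  }
  if (is.factor(x) && anyNA(x)) {
    # Add the explicit level first, then assign it to the missing entries.
    levels(x) <- c(levels(x), na_level)
    x[is.na(x)] <- na_level
  }
  x
}

dat <- data.frame(
  a = c(NA, 2),
  b = c("A", NA),
  c = c("C", "D"),
  d = factor(c(NA, "X")),
  e = factor(c("Y", "Z"))
)
# Assigning into dat[] keeps the data.frame structure.
dat[] <- lapply(dat, explicit_na)
str(dat)
```

The numeric column a is left untouched, including its NA, because only character, logical and factor columns are handled.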

`DESeq2` Differential Expression Analysis

Description

This helper function performs the differential expression analysis with DESeq2::DESeq() for a given AnyHermesData input and design matrix.

Usage

h_diff_expr_deseq2(object, design, ...)

Arguments

`object`	(`HermesData`) input.
`design`	(`matrix`) design matrix.
`...`	additional arguments internally passed to `DESeq2::DESeq()` (`fitType`, `sfType`, `minReplicatesForReplace`, `useT`, `minmu`).

Value

A data frame with columns log2_fc (estimated log2 fold change), stat (Wald statistic), p_val (raw p-value), adj_p_val (Benjamini-Hochberg adjusted p-value).

References

Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15(12), 550. doi:10.1186/s13059-014-0550-8.

Examples

object <- hermes_data

# Create the design matrix corresponding to the factor of interest.
design <- model.matrix(~SEX, colData(object))

# Then perform the `DESeq2` differential expression analysis.
result <- h_diff_expr_deseq2(object, design)
head(result)

# Change of the `fitType` can be required in some cases.
result2 <- h_diff_expr_deseq2(object, design, fitType = "local")
head(result2)
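
The Benjamini-Hochberg adjustment named in the Value section is available in base R as stats::p.adjust(). The sketch below applies it to made-up raw p-values to show how the raw and adjusted columns relate; it does not reproduce the DESeq2 computation itself.

```r
# Hypothetical raw p-values for four genes.
p_val <- c(0.01, 0.04, 0.03, 0.20)

# Benjamini-Hochberg adjustment; the result is in the same order as p_val.
adj_p_val <- p.adjust(p_val, method = "BH")
data.frame(p_val, adj_p_val)
```

Adjusted p-values are never smaller than the raw ones, which is why fewer genes pass a given threshold after adjustment.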

`limma`/voom Differential Expression Analysis

Description

This helper function performs the differential expression analysis with the voom method from the limma package (via limma::voom(), limma::lmFit() and limma::eBayes()) for given counts in an AnyHermesData object and a corresponding design matrix.

Usage

h_diff_expr_voom(object, design, ...)

Arguments

`object`	(`AnyHermesData`) input.
`design`	(`matrix`) design matrix.
`...`	additional arguments internally passed to `limma::eBayes()` (`robust`, `trend`, `proportion`, `winsor.tail.p`, `stdev.coef.lim`).

Value

A data frame with columns log2_fc (estimated log2 fold change), stat (moderated t-statistic), p_val (raw p-value), adj_p_val (Benjamini-Hochberg adjusted p-value).

References

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. doi:10.1093/nar/gkv007.

Law CW, Chen Y, Shi W, Smyth GK (2014). “voom: precision weights unlock linear model analysis tools for RNA-seq read counts.” Genome Biology, 15(2), R29. doi:10.1186/gb-2014-15-2-r29.

Examples

object <- hermes_data

# Create the design matrix corresponding to the factor of interest.
design <- model.matrix(~SEX, colData(object))

# Then perform the differential expression analysis.
result <- h_diff_expr_voom(object, design)
head(result)

# Sometimes we might want to specify method details.
result2 <- h_diff_expr_voom(object, design, trend = TRUE, robust = TRUE)
head(result2)
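
To see what the design argument looks like, the model.matrix() call from the example can be run on a small stand-alone data frame. The col_data object below is made up and stands in for colData(object).

```r
# Hypothetical sample data with a two-level factor.
col_data <- data.frame(SEX = factor(c("F", "M", "F", "M")))

# One row per sample: an intercept column plus an indicator for the
# second factor level.
design <- model.matrix(~SEX, col_data)
design
```

With the default treatment contrasts, the second column compares the second level against the first, which serves as the reference level.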

Translation of `Ensembl` to `Entrez` Gene IDs

Description

This helper function queries BioMart to translate Ensembl to Entrez Gene IDs.

Usage

h_ensembl_to_entrez_ids(gene_ids, mart)

Arguments

`gene_ids`	(`character`) `Ensembl` gene IDs.
`mart`	(`Mart`) given `biomaRt::Mart` object.

Value

Character vector of Entrez gene IDs.

Examples

if (interactive()) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  h_ensembl_to_entrez_ids(c("ENSG00000135407", "ENSG00000241644"), mart)
}

Get Annotations from `BioMart`

Description

Helper function to query annotations from biomaRt, for cleaned up gene IDs of a specific ID variable and given biomaRt::Mart.

Usage

h_get_annotation_biomart(gene_ids, id_var, mart)

Arguments

`gene_ids`	(`character`) gene IDs, e.g. `10329`, i.e. already without the `Entrez` `GeneID` prefix, or `ENSG00000241644` for `Ensembl` gene ID.
`id_var`	(`string`) corresponding gene ID variable name in `BioMart`, i.e. `entrezgene_id` or `ensembl_gene_id`.
`mart`	(`Mart`) given `biomaRt::Mart` object.

Value

A data frame with columns:

id_var (depending on what was used)
hgnc_symbol
entrezgene_description
chromosome_name
size
refseq_mrna
refseq_peptide

Examples

if (interactive()) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  h_get_annotation_biomart(c("11185", "10677"), id_var = "entrezgene_id", mart = mart)
}

Conversion of `BioMart` Coordinates into `GRanges`

Description

This function extracts the chromosome number, the start position and the end position of transcripts in a given data.frame with coordinates as returned by biomaRt::getBM() and converts them to a GRanges object.

Usage

h_get_granges_by_id(coords, id)

Arguments

`coords`	(`data.frame`) as returned by `biomaRt::getBM()`, containing the columns `ensembl_gene_id`, `chromosome_name`, `exon_chrom_start`, `exon_chrom_end`.
`id`	(`string`) single `Ensembl` gene ID to convert the coordinates for.

Value

A GRanges object for the respective single gene ID.

Examples

if (interactive()) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  attrs <- c(
    "ensembl_gene_id",
    "ensembl_exon_id",
    "chromosome_name",
    "exon_chrom_start",
    "exon_chrom_end"
  )
  coords <- biomaRt::getBM(
    filters = "entrezgene_id",
    attributes = attrs,
    values = c("11185", "10677"),
    mart = mart
  )
  h_get_granges_by_id(coords, "ENSG00000135407")
}

Total Length of All Exons for Genes

Description

This helper function queries BioMart for lengths of genes by adding up all exon lengths after reducing overlaps.

Usage

h_get_size_biomart(gene_ids, id_var, mart)

Arguments

`gene_ids`	(`character`) gene IDs, e.g. `10329`, i.e. already without the `Entrez` `GeneID` prefix, or `ENSG00000241644` for `Ensembl` gene ID.
`id_var`	(`string`) corresponding gene ID variable name in `BioMart`, i.e. `entrezgene_id` or `ensembl_gene_id`.
`mart`	(`Mart`) given `biomaRt::Mart` object.

Value

Named integer vector indicating the gene lengths.

Examples

if (interactive()) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  h_get_size_biomart("11185", "entrezgene_id", mart)
  h_get_size_biomart("ENSG00000215417", "ensembl_gene_id", mart)
  h_get_size_biomart(c("11185", "10677"), "entrezgene_id", mart)
  h_get_size_biomart(c("ENSG00000135407", "ENSG00000215417"), "ensembl_gene_id", mart)
}
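
The idea of "adding up all exon lengths after reducing overlaps" can be illustrated without a BioMart connection. The following base R sketch is not the package's implementation; `reduced_exon_length()` is a hypothetical helper and the exon coordinates are made up. It assumes 1-based inclusive coordinates and at least one exon.

```r
# Hypothetical helper: total length covered by a set of exons of one gene,
# counting overlapping stretches only once.
reduced_exon_length <- function(start, end) {
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1L) {
      # Overlapping or directly adjacent exon: extend the current block.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap found: close the current block and start a new one.
      total <- total + (cur_end - cur_start + 1L)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1L)
}

# Exons 100-200 and 150-250 overlap and merge into 100-250 (151 bases);
# exon 400-450 adds another 51 bases.
reduced_exon_length(start = c(100, 150, 400), end = c(200, 250, 450))
```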

Predicate for Required Annotations

Description

This helper function determines for each gene in the object whether all required annotation columns are filled.

Usage

h_has_req_annotations(object, annotation_required)

Arguments

`object`	(`AnyHermesData`) input object.
`annotation_required`	(`character`) names of required `annotation` columns for genes.

Value

Named logical vector with one value for each gene in object, which is TRUE if all required annotation columns are filled, and otherwise FALSE.

Examples

object <- hermes_data
result <- h_has_req_annotations(object, "size")
all(result)
rowData(object)$size[1] <- NA # nolint
which(!h_has_req_annotations(object, "size"))
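
The per-gene check can be sketched in base R on a plain data frame. This is an illustration only: `has_req()` and the toy `annotation` table are hypothetical, and the sketch assumes that "filled" simply means "not NA", which may differ from the package's exact definition.

```r
# Toy annotation table with one row per gene (made-up values).
annotation <- data.frame(
  symbol = c("A", NA, "C"),
  size = c(100L, 200L, NA),
  row.names = c("g1", "g2", "g3")
)

# Hypothetical helper: TRUE for genes where all required columns are non-missing.
has_req <- function(df, required) {
  filled <- stats::complete.cases(df[, required, drop = FALSE])
  stats::setNames(filled, rownames(df))
}

has_req(annotation, c("symbol", "size"))
has_req(annotation, "symbol")
```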

Helper Function For Matching Map Values to Names

Description

This is used by the rename method. It wraps the assertions and the matching used several times.

Usage

h_map_pos(names, map)

Arguments

`names`	(`character`) original names.
`map`	(named `character`) the mapping vector from old (value) to new (name) names. All values must be included in `names`.

Value

Integer vector of the positions of the map values in the names.

Examples

h_map_pos(c("a", "b"), c(d = "b"))
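
To show how such positions are typically used for renaming, here is a small base R sketch. `map_pos()` is a hypothetical stand-in built on `match()`, not the package's code, and the column names are made up.

```r
# Hypothetical stand-in: positions of the map values within the original names.
map_pos <- function(original, map) {
  stopifnot(is.character(original), is.character(map), !is.null(names(map)))
  stopifnot(all(map %in% original))
  match(map, original)
}

old_names <- c("HGNC", "Chromosome", "size")
map <- c(symbol = "HGNC", chromosome = "Chromosome") # new name = old value

# Replace the matched positions by the new names.
pos <- map_pos(old_names, map)
new_names <- old_names
new_names[pos] <- names(map)
new_names
```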

Parenthesize a Character Vector

Description

This helper function adds parentheses around each element of a character vector.

Usage

h_parens(x)

Arguments

`x`	(`character`) inputs which should be parenthesized.

Value

Character vector with parentheses, except when x is a blank string in which case it is returned unaltered.

Examples

h_parens("bla")
h_parens("")
h_parens(c("bla", "bli"))

Calculation of R2 Matrix between Sample Variables and Principal Components

Description

This function processes sample variables from AnyHermesData and the corresponding principal components matrix, and then generates the matrix of R2 values.

Usage

h_pca_df_r2_matrix(pca, df)

Arguments

`pca`	(`matrix`) comprises principal components generated by `calc_pca()`.
`df`	(`data.frame`) from the `SummarizedExperiment::colData()` of an `AnyHermesData` object.

Details

Note that only the df columns which are numeric, character, factor or logical are included in the resulting matrix, because other variable types are not supported.
In addition, df columns which are constant, all NA, or character or factor columns with too many levels are also dropped before the analysis.

Value

A matrix with R2 values for all combinations of sample variables and principal components.

Examples

object <- hermes_data %>%
  add_quality_flags() %>%
  filter() %>%
  normalize()

# Obtain the principal components.
pca <- calc_pca(object)$x

# Obtain the `colData` as a `data.frame`.
df <- as.data.frame(colData(object))

# Correlate them.
r2_all <- h_pca_df_r2_matrix(pca, df)
str(r2_all)

# We can see that only about half of the columns from `df` were
# used for the correlations.
ncol(r2_all)
ncol(df)

Calculation of R2 between Sample Variable and Principal Components

Description

This helper function calculates R2 values between one sample variable from AnyHermesData and all Principal Components (PCs) separately (one linear model is fit for each PC).

Usage

h_pca_var_rsquared(pca, x)

Arguments

`pca`	(`matrix`) principal components matrix generated by `calc_pca()`.
`x`	(`vector`) values of one sample variable from an `AnyHermesData` object.

Details

Note that in case there are estimation problems for any of the PCs, then NA will be returned for those.

Value

A vector with R2 values for each principal component.

Examples

object <- hermes_data %>%
  add_quality_flags() %>%
  filter() %>%
  normalize()

# Obtain the principal components.
pca <- calc_pca(object)$x

# Obtain the sample variable.
x <- colData(object)$AGE18

# Correlate them.
r2 <- h_pca_var_rsquared(pca, x)
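
The "one linear model per PC" idea can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the package's implementation: the `pca` matrix and the variable `x` are simulated, each PC is regressed on `x`, and a failing fit is mapped to NA.

```r
set.seed(123)
n_samples <- 20

# Simulated principal components matrix (samples in rows, PCs in columns).
pca <- matrix(
  rnorm(n_samples * 3),
  nrow = n_samples,
  dimnames = list(NULL, paste0("PC", 1:3))
)

# Simulated sample variable that is strongly related to PC1 only.
x <- 2 * pca[, "PC1"] + rnorm(n_samples, sd = 0.1)

# Fit one linear model per PC and extract its R2 value.
r2 <- apply(pca, 2, function(pc) {
  tryCatch(summary(lm(pc ~ x))$r.squared, error = function(e) NA_real_)
})
round(r2, 3)
```

In this simulation the R2 value for PC1 is close to 1, because `x` was constructed from PC1 with little noise.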

Make a Short List of a Character Vector

Description

This helper function makes a short list string, e.g. "a, b, ..., z" out of a character vector, e.g. letters.

Usage

h_short_list(x, sep = ", ", thresh = 3L)

Arguments

`x`	(`character`) input which should be listed.
`sep`	(`string`) separator to use.
`thresh`	(`count`) threshold to use: if the length of `x` is larger than this, then the list will be shortened using the `...` ellipsis.

Value

String with the short list.

Examples

h_short_list(letters)
h_short_list(letters[1:3])
h_short_list(LETTERS[1:5], sep = ";", thresh = 5L)
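
The shortening logic can be sketched in a few lines of base R. `short_list()` is a hypothetical re-implementation for illustration; it assumes that the first two elements and the last element are kept, as in the "a, b, ..., z" example in the Description.

```r
# Hypothetical re-implementation: keep the first two and the last element
# when the input is longer than the threshold.
short_list <- function(x, sep = ", ", thresh = 3L) {
  n <- length(x)
  if (n > thresh) {
    x <- c(x[1:2], "...", x[n])
  }
  paste(x, collapse = sep)
}

short_list(letters)
short_list(letters[1:3])
short_list(LETTERS[1:5], sep = ";", thresh = 5L)
```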

Stripping Prefix from Gene IDs

Description

This helper function removes the prefix and possible delimiter from a vector of gene IDs, such that only the digits are returned.

Usage

h_strip_prefix(gene_ids, prefix)

Arguments

`gene_ids`	(`character`) original gene IDs including prefix and optional delimiter before the digits.
`prefix`	(`string`) common prefix to be stripped away from `gene_ids`.

Value

Character vector that contains only the digits for each gene ID.

Note

This is currently used to strip away the GeneID prefix from Entrez gene IDs so that they can be queried from BioMart.

Examples

h_strip_prefix(c("GeneID:11185", "GeneID:10677"), prefix = "GeneID")
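
A regular expression is one way to express this kind of prefix removal. The sketch below is illustrative only: `strip_prefix()` is a hypothetical helper, and it assumes that the prefix contains no regular expression metacharacters and is followed by at most one punctuation character.

```r
# Hypothetical helper: remove a leading prefix and one optional delimiter.
strip_prefix <- function(gene_ids, prefix) {
  pattern <- paste0("^", prefix, "[[:punct:]]?")
  sub(pattern, "", gene_ids)
}

strip_prefix(c("GeneID:11185", "GeneID:10677"), prefix = "GeneID")
```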

Creation of Unique Labels

Description

This helper function generates a set of unique labels given unique IDs and not necessarily unique names.

Usage

h_unique_labels(ids, nms = NULL)

Arguments

`ids`	(`character` or `NULL`) unique IDs.
`nms`	(`character` or `NULL`) not necessarily unique names if provided.

Value

Character vector where empty names are replaced by the IDs and non-unique names are made unique by appending the IDs in parentheses.

Examples

h_unique_labels(c("1", "2", "3"), c("A", "B", "A"))
h_unique_labels(NULL)
h_unique_labels(c("1", "2", "3"))
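
The labelling rule from the Value section can be sketched in base R as follows. `unique_labels()` is a hypothetical re-implementation, and the exact label format (a space before the parenthesized ID) is an assumption made for this illustration.

```r
# Hypothetical re-implementation of the labelling rule.
unique_labels <- function(ids, nms = NULL) {
  if (is.null(nms)) {
    return(ids)
  }
  empty <- is.na(nms) | nms == ""
  # Names that occur more than once need the ID appended.
  dup <- nms %in% nms[duplicated(nms)] & !empty
  labels <- nms
  labels[empty] <- ids[empty]
  labels[dup] <- paste0(nms[dup], " (", ids[dup], ")")
  labels
}

unique_labels(c("1", "2", "3"), c("A", "B", "A"))
unique_labels(c("1", "2", "3"), c("A", "", "C"))
```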

Example `HermesData` Data

Description

This example HermesData is created from the underlying SummarizedExperiment::SummarizedExperiment object by renaming descriptors to align with standard specification. It already contains the required columns in rowData and colData.

Usage

hermes_data

Format

A HermesData object with 20 samples covering 5085 features (Entrez gene IDs).

Source

This is an artificial dataset designed to resemble real data.

`HermesData` and `RangedHermesData`

Description

The HermesData class is an extension of SummarizedExperiment::SummarizedExperiment with additional validation criteria.

Usage

HermesData(object)

HermesDataFromMatrix(counts, ...)

Arguments

`object`	(`SummarizedExperiment`) input to create the `HermesData` object from. If this is a `RangedSummarizedExperiment`, then the result will be `RangedHermesData`.
`counts`	(`matrix`) counts to create the `HermesData` object from.
`...`	additional arguments, e.g. `rowData`, `colData`, etc. passed to `SummarizedExperiment::SummarizedExperiment()` internally. Note that if `rowRanges` is passed instead of `rowData`, then the result will be a `RangedHermesData` object.

Details

The additional criteria are:

The first assay must be counts containing non-missing, integer, non-negative values. Note that rename() can be used to edit the assay name to counts if needed.
The following columns must be in rowData:
- symbol (also often called HGNC or similar, example: "INMT")
- desc (the gene name, example: "indolethylamine N-methyltransferase")
- chromosome (the chromosome as string, example: "7")
- size (the size of the gene in base pairs, e.g. 5468)
- low_expression_flag (can be populated with add_quality_flags())
The following columns must be in colData:
- low_depth_flag (can be populated with add_quality_flags())
- tech_failure_flag (can be populated with add_quality_flags())
The object must have unique row and column names. The row names are the gene names and the column names are the sample names.

Analogously, RangedHermesData is an extension of SummarizedExperiment::RangedSummarizedExperiment and has the same additional validation requirements. Methods can be defined for both classes at the same time with the AnyHermesData signature.

A Biobase::ExpressionSet object can be imported by using the SummarizedExperiment::makeSummarizedExperimentFromExpressionSet() function to first convert it to a SummarizedExperiment::SummarizedExperiment object before converting it again into a HermesData object.

Value

An object of class AnyHermesData (HermesData or RangedHermesData).

Slots

prefix: common prefix of the gene IDs (row names).

Note

Note that we use S4Vectors::setValidity2() to define the validity method, which allows us to turn off the validity checks in internal functions where intermediate objects may not be valid within the scope of the function.
It can be helpful to convert character and logical variables to factors in colData() (before or after the HermesData creation). We provide the utility function df_cols_to_factor() to simplify this task, but leave it to the user to allow for full control of the details.

Examples

# Convert an `ExpressionSet` to a `RangedSummarizedExperiment`.
ranged_summarized_experiment <- makeSummarizedExperimentFromExpressionSet(expression_set)

# Then convert to `RangedHermesData`.
HermesData(ranged_summarized_experiment)

# Create objects starting from a `SummarizedExperiment`.
hermes_data <- HermesData(summarized_experiment)
hermes_data

# Create objects from a matrix. Note that additional arguments are not required but possible.
counts_matrix <- assay(summarized_experiment)
counts_hermes_data <- HermesDataFromMatrix(counts_matrix)

Inner Joining a Gene Data Set with a CDISC Data Set

Description

This function is useful for joining gene data with CDISC data sets.

Usage

inner_join_cdisc(
  gene_data,
  cdisc_data,
  patient_key = "USUBJID",
  additional_keys = character()
)

Arguments

`gene_data`	(`data.frame` or `DataFrame`) genetic data.
`cdisc_data`	(`data.frame`) CDISC data (typically patient level data).
`patient_key`	(`string`) patient identifier.
`additional_keys`	(`character`) potential additional keys for the two data sets.

Value

A data.frame which contains columns from both data sets merged by the keys.

Note

Columns which are contained in both data sets but are not specified as keys are taken from gene_data and not from cdisc_data.

Examples

gene_data <- col_data_with_genes(hermes_data, "counts", gene_spec("GeneID:1820"))
cdisc_data <- data.frame(
  USUBJID = head(gene_data$USUBJID, 10),
  extra = 1:10
)
result <- inner_join_cdisc(gene_data, cdisc_data)
result
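
The behavior described in the Note (shared non-key columns are taken from `gene_data`) can be sketched with base R `merge()`. This is only an illustration with made-up data frames, not the package's implementation.

```r
# Made-up data: both tables contain AGE, which is not a key.
gene_data <- data.frame(
  USUBJID = c("P1", "P2", "P3"),
  AGE = c(30, 40, 50),
  gene_count = c(5L, 0L, 12L),
  stringsAsFactors = FALSE
)
cdisc_data <- data.frame(
  USUBJID = c("P2", "P3", "P4"),
  AGE = c(41, 51, 61),
  ARM = c("A", "B", "A"),
  stringsAsFactors = FALSE
)
keys <- "USUBJID"

# Drop shared non-key columns from the CDISC data so that the gene data wins.
shared <- setdiff(intersect(names(gene_data), names(cdisc_data)), keys)
cdisc_reduced <- cdisc_data[, setdiff(names(cdisc_data), shared), drop = FALSE]

# Inner join: only patients present in both data sets are kept.
joined <- merge(gene_data, cdisc_reduced, by = keys)
joined
```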

Checking for Empty `SummarizedExperiment`

Description

This method checks whether a SummarizedExperiment::SummarizedExperiment object is empty.

Usage

## S4 method for signature 'SummarizedExperiment'
isEmpty(x)

Arguments

`x`	(`SummarizedExperiment`) object to check.

Value

Flag whether the object is empty.

Examples

isEmpty(summarized_experiment)
isEmpty(summarized_experiment[NULL, ])
isEmpty(hermes_data)

`lapply` method for `MultiAssayExperiment`

Description

Applies a function to all experiments in a MultiAssayExperiment (MAE).

Usage

## S4 method for signature 'MultiAssayExperiment'
lapply(X, FUN, safe = TRUE, ...)

Arguments

`X`	(`MultiAssayExperiment`) input.
`FUN`	(`function`) to be applied to each experiment in `X`.
`safe`	(`flag`) whether this method should skip experiments where the function fails.
`...`	additional arguments passed to `FUN`.

Value

MultiAssayExperiment object with specified function applied.

Examples

object <- multi_assay_experiment
result <- lapply(object, normalize, safe = TRUE)
# Similarly, all experiments in an MAE can be converted to HermesData class:
result <- lapply(object, HermesData, safe = TRUE)
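
The idea behind `safe = TRUE` can be sketched on an ordinary list in base R. `safe_lapply()` is a hypothetical helper for illustration; the assumption that failing elements are dropped with a warning is made for this sketch and is not confirmed by the documentation above.

```r
# Hypothetical helper: apply FUN to each element and skip elements where it fails.
safe_lapply <- function(X, FUN, ...) {
  outcomes <- lapply(X, function(el) {
    tryCatch(FUN(el, ...), error = function(e) e)
  })
  failed <- vapply(outcomes, inherits, logical(1), what = "error")
  if (any(failed)) {
    warning("function failed on: ", paste(names(X)[failed], collapse = ", "))
  }
  outcomes[!failed]
}

# sqrt() fails on the character element "b", which is therefore skipped.
experiments <- list(a = 1, b = "x", c = 4)
kept <- suppressWarnings(safe_lapply(experiments, sqrt))
names(kept)
```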

Metadata Accessor and Setter

Description

These methods access or set the metadata in an AnyHermesData object.

Arguments

`x`	(`AnyHermesData`) object to access the metadata from.
`value`	(`list`) the list to replace the current metadata with.

Value

The metadata which is a list.

Note

Note that this just inherits S4Vectors::metadata,Annotated-method().

Examples

a <- hermes_data
metadata(a)
metadata(a) <- list(new = "my metadata")
metadata(a)

Example `MultiAssayExperiment` Data

Description

This example MultiAssayExperiment::MultiAssayExperiment can be used as test data.

Usage

multi_assay_experiment

Format

A MultiAssayExperiment::MultiAssayExperiment object with 3 separate HermesData objects.

The first object contains 5 samples and covers 1000 features (Entrez gene IDs).
The second object contains 9 samples with 2500 features.
The third object contains 6 samples with 1300 features.

Source

This is an artificial dataset designed to resemble real data.

Normalization of `AnyHermesData` Objects

Description

The normalize() method normalizes the input AnyHermesData according to one or more specified normalization methods. The results are saved as additional assays in the object.

Possible normalization methods (which are implemented with separate helper functions):

cpm: Counts per Million (CPM). Separately by sample, the original counts of the genes are divided by the library size of this sample, and multiplied by one million. This is the appropriate normalization for between-sample comparisons.
rpkm: Reads per Kilobase of transcript per Million reads mapped (RPKM). Each gene count is divided by the gene size (in kilobases) and then again divided by the library size of the sample (in millions). This allows for within-sample comparisons, as it takes into account the gene sizes: at the same expression level, longer genes tend to have more counts than shorter genes.
tpm: Transcripts per Million (TPM). This addresses the problem of RPKM being inconsistent across samples (which can be seen from the fact that the sum of all RPKM values varies from sample to sample). Therefore here we divide the RPKM by the sum of all RPKM values for each sample, and multiply by one million.
voom: VOOM normalization. This is essentially just a slight variation of CPM where a prior_count of 0.5 is combined with lib_sizes increased by 1 for each sample. Note that this is not required for the corresponding differential expression analysis, but just provided as a complementary experimental normalization approach here.
vst: Variance stabilizing transformation. This is to transform the normalized count data for all genes into approximately homoskedastic values (having constant variance).
rlog: Regularized log transformation. This transforms the count data to the log2 scale such that the resulting values are approximately homoskedastic.

Usage

## S4 method for signature 'AnyHermesData'
normalize(
  object,
  methods = c("cpm", "rpkm", "tpm", "voom", "vst"),
  control = control_normalize(),
  ...
)

h_cpm(object, control = control_normalize())

h_rpkm(object, control = control_normalize())

h_tpm(object, control = control_normalize())

h_voom(object, control = control_normalize())

h_vst(object, control = control_normalize())

h_rlog(object, control = control_normalize())

Arguments

`object`	(`AnyHermesData`) object to normalize.
`methods`	(`character`) which normalization methods to use, see details.
`control`	(named `list`) settings produced by `control_normalize()`.
`...`	not used.

Value

The AnyHermesData object with additional assays containing the normalized counts. The control is saved in the metadata of the object for future reference.

Functions

h_cpm(): calculates the Counts per Million (CPM) normalized counts.
h_rpkm(): calculates the Reads per Kilobase per Million (RPKM) normalized counts.
h_tpm(): calculates the Transcripts per Million (TPM) normalized counts.
h_voom(): calculates the VOOM normalized counts.
h_vst(): variance stabilizing transformation (vst) from DESeq2 package.
h_rlog(): regularized log transformation (rlog) from DESeq2 package.

Examples

a <- hermes_data

# By default, log values are used with a prior count of 1 added to original counts.
result <- normalize(a)
assayNames(result)
tpm <- assay(result, "tpm")
tpm[1:3, 1:3]

# We can also work on original scale.
result_orig <- normalize(a, control = control_normalize(log = FALSE))
tpm_orig <- assay(result_orig, "tpm")
tpm_orig[1:3, 1:3]

# Separate calculation of the CPM normalized counts.
counts_cpm <- h_cpm(a)
str(counts_cpm)

# Separate calculation of the RPKM normalized counts.
counts_rpkm <- h_rpkm(a)
str(counts_rpkm)

# Separate calculation of the TPM normalized counts.
counts_tpm <- h_tpm(a)
str(counts_tpm)

# Separate calculation of the VOOM normalized counts.
counts_voom <- h_voom(a)
str(counts_voom)

# Separate calculation of the vst transformation.
counts_vst <- h_vst(a)
str(counts_vst)

# Separate calculation of the rlog transformation.
counts_rlog <- h_rlog(a)
str(counts_rlog)
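
The CPM, RPKM, TPM and voom definitions from the Description can be written out directly in base R. The sketch below is for illustration on a made-up count matrix and works on the original (non-log) scale for the first three; it is not the package's implementation, and all object names are assumptions.

```r
# Toy count matrix: 3 genes (rows) by 2 samples (columns), made-up values.
counts <- matrix(
  c(10, 20, 70, 30, 30, 40),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)
gene_size <- c(g1 = 1000, g2 = 2000, g3 = 500) # gene sizes in base pairs
lib_size <- colSums(counts) # library size per sample

# CPM: counts divided by the library size, multiplied by one million.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# RPKM: CPM divided by the gene size in kilobases.
rpkm <- cpm / (gene_size / 1e3)

# TPM: RPKM rescaled so that each sample sums to one million.
tpm <- sweep(rpkm, 2, colSums(rpkm), "/") * 1e6

# voom-style log2 CPM: prior count of 0.5, library sizes increased by 1.
voom_like <- log2(sweep(counts + 0.5, 2, lib_size + 1, "/") * 1e6)

colSums(rpkm) # varies between samples
colSums(tpm)  # one million for every sample
```

Comparing the column sums of `rpkm` and `tpm` shows the inconsistency that TPM is meant to address.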

Prefix Accessor

Description

Generic function to access the prefix from an object.

Usage

prefix(object, ...)

Arguments

`object`	(`AnyHermesData`) input.
`...`	additional arguments.

Value

The prefix slot contents.

Examples

a <- hermes_data
prefix(a)

Query Gene Annotations from a Connection

Description

The generic function query() is the interface for querying gene annotations from a data base connection.

Usage

query(genes, connection)

## S4 method for signature 'character,ConnectionBiomart'
query(genes, connection)

Arguments

`genes`	(`character`) gene IDs.
`connection`	(connection class) data base connection object.

Details

A method is provided for the ConnectionBiomart class. However, the framework is extensible: It is simple to add new connections and corresponding query methods for other data bases, e.g. company internal data bases. Please make sure to follow the required format of the returned value.
The BioMart queries might not return information for all the genes. This can be due to different versions being used in the gene IDs and the queried Ensembl data base.

Value

A S4Vectors::DataFrame with the gene annotations. It is required that:

The rownames are identical to the input genes.
The colnames are equal to the annotation columns .row_data_annotation_cols.
Therefore, missing information needs to be properly included in the DataFrame with NA entries.

Examples

if (interactive()) {
  object <- hermes_data
  connection <- connect_biomart(prefix(object))
  result <- query(genes(object), connection)
  head(result)
  head(annotation(object))
}

Row Binding of `AnyHermesData` Objects

Description

This method combines AnyHermesData objects with the same samples but different features of interest (rows in assays).

Arguments

`...`	(`AnyHermesData`) objects to row bind.

Value

The combined AnyHermesData object.

Note

Note that this just inherits SummarizedExperiment::rbind,SummarizedExperiment-method(). When binding an AnyHermesData object with a SummarizedExperiment::SummarizedExperiment object, the result will be a SummarizedExperiment::SummarizedExperiment object (the more general class).
Note that we need to have unique gene IDs (row names) and the same prefix across the combined object.

Examples

a <- hermes_data[1:2542, ]
b <- hermes_data[2543:5085, ]
result <- rbind(a, b)
class(result)

Renaming Contents of `SummarizedExperiment` Objects

Description

This method renames columns of the rowData and colData, as well as assays, of SummarizedExperiment::SummarizedExperiment objects. This increases the flexibility since renaming can be done before conversion to a HermesData object.

Usage

## S4 method for signature 'SummarizedExperiment'
rename(
  x,
  row_data = character(),
  col_data = character(),
  assays = character(),
  ...
)

## S4 method for signature 'data.frame'
rename(x, ...)

Arguments

`x`	(`SummarizedExperiment`) object to rename contents in.
`row_data`	(named `character`) mapping from existing (right-hand side values) to new (left-hand side names) column names of `rowData`.
`col_data`	(named `character`) mapping from existing (right-hand side values) to new (left-hand side names) column names of `colData`.
`assays`	(named `character`) mapping from existing (right-hand side values) to new (left-hand side names) assay names.
`...`	additional arguments (not used here).

Value

The SummarizedExperiment::SummarizedExperiment object with renamed contents.

Examples

x <- summarized_experiment
# Deliberately use a non-standard assay name in this example.
assayNames(x) <- "count"

# Rename `HGNC` to `symbol` in the `rowData`.
x <- rename(x, row_data = c(symbol = "HGNC"))
head(names(rowData(x)))

# Rename `LowDepthFlag` to `low_depth_flag` in `colData`.
x <- rename(x, col_data = c(low_depth_flag = "LowDepthFlag"))
tail(names(colData(x)))

# Rename assay `count` to `counts`.
x <- rename(x, assays = c(counts = "count"))
assayNames(x)

Sample IDs Accessor

Description

Access the sample IDs, i.e. column names, of an AnyHermesData object with a nicely named accessor method.

Usage

## S4 method for signature 'AnyHermesData'
samples(object)

Arguments

`object`	(`AnyHermesData`) input.

Value

The character vector with the sample IDs.

Examples

a <- hermes_data
samples(a)

Set Technical Failure Flags

Description

Setter function which allows the user to define a sample manually as a technical failure.

Usage

set_tech_failure(object, sample_ids)

Arguments

`object`	(`AnyHermesData`) input.
`sample_ids`	(`character`) sample IDs to be flagged as technical failures.

Value

AnyHermesData object with modified technical failure flags.

Examples

# Manually flag technical failures in an `AnyHermesData` object.
object <- hermes_data
get_tech_failure(object)["06520101B0017R"]
result <- set_tech_failure(object, c("06520101B0017R", "06520047C0017R"))
get_tech_failure(result)["06520101B0017R"]

Show Method for `AnyHermesData` Objects

Description

A show method that displays high-level information of AnyHermesData objects.

Usage

## S4 method for signature 'HermesData'
show(object)

## S4 method for signature 'RangedHermesData'
show(object)

Arguments

`object`	(`AnyHermesData`) input.

Value

None (invisible NULL), only used for the side effect of printing to the console.

Note

The same method is used for both HermesData and RangedHermesData objects. We need to define this separately to have this method used instead of the one inherited from SummarizedExperiment::SummarizedExperiment.

Examples

object <- hermes_data
object

Subsetting `AnyHermesData` Objects

Description

This method subsets AnyHermesData objects, based on expressions involving the rowData columns and the colData columns.

Arguments

`x`	(`AnyHermesData`) object to subset from.
`subset`	(`expression`) logical expression based on the `rowData` columns to select genes.
`select`	(`expression`) logical expression based on the `colData` columns to select samples.

Value

The subsetted AnyHermesData object.

Note

Note that this just inherits SummarizedExperiment::subset,SummarizedExperiment-method().

Examples

a <- hermes_data
a

# Subset both genes and samples.
subset(a, subset = low_expression_flag, select = DISCSTUD == "N")

# Subset only genes.
subset(a, subset = chromosome == "2")

# Subset only samples.
subset(a, select = AGE > 18)

Example `SummarizedExperiment` Data

Description

This example SummarizedExperiment::SummarizedExperiment can be used to create a HermesData object. It already contains the required columns in rowData and colData.

Usage

summarized_experiment

Format

A SummarizedExperiment::SummarizedExperiment object with 20 samples covering 5085 features (Entrez gene IDs).

Source

This is an artificial dataset designed to resemble real data.

Summary Method for `AnyHermesData` Objects

Description

Provides a concise summary of the content of AnyHermesData objects.

Usage

summary(object, ...)

## S4 method for signature 'AnyHermesData'
summary(object)

## S4 method for signature 'HermesDataSummary'
show(object)

Arguments

`object`	(`HermesDataSummary`) result from the summary method applied to an `AnyHermesData` object.
`...`	not used.

Value

An object of the corresponding summary class, here HermesDataSummary.

Methods (by class)

summary(AnyHermesData): A summary method for AnyHermesData objects that creates a HermesDataSummary object.
show(HermesDataSummary): A show method that prints a summary description of the HermesDataSummary object generated by the summary() method.

Examples

object <- hermes_data
object_summary <- summary(object)

# We can access parts of this S4 object with the `slot` function.
str(object_summary)
slotNames(object_summary)
slot(object_summary, "lib_sizes")

# Just calling the summary method like this will use the `show()` method.
summary(object)

Derivation of Top Genes

Description

top_genes() creates a HermesDataTopGenes object, which extends data.frame. It contains two columns:

expression: containing the statistic values calculated by summary_fun across columns.
name: the gene names.

The corresponding autoplot() method then visualizes the result as a barplot.

Usage

top_genes(
  object,
  assay_name = "counts",
  summary_fun = rowMeans,
  n_top = if (is.null(min_threshold)) 10L else NULL,
  min_threshold = NULL
)

## S4 method for signature 'HermesDataTopGenes'
autoplot(
  object,
  x_lab = "HGNC gene names",
  y_lab = paste0(object@summary_fun_name, "(", object@assay_name, ")"),
  title = "Top most expressed genes"
)

Arguments

`object`	(`AnyHermesData`) input.
`assay_name`	(`string`) name of the assay to use for the sorting of genes.
`summary_fun`	(`function`) summary statistics function to apply across the samples in the assay resulting in a numeric vector with one value per gene.
`n_top`	(`count` or `NULL`) selection criterion based on the number of entries.
`min_threshold`	(`number` or `NULL`) selection criterion based on a minimum summary statistic threshold.
`x_lab`	(`string`) x-axis label.
`y_lab`	(`string`) y-axis label.
`title`	(`string`) plot title.

Details

The data frame is sorted in descending order of expression and only the top entries according to the selection criteria are included.
Note that exactly one of the arguments n_top and min_threshold must be provided.
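
The ranking and selection described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `sketch_top_genes()` and the toy `counts` matrix are hypothetical, and the use of a strict `>` comparison for `min_threshold` is an assumption.

```r
# Hypothetical stand-in for an assay: 5 genes (rows) x 3 samples (columns).
counts <- matrix(
  c(10, 20, 30,
    500, 400, 600,
    0, 1, 2,
    100, 100, 100,
    50, 60, 70),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3))
)

# Illustrative helper (not part of hermes).
sketch_top_genes <- function(x,
                             summary_fun = rowMeans,
                             n_top = NULL,
                             min_threshold = NULL) {
  # Exactly one of the two selection criteria must be provided.
  stopifnot(xor(is.null(n_top), is.null(min_threshold)))
  # One summary statistic per gene, computed across samples.
  stat <- summary_fun(x)
  df <- data.frame(expression = unname(stat), name = names(stat))
  # Sort in descending order of the statistic.
  df <- df[order(df$expression, decreasing = TRUE), ]
  if (!is.null(n_top)) {
    df <- head(df, n_top)
  } else {
    # Assumption: strictly greater than the threshold.
    df <- df[df$expression > min_threshold, ]
  }
  rownames(df) <- NULL
  df
}

top2 <- sketch_top_genes(counts, n_top = 2L)
above <- sketch_top_genes(counts, min_threshold = 50)
```

Here `top2` keeps the two genes with the highest mean count, while `above` keeps every gene whose mean count exceeds 50.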

Value

A HermesDataTopGenes object.

Functions

autoplot(HermesDataTopGenes): Creates a bar plot from a HermesDataTopGenes object, where the y-axis shows the expression statistic for each of the top genes on the x-axis.

Examples

object <- hermes_data

# Default uses average of raw counts across samples to rank genes.
top_genes(object)

# Instead of showing top 10 genes, can also set a minimum threshold on average counts.
top_genes(object, n_top = NULL, min_threshold = 50000)

# We can also use the maximum of raw counts across samples, by specifying a different
# summary statistics function.
result <- top_genes(object, summary_fun = rowMax)

# Finally we can produce barplots based on the results.
autoplot(result, title = "My top genes")
autoplot(result, y_lab = "Counts", title = "My top genes")

Internal Helper Functions for Validation of `AnyHermesData` Objects

Description

These functions are used internally only and therefore not exported. They work on SummarizedExperiment::SummarizedExperiment objects, and AnyHermesData objects are defined by successfully passing these validation checks.

Usage

validate_counts(object)

validate_cols(required, actual)

validate_row_data(object)

validate_col_data(object)

validate_names(object)

validate_prefix(object)

Arguments

`object`	(`SummarizedExperiment`) object to validate.
`required`	(`character`) required column names.
`actual`	(`character`) actual column names.

Value

A character vector with the validation failure messages, or NULL in case validation passes.

Functions

validate_counts(): validates that the first assay is counts containing non-missing, integer, non-negative values.
validate_cols(): validates that required column names are contained in actual column names.
validate_row_data(): validates that the object contains rowData with required columns.
validate_col_data(): validates that the object contains colData with required columns.
validate_names(): validates that the object contains row and column names.
validate_prefix(): validates that the object prefix is a string and only contains alphabetic characters.
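
The return convention above (a character vector of failure messages, or NULL when validation passes) can be illustrated with a minimal base R sketch. The helpers below are hypothetical and are not the internal hermes functions: they operate on a plain matrix and on character vectors rather than on a SummarizedExperiment, and the message texts are made up.

```r
# Illustrative check of a count matrix (not part of hermes).
sketch_validate_counts <- function(counts) {
  msg <- NULL
  if (!is.integer(counts)) {
    msg <- c(msg, "counts must be of integer type")
  }
  if (anyNA(counts)) {
    msg <- c(msg, "missing values in counts")
  }
  if (any(counts < 0, na.rm = TRUE)) {
    msg <- c(msg, "negative values in counts")
  }
  # NULL when all checks pass, otherwise the collected messages.
  msg
}

# Illustrative check that required column names are present (not part of hermes).
sketch_validate_cols <- function(required, actual) {
  missing_cols <- setdiff(required, actual)
  if (length(missing_cols) > 0) {
    paste("required columns missing:", paste(missing_cols, collapse = ", "))
  } else {
    NULL
  }
}

good <- sketch_validate_counts(matrix(1:4, nrow = 2))
bad <- sketch_validate_counts(matrix(c(1L, NA, -1L, 2L), nrow = 2))
cols <- sketch_validate_cols(c("a", "b"), c("a", "c"))
```

Collecting messages instead of stopping at the first failure lets a caller report all problems at once, which is the pattern S4 validity methods expect.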

Wrap in MAE

Description

This helper function wraps a SummarizedExperiment object into a MultiAssayExperiment (MAE) object.

Usage

wrap_in_mae(x, name = deparse(substitute(x)))

Arguments

`x`	(`SummarizedExperiment`) input to create the MAE object from.
`name`	(`string`) experiment name to use in the MAE for `x`.

Value

The MAE object with the only experiment being x having the given name.

Examples

mae <- wrap_in_mae(summarized_experiment)
mae[["summarized_experiment"]]
