Here we will walk through how to update _envsetup.yml to meet your needs. The configuration is currently set up to address two things, paths and autos.
The paths configuration adds envsetup:paths to your search path. This environment contains all of the objects needed to point to the different directories in your environment.
Scripts typically execute in different environments depending on your workflow. Here we have a workflow where multiple developers write scripts in dev, the scripts move to qa for quality checks and sign-off, and then move to prod, where they are executed for delivery.
Each execution environment might have a slightly different configuration. The config file lets us change the configuration to meet the needs of each environment.
This is best illustrated with an example. For this example, we will focus on setting up one environment, the default configuration.
If you wish to have different configurations based on your environment, you will need to expand this example to fit your needs.
First, we will need to read in data, write out results and save the script for future reference for a project we’ll call project1. So we need an object to point to each of these locations, and we add the data, output and programs objects to our config.
default:
  paths:
    data: "/demo/DEV/username/project1/data"
    output: "/demo/DEV/username/project1/output"
    programs: "/demo/DEV/username/project1/programs"
A working example is even better, so let’s create a temporary directory and store this config file as _envsetup.yml.
library(envsetup)
#>
#> Attaching package: 'envsetup'
#> The following object is masked from 'package:base':
#>
#> library
# create temporary directory
dir <- fs::file_temp()
dir.create(dir)
config_path <- file.path(dir, "_envsetup.yml")
# write a config file to it
file_conn <- file(config_path)
writeLines(
  "default:
  paths:
    data: '/demo/DEV/username/project1/data'
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'", file_conn)
close(file_conn)
We can then call rprofile(), passing in this configuration.
# Set up the project
envsetup_config <- config::get(file = config_path)
rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
We now have data, output and programs available to us in our search path within envsetup:paths. Let’s take a look:
objects("envsetup:paths")
#> [1] "auto_stored_envsetup_config" "data"
#> [3] "output" "programs"
data
#> [1] "/demo/DEV/username/project1/data"
output
#> [1] "/demo/DEV/username/project1/output"
programs
#> [1] "/demo/DEV/username/project1/programs"
Alright!
Now let’s go one step further and imagine a programmer we’ll call Tidy McVerse. Miss McVerse needs to read in some data, and this data was in the development area when she started programming.
This is great! We already have the object data that points to “/demo/DEV/username/project1/data”.
Halfway through programming, the data was considered production ready and moved from “/demo/DEV/username/project1/data” to “/demo/PROD/project1/data”. Miss McVerse should not need to change her programs now; she needs a way to read data that is smarter than the average bear.
The same object she uses to read in the data should work whether the data is in “/demo/DEV/username/project1/data” or “/demo/PROD/project1/data”.
Let’s create a config to keep Tidy McVerse happy and focused on the results, not data locations.
Here we have a configuration where we execute some R code, using !expr, to build a list of our possible data sources. See the config package documentation for details.
default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
Once again, we have a working example if you would like to code along. We will overwrite the previous config file with our new config.
file_conn <- file(config_path)
writeLines(
  paste0(
  "default:
  paths:
    data: !expr list(DEV = '", dir, "/demo/DEV/username/project1/data', PROD = '", dir, "/demo/PROD/project1/data')
    output: '", dir, "/demo/DEV/username/project1/output'
    programs: '", dir, "/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'"
  ), file_conn)
close(file_conn)
Now we can re-setup the project.
# Set up the project
envsetup_config <- config::get(file = config_path)
rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
We have data, output and programs available to us in our search path within envsetup:paths, but data is now a named list with two locations. We also now have envsetup_environ, which we will cover in more detail later; for now, just accept that it exists.
objects("envsetup:paths")
#> [1] "auto_stored_envsetup_config" "data"
#> [3] "envsetup_environ" "output"
#> [5] "programs"
data
#> $DEV
#> [1] "/tmp/RtmpyfAIEk/file16d97dce1eac/demo/DEV/username/project1/data"
#>
#> $PROD
#> [1] "/tmp/RtmpyfAIEk/file16d97dce1eac/demo/PROD/project1/data"
output
#> [1] "/tmp/RtmpyfAIEk/file16d97dce1eac/demo/DEV/username/project1/output"
programs
#> [1] "/tmp/RtmpyfAIEk/file16d97dce1eac/demo/DEV/username/project1/programs"
envsetup_environ
#> [1] "DEV"
We can use envsetup::read_path() to help us find the location of the data we would like to read.
Let’s create the directories in our temporary folder structure …
dir.create(file.path(dir, "/demo/DEV/username/project1/data"), recursive = TRUE)
dir.create(file.path(dir, "/demo/PROD/project1/data"), recursive = TRUE)
… and add mtcars to the PROD data directory, “/demo/PROD/project1/data”.
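The save step itself is not shown here. A minimal sketch of it, assuming the data is written with base R’s saveRDS() and that dir is the temporary directory created earlier (prod_data_dir is a name introduced for this sketch, and the fallback on the first lines is only there so the block also runs on its own):

```r
# Assumption: `dir` holds the temporary directory path created earlier.
# When this block is run on its own, `dir` is still the base R function,
# so we create a fresh temporary directory instead.
if (!is.character(dir)) {
  dir <- tempfile()
  dir.create(dir)
}

# Make sure the PROD data directory exists, then save mtcars into it
prod_data_dir <- file.path(dir, "demo/PROD/project1/data")
dir.create(prod_data_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(mtcars, file.path(prod_data_dir, "mtcars.RDS"))
```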
Now we can use read_path(), passing in the path object data, to find where to read mtcars.RDS. The data is only in PROD, so the function returns the path to the PROD copy of mtcars.RDS.
read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpyfAIEk/file16d97dce1eac/demo/PROD/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpyfAIEk/file16d97dce1eac/demo/PROD/project1/data/mtcars.RDS"
Let’s keep going!
What if the data was in DEV and PROD?
Let’s save the same data to DEV …
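That step might look like the following sketch, again assuming base R’s saveRDS() and that dir is the temporary directory created earlier (dev_data_dir is a name introduced for this sketch, and the fallback only exists so the block runs on its own):

```r
# Assumption: `dir` holds the temporary directory path created earlier.
# When this block is run on its own, `dir` is still the base R function,
# so we create a fresh temporary directory instead.
if (!is.character(dir)) {
  dir <- tempfile()
  dir.create(dir)
}

# Make sure the DEV data directory exists, then save the same data into it
dev_data_dir <- file.path(dir, "demo/DEV/username/project1/data")
dir.create(dev_data_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(mtcars, file.path(dev_data_dir, "mtcars.RDS"))
```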
… and see what read_path() returns.
read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpyfAIEk/file16d97dce1eac/demo/DEV/username/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpyfAIEk/file16d97dce1eac/demo/DEV/username/project1/data/mtcars.RDS"
We see the path to DEV now instead of the path to PROD.
To explain this, we will now talk about envsetup_environ, which we set in the config earlier.
When we have multiple paths, as we do here with data, envsetup_environ controls which of them are checked. It acts as an index into the list: the search starts at the element named after the current environment and continues to the end of the list, so any earlier elements are skipped.
In the example below, we set envsetup_environ = 'DEV'. DEV is first in our data list, so all locations are checked until the object is found or the list is exhausted.
default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
Let’s now add an execution environment for PROD. We cannot simply change envsetup_environ from DEV to PROD, or DEV wouldn’t work. We need to add a prod configuration; otherwise, default will be used.
default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'
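The prod block only sets envsetup_environ, so it is worth confirming that the other paths remain available. The sketch below relies on the config package’s inheritance from default; the file cfg_file and the simplified plain-string values (no !expr) are assumptions made purely for illustration.

```r
# Write a cut-down config to a throwaway file (hypothetical example values)
cfg_file <- tempfile(fileext = ".yml")
writeLines(c(
  "default:",
  "  paths:",
  "    data: '/demo/DEV/username/project1/data'",
  "    envsetup_environ: 'DEV'",
  "prod:",
  "  paths:",
  "    envsetup_environ: 'PROD'"
), cfg_file)

# Read the default configuration and the prod configuration
default_cfg <- config::get(file = cfg_file)
prod_cfg <- config::get(file = cfg_file, config = "prod")

# prod overrides envsetup_environ and inherits everything else from default
prod_cfg$paths$envsetup_environ
prod_cfg$paths$data
```

Only the values that differ between environments need to be repeated in the prod block; everything else is taken from default.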
So we will write this new config out …
# overwrite the config file to the temporary directory previously setup
file_conn <- file(config_path)
writeLines(
  paste0(
  "default:
  paths:
    data: !expr list(DEV = '", dir, "/demo/DEV/username/project1/data', PROD = '", dir, "/demo/PROD/project1/data')
    output: '", dir, "/demo/DEV/username/project1/output'
    programs: '", dir, "/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'"
  ), file_conn)
close(file_conn)
… and use it to overwrite the project with our new configuration.
# setup the project
envsetup_config <- config::get(file = config_path)
rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
Let’s check that envsetup_environ is now PROD.
What! It isn’t PROD. By default, config::get() returns the default configuration.
We must pass the config argument to config::get(), telling it to use the prod configuration.
envsetup_config <- config::get(file = config_path, config = "prod")
rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
envsetup_environ
#> [1] "PROD"
Now let’s see what has changed when we call read_path() for mtcars.RDS using the PROD configuration.
read_path(data, "mtcars.RDS")
#> Read Path:/tmp/RtmpyfAIEk/file16d97dce1eac/demo/PROD/project1/data/mtcars.RDS
#> [1] "/tmp/RtmpyfAIEk/file16d97dce1eac/demo/PROD/project1/data/mtcars.RDS"
We see the path to PROD, even though the data exists in both DEV and PROD. This is because the search through data started at PROD, which is the last element in the list, so only this location was checked and DEV was excluded.
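The indexing rule can be illustrated with a few lines of base R. This is a hypothetical re-implementation written for this walkthrough, not envsetup’s actual code; the helper sketch_read_path(), the throwaway directory layout, and the behaviour for an unknown environment are all assumptions.

```r
# Hypothetical helper illustrating the lookup rule; NOT envsetup's source code.
sketch_read_path <- function(paths, filename, environ) {
  # Position of the current environment in the named list of locations
  start <- match(environ, names(paths))
  # Assumption of this sketch: an unknown environment checks every location
  if (is.na(start)) start <- 1L
  # Only locations from that position to the end of the list are candidates
  candidates <- file.path(unlist(paths[start:length(paths)]), filename)
  found <- candidates[file.exists(candidates)]
  # Return the first match, or NULL when the file is nowhere to be found
  if (length(found) == 0L) NULL else found[[1]]
}

# Build a throwaway DEV/PROD layout with the same file in both locations
root <- tempfile("sketch")
locations <- list(
  DEV = file.path(root, "DEV"),
  PROD = file.path(root, "PROD")
)
for (loc in locations) {
  dir.create(loc, recursive = TRUE)
  saveRDS(mtcars, file.path(loc, "mtcars.RDS"))
}

# Starting at DEV finds the DEV copy; starting at PROD skips DEV entirely
dev_hit <- sketch_read_path(locations, "mtcars.RDS", environ = "DEV")
prod_hit <- sketch_read_path(locations, "mtcars.RDS", environ = "PROD")
```

With the file present in both locations, dev_hit points at the DEV copy and prod_hit at the PROD copy, which mirrors the two read_path() results shown in this walkthrough.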
Miss McVerse no longer needs to think about where her data is in the workflow, and can use read_path(data, ...) to determine the correct path.
We can apply the same steps to update our configuration for output and programs to account for PROD as well.
The autos configuration adds multiple environments to your search path, each of which contains objects that are automatically sourced.
So let’s go back to Tidy McVerse. She has created a custom, one-off function and stored it in /demo/DEV/username/project1/script_library.
We will add this path to the autos config.
Let’s look at a working example. We will create the directory, place a script into the folder …
# create the temp directory
dir <- fs::file_temp()
dir.create(dir)
dir.create(file.path(dir, "/demo/DEV/username/project1/script_library"), recursive = TRUE)
# write a function to the folder
file_conn <- file(file.path(dir, "/demo/DEV/username/project1/script_library/test.R"))
writeLines(
"test <- function(){print('test')}", file_conn)
close(file_conn)
# write the config
config_path <- file.path(dir, "_envsetup.yml")
file_conn <- file(config_path)
writeLines(
  paste0(
  "default:
  autos:
    dev_script_library: '", dir, "/demo/DEV/username/project1/script_library'"
  ), file_conn)
close(file_conn)
… and call rprofile(), passing in this config file.
envsetup_config <- config::get(file = config_path)
rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
#> Attaching functions from /tmp/RtmpyfAIEk/file16d94abac338/demo/DEV/username/project1/script_library to autos:dev_script_library
Now we can see autos:dev_script_library was added to the search path.
search()
#> [1] ".GlobalEnv" "autos:dev_script_library"
#> [3] "package:envsetup" "envsetup:paths"
#> [5] "package:rmarkdown" "package:stats"
#> [7] "package:graphics" "package:grDevices"
#> [9] "package:utils" "package:datasets"
#> [11] "package:methods" "Autoloads"
#> [13] "package:base"
test() is available within this environment, and we can execute this function without sourcing it.
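To build some intuition for how a function becomes callable without an explicit source() call, here is a base R sketch of the general mechanism: scripts are sourced into an environment, and that environment is attached to the search path. It is an illustration under assumptions, not envsetup’s implementation, and the names script_dir, auto_env and sketch:script_library are made up for this example.

```r
# Create a throwaway script library containing one function
script_dir <- tempfile("script_library")
dir.create(script_dir)
writeLines("test <- function() { print('test') }", file.path(script_dir, "test.R"))

# Source every .R file in the folder into a dedicated environment
auto_env <- new.env()
for (script in list.files(script_dir, pattern = "\\.R$", full.names = TRUE)) {
  sys.source(script, envir = auto_env)
}

# Attach the environment so its functions are found via the search path
attach(auto_env, name = "sketch:script_library")
attached <- "sketch:script_library" %in% search()
result <- test()

# Clean up the search path again
detach("sketch:script_library")
```

Because the function lives in its own attached environment rather than in .GlobalEnv, it stays available without cluttering the user’s workspace.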
Why on earth would we need this?
Just as with our previous data example, these scripts can be in multiple locations during their qualification lifecycle.
So let’s say Tidy McVerse’s friend, Sir Purrr, has a function that is useful for others in this specific project, but it is already in prod. Miss McVerse would like to use her function in dev and Sir Purrr’s function in prod.
To illustrate this, let’s add the prod script library to our config …
default:
  autos:
    dev_script_library: '/demo/DEV/username/project1/script_library'
    prod_script_library: '/demo/PROD/project1/script_library'
… create the PROD directory, and add Sir Purrr’s function to PROD.
dir.create(file.path(dir, "/demo/PROD/project1/script_library"), recursive = TRUE)
# write a function to the folder
file_conn <- file(file.path(dir, "/demo/PROD/project1/script_library/test2.R"))
writeLines(
"test2 <- function(){print('test2')}", file_conn)
close(file_conn)
Then we can overwrite our _envsetup.yml …
# write the config
file_conn <- file(config_path)
writeLines(
  paste0(
  "default:
  autos:
    dev_script_library: '", dir, "/demo/DEV/username/project1/script_library'
    prod_script_library: '", dir, "/demo/PROD/project1/script_library'"
  ), file_conn)
close(file_conn)
… and overwrite the project with our new configuration.
envsetup_config <- config::get(file = config_path)
rprofile(envsetup_config)
#> Attaching paths to envsetup:paths
#> Attaching functions from /tmp/RtmpyfAIEk/file16d94abac338/demo/PROD/project1/script_library to autos:prod_script_library
#> Attaching functions from /tmp/RtmpyfAIEk/file16d94abac338/demo/DEV/username/project1/script_library to autos:dev_script_library
Now we can see autos:prod_script_library was added to the search path, the functions test() and test2() are available, and we can execute these functions without sourcing them.
search()
#> [1] ".GlobalEnv" "autos:dev_script_library"
#> [3] "autos:prod_script_library" "package:envsetup"
#> [5] "envsetup:paths" "package:rmarkdown"
#> [7] "package:stats" "package:graphics"
#> [9] "package:grDevices" "package:utils"
#> [11] "package:datasets" "package:methods"
#> [13] "Autoloads" "package:base"
objects("autos:prod_script_library")
#> [1] "test2"
test()
#> [1] "test"
test2()
#> [1] "test2"
We can keep going and create different configurations for each execution environment, similar to what we did for PATHS above.
For example, we would not want to source any functions from dev when executing in prod. The configuration below is one way to handle this situation: it blanks out the dev script location when executing in prod.
# write the config
file_conn <- file(config_path)
writeLines(
  paste0(
  "default:
  autos:
    dev_script_library: '", dir, "/demo/DEV/username/project1/script_library'
    prod_script_library: '", dir, "/demo/PROD/project1/script_library'
prod:
  autos:
    dev_script_library: ''"
  ), file_conn)
close(file_conn)
envsetup_config <- config::get(file = config_path, config = "prod")
rprofile(envsetup_config)