---
title: "Create Your Config"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{config}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Here we will walk through how to update `_envsetup.yml` to meet your needs.

The configuration is currently set up to address:

1. paths
2. autos

# PATHS

The `paths` section adds `envsetup:paths` to your search path. This environment contains all of the objects needed to point to the different directories in your environment.

## Level 1 of config: the execution environment (ex. dev, qa or prod)

Scripts typically execute in different environments depending on your workflow. Here we have a workflow where multiple developers write scripts in dev, the scripts move to qa for quality checks and sign-off, and then move to prod where they are executed for delivery.

``` yaml
default:
dev:
qa:
prod:
```

## Level 2 of config: paths and autos

Each execution environment might have a slightly different configuration. This level allows us to change the configuration to meet the needs of each environment.

``` yaml
default:
  paths:
  autos:

dev:
  paths:
  autos:

qa:
  paths:
  autos:

prod:
  paths:
  autos:
```

## Level 3 of config: configure the environment

This is best illustrated with an example. Here we will focus on setting up one environment, the default configuration. If you wish to have different configurations based on your environment, you would need to expand this to fit your needs.

``` yaml
default:
  paths:
  autos:
```

## A working example

For a project we'll call **project1**, we need to read in data, write out results and save the script for future reference. So we need an object that points to each of these locations, and we add the `data`, `output` and `programs` objects to our config.


``` yaml
default:
  paths:
    data: "/demo/DEV/username/project1/data"
    output: "/demo/DEV/username/project1/output"
    programs: "/demo/DEV/username/project1/programs"
```

A working example is even better, so let's create a temporary directory and store this config file as `_envsetup.yml`.

```{r}
library(envsetup)

# create temporary directory
dir <- fs::file_temp()
dir.create(dir)
config_path <- file.path(dir, "_envsetup.yml")

# write a config file to it
file_conn <- file(config_path)
writeLines(
"default:
  paths:
    data: '/demo/DEV/username/project1/data'
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'", file_conn)
close(file_conn)
```

We can then call `rprofile()`, passing in this configuration.

```{r}
# Set up the project
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
```

We now have `data`, `output` and `programs` available to us in our search path within `envsetup:paths`. Let's take a look:

```{r echo = TRUE}
objects("envsetup:paths")

data

output

programs
```

Alright! Now let's go one step further and imagine a programmer we'll call Tidy McVerse. Miss McVerse needs to read in some data, and this data was in the development area when she started programming. This is great! We already have the object `data` that points to "/demo/DEV/username/project1/data".

Halfway through programming, the data was considered production ready and moved from "/demo/DEV/username/project1/data" to "/demo/PROD/project1/data".

Miss McVerse should not need to change her programs now; she needs a way to read data that is smarter than the average bear. The same object she uses to read in the data should work whether the data is in "/demo/DEV/username/project1/data" or "/demo/PROD/project1/data".

Let's create a config to keep Tidy McVerse happy and focused on the results, not data locations.

Here we have a configuration where we execute some R code to build a list of our possible data sources ([see the config package for details](https://rstudio.github.io/config/articles/introduction.html#r-code)).

``` yaml
default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
```

Once again, we have a working example if you would like to code along. We will overwrite the previous config file with our new config.

```{r}
file_conn <- file(config_path)

writeLines(
  paste0(
"default:
  paths:
    data: !expr list(DEV = '",dir,"/demo/DEV/username/project1/data', PROD = '",dir,"/demo/PROD/project1/data')
    output: '",dir,"/demo/DEV/username/project1/output'
    programs: '",dir,"/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'"
  ), file_conn)
close(file_conn)
```

Now we can set up the project again.

```{r}
# Set up the project
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
```

We have `data`, `output` and `programs` available to us in our search path within `envsetup:paths`, but `data` is now a named list with two locations. We also now have `envsetup_environ`. We will go into more detail on it later; for now, just accept that it exists.

```{r echo = TRUE}
objects("envsetup:paths")

data

output

programs

envsetup_environ
```

We can use `envsetup::read_path()` to find the location of the data we would like to read.

Let's create the directories in our temporary folder structure ...

```{r}
dir.create(file.path(dir, "/demo/DEV/username/project1/data"), recursive = TRUE)
dir.create(file.path(dir, "/demo/PROD/project1/data"), recursive = TRUE)
```

... and add `mtcars` to the PROD data directory, "/demo/PROD/project1/data".


```{r}
saveRDS(mtcars, file.path(dir, "/demo/PROD/project1/data/mtcars.RDS"))
```

Now we can use `read_path()`, passing in the path object `data`, to find where to read `mtcars.RDS`.

The data is only in PROD, so the function returns the path to `mtcars.RDS` in PROD.

```{r}
read_path(data, "mtcars.RDS")
```

Let's keep going! What if the data was in both DEV and PROD?

Let's save the same data to DEV ...

```{r}
saveRDS(mtcars, file.path(dir, "/demo/DEV/username/project1/data/mtcars.RDS"))
```

... and see what `read_path()` returns.

```{r}
read_path(data, "mtcars.RDS")
```

We see the path to DEV now instead of the path to PROD. To explain this, we need to talk about `envsetup_environ`, which we set in the config earlier.

When an object holds multiple paths, as `data` does here, `envsetup_environ` controls which of them are checked. It acts as an index into the list: locations are checked starting from the entry whose name matches the environment through to the end of the list, and any earlier entries are skipped.

In the example below, we set `envsetup_environ` to `'DEV'`. DEV is the first entry in our `data` list, so every location is checked in order until the file is found or the list is exhausted.

```{r eval = FALSE}
default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'
```

Let's now add an execution environment for `PROD`. We cannot simply change `envsetup_environ` from `DEV` to `PROD` in `default`, or `DEV` would no longer work. Instead, we need to add a separate `prod` configuration; without one, `default` is used.


```{r eval = FALSE}
default:
  paths:
    data: !expr list(DEV = '/demo/DEV/username/project1/data', PROD = '/demo/PROD/project1/data')
    output: '/demo/DEV/username/project1/output'
    programs: '/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'

prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'
```

So we will write this new config out ...

```{r}
# overwrite the config file in the temporary directory previously set up
file_conn <- file(config_path)

writeLines(
  paste0(
"default:
  paths:
    data: !expr list(DEV = '",dir,"/demo/DEV/username/project1/data', PROD = '",dir,"/demo/PROD/project1/data')
    output: '",dir,"/demo/DEV/username/project1/output'
    programs: '",dir,"/demo/DEV/username/project1/programs'
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'DEV'); 'DEV'

prod:
  paths:
    envsetup_environ: !expr Sys.setenv(ENVSETUP_ENVIRON = 'PROD'); 'PROD'"
  ), file_conn)
close(file_conn)
```

... and use it to set up the project again with our new configuration.

```{r}
# set up the project
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
```

Let's check that `envsetup_environ` is now PROD.

```{r}
envsetup_environ
```

What! It isn't PROD. We must tell `config::get()` which configuration to use by passing `config = "prod"`.

```{r}
envsetup_config <- config::get(file = config_path, config = "prod")

rprofile(envsetup_config)

envsetup_environ
```

Now let's see what has changed when we call `read_path()` for `mtcars.RDS` using the PROD configuration.

```{r}
read_path(data, "mtcars.RDS")
```

We see the path to PROD, even though the data exists in both DEV and PROD. This is because the search through `data` started at PROD, which is the last element in `data`, so only this location was checked and DEV was excluded.

Miss McVerse no longer needs to think about where her data is in the workflow, and can use `read_path(data, ...)` to determine the correct path.

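The lookup rule described above can be sketched in a few lines of base R. This is only an illustration of the behavior as described in this vignette, not the implementation of `envsetup::read_path()`; the helper `first_existing_path()`, its arguments and the `demo_*` objects are hypothetical names introduced for this sketch.

```r
# Hypothetical helper illustrating the lookup rule described above.
# It is NOT envsetup's implementation of read_path().
first_existing_path <- function(paths, file, environ) {
  # Position of the active environment in the named list of locations
  start <- match(environ, names(paths))
  if (is.na(start)) {
    stop("environment '", environ, "' not found in names(paths)")
  }

  # Only locations from that position to the end of the list are candidates
  locations <- unname(unlist(paths[start:length(paths)]))
  candidates <- file.path(locations, file)

  # Return the first candidate that exists, or NA if none does
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) NA_character_ else found[[1]]
}

# Usage: the same file saved in both a DEV and a PROD directory
demo_root <- tempfile("lookup_demo_")
demo_paths <- list(
  DEV  = file.path(demo_root, "DEV"),
  PROD = file.path(demo_root, "PROD")
)
for (p in demo_paths) dir.create(p, recursive = TRUE)
for (p in demo_paths) saveRDS(mtcars, file.path(p, "mtcars.RDS"))

# Starting at DEV checks DEV first; starting at PROD skips DEV entirely
dev_result  <- first_existing_path(demo_paths, "mtcars.RDS", environ = "DEV")
prod_result <- first_existing_path(demo_paths, "mtcars.RDS", environ = "PROD")

# Clean up the demo directory
unlink(demo_root, recursive = TRUE)
```

Here `dev_result` points into the DEV directory and `prod_result` into the PROD directory, mirroring the two `read_path()` calls shown earlier.
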
We can apply the same steps to update our configuration for `output` and `programs` to account for `PROD` as well.

```{r echo = FALSE}
unlink(dir, recursive=TRUE)
```

# AUTOS

The `autos` section adds one or more environments to your search path, each of which contains objects that are automatically sourced.

```{r eval = FALSE}
default:
  autos:
```

## A working example

So let's go back to Tidy McVerse. She has created a custom, one-off function and stored it in `/demo/DEV/username/project1/script_library`.

We will add this path to the autos config.

```{r eval = FALSE}
default:
  autos:
    dev_script_library: '/demo/DEV/username/project1/script_library'
```

Let's look at a working example. We will create the directory, place a script into the folder and write the config ...

```{r}
# create the temp directory
dir <- fs::file_temp()
dir.create(dir)
dir.create(file.path(dir, "/demo/DEV/username/project1/script_library"), recursive = TRUE)

# write a function to the folder
file_conn <- file(file.path(dir, "/demo/DEV/username/project1/script_library/test.R"))
writeLines(
"test <- function(){print('test')}", file_conn)
close(file_conn)

# write the config
config_path <- file.path(dir, "_envsetup.yml")
file_conn <- file(config_path)

writeLines(
  paste0(
"default:
  autos:
    dev_script_library: '", dir,"/demo/DEV/username/project1/script_library'"
  ), file_conn)
close(file_conn)
```

... and call `rprofile()`, passing in this config.

```{r}
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
```

Now we can see `autos:dev_script_library` was added to the search path.

```{r}
search()
```

`test()` is available within this environment, and we can execute this function without sourcing it.

```{r}
objects("autos:dev_script_library")

test()
```

Why on earth would we need this? Just as with our previous data example, these scripts can be in multiple locations during their qualification lifecycle.

So let's say Tidy McVerse's friend, Sir Purrr, has a function that is useful for others in this specific project, but it is already in prod. Miss McVerse would like to use her function in dev and Sir Purrr's function in prod.

To illustrate this, let's add the prod script library to our config ...

```{r eval = FALSE}
default:
  autos:
    dev_script_library: '/demo/DEV/username/project1/script_library'
    prod_script_library: '/demo/PROD/project1/script_library'
```

... create the `PROD` directory and write Sir Purrr's function to it.

```{r}
dir.create(file.path(dir, "/demo/PROD/project1/script_library"), recursive = TRUE)

# write a function to the folder
file_conn <- file(file.path(dir, "/demo/PROD/project1/script_library/test2.R"))
writeLines(
"test2 <- function(){print('test2')}", file_conn)
close(file_conn)
```

Then we can overwrite our `_envsetup.yml` ...

```{r}
# write the config
file_conn <- file(config_path)

writeLines(
  paste0(
"default:
  autos:
    dev_script_library: '", dir,"/demo/DEV/username/project1/script_library'
    prod_script_library: '", dir,"/demo/PROD/project1/script_library'"
  ), file_conn)
close(file_conn)
```

... and set up the project again with our new configuration.

```{r}
envsetup_config <- config::get(file = config_path)

rprofile(envsetup_config)
```

Now we can see `autos:prod_script_library` was added to the search path, the functions `test()` and `test2()` are available, and we can execute them without any need for sourcing.

```{r}
search()

objects("autos:prod_script_library")

test()

test2()
```

We can keep going and create different configurations for each execution environment, similar to what we did for PATHS above.

For example, we would not want to source any functions from dev when executing in prod. The configuration below is one way to handle this situation: it blanks out the dev script location when executing in prod.


```{r eval=FALSE}
# write the config
file_conn <- file(config_path)

writeLines(
  paste0(
"default:
  autos:
    dev_script_library: '", dir,"/demo/DEV/username/project1/script_library'
    prod_script_library: '", dir,"/demo/PROD/project1/script_library'

prod:
  autos:
    dev_script_library: ''"
  ), file_conn)
close(file_conn)

envsetup_config <- config::get(file = config_path, config = "prod")

rprofile(envsetup_config)
```

```{r echo = FALSE}
unlink(dir, recursive=TRUE)
```
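
Blanking out a single value in `prod` works because the config package layers a named configuration on top of `default`. The sketch below, which assumes the config package is installed, shows what `config::get()` returns for each configuration. The paths `/dev/lib` and `/prod/lib` and the `sketch_file`, `default_autos` and `prod_autos` objects are placeholders made up for this illustration, and the sketch does not call `rprofile()`.

```r
# Placeholder paths for illustration only; they do not need to exist
sketch_file <- tempfile(fileext = ".yml")
writeLines(
"default:
  autos:
    dev_script_library: '/dev/lib'
    prod_script_library: '/prod/lib'

prod:
  autos:
    dev_script_library: ''",
  sketch_file)

# Read the autos section under each configuration
default_autos <- config::get("autos", config = "default", file = sketch_file)
prod_autos    <- config::get("autos", config = "prod", file = sketch_file)

# default keeps both libraries
default_autos$dev_script_library
default_autos$prod_script_library

# prod overrides dev_script_library with an empty string and
# inherits prod_script_library from default
prod_autos$dev_script_library
prod_autos$prod_script_library

# Clean up the temporary config file
unlink(sketch_file)
```

Only the values that differ need to be listed under `prod`; everything else falls back to `default`, which is the same mechanism the `envsetup_environ` override relied on in the PATHS section.
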