1  Workflow

In this chapter, we will cover the basic guidelines and issues related to the prefered workflow when programming with R programming language. This chapter will cover four different aspects: a) Cloud Storage, b) Version Control System, c) R Studio Projects, and d) File Management practices.

1.1 SharePoint

All files used by the DAU are stored in the Data Analytics folder within the organization’s SharePoint. This is done in order to provide an easy access to all members of the team. As a rule, you are required to work and modify these files directly from this online directory. To achieve this, you will need to download the OneDrive app in your computer and sign in using your WJP account. If you are already using One Drive with your personal account, you can add an additional professional account which will function parallel to your personal account in your computer. For more information, see the following website.

Given that the DAU SharePoint is a directory created and administered by the organization, you will have to sync this directory to your own OneDrive WJP Account. In order to achieve this, follow these steps:

  1. Login into your WJP Outlook account in any web browser.
  2. Once into your account, access OneDrive. This will open a new tab for you.
  3. In the left side panel, locate the Share with Metab and click it.
  4. Locate the Data Analytics folder.
  5. Once that you are inside the folder, you will see the Add Shortcut to My Files,or, Sync Up option in the upper toolbar, depending on your OneDrive version.
  6. Follow the instructions and sync it to your WJP folder in your computer.

By syncing your work with the SharePoint, the whole team is going to be able to see and review changes as they go. However, if more than one person is working on the same file at the same time, it is likely that this working flow will produce several mistakes. Therefore, additionally to this shared cloud storage system, we need a version control tool that will allow us to modify and collaborate in team projects without these kind of issues. Given that most of our work is focused in coding, we use Git Version Control and the GitHub platform for this.

1.2 Git

Git is a free and open source software that allows users to set up a version control system designed to handle projects. Given the nature of its features, it is normally used for collaboratively developing code and data integrity. GitHub is a website and cloud-based service that helps developers store and manage their code, as well as track and control changes to their code. For a gentle introduction to Git and GitHub, see the following post published by Kinsta. For a more in depth introduction, please refer to the GitHub documentation or watch this video tutorial.

Using the Git features allow us to simultaneously work on the same project and even in the same code without worrying about interfering with other members of the team. As a rule, every project carried by the DAU has a code administrator who is in charge of setting up the GitHub repository and add other members of the team as project collaborators. Additionally, the code administrator is in charge of setting the main branch and the initial structure of the code (see the [data-management](https://ctoruno.quarto.pub/wjp-r-handbook/workflow.html#data-management) section on this chapter). It is required that the GitHub repository have its main branch in its respective SharePoint folder.

Note: It is highly recommended to create the repository from GitHub.com and not from the local machine to avoid the initial commit that can include system files such as .DS_STORE files

We use convergence development1 to collaboratively code in the same project. For this, it is highly recommended that each team member works on a separate branch and, once the data routine is done, the auxiliary branches can be merged into the main branch of the repository. Collaborators that are not the code administrator have the option to clone the GitHub repository in their computer in a local directory that it is not sync to the SharePoint and work in their respective branch from a local copy outside the SharePoint. In other words, it is only the functional final version contained inside the main branch the one that it is going to be sync in SharePoint.

Important: GitHub is used to keep track of the code we use in each project. Under no circumstance, we will include the data sources in the online repositories.

1.3 Projects

When you load a data set or source an R script, you will have to set up the working directory where these files are located in your computer. However, the path to this working directory is quite different for all the members of the team. R Studio allow us to enclose all of our analysis, code and auxiliary files into a project.

A project is a feature that allow us to work with the analysis we are carrying without having to worry about where does these files are stored or who is working on them. Say goodbye to setwd("..."). Besides managing relatives paths, R projects allow users to keep a history of actions performed and even keep the objects in your environment. Because of this, projects are the cornerstone of our work when performing analysis with R.

As a rule, every project has a file named project-name.Rproj in its root directory and open it should be the first action when working on a project. For more information on working with R projects, refer to the Workflow section from the R for Data Science book.

1.4 SharePoint Global Access

As mentioned before, we try to keep everything within the Project’s wworking directory. Nevertheless, due to their nature, sometimes we will have to load file from outside this directory. For example, we will constantly loading the GPP or QRQ datasets or the WJP font files. Instead of copying these files into the projects directory, we access them through what we call a Global Access. These files are located in a static path inside the DAU SharePoint. Assuming that you have installed the desktop version of OneDrive, you can access these files quite easily. However, this path is unique in every local machine. Therefore, it is very common to find a variable named path2SP the following lines of code in the settings of every project:

## +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
##
## 2.  SharePoint Path                                                      ----
##
## +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

# SharePoint path
if (Sys.info()["user"] == "johnDoe") {
  path2SP <- "/Users/johnDoe/OneDrive-WJP/Data Analytics/"

} else if (Sys.info()["user"] == "anaPerez") {
  path2SP <- "/CloudStorage/OneDrive-WJP/Data Analytics/"
  
} else if (Sys.info()["user"] == "kumikoNagato"){
  path2SP <- "/Users/kumikoNagato/Library/OneDrive-WJP/Data Analytics/"
  
} else{
  path2SP <- "PLEASE INSERT YOUR PERSONAL PATH TO THE SP DAU DIRECTORY"
  
}

This way, loading or accessing files that are outside the Project’s directory but inside the DAU SharePoint is possible without adjusting the code everytime a different user tries to run the code. For example, if you would like the user to load the WJP fonts, you simply have to paste their location within the SharePoint and the path2SP string to let the code run smoothly between users.

path2fonts<- paste0(path2SP, "6. Country Reports/0. Fonts/")
font_add(family     = "Lato Full",
         regular    = paste0(path2fonts, "Lato-Regular.ttf"),
         italic     = paste0(path2fonts, "Lato-LightItalic.ttf"),
         bold       = paste0(path2fonts, "Lato-Bold.ttf"),
         bolditalic = paste0(path2fonts, "Lato-BoldItalic.ttf"))

1.5 Data Management

The University of California San Diego (UC San Diego) has a Data Management Best Practices that reviews common guidelines for managing research data. In this handbook, I will focus on two topics mentioned by those guidelines: File Organization and Documentation.

1.5.1 File Organization

The file organization involves two important elements: filing system and naming conventions.

A filing system is basically the organization structure (directories, folders, sub-folders, etc) in which files are stored. There are no standard rules about how this should be done. However, the chosen filing system needs to make sense not only to the person currently working on a given project but to anyone going through these files in the future. As a rule, each project would have the following sub-folders:

  • Code: Depending on the complexity of the project, you could choose to create separate directories within the Code folder for Stata, R or Python files.

  • Data: Depending on the complexity and nature of the project, you could choose to create separate directories within the Data folder for RAW, INTERMEDIATE or CLEAN data sets.

  • Outcomes: The outcomes of the project might have several different formats. For example, images could be in PNG and/or SVG format, Reports might be created in PDF, some tables might be exported as TEX files, etc. The outcomes folder should have a separate directory for each one of these formats.

In some cases, the creation of a PDF report is key, for example, the Regional Country Reports. For these projects, we strongly advise to create a separate directory to store the code files used for the report. At the moment of writing this handbook, these reports are created using R Markdown, but a migration to Quarto is feasible in the future.

  • Markdown/Quarto

In relation to the naming conventions, these are a set of rules designed to complement the filing system and help collaborators in understanding the data organization. Each project have the flexibility to use a specific set of naming rules to use in the filing system. However, there are a few general rules to note:

  • Use descriptive file names that are meaningful to you and your colleagues while also keeping them short.

  • Avoid using spaces and make use of hyphens, snake_case, and/or camelCase.

  • Avoid special characters such as $, #, ^, & in the file names.

  • Be consistent not only along the project but also across different projects. If all different data files and routines are named in the same way, it’s easier for you to use those tools across projects and re-factor routines.

1.5.2 Documentation

When initializing the GitHub repository, we strongly suggest to include a README file with it. Such file should be a Markdown document including the following elements:

  • A brief description of the project and the role of the DAU in it.

  • Referred person contact, which should the project leader and the code administrator.

  • A brief description of the filing system along with what to find in each sub-directory.

  • Given that no data is uploaded into the online repositories, include the following descriptions:

    • Data needed to run the code: source and process to obtain it.

    • Location of the data in the SharePoint.

  • A brief description on how to read the code with, if possible, visual aids.

For an example on one of these files, please check the following README file. If possible, use this example as a template for future projects.

Depending on the complexity of each project, the team can add any suitable documentation files if needed. Additionally, the DAU keeps records of significant issues encountered during each project in order to facilitate the solution process in future projects. These records are kept in the Issues Logbook of the DAU.


  1. For more information about convergence development and branches, we advise you to refer to this article from Pluralsight.↩︎