{"data":{"allMarkdownRemark":{"edges":[{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 0<br>Introduction to Jupyter</h1></div>\n\n---\n\n# What is Jupyter?\n\nThe Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and text.\n\nThe name Jupyter comes from the core programming languages it supports: Julia, Python, and R. \n\nFor this workshop, we will be using R via [Jupyter](https://jupyter.org/index.html).\n\n---\n\nNotebooks are a great tool for exploration and for documenting your workflow.\nNotebooks allow you to include:\n- Code\n- Plots\n- Formatted text\n- LaTeX text for equations\n\nand much more in a single document!\n\n---\n\n- R is the programming language that runs computations.\n\n- Jupyter is an integrated development environment (IDE) that provides an interface by adding convenient features and tools.\n\n<img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Jupyter_logo.svg/1200px-Jupyter_logo.svg.png\" alt=\"jupyter logo\" align=\"left\" width=\"25%\" >\n\n<img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/R_logo.svg/1920px-R_logo.svg.png\" alt=\"R logo\" width=\"25%\" align=\"center\">\n\n---\n\nYou can think of Jupyter Notebooks as the dashboard of a car.\n\nYou don’t drive a car by interacting with the engine but rather by interacting with the car’s dashboard.\n\nIn the same way, rather than interacting with R directly, we will be using Jupyter's interface.\n\nJupyter will allow us to:\n- Run R code interactively\n- Use other languages such as Python, Julia, or MATLAB!\n\n---\n\nThis is what a Jupyter Notebook looks like:\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/01_jupyter_notebook.png?raw=true\" 
alt=\"Jupyter Notebook shot\" width=75% align=\"center\">\n\n---\n\n- Notebooks are great for exploration and for documenting your workflow.\n- There are many options for sharing notebooks in a human-readable format:\n  - Share online with [nbviewer.jupyter.org](http://nbviewer.jupyter.org/)\n  - GitHub automatically renders any notebooks that you push.\n  - You can convert to HTML, PDF, etc. with [nbconvert](https://nbconvert.readthedocs.io/en/latest/)\n\n---\n\n# Let's practice!\n","fields":{"slug":"/chapter1_01_introduction"}}},{"node":{"rawMarkdownBody":"\n# What is Binder?\n\nAlthough we can install software and dependencies on our local machine, we will be working with a Binder in this module. \n\n\n\nA Binder is a code repository that contains:\n\n- Code or content that you’d like people to run. This might be a Jupyter Notebook.\n\n- Configuration files for your environment. This ensures that your code is reproducible.\n\n---\n\nYou will be working simultaneously with the Binder notebook and these slides.\n\nYou can launch the <a href=\"https://mybinder.org/v2/gh/throughput-ec/ec-binder/HEAD\" target=\"_blank\">Binder</a> from this link.\n\nWhen you click on the link above, the Binder will be launched. 
Choose to open a Jupyter Notebook and you should see the following:\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/02_binder.png?raw=true\" alt=\"Jupyter Notebook shot\" width=55% align=\"center\">\n\n---\n\nKeep in mind that any work that you do on the Binder will not be saved.\nYou will have to download your work each time you work with the Binder.\n\n---\n\n# Explore the Jupyter Notebook in the <a href=\"https://mybinder.org/v2/gh/throughput-ec/ec-binder/HEAD\" target=\"_blank\">Binder</a>\n\n!","fields":{"slug":"/chapter1_02_using_binder"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 1<br>Introduction to Binder</h1></div>\n\n<head>\n<meta charset=\"UTF-8\" />\n<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n<title>Page Title</title>\n<style>\n    /* The . with the boxed represents that it is a class */\n    .boxed {\n    background: white;\n    color: black;\n    border: 3px solid black;\n    margin: 0px auto;\n    width: 700px;\n    padding: 10px;\n    border-radius: 10px;\n    }\n</style>\n</head>\n\n---\n\n# What is Binder?\n**Motivation: Going Beyond Paper**\n\n- Binder is a Jupyter tool for sharing interactive notebooks with others.\n\n- When an article is published on paper, text and images might not be enough for the reader to understand all the concepts presented.\n\n- Luckily, a lot of researchers publish their workflow on GitHub as well. 
\n\n- If we ran their code, we would probably understand their ideas better.\n\n- However, we might encounter difficulties while trying to read/reproduce other people's code...\n\n---\n\n# What is Binder?\n**Motivation: Reading Other People's Code**\n\n<img src=\"https://www.explainxkcd.com/wiki/images/8/89/code_quality_3.png\" alt=\"Other people code map\" width=65% align=\"center\">\n\n[Source: XKCD cartoon](http://xkcd.com/1833/)\n\n---\n\n# What is Binder?\n**Motivation: Sharing Code**\n\nSharing and reproducing other people's code may come with some challenges.\n\nBecause of this, there are different tools to share code that is reproducible:\n\n- Creating Virtual Environments\n- Creating a Docker Image\n- Writing a very precise manual on how to create the right environment to run your code\n\nEach of these approaches comes with its own set of challenges and may be complicated or require some expertise.\nFurthermore, they still require some effort from your user.\n\n---\n\n# What is Binder?\n**Motivation: Sharing Code**\n\nWith Binder, we can:\n\n- Get/Provide one link with a prebuilt environment where we can run the Jupyter notebook or Rmd smoothly.\n\n- Spend the time understanding the code rather than setting up the environment to execute the code.\n\n---\n\n# What is Binder?\n**Sharing a Single Link**\n\nThat way, our emails can transition from this: \n\n<body>\n<div class=\"boxed\">\n\nHi Jane,  \nI am so happy that you like our project and that you want to run our code to understand our figures better! \nTo run our code without installing dependencies, you will need to:  \n- Install Docker and repo2docker   \n- Download the image from Docker Hub - you can find the link in the repo.\n- Run from your terminal  \n```\nrepo2docker https://github.com/throughput-ec/ec-workshops\n```\nThat will generate a long output and at the end there will be a URL. Copy that URL and paste it into your browser.  
Send me an email if you have any issues with the installations.\nBest,  \nS\n\n</div>\n</body>\n\n---\n\n# What is Binder?\n**Sharing a Single Link**\n\nto this:\n\n<div class=\"boxed\">\n\nDear Jane,  \nI am so happy that you like our project and that you want to run our code to understand our figures better! \nPlease click on this link to start executing our code.  \n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/throughput-ec/ec-workshops/binder)   \nBest,  \nS\n\n</div>\n\n---\n\n# Uses of Binder\n**Motivation: Your Next Project**\n\nIf your intent is to share a Notebook or an Rmd file, consider using Binder.\n\nOther popular uses for Binder include:\n\n- Sharing computational work or papers\n- Sharing educational material\n- Generating interactive open-source package documentation\n- Creating live demonstrations\n\n---\n\n# Let's review what we learned!","fields":{"slug":"/chapter4_01_introduction_to_binder"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 0<br>Learning Outcomes</h1></div>\n\n---\n\nBy the end of this module, you should:\n\n1. Understand what a Binder is.\n\n2. Write a standard configuration file, such as `environment.yml` or `install.R`.\n\n3. Prepare a repository for Binder.\n\n4. Run a Jupyter Notebook or an Rmd in a temporary environment in the cloud ([Binder](https://mybinder.org)).\n\n5. Share a link to a reproducible workflow.\n\n6. Know the limitations of Binder.\n\n---\n\n# Let's learn about Binder!","fields":{"slug":"/chapter4_00_learning_outcomes"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/>Your First Notebook</h1></div>\n\n---\n\n### Create a New Notebook\n\n- Navigate to the `binder` repository. 
Here is the [link](https://mybinder.org/v2/gh/throughput-ec/ec-binder/HEAD).\n\n- Under Notebooks, select R.\n- This will create a new untitled notebook\n  - Note the .ipynb extension (it comes from \"IPython Notebook\", the name used before the project was renamed Jupyter to reflect multi-language support)\n  - Rename the notebook to \"workshop.ipynb\"\n\n- Notebooks usually auto-save periodically, but since we are in a Binder, all your new content will be lost once the Binder is closed.\n- You can download your notebooks by clicking on File -> Download\n\n---\n\n## Working with Notebooks\n\nA notebook consists of a series of \"cells\":\n- **Code cells**: execute snippets of code and display the output\n- **Markdown cells**: formatted text, equations, images, and more\n\nBy default, a new cell is always a code cell.\n\n---\n\n## Code Cells\n\nTo run a code cell, click in it and press `Shift-Enter` or press the Run button on the toolbar.\n\nThis is an example of a Code Cell:\n\n```r\n# Print something\nprint(\"Hello world\", quote = FALSE)\n```\n\n```out\n[1] Hello world\n```\n\n---\n\n## Markdown Cells\n\nIn Markdown cells, you can write plain text or add formatting and other elements with [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). 
These include headers, **bold text**, *italic text*, hyperlinks, equations $A=\\pi r^2$, inline code `print('Hello world!')`, bulleted lists, and more.\n\n\nThis is a markdown cell:\n```\nHello~!\n```\n---\n\n\n- To create a Markdown cell, select an empty cell and change the cell type from \"Code\" to \"Markdown\" in the dropdown menu on the toolbar\n- To run a Markdown cell, press `Shift-Enter` or the Run button on the toolbar\n- To edit a Markdown cell, you need to double-click inside it\n\n---\n\n## Other Notebook Basics\n\n- Organizing cells &mdash; insert, delete, cut/copy/paste, move up/down, split, merge\n- Running all cells or selected cell(s)\n- Restarting and interrupting the kernel\n- Caveat: Notebooks are nonlinear and running cells out of order can sometimes lead to unexpected results\n  - It's good practice to periodically restart the kernel and run all cells, making sure that everything works as expected when you run the whole notebook from top to bottom\n- Closing vs. shutting down a notebook &mdash; kernel process in background\n- Re-opening a notebook after shutdown\n  - All the code output is maintained from the previous kernel session\n- Clear output of all cells or selected cell(s)\n\n---\n\n# Let's practice!\n","fields":{"slug":"/chapter1_03_creating_a_notebook"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 2<br>Setting a Binder Up</h1></div>\n\n---\n# Getting to the Hub\n\nIf you visit [mybinder.org](https://mybinder.org/) you will encounter the following screen:\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/00_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/00_binder.png?raw=true\" alt=\"Binder\" width=45% align=\"center\" title = \"Click to zoom in\">\n</a>\n\nBefore you 
can fill in the information, what do you need to create a Binder repository?\n\n---\n\n# What Do you Need to Build a Binder Repository?\n**Git Repository**\n\n- You will need to have a Git repository.\n\n- The repository must be in a *public* location online. \n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/02_github.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/02_github.png?raw=true\" alt=\"Git requirements for Binder\" width=35% align=\"center\" title = \"Click to zoom in\">\n</a>\n\n\n---\n# What Do you Need to Build a Binder Repository?\n**Git Repository**\n\n- You can work with other Git repository hosting manager tools such as:\n    - `GitHub`, `GitLab`, `Bitbucket`, and MORE!\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/01_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/01_binder.png?raw=true\" alt=\"Binder site\" width=45% align=\"center\" title = \"Click to zoom in\">\n</a>\n\n---\n\n# What Do you Need to Build a Binder Repository?\n**Configuration Files**\n\n- The repository must have configuration files that specify its environment.\n\n- These configuration files should be placed in the root of the repository or in a binder/ folder in the repository’s root.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/03_github.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/03_github.png?raw=true\" alt=\"Configuration Files\" width=45% align=\"center\" title = \"Click to zoom in\">\n</a>\n---\n\n# What Do you Need to Build a Binder Repository?\n**A File to Share**\n\n- The repository contains content designed for people to read.\n    - A Jupyter Notebook \n    - An R script to make a visualization.\n\n<a 
href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/04_github.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/04_github.png?raw=true\" alt=\"Pointing to sharing the right file\" width=45% align=\"center\" title = \"Click to zoom in\">\n</a>\n\n---\n\n# What Do you Need to Build a Binder Repository?\n**Security**\n\n- The repository **does not** require any sensitive information, such as:\n    - Passwords\n    - API secrets\n    - Personal information\n    - Private data\n\n---\n\n# What Do you Need to Build a Binder Repository?\n**A BinderHub**\n\n- Binders are powered by a BinderHub, an open-source tool that deploys the Binder service to the cloud.\n\n- There are several BinderHubs that you may use:\n    - [Binder Pangeo](https://binder.pangeo.io/)\n    - [mybinder.org](https://mybinder.org/)\n    - [Alan Turing Institute Binder](https://turing.mybinder.org/)\n    - and [others](https://mybinder.readthedocs.io/en/latest/about/federation.html)\n\n---\n# What Do you Need to Build a Binder Repository?\n**A BinderHub**\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/00_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/00_binder.png?raw=true\" alt=\"Binder Org\" width=45% align=\"center\" title = \"Click to zoom in\"> \n</a>\n\n---\n\n# Behind the Scenes of Binder\n**repo2docker**\n\nBinder uses **repo2docker**, a tool that mimics how a person would set up a reproducible environment.\n\n- It clones a GitHub repository.\n\n- It looks for configuration files.\n    - These files describe the dependencies needed for the project.\n    - It recognizes files named: `environment.yml`, `requirements.txt`, `install.R`, `Dockerfile`, and MORE.\n\n- It installs the dependencies based on the configuration file.\n\n- It starts a Jupyter Notebook / RStudio session.\n\n---\n\n# Let's practice what we 
learned!","fields":{"slug":"/chapter4_02_what_does_a_binder_repo_need"}}},{"node":{"rawMarkdownBody":"  \n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 3<br>Setting Up the Python Binder</h1></div>\n\n---\n\n# Step 1\n\n- Create a **public** GitHub repository.\n- You can name your GitHub repository as you like.\n- Initialize your repository with a README!\n- Clone the repository to your local machine.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/02_github.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/02_github.png?raw=true\" alt=\"Git requirements for Binder\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 2\n\n- Inside your GitHub repository folder:\n    - Create a Jupyter notebook.\n    - Open a new code cell and write:\n    ```\n    import folium\n    import pandas\n\n    m = folium.Map(location=[49.267665596, -123.241999032], zoom_start=12)\n    tooltip = \"Click Here For More Info\"\n\n    marker = folium.Marker(\n        location=[49.267665596, -123.241999032],\n        popup=\"<strong>UBC</strong>\",\n        tooltip=tooltip)\n    marker.add_to(m)\n    m\n    ```\n    - Save the Jupyter notebook. 
\n\n---\n\n# Step 3 \n\n- Create an `environment.yml` file in your GitHub repository.\n\nFor our previous example, our dependencies are `folium` and `pandas`.\n\nOur `environment.yml` file should look like this:\n\n```\nname: my-example-environment\nchannels:\n  - conda-forge\ndependencies:\n  - pandas\n  - folium\n```\n\nYou can find a template in the next \"activity\".\n\n---\n\n# Step 4 \n\n- Push all your repository changes back to GitHub.\n- Your repository should now look like this:\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/05_github.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/05_github.png?raw=true\" alt=\"First py binder repo\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 5\n\n- Go to [mybinder.org](https://mybinder.org/).\n- Type the URL of your repo into the \"GitHub repo or URL\" box. It should look like this:\n```\nhttps://github.com/your-username/my-first-python-binder\n```\n\n- Where it says \"Git ref\", type in `main` or the branch that you would like to use.\n- Where it says \"URL to open (optional)\", type in the notebook file name and choose \"file\". \n\n- As you type, the webpage generates a link in the \"Copy the URL below...\" box. It looks like this:\n```\nhttps://mybinder.org/v2/gh/your-username/my-first-python-binder/HEAD\n```\n\n---\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/06_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/06_binder.png?raw=true\" alt=\"Zero to Binder\" width=65% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n\n---\n\n# Step 5b\n\n- Once this is done, simply hit the launch button. \n- My Binder will create your binder repo in a few minutes.\n- Be patient. 
The first time it might take some while to build.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/07_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/07_binder.png?raw=true\" alt=\"Binder Process\" width=40% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 6\n\n- Copy the generated link, open a new browser tab and visit that URL.\n\n- You will see a \"spinner\" as Binder launches the repository.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/08_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/08_binder.png?raw=true\" alt=\"Spinner\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n\n---\n\n# Step 7\n\n- Go to the link provided by Binder. \n- You should now be able to work and navigate the last version of your pushed Jupyter notebook.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/09_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/09_binder.png?raw=true\" alt=\"Binder from Git\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 8\n\n- Once built, you can share the link to this with anybody you want to run your project on their machine.\n\n- Save your LaunchBinder Badge and share it! 
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sedv8808/my-first-python-binder/main?labpath=my-folium-map-notebook.ipynb)\n\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/10_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/10_binder.png?raw=true\" alt=\"Binder Badge\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Let's practice!","fields":{"slug":"/chapter4_03_python_repository"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 4<br>Setting Up the R Binder</h1></div>\n\n---\n\n# Step 1\n\n- Create a **public** GitHub repository.\n- You can name your GitHub repository as you like.\n- Initialize your repository with a README!\n- Clone the repository to your local machine.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/11_github.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/11_github.png?raw=true\" alt=\"Binder Process\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 2\n\n- Inside your Github repository folder:\n    - Create an Rmd file.\n    - Copy and paste the following slide. (We'll learn more about Rmd files in the next module.)\n    - Save the Rmd file. 
\n\n---\n\n'''\n\n    ```{r setup, include=FALSE}  \n    knitr::opts_chunk$set(echo = TRUE)  \n    library(leaflet)  \n    leaflet(options = leafletOptions(minZoom = 0, maxZoom = 18))  \n    ```\n\n    ## My Leaflet Map\n\n    **TASK:** Find UBC in a Leaflet map.\n \n\n    ```{r}\n    map1 <- leaflet() %>%  \n                addProviderTiles(providers$Stamen.TerrainBackground) %>%  \n                addTiles() %>%  \n                addCircleMarkers(lng =-123.241999032 , lat = 49.267665596,  \n                popup = paste0(\"UBC\"))  \n    map1  \n    ```\n'''\n\n---\n\n# Step 3 \n\nYou will need two files in your repository:\n1. `runtime.txt` Specify the R version by date. The easiest way is to write today's date (e.g. r-2021-12-07). \n\n    ```\n    r-2021-12-07\n    ```\n\n2. `install.R` A list of `install.packages('package_name')` commands, one per line.\n    For our example:\n    ```\n    install.packages(c(\"leaflet\", \"tidyverse\",\n                       \"knitr\", \"rmarkdown\",\n                       \"caTools\", \"bitops\"))\n    ```\n    \nYou can find a template for both files in the next section.\n\n---\n\n# Step 4 \n\n- Push all your repository changes back to GitHub.\n- Your repository should now look like this:\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/12_github.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/12_github.png?raw=true\" alt=\"R repo\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 5\n\n- Go to [mybinder.org](https://mybinder.org/).\n- Type the URL of your repo into the \"GitHub repo or URL\" box. It should look like this:\n```\nhttps://github.com/your-username/my-first-R-binder\n```\n\n- Where it says \"Git ref\", type in `main` or the branch that you would like to use.\n- Where it says \"URL to open (optional)\", choose URL and type `rstudio`.\n- As you type, the webpage generates a link in the \"Copy the URL below...\" box. 
It should look like this:\n```\nhttps://mybinder.org/v2/gh/your-username/my-first-R-binder/main?urlpath=rstudio\n```\n\n---\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/13_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/13_binder.png?raw=true\" alt=\"Binder Page\" width=65% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 5b\n\n- Once this is done simply hit the launch button. \n- My Binder will create your binder repo in a few minutes.\n- Be patient. The first time it might take some while to build.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/14_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/14_binder.png?raw=true\" alt=\"Binder loading\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 6\n\n- Copy the generated link, open a new browser tab and visit that URL.\n\n- You will see a \"spinner\" as Binder launches the repository.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/08_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/08_binder.png?raw=true\" alt=\"Binder Spinner\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n\n---\n\n# Step 7\n\n- RStudio will open in your browser.\n- You will have to open your `.Rmd` file manually by clicking on it.\n    - You can find it on the bottom right panel.\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/15_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/15_binder.png?raw=true\" alt=\"Binder from Git\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Step 8\n\n- Once built, you can share the 
link to this RStudio instance with anybody you want to run your project on their machine.\n\n- Save your LaunchBinder Badge and share it!\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sedv8808/my-first-R-binder/main?urlpath=rstudio)\n\n<a href=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/16_binder.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/16_binder.png?raw=true\" alt=\"Binder from Git\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n# Let's practice!","fields":{"slug":"/chapter4_04_R_repository"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> What We Learned</h1></div>\n\n--- \n\n# Binder is ...\n\n- Code sharing made easy\n\n- You will never again have to say \"but it ran on my computer\"\n\n- A Binder Repository contains at minimum two elements:\n    - Code or content that you want people to be able to run. \n    - Configuration files for the environment.\n    \n- BinderHubs allow you to deploy Jupyter Notebooks and Rmd files the easy way\n\n---\n\n# Advantages of Binder\n\n- Binder lets you share links to interactive data analytics environments. This is great for workshops, tutorials, and classes.\n- Binder can provide interactivity to documentation and demonstrations of tools. \n- Binder can provide interactivity to readers, allowing them a richer experience with your content.\n\n---\n\n# Limitations of Binder\n- Each instance is limited to 2 GB of RAM and will get destroyed after 10 minutes of inactivity. 
\n- Each instance can run for a maximum of 24 hours before it will get killed.\n- You can get around these limitations by hosting your own binder hub but this will require compute + devops resources from your side.\n\n---\n\n# Let's use what we learned!","fields":{"slug":"/chapter4_05_summary_and_conclusions"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 0<br>Learning Outcomes</h1></div>\n\n---\n\nBy the end of this module, you should be able to:\n\n- Create, edit and run an Rscript \n- Write and run R Markdown reports\n- Create and run R Markdown presentations\n- Elaborate an RStudio project\n- Edit/customize RStudio project settings \n- Use RStudio as a Git client to manage your project's version control (?)\n\n---\n\n# Let's Get Started!","fields":{"slug":"/chapter5_00_learning_outcomes"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 1<br>Introduction to RStudio</h1></div>\n\n---\n\n# What is RStudio?\n\nRStudio is an IDE (integrated development environment) mainly used for R.\n\nIt includes tools for:\n- linting\n- code completion\n- debugging.\n\nYou can also work with Rmd files which enable a notebook-like functionality.\nYou can use other language engines, such as Python.\n\n---\n\n## Opening RStudio\n\nIf you open *RStudio* you will encounter the following screen:\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module4/00_rstudio.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\n---\n\n## Running Rscripts in RStudio\n\n- Launch this [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/throughput-ec/ec-binder/main?urlpath=rstudio) in a new tab and follow these steps:\n\n- Creating a new R 
script: \n  * From the menu click on: \n      File -> New File -> R Script\n\n- Write some simple R code in the document that opens in the editor:\n```r\n4 + 5\n```\n\n- Run your code by clicking on the \"Source\" button (upper right-hand side) to run the entire document.\n- To run only the current line or the highlighted code, press Ctrl+Enter (Command+Return on macOS).\n\n---\n\n## Code output\n\nWhen you run R code, the output ends up in one of two places, depending on the type of output:\n\n* Textual output: printed to the console.\n\n* Graphical output: displayed in the Files/Help/Plots panel, *usually* at the bottom right of the window.\n\n---\n\n## Getting help in RStudio\n\nIf you want to know more about what a function or package does, type a `?` followed by the function's name:\n\n```\n?function_name\n```\n\nHelp will appear in the bottom right pane of RStudio.\n\n---\n\n## The Importance of the Files Pane\n\nWhen you open a `.R` or `.Rmd` file in RStudio, the current working directory is **not** necessarily the project working directory or the directory of the file you opened.\n\n---\n\n## Where am I? (or the files pane)\n\n**EVERY SESSION** you need to tell RStudio what your `working directory` is, especially if you are loading different files.\n\nYou can find out where you are by:\n\n1. Typing `getwd()` in the console\n\n2. In the files panel, clicking the cog/More button and then clicking \"Go To Working Directory\"\n\n---\n\n## Setting your Working Directory\n\nSet your working directory to the root directory of the Git repository you are working in!\n\nYou can set the working directory in any of the following 3 ways:\n\n1. In the Session menu, click Set Working Directory and then Choose Directory. Navigate the opened file browser to choose the directory. \n\n2. In the files panel, navigate the file structure to where you want the working directory to be. Then click the cog/More button and then click \"Set As Working Directory\"\n\n3. 
type `setwd(\"PATH\")` in the console.\n\n---\n\n## Setting the working directory is important!\n\nIf you are working in RStudio and you start feeling lost, you probably forgot to set the working directory.\n\nWe will see how to create R Projects later which will help us keep our directory clean.\n\n**Suggestion:** Try using the [here](https://github.com/jennybc/here_here) package instead.\n\n---\n\n# Let's Practice!\n","fields":{"slug":"/chapter5_01_introduction_to_Rstudio"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 3<br>Creating a Presentation with R Markdown</h1></div>\n\n---\n\n## How to Set Up a Presentation\n\n- Open a new R Markdown file\n\n- In the YAML header output specify `ioslides_presentation` \n\n- You can create a slide show broken up into sections by using the # and ## heading tags \n\n- A Slide without a header can be set by using a horizontal rule (---). 
\n\n- Code and text will be rendered: remember to use your code chunk options accordingly.\n\n- Complete the next exercise to create your first R Markdown presentation.\n\n---\n\n# Let's practice what we learned!\n","fields":{"slug":"/chapter5_03_creating_a_presentation"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 2<br>Creating an R Markdown Document</h1></div>\n\n---\n\n## What is an R Markdown Document?\n\nAn R Markdown document is very similar to a Jupyter Notebook.\n\nIt allows you to create and share documents that contain live code, equations, visualizations, and text.\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/00_rstudio.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\n---\n\n## Getting Started\n\n- To create a new R Markdown document: \n  * From the menu click on: \n      File -> New File -> R Markdown\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/03_create_markdown.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\n---\n\n- You will be asked to choose some settings for the R Markdown document.\n- For now, leave the default options. \n  - At the end of the module you will see which other outputs you could choose.\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/04_create_markdown.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\n---\n\n## Text and rendering R Markdown documents\n\nIn an R Markdown document, any line of text not in a code chunk will be formatted using Markdown. You can use HTML and LaTeX here to do more formatting. \n\nUnlike Jupyter, code and text do not render on their own; you will need to \"knit\" (render) the whole document in order to see the rendered output. 
\n\nClick the \"Knit\" button at the top to render the document:\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/01_knit_button.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n \n---\n\nWhen you \"knit\", a Markdown (`.md`) file will be created and a new window will pop up with your rendered document (usually a `.pdf` or `.html` document). \n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/02_rmd.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\n---\n\n## Rendering Often\n\nIt is important to \"knit\" every time you make an important change.\n\nAn error in a LaTeX equation or a code chunk will stop the rendering process. \n\nIn a long document it can be hard to identify what went wrong, so render often to find and fix errors easily. \n\n---\n\n## Creating code chunks\n\nInstead of code cells as in Jupyter, R Markdown has code chunks. \n\nTo start a Code Chunk: \n- Write 3 backticks (\\`\\`\\`) followed by curly braces containing the language engine you want to run (usually r): `{r}`. \n- If you want to use Python, write `{python}`.\n\n- Code is entered on the lines below.\n\n- To finish a Code Chunk, close it with 3 more backticks (\\`\\`\\`).\n\n---\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/05_code_chunk.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\n- Code chunks are run when you knit the entire document. \n- The code in the chunk and the code output will be included in your rendered Markdown (`.md`) document. 
\n\n---\n\n## Naming Code Chunks and Markdown sections\n\nWhen you include Markdown headers (`#` symbol), RStudio automatically creates a pop-up-like menu that you can use to navigate the document:\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/06_title_navigator.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\n---\n\nYou can also name Code Chunks by writing the desired name after the language engine inside the curly braces:\n```{r my-name}\n# This code chunk is named my-name\n```\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/07_code_navigator.png?raw=true\" alt=\"Binder\" width=45% align=\"center\">\n\nWhen you click on any of the options in the pop-up-like menu, RStudio takes you to that section of the R Markdown document. \n\nWARNING: Do not duplicate code chunk names; by default, knitting stops with an error when two chunks share a name.\n\n---\n\n## Code Chunk Options\n\nThere are many code chunk options that you can set, for example:\n- Choose whether a chunk is evaluated, allowing you to ignore code that may have errors and still render the Rmd file.\n- Choose whether to include the output in the rendered document.\n- Lots of other [options](https://yihui.org/knitr/options/#chunk-options) are documented.\n\n---\n## Code Chunk Options\n\n- Code Chunk options can be set at a global level or locally for a specific chunk.\n- Global options are set in one chunk at the top of the document. For example:\n\n```{r, setup, include=FALSE}\nknitr::opts_chunk$set(\n  comment = '', fig.width = 6, fig.height = 6\n)\n```\n\n- Global options are set by passing them as arguments to the function `knitr::opts_chunk$set(...)`\n\n---\n\n## Code Chunk Options\n\n- Local chunk options are set by adding the options in the curly braces of a code chunk after the language engine and code chunk name. For example:\n\n```{r correlation-no-warning, warning = FALSE}\ncor( c( 1 , 1 ), c( 2 , 3 ) )\n```\n\n- Separate multiple options in a code chunk with a comma. 
\n\n---\n\n## Document output options\n\n- Besides Markdown text and code chunks, you can add an optional YAML header to your document.\n\n- Specify a YAML header by surrounding it with `---`\n\n- Include:\n  - title\n  - author\n  - output\n  - etc.\n\n---\n\n## Example YAML Header\n\n~~~\n ---\n title: \"Finding coordinates in a map\"\n author: \"My Name\"\n date: \"December 07, 2021\"\n output: html_document\n ---\n~~~\n\n---\n\nOutput options include:\n\n- `output: github_document`\n- `output: html_document`\n- `output: pdf_document`\n- [others](https://bookdown.org/yihui/rmarkdown/output-formats.html)","fields":{"slug":"/chapter5_02_creating_an_Rmd_file"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> What We Learned</h1></div>\n\n--- \n\n## RStudio\n\n- RStudio is an integrated development environment (IDE) mainly for R. \n- It has 4 panels, which include:\n  - an editor that supports direct code execution\n  - tools for plotting, history, debugging\n  - a console.\n\n- It allows us to create Rmd files, which work like Jupyter Notebooks\n- We can create self-contained code projects using R projects.\n\n---\n\n## An R Markdown File\n\n- A file format for making dynamic documents with R. \n- An R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code and markdown text.\n- R Markdown files are the source code for reproducible documents. You can transform an R Markdown file in two ways:\n  * `knit` - knitr will run each chunk of code in the document and append the results of the code to the document next to the code chunk. This workflow facilitates reproducible reports.\n\n  * `convert` - The rmarkdown package will use the pandoc program to transform the file into a new format, such as an HTML, PDF, or Microsoft Word file. 
This is defined in the YAML header.\n\n---\n\n# Let's practice what we learned!","fields":{"slug":"/chapter5_05_summary_and_conclusions"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 4<br>Creating an RStudio Project</h1></div>\n\n---\n\n# What is an R Project?\n\n- A directory containing a special file: `*.Rproj`\n\n- Allows you to do the following:\n\n  - use RStudio as a Git client\n\n  - stop using `setwd()` to set your working directory\n\n---\n\n## Getting Started\n\n- To create a new R Project: \n  * From the menu click on: \n      File -> New Project\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/09_rproj.png?raw=true\" alt=\"RProject\" width=45% align=\"center\">\n\n---\n\n- A pop-up will appear.\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/10_rproj.png?raw=true\" alt=\"RProject\" width=45% align=\"center\">\n\n---\n\n## Creating RStudio projects\n\nYou can choose if you want to start a new project from:\n- A GitHub repository\n- An existing directory\n- A completely new directory\n\n---\n\n## Creating from a completely new directory\n\n- Click on `New Directory`\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/11_rproj.png?raw=true\" alt=\"RProject\" width=45% align=\"center\">\n\n---\n\n- Name the project and browse to where you would like to set it up.\n\n- Optionally, you can also set up a new git repository from here.\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/12_rproj.png?raw=true\" alt=\"RProject\" width=45% align=\"center\">\n\n---\n\nIf you already have a git repository:\n\n1. 
Click on `Version Control`\n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module5/10_rproj.png?raw=true\" alt=\"RProject\" width=45% align=\"center\">\n\n\n2. Fill in:\n  - Repository URL\n  - Create Project as a subdirectory of\n  \n---\n\n## Motivation - avoid using setwd()\n\nSo what’s wrong with:\n\n```\nsetwd(\"~/USER/my_awesome_project/sub_project_1/data\")\nread.csv(\"data_shared_with_everyone.csv\")\n```\n\n- The chance of the `setwd()` command making the file paths work for anyone besides its author is close to 0%. Even the author might have issues down the line.\n\n- Your data analysis project is not self-contained and portable, which makes it very hard for others to recreate your work.\n\n---\n\n## Solution - Where is your working directory?\n\n- After you create an R project, your working directory should be the R Project's root directory. \n\n**Verify that by typing `getwd()` in your R console**\n\n---\n\n## Using RStudio to drive Git\n\n- RStudio can be used as a Git GUI to `add`, `commit`, `push` and `pull` your changes. \n\n- This only works IF you have a `.Rproj` file.\n\n- You can find the Git tab in the upper right panel of RStudio.\n\n---\n\n## Use `.Rproj` to open RStudio\n\n- You can double click on the `.Rproj` file of an RStudio project to open RStudio.\n\n- When you do this, it also sets the current working directory to be the RStudio project's working directory.\n\n---\n\n## Organizing projects:\n\n- A data analysis task can be organized using an RStudio Project. \n\n- A suggestion on how to organize a project directory is:\n\n```\ndata/\nresults/\nscripts/\n.Rproj\n```\nwhere `data/`, `results/`, and `scripts/` are directories as well.\n\n- When you need to share an analysis, you can share the entire project. 
\n\n- This will keep the structure of your project.\n\n---\n\n# Let's Practice What We Learned","fields":{"slug":"/chapter5_04_R_project"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 0<br>Learning Outcomes</h1></div>\n\n---\n\nBy the end of this module, you should be able to:\n\n- Understand what Docker is\n- Use Docker Images from the DockerHub\n- Create your own Docker Images\n\n<img src=\"https://res.cloudinary.com/practicaldev/image/fetch/s--up7-nOgB--/c_imagga_scale,f_auto,fl_progressive,h_420,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/i/mfnwwxkfx46xzrlndw12.png\" alt=\"Motivation Docker\" width=50% align=\"center\">\n\n\n---\n\n# Let's Get Started!","fields":{"slug":"/chapter6_00_learning_outcomes"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 1<br>Introduction to Docker</h1></div>\n\n---\n\n## What is Docker?\n**Motivation**\n\n<img src=\"https://res.cloudinary.com/practicaldev/image/fetch/s--lIJpZE9A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/34328907/78120341-7b56f900-7429-11ea-9b3f-1a5e17b813da.png\" alt=\"Motivation Docker\" width=45% align=\"center\">\n\n[Source](https://res.cloudinary.com/practicaldev/image/fetch/s--lIJpZE9A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/34328907/78120341-7b56f900-7429-11ea-9b3f-1a5e17b813da.png)\n\n---\n\n## What is Docker?\n**Motivation**\n\nImagine the following scenario:\n- You are working on an analysis in R / python and you send your code to a colleague.\n\n- Your colleague runs exactly this code on exactly the same data but they get a different result or even worse, an error. 
\n\n- Some reasons for this to happen: a different operating system, or a different version of R or of an R package. \n\n- Docker is trying to solve problems like that. \n\n- Docker is an open-source containerization platform.\n\n---\n\n## What is Docker?\n\n- A Docker container can be seen as a computer inside your computer. \n\n- You can send this \"inside computer\" to your colleagues.\n\n- Your colleagues will use this \"virtual computer\" and run your code. They will get exactly the same results.\n\n- Docker allows you to wrangle dependencies (from the operating system up to details such as R, Python and LaTeX package versions).\n\n- It helps make your analyses reproducible.\n\n- It makes your analysis **portable**  and **sharable**.\n\n---\n## Important Vocabulary\n\nThrough our journey with Docker, the following terms might come up often:\n\n- Virtual Machine\n- Container\n- Docker Image\n- Docker Container\n\n---\n\n### Virtual Machine\n- Behaves like your physical computer/phone/server.\n\n- A VM has its own CPU, storage, memory, and access to the internet.\n\n- A software-based version of a computer, stored in a file typically called an **image**.\n\n- A *VM image* is a set of instructions on how exactly to assemble the code and achieve a desired software configuration.\n\n---\n\n### Container\n- Software executable bundles where the bins, libraries, and dependencies are packaged alongside their code under a standardized framework.\n\n- Lightweight images of an application can then be run anywhere: desktop, cloud, etc.\n\n- This avoids the \"It works on my machine\" problem.\n\n- People currently use the terms \"docker\" and \"container\" interchangeably.\n\n---\n\n### Docker Image\n\n- Source code for binaries, libraries, tools, dependencies that are required to function as an application.\n\n- When Docker runs an image, it becomes a container.\n\n- One image can be the base for multiple containers that share commonalities.\n\n- You can add 
several \"layers\" of images on top of the container layer. \n\n---\n\n### Docker Container\n\n- What we ultimately build: the application that is completely interactable by users and administrators.\n\n- Remember: Docker images are a set of instructions to build a container.\n\n- This is the \"background service\" that runs the Docker operations.\n\n- Keeps track of Docker operations and assigns them proper tags.\n\n---\n\n# Let's Practice!\n","fields":{"slug":"/chapter6_01_introduction_to_docker"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 2<br>Launching Docker</h1></div>\n\n---\n\n## Installation\n\n- Follow the installation instructions at [Get Docker](https://docs.docker.com/get-docker/). \n\n- You will be directed to several tutorials. Feel free to take a look at them.\n\n---\n\n## Launching Our First Docker Image\n\n1. To launch Docker, open a Unix shell:\n  - On Mac or Windows, use the `Docker Quickstart Terminal` that you installed.\n  - On a Linux machine, use a terminal prompt.\n\n2. We are going to use a pre-existing image: [rocker/geospatial](https://hub.docker.com/r/rocker/geospatial). \n\n3. In the `Docker Quickstart Terminal` type:\n\n```\ndocker run --rm -p 8787:8787 rocker/geospatial\n```\n\n---\n\n## Launching Our First Docker Image\n\n`-p` and `--rm` are flags that customize how the container is run. \n\n`-p`: publishes a container port on your computer, written as `host_port:container_port`.  \nSince we specified port 8787 in our command line, we need to go to:\nhttp://localhost:8787/\n\n`--rm`: deletes the container when it exits. Otherwise, the stopped container will be kept on our local computer.  
\nContainers can occupy a lot of disk space.\n\n---\n## Launching Our First Docker Image\n\n- You should see:\n````markdown\n                        ##         .\n                  ## ## ##        ==\n               ## ## ## ## ##    ===\n           /\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\\___/ ===\n      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~\n           \\______ o           __/\n             \\    \\         __/\n              \\____\\_______/\ndocker is configured to use the default machine with IP 192.168.99.100\nFor help getting started, check out the docs at https://docs.docker.com\nThus, you would enter http://192.168.99.100:8787 in your browser as the url.\n````\n---\n\n## The Hub\n\nWhere did `rocker/geospatial` come from? \n- If you try to run a Docker image that you do not have locally, Docker will automatically search for it on Docker Hub (an online repository for Docker images).\n- If it exists, it will download it.\n- In our case, this is the <a href=\"https://hub.docker.com/r/rocker/geospatial\" target=\"_blank\"> Docker repository </a>\n- Notice that it contains some instructions and lists the dependencies that are included in this image.\n\n---\n\n- The command above launches RStudio Server invisibly. \n- To connect to it, open a browser and enter `http://localhost:8787/`\n- You should see the RStudio welcome screen. Log in using:\n\n```\nusername: rstudio \npassword: # given in terminal\n```\n\n---\n\n- You are able to work with RStudio in your browser in much the same way as you would on your desktop.\n\n- Look at your terminal: the password is there. You can change the password with the flag `-e PASSWORD`\n\n- Exercise: Change the login password.\n\n```\ndocker run --rm -e PASSWORD=<YOUR_PASS> -p 8787:8787 rocker/geospatial\n```\n---\n\n## Linking Volumes\n\n1. Given that we used the `--rm` flag when we launched the Docker container, anything we create inside the container will be gone once it stops. Let’s verify this.\n\n2. 
Open a new R script.\n\n3. Enter the following code in the script, run it and save it:\n\n```\n# make x the numbers from 1 to 5, and y the numbers from 6-10\nx <- 1:5\ny <- 6:10\n\n# plot x against y\nplot(x, y)\n```\n---\n\n## Linking Volumes\n\n4. In your files panel, you will see the new script file.\n\n5. Close the browser tab where you have RStudio open, then go to the terminal window from which you launched the Docker container and type Control+C. This shuts down the Docker container.\n\n6. Relaunch a Docker container using the RStudio image as you did previously:\n```\ndocker run --rm -p 8787:8787 rocker/geospatial\n```\n\n7. Go to `http://localhost:8787/` in your browser.\n\n8. The R script you created ... is... gone.\n\n---\n\n## Linking Volumes\n\n- Linking a volume allows us to access data and save files on our local machine.\n\n- When launching a container, add the `-v` flag along with the path to your project’s root directory and the path inside the container, separated by a colon: \n```\ndocker run --rm -p 8787:8787 -v /Users/your_user/Documents/r_docker_tutorial:/home/rstudio/r_docker_tutorial rocker/geospatial\n```\n\n- Then go to `http://localhost:8787/` in your browser.\n\n- Set the working directory to the directory called `r_docker_tutorial`\n\n- You can load data from your computer into the launched RStudio tab.\n\n- You can save your analysis into the `r_docker_tutorial` directory.\n\n- If you created any documents, you will find them inside the `r_docker_tutorial` directory after you close the RStudio browser tab and exit your Docker container.\n\n---\n\n# Let's Practice!","fields":{"slug":"/chapter6_02_launching_a_docker"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 3<br>The Docker Hub</h1></div>\n\n---\n## What is the Docker Hub?\n- We used it in the last lesson.\n- Docker Hub is the place where open Docker images are stored. 
\n- When we ran our first image by typing\n```\ndocker run --rm -p 8787:8787 rocker/geospatial\n```\nDocker checked if this image is available on your computer.\n\n- If not, the image is downloaded \"automatically\" from the Docker Hub. \n\n- If you just want to pull the image but not run it, you can also do\n```\ndocker pull rocker/verse\n```\n\n---\n\n## Pushing an image to Docker Hub\nWhat happens if you want to design your own image so that others can use it?\n\nWith the Docker Hub, you can easily share your image on https://hub.docker.com/.  \nAfter verifying your email you are ready to go and upload your first Docker image.\n\nLet's get started!\n\n---\n\n## Setting up an account on Docker Hub\n\n1. Create an account and log in on https://hub.docker.com/\n2. Click on Create Repository.\n3. Choose a name and a description for your repository and click on `Create`.\n4. Now, log into the Docker Hub from the command line\n```\ndocker login --username=yourhubusername\n```\n5. Enter your password when prompted. \n6. If everything worked you will get a message similar to\n```\nWARNING: login credentials saved in /home/username/.docker/config.json\nLogin Succeeded\n```\n---\n\n## Getting an image ready\n\n1. 
Check the image ID using\n```\ndocker images\n```\nThe output will look similar to\n\n```\nREPOSITORY                         TAG       IMAGE ID       CREATED         SIZE\ncourse-starter-python_gatsby       latest    8ec687baf514   3 weeks ago     1.38GB\nec_workshops_gatsby                latest    8ec687baf514   3 weeks ago     1.38GB\nrocker/rstudio                     latest    1878e29db52f   3 months ago    1.93GB\nrocker/verse                       latest    0168d115f220   3 days ago      1.954 GB\nds-toolbox_gatsby                  latest    c7a440cb78ff   3 months ago    2.25GB\nsedv8808/unacquired_sites_db_app   latest    a8a671927478   10 months ago   1.26GB\nsedv8808/unacquired_sites_ml_app   latest    6dd7e4093508   10 months ago   1.39GB\n```\n\n---\n## Tagging your Image\n\n- Tag your image with:\n```\ndocker tag 0168d115f220 yourhubusername/verse_gapminder:firsttry\n```\n\n- The number must match the image ID and `:firsttry` is the tag. \n- A good tag will help you understand what this particular image is intended for.\n- Examples of good tags:\n    - A paper’s DOI or journal-issued serial number\n    - A particular version of a code or data version control repo\n\n---\n\n## Push your image \n\n- To push your image, type:\n```\ndocker push yourhubusername/verse_gapminder:firsttry\n```\n\nYour image is now available for everyone to use.\n\n---\n\n## Saving your images locally\n\n- To save a Docker image after you have pulled, committed or built it, use the `docker save` command. 
\n```\ndocker save verse_gapminder > verse_gapminder.tar\n```\n\n- To load that Docker container from the archived tar file in the future, we can use the docker load command:\n```\ndocker load --input verse_gapminder.tar\n```\n\n---\n\n# Let's Practice What We Learned","fields":{"slug":"/chapter6_03_pushing_pulling_dockerhub"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 4<br>Dockerfiles</h1></div>\n\n---\n\n## What is a Dockerfile?\n\n- Dockerfiles are a set of instructions on how to add things to a base image. \n\n- They build custom images up in a series of layers. \n\n- It is a configuration file that describes several things: \n    - from what previous Docker image you are building this one\n    - how to configure the OS\n    - what happens when you run the container\n\n---\n## Writing a Dockerfile\n\nLet’s build a very basic Dockerfile for R. \n\nThe task to solve is:  \n```\nI have today an analysis that works in a .R file. \nI want to make sure that this analysis will always work in the future, \nregardless of any update to the packages used.\n```\n\nCreate a project folder. Then, inside this folder, in the `root`, create a new text file.\n\nSave this now empty text file as `Dockerfile`\n\n---\n\nLet's use a modified R script from module 4.\nThis is the main analysis that we will want to reproduce.\n\nSave it as `my_analysis.R`\n\n```\nlibrary(leaflet)  \nlibrary(htmlwidgets)\n\nleaflet(options = leafletOptions(minZoom = 0, maxZoom = 18))  \n\n## My Leaflet Map\n\nmap1 <- leaflet() %>%  \n  addProviderTiles(providers$Stamen.TerrainBackground) %>%  \n  addTiles() %>%  \n  addCircleMarkers(lng =-123.241999032 , lat = 49.267665596,  \n                   popup = paste0(\"UBC\"))  \n\nmap1  \n\n## 'leaflet' objects (image above)\nsaveWidget(map1, file=\"m.html\")\n```\n\n---\n\n1. 
In your new Dockerfile, write the following:\n\n```\nFROM rocker/verse:latest\n```\n\nEvery Dockerfile starts with a `FROM`.  \nThis tells Docker to start with the rocker/verse base image.   \nThere are a lot of other official images, and you can also build from a local one.\n\nThis FROM is, in a way, describing the dependency of your image.\n\n**REMEMBER**\nThe `FROM` command must always be the first thing in your Dockerfile.\n\n---\n\n2. Add a layer on top of the verse image. Install `leaflet` and `htmlwidgets`.\n```\nRUN R -e \"install.packages('leaflet')\"\nRUN R -e \"install.packages('htmlwidgets')\" \n```\n\n`RUN` commands in your Dockerfile execute shell commands to build up your image. \n\nYou can see that, since we are working with `R`, we use the same commands that we would use in the R console.\n\n---\n\n## Building a Dockerfile\n\n3. Return to the Docker terminal.\n\n4. Build the image with:\n```\ndocker build -t my-r-image .\n```\n\n`-t my-r-image` names the image (use only lower case)  \n`.` says all the resources we need to build the image are in our current directory. \n\nThis step will probably take a few minutes. How long it takes depends on the complexity of your image. You will see lines being printed on the Docker terminal.\n\nWait until it is done.\n\n---\n\n5. List your images via:\n```\ndocker images\n```\n6. You should see `my-r-image` in the list. \n\n<img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module6/00_docker_images.png?raw=true\" alt=\"Docker Images\" width=45% align=\"left\"/>\n\n---\n\n## Launch your new image \n\n- Simply run the command that you already know:\n```\ndocker run --rm -p 8787:8787 my-r-image\n```\n\nThen in the RStudio console in the browser, load the `leaflet` library:\n\n```\nlibrary('leaflet')\n\n```\n\nYou will see the library is loaded. 
This means `leaflet` is pre-installed and ready to go in your new Docker image.\n\n---\n\nYou can also start the container with a mounted volume to save the output.\n\n```\ndocker run --rm -p 8787:8787 -v /Users/your_path/r_docker_tutorial:/home/rstudio/r_docker_tutorial my-r-image\n```\n\n---\n\n## Adding Data\n\n- You may also want some static files inside your Docker image, such as data.\n\n- Add a line in the Dockerfile:\n```\nADD data/gapminder-FiveYearData.csv /home/rstudio/\n```\nRebuild your Docker image:\n```\ndocker build -t my-r-image .\n```\nAnd launch it again:\n```\ndocker run --rm -p 8787:8787 my-r-image\n```\nGo back to RStudio in the browser, and `gapminder-FiveYearData.csv` will be present in the files visible to RStudio. \n\n---\n## Data Security\n\n- To add the data to the image, the CSV file must be in your project's root directory or in a subdirectory of it (here, `data/`); the path in the `ADD` line is relative to that root. \n\n- Be careful when uploading sensitive data.\n\n---\n\n## Advanced Dockerfiles\n\nSometimes, we only want to send a Docker image that reproduces all the analysis and just outputs the result files.\n\nWe can add a few more lines to our Dockerfile to achieve this goal.\n\n---\n\n## RUN mkdir and WORKDIR\n\nThe `mkdir` command is used to make a new directory. \nYou can use it to create the `app` or a main directory in your Dockerfile.\n\nMost important is the `WORKDIR` command, which sets your working directory within Docker.\n\nAdd the following two lines to your Dockerfile\n\n```\nRUN mkdir /home/my_docker\nWORKDIR /home/my_docker\n```\n\n---\n\n## Copy\n\nLet's get our analysis script from our host machine to the container. \n\nFor that, we’ll need to use `COPY` localfile pathinthecontainer.   
\nNote that here, `my_analysis.R` has to be in the same folder as the Dockerfile on your computer.\n\nAdd this line to your Dockerfile\n\n```\nCOPY my_analysis.R /home/my_docker/my_analysis.R\n```\n\n---\n\n## CMD\n\nFinally, `CMD`.\n\nThis command will be run every time you launch a container from the image. \n\nWe want `my_analysis.R` to be sourced. Add this line to your Dockerfile\n```\nCMD R -e \"source('/home/my_docker/my_analysis.R')\"\n```\n\nSave the Dockerfile, rebuild, and reload.\n\n---\n\n## Final Complete Dockerfile\n\n```\n# image where you are basing yourself from\nFROM rocker/verse:latest\n\n# libraries that you want to have in your image\nRUN R -e \"install.packages('htmlwidgets')\"\nRUN R -e \"install.packages('leaflet')\"\n\n# creating the directory where your app will live\nRUN mkdir /home/my_docker\n# setting the working directory\nWORKDIR /home/my_docker\n\n# copying scripts that you might want in your image\nCOPY my_analysis.R /home/my_docker/my_analysis.R\n\n# running the scripts\nCMD R -e \"source('/home/my_docker/my_analysis.R')\"\n```\n\n---\n\n# Let's practice what we learned!\n","fields":{"slug":"/chapter6_04_Dockerfiles"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> What We Learned</h1></div>\n\n---\n\n## Docker\n\nContainers give us:\n\n- Functionality for apps regardless of the host operating system.\n\n- High portability: everything is packed together, so it is easy to transfer these apps from project to project.\n\n- High sharability.\n\n---\n\n# Let's practice what we learned!","fields":{"slug":"/chapter6_05_summary_and_conclusions"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 0<br>Learning Outcomes</h1></div>\n\n---\n\n\nBy the end of 
this module, you should be able to:\n\n- Define data science, and the related terms reproducible and auditable analysis\n- Give examples of workflows that are considered reproducible and trustworthy in the context of a data analysis\n- Describe the data analysis cycle\n- Explain how to mechanistically start a data analysis project\n- State and refine a data analysis question\n\n---\n\n\n- Define the following 3 types of testing\n    - unit testing\n    - integration testing\n    - regression testing\n- Define continuous integration testing\n- Explain why continuous integration testing is superior to manually running tests\n- Define the following key concepts that underlie GitHub Actions: Actions, Workflow, Event, Runner, Job, Step\n\n---\n\n\n- Store and use GitHub Actions credentials safely via GitHub Secrets\n- Explain who owns the copyright of code they write in a given situation, and why\n- Choose an appropriate license for software (i.e., packages or analysis code)\n- Choose an appropriate license for your non-software materials\n\n---\n\n# Let's Get Started!","fields":{"slug":"/chapter7_00_learning_outcomes"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 1<br>File Names</h1></div>\n\n---\n\n# Three principles for (file) names:\n\n- Machine readable\n- Human readable\n- Plays well with default ordering\n\n---\n\n## Machine readable\n\n- Regular expression and globbing friendly\n- Avoid spaces, punctuation, accented characters, case sensitivity\n- Easy to compute on\n- Deliberate use of delimiters\n\n---\n\n## Machine readable Motivation \n\n- Easy to search for files later\n\n- Easy to narrow file lists based on names\n\n- Easy to extract info from file names, e.g. by splitting\n\n- New to regular expressions and globbing? 
Be kind to yourself and avoid\n    + Spaces in file names\n    + Punctuation\n    + Accented characters\n    + Different files named `foo` and `Foo`\n\n---\n\n## Human readable\n\n- Name contains info on content\n\n- Connects to the concept of a *slug* from semantic URLs\n\n---\n\n## Embrace the slug\n\nExamples of names\n```\n01_marshal-data.r\nhelper01_load-counts.r\n```\n\n---\n\n## Human readable Motivation\n\nEasy to figure out what a file is, based on its name\n\n---\n\n## Plays well with default ordering\n\n- Put something numeric first\n\n- Use the ISO 8601 standard for dates\n\n- Left pad other numbers with zeros\n\n**Examples**\n\nChronological order\n```\n```\n\nLogical order\n```\n```\n\nDates\n```\n```\n\n---\n\n## Left pad other numbers with zeros\n\nIf you don’t left pad, you get this:\n\n~~~\n10_final-figs-for-publication.R\n1_data-cleaning.R\n2_fit-model.R\n~~~\n\n---\n## Examples of Good and Bad Names\n\n**BIG NO**\n```\nmyabstract.docx\nJoe’s Filenames Use Spaces and Punctuation.xlsx\nfigure 1.png\nfig 2.png\nJW7d^(2sl@deletethisandyourcareerisoverWx2*.txt\n```\n\n**YES**\n```\n2014-06-08_abstract-for-sla.docx\njoes-filenames-are-getting-better.xlsx\nfig01_scatterplot-talk-length-vs-interest.png\nfig02_histogram-talk-attendance.png\n1986-01-28_raw-data-from-challenger-o-rings.txt\n```\n\nSource: [Data Carpentry](https://datacarpentry.org/rr-organization1/01-file-naming/index.html)\n\n---\n\n# Let's Practice!\n","fields":{"slug":"/chapter7_01_filenames"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 2<br>Project Organization</h1></div>\n\n---\n\n## Data analysis workflow \n\n![workflow](https://datacarpentry.org/rr-organization1/fig/workflow.png)\n\n---\n\n## Face it...\n\n- There are going to be files\n\n- LOTS of files\n\n- The files will change over time\n\n- The files will have relationships to 
each other\n\n---\n\n## For Your Inner Self\n\n- File organization and naming is a mighty weapon against chaos\n\n- Make a file's name and location VERY INFORMATIVE about what it is, why it exists, how it relates to other things\n\n- The more things are self-explanatory, the better\n\n---\n\n## Organizing your data analysis workflow\n\n**Raw data -> data**\n\nPick a strategy, any strategy, just pick one!\n\n<div class=\"columns-2\">\n\n![workflow_raw_data_to_data](https://datacarpentry.org/rr-organization1/fig/workflow_raw_data_to_data.png)\n\n~~~\ndata\ndata-raw\ndata-clean\ndata/\n  - raw\n  - clean\n~~~\n\n</div>\n\n---\n\n## Data -> results\n\nPick a strategy, any strategy, just pick one!\n\n<div class=\"columns-2\">\n\n![workflow_data_to_results_1](https://datacarpentry.org/rr-organization1/fig/workflow_data_to_results_1.png)\n\n~~~\ncode\nscripts\nanalysis\nbin\n~~~\n</div>\n\n---\n\n## Data -> results\n\nPick a strategy, any strategy, just pick one!\n\n<div class=\"columns-2\">\n\n![workflow_data_to_results_2](https://datacarpentry.org/rr-organization1/fig/workflow_data_to_results_2.png)\n\n~~~\nfigures\nresults\nresults/\n  - figs\n  - nums\nfigures\ntables\n~~~\n</div>\n\n---\n\n## A real (and imperfect!) 
example\n\n~~~\n  /Users/jenny/research/bohlmann/White_Pine_Weevil_DE:\n  total used in directory 246648 available 131544558\n  drwxr-xr-x  14 jenny  staff        476 Jun 23  2014 .\n  drwxr-xr-x   4 jenny  staff        136 Jun 23  2014 ..\n  -rw-r--r--@  1 jenny  staff      15364 Apr 23 10:19 .DS_Store\n  -rw-r--r--   1 jenny  staff  126231190 Jun 23  2014 .RData\n  -rw-r--r--   1 jenny  staff      19148 Jun 23  2014 .Rhistory\n  drwxr-xr-x   3 jenny  staff        102 May 16  2014 .Rproj.user\n  drwxr-xr-x  17 jenny  staff        578 Apr 29 10:20 .git\n  -rw-r--r--   1 jenny  staff         50 May 30  2014 .gitignore\n  -rw-r--r--   1 jenny  staff       1003 Jun 23  2014 README.md\n  -rw-r--r--   1 jenny  staff        205 Jun  3  2014 White_Pine_Weevil_DE.Rproj\n  drwxr-xr-x  20 jenny  staff        680 Apr 14 15:44 analysis\n  drwxr-xr-x   7 jenny  staff        238 Jun  3  2014 data\n  drwxr-xr-x  22 jenny  staff        748 Jun 23  2014 model-exposition\n  drwxr-xr-x   4 jenny  staff        136 Jun  3  2014 results\n~~~\n\n---\n\n## Data\n\nReady to analyze data:\n\n![sample_ready_to_analyze_data](https://datacarpentry.org/rr-organization1/fig/sample_ready_to_analyze_data.png)\n\n<hr>\n\nRaw data:\n\n![sample_raw_data](https://datacarpentry.org/rr-organization1/fig/sample_raw_data.png)\n\n\n---\n\n## Analysis and figures\n\nR scripts + the Markdown files from \"Compile Notebook\":\n\n![sample_ready_to_analyze_data](https://datacarpentry.org/rr-organization1/fig/sample_ready_to_analyze_data.png)\n\n<hr>\n\nThe figures created in those R scripts and linked in those Markdown files:\n\n![sample_raw_data](https://datacarpentry.org/rr-organization1/fig/sample_raw_data.png)\n\n## Scripts\n\nLinear progression of R scripts, and Makefile to run the entire analysis:\n\n![sample_scripts](https://datacarpentry.org/rr-organization1/fig/sample_scripts.png)\n\n## Results\n\nTab-delimited files with one row per gene of parameter estimates, test statistics, 
etc.:\n\n![sample_results](https://datacarpentry.org/rr-organization1/fig/sample_results.png)\n\n## Expository files\n\nFiles to help collaborators understand the model we fit: some markdown docs, a Keynote presentation, Keynote slides exported as PNGs for viewability on GitHub:\n\n![sample_expository](https://datacarpentry.org/rr-organization1/fig/sample_expository.png)\n\n---\n\n## Caveats / problems with this example\n\n- This project is nowhere near done, i.e. no manuscript or publication-ready figs\n\n- File naming has inconsistencies due to three different people being involved\n\n- Code and reports/figures all sit together because it’s just much easier that way w/ knitr & rmarkdown\n\n---\n\n## Wins of this example\n\n- I can walk away from the project and come back to it a year later and resume work fairly quickly\n\n- The two other people were able to figure out what I did and decide which files they needed to look at, etc.\n\n---\n\n## Tip: Life cycle of data\n\nHere’s how most data analyses go down in reality:\n\n- You get raw data\n\n- You explore, describe and visualize it\n\n- You diagnose what this data needs to become useful\n\n- You fix, clean, marshal the data into ready-to-analyze form\n\n- You visualize it some more\n\n- You fit a model or whatever and write lots of numerical results to file\n\n- You make prettier tables and many figures based on the data & results accumulated by this point\n\nBoth the data file(s) and the code/scripts that act on them reflect this progression\n\n---\n\n## Prepare data -> Do stats -> Make tables & figs\n\nThe R scripts:\n\n~~~\n01_marshal-data.r\n02_pre-dea-filtering.r\n03_dea-with-limma-voom.r\n~~~\n\n<hr>\n\nThe figures left behind:\n\n~~~\n02_pre-dea-filtering-preDE-filtering.png\n03-dea-with-limma-voom-voom-plot.png\n04_explore-dea-results-focus-term-adjusted-p-values1.png\n04_explore-dea-results-focus-term-adjusted-p-values2.png\n~~~\n\nFile organization should reflect inputs vs outputs and the flow of 
information\n\nSource: [Data Carpentry](https://datacarpentry.org/rr-organization1/02-file-organization/index.html)\n\n---\n\n# Let's Practice!","fields":{"slug":"/chapter7_02_project_organization"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 4<br>Testing</h1></div>\n\n---\n\n## What is Testing?\n\n- A method to check whether the actual software product matches the expected requirements.\n- Helps ensure that the software product is defect-free. \n- It involves executing software components, manually or with automated tools, to evaluate properties of interest. \n- The purpose is to identify errors or missing requirements relative to the actual requirements.\n\n---\n\n## Types of Testing\n\n- Unit testing  \n    - Tests whether an individual component of a piece of software works as expected.\n\n- Integration testing  \n    - Tests whether separate components of a piece of software, which depend upon each other, work together as expected.\n\n- Regression testing\n    - Tests that recent changes do not break older features.\n\n---\n\n## Unit Testing\n\nHow can we be so sure that the code we wrote is doing what we want it to?\n\nDoes our code work 100% of the time?\n\n- The answer is: unit tests.\n\n- In Python, unit tests can be implemented using `assert` statements, although there are other ways.\n\n- Let’s first discuss the syntax of an `assert` statement and then how it can be applied to the bigger concept, which is unit tests.\n\n---\n\n## Assert Statements\n\n``` python\nassert 1 == 2 , \"1 is not equal to 2.\"\n```\n\n``` out\nAssertionError: 1 is not equal to 2.\n\nDetailed traceback: \n  File \"<string>\", line 1, in <module>\n```\n\n---\n\n`assert` statements can be used as sanity checks for our program.\n\nWe implement them as a “debugging” tactic to make sure our code runs as\nwe expect it to.\n\nWhen Python reaches an 
`assert` statement, it evaluates the condition to\na Boolean value.\n\nIf the condition is `True`, Python will continue to run. However, if the\nBoolean is `False`, the code stops running, and an error message is\nprinted.\n\n\n---\n\n## Example 1\n\nLet’s take a look at an example where the Boolean is `True`.\n\n``` python\nassert 1 == 1 , \"1 is not equal to 1.\"\nprint('Will this line execute?')\n```\n\nWhat do you think the output will be?\n\n---\n\nAnswer: \n\n```out\nWill this line execute?\n```\n\nHere, since the `assert` statement results in a `True` value, Python\ncontinues to run, and the next line of code is executed.\n\n---\n\n## Example 2\n\n``` python\nassert 1 == 2 , \"1 is not equal to 2.\"\nprint('Will this line execute?')\n```\n\nWhat do you think the output will be?\n\n---\n\nAnswer: \n\n``` out\nAssertionError: 1 is not equal to 2.\n\nDetailed traceback: \n  File \"<string>\", line 1, in <module>\n```\n\n\nWhen an `AssertionError` is raised because the condition evaluates to `False`, the\nnext line of code does not get an opportunity to be executed.\n\n---\n\n## When to test?\n\n\n- You are probably used to writing a function first and only then writing the tests.\n\n- Actually, writing tests should be done *before* the actual function. 
This is called Test-Driven Development.\n\n- This may seem a little counter-intuitive, but we’re creating the expectations of our function before the actual function code.\n\n- Often we have an idea of what our function should be able to do and what output is expected.\n\n- Writing tests before the function helps us understand what code is needed and avoids encountering large bugs down the line.\n\n- It is recommended to write multiple tests.\n\n---\n\n## What to test?\n\n- Keep these tests simple - things that we know are true or\ncould be easily calculated by hand.\n\nFor example, let’s look at our `exponent_a_list()` function.\n\nEasy cases for this function would be lists containing numbers that we\ncan easily square or cube.\n\nFor example, we expect the square output of `[1, 2, 4, 7]` to be\n`[1, 4, 16, 49]`.\n\n---\n\n``` python\ndef exponent_a_list(numerical_list, exponent=2):\n    new_exponent_list = list()\n    \n    for number in numerical_list:\n        new_exponent_list.append(number ** exponent)\n    \n    return new_exponent_list\n```\n\n``` python\nassert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], \"incorrect output for exponent = 2\"\n```\n\n``` python\nassert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], \"incorrect output for exponent = 3\"\n```\n\n``` python\nassert type(exponent_a_list([1,2,4], 2)) == list, \"output type not a list\"\n```\n\n---\n\n## Systematic Approach\n\nWe use a **systematic approach** to design our function using a general\nset of steps to follow when writing programs.\n\n***1. Write the function stub: a function that does nothing but accepts\nall input parameters and returns the correct datatype.***\n\n``` python\ndef exponent_a_list(numerical_list, exponent=2):\n    return list()\n```\n\n---\n\n***2. 
Write tests to satisfy the design specifications.***\n\n``` python\ndef exponent_a_list(numerical_list, exponent=2):\n    return list()\n   \nassert type(exponent_a_list([1,2,4], 2)) == list, \"output type not a list\"\nassert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], \"incorrect output for exponent = 2\"\nassert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], \"incorrect output for exponent = 3\"\n```\n\n``` out\nAssertionError: incorrect output for exponent = 2\n\nDetailed traceback: \n  File \"<string>\", line 1, in <module>\n```\n\n---\n\n***3. Outline the program with pseudo-code.***\n\n``` python\ndef exponent_a_list(numerical_list, exponent=2):\n\n    # create a new empty list\n    # loop through all the elements in numerical_list\n    # for each element calculate element ** exponent\n    # append it to the new list \n    \n    return list()\n    \nassert type(exponent_a_list([1,2,4], 2)) == list, \"output type not a list\"\nassert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], \"incorrect output for exponent = 2\"\nassert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], \"incorrect output for exponent = 3\"\n```\n\n``` out\nAssertionError: incorrect output for exponent = 2\n\nDetailed traceback: \n  File \"<string>\", line 1, in <module>\n```\n\n---\n\n***4. Write code and test frequently.***\n\n``` python\ndef exponent_a_list(numerical_list, exponent=2):\n    new_exponent_list = list()\n    \n    for number in numerical_list:\n        new_exponent_list.append(number ** exponent)\n    \n    return new_exponent_list\n    \nassert type(exponent_a_list([1,2,4], 2)) == list, \"output type not a list\"\nassert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], \"incorrect output for exponent = 2\"\nassert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], \"incorrect output for exponent = 3\"\n```\n\n---\n\n***5. 
Write documentation.***\n\n``` python\ndef exponent_a_list(numerical_list, exponent=2):\n    \"\"\" Creates a new list containing specified exponential values of the input list. \n    \n    Parameters\n    ----------\n    numerical_list : list\n        The list from which to calculate exponential values\n    exponent : int or float, optional\n        The exponent value (the default is 2, which implies the square).\n    \n    Returns\n    -------\n    new_exponent_list : list\n        A new list containing the exponential value specified of each of\n        the elements from the input list \n        \n    Examples\n    --------\n    >>> exponent_a_list([1, 2, 3, 4])\n    [1, 4, 9, 16]\n    \"\"\"\n    new_exponent_list = list()\n    for number in numerical_list:\n        new_exponent_list.append(number ** exponent)\n    return new_exponent_list\n```\n\n--- \n\n## Organizing tests\n\nTests are organised hierarchically: expectations are grouped into tests, which are organised in files.\n\nAn expectation is the atom of testing:\n- It describes the expected result of a computation. Examples:\n    - Does it have the right value and right class? \n    - Does it produce error messages when it should? \n    \nA test groups together multiple expectations to test the output from a simple function.\nThis is why they are sometimes called unit tests, as they test one unit of functionality. 
\n\n---\n\n## Tools for Testing\n\nThere are automated tools we can take advantage of: \n- `pytest` and `assert` for Python\n- `testthat` for R\n\n---\n\n# Let's practice what we learned!\n","fields":{"slug":"/chapter7_04_testing"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 8<br>Software Licensing</h1></div>\n\n---\n\n## Copyright Law\n\nIn both the US and Canada, software code falls under copyright law.\n\nThe owner of the copyright can decide how this code may be used, copied, distributed, changed, among other activities.\n\nWarning!\nCopyright protects the language and words used to express ideas, concepts and themes.\nIt does not protect the ideas, concepts or themes themselves.\n\n---\n\n## Copyright Owner Rights\n\nIn the USA, the owner of copyright has the exclusive right to do and to authorize others to do the following:\n\n- To reproduce the work in copies or phonorecords;\n- To prepare derivative works based upon the work;\n- To distribute copies or phonorecords of the work to the public by sale or other transfer of ownership, or by rental, lease, or lending;\n- To publicly perform the work, in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works;\n- To publicly display the work, in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work.\n- To digitally transmit sound recordings by means of digital audio transmission.\n\n[Copyright Law of the USA](https://en.wikipedia.org/wiki/Copyright_law_of_the_United_States#Exclusive_rights)\n\n---\n\n## Who has copyright ownership? 
\n\n- You, if you are the author of the code and you wrote it for yourself (i.e., not for your work)\n\n- In such a case, you (the person who typed the code) automatically become the copyright owner.\n\n    - In the USA, you do not need to use the copyright symbol © .\n\n    - Best practices: Use the copyright symbol along with your name and the year of first publication.\n\n    - Register tools that you publish at the USA Copyright Registration Portal\n\n---\n\n- The employer or client, if you were doing a job for them.\n\n- Implications if you would like to reuse code that you wrote for your employer in a different situation:\n    - Negotiate a licence to use the code you wrote.\n\n---\n\n## Why do I need a license?\n\nAs mentioned above, creative works (like software code) are automatically eligible for intellectual property (and thus copyright) protection.\n\nReusing creative works without a license is dangerous, because the copyright holders could sue you for copyright infringement.\n\nThus, if you publicly share your creative work (i.e., software code), you should let others know if and how they can reuse it.\n\nThis is done via the inclusion of a LICENSE or LICENSE.txt file in the base directory of the repository that clearly states under which license the content is being made available.\n\nUnless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation.\n\n---\n\n## How do licenses work?\n\nA license grants rights to others (the licensees) that they would otherwise not have. What rights are being granted under which conditions differs, often only slightly, from one license to another.\n\nLicenses are legal documents written by legal experts. 
But you can choose an already written one that best suits your situation.\n\n---\n\n## Choosing a License\n\nIn practice, a few [licenses](https://choosealicense.com/licenses/) are by far the most popular, and [choosealicense.com](https://choosealicense.com/) will help you find a common license that suits your needs. \n\n- [Very Simple and Permissive](https://choosealicense.com/licenses/mit/)\n\n- [Existing projects and communities](https://choosealicense.com/community/)\n\n- [Sharing Improvements](https://choosealicense.com/licenses/gpl-3.0/)\n\n\nFor visualizations, reports, presentations, or tutorials, use [The Creative Commons licences](https://creativecommons.org/licenses/).\n\nThey are now widely used in academia and the publishing industry.\n\n---\n\n# Let's practice what we learned!","fields":{"slug":"/chapter7_06_software_licensing"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 7<br>Introduction to CI/CD and GitHub Actions</h1></div>\n\n---\n\n# Continuous Integration (CI)\n\n- CI is the practice of frequently integrating code changes from contributors to a shared repository. 
\n- Often the submission of code to a shared repository is combined with automated testing to increase code dependability and quality.\n\n#### Why use CI + automated testing?\n\n- detects errors sooner\n- reduces the amount of code to be examined when debugging\n- facilitates merging\n- ensures new code additions do not introduce errors\n\n---\n\n# Continuous Deployment (CD)\n\n- Defined as the practice of automating the deployment of software that has successfully run through your test-suite.\n\n- For example, upon merging a pull request to master, an automation process builds the Python package and publishes to PyPI without further human intervention.\n\n#### Why use CD?\n\n- little to no effort in deploying a new version of the software allows new features to be rolled out quickly and frequently\n\n- allows for quick implementation and release of bug fixes\n\n- deployment can be done by many contributors\n\n---\n\n## GitHub Actions \n\n- It is a continuous integration and continuous deployment (CI/CD) platform that allows you to automate your build, test, and deployment pipeline.\n\n- A tool for automating software development tasks, located in the same place where you already store your code.\n\n- You can create workflows that build and test every pull request to your repository, or deploy merged pull requests to production.\n\n---\n\n- GitHub Actions lets you run workflows when other events happen in your repository. 
\n  - For example, you can run a workflow to automatically add the appropriate labels whenever someone creates a new issue in your repository.\n\n- GitHub provides Linux, Windows, and macOS virtual machines to run your workflows, or you can host your own self-hosted runners in your own data center or cloud infrastructure.\n\n---\n\n## Key concepts:\n\n- Actions: Individual tasks you want to perform.\n\n- Workflow: A collection of actions (specified together in one file).\n\n- Event: Something that triggers the running of a workflow.\n\n- Runner: A machine that can run the Github Action(s).\n\n- Job: A set of steps executed on the same runner.\n\n- Step: A set of commands or actions which a job executes.\n\n---\n\n## Create an example workflow\n\n1. Create a new public GitHub.com repository.\n\n2. Click on the “Actions” tab\n\n<a href=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/00_gh_actions_button.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/00_gh_actions_button.png?raw=true\" alt=\"GH Actions\" width=75% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n3. Click on the first “Simple workflow” configure button\n\n<a href=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/01_gh_actions_wf.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/01_gh_actions_wf.png?raw=true\" alt=\"GH Actions\" width=75% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n4. 
Click on the two green commit buttons to add this workflow file\n\n<a href=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/03_sec_commit_bt.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/03_sec_commit_bt.png?raw=true\" alt=\"Commits\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n5. Go back to the “Actions” tab. It now looks different:\n\n<a href=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/04_nw_tab.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/04_nw_tab.png?raw=true\" alt=\"Actions\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n\n---\n\n6. Click on the message associated with the event that created the action:\n\n<a href=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/05_mn_wf.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/05_mn_wf.png?raw=true\" alt=\"Message\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n7. Click on the build link:\n\n<a href=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/06_run_wf.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/06_run_wf.png?raw=true\" alt=\"Actions\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n8. 
Click on the arrow inside the build logs to expand a section and see the output of the action. Check all of the arrows and see what happens at each step.\n\n<a href=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/08_wf_ran2.png?raw=true\" target=\"_blank\">\n<img src=\"https://github.com/throughput-ec/ec_workshops_py/blob/main/static/module7/08_wf_ran2.png?raw=true\" alt=\"Actions\" width=45% align=\"center\" title=\"Click to zoom in\"> \n</a>\n\n---\n\n## GitHub Actions workflow file\n\nA YAML file that lives in the .github/workflows directory of your repository and specifies your workflow.\n\nA basic example of this YAML file:\n\n```\nname: learn-github-actions\non: [push]\njobs:\n  check-bats-version:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v2\n      - uses: actions/setup-node@v2\n        with:\n          node-version: '14'\n      - run: npm install -g bats\n      - run: bats -v\n```\n\n---\n\n## Understanding the file\n\n|                            |                          |\n| ----------------- | -----------------------------|\n| name: learn-github-actions |  Optional - The name of the workflow as it will appear in the Actions tab of the GitHub repository.  |\n| on: [push]        | Specifies the trigger for this workflow. This example uses the push event, so a workflow run is triggered every time someone pushes a change to the repository.  |\n| jobs:  | Groups together all the jobs that run in the learn-github-actions workflow. |\n| check-bats-version: | Defines a job named check-bats-version. The child keys will define properties of the job. |\n|  runs-on: ubuntu-latest | Configures the job to run on the latest version of an Ubuntu Linux runner. This means that the job will execute on a fresh virtual machine hosted by GitHub. |\n| steps:  | Groups all the steps that run in the check-bats-version job. 
|\n|  uses: actions/checkout@v2  | The uses keyword specifies that this step will run v2 of the actions/checkout action. |\n| uses: actions/setup-node@v2 with: node-version: '14' | This step uses the actions/setup-node@v2 action to install the specified version of Node.js (this example uses v14). This puts both the node and npm commands in your PATH. |\n| run: npm install -g bats | The run keyword tells the job to execute a command on the runner. |\n| run: bats -v | Run the bats command with a parameter that outputs the software version.  |\n\n\n---\n# Let's practice what we learned!\n","fields":{"slug":"/chapter7_05_github_actions"}}},{"node":{"rawMarkdownBody":"\n<div><h1><img src=\"https://github.com/throughput-ec/ec-workshops/blob/main/static/module1/00_ec_slide1.png?raw=true\" alt=\"EC Theme\" width=25% align=\"left\"/> Lesson 3<br>Data Workflows</h1></div>\n\n---\n## Dealing with Data\n\nData Science is the study, development and practice of reproducible and auditable processes to obtain insight from data. $^1$\n\nFrom this definition, we must also define reproducible and auditable analysis:\n\nReproducible analysis:\nreaching the same result given the same input, computational methods and conditions. $^2$\n\n~~~\ninput = data\ncomputational methods = computer code\nconditions = computational environment (e.g., programming language & its dependencies)\n~~~\n\nAuditable/transparent analysis:\na readable record of the steps used to carry out the analysis as well as a record of how the analysis methods evolved.\n\n1. [National Academies of Sciences, 2019](https://www.nap.edu/catalog/25303/reproducibility-and-replicability-in-science)\n2. 
[Parker, 2017](https://peerj.com/preprints/3210/) and [Ram, 2013](https://scfbm.biomedcentral.com/articles/10.1186/1751-0473-8-7)\n\n---\n## Motivation\n\nData products can be built via other methods, but we lack confidence in how the results or products were created.\n\nWe believe this stems from non-reproducible and non-auditable analyses:\n\n- lacking evidence that the results or product could be regenerated given the same input, computational methods, and conditions\n\n- lacking evidence of the steps taken during creation\n\n- having an incomplete record of how and why analysis decisions were made\n\n---\n\n## Making Data Science Trustworthy\n\n- It should be reproducible and auditable\n- It should be correct\n- It should be fair, equitable and honest\n\n---\n\n## When Research Goes Wrong\n\nAn example with large impact\n\nRETRACTED ARTICLE: Safety and efficacy of favipiravir versus hydroxychloroquine in management of COVID-19: A randomised controlled trial\n\nA research paper was published in March 2021 that claimed that a drug, Favipiravir, was a safe and effective alternative to another drug, hydroxychloroquine (a medication commonly used to prevent or treat malaria), in mild or moderate COVID-19 infected patients.\n\n---\n\n## When Research Goes Wrong\n\nIn September 2021, the paper was retracted by the editors - in part due to reproducibility issues:\n\n\"After concerns were brought to the Editors' attention after publication, the raw data underlying the study were requested. The authors provided several versions of their dataset. Post-publication peer review confirmed that none of these versions fully recapitulates the results presented in the cohort background comparisons, casting doubt on the reliability of the data. Additional concerns were raised about the randomisation procedure, as the equal distribution of male and female patients is unlikely unless sex is a parameter considered during randomisation. 
However, based on the clarification provided by the authors, sex was not considered during this process. The Editors therefore no longer have confidence in the results and conclusions presented.\"\n\n---\n\n## When Research Goes Wrong\n\nThe problem doesn't just stop once the article is retracted. Between the time the article was published and retracted, the article was cited 17 times!\n\nThe use of non-reproducible tools can impact government and industry as well! \n\n---\n\n## Workflows Importance\n\n- It makes it easier to collaborate with your most important collaborator - YOU in 6 months!\n- It makes others think you know what you are doing.\n\n---\n\n## Workflows & complex projects\n\nComplex projects have at least one item of the following list:\n\n- two, or more, people directly working on the analysis\n- projects that involve two or more coding documents\n- projects that involve analysis of medium/large data\n- projects where you are working on a remote machine\n- projects that have many software or environment dependencies\n\nAs a project accumulates more of these features it grows further in complexity.\n\n---\n\nComplex projects without intentional Data Science workflows can result in:\n\n- A result that you cannot recreate.\n- Spare files of information related to the project that only you have access to.\n- A small change to the analysis code requires re-running the entire program, taking several hours.\n- Code that can only be run on one machine (The \"But it runs on my computer\" problem)\n\n---\n\n## Avoiding the Chaos\nUse:\n- Version Control (Git & GitHub)\n- Executable analysis scripts & pipelines (Python/R scripts)\n- Defined & shippable dependencies, as we saw (Docker)\n\n\n---\n\n## Version Control\n\n- Use GitHub Issues for communications related to the project\n- Version control contributes to better communication & team work\n- All collaborators/team members know where to find the latest (or earlier) version of the analysis (code and 
output)\n\nAll collaborators/team members have access to all communications associated with the analysis.\n\n---\n\n## Executable analysis scripts & pipelines\n\nAs an analysis grows in length and complexity, one literate code document is generally not enough.\n\nTo improve the report's readability (and the code's reproducibility and modularity), it is better to abstract at least parts of the code away (e.g., to scripts).\n\nThese scripts save figures and tables that will be imported into the final report.\n\nExample problem solved by executable analysis scripts & pipelines:\n\nProblem: A small change to the analysis code requires re-running the entire thing, which takes hours.\n\nSolution: Use a smart dependency tree tool to re-run only the parts that need to be updated.\n\n---\n\n## Defined & shippable dependencies\n\nDependencies are the other things you need to install to run your code, including:\n- programming languages (e.g., R, Python, Julia)\n- packages for those programming languages (e.g., tidyverse, scikit-learn)\n- the versions of those languages and packages\n\nExample problem solved by defined & shippable dependencies:\n\nProblem: Code that can only be run on one machine, and you don't know why.\n\n---\n\n## Life cycle of a data analysis project\n\nIt is also critical that you match the correct data science methods to the type of statistical question you are asking.\n\n---\n\n## Descriptive\n\nOne that seeks to summarize a characteristic of a set of data. 
There is no interpretation of the result itself, as the result is a fact, an attribute of the data set you are working with.\n\nExamples:\n\nHow many people live in each US state?\n\n---\n\n## Exploratory\n\nOne in which you analyze the data to see if there are patterns, trends, or relationships between variables that would support proposing a hypothesis to test in a future study.\n\nExamples:\n\nDoes air pollution correlate with life expectancy in a set of data collected from groups of individuals from several regions in the United States?\n\n---\n\n## Inferential\n\nOne in which you analyze the data to see if there are patterns, trends, or relationships between variables in a representative sample. We want to quantify how much the patterns, trends, or relationships between variables are applicable to all individual units in the population.\n\nExamples:\n\nIs eating at least 5 servings a day of fresh fruit and vegetables associated with fewer viral illnesses per year?\n\n---\n\n## Predictive\n\nOne where you are trying to predict measurements or labels for individuals (people or things). You are less interested in what causes the predicted outcome than in what predicts it.\n\nExamples:\n\nHow many viral illnesses will someone have next year?\n\n---\n\n## Causal\n\nOne that asks whether changing one factor will change another factor, on average, in a population. 
Sometimes the underlying design of the data collection, by default, allows for the question that you ask to be causal (e.g., a randomized experiment or trial).\n\nExamples:\n\nDoes smoking lead to cancer?\n\n---\n\n## Mechanistic\n\nOne that tries to explain the underlying mechanism of the observed patterns, trends, or relationships (how does it happen?)\n\nExamples:\n\nHow does airplane wing design change air flow over a wing, leading to decreased drag?\n\n---\n\n## What happens next?\n\nKnowing the kind of question you are trying to answer helps narrow down the kinds of analysis you might want to do.\n\nFor example, if your question is \"How many viral illnesses will someone have next year?\" and you identify it as predictive, you can conclude that some kind of statistical or machine learning model might help you answer it.\n\nThen you need to go a step deeper and look at the data that you have, and see which kind of statistical or machine learning model is most suitable for your data.\n\n---\n\n# Let's Practice What We Learned","fields":{"slug":"/chapter7_03_ds_workflows"}}}]}}}