Reproducible data analysis reports

Krisanat Anukarnsakulchularp

Department of Econometrics and Business Statistics, Monash University, Australia.

Introduction

Presenters:

Session 1: Reproducible data analysis reports

Krisanat Anukarnsakulchularp

Session 2: Writing academically

P. G. Jayani Lakshika

Session 3: Engaging reproducible presentations

Janith Wanniarachchi

💻 📃 What is reproducible research?



The National Academies of Sciences, Engineering, and Medicine in the USA say:

Reproducibility means obtaining consistent results using the same input data, computational steps, methods, code and conditions of analysis.





Why?

  • Efficiency: allow changes to be implemented more easily, especially for dynamic reproducible documents.

  • Repeatability: the analysis can be repeated multiple times while still obtaining the same results.

  • Transparency: everything is available for access, resulting in more trustworthy results.

  • Easy to update: when new data arrives, the report can be automatically updated.

And future you will thank you, because they will know what past you was thinking. It helps memory, collaboration, and sharing.

How might the project look?

How to combine text and data analysis?


Literate programming

Literate programming is an approach to writing reports using software that weaves together the source code and text at the time of creation.

Reproducibility requires more than literate programming. It also needs:

  • a versioning and sharing system, such as git and GitHub.
  • a managed software environment and workflow, using tools such as renv or targets.

But these are for future workshops.

Dynamic documents

  • A dynamic document includes code used for data analysis and text explaining the analysis and results.

  • Together, these produce a report, paper, or presentation dynamically. The output format (html, pdf, docx, pptx) can be switched by changing one line in the file.

More on papers and presentations in the next sessions! Here we focus on reports!

Main tools for reproducible research

  • R: the programming language.

  • RStudio: an integrated development environment (IDE).

  • Quarto: a publishing system for writing a complete analysis that combines text and code.

Similar tools are available in other languages, and Quarto can contain code chunks of various languages, possibly in the same document.

Getting started

First step

Practicing

  1. Create a project.

  2. Create a Quarto document.

  3. Render document.

Your turn!

Elements of a reproducible project

  • All the elements of the project should be files.
  • All files should be stored within the project location (typically a folder).
  • All files should be explicitly tied together.

But how do we tie the files together?

Computer paths

A path is the complete location of a computer file, directory, or web page.

Examples:

  • Windows: C:\Documents\workshop
  • Mac/Linux: /Users/Documents/workshop
  • Internet: https://numbat.space/

Absolute and Relative paths

  • Absolute: starts from the top level of the file system, typically a drive letter or the root (/).
    • /Users/Documents/workshop ⚠️
  • Relative: refers to a location that is relative to the current directory.
    • ./workshop

Important

Absolute paths should be avoided since it is extremely unlikely another person will have the same absolute path as you.
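
As a rough illustration, the R sketch below builds a relative path with file.path() and compares it with the absolute path it resolves to on the current computer. The file name survey.csv and the helper is_absolute() are hypothetical, made up for this example only.

```r
# Relative path: resolved from the project folder (the working directory).
# "survey.csv" is a made-up file name used only for illustration.
relative_path <- file.path("data", "survey.csv")

# The same location written as an absolute path; this differs on every computer.
absolute_path <- file.path(getwd(), relative_path)

# Hypothetical helper: TRUE if a path starts at the root (/) or a drive letter.
is_absolute <- function(path) {
  grepl("^(/|[A-Za-z]:[/\\\\])", path)
}

is_absolute(relative_path)  # FALSE
is_absolute(absolute_path)  # TRUE
```

Only the relative version keeps working when the project folder is moved or shared with someone else.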

Work projects

  • Data folder: contains all the data for the project.
  • Images/Figures folder: contains all the external pictures not produced by the code in the qmd file.
  • .Rproj file: automatically added when creating an RStudio project (handles the relative paths and working directories).
  • qmd file: the Quarto document.
  • Other R scripts, etc.
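
The layout above could be set up by hand or, as a minimal sketch, from the R console. The example below works inside a temporary folder so nothing real is touched. The names workshop, images, and report.qmd are assumptions chosen for illustration, and the .Rproj file is left out because RStudio adds it when the project is created.

```r
# Build the layout inside a temporary folder so no real files are changed.
# "workshop" and "report.qmd" are hypothetical names.
project_dir <- file.path(tempdir(), "workshop")

# Folders for the data and for external images.
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "images"), showWarnings = FALSE)

# An empty Quarto document at the top level of the project.
file.create(file.path(project_dir, "report.qmd"))

# List what the project now contains.
list.files(project_dir)
```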

Your turn!

Practicing

Let’s fill up our work project.

Quarto details

Quarto

  • Provides a framework for integrating code and text into a single document.

  • Write the code inside code chunks and the text around them to get a fully reproducible document.

Quarto document elements

  1. Text (formatted with Markdown)

  2. Code (code formatting)

  3. Metadata (YAML)

Dynamic documents

Quarto + knitr = Dynamic document

  • Quarto lets you write the report text in Markdown and include R code in the same file.
  • knitr runs all the code chunks and “knits” the results into a Markdown file (replacing the R chunks with their output).
  • pandoc converts the Markdown file to the chosen output format.
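
The knitting step can be tried on its own. The sketch below assumes the knitr package is installed and passes a tiny one-chunk document to knitr::knit() as text. The returned Markdown contains both the code and its output. This only illustrates the middle step; Quarto runs the full pipeline when a document is rendered.

```r
# Three backticks, built with strrep() so the example stays easy to read.
fence <- strrep("`", 3)

# A tiny document made of a single R chunk that computes 1 + 1.
source_lines <- c(paste0(fence, "{r}"), "1 + 1", fence)

# knitr runs the chunk and returns Markdown with the output knitted in.
md <- knitr::knit(text = source_lines, quiet = TRUE)

# Print the resulting Markdown; the chunk output now appears under the code.
cat(md)
```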

Quarto: text (Markdown)

Markdown is a lightweight markup language for adding formatting elements to plain text documents.

  • Text formatting
  • Headings
  • Links & Images
  • Lists
  • Many more…

Text formatting & Headings

Markdown Syntax:

*italics*, _italics_

**bold**, __bold__

***bold italics***, ___bold italics___

~~strikethrough~~

`verbatim code`

# Heading 1

## Heading 2

Results:

italics, italics

bold, bold

bold italics, bold italics

strikethrough

verbatim code

Heading 1

Heading 2

Your turn!

Practicing

  1. Create different heading levels.
  2. Write 1-2 sentences with different text formatting.
  3. Add images.

Quarto: code (R)

R code:

```{r}
#| echo: false

1+1
``` 

Results:

[1] 2

Insert an R code chunk into a Quarto document with:

  • Keyboard shortcut Ctrl + Alt + I (Mac: Cmd + Option + I)

  • Typing the chunk delimiters (```{r} and ```)

Chunk output can be customised with chunk execution options, which are placed at the top of a chunk and start with #|

Chunk execution options

  • eval: false does not evaluate (run) this code chunk when rendering.
  • echo: false does not show the source code in the finished file.
  • include: false prevents code and results from showing in the finished file.
  • message: false prevents messages that are generated by code from showing in the finished file.
  • warning: false prevents warnings that are generated by code from showing in the finished file.
  • fig-cap: "Text" adds a caption to a figure.

There are many more; see Quarto documentation.
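
To see what message: false and warning: false act on, the sketch below defines a hypothetical helper, noisy_mean(), that emits one message and one warning. Run as a plain chunk, both would appear in the report. With the two options set, only the result is shown.

```r
#| message: false
#| warning: false

# Hypothetical helper, written only to produce a message and a warning.
noisy_mean <- function(x) {
  message("Computing the mean of ", length(x), " values")
  if (anyNA(x)) warning("Missing values were removed")
  mean(x, na.rm = TRUE)
}

# The message and warning are hidden by the options above; the value still prints.
result <- noisy_mean(c(4, 7, NA, 8))
result
```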

Tables and captions

R code:

```{r}
#| echo: false

# cars is a built-in R dataset (speed in mph, stopping distance in ft)
table_data <- head(cars, 5)

# Render the first five rows as a table with a caption
knitr::kable(table_data,
             caption = "Speed and stopping distances of cars")
```

Results:

Speed and stopping distances of cars

speed  dist
    4     2
    4    10
    7     4
    7    22
    8    16

Figures and captions

R code:

```{r}
#| label: fig-cars-plot
#| fig-cap: "Distance taken for a car to stop, against its speed during the test."

library(ggplot2)

# Scatter plot of stopping distance against speed
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point()
```

Results:

Distance taken for a car to stop, against its speed during the test.

Your turn!

Practicing

Using the diamonds data from the ggplot2 package, do the following:

  1. Add a table with a caption

  2. Add figures with a caption

Quarto: YAML

Basic YAML syntax

title: "My report"
author: "Krisanat A."
format:
  html:
    toc: true
    theme: solar
  pdf:
    toc: true
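
One way to see how the indentation nests the options is to parse the same YAML in R. This is only a sketch: it assumes the yaml package is installed (it is a dependency of knitr), and the variable names yaml_text and meta are made up for the example.

```r
# The YAML header from the slide, stored as text.
yaml_text <- '
title: "My report"
author: "Krisanat A."
format:
  html:
    toc: true
    theme: solar
  pdf:
    toc: true
'

# Parse it into a nested R list (assumes the yaml package is installed).
meta <- yaml::yaml.load(yaml_text)

meta$title               # "My report"
names(meta$format)       # "html" "pdf"
meta$format$html$theme   # "solar"
```

Each extra level of indentation becomes one more level in the list, which is why toc sits under html and under pdf separately.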

HTML result

PDF result

Your turn!

Practice

  1. Change the title
  2. Add your name as an author
  3. Use HTML and PDF format

Quarto templates

What is a Quarto template?

Templates provide a straightforward way to get started with new Quarto projects by supplying example content and options. They can:

  1. Create a working initial document for custom formats

  2. Provide the initial content for a custom project type

Remember all the painstaking work we did earlier: setting the YAML, creating all the folders, and setting the execution options?

A template does all of that with one line in the terminal!

Using a template

The command below copies the contents from the GitHub repository to our local system.

quarto use template numbat-tutorials/workshop-template

Let’s take a 10-minute break