2024-10-22
Infographic by Kramer & Bosman
Literate Programming, introduced by Donald Knuth in the 1980s, is a programming paradigm that emphasizes the intertwining of human-readable documentation and source code.
Essentially:
The program is written as a coherent narrative where code segments and explanations are woven together in a way that emphasizes understanding and readability
The code segments are ordered in a logical manner for the reader, rather than in the order required by the compiler.
The narrative format helps to bridge the gap between the code and the theoretical framework, ensuring that the computational steps are aligned with the objectives.
In Quarto, this is enabled through Code Chunks
An example of a code chunk:
Magic! (just kidding)
Image by Allison Horst (allisonhorst.com)
For R, the code chunks are generated with the help of the `knitr` package.
Each code chunk will have a list of cell options that looks like this if you use the source view:
The complete list of code chunk options for knitr is in this documentation page, but the important ones are:
`echo` - Whether to display the source code in the rendered output (true/false)
`output` - Whether to display the output of the code (true/false)
`label` - Unique label for the code chunk - useful for cross-referencing!
`output-location` - Location of the output relative to the code that generates it (more relevant for presentations)
Use `highlight-style` to specify the code highlighting style by choosing from the supported themes: a11y, arrow, atom-one, ayu, breeze, github, gruvbox
Use `code-line-numbers` to highlight specific lines of code (this makes more sense for presentations, but you can also apply it to static documents)
```{r}
#| echo: true
#| output: false
#| code-line-numbers: "3,4"
#| highlight-style: github
#| code-overflow: wrap
library(tidyverse)
diamonds %>% ggplot(aes(x = price)) +
  geom_histogram(binwidth = 500, fill = "blue", color = "black") +
  labs(title = "Histogram of Diamond Prices",
       x = "Price (USD)",
       y = "Frequency")
```
Code blocks and executable code cells in Quarto can include line-based annotations to further explain the code and the flow of the logic to your readers.
Great for teaching / presentation!
Syntax (in visual editor):
Output:
Each annotated line must end with a comment using the language-specific comment character for the code cell, followed by a space and the annotation number enclosed in angle brackets (e.g., `# <1>`).
If the annotation covers multiple lines, the same annotation number can be repeated.
After the code cell, provide an ordered list that details the contents of each annotation. Each item in this list should correspond to the line(s) of code marked with the same annotation number.
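The rules above can be sketched with a small example. This is a minimal illustration only: the variable names (`heights_cm`, `heights_m`, `mean_height`) and the data are made up for this sketch, and the code is shown as a plain `r` code block; the same trailing comments should work in an executable `{r}` cell.

```r
# Hypothetical data for illustration only
heights_cm <- c(150, 160, 170)    # <1>
heights_m <- heights_cm / 100     # <2>
mean_height <- mean(heights_m)    # <3>
print(mean_height)                # <3>
```

The ordered list placed right after the code cell would then read:

1. Create a numeric vector of heights in centimetres.
2. Convert the heights to metres.
3. Calculate the average height and print it (annotation 3 is repeated because it covers two lines).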
Amend the YAML header of the following code chunks to fit their description. Refer to this documentation page for the available knitr cell options. Use the source editor for this exercise.
Chunk 1: Give the chunk below a label called “basic-chunk”
```{r}
# Create a simple data frame
df <- data.frame(
  x = 1:5,
  y = c(2, 4, 6, 8, 10)
)
print(df)
```
Chunk 2: Make this chunk hide output and show code:
Convert the comments inside the code chunks below into annotations. Feel free to amend the comments to your liking if you think they are not descriptive enough for the audience. Render the document to see what it looks like.
Chunk 6:
```{r}
#| label: annotation-1
numbers <- c(10, 20, 30, 40, 50) # create a number vector
mean_value <- mean(numbers) #Calculate the average
print(mean_value)
```
Chunk 7:
```{r}
#| label: data-manipulation
library(dplyr) # load the dplyr library
# Create a sample dataset
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 28),
  score = c(85, 92, 78, 95)
)
result <- df %>%
  filter(age > 25) %>% # Keep only rows where age > 25
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE ~ "C"
  )) %>% # Add a new column 'grade' based on score
  arrange(desc(score)) # Sort by score in descending order
print(result)
```
Chunk 8:
```{r}
#| label: plot-annotation
library(ggplot2) # load the ggplot2 library
ggplot(df, aes(x = age, y = score)) + # set the x and y axis
  geom_point() + # Add scatter plot points
  geom_smooth(method = "lm", se = FALSE) + # Add a linear regression line
  labs(title = "Age vs Score",
       x = "Age",
       y = "Score") # Set plot labels
```
Chunk 9:
By default, Quarto uses Pandoc to convert the in-text citations and generate the references in your document. You will need the following components:
A Quarto document formatted with in-text citations in Pandoc's citation syntax, the same syntax used in R Markdown (more on this later).
A bibliographic file, e.g. BibLaTeX (.bib) or BibTeX (.bibtex) file.
A Citation Style Language (CSL) file which specifies the formatting to use when generating the citations and bibliography (when not using natbib or biblatex to generate the bibliography).
Both files have to be specified in the YAML header like so: (In this example, the `.bib` file and the `.csl` file are located in the same folder as the `.qmd` document.)
---
title: "Manuscript"
bibliography: references.bib
csl: nature.csl
---
`references.bib` is the bibliographic text file. This will also be automatically generated after you include a citation in your document for the first time.
`nature.csl` is the citation style file; in this example, it is the Nature citation style.
You will need to download the csl file from the repository and place it in your working directory.
CSL Project repository: https://github.com/citation-style-language/styles
Common ones:
| **Syntax** | **Output** |
|---|---|
| `@katz2021 mentioned that…` | @katz2021 mentioned that… |
| `Katz et al. [-@katz2021] mentioned that…` | Katz et al. [-@katz2021] mentioned that… |
| `Software citation is good [@katz2021, pp. 33-35]` | Software citation is good [@katz2021, pp. 33-35] |
| `More researchers are saying that software citation is good [@katz2021; @park2019]` | More researchers are saying that software citation is good [@katz2021; @park2019] |
Insert in-text citations by typing `@`, which will trigger a popup of items saved in your Zotero library.
Inserting citations and footnotes is generally easier in the visual editor:
Click on Insert > Citation, which will bring up a popup box where you can choose your citation source!
Click on Insert > Footnotes to add footnotes.
Other than your Zotero library, here are the sources that you can retrieve from:
By default, Quarto will place the references section at the end of the document. You can also specify the placement by putting this section in your document (note that the example below is shown in the source view):
### References
::: {#refs}
:::
Which will print out the output below:
On the next slide is a snapshot of a paragraph from the tidyverse homepage, with hard-coded in-text citations and reference section.
Paragraph:
There are a number of projects that are similar in scope to the tidyverse. The closest is perhaps Bioconductor (Gentleman et al. 2004; Huber et al. 2015), which provides an ecosystem of packages that support the analysis of high-throughput genomic data. The tidyverse has similar goals to R itself, but any comparison to the R Project (R Core Team 2019) is fundamentally challenging as the tidyverse is written in R, and relies on R for its infrastructure; there is no tidyverse without R! That said, the biggest difference is in priorities: base R is highly focused on stability, whereas the tidyverse will make breaking changes in the search for better interfaces. Another closely related project is data.table by Dowle and Srinivasan (2019), which provides tools roughly equivalent to the combination of dplyr, tidyr, tibble, and readr. data.table prioritises concision and performance.
References used:
Huber, W., V. J. Carey, R. Gentleman, S. Anders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21. https://www.nature.com/articles/nmeth.3252.
Gentleman, R. C., V. J. Carey, D. M. Bates, et al. 2004. “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology 5: R80. https://doi.org/10.1186/gb-2004-5-10-r80.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Dowle, Matt, and Arun Srinivasan. 2019. data.table: Extension of ‘data.frame’. https://CRAN.R-project.org/package=data.table.
Quarto provides extensions for manuscript writing that contain styles specific to several journals/publishers, such as PLOS, ACM, JOSS, Elsevier, and more.
These extensions provide rich YAML metadata specifically for academic writing (often referred to as “Front Matter” metadata).
Let’s dive into these Front Matter YAML metadata first before we explore the templates!
Scholarly articles demand extensive details in their front matter, beyond just a title and author.
Quarto offers a comprehensive range of YAML metadata keys to include these details.
This metadata covers specifying authors and their affiliations, abstract, keywords, copyright, licensing, and funding.
Below is a YAML header example:
---
title: "Library Carpentry: Best practices in organizing shelf space in the library"
date: 2024-07-01
author:
  - name: Bella Ratmelia
    id: br
    orcid: 0000-0003-4913-9508
    email: bellar@smu.edu.sg
    corresponding: true
    affiliation:
      - name: Singapore Management University
        city: Singapore
        url: www.smu.edu.sg
  - name: Danping Dong
    id: dp
    orcid: 0000-0002-2229-6709
    email: dpdong@smu.edu.sg
    affiliation:
      - name: Singapore Management University
        city: Singapore
        url: www.smu.edu.sg
abstract: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
keywords:
  - Library
  - Carpentry
license: "CC BY"
copyright:
  holder: Bella Ratmelia
  year: 2024
citation:
  container-title: Journal of Library Carpentry
  volume: 1
  issue: 1
  doi: 10.5555/12345678
funding: "The author received no specific funding for this work."
---
The author key includes several sub-keys that offer additional details needed for scholarly articles. For example, you can add an author’s affiliation, roles, email contact, and whether the author is a corresponding author.
---
author:
  name: Bella Ratmelia
  orcid: 0000-0003-4913-9508
  url: https://bellaratmelia.github.io
  email: bellar@smu.edu.sg
  corresponding: true
  roles: "Conceptualization"
  affiliation:
    - name: Singapore Management University
      department: SMU Libraries
      country: SG
      url: www.smu.edu.sg
      ror: 050qmg959
---
These metadata include things like abstract, keywords, license, copyright, and funding information.
---
abstract: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
keywords:
  - Library
  - Carpentry
license: "CC BY"
copyright:
  holder: Bella Ratmelia
  year: 2024
funding: "The author received no specific funding for this work."
---
For articles published to the web, include author, date and citation url metadata. For example:
---
title: "Library Carpentry: Best practices in organizing shelf space in the library"
description: |
  Best practices in organizing shelf space in the library
date: 2024-07-01
author:
  - name: Bella Ratmelia
    id: br
    orcid: 0000-0003-4913-9508
    email: bellar@smu.edu.sg
    corresponding: true
    affiliation:
      - name: Singapore Management University
        city: Singapore
        url: www.smu.edu.sg
citation:
  url: https://smu.edu.sg/library
bibliography: references.bib
---
For journal articles, additional metadata needs to be included, such as volume, issue, publisher, and page numbers, like so:
---
citation:
  type: article-journal
  container-title: "Journal of Library Carpentry"
  volume: 1
  issue: 1
  doi: 10.5555/12345678
  url: https://example.com/summarizing-output
bibliography: references.bib
---
Tip
The front matter metadata in Quarto is based on the schema from the Citation Style Language project (expressed as YAML instead of XML). See the complete list of options in this documentation page.
Copy and paste the following Front Matter template into your Quarto document:
---
title: "Library Carpentry: Best practices in organizing shelf space in the library"
date: 2024-07-01
author:
  - name: Bella Ratmelia
    id: br
    orcid: 0000-0003-4913-9508
    email: bellar@smu.edu.sg
    roles: "Shelf Blueprint"
    corresponding: true
    affiliation:
      - name: Singapore Management University
        city: Singapore
        url: www.smu.edu.sg
  - name: Danping Dong
    id: dp
    orcid: 0000-0002-2229-6709
    email: dpdong@smu.edu.sg
    roles: "Materials Procurement"
    affiliation:
      - name: Singapore Management University
        city: Singapore
        url: www.smu.edu.sg
abstract: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
keywords:
  - Library
  - Carpentry
license: "CC BY"
copyright:
  holder: Bella Ratmelia
  year: 2024
citation:
  container-title: Journal of Library Carpentry
  volume: 1
  issue: 1
  doi: 10.5555/12345678
funding: "The author received no specific funding for this work."
---
quarto use template quarto-journals/plos
quarto render your-document-name.qmd --to plos-pdf
By default, Quarto will render document output to HTML. We can change it to render to Word by changing the YAML header like so:
---
title: "Library Carpentry: Best practices in organizing shelf space in the library"
format:
  docx:
    toc: true
    number-sections: true
    highlight-style: github
---
Note
Rendering to `.docx` does not require Microsoft Word, but you need Word (or a compatible application) installed to open and view the output.
Similar to docx, you can change the render output to PDF by amending the YAML header like so:
---
title: "Library Carpentry: Best practices in organizing shelf space in the library"
format:
  pdf:
    toc: true
    number-sections: true
    colorlinks: true
    highlight-style: github
---
Note
Recent versions of Quarto have a built-in PDF compilation engine, which among other things performs automatic installation of TinyTeX and any missing TeX packages (required for LaTeX rendering).
If you encounter persistent errors when rendering to PDF, a workaround that I like to use is to render the document to an HTML page and then “print” it as a PDF.
Note
You can update or install TinyTex in the RStudio Terminal with this command:
quarto install tinytex
Not a proprietary format - it is rendered as HTML slides which you can put on GitHub if you’d like to host it online.
Being open-source, Reveal.js is free to use, which eliminates licensing costs associated with PowerPoint.
Extensive customization options through HTML, CSS, and JavaScript, and the same source can easily be switched to HTML or PDF output.
Presentations are HTML-based and can be accessed via any web browser without needing specific software.
Works across different operating systems and devices without compatibility issues.
Presentations can be designed to be responsive and accessible, ensuring they look good on any device or screen size.
Presentations can be hosted locally for offline access or online for easy sharing.
Similar to docx and PDF, we can change the render output format to revealjs through the YAML header like so:
---
title: "Habits"
author: "John Doe"
format: revealjs
---
Note
Fun Fact: The slides for this workshop are created with Quarto and RevealJS!
The complete list of options is in this documentation page. Here are several that you may find useful:
`incremental` - controls whether to show all bullet points at once, or one by one as you progress through the slides.
`slide-number` - controls whether to show slide numbers (they will appear at the bottom right corner)
`theme` - Theme name, theme scss file, or a mix of both.
`scrollable` - controls whether to allow content that overflows slides vertically to scroll. This can also be set per slide by including the `.scrollable` class on the slide title.
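As a rough sketch of how the four options above fit together, the header below nests them under `revealjs`. The option values are illustrative assumptions rather than the settings used for these slides, and `simple` is just one of the built-in theme names picked for the example.

```yaml
---
title: "Habits"
author: "John Doe"
format:
  revealjs:
    incremental: true    # reveal bullet points one at a time
    slide-number: true   # show slide numbers
    theme: simple        # assumed choice; any built-in theme name or scss file works here
    scrollable: true     # let overflowing content scroll on every slide
---
```

To make only one slide scrollable instead, leave `scrollable` out of the header and add the class to that slide's title, for example `## Diamonds overview {.scrollable}`.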
Let’s explore the following:
## The Diamonds dataset
```{r}
#| label: load-library
#| lst-label: lst-loadlib
#| lst-cap: Load libraries
#| echo: true
library(tidyverse)
library(corrplot)
library(gtsummary)
```
The dataset, available through the `ggplot2` package, contains the prices and other attributes of 53,940 round cut diamonds. It includes various details such as the price, weight, cut quality, color, clarity, and dimensions of the diamonds. Below is a table detailing the variables included in the dataset:
| **Variable** | **Description** |
|--------------|----------------------------------------------------------|
| price | Price in US dollars (\$326–\$18,823) |
| carat | Weight of the diamond (0.2–5.01) |
| cut | Quality of the cut (Fair, Good, Very Good, Premium, Ideal) |
| color | Diamond color, from D (best) to J (worst) |
| clarity | A measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) |
| x | Length in mm (0–10.74) |
| y | Width in mm (0–58.9) |
| z | Depth in mm (0–31.8) |
| depth | Total depth percentage = z / mean(x, y) = 2 \* z / (x + y) (43–79) |
| table | Width of top of diamond relative to widest point (43–95) |
## Diamonds overview
```{r}
#| label: view-data
head(diamonds)
```
Thank you for your active participation!
Please tell us one thing you liked about the course and one area of improvement here!