Notes - Using CICD to check spelling in quarto documents

As a data scientist or R programmer, you may be familiar with the benefits of version control systems like GitHub for tracking changes to your code base and collaborating with others. But did you know that you can also use GitHub to automate the testing, building, and deployment of your R projects? This process, known as continuous integration and deployment (CICD), can save you time and effort by ensuring that your code is always in a deploy-able state and by automatically delivering new updates to your users. In this blog post, we will show you how to set up CICD for your quarto documents on GitHub, including configuring a build pipeline and integrating a spelling checker. By the end of this tutorial, you will have a workflow in place that helps you catch spelling mistakes before they make it into your final documents.

CICD

CICD is often used in GitHub projects for package development where it helps to maintain a certain code-quality and style consistency across different contributors and developers. For R projects other than packages CICD is used much less frequently. I belief that setting up CICD pipelines for less complex projects with only very few contributors is still useful to ensure consistent style, spelling, and more.

As I am occasionally involved in creating teaching materials in R using quarto, I wanted to implement some CICD checks for quarto documents. As most out-of-the-box CICD pipelines are designed for package development, existing pipelines needed some adjustment to work with other R projects.

Aim

When creating teaching materials in R I rely on GitHub for version control. Generally, I have a main-branch which deploys to a GitHub-page displaying the rendered content. The development of materials happens on the devel branch with a pending merge request to the main branch. Whenever a chapter or a section is ready to be published, I merge the branches. I wanted to create a pipeline that runs a spell-check on all my quarto files on the merge request with main, i.e.: Whenever I push to devel I want GitHub to run the CICD pipeline to check my spelling. As an example, I will show how to implement spelling CICD on this blog-project.

Spell-check

Because R is all I know, I would like to use an R-package to do the spell-checking. The spelling package is well suited for the task, as it allows to spell-check all files at once. Before we try to implement the CICD pipeline, the spell-checker has to work locally, so we first install and load the package:

install.packages(spelling)

library(tidyverse)

── Attaching core tidyverse packages ─────────────── tidyverse 1.3.2.9000 ──
✔ dplyr     1.1.0.9000     ✔ readr     2.1.3     
✔ forcats   0.5.2          ✔ stringr   1.5.0     
✔ ggplot2   3.4.0          ✔ tibble    3.1.8     
✔ lubridate 1.9.0          ✔ tidyr     1.2.1     
✔ purrr     1.0.1          
── Conflicts ────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

library(spelling)

If you are working on a package, you can directly use the function spell_check_package() and the spelling package will do so. If you are working on any other R project you have to use the spell_check_files() function and you have to include a path to the files you want to check. Let’s check just this file:

# 2023-01-09_CICD-spelling/
spelling::spell_check_files(path = "index.qmd") %>%
  head()

  WORD      FOUND IN
aff       index.qmd:348,349,350
callout   index.qmd:114
CICD      index.qmd:2,6,8,9,16,27,28,31,33,34,35,37,38,46,47,53,70,122,139,167,170,172,173,174,176,278,322,411,412,415
cran      index.qmd:103
de        index.qmd:349,350,357,367,391,400,401,414
desc      index.qmd:261,305,399

It looks like there are a few words that spelling did not recognize. We should carefully look through the full list and decide whether any mistakes were made. We would not want the GitHub action to prohibit a merge request for any of these words, as there are no typos present (I hope). Therefore, we want to add these words to a file that include words to be ignored by the spelling package.

If you are working on a package, this is easy, you can use the spelling::update_wordlist() function. We simply save the list of words as a .txt file. For now, we save it in the working directory.

write(spelling::spell_check_files(path = "index.qmd")[[1]], "WORDLIST.txt")

The file looks like this:

read_lines("WORDLIST.txt")

 [1] "aff"            "callout"        "CICD"           "cran"          
 [5] "de"             "desc"           "dic"            "djnavarro"     
 [9] "doch"           "eval"           "frami"          "FRAMI"         
[13] "german"         "github"         "hört"           "https"         
[17] "hunspell"       "ìnst"           "jeder"          "JetBrains"     
[21] "JW"             "lang"           "lockfile"       "md"            
[25] "nur"            "png"            "pre"            "qmd"           
[29] "qmd's"          "readme"         "readr"          "renv"          
[33] "repo"           "Rproj"          "Rscript"        "StefanThoma"   
[37] "subfolders"     "testthat"       "Thoma"          "tidyverse"     
[41] "ubuntu"         "verowokjnsthet" "versteht"       "versthet"      
[45] "von"            "wordlist"       "WORDLIST"       "yaml"          
[49] "yml"            "zitat"          "Zitat"

Now we can tell spelling to ignore the words in this file from typo-detection:

spelling::spell_check_files(
  path = "index.qmd",
  ignore = read_lines("WORDLIST.txt")
)

No spelling errors found.

You can find a more comprehensive guide to the spelling package in the package manual.

# we can get all qmd's in a project by
list.files(
  path = "../..", # first setting the path to the project
  recursive = TRUE, # include subfolders
  pattern = ".*.qmd$"
) # include only files ending in .qmd

[1] "index.qmd"                                     
[2] "posts/2023-01-09_CICD-spelling/index.qmd"      
[3] "posts/2023-01-09_welcome/index.qmd"            
[4] "posts/2023-03-14_shiny_script_upload/index.qmd"

path

The structure of this project is such that each blog-post .qmd file is two folders down from the .Rproj file. The working directory of the .qmd blog-post file is where the file is located. If I want to list files or save files in a higher order folder I need to adjust my path to first go two folders up. I do this by adding "../.." to my file paths.

The working directory of the CICD pipeline is by default on project level, therefore, the "../.." is not required.

wordlist <- list.files(
  path = "../..",
  recursive = TRUE,
  full.names = TRUE,
  pattern = ".*.qmd$"
) %>%
  spelling::spell_check_files()

Now you should take a good look at the output and fix any typos spotted.

What remains is a list of words to be ignored. They can now be saved into a project level WORDLIST_EXAMPLE.txt file to be accessed later by our CICD workflow.

write(x = wordlist[[1]], file = "../../inst/WORDLIST_EXAMPLE.txt")

Check again with WORDLIST_EXAMPLE.txt:

list.files(
  path = "../..",
  recursive = TRUE,
  full.names = TRUE,
  pattern = ".*.qmd$"
) %>%
  spelling::spell_check_files(ignore = read_lines("../../inst/WORDLIST_EXAMPLE.txt"))

No spelling errors found.

Looks like it worked — great!

Append WORDLIST

It makes sense to check the spelling locally before you push to your develop branch. For this purpose I create an r-script where I can run the spell-check for the project and where I can also append the WORDLIST.txt file if needed.

#-------------------------- spell-check ----------------------------------------

# create empty wordlist:
# write("", file =   "../../inst/WORDLIST_EXAMPLE.txt")
# check spelling:
spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE),
  ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt")
)

# now check those words and whether or not they are really mistakes.
# once you fixed all mistaked you can:
words <- spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE),
  ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt")
)
# now you can add words to the wordlist
#-- uncomment the following line
# write(words[[1]], file =   "inst/WORDLIST_EXAMPLE.txt", append = TRUE)

spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE),
  ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt")
)

Setup CICD Workflow

Now this needs to be implemented in the CICD pipeline. To implement GitHub CICD I create a folder .github in the project directory, and the folder workflows within the .github folder. This is where CICD pipelines are stored.

CICD pipelines are written in yaml format, it should look like this:

#| eval: false
name: Spellcheck
on:
  pull_request: {branches: ['main']}
jobs:
  Spelling:
    runs-on: ubuntu-latest
    container: {image: "rocker/tidyverse:4.2.1"}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install spelling
        run: if (!require("spelling")) install.packages("spelling")
        shell: Rscript {0}

      - name: Run Spelling Check test
        run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
        shell: Rscript {0}

The first few lines define the name of the workflow (Spellcheck) and when it should be executed.

In this case, the action runs on pull requests to the main branch.

#| eval: false
name: Spellcheck
on:
  pull_request: {branches: ['main']}

Then, we define the job to run:

jobs:
  Spelling:
    runs-on: ubuntu-latest
    container: {image: "rocker/tidyverse:4.2.1"}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install spelling
        run: if (!require("spelling")) install.packages("spelling")
        shell: Rscript {0}

      - name: Run Spelling Check test
        run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
        shell: Rscript {0}

We run just one job called Spelling.

It is run on a docker image deployed by GitHub. We use a particular docker image that comes with R and tidyverse pre-installed, this eases the use of R in this image.

The actual workflow is defined in the steps (which can be named) — here we only have three steps.

actions/checkout@v3 loads the GitHub repository so the subsequent steps can reference the repo.
Next, the Install spelling step installs the R package spelling. This is written in R code, so we need to specify that we run the command in R. We do this with the instruction shell: Rscript {0}.
At last, we run the spell check in R. By default, the code is executed in the project level directory, so we do not need to adjust the path in the list.files() function to go up the project directory. The same goes for the ìnst/WORDLIST_EXAMPLE.txt file.

Now while this works, it will not throw an error if typos are spotted. We can remedy this by writing code that throws an error if there is a typo. The testthat package is designed to test R code for packages. We use its test_that() function together with the expect_equal() function where we can specify the test we want to conduct. Our test is simple: As the object argument we run the spell-check from above. The output we expect is a spell-check that did not result in any error. We have to supply such an object representing a flawless spell-check in the expected argument. To always get such an object we simply spell-check the WORDLIST_EXAMPLE.txt file using itself as the list of words to ignore:

testthat::test_that(
  desc = "No Typo",
  code = testthat::expect_equal(
    object = spelling::spell_check_files(
      path = list.files(
        path = "../..", pattern = ".*.qmd$",
        recursive = TRUE, full.names = TRUE
      ),
      ignore = readr::read_lines("../../inst/WORDLIST_EXAMPLE.txt")
    ),
    expected = spelling::spell_check_files(
      path = "../../inst/WORDLIST_EXAMPLE.txt",
      ignore = readr::read_lines("../../inst/WORDLIST_EXAMPLE.txt")
    )
  )
)

Test passed 🌈

We can now implement this test into our CICD workflow:

name: Spellcheck
on:
  pull_request: {branches: ['main']}
jobs:
  Spelling:
    runs-on: ubuntu-latest
    container: {image: "rocker/tidyverse:4.2.1"}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install spelling
        run: if (!require("spelling")) install.packages("spelling")
        shell: Rscript {0}

      - name: Run Spelling Check test
        run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
        shell: Rscript {0}

      - name: Install testthat
        run: if (!require("testthat")) install.packages("testthat")
        shell: Rscript {0}

      - name: test typos
        run: testthat::test_that(desc = "No Typo", code = {
        no_problem <- spelling::spell_check_files(path = "inst/WORDLIST_EXAMPLE.txt", ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
        spellcheck <- spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
        testthat::expect_equal(object = spellcheck, expected = no_problem)
        })
        shell: Rscript {0}

Change language

This works fine for English, but what if we write in German? The spelling package depends on the hunspell package. This package comes with the English dictionary pre-installed. Further, it looks at the user library for any other dictionaries requested in the spelling function call.

We can add dictionaries to the user library in the OS we are using to locally check the spelling in our projects. As soon as we want to spell-check on GitHub (with CICD) it gets a bit more tricky because we need to reference a library file within the CICD workflow.

Let’s write a file that contains a German quote (by JW von Goethe).

# zitat <- file("Zitat.txt", encoding = "UTF-8")
# write(x = "Es hört doch jeder nur, was er verowokjnsthet.", file = "Zitat.txt")
write_lines("Es hört doch jeder nur, was er versthet.", file = "Zitat.txt")

The spelling package does not recognize the language in a file:

spelling::spell_check_files("Zitat.txt")

  WORD       FOUND IN
doch       Zitat.txt:1
hört      Zitat.txt:1
jeder      Zitat.txt:1
nur        Zitat.txt:1
versthet   Zitat.txt:1

We can list the dictionaries that are currently available to the hunspell package.

hunspell::list_dictionaries()

[1] "en_AU" "en_CA" "en_GB" "en_US"

Apparently, only English dictionaries are available at the moment. You can download UTF-8 encoded dictionaries from the LibreOffice GitHub repo fork. For me, the easiest way was to download the entire repo as a .zip folder and then move the dictionary files manually into the repo in which you want to spell-check using that dictionary.

hunspell requires two dictionary files for a language: the .dic and the .aff file. In this example we take the German dictionary files de_DE_frami.aff and de_DE_frami.dic and save them in the inst folder where our WORDLIST_EXAMPLE.txt file is as well. I am not sure why, but sometimes hunspell will look for the file de_CH_FRAMI.dic when we specify lang = "inst/de_CH_frami" so make sure to rename the .dic and .aff files as de_DE_FRAMI.aff and de_DE_FRAMI.dic, just to be sure.

list.files("../../inst")

[1] "de_CH_FRAMI.aff"      "de_CH_FRAMI.dic"      "de_DE_FRAMI.aff"     
[4] "de_DE_FRAMI.dic"      "WORDLIST_EXAMPLE.txt" "WORDLIST.txt"

spelling::spell_check_files("Zitat.txt", lang = "../../inst/de_CH_FRAMI")

  WORD        FOUND IN
versthet.   Zitat.txt:1

Now we just have to fix the error and check again.

write_lines("Es hört doch jeder nur, was er versteht.", file = "Zitat.txt")

spelling::spell_check_files("Zitat.txt", lang = "../../inst/de_CH_FRAMI")

No spelling errors found.

The .yml file for the german spell-check would like this:

name: Spellcheck
on:
  pull_request: {branches: ['main']}
jobs:
  Spelling:
    runs-on: ubuntu-latest
    container: {image: "rocker/tidyverse:4.2.1"}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install spelling
        run: if (!require("spelling")) install.packages("spelling")
        shell: Rscript {0}

      - name: Run Spelling Check test
        run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST.txt"), lang = "inst/de_CH_frami")
        shell: Rscript {0}

      - name: Install testthat
        run: if (!require("testthat")) install.packages("testthat")
        shell: Rscript {0}

      - name: test typos
        run: testthat::test_that(desc = "No Typo", code = {
        no_problem <- spelling::spell_check_files(path = "inst/WORDLIST_EXAMPLE.txt", ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"), lang = "inst/de_CH_FRAMI")
        spellcheck <- spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"), lang = "inst/de_CH_FRAMI")
        testthat::expect_equal(object = spellcheck, expected = no_problem)
        })
        shell: Rscript {0}

Conclusion

You should now be able to run a spell-check on your quarto files. Further, you know how to implement a GitHub CICD pipeline for spell-checks in any language with available dictionary files. This also allows you to implement other R-code based CICD pipelines.

For your (and my) convenience, I have created book-templates for both English and German quarto books. They include CICD pipelines for both spelling and style check, and also implement a CICD publishing workflow. Please read the respective readme.md file for more information.

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{thoma2023,
  author = {Stefan Thoma},
  editor = {},
  title = {Using {CICD} to Check Spelling in Quarto Documents},
  date = {2023-03-07},
  url = {https://stefanthoma.github.io/quarto-blog//posts/2023-01-09_CICD-spelling},
  langid = {en}
}

For attribution, please cite this work as:

Stefan Thoma. 2023. “Using CICD to Check Spelling in Quarto Documents.” March 7, 2023. https://stefanthoma.github.io/quarto-blog//posts/2023-01-09_CICD-spelling.