install.packages(spelling)
As a data scientist or R programmer, you may be familiar with the benefits of version control systems like GitHub for tracking changes to your code base and collaborating with others. But did you know that you can also use GitHub to automate the testing, building, and deployment of your R projects? This process, known as continuous integration and deployment (CICD), can save you time and effort by ensuring that your code is always in a deploy-able state and by automatically delivering new updates to your users. In this blog post, we will show you how to set up CICD for your quarto documents on GitHub, including configuring a build pipeline and integrating a spelling checker. By the end of this tutorial, you will have a workflow in place that helps you catch spelling mistakes before they make it into your final documents.
CICD
CICD is often used in GitHub projects for package development where it helps to maintain a certain code-quality and style consistency across different contributors and developers. For R projects other than packages CICD is used much less frequently. I belief that setting up CICD pipelines for less complex projects with only very few contributors is still useful to ensure consistent style, spelling, and more.
As I am occasionally involved in creating teaching materials in R using quarto, I wanted to implement some CICD checks for quarto documents. As most out-of-the-box CICD pipelines are designed for package development, existing pipelines needed some adjustment to work with other R projects.
Aim
When creating teaching materials in R I rely on GitHub for version control. Generally, I have a main
-branch which deploys to a GitHub-page displaying the rendered content. The development of materials happens on the devel
branch with a pending merge request to the main
branch. Whenever a chapter or a section is ready to be published, I merge the branches. I wanted to create a pipeline that runs a spell-check on all my quarto files on the merge request with main
, i.e.: Whenever I push to devel
I want GitHub to run the CICD pipeline to check my spelling. As an example, I will show how to implement spelling CICD on this blog-project.
Spell-check
Because R is all I know, I would like to use an R-package to do the spell-checking. The spelling
package is well suited for the task, as it allows to spell-check all files at once. Before we try to implement the CICD pipeline, the spell-checker has to work locally, so we first install and load the package:
library(tidyverse)
── Attaching core tidyverse packages ─────────────── tidyverse 1.3.2.9000 ──
✔ dplyr 1.1.0.9000 ✔ readr 2.1.3
✔ forcats 0.5.2 ✔ stringr 1.5.0
✔ ggplot2 3.4.0 ✔ tibble 3.1.8
✔ lubridate 1.9.0 ✔ tidyr 1.2.1
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(spelling)
If you are working on a package, you can directly use the function spell_check_package()
and the spelling
package will do so. If you are working on any other R project you have to use the spell_check_files()
function and you have to include a path to the files you want to check. Let’s check just this file:
# 2023-01-09_CICD-spelling/
::spell_check_files(path = "index.qmd") %>%
spellinghead()
WORD FOUND IN
aff index.qmd:348,349,350
callout index.qmd:114
CICD index.qmd:2,6,8,9,16,27,28,31,33,34,35,37,38,46,47,53,70,122,139,167,170,172,173,174,176,278,322,411,412,415
cran index.qmd:103
de index.qmd:349,350,357,367,391,400,401,414
desc index.qmd:261,305,399
It looks like there are a few words that spelling did not recognize. We should carefully look through the full list and decide whether any mistakes were made. We would not want the GitHub action to prohibit a merge request for any of these words, as there are no typos present (I hope). Therefore, we want to add these words to a file that include words to be ignored by the spelling
package.
If you are working on a package, this is easy, you can use the spelling::update_wordlist()
function. We simply save the list of words as a .txt file. For now, we save it in the working directory.
write(spelling::spell_check_files(path = "index.qmd")[[1]], "WORDLIST.txt")
The file looks like this:
read_lines("WORDLIST.txt")
[1] "aff" "callout" "CICD" "cran"
[5] "de" "desc" "dic" "djnavarro"
[9] "doch" "eval" "frami" "FRAMI"
[13] "german" "github" "hört" "https"
[17] "hunspell" "ìnst" "jeder" "JetBrains"
[21] "JW" "lang" "lockfile" "md"
[25] "nur" "png" "pre" "qmd"
[29] "qmd's" "readme" "readr" "renv"
[33] "repo" "Rproj" "Rscript" "StefanThoma"
[37] "subfolders" "testthat" "Thoma" "tidyverse"
[41] "ubuntu" "verowokjnsthet" "versteht" "versthet"
[45] "von" "wordlist" "WORDLIST" "yaml"
[49] "yml" "zitat" "Zitat"
Now we can tell spelling
to ignore the words in this file from typo-detection:
::spell_check_files(
spellingpath = "index.qmd",
ignore = read_lines("WORDLIST.txt")
)
No spelling errors found.
You can find a more comprehensive guide to the spelling
package in the package manual.
# we can get all qmd's in a project by
list.files(
path = "../..", # first setting the path to the project
recursive = TRUE, # include subfolders
pattern = ".*.qmd$"
# include only files ending in .qmd )
[1] "index.qmd"
[2] "posts/2023-01-09_CICD-spelling/index.qmd"
[3] "posts/2023-01-09_welcome/index.qmd"
[4] "posts/2023-03-14_shiny_script_upload/index.qmd"
<- list.files(
wordlist path = "../..",
recursive = TRUE,
full.names = TRUE,
pattern = ".*.qmd$"
%>%
) ::spell_check_files() spelling
Now you should take a good look at the output and fix any typos spotted.
What remains is a list of words to be ignored. They can now be saved into a project level WORDLIST_EXAMPLE.txt
file to be accessed later by our CICD workflow.
write(x = wordlist[[1]], file = "../../inst/WORDLIST_EXAMPLE.txt")
Check again with WORDLIST_EXAMPLE.txt
:
list.files(
path = "../..",
recursive = TRUE,
full.names = TRUE,
pattern = ".*.qmd$"
%>%
) ::spell_check_files(ignore = read_lines("../../inst/WORDLIST_EXAMPLE.txt")) spelling
No spelling errors found.
Looks like it worked — great!
Append WORDLIST
It makes sense to check the spelling locally before you push to your develop
branch. For this purpose I create an r-script where I can run the spell-check for the project and where I can also append the WORDLIST.txt
file if needed.
#-------------------------- spell-check ----------------------------------------
# create empty wordlist:
# write("", file = "../../inst/WORDLIST_EXAMPLE.txt")
# check spelling:
::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE),
spellingignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt")
)
# now check those words and whether or not they are really mistakes.
# once you fixed all mistaked you can:
<- spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE),
words ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt")
)# now you can add words to the wordlist
#-- uncomment the following line
# write(words[[1]], file = "inst/WORDLIST_EXAMPLE.txt", append = TRUE)
::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE),
spellingignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt")
)
Setup CICD Workflow
Now this needs to be implemented in the CICD pipeline. To implement GitHub CICD I create a folder .github
in the project directory, and the folder workflows
within the .github
folder. This is where CICD pipelines are stored.
CICD pipelines are written in yaml
format, it should look like this:
#| eval: false
name: Spellcheck
on:
pull_request: {branches: ['main']}
jobs:
Spelling:
runs-on: ubuntu-latest
container: {image: "rocker/tidyverse:4.2.1"}
steps:
- name: Checkout repo
uses: actions/checkout@v3
- name: Install spelling
run: if (!require("spelling")) install.packages("spelling")
shell: Rscript {0}
- name: Run Spelling Check test
run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
shell: Rscript {0}
The first few lines define the name of the workflow (Spellcheck
) and when it should be executed.
In this case, the action runs on pull requests to the main
branch.
#| eval: false
name: Spellcheck
on:
pull_request: {branches: ['main']}
Then, we define the job to run:
jobs:
Spelling:
runs-on: ubuntu-latest
container: {image: "rocker/tidyverse:4.2.1"}
steps:
- name: Checkout repo
uses: actions/checkout@v3
- name: Install spelling
run: if (!require("spelling")) install.packages("spelling")
shell: Rscript {0}
- name: Run Spelling Check test
run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
shell: Rscript {0}
We run just one job called Spelling
.
It is run on a docker image deployed by GitHub. We use a particular docker image that comes with R and tidyverse pre-installed, this eases the use of R in this image.
The actual workflow is defined in the steps
(which can be named) — here we only have three steps.
actions/checkout@v3
loads the GitHub repository so the subsequent steps can reference the repo.Next, the Install spelling step installs the R package
spelling
. This is written in R code, so we need to specify that we run the command in R. We do this with the instructionshell: Rscript {0}
.At last, we run the spell check in R. By default, the code is executed in the project level directory, so we do not need to adjust the path in the
list.files()
function to go up the project directory. The same goes for theìnst/WORDLIST_EXAMPLE.txt
file.
Now while this works, it will not throw an error if typos are spotted. We can remedy this by writing code that throws an error if there is a typo. The testthat
package is designed to test R code for packages. We use its test_that()
function together with the expect_equal()
function where we can specify the test we want to conduct. Our test is simple: As the object
argument we run the spell-check from above. The output we expect is a spell-check that did not result in any error. We have to supply such an object representing a flawless spell-check in the expected
argument. To always get such an object we simply spell-check the WORDLIST_EXAMPLE.txt
file using itself as the list of words to ignore:
::test_that(
testthatdesc = "No Typo",
code = testthat::expect_equal(
object = spelling::spell_check_files(
path = list.files(
path = "../..", pattern = ".*.qmd$",
recursive = TRUE, full.names = TRUE
),ignore = readr::read_lines("../../inst/WORDLIST_EXAMPLE.txt")
),expected = spelling::spell_check_files(
path = "../../inst/WORDLIST_EXAMPLE.txt",
ignore = readr::read_lines("../../inst/WORDLIST_EXAMPLE.txt")
)
) )
Test passed 🌈
We can now implement this test into our CICD workflow:
name: Spellcheck
on:
pull_request: {branches: ['main']}
jobs:
Spelling:
runs-on: ubuntu-latest
container: {image: "rocker/tidyverse:4.2.1"}
steps:
- name: Checkout repo
uses: actions/checkout@v3
- name: Install spelling
run: if (!require("spelling")) install.packages("spelling")
shell: Rscript {0}
- name: Run Spelling Check test
run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
shell: Rscript {0}
- name: Install testthat
run: if (!require("testthat")) install.packages("testthat")
shell: Rscript {0}
- name: test typos
run: testthat::test_that(desc = "No Typo", code = {
no_problem <- spelling::spell_check_files(path = "inst/WORDLIST_EXAMPLE.txt", ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
spellcheck <- spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"))
testthat::expect_equal(object = spellcheck, expected = no_problem)
})
shell: Rscript {0}
Change language
This works fine for English, but what if we write in German? The spelling
package depends on the hunspell
package. This package comes with the English dictionary pre-installed. Further, it looks at the user library for any other dictionaries requested in the spelling
function call.
We can add dictionaries to the user library in the OS we are using to locally check the spelling in our projects. As soon as we want to spell-check on GitHub (with CICD) it gets a bit more tricky because we need to reference a library file within the CICD workflow.
Let’s write a file that contains a German quote (by JW von Goethe).
# zitat <- file("Zitat.txt", encoding = "UTF-8")
# write(x = "Es hört doch jeder nur, was er verowokjnsthet.", file = "Zitat.txt")
write_lines("Es hört doch jeder nur, was er versthet.", file = "Zitat.txt")
The spelling
package does not recognize the language in a file:
::spell_check_files("Zitat.txt") spelling
WORD FOUND IN
doch Zitat.txt:1
hört Zitat.txt:1
jeder Zitat.txt:1
nur Zitat.txt:1
versthet Zitat.txt:1
We can list the dictionaries that are currently available to the hunspell package.
::list_dictionaries() hunspell
[1] "en_AU" "en_CA" "en_GB" "en_US"
Apparently, only English dictionaries are available at the moment. You can download UTF-8 encoded dictionaries from the LibreOffice GitHub repo fork. For me, the easiest way was to download the entire repo as a .zip
folder and then move the dictionary files manually into the repo in which you want to spell-check using that dictionary.
hunspell
requires two dictionary files for a language: the .dic
and the .aff
file. In this example we take the German dictionary files de_DE_frami.aff
and de_DE_frami.dic
and save them in the inst
folder where our WORDLIST_EXAMPLE.txt
file is as well. I am not sure why, but sometimes hunspell
will look for the file de_CH_FRAMI.dic
when we specify lang = "inst/de_CH_frami"
so make sure to rename the .dic
and .aff
files as de_DE_FRAMI.aff
and de_DE_FRAMI.dic
, just to be sure.
list.files("../../inst")
[1] "de_CH_FRAMI.aff" "de_CH_FRAMI.dic" "de_DE_FRAMI.aff"
[4] "de_DE_FRAMI.dic" "WORDLIST_EXAMPLE.txt" "WORDLIST.txt"
::spell_check_files("Zitat.txt", lang = "../../inst/de_CH_FRAMI") spelling
WORD FOUND IN
versthet. Zitat.txt:1
Now we just have to fix the error and check again.
write_lines("Es hört doch jeder nur, was er versteht.", file = "Zitat.txt")
::spell_check_files("Zitat.txt", lang = "../../inst/de_CH_FRAMI") spelling
No spelling errors found.
The .yml
file for the german spell-check would like this:
name: Spellcheck
on:
pull_request: {branches: ['main']}
jobs:
Spelling:
runs-on: ubuntu-latest
container: {image: "rocker/tidyverse:4.2.1"}
steps:
- name: Checkout repo
uses: actions/checkout@v3
- name: Install spelling
run: if (!require("spelling")) install.packages("spelling")
shell: Rscript {0}
- name: Run Spelling Check test
run: spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST.txt"), lang = "inst/de_CH_frami")
shell: Rscript {0}
- name: Install testthat
run: if (!require("testthat")) install.packages("testthat")
shell: Rscript {0}
- name: test typos
run: testthat::test_that(desc = "No Typo", code = {
no_problem <- spelling::spell_check_files(path = "inst/WORDLIST_EXAMPLE.txt", ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"), lang = "inst/de_CH_FRAMI")
spellcheck <- spelling::spell_check_files(list.files(pattern = ".*.qmd$", recursive = TRUE), ignore = readr::read_lines("inst/WORDLIST_EXAMPLE.txt"), lang = "inst/de_CH_FRAMI")
testthat::expect_equal(object = spellcheck, expected = no_problem)
})
shell: Rscript {0}
Conclusion
You should now be able to run a spell-check on your quarto files. Further, you know how to implement a GitHub CICD pipeline for spell-checks in any language with available dictionary files. This also allows you to implement other R-code based CICD pipelines.
For your (and my) convenience, I have created book-templates for both English and German quarto books. They include CICD pipelines for both spelling and style check, and also implement a CICD publishing workflow. Please read the respective readme.md
file for more information.
Reuse
Citation
@online{thoma2023,
author = {Stefan Thoma},
editor = {},
title = {Using {CICD} to Check Spelling in Quarto Documents},
date = {2023-03-07},
url = {https://stefanthoma.github.io/quarto-blog//posts/2023-01-09_CICD-spelling},
langid = {en}
}