# get all task_info.csv ID's
# each row identifies one file in the trash
<- drive_find(
task_info_master pattern = "task_info.csv",
trashed = TRUE
)
I recently co-created a shiny application for the admiral
hackathon in February 2023. The concept of the app seems quite promising for teaching data-manipulation in general, so I decided to publish the code on GitHub and write this short post for anyone interested.
But from the beginning: The admiral
hackathon was an event designed to make statistical programmers from the pharmaceutical industry more comfortable with the admiral
R package. This package allows users to efficiently transform data from one data standard (sdtm) to another (adam).
Hackathon participants formed groups of up to five people and were then tasked to create R-scripts that map the sdtm
data to adam
according to specifics defined in the metadata.
The purpose of the shiny app was threefold:
Allow participants to easily register as a team.
Let participants upload their R-scripts, so that the
admiral
team can see how theadmiral
package is being used – and how the functions and the documentation could be improved.Run participants R-scripts on the cloud and compare the produced file to a solution file. Let participants autonomously get feedback on their work.
In this blog post I want to highlight some of the thoughts that went into this application. Please keep in mind that this work was done under tight time restraints.
The hackathon application is still online (although data-upload is switched off) and the GitHub repository is publicly available. I have also uploaded to GitHub a .zip
file of the workspace to which hackathon participants had access via posit cloud. For more context you can watch recordings of the hackathon-meetings.
Permanent Data
The biggest challenge you have to consider for this app is the permanent data storage. Shiny apps run on a server. Although we can write files on this server, whenever the app restarts, the files are lost. Therefore, a persistent data storage solution is required.
Google drive
I decided to leverage Google drive using the googledrive
package. This allowed me to save structured data (the team registry and the submission scores) as well as unstructured data (their R-script files).
The initial concept was: Each team gets their own folder including the most recent submission for each task, and a .csv
file containing team information. To keep track of the submissions and the respective scores we wrote a .csv
file in the mock-hackathon
folder, so one folder above the team folders.
Saving the team info as a .csv
file worked fine as each team received their own file which – once created – was not touched anymore. As each upload for every team should simply add a row to the submissions.csv
file, appending the file would be ideal. This was not possible using the googledrive
package. Instead, for each submission, the submissions
file was downloaded, appended, and uploaded again. Unfortunately, this lead to a data loss, as the file was continuously overwritten, especially when two teams would submit simultaneously.
If you want to stay in the Google framework, I recommend using the googlesheets4
package for structured data. googlesheets4
allows appending new information to an already existing sheet without the need to download the file first. As both packages follow the same style, going from one to the other is really simple. googlesheets4
requires authentication as well. However, you can reuse the cached token from the googledrive
authentication by setting gs4_auth(token = drive_token())
.
Security Concerns
Connecting a public shiny app to your Google account introduces a security vulnerability in general. Especially so, because we implemented the upload of files to the Google drive. And even more problematic: We run a user generated script and display some of its output. A malicious party might be able to extract the authentication token of our Google account or could upload malware to the drive.
To reduce risk, I simply created an un-associated Google account to host the drive. There are certainly better options available, but this seemed a reasonable solution for basically zero effort.
Register Team
We wanted to allow users to sign up as teams using the shiny app. The app provides a simple interface where users could input a team name and the number of members. This in turn would open two fields for each user to input their name and email address.
We do simple checks to make sure at least one valid email address is supplied, and that the group name is acceptable. The group name cannot be empty, already taken, or contain vulgar words.
The team registration itself was adding the team information to the Google sheets file event_info
into the sheet teams
and to create a team folder in which to store the uploaded R files.
The checks and registration is implemented in the register_team()
function stored in interact_with_google.R
.
Upload & Source Script
To upload a script, participants had to select their team first. The input options were based on the existing folders on the Google-drive in the mock_hackathon
folder. To upload a particular script participants had to also select the task to be solved. The uploaded script is then uploaded to the team folder following a standardised script naming convention.
There are different aspects to be aware of when sourcing scripts on a shiny server. For example, you have to anticipate the packages users will include in their uploaded scripts, as their scripts will load but not install packages. Further, you should keep the global environment of your shiny app separate from the environment in which the script is sourced. This is possible by supplying an environment to the source()
function, e.g: source(path_to_script, local = new.env())
Another thing we had to consider was to replicate the exact folder-structure on the shiny server that participants were working with when creating the scripts, as they were required to source some scripts and to save their file into a specific folder. This was relatively straight forward as we provided participants with a folder structure in the posit cloud instance they were using. They had access to the sdtm
folder in which the data was stored, and the adam
folder into which they saved their solutions. The structure also included a folder with metadata which was also available on the shiny server. For some tasks, participants required some adam
-datasets stored in the adam
folder, essentially the output from previous tasks. On the shiny server this was solved by copying the required adam
dataset from the key
folder (where our solutions to the tasks were stored) to the adam
folder. After sourcing the content in the adam
folder was deleted.
Compare to Solution File
We want to compare the file created by participants with our solution (key) file stored in the key
folder. The diffdf::diffdf()
function allows for easy comparison of two data-frames and directly provides extensive feedback for the user:
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
<- tibble(
df1 numbers = 1:10,
letters = LETTERS[1:10]
)<- tibble(
df2 numbers = 1:10,
letters = letters[1:10]
)
Score
To compare submissions between participants we implemented a simple scoring function (score_f()
) based on the table comparison by diffdf()
. The function can be found in the compare_dfs.R
file:
<- function(df_user, df_key, keys) {
score_f <- 10
score <- diffdf::diffdf(df_user, df_key, keys = keys)
diff if (!diffdf::diffdf_has_issues(diff)) {
return(score)
}
if (!diffdf::diffdf_has_issues(diffdf::diffdf(df_user,
df_key,keys = keys,
strict_numeric = FALSE,
strict_factor = FALSE
))) {return(score - 1)
}return(round(min(max(score - length(diff) / 3, 1), 9), 2))
}
Every comparison starts with a score of 10
. We then subtract the length of the comparison object divided by a factor of 3
. The length of the comparison object is a simplified way to represent the difference between the two data-frames by one value. Finally, the score is bounded by min(score, 1)
.
The score is not a perfect capture of the quality of the script uploaded but: 1. helped participants get an idea of how close their data-frame is to the solution file 2. allowed us to raffle prizes based on the merit of submitted r-scripts
Last updated
2023-04-04 07:58:00 UTC
Details
Reuse
Citation
@online{thoma2023,
author = {Stefan Thoma},
editor = {},
title = {Automatic {Feedback} for {R} {Exercises} {Using} {Shiny}},
date = {2023-03-14},
url = {https://stefanthoma.github.io/quarto-blog//posts/2023-03-14_shiny_script_upload},
langid = {en}
}