Best practices for workflows in GitHub repositories
What are workflow best practices?
How does RO-Crate help?
Generate a workflow test using Planemo
Understand how testing can be automated with GitHub Actions
Time estimation: 30 minutes
Last modification: May 23, 2023
PURL: https://gxy.io/GTN:T00339
Best viewed in a Jupyter Notebook
This tutorial is best viewed in a Jupyter notebook! You can load this notebook in one of the following ways:
Launching the notebook in Jupyter in Galaxy
- Instructions to Launch JupyterLab
- Open a Terminal in JupyterLab with File -> New -> Terminal
- Select the notebook that appears in the list of files on the left.
Downloading the notebook
A workflow, just like any other piece of software, can be formally correct and runnable but still lack a number of additional features that might help its reusability, interoperability, understandability, etc.
One of the most useful additions to a workflow is a suite of tests, which help check that the workflow is operating as intended. A test case consists of a set of inputs and corresponding expected outputs, together with a procedure for comparing the workflow’s actual outputs with the expected ones. A test may be considered successful even if the actual outputs do not match the expected ones exactly, for instance because the computation involves a certain degree of randomness, or because the output includes timestamps or randomly generated identifiers.
Providing documentation is also important to help users understand the workflow’s purpose and mode of operation, its requirements, the effect of its parameters, etc. Even a single, well-structured README file can go a long way towards getting users started with your workflow, especially if complemented by examples that include sample inputs and running instructions.
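The comparison procedure of a test case, as described above, can tolerate content that legitimately changes between runs. The following is a minimal sketch of that idea, not the way Planemo or Galaxy implement it: the patterns and the function names `normalize` and `outputs_match` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical patterns for content that may differ between two correct runs.
VOLATILE_PATTERNS = [
    # Timestamps such as 2023-05-23 10:00:00 or 2023-05-23T10:00:00
    re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"),
    # UUID-style randomly generated identifiers
    re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"),
]


def normalize(text):
    """Replace volatile substrings with a fixed placeholder."""
    for pattern in VOLATILE_PATTERNS:
        text = pattern.sub("<VOLATILE>", text)
    return text


def outputs_match(actual, expected):
    """Compare two outputs line by line after masking volatile content."""
    return normalize(actual).splitlines() == normalize(expected).splitlines()


expected = "run started 2023-05-23 10:00:00\nCHR1\t10\t20\n"
actual = "run started 2024-01-02 03:04:05\nCHR1\t10\t20\n"
print(outputs_match(actual, expected))
```

Here the two outputs differ only in their timestamp, so the comparison treats them as equivalent, while any difference in the data lines would still make the test fail.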
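In this tutorial, you will learn about community best practices for workflow repositories, how RO-Crate builds on them, how to generate a workflow test with Planemo, and how testing can be automated with GitHub Actions.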
Community best practices
Though the practices listed above can be considered general enough to be applicable to any kind of software, individual communities usually add their own specific sets of rules and conventions that help users quickly find their way around software projects, understand them more easily and reuse them more effectively. The Galaxy community, for instance, has a guide on best practices for maintaining workflows.
The Intergalactic Workflow Commission (IWC) is a collection of highly curated Galaxy workflows that follow best practices and conform to a specific GitHub directory layout, as specified in the guide on adding workflows. In particular, the workflow file must be accompanied by a Planemo test file with the same name but a -tests.yml suffix, and a test-data directory that contains the datasets used by the tests described in the test file. The guide also specifies how to fulfill other requirements such as setting a license, a creator and a version tag. A new workflow can be proposed for inclusion in the collection by opening a pull request to the IWC repository: if it passes the review and is merged, it will be published to iwc-workflows. The publication process also generates a metadata file that turns the repository into a Workflow Testing RO-Crate, which can be registered to WorkflowHub and LifeMonitor.
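The layout requirements above lend themselves to a quick automated check. The sketch below is a hypothetical helper (`missing_test_items` is not part of Planemo or of the IWC tooling) that reports which of the expected items are absent from a directory. It assumes the `-tests.yml` suffix used by the test file generated later in this tutorial (`sort-and-change-case-tests.yml`); the suffix is a parameter in case your repository uses a different one.

```python
import tempfile
from pathlib import Path


def missing_test_items(repo_dir, workflow_name, suffix="-tests.yml"):
    """Return the names of expected items that are absent from repo_dir.

    Assumed layout: <name>.ga, <name><suffix> and a test-data directory.
    """
    repo = Path(repo_dir)
    stem = Path(workflow_name).stem
    expected = [
        repo / workflow_name,
        repo / f"{stem}{suffix}",
        repo / "test-data",
    ]
    return [path.name for path in expected if not path.exists()]


# Demonstration on a throwaway directory that only contains the workflow file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sort-and-change-case.ga").write_text("{}")
    print(missing_test_items(tmp, "sort-and-change-case.ga"))
```

An empty list means that the workflow file, its test file and the test data directory are all in place.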
Best practice repositories and RO-Crate
The repo2rocrate software package can generate a Workflow Testing RO-Crate for a workflow repository that follows community best practices. It currently supports Galaxy (based on IWC guidelines), Nextflow and Snakemake. The tool assumes that the workflow repository is structured according to the community guidelines and generates the appropriate RO-Crate metadata for the various entities. Several command line options let you specify additional information that cannot be automatically detected or needs to be overridden.
To try the software, we’ll clone one of the iwc-workflows repositories, whose layout is known to respect the IWC guidelines. Since it already contains an RO-Crate metadata file, we’ll delete it before running the tool.
pip install repo2rocrate
git clone https://github.com/iwc-workflows/parallel-accession-download
cd parallel-accession-download/
rm -fv ro-crate-metadata.json
repo2rocrate --repo-url https://github.com/iwc-workflows/parallel-accession-download
This adds an ro-crate-metadata.json file at the top level with metadata generated based on the tool’s knowledge of the expected repository layout. By specifying a zip file as an output, we can directly generate an RO-Crate in the format accepted by WorkflowHub and LifeMonitor:
repo2rocrate --repo-url https://github.com/iwc-workflows/parallel-accession-download -o ../parallel-accession-download.crate.zip
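To get a feel for what the generated metadata contains, you can load it as ordinary JSON. RO-Crate metadata is a JSON-LD document whose entities are listed under the `@graph` key, each with an `@type`. The sketch below counts entities by type; the function name `count_entity_types` and the tiny sample document are illustrative assumptions, not actual repo2rocrate output.

```python
import json
from collections import Counter


def count_entity_types(metadata):
    """Count the entities of an RO-Crate JSON-LD document by @type."""
    counts = Counter()
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):  # @type may be a string or a list
            types = [types]
        counts.update(types)
    return counts


# Hand-written stand-in for a real metadata file (illustrative only).
sample = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork"},
    {"@id": "./", "@type": "Dataset"},
    {"@id": "workflow.ga",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]}
  ]
}
""")

for entity_type, count in sorted(count_entity_types(sample).items()):
    print(entity_type, count)

# For a real crate, load the file instead of the sample:
# with open("ro-crate-metadata.json") as f:
#     print(count_entity_types(json.load(f)))
```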
Generating tests for your workflow
What if you only have a workflow, but you don’t have the test layout yet? You can use Planemo to generate it.
pip install planemo
As an example we will use this simple workflow, which has only two steps: it sorts the input lines and changes them to upper case. Follow these steps to generate a test layout for it:
Hands-on: Generate Workflow Tests With Planemo
- Download the workflow to a sort-and-change-case.ga file.
- Download this input dataset to an input.bed file.
- Upload the workflow to Galaxy (e.g., Galaxy Europe): from the upper menu, click on “Workflow” > “Import” > “Browse”, choose sort-and-change-case.ga and then click “Import workflow”.
- Start a new history: click on the “+” button on the History panel to the right.
- Upload the input dataset to the new history: on the left panel, go to “Upload Data” > “Choose local files” and select input.bed, then click “Start” > “Close”.
- Wait for the file to finish uploading (i.e., for the loading circle on the dataset’s line in the history to disappear).
- Run the workflow on the input dataset: click on “Workflow” in the upper menu, locate sort-and-change-case, and click on the play button to the right. This should take you to the workflow running page. The input slot should already be filled with input.bed since there is nothing else in the history. Click on “Run Workflow” on the upper right of the center panel.
- Wait for the workflow execution to finish.
- On the upper menu, go to “User” > “Workflow Invocations”, click on the invocation corresponding to the workflow just run and copy the invocation’s ID. In my case it says “Invocation: a043e8c60873170b” on the right, where a043e8c60873170b is the ID.
- On the upper menu, go to “User” > “Preferences” > “Manage API Key”. If you don’t have an API key yet, click the button to create a new one. Under “Current API key”, click the button to copy the API Key on the right.
- Run the following command, replacing INVOCATION_ID with the actual invocation ID and API_KEY with the actual API key. If you’re not using the Galaxy Europe instance, also replace https://usegalaxy.eu with the URL of the instance you’re using.
planemo workflow_test_init --galaxy_url https://usegalaxy.eu --from_invocation INVOCATION_ID --galaxy_user_key API_KEY
This generates the test definition file sort-and-change-case-tests.yml. The rest of the files generated by Planemo are under the test-data directory.
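If you script this step, it can be convenient to assemble the command from variables rather than pasting the API key into a terminal by hand. The sketch below only builds the argument list, using the same options as the command above. The function name `build_test_init_command` and the `GALAXY_API_KEY` environment variable are hypothetical names chosen for this example.

```python
import os
import shlex


def build_test_init_command(invocation_id, api_key,
                            galaxy_url="https://usegalaxy.eu"):
    """Build the planemo workflow_test_init argument list shown above."""
    return [
        "planemo", "workflow_test_init",
        "--galaxy_url", galaxy_url,
        "--from_invocation", invocation_id,
        "--galaxy_user_key", api_key,
    ]


# Print the command with placeholders, so no real key ends up on screen.
print(shlex.join(build_test_init_command("INVOCATION_ID", "API_KEY")))

# To actually run it, read the key from the (hypothetical) environment
# variable and pass the list to subprocess:
# import subprocess
# key = os.environ["GALAXY_API_KEY"]
# subprocess.run(build_test_init_command("a043e8c60873170b", key), check=True)
```

Passing the arguments as a list avoids shell quoting problems if the URL or the identifiers contain special characters.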
Adding a GitHub workflow
In the previous section, we learned how to generate a test layout for an example Galaxy workflow. You can apply the same procedure to your workflow and get the file structure you need to populate the GitHub repository. One thing is still missing though: a GitHub workflow to test the Galaxy workflow automatically. At the top level of the repository, create a .github/workflows directory and place a wftest.yml file inside it with the following content:
name: Periodic workflow test
on:
  schedule:
    - cron: '0 3 * * *'
  workflow_dispatch:
jobs:
  test:
    name: Test workflow
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
      with:
        fetch-depth: 1
    - uses: actions/setup-python@v1
      with:
        python-version: '3.7'
    - name: install Planemo
      run: |
        pip install --upgrade pip
        pip install planemo
    - name: run planemo test
      run: |
        planemo test --biocontainers sort-and-change-case.ga
Replace sort-and-change-case.ga with the name of your actual Galaxy workflow. You can find extensive documentation on GitHub workflows on the GitHub website. Here we’ll give some highlights:
- the on field sets the GitHub workflow to run:
  - automatically every day at 3 AM (UTC)
  - when manually dispatched
- the steps do the following:
  - check out the GitHub repository
  - set up a Python environment
  - install Planemo
  - run planemo test on the Galaxy workflow
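The schedule in the GitHub workflow is written in five-field cron syntax: minute, hour, day of month, month and day of week. As a small illustration of how `'0 3 * * *'` translates to "every day at 3 AM", the sketch below splits an expression into named fields. The helper `parse_cron` is a hypothetical name and does no validation beyond counting the fields.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expression):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(CRON_FIELDS, parts))


schedule = parse_cron("0 3 * * *")
print(schedule)
```

A `*` in a field means "any value", so only the minute (0) and the hour (3) constrain when the job runs. To test your workflow weekly instead of daily, for example, you could set the last field to a specific day of the week.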