Tools, Data, and Workflows for tutorials

Author(s): Bérénice Batut, Björn Grüning, Saskia Hiltemann, Helena Rasche

Overview
Questions:

How can we define the technical infrastructure for a tutorial?

How to define the tools needed for a tutorial?

How to add the needed data directly in an instance?

How to add the workflows related to a tutorial?

How can we check the technical infrastructure is working?

How can we make an existing Galaxy instance able to run a tutorial?

Objectives:

Extracting the technical description for a tutorial

Populating an existing instance with the needed tools, data and workflows for a tutorial

Creating a Galaxy Docker flavor with the needed tools, data and workflows for a tutorial

Testing the Galaxy Docker flavor of a tutorial

Time estimation: 30 minutes

Supporting Materials:

FAQs

Published: Jun 25, 2017

Last modification: Apr 8, 2025

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00060

Rating: 4.3 (0 recent ratings, 3 all time)

Revision: 26

Building a Galaxy instance specifically for your training

To be able to run the tutorial, we need a Galaxy instance where all of the needed tools and data are available. Thus we need to describe the needed technical infrastructure.

The files we define in this tutorial will be used to automatically build a Galaxy Docker flavor, and also to test whether a public Galaxy instance is able to run the tutorial.

In this tutorial, you will learn how to create a virtualized Galaxy instance, based on Docker, to run your training - either on normal computers or cloud environments.

Agenda

In this tutorial, we will deal with:

Building a Galaxy instance specifically for your training

Extracting workflows

Testing the workflow (recommended)

Creating the data-library.yaml (recommended)

Creating the data-manager.yaml (optional)

Creating the Galaxy Interactive Tour (optional)

Testing the technical infrastructure

Conclusion

Extracting workflows

Once the tutorial is ready, we need to develop a workflow that represents the steps taken in the tutorial, then extract the workflow(s) and add them to the workflows directory of the tutorial. We also need to add some explanation about the workflow(s) in a README.md file.

Hands On: Extract the workflow
Add the topic name as a Tag and the tutorial title as Annotation/Notes to the workflow using the workflow editor.

Download the workflow for the tutorial

Save it in the workflows directory of the tutorial
Check that your workflows directory has an index.md with the contents:
---
layout: workflow-list
---
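
As a quick sanity check before committing, the tag and annotation from the first step can be verified programmatically. The sketch below assumes the exported .ga file is JSON with top-level `tags` (a list of strings) and `annotation` (a string) keys; the function name `missing_metadata` and the sample values are hypothetical and not part of the GTN tooling.

```python
import json


def missing_metadata(workflow: dict, topic: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Assumption: tags are stored as a list of strings under "tags".
    if topic not in workflow.get("tags", []):
        problems.append(f"tag '{topic}' is missing")
    # Assumption: the annotation is a plain string under "annotation".
    if not (workflow.get("annotation") or "").strip():
        problems.append("annotation is empty")
    return problems


# Hypothetical exported workflow, reduced to the fields of interest.
example = json.loads('{"name": "My workflow", "tags": ["assembly"], "annotation": ""}')
print(missing_metadata(example, "assembly"))
```

For a real file, the dictionary would come from `json.load(open("workflows/your-workflow.ga"))` instead of the inline example.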

Testing the workflow (recommended)

Workflow testing is a great way to get feedback that your tutorial can be run successfully on a given server. When you’re giving a training this can provide peace of mind: not only are the tools installed (as indicated by the badges we provide), but they also work.

Given the workflow you created above and have included in the tutorial folder, you’ll need to create a corresponding -test.yml file.

Hands On: Creating the `-test.yml` file for your workflow
Find the correct name for the file. The workflow and its test file need to share the same prefix: if your workflow is unicycler.ga, then your test file should be unicycler-test.yml.
Create the following structure:
---
- doc: Test sample data for the workflow
  job:
    an_input_file:
      class: File
      location: https://....
      filetype: fasta
  outputs:
    ffn:
      asserts:
        - that: has_text
          text: ">A"
        - that: has_text
          text: ">B"

You’ll need to edit the job and outputs sections according to your workflow’s inputs and outputs. Additionally you will need to edit the steps of your workflow .ga file appropriately.
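
The naming rule for the test file can be written down as a small helper. This is only an illustrative sketch of the shared-prefix convention described above; the function name `expected_test_file` is hypothetical.

```python
from pathlib import Path


def expected_test_file(workflow_path: str) -> Path:
    """Map `<prefix>.ga` to `<prefix>-test.yml` in the same directory."""
    workflow = Path(workflow_path)
    if workflow.suffix != ".ga":
        raise ValueError(f"expected a .ga workflow file, got {workflow.name}")
    # Keep the directory and the prefix, swap the ending.
    return workflow.with_name(f"{workflow.stem}-test.yml")


print(expected_test_file("workflows/unicycler.ga").name)  # unicycler-test.yml
```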

Inputs

Your workflow must use “Data Inputs” for each input dataset. For each of these input steps in the .ga file, you’ll need to do the following:

Edit the label
Edit the name
Edit the inputs[0].name
Edit the tool_state

In a normal workflow you have exported from Galaxy, you’ll see something like:

{
    "id": 0,
    "input_connections": {},
    "inputs": [
        {
            "description": "",
            "name": "patient1_ChIP_ER_good_outcome.bam"
        }
    ],
    "label": null,
    "name": "Input dataset",
    "outputs": [],
    "position": {
        "left": 10,
        "top": 10
    },
    "tool_id": null,
    "tool_state": "{\"name\": \"patient1_ChIP_ER_good_outcome.bam\"}",
    "tool_version": null
}

You should synchronize the aforementioned fields so it looks like this:

{
    "id": 0,
    "input_connections": {},
    "inputs": [
        {
            "description": "",
            "name": "good_outcome"
        }
    ],
    "label": "good_outcome",
    "name": "good_outcome",
    "outputs": [],
    "position": {
        "left": 10,
        "top": 10
    },
    "tool_id": null,
    "tool_state": "{\"name\": \"good_outcome\"}",
    "tool_version": null
}

This will allow you to specify good_outcome in your job to load a file:

- doc: ...
  job:
    good_outcome:
      class: File
      location: ...
      filetype: ...

The filetype should be the Galaxy datatype of your file, for example fastqsanger, tabular, bam.
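
Editing the four fields by hand is error-prone when a workflow has many inputs, so the synchronization described above can also be scripted. The following is a minimal sketch that operates on a single input step as a Python dictionary; the helper name `rename_input_step` is hypothetical, and it assumes `tool_state` is a JSON string with a `name` key, as in the example above.

```python
import json


def rename_input_step(step: dict, new_name: str) -> dict:
    """Return a copy of an input step with its four name fields synchronized."""
    updated = json.loads(json.dumps(step))  # deep copy via a JSON round trip
    updated["label"] = new_name
    updated["name"] = new_name
    updated["inputs"][0]["name"] = new_name
    # tool_state is itself a JSON document stored as a string.
    state = json.loads(updated["tool_state"])
    state["name"] = new_name
    updated["tool_state"] = json.dumps(state)
    return updated


step = {
    "id": 0,
    "inputs": [{"description": "", "name": "patient1_ChIP_ER_good_outcome.bam"}],
    "label": None,
    "name": "Input dataset",
    "tool_state": "{\"name\": \"patient1_ChIP_ER_good_outcome.bam\"}",
}
print(json.dumps(rename_input_step(step, "good_outcome"), indent=4))
```

The original step is left untouched, which makes it easy to compare the before and after versions before writing the .ga file back to disk.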

Outputs

For the outputs the process is somewhat simpler:

Identify a step, the outputs of which you would like to test

Convert the relevant outputs to workflow_outputs

In a normal workflow you see:

{
    "outputs": [
        {
            "type": "txt",
            "name": "ofile"
        },
        {
            "type": "txt",
            "name": "ofile2"
        }
    ],
    "workflow_outputs": []
}

If you want to test the contents of ofile, you should change it to

{
    "outputs": [
        {
            "type": "txt",
            "name": "ofile"
        },
        {
            "type": "txt",
            "name": "ofile2"
        }
    ],
    "workflow_outputs": [
        {"output_name": "ofile", "label": "my_output"}
    ]
}

You can now use the label you chose (here my_output) in your test case:

- doc:
  job: ...
  outputs:
    my_output:
      asserts:
        has_text:
          text: 'some-string'
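
The conversion of an output into a labelled workflow output can be scripted in the same spirit. This sketch works on one step dictionary taken from the .ga file; the function name `add_workflow_output` is hypothetical, and the field names (`output_name`, `label`) are those shown in the example above.

```python
def add_workflow_output(step: dict, output_name: str, label: str) -> None:
    """Mark one of a step's outputs as a labelled workflow output (in place)."""
    available = [out["name"] for out in step.get("outputs", [])]
    if output_name not in available:
        raise KeyError(f"step has no output named {output_name!r}")
    workflow_outputs = step.setdefault("workflow_outputs", [])
    # Avoid registering the same output twice.
    if not any(wo["output_name"] == output_name for wo in workflow_outputs):
        workflow_outputs.append({"output_name": output_name, "label": label})


step = {
    "outputs": [{"type": "txt", "name": "ofile"}, {"type": "txt", "name": "ofile2"}],
    "workflow_outputs": [],
}
add_workflow_output(step, "ofile", "my_output")
print(step["workflow_outputs"])
```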

Running the Tests

You can test the file you’ve written with the following command and a recent version (>=0.56.0) of planemo:

planemo test \
	--galaxy_url "$GALAXY_URL" \
	--galaxy_user_key "$GALAXY_USER_KEY" \
	--no_shed_install \
	--engine external_galaxy \
	workflow.ga

Planemo will autodetect the workflow-test.yml file and load it for testing.
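
If you want to run the same command for several workflows, it can help to assemble the argument list in a script. The sketch below only builds and prints the command shown above; the function name `planemo_test_command` and the fallback values for the two environment variables are placeholders, not real endpoints or keys.

```python
import os
import shlex


def planemo_test_command(workflow: str, galaxy_url: str, user_key: str) -> list[str]:
    """Build the argument list for testing one workflow against an external Galaxy."""
    return [
        "planemo", "test",
        "--galaxy_url", galaxy_url,
        "--galaxy_user_key", user_key,
        "--no_shed_install",
        "--engine", "external_galaxy",
        workflow,
    ]


cmd = planemo_test_command(
    "workflow.ga",
    os.environ.get("GALAXY_URL", "https://galaxy.example.org"),  # placeholder URL
    os.environ.get("GALAXY_USER_KEY", "<api-key>"),  # placeholder key
)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then execute the test, provided planemo is installed and the URL and key are valid.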

Skipping Testing in the GTN

If for some reason you want to skip this workflow being tested in the GTN, please add a comment with GTN_RUN_SKIP_REASON in the -test.yml file stating the reason it is skipped.

This will also exempt you from writing output tests.

A good use case for this is when you want to provide a working test, but the workflow takes upwards of 6 hours to execute (e.g. because of large download jobs).
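
To see which workflows in a checkout are marked as skipped, the test files can be scanned for the marker. The exact comment syntax expected by the GTN is not specified here, so this sketch assumes a YAML comment line of the form `# GTN_RUN_SKIP_REASON: <reason>`; that format and the function name `skip_reason` are assumptions made for illustration.

```python
from typing import Optional


def skip_reason(test_yaml_text: str) -> Optional[str]:
    """Return the skip reason from a -test.yml file's comments, or None if absent."""
    marker = "GTN_RUN_SKIP_REASON"
    for line in test_yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") and marker in stripped:
            # Assumption: the text after the marker (and an optional ':' or '=') is the reason.
            return stripped.split(marker, 1)[1].lstrip(":= ").strip()
    return None


example = """# GTN_RUN_SKIP_REASON: downloads take more than 6 hours
- doc: Test sample data for the workflow
  job: {}
"""
print(skip_reason(example))
```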

Creating the `data-library.yaml` (recommended)

The datasets needed for a tutorial can also be integrated into the Galaxy instance as data libraries. These allow the datasets to be easily shared with all users of a Galaxy instance, and they save each trainee from re-downloading the input data.

These datasets are described in the data-library.yaml files:

---
destination:
  type: library
  name: GTN - Material
  description: Galaxy Training Network Material
  synopsis: Galaxy Training Network Material. See https://training.galaxyproject.org
items:
- name: Title of the topic
  description: Summary of the topic
  items:
  - name: Title of the tutorial
    items:
    - name: 'DOI: 10.5281/zenodo....'
      description: latest
      items:
      - info: https://doi.org/10.5281/zenodo....
        url: https://zenodo.org/records/URL/files/path/to/input
        ext: galaxy-datatype
        src: url
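
The file is a tree of nested `items`: folders carry a `name`, and leaves carry a `url`. To make that structure concrete, the sketch below walks the same tree as a Python dictionary and lists every dataset with its folder path. The helper name `iter_datasets` is hypothetical, and the dictionary mirrors the placeholder values above rather than a real Zenodo record.

```python
def iter_datasets(node: dict, path: tuple = ()):
    """Yield (folder_path, url, ext) for every dataset below `node`."""
    for item in node.get("items", []):
        if "url" in item:
            # Leaf: a dataset to be fetched into the current folder.
            yield path, item["url"], item.get("ext")
        else:
            # Folder: descend, extending the path with the folder name.
            yield from iter_datasets(item, path + (item["name"],))


library = {
    "destination": {"type": "library", "name": "GTN - Material"},
    "items": [{
        "name": "Title of the topic",
        "items": [{
            "name": "Title of the tutorial",
            "items": [{
                "name": "DOI: 10.5281/zenodo....",
                "items": [{
                    "url": "https://zenodo.org/records/URL/files/path/to/input",
                    "ext": "galaxy-datatype",
                    "src": "url",
                }],
            }],
        }],
    }],
}

for folders, url, ext in iter_datasets(library):
    print(" / ".join(folders), "->", url, f"({ext})")
```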

Hands On: Creating the `data-library.yaml`
Copy the Zenodo link
Generate the data-library.yaml file and update the tutorial metadata with the link:
$ planemo training_fill_data_library \
         --topic_name "my-topic" \
         --tutorial_name "my-new-tutorial" \
         --zenodo_link "URL to the Zenodo record"
Check that the data-library.yaml has been generated (or updated)

Check that the Zenodo link is in the metadata at the top of the tutorial.md

Creating the `data-manager.yaml` (optional)

Some tools require reference databases that have been prepared specifically for them. For these cases, some Galaxy tools come with “data managers” to simplify the process.

If you need such data managers for your training, then you should describe how to run them in the data-manager.yaml file:

data_managers:
    - id: url to data manager on ToolShed
      params:
        - 'param1': '{{ item }}'
        - 'param2': 'value'
      # Items refer to a list of variables you want to run this data manager with. You can use them inside the param field with {{ item }}
      # In the case of genomes, for example, you can run this DM with multiple genomes, or you could give multiple URLs.
      items:
        - item1
        - item2
      # Name of the data-tables you want to reload after your DM are finished. This can be important for subsequent data managers
      data_table_reload:
        - all_fasta
        - __dbkeys__
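
The effect of the `items` list is that the data manager is run once per item, with the item substituted into the parameters. The sketch below imitates that expansion with plain string replacement, assuming the placeholder is written `{{ item }}` as in Ephemeris's sample configuration. It is an illustration of the idea, not Ephemeris's actual implementation, and the function name `expand_params` is hypothetical.

```python
def expand_params(params: list[dict], items: list[str]) -> list[list[dict]]:
    """Return one substituted parameter list per item."""
    placeholder = "{{ item }}"  # assumption: Jinja-style placeholder
    runs = []
    for item in items:
        runs.append([
            {key: value.replace(placeholder, item) for key, value in param.items()}
            for param in params
        ])
    return runs


params = [{"param1": "{{ item }}"}, {"param2": "value"}]
for run in expand_params(params, ["item1", "item2"]):
    print(run)
```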

Creating the Galaxy Interactive Tour (optional)

A Galaxy Interactive Tour is a way to go through an entire analysis, step by step inside Galaxy in an interactive and explorative way. It is a great way to help users run the tutorial directly inside Galaxy. To learn more about creating a Galaxy tour please have a look at our dedicated tour training.

Testing the technical infrastructure

Once we have defined all the requirements for running the tutorial, we can test them, either in a locally running Galaxy or in a Docker container. Please see our tutorial on Setting up Galaxy for Training for how to test your tutorial requirements.

Conclusion

Key points

Tools, data and workflows can be easily integrated into a Docker flavor to provide the technical infrastructure for a tutorial

A Galaxy Docker flavor is a great support for training

A Galaxy Docker flavor can be deployed ‘anywhere’ and is scalable

Citing this Tutorial

Bérénice Batut, Björn Grüning, Saskia Hiltemann, Helena Rasche, Tools, Data, and Workflows for tutorials (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/contributing/tutorials/create-new-tutorial-technical/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

Funding

These individuals or organisations provided funding support for the development of this resource

ELIXIR Europe

de.NBI

UFR

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.
shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl https://training.galaxyproject.org/training-material/api/topics/contributing/tutorials/create-new-tutorial-technical/tutorial.json | jq .admin_install_yaml -r)
Alternatively you can copy and paste the following YAML
---
install_tool_dependencies: true
install_repository_dependencies: true
install_resolver_dependencies: true
tools: []
