
Tool development and integration into Galaxy


Saskia Hiltemann, Bérénice Batut, Anthony Bretaudeau, John Chilton, Nicola Soranzo, Björn Grüning, Gildas Le Corguillé



Last modification: Jul 10, 2021

Galaxy tools

Tools in the Galaxy UI

Screenshot of galaxy with the three main panels labelled list of available tools on left, 'wrapper' in center, and history with results as datasets on right

Galaxy tool / wrapper

Screenshot of tool interface in Galaxy for GraPhlAn showing a variety of input types like file selection, select, text, numbers.

--format "png" --size 7 input_tree.txt png_image.png


So what is a tool?

Link between the Galaxy UI and the underlying tool:

Tool execution

A flowchart is depicted with the Galaxy interface pointing to a bowtie2-wrapper.xml file which has a command, inputs, and outputs. Inputs points back to the tool interface. The command block points to the Operating System with an image of servers and the bowtie2 binary. This points back to outputs, and back to the history within the Galaxy interface

Speaker Notes

  1. <inputs> (datasets and parameters) specified in the tool XML are exposed in the Galaxy tool UI
  2. When the user fills in the form and clicks the Execute button, Galaxy fills the <command> template in the XML with the inputs entered by the user and executes the Cheetah code, producing a script as output
  3. Galaxy creates a job for the generated script and executes it somewhere (bowtie2 is run in this case)
  4. Some (not necessarily all) output files become new history datasets, as specified in the <outputs> XML tag set

Tool execution

An XML file as an image. The tool id is on the first line, then a description element, a command block running "echo Hello World $mystring to $output", an inputs section with a mystring text input, and an output1 tabular data file. A help block is shown last.
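
The wrapper described above could look roughly like the following sketch. Only the command, the `mystring` text input and the `output1` tabular output come from the slide; the tool id, name, version, label and help text are assumptions made for illustration.

```xml
<tool id="hello_world" name="Hello World" version="0.1.0">
    <description>writes a greeting to a dataset</description>
    <!-- Cheetah fills in $mystring and $output1 before the shell sees the line -->
    <command><![CDATA[
        echo 'Hello World $mystring' > '$output1'
    ]]></command>
    <inputs>
        <param name="mystring" type="text" label="Text to append to the greeting"/>
    </inputs>
    <outputs>
        <data name="output1" format="tabular"/>
    </outputs>
    <help><![CDATA[
Writes "Hello World" followed by the text you entered to a new dataset.
    ]]></help>
</tool>
```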


Tool execution

The previous image, but the input parameter named 'mystring' is shown pointing to its place in the command block. Same for the output pointing to the command block. An overlay shows the Job Command Line /bin/echo Hello world you are amazing > a/path.dat

Tool execution

The previous image but now there is an overlay showing the output text Hello world you are amazing

Tool XML

The Galaxy tool XML format is formally defined in an XML Schema Definition (XSD), which is also used to generate the corresponding online documentation


How to invoke the tool?

    <requirement type="package" version="1.1.3">graphlan</requirement>
--format $format

If the script is provided with the tool XML:

    <requirement type="package" version="2.7">python</requirement>
python '$__tool_directory__/'
--format $format


inputs > param to command

Parameters are directly linked to variables in <command>

#if str($dpi):
    --dpi $dpi
#end if
    <param name="input_tree" type="data" label="..."/>

    <param argument="--dpi" type="integer" optional="true" label="..."
        help="For non vectorial formats" />
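
Put together, the link between parameters and command variables can be sketched as follows. The executable name `graphlan.py`, the output name `png_output_image` placement and the labels are assumptions for illustration; the `input_tree` and `--dpi` parameters are the ones shown above.

```xml
<command><![CDATA[
    graphlan.py
        '$input_tree'
        '$png_output_image'
        #if str($dpi):
            --dpi $dpi
        #end if
]]></command>
<inputs>
    <!-- name="input_tree" becomes $input_tree in the command -->
    <param name="input_tree" type="data" format="txt" label="Input tree"/>
    <!-- argument="--dpi" gives the parameter the name "dpi", i.e. $dpi -->
    <param argument="--dpi" type="integer" optional="true" label="Resolution"
        help="For non vectorial formats" />
</inputs>
<outputs>
    <data name="png_output_image" format="png"/>
</outputs>
```

Because `dpi` is optional, `str($dpi)` is an empty string when the field is left blank, so the `--dpi` flag is only emitted when the user supplies a value.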


inputs > param > data

Screenshot of the file selection input in Galaxy permitting selection of a single file, multiple files, or a collection. In bold text a label appears above the component describing its use. Below the component in light grey is a help message. This applies to every input in Galaxy

<param name="..." type="data" format="txt" label="..." help="..." />

.footnote[List of possible formats]

inputs > param > integer | float

Screenshot of the float input, it's just an input field set to 7.

<param name="..." type="integer" value="7" label="..." help="..."/>

Screenshot of the float input again, but this time a slider appears due to addition of min and max

<param name="..." type="float" min="0" max="10" value="1" label="..."/>


inputs > param > text

Screenshot of a textbox

<param name="..." type="text" value="..." label="..." help="..."/>

inputs > param > select

Screenshot of a select drop down with several image formats.

<param name="..." type="select" label="..." help="...">
    <option value="png" selected="true">PNG</option>
    <option value="pdf">PDF</option>
    <option value="ps">PS</option>
    <option value="eps">EPS</option>
    <option value="svg">SVG</option>
</param>

If no option has selected="true", the first one is selected by default.

inputs > param > select

The select is now rendered as a set of radio buttons.

<param name="..." type="select" display="radio" label="..." help="...">
    <option value="min" selected="true">Minimum</option>
    <option value="mean">Mean</option>
    <option value="max">Max</option>
    <option value="sum">Sum</option>
</param>

inputs > param > select

A select/unselect all checkbox appears before a box with numerous selections inside, appearing as badges that can be added or removed.

<param name="..." type="select" multiple="true" label="..." help="...">
    <option value="ld" selected="true">Length distribution</option>
    <option value="gc" selected="true">GC content distribution</option>
</param>

inputs > param > boolean

A yes/no box is shown

<param name="..." type="boolean" checked="false" truevalue="--log" falsevalue=""
    label="..." help="..." />


inputs > param > conditional

.image-75[ Two screenshots are shown side by side. In the left a select box is set to Paired and two file inputs appear below. On the right the same select box is set to single, and only a single file input appears below it. ]

#if $fastq_input.selector == 'paired':
    '$fastq_input.input1' '$fastq_input.input2'
#end if
    <conditional name="fastq_input">
        <param name="selector" type="select" label="Single or paired-end reads?">
            <option value="paired">Paired-end</option>
            <option value="single">Single-end</option>
        </param>
        <when value="paired">
            <param name="input1" type="data" format="fastq" label="Forward reads" />
            <param name="input2" type="data" format="fastq" label="Reverse reads" />
        </when>
        <when value="single">
            <param name="input" type="data" format="fastq" label="Single reads" />
        </when>
    </conditional>
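
A command block that handles both branches of this conditional might look like the sketch below. The executable `my_aligner`, its `-1`/`-2`/`-U` flags and the `$output` dataset are hypothetical and only serve to show how parameters nested in a conditional are addressed as `$fastq_input.<param name>`.

```xml
<command><![CDATA[
    my_aligner
    #if $fastq_input.selector == 'paired':
        -1 '$fastq_input.input1' -2 '$fastq_input.input2'
    #else:
        -U '$fastq_input.input'
    #end if
    > '$output'
]]></command>
```

Only the parameters of the selected `<when>` block exist at run time, so each branch should reference only its own inputs.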


inputs > param > repeat

Two boxes appear labelled 1: Series and 2: Series, with an insert series button below them. Each series box has two inputs in it, a file input and a select box.

#for $i, $s in enumerate($series):
#end for

    <repeat name="series" title="Series">
        <param name="input" type="data" format="tabular" label="Dataset"/>
        <param name="xcol" type="data_column" data_ref="input" label="Column for x axis"/>
    </repeat>

Speaker Notes

It makes sense to use a <repeat> block only if it contains multiple related parameters, otherwise adding multiple="true" is preferable.
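
As a minimal sketch of how a `<repeat>` block and its loop fit together: the executable `plot_series`, its flags and the `$out_plot` output are hypothetical, while the `series` repeat and its two parameters are the ones shown above.

```xml
<command><![CDATA[
    plot_series
    #for $i, $s in enumerate($series):
        ## one group of arguments per "Series" block added by the user
        --series '${s.input}' --xcol ${s.xcol} --label 'series_${i}'
    #end for
    --output '$out_plot'
]]></command>
<inputs>
    <repeat name="series" title="Series" min="1">
        <param name="input" type="data" format="tabular" label="Dataset"/>
        <param name="xcol" type="data_column" data_ref="input" label="Column for x axis"/>
    </repeat>
</inputs>
<outputs>
    <data name="out_plot" format="png"/>
</outputs>
```

Inside the loop, `$s` gives access to the parameters of one repeat instance and `$i` is its zero-based position.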


Which files will the tool produce as output?

Screenshot of a galaxy history with three outputs.


    <data name="tree" format="txt" label="${tool.name} on ${on_string}: Tree" />
    <data name="annotation" format="txt"
        label="${tool.name} on ${on_string}: Annotation" />

Speaker Notes

${tool.name} on ${on_string} is the default output label; it needs to be modified if the tool generates more than one output

outputs > filter

Output is collected only if the filter evaluates to True

    <param type="select" name="format" label="Output format">
        <option value="png">PNG</option>
        <option value="pdf">PDF</option>
    </param>
    <data name="png_output" format="png" label="${tool.name} on ${on_string}: PNG">
        <filter>format == "png"</filter>
    </data>
    <data name="pdf_output" format="pdf" label="${tool.name} on ${on_string}: PDF">
        <filter>format == "pdf"</filter>
    </data>

Speaker Notes

N.B. If the filter expression raises an Exception, the dataset will NOT be filtered out
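
When the parameter that drives a filter sits inside a conditional, the filter expression addresses it with dictionary-style access. The following is a minimal sketch under that assumption; the conditional `image`, its `output_image` select and the output name are hypothetical.

```xml
<inputs>
    <conditional name="image">
        <param name="output_image" type="select" label="Produce an image?">
            <option value="yes">Yes</option>
            <option value="no" selected="true">No</option>
        </param>
        <when value="yes"/>
        <when value="no"/>
    </conditional>
</inputs>
<outputs>
    <data name="image_output" format="png" label="${tool.name} on ${on_string}: Image">
        <!-- Python expression: the dataset is kept only when this is True -->
        <filter>image['output_image'] == 'yes'</filter>
    </data>
</outputs>
```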


Legacy tools (i.e. with profile unspecified or less than 16.04) by default fail only if the tool writes to stderr

Non-legacy tools by default fail if the tool exit code is not 0, which is equivalent to specifying:

<command detect_errors="exit_code"> ... </command>

To fail if either the tool exit code is not 0 or “Exception:”/“Error:” appears in standard error/output:

<command detect_errors="aggressive"> ... </command>


If you need more precision:

<stdio>
    <exit_code range=":-2" level="warning" description="Low disk space" />
    <exit_code range="1:" level="fatal"  />
    <regex match="Error:"  level="fatal" />
</stdio>
<command> ... </command>

The “warning” level allows information to be added to stderr without marking the dataset as failed


Screenshot of a help block in a galaxy tool, it shows the below text block rendered according to restructured text rules. What it does is bold, and user manual is a hyperlink to the bitbucket url.

**What it does**
GraPhlAn is a software tool for producing high-quality circular
representations of taxonomic and phylogenetic trees. GraPhlAn focuses
on concise, integrative, informative, and publication-ready
representations of phylogenetically- and taxonomically-driven

For more information, check the `user manual

Content should be in reStructuredText markup format


Screenshot of the citations box showing 5 nicely formatted citations with italics, and hyperlinked DOIs.

    <citation type="doi">10.1093/bioinformatics/bts611</citation>
    <citation type="doi">10.1093/nar/gks1219</citation>
    <citation type="doi">10.1093/nar/gks1005</citation>
    <citation type="doi">10.1093/bioinformatics/btq461</citation>
    <citation type="doi">10.1038/nbt.2198</citation>

If no DOI is available, a BibTeX citation can be specified with type="bibtex"
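
A BibTeX citation is written as the text content of the `<citation>` element. The entry below is a placeholder invented for illustration (author, title and URL are not a real reference).

```xml
<citations>
    <!-- Placeholder BibTeX entry, replace with the real reference of the wrapped tool -->
    <citation type="bibtex">
@misc{example_tool,
  author = {Doe, Jane},
  title = {Example Tool (placeholder entry)},
  year = {2021},
  url = {https://example.org/example-tool}
}
    </citation>
</citations>
```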

Quoting params

Always quote text and data parameters and output data in <command>


Multiple commands

Use && to concatenate them

--format '$format'
echo "Yeah it worked!"

The job will exit on the first error encountered.
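
A chained command could be sketched as follows. The executable `my_tool` and its options are hypothetical. Wrapping the command in `CDATA` means `&&` can be written literally instead of as `&amp;&amp;`.

```xml
<command><![CDATA[
    mkdir -p output_dir &&
    my_tool --format '$format' --outdir output_dir &&
    echo "Yeah it worked!"
]]></command>
```

Because each step is joined with `&&`, a later step only runs if the previous one exited with code 0.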

Param argument

Use the argument attribute when a param name reflects the command line argument

<param argument="--size" type="integer" value="7" label="..." help="..."/>


Use sections to group related parameters

Screenshot of the same section twice, in the first it shows Additional Options and is collapsed. In the second it is expanded and an integer input can be seen.

<section name="advanced" title="Advanced options" expanded="False">
    <param argument="--size" type="integer" value="7" label="..." help="..."/>
</section>
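
Parameters placed in a section are addressed in the command through the section name. A minimal sketch, assuming a hypothetical executable `my_tool` and hypothetical `input` and `output` datasets:

```xml
<command><![CDATA[
    my_tool --size $advanced.size '$input' > '$output'
]]></command>
<inputs>
    <param name="input" type="data" format="txt" label="Input file"/>
    <section name="advanced" title="Advanced options" expanded="false">
        <!-- referenced as $advanced.size in the command -->
        <param argument="--size" type="integer" value="7" label="Size"/>
    </section>
</inputs>
<outputs>
    <data name="output" format="txt"/>
</outputs>
```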

Planemo logo, the E mimics the galaxy logo with three bars, the bottom most offset

Command-line utilities to assist in building and publishing Galaxy tools.

##.image-25[Planemo logo]

An overly complicated flowchart with 11 steps and a three level hierarchy. The gist is that planemo tool_init lets a wrapper be created, planemo lint is then used. Planemo conda installs packages from a conda repository. This is then run with planemo test and planemo serve. Afterwards planemo shed_test, shed_create, and shed_update upload the wrapper to the galaxy toolshed. Then it is installed to a galaxy instance where it can be tested, and fetches the conda env from conda.


planemo tool_init

Creates a skeleton tool XML file

$ mkdir new_tool
$ cd new_tool
$ planemo tool_init --id 'some_short_id' --name 'My super tool'

Complicated version:

$ planemo tool_init --id 'samtools_sort' --name 'Samtools sort' \
          --description 'order of storing aligned sequences' \
          --requirement 'samtools@1.3.1' \
          --example_command "samtools sort -o '1_sorted.bam' '1.bam'" \
          --example_input 1.bam \
          --example_output 1_sorted.bam \
          --test_case \
          --version_command 'samtools --version | head -1' \
          --help_from_command 'samtools sort' \
          --doi '10.1093/bioinformatics/btp352'


planemo lint

Checks the syntax of a tool

$ planemo lint
Linting tool /opt/galaxy/tools/seqtk_seq.xml
Applying linter tests... CHECK
.. CHECK: 1 test(s) found.
Applying linter output... CHECK
.. INFO: 1 outputs found.
Applying linter inputs... CHECK
.. INFO: Found 1 input parameters.
Applying linter help... CHECK
.. CHECK: Tool contains help section.
.. CHECK: Help contains valid reStructuredText.
Applying linter general... CHECK
.. CHECK: Tool defines a version [0.1.0].
.. CHECK: Tool defines a name [Convert to FASTA (seqtk)].
.. CHECK: Tool defines an id [seqtk_seq].
Applying linter command... CHECK
.. INFO: Tool contains a command.
Applying linter citations... CHECK
.. CHECK: Found 1 likely valid citations.


planemo serve

View your new tool in a local Galaxy instance

$ planemo serve

Open in your web browser to view your new tool


Building Galaxy Tools

flowchart with planemo tool_init creating a wrapper tool.xml and planemo lint being run repeatedly.

Functional tests


Good practices

  1. Build tests using inputs/parameters/outputs
  2. Run the tests –> Failed –> Coffee
  3. Develop
  4. Run the tests –> Failed –> Coffee
  5. Develop
  6. Run the tests –> Succeed –> Beer


        <param name="input_tree" value="input_tree.txt"/>
        <param name="format" value="png"/>
        <param name="dpi" value="100"/>
        <param name="size" value="7"/>
        <param name="pad" value="2"/>
        <output name="png_output_image" file="png_image.png" />

input_tree.txt and png_image.png must be in the test-data/ directory

Tool directory tree

├── graphlan.xml
└── test-data/
    ├── input_tree.txt
    └── png_image.png
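
With that directory layout, the complete test section could be sketched as below. The parameter values are the ones from the test shown earlier; using `compare="sim_size"` with a `delta` is an assumption here, chosen because binary images may not be byte-identical between runs.

```xml
<tests>
    <test>
        <!-- input files are looked up in test-data/ -->
        <param name="input_tree" value="input_tree.txt"/>
        <param name="format" value="png"/>
        <param name="dpi" value="100"/>
        <param name="size" value="7"/>
        <param name="pad" value="2"/>
        <!-- expected output, also stored in test-data/ -->
        <output name="png_output_image" file="png_image.png" compare="sim_size" delta="1000"/>
    </test>
</tests>
```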

Comparing to an expected result

<output ... compare="diff|re_match|sim_size|contains|re_match_multiline" ... />
<output name="out_file1" file="cf_maf2fasta_concat.dat" ftype="fasta" />
<output ... md5="68b329da9893e34099c7d8ad5cb9c940" />
<output ... lines_diff="4" />
<output ... compare="sim_size" delta="1000" />

.footnote[Complete documentation]

Speaker Notes

Checking the output content

<output name="out_file1">
    <assert_contents>
        <has_text text="chr7" />
        <not_has_text text="chr8" />
        <has_text_matching expression="1274\d+53" />
        <has_line_matching expression=".*\s+127489808\s+127494553" />
        <!-- &#009; is XML escape code for tab -->
        <has_line line="chr7&#009;127471195&#009;127489808" />
        <has_n_columns n="3" />
    </assert_contents>
</output>

.footnote[Complete documentation]

Checking tool stdout/stderr

<assert_stdout>
    <has_text text="Step 1... determine cutoff point" />
    <has_text text="Step 2... estimate parameters of null distribution" />
</assert_stdout>

.footnote[Complete documentation]

Nested inputs in test

        <section name="advanced">
            <repeat name="names">
                <param name="first" value="Abraham"/>
                <param name="last" value="Lincoln"/>
            </repeat>
            <repeat name="names">
                <param name="first" value="Donald"/>
                <param name="last" value="Trump"/>
            </repeat>
            <conditional name="image">
                <param name="output_image" value="yes"/>
                <param name="format" value="png"/>
            </conditional>
        </section>


planemo test

Runs all functional tests

$ planemo test

An HTML report is automatically created, including logs for any failing tests


Test Galaxy Tools

flowchart with planemo tool_init creating a wrapper tool.xml and planemo lint being run repeatedly, and now planemo test as well.



How will Galaxy deal with dependencies?

Schematic of a Galaxy server with dependency resolution via requirement tags at the top. On the left is the tool box with a number of xml files listed like seqtk_seq and seqtk_subseq. On the right is applications & libraries showing only a few tools like seqtk; all of the 3 multiple subtools were collapsed


    <requirement type="package" version="1.66">biopython</requirement>
    <requirement type="package" version="1.0.0">graphlan</requirement>

Local installation using Conda packages

.image-50[Conda logo, the C is textured.]

See Tool Dependencies and Conda

Advanced features


A configfile creates a text file which can then be used inside the command as:

Cheetah code and param/output variables can be used inside configfile (like inside command).



<command><![CDATA[ mb $script_nexus ]]></command>

    <configfile name="script_nexus"><![CDATA[
set autoclose = yes;
execute $input_data;
#if str($data_type.type) == "nuc"
    lset nst=$data_type.lset_params.lset_Nst;
#end if
mcmcp ngen=$mcmcp_ngen;
    ]]></configfile>

Resulting file:

set autoclose = yes;
execute dataset_42.dat;
lset nst=2 ;
mcmcp ngen=100000;


![Another schematic with many arrows. Macros.xml is on the left with token and xml blocks. The token block points to examples like @THREADS@ and @HELP_ABOUT@. The xml block points to examples like . Both of these examples point to three blast tools which make use of the macros.](../../images/macro.png)

.footnote[Planemo documentation about macros]


macros > xml


    <xml name="requirements">
            <requirement type="package" version="2.5.0">blast</requirement>
    <xml name="stdio">
            <exit_code range="1" level="fatal" />


<expand macro="requirements"/>
<expand macro="stdio"/>
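
Spelled out in full, the two macros above would live in a shared file that each tool imports. This is a sketch; the file name `macros.xml` is the conventional choice and the tool id and name are hypothetical.

```xml
<!-- macros.xml: shared between all tools of the repository -->
<macros>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="2.5.0">blast</requirement>
        </requirements>
    </xml>
    <xml name="stdio">
        <stdio>
            <exit_code range="1" level="fatal" />
        </stdio>
    </xml>
</macros>
```

```xml
<!-- tool wrapper: import the macros, then expand them where needed -->
<tool id="my_blast_tool" name="My BLAST tool" version="0.1.0">
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <expand macro="stdio"/>
    <command><![CDATA[ ... ]]></command>
</tool>
```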

macros > token


    <token name="@THREADS@">-num_threads "\${GALAXY_SLOTS:-8}"</token>


blastn -query '$query' @THREADS@ [...]

macros > xml > yield


    <xml name="requirements">
            <requirement type="package" version="2.2.0">trinity</requirement>


<expand macro="requirements">
    <requirement type="package" version="1.1.2">bowtie</requirement>
</expand>
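
The `<yield/>` element marks where the content nested inside `<expand>` is injected. A sketch of the macro definition and of the result after expansion, using the packages shown above:

```xml
<!-- macros.xml -->
<macros>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="2.2.0">trinity</requirement>
            <!-- children of the <expand> element are inserted here -->
            <yield/>
        </requirements>
    </xml>
</macros>
```

```xml
<!-- what the tool effectively contains after expansion -->
<requirements>
    <requirement type="package" version="2.2.0">trinity</requirement>
    <requirement type="package" version="1.1.2">bowtie</requirement>
</requirements>
```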


    <token name="@TOOL_VERSION@">1.2</token>
<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="@TOOL_VERSION@+galaxy3">
        <requirement type="package" version="@TOOL_VERSION@">seqtk</requirement>

This means: the 3rd revision of the Galaxy tool for Seqtk 1.2.

Best practice documentation

command > Reserved variables

# User’s numeric ID (id column of galaxy_user table in the database)
echo '$__user_id__'

# User’s email address
echo '$__user_email__'

# The instance, gives access to all other configuration file variables.
# Should be used as a last resort, may go away in future releases.
echo '$__app__.config.user_library_import_dir'

# Check a dataset type
#if $input1.is_of_type('gff'):
    echo 'input1 type is ${input1.datatype}'
#end if

.footnote[Reserved Variables List]

Multiple inputs - Mapping over

<param name="..." type="data" format="txt" label="..." help="..." />

File selector input

It is possible to select multiple datasets:

File selector input screenshot, but now the middle "multiple files" button is checked.

Multiple inputs - Single execution

<param name="..." type="data" format="txt" multiple="true" label="..." help="..." />

A multi-select file input field is shown, different than the normal file input there is no single file option.

In the command:

#for $input in $inputs
    --input '$input'
#end for

One job for all selected datasets

Multiple outputs

    <data name="output" format="txt">
        <discover_datasets pattern="__designation_and_ext__"
            directory="output_dir" visible="true" />
    </data>

If the output file extension is not present/usable:

    <data name="output" format="txt">
        <discover_datasets pattern="__designation__" format="txt"
            directory="output_dir" visible="true" />
    </data>

Dataset collections

A dataset collection combines numerous datasets in a single entity that can be manipulated together


Dataset collections as input

Mapping over:

<param name="inputs" type="data" format="bam" label="Input BAM(s)" />

The normal file selector is shown however now the collection input is clicked.

Single execution:

<param name="inputs" type="data_collection" collection_type="list|paired|list:paired|..."
    format="bam" label="Input BAM(s)" />
#for $input in $inputs
    --input '$input'
    --sample_name '$input.element_identifier'
#end for

Dataset collections as output

A single paired collection:

<collection name="paired_output" type="paired" label="Split Pair">
    <data name="forward" format="txt" />
    <data name="reverse" format_source="input1" from_work_dir="reverse.txt" />
</collection>

Unknown number of files:

<collection name="output" type="list" label="Unknown number of files">
    <discover_datasets pattern="__name_and_ext__" directory="outputs" />
</collection>

Use multiple CPUs

Screenshot of two xml files. At the top is the job_conf.xml where a command line job submission specification indicates that it will be submitted with 4 threads. A single tool, the ncbi blastn wrapper, is assigned to that destination. In the second xml file the blastn command uses the GALAXY_SLOTS variable to control how many threads are supplied to the tool.

blastn -query foo_bar -num_threads 4

8 is the default value used if the number of slots is not set in the job destination
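
On the tool side, the thread count can be passed through as in the sketch below. The `query`, `database` and `output` names are hypothetical parameters.

```xml
<command><![CDATA[
    blastn
        -query '$query'
        -db '$database'
        -num_threads "\${GALAXY_SLOTS:-8}"
        -out '$output'
]]></command>
```

The backslash keeps Cheetah from treating `${GALAXY_SLOTS:-8}` as a template variable, so the shell expands it when the job runs: to the number of slots granted by the destination, or to 8 if the variable is unset.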

Data tables

Using a data table in a tool

Three xml files are shown. At the top is the tool data table conf which mentions a tool-data/bowtie2_indices.loc. Below is that bowtie2 loc file which indicates that hg19 will be found at a specific location in the /db directory. And the third is the bowtie2 wrapper which loads options from a data table, and points to the bowtie2_indexes named table in the first xml.
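
The three files in the screenshot can be approximated as follows. This is a sketch reconstructed from the description: the column layout, the parameter name `index` and the label are assumptions.

```xml
<!-- tool_data_table_conf.xml: declares the table and the .loc file backing it -->
<tables>
    <table name="bowtie2_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bowtie2_indices.loc" />
    </table>
</tables>
```

```xml
<!-- bowtie2 wrapper: the select options are loaded from the data table -->
<param name="index" type="select" label="Reference genome">
    <options from_data_table="bowtie2_indexes">
        <validator type="no_options" message="No indexes are available"/>
    </options>
</param>
```

Each line of `tool-data/bowtie2_indices.loc` then holds one tab-separated entry matching the declared columns (for example an `hg19` entry whose last column is a path under `/db`), and the command can read the chosen entry's path as `$index.fields.path`.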


.footnote[Documentation: Adding Datatypes]

Publishing tools

Contributing to a community

Many tools are developed by the community in GitHub repositories

Added value:

IUC: Intergalactic Utilities Commission

.image-50[IUC logo]

How should I publish my tool?

Adding to an existing GitHub repository (IUC, GalaxyP, …)

How should I publish my tool?

Using your own GitHub repository

How should I publish my tool?

Using planemo by hand

Check out our tutorial to publish to the ToolShed using Planemo

Continuous Integration

.image-50[Github logo]

.image-50[Travis CI logo]

.image-50[Planemo logo]

.image-50[Conda logo]

Continuous Integration

GitHub side

Screenshot of a .travis.yml file with badges and logos over it. It runs several planemo commands during before install, install, and script portions.

Travis CI side

Screenshot of travis CI shown a tool test passing.


categories: [Sequence Analysis]
description: Tandem Repeats Finder description
long_description: A long long description.
name: tandem_repeats_finder_2
owner: gandres
planemo shed_init --name="tandem_repeats_finder_2"
                     --description="Tandem Repeats Finder description"
                     --long_description="A long long description."
                     --category="Sequence Analysis"
                     [--remote_repository_url=<URL to .shed.yml on github>]
                     [--homepage_url=<Homepage for tool.>]

Tool suites

A tool suite is a group of related tools that can all be installed at once.

Defined in .shed.yml: implicitly define repositories for each individual tool in the directory and build a suite for those tools.

Example: trinity/.shed.yml

    name_template: ""
    description_template: " (from the Trinity tool suite)"
    name: "suite_trinity"
    description: Trinity tools to assemble transcript sequences from Illumina RNA-Seq data.


planemo shed_lint --tools --ensure_metadata
Linting repository […]/tandem_repeats_finder
Applying linter expansion... CHECK
.. INFO: Included files all found.
Applying linter tool_dependencies_xsd... CHECK
.. INFO: tool_dependencies.xml found and appears to be valid XML
Applying linter tool_dependencies_actions... CHECK
.. INFO: Parsed tool dependencies.
Applying linter repository_dependencies... CHECK
.. INFO: No repository_dependencies.xml, skipping.
Applying linter shed_yaml... CHECK
.. INFO: .shed.yml found and appears to be valid YAML.
Applying linter readme... CHECK
.. INFO: No README found skipping.
Linting tool […]/tandem_repeats_finder/tandem_repeats_finder_wrapper.xml

.image-50[Ansible logo]

.image-50[Conda logo]

.image-50[Docker logo]

Docker Galaxy Flavors

Galaxy docker logo, the docker logo with several containers replaced by galaxy logos.

.footnote[Docker Galaxy Stable]

Docker Galaxy Flavors


# Galaxy - My own flavour
# VERSION       0.1


MAINTAINER Björn A. Grüning,


# Adding the tool definitions to the container
ADD my_tool_list.yml $GALAXY_ROOT/my_tool_list.yml

# Install deepTools
RUN install-tools $GALAXY_ROOT/my_tool_list.yml

Docker Galaxy Flavors


api_key: admin
tools:
- name: fastqc
  owner: devteam
  tool_panel_section_id: cshl_library_information
  revisions:
    - '8c650f7f76e9'  # v0.62
    - 'd2cf2c0c8a11'  # v0.63
- name: suite_deeptools
  owner: bgruening
  tool_panel_section_label: 'deepTools'
  revisions:
    - '7f5562625ae2'

Docker Galaxy Flavors


docker run -d -p 8080:80 galaxy-deeptools

Key Points

Do you want to extend your knowledge?

Follow one of our recommended follow-up trainings:

- [Development in Galaxy](/training-material/topics/dev)
    - Tool Dependencies and Conda: [slides](/training-material/topics/dev/tutorials/conda/slides.html)

Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

This material is licensed under the Creative Commons Attribution 4.0 International License.