Galaxy Tool Management with Ephemeris
Author(s): Martin Čech, Helena Rasche, Nicola Soranzo
Overview
Questions:
- How to install, update, and maintain Galaxy tools?
- How to extract a list of tools from a workflow or Galaxy instance?
Objectives:
- Learn about Ephemeris
- Extract a list of tools from a workflow
- Install these tools on a given Galaxy
Time estimation: 45 minutes
Published: Jan 27, 2019
Last modification: Jun 14, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00023
Revision: 38
This tutorial will introduce you to one of Galaxy’s associated projects: Ephemeris. Ephemeris is a small Python library and set of scripts for managing the bootstrapping of Galaxy plugins: tools, index data, and workflows. It aims to automate, and thereby reduce, the manual work admins have to do in order to maintain a Galaxy instance.
Comment: Galaxy Admin Training Path
The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.
Step 1: ansible-galaxy
Step 2: backup-cleanup
Step 3: customization
Step 4: tus
Step 5: cvmfs
Step 6: apptainer
Step 7: tool-management
Step 8: reference-genomes
Step 9: data-library
Step 10: dev/bioblend-api
Step 11: connect-to-compute-cluster
Step 12: job-destinations
Step 13: pulsar
Step 14: celery
Step 15: gxadmin
Step 16: reports
Step 17: monitoring
Step 18: tiaas
Step 19: sentry
Step 20: ftp
Step 21: beacon
Background
You are an administrator of a Galaxy server. A colleague has approached you with a request to run a specific Galaxy workflow on their data. In order to enable this workflow for your users, you will have to:
- identify what tools are required for the workflow
- install these tools and their dependencies on your Galaxy instance.
Requirements
To run this tutorial, you will need to install Ephemeris. You would normally install it on your workstation, but during training courses we recommend installing it on the same virtual machine used for the Galaxy server.
Hands-on: Installing Ephemeris in a Python virtual environment
- Install the Python venv package if it is not already available. On Ubuntu this can be done with:
sudo apt install python3-venv
- Create a virtual environment just for Ephemeris, activate it, and install Ephemeris inside it:
python3 -m venv ~/ephemeris_venv
. ~/ephemeris_venv/bin/activate
pip install ephemeris
Extracting Tools
A common request you will experience as an administrator is “I want to run this workflow”. Since this is such a common request, Ephemeris has a built-in way to handle it: we can use Ephemeris to extract a list of tools from a Galaxy workflow document, and then use Ephemeris to install those tools, at their specific versions, into your Galaxy.
However, Galaxy workflow files are complex JSON documents, and the process of mapping the tool IDs to a ToolShed repository and revision is not trivial. Workflow files contain tool IDs which look like:
toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.71
toolshed.g2.bx.psu.edu/repos/bgruening/trim_galore/trim_galore/0.4.3.1
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.6
toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.3.4.2
toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.1
toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.4.1
toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.1
In order to actually install these tools, we need to convert each tool ID into a ToolShed repository name and revision. For example, FastQC version 0.71 corresponds to revision ff9530579d1f in the ToolShed:
- name: fastqc
  owner: devteam
  revisions:
  - ff9530579d1f
  tool_panel_section_label: Tools from workflows
  tool_shed_url: https://toolshed.g2.bx.psu.edu
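The tool ID already names the ToolShed, owner, repository and tool. It does not include the changeset revision, which is why a ToolShed lookup is needed. The minimal sketch below pulls the parts apart in plain shell. It assumes IDs follow the <toolshed>/repos/<owner>/<repository>/<tool>/<version> layout shown above, and split_tool_id is a hypothetical helper, not part of Ephemeris.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ephemeris): split a ToolShed tool ID.
# Assumes the layout <toolshed>/repos/<owner>/<repository>/<tool>/<version>.
# The body runs in a subshell, so the IFS and set -f changes do not leak.
split_tool_id() (
    id=$1
    IFS=/
    set -f            # disable globbing while splitting on "/"
    set -- $id        # word-split the ID into positional parameters
    if [ "$#" -ne 6 ] || [ "$2" != "repos" ]; then
        echo "unrecognised tool ID: $id" >&2
        exit 1
    fi
    printf 'toolshed=%s owner=%s repository=%s tool=%s version=%s\n' \
        "$1" "$3" "$4" "$5" "$6"
)

split_tool_id toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.4.1
# prints: toolshed=toolshed.g2.bx.psu.edu owner=devteam repository=bamtools_filter tool=bamFilter version=2.4.1
```

The repository name (bamtools_filter) and the tool ID (bamFilter) can differ, so the repository name cannot simply be copied from the tool's short ID.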
Ephemeris can take care of this process. Let’s practice this on a real workflow.
Hands-on: Extracting a list of tools from a workflow
Download the mapping workflow:
wget https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/workflows/mapping.ga
Use the Ephemeris workflow-to-tools command to extract the tool list from this workflow into a file named workflow_tools.yml in the folder tools.
Question: What did your command look like?
mkdir -p tools
workflow-to-tools -w mapping.ga -o tools/workflow_tools.yml -l Mapping
Here -w is the workflow file, -o the output file, and -l the tool panel section label.
Or as a diff:
--- /dev/null
+++ b/tools/workflow_tools.yml
@@ -0,0 +1,41 @@
+install_tool_dependencies: True
+install_repository_dependencies: True
+install_resolver_dependencies: True
+
+tools:
+- name: fastqc
+  owner: devteam
+  revisions:
+  - e7b2202befea
+  tool_panel_section_label: Mapping
+  tool_shed_url: https://toolshed.g2.bx.psu.edu/
+- name: trim_galore
+  owner: bgruening
+  revisions:
+  - 949f01671246
+  tool_panel_section_label: Mapping
+  tool_shed_url: https://toolshed.g2.bx.psu.edu/
+- name: multiqc
+  owner: iuc
+  revisions:
+  - f7985e0479b9
+  tool_panel_section_label: Mapping
+  tool_shed_url: https://toolshed.g2.bx.psu.edu/
+- name: bowtie2
+  owner: devteam
+  revisions:
+  - 09b2cdb7ace5
+  tool_panel_section_label: Mapping
+  tool_shed_url: https://toolshed.g2.bx.psu.edu/
+- name: samtools_stats
+  owner: devteam
+  revisions:
+  - 24c5d43cb545
+  tool_panel_section_label: Mapping
+  tool_shed_url: https://toolshed.g2.bx.psu.edu/
+- name: bamtools_filter
+  owner: devteam
+  revisions:
+  - cb20f99fd45b
+  tool_panel_section_label: Mapping
+  tool_shed_url: https://toolshed.g2.bx.psu.edu/
Inspect the workflow_tools.yml file, which contains a tool list in YAML format.
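For a quick overview of which repositories and revisions the file pins, a line-based awk sketch is enough. list_tool_revisions is a hypothetical helper, not an Ephemeris command. It is also not a real YAML parser: it assumes the exact layout workflow-to-tools wrote above, with "- name:", "  owner:" and "  - <revision>" lines.

```shell
#!/bin/sh
# Hypothetical helper: print "owner/name revision" for each pinned revision.
# Line-based sketch, NOT a YAML parser: assumes the layout written by
# workflow-to-tools ("- name:", "  owner:", "  revisions:", "  - <rev>").
list_tool_revisions() {
    awk '
        /^- name:/         { name = $3; in_revs = 0 }
        /^  owner:/        { owner = $2 }
        /^  revisions:/    { in_revs = 1; next }
        in_revs && /^  - / { print owner "/" name " " $2; next }
        /^  [a-z_]+:/      { in_revs = 0 }
    ' "$1"
}

# Usage: list_tool_revisions tools/workflow_tools.yml
```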
Installing Tools
Now that you have extracted a list of tools, let’s install these on your Galaxy instance. In order to accomplish this, you will need:
- The URL of your Galaxy server
- The API key for your account, which must be an admin account
Galaxy admin accounts are specified as a comma-separated list of email addresses in the admin_users option of galaxy.yml. If you have set up your Galaxy server using the Galaxy Installation with Ansible tutorial, this is set to admin@example.org.
- In your browser, open your Galaxy homepage
- Log in, or register a new account, if it’s the first time you’re logging in
- Go to User -> Preferences in the top menu bar, then click on Manage API key
- If there is no current API key available, click on Create a new key to generate it
- Copy your API key somewhere convenient; you will need it throughout this tutorial
There are two ways to install tools, depending on how you specify the tools to install:
Hands-on: Installing a single tool
Use the Ephemeris shed-tools command to install the tool bwa, owned by devteam, into a section named Mapping.
Question: What did your command look like?
Use your Galaxy URL and API key in the example command below:
shed-tools install -g https://galaxy.example.org -a <api-key> --name bwa --owner devteam --section_label Mapping
If your Galaxy instance is served via the HTTPS protocol (as it should be!), Ephemeris uses the requests Python library to encrypt the communication with Galaxy, and requests verifies the server's certificate. Therefore, if your Galaxy uses a self-signed SSL certificate, shed-tools may fail with a CERTIFICATE_VERIFY_FAILED error.
Under Ubuntu, you can allow the use of the unrecognized certificate as follows:
- Get hold of the Certificate Authority (CA) certificate used to sign your Galaxy SSL certificate. For a Galaxy Admin Training course, this is usually the Fake LE Root X1 certificate.
- Copy the CA certificate file into /usr/local/share/ca-certificates/ with a .crt extension.
- Run update-ca-certificates as root.
- Execute export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt, as explained in the requests docs.
Now you should be able to run the shed-tools commands successfully.
This provides an easy way to do a one-off installation of a tool, but is not very convenient if you want to install many tools. For that, you can install from a YAML file:
Hands-on: Installing tools from a tool list
(optional) Watch the installation proceed by running journalctl -f in a separate remote shell.
Use the Ephemeris shed-tools command to install all of the tools from the workflow_tools.yml file on your Galaxy.
Question: What did your command look like?
Use your Galaxy URL and API key in the example command below:
shed-tools install -g https://galaxy.example.org -a <api-key> -t tools/workflow_tools.yml
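The -t option is not limited to lists generated from workflows, and tool lists can also be written by hand. Below is a minimal sketch that assumes the same YAML layout as workflow_tools.yml. The hypothetical write_tool_list function writes a one-entry list equivalent to the earlier single-tool bwa install. It leaves out revisions on the assumption that shed-tools then picks the latest installable revision, just as with the --name/--owner form.

```shell
#!/bin/sh
# Hypothetical helper: write a one-entry tool list equivalent to
#   shed-tools install --name <name> --owner <owner> --section_label <label>
# Assumption: with no "revisions" key, shed-tools installs the latest
# installable revision. Values are written unquoted, so avoid ": " in them.
write_tool_list() {
    local name="$1" owner="$2" section="$3" out="$4"
    cat > "$out" <<EOF
tools:
- name: $name
  owner: $owner
  tool_panel_section_label: $section
  tool_shed_url: https://toolshed.g2.bx.psu.edu
EOF
}

# Usage:
# write_tool_list bwa devteam Mapping tools/bwa.yml
# shed-tools install -g https://galaxy.example.org -a <api-key> -t tools/bwa.yml
```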
Open your Galaxy’s admin interface and check that the tools have been installed.
Using the UI, import the workflow file that you used, mapping.ga:
- Copy the URL of mapping.ga from the wget command above
- On your Galaxy instance, click on Workflow, then Import. Paste the URL into the Import Archived URL field.
Occasionally the tool installation may fail due to network issues; if it does, just re-run the shed-tools installation process until it succeeds. This is a known issue the developers are working on.
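Re-running by hand gets tedious, and a small retry loop can automate it. This is a generic shell sketch, not a feature of Ephemeris. retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command until it succeeds, up to a limit.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
retry() {
    local max="$1" delay="$2"
    local attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (URL and key are placeholders):
# retry 5 30 shed-tools install -g https://galaxy.example.org -a "$GALAXY_API_KEY" -t tools/workflow_tools.yml
```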
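Useful byobu shortcuts for working with splits, for example to keep journalctl -f running in a separate shell:
Shift-F2: Create a horizontal split
Shift-Left/Right/Up/Down: Move focus among splits
Ctrl-F6: Close split in focus
Ctrl-D: (Linux, Mac users) Close split in focus
There are more byobu commands described in this gist.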
Yes. The default tool config (config/tool_conf.xml.sample; copy it to config/tool_conf.xml to modify it) has an option, monitor="true", set in the root <toolbox> tag. This instructs Galaxy to watch the tool files referenced in that config and load or reload them as necessary. It will also load any new tools you add to that config.
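A quick grep can check whether your own tool config has monitoring enabled. toolbox_monitored is a hypothetical helper, not a Galaxy command. It does a plain text match rather than parse XML, so it assumes the opening <toolbox ...> tag sits on one line, as in the sample file.

```shell
#!/bin/sh
# Hypothetical check: does the root <toolbox> tag set monitor="true"?
# Plain text match, NOT an XML parser: assumes the opening tag is on one line.
toolbox_monitored() {
    grep -Eq '<toolbox[^>]*monitor="true"' "$1"
}

# Usage: toolbox_monitored config/tool_conf.xml && echo "monitoring enabled"
```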
Yes. The galaxy_local_tools option for the galaxyproject.galaxy Ansible role can be used to install local tools, or you can manage them in another way that better fits your workflow. UseGalaxy.eu, for example, maintains a repository of tools that are not installed from the ToolShed to aid their local developers. This is deployed to the server using the git module, rather than the Galaxy Ansible role.
Tool Testing
Having the tools installed is a good first step, but your users will expect that they actually work as well. You can use Ephemeris to automatically test all of the installed tools.
Hands-on: Test the installed tools
Use the Ephemeris shed-tools command to test the bamtools_filter tool on your Galaxy.
Question: What did your command look like?
Use your Galaxy URL and API key in the example command below:
shed-tools test -g https://galaxy.example.org -a <api-key> --name bamtools_filter --owner devteam
shed-tools test writes a file, tool_test_output.json, with details of the jobs that have run. Have a look at this file.
This can give you some more confidence that things are working correctly. Oftentimes, users provide workflows for biological domains that we are not familiar with, so as admins we often cannot know how these tools should be tested. Leveraging the built-in tool test cases can reassure you that things are functional before you inform your users of the new tools.
The Ephemeris shed-tools test command produces an output file, tool_test_output.json, with information about the test jobs that have run. The Galaxy project's Planemo can be used to generate test reports from tool_test_output.json in HTML and other formats.
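For a quick pass/fail count without generating a full report, a short script can tally the test statuses. This sketch rests on an assumption about the format that you should check against your own file: a top-level "tests" array whose entries carry a "data" object with a "status" field. summarize_tool_tests is a hypothetical helper, and it uses Python because Ephemeris already requires it.

```shell
#!/bin/sh
# Hypothetical helper: count test statuses in tool_test_output.json.
# ASSUMPTION: top-level "tests" array, each entry with data.status;
# entries missing that field are counted as "unknown".
summarize_tool_tests() {
    python3 - "$1" <<'EOF'
import collections
import json
import sys

with open(sys.argv[1]) as handle:
    report = json.load(handle)

counts = collections.Counter(
    test.get("data", {}).get("status", "unknown") for test in report.get("tests", [])
)
for status, count in sorted(counts.items()):
    print(f"{status}: {count}")
EOF
}

# Usage: summarize_tool_tests tool_test_output.json
```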
Obtaining a Tool List
Sometimes a user might ask you to install all the tools they were previously using on another Galaxy instance. Ephemeris can produce a tool_list.yaml file for all the tools installed on a server.
Hands-on: Obtain UseGalaxy.eu's tool list
Use the Ephemeris get-tool-list command to obtain the full set of tools installed on UseGalaxy.eu.
Question: What did your command look like?
This command does not require authentication and can be used to obtain the tool list from any public Galaxy server:
get-tool-list -g "https://usegalaxy.eu" -o "eu_tool_list.yaml"
Inspect the first few lines of the tool list by running head -n 20 eu_tool_list.yaml.
We will not install all the tools from the EU Galaxy server as that server likely has more tools than any other Galaxy instance, but it is useful as an example of how you can use Ephemeris to facilitate the mirroring of another Galaxy instance.
The output of get-tool-list only includes ToolShed tools, not local non-TS tools.
If you’ve seen the European Galaxy tools view (this is available on any Galaxy! Just access /tools/view), you’ll notice it reports somewhere over 2700 tools; however, the get-tool-list output lists significantly fewer. This is for a few reasons:
- get-tool-list lists installed repositories, and some repositories include multiple tools (e.g. circos has quite a few)
- It only lists ToolShed tools, while EU also includes a number of non-TS tools
- There are a number of tools built into Galaxy (e.g. the Collection Operation tools)
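The repository-versus-tool distinction is visible in the file itself: each repository entry starts with a "- name:" line, and a repository may pin several revisions. The awk sketch below counts both, so the numbers can be compared with the tools view. count_tool_list is a hypothetical helper, and it assumes the same line layout as the tool lists shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: count repository entries and pinned revisions
# in a tool list. Line-based sketch assuming "- name:" starts each entry
# and revisions appear as "  - <rev>" lines under "  revisions:".
count_tool_list() {
    awk '
        /^- name:/         { repos++ }
        /^  revisions:/    { in_revs = 1; next }
        in_revs && /^  - / { revs++; next }
                           { in_revs = 0 }
        END { printf "repositories: %d\nrevisions: %d\n", repos, revs }
    ' "$1"
}

# Usage: count_tool_list eu_tool_list.yaml
```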
Production Best Practices
The servers which are part of the usegalaxy.* network use Ephemeris extensively to manage their large tool sets.
Interestingly, UseGalaxy.eu and UseGalaxy.org.au have different approaches:
- EU maintains YAML files roughly per domain, without a strict ordering. Humans add the tools that should be installed in a given category to a YAML file, and lock files are automatically generated from these, filling in the latest revision where one is missing. They follow a cycle of updating the lock files with the latest available revisions of tools, and then installing any missing revisions from these lock files. They use a Jenkins server to run tool installation automatically every week.
- AU maintains a separate YAML file per tool panel section as a record of all installed tools on the server. They accept tool requests as pull requests and a Jenkins server is notified when a pull request is merged so that the tool can be automatically installed and tested. Like EU, they run automatic updates of installed tools once a week.
- Together, the usegalaxy.* servers are working on a collaborative approach at galaxyproject/usegalaxy-tools, but this is not yet ready for general use.
If running ephemeris directly is not your preference, there is an Ansible role and a sample playbook that can help automate some tasks.
Occasionally, a tool's environment in Galaxy stops working; when this happens, it is usually broken from the start. You can remove the environment on disk, or use the “Manage Dependencies” interface, select the environment, and delete it. Then re-install the dependency through the same Manage Dependencies interface.
Previous experience with this has not been good: a lot of unsafe code ran simultaneously and could destroy config files. With conda, this is not recommended either, as it can corrupt conda environments. One solution is the approach UseGalaxy.eu takes: they keep the full list of tools they want to install, and a CI server (Jenkins) installs them. In this way they can ensure that only a single install process is running at any time.
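If you install from the command line rather than through a CI server, a lock file gives a similar one-at-a-time guarantee on a single host. This is a different and simpler technique than UseGalaxy.eu's Jenkins setup. The sketch uses flock(1) from util-linux, and with_install_lock, the 10-minute wait and the lock file path are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command while holding an exclusive lock, so
# two tool installations never run at the same time on this host.
# Uses flock(1) from util-linux; the lock file path is an arbitrary choice.
with_install_lock() {
    local lock_file="$1"
    shift
    (
        # Wait up to 10 minutes for any other holder of the lock to finish.
        flock -w 600 9 || { echo "could not acquire $lock_file" >&2; exit 1; }
        "$@"
    ) 9>"$lock_file"
}

# Usage:
# with_install_lock /tmp/shed-tools.lock shed-tools install -g https://galaxy.example.org -a "$GALAXY_API_KEY" -t tools/workflow_tools.yml
```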
If a student is running shed-tools on the VM, it should work without certificate issues, because we installed the Fake LE X1 CA certificate there, so from the VM's point of view the certificate chain is valid. We cannot set that certificate on your local machine (and would not recommend doing so). Running an Ephemeris command from your local machine is therefore the first way that comes to mind in which that error could be generated.
While there is a function to accomplish this in BioBlend, it has not been included in Ephemeris yet. If you’re looking for a way to contribute to Galaxy, this would be great :)
There is no one recommended way. Different people like to do different things. Some of us install things through the GUI still, some of us use ephemeris just for the automation.
EU and others try to force all tool installation through Ephemeris to make sure that rebuilding a server in case of disaster is easy. Anything not managed through their files will be lost.
Others take a more mixed approach:
“I typically use Ephemeris when I have to re-deploy a Galaxy instance (and thus install all tools) or during Galaxy maintenance, when I want to update all installed ToolShed tools in one go. Otherwise, when I need to install a specific version of one tool, I use the GUI.”
There is no one right answer.
Sometimes the toolbox will fail to reload. You can fix this by manually triggering a toolbox reload with an API request:
curl -X PUT https://galaxy.example.org/api/configuration/toolbox -H "x-api-key: $GALAXY_API_KEY"
This asks Galaxy to reload the toolbox; afterwards, check whether it has discovered your newly installed tools.
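If you trigger reloads often, the request can be wrapped in a small function. This is a sketch: reload_toolbox is hypothetical, and it assumes GALAXY_URL and GALAXY_API_KEY are exported in your shell. GALAXY_URL is a name chosen here, not something Galaxy reads.

```shell
#!/bin/sh
# Hypothetical wrapper around: PUT <galaxy>/api/configuration/toolbox
# Assumes GALAXY_URL (e.g. https://galaxy.example.org) and GALAXY_API_KEY
# are exported; both names are choices made for this sketch.
reload_toolbox() {
    if [ -z "${GALAXY_URL:-}" ] || [ -z "${GALAXY_API_KEY:-}" ]; then
        echo "set GALAXY_URL and GALAXY_API_KEY first" >&2
        return 1
    fi
    # --fail makes curl exit non-zero on HTTP errors (e.g. 403 for non-admins).
    curl --silent --show-error --fail -X PUT \
        "${GALAXY_URL%/}/api/configuration/toolbox" \
        -H "x-api-key: $GALAXY_API_KEY"
}

# Usage: GALAXY_URL=https://galaxy.example.org reload_toolbox
```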
Comment: Got lost along the way?
If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.
If you’re using git to track your progress, remember to add your changes and commit with a good commit message!