Use Singularity containers for running Galaxy jobs

Overview

Questions:
Objectives:
  • Configure your Galaxy to use Singularity and BioContainers for running jobs

Requirements:
Time estimation: 1 hour
Last modification: Jul 2, 2021
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

Overview

In this tutorial you will learn how to configure Galaxy to run jobs using Singularity containers provided by the BioContainers community.

Background

BioContainers is a community-driven project that provides the infrastructure and basic guidelines to create, manage and distribute bioinformatics packages (e.g. conda) and containers (e.g. docker, singularity). BioContainers is based on the popular frameworks Conda, Docker and Singularity.

https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html

Singularity is an alternative to Docker that is much better suited to HPC systems, largely because it needs no root-owned daemon and runs containers as the invoking user.

Singularity is a container platform. It allows you to create and run containers that package up pieces of software in a way that is portable and reproducible.

https://sylabs.io/guides/3.7/user-guide/introduction.html

Agenda

  1. Background
  2. Installing Singularity
  3. Configure Galaxy to use Singularity

Installing Singularity

First, we will install Singularity using Ansible. Most operating systems do not yet ship a package for Singularity, so we use a role that compiles it from source; because Singularity is written in Go, we also install a role that provides the Go toolchain. On CentOS 7/8, Singularity is available through the EPEL repository.

tip CentOS7

If you are using CentOS7, you can skip this hands-on section and instead install the epel-release and singularity system packages in your pre_tasks.
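On CentOS 7 the package route comes down to two yum commands, which you can also run by hand to try it out before adding the equivalent tasks to your pre_tasks. The helper below is a minimal sketch, assuming a CentOS 7 host with root privileges; the function name install_singularity_el7 and the DRY_RUN switch are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: install Singularity from EPEL on a CentOS 7 host.
# Assumes root privileges and yum. Set DRY_RUN=1 to print the commands instead of running them.
install_singularity_el7() {
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # EPEL has to be enabled first, because the singularity package lives there.
    $run yum install -y epel-release || return 1
    $run yum install -y singularity || return 1
}
```

Running it as DRY_RUN=1 install_singularity_el7 prints the two commands without executing them; as root, install_singularity_el7 installs both packages.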

hands_on Hands-on: Installing Singularity with Ansible

  1. In your working directory, add the Singularity role to your requirements.yml file:

    --- a/requirements.yml
    +++ b/requirements.yml
    @@ -12,3 +12,7 @@
       version: 0.3.0
     - src: usegalaxy_eu.certbot
       version: 0.1.5
    +- src: cyverse-ansible.singularity
    +  version: 048c4f178077d05c1e67ae8d9893809aac9ab3b7
    +- src: gantsign.golang
    +  version: 2.6.3
       
    
  2. Install the requirements with ansible-galaxy:

    code-in Input: Bash

    ansible-galaxy install -p roles -r requirements.yml
    
  3. Specify which version of Singularity to install in group_vars/galaxyservers.yml:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -122,3 +122,9 @@ nginx_conf_http:
     nginx_ssl_role: usegalaxy_eu.certbot
     nginx_conf_ssl_certificate: /etc/ssl/certs/fullchain.pem
     nginx_conf_ssl_certificate_key: /etc/ssl/user/privkey-nginx.pem
    +
    +# Golang
    +golang_gopath: '/opt/workspace-go'
    +# Singularity target version
    +singularity_version: "3.7.4"
    +singularity_go_path: "{{ golang_install_dir }}"
       
    
  4. Add the new roles to your galaxy.yml playbook, before the Galaxy server itself. Galaxy depends on Singularity to run jobs, so Singularity must be installed before Galaxy starts.

    --- a/galaxy.yml
    +++ b/galaxy.yml
    @@ -14,6 +14,8 @@
           become: true
           become_user: postgres
         - geerlingguy.pip
    +    - gantsign.golang
    +    - cyverse-ansible.singularity
         - galaxyproject.galaxy
         - role: uchida.miniconda
           become: true
       
    
  5. Run the playbook

    code-in Input: Bash

    ansible-playbook galaxy.yml
    
  6. Singularity should now be installed on your Galaxy server. You can test this by connecting to your server and running the following command:

    code-in Input: Bash

    singularity run docker://hello-world
    

    code-out Output: Bash

    INFO:    Converting OCI blobs to SIF format
    INFO:    Starting build...
    Getting image source signatures
    Copying blob 0e03bdcc26d7 done
    Copying config b23a8f6569 done
    Writing manifest to image destination
    Storing signatures
    2021/01/08 11:25:12  info unpack layer: sha256:0e03bdcc26d7a9a57ef3b6f1bf1a210cff6239bff7c8cac72435984032851689
    INFO:    Creating SIF file...
    WARNING: passwd file doesn't exist in container, not updating
    WARNING: group file doesn't exist in container, not updating
    
    Hello from Docker!
    This message shows that your installation appears to be working correctly.
    ...
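Besides the hello-world test, you can check both halves of the installation from the command line: on your Ansible control machine, that ansible-galaxy placed the two new roles under roles/, and on the server, that the installed Singularity matches singularity_version. This is a minimal sketch; missing_roles and version_matches are hypothetical helper names, and it assumes singularity --version prints a line ending in the bare version number (for example "singularity version 3.7.4"), which may not hold for every release.

```shell
# Hypothetical helper: print each named Ansible role that is missing from a roles directory.
# Usage: missing_roles ROLES_DIR ROLE...
missing_roles() {
    local dir="$1"
    shift
    local role
    for role in "$@"; do
        [ -d "$dir/$role" ] || echo "$role"
    done
}

# Hypothetical helper: succeed if a "singularity --version" line reports the expected version.
# Usage: version_matches "VERSION_LINE" EXPECTED
version_matches() {
    # Keep only the last space-separated field, e.g. "3.7.4".
    local reported="${1##* }"
    [ "$reported" = "$2" ]
}
```

For example, the first command below should print nothing on the control machine, and the second should print OK on the server:

    missing_roles roles cyverse-ansible.singularity gantsign.golang
    version_matches "$(singularity --version)" 3.7.4 && echo OK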
    

Configure Galaxy to use Singularity

Now, we will configure Galaxy to run tools using Singularity containers, which will be automatically fetched from the BioContainers repository.

hands_on Hands-on: Configure Galaxy to use Singularity

  1. In group_vars/galaxyservers.yml, add dependency_resolvers_config_file and containers_resolvers_config_file entries, plus the corresponding galaxy_config_templates entries:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -29,6 +29,8 @@ miniconda_manage_dependencies: false
        
     galaxy_config:
       galaxy:
    +    dependency_resolvers_config_file: "{{ galaxy_config_dir }}/dependency_resolvers_conf.xml"
    +    containers_resolvers_config_file: "{{ galaxy_config_dir }}/container_resolvers_conf.xml"
         brand: "🧬🔬🚀"
         admin_users: admin@example.org
         database_connection: "postgresql:///galaxy?host=/var/run/postgresql"
    @@ -89,6 +91,10 @@ galaxy_config:
     galaxy_config_templates:
       - src: templates/galaxy/config/job_conf.xml.j2
         dest: "{{ galaxy_config.galaxy.job_config_file }}"
    +  - src: templates/galaxy/config/container_resolvers_conf.xml.j2
    +    dest: "{{ galaxy_config.galaxy.containers_resolvers_config_file }}"
    +  - src: templates/galaxy/config/dependency_resolvers_conf.xml
    +    dest: "{{ galaxy_config.galaxy.dependency_resolvers_config_file }}"
        
     # systemd
     galaxy_manage_systemd: yes
       
    
  2. Create the templates/galaxy/config directory if it doesn’t exist:

    code-in Input: Bash

    mkdir -p templates/galaxy/config
    
  3. Create the new file templates/galaxy/config/dependency_resolvers_conf.xml. It enables no dependency resolvers at all (neither the legacy Tool Shed packages nor Galaxy packages), so every tool dependency is resolved through Singularity instead.

    --- /dev/null
    +++ b/templates/galaxy/config/dependency_resolvers_conf.xml
    @@ -0,0 +1,2 @@
    +<dependency_resolvers>
    +</dependency_resolvers>
       
    
  4. Create the new file templates/galaxy/config/container_resolvers_conf.xml.j2. It specifies the order in which Galaxy attempts container resolution.

    --- /dev/null
    +++ b/templates/galaxy/config/container_resolvers_conf.xml.j2
    @@ -0,0 +1,6 @@
    +<containers_resolvers>
    +  <explicit_singularity />
    +  <cached_mulled_singularity cache_directory="{{ galaxy_mutable_data_dir }}/cache/singularity" />
    +  <mulled_singularity auto_install="False" cache_directory="{{ galaxy_mutable_data_dir }}/cache/singularity" />
    +  <build_mulled_singularity auto_install="False" cache_directory="{{ galaxy_mutable_data_dir }}/cache/singularity" />
    +</containers_resolvers>
       
    
  5. Now make Galaxy run jobs using Singularity. Modify templates/galaxy/config/job_conf.xml.j2 to add a singularity destination that sets the singularity_enabled parameter, and make it the default destination:

    --- a/templates/galaxy/config/job_conf.xml.j2
    +++ b/templates/galaxy/config/job_conf.xml.j2
    @@ -2,8 +2,17 @@
         <plugins workers="4">
             <plugin id="local_plugin" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
         </plugins>
    -    <destinations default="local_destination">
    +    <destinations default="singularity">
             <destination id="local_destination" runner="local_plugin"/>
    +        <destination id="singularity" runner="local_plugin">
    +            <param id="singularity_enabled">true</param>
    +            <!-- Ensuring a consistent collation environment is good for reproducibility. -->
    +            <env id="LC_ALL">C</env>
    +            <!-- The cache directory holds the docker containers that get converted. -->
    +            <env id="SINGULARITY_CACHEDIR">/tmp/singularity</env>
    +            <!-- Singularity uses a temporary directory to build the squashfs filesystem. -->
    +            <env id="SINGULARITY_TMPDIR">/tmp</env>
    +        </destination>
         </destinations>
         <tools>
         </tools>
       
    
  6. Re-run the playbook

    code-in Input: Bash

    ansible-playbook galaxy.yml
    
  7. In your Galaxy admin interface, install the minimap2 tool.

    • Login to Galaxy as the admin user
    • Click the “admin” menu at the top
    • Under “Tool Management” on the left select “Install and Uninstall”
    • Search for minimap2 and install the latest version with the Target Section “Mapping”

    Screenshot: the install interface, with minimap2 entered in the search box and the latest revision currently cloning.

  8. Upload the following FASTA file:

    >testing
    GATTACAGATHISISJUSTATESTGATTACA
    
  9. Run the Map with minimap2 tool with the following parameters:

    • “Will you select a reference genome from your history or use a built-in index”: Use a genome from history and build index
    • “Use the following dataset as the reference sequence”: The fasta file you uploaded
    • “Single or Paired-end reads”: Single
      • param-file “Select fastq dataset”: The fasta file you uploaded

    Your job should be executed using Singularity with a BioContainer! You can watch the logs of Galaxy to see this happening.

    code-in Input: Bash

    journalctl -f
    

    code-out Output

    uwsgi[1190010]: galaxy.tool_util.deps.containers INFO 2021-01-08 13:37:30,342 [p:1190010,w:0,m:2] [LocalRunner.work_thread-1] Checking with container resolver [MulledSingularityContainerResolver[namespace=biocontainers]] found description [ContainerDescription[identifier=docker://quay.io/biocontainers/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:e1ea28074233d7265a5dc2111d6e55130dff5653-0,type=singularity]]
    uwsgi[1190010]: galaxy.jobs.command_factory INFO 2021-01-08 13:37:30,418 [p:1190010,w:0,m:2] [LocalRunner.work_thread-1] Built script [/srv/galaxy/jobs/000/23/tool_script.sh] for tool command [minimap2 --version > /srv/galaxy/jobs/000/23/outputs/COMMAND_VERSION 2>&1; ln -f -s '/data/000/dataset_22.dat' reference.fa && minimap2           -t ${GALAXY_SLOTS:-4} reference.fa '/data/000/dataset_22.dat' -a | samtools sort -@${GALAXY_SLOTS:-2} -T "${TMPDIR:-.}" -O BAM -o '/data/000/dataset_23.dat' > '/data/000/dataset_23.dat']
    uwsgi[1190010]: galaxy.jobs.runners DEBUG 2021-01-08 13:37:30,441 [p:1190010,w:0,m:2] [LocalRunner.work_thread-1] (23) command is: mkdir -p working outputs configs
    uwsgi[1190010]: if [ -d _working ]; then
    uwsgi[1190010]:     rm -rf working/ outputs/ configs/; cp -R _working working; cp -R _outputs outputs; cp -R _configs configs
    uwsgi[1190010]: else
    uwsgi[1190010]:     cp -R working _working; cp -R outputs _outputs; cp -R configs _configs
    uwsgi[1190010]: fi
    uwsgi[1190010]: cd working; SINGULARITYENV_GALAXY_SLOTS=$GALAXY_SLOTS SINGULARITYENV_HOME=$HOME SINGULARITYENV__GALAXY_JOB_HOME_DIR=$_GALAXY_JOB_HOME_DIR SINGULARITYENV__GALAXY_JOB_TMP_DIR=$_GALAXY_JOB_TMP_DIR SINGULARITYENV_TMPDIR=$TMPDIR SINGULARITYENV_TMP=$TMP SINGULARITYENV_TEMP=$TEMP singularity -s exec -B /srv/galaxy/server:/srv/galaxy/server:ro -B /srv/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/minimap2/8c6cd2650d1f/minimap2:/srv/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/minimap2/8c6cd2650d1f/minimap2:ro -B /srv/galaxy/jobs/000/23:/srv/galaxy/jobs/000/23 -B /srv/galaxy/jobs/000/23/outputs:/srv/galaxy/jobs/000/23/outputs -B /srv/galaxy/jobs/000/23/configs:/srv/galaxy/jobs/000/23/configs -B /srv/galaxy/jobs/000/23/working:/srv/galaxy/jobs/000/23/working -B /data:/data -B /srv/galaxy/var/tool-data:/srv/galaxy/var/tool-data:ro -B /srv/galaxy/var/tool-data:/srv/galaxy/var/tool-data:ro --home $HOME:$HOME docker://quay.io/biocontainers/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:e1ea28074233d7265a5dc2111d6e55130dff5653-0 /bin/bash /srv/galaxy/jobs/000/23/tool_script.sh > ../outputs/tool_stdout 2> ../outputs/tool_stderr; return_code=$?; cd '/srv/galaxy/jobs/000/23';
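journalctl -f shows every message on the machine. To see just which container each job resolved to, you can filter for the container resolver lines. This is a minimal sketch, assuming the log format shown above (a "found description [ContainerDescription[identifier=...,type=singularity]]" entry); container_ids is a hypothetical name, and the pattern may need adjusting for other Galaxy versions.

```shell
# Hypothetical helper: read Galaxy log lines on stdin and print the container
# identifier from each "ContainerDescription[identifier=...,type=...]" entry.
container_ids() {
    # -n suppresses normal output; the p flag prints only lines where the substitution matched.
    sed -n 's/.*ContainerDescription\[identifier=\([^,]*\),type=.*/\1/p'
}
```

For example:

    journalctl -f | container_ids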
    

comment Manage dependencies menu

You can manually pull one or many containers for tools in the admin menu. Go to the admin menu, click Manage Dependencies and select the Containers tab. This will list all tools, their dependencies and whether containers are already pulled or can be pulled on demand.

When a container has been resolved through Singularity, its table entry looks like this: minimap2 has the requirements minimap2+singularity, the resolved column shows a green checkmark next to “via singularity”, the resolver is mulled_singularity, and the container column shows a path under /srv/galaxy/var/cache/singularity/mulled ending in a long hash.
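Part of the same information, namely which images are already cached, can be checked from a shell on the server by listing the files in the Singularity cache. This is a minimal sketch that assumes the cache location shown in that table entry (/srv/galaxy/var/cache/singularity/mulled, which follows from the cache_directory set in container_resolvers_conf.xml.j2); cached_images is a hypothetical helper name.

```shell
# Hypothetical helper: print "KiB<TAB>file name" for each image file in a cache directory,
# largest first. Defaults to the mulled cache path shown in the Manage Dependencies table.
cached_images() {
    local dir="${1:-/srv/galaxy/var/cache/singularity/mulled}"
    if [ ! -d "$dir" ]; then
        echo "no cache directory: $dir" >&2
        return 1
    fi
    # du -k prints "KiB<TAB>path"; sort -rn puts the largest images first.
    find "$dir" -maxdepth 1 -type f -exec du -k {} + |
        sort -rn |
        while IFS="$(printf '\t')" read -r size path; do
            # Show only the file name, not the full cache path.
            printf '%s\t%s\n' "$size" "${path##*/}"
        done
}
```

Called without an argument it lists the default location; pass another directory if your galaxy_mutable_data_dir differs.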

tip Singularity, Conda, something else?

We often hear: “What would be the best practice, use Conda or Singularity?”

Many of us are moving towards Singularity. Conda environments can resolve differently if they were installed at different times, which is bad for reproducibility. Singularity images, by contrast, are never updated after they are generated, so a given image always provides exactly the same software. In addition, the isolation Singularity provides by default is a major improvement when running less-trustworthy binaries.
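If you rely on that immutability for reproducibility, you can record a checksum for each cached image and later confirm that none of them has changed. This is a minimal sketch using GNU coreutils’ sha256sum, assuming the image files sit directly in the cache directory; record_checksums, verify_checksums and the manifest file are hypothetical.

```shell
# Hypothetical helpers: record and verify SHA-256 checksums of the files in a cache directory.
# Usage: record_checksums CACHE_DIR MANIFEST   and   verify_checksums CACHE_DIR MANIFEST
record_checksums() {
    # The manifest is opened before the cd, so a relative MANIFEST path works.
    # Entries are stored as ./NAME and sorted by file name.
    (cd "$1" && find . -maxdepth 1 -type f -exec sha256sum {} + | sort -k 2) > "$2"
}

verify_checksums() {
    # sha256sum -c exits non-zero if a listed file is missing or its checksum differs.
    (cd "$1" && sha256sum --quiet -c -) < "$2"
}
```

For example, with the cache path used in this tutorial:

    record_checksums /srv/galaxy/var/cache/singularity/mulled ~/singularity-images.sha256
    verify_checksums /srv/galaxy/var/cache/singularity/mulled ~/singularity-images.sha256

Keep the manifest outside the cache directory so it is not hashed along with the images; note that verification only covers images that were listed when the manifest was recorded.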

tip Does Singularity fix issues with Conda dependencies resolution?

Yes and no. The Singularity images are still built from Conda environments, but you are no longer responsible for solving the environment or ensuring that all of the dependencies are installed. The Galaxy project uses a system called “mulling” to combine multiple Conda dependencies into a single environment, and Singularity images are produced for these combined environments as well. That said, Singularity does not fix complex or unresolvable Conda environments, because it essentially packages Conda’s environment into a single image file.
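The job log earlier in this tutorial shows what a mulled image reference looks like: a mulled-v2-<hash> repository under quay.io/biocontainers, with a tag after the colon. If you want to look such an image up in the registry or match it against your cache, the sketch below splits a reference into scheme, registry, repository and tag. It is plain string manipulation, assuming the SCHEME://REGISTRY/REPOSITORY:TAG layout seen in the log; split_image_ref is a hypothetical name, and it does not interpret the hashes.

```shell
# Hypothetical helper: split a container reference of the form SCHEME://REGISTRY/REPOSITORY:TAG.
# Prints four lines: scheme, registry, repository, tag (empty if the reference has no tag).
split_image_ref() {
    local ref="$1"
    local scheme="${ref%%://*}"    # text before "://", e.g. docker
    local rest="${ref#*://}"       # text after "://"
    local registry="${rest%%/*}"   # first path component, e.g. quay.io
    local path="${rest#*/}"        # everything after the registry
    local repo="$path" tag=""
    case "$path" in
        # Split off the tag only when the reference actually has one.
        *:*) repo="${path%:*}"; tag="${path##*:}" ;;
    esac
    printf '%s\n' "$scheme" "$registry" "$repo" "$tag"
}
```

Applied to the minimap2 image from the log, it prints docker, quay.io, biocontainers/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4 and e1ea28074233d7265a5dc2111d6e55130dff5653-0 on separate lines.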

Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

Frequently Asked Questions

Have questions about this tutorial? Check out the FAQ page for the Galaxy Server administration topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.

Feedback

Did you use this material as an instructor? Feel free to give us feedback on how it went.


Citing this Tutorial

  1. Torfinn Nome, Marius van den Beek, Matthias Bernt, Helena Rasche, 2021 Use Singularity containers for running Galaxy jobs (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/admin/tutorials/singularity/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

details BibTeX

@misc{admin-singularity,
author = "Torfinn Nome and Marius van den Beek and Matthias Bernt and Helena Rasche",
title = "Use Singularity containers for running Galaxy jobs (Galaxy Training Materials)",
year = "2021",
month = "07",
day = "02",
url = "\url{https://training.galaxyproject.org/training-material/topics/admin/tutorials/singularity/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                

Congratulations on successfully completing this tutorial!