Galaxy Installation on Kubernetes

Overview

Questions:
• How do I deploy Galaxy on Kubernetes using Helm?

• How can I create a simple replica of usegalaxy.org?

Objectives:
• Have an understanding of how to use Galaxy’s Helm chart

• Be able to use Helm to install different flavors of Galaxy for different purposes

Time estimation: 30 minutes
Level: Intermediate
Last modification: Mar 1, 2022

Galaxy Helm Chart

Overview

Galaxy has a minimal number of required dependencies, which makes its basic installation quick for both users and developers. However, configuring a multi-user production instance is a complex undertaking due to Galaxy’s many interacting and dependent systems, components, and configurations. Software containerization has become the preferred method of addressing deployment challenges across operating environments. Containerization also requires orchestration, so that multiple containers can work together to deliver a complex application. Kubernetes has emerged as the primary container orchestration technology, as it is both container agnostic and widely adopted. Kubernetes lets us manage, scale, and deploy the different pieces of an application in a standardized way, while providing excellent tooling for doing so.

In this tutorial, we’ll take a look at Kubernetes and Helm as tools for deploying containerized Galaxy. The goal of this deployment model is to package the Galaxy community’s best practices for deploying the Galaxy application into a well-defined unit, which reduces the burden of deploying and managing Galaxy. While it is possible to follow this tutorial by simply copying and pasting the supplied commands and still end up with a production-grade Galaxy installation, a basic understanding of container concepts and of the Kubernetes and Helm technologies is desirable.

Some of the goals for deploying and running Galaxy in this mode include:

• Design a mostly stateless model for running Galaxy where processes can be horizontally scaled as needed
• Integrate components from the Galaxy project ecosystem to leverage existing resources
• Provide unified handling of Galaxy configurations
• Minimize customized dependencies
• Minimize the need to build custom components

Prerequisites

We’ll be using the Galaxy Helm chart to install and manage a Galaxy deployment. To use this chart, we’ll need access to a Kubernetes cluster with Helm installed. For development and testing purposes, this can easily be achieved by installing Docker Desktop locally, enabling its built-in Kubernetes, and then installing Helm.

For production deployments, we’ll also need some storage resources for data persistence. This can be done either by defining a storage class or by creating a Persistent Volume and a corresponding Persistent Volume Claim. Once these are created, make a note of the Persistent Volume Claim name to use later.
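As an illustration, the bash sketch below prints a minimal PersistentVolumeClaim manifest that requests storage from an existing storage class and can be piped to kubectl apply. The function name galaxy_pvc_manifest and the claim name, size, and storage class in the usage lines are hypothetical placeholders. ReadWriteMany is an assumption based on several Galaxy pods needing to share the same data directory, so the chosen storage class must support that access mode (for example, an NFS-based provisioner).

```shell
# Print a minimal PersistentVolumeClaim manifest on stdout.
# Arguments (all illustrative): claim name, requested size, storage class.
galaxy_pvc_manifest() {
  local name=$1 size=$2 storage_class=$3
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${storage_class}
  resources:
    requests:
      storage: ${size}
EOF
}

# Usage (hypothetical names): create the claim, then confirm it is Bound
# and note its name for later.
# galaxy_pvc_manifest galaxy-data-pvc 10Gi nfs-provisioner | kubectl apply -f -
# kubectl get pvc galaxy-data-pvc
```

Generating the manifest from a function keeps it reviewable (run it without the pipe to inspect the YAML) before anything is sent to the cluster.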

For the CVMFS-enabled version of the chart (more on this below), it is also necessary to run Kubernetes version 1.13 or newer because we’ll be using the Container Storage Interface (CSI).
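To confirm that a cluster meets the 1.13 minimum before installing the CVMFS chart, the bash sketch below compares version strings numerically, field by field. The function names are hypothetical, and the usage lines assume jq is installed to read the server version from kubectl’s JSON output.

```shell
# Strip a leading "v" and any build suffix from a Kubernetes version string,
# e.g. "v1.21.3" -> "1.21.3" and "1.13.0-eks-abc" -> "1.13.0".
normalize_version() {
  local v=${1#v}
  v=${v%%[-+]*}
  printf '%s\n' "$v"
}

# Succeed if version $1 is greater than or equal to version $2, comparing
# dot-separated numeric fields from left to right (missing fields count as 0).
version_at_least() {
  local -a have want
  IFS=. read -ra have <<< "$1"
  IFS=. read -ra want <<< "$2"
  local i h w
  for ((i = 0; i < ${#want[@]}; i++)); do
    h=${have[i]:-0}
    w=${want[i]:-0}
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0
}

# Usage (assumes jq is installed):
# server=$(kubectl version -o json | jq -r .serverVersion.gitVersion)
# version_at_least "$(normalize_version "$server")" 1.13 && echo "CSI supported"
```

A plain string comparison would wrongly rank 1.9 above 1.13, which is why each field is compared as a number.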

The Galaxy Helm Chart is currently under active development with enhancements continuously trickling in. As a result, there are no regular releases yet and instead we recommend just cloning the GitHub repository with the chart implementation. This will be the easiest method to keep up with chart changes for the time being.

Clone the chart repository from the machine where you would like to deploy Galaxy and change into the chart directory.

git clone https://github.com/galaxyproject/galaxy-helm
cd galaxy-helm/galaxy


Deploying Galaxy

The Galaxy Helm chart packages best-practice solutions for deploying Galaxy into a single package that can be readily deployed as a unit. Behind the scenes, all the supporting services are started and configured into an interoperable system. Specifically, this involves starting a Postgres database service, using Nginx as a web proxy, and running an independently scalable set of Galaxy web and job handler processes. This follows the recommended production deployment setup for Galaxy and leverages Kubernetes features that help with running long-term services (e.g., liveness probes that automatically restart stuck processes).
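One way to see liveness probes at work is the RESTARTS column of kubectl get pods. The bash sketch below filters that output for pods with a non-zero restart count. A restart can also come from a container crash rather than a failed liveness probe, so treat the result as a pointer to kubectl describe pod, whose events show the actual cause. The function name restarted_pods is hypothetical, and the sketch assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name and
# restart count of every pod whose RESTARTS column (4th field) is non-zero.
restarted_pods() {
  awk '$4 ~ /^[0-9]+$/ && $4 > 0 { print $1, $4 }'
}

# Usage:
# kubectl get pods --no-headers | restarted_pods
```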

Deploying the Default Configuration

The default set of values for the Galaxy chart configures only the minimal set of Galaxy options required for the system to operate properly. Which other options to set depends on the environment, and it’s best to refer to the general Galaxy documentation for those; we’ll also take a look at how to make configuration changes in the context of the chart later in this tutorial.

Hands-on: Deploying the Galaxy Helm Chart

1. First, we need to fetch any dependencies for the chart. One of the advantages of using Helm is that we can reuse best-practice deployment methods for other software right out of the box by relying on published charts and integrating them into the Galaxy chart.

helm dependency update

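2. We can now deploy Galaxy via the chart. Before running this command, make sure you are in the chart source code directory (where the values.yaml file resides), and note the trailing dot. Running this command will create a new Helm release (i.e., chart installation) called galaxy. The command uses Helm 2 syntax; Helm 3 removed the --name flag and takes the release name as the first argument instead (helm install galaxy .).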

helm install --name galaxy .

3. It will take a minute or two for the database to be initialized, the necessary containers to be downloaded, and the Galaxy processes to start. Depending on the Kubernetes cluster setup you are using, Galaxy should then be available at the root URI of the given machine. We can always check the status of our release by typing helm status galaxy.
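Rather than re-running a check by hand while the database initializes and images download, a small retry helper can poll until Galaxy responds. This is a minimal bash sketch: the function name wait_for is hypothetical, and the usage line assumes Galaxy is reachable at http://localhost/ (as is typical with Docker Desktop) and that curl is installed.

```shell
# Re-run a command every <interval> seconds until it succeeds, giving up
# after <attempts> tries. Returns 0 on success and 1 if every attempt failed.
wait_for() {
  local attempts=$1 interval=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  return 1
}

# Usage (assumes Galaxy is served at http://localhost/): poll every
# 10 seconds for up to about 5 minutes.
# wait_for 30 10 curl -fsS -o /dev/null http://localhost/ && echo "Galaxy is up"
```

Polling the HTTP endpoint checks what users actually see; kubectl wait --for=condition=Ready pod --all --timeout=300s is an alternative that only checks pod readiness.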

Deploying a CVMFS-enabled Configuration

The Galaxy Helm chart also comes with a more comprehensive set of configuration options that leverage more of the Galaxy project ecosystem. In practice, this means deploying Galaxy with the same toolset as usegalaxy.org right out of the box. It’s important to note that this deployment configuration uses all the same chart components and simply defines more configuration options. Specifically, we attach to the Galaxy CVMFS read-only file system to retrieve the tool configurations, while relying on BioContainers to resolve tool dependencies.

Hands-on: Deploying the CVMFS-enabled Configuration

1. If you are following this tutorial sequentially and already have a release of Galaxy running, let’s delete it (assuming that’s fine and you have no data to keep). More details about the deletion process are available in the Deleting a Deployed Helm Release section. If you’re just experimenting, run helm delete --purge galaxy.

2. The CVMFS variant of the Galaxy chart has an additional dependency on the Galaxy CVMFS chart. We’ll deploy this chart into its own namespace to keep its resources nicely grouped. We’ll also fetch the chart from a packaged chart repository instead of its GitHub repository; these commands assume that repository has already been registered with Helm under the name galaxy using helm repo add.

kubectl create namespace cvmfs
helm repo update
helm install --name cvmfs --namespace cvmfs galaxy/galaxy-cvmfs-csi

3. We can now install the CVMFS-enabled set of values.

helm install --name galaxy galaxy/galaxy

4. Again, it will take a few minutes for Galaxy to start up. This time, most of the waiting is spent fetching the tool definition files from CVMFS into the local cache and loading them into the tool panel. We can check the status of the deployment by running helm status galaxy. We can also watch the boot process by tailing the logs of the relevant container with a command similar to kubectl logs -f galaxy-web-7568c58b94-hjl9w, where the last argument is the name of the desired pod, as printed after the helm install command. Once the boot process has completed, we can access Galaxy at the /galaxy/ URI (note the trailing /; it’s significant).
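Copying the pod name by hand works, but the name changes every time the pod is recreated. The bash sketch below picks the first pod whose name starts with a given prefix from kubectl get pods output. The function name first_pod_with_prefix is hypothetical, and the galaxy-web- prefix is taken from the example pod name in step 4, so adjust it if your release is named differently.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name
# of the first pod whose name starts with the given prefix.
first_pod_with_prefix() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1; exit }'
}

# Usage: follow the logs of the Galaxy web pod.
# kubectl logs -f "$(kubectl get pods --no-headers | first_pod_with_prefix galaxy-web-)"
```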

Deleting a Deployed Helm Release

After we’re done experimenting with an installation of the chart, we can delete all the resources just as easily as we created them. However, that may not always be desirable, so make sure you understand the system you’re working on to avoid unpleasant surprises. Deleting and recreating a Helm release is generally not a problem: the processes simply respawn and everything returns to an operational state. The underlying storage configuration, however, may interfere here, and all the application data could be lost. Whether that happens predominantly depends on how the relevant storage class was configured.

Hands-on: Deleting a Deployed Helm Release

1. Before we delete a deployment, let’s ensure we understand what will happen with the underlying storage used by Galaxy.

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                   STORAGECLASS      REASON   AGE
cvmfs-cache-pv                             1000Mi     RWX            Retain           Bound    cvmfs/cvmfs-cache-pvc                   manual                     31m
pvc-55806281-96c6-11e9-8e96-0251cc6c62f4   1Gi        ROX            Delete           Bound    default/galaxy-cvmfs-gxy-data-pvc       cvmfs-gxy-data             28m
pvc-5580c830-96c6-11e9-8e96-0251cc6c62f4   1Gi        ROX            Delete           Bound    default/galaxy-cvmfs-gxy-main-pvc       cvmfs-gxy-main             28m
pvc-55814757-96c6-11e9-8e96-0251cc6c62f4   10Gi       RWX            Delete           Bound    default/galaxy-galaxy-pvc               nfs-provisioner            28m
pvc-70d4cc48-96be-11e9-8e96-0251cc6c62f4   8Gi        RWO            Delete           Bound    default/data-galaxy-galaxy-postgres-0   nfs-provisioner            84m
pvc-8cb27bc9-9679-11e9-8e96-0251cc6c62f4   100Gi      RWO            Delete           Bound    cloudman/data-nfs-provisioner-0         ebs-provisioner            9h


As we can see in the command output, the storage resources associated with the current deployment have their reclaim policy set to Delete, which means each volume will be deleted once its claim is released and no resources are using it any longer. If this is not the intended behavior, you can change the reclaim policy.

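2. Once we’re satisfied with the state of the resources and are ready to delete a deployment, we can do so with the following commands. These use Helm 2 syntax; Helm 3 removed the --purge flag, so run helm uninstall galaxy and helm uninstall cvmfs --namespace cvmfs instead.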

helm delete --purge galaxy
helm delete --purge cvmfs
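Before running helm delete, it can help to list exactly which volumes would be removed and, where the data matters, switch them to Retain. The bash sketch below does both; the function names are hypothetical, the parser assumes the default kubectl get pv column layout shown in step 1, and the patch command is the one documented by Kubernetes for changing a PersistentVolume’s reclaim policy.

```shell
# Read `kubectl get pv` output on stdin and print the names of volumes whose
# RECLAIM POLICY (4th column) is Delete and whose CLAIM (6th column) starts
# with the given prefix, e.g. "default/". An empty prefix matches every claim.
pvs_with_delete_policy() {
  awk -v prefix="${1:-}" '
    NR > 1 && $4 == "Delete" && (prefix == "" || index($6, prefix) == 1) { print $1 }
  '
}

# Switch a PersistentVolume's reclaim policy to Retain so the volume and its
# data survive deletion of the claim.
retain_pv() {
  kubectl patch pv "$1" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Usage: keep every volume claimed from the default namespace.
# kubectl get pv | pvs_with_delete_policy default/ | while read -r pv; do retain_pv "$pv"; done
```

Listing first and patching second keeps the decision explicit: run the pipeline without the while loop to review the names before changing anything.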


Next Steps

This tutorial covers the basics of getting Galaxy deployed on Kubernetes using Helm. There is a lot more to understanding all the configuration options for the chart and the available deployment models. For more info on some of these topics, take a look at the Galaxy Helm chart repository as well as other tutorials tagged with kubernetes. Also, feel free to reach out on Gitter: https://gitter.im/galaxyproject/FederatedGalaxy.

Key points

• Stock deployment of production Galaxy components on Kubernetes is simple

• Helm chart allows easy configuration changes

Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Galaxy Server administration topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.


Citing this Tutorial

1. Pablo Moreno, Enis Afgan, Nuwan Goonasekera, Alex Mahmoud, Sergey Golitsynskiy, 2022 Galaxy Installation on Kubernetes (Galaxy Training Materials). https://training.galaxyproject.org/archive/2022-05-01/topics/admin/tutorials/k8s-deploying-galaxy/tutorial.html Online; accessed TODAY
2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

BibTeX

@misc{admin-k8s-deploying-galaxy,
author = "Pablo Moreno and Enis Afgan and Nuwan Goonasekera and Alex Mahmoud and Sergey Golitsynskiy",
title = "Galaxy Installation on Kubernetes (Galaxy Training Materials)",
year = "2022",
month = "03",
day = "01",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
doi = {10.1016/j.cels.2018.05.012},
url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
year = 2018,
month = {jun},
publisher = {Elsevier {BV}},
volume = {6},
number = {6},
pages = {752--758.e1},
author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
title = {Community-Driven Data Analysis Training for Biology},
journal = {Cell Systems}
}