Adding file-sources to Galaxy

Overview
Creative Commons License: CC-BY

Questions:
  • How do I set up an S3 bucket as a file source in Galaxy?

Objectives:
  • Add your S3 bucket to Galaxy

Time estimation: 15 minutes
Published: Jan 20, 2025
Last modification: Jan 20, 2025
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
Revision: 1

This tutorial demonstrates how to add an S3 bucket as a file source in Galaxy. As an example, we will add the public Argo data Amazon S3 bucket. Argo is an international program that observes the interior of the ocean with a fleet of profiling floats drifting in the deep ocean currents (https://argo.ucsd.edu). The program started 20 years ago, and its dataset holds 5 billion in situ ocean observations from 18,000 profiling floats (4,000 active). The Argo GDAC dataset is a collection of 18,000 NetCDF files. It is a major asset for ocean and climate science and a contributor to IOCCP reports.

Agenda

In this tutorial, we will cover:

  1. Find the information you need
  2. Add the S3 bucket
  3. Conclusion

Find the information you need

Hands-on: Find an S3 bucket

Go to the Amazon Sustainability Data Initiative website.

There you can browse the data catalog; searching for Argo takes you directly to the Argo registry page.

On this last page you’ll find all the information you need to add the S3 bucket to Galaxy.

Image of the Argo registry on Amazon.

Add the S3 bucket

Hands-on: Add the bucket to Galaxy
  • If you have not already done so, fork and clone the Galaxy Europe Infrastructure-playbook repo
  • Create a branch on your fork
  • Go to the file file_sources_conf.yml.j2 in templates/galaxy/config/

There you can edit the file and add your S3 bucket by adding an Argo-specific section, as in the following:

- type: s3fs
  label: Argo marine floats data and metadata from Global Data Assembly Centre (Argo GDAC)
  id: argo-gdac-sandbox
  doc: Argo is an international program to observe the interior of the ocean with a fleet of profiling floats drifting in the deep ocean currents (https://argo.ucsd.edu). 
  bucket: argo-gdac-sandbox
  anon: true

Image of the file_sources_conf.yml.j2 file with the Argo bucket added.
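
Since the entry above sets anon: true, it can be worth confirming that the bucket really allows anonymous listing before opening the Pull Request. The following is a minimal sketch, not part of the playbook: it assumes the bucket answers unsigned ListObjectsV2 requests on the standard S3 REST endpoint, and the helper names (build_list_url, parse_keys, list_public_keys) are made up for illustration. Only the bucket name comes from the configuration above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket: str, max_keys: int = 5) -> str:
    """Build an unsigned ListObjectsV2 URL (virtual-hosted style, assumed endpoint)."""
    query = urllib.parse.urlencode({"list-type": 2, "max-keys": max_keys})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_text: str) -> list[str]:
    """Extract the object keys from a ListObjectsV2 XML response body."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{S3_NS}Key")]


def list_public_keys(bucket: str, max_keys: int = 5) -> list[str]:
    """Fetch a few keys without credentials.

    urlopen raises urllib.error.HTTPError (typically 403) if the bucket
    does not permit anonymous listing.
    """
    with urllib.request.urlopen(build_list_url(bucket, max_keys), timeout=30) as response:
        return parse_keys(response.read().decode("utf-8"))


# Example call (needs network access):
#     print(list_public_keys("argo-gdac-sandbox"))
```

If the call returns a handful of object keys, anonymous access works from your machine, which suggests anon: true is the right setting for this bucket.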

Finally, commit your changes and write a nice message for the admin when you open your Pull Request.

Conclusion

You are all set. Once your Pull Request is merged, you will soon be able to see your bucket in Galaxy: open Upload Data, click Choose remote files, and search for your bucket label!
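
Once the change is deployed, the new file source can also be checked from a script instead of the upload dialog. The sketch below is illustrative only. It assumes the Galaxy server exposes the /api/remote_files endpoint with a target parameter, that configured file sources are addressed as gxfiles://<id>, and that you have a personal API key. The server URL, the key placeholder, and the helper names are assumptions, not something taken from the playbook.

```python
import json
import urllib.parse
import urllib.request


def build_remote_files_url(galaxy_url: str, file_source_id: str) -> str:
    """Build the listing URL for a file source (assumed gxfiles:// URI scheme)."""
    query = urllib.parse.urlencode({"target": f"gxfiles://{file_source_id}"})
    return f"{galaxy_url.rstrip('/')}/api/remote_files?{query}"


def list_remote_files(galaxy_url: str, file_source_id: str, api_key: str) -> list:
    """Return the top-level entries of the file source as parsed JSON."""
    request = urllib.request.Request(
        build_remote_files_url(galaxy_url, file_source_id),
        headers={"x-api-key": api_key},  # Galaxy API key header
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example call (needs network access and a valid key; values are placeholders):
#     entries = list_remote_files("https://usegalaxy.eu", "argo-gdac-sandbox", "YOUR_API_KEY")
```

The id passed here is the id field of the entry added to file_sources_conf.yml.j2, not the label shown in the interface.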