Frequently Asked Questions

Tutorial Questions


Additional resources to learn more about proteomic data analysis

To learn more about proteomic data analysis, we suggest you look at:

After sequencing with MinKNOW software, we get many fastq files, do these files need to be combined into one file before uploading or is it possible to upload them all at once?

Question: After sequencing with MinKNOW software, we get many fastq files, do these files need to be combined into one file before uploading or is it possible to upload them all at once?

After sequencing with MinKNOW software, it is a good approach to combine the files from the same run before processing them. You could create a collection per run with all fastq files and then use the collection operation to concatenate all files in a collection.
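Outside Galaxy, the same combining step can be sketched in Python; plain concatenation works because each FASTQ record is a self-contained 4-line block (and concatenating `.gz` files directly is also valid, since gzip streams can be joined). The filenames below are hypothetical:

```python
from pathlib import Path

def concatenate_fastq(inputs, output):
    """Concatenate several FASTQ files into one, in the given order."""
    with open(output, "w") as out:
        for path in inputs:
            out.write(Path(path).read_text())

# Hypothetical example: two small FASTQ files from the same MinKNOW run
Path("run1_a.fastq").write_text("@read1\nACGT\n+\nIIII\n")
Path("run1_b.fastq").write_text("@read2\nTTTT\n+\nIIII\n")
concatenate_fastq(["run1_a.fastq", "run1_b.fastq"], "run1_all.fastq")
# Each FASTQ record is 4 lines, so dividing the line count by 4 gives reads
print(sum(1 for line in open("run1_all.fastq")) // 4)
```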

AnnData Import/ AnnData Manipulate not working?

This is a known issue, please do not use version 0.7.4 of the tool, and use version 0.6.2 instead. The Inspect AnnData tool should work fine however.

Switching tool versions

Are Barcodes always on R1 and Sequence data on R2?

Question: Are Barcodes always on R1 and Sequence data on R2?

No, it really depends on the protocol. In some protocols this convention is swapped, in others the barcodes can be distributed across both reads.

Are these data free to use and download?

Question: Are these data free to use and download?

Yes, the metadata, aligned reads, and other SARS-CoV-2 data that is mentioned in this training are free to download and have no associated egress charges.

Automatically trim adapters (without providing custom sequences)

There are many tools for this: Trimmomatic, Trim Galore, and a few others (search: “Trim”). Some of these offer options to trim adapters automatically, but they are not necessarily specific to the sequences you are working with.

Can EncyclopeDIA be run on a DIA-MS dataset without a spectral library?

Question: Can EncyclopeDIA be run on a DIA-MS dataset without a spectral library?

Yes. The workflow presented in this GTN tutorial is the Standard EncyclopeDIA workflow; however, there is a variation on it, named the WALNUT EncyclopeDIA workflow, in which a spectral library is not required. The WALNUT variation simply omits the DLIB spectral/PROSIT library input, hence requiring just the GPF DIA dataset collection, the Experimental DIA dataset collection, and the FASTA protein database file. The Chromatogram Library is therefore generated using the GPF DIA dataset collection and the FASTA protein database alone. This method does generate fewer searches than if a spectral library is used. The Galaxy-P team tested the efficacy of the WALNUT workflow compared to the Standard EncyclopeDIA workflow; more information on that comparison and its results can be found at this link.

Can I use alternative tools for the Quantification step?

Question: Can I use alternative tools for the Quantification step?

There are some alternatives to Salmon for reference transcriptome-based RNA quantification. Kallisto and Sailfish use a similar approach, known as pseudoalignment.

Can I use these workflows on datasets generated from our laboratory?

Yes, the workflows can be used on other datasets as well. However, you will need to consider data acquisition and sample preparation methods so that the tool parameters can be adjusted accordingly.

Can this ASaiM workflow be used for single-end data?

Question: Can this ASaiM workflow be used for single-end data?

Yes, the inputs just have to be changed to single-end files rather than paired-end ones.

Can we also use this workflow on Illumina raw reads?

Question: Can we also use this workflow on Illumina raw reads?

Yes, but some tools would need to be changed or removed:

  • For the Preprocessing workflow, remove the NanoPlot plotting step and keep only FastQC, MultiQC and Fastp.
  • For the mapping in the SNP-based pathogen detection workflow, Bowtie can be used instead of Minimap2.

Can we polish the assembly with long reads too?

Yes. In this tutorial, we only polish the assembly with the short reads. This may be enough for bacterial genomes. However, for an even better polish (usually), a common approach is to also polish the assembly with the long reads. A typical workflow for this would assemble with long reads, then polish with long reads (x 4 rounds, with Racon), polish with long reads again (x 1 round, with Medaka), then polish with short reads (x2 rounds with Pilon).

Can we use snippy pipeline instead for the phylogenetic analysis?

Question: Can we use snippy pipeline instead for the phylogenetic analysis?

In principle, yes, but we have not tried it yet. Snippy is available in Galaxy.

Can we use the ASaiM-MT workflow on multiple input files at the same time?

Question: Can we use the ASaiM-MT workflow on multiple input files at the same time?

Currently, that is one of its limitations. However, Galaxy offers a workflow-within-a-workflow feature, which can help process multiple files at the same time; the outputs can then be combined into one using the MT2MQ tool.

Changing the heatmap colours

You can change the heatmap color, by expanding the Show advanced options section. There are many options here, including setting the colors.

Could I use a different p-adj value for filtering differentially expressed genes?

Question: Could I use a different p-adj value for filtering differentially expressed genes?

Yes, you can modify this value to perform a more rigorous analysis or to extend the range of genes selected. A higher p-adj cutoff will significantly increase the number of genes selected, at the expense of including possible false positives.
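As a minimal sketch of this trade-off (hypothetical genes and values, not the tutorial data), relaxing the cutoff selects more genes:

```python
# Each row: (gene, log2 fold change, adjusted p-value) -- hypothetical values
results = [
    ("geneA", 2.1, 0.001),
    ("geneB", -1.4, 0.03),
    ("geneC", 0.8, 0.20),
]

def significant(rows, padj_cutoff=0.05):
    """Keep genes whose adjusted p-value is below the cutoff."""
    return [gene for gene, lfc, padj in rows if padj < padj_cutoff]

print(significant(results))        # default cutoff: ['geneA', 'geneB']
print(significant(results, 0.25))  # relaxed cutoff also admits geneC
```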

Defining a Learning Pathway

Hands-on: Defining a Learning Pathway

Learning Pathways are sets of tutorials curated by community experts to form a coherent set of lessons around a topic, building up knowledge step by step.

To define a learning pathway, create a file in the learning-pathways/ folder. An example file is also given in this folder (pathway-example.md). It should look something like this:

---
layout: learning-pathway

title: Title of your pathway
description: |
  Description of the pathway. What will be covered, what are the learning objectives, etc?
  Make this as thorough as possible, 1-2 paragraphs. This appears on the index page that
  lists all the learning paths, and at the top of the pathway page
tags: [some, keywords, here]

cover-image: path/to/image.png # optional cover image, defaults to GTN logo
cover-image-alt: alt text for this image

pathway:
  - section: "Module 1: Title"
    description: |
      description of the module. What will be covered, what should learners expect, etc.
    tutorials:
      - name: galaxy-intro-short
        topic: introduction
      - name: galaxy-intro-101
        topic: introduction

  - section: "Module 2: Title"
    description: |
      description of the tutorial
      will be shown under the section title
    tutorials:
      - name: quality-control
        topic: sequence-analysis
      - name: mapping
        topic: sequence-analysis
      - name: general-introduction
        topic: assembly
      - name: chloroplast-assembly
        topic: assembly
      - name: "My non-GTN session"
        external: true
        link: "https://example.com"
        type: hands_on # or 'slides'

# you can make as many sections as you want, with as many tutorials as you want

---

You can put some extra information here. Markdown syntax can be used. This is shown after the description on the pathway page, but not on the cards on the index page.

And that’s it!

We are happy to receive contributions of learning pathways! Did you teach a workshop around a topic using GTN materials? Capture the program as a learning pathway for others to reuse!

Do I have to run the tools in the order of the tutorial?

Question: Do I have to run the tools in the order of the tutorial?

The tools are presented in the order that a typical analysis would use. If you want to run some tools in parallel (to save time) you can do so. This workflow illustrates the analysis done in the tutorial and shows that there are multiple “paths” leading to outputs that have some steps that could be run at the same time: MultiQC, Kraken2, JBrowse and TB Variant Report.

Do the pipelines work with both isolates and direct from raw meat? or only isolate?

Question: Do the pipelines work with both isolates and direct from raw meat? or only isolate?

The workflow can work with both isolates and raw meat. The workflow is designed to remove host reads before detecting any pathogen, so both isolate and raw meat samples are pre-processed equally before the analysis starts.

Do you have resources to help me get started working in the cloud?

Question: Do you have resources to help me get started working in the cloud?

Yes, we have a number of documents and videos to help you start working with SRA data in the cloud:

Downloading the files from the NCBI server fails or takes too long.

Download the data from Zenodo instead (see overview box at top of tutorial). This method uses Galaxy’s generic data import functionality, and is more reliable and faster than the download from NCBI.

First job I submitted remains grey or running for a long time - is it broken?

Question: First job I submitted remains grey or running for a long time - is it broken?
  • Check with top or your system monitor - if Conda is running, things are working but it’s slow the first time a dependency is installed.
  • The first run generally takes a while to install all the needed dependencies.
  • Subsequent runs should start immediately with all dependencies already in place.
  • Installing new Conda dependencies just takes time so tools that have new Conda packages will take longer to run the first time if they must be installed.
  • In general, a planemo_test job usually takes around a minute - planemo has to build and tear down a new Galaxy for generating test results and then again for testing properly. Longer if the tool has Conda dependencies.
  • The very first test in a fresh appliance may take 6 minutes so be patient.

For preprocessing part with host removal: Where do you find the abbreviations for each host species available (e.g. bos is cow, homo is human..)?

Question: For preprocessing part with host removal: Where do you find the abbreviations for each host species available (e.g. bos is cow, homo is human..)?

The abbreviation (i.e. the genus) is the first word in the list of possible hosts. The names are the scientific names of the species, which would be shown in the taxonomy tree if you looked up the common name (e.g. bovine) on Wikipedia.

From where can I import other genomes?

Question: From where can I import other genomes?

In this tutorial, we used the Kalamari DB with the full list of possible host sequences that can be removed. Reads are either tagged as mapping to one of those species or are left unassigned. If your real-world task cannot be covered by those, you can also try another DB for Kraken2 that includes your species (or perhaps retain the unmapped reads from a read aligner such as Bowtie2, Minimap2…).

How do I know what protocol my data was sequenced with?

Question: How do I know what protocol my data was sequenced with?

If you have 10x data, then you just need to count the length of the R1 reads to guess the Chromium version (see this tutorial). For other types of data, you must know the protocol in advance, and even then you must also know the multiplexing strategy and the list of expected (whitelisted) barcodes. The whitelist may vary from sequencing lab to sequencing lab, so always ask the wetlab people how the FASTQ data was generated.
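For 10x data, the R1-length check can be sketched as below, assuming the common chemistry layouts (Chromium v2: 16 bp cell barcode + 10 bp UMI = 26 bp R1; v3: 16 bp barcode + 12 bp UMI = 28 bp). Treat it as a heuristic, not a guarantee:

```python
def guess_chromium_version(r1_length):
    """Heuristic: infer 10x Chromium chemistry from the R1 read length."""
    layouts = {
        26: "v2 (16 bp barcode + 10 bp UMI)",
        28: "v3 (16 bp barcode + 12 bp UMI)",
    }
    return layouts.get(r1_length, "unknown -- check the protocol")

print(guess_chromium_version(26))
print(guess_chromium_version(28))
print(guess_chromium_version(150))  # likely not a barcode read at all
```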

How does one compare metaproteomics measurements from two experimental conditions?

For comparing taxonomy composition or functional content of two conditions in metaproteomics or metatranscriptomics studies, users are recommended to use metaQuantome. GTN tutorials for metaQuantome are available in the proteomics topic.

How does one convert RAW files to MGF peak lists within Galaxy?

Galaxy has implemented the msconvert tool so that RAW files from Thermo instruments can be converted into MGF or mzML formats.

How many search engines can you use in SearchGUI?

Question: How many search engines can you use in SearchGUI?

SearchGUI has options to use up to 9 search algorithms. However, running all of them at the same time can be time-consuming. According to our initial tests, up to 4 search engines can give you good results.

How to enable the Activity Bar

This FAQ demonstrates how to enable the activity bar within the Galaxy interface

If you do not see the Activity Bar it can be enabled as follows:

  1. Click on the “User” link at the top of the Galaxy interface
  2. Select “Preferences”
  3. Scroll down and click on “Manage Activity Bar”
  4. Toggle the “Enable Activity Bar” switch and voila!

    The four steps described above are shown visually

I cannot run client tests because yarn is not installed.

Question: I cannot run client tests because yarn is not installed.

Make sure you have executed scripts/common_startup.sh and have activated the virtual environment (. .venv/bin/activate) in your current terminal session.

I have FASTQ files from metagenomics or metatranscriptomics datasets. How can I convert them into a protein FASTA file for metaproteomics searches?

Galaxy has a tool named Sixgill that can be used to convert the nucleic acid sequences to ‘metapeptide’ sequences. There are other options available within Galaxy, such as the GalaxyGraph approach and the Metagenome Binning, Assembly and Annotation Workflow. Please contact us if you need assistance.

I have a really large search database, what search strategies do you recommend for searching my mass spectrometry dataset?

Readers are encouraged to use the database sectioning approach described by Praveen Kumar et al and available within Galaxy. Readers are also encouraged to consider other approaches such as MetaNovo (not yet available in Galaxy). In absence of any database or taxonomic information about the microbiome dataset, other methods such as COMPIL 2.0 and De novo search methods can also be considered.

I want to use a collection for outputs but it always passes the test even when the script fails. Why?

Question: I want to use a collection for outputs but it always passes the test even when the script fails. Why?
  • Collections are tricky for generating tests.
    • The contents appear only after the tool has been run and even then may vary with settings.
  • A manual test override is currently the only way to test collections properly.
  • Automation is hard. If you can help, pull requests are welcomed.
  • Until it’s automated, please take a look at the plotter sample.
  • It is recommended that you modify the test override that appears in that sample form. Substitute one or more of the file names you expect to see after the collection is filled by your new tool for the <element.../> entries used in the plotter sample’s tool test.
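For reference, a manual override along those lines might look something like this (a sketch in Galaxy tool-test XML; the collection name `plots` and the element file names are placeholders to replace with the outputs your tool actually writes):

```xml
<test>
  <output_collection name="plots" type="list">
    <!-- one <element> per file you expect in the filled collection -->
    <element name="sample1.png" file="sample1.png" compare="sim_size"/>
    <element name="sample2.png" file="sample2.png" compare="sim_size"/>
  </output_collection>
</test>
```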

In bowtie 2 parameters, in place of 1000 for other experiments, should we mention the median fragment length observed in our library?

Not the median fragment length, but the maximum fragment length you expect. However, you will see that on Illumina sequencers the longer the fragments are, the less efficiently they are sequenced, so pairs with long fragment lengths are not very numerous.

In the MVP platform, is it possible to view the genomic location of all the peptides?

Question: In the MVP platform, is it possible to view the genomic location of all the peptides?

Not really, you can only view the genomic localization of the peptides that were present in the genomic mapping file (output from the first workflow).

Is it possible to replace the existing alignment tools such as HISAT and Freebayes with other tools?

Question: Is it possible to replace the existing alignment tools such as HISAT and Freebayes with other tools?

The tools in this workflow are customizable, however, the user has to ensure that the inputs are in the correct format, while using the same reference genome database.

Is it possible to subsample some samples if you have more reads?

Question: Is it possible to subsample some samples if you have more reads?

Yes, we would recommend processing all reads and subsampling just before the peak calling. You can use the Samtools view tool to sample the BAM file.

Is it possible to use alternative tools to those proposed in the tutorial?

Yes! There are many tools whose functionality are similar (e.g. Illumina reads can be mapped by using HISAT2 instead of Bowtie2).

Is the ToolFactory a complete replacement for manual tool building?

Question: Is the ToolFactory a complete replacement for manual tool building?
  • No, except where all the requirements for the package or script can be satisfied by the limited automated functions of the code generator, or where there is a script containing all the complex logic that might otherwise go into the XML.
  • Many advanced XML features are not available such as output filters.
  • Adding DIY output filters, XML macros and some other advanced features is possible if anyone is sufficiently enthusiastic - some features in the galaxyxml package would be relatively straightforward to add.

Is there a way to filter on the Kalimari database?

Question: Is there a way to filter on the Kalimari database?

To filter the Kalamari database (e.g. keeping only milk bacteria so as to detect spoilage organisms or contaminants, since the Kalamari list contains a lot more than that), you can:

  1. Look at a publication etc. to find a list of bacteria to remove.
  2. Change the regex ^.*Gallus|Homo|Bos.*$ to ^.*Gallus|Homo|Bos|Bacterium1|Bacterium2...|BacteriumN.*$

Milk pathogens are reasonably well known (Salmonella, Escherichia…). It might be easier to retain only reads mapping to pathogens instead.

Isn't it awkward to find so many human sequences there, since we filter for them before?

Question: Isn't it awkward to find so many human sequences there, since we filter for them before?

We often see that Kraken tends to assign many reads to human even though they do not map to the human genome. Due to resemblance between organisms and the limited species coverage of Kraken databases, it sometimes happens that reads corresponding to higher organisms get assigned to human. This was a very severe problem for the standard databases, where yeast genes were mis-assigned to human.

It says I already have an account when registering for ecology.usegalaxy.eu

The ecology.usegalaxy.eu (and any other Galaxy server ending in usegalaxy.eu) is the SAME server as the regular usegalaxy.eu server, just modified for Ecology analyses.

You can use the SAME credentials you used to register on usegalaxy.eu to log into the ecology server.

If you do not have an account on Galaxy EU yet, you will need to create one.

JBrowse is taking a long time to complete?

Question: JBrowse is taking a long time to complete?

Normally this should be done in around 3 minutes. However, it might be busy on the servers, so please be patient and come back to it later.

Most tools seem to have options for assembly using long and short reads, what are the pros and cons of the different tools?

Question: Most tools seem to have options for assembly using long and short reads, what are the pros and cons of the different tools?

In our experience, when both long and short reads are allowed as input, the difference comes down to which set is assembled first. For example, Unicycler assembles the short reads first (which can be good, because they are more accurate), and then scaffolds these into larger contigs using the long reads. Other tools (or workflows) often assemble the long reads first (which can also be good, because these can span repeat regions), then correct this assembly with information from the more accurate short reads. There may also be other variations on long/short read assembly, and/or iterations of these types of steps (assemble, correct). My preference is to assemble long reads first, but that’s because I’m really interested in covering repeat regions. If accuracy is the aim, rather than contig length, the short-reads-first approach may be better. For even more complexity, some tools now allow input of “trusted contigs”, i.e. contigs assembled by other tools. Ryan Wick has a new tool called Trycycler that can take in multiple assemblies to make a consensus (for bacterial genomes).

MultiQC error for your FastQC reports?

Please double-check that:

  1. You selected FastQC tool as the source of the log files in MultiQC.
  2. And you provided the Raw Data of FastQC and not the HTML reports.

My Rscript tool generates a strange R error on STDOUT about an invalid operation on a closure called 'args' ?

Question: My Rscript tool generates a strange R error on STDOUT about an invalid operation on a closure called 'args' ?

Did your code declare the args vector with something like args = commandArgs(trailingOnly=TRUE) before it tried to access args[1] ? See the plotter tool for a sample

My Scanpy FindMarkers step is giving me an empty table

Question: My Scanpy FindMarkers step is giving me an empty table

Try selecting: “Use programme defaults: Yes” and see if that fixes it.

My snippy is running for a very long time. Is this normal?

Question: My snippy is running for a very long time. Is this normal?

As this tutorial uses real-world data, some of the tools can run for quite a while. During a course we can expect longer run times, as the Galaxy servers are heavily used. Typical expected runtimes are approximately:

  • FastQC: 2 minutes
  • MultiQC: 5 minutes
  • Trimmomatic: 5 minutes
  • kraken2: 5 - 12 minutes
  • snippy: 15 - 25 minutes
  • TB Variant Filter: 2 minutes
  • TB-Profiler: 5 minutes
  • Text transformation: less than 1 minute
  • TB Variant Report: 1 minute
  • JBrowse: 5 minutes
  • Samtools stats (optional): 1 minute
  • BAM Coverage plotter (optional): 1 minute

On Scanpy PlotEmbed, the tool is failing

Question: On Scanpy PlotEmbed, the tool is failing

Try selecting “Use raw attributes if present: NO”

On the Scanpy PlotEmbed step, my object doesn’t have Il2ra or Cd8b1 or Cd8a etc.

Question: On the Scanpy PlotEmbed step, my object doesn’t have Il2ra or Cd8b1 or Cd8a etc.

Check your AnnData object - it should be 7874 x 14832, i.e. 7874 cells x 14832 genes. Is it actually only 2000 genes (and therefore missing the above markers)? You may have chosen to remove genes at the Scanpy FindVariableGenes step (last toggle, “Remove genes not marked as highly variable”: select No). Most likely you did this correctly the first time, but later, while investigating how many genes were marked as highly variable, you may have run this tool again and removed the non-variable ones. We’ve updated the text to prevent this more clearly, but you may have been caught out!

Only one Planemo test runs at a time. Why doesn't the server allow more than one at once?

Question: Only one Planemo test runs at a time. Why doesn't the server allow more than one at once?
  • When a new dependency is being installed in the Planemo Conda repository, there is no locking to prevent a second process from overwriting or otherwise interfering with its own independent repository update.
  • The result is not pretty.
  • Allowing two tests to run at once has proven to be unstable so the Appliance is currently limited to one.

Preparing materials for asynchronous learning: CYOA

If you are running a remote training and expect your users to follow a specific path, be certain to include the URL parameter that selects the pathway, to avoid student confusion. Please note that all tutorials using a CYOA should be tagged, which will give you a heads-up as a trainer.

Preparing materials for asynchronous learning: FAQs

When you are running a remote, asynchronous lesson, you’ll want to be sure you collect all student questions and add them back to your tutorial afterwards, as FAQs. This will help other learners as they progress through the materials, and can give you a very easy URL to point your learners to if they get stuck on a particular task.

Preparing materials for asynchronous learning: Self-Study

In the context of remote trainings, where a teacher isn’t synchronously available, ensuring that you have questions throughout your materials for students to check their understanding is incredibly key.

Additionally, ensuring that solutions are provided, and are correct and up-to-date (or use a snippet explaining data variability along with ways to check the results), is mandatory. Students will then use these questions to self-check their understanding against what you expected them to learn.

Preparing materials for asynchronous learning: Tips

The use of snippets is extremely important for asynchronous, remote learning. In this situation as students do not have a teacher immediately on hand, and likely do not have friends or colleagues sitting working with them, they will rely on these boxes to refresh their knowledge and know what to do.

Please ensure you test your learning materials with a learner or colleague not familiar with the material, and, if possible, (silently) watch them go through your lesson. You’ll easily identify which portions need more explanations and details.

Running more than one round of Pilon polishing

Include the most recent polished assembly as input to the next round. You will also need to make a new bam file (here, we have round1.bam and round2.bam).

Round 1

assembly.fasta + illumina reads => BWA MEM => round1.bam
round1.bam + assembly.fasta => pilon => polished.fasta

Round 2

polished.fasta + illumina reads => BWA MEM => round2.bam
round2.bam + polished.fasta => pilon => polished2.fasta

How to know when enough polishing iterations have run?

There is no single answer, but a common approach is to see when Pilon stops making many polishing changes between rounds. So if round 1 made 100 changes and round 2 made only 3, there would likely not be much more polishing to do.

How can I see how many changes Pilon has made?

There are two ways that I know of to see how many changes that Pilon made:

The first is to look at the tool standard output (stdout) from Pilon (instructions).

Somewhere near the top of this log file will be a line that says how many corrections (changes) were made.

The second way is to count the number of lines in the changes file. To do this, use the Line/Word/Character count tool and select the line count option.
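The same check can be scripted. A small sketch (the changes-file lines below are made up, and the 5-change tolerance is an arbitrary assumption) that counts changes per round and flags when polishing looks converged:

```python
def count_changes(changes_text):
    """Pilon writes one change per line in its .changes file."""
    return sum(1 for line in changes_text.splitlines() if line.strip())

def converged(counts, tolerance=5):
    """Consider polishing done once the latest round makes few changes."""
    return bool(counts) and counts[-1] <= tolerance

# Hypothetical two rounds: several corrections, then almost none
round1 = "contig1:100 A T\ncontig1:250 G C\ncontig2:40 . +A\n"
round2 = "contig1:910 C T\n"
counts = [count_changes(round1), count_changes(round2)]
print(counts)             # [3, 1]
print(converged(counts))  # True
```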

TB Variant Report crashes (with an error about KeyError: 'protein')

Question: TB Variant Report crashes (with an error about KeyError: 'protein')

This is a bug present in TB Variant Report (aka tbvcfreport) version 0.1.8 and earlier. In this case it is triggered by the presence of variants in Rv3798. You only see this bug, however, if you forget to run tb_variant_filter (TB Variant Filter). Rv3798 is a suspected transposase and any variants in this gene region would be filtered out by tb_variant_filter, so if you see this crash, make sure you have run the filter step before the TB Variant Report step.

The Build tissue-specific expression dataset tool (step one) exits with an error code.

For the HPA source files version, select HPA normal tissue 23/10/2018 rather than the version from 01/04/2020.

The UMAP Plots errors out sometimes?

Try a different colour palette. For upstream code reasons, the default color palette sometimes causes the tool to error out.

  • Under Plot attributes, do
    • “Colour map to use for continuous variables”: viridis
    • “Colors to use for plotting categorical annotation groups”: plasma

The folder `recipes/belerophon/` and the file `meta.yaml` already exist in bioconda?

Question: The folder `recipes/belerophon/` and the file `meta.yaml` already exist in bioconda?

The recipe has already been added previously. If you want to create the recipe from scratch, you can simply do this in another directory below recipes/.

The input for a tool is not listed in the dropdown

This tutorial uses collections, and some tools will require collections as input (e.g. Taxonomy-to-Krona). To select a collection as input to a tool, click on the param-collection Dataset collection button in front of the input parameter you want to supply the collection to.

UCSC import: what should my file look like?

Question: UCSC import: what should my file look like?

~2020 lines, with the following header line:

bin    name    chrom   strand  txStart txEnd   cdsStart        cdsEnd  exonCount       exonStarts      exonEnds        score   name2   cdsStartStat    cdsEndStat      exonFrames

Where:

  • txStart: Transcript start site
  • cdsStart: CodingSequence start site

Note: UCSC is updated frequently, you might get a slightly different number of lines. If you only get one row in this file, make sure you requested the entire chr22, not just one position.
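As a quick sanity check of the table layout (the data row below is invented, and the elided exonStarts/exonEnds fields are left as placeholders):

```python
# The 16 tab-separated columns of the UCSC header line from above
header = ("bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\t"
          "exonCount\texonStarts\texonEnds\tscore\tname2\t"
          "cdsStartStat\tcdsEndStat\texonFrames")
# Hypothetical data row; '...' stands in for the elided list fields
row = "0\tNM_000000\tchr22\t+\t1000\t9000\t1200\t8800\t3\t...\t...\t0\tGENE1\tcmpl\tcmpl\t..."

columns = header.split("\t")
record = dict(zip(columns, row.split("\t")))
print(record["txStart"], record["cdsStart"])  # transcript start vs coding start
assert record["chrom"] == "chr22"  # the whole of chr22 was requested
```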

What advantages does a Chromatogram Library have over a DDA-generated library or predicted spectral library?

Question: What advantages does a Chromatogram Library have over a DDA-generated library or predicted spectral library?

While generating a Chromatogram Library is the most time-consuming step of the EncyclopeDIA workflow, it is beneficial to DIA data analysis. DIA is a relatively new technique and methods for DIA data analysis are still being developed. One commonly used method is to search DIA data against DDA-generated libraries. However, this method has limitations. Firstly, DDA-generated libraries are not always an accurate representation of DIA data: differences in the methods of data collection play an important role in the efficacy of the library. Secondly, DDA-generated libraries often require labs to run completely separate DDA experiments simply to generate a library with which to analyze their DIA data. Chromatogram Libraries mitigate some of these shortcomings. Firstly, DIA data is incorporated into the generation of the Chromatogram Library, which therefore provides context for the DIA data being analyzed. Secondly, the ELIB format of the Chromatogram Library allows extra data to be included in the analysis of the DIA data, including intensity, m/z ratio, and retention time, compared to the use of a DDA-generated DLIB library. Lastly, a Chromatogram Library can be generated without the use of a spectral library (as mentioned in the previous question). It is therefore possible to forgo DDA data collection, as the DDA-generated DLIB library is not strictly needed for Chromatogram Library generation or for running the EncyclopeDIA workflow (saving time and resources).

What does `^.*Gallus|Homo|Bos.*$` mean?

Question: What does `^.*Gallus|Homo|Bos.*$` mean?

^.*Gallus|Homo|Bos.*$ is a regular expression that matches a string containing the words Gallus OR Homo OR Bos.
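Note that `|` binds more loosely than concatenation, so the pattern is actually read as `^.*Gallus` OR `Homo` OR `Bos.*$`; with search semantics it still matches the same lines as the more explicit grouped form `^.*(Gallus|Homo|Bos).*$`. A quick check in Python-flavoured regex:

```python
import re

pattern = re.compile(r"^.*Gallus|Homo|Bos.*$")
grouped = re.compile(r"^.*(Gallus|Homo|Bos).*$")

lines = ["Homo sapiens", "Gallus gallus", "Bos taurus", "Salmonella enterica"]
for line in lines:
    # both forms agree on which host lines match
    assert bool(pattern.search(line)) == bool(grouped.search(line))
print([l for l in lines if grouped.search(l)])  # the three host species
```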

What file/data formats are defined for I/O in Galaxy?

Question: What file/data formats are defined for I/O in Galaxy?
  • Galaxy Datatypes
  • [galaxy-root]/config/datatypes_conf.xml is read at startup so new datatypes can be defined.

What is Gene Ontology (GO)?

Question: What is Gene Ontology (GO)?

A very commonly used way of specifying these sets is to gather genes/proteins that share the same Gene Ontology (GO) term, as specified by the Gene Ontology Consortium.

The GO project provides an ontology that describes gene products and their relations in three non-overlapping domains of molecular biology, namely “Molecular Function”, “Biological Process”, and “Cellular Component”. Genes/proteins are annotated by one or several GO terms, each composed of a label, a definition and a unique identifier. GO terms are organized within a classification scheme that supports relationships, formalized by a hierarchical structure that forms a directed acyclic graph (DAG). Such a graph uses the notions of child and parent, where a child inherits from one or multiple parents, a child class having a more specific annotation than its parent class (e.g. “glucose metabolic process” inherits from the “hexose metabolic process” parent term, which itself inherits from “monosaccharide metabolic process”, etc.). In this graph, each node corresponds to a GO term comprising genes/proteins that share the same annotation, while directed edges between nodes represent their relation (e.g. ‘is a’, ‘part of’) and their roles in the hierarchy (i.e. parent and child).
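The child/parent structure can be sketched as a tiny DAG in Python (using only the example terms from the paragraph above; real analyses would use the full GO graph):

```python
# child -> parents ("is a" relations); a child may have several parents
parents = {
    "glucose metabolic process": ["hexose metabolic process"],
    "hexose metabolic process": ["monosaccharide metabolic process"],
    "monosaccharide metabolic process": [],
}

def ancestors(term):
    """All terms a GO term inherits from, walking up the DAG."""
    found = set()
    for parent in parents.get(term, []):
        found.add(parent)
        found |= ancestors(parent)
    return found

print(sorted(ancestors("glucose metabolic process")))
```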

Further reading

What is a SNP?

Question: What is a SNP?

SNP (pronounced “snip”) stands for Single Nucleotide Polymorphism. This means a single nucleotide change as compared to the reference genome.

What is the principle of an enrichment analysis?

Question: What is the principle of an enrichment analysis?

The enrichment analysis approach (also called over-representation analysis, ORA) was introduced to test whether pre-specified sets of proteins (e.g. those acting together in a given biological process) change in abundance more systematically than expected by chance. This type of analysis investigates hypotheses that are more directly relevant to biological function, and can also help highlight a process over-represented within a subset of proteins.

Further reading: “Huang DW, Sherman BT, and Lempicki RA (2009) Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37:1–13”

What other methods are available to study the functional state of the microbiome within Galaxy?

Other software such as EggNOG Mapper, MEGAN5, MetaGOmics, MetaProteomeAnalyzer (MPA) and ProPHAnE also generate functional outputs.

What should I do special if on usegalaxy.be?

Note for anyone trying to follow the tutorial on usegalaxy.be:

In step 3 of the hands-on section of setting up the sars-cov-2 analysis bot, when suggested to run

planemo run vcf2lineage.ga vcf2lineage-job.yml --profile planemo-tutorial --history_name "vcf2lineage test"

please use the workflow ID 814dd8d1c056bc54 directly instead of vcf2lineage.ga. This ID points to a public workflow that uses the version of the pangolin tool installed on usegalaxy.be.

What software tools are available to determine taxonomic composition from mass spectrometry data?

Within the Galaxy framework we recommend the use of Unipept software that uses NCBI taxonomy and UniProt databases to detect unique peptides for taxonomy. Other software tools such as MetaTryp 2.0 (PMID: 32897080) can also be used to determine the taxonomic composition of the metaproteomics datasets.

When I get a warning for base per sequence content, what should I do?

This does not necessarily mean that your data is bad. Your protocol or your data might have a bias that you normally expect. First check the following things:

  • Adapter content (maybe some adapters are still in your data)
  • Kmer content/Over represented sequences (this would indicate a contamination or a protocol/sequence bias)
  • Per base quality plot. If the overall quality is not good, then probably the sequencing was poorly performed.
  • Read about your protocol, e.g., ChIP-Seq and ATAC-Seq typically have a nucleotide bias. For example this article about ATAC-Seq.

When I try to run a Selenium test, I get an error

Question: When I try to run a Selenium test, I get an error

If you get the following error:

selenium.common.exceptions.SessionNotCreatedException (...This version of ChromeDriver only supports Chrome version...)

Make sure that the version of your ChromeDriver matches the version of Chrome:

$ chromedriver --version
$ chrome --version

If they are not the same:

  • download the appropriate version of ChromeDriver.
  • unzip the file
  • move the chromedriver file into the appropriate location.
    • On Linux, that could be /usr/bin, $HOME/.local/bin, etc.
    • Use the which command to check the location: $ which chromedriver
  • Make sure the permissions are correct (755).
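The steps above might look like this on Linux (a sketch only: the zip filename and destination directory are examples, so adjust them for your system and ChromeDriver version):

```shell
# Assumes the ChromeDriver zip was downloaded into the current directory
# as chromedriver_linux64.zip (adjust the name for your platform/version).
unzip chromedriver_linux64.zip
mkdir -p "$HOME/.local/bin"
mv chromedriver "$HOME/.local/bin/chromedriver"
chmod 755 "$HOME/.local/bin/chromedriver"   # ensure 755 permissions
which chromedriver                          # verify it is found on your PATH
```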

When will aligned read objects be available for other data types?

Question: When will aligned read objects be available for other data types?

We hope to have these constructed for long read SARS-CoV-2 data in the near future. If there is strong community interest we may expand this offering to other organisms or data types such as metagenome submissions. If you would like this format for other datasets, write to the SRA helpdesk (sra@ncbi.nlm.nih.gov) and let us know!

Where can I find example queries for use in the cloud and elsewhere?

Question: Where can I find example queries for use in the cloud and elsewhere?

We have examples on our website for Athena (link) and BigQuery (link) which can be easily adapted to other environments.

Where can I find the full listing and description of the columns in each metadata table?

Question: Where can I find the full listing and description of the columns in each metadata table?

Table definitions are available here:

Where can I get planemo?

Question: Where can I get planemo?

Please see the installation section. Essentially you can pip install planemo. If you don’t have pip, you need to install it first.

On Windows you’ll need WSL2, and then you can apt-get install python3-pip; the same applies on Ubuntu. On macOS, pip is usually already present.

Where can I read more about Quality Control of data?

Question: Where can I read more about Quality Control of data?

I really like QCFAIL; it has some nice user stories of quality-control issues encountered in real data and experiments.

Which icons are available to use in my tutorial?

To use icons in your tutorial, take the name of the icon, ‘details’ in this example, and write something like this in your tutorial:

{% icon details %}

Some icons have multiple aliases, any may be used, but we’d suggest trying to choose the most semantically appropriate one in case Galaxy later decides to change the icon.

The following icons are currently available:

  • announcement
  • arrow-keys
  • code-in
  • code-out
  • cofest, hall-of-fame, pref-permissions
  • comment
  • congratulations
  • copy, param-files, zenodo_link
  • curriculum, level
  • details, galaxy-info
  • docker_image
  • email
  • exchange, switch-histories
  • external-link, galaxy_instance
  • event, event-date, last_modification
  • event-location
  • event-cost
  • feedback
  • galaxy-advanced-search
  • galaxy-show-active
  • galaxy-barchart, galaxy-visualise, galaxy-visualize
  • galaxy-vis-config, galaxy-viz-config
  • galaxy-bug
  • galaxy-chart-select-data
  • galaxy-clear
  • galaxy-columns, galaxy-multihistory, galaxy-history
  • galaxy-cross
  • galaxy-dataset-map
  • galaxy-delete
  • galaxy-dropdown
  • galaxy-history-options
  • galaxy-eye, solution
  • galaxy-gear, galaxy-wf-options
  • galaxy-history-archive
  • galaxy-history-input
  • galaxy-history-answer
  • galaxy-home
  • galaxy-lab, subdomain
  • galaxy-library, param-collection, topic
  • galaxy-link
  • galaxy-panelview, pref-list
  • galaxy-pencil, hands_on, param-text
  • galaxy-refresh
  • galaxy-undo
  • galaxy-rulebuilder-history
  • galaxy-save
  • galaxy-scratchbook
  • galaxy-selector, param-check
  • galaxy-show-hidden
  • galaxy-star, rating
  • galaxy-tags
  • galaxy-toggle, param-toggle
  • galaxy-upload
  • galaxy-download
  • galaxy-wf-connection
  • galaxy-wf-edit
  • galaxy-wf-new, new-history
  • galaxy-wf-report-download
  • github
  • gitter
  • gtn-theme, pref-palette
  • help, question
  • history-annotate
  • history-share, workflow
  • instances
  • interactive_tour
  • keypoints, pref-apikey
  • language
  • license
  • linkedin
  • notebook
  • objectives
  • orcid
  • param-file
  • param-repeat, pref-notifications
  • param-select, pref-toolboxfilters
  • point-right
  • pref-info
  • pref-password
  • pref-identities
  • pref-dataprivate
  • pref-cloud
  • pref-custombuilds, tool-versions
  • pref-signout
  • pref-delete
  • purl
  • references
  • requirements
  • rss-feed
  • search
  • slides
  • sticky-note
  • time
  • text-document
  • tip
  • tool
  • trophy
  • tutorial
  • twitter
  • warning
  • wf-input
  • workflow-runtime-toggle
  • workflow-run
  • video
  • video-slides
  • version

Which search algorithms are recommended for searching the metaproteomics data?

SearchGUI supports nine search algorithms (X! Tandem, MS-GF+, OMSSA, Comet, Tide, MyriMatch, MS Amanda, DirecTag and Novor). For this tutorial, we have used the first two search algorithms in the list. In our hands, the first four search algorithms have given us the most optimal results.

Which version of SearchGUI and PeptideShaker shall I use for this tutorial?

We highly recommend using SearchGUI (Galaxy version 3.3.10.1) and PeptideShaker (Galaxy version 1.16.36.3). The newer versions of SearchGUI and PeptideShaker have not yet been tested for this workflow.

Why do I need that big (~5GB!) complicated Docker thing - can I just install the ToolFactory into our local galaxy server from the toolshed?

Question: Why do I need that big (~5GB!) complicated Docker thing - can I just install the ToolFactory into our local galaxy server from the toolshed?

You can, but it won’t be very useful. The ToolFactory is a Galaxy tool, but it installs newly generated tools automatically into the local Galaxy server. This is not normally possible, because a tool cannot escape Galaxy’s job execution environment isolation. The ToolFactory needs to write to the normally forbidden server configuration so that the new tool appears in the tool menu, and it installs tools into the TFtools directory, a subdirectory of the Galaxy tools directory. The Appliance is configured so that the ToolFactory and the Planemo test tool use remote procedure calls (RPC, via rpyc) to do what tools normally cannot; the rpyc server runs in a separate container. Without it, tool installation and testing are difficult to do from inside Galaxy tools. Known-good tools can be uploaded from your private appliance to a local toolshed for installation on your own server. Debugging tools on a production server is not a secure standard operating procedure: you just never know what might break. That’s why a disposable desktop appliance is a better choice.

Why do we change the chromosome names in the Ensembl GTF to match the UCSC genome reference?

Question: Why do we change the chromosome names in the Ensembl GTF to match the UCSC genome reference?

UCSC chromosome names begin with the prefix chr, but Ensembl chromosome names do not. For example, chromosome 19 is denoted chr19 in UCSC and 19 in Ensembl. Most tools treat those as different when looking for matches/overlaps, so it is always a good idea to make sure these match before you perform any downstream analysis.
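One common way to do the renaming is a small sed command like the sketch below (ensembl.gtf is a placeholder filename; note that some names, such as the mitochondrial chromosome, MT in Ensembl vs chrM in UCSC, need extra handling):

```shell
# Prepend "chr" to the first column of every non-comment line of an
# Ensembl-style GTF, leaving header/comment lines (starting with #) alone.
sed -e '/^#/! s/^/chr/' ensembl.gtf > ucsc_style.gtf
```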

Why do we have a variant mapping file when it is not being used in the workflow?

Question: Why do we have a variant mapping file when it is not being used in the workflow?

We are working on updating the existing annotation tool to include the variant mapping file. Once that is done, the variant mapping file will also be an input for those tools.

Why do we use FASTQ interlacer and not the FASTQ joiner?

Question: Why do we use FASTQ interlacer and not the FASTQ joiner?

ASaiM-MT uses FASTQ interlacer rather than FASTQ joiner because the joiner concatenates each forward read with its reverse mate into a single sequence, whereas the interlacer keeps the forward and reverse reads as separate, distinguishable records in the same file (plus an additional file of unpaired sequences). This maintains the integrity of the reads while letting us distinguish between the forward and reverse reads.

Why does my assembly graph in Bandage look different to the one pictured in the tutorial?

Question: Why does my assembly graph in Bandage look different to the one pictured in the tutorial?

The assembly process in Flye is heuristic, and the resulting assembly will not necessarily be exactly the same each time, even when running the same data with the same version of Flye. It can also differ between versions of Flye.

To make things more complicated (stop reading now if you would like!)… the chloroplast genome has a structure that includes repeats (the inverted repeats), and, the small-single-copy region of the chloroplast exists in two orientations between these repeats. So, sometimes the assembly will be a perfect circle, sometimes the inverted repeats will be collapsed into one piece, and sometimes the small-single-copy region will be attached ambiguously. To make things even more complicated…the chloroplast genome may even be a dynamic structure, due to flip flop recombination.

For more see this article

Why does the query `SRR11772204 OR SRR11597145 OR SRR11667145` in the Run Selector not return any results?

Question: Why does the query `SRR11772204 OR SRR11597145 OR SRR11667145` in the Run Selector not return any results?

The query for sars-cov-2 in SRA Entrez returns over 250K results, but only the first 20k are sent to the Run Selector. Enter the above query in Entrez directly to find the three runs used for the tutorial and send them to the Run Selector to send to Galaxy.

Why don't the aligned read files have quality scores?

Question: Why don't the aligned read files have quality scores?

Quality scores take up the majority of the space in our compressed sequence files, so removing them makes the files much smaller (~80% or more). In addition, many use cases don’t require per-base quality scores to successfully complete (some pipelines even require fastq format but don’t actually use the quality scores), so these files offer a faster route to completing many analyses. The full quality scores are still available in the original SRA Runs for anyone who requires them, using the SRA Tools available in Galaxy.

Why don't we perform the V-Search dereplication step of ASaiM for metatranscriptomic data?

Question: Why don't we perform the V-Search dereplication step of ASaiM for metatranscriptomic data?

In metatranscriptomics data, duplicated reads are expected, and to keep the integrity of the sample we would like to retain them, along with the reverse reads.

Why is Alevin not working?

Check your tool version; you need to use 1.3.0+galaxy2

Follow these instructions to switch between tool versions.

`docker-compose up` fails with error `/usr/bin/start.sh: line 133: /galaxy/.venv/bin/uwsgi: No such file or directory`

Question: `docker-compose up` fails with error `/usr/bin/start.sh: line 133: /galaxy/.venv/bin/uwsgi: No such file or directory`
  • This is why it’s useful to watch the boot process without detaching.
  • This error can happen if a container has become corrupted on disk after being interrupted; it is cured by a complete cleanup:
  • Make sure no Docker galaxy-server related processes are running: use docker ps to check and stop them manually.
  • Delete the contents of the ..compose/export directory with sudo rm -rf export/* to clean out any corrupted files.
  • Run docker system prune to clear out any old corrupted containers, images or networks, then run docker volume prune in the same way to remove the shared volumes.
  • Run docker-compose pull again to ensure the images are correct.
  • Run docker-compose up to completely rebuild the appliance from scratch. Please be patient.

Analysis


Are UMIs not actually unique?

Not strictly, but unique enough. The distribution of UMIs should ideally be uniform so that the chance of any two identical UMIs capturing the same transcript (via different amplicons) is small. As these barcodes have increased in length, the number of possible UMIs has also increased, allowing UMIs to more or less keep pace with the number of transcripts.

Can RNA-seq techniques be applied to scRNA-seq?

The short answer is ‘no, but yes’. At the beginning this was impossible due to the over-prevalence of dropout events (“zeroes”) in the data complicating the normalisation techniques, but this is not so much of a problem any more with newer methods.

Notebook-based tutorials can give different outputs

Warning: Notebook-based tutorials can give different outputs

The nature of coding pulls the most recent tools to perform tasks. This can - and often does - change the outputs of an analysis. Be prepared, as you are unlikely to get outputs identical to a tutorial if you are running it in a programming environment like a Jupyter Notebook or R-Studio. That’s ok! The outputs should still be pretty close.

Why do we do dimension reduction and then clustering? Why not just cluster on the actual data?

The actual data has tens of thousands of genes, and so tens of thousands of variables to consider. Even after selecting for the most variable genes and the most high quality genes, we can still be left with > 1000 genes. Performing clustering on a dataset with 1000s of variables is possible, but computationally expensive. It is therefore better to perform dimension reduction to reduce the number of variables to a latent representation of these variables. These latent variables are ideally more than 10 but less than 50 to capture the variability in the data to perform clustering upon.

Why do we only consider highly variable genes?

The non-variable genes are likely housekeeping genes, which are expressed everywhere and are not very useful for distinguishing one cell type from another. However, background genes are important to the analysis: they are used to generate a baseline model against which the variability of the other genes is measured.


Community


How can I talk with other users?

feedback To discuss with like-minded scientists, join our Galaxy Training Network chatspace in Slack and discuss with fellow users of Galaxy single cell analysis tools on #single-cell-users

We also post new tutorials / workflows there from time to time, as well as any other news.

point-right If you’d like to contribute ideas, requests or feedback as part of the wider community building single-cell and spatial resources within Galaxy, you can also join our Single cell & sPatial Omics Community of Practice.

tool You can request tools here on our Single Cell and Spatial Omics Community Tool Request Spreadsheet


Deseq2


The tutorial uses the normalised count table for visualisation. What about using VST normalised counts or rlog normalised counts?

Question: The tutorial uses the normalised count table for visualisation. What about using VST normalised counts or rlog normalised counts?

This depends on what you would like to do with the table. The DESeq2 wrapper in Galaxy can output all of these, and there is a nice discussion of this topic in the DESeq2 vignette.


De novo transcriptome reconstruction with rna-seq


I’m using the same training data, tools, and parameters as the tutorial, but I get a different number of transcripts with a significant change in gene expression between the G1E and megakaryocyte cellular states. Why?

This is okay! Many aspects of the tutorial can potentially affect the exact results you obtain. For example, the reference genome version used and versions of tools. It’s less important to get the exact results shown in the tutorial, and more important to understand the concepts so you can apply them to your own data.


Interpretation


What exactly is a ‘Gene profile’?

Think of it like a fingerprint that some cells exhibit and others don’t. It’s a small collection of genes which are up- or down-regulated in relation to one another. Their differences are not absolute, but relative. So if CellA has 100 counts of Gene1 and 50 counts of Gene2, this creates a relation of 2:1 between Gene1 and Gene2. If CellB has 20 counts of Gene1 and 10 counts of Gene2, then they share the same relation. If CellA and CellB share other relations across other genes, then this might be enough to say that they share a gene profile, and they will therefore likely cluster together, as they describe the same cell type.


Resources


Use our Single Cell Lab

Did you know we have a unique Single Cell Lab with all our single cell tools highlighted to make it easier to use on Galaxy? We recommend this site for all your single cell analysis needs, particularly for newer users.

The Single Cell Lab currently uses the main European Galaxy infrastructure and compute power; it’s just organised better for users of particular analyses…like single cell!

Try it out! All your histories/workflows/logins from the general European Galaxy server will be there!


Single-cell rna


Why is amplification more of an issue in scRNA-seq than RNA-seq?

Due to the extremely small amount of starting material, the initial amplification is likely to be uneven: products over-represented in the first amplification cycle are amplified further in later cycles, compounding the bias. In bulk RNA-seq, the much larger pool of RNA molecules to amplify evens out the odds that any one transcript will be amplified more than the others.


Account


Can I create multiple Galaxy accounts?

The account registration form and activation email include a terms of service statement.

  • You ARE NOT allowed to create more than 1 account per Galaxy server.
  • You ARE allowed to have accounts on different servers.

For example, you are allowed to have 1 account on Galaxy US, and another account on Galaxy EU, but never 2 accounts on the same Galaxy.

WARNING: Having multiple accounts is a violation of the terms of service, and may result in deletion of your accounts.


Need more disk space?


Other tips:

  • Forgot your password? You can request a reset link on the login page.
  • If you want to associate your account with a different email address, you can do so under User -> Preferences in the top menu bar.
  • To start over with a new account, delete your existing account(s) first before creating your new account. This can be done in User -> Preferences menu in the top bar.

Changing account email or password

  1. Make sure you are logged in to Galaxy.
  2. Go to User > Preferences in the top menu bar.
  3. To change email and public name, click on Manage Information and to change password, click on Change Password.
  4. Make the changes and click on the Save button at the bottom.
  5. To change email successfully, verify your account by email through the activation link sent by Galaxy.

Note: Don’t open another account if your email changes; update the existing account’s email instead. Creating a new account will be detected as a duplicate and will get your accounts disabled and deleted.

How can I reduce quota usage while still retaining prior work (data, tools, methods)?

  • Download Datasets as individual files or entire Histories as an archive. Then purge them from the public server.
  • Transfer/Move Datasets or Histories to another Galaxy server, including your own Galaxy. Then purge.
  • Copy your most important Datasets into a new/other History (inputs, results), then purge the original full History.
  • Extract a Workflow from the History, then purge it.
  • Back-up your work. It is a best practice to download an archive of your FULL original Histories periodically, even those still in use, as a backup.

Resources Much discussion about all of the above options can be found at the Galaxy Help forum.

How do I create an account on a public Galaxy instance?

  1. To create an account at any public Galaxy instance, choose your server from the available list of Galaxy Platforms.

    There are several UseGalaxy servers:

  2. Click on “Login or Register” in the masthead on the server.

    Login or Register on the top panel

  3. On the login page, find the Register here link and click on it.

  4. Fill in the registration form, then click on Create.

    Your account should now be created, but it will remain inactive until you verify the email address you provided in the registration form.

    Banner warning about account with unverified email address

  5. Check for a Confirmation Email in the email you used for account creation.

    Missing? Check your Trash and Spam folders.

  6. Click on the Email confirmation link to fully activate your account.

    galaxy-info Is delivery of the confirmation email blocked by your email provider, or did you mistype the email address in the registration form?

    Please do not register again, but follow the instructions to change the email address registered with your account! The confirmation email will be resent to your new address once you have changed it.

    Trouble logging in later? Account email addresses and public names are caSe-sensiTive. Check your activation email for the exact format you registered with.

How to update account preferences?

  1. Log in to Galaxy
  2. Navigate to User -> Preferences on the top menu bar
  3. Here you can update various preferences, such as:
    • pref-info Manage Information (change your registered email addresses or public name)
    • pref-password Change Password (change your login credentials)
    • pref-permissions Set Dataset Permissions for New Histories (grant others default access to newly created histories)
    • pref-toolboxfilters Manage Toolbox Filters (customize your Toolbox by displaying or omitting sets of Tools)
    • pref-apikey Manage API Key (access your current API key or create a new one)
    • pref-notifications Manage Notifications (allow push and tab notifications on job completion)
    • pref-cloud Manage Cloud Authorization (grants Galaxy to access your cloud-based resources)
    • pref-identities Manage Third-Party Identities (connect or disconnect access to your third-party identities)
    • pref-custombuilds Manage Custom Builds (custom databases based on fasta datasets)
    • pref-list Manage Activity Bar (a bonus navigation bar)
    • pref-palette Pick a Color Theme (interface color theme)
    • pref-dataprivate Make All Data Private (disable all data sharing)
    • pref-delete Delete Account (on this Galaxy server)
    • pref-signout Sign out of Galaxy (signs you out of all sessions)

Analysis


Adding a custom database/build (dbkey)

Galaxy may have several reference genomes built-in, but you can also create your own.
  • Navigate to the History that contains your fasta for the reference genome
  • Standardize the fasta format
  • In the top menu bar, go to User -> Preferences -> Manage Custom Builds
  • Create a unique Name for your reference build
  • Create a unique Database (dbkey) for your reference build
  • Under Definition, select the option FASTA-file from history
  • Under FASTA-file, select your fasta file
  • Click the Save button
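As a hedged example of what standardizing a fasta can mean in practice (raw.fa is a placeholder filename; your data may need different cleanup), this trims each header to its first word and uppercases the sequence lines:

```shell
# Keep only the first word of each ">" header line and uppercase the
# sequence lines; everything else passes through unchanged.
awk '/^>/ {print $1; next} {print toupper($0)}' raw.fa > clean.fa
```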

Beware of Cuts

Galaxy has several different cut tools
Warning: Beware of Cuts

The section below uses the Cut tool. There are two cut tools in Galaxy for historical reasons. This example uses the tool with the full name Cut columns from a table (cut). However, the same logic applies to the other tool; it simply has a slightly different interface.

Does MaxQuant in Galaxy support TMT, iTRAQ, etc.?

Question: Does MaxQuant in Galaxy support TMT, iTRAQ, etc.?

Yes: iTRAQ 4- and 8-plex; TMT 2-, 6-, 8-, 10- and 11-plex; iodoTMT 6-plex.

Extended Help for Differential Expression Analysis Tools

The error and usage help in this FAQ applies to most if not all Bioconductor tools.

  • DESeq2
  • Limma
  • edgeR
  • goseq
  • Diffbind
  • StringTie
  • Featurecounts
  • HTSeq-count
  • HTseq-clip
  • Kallisto
  • Salmon
  • Sailfish
  • DEXSeq
  • DEXSeq-count
  • IsoformSwitchAnalyzeR

galaxy-info Review your error messages and you’ll find some clues about what may be going wrong and what needs to be adjusted in your rerun. If you are getting a message from R, that usually means the underlying tool could not read in or understand your inputs. This can be a labeling problem (what was typed on the form) or a content problem (data within the files).

Expect odd errors or content problems if any of the usage requirements below are not met.

General

  • Are your reference genome, reference transcriptome, and reference annotation all based on the same genome assembly?
    • Check the identifiers in all inputs and adjust as needed.
    • These all may mean the same thing to a person but not to a computer or tool: chr1, Chr1, 1, chr1.1
  • Differential expression tools all require sample count replicates. Rationale from two of the DESeq2 tool authors.
    • At least two factor levels/groups/conditions with two samples each.
    • All must contain unique content for valid scientific results.
  • Factor/Factor level names should only contain alphanumeric characters and optionally underscores.
    • Avoid starting these with a number and do not include spaces.
    • Galaxy may be able to normalize these values for you, but if you are getting an error: standardize the format yourself.
  • DEXSeq additionally requires that the first Condition is labeled as Condition.
  • If your count inputs have a header, the option Files have header? is set to Yes. If no headers, set to No.
    • If your files have more than one header line: keep the sample header line, remove all extra line(s).
  • Make sure that tool form settings match your annotation content or the tool cannot match up the inputs!
    • If you are counting by gene_id, your annotation should contain gene_id attributes (9th column)
    • If you are summarizing by exon, your annotation should contain exon features (3rd column)
  • Sometimes these tools do not understand transcript_id.N and gene_id.N notation (where N is a version number).
    • This notation could be in fasta or tabular inputs.
    • Try removing .N from all inputs, and check for the accidental creation of new duplicates!
  • Errors? Understanding the job log messages can be confusing, but they are accessible and worth reviewing.
    • The good news is that usage in Galaxy produces the same error messages as direct usage.
    • This means that a search at the Bioconductor Support website can provide useful clues! Come back to the Galaxy Help forum with any remaining questions.

tip Remember, for any value in your inputs that is not a number, using only alphanumeric characters and optionally underscores _ with no spaces is what the authors recommend. Check your factor names, sample names, gene identifiers, transcript identifiers, and header lines in files.

Reference genome (fasta)

  • Can be a server reference genome (hosted index in the pull down menu) or a custom reference genome (fasta from the history).
  • Custom reference genomes must be formatted correctly.
  • If you are using Salmon or Kallisto, you probably don’t need a reference genome but a reference transcriptome instead!
  • More about understanding and working with large fasta datasets.

Reference transcriptome (fasta)

  • Fasta file containing assembled transcripts.
  • Unassembled short or long reads will not work as a substitute.
  • The transcript identifiers on the >seq fasta lines must exactly match the transcript_id values in your annotation or tabular mapping file.

Reference annotation (tabular, GTF, GFF3)

  • Reference annotation in GTF format works best.
  • If a GTF dataset is not available for your genome, a two-column tabular dataset containing transcript <tab> gene can be used instead with most of these tools.
  • HTSeq-count requires GTF attributes; featureCounts is an alternative tool choice.
  • Sometimes the tool gffread is used to transform GFF3 data to GTF.
  • DO use UCSC’s reference annotation (GTF) and reference transcriptome (fasta) data from their Downloads area.
    • These are a match for the UCSC genomes indexed at public Galaxy servers.
    • Links can be directly copy/pasted into the Upload tool.
    • Allow Galaxy to autodetect the datatype to produce an uncompressed dataset in your history ready to use with tools.
  • Avoid GTF data from the UCSC Table Browser: it causes scientific problems, because these GTFs have the same value populated for both the transcript_id and gene_id attributes. See the note at UCSC for more about why.
  • Still have problems? Try removing all GTF header lines with the tool Remove beginning of a file.
  • More about understanding and working with GTF/GFF/GFF3 reference annotation
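When only a GTF is at hand and a tool wants the two-column transcript <tab> gene mapping mentioned above, a rough awk sketch is shown below. It assumes the conventional gene_id "..."; transcript_id "..."; attribute layout in column 9 and the presence of transcript feature lines; verify both for your annotation:

```shell
# Build a "transcript <tab> gene" mapping file from a GTF.
awk -F'\t' '$3 == "transcript" {
    match($9, /gene_id "[^"]+"/);       g = substr($9, RSTART + 9,  RLENGTH - 10)
    match($9, /transcript_id "[^"]+"/); t = substr($9, RSTART + 15, RLENGTH - 16)
    print t "\t" g
}' annotation.gtf | sort -u > tx2gene.tsv
```

If the GTF lacks transcript lines, matching on exon features instead and de-duplicating with sort -u yields the same mapping.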

For the “quantitation method” what is the default if I just leave it as “None”? Label free?

Question: For the “quantitation method” what is the default if I just leave it as “None”? Label free?

It will report raw, non-normalized intensity values, i.e. values that have not been normalized the way e.g. the LFQ intensities are.

How can I adapt this tutorial to my own data?

Question: How can I adapt this tutorial to my own data?

If you would like to run this analysis on your own data, make sure to check which V-region was sequenced. In this tutorial, we sequenced the V4 region, and used a corresponding reference for just this region. If you sequenced another V-region, please use an appropriate reference (either the full SILVA reference, or the SILVA reference specific for your region). Similarly, the Screen.seqs step after the alignment filtered on start and end coordinates of the alignments. These will have to be adjusted to your V-region.

How can I do analysis X? - Getting help

If you don’t know how to perform a certain analysis, you can ask the Galaxy community for help.

Where to ask

The best places to ask your analysis questions are:

Note: For questions about errors you’ve encountered in Galaxy, please see our troubleshooting page.

How to ask

The more detail you provide, the better we can help you. Please provide information about:

  • Your data and experiment e.g. “paired-end RNASeq, mouse, 16 triplicates, 2 timepoints”, etc
  • Your goal and research question e.g. “I want to detect differentially expressed genes between these two groups and generate a volcano plot”
  • What you have already tried. Do you already know which tools you want to use? Did you already try some but they didn’t work? Why not? Did you find good papers describing something similar to what you want to do? etc.
  • Which Galaxy are you using? And if you have already tried some steps, please share your Galaxy history via URL and provide this along with your question.
  • Examples
    • Bad Question: “Help!!! How to perform metagenomics analysis. I need it urgent!”
    • Good Question: “Hello everybody, I have 16S rRNA sequencing data from Illumina, it was paired-end with 150bp reads. I want to perform a taxonomy analysis similar to this paper (provide link). I have followed this GTN tutorial (provide link), but my data is different because (reason) . How can I adapt this step of the analysis for my data? I read about a tool called X, but I cannot find it in Galaxy. I am using Galaxy EU, and here is a link to my history. Any help would be greatly appreciated!”

Before you ask

  • Check the Galaxy Help forum to see if others have already asked a similar question before.
  • Search the GTN website for a tutorial that matches what you want to do, and work your way through that. Even if it doesn’t do exactly what you need, you usually learn a lot along the way that will help you adapt it to your own data or research question.

Be patient

Please remember that most of the people answering questions on Matrix chat and the help forum are volunteers from the community. They take time out of their busy days to help you. They may also be in a different time zone, so it may take some time to get answers. Please always be patient and kind to each other, and adhere to our code of conduct.

How many proteins can be identified and quantified in shotgun proteomics?

Question: How many proteins can be identified and quantified in shotgun proteomics?

This depends on the sample, the technique(s) used, and the mass spectrometer. Routinely, most labs obtain around 4,000 proteins, but with more effort 10,000 proteins can be analyzed in a single run.

I got slightly different numbers than were in the tutorial

This tutorial uses UCSC, which is constantly updating its data! As a result, the tutorial gets outdated quickly, before we can update it :( But it’s OK! Getting different numbers here is expected.

If you use a mqpar file, can you include modifications that are not in the Galaxy version? For instance, propionamide (Cys alkylation by acrylamide).

Question: If you use a mqpar file, can you include modifications that are not in the Galaxy version? For instance, propionamide (Cys alkylation by acrylamide).

No, one is limited to the modifications that are installed in MaxQuant. The mqpar file simply exposes more parameters/options than the Galaxy GUI. Note: the mqpar file must come from the same version as the MaxQuant release being used!

Including custom modifications into MaxQuant in Galaxy?

Comment: Including custom modifications into MaxQuant in Galaxy?

Unfortunately the inclusion of custom modifications is not possible for the user, because it requires profound changes in the underlying code. Please let us know which modification you need by creating a new issue at https://github.com/galaxyproteomics/tools-galaxyp/issues entitled MaxQuant new modification request.

MSStats: what does ‘compare groups = yes’ mean? And the comparison matrix to define the contrast between the 2 groups?

Question: MSStats: what does ‘compare groups = yes’ mean? And the comparison matrix to define the contrast between the 2 groups?

MSstats consists of three parts:

  • Reading the input files and converting them into an MSstats compatible format, doing some processing of the data at the same time
  • Data processing: such as protein inference (summary), log2 transformation, normalization and missing value imputation
  • compare groups = yes means that the third step is performed, which is the statistical analysis: statistical modelling to find differentially abundant proteins between different groups. The groups should be specified as “condition” in the annotation file, and the group comparison matrix file specifies which groups to compare against each other. In the example this is quite simple because there are only 2 groups; with 3 or more groups the comparison matrix could become more complex.
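Purely as an illustration of the idea (the condition names, separator, and column layout here are assumptions; consult the MSstats tool help for the exact format your version expects), a two-group comparison matrix could look like:

```shell
# Hypothetical comparison matrix: one comparison row with coefficient 1 for
# the first condition and -1 for the second, so a positive log2 fold change
# would mean "higher in healthy".
printf 'Comparison\thealthy\tdiseased\nhealthy-diseased\t1\t-1\n' > comparison_matrix.tsv
cat comparison_matrix.tsv
```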

My jobs aren't running!

  1. Please make sure you are logged in. At the top menu bar, you should see a section labeled “User”. If you see “Login/Register” here you are not logged in.

  2. Activate your account. If you have recently registered your account, you may first have to activate it. You will receive an e-mail with an activation link.
    • Make sure to check your spam folder!
  3. Be patient. Galaxy is a free service; when a lot of people are using it, you may have to wait longer than usual (especially for ‘big’ jobs, e.g. alignments).

  4. Contact Support. If you really think something is wrong with the server, you can ask for support

Pick the right Concatenate tool

Most Galaxy servers will have two Concatenate tools installed - know which one to pick!

On most Galaxy servers you will find two Concatenate datasets tools installed:

  1. Concatenate datasets tail-to-head
  2. Concatenate datasets tail-to-head (cat)

The two tools have nearly identical interfaces, but behave differently in certain situations, specifically:

  • The second tool, the one with “(cat)” in its name, simply concatenates everything you give to it into a single output dataset.

    Whether you give it multiple datasets or a collection as the first parameter, or some datasets as the first and some others as the second parameter, it will always concatenate them all. In fact, the only reason for having multiple parameters for this tool is that by providing inputs through multiple parameters, you can make sure they are concatenated in the order you pass them in.

  • The first tool, on the other hand, will only ever concatenate inputs provided through different parameters.

    This tool allows you to specify an arbitrary number of param-file single datasets, but if you also want to use param-files multiple datasets or param-collection a collection for some of the Dataset parameters, then all of these need to be of the same type (multiple datasets or collections) and have the same number of inputs.

    Now depending on the inputs, one of the following behaviors will occur:

    • If all the different inputs are param-file single datasets, the tool will concatenate them all and produce a single output dataset.
    • If all the different inputs are specified either as param-files multiple datasets or as param-collection, and all have the same number of datasets, then the tool will concatenate the first datasets of each input parameter, the second datasets of each input parameter, the third, etc., and produce an output collection with as many elements as there are inputs per Dataset parameter.
    • In extension of the above, if some additional inputs are provided as param-file single datasets, the content of these will be recycled and be reused in the concatenation of all the nth elements of the other parameters.
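The second tool's behavior is essentially that of the Unix cat command it is named after:

```shell
printf 'first\n'  > one.txt
printf 'second\n' > two.txt
# Everything ends up in a single output, in the order the inputs were given:
cat one.txt two.txt
# prints:
# first
# second
```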

Reporting usage problems, security issues, and bugs

  • For reporting Usage Problems, related to tools and functions, head to the Galaxy Help site.
    • Red Error Datasets:
    • Unexpected results in Green Success Dataset:
      • To resolve it you may be asked to send in a shared history link and possibly a shared workflow link. For sharing your history, refer to these instructions.
      • To reach our support team, visit Support FAQs.
    • Functionality problems:
      • Using Galaxy Help is the best way to get help in most cases.
      • If the problem is more complex, email a description of the problem and how to reproduce it.
    • Administrative problems:
      • If the problem is present in your own Galaxy, the administrative configuration may be a factor.
      • For the fastest help directly from the development community, admin issues can be alternatively reported to the mailing list or the GalaxyProject Gitter channel.
  • For Security Issues, do not report them via GitHub. Kindly disclose these as explained in this document.
  • For Bug Reporting, create a Github issue. Include the steps mentioned in these instructions.
  • Use GTN Search to find prior Q & A, FAQs, tutorials, and other documentation across all Galaxy resources, to check whether someone has already faced your issue.

Results may vary

Comment: Results may vary

Your results may be slightly different from the ones presented in this tutorial due to differing versions of tools, reference data, external databases, or because of stochastic processes in the algorithms.

Troubleshooting errors

When you get a red dataset in your history, it means something went wrong. But how can you find out what it was? And how can you report errors?

When something goes wrong in Galaxy, there are a number of things you can do to find out what it was. Error messages can help you figure out whether it was a problem with one of the settings of the tool, or with the input data, or maybe there is a bug in the tool itself and the problem should be reported. Below are the steps you can follow to troubleshoot your Galaxy errors.

  1. Expand the red history dataset by clicking on it.
    • Sometimes you can already see an error message here
  2. View the error message by clicking on the bug icon galaxy-bug

  3. Check the logs. Output (stdout) and error logs (stderr) of the tool are available:
    • Expand the history item
    • Click on the details icon
    • Scroll down to the Job Information section to view the 2 logs:
      • Tool Standard Output
      • Tool Standard Error
    • For more information about specific tool errors, please see the Troubleshooting section
  4. Submit a bug report! If you are still unsure what the problem is.
    • Click on the bug icon galaxy-bug
    • Write down any information you think might help solve the problem
      • See this FAQ on how to write good bug reports
    • Click galaxy-bug Report button
  5. Ask for help!

What does it mean to normalize the LFQ intensities?

Question: What does it mean to normalize the LFQ intensities?

Median normalization typically refers to subtracting the median of all intensities within one sample from each of the intensities (e.g. intensity of Protein A minus the median of all intensities from Sample 1), to account for measurement variations. Before normalization, log2 transformation is required, since many statistical tests demand that the data is normally distributed. (Non-log intensities show very high values but have a minimum (limit of quantification), leading to a somewhat right-skewed distribution; after log transformation the intensity distribution is closer to a Gaussian distribution.) Besides median (or median-polish) normalization there are also other methods, e.g. quantile normalization.
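A toy command-line sketch of the idea, using made-up intensity values for a single sample (real analyses would rely on MSstats or similar; this only illustrates the arithmetic):

```shell
# Made-up raw intensities for one sample:
printf '100\n200\n400\n' > intensities.txt

# 1) log2-transform each value:
awk '{ printf "%.6f\n", log($1) / log(2) }' intensities.txt > log2.txt

# 2) compute the median of the log2 values:
median=$(sort -n log2.txt | awk '{ a[NR] = $1 }
    END { if (NR % 2) print a[(NR + 1) / 2]; else print (a[NR / 2] + a[NR / 2 + 1]) / 2 }')

# 3) subtract the median from every value:
awk -v m="$median" '{ printf "%.3f\n", $1 - m }' log2.txt
# prints: -1.000, 0.000, 1.000
```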

What is the advantage of breaking down protein to peptides before mass spec?

Question: What is the advantage of breaking down protein to peptides before mass spec?

Mass spectrometry works better for peptides: LC separation and ionization work better on peptides than on proteins, and proteins generate mass spectra that are too complex and overlapping due to their isotopes; in addition, a protein’s mass might be shifted due to post-translational modifications or point mutations.

When can you use (or cannot use) Match between runs in MaxQuant?

Question: When can you use (or cannot use) Match between runs in MaxQuant?

There is no golden rule here. For quantitative comparison of different sample groups it can be valuable to use MBR to increase the number of identified and quantified proteins in all samples, and thus have more proteins that occur in most of the samples available for comparison.

Which isobaric labeled quantification methods does MaxQuant in Galaxy support?

Question: Which isobaric labeled quantification methods does MaxQuant in Galaxy support?

The current MaxQuant version supports: iTRAQ 4 and 8 plex; TMT 2, 6, 8, 10, and 11 plex; iodoTMT 6 plex. Inclusion of TMT 16 plex is in preparation.

Will my jobs keep running?

Galaxy is a fantastic system, but some users find themselves wondering:

Will my jobs keep running once I’ve closed the tab? Do I need to keep my browser open?

No, you don’t! You can safely:

  1. Start jobs
  2. Shut down your computer

and your jobs will keep running in the background! Whenever you next visit Galaxy, you can check if your jobs are still running or completed.

However, this is not true for uploading data from your computer: you must wait for such uploads to finish. (Uploading via URL is not affected by this; if you’re uploading from a URL you can close your computer.)


Ansible


Debugging Memory Leaks

memray is a great memory profiler for debugging memory issues.

In the context of Galaxy, this is significantly easier to do for job handlers. Install it in your virtualenv and run:

memray run --trace-python-allocators -o the_dump <your_handler_startup_command_here>

Once you’ve collected enough data,

memray flamegraph --leaks --temporal the_dump -o the_dump.html

would then produce a report showing allocations made but not freed over time.

It might also be useful to just check what the process is doing with py-spy dump.

You can follow web workers in gunicorn with

memray run --follow-fork -o the_dump gunicorn 'galaxy.webapps.galaxy.fast_factory:factory()' --timeout 600 --pythonpath lib -k galaxy.webapps.galaxy.workers.Worker -b localhost:8082 --config python:galaxy.web_stack.gunicorn_config -w 1 --preload

The traced app will run on port 8082; you can then, for instance, direct a portion of the traffic to your profiled app in an upstream nginx section.

Define once, reference many times

Using variables, either by defining them ahead of time, or simply accessing them via existing data structures that have been defined, e.g.:

# defining a variable that gets reused is great!
galaxy_user: galaxy

galaxy_config:
  galaxy:
    # Re-using the galaxy_config_dir variable saves time and ensures everything
    # is in sync!
    datatypes_config_file: "{{ galaxy_config_dir }}/datatypes_conf.xml"

# and now we can re-use "{{ galaxy_config.galaxy.datatypes_config_file }}"
# in other places!

galaxy_config_templates:
  - src: templates/galaxy/config/datatypes_conf.xml
    dest: "{{ galaxy_config.galaxy.datatypes_config_file }}"

Practices like those shown above help to avoid problems caused when paths are defined differently in multiple places. The datatypes config file will be copied to the same path as Galaxy is configured to find it in, because that path is only defined in one place. Everything else is a reference to the original definition! If you ever need to update that definition, everything else will be updated accordingly.

Error: "skipping: no hosts matched"

There can be multiple reasons this happens, so we’ll step through all of them. We’ll start by assuming you’re running the command

ansible-playbook galaxy.yml

The following things can cause issues:

  1. Within your galaxy.yml, you’ve referred to a host group that doesn’t exist or is misspelled. Check the hosts: galaxyservers to ensure it matches the host group defined in the hosts file.
  2. Vice-versa, the group in your hosts file should match the hosts selected in the playbook, galaxy.yml.
  3. If neither of these are the issue, it’s possible Ansible doesn’t know to check the hosts file for the inventory. Make sure you’ve specified inventory = hosts in your ansible.cfg.
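A quick way to cross-check all three places at once (file names as used in this training):

```shell
# Which group does the playbook target?
grep -n 'hosts:' galaxy.yml
# Which groups does the inventory define?
grep -n '^\[' hosts
# Is Ansible told to use that inventory file?
grep -n '^inventory' ansible.cfg
```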

Failing all jobs from a specific user

This command will let you quickly fail every job from the user ‘service-account’ (replace with your preferred user)

gxadmin tsvquery jobs --user=service-account --nonterminal | awk '{print $1}' |  xargs -I {} -n 1 gxadmin mutate fail-job {} --commit

Galaxy Admin Training Path

Comment: Galaxy Admin Training Path

The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.

  1. Step 1
    ansible-galaxy
  2. Step 2
    backup-cleanup
  3. Step 3
    customization
  4. Step 4
    tus
  5. Step 5
    cvmfs
  6. Step 6
    apptainer
  7. Step 7
    tool-management
  8. Step 8
    reference-genomes
  9. Step 9
    data-library
  10. Step 10
    dev/bioblend-api
  11. Step 11
    connect-to-compute-cluster
  12. Step 12
    job-destinations
  13. Step 13
    pulsar
  14. Step 14
    celery
  15. Step 15
    gxadmin
  16. Step 16
    reports
  17. Step 17
    monitoring
  18. Step 18
    tiaas
  19. Step 19
    sentry
  20. Step 20
    ftp
  21. Step 21
    beacon

How do I know what I can do with a role? What variables are available?

You don’t. There is no standard way of reporting this, but well-written roles by trusted authors (e.g. geerlingguy, galaxyproject) do it properly and document all of the variables in the README file of the repository. We try to pick sensible roles for you in this course but, in real life, it may not be that simple.

So, definitely check there first, but if they aren’t there, then you’ll need to read through defaults/ and tasks/ and templates/ to figure out what the role does and how you can control and modify it to accomplish your goals.

How do I see what variables are set for a host?

If you are using a simple group_vars file only, per group, and no other variable sources, then it’s relatively easy to tell what variables are getting set for your host! Just look at that one file.

But if you have graduated into using a more complex setup, perhaps with multiple sets of variables, like for example:

├── group_vars
│ ├── all
│ │ ├── all.yml
│ │ └── secret.yml
│ ├── galaxyservers.yml
│ └── pulsarservers.yml
├── hosts
├── host_vars
│ ├── galaxy.example.org
│ │ ├── all.yml
│ │ └── secret.yml
│ ├── pulsar.example.org
│ │ ├── all.yml
│ │ ├── pulsar.yml
│ │ └── secret.yml
...

Then it might be harder to figure out, in full, which variables are being set. This is where the ansible-inventory command can be useful.

ansible-inventory --graph shows you the structure of your host groups:

$ ansible-inventory --graph
@all:
|--@cluster:
| |--allie.example.com
| |--bob.example.com
| |--charlie.example.com
[...]

Here is a relatively simple, flat example, but this can be more complicated if you nest sub-groups of hosts:

@all:
|--@local:
| |--localhost
|--@ungrouped:
|--@workshop_instances:
| |--@workshop_eu:
| | |--gat-0.eu.training.galaxyproject.eu
| | |--gat-1.eu.training.galaxyproject.eu
| |--@workshop_oz:
| |--@workshop_us:

ansible-inventory --host shows you all variables defined for a host:

$ ansible-inventory --host galaxy.example.com | head
[WARNING]: While constructing a mapping from
/group_vars/galaxyservers.yml, line 3, column
1, found a duplicate dict key (tiaas_templates_dir). Using last defined value
only.
{
    "ansible_connection": "local",
    "ansible_user": "ubuntu",
    "certbot_agree_tos": "--agree-tos",
    "certbot_auth_method": "--webroot",
    "certbot_auto_renew": true,
    "certbot_auto_renew_hour": "{{ 23 |random(seed=inventory_hostname) }}",
    "certbot_auto_renew_minute": "{{ 59 |random(seed=inventory_hostname) }}",

And, helpfully, if variables are overridden in precedence you can see that as well with the above warnings.

Is YAML sensitive to True/true/False/false?

By this reference, YAML doesn’t really care:

{ Y, true, Yes, ON  } : Boolean true
{ n, FALSE, No, off } : Boolean false

Mapping Jobs to Specific Storage By User

It is possible to map your jobs to use specific storage backends based on user! If you have e.g. specific user groups that need their data stored separately from other users, for whatever political reasons, then in your dynamic destination you can do something like:

job_destination = app.job_config.get_destination(destination_id)
if user == "alice":
    job_destination.params['object_store_id'] = 'foo'  # Maybe look the ID up from a mapping somewhere

If you manage to do this in production, please let us know and we can update this FAQ with any information you encounter.

Operating system compatibility

These Ansible roles and training materials were last tested on CentOS 7 and Ubuntu 18.04, but will probably work on other RHEL and Debian variants.

The roles used in these trainings are currently used by usegalaxy.* and other servers in maintaining their infrastructure (US and EU both run CentOS 7).

If you have an issue running these trainings on your OS flavour, please report the issue in the training material repository and we can see if it is possible to solve it.

Running Ansible on your remote machine

It is possible to install Ansible on the remote machine itself and run it there, rather than from your local machine connecting to the remote machine.

Your hosts file will need to use localhost, and whenever you run playbooks with ansible-playbook -i hosts playbook.yml, you will need to add -c local to your command.

Be certain that the playbook that you’re writing on the remote machine is stored somewhere safe, like your user home directory, or backed up on your local machine. The cloud can be unreliable and things can disappear at any time.

Updating from 22.01 to 23.0 with Ansible

Galaxy introduced a number of changes in 22.05 and 23.0 that are extremely important to be aware of during the upgrade process. Namely a new database migration system, and a new required running environment (gunicorn instead of uwsgi).

The scripts to migrate to the new database migration system are only compatible with release 22.05, and then were subsequently removed, so it is mandatory to upgrade to 22.05 if you want to go further.

Here is the recommended update procedure with ansible:

  1. Update to 22.01 normally
  2. Change the release to 22.05, and run the upgrade
    1. Galaxy will probably not start correctly here, ignore it.
    2. Run the database migration manually

      GALAXY_CONFIG_FILE=/srv/galaxy/config/galaxy.yml sh manage_db.sh -c /srv/galaxy/config/galaxy.yml upgrade
  3. Update your system’s ansible, you probably need something with a major version greater than 2.
  4. Set the release to 23.0 and make other required changes. There are a lot of useful changes, but the easiest procedure is probably something like:

    1. git clone https://github.com/hexylena/git-gat/
    2. git checkout step-4
    3. Diff and sync (e.g. vimdiff group_vars/galaxyservers.yml git-gat/group_vars/galaxyservers.yml) for the main configuration files:

      • group_vars/all.yml
      • group_vars/dbservers.yml
      • galaxy.yml
      • requirements.yml
      • hosts
      • templates/nginx/galaxy.j2

    But the main change is the swap from uwsgi to gravity+gunicorn

    -uwsgi:
    -  socket: 127.0.0.1:8080
    -  buffer-size: 16384
    -  processes: 1
    -  threads: 4
    -  offload-threads: 2
    -  static-map:
    -    - /static=/static
    -    - /favicon.ico=/static/favicon.ico
    -  static-safe: client/galaxy/images
    -  master: true
    -  virtualenv: ""
    -  pythonpath: "/lib"
    -  module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
    -  thunder-lock: true
    -  die-on-term: true
    -  hook-master-start:
    -    - unix_signal:2 gracefully_kill_them_all
    -    - unix_signal:15 gracefully_kill_them_all
    -  py-call-osafterfork: true
    -  enable-threads: true
    -  mule:
    -    - lib/galaxy/main.py
    -    - lib/galaxy/main.py
    -  farm: job-handlers:1,2
    +gravity:
    +  process_manager: systemd
    +  galaxy_root: "/server"
    +  galaxy_user: ""
    +  virtualenv: ""
    +  gunicorn:
    +    # listening options
    +    bind: "unix:/gunicorn.sock"
    +    # performance options
    +    workers: 2
    +    # Other options that will be passed to gunicorn
    +    # This permits setting of 'secure' headers like REMOTE_USER (and friends)
    +    # https://docs.gunicorn.org/en/stable/settings.html#forwarded-allow-ips
    +    extra_args: '--forwarded-allow-ips="*"'
    +    # This lets Gunicorn start Galaxy completely before forking which is faster.
    +    # https://docs.gunicorn.org/en/stable/settings.html#preload-app
    +    preload: true
    +  celery:
    +    concurrency: 2
    +    loglevel: DEBUG
    +  handlers:
    +    handler:
    +      processes: 2
    +      pools:
    +        - job-handlers
    +        - workflow-schedulers

    Some other important changes include:

    • uchida.miniconda is replaced with galaxyproject.conda
    • usegalaxy_eu.systemd is no longer needed
    • galaxy_user_name is defined in all.yml in the latest git-gat
    • git-gat also separates out the DB serving into a dbservers.yml host group
  5. Backup your venv, mv /srv/galaxy/venv/ /srv/galaxy/venv-old/, as your NodeJS is probably out of date and Galaxy doesn’t handle that gracefully
  6. Do any local customs for luck (knocking on wood, etc.)
  7. Run the playbook
  8. Things might go wrong with systemd units
    • try running galaxyctl -c /srv/galaxy/config/galaxy.yml update as root
    • you may also need to rm /etc/systemd/system/galaxy.service which is then no longer needed
    • you’ll have a galaxy.target and you can instead systemctl daemon-reload and systemctl start galaxy.target

Variable connection

When the playbook runs, as part of the setup, it collects any variables that are set. For a playbook affecting a group of hosts named my_hosts, it checks many different places for variables, including “group_vars/my_hosts.yml”. If there are variables there, they’re added to the collection of current variables. It also checks “group_vars/all.yml” (for the built-in host group all). There is a precedence order, but then these variables are available for roles and tasks to consume.

What if you forget `--diff`?

If you forget to use --diff, it is not easy to see what has changed. Some modules like the copy and template modules have a backup option. If you set this option, then it will keep a backup copy next to the destination file.

However, most modules do not have such an option, so if you want to know what changes, always use --diff.

What is the difference between the roles with `role:` prefix and without?

The bare role name is just simplified syntax for the roles; you could equally specify role: <name> every time, but it’s only necessary if you want to set additional variables like become_user.


Ansible-galaxy


Customising the welcome page

Customising the welcome.html page is very easy. Simply follow the Customising Galaxy Tutorial!


Collections


Adding a tag to a collection

  1. Click on the collection in your history to view it
  2. Click on Edit galaxy-pencil next to the collection name at the top of the history panel
  3. Click on Add Tags galaxy-tags
  4. Add a tag starting with #
    • Tags starting with # will be automatically propagated to the outputs of any tools using this collection.
  5. Click Save galaxy-save
  6. Check that the tag appears below the collection name

Changing the datatype of a collection

This will set the datatype for all files in your collection. It does not change the files themselves.
  1. Click on Edit galaxy-pencil next to the collection name in your history
  2. In the central panel, click on the galaxy-chart-select-data Datatypes tab on the top
  3. Under new type, select your desired datatype
    • tip: you can start typing the datatype into the field to filter the dropdown menu
  4. Click the Save button

Converting the datatype of a collection

This will convert all files in your collection to a different format. This will change the files themselves and create a new collection.
  1. Click on Edit galaxy-pencil next to the collection name in your history
  2. In the central panel, click on the galaxy-gear Convert tab on the top
  3. Under Converter Tool, select your desired conversion
  4. Click the Convert Collection button

Creating a dataset collection

  • Click on galaxy-selector Select Items at the top of the history panel Select Items button
  • Check all the datasets in your history you would like to include
  • Click n of N selected and choose Build Dataset List

    build list collection menu item

  • Enter a name for your collection
  • Click Create collection to build your collection
  • Click on the checkmark icon at the top of your history again

Creating a paired collection

  • Click on galaxy-selector Select Items at the top of the history panel Select Items button
  • Check all the datasets in your history you would like to include
  • Click n of N selected and choose Build List of Dataset Pairs

  • Change the text of unpaired forward to a common selector for the forward reads
  • Change the text of unpaired reverse to a common selector for the reverse reads
  • Click Pair these datasets for each valid forward and reverse pair.
  • Enter a name for your collection
  • Click Create List to build your collection
  • Click on the checkmark icon at the top of your history again

Renaming a collection

  1. Click on the collection
  2. Click on the name of the collection at the top
  3. Change the name
  4. Press Enter

Contributing


How to Contribute to Galaxy

Contributing to Galaxy is a multi-step process; this guide will walk you through it.

To contribute to Galaxy, a GitHub account is required. Changes are proposed via a pull request. This allows the project maintainers to review the changes and suggest improvements.

The general steps are as follows:

  1. Fork the Galaxy repository
  2. Clone your fork
  3. Make changes in a new branch
  4. Commit your changes, push branch to your fork
  5. Open a pull request for this branch in the upstream Galaxy repository

For a lot more information about Git branching and managing a repository on GitHub, see the Contributing with GitHub via command-line tutorial.

The Galaxy Core Architecture slides have a lot of important Galaxy core-related information related to branches, project management, and contributing to Galaxy - under the Project Management section of the slides.


Contributors


Adding workflow tests with Planemo

  1. Find a tutorial that you’re interested in, that doesn’t currently have tests.

This tutorial has a workflow (.ga) and a test; notice the -test.yml file that has the same name as the workflow .ga file.

    machinelearning/workflows/machine_learning.ga
    machinelearning/workflows/machine_learning-test.yml

    You want to find tutorials without the -test.yml file. The workflow file might also be missing.

  2. Check if it has a workflow (if it does, skip to step 5.)
  3. Follow the tutorial
  4. Extract a workflow from the history
  5. Run that workflow in a new history to test
  6. Obtain the workflow invocation ID, and your API key (User → Preferences → Manage API Key)

    screenshot of the workflow invocation page. The user drop down shows where to find this page, and a red box circles a field named "Invocation ID"

  7. Install the latest version of planemo

    # In a virtualenv
    pip install planemo
  8. Run the command to initialise a workflow test from the workflows/ subdirectory - if it doesn’t exist, you might need to create it first.

    planemo workflow_test_init --from_invocation <INVOCATION ID> --galaxy_url <GALAXY SERVER URL> --galaxy_user_key <GALAXY API KEY>

    This will produce a folder of files, for example from a testing workflow:

    $ tree
    .
    ├── test-data
    │   ├── input dataset(s).shapefile.shp
    │   └── shapefile.shp
    ├── testing-openlayer.ga
    └── testing-openlayer-tests.yml
  9. You will need to check the -tests.yml file, which contains some automatically generated comparisons. By default it tests that output data matches the test-data exactly; you might want to replace that with assertions that check for, e.g., a correct file size or specific text content you expect to see.

  10. If the files in test-data are already uploaded to Zenodo, to save disk space, you should delete them from the test-data dir and use their URL in the -tests.yml file, as in this example:

    - doc: Test the M. Tuberculosis Variant Analysis workflow
      job:
        'Read 1':
          location: https://zenodo.org/record/3960260/files/004-2_1.fastq.gz
          class: File
          filetype: fastqsanger.gz
  11. Add tests on the outputs! Check the planemo reference if you need more detail.

    - doc: Test the M. Tuberculosis Variant Analysis workflow
      job:
        # Simple explicit Inputs
        'Read 1':
          location: https://zenodo.org/record/3960260/files/004-2_1.fastq.gz
          class: File
          filetype: fastqsanger.gz
      outputs:
        jbrowse_html:
          asserts:
            has_text:
              text: "JBrowseDefaultMainPage"
        snippy_fasta:
          asserts:
            has_line:
              line: '>Wildtype Staphylococcus aureus strain WT.'
        snippy_tabular:
          asserts:
            has_n_columns:
              n: 2
  12. Contribute all of those files to the GTN in a PR.
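The search in step 1, finding tutorial workflows that lack a matching test file, can be sketched as a short shell loop. The directory layout below is fabricated for illustration; in the real repository you would run the loop from a topic's tutorials directory:

```shell
# Fabricated demo layout: one workflow with a test file, one without.
cd "$(mktemp -d)"
mkdir -p machinelearning/workflows metagenomics/workflows
touch machinelearning/workflows/machine_learning.ga \
      machinelearning/workflows/machine_learning-test.yml \
      metagenomics/workflows/taxonomy.ga

# List .ga files that have neither a -test.yml nor a -tests.yml sibling.
for ga in */workflows/*.ga; do
  base="${ga%.ga}"
  [ -f "${base}-test.yml" ] || [ -f "${base}-tests.yml" ] || echo "missing tests: $ga"
done
```

In this demo only metagenomics/workflows/taxonomy.ga would be reported.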

Adding your recording to a tutorial or slide deck

We welcome anybody to submit their recordings! Your videos can be used in (online) training events, or for self-study by learners on the GTN.

For some tips and tricks about recording the video itself, please see the Recording a video tutorial FAQ below.

Submission process

The process of adding recordings to the GTN is as follows:

  1. Instructor: Record video (tips & tricks)
  2. Instructor: Submit your video using this Google Form
  3. GTN: A GTN GitHub pull request (PR) will be made by our bot based on the form.
  4. GTN: We will upload your video to the GalaxyProject YouTube channel
  5. GTN: We will put the auto-generated captions from YouTube into a Google Doc
  6. Instructor: Check and fix the auto-generated captions
  7. GTN: Upload the fixed captions to YouTube
  8. GTN: Merge the Pull Request on GitHub
  9. Done! Your recording will now show up on the tutorial for anybody to use and re-use

Note: If you are submitting a video to use in an event, please submit your recording 2 weeks before the start of your course to allow ample time to complete the submission process.

Recordings Metadata

Our bot will add some metadata about your recording to the tutorial or slide deck in question, which looks as follows:

recordings:
- speakers: # speakers must be defined in the CONTRIBUTORS.yaml file
  - shiltemann
  - hexylena
  captioners: # captioners must also be present in the CONTRIBUTORS.yaml file
  - bebatut
  type: # optional, will default to Tutorial or Lecture; if you do something different, set it here (e.g. Demo, Lecture & Tutorial, Background, Webinar)
  date: '2024-06-12' # date on which you recorded the video
  galaxy_version: '24.0' # version of Galaxy you used during the recording, can be found under 'Help->About' in Galaxy
  length: 1H17M # length of your video, in format: 17M or 2H34M etc
  youtube_id: "dQw4w9WgXcQ" # the bit of the YouTube URL after youtube.com/watch?v=

- speakers:
  - shiltemann
  captioners:
  - hexylena
  - bebatut
  date: '2020-06-12'
  galaxy_version: '20.05'
  length: 51M
  youtube_id: "oAVjF_7ensg"

Misc

Note: If your videos are already uploaded to YouTube, for example as part of a different project’s account, you can add this metadata to the tutorial or slides manually, without using our submission form. Note that we do require all videos to have good-quality English captions, and we will not be able to help you configure these on other YouTube accounts.

Creating a GTN Event

To add your event to the GTN, you will need to supply your course information (dates, location, program, etc). You will then get an event page like this which you can use during your training. This page includes a course overview, course handbook (full program with links to tutorials) and setup instructions for participants.

Your event will also be shown on the GTN event horizon and on the homepage. We are also happy to advertise your event on social media and Matrix channels.

Already have your own event page? No problem! You can add your event as an external event and we will simply link to your page.

To add your event to the GTN, please:

  1. Create a page in the events/ folder of the GTN repository
  2. Have a look at example event definitions in this folder:
  3. Adapt one of these example pages to fit your event
  4. Create a pull request on the GTN

We are also happy to help you to add your event, please contact us on Matrix to discuss the details of your course with us.

Please also feel free to contact us with ideas for improvements! We know that training comes in many different forms, so if something in your event is not yet supported, let us know and we are happy to add it!

External events

Already have a course webpage? Great! In this case, you only have to provide the most basic information about your course (title, description, dates, location). See also 2024-04-01-example-event-external.md for an example definition.

---
layout: event-external
title: My External Training Event Title

external: "https://galaxyproject.org/events/"
description:

date_start:
date_end: # optional, for multi-day events

location:
  name:
  city:
  country:

contributions:
  organisers:
    - name1
    - name2

Creating a GTN FAQ

If you have a reusable snippet of knowledge, we encourage you to share it with the GTN community by creating an FAQ for it!

Creating the FAQ: The Easy Way

Fill out this Google Form. Every day our bot will import the FAQs submitted via this Google Form, and we will process them, perhaps requesting small changes, so we recommend that you have a GitHub account already.

For Advanced Users

Have a look at the existing FAQs in the faqs/galaxy/ folder of the GTN repository for some examples.

An FAQ is a markdown file that looks as follows:

---
title: Finding Datasets
area: datasets
box_type: tip
layout: faq
contributors: [jennaj, Melkeb]
---


- To review all active Datasets in your account, go to **User > Datasets**.

Notes:
- Logging out of Galaxy while the Upload tool is still loading data can cause uploads to abort. This is most likely to occur when a dataset is loaded by browsing local files.
- If you have more than one browser window open, each with a different Galaxy History loaded, the Upload tool will load data into the most recently used history.
- Click on refresh icon <i class="fas fa-redo" aria-hidden="true"></i><span class="visually-hidden">galaxy-refresh</span> at the top of the History panel to display the current active History with the datasets.

Creating a GTN News post

If you have created a new tutorial, running an event, published a paper around training, or have anything else interesting to share with the GTN community, we encourage you to write a News item about it!

News items will show up on the GTN homepage and in the GTN news feed.

Creating the news post: The Easy Way

Fill out this Google Form. Every day our bot will import the news posts submitted via this Google Form, and we will process them, perhaps requesting small changes, so we recommend that you have a GitHub account already.

For Advanced Users

Have a look at the existing news items in the news/_posts/ folder of the GTN repository for some examples.

A news post is a markdown file that looks as follows:

---
layout: news

title: "New Tutorial: My tutorial title"
tags:
- new tutorial
- transcriptomics
contributors:
- shiltemann
- hexylena

tutorial: "topics/introduction/tutorials/data-manipulation-olympics/tutorial.html"
cover: "path/to/cover-image.jpg" # usually an image from your tutorial
coveralt: "description of the cover image"

---

A bit of text containing your news, this is all markdown formatted,
so you can do **bold** and *italic* text like this, and links look
like [this](https://example.com) etc.

Describe everything you want to convey here, can be as long as you
need.

Make sure the filename is structured as follows: year-month-day-title.md, so for example: 2022-10-28-my-new-tutorial.md
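As a quick sanity check, the filename convention can be tested with a small shell function. The function name here is just illustrative, not part of the GTN tooling:

```shell
# Hypothetical helper: does a filename follow the year-month-day-title.md pattern?
is_valid_post_name() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}-[A-Za-z0-9_-]+\.md$'
}

is_valid_post_name "2022-10-28-my-new-tutorial.md" && echo "valid"
is_valid_post_name "my-new-tutorial.md" || echo "missing date prefix"
```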

How can I contribute in "advanced" mode?

Most of the content is written in GitHub Flavored Markdown with some metadata (or variables) found in YAML files. Everything is stored on our GitHub repository. Each training material is related to a topic. All training materials (slides, tutorials, etc) related to a topic are found in a dedicated directory (e.g. transcriptomics directory contains the material related to transcriptomic analysis). Each topic has the following structure:

Structure of the repository

  • a metadata file in YAML format
  • a directory with the topic introduction slide deck in Markdown with introductions to the topic
  • a directory with the tutorials:

    Inside the tutorials directory, each tutorial related to the topic has its own subdirectory with several files:

    • a tutorial file written in Markdown with hands-on
    • an optional slides file in Markdown with slides to support the tutorial
    • a directory with Galaxy Interactive Tours to reproduce the tutorial
    • a directory with workflows extracted from the tutorial
    • a YAML file with the links to the input data needed for the tutorial
    • a YAML file with the description of needed tools to run the tutorial
  • a directory with the Dockerfile describing the details to build a container for the topic (self-study environments).

To manage changes, we use GitHub flow based on Pull Requests (check our tutorial):

  1. Create a fork of this repository on GitHub
  2. Clone your fork of this repository to create a local copy on your computer and initialize the required submodules (git submodule init and git submodule update)
  3. Create a new branch in your local copy for each significant change
  4. Commit the changes in that branch
  5. Push that branch to your fork on GitHub
  6. Submit a pull request from that branch to the original repository
  7. If you receive feedback, make changes in your local clone and push them to your branch on GitHub: the pull request will update automatically
  8. Pull requests will be merged by the training team members after at least one other person has reviewed the Pull request and approved it.
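Steps 1 to 6 above can be sketched with git commands. In this self-contained sketch a throwaway local bare repository stands in for your fork on GitHub, all names are illustrative, and the submodule step is omitted:

```shell
workdir=$(mktemp -d) && cd "$workdir"
git init --bare fork.git                    # stand-in for your fork on GitHub

git clone fork.git training-material        # step 2: clone your fork
cd training-material
git config user.email "you@example.org"     # needed in a fresh environment
git config user.name  "Your Name"

git checkout -b fix-typo-in-tutorial        # step 3: one branch per change
echo "fixed" > tutorial.md
git add tutorial.md
git commit -m "Fix typo in tutorial"        # step 4: commit the change
git push origin fix-typo-in-tutorial        # step 5: push; then open a PR (step 6)
```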

Globally, the process of development of new content is open and transparent:

  1. Creation of a branch derived from the main branch of the GitHub repository
  2. Initialization of a new directory for the tutorial
  3. Filling of the metadata with title, questions, learning objectives, etc
  4. Generation of the input dataset for the tutorial
  5. Filling of the tutorial content
  6. Extraction of the workflows of the tutorial
  7. Automatic extraction of the required tools to populate the tool file
  8. Automatic annotation of the public Galaxy servers
  9. Generation of an interactive tour for the tutorial with the Tourbuilder web-browser extension
  10. Upload of the datasets to Zenodo and addition of the links in the data library file.
  11. Once ready, opening a Pull Request
  12. The changes are automatically checked for the right format and working links using continuous integration testing on Travis CI
  13. Review of the content by several other instructors via discussions
  14. After the review process, merge of the content into the main branch, starting a series of automatic steps triggered by Travis CI
  15. Regeneration of the website and publication on https://training.galaxyproject.org/training-material/
  16. Generation of PDF artifacts of the tutorials and slides and upload on the FTP server
  17. Population of TeSS, the ELIXIR’s Training Portal, via the metadata

Development process

To learn how to add new content, check out our series of tutorials on creating new content:

  1. Creating a new tutorial
  2. Creating Slides
  3. Overview of the Galaxy Training Material
  4. Contributing with GitHub via command-line
  5. Contributing with GitHub via its interface
  6. Adding Quizzes to your Tutorial
  7. Adding auto-generated video to your slides
  8. Creating Interactive Galaxy Tours
  9. Creating content in Markdown
  10. Principles of learning and how they apply to training and teaching
  11. Generating PDF artefacts of the website
  12. Tools, Data, and Workflows for tutorials
  13. Running the GTN website online using GitPod
  14. Updating diffs in admin training
  15. Running the GTN website locally
  16. Design and plan session, course, materials
  17. GTN Metadata
  18. Teaching Python
  19. Including a new topic
  20. FAIR Galaxy Training Material

We also strongly recommend you read and follow The Carpentries recommendations on lesson design and lesson writing if you plan to add or change some training materials, and also to check the structure of the training material above.

How can I create new content without dealing with git?

If you feel uncomfortable using git and the GitHub flow, you can write a new tutorial in any text editor and then contact us (via Gitter or email). We will work together to integrate the new content.

How can I fix mistakes or expand an existing tutorial using the GitHub interface?

Check our tutorial to learn how to use the GitHub interface (soon…)

How can I get started with contributing?

If you would like to get involved in the project but are unsure where to start, there are some easy ways to contribute which will also help you familiarize yourself with the project!

A great way to help out the project is to test/edit existing tutorials. Pick a tutorial and check the contents. Does everything work as expected? Are there things that could be improved?

Below is a checklist of things to look out for to help you get started. If you feel confident in making changes yourself, please open a pull request, otherwise please file an issue with any problems you run into or suggestions for improvements.

Basic:

Intermediate:

  • Metadata
    • Are the objectives, keypoints and time estimate filled in?
    • Do they fit with the contents of the tutorial?
  • Content
    • Is there enough background information provided in the introduction section and throughout the tutorial?
    • Question boxes
      • Add questions or question boxes where you think they might be useful (make people think about results they got, test their understanding, etc)
      • Check that answers are still up-to-date
    • Screenshots and Videos
      • Make sure there is also a textual description of the image/video contents
      • Does the screenshot add value to the tutorial or can it be removed?

Advanced:

  • Workflows
    • Add a workflow definition file .ga if none is present
    • Check that the existing workflow is up-to-date with the tutorial contents
    • Enable workflow testing
  • Tours
    • Add a tour if none exists
    • Run the existing tour and check that it is up-to-date with the tutorial contents
  • Datasets
    • Check that all datasets used in the tutorial are present in Zenodo
    • Add a data-library.yaml file if none exists

Another great way to help out the project is by reviewing open pull requests. You can use the above checklist as a guide for your review. Some documentation about how to add your review in the GitHub interface can be found in GitHub’s PR Reviewing Documentation.

How can I give feedback?

At the end of each tutorial, there is a link to a feedback form. We use this information to improve our tutorials.

For general feedback, you can open an issue on GitHub, write to us on Gitter, or send us an email.

How can I report mistakes or errors?

The easiest way to start contributing is to file an issue to tell us about a problem such as a typo, spelling mistake, or a factual error. You can then introduce yourself and meet some of our community members.

How can I test an Interactive Tour?

Perhaps you’ve been asked to review an interactive tour, or maybe you just want to try one out. The easiest way to run an interactive tour is to use the Tour builder browser extension.

  1. Install the Tour Builder extension to your browser (Chrome Web Store, Firefox add-on).
  2. Navigate to a Galaxy instance supporting the tutorial. To find which Galaxy instances support each tutorial, please see the dropdown menu next to the tutorial on the training website. Using one of the usegalaxy.* instances (UseGalaxy.fr, UseGalaxy.org.au, UseGalaxy.org, UseGalaxy.eu) is usually a good bet.
  3. Start the Tour Builder plugin by clicking on the icon in your browser menu bar
  4. Copy the contents of the tour.yaml file into the Tour builder editor window
  5. Click Save and then Run

How does the GTN ensure accessibility?

We are committed to an accessible training experience regardless of disability. Please see our accessibility page for more information.

How does the GTN ensure our training materials are FAIR?

This infrastructure has been developed in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles for training materials Garcia et al. 2020. Following these principles enables trainers and trainees to find, reuse, adapt, and improve the available tutorials.

The GTN receives a 100% score on the FAIR Checker, as noted in our recent news post.

Each of the 10 Simple Rules, and its implementation in the GTN framework:

  • Plan to share your training materials online: online training material portfolio, managed via a public GitHub repository.
  • Improve findability of your training materials by properly describing them: rich metadata associated with each tutorial, visible and accessible via schema.org on each tutorial webpage.
  • Give your training materials a unique identity: URL persistency with redirection in case of renaming of tutorials; data used for tutorials stored on Zenodo and associated with Digital Object Identifiers (DOIs).
  • Register your training materials online: tutorials automatically registered on TeSS, the ELIXIR Training e-Support System.
  • If appropriate, define access rules for your training materials: online and free to use without registration.
  • Use an interoperable format for your training materials: content of the tutorials and slides written in Markdown; metadata associated with tutorials stored in YAML, and workflows in JSON; all of this metadata is available from the GTN’s API.
  • Make your training materials (re-)usable for trainers: online; rich metadata associated with each tutorial (title, contributor details, license, description, learning outcomes, audience, requirements, tags/keywords, duration, date of last revision); strong technical support for each tutorial: workflow, data on Zenodo also available as data libraries on UseGalaxy.*, tools installable via the Galaxy Tool Shed, and a list of possible Galaxy instances with the needed tools.
  • Make your training materials (re-)usable for trainees: online, easy-to-follow hands-on tutorials; rich metadata with “Specific, Measurable, Attainable, Realistic and Time bound” (SMART) learning outcomes following Bloom’s taxonomy; requirements and follow-up tutorials to build learning paths; list of Galaxy instances offering the needed tools; data on Zenodo also available as data libraries on UseGalaxy.*; support chat embedded in tutorial pages.
  • Make your training materials contribution friendly and citable: open and collaborative infrastructure with contribution guidelines, a CONTRIBUTING file and a chat; details to cite tutorials and give credit to contributors available at the end of each tutorial.
  • Keep your training materials up-to-date: open, collaborative and transparent peer-review and curation process; short time between updates.

How does the GTN implement the "Ten simple rules for collaborative lesson development"

The GTN framework is inherently collaborative and community-driven, and comprises a growing number of contributors with expertise in a wide range of scientific and technical domains. Given this highly collaborative nature of a community with very different skill sets, the GTN framework has evolved over the years to facilitate the contribution and maintenance of the tutorials. We aim to adhere to best-practice guidelines for collaborative lesson development described in Devenyi et al. 2018. The structure of the tutorials and repository has been made modular with unified syntax and use of snippets enabling easy access for authors to add common tips and tricks new users might need to know. This system allows for easy updating of all tutorials, if there is a change in tools or interface. More generally, we continually strive to lower contribution barriers for content creators by providing a framework that is easy to use for training developers regardless of their level of knowledge of the underlying technical framework.

Implementation of the “Ten simple rules for collaborative lesson development” (Devenyi et al. 2018) in the training material:

Each rule, and its implementation in the GTN framework:

  • Clarify audience: tutorial metadata includes level indicators (introductory, intermediate, advanced) and a list of prerequisite tutorials as recommended prior knowledge; this information is rendered at the top of each tutorial.
  • Make lessons modular: development of small tutorials linked together via learning paths.
  • Teach best practice lesson development: we maintain the topic Contributing to the Galaxy Training Material, including numerous tutorials describing how to create new content; quarterly online collaboration fests (CoFests) are organized, where contributors can get direct support; development of a Train the Trainer program and a mentoring program for instructors, in which lesson development is taught.
  • Encourage and empower contributors: involve them in reviews, mentor them, and encourage them to become maintainers.
  • Build community around lessons: quarterly online collaboration fests (CoFests) and community calls; chat on our Gitter/Matrix channel.
  • Publish periodically and recognize contributions: authors listed on tutorials; hall of fame listing all contributors; full tutorial citation at the end of each tutorial; tweets about new or updated tutorials; list of new or updated tutorials in the Galaxy Community newsletter; soon: publication of tutorials as articles.
  • Evaluate lessons at several scales: tutorial change (Pull Request) review; embedded feedback form in tutorials for trainee feedback; instructor feedback; automatic workflow testing.
  • Reduce, re-use, recycle: sharing content between tutorials, especially using snippets; development of small modular tutorials linked by learning paths.
  • Link to other resources: links to the original paper, documentation, external tutorials and other material.
  • You can’t please everyone, but we can try: several different Galaxy introduction tutorials for different audiences; tutorials aim to clearly state what they do and do not cover, at the start.

Recording a video tutorial

This FAQ describes some general guidelines for recording your video

Anybody is welcome to record one of the GTN tutorials, even if another recording already exists! Both the GTN tutorial and Galaxy itself change significantly over time, and having regular and/or multiple recordings of tutorials is great!

Done with your recording? Check out the instructions for adding it to the GTN:

Video content

  1. Start of video
    • Introduce yourself
    • Discuss the questions and learning objectives of the tutorial
    • Give a basic introduction to the topic; many participants will be novices
  2. Guide the learners through the tutorial step by step
    • Explain the scientific background of the analysis
    • Explain where you are clicking in Galaxy
    • Explain what tool parameters mean
    • Explain what the tool does
    • Discuss the output files
    • Discuss how to interpret the results
    • Discuss question boxes from the tutorial
  3. Speak slowly and clearly
    • Take your time, we are not in a hurry
    • It is often a lot of new information for participants, give them a chance to process all of it
    • Speaking slowly and clearly will improve the quality of the auto-generated captions, and will be less work for you to fix captions.
  4. If things go wrong that is OK!
    • It’s a great teaching moment!
    • Explain the steps you are taking to determine what went wrong, and how you are fixing it.
    • It makes participants feel less bad if things go wrong for them
  5. If your tutorial is long
    • Indicate good places for people to take a break
    • e.g. when a tool takes a while to run
  6. End of video
    • Go over some of the take-home messages (key-points) of the tutorial
    • Remind viewers about the feedback form embedded at the end of the tutorial
    • Share your recommendations for follow-up tutorials
    • Share any other tips for where to learn more about the topic
    • Share how to connect with the community (e.g. Matrix, Help Forum, social media, etc)
  7. If you are doing both a lecture and a hands-on training, please create 2 separate videos

Technical Guidelines

  1. Start a Zoom call with yourself, record that.
    • For Mac users, QuickTime Player is also a nice option.
    • Have another preference like OBS? Totally OK too!
    • We recommend Zoom to folks new to video production, as it is the easiest to get started with and produces quite small file sizes.
  2. Do a short test recording first
    • Is the audio quality good enough?
      • Wearing a headset often improves the audio quality.
    • Screen sharing: is your screen readable?
      • Make sure you zoom in enough for it to be clearly visible what you are doing in Galaxy.
      • Test watching the video in a non-maximised window. Is it still legible?
      • If the participant is using 50% of their screen for the video, 50% for Galaxy, will it be legible?
  3. Need to edit your video after recording?
    • For example to merge multiple videos together?
    • Software like KDEnlive can help here.
    • Feel free to ask us for help if you need!

Standards

  1. Zoom in, in every interface you’re covering! Many people will be watching the video while they’re doing the activity, and won’t have significant monitor space. Which video below would you rather be trying to follow?

    Bad: default-size screenshot of usegalaxy.eu
    Good 😍: zoomed-in screenshot of usegalaxy.eu, now much more legible

    Bad: green text on a black-background console with a tiny font
    Good 🤩: zoomed-in screenshot of a console with high-contrast black and white content
  2. (Especially for introductory videos!) Clearly call out what you’re doing, especially on the first occurrence

    Bad: “Re-run the job”
    Good: “We need to re-run the job, which we can do by first clicking to expand the dataset, and then using the re-run job button, which looks like a refresh icon.”

    Bad: “As you can see here the report says X”
    Good: “I’m going to view the output of this tool, click on the eyeball icon, and as you can see the report says X.”

    But the same goes for terminal aliases, please disable all of your favourite terminal aliases and quick shortcuts that you’re used to using, disable your bashrc, etc. These are all things students will try and type, and will fail in doing so. We need to be very clear and explicit because people will type exactly what is on the screen, and their environment should at minimum match yours.

    Bad: lg file
    Good: ls -al | grep file

    Bad: z galaxy
    Good: cd path/to/the/galaxy
  3. Consider using a pointer that is more visually highlighted.

    mouse pointer with circle around it that follows it around

    There are themes available for your mouse pointer that you can temporarily use while recording that can make it easier for watchers to see what you’re doing.
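For the alias advice in item 2 above, one way to make sure none of your personal shortcuts leak into a recording is to start a bash session that skips your startup files entirely (this assumes bash; the flags below are standard bash options). The alias name checked here is made up, standing in for any personal shortcut:

```shell
# Start bash without reading profile/rc files, so none of your aliases load.
# 'my_favourite_alias' is a fabricated name standing in for a personal shortcut.
bash --noprofile --norc -c 'type my_favourite_alias >/dev/null 2>&1 || echo "no aliases loaded"'
```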

Thanks!

First off, thanks for your interest in contributing to the Galaxy training materials!

Individual learners and instructors can make these training materials more effective by contributing back to them. You can report mistakes and errors, create more content, etc. Whatever your background, there is a way to contribute: via the GitHub website, via the command line, or even without dealing with GitHub.

We will address your issues and/or assess your change proposal as promptly as we can, and help you become a member of our community. You can also check our tutorials for more details.

What can I do to help the project?

In issues, you will find lists of issues to fix and features to implement (with the “newcomer-friendly” label for example). Feel free to work on them!


Data upload


Data retrieval with “NCBI SRA Tools” (fastq-dump)

This section will guide you through downloading experimental metadata, organizing the metadata into short lists corresponding to conditions and replicates, and finally importing the data from NCBI SRA in collections reflecting the experimental design.

Downloading metadata

  • It is critical to understand the condition/replicate structure of an experiment before working with the data so that it can be imported as collections ready for analysis. Direct your browser to the SRA Run Selector and enter a GEO dataset identifier in the search box (for example: GSE72018). Once the study appears, click the box to download the “RunInfo Table”.

Organizing metadata

  • The “RunInfo Table” provides the experimental condition and replicate structure of all of the samples. Prior to importing the data, we need to parse this file into individual files that contain the sample IDs of the replicates in each condition. This can be achieved by using a combination of the ‘group’, ‘compare two datasets’, ‘filter’, and ‘cut’ tools to end up with single column lists of sample IDs (SRRxxxxx) corresponding to each condition.
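In command-line terms, the grouping described above amounts to filtering rows by condition and cutting out the run-ID column. A minimal sketch on a fabricated RunInfo-style table (real tables have many more columns):

```shell
cd "$(mktemp -d)"
# Fabricated two-column RunInfo-style table for illustration.
printf 'Run\tcondition\nSRR100001\tcontrol\nSRR100002\tcontrol\nSRR100003\ttreated\n' > runinfo.tsv

# Produce one single-column SRR list per condition, ready for fastq-dump.
for cond in control treated; do
  awk -F'\t' -v c="$cond" 'NR > 1 && $2 == c {print $1}' runinfo.tsv > "${cond}_srr.txt"
done
```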

Importing data

  • Provide the files with SRR IDs to NCBI SRA Tools (fastq-dump) to import the data from SRA to Galaxy. By organizing the replicates of each condition in separate lists, the data will be imported as “collections” that can be directly loaded to a workflow or analysis pipeline.

Directly obtaining UCSC sourced genome identifiers

Option 1

  1. Go to UCSC Genome Browser, navigate to “genomes”, then the species of interest.
  2. On the home page for the genome build, immediately under the top navigation box, in the blue bar next to the full genome build name, you will find the View sequences button.
  3. Click on the View sequences button and it will take you to a detail page with a table listing out the contents.

Option 2

  1. Use the tool Get Data -> UCSC Main.
  2. In the Table Browser, choose the target genome and build.
  3. For “group” choose the last option “All Tables”.
  4. For “table” choose “chromInfo”.
  5. Leave all other options at default and send the output to Galaxy.
  6. This new dataset will load as a tabular dataset into your history.
  7. It will list out the contents of the genome build, including the chromosome identifiers (in the first column).

How can I upload data using EBI-SRA?

  1. Search for your data directly in the tool and use the Galaxy links.
  2. Be sure to check your sequence data for correct quality score formats and the metadata “datatype” assignment.

Importing via links

  • Copy the links
  • Open the Galaxy upload manager (galaxy-upload at the top right of the tool panel)

  • Select Paste/Fetch Data
  • Paste the links into the text field

  • Press Start

  • Close the window

  • Galaxy uses the URLs as dataset names by default, so you will want to change them to something more useful or informative.

Importing data from a data library

As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

  1. Go into Shared data (top panel) then Data libraries
  2. Navigate to the correct folder as indicated by your instructor.
    • On most Galaxies tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
  3. Select the desired files
  4. Click on Add to History galaxy-dropdown near the top and select as Datasets from the dropdown menu
  5. In the pop-up window, choose

    • “Select history”: the history you want to import the data to (or create a new one)
  6. Click on Import

Importing data from remote files

As an alternative to uploading the data from a URL or your computer, the files may also be available through the Choose remote files option:

  1. Click on Upload Data on the top of the left panel
  2. Click on Choose remote files and scroll down to find your data folder or type the folder name in the search box on the top.

  3. Click on OK
  4. Click on Start
  5. Click on Close
  6. You will see that the dataset has begun loading in your history.

Importing via links

  • Copy the link location
  • Click galaxy-upload Upload Data at the top of the tool panel

  • Select galaxy-wf-edit Paste/Fetch Data
  • Paste the link(s) into the text field

  • Press Start

  • Close the window

NCBI SRA sourced fastq data

In these FASTQ data:

  • The quality score identifier (+) is sometimes not a match for the sequence identifier (@).
  • The forward and reverse reads may be interlaced and need to be separated into distinct datasets.
  • Both may be present in a dataset. Correct the first, then the second, as explained below.
  • Format problems of any kind can cause tool failures and/or unexpected results.
  • Fix the problems before running any other tools (including FastQC, Fastq Groomer, or other QA tools)

For inconsistent sequence (@) and quality (+) identifiers

  • Correct the format by running the tool Replace Text in entire line with these options:

    • Find pattern: ^\+SRR.+
    • Replace with: +

Note: If the quality score line is named like “+ERR” instead (or other valid options), modify the pattern search to match.
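
Outside Galaxy, the equivalent fix can be sketched with sed, using the same regex idea as the Replace Text options above:

```shell
# Toy FASTQ record whose "+" quality header repeats the SRR identifier.
printf '@SRR123.1 length=4\nACGT\n+SRR123.1 length=4\nIIII\n' > reads.fastq

# Replace any "+SRR..." quality header line with a bare "+"
# (command-line analogue of the ^\+SRR.+ pattern above):
sed 's/^+SRR.*/+/' reads.fastq > fixed.fastq
```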

For interlaced forward and reverse reads

Solution 1 (reads named /1 and /2)

  • Use the tool FASTQ de-interlacer on paired end reads

Solution 2 (reads named /1 and /2)

  • Create distinct datasets from an interlaced fastq dataset by running the tool Manipulate FASTQ reads on various attributes on the original dataset. You will run the tool twice, once per mate.

Note: The solution does NOT use the FASTQ Splitter tool. The data to be manipulated are interlaced sequences. This is different in format from data that are joined into a single sequence.

  • Use the Manipulate FASTQ settings to produce a dataset that contains the /1 reads

    Match Reads

    • Match Reads by Name/Identifier
    • Identifier Match Type Regular Expression
    • Match by .+/2

    Manipulate Reads

    • Manipulate Reads by Miscellaneous Actions
    • Miscellaneous Manipulation Type Remove Read
  • Use these Manipulate FASTQ settings to produce a dataset that contains the /2 reads

    • Exact same settings as above except for this change: Match by .+/1

Solution 3 (reads named /1 and /3)

  • Use the same operations as in Solution 2 above, except change the first Manipulate FASTQ query term to be:
  • Match by .+/3

Solution 4 (reads named without /N)

  • If your data has differently formatted sequence identifiers, the “Match by” expression from Solution 2 above can be modified to suit your identifiers.

Alternative identifiers such as:

@M00946:180:000000000-ANFB2:1:1107:14919:14410 1:N:0:1
@M00946:180:000000000-ANFB2:1:1107:14919:14410 2:N:0:1
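
As a rough command-line sketch of the same de-interlacing (assuming strictly alternating 4-line /1 and /2 records; the Galaxy tools above are the supported route):

```shell
# Two interlaced mate pairs, four lines per FASTQ record.
printf '@read1/1\nACGT\n+\nIIII\n@read1/2\nTGCA\n+\nIIII\n' > interlaced.fastq

# Records alternate, so lines 1-4 of every 8 belong to the /1 mate
# and lines 5-8 to the /2 mate.
awk '(NR-1) % 8 <  4' interlaced.fastq > forward.fastq
awk '(NR-1) % 8 >= 4' interlaced.fastq > reverse.fastq
```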

Upload datasets from GenomeArk

  1. Open the file galaxy-upload upload menu
  2. Click on Choose remote files tab
  3. Click on the Genome Ark button and then click on species

You can find the data by following this path: /species/${Genus}_${species}/${specimen_code}/genomic_data. Inside a given datatype directory (e.g. pacbio), select all the relevant files individually until all the desired files are highlighted and click the Ok button. Note that there may be multiple pages of files listed. Also note that you may not want every file listed.

Upload fasta datasets via links

Uploading fasta or fasta.gz datasets via URL.

UploadAnimatedPng

Upload fastqsanger datasets via links

Uploading fastqsanger or fastqsanger.gz datasets via URL.

  1. Click on Upload Data on the top of the left panel:

    UploadDataButton

  2. Click on Paste/Fetch:

    PasteFetchButton

  3. Paste URL into text box that would appear:

    PasteFetchModal

  4. Set Type (set all) to fastqsanger or, if your data is compressed as in URLs above (they have .gz extensions), to fastqsanger.gz

    ChangeTypeDropDown:

Warning: Danger: Make sure you choose the correct format!

When selecting datatype in “Type (set all)” dropdown, make sure you select fastqsanger or fastqsanger.gz BUT NOT fastqcssanger or anything else!

UploadAnimatedPng

Upload few files (1-10)

  1. Click on Upload Data on the top of the left panel
  2. Click on Choose local file and select the files or drop the files in the Drop files here part
  3. Click on Start
  4. Click on Close

Upload many files (>10) via FTP

Some Galaxies offer FTP upload for very large datasets.

Note: the “Big Three” Galaxies (Galaxy Main, Galaxy EU, and Galaxy Australia) no longer support FTP upload, due to the recent improvements of the default web upload, which should now support large file uploads and almost all use cases. For situations where uploading via the web interface is too tedious, the galaxy-upload commandline utility is also available as an alternative to FTP.

To upload files via FTP, please

  1. Check that your Galaxy supports FTP upload and look up the FTP settings.

  2. Make sure to have an FTP client installed

    There are many options. We can recommend FileZilla, a free FTP client that is available on Windows, MacOS, and Linux.

  3. Establish FTP connection to the Galaxy server
    1. Provide the Galaxy server’s FTP server name (e.g. ftp.mygalaxy.com)
    2. Provide the username (usually the e-mail address) and the password on the Galaxy server
    3. Connect
  4. Add the files to the FTP server by dragging/dropping them or right clicking on them and uploading them

    The FTP transfer will start. Wait until it is done.

  5. Open the Upload menu on the Galaxy server
  6. Click on Choose FTP file on the bottom
  7. Select files to import into the history
  8. Click on Start

Data-libraries


Library Permission Issues

When running setup-data-libraries, the library is imported with the permissions of the admin user rather than being locked down to the account that handled the importing.

Due to how data libraries have been implemented, it isn’t sufficient to share the folder with another user, instead you must also share individual items within this folder. This is an unfortunate issue with Galaxy that we hope to fix someday.

Until then, we recommend installing the latest version of Ephemeris, which includes the set-library-permissions command that lets you recursively correct the permissions on a data library. Simply run:

set-library-permissions -g https://galaxy.example.com -a $API_KEY LIBRARY --roles role1,role2,role3

Where LIBRARY is the id of the library you wish to correct and role1,role2,role3 is the comma-separated list of roles to apply.


Datasets


Adding a tag

Tags can help you to better organize your history and track datasets.
  1. Click on the dataset to expand it
  2. Click on Add Tags galaxy-tags
  3. Add a tag starting with #
    • Tags starting with # will be automatically propagated to the outputs of tools using this dataset.
  4. Press Enter
  5. Check that the tag appears below the dataset name

Changing database/build (dbkey)

You can tell Galaxy which dbkey (e.g. reference genome) your dataset is associated with. This may be used by tools to automatically use the correct settings.
  • Click the desired dataset’s name to expand it.
  • Click on the “?” next to database indicator:

    UI for changing dbkey

  • In the central panel, change the Database/Build field
  • Select your desired database key from the dropdown list
  • Click the Save button

Changing the datatype

Galaxy will try to autodetect the datatype of your files, but you may need to manually set this occasionally.
  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, click galaxy-chart-select-data Datatypes tab on the top
  • In the galaxy-chart-select-data Assign Datatype, select your desired datatype from “New type” dropdown
    • Tip: you can start typing the datatype into the field to filter the dropdown menu
  • Click the Save button

Converting the file format

Some datasets can be transformed into a different format. Galaxy has some built-in file conversion options depending on the type of data you have.
  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, click on the galaxy-gear Convert tab on the top
  • In the upper part galaxy-gear Convert, select the appropriate datatype from the list
  • Click the Create dataset button to start the conversion.

Creating a new file

Galaxy allows you to create new files from the upload menu. You can supply the contents of the file.
  • Click galaxy-upload Upload Data at the top of the tool panel
  • Select galaxy-wf-edit Paste/Fetch Data at the bottom
  • Paste the file contents into the text field
  • Press Start and Close the window

Datasets not downloading at all

  1. Check to see if pop-ups are blocked by your web browser. Where to check can vary by browser and extensions.
  2. Double check your API key, if used. Go to User > Preferences > Manage API key.
  3. Check the sharing/permission status of the Datasets. Go to Dataset > Pencil icon galaxy-pencil > Edit attributes > Permissions. If you do not see a “Permissions” tab, then you are not the owner of the data.

Notes:

  • If the data was shared with you by someone else from a Shared History, or was copied from a Published History, be aware that there are multiple levels of data sharing permissions.
  • All data are set to not shared by default.
  • Datasets sharing permissions for a new history can be set before creating a new history. Go to User > Preferences > Set Dataset Permissions for New Histories.
  • User > Preferences > Make all data private is a “one click” option to unshare ALL data (Datasets, Histories). Note that once confirmed and all data is unshared, the action cannot be “undone” in batch, even by an administrator. You will need to re-share data again and/or reset your global sharing preferences as wanted.
  • Only the data owner has control over sharing/permissions.
  • Any data you upload or create yourself is automatically owned by you with full access.
  • You may not have been granted full access if the data were shared or imported, and someone else is the data owner (your copy could be “view only”).
  • After you have a fully shared copy of any shared/published data from someone else, you become the owner of that copy. Any changes made afterwards apply only to each person’s own copy of the data.
  • Histories can be shared along with the Datasets they contain. Depending on permissions, those Datasets can then be viewed, downloaded, or manipulated by others.
  • Dataset-level access is distinct from, but related to, History-level access.

Detecting the datatype (file format)

  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, click on the galaxy-chart-select-data Datatypes tab on the top
  • Click the Auto-detect button to have Galaxy try to autodetect it.

Different dataset icons and their usage

Icons provide a visual experience for objects, actions, and ideas

Dataset icons and their usage:

  • galaxy-eye “Eye icon”: Display dataset contents.
  • galaxy-pencil “Pencil icon”: Edit attributes of dataset metadata: labels, datatype, database.
  • galaxy-delete “Trash icon”: Delete the dataset.
  • galaxy-save “Disc icon”: Download the dataset.
  • galaxy-link “Copy link”: Copy link URL to the dataset.
  • galaxy-info “Info icon”: Dataset details and job runtime information: inputs, parameters, logs.
  • galaxy-refresh “Refresh/Rerun icon”: Run this (selected) job again or examine original submitted form.
  • galaxy-barchart “Visualize icon”: External display links (UCSC, IGV, NPL, PV); Charts and graphing; Editor (manually edit text).
  • galaxy-dataset-map “Dataset Map icon”: Filter the history for related Input/Output Datasets. Click again to clear the filter.
  • galaxy-bug “Bug icon”: Review subset of logs (review all under galaxy-info), and optionally submit a bug report.

Downloading datasets

  1. Click on the dataset in your history to expand it
  2. Click on the Download icon galaxy-save to save the dataset to your computer.

Downloading datasets using command line

From the terminal window on your computer, you can use wget or curl.

  1. Make sure you have wget or curl installed.
  2. Click on the Dataset name, then click on the copy link icon galaxy-link. This is the direct-downloadable dataset link.
  3. Once you have the link, use any of the following commands:
    • For wget

      wget '<link>'
      wget -O outfile '<link>'
      wget --no-check-certificate '<link>' # ignore SSL certificate warnings
      wget -c '<link>' # continue an interrupted download

    • For curl

      curl -o outfile '<link>'
      curl -o outfile --insecure '<link>' # ignore SSL certificate warnings
      curl -C - -o outfile '<link>' # continue an interrupted download

  4. For dataset collections and datasets within collections you have to supply your API key with the request
    • Sample commands for wget and curl respectively are:

      wget https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY

      curl -o myfile.txt https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY

Finding BAM dataset identifiers

Quickly learn what the identifiers are in any BAM dataset that results from mapping
  1. Run Samtools: IdxStats on the aligned data (bam dataset).
  2. The “index header” chromosome names and lengths will be listed in the output (along with read counts).
  3. Compare the chromosome identifiers to the chromosome (aka “chrom”) field in all other inputs: VCF, GTF, GFF(3), BED, Interval, etc.

Note:

  • The original mapping target may have been a built-in genome index, custom genome (transcriptome, exome, other) – the same bam data will still be summarized.
  • This method will not work for “sequence-only” bam datasets, as these usually have no header.
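
The first column of the idxstats output holds the identifiers to compare. A sketch, using made-up output values:

```shell
# Assumed sample of Samtools IdxStats output:
# chrom, length, mapped reads, unmapped reads (tab-separated).
# The "*" row collects unmapped reads.
printf 'chr1\t248956422\t1200345\t532\nchr2\t242193529\t1100200\t498\n*\t0\t0\t15023\n' > idxstats.tsv

# List just the chromosome identifiers used by the BAM header:
cut -f1 idxstats.tsv | grep -v '^\*' > bam_chroms.txt
```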

Finding Datasets

  • To review all active Datasets in your account, go to User > Datasets.

Notes:

  • Logging out of Galaxy while the Upload tool is still loading data can cause uploads to abort. This is most likely to occur when a dataset is loaded by browsing local files.
  • If you have more than one browser window open, each with a different Galaxy History loaded, the Upload tool will load data into the most recently used history.
  • Click on refresh icon galaxy-refresh at the top of the History panel to display the current active History with the datasets.

How to unhide "hidden datasets"?

If you have run a workflow with hidden datasets, in your History:

  • Click the gear icon galaxy-gear → Click Unhide Hidden Datasets
  • Or use the toggle hidden to view them

When using the Copy Datasets feature, hidden datasets will not be available to transfer from the Source History list of datasets. To include them:

  1. Click the gear icon galaxy-gear → Click Unhide Hidden Datasets
  2. Click the gear icon galaxy-gear → Click Copy Datasets

Mismatched Chromosome identifiers and how to avoid them

Reference data mismatches are similar to bad reagents in a wet lab experiment: all sorts of odd problems can come up!

Your inputs must all be based on an identical genome assembly build to achieve correct scientific results.

There are two areas to review for data to be considered identical.

  1. The data are based on the same exact genome assembly (or “assembly release”).
    • The “assembly” refers to the nucleotide sequence of the genome.
    • If the base order and length of the chromosomes are not the same, then your coordinates will have scientific problems.
    • Converting coordinates between assemblies may be possible. Search tool panel with CrossMap.
  2. The data are based on the same exact genome assembly build.
    • The “build” refers to the labels used inside the file. In this context, pay attention to the chromosome identifiers.
    • These all may mean the same thing to a person but not to a computer or tool: chr1, Chr1, 1, chr1.1
    • Converting identifiers between builds may be possible. Search tool panel with Replace.
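
For example, a common mismatch is Ensembl-style names (1, 2, X) versus UCSC-style names (chr1, chr2, chrX). A minimal sed sketch of the kind of conversion the Replace tools perform (note that contigs such as MT vs chrM need special handling):

```shell
# Toy BED file with Ensembl-style chromosome names in column 1.
printf '1\t100\t200\tfeature_a\nX\t300\t400\tfeature_b\n' > ensembl.bed

# Prefix each line with "chr" to match UCSC-style identifiers:
sed 's/^/chr/' ensembl.bed > ucsc.bed
```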

The methods listed below help to identify and correct errors or unexpected results when the underlying genome assembly build for all inputs are not identical.

Method 1: Finding BAM dataset identifiers

Method 2: Directly obtaining UCSC sourced genome identifiers

Method 3: Adjusting identifiers for UCSC sourced data used with other sourced data

Method 4: Adjusting identifiers or input source for any mixed sourced data

tip Reference data is self referential. More help for your genome, transcriptome, and annotation

tip Genome not available as a native index? Use a custom genome fasta and create a custom build database instead.

tip More notes on Native Reference Genomes

  • Native reference genomes (FASTA) are built as pre-computed indexes on the Galaxy server where you are working.
  • Different servers host both common and different reference genome data.
  • Most reference annotation (tabular, GTF, GFF3) is supplied from the history by the user, even when the genome is indexed.
  • Public Galaxy servers source reference genomes preferentially from UCSC.
  • A reference transcriptome (FASTA) is supplied from the history by the user.
  • Many experiments use a combination of all three types of reference data. Consider pre-preparing your files at the start!
  • The default variant for a native genome index is “Full”. Defined as: all primary chromosomes (or scaffolds/contigs) including mitochondrial plus associated unmapped, plasmid, and other segments.
  • When only one version of a genome is available for a tool, it represents the default “Full” variant.
  • Some genomes will have more than one variant available.
  • The “Canonical Male” or sometimes simply “Canonical” variant contains the primary chromosomes for a genome. For example a human “Canonical” variant contains chr1-chr22, chrX, chrY, and chrM.
  • The “Canonical Female” variant contains the primary chromosomes excluding chrY.

Moving datasets between Galaxy servers

On the origin Galaxy server:

  1. Click on the name of the dataset to expand the info.
  2. Click on the Copy link icon galaxy-link.

On the destination Galaxy server:

  1. Click on Upload data > Paste / Fetch Data and paste the link. Select attributes, such as genome assembly, if required. Hit the Start button.

Note: The copy link icon galaxy-link cannot be used to move HTML datasets (but this can be downloaded using the download button galaxy-save) and SQLite datasets.

Purging datasets

  1. All account Datasets can be reviewed under User > Datasets.
  2. To permanently delete: use the link from within the dataset, or use the Operations on Multiple Datasets functions, or use the Purge Deleted Datasets option in the History menu.

Notes:

  • Within a History, deleted/permanently deleted Datasets can be reviewed by toggling the deleted link at the top of the History panel, found immediately under the History name.
  • Both active (shown by default) and hidden (the other toggle link, next to the deleted link) datasets can be reviewed the same way.
  • Click on the far right “X” to delete a dataset.
  • Datasets in a deleted state are still part of your quota usage.
  • Datasets must be purged (permanently deleted) to not count toward quota.

Quotas for datasets and histories

  • Deleted datasets and deleted histories containing datasets are considered when calculating quotas.
  • Permanently deleted datasets and permanently deleted histories containing datasets are not considered.
  • Histories/datasets that are shared with you are only partially considered unless you import them.

Note: To reduce quota usage, refer to How can I reduce quota usage while still retaining prior work (data, tools, methods)? FAQ.

Renaming a dataset

  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, change the Name field
  • Click the Save button

Understanding job statuses

Job statuses will help you understand the stages of your work.

Compare the color of your datasets to these job processing stages.

  • Grey: The job is queued. Allow this to complete!
  • Yellow: The job is executing. Allow this to complete!
  • Green: The job has completed successfully.
  • Red: The job has failed. Check your inputs and parameters with Help examples and GTN tutorials. Scroll to the bottom of the tool form to find these.
  • Light Blue: The job is paused. This indicates either an input has a problem or that you have exceeded the disk quota set by the administrator of the Galaxy instance you are working on.
  • Grey, Yellow, Grey again: The job is waiting to run due to admin re-run or an automatic fail-over to a longer-running cluster.

galaxy-info Don’t lose your queue placement! It is essential to allow queued jobs to remain queued, and to never interrupt an executing job. If you delete/re-run jobs, they are added back to the end of the queue again.

Related FAQs

Working with GFF GTF GTF2 GFF3 reference annotation

  • All annotation datatypes have a distinct format and content specification.
    • Data providers may release variations of any, and tools may produce variations.
    • GFF3 data may be labeled as GFF.
    • Content can overlap but is generally not understood by tools that are expecting just one of these specific formats.
  • Best practices
    • The sequence identifiers must exactly match between the reference annotation and the reference genome, transcriptome, or exome.
    • Most tools expect GTF format unless the tool form specifically notes otherwise.
      • Get the GTF version from the data providers if it is available.
      • If only GFF3 is available, you can attempt to transform it with the tool gffread.
    • Was GTF data detected as GFF during Upload? It probably has headers. Remove the headers (lines that start with a “#”) with the Select tool using the option “NOT Matching” with the regular expression: ^#
    • UCSC annotation
      • Find annotation under their Downloads area. The path will be similar to: https://hgdownload.soe.ucsc.edu/goldenPath/<database>/bigZips/genes/
      • Copy the URL from UCSC and paste it into the Upload tool, allowing Galaxy to detect the datatype.
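
The same header removal (dropping lines that start with “#”) can be sketched with grep on the command line:

```shell
# GTF with a header line that can make Galaxy detect it as GFF.
# The annotation row here is a made-up illustration.
printf '#!genome-build GRCh38\nchr1\tsrc\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1";\n' > annotation.gtf

# Keep only non-header lines (same idea as Select "NOT Matching" ^#):
grep -v '^#' annotation.gtf > no_headers.gtf
```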

Working with deleted datasets

Deleted datasets and histories can be recovered by users as they are retained in Galaxy for a time period set by the instance administrator. Deleted datasets can be undeleted or permanently deleted within a History. Links to show/hide deleted (and hidden) datasets are at the top of the History panel.

  • To review or adjust an individual dataset:
    1. Click on the name to expand it.
    2. If it is only deleted, but not permanently deleted, you’ll see a message with links to recover or to purge.
      • Click on Undelete it to recover the dataset, making it active and accessible to tools again.
      • Click on Permanently remove it from disk to purge the dataset and remove it from the account quota calculation.
  • To review or adjust multiple datasets in batch:
    1. Click on the checked box icon galaxy-selector near the top left of the history panel (Select Items) to switch into “Operations on Multiple Datasets” mode.
    2. Check the selection box for each dataset you want to modify, then choose the desired operation (show, hide, delete, undelete, purge, or group datasets).

Working with very large fasta datasets

  • Run FastQC on your data to make sure the format/content is what you expect. Run more QA as needed.
    • Search GTN tutorials with the keyword “qa-qc” for examples.
    • Search Galaxy Help with the keywords “qa-qc” and “fasta” for more help.
  • Assembly result?
    • Consider filtering by length to remove reads that did not assemble.
    • Formatting criteria:
      • All sequence identifiers must be unique.
      • Some tools will require that there is no description line content, only identifiers, in the fasta title line (“>” line). Use NormalizeFasta to remove the description (all content after the first whitespace) and wrap the sequences to 80 bases.
  • Custom genome, transcriptome exome?
    • Only appropriate for smaller genomes (bacterial, viral, most insects).
    • Not appropriate for any mammalian genomes, or some plants/fungi.
    • Sequence identifiers must be an exact match with all other inputs or expect problems. See GFF GTF GFF3.
    • Formatting criteria:
      • All sequence identifiers must be unique.
      • ALL tools will require that there is no description content, only identifiers, in the fasta title line (“>” line). Use NormalizeFasta to remove the description (all content after the first whitespace) and wrap the sequences to 80 bases.
      • The only exception is when executing the MakeBLASTdb tool and when the input fasta is in NCBI BLAST format (see the tool form).
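
A sketch of the description stripping that NormalizeFasta performs, keeping only the identifier up to the first whitespace on each “>” title line:

```shell
# Toy fasta with description text after the identifiers.
printf '>seq1 some description\nACGTACGT\n>seq2 another description\nTTTT\n' > in.fasta

# On title lines print only the first field (the identifier);
# pass sequence lines through unchanged.
awk '/^>/ {print $1; next} {print}' in.fasta > clean.fasta
```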

Working with very large fastq datasets

  • Run FastQC on your data to make sure the format/content is what you expect. Run more QA as needed.
    • Search GTN tutorials with the keyword “qa-qc” for examples.
    • Search Galaxy Help with the keywords “qa-qc” and “fastq” for more help.
  • How to create a single smaller input. Search the tool panel with the keyword “subsample” for tool choices.
  • How to create multiple smaller inputs. Start with Split file to dataset collection, then merge the results back together using a tool specific for the datatype. Example: BAM results? Use MergeSamFiles.
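
As a toy illustration of subsampling (Galaxy’s “subsample” tools do proper random sampling; this sketch just keeps every other 4-line record deterministically):

```shell
# Four toy FASTQ records.
printf '@r1\nAAAA\n+\nIIII\n@r2\nCCCC\n+\nIIII\n@r3\nGGGG\n+\nIIII\n@r4\nTTTT\n+\nIIII\n' > big.fastq

# Keep records 1, 3, 5, ... (every other complete 4-line record).
awk 'int((NR-1)/4) % 2 == 0' big.fastq > sub.fastq
```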

Datatypes


Best practices for loading fastq data into Galaxy

  • As of release 17.09, fastq data will have the datatype fastqsanger auto-detected when that quality score scaling is detected and “autodetect” is used within the Upload tool. Compressed fastq data will be converted to uncompressed in the history.
  • To preserve fastq compression, directly assign the appropriate datatype (eg: fastqsanger.gz).
  • If the data is close to or over 2 GB in size, be sure to use FTP.
  • If the data was already loaded as fastq.gz, don’t worry! Just test the data for correct format (as needed) and assign the metadata type.

Compressed FASTQ files, (`*.gz`)

  • Files ending in .gz are compressed (zipped) files.
    • The fastq.gz format is a compressed version of a fastq dataset.
    • The fastqsanger.gz format is a compressed version of the fastqsanger datatype, etc.
  • Compression saves space (and therefore your quota).
  • Tools can accept the compressed versions of input files
  • Make sure the datatype (compressed or uncompressed) is correct for your files, or it may cause tool errors.
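
For example, on the command line (gzip/zcat here only illustrate the relationship between the compressed and uncompressed forms):

```shell
# A toy uncompressed FASTQ record.
printf '@r1\nACGT\n+\nIIII\n' > reads.fastq

# Compress it; -k keeps the original, -f overwrites any stale copy.
gzip -kf reads.fastq

# The compressed file holds the same records:
zcat reads.fastq.gz | head -1
```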

FASTQ files: `fastq` vs `fastqsanger` vs ..

FASTQ files come in various flavours. They differ in the encoding scheme they use. See our QC tutorial for a more detailed explanation of encoding schemes.

Nowadays, the most commonly used encoding scheme is sanger. In Galaxy, this is the fastqsanger datatype. If you are using older datasets, make sure to verify the FASTQ encoding scheme used in your data.

Be Careful: choosing the wrong encoding scheme can lead to incorrect results!

Tip: There are 2 Galaxy datatypes that have similar names but are not the same: please make sure not to confuse fastqsanger and fastqcssanger (note the additional cs, for color space).

Tip: When in doubt, choose fastqsanger

How do `fastq.gz` datasets relate to the `.fastqsanger` datatype metadata assignment?

Before assigning fastqsanger or fastqsanger.gz, be sure to confirm the format.

TIP:

  • Using non-fastqsanger scaled quality values will cause scientific problems with tools that expect fastqsanger formatted input.
  • Even if the tool does not fail, get the format right from the start to avoid problems. Incorrect format is still one of the most common reasons for tool errors or unexpected results (within Galaxy or not).
  • For more information, see “How to format fastq data for tools that require .fastqsanger format?” below.

How to format fastq data for tools that require .fastqsanger format?

  • Most tools that accept FASTQ data expect it to be in a specific FASTQ version: .fastqsanger. The .fastqsanger datatype must be assigned to each FASTQ dataset.

In order to do that:

  • Watch the FASTQ Prep Illumina video for a complete walk-through.
  • Run FastQC first to assess the type.
    • Run FASTQ Groomer if the data needs to have the quality scores rescaled.
    • If you are certain that the quality scores are already scaled to Sanger Phred+33 (the result of an Illumina 1.8+ pipeline), the datatype .fastqsanger can be directly assigned. Click on the pencil icon galaxy-pencil to reach the Edit Attributes form. In the center panel, click on the “Datatype” tab, enter the datatype .fastqsanger, and save.
  • Run FastQC again on the entire dataset if any changes were made to the quality scores for QA.

Other tips

  • If you are not sure what type of FASTQ data you have (maybe it is not Illumina?), see the help directly on the FASTQ Groomer tool for information about types.
    • For Illumina, first run FastQC on a sample of your data (how to read the full report). The output report will note the quality score type interpreted by the tool. If not .fastqsanger, run FASTQ Groomer on the entire dataset. If .fastqsanger, just assign the datatype.
    • For SOLiD, run NGS: Fastq manipulation → AB-SOLID DATA → Convert, to create a .fastqcssanger dataset. If you have uploaded a color space fastq sequence with quality scores already scaled to Sanger Phred+33 (.fastqcssanger), first confirm by running FastQC on a sample of the data. Then if you want to double-encode the color space into pseudo-nucleotide space (required by certain tools), see the instructions on the tool form Fastq Manipulation for the conversion.
    • If your data is FASTA, but you want to use tools that require FASTQ input, use the tool NGS: QC and manipulation → Combine FASTA and QUAL. This tool will create “placeholder” quality scores that fit your data. On the output, click on the pencil icon galaxy-pencil to reach the Edit Attributes form. In the center panel, click on the “Datatype” tab, enter the datatype .fastqsanger, and save.

Identifying and formatting Tabular Datasets

Format help for Tabular/BED/Interval Datasets

A Tabular datatype is human readable and has tabs separating data columns. Please note that tabular data is different from comma-separated data (.csv). Common tabular datatypes are .bed, .gtf, .interval, and .txt.

  1. Click the pencil icon galaxy-pencil to reach the Edit Attributes form.
    1. Change the datatype (3rd tab) and save.
    2. Label columns (1st tab) and save.
    3. Metadata will be assigned, then the dataset can be used.
  2. If the required input is a BED or Interval datatype, adjusting a tabular dataset (.tab → .bed, .tab → .interval) may be possible using a combination of Text Manipulation tools to create a dataset that matches the required specifications.
  3. Some tools require that BED format be followed, even if the datatype Interval (with less strict column ordering) is accepted on the tool form.
    • These tools will fail if they are run with malformed BED datasets or non-specific column assignments.
    • Solution: reorganize the data to be in BED format and rerun.

Understanding Datatypes

  • Allow Galaxy to detect the datatype during Upload, and adjust from there if needed.
  • Tool forms will filter for the appropriate datatypes it can use for each input.
  • Directly changing a datatype can lead to errors. Be intentional and consider converting instead when possible.
  • Dataset content can also be adjusted (tools: Data manipulation) and the expected datatype detected. Detected datatypes are the most reliable in most cases.
  • If a tool does not accept a dataset as valid input, it is not in the correct format with the correct datatype.
  • Once a dataset’s content matches the datatype, and that dataset is used repeatedly (example: reference annotation), use that same dataset for all steps in an analysis, or expect problems. This may mean rerunning prior tools if you need to make a correction.
  • Tip: Not sure what datatypes a tool is expecting for an input?
    1. Create a new empty history
    2. Click on a tool from the tool panel
    3. The tool form will list the accepted datatypes per input
  • Warning: In some cases, tools will transform a dataset to a new datatype at runtime for you.
    • This is generally helpful, and best reserved for smaller datasets.
    • Why? This can also unexpectedly create hidden datasets that are near duplicates of your original data, only in a different format.
    • For large data, that can quickly consume working space (quota).
    • Deleting/purging any hidden datasets can lead to errors if you are still using the original datasets as an input.
    • Consider converting to the expected datatype yourself when data is large.
    • Then test the tool directly on converted data. If it works, purge the original to recover space.

Using compressed fastq data as tool inputs

  • If the tool accepts fastq input, then .gz compressed data assigned to the datatype fastq.gz is appropriate.
  • If the tool accepts fastqsanger input, then .gz compressed data assigned to the datatype fastqsanger.gz is appropriate.
  • Using uncompressed fastq data is still an option with tools. The choice is yours.

TIP: Avoid labeling compressed data with an uncompressed datatype, and the reverse. Jobs using mismatched datatype versus actual format will fail with an error.


Deployment


Blank page or no CSS/JavaScript

This generally means that serving of static content is broken:

  • Check browser console for 404 errors.
  • Check proxy error log for permission errors.
  • Verify that your proxy static configuration is correct.
  • If you have recently upgraded Galaxy or changed the GUI in some way, you will need to rebuild the client.

Database Issues

For slow queries, start with EXPLAIN ANALYZE

However, it can be useful to dig into the queries with the Postgres EXPLAIN Visualizer (PEV) to get a clearer, more visual representation. (Try it with this demo data.)

You can set some options in the Galaxy configuration or database that will help debugging this:

  • database_engine_option_echo (but warning, extremely verbose)
  • slow_query_log_threshold logs to Galaxy log file
  • sentry_sloreq_threshold if using Sentry

Additionally, check that your database is running VACUUM regularly enough, and look at VACUUM ANALYZE.

There are some gxadmin query pg-* commands which can help you monitor and track this information.

Lastly, check your database settings! It might not have enough resources allocated. Check PGTune for some suggestions of optimised parameters.

Debugging tool errors

Tool stdout/stderr is available in UI under “i” icon on history dataset

  1. Set cleanup_job to onsuccess
  2. Cause a job failure
  3. Go to job working directory (find in logs or /data/jobs/<hash>/<job_id>)
  4. Poke around, try running things (srun --pty bash considered useful)

Familiarize yourself with the places Galaxy keeps things

Debugging tool memory errors

Often the tool output contains one of:

MemoryError                 # Python
what(): std::bad_alloc # C++
Segmentation Fault # C - but could be other problems too
Killed # Linux OOM Killer

Solutions:

  • Change input sizes or params
    • Map/reduce?
  • Decrease the amount of memory the tool needs
  • Increase the amount of memory available to the job
    • Request more memory from cluster scheduler
    • Use job resubmission to automatically rerun with a larger memory allocation
  • Cross your fingers and rerun the job

Galaxy UI is slow

There is a great Tutorial from @mvdbeek which we recommend you follow.

Additionally you can use py-spy to record the issue and generate a flame graph.

Tool missing from Galaxy

First, restart Galaxy and watch the log for lines like:

Loaded tool id: toolshed.g2.bx.psu.edu/repos/iuc/sickle/sickle/1.33, version: 1.33 into tool panel....

After startup, check integrated_tool_panel.xml for a line like the following to be sure it was loaded properly and added to the toolbox (if not, check the logs further)

<tool id="toolshed.g2.bx.psu.edu/repos/iuc/sickle/sickle/1.33" />

If it is a toolshed tool, check shed_tool_conf.xml for

<tool file="toolshed.g2.bx.psu.edu/repos/iuc/sickle/43e081d32f90/sickle/sickle.xml" guid="toolshed.g2.bx.psu.edu/repos/iuc/sickle/sickle/1.33">
...
</tool>

Additionally, if you have multiple job handlers, they sometimes (rarely) don’t all get the update. Just restart them if that’s the case. Alternatively, you can send an (authenticated) API request:

curl -X PUT https://galaxy.example.org/api/configuration

Using data source tools with Pulsar

Data source tools such as UCSC Main will fail if Pulsar is the default destination.

To fix this issue you can force individual tools to run on a specific destination or handler by adding to your job_conf file:

For job_conf.xml

<tools>
<tool id="ucsc_table_direct1" destination="my-local" />
</tools>

For job_conf.yml

tools:
- id: ucsc_table_direct1
handler: my-local

Diffs


How to read a Diff

If you haven’t worked with diffs before, this can be something quite new or different.

Suppose we have two versions of a grocery list, in two files. We’ll call them ‘old’ and ‘new’.

Input: Old
$ cat old
🍎
🍐
🍊
🍋
🍒
🥑
Output: New
$ cat new
🍎
🍐
🍊
🍋
🍍
🥑

We can see that they have some different entries: we’ve removed 🍒 because they’re awful, and replaced them with a 🍍.

Diff lets us compare these files

$ diff old new
5c5
< 🍒
---
> 🍍

Here we see that 🍒 is only in old, and 🍍 is only in new. But otherwise the files are identical.

There are a couple different formats to diffs, one is the ‘unified diff’

$ diff -U2 old new
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:06:36.340962616 +0100
@@ -3,4 +3,4 @@
🍊
🍋
-🍒
+🍍
🥑

This is basically what you see in the training materials which gives you a lot of context about the changes:

  • --- old is the ‘old’ file in our view
  • +++ new is the ‘new’ file
  • @@ -3,4 +3,4 @@ tells us where the change occurs: the hunk starts at line 3 and spans 4 lines in the old file (-3,4) and in the new file (+3,4).
  • Lines starting with a - are present in the ‘old’ file but removed in the ‘new’ one
  • Lines starting with a + have been added in the ‘new’ file

So when you go to apply these diffs to your files in the training:

  1. Ignore the header
  2. Remove lines starting with - from your file
  3. Add lines starting with + to your file

The other lines (🍊/🍋 and 🥑) above just provide “context”: they help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find the line with a 🍒 and replace it with a 🍍.
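
The three steps above can also be done mechanically with the standard patch tool. A minimal sketch, using plain words in place of the emoji (the file names and contents here are purely illustrative):

```shell
# Recreate the example: an old list and a new list that swaps one item.
printf 'apple\npear\ncherry\n'    > old
printf 'apple\npear\npineapple\n' > new

# diff exits with status 1 when the files differ, so tolerate that here.
diff -U2 old new > changes.diff || true

# Apply the diff to a copy of 'old'; the result should match 'new'.
cp old patched
patch -s patched < changes.diff
diff -q patched new && echo "patched file matches new"
```

Giving patch an explicit file argument makes it ignore the file names recorded in the diff header, which is handy when your local copy has a different name.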

Added & Removed Lines

Removals are very easy to spot, we just have removed lines

--- old	2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:10:14.370722802 +0100
@@ -4,3 +4,2 @@
🍋
🍒
-🥑

And additions likewise are very easy, just add a new line, between the other lines in your file.

--- old	2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:11:11.422135393 +0100
@@ -1,3 +1,4 @@
🍎
+🍍
🍐
🍊

Completely new files

Completely new files look a bit different: there the “old” file is /dev/null, the empty file on a Linux machine.

$ diff -U2 /dev/null old
--- /dev/null 2022-02-15 11:47:16.100000270 +0100
+++ old 2022-02-16 14:06:19.697132568 +0100
@@ -0,0 +1,6 @@
+🍎
+🍐
+🍊
+🍋
+🍒
+🥑

And removed files are similar, except with the new file being /dev/null

--- old	2022-02-16 14:06:19.697132568 +0100
+++ /dev/null 2022-02-15 11:47:16.100000270 +0100
@@ -1,6 +0,0 @@
-🍎
-🍐
-🍊
-🍋
-🍒
-🥑

Estimation of strandedness


In 'infer experiments' I get unequal numbers, but in the IGV it looks like it is unstranded. What does this mean?

Question: In 'infer experiments' I get unequal numbers, but in the IGV it looks like it is unstranded. What does this mean?

It’s often the case that elimination of the second strand is not perfect, and there are genuine cases of bidirectional transcription in the genome. A 70/30% split, as in your report, is not a good result for a stranded library. You can treat this as a stranded library in your analysis, but you couldn’t, for instance, conclude that a given gene is actually transcribed from the reverse strand. It is likely that the library preparation didn’t work perfectly. This can depend on many factors; one is that you need to completely digest your DNA using a high-quality DNase before doing the reverse transcription.

When is the "infer experiment" tool used in practice?

Question: When is the "infer experiment" tool used in practice?

Often you are already aware whether the RNA-seq data is stranded or not in the first place because you sequenced it yourself or ordered it from a company.

But it can happen in cases where you get the data from someone else, that this information is lost and you need to find out.


Features


Using the Window Manager to view multiple datasets

If you would like to view two or more datasets at once, you can use the Window Manager feature in Galaxy:

  1. Click on the Window Manager icon galaxy-scratchbook on the top menu bar.
    • You should see a little checkmark on the icon now
  2. View galaxy-eye a dataset by clicking on the eye icon galaxy-eye to view the output
    • You should see the output in a window overlayed over Galaxy
    • You can resize this window by dragging the bottom-right corner
  3. Click outside the file to exit the Window Manager
  4. View galaxy-eye a second dataset from your history
    • You should now see a second window with the new dataset
    • This makes it easier to compare the two outputs
  5. Repeat this for as many files as you would like to compare
  6. You can turn off the Window Manager galaxy-scratchbook by clicking on the icon again

Using the Scratchbook to view multiple datasets

If you would like to view two or more datasets at once, you can use the Scratchbook feature in Galaxy:

  1. Click on the Scratchbook icon galaxy-scratchbook on the top menu bar.
    • You should see a little checkmark on the icon now
  2. View galaxy-eye a dataset by clicking on the eye icon galaxy-eye to view the output
    • You should see the output in a window overlayed over Galaxy
    • You can resize this window by dragging the bottom-right corner
  3. Click outside the file to exit the Scratchbook
  4. View galaxy-eye a second dataset from your history
    • You should now see a second window with the new dataset
    • This makes it easier to compare the two outputs
  5. Repeat this for as many files as you would like to compare
  6. You can turn off the Scratchbook galaxy-scratchbook by clicking on the icon again

Why not use Excel?

Excel is a fantastic tool and a great place to build simple analysis models, but when it comes to scaling, Galaxy wins every time.

You could just as easily use Excel to answer the same question, and if the goal is to learn how to use a tool, then either tool would be great! But what if you are working on a question where your analysis matters? Maybe you are working with human clinical data trying to diagnose a set of symptoms, or you are working on research that will eventually be published and maybe earn you a Nobel Prize?

In these cases your analysis, and the ability to reproduce it exactly, is vitally important, and Excel won’t help you here. It doesn’t track changes and it offers very little insight to others on how you got from your initial data to your conclusions.

Galaxy, on the other hand, automatically records every step of your analysis. And when you are done, you can share your analysis with anyone. You can even include a link to it in a paper (or your acceptance speech). In addition, you can create a reusable workflow from your analysis that others (or yourself) can use on other datasets.

Another challenge with spreadsheet programs is that they don’t scale to support next generation sequencing (NGS) datasets, a common type of data in genomics, which often reach gigabytes or even terabytes in size. Excel has been used for large datasets, but you’ll often find that learning a new tool gives you significantly more ability to scale up and scale out your analyses.


Format


FASTQ format

Although it looks complicated (and maybe it is), the FASTQ format is easy to understand with a little decoding. Each read, representing a fragment of DNA, is encoded by 4 lines:

Line Description
1 Always begins with @ followed by the information about the read
2 The actual nucleic sequence
3 Always begins with a + and sometimes contains the same info as line 1
4 Has a string of characters which represent the quality scores associated with each base of the nucleic sequence; must have the same number of characters as line 2

So for example, the first sequence in our file is:

@03dd2268-71ef-4635-8bce-a42a0439ba9a runid=8711537cc800b6622b9d76d9483ecb373c6544e5 read=252 ch=179 start_time=2019-12-08T11:54:28Z flow_cell_id=FAL10820 protocol_group_id=la_trappe sample_id=08_12_2019
AGTAAGTAGCGAACCGGTTTCGTTTGGGTGTTTAACCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTCGTGCGGAAGGCGCTTCACCCAGGGCCTCTCATGCTTTGTCTTCCTGTTTATTCAGGATCGCCCAAAGCGAGAATCATACCACTAGACCACACGCCCGAATTATTGTTGCGTTAATAAGAAAAGCAAATATTTAAGATAGGAAGTGATTAAAGGGAATCTTCTACCAACAATATCCATTCAAATTCAGGCA
+
$'())#$$%#$%%'-$&$%'%#$%('+;<>>>18.?ACLJM7E:CFIMK<=@0/.4<9<&$007:,3<IIN<3%+&$(+#$%'$#$.2@401/5=49IEE=CH.20355>-@AC@:B?7;=C4419)*$$46211075.$%..#,529,''=CFF@:<?9B522.(&%%(9:3E99<BIL?:>RB--**5,3(/.-8B>F@@=?,9'36;:87+/19BAD@=8*''&''7752'$%&,5)AM<99$%;EE;BD:=9<@=9+%$

It means that the fragment named @03dd2268-71ef-4635-8bce-a42a0439ba9a (ID given in line1) corresponds to:

  • the DNA sequence AGTAAGTAGCGAACCGGTTTCGTTTGGGTGTTTAACCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTCGTGCGGAAGGCGCTTCACCCAGGGCCTCTCATGCTTTGTCTTCCTGTTTATTCAGGATCGCCCAAAGCGAGAATCATACCACTAGACCACACGCCCGAATTATTGTTGCGTTAATAAGAAAAGCAAATATTTAAGATAGGAAGTGATTAAAGGGAATCTTCTACCAACAATATCCATTCAAATTCAGGCA (line2)
  • this sequence has been sequenced with a quality $'())#$$%#$%%'-$&$%'%#$%('+;<>>>18.?ACLJM7E:CFIMK<=@0/.4<9<&$007:,3<IIN<3%+&$(+#$%'$#$.2@401/5=49IEE=CH.20355>-@AC@:B?7;=C4419)*$$46211075.$%..#,529,''=CFF@:<?9B522.(&%%(9:3E99<BIL?:>RB--**5,3(/.-8B>F@@=?,9'36;:87+/19BAD@=8*''&''7752'$%&,5)AM<99$%;EE;BD:=9<@=9+%$ (line 4).
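
The four-line structure makes FASTQ straightforward to parse. A minimal sketch in Python (the read_fastq helper is our own illustration, not a Galaxy tool):

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # line 3: the '+' separator
        qual = handle.readline().rstrip("\n")
        # line 4 must have exactly one quality character per base in line 2
        assert len(qual) == len(seq), "malformed record: " + header
        yield header[1:].split()[0], seq, qual

record = next(read_fastq(io.StringIO("@read1 some description\nAGTA\n+\n$'()\n")))
```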

But what does this quality score mean?

The quality score for each sequence is a string of characters, one for each base of the nucleotide sequence, used to characterize the probability of misidentification of each base. The score is encoded using the ASCII character table (with some historical differences):

Encoding of the quality score with ASCII characters for different Phred encodings. The ASCII code sequence is shown at the top, with symbols for 33 to 64, upper-case letters, more symbols, and then lower-case letters. Sanger maps from 33 to 73, while Solexa is shifted, starting at 59 and going to 104. Illumina 1.3 starts at 54 and goes to 104; Illumina 1.5 is shifted three scores to the right but still ends at 104. Illumina 1.8+ goes back to the Sanger encoding, except one single score wider.

So there is an ASCII character associated with each nucleotide, representing its Phred quality score, the probability of an incorrect base call:

Phred Quality Score Probability of incorrect base call Base call accuracy
10 1 in 10 90%
20 1 in 100 99%
30 1 in 1000 99.9%
40 1 in 10,000 99.99%
50 1 in 100,000 99.999%
60 1 in 1,000,000 99.9999%
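
In the Sanger / Illumina 1.8+ encoding, the Phred score of a base is simply the ASCII code of its quality character minus 33. A small illustrative sketch in Python:

```python
def phred33_scores(quality_string):
    """Decode a Sanger (Phred+33) quality string into integer scores."""
    return [ord(char) - 33 for char in quality_string]

# '$' is ASCII 36 -> score 3; 'I' is ASCII 73 -> score 40
scores = phred33_scores("$I")
```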

Kraken2 and the k-mer approach for taxonomy classification

In the \(k\)-mer approach for taxonomy classification, we use a database containing DNA sequences of genomes whose taxonomy we already know. On a computer, the genome sequences are broken into short pieces of length \(k\) (called \(k\)-mers), usually 30bp.

Kraken examines the \(k\)-mers within the query sequence, searches for them in the database, and maps each \(k\)-mer to the lowest common ancestor (LCA) of all genomes known to contain it; the positions of these LCA taxa within the taxonomy tree are then used to classify the query at its most probable position.
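
The general idea can be sketched in a few lines of Python (this is a toy illustration, not Kraken’s actual implementation; the tiny database and taxon names below are made up):

```python
from collections import Counter

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical database: k-mer -> LCA taxon of all genomes containing it
db = {"ATG": "Escherichia", "TGC": "Escherichia", "GCA": "Enterobacteriaceae"}

def classify(seq, k=3):
    """Vote among the taxa of the query's k-mers found in the database."""
    hits = [db[m] for m in kmers(seq, k) if m in db]
    return Counter(hits).most_common(1)[0][0] if hits else "unclassified"
```

Real classifiers work with much longer k-mers, minimizers, and a proper LCA over the taxonomy tree rather than a simple majority vote, but the lookup-and-vote shape is the same.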

Kraken2

Kraken2 uses a compact hash table, a probabilistic data structure that allows for faster queries and lower memory requirements. It applies a spaced seed mask of s spaces to the minimizer and calculates a compact hash code, which is then used as a search query in its compact hash table; the lowest common ancestor (LCA) taxon associated with the compact hash code is then assigned to the k-mer.

You can find more information about the Kraken2 algorithm in the paper Improved metagenomic analysis with Kraken 2.

Quality Scores

But what does this quality score mean?

The quality score for each sequence is a string of characters, one for each base of the nucleotide sequence, used to characterize the probability of misidentification of each base. The score is encoded using the ASCII character table (with some historical differences):

To save space, the sequencer records an ASCII character to represent scores 0-42. For example 10 corresponds to “+” and 40 corresponds to “I”. FastQC knows how to translate this. This is often called “Phred” scoring.

Encoding of the quality score with ASCII characters for different Phred encodings. The ASCII code sequence is shown at the top, with symbols for 33 to 64, upper-case letters, more symbols, and then lower-case letters. Sanger maps from 33 to 73, while Solexa is shifted, starting at 59 and going to 104. Illumina 1.3 starts at 54 and goes to 104; Illumina 1.5 is shifted three scores to the right but still ends at 104. Illumina 1.8+ goes back to the Sanger encoding, except one single score wider.

So there is an ASCII character associated with each nucleotide, representing its Phred quality score, the probability of an incorrect base call:

Phred Quality Score Probability of incorrect base call Base call accuracy
10 1 in 10 90%
20 1 in 100 99%
30 1 in 1000 99.9%
40 1 in 10,000 99.99%
50 1 in 100,000 99.999%
60 1 in 1,000,000 99.9999%

What does 0-42 represent? These numbers, when plugged into a formula, tell us the probability of an error for that base. This is the formula, where Q is our quality score (0-42) and P is the probability of an error:

Q = -10 log10(P)

Using this formula, we can calculate that a quality score of 40 means only a 0.0001 probability of an error!
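
Rearranging the formula gives P = 10^(-Q/10), which you can check directly. A quick Python sketch:

```python
import math

def error_probability(q):
    """Invert Q = -10 * log10(P) to get the error probability P."""
    return 10 ** (-q / 10)

def phred_score(p):
    """Phred quality score for an error probability P."""
    return -10 * math.log10(p)

p40 = error_probability(40)  # 0.0001, i.e. 99.99% base call accuracy
```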

What is Taxonomy?

Taxonomy is the practice of naming, defining (circumscribing) and classifying groups of biological organisms based on shared characteristics such as morphological characteristics, phylogenetic characteristics, DNA data, etc. It is founded on the concept that the similarities descend from a common evolutionary ancestor.

Defined groups of organisms are known as taxa. Taxa are given a taxonomic rank and are aggregated into super groups of higher rank to create a taxonomic hierarchy. The taxonomic hierarchy includes eight levels: Domain, Kingdom, Phylum, Class, Order, Family, Genus and Species.

Example of taxonomy. It starts, top to bottom, with Kingdom "Animalia", Phylum "Chordata", Class "Mammalia", and Order "Carnivora". Then it splits in 3. On the left, Family "Felidae", with 2 genus "Felis" and "Panthera" and below 3 species "F. catus" and "F. pardalis" below "Felis", "P. pardus" below "Panthera". In the middle, Family "Canidae", genus "Canis" and 2 species "C. familiaris" and "C. lupus". On the right, Family "Ursidae", Genus "Ursus" and 2 species "U. arctos" and "U. horribilus". Below each species is a illustration of the species

The classification system begins with 3 domains that encompass all living and extinct forms of life:

  • The Bacteria and Archaea are mostly microscopic, but quite widespread.
  • Domain Eukarya contains more complex organisms.

When new species are found, they are assigned into taxa in the taxonomic hierarchy. For example for the cat:

Level Classification
Domain Eukaryota
Kingdom Animalia
Phylum Chordata
Class Mammalia
Order Carnivora
Family Felidae
Genus Felis
Species F. catus

From this classification, one can generate a tree of life, also known as a phylogenetic tree. It is a rooted tree that describes the relationship of all life on earth. At the root sits the “last universal common ancestor”, and the three main branches (in taxonomy also called domains) are bacteria, archaea and eukaryotes. Most important for this is the idea that all life on earth is derived from a common ancestor, and therefore, when comparing two species, you will, sooner or later, find a common ancestor for both of them.

Let’s explore taxonomy in the Tree of Life, using Lifemap


Further reading


Where can I read more about this analysis?

Question: Where can I read more about this analysis?

This tutorial was adapted from the mothur MiSeq SOP created by the Schloss lab. Here you can find more information about the mothur tools and file formats. Their FAQ page and Help Forum are also quite useful!

Where can I read more about this analysis?

Question: Where can I read more about this analysis?

This tutorial was adapted from the mothur MiSeq SOP created by the Schloss lab. Here you can find more information about the mothur tools and file formats. Their FAQ page and Help Forum are also quite useful!


Galaxy


How many mules?

Start with 2 and add more as needed. If you notice that your jobs seem to inexplicably sit for a long time before being dispatched to the cluster, or after they have finished on the cluster, you may need additional handlers.


Galaxy admin interface


Install tools via the Admin UI

  1. Open Galaxy in your browser and type `` in the tool search box on the left. If “” is among the search results, you can skip the following steps.
  2. Access the Admin menu from the top bar (you need to be logged-in with an email specified in the admin_users setting)
  3. Click “Install and Uninstall”, which can be found on the left, under “Tool Management”
  4. Enter `` in the search interface
  5. Click on the first hit, having devteam as owner
  6. Click the “Install” button for the latest revision
  7. Enter “” as the target section and click “OK”.

Gat


Time to git commit

Hands-on: Time to git commit

It’s time to commit your work! Check the status with

git status

Add your changed files with

git add ... # any files you see that are changed

And then commit it!

git commit -m 'Finished '

Using Git With Ansible Vaults

Hands-on: Using Git With Ansible Vaults

When looking at git log to see what you changed, you cannot easily inspect changes to Ansible Vault files: you only see changes to the encrypted versions, which is unpleasant to read.

Instead we can use .gitattributes to tell git that we want to use a different program to visualise differences between two versions of a file, namely ansible-vault.

  1. Check your git log -p and see how the Vault changes look (you can type /vault to search). Notice that they’re just changed encoded content.
  2. Create the file .gitattributes in the same folder as your galaxy.yml playbook, with the following contents:

    group_vars/secret.yml diff=ansible-vault merge=binary
  3. Try again to git log -p and look for the vault changes. Note that you can now see the decrypted content! Very useful.

Histories


Renaming a history

  1. Click on Unnamed history (or the current name of the history you are working on) (Click to rename history) at the top of your history panel
  2. Type the new name
  3. Press Enter

Creating a new history

Histories are an important part of Galaxy; most people use a new history for every new analysis. Always make sure to give your histories good names, so you can easily find your results later.

Click the new-history icon at the top of the history panel.


Histories


Sharing a history

You can share your work in Galaxy. There are several ways to give other users access to your histories.

Sharing your history allows others to import and access the datasets, parameters, and steps of your history.

  1. Share via link
    • Open the History Options galaxy-gear menu (gear icon) at the top of the history panel
      • galaxy-toggle Make History Accessible
      • A Share Link will appear that you can give to other users.
    • Anybody who has this link can view and copy your history.
  2. Publish your history
    • galaxy-toggle Make History Publicly Available in Published Histories
    • Anybody on this Galaxy server will be able to see your history under the Shared Data menu
  3. Share only with another user.
    • Click the Share with a user button at the bottom
    • Enter an email address for the user you want to share with
    • Your history will be shared only with this user.
  4. Finding histories others have shared with me
    • Click on the User menu in the top bar
    • Select Histories shared with me
    • Here you will see all histories that others have shared with you directly
    • Note: if you want to make changes to your history without affecting the shared version, make a copy via the galaxy-gear History options menu in your history by clicking Copy

Copy a dataset between histories

Sometimes you may want to use a dataset in multiple histories. You do not need to re-upload the data, but you can copy datasets from one history to another.

There are 3 ways to copy datasets between histories:

  1. From the original history

    1. Click on the galaxy-gear icon which is on the top of the list of datasets in the history panel
    2. Click on Copy Datasets
    3. Select the desired files

    4. Give a relevant name to the “New history”

    5. Validate by ‘Copy History Items’
    6. Click on the new history name in the green box that has just appeared to switch to this history
  2. Using the galaxy-columns Show Histories Side-by-Side

    1. Click on the galaxy-dropdown dropdown arrow top right of the history panel (History options)
    2. Click on galaxy-columns Show Histories Side-by-Side
    3. If your target history is not present
      1. Click on ‘Select histories’
      2. Click on your target history
      3. Validate by ‘Change Selected’
    4. Drag the dataset to copy from its original history
    5. Drop it in the target history
  3. From the target history

    1. Click on User in the top bar
    2. Click on Datasets
    3. Search for the dataset to copy
    4. Click on its name
    5. Click on Copy to current History
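The same copy can also be scripted through the Galaxy API. The sketch below is a hypothetical illustration: the endpoint path (`/api/histories/{id}/contents`), the `source`/`content` payload, and the `x-api-key` header reflect my understanding of the Galaxy API, and the server URL, key, and IDs are placeholders, so verify the details against your server's API documentation before use.

```python
# Hypothetical sketch: copy an existing dataset (HDA) into another history
# via the Galaxy API, without re-uploading the data. All IDs and the API key
# below are placeholders.
import json
import urllib.request

def build_copy_request(server, api_key, dataset_id, target_history_id):
    """Build the request that copies dataset `dataset_id` into `target_history_id`."""
    payload = {"source": "hda", "content": dataset_id}
    return urllib.request.Request(
        f"{server}/api/histories/{target_history_id}/contents",
        data=json.dumps(payload).encode(),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_copy_request("https://usegalaxy.org", "API_KEY",
                             "DATASET_ID", "TARGET_HISTORY_ID")
    urllib.request.urlopen(req)  # performs the copy on the server
```

Because the dataset is copied by reference on the server side, this does not duplicate the data on disk, mirroring the UI behaviour described above.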

Creating a new history

Histories are an important part of Galaxy, most people use a new history for every new analysis. Always make sure to give your histories good names, so you can easily find your results back later.

Click the new-history icon at the top of the history panel:

UI for creating new history

Creating a new history

Histories are an important part of Galaxy; most people use a new history for every new analysis. Always make sure to give your histories good names, so you can easily find your results later.

Click the new-history icon at the top of the history panel.

If the new-history icon is missing:

  1. Click on the galaxy-gear (History options) icon at the top of the history panel
  2. Select the Create New option from the menu

Downloading histories

  1. Click on the gear icon galaxy-gear on the top of the history panel.
  2. Select “Export History to File” from the History menu.
  3. Click on the “Click here to generate a new archive for this history” text.
  4. Wait for the Galaxy server to prepare history for download.
  5. Click on the generated link to download the history.

Find all Histories and purge (aka permanently delete)

  1. Login to your Galaxy account.
  2. On the top navigation bar Click on User.
  3. On the drop down menu that appears Click on Histories.
  4. Click on Advanced Search, additional fields will be displayed.
  5. Next to the Status field, click All, a list of all histories will be displayed.
  6. Check the box next to Name in the displayed list to select all histories.
  7. Click Delete Permanently to purge all histories.
  8. A pop up dialogue box will appear letting you know history contents will be removed and cannot be undone, then click OK to confirm.
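The purge steps above can also be done programmatically through the Galaxy API. This is a hedged sketch, not a definitive recipe: the `/api/histories` endpoint and the `x-api-key` header match my understanding of the API, but check your server's API docs before running it, and remember that purging is irreversible.

```python
# Hypothetical sketch: purge all histories via the Galaxy API.
# Endpoint paths are assumptions; the server URL and API key are placeholders.
import json
import urllib.request

GALAXY_URL = "https://usegalaxy.org"  # placeholder: your Galaxy server
API_KEY = "YOUR_API_KEY"              # from User > Preferences > Manage API Key

def api_request(path, method="GET", body=None):
    """Build an authenticated request against the Galaxy API."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        f"{GALAXY_URL}{path}",
        data=data,
        method=method,
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    )

def purge_all_histories():
    """List every active history and permanently delete (purge) it."""
    with urllib.request.urlopen(api_request("/api/histories")) as resp:
        histories = json.load(resp)
    for history in histories:
        # DELETE with purge=true removes the history contents for good
        urllib.request.urlopen(
            api_request(f"/api/histories/{history['id']}?purge=true",
                        method="DELETE"))

if __name__ == "__main__":
    purge_all_histories()  # double-check before running: cannot be undone
```

As in the UI flow, purged history contents cannot be restored, so treat this as a last-resort cleanup script.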

Finding Histories

  1. To review all histories in your account, go to User > Histories in the top menu bar.
  2. At the top of the History listing, click on Advanced Search.
  3. Set the status to all to view all of your active, deleted, and permanently deleted (purged) histories.
  4. Histories in all states are listed for registered accounts, meaning you will always find your data here if it ever appears to be “lost”.
  5. Note: Permanently deleted (purged) Histories may be fully removed from the server at any time. The data content inside the History is always removed at the time of purging (by a double-confirmed user action), but the purged History artifact may still be in the listing. Purged data content cannot be restored, even by an administrator.

Finding and working with "Histories shared with me"

How to find and work on histories shared with you

To find histories shared with me:

  1. Log into your account.
  2. Select User, in the drop-down menu, select Histories shared with me.

To work with shared histories:

  • Import the History into your account via copying it to work with it.
  • Unshare Histories that you no longer want shared with you or that you have already made a copy of.

Note: Shared Histories (whether copied into your account or not) count in part toward your total account data quota usage. More details on how shared histories affect account quota usage can be found at this link.

How to set Data Privacy Features?

Privacy controls are only enabled if desired. Otherwise, datasets remain private and unlisted in Galaxy by default. This means that a dataset you’ve created is virtually invisible until you publish a link to it.

Below are three optional ways to set up private histories; use whichever option matches what you want to achieve:

  1. Changing the privacy settings of an individual dataset.

    • Click on the dataset name to expand it.
    • Click the galaxy-pencil pencil icon.
    • Go to the Permissions tab.
    • The Permissions tab has two input fields.
    • In the second field, labeled access, search for the name of the user to grant permission to.
    • Click Save permissions.

    gif of the process described above, in Galaxy

    Note: Adding additional roles to the ‘access’ permission along with your “private role” does not do what you may expect. Since roles are always logically added together, only you will be able to access the dataset, since only you are a member of your “private role”.

  2. Make all datasets in the current history private.

    • Open the History Options galaxy-gear menu galaxy-gear at the top of your history panel
    • Click the Make Private option in the dropdown menu available
    • Sets the default settings for all new datasets in this history to private.

    gif of the process described above, in Galaxy

  3. Set the default privacy settings for new histories

    • Click the User button in the top bar for a dropdown galaxy-dropdown
    • Click on Preferences in the dropdown galaxy-dropdown
    • Select Set Dataset Permissions for New Histories icon cofest
    • Add a permission and click Save permissions

    gif of the process described above, in Galaxy

    Note: Changes made here will only affect histories created after these settings have been stored.

Importing a history

  1. Open the link to the shared history
  2. Click on the new-history Import history button on the top right
  3. Enter a title for the new history
  4. Click on Import

Renaming a history

  1. Click on galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”)
  2. Type the new name
  3. Click on Save

If you do not have the galaxy-pencil (Edit) next to the history name:

  1. Click on Unnamed history (or the current name of the history) (Click to rename history) at the top of your history panel
  2. Type the new name
  3. Press Enter

Searching your history

To make it easier to find datasets in large histories, you can filter your history by keywords as follows:

  1. Click on the search datasets box at the top of the history panel.

    history search box

  2. Type a search term in this box
    • For example a tool name, or sample name
  3. To undo the filtering and show your full history again, press on the clear search button galaxy-clear next to the search box

Sharing your History

You can share your work in Galaxy. There are various ways you can give access to one of your histories to other users.

Sharing your history allows others to import and access the datasets, parameters, and steps of your history.

Access the history sharing menu via the History Options dropdown (galaxy-history-options), and clicking “history-share Share or Publish”

  1. Share via link
    • Open the History Options galaxy-history-options menu at the top of your history panel and select “history-share Share or Publish”
      • galaxy-toggle Make History accessible
      • A Share Link will appear that you give to others
    • Anybody who has this link can view and copy your history
  2. Publish your history
    • galaxy-toggle Make History publicly available in Published Histories
    • Anybody on this Galaxy server will see your history listed under the Shared Data menu
  3. Share only with another user.
    • Click the Share with a user button at the bottom
    • Enter an email address for the user you want to share with
    • Your history will be shared only with this user.
  4. Finding histories others have shared with me
    • Click on User menu on the top bar
    • Select Histories shared with me
    • Here you will see all the histories others have shared with you directly

Note: If you want to make changes to your history without affecting the shared version, make a copy by going to History Options galaxy-history-options icon in your history and clicking Copy this History

Transfer entire histories from one Galaxy server to another

Transfer a Single Dataset

At the sender Galaxy server, set the history to a shared state, then directly capture the galaxy-link link for a dataset and paste the URL into the Upload tool at the receiver Galaxy server.

Transfer an Entire History

Have an account at two different Galaxy servers, and be logged into both.

At the sender Galaxy server

  1. Navigate to the history you want to transfer, and set the history to a shared state.
  2. Click into the History Options menu in the history panel.
  3. Select from the menu galaxy-history-archive Export History to File.
  4. For How do you want to export this History?, choose to direct download.
  5. Click on Generate direct download.
  6. Allow the archive generation process to complete. *
  7. Copy the galaxy-link link for your new archive.

At the receiver Galaxy server

  1. Confirm that you are logged into your account.
  2. Click on Data in the top menu, and choose Histories to reach your Saved Histories.
  3. Click on Import history in the grey button on the top right.
  4. Paste in your link’s URL from step 7.
  5. Click on Import History.
  6. Allow the archive import process to complete. *
  7. The transferred history will be uncompressed and added to your Saved Histories.

* For the two starred steps: it is OK to navigate away to other tasks during processing. If enabled, Galaxy will send you status notifications.

tip If the history to transfer is large, you may copy just your important datasets into a new history, and create the archive from that new smaller history. Clearing away deleted and purged datasets will make all histories smaller and faster to archive and transfer!
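The single-dataset transfer recipe above can also be sketched as a script: build the dataset's download link on the sender and ask the receiver Galaxy to fetch it. This is a hypothetical illustration; the endpoint paths (`/api/datasets/{id}/display`, `/api/tools/fetch`), the payload shape, and all URLs, keys, and IDs are assumptions, so confirm them against your servers' API documentation.

```python
# Hypothetical sketch: move one dataset between two Galaxy servers by URL.
# All identifiers below are placeholders.
import json
import urllib.request

def dataset_download_url(sender_url, dataset_id):
    """Direct-download link for a dataset in a shared/accessible history."""
    return f"{sender_url}/api/datasets/{dataset_id}/display"

def fetch_into_history(receiver_url, api_key, history_id, source_url):
    """Ask the receiver Galaxy to fetch `source_url` into `history_id`."""
    payload = {
        "history_id": history_id,
        "targets": [{
            "destination": {"type": "hdas"},
            "items": [{"src": "url", "url": source_url}],
        }],
    }
    request = urllib.request.Request(
        f"{receiver_url}/api/tools/fetch",
        data=json.dumps(payload).encode(),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(request)

if __name__ == "__main__":
    link = dataset_download_url("https://usegalaxy.eu", "DATASET_ID")
    fetch_into_history("https://usegalaxy.org", "API_KEY", "HISTORY_ID", link)
```

This mirrors pasting the sender's dataset link into the receiver's Upload tool; the sender history must be in a shared/accessible state for the link to resolve.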

Undeleting history

Undelete your deleted histories
  • Click on User then select Histories
  • Click on Advanced search on the top left side below Saved Histories
  • On Status click Deleted
  • Select the history you want to undelete using the checkbox on the left side
  • Click Undelete button below the deleted histories

Unsharing unwanted histories

  • All account Histories owned by others but shared with you can be reviewed under User > Histories shared with me.
  • The other person does not need to unshare a history with you. Unshare histories yourself on this page using the pull-down menu per history.
  • Dataset and History privacy options, including sharing, can be set under User > Preferences.

Three key features to work with shared data are:

  • View is a review feature. The data cannot be worked with, but many details, including tool and dataset metadata/parameters, are included.
  • Copy those you want to work with. This will increase your quota usage. This will also allow you to manipulate the datasets or the history independently from the original owner. All History/Dataset functions are available if the other person granted full access to the datasets to you.
  • Unshare any on the list that you no longer need. After a history is copied, you keep your version of it, even if it is later unshared or the other person who shared it changes their version. Each account’s version of a History and the Datasets in it are distinct. (If the Datasets themselves were not shared, you will only be able to “view” them, not work with or download them.)

Note: “Histories shared with me” account for only a tiny part of your quota usage. Unsharing will not significantly reduce quota usage unless hundreds (or more!) of large histories are shared. If you share a History with someone else, that does not increase or decrease your quota usage.

View a list of all histories

This FAQ demonstrates how to list all histories for a given user

There are multiple ways in which you can view your histories:

  1. Viewing histories using the switch-histories “Switch to history” button. This is best for quickly switching between multiple histories.

    Click the “Switch history” icon at the top of the history panel to bring up a list of all your histories: Listing histories using the "Switch history" button

  2. Using the “Activity Bar”:

    Click the “Show all histories” button within the Activity Bar on the left: Listing histories using Activity Bar

  3. Using “Data” drop-down:

    Click the “Data” link on the top bar of Galaxy interface and select “Histories”: Listing histories using "Data" menu

  4. Using the Multi-view, which is best for moving datasets between histories:

    Click the galaxy-history-options menu, and select galaxy-multihistory Show histories side-by-side

View histories side-by-side

This FAQ demonstrates how to view histories side-by-side

You can view multiple Galaxy histories at once. This allows you to better understand your analyses and also makes it possible to drag datasets between histories. This is called “History multiview”. The multiview can be enabled either via the History menu or via the Activity Bar:

  1. Enabling Multiview via the History menu is done by first clicking on the galaxy-history-options “History options” drop-down and selecting the galaxy-multihistory “Show Histories Side-by-Side” option:

    Enabling side-by-side view using History Options menu

  2. Clicking the galaxy-multihistory “History Multiview” button within the Activity Bar:

    Enabling side-by-side view using Activity Bar


History


My jobs are not running / I cannot see the history overview menu

Please make sure you are logged in. At the top menu bar, you should see a section labeled “User”. If you see “Login/Register” here you are not logged in.


Igv


Add Mapped reads track to IGV from Galaxy

  1. Install IGV (if not already installed)
  2. Launch IGV on your computer
  3. Check if the reference genome is available on the IGV instance
  4. Expand the BAM dataset with the mapped reads in the history
  5. Click on the local in display with IGV to load the reads into the IGV browser
  6. Switch to the IGV instance

    The mapped reads track should appear. Be sure that all files have the same genome ID

Add genome and annotations to IGV from Galaxy

  1. Upload a FASTA file with the reference genome and a GFF3 file with its annotation in the history (if not already there)
  2. Install IGV (if not already installed)
  3. Launch IGV on your computer
  4. Expand the FASTA dataset with the genome in the history
  5. Click on the local in display with IGV to load the genome into the IGV browser
  6. Wait until all Dataset status are ok
  7. Close the window

    An alert ERROR Parameter "file" is required may appear. Ignore it.

  8. Expand the GFF3 dataset with the annotations of the genome in the history
  9. Click on the local in display with IGV to load the annotation into the IGV browser
  10. Switch to the IGV instance

    The annotation track should appear. Be careful that all files have the same genome ID


Inputs


Do I need to create collections to run MaxQuant analysis or can I use single sample inputs?

Question: Do I need to create collections to run MaxQuant analysis or can I use single sample inputs?

Collections are not necessary to run MaxQuant, but they keep the history cleaner and easier to navigate. The multiple datasets option allows you to select multiple files that are not part of a collection, and will give the same result as using a collection as input.

Do we need a contaminant FASTA for MQ in galaxy?

Question: Do we need a contaminant FASTA for MQ in galaxy?

Normally MaxQuant has a default contaminant FASTA that you don’t have to input yourself. MaxQuant in Galaxy comes with an option to add contaminants automatically, so you do not need to add contaminants to the FASTA file.

Do you need to merge the databases? Because you can select multiple fasta files in MaxQuant.

Question: Do you need to merge the databases? Because you can select multiple fasta files in MaxQuant.

For MaxQuant you do not need to merge the databases; MaxQuant also offers a function to add common contaminants to the provided FASTA.


Instructors


How do I get help?

The support channel for instructors is the same as for individual learners. We suggest you start by posting a question to the Galaxy Training Network Gitter chat. Anyone can view the discussion, but you’ll need to login (using your GitHub or Twitter account) to add to the discussion.

If you have questions about Galaxy in general (that are not training-centric) then there are several support options.

What Galaxy instance should I use for my training?

To teach the hands-on tutorials you need a Galaxy server to run the examples on.

Each tutorial is annotated with information about which public Galaxy servers it can be run on. These servers are available to anyone on the world wide web and some may have all the tools that are needed by a specific tutorial. If you choose this option then you should work with that server’s admins to confirm that the server can handle the workload for a workshop. The usegalaxy.eu server is one example.

If your organization/consortia/community has its own Galaxy server, then you may want to run tutorials on that. This can be ideal because then the instance you are teaching on is the same as your participants will be using after the training. They’ll also be able to revisit any analysis they did during the training. If you pursue this option you’ll need to work with your organization’s Galaxy Admins to confirm that

  • the server can support a room full of people all doing the same analysis at the same time.
  • all tools and reference datasets needed in the tutorial are locally installed. To learn how to setup a Galaxy instance for a tutorial, you can follow our dedicated tutorial.
  • all participants will be able to create/use accounts on the system.

Some training topics have a Docker image that can be installed and run on all participants’ laptops. These images contain Galaxy instances that include all tools and datasets used in a tutorial, as well as saved analyses and repeatable workflows that are relevant.

Finally, you can also run your tutorials on cloud-based infrastructures. Galaxy is available on many national research infrastructures such as Jetstream (United States), GenAP (Canada), GVL (Australia), CLIMB (United Kingdom), and more.

What are the best practices for teaching with Galaxy?

We started to collect some best practices for instructors inside our Good practices slides

Where do I start?

Spend some time exploring the different tutorials and the different resources that are available. Become familiar with the structure of the tutorials and think about how you might use them in your teaching.


Interactive tools


Knitting RMarkdown documents in RStudio

Hands-on: Knitting RMarkdown documents in RStudio

One of the other nice features of RMarkdown documents is that they can produce lovely, presentation-quality documents. You can take, for example, a tutorial and produce a nice report-like output as an HTML, PDF, or Word document that can easily be shared with colleagues or students.

Screenshot of the metadata with html_notebook and word_document being visible and a number of options controlling their output. TOC, standing for table of contents, has been set to true for both.

Now you’re ready to preview the document:

screenshot of preview dropdown with options like preview, knit to html, knit to pdf, knit to word

Click Preview. A window will pop up with a preview of the rendered version of this document.

screenshot of rendered document with the table of contents on left, title is in a large font, and there are coloured boxes similar to GTN tutorials offering tips and more information

The preview is really similar to the GTN rendering; no cells have been executed, and no output is embedded yet in the preview document. But if you have run cells (e.g. the first few, loading a library and previewing the msleep dataset), their output appears:

screenshot of the rendered document with a fancy table browser embedded as well as the output of each step

When you’re ready to distribute the document, you can instead use the Knit button. This runs every cell in the entire document fresh, and then compiles the outputs together with the rendered markdown to produce a nice result file as HTML, PDF, or Word document.

screenshot of the console with 'chunks' being knitted together

tip Tip: PDF + Word require a LaTeX installation

You might need to install additional packages to compile the PDF and Word document versions

And at the end you can see a pretty document rendered with all of the output of every step along the way. This is a fantastic way to e.g. distribute read-only lesson materials to students, if you feel they might struggle with using an RMarkdown document, or just want to read the output without doing it themselves.

screenshot of a PDF document showing the end of the tutorial where a pretty plot has been rendered and there is some text for conclusions and citations

Launch JupyterLab

Hands-on: Launch JupyterLab

Currently JupyterLab in Galaxy is available on Live.useGalaxy.eu, usegalaxy.org and usegalaxy.eu.

Hands-on: Run JupyterLab
  1. Open the Interactive Jupyter Notebook tool. Note that on some Galaxy servers this is called Interactive JupyTool and notebook.
  2. Click Run Tool
  3. The tool will start running and will stay running permanently

    This may take a moment, but once the Executed notebook in your history is orange, you are up and running!

  4. Click on the User menu at the top and go to Active Interactive Tools and locate the JupyterLab instance you started.
  5. Click on your JupyterLab instance

If JupyterLab is not available on the Galaxy instance:

  1. Start Try JupyterLab

Launch RStudio

Hands-on: Launch RStudio

Depending on which server you are using, you may be able to run RStudio directly in Galaxy. If that is not available, RStudio Cloud can be an alternative.

Currently RStudio in Galaxy is only available on UseGalaxy.eu and UseGalaxy.org

  1. Open the RStudio tool by clicking here to launch RStudio
  2. Click Run Tool
  3. The tool will start running and will stay running permanently
  4. Click on the “User” menu at the top and go to “Active InteractiveTools” and locate the RStudio instance you started.

If RStudio is not available on the Galaxy instance:

  1. Register for RStudio Cloud, or login if you already have an account
  2. Create a new project

Learning with RMarkdown in RStudio

Hands-on: Learning with RMarkdown in RStudio

Learning with RMarkdown is a bit different than you might be used to. Instead of copying and pasting code from the GTN into a document you’ll instead be able to run the code directly as it was written, inside RStudio! You can now focus just on the code and reading within RStudio.

  1. Load the notebook if you have not already, following the tip box at the top of the tutorial

    Screenshot of the Console in RStudio. There are three lines visible of not-yet-run R code with the download.file statements which were included in the setup tip box.

  2. Open it by clicking on the .Rmd file in the file browser (bottom right)

    Screenshot of Files tab in RStudio, here there are three files listed, a data-science-r-dplyr.Rmd file, a css and a bib file.

  3. The RMarkdown document will appear in the document viewer (top left)

    Screenshot of an open document in RStudio. There is some yaml metadata above the tutorial showing the title of the tutorial.

You’re now ready to view the RMarkdown notebook! Each notebook starts with a lot of metadata about how to build the notebook for viewing, but you can ignore this for now and scroll down to the content of the tutorial.

You can switch to the visual mode which is way easier to read - just click on the gear icon and select Use Visual Editor.

Screenshot of dropdown menu after clicking on the gear icon. The first option is `Use Visual Editor`.

You’ll see codeblocks scattered throughout the text, and these are all runnable snippets that appear like this in the document:

Screenshot of the RMarkdown document in the viewer, a cell is visible between markdown text reading library tidyverse. It is slightly more grey than the background region, and it has a run button at the right of the cell in a contextual menu.

And you have a few options for how to run them:

  1. Click the green arrow
  2. ctrl+enter
  3. Using the menu at the top to run all

    Screenshot of the run dropdown menu in R, the first item is run selected lines showing the mentioned shortcut above, the second is run next chunk, and then it also mentions a 'run all chunks below' and 'restart r and run all chunks' option.

When you run cells, the output will appear below in the Console. RStudio essentially copies the code from the RMarkdown document, to the console, and runs it, just as if you had typed it out yourself!

Screenshot of a run cell, its output is included below in the RMarkdown document and the same output is visible below in the console. It shows a log of loading the tidyverse library.

One of the best features of RMarkdown documents is that they include a very nice table browser which makes previewing results a lot easier! Instead of needing to use head every time to preview the result, you get an interactive table browser for any step which outputs a table.

Screenshot of the table browser. Below a code chunk is a large white area with two images, the first reading 'r console' and the second reading 'tbl_df'. The tbl_df is highlighted like it is active. Below that is a pretty-printed table with bold column headers like name and genus and so on. At the right of the table is a small arrow indicating you can switch to seeing more columns than just the initial three. At the bottom of the table is 1-10 of 83 rows written, and buttons for switching between each page of results.

Open a Terminal in Jupyter

Hands-on: Open a Terminal in Jupyter

This tutorial will let you accomplish almost everything from this view, running code in the cells below directly in the training material. You can choose between running the code here, or opening up a terminal tab in which to run it. Here are some instructions for how to do this in various environments.

Jupyter on UseGalaxy.* and MyBinder.org

  1. Use the File → New → Terminal menu to launch a terminal.

    screenshot of jupyterlab showing the File menu expanded to show new and terminal option.

  2. Disable “Simple” mode in the bottom left hand corner, if it is activated.

    screenshot of jupyterlab showing a toggle labelled simple

  3. Drag one of the terminal or notebook tabs to the side to have the training materials and terminal side-by-side

    screenshot of jupyterlab with notebook and terminal side-by-side.

CoCalc

  1. Use the Split View functionality of cocalc to split your view into two portions.

    screenshot of cocalc button to split views

  2. Change the view of one panel to a terminal

    screenshot of cocalc swapping view port to that of a terminal

Open interactive tool

  1. Go to User > Active InteractiveTools
  2. Wait for the tool to be running (check the Job Info)
  3. Click on the name of the interactive tool to open it

Stop RStudio

Hands-on: Stop RStudio

When you have finished your R analysis, it’s time to stop RStudio.

  1. First, save your work into Galaxy, to ensure reproducibility:
    1. You can use gx_put(filename) to save individual files by supplying the filename
    2. You can use gx_save() to save the entire analysis transcript and any data objects loaded into your environment.
  2. Once you have saved your data, you can proceed in 2 different ways:
    • Deleting the corresponding history dataset named RStudio that shows an “in progress” (yellow) state, OR
    • Clicking on the “User” menu at the top, going to “Active InteractiveTools”, locating the RStudio instance you started, selecting the corresponding box, and finally clicking the “Stop” button at the bottom.

Introduction


How can I advertise the training materials on my posters?

We provide some QR codes and logos in the images folder.

How can I cite the GTN?

We wrote two articles about our efforts:

To cite individual tutorials, please find citation information at the end of the tutorial.

Here is the BibTeX formatted version of those citations:

@article{Hiltemann_2023,
title = {Galaxy Training: A powerful framework for teaching!},
author = {Hiltemann, Saskia and Rasche, Helena and Gladman, Simon and Hotz, Hans-Rudolf and Larivi\`{e}re, Delphine and Blankenberg, Daniel and Jagtap, Pratik D. and Wollmann, Thomas and Bretaudeau, Anthony and Gou\'{e}, Nadia and Griffin, Timothy J. and Royaux, Coline and Le Bras, Yvan and Mehta, Subina and Syme, Anna and Coppens, Frederik and Droesbeke, Bert and Soranzo, Nicola and Bacon, Wendi and Psomopoulos, Fotis and Gallardo-Alba, Crist\'{o}bal and Davis, John and F\"{o}ll, Melanie Christine and Fahrner, Matthias and Doyle, Maria A. and Serrano-Solano, Beatriz and Fouilloux, Anne Claire and van Heusden, Peter and Maier, Wolfgang and Clements, Dave and Heyl, Florian and Gr\"{u}ning, Bj\"{o}rn and Batut, B\'{e}r\'{e}nice},
year = 2023,
month = jan,
journal = {PLOS Computational Biology},
publisher = {Public Library of Science (PLoS)},
volume = 19,
number = 1,
pages = {e1010752},
doi = {10.1371/journal.pcbi.1010752},
issn = {1553-7358},
url = {http://dx.doi.org/10.1371/journal.pcbi.1010752},
editor = {Ouellette, Francis},
}
@article{Batut_2018,
title = {Community-Driven Data Analysis Training for Biology},
author = {Batut, B\'{e}r\'{e}nice and Hiltemann, Saskia and Bagnacani, Andrea and Baker, Dannon and Bhardwaj, Vivek and Blank, Clemens and Bretaudeau, Anthony and Brillet-Gu\'{e}guen, Loraine and \v{C}ech, Martin and Chilton, John and Clements, Dave and Doppelt-Azeroual, Olivia and Erxleben, Anika and Freeberg, Mallory Ann and Gladman, Simon and Hoogstrate, Youri and Hotz, Hans-Rudolf and Houwaart, Torsten and Jagtap, Pratik and Larivi\`{e}re, Delphine and Le Corguill\'{e}, Gildas and Manke, Thomas and Mareuil, Fabien and Ram\'{\i}rez, Fidel and Ryan, Devon and Sigloch, Florian Christoph and Soranzo, Nicola and Wolff, Joachim and Videm, Pavankumar and Wolfien, Markus and Wubuli, Aisanjiang and Yusuf, Dilmurat and Taylor, James and Backofen, Rolf and Nekrutenko, Anton and Gr\"{u}ning, Bj\"{o}rn},
year = 2018,
month = jun,
journal = {Cell Systems},
publisher = {Elsevier BV},
volume = 6,
number = 6,
pages = {752--758.e1},
doi = {10.1016/j.cels.2018.05.012},
issn = {2405-4712},
url = {http://dx.doi.org/10.1016/j.cels.2018.05.012},
}

How can I load data?

  • Load by “browsing” for a local file. Some servers support loading data that is 2 GB or larger. If you are having problems with this method, try FTP.
  • Load using an HTTP URL or FTP URL.
  • Load a few lines of plain text.
  • Load using FTP, either via the command line or with a desktop client.

How is the content licensed?

The content of this website is licensed under the Creative Commons Attribution 4.0 License.

Using Answer Key Histories

If you get stuck, you can first check your history against an exemplar history from your tutorial.

First, import the target history.

  1. Open the link to the shared history
  2. Click on the new-history Import history button on the top right
  3. Enter a title for the new history
  4. Click on Import

Next, compare the answer key history with your own history.

You can view multiple Galaxy histories at once. This allows you to better understand your analyses and also makes it possible to drag datasets between histories. This is called the “History multiview”. The multiview can be enabled either via the History menu or via the Activity Bar:

  1. Enabling Multiview via the History menu is done by first clicking on the galaxy-history-options “History options” drop-down and selecting the galaxy-multihistory “Show Histories Side-by-Side” option:

    Enabling side-by-side view using History Options menu

  2. Clicking the galaxy-multihistory “History Multiview” button within the Activity Bar:

    Enabling side-by-side view using Activity Bar

You can compare there, or if you’re really stuck, you can also click and drag a given dataset to your history to continue the tutorial from there.

There are 3 ways to copy datasets between histories

  1. From the original history

    1. Click on the galaxy-gear icon which is on the top of the list of datasets in the history panel
    2. Click on Copy Datasets
    3. Select the desired files

    4. Give a relevant name to the “New history”

    5. Validate by ‘Copy History Items’
    6. Click on the new history name in the green box that has just appeared to switch to this history
  2. Using the galaxy-columns Show Histories Side-by-Side

    1. Click on the galaxy-dropdown dropdown arrow top right of the history panel (History options)
    2. Click on galaxy-columns Show Histories Side-by-Side
    3. If your target history is not present
      1. Click on ‘Select histories’
      2. Click on your target history
      3. Validate by ‘Change Selected’
    4. Drag the dataset to copy from its original history
    5. Drop it in the target history
  3. From the target history

    1. Click on User in the top bar
    2. Click on Datasets
    3. Search for the dataset to copy
    4. Click on its name
    5. Click on Copy to current History

You can also use our handy troubleshooting guide.

When something goes wrong in Galaxy, there are a number of things you can do to find out what it was. Error messages can help you figure out whether it was a problem with one of the settings of the tool, or with the input data, or maybe there is a bug in the tool itself and the problem should be reported. Below are the steps you can follow to troubleshoot your Galaxy errors.

  1. Expand the red history dataset by clicking on it.
    • Sometimes you can already see an error message here
  2. View the error message by clicking on the bug icon galaxy-bug

  3. Check the logs. Output (stdout) and error logs (stderr) of the tool are available:
    • Expand the history item
    • Click on the details icon
    • Scroll down to the Job Information section to view the 2 logs:
      • Tool Standard Output
      • Tool Standard Error
    • For more information about specific tool errors, please see the Troubleshooting section
  4. Submit a bug report if you are still unsure what the problem is!
    • Click on the bug icon galaxy-bug
    • Write down any information you think might help solve the problem
      • See this FAQ on how to write good bug reports
    • Click galaxy-bug Report button
  5. Ask for help!

Ways to use Galaxy

All ways to use Galaxy are included in the Galaxy Directory listing.

Having one account at several public Galaxy servers expands your access to distinct data storage and computational resources, plus common and domain-specific analysis tools.

When running your own private Galaxy server for routine analysis, publishing results at a public Galaxy server allows for worldwide access by others when you share your data: Histories, Workflows, and related assets.

Tips:

  • Teaching with Galaxy: We strongly recommend using Galaxy’s Training Infrastructure as a Service (TIaaS) for synchronous class work.
  • Public Galaxy servers are appropriate for many analysis projects or for when sharing data or results publicly is a goal. These are also a great choice when learning on your own with GTN tutorials.
  • Private Galaxy servers are more appropriate when working with very large data, time-sensitive projects, and ongoing research projects that require more resources than the public Galaxy servers can support. The two options below are scientist-friendly, as they require very little to no server administration.
    • GVL Cloudman is a single or multi-user choice and AWS offers grants.
    • AnVIL is a single-user choice sponsored by NHGRI and is a pay-for-use Google Cloud platform.

What are the tutorials for?

These tutorials can be used for learning and teaching how to use Galaxy for general data analysis, and for learning/teaching specific domains such as assembly and differential gene expression analysis with RNA-Seq data.

What audiences are the tutorials for?

There are two distinct audiences for these materials.

  1. Self-paced individual learners. These tutorials provide everything you need to learn a topic, from explanations of concepts to detailed hands-on exercises.
  2. Instructors. They are also designed to be used by instructors in teaching/training settings. Slides, and detailed tutorials are provided. Most tutorials also include computational support with the needed tools, data as well as Docker images that can be used to scale the lessons up to many participants.

What is Galaxy?

Galaxy is an open data integration and analysis platform for the life sciences, and it is particularly well-suited for data analysis training in life science research.

What is a Learning Pathway?

Comment: What is a Learning Pathway?
A graphic depicting a winding path from a start symbol to a trophy, with tutorials along the way
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

What is my.galaxy.training

The my.galaxy.training service is part of the GTN. We found that we often need to direct our learners to specific pages within Galaxy, but which Galaxy? Should we add three links, one for each of the current bigger UseGalaxy.* servers? That would be really annoying for users who aren’t using one of those servers.

E.g. how do we link to /user, the user preferences page which is available on every Galaxy Instance? This service handles that in a private and user-friendly manner.

(Learners) How to Use It

When you access a my.galaxy.training page you’ll be prompted to select a server; simply select one and you’re good to go!

If you want to enter a private Galaxy instance, perhaps one behind a firewall, that’s also an option! Just select the ‘other’ option and provide your domain. Since the redirection happens in your browser with no servers involved, as long as you can access the server, you’ll get redirected to the right location.

(Tutorial Authors) How to use it

If you want to link to a specific page within Galaxy, simply construct the URL: https://my.galaxy.training/?path=/user where everything after ?path is the location they should be redirected to on Galaxy. That example link will eventually redirect the learner to something like https://usegalaxy.eu/user.
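As a trivial Python sketch, such a link can be built programmatically (the `training_link` helper name is invented here for illustration; the path value is just the example from above):

```python
def training_link(path):
    """Build a my.galaxy.training redirect URL for a Galaxy page path.

    Everything after ?path= is the location the learner will be
    redirected to on their chosen Galaxy server.
    """
    return f"https://my.galaxy.training/?path={path}"
```

For example, `training_link("/user")` yields `https://my.galaxy.training/?path=/user`, the link discussed above.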

Technical Background

So we took inspiration from Home Assistant, which had the same problem: how to redirect users to pages on their own servers. The my.galaxy.training service is a very simple static page that looks in the user’s localStorage for their preferred server. If it’s not set, the user can click one of the common domains and be redirected. When they access another link, they’ll be prompted with a button that remembers which server they chose.

Data Privacy

Any domain selected is not tracked nor communicated to any third party. Your preferred server is stored in your browser, and never transmitted to the GTN. That’s why we use localStorage instead of cookies.

What is this website?

This website is a collection of hands-on tutorials that are designed to be interactive and are built around Galaxy:

Interactive training

This material is developed and maintained by the worldwide Galaxy community. You can learn more about this effort by reading our article.

What licenses are used in the GTN?

We provide a listing of all licenses for code and things displayed to you in the GTN in the licenses page

Why host your materials with the GTN?

The short version is we’re a popular, FAIR training materials platform, and we want to be a home for your training materials.

Your content in front of the world

As of June 2023 the GTN sees around 60k visitors per month. Please see our public page view monitoring for more details.

FAIR

We go to great lengths to make sure our training platform is completely FAIR. See this FAQ for the details on how we achieve that. All of our materials have extensive BioSchemas markup ensuring they’re easily accessible to search engines. Our materials are automatically indexed by TeSS, and we are working on a WorkflowHub integration.

Accessible

We regularly test our pages with a thorough suite of accessibility tools, as well as via screen reader.

Not Just Galaxy

Our name can be a bit misleading! While a lot of our tutorials are focused on Galaxy, we have multiple growing topics which are unrelated to Galaxy.

All of these topics are using the GTN as a platform to disseminate their materials far and wide.

Features

Do you need

  • Choose your own adventure tutorials
  • Automatic videos from your slides

Then choose the GTN.


Learners


How can I get help?

If you have questions about this training material, you can reach us using the Gitter chat. You’ll need a GitHub or Twitter account to post questions. If you have questions about Galaxy outside the context of training, see the Galaxy Support page.

How do I use this material?

Many topics include slide decks and if the topic you are interested in has slides then start there. These will introduce the topic and important concepts.

Most of your learning will happen in the next step - the hands-on tutorials. This is where you’ll become familiar with using the Galaxy interface and experiment with different ways to use Galaxy and the tools in Galaxy.

Where can I run the hands-on tutorials?

To run the hands-on tutorials you need a Galaxy server to run them on.

Each tutorial is annotated with information about which public Galaxy servers it can be run on. These servers are available to anyone on the world wide web and some may have all the tools that are needed by a specific tutorial.

If your organization/consortia/community has its own Galaxy server, then you may want to run tutorials on that. You will need to confirm that all necessary tools and reference genomes are available on your server and possibly install missing tools and data. To learn how to do that, you can follow our dedicated tutorial.

Some topics have a Docker image that can be installed and run on participants’ laptops. These Docker images contain Galaxy instances that include all tools and datasets used in a tutorial, as well as saved analyses and repeatable workflows that are relevant. You will need to install Docker.

Finally, you can also run your tutorials on cloud-based infrastructures. Galaxy is available on many national research infrastructures such as Jetstream (United States), GenAP (Canada), GVL (Australia), CLIMB (United Kingdom), and more. These instances are typically easy to launch, and easy to shut down when you are done.

If you are already familiar with, and have an account on Amazon Web Services then you can also launch a Galaxy server there using CloudLaunch.

Where do I start?

If you are new to Galaxy then start with one of the introductory topics. These introduce you to concepts that are useful in Galaxy, no matter what domain you are doing analysis in.

If you are already familiar with Galaxy basics and want to learn how to use it in a particular domain (for example, ChIP-Seq), then start with one of those topics.

If you are already well informed about bioinformatics data analysis and you just want to get a feel for how it works in Galaxy, then many tutorials include Instructions for the impatient sections.


Mapping


Is it possible to visualize the RNA STAR bam file using the JBrowse tool?

Question: Is it possible to visualize the RNA STAR bam file using the JBrowse tool?

Yes, that should work.

RNAstar: Why do we set 36 for 'Length of the genomic sequence around annotated junctions'?

Question: RNAstar: Why do we set 36 for 'Length of the genomic sequence around annotated junctions'?

RNA STAR uses the gene model to create the database of splice junctions, and the genomic sequence around each junction doesn’t need to be longer than the read length (here the reads are 37 bp, hence 36).


Markdown


How can I create a tutorial skeleton from a Galaxy workflow?

There are two ways to do this:

  1. Use planemo on your local machine. Please see the tutorial named “Creating a new tutorial” for detailed instructions.
  2. Use our web service

Notebooks


Contributing a Jupyter Notebook to the GTN

Problem: I have a notebook that I’d like to add to the GTN.

Solution: While we do not support directly adding notebooks to the GTN, as all of our notebooks are generated from the tutorial Markdown files, there is an alternative path! Instead you can:

  1. Install jupytext
  2. Use it to convert the ipynb file into a Markdown file (jupytext notebook.ipynb --to markdown)
  3. Add this Markdown file to the GTN
  4. Fix any missing header metadata

Then the GTN’s infrastructure will automatically convert that Markdown file directly to a notebook on deployment. This approach has the advantage that Markdown files are more diff-friendly than ipynb, making it much easier to review updates to a tutorial.
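The heart of step 2 can be illustrated with a stdlib-only sketch: the real jupytext handles cell metadata, outputs, and many formats, but for a simple notebook containing only markdown and code cells the conversion boils down to something like this (the `ipynb_to_markdown` helper is invented here for illustration):

```python
import json

def ipynb_to_markdown(ipynb_text):
    """Convert a minimal .ipynb (JSON) into a Markdown string:
    markdown cells pass through, code cells become fenced blocks.
    A simplified sketch of `jupytext notebook.ipynb --to markdown`."""
    # Build the fence dynamically to avoid clashing with this block's own fence
    fence = "`" * 3
    parts = []
    for cell in json.loads(ipynb_text).get("cells", []):
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            parts.append(f"{fence}python\n{source}\n{fence}")
    return "\n\n".join(parts) + "\n"
```

Because the output is plain Markdown, reviewing a tutorial update becomes an ordinary text diff rather than a diff of nested notebook JSON.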


Other


Are there any upcoming events focused on Galaxy Training?

Yes, always! Have a look at the Galaxy Community Events Calendar for what’s coming up right now.

Compatible Versions of Galaxy

Warning: Compatible Versions of Galaxy

This tutorial may not be updated for the latest version of Galaxy.

  • Galaxy’s interface may differ from that of the Galaxy server where you are following this tutorial.
  • ✅ All tutorial steps can still be followed (potentially with minor differences such as moved buttons or changed icons).
  • ✅ Tools will all still work

GTN Stats

Statistics over the GTN: 31 Topics · 401 Tutorials · 17 Learning Paths · 423 FAQs · 369 Contributors · 9.0 Years · 90 News Posts · 154 Videos (107.1h)

Sustainability of the training-material and metadata

This repository is hosted on GitHub using git as a DVCS. Therefore the community is hosting backups of this repository in a decentralised way. The repository is self-contained and contains all needed content and all metadata. In addition we mirror snapshots of this repo on Zenodo.


Outputs


Does MaxQuant give as output possibility the PSMs and PEPs?

Question: Does MaxQuant give as output possibility the PSMs and PEPs?

Yes, MaxQuant offers many output options; the evidence and msms outputs, for example, contain PSM- and feature-level information.


Proteogenomics general


Can I use these workflows on datasets generated from our laboratory?

Question: Can I use these workflows on datasets generated from our laboratory?

Yes, the workflows can be used on other datasets as well. However, you will need to consider data acquisition and sample preparation methods so that the tool parameters can be adjusted accordingly.

Example histories for the proteogenomics tutorials

If you get stuck or would like to see what the results should look like, you can have a look at one of the following public histories:

Galaxy EU (usegalaxy.eu):

Galaxy Main (usegalaxy.org):

The workflows contain several Query tabular for text manipulation, is there a tutorial for that?

Question: The workflows contain several Query tabular for text manipulation, is there a tutorial for that?

Query Tabular loads tabular datasets into an SQLite database and produces a tabular result file (the SQLite database itself can also be saved). To learn more about SQL queries, please look at this documentation.

The help section on the Query Tabular tool provides simple examples of both filtering the input tabular datasets, as well as examples of SQL queries. Query Tabular also incorporates regex functions that can be used in queries. The PSM report datasets in these tutorials have fields that are lists of protein IDs.

Query Tabular help shows how to normalize those protein list fields so that we can perform operations by protein ID. See section: Normalizing by Line Filtering into 2 Tables in the tool help (below the tool in Galaxy).
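Conceptually, Query Tabular loads your tabular file into an SQLite table and then runs your SQL over it. A minimal stdlib sketch of that round trip (the table and column names here are invented for illustration, loosely modelled on a PSM report):

```python
import sqlite3

# Hypothetical PSM-style rows: (peptide, protein list, score)
rows = [
    ("PEPTIDEA", "P12345;P67890", 42.0),
    ("PEPTIDEB", "P12345", 17.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE psm (peptide TEXT, proteins TEXT, score REAL)")
conn.executemany("INSERT INTO psm VALUES (?, ?, ?)", rows)

# Query the table, e.g. keep only high-scoring PSMs
high = conn.execute(
    "SELECT peptide, score FROM psm WHERE score > 20 ORDER BY score DESC"
).fetchall()
```

In Galaxy, the tool form builds the CREATE TABLE and filtering for you; the SQL query box corresponds to the SELECT statement above.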

What kind of variants are seen in the output?

Question: What kind of variants are seen in the output?

From this workflow we can see insertions, deletions, and SNVs, and we will know whether a variant falls in an intron, exon, splice junction, etc.



UCSC - I fetched data from a remote website but now I’m logged out of Galaxy and my data is gone?

This is a known bug with Chrome + Galaxy, we’re working on it galaxyproject/galaxy#11374. For now we can recommend using Firefox (known to work) or trying another browser.


Reference genomes


How to use Custom Reference Genomes?

A reference genome contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for a single species. It is representative of a specific genome assembly build or release.

There are two options for reference genomes in Galaxy.

  • Native
    • Index provided by the server administrators.
    • Found on tool forms in a drop down menu.
    • A database key is automatically assigned. See tip 1.
    • The database is what links your data to a FASTA index. Example: used with BAM data
  • Custom
    • FASTA file uploaded by users.
    • Input on tool forms then indexed at runtime by the tool.
    • An optional custom database key can be created and assigned by the user.

There are five basic steps to use a Custom Reference Genome, plus one optional.

  1. Obtain a FASTA copy of the target genome. See tip 2.
  2. Upload the genome to Galaxy to add it as a dataset in your history.
  3. Clean up the format with the tool NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
  4. Make sure the chromosome identifiers are a match for other inputs.
  5. Set a tool form’s options to use a custom reference genome from the history and select the loaded genome FASTA.
  6. (Optional) Create a custom genome build’s database that you can assign to datasets.

tip TIP 1: Avoid assigning a native database to uploaded data unless you confirmed the data are based on the same exact genome assembly or you adjusted the data to be a match first!

tip TIP 2: When choosing your reference genome, consider choosing your reference annotation at the same time. Standardize the format of both as a preparation step. Put the files in a dedicated “reference data” history for easy reuse.
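Step 3’s clean-up can be sketched in a few lines of Python (a simplification of what the NormalizeFasta tool does; the `normalize_fasta` helper name is invented, and a well-formed FASTA input is assumed):

```python
import textwrap

def normalize_fasta(fasta_text, width=80):
    """Wrap sequence lines at `width` bases and trim each '>' title
    line at the first whitespace (a sketch of NormalizeFasta)."""
    out = []
    seq = []

    def flush():
        # Re-wrap any buffered sequence at the requested width
        if seq:
            out.extend(textwrap.wrap("".join(seq), width))
            seq.clear()

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            out.append(line.split()[0])  # keep identifier, drop description
        elif line.strip():
            seq.append(line.strip())
    flush()
    return "\n".join(out) + "\n"
```

Trimming the description after the identifier matters because some tools treat the whole title line as the sequence name, which then fails to match identifiers in other inputs.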

Sorting Reference Genome

Certain tools expect reference genomes to be sorted in lexicographical order. These tools are often downstream of the initial mapping tools, which means a large investment in a project has already been made before a problem with sorting pops up in late-stage analysis tools. How can you avoid this? Always sort your FASTA reference genome dataset at the beginning of a project. Many sources only provide sorted genomes, but double-checking is your own responsibility, and it is super easy in Galaxy!

  1. Convert Formats -> FASTA-to-Tabular
  2. Filter and Sort -> Sort on column: c1 with flavor: Alphabetical everything in: Ascending order
  3. Convert Formats -> Tabular-to-FASTA
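The three conversion steps above amount to sorting records alphabetically by their title line. In Python the same operation looks like this (a sketch with an invented `sort_fasta` helper, assuming unique identifiers):

```python
def sort_fasta(fasta_text):
    """Sort FASTA records lexicographically by their '>' title line,
    mirroring the FASTA-to-Tabular / Sort / Tabular-to-FASTA steps."""
    records = {}
    title = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            title = line
            records[title] = []
        elif title is not None:
            records[title].append(line)
    # Lexicographic (alphabetical) key order, as in the Sort tool
    return "\n".join(
        "\n".join([t] + records[t]) for t in sorted(records)
    ) + "\n"
```

Note that lexicographic order means, for example, that chr10 sorts before chr2, matching the “Alphabetical” flavor of the Sort tool.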

Note: The above sorting method is for most tools, but not all. In particular, GATK tools have a tool-specific sort order requirement.

Troubleshooting Custom Genome fasta

If a custom genome/transcriptome/exome dataset is producing errors, double check the format and that the chromosome identifiers match between ALL inputs. Clicking on the bug icon galaxy-bug will often provide a description of the problem. This does not automatically submit a bug report, and it is not always necessary to do so, but it is a good way to get some information about why a job is failing.

  • Custom genome not assigned as FASTA format

    • Symptoms include: Dataset not included in custom genome “From history” pull down menu on tool forms.
    • Solution: Check datatype assigned to dataset and assign fasta format.
    • How: Click on the dataset’s pencil icon galaxy-pencil to reach the “Edit Attributes” form, and in the Datatypes tab > redetect the datatype.
    • If fasta is not assigned, there is a format problem to correct.
  • Incomplete Custom genome file load

    • Symptoms include: Tool errors occur the first time you use the Custom genome.
    • Solution: Use Text Manipulation → Select last lines from a dataset to check last 10 lines to see if file is truncated.
    • How: Reload the dataset (switch to FTP if not using already). Check your FTP client logs to make sure the load is complete.
  • Extra spaces, extra lines, inconsistent line wrapping, or any deviation from strict FASTA format

    • Symptoms include: RNA-seq tools (Cufflinks, Cuffcompare, Cuffmerge, Cuffdiff) fails with error Error: sequence lines in a FASTA record must have the same length!.
    • Solution: File tested and corrected locally then re-upload or test/fix within Galaxy, then re-run.
    • How:
      • Quick re-formatting: Run the dataset through the tool NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
      • Optional detailed re-formatting: Start with FASTA manipulation → FASTA Width formatter with a value between 40-80 (60 is common) to reformat wrapping. Next, use Filter and Sort → Select with “>” to examine identifiers. Use a combination of Convert Formats → FASTA-to-Tabular, Text Manipulation tools, then Tabular-to-FASTA to correct.
      • With either of the above, finish by using Filter and Sort → Select with ^\w*$ to search for empty lines (use “NOT matching” to remove these lines and output a properly formatted fasta dataset).
  • Inconsistent line wrapping, common if merging chromosomes from various Genbank records (e.g. primary chroms with mito)

    • Symptoms include: Tools (SAMTools, Extract Genomic DNA, but rarely alignment tools) may complain about unexpected line lengths/missing identifiers. Or they may just fail for what appears to be a cluster error.
    • Solution: File tested and corrected locally then re-upload or test/fix within Galaxy.
    • How: Use NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace. Finish by using Filter and Sort → Select with ^\w*$ to search for empty lines (use “NOT matching” to remove these lines and output a properly formatted fasta dataset).
  • Unsorted fasta genome file

    • Symptoms include: Tools such as Extract Genomic DNA report problems with sequence lengths.
    • Solution: First try sorting and re-formatting in Galaxy then re-run.
    • How: To sort, follow instructions for Sorting a Custom Genome.
  • Identifier and Description in “>” title lines used inconsistently by tools in the same analysis

    • Symptoms include: Will generally manifest as a false genome-mismatch problem.
    • Solution: Remove the description content and re-run all tools/workflows that used this input. Mapping tools will usually not fail, but downstream tools will. When this comes up, it usually means that an analysis needs to be started over from the mapping step to correct the problems. No one enjoys redoing this work. Avoid the problems by formatting the genome, by double checking that the same reference genome was used for all steps, and by making certain the ‘identifiers’ are a match between all planned inputs (including reference annotation such as GTF data) before using your custom genome.
    • How: To drop the title line description content, use NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace. Next, double check that the chromosome identifiers are an exact match between all inputs.
  • Unassigned database

    • Symptoms include: Tools report that no build is available for the assigned reference genome.
    • Solution: This occurs with tools that require an assigned database metadata attribute. SAMTools and Picard often require this assignment.
    • How: Create a Custom Build and assign it to the dataset.

Reports


Enhancing tabular dataset previews in reports/pages

There are lots of fun advanced features!

There are a number of options, specifically for tabular data, that can allow it to render more nicely in your workflow reports and pages and anywhere that GalaxyMarkdown is used.

  • title to give your table a title
  • footer allows you to caption your table
  • show_column_headers=false to hide the column headers
  • compact=true to make the table show up more inline, hiding that it was embedded from a Galaxy dataset.

The existing history_dataset_display directive displays the dataset name and some useful context, at the expense of potentially breaking the flow of the document.

Input: Galaxy Markdown
```galaxy
history_dataset_display(history_dataset_id=1e8ab44153008be8)
```
Output: Example Screenshot

a tabular dataset rendered, it has a title and a download button and sortable columns

The existing history_dataset_embedded directive was implemented to try to inline results more and make the results more readable within a more… curated document. It dispatches on tabular types and puts the results in a table, but the table doesn’t have a lot of options.

Input: Galaxy Markdown
```galaxy
history_dataset_embedded(history_dataset_id=1e8ab44153008be8)
```
Output: Example Screenshot

the same as before but no title nor download button. just a rendered table with sortable columns

The history_dataset_as_table directive mirrors the history_dataset_as_image directive: it tries harder to coerce the data into a table and provides new table-specific options. The first of these is show_column_headers, which defaults to true.

Input: Galaxy Markdown
```galaxy
history_dataset_as_table(history_dataset_id=1e8ab44153008be8,show_column_headers=false)
```
Output: Example Screenshot

the same as before but no title nor download button nor column headers

There is also a compact option. This provides a much more inline experience for tabular datasets:

Input: Galaxy Markdown
```galaxy
history_dataset_as_table(history_dataset_id=1e8ab44153008be8,show_column_headers=false,compact=true)
```
Output: Example Screenshot

again the same screenshot, no table metadata, and now it lacks the small margin around it.

Figures in general should have titles and legends, so there are also “title” and “footer” options.

Input: Galaxy Markdown
```galaxy
history_dataset_as_table(history_dataset_id=1e8ab44153008be8,show_column_headers=false,title='Binding Site Results',footer='Here is a very good figure caption for this table.')
```
Output: Example Screenshot

the same table with now a tasteful title and small caption below it describing that the author would write a caption if he knew what a binding site was.

Making an element collapsible in a report

If you have extraneous information you might want to let a user collapse it.

This applies to any GalaxyMarkdown elements, i.e. the things you’ve clicked in the left panel to embed in your Workflow Report or Page

By adding a collapse="" attribute to a markdown element, you can make it collapsible. Whatever you put in the quotes will be the title of the collapsible box.

```
history_dataset_type(history_dataset_id=3108c91feeb505da, collapse="[TITLE]")
```

Rule-builder


Flatten a list of list of paired datasets into a list of paired datasets

Sometimes you find yourself with a list:list:paired, i.e. a collection of collection of paired end data, and you really want a list:paired, a flatter collection of paired end data. This is easy to resolve with Apply rules:

  1. Open Apply rules
  2. Select your collection
  3. Click Edit

You’ll now be in the Apply rules editing interface. There are three columns (if it’s a list:list:paired)

  1. The outermost list identifier(s)
  2. The next list identifier(s)
  3. The paired-end indicator

Flattening this top level list, so it’s just a list:paired requires a few changes:

  1. From Column menu select Concatenate Columns
    • “From Column”: A
    • “From Column”: B
    This creates a column with the top list identifier and the inner list identifier, which will be our new list identifier for the flattened list.
  2. From Rules menu select Add / Modify Column Definitions
    • Click Add Definition button and select Paired-end Indicator
      • “Paired-end Indicator”: C
    • Click Add Definition button and select List Identifier(s)
      • “List Identifier(s)”: D
    • Click Apply
  3. Click Save
  4. Click Run Tool

The tool will execute and reshape your list, congratulations!
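Conceptually, the reshaping that Apply rules performs looks like this Python sketch (the run and sample identifiers are made up for illustration):

```python
# list:list:paired represented as nested dicts:
# outer list -> inner list -> (forward, reverse) pair
nested = {
    "run1": {"sampleA": ("A_1.fq", "A_2.fq")},
    "run2": {"sampleB": ("B_1.fq", "B_2.fq")},
}

# Concatenating columns A and B builds the new list identifier for the
# flattened list; the paired-end structure of each element is kept as-is.
flat = {
    outer + inner: pair
    for outer, inner_map in nested.items()
    for inner, pair in inner_map.items()
}
print(sorted(flat))  # ['run1sampleA', 'run2sampleB']
```

The result is a single-level list:paired collection whose identifiers combine the outer and inner list identifiers.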


Sequencing


Illumina MiSeq sequencing

Comment: Illumina MiSeq sequencing

Illumina MiSeq sequencing is based on sequencing by synthesis. As the name suggests, fluorescent labels are measured for every base that binds at a specific moment at a specific place on a flow cell. These flow cells are covered with oligos (small single-stranded DNA strands). During library preparation the DNA strands are cut into small DNA fragments (the size differs per kit/device) and specific pieces of DNA (adapters), which are complementary to the oligos, are added. Using bridge amplification, large amounts of clusters of these DNA fragments are made. The reverse strand is washed away, making the clusters single-stranded. Fluorescent bases are added one by one and emit a specific light for each base when incorporated. This happens across whole clusters, so the light can be detected and basecalled (translated from light to a nucleotide) into a nucleotide sequence (a read). For every base a quality score is determined and saved per read. This process is repeated for the reverse strand at the same place on the flow cell, so the forward and reverse reads come from the same DNA strand. The forward and reverse reads are linked together and should always be processed together!

For more information watch this video from Illumina

Nanopore sequencing

Comment: Nanopore sequencing

Nanopore sequencing has several properties that make it well-suited for our purposes

  1. Long-read sequencing technology offers simplified and less ambiguous genome assembly
  2. Long-read sequencing gives the ability to span repetitive genomic regions
  3. Long-read sequencing makes it possible to identify large structural variations

How nanopore sequencing works

When using Oxford Nanopore Technologies (ONT) sequencing, the change in electrical current is measured over the membrane of a flow cell. When nucleotides pass the pores in the flow cell the current change is translated (basecalled) to nucleotides by a basecaller. A schematic overview is given in the picture above.

When sequencing using a MinIT or MinION Mk1C, the basecalling software is present on the devices. With basecalling the electrical signals are translated to bases (A,T,G,C) with a quality score per base. The sequenced DNA strand will be basecalled and this will form one read. Multiple reads will be stored in a fastq file.


Support


Contacting Galaxy Administrators

If you suspect there is something wrong with the server, or would like to request a tool to be installed, you should contact the server administrators for the Galaxy you are on.

I get a different number of transcripts with a significant change in gene expression between the G1E and megakaryocyte cellular states. Why?

Question: I get a different number of transcripts with a significant change in gene expression between the G1E and megakaryocyte cellular states. Why?

This is okay! Many aspects of the tutorial can potentially affect the exact results you obtain. For example, the reference genome version used and versions of tools. It’s less important to get the exact results shown in the tutorial, and more important to understand the concepts so you can apply them to your own data.

Where do I get more support?

If you need support for using Galaxy, running your analysis or completing a tutorial, please try one of the following options:


Tips


Opening a split screen in byobu

Shift-F2: Create a horizontal split

Shift-Left/Right/Up/Down: Move focus among splits

Ctrl-F6: Close split in focus

Ctrl-D: (Linux, Mac users) Close split in focus

There are more byobu commands described in this gist


Tools


Changing the tool version

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool.

Switching to a different version of a tool:

  • Open the tool
  • Click on the tool-versions versions logo at the top right
  • Select the desired version from the dropdown list

If a Tool is Missing

To use the tools installed and available on the Galaxy server:

  1. At the top of the left tool panel, type in a tool name or datatype into the tool search box.
  2. Shorter keywords find more choices.
  3. Tools can also be directly browsed by category in the tool panel.

If you can’t find a tool you need for a tutorial on Galaxy, please:

  1. Check that you are using a compatible Galaxy server
    • Navigate to the overview box at the top of the tutorial
    • Find the “Supporting Materials” section
    • Check “Available on these Galaxies”
    • If your server is not listed here, the tutorial is not supported on your Galaxy server
    • You can create an account on one of the supporting Galaxies screenshot of overview box with available Galaxies section
  2. Use the Tutorial mode feature
    • Open your Galaxy server
    • Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
    • Navigate to your tutorial
    • Tool names in tutorials will be blue buttons that open the correct tool for you
    • Note: this does not work for all tutorials (yet) gif showing how GTN-in-Galaxy works
  3. Still not finding the tool?

Multiple similar tools available

Sometimes there are multiple tools with very similar names. If the parameters in the tutorial don’t match with what you see in Galaxy, please try the following:

  1. Use Tutorial Mode curriculum in Galaxy, and click on the blue tool button in the tutorial to automatically open the correct tool and version (not available for all tutorials yet)

    Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.

    • Open your Galaxy server
    • Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
    • Navigate to your tutorial
    • Tool names in tutorials will be blue buttons that open the correct tool for you
    • Note: this does not work for all tutorials (yet) gif showing how GTN-in-Galaxy works
    • You can click anywhere in the grey-ed out area outside of the tutorial box to return back to the Galaxy analytical interface
    Warning: Not all browsers work!
    • We’ve had some issues with Tutorial mode on Safari for Mac users.
    • Try a different browser if you aren’t seeing the button.

  2. Check that the entire tool name matches what you see in the tutorial.

Organizing the tool panel

Galaxy servers can have a lot of tools available, which can make it challenging to find the tool you are looking for. To help find your favourite tools, you can:

  • Keep a list of your favourite tools to find them back easily later.
    • Adding tools to your favourites
      • Open a tool
      • Click on the star icon galaxy-star next to the tool name to add it to your favourites
    • Viewing your favourite tools
      • Click on the star icon galaxy-star at the top of the Galaxy tool panel (above the tool search bar)
      • This will filter the toolbox to show all your starred tools
  • Change the tool panel view
    • Click on the galaxy-panelview icon at the top of the Galaxy tool panel (above the tool search bar)
    • Here you can view the tools by EDAM ontology terms
      • EDAM Topics (e.g. biology, ecology)
      • EDAM Operations (e.g. quality control, variant analysis)
      • You can always get back to the default view by choosing “Full Tool Panel”

Re-running a tool

  1. Expand one of the output datasets of the tool (by clicking on it)
  2. Click re-run galaxy-refresh the tool

This is useful if you want to run the tool again but with slightly different parameters, or if you just want to check which parameter settings you used.

Regular Expressions 101

Regular expressions are a standardized way of describing patterns in textual data. They can be extremely useful for tasks such as finding and replacing data. They can be a bit tricky to master, but learning even just a few of the basics can help you get the most out of Galaxy.

Finding

Below are just a few examples of basic expressions:

Regular expression Matches
abc an occurrence of abc within your data
(abc|def) abc or def
[abc] a single character which is either a, b, or c
[^abc] a character that is NOT a, b, nor c
[a-z] any lowercase letter
[a-zA-Z] any letter (upper or lower case)
[0-9] numbers 0-9
\d any digit (same as [0-9])
\D any non-digit character
\w any alphanumeric character
\W any non-alphanumeric character
\s any whitespace
\S any non-whitespace character
. any character
\. a literal period character (the backslash removes its special meaning)
{x,y} between x and y repetitions
^ the beginning of the line
$ the end of the line

Note: characters such as *, ?, ., and + have a special meaning in a regular expression. If you want to match those characters literally, you can escape them with a backslash. So \? matches the question mark character exactly.

Examples

Regular expression Matches
\d{4} 4 digits (e.g. a year)
chr\d{1,2} chr followed by 1 or 2 digits
.*abc$ anything with abc at the end of the line
^$ empty line
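These patterns can be tried out interactively, for example with Python’s re module:

```python
import re

assert re.search(r"\d{4}", "Published in 1984")   # 4 digits (e.g. a year)
assert re.search(r"chr\d{1,2}", "chr14")          # chr followed by 1-2 digits
assert re.search(r".*abc$", "xyzabc")             # abc at the end of the line
assert re.search(r"^$", "")                       # an empty line
assert not re.search(r"[^abc]", "abba")           # nothing outside a, b, c
assert re.search(r"\?", "why?")                   # escaped literal question mark
```

Each assert passes because the pattern matches (or, with not, fails to match) the test string.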

Replacing

Sometimes you need to capture the exact value you matched on in order to use it in your replacement. We do this using capture groups (...), which we can refer to as \1, \2, etc. for the first and second captured values.

Regular expression Input Captures
chr(\d{1,2}) chr14 \1 = 14
(\d{2}) July (\d{4}) 24 July 1984 \1 = 24, \2 = 1984

An expression like s/find/replacement/g indicates a replacement expression, this will search (s) for any occurrence of find, and replace it with replacement. It will do this globally (g) which means it doesn’t stop after the first match.

Example: s/chr(\d{1,2})/CHR\1/g will replace chr14 with CHR14 etc.

Note: In Galaxy, you are often asked to provide the find and replacement expressions separately, so you don’t have to use the s/../../g structure.
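The s/chr(\d{1,2})/CHR\1/g example above maps directly onto Python’s re.sub, where the find and replacement expressions are likewise given separately:

```python
import re

# Capture the chromosome number and reuse it as \1 in the replacement.
result = re.sub(r"chr(\d{1,2})", r"CHR\1", "chr14, chr2 and chrX")
print(result)  # CHR14, CHR2 and chrX
```

Note that chrX is untouched: the pattern requires one or two digits after chr, so it does not match.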

There is a lot more you can do with regular expressions, and there are a few different flavours in different tools/programming languages, but these are the most important basics that will already allow you to do many of the tasks you might need in your analysis.

Tip: RegexOne is a nice interactive tutorial to learn the basics of regular expressions.

Tip: Regex101.com is a great resource for interactively testing and constructing your regular expressions, it even provides an explanation of a regular expression if you provide one.

Tip: Cyrilex is a visual regular expression tester.

Select multiple datasets

  1. Click on param-files Multiple datasets
  2. Select several files by keeping the Ctrl (or COMMAND) key pressed and clicking on the files of interest

Selecting a dataset collection as input

  1. Click on param-collection Dataset collection in front of the input parameter you want to supply the collection to.
  2. Select the collection you want to use from the list

Sorting Tools

Sometimes input errors are caused because of non-sorted inputs. Try using these:

  • Picard SortSam: Sort SAM/BAM by coordinate or queryname.
  • Samtools Sort: Alternate for SAM/BAM, best when used for coordinate sorting only.
  • SortBED order the intervals: Best choice for BED/Interval.
  • Sort data in ascending or descending order: Alternate choice for Tabular/BED/Interval/GTF.
  • VCFsort: Best choice for VCF.
  • Tool Form Options for Sorting: Some tools have an option to sort inputs during job execution. Whenever possible, sort inputs before using tools, especially if jobs fail for not having enough memory resources.

Tool doesn't recognize input datasets

The expected input datatype assignment is explained on the tool form. Review the input select areas and the help section below the Run Tool button.

Understanding datatypes FAQ.

No datasets or collections available? Solutions:

  1. Upload or Copy an appropriate dataset for the input into the active history.
    • To load new datasets, review the Upload tool and more choices under Get Data within Galaxy.
    • To copy datasets from a different history into the active history see this FAQ.
    • To use datasets loaded into a shared Data Library see this FAQ.
  2. Resolve a datatype assignment incompatibility between the dataset and the tool.
  3. Individual datasets and dataset collections are selected differently on tool forms.
    • To select a collection input on a tool form see this FAQ.

Using tutorial mode

Tutorial mode saves you screen space, finds the tools you need, and ensures you use the correct versions for the tutorials to run.

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.

  • Open your Galaxy server
  • Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
  • Navigate to your tutorial
  • Tool names in tutorials will be blue buttons that open the correct tool for you
  • Note: this does not work for all tutorials (yet) gif showing how GTN-in-Galaxy works
  • You can click anywhere in the grey-ed out area outside of the tutorial box to return back to the Galaxy analytical interface
Warning: Not all browsers work!
  • We’ve had some issues with Tutorial mode on Safari for Mac users.
  • Try a different browser if you aren’t seeing the button.

Viewing tool logs (`stdout` and `stderr`)

Most tools create log files as output, which can contain useful information about how the tool ran (stdout, or standard output), and what went wrong (stderr, or standard error).

To view these log files in Galaxy:

  • Expand one of the outputs of the tool in your history
  • Click on View details details
  • Scroll to the Job Information section
    • Here you will find links to the log files (stdout and stderr).

Where is the tool help?

Finding tool support

There is documentation available on the tool form itself which mentions the following information:

  • Parameters
  • Expected format for input dataset(s)
  • Links to publications and ToolShed source repositories
  • Tool and wrapper version(s)
  • 3rd party author web sites and documentation

Scroll down on the tool form to locate:

  • Information about expected inputs/outputs
  • Expanded definitions
  • Sample data
  • Example use cases
  • Graphics

Troubleshooting


How to find and correct tool errors related to Metadata?

Finding and Correcting Metadata

Tools can error when the wrong dataset attributes (metadata) are assigned. Some of these wrong assignments may be:

  • Tool outputs, which are automatically assigned without user action.
  • Incorrect autodetection of datatypes, which need manual modification.
  • Undetected attributes, which require user action (example: assigning database to newly uploaded data).

How to notice missing Dataset Metadata:

  • Dataset will not be downloaded when using the disk icon galaxy-save.
  • Tools error when using a previously successfully used specific dataset.
  • Tools error with a message that ends with: OSError: [Errno 2] No such file or directory.

Solution:

Click on the dataset’s pencil icon galaxy-pencil to reach the Edit Attributes forms and do one of the following as applies:

  • Directly reset metadata
    • Find the tab for the metadata you want to change, make the change, and save.
  • Autodetect metadata
    • Click on the Auto-detect button. The dataset will turn yellow in the history while the job is processing.

Incomplete Dataset Download

In case the dataset downloads incompletely:

  • Use the Google Chrome web browser. Sometimes Chrome works better at supporting continuous data transfers.
  • Use the command-line option instead. The data may really be too large to download OR your connection is slower. This can also be a faster way to download multiple datasets plus ensure a complete transfer (small or large data).

Understanding 'canceled by admin' or cluster failure error messages

The initial error message could be:


This job failed because it was cancelled by an administrator.
Please click the bug icon to report this problem if you need help.

Or


job info:
Remote job server indicated a problem running or monitoring this job.
  • Causes:
    • Server or cluster error.
    • Less frequently, input problems are a factor.
  • Solutions:

Understanding 'exceeds memory allocation' error messages

The error messages displayed are as follows:


job info:
This job was terminated because it used more memory than it was allocated.
Please click the bug icon to report this problem if you need help.

Or


stderr:
Fatal error: Exit code 1 ()
slurmstepd: error: Detected 1 oom-kill event(s) in step XXXXXXX.batch cgroup.

Sometimes this message may appear at the bottom


job stderr:
slurmstepd: error: Detected 1 oom-kill event(s) in step XXXXXXX.batch cgroup.

In rare cases when the memory quota is exceeded very quickly, an error message such as the following can appear


job stderr:
Fatal error: Exit code 1 ()
Traceback (most recent call last):
(other lines)
MemoryError

Note: Job runtime memory is different from the amount of free storage space (quota) in an account.

  • Causes:
    • The job ran out of memory while executing on the cluster node that ran the job.
    • The most common reasons for this error are input and tool parameters problems that must be adjusted/corrected.
  • Solutions:
    • Try at least one rerun to execute the job on a different cluster node.
    • Review the Solutions section of the Understanding input error messages FAQ.
    • Your data may actually be too large to process at a public Galaxy server. Alternatives include setting up a private Galaxy server.

Understanding ValueError error messages

The full error is usually a longer message seen only after clicking on the bug icon or by reviewing the job details stderr.

How to do both is covered in the Troubleshooting errors FAQ.


stderr
...
Many lines of text, may include parameters
...
...
ValueError: invalid literal for int() with base 10: some-sequence-read-name
  • Causes:
    • MACS2 produces this error the first time it is run. MACS is not the only tool that can produce this issue, but it is the most common.
  • Solutions:
    • Try at least one rerun.
    • MACS2 cannot interpret sequence read names that contain spaces. Try these two fixes:
      • Remove unmapped reads from the SAM dataset. There are several filtering tools in the groups SAMTools and Picard that can do this.
      • Convert the SAM input to BAM format with the tool SAMtools: SAM-to-BAM. When compressed input is given to MACS, the spaces are no longer an issue.

Understanding input error messages

Input problems are very common across any analysis that makes use of programmed tools.

  • Causes:
    • No quality assurance or content/formatting checks were run on the first datasets of an analysis workflow.
    • Incomplete dataset Upload.
    • Incorrect or unassigned datatype or database.
    • Tool-specific formatting requirements for inputs were not met.
    • Parameters set on a tool form are a mismatch for the input data content or format.
    • Inputs were in an error state (red) or were putatively successful (green) but are empty.
    • Inputs do not meet the datatype specification.
    • Inputs do not contain the exact content that a tool is expecting or that was input in the form.
    • Annotation files are a mismatch for the selected or assigned reference genome build.
    • Special case: Some of the data were generated outside of Galaxy, but later a built-in indexed genome build was assigned in Galaxy for use with downstream tools. This scenario can work, but only if those two reference genomes are an exact match.
  • Solutions:
    • Review our Troubleshooting Tips for what and where to check.
    • Review the GTN for related tutorials on tools/analysis plus FAQs.
    • Review Galaxy Help for prior discussion with extended solutions.
    • Review datatype FAQs.
    • Review the tool form.
      • Input selection areas include usage help.
      • The help section at the bottom of a tool form often has examples. Does your own data match the format/content?
      • See the links to publications and related resources.
    • Review the inputs.
      • All inputs must be in a success state (green) and actually contain content.
      • Did you directly assign the datatype or convert the datatype? What results when the datatype is detected by Galaxy? If these differ, there is likely a content problem.
      • For most analysis, allowing Galaxy to detect the datatype during Upload is best and adjusting a datatype later should rarely be needed. If a datatype is modified, the change has a specific purpose/reason.
      • Does your data have headers? Is that in specification for the datatype? Does the tool form have an option to specify if the input has headers or not? Do you need to remove headers first for the correct datatype to be detected? Example GTF.
      • Large inputs? Consider modifying your inputs to be smaller. Examples: FASTQ and FASTA.
    • Run quality checks on your data.
      • Search GTN tutorials with the keyword “qa-qc” for examples.
      • Search Galaxy Help with the keywords “qa-qc” and your datatype(s) for more help.
    • Reference annotation tips.
    • Input mismatch tips.
      • Do the chromosome/sequence identifiers exactly match between all inputs? Search Galaxy Help for more help about how to correct build/version identifier mismatches between inputs.
      • “Chr1” and “chr1” and “1” do not mean the same thing to a tool.
    • Custom genome transcriptome exome tips. See FASTA.
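As a quick sanity check before running tools, you can compare the identifier sets yourself; the identifiers below are made up for illustration:

```python
# Identifiers as they might appear in a custom genome FASTA vs. a GTF
fasta_ids = {"chr1", "chr2", "chr3"}
gtf_ids = {"Chr1", "1", "chr2"}

print(fasta_ids & gtf_ids)   # {'chr2'} -- the only exact match
print(gtf_ids - fasta_ids)   # {'Chr1', '1'} -- tools will not recognize these
```

Only chr2 matches exactly; Chr1 and 1 are different identifiers to a tool, even though a human reads them as the same chromosome.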

Understanding walltime error messages

The full error message will be reported as below, and can be found by clicking on the bug icon for a failed job run (red dataset):


job info:
This job was terminated because it ran longer than the maximum allowed job run time.
Please click the bug icon to report this problem if you need help.

Or sometimes,


job stderr:
slurmstepd: error: *** JOB XXXX ON XXXX CANCELLED AT 2019-XX-XXTXX:XX:XX DUE TO TIME LIMIT ***

job info:
Remote job server indicated a problem running or monitoring this job.
  • Causes:
    • The job execution time exceeded the “wall-time” on the cluster node that ran the job.
    • The server may be undergoing maintenance.
    • Very often input problems also cause this same error.
  • Solutions:

What information should I include when reporting a problem?

Writing bug reports is a good skill for bioinformaticians to have. A key point is to include enough information in your first message to make resolving your issue more efficient and a better experience for everyone.

What to include

  1. Which commands did you run, precisely? We want details. Which flags did you set?
  2. Which server(s) did you run those commands on?
  3. What account/username did you use?
  4. Where did it go wrong?
  5. What were the stdout/stderr of the tool that failed? Include the text.
  6. Did you try any workarounds? What results did those produce?
  7. (If relevant) screenshot(s) that show exactly the problem, if it cannot be described in text. Is there a details panel you could include too?
  8. If there are job IDs, please include them as text so administrators don’t have to manually transcribe the job ID in your picture.

It makes the process of answering ‘bug reports’ much smoother for us, as we will have to ask you these questions anyway. If you provide this information from the start, we can get straight to answering your question!

What does a GOOD bug report look like?

The people who provide support for Galaxy are largely volunteers in this community, so try and provide as much information up front to avoid wasting their time:

I encountered an issue: I was working on (this server) and trying to run (tool)+(version number) but all of the output files were empty. My username is jane-doe.

Here is everything that I know:

  • The dataset is green, the job did not fail
  • This is the standard output/error of the tool that I found in the information page (insert it here)
  • I have read it but I do not understand what X/Y means.
  • The job ID from the output information page is 123123abdef.
  • I tried re-running the job and changing parameter Z but it did not change the result.

Could you help me?



User interface


I can’t find the “Analyze Data” button

The Galaxy interface has changed a bit recently, “Analyze Data” was always the home button, and now looks like a home icon.

My Galaxy looks different than in the tutorial/video

Galaxy gets frequent updates, and different servers will be running different versions. This is nothing to worry about; just let us know if you can’t find how to perform a task in your Galaxy.


User preferences


Does your account usage quota seem incorrect?

  1. Log out of Galaxy, then back in again. This refreshes the disk usage calculation displayed in the Masthead usage (summary) and under User > Preferences (exact).

Note:

  • Your account usage quota can be found at the bottom of your user preferences page.

Forgot Password

  1. Go to the Galaxy server you are using.
  2. Click on Login or Register.
  3. Enter your email in the Public Name or Email Address entry box.
  4. Click on the link under the password entry box titled Forgot password? Click here to reset your password.
  5. An email will be sent with a password reset link. This email may be in your email Spam or Trash folders, depending on your filters.
  6. Click on the reset link in the email or copy and paste it into a web browser window.
  7. Enter your new password and click on Save new password.

Getting your API key

  1. In your browser, open your Galaxy homepage
  2. Log in, or register a new account, if it’s the first time you’re logging in
  3. Go to User -> Preferences in the top menu bar, then click on Manage API key
  4. If there is no current API key available, click on Create a new key to generate it
  5. Copy your API key somewhere convenient; you will need it throughout this tutorial
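
Once you have the key, you can use it to call the Galaxy REST API. A minimal sketch, assuming a server URL and a placeholder key: Galaxy accepts the key in the `x-api-key` request header. The request is only built here, not actually sent.

```python
from urllib.request import Request

GALAXY_URL = "https://usegalaxy.org"  # placeholder: use your own server
API_KEY = "YOUR_API_KEY"              # paste the key from Manage API key

def galaxy_request(path):
    """Build an authenticated request for a Galaxy API endpoint."""
    req = Request(f"{GALAXY_URL}/api/{path.lstrip('/')}")
    # Galaxy reads the API key from the x-api-key header.
    req.add_header("x-api-key", API_KEY)
    return req

req = galaxy_request("histories")
print(req.full_url)
```

In practice, libraries such as BioBlend wrap these calls for you; the point here is simply where the key goes.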

Utilities


Got lost along the way?

Comment: Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!


Visualisation


Using IGV with Galaxy

You can send data from your Galaxy history to IGV for viewing as follows:

  1. Install IGV on your computer (IGV download page)
  2. Start IGV
  3. In recent versions of IGV, you will have to enable the port:
    • In IGV, go to View > Preferences > Advanced
    • Check the box Enable Port
  4. In Galaxy, expand the dataset you would like to view in IGV
    • Make sure you have set a reference genome/database correctly (dbkey) (instructions)
    • Under display in IGV, click on local
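
Behind the scenes, the “local” link talks to IGV’s local HTTP control port (60151 by default once Enable Port is checked), sending a `load` command with the dataset’s URL. A small illustrative sketch of how such a command URL is formed (the dataset URL and locus below are placeholders):

```python
from urllib.parse import urlencode

IGV_PORT = 60151  # IGV's default listening port when "Enable Port" is on

def igv_load_url(file_url, locus=None):
    """Build a 'load' command URL for a locally running IGV instance."""
    params = {"file": file_url}
    if locus:
        params["locus"] = locus  # e.g. "chr1:1-1000" to jump to a region
    return f"http://localhost:{IGV_PORT}/load?{urlencode(params)}"

url = igv_load_url("https://usegalaxy.org/datasets/xyz/display",
                   locus="chr1:1-1000")
print(url)
```

Opening such a URL (or letting Galaxy do it for you) makes the running IGV instance fetch and display the dataset.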


Workflows


Annotate a workflow

  • Open the workflow editor for the workflow
  • Click on galaxy-pencil Edit Attributes on the top right
  • Write a description of the workflow in the Annotation box
  • Add a tag (which will help to search for the workflow) in the Tags section

Creating a new workflow

You can create a Galaxy workflow from scratch in the Galaxy workflow editor.
  1. Click Workflow on the top bar
  2. Click the new workflow galaxy-wf-new button
  3. Give it a clear and memorable name
  4. Clicking Save will take you directly into the workflow editor for that workflow
  5. Need more help? Please see the How to make a workflow subsection here

Ensuring Workflows meet Best Practices

When you are editing a workflow, there are a number of additional steps you can take to ensure that it is a Best Practice workflow and will be more reusable.

  1. Open a workflow for editing
  2. In the workflow menu bar, you’ll find the galaxy-wf-options Workflow Options dropdown menu.
  3. Click on it and select Best Practices.

    screenshot showing the best practices menu item in the gear dropdown.

  4. This will take you to a new side panel, which allows you to investigate and correct any issues with your workflow.

    screenshot showing the best practices side panel. several issues are raised like a missing annotation with a link to add that, and non-optional inputs that are unconnected. Additionally several items already have green checks like the workflow defining creator information and a license.

Extracting a workflow from your history

Galaxy can automatically create a workflow based on the analysis you have performed in a history. This means that once you have done an analysis manually once, you can easily extract a workflow to repeat it on different data.
  1. Clean up your history: remove any failed (red) jobs from your history by clicking on the galaxy-delete button.

    This will make the creation of the workflow easier.

  2. Click on galaxy-gear (History options) at the top of your history panel and select Extract workflow.

    `Extract Workflow` entry in the history options menu

    The central panel will show the content of the history in reverse order (oldest on top), and you will be able to choose which steps to include in the workflow.

  3. Replace the Workflow name with something more descriptive.

  4. Rename each workflow input in the boxes at the top of the second column.

  5. If there are any steps that shouldn’t be included in the workflow, you can uncheck them in the first column of boxes.

  6. Click on the Create Workflow button near the top.

    You will get a message that the workflow was created.

Extracting a workflow from your history

Galaxy can automatically create a workflow based on an analysis stored in your history. This means that once you have performed an analysis manually, you can easily extract a workflow to repeat it with different data.
  1. Remove any failed or unwanted jobs from your history.
  2. Click on History options (galaxy-gear gear icon) at the top of the history panel.
  3. Select Extract workflow
  4. Review the steps, enter a name for your workflow, and press the Create Workflow button.

Hiding intermediate steps

When a workflow is executed, the user is usually primarily interested in the final product and not in all intermediate steps. By default all the outputs of a workflow will be shown, but we can explicitly tell Galaxy which outputs to show and which to hide for a given workflow. This behaviour is controlled by the little checkbox in front of every output dataset:

Asterisk for `out_file1` in the `Select First` tool

Import workflows from DockStore

Dockstore is a free and open source platform for sharing reusable and scalable analytical tools and workflows.

  1. Ensure that you are logged in to your Galaxy account.
  2. Go to DockStore.
  3. Select any Galaxy workflow you want to import.
  4. Click on “Galaxy” dropdown within the “Launch with” panel located in the upper right corner.
  5. Select the Galaxy instance you want to launch this workflow with.
  6. You will be redirected to Galaxy and presented with a list of workflow versions.
  7. Click the version you want (usually the latest labelled as “main”)
  8. You are done!

The following short video walks you through this uncomplicated procedure:

Video: Importing from Dockstore

Import workflows from WorkflowHub

WorkflowHub is a workflow management system which allows workflows to be FAIR (Findable, Accessible, Interoperable, and Reusable), citable, have managed metadata profiles, and be openly available for review and analytics.

  1. Ensure that you are logged in to your Galaxy account.
  2. Click on the Workflow menu, located in the top bar.
  3. Click on the Import button, located in the right corner.
  4. In the section “Import a Workflow from Configured GA4GH Tool Registry Servers (e.g. Dockstore)”, click on Search form.
  5. In the TRS Server: workflowhub.eu menu, type your query.

    galaxy TRS workflow search field; name:vgp is entered in the search bar, and five different workflows, all labelled VGP, are listed
  6. Click on the desired workflow, and finally select the latest available version.

After that, the imported workflows will appear in the main workflow menu. To run a workflow, just click its workflow-run Run workflow icon.

Below is a short video showing this uncomplicated procedure:

Video: Importing from WorkflowHub

Importing a workflow

  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on galaxy-upload Import at the top-right of the screen
  • Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  • Click the Import workflow button

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL
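
Whether pasted as a URL or uploaded as a file, the workflow itself is a Galaxy `.ga` file, which is plain JSON, so you can inspect one before importing it. The tiny document below only illustrates a few top-level fields; it is not a complete, runnable workflow.

```python
import json

# Minimal illustration of the .ga structure (hypothetical content).
ga_text = json.dumps({
    "a_galaxy_workflow": "true",          # marker field in .ga files
    "name": "My imported workflow",
    "annotation": "What this workflow does",
    "steps": {"0": {"type": "data_input", "label": "Input dataset"}},
})

wf = json.loads(ga_text)
print(f"{wf['name']}: {len(wf['steps'])} step(s)")
```

A quick look at `name`, `annotation`, and the `steps` keys tells you what you are about to import.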

Importing a workflow using the search

  1. Click on Workflow in the top menu bar of Galaxy. You will see a list of all your workflows.
  2. Click on the galaxy-upload Import icon at the top-right of the screen
  3. On the new page, select the GA4GH servers tab, and configure the GA4GH Tool Registry Server (TRS) Workflow Search interface as follows:
    1. “TRS Server”: the TRS Server you want to search on (Dockstore or Workflowhub)
    2. Type in the search query
    3. Expand the correct workflow by clicking on it
    4. Select the version you would like to galaxy-upload import

The workflow will be imported to your list of workflows. Note that it will also carry a little green check mark next to its name, which indicates that this is an original workflow version imported from a TRS server. If you ever modify the workflow with Galaxy’s workflow editor, it will lose this indicator.

Importing and Launching a Dockstore workflow

Hands-on: Importing and Launching a Dockstore workflow
  1. Go to Workflow → Import in your Galaxy
  2. Switch tabs to TRS ID
  3. Ensure the TRS server is set to “dockstore.org”
  4. Provide the workflow’s TRS ID

Importing and Launching a WorkflowHub.eu Workflow

Hands-on: Importing and Launching a WorkflowHub.eu Workflow
  1. Go to Workflow → Import in your Galaxy
  2. Switch tabs to TRS ID
  3. Ensure the TRS server is set to “workflowhub.eu”
  4. Provide your workflow hub ID

Importing and launching a GTN workflow

Hands-on: Importing and launching a GTN workflow
  1. Find the material you are interested in
  2. View its workflows, which can be found in the metadata box at the top of the tutorial
  3. Click the button on any workflow to run it.

Make a workflow public

  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows
  • Click on the interesting workflow
  • Click on Share
  • Click on Make Workflow accessible. This makes the workflow publicly accessible but unlisted.
  • To also list the workflow in the Shared Data section (in the top menu bar) of Galaxy, click Make Workflow publicly available in Published Workflows

Opening the workflow editor

  1. In the top menu bar, click on Workflows
  2. Click on the name of the workflow you want to edit

    Workflow drop-down menu showing the Edit option
  3. Select galaxy-wf-edit Edit from the dropdown menu to open the workflow in the workflow editor

Renaming workflow outputs

  1. Open the workflow editor
  2. Click on the tool in the workflow to get the details of the tool on the right-hand side of the screen.
  3. Scroll down to the Configure Output section of your desired parameter, and click it to expand it.
    • Under Rename dataset, give it a meaningful name

      Rename output datasets

Running a workflow

  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on the workflow-run (Run workflow) button next to your workflow
  • Configure the workflow as needed
  • Click the Run Workflow button at the top-right of the screen
  • You may have to refresh your history to see the queued jobs

Setting parameters at run-time

  1. Open the workflow editor
  2. Click on the tool in the workflow to get the details of the tool on the right-hand side of the screen.
  3. Scroll down to the parameter you want users to provide every time they run the workflow
  4. Click on the workflow-runtime-toggle arrow in front of the parameter name to toggle it to Set at runtime

Viewing a workflow report

You can find the workflow report via its workflow invocation:
  • Go to User on the top menu bar of Galaxy.
  • Click on Workflow invocations
    • Here you will find a list of all the workflows you have run
  • Click on the name of a workflow invocation to expand it

    workflow invocations list
  • Click on View Report to go to the workflow report page
  • Note: The report can also be downloaded in PDF format by clicking on the galaxy-wf-report-download icon.




Still have questions?
Gitter Chat Support
Galaxy Help Forum