Frequently Asked Questions
Tutorial Questions
How to enable the Activity Bar
This FAQ demonstrates how to enable the Activity Bar within the Galaxy interface.
If you do not see the Activity Bar, it can be enabled as follows:
- Click on the “User” link at the top of the Galaxy interface
- Select “Preferences”
- Scroll down and click on “Manage Activity Bar”
- Toggle the “Enable Activity Bar” switch and voila!
Account
Can I create multiple Galaxy accounts?
The account registration form and activation email include a terms of service statement.
- You ARE NOT allowed to create more than 1 account per Galaxy server.
- You ARE allowed to have accounts on different servers.
For example, you are allowed to have 1 account on Galaxy US, and another account on Galaxy EU, but never 2 accounts on the same Galaxy.
WARNING: Having multiple accounts is a violation of the terms of service, and may result in deletion of your accounts.
Need more disk space?
- Review your User -> Preferences -> Storage Dashboard to find and manage all of your data.
- Read about more ways to free up space in your account
- Contact the admins of your Galaxy server to ask about possibilities for temporarily increasing your quota.
Other tips:
- Forgot your password? You can request a reset link on the login page.
- If you want to associate your account with a different email address, you can do so under User -> Preferences in the top menu bar.
- To start over with a new account, delete your existing account(s) first before creating your new account. This can be done in User -> Preferences menu in the top bar.
Changing account email or password
- Make sure you are logged in to Galaxy.
- Go to User > Preferences in the top menu bar.
- To change email and public name, click on Manage Information and to change password, click on Change Password.
- Make the changes and click on the Save button at the bottom.
- To change email successfully, verify your account by email through the activation link sent by Galaxy.
Note: Don’t open another account if your email changes; update the existing account email instead. Creating a new account will be detected as a duplicate and will get your account disabled and deleted.
How can I reduce quota usage while still retaining prior work (data, tools, methods)?
- Download Datasets as individual files or entire Histories as an archive. Then purge them from the public server.
- Transfer/Move Datasets or Histories to another Galaxy server, including your own Galaxy. Then purge.
- Copy your most important Datasets into a new/other History (inputs, results), then purge the original full History.
- Extract a Workflow from the History, then purge it.
- Back-up your work. It is a best practice to download an archive of your FULL original Histories periodically, even those still in use, as a backup.
Resources: Much discussion about all of the above options can be found at the Galaxy Help forum.
How do I create an account on a public Galaxy instance?
To create an account at any public Galaxy instance, choose your server from the available list of Galaxy Platforms.
There are several UseGalaxy servers:
UseGalaxy.eu (EU)
UseGalaxy.org.au (AU)
UseGalaxy.org (US)
UseGalaxy.fr (FR)
Click on “Login or Register” in the masthead on the server.
On the login page, find the Register here link and click on it.
Fill in the registration form, then click on Create.
Your account should now get created, but will remain inactive until you verify the email address you provided in the registration form.
Check for a Confirmation Email in the email you used for account creation.
Missing? Check your Trash and Spam folders.
Click on the Email confirmation link to fully activate your account.
Is delivery of the confirmation email blocked by your email provider, or did you mistype the email address in the registration form?
Please do not register again, but follow the instructions to change the email address registered with your account! The confirmation email will be resent to your new address once you have changed it.
Trouble logging in later? Account email addresses and public names are caSe-sensiTive. Check your activation email for formats.
How to update account preferences?
- Log in to Galaxy
- Navigate to User -> Preferences on the top menu bar
- Here you can update various preferences, such as:
- Manage Information (change your registered email addresses or public name)
- Change Password (change your login credentials)
- Set Dataset Permissions for New Histories (grant others default access to newly created histories)
- Manage Toolbox Filters (customize your Toolbox by displaying or omitting sets of Tools)
- Manage API Key (access your current API key or create a new one)
- Manage Notifications (allow push and tab notifications on job completion)
- Manage Cloud Authorization (grant Galaxy access to your cloud-based resources)
- Manage Third-Party Identities (connect or disconnect access to your third-party identities)
- Manage Custom Builds (custom databases based on fasta datasets)
- Manage Activity Bar (a bonus navigation bar)
- Pick a Color Theme (interface color theme)
- Make All Data Private (disable all data sharing)
- Delete Account (on this Galaxy server)
- Sign out of Galaxy (signs you out of all sessions)
Analysis
Adding a custom database/build (dbkey)
Galaxy may have several reference genomes built-in, but you can also create your own.
- Navigate to the History that contains your fasta for the reference genome
- Standardize the fasta format
- In the top menu bar, go to User -> Preferences -> Manage Custom Builds
- Create a unique Name for your reference build
- Create a unique Database (dbkey) for your reference build
- Under Definition, select the option FASTA-file from history
- Under FASTA-file, select your fasta file
- Click the Save button
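The "standardize the fasta format" step above can be pictured with a small script. This is only a sketch under stated assumptions: it assumes that standardizing means keeping the title text up to the first whitespace, dropping blank lines, and wrapping sequence lines at a fixed width. The function name `standardize_fasta` and the default width of 80 are illustrative choices, not part of Galaxy.

```python
def standardize_fasta(text, width=80):
    """Return FASTA text with short titles, no blank lines and fixed-width sequence lines."""
    records = []  # list of (identifier, [sequence chunks])
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            # Assumption: only the text before the first whitespace is kept as the identifier
            records.append((parts[0] if parts else "", []))
        elif records:
            records[-1][1].append(line)

    out = []
    for ident, chunks in records:
        sequence = "".join(chunks)
        out.append(">" + ident)
        # Re-wrap the sequence so every line (except the last) has the same length
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = ">chr1 some description\nACGT\n\nACGTAC\n>chr2\nGG\n"
    print(standardize_fasta(demo, width=4), end="")
```

Inside Galaxy the same result is normally reached with tools rather than a script; the sketch only shows what a standardized file looks like.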
Beware of Cuts
Galaxy has several different cut tools.
Warning: The section below uses the Cut tool. There are two cut tools in Galaxy for historical reasons. This example uses the tool with the full name Cut columns from a table (cut). However, the same logic applies to the other tool; it simply has a slightly different interface.
Extended Help for Differential Expression Analysis Tools
The error and usage help in this FAQ applies to most if not all Bioconductor tools.
- DEseq2
- Limma
- edgeR
- goseq
- Diffbind
- StringTie
- Featurecounts
- HTSeq-count
- HTseq-clip
- Kallisto
- Salmon
- Sailfish
- DEXSeq
- DEXSeq-count
- IsoformSwitchAnalyzeR
Review your error messages and you’ll find some clues about what may be going wrong and what needs to be adjusted in your rerun. If you are getting a message from R, that usually means the underlying tool could not read in or understand your inputs. This can be a labeling problem (what was typed on the form) or a content problem (data within the files).
Expect odd errors or content problems if any of the usage requirements below are not met.
General
- Are your reference genome, reference transcriptome, and reference annotation all based on the same genome assembly?
- Check the identifiers in all inputs and adjust as needed.
- These all may mean the same thing to a person but not to a computer or tool: chr1, Chr1, 1, chr1.1
- Differential expression tools all require sample count replicates. Rationale from two of the DEseq tool authors.
- At least two factor levels/groups/conditions with two samples each.
- All samples must contain unique content for valid scientific results.
- Factor/Factor level names should only contain alphanumeric characters and optionally underscores.
- Avoid starting these with a number and do not include spaces.
- Galaxy may be able to normalize these values for you, but if you are getting an error: standardize the format yourself.
- DEXSeq additionally requires that the first Condition is labeled as Condition.
- If your count inputs have a header, set the option Files have header? to Yes. If there are no headers, set it to No.
- If your files have more than one header line: keep the sample header line, remove all extra line(s).
- Make sure that tool form settings match your annotation content or the tool cannot match up the inputs!
- If you are counting by gene_id, your annotation should contain gene_id attributes (9th column)
- If you are summarizing by exon, your annotation should contain exon features (3rd column)
- Sometimes these tools do not understand transcript_id.N and gene_id.N notation (where N is a version number).
- This notation could be in fasta or tabular inputs.
- Try removing .N from all inputs, and check for the accidental creation of new duplicates!
- Errors? Understanding the job log messages can be confusing, but they are accessible and worth reviewing.
- The good news is that usage in Galaxy produces the same error messages as direct usage.
- This means that a search at the Bioconductor Support website can provide useful clues! Come back to the Galaxy Help forum with any remaining questions.
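Two of the checks in the list above (label characters and version suffixes) are easy to try outside Galaxy before a rerun. The sketch below is illustrative only: the function names are made up for this example, and the rule encoded in `LABEL_RE` is an interpretation of "alphanumeric characters and optionally underscores, not starting with a number".

```python
import re
from collections import Counter

# Letters, digits and underscores only; must not start with a digit
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bad_labels(labels):
    """Return the labels that break the naming recommendation."""
    return [label for label in labels if not LABEL_RE.fullmatch(label)]


def strip_versions(identifiers):
    """Remove a trailing .N version suffix and report names that became duplicates."""
    stripped = [re.sub(r"\.\d+$", "", ident) for ident in identifiers]
    counts = Counter(stripped)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return stripped, duplicates


if __name__ == "__main__":
    print(bad_labels(["treated", "control_1", "1st batch", "time-point"]))
    print(strip_versions(["ENST0001.1", "ENST0001.2", "ENST0002.5"]))
```

A non-empty duplicates list means that removing the version suffix merged identifiers that were distinct before, which is the situation the list above warns about.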
Tip: Remember, for any value in your inputs that is not a number, using only alphanumeric characters and optionally underscores (_) with no spaces is what the authors recommend. Check your factor names, sample names, gene identifiers, transcript identifiers, and header lines in files.
Reference genome (fasta)
- Can be a server reference genome (hosted index in the pull down menu) or a custom reference genome (fasta from the history).
- Custom reference genomes must be formatted correctly.
- If you are using Salmon or Kallisto, you probably don’t need a reference genome but a reference transcriptome instead!
- More about understanding and working with large fasta datasets.
Reference transcriptome (fasta)
- Fasta file containing assembled transcripts.
- Unassembled short or long reads will not work as a substitute.
- The transcript identifiers on the >seq fasta lines must exactly match the transcript_id values in your annotation or tabular mapping file.
Reference annotation (tabular, GTF, GFF3)
- Reference annotation in GTF format works best.
- If a GTF dataset is not available for your genome, a two-column tabular dataset containing transcript <tab> gene can be used instead with most of these tools.
- HTseq-count requires GTF attributes. Featurecounts is an alternative tool choice.
- Sometimes the tool gffread is used to transform GFF3 data to GTF.
- DO use UCSC’s reference annotation (GTF) and reference transcriptome (fasta) data from their Downloads area.
- These are a match for the UCSC genomes indexed at public Galaxy servers.
- Links can be directly copy/pasted into the Upload tool.
- Allow Galaxy to autodetect the datatype to produce an uncompressed dataset in your history ready to use with tools.
- Avoid GTF data from the UCSC Table Browser: this leads to scientific problems. GTFs will have the same content populated for both the transcript_id and gene_id values. See the note at UCSC for more about why.
- Still have problems? Try removing all GTF header lines with the tool Remove beginning of a file.
- More about understanding and working with GTF/GFF/GFF3 reference annotation
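The two-column transcript <tab> gene table and the warning about identical transcript_id and gene_id values can both be illustrated with a short script. This is a sketch, assuming a GTF whose 9th column holds `key "value";` attribute pairs; the function names are hypothetical and the parsing is deliberately minimal.

```python
import re

# Matches attribute pairs such as: gene_id "G1";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def transcript_gene_pairs(gtf_text):
    """Collect unique (transcript_id, gene_id) pairs from the 9th GTF column."""
    pairs = []
    seen = set()
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        pair = (attrs.get("transcript_id"), attrs.get("gene_id"))
        if None not in pair and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


def ids_are_identical(pairs):
    """True when every transcript_id equals its gene_id, the pattern described as a problem above."""
    return bool(pairs) and all(transcript == gene for transcript, gene in pairs)


if __name__ == "__main__":
    gtf = (
        'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
        'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n'
    )
    for transcript, gene in transcript_gene_pairs(gtf):
        print(transcript + "\t" + gene)
```

If `ids_are_identical` returns True for your annotation, gene-level summaries will effectively be transcript-level, so it is worth switching to a different annotation source before counting.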
How can I do analysis X? - Getting help
If you don’t know how to perform a certain analysis, you can ask the Galaxy community for help.
Where to ask
The best places to ask your analysis questions are:
Note: For questions about errors you’ve encountered in Galaxy, please see our troubleshooting page.
How to ask
The more detail you provide, the better we can help you. Please provide information about:
- Your data and experiment e.g. “paired-end RNASeq, mouse, 16 triplicates, 2 timepoints”, etc
- Your goal and research question e.g. “I want to detect differentially expressed genes between these two groups and generate a volcano plot”
- What you have already tried? Do you already know which tools you want to use? Did you already try some but they didn’t work? Why not? Did you find good papers describing something similar to what you want to do? etc.
- Which Galaxy are you using? And if you have already tried some steps, please share your Galaxy history via URL and provide this along with your question.
- Examples
- Bad Question: “Help!!! How to perform metagenomics analysis. I need it urgent!”
- Good Question: “Hello everybody, I have 16S rRNA sequencing data from Illumina, it was paired-end with 150bp reads. I want to perform a taxonomy analysis similar to this paper (provide link). I have followed this GTN tutorial (provide link), but my data is different because (reason) . How can I adapt this step of the analysis for my data? I read about a tool called X, but I cannot find it in Galaxy. I am using Galaxy EU, and here is a link to my history. Any help would be greatly appreciated!”
Before you ask
- Check the Galaxy Help forum to see if others have already asked a similar question before.
- Search the GTN website for a tutorial that matches what you want to do, and work your way through that. Even if it doesn’t do exactly what you need, you usually learn a lot along the way that will help you adapt it to your own data or research question.
Be patient
Please remember that most of the people answering questions on Matrix chat and the help forum are volunteers from the community. They take time out of their busy days to help you. They may also be in a different time zone, so it may take some time to get answers. Please always be patient and kind to each other, and adhere to our code of conduct.
My jobs aren't running!
Please make sure you are logged in. At the top menu bar, you should see a section labeled “User”. If you see “Login/Register” here you are not logged in.
- Activate your account. If you have recently registered your account, you may first have to activate it. You will receive an e-mail with an activation link.
- Make sure to check your spam folder!
- Be patient. Galaxy is a free service; when a lot of people are using it, you may have to wait longer than usual (especially for ‘big’ jobs, e.g. alignments).
- Contact Support. If you really think something is wrong with the server, you can ask for support
Pick the right Concatenate tool
Most Galaxy servers will have two Concatenate tools installed - know which one to pick!
On most Galaxy servers you will find two Concatenate datasets tools installed:
- Concatenate datasets tail-to-head
- Concatenate datasets tail-to-head (cat)
The two tools have nearly identical interfaces, but behave differently in certain situations, specifically:
The second tool, the one with “(cat)” in its name, simply concatenates everything you give to it into a single output dataset.
Whether you give it multiple datasets or a collection as the first parameter, or some datasets as the first and some others as the second parameter, it will always concatenate them all. In fact, the only reason for having multiple parameters for this tool is that by providing inputs through multiple parameters, you can make sure they are concatenated in the order you pass them in.
The first tool, on the other hand, will only ever concatenate inputs provided through different parameters.
This tool allows you to specify an arbitrary number of param-file single datasets, but if you also want to use param-files multiple datasets or param-collection a collection for some of the Dataset parameters, then all of these need to be of the same type (multiple datasets or collections) and have the same number of inputs.
Now depending on the inputs, one of the following behaviors will occur:
- If all the different inputs are param-file single datasets, the tool will concatenate them all and produce a single output dataset.
- If all the different inputs are specified either as param-files multiple datasets or as param-collection, and all have the same number of datasets, then the tool will concatenate the first datasets of each input parameter, the second datasets of each input parameter, the third, etc., and produce an output collection with as many elements as there are inputs per Dataset parameter.
- In extension of the above, if some additional inputs are provided as param-file single datasets, the content of these will be recycled and be reused in the concatenation of all the nth elements of the other parameters.
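The difference between the two tools can be modelled in a few lines. This is a toy model, not the tools' implementation: a single dataset is represented by a string, multiple datasets or a collection by a list of strings, and both function names are invented for this illustration.

```python
def concatenate_cat(*parameters):
    """Model of the "(cat)" tool: everything is joined into one output, in parameter order."""
    parts = []
    for param in parameters:
        parts.extend([param] if isinstance(param, str) else param)
    return "".join(parts)


def concatenate_tail_to_head(*parameters):
    """Model of the other tool: it only concatenates across parameters."""
    lists = [p for p in parameters if not isinstance(p, str)]
    if not lists:
        # Only single datasets were given: one output dataset
        return "".join(parameters)
    sizes = {len(p) for p in lists}
    if len(sizes) != 1:
        raise ValueError("multiple-dataset inputs must all have the same number of datasets")
    outputs = []
    for i in range(sizes.pop()):
        # The i-th element of every list is combined; single datasets are reused each time
        outputs.append("".join(p if isinstance(p, str) else p[i] for p in parameters))
    return outputs


if __name__ == "__main__":
    print(repr(concatenate_cat(["a1\n", "a2\n"], "h\n")))
    print(concatenate_tail_to_head(["a1\n", "a2\n"], "h\n", ["b1\n", "b2\n"]))
```

In the second call the single dataset `h` appears in both outputs, which mirrors the recycling behavior described in the last bullet above.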
Reporting usage problems, security issues, and bugs
- For reporting Usage Problems, related to tools and functions, head to the Galaxy Help site.
- Red Error Datasets:
- Refer to the Troubleshooting errors FAQ for red error in datasets.
- Unexpected results in Green Success Dataset:
- To resolve it you may be asked to send in a shared history link and possibly a shared workflow link. For sharing your history, refer to these instructions.
- To reach our support team, visit Support FAQs.
- Functionality problems:
- Using Galaxy Help is the best way to get help in most cases.
- If the problem is more complex, email a description of the problem and how to reproduce it.
- Administrative problems:
- If the problem is present in your own Galaxy, the administrative configuration may be a factor.
- For the fastest help directly from the development community, admin issues can be alternatively reported to the mailing list or the GalaxyProject Gitter channel.
- For Security Issues, do not report them via GitHub. Kindly disclose these as explained in this document.
- For Bug Reporting, create a Github issue. Include the steps mentioned in these instructions.
- Use the GTN Search to find prior Q & A, FAQs, tutorials, and other documentation across all Galaxy resources, to check whether your issue has already been faced by someone else.
Results may vary
Comment: Your results may be slightly different from the ones presented in this tutorial due to differing versions of tools, reference data, external databases, or because of stochastic processes in the algorithms.
Troubleshooting errors
When you get a red dataset in your history, it means something went wrong. But how can you find out what it was? And how can you report errors?
When something goes wrong in Galaxy, there are a number of things you can do to find out what it was. Error messages can help you figure out whether it was a problem with one of the settings of the tool, or with the input data, or maybe there is a bug in the tool itself and the problem should be reported. Below are the steps you can follow to troubleshoot your Galaxy errors.
- Expand the red history dataset by clicking on it.
- Sometimes you can already see an error message here
- View the error message by clicking on the bug icon galaxy-bug
- Check the logs. Output (stdout) and error logs (stderr) of the tool are available:
- Expand the history item
- Click on the details icon
- Scroll down to the Job Information section to view the 2 logs:
- Tool Standard Output
- Tool Standard Error
- For more information about specific tool errors, please see the Troubleshooting section
- Submit a bug report if you are still unsure what the problem is.
- Click on the bug icon galaxy-bug
- Write down any information you think might help solve the problem
- See this FAQ on how to write good bug reports
- Click galaxy-bug Report button
- Ask for help!
- Where?
- In the GTN Matrix Channel
- In the Galaxy Matrix Channel
- Browse the Galaxy Help Forum to see if others have encountered the same problem before (or post your question).
- When asking for help, it is useful to share a link to your history
Will my jobs keep running?
Galaxy is a fantastic system, but some users find themselves wondering:
Will my jobs keep running once I’ve closed the tab? Do I need to keep my browser open?
No, you don’t! You can safely:
- Start jobs
- Shut down your computer
and your jobs will keep running in the background! Whenever you next visit Galaxy, you can check if your jobs are still running or completed.
However, this is not true for uploading data from your computer. You must wait for the upload of a dataset from your computer to finish. (Uploading via URL is not affected by this; if you’re uploading from a URL you can close your computer.)
Collections
Adding a tag to a collection
- Click on the collection in your history to view it
- Click on Edit galaxy-pencil next to the collection name at the top of the history panel
- Click on Add Tags galaxy-tags
- Add a tag starting with #
- Tags starting with # will be automatically propagated to the outputs of any tools using this dataset.
- Click Save galaxy-save
- Check that the tag appears below the collection name
Changing the datatype of a collection
This will set the datatype for all files in your collection. It does not change the files themselves.
- Click on Edit galaxy-pencil next to the collection name in your history
- In the central panel, click on the galaxy-chart-select-data Datatypes tab on the top
- Under new type, select your desired datatype
- tip: you can start typing the datatype into the field to filter the dropdown menu
- Click the Save button
Cannot find the feature?
If you are on a smaller Galaxy server, i.e. not one of the large (multi)national public servers, you may not be able to find this operation, and there is no indication it is missing or why it is disabled.
Galaxy has recently started putting more features behind a setting and deployment configuration that needs to be enabled by the server administrator. Your administrator will need to deploy Celery and potentially additionally flower and redis to their stack to enable changing the datatype of a collection. Consider sending your Galaxy administrator the link to the simpler deployment option or more complex GTN tutorial for setting up redis and flower.
Converting the datatype of a collection
This will convert all files in your collection to a different format. This will change the files themselves and create a new collection.
- Click on Edit galaxy-pencil next to the collection name in your history
- In the central panel, click on the galaxy-gear Convert tab on the top
- Under Converter Tool, select your desired conversion
- Click the Convert Collection button
Creating a dataset collection
- Click on galaxy-selector Select Items at the top of the history panel
- Check all the datasets in your history you would like to include
- Click n of N selected and choose Build Dataset List
- Enter a name for your collection
- Click Create collection to build your collection
- Click on the checkmark icon at the top of your history again
Creating a paired collection
- Click on galaxy-selector Select Items at the top of the history panel
- Check all the datasets in your history you would like to include
- Click n of N selected and choose Build List of Dataset Pairs
- Change the text of unpaired forward to a common selector for the forward reads
- Change the text of unpaired reverse to a common selector for the reverse reads
- Click Pair these datasets for each valid forward and reverse pair.
- Enter a name for your collection
- Click Create List to build your collection
- Click on the checkmark icon at the top of your history again
Renaming a collection
- Click on the collection
- Click on the name of the collection at the top
- Change the name
- Press Enter
Collections, histories
Datasets versus collections
Explanation of why collections are needed and what they are.
In Galaxy’s history, datasets can be present as individual entries or they can be combined into Collections. Why do we need collections? Collections combine multiple individual datasets into a single entity which is easy to manage. Galaxy tools can use collections directly as inputs. Collections can be simple or nested.
Simple collections
Imagine that you’ve uploaded a hundred FASTQ files corresponding to a hundred samples. These will appear as a hundred individual datasets in your history making it very long. But the chances are that when you analyze these data you will do the same thing on each dataset.
To simplify this process you can combine all hundred datasets into a single entity called a dataset collection (or simply a collection or a list). It will appear as a single box in your history, making it much easier to understand. Galaxy tools are designed to take collections as inputs. So, for example, if you want to map each of these datasets against a reference genome using, say, Minimap2, you will need to provide Minimap2 with just one input, the collection, and it will automatically start 100 jobs behind the scenes and will combine all outputs into a single collection containing BAM files.
There are a number of situations when simple collections are not sufficient to reflect the complexity of the data. To deal with this situation Galaxy allows for nested collections.
Nested collections
Probably the most common example of this is paired-end data, where each sample is represented by two files: one containing forward reads and another containing reverse reads. In Galaxy you can create a nested collection that reflects the hierarchy of the data. In the case of paired data Galaxy supports paired collections.
Data upload
Data retrieval with “NCBI SRA Tools” (fastq-dump)
This section will guide you through downloading experimental metadata, organizing the metadata to short lists corresponding to conditions and replicates, and finally importing the data from NCBI SRA in collections reflecting the experimental design.
Downloading metadata
- It is critical to understand the condition/replicate structure of an experiment before working with the data so that it can be imported as collections ready for analysis. Direct your browser to SRA Run Selector and in the search box enter GEO data set identifier (for example: GSE72018). Once the study appears, click the box to download the “RunInfo Table”.
Organizing metadata
- The “RunInfo Table” provides the experimental condition and replicate structure of all of the samples. Prior to importing the data, we need to parse this file into individual files that contain the sample IDs of the replicates in each condition. This can be achieved by using a combination of the ‘group’, ‘compare two datasets’, ‘filter’, and ‘cut’ tools to end up with single column lists of sample IDs (SRRxxxxx) corresponding to each condition.
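The goal of this step, one single-column list of run IDs per condition, can also be pictured as a short script. This is a sketch under assumptions: it assumes the downloaded table is comma-separated, that run accessions are in a column named `Run`, and that the experimental condition is in a column you name yourself (the `treatment` column and the run IDs in the demo are made-up placeholders).

```python
import csv
import io
from collections import defaultdict


def srr_lists_by_condition(runinfo_text, condition_column, run_column="Run"):
    """Group run accessions by the value found in the chosen condition column."""
    reader = csv.DictReader(io.StringIO(runinfo_text))
    groups = defaultdict(list)
    for row in reader:
        groups[row[condition_column]].append(row[run_column])
    return dict(groups)


if __name__ == "__main__":
    # Placeholder table; real tables have many more columns
    demo = (
        "Run,treatment\n"
        "SRR0000001,control\n"
        "SRR0000002,treated\n"
        "SRR0000003,control\n"
    )
    for condition, runs in srr_lists_by_condition(demo, "treatment").items():
        print(condition)
        print("\n".join(runs))
```

Each printed list corresponds to one of the single-column datasets that the Galaxy text manipulation tools would produce for a condition.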
Importing data
- Provide the files with SRR IDs to NCBI SRA Tools (fastq-dump) to import the data from SRA to Galaxy. By organizing the replicates of each condition in separate lists, the data will be imported as “collections” that can be directly loaded to a workflow or analysis pipeline.
Directly obtaining UCSC sourced *genome* identifiers
Option 1
- Go to UCSC Genome Browser, navigate to “genomes”, then the species of interest.
- On the home page for the genome build, immediately under the top navigation box, in the blue bar next to the full genome build name, you will find the View sequences button.
- Click on the View sequences button and it will take you to a detail page with a table listing out the contents.
Option 2
- Use the tool Get Data -> UCSC Main.
- In the Table Browser, choose the target genome and build.
- For “group” choose the last option “All Tables”.
- For “table” choose “chromInfo”.
- Leave all other options at default and send the output to Galaxy.
- This new dataset will load as a tabular dataset into your history.
- It will list out the contents of the genome build, including the chromosome identifiers (in the first column).
How can I upload data using EBI-SRA?
- Search for your data directly in the tool and use the Galaxy links.
- Be sure to check your sequence data for correct quality score formats and the metadata “datatype” assignment.
Importing data from Sierra LIMS
This section will guide you through generating external links to your data stored in the Sierra LIMS system to be downloaded directly into Galaxy.
- Go to the Sierra portal and login to your account.
- Click on the Sample ID of the sample you want to download data from.
- Click on the Edit Sample Details button.
- At the bottom of the page there will be an input box for creating a link. Enter a description for the link in the Reason for link section, and click Create link. This will reload the page and add a new link to the sample under Authorised links to this sample.
- Go back to the sample page or click on the hyperlink called link to take you back.
- In the Results section select the lane you want to access your data from.
- The bottom of the page, under the Links section, will now contain a list of wget commands with links for accessing all the files within that sample/lane.
- Since this list is for wget commands, you need to extract the links from the commands. You can copy the link in the first set of double quotes on each line and paste them directly into Galaxy using galaxy-wf-edit Paste/Fetch Data to download the files.
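If there are many lines, copying the links by hand gets tedious. The sketch below pulls the text inside the first pair of double quotes from each line, as described above. The exact shape of the Sierra wget commands is not shown here, so the demo command and its URL are hypothetical placeholders, as is the function name.

```python
import re


def links_from_wget(commands_text):
    """Return the content of the first double-quoted string on each line."""
    links = []
    for line in commands_text.splitlines():
        match = re.search(r'"([^"]+)"', line)
        if match:
            links.append(match.group(1))
    return links


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only
    demo = (
        'wget -O lane1_R1.fastq.gz "https://example.org/link/abc/lane1_R1.fastq.gz"\n'
        'wget -O lane1_R2.fastq.gz "https://example.org/link/abc/lane1_R2.fastq.gz"\n'
    )
    print("\n".join(links_from_wget(demo)))
```

The printed list, one link per line, can be pasted as-is into the Paste/Fetch Data text field.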
Importing data from a data library
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
- Go into Data (top panel) then Data libraries
- Navigate to the correct folder as indicated by your instructor.
- On most Galaxies tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
- Select the desired files
- Click on Add to History galaxy-dropdown near the top and select as Datasets from the dropdown menu
- In the pop-up window, choose
- “Select history”: the history you want to import the data to (or create a new one)
- Click on Import
Importing data from remote files
As an alternative to uploading the data from a URL or your computer, the files may also have been made available through Choose remote files:
- Click on Upload Data on the top of the left panel
- Click on Choose remote files and scroll down to find your data folder or type the folder name in the search box on the top.
- Click on OK
- Click on Start
- Click on Close
- You can find that the dataset has begun loading in your history.
Importing via links
- Copy the link location
- Click galaxy-upload Upload Data at the top of the tool panel
- Select galaxy-wf-edit Paste/Fetch Data
- Paste the link(s) into the text field
- Press Start
- Close the window
NCBI SRA sourced fastq data
In these FASTQ data:
- The quality score identifier (+) is sometimes not a match for the sequence identifier (@).
- The forward and reverse reads may be interlaced and need to be separated into distinct datasets.
- Both may be present in a dataset. Correct the first, then the second, as explained below.
- Format problems of any kind can cause tool failures and/or unexpected results.
- Fix the problems before running any other tools (including FastQC, Fastq Groomer, or other QA tools)
For inconsistent sequence (@) and quality (+) identifiers
Correct the format by running the tool Replace Text in entire line with these options:
- Find pattern: ^\+SRR.+
- Replace with: +
Note: If the quality score line is named like “+ERR” instead (or other valid options), modify the pattern search to match.
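To see what this find-and-replace does to a record, the same pattern can be tried with Python's `re` module. This is only an illustration of the regular expression, not of the Galaxy tool itself; the function name and the demo record are made up.

```python
import re


def fix_quality_identifier_lines(fastq_text, pattern=r"^\+SRR.+"):
    """Replace every line matching the pattern with a bare + sign."""
    # re.MULTILINE makes ^ match at the start of each line, not only at the start of the text
    return re.sub(pattern, "+", fastq_text, flags=re.MULTILINE)


if __name__ == "__main__":
    demo = "@SRR1.1 1 length=4\nACGT\n+SRR1.1 1 length=4\nIIII\n"
    print(fix_quality_identifier_lines(demo), end="")
```

For data whose third line starts with a different prefix, pass an adjusted pattern such as `r"^\+ERR.+"`, in line with the note above.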
For interlaced forward and reverse reads
Solution 1 (reads named /1 and /2)
- Use the tool FASTQ de-interlacer on paired end reads
Solution 2 (reads named /1 and /2)
- Create distinct datasets from an interlaced fastq dataset by running the tool Manipulate FASTQ reads on various attributes on the original dataset. Run the tool twice, once for each read direction.
Note: The solution does NOT use the FASTQ Splitter tool. The data to be manipulated are interlaced sequences. This is different in format from data that are joined into a single sequence.
Use these Manipulate FASTQ settings to produce a dataset that contains the /1 reads
Match Reads
- Match Reads by: Name/Identifier
- Identifier Match Type: Regular Expression
- Match by: .+/2
Manipulate Reads
- Manipulate Reads by: Miscellaneous Actions
- Miscellaneous Manipulation Type: Remove Read
Use these Manipulate FASTQ settings to produce a dataset that contains the /2 reads
- Exact same settings as above except for this change: Match by .+/1
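The logic of the two runs in Solution 2 (remove every read whose name matches, keep the rest) can be sketched as follows. This is not the tool's code: it assumes four-line FASTQ records and that the expression must match the whole identifier, and the function name `remove_reads` is invented for this example.

```python
import re


def remove_reads(fastq_text, match_by):
    """Drop every four-line FASTQ record whose identifier fully matches match_by."""
    lines = fastq_text.splitlines()
    kept = []
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        identifier = record[0][1:]  # text after the leading @
        if not re.fullmatch(match_by, identifier):
            kept.extend(record)
    return ("\n".join(kept) + "\n") if kept else ""


if __name__ == "__main__":
    interlaced = "@r1/1\nAAAA\n+\nIIII\n@r1/2\nCCCC\n+\nIIII\n"
    forward = remove_reads(interlaced, r".+/2")  # first run keeps the /1 reads
    reverse = remove_reads(interlaced, r".+/1")  # second run keeps the /2 reads
    print(forward, end="")
    print(reverse, end="")
```

Note that the expression names the reads to remove, which is why the dataset of /1 reads is produced with `.+/2` and the dataset of /2 reads with `.+/1`.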
Solution 3 (reads named /1 and /3)
- Use the same operations as in Solution 2 above, except change the first Manipulate FASTQ query term to be:
- Match by .+/3
Solution 4 (reads named without /N)
- If your data has differently formatted sequence identifiers, the “Match by” expression from Solution 2 above can be modified to suit your identifiers.
Alternative identifiers such as:
@M00946:180:000000000-ANFB2:1:1107:14919:14410 1:N:0:1
@M00946:180:000000000-ANFB2:1:1107:14919:14410 2:N:0:1
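For identifiers like the two above, the read number follows a space rather than a slash, so the "Match by" expression has to key on that field. The sketch below tries one candidate expression in Python. The pattern `.+ 2:.+` is an assumption made for illustration, not a setting taken from the tool, so check it against your own identifiers before using it.

```python
import re

# Hypothetical pattern: identifiers whose field after the space starts with "2:".
REVERSE_READ = re.compile(r".+ 2:.+")

identifiers = [
    "@M00946:180:000000000-ANFB2:1:1107:14919:14410 1:N:0:1",
    "@M00946:180:000000000-ANFB2:1:1107:14919:14410 2:N:0:1",
]

# fullmatch requires the pattern to cover the whole identifier.
is_reverse = [bool(REVERSE_READ.fullmatch(name)) for name in identifiers]
print(is_reverse)  # [False, True]
```

Swapping the `2` for a `1` gives the matching expression for the other read of the pair.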
Upload datasets from GenomeArk
- Open the file galaxy-upload upload menu
- Click on Choose remote files tab
- Click on the Genome Ark button and then click on species
You can find the data by following this path: /species/${Genus}_${species}/${specimen_code}/genomic_data. Inside a given datatype directory (e.g. pacbio), select all the relevant files individually until all the desired files are highlighted and click the Ok button. Note that there may be multiple pages of files listed. Also note that you may not want every file listed.
Upload few files (1-10)
- Click on Upload Data on the top of the left panel
- Click on Choose local file and select the files or drop the files in the Drop files here part
- Click on Start
- Click on Close
Upload many files (>10) via FTP
Some Galaxies offer FTP upload for very large datasets.
Note: the “Big Three” Galaxies (Galaxy Main, Galaxy EU, and Galaxy Australia) no longer support FTP upload, due to the recent improvements of the default web upload, which should now support large file uploads and almost all use cases. For situations where uploading via the web interface is too tedious, the galaxy-upload commandline utility is also available as an alternative to FTP.
To upload files via FTP:
- Check that your Galaxy supports FTP upload and look up the FTP settings.
- Make sure to have an FTP client installed. There are many options. We can recommend FileZilla, a free FTP client that is available on Windows, MacOS, and Linux.
- Establish an FTP connection to the Galaxy server
- Provide the Galaxy server’s FTP server name (e.g. ftp.mygalaxy.com)
- Provide the username (usually the e-mail address) and the password on the Galaxy server
- Connect
- Add the files to the FTP server by dragging/dropping them or right clicking on them and uploading them. The FTP transfer will start. Wait until all transfers are done.
- Open the Upload menu on the Galaxy server
- Click on Choose FTP file on the bottom
- Select files to import into the history
- Click on Start
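If a graphical FTP client is not convenient, the transfer step can also be scripted. The sketch below uses Python's standard `ftplib`. The function names are hypothetical, the host, username and password are placeholders for the FTP settings of your own Galaxy server, and it assumes a plain FTP connection (some servers require TLS, in which case `ftplib.FTP_TLS` would be the class to look at).

```python
from ftplib import FTP
from pathlib import Path

def upload_files(ftp, paths):
    """Upload each local file in binary mode and return the uploaded names."""
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded

def upload_to_galaxy(host, user, password, paths):
    """Connect, log in, and upload; all arguments are placeholders you supply."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return upload_files(ftp, paths)
```

After the script finishes, the files still have to be imported into a history through the Choose FTP file step described above.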
Datasets
Adding a tag
Tags can help you to better organize your history and track datasets.
Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.
To tag a dataset:
- Click on the dataset to expand it
- Click on Add Tags galaxy-tags
- Add tag text. Tags starting with # will be automatically propagated to the outputs of tools using this dataset (see below).
- Press Enter
- Check that the tag appears below the dataset name
Tags beginning with # are special!
They are called Name tags. The unique feature of these tags is that they propagate: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below). The figure below explains why this is so useful. Consider the following analysis (numbers in parentheses correspond to dataset numbers in the figure below):
- a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using Bowtie2 generating dataset 3;
- dataset 3 is used to calculate read coverage using BedTools Genome Coverage separately for + and - strands. This generates two datasets (4 and 5 for plus and minus, respectively);
- datasets 4 and 5 are used as inputs to Macs2 broadCall datasets generating datasets 6 and 8;
- datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using BedTools Intersect generating datasets 10 and 11.
Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain “plus” data versus “minus” data. For example, does dataset 10 contain “plus” data or “minus” data? Probably “minus” but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.
The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted datasets 4 and 5 were tagged with #plus and #minus, respectively. When they were used as inputs to Macs2 resulting datasets 6 and 8 automatically inherited them and so on… As a result it is straightforward to trace both branches (plus and minus) of this analysis.
More information is in a dedicated #nametag tutorial.
Changing database/build (dbkey)
You can tell Galaxy which dbkey (e.g. reference genome) your dataset is associated with. This may be used by tools to automatically use the correct settings.
- Click the desired dataset’s name to expand it.
- Click on the “?” next to the database indicator
- In the central panel, change the Database/Build field
- Select your desired database key from the dropdown list
- Click the Save button
Changing the datatype
Galaxy will try to autodetect the datatype of your files, but you may need to manually set this occasionally.
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, click galaxy-chart-select-data Datatypes tab on the top
- In the galaxy-chart-select-data Assign Datatype, select your desired datatype from “New type” dropdown
- Tip: you can start typing the datatype into the field to filter the dropdown menu
- Click the Save button
Converting the file format
Some datasets can be transformed into a different format. Galaxy has some built-in file conversion options depending on the type of data you have.
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, click on the galaxy-gear Convert tab on the top
- In the upper part galaxy-gear Convert, select the appropriate datatype from the list
- Click the Create dataset button to start the conversion.
Creating a new file
Galaxy allows you to create new files from the upload menu. You can supply the contents of the file.
- Click galaxy-upload Upload Data at the top of the tool panel
- Select galaxy-wf-edit Paste/Fetch Data at the bottom
- Paste the file contents into the text field
- Press Start and Close the window
Datasets not downloading at all
- Check to see if pop-ups are blocked by your web browser. Where to check can vary by browser and extensions.
- Double check your API key, if used. Go to User > Preferences > Manage API key.
- Check the sharing/permission status of the Datasets. Go to Dataset > Pencil icon galaxy-pencil > Edit attributes > Permissions. If you do not see a “Permissions” tab, then you are not the owner of the data.
Notes:
- If the data was shared with you by someone else from a Shared History, or was copied from a Published History, be aware that there are multiple levels of data sharing permissions.
- All data are set to not shared by default.
- Datasets sharing permissions for a new history can be set before creating a new history. Go to User > Preferences > Set Dataset Permissions for New Histories.
- User > Preferences > Make all data private is a “one click” option to unshare ALL data (Datasets, Histories). Note that once confirmed and all data is unshared, the action cannot be “undone” in batch, even by an administrator. You will need to re-share data again and/or reset your global sharing preferences as wanted.
- Only the data owner has control over sharing/permissions.
- Any data you upload or create yourself is automatically owned by you with full access.
- You may not have been granted full access if the data were shared or imported, and someone else is the data owner (your copy could be “view only”).
- After you have a fully shared copy of any shared/published data from someone else, then you become the owner of that data copy. If the other person or you make changes, it applies to each person’s copy of the data, individually and only.
- Histories can be shared with included Datasets. Datasets can be downloaded/manipulated by others or viewed by others.
- Sharing access to Datasets is distinct from, but related to, access to Histories.
Detecting the datatype (file format)
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, click on the galaxy-chart-select-data Datatypes tab on the top
- Click the Auto-detect button to have Galaxy try to autodetect it.
Different dataset icons and their usage
Icons provide a visual experience for objects, actions, and ideas.
Dataset icons and their usage:
- galaxy-eye “Eye icon”: Display dataset contents.
- galaxy-pencil “Pencil icon”: Edit attributes of dataset metadata: labels, datatype, database.
- galaxy-delete “Trash icon”: Delete the dataset.
- galaxy-save “Disc icon”: Download the dataset.
- galaxy-link “Copy link”: Copy link URL to the dataset.
- galaxy-info “Info icon”: Dataset details and job runtime information: inputs, parameters, logs.
- galaxy-refresh “Refresh/Rerun icon”: Run this (selected) job again or examine original submitted form.
- galaxy-barchart “Visualize icon”: External display links (UCSC, IGV, NPL, PV); Charts and graphing; Editor (manually edit text).
- galaxy-dataset-map “Dataset Map icon”: Filter the history for related Input/Output Datasets. Click again to clear the filter.
- galaxy-bug “Bug icon”: Review subset of logs (review all under galaxy-info), and optionally submit a bug report.
Downloading datasets
- Click on the dataset in your history to expand it
- Click on the Download icon galaxy-save to save the dataset to your computer.
Downloading datasets using command line
From the terminal window on your computer, you can use wget or curl.
- Make sure you have wget or curl installed.
- Click on the Dataset name, then click on the copy link icon galaxy-link. This is the direct-downloadable dataset link.
- Once you have the link, use any of the following commands:
- For wget
wget '<link>'
wget -O outfile '<link>' # save under a chosen file name
wget -O outfile --no-check-certificate '<link>' # ignore SSL certificate warnings
wget -c '<link>' # continue an interrupted download
- For curl
curl -o outfile '<link>'
curl -o outfile --insecure '<link>' # ignore SSL certificate warnings
curl -C - -o outfile '<link>' # continue an interrupted download
- For dataset collections and datasets within collections you have to supply your API key with the request
- Sample commands for wget and curl respectively are:
wget 'https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY'
curl -o myfile.txt 'https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY'
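The same request can be made from a script. The sketch below builds the collection download URL in the form shown above and streams it to a file using only the Python standard library. The function names are hypothetical, and the server, collection ID and API key are the placeholder values from the sample commands.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

def collection_download_url(server, collection_id, api_key):
    """Build the collection download URL in the form shown above."""
    query = urlencode({"key": api_key})
    return f"{server.rstrip('/')}/api/dataset_collections/{collection_id}/download?{query}"

def download(url, outfile):
    """Stream the response body to outfile without holding it all in memory."""
    with urlopen(url) as response, open(outfile, "wb") as handle:
        shutil.copyfileobj(response, handle)

# Placeholder values copied from the sample commands.
url = collection_download_url("https://usegalaxy.org", "d20ad3e1ccd4595de", "MYSECRETAPIKEY")
print(url)
```

Calling `download(url, "myfile.txt")` would then fetch the data. Keep the API key out of shared scripts and logs, since it grants access to your account.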
Finding BAM dataset identifiers
Quickly learn what the identifiers are in any BAM dataset that is the result from mapping
- Run Samtools: IdxStats on the aligned data (bam dataset).
- The “index header” chromosome names and lengths will be listed in the output (along with read counts).
- Compare the chromosome identifiers to the chromosome (aka “chrom”) field in all other inputs: VCF, GTF, GFF(3), BED, Interval, etc.
Note:
- The original mapping target may have been a built-in genome index, custom genome (transcriptome, exome, other) – the same bam data will still be summarized.
- This method will not work for “sequence-only” bam datasets, as these usually have no header.
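Once the IdxStats output is downloaded, the comparison against other inputs can be automated. The sketch below parses the tab-separated output (reference name, length, mapped reads, unmapped reads) and reports chromosome names from another input that are absent from the BAM. The function name and the example values are made up for illustration.

```python
def idxstats_chromosomes(text):
    """Map reference name -> length from samtools idxstats output.

    Each line is tab-separated: name, length, mapped reads, unmapped reads.
    The '*' row (reads with no reference) is skipped.
    """
    chroms = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, _mapped, _unmapped = line.split("\t")
        if name != "*":
            chroms[name] = int(length)
    return chroms

# Made-up example values, for illustration only.
idxstats = "chr1\t1000\t40\t0\nchr2\t800\t25\t1\n*\t0\t0\t7\n"
bed_chroms = {"chr1", "2"}  # chromosome column taken from another input

bam_chroms = idxstats_chromosomes(idxstats)
missing = sorted(bed_chroms - set(bam_chroms))
print(missing)  # ['2']
```

Any name listed in `missing` points to an identifier mismatch that should be fixed before running tools on the two inputs together.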
Finding Datasets
- To review all active Datasets in your account, go to User > Datasets.
Notes:
- Logging out of Galaxy while the Upload tool is still loading data can cause uploads to abort. This is most likely to occur when a dataset is loaded by browsing local files.
- If you have more than one browser window open, each with a different Galaxy History loaded, the Upload tool will load data into the most recently used history.
- Click on refresh icon galaxy-refresh at the top of the History panel to display the current active History with the datasets.
How to delete datasets?
Deleting datasets individually
To delete datasets individually, simply click the galaxy-delete button within the dataset’s box. That’s it! This action is reversible: datasets can be undeleted.
Deleting datasets in bulk
To delete multiple datasets at once:
- Click history-select-multiple icon at the top of the history pane;
- Select datasets you want to delete;
- Click the dropdown that would appear at the top of the history;
- Select “Delete” option.
This action is also reversible: datasets can be undeleted.
Deleting datasets permanently warning Danger zone!
Warning: Permanent is ... PERMANENT!
Datasets deleted in this fashion CANNOT be undeleted!
To delete multiple datasets PERMANENTLY:
- Click history-select-multiple icon at the top of the history pane;
- Select datasets you want to delete;
- Click the dropdown that would appear at the top of the history;
- Select “Delete (permanently)” option.
How to hide datasets?
To hide datasets:
- Click history-select-multiple icon at the top of the history pane;
- Select datasets you want to hide;
- Click the dropdown that would appear at the top of the history;
- Select “Hide” option.
How to un-delete datasets?
If your history contains deleted datasets you will see galaxy-delete “Include deleted” button directly above dataset display.
To un-delete datasets:
- Type deleted:true in the search box
- Select datasets you want to un-delete
- Click the dropdown that would appear at the top of the history;
- Select “Undelete” option.
Alternatively, you can:
- click galaxy-delete “Include deleted” button directly above dataset display. This will cause deleted datasets to appear in history along with normal (un-deleted) datasets;
- deleted datasets are distinguished by having dataset-undelete within dataset box. Clicking on this icon will un-delete a given dataset;
How to un-hide datasets?
If your history contains hidden datasets you will see galaxy-show-hidden “Include hidden” button directly above the dataset display.
To un-hide datasets:
- Type visible:hidden in the search box
- Select datasets you want to un-hide
- Click the dropdown that would appear at the top of the history;
- Select “Unhide” option.
Alternatively, you can:
- click galaxy-show-hidden “Include hidden” button directly above dataset display. This will cause hidden datasets to appear in history along with normal (un-hidden) datasets;
- hidden datasets are distinguished by having galaxy-show-hidden within dataset box. Clicking on this icon will un-hide a given dataset;
Mismatched Chromosome identifiers and how to avoid them
Reference data mismatches are similar to bad reagents in a wet lab experiment: all sorts of odd problems can come up!
Your inputs must all be based on an identical genome assembly build to achieve correct scientific results.
There are two areas to review for data to be considered identical.
- The data are based on the same exact genome assembly (or “assembly release”).
- The “assembly” refers to the nucleotide sequence of the genome.
- If the base order and length of the chromosomes are not the same, then your coordinates will have scientific problems.
- Converting coordinates between assemblies may be possible. Search tool panel with CrossMap.
- The data are based on the same exact genome assembly build.
- The “build” refers to the labels used inside the file. In this context, pay attention to the chromosome identifiers.
- These all may mean the same thing to a person but not to a computer or tool: chr1, Chr1, 1, chr1.1
- Converting identifiers between builds may be possible. Search tool panel with Replace.
The methods listed below help to identify and correct errors or unexpected results when the underlying genome assembly build is not identical for all inputs.
Method 1: Finding BAM dataset identifiers
Method 2: Directly obtaining UCSC sourced genome identifiers
Method 3: Adjusting identifiers for UCSC sourced data used with other sourced data
Method 4: Adjusting identifiers or input source for any mixed sourced data
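To see why labels such as chr1, Chr1 and 1 count as different for a tool, it can help to compare two identifier lists directly. The sketch below separates exact matches from names that agree only after dropping a "chr" prefix and ignoring case. The function names are hypothetical, and the rule covers only this one kind of mismatch (it would not pair chrM with MT, for example).

```python
def strip_chr(name):
    """Drop a leading 'chr' prefix, ignoring case (e.g. 'Chr1' -> '1')."""
    return name[3:] if name.lower().startswith("chr") else name

def compare_identifiers(a, b):
    """Return (exact matches, names that match only after normalization)."""
    a, b = set(a), set(b)
    exact = a & b
    cores_a = {strip_chr(n).lower() for n in a - exact}
    cores_b = {strip_chr(n).lower() for n in b - exact}
    return sorted(exact), sorted(cores_a & cores_b)

exact, near = compare_identifiers(["chr1", "chr2", "chrM"], ["chr1", "2", "ChrM"])
print(exact)  # ['chr1']
print(near)   # ['2', 'm']
```

Names in the second list are the ones a person would read as the same chromosome but a tool would treat as unrelated, so they are the candidates for adjusting with the methods listed above.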
tip Reference data is self referential. More help for your genome, transcriptome, and annotation
tip Genome not available as a native index? Use a custom genome fasta and create a custom build database instead.
tip More notes on Native Reference Genomes
- Native reference genomes (FASTA) are built as pre-computed indexes on the Galaxy server where you are working.
- Different servers host both common and different reference genome data.
- Most reference annotation (tabular, GTF, GFF3) is supplied from the history by the user, even when the genome is indexed.
- Public Galaxy servers source reference genomes preferentially from UCSC.
- A reference transcriptome (FASTA) is supplied from the history by the user.
- Many experiments use a combination of all three types of reference data. Consider pre-preparing your files at the start!
- The default variant for a native genome index is “Full”. Defined as: all primary chromosomes (or scaffolds/contigs) including mitochondrial plus associated unmapped, plasmid, and other segments.
- When only one version of a genome is available for a tool, it represents the default “Full” variant.
- Some genomes will have more than one variant available.
- The “Canonical Male” or sometimes simply “Canonical” variant contains the primary chromosomes for a genome. For example a human “Canonical” variant contains chr1-chr22, chrX, chrY, and chrM.
- The “Canonical Female” variant contains the primary chromosomes excluding chrY.
Moving datasets between Galaxy servers
On the origin Galaxy server:
- Click on the name of the dataset to expand the info.
- Click on the Copy link icon galaxy-link.
On the destination Galaxy server:
- Click on Upload data > Paste / Fetch Data and paste the link. Select attributes, such as genome assembly, if required. Hit the Start button.
Note: The copy link icon galaxy-link cannot be used to move HTML datasets or SQLite datasets. HTML datasets can instead be downloaded using the download button galaxy-save.
Purging datasets
- All account Datasets can be reviewed under User > Datasets.
- To permanently delete: use the link from within the dataset, or use the Operations on Multiple Datasets functions, or use the Purge Deleted Datasets option in the History menu.
Notes:
- Within a History, deleted/permanently deleted Datasets can be reviewed by toggling the deleted link at the top of the History panel, found immediately under the History name.
- Both active (shown by default) and hidden (the other toggle link, next to the deleted link) datasets can be reviewed the same way.
- Click on the far right “X” to delete a dataset.
- Datasets in a deleted state are still part of your quota usage.
- Datasets must be purged (permanently deleted) to not count toward quota.
Quotas for datasets and histories
- Deleted datasets and deleted histories containing datasets are considered when calculating quotas.
- Permanently deleted datasets and permanently deleted histories containing datasets are not considered.
- Histories/datasets that are shared with you are only partially considered unless you import them.
Note: To reduce quota usage, refer to How can I reduce quota usage while still retaining prior work (data, tools, methods)? FAQ.
Renaming a dataset
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, change the Name field
- Click the Save button
Understanding job statuses
Job statuses will help you understand the stages of your work.
Compare the color of your datasets to these job processing stages.
- Grey: The job is queued. Allow this to complete!
- Yellow: The job is executing. Allow this to complete!
- Green: The job has completed successfully.
- Red: The job has failed. Check your inputs and parameters with Help examples and GTN tutorials. Scroll to the bottom of the tool form to find these.
- Light Blue: The job is paused. This indicates either an input has a problem or that you have exceeded the disk quota set by the administrator of the Galaxy instance you are working on.
- Grey, Yellow, Grey again: The job is waiting to run due to admin re-run or an automatic fail-over to a longer-running cluster.
galaxy-info Don’t lose your queue placement! It is essential to allow queued jobs to remain queued, and to never interrupt an executing job. If you delete/re-run jobs, they are added back to the end of the queue again.
Working with GFF GTF GTF2 GFF3 reference annotation
- All annotation datatypes have a distinct format and content specification.
- Data providers may release variations of any, and tools may produce variations.
- GFF3 data may be labeled as GFF.
- Content can overlap but is generally not understood by tools that are expecting just one of these specific formats.
- Best practices
- The sequence identifiers must exactly match between reference annotation and reference genomes, transcriptomes, or exomes.
- Most tools expect GTF format unless the tool form specifically notes otherwise.
- Get the GTF version from the data providers if it is available.
- If only GFF3 is available, you can attempt to transform it with the tool gffread.
- Was GTF data detected as GFF during Upload? It probably has headers.
- Remove the headers (lines that start with a “#”) with the Select tool using the option “NOT Matching” with the regular expression: ^#
- Redetect the datatype. It should be GTF once corrected.
- UCSC annotation
- Find annotation under their Downloads area. The path will be similar to:
https://hgdownload.soe.ucsc.edu/goldenPath/<database>/bigZips/genes/
- Copy the URL from UCSC and paste it into the Upload tool, allowing Galaxy to detect the datatype.
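The header-removal step above amounts to keeping every line that does not match ^#. A minimal Python sketch of that filter, using a made-up two-line annotation, looks like this (the function name is hypothetical):

```python
import re

HEADER = re.compile(r"^#")  # same expression as in the Select tool step

def drop_header_lines(lines):
    """Keep only the lines that do NOT match the header expression."""
    return [line for line in lines if not HEADER.search(line)]

# Made-up annotation with one header line, for illustration only.
gtf = [
    "#!genome-build example",
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
kept = drop_header_lines(gtf)
print(len(kept))  # 1
```

Only the data line survives, which is the state in which the datatype should redetect as GTF.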
Working with deleted datasets
Deleted datasets and histories can be recovered by users as they are retained in Galaxy for a time period set by the instance administrator. Deleted datasets can be undeleted or permanently deleted within a History. Links to show/hide deleted (and hidden) datasets are at the top of the History panel.
- To review or adjust an individual dataset:
- Click on the name to expand it.
- If it is only deleted, but not permanently deleted, you’ll see a message with links to recover or to purge.
- Click on Undelete it to recover the dataset, making it active and accessible to tools again.
- Click on Permanently remove it from disk to purge the dataset and remove it from the account quota calculation.
- To review or adjust multiple datasets in batch:
- Click on the checked box icon galaxy-selector near the top left of the history panel (Select Items) to switch into “Operations on Multiple Datasets” mode.
- Check the selection box of each dataset you want to modify, then choose your option (show, hide, delete, undelete, purge, and group datasets).
Working with very large fasta datasets
- Run FastQC on your data to make sure the format/content is what you expect. Run more QA as needed.
- Search GTN tutorials with the keyword “qa-qc” for examples.
- Search Galaxy Help with the keywords “qa-qc” and “fasta” for more help.
- Assembly result?
- Consider filtering by length to remove reads that did not assemble.
- Formatting criteria:
- All sequence identifiers must be unique.
- Some tools will require that there is no description line content, only identifiers, in the fasta title line (“>” line). Use NormalizeFasta to remove the description (all content after the first whitespace) and wrap the sequences to 80 bases.
- Custom genome, transcriptome, exome?
- Only appropriate for smaller genomes (bacterial, viral, most insects).
- Not appropriate for any mammalian genomes, or some plants/fungi.
- Sequence identifiers must be an exact match with all other inputs or expect problems. See GFF GTF GFF3.
- Formatting criteria:
- All sequence identifiers must be unique.
- ALL tools will require that there is no description content, only identifiers, in the fasta title line (“>” line). Use NormalizeFasta to remove the description (all content after the first whitespace) and wrap the sequences to 80 bases.
- The only exception is when executing the MakeBLASTdb tool and when the input fasta is in NCBI BLAST format (see the tool form).
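The formatting criteria above (unique identifiers, no description after the identifier, sequences wrapped to 80 bases) can be illustrated with a short Python sketch. This is not the NormalizeFasta tool itself, only an illustration of the same rules; the function name is hypothetical and the input is assumed to be well-formed FASTA that fits in memory, so it is not suited to very large files.

```python
def normalize_fasta(text, width=80):
    """Strip title-line descriptions, wrap sequences, and reject duplicate identifiers."""
    records = []  # list of [identifier, sequence]
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            identifier = line[1:].split()[0]  # drop all content after the first whitespace
            if identifier in seen:
                raise ValueError(f"duplicate identifier: {identifier}")
            seen.add(identifier)
            records.append([identifier, ""])
        else:
            records[-1][1] += line
    out = []
    for identifier, seq in records:
        out.append(">" + identifier)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"

# Made-up sequences, wrapped to 6 bases so the effect is visible.
fasta = ">chr1 Homo sapiens chromosome 1\nACGT\nACGT\n>chr2 another description\nGG\n"
print(normalize_fasta(fasta, width=6))
```

The output keeps `>chr1` and `>chr2` as bare identifiers and rewraps the first sequence as `ACGTAC` followed by `GT`.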
Working with very large fastq datasets
- Run FastQC on your data to make sure the format/content is what you expect. Run more QA as needed.
- Search GTN tutorials with the keyword “qa-qc” for examples.
- Search Galaxy Help with the keywords “qa-qc” and “fastq” for more help.
- How to create a single smaller input. Search the tool panel with the keyword “subsample” for tool choices.
- How to create multiple smaller inputs. Start with Split file to dataset collection, then merge the results back together using a tool specific for the datatype. Example: BAM results? Use MergeSamFiles.
Datatypes
Best practices for loading fastq data into Galaxy
- As of release 17.09, fastq data will have the datatype fastqsanger auto-detected when that quality score scaling is detected and “autodetect” is used within the Upload tool. Compressed fastq data will be converted to uncompressed in the history.
- To preserve fastq compression, directly assign the appropriate datatype (e.g. fastqsanger.gz).
- If the data is close to or over 2 GB in size, be sure to use FTP.
- If the data was already loaded as fastq.gz, don’t worry! Just test the data for correct format (as needed) and assign the metadata type.
Compressed FASTQ files, (`*.gz`)
- Files ending in .gz are compressed (zipped) files.
- The fastq.gz format is a compressed version of a fastq dataset.
- The fastqsanger.gz format is a compressed version of the fastqsanger datatype, etc.
- Compression saves space (and therefore your quota).
- Tools can accept the compressed versions of input files
- Make sure the datatype (compressed or uncompressed) is correct for your files, or it may cause tool errors.
FASTQ files: `fastq` vs `fastqsanger` vs ..
FASTQ files come in various flavours. They differ in the encoding scheme they use. See our QC tutorial for a more detailed explanation of encoding schemes.
Nowadays, the most commonly used encoding scheme is sanger. In Galaxy, this is the fastqsanger datatype. If you are using older datasets, make sure to verify the FASTQ encoding scheme used in your data.
Be Careful: choosing the wrong encoding scheme can lead to incorrect results!
Tip: There are 2 Galaxy datatypes that have similar names, but are not the same. Please make sure you select the intended one: fastqsanger and fastqcssanger (note the additional cs).
Tip: When in doubt, choose fastqsanger
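As a rough illustration of how encoding schemes differ, the sketch below guesses the quality offset from the characters in a sample of quality lines. It is a heuristic under stated assumptions, not a replacement for checking the data with a QC tool: characters below ASCII 59 only occur with the Sanger (Phred+33) offset, while characters above ASCII 74 without any low characters suggest an older +64 offset. The function name and thresholds are illustrative choices.

```python
def guess_quality_offset(quality_lines):
    """Heuristically guess the quality offset (33 or 64) from quality strings.

    Returns None when the sampled characters cannot distinguish the two.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # characters this low are only valid with the +33 offset
    if max(codes) > 74:
        return 64  # high characters and no low ones point to a +64 offset
    return None    # ambiguous sample; inspect more reads

print(guess_quality_offset(["IIII#III"]))  # 33
print(guess_quality_offset(["hhhhdddd"]))  # 64
print(guess_quality_offset(["FFFF"]))      # None
```

The ambiguous case matters: a handful of uniformly high-quality reads can look the same under both offsets, so a larger sample is needed before assigning a datatype.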
How do `fastq.gz` datasets relate to the `.fastqsanger` datatype metadata assignment?
Before assigning fastqsanger or fastqsanger.gz, be sure to confirm the format.
TIP:
- Using non-fastqsanger scaled quality values will cause scientific problems with tools that expect fastqsanger formatted input.
- Even if the tool does not fail, get the format right from the start to avoid problems. Incorrect format is still one of the most common reasons for tool errors or unexpected results (within Galaxy or not).
- For more information, see How to format fastq data for tools that require .fastqsanger format?
How to format fastq data for tools that require .fastqsanger format?
- Most tools that accept FASTQ data expect it to be in a specific FASTQ version: .fastqsanger. The .fastqsanger datatype must be assigned to each FASTQ dataset.
In order to do that:
- Watch the FASTQ Prep Illumina video for a complete walk-through.
- Run FastQC first to assess the type.
- Run FASTQ Groomer if the data needs to have the quality scores rescaled.
- If you are certain that the quality scores are already scaled to Sanger Phred+33 (the result of an Illumina 1.8+ pipeline), the datatype .fastqsanger can be directly assigned. Click on the pencil icon galaxy-pencil to reach the Edit Attributes form. In the center panel, click on the “Datatype” tab, enter the datatype .fastqsanger, and save.
- Run FastQC again on the entire dataset if any changes were made to the quality scores for QA.
Other tips
- If you are not sure what type of FASTQ data you have (maybe it is not Illumina?), see the help directly on the FASTQ Groomer tool for information about types.
- For Illumina, first run FastQC on a sample of your data (how to read the full report). The output report will note the quality score type interpreted by the tool. If not .fastqsanger, run FASTQ Groomer on the entire dataset. If .fastqsanger, just assign the datatype.
- For SOLiD, run NGS: Fastq manipulation → AB-SOLID DATA → Convert, to create a .fastqcssanger dataset. If you have uploaded a color space fastq sequence with quality scores already scaled to Sanger Phred+33 (.fastqcssanger), first confirm by running FastQC on a sample of the data. Then if you want to double-encode the color space into pseudo-nucleotide space (required by certain tools), see the instructions on the tool form Fastq Manipulation for the conversion.
- If your data is FASTA, but you want to use tools that require FASTQ input, then use the tool NGS: QC and manipulation → Combine FASTA and QUAL. This tool will create “placeholder” quality scores that fit your data. On the output, click on the pencil icon galaxy-pencil to reach the Edit Attributes form. In the center panel, click on the “Datatype” tab, enter the datatype .fastqsanger, and save.
Identifying and formatting Tabular Datasets
Format help for Tabular/BED/Interval Datasets
A Tabular datatype is human readable and has tabs separating data columns. Please note that tabular data is different from comma separated data (.csv). The common datatypes are: .bed, .gtf, .interval, or .txt.
- Click the pencil icon galaxy-pencil to reach the Edit Attributes form.
- Change the datatype (3rd tab) and save.
- Label columns (1st tab) and save.
- Metadata will be assigned, then the dataset can be used.
- If the required input is a BED or Interval datatype, adjusting (.tab → .bed, .tab → .interval) may be possible using a combination of Text Manipulation tools, to create a dataset that matches required specifications.
- Some tools require that BED format be followed, even if the datatype Interval (with less strict column ordering) is accepted on the tool form.
- These tools will fail, if they are run with malformed BED datasets or non-specific column assignments.
- Solution: reorganize the data to be in BED format and rerun.
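Reorganizing data into BED format mostly means putting chromosome, start and end into the first three columns, in that order. The sketch below does this for one tab-separated line. The function name and the example line are made up, and it assumes the coordinates already follow BED conventions (it only reorders columns and does not convert between coordinate systems).

```python
def to_bed3(line, chrom_col, start_col, end_col):
    """Reorder one tab-separated line into BED's first three columns.

    Column arguments are 0-based positions in the input line.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[chrom_col]
    start, end = int(fields[start_col]), int(fields[end_col])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {line!r}")
    return f"{chrom}\t{start}\t{end}"

# Made-up line with the chromosome in the last column.
bed_line = to_bed3("geneA\t100\t250\tchr1", chrom_col=3, start_col=1, end_col=2)
print(bed_line)
```

Within Galaxy the same reordering would be done with column-cutting tools from the Text Manipulation section; the sketch only shows the target layout.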
Understanding Datatypes
- Allow Galaxy to detect the datatype during Upload, and adjust from there if needed.
- Tool forms will filter for the appropriate datatypes it can use for each input.
- Directly changing a datatype can lead to errors. Be intentional and consider converting instead when possible.
- Dataset content can also be adjusted (tools: Data manipulation) and the expected datatype detected. Detected datatypes are the most reliable in most cases.
- If a tool does not accept a dataset as valid input, it is not in the correct format with the correct datatype.
- Once a dataset’s content matches the datatype, and that dataset is repeatedly used (example: Reference annotation) use that same dataset for all steps in an analysis or expect problems. This may mean rerunning prior tools if you need to make a correction.
- Tip: Not sure what datatypes a tool is expecting for an input?
- Create a new empty history
- Click on a tool from the tool panel
- The tool form will list the accepted datatypes per input
- Warning: In some cases, tools will transform a dataset to a new datatype at runtime for you.
- This is generally helpful, and best reserved for smaller datasets.
- Why? This can also unexpectedly create hidden datasets that are near duplicates of your original data, only in a different format.
- For large data, that can quickly consume working space (quota).
- Deleting/purging any hidden datasets can lead to errors if you are still using the original datasets as an input.
- Consider converting to the expected datatype yourself when data is large.
- Then test the tool directly on converted data. If it works, purge the original to recover space.
Using compressed fastq data as tool inputs
- If the tool accepts fastq input, then .gz compressed data assigned to the datatype fastq.gz is appropriate.
- If the tool accepts fastqsanger input, then .gz compressed data assigned to the datatype fastqsanger.gz is appropriate.
- Using uncompressed fastq data is still an option with tools. The choice is yours.
TIP: Avoid labeling compressed data with an uncompressed datatype, and the reverse. Jobs using mismatched datatype versus actual format will fail with an error.
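One way to check for such a mismatch before uploading is to look at the first two bytes of the data, since every gzip stream starts with the same magic number. The sketch below is only an illustration: the helper names `is_gzip` and `datatype_matches` are hypothetical, and the rule "a datatype ending in .gz means compressed" is an assumption based on the datatype names mentioned above.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzip(data):
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def datatype_matches(data, datatype):
    """Check that a '.gz' datatype label agrees with the actual content."""
    return is_gzip(data) == datatype.endswith(".gz")


plain = b"@read1\nACGT\n+\nIIII\n"
packed = gzip.compress(plain)
print(datatype_matches(plain, "fastqsanger"))      # True: uncompressed data, uncompressed label
print(datatype_matches(packed, "fastqsanger.gz"))  # True: compressed data, compressed label
print(datatype_matches(packed, "fastqsanger"))     # False: compressed data, uncompressed label
```

For a file on disk, reading the first two bytes in binary mode is enough to run the same check.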
Features
Using the Window Manager to view multiple datasets
If you would like to view two or more datasets at once, you can use the Window Manager feature in Galaxy:
- Click on the Window Manager icon galaxy-scratchbook on the top menu bar.
- You should see a little checkmark on the icon now
- View a dataset by clicking on the eye icon galaxy-eye
- You should see the output in a window overlaid on Galaxy
- You can resize this window by dragging the bottom-right corner
- Click outside the file to exit the Window Manager
- View galaxy-eye a second dataset from your history
- You should now see a second window with the new dataset
- This makes it easier to compare the two outputs
- Repeat this for as many files as you would like to compare
- You can turn off the Window Manager galaxy-scratchbook by clicking on the icon again
Why not use Excel?
Excel is a fantastic tool and a great place to build simple analysis models, but when it comes to scaling, Galaxy wins every time.
You could just as easily use Excel to answer the same question, and if the goal is to learn how to use a tool, then either tool would be great! But what if you are working on a question where your analysis matters? Maybe you are working with human clinical data trying to diagnose a set of symptoms, or you are working on research that will eventually be published and maybe earn you a Nobel Prize?
In these cases your analysis, and the ability to reproduce it exactly, is vitally important, and Excel won’t help you here. It doesn’t track changes and it offers very little insight to others on how you got from your initial data to your conclusions.
Galaxy, on the other hand, automatically records every step of your analysis. And when you are done, you can share your analysis with anyone. You can even include a link to it in a paper (or your acceptance speech). In addition, you can create a reusable workflow from your analysis that others (or yourself) can use on other datasets.
Another challenge with spreadsheet programs is that they don’t scale to support next generation sequencing (NGS) datasets, a common type of data in genomics, and which often reach gigabytes or even terabytes in size. Excel has been used for large datasets, but you’ll often find that learning a new tool gives you significantly more ability to scale up, and scale out your analyses.
Histories
Copy a dataset between histories
Sometimes you may want to use a dataset in multiple histories. You do not need to re-upload the data; you can copy datasets from one history to another.
There are 3 ways to copy datasets between histories:
From the original history
- Click on the galaxy-gear icon which is on the top of the list of datasets in the history panel
- Click on Copy Datasets
- Select the desired files
- Give a relevant name to the “New history”
- Validate by ‘Copy History Items’
- Click on the new history name in the green box that has just appeared to switch to this history
Using the galaxy-columns Show Histories Side-by-Side
- Click on the galaxy-dropdown dropdown arrow top right of the history panel (History options)
- Click on galaxy-columns Show Histories Side-by-Side
- If your target history is not present
- Click on ‘Select histories’
- Click on your target history
- Validate by ‘Change Selected’
- Drag the dataset to copy from its original history
- Drop it in the target history
From the target history
- Click on User in the top bar
- Click on Datasets
- Search for the dataset to copy
- Click on its name
- Click on Copy to current History
Creating a new history
Histories are an important part of Galaxy, and most people use a new history for every new analysis. Always make sure to give your histories good names, so you can easily find your results later.
To create a new history simply click the new-history icon at the top of the history panel.
Dataset colors
Explains meaning of dataset colors in Galaxy's history
There are several different “states” a dataset can be in. These states are indicated by colors:
- ok: everything is fine, life is good;
- new: the dataset was just created. Galaxy does not yet know when it is;
- queued: indicates that the job generating this dataset is scheduled for execution but not running yet;
- running: job generating this dataset is running;
- setting metadata: when a new dataset is uploaded Galaxy examines it to understand what kind of data it is (e.g., BAM, FASTQ, fasta, BED, etc.). This is called “setting metadata”;
- deferred: sometimes it does not make sense to upload the dataset until it is needed for an analysis. Galaxy will download deferred datasets later during the job execution. Those datasets do not count toward your quota;
- paused: in some cases, such as workflow executions, upstream errors prevent subsequent jobs from starting, leaving their datasets in the “paused” state;
- discarded: something went wrong; for example, the job producing this dataset might have been cancelled;
- error: everything is not fine; life is bad!
- placeholder: similar to “new”; we know something will be there but are not yet sure what;
- failed populated state: this refers to collections (not individual datasets). Here a collection has failed to be populated with datasets;
- new populated state: this refers to collections (not individual datasets). A collection was created but not populated yet.
Dataset snippet
Describes features of a single dataset element in the history
A single Galaxy dataset can either be “collapsed” or “expanded”.
Collapsed dataset view
Datasets in the panel are initially shown in a “collapsed” view:
It contains the following elements:
- Dataset number: (“1”) order of dataset in the history;
- Dataset name: (“M117-bl_1.fq.gz”) its name;
- galaxy-eye: click this to view the dataset contents;
- galaxy-pencil: click this to edit dataset properties;
- galaxy-delete: click this to delete the dataset from the history (don’t worry, you can undo this action!).
Clicking on a collapsed dataset will expand it.
Some of the buttons above may be disabled if the dataset is in a state that doesn’t allow the action. For example, the ‘edit’ button is disabled for datasets that are still queued or running.
Expanded dataset view
Expanded dataset view adds a preview element and many additional controls.
In addition to the elements described above for the collapsed dataset, its expanded view contains:
- Add tags galaxy-tags: click on this to tag this dataset;
- Dataset size: (“2 variants, 18 comments”) lists the size of the dataset. When datasets are small (like in this example) the exact size is shown. For large datasets, Galaxy gives an approximate estimate.
- format: (“VCF”) lists the datatype;
- database: (“?”) lists which genome build this dataset corresponds to. This usually lists “?” unless the genome build is set explicitly or the dataset is derived from another dataset with defined genome build information;
- info field: (“INFO [2024-03-26 12:08:53,435]…”) displays information provided by the tool that generated this dataset. This varies widely and depends on the type of job that generated this dataset.
- dataset-save: Saves dataset to disk;
- dataset-link: Copies dataset link into clipboard;
- dataset-info: Displays additional details about the dataset in the center pane;
- dataset-rerun: Reruns job that generated this dataset. This button is unavailable for datasets uploaded into history because they were not produced by a Galaxy tool;
- dataset-visualize: Displays visualization options for this dataset. The list of options is dependent on the datatype;
- dataset-related-datasets: Shows datasets related to this dataset. This is useful for tracking down parental datasets - those that were used as inputs into a job that produced this particular dataset.
Downloading histories
- Click on the gear icon galaxy-gear on the top of the history panel.
- Select “Export History to File” from the History menu.
- Click on the “Click here to generate a new archive for this history” text.
- Wait for the Galaxy server to prepare history for download.
- Click on the generated link to download the history.
Find all Histories and purge (aka permanently delete)
- Log in to your Galaxy account.
- On the top navigation bar, click on User.
- On the drop-down menu that appears, click on Histories.
- Click on Advanced Search, additional fields will be displayed.
- Next to the Status field, click All, a list of all histories will be displayed.
- Check the box next to Name in the displayed list to select all histories.
- Click Delete Permanently to purge all histories.
- A pop-up dialogue box will appear, letting you know that the history contents will be removed and that this cannot be undone. Click OK to confirm.
Finding Histories
- To review all histories in your account, go to User > Histories in the top menu bar.
- At the top of the History listing, click on Advanced Search.
- Set the status to all to view all of your active, deleted, and permanently deleted (purged) histories.
- Histories in all states are listed for registered accounts, so you will always find your data here if it ever appears to be “lost”.
- Note: Permanently deleted (purged) Histories may be fully removed from the server at any time. The data content inside the History is always removed at the time of purging (by a double-confirmed user action), but the purged History artifact may still be in the listing. Purged data content cannot be restored, even by an administrator.
Finding and working with "Histories shared with me"
How to find and work on histories shared with you
To find histories shared with me:
- Log into your account.
- Select User, then in the drop-down menu select Histories shared with me.
To work with shared histories:
- Copy the History into your account to work with it.
- Unshare Histories that you no longer want shared with you or that you have already made a copy of.
Note: Shared Histories (whether copied into your account or not) do count in part toward your total account data quota usage. More details on how shared histories affect account quota usage can be found in this link.
History annotation
Explains how to annotate a history
Sometimes tags and names are not enough to describe the work done within a history. Galaxy allows you to create history annotations: longer text entries that allow for more formatting options. The formatting of the text is preserved. Later, if you publish or share the history, the annotation will be displayed automatically - allowing you to share additional notes about the analysis. Multiple lines, spaces, and emoji! 😹🏳️⚧️🌈 can be used while writing annotations.
To annotate a history:
- Click on galaxy-pencil (Edit) next to the history name. A larger text section will appear displaying any existing annotation or Annotation (optional) if empty.
- Add your text. Enter will move the cursor to the next line. (Tabs cannot be entered since the ‘Tab’ button is used to switch between controls on the page - tabs can be pasted in, however).
- Click on Save galaxy-save.
- To cancel, click the galaxy-undo “Cancel” button.
History options
Explains different history options
Clicking the galaxy-history-options button will open a drop-down menu with several options:
- Show histories side-by-side - brings up a view in which multiple histories can be viewed and manipulated simultaneously. Datasets can be dragged between histories in this view.
- Resume Paused Jobs - restarts paused jobs in history.
- Copy this history - creates an exact copy of the current history in the current account.
- Delete this history - deletes the current history.
- Export tool citations - export citations for tools that were used in the current history.
- Export history to File - creates a compressed archive containing data from the current history.
- Archive history - moves history to a non-active, archived, state.
- Extract workflow - converts the current history into a workflow
- Show invocations - shows a list of all workflows that were run in the current history
- Share or Publish - allows controlling access to history. It can be made public or shared with a specific user.
- Set Permissions - allows you to set the rules on who can access datasets in the current history.
- Make Private - resets all permissions and makes the current history private.
History tagging
Explains how to add tags to a history
Tags are short pieces of text used to describe the thing they’re attached to and many things in Galaxy can be tagged. Each item can have many tags and you can add new tags or remove them at any time. Tags can be another useful way to organize and search your data. For instance, you might tag a history with the type of analysis you did in it: assembly or variants. Or you may tag them according to data sources or some other metadata: long-term-care-facility or yellowstone-park:2014.
To tag a history:
- Click on galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”).
- Click on Add tags galaxy-tags and start typing. Any tags that you’ve used previously will show below your partial entry - allowing you to use this ‘autocomplete’ data to re-use your previous tags without typing them in full.
- Click on Save galaxy-save.
- To cancel, click the galaxy-undo “Cancel” button.
Warning: Do not use spaces
It is strongly recommended to replace spaces in tags with _ or -, as spaces will automatically be removed when the tag is saved.
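If you prepare tag names in a script or spreadsheet before typing them into Galaxy, the replacement can be done up front. The following is a small sketch of that idea only; `clean_tag` is a hypothetical helper, not part of Galaxy.

```python
import re


def clean_tag(tag, replacement="_"):
    """Replace each run of whitespace inside a tag with a single separator."""
    return re.sub(r"\s+", replacement, tag.strip())


print(clean_tag("long term care facility"))     # long_term_care_facility
print(clean_tag("yellowstone park:2014", "-"))  # yellowstone-park:2014
```

Tags that already contain no spaces pass through unchanged.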
How to set Data Privacy Features?
Privacy controls are only enabled if desired. Otherwise, datasets by default remain private and unlisted in Galaxy. This means that a dataset you’ve created is virtually invisible until you publish a link to it.
Below are three optional ways to set up private Histories. Use whichever option fits what you want to achieve:
Changing the privacy settings of an individual dataset.
- Click on the dataset name to open its dropdown.
- Click the pencil icon galaxy-pencil.
- Move to the Permissions tab.
- The Permissions tab has two input fields.
- In the second input, labeled access, search for the name of the user to grant permission to.
- Click on save permission
Note: Adding additional roles to the ‘access’ permission along with your “private role” does not do what you may expect. Since roles are always logically added together, only you will be able to access the dataset, since only you are a member of your “private role”.
Make all datasets in the current history private.
- Open the History Options menu galaxy-gear at the top of your history panel
- Click the Make Private option in the dropdown menu
- Sets the default settings for all new datasets in this history to private.
Set the default privacy settings for new histories
- Click the User button in the top menu bar to open a dropdown galaxy-dropdown
- Click on Preferences in the dropdown
- Select Set Dataset Permissions for New Histories
- Add a permission and click save permission
Note: Changes made here will only affect histories created after these settings have been stored.
Importing a history
- Open the link to the shared history
- Click on the Import this history button on the top left
- Enter a title for the new history
- Click on Copy History
Manipulating multiple history datasets
Explains how to manipulate multiple history datasets at once
You can also hide, delete, and purge multiple datasets at once by multi-selecting datasets:
- galaxy-selector Click the multi-select button containing the checkbox just below the history size.
- Checkboxes will appear inside each dataset in the history.
- Scroll and click the checkboxes next to the datasets you want to manage.
- Click the ‘n of N selected’ to choose the action. The action will be performed on all selected datasets, except for the ones that don’t support the action. That is, if an action doesn’t apply to a selected dataset, like deleting a deleted dataset, nothing will happen to that dataset, while all other selected datasets will be deleted.
- You can click the multi-select button again to hide the checkboxes.
Renaming a history
Explains how to rename a history
- Click on galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”)
- Type the new name
- Click on Save
- To cancel renaming, click the galaxy-undo “Cancel” button
If you do not have the galaxy-pencil (Edit) next to the history name (which can be the case if you are using an older version of Galaxy) do the following:
- Click on Unnamed history (or the current name of the history) (Click to rename history) at the top of your history panel
- Type the new name
- Press Enter
Searching your history
To make it easier to find datasets in large histories, you can filter your history by keywords as follows:
- Click on the search datasets box at the top of the history panel.
- Type a search term in this box
- For example a tool name, or sample name
- To undo the filtering and show your full history again, press on the clear search button galaxy-clear next to the search box
Sharing your History
You can share your work in Galaxy. There are various ways you can give access to one of your histories to other users.
Sharing your history allows others to import and access the datasets, parameters, and steps of your history.
Access the history sharing menu via the History Options dropdown (galaxy-history-options), and clicking “history-share Share or Publish”
- Share via link
- Open the History Options galaxy-history-options menu at the top of your history panel and select “history-share Share or Publish”
- galaxy-toggle Make History accessible
- A Share Link will appear that you can give to others
- Anybody who has this link can view and copy your history
- Publish your history
- galaxy-toggle Make History publicly available in Published Histories
- Anybody on this Galaxy server will see your history listed under the Published Histories tab opened via the galaxy-histories-activity Histories activity
- Share only with another user.
- Enter an email address for the user you want to share with in the Please specify user email input below Share History with Individual Users
- Your history will be shared only with this user.
- Finding histories others have shared with me
- Click on the galaxy-histories-activity Histories activity in the activity bar on the left
- Click the Shared with me tab
- Here you will see all the histories others have shared with you directly
Note: If you want to make changes to your history without affecting the shared version, make a copy by going to History Options galaxy-history-options icon in your history and clicking Copy this History
Switching to an existing history
Shows how to switch to another existing history in your account
To switch to an existing history simply click the switch-histories icon at the top of the history panel. This opens a list of histories existing in a given Galaxy account in the middle part of the interface.
Top level history controls
Description of three history buttons for creating a new history, switching histories, and opening the history options dropdown
Above the current history panel are three buttons:
- The new-history “Create new history” button will create an empty history.
- The switch-histories “Switch to history” will open a window letting you easily swap to any of your other histories.
- The galaxy-history-options “History options” (formerly the galaxy-gear “Gear menu”) gives you access to advanced options to work with your history.
Transfer entire histories from one Galaxy server to another
Transfer a Single Dataset
At the sender Galaxy server, set the history to a shared state, then directly capture the galaxy-link link for a dataset and paste the URL into the Upload tool at the receiver Galaxy server.
Transfer an Entire History
Have an account at two different Galaxy servers, and be logged into both.
At the sender Galaxy server
1. Navigate to the history you want to transfer, and set the history to a shared state.
2. Click into the History Options menu in the history panel.
3. Select from the menu galaxy-history-archive Export History to File.
4. For How do you want to export this History?, choose the option to direct download.
5. Click on Generate direct download.
6. Allow the archive generation process to complete. *
7. Copy the galaxy-link link for your new archive.
At the receiver Galaxy server
8. Confirm that you are logged into your account.
9. Click on Data in the top menu, and choose Histories to reach your Saved Histories.
10. Click on Import history in the grey button on the top right.
11. Paste in your link’s URL from step 7.
12. Click on Import History.
13. Allow the archive import process to complete. *
14. The transferred history will be uncompressed and added to your Saved Histories.
* For steps 6 and 13: It is Ok to navigate away for other tasks during processing. If enabled, Galaxy will send you status notifications.
Tip: If the history to transfer is large, you may copy just your important datasets into a new history, and create the archive from that new smaller history. Clearing away deleted and purged datasets will make all histories smaller and faster to archive and transfer!
Undeleting history
Undelete your deleted histories
Deleted histories can be undeleted:
- Select “Histories” from the activity bar on the left
- Toggle “Advanced search”
- Click “Deleted”
- Click on the title of the history you want to un-delete and un-delete it!
Unsharing unwanted histories
- All account Histories owned by others but shared with you can be reviewed under User > Histories shared with me.
- The other person does not need to unshare a history with you. Unshare histories yourself on this page using the pull-down menu per history.
- Dataset and History privacy options, including sharing, can be set under User > Preferences.
Three key features to work with shared data are:
- View is a review feature. The data cannot be worked with, but many details, including tool and dataset metadata/parameters, are included.
- Copy those you want to work with. This will increase your quota usage. This will also allow you to manipulate the datasets or the history independently from the original owner. All History/Dataset functions are available if the other person granted full access to the datasets to you.
- Unshare any on the list not needed anymore. After a history is copied, you will still have your version of the history, even if later unshared or the other person who shared it with you changes their version later. Meaning, that each account’s version of a History and the Datasets in it are distinct (unless the Datasets were not shared, you will still only be able to “view” but not work with or download them).
Note: “Histories shared with me” result in only a tiny part of your quota usage. Unsharing will not significantly reduce quota usage unless hundreds (or more!) or many significant histories are shared. If you share a History with someone else, that does not increase or decrease your quota usage.
View a list of all histories
This FAQ demonstrates how to list all histories for a given user
There are multiple ways in which you can view your histories:
Viewing histories using switch-histories “Switch to history” button. This is best for quickly switching between multiple histories.
Click the “Switch history” icon at the top of the history panel to bring up a list of all your histories:
Using the “Activity Bar”:
Click the “Show all histories” button within the Activity Bar on the left:
Using “Data” drop-down:
Click the “Data” link on the top bar of Galaxy interface and select “Histories”:
Using the Multi-view, which is best for moving datasets between histories:
Click the galaxy-history-options menu, and select galaxy-multihistory Show histories side-by-side
View histories side-by-side
This FAQ demonstrates how to view histories side-by-side
You can view multiple Galaxy histories at once. This allows you to better understand your analyses and also makes it possible to drag datasets between histories. This is called “History multiview”. The multiview can be enabled either via the History menu or via the Activity Bar:
Enabling Multiview via History menu is done by first clicking on the galaxy-history-options “History options” drop-down and selecting galaxy-multihistory “Show Histories Side-by-Side option”:
Clicking the galaxy-multihistory “History Multiview” button within the Activity Bar:
Interactive tools
Knitting RMarkdown documents in RStudio
Hands-on: Knitting RMarkdown documents in RStudio
One of the other nice features of RMarkdown documents is making lovely presentation-quality documents. You can take, for example, a tutorial and produce a nice report-like output as an HTML, PDF, or .doc document that can easily be shared with colleagues or students.
Now you’re ready to preview the document:
Click Preview. A window will pop up with a preview of the rendered version of this document.
The preview is really similar to the GTN rendering: no cells have been executed, and no output is embedded yet in the preview document. But if you have run cells (e.g. the first few loading a library and previewing the msleep dataset:
When you’re ready to distribute the document, you can instead use the Knit button. This runs every cell in the entire document fresh, and then compiles the outputs together with the rendered markdown to produce a nice result file as HTML, PDF, or Word document.
Tip: PDF + Word require a LaTeX installation
You might need to install additional packages to compile the PDF and Word document versions.
And at the end you can see a pretty document rendered with all of the output of every step along the way. This is a fantastic way to e.g. distribute read-only lesson materials to students, if you feel they might struggle with using an RMarkdown document, or just want to read the output without doing it themselves.
Launch JupyterLab
Hands-on: Launch JupyterLab
Currently JupyterLab in Galaxy is available on Live.useGalaxy.eu, usegalaxy.org and usegalaxy.eu.
Hands-on: Run JupyterLab
- Interactive Jupyter Notebook. Note that on some Galaxies this is called Interactive JupyTool and notebook:
- Click Run Tool
The tool will start running and will stay running permanently
This may take a moment, but once the Executed notebook in your history is orange, you are up and running!
- Click on the User menu at the top and go to Active Interactive Tools and locate the JupyterLab instance you started.
- Click on your JupyterLab instance
If JupyterLab is not available on the Galaxy instance:
- Start Try JupyterLab
Launch RStudio
Hands-on: Launch RStudio
Depending on which server you are using, you may be able to run RStudio directly in Galaxy. If that is not available, RStudio Cloud can be an alternative.
Currently RStudio in Galaxy is only available on UseGalaxy.eu and UseGalaxy.org.
- Open the RStudio tool by clicking here to launch RStudio
- Click Run Tool
- The tool will start running and will stay running permanently
- Click on the “User” menu at the top and go to “Active InteractiveTools” and locate the RStudio instance you started.
If RStudio is not available on the Galaxy instance:
- Register for RStudio Cloud, or login if you already have an account
- Create a new project
Learning with RMarkdown in RStudio
Hands-on: Learning with RMarkdown in RStudio
Learning with RMarkdown is a bit different than you might be used to. Instead of copying and pasting code from the GTN into a document you’ll instead be able to run the code directly as it was written, inside RStudio! You can now focus just on the code and reading within RStudio.
Load the notebook if you have not already, following the tip box at the top of the tutorial
Open it by clicking on the .Rmd file in the file browser (bottom right)
The RMarkdown document will appear in the document viewer (top left)
You’re now ready to view the RMarkdown notebook! Each notebook starts with a lot of metadata about how to build the notebook for viewing, but you can ignore this for now and scroll down to the content of the tutorial.
You can switch to the visual mode which is way easier to read - just click on the gear icon and select Use Visual Editor.
You’ll see codeblocks scattered throughout the text, and these are all runnable snippets that appear like this in the document:
And you have a few options for how to run them:
- Click the green arrow
- ctrl+enter
- Using the menu at the top to run all
When you run cells, the output will appear below in the Console. RStudio essentially copies the code from the RMarkdown document, to the console, and runs it, just as if you had typed it out yourself!
One of the best features of RMarkdown documents is that they include a very nice table browser which makes previewing results a lot easier! Instead of needing to use head every time to preview the result, you get an interactive table browser for any step which outputs a table.
Open a Terminal in Jupyter
Hands-on: Open a Terminal in Jupyter
This tutorial will let you accomplish almost everything from this view, running code in the cells below directly in the training material. You can choose between running the code here, or opening up a terminal tab in which to run it.
Here are some instructions for how to do this on various environments.
Jupyter on UseGalaxy.* and MyBinder.org
Use the File → New → Terminal menu to launch a terminal.
Disable “Simple” mode in the bottom left hand corner, if it is activated.
Drag one of the terminal or notebook tabs to the side to have the training materials and terminal side-by-side
CoCalc
Use the Split View functionality of CoCalc to split your view into two portions.
Change the view of one panel to a terminal
Open interactive tool
- Go to User > Active InteractiveTools
- Wait for the to be running (Job Info)
- Click on
Stop RStudio
Hands-on: Stop RStudio
When you have finished your R analysis, it’s time to stop RStudio.
- First, save your work into Galaxy, to ensure reproducibility:
- You can use gx_put(filename) to save individual files by supplying the filename
- You can use gx_save() to save the entire analysis transcript and any data objects loaded into your environment.
- Once you have saved your data, you can proceed in 2 different ways:
- Deleting the corresponding history dataset named RStudio, which is shown in an “in progress” state (yellow), OR
- Clicking on the “User” menu at the top and go to “Active InteractiveTools” and locate the RStudio instance you started, selecting the corresponding box, and finally clicking on the “Stop” button at the bottom.
Reference genomes
How to use Custom Reference Genomes?
A reference genome contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for a single species. It is representative of a specific genome assembly build or release.
There are two options for reference genomes in Galaxy.
- Native
- Index provided by the server administrators.
- Found on tool forms in a drop down menu.
- A database key is automatically assigned. See tip 1.
- The database is what links your data to a FASTA index. Example: used with BAM data
- Custom
- FASTA file uploaded by users.
- Input on tool forms then indexed at runtime by the tool.
- An optional custom database key can be created and assigned by the user.
There are five basic steps to use a Custom Reference Genome, plus one optional.
- Obtain a FASTA copy of the target genome. See tip 2.
- Upload the genome to Galaxy to add it as a dataset in your history.
- Clean up the format with the tool NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
- Make sure the chromosome identifiers are a match for other inputs.
- Set a tool form’s options to use a custom reference genome from the history and select the loaded genome FASTA.
- (Optional) Create a custom genome build (database) that you can assign to datasets.
TIP 1: Avoid assigning a native database to uploaded data unless you have confirmed the data are based on exactly the same genome assembly, or you have adjusted the data to be a match first!
TIP 2: When choosing your reference genome, consider choosing your reference annotation at the same time. Standardize the format of both as a preparation step. Put the files in a dedicated “reference data” history for easy reuse.
Sorting Reference Genome
Certain tools expect reference genomes to be sorted in lexicographical order. These tools are often downstream of the initial mapping tools, which means that a large investment in a project has already been made before a sorting problem surfaces in the final analysis steps. To avoid this, always sort your FASTA reference genome dataset at the beginning of a project. Many sources only provide sorted genomes, but double-checking is your own responsibility, and it is easy to do in Galaxy:
- Convert Formats -> FASTA-to-Tabular
- Filter and Sort -> Sort on column: c1 with flavor: Alphabetical everything in: Ascending order
- Convert Formats -> Tabular-to-FASTA
Note: The above sorting method is for most tools, but not all. In particular, GATK tools have a tool-specific sort order requirement.
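The three steps above (FASTA-to-Tabular, sort on the first column, Tabular-to-FASTA) can be mirrored outside Galaxy. The following is a minimal Python sketch of the same idea, not the implementation of the Galaxy tools; `read_fasta` and `sort_fasta` are hypothetical helper names, and records are ordered by their full title line as plain strings.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (title, sequence) pairs; titles exclude the '>'."""
    records: List[Tuple[str, str]] = []
    title = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def sort_fasta(lines: Iterable[str]) -> List[str]:
    """Return FASTA lines with records ordered lexicographically by title."""
    out: List[str] = []
    for title, seq in sorted(read_fasta(lines), key=lambda rec: rec[0]):
        out.append(">" + title)
        out.append(seq)
    return out


example = [">chr2", "ACGT", ">chr10", "GGCC", ">chr1", "TTAA"]
print("\n".join(sort_fasta(example)))
```

Note that lexicographic order places `chr10` before `chr2`, which is not the natural numeric order.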
Troubleshooting Custom Genome fasta
If a custom genome/transcriptome/exome dataset is producing errors, double check the format and that the chromosome identifiers match between ALL inputs. Clicking on the bug icon will often provide a description of the problem. This does not automatically submit a bug report, and it is not always necessary to do so, but it is a good way to get some information about why a job is failing.
Custom genome not assigned as FASTA format
- Symptoms include: Dataset not included in custom genome “From history” pull down menu on tool forms.
- Solution: Check datatype assigned to dataset and assign fasta format.
- How: Click on the dataset’s pencil icon to reach the “Edit Attributes” form, and in the Datatypes tab, redetect the datatype.
- If `fasta` is not assigned, there is a format problem to correct.
Incomplete Custom genome file load
- Symptoms include: Tool errors result the first time you use the Custom genome.
- Solution: Use Text Manipulation → Select last lines from a dataset to check last 10 lines to see if file is truncated.
- How: Reload the dataset (switch to FTP if not using already). Check your FTP client logs to make sure the load is complete.
Extra spaces, extra lines, inconsistent line wrapping, or any deviation from strict FASTA format
- Symptoms include: RNA-seq tools (Cufflinks, Cuffcompare, Cuffmerge, Cuffdiff) fail with the error `Error: sequence lines in a FASTA record must have the same length!`.
- Solution: Test and correct the file locally and then re-upload it, or test and fix it within Galaxy, then re-run.
- How:
- Quick re-formatting: Run the dataset through the tool NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
- Optional detailed re-formatting: Start with FASTA manipulation → FASTA Width formatter with a value between 40-80 (60 is common) to reformat wrapping. Next, use Filter and Sort → Select with “>” to examine identifiers. Use a combination of Convert Formats → FASTA-to-Tabular, Text Manipulation tools, then Tabular-to-FASTA to correct.
- With either of the above, finish by using Filter and Sort → Select with `^\s*$` to search for empty lines (use “NOT matching” to remove these lines and output a properly formatted fasta dataset).
Inconsistent line wrapping, common if merging chromosomes from various Genbank records (e.g. primary chroms with mito)
- Symptoms include: Tools (SAMTools, Extract Genomic DNA, but rarely alignment tools) may complain about unexpected line lengths/missing identifiers. Or they may just fail for what appears to be a cluster error.
- Solution: Test and correct the file locally and then re-upload it, or test and fix it within Galaxy.
- How: Use NormalizeFasta with the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace. Finish by using Filter and Sort → Select with `^\s*$` to search for empty lines (use “NOT matching” to remove these lines and output a properly formatted fasta dataset).
Unsorted fasta genome file
- Symptoms include: Tools such as Extract Genomic DNA report problems with sequence lengths.
- Solution: First try sorting and re-formatting in Galaxy then re-run.
- How: To sort, follow instructions for Sorting a Custom Genome.
Identifier and Description in “>” title lines used inconsistently by tools in the same analysis
- Symptoms include: Will generally manifest as a false genome-mismatch problem.
- Solution: Remove the description content and re-run all tools/workflows that used this input. Mapping tools will usually not fail, but downstream tools will. When this comes up, it usually means that an analysis needs to be started over from the mapping step to correct the problems. No one enjoys redoing this work. Avoid the problems by formatting the genome, by double checking that the same reference genome was used for all steps, and by making certain the ‘identifiers’ are a match between all planned inputs (including reference annotation such as GTF data) before using your custom genome.
- How: To drop the title line description content, use NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace. Next, double check that the chromosome identifiers are an exact match between all inputs.
Unassigned database
- Symptoms include: Tools report that no build is available for the assigned reference genome.
- Solution: This occurs with tools that require an assigned database metadata attribute. SAMTools and Picard often require this assignment.
- How: Create a Custom Build and assign it to the dataset.
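Several of the fixes above rely on the same cleanup: wrap sequence lines at a fixed width, trim the title line at the first whitespace, and drop empty lines. The sketch below illustrates that cleanup in Python. It only illustrates the idea and is not the NormalizeFasta tool's implementation; `normalize_fasta` and its `width` parameter are hypothetical names, and the sample data is made up.

```python
from typing import Iterable, Iterator, List


def normalize_fasta(lines: Iterable[str], width: int = 80) -> Iterator[str]:
    """Yield cleaned FASTA lines: short titles, fixed-width sequence, no blanks."""
    seq_parts: List[str] = []

    def flush() -> Iterator[str]:
        # Emit the buffered sequence re-wrapped at `width` characters.
        seq = "".join(seq_parts)
        seq_parts.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop empty or whitespace-only lines
        if line.startswith(">"):
            yield from flush()
            # Keep only the identifier: everything before the first whitespace.
            yield line.split()[0]
        else:
            seq_parts.append(line)
    yield from flush()


messy = [">chr1 Homo sapiens chromosome 1", "ACGTAC", "", "GT", ">chrM mitochondrion", "TTTT  "]
cleaned = list(normalize_fasta(messy, width=4))
print(cleaned)
```

After this kind of cleanup, every sequence line in a record except possibly the last has the same length, which is what the strict FASTA checks mentioned above look for.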
Reports
Enhancing tabular dataset previews in reports/pages
There are lots of fun advanced features!
There are a number of options, specifically for tabular data, that allow it to render more nicely in your workflow reports and pages, and anywhere that GalaxyMarkdown is used.
- `title` to give your table a title
- `footer` to caption your table
- `show_column_headers=false` to hide the column headers
- `compact=true` to make the table show up more inline, hiding that it was embedded from a Galaxy dataset.
The existing `history_dataset_display` directive displays the dataset name and some useful context at the expense of potentially breaking the flow of the document.
Input: Galaxy Markdown
```galaxy
history_dataset_display(history_dataset_id=1e8ab44153008be8)
```
Output: Example Screenshot
The existing `history_dataset_embedded` directive was implemented to try to inline results more and make the results more readable within a more curated document. It dispatches on tabular types and puts the results in a table, but the table doesn’t have a lot of options.
Input: Galaxy Markdown
```galaxy
history_dataset_embedded(history_dataset_id=1e8ab44153008be8)
```
Output: Example Screenshot
The `history_dataset_as_table` directive mirrors the `history_dataset_as_image` directive: it tries harder to coerce the data into a table and provides new table-specific options. The first of these is `show_column_headers`, which defaults to `true`.
Input: Galaxy Markdown
```galaxy
history_dataset_as_table(history_dataset_id=1e8ab44153008be8,show_column_headers=false)
```
Output: Example Screenshot
There is also a `compact` option. This provides a much more inline experience for tabular datasets:
Input: Galaxy Markdown
```galaxy
history_dataset_as_table(history_dataset_id=1e8ab44153008be8,show_column_headers=false,compact=true)
```
Output: Example Screenshot
Figures in general should have titles and legends, so there are “title” and “footer” options as well.
Input: Galaxy Markdown
```galaxy
history_dataset_as_table(history_dataset_id=1e8ab44153008be8,show_column_headers=false,title='Binding Site Results',footer='Here is a very good figure caption for this table.')
```
Output: Example Screenshot
Making an element collapsible in a report
If you have extraneous information you might want to let a user collapse it.
This applies to any GalaxyMarkdown elements, i.e. the things you’ve clicked in the left panel to embed in your Workflow Report or Page.
By adding a `collapse=""` attribute to a markdown element, you can make it collapsible. Whatever you put in the quotes will be the title of the collapsible box.
```
history_dataset_type(history_dataset_id=3108c91feeb505da, collapse="[TITLE]")
```
Sequencing
Illumina MiSeq sequencing
Comment: Illumina MiSeq sequencing
Illumina MiSeq sequencing is based on sequencing by synthesis. As the name suggests, fluorescent labels are measured for every base that binds at a specific moment at a specific place on a flow cell. These flow cells are covered with oligos (short single-stranded DNA molecules). In the library preparation, the DNA is cut into small fragments (the size differs per kit/device) and specific pieces of DNA (adapters) are added, which are complementary to the oligos. Using bridge amplification, large numbers of clusters of these DNA fragments are made. The reverse strand is washed away, making the clusters single-stranded. Fluorescent bases are added one by one, and each emits a specific light for its base when added. This happens for whole clusters, so the light can be detected, and the signal is basecalled (translated from light to a nucleotide) into a nucleotide sequence (read). For every base a quality score is determined and saved with the read. This process is repeated for the reverse strand at the same place on the flow cell, so the forward and reverse reads are from the same DNA strand. The forward and reverse reads are linked together and should always be processed together!
For more information watch this video from Illumina
Nanopore sequencing
Comment: Nanopore sequencing
Nanopore sequencing has several properties that make it well-suited for our purposes:
- Long-read sequencing technology offers simplified and less ambiguous genome assembly
- Long-read sequencing gives the ability to span repetitive genomic regions
- Long-read sequencing makes it possible to identify large structural variations
When using Oxford Nanopore Technologies (ONT) sequencing, the change in electrical current is measured over the membrane of a flow cell. When nucleotides pass the pores in the flow cell the current change is translated (basecalled) to nucleotides by a basecaller. A schematic overview is given in the picture above.
When sequencing using a MinIT or MinION Mk1C, the basecalling software is present on the devices. With basecalling the electrical signals are translated to bases (A,T,G,C) with a quality score per base. The sequenced DNA strand will be basecalled and this will form one read. Multiple reads will be stored in a fastq file.
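As a rough illustration of what “a quality score per base” looks like in a FASTQ file, the sketch below reads one four-line FASTQ record and decodes its quality string. It assumes the common Phred+33 encoding; the record is made up, and `parse_fastq_record` is a hypothetical helper, not part of any basecaller.

```python
from typing import List, Tuple


def parse_fastq_record(lines: List[str]) -> Tuple[str, str, List[int]]:
    """Split one 4-line FASTQ record into (read name, bases, Phred scores)."""
    header, bases, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(bases) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33 (assumed): each quality character's ASCII code is score + 33.
    scores = [ord(char) - 33 for char in quality]
    return header[1:], bases, scores


record = ["@read_1", "GATTACA", "+", "II5+!I#"]
name, bases, scores = parse_fastq_record(record)
print(name, bases, scores)
```

Each score lines up with the base at the same position, so the first `G` here has a score of 40 and the second `A` a score of 0.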
Support
Can I use a public Galaxy for my private data?
Of course*!
If your data is not sensitive (i.e. human patient data) but just private (sequencing from other animals, bacteria, etc.), then it is absolutely OK to use a public Galaxy server like usegalaxy.eu or usegalaxy.org!
Uploaded data is private to your account and is not publicly available to others. No one will scoop your results if you use a public Galaxy server to analyse your data :)
A great benefit of this is that when your paper is being reviewed, you can share that history or workflow with reviewers, and when it’s published you can click a button to share those results with the world as well, so that others can reproduce your analysis!
(* Of course, system administrators can see the files on disk, but they are not interested and will not be looking at your data. If you file a bug report they may see your data, but they are system administrators, not bioinformatics experts who might be interested in your results.)
Contacting Galaxy Administrators
If you suspect there is something wrong with the server, or would like to request a tool to be installed, you should contact the server administrators for the Galaxy you are on.
- Tool error? Please follow these troubleshooting steps
- Each Galaxy server has different contact procedures. Here are the contact options for some of the biggest servers:
- Galaxy US: Gitter channel
- Galaxy EU: Gitter channel, Request TIaaS
- Galaxy AU: Email, Request a tool, Request Data Quota
- Galaxy FR: Request TIaaS
- Other Galaxy servers? Check the homepage for more information.
Where do I get more support?
If you need support for using Galaxy, running your analysis or completing a tutorial, please try one of the following options:
- Gitter Chat: You can get help on the Gitter chat platform, on various channels.
- Galaxy Help Forum: You can also have a look at the Galaxy Help Forum. Your question may already have been answered there. If not, you can post your question there.
- Contact Server Admins: If you think there is a problem with the Galaxy server, or you would like to make a request, contact the Galaxy server administrators.
Tools
Add Toolshed category to a tool
Find the target tool in the Galaxy Toolshed.
Note: the easiest way to do this from the Galaxy interface is to (A) search for the tool, then (B) select the drop-down menu See tool in toolshed.
- Follow the Development repository URL.
- Go to the `.shed.yml` file.
- In the `categories:` metadata section, add your Toolshed category (which must correspond to those already in the Galaxy Toolshed).
Example format:
categories:
- Single Cell
- Spatial Omics
- Transcriptomics
Changing the tool version
Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool.
Switching to a different version of a tool:
- Open the tool
- Click on the versions logo at the top right
- Select the desired version from the dropdown list
If a Tool is Missing
To use the tools installed and available on the Galaxy server:
- At the top of the left tool panel, type in a tool name or datatype into the tool search box.
- Shorter keywords find more choices.
- Tools can also be directly browsed by category in the tool panel.
If you can’t find a tool you need for a tutorial on Galaxy, please:
- Check that you are using a compatible Galaxy server
- Navigate to the overview box at the top of the tutorial
- Find the “Supporting Materials” section
- Check “Available on these Galaxies”
- If your server is not listed here, the tutorial is not supported on your Galaxy server
- You can create an account on one of the supporting Galaxies
- Use the Tutorial mode feature
- Open your Galaxy server
- Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
- Navigate to your tutorial
- Tool names in tutorials will be blue buttons that open the correct tool for you
- Note: this does not work for all tutorials (yet)
- Still not finding the tool?
- Ask for help on Gitter.
Multiple similar tools available
Sometimes there are multiple tools with very similar names. If the parameters in the tutorial don’t match with what you see in Galaxy, please try the following:
Use Tutorial Mode in Galaxy, and click on the blue tool button in the tutorial to automatically open the correct tool and version (not available for all tutorials yet).
Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.
- Open your Galaxy server
- Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
- Navigate to your tutorial
- Tool names in tutorials will be blue buttons that open the correct tool for you
- Note: this does not work for all tutorials (yet)
- You can click anywhere in the greyed-out area outside of the tutorial box to return to the Galaxy analytical interface
Warning: Not all browsers work!
- We’ve had some issues with Tutorial mode on Safari for Mac users.
- Try a different browser if you aren’t seeing the button.
Check that the entire tool name matches what you see in the tutorial.
Organizing the tool panel
Galaxy servers can have a lot of tools available, which can make it challenging to find the tool you are looking for. To help find your favourite tools, you can:
- Keep a list of your favourite tools to find them easily later.
- Adding tools to your favourites
- Open a tool
- Click on the star icon next to the tool name to add it to your favourites
- Viewing your favourite tools
- Click on the star icon at the top of the Galaxy tool panel (above the tool search bar)
- This will filter the toolbox to show all your starred tools
- Change the tool panel view
- Click on the panel view icon at the top of the Galaxy tool panel (above the tool search bar)
- Here you can view the tools by EDAM ontology terms
- EDAM Topics (e.g. biology, ecology)
- EDAM Operations (e.g. quality control, variant analysis)
- You can always get back to the default view by choosing “Full Tool Panel”
Re-running a tool
- Expand one of the output datasets of the tool (by clicking on it)
- Click the re-run icon to re-run the tool
This is useful if you want to run the tool again with slightly different parameters, or if you just want to check which parameter settings you used.
Regular Expressions 101
Regular expressions are a standardized way of describing patterns in textual data. They can be extremely useful for tasks such as finding and replacing data. They can be a bit tricky to master, but learning even just a few of the basics can help you get the most out of Galaxy.
Finding
Below are just a few examples of basic expressions:
Regular expression: Matches
- `abc`: an occurrence of `abc` within your data
- `(abc|def)`: `abc` or `def`
- `[abc]`: a single character which is either `a`, `b`, or `c`
- `[^abc]`: a character that is NOT `a`, `b`, nor `c`
- `[a-z]`: any lowercase letter
- `[a-zA-Z]`: any letter (upper or lower case)
- `[0-9]`: numbers 0-9
- `\d`: any digit (same as `[0-9]`)
- `\D`: any non-digit character
- `\w`: any alphanumeric character
- `\W`: any non-alphanumeric character
- `\s`: any whitespace
- `\S`: any non-whitespace character
- `.`: any character
- `\.`: a literal period (the `.` character itself)
- `{x,y}`: between x and y repetitions
- `^`: the beginning of the line
- `$`: the end of the line
Note: characters such as `*`, `?`, `.`, `+` etc. have a special meaning in a regular expression. If you want to match on those characters, you can escape them with a backslash. So `\?` matches the question mark character exactly.
Examples
Regular expression: Matches
- `\d{4}`: 4 digits (e.g. a year)
- `chr\d{1,2}`: `chr` followed by 1 or 2 digits
- `.*abc$`: anything with `abc` at the end of the line
- `^$`: empty line
- `^>.*`: line starting with `>` (e.g. Fasta header)
- `^[^>].*`: line not starting with `>` (e.g. Fasta sequence)
Replacing
Sometimes you need to capture the exact value you matched on in order to use it in your replacement. We do this using capture groups `(...)`, which we can refer to using `\1`, `\2` etc. for the first and second captured values. If you want to refer to the whole match, use `&`.
Regular expression, Input: Captures
- `chr(\d{1,2})`, `chr14`: `\1 = 14`
- `(\d{2}) July (\d{4})`, `24 July 1984`: `\1 = 24`, `\2 = 1984`
An expression like `s/find/replacement/g` indicates a replacement expression. This will search (`s`) for any occurrence of `find` and replace it with `replacement`. It will do this globally (`g`), which means it doesn’t stop after the first match.
Example: `s/chr(\d{1,2})/CHR\1/g` will replace `chr14` with `CHR14` etc.
You can also use replacement modifiers such as convert to lower case `\L` or upper case `\U`. Example: `s/.*/\U&/g` will convert the whole text to upper case.
Note: In Galaxy, you are often asked to provide the find and replacement expressions separately, so you don’t have to use the `s/../../g` structure.
There is a lot more you can do with regular expressions, and there are a few different flavours in different tools/programming languages, but these are the most important basics that will already allow you to do many of the tasks you might need in your analysis.
Tip: RegexOne is a nice interactive tutorial to learn the basics of regular expressions.
Tip: Regex101.com is a great resource for interactively testing and constructing your regular expressions, it even provides an explanation of a regular expression if you provide one.
Tip: Cyrilex is a visual regular expression tester.
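The patterns from this FAQ can also be tried out in code. Below is a small sketch using Python's standard `re` module. This is Python's regex flavour, which may differ from the flavour used by a given Galaxy tool: for example, Python has no `\U` modifier and does not use `&` for the whole match, so the upper-case example passes a function instead. The sample lines are made up.

```python
import re

lines = [">chr14 sample", "ACGTACGT", "", "24 July 1984"]

# Finding: lines starting with ">" (e.g. FASTA headers).
headers = [line for line in lines if re.search(r"^>.*", line)]

# Finding: empty lines.
empty_count = sum(1 for line in lines if re.search(r"^$", line))

# Capture groups: \1 and \2 correspond to group(1) and group(2).
match = re.search(r"(\d{2}) July (\d{4})", lines[3])
day, year = match.group(1), match.group(2)

# Replacing: the equivalent of s/chr(\d{1,2})/CHR\1/g.
renamed = re.sub(r"chr(\d{1,2})", r"CHR\1", lines[0])

# Upper-casing the whole match: Python has no \U, so a function is used instead.
shouted = re.sub(r".+", lambda m: m.group(0).upper(), lines[0])

print(headers, empty_count, day, year, renamed, shouted)
```

`re.sub` replaces every match by default, which corresponds to the `g` flag in the `s/../../g` notation.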
Request Galaxy tools on a specific server
To request tools that already exist in the Galaxy Toolshed but are not installed on your server, please raise an issue at:
Europe - usegalaxy.eu | https://github.com/usegalaxy-eu/usegalaxy-eu-tools
USA - usegalaxy.org | https://github.com/galaxyproject/usegalaxy-tools
Australia - usegalaxy.org.au | https://github.com/usegalaxy-au/usegalaxy-au-tools/tree/master/usegalaxy.org.au
Select multiple datasets
- Click on Multiple datasets
- Select several files by keeping the Ctrl (or COMMAND) key pressed and clicking on the files of interest
Selecting a dataset collection as input
- Click on Dataset collection in front of the input parameter you want to supply the collection to.
- Select the collection you want to use from the list
Sorting Tools
Input errors are sometimes caused by unsorted inputs. Try using these:
- Picard SortSam: Sort SAM/BAM by coordinate or queryname.
- Samtools Sort: Alternate for SAM/BAM, best when used for coordinate sorting only.
- SortBED order the intervals: Best choice for BED/Interval.
- Sort data in ascending or descending order: Alternate choice for Tabular/BED/Interval/GTF.
- VCFsort: Best choice for VCF.
- Tool Form Options for Sorting: Some tools have an option to sort inputs during job execution. Whenever possible, sort inputs before using tools, especially if jobs fail for not having enough memory resources.
Tool doesn't recognize input datasets
The expected input datatype assignment is explained on the tool form. Review the input select areas and the help section below the Run Tool button.
Understanding datatypes FAQ.
No datasets or collections available? Solutions:
- Upload or Copy an appropriate dataset for the input into the active history.
- Resolve a datatype assignment incompatibility between the dataset and the tool.
- Individual datasets and dataset collections are selected differently on tool forms.
- To select a collection input on a tool form see this FAQ.
Using tutorial mode
Tutorial mode saves you screen space, finds the tools you need, and ensures you use the correct versions for the tutorials to run.
Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.
- Open your Galaxy server
- Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
- Navigate to your tutorial
- Tool names in tutorials will be blue buttons that open the correct tool for you
- Note: this does not work for all tutorials (yet)
- You can click anywhere in the greyed-out area outside of the tutorial box to return to the Galaxy analytical interface
Warning: Not all browsers work!
- We’ve had some issues with Tutorial mode on Safari for Mac users.
- Try a different browser if you aren’t seeing the button.
Viewing tool logs (`stdout` and `stderr`)
Most tools create log files as output, which can contain useful information about how the tool ran (`stdout`, or standard output), and what went wrong (`stderr`, or standard error).
To view these log files in Galaxy:
- Expand one of the outputs of the tool in your history
- Click on View details
- Scroll to the Job Information section
- Here you will find links to the log files (`stdout` and `stderr`).
Where is the tool help?
Finding tool support
There is documentation available on the tool form itself which mentions the following information:
- Parameters
- Expected format for input dataset(s)
- Links to publications and ToolShed source repositories
- Tool and wrapper version(s)
- 3rd party author web sites and documentation
Scroll down on the tool form to locate:
- Information about expected inputs/outputs
- Expanded definitions
- Sample data
- Example use cases
- Graphics
Troubleshooting
How to find and correct tool errors related to Metadata?
Finding and Correcting Metadata
Tools can error when the wrong dataset attributes (metadata) are assigned. Some of these wrong assignments may be:
- Tool outputs, which are automatically assigned without user action.
- Incorrect autodetection of datatypes, which need manual modification.
- Undetected attributes, which require user action (example: assigning database to newly uploaded data).
How to notice missing Dataset Metadata:
- Dataset will not be downloaded when using the disk icon.
- Tools error when using a previously successfully used specific dataset.
- Tools error with a message that ends with: `OSError: [Errno 2] No such file or directory`.
Solution:
Click on the dataset’s pencil icon to reach the Edit Attributes forms and do one of the following as applies:
- Directly reset metadata
- Find the tab for the metadata you want to change, make the change, and save.
- Autodetect metadata
- Click on the Auto-detect button. The dataset will turn yellow in the history while the job is processing.
Incomplete Dataset Download
In case the dataset downloads incompletely:
- Use the Google Chrome web browser. Sometimes Chrome works better at supporting continuous data transfers.
- Use the command-line option instead. The data may really be too large to download OR your connection is slower. This can also be a faster way to download multiple datasets plus ensure a complete transfer (small or large data).
Understanding 'canceled by admin' or cluster failure error messages
The initial error message could be:
This job failed because it was cancelled by an administrator.
Please click the bug icon to report this problem if you need help.
Or
job info:
Remote job server indicated a problem running or monitoring this job.
- Causes:
- Server or cluster error.
- Less frequently, input problems are a factor.
- Solutions:
- Try at least one rerun. Server/cluster errors like this are usually transient.
- Review the Solutions section of the Understanding input error messages FAQ.
- If after any corrections, the job still fails, please report the technical issue following the extended issue guidelines.
Understanding 'exceeds memory allocation' error messages
The error messages displayed are as follows:
job info:
This job was terminated because it used more memory than it was allocated.
Please click the bug icon to report this problem if you need help.
Or
stderr:
Fatal error: Exit code 1 ()
slurmstepd: error: Detected 1 oom-kill event(s) in step XXXXXXX.batch cgroup.
Sometimes this message may appear at the bottom:
job stderr:
slurmstepd: error: Detected 1 oom-kill event(s) in step XXXXXXX.batch cgroup.
In rare cases when the memory quota is exceeded very quickly, an error message such as the following can appear:
job stderr:
Fatal error: Exit code 1 ()
Traceback (most recent call last):
(other lines)
Memory Error
Note: Job runtime memory is different from the amount of free storage space (quota) in an account.
- Causes:
- The job ran out of memory while executing on the cluster node that ran the job.
- The most common reasons for this error are input and tool parameters problems that must be adjusted/corrected.
- Solutions:
- Try at least one rerun to execute the job on a different cluster node.
- Review the Solutions section of the Understanding input error messages FAQ.
- Your data may actually be too large to process at a public Galaxy server. Alternatives include setting up a private Galaxy server.
Understanding ValueError error messages
The full error is usually a longer message seen only after clicking on the bug icon or by reviewing the job details `stderr`.
How to do both is covered in the Troubleshooting errors FAQ.
stderr
...
Many lines of text, may include parameters
...
...
ValueError: invalid literal for int() with base 10: some-sequence-read-name
- Causes:
- MACS2 produces this error the first time it is run. MACS is not the only tool that can produce this issue, but it is the most common.
- Solutions:
- Try at least one rerun.
- MACS/2 is not capable of interpreting sequence read names that include spaces. Try the following two steps:
- Remove unmapped reads from the SAM dataset. There are several filtering tools in the groups SAMTools and Picard that can do this.
- Convert the SAM input to BAM format with the tool SAMtools: SAM-to-BAM. When compressed input is given to MACS, the spaces are no longer an issue.
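To check whether an input is affected, one option is to look for read names that contain spaces. The sketch below scans the text lines of a SAM file, assuming the standard layout (header lines start with `@`, fields are tab-separated, and the read name is the first field). `reads_with_spaces` is a hypothetical helper and the sample lines are made up.

```python
from typing import Iterable, List


def reads_with_spaces(sam_lines: Iterable[str]) -> List[str]:
    """Return read names (first SAM field) that contain a space character."""
    flagged: List[str] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        read_name = line.split("\t", 1)[0]
        if " " in read_name:
            flagged.append(read_name)
    return flagged


sample = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read_1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read 2/1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(reads_with_spaces(sample))
```

An empty result suggests read names with spaces are not the cause, and the other input checks are worth reviewing instead.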
Understanding input error messages
Input problems are very common across any analysis that makes use of programmed tools.
- Causes:
- No quality assurance or content/formatting checks were run on the first datasets of an analysis workflow.
- Incomplete dataset Upload.
- Incorrect or unassigned datatype or database.
- Tool-specific formatting requirements for inputs were not met.
- Parameters set on a tool form are a mismatch for the input data content or format.
- Inputs were in an error state (red) or were putatively successful (green) but are empty.
- Inputs do not meet the datatype specification.
- Inputs do not contain the exact content that a tool is expecting or that was input in the form.
- Annotation files are a mismatch for the selected or assigned reference genome build.
- Special case: Some of the data were generated outside of Galaxy, but later a built-in indexed genome build was assigned in Galaxy for use with downstream tools. This scenario can work, but only if those two reference genomes are an exact match.
- Solutions:
- Review our Troubleshooting Tips for what and where to check.
- Review the GTN for related tutorials on tools/analysis plus FAQs.
- Review Galaxy Help for prior discussion with extended solutions.
- Review datatype FAQs.
- Review the tool form.
- Input selection areas include usage help.
- The help section at the bottom of a tool form often has examples. Does your own data match the format/content?
- See the links to publications and related resources.
- Review the inputs.
- All inputs must be in a success state (green) and actually contain content.
- Did you directly assign the datatype or convert the datatype? What results when the datatype is detected by Galaxy? If these differ, there is likely a content problem.
- For most analysis, allowing Galaxy to detect the datatype during Upload is best and adjusting a datatype later should rarely be needed. If a datatype is modified, the change has a specific purpose/reason.
- Does your data have headers? Is that in specification for the datatype? Does the tool form have an option to specify if the input has headers or not? Do you need to remove headers first for the correct datatype to be detected? Example GTF.
- Large inputs? Consider modifying your inputs to be smaller. Examples: FASTQ and FASTA.
- Run quality checks on your data.
- Search GTN tutorials with the keyword “qa-qc” for examples.
- Search Galaxy Help with the keywords “qa-qc” and your datatype(s) for more help.
- Reference annotation tips.
- In most cases, GTF is preferred over GFF3.
- Search Galaxy Help with the keywords “gtf” and “gff3” for more help.
- Input mismatch tips.
- Do the chromosome/sequence identifiers exactly match between all inputs? Search Galaxy Help for more help about how to correct build/version identifier mismatches between inputs.
- “Chr1” and “chr1” and “1” do not mean the same thing to a tool.
- Custom genome, transcriptome, and exome tips: see FASTA.
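The identifier check described in the input mismatch tips can be sketched outside of Galaxy as follows. This is only an illustration: the helper names (`fasta_ids`, `annotation_ids`, `missing_from_reference`) and the tiny example data are hypothetical, and it assumes the annotation is tab-separated with the sequence identifier in the first column, as in GTF and GFF3.

```python
def fasta_ids(fasta_text):
    """Return the set of sequence identifiers found on FASTA '>' lines."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # Only the first word of the title line is the identifier.
                ids.add(fields[0])
    return ids


def annotation_ids(annotation_text):
    """Return the set of identifiers in column 1 of tab-separated annotation."""
    ids = set()
    for line in annotation_text.splitlines():
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def missing_from_reference(fasta_text, annotation_text):
    """Identifiers used by the annotation that the reference does not contain."""
    return sorted(annotation_ids(annotation_text) - fasta_ids(fasta_text))


# Hypothetical example data: the comparison is exact and case-sensitive.
reference = ">chr1 example sequence\nACGT\n>chr2\nGGCC\n"
annotation = (
    "chr1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id \"g1\";\n"
    "Chr2\tsrc\texon\t1\t4\t.\t+\t.\tgene_id \"g2\";\n"
    "1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id \"g3\";\n"
)
print(missing_from_reference(reference, annotation))
```

Any identifier reported here would need to be renamed in one of the inputs before a tool can match the two files.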
Understanding walltime error messages
The full error message will be reported as below, and can be found by clicking on the bug icon for a failed job run (red dataset):
job info:
This job was terminated because it ran longer than the maximum allowed job run time.
Please click the bug icon to report this problem if you need help.
Or sometimes,
job stderr:
slurmstepd: error: *** JOB XXXX ON XXXX CANCELLED AT 2019-XX-XXTXX:XX:XX DUE TO TIME LIMIT ***
job info:
Remote job server indicated a problem running or monitoring this job.
- Causes:
- The job execution time exceeded the “wall-time” on the cluster node that ran the job.
- The server may be undergoing maintenance.
- Very often input problems also cause this same error.
- Solutions:
- Try at least one rerun.
- Check the server homepage for banners or notices. Selected servers also post to the Galaxy status page.
- Review the Solutions section of the Understanding input error messages FAQ.
- Your data may actually be too large to process at a public Galaxy server. Alternatives include setting up a private Galaxy server.
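If you collect error text from several failed jobs, a small helper can flag the ones that match the walltime messages quoted above. This is a minimal sketch: the function name and the marker list are assumptions based only on the two example messages in this FAQ, and other servers or schedulers may word the error differently.

```python
# Substrings taken from the example messages shown in this FAQ.
WALLTIME_MARKERS = (
    "ran longer than the maximum allowed job run time",
    "DUE TO TIME LIMIT",
)


def looks_like_walltime_failure(job_info, job_stderr=""):
    """Return True if the job info or stderr contains a known walltime marker."""
    text = f"{job_info}\n{job_stderr}"
    return any(marker in text for marker in WALLTIME_MARKERS)


print(looks_like_walltime_failure(
    "Remote job server indicated a problem running or monitoring this job.",
    "slurmstepd: error: *** JOB XXXX ON XXXX CANCELLED AT 2019-XX-XXTXX:XX:XX DUE TO TIME LIMIT ***",
))
```

A match only tells you which message was reported. As noted in the causes, input problems very often produce the same error, so the inputs still need to be reviewed.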
What information should I include when reporting a problem?
Writing bug reports is a good skill for bioinformaticians to have. The key point is to include enough information in your first message: this makes resolving your issue more efficient and a better experience for everyone.
What to include
- Which commands did you run, precisely? We want details. Which flags did you set?
- Which server(s) did you run those commands on?
- What account/username did you use?
- Where did it go wrong?
- What were the stdout/stderr of the tool that failed? Include the text.
- Did you try any workarounds? What results did those produce?
- (If relevant) screenshot(s) that show exactly the problem, if it cannot be described in text. Is there a details panel you could include too?
- If there are job IDs, please include them as text so administrators don’t have to manually transcribe the job ID in your picture.
It makes the process of answering ‘bug reports’ much smoother for us, as we will have to ask you these questions anyway. If you provide this information from the start, we can get straight to answering your question!
What does a GOOD bug report look like?
The people who provide support for Galaxy are largely volunteers in this community, so try and provide as much information up front to avoid wasting their time:
I encountered an issue: I was working on (this server) and trying to run (tool)+(version number) but all of the output files were empty. My username is jane-doe.
Here is everything that I know:
- The dataset is green, the job did not fail
- This is the standard output/error of the tool that I found in the information page (insert it here)
- I have read it but I do not understand what X/Y means.
- The job ID from the output information page is 123123abdef.
- I tried re-running the job and changing parameter Z but it did not change the result.
Could you help me?
User preferences
Does your account usage quota seem incorrect?
- Log out of Galaxy, then back in again. This refreshes the disk usage calculation displayed in the Masthead usage (summary) and under User > Preferences (exact).
Note:
- Your account usage quota can be found at the bottom of your user preferences page.
Forgot Password
- Go to the Galaxy server you are using.
- Click on Login or Register.
- Enter your email in the Public Name or Email Address entry box.
- Click on the link under the password entry box titled Forgot password? Click here to reset your password.
- An email will be sent with a password reset link. This email may be in your email Spam or Trash folders, depending on your filters.
- Click on the reset link in the email or copy and paste it into a web browser window.
- Enter your new password and click on Save new password.
Getting your API key
- In your browser, open your Galaxy homepage
- Log in, or register a new account, if it’s the first time you’re logging in
- Go to User -> Preferences in the top menu bar, then click on Manage API key
- If there is no current API key available, click on Create a new key to generate it
- Copy your API key to somewhere convenient, you will need it throughout this tutorial
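Once you have copied the key, a script can send it along with each request to the Galaxy API. The sketch below only builds the request and does not send it. The server URL and key are placeholders, the helper name `build_galaxy_request` is hypothetical, and it assumes your server accepts the key in an `x-api-key` header and exposes the `/api/histories` endpoint.

```python
import urllib.request


def build_galaxy_request(server_url, api_key, endpoint="/api/histories"):
    """Build (but do not send) an authenticated request to a Galaxy server."""
    url = server_url.rstrip("/") + endpoint
    # Assumption: the server reads the API key from the x-api-key header.
    return urllib.request.Request(url, headers={"x-api-key": api_key})


# Placeholder values: replace with your own server and key.
request = build_galaxy_request("https://galaxy.example.org", "YOUR_API_KEY")
print(request.full_url)

# To actually send the request, you could call:
#   with urllib.request.urlopen(request) as response:
#       print(response.read())
```

Treat the API key like a password: avoid pasting it into shared scripts, notebooks, or bug reports.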
Visualisation
Open History files in Integrated Genome Browser (IGB)
You can open some file types in Integrated Genome Browser (IGB), a desktop genome browser. (Supported File Types)
Here’s how:
- Install IGB on your computer (download page).
- Start IGB.
- In Galaxy, click the desired dataset’s name to expand it.
- Check that the reference genome (dbkey) is set (instructions).
- Click on the Charts icon galaxy-barchart
- In the central panel, next to display in IGB, choose View.
When you choose “View” in Galaxy, your browser opens a new tab showing a page from BioViz.org. Check the newly opened page for next steps.
Having trouble? Working with a custom genome assembly not yet available in Galaxy or IGB?
Contact the IGB team for help and advice!
Using IGV with Galaxy
You can send data from your Galaxy history to IGV for viewing as follows:
- Install IGV on your computer (IGV download page)
- Start IGV
- In recent versions of IGV, you will have to enable the port:
- In IGV, go to View > Preferences > Advanced
- Check the box Enable Port
- In Galaxy, expand the dataset you would like to view in IGV
- Make sure you have set a reference genome/database correctly (dbkey) (instructions)
- Under display in IGV, click on local
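If clicking on local does nothing, it can help to confirm that IGV is actually listening on its port. The following is a minimal sketch, not part of Galaxy or IGV: the helper name is hypothetical, and it assumes IGV runs on the same computer and uses port 60151, which is the usual default shown next to the Enable Port box. Check the value in your own IGV preferences.

```python
import socket

# Assumption: IGV's default port; verify it in View > Preferences > Advanced.
IGV_DEFAULT_PORT = 60151


def port_is_open(host="127.0.0.1", port=IGV_DEFAULT_PORT, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if port_is_open():
    print("A program is listening on the IGV port.")
else:
    print("Nothing is listening: start IGV and enable the port.")
```

An open port only shows that some program is listening there. It does not prove that the program is IGV.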
Workflows
Annotate a workflow
- Open the workflow editor for the workflow
- Click on galaxy-pencil Edit Attributes on the top right
- Write a description of the workflow in the Annotation box
- Add a tag (which will help to search for the workflow) in the Tags section
Creating a new workflow
You can create a Galaxy workflow from scratch in the Galaxy workflow editor.
- Click Workflow on the top bar
- Click the new workflow galaxy-wf-new button
- Give it a clear and memorable name
- Clicking Save will take you directly into the workflow editor for that workflow
- Need more help? Please see the How to make a workflow subsection here
Ensuring Workflows meet Best Practices
When you are editing a workflow, there are a number of additional steps you can take to ensure that it is a Best Practice workflow and will be more reusable.
- Open a workflow for editing
- In the workflow menu bar, you’ll find the galaxy-wf-options Workflow Options dropdown menu.
Click on it and select galaxy-wf-best-practices Best Practices from the dropdown menu.
This will take you to a new side panel, which allows you to investigate and correct any issues with your workflow.
The Galaxy community also has a guide on best practices for maintaining workflows. This guide includes the best practices from the Galaxy workflow panel, plus:
- adding tests to the workflow
- publishing the workflow on GitHub, a public GitLab server, or another public version-controlled repository
- registering the workflow with a workflow registry such as WorkflowHub or Dockstore
Extracting a workflow from your history
Galaxy can automatically create a workflow based on the analysis you have performed in a history. This means that once you have done an analysis manually once, you can easily extract a workflow to repeat it on different data.
- Clean up your history: remove any failed (red) jobs from your history by clicking on the galaxy-delete button. This will make the creation of the workflow easier.
- Click on galaxy-gear (History options) at the top of your history panel and select Extract workflow. The central panel will show the content of the history in reverse order (oldest on top), and you will be able to choose which steps to include in the workflow.
- Replace the Workflow name with something more descriptive.
- Rename each workflow input in the boxes at the top of the second column.
- If there are any steps that shouldn’t be included in the workflow, you can uncheck them in the first column of boxes.
- Click on the Create Workflow button near the top. You will get a message that the workflow was created.
Hiding intermediate steps
When a workflow is executed, the user is usually primarily interested in the final product and not in all intermediate steps. By default all the outputs of a workflow will be shown, but we can explicitly tell Galaxy which outputs to show and which to hide for a given workflow. This behaviour is controlled by the little checkbox in front of every output dataset.
Import workflows from Dockstore
Dockstore is a free and open source platform for sharing reusable and scalable analytical tools and workflows.
- Ensure that you are logged in to your Galaxy account.
- Go to Dockstore.
- Select any Galaxy workflow you want to import.
- Click on the “Galaxy” dropdown within the “Launch with” panel located in the upper right corner.
- Select the Galaxy instance you want to launch this workflow with.
- You will be redirected to Galaxy and presented with a list of workflow versions.
- Click the version you want (usually the latest labelled as “main”)
- You are done!
Import workflows from WorkflowHub
WorkflowHub is a workflow management system which allows workflows to be FAIR (Findable, Accessible, Interoperable, and Reusable), citable, have managed metadata profiles, and be openly available for review and analytics.
- Ensure that you are logged in to your Galaxy account.
- Click on the Workflow menu, located in the top bar.
- Click on the Import button, located in the right corner.
- In the section “Import a Workflow from Configured GA4GH Tool Registry Servers (e.g. Dockstore)”, click on Search form.
- In the TRS Server: workflowhub.eu menu you should type your query.
- Click on the desired workflow, and finally select the latest available version.
After that, the imported workflows will appear in the main workflow menu. To run a workflow, click on its workflow-run Run workflow icon.
Importing a workflow
- Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
- Click on galaxy-upload Import at the top-right of the screen
- Provide your workflow
- Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
- Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
- Click the Import workflow button
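Before uploading a workflow file with Option 2, you may want to confirm that the file really is a Galaxy workflow. The sketch below is an illustration only: the helper name `summarize_workflow` is hypothetical, and it assumes the file is JSON with the `a_galaxy_workflow`, `name`, and `steps` keys that exported `.ga` files commonly contain. The example document is made up.

```python
import json


def summarize_workflow(ga_text):
    """Return the name and number of steps of a .ga workflow document."""
    doc = json.loads(ga_text)
    # Assumption: exported workflows carry the marker "a_galaxy_workflow": "true".
    if str(doc.get("a_galaxy_workflow", "")).lower() != "true":
        raise ValueError("This does not look like a Galaxy workflow file")
    steps = doc.get("steps", {})
    return {"name": doc.get("name", ""), "steps": len(steps)}


# Made-up minimal document standing in for the content of a .ga file.
example = json.dumps({
    "a_galaxy_workflow": "true",
    "format-version": "0.1",
    "name": "Example workflow",
    "steps": {"0": {"type": "data_input"}, "1": {"type": "tool"}},
})
print(summarize_workflow(example))
```

For a real file, read its content first, for example with `open("my_workflow.ga").read()`, and pass the text to the function.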
Importing a workflow using the search
- Click on Workflow in the top menu bar of Galaxy. You will see a list of all your workflows.
- Click on the galaxy-upload Import icon at the top-right of the screen
- On the new page, select the GA4GH servers tab, and configure the GA4GH Tool Registry Server (TRS) Workflow Search interface as follows:
- “TRS Server”: the TRS Server you want to search on (Dockstore or Workflowhub)
- Type in the search query
- Expand the correct workflow by clicking on it
- Select the version you would like to galaxy-upload import
The workflow will be imported to your list of workflows. Note that it will also carry a little green check mark next to its name, which indicates that this is an original workflow version imported from a TRS server. If you ever modify the workflow with Galaxy’s workflow editor, it will lose this indicator.
Importing and Launching a Dockstore Workflow
Hands-on: Importing and Launching a Dockstore Workflow
- Go to Workflow → Import in your Galaxy
- Switch tabs to TRS ID
- Ensure the TRS server is set to “dockstore.org”
- Provide the TRS ID of your Dockstore workflow
Importing and Launching a WorkflowHub.eu Workflow
Hands-on: Importing and Launching a WorkflowHub.eu Workflow
- Go to Workflow → Import in your Galaxy
- Switch tabs to TRS ID
- Ensure the TRS server is set to “workflowhub.eu”
- Provide your workflow hub ID
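The ID entered in the TRS ID tab identifies a workflow on a GA4GH Tool Registry Service (TRS) server. As a rough sketch of how such an ID maps to a web address, the helper below builds the URL of one workflow version. The function name and the example ID are hypothetical, and the code assumes the TRS v2 path layout `tools/{id}/versions/{version_id}` with the base URL supplied by you. Check the registry's own documentation for its actual base URL.

```python
from urllib.parse import quote


def trs_version_url(trs_base_url, tool_id, version_id):
    """Build the URL of one workflow version on a TRS v2 server."""
    # IDs may contain characters such as '#' or '/', so encode them fully.
    return "{}/tools/{}/versions/{}".format(
        trs_base_url.rstrip("/"),
        quote(tool_id, safe=""),
        quote(version_id, safe=""),
    )


# Placeholder base URL and ID, for illustration only.
print(trs_version_url("https://trs.example.org/ga4gh/trs/v2", "#workflow/github.com/example-org/example-repo", "main"))
```

Galaxy builds these addresses for you during import, so this is only useful if you want to look up a workflow version yourself.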
Importing and launching a GTN workflow
Hands-on: Importing and launching a GTN workflow
- Find the material you are interested in
- View its workflows, which can be found in the metadata box at the top of the tutorial
- Click the button on any workflow to run it.
Make a workflow public
- Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows
- Click on the workflow you want to share
- Click on Share
- Click on Make Workflow accessible. This makes the workflow publicly accessible but unlisted.
- To also list the workflow in the Shared Data section (in the top menu bar) of Galaxy, click Make Workflow publicly available in Published Workflows
Opening the workflow editor
- In the top menu bar, click on Workflows
- Click on the name of the workflow you want to edit
- Select galaxy-wf-edit Edit from the dropdown menu to open the workflow in the workflow editor
Renaming workflow outputs
- Open the workflow editor
- Click on the tool in the workflow to get the details of the tool on the right-hand side of the screen.
- Scroll down to the Configure Output section of your desired parameter, and click it to expand it.
- Under Rename dataset, give it a meaningful name
Running a workflow
- Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
- Click on the workflow-run (Run workflow) button next to your workflow
- Configure the workflow as needed
- Click the Run Workflow button at the top-right of the screen
- You may have to refresh your history to see the queued jobs
Setting parameters at run-time
- Open the workflow editor
- Click on the tool in the workflow to get the details of the tool on the right-hand side of the screen.
- Scroll down to the parameter you want users to provide every time they run the workflow
- Click on the arrow workflow-runtime-toggle in front of the parameter name to switch it to set at runtime
Viewing a workflow report
You can find the workflow report from the workflow invocation:
- Go to User on the top menu bar of Galaxy.
- Click on Workflow invocations
- Here you will find a list of all the workflows you have run
- Click on the name of a workflow invocation to expand it
- Click on View Report to go to the workflow report page
- Note: The report can also be downloaded in PDF format by clicking on the galaxy-wf-report-download icon.