How to use Custom Reference Genomes?
A reference genome contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for a single species. It represents a specific genome assembly build or release.
There are two options for reference genomes in Galaxy:
- Index provided by the server administrators.
  - Found on tool forms in a drop-down menu.
  - A database key is automatically assigned. See tip 1.
  - The database key is what links your data to a FASTA index. For example, it is used with BAM data.
- FASTA file uploaded by users.
  - Selected as an input on tool forms, then indexed at runtime by the tool.
  - An optional custom database key can be created and assigned by the user.
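What "indexed at runtime" produces depends on the tool; aligners, for example, build their own index formats. One common and simple kind is the samtools-style `.fai` index, which records for each sequence its name, length, byte offset of the first base, bases per line, and bytes per line. The sketch below uses a hypothetical `build_fai` function to compute those rows so you can see what such an index contains. It is an illustration only, not what any particular Galaxy tool runs, and it assumes ASCII text, `\n` line endings, and equal line lengths within each record (except the last line).

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int, int, int, int]  # name, length, offset, line_bases, line_width


def build_fai(fasta_text: str) -> List[Row]:
    """Return .fai-style rows for FASTA text (hypothetical helper, ASCII and '\\n' assumed)."""
    rows: List[Row] = []
    offset = 0  # running byte offset; ASCII means one character is one byte
    name: Optional[str] = None
    length = seq_offset = line_bases = line_width = 0
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            if name is not None:
                rows.append((name, length, seq_offset, line_bases, line_width))
            fields = line[1:].split()
            # The indexed name is the title up to the first whitespace.
            name = fields[0] if fields else ""
            length = line_bases = line_width = 0
            seq_offset = offset + len(line)  # sequence starts right after the title line
        else:
            bases = len(line.rstrip("\n"))
            if line_bases == 0:
                # The first sequence line defines the expected line layout.
                line_bases, line_width = bases, len(line)
            length += bases
        offset += len(line)
    if name is not None:
        rows.append((name, length, seq_offset, line_bases, line_width))
    return rows


if __name__ == "__main__":
    for row in build_fai(">chr1 desc\nACGT\nAC\n>chr2\nGGGG\n"):
        print("\t".join(map(str, row)))  # chr1 6 11 4 5, then chr2 4 25 4 5
```

Because an index like this assumes a fixed line layout, a FASTA with uneven line lengths can fail to index, which is one reason to normalize uploaded genomes first.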
There are five basic steps to using a custom reference genome, plus one optional step:
- Obtain a FASTA copy of the target genome. See tip 2.
- Upload the genome to Galaxy and add it as a dataset in your history.
- Clean up the format with the NormalizeFasta tool, using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
- Make sure the chromosome identifiers match those used in your other inputs (for example, "chr1" versus "1").
- On the tool form, set the options to use a custom reference genome from the history, then select the uploaded genome FASTA.
- (Optional) Create a custom genome build (database key) that you can assign to datasets.
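The clean-up step can be illustrated with a small Python sketch of what the two NormalizeFasta options described above do: keep only the part of each title line before the first whitespace, and re-wrap each sequence at a fixed width (80 bases by default here). The functions `parse_fasta` and `normalize_fasta` are hypothetical helpers for illustration, not the tool's implementation; in Galaxy you would run the NormalizeFasta tool itself.

```python
from typing import List, Tuple


def parse_fasta(fasta_text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (title, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, List[str]]] = []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)  # sequence line for the current record
    return [(title, "".join(parts)) for title, parts in records]


def normalize_fasta(fasta_text: str, width: int = 80) -> str:
    """Trim titles at the first whitespace and re-wrap sequences at `width` bases."""
    out: List[str] = []
    for title, seq in parse_fasta(fasta_text):
        fields = title.split()
        out.append(">" + (fields[0] if fields else ""))
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    messy = ">chr1 Homo sapiens chromosome 1\nACGTACGTAC\nGT\n"
    print(normalize_fasta(messy, width=4), end="")
```

Trimming the title matters because many tools treat the whole title line, or only its first word, as the sequence identifier; cutting at the first whitespace makes the identifier unambiguous before you compare it with other inputs.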
TIP 2: When choosing your reference genome, consider choosing your reference annotation at the same time. Standardize the format of both as a preparation step, and put the files in a dedicated “reference data” history for easy reuse.
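Checking that a genome and its annotation agree largely comes down to comparing the sequence names in the FASTA with the first column of a tabular annotation such as GTF, GFF, or BED. A minimal Python sketch follows. The function names are hypothetical, and it assumes tab-separated annotation lines where blank lines, `#` comments, and BED `track`/`browser` lines can be skipped.

```python
from typing import List, Set, Tuple


def fasta_ids(fasta_text: str) -> Set[str]:
    """Sequence names from FASTA title lines, cut at the first whitespace."""
    ids: Set[str] = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def annotation_ids(annotation_text: str) -> Set[str]:
    """First-column identifiers from a tab-separated annotation (GTF/GFF/BED)."""
    ids: Set[str] = set()
    for line in annotation_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and BED header lines
        ids.add(line.split("\t", 1)[0])
    return ids


def report_mismatches(fasta_text: str, annotation_text: str) -> Tuple[List[str], List[str]]:
    """Return (annotation IDs missing from the FASTA, FASTA IDs unused by the annotation)."""
    fa = fasta_ids(fasta_text)
    ann = annotation_ids(annotation_text)
    return sorted(ann - fa), sorted(fa - ann)


if __name__ == "__main__":
    genome = ">chr1\nACGT\n>chr2\nAC\n"
    gtf = '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n'
    print(report_mismatches(genome, gtf))  # (['1'], ['chr1', 'chr2'])
```

The first list is the one to act on: annotation lines on sequences the genome does not contain will usually be ignored or cause errors. FASTA sequences without annotation, such as unplaced scaffolds, are often harmless.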