VGP assembly pipeline
Under Development!
This tutorial is not in its final state. The content may change a lot in the next months. Because of this status, it is also not listed in the topic pages.
Overview
Questions
- What combination of tools can produce the highest quality assembly of vertebrate genomes?
- How can we evaluate how good it is?
Objectives
- Learn the tools necessary to perform a de novo assembly of a vertebrate genome
- Evaluate the quality of the assembly
Requirements
- Introduction to Galaxy Analyses
- Sequence analysis
- Quality Control: slides - tutorial hands-on
Time estimation: 2 hours
Supporting Materials
Last modification: Jul 22, 2021
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
Introduction
An assembly can be defined as a hierarchical data structure that maps the sequence data to a putative reconstruction of the target (Miller et al. 2010). Advances in sequencing technologies over the last few decades have revolutionised the field of genomics, reducing both the time and the resources required to carry out de novo genome assembly. Until recently, sequencing technologies produced either short, highly accurate reads or error-prone long reads. In recent years, however, third-generation sequencing technologies, usually known as real-time single-molecule sequencing, have become dominant in de novo assembly of large genomes. These technologies sequence native DNA fragments instead of amplified templates, avoiding copying errors, sequence-dependent biases and information losses (Hon et al. 2020). One example is PacBio single-molecule high-fidelity (HiFi) sequencing, which yields average read lengths of 10-20 kb with an average sequence identity greater than 99%; it is one of the technologies used to generate the data for this tutorial.
Deciphering the structural organisation of complex vertebrate genomes is currently one of the most important problems in genomics (Frenkel et al. 2012). However, despite the great progress made in recent years, a key question remains to be answered: what combination of data and tools can produce the highest quality assembly? To answer this question adequately, it is necessary to analyse two of the main factors that determine the difficulty of genome assembly: repeated sequences and heterozygosity.
Repetitive sequences can be grouped into two categories: interspersed repeats, such as transposable elements (TE), which occur at multiple loci throughout the genome, and tandem repeats (TR), which occur at a single locus (Tørresen et al. 2019). Repetitive sequences, and TE in particular, are an important component of eukaryotic genomes, constituting more than a third of the genome in the case of mammals (Sotero-Caio et al. 2017, Chalopin et al. 2015). In the case of tandem repeats, various estimates suggest that they are present in at least one third of human protein sequences (Marcotte et al. 1999). TE content is probably the main factor contributing to fragmented genomes, especially in the case of large genomes, as it is highly correlated with genome size (Sotero-Caio et al. 2017). TR, on the other hand, usually lead to local genome assembly collapse and partial or complete loss of genes, especially when the read length of the sequencing method is shorter than the TR (Tørresen et al. 2019).
In the case of heterozygosity, haplotype phasing, that is, the identification of alleles that are co-located on the same chromosome, has become a fundamental problem in heterozygous and polyploid genome assemblies (Zhang et al. 2020). A common strategy to overcome these difficulties is to remap genomes to a single haplotype, which represents the whole genome. This approach is useful for highly inbred samples that are nearly homozygous, but when applied to highly heterozygous genomes, such as those of aquatic organisms, it misses potential differences in sequence, structure, and gene presence, usually leading to ambiguities and redundancies in the initial contig-level assemblies (Angel et al. 2018, Zhang et al. 2020).
To address these problems, the G10K consortium launched the Vertebrate Genomes Project (VGP), whose goal is to generate a high-quality, near-error-free, gap-free, chromosome-level, haplotype-phased, annotated reference genome assembly for each of the vertebrate species currently present on planet Earth (Rhie et al. 2021). The protocol proposed in this tutorial, the VGP assembly pipeline, is the result of years of study and analysis of the available tools and data sources.
Agenda
In this tutorial, we will cover:
- VGP assembly pipeline overview
- Hands-on Sections
- Genome profile analysis
- Generation of k-mer spectra with Meryl
- Sub-step with Meryl
- Sub-step with GenomeScope
- Re-arrange
- Sub-step with Parse parameter value
- Sub-step with Cutadapt
- Sub-step with Collapse Collection
- Sub-step with Hifiasm
- Sub-step with GFA to FASTA
- Sub-step with GFA to FASTA
- Sub-step with Meryl
- Sub-step with Quast
- Sub-step with Purge overlaps
- Sub-step with Busco
- Sub-step with Merqury
- Sub-step with Map with minimap2
- Sub-step with Map with minimap2
- Sub-step with Purge overlaps
- Sub-step with Compute
- Sub-step with Compute
- Sub-step with Advanced Cut
- Sub-step with Advanced Cut
- Sub-step with Parse parameter value
- Sub-step with Parse parameter value
- Sub-step with Purge overlaps
- Sub-step with Purge haplotigs
- Sub-step with Purge overlaps
- Sub-step with Merqury
- Sub-step with Bionano Hybrid Scaffold
- Sub-step with Quast
- Sub-step with Busco
- Sub-step with Concatenate datasets
- Sub-step with Concatenate datasets
- Sub-step with Map with minimap2
- Sub-step with Purge overlaps
- Sub-step with Merqury
- Sub-step with Quast
- Sub-step with Busco
- Sub-step with Map with BWA-MEM
- Sub-step with Map with BWA-MEM
- Sub-step with Purge overlaps
- Sub-step with Map with minimap2
- Sub-step with bellerophon
- Sub-step with Purge overlaps
- Sub-step with bedtools BAM to BED
- Sub-step with Purge haplotigs
- Sub-step with Sort
- Sub-step with Purge overlaps
- Sub-step with SALSA
- Sub-step with Quast
- Sub-step with Merqury
- Sub-step with Busco
- Sub-step with Merqury
- Sub-step with Busco
- Sub-step with Quast
- Sub-step with Map with BWA-MEM
- Sub-step with PretextMap
- Sub-step with Pretext Snapshot
- Re-arrange
VGP assembly pipeline overview

Give some background about what the trainees will be doing in the section. Remember that many people reading your materials will likely be novices, so make sure to explain all the relevant concepts.
Title for a subsection
Section and subsection titles will be displayed in the tutorial index on the left side of the page, so try to make them informative and concise!
Hands-on Sections
Below is a series of hands-on boxes, one for each tool in your workflow file. Often you may wish to combine several boxes into one or make other adjustments, such as breaking the tutorial into sections. We encourage you to make such changes as you see fit; this is just a starting point :)
Anywhere you find the word “TODO”, there is something that needs to be changed depending on the specifics of your tutorial.
Have fun!
Get data
hands_on Hands-on: Data upload
- Create a new history for this tutorial
- Import the files from Zenodo or from the shared data library (GTN - Material -> assembly -> VGP assembly pipeline):
TODO: Add the files by the ones on Zenodo here (if not added)
TODO: Remove the useless files (if added)
Tip: Importing via links
- Copy the link location
- Open the Galaxy Upload Manager (galaxy-upload on the top-right of the tool panel)
- Select Paste/Fetch Data
- Paste the link into the text field
- Press Start
- Close the window
- By default, Galaxy uses the URL as the name, so rename the files with a more useful name.
Tip: Importing data from a data library
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
- Go into Shared data (top panel) then Data libraries
- Find the correct folder (ask your instructor)
- Select the desired files
- Click on the To History button near the top and select as Datasets from the dropdown menu
- In the pop-up window, select the history you want to import the files to (or create a new one)
- Click on Import
- Rename the datasets
- Check that the datatype
Tip: Changing the datatype
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, click on the galaxy-chart-select-data Datatypes tab on the top
- Select datatypes
- Click the Change datatype button
- Add to each database a tag corresponding to …
Tip: Adding a tag
- Click on the dataset
- Click on galaxy-tags Edit dataset tags
- Add a tag starting with #
Tags starting with # will be automatically propagated to the outputs of tools using this dataset.
- Check that the tag is appearing below the dataset name
Genome profile analysis
An important step before starting a de novo genome assembly project is to analyse the genome profile. Determining characteristics such as genome size, heterozygosity and repeat content in advance can reveal whether an analysis fails to reflect the full complexity of the genome, for example, if the number of variants is underestimated or a significant fraction of the genome is not assembled (Vurture et al. 2017).
Traditionally, DNA flow cytometry was considered the gold standard for estimating genome size, one of the most important factors in determining the required coverage level. Nowadays, however, experimental methods have been replaced by computational approaches (Wang et al. 2020). One of the most widely used procedures for genome profiling is the analysis of k-mer frequencies. It provides information not only about genomic complexity, such as genome size, levels of heterozygosity and repeat content, but also about data quality. In addition, k-mer spectra analysis can be used in a reference-free manner to assess genome assembly quality metrics (Rhie et al. 2020).
In this tutorial we will use two basic tools to computationally estimate the genome features: Meryl and GenomeScope.
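To make the idea of reading genome features off a k-mer spectrum more concrete, the sketch below estimates genome size by dividing the total number of counted k-mers by the depth of the main peak. This is a deliberately naive approximation and not the model that GenomeScope fits; the function name, the `error_cutoff` threshold and the toy histogram are assumptions made purely for illustration.

```python
def naive_genome_size(histogram, error_cutoff=5):
    """Rough genome size estimate from a k-mer spectrum.

    histogram maps a multiplicity (how often a k-mer was seen) to the
    number of distinct k-mers observed with that multiplicity.
    K-mers below error_cutoff are treated as sequencing errors and
    ignored (hypothetical threshold, chosen for this toy example).
    """
    solid = {m: n for m, n in histogram.items() if m >= error_cutoff}
    if not solid:
        raise ValueError("no k-mers above the error cutoff")
    # Depth of the tallest peak among the retained multiplicities.
    peak_depth = max(solid, key=solid.get)
    # Total number of k-mer occurrences contributed by retained k-mers.
    total_kmers = sum(m * n for m, n in solid.items())
    return total_kmers / peak_depth


# Toy spectrum: an error tail at low multiplicity and a peak at 20x.
toy_histogram = {1: 5000, 2: 800, 19: 100, 20: 300, 21: 100}
print(naive_genome_size(toy_histogram))
```

In a heterozygous diploid genome the tallest peak may correspond to either the heterozygous or the homozygous k-mers, which is one reason a fitted model is preferable to this shortcut.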
Generation of k-mer spectra with Meryl
Meryl will allow us to perform the k-mer profiling by decomposing the sequencing data into substrings of length k (k-mers) and determining their frequencies. The original version was developed for use in the Celera Assembler. Meryl comprises three modules: one for generating k-mer databases, one for filtering and combining databases, and one for searching databases. The k-mer database is stored in sorted order, similar to words in a dictionary (Rhie et al. 2020).
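The decomposition described above can be illustrated with a few lines of Python. This is only a conceptual sketch and not how Meryl is implemented; the function names are hypothetical, and counting canonical k-mers (a k-mer and its reverse complement are treated as the same entry) is an assumption made here.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def spectrum(counts):
    """Histogram: multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())


counts = count_kmers(["ACGTA"], k=3)
for kmer in sorted(counts):  # sorted, like words in a dictionary
    print(kmer, counts[kmer])
```

The `spectrum` helper produces the kind of multiplicity histogram that a profiling tool takes as input.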
comment K-mer size estimation
One important aspect is the size of the k-mer, which must be large enough to map uniquely to the genome, but not so large that it wastes computational resources. Given an estimated genome size (G) and a tolerable collision rate (p), an appropriate k can be computed as k = log4(G(1 − p)/p), where log4 is the base-4 logarithm.
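As a worked example of the formula above, the following sketch evaluates it for a given genome size and collision rate. Rounding up to the next integer, the default collision rate of 0.001 and the 3.1 Gbp genome size are assumptions chosen for illustration; they are not values prescribed by this tutorial.

```python
import math


def best_kmer_size(genome_size, collision_rate=0.001):
    """Smallest integer k with k >= log4(G * (1 - p) / p)."""
    if genome_size <= 0 or not 0 < collision_rate < 1:
        raise ValueError("genome_size must be > 0 and 0 < collision_rate < 1")
    k = math.log(genome_size * (1 - collision_rate) / collision_rate, 4)
    # Rounding up is an assumption: k must be an integer at least this large.
    return math.ceil(k)


# Illustrative input only: a 3.1 Gbp genome with p = 0.001.
print(best_kmer_size(3_100_000_000))
```

Because k grows with the logarithm of G, even large changes in the estimated genome size shift the suggested k by only a few units.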
hands_on Hands-on: Task description
- Meryl Tool: toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy2 with the following parameters:
- “Operation type selector”:
Count operations
- param-collection “Input sequences”:
output (Input dataset collection)
- “K-mer size selector”:
Estimate the best k-mer size
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Meryl
hands_on Hands-on: Task description
- Meryl Tool: toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy1 with the following parameters:
- “Operation type selector”:
Operations on sets of k-mers
- “Operations on sets of k-mers”:
Union-sum: return k-mers that occur in any input, set the count to the sum of the counts
- param-file “Input meryldb”:
read_db (output of Meryl tool)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
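The union-sum operation used in this sub-step can be pictured as merging several k-mer count tables: every k-mer present in any input is kept, and its count becomes the sum of its counts across inputs. The sketch below models each database as a plain dictionary; this is an illustration of the semantics only, and the function name is hypothetical.

```python
from collections import Counter


def union_sum(*databases):
    """Merge k-mer count tables, summing counts of shared k-mers."""
    merged = Counter()
    for database in databases:
        # Counter.update adds counts instead of overwriting them.
        merged.update(database)
    return merged


db_a = {"ACG": 2, "GTA": 1}
db_b = {"ACG": 3, "TTT": 4}
print(union_sum(db_a, db_b))
```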
Sub-step with GenomeScope
hands_on Hands-on: Task description
- GenomeScope Tool: toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0 with the following parameters:
- param-file “Input histogram file”:
read_db_hist (output of Meryl tool)
- “Add the model parameters to your history”:
Yes
- “Output a summary of the analysis”:
Yes
- “K-mer length used to calculate k-mer spectra”:
31
- “Create testing.tsv file with model parameters”:
Yes
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Re-arrange
To create the template, each step of the workflow had its own subsection.
TODO: Re-arrange the generated subsections into sections or other subsections. Consider merging some hands-on boxes to have a meaningful flow of the analyses
Conclusion
Sum up the tutorial and the key takeaways here. We encourage adding an overview image of the pipeline used.
Sub-step with Parse parameter value
hands_on Hands-on: Task description
- Parse parameter value Tool: param_value_from_file with the following parameters:
- param-file “Input file containing parameter to parse out of”:
output (Input dataset)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Cutadapt
hands_on Hands-on: Task description
- Cutadapt Tool: toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/3.4 with the following parameters:
- “Single-end or Paired-end reads?”:
Single-end
- param-collection “FASTQ/A file”:
output (Input dataset collection)
- In “Read 1 Options”:
- In “5’ or 3’ (Anywhere) Adapters”:
- param-repeat “Insert 5’ or 3’ (Anywhere) Adapters”
- “Source”:
Enter custom sequence
- “Enter custom 5’ or 3’ adapter sequence”:
ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT
- param-repeat “Insert 5’ or 3’ (Anywhere) Adapters”
- “Source”:
Enter custom sequence
- “Enter custom 5’ or 3’ adapter sequence”:
ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT
- In “Adapter Options”:
- “Match times”:
3
- “Maximum error rate”:
0.1
- “Minimum overlap length”:
35
- “Look for adapters in the reverse complement”:
True
- In “Filter Options”:
- “Discard Trimmed Reads”:
Yes
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
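The Cutadapt settings above discard any read in which an adapter is found. The sketch below mimics that filtering idea in a simplified form: it looks for exact, full-length adapter matches only, whereas Cutadapt also tolerates mismatches (maximum error rate 0.1) and partial overlaps (minimum 35 bases). The function names are hypothetical. The sketch also shows that the second adapter sequence listed above is the reverse complement of the first.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The two adapter sequences entered in the Cutadapt step above.
ADAPTERS = [
    "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT",
    "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT",
]


def reverse_complement(sequence):
    """Reverse complement of an uppercase DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def keep_read(read, adapters=ADAPTERS):
    """True if the read contains no exact copy of any adapter on either strand."""
    targets = set(adapters) | {reverse_complement(a) for a in adapters}
    return not any(target in read for target in targets)


# The second adapter is the reverse complement of the first.
print(reverse_complement(ADAPTERS[0]) == ADAPTERS[1])

reads = ["ACGT" * 20, "GGG" + ADAPTERS[0] + "TTT"]
print([keep_read(read) for read in reads])
```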
Sub-step with Collapse Collection
hands_on Hands-on: Task description
- Collapse Collection Tool: toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2 with the following parameters:
- param-file “Collection of files to collapse into single dataset”:
out1 (output of Cutadapt tool)
- “Prepend File name”:
Yes
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Hifiasm
hands_on Hands-on: Task description
- Hifiasm Tool: toolshed.g2.bx.psu.edu/repos/bgruening/hifiasm/hifiasm/0.14+galaxy0 with the following parameters:
- “Assembly mode”:
Standard
- param-file “Input reads”:
out1 (output of Cutadapt tool)
- “Advanced options”:
Leave default
- “Assembly options”:
Leave default
- “Options for purging duplicates”:
Specify
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with GFA to FASTA
hands_on Hands-on: Task description
- GFA to FASTA Tool: toolshed.g2.bx.psu.edu/repos/iuc/gfa_to_fa/gfa_to_fa/0.1.2 with the following parameters:
- param-file “Input GFA file”:
primary_contig_graph (output of Hifiasm tool)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
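Conceptually, converting a GFA assembly graph to FASTA amounts to extracting the segment records. In the GFA format, segment lines start with `S`, followed by the segment name and its sequence in tab-separated fields. The sketch below illustrates this; it is not the implementation of the GFA to FASTA tool, and the function name and example contig name are hypothetical.

```python
def gfa_to_fasta(gfa_lines):
    """Build FASTA text from the segment (S) lines of a GFA file."""
    records = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Segment lines: S <name> <sequence> [optional tags...]
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            records.append(f">{fields[1]}\n{fields[2]}\n")
    return "".join(records)


example = [
    "H\tVN:Z:1.0",
    "S\tcontig_1\tACGTACGT\tLN:i:8",
    "L\tcontig_1\t+\tcontig_2\t-\t0M",
    "S\tcontig_2\tGGCC",
]
print(gfa_to_fasta(example), end="")
```

Link (`L`) and header (`H`) lines carry graph topology and metadata, so they are dropped when only the contig sequences are needed.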
Sub-step with GFA to FASTA
hands_on Hands-on: Task description
- GFA to FASTA Tool: toolshed.g2.bx.psu.edu/repos/iuc/gfa_to_fa/gfa_to_fa/0.1.2 with the following parameters:
- param-file “Input GFA file”:
alternate_contig_graph (output of Hifiasm tool)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Meryl
hands_on Hands-on: Task description
- Meryl Tool: toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy0 with the following parameters:
- “Operation type selector”:
Generate histogram dataset
- param-file “Input meryldb”:
read_db (output of Meryl tool)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Quast
hands_on Hands-on: Task description
- Quast Tool: toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1 with the following parameters:
- “Use customized names for the input files?”:
No, use dataset names
- param-file “Contigs/scaffolds file”:
out_fa (output of GFA to FASTA tool)
- “Type of assembly”:
Genome
- “Use a reference genome?”:
No
- In “Genes”:
- “Tool for gene prediction”:
Don't predict genes
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”:
split FASTA file by 'N's
- param-file “Base-level coverage file”:
out_fa (output of GFA to FASTA tool)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
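Splitting a FASTA file by 'N's means cutting each sequence at its runs of N (gap) characters, so that every resulting piece is gap-free. The sketch below shows the idea for a single sequence. The naming of the pieces (`name:start-end`, 1-based) is a hypothetical convention chosen for this example and may differ from what purge_dups writes.

```python
import re


def split_by_n(name, sequence):
    """Split a sequence at runs of N, returning (piece_name, piece_sequence) pairs."""
    pieces = []
    # Each match is a maximal stretch without N or n.
    for match in re.finditer(r"[^Nn]+", sequence):
        piece_name = f"{name}:{match.start() + 1}-{match.end()}"
        pieces.append((piece_name, match.group()))
    return pieces


print(split_by_n("contig_1", "ACGTNNNNGGCC"))
```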
Sub-step with Busco
hands_on Hands-on: Task description
- Busco Tool: toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.0.0+galaxy0 with the following parameters:
- param-file “Sequences to analyse”:
out_fa (output of GFA to FASTA tool)
- “Mode”:
Genome assemblies (DNA)
- “Use Augustus instead of Metaeuk”:
Use Metaeuk
- “Lineage”:
Vertebrata
- In “Advanced Options”:
- “Which outputs should be generated”: ``
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Merqury
hands_on Hands-on: Task description
- Merqury Tool: toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3 with the following parameters:
- “Evaluation mode”:
Default mode
- param-file “K-mer counts database”:
read_db (output of Meryl tool)
- “Number of assemblies”:
One assembly (pseudo-haplotype or mixed-haplotype)
- param-file “Genome assembly”:
out_fa (output of GFA to FASTA tool)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Map with minimap2
hands_on Hands-on: Task description
- Map with minimap2 Tool: toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.17+galaxy4 with the following parameters:
- “Will you select a reference genome from your history or use a built-in index?”:
Use a genome from history and build index
- param-file “Use the following dataset as the reference sequence”:
out_fa (output of GFA to FASTA tool)
- “Single or Paired-end reads”:
Single
- param-file “Select fastq dataset”:
out1 (output of Cutadapt tool)
- “Select a profile of preset options”:
Long assembly to reference mapping (-k19 -w19 -A1 -B19 -O39,81 -E3,1 -s200 -z200 --min-occ-floor=100). Typically, the alignment will not extend to regions with 5% or higher sequence divergence. Only use this preset if the average divergence is far below 5%. (asm5)
- In “Alignment options”:
- “Customize spliced alignment mode?”:
No, use profile setting or leave turned off
- In “Set advanced output options”:
- “Select an output format”:
paf
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
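The mapping step above writes its alignments in PAF, a tab-separated text format whose first twelve columns are mandatory. The sketch below parses those columns into a named record so that downstream steps are easier to reason about. The class, field and function names are hypothetical, and the example line is invented.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int        # number of matching bases in the alignment
    block_length: int   # total alignment block length, including gaps
    mapq: int           # mapping quality


def parse_paf_line(line):
    """Parse the twelve mandatory PAF columns; optional tags are ignored."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("a PAF line needs at least 12 columns")
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


line = "read_1\t1000\t0\t1000\t+\tcontig_1\t5000\t100\t1100\t990\t1000\t60\ttp:A:P"
record = parse_paf_line(line)
print(record.target_name, record.target_start, record.target_end)
```

The target coordinates of each alignment are what a coverage calculation needs: every read contributes depth to the interval between `target_start` and `target_end` on its contig.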
Sub-step with Map with minimap2
hands_on Hands-on: Task description
- Map with minimap2 Tool: toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.17+galaxy4 with the following parameters:
- “Will you select a reference genome from your history or use a built-in index?”:
Use a genome from history and build index
- param-file “Use the following dataset as the reference sequence”:
split_fasta (output of Purge overlaps tool)
- “Single or Paired-end reads”:
Single
- param-file “Select fastq dataset”:
split_fasta (output of Purge overlaps tool)
- “Select a profile of preset options”:
Construct a self-homology map - use same genome as query and reference (-DP -k19 -w19 -m200) (self-homology)
- In “Mapping options”:
- “force minimap2 to always use k-mers occuring this many times or fewer”:
100
- “minimal chaining score (matching bases minus log gap penalty)”:
40
- In “Alignment options”:
- “Customize spliced alignment mode?”:
No, use profile setting or leave turned off
- “Score for a sequence match”:
1
- “Penalty for a mismatch”:
19
- “Gap open penalties for deletions”:
39
- “Gap open penalties for insertions”:
81
- “Gap extension penalties; a gap of size k cost ‘-O + -Ek’. If two numbers are specified, the first is the penalty of extending a deletion and the second for extending an insertion”:
3
- “Gap extension penalty for extending an insertion; if left empty uses the value specified for Gap extension penalties above”:
1
- “Z-drop threshold for truncating an alignment”:
200
- In “Set advanced output options”:
- “Select an output format”:
paf
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”:
create read depth histogram and base-level read depth for pacbio data
- param-file “PAF input file”:
alignment_output (output of Map with minimap2 tool)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Compute
hands_on Hands-on: Task description
- Compute Tool: toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 with the following parameters:
- “Add expression”:
1.5*c3
- param-file “as a new column to”:
model_params (output of GenomeScope tool)
- “Round result?”:
Yes
- “Input has a header line with column names?”:
No
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Compute
hands_on Hands-on: Task description
- Compute Tool: toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 with the following parameters:
- “Add expression”:
3*c7
- param-file “as a new column to”:
out_file1 (output of Compute tool)
- “Round result?”:
Yes
- “Input has a header line with column names?”:
No
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Advanced Cut
hands_on Hands-on: Task description
- Advanced Cut Tool: toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0 with the following parameters:
- param-file “File to cut”:
out_file1 (output of Compute tool)
- “Cut by”:
fields
- “List of Fields”:
c8
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Advanced Cut
hands_on Hands-on: Task description
- Advanced Cut Tool: toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0 with the following parameters:
- param-file “File to cut”:
out_file1 (output of Compute tool)
- “Cut by”:
fields
- “List of Fields”:
c7
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Parse parameter value
hands_on Hands-on: Task description
- Parse parameter value Tool: param_value_from_file with the following parameters:
- param-file “Input file containing parameter to parse out of”:
output (output of Advanced Cut tool)
- “Select type of parameter to parse”:
Integer
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Parse parameter value
hands_on Hands-on: Task description
- Parse parameter value Tool: param_value_from_file with the following parameters:
- param-file “Input file containing parameter to parse out of”:
output (output of Advanced Cut tool)
- “Select type of parameter to parse”:
Integer
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
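Taken together, the Compute, Advanced Cut and Parse parameter value sub-steps derive two integers from the GenomeScope model parameters: a rounded 1.5 × c3, appended as column 7, and a rounded 3 × c7, appended as column 8. The sketch below reproduces that arithmetic on a single tab-separated row. It assumes the input row has six columns (so that the new columns are c7 and c8, as the expressions above imply), and the example row is invented. The tutorial does not state which quantity c3 holds, so the code makes no assumption about it; Python's `round` may also treat exact halves differently from the Galaxy Compute tool.

```python
def derive_cutoffs(row):
    """Return (c7, c8) as computed by the two Compute steps.

    row is one tab-separated line with six columns; c3 is the third one.
    """
    fields = row.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected six columns so that new columns are c7 and c8")
    c3 = float(fields[2])
    c7 = round(1.5 * c3)  # first Compute step: 1.5*c3, rounded
    c8 = round(3 * c7)    # second Compute step: 3*c7, rounded
    return c7, c8


# Invented example row; only the third column matters here.
print(derive_cutoffs("x\t0\t20\t0\t0\t0"))
```

These two integers are presumably the values later supplied to the coverage cutoff calculation as the haploid/diploid transition and the upper bound for read depth.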
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”:
calculate coverage cutoffs
- param-file “STAT input file”:
pbcstat_stat (output of Purge overlaps tool)
- “Transition between haploid and diploid”:
integer_param (output of Parse parameter value tool, workflow step 26)
- “Upper bound for read depth”:
integer_param (output of Parse parameter value tool, workflow step 25)
TODO: Check parameter descriptions
TODO: Consider adding a comment or tip box
comment Comment
A comment about the tool or something else. This box can also be in the main text
TODO: Consider adding a question to test the learners understanding of the previous exercise
question Questions
- Question1?
- Question2?
solution Solution
- Answer for question1
- Answer for question2
Sub-step with Purge haplotigs
hands_on Hands-on: Task description
- Purge haplotigs Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy0 with the following parameters:
- “Select the purge_dups function”:
purge haplotigs and overlaps for an assembly
- param-file “PAF input file”:
alignment_output (output of Map with minimap2 tool)
- param-file “Base-level coverage file”:
pbcstat_cov (output of Purge overlaps tool)
- param-file “Cutoffs file”:
calcuts_tab (output of Purge overlaps tool)
- “Rounds of chaining”:
1 round
TODO: Check parameter descriptions
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”: obtain sequences after purging
- param-file “Fasta input file”: out_fa (output of GFA to FASTA tool)
- param-file “Bed input file”: purge_dups_bed (output of Purge haplotigs tool)
TODO: Check parameter descriptions
Sub-step with Merqury
hands_on Hands-on: Task description
- Merqury Tool: toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3 with the following parameters:
- “Evaluation mode”: Default mode
- param-file “K-mer counts database”: read_db (output of Meryl tool)
- “Number of assemblies”: One assembly (pseudo-haplotype or mixed-haplotype)
- param-file “Genome assembly”: get_seqs_purged (output of Purge overlaps tool)
TODO: Check parameter descriptions
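Among the statistics Merqury reports is a consensus quality value (QV), estimated from k-mers that occur in the assembly but not in the reads (Rhie et al. 2020). The sketch below follows the relationship described in that paper, as we understand it. The function name, the k-mer size and the counts are hypothetical and are not taken from this tutorial's data.

```python
import math


def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Estimate a Phred-scaled consensus quality value from k-mer counts.

    asm_only_kmers: k-mers found in the assembly but absent from the reads.
    total_asm_kmers: all k-mers found in the assembly.
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers were observed
    # A k-mer is read-supported only when all k of its bases are correct,
    # so the per-base probability of being correct is the k-th root.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)


# Hypothetical counts: 21,000 unsupported k-mers out of 1 billion, k = 21.
print(round(kmer_qv(21_000, 1_000_000_000, 21), 1))
```

A higher QV means fewer assembly k-mers lack support in the reads. Merqury itself also reports completeness and spectra plots, which this sketch does not cover.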
Sub-step with Bionano Hybrid Scaffold
hands_on Hands-on: Task description
- Bionano Hybrid Scaffold Tool: toolshed.g2.bx.psu.edu/repos/bgruening/bionano_scaffold/bionano_scaffold/3.6.1+galaxy2 with the following parameters:
- param-file “NGS FASTA”: get_seqs_purged (output of Purge overlaps tool)
- param-file “BioNano CMAP”: output (Input dataset)
- “Configuration mode”: VGP mode
TODO: Check parameter descriptions
Sub-step with Quast
hands_on Hands-on: Task description
- Quast Tool: toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1 with the following parameters:
- “Use customized names for the input files?”: No, use dataset names
- param-file “Contigs/scaffolds file”: get_seqs_purged (output of Purge overlaps tool)
- “Type of assembly”: Genome
- “Use a reference genome?”: No
- In “Genes”:
  - “Tool for gene prediction”: Don't predict genes
TODO: Check parameter descriptions
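Quast summarises contiguity with statistics such as N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal sketch of that definition, not Quast's own code. The function name and the contig lengths are invented for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical contig lengths in bp.
print(n50([80, 70, 50, 40, 30, 20, 10]))
```

Because the same definition applies to contigs and scaffolds, comparing the value before and after a scaffolding step gives a quick impression of how much contiguity was gained.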
Sub-step with Busco
hands_on Hands-on: Task description
- Busco Tool: toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.0.0+galaxy0 with the following parameters:
- param-file “Sequences to analyse”: get_seqs_purged (output of Purge overlaps tool)
- “Mode”: Genome assemblies (DNA)
- “Use Augustus instead of Metaeuk”: Use Metaeuk
- “Lineage”: Vertebrata
- In “Advanced Options”:
  - “Which outputs should be generated”: ``
TODO: Check parameter descriptions
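BUSCO condenses its result into a one-line summary of complete (C), single-copy (S), duplicated (D), fragmented (F) and missing (M) orthologs. When comparing assemblies before and after purging, it can help to pull these numbers out programmatically; a drop in the duplicated fraction is commonly read as a sign that retained haplotigs were removed. The sketch below parses such a line. The function name and the example values are hypothetical.

```python
import re

# Pattern for a summary such as: C:95.0%[S:93.0%,D:2.0%],F:1.5%,M:3.5%,n:3354
SUMMARY = re.compile(
    r"C:(?P<complete>[\d.]+)%\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<total>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages and ortholog count from a BUSCO summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a BUSCO summary line")
    fields = {name: float(value) for name, value in match.groupdict().items()}
    fields["total"] = int(fields["total"])  # n is a count, not a percentage
    return fields


# Hypothetical summary line, not a result from this tutorial.
example = "C:95.0%[S:93.0%,D:2.0%],F:1.5%,M:3.5%,n:3354"
print(parse_busco_summary(example)["duplicated"])
```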
Sub-step with Concatenate datasets
hands_on Hands-on: Task description
- Concatenate datasets Tool: cat1 with the following parameters:
- param-file “Concatenate Dataset”: get_seqs_hap (output of Purge overlaps tool)
- In “Dataset”:
  - param-repeat “Insert Dataset”
    - param-file “Select”: out_fa (output of GFA to FASTA tool)
TODO: Check parameter descriptions
Sub-step with Concatenate datasets
hands_on Hands-on: Task description
- Concatenate datasets Tool: cat1 with the following parameters:
- param-file “Concatenate Dataset”: ngs_contigs_scaffold_trimmed (output of Bionano Hybrid Scaffold tool)
- In “Dataset”:
  - param-repeat “Insert Dataset”
    - param-file “Select”: ngs_contigs_not_scaffolded_trimmed (output of Bionano Hybrid Scaffold tool)
TODO: Check parameter descriptions
Sub-step with Map with minimap2
hands_on Hands-on: Task description
- Map with minimap2 Tool: toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.17+galaxy4 with the following parameters:
- “Will you select a reference genome from your history or use a built-in index?”: Use a genome from history and build index
- param-file “Use the following dataset as the reference sequence”: out_file1 (output of Concatenate datasets tool)
- “Single or Paired-end reads”: Single
- param-file “Select fastq dataset”: out1 (output of Cutadapt tool)
- “Select a profile of preset options”: Long assembly to reference mapping (-k19 -w19 -A1 -B19 -O39,81 -E3,1 -s200 -z200 --min-occ-floor=100). Typically, the alignment will not extend to regions with 5% or higher sequence divergence. Only use this preset if the average divergence is far below 5%. (asm5)
- In “Alignment options”:
  - “Customize spliced alignment mode?”: No, use profile setting or leave turned off
- In “Set advanced output options”:
  - “Select an output format”: paf
TODO: Check parameter descriptions
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”: split FASTA file by 'N's
- param-file “Base-level coverage file”: out_file1 (output of Concatenate datasets tool)
TODO: Check parameter descriptions
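Splitting a FASTA file by 'N's means cutting every scaffold at its gap runs so that only gap-free pieces remain for the self-alignment that follows. A minimal sketch of the idea is shown below. It is not the purge_dups implementation; the function name and the `name:start-end` naming of the pieces are assumptions made for this example.

```python
import re


def split_at_gaps(name, sequence):
    """Split one scaffold into gap-free pieces at every run of N or n.

    Returns (piece_name, piece_sequence) tuples; piece names carry 1-based
    inclusive coordinates on the original scaffold.
    """
    pieces = []
    for match in re.finditer(r"[^Nn]+", sequence):
        start, end = match.start() + 1, match.end()
        pieces.append((f"{name}:{start}-{end}", match.group()))
    return pieces


# Hypothetical scaffold with two gaps.
for piece_name, piece in split_at_gaps("scaffold_1", "ACGTNNNNGGCCNTT"):
    print(piece_name, piece)
```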
Sub-step with Merqury
hands_on Hands-on: Task description
- Merqury Tool: toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3 with the following parameters:
- “Evaluation mode”: Default mode
- param-file “K-mer counts database”: read_db (output of Meryl tool)
- “Number of assemblies”: One assembly (pseudo-haplotype or mixed-haplotype)
- param-file “Genome assembly”: out_file1 (output of Concatenate datasets tool)
TODO: Check parameter descriptions
Sub-step with Quast
hands_on Hands-on: Task description
- Quast Tool: toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1 with the following parameters:
- “Use customized names for the input files?”: No, use dataset names
- param-file “Contigs/scaffolds file”: out_file1 (output of Concatenate datasets tool)
- “Type of assembly”: Genome
- “Use a reference genome?”: No
- In “Genes”:
  - “Tool for gene prediction”: Don't predict genes
TODO: Check parameter descriptions
Sub-step with Busco
hands_on Hands-on: Task description
- Busco Tool: toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.0.0+galaxy0 with the following parameters:
- param-file “Sequences to analyse”: out_file1 (output of Concatenate datasets tool)
- “Mode”: Genome assemblies (DNA)
- “Use Augustus instead of Metaeuk”: Use Metaeuk
- “Lineage”: Vertebrata
- In “Advanced Options”:
  - “Which outputs should be generated”: ``
TODO: Check parameter descriptions
Sub-step with Map with BWA-MEM
hands_on Hands-on: Task description
- Map with BWA-MEM Tool: toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2 with the following parameters:
- “Will you select a reference genome from your history or use a built-in index?”: Use a genome from history and build index
- param-file “Use the following dataset as the reference sequence”: out_file1 (output of Concatenate datasets tool)
- “Single or Paired-end reads”: Single
- param-file “Select fastq dataset”: output (Input dataset)
- “Set read groups information?”: Do not set
- “Select analysis mode”: 1.Simple Illumina mode
- “BAM sorting mode”: Sort by read names (i.e., the QNAME field)
TODO: Check parameter descriptions
Sub-step with Map with BWA-MEM
hands_on Hands-on: Task description
- Map with BWA-MEM Tool: toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2 with the following parameters:
- “Will you select a reference genome from your history or use a built-in index?”: Use a genome from history and build index
- param-file “Use the following dataset as the reference sequence”: out_file1 (output of Concatenate datasets tool)
- “Single or Paired-end reads”: Single
- param-file “Select fastq dataset”: output (Input dataset)
- “Set read groups information?”: Do not set
- “Select analysis mode”: 1.Simple Illumina mode
- “BAM sorting mode”: Sort by read names (i.e., the QNAME field)
TODO: Check parameter descriptions
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”: create read depth histogram and base-level read depth for pacbio data
- param-file “PAF input file”: alignment_output (output of Map with minimap2 tool)
TODO: Check parameter descriptions
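To make the two outputs of this step more concrete, the sketch below derives a base-level read depth and a depth histogram from PAF records. It is an illustration of the concept only, not how purge_dups computes them. It relies on the standard PAF columns 6 to 9 (target name, target length, 0-based target start, exclusive target end); the function name and the two records are hypothetical.

```python
from collections import Counter


def depth_from_paf(paf_lines):
    """Return (base-level depth per target, histogram of depth values)."""
    depth = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        target, length = fields[5], int(fields[6])
        start, end = int(fields[7]), int(fields[8])
        # One counter per target base; fine for a toy example, far too
        # memory-hungry for a real vertebrate genome.
        per_base = depth.setdefault(target, [0] * length)
        for pos in range(start, end):
            per_base[pos] += 1
    histogram = Counter(d for per_base in depth.values() for d in per_base)
    return depth, histogram


# Two hypothetical reads aligned to a 10 bp contig.
paf = [
    "read1\t6\t0\t6\t+\tctg1\t10\t0\t6\t6\t6\t60",
    "read2\t6\t0\t6\t+\tctg1\t10\t4\t10\t6\t6\t60",
]
depth, histogram = depth_from_paf(paf)
print(depth["ctg1"])
print(sorted(histogram.items()))
```

Targets without any alignment never appear in the result of this sketch. The histogram is the kind of summary on which the coverage cutoffs of the next purge_dups step are based.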
Sub-step with Map with minimap2
hands_on Hands-on: Task description
- Map with minimap2 Tool: toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.17+galaxy4 with the following parameters:
- “Will you select a reference genome from your history or use a built-in index?”: Use a genome from history and build index
- param-file “Use the following dataset as the reference sequence”: split_fasta (output of Purge overlaps tool)
- “Single or Paired-end reads”: Single
- param-file “Select fastq dataset”: split_fasta (output of Purge overlaps tool)
- “Select a profile of preset options”: Construct a self-homology map - use same genome as query and reference (-DP -k19 -w19 -m200) (self-homology)
- In “Mapping options”:
  - “force minimap2 to always use k-mers occurring this many times or fewer”: 100
  - “minimal chaining score (matching bases minus log gap penalty)”: 40
- In “Alignment options”:
  - “Customize spliced alignment mode?”: No, use profile setting or leave turned off
  - “Score for a sequence match”: 1
  - “Penalty for a mismatch”: 19
  - “Gap open penalties for deletions”: 39
  - “Gap open penalties for insertions”: 81
  - “Gap extension penalties; a gap of size k cost ‘-O + -Ek’. If two numbers are specified, the first is the penalty of extending a deletion and the second for extending an insertion”: 3
  - “Gap extension penalty for extending an insertion; if left empty uses the value specified for Gap extension penalties above”: 1
  - “Z-drop threshold for truncating an alignment”: 200
- In “Set advanced output options”:
  - “Select an output format”: paf
TODO: Check parameter descriptions
Sub-step with bellerophon
hands_on Hands-on: Task description
bellerophon Tool: toolshed.g2.bx.psu.edu/repos/iuc/bellerophon/bellerophon/1.0+galaxy0 with the following parameters:
TODO: Check parameter descriptions
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”: calculate coverage cutoffs
- param-file “STAT input file”: pbcstat_stat (output of Purge overlaps tool)
- “Transition between haploid and diploid”: 31
- “Upper bound for read depth”: 94
TODO: Check parameter descriptions
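The two values set in this step, 31 for the haploid/diploid transition and 94 for the upper bound, are read-depth thresholds. One simplified way to picture how such thresholds partition sequences is sketched below. This is an illustration only and not the purge_dups algorithm: the function, the labels and the choice of which side of each threshold is inclusive are assumptions made for the example.

```python
def classify_depth(mean_depth, transition=31, upper=94):
    """Label a sequence by its mean read depth relative to two cutoffs."""
    if mean_depth > upper:
        # Deeper than expected for a single-copy diploid region.
        return "high coverage"
    if mean_depth > transition:
        # Reads from both haplotypes pile up on one collapsed sequence.
        return "diploid-level depth"
    # Roughly half depth: the other haplotype is likely assembled elsewhere.
    return "haploid-level depth"


# Hypothetical mean depths for three contigs.
for contig_depth in (20, 60, 120):
    print(contig_depth, classify_depth(contig_depth))
```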
Sub-step with bedtools BAM to BED
hands_on Hands-on: Task description
- bedtools BAM to BED Tool: toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_bamtobed/2.30.0+galaxy1 with the following parameters:
- param-file “Convert the following BAM file to BED”: outfile (output of bellerophon tool)
- “What type of BED output would you like”: Create a full, 12-column "blocked" BED file
TODO: Check parameter descriptions
Sub-step with Purge haplotigs
hands_on Hands-on: Task description
- Purge haplotigs Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy0 with the following parameters:
- “Select the purge_dups function”: purge haplotigs and overlaps for an assembly
- param-file “PAF input file”: alignment_output (output of Map with minimap2 tool)
- param-file “Base-level coverage file”: pbcstat_cov (output of Purge overlaps tool)
- param-file “Cutoffs file”: calcuts_tab (output of Purge overlaps tool)
- “Rounds of chaining”: 1 round
TODO: Check parameter descriptions
Sub-step with Sort
hands_on Hands-on: Task description
- Sort Tool: sort1 with the following parameters:
- param-file “Sort Dataset”: output (output of bedtools BAM to BED tool)
- “on column”: c4
- “with flavor”: Alphabetical sort
- “everything in”: Ascending order
TODO: Check parameter descriptions
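Column 4 of a BED file holds the record name, which for a BED file converted from BAM is the read name. Sorting alphabetically on that column therefore places records that belong to the same read pair next to each other. A minimal sketch of the operation follows; the function name and the four records are hypothetical, and the `/1` and `/2` suffixes are assumed here to mark the two mates.

```python
def sort_bed_by_name(bed_lines):
    """Sort BED records alphabetically (ascending) on column 4, the name."""
    return sorted(bed_lines, key=lambda line: line.split("\t")[3])


# Hypothetical Hi-C alignments converted to BED.
records = [
    "ctg2\t500\t650\tread_B/1\t60\t+",
    "ctg1\t100\t250\tread_A/1\t60\t+",
    "ctg1\t900\t1050\tread_B/2\t60\t-",
    "ctg3\t10\t160\tread_A/2\t60\t-",
]
for line in sort_bed_by_name(records):
    print(line)
```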
Sub-step with Purge overlaps
hands_on Hands-on: Task description
- Purge overlaps Tool: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2 with the following parameters:
- “Select the purge_dups function”: obtain sequences after purging
- param-file “Fasta input file”: out_file1 (output of Concatenate datasets tool)
- param-file “Bed input file”: purge_dups_bed (output of Purge haplotigs tool)
TODO: Check parameter descriptions
Sub-step with SALSA
hands_on Hands-on: Task description
- SALSA Tool: toolshed.g2.bx.psu.edu/repos/iuc/salsa/salsa/2.2+galaxy0 with the following parameters:
- param-file “Initial assembly file”: out_file1 (output of Concatenate datasets tool)
- param-file “Bed alignment”: out_file1 (output of Sort tool)
- “Restriction enzyme sequence(s)”: {'id': 5, 'output_name': 'text_param'}
TODO: Check parameter descriptions
Sub-step with Quast
hands_on Hands-on: Task description
- Quast Tool: toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1 with the following parameters:
- “Use customized names for the input files?”: No, use dataset names
- param-file “Contigs/scaffolds file”: get_seqs_purged (output of Purge overlaps tool)
- “Type of assembly”: Genome
- “Use a reference genome?”: No
- In “Genes”:
  - “Tool for gene prediction”: Don't predict genes
TODO: Check parameter descriptions
Sub-step with Merqury
hands_on Hands-on: Task description
- Merqury Tool: toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3 with the following parameters:
- “Evaluation mode”: Default mode
- param-file “K-mer counts database”: read_db (output of Meryl tool)
- “Number of assemblies”: One assembly (pseudo-haplotype or mixed-haplotype)
- param-file “Genome assembly”: get_seqs_purged (output of Purge overlaps tool)
TODO: Check parameter descriptions
Sub-step with Busco
hands_on Hands-on: Task description
- Busco Tool: toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.0.0+galaxy0 with the following parameters:
- param-file “Sequences to analyse”: get_seqs_purged (output of Purge overlaps tool)
- “Mode”: Genome assemblies (DNA)
- “Use Augustus instead of Metaeuk”: Use Metaeuk
- “Lineage”: Vertebrata
- In “Advanced Options”:
  - “Which outputs should be generated”: ``
TODO: Check parameter descriptions
Sub-step with Merqury
hands_on Hands-on: Task description
- Merqury Tool: toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3 with the following parameters:
- “Evaluation mode”: Default mode
- param-file “K-mer counts database”: read_db (output of Meryl tool)
- “Number of assemblies”: One assembly (pseudo-haplotype or mixed-haplotype)
- param-file “Genome assembly”: scaffolds_fasta (output of SALSA tool)
TODO: Check parameter descriptions
Sub-step with Busco
hands_on Hands-on: Task description
- Busco Tool: toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.0.0+galaxy0 with the following parameters:
- param-file “Sequences to analyse”: scaffolds_fasta (output of SALSA tool)
- “Mode”: Genome assemblies (DNA)
- “Use Augustus instead of Metaeuk”: Use Metaeuk
- “Lineage”: Vertebrata
- In “Advanced Options”:
  - “Which outputs should be generated”: ``
TODO: Check parameter descriptions
Sub-step with Quast
hands_on Hands-on: Task description
- Quast Tool: toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1 with the following parameters:
- “Use customized names for the input files?”: No, use dataset names
- param-file “Contigs/scaffolds file”: scaffolds_fasta (output of SALSA tool)
- “Type of assembly”: Genome
- “Use a reference genome?”: No
- In “Genes”:
  - “Tool for gene prediction”: Don't predict genes
TODO: Check parameter descriptions
Sub-step with Map with BWA-MEM
hands_on Hands-on: Task description
- Map with BWA-MEM Tool: toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2 with the following parameters:
- “Will you select a reference genome from your history or use a built-in index?”: Use a genome from history and build index
- param-file “Use the following dataset as the reference sequence”: scaffolds_fasta (output of SALSA tool)
- “Single or Paired-end reads”: Paired
- param-file “Select first set of reads”: output (Input dataset)
- param-file “Select second set of reads”: output (Input dataset)
- “Set read groups information?”: Do not set
- “Select analysis mode”: 1.Simple Illumina mode
TODO: Check parameter descriptions
Sub-step with PretextMap
hands_on Hands-on: Task description
- PretextMap Tool: toolshed.g2.bx.psu.edu/repos/iuc/pretext_map/pretext_map/0.1.6+galaxy0 with the following parameters:
- param-file “Input dataset in SAM or BAM format”: bam_output (output of Map with BWA-MEM tool)
- “Sort by”: Don't sort
TODO: Check parameter descriptions
Sub-step with Pretext Snapshot
hands_on Hands-on: Task description
- Pretext Snapshot Tool: toolshed.g2.bx.psu.edu/repos/iuc/pretext_snapshot/pretext_snapshot/0.0.3+galaxy0 with the following parameters:
- param-file “Input Pretext map file”: pretext_map_out (output of PretextMap tool)
- “Output image format”: png
- “Show grid?”: Yes
TODO: Check parameter descriptions
Re-arrange
To create the template, each step of the workflow had its own subsection.
TODO: Re-arrange the generated subsections into sections or other subsections. Consider merging some hands-on boxes to have a meaningful flow of the analyses
Conclusion
Sum up the tutorial and the key takeaways here. We encourage adding an overview image of the pipeline used.
Key points
The take-home messages
They will appear at the end of the tutorial
Frequently Asked Questions
Have questions about this tutorial? Check out the FAQ page for the Assembly topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.
References
- Marcotte, E. M., M. Pellegrini, T. O. Yeates, and D. Eisenberg, 1999 A census of protein repeats. Journal of Molecular Biology 293: 151–160. 10.1006/jmbi.1999.3136
- Miller, J. R., S. Koren, and G. Sutton, 2010 Assembly algorithms for next-generation sequencing data. Genomics 95: 315–327. 10.1016/j.ygeno.2010.03.001
- Frenkel, S., V. Kirzhner, and A. Korol, 2012 Organizational Heterogeneity of Vertebrate Genomes (V. Laudet, Ed.). PLoS ONE 7: e32076. 10.1371/journal.pone.0032076
- Chalopin, D., M. Naville, F. Plard, D. Galiana, and J.-N. Volff, 2015 Comparative Analysis of Transposable Elements Highlights Mobilome Diversity and Evolution in Vertebrates. Genome Biology and Evolution 7: 567–580. 10.1093/gbe/evv005
- Sotero-Caio, C. G., R. N. Platt, A. Suh, and D. A. Ray, 2017 Evolution and Diversity of Transposable Elements in Vertebrate Genomes. Genome Biology and Evolution 9: 161–177. 10.1093/gbe/evw264
- Vurture, G. W., F. J. Sedlazeck, M. Nattestad, C. J. Underwood, H. Fang et al., 2017 GenomeScope: fast reference-free genome profiling from short reads (B. Berger, Ed.). Bioinformatics 33: 2202–2204. 10.1093/bioinformatics/btx153
- Angel, V. D. D., E. Hjerde, L. Sterck, S. Capella-Gutierrez, C. Notredame et al., 2018 Ten steps to get started in Genome Assembly and Annotation. F1000Research 7: 148. 10.12688/f1000research.13598.1
- Tørresen, O. K., B. Star, P. Mier, M. A. Andrade-Navarro, A. Bateman et al., 2019 Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Research 47: 10994–11006. 10.1093/nar/gkz841
- Wang, H., B. Liu, Y. Zhang, F. Jiang, Y. Ren et al., 2020 Estimation of genome size using k-mer frequencies from corrected long reads. arXiv preprint arXiv:2003.11817.
- Zhang, X., R. Wu, Y. Wang, J. Yu, and H. Tang, 2020 Unzipping haplotypes in diploid and polyploid genomes. Computational and Structural Biotechnology Journal 18: 66–72. 10.1016/j.csbj.2019.11.011
- Rhie, A., B. P. Walenz, S. Koren, and A. M. Phillippy, 2020 Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21: 10.1186/s13059-020-02134-9
- Hon, T., K. Mars, G. Young, Y.-C. Tsai, J. W. Karalius et al., 2020 Highly accurate long-read HiFi sequencing data for five complex genomes. Scientific Data 7: 10.1038/s41597-020-00743-4
- Rhie, A., S. A. McCarthy, O. Fedrigo, J. Damas, G. Formenti et al., 2021 Towards complete and error-free genome assemblies of all vertebrate species. Nature 592: 737–746. 10.1038/s41586-021-03451-0
Citing this Tutorial
- Delphine Lariviere, Alex Ostrovsky, 2021 VGP assembly pipeline (Galaxy Training Materials). https://training.galaxyproject.org/archive/2021-08-01/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html Online; accessed TODAY
- Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012
details BibTeX
@misc{assembly-vgp_genome_assembly,
  author = "Delphine Lariviere and Alex Ostrovsky",
  title = "VGP assembly pipeline (Galaxy Training Materials)",
  year = "2021",
  month = "07",
  day = "22",
  url = "\url{https://training.galaxyproject.org/archive/2021-08-01/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html}",
  note = "[Online; accessed TODAY]"
}

@article{Batut_2018,
  doi = {10.1016/j.cels.2018.05.012},
  url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
  year = 2018,
  month = {jun},
  publisher = {Elsevier {BV}},
  volume = {6},
  number = {6},
  pages = {752--758.e1},
  author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
  title = {Community-Driven Data Analysis Training for Biology},
  journal = {Cell Systems}
}