Calling very rare variants

Author(s)	Anton Nekrutenko Nick Stoler
Reviewers

Overview
Questions:

What frequency of variants is so low that it is obscured by sequencing error rate?

What are the different types of consensus sequences produced from duplex sequencing?

Objectives:

Processing raw duplex sequencing data into consensus sequences

Find rare variants without relying on diploid assumptions

Requirements:

Introduction to Galaxy Analyses

slides Slides: Quality Control

tutorial Hands-on: Quality Control

slides Slides: Mapping

tutorial Hands-on: Mapping

Time estimation: 3 hours

Supporting Materials:

Datasets

Workflows

FAQs

instances Available on these Galaxies

Known Working

UseGalaxy.org (Main) ✅ ⭐️

Published: Feb 22, 2017

Last modification: Dec 9, 2024

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

purl PURL: https://gxy.io/GTN:T00310

rating Rating: 5.0 (2 recent ratings, 3 all time)

version Revision: 21

This page explains how to perform discovery of low frequency variants from duplex sequencing data. As an example we use the ABL1 dataset published by Schmitt and colleagues (SRA accession SRR1799908).

Agenda

Background

How to use this tutorial

Generating consensus sequences

Getting data in and assessing quality

Processing reads into duplex consensus sequences with Du Novo

Filtering consensuses

Calling variants with duplex consensus sequences

Mapping the reads

Calling the variants

Results

Calling variants with single strand consensus sequences

Re-running analyses with workflows

Conclusion

Background

Finding rare variants

Most popular variant callers focus on the common case of sequencing a diploid individual to find heterozygous and homozygous variants. This is a well-studied problem with its own challenges, but at least you can expect your variants to be present in either 100%, 50%, or 0% of your sample DNA. If you observe a variant present in 99%, 56%, or 2% of the reads at a site, you can probably assume the allele is actually present at 100%, 50%, or 0%, respectively, in your sample.

But in this tutorial, we’re looking for rare variants. So our true frequency might actually be 13%, 1%, or even 0.4%. The challenge then becomes distinguishing these situations from sequencing errors. Next-generation sequencers produce noise at this level, making it challenging to make this distinction in data produced with standard resequencing methods.

Duplex sequencing

Duplex sequencing is a method that addresses the problem of distinguishing sequencing signal from noise. It can increase sequencing accuracy by over four orders of magnitude. Duplex sequencing uses randomly generated oligomers to uniquely tag each fragment in a sample after random shearing. The tagged fragments are then PCR amplified prior to sequencing, so that many reads can be obtained from each original molecule. The tags in each read can then be used to identify which original fragment the read came from. Identifying multiple reads from each fragment allows building a consensus of the original sequence of the fragment, eliminating errors.

The key to duplex sequencing, as opposed to other types of consensus-based methods (review here), is that both ends of the original fragment are tagged such that its strands can be distinguished. Knowing which strand each read comes from allows us to recognize errors even in the first round of PCR.

Processing the raw reads into consensus sequences consists of four main steps:

Group reads by their tags.
Align reads in the same tag group.
Build single-strand consensus sequences (SSCS) of reads coming from the same original strand.
Build duplex consensus sequences (DCS) from pairs of SSCS.

Du Novo is a tool which can carry out these steps. Unlike most other such tools, it can do so without the use of a reference sequence, and it can correct for errors in the tags which can contribute to data loss.

Comment: Terminology

Du Novo processes the tags from each fragment by concatenating them into a single barcode.

For a standard protocol with two 12bp tags, this results in a 24bp barcode which identifies each family.

Schmitt et al. 2012 provides this overview of the whole method:

Figure 1: The logic of duplex sequencing. The computational process is shown in part C.

The value of single-strand consensus sequences

The DCSs have the ultimate accuracy, yet the SSCSs can also be very useful when ampliconic DNA is used as an input to a duplex experiment. Let us illustrate the utility of SSCSs with the following example. Suppose one is interested in quantifying variants in a virus that has a very low titer in body fluids. Since the duplex procedure requires a substantial amount of starting DNA (between between 0.2 and 3 micrograms) the virus needs to be enriched. This can be done, for example, with a PCR designed to amplify the entire genome of the virus. Yet the problem is that during the amplification heterologous strands will almost certainly realign to some extent forming heteroduplex molecules:

Figure 2: Heteroduplex formation in ampliconic templates. Image by Barbara Arbeithuber from Stoler et al. 2016. Here there are two distinct types of viral genomes: carrying A and G. Because the population of genomes is enriched via PCR, heteroduplex formation takes place, skewing frequency estimates performed using DCSs.

In the image above there are two alleles: green (A) and red (G). After PCR a fraction of molecules are in heteroduplex state. If this PCR-derived DNA is now used as the starting material for a DS experiment, the heteroduplex molecules will manifest themselves as having an N base at this site (because Du Novo interprets disagreements as Ns during consensus generation). So, DSCs produced from this dataset will have A, G, and N at the polymorphic site. Yet, SSCSs will only have A and G. Thus SSCS will give a more accurate estimate of the allele frequency at this site in this particular case. In Du Novo SSCSs are generated when the param-check Output single-strand consensus sequences option of tool Du Novo: Make consensus reads tool is set to Yes (see below).

How to use this tutorial

The entire analysis described here is accessible as a Galaxy history that you can copy and play with.

Comment: Running the tools

Leave all parameters on their default settings, unless instructed otherwise.

Comment: Helping Du Novo

But if you’d like to help improve Du Novo, consider checking Yes under param-check Send usage data.

This analysis can be divided into three parts:

Generating consensus sequences
Calling variants with duplex consensus sequences
Calling variants with single strand consensus sequences

Here are the steps, displayed as the Galaxy history you’ll end up with if you follow the instructions:

Note: Galaxy histories show the first step at the bottom!

Figure 3: Analysis outline

Generating consensus sequences

The starting point of the analysis is sequencing reads (in FASTQ format) produced from a duplex sequencing library.

Getting data in and assessing quality

You can obtain the data from Schmitt et al. 2015 from Zenodo, or just import the data from the Galaxy Data library under the “GTN Materials” folder.

Hands On: Importing the raw data
Import the datasets from Zenodo, or from a data library.
https://zenodo.org/record/3554549/files/SRR1799908_forward.fastq
https://zenodo.org/record/3554549/files/SRR1799908_reverse.fastq
Copy the link location

Click galaxy-upload Upload Data at the top of the tool panel

Select galaxy-wf-edit Paste/Fetch Data

Paste the link(s) into the text field

Press Start

Close the window

As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

Go into Libraries (left panel)

Navigate to the correct folder as indicated by your instructor.

On most Galaxies tutorial data will be provided in a folder named GTN - Material –> Topic Name -> Tutorial Name.

Select the desired files

Click on Add to History galaxy-dropdown near the top and select as Datasets from the dropdown menu

In the pop-up window, choose

“Select history”: the history you want to import the data to (or create a new one)

Click on Import
Rename galaxy-pencil the datasets to

SRR1799908_forward

SRR1799908_reverse

Click on the galaxy-pencil pencil icon for the dataset to edit its attributes

In the central panel, change the Name field

Click the Save button

This creates two datasets in our galaxy history: one for forward reads and one for reverse.

We then evaluated the quality of the data by running FastQC on both datasets (forward and reverse):

Hands On: Evaluating input read quality

FastQC ( Galaxy version 0.74+galaxy1) with the following parameters:

param-file Short read data from your current history: One of the raw FASTQ datasets (SRR1799908_forward)

Leave the rest of the options as their defaults.

Repeat with the other FASTQ file (SRR1799908_reverse)

This created two datasets in our galaxy history: one for forward reads and one for reverse. We then evaluated the quality of the data by running FastQC on both datasets (forward and reverse). You can read about using tool FastQC in the dedicated quality-control tutorial.

This gave us the following plots:

Quality scores across all bases: foward.

Quality scores across all bases: reverse.

Figure 4: FastQC assessment of the quality of the raw reads. Left: Forward reads. Right: Reverse reads.

One can see that these data are of excellent quality and no additional processing is required before we can start the actual analysis.

Processing reads into duplex consensus sequences with Du Novo

Now we are ready to collapse the raw reads into duplex consensus sequences.

Sorting reads into families

The tool Du Novo: Make families tool will separate the 12bp tags from each read pair and concatenate them into a 24bp barcode. Then, it will use the barcodes to sort the reads into families that all descend from the same original fragment.

Hands On: Sorting reads into families

Du Novo: Make families ( Galaxy version 3.0.2) with the following parameters:

param-file Sequencing reads, mate 1: The forward raw reads (SRR1799908_forward)

param-file Sequencing reads, mate 2: The reverse raw reads (SRR1799908_reverse)

param-text Tag length: 12

Correcting barcodes

The grouping reads based on barcode relies on exact barcode matches. Any PCR or sequencing error in the barcode sequence will prevent the affected reads from being joined with their other family members.

Du Novo includes a tool which can correct most of these errors and recover the affected reads. This can increase the final yield of duplex consensus reads by up to 11% (Stoler et al. 2018, in preparation).

Hands On: Correcting barcodes

Du Novo: Correct barcodes ( Galaxy version 3.0.2) with the following parameters:

param-file Input reads: The output of tool Make families

param-text Maximum differences: 3

Aligning families

After grouping reads that came from the same original fragment, we need to align them with each other. This next tool will perform a multiple sequence alignment on each family.

Comment: Analysis bottleneck

This is by far the most time-consuming step.

On this dataset, it took 2 hours to complete when run on Galaxy Main.

At the time, Galaxy allocated 6 cores to the job.

Hands On: Aligning families

Du Novo: Align families ( Galaxy version 3.0.2) with the following parameters:

param-file Input reads: The output of tool Correct barcodes

param-select Multiple sequence aligner: Kalign2

Making consensus sequences

Now, we need to collapse the aligned reads into consensus sequences. This next tool will process each group of aligned reads that came from the same single-stranded family into a consensus. Then it will align the consensus sequences from the two strands of each original molecule, and call a consensus between them.

Normally, the tool only produces the final double-stranded consensus sequences. But we will make use of the single-stranded consensus sequences later, so we’ll tell it to keep those as well.

Hands On: Making consensus sequences

Du Novo: Make consensus reads ( Galaxy version 3.0.2) with the following parameters:

param-file Aligned input reads: The output of tool Align families

param-text Minimum reads for a consensus sequence: 3

param-text Consensus % threshold: 0.7

param-select Output format: FASTQ

param-check Output single-strand consensus sequences as well: Yes

This tool will produce 4 output files, in two sets of paired-end FASTQ files:

one pair for the single-stranded consensus sequences (called SSCS mate 1 and 2)

one pair for the double-stranded consensus sequences (called mate 1 and 2).

Comment: Setting output formats

You may have to set the datatype of the outputs from tool Du Novo: Make consensus reads tool.

Versions below 2.16 only set the datatype to fastq, not the more specific fastqsanger. Many tools (like tool Map with BWA-MEM) won’t accept FASTQ input without it specifying what subtype it is.

In your history, click on the pencil icon next to the dataset name.

Click on the Datatypes tab.

In the Change datatype pane, click on the dropdown where it says fastq.

Enter fastqsanger, then click the Change datatype button in the upper right of the pane.

There is no easy way to assign a PHRED score to a consensus base derived from many duplex reads.

So Du Novo does not attempt to give a meaningful score. It assigns the same arbitrary score to all bases.

It produces FASTQ for compatibility, but the output contains no more information than a FASTA file.

You may have noticed the param-text Output PHRED score parameter in the tool Du Novo: Make consensus reads tool. This allows you to specify which score to assign to (all) the bases.

Filtering consensuses

You may have realized that when calling a “consensus” between two sequences, if the two disagree on a base, there’s no way to know which is correct. So in these situations, Du Novo uses the IUPAC ambiguity letter for the two different bases (e.g. W = A or T). Also, when calling single-stranded consensus sequences, if there aren’t enough high-quality bases to call a position (in the above hands-on, we set this threshold to 70%), it gives an N.

This information could be useful for some analyses, but not for our variant calling. The tool tool Sequence Content Trimmer will help with filtering these out. With the settings below, it will move along the read, tracking the frequency of ambiguous (non-ACGT) bases in a 10bp window. If it sees more than 2 ambiguous bases in a window, it will remove the rest of the read, starting with the first offending base in the window. We’ll also tell it to remove entirely any read pair containing a read that got trimmed to less than 50bp.

Hands On: Filtering the consensus sequences

Sequence Content Trimmer ( Galaxy version 0.2.3) with the following parameters:

param-select Paired reads?: Paired

param-file Input reads (mate 1): The double-stranded output of tool Make consensus reads (mate 1)

param-file Input reads (mate 2): The double-stranded output of tool Make consensus reads (mate 2)

param-text Bases to filter on: ACGT

param-text Frequency threshold: 0.2

param-text Size of the window: 10

param-check Invert filter bases: Yes

param-check Set a minimum read length: Yes

param-text Minimum read length: 50

Calling variants with duplex consensus sequences

At this point we have trimmed DCSs. We can now proceed to call variants. This involves aligning the variants against the reference genome, then counting variants.

We’re not specifically interested in the reference sequence, since all we care about is sequence content of the consensus reads. But we’ll be using the reference sequence to figure out where all the reads come from. This lets us stack them on top of each other, with equivalent bases lined up in columns. Then we can step through each column, count how many times we see each base, and and compile a list of variants.

Mapping the reads

Align against the genome with BWA-MEM

Here, we’ll use tool Map with BWA-MEM to map the DCS reads to the human reference genome.

Hands On: Align with BWA-MEM

Map with BWA-MEM ( Galaxy version 0.7.18) with the following parameters:

param-select Using reference genome?: Human (Homo sapiens) (b38): hg38

param-file Select first set of reads: The first output from the tool Sequence Content Trimmer

param-file Select second set of reads: The second output from the tool Sequence Content Trimmer

Left Aligning indels

To normalize the positional distribution of indels we use the tool BamLeftAlign utility from the FreeBayes package. You can find it in the NGS: Variant Analysis section. This is necessary to avoid erroneous polymorphisms flanking regions with indels (e.g., in low complexity loci):

Hands On: Left-align indels

BamLeftAlign ( Galaxy version 1.3.8) with the following parameters:

param-file Select alignment file in BAM format: The output of tool Map with BWA-MEM

param-select Using reference genome: Human (Homo sapiens): hg38

The same genome we aligned to.

Calling the variants

Now we’ll use our aligned consensus reads to find variants.

Normally, in a diploid resequencing experiment, you would call variants relative to the reference. So, you’d report sites which are different from the reference (and whether they’re hetero- or homozygous).

In our case, we’re interested in rare variants. So what we’ll report is the sites where there is more than one allele, and what the frequency is of the less-common allele (the minor allele). This has the potential to include every small sequencing error (even though we’re using duplex, there still are errors). So to reduce the noise, we’ll set a lower threshold at 1% minor allele frequency (MAF).

Finding variants in the alignment

To identify sites containing variants we use the tool Naive Variant Caller (NVC) tool from the NGS: Variant Analysis section. This reads the alignment and counts the number of bases of each type at each site.

Hands On: Count the variants

Naive Variant Caller (NVC) ( Galaxy version 0.0.4) with the following parameters:

param-file BAM file: The output of tool BamLeftAlign

param-select Using reference genome: hg38

The same genome we aligned to.

param-check Insert Restrict to regions: Click to add a region.

param-text Chromosome: chr9

ABL1 is on chr9. Restricting it to this region saves some processing time.

param-text Minimum base quality: 0

In our case, base quality isn’t meaningful, so we set the threshold to 0.

param-text Minimum mapping quality: 20

param-text Ploidy: 1

Ploidy is irrelevant here as it is a mixture of multiple genomes.

The tool Naive Variant Caller (NVC) generates a VCF file that can be viewed at genome browsers such as IGV. Yet one rarely finds variants by looking at genome browsers. We’ll want to use tools to search for variants that fit our criteria.

Finding minor alleles

Now we’ll want to parse the VCF produced by the NVC, determine what the major and minor allele is at each site, and calculate their frequencies. The tool Variant Annotator from the NGS: Variant Analysis section can do this.

Hands On: Read the variants file

Variant Annotator ( Galaxy version 1.3.2) with the following parameters:

param-file Input variants from Naive Variants Detector: The output of tool Naive Variant Caller (NVC)

param-text Minor allele frequency threshold: 0

param-text Coverage threshold: 10

param-check Output stranded base counts: Yes

To be able to filter for strand bias.

Filtering out the noise

Now we have a file containing the base counts for every site covered by at least 10 reads. We’d like to filter through this data to find sites with a reasonable chance of being a real variant, not sequencing error.

The tool Variant Annotator produces a simple tab-delimited file, with one site per line. We can use the tool Filter tool from the Filter and Sort section to process this kind of file. We’ll use the filter c16 >= 0.01 to remove lines where the value in column 16 is less than 0.01. Column 16 contains the minor allele frequency, so this will remove all sites with a MAF less than 1%.

Hands On: Filter the raw variants list

Filter - data on any column using simple expressions with the following parameters:

param-file Filter: The output of tool Variant Annotator

param-text With following condition: c16 >= 0.01

param-check Number of header lines to skip: 1

Results

Now we’re down to just two sites:

Position (chr9)	Major allele	Minor allele	MAF
Column 3	Column 14	Column 15	Column 16
130,872,141	G	A	0.01259
130,880,141	A	G	0.47764

The polymorphism we are interested in (and the one reported by Schmitt et al. 2015) is at the position 130,872,141 and has a frequency of 1.3%. The other site (position 130,880,141) is a known common variant rs2227985, which is heterozygous in this sample.

Calling variants with single strand consensus sequences

To analyze the SSCS data, go back to the Filtering consensuses step, and replace the input to the tool Sequence Content Trimmer with the single-stranded output of tool Make consensus reads instead of the double-stranded output. Then continue through the same steps as before, with the same parameters, to the end.

Then you will have a new set of variants, but produced with the SSCS reads, giving you a higher sensitivity (fewer false negatives), but lower specificity (more false positives).

There’s a shortcut to avoid setting every parameter the second time you run a tool.

In your history, click on an output of the first run to expand it.

Click on the button with the circular “re-run” arrows.

Now the parameters will all be the same as the last run. All you have to do is change the input file(s).

Re-running analyses with workflows

Instead of manually re-running all the tools in the variant calling section, you can use a workflow to automatically run the same tools, but on the SSCS reads. Workflows let you run a chain of tools on different input data with a single click of a button. You can find more information on using workflows in the Galaxy 101 introductory tutorial.

We’ve prepared two workflows which split the above analysis into two steps:

Using Du Novo to create consensus sequences from raw reads.
- This will generate trimmed DCS and SSCS files from raw sequencing data.
- This does not include the FastQC step. You should always run FastQC on your raw reads first, to check the quality of your sequencing run before proceeding with the analysis.
Comment: Helping Du Novo

The param-check Send usage data option is left off in the above workflow. This is because we want to make sure you only share data knowingly.

But again, if you’d like to help improve Du Novo, consider turning it on.
Calling variants from consensus sequences.
- This takes a pair of FASTQ files and calls variants using them.
- If you’d like variants from both DCS and SSCS, you’ll have to run this twice, once on each.
- N.B. Remember that this workflow is designed for the above ABL1 analysis. If you want to use it for any other dataset, you’ll have to change the relevant options.

You can use the variant calling workflow to call variants using the SSCS instead of the DCS.

Conclusion

You should now understand duplex sequencing, rare variants, and be able to process the former to find the latter.

If things don’t work…

…you need to complain. Use Galaxy’s Help Forum to do this.

You've Finished the Tutorial

Key points

Diploid variant calling relies on assumptions that rare variant calling cannot make

Duplex consensus sequences are usually most accurate, but sometimes you must rely on single-strand consensus sequences instead.

Frequently Asked Questions

Have questions about this tutorial? Have a look at the available FAQ pages and support channels

Useful literature

Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.

Feedback

Did you use this material as an instructor? Feel free to give us feedback on how it went.
Did you use this material as a learner or student? Click the form below to leave feedback.

Citing this Tutorial

Anton Nekrutenko, Nick Stoler, Calling very rare variants (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/dunovo/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{variant-analysis-dunovo,
author = "Anton Nekrutenko and Nick Stoler",
	title = "Calling very rare variants (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/dunovo/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut and},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

                   

Congratulations on successfully completing this tutorial!

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.

shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl https://training.galaxyproject.org/training-material/api/topics/variant-analysis/tutorials/dunovo/tutorial.json | jq .admin_install_yaml -r)

Alternatively you can copy and paste the following YAML

---
install_tool_dependencies: true
install_repository_dependencies: true
install_resolver_dependencies: true
tools:
- name: naive_variant_caller
  owner: blankenberg
  revisions: 6be51647d31a
  tool_panel_section_label: Variant Calling
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: bwa
  owner: devteam
  revisions: 3fe632431b68
  tool_panel_section_label: Mapping
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: bwa
  owner: devteam
  revisions: 4a196b9c72c2
  tool_panel_section_label: Mapping
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: fastqc
  owner: devteam
  revisions: 2c64fded1286
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: freebayes
  owner: devteam
  revisions: 156b60c1530f
  tool_panel_section_label: Variant Calling
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: freebayes
  owner: devteam
  revisions: 3e954e7125bf
  tool_panel_section_label: Variant Calling
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: allele_counts
  owner: nick
  revisions: 411adeff1eec
  tool_panel_section_label: Variant Calling
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: allele_counts
  owner: nick
  revisions: cf2af5c3118c
  tool_panel_section_label: Variant Calling
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: dunovo
  owner: nick
  revisions: 9dc43bf7d1db
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: dunovo
  owner: nick
  revisions: 9dc43bf7d1db
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: dunovo
  owner: nick
  revisions: 0f8e0dc73d1d
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: dunovo
  owner: nick
  revisions: 9dc43bf7d1db
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: dunovo
  owner: nick
  revisions: 0f8e0dc73d1d
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: dunovo
  owner: nick
  revisions: 9dc43bf7d1db
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: dunovo
  owner: nick
  revisions: 0f8e0dc73d1d
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: sequence_content_trimmer
  owner: nick
  revisions: 7f170cb06e2e
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: sequence_content_trimmer
  owner: nick
  revisions: 464aee13e2df
  tool_panel_section_label: Du Novo
  tool_shed_url: https://toolshed.g2.bx.psu.edu/

t{ hist[0] | to_stars }} 3

June 2025

5 stars: Liked: Calling rare alleles Disliked: Calculation of B-allele frequency and its use for detecting ancestry

5 stars: Liked: Finding minor alleles Disliked: Begin with Freebayes and end with MAF calculation