NGS_tutorial

introduction-galaxy-intro-ngs-data-managment/ngs-tutorial

Author(s): Anton Nekrutenko, Marius van den Beek, Dave Clements, Daniel Blankenberg, Armin Dadras
Version: 1
Last updated: Apr 14, 2025
License: CC-BY-4.0
Tags: introductions

Features
Tutorial: NGS data logistics

Workflow Testing
Tests: ✅
Results: Not yet automated
PURL: https://gxy.io/GTN:
flowchart TD
  0["ℹ️ Input Dataset\nAccessions"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Dataset\nGenome"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["Download sequencing data"];
  0 -->|output| 2;
  ceeab6ee-59e9-4ea1-8301-29e9763da7e7["Output\nLog file"];
  2 --> ceeab6ee-59e9-4ea1-8301-29e9763da7e7;
  style ceeab6ee-59e9-4ea1-8301-29e9763da7e7 stroke:#2c3143,stroke-width:4px;
  dc27aec8-9751-4877-a761-3e0df6196120["Output\nUnpaired datasets"];
  2 --> dc27aec8-9751-4877-a761-3e0df6196120;
  style dc27aec8-9751-4877-a761-3e0df6196120 stroke:#2c3143,stroke-width:4px;
  c5eb96e5-d151-4c56-9d49-6d38967846ef["Output\nPaired-end datasets"];
  2 --> c5eb96e5-d151-4c56-9d49-6d38967846ef;
  style c5eb96e5-d151-4c56-9d49-6d38967846ef stroke:#2c3143,stroke-width:4px;
  8e89df59-0358-4ed8-9e73-4874fa63da74["Output\nSingle-end datasets"];
  2 --> 8e89df59-0358-4ed8-9e73-4874fa63da74;
  style 8e89df59-0358-4ed8-9e73-4874fa63da74 stroke:#2c3143,stroke-width:4px;
  3["Adapter trimming with fastp"];
  2 -->|list_paired| 3;
  6134ed21-60d2-4463-94ed-90d405fa79c7["Output\nPaired-end Collection"];
  3 --> 6134ed21-60d2-4463-94ed-90d405fa79c7;
  style 6134ed21-60d2-4463-94ed-90d405fa79c7 stroke:#2c3143,stroke-width:4px;
  23007acc-0e39-4c79-a501-f7bfdbe4eca4["Output\nReport in HTML format"];
  3 --> 23007acc-0e39-4c79-a501-f7bfdbe4eca4;
  style 23007acc-0e39-4c79-a501-f7bfdbe4eca4 stroke:#2c3143,stroke-width:4px;
  c0ae08ab-5964-4e24-91ea-2b56592450fd["Output\nReport in JSON format"];
  3 --> c0ae08ab-5964-4e24-91ea-2b56592450fd;
  style c0ae08ab-5964-4e24-91ea-2b56592450fd stroke:#2c3143,stroke-width:4px;
  4["Map sequencing reads to reference genome with BWA-MEM"];
  3 -->|output_paired_coll| 4;
  1 -->|output| 4;
  c5884bbf-7972-402e-9679-bf112e435c5d["Output\nMapping BAM output"];
  4 --> c5884bbf-7972-402e-9679-bf112e435c5d;
  style c5884bbf-7972-402e-9679-bf112e435c5d stroke:#2c3143,stroke-width:4px;
  5["Samtools view"];
  4 -->|bam_output| 5;
  4cd8f23c-ee1e-4587-b880-a016bfda9b9a["Output\nMapping SAM output"];
  5 --> 4cd8f23c-ee1e-4587-b880-a016bfda9b9a;
  style 4cd8f23c-ee1e-4587-b880-a016bfda9b9a stroke:#2c3143,stroke-width:4px;
  6["Removing duplicate sequences originating from library preparation artifacts and sequencing artifacts with MarkDuplicates"];
  5 -->|outputsam| 6;
  8e8357a3-d8e6-4a28-a000-9897240a6c7f["Output\nMarkDuplicates BAM"];
  6 --> 8e8357a3-d8e6-4a28-a000-9897240a6c7f;
  style 8e8357a3-d8e6-4a28-a000-9897240a6c7f stroke:#2c3143,stroke-width:4px;
  1d2eaa9f-336a-4954-82b0-526da01d43eb["Output\nMarkDuplicates Metrics"];
  6 --> 1d2eaa9f-336a-4954-82b0-526da01d43eb;
  style 1d2eaa9f-336a-4954-82b0-526da01d43eb stroke:#2c3143,stroke-width:4px;
  7["Correcting the misalignments around insertions and deletions with Realign reads"];
  6 -->|outFile| 7;
  1 -->|output| 7;
  af463107-789c-49cf-8dfa-37799b4bea9b["Output\nRealigned reads BAM file"];
  7 --> af463107-789c-49cf-8dfa-37799b4bea9b;
  style af463107-789c-49cf-8dfa-37799b4bea9b stroke:#2c3143,stroke-width:4px;
  8["Samtools stats"];
  6 -->|outFile| 8;
  c4dc38a6-eb6b-4a52-b38b-a88a73682928["Output\nStatistics for BAM dataset"];
  8 --> c4dc38a6-eb6b-4a52-b38b-a88a73682928;
  style c4dc38a6-eb6b-4a52-b38b-a88a73682928 stroke:#2c3143,stroke-width:4px;
  9["Adding the indel qualities into our alignment file via Insert indel qualities"];
  7 -->|realigned| 9;
  1 -->|output| 9;
  6dee7ec9-9a2e-48a6-9172-fe5a1bb9f9b0["Output\nRealigned BAM dataset with indel qualities"];
  9 --> 6dee7ec9-9a2e-48a6-9172-fe5a1bb9f9b0;
  style 6dee7ec9-9a2e-48a6-9172-fe5a1bb9f9b0 stroke:#2c3143,stroke-width:4px;
  10["Summarizing the analyses with MultiQC"];
  3 -->|report_json| 10;
  6 -->|metrics_file| 10;
  8 -->|output| 10;
  8c8064a6-11fe-4d20-afa0-39dc8fbaa2b0["Output\nMultiQC Stat table"];
  10 --> 8c8064a6-11fe-4d20-afa0-39dc8fbaa2b0;
  style 8c8064a6-11fe-4d20-afa0-39dc8fbaa2b0 stroke:#2c3143,stroke-width:4px;
  d83d789f-6723-4360-ac69-47a20db5dd15["Output\nMultiQC HTML report"];
  10 --> d83d789f-6723-4360-ac69-47a20db5dd15;
  style d83d789f-6723-4360-ac69-47a20db5dd15 stroke:#2c3143,stroke-width:4px;
  11["Calling the Variants using lofreq Call variants"];
  9 -->|output| 11;
  1 -->|output| 11;
  e6369b39-d95c-4c11-870d-855f833aa2bd["Output\nAll called variants"];
  11 --> e6369b39-d95c-4c11-870d-855f833aa2bd;
  style e6369b39-d95c-4c11-870d-855f833aa2bd stroke:#2c3143,stroke-width:4px;
  12["Annotating the variant effects with SnpEff eff"];
  11 -->|variants| 12;
  bbec359e-3ccb-4173-9fc3-6eff1edb0864["Output\nHTML summary of results "];
  12 --> bbec359e-3ccb-4173-9fc3-6eff1edb0864;
  style bbec359e-3ccb-4173-9fc3-6eff1edb0864 stroke:#2c3143,stroke-width:4px;
  1bab9bf4-8788-4c7a-b12a-11afceb90a54["Output\nVariant dataset with added variant effects"];
  12 --> 1bab9bf4-8788-4c7a-b12a-11afceb90a54;
  style 1bab9bf4-8788-4c7a-b12a-11afceb90a54 stroke:#2c3143,stroke-width:4px;
  13["Creating table of variants using SnpSift Extract Fields"];
  12 -->|snpeff_output| 13;
  4bfe24cc-6e25-47ad-afd1-13b1729fea01["Output\nVariant dataset with added variant effects in tabular format"];
  13 --> 4bfe24cc-6e25-47ad-afd1-13b1729fea01;
  style 4bfe24cc-6e25-47ad-afd1-13b1729fea01 stroke:#2c3143,stroke-width:4px;
  14["Collapsing the data into a single dataset"];
  13 -->|output| 14;
  dda400de-8467-4861-8cf1-78594dce83a3["Output\nSummarized variant analysis result dataset"];
  14 --> dda400de-8467-4861-8cf1-78594dce83a3;
  style dda400de-8467-4861-8cf1-78594dce83a3 stroke:#2c3143,stroke-width:4px;
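
The flowchart above is a directed acyclic graph: each step can only run once the steps feeding it have finished. As a minimal sketch, the numbered edges can be transcribed into a dependency map and a valid run order derived with Python's standard `graphlib` module. The shortened step names and the `execution_order` helper are illustrative assumptions, not part of the workflow itself.

```python
from graphlib import TopologicalSorter

# Short names for the numbered flowchart nodes (0 and 1 are the workflow inputs).
STEPS = {
    0: "Accessions (input)",
    1: "Genome (input)",
    2: "Download sequencing data",
    3: "fastp",
    4: "BWA-MEM",
    5: "Samtools view",
    6: "MarkDuplicates",
    7: "Realign reads",
    8: "Samtools stats",
    9: "Insert indel qualities",
    10: "MultiQC",
    11: "Call variants",
    12: "SnpEff eff",
    13: "SnpSift Extract Fields",
    14: "Collapse Collection",
}

# Each step mapped to the steps whose output it consumes (the flowchart edges).
DEPENDS_ON = {
    2: {0},
    3: {2},
    4: {3, 1},
    5: {4},
    6: {5},
    7: {6, 1},
    8: {6},
    9: {7, 1},
    10: {3, 6, 8},
    11: {9, 1},
    12: {11},
    13: {12},
    14: {13},
}


def execution_order(depends_on):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step in execution_order(DEPENDS_ON):
        print(step, STEPS[step])
```

Several orders are valid, because some steps (for example Samtools stats and Realign reads) depend only on MarkDuplicates and are independent of each other.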

Inputs

Input | Label
Input dataset | Accessions
Input dataset | Genome

Outputs

From | Output | Label
toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1 | Faster Download and Extract Reads in FASTQ | Download sequencing data
toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.24.0+galaxy4 | fastp | Adapter trimming with fastp
toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.19 | Map with BWA-MEM | Map sequencing reads to reference genome with BWA-MEM
toolshed.g2.bx.psu.edu/repos/iuc/samtools_view/samtools_view/1.20+galaxy3 | Samtools view |
toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MarkDuplicates/3.1.1.0 | MarkDuplicates | Removing duplicate sequences originating from library preparation artifacts and sequencing artifacts with MarkDuplicates
toolshed.g2.bx.psu.edu/repos/iuc/lofreq_viterbi/lofreq_viterbi/2.1.5+galaxy0 | Realign reads | Correcting the misalignments around insertions and deletions with Realign reads
toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.5 | Samtools stats |
toolshed.g2.bx.psu.edu/repos/iuc/lofreq_indelqual/lofreq_indelqual/2.1.5+galaxy1 | Insert indel qualities | Adding the indel qualities into our alignment file via Insert indel qualities
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.27+galaxy3 | MultiQC | Summarizing the analyses with MultiQC
toolshed.g2.bx.psu.edu/repos/iuc/lofreq_call/lofreq_call/2.1.5+galaxy3 | Call variants | Calling the Variants using lofreq Call variants
toolshed.g2.bx.psu.edu/repos/iuc/snpeff_sars_cov_2/snpeff_sars_cov_2/4.5covid19 | SnpEff eff: | Annotating the variant effects with SnpEff eff
toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0 | SnpSift Extract Fields | Creating table of variants using SnpSift Extract Fields
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 | Collapse Collection | Collapsing the data into a single dataset

Tools

Tool (each available in the ToolShed)
toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.19
toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MarkDuplicates/3.1.1.0
toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.5
toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.24.0+galaxy4
toolshed.g2.bx.psu.edu/repos/iuc/lofreq_call/lofreq_call/2.1.5+galaxy3
toolshed.g2.bx.psu.edu/repos/iuc/lofreq_indelqual/lofreq_indelqual/2.1.5+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/lofreq_viterbi/lofreq_viterbi/2.1.5+galaxy0
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.27+galaxy3
toolshed.g2.bx.psu.edu/repos/iuc/samtools_view/samtools_view/1.20+galaxy3
toolshed.g2.bx.psu.edu/repos/iuc/snpeff_sars_cov_2/snpeff_sars_cov_2/4.5covid19
toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0
toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0
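
Each tool identifier listed here follows the same layout: ToolShed host, the literal segment `repos`, repository owner, repository name, tool name, and tool version. The sketch below splits an identifier into those parts, which can be handy when checking which repositories and versions a server needs. The `ToolShedTool` type and `parse_tool_id` function are hypothetical helpers written for illustration.

```python
from typing import NamedTuple


class ToolShedTool(NamedTuple):
    tool_shed: str
    owner: str
    repository: str
    tool: str
    version: str


def parse_tool_id(tool_id: str) -> ToolShedTool:
    """Split '<host>/repos/<owner>/<repository>/<tool>/<version>' into its parts."""
    parts = tool_id.split("/")
    if len(parts) != 6 or parts[1] != "repos":
        raise ValueError(f"not a ToolShed tool identifier: {tool_id!r}")
    tool_shed, _, owner, repository, tool, version = parts
    return ToolShedTool(tool_shed, owner, repository, tool, version)


if __name__ == "__main__":
    print(parse_tool_id(
        "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0"
    ))
```

Note that the repository name and the tool name can differ, as with `collapse_collections` and `collapse_dataset`.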

To use this workflow in Galaxy, either click the link to download the workflow file, or right-click the link and copy its URL, which you can then paste into Galaxy's workflow import form.

Importing into Galaxy

Follow these steps to import the workflow directly into the Galaxy server of your choice.
Hands On: Importing a workflow
  1. Click on Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). You will see a list of all your workflows.
  2. Click on Import at the top-right of the screen.
  3. Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  4. Click the Import workflow button
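
The same import can in principle be scripted against the Galaxy API instead of the web form. The sketch below only builds the HTTP request and does not send it. It assumes that the server accepts a `POST` to `/api/workflows` with an `archive_source` field holding the workflow URL and an `x-api-key` header for authentication, so check both against your server's API documentation before relying on them. The function name and the example server URL are hypothetical.

```python
import json
import urllib.request

# Workflow URL as published on this page.
WORKFLOW_URL = (
    "https://training.galaxyproject.org/training-material/topics/introduction/"
    "tutorials/galaxy-intro-ngs-data-managment/workflows/NGS_tutorial.ga"
)


def build_import_request(galaxy_url: str, api_key: str, workflow_url: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Galaxy to import a workflow by URL."""
    # Assumed endpoint and field name; verify against your Galaxy version.
    endpoint = galaxy_url.rstrip("/") + "/api/workflows"
    body = json.dumps({"archive_source": workflow_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


if __name__ == "__main__":
    # Hypothetical server and key, for illustration only.
    req = build_import_request("https://galaxy.example.org/", "API_KEY", WORKFLOW_URL)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```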

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL

Version History

Version | Commit | Time | Comments
1 | ae5f7f4a7 | 2025-04-11 22:34:01 | Updated the workflow and tools. Added history key answers. Added Planemo test files.

For Admins

Installing the workflow tools

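# Download the workflow definition
wget https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-ngs-data-managment/workflows/NGS_tutorial.ga -O workflow.ga
# Extract the list of tools the workflow needs
workflow-to-tools -w workflow.ga -o tools.yaml
# Install those tools (replace GALAXY with your server URL and API_KEY with an admin API key)
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
# Import the workflow and publish it on the server
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows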
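
To preview which ToolShed repositories a workflow file refers to before installing anything, the `.ga` file can be read as JSON. This is a rough illustration of the idea behind the tool-extraction step, not the actual implementation of `workflow-to-tools`. It assumes that each tool step carries a `tool_shed_repository` entry with `name`, `owner`, and `tool_shed` keys, and the miniature workflow used in the example is hypothetical.

```python
import json


def required_repositories(workflow: dict) -> list[dict]:
    """Collect the unique ToolShed repositories referenced by a workflow's steps."""
    seen = set()
    repos = []
    for step in workflow.get("steps", {}).values():
        repo = step.get("tool_shed_repository")
        if not repo:
            continue  # input steps and built-in tools have no repository entry
        key = (repo["tool_shed"], repo["owner"], repo["name"])
        if key not in seen:
            seen.add(key)
            repos.append({"name": repo["name"], "owner": repo["owner"], "tool_shed": repo["tool_shed"]})
    return repos


if __name__ == "__main__":
    # Hypothetical miniature workflow; a real one would come from json.load(open("workflow.ga")).
    example = json.loads("""
    {"steps": {
        "0": {"type": "data_input", "tool_id": null},
        "1": {"type": "tool",
              "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.24.0+galaxy4",
              "tool_shed_repository": {"name": "fastp", "owner": "iuc",
                                       "tool_shed": "toolshed.g2.bx.psu.edu"}}
    }}
    """)
    for repo in required_repositories(example):
        print(repo["owner"], repo["name"], repo["tool_shed"])
```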