Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation


Author(s): Engy Nasr, Bérénice Batut, Paul Zierep
Version: 2
Last updated: Jun 7, 2024
License: MIT
Tags: name:Collection, name:microGalaxy, name:PathoGFAIR, name:IWC

Features
Tutorial: Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition
Workflow Testing
Tests: ❌
Results: Not yet automated
FAIRness PURL: https://gxy.io/GTN:W00144
flowchart TD
  0["ℹ️ Input Collection\namr_identified_by_ncbi"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Collection\nvfs_of_genes_identified_by_vfdb"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["ℹ️ Input Dataset\nmetadata"];
  style 2 stroke:#2c3143,stroke-width:4px;
  3["ℹ️ Input Dataset\nremoved_hosts_percentage_tabular"];
  style 3 stroke:#2c3143,stroke-width:4px;
  4["ℹ️ Input Dataset\nmapping_mean_depth_per_sample"];
  style 4 stroke:#2c3143,stroke-width:4px;
  5["ℹ️ Input Collection\namrs"];
  style 5 stroke:#2c3143,stroke-width:4px;
  6["ℹ️ Input Dataset\nmapping_coverage_percentage_per_sample"];
  style 6 stroke:#2c3143,stroke-width:4px;
  7["ℹ️ Input Dataset\nnumber_of_variants_per_sample"];
  style 7 stroke:#2c3143,stroke-width:4px;
  8["ℹ️ Input Collection\ncontigs"];
  style 8 stroke:#2c3143,stroke-width:4px;
  9["ℹ️ Input Collection\nvfs"];
  style 9 stroke:#2c3143,stroke-width:4px;
  10["Filter failed datasets"];
  0 -->|output| 10;
  11["Filter failed datasets"];
  1 -->|output| 11;
  12["Bar chart"];
  3 -->|output| 12;
  0961166e-38b6-44e3-93db-100d821b0070["Output\nnumber_of_reads_before_host_removal_and_number_of_host_reads_found_per_sample_fig"];
  12 --> 0961166e-38b6-44e3-93db-100d821b0070;
  style 0961166e-38b6-44e3-93db-100d821b0070 stroke:#2c3143,stroke-width:4px;
  13["Bar chart"];
  3 -->|output| 13;
  548975e5-4619-49fc-9e95-4d7f4d761dfd["Output\nremoved_host_percentage_fig"];
  13 --> 548975e5-4619-49fc-9e95-4d7f4d761dfd;
  style 548975e5-4619-49fc-9e95-4d7f4d761dfd stroke:#2c3143,stroke-width:4px;
  14["Bar chart"];
  4 -->|output| 14;
  a71ebb67-1154-4f25-a62d-e8fa2b839e2e["Output\nmapping_mean_depth_per_sample_fig"];
  14 --> a71ebb67-1154-4f25-a62d-e8fa2b839e2e;
  style a71ebb67-1154-4f25-a62d-e8fa2b839e2e stroke:#2c3143,stroke-width:4px;
  15["Filter failed datasets"];
  5 -->|output| 15;
  16["Bar chart"];
  6 -->|output| 16;
  2567360e-d39f-4316-84f5-77aaf8e8198c["Output\nmapping_coverage_percentage_per_sample_fig"];
  16 --> 2567360e-d39f-4316-84f5-77aaf8e8198c;
  style 2567360e-d39f-4316-84f5-77aaf8e8198c stroke:#2c3143,stroke-width:4px;
  17["Bar chart"];
  7 -->|output| 17;
  68c76f06-d2ea-4280-9e6f-a7b4f1568389["Output\nnumber_of_Variants_and_SNPs_indentified_fig"];
  17 --> 68c76f06-d2ea-4280-9e6f-a7b4f1568389;
  style 68c76f06-d2ea-4280-9e6f-a7b4f1568389 stroke:#2c3143,stroke-width:4px;
  18["Filter failed datasets"];
  8 -->|output| 18;
  19["Filter failed datasets"];
  9 -->|output| 19;
  20["Remove beginning"];
  10 -->|output| 20;
  21["Remove beginning"];
  11 -->|output| 21;
  22["Remove beginning"];
  15 -->|output| 22;
  23["Collapse Collection"];
  18 -->|output| 23;
  86910e39-57bf-4a76-ac4c-739340fd2387["Output\nall_samples_contigs_in_one_fasta_file"];
  23 --> 86910e39-57bf-4a76-ac4c-739340fd2387;
  style 86910e39-57bf-4a76-ac4c-739340fd2387 stroke:#2c3143,stroke-width:4px;
  24["Collapse Collection"];
  19 -->|output| 24;
  02b996c6-a912-4e4e-b3ec-49601faaa452["Output\nall_vfs_in_one_tabular"];
  24 --> 02b996c6-a912-4e4e-b3ec-49601faaa452;
  style 02b996c6-a912-4e4e-b3ec-49601faaa452 stroke:#2c3143,stroke-width:4px;
  25["Remove beginning"];
  19 -->|output| 25;
  26["Count"];
  20 -->|out_file1| 26;
  27["Count"];
  21 -->|out_file1| 27;
  28["Group"];
  21 -->|out_file1| 28;
  29["Unique"];
  22 -->|out_file1| 29;
  30["Split by group"];
  24 -->|output| 30;
  59f8cb09-424b-47b1-b94b-a612c2610cab["Output\nsplit_by_group_collection"];
  30 --> 59f8cb09-424b-47b1-b94b-a612c2610cab;
  style 59f8cb09-424b-47b1-b94b-a612c2610cab stroke:#2c3143,stroke-width:4px;
  31["Unique"];
  25 -->|out_file1| 31;
  32["Cut"];
  26 -->|out_file1| 32;
  33["Cut"];
  27 -->|out_file1| 33;
  34["Filter empty datasets"];
  28 -->|out_file1| 34;
  35["Cut"];
  29 -->|outfile| 35;
  36["Cut"];
  30 -->|split_output| 36;
  eeb25a51-ea21-4a19-a196-55d5bd919b10["Output\nadjusted_abricate_vfs_tabular_part1"];
  36 --> eeb25a51-ea21-4a19-a196-55d5bd919b10;
  style eeb25a51-ea21-4a19-a196-55d5bd919b10 stroke:#2c3143,stroke-width:4px;
  37["Cut"];
  31 -->|outfile| 37;
  38["Collapse Collection"];
  32 -->|out_file1| 38;
  39["Collapse Collection"];
  33 -->|out_file1| 39;
  40["Column join"];
  34 -->|output| 40;
  41["bedtools getfasta"];
  23 -->|output| 41;
  35 -->|out_file1| 41;
  42["Remove beginning"];
  36 -->|out_file1| 42;
  aaaa4446-0817-4e5c-aa1b-9ec384f2a363["Output\nadjusted_abricate_vfs_tabular_part2"];
  42 --> aaaa4446-0817-4e5c-aa1b-9ec384f2a363;
  style aaaa4446-0817-4e5c-aa1b-9ec384f2a363 stroke:#2c3143,stroke-width:4px;
  43["bedtools getfasta"];
  23 -->|output| 43;
  37 -->|out_file1| 43;
  44["Column Regex Find And Replace"];
  38 -->|output| 44;
  4809c36b-31ef-4664-8e4e-47f0f72152de["Output\namrs_count"];
  44 --> 4809c36b-31ef-4664-8e4e-47f0f72152de;
  style 4809c36b-31ef-4664-8e4e-47f0f72152de stroke:#2c3143,stroke-width:4px;
  45["Column Regex Find And Replace"];
  39 -->|output| 45;
  87efc81d-4d84-4af3-831f-dfe033c59f78["Output\nvfs_count"];
  45 --> 87efc81d-4d84-4af3-831f-dfe033c59f78;
  style 87efc81d-4d84-4af3-831f-dfe033c59f78 stroke:#2c3143,stroke-width:4px;
  46["Column Regex Find And Replace"];
  40 -->|tabular_output| 46;
  f5c221e3-00ef-4834-9a5f-a94c97fd6764["Output\nheatmap_table"];
  46 --> f5c221e3-00ef-4834-9a5f-a94c97fd6764;
  style f5c221e3-00ef-4834-9a5f-a94c97fd6764 stroke:#2c3143,stroke-width:4px;
  47["Regex Find And Replace"];
  41 -->|output| 47;
  48["bedtools getfasta"];
  23 -->|output| 48;
  42 -->|out_file1| 48;
  82ce2107-89a3-438c-95bb-dc871b5258b7["Output\nfiltered_sequences_with_vfs_fasta"];
  48 --> 82ce2107-89a3-438c-95bb-dc871b5258b7;
  style 82ce2107-89a3-438c-95bb-dc871b5258b7 stroke:#2c3143,stroke-width:4px;
  49["Regex Find And Replace"];
  43 -->|output| 49;
  50["Multi-Join"];
  45 -->|out_file1| 50;
  44 -->|out_file1| 50;
  51["Heatmap w ggplot"];
  46 -->|out_file1| 51;
  c0417c91-a513-4c6a-9a62-3aac2f1f8e85["Output\nheatmap_pdf"];
  51 --> c0417c91-a513-4c6a-9a62-3aac2f1f8e85;
  style c0417c91-a513-4c6a-9a62-3aac2f1f8e85 stroke:#2c3143,stroke-width:4px;
  97816bc2-fd0c-4077-a721-8dd1470879d1["Output\nheatmap_png"];
  51 --> 97816bc2-fd0c-4077-a721-8dd1470879d1;
  style 97816bc2-fd0c-4077-a721-8dd1470879d1 stroke:#2c3143,stroke-width:4px;
  52["Filter empty datasets"];
  47 -->|out_file1| 52;
  53["ClustalW"];
  48 -->|output| 53;
  9b7bd78c-f112-480b-a7df-c10711af254c["Output\nclustalw_on_input_dnd"];
  53 --> 9b7bd78c-f112-480b-a7df-c10711af254c;
  style 9b7bd78c-f112-480b-a7df-c10711af254c stroke:#2c3143,stroke-width:4px;
  6af20322-24da-4036-80de-37bed2d25848["Output\nclustalw_on_input_clustal"];
  53 --> 6af20322-24da-4036-80de-37bed2d25848;
  style 6af20322-24da-4036-80de-37bed2d25848 stroke:#2c3143,stroke-width:4px;
  54["Filter empty datasets"];
  49 -->|out_file1| 54;
  55["Replace Text"];
  50 -->|outfile| 55;
  91745da0-0b8d-4a7a-b927-b36107f17ec5["Output\nvfs_amrs_count_table"];
  55 --> 91745da0-0b8d-4a7a-b927-b36107f17ec5;
  style 91745da0-0b8d-4a7a-b927-b36107f17ec5 stroke:#2c3143,stroke-width:4px;
  56["FASTA-to-Tabular"];
  52 -->|output| 56;
  57["Filter empty datasets"];
  53 -->|output| 57;
  32d00c1d-68c4-4069-8d5d-023aabdfadbe["Output\nfiltered_empty_datasets"];
  57 --> 32d00c1d-68c4-4069-8d5d-023aabdfadbe;
  style 32d00c1d-68c4-4069-8d5d-023aabdfadbe stroke:#2c3143,stroke-width:4px;
  58["FASTA-to-Tabular"];
  54 -->|output| 58;
  59["Cut"];
  56 -->|output| 59;
  60["FASTTREE"];
  57 -->|output| 60;
  aacdfe45-eb0c-4f6e-a479-eeb170774757["Output\nfasttree_nhx"];
  60 --> aacdfe45-eb0c-4f6e-a479-eeb170774757;
  style aacdfe45-eb0c-4f6e-a479-eeb170774757 stroke:#2c3143,stroke-width:4px;
  61["Cut"];
  58 -->|output| 61;
  62["Group"];
  59 -->|out_file1| 62;
  63["Newick Display"];
  60 -->|output| 63;
  0c22178c-dc85-4137-80e2-f3040b92bd20["Output\nnewick_genes_tree_graphs_collection"];
  63 --> 0c22178c-dc85-4137-80e2-f3040b92bd20;
  style 0c22178c-dc85-4137-80e2-f3040b92bd20 stroke:#2c3143,stroke-width:4px;
  64["Group"];
  61 -->|out_file1| 64;
  65["Tabular-to-FASTA"];
  62 -->|out_file1| 65;
  66["Tabular-to-FASTA"];
  64 -->|out_file1| 66;
  67["FASTA Merge Files and Filter Unique Sequences"];
  65 -->|output| 67;
  68["FASTA Merge Files and Filter Unique Sequences"];
  66 -->|output| 68;
  69["ClustalW"];
  67 -->|output| 69;
  70["ClustalW"];
  68 -->|output| 70;
  71["FASTTREE"];
  69 -->|output| 71;
  72["FASTTREE"];
  70 -->|output| 72;
  73["Newick Display"];
  71 -->|output| 73;
  1f9cb2cf-219f-48de-8058-d6d45f3b3158["Output\nall_samples_phylogenetic_tree_based_amrs"];
  73 --> 1f9cb2cf-219f-48de-8058-d6d45f3b3158;
  style 1f9cb2cf-219f-48de-8058-d6d45f3b3158 stroke:#2c3143,stroke-width:4px;
  74["Newick Display"];
  72 -->|output| 74;
  6bb4b32b-7cca-4e04-8120-be9f64ccba39["Output\nall_samples_phylogenetic_tree_based_vfs"];
  74 --> 6bb4b32b-7cca-4e04-8120-be9f64ccba39;
  style 6bb4b32b-7cca-4e04-8120-be9f64ccba39 stroke:#2c3143,stroke-width:4px;

Inputs

Input                     Label
Input dataset collection  amr_identified_by_ncbi
Input dataset collection  vfs_of_genes_identified_by_vfdb
Input dataset             metadata
Input dataset             removed_hosts_percentage_tabular
Input dataset             mapping_mean_depth_per_sample
Input dataset collection  amrs
Input dataset             mapping_coverage_percentage_per_sample
Input dataset             number_of_variants_per_sample
Input dataset collection  contigs
Input dataset collection  vfs

Outputs

From Output Label
barchart_gnuplot Bar chart
barchart_gnuplot Bar chart
barchart_gnuplot Bar chart
barchart_gnuplot Bar chart
barchart_gnuplot Bar chart
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 Collapse Collection
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 Collapse Collection
toolshed.g2.bx.psu.edu/repos/bgruening/split_file_on_column/tp_split_on_column/0.6 Split by group
Cut1 Cut
Remove beginning1 Remove beginning
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3 Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3 Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3 Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_getfastabed/2.30.0+galaxy1 bedtools getfasta
toolshed.g2.bx.psu.edu/repos/iuc/ggplot2_heatmap/ggplot2_heatmap/3.4.0+galaxy0 Heatmap w ggplot
toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1 ClustalW
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/9.3+galaxy1 Replace Text
__FILTER_EMPTY_DATASETS__ Filter empty datasets
toolshed.g2.bx.psu.edu/repos/iuc/fasttree/fasttree/2.1.10+galaxy1 FASTTREE
toolshed.g2.bx.psu.edu/repos/iuc/newick_utils/newick_display/1.6+galaxy1 Newick Display
toolshed.g2.bx.psu.edu/repos/iuc/newick_utils/newick_display/1.6+galaxy1 Newick Display
toolshed.g2.bx.psu.edu/repos/iuc/newick_utils/newick_display/1.6+galaxy1 Newick Display

Tools

Tool
Count1
Cut1
Grouping1
Remove beginning1
__FILTER_EMPTY_DATASETS__
__FILTER_FAILED_DATASETS__
barchart_gnuplot
toolshed.g2.bx.psu.edu/repos/bgruening/split_file_on_column/tp_split_on_column/0.6
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_multijoin_tool/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sorted_uniq/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1
toolshed.g2.bx.psu.edu/repos/devteam/fasta_to_tabular/fasta2tab/1.1.1
toolshed.g2.bx.psu.edu/repos/devteam/tabular_to_fasta/tab2fasta/1.1.1
toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.3
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3
toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_getfastabed/2.30.0+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/collection_column_join/collection_column_join/0.0.3
toolshed.g2.bx.psu.edu/repos/iuc/fasttree/fasttree/2.1.10+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/ggplot2_heatmap/ggplot2_heatmap/3.4.0+galaxy0
toolshed.g2.bx.psu.edu/repos/iuc/newick_utils/newick_display/1.6+galaxy1
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0
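
The tool list above can be cross-checked against a downloaded copy of the workflow. The following is a rough sketch, assuming the .ga file is JSON in which each tool step carries a "tool_id" string (input steps have null there). The function name list_tool_ids is made up for illustration, and a real JSON parser would be more robust than the pattern matching used here.

```shell
# Sketch: print the unique tool IDs referenced in a Galaxy workflow (.ga) file.
# Pattern matching is used only to keep the example free of extra dependencies.
# Argument: path to the workflow file.
list_tool_ids() {
  grep -o '"tool_id": *"[^"]*"' "$1" |
    sed -e 's/^"tool_id": *"//' -e 's/"$//' |
    LC_ALL=C sort -u
}

# Example call (assumes workflow.ga is in the current directory):
# list_tool_ids workflow.ga
```

Steps whose "tool_id" is null are skipped because the pattern only matches quoted values.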

To use this workflow in Galaxy, either download the workflow file and upload it, or right-click the download link and copy its URL to paste into Galaxy's workflow import form.

Importing into Galaxy

The steps below import the workflow directly into the Galaxy server of your choice.
Hands-on: Importing a workflow
  • Click Workflow in the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click Import at the top-right of the screen.
  • Provide your workflow:
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”.
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”.
  • Click the Import workflow button.
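
The import can also be scripted. The following is a minimal sketch of Option 1, assuming the target server exposes the Galaxy REST API at /api/workflows, accepts an "archive_source" URL in the request body, and authenticates through the x-api-key header. The function names, the example server address, and the GALAXY_API_KEY variable are placeholders introduced for illustration and do not come from this page.

```shell
# Sketch: import a workflow by URL through the Galaxy REST API.
# Assumptions: the server accepts {"archive_source": <url>} on POST /api/workflows
# and reads the API key from the x-api-key header. Function names are illustrative.

# Build the JSON request body for a workflow URL.
# The URL must not contain double quotes, since no JSON escaping is done here.
import_payload() {
  printf '{"archive_source": "%s"}' "$1"
}

# POST the payload to the server.
# Arguments: server URL, API key, workflow URL.
import_workflow() {
  curl --fail --silent --show-error \
    --request POST "${1%/}/api/workflows" \
    --header "x-api-key: $2" \
    --header "Content-Type: application/json" \
    --data "$(import_payload "$3")"
}

# Example call, commented out so that sourcing this sketch sends no request:
# import_workflow "https://galaxy.example.org" "$GALAXY_API_KEY" \
#   "https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/pathogen_detection_pathoGFAIR_samples_aggregation_and_visualisation.ga"
```

Keeping the payload builder separate from the network call makes it possible to inspect the request body before anything is sent to a server.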

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL

Version History

Version  Commit     Time                 Comments
3        cdd93376a  2024-06-06 12:00:29  adding tags to some of the workflow outputs, updating the training with the latest PathoGFAIR workflows updates
2        211b69394  2024-05-26 09:45:27  adding workflow reports to the workflows of the training to match the latest version of the IWC PR
1        d320748c5  2024-05-20 18:17:48  Foodborne training update 2024

For Admins

Installing the workflow tools

wget https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/pathogen_detection_pathoGFAIR_samples_aggregation_and_visualisation.ga -O workflow.ga
workflow-to-tools -w workflow.ga -o tools.yaml
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows
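
In the commands above, GALAXY and API_KEY are placeholders for the server URL and an admin API key. The following is a hedged sketch of one way to wrap the three installation commands so that empty settings are caught before anything is installed. The workflow-to-tools, shed-tools, and workflow-install commands are provided by the Ephemeris package; the function names and argument order below are illustrative assumptions.

```shell
# Sketch: wrap the admin commands with basic argument checks.
# Assumes Ephemeris is installed (for example with `pip install ephemeris`)
# and that workflow.ga has already been downloaded.

# Return non-zero, with a message on stderr, when a required value is empty.
# Arguments: human-readable setting name, value to check.
require_setting() {
  if [ -z "$2" ]; then
    echo "missing required setting: $1" >&2
    return 1
  fi
}

# Extract the tool list, install the tools, then install the workflow.
# Arguments: Galaxy URL, admin API key, path to the workflow file.
install_workflow_tools() {
  require_setting "Galaxy URL" "$1" || return 1
  require_setting "API key" "$2" || return 1
  require_setting "workflow file" "$3" || return 1
  workflow-to-tools -w "$3" -o tools.yaml &&
    shed-tools install -g "$1" -a "$2" -t tools.yaml &&
    workflow-install -g "$1" -a "$2" -w "$3" --publish-workflows
}

# Example call (commented out because it would modify the target server):
# install_workflow_tools "https://galaxy.example.org" "$GALAXY_API_KEY" workflow.ga
```

Chaining the commands with && stops the sequence at the first failure, so the workflow is not published when tool installation did not succeed.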