Preparing genomic data for phylogeny reconstruction (GTN)

ecology-phylogeny-data-prep/main-workflow

Author(s): Miguel Roncoroni
Version: 1
Last updated: Oct 21, 2022
License: CC-BY-4.0
Tags: ecology

Features

Tutorial (hands-on): Preparing genomic data for phylogeny reconstruction

Workflow Testing
Tests: ❌
Results: Not yet automated
FAIRness PURL: https://gxy.io/GTN:W00053
flowchart TD
  0["ℹ️ Input Collection\nInput genomes as collection"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["Replace Text"];
  0 -->|output| 1;
  6a2d3fcd-b557-4440-910d-b4c537feef20["Output\nheaders_shortened"];
  1 --> 6a2d3fcd-b557-4440-910d-b4c537feef20;
  style 6a2d3fcd-b557-4440-910d-b4c537feef20 stroke:#2c3143,stroke-width:4px;
  2["RepeatMasker"];
  1 -->|outfile| 2;
  1db6607a-343a-4a61-9a3b-43101eb8223b["Output\nrepeat_masked"];
  2 --> 1db6607a-343a-4a61-9a3b-43101eb8223b;
  style 1db6607a-343a-4a61-9a3b-43101eb8223b stroke:#2c3143,stroke-width:4px;
  3["Funannotate predict annotation"];
  2 -->|output_masked_genome| 3;
  642c1c09-82f4-4ef4-bd53-d14a583044e2["Output\nfunannotate_predicted_proteins"];
  3 --> 642c1c09-82f4-4ef4-bd53-d14a583044e2;
  style 642c1c09-82f4-4ef4-bd53-d14a583044e2 stroke:#2c3143,stroke-width:4px;
  4["Extract ORF"];
  3 -->|annot_gbk| 4;
  07fb8c04-990e-4bc7-b607-9c4161b4786d["Output\nextracted_ORFs"];
  4 --> 07fb8c04-990e-4bc7-b607-9c4161b4786d;
  style 07fb8c04-990e-4bc7-b607-9c4161b4786d stroke:#2c3143,stroke-width:4px;
  5["Regex Find And Replace"];
  4 -->|aa_output| 5;
  8dc378a8-d485-42df-8322-6cf8230257a0["Output\nsample_names_to_headers"];
  5 --> 8dc378a8-d485-42df-8322-6cf8230257a0;
  style 8dc378a8-d485-42df-8322-6cf8230257a0 stroke:#2c3143,stroke-width:4px;
  6["Collapse Collection"];
  5 -->|out_file1| 6;
  e9a55459-4a2c-4238-8494-e99ec67307ea["Output\nproteomes_to_one_file"];
  6 --> e9a55459-4a2c-4238-8494-e99ec67307ea;
  style e9a55459-4a2c-4238-8494-e99ec67307ea stroke:#2c3143,stroke-width:4px;
  7["Proteinortho"];
  5 -->|out_file1| 7;
  ee688b7a-2a9e-4480-a27a-db8cf795b635["Output\nProteinortho on input dataset(s): orthology-groups"];
  7 --> ee688b7a-2a9e-4480-a27a-db8cf795b635;
  style ee688b7a-2a9e-4480-a27a-db8cf795b635 stroke:#2c3143,stroke-width:4px;
  8["Busco"];
  5 -->|out_file1| 8;
  9["Filter"];
  7 -->|proteinortho| 9;
  10["Proteinortho grab proteins"];
  6 -->|output| 10;
  9 -->|out_file1| 10;
  8625e8b1-e3af-4afa-bf85-1a3258cbbfb2["Output\nProteinortho_extract_by_orthogroup"];
  10 --> 8625e8b1-e3af-4afa-bf85-1a3258cbbfb2;
  style 8625e8b1-e3af-4afa-bf85-1a3258cbbfb2 stroke:#2c3143,stroke-width:4px;
  11["Regex Find And Replace"];
  10 -->|listproteinorthograbproteins| 11;
  b072d32e-f725-4833-af0b-74f4df526d9a["Output\nfasta_header_cleaned"];
  11 --> b072d32e-f725-4833-af0b-74f4df526d9a;
  style b072d32e-f725-4833-af0b-74f4df526d9a stroke:#2c3143,stroke-width:4px;
  12["ClustalW"];
  11 -->|out_file1| 12;
  f704b4b2-5214-4393-8a85-6274bda27c8c["Output\nClustalW on input dataset(s): clustal"];
  12 --> f704b4b2-5214-4393-8a85-6274bda27c8c;
  style f704b4b2-5214-4393-8a85-6274bda27c8c stroke:#2c3143,stroke-width:4px;
  13["ClipKIT. Alignment trimming software for phylogenetics."];
  12 -->|output| 13;
  37092981-191a-4413-8f60-51802dd95f9c["Output\nTrimmed alignment."];
  13 --> 37092981-191a-4413-8f60-51802dd95f9c;
  style 37092981-191a-4413-8f60-51802dd95f9c stroke:#2c3143,stroke-width:4px;
  14["PhyKit - Alignment-based functions"];
  13 -->|trimmed_output| 14;
  09ad25e3-cd68-4fb4-9c57-3e79212b8e01["Output\nConcatenated fasta alignment file"];
  14 --> 09ad25e3-cd68-4fb4-9c57-3e79212b8e01;
  style 09ad25e3-cd68-4fb4-9c57-3e79212b8e01 stroke:#2c3143,stroke-width:4px;
  1d546e4c-7e3c-499d-870c-4846feb7a46d["Output\nA partition file ready for input into RAxML or IQ-tree"];
  14 --> 1d546e4c-7e3c-499d-870c-4846feb7a46d;
  style 1d546e4c-7e3c-499d-870c-4846feb7a46d stroke:#2c3143,stroke-width:4px;
  26846814-d43a-4ce7-9f26-cfb70f184dce["Output\nAn occupancy file that summarizes the taxon occupancy per sequence"];
  14 --> 26846814-d43a-4ce7-9f26-cfb70f184dce;
  style 26846814-d43a-4ce7-9f26-cfb70f184dce stroke:#2c3143,stroke-width:4px;

Inputs

Input                      Label
Input dataset collection   Input genomes as collection

Outputs

From Output Label
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2 Replace Text
toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.2-p1+galaxy0 RepeatMasker
toolshed.g2.bx.psu.edu/repos/iuc/funannotate_predict/funannotate_predict/1.8.9+galaxy2 Funannotate predict annotation
toolshed.g2.bx.psu.edu/repos/bgruening/glimmer_gbk_to_orf/glimmer_gbk_to_orf/3.02 Extract ORF
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.1 Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2 Collapse Collection
toolshed.g2.bx.psu.edu/repos/iuc/proteinortho/proteinortho/6.0.14+galaxy2.9.1 Proteinortho
toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/4.1.4 Busco
Filter1 Filter
toolshed.g2.bx.psu.edu/repos/iuc/proteinortho_grab_proteins/proteinortho_grab_proteins/6.0.14+galaxy2.9.1 Proteinortho grab proteins
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.1 Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1 ClustalW
toolshed.g2.bx.psu.edu/repos/padge/clipkit/clipkit/0.1.0 ClipKIT. Alignment trimming software for phylogenetics.
toolshed.g2.bx.psu.edu/repos/padge/phykit/phykit_alignment_based/0.1.0 PhyKit - Alignment-based functions

Tools

Tool Links
Filter1
toolshed.g2.bx.psu.edu/repos/bgruening/glimmer_gbk_to_orf/glimmer_gbk_to_orf/3.02 View in ToolShed
toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.2-p1+galaxy0 View in ToolShed
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2 View in ToolShed
toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1 View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.1 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/4.1.4 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/funannotate_predict/funannotate_predict/1.8.9+galaxy2 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/proteinortho/proteinortho/6.0.14+galaxy2.9.1 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/proteinortho_grab_proteins/proteinortho_grab_proteins/6.0.14+galaxy2.9.1
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2 View in ToolShed
toolshed.g2.bx.psu.edu/repos/padge/clipkit/clipkit/0.1.0 View in ToolShed
toolshed.g2.bx.psu.edu/repos/padge/phykit/phykit_alignment_based/0.1.0 View in ToolShed

To use this workflow in Galaxy, either click the download link to save the workflow file, or right-click the link and copy its URL, which you can then paste into Galaxy's workflow import form.

Importing into Galaxy

The steps below import the workflow directly into the Galaxy server of your choice.
Hands-on: Importing a workflow
  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on Import at the top-right of the screen
  • Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  • Click the Import workflow button
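
For Option 2, it can help to confirm that the downloaded file really is a Galaxy workflow before uploading it. The following is a minimal sketch, not part of the tutorial: the helper names `is_galaxy_workflow` and `fetch_workflow` are hypothetical, and the check assumes that exported `.ga` files contain an `"a_galaxy_workflow"` key.

```shell
#!/usr/bin/env bash

# Succeed if the file is non-empty and looks like a Galaxy workflow export.
# Assumption: exported .ga files carry an "a_galaxy_workflow" key.
is_galaxy_workflow() {
  [ -s "$1" ] && grep -q '"a_galaxy_workflow"' "$1"
}

# Hypothetical helper: download URL ($1) to DEST ($2), then run the check.
fetch_workflow() {
  wget -q "$1" -O "$2" && is_galaxy_workflow "$2"
}
```

For example, `fetch_workflow https://training.galaxyproject.org/training-material/topics/ecology/tutorials/phylogeny-data-prep/workflows/main_workflow.ga workflow.ga` would save the file as `workflow.ga` and return a non-zero status if the download failed or the file does not look like a workflow (for instance, an HTML error page).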

Video: Importing a workflow from URL (a short demonstration of importing a workflow from GitHub using this procedure)

Version History

Version  Commit     Time                 Comments
2        f7b9464bb  2022-06-29 07:55:27  Update main_workflow.ga
1        d6ad32e26  2022-05-06 14:51:50  create phylogenetic-data-prep training backbone

For Admins

Installing the workflow tools

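# Download the workflow definition from the GTN
wget https://training.galaxyproject.org/training-material/topics/ecology/tutorials/phylogeny-data-prep/workflows/main_workflow.ga -O workflow.ga
# Extract the list of required ToolShed tools into tools.yaml (Ephemeris)
workflow-to-tools -w workflow.ga -o tools.yaml
# Install the tools; replace GALAXY with your server URL and API_KEY with an admin API key
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
# Import the workflow into the same server and publish it
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows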
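
Before installing anything, an admin may want a quick look at which tools the workflow references. The sketch below is a rough preview only, not a replacement for `workflow-to-tools`: the function name `list_workflow_tools` is hypothetical, and the pattern assumes the pretty-printed layout of Galaxy exports, with one `"tool_id"` key per line.

```shell
#!/usr/bin/env bash

# Print the unique tool IDs referenced in a Galaxy workflow file ($1).
# Assumption: each step stores its tool as a quoted "tool_id" value;
# input steps with "tool_id": null are skipped because they have no quoted value.
list_workflow_tools() {
  grep -o '"tool_id": *"[^"]*"' "$1" \
    | sed 's/^"tool_id": *"//; s/"$//' \
    | sort -u
}
```

Running `list_workflow_tools workflow.ga` on the file downloaded above should print one ID per line. Built-in tools such as `Filter1` appear alongside the ToolShed IDs, but only the ToolShed ones need to be installed.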