Dataset construction for bacterial comparative genomics

Overview
Creative Commons License: CC-BY
Questions:
  • to do

  • to do

Objectives:
  • to do

  • to do

Requirements:
Time estimation: 1 hour
Level: Introductory
Supporting Materials:
Published: Mar 14, 2025
Last modification: Mar 14, 2025
License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
Revision: 1

Introduction

Agenda

In this tutorial, we will cover:

  1. Get genomes from GTDB
  2. Extract IDs for NCBI
  3. Download genomes from NCBI
  4. Conclusion

Get genomes from GTDB

Hands On: Data Upload
  1. Create a new history for this analysis

    To create a new history simply click the new-history icon at the top of the history panel:

    [Figure: UI for creating a new history]

  2. Rename the history

    1. Click on galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”)
    2. Type the new name
    3. Click on Save
    4. To cancel renaming, click the galaxy-undo “Cancel” button

    If you do not have the galaxy-pencil (Edit) next to the history name (which can be the case if you are using an older version of Galaxy) do the following:

    1. Click on Unnamed history (or the current name of the history), which shows the hint “Click to rename history”, at the top of your history panel
    2. Type the new name
    3. Press Enter

  3. Import the contig file from Zenodo or from Galaxy shared data libraries:

    https://zenodo.org/records/1/files/DRR187559_contigs.fasta
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

    As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

    1. Go into Libraries (left panel)
    2. Navigate to the correct folder as indicated by your instructor.
      • On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
    3. Select the desired files
    4. Click on Add to History galaxy-dropdown near the top and select as Datasets from the dropdown menu
    5. In the pop-up window, choose

      • “Select history”: the history you want to import the data to (or create a new one)
    6. Click on Import

Extract IDs for NCBI

Hands On: Task description
  1. Cut with the following parameters:
    • “Cut columns”: c1
    • “Delimited by”: Comma
    • param-file “From”: output (Input dataset)
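
For readers who want to see what this column cut amounts to outside Galaxy, here is a minimal Python sketch. It is not the Cut tool's implementation: the function name `cut_column` and the sample rows are hypothetical, made up purely for illustration.

```python
import csv
import io


def cut_column(text, column, delimiter=","):
    """Return the values found in the 1-based `column` of each delimited row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Rows too short to contain the column are skipped (an assumption of this sketch).
    return [row[column - 1] for row in reader if len(row) >= column]


# Hypothetical comma-delimited table; the values are placeholders, not real records.
sample = "accession,label\nACC_0001,first\nACC_0002,second\n"

# Rough equivalent of "Cut columns": c1 with "Delimited by": Comma.
first_column = cut_column(sample, 1)
print(first_column)
```

Note that the header cell is returned along with the data rows, so a header line in the input carries through to the cut output.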

Hands On: Task description
  1. Cut with the following parameters:
    • “Cut columns”: c6
    • param-file “From”: output (Input dataset)

Hands On: Task description
  1. Concatenate datasets with the following parameters:
    • param-file “Concatenate Dataset”: out_file1 (output of Cut tool)
    • In “Dataset”:
      • param-repeat “Insert Dataset”
        • param-file “Select”: out_file1 (output of Cut tool)
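
As a rough illustration of this step, the sketch below joins two lists of lines head to tail, in the order given. The function name `concatenate` and the sample identifiers are hypothetical; this is an assumption about the intent of the step, not the tool's own code.

```python
def concatenate(*datasets):
    """Join several lists of lines head to tail, preserving the given order."""
    combined = []
    for lines in datasets:
        combined.extend(lines)
    return combined


# Hypothetical outputs of two Cut steps; the values are placeholders.
ids_a = ["header_a", "ACC_0001", "ACC_0002"]
ids_b = ["header_b", "ACC_0002", "ACC_0003"]

merged = concatenate(ids_a, ids_b)
print(merged)
```

Concatenation alone keeps duplicates and any header lines from both inputs.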

Hands On: Task description
  1. Sort with the following parameters:
    • param-file “Sort Dataset”: out_file1 (output of Concatenate datasets tool)
    • “on column”: c1
  2. Unique (Galaxy version 0.3) with the following parameters:
    • param-file “from query”: out_file1 (output of Sort tool)
    • “Advanced Options”: Hide Advanced Options
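
The reason for sorting before de-duplicating can be sketched in a few lines of Python: once identical lines sit next to each other, duplicates can be collapsed in a single pass. This is a minimal sketch under assumptions, not the Galaxy tools themselves. In particular, plain ascending string order is assumed here, whereas the Sort tool's own options may order lines differently. The name `sort_and_unique` and the sample values are hypothetical.

```python
from itertools import groupby


def sort_and_unique(lines):
    """Sort lines, then keep one copy of each run of identical neighbours."""
    ordered = sorted(lines)  # assumption: ascending string order
    # groupby yields one key per run of equal adjacent items.
    return [key for key, _group in groupby(ordered)]


# Hypothetical identifiers with duplicates; the values are placeholders.
merged = ["ACC_0002", "ACC_0001", "ACC_0002", "ACC_0003", "ACC_0001"]
unique_ids = sort_and_unique(merged)
print(unique_ids)
```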

Hands On: Task description
  1. Remove beginning with the following parameters:
    • “Remove first”: 2
    • param-file “from”: outfile (output of Unique tool)
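
In plain Python, dropping the first lines of a dataset is a slice. The sketch below assumes the dataset is held as a list of lines. The helper name `remove_beginning` and the sample lines are hypothetical.

```python
def remove_beginning(lines, count):
    """Return the lines that remain after dropping the first `count` lines."""
    if count < 0:
        raise ValueError("count must be zero or positive")
    return lines[count:]


# Hypothetical list whose first two lines are not identifiers.
unique_lines = ["line_to_drop_1", "line_to_drop_2", "ACC_0001", "ACC_0002"]

# Rough equivalent of "Remove first": 2.
identifiers = remove_beginning(unique_lines, 2)
print(identifiers)
```

It is worth inspecting the first lines of the Unique output before this step, to confirm that the two lines being dropped are the ones you mean to discard.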


Download genomes from NCBI

Hands On: Task description
  1. NCBI Datasets Genomes (Galaxy version 16.42.0+galaxy0) with the following parameters:
    • In “Query”:
      • “Choose how to find genomes to download”: By NCBI assembly or BioProject accession
        • “Enter accession or read from file ?”: Read a list of NCBI Assembly accessions from a dataset
          • param-file “Select dataset with list of NCBI Assembly accessions”: out_file1 (output of Remove beginning tool)
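
Because this step reads one accession per line, it can help to check the list first. The Python sketch below separates lines that look like NCBI Assembly accessions from those that do not. The pattern (GCA_ or GCF_, nine digits, a dot, and a version number) is an assumption of this sketch about the usual accession form, and the function name `split_accessions` and the sample values are hypothetical placeholders, not real assemblies.

```python
import re

# Assumed accession shape: GCA_ or GCF_, nine digits, a dot, a version number.
ACCESSION_PATTERN = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def split_accessions(lines):
    """Split lines into (valid, rejected) lists according to ACCESSION_PATTERN."""
    valid, rejected = [], []
    for line in lines:
        value = line.strip()
        if not value:
            continue  # ignore blank lines
        if ACCESSION_PATTERN.match(value):
            valid.append(value)
        else:
            rejected.append(value)
    return valid, rejected


# Hypothetical list: two well-formed values, a leftover header, a prefixed value.
candidates = ["GCF_000000001.1", "GCA_000000002.3", "accession", "XX_GCF_000000001.1", ""]
valid, rejected = split_accessions(candidates)
print(valid)
print(rejected)
```

Any rejected lines point to leftovers from the earlier steps, such as a header or an unexpected prefix, that are worth cleaning up before downloading.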


Conclusion

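In this tutorial, we cut columns c1 and c6 from the input datasets with Cut, merged the results with Concatenate datasets, removed duplicates with Sort and Unique, dropped the first two lines with Remove beginning, and passed the resulting list of NCBI Assembly accessions to NCBI Datasets Genomes to download the genomes.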