Group tags for complex experimental designs

Overview
Questions:
  • What are group tags?

  • How can I use group tags to perform multi-factor analyses with collections?

Objectives:
  • Learn how to set group tags

  • Learn how to select group tags in tools

Time estimation: 10 minutes
Published: Mar 6, 2019
Last modification: Nov 3, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00149
Revision: 17

Advanced uses of Galaxy often require the use of dataset collections, which can contain between one and tens of thousands of datasets. Grouping datasets in this way has numerous advantages:

  • It is easy to represent a single collection in the History
  • Dataset names (“Element Identifiers”) are immutable and preserved
  • Collections can be split and nested in arbitrary ways

While collections can be split in any way, doing so for a multi-factor analysis quickly becomes cumbersome and messy. An alternative is to label collection elements with special group tags, i.e. tags prefixed by the string group:. These tags can be displayed in the tool form, allowing users to select subsets of a collection. Note that group tags currently do not propagate, i.e. they are not inherited by datasets produced by an analysis.
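To make the idea concrete, here is a minimal Python sketch of selecting collection elements by group tag. It uses a hypothetical, simplified model (a dictionary from element identifier to its set of tags) rather than Galaxy's internal data structures or API; the helper names group_values, available_groups and select_by_group are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical, simplified model of a tagged list collection:
# element identifier -> tags on that element. Not Galaxy's data model or API.
GROUP_PREFIX = "group:"

collection: Dict[str, Set[str]] = {
    "GSM461176_untreat_single.counts": {"group:untreat", "group:single"},
    "GSM461177_untreat_paired.counts": {"group:untreat", "group:paired"},
    "GSM461179_treat_single.counts": {"group:treat", "group:single"},
    "GSM461180_treat_paired.counts": {"group:treat", "group:paired"},
}


def group_values(tags: Set[str]) -> Set[str]:
    """Group names carried by a set of tags, with the 'group:' prefix removed."""
    return {t[len(GROUP_PREFIX):] for t in tags if t.startswith(GROUP_PREFIX)}


def available_groups(coll: Dict[str, Set[str]]) -> List[str]:
    """All group names present in the collection, sorted (what a tool form could offer)."""
    names: Set[str] = set()
    for tags in coll.values():
        names |= group_values(tags)
    return sorted(names)


def select_by_group(coll: Dict[str, Set[str]], group: str) -> List[str]:
    """Identifiers of elements tagged group:<group>, in collection order."""
    return [ident for ident, tags in coll.items() if group in group_values(tags)]


if __name__ == "__main__":
    print(available_groups(collection))  # ['paired', 'single', 'treat', 'untreat']
    print(select_by_group(collection, "treat"))
    # ['GSM461179_treat_single.counts', 'GSM461180_treat_paired.counts']
```

Matching on the exact group name (after stripping the prefix) matters here: a naive substring test for "treat" would also pick up every "untreat" element.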

This tutorial outlines how to set and use group tags with the DESeq2 tool. For a more detailed description of, and background on, differential expression testing, see the Reference-based RNA-Seq data analysis tutorial.

Agenda

In this tutorial, we will cover:

  1. Setting group tags
    1. Set group tags during upload
    2. Set group tags using the “Tag elements from file” tool
  2. Using group tags in a tool, e.g. DESeq2
  3. Conclusion

Setting group tags

There are several ways to set group tags:

  • Using the Rule Based Uploader
  • Using the “Tag elements from file” tool
  • Using the “Apply Rules” tool
  • Manually adding dataset tags with the prefix group:

We will use the first two methods in this tutorial. The second and third methods can be used at any step of the analysis. Note that the “Apply Rules” tool works (nearly) identically to the Rule Based Uploader.

Set group tags during upload

Hands-on: Set group tags during upload
  1. Create a new history for this tutorial

    Click the new-history icon at the top of the history panel.

    If the new-history icon is missing:

    1. Click on the galaxy-gear icon (History options) at the top of the history panel
    2. Select the option Create New from the menu

  2. Open the Galaxy Upload Manager (galaxy-upload on the top-right of the tool panel)
  3. Click on the Rule-based tab at the top

    Rule-based upload.

    As you can see in this dialog, data can be selected from a history dataset or pasted in directly.

  4. Set Upload data as: to Collection(s)
  5. Paste the following links into the text box

    https://zenodo.org/record/1185122/files/GSM461176_untreat_single.counts
    https://zenodo.org/record/1185122/files/GSM461177_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461178_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461179_treat_single.counts
    https://zenodo.org/record/1185122/files/GSM461180_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461181_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461182_untreat_single.counts
    
  6. Click Build
  7. We will add a regex that creates 3 new columns with accession, treatment and library type:
    • Click on the Column button and then Using a Regular Expression
    • Select Create columns matching expression groups
    • Paste .*(GSM.*)_(.*)_(.*).counts in “Regular Expression”
    • Set “Number of Groups” to 3
    • Click on Apply

      We should now have a table with 4 columns: link, accession, treatment and library type

    • Click on Rules and then Add / Modify Column Definitions
    • Click on Add Definition and select:
      • “URL”: Column A (note that this option is absent when using the “Apply Rules” tool)
      • “List Identifiers”: Column B
      • “Group Tags”: Columns C and D (Select Column C first and then add D by clicking on “… Add another column”)
    • Click Apply
    • Enter a name for the new collection
    • Click Upload
  8. Expand the generated collection and the files in it and check their names and tags

    Group tags in the Galaxy UI are prefixed with group:.
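The regular expression from step 7 can be checked outside Galaxy. The Python sketch below applies the same pattern with Python's re module and prints what the accession and group tag columns should contain. This is an assumption-laden check, not the Rule Builder itself: the Rule Builder does not necessarily use Python's regex engine, although for this simple greedy pattern the three captured groups should be the same. The function name split_url is hypothetical.

```python
import re
from typing import List, Tuple

# Pattern from the tutorial. The unescaped "." before "counts" matches any
# character; it is left as-is to mirror what is pasted into the Rule Builder.
PATTERN = re.compile(r".*(GSM.*)_(.*)_(.*).counts")

BASE = "https://zenodo.org/record/1185122/files/"
URLS: List[str] = [
    BASE + name
    for name in (
        "GSM461176_untreat_single.counts",
        "GSM461177_untreat_paired.counts",
        "GSM461178_untreat_paired.counts",
        "GSM461179_treat_single.counts",
        "GSM461180_treat_paired.counts",
        "GSM461181_treat_paired.counts",
        "GSM461182_untreat_single.counts",
    )
]


def split_url(url: str) -> Tuple[str, str, str, str]:
    """Return (link, accession, treatment, library type), i.e. columns A to D."""
    match = PATTERN.match(url)
    if match is None:
        raise ValueError(f"URL does not match the pattern: {url}")
    accession, treatment, library = match.groups()
    return url, accession, treatment, library


if __name__ == "__main__":
    for _, accession, treatment, library in map(split_url, URLS):
        # Column B becomes the list identifier; columns C and D become group tags.
        print(accession, f"group:{treatment}", f"group:{library}")
```

Because (GSM.*) is greedy but must still be followed by two underscores, it stops at the accession, so the first URL yields GSM461176, untreat and single.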

Set group tags using the “Tag elements from file” tool

We now want to add group tags using the “Tag elements from file” tool.

Hands-on: Upload and create a collection
  1. Create a new history for this tutorial
  2. Import the following files

    https://zenodo.org/record/1185122/files/GSM461176_untreat_single.counts
    https://zenodo.org/record/1185122/files/GSM461177_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461178_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461179_treat_single.counts
    https://zenodo.org/record/1185122/files/GSM461180_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461181_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461182_untreat_single.counts
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  3. Create a Dataset List (Collection) with these 7 files

    • Click on galaxy-selector Select Items at the top of the history panel
    • Check the 7 datasets you’ve just imported
    • Click 7 of N selected and choose Build Dataset List

    • Enter a name for your collection
    • Click Create List to build your collection
    • Click on the checkmark icon at the top of your history again

    Creating a simple collection

We now have a collection with our files. Next, we can either upload a tabular file containing the element identifiers and the tags we want to apply, or extract the element identifiers and derive the tags from them using a regular expression. We will do the latter.

Hands-on: Set group tags using the "Tag elements from file" tool
  1. Extract element identifiers tool
    • param-collection “Dataset collection”: created collection
  2. Replace Text in entire line tool
    • param-file “File to process”: output of Extract element identifiers tool
    • In “Replacement”:
      • In “1: Replacement”
        • “Find pattern”: (.*)_(.*)_(.*).counts
        • “Replace with”: \1_\2_\3.counts\tgroup:\2\tgroup:\3

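    This step appends two tab-separated columns containing the group tags. The first column keeps the element identifier, so that the Tag elements from file tool can match each line to a collection element.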

  3. Change the datatype to tabular

    • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
    • In the central panel, click on the galaxy-gear Convert tab at the top
    • In the lower part galaxy-chart-select-data Datatypes, select tabular
      • Tip: you can start typing the datatype into the field to filter the dropdown menu
    • Click the Save button

  4. Tag elements from file tool
    • param-collection “Input Collection”: created collection
    • param-file “Tag collection elements according to this file”: output of Replace Text tool
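The Replace Text step can be previewed locally. This Python sketch applies the same Find pattern and Replace with expression using re.sub, assuming the element identifiers are the original file names (which is what the Find pattern expects). It then reads the result back the way the tag file is used in this tutorial: identifier in the first column, tags in the remaining columns. The helper names are hypothetical, and Galaxy's Replace Text tool is not necessarily implemented in Python.

```python
import re
from typing import Dict, List

# Find pattern and replacement from the Replace Text step. In a Python
# replacement template, \1 to \3 are backreferences and \t becomes a tab.
FIND = r"(.*)_(.*)_(.*).counts"
REPLACE = r"\1_\2_\3.counts\tgroup:\2\tgroup:\3"

IDENTIFIERS: List[str] = [
    "GSM461176_untreat_single.counts",
    "GSM461177_untreat_paired.counts",
    "GSM461178_untreat_paired.counts",
    "GSM461179_treat_single.counts",
    "GSM461180_treat_paired.counts",
    "GSM461181_treat_paired.counts",
    "GSM461182_untreat_single.counts",
]


def to_tag_line(identifier: str) -> str:
    """Rewrite one identifier into 'identifier<TAB>group:x<TAB>group:y'."""
    return re.sub(FIND, REPLACE, identifier)


def parse_tag_file(lines: List[str]) -> Dict[str, List[str]]:
    """Map each identifier (column 1) to its tags (remaining columns)."""
    tags: Dict[str, List[str]] = {}
    for line in lines:
        identifier, *rest = line.rstrip("\n").split("\t")
        tags[identifier] = rest
    return tags


if __name__ == "__main__":
    lines = [to_tag_line(i) for i in IDENTIFIERS]
    print(lines[0].replace("\t", " | "))
    # GSM461176_untreat_single.counts | group:untreat | group:single
```

Re-appending .counts in the replacement keeps column 1 identical to the element identifier; dropping it would leave lines that no longer match any element.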

You should now have a properly tagged collection of tabular files that can be used in DESeq2.

Using group tags in a tool, e.g. DESeq2

DESeq2 has two modes for specifying factors: one can either select the datasets that correspond to each factor level, or use group tags. We will use the group tags present in our collection to specify the factors.

The tool interface will prompt you with the group tags that are available for your inputs:

Group tags in the tool UI.

Hands-on: Running DESeq2 with group tags
  1. DESeq2 tool with the following parameters:
    • “how”: Select group tags corresponding to levels
      • param-collection “Count file(s) collection”: Generated collection
      • In “Factor”:
        • In “1: Factor”
          • “Specify a factor name”: Treatment
          • In “Factor level”:
            • In “1: Factor level”:
              • “Specify a factor level”: treat
              • “Select groups that correspond to this factor level”: Tags: treat
            • In “2: Factor level”:
              • “Specify a factor level”: untreat
              • “Select groups that correspond to this factor level”: Tags: untreat
        • Click on param-repeat “Insert Factor” (not on “Insert Factor level”)
        • In “2: Factor”
          • “Specify a factor name”: Sequencing
          • In “Factor level”:
            • In “1: Factor level”:
              • “Specify a factor level”: paired
              • “Select groups that correspond to this factor level”: Tags: paired
            • In “2: Factor level”:
              • “Specify a factor level”: single
              • “Select groups that correspond to this factor level”: Tags: single
    • “Files have header?”: No
    • “Output normalized counts table”: Yes
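Conceptually, the factor form turns the group tags into a per-sample design table with one level per factor for each sample. The Python sketch below illustrates that mapping under stated assumptions: it uses a hypothetical dictionary of samples (identifiers shortened to the accessions) and their group names, mirrors the two factors configured above, and rejects samples that match zero or several levels of a factor. It is not how the Galaxy DESeq2 wrapper is implemented; design_table is a hypothetical helper.

```python
from typing import Dict, Set

# Hypothetical input: accession -> group names (without the "group:" prefix),
# derived from the file names used in this tutorial.
SAMPLE_GROUPS: Dict[str, Set[str]] = {
    "GSM461176": {"untreat", "single"},
    "GSM461177": {"untreat", "paired"},
    "GSM461178": {"untreat", "paired"},
    "GSM461179": {"treat", "single"},
    "GSM461180": {"treat", "paired"},
    "GSM461181": {"treat", "paired"},
    "GSM461182": {"untreat", "single"},
}

# Factor name -> {factor level -> groups selected for that level},
# mirroring the DESeq2 form filled in above.
FACTORS: Dict[str, Dict[str, Set[str]]] = {
    "Treatment": {"treat": {"treat"}, "untreat": {"untreat"}},
    "Sequencing": {"paired": {"paired"}, "single": {"single"}},
}


def design_table(
    samples: Dict[str, Set[str]], factors: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, Dict[str, str]]:
    """Assign every sample exactly one level for every factor."""
    table: Dict[str, Dict[str, str]] = {}
    for sample, groups in samples.items():
        row: Dict[str, str] = {}
        for factor, levels in factors.items():
            # A level applies if any of its selected groups is on the sample.
            matched = [level for level, selected in levels.items() if selected & groups]
            if len(matched) != 1:
                raise ValueError(f"{sample}: expected one level of {factor}, got {matched}")
            row[factor] = matched[0]
        table[sample] = row
    return table


if __name__ == "__main__":
    for sample, row in design_table(SAMPLE_GROUPS, FACTORS).items():
        print(sample, row["Treatment"], row["Sequencing"])
```

Keeping level names separate from the selected groups mirrors the form, where a single factor level may combine several group tags.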

Conclusion

Group tags let us select subsets of a collection directly in the tool form, so a multi-factor analysis such as DESeq2 can be set up without splitting the collection.