Group tags for complex experimental designs

Overview

Questions:
  • What are group tags?

  • How can I use group tags to perform multi-factor analyses with collections?

Objectives:
  • Learn how to set group tags

  • Learn how to select group tags in tools

Time estimation: 10 minutes
Last modification: May 4, 2021
License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

Introduction

Advanced uses of Galaxy often require the use of dataset collections, which can contain between one and tens of thousands of datasets. Grouping datasets in this way has numerous advantages:

  • It is easy to represent a single collection in the History
  • Dataset names (“Element Identifiers”) are immutable and preserved
  • Collections can be split and nested in arbitrary ways

While collections can be split in any way, doing so for multi-factor analysis quickly becomes cumbersome and messy. An alternative is to label collection elements with special group tags, i.e. tags prefixed with the string group:. These tags can be displayed in the tool form, allowing users to select subsets of a collection. Note that group tags currently do not propagate, i.e. they are not inherited by datasets resulting from analyses.

This tutorial outlines how to set and use group tags with the DESeq2 tool. For a more detailed description of and background on differential expression testing, see the Reference-based RNA-Seq data analysis tutorial.

Agenda

In this tutorial, we will cover:

  1. Setting group tags using the apply rules tool
    1. Set group tags during upload
    2. Set group tags using the “Tag elements from file” tool
  2. Using group tags in tools, e.g. DESeq2

Setting group tags using the apply rules tool

There are several ways to set group tags:

  • Using the Rule Based Uploader
  • Using the “Tag elements from file” tool
  • Using the “Apply Rules” tool
  • Manually adding dataset tags with the prefix group:

We will use the first two methods in this tutorial. The second and third methods work at any step during the analysis. Note that the function of the “Apply Rules” tool is (nearly) identical to the Rule Based Uploader.

Set group tags during upload

Hands-on: Set group tags during upload

  1. Create a new history for this tutorial

    Tip: Creating a new history

    Click the new-history icon at the top of the history panel.

    If the new-history icon is missing:

    1. Click on the galaxy-gear icon (History options) on the top of the history panel
    2. Select the option Create New from the menu
  2. Open the Galaxy Upload Manager (the upload icon at the top right of the tool panel)
  3. Click on Rule-based on the top

    Rule-based upload

    As you can see in this dialog, data can be selected from a history dataset or pasted in directly

  4. Set Upload data as: to Collection(s)
  5. Paste the following links into the text box

    https://zenodo.org/record/1185122/files/GSM461176_untreat_single.counts
    https://zenodo.org/record/1185122/files/GSM461177_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461178_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461179_treat_single.counts
    https://zenodo.org/record/1185122/files/GSM461180_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461181_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461182_untreat_single.counts
    
  6. Click Build
  7. We will add a regex that creates 3 new columns with accession, treatment and library type:
    • Click on the Column button and then Using a Regular Expression
    • Select Create columns matching expression groups
    • Paste .*(GSM.*)_(.*)_(.*).counts in “Regular Expression”
    • Set “Number of Groups” to 3
    • Click on Apply

      We should now have a table with 4 columns: link, sample name, treatment, sequencing type

    • Click on Rules and then Add / Modify Column Definitions
    • Click on Add Definition and select:
      • “URL”: Column A (Note that this option is absent when using the “Apply Rules” tool)
      • “List Identifiers”: Column B
      • “Group Tags”: Columns C and D (Select Column C first and then add D by clicking on “… Add another column”)
    • Click Apply
    • Enter a name for the new collection
    • Click Upload
  8. Expand the generated collection and the files in it and check their names and tags

    Group tags in Galaxy UI are prefixed with group:
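
To see what the regular expression from step 7 captures, it can help to try it outside Galaxy. The following is a minimal sketch that uses Python's re module as a stand-in for the rule builder's regex engine (an assumption; the two engines may differ in details). The helper name split_url is hypothetical, and only two of the links are used.

```python
import re

# Regular expression from step 7. Note that the "." before "counts" is
# unescaped, so it matches any single character, not only a literal dot.
PATTERN = re.compile(r".*(GSM.*)_(.*)_(.*).counts")

URLS = [
    "https://zenodo.org/record/1185122/files/GSM461176_untreat_single.counts",
    "https://zenodo.org/record/1185122/files/GSM461180_treat_paired.counts",
]


def split_url(url):
    """Return (url, accession, treatment, library type) for one link.

    Hypothetical helper, used here for illustration only.
    """
    match = PATTERN.fullmatch(url)
    if match is None:
        raise ValueError(f"URL does not match the expected pattern: {url}")
    return (url, *match.groups())


rows = [split_url(url) for url in URLS]
for row in rows:
    # Columns A to D: link, sample name, treatment, sequencing type
    print("\t".join(row))
```

Each printed line corresponds to one row of the rule builder table, with the three captured groups becoming columns B, C and D.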

Set group tags using the “Tag elements from file” tool

We now want to add group tags using the “Tag elements from file” tool.

Hands-on: Upload and create a collection

  1. Create a new history for this tutorial
  2. Import the following files

    https://zenodo.org/record/1185122/files/GSM461176_untreat_single.counts
    https://zenodo.org/record/1185122/files/GSM461177_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461178_untreat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461179_treat_single.counts
    https://zenodo.org/record/1185122/files/GSM461180_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461181_treat_paired.counts
    https://zenodo.org/record/1185122/files/GSM461182_untreat_single.counts
    
    • Copy the link location
    • Open the Galaxy Upload Manager (the upload icon at the top right of the tool panel)
    • Select Paste/Fetch Data
    • Paste the link into the text field
    • Press Start
    • Close the window
    • By default, Galaxy uses the URL as the name, so rename the files with a more useful name.
  3. Click on the galaxy-selector icon (Operations on multiple datasets)
  4. Check all new datasets
  5. Click on For all selected… and then Build Dataset List
  6. Enter a name for the new collection and click Create list

We now have a collection containing our files. We can either upload a tabular file containing the element identifiers and the tags we want to apply, or extract the element identifiers and derive the tags from them using a regular expression. We will do the latter.
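
For comparison, the first option (a hand-written tabular file) could be prepared as sketched below. The layout is the one implied by the replacement pattern used later in this tutorial: one line per element, with the element identifier in the first column and one tag in each additional column, separated by tabs. The element identifiers and the helper build_tag_table are illustrative assumptions; the identifiers must match those in your own collection.

```python
import csv
import io

# Assumed element identifiers with their treatment and library type.
SAMPLES = [
    ("GSM461176_untreat_single.counts", "untreat", "single"),
    ("GSM461177_untreat_paired.counts", "untreat", "paired"),
    ("GSM461179_treat_single.counts", "treat", "single"),
    ("GSM461180_treat_paired.counts", "treat", "paired"),
]


def build_tag_table(samples):
    """Return tab-separated text: identifier, then one group tag per column.

    Hypothetical helper, used here for illustration only.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for identifier, treatment, library in samples:
        writer.writerow([identifier, f"group:{treatment}", f"group:{library}"])
    return buffer.getvalue()


tag_table = build_tag_table(SAMPLES)
print(tag_table, end="")
```

The resulting text could be saved to a file and uploaded with the tabular datatype.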

Hands-on: Set group tags using the “Tag elements from file” tool

  1. Extract element identifiers tool
    • “Dataset collection”: created collection
  2. Replace Text in entire line tool
    • “File to process”: output of Extract element identifiers tool
    • In “Replacement”:
      • In “1: Replacement”
        • “Find pattern”: (.*)_(.*)_(.*).counts
        • “Replace with”: \1_\2_\3.counts\tgroup:\2\tgroup:\3

    This step adds additional columns that can be used with the Tag elements from file tool.

  3. Change the datatype to tabular

    Tip: Changing the datatype

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, click on the Datatypes tab on the top
    • Select tabular
    • Click the Change datatype button
  4. Tag elements from file tool
    • “Input Collection”: created collection
    • “Tag collection elements according to this file”: output of Replace Text tool

You should now have a properly tagged collection of tabular files that can be used in DESeq2.
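
The effect of the Replace Text step can be previewed outside Galaxy. The sketch below applies the same find pattern and replacement with Python's re.sub, which is an assumption standing in for the tool's own regex engine. The element identifiers are assumed to be the original file names, and tag_line is a hypothetical helper.

```python
import re

# Find pattern and replacement exactly as entered in the Replace Text step.
FIND = r"(.*)_(.*)_(.*).counts"
REPLACE = r"\1_\2_\3.counts\tgroup:\2\tgroup:\3"

# Assumed element identifiers (the original file names).
IDENTIFIERS = [
    "GSM461176_untreat_single.counts",
    "GSM461180_treat_paired.counts",
]


def tag_line(identifier):
    """Append two tab-separated group tags derived from the identifier.

    Hypothetical helper, used here for illustration only.
    """
    return re.sub(FIND, REPLACE, identifier)


lines = [tag_line(identifier) for identifier in IDENTIFIERS]
for line in lines:
    print(line)
```

Each output line keeps the identifier unchanged in the first column and adds the treatment and library type as group tags in the second and third columns.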

Using group tags in tools, e.g. DESeq2

DESeq2 has two modes for specifying factors. One can either select datasets corresponding to factors, or use group tags to specify factors. We will use the group tags present in our collection to specify factors.

The tool interface will prompt you with the group tags that are available for your inputs:

Group tags in the tool UI

Hands-on: Running DESeq2 with group tags

  1. DESeq2 tool with the following parameters:
    • “how”: Select group tags corresponding to levels
      • “Count file(s) collection”: Generated collection
      • In “Factor”:
        • In “1: Factor”
          • “Specify a factor name”: Treatment
          • In “Factor level”:
            • In “1: Factor level”:
              • “Specify a factor level”: treat
              • “Select groups that correspond to this factor level”: Tags: treat
            • In “2: Factor level”:
              • “Specify a factor level”: untreat
              • “Select groups that correspond to this factor level”: Tags: untreat
        • Click on “Insert Factor” (not on “Insert Factor level”)
        • In “2: Factor”
          • “Specify a factor name”: Sequencing
          • In “Factor level”:
            • In “1: Factor level”:
              • “Specify a factor level”: paired
              • “Select groups that correspond to this factor level”: Tags: paired
            • In “2: Factor level”:
              • “Specify a factor level”: single
              • “Select groups that correspond to this factor level”: Tags: single
    • “Files have header?”: No
    • “Output normalized counts table”: Yes
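
Conceptually, each factor level selects the collection elements that carry the chosen group tag. The sketch below illustrates that mapping in plain Python. It is not how the DESeq2 wrapper is implemented, and the dictionary layout, the sample subset and the helper level_for are assumptions made for illustration.

```python
# Element identifiers and their group tags, as set earlier in this tutorial.
ELEMENT_TAGS = {
    "GSM461176": {"group:untreat", "group:single"},
    "GSM461177": {"group:untreat", "group:paired"},
    "GSM461179": {"group:treat", "group:single"},
    "GSM461180": {"group:treat", "group:paired"},
}

# Factor name -> factor level -> group tags selected for that level.
FACTORS = {
    "Treatment": {"treat": ["treat"], "untreat": ["untreat"]},
    "Sequencing": {"paired": ["paired"], "single": ["single"]},
}


def level_for(tags, levels):
    """Return the single factor level whose selected groups match the tags.

    Hypothetical helper, used here for illustration only.
    """
    matches = [
        level
        for level, groups in levels.items()
        if any(f"group:{group}" in tags for group in groups)
    ]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one matching level, found {matches}")
    return matches[0]


sample_table = {
    element: {name: level_for(tags, levels) for name, levels in FACTORS.items()}
    for element, tags in ELEMENT_TAGS.items()
}

for element, design in sample_table.items():
    print(element, design["Treatment"], design["Sequencing"], sep="\t")
```

Because the tags are compared as whole strings, treat and untreat are kept apart even though one contains the other.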

Conclusion

We can select subsets of a collection using special group tags.

Key points

  • Group tags allow complex analyses without reshaping or unhiding datasets in a collection

Frequently Asked Questions

Have questions about this tutorial? Check out the FAQ page for the Using Galaxy and Managing your Data topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.

Feedback

Did you use this material as an instructor? Feel free to give us feedback on how it went.


Citing this Tutorial

  1. Marius van den Beek, 2021 Group tags for complex experimental designs (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/group-tags/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

BibTeX

@misc{galaxy-interface-group-tags,
author = "Marius van den Beek",
title = "Group tags for complex experimental designs (Galaxy Training Materials)",
year = "2021",
month = "05",
day = "04",
url = "\url{https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/group-tags/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                

Congratulations on successfully completing this tutorial!