Creating, Editing and Importing Galaxy Workflows

Overview
Questions:
  • How can you construct Galaxy workflows from scratch?

  • How can you label outputs?

  • How can you include workflows in workflows?

  • How can you tag workflows?

  • How can you manage tool versions?

  • How can you manage workflow versions?

Objectives:
  • Understand key aspects of workflows

  • Create clean, non-repetitive workflows

Requirements:
Time estimation: 30 minutes
Level: Intermediate
Supporting Materials:
Published: Jul 17, 2020
Last modification: Nov 9, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00163
Rating: 1.0 (1 recent rating, 8 all time)
Revision: 12

Workflows are a powerful feature in Galaxy that allows you to link multiple steps of a complex analysis. In this tutorial we will demonstrate how to use the Workflow Editor to construct multiple variants of a simple workflow. Note that these workflows are meant to illustrate different concepts. Not every workflow needs all of the features described below, but we hope this tutorial will inspire you to make your analysis tasks more efficient.

Read about extracting workflows from histories in this tutorial.

Agenda

In this tutorial, we will cover:

  1. Workflow steps
  2. Creating a new workflow
  3. Editing our simple workflow
  4. Embedding a workflow within a workflow
  5. Conclusion

Workflow steps

Workflows logically connect a collection of steps. Possible step types are currently workflow inputs, tools, and workflows.

Creating a new workflow

Hands-on: Create a new workflow
  • Click on Workflow in the top panel of the Galaxy page
  • On the top right you will see 2 buttons: Create and Import
  • To create a new workflow click on Create
  • Enter a Name and Annotation for your workflow and click Save
  • The Workflow Editor will open with a new, empty workflow loaded

Figure 1: A new empty workflow

On the left-hand side of the Editor you see the available tools in the tool panel. The center panel (or “canvas”) holds the workflow layout. Steps will appear in the center panel. On the right you see the attributes of the workflow, such as name, version, annotation and tags. Depending on the context the contents of the right panel will change, but you can always return to these attributes by clicking on the Edit Attributes button (the Pencil icon on the upper right). If there is no Pencil icon, you can find the Edit Attributes button under the Workflow options button (a wheel icon) on the top right of the editor.

We will start by creating a very simple workflow with just 2 tools, and then add more advanced features.

Hands-on: Insert a dataset input
  1. Expand the “Inputs” section in the tool panel and click on “Input dataset” to create a new dataset input
  2. Click on the new input dataset in the center panel. Set the following parameter on the right side:
    • Label: A simple text input dataset
Comment: Optional Input Datasets & Formats

Tools may have optional dataset inputs. If your workflow should use optional datasets, you can set Optional to Yes. Doing this allows you to connect such an input only to Tool inputs that are optional. You can also restrict the format of an input dataset or input dataset collection. This serves as documentation and prevents selection of incompatible datasets.

Comment: Input modules

There are 3 input types, “Input dataset”, “Input dataset collection” and “Simple inputs used for workflow logic”. Insert an input dataset or dataset collection for each possible input to your workflow. “Simple inputs used for workflow logic” allow the definition of parameters that users can or should change when running your workflow. Please check out the Using Workflow Parameters tutorial for a detailed description of how to use these.

We’re now ready to add a first tool and connect it to our input dataset.

Hands-on: Add tac reverse a file (reverse cat) to your workflow
  1. Find the tac reverse a file (reverse cat) tool in the tool panel and click on it
  2. A new box labeled tac tool will appear in the center panel
  3. Click on tac in the center panel and see the tool parameters on the right side
  4. We will keep the default tool settings and only give the step a label
    • Label: Reverse dataset
  5. Click on the round blue symbol of the input dataset and drag the connection to the highlighted round green tool input

Figure 2: Connecting outputs and inputs
Comment: Tip: Workflow connections

Connections can be made by clicking on an output terminal and dragging the cursor to an input terminal. Input terminals that are compatible with the current output are highlighted in green, while input terminals that can’t be connected are highlighted in orange. When you drag an incompatible output over an input, a small textbox appears stating why the connection cannot be made. A valid connection can be made if the format of an output is allowed as input. A simple text file output, for instance, cannot be used when the input requires a binary format. If a dataset collection is required as input but the output of a node is a single dataset, you will see the message “Cannot attach a data output to a collection input”. If an output of a step is already connected to another input, the input dataset cannot be changed to a dataset collection. To connect inputs in such a case, all outputs of the step must first be disconnected. Connections can be removed by hovering over an input terminal and clicking.
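The connection rules described in this tip can be pictured with a small toy model. This is only a sketch and not how the Workflow Editor is implemented: the `Output` and `Input` classes, the `can_connect` function and the wording of the format message are hypothetical, and the model ignores details such as datatype inheritance. Only the collection message is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """Hypothetical stand-in for an output terminal."""
    datatype: str
    is_collection: bool = False


@dataclass(frozen=True)
class Input:
    """Hypothetical stand-in for an input terminal."""
    accepted_datatypes: frozenset
    is_collection: bool = False


def can_connect(output: Output, target: Input) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed connection in this toy model."""
    # A single dataset cannot feed an input that requires a collection.
    if target.is_collection and not output.is_collection:
        return False, "Cannot attach a data output to a collection input"
    # The output format must be one the input accepts.
    if output.datatype not in target.accepted_datatypes:
        # Hypothetical wording, not the message shown by Galaxy.
        return False, f"Format {output.datatype} is not accepted by this input"
    return True, ""


text_output = Output(datatype="txt")
print(can_connect(text_output, Input(frozenset({"txt"}))))
print(can_connect(text_output, Input(frozenset({"bam"}))))
print(can_connect(text_output, Input(frozenset({"txt"}), is_collection=True)))
```

The first call is allowed, the second is rejected because of the format, and the third is rejected because a single dataset is offered to a collection input.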

Comment: Steps can be labeled

The default label is the tool name, but it is often useful to label a step with what it does, especially if a tool is used multiple times in a workflow. A click on a step will open the step’s settings on the right side. Any label will immediately appear in the center panel as well.

This is great, but while a workflow with a single tool can be handy (for instance if there are many parameters to be set), let’s add another tool that works on the output of the tac reverse a file (reverse cat) tool for an authentic workflow experience. From now on we’ll contract steps 1 to 4 and just mention the tool and parameters to insert, since the procedure is always the same.

Hands-on: Add Select first lines from a dataset to your workflow
  1. Select first lines from a dataset tool
    • Label: Select first lines
    • Select first: 1
  2. Connect the output of the Reverse dataset step to the input
  3. Save your workflow using the save button on the top right

We now have a very simple workflow that will reverse the contents of a file and then output the first line of the resulting dataset. Now we’re ready to upload a test dataset and run our workflow.

Hands-on: Running the workflow
  1. Return to the analysis page by clicking the Home button (or Analyze Data on older versions of Galaxy) on the top
  2. Upload a dataset using “Paste/Fetch data” with the contents

    A
    B
    C
    D
    E
    F
    
  3. Run your workflow

    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on the Run workflow button next to your workflow
    • Configure the workflow as needed
    • Click the Run Workflow button at the top-right of the screen
    • You may have to refresh your history to see the queued jobs

The outputs of the workflow will now appear in your history. In addition to our input file we will see 2 new datasets: 2: tac on data 1 which contains the reversed dataset and 3: Select first on data 2 which just contains the line F.
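To see why the last dataset contains only the line F, the two steps can be mimicked outside Galaxy. The following is a minimal Python sketch, not Galaxy code; the function names `tac` and `select_first` are hypothetical stand-ins for the two tools.

```python
def tac(text: str) -> str:
    """Reverse the order of the lines, like the tac tool."""
    return "\n".join(reversed(text.splitlines()))


def select_first(text: str, count: int = 1) -> str:
    """Keep only the first `count` lines, like Select first lines."""
    return "\n".join(text.splitlines()[:count])


dataset = "A\nB\nC\nD\nE\nF"                # the pasted test dataset
reversed_dataset = tac(dataset)             # lines are now F, E, D, C, B, A
first_line = select_first(reversed_dataset, 1)
print(first_line)  # prints F
```

Because the file is reversed first, selecting the first line of the result returns the last line of the original input.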

This is fine, but if we want to process many datasets at once the naming of input datasets in the history will be difficult to follow. Luckily we can use dataset collections as inputs, which will maintain element identifiers across all steps of an analysis. We can also add colorful tags that can help us identify groups of datasets and we can label and rename outputs.

Editing our simple workflow

We will now add tags to step outputs and label one of the 2 output datasets.

Comment: Configuring Outputs

Open a step and scroll to the “Configure Output:” section on the right side of the editor. Here you can set a Label. Outputs with a label can be used as outputs in a subworkflow. You will also be able to set an output name for the dataset and to add or remove tags. You can also force a datatype. Note that setting a datatype does not change the content, so use this only if the file content fits the datatype you are going to select. This can be used to change a text output to tabular or gff/bed for instance.

Hands-on: Editing our simple workflow
  1. Open our simple workflow in the Workflow Editor
  2. Remove the input dataset called A simple text input dataset using the white cross icon
  3. Add an input dataset collection and label it
    • Label: A text dataset collection
  4. Disconnect the existing connections and reconnect
  5. Select the Reverse dataset step and under Configure Output: outfile set
    • Add Tags: name:reverse
  6. Select the Select first lines step and under Configure Output: outfile set
    • Add Tags: name:first
    • Label: Last lines
    • Rename dataset: Renamed datasets
  7. Save your workflow using the save button on the top right
Hands-on: Running the workflow
  1. Return to the analysis page by clicking the Home button (or Analyze Data on older versions of Galaxy) on the top
  2. Create a dataset collection from the first 2 files in your history

    • Click on Select Items at the top of the history panel
    • Check all the datasets in your history you would like to include
    • Click n of N selected and choose Build Dataset List

    • Enter a name for your collection
    • Click Create collection to build your collection
    • Click on the checkmark icon at the top of your history again

  3. Run your workflow using the newly created collection input

    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on the Run workflow button next to your workflow
    • Configure the workflow as needed
    • Click the Run Workflow button at the top-right of the screen
    • You may have to refresh your history to see the queued jobs

You will now see only 1 new dataset collection, Renamed datasets, in your history. This is because we have labeled only the output of the last step in the workflow. This collection has 2 name tags, reverse and first. The other output collection is hidden in the history but can be shown by clicking on hidden in your history.
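Running a workflow on a collection applies the same steps to every element while keeping each element identifier. A rough Python sketch of that idea follows; it is not Galaxy code, and the function names, the element identifiers `sample_1` and `sample_2`, and the file contents are all made up for illustration.

```python
def tac(text: str) -> str:
    """Reverse the order of the lines (stand-in for the tac tool)."""
    return "\n".join(reversed(text.splitlines()))


def select_first(text: str, count: int = 1) -> str:
    """Keep the first `count` lines (stand-in for Select first lines)."""
    return "\n".join(text.splitlines()[:count])


def run_workflow(collection: dict[str, str]) -> dict[str, str]:
    """Apply both steps to every element, keeping the element identifiers."""
    return {
        identifier: select_first(tac(content), 1)
        for identifier, content in collection.items()
    }


# Hypothetical collection: element identifier -> dataset content
collection = {"sample_1": "A\nB\nC", "sample_2": "X\nY\nZ"}
result = run_workflow(collection)
print(result)  # {'sample_1': 'C', 'sample_2': 'Z'}
```

The output has the same identifiers as the input, which is what makes results easy to trace back when many datasets are processed at once.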

We will now use this workflow and embed it in a new workflow.

Embedding a workflow within a workflow

Another step type is the subworkflow. We can use this to include a section of a workflow that is repeated within a workflow or a workflow that contains steps that are useful in more than one workflow, so that we don’t have to maintain and update closely related workflows.

Here we will include our workflow twice within a new workflow and then paste the contents of each workflow together.

Hands-on: Embedding a workflow
  1. Create a new, empty workflow
  2. Insert a dataset collection input
  3. On the left side scroll down until you see the Workflows section
  4. Insert the previously created workflow by clicking on the workflow name
  5. Label the new workflow step:
    • Label: First workflow
  6. Repeat steps 4 and 5, but change the Label
    • Label: Second workflow
  7. Insert the Paste two files side by side tool
  8. Connect the 2 workflow outputs to the Paste two files side by side tool input
  9. Save your workflow using the save button on the top right
  10. Run your workflow

    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on the Run workflow button next to your workflow
    • Configure the workflow as needed
    • Click the Run Workflow button at the top-right of the screen
    • You may have to refresh your history to see the queued jobs

This is a very contrived example, but this technique can be used to separate reusable steps in real-world scenarios.
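Conceptually, a subworkflow behaves like a function that is defined once and called wherever it is needed. The sketch below illustrates this in plain Python under a few assumptions: both subworkflow steps receive the same input, the pasted columns are separated by a tab, and all function names are hypothetical stand-ins rather than Galaxy APIs.

```python
from itertools import zip_longest


def simple_workflow(text: str) -> str:
    """Stand-in for the embedded workflow: reverse the lines, keep the first."""
    reversed_lines = list(reversed(text.splitlines()))
    return "\n".join(reversed_lines[:1])


def paste(left: str, right: str, delimiter: str = "\t") -> str:
    """Join two texts line by line (tab delimiter is an assumption)."""
    pairs = zip_longest(left.splitlines(), right.splitlines(), fillvalue="")
    return "\n".join(delimiter.join(pair) for pair in pairs)


def outer_workflow(text: str) -> str:
    """Call the same subworkflow twice and paste the two results together."""
    first = simple_workflow(text)    # "First workflow" step
    second = simple_workflow(text)   # "Second workflow" step
    return paste(first, second)


print(outer_workflow("A\nB\nC\nD\nE\nF"))  # prints F and F separated by a tab
```

If the logic of `simple_workflow` changes, both call sites pick up the change, which is the maintenance benefit of embedding a workflow instead of copying its steps.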

Comment: Workflow versions

Every time a workflow is saved a new version is created, so that you can go back and forth between new and old versions of a workflow. Click on the pencil symbol to bring up the workflow attributes, where you can freely select different versions. You can also change an old version of a workflow; once saved, it becomes the newest version.
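The versioning behaviour can be thought of as an append-only list: saving never overwrites an earlier version, it adds a new one at the end. The following toy model is only an illustration; the `WorkflowVersions` class is hypothetical, and numbering versions from 0 is an arbitrary choice that need not match what Galaxy displays.

```python
class WorkflowVersions:
    """Toy append-only version store (not Galaxy's implementation)."""

    def __init__(self) -> None:
        self.versions: list[str] = []

    def save(self, content: str) -> int:
        """Store content as a new version and return its number."""
        self.versions.append(content)
        return len(self.versions) - 1

    def load(self, version: int) -> str:
        """Return the content of an earlier or current version."""
        return self.versions[version]


store = WorkflowVersions()
store.save("tac")                          # version 0
store.save("tac -> select first")          # version 1

old = store.load(0)                        # go back to the old version
newest = store.save(old + " (edited)")     # saving it creates version 2
print(newest, store.load(newest))
```

Version 1 is still available after the edit, so switching back and forth remains possible.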

Comment: Importing workflows

Workflows can be imported via URL, through Shared Data -> Workflows or from a local file on your computer.

Comment: Managing tool versions

Versions of a tool in a workflow can be changed by clicking on a tool step in the center panel and on the right side clicking on Select another tool version.

Conclusion

You now know the ins and outs of Workflows in Galaxy and should be able to make your analyses more efficient and less manual!