What is Galaxy?

2 / 26

Galaxy

Data-intensive analysis for everyone

  • Versatile and reproducible workflows
  • Web platform
  • Open source under the Academic Free License
  • Developed at Penn State, Johns Hopkins, OHSU, and Cleveland Clinic, with substantial outside contributions

Galaxy resources

3 / 26
  • The Galaxy Team is composed of bioinformaticians and software engineers
  • OHSU = Oregon Health & Science University
  • Workflows are similar to cooking recipes. This metaphor helps researchers, citizens, and hobbyists who are not familiar with data-intensive research understand workflows.

Core values

  • Accessibility
    • Users without programming experience can easily upload/retrieve data, run complex tools and workflows, and visualize data
  • Reproducibility
    • Galaxy captures information so that any user can understand and repeat a complete computational analysis
  • Transparency
    • Users can share or publish their analyses (histories, workflows, visualizations)
    • Pages: online Methods for your paper
4 / 26

Accessible, reproducible, and transparent research means sharing everything.

The Galaxy framework aims to make it as simple as possible for researchers to:

  • share their analyses
  • track all tools and versions used
  • check all parameters
  • justify each step in the analysis
  • publish the findings with all of the above information

Pages: interactive, web-based documents that describe a complete analysis.

Galaxy growth

  • More than 7,000 ready-to-use tools
  • More than 10,000 citations
  • More than 160 public Galaxy resources
    • 120+ public servers, many more non-public
    • Both general-purpose and domain-specific
5 / 26
  • The number of public Galaxy resources should be regularly updated.

Galaxy timeline

Galaxy ecosystem

User Interface

6 / 26

So now that we know what Galaxy and the Galaxy Project are all about, let's look at the Galaxy interface.

Main Galaxy interface

Galaxy user interface

Home page divided into 3 panels

7 / 26

Top menu

Link           Usage
Analyze Data   Go back to the homepage
Workflow       Access existing workflows or create a new one using the editable diagrammatic pipeline
Visualize      Create new visualizations and launch Interactive Environments
Shared Data    Access data libraries, histories, workflows, visualizations and pages shared with you
Help           Links to the Galaxy Help Forum (Q&A), the Galaxy Community Hub (wiki), and Interactive Tours
User           Your preferences and saved histories, datasets, pages and visualizations
8 / 26

Tools

Tool interface

  • The tool search helps in finding a tool in a crowded toolbox
9 / 26

Tool interface

  • A tool form contains:
    • input datasets and parameters
    • help, citations, metadata
    • an Execute button to start a job, which adds its output datasets to the history
  • New tool versions can be installed without removing old ones to ensure reproducibility
10 / 26

The tool form is generated from a simple XML file describing:

  • the input datasets and their datatypes
  • the tool parameters (numerical, text, boolean, selections, colour)
  • the dependencies required to run the tool
  • how to generate a command to execute the tool with the specified inputs and parameters
  • the output datasets the tool should produce and their datatypes
  • tests
  • help, citations
  • various metadata (e.g. the tool version)
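To make the list above concrete, here is a minimal Python sketch of the kind of information a tool description carries and how a command line can be assembled from it. It is not Galaxy's tool format or code: real tools are XML files whose command is a Cheetah template, while this sketch uses Python's `string.Template` as a stand-in. The class `ToolDescription` and the tool `hypothetical_sort` are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from string import Template


@dataclass
class ToolDescription:
    """Hypothetical stand-in for what a Galaxy tool XML file declares."""

    tool_id: str
    version: str
    inputs: dict[str, set[str]]       # input name -> accepted datatypes
    params: dict[str, object]         # parameter name -> default value
    command: str                      # string.Template syntax (Galaxy itself uses Cheetah)
    outputs: dict[str, str]           # output name -> datatype produced
    requirements: list[str] = field(default_factory=list)  # software dependencies
    tests: list[dict[str, object]] = field(default_factory=list)
    help: str = ""

    def build_command(self, paths: dict[str, str], params: dict[str, object]) -> str:
        # Declared defaults first, then the user's choices from the tool form,
        # then the file paths of the input and output datasets
        values = {**self.params, **params, **paths}
        return Template(self.command).substitute(values)


sort_tool = ToolDescription(
    tool_id="hypothetical_sort",
    version="1.0.0",
    inputs={"infile": {"tabular"}},
    params={"column": 1},
    command="sort -k $column,$column $infile > $outfile",
    outputs={"outfile": "tabular"},
    requirements=["coreutils"],
    help="Sort a tabular file on one column.",
)

print(sort_tool.build_command({"infile": "data.tsv", "outfile": "sorted.tsv"}, {"column": 2}))
# sort -k 2,2 data.tsv > sorted.tsv
```

Because the command is rendered only from declared inputs and parameters, the same description and the same choices always give the same command, which is part of what makes a recorded analysis repeatable.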

Tools can be viewed as tiny LEGO pieces: each one solves a specific problem, and they can be combined together to build complex analysis pipelines.

Tool Shed

  • Free "app" store: Galaxy Tool Shed
    • Thousands of tools already available
    • Most software can be integrated
      • If a tool is not available, ask the Galaxy community for help!
    • Only a Galaxy admin can install tools
11 / 26

History

  • The history is the location of all analyses: it

    • collects all datasets produced by tools
    • collects all operations performed on the data
  • For each dataset (the heart of Galaxy’s reproducibility), the history tracks

    • name, format, size, creation time, datatype-specific metadata
    • tool id, version, inputs, parameters
    • standard output (stdout) and error (stderr)
    • state (waiting, running, success, failed)
    • hidden, deleted, purged
12 / 26
  • The term "datasets" refers to files as well as databases
  • Purged means permanently deleted
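As a rough model of what the history stores, the sketch below records the fields listed on this slide for one dataset. The class and field names are hypothetical and do not mirror Galaxy's internal data model; the point is that keeping the tool id, version, inputs and parameters next to the data is enough to describe and repeat the step that produced it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class State(Enum):
    # States as named on the slide; Galaxy's own state names may differ
    WAITING = "waiting"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class DatasetRecord:
    """Hypothetical record of the provenance a history keeps for one dataset."""

    name: str
    datatype: str
    size_bytes: int
    tool_id: str
    tool_version: str
    inputs: list[str]
    parameters: dict[str, object]
    metadata: dict[str, object] = field(default_factory=dict)  # datatype-specific
    stdout: str = ""
    stderr: str = ""
    state: State = State.WAITING
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    hidden: bool = False
    deleted: bool = False  # removed from view, can still be undeleted
    purged: bool = False   # permanently deleted

    def purge(self) -> None:
        # A purged dataset is also deleted, and the change cannot be undone
        self.deleted = True
        self.purged = True

    def describe_step(self) -> str:
        """Summarise the tool run that produced this dataset."""
        params = ", ".join(f"{k}={v}" for k, v in sorted(self.parameters.items()))
        return f"{self.tool_id} {self.tool_version} on {', '.join(self.inputs)} with {params}"


record = DatasetRecord(
    name="sorted.tsv", datatype="tabular", size_bytes=1024,
    tool_id="hypothetical_sort", tool_version="1.0.0",
    inputs=["data.tsv"], parameters={"column": 2},
)
record.state = State.SUCCESS
print(record.describe_step())
# hypothetical_sort 1.0.0 on data.tsv with column=2
```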

Multiple histories

  • You can have as many histories as you want
    • each history should correspond to a different analysis
    • and should have a meaningful name

13 / 26
  • Give each history a meaningful name so you can find it later; this becomes difficult once you have a large number of old histories.
  • You can drag and drop datasets between histories

History options menu

History behavior is controlled by the History options menu (gear icon).

  • Creating a new history (+ icon) does not make your current history disappear
  • To see all of your histories, use the history switcher

  • Copy datasets from one history to another instead of uploading them again, saving disk space and quota

14 / 26
  • Copying datasets between histories does not affect your quota: only a single copy of the file is stored on disk, because datasets are never modified after creation.
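The quota behaviour follows from datasets being immutable: a copy can simply point at the same file. The sketch below models that idea with hypothetical classes and a made-up file path; it illustrates the principle and is not how Galaxy's object store is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredFile:
    """A file on disk; frozen because datasets are never modified after creation."""

    path: str
    size_bytes: int


@dataclass
class History:
    name: str
    datasets: list[StoredFile] = field(default_factory=list)


def copy_dataset(item: StoredFile, target: History) -> None:
    # Copying adds a reference to the same file; no bytes are duplicated
    target.datasets.append(item)


def disk_usage(histories: list[History]) -> int:
    # Count each underlying file once, however many histories reference it
    unique = {f.path: f for h in histories for f in h.datasets}
    return sum(f.size_bytes for f in unique.values())


reads = StoredFile("/hypothetical/store/dataset_1.dat", 5_000_000)
first = History("RNA-seq, run 1", [reads])
second = History("RNA-seq, run 2 with new parameters")
copy_dataset(reads, second)
print(disk_usage([first, second]))
# 5000000
```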

Loading data

15 / 26

So now you know about the tools to manipulate data and the history where you can see your data, your inputs and outputs. Let's discuss how to get data into Galaxy.

Importing data

  • Copy/paste some text
  • Upload files from your local computer
  • Upload data from an internet URL
  • Upload data from online databases: UCSC, BioMart, ENCODE, modENCODE, FlyMine, etc.
  • Import from Shared Data (libraries, histories, pages)
  • Upload data from FTP

See Getting data into Galaxy

16 / 26

Datatypes

  • Tools only accept input datasets with the appropriate datatypes
  • When uploading a dataset, its datatype can be either:
    • automatically detected
    • assigned by the user
  • Datasets produced by a tool have their datatype assigned by the tool
  • To change the datatype of a dataset, either:
    • Edit attributes (pencil icon) → Datatypes (if the original was wrong), or
    • Edit attributes (pencil icon) → Convert
17 / 26
  • When you upload data, Galaxy will try to autodetect the format of the data, but can sometimes get it wrong, so you may need to correct it later.
  • Edit Attributes → Datatype is used to fix a wrongly assigned datatype
  • Edit Attributes → Convert Formats creates a new dataset using a tool that converts the original dataset into the new format
  • New datatypes can be added to the Galaxy code base, if missing
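The difference between the two Edit attributes options can be sketched in a few lines: changing the datatype relabels the same dataset, while converting runs a tool and adds a new dataset. The converter registry and function names below are hypothetical; only the datatype names `txt`, `tabular` and `csv` are meant to resemble ones Galaxy uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Dataset:
    name: str
    datatype: str
    content: str


# Hypothetical converter registry: (source datatype, target datatype) -> function
CONVERTERS: dict[tuple[str, str], Callable[[str], str]] = {
    ("tabular", "csv"): lambda text: text.replace("\t", ","),
}


def accepts(accepted: set[str], dataset: Dataset) -> bool:
    # A tool only offers datasets whose datatype it declares as acceptable
    return dataset.datatype in accepted


def change_datatype(dataset: Dataset, correct: str) -> None:
    # "Datatypes": fix a wrong label on the same dataset; the content is untouched
    dataset.datatype = correct


def convert(dataset: Dataset, target: str) -> Dataset:
    # "Convert": run a converter and return a NEW dataset; the original is kept
    converter = CONVERTERS[(dataset.datatype, target)]
    return Dataset(f"{dataset.name} ({target})", target, converter(dataset.content))


counts = Dataset("counts.txt", "txt", "gene\tcount\nBRCA1\t12\n")
print(accepts({"tabular"}, counts))  # False: the datatype was detected wrongly
change_datatype(counts, "tabular")
print(accepts({"tabular"}, counts))  # True
as_csv = convert(counts, "csv")
print(as_csv.name, as_csv.datatype)  # counts.txt (csv) csv
```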

Reference datasets

Example: reference genome

  • Genome build specifies which genome assembly a dataset is associated with
    • e.g. mm10, hg38...
  • Can be assigned by a tool or by the user
  • Users can create custom genome builds
  • New builds can be added by the admin

Genome Builds

18 / 26
  • Just like datatypes, you can specify which genome assembly your dataset is about. Some tools need to know this, and Galaxy can tell the tool for you.

Workflows

19 / 26

Now that you've got data into Galaxy, you know you can use tools to manipulate this data, and histories to keep track of what you've done. You're only missing one key part: workflows. These help you easily reproduce the exact analysis that you ran.

Workflow Editor

Workflow interface

  • Extracted from a history
  • Built manually by adding and configuring tools using the canvas
  • Imported from an existing shared workflow
20 / 26

Biologist:

  • workflows are great
  • single button to run all of these 50 different tools
  • a lot of work once to figure out the analysis, but afterwards easy to just rerun it, go get coffee and wait for it to finish :)

Bioinf / dev:

  • Boxes are workflow steps
    • 2 types: input and tool steps
  • Steps are connected by arrows representing the flow of datasets
  • Tool panel on the left with Inputs on top (to add input datasets and collections)
  • Small tool form on the right
  • Extracting a workflow from a history makes it easy to convert an existing history into an analysis workflow

Why would you want to create workflows?

  • Re-run the same analysis on different input datasets
  • Change parameters before re-running a similar analysis
  • Make use of the workflow job scheduling
    • jobs are submitted as soon as their inputs are ready
  • Create sub-workflows: a workflow inside another workflow
  • Share workflows for publication and with the community
21 / 26

Potential information overload for newbies
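To illustrate "jobs are submitted as soon as their inputs are ready", here is a small scheduling sketch. It assumes each step produces one dataset named after the step, and the step names are invented. It groups steps into rounds for readability, whereas a real scheduler would submit each job the moment its own inputs exist rather than waiting for a round to finish.

```python
from __future__ import annotations


def schedule(steps: dict[str, list[str]], available: set[str]) -> list[list[str]]:
    """Group workflow steps into rounds of jobs whose inputs are all ready.

    steps maps a step name to the datasets it consumes; each step is assumed
    to produce a single dataset with the same name as the step.
    """
    ready = set(available)
    pending = dict(steps)
    rounds: list[list[str]] = []
    while pending:
        # Every step whose inputs already exist can be submitted now
        runnable = sorted(s for s, needs in pending.items() if all(n in ready for n in needs))
        if not runnable:
            raise ValueError(f"missing inputs for steps: {sorted(pending)}")
        rounds.append(runnable)
        for step in runnable:
            ready.add(step)  # its output becomes available to later steps
            del pending[step]
    return rounds


workflow = {
    "quality_control": ["reads"],
    "trimming": ["reads"],
    "mapping": ["trimming", "genome"],
    "report": ["quality_control", "mapping"],
}
print(schedule(workflow, {"reads", "genome"}))
# [['quality_control', 'trimming'], ['mapping'], ['report']]
```

Re-running the same workflow on different data only changes the `available` inputs; the step definitions stay the same.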

Visualizations

  • Datatypes know which tools can be used to visualize datasets:
    • Sequencing data has a button for visualizing in IGV
    • Tabular data will prompt you to build charts
    • Protein data can be seen in a 3D viewer
  • Interactive environments: Jupyter, RStudio, etc.
22 / 26

Sharing data

  • Share everything you do in Galaxy - histories, workflows, and visualizations
    • Directly with other users on the same instance, using their Galaxy account email addresses
    • Using a web link, with anyone who knows the link
    • Using a web link and publishing it to make it accessible to everyone from the Shared Data menu
23 / 26

Community

24 / 26
  • I know that was a lot
  • we'll come back to it
  • slides are always available online
  • first real analysis after the coffee?
25 / 26
