InterMine integration with Galaxy

Overview
Questions:
  • How to export your query results from your InterMine of choice to Galaxy?

  • How to export a list of identifiers from Galaxy to your InterMine of choice?

Objectives:
  • Learn how to import/export data from/to InterMine instances

  • Understand the InterMine Interchange Dataset

Time estimation: 1 hour
Published: Dec 9, 2020
Last modification: Feb 29, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00152
Revision: 14

InterMine (Smith et al. 2012) is a well-established platform for integrating and accessing life sciences data. It provides the integrated data through a web interface and RESTful web services.

Organizations around the world download and deploy InterMine on their own servers: more than 30 instances are registered at registry.intermine.org, covering many organisms, including human data, model animals, plants and drug targets.

InterMine has been integrated with Galaxy. The InterMine tool server in Galaxy lets you import the data returned by any InterMine search and, vice versa, the InterMine Interchange format lets you export a list of identifiers from Galaxy into any InterMine instance of your choice.

Learn more in this tutorial.

Agenda

In this tutorial, we will cover:

  1. Import data from InterMine
  2. Export identifiers into InterMine
    1. Get data
    2. Create InterMine Interchange dataset
    3. Send identifiers to InterMine
  3. Conclusion

Import data from InterMine

Hands-on: Import

  1. Search Galaxy for InterMine (the search is not case-sensitive, so intermine works too) and click on InterMine Server under Get Data.

  2. This will redirect you to the InterMine registry, which shows a full list of InterMines and the various organisms they support. Find an InterMine that has the organism type you’re working with, and click on it to redirect to that InterMine.

  3. Once you arrive at your InterMine of choice, run a query as usual: this could be a search, a list results page, a template, or a query in the query builder. You will then be presented with an InterMine results table.

  4. Click on Export (top right). This will bring up a modal window.
  5. Select Send to Galaxy and double-check the “Galaxy Location” is correct.
  6. Click on the Send to Galaxy button on the bottom right of the pop-up window.

    If you get an error when you click on the Send to Galaxy button, make sure your browser allows pop-ups for the site and try again.

You have now exported your query results from InterMine to Galaxy.
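The results table you exported is produced by a query that InterMine can also answer through the RESTful web services mentioned in the introduction. The sketch below only builds a request URL for such a query and does not contact any server. It assumes the `/query/results` endpoint with `query` and `format` parameters, the `genomic` model name, a FlyMine service base URL and a `Gene.chromosome.primaryIdentifier` constraint; check all of these against your mine's web-service documentation before relying on them.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_query_xml(root_class, view_fields, constraints, model="genomic"):
    """Build a minimal PathQuery XML string (assumed format).

    view_fields: attribute paths relative to root_class, e.g. "symbol".
    constraints: (path, op, value) tuples, paths relative to root_class.
    """
    view = " ".join(f"{root_class}.{field}" for field in view_fields)
    parts = [f"<query model={quoteattr(model)} view={quoteattr(view)}>"]
    for path, op, value in constraints:
        # quoteattr adds the surrounding quotes and escapes &, < and >
        parts.append(
            f"<constraint path={quoteattr(root_class + '.' + path)} "
            f"op={quoteattr(op)} value={quoteattr(value)}/>"
        )
    parts.append("</query>")
    return "".join(parts)


def results_url(service_base, query_xml, fmt="tab"):
    """Assumed endpoint layout: <service base>/query/results?query=...&format=..."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{service_base.rstrip('/')}/query/results?{params}"


if __name__ == "__main__":
    xml = build_query_xml(
        "Gene",
        ["secondaryIdentifier", "symbol"],
        [("chromosome.primaryIdentifier", "=", "4")],  # illustrative constraint
    )
    # Assumed service base URL; check the mine's own API documentation.
    url = results_url("https://www.flymine.org/flymine/service", xml)
    print(url)
    # Fetching the rows would be: urllib.request.urlopen(url).read().decode()
```

Keeping URL construction separate from fetching makes it easy to inspect the request before sending it.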

Export identifiers into InterMine

Get data

Hands-on: Data upload
  1. Import some fly data from Zenodo or from the data library

    https://zenodo.org/record/3407174/files/GenesLocatedOnChromosome4.tsv
    
    • Copy the link location
    • Click Upload Data at the top of the tool panel

    • Select Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

    As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

    1. Go into Shared data (top panel) then Data libraries
    2. Navigate to the correct folder as indicated by your instructor.
      • On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
    3. Select the desired files
    4. Click on the Add to History dropdown near the top and select as Datasets from the dropdown menu
    5. In the pop-up window, choose

      • “Select history”: the history you want to import the data to (or create a new one)
    6. Click on Import

  2. Rename the dataset to GenesLocatedOnChromosome4

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button

  3. Inspect the data

The dataset contains the secondary identifier and the symbol of Drosophila melanogaster genes, together with their location on chromosome 4.

Question

Do the data contain the type, e.g. Protein or Gene?

No, they don’t, so we have to specify it when we create the InterMine Interchange dataset.
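To answer this question programmatically for a larger file, a small check like the one below flags columns in which every value is an InterMine class name. The `KNOWN_TYPES` set, the `has_header` option and the sample rows are hypothetical; a real InterMine data model defines many more classes.

```python
import csv
import io

# Hypothetical subset of InterMine class names; real data models define many more.
KNOWN_TYPES = {"Gene", "Protein", "Transcript", "Exon"}


def type_columns(tsv_text, known_types=KNOWN_TYPES, has_header=False):
    """Return 1-based indices of columns whose non-empty values are all type names."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    if has_header:
        rows = rows[1:]  # ignore the header line if the file has one
    width = max((len(row) for row in rows), default=0)
    hits = []
    for col in range(width):
        values = [row[col] for row in rows if col < len(row) and row[col]]
        if values and all(value in known_types for value in values):
            hits.append(col + 1)
    return hits


if __name__ == "__main__":
    sample = "ID_1\tsym1\t4:1-10\nID_2\tsym2\t4:20-30\n"  # hypothetical rows
    print(type_columns(sample))  # prints [] because no column holds type names
```

An empty result means the type has to be supplied by hand, as in this tutorial.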

Create InterMine Interchange dataset

We will use the Create InterMine Interchange Dataset tool to generate an intermediate file, which will be used to send the identifiers (e.g. gene identifiers) to InterMine. This file requires the identifier’s type (e.g. Gene), the identifier (e.g. WBGene00007063) and, optionally, the organism’s name.

Hands-on: Generate InterMine file
  1. Create InterMine Interchange dataset (Galaxy version 0.0.1) with the following parameters:
    • “Tabular file”: select the GenesLocatedOnChromosome4 dataset, which contains fly genes
    • “Feature Type Column”: Column: 1
    • “Feature Type”: Gene
    • “Feature Identifier column”: Column: 2
    Comment
    • In this example, because the GenesLocatedOnChromosome4 dataset does not contain the type, we have to specify it in the “Feature Type” field.
    • “Feature Type”: the type of the identifiers you are exporting to InterMine, in this example Gene. It must be a class in the InterMine data model.
    • “Feature Identifier column”: select a column from the input file which contains the identifier. We have selected Column 2, which contains the gene symbol.
    • “Feature Identifier”: this could be, for example, a gene symbol like GATA1, another identifier such as FBgn0000099, or a protein accession. In our example we do not have to edit anything, because the values for this field are contained in Column 2 of the GenesLocatedOnChromosome4 dataset.
    • “Organism Name column”: select a column from the input file which contains the organism’s name, if you have multiple organisms in the same dataset.
    • “Organism Name”: alternatively, you can provide the organism’s name directly. The organism’s name is not mandatory, but it is good to provide it if known. It does not have to be precise.
  2. Click on Run Tool
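To make the parameter mapping concrete, here is a minimal sketch of how an interchange dataset could be assembled from a tabular file. It is not the tool’s implementation. The (type, identifier, organism) column order, skipping lines that start with `#`, and the rule that a fixed “Feature Type” or “Organism Name” takes precedence over the corresponding column are all assumptions; the precedence rule is chosen to match how the hands-on above sets both “Feature Type Column” and “Feature Type”.

```python
import csv
import io


def build_interchange(tsv_text, identifier_column, feature_type=None,
                      feature_type_column=None, organism=None,
                      organism_column=None):
    """Return (type, identifier, organism) tuples for each data row.

    Column arguments are 1-based, like the tool's "Column: N" selectors.
    Assumption: a fixed feature_type/organism, when given, wins over the column.
    """
    if not feature_type and feature_type_column is None:
        raise ValueError("give a feature type or a feature type column")
    rows = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and '#' header/comment lines
        ftype = feature_type or row[feature_type_column - 1]
        ident = row[identifier_column - 1]
        if organism:
            org = organism
        elif organism_column:
            org = row[organism_column - 1]
        else:
            org = ""  # the organism is optional
        rows.append((ftype, ident, org))
    return rows


def to_tsv(rows):
    """Serialise the tuples as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = "ID_1\tsym1\t4:1-10\nID_2\tsym2\t4:20-30\n"  # hypothetical rows
    # Mirrors the hands-on: type column 1 and fixed type Gene, identifiers in column 2
    rows = build_interchange(sample, identifier_column=2,
                             feature_type="Gene", feature_type_column=1)
    print(to_tsv(rows), end="")
```

With these hypothetical rows the sketch emits one line per gene symbol, each tagged with the type Gene and an empty organism field.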

Send identifiers to InterMine

Once the interchange dataset has been generated, click on the green history item named Create InterMine Interchange on data to expand it.

Hands-on: Send data
  1. Click on view intermine at Registry to be redirected to the InterMine registry, which shows a full list of InterMines and the various organisms they support.
  2. Find an InterMine that has the organism type you’re working with, in our case FlyMine, and click on the green Send to button to export the identifiers to it.
    1. You are redirected to the FlyMine List Analysis page, which shows the identifiers you have just exported from Galaxy.
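If you prefer to shortlist mines by organism before opening the registry page, the filter below works on a list of registry entries you already have in memory. It assumes each entry is a dictionary with `name`, `url` and `organisms` keys; those field names and the sample entries are hypothetical, so adapt them to whatever the registry actually returns.

```python
def mines_for_organism(instances, organism):
    """Return (name, url) pairs for entries whose organism list mentions `organism`.

    Assumes each entry is a dict with "name", "url" and "organisms" keys.
    Matching is a case-insensitive substring test.
    """
    needle = organism.lower()
    hits = []
    for inst in instances:
        organisms = inst.get("organisms") or []  # entries may lack the field
        if any(needle in org.lower() for org in organisms):
            hits.append((inst.get("name", ""), inst.get("url", "")))
    return hits


if __name__ == "__main__":
    # Hypothetical registry entries, for illustration only
    instances = [
        {"name": "MineA", "url": "https://example.org/minea",
         "organisms": ["Drosophila melanogaster"]},
        {"name": "MineB", "url": "https://example.org/mineb",
         "organisms": ["Homo sapiens"]},
    ]
    print(mines_for_organism(instances, "drosophila"))
```

A substring match keeps the filter forgiving, in the same spirit as the tutorial’s note that the organism name does not have to be precise.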

Conclusion

You have now exported your identifiers from Galaxy to InterMine.