name: inverse layout: true class: center, middle, inverse
---
# Integrate and query local datasets and distant RDF data with AskOmics using Semantic Web technologies
Xavier Garnier
Anthony Bretaudeau
Anne Siegel
Olivier Dameron
Mateo Boudet
last_modification
Updated:
purl
PURL
:
gxy.io/GTN:S00097
text-document
Plain-text slides
|
Tip:
press
P
to view the presenter notes |
arrow-keys
Use arrow keys to move between slides
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- ## Requirements Before diving into this slide deck, we recommend you to have a look at: - [Introduction to Galaxy Analyses](/training-material/topics/introduction) - [Sequence analysis](/training-material/topics/sequence-analysis) - Quality Control: [
slides
slides](/training-material/topics/sequence-analysis/tutorials/quality-control/slides.html) - [
tutorial
hands-on](/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html) - Mapping: [
slides
slides](/training-material/topics/sequence-analysis/tutorials/mapping/slides.html) - [
tutorial
hands-on](/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html) --- ### <i class="far fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions - What is the Semantic Web and how can it help integrating and querying data? - How can AskOmics help benefiting from Semantic Web technologies without having to write RDF or SPARQL code? --- ### <i class="fas fa-bullseye" aria-hidden="true"></i><span class="visually-hidden">objectives</span> Objectives - Understand the basics of RDF and SPARQL - Learn how input data must be structured to be integrated with AskOmics - Learn how to connect distant SPARQL endpoints to local data with AskOmics --- # How to explore data --- ## Requirements Study of biological mechanisms requires to: - **integrate** multiple data sources (differential expression results, genome annotation, remote protein database) - **query** them to answer a biological question (which genes are over-expressed in a condition, and what are the proteins coded by these genes) .image-00[ ] --- # What is the Semantic Web? --- ## Semantic Web Set of recommendations to **integrate data**, to **integrate domain knowledge** and to **perform query** and reasoning. - Resource Description Framework (RDF): annotate data - RDFS + OWL: represent knowledge (data description) - SPARQL Protocol and RDF Query Language (SPARQL): query data --- ## RDF - RDF is for describing **resources** (the R in RDF) - resources are identified by URIs (`nextprot:P01137`, `taxon:9606`) - **describing** (D in RDF) a resource is representing it explicitly - its attributes (`nextprot:P01137 :hasSequence "MPPSGLRLLL..."`) - its relations to other entities (`nextprot:P01137 :hasTaxon taxon:9606`) - its descriptions (aka classes) (`nextprot:P01137 :is nextprot:Protein`) --- ## RDF: Set of triples - RDF is represented by triples (subject, predicate, object) - **Subject**: the resource being described - **Predicate**: the relation (from subject to object) - **Object**: a value of the predicate for the subject .image-01[ ] ```turtle nextprot:P01137 :hasTaxon taxon:9606 . nextprot:P01137 :hasSequence "MPPSGLRLLL" . ``` --- ## RDF: triples form a labeled directed graph ```turtle # Description nextprot:P01137 rdf:type nextprot:Protein . taxon:9606 rdf:type nextprot:Organism . # Data nextprot:P01137 :hasTaxon taxon:9606 . nextprot:P01137 :hasSequence "MPPSGLRLLL" . ``` .image-02[ ] --- ## SPARQL - The SPARQL language is a set of triple patterns with variables (`?variable_name`) ``` SELECT ?gene WHERE { ?gene rdf:type :Gene . ?gene :hasTaxon taxon:9606 . } ``` - All `?gene` with `rdf:type` `:Gene` **and** with `:hasTaxon` `taxon:9606` - In other words, the query returns all the human genes --- ## SPARQL: entity matching allow federated queries .pull-left[ - Using the same identifier for the same entity (**entity matching**) across multiple datasets allows federated queries - Federated SPARQL queries provide unified querying capabilities over multiple datasets as if they were a single virtual graph ] .pull-right[ .image-03[ ] ] --- # What is AskOmics? --- ## AskOmics Web software for **data integration** and **query** using Semantic Web. The main functionalities are: - Convert of **multiple data formats** into RDF triples and store them in a local triplestore - Generate **complex SPARQL queries** using a user-friendly interface - Support external SPARQL endpoint to **cross-reference integrated data with remote data** AskOmics can be used as a standalone software, or **with Galaxy** --- # Data integration with AskOmics --- ## Local data integration - RDF and SPARQL are good infrastructures to describe and query biological datasets, but most of them are still stored on flat files like CSV/TSV and GFF. - AskOmics converts multiple structured data formats into RDF triples * CSV/TSV * GFF * BED --- ## AskOmics generates the graph of data and the abstraction AskOmics uses the file structure (*e.g.* header of TSV files) to generate the graph of data description: **the abstraction** .image-04[ ] The rest of the files is converted to RDF triples that correspond to the data. --- ## Distant RDF data integration - Some public databases (*e.g.* [neXtProt](https://sparql.nextprot.org)) provide RDF data through a SPARQL endpoint (public access for RDF data) - To connect with a remote SPARQL endpoint, AskOmics needs its RDF abstraction - The abstraction can be generated with [abstractor](https://github.com/askomics/abstractor) ```bash pip3 install abstractor abstractor -s https://sparql.nextprot.org/sparql -o nextprot_abstraction.ttl -f turtle ``` - This abstraction can then be uploaded into AskOmics as a standard file --- # Query multiple data sources with AskOmics --- ## Query composition - Users navigate through the abstraction of local and remote data and create a path that represents a query .image-06[ ] - The query is converted to SPARQL code that is executed on the local and remote RDF data - Results are returned and can be downloaded --- ### <i class="fas fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points - RDF and SPARQL are Semantic Web technologies that come useful for data integration and querying but the technical aspects may deter end-users - AskOmics is a web platform for data integration and query using Semantic Web in a user-friendly way --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Author(s)
Xavier Garnier
Anthony Bretaudeau
Anne Siegel
Olivier Dameron
Mateo Boudet
Reviewers
Tutorial Content is licensed under
Creative Commons Attribution 4.0 International License
.