FAIR and its Origins

Author(s): Robert Andrews, Nick Juty, Munazah Andrabi, Nicola Soranzo, Sara Morsy, Kellie Snow, Korneel Hens, Philippe Rocca-Serra, Laura Cooper, Xenia Perez Sitja, Andrew Mason, Branka Franicevic, Saskia Lawson-Tovey, Katarzyna Kamieniecka, Khaled Jum'ah, Krzysztof Poterlowicz

Overview
Questions:

What are FAIR and the FAIR Guiding Principles?

Where does FAIR come from?

Objectives:

Identify the FAIR principles and their origin.

Explain the difference between FAIR and open data.

Contextualise the main principles of FAIR around the common characteristics of identifiers, access, metadata and registration.

Requirements:

Hands-on: FAIR in a nutshell

Time estimation: 40 minutes

Supporting Materials:

FAQs

Available on these Galaxies

Possibly Working

UseGalaxy.eu

UseGalaxy.org

UseGalaxy.org.au

UseGalaxy.fr

UseGalaxy.ca

Published: Mar 26, 2024

Last modification: Mar 27, 2024

License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

PURL: https://gxy.io/GTN:T00433

Revision: 2

This tutorial introduces FAIR as an acronym, its origins and its guiding principles. Learners will be able to explain the difference between FAIR and open data and contextualise the main principles of FAIR around the common characteristics of identifiers, access, metadata and registration.

Agenda

In this tutorial, we will cover:

FAIR

Open data and FAIR

What is meant by FAIRification and FAIRness of data?

FAIR’s origins

Course structure

Glossary of FAIR terms

Useful Resources

FAIR

The concepts underlying the FAIR principles are intuitively grounded in good scientific practice, though formalising them in easily measurable terms is not straightforward. Being completely ‘FAIR’ is something of an ideal; it is often more pragmatic to be ‘FAIR enough’ for a particular purpose or use case.

FAIR’s primary goal is to maximise the reuse of data by researchers. The FAIR principles enable reuse by helping researchers share and manage their data, though FAIR is not limited to data and can also be applied to services, software, training and workflows. The word FAIR is an acronym derived from its major components, ‘F’indable, ‘A’ccessible, ‘I’nteroperable and ‘R’eusable, which form the foundation of the FAIR Guiding Principles.

Findable means that data and its metadata can be found (discovered) by humans and computers. Part of this is making rich metadata and keywords available to search engines and data repositories, so that the data they describe can be discovered.

Accessible means that once discovered, data and metadata can be accessed (downloaded) by humans and computers. Typically this means committing the resource to long-term hosting and availability, with a suitable licence and in an appropriate format.

Interoperable means that data and metadata are supplied in formats that can be easily used and interpreted by humans and computers. The file formats and terms (vocabularies) used can be integrated easily with other datasets and software.

Reusable means that metadata is rich enough to enable appropriate reuse. FAIR commonly encourages the use of community standards for data curation.

These four major components form the headings for the 15 FAIR Guiding Principles shown in Table 1.1.

The FAIR Guiding Principles

To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available

To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data

To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards

Table 1.1: The FAIR Guiding Principles as described by Wilkinson et al. 2016: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018.

Question

Look at the wording of the FAIR principles in Table 1.1. Which terms are used more than once? Which terms are you seeing for the first time?

Shared terms are: data (F1, F2, F3, F4, A1, A2, I1, I2, I3, R1, R1.1, R1.2, R1.3), metadata (F1, F2, F3, F4, A1, A2, I1, I2, I3, R1, R1.1, R1.2, R1.3), identifier (F1, F3, A1), protocol (A1, A1.1, A1.2).
Not so obviously shared terms are: vocabularies (I1, I2), access (A1, A1.1, A1.2, R1.1)
Terms which you may be seeing for the first time are: metadata, persistent identifier, searchable resource, standardised communications protocol, authentication and authorisation procedure, knowledge representation, vocabularies, qualified references, data usage license, provenance, domain-relevant community standards.

The aim of this course is to put these principles into context and familiarise learners with terms used commonly around FAIR. A glossary of terms appears at the bottom of this episode as a point of reference.

Open data and FAIR

Figure 1: FAIR is not the same as Open data.

Since FAIR promotes data sharing, it is often mistaken for Open data.

The Open Data Handbook defines Open data as “data that can be freely used, reused and redistributed by anyone.” FAIR data is commonly open; however, FAIR compliance does not mandate unrestricted access. Where FAIR data is subject to restricted access, as is often the case for sensitive data, the conditions of access must be stated for the data to be FAIR compliant. People sharing FAIR data often follow the philosophy of “as open as possible, and as closed as necessary”, thereby maximising opportunities to reuse data.

Question

Under which circumstances would restricted or closed access to FAIR data be advisable?

Where data is sensitive or subject to intellectual property restrictions. The need to protect sensitive data overrides any requirement for research data to be open access.
Note, though, that in most cases people following the FAIR principles will be looking to share their data openly. Note also that sensitive data can often be released after anonymisation, and in many cases can be made available under controlled access granted by the principal investigator or a data access committee.

What is meant by FAIRification and FAIRness of data?

FAIRification is the process of making your data FAIR compliant by applying the 15 Guiding Principles shown in Table 1.1. The extent to which you apply these principles defines the FAIRness of your data. In other words, FAIRness refers to the degree to which your data is FAIR, which implies some means of measuring its compliance.
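One way to picture "measuring compliance" is as a checklist. The Python sketch below is a toy, not an established FAIR metric: it assumes a metadata record stored as a plain dictionary with hypothetical field names (identifier, description, keywords, access_url, license, provenance), pairs a handful of principles with one yes/no test each, and reports the fraction passed.

```python
# Toy FAIRness checklist (hypothetical): each FAIR principle ID is paired with
# a simple yes/no test on a metadata record held as a plain dict.
CHECKS = {
    "F1": lambda r: bool(r.get("identifier")),
    "F2": lambda r: bool(r.get("description")) and len(r.get("keywords", [])) > 0,
    "A1": lambda r: str(r.get("access_url", "")).startswith(("https://", "http://")),
    "R1.1": lambda r: bool(r.get("license")),
    "R1.2": lambda r: bool(r.get("provenance")),
}


def fairness_score(record: dict) -> tuple[float, list[str]]:
    """Return the fraction of checks passed and the IDs of the failed checks."""
    failed = [pid for pid, check in CHECKS.items() if not check(record)]
    return (len(CHECKS) - len(failed)) / len(CHECKS), failed


# Hypothetical record on the reserved example.org domain; it has no provenance.
record = {
    "identifier": "https://repository.example.org/id/ds-001",
    "description": "Example dataset",
    "keywords": ["FAIR"],
    "access_url": "https://repository.example.org/data/ds-001",
    "license": "CC-BY-4.0",
}

score, failed = fairness_score(record)
print(f"{score:.0%} of checks passed; failed: {failed}")
# 80% of checks passed; failed: ['R1.2']
```

Real assessment tools typically apply far more detailed, community-agreed tests; the point of the sketch is only that measuring FAIRness means turning each principle into something checkable.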

FAIR’s origins

A report from the European Commission Expert Group on FAIR data describes the origins of FAIR and its development in 2014-2015 by a FORCE11 Working Group. The following exercise dips into this report and asks you to investigate some of FAIR’s history and foundation.

Question

Read page 11 of the European Commission report, under the heading “Origins and definitions of FAIR”. What benefit did the FORCE11 Working Group see to coining the word FAIR?

The report states: “a FORCE11 Working Group coined the FAIR data definition, latching onto an arresting and rhetorically useful acronym. The wordplay with fairness, in the sense of equity and justice, has also been eloquent in communicating the idea that FAIR data serves the best interests of the research community, and the advancement of science as a public enterprise that benefits society.”

Course structure

During the remainder of this course we will put the FAIR Guiding Principles into context using four FAIR characteristics, metadata, data registration, access and identifiers, devoting an episode to each. As we go, we will map content to the relevant FAIR principles and define the terms they use.

Glossary of FAIR terms

Unless stated otherwise, all terms in this glossary are mentioned in the FAIR Guiding Principles (Table 1.2) and referenced in the following episodes of this course.

The FAIR Guiding Principles

To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available

To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data

To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards

Table 1.2: Terms used by the FAIR Principles that appear in this glossary are highlighted in black.

Globally unique and persistent identifier: a reference to a digital resource such as a dataset, a document or a database, usually given as a URL, that takes the user to that resource. The persistent identifier (PID) is globally unique in the sense that it is not used to identify any other digital resource. The PID is persistent in that it enables the resource to be located over the long term, preferably permanently. For more see: How to FAIR and FAIR Cookbook, episode 5 (identifiers).
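Persistent identifiers such as DOIs are usually turned into URLs through a resolver. A minimal Python sketch, assuming the doi.org resolver and a deliberately simplified DOI pattern (real DOI syntax allows more variation than this regular expression), using the DOI of the Wilkinson et al. 2016 paper from the reference list:

```python
import re

# Simplified DOI pattern (an assumption for illustration): "10.", a numeric
# registrant code, "/", then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Normalise a DOI string and return a resolvable https URL via doi.org."""
    doi = doi.strip()
    # Strip common prefixes so "doi:10.x/y" and "https://doi.org/10.x/y" both work.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1038/sdata.2016.18"))
# https://doi.org/10.1038/sdata.2016.18
```

Because the resolver, not the publisher's website, is the stable entry point, the same URL keeps working even if the resource moves to a new location.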

Indexed in a searchable resource: repositories and catalogues that can be queried (for example through a search box) are examples of searchable resources. Indexing refers to processes within the architecture of the data repository by which (meta)data are organised so that they can be queried on defined fields. Indexing by an internet search engine (for example, Google) is another example of an indexed and searchable resource. For more see: GO FAIR.
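Indexing on defined fields can be illustrated with a small inverted index, a common data structure for search. This Python sketch uses made-up records and field names (title, keywords); production repositories rely on dedicated search engines rather than anything this simple.

```python
from collections import defaultdict

# Hypothetical metadata records standing in for a repository's holdings.
RECORDS = {
    "ds1": {"title": "Human gut microbiome survey", "keywords": ["microbiome", "gut"]},
    "ds2": {"title": "Mouse liver RNA-seq", "keywords": ["RNA-seq", "liver"]},
    "ds3": {"title": "Gut RNA-seq time course", "keywords": ["RNA-seq", "gut"]},
}


def build_index(records: dict, fields: list[str]) -> dict:
    """Map each (field, lower-cased term) pair to the set of record IDs containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for field in fields:
            value = rec.get(field, [])
            # Lists are used as-is; strings are split into words.
            terms = value if isinstance(value, list) else str(value).split()
            for term in terms:
                index[(field, term.lower())].add(rec_id)
    return index


def search(index: dict, field: str, term: str) -> list[str]:
    """Look up a term within one field; return matching record IDs, sorted."""
    return sorted(index.get((field, term.lower()), set()))


index = build_index(RECORDS, ["title", "keywords"])
print(search(index, "keywords", "rna-seq"))  # ['ds2', 'ds3']
print(search(index, "title", "gut"))         # ['ds1', 'ds3']
```

The index is built once, ahead of time, so each query is a single lookup instead of a scan through every record.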

Standardised communications protocol: an agreed set of rules that allows two computers to connect and exchange data; some protocols, such as https, also encrypt the transfer to keep it secure. Examples include the hypertext transfer protocol (http(s)) and the file transfer protocol (ftp), which permit data to be requested and downloaded by selecting a link on a webpage, or from within a script, for example using an application programming interface (API). For more see the Australian Research Data Commons.
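As an example of requesting (meta)data over https from within a script, the sketch below uses Python's standard urllib module to ask the doi.org resolver for machine-readable citation metadata through HTTP content negotiation. It assumes the DOI's registration agency supports the application/vnd.citationstyles.csl+json media type, as Crossref and DataCite generally do. Only building the request runs offline; fetch_metadata needs network access.

```python
import json
import urllib.request


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build an HTTPS GET request asking doi.org for CSL JSON metadata."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Send the request and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=timeout) as resp:
        return json.load(resp)


req = build_metadata_request("10.1038/sdata.2016.18")
print(req.get_method(), req.full_url)
# GET https://doi.org/10.1038/sdata.2016.18
```

With a network connection, fetch_metadata("10.1038/sdata.2016.18") should return a dictionary of citation fields; the same URL opened in a browser returns the human-readable landing page instead, which is the protocol serving both humans and computers.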

Universally implementable: in the context of a standardised communications protocol such as http(s), universally implementable means that anyone can implement the protocol, so it is supported by many resources and tools, for example web browsers such as Firefox and Chrome, and Unix commands such as wget. For more see the Australian Research Data Commons.

Authentication and authorisation procedure: data repositories with authentication and authorisation procedures generally require a login (and password) to access (meta)data. For more see the Australian Research Data Commons.
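From a script, authentication usually means sending credentials with each request. A minimal Python sketch, assuming a hypothetical repository URL and an API token sent in a bearer Authorization header; real repositories each document their own scheme.

```python
import urllib.request


def authorised_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries an API token.

    The server uses the token to authenticate the caller (who are you?) and then
    applies its authorisation rules (may you see this resource?).
    """
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical endpoint and token; nothing is sent until urlopen() is called.
req = authorised_request("https://repository.example.org/api/datasets/42", "my-secret-token")
print(req.get_header("Authorization"))  # Bearer my-secret-token
```

Note that the protocol (https) stays the same; authentication and authorisation are layered on top of it, which is what principle A1.2 asks for.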

Language for knowledge representation: in the context of (meta)data exchange between computers, (meta)data should be in formats that are universally recognised (interoperable standards). For more see GO FAIR and FAIR Cookbook.

Vocabularies: a vocabulary (or controlled vocabulary) is a dictionary of terms you can use when producing (meta)data. Controlled vocabularies are often shared between databases and communities, so using them allows data from different sources to be merged (interoperable), based on a shared understanding of the concepts. Ontologies are related to vocabularies: the terms in an ontology are organised by the relations between them. A commonly used example is the NCBI taxonomy, where the term ‘Homo sapiens’ belongs to a hierarchy of parent terms such as ‘Primates’ and ‘Mammalia’. The ontology defines both the vocabulary and the parent/child relationships. For more see Ten simple rules for making a vocabulary FAIR (Cox et al. 2021).
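The parent/child structure of an ontology can be sketched as a lookup from each term to its parent. The Python sketch below hard-codes an abridged, hypothetical fragment of the taxonomy (the real NCBI taxonomy has many more intermediate ranks between these terms) and shows how a shared hierarchy lets a query for ‘Primates’ also find data annotated with more specific terms.

```python
# Abridged, hypothetical fragment of a taxonomy: each term points to its parent.
# The real NCBI taxonomy has many more intermediate ranks between these terms.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
    "Mammalia": None,
}


def ancestors(term: str) -> list[str]:
    """Follow child -> parent links up to the root, nearest parent first."""
    lineage = []
    parent = PARENT.get(term)
    while parent is not None:
        lineage.append(parent)
        parent = PARENT.get(parent)
    return lineage


def datasets_about(term: str, annotations: dict[str, str]) -> list[str]:
    """Return datasets annotated with `term` or with any of its descendants."""
    return sorted(
        ds for ds, annotated in annotations.items()
        if annotated == term or term in ancestors(annotated)
    )


# Hypothetical datasets from different sources, annotated at different levels.
annotations = {"ds1": "Homo sapiens", "ds2": "Hominidae", "ds3": "Mammalia"}

print(ancestors("Homo sapiens"))                # ['Homo', 'Hominidae', 'Primates', 'Mammalia']
print(datasets_about("Primates", annotations))  # ['ds1', 'ds2']
```

Free-text labels such as "human" and "H. sapiens" could not be merged this way; a shared vocabulary with explicit relations is what makes the query work across sources.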

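Qualified references: references to other (meta)data that also state the nature of the relationship, for example that one dataset is derived from another, rather than only stating that the two are related. For more see GO FAIR.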
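To illustrate the difference between qualified and unqualified references, the Python sketch below stores each link together with its relation type. The record, its identifiers (on the reserved example.org domain) and the relation names, written in the style of DataCite relation types, are illustrative assumptions.

```python
# Hypothetical record for a processed dataset. Identifiers use the reserved
# example.org domain; relation names are written in the style of DataCite
# relation types and are illustrative only.
record = {
    "identifier": "https://repository.example.org/id/processed-v2",
    "related": [
        {"relation": "IsDerivedFrom", "target": "https://repository.example.org/id/raw-v1"},
        {"relation": "IsSupplementTo", "target": "https://repository.example.org/id/article-7"},
    ],
}

# An unqualified version keeps only the links and loses what each one means.
unqualified = [ref["target"] for ref in record["related"]]


def targets(rec: dict, relation: str) -> list[str]:
    """Return the identifiers linked to `rec` by the given relation type."""
    return [ref["target"] for ref in rec["related"] if ref["relation"] == relation]


print(targets(record, "IsDerivedFrom"))
# ['https://repository.example.org/id/raw-v1']
```

With only the unqualified list, software could tell that the three resources are connected, but not which one is the raw data behind the processed dataset.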

Data usage license: describes the legal terms under which others may use your data. For more details on considerations in selecting a licence, or to find out more about the types of licence available, see RDMkit.

Data provenance: refers to metadata describing the origin of a piece of data, including information such as version, original location of the data, and usually an audit trail up to the current version. For more see RDMkit.
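A provenance record can be sketched as the original location, a version number and an append-only audit trail. The Python dataclass below, including its field names and the example.org location, is a hypothetical illustration rather than a standard provenance model such as W3C PROV.

```python
from dataclasses import dataclass, field


@dataclass
class Provenance:
    """Hypothetical provenance record: origin, current version and audit trail."""
    original_location: str
    version: int = 1
    history: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start the audit trail with the origin of the data.
        if not self.history:
            self.history.append(f"v{self.version}: original data at {self.original_location}")

    def record_change(self, description: str) -> None:
        """Bump the version and append an entry describing the change."""
        self.version += 1
        self.history.append(f"v{self.version}: {description}")


prov = Provenance(original_location="https://repository.example.org/raw/1")
prov.record_change("removed low-quality samples")
prov.record_change("converted to CSV")
print(prov.version)      # 3
print(prov.history[-1])  # v3: converted to CSV
```

Because entries are only ever appended, anyone reusing version 3 can trace it step by step back to the original data.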

Community standards: standard guidelines used to structure and exchange data, usually supported by community-developed resources and/or software. In the context of (meta)data, community standards often relate to the standardised ontologies used by a research domain, and to minimum information guidelines that allow data to be interoperable. For more see FAIRDOM.

Machine-readable: though this term is not referenced in the FAIR principles, it is often discussed within the FAIR context. Machine-readable (meta)data is supplied in a structured format that can be read by a computer. For more see Open Data Handbook and RDMkit.
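To show what machine-readable means in practice, the Python sketch below holds some hypothetical metadata as structured JSON, which a program can read field by field, unlike the equivalent free-text sentence. Encoding the licence as the SPDX identifier CC-BY-4.0 and the date in ISO 8601 format are assumptions about how one might represent these fields.

```python
import json

# The same hypothetical metadata as free text (needs a human to interpret)...
free_text = "Made in May 2023, licence CC BY 4.0, keywords FAIR and example."

# ...and as structured JSON that a program can read field by field.
metadata = {
    "title": "Example dataset",
    "created": "2023-05-01",   # ISO 8601 date
    "license": "CC-BY-4.0",    # SPDX licence identifier
    "keywords": ["FAIR", "example"],
}

serialised = json.dumps(metadata, indent=2)  # text that can be stored or exchanged
parsed = json.loads(serialised)              # any JSON-aware tool can parse it back

print(parsed["license"])      # CC-BY-4.0
print(parsed["created"][:4])  # 2023
```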

(Meta)data: is shorthand for ‘metadata and data’.

Useful Resources

The published FAIR Guiding Principles: Wilkinson et al. 2016, The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018.
Recipes for data FAIRification, written by domain experts giving real-world examples: FAIR Cookbook
Documentation and frameworks for data FAIRification. Each of the 15 FAIR principles is put into context with real data examples: GO FAIR
FAIR walkthrough using examples from across all academic disciplines: How to FAIR


Key points

FAIR stands for Findable, Accessible, Interoperable and Reusable.

Metadata, Identifiers, Registration and Access are 4 key components in the process of FAIRification.

FAIR data is as open as possible, and as closed as necessary.

Frequently Asked Questions

Have questions about this tutorial? Have a look at the available FAQ pages and support channels

References

Wilkinson, M. D., M. Dumontier, I. J. J. Aalbersberg, G. Appleton, M. Axton et al., 2016 The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3: 10.1038/sdata.2016.18
Cox, S. J. D., A. N. Gonzalez-Beltran, B. Magagna, and M.-C. Marinescu, 2021 Ten simple rules for making a vocabulary FAIR (S. Markel, Ed.). PLOS Computational Biology 17: e1009041. 10.1371/journal.pcbi.1009041

Glossary

FAIR: Findable, Accessible, Interoperable, Reusable


Citing this Tutorial

Robert Andrews, Nick Juty, Munazah Andrabi, Nicola Soranzo, Sara Morsy, Kellie Snow, Korneel Hens, Philippe Rocca-Serra, Laura Cooper, Xenia Perez Sitja, Andrew Mason, Branka Franicevic, Saskia Lawson-Tovey, Katarzyna Kamieniecka, Khaled Jum'ah, Krzysztof Poterlowicz, FAIR and its Origins (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/fair/tutorials/fair-origin/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{fair-fair-origin,
author = "Robert Andrews and Nick Juty and Munazah Andrabi and Nicola Soranzo and Sara Morsy and Kellie Snow and Korneel Hens and Philippe Rocca-Serra and Laura Cooper and Xenia Perez Sitja and Andrew Mason and Branka Franicevic and Saskia Lawson-Tovey and Katarzyna Kamieniecka and Khaled Jum'ah and Krzysztof Poterlowicz",
	title = "FAIR and its Origins (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/fair/tutorials/fair-origin/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

                   

Funding

These individuals or organisations provided funding support for the development of this resource

DASH UK

This Fellowship was funded through the ELIXIR-UK DaSH project as part of the UKRI Innovation Scholars: Data Science Training in Health and Bioscience call (DaSH). (MR/V038966/1). The project aims to embed Research Data Management (RDM) know-how into UK universities and institutes by producing and delivering training in FAIR data stewardship using ELIXIR-UK knowledge and resources.

Congratulations on successfully completing this tutorial!

Do you want to extend your knowledge?
Follow one of our recommended follow-up trainings:

Hands-on: Metadata
