FAIR data management solutions

Author(s): Katarzyna Kamieniecka, Krzysztof Poterlowicz
Editor(s): Helena Rasche

Overview
Questions:

Is there a reproducibility crisis?

What can go wrong with data analysis?

Objectives:

Learn best practices in data management

Learn how to introduce computational reproducibility in your research

Requirements:

Hands-on: FAIR in a nutshell

Time estimation: 10 minutes

Supporting Materials:

FAQs

Available on these Galaxies

Possibly Working

UseGalaxy.eu

UseGalaxy.org

UseGalaxy.org.au

UseGalaxy.fr

UseGalaxy.ca

Published: May 30, 2023

Last modification: Jan 16, 2025

License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

PURL: https://gxy.io/GTN:T00349

Revision: 5

The FAIR (Findable, Accessible, Interoperable, Reusable) principles for data stewardship laid the foundation for sharing and publishing digital assets, especially data. They emphasise machine-actionability: all digital assets should be shared in a way that enables maximum use and reuse, by machines as well as by humans.

This tutorial is a short introduction to the FAIR data management framework. You can find out more at the FAIR Pointers online course.

Agenda

In this tutorial, we will cover:

Data management planning

Data description and collection or reuse of existing data

Documentation and data quality

Storage and backup

Legal and ethical requirements

Conclusion

Data management planning

In recent years we have witnessed a data explosion: from 1982 to the present, the volume of sequence data in GenBank has doubled approximately every 18 months. The growing amount of available data, combined with an expanding range of tools and computational solutions, has contributed to a reproducibility crisis. Having a data management plan (DMP) in place is essential to achieve FAIR data management. DMPs are often described as living documents and should be updated as circumstances change.
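To make the doubling rate concrete, here is a back-of-the-envelope sketch. It assumes idealised exponential growth with a constant 18-month doubling time (real growth is not perfectly smooth), so the growth factor after t years is 2 raised to the number of 18-month periods in t years. The function name is illustrative.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return how many times larger a collection becomes after `years`,
    assuming idealised exponential growth with a fixed doubling time."""
    # Number of doubling periods that fit into the interval.
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (1.5, 3, 12):
        print(f"{years:>4} years -> x{growth_factor(years):,.0f}")
```

At this rate, twelve years correspond to eight doublings, i.e. roughly a 256-fold increase in data volume.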

There are several ways to set up a FAIR (Findable, Accessible, Interoperable, Reusable) data management plan (DMP). One is to map each principle to an area of data management:

Findable (F): Data description and collection or reuse of existing data
Accessible (A): Retrieval by identifier over a standardised communication protocol (e.g. HTTP, HTTPS), with authentication and authorisation where necessary
Interoperable (I): Documentation and data quality
Reusable (R): Storage and backup supported by legal and ethical requirements
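The mapping above can be kept as a simple checklist while drafting a plan. The sketch below is hypothetical: the section labels paraphrase the list above, and `unaddressed_principles` is an illustrative helper, not part of any DMP tool.

```python
from typing import Dict, List

# Hypothetical checklist: the DMP area that addresses each FAIR principle,
# paraphrasing the mapping listed in the text.
FAIR_SECTIONS: Dict[str, str] = {
    "F": "Data description and collection or reuse of existing data",
    "A": "Access conditions and protocols",
    "I": "Documentation and data quality",
    "R": "Storage, backup, and legal and ethical requirements",
}


def unaddressed_principles(dmp: Dict[str, str]) -> List[str]:
    """Return the FAIR letters whose DMP section is missing or blank."""
    return [
        letter
        for letter, section in FAIR_SECTIONS.items()
        if not dmp.get(section, "").strip()
    ]


if __name__ == "__main__":
    # Placeholder draft: only one section has been written so far.
    draft_dmp = {
        "Data description and collection or reuse of existing data": "Describe datasets here.",
        "Documentation and data quality": "",
    }
    print(unaddressed_principles(draft_dmp))  # ['A', 'I', 'R']
```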

Data description and collection or reuse of existing data

Legacy datasets from institutional repositories or digital library collections can be FAIRified retrospectively to enable their reuse. Support for data collection and development can be provided throughout the data life cycle, followed by change management and capacity improvement.

Multi-part FAIR research projects need a way of packaging, describing and sharing their data to promote its reuse. Data sharing agreements define the purpose of data sharing, set out roles and responsibilities, and specify legal requirements, e.g. for data security.

An institutional aim should be to create an integrated view of, and context for, fragmented resources using their persistent identifiers (PIDs) and metadata. To make datasets findable, these metadata need to be as widely available as possible.
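A minimal sketch of machine-readable metadata built around a PID, assuming a DOI as the identifier. DOIs resolve through the doi.org proxy, so a bare DOI can be turned into an HTTPS link. The required field set, the helper names and the record values (including the DOI) are placeholders for illustration, not a standard schema such as DataCite.

```python
import json
import re

# Simplified DOI check: "10." + 4-9 digit registrant code + "/" + suffix.
# Real DOIs can be more varied, so treat this as a sanity check only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical minimum field set for this sketch (not a published schema).
REQUIRED_FIELDS = ("identifier", "title", "creators", "license", "description")


def resolver_url(doi: str) -> str:
    """Turn a bare DOI into an HTTPS link on the doi.org resolver."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def missing_fields(record: dict) -> list:
    """List required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


if __name__ == "__main__":
    # Placeholder record: the DOI and all values are illustrative only.
    record = {
        "identifier": "10.1234/example-dataset",
        "title": "Example dataset",
        "creators": ["A. Researcher"],
        "license": "CC-BY-4.0",
        "description": "",
    }
    print(resolver_url(record["identifier"]))  # https://doi.org/10.1234/example-dataset
    print(missing_fields(record))              # ['description']
    print(json.dumps(record, indent=2))        # machine-readable form to share
```

Publishing the record as JSON (or another standard serialisation) is what lets machines, not just people, find the dataset through its identifier.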

Data management also covers enhancing reproducibility, quality and transparency by ensuring information flow and by showcasing secondary uses of data. Promoting hands-on experience with data, through events and other activities, builds a collaborative environment for reproducible science.

Documentation and data quality

Drawing on local knowledge and encouraging best practices at the departmental level is an effective way to provide guidance on a variety of standards and methods. Implementing FAIR data practices within an institution requires resources and infrastructure. To increase the likelihood of data reuse, several FAIR requirements can be satisfied using freely available guidelines and initiatives (e.g. RDMkit, the FAIR Cookbook, the ELIXIR-UK DaSH Fellowship initiative) and repositories (e.g. Zenodo, Harvard Dataverse and figshare).

Storage and backup

Systems for storage, backup and collaboration depend upon technical infrastructure. The ‘3-2-1 rule’ is a standard backup strategy for research data: keep three copies of the data, on at least two different types of storage media, with at least one copy off-site (for example, two local copies and one off-site). Data, metadata and other research artefacts, such as ontologies, software, documentation and papers, must all be kept in locations where they are adequately safeguarded, backed up and accessible, to maximise their potential for reuse. Appropriate access management is essential, as are backup and restoration services that protect researchers against data loss, theft, malfunctioning computers or storage media, and accidental deletion or inadvertent alteration of the data.
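A backup plan can be checked against the 3-2-1 rule mechanically. This sketch assumes the common formulation of the rule (three copies, at least two different storage media, at least one off-site); the `Copy` class and the example locations and media are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    """One copy of a dataset (all fields are illustrative)."""
    location: str
    medium: str      # e.g. "laptop SSD", "NAS", "cloud object storage"
    offsite: bool


def follows_3_2_1(copies: List[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    plan = [
        Copy("working copy", "laptop SSD", offsite=False),
        Copy("department server", "NAS", offsite=False),
        Copy("institutional cloud", "cloud object storage", offsite=True),
    ]
    print(follows_3_2_1(plan))      # True
    print(follows_3_2_1(plan[:2]))  # False: only two copies, none off-site
```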

Repository services are the fundamental infrastructure component of the FAIR research data lifecycle. They provide access to the data, a persistent identifier, and the descriptive metadata that support interoperability. Their services can include basic data storage, resource discovery, managing access to and use of private information, facilitating peer review of data related to publications, digital preservation, and more.
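Repositories and backup systems commonly detect silent changes to stored files with checksums, a practice known as fixity checking in digital preservation. The sketch below assumes SHA-256 as the hash and a simple in-memory manifest; the function names and the example file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file's path (relative to `folder`) to its SHA-256 checksum."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """Return manifest entries that are now missing or whose checksum differs."""
    current = build_manifest(folder)
    return [name for name, checksum in manifest.items()
            if current.get(name) != checksum]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "reads.csv").write_text("id,value\n1,42\n")
        manifest = build_manifest(root)
        (root / "reads.csv").write_text("id,value\n1,43\n")  # silent edit
        print(changed_files(root, manifest))  # ['reads.csv']
```

Files added after the manifest was built are not reported; a fuller check would also compare the two key sets.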

The OpenAIRE repository guide advises users to check for a suitable repository in the following order:

The most effective option (if available) is to deposit the data in an established, dedicated (external) data archive or repository that caters specifically to the research domain, following recognised discipline-specific standards.
Making use of institutional data repositories is the second-best option.
If none of those options is practical, a free data repository should be used.
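The order of preference above amounts to a simple fallback rule. In this sketch the function name is hypothetical, and the default of Zenodo for the free repository is an assumption (it is one of the repositories mentioned earlier); any suitable free repository could be substituted.

```python
from typing import Optional


def choose_repository(
    domain_repository: Optional[str],
    institutional_repository: Optional[str],
    free_repository: str = "Zenodo",  # assumption: any free repository works here
) -> str:
    """Pick a repository following the order of preference in the text."""
    # 1. A dedicated, discipline-specific archive, if one exists.
    if domain_repository:
        return domain_repository
    # 2. Otherwise, the institutional data repository.
    if institutional_repository:
        return institutional_repository
    # 3. Otherwise, a free data repository.
    return free_repository


if __name__ == "__main__":
    print(choose_repository("European Nucleotide Archive (ENA)", "University repository"))
    print(choose_repository(None, "University repository"))
    print(choose_repository(None, None))
```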

Up-to-date lists of available registered data repositories can be found at re3data and FAIRsharing.

Legal and ethical requirements

An institutional support network (data stewards, ethics boards, and IP, legal and financial offices) should guide researchers in meeting their data management responsibilities and safeguarding their resources.

Conclusion

Planning how to FAIRify your data in the early phases of your research saves time and resources. To put this into action, write a data management plan (DMP). A DMP is where you outline your data collection, storage, processing, sharing and disposal procedures. Planning the management and FAIRification of your data reduces the possibility of issues down the road, whether practical, legal or technical.
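The procedures a DMP outlines can also be captured in a structured form, which makes gaps easy to spot and supports treating the plan as a living document. This is a hypothetical skeleton, not an implementation of any standard; community standards for machine-actionable DMPs, such as the RDA DMP Common Standard, exist and define their own fields.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical skeleton covering the procedures a DMP should outline."""
    collection: str = ""
    storage: str = ""
    processing: str = ""
    sharing: str = ""
    disposal: str = ""
    # DMPs are living documents, so keep a simple revision history.
    revisions: List[date] = field(default_factory=list)

    def _section_names(self) -> List[str]:
        return [f.name for f in fields(self) if f.name != "revisions"]

    def missing_sections(self) -> List[str]:
        """Names of text sections that are still empty."""
        return [name for name in self._section_names()
                if not getattr(self, name).strip()]

    def revise(self, section: str, text: str, on: date) -> None:
        """Update one section and record when the plan changed."""
        if section not in self._section_names():
            raise ValueError(f"unknown section: {section}")
        setattr(self, section, text)
        self.revisions.append(on)


if __name__ == "__main__":
    dmp = DataManagementPlan(collection="Describe instruments and formats here.")
    dmp.revise("storage", "Follow the 3-2-1 backup rule.", on=date(2025, 1, 16))
    print(dmp.missing_sections())  # ['processing', 'sharing', 'disposal']
```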

Keep in mind that creating FAIR data is a complex process. Consider how you can make your data FAIR one step at a time, at each stage of the creation, collection, documentation, storage, sharing, archiving and preservation processes. Incorporating data documentation lays the framework for the rest of your study planning. Imagine you want to use a dataset generated by another researcher: how would you like to be able to find it? We hope this quick introduction to FAIR data management solutions will not only improve your own experience with data but also help you influence others through your guidance and FAIRified resources.

You've Finished the Tutorial

Key points

FAIR data management enables machines to find and use data automatically, with minimal human intervention.

Frequently Asked Questions

Have questions about this tutorial? Have a look at the available FAQ pages and support channels

Glossary

DMP: Data Management Plan
FAIR: Findable, Accessible, Interoperable, Reusable
PID: Persistent Identifier

Feedback

Did you use this material as an instructor? Feel free to give us feedback on how it went.
Did you use this material as a learner or student? Feel free to leave feedback via the feedback form.

Citing this Tutorial

Katarzyna Kamieniecka, Krzysztof Poterlowicz, FAIR data management solutions (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/fair/tutorials/data-management/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{fair-data-management,
	author = "Katarzyna Kamieniecka and Krzysztof Poterlowicz",
	title = "FAIR data management solutions (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/fair/tutorials/data-management/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}


Funding

These individuals or organisations provided funding support for the development of this resource

DASH UK

This Fellowship was funded through the ELIXIR-UK DaSH project as part of the UKRI Innovation Scholars: Data Science Training in Health and Bioscience call (DaSH). (MR/V038966/1). The project aims to embed Research Data Management (RDM) know-how into UK universities and institutes by producing and delivering training in FAIR data stewardship using ELIXIR-UK knowledge and resources.
