5 misconceptions about research data

Research data management is now a major scientific, economic and societal issue. Failure to apply the FAIR principles (Findable, Accessible, Interoperable, Reusable) leads not only to considerable financial losses, but also to wasted time, poor exploitation of scientific results and reduced reproducibility.

Misconception #1: ‘I own my research data.’

False for public research. In France, the 2016 Law for a Digital Republic equates research data with public data when more than half of the work is publicly funded.

Institutions (universities, public scientific and technological establishments, funding agencies) generally have legal and ethical responsibility for this data, which extends beyond the individual who produces it.
Scientists are responsible for their data but rarely own it. They must ensure its traceability, preservation and, as far as possible, availability in accordance with FAIR principles.

This responsibility is now exercised in a rapidly changing geostrategic context (international tensions, issues of sovereignty and national security). Thus, even though the FAIR principles encourage sharing, certain data may be subject to distribution restrictions for strategic reasons.

Misconception #2: ‘I can easily manage my data myself.’

False. Data management is a technical skill in its own right, just like statistical analysis. It draws on a range of knowledge: structuring, formats, metadata, documentation, data management plans, archiving, legal and ethical aspects.

Individual management can lead to data loss or corruption, confusion between files, a lack of information rendering the data unusable, and inconsistencies in formats.
More and more institutions are recruiting data stewards, data managers or data management engineers, and offering training courses. Relying on this expertise saves valuable time and improves data quality.

Misconception #3: ‘Data management has a major environmental impact.’

True and false. Data storage consumes energy, and its impact depends on how the data is organised and stored.

Poor practices increase the carbon footprint of data:

  • Multiple redundant copies, which require more storage space.
  • Failure to separate useful data from obsolete raw data.
  • Storage in non-optimised infrastructures.

By centralising, organising and documenting datasets correctly, it is possible to significantly reduce these impacts. For example, applying FAIR principles can directly reduce the number of redundant copies of data by 20% [1]. Structured, shared storage avoids the duplicate copies that appear when each person keeps their own version.
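As an illustration of how redundant copies can be spotted, here is a minimal Python sketch that groups files with byte-identical content by comparing SHA-256 digests. The function names `file_digest` and `find_duplicates` are hypothetical and not taken from any tool mentioned in this article. The sketch only detects exact copies, not renamed variants with small edits.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group the files under `root` whose content is byte-identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by at least two files.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("project_data"))` (a hypothetical directory name) returns one list of paths per set of identical files. Deciding which copy to keep remains a human decision.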

Misconception #4: ‘Having the data alone is enough!’

False. Without metadata, data is unusable. Metadata provides the context necessary to understand, interpret and reuse a dataset.
Having data without metadata is like having a tin can without a label: you can't know what's inside or how it was produced!

Essential metadata include collection methods, experimental protocols, units of measurement, and temporal and spatial context.
Without these elements, even your own data can become incomprehensible after a few months. In a well-known Nature survey of more than 1,500 researchers, reported by Monya Baker in 2016 [2], more than 70% of respondents said they had tried and failed to reproduce another scientist's experiments, and more than half had failed to reproduce their own.
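To make the idea of a "label" concrete, here is a minimal Python sketch that checks a metadata record for the elements listed above and writes it as a JSON file next to the data file. The field names, the `.metadata.json` suffix and the function names are assumptions made for illustration. They do not follow any particular metadata standard, and a real project would normally adopt the schema recommended by its community or repository.

```python
import json
from pathlib import Path

# Hypothetical minimal set of fields, based on the examples given in the text.
REQUIRED_FIELDS = (
    "collection_method",
    "protocol",
    "units",
    "temporal_coverage",
    "spatial_coverage",
)


def missing_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write the metadata next to the data file as <name>.metadata.json."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError(f"Missing metadata fields: {', '.join(missing)}")
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

Refusing to write an incomplete record is a deliberate choice in this sketch: it forces the missing context to be supplied while the person who collected the data still remembers it.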

Misconception #5: ‘Implementing FAIR principles is too expensive and time-consuming...’

False. While the initial effort of documentation and structuring may seem significant at times, studies show that it is the absence of FAIR principles that generates the highest costs.

A European Commission report [1] estimated the annual cost of non-compliance with FAIR principles to be at least €10.2 billion, of which 43.8% was related to time lost by scientists and 52.4% to the storage of unnecessarily duplicated data. Based on €302.9 billion in research expenditure in 2016, this represents approximately 3% of the total European research budget.
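The arithmetic behind these figures can be checked with a short Python sketch. It uses only the numbers quoted in the paragraph above. The constant and function names are hypothetical.

```python
# Figures quoted in the text (European Commission report, 2018).
ANNUAL_COST_BN = 10.2      # minimum annual cost of non-FAIR data, billion euros
SHARE_TIME_LOST = 0.438    # share attributed to time lost by scientists
SHARE_STORAGE = 0.524      # share attributed to storing duplicated data
RESEARCH_SPEND_BN = 302.9  # research expenditure in 2016, billion euros


def cost_breakdown() -> dict[str, float]:
    """Convert the quoted shares into amounts and a budget fraction."""
    return {
        "time_lost_bn": ANNUAL_COST_BN * SHARE_TIME_LOST,
        "storage_bn": ANNUAL_COST_BN * SHARE_STORAGE,
        "share_of_research_spend": ANNUAL_COST_BN / RESEARCH_SPEND_BN,
    }


if __name__ == "__main__":
    for name, value in cost_breakdown().items():
        print(f"{name}: {value:.3f}")
```

With these inputs, time lost accounts for roughly €4.5 billion, duplicated storage for roughly €5.3 billion, and the total for about 3.4% of the 2016 research expenditure, which the text rounds to 3%.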

Implementing FAIR principles reduces:

  • time wasted searching for information;
  • the risk of data loss or misunderstanding;
  • infrastructure costs;
  • the risk of errors and rework;
  • data duplication, which has a high environmental and financial cost.

Adopting rigorous data management is not an additional constraint, but a strategic investment: it saves time, reduces costs and improves the quality of research. Preconceived ideas must give way to a culture of data management!

For more information on data management, see the guide: https://www.pepr-agroeconum.fr/ressources-utiles/guide-gestion-des-donnees

Many thanks to Frédéric de Lamotte (https://orcid.org/0000-0003-4234-1172), Data Steward (INRAE), for his expertise and assistance in writing this article.

[1] European Commission. Cost of not having FAIR research data (2018). https://op.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1/language-en
[2] Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016). https://doi.org/10.1038/533452a

Further reading:
More information on the diversity of careers in data: https://hal-lara.archives-ouvertes.fr/hal-05265596v1
Guide to open science on research data: https://www.ouvrirlascience.fr/wp-content/uploads/2024/03/24-02-22-Donnees-FR-WEB.pdf