Data, a Treasure to Be Shared
Data is the touchstone of any scientific undertaking. It makes it possible to establish new discoveries, favour one hypothesis over another, disprove certain theories, and conceive new ones. Without data, science would be blind. Yet although a significant percentage of scientific articles is published in open access each year, the evidence on which this research is based remains surprisingly difficult to access. Worse still, in some cases it is impossible to get hold of a publication's original data. "Research data sharing lags behind publications," concedes Marin Dacos, in charge of the National Plan for Open Science launched during the summer of 2018 by the French Ministry of Higher Education, Research and Innovation. "In many disciplines, there is no or little culture of sharing or of documenting data in order to facilitate its reuse," he laments.
Some domains nevertheless led the way a long time ago. "In 1977, astronomers established a format for exchanging numerical data in order to share astronomical observations and all associated information (e.g. location, conditions of observation, type of instrument, etc.)," points out Françoise Genova, a researcher who directed the Strasbourg-based CDS between 1995 and 2015, and now builds on this pioneering experience to guide reflection on data openness at the national, European, and international levels.
Data that is easy to find, accessible, interoperable, and reusable
This reflection is taking place in an international context that has undergone significant change in recent years. Large public research organisations, such as the National Science Foundation in the United States or the European Research Council, now want the data produced by the scientific programmes they finance to be available in open access, "and yet closed when necessary," as CNRS Chief Research Officer Alain Schuhl points out, citing the example of a patent registration associated with the data.
In an effort to better support researchers along this path, this commitment to openness comes with a series of recommendations that can be summarised by a new concept with the ring of a slogan: FAIR, for "Findability, Accessibility, Interoperability, and Reusability". While in the age of the Internet and search engines the first two criteria (findability and accessibility) do not seem to pose technical difficulties, the latter two still lack universal solutions: "The diversity of scientific data and variety of practices require addressing the problem discipline by discipline," stresses Françoise Genova, taking a realistic approach to the situation.
The interoperability and reusability of data require that it be accompanied by a series of descriptions so that it can be properly interpreted and used jointly with other data. A measurement on its own means nothing if the conditions in which it was obtained are not known. Simply saying that it is 10 °C makes this isolated data point unusable. Specifying that it was 10 °C on the morning of 12 November 2019, according to a certain type of thermometer, and that the measurement was taken in the middle of Paris, gives the information its meaning, so that it can now be combined with other temperature readings.
Finally, in order to be easily reusable as required by the final criterion, this data and its detailed description should also be available in a standard format, which is to say recorded in a document that everyone can read, as is the case for exchanging music and videos on the Internet. "While certain scientific disciplines such as astronomy, crystallography, and genomics have already made progress on these issues, there is still room for improvement in many other domains with regard to the implementation of international standards," the scientist adds.
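To make the temperature example concrete, here is a minimal sketch in Python of a reading stored together with its context and written out in a widely readable format (JSON). The class name, the field names, and the thermometer label are assumptions made for illustration only. They do not correspond to any metadata standard mentioned in this article, and each discipline would define its own.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TemperatureReading:
    """A measurement plus the context needed to interpret it.

    All field names are hypothetical, chosen for this illustration.
    """
    value: float
    unit: str
    measured_on: str   # date in ISO 8601 form
    period: str        # part of the day
    location: str
    instrument: str


def to_json(reading: TemperatureReading) -> str:
    """Serialise the reading and its description as JSON text."""
    return json.dumps(asdict(reading), sort_keys=True)


def from_json(text: str) -> TemperatureReading:
    """Rebuild a reading from the JSON text produced by to_json."""
    return TemperatureReading(**json.loads(text))


# On its own, this number cannot be reused by anyone else.
bare_value = 10.0

# The same number, accompanied by its description.
reading = TemperatureReading(
    value=10.0,
    unit="degC",
    measured_on="2019-11-12",
    period="morning",
    location="central Paris",
    instrument="thermometer (type unspecified, placeholder)",
)

encoded = to_json(reading)
restored = from_json(encoded)
```

Because the description travels with the value, another team can read the file back and combine it with its own readings without having to ask how, where, or when the measurement was made.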
The Research Data Alliance (RDA), an international organisation with 8,800 members in 137 countries, was created to support scientists in this effort. "It provides a framework for discussing and sharing know-how between scientific communities across the globe," Genova enthuses. In France, the Opidor (Data Management Plan for Optimized Sharing and Interoperability of Research Data) platform, which was developed at the CNRS by the Institute for Scientific and Technical Information (INIST), anticipated this evolution a few years ago. It offers a number of tools that assist researchers who want to build a data management plan in conformity with FAIR recommendations: "This data management plan has actually been required by the French National Research Agency since 2019 for all funding applications."
A reflection on all levels
In addition to thorny issues surrounding the form that this sharing will take, the question arises of the physical resources needed to store this data. Who will physically host and provide long-term access to this considerable mass of information? "This reflection must be conducted on all levels — whether regional, national, or international — in conjunction with other research organisations," stresses Alain Schuhl.
Beyond these technical challenges, "the evaluation of researchers remains an obstacle to adopting these best practices," Genova indicates. "People who become involved in these matters actually spend less time conducting their own research. They take more risks for their career. It is therefore essential that research organisations join forces on the issue." In any event, this is the declared goal of the National Plan for Open Science: to encourage and promote these practices among all French actors in public research. There is more at stake than just data openness, as an entirely new and promising branch of science is emerging in its wake. "The possibility of gathering vast sets of information from different sources could help extract new knowledge thanks to mining software," Alain Schuhl predicts. Open data is therefore not only about sharing, but also about opening new horizons for science.