Comprehensive data quality for Open Science

In a SWITCH Innovation Lab, SATW is examining existing standards and measures that simplify the exchange of data among researchers.

Text: Manuel Kugler, published on 26.11.2019

In the age of Big Data, everyone generates hundreds of megabytes of data per day, and the trend is rising. Quality standards for this data are generally low, and most of it is hardly ever reused. This should not be the case in research, where expensive measuring equipment or elaborate surveys can make data costly to obtain. Making research data widely accessible is a goal of the Open Science movement, and of Open Data in particular.

Data must be FAIR

Research data management presents the international scientific community with a number of challenges, and several initiatives are currently underway in Switzerland to meet them. For example, the Swiss National Science Foundation (SNSF) has committed itself to making publicly funded research accessible to the public free of charge wherever possible. swissuniversities is developing an Open Science programme to enable Swiss universities to reuse and disseminate research data. The Swiss Academies of Arts and Sciences are also involved in this programme, and in their recently published fact sheet they make recommendations on the promotion of Open Access and Open Data. The central principles of these initiatives can be summarised by the acronym FAIR: Findable, Accessible, Interoperable and Reusable.

The documentation of research data depends on the science in question

Comprehensive data quality is central to the exchange and reuse of research data. Researchers need to know how raw data has been processed and that it has not been manipulated. This requires appropriate documentation, for example in the form of metadata. Such documentation is particularly important when data from different research disciplines is brought together: since each field generates different kinds of data, the attributes used to describe that data also vary.

Comprehensive data quality for a research data Connectome

The SWITCH foundation recently launched SWITCH Innovation Labs as an agile collaboration platform with higher education partners. To promote the Swiss Open Science ecosystem and the creation of a research data Connectome, two labs were defined: "Comprehensive data quality" and "Technologies for a research data Connectome". SATW was commissioned to carry out the first of these labs. Clear data quality metrics are essential for sharing research data across disciplines.

Besides enabling new research questions and findings, comprehensive data quality makes results easier to reproduce. So far, reproducing results can be very time-consuming or even impossible. However, the requirement to disclose and share research data can also trigger resistance among researchers. After all, disclosure increases the risk that errors will be exposed, which could damage the reputation of the researchers or their institutions. In addition, documentation should not create significant extra effort, so that researchers can concentrate on their core tasks.

Expert survey documents the state of knowledge

Through an expert survey, SATW is gathering the current state of knowledge and the measures used to achieve well-founded, comprehensive data quality in various research areas. The survey assumes that individual disciplines apply different standards and approach the issue from different angles. It identifies national needs and problems. Initial results are expected by the end of 2019, and follow-up activities will be initiated from 2020.


Update 2 March 2020:

About the author
Manuel Kugler

Manuel Kugler joined SATW in 2016 and heads the Advanced Manufacturing and Artificial Intelligence priority programmes. He holds a Master's degree in Materials Science from ETH Zurich. Before joining SATW, he first worked as a technical specialist in the group for micro- and nanosystems and then as a project manager at greenTEG AG on the development of thermoelectric heat flux sensors.

