
Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic



The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data is collected all over the world and needs to be integrated and made available to other researchers quickly. However, the various heterogeneous information systems that are used in hospitals can result in fragmentation of health data over multiple data ‘silos’ that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients are not prepared to be reused efficiently and in a timely manner. There is a need to adapt the research data management in hospitals to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR.


In this paper, we present our FAIR approach to transform COVID-19 observational patient data collected in the hospital into machine actionable digital objects to answer medical doctors’ research questions. With this objective, we conducted a coordinated FAIRification among stakeholders based on ontological models for data and metadata, and a FAIR based architecture that complements the existing data management. We applied FAIR Data Points for metadata exposure, turning investigational parameters into a FAIR dataset. We demonstrated that this dataset is machine actionable by means of three different computational activities: federated querying of patient data together with existing open knowledge sources across the world through the Semantic Web, implementing Web APIs for data query interoperability, and building applications on top of these FAIR patient data for FAIR data analytics in the hospital.


Our work demonstrates that a FAIR research data management plan based on ontological models for data and metadata, open Science, Semantic Web technologies, and FAIR Data Points provides a data infrastructure in the hospital for machine actionable FAIR Digital Objects. This FAIR data is prepared to be reused for federated analysis, linkable to other FAIR data such as Linked Open Data, and reusable to develop software applications on top of them for hypothesis generation and knowledge discovery.


The COVID-19 pandemic has challenged healthcare and research data management systems worldwide to provide reusable patient data for rapid and efficient translational research. Clinical data, laboratory measurements, and various omics data such as transcriptomics and metabolomics, are routinely collected from hospitalized COVID-19 patients to inform medical doctors about patients’ health status and to support research on treatment options. Analysing data integrated from multiple sources in a hospital, complemented with data from other hospitals and public knowledge bases, can generate critical information about disease mechanisms to support diagnosis, prognosis and decisions on interventions. However, research and clinical data are often not prepared for instant secondary use involving multiple sources. This was already an obstacle for efficient clinical and biomedical research in general, but a pandemic of a poorly understood novel disease that overloads hospitals’ capacity has revealed the significance of this problem.

Integrative analysis is challenged by the software systems used to collect these various types of data from patients in hospitals. Different formats may be used (e.g. CSV or JSON) and the semantics of data are often underspecified and captured in a proprietary syntax or by different standards (e.g. HL7 FHIR or OpenEHR). This can result in fragmentation over multiple ‘silos’ that are not sufficiently interoperable for instant computational analysis. Reuse and reproducibility are further hampered by missing or unstandardised provenance, such as the time and date at which data were collected (e.g. scans may be performed on a different day than blood measurements). Furthermore, to expand analysis beyond one hospital, information on consent and on the regulations that control data access, reuse, and sharing is often unclear and not easily assessable. Complete harmonization of access regulations between institutes and countries is not realistic, but analysis could still be efficient if access regulations were at least computationally assessable.

Ideally, hospital systems are set up with integrative, federated data analytics in mind. Global leaders in data science have proposed that this can be achieved by applying agreed upon standards to make data globally findable, accessible, interoperable, and reusable for humans and computers, also referred to as ‘the FAIR principles’ [1]. Indeed, projects such as the GO FAIR Virus Outbreak Data Network (VODAN) [2], the ZonMW Covid program [3], the Trusted World of Corona (TWOC) [4], and the ELIXIR Covid project [5] embrace FAIR principles as a key element of their COVID-19 data management strategy. A quintessential objective is turning data and data containers into machine actionable FAIR Digital Objects (FDOs), in this paper defined as resources in a digital, machine understandable form including explanatory metadata and addressable by a globally unique persistent and resolvable identifier; a formal framework for FDOs is under development, see [6, 7]. This will optimize the ability to integrate and visualise data from many sources, facilitate fine-grained data access regulation, and allow for decentralised and machine assisted analysis [8]. The latter is further enabled by the development of infrastructure that supports ‘data visiting’ [9, 10]. This is attractive for clinical data because (i) existing systems can be complemented with data visiting functions, thereby keeping their other functions in place, and (ii) the output of an analysis is generally less privacy sensitive than the input. In Europe, the General Data Protection Regulation (GDPR) supports data visiting by requiring that access regulations for personal data are clearly defined [11].

Methods to facilitate the implementation of FAIR principles, or ‘FAIRification’, are currently being investigated in multiple projects and initiatives. We use ‘FAIRification’ to denote the process towards achieving FAIRification goals, irrespective of specific implementation choices per principle. We have previously published a generic workflow [12], as a basis for specialised variations such as for rare disease registries [13]. Related activities are the development of the FAIR cookbook in the FAIRplus project [14, 15], the three point framework for FAIRification of metadata by the VODAN GO FAIR network [16], and the organisation of a FAIRification steward team to help rare disease registries reach their FAIR goals [17]. The application of FAIR principles in hospitals is starting to be adopted in Europe as a key strategy for nationwide healthcare research data infrastructure [18, 19]. Cross connections through multinational collaborations, such as in ELIXIR and GO FAIR, and domain specific collaborations such as via globally operating patient organisations, could support convergence of FAIR implementation choices to further facilitate the adoption of FAIR principles and thereby efficient analysis across multiple hospitals in multiple countries.

At the Leiden University Medical Centre (LUMC), the implementation of FAIR principles for COVID-19 data is part of a multidisciplinary collaboration, coined ‘The BEAT-COVID project’. This collaboration was initiated in March 2020 to face the multiple analysis challenges of the COVID-19 pandemic. The LUMC is a tertiary care, teaching and research hospital in the Netherlands that encompasses clinical and research groups with expertise on immunology, biomedicine, data management and data science. The groups work together on collecting and sharing different types of patient data, analyses, findings, expertise, and novel solutions implemented in the hospital (e.g. see [20]). One of the challenges is to implement a FAIR Research Data Management plan (RDM) comprising FAIRification of priority resources and a FAIR based architecture that complements the existing data management systems in the hospital.

We hypothesise that the use of existing ontologies and ontological models will enable turning patient data into machine readable digital objects that are prepared for secondary use. Our objective is to develop ontological models that represent and link the data records and metadata of the datasets in the existing LUMC data management systems (Fig. 1). In our ontology centred approach, data can stay in existing systems but are made accessible ‘in terms of’ the central data linking model to create a virtual warehouse. We reused existing ontological models such as the core ontological model for common data elements developed in the European Joint Programme on Rare Diseases (EJP RD) for patient registries [21], and the Data Catalogue Vocabulary (DCAT) for datasets [22]. The metadata is made accessible by a FAIR Data Point (FDP) instance [23]. FDPs ensure that BEAT-COVID resources can be found and used through querying machine readable metadata. The metadata include pointers that analysis workflows can use to access the content of the resource, if access is permitted. By using ontologies, patient data in the hospital are virtually linked not only with other ontologically described data in the hospital, but also with public Linked ‘Open’ Data (LOD). This can boost the potential for knowledge discovery and data+knowledge driven analytics. Interestingly, ontologies may also be used to describe data access restrictions [24, 25] to complement FAIR metadata with information that supports data safety and patient privacy.

Fig. 1

Illustration of the central concepts of the envisioned FAIR based architecture: the central star represents the data linking model for interoperability that the sources refer to (data and metadata), the small stars next to each source represent what is used of the central model to describe the source (thereby becoming ‘self-describing’), the arrows represent workflows or scripts: for the source systems to map or convert source data and metadata to the central data linking model, for retrieving data from across the sources through the central data linking model, and for analysis. FAIR Data Points provide access to the ‘ontologised’ metadata and data (not shown)

In this paper, we describe and implement our approach for FAIRification of COVID-19 observational patient data in an academic hospital. We selected cytokine measurements of hospitalised patients as our primary objective of FAIRification and development of the FAIR RDM. We synthesized an artificial dataset mimicking original laboratory data obtained from patient samples to study the data lifecycle without the risk of violating patient privacy. Our main result is the FAIRification in the hospital. We also show that our FAIRification approach provides cytokine measurements as FDOs and enables applications on top of this FAIR patient data for analytics. Importantly, this work has been done in close collaboration with clinicians and data managers who are familiar with the existing hospital data systems and data lifecycles to establish best practices for making data FAIR in the hospital. We demonstrate that a FAIR RDM plan based on describing data and metadata by ontologies delivers an infrastructure that complements existing infrastructure with FDOs that are prepared for integrative and federated analysis. We show our first results and the solutions that are currently being developed as LUMC research data management procedures. We finally discuss what FAIRification entails in a ‘real world’ hospital situation involving different stakeholders and departments, and future challenges such as data access regulation in a FAIR ecosystem.


FAIR status of patient data in existing systems

Our FAIR assessment of the cytokine data in existing systems revealed that while the structure and findability improved with each step of the data management lifecycle, no FAIR standards were applied to make the data and metadata globally understandable ‘for machines’, such as for automated computer processing (Table 1). The original data from the clinical laboratory that measured the cytokine levels were well structured, but not in a uniform, globally machine readable way. The data were further pre-processed manually and transferred to the Electronic Data Capture (EDC) software Castor [26]. Although this captured data electronically and in a uniform way, no ontological representations were added to the data collection forms to create a FAIR dataset. Data were subsequently transferred from Castor into the Opal data warehouse system [27], in line with the standard workflow for preparing data for research at the LUMC. Opal is a generic system to bring datasets from different systems in the hospital into one warehouse, supporting transformations and annotation on the data level with a vocabulary chosen by the user. Opal provides researchers at the LUMC a central access point to research data that are syntactically machine readable. It offers APIs that bioinformaticians can incorporate in their workflows. Anonymised daily parameters from patient records were imported into Opal, within the research environment of the hospital, in near real time and without retrievable patient identifiers.

Table 1 FAIR assessment of existing systems containing cytokine data

Opal’s native metadata tool Mica [28] provides annotation on the dataset level, such as how, when, where, by whom, and under what conditions data have been collected. This information is subsequently published in a Web portal. Therefore, Mica provides resource information that is human readable on the Web. This metadata is, however, not available in a machine readable form. Findability for machines can be improved by adding a machine readable ontological representation. Our automated FAIR assessment of a dataset described in Mica (see here) showed specifically which FAIR improvements could be made to make the metadata descriptions in Mica more machine actionable and standardized. Although Mica implements unique identifiers, these were not persistent in our case, and they were also not explicitly defined in the metadata. This creates challenges for data accessibility and reusability. Some systems, notably Opal [29], provide handles to integrate FAIR features, but we chose to first incorporate independent components to minimize requirements for other systems and thereby optimize reusability of the approach.

Coordinated FAIRification

A coordinated FAIRification process with BEAT-COVID colleagues was set up to improve the machine readability, global interoperability, and findability of the COVID-19 data. We developed ontological models for data records in collaboration with data collectors, data managers, data analysts and medical doctors. Similarly, we developed machine actionable metadata to improve the findability, accessibility, and reusability of the datasets in collaboration with IT and database managers. Both tasks were performed in parallel and in a synergistic way to consistently support the entire data management lifecycle for data analysis, and they are ongoing for additional data types. While the BEAT-COVID project group maintained one-hour bi-weekly video calls for general updates and logistical discussions, specific video calls were set up with the experts and duration required for the topic at hand. These regular and iterative meetings with all stakeholders were necessary to enable the development of optimal semantic modelling and computational standardization.

Representing patient data as FAIR digital objects

Central to our approach to implementing FAIR principles ‘for machines’ is the composition of ontological models from existing commonly used ontologies. These models serve as reference for the data in the source systems, creating a larger ‘virtual’ data warehouse. In this section we present the ontological models and FAIR infrastructure that were set up to represent patient data as FDOs discoverable for analytics. Broadly speaking, an FDO is a digital object identified by a Globally Unique, Persistent and Resolvable IDentifier (GUPRID) and described by metadata [6]. In the Materials and methods section, we explicitly describe how we represent patient data as FDOs and define the role of GUPRIDs.
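As an illustration of the two ingredients named above, the sketch below pairs an identifier with a metadata dictionary and checks only that the identifier is an absolute HTTP(S) URI. The class name, field names and the example.org IRI are hypothetical assumptions, not part of any FDO specification, and the check says nothing about whether an identifier is actually persistent.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class FairDigitalObject:
    """Hypothetical minimal FDO: a GUPRID plus descriptive metadata."""

    guprid: str
    metadata: dict = field(default_factory=dict)

    def is_resolvable_identifier(self) -> bool:
        # An absolute HTTP(S) URI can in principle be resolved on the Web;
        # persistence additionally depends on the identifier being maintained.
        parts = urlparse(self.guprid)
        return parts.scheme in ("http", "https") and bool(parts.netloc)


record = FairDigitalObject(
    guprid="https://example.org/beat-covid/measurement/0001",  # hypothetical IRI
    metadata={"title": "Synthetic cytokine measurement"},
)
local = FairDigitalObject(guprid="patient_0001")  # a local, non-resolvable key

print(record.is_resolvable_identifier())  # True
print(local.is_resolvable_identifier())  # False
```

A local database key such as the second example identifies a record only within one system, which is the gap that globally resolvable identifiers are meant to close.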

Ontological data model for interoperability of clinical measurements

To create a user centred research driven data infrastructure, we used the medical research questions as drivers for the data modelling. Important for our approach was to enable a high level of interoperability of patient data within the hospital. To that end we targeted all the FAIR principles that enable interoperability, which are I1 ((meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation), I2 ((meta)data use vocabularies that follow FAIR principles), and I3 ((meta)data include qualified references to other (meta)data). We first created a general concept model of the questions, to be extended with relevant clinical data, and mapped recurrent important terms mentioned by medical doctors to terms in Open Biological and Biomedical Ontologies (OBO) [30, 31] described in the Web Ontology Language (OWL) [32]. When we received the first actual data, cytokine measurements on samples collected from clinically admitted patients, we created an ontological model in the Resource Description Framework (RDF) [33] for this data (see Fig. 2). The cytokine model is based on the core semantic model that was developed in the EJP RD for common data elements in rare disease patient registries. This is a simple model based on the abstraction that every element in a patient registry is the outcome of a process, which makes the process the core concept of the model [34]. We reused this model jointly with the quantitative trait semantic model [35] to capture clinical data measurements, where the ‘process of measurement’ is the core concept. Reusing these existing ontological models for observational data in the LUMC supports FAIR data. Not only does this allow interoperability with patient registries and quantitative traits, but the common biomedical ontologies used also allow data integration with external knowledge such as LOD.
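The process-centred pattern can be sketched as plain subject, predicate, object triples in which the measurement process links the patient, the sample and the measured output. This is a minimal sketch under stated assumptions: the example.org namespace and predicate names such as `hasInput` and `measures` are hypothetical placeholders standing in for the OBO and other community ontology terms used in the published model (Fig. 2).

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical namespace; the published model uses community ontology terms.
EX = "http://example.org/beat-covid/"


@dataclass
class Measurement:
    patient_id: str
    sample_id: str
    parameter: str  # e.g. a cytokine name
    value: float
    unit: str
    date: str  # ISO 8601 date of sample collection


def measurement_triples(m: Measurement, process_id: str) -> list[tuple[str, str, str]]:
    """Express one measurement as triples centred on the measurement process."""
    process = f"{EX}process/{process_id}"
    output = f"{EX}output/{process_id}"
    return [
        # The process is the hub: who participated, what was measured, and when.
        (process, f"{EX}hasParticipant", f"{EX}patient/{m.patient_id}"),
        (process, f"{EX}hasInput", f"{EX}sample/{m.sample_id}"),
        (process, f"{EX}date", m.date),
        (process, f"{EX}hasOutput", output),
        # The output carries the measured parameter, value and unit.
        (output, f"{EX}measures", f"{EX}parameter/{m.parameter}"),
        (output, f"{EX}value", str(m.value)),
        (output, f"{EX}unit", m.unit),
    ]


m = Measurement("P001", "S001", "IL6", 12.5, "pg/mL", "2020-04-01")
triples = measurement_triples(m, "0001")
for triple in triples:
    print(triple)
```

Because every element hangs off a process node, the same shape can be reused for other observations, such as the daily severity scores, by changing only the output description.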

Fig. 2

Ontological data model for the cytokine measurements patient dataset

We also modelled a new semantic module for disease severity score phenotypes following the same EJP RD core model (see Fig. 3). Apart from tracking the APACHE IV Severity Score [36] and the SOFA Severity Score [37], medical doctors defined the Leiden Severity Score to obtain daily scores of disease severity for COVID-19 patients admitted to both the ward and the ICU (Intensive Care Unit); more detailed information is given in the Materials and methods section. All these scores are based on lab results and clinical data, reflect the actual disease severity of the patient on that day, and inform doctors’ decisions about patient care management. The ontological linking data model and its modules (lab measurements, biosamples and disease severity score) are publicly available on GitHub-datamodel.

Fig. 3

Semantic module to represent disease severity score phenotypes calculated in the hospital

Ontological metadata model for COVID-19 resources

To allow the metadata of COVID-19 resources in the hospital to be findable, accessible, and reusable by both humans and machines, we provided an ontological model to expose it in a machine readable way. The FAIR principles that we prioritised were those about the use of globally unique and persistent identifiers for data and metadata, and providing rich metadata. We also followed the best practice of using resolvable identifiers. In particular, for findability we targeted F1 ((meta)data are assigned a globally unique and persistent identifier), F2 (data are described with rich metadata), and F3 (metadata clearly and explicitly include the identifier of the data they describe); for accessibility A1 ((meta)data are retrievable by their identifier using a standardized communications protocol) and A1.1 (the protocol is open, free, and universally implementable); and for reusability R1 ((meta)data are richly described with a plurality of accurate and relevant attributes), R1.1 ((meta)data are released with a clear and accessible data usage license), R1.2 ((meta)data are associated with detailed provenance), and R1.3 ((meta)data meet domain-relevant community standards). We targeted these principles to enable a high level of machine actionability for evidence-based analysis within the hospital and across public biomedical research resources. Not yet prioritised were F4 ((meta)data are registered or indexed in a searchable resource), A1.2 (the protocol allows for an authentication and authorization procedure, where necessary), and A2 (metadata are accessible, even when the data are no longer available), because federated discovery and learning with real world observations across hospitals is planned for future iterations of FAIRification. A1.2 is especially relevant in the case of sensitive patient health data. In practice, we designed a model by extending the DCAT2-based metadata model that is used to manage the metadata of common datasets. With four additional metadata elements from three standard ontologies, namely the property “TYPE” from DCAT2, the properties “DESCRIBES” and “DATA INPUT OF” from the Allotrope Foundation Ontologies (AFO), and the property “HAS QUALITY” from the OBO Relations Ontology (RO), the metadata model features finer semantic granularity. In Fig. 4, we show how these four object property values, or edges in the RDF graph, specify that the BEAT-COVID data resource in our project is a knowledge base that describes COVID-19, is intended to contain data input of clinical studies, and has synthetic quality. This makes the structured semantics of the metadata of COVID-19 resources richer and more precise. The metadata model is publicly available on GitHub-metadatamodel.
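To make the extended metadata model more concrete, the following sketch renders a dataset description as a JSON-LD document. The DCAT (`http://www.w3.org/ns/dcat#`) and Dublin Core terms (`http://purl.org/dc/terms/`) namespaces are real; the four extra edges from Fig. 4 are written with hypothetical `ex:` placeholder IRIs, because the exact AFO, RO and DCAT2 term IRIs of the published model are not reproduced here. The dataset IRI is likewise an assumption.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    # Placeholder namespace standing in for the DCAT2, AFO and RO terms.
    "ex": "http://example.org/placeholder#",
}


def dataset_metadata(dataset_iri: str, title: str) -> dict:
    """Build a JSON-LD description of a dataset with the four extra edges."""
    return {
        "@context": CONTEXT,
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # F3: the metadata explicitly carry the identifier of the data.
        "dct:identifier": dataset_iri,
        # The four additional edges of Fig. 4 (placeholder properties and values).
        "ex:type": {"@id": "ex:KnowledgeBase"},
        "ex:describes": {"@id": "ex:COVID-19"},
        "ex:dataInputOf": {"@id": "ex:ClinicalStudy"},
        "ex:hasQuality": {"@id": "ex:Synthetic"},
    }


doc = dataset_metadata(
    "https://example.org/fdp/dataset/beat-covid-cytokines",  # hypothetical IRI
    "Synthetic cytokine measurements",
)
print(json.dumps(doc, indent=2))
```

JSON-LD is used here only because it can be produced with the standard library; the same graph could equally be serialised as Turtle.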

Fig. 4

Ontological metadata model instantiated as an RDF graph. The four lower edges are the four additional metadata elements for COVID-19 data resource description

FAIR data point for accessing the metadata of BEAT-COVID patient data

The basic idea of an FDP is to support scalable and transparent “routing” of data resources through stored metadata. The metadata stored and managed by an FDP make the data resources they describe semantically findable and reusable by machines. As an open gateway, an FDP also makes different data resources accessible under defined constraints. Based on the designed ontological metadata model, we implemented an FDP to describe datasets in Opal and to publish FAIR metadata of these datasets on the Internet, complementing the Mica system. This FDP publishes structured metadata that allow machines to automatically find BEAT-COVID datasets and to interpret how to access and use the data stored in Opal, for instance by algorithms that visit the data with the appropriate access rights (Fig. 5). Important to the FDP approach is that the data never leave their repository, thereby protecting patient data and ensuring that only authorized users have access. We performed an automated FAIR assessment of the same dataset from Mica as described in the FDP. The results can be found here and showed that various aspects of the metadata description were improved in comparison to the Mica analysis results. For instance, the FDP evaluation resulted in a better identifier description of the (meta)data. With the publication of the BEAT-COVID resource metadata in the FDP, we expect to increase the discoverability of COVID-19 patient data in the LUMC and to enable federated analytics for extended populations. An FDP is accessible and readable by machines through a REST API, and by humans through a Graphical User Interface (GUI). However, not all of the BEAT-COVID resource metadata is human readable, because the GUI of the current version of the FDP only renders the last fragment of a URI (Uniform Resource Identifier) as its label. For instance, the URI “” renders as the label “description” and the URI “” renders as the label “EL_00001”.
We are working on a more appropriate solution that displays the “LABEL” property from RDF Schema, following the best practice of always providing this label for humans. The FDP is publicly available at
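The labelling issue described above comes down to a simple rule: prefer an `rdfs:label` when one is available, and fall back to the last URI fragment only otherwise. A minimal sketch of that rule follows; the function name and the example.org URI are hypothetical, and `http://purl.org/dc/terms/description` is used only as a familiar real IRI that yields the fragment “description”.

```python
from typing import Optional


def display_label(uri: str, rdfs_label: Optional[str] = None) -> str:
    """Return a human-readable label for a URI.

    Prefer the rdfs:label if one is known; otherwise fall back to the last
    URI fragment (after '#' if present, else after the last '/').
    """
    if rdfs_label:
        return rdfs_label
    tail = uri.rstrip("/")
    if "#" in tail:
        return tail.rsplit("#", 1)[-1]
    return tail.rsplit("/", 1)[-1]


print(display_label("http://purl.org/dc/terms/description"))  # description
print(display_label("https://example.org/beat-covid/EL_00001"))  # EL_00001
print(display_label("https://example.org/beat-covid/EL_00001", "IL-6 level"))  # IL-6 level
```

The fallback alone reproduces the current behaviour, so opaque identifiers such as “EL_00001” only become readable once the metadata actually provide an `rdfs:label`.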

Fig. 5

Integration of our ontological approach with existing systems

Integrating the ontological models with the existing research data warehouse

Our next step was to add access to patient measurements as instances of the ontological model (‘ontologised data’) as a feature to the existing RDM. In Fig. 5, we show how ontologised data is integrated with the existing Opal and Mica data management system. Our objective was to use the Opal and Mica systems as a foundation for FAIRification in the LUMC. While the Opal system manages integration of datasets in the hospital, the Mica system adds valuable metadata about the data resources. Even though Opal and Mica do not directly provide semantic modelling functionality, they do provide a basic annotation functionality that we used as the basis for connecting the ontological models. To instantiate the ontological linking data model in RDF, we developed an ‘RDFizer’ Python script as a minimal prototype for patient data FAIRification (see yellow arrow from Opal to Triple Store in Fig. 5). Our current prototype uses CSV files with synthetic cytokine data as input to connect data from Opal to the ontological model that we developed for this data, thereby creating ‘ontologised data’ in RDF. Opal allows exporting datasets to CSV through its export function API.
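The RDFizer idea can be illustrated with a short standard-library sketch that reads CSV rows and emits N-Triples. It is not the authors’ script: the column names, the example.org namespace, the predicate names and the row-number-based process IRIs are all hypothetical assumptions, and a production converter would mint persistent identifiers and use the ontology terms of the published model instead.

```python
from __future__ import annotations

import csv
import io

EX = "http://example.org/beat-covid/"  # hypothetical namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    # Escape backslashes and quotes as required inside N-Triples literals.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


def rdfize(csv_text: str) -> list[str]:
    """Convert rows (patient_id, sample_id, cytokine, value, date) to N-Triples."""
    lines = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        # Illustrative only: process/output IRIs derived from the row number.
        process = iri(f"{EX}process/{i}")
        output = iri(f"{EX}output/{i}")
        lines += [
            f"{process} {iri(EX + 'hasParticipant')} {iri(EX + 'patient/' + row['patient_id'])} .",
            f"{process} {iri(EX + 'hasInput')} {iri(EX + 'sample/' + row['sample_id'])} .",
            f"{process} {iri(EX + 'date')} {literal(row['date'], XSD_DATE)} .",
            f"{process} {iri(EX + 'hasOutput')} {output} .",
            f"{output} {iri(EX + 'measures')} {iri(EX + 'parameter/' + row['cytokine'])} .",
            f"{output} {iri(EX + 'value')} {literal(row['value'], XSD_DECIMAL)} .",
        ]
    return lines


# Synthetic input in the spirit of the prototype (not real patient data).
SYNTHETIC_CSV = """patient_id,sample_id,cytokine,value,date
P001,S001,IL6,12.5,2020-04-01
P002,S002,IL6,3.1,2020-04-02
"""

for line in rdfize(SYNTHETIC_CSV):
    print(line)
```

The resulting lines can be loaded into any triple store that accepts N-Triples, which keeps the conversion step independent of a particular RDF library.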

Conversely, REST Web APIs can be generated from the ontologised data using the grlc server [38] (see yellow arrow from Triple Store to Opal in Fig. 5). grlc is a tool that automatically converts SPARQL queries into REST Web APIs and makes selected RDF data accessible on the Web. Moreover, it can translate SPARQL [39] queries stored and documented in GitHub repositories into Linked Data APIs on the fly. Essentially, it adds a DCAT2 data distribution interface (REST APIs) on top of the existing SPARQL endpoint. To demonstrate this additional way of reusing FAIR patient data, we implemented a set of Web API endpoints to retrieve patient data in RDF. We first developed data retrieval SPARQL queries, and then we ‘decorated’ them and uploaded them to a GitHub repository (grlcqueries), where the grlc server interprets them and builds the REST API interface automatically. The SPARQL queries illustrate the potential for executing sophisticated federated analyses that can be extended as more data resources become available. The Web API endpoints are publicly available at
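To show what ‘decorating’ a query means, the sketch below holds a query in the style grlc reads from a repository: metadata lines prefixed with `#+` carry YAML (such as `summary`, `endpoint`, `method` and `tags`), and, following grlc’s documented convention, SPARQL variables prefixed with `?_` are exposed as API parameters. The small parser is our own illustration of that layout, not grlc’s implementation, and the endpoint URL and `ex:` predicates are hypothetical.

```python
from __future__ import annotations

import re

DECORATED_QUERY = """#+ summary: Count distinct patients with a measurement of a given parameter
#+ endpoint: https://example.org/sparql
#+ method: GET
#+ tags:
#+   - patients

PREFIX ex: <http://example.org/beat-covid/>
SELECT (COUNT(DISTINCT ?patient) AS ?n)
WHERE {
  ?process ex:hasParticipant ?patient ;
           ex:hasOutput/ex:measures ?_parameter_iri .
}
"""


def split_decorated_query(text: str) -> tuple[str, str, list[str]]:
    """Split a decorated query into (YAML decorators, SPARQL body, parameter names)."""
    decorators, body = [], []
    for line in text.splitlines():
        if line.startswith("#+"):
            # Drop the '#+' marker and one following space, keeping YAML indentation.
            content = line[2:]
            decorators.append(content[1:] if content.startswith(" ") else content)
        else:
            body.append(line)
    sparql = "\n".join(body).strip()
    # Variables written ?_name or ?__name are treated as API parameters.
    params = sorted(set(re.findall(r"\?_{1,2}([A-Za-z]\w*)", sparql)))
    return "\n".join(decorators), sparql, params


decorators, sparql, params = split_decorated_query(DECORATED_QUERY)
print(decorators)
print(params)
```

Keeping the API description inside the query file means that the query, its documentation and its Web API stay versioned together in one repository.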

Querying FAIR patient data with LOD for medical questions

To show that the FAIR RDM and the derived data infrastructure allow medical questions to be answered by querying patient data in terms of the ontological model, together with external open science knowledge, we performed two simple SPARQL queries on the synthetic cytokine data (Table 2). The queries were defined to address the medical doctors’ initial real world hypothesis related to the cytokine FAIR data. In clinical practice, doctors observed different disease courses with different cytokine related immune responses, different prognoses, and potentially different molecular disease mechanisms. To personalize treatment strategies, doctors need to know which clinical parameters can be used as biomarkers for predicting the disease course of a patient. Cytokine levels could be such biomarkers. To stratify patients, we first defined a query that counts the number of patients in the LUMC. Then, we defined a second query that links each measured clinical parameter with biological protein information from external sources, in order to build patient cytokine profiles that can characterize individual immune responses at different time points. Queries such as these provide the basis for further analysis of prognostic indicators and disease mechanisms.
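A count query such as the first one can be sent to any SPARQL endpoint through the SPARQL 1.1 Protocol, asking for results in the SPARQL 1.1 Query Results JSON Format. The sketch below only builds the request and parses a response; the endpoint URL and `ex:` predicate are hypothetical, and the sample response contains a made-up value that merely illustrates the JSON shape, not a result from the study.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical predicate; the published model uses its own ontology terms.
COUNT_PATIENTS = """PREFIX ex: <http://example.org/beat-covid/>
SELECT (COUNT(DISTINCT ?patient) AS ?n)
WHERE { ?process ex:hasParticipant ?patient . }"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def first_binding_as_int(results_json: str, var: str) -> int:
    """Read one integer binding from a SPARQL 1.1 JSON results document."""
    results = json.loads(results_json)
    return int(results["results"]["bindings"][0][var]["value"])


req = build_sparql_request("https://example.org/sparql", COUNT_PATIENTS)
# urllib.request.urlopen(req) would send it; omitted because the endpoint is hypothetical.

# Made-up response illustrating the JSON results format (not real data).
SAMPLE_RESPONSE = """{"head": {"vars": ["n"]},
 "results": {"bindings": [{"n": {"type": "literal",
   "datatype": "http://www.w3.org/2001/XMLSchema#integer", "value": "3"}}]}}"""
print(first_binding_as_int(SAMPLE_RESPONSE, "n"))
```

Because the protocol and result format are W3C standards, the same client code works against the hospital triple store and against public endpoints alike.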

Table 2 Example queries using external LOD resources

The first query demonstrates that clinical information from the LUMC can be queried, while the second demonstrates that queries can run across LUMC clinical data and external biomedical databases such as the UniProt protein knowledgebase by means of the federated SPARQL query shown in Fig. 6. The SPARQL queries are available on GitHub-queries. The aforementioned grlc server provides an additional REST Web API for these queries.
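A federated query of this kind uses the SPARQL 1.1 `SERVICE` keyword to evaluate part of the pattern at a remote endpoint. The sketch below generates such a query for a list of UniProt accessions, for example P05231 (human interleukin-6), against the public UniProt endpoint `https://sparql.uniprot.org/sparql` and its core vocabulary (`up:recommendedName/up:fullName`). It assumes, hypothetically, that measured parameters in the patient data are annotated directly with UniProt IRIs through placeholder `ex:` predicates; it is not the query shown in Fig. 6.

```python
from __future__ import annotations

UNIPROT_SPARQL = "https://sparql.uniprot.org/sparql"


def federated_protein_query(accessions: list[str]) -> str:
    """Join local measurements with protein names fetched from UniProt."""
    values = " ".join(f"<http://purl.uniprot.org/uniprot/{acc}>" for acc in accessions)
    # Local part uses hypothetical ex: predicates; the SERVICE block runs at UniProt.
    return f"""PREFIX ex: <http://example.org/beat-covid/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?patient ?protein ?name ?value
WHERE {{
  VALUES ?protein {{ {values} }}
  ?output ex:measures ?protein ;
          ex:value ?value .
  ?process ex:hasOutput ?output ;
           ex:hasParticipant ?patient .
  SERVICE <{UNIPROT_SPARQL}> {{
    ?protein up:recommendedName/up:fullName ?name .
  }}
}}"""


query = federated_protein_query(["P05231"])
print(query)
```

Restricting `?protein` with `VALUES` before the `SERVICE` block keeps the remote lookup small, which matters when the external endpoint holds far more data than the local store.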

Fig. 6

Federated SPARQL query crossing FAIR patient data with the UniProt knowledgebase


FAIRification in the hospital

The COVID-19 pandemic revealed how critically important it can be that patient data from multiple systems in the hospital are prepared for instant integrative analysis across those systems, as well as across hospitals and countries. This would be feasible if the hospital had a FAIR RDM plan that entailed making patient data available as FDOs and thereby findable, accessible, interoperable, and reusable for computers [1]. However, COVID-19 patient data are not yet natively collected as FAIR data. Therefore, we have described a strategy to facilitate the adoption of the FAIR principles in the hospital based on the FAIR architecture shown in Fig. 1 that complements an existing data management infrastructure. The strategy applies ontologies to increase the interoperability and machine readability of patient data records and patient datasets. We demonstrated that in the hospital (i) ontological models can complement existing data infrastructure, and (ii) they are an appropriate mechanism to formally capture agreement between stakeholders on what their data mean. They combine precise semantics for humans with corresponding actionable semantics for computers. Additional benefits are that they are extendible and that they allow replacement with an improved ontological model (or the addition of multiple models). A similar ontology based approach is also applied to provide patient derived data as FDOs in biomedical and rare disease research, such as in the EJP RD [21]. Interestingly, the results that we reused from the EJP RD project addressed requirements similar to those we had for COVID-19 data.

Coordination with different stakeholders

The development of the FAIR RDM plan was made possible by a coordinated interdisciplinary effort. In our experience, FAIRification requires at least data producers, data consumers, and FAIR data modellers [13, 40]. This is because the essential step of capturing the meaning of data in terms of ontologies requires the combined expertise of these stakeholders. In our case, this was available through the BEAT-COVID collaboration. The collaboration is providing user needs, technical requirements, and insight into existing procedures and best practices regarding the management of the data lifecycle in the hospital. A clear challenge for our FAIRification process was communication between stakeholders with very different backgrounds. This was further hampered by the communication limitations due to the pandemic itself. To mitigate the communication gap, we recorded meetings and shared the material that was presented during the meetings. We also plan to organize Bring Your Own Data workshops to make stakeholders who are not FAIR experts more aware of the advantages that FAIR brings [41–43]. Under pressure of the urgency of the pandemic, we worked without dedicated FAIR stewards for this project. However, going forward, this role seems essential to manage the necessary communication between disciplines [44].

Establishing goals for FAIRification

Questions of researchers in the hospital were used as the drivers to establish FAIRification goals and to plan FAIR RDM. The FAIRification preparation consisted of several meetings with medical doctors and clinical researchers. These meetings with domain experts had a two-fold focus: (i) to identify the FAIRification goals, and (ii) to extract a set of specific research questions to drive the (meta)data modelling step. Both aims are related, because being able to answer at least the driving research questions is one of the main goals of FAIRification. The research questions included ‘What are the clinical parameters that can predict the disease course of a patient?’, ‘What are the biological pathways underlying patient symptoms and disease phenotypes?’, and ‘How could biological pathways be positively or adversely affected by a particular treatment?’. The results of these meetings guided how data in the hospital should be interrelated and in what context they should be interpreted. We used them to define domain semantics for testing and generating hypotheses with the help of OWL ontologies. Because ontologies are extensible, they mitigate the risk that initial overfitting to the driving questions limits later applications; wider reusability of the FAIR RDM is a primary objective. To ensure that we correctly capture the semantics of knowledge and data, we are also exploring a formal method to validate the (meta)data models using Competency Questions (CQs) and goal modelling. This will again rely on close interdisciplinary collaboration with domain experts. These research questions also facilitate communication between people with different expertise.

Technical and social challenges and opportunities

In developing our approach within the BEAT-COVID collaboration, we took into account (i) the urgency of the situation, (ii) that various data management systems were in place at the hospital, and (iii) that different types of data needed to be prepared for timely exchange and efficient research. Consequently, our challenge was two-fold: (i) to adapt our generic FAIRification workflow [12] to a hospital setting, and (ii) to require minimal technical knowledge transfer, taking advantage of the combined expertise in the hospital that BEAT-COVID brought together. Key to our method is the development of two ontological models: one to enable analysis across clinical data (e.g. symptoms), investigational parameters (e.g. cytokine measurements), and data outside of the hospital, and another to represent the metadata of the patient data resources to increase their findability, accessibility and reusability. A metadata store conforming to the FDP specification was deployed to provide access to this metadata, which also includes a reference for accessing the ontological data. We demonstrated that Linked Data and Semantic Web technologies such as OWL ontologies, triple stores and the SPARQL query language provide the means to query patient data across sources in terms of the ontologies (Table 2). Taken together, these provide the FDOs for COVID-19 patient data and the basis for instant integrative federated analysis in the hospital.
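To illustrate what querying patient data through a SPARQL endpoint involves at the HTTP level, the sketch below builds (without sending) a query-via-GET request as defined by the W3C SPARQL 1.1 Protocol, with a `SERVICE` clause for the federated part. It is a minimal sketch under stated assumptions: the GraphDB repository URL, the `ex:` vocabulary and the external endpoint are hypothetical placeholders, not the BEAT-COVID ontology terms or the queries summarized in Table 2.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical GraphDB repository endpoint (GraphDB serves repositories under
# /repositories/<id>; the id "beat-covid" is an assumption).
LOCAL_ENDPOINT = "http://localhost:7200/repositories/beat-covid"

# Hypothetical vocabulary and external endpoint, for illustration only.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <https://example.org/beat-covid#>

SELECT ?measurement ?cytokine ?label WHERE {
  ?measurement a ex:CytokineMeasurement ;
               ex:measuresCytokine ?cytokine .
  # Federated part: the triple store forwards this block to a remote endpoint.
  SERVICE <https://example.org/external/sparql> {
    ?cytokine rdfs:label ?label .
  }
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build, but do not send, a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def decoded_query(request: Request) -> str:
    """Recover the query text from the request URL (useful for logging)."""
    return parse_qs(urlparse(request.full_url).query)["query"][0]


req = build_sparql_request(LOCAL_ENDPOINT, QUERY)
print(req.get_method(), req.host)
```

Passing `req` to `urllib.request.urlopen` against a reachable endpoint would return SPARQL JSON results. The `SERVICE` block is resolved by the triple store itself, so the client only ever talks to one endpoint.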

While our ontological models aim to reflect our shared understanding of the data, a lack of tools still makes it challenging to transform health data into common data models such as HL7 FHIR [45] and to publish them as findable resources [46]. There is a need for FAIRifier tools that support stakeholders in a clinical setting in every step from FAIR RDM planning to FAIR data creation, publication, evaluation, and reuse. Integrating FAIR implementations into existing data management tools such as Castor EDC can lower the burden substantially [40]. Similarly, the vocabulary and annotation features of Opal and Mica provide handles for future integration of FAIRification. Reusing an abstract ontological data model, such as the EJP RD core model, in combination with implementing FDPs may further lower the threshold for implementation and FAIR data reuse. A further practical and technical challenge is to protect patient-identifying information while having clinical data available close to real time. Classically, most studies retrieve data retrospectively from patient records. In the fight against COVID-19, however, the first analyses were done while patients were still admitted to hospital. Advanced data encryption was used to retrieve daily updates from patient records without including retrievable patient identifiers in the research data infrastructure. Although the strong commitment of the BEAT-COVID group facilitated progress, other challenges for FAIRification in the hospital were ‘social’, presumably because stakeholders are not familiar with the steps needed to make a resource reusable by computers across multiple locations. We propose that a FAIR data policy be put in place for health research data in accordance with [47].
To pave the way, several ongoing efforts address the need for education, such as FAIR training for researchers, clinicians and other types of stakeholders through initiatives such as ELIXIR TeSS [48] and the EJP RD project for rare diseases.

Patient data accessibility hurdles

Protecting patient data and privacy is a major concern, and making a clear reference to how data are protected is part of FAIRification. As researchers, we must establish data management mechanisms that ensure that patient privacy is preserved and that data use is controlled. There are several options for dealing with data privacy and safety, such as using anonymised datasets, using substitute synthetic representations of sensitive datasets, and having the legal and ethical framework in place for processing sensitive personal data in the sense of the GDPR. As a first step, the hospital needs to develop and implement a data governance policy that clearly specifies how to extract and use the data as approved by the patient in the informed consent. Delaying data governance may delay the FAIRification process, because planning FAIRification requires clarity on which data will be available and in which form, and because data governance has to be specified in the metadata of the resource for when an algorithm visits the data. Moreover, underdeveloped metadata on data accessibility and data privacy hampers interoperability outside of the hospital. Consequently, it hampers data visiting, and thus federated query and learning over FDPs, which limits the hospital's research capacity for analysis. It is also very important for accessibility and data privacy that the digital objects themselves can accommodate the criteria and protocols necessary to comply with regulatory and governance frameworks. Ontologies can aid in both opening and protecting patient data by exposing logical definitions of data use conditions. Indeed, there are ontologies to define access and reuse conditions for patient data, such as the Informed Consent Ontology (ICO) [24], the Global Alliance for Genomics and Health Data Use Ontology (DUO) standard [25], and the Open Digital Rights Language (ODRL) vocabulary recommended by the W3C (Footnote 6).
ICO and DUO are OBO ontologies: ICO formally specifies patient informed consent and its process for research studies in the medical field, and DUO formally specifies consented data use conditions and restrictions for research with large genomics and health data repositories. We are furthermore considering whether ODRL can serve as a common language for expressing access permissions for machines, similar to how DCAT2 provides a common language for resource metadata. Finally, privacy-preserving methods are available if data of the same person in multiple systems are required for a federated analysis [49, 50].
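To make the idea of machine-readable access permissions concrete, the sketch below expresses a single permission as an ODRL policy in JSON-LD and checks a request against it. It is illustrative only: the policy, target and purpose IRIs are hypothetical, and the `is_permitted` evaluator handles just one constraint shape (`purpose` with `eq`), far less than a full ODRL engine would.

```python
import json

# Hypothetical IRIs; the ODRL terms ("Set", "permission", "use", "purpose",
# "eq") come from the W3C ODRL 2.2 vocabulary.
DATASET = "https://example.org/dataset/synthetic-cytokines"
PURPOSE = "https://example.org/purpose/covid19-research"

policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "https://example.org/policy/1",
    "permission": [{
        "target": DATASET,
        "action": "use",
        "constraint": [{
            "leftOperand": "purpose",
            "operator": "eq",
            "rightOperand": PURPOSE,
        }],
    }],
}


def is_permitted(policy: dict, target: str, action: str, purpose: str) -> bool:
    """Toy check: allowed if some permission matches target and action and all
    of its constraints are 'purpose eq <value>' satisfied by the requested
    purpose. Other ODRL features (prohibitions, duties, ...) are ignored."""
    for rule in policy.get("permission", []):
        if rule.get("target") != target or rule.get("action") != action:
            continue
        if all(
            c.get("leftOperand") == "purpose"
            and c.get("operator") == "eq"
            and c.get("rightOperand") == purpose
            for c in rule.get("constraint", [])
        ):
            return True
    return False


print(json.dumps(policy, indent=2))
print(is_permitted(policy, DATASET, "use", PURPOSE))
```

Because the policy is plain JSON-LD, it could be published alongside DCAT metadata, so that a visiting algorithm reads the conditions before requesting data.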

International adoption of the FAIR principles for health data of hospitalised patients

The method for FAIRification that we described focuses on patient-derived health data, down to the data record level. Its two main outcomes are that we produced FAIR data for hospitalized patients, and that we demonstrated that these data are instantly reusable for various secondary uses: building software applications (and analysis workflows) via REST Web APIs, and querying cross-domain patient data together with open public knowledge to add richer context for answering healthcare questions. While several projects develop FAIRification procedures, they predominantly focus on life sciences data [14, 15, 51]. FAIR data in health is gaining momentum, and dedicated projects such as FAIR4Health [52] already aim to use FAIR health data to improve research. Our method has the same basis as the procedure followed earlier for rare disease patient registries (e.g. VASCA [13]), but here we integrated it with the hospital infrastructure and demonstrated how the adoption of the FAIR principles can be facilitated in the hospital through interdisciplinary collaboration. Hence, our experience may inform national and global consensus within the clinical community on implementing the FAIR principles in hospitals. For instance, the Dutch national Health Research Infrastructure (Health-RI) has stated that data stewardship at the Dutch University Medical Centres should adhere to the FAIR principles [53]. Similar nationwide initiatives to improve health data reuse can be seen in Switzerland (Swiss Personalized Health Network [18]) and Germany (NFDI4Health [19]). These initiatives rely on a federated infrastructure, enhanced data interoperability and data linkage in compliance with privacy regulations for research. Our example has shown that FAIRification within the hospital can contribute to this infrastructure.

Limitations and future work

We observed a number of limitations of our approach to enabling instant analysis of COVID-19 data across multiple hospital systems. First, we observed that the interdisciplinary collaboration and the willingness to implement the FAIR principles prompted by the pandemic were not sufficient to provide easy access to data for implementing the FAIR services. A partial solution, at least to speed up the deployment of the FAIR services, could be to have synthetic patient data available. Such data could, for instance, be generated with Synthea [54, 55], which can export synthetic patient records in HL7 FHIR format. Second, at this time we have not incorporated a way to formally express patient consent and data usage conditions in our FAIR metadata. Several efforts in human data communities are currently identifying which elements are required, and standards such as ICO and DUO are under development to capture these in machine-readable ontological form. These can be linked into our FDP metadata model in the future. Third, we have not specifically addressed tooling (including standards) to support hospital data stewards in FAIR data management. This could pertain to tools for capturing FAIRification goals, ontological data modelling, data conversion, and mapping. Data modelling and mapping were the most time-consuming steps. For some of our data types it was difficult to identify an appropriate ontology term that we could trivially incorporate in our OBO-based application ontology. For instance, to map 103 cytokine measurement datum types, we needed two different ontologies (Footnotes 7 and 8), which is not best practice. The majority could be mapped to the Experimental Factor Ontology (EFO) [56], which is not an OBO ontology. Moreover, we could not find some specific data types in any ontology.
Therefore, we mapped them to a more general class. For instance, we mapped specific interleukin measurement datum types, such as those for ‘interleukin-11’, ‘interleukin-26’ or ‘interleukin-32’ (among others), to the general data item class ‘blood interleukin measurement’ (Footnote 9), which is the superclass of the ‘blood interleukin-6 level’ class (Footnote 10), and we mapped specific measurement process types, such as that for the ‘Tumor Necrosis Factor Ligand Superfamily Member 14’ cytokine, to the general process class ‘Cytokine Measurement’ (Footnote 11). We expect new limitations once we analyze new omics datasets and clinical observations. Also, tools that evaluate the ‘FAIRness’ of data can guide the FAIRification process. This partly depends on the standards used by the data provider's community [57], but it is not always clear what these standards are, if any. Ongoing work in the FAIRification ‘world’ aims to identify these community-specific FAIR requirements and implementation choices. For instance, we envision as future work the establishment of FAIR maturity indicators for clinical data. Finally, we aim to advance research with FAIR patient data by further developing a FAIR Web API service to complement the Opal APIs, and knowledge graph based learning techniques. We would like to highlight the following developments.

Evaluation of ontological data models

We are evaluating the ontological models using CQs based on realistic questions posed by data model users [58]. CQs have been proposed as a means to verify the scope (e.g., what is relevant to solve the challenges) and the relationships between concepts (e.g., to check for missing or redundant relationships). A preliminary set of CQs from meetings with domain experts is available on GitHub (see Availability of data and materials).
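The core idea of CQ-based evaluation is that each CQ becomes a query over data instantiating the model, and an empty answer points at missing data or a modelling gap. The authors answer CQs with SPARQL; the stdlib-only sketch below mimics this with a naive triple-pattern matcher instead of a triple store. The toy triples and the `ex:` terms are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data in compact "prefix:local" form; not BEAT-COVID terms.
TRIPLES: List[Triple] = [
    ("ex:m1", "rdf:type", "ex:CytokineMeasurement"),
    ("ex:m1", "ex:aboutPatient", "ex:patient1"),
    ("ex:m1", "ex:measuresCytokine", "ex:IL6"),
    ("ex:m2", "rdf:type", "ex:CytokineMeasurement"),
    ("ex:m2", "ex:aboutPatient", "ex:patient1"),
    ("ex:m2", "ex:measuresCytokine", "ex:IL10"),
    ("ex:m3", "rdf:type", "ex:CytokineMeasurement"),
    ("ex:m3", "ex:aboutPatient", "ex:patient2"),
]


def _unify(term: str, value: str, binding: Dict[str, str]) -> bool:
    """Terms starting with '?' are variables; others must match exactly."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield variable bindings satisfying all patterns (naive nested-loop join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(_unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, triples, candidate)


def cytokines_for(patient: str) -> List[str]:
    """CQ: 'Which cytokines were measured for a given patient?'"""
    patterns = [
        ("?m", "rdf:type", "ex:CytokineMeasurement"),
        ("?m", "ex:aboutPatient", patient),
        ("?m", "ex:measuresCytokine", "?c"),
    ]
    return sorted({b["?c"] for b in match(patterns, TRIPLES)})


print(cytokines_for("ex:patient1"))
print(cytokines_for("ex:patient2"))
```

Here the second CQ returns an empty list because measurement `ex:m3` lacks a cytokine link, which is the kind of gap a CQ is meant to surface.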

COVID-19 hypothesis generation tool

We are developing a COVID-19 Hypothesis Generation tool for the LUMC, based on the structured reviews framework for data- and knowledge-driven research [59], as a means to exploit the FAIRification work to help medical doctors and researchers answer their research questions. This framework has previously been used to support rare disease researchers in exploring hypotheses as paths in a case-specific knowledge graph built around their laboratory observations. After creating a preliminary knowledge graph with the FAIR synthetic cytokine data, we aim to incorporate background knowledge. The preliminary knowledge graph is available for browsing as the LUMC BEAT-COVID Knowledge Graph (see Availability of data and materials).
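Exploring hypotheses as paths can be pictured as enumerating simple paths between an observation and a phenotype of interest. The tool itself uses Neo4j and Cypher; the sketch below is only a plain-Python illustration of the idea, using breadth-first search over a hypothetical toy graph whose node names are placeholders, not biomedical claims.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy knowledge graph (directed edges); node names are placeholders.
GRAPH: Dict[str, List[str]] = {
    "cytokine_A": ["pathway_X", "gene_G"],
    "gene_G": ["pathway_X", "phenotype_P"],
    "pathway_X": ["phenotype_P"],
}


def candidate_paths(graph: Dict[str, List[str]], start: str, goal: str,
                    max_edges: int) -> List[List[str]]:
    """Breadth-first enumeration of simple paths (no repeated nodes) from start
    to goal with at most max_edges edges; shorter paths come first."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal and len(path) > 1:
            found.append(path)
            continue
        if len(path) - 1 >= max_edges:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in path:
                queue.append(path + [neighbour])
    return found


for p in candidate_paths(GRAPH, "cytokine_A", "phenotype_P", max_edges=3):
    print(" -> ".join(p))
```

Bounding the path length keeps the candidate set small enough for a researcher to review each path as a possible explanation.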

Federated analytics across hospitals

We also aim to show how this FAIR infrastructure allows querying FAIR data from the BEAT-COVID project in the LUMC together with other hospitals' FAIR data without the data leaving their source, i.e. the ‘data visiting’ approach. In the VODAN project, the GO FAIR ‘VODAN in a box’ FDP [60] was used to test the trains and tracks of the Personal Health Train (PHT) concept [61], and the first intercontinental FDP SPARQL proof of concept, VODAN Africa [62], developed by VODAN Africa and Asia - GO FAIR [2], was demonstrated with AllegroGraph WebView [63] as query interface. Secure FDP technology must still be developed and tested to implement trusted access control policies and to enable visiting synthetic datasets and pseudonymised healthcare data. We aim to build on the VODAN and TWOC experiences and prepare an FDP instance that publishes BEAT-COVID metadata to be automatically found and used in trusted automated analytics workflows across multiple hospitals.
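The essence of data visiting is that the analysis travels to each site and only aggregates travel back. The sketch below illustrates this with a toy "train" that computes a count and a sum at each hypothetical station, from which a pooled mean is derived centrally. It is not the PHT implementation; station names and values are made up for illustration.

```python
import math
from typing import Dict, List

# Hypothetical values of one numeric parameter held locally at three sites.
STATIONS: Dict[str, List[float]] = {
    "hospital_A": [12.0, 15.5, 9.0],
    "hospital_B": [20.0, 18.0],
    "hospital_C": [7.5, 11.0, 13.5, 10.0],
}


def train(local_values: List[float]) -> Dict[str, float]:
    """Runs inside a station; only the count and sum leave the site."""
    return {"n": len(local_values), "sum": sum(local_values)}


def visit_all(stations: Dict[str, List[float]]) -> Dict[str, float]:
    """Send the train to every station and pool the returned aggregates."""
    results = [train(values) for values in stations.values()]
    n = sum(r["n"] for r in results)
    total = sum(r["sum"] for r in results)
    return {"n": n, "mean": total / n if n else math.nan}


print(visit_all(STATIONS))
```

Returning count and sum, rather than each site's mean, lets the central side compute the exact pooled mean even when sites have different numbers of records.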


We demonstrated that a FAIR research data management plan based on ontological models, open Science, Semantic Web technologies, and FDPs is a powerful method for generating FAIR patient data at the source. FAIRification provides data infrastructure that improves the findability, accessibility, interoperability and reusability of real-world patient observations in the hospital. Most importantly, we showed that FAIR patient data are machine actionable as digital objects, linkable to LOD for analysis, and ready for building applications on top of them for hypothesis generation and knowledge discovery. Finally, this work (in progress) showed what FAIRification entails in a real-world hospital situation with existing infrastructure, different stakeholders and departments, and the GDPR, and we discussed obstacles, challenges, solutions and future directions. We aim to provide a state-of-the-art research data infrastructure in the hospital that delivers a federated solution enabling data access across the country and international borders, accelerating research and its translation to healthcare.

Materials and methods


FAIR digital objects and globally unique persistent identifiers (GUPRIDs)

The FAIR principles, specifically F1, include the requirement that metadata and data should be identified by GUPRIDs. In addition to this, the FAIR principle A1 requires that metadata and data are retrievable by their identifiers using a standardized communications protocol. As such, we set up our persistent identifiers according to these requirements for data and metadata (and the FDP itself as well):

Data. The synthetic patient cytokine lab measurements dataset, which is in turn described by metadata records that are FDOs themselves, is identified and retrievable by a GUPRID based on the W3ID persistent identifier service (Footnote 12), as is its RDF distribution, and it is accessible through the LUMC BEAT-COVID FDP.

Metadata. The metadata of the patient cytokine dataset is identified and retrievable by a GUPRID based on the PURL persistent identifier service (Footnote 13), and it is accessible through the LUMC BEAT-COVID FDP.
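Principle A1 is met here by HTTP: a client dereferences the GUPRID and uses content negotiation to ask for a machine-readable serialization. The sketch below shows such a request with the standard library. The identifier is a hypothetical W3ID-style IRI, not the project's actual GUPRID, and `resolve` needs network access, so it is defined but not called.

```python
from urllib.request import Request, urlopen

# Hypothetical GUPRID; the project's actual identifiers are not reproduced here.
GUPRID = "https://w3id.org/example-beat-covid/dataset/cytokines"


def rdf_request(iri: str, media_type: str = "text/turtle") -> Request:
    """HTTP GET request asking the identifier's resolver for an RDF serialization."""
    return Request(iri, headers={"Accept": media_type}, method="GET")


def resolve(iri: str, media_type: str = "text/turtle", timeout: float = 10.0):
    """Follow redirects (urlopen does this by default) and return the final URL,
    the Content-Type and the body. Requires network access."""
    with urlopen(rdf_request(iri, media_type), timeout=timeout) as response:
        return response.geturl(), response.headers.get("Content-Type"), response.read()


req = rdf_request(GUPRID)
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Persistent identifier services such as W3ID and PURL work through HTTP redirects, so the identifier stays stable while the server hosting the RDF can change.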


We mapped data to the Open Biological and Biomedical Ontologies (OBO) to facilitate biomedical integrative analytics, since these ontologies are developed by the community, following the OBO principles [30, 31], to be interoperable, logically well-formed and scientifically accurate. For data annotation with OBO ontologies, we mainly used the Ontobee software system (Footnote 14), the Ontology Lookup Service from the EBI (Footnote 15), and the NCBO BioPortal (Footnote 16) as search engines to find ontological terms. The ontologies used for each model are described below.

Data model. For basic knowledge representation in RDF: RDF vocabulary or RDF (Footnote 17), RDF Schema or RDFS (Footnote 18), DCMI Metadata Terms – Dublin Core or DCT (Footnote 19), XML Schema or XSD (Footnote 20). For general science and provenance representation: Semanticscience Integrated Ontology or SIO (Footnote 21), the PROV ontology or PROV-O (Footnote 22). For biological and biomedical domain representation: OBO ontologies (Footnote 23) (such as NCIT, IAO, OBI, RO, CMO and LABO) and the Experimental Factor Ontology or EFO (Footnote 24). For BEAT-COVID study-specific representation: the BEAT-COVID Ontology or BCO (Footnote 25), developed for the formal representation of the cytokine data model in OWL2.

Metadata model. For basic DCAT-based metadata representation: RDF Vocabulary or RDF (Footnote 26), Data Catalog Vocabulary version 2 or DCAT2 (Footnote 27), DCMI Metadata Terms – Dublin Core or DCT (Footnote 28), FOAF Vocabulary or FOAF (Footnote 29). For BEAT-COVID study representation: RDF Schema or RDFS (Footnote 30), XML Schema or XSD (Footnote 31), FDP Ontology or FDP-O (Footnote 32), the W3C Linked Data Platform Vocabulary or LDP (Footnote 33), OBO ontologies (Footnote 34) (such as NCIT, MONDO, IAO, RO, OGMS, EXO and DO), Semanticscience Integrated Ontology or SIO (Footnote 35), Wikidata Vocabulary (Footnote 36), Allotrope Foundation Ontology or AFX (Footnote 37), and the DataCite Ontology (Footnote 38).
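Where no specific ontology term existed, data types were mapped to a more general class, as described in the Discussion for interleukins and for the ‘Cytokine Measurement’ process class. The sketch below illustrates that fallback strategy as a lookup: exact term first, then a superclass rule, then a generic class. The class labels are those quoted in the text (IRIs omitted); the exact entry for interleukin-6, the prefix-based rule, and the collapsing of data item and process classes into one lookup are simplifying assumptions.

```python
from typing import Dict, List, Tuple

# Class labels quoted from the text; IRIs are not reproduced.
# The exact entry for interleukin-6 is an assumption for illustration.
EXACT: Dict[str, str] = {"interleukin-6": "blood interleukin-6 level"}
SUPERCLASS_RULES: List[Tuple[str, str]] = [
    ("interleukin-", "blood interleukin measurement"),
]
GENERIC_CLASS = "Cytokine Measurement"


def map_cytokine(name: str) -> Tuple[str, str]:
    """Return (class label, match kind): exact term first, then a more
    general superclass, then the generic cytokine measurement class."""
    key = name.strip().lower()
    if key in EXACT:
        return EXACT[key], "exact"
    for prefix, general in SUPERCLASS_RULES:
        if key.startswith(prefix):
            return general, "superclass"
    return GENERIC_CLASS, "generic"


for cytokine in ["Interleukin-6", "interleukin-11",
                 "Tumor Necrosis Factor Ligand Superfamily Member 14"]:
    print(cytokine, "->", map_cytokine(cytokine))
```

Recording the match kind alongside the class keeps the loss of specificity visible, so superclass mappings can be revisited when more specific terms become available.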


The BEAT-COVID dataset on which we based our ontological models was an anonymized longitudinal set of cytokine levels measured in COVID-19 patients hospitalized in the LUMC. We created a shortened, synthetic pilot version of the dataset as a proof of concept for our FAIRification approach in the hospital. We created the synthetic dataset using randomization functions in Excel. The synthetic dataset contains 9 rows of measurement records on 103 cytokines measured in 4 different panels using Luminex technology. Each record contains basic information such as the record timestamp, the date of sampling, the age of the patient, the date of measurement and the cytokine levels. Example data records in tabular format are available on GitHub (see Availability of data and materials).
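The authors generated the synthetic data with Excel randomization functions. As a scriptable alternative, the sketch below produces records with the fields listed above using a seeded random generator, so that the synthetic dataset is reproducible. Column names, the date window, the age range and the value range are arbitrary assumptions, and only three cytokine columns stand in for the 103 in the pilot.

```python
import csv
import io
import random
from datetime import date, datetime, timedelta

# Hypothetical column names: the pilot has 103 cytokines; three are enough here.
CYTOKINES = ["cytokine_001", "cytokine_002", "cytokine_003"]


def synthetic_rows(n_rows: int, seed: int = 42) -> list:
    """Generate reproducible random records with the fields listed in the text.
    Date window, age range and value range are arbitrary assumptions."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        sampled = date(2020, 3, 1) + timedelta(days=rng.randint(0, 60))
        measured = sampled + timedelta(days=rng.randint(1, 14))
        row = {
            "record_id": f"rec-{i + 1}",
            "record_timestamp": datetime.combine(measured, datetime.min.time()).isoformat(),
            "sampling_date": sampled.isoformat(),
            "age": rng.randint(18, 90),
            "measurement_date": measured.isoformat(),
        }
        for name in CYTOKINES:
            row[name] = round(rng.uniform(0.0, 500.0), 2)
        rows.append(row)
    return rows


def to_csv(rows: list) -> str:
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(synthetic_rows(9)))
```

Fixing the seed means the same synthetic file can be regenerated on demand, which helps when downstream RDF or queries need to be re-validated.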


We used several technologies in the different steps of our method. The FAIRification tools and versions used are described within each step in the Methods section below. The software and tools we used to build three different applications on top of FAIR data were:

Data analytics with Semantic Web technologies. We used the W3C recommended SPARQL query language [39] to perform data analytics over the LUMC RDF patient data and across diverse external data sources in LOD. We used the free edition of GraphDB Triple Store v9.7.0, where the data is natively stored as RDF.

Web API development. We used grlc v1.3.6 [38] to enable programmatic access to FAIR data in the hospital. grlc is a lightweight server that automatically builds consistent, well-documented and neatly organized Linked Data APIs on the fly, requiring no input from users beyond a URL path to a GitHub repository hosting a set of SPARQL queries that comply with the specific grlc syntax (Footnote 39). It provides three basic operations: 1. it generates the Swagger spec of a specified GitHub repository; 2. it generates the Swagger UI to provide an interactive, user-facing frontend to the API contents; and 3. it translates SPARQL queries into HTTP requests that call the operations of the API against a SPARQL endpoint, with parameters set in the queries.

Hypothesis generation tool. We used the Neo4j graph database framework [64] as used in the structured reviews approach [59] for storage, management and mining of FAIR patient data. The graph database technology has been shown to facilitate management and exploration of biomedical knowledge [65]. Neo4j graph database enables users to query the knowledge graph using the Cypher query language, either through an API or a GUI. RDF data was imported into the Neo4j Community Server v4.2.5 graph database through the Neo4j neosemantics toolkit v4.2.0 [66].

Note that we created a GUPRID for each software/service application based on the W3ID persistent identifier service, i.e. for the FDP, for the Triple Store, and for the Neo4j browser (see Availability of data and materials section).
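grlc query files use `#+` comment decorators (such as `summary`, `endpoint` and `method`) and treat variables prefixed with `?_` as API parameters, with an `_iri` suffix for IRI-typed parameters. The sketch below mimics that convention to show what "parameters set in the queries" means: it reads the decorators and substitutes parameter values into the query text. It is not grlc's implementation; the endpoint URL and the `ex:` terms are hypothetical.

```python
import re
from typing import Dict

# A query file in grlc style; endpoint URL and ex: terms are hypothetical.
GRLC_QUERY = """#+ summary: Cytokine measurements for one patient
#+ endpoint: http://localhost:7200/repositories/beat-covid
#+ method: GET

PREFIX ex: <https://example.org/beat-covid#>
SELECT ?measurement ?value WHERE {
  ?measurement ex:aboutPatient ?_patient_iri ;
               ex:hasValue ?value .
}
"""

# '?_name' is a string parameter, '?_name_iri' an IRI parameter.
PARAM = re.compile(r"\?_(\w+?)(_iri)?\b")


def decorators(query: str) -> Dict[str, str]:
    """Collect single-line '#+ key: value' decorators."""
    found = {}
    for line in query.splitlines():
        m = re.match(r"#\+\s*(\w+):\s*(.*)", line)
        if m:
            found[m.group(1)] = m.group(2).strip()
    return found


def bind(query: str, **values: str) -> str:
    """Substitute parameter values as RDF terms (illustration, not grlc's code)."""
    def substitute(m):
        name, is_iri = m.group(1), m.group(2)
        value = values[name]  # raises KeyError if a parameter is missing
        if is_iri:
            return f"<{value}>"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return PARAM.sub(substitute, query)


print(decorators(GRLC_QUERY))
print(bind(GRLC_QUERY, patient="https://example.org/patient/1"))
```

In grlc itself, each such query file becomes one API operation, and the parameters appear as query parameters in the generated Swagger UI.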


We defined and implemented a method to make COVID-19 observational patient data in the hospital FAIR. This method is described in a detailed FAIRification workflow, illustrated in Fig. 7, which is an adapted version of the workflow presented by Jacobsen et al. [12]. We explicitly state the result obtained in each step, where applicable, and indicate in which steps the FAIR experts worked in collaboration with other members of the BEAT-COVID group.

Fig. 7

BEAT-COVID FAIRification workflow to make the data management and infrastructure in the hospital more FAIR. Collaborators and results are described in every step where applicable


Step 1: identify FAIRification objective

The first step was to determine the objective for making COVID-19 observational patient data FAIR in the hospital to define the specific FAIR requirements, implementations and workflow of this study. Medical doctors have pressing questions at point of care such as ‘What are the clinical parameters that can predict the disease course of a patient?’, ‘What are the biological pathways underlying patient symptoms and disease phenotypes?’, and ‘How can a patient be positively or adversely affected by a particular treatment?’. The FAIRification objective was therefore to prepare the diverse COVID-19 observational patient data to answer these questions. To this end, data needs to be integrated in a network and systems medicine approach [67], combined with external biomedical knowledge, and ready for computational analysis as illustrated in Fig. 1.

Step 2 and step 3: analyze data and metadata

Research data management in the hospital. From admission until discharge, patient data were collected by different departments. The types of COVID-19 observational data relevant for research, and thus for FAIRification, were diverse: demographic information, clinical information, laboratory measurements, transcriptomics (RNA-Seq) data, metabolomics data, and, if the patient was transferred to the ICU, data related to ICU outcome. The format depended on the EDC system used. Within the LUMC, clinical and preclinical information were collected in the HiX [68] and Castor [26] EDC systems, whereas ICU data were managed by the MetaVision software [69]. These EDC systems have different data access interfaces and use different technologies. To provide a single point of data access, research data were combined in the Opal data warehousing system. Opal is the core database application of OBiBa (Open Source Software for Epidemiology) for storing data in central data repositories that integrate, under a uniform interface, data collected from multiple sources, and it provides tools to import, transform and describe data [27]. Patient data were anonymised using advanced data encryption before being imported into Opal. Descriptions of the datasets, i.e. metadata, stored in Opal were published on the Web through the Mica software application. Mica is used to create Web data portals for large-scale studies or multiple-study consortia. It provides a structured description of consortia, study catalogues and datasets, annotated and searchable data dictionaries, and data access request management. It is built upon a multitier architecture consisting of a REST application server for data management and administration, and clients to create and display data on the Web [28]. Opal and Mica are two standalone but interoperable software applications that provide features for the management, harmonization, and analysis of epidemiological datasets [29, 70].
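The text does not specify the encryption used before import into Opal. One common technique for removing direct identifiers while keeping daily updates linkable is keyed hashing; the sketch below swaps in HMAC-SHA256 for that purpose and should not be read as the LUMC procedure. The field names, the identifier and the values are hypothetical. Note that the result is pseudonymised rather than anonymous data: whoever holds the key can recompute pseudonyms for known identifiers.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """HMAC-SHA256 of an identifier: the same key always yields the same
    pseudonym, so daily updates stay linkable, while without the key nobody
    can recompute the pseudonym for a candidate identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(record: dict, key: bytes, id_field: str = "patient_id") -> dict:
    """Replace the direct identifier in a record by its pseudonym."""
    cleaned = {k: v for k, v in record.items() if k != id_field}
    cleaned["pseudonym"] = pseudonymise(record[id_field], key)
    return cleaned


key = secrets.token_bytes(32)  # to be held outside the research infrastructure
day1 = strip_identifiers({"patient_id": "MRN-0001", "crp": 42}, key)
day2 = strip_identifiers({"patient_id": "MRN-0001", "crp": 37}, key)
print(day1["pseudonym"] == day2["pseudonym"])
```

Keeping the key outside the research environment is what separates this from a plain hash, which could be reversed by hashing a list of known record numbers.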

FAIR analysis of COVID-19 observational patient data. To improve the findability, accessibility, interoperability, and reusability of digital assets, we performed a FAIR analysis of (meta)data, i.e. an analysis of the FAIR status of data and metadata. We analysed data and databases to evaluate the FAIRification effort needed [12]. We started by analysing observational clinical measurements. We first got access to laboratory measurements of immune response clinical parameters, namely cytokine levels, collected at different time points per patient to monitor the progress of their condition. Access to data was provided to us as an anonymised dataset. Then, we analysed the databases where these data were stored: first the Castor databases, since Castor was the primary EDC system used in the hospital, and second the Opal data warehouse, since this system was used to integrate and store data from the various data sources. We investigated the representation (structure and format) and meaning (semantics) of the data, and the tools and technologies of each database system, to optimize the FAIRification process.


Step 2a and step 4a: improving interoperability with semantic web technologies and a linking data model

We described a synthetic cytokines dataset with ontologies. The GDPR imposes obligations on organizations anywhere, so long as they target or collect data related to people in the EU. To comply with the GDPR, we created a synthetic dataset of cytokine measurements, i.e. a substitute synthetic representation of a sensitive dataset, by using randomization to model patient data. This dataset contains basic information related to cytokine measurements and the biosamples used per patient and time point, and a patient clinical identifier to link to clinical data. With the goals of answering research questions of medical doctors and making patient data machine readable, to enable interoperability within data resources in the hospital and with external open science datasets such as LOD, we designed ontological models for cytokine lab measurements, biosamples and severity scores. These models represent data based on the Linked Data principles [71] and Semantic Web technologies such as the W3C recommended RDF and OWL standards [32, 33]. Our approach was to define a conceptual model as an abstract and reusable model that captures as much of the patient data (measurements, biosamples and score phenotypes) as possible, by using standard common schemas and well-established ontologies and vocabularies widely used by the biomedical community, such as those in the OBO Foundry [30]. With this approach, we created an ontological linking model for the cytokine measurements dataset from the laboratory.

Step 3a and step 4b: improving findability, accessibility, interoperability and reusability with semantic web technologies, a metadata model and FAIR data points

With the goals of answering research questions of medical doctors and making resource metadata human and machine readable to enable cross-resource data analytics, we designed a metadata ontological model and implemented an FDP instance [23] to make LUMC COVID-19 digital objects findable for machines on the Internet. An FDP is a Web application that enables data owners to expose information about their datasets using rich machine-actionable metadata. It allows creating, storing, and serving FAIR metadata about datasets and their distributions for both humans and machines. An FDP does not imply open access; rather, the metadata are expected to include information about what the resource contains and how datasets and their content can be accessed under defined conditions. Opening up FAIR (meta)data by publishing them on an FDP allows algorithms to search these (meta)data, looking for patterns [72]. Mica is a tool to expose datasets from an Opal database on the Internet through Web portals that allow (meta)data descriptions. An FDP provides additional means to expose FAIR, i.e. machine-actionable, metadata via the FDP specification, a standardized metadata ontological model based on DCAT [22]. An FDP also exposes (meta)data via a REST Web API that enables client applications to automate the retrieval, aggregation and filtering of (meta)data from distributed FDPs. We used FDP v1.10.0.
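Because the FDP metadata model builds on DCAT, the core of what an FDP serves for a dataset is a DCAT description. The sketch below renders a minimal DCAT 2 description of one dataset and one distribution in Turtle. The IRIs are hypothetical, and the catalogue level and FDP-specific properties of a real FDP record are omitted.

```python
def _escape(text: str) -> str:
    """Escape a string for use inside a Turtle double-quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def dcat_dataset_turtle(dataset_iri: str, title: str, distribution_iri: str,
                        access_url: str, media_type: str = "text/turtle") -> str:
    """Minimal DCAT 2 description of one dataset with one distribution."""
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<{dataset_iri}> a dcat:Dataset ;
    dct:title "{_escape(title)}" ;
    dcat:distribution <{distribution_iri}> .

<{distribution_iri}> a dcat:Distribution ;
    dcat:accessURL <{access_url}> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/{media_type}> .
"""


ttl = dcat_dataset_turtle(
    "https://example.org/dataset/cytokines",
    'Synthetic "cytokine" measurements',
    "https://example.org/distribution/cytokines-ttl",
    "https://example.org/sparql",
)
print(ttl)
```

Separating the dataset from its distributions lets one dataset advertise several access routes (for example a SPARQL endpoint and an RDF dump) under its own access conditions.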

Step 5 and step 6: make (meta)data as linked data and host FAIR data

To host and publish patient data, we reduced the original synthetic cytokine patient dataset to a few rows. We generated patient Linked Data by using this synthetic patient data as input to instantiate the linking data ontological model. To do so, we developed ‘RDFizer’, a FAIRification tool in Python 3 that parses the synthetic data CSV file and converts it into RDF. To host the generated FAIR data, we used the free edition of GraphDB Triple Store [73] v9.7.0, where the data is natively stored as RDF. We implemented an FDP instance in which the metadata ontological model is described and published as DCAT-based Linked Data.
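The RDFizer source is listed under Availability of data and materials; the sketch below is an independent, minimal illustration of the same kind of CSV-to-RDF conversion, emitting N-Triples with the standard library only. It uses a hypothetical `https://example.org/beat-covid/` vocabulary in place of the BCO and OBO terms of the actual linking model, and it assumes that cytokine values are numeric and that column names are safe to use in IRIs.

```python
import csv
import io
from typing import List

EX = "https://example.org/beat-covid/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def _iri(value: str) -> str:
    return f"<{value}>"


def _lit(value: str, datatype: str) -> str:
    return f'"{value}"^^<{XSD}{datatype}>'


def rdfize(csv_text: str, cytokine_columns: List[str]) -> str:
    """Convert CSV rows to N-Triples: one record node per row and one
    measurement node per cytokine column (values assumed numeric)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = f"{EX}record/{row['record_id']}"
        lines.append(f"{_iri(record)} {_iri(RDF_TYPE)} {_iri(EX + 'MeasurementRecord')} .")
        lines.append(f"{_iri(record)} {_iri(EX + 'samplingDate')} {_lit(row['sampling_date'], 'date')} .")
        for column in cytokine_columns:
            measurement = f"{record}/{column}"
            lines.append(f"{_iri(record)} {_iri(EX + 'hasMeasurement')} {_iri(measurement)} .")
            lines.append(f"{_iri(measurement)} {_iri(EX + 'cytokine')} {_iri(EX + 'cytokine/' + column)} .")
            lines.append(f"{_iri(measurement)} {_iri(EX + 'value')} {_lit(row[column], 'decimal')} .")
    return "\n".join(lines) + "\n"


SAMPLE = "record_id,sampling_date,il6,il10\nrec-1,2020-04-01,12.5,3.1\nrec-2,2020-04-03,8.0,2.4\n"
print(rdfize(SAMPLE, ["il6", "il10"]), end="")
```

N-Triples output like this can be loaded directly into a triple store such as GraphDB; a library such as rdflib would add proper escaping and other serializations.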


Step 7: assessment and software applications

Evaluation. We evaluated the discoverability of the BEAT-COVID resource by means of the FAIR Maturity Indicators evaluator tool [74]. We are evaluating our ontological models by means of several CQs [58] (in progress). We answered the questions with SPARQL queries for reusability, so that users can rerun the queries if they want updated answers in the future.

Building applications on top of FAIR data. We implemented three different applications: 1. SPARQL federated queries for data analytics with Semantic Web technologies; 2. a Web API service for programmatic access; 3. a knowledge graph based hypothesis generation tool. See the Materials section for software details.

Severity score calculation

The severity score is based on the 4C mortality score developed by Knight et al. [75]. The 4C mortality score is a prediction score calculated at admission. The severity score calculated in our cohort represents the daily clinical disease severity and thus depends on parameters that can change from day to day. Therefore, the fixed parameters of the 4C score (i.e. age, sex at birth, and number of comorbidities) were removed, and daily oxygen flow (l/min) for non-ICU patients, and P/F ratio (kPa) and FiO2 (%) for ICU patients, were added to our severity score.

Availability of data and materials

The datasets supporting the conclusions of this article are available in the following repositories. The ontological models, the SPARQL queries, grlc SPARQL queries, SPARQL CQs and scripts are freely available at the Biosemantics (GitHub): The data model is available at

The metadata model is available at

Synthetic cytokine patient dataset in CSV is available at

Source code for RDFizer is available at

COVID-19 synthetic patient cytokine knowledge graph in RDF is available at

RDF data is accessible through the LUMC BEAT-COVID FDP at

Source code for the FDP implementation is freely available from the FAIRDataPoint project at

RDF is queryable through the BEAT-COVID Triple Store at

SPARQL queries are available at

grlc endpoint APIs are available at

grlc SPARQL queries are available at

LUMC BEAT-COVID Knowledge graph is available for browsing at

Evaluations: FAIR assessment results of a dataset described in Mica are available at!/evaluations/4081, and the FAIR assessment results of the same dataset, but described in an FDP, are available at!/evaluations/5589

SPARQL CQs are available at

Figures: All model figures both in this manuscript and in GitHub project repository were automatically produced using the corresponding RDF/Turtle file as input and the Web drawing tool at








  7. The Clinical Measurement Ontology (CMO), OBO Foundry,

  8. The Experimental Factor Ontology (EFO), EMBL-EBI,

  9. The ‘blood interleukin measurement’ class from CMO, OBO Foundry,

  10. The ‘blood interleukin-6 level’ class from CMO, OBO Foundry,

  11. The ’Cytokine Measurement’ class from the NCI Thesaurus OBO Edition (NCIT), OBO Foundry,














  25. owl/cytokine_ontological_model.owl












  37. voc--afo--REC--2021--03--afo_type=pyLodeDoc.html





Abbreviations

FAIR: Findable, Accessible, Interoperable and Reusable

VODAN: Virus Outbreak Data Network

TWOC: Trusted World of Corona

FDO: FAIR Digital Object

GDPR: General Data Protection Regulation

LUMC: Leiden University Medical Center

RDM: Research Data Management

EJP RD: European Joint Programme Rare Diseases

DCAT: Data Catalogue Vocabulary

FDP: FAIR Data Point

LOD: Linked Open Data

EDC: Electronic Data Capture

OBO: Open Biological and Biomedical Ontologies

OWL: Web Ontology Language

RDF: Resource Description Framework

ICU: Intensive Care Unit

GUI: Graphical User Interface

URI: Uniform Resource Identifier

CQ: Competency Question


  1. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016; 3:160018.


  2. GO FAIR. Virus Outbreak Data Network (VODAN). 2021. Accessed 23 Jul 2021.

  3. ZonMw. COVID-19 Programme. 2021. Accessed 23 Jul 2021.

  4. Health Holland. Trusted World of Corona (TWOC). 2021. Accessed 23 Jul 2021.

  5. ELIXIR. ELIXIR COVID-19 Services. 2021. Accessed 27 Jul 2021.

  6. Luiz Olavo Bonino da Silva Santos. FAIR Digital Object Framework. 2020. Accessed 27 Jul 2021.

  7. Lamprecht AL, Garcia L, et al. Towards FAIR principles for research software. Data Sci. 2020; 3:37–59.


  8. GO FAIR. Data Together COVID-19 Appeal and Actions. 2020. Accessed 23 Jul 2021.

  9. van Soest J, Sun C, Mussmann O, et al. Using the personal health train for automated and privacy-preserving analytics on vertically partitioned data. Stud Health Tech Inf. 2018; 247:581–5.


  10. Beyan O, Choudhury A, van Soest J, Kohlbacher O, Zimmermann L, Stenzhorn H, Md. Karim R, Dumontier M, Decker S, da Silva Santos LOB, Dekker A. Distributed analytics on sensitive medical data: The personal health train. Data Intell. 2020; 2:96–107.


  11. Landi A, Thompson M, Giannuzzi V, Bonifazi F, Labastida I, da Silva Santos LOB, Roos M. The “A” of FAIR – As Open as Possible, as Closed as Necessary. Data Intell. 2020; 2(1-2):47–55.


  12. Jacobsen A, Kaliyaperumal R, da Silva Santos LOB, Mons B, Schultes E, Roos M, Thompson M. A generic workflow for the data fairification process. Data Intell. 2020; 2:56–65.


  13. Groenen KHJ, Jacobsen A, Kersloot MG, Vieira B, van Enckevort E, Kaliyaperumal R, Arts DL, ’t Hoen PAC, Cornet R, Roos M, Kool LS. The de novo fairification process of a registry for vascular anomalies. medRxiv. 2020.

  14. FAIRplus project: FAIR Cookbook. The FAIR Cookbook: a deliverable of the FAIRplus project (grant agreement 802750), funded by the IMI programme, a private-public partnership that receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA Companies. 2019. Accessed 26 Jul 2021.

  15. Innovative Medicine Initiative. FAIRplus project. 2019. Accessed 26 Jul 2021.

  16. GO FAIR VODAN. A three-point framework for FAIRification. 2020. Accessed 28 Jul 2021.

  17. dos Santos Vieira B, et al. A de novo fairification process for rare disease registries. In: Abstracts of the International Congress of Research on Rare and Orphan Diseases: January 13-15, 2021; Online: 2021. p. 67.

  18. Swiss Academy of Medical Sciences. Swiss Personalized Health Network. 2020. Accessed 28 Jul 2021.

  19. NFDI4Health. NFDI4Health Nationale Forschungsdateninfrastruktur für personenbezogene Gesundheitsdaten. 2021. Accessed 28 Jul 2021.

  20. Roukens AHE, König M, Dalebout T, Tak T, Azimi S, Kruize Y, Pothast CR, Hagedoorn RS, Arbous SM, Zhang JLH, Verheij M, Prins C, van der Does AM, Hiemstra PS, de Vries JJC, Janse JJ, Roestenberg M, Myeni SK, Kikkert M, Heemskerk MHM, Yazdanbakhsh M, Smits HH, Jochems SP, BEAT-COVID group. Prolonged activation of nasal immune cell populations and development of tissue-resident sars-cov-2 specific cd8 t cell responses following covid-19. medRxiv. 2021.

  21. Kaliyaperumal R, Wilkinson MD, Alarcón Moreno P, Benis N, Cornet R, dos Santos Vieira B, Dumontier M, Bernabé CH, Jacobsen A, Le Cornec CMA, Godoy MP, Queralt-Rosinach N, Schultze Kool LJ, Swertz MA, van Damme P, van der Velde KJ, van Lin N, Zhang S, Roos M. Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data. medRxiv. 2021.

  22. W3C. DCAT2 W3C Homepage. 2020. Accessed 24 Aug 2020.

  23. da Silva Santos LOB, Wilkinson MD, Kuzniar A, Kaliyaperumal R, Thompson M, Dumontier M, Burger K. Fair data points supporting big data interoperability. Lond ISTE Press. 2016; 4:270–9.


  24. Lin Y, Harris MR, Manion FJ, Eisenhauer E, Zhao B, Shi W, Karnovsky A, He Y. Development of a BFO-Based Informed Consent Ontology (ICO). In: Hogan WR, Arabandi S, Brochhausen M, editors. Proceedings of the 5th International Conference on Biomedical Ontology, ICBO 2014, Houston, Texas, USA, October 8-9, 2014. Aachen, Germany: CEUR Workshop Proceedings: 2014. p. 84–86.


  25. Lawson J, Cabili MN, Kerry G, Boughtwood T, Thorogood A, Alper P, Bowers SR, Boyles RR, Brookes AJ, Brush M, Burdett T, Clissold H, Donnelly S, Dyke SOM, Freeberg MA, Haendel MA, Hata C, Holub P, Jeanson F, Jene A, Kawashima M, Kawashima S, Konopko M, Kyomugisha I, Li H, Linden M, Rodriguez LL, Morita M, Mulder N, Muller J, Nagaie S, Nasir J, Ogishima S, Ota Wang V, Paglione LD, Pandya RN, Parkinson H, Philippakis AA, Prasser F, Rambla J, Reinold K, Rushton GA, Saltzman A, Saunders G, Sofia HJ, Spalding JD, Swertz MA, Tulchinsky I, van Enckevort EJ, Varma S, Voisin C, Yamamoto N, Yamasaki C, Zass L, Guidry Auvil JM, Nyrönen TH, Courtot M. The data use ontology to streamline responsible access to human biomedical datasets. Cell Genomics. 2021; 1(2):100028.


  26. Castor. Castor Homepage. 2021. Accessed 20 Aug 2020.

  27. OBiBa. Opal OBiBa’s software Homepage. 2020. Accessed 20 Aug 2020.

  28. OBiBa. Mica OBiBa’s software Homepage. 2020. Accessed 20 Aug 2020.

  29. Doiron D, Marcon Y, Fortier I, Burton P, Ferretti V. Software application profile: Opal and mica: open-source software solutions for epidemiological data management, harmonization and dissemination. Int J Epidemiol. 2017; 46(5):1372–8.


  30. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone S-A, Scheuermann RH, Shah N, Whetzel PL, Lewis S. The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration. Nat Biotechnol. 2007; 25(11):1251–5.


  31. Jackson R, Matentzoglu N, Overton JA, Vita R, Balhoff JP, Buttigieg PL, Carbon S, Courtot M, Diehl AD, Dooley DM, Duncan WD, Harris NL, Haendel MA, Lewis SE, Natale DA, Osumi-Sutherland D, Ruttenberg A, Schriml LM, Smith B, Stoeckert Jr CJ, Vasilevsky NA, Walls RL, Zheng J, Mungall CJ, Peters B. OBO Foundry in 2021: operationalizing open data principles to evaluate ontologies. Database. 2021; 2021.

  32. OWL W3C Homepage. Accessed 21 Aug 2020.

  33. RDF W3C Homepage. Accessed 21 Aug 2020.

  34. EJP RD. EJP RD core CDE semantic model. Accessed 18 Oct 2020.

  35. Queralt-Rosinach N, Bello S, Hoehndorf R, Weiland C, Rocca-Serra P, Schofield PN. Modeling quantitative traits for covid-19 case reports. medRxiv. 2020.

  36. Acute Physiology And Chronic Health Evaluation (APACHE). APACHE IV Score. 2021. Accessed 28 Jul 2021.

  37. Sequential Organ Failure Assessment (SOFA). SOFA Score. 2021. Accessed 28 Jul 2021.

  38. Meroño-Peñuela A, Hoekstra R. grlc makes github taste like linked data apis In: Sack H, Rizzo G, Steinmetz N, Mladenic D, Auer S, Lange C, editors. The Semantic Web. ESWC 2016. Lecture Notes in Computer Science, vol 9989: 2016.

  39. SPARQL W3C Homepage. Accessed 21 Aug 2020.

  40. Kersloot MG, Jacobsen A, Groenen KHJ, Vieira BdS, Kaliyaperumal R, Abu-Hanna A, Cornet R, ’t Hoen PAC, Roos M, Kool LS, Arts DL. De-novo fairification via an electronic data capture system by automated transformation of filled electronic case report forms into machine-readable data. medRxiv. 2021.

  41. Roos M, Lopes P. Bring your own data parties and beyond: make your data linkable to speed up rare disease research. Rare Dis Orphan Drugs Int J Publ Health. 2014; 1(4):21.


  42. Roos M, et al. Bring your own data workshops: a mechanism to aid data owners to comply with linked data best practices. In: Paschke A, Burger A, Romano P, Marshall MS, Splendiani A, editors. Proceedings of the 7th International Workshop on Semantic Web Applications and Tools for Life Sciences: December 9-11, 2014; Berlin, Germany: 2014. p. 16–27.

  43. ELIXIR-EXCELERATE. Bring Your Own Data. 2019. Accessed 28 Jul 2021.

  44. Scholtens S, Jetten M, Böhmer J, Staiger C, Slouwerhof I, van der Geest M, van Gelder CWG. Final report: Towards FAIR data steward as profession for the lifesciences. Report of a ZonMw funded collaborative approach built on existing expertise. 2019.

  45. HL7 International. HL7 FHIR Homepage. 2019. Accessed 18 Oct 2020.

  46. Löbe M, Matthies F, Stäubert S, Meineke FA, Winter A. Problems in fairifying medical datasets In: Pape-Haugaard LB, Lovis C, Madsen IC, Weber P, Nielsen PH, Scott P, editors. Digital Personalized Health and Medicine - Proceedings of MIE 2020, Medical Informatics Europe, Geneva, Switzerland, April 28 - May 1, 2020. Studies in Health Technology and Informatics: 2020. p. 392–6.

  47. FAIR4Health. D2.3. Guidelines for implementing FAIR Open Data policy in health research. 2019. Accessed 24 Aug 2020.

  48. ELIXIR. ELIXIR’s Training Portal. 2020. Accessed 28 Jul 2021.

  49. Ammenwerth E, Hörbst A, Hayn D, Schreier G. eHealth2014 - Health Informatics Meets eHealth, Studies in Health Technology and Informatics. Amsterdam, The Netherlands: IOS Press; 2014, p. 272.


  50. Baker DB, Knoppers BM, Phillips M, van Enckevort D, Kaufmann P, Lochmuller H, Taruscio D. Privacy-preserving linkage of genomic and clinical data sets. IEEE/ACM Trans Comput Biol Bioinforma. 2019; 16(4):1342–8.


  51. ELIXIR. RDMkit. 2021. Accessed 26 Jul 2021.

  52. FAIR4Health. FAIR4Health. 2019. Accessed 28 Jul 2021.

  53. Health-RI. FAIR Principles. 2019. Accessed 28 Jul 2021.

  54. Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inf Assoc. 2017; 25(3):230–8.


  55. MITRE. Synthea. 2021. Accessed 28 Jul 2021.

  56. Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J, Kolesnikov N, Zhukova A, Brazma A, Parkinson H. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics. 2010; 26(8):1112–8.


  57. Jacobsen A, de Miranda Azevedo R, Juty N, Batista D, Coles S, Cornet R, Courtot M, Crosas M, Dumontier M, Evelo CT, Goble C, Guizzardi G, Hansen KK, Hasnain A, Hettne K, Heringa J, Hooft RWW, Imming M, Jeffery KG, Kaliyaperumal R, Kersloot MG, Kirkpatrick CR, Kuhn T, Labastida I, Magagna B, McQuilton P, Meyers N, Montesanti A, van Reisen M, Rocca-Serra P, Pergl R, Sansone S-A, da Silva Santos LOB, Schneider J, Strawn G, Thompson M, Waagmeester A, Weigel T, Wilkinson MD, Willighagen EL, Wittenburg P, Roos M, Mons B, Schultes E. FAIR Principles: Interpretations and Implementation Considerations. Data Intell. 2020; 2(1-2):10–29.


  58. Grüninger M, Fox MS. Methodology for the Design and Evaluation of Ontologies. In: International Joint Conferences on Artificial Intelligence (IJCAI), Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, Canada, April 13, 1995. San Francisco, CA, United States: Morgan Kaufmann Publishers Inc.: 1995.


  59. Queralt-Rosinach N, Stupp GS, Li TS, Mayers M, Hoatlin ME, Might M, Good BM, Su AI. Structured reviews for data and knowledge-driven research. Database. 2020; 2020.

  60. GO FAIR. VODAN in a Box: the all in one solution for easy instalment of VODAN FAIR Data Points. 2020. Accessed 28 Jul 2021.

  61. Health-RI. Personal Health Train. 2019. Accessed 28 Jul 2021.

  62. GO FAIR. Proof of Concept developed by VODAN Africa and Asia. 2020. Accessed 28 Jul 2021.

  63. GO FAIR. AllegroGraph WebView ( 2020. Accessed 28 Jul 2021.

  64. Neo4j. Neo4j Graph Database Homepage. 2019. Accessed 29 Jul 2021.

  65. Lysenko A, Roznovat IA, Saqi M, et al. Representing and querying disease networks using graph databases. BioData Min. 2016; 9:23.

  66. Neo4j. neosemantics (n10s): Neo4j RDF & Semantics toolkit. 2019. Accessed 29 Jul 2021.

  67. Comte B, Baumbach J, Benis A, et al. Network and systems medicine: Position paper of the european collaboration on science and technology action on open multiscale systems medicine. Netw Syst Med. 2020; 3(1):67–90.


  68. ChipSoft. HiX Homepage. 2020. Accessed 18 Oct 2020.

  69. iMDsoft. MetaVision iMDsoft Homepage. 2017. Accessed 20 Aug 2020.

  70. Gaye A, Marcon Y, Isaeva J, et al. DataSHIELD: taking the analysis to the data, not the data to the analysis. Int J Epidemiol. 2014; 43(6):1929–44.


  71. Bizer C, Heath T, Berners-Lee T. Linked data - the story so far. Int J Semant Web Inf Syst. 2009; 5:1–22.


  72. Data F. FDP specification Homepage. 2021. Accessed 24 Aug 2020.

  73. Ontotext. GraphDB Homepage. 2015. Accessed 29 Jul 2021.

  74. Wilkinson MD, Dumontier M, Sansone SA, et al. Evaluating FAIR maturity through a scalable, automated, community-governed framework. Sci Data. 2019; 6:174.

  75. Knight SR, Ho A, Pius R, Buchan I, Carson G, Drake TM, Dunning J, Fairfield CJ, Gamble C, Green CA, Gupta R, Halpin S, Hardwick HE, Holden KA, Horby PW, Jackson C, Mclean KA, Merson L, Nguyen-Van-Tam JS, Norman L, Noursadeghi M, Olliaro PL, Pritchard MG, Russell CD, Shaw CA, Sheikh A, Solomon T, Sudlow C, Swann OV, Turtle LC, Openshaw PJ, Baillie JK, Semple MG, Docherty AB, Harrison EM. Risk stratification of patients admitted to hospital with covid-19 using the isaric who clinical characterisation protocol: development and validation of the 4c mortality score. BMJ. 2020; 370.



We would especially like to thank Eleni Mina, Tooba Abassi-Daloii, Daniël Wijnbergen, Winette Koning, Luiz Olavo Bonino da Silva Santos and Katy Wolstencroft. We would also like to thank our EJP RD colleagues Peter-Bram ’t Hoen and Mark Wilkinson for all the discussions. Finally, we would like to thank Professor Barend Mons for inspiring us to make a real difference in data sharing and knowledge representation.


N. Queralt-Rosinach, R. Kaliyaperumal, C. Bernabé, Q. Long, A. Jacobsen and M. Roos are supported by funding from the European Union’s Horizon 2020 research and innovation programme under the EJP RD COFUND-EJP No. 825575. We would also like to thank the EJP RD, GO FAIR VODAN, ZonMw, and Health Holland under the Trusted World of Corona project for supporting the research on FAIR data stewardship that was reused here. We would like to acknowledge that work in the BEAT-COVID project was partly funded by the Wake Up To Corona crowdfunding initiated by the Leiden University Fund (LUF).

Author information





MR conceived the initial RDM plan. The BEAT-COVID group provided feedback and guidance on research priorities, the meaning of data types, and the RDM plan. NQR, RK, CHB, QL and AJ conceptualised and realised the RDM plan. NQR, CHB, QL, AJ and RK contributed to regular FAIRification discussions. SJ provided guidance and access to the laboratory measurement data. The COVID-19 LUMC group provided data to the BEAT-COVID project. HJW provided guidance and access to the Opal and Mica software applications used in the existing research data management. RK, QL, KB and ELAF contributed to the harmonization of the BEAT-COVID FDP with the LUMC VODAN FDP. NQR drafted the initial version of the manuscript. AJ and MR revised the manuscript. MR, BM and the BEAT-COVID group acquired funding to support this work. All authors reviewed and approved the final version of the manuscript.

Authors’ information

BEAT-COVID group (in alphabetical order, IR): M. Sesmu Arbous1, Bernard M. van den Berg2, Suzanne Cannegieter3, Christa M. Cobbaert4, Anne M. van der Does5, Jacques J.M. van Dongen6, Jeroen Eikenboom7, Mariet C.W. Feltkamp8, Annemieke Geluk9, Jelle J. Goeman10, Martin Giera11, Thomas Hankemeier12, Mirjam H.M. Heemskerk13, Pieter S. Hiemstra5, Cornelis H. Hokke14, Jacqueline J. Janse14, Simon P. Jochems14, Simone A. Joosten9, Marjolein Kikkert8, Lieke Lamont12, Judith Manniën10, Tom H.M. Ottenhoff9, T. Pongracz11, Michael R. del Prado1, Núria Queralt-Rosinach15, Meta Roestenberg 9,14, M. Roos15, Anna H.E. Roukens9, Hermelijn H. Smits14, Eric J. Snijder8, Frank J.T. Staal6, Leendert A. Trouw6, Roula Tsonaka10, Aswin Verhoeven11, Leo G. Visser9, Jutte J.C. de Vries8, David J. van Westerloo1, Jeanette Wigbers1, Henk J. van der Wijk10, Robin C. van Wissen4, Manfred Wuhrer11, Maria Yazdanbakhsh14, Mihaela Zlei6

1 Dept. of Intensive Care, LUMC

2 Dept. of Internal Medicine, LUMC

3 Dept. of Clinical Epidemiology, LUMC

4 Dept. of Clinical Chemistry, LUMC

5 Dept. of Pulmonology, LUMC

6 Dept. of Immunology, LUMC

7 Dept. of Internal Medicine, LUMC

8 Dept. of Medical Microbiology, LUMC

9 Dept. of Infectious Diseases, LUMC

10 Dept. of Biomedical Data Sciences, LUMC

11 Center for Proteomics and Metabolomics, LUMC

12 Division of Systems Biomedicine and Pharmacology, Leiden Academic Center for Drug Research, Leiden University, the Netherlands

13 Dept. of Hematology, LUMC

14 Dept. of Parasitology, LUMC

15 Dept. of Human Genetics, LUMC

COVID-19 LUMC group (IR): Josine A. Oud, MSc1; Meryem Baysan, MSc2,3; Jeanette Wigbers2; Lieke J. van Heurn, BSc3; Susan B. ter Haar, BSc3; Alexandra G.L. Toppenberg, BSc3; Laura Heerdink, BSc3; Annekee A. van IJlzinga Veenstra, BSc3; Anna M. Eikenboom, BSc3; Julia M. Wubbolts, MSc4; Jonathan Uzorka, MD4; Willem Lijfering, MD PhD3; Romy Meier1; Ingeborg de Jonge3; Sesmu M. Arbous, MD PhD2; Mark G.J. de Boer, MD PhD4; Anske G. van der Bom, MD PhD3; Olaf M. Dekkers, MD PhD3; Frits Rosendaal, MD PhD3

1 Dept. of Hematology, LUMC

2 Dept. of Intensive Care, LUMC

3 Dept. of Clinical Epidemiology, LUMC

4 Dept. of Infectious Diseases, LUMC

Corresponding author

Correspondence to Marco Roos.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Queralt-Rosinach, N., Kaliyaperumal, R., Bernabé, C.H. et al. Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic. J Biomed Semant 13, 12 (2022).



  • Patient data
  • Ontologies
  • FAIR
  • Research data management
  • Hospital
  • Open science