Features of a FAIR vocabulary

Xu, Fuqi; Juty, Nick; Goble, Carole; Jupp, Simon; Parkinson, Helen; Courtot, Mélanie

doi:10.1186/s13326-023-00286-8

Research
Open access
Published: 01 June 2023

Features of a FAIR vocabulary

Fuqi Xu¹^na1,
Nick Juty²^na1,
Carole Goble²,
Simon Jupp³,
Helen Parkinson¹ &
…
Mélanie Courtot^1,4,5

Journal of Biomedical Semantics volume 14, Article number: 6 (2023) Cite this article

2046 Accesses
1 Citations
1 Altmetric
Metrics details

Abstract

Background

The Findable, Accessible, Interoperable and Reusable(FAIR) Principles explicitly require the use of FAIR vocabularies, but what precisely constitutes a FAIR vocabulary remains unclear. Being able to define FAIR vocabularies, identify features of FAIR vocabularies, and provide assessment approaches against the features can guide the development of vocabularies.

Results

We differentiate data, data resources and vocabularies used for FAIR, examine the application of the FAIR Principles to vocabularies, align their requirements with the Open Biomedical Ontologies principles, and propose FAIR Vocabulary Features. We also design assessment approaches for FAIR vocabularies by mapping the FVFs with existing FAIR assessment indicators. Finally, we demonstrate how they can be used for evaluating and improving vocabularies using exemplary biomedical vocabularies.

Conclusions

Our work proposes features of FAIR vocabularies and corresponding indicators for assessing the FAIR levels of different types of vocabularies, identifies use cases for vocabulary engineers, and guides the evolution of vocabularies.

Background

The Findable, Accessible, Interoperable and Reusable (FAIR) Principles [1] have rapidly gained traction in the biomedical community since their publication in 2016, with many groups attempting to improve their data quality, develop FAIR capable data resources, and design generic FAIR assessment tools for biomedical data [2,3,4]. The heterogeneous nature and broad scope of biomedical data, ranging from molecular data to human studies via interdisciplinary analysis, stringent requirements for data FAIRness can ensure the usefulness of such data in benefiting human health. While assessing the FAIR level of datasets and data resources [5], we noted a recursive cycle with respect to the ‘Interoperable’ FAIR Principle, “I2 - (Meta)data use vocabularies that follow FAIR principles”. To comply with that principle, datasets need to use FAIR vocabularies, which themselves need to be FAIR. However, it remains unclear what is a FAIR vocabulary. Having FAIR vocabularies promote biomedical data FAIRness throughout the data life cycle, during the data generation, curation, and distribution processes, and support data exchange and integration across data resources. Therefore, it is crucial to clarify the definition and features of a FAIR vocabulary.

Standards for FAIR vocabularies have been generated in different domains. The FAIRsFAIR recommendations [6] provide guidance on FAIR semantic artefacts, as well as supporting vocabulary search engines and repositories. Garijo and Poveda-Villalon [7] discussed detailed requirements of ontology, such as Uniform Resource Identifiers(URI), versioning strategies, and the formatting of those ontologies. Furthermore, “Ten simple rules for making a vocabulary FAIR [8]” for converting print-based or other forms of legacy vocabularies to FAIR vocabularies have also been proposed.

Researchers have also developed indicators and services to assess the FAIR level of digital objects both manually and automatically; FAIR indicators in the FAIRsharing [9] FAIR Maturity Evaluation Service [4], FAIR metrics in the F-UJI Automated FAIR Data Assessment Tool [10], and the Research Data Alliance (RDA) Data Maturity Model Specification and Guidelines [11], also known as the RDA indicators. Among them, the RDA indicators are a set of representative and descriptive indicators to evaluate the FAIR level of data which have been used in many projects and with diverse data types [5]. Some automated assessment services, such as FOOPS! [12], have been developed to measure the FAIR level of public, machine-readable vocabularies. The Open Biomedical Ontology(OBO) Dashboard service checks how ontology adheres to OBO principles [13].

Vocabularies come in different forms, such as lists, thesaurus, taxonomies, and ontologies; each at different levels of semantic complexity and FAIRness and can all archive various levels of FAIR. To the best of our knowledge, there have not yet been quantifiable FAIR assessment approaches developed to measure the FAIR level of different formats of vocabularies objectively.

Therefore, in this paper, we distinguished the concepts of FAIR data, FAIR data resources, and FAIR vocabularies, and propose a set of general FAIR Vocabulary Features (FVFs) as a set of satisfiable features for vocabularies. We also adapted the RDA indicators to measure the FAIR levels of vocabularies. Further, we provided example assessments based on selected ontologies available from the EMBL-EBI Ontology Lookup Service(OLS) [14] and other vocabulary resources.

Results

Defining FAIR Data and FAIR vocabulary

To describe the features of a FAIR vocabulary, the first step is to define what a FAIR vocabulary is, and how to distinguish it from FAIR data. In our analysis, data can be FAIR, to a greater or lesser extent, and data resources and data vocabularies are capable of supporting FAIRness at different levels. Data vocabularies are designed to support FAIR data, and they can also be considered FAIR data resources. The orthogonality of these concepts is an important context for this work when determining the features of a FAIR vocabulary. Table 1 presents a definition for FAIR data, FAIR data resources, and FAIR vocabulary.

Table 1 Definitions of FAIR data, FAIR vocabularies and FAIR metadata

Full size table

A FAIR vocabulary has a set of FAIR features and also has a list of associated FAIR indicators. It is usable for annotation, analysis and presentation of data, and is deployable in the context of a FAIR-capable data resource or tools. It also serves ‘aggregation’ use cases where data originates from different domains and enables data interoperability, such as concept mapping, where different vocabularies are used. Requirements for a FAIR vocabulary cover three aspects:

(1)
FAIR in terms of its application to FAIR data
(2)
FAIR in the context of FAIR capable resources
(3)
FAIR in the context of other vocabularies

Alignment with existing standards

We analysed the possibility of reusing the OBO principles to define the FAIR vocabulary features. We noted that not all the OBO principles are expressed at the same level of maturity or granularity, and therefore some were unmappable and excluded from the features. Whilst the FAIR principles apply to both ontology developers and ontologies themselves, only principles that focus on ontologies as digital objects were selected. As a result, this analysis did not include OBO Principle 1: Open, 4: Versioning, 6: Textual definitions, 9: Documented plurality of users, 10: Collaboration commitment, 11: Authority, 12: Naming conventions, and 20: responsiveness. The rationale for suitability as FVFs is discussed in detail in Supplementary Table 1.

FAIR vocabulary features

Based on the analysis of the OBO foundry practices, FAIR principles, and our previous experience working with and developing ontologies, we proposed eleven features for FAIR vocabulary in Table 2, covering requirements for identifiers, access protocols, knowledge representation, and other aspects. The relationships among FVF, the FAIR principles and three aspects of FAIR vocabularies are presented in Tabel 3. The FVFs cover all four aspects of the FAIR principles with a focus on the interoperability aspects of FAIR data and data resources.

Table 2 FAIR Vocabulary Feature details

Full size table

Table 2 also provides examples for each FAIR feature representing in different formats and at varying FAIRness levels amongst those vocabularies. For example, for FVF-6: versioning and persistent vocabularies, of all ontologies indexed and updated in OLS, 59.3\(\%\) of selected vocabularies use a date format of “yyyy-mm-dd” in the “versionIRI”, such as http://purl.obolibrary.org/obo/scdo/releases/2021-04-15/scdo.owl. 2.5\(\%\) of vocabularies use semantic versioning, such as http://www.ebi.ac.uk/efo/releases/v3.34.0/efo.owl or other forms of numeric versioning, such as http://www.orpha.net/version3.2. 31.7\(\%\) of vocabularies do not provide valid machine-readable versioned IRIs. For FVF-1: identifiers, 74\(\%\) of vocabularies use OBO-format Persistent Uniform Resource Locators (PURL), identifier.org, w3id.org identifiers, as well as other domain-specific identifiers. For FVF-5: accessible using standard protocols, of all 199 selected ontologies, only one ontology uses the HTTPS protocol; the rest use HTTP protocols.

Table 3 FAIR vocabulary features mapped to FAIR principles and FAIR vocabulary requirements

Full size table

Indicators for FAIR vocabulary features

While FVFs identify general characteristics of a FAIR vocabulary, these features need to be objectively quantified to be useful in vocabulary selection, development and assessment. Hence, we aligned FVFs with FAIR indicators to enable the computation of a discrete FAIR score.

We mapped the RDA indicators to FVF, filtering out indicators that do not apply to vocabularies. For example, RDA-F3-01M: Metadata includes the identifier for the data is not applicable to ontologies and other types of vocabularies, where the metadata is usually directly embedded within the vocabulary data. We also specified the digital object to which the indicator refers, and identified within each indicator the relevant standards used in corresponding domains. The FVFs, associated with selected indicators, can be used as indicators for FAIR Vocabulary as shown in Table 4. Other indicators that are not suitable for FAIR vocabularies are listed in Supplementary Table 2.

Table 4 Indicators for FAIR vocabulary features. Alignment between the FAIR vocabulary features and RDA data maturity level indicators

Full size table

FAIR assessments for common data vocabularies

We tested the FVF indicators in three representative ontologies. Both the Gene Ontology (GO) and Experimental Factor Ontology (EFO) are vocabularies of high FAIR level, with over 80% FVFs fulfilled (See details in Table 5). GO only partially complies with ‘FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned‘, with a Fail in ‘Indicator RDA-R1.2-02M: Metadata includes provenance information according to a cross-community language‘. ‘FVF-2: Vocabularies and their terms have rich metadata’ was not complied with since no general description of the ontology is provided in the released artefact. Compared with these two ontologies, the taxonomy, International Classification of Diseases 11th Revision (ICD-11) [33], fully complies with 18.18% FVFs and partially complies with 36.36% FVFs. This is because ICD-11 neither refers to other vocabularies within each term description or within the metadata for those terms nor adheres to other community standards, such as vocabulary formats. ICD-11 was selected for this evaluation as it already offers significant FAIR improvements over ICD-10 [34], such as providing a standard licence.

Table 5 FAIR vocabulary feature applied. Assessment results of Gene ontology, Experimental factor Ontology and ICD-11

Full size table

Discussion

Comparing the assessment results of the two ontologies, GO and EFO, with the list-type dictionary, ICD-11, ontology-based vocabularies follow stricter semantics and therefore fared better in the scoring of FAIR features. One of the reasons is that many ontology-related standards have been established, including formats, such as the Web Ontology Language (OWL), guidelines such as the OBO principles, minimum information standards, such as Minimum Information for Biological and Biomedical Investigations (MIBBI) [35], and mechanisms for cross-references or incorporating external ontologies, such as the Minimum Information to Reference an External Ontology Term (MIREOT) [25]. This is naturally reflected in a higher score for compliance with community standards, which is a core part of FVF, and which improves the interoperability and reusability of a vocabulary. However, being a FAIR ontology does not ensure the quality and usability of an ontology. The scope, popularity, and accuracy of a vocabulary are also factors to consider.

The FVFs we proposed integrate multiple FAIR vocabulary requirements and serve as FAIR vocabulary standards to guide the development and maintenance of vocabularies. Each FVF is associated with indicators, to support its quantifiable and objective assessment against each feature. These indicators can also be plugged into existing or emerging standards in other domains to support the evolving of new vocabularies and suit emerging use cases. For example, in FVF-8: cross-referencing other vocabularies can be linked to the ontology cross-reference standards, such as MIREOT. Because of our expertise and requirements, this manuscript focuses on the biomedical domain; however, we anticipate this framework could be reused elsewhere.

Compared with other FAIR vocabulary requirements, FVFs apply to multiple vocabulary formats, and we demonstrated the potential for using them across other forms of vocabularies with the ICD-11 example. We focused on how FVFs can be applied to ontologies and did not include other types of vocabulary specifications, because an ontology has a clearly defined structure, schema, standards, repositories, and supporting standards.

Integrating the FVF with FAIR indicators makes it possible to assess the FAIR level of vocabularies, identify progressive ontology development use cases, and improve accordingly. We selected the RDA indicators since they have proven to be useful in many datasets, and have been referenced by other assessment approaches in FAIRassist.org; yet, FVFs could alternatively be aligned to other FAIR-principle-based indicators which would similarly reflect the FAIR principles. The RDA indicators are designed to evaluate biomedical datasets, where data refers to outcomes of sequencing or screening experiments, and metadata refers to sample information, experiment designs, etc, which needs to be annotated with controlled vocabularies. In our context, data refers to the vocabulary themselves, and metadata, on the other hand, points to ontology versioning and editing information. When performing an assessment, it is crucial for assessors to agree on the definition of data and metadata.

Besides manual assessments, quantifiable formal indicators are also amenable to becoming machine-actionable. Reusing shared indicators will make it possible to perform automated FAIR vocabulary assessments. The bottleneck of automated assessment, however, is the variations in the implementation of the same requirement. For example, the VersionIRI case presented above demonstrates the challenges of exhausting all formats of interpretation to build a unified assessment service. Other features, such as “Complying with domain standards” are even harder to automate. Therefore, manual assessment using indicators for FVFs is still one of the more practical and accurate approaches.

The FAIR scores provide a quantitive and intuitive “summary” of the FAIR level of a vocabulary and can be an effective measure of how the vocabulary has evolved. However, it should neither be taken as an absolute measure to evaluate either the quality comparison across vocabularies or compare different vocabularies. For example, for vocabularies which are used and shared within an institution and not designed for external usage, having global identifiers (FVF-1) is not a core requirement. In this case, the vocabulary is still FAIR for its designed purpose within the organisation, even if the FAIR score is low. When checking the FAIR level of a vocabulary, it is important to examine the detailed use cases and features, instead of just comparing scores. A vocabulary being “FAIR enough” for its purpose is more important than having a general FAIR score. Moreover, each assessment system might have different FAIR scores for the same vocabulary. Instead of aiming for an absolute higher score, assessors should understand the mechanism behind each indicator, and focus on the interpretation of each test.

These FVF and assessments provide insights on how to improve vocabularies. For example, based on the EFO assessments, the FAIR level of EFO could easily be improved by adding a description of the aim and function of EFO, allowing different vocabulary management services to harvest that information. They also assist and guide the evolution of FAIR vocabularies by striving to iteratively improve FAIR levels of subsequently developed versions. For example, compared to ICD-10, its successor ICD-11 has incorporated many features to make it FAIRer, such as providing application programming interfaces (APIs) for easier access, having a machine-readable license, etc.

Conclusions

We defined a FAIR vocabulary and proposed a set of features of a FAIR vocabulary. This explicitly links previous ontology standardisation efforts with the works of the fast-growing FAIR data community. The features not only cover ontology-type vocabularies but also apply to other formats of vocabulary. Furthermore, we provided a way to measure the FAIR level of a vocabulary quantitatively by aligning existing FAIR indicators with such features. This provides a foundation for further vocabulary assessment work. Finally, the features were tested against common vocabularies, and examples of how to perform FAIR assessments on vocabularies are provided. We aim to integrate existing guidelines in both the FAIR and the ontology community to deliver a comprehensive and quantitative measure of the FAIR level of vocabulary. In the future, we can develop automated tests based on the FVF requirements, perform health checks on ontology repositories and improve the ontology standards development accordingly.

Materials and methods

Existing vocabulary standards

Instead of reinventing a new set of features, we reused the outcomes of previous vocabulary standardisation work to determine features of FAIR vocabulary. The standards include both generic standards, such as the OBO principles [26] which cover multiple areas in ontology design and ontology development. The OBO principles aim to coordinate specifically the development of biomedical ontologies. Fourteen principles cover ontology development, ontology design and coverage. Standards targeting specific aspects of vocabularies, such as MIREOT, which focuses on ontology cross-referencing, and the OBI minimal list of metadata for term annotation, are also included. We evaluated the suitability of using such standards in the FVF definition by analysing the context and their application in discussions.

FVF in the OLS repository

We fetched ontologies indexed in the OLS repository and selected those that are successfully loaded and up-to-date. OLS contains 266 biomedical ontologies by the time we access the database (https://www.ebi.ac.uk/ols/api/ontologies). We filtered out ontologies which could not be indexed automatically (without a valid loaded timestamp) and removed inactive ontologies based on the date information in the versionIRI section. 200 ontologies were selected based on these criteria. The loading timestamp, identifier, and version information in each ontology were checked.

We filtered out some inactive ontologies based on the loading time (only ontologies with a loading timestamp after 2019-01-01 were chosen) and date information in the version IRI (ontologies with a date before 2019-01-01 in the verionIRI field were removed). The information was collected based on machine-readable metadata imported from OLS instead of each ontology itself. But these criteria do not ensure all vocabularies selected are up-to-date. For example, for ontologies using semantic versioning format where no date information is provided in the versionIRI, or some update information is collected in other metadata fields such as ‘annotation editor comments’, etc. Despite the constraints of the analysis, it still provides enough information to showcase the status of current vocabularies.

Development of indicators for FAIR vocabulary features

The mapping between the RDA indicators was based on text analysis, using the RDA indicator definition, description and examples. It is worth noting that when mapping the RDA indicators to datasets, metadata refers to the metadata to which the vocabulary can be applied, while in the context of vocabularies, metadata and data refer to the description of the vocabulary and the vocabulary information. Therefore, we combined the indicators evaluating data and metadata in the mapping we performed, wherever possible.

Representative vocabularies

We selected three representative vocabularies to test the applicability of FVF indicators in different types of vocabularies.

ICD-11 is a large taxonomy of diseases and is the global standard for diagnostic information, disease definitions and synonyms. As a World Health Organisation(WHO) standard, ICD-11 is one of the most widely adopted disease vocabularies. It represents types of vocabularies that have low semantic maturity and is expressed as a list or a dictionary.

GO [36] is a well-established and highly regarded and utilised biomedical ontology. It contains over 43000 terms and has been cross-referenced in other classification systems, such as UniProt [37], HAMAP [38], and InterPro [39]. GO is also a reference OBO Foundry ontology [40] and has been reused in many other resources. It is selected as a representative of domain ontology.

EFO [41], on the other hand, is an application ontology built for communities like Open Targets [42] for describing experimental variables. Application ontologies, although using the standard ontology format, are mainly developed for project-specific use cases.

Assessment against indicators for FAIR vocabulary features

We tested the FVF and corresponding indicators on three representative vocabularies, GO, EFO and ICD-11. For each FVF, three compliance levels we re-assigned; if a vocabulary meets the requirements of all indicators, full compliance is achieved. Otherwise, depending on the scoring for each FVF, partial compliance or no compliance results are given. The percentages of full compliance, partial compliance and no compliance features are also calculated. Supplementary table 3-5 provides the assessment details.

Availability of data and materials

The datasets used and analysed during this study are included in this published article and its supplementary information files.

Abbreviations

API:: Application programming interface
CC-BY:: Creative commons attribution
ChEBI:: Chemical entities of biological interest
EFO:: Experimental factor ontology
EMBL-EBI:: European molecular biology laboratory’s European bioinformatics institute
FAIR:: Findable, accessible, interoperable, reusable
FTP:: File transfer protocol
FVF:: FAIR vocabulary features
GO:: Gene ontology
HAMAP:: High-quality automated and manual annotation of proteins
HTTP:: Hypertext transfer protocol
HTTPS:: Hypertext transfer protocol secure
ICD-10:: International classification of diseases 10th revision
ICD-11:: International classification of diseases 11th revision
IRI:: Internationalized resource identifier
MIAPPE:: Minimum information about a plant phenotyping experiment
MIBBI:: Minimum information for biological and biomedical investigations
MIT license:: Massachusetts institute of technology license
MIREOT:: Minimum information to reference an external ontology term
NCBI:: National center for biotechnology information
OBO:: Open biological and biomedical ontology
OLS:: Ontology lookup service
OWL:: The Web ontology language
PPEO:: Plant phenotype experiment ontology
PURL:: Persistent uniform resource locator
RDA:: Research data alliance
RDF:: Resource description framework
RO:: Relationship ontology
SNOMED:: Systematized nomenclature of medicine
SNOMED-CT:: Systematized nomenclature of medicine – clinical terms
UBERON:: Uber anatomy ontology
UniProt:: The universal protein resource
URI:: Uniform resource identifier
WHO:: World health organisation
XML:: Extensible markup language

References

Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3(1):160018.
Article Google Scholar
The FAIRsharing team, University of Oxford. FAIRassist.org. 2019. https://fairassist.org/. Accessed 1 Sept 2021.
Drysdale R, Cook CE, Petryszak R, Baillie-Gerritsen V, Barlow M, Gasteiger E, et al. The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences. Bioinformatics. 2020;36(8):2636–42.
Article Google Scholar
Batista D, Wilkinson MD, Prieto M, McQuilton P, Rocca-Serra P, Sansone SA, et al. Fair Evaluation Services. FAIRsharing.org. 2021. https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/. Accessed 1 Sept 2021.
Burdett T, Xu F, Courtot M, et al. FAIRplus: D3.2 IMI FAIR Metrics Publication. Zenodo. 2021. https://doi.org/10.5281/zenodo.4428633.
Hugo W, Le Franc Y, Coen G, et al. D2.5 FAIR Semantics Recommendations Second Iteration. FAIRsFAIR. 2020. https://doi.org/10.5281/zenodo.4314321.
Garijo D, Poveda-Villalón M. Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web. 2020. arXiv. https://arxiv.org/abs/2003.13084. Accessed 1st Sept 2021.
Cox SJD, Gonzalez-Beltran AN, Magagna B, Marinescu MC. Ten simple rules for making a vocabulary FAIR. PLOS Comput Biol. 2021;17(6):e1009041.
Article Google Scholar
Sansone SA, McQuilton P, Rocca-Serra P, Gonzalez-Beltran A, Izzo M, Lister AL, et al. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol. 2019;37(4):358–67.
Article Google Scholar
Devaraju A, Huber R. F-UJI - An Automated FAIR Data Assessment Tool. Zenodo. 2020. https://doi.org/10.5281/zenodo.4063720.
FAIR Data Maturity Model Working Group. FAIR Data Maturity Model. Specification and Guidelines. Zenodo. 2020. https://doi.org/10.15497/rda00050.
Garijo D, Corcho O, Poveda-Villalón M. FOOPS!: An Ontology Pitfall Scanner for the FAIR principles. In: Proceedings of the ISWC. 2021. p. 4. http://ceur-ws.org/Vol-2980/paper321.pdf.
Jackson R, Matentzoglu N, Overton JA, Vita R, Balhoff JP, Buttigieg PL, et al. OBO Foundry in 2021: operationalizing open data principles to evaluate ontologies. Database. 2021;2021:baab069.
Jupp S, Burdett T, Leroy C, Parkinson HE. A new Ontology Lookup Service at EMBL-EBI. SWAT4LS. 2015;2:118–119.
PURL Administration. Internet Archive 2021. https://purl.prod.archive.org/. Accessed 1 Sept 2021.
Wimalaratne SM, Juty N, Kunze J, Janée G, McMurry JA, Beard N, et al. Uniform resolution of compact identifiers for biomedical data. Sci Data. 2018;5:180029.
Article Google Scholar
W3C Permanent Identifier Community Group. w3id.org - Permanent Identifiers for the Web. World Wide Web Consortium. 2021. https://w3id.org/. Accessed 1 Sept 2021.
Ruttenberg A, Courtot M, Mungall C. OBO Foundry Identifier policy. 2018. http://www.obofoundry.org/id-policy. Accessed 1 Sept 2021.
Whetzel PL, Noy NF, Shah NH, Alexander PR, Nyulas C, Tudorache T, et al. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 2011;39:W541–5.
Article Google Scholar
Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32:D267-270.
Article Google Scholar
The OBO Foundry. Versioning (principle 4). The OBO Consortium. 2021. http://www.obofoundry.org/principles/fp-004-versioning.html. Accessed 1 Sept 2021.
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44:D1214-1219.
Article Google Scholar
Antoniou G, van Harmelen F. Web Ontology Language: OWL. In: Staab S, Studer R, editors. Handbook on Ontologies. International Handbooks on Information Systems. Springer; 2004. p. 67–92. https://doi.org/10.1007/978-3-540-24750-0_4.
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 2012;13(1):R5.
Article Google Scholar
Courtot M, Gibson F, Lister AL, Malone J, Schober D, Brinkman RR, Ruttenberg A. MIREOT: The minimum information to reference an external ontology term. Appl Ontol. 2011;6(1):23–33. https://doi.org/10.3233/ao-2011-0087.
Day-Richter J. The OBO Flat File Format Specification, version 1.2. W3C. 2006. https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_2.html. Accessed 1 Sept 2021.
The OBI consortium. Minimal requirement for term annotations in OBI (metadata). SourceForce.net; 2021. http://obi.sourceforge.net/ontologyInformation/MinimalMetadata.html. Accessed 15 Aug 2022.
Lessig L. The creative commons. Mont. Law Rev. 2004;65:1.
Google Scholar
Saltzer JH. The origin of the “MIT license’’. IEEE Ann Hist Comput. 2020;42(4):94–8.
Article Google Scholar
Cornet R, de Keizer N. Forty years of SNOMED: a literature review. BMC Med Inform Decis Making. 2008;8(1):S2.
Article Google Scholar
Arnaud E, Cooper L, Shrestha R, Menda N, Nelson RT, Matteis L, et al. Towards a reference plant trait ontology for modeling knowledge of plant traits and phenotypes. In: KEOD 2012 - Proceedings of the International Conference on Knowledge Engineering and Ontology Development, vol. 2. 2012. p. 220–225. http://wrap.warwick.ac.uk/59831/. Accessed 23 May 2023.
Krajewski P, Chen D, Ćwiek H, van Dijk ADJ, Fiorani F, Kersey P, et al. Towards recommendations for metadata and data handling in plant phenotyping. J Exp Bot. 2015;66(18):5417–27.
Article Google Scholar
World Health Organization. ICD-11 Implementation or Transition Guide. Geneva; 2019. https://icd.who.int/docs/ICD-11%20Implementation%20or%20Transition%20Guide_v105.pdf. Accessed 1 Sept 2021.
World Health Organization and others. International classification of diseases for mortality and morbidity statistics (10h Revision). World Health Organization; 2010. https://www.who.int/classifications/icd/ICD10Volume2_en_2010.pdf. Accessed 1 Sept 2021.
Taylor CF, Field D, Sansone SA, Aerts J, Apweiler R, Ashburner M, et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations. Nat Biotechnol. 2008;26(8):889–96.
Article Google Scholar
Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47:D330–8.
Article Google Scholar
The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480–9.
Article Google Scholar
Pedruzzi I, Rivoire C, Auchincloss AH, Coudert E, Keller G, de Castro E, et al. HAMAP in 2015: updates to the protein family classification and annotation system. Nucleic Acids Res. 2015;43:D1064–70.
Article Google Scholar
Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021;49:D344–54.
Article Google Scholar
Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25(11):1251–5.
Article Google Scholar
Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J, Kolesnikov N, et al. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics. 2010;26(8):1112–8.
Article Google Scholar
Ochoa D, Hercules A, Carmona M, Suveges D, Gonzalez-Uriarte A, Malangone C, et al. Open Targets Platform: supporting systematic drug-target identification and prioritisation. Nucleic Acids Res. 2021;49:D1302–10.
Article Google Scholar

Download references

Acknowledgements

Not applicable.

Funding

This work is funded by the IMI-FAIRplus project(Grant number 802750), EOSC-Life(Grant number 824087) from the European Union’s Horizon 2020 programme, and the European Molecular Biology Laboratory - European Bioinformatics Institute core funds.

Author information

Fuqi Xu and Nick Juty contributed equally to this work.

Authors and Affiliations

European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
Fuqi Xu, Helen Parkinson & Mélanie Courtot
The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
Nick Juty & Carole Goble
SciBite BioData Innovation Centre, Wellcome Genome Campus, Hinxton, CB10 1DR, UK
Simon Jupp
Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, M5G 0A3, Canada
Mélanie Courtot
Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
Mélanie Courtot

Authors

Fuqi Xu
View author publications
You can also search for this author in PubMed Google Scholar
Nick Juty
View author publications
You can also search for this author in PubMed Google Scholar
Carole Goble
View author publications
You can also search for this author in PubMed Google Scholar
Simon Jupp
View author publications
You can also search for this author in PubMed Google Scholar
Helen Parkinson
View author publications
You can also search for this author in PubMed Google Scholar
Mélanie Courtot
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

FX wrote the first draft. MC, HP, CG and NJ reviewed and revised the article. HP, SJ, MC and CG contributed to the initial discussion of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mélanie Courtot.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table 1.

The suitability of OBO principles used as FAIR Vocabulary Features. A detailed evaluation of how suitable each OBO principle is for use as features for FAIR vocabularies.

Additional file 2: Supplementary Table 2.

Excluded RDA data maturity indicators. RDA indicators that are not mapped to FAIR Vocabulary Features.

Additional file 3: Supplementary Table 3.

FAIR assessment results of Gene ontology. Details of the FAIR assessment for Gene Ontology.

Additional file 4: Supplementary Table 4.

FAIR assessment results of Experimental Factor Ontology. Details of the FAIR assessment for Experimental Factor Ontology.

Additional file 5: Supplementary Table 5.

FAIR assessment results of ICD-11. Details of the FAIR assessment for International Classification of Diseases 11th Revision.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Xu, F., Juty, N., Goble, C. et al. Features of a FAIR vocabulary. J Biomed Semant 14, 6 (2023). https://doi.org/10.1186/s13326-023-00286-8

Download citation

Received: 18 March 2022
Accepted: 27 April 2023
Published: 01 June 2023
DOI: https://doi.org/10.1186/s13326-023-00286-8

Features of a FAIR vocabulary

Abstract

Background

Results

Conclusions

Background

Results

Defining FAIR Data and FAIR vocabulary

Alignment with existing standards

FAIR vocabulary features

Indicators for FAIR vocabulary features

FAIR assessments for common data vocabularies

Discussion

Conclusions

Materials and methods

Existing vocabulary standards

FVF in the OLS repository

Development of indicators for FAIR vocabulary features

Representative vocabularies

Assessment against indicators for FAIR vocabulary features

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Additional file 1: Supplementary Table 1.

Additional file 2: Supplementary Table 2.

Additional file 3: Supplementary Table 3.

Additional file 4: Supplementary Table 4.

Additional file 5: Supplementary Table 5.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Journal of Biomedical Semantics

Contact us