Table 2 FAIR Vocabulary Feature details

From: Features of a FAIR vocabulary

ID

Features

Description

Examples

FVF-1

Vocabulary and constituent terms are assigned globally unique and persistent identifiers.

Vocabulary and constituent terms should have identifiers that are globally unique and persistent to ensure that each item can be identified unambiguously over time.

Services that provide globally unique and persistent identifiers include PURL [15], identifiers.org [16] and w3id.org [17]. The OBO Foundry provides an identifier policy [18] for biomedical ontologies and requires the use of PURLs with standard prefixes, such as http://purl.obolibrary.org/obo/GO_0000022.

FVF-2

Vocabulary and constituent terms have rich metadata.

Vocabulary and constituent terms should have sufficient metadata to support discovery by both humans and machines.

Vocabulary metadata should provide information about the creation date, creator and editor, version, licence, target domain and a short description. Term metadata should describe the editing history, definition source and other relevant details.

FVF-3

Vocabulary and constituent terms can be accessed using identifiers, preferably by both humans and machines.

The URIs for the vocabulary itself and its constituent terms can be dereferenced by both humans and machines.

http://www.ebi.ac.uk/efo/EFO_0000311 resolves to the term “Cancer” in the Experimental Factor Ontology (EFO), which can be accessed by humans using ontology browsers and by machines through the OLS API.

FVF-4

Vocabulary and constituent terms are registered or indexed in a searchable engine or a resource.

The vocabulary itself and its constituent terms are registered in vocabulary archives or other vocabulary management systems and are indexed by local and/or global search engines.

The EMBL-EBI Ontology Lookup Service (OLS) and NCBO BioPortal [19] are two popular public vocabulary archives. Setting the property X-Robots-Tag: index for vocabularies allows them to be indexed by search engines.

FVF-5

Vocabulary and constituent terms are retrievable using a standardised communication protocol, preferably one that is open, free and universally implementable, and that allows for authentication and authorisation where necessary.

The vocabulary itself and its constituent terms are retrievable using a standardised communication protocol, preferably one that is open, free and universally implementable, such as HTTPS, HTTP or FTP. The protocol should also allow identification of the user and grant access based on the user's associated privileges, when necessary.

Most public ontologies can be accessed using the HTTP or HTTPS protocol. For example, EFO uses HTTP, while the Unified Medical Language System [20] uses HTTPS and only allows access to authenticated users.

FVF-6

Vocabulary and constituent terms are persistent over time and are appropriately versioned.

Changes in the vocabulary are reflected in different versions. Vocabularies and their terms are versioned, and each unaltered version of the vocabulary can be identified and retrieved in perpetuity. Vocabulary metadata is available even when the vocabulary is no longer available.

Changes in EFO are included in each release and identified with a versioned IRI, such as http://www.ebi.ac.uk/efo/releases/v3.31.0/efo.owl, which resolves to the versioned vocabulary. The OBO Foundry also provides guidelines [21] for ontology versioning and for how different versions of a vocabulary should be labelled, stored and published.

FVF-7

Vocabulary and constituent terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation.

The vocabulary itself and its constituent terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation.

OWL-based vocabularies can be serialised using RDF/XML, and vocabularies held in relational databases, e.g. ChEBI [22], can be converted into OWL [23].

FVF-8

Vocabulary and constituent terms use qualified references to other vocabularies.

Vocabulary reuses terms from other vocabularies when applicable, provides adequate metadata about external terms, and follows vocabulary cross-reference standards.

EFO reuses human anatomy terms such as “liver” from UBERON [24] (UBERON_0002107) and links to the original UBERON term. The property Xref indicates a cross-reference relationship between two vocabulary terms. MIREOT [25] defines a methodology and minimum information requirements for importing external terms into an extant ontology.

FVF-9

Vocabulary and constituent terms are described with a plurality of accurate and relevant attributes.

Vocabulary terms include sufficient attributes, such as labels, synonyms, definitions, examples of usage, and cross-references, to support the interpretation and reuse of vocabulary terms.

The OBO flat-file format specification [26] defines term attributes such as synonym, Xref and relationship. The Minimal requirement for term annotations in OBI (metadata) [27] also specifies minimum requirements for each ontology term.

FVF-10

Vocabularies are released with a standard data usage licence, preferably a machine-readable licence.

The vocabulary includes information about how the vocabulary can be reused.

Common public data usage licences include CC-BY [28] and MIT [29]. For example, Gene Ontology uses the Creative Commons Attribution 4.0 International License. SNOMED™ [30] uses a self-defined SNOMED CT™ affiliate license agreement.

FVF-11

Vocabularies meet domain-relevant community standards.

Vocabularies cover essential terms for the specific domain, reflect knowledge of this domain and can be used in existing data standards and data models.

Community standards, such as minimum information requirements and data models, can be found in FAIRsharing [9]. The Plant Phenotyping Experiment Ontology (PPEO) [31] implements the Minimum Information about Plant Phenotyping Experiment (MIAPPE) [32] standards and covers essential attributes to describe a MIAPPE-compliant phenotype dataset.
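
The identifier examples given for FVF-1 and FVF-6 can be checked mechanically. The following is a minimal sketch, not part of the original table: the two regular expressions are assumptions inferred only from the example IRIs quoted above (http://purl.obolibrary.org/obo/GO_0000022 and http://www.ebi.ac.uk/efo/releases/v3.31.0/efo.owl), and the names `parse_obo_purl`, `efo_release_version` and `TermId` are hypothetical helpers. It performs no network access and does not verify that an IRI actually resolves.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for OBO-style term PURLs (FVF-1), inferred from the
# GO_0000022 example; the official identifier policy may allow more forms.
OBO_PURL = re.compile(
    r"^http://purl\.obolibrary\.org/obo/"
    r"(?P<prefix>[A-Za-z][A-Za-z0-9]*)_(?P<local_id>\d+)$"
)

# Pattern for versioned EFO release IRIs (FVF-6), assumed from the single
# v3.31.0 example; other release layouts are not covered.
EFO_RELEASE = re.compile(
    r"^http://www\.ebi\.ac\.uk/efo/releases/"
    r"v(?P<version>\d+\.\d+\.\d+)/efo\.owl$"
)


class TermId(NamedTuple):
    """Ontology prefix and local identifier extracted from a term PURL."""
    prefix: str
    local_id: str


def parse_obo_purl(iri: str) -> Optional[TermId]:
    """Return the prefix and local ID if the IRI matches the assumed PURL form."""
    match = OBO_PURL.match(iri)
    return TermId(match["prefix"], match["local_id"]) if match else None


def efo_release_version(iri: str) -> Optional[str]:
    """Return the version string if the IRI looks like a versioned EFO release."""
    match = EFO_RELEASE.match(iri)
    return match["version"] if match else None
```

Under these assumptions, `parse_obo_purl` accepts the Gene Ontology PURL from the FVF-1 row but returns `None` for the EFO term IRI in the FVF-3 row, because that IRI uses a different base. A check of this kind only tests the shape of an identifier; persistence and resolvability still have to be confirmed against the hosting service.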