
Table 1 An overview of ontologies, thesauri and knowledge bases used by biomedical semantic annotation tools discussed in the paper

From: Semantic annotation in biomedicine: the current landscape

BioPortal (http://bioportal.bioontology.org/)

A major repository of biomedical ontologies, currently hosting over 500 ontologies, controlled vocabularies and terminologies. Its Resource Index provides a unified, ontology-based index of multiple heterogeneous biomedical resources (annotated with BioPortal ontologies), together with access to those resources.

DBpedia (http://wiki.dbpedia.org/)

“Wikipedia for machines”, that is, a huge KB developed through a community effort of extracting information from Wikipedia and representing it in a structured format suitable for automated machine processing. It is the central hub of the Linked Open Data Cloud.

LLD - Linked Life Data (https://datahub.io/dataset/linked-life-data/)

The LLD platform provides access to a huge KB that includes and semantically interlinks knowledge about genes, proteins, molecular interactions, pathways, drugs, diseases, clinical trials and other related types of biomedical entities. It is part of the Linked Open Data Cloud (http://lod-cloud.net/).

NCBI Biosystems Database (https://www.ncbi.nlm.nih.gov/biosystems)

Repository providing integrated access to structured data and knowledge about biological systems and their components: genes, proteins, and small molecules.

NCBI Taxonomy

The NCBI Taxonomy contains the names and phylogenetic lineages of all the organisms that have molecular data in the NCBI databases.

OBO - Open Biomedical Ontologies (http://www.obofoundry.org/)

Community of ontology developers devoted to the development of a family of interoperable and scientifically accurate biomedical ontologies. Well-known OBO ontologies include:

Chemical Entities of Biological Interest (ChEBI) - focused on molecular entities, molecular parts, atoms, subatomic particles, and biochemical roles and applications

Gene Ontology (GO) - aims to standardize the representation of gene and gene product attributes; consists of 3 distinct sub-ontologies: Molecular Function, Biological Process, and Cellular Component

Protein Ontology (PRO) - provides a structural representation of protein-related entities

SNOMED CT (http://www.ihtsdo.org/snomed-ct)

SNOMED CT is considered the world’s most comprehensive and precise multilingual health terminology. It is used for the electronic exchange of clinical health information. It consists of concepts, concept descriptions (i.e., several terms that are used to refer to the concept), and concept relationships.
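
To make the three components named above more concrete, the following is a minimal sketch of how concepts, descriptions and relationships could be modeled. It is not the SNOMED CT release format: the class names, the `is_a` relationship label and the identifiers `C1` and `C2` are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Description:
    """One term used to refer to a concept (hypothetical model)."""
    concept_id: str
    term: str


@dataclass(frozen=True)
class Relationship:
    """A typed, directed link between two concepts (hypothetical model)."""
    source_id: str
    type_name: str
    target_id: str


class Terminology:
    """Toy container holding descriptions and relationships."""

    def __init__(self) -> None:
        self.descriptions: List[Description] = []
        self.relationships: List[Relationship] = []

    def describe(self, concept_id: str, term: str) -> None:
        self.descriptions.append(Description(concept_id, term))

    def relate(self, source_id: str, type_name: str, target_id: str) -> None:
        self.relationships.append(Relationship(source_id, type_name, target_id))

    def terms_for(self, concept_id: str) -> List[str]:
        """All terms that refer to the given concept, in insertion order."""
        return [d.term for d in self.descriptions if d.concept_id == concept_id]

    def parents(self, concept_id: str) -> List[str]:
        """Targets of the concept's 'is_a' relationships."""
        return [
            r.target_id
            for r in self.relationships
            if r.source_id == concept_id and r.type_name == "is_a"
        ]


# Placeholder identifiers, not real SNOMED CT codes.
snomed_demo = Terminology()
snomed_demo.describe("C1", "myocardial infarction")
snomed_demo.describe("C1", "heart attack")
snomed_demo.describe("C2", "heart disease")
snomed_demo.relate("C1", "is_a", "C2")
```

In this sketch, one concept carries several descriptions, which is what allows an annotator to map different surface forms in text to the same concept.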

UMLS (Unified Medical Language System) Metathesaurus (https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/)

The best-known and most widely used knowledge source in the biomedical domain. It assigns a unique identifier (CUI) to each medical concept and connects concepts to one another, thus forming a graph-like structure. Each concept (i.e., CUI) is associated with its ‘semantic type’, a broad category such as Gene, Disease or Syndrome. Each concept is also associated with several terms used to refer to it in biomedical texts; these terms are pulled from nearly 200 biomedical vocabularies. Some well-known vocabularies that have been used by biomedical semantic annotators include:

Human Phenotype Ontology (HPO) contains terms that describe phenotypic abnormalities encountered in human disease, and is used for large-scale computational analysis of the human phenome.

Logical Observation Identifiers Names and Codes (LOINC) provides standardized vocabulary for laboratory and other clinical observations, and is used for exchange and/or integration of clinical results from several disparate sources.

Medical Subject Headings (MeSH) is a controlled vocabulary thesaurus created and maintained by the U.S. National Library of Medicine (NLM), and has been primarily used for indexing articles in PubMed.

RxNorm provides normalized names for clinical drugs and links between many of the drug vocabularies commonly used in pharmacy management and drug interaction software.
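
The Metathesaurus organization described above (a CUI per concept, one semantic type, several terms, and links between concepts) can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the class and method names are hypothetical, the identifiers `CUI_A` and `CUI_B` are placeholders rather than real CUIs, and the actual Metathesaurus is distributed in its own file formats that are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    """A concept keyed by its CUI (hypothetical model)."""
    cui: str
    semantic_type: str
    terms: Set[str] = field(default_factory=set)
    related: Set[str] = field(default_factory=set)  # CUIs of linked concepts


class ConceptGraph:
    """Toy graph of concepts with a case-insensitive term index."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.term_index: Dict[str, Set[str]] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.cui] = concept
        for term in concept.terms:
            self.term_index.setdefault(term.lower(), set()).add(concept.cui)

    def link(self, cui_a: str, cui_b: str) -> None:
        """Connect two concepts; stored as an undirected edge in this sketch."""
        self.concepts[cui_a].related.add(cui_b)
        self.concepts[cui_b].related.add(cui_a)

    def lookup(self, term: str) -> List[str]:
        """Return the CUIs whose terms match the given text exactly."""
        return sorted(self.term_index.get(term.lower(), set()))


# Placeholder identifiers and terms, not taken from a UMLS release.
umls_demo = ConceptGraph()
umls_demo.add(Concept("CUI_A", "Disease or Syndrome", {"Diabetes Mellitus", "diabetes"}))
umls_demo.add(Concept("CUI_B", "Disease or Syndrome", {"Type 2 diabetes"}))
umls_demo.link("CUI_A", "CUI_B")
```

A term lookup returns a list rather than a single CUI because the same term can be ambiguous, which is one of the problems semantic annotators have to resolve.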

UniProtKB/Swiss-Prot (http://www.uniprot.org/uniprot/)

The manually annotated part of UniProtKB, a comprehensive protein sequence KB. The entries are curated by biologists, regularly updated and cross-linked to numerous external databases, with the ultimate objective of providing all known relevant information about a particular protein.