
Table 3 General-purpose biomedical semantic annotation tools (Part I)

From: Semantic annotation in biomedicine: the current landscape

|  | cTAKES [4] | NOBLE Coder [20] | MetaMap [31, 32] | NCBO Annotator [14] |
|---|---|---|---|---|
| Modularity/configuration options | Modular text processing pipeline | Vocabulary (terminology); term matching options and strategies | Text processing pipeline; vocabulary (terminology); term matching options and strategies | Vocabulary (terminology); term matching options |
| Disambiguation of terms | Enabled through integration of the YTEX component [8] | Does not use WSD; instead, heuristics choose one concept among the candidate concepts for the same piece of input text | Supported, based on (1) removal of word senses identified in a manual study of UMLS ambiguity and (2) a WSD algorithm that chooses the concept with the most likely semantic type for the given context | Not supported |
| Vocabulary (terminology) | Subset of UMLS, namely SNOMED CT and RxNorm | Several pre-built vocabularies based on subsets of UMLS (e.g. SNOMED CT, MeSH, RxNorm) | UMLS Metathesaurus | UMLS Metathesaurus and BioPortal ontologies (over 330 ontologies) |
| Speed* | Suitable for real-time processing | Suitable for real-time processing | Better suited to off-line batch processing | Suitable for real-time processing |
| Implementation form | Software (Java) library; stand-alone application | Software (Java) library; stand-alone application | Software library; the original version is in Prolog; a Java implementation, known as MMTx, is also available | RESTful Web service |
| Availability | Open source; available under the Apache License, v2.0 | Open source; available under the GNU Lesser General Public License v3 | Open source; terms and conditions at https://metamap.nlm.nih.gov/MMTnCs.shtml | Closed source, but freely available |
| Specific features | Performs better on clinical texts than on biomedical scientific literature (its NLP components are trained on clinical texts) | Offers a user interface for creating custom terminologies (to be used for annotation) by selecting and merging elements from several different thesauri/ontologies | Primarily developed for annotation of biomedical literature (MEDLINE/PubMed citations); performs better on this kind of text than on clinical notes | Uses the MGrep term-to-concept matching tool to obtain a primary set of annotations, which are then extended using different forms of ontology-based semantic matching |
| URL | http://ctakes.apache.org/ | http://noble-tools.dbmi.pitt.edu/ | https://metamap.nlm.nih.gov/ | https://bioportal.bioontology.org/annotator |

*Speed estimates are based on experimental results reported in the literature. Those experiments used corpora of up to 200 documents (paper abstracts or clinical notes), so the estimates might not hold for significantly larger corpora.
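The NOBLE Coder entry in Table 3 notes that, rather than performing WSD, the tool applies heuristics to pick a single concept when several candidates match the same piece of text. The sketch below shows how a fixed ordering of heuristics can stand in for a disambiguation model. The specific rules (prefer an exact preferred-name match, then a user-ranked source vocabulary, then the smallest concept ID as a deterministic tie-break), the `Candidate` type, `VOCAB_PRIORITY`, and `pick_one` are all hypothetical and are not NOBLE Coder's actual heuristics or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    concept_id: str       # hypothetical concept identifier
    preferred_name: str
    vocabulary: str       # source vocabulary label


# Hypothetical user preference over source vocabularies (earlier = preferred).
VOCAB_PRIORITY = ["SNOMEDCT_US", "MSH", "RXNORM"]


def pick_one(matched_text, candidates):
    """Choose one candidate for a text span using ordered, hypothetical heuristics."""
    def rank(c):
        # 1) An exact (case-insensitive) match with the preferred name comes first.
        not_exact = c.preferred_name.lower() != matched_text.lower()
        # 2) Then the user's vocabulary preference; unknown vocabularies rank last.
        if c.vocabulary in VOCAB_PRIORITY:
            vocab_rank = VOCAB_PRIORITY.index(c.vocabulary)
        else:
            vocab_rank = len(VOCAB_PRIORITY)
        # 3) Finally the concept ID, so the choice is deterministic.
        return (not_exact, vocab_rank, c.concept_id)

    return min(candidates, key=rank)


candidates = [
    Candidate("C2", "Cold Temperature", "MSH"),
    Candidate("C1", "Common Cold", "SNOMEDCT_US"),
    Candidate("C3", "Cold", "MSH"),
]
print(pick_one("cold", candidates).concept_id)  # prints C3
```

Because the rules are encoded as a sort key, adding or reordering a heuristic only changes the tuple returned by `rank`.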
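The MetaMap entry in Table 3 mentions a WSD algorithm that chooses the concept with the most likely semantic type for a given context. The sketch below illustrates only that idea: context words vote for UMLS semantic types, and the candidate whose type scores highest wins. The candidate list for "cold", the `TYPE_EVIDENCE` weights, and the function names are invented for illustration; this is not MetaMap's own disambiguation method, which would rely on associations learned from data rather than a hand-written table.

```python
from collections import Counter

# Hypothetical candidate concepts for the ambiguous string "cold",
# each paired with a UMLS semantic type name.
CANDIDATES = {
    "cold": [
        ("Common Cold", "Disease or Syndrome"),
        ("Cold Temperature", "Natural Phenomenon or Process"),
    ],
}

# Hypothetical context model: how strongly a context word suggests a semantic type.
TYPE_EVIDENCE = {
    "fever": {"Disease or Syndrome": 2.0},
    "cough": {"Disease or Syndrome": 2.0},
    "patient": {"Disease or Syndrome": 1.0},
    "weather": {"Natural Phenomenon or Process": 2.0},
    "winter": {"Natural Phenomenon or Process": 1.0},
}


def type_scores(context_words):
    """Sum the evidence each context word gives to each semantic type."""
    scores = Counter()
    for word in context_words:
        for sem_type, weight in TYPE_EVIDENCE.get(word.lower(), {}).items():
            scores[sem_type] += weight
    return scores


def disambiguate(term, context_words):
    """Return the candidate concept whose semantic type scores highest in context."""
    scores = type_scores(context_words)
    # max() keeps the first candidate on ties, so list order acts as a fallback.
    best_name, _ = max(CANDIDATES[term], key=lambda cand: scores[cand[1]])
    return best_name


print(disambiguate("cold", ["Patient", "reports", "fever", "and", "cough"]))  # Common Cold
print(disambiguate("cold", ["a", "cold", "winter", "weather", "front"]))     # Cold Temperature
```

Scoring semantic types rather than individual concepts means one context model can serve every ambiguous string whose candidates differ in type.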
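The NCBO Annotator entry in Table 3 describes a two-stage process: MGrep produces a primary set of direct term-to-concept matches, which are then extended through ontology-based semantic matching. The sketch below mimics that shape under simplifying assumptions. A case-insensitive regular-expression dictionary lookup stands in for MGrep, and the only expansion shown is following is-a links to ancestor classes, one plausible form of ontology-based matching. `DICTIONARY`, `IS_A`, the concept IDs, and all function names are hypothetical, and none of this is the Annotator's Web service API.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary: surface term -> concept ID (IDs are invented).
DICTIONARY = {
    "melanoma": "C:melanoma",
    "skin cancer": "C:skin_cancer",
}

# Hypothetical is-a hierarchy: concept ID -> direct parent concept IDs.
IS_A = {
    "C:melanoma": ["C:skin_cancer"],
    "C:skin_cancer": ["C:cancer"],
    "C:cancer": [],
}


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    concept: str
    source: str  # "direct" (dictionary match) or "hierarchy" (ancestor expansion)


def direct_matches(text):
    """Stage 1: case-insensitive whole-word dictionary lookup (stand-in for MGrep)."""
    found = []
    for term, concept in DICTIONARY.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            found.append(Annotation(m.start(), m.end(), concept, "direct"))
    return found


def ancestors(concept):
    """All transitive is-a ancestors of a concept."""
    seen = []
    stack = list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def annotate(text):
    """Stage 2: extend each direct match with annotations for its ontology ancestors."""
    direct = direct_matches(text)
    expanded = [
        Annotation(a.start, a.end, parent, "hierarchy")
        for a in direct
        for parent in ancestors(a.concept)
    ]
    # Deduplicate, then sort by position and concept.
    return sorted(set(direct + expanded), key=lambda a: (a.start, a.end, a.concept, a.source))


for ann in annotate("Melanoma is a skin cancer."):
    print(ann)
```

Keeping the `source` field on each annotation lets a consumer separate what was literally found in the text from what was inferred through the ontology.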