Table 4. Schema for categorizing terms into types, used to sort and count the most frequent false positive and false negative results

From: Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources

| Label | Description | Semantics | Relevance | Examples |
| --- | --- | --- | --- | --- |
| 1C | 1 character | Undefined | Artefact | N |
| 2C | 2 characters | Undefined | Margin | T3, PU, LH |
| a-PG | Acronym PGN | Confirmed by UniProtKB | Core | Ras, c-fos, Wnt |
| ea-PG | Extended a-PG | a-PG with preserved semantics | Margin | Ras gene, p53 protein, Src family |
| xa-PG | Expanded a-PG | a-PG with modified semantics | Artefact | p53 mutations |
| PGT | Protein/gene term | Confirmed by UniProtKB | Core | E-cadherin, beta-catenin, glucocorticoid receptor |
| BMT | Biomedical term | Specific to biomedical scientific text, excluding PGT | Margin | Polymerase, prion, IgM |
| GE | General English term | Occurring in non-scientific text | Artefact | Plasma, renal, inhibitor, antibodies |
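
As a rough illustration of how the schema might be used to sort and count errors, the sketch below stores each label with its description and relevance class and tallies a list of labelled terms per relevance class. The names `SCHEMA` and `tally_by_relevance` are hypothetical, as is the sample list of false positives, which simply reuses example terms from the table. Assigning a label to a term (for instance, checking it against UniProtKB) is assumed to have been done beforehand and is not shown.

```python
from collections import Counter

# Schema transcribed from Table 4: label -> (description, relevance).
SCHEMA = {
    "1C": ("1 character", "Artefact"),
    "2C": ("2 characters", "Margin"),
    "a-PG": ("Acronym PGN", "Core"),
    "ea-PG": ("Extended a-PG", "Margin"),
    "xa-PG": ("Expanded a-PG", "Artefact"),
    "PGT": ("Protein/gene term", "Core"),
    "BMT": ("Biomedical term", "Margin"),
    "GE": ("General English term", "Artefact"),
}


def tally_by_relevance(labelled_terms):
    """Count terms per relevance class, given (term, label) pairs."""
    counts = Counter()
    for _term, label in labelled_terms:
        _description, relevance = SCHEMA[label]
        counts[relevance] += 1
    return counts


# Hypothetical false positives, labelled by hand with the table's examples.
false_positives = [
    ("N", "1C"),
    ("T3", "2C"),
    ("Ras", "a-PG"),
    ("p53 mutations", "xa-PG"),
    ("renal", "GE"),
]
fp_counts = tally_by_relevance(false_positives)
```

Grouping by relevance rather than by label keeps the summary to three classes (Core, Margin, Artefact). A per-label count can be obtained the same way by incrementing `counts[label]` instead.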