From: Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources
Label | Description | Semantics | Relevance | Examples |
---|---|---|---|---|
1C | 1 character | Undefined | Artefact | N |
2C | 2 characters | Undefined | Margin | T3, PU, LH |
a-PG | Acronym PGN | Confirmed by UniProtKB | Core | Ras, c-fos, Wnt |
ea-PG | Extended a-PG | a-PG with preserved semantics | Margin | Ras gene, p53 protein, Src family |
xa-PG | Expanded a-PG | a-PG with modified semantics | Artefact | p53 mutations |
PGT | Protein/gene term | Confirmed by UniProtKB | Core | E-cadherin, beta-catenin, glucocorticoid receptor |
BMT | Biomedical term | Specific to biomedical scientific text, excluding PGT | Margin | polymerase, prion, IgM |
GE | General English term | Occurring in non-scientific text | Artefact | Plasma, renal, inhibitor, antibodies |
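
The label-to-relevance mapping in the table can be written down as a small lookup. The following is only a minimal sketch, not the procedure used in the study: the names `RELEVANCE`, `length_label`, and `relevance_of` are hypothetical, and only the two labels defined purely by character count (1C and 2C) are assigned automatically. The remaining labels depend on lexical resources such as UniProtKB and are not attempted here.

```python
# Relevance assigned to each label, copied from the table above.
RELEVANCE = {
    "1C": "Artefact",
    "2C": "Margin",
    "a-PG": "Core",
    "ea-PG": "Margin",
    "xa-PG": "Artefact",
    "PGT": "Core",
    "BMT": "Margin",
    "GE": "Artefact",
}


def length_label(term):
    """Return '1C' or '2C' for one- or two-character terms, else None.

    Assumption: only these two labels can be decided by length alone;
    all other labels need lexical lookups that this sketch omits.
    """
    n = len(term.strip())
    if n == 1:
        return "1C"
    if n == 2:
        return "2C"
    return None


def relevance_of(term):
    """Hypothetical helper: relevance for length-defined labels, else None."""
    label = length_label(term)
    return RELEVANCE[label] if label is not None else None
```

With the examples from the table, `length_label("N")` gives `"1C"` and `length_label("T3")` gives `"2C"`, while a longer term such as `"Ras"` returns `None` because its label cannot be decided by length.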