Table 13 Distribution of sources for the BioLemmatizer lexicon

From: BioLemmatizer: a lemmatization tool for morphological processing of biomedical text

 

| # | Lexical source | Domain of focus | POS tagset | No. of entries | Percentage |
|---|---|---|---|---|---|
| 1 | MorphAdorner | General English | NUPOS | 161,166 | 46% |
| 2 | GENIA tagger | Biomedicine | Penn Treebank | 68,990 | 20% |
| 3 | BioLexicon | Biomedicine | Penn Treebank | 116,809 | 34% |
| Total | BioLemmatizer | Biomedicine | NUPOS, Penn Treebank | 346,965 | 100% |
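The percentage column can be reproduced from the entry counts: each source's share is its count divided by the combined total of 346,965 entries. The minimal Python sketch below copies the three counts from the table and assumes the published figures were rounded to the nearest whole percent; that rounding rule is an assumption, not something the table states.

```python
# Entry counts per lexical source, copied from Table 13.
counts = {
    "MorphAdorner": 161_166,
    "GENIA tagger": 68_990,
    "BioLexicon": 116_809,
}

# Combined size of the BioLemmatizer lexicon.
total = sum(counts.values())

# Assumption: shares are rounded to the nearest whole percent.
shares = {src: round(100 * n / total) for src, n in counts.items()}

print(f"Total: {total:,}")
for src, pct in shares.items():
    print(f"{src}: {pct}%")
```

Run as-is, it prints a total of 346,965 and shares of 46%, 20% and 34%, matching the table and summing to 100%.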