Table 3 Statistics describing the manually annotated corpora

From: Ambiguity and variability of database and software names in bioinformatics

 

| Statistic | Development | Test |
|---|---|---|
| Total number of documents | 60 | 25 |
| Total database and software mentions | 2416 | 1479 |
| Total unique resource mentions | 401 | 301 |
| Percentage of database mentions | 36 % | 28 % |
| Percentage of unique database mentions | 27 % | 30 % |
| Average mentions per document | 40.3 | 59.2 |
| Average unique mentions per document | 8.1 | 13.4 |
| Maximum mentions in a single document | 227 | 217 |
| Maximum unique mentions in a single document | 57 | 55 |
| Resources with only a single lexicographic mention | 201 | 147 |
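
As a rough illustration of how statistics of this kind could be derived, the sketch below computes the count-based rows from a list of annotated mentions. It is not the authors' procedure: the input format (pairs of document ID and mention string), the function name `corpus_statistics`, and the decision to treat two mentions as the same when their strings match exactly are all assumptions made for this example. Documents with no mentions at all would not appear in such a list and so would not be counted.

```python
from collections import defaultdict


def corpus_statistics(annotations):
    """Summarise (doc_id, mention_text) pairs; the input format is hypothetical."""
    per_doc = defaultdict(list)
    for doc_id, mention in annotations:
        per_doc[doc_id].append(mention)
    if not per_doc:
        raise ValueError("no annotations supplied")

    n_docs = len(per_doc)
    mention_counts = [len(mentions) for mentions in per_doc.values()]
    # Assumption: uniqueness is judged by exact string match.
    unique_counts = [len(set(mentions)) for mentions in per_doc.values()]
    corpus_unique = {m for mentions in per_doc.values() for m in mentions}

    return {
        "documents": n_docs,
        "total_mentions": sum(mention_counts),
        "unique_mentions": len(corpus_unique),
        "avg_mentions_per_doc": sum(mention_counts) / n_docs,
        "avg_unique_per_doc": sum(unique_counts) / n_docs,
        "max_mentions_in_doc": max(mention_counts),
        "max_unique_in_doc": max(unique_counts),
    }


# Toy data for illustration only; not taken from the annotated corpora.
toy = [
    ("d1", "BLAST"), ("d1", "BLAST"), ("d1", "UniProt"),
    ("d2", "BLAST"), ("d2", "Pfam"), ("d2", "Pfam"), ("d2", "Pfam"),
]
stats = corpus_statistics(toy)
```

Under these assumptions, the average mentions per document is simply the total divided by the number of documents (2416 / 60 ≈ 40.3 for the development set). The average unique mentions per document is instead a mean of per-document counts, so it need not equal the corpus-wide unique total divided by the number of documents.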