Table 5 Statistics of the disease entity recognition data sets used

From: We are not ready yet: limitations of state-of-the-art disease named entity recognizers

 

Split        Statistic             NCBI  BC5CDR  miRNA-disease  COVID Disease  BioNLP13-CG
Training     Size (# abstracts)     593     500            201              -          300
             Unique mentions       1614    1445            461              -          349
             Unique concepts        632     649              -              -            -
Development  Size (# abstracts)     100     500              -              -          200
             Unique mentions        343    1343              -              -          154
             Unique concepts        170     589              -              -            -
Test         Size (# abstracts)     100     500            100             50          100
             Unique mentions        407    1432            224             68          260
             Unique concepts        192     640              -              -            -

A dash (-) indicates that no value is reported for that data set and split.
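To reuse these counts programmatically, the table can be encoded as a nested mapping. The sketch below is illustrative only: the names `TABLE5`, `DATASETS` and `total_abstracts` are hypothetical, a dash is stored as `None`, and when summing it assumes a dash means the split contributes no abstracts. Only abstract counts are summed, because unique mentions and concepts can recur across splits and would be over-counted.

```python
from typing import Optional

# Column order of Table 5.
DATASETS = ["NCBI", "BC5CDR", "miRNA-disease", "COVID Disease", "BioNLP13-CG"]

# split -> statistic -> one value per data set (DATASETS order).
# None stands for a dash (no value reported) in the table.
TABLE5: dict[str, dict[str, list[Optional[int]]]] = {
    "training": {
        "abstracts": [593, 500, 201, None, 300],
        "unique_mentions": [1614, 1445, 461, None, 349],
        "unique_concepts": [632, 649, None, None, None],
    },
    "development": {
        "abstracts": [100, 500, None, None, 200],
        "unique_mentions": [343, 1343, None, None, 154],
        "unique_concepts": [170, 589, None, None, None],
    },
    "test": {
        "abstracts": [100, 500, 100, 50, 100],
        "unique_mentions": [407, 1432, 224, 68, 260],
        "unique_concepts": [192, 640, None, None, None],
    },
}


def total_abstracts(dataset: str) -> int:
    """Sum abstract counts over all splits; assumes a dash (None) means zero."""
    i = DATASETS.index(dataset)
    return sum(TABLE5[split]["abstracts"][i] or 0 for split in TABLE5)


for name in DATASETS:
    print(f"{name}: {total_abstracts(name)} abstracts")
```

Run as-is, this prints 793 abstracts for NCBI, 1500 for BC5CDR, 301 for miRNA-disease, 50 for COVID Disease and 600 for BioNLP13-CG.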