Table 5 Corpus description and inter-annotator agreement

From: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature

Category | Number of sentences (%) per category | Dimension | Number of sentences (%) per dimension | Percent agreement | Kappa | Kappa 95% CI
Inclusion | 1923 out of 3971 (48.4%) | Biomedical & Procedure | 1449 (36.5%) | 95.00% | 88.96% | 0.87–0.90
Inclusion | 1923 out of 3971 (48.4%) | Standard codes | 385 (9.7%) | 99.47% | 97.01% | 0.95–0.98
Inclusion | 1923 out of 3971 (48.4%) | Medications | 593 (14.9%) | 99.09% | 96.44% | 0.95–0.97
Inclusion | 1923 out of 3971 (48.4%) | Laboratories | 246 (6.2%) | 99.70% | 97.42% | 0.95–0.98
Inclusion | 1923 out of 3971 (48.4%) | Use of Natural Language Processing (NLP) | 49 (1.2%) | 99.65% | 83.54% | 0.74–0.92
Intermediate | 1851 out of 3971 (46.6%) | Data sources | 1370 (34.5%) | 96.71% | 92.59% | 0.91–0.93
Intermediate | 1851 out of 3971 (46.6%) | Study design and/or Institutional Review Board (IRB) | 780 (19.6%) | 98.00% | 93.56% | 0.92–0.94
Exclusion | 2273 out of 3971 (57.3%) | Irrelative evidence | 733 (18.4%) | 97.27% | 91.05% | 0.89–0.92
Exclusion | 2273 out of 3971 (57.3%) | Computational and statistical evidence | 1314 (33.1%) | 96.84% | 92.83% | 0.91–0.94
Exclusion | 2273 out of 3971 (57.3%) | Insufficient evidence | 359 (9.0%) | 95.96% | 78.72% | 0.75–0.82