Table 5 Corpus description and inter-annotator agreement

From: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature

| Category | Number of sentences (%) per category | Dimension | Number of sentences (%) per dimension | Percent agreement | Kappa | Kappa 95% CI |
|---|---|---|---|---|---|---|
| Inclusion | 1923 out of 3971 (48.4%) | Biomedical & Procedure | 1449 (36.5%) | 95.00% | 88.96% | 0.87–0.90 |
| | | Standard codes | 385 (9.7%) | 99.47% | 97.01% | 0.95–0.98 |
| | | Medications | 593 (14.9%) | 99.09% | 96.44% | 0.95–0.97 |
| | | Laboratories | 246 (6.2%) | 99.70% | 97.42% | 0.95–0.98 |
| | | Use of Natural Language Processing (NLP) | 49 (1.2%) | 99.65% | 83.54% | 0.74–0.92 |
| Intermediate | 1851 out of 3971 (46.6%) | Data sources | 1370 (34.5%) | 96.71% | 92.59% | 0.91–0.93 |
| | | Study design and/or Institutional Review Board (IRB) | 780 (19.6%) | 98.00% | 93.56% | 0.92–0.94 |
| Exclusion | 2273 out of 3971 (57.3%) | Irrelative evidence | 733 (18.4%) | 97.27% | 91.05% | 0.89–0.92 |
| | | Computational and statistical evidence | 1314 (33.1%) | 96.84% | 92.83% | 0.91–0.94 |
| | | Insufficient evidence | 359 (9.0%) | 95.96% | 78.72% | 0.75–0.82 |
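To illustrate how a percent agreement and a kappa value of the kind reported in the table relate to each other, the sketch below computes both for two annotators who each give a binary label per sentence. It assumes Cohen's kappa for two annotators, which the table itself does not state; the function names and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently at their own marginal rates.
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical data): 1 = sentence carries the dimension.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"percent agreement = {agreement:.2%}, kappa = {kappa:.4f}")
```

Because kappa subtracts the agreement expected by chance, it is lower than the raw percent agreement whenever the annotators disagree at all, and the gap tends to widen when one label is rare. That general property may explain why a sparsely populated dimension can show a high percent agreement alongside a noticeably lower kappa.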