Table 6 Disorder multiple identifier distribution by data set

From: CUILESS2016: a clinical corpus applying compositional normalization of text mentions

| Disorder CUI type | Development count | Development proportion (%) | Training count | Training proportion (%) |
|---|---|---|---|---|
| CUI-less | 1 | 0.05 | 7 | 0.20 |
| Single | 1687 | 87.46 | 2823 | 81.40 |
| Double | 221 | 11.46 | 562 | 16.21 |
| Triple | 18 | 0.93 | 73 | 2.11 |
| Quadruple | 2 | 0.10 | 3 | 0.09 |
| Total | 1929 | 100 | 3468 | 100 |

  1. Differences in the distribution of disorder mentions between the development and training data sets are likely due to note composition (see Table 3), the larger set of annotators (4) for the training data, and the lack of a consensus process for the training data, since each training document is annotated by only a single annotator.
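
The proportion columns are percentages of each data set's total. As a minimal sketch of that arithmetic, the snippet below recomputes them from the counts in the table. The names `counts` and `proportions` are illustrative only and do not come from the source, and recomputed values may differ from the published figures in the last decimal place because of rounding.

```python
# Counts copied from Table 6 as (development, training) pairs.
# The variable and function names are illustrative, not from the paper.
counts = {
    "CUI-less": (1, 7),
    "Single": (1687, 2823),
    "Double": (221, 562),
    "Triple": (18, 73),
    "Quadruple": (2, 3),
}


def proportions(column):
    """Return each category's share (in percent) of the column total.

    column 0 is the development set, column 1 is the training set.
    """
    total = sum(pair[column] for pair in counts.values())
    return {name: 100 * pair[column] / total for name, pair in counts.items()}


dev = proportions(0)
train = proportions(1)

# Print one row per category: name, development %, training %.
for name in counts:
    print(f"{name:<10} {dev[name]:6.2f} {train[name]:6.2f}")
```

Keeping the counts as the single source of truth and deriving the percentages makes it easy to check that each column sums to 100.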