Table 6 Disorder multiple identifier distribution by data set

From: CUILESS2016: a clinical corpus applying compositional normalization of text mentions

Disorder CUI type  Development count  Development proportion (%)  Training count  Training proportion (%)
CUI-less                           1                        0.05               7                     0.20
Single                          1687                       87.46            2823                    81.40
Double                           221                       11.46             562                    16.21
Triple                            18                        0.93              73                     2.11
Quadruple                          2                        0.10               3                     0.09
Total                           1929                         100            3468                      100
  1. Differences in disorder mention distribution between the development and training data sets are likely due to note composition (see Table 3), a larger set of annotators (4) for the training data, and the lack of a consensus process for the training data, since each training document is annotated by only a single annotator.
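
The proportion columns in Table 6 appear to be percentages of each data set's total disorder mention count. The following is a minimal sketch of that arithmetic, not the authors' own procedure: the counts are copied from the table, while the names `counts`, `published`, and `percentages` are illustrative assumptions. Recomputed values should agree with the published ones to within about 0.01 percentage points; the small differences are presumably due to rounding.

```python
# Counts copied from Table 6 as (development, training) pairs.
counts = {
    "CUI-less": (1, 7),
    "Single": (1687, 2823),
    "Double": (221, 562),
    "Triple": (18, 73),
    "Quadruple": (2, 3),
}

# Published percentages from Table 6, in the same (development, training) order.
published = {
    "CUI-less": (0.05, 0.20),
    "Single": (87.46, 81.40),
    "Double": (11.46, 16.21),
    "Triple": (0.93, 2.11),
    "Quadruple": (0.10, 0.09),
}


def percentages(column):
    """Return each category's share of the column total, in percent."""
    total = sum(counts[name][column] for name in counts)
    return {name: round(100 * counts[name][column] / total, 2) for name in counts}


dev_total = sum(dev for dev, _ in counts.values())
train_total = sum(train for _, train in counts.values())
recomputed_dev = percentages(0)
recomputed_train = percentages(1)

for name in counts:
    print(name, recomputed_dev[name], recomputed_train[name])
```

Keeping the published values next to the recomputed ones makes it easy to see where the two differ in the last decimal place.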