Table 3 Number and percentage of annotations per corpus subset: training, validation and test

From: De-identifying Spanish medical texts - named entity recognition applied to radiology reports

Entity type   Training (words / % of subset)   Validation (words / % of subset)   Test (words / % of subset)
CAB 1987 / 21.37% 993 / 20.87% 120 / 9.40%
NAME 3286 / 35.34% 1591 / 33.45% 386 / 30.25%
DIR 128 / 1.38% 106 / 2.23% 72 / 5.64%
LOC 79 / 0.85% 46 / 0.97% 26 / 2.04%
NUM 1159 / 12.47% 585 / 12.29% 143 / 11.21%
FECHA 1655 / 17.79% 897 / 18.86% 300 / 23.51%
INST 1004 / 10.80% 539 / 11.33% 229 / 17.95%
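The percentages appear to be each entity type's share of all annotations in its own subset, since every column sums to 100%. The minimal Python sketch below, with the counts transcribed from the table and the helper names (`subset_totals`, `recomputed_shares`, `max_deviation`) purely hypothetical, recomputes those shares and compares them with the reported values. Under that assumption the reported figures agree with the recomputed ones to within 0.01 percentage points; a couple of entries (e.g. training FECHA, validation NUM) differ in the last digit, presumably from rounding adjustments made so that each column totals exactly 100%.

```python
# Subset names, in the column order used by Table 3.
SUBSETS = ("train", "val", "test")

# (annotation count, reported %) per subset, transcribed from Table 3.
TABLE = {
    "CAB":   [(1987, 21.37), (993, 20.87), (120, 9.40)],
    "NAME":  [(3286, 35.34), (1591, 33.45), (386, 30.25)],
    "DIR":   [(128, 1.38), (106, 2.23), (72, 5.64)],
    "LOC":   [(79, 0.85), (46, 0.97), (26, 2.04)],
    "NUM":   [(1159, 12.47), (585, 12.29), (143, 11.21)],
    "FECHA": [(1655, 17.79), (897, 18.86), (300, 23.51)],
    "INST":  [(1004, 10.80), (539, 11.33), (229, 17.95)],
}


def subset_totals(table):
    """Sum the annotation counts of every entity type within each subset."""
    return {
        name: sum(row[i][0] for row in table.values())
        for i, name in enumerate(SUBSETS)
    }


def recomputed_shares(table):
    """Each entity's count as a percentage of its subset's total (assumed definition)."""
    totals = subset_totals(table)
    return {
        label: {name: 100 * row[i][0] / totals[name] for i, name in enumerate(SUBSETS)}
        for label, row in table.items()
    }


def max_deviation(table):
    """Largest gap, in percentage points, between recomputed and reported shares."""
    shares = recomputed_shares(table)
    return max(
        abs(shares[label][name] - row[i][1])
        for label, row in table.items()
        for i, name in enumerate(SUBSETS)
    )


if __name__ == "__main__":
    print(subset_totals(TABLE))  # {'train': 9298, 'val': 4757, 'test': 1276}
    print(f"max deviation: {max_deviation(TABLE):.4f} pp")
```

Keeping the reported percentage next to each count makes the comparison a single pass over the table; swapping in a different share definition (for example, share of the whole corpus) only requires changing `recomputed_shares`.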