
Table 3 Number and percentage of annotations per corpus subset: Training, validation and test

From: De-identifying Spanish medical texts - named entity recognition applied to radiology reports

 

| Label | Training (words / %) | Validation (words / %) | Test (words / %) |
|-------|----------------------|------------------------|------------------|
| CAB   | 1987 / 21.37%        | 993 / 20.87%           | 120 / 9.4%       |
| NAME  | 3286 / 35.34%        | 1591 / 33.45%          | 386 / 30.25%     |
| DIR   | 128 / 1.38%          | 106 / 2.23%            | 72 / 5.64%       |
| LOC   | 79 / 0.85%           | 46 / 0.97%             | 26 / 2.04%       |
| NUM   | 1159 / 12.47%        | 585 / 12.29%           | 143 / 11.21%     |
| FECHA | 1655 / 17.79%        | 897 / 18.86%           | 300 / 23.51%     |
| INST  | 1004 / 10.80%        | 539 / 11.33%           | 229 / 17.95%     |
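
The table does not state what the percentages are taken over. A minimal sketch, assuming each percentage is a label's share of all annotated words in the same subset (the column total), recomputes them from the word counts. The names `counts`, `subsets`, `totals` and `share` are illustrative and do not come from the paper.

```python
# Annotated-word counts per label, copied from Table 3.
# Each tuple is ordered (training, validation, test).
counts = {
    "CAB":   (1987, 993, 120),
    "NAME":  (3286, 1591, 386),
    "DIR":   (128, 106, 72),
    "LOC":   (79, 46, 26),
    "NUM":   (1159, 585, 143),
    "FECHA": (1655, 897, 300),
    "INST":  (1004, 539, 229),
}
subsets = ("training", "validation", "test")

# Column totals: all annotated words in each subset.
# Assumption: these totals are the denominators of the reported percentages.
totals = [sum(row[i] for row in counts.values()) for i in range(len(subsets))]


def share(label, subset_index):
    """Percentage of a subset's annotated words that carry `label`."""
    return 100.0 * counts[label][subset_index] / totals[subset_index]


# Print one row per label, with one percentage per subset.
for label in counts:
    cells = "  ".join(f"{share(label, i):6.2f}%" for i in range(len(subsets)))
    print(f"{label:<6}{cells}")
```

Under this assumption the recomputed shares agree with the reported percentages to within 0.01 percentage point, which suggests the column total is indeed the denominator.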