SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks

Table 10 Comparison between similar clinical annotation projects

Corpus	Type	Strict	Lenient	Flexible	Relaxed
CLEF [6]	entities	0.77 (8.5%)	0.80 (−3.6%)	0.77 (0%)	0.80 (−13.0%)
CLEF [6]	relations	–	0.75 (−12.7%)	–	–
IxaMed-GS [14]	entities	0.84 (18.3%)	0.90 (8.4%)	0.84 (9.1%)	0.90 (−2.2%)
IxaMed-GS [14]	relations	–	0.82 (−4.6%)	–	–
MERLOT [15]	entities	0.79 (11.2%)	–	0.79 (2.6%)	–
MERLOT [15]	relations	–	0.78 (−9.3%)	–	–
MedAlert [18, 55]	entities	0.80 (12.6%)	–	0.80 (3.9%)	–
MedAlert [18, 55]	relations	–	0.66 (−23.2%)	–	–
MiPACQ [36]	entities	0.69 (−2.8%)	0.75 (−9.6%)	0.69 (−10.4%)	0.75 (−18.5%)
MiPACQ [36]	relations	–	–	–	–

The percentage difference in performance between the proposed corpus and each of the other clinical annotation projects is shown in parentheses. Note that the IAA values listed for the Flexible and Relaxed matches are copies of the Strict and Lenient scores, respectively, so that a percentage difference can still be reported against the values of other authors who did not calculate these metrics specifically.
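The table does not state how the parenthesized percentages are computed. As a minimal sketch, assuming they are relative differences of the other project's IAA with respect to the proposed corpus, the Flexible column can be reproduced as shown here. The function name `percent_difference` is hypothetical, and the baseline of 0.77 is not given in the table: it is inferred from the CLEF entry, whose Flexible difference is 0%.

```python
def percent_difference(other: float, ours: float) -> float:
    """Relative difference of `other` with respect to `ours`, in percent.

    Assumed formula (not stated in the table): (other - ours) / ours * 100.
    """
    if ours == 0:
        raise ValueError("baseline score must be non-zero")
    return (other - ours) / ours * 100.0


# Assumption: baseline Flexible IAA of the proposed corpus, inferred from
# the CLEF row (0.77 with a 0% difference); not reported in this table.
OURS_FLEXIBLE = 0.77

# Flexible-match IAA for entities, copied from the table rows.
flexible_iaa = {
    "CLEF": 0.77,
    "IxaMed-GS": 0.84,
    "MERLOT": 0.79,
    "MedAlert": 0.80,
    "MiPACQ": 0.69,
}

# Percentage difference per corpus, rounded to one decimal place.
flexible_diff = {
    corpus: round(percent_difference(score, OURS_FLEXIBLE), 1)
    for corpus, score in flexible_iaa.items()
}

for corpus, diff in flexible_diff.items():
    print(f"{corpus}: {diff:+.1f}%")
```

Under this assumption, a positive value means the other project reports a higher agreement than the proposed corpus, and a negative value means a lower one. Some entries in the other columns differ from this calculation in the last digit, which may reflect rounding of the underlying scores.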

ISSN: 2041-1480