From: De-identifying Spanish medical texts - named entity recognition applied to radiology reports
Study | Methodology | Recall | F1-score | Corpus size | Identifying tokens |
---|---|---|---|---|---|
Dalianis et al. [5] | CRF, 4-fold cross-validation | 0.715 | 0.810 | 100 clinical records, train set | 6170 |
Menger et al. [12] | Regular expression rules and tree-based hashing | 0.916 | 0.862 | 2000 medical texts, development; 400 medical texts, test set | 542, test set |
Jian et al. [13] | Rule-based and CRF | 0.851 | 0.848 | 201 sentences, train set; 1000 clinical records, test set | 1259, train set |
Lange et al. [28] | BiLSTM with CRF | 0.974 | 0.974 | 500 clinical records, train set; 250 clinical records, development; 250 clinical records, test set | 11333, train set; 5801, development; 5661, test set |
Jiang et al. [29] | BERT and flair system | 0.968 | 0.962 | 500 clinical records, train set; 250 clinical records, development; 250 clinical records, test set | 11333, train set; 5801, development; 5661, test set |
Pérez et al. [30] | spaCy | 0.953 | 0.960 | 500 clinical records, train set; 250 clinical records, development; 250 clinical records, test set | 11333, train set; 5801, development; 5661, test set |
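Recall and F1-score are the standard named entity recognition metrics compared in the table. The sketch below shows how they are typically computed under strict, entity-level exact matching. The individual studies may use different matching schemes (for example token-level or partial-overlap scoring), so this is an illustration of the formulas rather than any study's evaluation script. The `(start, end, label)` entity representation and the `entity_scores` function are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: an entity is (start_offset, end_offset, label).
Entity = Tuple[int, int, str]


def entity_scores(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict exact-match scoring."""
    tp = len(gold & predicted)   # spans matching both offsets and label
    fp = len(predicted - gold)   # predicted spans absent from the gold set
    fn = len(gold - predicted)   # gold spans the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Usage: two of three gold entities are found, plus one spurious prediction,
# so precision, recall and F1 are all 2/3.
gold = {(0, 5, "NAME"), (10, 20, "DATE"), (25, 30, "ID")}
pred = {(0, 5, "NAME"), (10, 20, "DATE"), (40, 45, "ID")}
p, r, f = entity_scores(gold, pred)
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two scores. This explains why a system with modest recall, such as the CRF in the first row, can still report a higher F1 if its precision is high.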