Table 2 Precision, recall and F1-score for both the corresponding test set and the respective other test set (i.e., cross evaluation)

From: We are not ready yet: limitations of state-of-the-art disease named entity recognizers

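| Algorithm | Train set | Test set | Precision [%] | Recall [%] | F1-score [%] |
|-----------|-----------|----------|---------------|------------|--------------|
| BioBERT   | NCBI      | NCBI     | 84.62         | 90.09      | 87.27        |
| BioBERT   | NCBI      | BC5CDR   | 69.77         | 67.75      | 68.75        |
| BioBERT   | BC5CDR    | NCBI     | 73.63         | 63.19      | 68.01        |
| BioBERT   | BC5CDR    | BC5CDR   | 82.07         | 85.39      | 83.07        |
| TaggerOne | NCBI      | NCBI     | 83.46         | 82.66      | 83.06        |
| TaggerOne | NCBI      | BC5CDR   | 70.01         | 40.75      | 51.51        |
| TaggerOne | BC5CDR    | NCBI     | 68.30         | 56.38      | 61.77        |
| TaggerOne | BC5CDR    | BC5CDR   | 83.59         | 80.67      | 82.11        |
| scispaCy  | BC5CDR    | NCBI     | 65.65         | 57.49      | 61.30        |
| scispaCy  | BC5CDR    | BC5CDR   | 76.20         | 75.22      | 75.71        |
| DNorm     | NCBI      | NCBI     | 80.80         | 81.90      | 81.35        |
| DNorm     | NCBI      | BC5CDR   | 65.73         | 50.29      | 56.98        |
| Stanza    | NCBI      | NCBI     | 86.65         | 88.54      | 87.58        |
| Stanza    | NCBI      | BC5CDR   | 70.24         | 57.78      | 63.40        |
| Stanza    | BC5CDR    | NCBI     | 75.57         | 62.50      | 68.42        |
| Stanza    | BC5CDR    | BC5CDR   | 82.85         | 84.95      | 83.88        |
| HUNER     | NCBI      | NCBI     | 83.82         | 86.35      | 85.07        |
| HUNER     | NCBI      | BC5CDR   | 70.20         | 64.92      | 67.46        |
| HUNER     | BC5CDR    | NCBI     | 77.84         | 69.90      | 73.66        |
| HUNER     | BC5CDR    | BC5CDR   | 83.07         | 83.52      | 83.30        |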
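
As a reading aid, the sketch below shows how an F1-score relates to precision and recall, and how the gap between in-corpus and cross-corpus evaluation can be quantified. It assumes the standard definition of F1 as the harmonic mean of precision and recall; the table itself does not state how its F1 values were computed, and values recomputed from the rounded percentages may differ from the reported ones in the last digit. The function name `f1_score` and the variable names are illustrative only. The inputs are the TaggerOne rows trained on NCBI.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# TaggerOne trained on NCBI; percentages copied from the table.
in_corpus = f1_score(83.46, 82.66)     # test set: NCBI
cross_corpus = f1_score(70.01, 40.75)  # test set: BC5CDR

# Difference in percentage points between in-corpus and cross-corpus F1.
drop = in_corpus - cross_corpus

print(f"in-corpus F1:    {in_corpus:.2f}")
print(f"cross-corpus F1: {cross_corpus:.2f}")
print(f"drop:            {drop:.2f} percentage points")
```

For this pair of rows the drop is driven mainly by recall, which falls from 82.66 to 40.75 while precision falls from 83.46 to 70.01.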