
Table 9 Results of manual inspection of random samples of annotations

From: Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition

 

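| IC | Baseline B2: # Terms | Baseline B2: # Annotations | Baseline B2: Accuracy | With rules: # Terms | With rules: # Annotations | With rules: Accuracy | Overall: Accuracy |
|---|---|---|---|---|---|---|---|
| Undefined | 35 | 231 | 0.98 | 75 | 363 | 0.70 | 0.81 |
| [0,1) | 1 | 15 | 0.20 | 0 | 0 | 0.00 | 0.20 |
| [1,2) | 1 | 15 | 1.00 | 1 | 4 | 1.00 | 1.00 |
| [2,3) | 1 | 15 | 1.00 | 1 | 4 | 1.00 | 1.00 |
| [3,4) | 1 | 4 | 1.00 | 1 | 1 | 0.00 | 0.80 |
| [4,5) | 2 | 30 | 0.60 | 2 | 24 | 0.88 | 0.72 |
| [5,6) | 4 | 60 | 0.97 | 2 | 13 | 0.23 | 0.84 |
| [6,7) | 7 | 79 | 0.99 | 5 | 41 | 0.49 | 0.82 |
| [7,8) | 10 | 136 | 0.89 | 11 | 116 | 0.65 | 0.78 |
| [8,9) | 15 | 197 | 0.98 | 19 | 163 | 0.83 | 0.91 |
| [9,10) | 16 | 175 | 0.97 | 26 | 205 | 0.79 | 0.87 |
| [10,11) | 14 | 119 | 0.83 | 30 | 217 | 0.80 | 0.81 |
| [11,12) | 10 | 103 | 0.97 | 22 | 141 | 0.77 | 0.86 |
| [12,13) | 8 | 93 | 0.98 | 22 | 156 | 0.72 | 0.82 |
| Total | 125 | 1272 | 0.94 | 217 | 1448 | 0.74 | 0.83 |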

  1. Accuracy was calculated by manually reviewing textual annotations for correctness in random subsets of the concepts recognized from the large literature collections. For baseline B2, we sampled 1% of the identified concepts, with up to 15 randomly sampled text spans per concept. For the new concepts recognized through the presented synonym generation rules, we sampled 10% of concepts, again with up to 15 randomly sampled text spans per concept. Overall accuracy is calculated by combining the baseline and with-rules annotations that fall in the same IC range.
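
The "Overall" column can be approximately reproduced from the other columns. The sketch below assumes that "combining annotations" means pooling the two samples and weighting each accuracy by its number of annotations; the footnote does not spell out the formula, so this is an interpretation, and the function name `pooled_accuracy` is hypothetical. Because the inputs are the rounded accuracies printed in the table, results are only expected to agree to about two decimal places.

```python
def pooled_accuracy(n_base, acc_base, n_rules, acc_rules):
    """Annotation-weighted accuracy over two pooled samples.

    Assumption: overall accuracy = correct annotations / all annotations,
    where correct annotations are estimated as accuracy * count.
    """
    total = n_base + n_rules
    if total == 0:
        return float("nan")  # no annotations sampled in this IC range
    return (acc_base * n_base + acc_rules * n_rules) / total


# (baseline annotations, baseline accuracy, with-rules annotations, with-rules accuracy)
# copied from three rows of the table
rows = {
    "Undefined": (231, 0.98, 363, 0.70),
    "[3,4)": (4, 1.00, 1, 0.00),
    "Total": (1272, 0.94, 1448, 0.74),
}

for label, values in rows.items():
    print(f"{label}: {pooled_accuracy(*values):.2f}")
```

Under this assumption the three rows give 0.81, 0.80, and 0.83, matching the Overall column. The [0,1) row, which has no with-rules annotations, reduces to the baseline accuracy of 0.20.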