Figure 7 | Journal of Biomedical Semantics

Figure 7

From: Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources


Clustering was applied to determine the similarity of the tagging solutions, based on their false positive (FP) and false negative (FN) profiles on the different GSCs. For each corpus, the 100 most frequent FNs and the 100 most frequent FPs from all annotations were used to characterise the performance of a PGN tagger on that corpus (GP7: GeneProt7; Wh.GP7: Wh-Ukpmc (GP7)). Four taggers were selected that are closely related through their components, so that comparing their FP and FN profiles helps to trace types of errors back to the composition of the tagging solutions. The error profiles were clustered by FN profile similarity (left diagram) and by FP profile similarity (right diagram). Chang2 and Banner profiles always cluster together, since they rely on the same technology. The FN profiles cluster according to the corpus used to test the taggers, because FN errors are predefined by the annotation standards of the corpus, i.e. by the pre-assigned annotations that a tagger misses. For the FP profiles, by contrast, the taggers cluster together across the different corpora. Notably, SP and Wh7 also cluster together, although they differ in their resources.
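The caption does not state which similarity measure or linkage method was used. As a minimal sketch, assuming that each profile is the set of the 100 most frequent FP (or FN) strings for one tagger/corpus pair, that profiles are compared with Jaccard distance, and that average-linkage agglomerative clustering is applied (all three are our assumptions, not necessarily the authors' choices), the procedure could look like this. The helper names and the tagger names and error strings in the example are hypothetical toy data, not results from the study; the FN and FP profiles would each be clustered in a separate run.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helpers for illustration; Jaccard distance and average
# linkage are assumptions, not taken from the study.


def top_k_profile(errors, k=100):
    """Set of the k most frequent error strings (FPs or FNs) for one tagger/corpus pair."""
    return {term for term, _ in Counter(errors).most_common(k)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty profiles are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles maps a label (e.g. 'tagger/corpus') to its error set.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean pairwise distance between the members of the two clusters.
            d = sum(
                jaccard_distance(profiles[x], profiles[y]) for x in a for y in b
            ) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Replace the two closest clusters by their union.
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
        merges.append((a, b, d))
    return merges


# Invented toy FP lists; a real run would use each tagger's FPs on one GSC.
fp_errors = {
    "tagger_A": ["p53", "p53", "IL-2", "kinase"],
    "tagger_B": ["p53", "IL-2", "kinase", "receptor"],
    "tagger_C": ["protein", "gene", "cell"],
}
profiles = {name: top_k_profile(errs) for name, errs in fp_errors.items()}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, "+", b, f"distance={d:.2f}")
```

Here tagger_A and tagger_B share most of their FPs and merge first at distance 0.25, while tagger_C joins only at distance 1.00. A different distance (e.g. cosine over error frequency vectors) could be swapped in by replacing `jaccard_distance` alone.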
