Table 4 Statistical significance (McNemar's test) results comparing the ASM and APG classifiers, under the null hypothesis that the two classifiers are equally accurate, with a significance threshold of 0.01

From: Exploiting graph kernels for high performance biomedical relation extraction

| Dataset | Training examples | Testing examples | APG accuracy (%) | ASM accuracy (%) | P-value |
|---|---|---|---|---|---|
| AIMed | 11,246 | 5,834 | 58.6 | 53.1 | *3.8e-7* |
| BioInfer | 7,414 | 9,666 | 77.8 | 76.8 | *0.0011* |
| HPRD50 | 16,647 | 433 | 70.9 | 68.1 | 0.999 |
| IEPA | 16,263 | 817 | 73.6 | 66.5 | *3.9e-6* |
| LLL | 16,750 | 330 | 75.4 | 65.1 | *2.2e-6* |
| CID: sentence-level relations | 9,913 | 5,099 | 72.2 | 71.2 | 0.0969 |
| CID: non-sentence-level relations | 21,656 | 11,562 | 84.9 | 84.1 | *0.0002* |

  1. P-values below the significance threshold (0.01) are shown in italics
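
For readers unfamiliar with McNemar's test, the following is a minimal sketch of how a p-value of this kind could be computed from two classifiers' predictions on the same test set. It uses the exact (binomial) two-sided form of the test. The source does not say which variant the authors used (exact or chi-squared, with or without continuity correction), so this is an assumption. The function names `discordant_counts` and `mcnemar_exact` and the toy labels are hypothetical and do not come from the paper.

```python
from math import comb


def discordant_counts(gold, pred_a, pred_b):
    """Count the test examples on which exactly one classifier is correct.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    Examples where both are right or both are wrong do not enter the test.
    """
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value (assumed variant, see lead-in).

    Under the null hypothesis that both classifiers are equally accurate,
    each discordant example is equally likely to favour either classifier,
    so min(b, c) follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreement at all, so no evidence against the null
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy illustration with made-up labels (not data from the table).
gold = [1, 1, 0, 0, 1, 0]
pred_apg = [1, 1, 0, 0, 1, 1]
pred_asm = [1, 0, 1, 0, 0, 1]

b, c = discordant_counts(gold, pred_apg, pred_asm)
p_value = mcnemar_exact(b, c)
print(f"b={b}, c={c}, p={p_value:.4f}, significant at 0.01: {p_value < 0.01}")
```

Only the discordant examples affect the result, which is why a small test set can leave a visible accuracy gap statistically insignificant.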