Exploiting graph kernels for high performance biomedical relation extraction

Table 4 Statistical significance (McNemar’s) tests comparing the APG and ASM classifiers. The null hypothesis is that the two classifiers are equally accurate; the significance threshold is 0.01.

Dataset	Training examples	Testing examples	APG accuracy	ASM accuracy	P-value
AIMed	11,246	5,834	58.6	53.1	3.8e-7
BioInfer	7,414	9,666	77.8	76.8	0.0011
HPRD50	16,647	433	70.9	68.1	0.999
IEPA	16,263	817	73.6	66.5	3.9e-6
LLL	16,750	330	75.4	65.1	2.2e-6
CID: Sentence-level relations	9,913	5,099	72.2	71.2	0.0969
CID: Non-sentence-level relations	21,656	11,562	84.9	84.1	0.0002
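
For readers unfamiliar with McNemar’s test, the following is a minimal sketch of how a p-value of this kind could be computed from the paired predictions of two classifiers on the same test set. It is not the authors' code. The function names, the use of the chi-square approximation with a continuity correction, and the discordant counts in the usage lines are all assumptions made for illustration; the table does not report the discordant counts, so the example does not reproduce any p-value shown above.

```python
import math


def discordant_counts(gold, pred_a, pred_b):
    """Count test examples on which exactly one classifier is correct.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    Hypothetical helper, not taken from the paper.
    """
    b = c = 0
    for g, a, p in zip(gold, pred_a, pred_b):
        if a == g and p != g:
            b += 1
        elif a != g and p == g:
            c += 1
    return b, c


def mcnemar_test(b, c, continuity=True):
    """McNemar's test using the chi-square approximation (1 degree of freedom).

    The continuity correction is an assumption; the paper does not state
    which variant of the test was used.
    Returns (statistic, p_value).
    """
    n = b + c
    if n == 0:
        # No discordant pairs: there is no evidence against the null hypothesis.
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Illustrative counts only (not from the paper).
stat, p = mcnemar_test(b=25, c=5)
reject_null = p < 0.01  # threshold used in Table 4
```

Only the examples on which the two classifiers disagree enter the statistic, which is why a small accuracy gap on a large test set can be significant while a larger gap on a small test set may not be.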

ISSN: 2041-1480