Simple tricks for improving pattern-based information extraction from the biomedical literature

Table 3. Effect of filtering on combined training data (cross-validation folds from development and training corpus) and on the held-back test data set.

	Development (per split)					Test
	# patterns	Avg. pattern length	Precision (%)	Recall (%)	F1 (%)	Precision (%)	Recall (%)	F1 (%)
Baseline	590	8.93	24.7	49.2	32.9	17.2	43.9	24.8
Split 1	50	5.34	65.6	51.8	57.9	64.7	42.7	51.4
Split 2	50	4.86	78.1	52.3	62.6	63.0	37.8	47.3
Split 3	60	4.68	67.6	52.9	59.3	60.9	42.5	50.1
Split 4	40	5.02	67.7	49.5	57.2	66.6	36.7	47.3
Split 5	50	4.80	63.7	48.7	55.2	64.2	40.7	49.8
Union of patterns	104	5.65				58.2	46.8	51.9
Best 90	90	5.66				59.7	45.1	51.4
Best 80	80	5.75				64.8	37.7	47.6
Best 70	70	6.01				69.4	26.7	38.6
Best 60	60	6.17				60.0	10.0	17.1
Results of the winner of the shared task [21]						78.5	69.8	73.9
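
The F1 columns can be checked against the precision and recall columns. The following is a minimal sketch, assuming that F1 here is the standard balanced harmonic mean of precision and recall (the table itself does not state the formula). The function name `f1_score` and the selection of rows are illustrative; the numbers are copied from the test columns above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1 (assumed): harmonic mean of precision and recall.

    Inputs and output share the same scale (here: percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) from the test columns of Table 3
rows = {
    "Split 1": (64.7, 42.7, 51.4),
    "Union of patterns": (58.2, 46.8, 51.9),
    "Best 60": (60.0, 10.0, 17.1),
    "Shared task winner": (78.5, 69.8, 73.9),
}

for name, (p, r, reported) in rows.items():
    # Recomputed values can differ from the reported ones in the last digit,
    # because the table's precision and recall are already rounded.
    print(f"{name}: computed {f1_score(p, r):.1f}, reported {reported}")
```

Because the inputs are rounded to one decimal place, a recomputed F1 may deviate from the printed value by about 0.1 for some rows; such a difference does not by itself indicate an error in the table.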

ISSN: 2041-1480