Journal of Biomedical Semantics

Table 2 Overall performance of literature features on human proteins

From: Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct

Molecular function
Features	F-max	Precision	Recall	macro-AUC
Baseline (Original)	0.094	0.055	0.327	0.680
Baseline (Enhanced)	0.064	0.036	0.322	0.701
Co-mentions (Original)	0.386	0.302	0.533	0.769
Co-mentions (Enhanced)	0.377	0.336	0.447	0.764
BoW	0.394	0.376	0.414	0.768
Co-mentions + BoW	0.408	0.354	0.491	0.790
Biological process
Features	F-max	Precision	Recall	macro-AUC
Baseline (Original)	0.134	0.091	0.249	0.610
Baseline (Enhanced)	0.155	0.103	0.311	0.611
Co-mentions (Original)	0.424	0.426	0.422	0.750
Co-mentions (Enhanced)	0.429	0.427	0.430	0.752
BoW	0.461	0.467	0.455	0.768
Co-mentions + BoW	0.459	0.426	0.510	0.779
Cellular component
Features	F-max	Precision	Recall	macro-AUC
Baseline (Original)	0.086	0.050	0.305	0.640
Baseline (Enhanced)	0.073	0.041	0.317	0.642
Co-mentions (Original)	0.587	0.590	0.585	0.744
Co-mentions (Enhanced)	0.589	0.583	0.596	0.753
BoW	0.608	0.594	0.624	0.755
Co-mentions + BoW	0.607	0.592	0.622	0.773

Precision, recall, and F-max are micro-averaged across all proteins. The baseline uses only the co-mentions mined from the literature, taken directly as a classifier. Macro-AUC is the AUC averaged over GO categories. “Co-mentions + BoW” combines the original co-mentions and bag-of-words (BoW) features within a single classifier.
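
The two kinds of averaging named in this note can be illustrated with a minimal sketch. It is not the evaluation code behind the table. It assumes that micro-averaging pools every (protein, GO term) prediction before counting, that F-max is the best harmonic mean of precision and recall over all score thresholds, and that AUC is computed per GO term by pairwise comparison and then averaged. The function names, the term identifiers "GO:A" and "GO:B", and the scores are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One prediction: (classifier score, whether the annotation is true).
Pred = Tuple[float, bool]


def micro_prf(preds: List[Pred], threshold: float) -> Tuple[float, float, float]:
    """Precision, recall and F1 from counts pooled over all predictions."""
    tp = sum(1 for s, y in preds if s >= threshold and y)
    fp = sum(1 for s, y in preds if s >= threshold and not y)
    fn = sum(1 for s, y in preds if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f_max(preds: List[Pred]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) at the threshold that maximizes F."""
    thresholds = sorted({s for s, _ in preds})
    return max((micro_prf(preds, t) for t in thresholds), key=lambda r: r[2])


def auc(preds: List[Pred]) -> Optional[float]:
    """Pairwise AUC: chance that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in preds if y]
    neg = [s for s, y in preds if not y]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(per_term: Dict[str, List[Pred]]) -> float:
    """Average the per-term AUCs, skipping terms where AUC is undefined."""
    values = [a for a in (auc(p) for p in per_term.values()) if a is not None]
    return sum(values) / len(values)


# Hypothetical toy predictions, grouped by placeholder GO term.
per_term: Dict[str, List[Pred]] = {
    "GO:A": [(0.9, True), (0.8, False), (0.7, True)],
    "GO:B": [(0.4, True), (0.2, False)],
}
# Micro-averaging ignores the grouping and pools every prediction.
pooled: List[Pred] = [p for preds in per_term.values() for p in preds]

precision, recall, best_f = f_max(pooled)
print(f"F-max={best_f:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"macro-AUC={macro_auc(per_term):.3f}")
```

Under these assumptions the pooled counts let frequent GO terms dominate precision, recall, and F-max, whereas macro-AUC weights every GO term equally, so the two columns can rank feature sets differently.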


ISSN: 2041-1480
