Table 2 Overall performance of literature features on human proteins

From: Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct

Molecular function

| Features | F-max | Precision | Recall | macro-AUC |
| --- | --- | --- | --- | --- |
| Baseline (Original) | 0.094 | 0.055 | 0.327 | 0.680 |
| Baseline (Enhanced) | 0.064 | 0.036 | 0.322 | 0.701 |
| Co-mentions (Original) | 0.386 | 0.302 | 0.533 | 0.769 |
| Co-mentions (Enhanced) | 0.377 | 0.336 | 0.447 | 0.764 |
| BoW | 0.394 | 0.376 | 0.414 | 0.768 |
| Co-mentions + BoW | 0.408 | 0.354 | 0.491 | 0.790 |

Biological process

| Features | F-max | Precision | Recall | macro-AUC |
| --- | --- | --- | --- | --- |
| Baseline (Original) | 0.134 | 0.091 | 0.249 | 0.610 |
| Baseline (Enhanced) | 0.155 | 0.103 | 0.311 | 0.611 |
| Co-mentions (Original) | 0.424 | 0.426 | 0.422 | 0.750 |
| Co-mentions (Enhanced) | 0.429 | 0.427 | 0.430 | 0.752 |
| BoW | 0.461 | 0.467 | 0.455 | 0.768 |
| Co-mentions + BoW | 0.459 | 0.426 | 0.510 | 0.779 |

Cellular component

| Features | F-max | Precision | Recall | macro-AUC |
| --- | --- | --- | --- | --- |
| Baseline (Original) | 0.086 | 0.050 | 0.305 | 0.640 |
| Baseline (Enhanced) | 0.073 | 0.041 | 0.317 | 0.642 |
| Co-mentions (Original) | 0.587 | 0.590 | 0.585 | 0.744 |
| Co-mentions (Enhanced) | 0.589 | 0.583 | 0.596 | 0.753 |
| BoW | 0.608 | 0.594 | 0.624 | 0.755 |
| Co-mentions + BoW | 0.607 | 0.592 | 0.622 | 0.773 |

Precision, recall, and F-max are micro-averaged across all proteins. Baseline uses only the co-mentions mined from the literature as a classifier. Macro-AUC is the AUC averaged over GO categories. "Co-mentions + BoW" uses the original co-mentions and bag-of-words (BoW) features within a single classifier.
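The metrics named in the table footnote can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that "micro-averaged" means pooling true positives, false positives, and false negatives over all protein/GO-term pairs, that F-max is the best F-score over a grid of score thresholds, and that a missing score counts as 0. The function names, the toy proteins (`P1` to `P3`), and the placeholder terms (`GO:A`, `GO:B`) are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Scores = Dict[str, Dict[str, float]]  # protein -> {GO term -> predicted score}
Truth = Dict[str, Set[str]]           # protein -> set of annotated GO terms


def micro_prf(scores: Scores, truth: Truth, threshold: float) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-score at one threshold.

    Assumption: counts are pooled over every protein/term pair.
    """
    tp = fp = fn = 0
    for protein, true_terms in truth.items():
        predicted = {t for t, s in scores.get(protein, {}).items() if s >= threshold}
        tp += len(predicted & true_terms)
        fp += len(predicted - true_terms)
        fn += len(true_terms - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def f_max(scores: Scores, truth: Truth, thresholds: List[float]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) at the threshold that maximizes F."""
    return max((micro_prf(scores, truth, t) for t in thresholds), key=lambda prf: prf[2])


def auc(pos: List[float], neg: List[float]) -> Optional[float]:
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    if not pos or not neg:
        return None  # undefined when one class is empty
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(scores: Scores, truth: Truth, terms: List[str]) -> float:
    """Average the per-term AUC over all terms where it is defined."""
    per_term = []
    for term in terms:
        # Assumption: a protein with no score for a term is scored 0.0.
        pos = [scores.get(p, {}).get(term, 0.0) for p in truth if term in truth[p]]
        neg = [scores.get(p, {}).get(term, 0.0) for p in truth if term not in truth[p]]
        value = auc(pos, neg)
        if value is not None:
            per_term.append(value)
    return sum(per_term) / len(per_term)


# Toy data (hypothetical, not taken from the paper).
scores = {
    "P1": {"GO:A": 0.9, "GO:B": 0.2},
    "P2": {"GO:A": 0.4, "GO:B": 0.8},
    "P3": {"GO:A": 0.3, "GO:B": 0.6},
}
truth = {"P1": {"GO:A"}, "P2": {"GO:B"}, "P3": {"GO:A"}}

precision, recall, f = f_max(scores, truth, [0.1, 0.3, 0.5, 0.7, 0.9])
print(round(precision, 3), round(recall, 3), round(f, 3))
print(round(macro_auc(scores, truth, ["GO:A", "GO:B"]), 3))
```

Under these assumptions, the reported F-max is the harmonic mean of the precision and recall in the same row. For example, the molecular function row for Co-mentions (Original) gives 2 × 0.302 × 0.533 / (0.302 + 0.533) ≈ 0.386, matching the table.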