Table 1 Statistics of co-mentions extracted from both Medline and PMCOA using the different dictionaries for identifying GO terms

From: Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct

Human

Dictionary  Span          Unique proteins  Unique GO terms  Unique co-mentions  Total co-mentions
Original    sentence               12,826           14,102           1,473,579         25,765,168
Original    non-sentence           13,459           17,231           3,070,466        147,524,964
Original    combined               13,492           17,424           3,222,619        173,289,862
Enhanced    sentence               12,998           15,415           1,839,360         33,199,284
Enhanced    non-sentence           13,513           18,713           3,725,450        196,761,554
Enhanced    combined               13,536           18,920           3,897,951        229,960,838

Yeast

Dictionary  Span          Unique proteins  Unique GO terms  Unique co-mentions  Total co-mentions
Original    sentence                5,016            9,471             317,715          2,945,833
Original    non-sentence            5,148           12,582             715,363         18,142,448
Original    combined                5,160           12,819             748,427         21,088,281
Enhanced    sentence                5,063           12,877             414,322          3,853,994
Enhanced    non-sentence            5,160           13,769             901,123         23,986,761
Enhanced    combined                5,167           14,018             939,743         27,840,755