Table 1 Performance of different regexes on different datasets

From: ResidueFinder: extracting individual residue mentions from protein literature

| # | Program and version | Paper set | TP | FP | FN | P | R | F1 | F2 | Time |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MF (Full text) (20) (Mutations) | I | 66 | 3 | 102 | 0.957 | 0.393 | 0.557 | 0.445 | 13m22.452s |
| 2 | MF cut (Full text) (20) | I | 66 | 3 | 102 | 0.957 | 0.393 | 0.557 | 0.445 | 0m1.600s |
| 3 | MF (Only Abstracts) (20) | I | 9 | 0 | 5 | 1.000 | 0.643 | 0.783 | 0.692 | 0m33.612s |
| 4 | MF cut (Only Abstracts) (20) | I | 9 | 0 | 5 | 1.000 | 0.643 | 0.783 | 0.692 | 0m0.120s |
| 5 | RF Regex 1 (Full text) (20) | I | 144 | 64 | 264 | 0.692 | 0.353 | 0.468 | 0.391 | 11m21.468s |
| 6 | RF 1 (Only Abstracts) (20) | I | 15 | 13 | 8 | 0.536 | 0.652 | 0.588 | 0.625 | 0m32.240s |
| 7 | RF Regex 3 (Full text) (20) | I | 385 | 602 | 23 | 0.390 | 0.944 | 0.552 | 0.735 | 56m30.868s |
| 8 | RF Regex 3 cut (Full text) (20) | I | 370 | 569 | 38 | 0.394 | 0.907 | 0.549 | 0.720 | 0m8.896s |
| 9 | RF 3 (Only Abstracts) (20) | I | 22 | 27 | 1 | 0.449 | 0.957 | 0.611 | 0.780 | 2m5.648s |
| 10 | RF 3 cut (Only Abstracts) (20) | I | 21 | 27 | 2 | 0.438 | 0.913 | 0.592 | 0.750 | 0m0.440s |
| 11 | MF devo set (Only Abstracts) | II | 201 | 4 | 26 | 0.980 | 0.885 | 0.931 | 0.903 | 4m51.408s |
| 12 | MF devo set cut (Only Abstracts) | II | 175 | 0 | 52 | 1.000 | 0.771 | 0.871 | 0.808 | 0m0.748s |
| 13 | MF test set (Only Abstracts) | III | 305 | 13 | 64 | 0.959 | 0.827 | 0.888 | 0.850 | 8m27.164s |
| 14 | MF test set cut (Only Abstracts) | III | 257 | 0 | 112 | 1.000 | 0.696 | 0.821 | 0.741 | 0m1.208s |
| 15 | RF Regex 1 (Full text) (100) | IV | 661 | 378 | 520 | 0.636 | 0.560 | 0.595 | 0.573 | 56m7.653s |
| 16 | RF Regex 1 cut (Full text) (100) | IV | 566 | 373 | 615 | 0.603 | 0.479 | 0.534 | 0.500 | 0m16.747s |
| 17 | RF Regex 1 (no bib) (100) | IV | 661 | 338 | 520 | 0.662 | 0.560 | 0.606 | 0.577 | 43m21.403s |
| 18 | RF Regex 1 cut (no bib) (100) | IV | 561 | 341 | 620 | 0.622 | 0.475 | 0.539 | 0.499 | 0m13.200s |
| 19 | RF 1 (Only Abstracts) (100) | IV | 59 | 12 | 45 | 0.831 | 0.567 | 0.674 | 0.606 | 1m15.552s |
| 21 | RF Regex 3 (Full text) (100) | IV | 1030 | 2969 | 151 | 0.258 | 0.872 | 0.398 | 0.590 | 259m12.181s |
| 22 | RF Regex 3 cut (Full text) (100) | IV | 878 | 2938 | 303 | 0.230 | 0.743 | 0.351 | 0.514 | 0m48.696s |
| 23 | RF Regex 3 (no bib) (100) | IV | 1027 | 2407 | 154 | 0.299 | 0.870 | 0.445 | 0.629 | 190m45.317s |
| 24 | RF Regex 3 cut (no bib) (100) | IV | 876 | 2385 | 305 | 0.269 | 0.742 | 0.394 | 0.549 | 0m37.872s |
| 25 | RF 3 (Only Abstracts) (100) | IV | 81 | 143 | 23 | 0.362 | 0.779 | 0.494 | 0.633 | 5m31.332s |
| 26 | RF 3 cut (Only Abstracts) (100) | IV | 71 | 142 | 33 | 0.333 | 0.683 | 0.448 | 0.564 | 0m1.379s |
| 27 | RF 1 Single Count (20) | V | 152 | 53 | 141 | 0.741 | 0.519 | 0.610 | 0.552 | |
| 28 | RF 1 Full Count (20) | V | 590 | 240 | 766 | 0.711 | 0.435 | 0.540 | 0.472 | |
| 29 | RF 3 Single Count (20) | V | 212 | 607 | 44 | 0.259 | 0.828 | 0.394 | 0.575 | |
| 30 | RF 3 Full Count (20) | V | 1199 | 3657 | 157 | 0.247 | 0.884 | 0.386 | 0.583 | |
| 31 | Results from Verspoor et al. | VI | 2463 | 412 | 245 | 0.857 | 0.910 | 0.882 | 0.898 | |
| 32 | R3 Single Count Verspoor data | VI | 1345 | 1230 | 31 | 0.522 | 0.977 | 0.681 | 0.832 | |
| 33 | R3 Full Count Verspoor data | VI | 3123 | 4558 | 94 | 0.407 | 0.971 | 0.573 | 0.760 | |

TP, FP, FN: true positives, false positives, false negatives. P: precision; R: recall; F1 and F2: F-measures with β = 1 and β = 2. No run times are reported for rows 27 to 33.
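As a quick consistency check, the P, R, F1 and F2 columns can be recomputed from the TP, FP and FN counts using the standard definitions P = TP/(TP+FP), R = TP/(TP+FN) and F_β = (1+β²)·P·R/(β²·P + R), where F2 (β = 2) weights recall more heavily than precision. The sketch below is not the authors' evaluation code; `Counts`, `scores` and `f_beta` are hypothetical names chosen for illustration. Rounded to three decimals, it reproduces the reported values, for example for rows 1, 7, 14 and 31.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical container for one table row's confusion counts."""
    tp: int
    fp: int
    fn: int


def f_beta(p: float, r: float, beta: float) -> float:
    """F-beta score from precision and recall; 0.0 when both are zero."""
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def scores(c: Counts) -> dict[str, float]:
    """Precision, recall, F1 and F2, rounded to 3 decimals as in Table 1."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return {
        "P": round(p, 3),
        "R": round(r, 3),
        "F1": round(f_beta(p, r, 1.0), 3),
        "F2": round(f_beta(p, r, 2.0), 3),
    }


if __name__ == "__main__":
    # Row 7: RF Regex 3 (Full text) (20)
    print(scores(Counts(tp=385, fp=602, fn=23)))
```

Comparing the recomputed dictionary with a row's P, R, F1 and F2 cells is a cheap way to catch transcription errors when the table is copied or extended.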
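The Time column uses an `XmY.YYYs` style that resembles what the bash `time` keyword prints. Assuming that format, a minimal sketch like the following converts the entries to seconds so that a full run and its "cut" variant can be compared directly. `to_seconds` and `speedup` are hypothetical helpers, and the three pairs are copied from rows 1 and 2, 7 and 8, and 21 and 22.

```python
import re

# Minutes, then seconds with an optional fractional part, e.g. "13m22.452s".
_TIME = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(t: str) -> float:
    """Convert a time string such as '0m1.600s' to seconds."""
    m = _TIME.match(t.strip())
    if m is None:
        raise ValueError(f"unrecognised time: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def speedup(full: str, cut: str) -> float:
    """How many times faster the cut run was than the full run."""
    return to_seconds(full) / to_seconds(cut)


if __name__ == "__main__":
    pairs = {
        "MF (Full text) (20)": ("13m22.452s", "0m1.600s"),
        "RF Regex 3 (Full text) (20)": ("56m30.868s", "0m8.896s"),
        "RF Regex 3 (Full text) (100)": ("259m12.181s", "0m48.696s"),
    }
    for name, (full, cut) in pairs.items():
        print(f"{name}: {speedup(full, cut):.0f}x faster with cut")
```

On these pairs the cut variants run roughly 502, 381 and 319 times faster. The accuracy columns show what each cut costs: nothing for rows 1 and 2, while recall drops from 0.944 to 0.907 between rows 7 and 8.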