| Database | Evaluation | #TP New | #TP Old | #FP New | #FP Old | #FN New | #FN Old | Precision New (%) | Precision Old (%) | Recall New (%) | Recall Old (%) | F-score New (%) | F-score Old (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENA | Automatic | 276 | 267 | 10 | 7 | 170 | 181 | 96.50 | 97.45 | 61.88 | 59.60 | 75.41 | 73.96 |
| ENA | Manual | 286 | 274 | 0 | 0 | 170 | 181 | 100 | 100 | 62.72 | 60.22 | 77.10 | 75.17 |
| UniProt | Automatic | 574 | 569 | 28 | 8 | 39 | 39 | 95.35 | 98.61 | 93.64 | 93.59 | 94.49 | 96.03 |
| UniProt | Manual | 601 | 577 | 1 | 0 | 39 | 39 | 99.83 | 100 | 93.91 | 93.67 | 96.78 | 96.73 |
| PDBe | Automatic | 568 | 529 | 32 | 30 | 12 | 50 | 94.67 | 94.63 | 97.93 | 91.36 | 96.27 | 92.97 |
| PDBe | Manual | 620 | 559 | 0 | 0 | 12 | 50 | 100 | 100 | 98.10 | 91.79 | 99.04 | 95.72 |
- TP: True Positive, FP: False Positive, FN: False Negative; Old: old Whatizit-ANA settings, New: new Whatizit-ANA settings.
- Manual and automatic evaluation: In the automatic evaluation, we estimated the performance of the tool by assuming that the publisher-supplied accession numbers in the articles constitute a gold standard for annotation. However, when we manually analysed the false-positive annotations produced by our pipeline, we found that the accession numbers supplied in the articles (the annotations we had treated as the gold standard in the automatic evaluation) are not always complete or correct. Annotations made by our tool that were not already present in the article were therefore counted as false positives in the automatic evaluation, but some of these could be reassigned as true positives on manual inspection.
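As a sanity check on the table, the derived metrics can be recomputed from the raw counts with the standard definitions. The sketch below (not part of the evaluation pipeline; the function name is illustrative) computes precision, recall, and F-score from TP, FP, and FN, and reproduces the ENA automatic-evaluation row for the new settings:

```python
def metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) as percentages rounded to 2 d.p."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * v, 2) for v in (precision, recall, f_score))

# ENA, automatic evaluation, new settings: TP=276, FP=10, FN=170
print(metrics(276, 10, 170))  # → (96.5, 61.88, 75.41)
```

Note that the manual-evaluation rows only reassign automatic false positives to true positives; the false-negative counts are unchanged, which is why manual precision reaches 100% while recall rises only modestly.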