Table 1 Extraction patterns and contextual cues for databases

From: Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles

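Database: ENA
Patterns: [A-Z][0-9]{5}; [A-Z]{2}[0-9]{6}; [A-Z]{3}[0-9]{5}; [A-Z]{4}[0-9]{8,10}; [A-Z]{5}[0-9]{7}
Contextual cues: genbank, gen, ddbj, embl

Database: UniProt
Patterns: [A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9]; [O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9]
Contextual cues: swissprot, sprot, uniprot

Database: PDBe
Patterns: [0-9][A-Z, 0-9]{3}
Contextual cues: pdb

Database: InterPro
Patterns: IPR[0-9]{6}
Contextual cues: interpro

Database: Pfam
Patterns: PF(AM)?[0-9]{5}
Contextual cues: hmm, family, pfam

Database: ArrayExpress
Patterns: E-[A-Z]{4}-[0-9]+
Contextual cues: arrayexpress

Database: OMIM
Patterns: [0-9]{6}
Contextual cues: omim

Database: Ensembl
Patterns: ENS[A-Z]*G[0-9]{11}+
Contextual cues: ensembl

Database: RefSeq
Patterns: (AC|AP|NC|NG|NM|NP|NR|NT|NW|NZ|XM|XP|XR|YP|ZP|NS)_([A-Z]{4})*[0-9]{6,9}(?:[.][0-9]+)?
Contextual cues: refseq

Database: RefSNP
Patterns: RS[0-9]{5,9}
Contextual cues: snp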
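
Several of the patterns above are highly ambiguous on their own (a six-digit number for OMIM, a digit followed by three alphanumerics for PDBe), which is presumably why each is paired with contextual cues. The Python sketch below shows one way a pattern and its cues might be combined, for a subset of the databases. It is an illustration under stated assumptions, not the pipeline used in the article: the function name `find_accessions`, the idea of passing the context as a separate string (for example a column header or file name), the case-insensitive cue check, and the alphanumeric boundary guards are all hypothetical choices. The comma and space inside the PDBe character class are also assumed to be typographic separators rather than characters to match.

```python
import re

# Patterns and cues transcribed from Table 1 for a subset of databases.
# Assumptions: ranges use ASCII hyphens, and the comma/space inside the
# PDBe character class is a separator, not a literal character to match.
RULES = {
    "PDBe": (r"[0-9][A-Z0-9]{3}", ["pdb"]),
    "InterPro": (r"IPR[0-9]{6}", ["interpro"]),
    "Pfam": (r"PF(?:AM)?[0-9]{5}", ["hmm", "family", "pfam"]),
    "ArrayExpress": (r"E-[A-Z]{4}-[0-9]+", ["arrayexpress"]),
    "OMIM": (r"[0-9]{6}", ["omim"]),
    "RefSNP": (r"RS[0-9]{5,9}", ["snp"]),
}


def find_accessions(text, context):
    """Return sorted (database, accession) pairs found in `text`.

    A database's pattern is applied only when at least one of its cues
    occurs in `context` (hypothetically a column header or file name).
    """
    context_lower = context.lower()
    hits = set()
    for database, (pattern, cues) in RULES.items():
        # Skip the pattern entirely when no contextual cue is present.
        if not any(cue in context_lower for cue in cues):
            continue
        # Assumed guard: do not match inside a longer alphanumeric token.
        regex = re.compile(
            r"(?<![A-Za-z0-9])(?:" + pattern + r")(?![A-Za-z0-9])"
        )
        for match in regex.finditer(text):
            hits.add((database, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_accessions("IPR000719 and PF00069", "InterPro / Pfam family"))
    print(find_accessions("154700", "no cue here"))
    print(find_accessions("154700", "OMIM id"))
```

With this gating, the string `154700` is reported as an OMIM identifier only when the context mentions `omim`; without a cue it is ignored, which is the behaviour the cue column seems intended to support.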