
Table 7 Pattern definitions and regular expressions to detect amino acid residues and mutations in the text

From: Literature mining of protein-residue associations with graph rules learned through distant supervision

Pattern name

Pattern Meaning

Expressions

RES-S

Single-letter amino acid code

[ARNDCQEGHILKMFPSTWYVOUBZX]

RES-T

Three-letter amino acid code

([aA]la|ALA|[aA]rg|ARG|[aA]sn|ASN|[aA]sp|ASP|[cC]ys|CYS|[gG]ln|GLN|[gG]lu|GLU|[gG]ly|GLY|[hH]is|HIS|[iI]le|ILE|[lL]eu|LEU|[lL]ys|LYS|[mM]et|MET|[pP]he|PHE|[pP]ro|PRO|[sS]er|SER|[tT]hr|THR|[tT]rp|TRP|[tT]yr|TYR|[vV]al|VAL|[pP]yl|PYL|[sS]ec|SEC)

RES-F

Full amino acid names

([aA]lanine|[aA]rginine|[aA]sparagine|[aA]spart(ate|ic acid)|[cC]ysteine|[gG]lutamine|[gG]lutam(ate|ic acid)|[gG]lycine|[hH]istidine|[iI]soleucine|[lL]eucine|[lL]ysine|[mM]ethionine|[pP]henylalanine|[pP]roline|[sS]erine|[tT]hreonine|[tT]ryptophan|[tT]yrosine|[vV]aline|[pP]yrrolysine)

POS

Residue position

0[1-9]{1,5}

WTRES

Wild-type residue

(RES-S|RES-T|RES-F)

MUTRES

Mutant residue

(RES-S|RES-T|RES-F)

UNIARR

Unicode character for arrows

\\u2192, \\u21D2 (→, ⇒)

UNIDASH

Unicode character for dash

\\u2013 (–)

GRAMMAR

Grammatical expressions

residues? at positions?|for|position|residues? (in|on|at) |substitutions? at|always exists as|at positions?|mutated to|substituted by

POSCOORD

Co-ordination of residue positions

POS(,\\s?POS)* (and|or) POS

[e.g., 75, 76 and 78; 82 and 95]

AMINOCOORD

Co-ordination of amino acid residues

(RES-T|RES-F)(,\\s?(RES-T|RES-F))* (and|or) (RES-T|RES-F)

[e.g., Alanine and Valine]

WORD

Any word

 

PREP

Prepositions

in, at, on, within, of

1. Pattern names are shown in capitals (e.g., RES-S, POS) and can themselves be used within other regular expressions.
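Because pattern names can be embedded in other expressions, the table effectively describes a small grammar that is expanded by substitution before compiling. The following is a minimal Python sketch of that composition, not the authors' implementation, under these assumptions: the helper `expand`, the table name `PATTERNS` and the composite `MUTATION` (wild-type residue, position, optional arrow or dash, mutant residue) are hypothetical; POS is replaced by the assumed stand-in `[1-9][0-9]{0,4}`, since the expression printed in the table requires a leading 0 and would not match a position such as 75; AMINOCOORD groups its repeated element as `(,\s?(RES-T|RES-F))*`; and the `\b` word boundaries around composite patterns are an addition so that, for example, "Val" is not matched inside "Valine".

```python
import re

# Building blocks from Table 7 as Python raw strings (the table's "\\u2192"
# and "\\s" are Java-style escapes). RES-T and RES-F are generated from word
# lists in the table's "[aA]la" style instead of being written out by hand.
THREE_LETTER = ["Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His",
                "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp",
                "Tyr", "Val", "Pyl", "Sec"]
FULL_NAMES = ["alanine", "arginine", "asparagine", "aspartate", "aspartic acid",
              "cysteine", "glutamine", "glutamate", "glutamic acid", "glycine",
              "histidine", "isoleucine", "leucine", "lysine", "methionine",
              "phenylalanine", "proline", "serine", "threonine", "tryptophan",
              "tyrosine", "valine", "pyrrolysine"]


def either_case(word: str) -> str:
    """Turn 'ala' or 'Ala' into '[aA]la'."""
    return f"[{word[0].lower()}{word[0].upper()}]{word[1:]}"


PATTERNS = {
    "RES-S": r"[ARNDCQEGHILKMFPSTWYVOUBZX]",
    "RES-T": "|".join(f"{either_case(c)}|{c.upper()}" for c in THREE_LETTER),
    "RES-F": "|".join(either_case(n) for n in FULL_NAMES),
    "POS": r"[1-9][0-9]{0,4}",  # ASSUMPTION: stand-in position pattern
    "WTRES": r"(RES-S|RES-T|RES-F)",
    "MUTRES": r"(RES-S|RES-T|RES-F)",
    "UNIARR": r"[\u2192\u21D2]",  # rightwards arrow, rightwards double arrow
    "UNIDASH": r"\u2013",  # en dash
    # \b anchors are an addition so "Val" is not matched inside "Valine".
    "POSCOORD": r"\bPOS(,\s?POS)* (and|or) POS\b",
    "AMINOCOORD": r"\b(RES-T|RES-F)(,\s?(RES-T|RES-F))* (and|or) (RES-T|RES-F)\b",
    # HYPOTHETICAL composite, not in Table 7: "A75V", "Ala75Val", "Ala75\u2192Val".
    "MUTATION": r"\bWTRES\s?POS\s?(UNIARR|UNIDASH)?\s?MUTRES\b",
}

# A pattern name is an uppercase token not glued to other uppercase letters,
# so "POS" is not found inside "POSCOORD"; longer names are tried first.
NAME_RE = re.compile(
    r"(?<![A-Z])("
    + "|".join(re.escape(n) for n in sorted(PATTERNS, key=len, reverse=True))
    + r")(?![A-Z])"
)


def expand(name: str, depth: int = 0) -> str:
    """Recursively substitute pattern names, each wrapped in a non-capturing group."""
    if depth > 10:
        raise ValueError(f"pattern nesting too deep at {name!r}")
    return NAME_RE.sub(
        # A function replacement stops re.sub from reinterpreting backslashes.
        lambda m: "(?:" + expand(m.group(1), depth + 1) + ")",
        PATTERNS[name],
    )


COMPILED = {name: re.compile(expand(name)) for name in PATTERNS}

if __name__ == "__main__":
    text = "The A75V and Ala75Val variants; residues at positions 75, 76 and 78."
    print([m.group(0) for m in COMPILED["MUTATION"].finditer(text)])  # ['A75V', 'Ala75Val']
    print(COMPILED["POSCOORD"].search(text).group(0))  # 75, 76 and 78
    print(COMPILED["AMINOCOORD"].search("Alanine and Valine").group(0))  # Alanine and Valine
```

Wrapping every substituted name in `(?:...)` keeps alternation precedence intact when a pattern such as WTRES is spliced into a longer expression, which is the same grouping issue that the AMINOCOORD repetition needs.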