Table 11 List of orthographic features used in training the CRFs model

From: A framework for ontology-based question answering with application to parasite immunology

Orthographic feature    Regular expression
HASDASH                 .*-.*
INITDASH                -.*
ENDDASH                 .*-
INITCAPS                [A-Z].*
INITCAPSALPHA           [A-Z][a-z].*
REALNUMBERS             [-0-9]+[.,]+[0-9.,]+
NATURALNUMBER           [0-9]+
ALLCAPS                 [A-Z]+
CAPSMIX                 [A-Za-z]+
DIGIT                   .*[0-9].*
SINGLEDIGIT             [0-9]
DOUBLEDIGIT             [0-9][0-9]
GENEPATT                .*[tbglmjfrnix0-9]+[.][0-9]+.*
DNASEQUENCE             [ACTG]+
HASROMAN                .*\\b[IVXDLCM]+\\b.
ROMAN                   [IVXDLCM]+
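
To illustrate how the patterns in this table could be turned into per-token features, here is a minimal Python sketch. It rests on assumptions that the table itself does not state: that each expression must match the whole token (suggested by the explicit leading and trailing `.*` in patterns such as HASDASH), and that the doubled backslashes in HASROMAN are string-literal escaping for a single `\b` word boundary. The names `ORTHOGRAPHIC_PATTERNS` and `orthographic_features` are hypothetical and are not taken from the paper.

```python
import re
from typing import List

# Patterns transcribed from the table, kept in table order.
# Assumption: HASROMAN's "\\b" is an escaped "\b"; its trailing "." is kept as printed.
ORTHOGRAPHIC_PATTERNS = {
    "HASDASH": r".*-.*",
    "INITDASH": r"-.*",
    "ENDDASH": r".*-",
    "INITCAPS": r"[A-Z].*",
    "INITCAPSALPHA": r"[A-Z][a-z].*",
    "REALNUMBERS": r"[-0-9]+[.,]+[0-9.,]+",
    "NATURALNUMBER": r"[0-9]+",
    "ALLCAPS": r"[A-Z]+",
    "CAPSMIX": r"[A-Za-z]+",
    "DIGIT": r".*[0-9].*",
    "SINGLEDIGIT": r"[0-9]",
    "DOUBLEDIGIT": r"[0-9][0-9]",
    "GENEPATT": r".*[tbglmjfrnix0-9]+[.][0-9]+.*",
    "DNASEQUENCE": r"[ACTG]+",
    "HASROMAN": r".*\b[IVXDLCM]+\b.",
    "ROMAN": r"[IVXDLCM]+",
}

# Compile once so repeated calls do not re-parse the expressions.
_COMPILED = {name: re.compile(rx) for name, rx in ORTHOGRAPHIC_PATTERNS.items()}


def orthographic_features(token: str) -> List[str]:
    """Return the names of all patterns that match the entire token."""
    # Assumption: whole-token matching (re.fullmatch), not substring search.
    return [name for name, rx in _COMPILED.items() if rx.fullmatch(token)]


if __name__ == "__main__":
    for tok in ["IL-2", "ACTG", "3.14", "XIV"]:
        print(tok, orthographic_features(tok))
```

A single token can fire several features at once; under these assumptions, `IL-2` matches HASDASH, INITCAPS, and DIGIT. In a CRF setting each matching name would typically become one binary feature for that token.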