
Table 1 Feature sets: features used by the NN and CRF (see the “Features” section for details)

From: Entity recognition in the biomedical domain using a hybrid approach

| | Neural network | Conditional random fields |
| --- | --- | --- |
| Implementation | | |
| Software | R [67], nnet library | CRFSuite [68] |
| Model parameters | 1 hidden layer of size 2 × (number of input features), softmax output layer | Training algorithm: averaged perceptron, default epsilon, 2-word window |
| Input | n-grams selected by OGER | Single tokens |
| Features | | |
| Candidate character count | Count | |
| Candidate is all uppercase | Label yes/no | Label yes/no |
| Candidate is all lowercase | Label yes/no | Label yes/no |
| Candidate contains Greek (e.g. “alpha”, α) | Label yes/no | Label yes/no |
| Candidate contains dashes (‘-’) | Count | Label yes/no |
| Candidate contains numbers | Count | Label yes/no |
| Candidate ends with a number | Label yes/no | Label yes/no |
| Candidate contains capital letter not in first position | Label yes/no | Label yes/no |
| Candidate contains lowercase characters | Count | Label yes/no |
| Candidate contains uppercase characters | Count | Label yes/no |
| Candidate contains spaces | Count | Label yes/no |
| Candidate contains symbols | Count | Label yes/no |
| 2-3 character affixes appearing in an ontology in [36] | Normalized frequency | Label yes/no |
| Candidate is symbol | | Label yes/no |
| Candidate’s part-of-speech | | Yes, using [69] |
| Candidate’s stem | | Yes, using [70] |
| Candidate pre-selected by OGER | | Yes (see the “Features” section) |
| Total features | 36 | About 2.8 million |
| Tagging speed (on an Intel 4720HQ CPU) | 1286 tokens/sec | 632 tokens/sec |
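
To make the orthographic features in the table more concrete, the following Python sketch computes a subset of them for a single candidate string. It is an illustration only, not the authors' implementation (which uses R and CRFSuite). The function names, the short list of spelled-out Greek letter names, the choice to count digit characters for "contains numbers", and the definition of a "symbol" as any character that is not alphanumeric, whitespace, or a dash are all assumptions made for this example.

```python
import unicodedata

# Illustrative subset of spelled-out Greek letter names (an assumption).
GREEK_NAMES = ("alpha", "beta", "gamma", "delta", "kappa")

# Features reported as counts for the NN and as yes/no labels for the CRF.
COUNT_FEATURES = ("dashes", "digits", "lowercase", "uppercase", "spaces", "symbols")


def contains_greek(text: str) -> bool:
    """True if text has a Greek character or a spelled-out Greek letter name."""
    lowered = text.lower()
    if any(name in lowered for name in GREEK_NAMES):
        return True
    return any("GREEK" in unicodedata.name(ch, "") for ch in text)


def is_symbol(ch: str) -> bool:
    """Assumed definition: not alphanumeric, not whitespace, not a dash."""
    return not ch.isalnum() and not ch.isspace() and ch != "-"


def count_style_features(candidate: str) -> dict:
    """Orthographic features with counts, as in the neural network column."""
    return {
        "char_count": len(candidate),
        "all_upper": candidate.isupper(),
        "all_lower": candidate.islower(),
        "has_greek": contains_greek(candidate),
        "dashes": candidate.count("-"),
        "digits": sum(ch.isdigit() for ch in candidate),
        "ends_with_digit": candidate[-1:].isdigit(),
        "inner_capital": any(ch.isupper() for ch in candidate[1:]),
        "lowercase": sum(ch.islower() for ch in candidate),
        "uppercase": sum(ch.isupper() for ch in candidate),
        "spaces": candidate.count(" "),
        "symbols": sum(is_symbol(ch) for ch in candidate),
    }


def as_labels(features: dict) -> dict:
    """Turn the count features into yes/no labels, as in the CRF column."""
    labelled = dict(features)
    for key in COUNT_FEATURES:
        labelled[key] = features[key] > 0
    return labelled


if __name__ == "__main__":
    feats = count_style_features("IL-2 receptor α")
    print(feats)
    print(as_labels(feats))
```

Note that Python's `str.isupper()` and `str.islower()` ignore uncased characters such as digits and dashes, so a string like "IL-2" counts as all uppercase here; a stricter definition would need an explicit per-character check.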