From: Entity recognition in the biomedical domain using a hybrid approach
Property | Neural network | Conditional random fields |
---|---|---|
Implementation | ||
Software | R [67], nnet library | CRFSuite [68] |
Model parameters | 1 hidden layer of size 2 × (number of input features), softmax output layer | Training algorithm: averaged perceptron, default epsilon, 2-word window |
Input | n-grams selected by OGER | Single tokens |
Features | ||
Candidate character count | Count | — |
Candidate is all uppercase | Label yes/no | Label yes/no |
Candidate is all lowercase | Label yes/no | Label yes/no |
Candidate contains Greek (e.g. “alpha”, α) | Label yes/no | Label yes/no |
Candidate contains dashes (‘-’) | Count | Label yes/no |
Candidate contains numbers | Count | Label yes/no |
Candidate ends with a number | Label yes/no | Label yes/no |
Candidate contains capital letter not in first position | Label yes/no | Label yes/no |
Candidate contains lowercase characters | Count | Label yes/no |
Candidate contains uppercase characters | Count | Label yes/no |
Candidate contains spaces | Count | Label yes/no |
Candidate contains symbols | Count | Label yes/no |
2-3 character affixes appearing in an ontology in [36] | Normalized frequency | Label yes/no |
Candidate is symbol | — | Label yes/no |
Candidate’s part-of-speech | — | Yes, using [69] |
Candidate’s stem | — | Yes, using [70] |
Candidate pre-selected by OGER | — | Yes (see the “Features” section) |
Total features | 36 | About 2.8 million |
Tagging speed (on an Intel 4720HQ CPU) | 1286 tokens/sec | 632 tokens/sec |
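
As an illustration of the surface features listed in the neural network column, the following is a minimal Python sketch, not the authors' R implementation. The function names, the short list of spelled-out Greek letter names, and the definition of a "symbol" (any character that is not alphanumeric, whitespace, or a dash) are assumptions made for this example. The affix, part-of-speech, stem, and OGER features are omitted because they depend on external resources. The last helper applies the stated rule for the hidden layer size to the 36 input features reported in the table.

```python
import re

# Assumption: a small, illustrative list of spelled-out Greek letter names.
# The table only gives "alpha" as an example; the full list is not specified.
GREEK_NAMES = ("alpha", "beta", "gamma", "delta", "kappa")
# Unicode block "Greek and Coptic" (covers characters such as α, β, γ).
GREEK_CHARS = re.compile(r"[\u0370-\u03FF]")


def surface_features(candidate: str) -> dict:
    """Compute surface features of a candidate string.

    Counts are used where the neural network column says "Count",
    booleans where it says "Label yes/no".
    """
    lowered = candidate.lower()
    return {
        "char_count": len(candidate),
        "all_upper": candidate.isupper(),
        "all_lower": candidate.islower(),
        # Crude substring match on letter names; a real system would
        # probably match on token boundaries.
        "contains_greek": bool(GREEK_CHARS.search(candidate))
        or any(name in lowered for name in GREEK_NAMES),
        "dash_count": candidate.count("-"),
        "digit_count": sum(c.isdigit() for c in candidate),
        "ends_with_digit": candidate[-1:].isdigit(),
        "inner_capital": any(c.isupper() for c in candidate[1:]),
        "lower_count": sum(c.islower() for c in candidate),
        "upper_count": sum(c.isupper() for c in candidate),
        "space_count": candidate.count(" "),
        # Assumption: a symbol is anything that is not alphanumeric,
        # whitespace, or a dash (dashes are counted separately above).
        "symbol_count": sum(
            not (c.isalnum() or c.isspace() or c == "-") for c in candidate
        ),
    }


def hidden_layer_size(n_input_features: int) -> int:
    """Hidden layer size as stated in the table: 2 x (number of input features)."""
    return 2 * n_input_features


if __name__ == "__main__":
    for text in ("IL-2", "TNF-α", "interleukin 2 receptor alpha"):
        print(text, surface_features(text))
    print("hidden units for 36 inputs:", hidden_layer_size(36))
```

The conditional random fields column would use the same checks reduced to yes/no labels per token, for example `dash_count > 0` instead of the count itself.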