From: Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?
                  i2b2/VA corpus        JNLPBA corpus
Documents         349                   2,000
Tokens            260,570               492,301
Concept phrases   Problem    11,968     Protein    30,269
                  Test        7,369     DNA         9,530
                  Treatment   8,500     Cell Type   6,710
                                        Cell Line   3,830
                                        RNA           951