Table 1 Descriptive statistics of the corpora

From: Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?

                  i2b2/VA corpus        JNLPBA corpus
Documents         349                   2,000
Tokens            260,570               492,301
Concept phrases   Problem     11,968    Protein     30,269
                  Test         7,369    DNA          9,530
                  Treatment    8,500    Cell Type    6,710
                                        Cell Line    3,830
                                        RNA            951