Table 1 Descriptive statistics of the corpora

From: Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?

  

                  i2b2/VA corpus          JNLPBA corpus

Documents         349                     2,000
Tokens            260,570                 492,301
Concept phrases   Problem     11,968      Protein     30,269
                  Test         7,369      DNA          9,530
                  Treatment    8,500      Cell Type    6,710
                                          Cell Line    3,830
                                          RNA            951