Journal of Biomedical Semantics

Table 1 Descriptive statistics of the corpora

From: Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?

	i2b2/VA corpus		JNLPBA corpus
Documents	349		2,000
Tokens	260,570		492,301
Concept phrases	Problem	11,968	Protein	30,269
	Test	7,369	DNA	9,530
	Treatment	8,500	Cell Type	6,710
			Cell Line	3,830
			RNA	951

ISSN: 2041-1480
