Table 3 Summary statistics for the corpora

From: Pooling annotated corpora for clinical concept extraction

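| Set name | Documents | Lines  | Tokens  | Concepts | % of tokens included in concept annotation |
|----------|-----------|--------|---------|----------|--------------------------------------------|
| i2b2/VA  | 349       | 30,673 | 260,570 | 11,967   | 10.9                                       |
| Mayo     | 160       | 2,487  | 40,988  | 2,076    | 11.3                                       |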
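The last column can be read as the share of tokens that fall inside at least one concept annotation. The sketch below shows one way such a figure could be computed. It assumes annotations are given as end-exclusive (start, end) token-offset spans, which is a hypothetical format and not necessarily how the i2b2/VA or Mayo corpora store them. It also assumes that a token covered by overlapping concepts is counted once. The exact counting rule used for this table is not stated here, and the function name `percent_tokens_in_concepts` is illustrative.

```python
from typing import Iterable, Tuple


def percent_tokens_in_concepts(n_tokens: int, spans: Iterable[Tuple[int, int]]) -> float:
    """Percentage of tokens covered by at least one concept span.

    Assumption (hypothetical format): each span is a (start, end) pair of
    token offsets with `end` exclusive. Tokens covered by overlapping
    spans are counted only once.
    """
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))  # token indices start .. end-1
    return 100.0 * len(covered) / n_tokens


# Toy input: 20 tokens and three spans; the first two overlap on token 4.
# Distinct covered tokens are {2, 3, 4, 5, 10}, i.e. 5 of 20.
print(percent_tokens_in_concepts(20, [(2, 5), (4, 6), (10, 11)]))  # 25.0
```

Using a set of token indices, rather than summing span lengths, keeps overlapping or nested annotations from pushing the percentage above the true coverage.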