From: Pooling annotated corpora for clinical concept extraction
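| Set name | Documents | Lines | Tokens | Concepts | % of tokens included in concept annotation |
|----------|-----------|--------|---------|----------|--------------------------------------------|
| i2b2/VA  | 349       | 30,673 | 260,570 | 11,967   | 10.9                                       |
| Mayo     | 160       | 2,487  | 40,988  | 2,076    | 11.3                                       |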