From: Synonym extraction and abbreviation expansion with ensembles of semantic spaces
Corpus | With stop words | Without stop words | Segments |
---|---|---|---|
Clinical | ∼42.5M tokens (∼0.4M types) | ∼22.5M tokens (∼0.4M types) | 268,727 documents |
Medical | ∼20.3M tokens (∼0.3M types) | ∼12.1M tokens (∼0.3M types) | 1,153,824 sentences |