Table 1 Corpora statistics

From: Synonym extraction and abbreviation expansion with ensembles of semantic spaces

| Corpus | With stop words | Without stop words | Segments |
| --- | --- | --- | --- |
| Clinical | 42.5M tokens (0.4M types) | 22.5M tokens (0.4M types) | 268,727 documents |
| Medical | 20.3M tokens (0.3M types) | 12.1M tokens (0.3M types) | 1,153,824 sentences |
 
  1. The number of tokens and unique terms (word types) in the medical and clinical corpora, with and without stop words.
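
As a rough illustration of the token/type distinction used in the table, the sketch below counts both for a toy sentence, with and without stop words. The stop-word list, the whitespace tokenizer, and the `corpus_stats` helper are assumptions made for this example; they are not the preprocessing used in the article.

```python
# Hypothetical stop-word list and toy text, chosen only for illustration.
STOP_WORDS = {"the", "of", "and", "in", "a"}


def corpus_stats(tokens, stop_words=frozenset()):
    """Return (number of tokens, number of types) after dropping stop words."""
    kept = [t for t in tokens if t not in stop_words]
    # Tokens are all occurrences; types are the distinct terms among them.
    return len(kept), len(set(kept))


text = "the patient reported pain in the chest and pain in the arm"
tokens = text.lower().split()  # naive whitespace tokenization (assumption)

with_stop = corpus_stats(tokens)
without_stop = corpus_stats(tokens, STOP_WORDS)

print("with stop words:   ", with_stop)
print("without stop words:", without_stop)
```

In this toy case, removing stop words halves the token count but removes only a few types. The table shows the same pattern at scale: token counts drop sharply while the rounded type counts stay the same.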