Table 1 Corpora statistics

From: Synonym extraction and abbreviation expansion with ensembles of semantic spaces

Corpus     With stop words              Without stop words           Segments
Clinical   42.5M tokens (0.4M types)    22.5M tokens (0.4M types)    268,727 documents
Medical    20.3M tokens (0.3M types)    12.1M tokens (0.3M types)    1,153,824 sentences
The number of tokens and unique terms (word types) in the medical and clinical corpora, with and without stop words.
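Token and type counts of this kind could be computed with a routine like the one below. This is a minimal sketch that assumes naive lowercase whitespace tokenization and a small hypothetical stop-word list (STOP_WORDS and corpus_stats are illustrative names; the article's actual preprocessing and stop-word list are not given here). The toy data shows the same pattern as the table: removing stop words sharply reduces the token count but barely changes the number of types, because stop words are a handful of distinct words that occur very often.

```python
from collections import Counter

# Hypothetical stop-word list for illustration only; not the list used in the article.
STOP_WORDS = {"the", "of", "and", "in", "with", "a"}


def corpus_stats(segments, stop_words=frozenset()):
    """Return (token count, type count) over an iterable of text segments.

    Tokens found in stop_words are skipped. Tokenization is naive lowercase
    whitespace splitting, which is an assumption made for this sketch.
    """
    counts = Counter()
    for segment in segments:
        for token in segment.lower().split():
            if token not in stop_words:
                counts[token] += 1
    # Tokens = total occurrences; types = number of distinct words.
    return sum(counts.values()), len(counts)


segments = [
    "The patient was admitted with pain in the chest",
    "Pain of the chest and shortness of breath",
]
print(corpus_stats(segments))              # (17, 12)
print(corpus_stats(segments, STOP_WORDS))  # (9, 7)
```

Here the 17 tokens drop to 9 once stop words are removed, while the 12 types drop only to 7; in a corpus with hundreds of thousands of types, the handful of removed stop-word types is invisible at the precision reported in the table.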