Table 2 Datasets used for the experiments

From: Thematic clustering of text documents using an EM-based approach

Datasets                Number of Documents    Number of Clusters
News-Different-3        300                    3
News-Similar-3          300                    3
News-Moderated-6        600                    6
Parkinson's Disease     25992                  -
Huntington's Disease    5602                   -
  1. News-Different-3, News-Similar-3, and News-Moderated-6 are from the 20-Newsgroup collection. Parkinson's Disease and Huntington's Disease are from the MEDLINE dataset.