Table 2 Datasets used for the experiments

From: Thematic clustering of text documents using an EM-based approach

Dataset                Number of Documents   Number of Clusters
News-Different-3       300                   3
News-Similar-3         300                   3
News-Moderated-6       600                   6
Parkinson's Disease    25992                 -
Huntington's Disease   5602                  -

  1. News-Different-3, News-Similar-3, and News-Moderated-6 are from the 20-Newsgroup collection. Parkinson's Disease and Huntington's Disease are from the MEDLINE dataset.