
Table 1 The thematic clustering algorithm

From: Thematic clustering of text documents using an EM-based approach

Given $K$ initial clusters, the number $n_U$, and the set of prior probabilities $\{pr_d\}_{d \in D}$:

   1. Create a random partition $\{V_i\}_{i=1}^{K}$ of $D$ with corresponding relations $\{R_i\}_{i=1}^{K}$.

   2. Compute $p_t$, $q_t$, and $r_t$ for $V_i$.

   3. Compute $\alpha_t$ for $V_i$.

   4. For each cluster, select the $n_U$ points for which $\alpha_t$ is the greatest to define the set $U$ and the indicator values $\{u_t\}_{t \in T}$.

   5. Compute the probabilities $\{pz_d\}_{d \in D}$ for each cluster $V_i$.

   6. For each document $d$, assign $d$ to the cluster in which it has the highest probability.

   7. Test for convergence. Terminate if converged.

   8. For a subset $D_s \subset D \cap V_i$ consisting of the documents with the lowest 1% of $\{pz_d\}$ values in $V_i$, re-assign each document to the cluster in which it has the second highest probability.

   9. Return to Step 2.
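
The control flow of the steps above can be sketched in Python. This is only a skeleton under stated assumptions, not the authors' implementation. The table does not define how the quantities in Steps 2 to 5 are computed, so they are delegated to a caller-supplied function, here given the hypothetical name `score_fn`, which must return one probability per cluster for every document. The names `thematic_clustering` and `reassign_lowest` are likewise illustrative. Three further assumptions: convergence (Step 7) is taken to mean that the Step 6 assignment no longer changes, the "lowest 1%" in Step 8 is rounded down, and the relations $R_i$ from Step 1 are not modeled.

```python
import random
from typing import Callable, List, Sequence

Scores = List[List[float]]  # Scores[d][i]: probability of document d under cluster i


def reassign_lowest(labels: Sequence[int], scores: Scores, K: int,
                    frac: float = 0.01) -> List[int]:
    """Step 8: in each cluster, move the lowest-scoring fraction of members
    to their best alternative cluster."""
    new_labels = list(labels)
    if K < 2:
        return new_labels
    for i in range(K):
        members = [d for d, lab in enumerate(labels) if lab == i]
        members.sort(key=lambda d: scores[d][i])
        n_move = int(frac * len(members))  # assumption: round down
        for d in members[:n_move]:
            # After Step 6, cluster i is the argmax for d, so the best other
            # cluster is the one with the second highest probability.
            others = [j for j in range(K) if j != i]
            new_labels[d] = max(others, key=lambda j: scores[d][j])
    return new_labels


def thematic_clustering(docs: Sequence, K: int,
                        score_fn: Callable[[Sequence, List[int], int], Scores],
                        max_iter: int = 100, frac: float = 0.01,
                        seed: int = 0) -> List[int]:
    """Skeleton of the loop in the table; score_fn stands in for Steps 2 to 5."""
    rng = random.Random(seed)
    labels = [rng.randrange(K) for _ in docs]      # Step 1: random partition
    previous = None
    for _ in range(max_iter):
        scores = score_fn(docs, labels, K)         # Steps 2-5 (hypothetical callable)
        assigned = [row.index(max(row)) for row in scores]  # Step 6
        if assigned == previous:                   # Step 7 (assumed criterion)
            return assigned
        previous = assigned
        labels = reassign_lowest(assigned, scores, K, frac)  # Step 8
    return labels                                  # iteration cap reached
```

Comparing against the previous Step 6 assignment, rather than the perturbed labels from Step 8, keeps the deliberate re-assignment from masking convergence. The `max_iter` cap is an added safeguard that does not appear in the table.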