From: Thematic clustering of text documents using an EM-based approach
Given $K$ initial clusters, the number $n_U$, and the set of prior probabilities $\{pr_d\}_{d \in D}$:
1. Create a random partition of $D$ into $K$ clusters $V_i$ with corresponding relations.
2. Compute $p_t$, $q_t$, and $r_t$ for each cluster $V_i$.
3. Compute $\alpha_t$ for each cluster $V_i$.
4. For each cluster, select the $n_U$ points with the greatest $\alpha_t$; these define the set $U$ and the indicator values $\{u_t\}_{t \in T}$.
5. Compute the probabilities $\{pz_d\}_{d \in D}$ for each cluster $V_i$.
6. Assign each document $d$ to the cluster in which it has the highest probability.
7. Test for convergence; terminate if converged.
8. Let $D_s$ be the subset of documents in $V_i$ whose $pz_d$ values fall in the lowest 1%; reassign these documents to the clusters in which they have the second-highest probabilities.
9. Return to Step 2.
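The control flow of the steps above (selection, assignment, convergence test, and low-probability reassignment) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The excerpt does not give the formulas for $p_t$, $q_t$, $r_t$, $\alpha_t$, or $pz_d$, so steps 2 to 5 are delegated to a hypothetical `estimate_pz` callback that returns a document-by-cluster probability table. All function names are illustrative. Three further details are assumptions: convergence means the step-6 assignment did not change, the "lowest 1%" of a cluster is rounded up so at least one document moves, and each cluster and its documents are indexed by integers.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Table = Sequence[Sequence[float]]  # pz[d][i]: probability of document d under cluster V_i


def select_top(alpha: Dict[str, float], n_u: int) -> Dict[str, int]:
    """Step 4 for one cluster: u_t = 1 for the n_u keys with the largest alpha_t, else 0."""
    chosen = set(sorted(alpha, key=alpha.get, reverse=True)[:n_u])
    return {t: int(t in chosen) for t in alpha}


def assign(pz: Table) -> List[int]:
    """Step 6: put each document in the cluster where its probability is highest."""
    return [max(range(len(row)), key=row.__getitem__) for row in pz]


def reassign_lowest(pz: Table, labels: List[int], frac: float = 0.01) -> List[int]:
    """Step 8: in each cluster V_i, move the bottom `frac` of its documents
    (ranked by pz[d][i], rounded up; an assumption) to their second most
    probable cluster. Assumes `labels` came from assign(pz), so a document's
    current cluster is its most probable one."""
    k = len(pz[0])
    new_labels = list(labels)
    for i in range(k):
        members = [d for d, c in enumerate(labels) if c == i]
        if not members:
            continue
        n_move = math.ceil(frac * len(members))
        for d in sorted(members, key=lambda doc: pz[doc][i])[:n_move]:
            # Best cluster other than the current one = second-highest probability.
            new_labels[d] = max((j for j in range(k) if j != i), key=lambda j: pz[d][j])
    return new_labels


def run(estimate_pz: Callable[[List[int]], Table], labels: List[int],
        max_iter: int = 100, frac: float = 0.01) -> Optional[List[int]]:
    """Steps 2-9. `estimate_pz` is a hypothetical stand-in for steps 2-5."""
    previous: Optional[List[int]] = None
    for _ in range(max_iter):
        pz = estimate_pz(labels)                       # steps 2-5 (not specified here)
        assigned = assign(pz)                          # step 6
        if assigned == previous:                       # step 7 (assumed criterion)
            return assigned
        previous = assigned
        labels = reassign_lowest(pz, assigned, frac)   # step 8, then back to step 2
    return previous
```

For example, with a fixed table `pz = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]`, `assign(pz)` yields `[0, 0, 1, 0]`, and `reassign_lowest` moves document 3 (the weakest member of cluster 0) to cluster 1 and document 2 to cluster 0. Comparing successive step-6 assignments, rather than the perturbed labels, lets the loop stop even though step 8 always moves some documents.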