From: Thematic clustering of text documents using an EM-based approach
Given $K$ initial clusters, the number $n_U$ of points to select per cluster in Step 4, and the set of prior probabilities $\{pr_d\}_{d\in D}$:
1. Create a random partition $\{V_i\}_{i=1}^{K}$ of $D$ with corresponding relations $\{R_i\}_{i=1}^{K}$.
2. Compute $p_t$, $q_t$, and $r_t$ for each cluster $V_i$.
3. Compute $\alpha_t$ for each cluster $V_i$.
4. For each cluster, select the $n_U$ points for which $\alpha_t$ is the greatest to define the set $U$ and the indicator values $\{u_t\}_{t\in T}$.
5. Compute the probabilities $\{pz_d\}_{d\in D}$ for each cluster $V_i$.
6. For each document $d \in D$, assign $d$ to the cluster in which it has the highest probability.
7. Test for convergence. Terminate if converged.
8. For each cluster $V_i$, let $D_s \subset D_{V_i}$ be the subset of documents whose $pz_d$ values are the lowest 1% in $V_i$, and reassign each document in $D_s$ to the cluster in which it has the second-highest probability.
9. Return to Step 2.
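Step 4 for a single cluster can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes the $\alpha_t$ scores of one cluster are available as a dict keyed by the elements $t \in T$ (treated here as terms), and that each $u_t$ is a 0/1 indicator of membership in $U$. Neither representation is stated in this excerpt, the helper name `select_top_terms` is hypothetical, and the scores in the example are made up.

```python
import heapq
from typing import Dict, Set, Tuple


def select_top_terms(alpha: Dict[str, float], n_u: int) -> Tuple[Set[str], Dict[str, int]]:
    """Step 4 for one cluster (hypothetical helper).

    alpha maps each t in T to its alpha_t score for this cluster.
    Returns the set U of the n_u highest-scoring t and the indicator
    values u_t, assumed here to be 1 if t is in U and 0 otherwise.
    """
    # heapq.nlargest keeps the n_u keys with the largest alpha_t;
    # ties are broken by the order in which keys appear in alpha.
    selected = set(heapq.nlargest(n_u, alpha, key=alpha.get))
    indicators = {t: int(t in selected) for t in alpha}
    return selected, indicators


# Example with made-up scores for one cluster.
alpha = {"gene": 0.9, "cell": 0.2, "protein": 0.7, "the": 0.05}
U, u = select_top_terms(alpha, n_u=2)
print(sorted(U))  # ['gene', 'protein']
print(u)          # {'gene': 1, 'cell': 0, 'protein': 1, 'the': 0}
```

The helper would be called once per cluster, each time with that cluster's $\alpha_t$ values.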
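The control flow of Steps 1 to 9 can be sketched as follows. The formulas for $p_t$, $q_t$, $r_t$, $\alpha_t$ and $pz_d$ are not given in this excerpt, so Steps 2 to 5 are left to a caller-supplied `score_fn` that returns, for every document $d$ and cluster $V_i$, the probability $pz_d$; any use of the relations $R_i$ and priors $pr_d$ is assumed to happen inside it. All names are hypothetical, and three details are assumptions rather than the paper's stated choices: the random partition deals shuffled documents round-robin so that no cluster starts empty, convergence means the Step 6 assignment did not change from the previous iteration, and the 1% in Step 8 is rounded down.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical interface for Steps 2-5: given the current cluster labels,
# return prob[d][i], the probability pz_d of document d under cluster V_i.
ScoreFn = Callable[[List[int]], List[List[float]]]


def random_partition(n_docs: int, k: int, rng: random.Random) -> List[int]:
    """Step 1: shuffle the documents and deal them round-robin into k
    clusters, so no cluster starts empty when n_docs >= k."""
    order = list(range(n_docs))
    rng.shuffle(order)
    labels = [0] * n_docs
    for position, d in enumerate(order):
        labels[d] = position % k
    return labels


def argmax(row: Sequence[float]) -> int:
    """Index of the largest entry; ties go to the lowest index."""
    return max(range(len(row)), key=lambda i: row[i])


def reassign_lowest(prob: List[List[float]], labels: List[int], k: int,
                    fraction: float = 0.01) -> List[int]:
    """Step 8: in each cluster V_i, move the documents with the lowest
    `fraction` of pz_d to their second most probable cluster."""
    new_labels = list(labels)
    for i in range(k):
        members = [d for d, c in enumerate(labels) if c == i]
        members.sort(key=lambda d: prob[d][i])   # ascending pz_d within V_i
        n_move = int(fraction * len(members))    # rounding down is an assumption
        for d in members[:n_move]:
            # After Step 6, d sits in its most probable cluster i, so the
            # best cluster other than i is its second most probable one.
            new_labels[d] = max((j for j in range(k) if j != i),
                                key=lambda j: prob[d][j])
    return new_labels


def em_cluster(n_docs: int, k: int, score_fn: ScoreFn, fraction: float = 0.01,
               max_iter: int = 100, seed: int = 0) -> Optional[List[int]]:
    """Return the Step 6 assignment at convergence or after max_iter rounds."""
    rng = random.Random(seed)
    labels = random_partition(n_docs, k, rng)                  # Step 1
    assigned: Optional[List[int]] = None
    for _ in range(max_iter):
        prob = score_fn(labels)                                 # Steps 2-5 (black box)
        new_assigned = [argmax(row) for row in prob]            # Step 6
        if new_assigned == assigned:                            # Step 7 (assumed test)
            break
        assigned = new_assigned
        labels = reassign_lowest(prob, assigned, k, fraction)   # Step 8, back to Step 2
    return assigned


# Toy scores that ignore the labels: documents 0-3 favour cluster 0, 4-7 cluster 1.
PROB = [[0.9, 0.1]] * 4 + [[0.2, 0.8]] * 4
print(em_cluster(8, 2, lambda labels: PROB))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Comparing successive Step 6 assignments, rather than the labels after Step 8, keeps the deliberate perturbation in Step 8 from blocking convergence on its own: if recomputing the probabilities sends the moved documents back, the assignment repeats and the loop stops.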