[BiO BB] Quick question on choosing clusters ...

Ilya Venger ivenger at wisemail.weizmann.ac.il
Fri Apr 28 14:56:14 EDT 2006

There is no really good way to choose an optimal number of clusters; it depends very much on the nature of the data itself. Some algorithms exist, for example ones that compute the mean split silhouette (MSS) or compare the within-cluster and between-cluster variability, but none will give you a definite answer. They can only point to local optima, that is, cluster counts that score better than their immediate neighbours.
For example, you might see that 4 clusters are better than 3 or 5, and also that 12 is better than 11 or 13, but you wouldn't really know which is better, 4 or 12. As I said, you need to know your data well and have a general idea of how many clusters to expect. You might also want to apply a sorting algorithm first (such as Eytan Domany's SPIN) in order to visualize and observe the structure of your data.
I think that MSS has been implemented in R at some point, but you will need to check.
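
To make the silhouette idea concrete, here is a minimal sketch in plain Python. It uses the ordinary average silhouette width, not MSS, and the toy points and the helper names average_linkage and mean_silhouette are invented for the illustration. It cuts an average-linkage clustering at several values of k and keeps the k with the highest score. In R, hclust and cutree together with silhouette from the cluster package should cover the same steps.

```python
import math
from itertools import combinations


def average_linkage(points, k):
    """Naive agglomerative clustering (average linkage), stopped at k clusters.

    Returns one integer label per point. Fine for toy data only: every merge
    recomputes all pairwise cluster distances.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            d = sum(
                math.dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            ) / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; needs at least two clusters."""
    n = len(points)
    total = 0.0
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j != i:
                c = labels[j]
                sums[c] = sums.get(c, 0.0) + math.dist(points[i], points[j])
                counts[c] = counts.get(c, 0) + 1
        own = labels[i]
        if own not in counts:
            continue  # singleton cluster: silhouette is 0 by convention
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in counts if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Toy data (made up): three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

# Score each candidate number of clusters and keep the best one.
scores = {k: mean_silhouette(points, average_linkage(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On this toy data the score peaks at k = 3. On real data you would scan a wider range of k and may well find several local peaks, which is exactly the 4-versus-12 situation described above.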

Hope this helps,

>>> dankoc at gmail.com 04/27/06 10:11 PM >>>

I am using hierarchical agglomerative clustering, but need to split clusters
once they are identified.  Is there some way that I can determine the
optimal number of clusters to use for my data set?  Is there a function in R
that I can use for this purpose?

