[BiO BB] Clustering

William Thompson thompson at wadsworth.org
Wed Sep 3 13:40:41 EDT 2003

If all you are looking for is simple clustering, check out R
(http://www.r-project.org/). It has an extensive clustering package.
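Below is a minimal sketch of the kind of simple clustering mentioned above, for readers who want to see the idea before installing anything. It assumes a symmetric N x N similarity matrix (higher values mean more alike) and uses average-linkage agglomerative clustering, which is one common choice among many. The function name `average_linkage_clusters` and the `threshold` cut-off are hypothetical and illustrative, not part of R or any other package. The sketch recomputes linkages naively, so it is only practical for a few hundred points. For 1000 sequences, a library routine such as R's `hclust` or SciPy's `scipy.cluster.hierarchy.linkage` is the realistic option.

```python
from typing import List, Sequence


def average_linkage_clusters(sim: Sequence[Sequence[float]],
                             threshold: float) -> List[List[int]]:
    """Agglomerative clustering with average linkage on a similarity matrix.

    Assumes `sim` is a symmetric N x N matrix where larger values mean
    "more similar". Clusters are merged greedily, best pair first, until the
    best average similarity between any two clusters drops below `threshold`.
    (Hypothetical helper for illustration; naive and slow for large N.)
    """
    clusters: List[List[int]] = [[i] for i in range(len(sim))]  # one point per cluster

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best_score, best_pair = float("-inf"), (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_score < threshold:
            break  # no two clusters are similar enough to merge
        x, y = best_pair
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so removing y leaves index x valid
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Toy matrix: points 0/1 are alike, points 2/3 are alike.
    sim = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.15, 0.1],
        [0.1, 0.15, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(average_linkage_clusters(sim, threshold=0.5))  # [[0, 1], [2, 3]]
```

Lowering `threshold` merges more clusters. At 0.0 everything ends up in one cluster, which is the top of the dendrogram a hierarchical method would draw.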


Bill Thompson, PhD
Center for Bioinformatics
Wadsworth Center
NY State Dept of Health
ESP C-644
P.O. Box 509
Albany, NY  12201-0509
phone: (518) 486-7882

> Date: Wed, 3 Sep 2003 17:46:56 +0100 (BST)
> From: Dan Bolser <dmb at mrc-dunn.cam.ac.uk>
> To: bio_bulletin_board at bioinformatics.org
> Subject: Re: [BiO BB] Clustering
> Reply-To: bio_bulletin_board at bioinformatics.org
> > > What packages support clustering of points
> > > with a similarity matrix?
> > 
> > I don't think I quite understand the question, can you elaborate on that?
> Yup... I am always finding that I have some similarities between things,
> and I would like to be able to do a simple clustering of the points,
> but I am not familiar with the algorithms, so I would just like to play
> around a bit.
> I know you can do phylogenetic analysis on any similarity matrix, but
> I don't need that high a resolution (many similar points closely linked to
> one short branch). I would just like to see roughly what 'blobs' of data
> I have without investing too much time in the analysis (or the
> computation!).
> For example, I might have the AA composition of 1000 sequences, and we
> may suspect that the composition is biased across these sequences (not
> uniform). So we think: maybe I should break them up by secondary structure,
> maybe into families, maybe I should perform a chi-squared test between every
> possible combination of groups of the 1000 to find subpopulations within
> which the composition isn't biased...
> If I take each protein and compare its composition to every other, I have
> a similarity matrix of N*(N-1)/2 distinct pairs (roughly N**2/2), which I
> would like to cluster, just to see if any protein families, structural
> classes or taxonomic groups have a particular bias in terms of AA
> composition. But this is a long, complicated analysis (I think to myself),
> so I don't bother.
> So now I ask. I am sure there are 1000s of clustering toolkits out there,
> and I should just google, but does anyone have any recommendations?
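The concrete case in the question, comparing the amino-acid composition of every protein with every other, can be sketched as follows. This is a minimal illustration under two assumptions. First, "composition" means the fraction of each of the 20 standard residues. Second, similarity is 1 / (1 + Euclidean distance) between those 20-dimensional vectors. Both choices, and the names `composition` and `composition_similarities`, are hypothetical stand-ins rather than anything proposed in the thread. Only the N*(N-1)/2 distinct pairs are computed. For 1000 sequences that is 499,500 comparisons of 20 numbers each, which is modest even in plain Python.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in `seq`; other letters are ignored."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def composition_similarities(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of sequences.

    Similarity is 1 / (1 + Euclidean distance between composition vectors),
    an illustrative assumption. Only the N*(N-1)/2 distinct pairs are stored.
    """
    comps = {name: composition(s) for name, s in seqs.items()}
    sims: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(comps), 2):
        dist = sqrt(sum((x - y) ** 2 for x, y in zip(comps[a], comps[b])))
        sims[(a, b)] = 1.0 / (1.0 + dist)
    return sims


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    toy = {"p1": "MKAAL", "p2": "MKALA", "p3": "WWYWF"}
    for pair, s in composition_similarities(toy).items():
        print(pair, round(s, 3))
```

The pairwise scores can then be fed into any clustering routine that accepts a similarity or distance matrix. Swapping in a different distance, such as one based on a chi-squared statistic, only changes the line that computes `dist`.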
