[Bioclusters] [Fwd: [BiO BB] question on clustering]

Aaron Darling bioclusters@bioinformatics.org
Thu, 5 Aug 2004 18:46:48 -0500 (CDT)


One simple approach would be to use Neighbor Joining to build a tree where
each leaf represents one of your CCP-modules and the path length between two
leaves reflects how dissimilar that pair of modules is (the shorter the
path, the more similar the modules).
There are a number of phylogenetic analysis software packages with
implementations of Neighbor Joining (let Google be your guide), but they
typically take a distance matrix as input rather than a similarity matrix.
Of course, your similarity matrix could be transformed into a 'relative
distance matrix'.  If the largest value in the matrix is k, then for each
entry x, replace x with x_new = 1 - (x/k).
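
To make that transformation concrete, here is a minimal sketch in plain
Python. The function name, the toy 3x3 matrix and the choice to force the
diagonal to zero are my own assumptions for illustration (only the 13 and 24
scores come from the question); it also assumes the matrix is square with
positive scores.

```python
def similarity_to_distance(sim):
    """Apply x_new = 1 - (x / k), where k is the largest entry in sim.

    Assumes sim is a square list of lists with positive scores.
    """
    k = max(max(row) for row in sim)
    n = len(sim)
    dist = [[1.0 - sim[i][j] / k for j in range(n)] for i in range(n)]
    # Assumption: a module is at distance 0 from itself, whatever the
    # self-similarity score on the diagonal happens to be.
    for i in range(n):
        dist[i][i] = 0.0
    return dist


# Hypothetical toy data: rows/columns stand for modules 3, 6 and 8.
# Only the scores 13 and 24 are taken from the question; the rest is made up.
toy_sim = [
    [30, 13, 24],
    [13, 30, 10],
    [24, 10, 30],
]
toy_dist = similarity_to_distance(toy_sim)
for row in toy_dist:
    print(["%.3f" % x for x in row])
```

Higher similarity scores end up as smaller distances, so module 3 comes out
closer to module 8 than to module 6, as expected.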

Although there are certainly more rigorous approaches, this ought to be
simple and should suffice as a first approximation to a clustering.  It
really depends on what you plan to do with the clustering once you've
created it.
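
If you would rather script it yourself than hunt for a package, Neighbor
Joining is short enough to write by hand. The following is only a rough
sketch of the standard algorithm in plain Python, not a replacement for a
tested phylogenetics package. The function name, the "nodeN" labels for
internal nodes and the 5x5 toy distance matrix are all made up for
illustration; it assumes a symmetric distance matrix with a zero diagonal.

```python
def neighbor_joining(dist, labels):
    """Return (child, parent, branch_length) edges of an unrooted NJ tree.

    dist is a symmetric list-of-lists distance matrix with a zero diagonal;
    labels gives one name per row. Internal nodes are named node0, node1, ...
    """
    nodes = list(labels)
    # Dict-of-dicts copy so that nodes can be looked up and added by name.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the nodes that are still active.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = "node%d" % next_id
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2.0 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c != a and c != b:
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [u]
    # Two nodes left: connect them with the remaining distance.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Made-up toy distance matrix for five items, purely for illustration.
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for child, parent, length in neighbor_joining(toy, ["a", "b", "c", "d", "e"]):
    print(child, parent, length)
```

For 30 modules you would pass the 30x30 relative distance matrix and your
module names; items joined early and separated by short branches are the
ones to read as clusters. Note that with noisy data Neighbor Joining can
produce negative branch lengths, which this sketch does not correct.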

Hope that helps
-Aaron


---------------
Original Message Follows
---------------

Date: Fri,  6 Aug 2004 00:16:47 +0100
From: FJPB Asselbergs <s0340567@sms.ed.ac.uk>
Reply-To: bio_bulletin_board@bioinformatics.org
To: bio_bulletin_board@bioinformatics.org
Subject: [BiO BB] question on clustering



Hi all,

I have a question that concerns my MSc project. I am trying to cluster 30
CCP-modules (Complement Receptor 1) after having used a novel approach that
looks at the electrostatic surfaces. I have reached the stage where I have
obtained a similarity matrix of 30 by 30 filled with positive scores. The
higher a score the more similar two modules are. For example, if matrix entry
(3,6) = 13 and entry (3,8) = 24 then module 3 is more similar to module 8 than
to module 6 due to a higher score. My problem now is to cluster these 30
modules based on this one similarity matrix. I am not used to clustering
small datasets, or in this case a similarity matrix, without having training
data. I have searched around a lot on Google for programs that could cluster my
modules using the similarity matrix but so far I have not found anything very
helpful. Does anyone know of a program (preferably free software) that could
help me out here, or another way which I could easily implement myself in a
script, that would be valid? I would really appreciate all replies to this
message and thank you all already for looking at this and thinking about
this.

Thanks and regards,
Floris


On Thu, 5 Aug 2004, J.W. Bizzaro wrote: