[BiO BB] Clustering small DNA sequences into groups

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Tue Aug 9 14:53:45 EDT 2005

On Tue, 9 Aug 2005, Samantha Fox wrote:

>I have a set of small DNA sequences (about 40) 6-10 bp, and wish to
>group them into clusters based on sequence.
>Any suggestions for doing that?

I have never tried using CD-HIT to cluster DNA, but it should work. You will
have to lower the 'throwaway' length to something like 4 to stop all your
sequences from being filtered out as too short.
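CD-HIT is a greedy incremental clusterer. It sorts sequences by decreasing length, the longest becomes the first cluster representative, and each remaining sequence joins the first representative it matches at or above the identity threshold, or else starts a new cluster. The sketch below is a toy version of that idea for a handful of short DNA sequences, not CD-HIT itself. It assumes an ungapped sliding-overlap identity (identical bases divided by the length of the shorter sequence) in place of CD-HIT's real alignment and word filter. The `min_len` parameter is a hypothetical stand-in for the 'throwaway' length.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Best ungapped identity between a and b, normalised by the length of
    the shorter sequence (an assumption; CD-HIT uses a real alignment)."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0.0
    best = 0
    # Slide the shorter sequence along the longer one, without gaps.
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(1 for i, base in enumerate(short) if long_[offset + i] == base)
        best = max(best, matches)
    return best / len(short)


def greedy_cluster(seqs: List[str], threshold: float = 0.8,
                   min_len: int = 4) -> Dict[str, List[str]]:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Keys are cluster representatives; values are the cluster members.
    `min_len` is a hypothetical stand-in for CD-HIT's throwaway length:
    shorter sequences are discarded before clustering.
    """
    clusters: Dict[str, List[str]] = {}
    kept = [s.upper() for s in seqs if len(s) >= min_len]
    # Longest sequences first, so they become the representatives.
    for seq in sorted(kept, key=len, reverse=True):
        for rep in clusters:
            if identity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    demo = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGG", "TTTTGGGC", "ACG"]
    for rep, members in greedy_cluster(demo).items():
        print(rep, members)
```

With sequences this short, the identity threshold moves in coarse steps. At 0.8, a 6 or 7 bp sequence tolerates one mismatch, and a 10 bp sequence tolerates two.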

I found that blastclust (which can be explicitly set to cluster DNA)
automatically ignores any protein sequence shorter than 30 residues. Even
though such fragments could be clustered together (when they are 100%
identical, for example), it always seems to put each protein fragment
shorter than 30 residues into a cluster of its own.

I'm not sure whether the behaviour is the same in DNA mode.
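As far as I understand it, blastclust groups sequences by single linkage over pairwise BLAST hits. Two sequences therefore share a cluster whenever a chain of sufficiently similar pairs connects them. The toy sketch below mimics that idea together with the length cut-off described above, without claiming to reproduce blastclust itself. It assumes an ungapped identity score in place of BLAST. A hypothetical `min_len` parameter stands in for the 30-residue limit: sequences shorter than `min_len` are never compared, so they always end up as singletons, even when they are identical.

```python
from itertools import combinations
from typing import Dict, List


def overlap_identity(a: str, b: str) -> float:
    """Best ungapped identity over the shorter sequence's length
    (an assumption standing in for a BLAST score)."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    return max(
        sum(x == y for x, y in zip(short, long_[off:]))
        for off in range(len(long_) - len(short) + 1)
    ) / len(short)


def single_linkage(seqs: List[str], threshold: float = 0.8,
                   min_len: int = 4) -> List[List[str]]:
    """Single-linkage clustering via union-find.

    `min_len` is hypothetical: sequences shorter than it are never
    compared, so each one stays in a cluster of its own.
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    eligible = [i for i, s in enumerate(seqs) if len(s) >= min_len]
    for i, j in combinations(eligible, 2):
        if overlap_identity(seqs[i].upper(), seqs[j].upper()) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups: Dict[int, List[str]] = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    print(single_linkage(["ACGTAC", "ACGTAA", "ACGTTA", "GGG", "GGG"]))
```

In the demo, ACGTAC and ACGTTA are only 4/6 identical but still share a cluster through ACGTAA (single-linkage "chaining"). The two identical GGG fragments are left as separate singletons because they fall under `min_len`.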

>Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
