[BiO BB] Clustering small DNA sequences into groups

Samantha Fox bioinfosm at gmail.com
Tue Aug 9 17:23:09 EDT 2005


Thanks so much for your replies. However, it has not worked yet. cd-hit
gave the errors shown below, and blastclust is not usable for such small
sequences!

Any suggestions?

> cat fasta
>one
tagcgc
>two
atcgtt
> ./cd-hit -i fasta -o www
total seq: 0
longest and shortest : 0 and 99999
Total letters: 0
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
Abort (core dumped)

> ./cd-hit -i fasta -o www -l 5

Fatal Error
Too short -l, redefine it

Program halted !!
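
For sequences this short, one possible workaround is to skip the
alignment-based tools and cluster directly on pairwise edit distance. The
"total seq: 0" line suggests cd-hit discarded both sequences before
clustering, possibly because of a minimum-length cutoff (the second run
rejects -l 5 as too short); that reading is an assumption, not something
confirmed in this thread. Below is a minimal sketch, not a tested pipeline.
The function names, the max_dist threshold, the choice of single-linkage
grouping, and the third sequence "three" are all hypothetical and chosen
only for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.lower()
    return seqs


def cluster(seqs, max_dist=2):
    """Single-linkage clustering: sequences within max_dist edits are joined."""
    parent = {n: n for n in seqs}

    def find(n):
        # Follow parent links to the cluster representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if edit_distance(seqs[a], seqs[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for n in seqs:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # "one" and "two" are from the fasta file above; "three" is made up.
    fasta = ">one\ntagcgc\n>two\natcgtt\n>three\ntagcgt\n"
    print(cluster(parse_fasta(fasta), max_dist=1))
```

With only about 40 sequences the all-against-all comparison is cheap, so
the threshold can simply be varied by hand until the groups look sensible.
Single linkage can chain dissimilar sequences together through
intermediates, so a stricter linkage rule may suit some data better.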



On 8/9/05, Martin Gollery <marty.gollery at gmail.com> wrote:
> I believe those sequences are too short for Blastclust. The default
> word size is 32.
> 
> Marty
> 
> On 8/9/05, Marcos Oliveira de Carvalho <operon at cbiot.ufrgs.br> wrote:
> >
> >
> > Hi Samantha,
> >
> > BLASTCLUST can group DNA sequences. You may need to tweak the
> > parameters (almost the same as for BLAST). You can get it from the NCBI ftp:
> > ftp://ftp.ncbi.nih.gov/blast/
> >
> > cheers
> > Marcos
> >
> >
> >
> > On Tue, 09 Aug 2005 14:24:41 -0300, Samantha Fox <bioinfosm at gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a set of about 40 small DNA sequences, 6-10 bp each, and wish
> > > to group them into clusters based on sequence.
> > >
> > > Any suggestions for doing that?
> > >
> > > Thanks,
> > >
> > > Samantha


