[BiO BB] base counting

Corné HW Klaassen c.klaassen at cwz.nl
Thu Mar 16 04:36:26 EST 2006

Hi Peter,

Thanks for the quick reply. On paper this is exactly what I'm looking 
for, but I gave compseq a try and it doesn't seem to work on 
words longer than 20 nt, whereas I'm particularly interested in 
features of 40-140 nt (I realize that this can be a very computationally 
intensive job). Any other suggestions? Is there perhaps something 
similar for protein sequences, or for other arbitrary units?
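
For long words an exhaustive table is out of reach (there are 4^40 possible 40-mers), but a sequence of length L contains at most L - k + 1 overlapping words of length k, however large k is. The minimal sketch below, assuming Python 3 and a plain sequence string, counts only the words that actually occur. Because it never enumerates the alphabet, it works unchanged for protein sequences or any other string of symbols. The function names `count_kmers` and `repeated_kmers` are hypothetical and not part of EMBOSS.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k that occurs in seq.

    Only observed words are stored, so memory grows with len(seq),
    not with alphabet_size ** k.  Works for DNA, protein or any string.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> dict:
    """Return the words of length k seen at least min_count times."""
    return {w: n for w, n in count_kmers(seq, k).items() if n >= min_count}


if __name__ == "__main__":
    # Toy example: a 40-nt unit occurring twice, separated by a spacer.
    unit = "ACGTTGCA" * 5  # 40 nt
    seq = unit + "GGG" + unit
    # Prints a one-entry dict: the 40-nt unit with a count of 2.
    print(repeated_kmers(seq, 40))
```

The same call with k anywhere from 40 to 140 costs roughly L dictionary updates per k, so scanning a whole range of word lengths is feasible for moderately sized sequences.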


>> I remember having seen this once, but I do not recollect exactly where, 
>> so I'll just pop this question here:
>> Does anyone know of a free software package (Windows or online) that 
>> analyzes the frequency of, or counts, all possible combinations of bases 
>> in a given sequence (single bases, dinucleotides, trinucleotides, 
>> tetranucleotides, etc.)?
> compseq from EMBOSS will do this. For example, it will find in E.coli 
> sequences the dramatic underrepresentation of CTAG (or CCTAG and 
> CTAGG) due to mismatch repair mechanisms.
> To find such features on a range of scales, the chaos program in 
> EMBOSS (Chaos Game Representation) can also be useful. The above 
> feature shows up as sets of white boxes. CpG features in mammalian 
> genomes also appear in the plot. Shorter sequences take up larger 
> areas of the plot. Once you know the scale of the feature you are 
> looking for, a compseq run will report the under- or over-represented 
> sequences.
> Hope that helps,
> Peter Rice
> _______________________________________________
> Bioinformatics.Org general forum  -  
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
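
The compseq approach described in the quoted reply, which tabulates every possible word of a given length (including words that never occur) and flags under- or over-represented ones, can be sketched as follows. The expected count here assumes bases occur independently with the sequence's own single-base frequencies (a zero-order model). That is an illustrative assumption and not necessarily the model compseq uses. The function name `word_table` is hypothetical.

```python
from collections import Counter
from itertools import product


def word_table(seq: str, k: int, alphabet: str = "ACGT") -> list:
    """Tabulate all len(alphabet) ** k words of length k.

    Returns (word, observed, expected, obs/exp) for every word, including
    words never observed.  Expected counts assume independent bases drawn
    with the sequence's own base frequencies (an assumed zero-order model).
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence shorter than k")
    observed = Counter(seq[i:i + k] for i in range(n_windows))
    base_counts = Counter(c for c in seq if c in alphabet)
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no symbols from the alphabet in the sequence")
    freq = {b: base_counts[b] / total for b in alphabet}

    rows = []
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        p = 1.0
        for b in word:
            p *= freq[b]
        expected = p * n_windows
        obs = observed[word]
        # NaN marks words containing a symbol absent from the sequence.
        ratio = obs / expected if expected > 0 else float("nan")
        rows.append((word, obs, expected, ratio))
    return rows


if __name__ == "__main__":
    # Toy sequence; a real run would use, for example, a whole genome.
    for word, obs, exp, ratio in word_table("ACGT" * 25, 2):
        print(f"{word}\t{obs}\t{exp:.2f}\t{ratio:.2f}")
```

Enumerating every word grows as 4^k (already 1,048,576 words for k = 10), so an exhaustive table like this is only practical for short words; for long words, counting only the observed words is the workable alternative. Passing a 20-letter amino-acid alphabet makes the table grow even faster.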

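The Chaos Game Representation mentioned in the reply can be sketched in a few lines. Each base pulls the current point halfway toward that base's corner of the unit square. The corner layout below (A bottom-left, C top-left, G top-right, T bottom-right) is an assumed convention, and EMBOSS chaos may place the bases differently. The sketch illustrates why shorter words take up larger areas: once at least k bases have been read, a point's cell in a 2^k by 2^k grid depends only on the last k bases, so each k-mer owns a square of area 1/4^k, and a word that never occurs leaves its square empty. All function names are hypothetical.

```python
from collections import Counter

# Assumed corner layout (a common CGR convention); EMBOSS chaos may differ.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq: str) -> list:
    """Floating-point CGR points, suitable for a scatter plot.

    Starts at the centre (0.5, 0.5); each base moves the point halfway
    toward that base's corner.  Bases outside ACGT (e.g. N) are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq: str, k: int) -> Counter:
    """Count CGR points per cell of a 2**k by 2**k grid, exactly.

    A point's cell is (floor(x * 2**k), floor(y * 2**k)).  Halving the
    point and adding half a corner shifts one new high bit into each
    index, so each index is a k-bit shift register of the last k bases.
    The register starts at 0 instead of the centre, but those bits are
    shifted out after k bases, so points are counted from the k-th base
    on.  Integer arithmetic avoids float rounding on long sequences.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    col = row = 0
    seen = 0
    grid = Counter()
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        col = (col >> 1) | (cx << (k - 1))
        row = (row >> 1) | (cy << (k - 1))
        seen += 1
        if seen >= k:
            grid[(col, row)] += 1
    return grid


def cell_of_word(word: str) -> tuple:
    """Grid cell (col, row) whose square holds every occurrence of word."""
    (cell,) = cgr_grid(word, len(word))
    return cell


if __name__ == "__main__":
    seq = "ACGTTGCAACGGTACCTAGG"
    k = 2
    grid = cgr_grid(seq, k)
    kmers = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Each k-mer's count lands in its own cell; unused cells stay empty.
    for word, n in sorted(kmers.items()):
        print(word, cell_of_word(word), grid[cell_of_word(word)], n)
```

Binning the points this way turns the plot into a table of k-mer counts, which is why the scale of a feature seen in the plot tells you which word length to request from a counting tool.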