[BiO BB] base counting

Peter Rice pmr at ebi.ac.uk
Thu Mar 16 03:48:49 EST 2006

Corné HW Klaassen wrote:

> I remember having seen this once, but I do not recall exactly where, so 
> I'll just pop this question here:
> Does anyone know of a free software package (Windows or online) that 
> analyzes the frequency of, or counts, all possible combinations of bases in a 
> given sequence (single bases, dinucleotides, trinucleotides, tetranucleotides, etc.)?

The compseq program from EMBOSS will do this. For example, in E. coli 
sequences it will find the dramatic underrepresentation of CTAG (or CCTAG and 
CTAGG) due to mismatch repair mechanisms.
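
To make the idea of word counting concrete, here is a minimal Python sketch. It 
is not compseq and does not reproduce its output format; the function names 
count_words and observed_over_expected are made up for illustration, and the 
"expected" count assumes bases occur independently at their observed 
frequencies, which is only one of several possible background models.

```python
from collections import Counter
from math import prod


def count_words(seq, k):
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def observed_over_expected(seq, k):
    """Ratio of each word's observed count to the count expected if bases
    were independent (assumed background model, see the text above)."""
    seq = seq.upper()
    n = len(seq)
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    windows = n - k + 1  # number of overlapping words of length k
    ratios = {}
    for word, observed in count_words(seq, k).items():
        expected = windows * prod(base_freq[base] for base in word)
        ratios[word] = observed / expected
    return ratios
```

A ratio well below 1 marks an underrepresented word. Words that never occur are 
not listed at all by this sketch, so a complete report would also need to 
enumerate the missing words.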

To find such features on a range of scales, the chaos program in EMBOSS (Chaos 
Game Representation) can also be useful. In its plot, the CTAG 
underrepresentation shows up as sets of white (empty) boxes. CpG features in 
mammalian genomes also appear in the plot. Shorter sequences take up larger 
areas of the plot, so the size of a box indicates the length of the word 
involved. Once you know the scale of the feature you are looking for, a compseq 
run will report the under- or over-represented sequences.
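
The relationship between word length and box size can be sketched in Python as 
follows. This is an illustration of the general Chaos Game Representation idea, 
not the EMBOSS chaos code: the corner assignment in CORNERS is an assumption 
and may differ from the layout chaos draws, and the helper names cgr_points and 
cell_of are hypothetical.

```python
# Assumed corner layout of the unit square; the EMBOSS plot may differ.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return one point per base: each step moves halfway from the current
    position towards the corner belonging to that base."""
    x, y = 0.5, 0.5  # start in the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # simplification: ignore ambiguity codes such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cell_of(point, k):
    """Column and row of the cell containing point on a 2**k by 2**k grid.
    The cell depends only on the last k bases that produced the point."""
    side = 2 ** k
    x, y = point
    return (min(int(x * side), side - 1), min(int(y * side), side - 1))
```

Every occurrence of a given word of length k lands in the same cell, whose side 
is 1/2**k of the plot. A word that is rare or absent therefore leaves a pale or 
empty box, and a 4-base word such as CTAG leaves a larger box than a 5-base 
word such as CCTAG.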

Hope that helps,

Peter Rice
