[BiO BB] base counting
Peter Rice
pmr at ebi.ac.uk
Thu Mar 16 03:48:49 EST 2006
Corné HW Klaassen wrote:
> I remember having seen this once but I do not recollect exactly where, so
> I'll just pop this question here:
> Does anyone know of a free software package (windows or on-line) that
> analyzes the frequency or counts all possible combinations of bases in a
> given sequence (single bases, dinucleotides, trinucleotides, tetranucleotides, etc.).
compseq from EMBOSS will do this. For example, in E. coli sequences it will
find the dramatic underrepresentation of CTAG (and of CCTAG and CTAGG) due to
mismatch repair mechanisms.
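To show what such a word count involves, here is a minimal Python sketch of the idea (not compseq's own code). It counts every overlapping word of length k and compares each count with an expected count. As an assumption for illustration only, the expected count is derived from the sequence's own base composition, treating bases as independent; compseq has its own options for setting expected frequencies. The function names `count_kmers` and `obs_exp` are hypothetical.

```python
from collections import Counter
from itertools import product


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping words of length k, skipping words with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def obs_exp(seq: str, k: int) -> dict:
    """Observed/expected ratio for every possible k-mer.

    Assumption for this sketch: expected counts come from independent bases
    drawn with the sequence's own mononucleotide frequencies.
    """
    mono = count_kmers(seq, 1)
    total_bases = sum(mono.values())
    words = count_kmers(seq, k)
    total_words = sum(words.values())
    if total_bases == 0 or total_words == 0:
        return {}  # nothing to count
    ratios = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        p = 1.0
        for base in word:
            p *= mono[base] / total_bases
        expected = p * total_words
        # NaN when a base never occurs, so the expectation is zero
        ratios[word] = words[word] / expected if expected > 0 else float("nan")
    return ratios
```

Running `obs_exp(seq, 4)` on a genome sequence and sorting by ratio would list the most under- and over-represented tetranucleotides; a ratio well below 1 for CTAG would correspond to the depletion described above.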
To find such features on a range of scales, the chaos program in EMBOSS (Chaos
Game Representation) can also be useful. The CTAG underrepresentation above
shows up as sets of white (empty) boxes. CpG features in mammalian genomes also
appear in the plot. Shorter words take up larger areas of the plot. Once you
know the scale (word length) of the feature you are looking for, a compseq run
with that word size will report the under- or over-represented sequences.
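A minimal Python sketch of why word length maps to plot area in a chaos game plot. It assumes the common corner layout A (0,0), C (0,1), G (1,1), T (1,0), which may differ from the layout EMBOSS chaos uses. Each base moves the current point halfway toward its corner, so after k bases the point lies in a square of side 1/2^k fixed by the last k bases read. Binning the points into a 2^k by 2^k grid therefore counts k-mers, and a word that never occurs leaves an empty cell. The function name `fcgr` and the reset-on-ambiguous-base rule are assumptions for illustration.

```python
# Assumed corner layout; EMBOSS chaos may place the bases differently.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> list[list[int]]:
    """Frequency chaos game representation on a 2**k x 2**k grid.

    grid[ix][iy] counts the points whose last k bases map to that cell,
    i.e. one cell per k-mer; an absent word leaves its cell at zero.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5
    run = 0  # consecutive valid bases since the last reset
    for base in seq.upper():
        if base not in CORNERS:
            x, y, run = 0.5, 0.5, 0  # hypothetical rule: restart the walk at N etc.
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the base's corner
        run += 1
        if run >= k:  # the point now encodes a complete k-mer
            ix = min(int(x * size), size - 1)
            iy = min(int(y * size), size - 1)
            grid[ix][iy] += 1
    return grid
```

With k = 4, a missing CTAG leaves one empty cell of side 1/16; longer words that contain CTAG further back leave smaller empty cells elsewhere, which is consistent with the feature appearing as a set of white boxes.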
Hope that helps,
Peter Rice