[BiO BB] Restriction sites frequencies in mouse genome
Harry Mangalam
harry.mangalam at uci.edu
Wed Sep 6 15:43:08 EDT 2006
If by "calculating frequencies" you mean finding all the sites in a
genome, tacg will do this. It finds every site you give it (I've
tested it on all the human chromosome assemblies) and also reports
the predicted frequency based on the base composition of the sequence.
It can theoretically do the entire genome in one shot if you have
enough RAM, but I've never tried it and the output would be pretty
ferocious.
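
If RAM is the worry, the work can also be done one line at a time. The
following is a minimal Python sketch, not tacg itself: it streams a FASTA
file and counts exact top-strand matches of one unambiguous site per
record. The function name count_site_per_record and the demo sequence are
hypothetical, and degenerate (IUPAC) sites and the bottom strand are not
handled.

```python
import io
from collections import Counter


def count_site_per_record(handle, site):
    """Count exact, possibly overlapping, top-strand matches of `site`
    in each FASTA record, reading the file one line at a time."""
    site = site.upper()
    keep = len(site) - 1  # bases carried over so matches spanning lines are seen
    counts = Counter()
    name, tail = None, ""
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: use the first word of the header as its name.
            name = (line[1:].split() or ["unnamed"])[0]
            counts[name] = 0
            tail = ""
            continue
        if name is None:
            continue  # ignore sequence text before the first header
        window = tail + line.upper()
        pos = window.find(site)
        while pos != -1:
            counts[name] += 1
            pos = window.find(site, pos + 1)
        # The tail is shorter than the site, so no match is counted twice.
        tail = window[-keep:] if keep else ""
    return counts


if __name__ == "__main__":
    demo = io.StringIO(">chr_demo made-up sequence\nGGAAT\nTCGAA\nTTC\n")
    print(count_site_per_record(demo, "GAATTC"))
```

Memory use stays at roughly one line plus the length of the site, so the
size of the genome only affects run time.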
For example, for chromosome 21 (a paltry 33.6MB), the summary output
is:
## Sequence: #1; from file: UNAVAILABLE
Format: FASTA; ID: gi:89161201; Description: Homo sapiens chromosome 21,
alternate assembly (based on Celera assembly), whole genome shotgun sequence.
== Sequence info:
NB: sequence length > A+C+G+T due to -> 224404 <- IUPAC degeneracies.
# of: N:224404 Y:0 R:0 W:0 S:0 K:0 M:0 B:0 D:0 H:0 V:0
#s below are for top strand; 'sites exp' values calculated on the basis of
both strands.
33216610 bases; 9772353 A(29.42 %) 6752472 C(20.33 %) 6753971 G(20.33 %)
9713410 T(29.24 %)
== Enzymes that DO NOT MAP to this sequence:
There were NO NON-matches - ALL patterns matched at least ONCE.
== Total Number of Hits per Enzyme:
AatII      1068   BsiEI      1803   EcoRV      4841   PsiI      20384
AccI      12230   BsiHKAI   23981   FauI      18509   PspGI    112279
AccII      9733   BsiWI       174   Fnu4HI    74994   PspOMI     6067
Acc65I     3021   BslI      91011   FokI      59656   PstI      15561
AciI      52859   BsmI      13955   FseI        235   PvuI        181
AclI       2047   BsmAI     73662   FspI       1211   PvuII     12841
AfeI       1406   BsmBI      7619   HaeII      7030   RsaI      56361
AflII      7226   BsmFI     45828   HaeIII    99508   RsrII       126
AflIII    18426   Bsp1286I  57995   HgaI       8115   SacI       6829
AgeI        676   BspEI      1246   HhaI      21013   SacII       893
AhdI       3149   BspHI     11844   HinP1I    21013   SalI        392
AluI     143869   BspMI     16591   HincII    13046   SanDI      3409
AlwI      37296   BsrI      63802   HindIII    9457   SapI       4316
AlwNI     16140   BsrBI      2994   HinfI     96900   Sau96I    77627
ApaI       6067   BsrDI     16179   HpaI       4478   Sau3AI    79640
ApaLI      6042   BsrFI      4609   HpaII     29934   SbfI       1068
ApoI      74171   BsrGI      9408   HphI      67904   ScaI       5880
AscI         47   BssHII      890   KasI       2793   ScrFI    137189
AseI      17631   BssKI    137189   KpnI       3021   SexAI      3472
AvaI      12916   BssSI      5101   MaeII     28783   SfaNI     42093
AvaII     31938   BstAPI     9253   MaeIII    83257   SfcI      39408
AvrII      6112   BstBI      1256   MboII    100007   SfiI        599
BaeI       2868   Bst4CI    87767   MfeI       6359   SfoI       2793
BaeI       2868   BstDSI    14918   MluI        334   SgfI         13
BamHI      4165   BstEII     4065   MlyI      44962   SgrAI       214
BanI      18704   BstF5I    59661   MnlI     308118   SmaI       4948
BanII     27893   BstNI    112279   MscI      14579   SmlI      29332
BbeI       2793   BstUI      9733   MseI     226716   SnaBI      1598
BbsI      16623   BstXI     19685   MslI      38862   SpeI       4362
BbvI      63057   BstYI     24349   MspA1I    17762   SphI       6477
BbvCI     14806   BstZ17I    4605   MwoI      73785   SrfI        302
BcgI       3733   Bsu36I    10646   NaeI       1898   SspI      28450
BcgI       3733   BtgI      14918   NarI       2793   StuI       8988
BciVI      7495   BtrI       3836   NciI      24927   StyI      34781
BclI       8350   Cac8I     66066   NcoI       8941   SwaI       2801
BfaI      83296   ClaI       1121   NdeI      10096   TaiI      28783
BglI       6550   Csp6I     56361   NgoMIV     1898   TaqI      17908
BglII      8895   CviJI    507227   NheI       2770   TatI      30303
BlpI       6131   CviRI    168208   NlaIII   161486   TfiI      51945
BmrI      19063   DdeI     155096   NlaIV     87348   TliI       1496
BplI      11478   DpnI      79640   NotI        127   TseI      63101
BpmI      32957   DraI      41466   NruI        209   Tsp45I    47283
Bpu10I    25858   DraIII     6989   NsiI      11383   Tsp509I  254887
BsaI      18254   DrdI       3165   NspI      36783   TspRI     98632
BsaAI      9382   EaeI      20232   PacI       1946   Tth111I    7783
BsaBI      4988   EagI       1139   PciI      12666   XbaI       9158
BsaHI      6162   EarI      25525   PflMI     11275   XcmI       9507
BsaJI    121468   EciI       6774   PleI      44962   XhoI       1496
BsaWI      3529   Ecl136II   6829   PmeI        539   XmaI       4948
BseMII   104754   Eco57I    24123   PmlI       4081   XmnI      11146
BseRI     23673   EcoNI      8774   Ppu10I    11383
BseSI     25059   EcoO109I  28937   PpuMI     12989
BsgI      24191   EcoRI      8938   PshAI      3251
To get the actual predicted number of sites, you have to generate the
Sites info, which would be enormous but easily sed-able to extract
what you need.
This took 9.5s on a 2GHz Opteron running 64-bit Linux.
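
As a rough sanity check on counts like those above, an expected number of
sites can be estimated from the base composition alone. The Python sketch
below assumes every base is drawn independently from the A/C/G/T counts in
the summary (Ns excluded) and handles only unambiguous sites on the top
strand. It is a simple independence model, not necessarily the formula
tacg uses for its 'sites exp' values, and the function name expected_sites
is hypothetical.

```python
from math import prod

# A/C/G/T counts for the chromosome 21 assembly, copied from the summary
# output in this message. The 224404 N positions are left out.
BASE_COUNTS = {"A": 9772353, "C": 6752472, "G": 6753971, "T": 9713410}


def expected_sites(site, base_counts):
    """Expected top-strand occurrences of an unambiguous site, assuming
    each base is drawn independently from the given composition."""
    total = sum(base_counts.values())
    freqs = {base: n / total for base, n in base_counts.items()}
    p_site = prod(freqs[base] for base in site.upper())
    positions = total - len(site) + 1  # possible start positions
    return p_site * positions


if __name__ == "__main__":
    # Observed values are taken from the hit table in this message.
    for name, site, observed in [("EcoRI", "GAATTC", 8938),
                                 ("NotI", "GCGGCCGC", 127)]:
        estimate = expected_sites(site, BASE_COUNTS)
        print(f"{name}: expected about {estimate:.0f}, observed {observed}")
```

Real genomes are not independent draws (CpG depletion is the usual
culprit for GC-rich sites), so large gaps between the estimate and the
observed count are expected for some enzymes.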
If you want, I'll send you the source tarball in a separate email.
hjm
On Tuesday 29 August 2006 05:35, Benoit VARVENNE wrote:
> Hello everybody,
>
> Thanks to all for your ideas and suggestions. I think I'm going to
> consider Perl programming to calculate restriction site frequencies,
> as the software mentioned in your mails (plus software I found) doesn't
> seem to be useful at whole-genome scale. Programming was to be
> avoided for this study but it seems to be the only solution. I'm
> really surprised not to be able to find an existing study of this kind.
>
> Thanks again,
> Regards,
>
> Benoît Varvenne,
> Bioinformatics person in charge,
> Genoway Lyon - France.
>
> On 28/08/06 11:34, "Benoit VARVENNE" <varvenne at genoway.com> wrote:
> > Dear Members,
> >
> > I am a new member of this mailing list and I don't know if such a
> > post will draw the attention of anyone here. So excuse me in
> > advance if my subject is not appropriate.
> > I am searching for a way to calculate restriction site frequencies
> > in the mouse genome (so sequences from 6 to 13bp). I have already
> > tried to do so using blast (or blast-like) tools and configuring
> > them as needed but it gave no results, because of too many
> > hits, I think.
> >
> > I would be very grateful if someone could help me on this topic.
> >
> > Thanks a lot for your help,
> > Best regards,
> >
> > Benoît Varvenne,
> > Bioinformatics person in charge,
> > Genoway Lyon - France
> >
> > _______________________________________________
> > General Forum at Bioinformatics.Org -
> > BiO_Bulletin_Board at bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
--
Harry Mangalam - Research Computing at NACS, E2148, Engineering Gateway,
UC Irvine 92697 949 824 0084(o), 949 285 4487(c)
harry.mangalam at uci.edu