[BiO BB] How can I retrieve the biggest transposable element from NCBI?

zheng hui hzheng_hotml at hotmail.com
Wed May 3 11:07:47 EDT 2006


Thank you for your enthusiastic help. I have some experience in Perl
programming but didn't know about eUtils before. It is a really powerful
utility, and I'm very glad you introduced it to me.


>From: "Mike Marchywka" <mmarchywka at eyewonder.com>
>Reply-To: "The general forum at Bioinformatics.Org" <bio_bulletin_board at bioinformatics.org>
>To: "The general forum at Bioinformatics.Org" <bio_bulletin_board at bioinformatics.org>
>Subject: RE: [BiO BB] How can I retrieve the biggest transposable element from NCBI?
>Date: Tue, 2 May 2006 19:44:17 -0400
>
>I had some time, so I modified my script to do this. Please check the results,
>since I have never bothered to search for genes, just proteins and papers.
>
>$ eutilsnew -out transpx -v -nuc "transposable element"
>Count is 7254
>--18:33:51--  http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=biotechmarchywka&email=marchywka at hotmail.com&rettype=gb&retmode=text&retstart=0&retmax=7254&db=nucleotide&query_key=1&WebEnv=0cY3im7_1EoYdn6YdoWhXDgFBoXmBY08HIOiFw7caoA5sCVabQVX5c at w92iPIIOFuEAAAsMPkIAAAAB
>            => `transpx'
>Resolving eutils.ncbi.nlm.nih.gov... 130.14.29.110
>Connecting to eutils.ncbi.nlm.nih.gov[130.14.29.110]:80... connected.
>HTTP request sent, awaiting response... 200 OK
>Length: unspecified [pubmed/text]
>
>     [      <=>                            ] 664,162,678  315.71K/s
>
>19:15:57 (256.82 KB/s) - `transpx' saved [664162678]
>...............
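What `eutilsnew` does internally is not shown; it is the poster's own script. The `query_key=1` and `WebEnv=...` parameters in the efetch URL above do, however, point to the usual E-utilities history-server pattern: one `esearch.fcgi` call with `usehistory=y` returns `Count`, `QueryKey` and `WebEnv`, and `efetch.fcgi` then pages through the stored result set with `retstart`/`retmax`. Below is a minimal bash sketch of that pattern, not a reconstruction of `eutilsnew`. It assumes `curl`, `grep` and `sed` are available; the function names, the `TOOL`/`EMAIL` values and the batch size are hypothetical, and it uses the current https base URL rather than the 2006 http one.

```shell
#!/usr/bin/env bash
# Sketch of the E-utilities history-server pattern: one esearch, then batched efetch.
# Function names, TOOL/EMAIL values and the batch size are hypothetical placeholders.

EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
TOOL="mytool"            # hypothetical; NCBI asks clients to identify themselves
EMAIL="me@example.org"   # hypothetical contact address

# URL for an esearch that stores the result set on NCBI's history server.
esearch_url() {
  local db=$1 term=$2
  # Minimal URL encoding: spaces become '+'; other special characters are not handled.
  printf '%s/esearch.fcgi?db=%s&term=%s&usehistory=y&tool=%s&email=%s\n' \
    "$EUTILS" "$db" "${term// /+}" "$TOOL" "$EMAIL"
}

# URL for one batch of GenBank flat-file records from a stored result set.
efetch_url() {
  local db=$1 webenv=$2 query_key=$3 retstart=$4 retmax=$5
  printf '%s/efetch.fcgi?db=%s&WebEnv=%s&query_key=%s&retstart=%s&retmax=%s&rettype=gb&retmode=text&tool=%s&email=%s\n' \
    "$EUTILS" "$db" "$webenv" "$query_key" "$retstart" "$retmax" "$TOOL" "$EMAIL"
}

# Text of the first <tag>...</tag> element in an XML string (crude, no XML parser).
xml_tag() {
  local tag=$1 xml=$2
  printf '%s\n' "$xml" | grep -o "<$tag>[^<]*</$tag>" | sed -n '1s/<[^>]*>//gp'
}

# Network part (only runs when called): search, then fetch in batches into $out.
fetch_all() {
  local db=$1 term=$2 out=$3 batch=${4:-500}
  local xml count webenv qk start
  xml=$(curl -s "$(esearch_url "$db" "$term")")
  count=$(xml_tag Count "$xml")
  webenv=$(xml_tag WebEnv "$xml")
  qk=$(xml_tag QueryKey "$xml")
  echo "Count is $count" >&2
  : > "$out"                                  # truncate the output file
  for (( start = 0; start < count; start += batch )); do
    curl -s "$(efetch_url "$db" "$webenv" "$qk" "$start" "$batch")" >> "$out"
    sleep 1                                   # stay under NCBI's request-rate limit
  done
}
```

For example, `fetch_all nucleotide "transposable element" transpx` would write the records to `transpx`, much like the command at the top of this message. Fetching in batches keeps each response small; the single request in the transcript, covering the whole count at once, produced one 664 MB response.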
>This is pretty sloppy, but it more or less worked. You can check this list
>manually and clean it up, but it illustrates how you can do arbitrary or
>one-off tasks. Obviously, you need to be careful too (you can run various
>sanity checks, such as line counts with "wc", or, better, write scripts that
>check the syntax more accurately: you could parse each entry and then pick
>out a name and length, rather than just looking for things that look about
>right):
>
>$ grep "ACCESSION\|source" transpx > names_and_length
>
>$ sed -e 's/\.\./ /' names_and_length | grep "^ACCESSION\|^     source" | awk '{ if ($1=="source") print acc" "$3-$2; else acc=$2 }' > tentative_list
>
>
>$ sort -g -r -k 2 tentative_list | more
>
>NT_107239 28196692
>NC_003076 26992727
>NC_003074 23470804
>NC_003071 19705358
>NC_003075 18585041
>NT_079899 17249720
>NT_079927 14818988
>AE005173 14668882
>AE005172 14221814
>NT_079879 11570171
>NT_036312 10050052
>NT_107181 9248308
>NC_003888 8667506
>NT_079926 8469245
>NT_107178 7645237
>NC_004578 6397125
>AE016853 6397125
>NC_002947 6181862
>AE015451 6181862
>NT_107180 6019142
>NT_107179 4877844
>NC_007355 4837407
>CP000099 4837407
>NT_080067 4809258
>NC_003198 4809036
>NC_003143 4653727
>AP009048 4646331
>AC_000091 4646331
>NT_080068 4609299
>NC_000962 4411531
>NC_002945 4345491
>NT_080060 4008076
>NT_079961 3480659
>NT_079923 2970703
>NT_080061 2697501
>NT_079854 2592258
>NC_002935 2488634
>NC_002950 2343475
>AE015924 2343475
>NT_080065 2206061
>NC_004116 2160266
>NT_079947 1767735
>NT_107183 1680143
>NT_080064 1671486
>NT_107176 1593846
>NT_080066 1531813
>NT_107077 1516643
>NT_107224 975436
>NC_002771 963878
>NT_080062 569062
>BA000027 425934
>NC_003903 356022
>BX248360 349658
>BX842574 349563
>BX248354 348516
>AF172282 339484
>AL445563 327649
>AL445564 321249
>BX248336 320049
>AL445565 315078
>AL939126 295149
>AL939116 293049
>AF427791 261264
>AL627283 249049
>AL645702 205102
>AJ414160 203727
>AL161495 199614
>AL161493 198219
>AL161505 198175
>AL161494 194891
>AP005160 194110
>AC092748 192266
>AC092172 191396
>AL161533 190025
>AP005298 189892
>AC068654 189348
>AC153856 188026
>AC079852 185903
>--More--
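Following the suggestion to parse each entry instead of grepping for lines that look about right, here is a minimal awk-based sketch (wrapped in a bash function whose name, `gb_lengths`, is hypothetical) that walks each GenBank record up to its `//` terminator and prints the accession, the record length taken from the `source` feature, and the longest span of one chosen feature key. Points to flag: GenBank ranges such as `1..1000` are inclusive, so a length is end - start + 1 (the `$3-$2` in the awk one-liner above reports one less); the record length is the length of the whole entry, which at the top of the list is an entire chromosome or contig (NC_000962, for instance, is the Mycobacterium tuberculosis H37Rv genome), not of a transposable element inside it; and the default key `repeat_region` is only an assumption about how the elements are annotated, so check the records you download (newer entries may use `mobile_element`). Compound locations such as `join(...)` are skipped.

```shell
#!/usr/bin/env bash
# Sketch: per-record parsing of a GenBank flat file (rettype=gb, retmode=text).
# gb_lengths is a hypothetical helper name; the default feature key is an assumption.
#
# Output, one line per record:  ACCESSION  record_length  longest_KEY_span
gb_lengths() {
  local file=$1 key=${2:-repeat_region}
  awk -v key="$key" '
    # A new record starts: reset per-record state.
    /^LOCUS/     { acc = ""; len = 0; best = 0; infeat = 0 }
    # The first token after ACCESSION is the primary accession.
    /^ACCESSION/ { acc = $2 }
    # The feature table runs from FEATURES to ORIGIN.
    /^FEATURES/  { infeat = 1; next }
    /^ORIGIN/    { infeat = 0 }
    # Feature keys start in column 6; the location follows the key.
    infeat && /^     [^ ]/ {
      loc = $2
      # Drop complement(...) wrappers and the partial-end markers < and >.
      gsub(/complement[(]|[)]|[<>]/, "", loc)
      # Only simple start..end ranges are handled; join(...) and similar are skipped.
      if (loc ~ /^[0-9]+[.][.][0-9]+$/) {
        split(loc, p, "[.][.]")
        span = p[2] - p[1] + 1          # GenBank ranges are inclusive
        if ($1 == "source" && len == 0) len = span
        if ($1 == key && span > best) best = span
      }
    }
    # End of record: emit one line.
    $1 == "//"   { print acc, len, best }
  ' "$file"
}
```

Something like `gb_lengths transpx repeat_region | sort -k3,3nr | head` would then rank records by their longest annotated element rather than by total record length.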
