[BiO BB] How can I retrieve the biggest transposable element from NCBI?

Mike Marchywka mmarchywka at eyewonder.com
Tue May 2 19:44:17 EDT 2006


I had some time, so I modified my script to do this. Please check the
results, since I have never bothered to search for genes, just proteins
and papers.

$ eutilsnew -out transpx -v -nuc "transposable element"
Count is 7254
--18:33:51--  http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=biot
echmarchywka&email=marchywka at hotmail.com&rettype=gb&retmode=text&retstart=0&retm
ax=7254&db=nucleotide&query_key=1&WebEnv=0cY3im7_1EoYdn6YdoWhXDgFBoXmBY08HIOiFw7
caoA5sCVabQVX5c at w92iPIIOFuEAAAsMPkIAAAAB
           => `transpx'
Resolving eutils.ncbi.nlm.nih.gov... 130.14.29.110
Connecting to eutils.ncbi.nlm.nih.gov[130.14.29.110]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [pubmed/text]

    [      <=>                            ] 664,162,678  315.71K/s

19:15:57 (256.82 KB/s) - `transpx' saved [664162678]
...............
This is pretty sloppy, but it more or less worked. You can check this list
manually and clean it up, but it illustrates how you can do arbitrary or
one-off tasks. Obviously, you need to be careful too (you can run various
sanity checks with "wc", for example, or, better, write scripts that check
the syntax more accurately; you could parse each entry and then pick a name
and length, rather than just looking for things that look about right):

$ more transpx | grep "ACCESSION\|source" > names_and_length

$ more names_and_length | sed -e 's/\.\./ /' | grep "^ACCESSION\|^     source" | \
  awk '{ if ($1=="source") print acc" "$3-$2; else acc=$2 }' > tentative_list
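
A somewhat stricter version of the two steps above could parse record by
record instead of line by line. The following is only a sketch, assuming the
usual GenBank flat-file layout (records end with a "//" line, the source
feature is indented five spaces). The name gb_lengths is made up for
illustration and is not part of my script. It keeps the first accession and
the first simple source range per record and adds 1 because the coordinates
are inclusive, so its numbers come out one higher than those from the awk
one-liner above.

```shell
# Sketch only: print one "accession length" line per GenBank record.
# gb_lengths is a hypothetical helper name.
gb_lengths() {
  awk '
    # keep only the first accession listed for the record
    /^ACCESSION/ && acc == "" { acc = $2 }

    # use the first source feature with a plain start..end location
    /^     source / && len == "" {
      loc = $2
      gsub(/[<>]/, "", loc)              # drop partial-end markers
      n = split(loc, p, /\.\./)
      if (n == 2 && p[1] ~ /^[0-9]+$/ && p[2] ~ /^[0-9]+$/)
        len = p[2] - p[1] + 1            # coordinates are inclusive
    }

    # "//" closes a record: report it and reset for the next one
    /^\/\/$/ {
      if (acc != "" && len != "") print acc, len
      acc = ""; len = ""
    }
  ' "$@"
}
```

It would be used as "gb_lengths transpx | sort -g -r -k 2 | more". Records
whose source location is not a plain range (a join, for instance) are
skipped rather than guessed at.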


$ cat tentative_list | sort -g -r -k 2 | more

NT_107239 28196692
NC_003076 26992727
NC_003074 23470804
NC_003071 19705358
NC_003075 18585041
NT_079899 17249720
NT_079927 14818988
AE005173 14668882
AE005172 14221814
NT_079879 11570171
NT_036312 10050052
NT_107181 9248308
NC_003888 8667506
NT_079926 8469245
NT_107178 7645237
NC_004578 6397125
AE016853 6397125
NC_002947 6181862
AE015451 6181862
NT_107180 6019142
NT_107179 4877844
NC_007355 4837407
CP000099 4837407
NT_080067 4809258
NC_003198 4809036
NC_003143 4653727
AP009048 4646331
AC_000091 4646331
NT_080068 4609299
NC_000962 4411531
NC_002945 4345491
NT_080060 4008076
NT_079961 3480659
NT_079923 2970703
NT_080061 2697501
NT_079854 2592258
NC_002935 2488634
NC_002950 2343475
AE015924 2343475
NT_080065 2206061
NC_004116 2160266
NT_079947 1767735
NT_107183 1680143
NT_080064 1671486
NT_107176 1593846
NT_080066 1531813
NT_107077 1516643
NT_107224 975436
NC_002771 963878
NT_080062 569062
BA000027 425934
NC_003903 356022
BX248360 349658
BX842574 349563
BX248354 348516
AF172282 339484
AL445563 327649
AL445564 321249
BX248336 320049
AL445565 315078
AL939126 295149
AL939116 293049
AF427791 261264
AL627283 249049
AL645702 205102
AJ414160 203727
AL161495 199614
AL161493 198219
AL161505 198175
AL161494 194891
AP005160 194110
AC092748 192266
AC092172 191396
AL161533 190025
AP005298 189892
AC068654 189348
AC153856 188026
AC079852 185903
--More--
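
As one example of the rough "wc"-style checks mentioned above, you could
compare the number of records in the download with the number of distinct
names in the list. This is a sketch under the assumption that every GenBank
record ends with a "//" line; check_counts is a made-up helper name.

```shell
# Sketch only: compare record count in the download with the name list.
# check_counts is a hypothetical helper name.
# $1 = GenBank flat file, $2 = "accession length" list
check_counts() {
  records=$(grep -c '^//' "$1")                    # one // line per record
  names=$(cut -d' ' -f1 "$2" | sort -u | wc -l)    # distinct accessions
  # $((names)) strips any padding some wc implementations add
  echo "records=$records distinct_accessions=$((names))"
}
```

Running "check_counts transpx tentative_list" should report a record count
close to the "Count is 7254" line from the search if the download was
complete. A much smaller number of distinct accessions would suggest that
the line-based filter missed entries.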


-----Original Message-----
From: bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org [mailto:bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org]On Behalf Of hzheng_hotml at hotmail.com
Sent: TuesdayMay-02-2006 11:41 AM
To: bio_bulletin_board at bioinformatics.org
Subject: [BiO BB] How can I retrieve the biggest transposable element from NCBI?


    I want to retrieve the sequence of the biggest transposable element 
from NCBI.  Can anybody tell me the proper steps?  
    I've tried to do it by searching the nucleotide databases in NCBI with 
the keyword "transposable element", but most of the records I retrieved 
were irrelevant.  Another problem is how I can distinguish the biggest 
transposon from the others before I download all the sequences.  After all, 
nearly 8000 records is rather a lot. 
   I'd be glad if anyone could tell me a better way to achieve this. Thanks.
