[BiO BB] extracting data from NCBI taxonomy

Cook, Malcolm MEC at Stowers-Institute.org
Mon Aug 29 12:59:07 EDT 2005

You will need to script NCBI's eutils (the Entrez Programming Utilities).

For starters:


tells you there are 902185 such proteins
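
A minimal Python sketch of such a count query. It assumes the current
eutils base URL and borrows the txid7711[Organism:exp] term from the
question as a stand-in; `count_url` and `parse_count` are illustrative
names, not part of eutils.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: the present-day eutils host; adjust if NCBI moves it.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term, db="protein"):
    """Build an esearch URL that asks only for the number of hits."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "rettype": "count"})


def parse_count(xml_text):
    """Read the top-level <Count> element of an eSearchResult document."""
    return int(ET.fromstring(xml_text).findtext("Count"))
```

Fetching `count_url("txid7711[Organism:exp]")` and handing the body to
`parse_count` should give the total to loop over.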

You can then get any subrange of those 902185 with URLs.  For
instance, you can get the second 50 such gi numbers (in wordy xml) in
your browser using 


However, "The maximum number of retrieved records is 10,000."
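
The subrange is selected with esearch's `retstart` (zero-based offset)
and `retmax` (page size) parameters. A minimal sketch, assuming the
current eutils base URL; `page_url` and `parse_ids` are hypothetical
helper names, and the 10,000 cap is simply the limit quoted above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: the present-day eutils host.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
MAX_RETMAX = 10000  # the per-request limit quoted in the message


def page_url(term, retstart, retmax, db="protein"):
    """Build an esearch URL for records retstart .. retstart+retmax-1."""
    if retmax > MAX_RETMAX:
        raise ValueError("retmax exceeds the 10,000-record limit")
    return ESEARCH + "?" + urlencode(
        {"db": db, "term": term, "retstart": retstart, "retmax": retmax})


def parse_ids(xml_text):
    """Return the gi numbers listed under <IdList> as strings."""
    return [e.text for e in ET.fromstring(xml_text).findall("IdList/Id")]
```

The "second 50" would then be `page_url(term, 50, 50)`.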

So you need a loop, which means you need a script (you would need one
anyway to parse the wordy XML). That in turn means studying eutils a
little more and learning how to use its web environment (WebEnv) feature.
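
One way the loop might look in Python, as a sketch rather than a tested
production script: run esearch once with `usehistory=y`, keep the
`WebEnv` and `QueryKey` it returns, then page through the stored result
set with efetch using `rettype=uilist` and `retmode=text`. The base URL
is assumed to be the current one, and `all_gis` and `http_get` are
hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumption: the present-day eutils host.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def http_get(url):
    """Default fetcher: return the body of `url` as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def all_gis(term, db="protein", batch=10000, get=http_get):
    """Yield every gi matching `term`, one history-server batch at a time."""
    # Step 1: search once and park the result set on the history server.
    search = ET.fromstring(get(EUTILS + "esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "usehistory": "y", "retmax": 0})))
    total = int(search.findtext("Count"))
    session = {"db": db,
               "WebEnv": search.findtext("WebEnv"),
               "query_key": search.findtext("QueryKey"),
               "rettype": "uilist", "retmode": "text"}
    # Step 2: walk the stored set in slices of at most `batch` records.
    for start in range(0, total, batch):
        page = get(EUTILS + "efetch.fcgi?" + urlencode(
            dict(session, retstart=start, retmax=batch)))
        yield from page.split()  # plain text, one gi per line
```

Writing `"\n".join(all_gis("txid7711[Organism:exp]"))` to a file would
give the text gi list asked for. The `get` argument exists so that a
delay between requests, or a stub for testing, can be swapped in.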

I don't think there is another way, but I hope I can be proved wrong.


Malcolm Cook

-----Original Message-----
bio_bulletin_board-bounces+mec=stowers-institute.org at bioinformatics.org
[mailto:bio_bulletin_board-bounces+mec=stowers-institute.org at bioinformat
ics.org] On Behalf Of Lipika Ray
Sent: Friday, August 26, 2005 1:11 PM
To: marty.gollery at gmail.com
Cc: bio_bulletin_board at bioinformatics.org
Subject: [BiO BB] extracting data from NCBI taxonomy 


I don't want to pick it up through a web browser; I want it through a
Perl program. For that I need a definite URL that points to the text
version of the gi list in the protein database for txid7711[Organism:exp].

Lipika Ray
Postdoctoral Fellow
SUNY, Albany

>Go to NCBI and search for txid9606[Organism:exp]
>Then click the pulldown next to 'display' and pick 'GI list'. Pick
>'Send to: File' from the pulldown, and you can name the file whatever
>you want.


On 8/26/05, Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote:
> Lipika Ray wrote:
> > Yes, I know about grepping the file 'gi_taxid_prot.dmp.gz' for
> > 9606. But if you want the gi list of taxid 7711, say, then you
> > can't get any gi list associated with that taxid in that flat file.
> > That's why I am searching for a definite URL from which I can
> > get the information directly.
> That sounds strange. Perhaps they forgot to update the file?
> > Lipika Ray
> > Postdoctoral Fellow
> > SUNY, Albany
> >
> >
> >
> >>Message: 8
> >>Date: Fri, 26 Aug 2005 15:51:55 +0100
> >>From: Dan Bolser <dmb at mrc-dunn.cam.ac.uk>
> >>Subject: Re: [BiO BB] extracting data from NCBI taxonomy
> >>To: "The general forum at Bioinformatics.Org"
> >>      <bio_bulletin_board at bioinformatics.org>
> >>Message-ID: <430F2C8B.7030605 at mrc-dunn.cam.ac.uk>
> >>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >>
> >>Lipika Ray wrote:
> >>
> >>>Hi,
> >>>
> > >>>I want to extract the full gi list of protein database in text
> > >>>format of Homo sapiens with Taxid 9606. I want to do it through perl
> >>>So
> >>>I need the definite URL which will extract this information and
> >>>into
> >>>a file.
> >>>But I am seeing that there is some parameter, say query_key which
> >>>related to the history page. I don't understand how to set this
> >>>parameter.
> >>>For example,
> >>>The link with a taxonomy id has a definite URL like:
> >>>
> >>>http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=7215
> >>>
> >>>I don't understand what should be the definite URL by which I can
> >>>extract
> >>>the required information.
> >>>Please help me in this regard.
> >>>Thanking you,
> >>
> >>You could always just use this file...
> >>
> >>ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_prot.dmp.gz
> >>
> >>and grep for 9606
> >

Bioinformatics.Org general forum  -
BiO_Bulletin_Board at bioinformatics.org
