[BiO BB] Re: [ssml] Parsing taxonomy from blast output

Ian Donaldson idonalds at blueprint.org
Fri Apr 1 15:11:35 EST 2005


Hi all


I should also mention that you can retrieve this information using the
SeqHound remote Perl API (or Java/C/C++).

No need to use up disk space or wait for downloads.

The call is SHoundTaxIDFromGi, described here:

http://www.blueprint.org/seqhound/apifunctsdet.html#SHoundTaxIDFromGi

You can download the API from here:

ftp://ftp.blueprint.org/pub/SeqHound/Code/

and follow the enclosed instructions to get started, or look at the first few
pages of the SeqHound Manual:

http://www.blueprint.org/seqhound/seqhound_documentation.html

Taxid assignments to GIs are updated daily as part of the core module.
Check here:

http://seqhound.blueprint.org/report.html

Other API calls can also give you the names of taxa.

Cheers

Ian

-----Original Message-----
From: bio_bulletin_board-bounces+idonalds=blueprint.org at bioinformatics.org
[mailto:bio_bulletin_board-bounces+idonalds=blueprint.org at bioinformatics.org] On Behalf Of Dan Bolser
Sent: April 1, 2005 12:40 PM
To: Goel, Manisha
Cc: ssml-general at bioinformatics.org;
bio_bulletin_board at bioinformatics.org; pdb-l at sdsc.edu
Subject: [BiO BB] Re: [ssml] Parsing taxonomy from blast output


On Fri, 1 Apr 2005, Goel, Manisha wrote:

>Hi All,
>
>I need to parse the BLAST output to get the taxonomy information.
>If I could get the taxonomy nodes associated with each gi number, that
>would also work.

Yeah, this data is here...

ftp://ftp.ncbi.nih.gov/pub/taxonomy/

See...

ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid.readme

"The gi_taxid_prot.dmp is about 17 MB and contains two columns:  the
protein's gi  and taxid."
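As a rough sketch of how that fits together, the Python below collects the GIs from a BLAST report and looks them up in gi_taxid_prot.dmp. It assumes the hit IDs use the legacy NCBI "gi|<number>|..." form and that each dump line holds the gi and the taxid separated by whitespace; the function names and the toy GI/taxid values are made up for illustration.

```python
import io
import re

# Assumption: hit IDs in the BLAST report use the legacy NCBI "gi|<number>|..." form.
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def gis_from_blast(handle):
    """Collect every GI that appears in a gi|...| hit identifier."""
    gis = set()
    for line in handle:
        gis.update(int(gi) for gi in GI_PATTERN.findall(line))
    return gis


def load_gi_taxid(handle, wanted=None):
    """Read a two-column (gi, taxid) dump such as gi_taxid_prot.dmp into a dict.

    If `wanted` is given, only those GIs are kept, which keeps memory use small.
    """
    mapping = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gi, taxid = int(fields[0]), int(fields[1])
        if wanted is None or gi in wanted:
            mapping[gi] = taxid
    return mapping


# Toy inputs: the GIs and gi->taxid pairs below are made up.
blast_report = io.StringIO(
    ">gi|111|ref|NP_000001.1| hypothetical protein\n"
    ">gi|222|sp|P99999| another hypothetical protein\n"
)
dump = io.StringIO("111\t9606\n222\t10090\n333\t562\n")

gis = gis_from_blast(blast_report)
gi_to_taxid = load_gi_taxid(dump, wanted=gis)
print(gi_to_taxid)  # {111: 9606, 222: 10090}
```

For the real file, pass an open handle such as open("gi_taxid_prot.dmp"), or gzip.open("gi_taxid_prot.dmp.gz", "rt") if you keep it compressed. Passing the set of GIs you actually hit as wanted means the whole dump is streamed once but never held in memory.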

You can then use the 'taxdump' archive to get the names.dmp (taxon names)
and nodes.dmp (structure of the taxonomic tree) files, if you need them.

See...

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
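Getting the full lineage for a taxid is then a matter of following parent links in nodes.dmp and looking up names in names.dmp. A minimal sketch, assuming the layout described in taxdump_readme.txt (fields separated by tab-pipe-tab; names.dmp columns tax_id, name_txt, unique name, name class; nodes.dmp starting with tax_id, parent tax_id, rank). The helper names and the three-node toy tree are hypothetical, not real NCBI records.

```python
import io


def read_dmp(handle):
    """Yield the fields of each record in an NCBI taxdump .dmp file.

    Fields are separated by tab-pipe-tab and each line ends with tab-pipe.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.endswith("\t|"):
            line = line[:-2]
        yield line.split("\t|\t")


def load_names(handle):
    """Map tax_id -> scientific name (names.dmp: tax_id, name_txt, unique name, name class)."""
    return {int(f[0]): f[1] for f in read_dmp(handle) if f[3] == "scientific name"}


def load_nodes(handle):
    """Map tax_id -> (parent tax_id, rank), taken from the first three nodes.dmp columns."""
    return {int(f[0]): (int(f[1]), f[2]) for f in read_dmp(handle)}


def lineage(taxid, nodes, names):
    """Return [(rank, name), ...] from the root down to `taxid`.

    The root node (tax_id 1) is its own parent, which ends the walk.
    """
    path = []
    while True:
        parent, rank = nodes[taxid]
        path.append((rank, names.get(taxid, str(taxid))))
        if parent == taxid:
            break
        taxid = parent
    path.reverse()
    return path


# Hypothetical three-node tree in the .dmp layout (not real NCBI records).
names = load_names(io.StringIO(
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "100\t|\tGenus_X\t|\t\t|\tscientific name\t|\n"
    "101\t|\tGenus_X species_y\t|\t\t|\tscientific name\t|\n"
    "101\t|\ttoy bug\t|\t\t|\tcommon name\t|\n"
))
nodes = load_nodes(io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "100\t|\t1\t|\tgenus\t|\n"
    "101\t|\t100\t|\tspecies\t|\n"
))

print(" > ".join(name for _, name in lineage(101, nodes, names)))
# root > Genus_X > Genus_X species_y
```

Combined with the taxid for each gi from gi_taxid_prot.dmp, lineage(taxid, nodes, names) gives the full classification of every BLAST hit. Only "scientific name" rows are kept, so synonyms and common names do not overwrite the canonical name.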

All the best,
Dan.


>I have been trying the SEALS taxonomy commands, but quite a few
>sequences come back as "not_retrieved", even though we have tried updating
>the database, etc.
>I do not want to use the BLAST web server because I have too many files
>to run.
>Please suggest any program/script that might be useful.
>
>Thanks,
>-Manisha
>

_______________________________________________
Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
