[BiO BB] Re: [ssml] Parsing taxonomy from blast output
Ian Donaldson
idonalds at blueprint.org
Fri Apr 1 15:11:35 EST 2005
Hi all
I should also mention that you can retrieve this information using the
SeqHound remote Perl API (or Java/C/C++).
No need to use up disk space or wait for downloads.
The call is SHoundTaxIDFromGi described here:
http://www.blueprint.org/seqhound/apifunctsdet.html#SHoundTaxIDFromGi
You can download the API from here:
ftp://ftp.blueprint.org/pub/SeqHound/Code/
and follow the enclosed instructions to get started, or look at the first
few pages of the SeqHound Manual:
http://www.blueprint.org/seqhound/seqhound_documentation.html
Taxid assignments to GIs are updated daily as part of the core module.
Check here:
http://seqhound.blueprint.org/report.html
Other API calls can also provide you with taxon names.
Cheers
Ian
-----Original Message-----
From: bio_bulletin_board-bounces+idonalds=blueprint.org at bioinformatics.org
[mailto:bio_bulletin_board-bounces+idonalds=blueprint.org at bioinformatics.org]
On Behalf Of Dan Bolser
Sent: April 1, 2005 12:40 PM
To: Goel, Manisha
Cc: ssml-general at bioinformatics.org;
bio_bulletin_board at bioinformatics.org; pdb-l at sdsc.edu
Subject: [BiO BB] Re: [ssml] Parsing taxonomy from blast output
On Fri, 1 Apr 2005, Goel, Manisha wrote:
>Hi All,
>
>I need to parse the blast output to get the taxonomy information.
>If I could get the taxonomy nodes associated with each gi number, this
>would also work.
Yeah, this data is here...
ftp://ftp.ncbi.nih.gov/pub/taxonomy/
See...
ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid.readme
"The gi_taxid_prot.dmp is about 17 MB and contains two columns: the
protein's gi and taxid."
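A minimal Python sketch of that lookup, assuming the BLAST hits carry
identifiers of the form gi|<number>|... and that gi_taxid_prot.dmp holds one
whitespace-separated "gi taxid" pair per line, as the readme describes. The
function names and the sample values in the usage note are made up for
illustration.

```python
import re
from typing import Dict, Iterable, Set

# Assumption: hit identifiers look like "gi|15674171|ref|NP_268346.1|".
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def extract_gis(blast_lines: Iterable[str]) -> Set[int]:
    """Collect every gi number written as 'gi|<number>|' in the given lines."""
    gis: Set[int] = set()
    for line in blast_lines:
        for match in GI_PATTERN.finditer(line):
            gis.add(int(match.group(1)))
    return gis


def load_gi_to_taxid(dmp_lines: Iterable[str], wanted: Set[int]) -> Dict[int, int]:
    """Stream the two-column gi/taxid dump, keeping only the wanted gi numbers.

    Streaming avoids holding the whole dump in memory; only rows whose gi
    is in `wanted` are stored.
    """
    mapping: Dict[int, int] = {}
    for line in dmp_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        gi = int(fields[0])
        if gi in wanted:
            mapping[gi] = int(fields[1])
    return mapping
```

Both functions accept any iterable of lines, so open file handles can be
passed directly, for example extract_gis(open("hits.txt")) followed by
load_gi_to_taxid(open("gi_taxid_prot.dmp"), gis), where hits.txt is a
hypothetical file name.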
You can then use the 'taxdump' to get the names.dmp (for the names) and
nodes.dmp (for the structure of the taxonomic tree) files (if you need
them).
See...
ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
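A rough Python sketch for reading those two files, assuming the usual
taxdump layout: fields separated by tab-pipe-tab, each row ending in
tab-pipe, names.dmp starting with tax_id, name and unique name followed by
the name class, and nodes.dmp starting with tax_id and parent tax_id. The
helper names are made up for illustration; check the readme above for the
full column list.

```python
from typing import Dict, Iterable, List


def split_dmp(line: str) -> List[str]:
    """Split one taxdump row into fields, dropping the trailing tab-pipe."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_scientific_names(names_lines: Iterable[str]) -> Dict[int, str]:
    """Map tax_id -> name, keeping only rows of class 'scientific name'."""
    names: Dict[int, str] = {}
    for line in names_lines:
        fields = split_dmp(line)
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def load_parents(nodes_lines: Iterable[str]) -> Dict[int, int]:
    """Map tax_id -> parent tax_id from the first two columns of nodes.dmp."""
    parents: Dict[int, int] = {}
    for line in nodes_lines:
        fields = split_dmp(line)
        if len(fields) >= 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def lineage(taxid: int, parents: Dict[int, int], names: Dict[int, str]) -> List[str]:
    """Walk up the tree from taxid and return the names ordered root-first.

    The root is its own parent in nodes.dmp, so the walk stops once a node
    has already been visited. Taxa without a name fall back to the number.
    """
    path: List[str] = []
    seen = set()
    current = taxid
    while current in parents and current not in seen:
        seen.add(current)
        path.append(names.get(current, str(current)))
        current = parents[current]
    path.reverse()
    return path
```

With both dictionaries loaded, lineage(taxid, parents, names) gives the
chain of names from the root down to the taxon, which is usually enough to
bin BLAST hits by kingdom or phylum.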
All the best,
Dan.
>I have been trying SEALS taxonomy commands but somehow quite a few
>sequences turn up "not_retrieved", although we have tried updating the
>database etc.
>I do not want to use the BLAST web server because I have too many files
>to run.
>Please suggest any program/script that might be useful.
>
>Thanks,
>-Manisha
>
_______________________________________________
Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board