[BiO BB] Obtaining lineage information from an NCBI taxId

Gary Van Domselaar gary at primary.bioinformatics.org
Fri Sep 28 13:39:08 EDT 2007

Hi James,

It sounds like you want a local copy of the NCBI taxonomy database:

The taxdump.tar.gz file and its accompanying readme will give you the
information you need to traverse the NCBI taxonomy tree. We have this
parsed into a MySQL database, along with some Perl and Java code to
extract lineages, which we can send your way if you are interested.
There is also a good Python / TurboGears tutorial from Andrew Dalke on
building a taxonomy server; I think you could adapt its approach to
your application pretty quickly:
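
To give a feel for the traversal, here is a minimal Python sketch that
works directly on nodes.dmp and names.dmp from the taxdump archive. It is
not our Perl/Java code and not taken from the tutorial; the function names
and the MAIN_RANKS list are my own assumptions for illustration, and the
rank labels should be checked against your copy of nodes.dmp. It walks
parent links up to the root and reports which main ranks above the query
node have no entry, which is the "Family and Class are missing" case.

```python
# Illustrative sketch only. Assumes the taxdump layout described in its
# readme: fields separated by TAB | TAB, each row ending in TAB |.
# nodes.dmp columns used: tax_id, parent tax_id, rank.
# names.dmp columns used: tax_id, name_txt, unique name, name class.

# Assumed list of "main" ranks; adjust to the labels in your nodes.dmp.
MAIN_RANKS = ["superkingdom", "phylum", "class", "order",
              "family", "genus", "species"]


def split_dmp(line):
    """Split one .dmp row into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_nodes(lines):
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp rows."""
    nodes = {}
    for line in lines:
        if line.strip():
            fields = split_dmp(line)
            nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def load_names(lines):
    """Map tax_id -> scientific name from names.dmp rows."""
    names = {}
    for line in lines:
        if line.strip():
            fields = split_dmp(line)
            if fields[3] == "scientific name":
                names[int(fields[0])] = fields[1]
    return names


def lineage(tax_id, nodes, names):
    """Return [(rank, tax_id, name), ...] ordered from the root down."""
    path = []
    while True:
        parent, rank = nodes[tax_id]
        path.append((rank, tax_id, names.get(tax_id, "")))
        if parent == tax_id:  # the root node is its own parent
            break
        tax_id = parent
    path.reverse()
    return path


def missing_ranks(path, ranks=MAIN_RANKS):
    """List main ranks above the deepest ranked node that have no entry."""
    present = {rank for rank, _, _ in path}
    found = [i for i, rank in enumerate(ranks) if rank in present]
    if not found:
        return []
    return [rank for rank in ranks[:found[-1]] if rank not in present]
```

Both loaders accept any iterable of lines, so you can pass open file
handles for nodes.dmp and names.dmp, then call lineage() with the taxId
and hand the result to missing_ranks().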



Gary Van Domselaar, PhD
Head, Bioinformatics
National Microbiology Laboratory
Public Health Agency of Canada
820 Elgin St., Winnipeg, MB, Canada R3E 3R2

Suite E-006
Phone:  +1 204 784 5994
Mobile: +1 204 230 1338
Fax:    +1 204 789 2018
gary_van_domselaar [at] phac-aspc.gc.ca
gary.vandomselaar [at] gmail.com

On Thu, 27 Sep 2007, James Wagner wrote:

> Hello, I was just trying to obtain the full phylogenetic lineage from a
> given NCBI taxonomy ID using BioPerl. What I am discovering is that some of
> these ids are missing names at certain levels. For example, for ID 1166 at
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=1128&lvl=3&keep=1&srchmode=1&unlock&lin=s
> the lineage
> Bacteria<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=2&lvl=3&keep=1&srchmode=1&unlock>;
> Cyanobacteria<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=1117&lvl=3&keep=1&srchmode=1&unlock>;
> Chroococcales<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=1118&lvl=3&keep=1&srchmode=1&unlock>;
> Microcystis<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=1125&lvl=3&keep=1&srchmode=1&unlock>
> is obtained. From the tooltips one can see that this is information for
> the Kingdom, Phylum, Order, and Genus respectively, but Family and Class
> are missing.
> While I can get this lineage from BioPerl, I cannot figure out how to find
> out specifically that Family and Class are missing, and I was wondering if
> there was some way to script NCBI (or anywhere else) to retrieve this
> without resorting to screen scraping, as these tool-tips are the only place
> that I can seem to find this information. Or is there some sort of rule in
> bacterial taxonomy that I can apply to make this easier?
> Thanks,
> James

Gary Van Domselaar, PhD
Associate Director, Bioinformatics.Org
gary at bioinformatics.org
