[BiO BB] protein sequence for all organisms

Hongyu Zhang me at hongyu.org
Tue Nov 11 15:50:42 EST 2008


My solution is to download the taxonomy files from GenBank, which map every GI number to its taxonomy ID and describe the hierarchical structure of the taxonomy tree. You can then write a program that partitions the protein NR file into separate files or folders, one per taxonomy ID, keeping only the IDs that are descendants of the eukaryote node in the taxonomy tree.

The GenBank taxonomy files are located at ftp://ftp.ncbi.nih.gov/pub/taxonomy/
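A minimal Python sketch of the partitioning approach described above follows. It assumes the files from the FTP site have already been decompressed: nodes.dmp from taxdump.tar.gz (fields separated by tab-pipe-tab, first column the taxonomy ID, second its parent), a two-column "GI, taxonomy ID" file such as gi_taxid_prot.dmp, and an NR FASTA file whose headers begin with gi|<number>|. These file names and layouts are assumptions to check against what is actually on the server. All function names and the script name partition_nr.py are hypothetical; 2759 is the NCBI taxonomy ID of Eukaryota.

```python
# Hypothetical sketch: file names and column layouts are assumptions.
import os
import sys
from typing import Dict, Iterable, Iterator, Optional, Tuple

# NCBI Taxonomy ID of the Eukaryota superkingdom.
EUKARYOTA_TAXID = 2759


def load_parents(lines: Iterable[str]) -> Dict[int, int]:
    """Map taxid -> parent taxid from nodes.dmp lines (fields split on '\\t|\\t')."""
    parents: Dict[int, int] = {}
    for line in lines:
        fields = line.split("\t|\t")
        if len(fields) >= 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def load_gi_taxids(lines: Iterable[str]) -> Dict[int, int]:
    """Map GI -> taxid from an assumed two-column 'gi<TAB>taxid' file."""
    gi_to_taxid: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            gi_to_taxid[int(parts[0])] = int(parts[1])
    return gi_to_taxid


def is_descendant(taxid: int, ancestor: int, parents: Dict[int, int],
                  cache: Dict[int, bool]) -> bool:
    """Walk up the tree from taxid; True if ancestor is reached before the root."""
    path = []
    result = False
    while True:
        if taxid in cache:
            result = cache[taxid]
            break
        path.append(taxid)
        if taxid == ancestor:
            result = True
            break
        parent = parents.get(taxid)
        if parent is None or parent == taxid:  # unknown taxid, or root (1 -> 1)
            break
        taxid = parent
    for node in path:  # every node on the walked path shares the same answer
        cache[node] = result
    return result


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text."""
    header: Optional[str] = None
    seq = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def first_gi(header: str) -> Optional[int]:
    """Extract the first GI from a header like '>gi|12345|ref|NP_000001.1| ...'."""
    fields = header[1:].split("|")
    if len(fields) >= 2 and fields[0] == "gi" and fields[1].isdigit():
        return int(fields[1])
    return None


def eukaryotic_records(records, gi_to_taxid, parents, ancestor=EUKARYOTA_TAXID):
    """Yield (taxid, header, sequence) for records whose taxid lies under ancestor."""
    cache: Dict[int, bool] = {}
    for header, seq in records:
        gi = first_gi(header)
        taxid = gi_to_taxid.get(gi) if gi is not None else None
        if taxid is not None and is_descendant(taxid, ancestor, parents, cache):
            yield taxid, header, seq


def main(nodes_path, gi_taxid_path, nr_path, out_dir):
    with open(nodes_path) as f:
        parents = load_parents(f)
    with open(gi_taxid_path) as f:
        gi_to_taxid = load_gi_taxids(f)
    os.makedirs(out_dir, exist_ok=True)
    with open(nr_path) as f:
        for taxid, header, seq in eukaryotic_records(read_fasta(f), gi_to_taxid, parents):
            # One file per taxid; simple, but reopens the output file for each record.
            with open(os.path.join(out_dir, f"{taxid}.fasta"), "a") as out:
                out.write(f"{header}\n{seq}\n")


if __name__ == "__main__":
    if len(sys.argv) == 5:
        main(*sys.argv[1:])
    else:
        print("usage: partition_nr.py nodes.dmp gi_taxid_prot.dmp nr out_dir",
              file=sys.stderr)
```

Run it as python partition_nr.py nodes.dmp gi_taxid_prot.dmp nr out_dir. The sketch uses only the first GI in each header, although an NR header can list several merged entries. It also keeps the whole GI-to-taxonomy table in memory, which takes a lot of RAM for the full file. Results are cached per taxonomy ID, so each branch of the tree is walked only once.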

