[BiO BB] mapping genbank accession to Organism name

Ryan Golhar golharam at umdnj.edu
Sun Feb 26 13:20:22 EST 2006


Use the NCBI eutils.  Submit the GenBank accession # to get the GenBank
record in XML.  The XML record will contain an entry for the organism.
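
As a rough sketch of that last step: assuming the record is requested as
GenBank XML (rettype=gb with retmode=xml), the organism should appear in a
GBSeq_organism element. The sub name and the sample fragment below are made
up for illustration; a real XML parser would be more robust than a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the organism name out of a GenBank XML (GBSeq) record.
# Returns undef if the element is not present.
sub organism_from_gbxml {
    my ($xml) = @_;
    return $xml =~ m{<GBSeq_organism>([^<]+)</GBSeq_organism>} ? $1 : undef;
}

# Made-up fragment for illustration only, not a real record.
my $sample = <<'XML';
<GBSeq>
  <GBSeq_locus>EXAMPLE</GBSeq_locus>
  <GBSeq_organism>Mus musculus</GBSeq_organism>
</GBSeq>
XML

my $organism = organism_from_gbxml($sample);
print defined $organism ? "$organism\n" : "organism not found\n";
```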

Here's a script I use to do this.  The first input is the database
(nucleotide in your case), and the search term is the accession #.  With
no options, the results are sent back in ASN.1 format.  I forget the
option to make it XML.  I think it's retmode=xml or something like that.
 
You can even put your input in a text file and redirect the file as
input to the script...
 
 
---BEGIN: efetch.pl---
#!/usr/bin/perl -w
 
use LWP::Simple;
 
my $utils = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils";
 
print "Database: ";
$db = <STDIN>;
chomp $db;
 
print "Search Term: ";
$term = <STDIN>;
chomp $term;
 
print "Options: ";
$options = <STDIN>;
chomp $options;
 
my $esearch =
"$utils/esearch.fcgi?db=$db&term=$term&usehistory=y&tool=efetch";
print "$esearch\n";
my $esearch_result = get($esearch);
 
if ($esearch_result =~ m/<Id>(\d+)<\/Id>/) {
        $id = $1;
}
if ($esearch_result =~ m/<QueryKey>(\d+)<\/QueryKey>/) {
        $key = $1;
}
if ($esearch_result =~ m/<WebEnv>(.*)<\/WebEnv>/) {
        $webenv = $1;
}
 
if (defined($id)) {
        print "ID: $id\nKey: $key\nWebEnv: $webenv\n\n";
 
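        $esearch = "$utils/efetch.fcgi?db=$db&id=$id&tool=efetch";
        # Add anything typed at the Options prompt, e.g. retmode=xml
        $esearch .= "&$options" if length $options;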
        print "$esearch\n";
        my $esummary_result = get($esearch);
        print "$esummary_result\n";
} else {
        print "$esearch_result\n";
}
---END: efetch.pl---
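
For a whole list of BLAST hits, the redirected-input idea could look
something like the sketch below. It assumes efetch accepts the
accession.version directly as the id and that rettype=gb with retmode=xml
returns a GBSeq_organism element. The sub names and the file name hits.txt
are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $utils = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils";

# Strip a BLAST-style prefix such as "gb|" or "emb|" and any trailing "|...".
sub accession_from_hit {
    my ($hit) = @_;
    $hit =~ s/^\s+|\s+$//g;    # trim whitespace and the newline
    $hit =~ s/^[a-z]+\|//;     # drop the database tag
    $hit =~ s/\|.*$//;         # drop anything after the accession
    return $hit;
}

# Build an efetch URL asking for the GenBank record as XML.
sub efetch_url {
    my ($acc) = @_;
    return "$utils/efetch.fcgi?db=nucleotide&id=$acc&rettype=gb&retmode=xml";
}

# One hit per line on STDIN, e.g.  perl organisms.pl < hits.txt
while (my $line = <STDIN>) {
    my $acc = accession_from_hit($line);
    next unless length $acc;
    require LWP::Simple;       # loaded only when there is input to fetch
    my $xml = LWP::Simple::get(efetch_url($acc));
    my ($organism) = defined $xml
        ? $xml =~ m{<GBSeq_organism>([^<]+)</GBSeq_organism>}
        : ();
    print "$acc\t", (defined $organism ? $organism : "not found"), "\n";
    sleep 1;                   # be polite to the NCBI servers
}
```

The hits file would hold one ID per line, such as gb|AC167666.4 and
emb|BX640434.1 from the question below.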
 

-----Original Message-----
From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org
[mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org
] On Behalf Of Samantha Fox
Sent: Sunday, February 26, 2006 12:58 PM
To: The general forum at Bioinformatics.Org
Subject: [BiO BB] mapping genbank accession to Organism name


Hi all,
This should be really easy, but somehow I cannot figure it out. I
BLASTed my sequences against the non-redundant nt database from NCBI.
The hits are something like gb|AC167666.4, emb|BX640434.1,
emb|BX640418.1 ...

Can someone suggest a way to get the organism name from these GenBank
accessions?

hope someone has done this already :) ..
~S

