[BiO BB] mapping genbank accession to Organism name
Ryan Golhar
golharam at umdnj.edu
Sun Feb 26 13:20:22 EST 2006
Use the NCBI E-utilities (eutils). Submit the GenBank accession number and
you get the GenBank record back in XML; the XML record contains an entry
for the organism.
Here's a script I use to do this. The first input is the database
(nucleotide in your case) and the search term is the accession number.
The third input, Options, is added to the EFetch request. With no options
the record comes back in ASN.1 format. To get XML instead, enter
retmode=xml (rettype=gb&retmode=xml returns GenBank-style GBSeq XML).
You can also put the three inputs in a text file, one per line, and
redirect that file into the script (perl efetch.pl < input.txt).
---BEGIN: efetch.pl---
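#!/usr/bin/perl
# efetch.pl: look up a record with NCBI E-utilities (ESearch, then EFetch)
use strict;
use warnings;
use LWP::Simple qw(get);
use URI::Escape qw(uri_escape);

$| = 1;    # flush the prompts before waiting for input

# NCBI serves E-utilities over HTTPS (LWP needs LWP::Protocol::https for this)
my $utils = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

print "Database: ";
my $db = <STDIN>;
chomp $db;
print "Search Term: ";
my $term = <STDIN>;
chomp $term;
print "Options: ";
my $options = <STDIN>;
chomp $options;

# Step 1: ESearch turns the search term (e.g. an accession) into a UID
my $esearch = "$utils/esearch.fcgi?db=$db&term=" . uri_escape($term)
    . "&usehistory=y&tool=efetch";
print "$esearch\n";
my $esearch_result = get($esearch);
die "ESearch request failed\n" unless defined $esearch_result;

my ($id)     = $esearch_result =~ m/<Id>(\d+)<\/Id>/;
my ($key)    = $esearch_result =~ m/<QueryKey>(\d+)<\/QueryKey>/;
my ($webenv) = $esearch_result =~ m/<WebEnv>([^<]+)<\/WebEnv>/;

if (defined $id) {
    print "ID: $id\nKey: ", ($key // ''), "\nWebEnv: ", ($webenv // ''), "\n\n";
    # Step 2: EFetch retrieves the record. Options (e.g. "retmode=xml") are
    # appended as-is; with none, NCBI returns ASN.1.
    my $efetch = "$utils/efetch.fcgi?db=$db&id=$id&tool=efetch";
    $efetch .= "&$options" if length $options;
    print "$efetch\n";
    my $efetch_result = get($efetch);
    die "EFetch request failed\n" unless defined $efetch_result;
    print "$efetch_result\n";
} else {
    # No UID found: print the raw ESearch reply, which usually says why
    print "$esearch_result\n";
}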
---END: efetch.pl---
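The script prints the whole record, but the question only needs the organism. Below is a minimal sketch of that last step, assuming the record was fetched from EFetch with rettype=gb&retmode=xml, so the reply is GBSeq XML in which each <GBSeq> record carries <GBSeq_accession-version> and <GBSeq_organism> elements. The helper name accession_to_organism is hypothetical, not part of the original script. A regular expression is enough for these flat elements, but it does not decode XML entities such as &amp;; a real parser (XML::LibXML, for instance) is the more robust choice.

```perl
#!/usr/bin/perl
# Sketch (not part of the original script): map accession.version -> organism
# from GBSeq XML, i.e. EFetch output requested with rettype=gb&retmode=xml.
use strict;
use warnings;

# Hypothetical helper: returns a hash ref { "ACCESSION.VERSION" => "organism" }
sub accession_to_organism {
    my ($xml) = @_;
    my %organism_for;
    # Each record is one <GBSeq> ... </GBSeq> element inside <GBSet>;
    # child tags like <GBSeq_locus> do not match the literal "<GBSeq>"
    while ($xml =~ m{<GBSeq>(.*?)</GBSeq>}gs) {
        my $record = $1;
        my ($acc) = $record =~ m{<GBSeq_accession-version>([^<]+)</GBSeq_accession-version>};
        my ($org) = $record =~ m{<GBSeq_organism>([^<]+)</GBSeq_organism>};
        # Skip records missing either field rather than storing undef
        $organism_for{$acc} = $org if defined $acc && defined $org;
    }
    return \%organism_for;
}
```

Pass it the text of an EFetch reply (for instance the XML that efetch.pl prints) and loop over the returned hash to print one accession and organism per line.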
-----Original Message-----
From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org
[mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org] On Behalf Of Samantha Fox
Sent: Sunday, February 26, 2006 12:58 PM
To: The general forum at Bioinformatics.Org
Subject: [BiO BB] mapping genbank accession to Organism name
Hi all,
This should be really easy, but somehow I cannot figure it out. I
BLASTed my sequences against the non-redundant nt database from NCBI.
The hits are something like gb|AC167666.4, emb|BX640434.1,
emb|BX640418.1 ...
Can someone suggest a way to get the organism name from these GenBank
accessions?
Hope someone has done this already :)
~S
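The hit identifiers in the question (gb|AC167666.4, emb|BX640434.1, ...) carry a database tag that has to be removed before the accession can be sent to EFetch. Here is a minimal sketch, assuming IDs of the form tag|ACCESSION.VERSION with an optional leading gi|number| (older nr reports often used that prefix). It builds one EFetch URL for the whole batch, relying on EFetch accepting a comma-separated list of accession.version identifiers in id for nucleotide records. The helper names accession_from_blast_id and efetch_url are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: turn BLAST hit IDs such as "gb|AC167666.4" or "gi|123|emb|BX640434.1|"
# into bare accession.version strings and build one EFetch URL for all of them.
use strict;
use warnings;

my $utils = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

# Hypothetical helper: return the field after the database tag (gb, emb, dbj, ...)
sub accession_from_blast_id {
    my ($id) = @_;
    $id =~ s/^gi\|\d+\|//;           # drop an optional leading "gi|<number>|"
    my @fields = split /\|/, $id;    # e.g. ("gb", "AC167666.4")
    return @fields > 1 ? $fields[1] : $fields[0];    # bare IDs pass through
}

# Hypothetical helper: one EFetch request for many accessions, asking for
# GenBank-style XML (rettype=gb&retmode=xml), one <GBSeq> record per accession
sub efetch_url {
    my (@accessions) = @_;
    return "$utils/efetch.fcgi?db=nucleotide&id=" . join(',', @accessions)
        . "&rettype=gb&retmode=xml&tool=efetch";
}
```

For example, efetch_url(map { accession_from_blast_id($_) } 'gb|AC167666.4', 'emb|BX640434.1') gives a single URL to pass to LWP::Simple's get. For very long ID lists, an HTTP POST or EPost is the safer route, and NCBI asks clients without an API key to stay under three requests per second.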