[Biodevelopers] Batch download of RefSeq or dbSNP?
    Titus Brown 
    titus at caltech.edu
       
    Wed Jul  5 20:24:13 EDT 2006
    
    
  
On Wed, Jul 05, 2006 at 02:25:25PM -0400, Christopher Dwan wrote:
-> 
-> I'm writing some scripts to download data.  Specifically, I need  
-> FASTA versions of:
-> 
-> * All the "finished" mouse proteins in refseq
-> * All the "finished" human proteins in refseq
-> * All the sequences in dbSNP
-> 
-> Ideally, my script would produce updated versions of these datasets  
-> nightly or so.  I would prefer to do this without spamming the NCBI  
-> servers (or my bandwidth providers) too much.
-> 
-> I've messed around with the bioperl Bio::DB routines enough to get  
-> really confused by ENTREZ queries.  I've also looked at the FASTA  
-> source available through FTP from NCBI, and that confused me more.
-> 
-> How do smart people do this sort of thing these days?
I don't know if I'm smart, but I use the NCBI Web services interface
(EUtils) directly:
	http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
You can also use SOAP:
	http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html
The three tasks you mention above should be pretty easy with the basic
EUtils interface.
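
As a rough illustration of what "basic EUtils" can look like, here is a
minimal Python sketch: one ESearch call that parks the result set on
NCBI's history server, then EFetch calls that pull FASTA in batches.
Several things here are assumptions, not anything from the help pages
linked above: the base URL is the present-day E-utilities address, the
function names are made up for this example, and the query term in the
usage comment is only one guess at what "finished" might mean in Entrez
terms. Check the term against the Entrez documentation before relying
on it.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed present-day E-utilities base URL.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term):
    """URL for an ESearch that keeps its result set on the history server."""
    # retmax=0: we only want the count and the history handles, not the IDs.
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return EUTILS + "/esearch.fcgi?" + urllib.parse.urlencode(params)


def parse_esearch(xml_text):
    """Return (count, query_key, webenv) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return (int(root.findtext("Count")),
            root.findtext("QueryKey"),
            root.findtext("WebEnv"))


def build_efetch_url(db, query_key, webenv, retstart, retmax):
    """URL for one batch of FASTA records from a stored result set."""
    params = {"db": db, "query_key": query_key, "WebEnv": webenv,
              "rettype": "fasta", "retmode": "text",
              "retstart": retstart, "retmax": retmax}
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def download_fasta(db, term, out_path, batch=500, pause=1.0):
    """Search once, then fetch in batches, sleeping between requests."""
    with urllib.request.urlopen(build_esearch_url(db, term)) as reply:
        count, query_key, webenv = parse_esearch(reply.read())
    with open(out_path, "wb") as out:
        for start in range(0, count, batch):
            url = build_efetch_url(db, query_key, webenv, start, batch)
            with urllib.request.urlopen(url) as reply:
                out.write(reply.read())
            time.sleep(pause)  # be polite to the NCBI servers
    return count


# Hypothetical usage (makes network requests, so left commented out):
# term = '"Mus musculus"[Organism] AND srcdb_refseq[PROP]'
# download_fasta("protein", term, "mouse_refseq.fasta")
```

The batch size and pause are arbitrary choices for the sketch. Running
the search once and paging through the stored result keeps the number
of requests low, which fits the goal of not spamming the servers.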
cheers,
--titus
    
    