[Bioclusters] blast db update

Peiran Song bioclusters@bioinformatics.org
Mon, 18 Oct 2004 10:54:50 -0700 (PDT)


Thank you for your reply and advice!

What still bugs me is the updating of a huge database like nt: how do I avoid
downloading the whole thing every time? Do you do incremental updates? What is
your strategy there?


>From: David Adelson <david.adelson@tamu.edu>
>Subject: Re: [Bioclusters] blast db update
>Date: Mon, 18 Oct 2004 10:59:42 -0500
>sorry about the first reply, I need to read things before I reply to  
>For the type of download you refer to, you can use an Entrez query with
>one of the SOAP tools.
>efetchseq_help.html#SequenceDatabases for some details.
>You should be able to write a Perl script, based on the example they
>provide and the organism name (taxonomy ID), that retrieves just the
>sequences from the organism you want, from the db you want, in FASTA
>format.
>For example, ("txid4530"[Organism] AND biomol_genomic[PROP]) should
>return all rice genomic sequences.
>Just have cron run it and then run btformatdb on the result as usual.
>Hope this helps.
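
A minimal sketch of the query-and-fetch step described above, written in
Python rather than Perl. It is not the script from the NCBI example page: it
swaps the SOAP tools for the plain HTTP E-utilities calls (esearch and efetch).
The function names, the "nuccore" db choice, the batch size and the output path
are all assumptions made for illustration. The optional reldate argument limits
the search to records modified in the last N days, which may be one way to
approach incremental updates.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_query(taxid, genomic_only=True):
    """Build an Entrez term like the one quoted in the message above."""
    term = f'"txid{taxid}"[Organism]'
    if genomic_only:
        term += " AND biomol_genomic[PROP]"
    return term


def esearch_url(term, db="nuccore", reldate=None):
    """URL for an esearch call that stores its result set on the history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    if reldate is not None:
        # Assumption: modification date is the right filter for daily updates.
        params.update({"reldate": reldate, "datetype": "mdat"})
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv, query_key, db="nuccore", retstart=0, retmax=500):
    """URL for one batch of FASTA records from a stored result set."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def download_fasta(taxid, out_path, reldate=None, batch=500):
    """Run the search, then append every batch of FASTA text to out_path."""
    with urlopen(esearch_url(build_query(taxid), reldate=reldate)) as resp:
        root = ET.parse(resp).getroot()
    count = int(root.findtext("Count"))
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    with open(out_path, "wb") as out:
        for start in range(0, count, batch):
            url = efetch_url(webenv, query_key, retstart=start, retmax=batch)
            with urlopen(url) as resp:
                out.write(resp.read())
            time.sleep(0.5)  # pause between requests to go easy on the server
    return count


if __name__ == "__main__":
    # Hypothetical usage: rice genomic records modified in the last day.
    n = download_fasta(4530, "rice_genomic.fasta", reldate=1)
    print(f"fetched {n} records")
```

A cron entry could call a script like this nightly and then format the
resulting FASTA file as usual. Whether a date-limited fetch is enough depends on
how removed or replaced records need to be handled, which this sketch does not
address.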
>On Oct 14, 2004, at 4:37 PM, Peiran Song wrote:
>> Hi,
>> This has been a topic before, but I still need suggestions on the job
>> I am trying to do. I need to build a local GenBank human, mouse and
>> zebrafish BLAST database that is updated fairly frequently, if not
>> nightly, and to be able to run btblastall from the iNquiry software to
>> parallelize the BLAST jobs.
>> I can think of two ways to get the database, but I am troubled by the
>> updates with both.
>> One is to get the nt database and run BLAST with a gi list of the
>> species of interest. I will have to get the FASTA data from NCBI so that
>> I can format it in a way that btblastall can parallelize. But I don't
>> think the NCBI site supports rsync, true? Then what are people's
>> solutions for frequent updates? Another problem with this strategy is
>> that the gi list also has to be updated, and I don't have a good idea
>> for that either...
>> Another choice is to parse the GenBank release to get the initial data,
>> and use the daily files for updates. But as fmerge is no longer
>> supported, is there a good way to do the merge with the NCBI db format?
>> (The WU BLAST package has a utility to achieve that.)
>> Help me out!
>> Thanks,
>> Peiran Song
>> Zebrafish Information Network
>> _______________________________________________
>> Bioclusters maillist  -  Bioclusters@bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters