Good to know! Thanks again for your input!

Peiran

>From: David Adelson <david.adelson@tamu.edu>
>Subject: Re: [Bioclusters] blast db update
>To: bioclusters@bioinformatics.org
>Date: Mon, 18 Oct 2004 13:28:39 -0500
>
>Peiran
>
>We just download the whole thing. nt is actually not so huge compared
>to raw trace file data for canine, bovine and chicken :-). We have
>used perl scripts along with entrez queries to download portions of
>htgs or gss for organism-specific dbs, and that avoids having to
>download the whole thing if you just want one organism. We do this
>mainly to speed up the blast searches, so that people working on
>sorghum or rice don't have to wait for blast to search the 85% of htgs
>they are not interested in.
>
>Dave
>
>On Oct 18, 2004, at 12:54 PM, Peiran Song wrote:
>
>> Dave,
>>
>> Thank you for your reply and advice!
>>
>> The thing that still bugs me is updating a huge database like nt: how
>> to avoid downloading the whole thing every time. Do you do incremental
>> updates? What is your strategy there?
>>
>> thanks,
>> Peiran
>>
>>> From: David Adelson <david.adelson@tamu.edu>
>>> Subject: Re: [Bioclusters] blast db update
>>> To: bioclusters@bioinformatics.org
>>> Date: Mon, 18 Oct 2004 10:59:42 -0500
>>>
>>> Peiran,
>>>
>>> Sorry about the first reply; I need to read things before I reply to
>>> them.
>>>
>>> For the type of download you refer to, you can use an entrez query with
>>> one of the SOAP tools.
>>>
>>> See
>>> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html#SequenceDatabases
>>> for some details.
>>>
>>> You should be able to write a perl script, based on the example they
>>> provide and the organism name (taxonomy ID), that retrieves just the
>>> sequences from the organism you want from the db you want in
>>> fasta format.
>>>
>>> For example, ("txid4530"[Organism] AND biomol_genomic[PROP]) should
>>> return all rice genomic sequences.
>>>
>>> Just have cron run it and then btformatdb it as usual.
>>>
>>> Hope this helps.
>>>
>>> Dave
>>>
>>> On Oct 14, 2004, at 4:37 PM, Peiran Song wrote:
>>>
>>>> Hi,
>>>>
>>>> This has been a topic before, but I still need suggestions for the
>>>> job I am trying to do. I need to build a local GenBank human, mouse
>>>> and zebrafish blast database that is updated fairly frequently, if
>>>> not nightly, and be able to run btblastall from the iNquiry software
>>>> to parallelize blast jobs.
>>>>
>>>> I can think of two ways to get the database, but am troubled by the
>>>> updates with both.
>>>>
>>>> One is to get the nt database and run blast with a gi list of the
>>>> species of interest. I would have to get the FASTA data from NCBI in
>>>> order to format it in a way that btblastall can parallelize. But I
>>>> don't think the NCBI site supports rsync, true? Then what are
>>>> people's solutions for frequent updates? Another problem with this
>>>> strategy is that the gi list also has to be updated, and I don't have
>>>> a good idea for that either...
>>>>
>>>> Another choice is to parse the GenBank release to get the initial
>>>> data, and use the daily files for updates. But as fmerge is no longer
>>>> supported, is there a good way to do the merge with the NCBI db
>>>> format? (The WU BLAST package has a utility to achieve that.)
>>>>
>>>> Help me out!
>>>>
>>>> Thanks,
>>>> Peiran Song
>>>>
>>>> Zebrafish Information Network
>>>>
>>>> _______________________________________________
>>>> Bioclusters maillist - Bioclusters@bioinformatics.org
>>>> https://bioinformatics.org/mailman/listinfo/bioclusters
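
For anyone trying the organism-specific download suggested earlier in the thread, here is a minimal sketch of how the quoted Entrez term could be turned into E-utilities request URLs. It is written in Python rather than perl and is not the script Dave's group uses. The function names, the `nuccore` database name, the batch size of 500, and the use of the history server (`usehistory`, `WebEnv`, `query_key`) are all assumptions made for illustration; only the query term itself comes from the thread. The sketch only builds URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities (assumed endpoint; the thread links to
# an older help page on the same host).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def organism_term(taxid, extra="biomol_genomic[PROP]"):
    """Build an Entrez term like the one quoted in the thread."""
    term = f'"txid{taxid}"[Organism]'
    if extra:
        term = f"{term} AND {extra}"
    return term


def esearch_url(term, db="nuccore"):
    """URL for an ESearch that keeps its result set on the history server."""
    # retmax=0: we only want the WebEnv/query_key handle, not the ID list.
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv, query_key, retstart=0, retmax=500, db="nuccore"):
    """URL for one batch of FASTA records from a stored ESearch result."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    rice = organism_term(4530)  # taxonomy ID taken from the thread
    print(rice)
    print(esearch_url(rice))
```

A cron job along the lines described above would fetch the ESearch URL, read `WebEnv` and `QueryKey` from the response, loop over `efetch_url` in batches while increasing `retstart`, append the FASTA output to a file, and then run the usual formatting step on it.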