[Bioclusters] blast db update

David Adelson bioclusters@bioinformatics.org
Mon, 18 Oct 2004 10:59:42 -0500


Peiran,

Sorry about the first reply; I need to read things before I reply to
them.

For the type of download you describe, you can use an Entrez query with
one of the SOAP tools.

See
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html#SequenceDatabases
for some details.

You should be able to write a Perl script, based on the example they
provide and the organism's taxonomy ID, that retrieves just the
sequences from the organism you want, from the db you want, in
FASTA format.

For example, ("txid4530"[Organism] AND biomol_genomic[PROP]) should
return all rice genomic sequences.
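
A minimal sketch of how such a query could be turned into E-utilities
requests, written in Python rather than Perl. It only builds the URLs and
makes no network calls. The base URL, the "nuccore" database name, the
batch size, and the helper names (build_term, esearch_url, efetch_url) are
assumptions for illustration and are not taken from the NCBI example
mentioned above.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities (assumed endpoint, not from the original message).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_term(taxid, prop="biomol_genomic"):
    """Build an Entrez term such as ("txid4530"[Organism] AND biomol_genomic[PROP])."""
    return f'("txid{taxid}"[Organism] AND {prop}[PROP])'


def esearch_url(term, db="nuccore"):
    """URL that runs the search and keeps the hits on NCBI's history server."""
    # retmax=0: we only want the history handle, not the ID list itself.
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(web_env, query_key, retstart=0, retmax=500, db="nuccore"):
    """URL that fetches one batch of the stored hits as FASTA text."""
    params = {
        "db": db,
        "WebEnv": web_env,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Print the search URL for rice (taxonomy ID 4530) genomic sequences.
    print(esearch_url(build_term(4530)))
```

A full script would request the esearch URL, read the WebEnv and QueryKey
values from the XML it returns, and then call efetch_url repeatedly with an
increasing retstart, appending each batch to one FASTA file.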

Then just have cron run the script and btformatdb the result as usual.

Hope this helps.

Dave

On Oct 14, 2004, at 4:37 PM, Peiran Song wrote:

> Hi,
>
> This has been a topic before, but I am still in need of suggestions on
> the job that I am trying to do. I need to build a local Genbank human,
> mouse and zebrafish blast database which is updated fairly frequently,
> if not nightly, and to be able to run btblastall from the iNquiry
> software to parallelize blast jobs.
>
> I could think of two ways to get the database, but am troubled with the
> updates on both.
>
> One is to get the nt database and run blast with a gi list of the species
> of interest. I will have to get FASTA data from NCBI in order to format it
> in a way that btblastall can parallelize. But I don't think the
> NCBI site supports rsync, true? Then what are people's solutions for
> frequent updates? Another problem with this strategy is that the gi list
> also has to be updated, and I don't have a good idea for that either...
>
> Another choice is to parse the genbank release to get the initial data, and
> use the daily files for updates. But as fmerge is no longer supported, is
> there a good way to do the merge with the NCBI db format? (The WU BLAST
> package has a utility to achieve that.)
>
> Help me out!
>
> Thanks,
> Peiran Song
>
> Zebrafish Information Network
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>