[Bioclusters] breaking up NCBI databases

tristan bioclusters@bioinformatics.org
Sun, 4 May 2003 20:52:43 -0700


Hello Jeremy,

I try to use this list as a source of information rather than a marketing
tool, but since you asked, this may interest you.

Paracel Blast does an excellent job of splitting up databases and
distributing them across a Linux cluster. You also do not have to format the
database for the specific number of nodes you are using.  Another advantage
of the way we break databases up is that searches against large databases and
long sequences, which would normally be limited by the memory on a node, can
run to completion.
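
To make the idea of node-independent splitting concrete, here is a minimal
Python sketch. It only illustrates the general technique and is not Paracel
Blast's actual algorithm: it cuts a FASTA stream into chunks capped by a
residue budget, so the chunk size follows from the memory you want to spend
per node rather than from the node count. The names parse_fasta,
chunk_by_residues and max_residues are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk_by_residues(records: Iterable[Record],
                      max_residues: int) -> Iterator[List[Record]]:
    """Group records into chunks holding at most max_residues residues.

    A single sequence longer than the budget still gets a chunk of its
    own, so no record is ever dropped or cut in half.
    """
    if max_residues <= 0:
        raise ValueError("max_residues must be positive")
    chunk: List[Record] = []
    size = 0
    for header, seq in records:
        # Close the current chunk before it would exceed the budget.
        if chunk and size + len(seq) > max_residues:
            yield chunk
            chunk, size = [], 0
        chunk.append((header, seq))
        size += len(seq)
    if chunk:
        yield chunk
```

Each chunk could then be formatted and searched on its own, and the chunks
handed out to whichever nodes are free.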

There are many other advantages to Paracel Blast, including acceleration
(200%-300% on average), an efficient job queuing manager, integration of the
Paracel Filtering Package, and the ability to handle large databases and
queries.  All of this significantly improves the price/performance of the
Linux cluster.

If you would like to try a free evaluation of the software, I would be happy
to help.  Likewise, feel free to contact me anytime with questions or
requests.

Best regards,

Tristan

Tristan Gill
Bioinformatics Account Manager
Paracel Inc.
1055 E. Colorado Blvd.
Pasadena, CA 91106
tristan@paracel.com
Office:  626-744-2064
Cell:    626-327-0707
Fax:     626-744-2001


> -----Original Message-----
> From: bioclusters-admin@bioinformatics.org
> [mailto:bioclusters-admin@bioinformatics.org]On Behalf Of Jeremy Mann
> Sent: Thursday, May 01, 2003 3:30 PM
> To: bioclusters@bioinformatics.org
> Subject: [Bioclusters] breaking up NCBI databases
>
>
>
> I am curious if anyone knows of any commercial or open source solution for
> breaking up the NCBI dbs into various sizes. Here, our present solution is
> NFS mounts of /ncbi to each cluster node. Today, we gave a go at
> submitting numerous BLAST jobs with PBS. Boy, talk about a complete
> performance drain (namely nfsd). I haven't seen so many switch lights
> going non-stop in a LONG time ;)
>
> I have been experimenting with mpiBLAST (using 4 test nodes). So far it's
> worked extremely well. I like the fact that its formatdb formats equal
> segments for however many nodes you specify. But this database is only
> useful when using mpiBLAST. I want to try to use one version of the
> database for all implementations (we also use wwwblast and command line
> tools). And since the BLAST dbs are precompiled, we have to use fastacmd
> to revert to unformatted, THEN run the mpiblast formatdb. Now do this for
> 20 nodes ;( If we choose this method we would have to update once a week
> instead of the present nightly updates.
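
On the "equal segments" point: one simple way to get segments of roughly
equal size for a given node count is greedy balancing, where the longest
sequences are placed first and each one goes to the segment that is currently
smallest. The Python sketch below shows that technique only. It is an
assumption for illustration, not a description of what mpiBLAST's formatdb
does internally, and balance_segments and its arguments are hypothetical
names.

```python
import heapq
from typing import Dict, List


def balance_segments(lengths: Dict[str, int],
                     n_segments: int) -> List[List[str]]:
    """Assign sequence IDs to n_segments groups of similar total length.

    lengths maps a sequence ID to its length in residues.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    # Min-heap of (total residues so far, segment index).
    heap = [(0, i) for i in range(n_segments)]
    heapq.heapify(heap)
    segments: List[List[str]] = [[] for _ in range(n_segments)]
    # Longest first; ties are broken by ID so the result is deterministic.
    for seq_id, length in sorted(lengths.items(),
                                 key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        segments[idx].append(seq_id)
        heapq.heappush(heap, (total + length, idx))
    return segments
```

The result is a heuristic balance rather than a guaranteed optimum, and the
segments still have to be written out and formatted separately.
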
>
> Now there has been talk lately on this list regarding installing an extra
> hard drive in each node and using that as the database drive. After today I
> am completely sold on doing this, seeing how drives are very, very cheap.
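
If you do go with a local database drive, each node needs some way to pull
updates from the master copy. A minimal Python sketch of that staging step
follows, assuming a flat directory of database files and assuming that an
unchanged size and modification time mean an unchanged file. The function
names and any paths you pass in are hypothetical, and a tool such as rsync
would be the more usual choice in practice.

```python
import os
import shutil
from typing import List


def _is_current(src: str, dst: str) -> bool:
    """True if dst exists with the same size and mtime (in seconds) as src."""
    if not os.path.exists(dst):
        return False
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def stage_database(source_dir: str, local_dir: str) -> List[str]:
    """Copy new or changed files from source_dir to local_dir.

    Returns the names of the files that were copied. Subdirectories are
    ignored in this sketch.
    """
    os.makedirs(local_dir, exist_ok=True)
    copied: List[str] = []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(local_dir, name)
        if _is_current(src, dst):
            continue
        # Copy to a temporary name first so a running search never sees
        # a half-written file, then rename it into place.
        tmp = dst + ".part"
        shutil.copy2(src, tmp)  # copy2 also preserves the mtime
        os.replace(tmp, dst)
        copied.append(name)
    return copied
```

Run from a nightly job on each node, this touches the NFS server only for
files that actually changed since the last update.
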
>
> I guess the ending question is, we would like to use one database (with
> equal segments for 22 nodes) for parallel, www blast and command line
> BLAST programs. Does such a thing exist or am I just wishing?
>
> Oh, one more catch, we also use SAM, PFam and GCG, which use our existing
> NCBI dbs.
>
>
>
>
> --
> Jeremy Mann
> jeremy@biochem.uthscsa.edu
>
> University of Texas Health Science Center
> Bioinformatics Core Facility
> http://www.bioinformatics.uthscsa.edu
> Phone: (210) 567-2672
>
>
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters