[Bioclusters] blast and nfs
Hunter Matthews
bioclusters@bioinformatics.org
21 Apr 2003 17:32:21 -0400
On Mon, 2003-04-21 at 15:18, Chris Dagdigian wrote:
>
> All you need to do is have enough local disk in each of your compute
> nodes to hold all (or some) of your BLAST datasets. The idea is that you
> use the NFS mounted blast databases only as a 'staging area' for
> rsync'ing or copying your files to scratch or temp space on your compute
> nodes. Given the low cost of 40-80 GB IDE disk drives, this is a quick
> and easy way to get around NFS-related bottlenecks.
>
> Each search can then be done against local disk on each compute node
> rather than all nodes hitting the NFS fileserver and beating it to death...
>
> This is generally what most BLAST farm operators will do as a "first
> pass" approach. It works very well and is pretty much standard practice
> these days.
>
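A minimal sketch of the staging step described above, for illustration
only. It is not an existing script from the list. It assumes the NFS
staging area and the local scratch space are both plain directories, and
it uses a pure-Python copy as a stand-in for rsync. The function name
stage_databases and the paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def stage_databases(nfs_dir, scratch_dir):
    """Copy new or changed files from nfs_dir into scratch_dir.

    Returns the sorted list of file names that were actually copied.
    """
    src_root = Path(nfs_dir)
    dst_root = Path(scratch_dir)
    dst_root.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(src_root.iterdir()):
        if not src.is_file():
            continue  # only flat database files are staged in this sketch
        dst = dst_root / src.name
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            same_size = dst_stat.st_size == src_stat.st_size
            not_older = int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            if same_size and not_older:
                continue  # local copy is current, so skip the NFS read
        # Copy to a temporary name first so a running search never
        # sees a half-written database file.
        tmp = dst.with_name(dst.name + ".part")
        shutil.copy2(src, tmp)  # copy2 preserves the modification time
        tmp.replace(dst)
        copied.append(src.name)
    return copied
```

Each node would run something like
stage_databases("/nfs/blastdb", "/scratch/blastdb") before its searches
(from cron or a job prolog, say) and then point BLAST at the local copy.
After the first run only changed files cross the network, which is the
same effect rsync gives you.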
Are there any scripts or instructions available for either the
first-pass or the second-pass setup?
I'm enough of a Unix admin to do the work, but not enough of a
biologist to always understand what NCBI BLAST wants (especially for
the second approach).
> The "second pass" approach is more complicated and involves splitting up
> your blast datasets into RAM-sized chunks, distributing them across the
> nodes in your cluster and then multiplexing your query across all the
> nodes to get faster throughput times. This is harder to implement and is
> useful only for long queries against big databases as there is a certain
> amount of overhead required to merge your multiplexed query results back
> into one human- or machine-parsable file.
>
> People only implement the 'second pass' approach when they really need
> to. Usually in places where pipelines are constantly repeating the same
> big searches over and over again.
>
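A rough sketch of the two bookkeeping pieces of the second-pass approach:
splitting a FASTA database into size-bounded chunks, and merging the
per-chunk hits back into one ranked list. It is an illustration under
stated assumptions, not a tested pipeline. Residue count stands in for
"RAM-sized", the hit tuples (subject_id, evalue, bit_score) are a
hypothetical simplified format, and formatting, distributing and
searching each chunk are left out entirely.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def pack_chunks(records, max_residues):
    """Group records into chunks of at most max_residues residues.

    A single record larger than the limit still gets a chunk to itself.
    """
    chunks, current, size = [], [], 0
    for header, seq in records:
        if current and size + len(seq) > max_residues:
            chunks.append(current)
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        chunks.append(current)
    return chunks


def merge_hits(per_chunk_hits, limit=None):
    """Merge per-chunk hit lists into one list, best hit first.

    Hits are ordered by ascending E-value, then by descending bit score.
    """
    merged = [hit for hits in per_chunk_hits for hit in hits]
    merged.sort(key=lambda hit: (hit[1], -hit[2]))
    return merged[:limit]
```

One caveat on the merge step: E-values depend on the size of the
database searched, so hits from separate chunks are only comparable if
every chunk search is told the size of the full database (blastall has
the -z option for the effective database length). Without that, sorting
the merged hits by E-value can give a misleading order.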
>
> My $.02 of course
>
> -Chris
> www.bioteam.net
>
--
Hunter Matthews Unix / Network Administrator
Office: BioScience 145/244 Duke Univ. Biology Department
Key: F0F88438 / FFB5 34C0 B350 99A4 BB02 9779 A5DB 8B09 F0F8 8438
Never take candy from strangers. Especially on the internet.