[Bioclusters] parallel blast???

Chris Dagdigian bioclusters@bioinformatics.org
Sat, 21 Sep 2002 19:01:34 -0400


Last time I looked, solid state disks were amazingly expensive. I was
thinking about trying them out as swap devices on a big AlphaServer but
ended up deciding to spend the money on more physical memory for the
system.

In a blast or blast-farm context I'd probably skip the solid state
disks and put the databases into a ramdisk instead. That would be
cheaper, since you don't really need the
data-is-kept-when-power-goes-away guarantee or the backup hard disk
that solid state systems give you. An SSD is also limited by whatever
pipe connects it to the system (SCSI?).
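
A minimal sketch of the ramdisk idea in Python, assuming a ramdisk or
tmpfs is already mounted somewhere on the node. The function name and
the mount point are hypothetical, not part of any blast tooling. It
only checks that the database files fit in the free space and then
copies them in:

```python
import os
import shutil


def stage_to_ramdisk(db_files, ramdisk_dir):
    """Copy database files into ramdisk_dir if they fit; return new paths.

    ramdisk_dir is assumed to be an already-mounted ramdisk/tmpfs.
    """
    needed = sum(os.path.getsize(f) for f in db_files)
    free = shutil.disk_usage(ramdisk_dir).free
    if needed > free:
        raise OSError(
            f"need {needed} bytes but only {free} free in {ramdisk_dir}"
        )
    # shutil.copy returns the path of the newly created file.
    return [shutil.copy(f, ramdisk_dir) for f in db_files]
```

You would then point the search at the returned paths rather than at
the copy on local disk or NFS.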

Even ramdisks are of limited utility given the size and growth rate of
some of the more common sequence databases. Eventually the databases
outgrow the memory you can put in a machine.

That said, if you put 1 or 2 GB ramdisks in each of your cluster nodes
and set up a system for chunking blast databases into ramdisk-friendly
sizes, you could build a really fast blast farm. The performance
bottleneck would then become the time and resources needed to merge the
XML output from N queries against split databases into a single result
file. I've seen such systems in the past, and in some cases merging the
results took longer than the actual search did.
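
A rough sketch of the two halves in Python, under some loud
assumptions: the database starts life as a FASTA file, and the
per-chunk hits have already been pulled out of the XML into
(bit_score, subject_id) tuples sorted best-first. All function names
here are made up for illustration and no real blast output is parsed.
Bit scores are used for the merge because they are comparable across
chunks, whereas E-values depend on the size of the database searched
and would need to be rescaled to the full database.

```python
import heapq
import itertools


def fasta_records(lines):
    """Group FASTA lines into records (header line plus sequence lines)."""
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def chunk_records(records, max_bytes):
    """Pack whole records into chunks of at most max_bytes characters.

    A single record larger than max_bytes still gets a chunk of its own.
    """
    chunk, size = [], 0
    for rec in records:
        rec_size = sum(len(line) for line in rec)
        if chunk and size + rec_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(rec)
        size += rec_size
    if chunk:
        yield chunk


def merge_hits(per_chunk_hits, top_n):
    """Merge per-chunk hit lists (each sorted by descending bit score)."""
    merged = heapq.merge(*per_chunk_hits, key=lambda hit: -hit[0])
    return list(itertools.islice(merged, top_n))
```

The merge itself is cheap when the hits are already small sorted lists.
The expensive part in the systems I'm describing is getting from N
large XML files to that point.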

-Chris



On Tuesday, September 17, 2002, at 08:38 AM, Steve Gaudet wrote:

> Hello Chris,
>
>> <snip>
>>
>> You can have the fastest server on earth but if you are searching
>> with blast against an NFS mounted database and your network or
>> fileserver is slow then your blast searching speeds will be
>> horrible. Give me a small number of speedy Linux boxes and I can
>> bring a $300,000 NFS/NAS system to its knees. Storage does matter.
>
> Anyone ever look at or try solid state disks?
>
>
>> <snip>