Many biologists are wary of customized code or site-specific Blast solutions because they want to be 100% sure that the statistics and alignments they get back from a search are directly comparable to what a vanilla wu-blast or ncbi-blast search would return. Anything that does not return exactly the same results, scores, p-values and alignments as a standard command-line search will likely cause uneasiness and questions about the reproducibility of the work. The "best" Blast servers for biologists that I have seen do not try to reinvent the wheel with whizzy new implementations of standard heuristic algorithms, especially when blast is (a) embarrassingly parallel anyway and (b) performs amazingly well on dual-CPU AMD or Intel boxes.

Running blast on a Sun, SGI or HPaq server is a waste of money. It is far better to use 'big' machines for jobs that require massive memory or SMP while using your 'cheap' linux cluster to soak up the load from embarrassingly parallel stuff like blast. Such an approach also extends the usable lifespan of your big iron machines -- you don't need to replace them as often if you can dump much of your computational load onto a compute farm made up of essentially disposable linux boxes. People with seven-figure Alpha or SGI machines love to hear news like this.
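To make the "embarrassingly parallel" point concrete, here is a rough sketch of how you can chop a multi-FASTA query file into independent chunks, each of which can run as its own blast job on any node. Filenames, the toy sequences and the qsub example in the comments are illustrative assumptions, not from a real deployment:

```shell
# Toy multi-FASTA input (illustrative data only).
cat > queries.fa <<'EOF'
>seq1
MKTAYIAKQR
>seq2
MADEEKLPPG
EOF

# Split on '>' headers: one record per file (chunk.1, chunk.2, ...).
# Each chunk is a self-contained blast query with no dependency on the others.
awk '/^>/{n++} {print > ("chunk." n)}' queries.fa

# Each chunk could then be farmed out as an independent job, e.g. under
# Sun GridEngine (sketch only -- database name/paths are assumptions):
#   for f in chunk.*; do
#     qsub -b y blastall -p blastp -d nr -i "$f" -o "$f.out"
#   done
ls chunk.*
```

Because no chunk depends on any other, the scheduler can scatter them across as many cheap nodes as you have and simply concatenate the outputs afterwards.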
This is how I would configure a blast service for biologists today:

 o dual-CPU machines (Athlon or Pentium III)
 o at least 2GB RAM per node, more if the price is reasonable
 o at least 2 large IDE disks on separate PCI channels for use with linux software RAID0
 o fastest ethernet topology I could afford
 o fastest NAS fileserver I could afford, for staging a couple terabytes worth of blast databases
 o Sun GridEngine or Platform LSF doing the scheduling, job execution & resource allocation

The nodes would run standard wu-blast or ncbi-blast, and large jobs would be controlled by a batch scheduler / distributed resource management system such as Platform LSF (commercial & expensive but really good) or Sun GridEngine (freely available, solid product).

Such a system would be performance-bottlenecked at the I/O level, particularly if the blast databases sit on the NAS fileserver. By using dual ATA drives in your compute nodes with linux software RAID0 you can (a) cache blast databases to local disk and (b) achieve sustained read rates exceeding 90 MB/second, which is faster than what you can typically get with NFS over gigabit ethernet or a direct fibre channel connection to a SAN volume.

my $.02

-chris

Mario Belluardo wrote:

>Dear Ognen,
>I'm trying to get the best out of our hardware to provide a Blast
>server for biology scientists. It seems we could have a 32-CPU
>cluster.
>I'm interested in using or testing special code for this purpose, and
>I'm also interested in documentation on modifying the NCBI code.
>
>Thanks
>
>bioclusters-request@bioinformatics.org wrote:
>
>>There is a parallel version of Blast based on PVM written by a former
>>colleague of mine. It was written/tested on our 32-node beowulf cluster.
>>Instead of posting his email address online: if interested, people can
>>email me and I will make sure they get in touch with him for sharing
>>experiences / results and possibly obtaining the code (I don't know what
>>licensing agreements he and our former employer have in place).
>>
>>Ognen
>>
>>--__--__--
>>
>>_______________________________________________
>>Bioclusters maillist - Bioclusters@bioinformatics.org
>>https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>>End of Bioclusters Digest

--
Chris Dagdigian, <dag@sonsorol.org>
Bioteam.net - Independent Bio-IT & Informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E  Yahoo IM: craffi  Web: http://bioteam.net
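P.S. For concreteness, the per-node RAID0 staging described above might look roughly like the sketch below. This is an admin/config fragment, not something to run verbatim: device names, mount points and database paths are assumptions for illustration, and mkfs/mdadm are destructive.

```shell
# Sketch only -- /dev/hda1 and /dev/hdc1 (IDE disks on separate channels),
# /scratch and the database paths are all assumed example names.

# Stripe the two IDE disks into one RAID0 volume with linux software RAID.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/hda1 /dev/hdc1
mkfs.ext3 /dev/md0
mount /dev/md0 /scratch

# Stage blast databases from the NAS onto the local striped volume before
# searching, so database reads come off local RAID0 instead of NFS.
rsync -a /nfs/blastdb/nr.* /scratch/blastdb/
blastall -p blastp -d /scratch/blastdb/nr -i query.fa -o query.out
```

The point of the rsync step is that a database cached once to local disk is then read at local-RAID0 speed on every subsequent search, instead of hammering the NAS from every node.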