> Although -- if you put 1 or 2 GB ramdisks in each of your cluster nodes
> and then set up a system for chunking blast databases into
> ramdisk-friendly sizes you could build a really fast blast farm. In
> that context the performance bottleneck would then become the time and
> resources needed to merge the XML output from N queries against split
> databases into a single result file. I've seen such systems in the past
> and merging the results could in some cases take longer than the actual
> search did.

Regarding XML output, this is absolutely correct. The advantage of XML is that all of the data you could possibly want from your BLAST search is available, and you can parse out whichever pieces you're after. The disadvantage is that XML output is 2-3X larger than pairwise text and over an order of magnitude larger than tabular (-m 8 in NCBI BLAST).

In a large search (hundreds to thousands of queries vs. large databases), what are you really looking for? Are you going to eyeball all of the alignments? For your sake, I hope not. Or are you just interested in which input hit which target, and how well? If the latter, run tabular first, figure out which alignments you're really interested in, and then rerun those queries singly when you need to see the alignment. This eliminates much of the storage and I/O load, which is what will slow you down.

>Pentium IIIs are "old" if you listen to Intel :) They have a vested
>interest in moving people to the more expensive Pentium IV platform.
>While it is true that Intel will probably end-of-life them sometime
>soon they are still really good when it comes to price/performance
>ratios.
>
>Many of the large, production-grade and 'conservative' clusters and
>farms I've seen are built around PIII CPUs in the compute elements.
>They are rock solid stable and your choice of motherboards and products
>is still huge. I've never heard of a PIII cluster falling over because
>of heat or flaky hardware or mainboard reliability problems. Your
>particular needs or benchmark results may point you towards a Pentium
>IV or AMD chip though so do your own testing...

A 1.4 GHz PIII processor can crank through 1 sequence vs the nt database (blastn) in a bit under 8 seconds, if the entire database is already in memory. If this kind of performance is good enough, save money on the processing side and spend it on a networking/software setup that will let you keep the processors busy, not waiting for the data to get there or the results to be written.

John Smutko
smutt235@attbi.com
"Enjoy yourself, it's later than you think..."
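
P.S. To make the "run tabular first" suggestion concrete, here is a minimal sketch of picking out the hits worth rerunning from -m 8 output. The twelve-column order is the standard NCBI tabular layout; the function name `best_hits`, the 1e-10 E-value cutoff, and the sample rows are all made up for illustration, so adjust them to your own data.

```python
import csv
import io

# Column order of NCBI BLAST tabular output (-m 8).
FIELDS = ["query", "subject", "pct_identity", "aln_length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]


def best_hits(lines, max_evalue=1e-10):
    """Keep the highest-scoring hit per query that passes the E-value cutoff.

    `lines` is any iterable of tab-delimited text lines (an open file works).
    Returns {query: (subject, evalue, bit_score)}.
    The cutoff is an arbitrary illustrative default, not a recommendation.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < len(FIELDS) or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        rec = dict(zip(FIELDS, row))
        evalue = float(rec["evalue"])
        bits = float(rec["bit_score"])
        if evalue > max_evalue:
            continue
        current = best.get(rec["query"])
        if current is None or bits > current[2]:
            best[rec["query"]] = (rec["subject"], evalue, bits)
    return best


# Hypothetical sample rows, invented for this example.
SAMPLE = (
    "q1\tsA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t380\n"
    "q1\tsB\t85.00\t120\t18\t0\t5\t124\t40\t159\t2e-20\t95.3\n"
    "q2\tsC\t70.00\t50\t15\t0\t1\t50\t1\t50\t0.5\t30.1\n"
)

if __name__ == "__main__":
    for query, (subject, evalue, bits) in sorted(best_hits(io.StringIO(SAMPLE)).items()):
        print(query, subject, evalue, bits)
```

The query/subject pairs that survive the filter are the ones you would then rerun singly with pairwise or XML output; everything else never needs its alignments written to disk.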