I just joined the mailing list after reading the archives. Great reading. I have a favor to ask the group. I'm writing a book about BLAST, and one of the sections is titled "Industrial Strength BLAST", which covers high-throughput considerations rather than optimal search parameters (things like hardware configurations and clustering - the kinds of things discussed on this list). There are a couple of experiments I could use some help on, for those interested.

(1) Benchmarking is always controversial. This is probably especially true for BLAST because people have different needs. That said, I think a few real-world examples with actual numbers would help people make sound decisions when purchasing hardware. I don't have convenient access to that many different machines, so I'm asking (maybe begging) for a little help. I'd like to propose a couple of tests, but before I do, I think it would only be reasonable that (a) these experiments are "owned" by this group and the book will make appropriate reference, and (b) you don't participate in the tests if it would invalidate some kind of "no benchmark" contract you may have with a vendor.

(1.1) The first test is to search the Pfam globin family against itself using default parameters. There are 1203 sequences in the family. You can find the file at http://dna.cs.wustl.edu/globins.gz. I'm using WU-BLAST with the following command line:

  time blastp globins globins V=1203 B=1203 cpus=1 filter=seg+xnu > /dev/null

Notes: I'm setting the CPU count to 1. Also, although I'm using WU-BLAST here, if more people are using NCBI-BLAST, I'd like to report that instead. This is not a bake-off of NCBI-BLAST vs. WU-BLAST. People have their preferences, and I'm only going to include one or the other in the book. This test isn't an accurate real-world test in the sense that most of the sequences are going to match each other, but the data is small enough that the burden shouldn't be too great for anyone.
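For anyone who wants to run the test end to end, here is a minimal sketch of the whole procedure. It assumes WU-BLAST is installed and on your PATH, and that your release formats protein databases with setdb (newer releases use "xdformat -p" instead - adjust accordingly):

```shell
# Fetch and unpack the globin set (URL from the post above).
wget http://dna.cs.wustl.edu/globins.gz
gunzip globins.gz

# Format the FASTA file as a WU-BLAST protein database.
# Assumption: setdb for older releases; use "xdformat -p globins" on newer ones.
setdb globins

# Time the all-vs-all search; alignments are discarded since only the
# wall-clock time matters. V and B are set to the family size (1203) so
# no hits are dropped from the report.
time blastp globins globins V=1203 B=1203 cpus=1 filter=seg+xnu > /dev/null
```

Please report the "real" line from the time output, along with your hardware details.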
It will probably take somewhere between 5 and 25 minutes depending on your hardware.

(1.2) I'd like the second test to be a BLASTN search of some kind. This will require a larger database, and I think it will keep the same all-vs-all approach. If the response to the first experiment is good, I'll post another database. If not, I'll go sulk and do whatever experiments I can.

(1.3) Is there another test anyone can think of that would be simple enough for lots of people to run? If we could come up with a suite of reasonable tests, it might be nice to have a "spec-BLAST" benchmark. One could also try tests on more than one CPU to show how an entire system performs. It sounds like a fun paper to write and a great resource, but it's beyond the scope of the book. Any takers?

(2) There are differences in operating systems and compilers too. If the same tests above could be run on identical hardware but with different operating systems, this would provide a valuable resource.

(3) This isn't an experiment. Is there a favorite BLAST-based question you'd like to see answered in a book? Perhaps something you already know about that is often puzzling to the inexperienced?

Thanks,
Ian Korf