Taking into account the whole pipeline (including networked I/O, formatdb, etc.) is a great idea and will give much more realistic results. I also think that a collection of data would be a catalyst for great future discussions and questions, e.g., "how the heck did you get your formatdb to run so fast on the 20K data?" The responses would then give the rest of us, who may be a bit behind in these things, great insight and ideas.

I'd be VERY interested to see if anyone has results from using cluster filesystems, for example...

> -----Original Message-----
> From: bioclusters-bounces+brodie=mcw.edu at bioinformatics.org
> [mailto:bioclusters-bounces+brodie=mcw.edu at bioinformatics.org] On Behalf
> Of James Cuff
> Sent: Friday, June 24, 2005 9:46 AM
> To: Tim Cutts
> Cc: Clustering, compute farming & distributed computing in life science
> informatics
> Subject: Re: [Bioclusters] topbiocluster.org
>
> ........ massively long but good posting removed for brevity's sake..