Chris Dwan writes, concerning Don Gilbert's gridlet that downloads information to each node on an as-needed basis:

> The fact that the target needs to be re-formatted
> every time we gain or lose a compute node seems particularly iffy.

I had this concern as well: why go through the re-format (i.e., formatdb) each time you wish to run a job? I know that my current formatting of the databases takes a long time every week. However, in trying out Don's gridlet I was pleasantly surprised to find that the format took an insignificant amount of time compared to the BLAST search itself. This was using datasets of 2000 sequences and an input of 50+ 1000 bp sequences. Of course, reformatting a large dataset just to search it with an input of 1 or 2 sequences would be an inefficient use of time.

Naturally there is a lot of other framework needed aside from the gridlet. Chris mentioned a few pieces, along with the existing setup in which the "queuing system of your choice is used to schedule jobs onto nodes, manage transient and permanent failures, stage data, and all that other neat stuff." There is no reason, in my mind, that such a queuing system could not also handle jobs that split up the databases dynamically. Such splitting may become more necessary as the data grows larger than our computers' memory. Already I have a PC cluster (it was "free" to me) whose very limited memory restricts which datasets I can submit to it.

In summary, I think that the gridlet might be a worthwhile tool.

-- Rick

Rick Westerman
westerman@purdue.edu
Phone: (765) 494-0505
FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building
Bioinformatics specialist at the Genomics Facility.
http://www.genomics.purdue.edu/~westerm