On Fri, Feb 27, 2004 at 10:30:39AM +0000, Dan Bolser wrote:
> Hi,
>
> Regarding the previous discussion on blast parallelization, I would like
> to know more about segmentation.
>
> Can anyone give me a reference to this topic? If I split my target
> database over n machines, doesn't that mean I have to run my query n
> times?
>
> Cheers,
> Dan.

It does, but those n runs happen in parallel, so the elapsed time is
roughly that of a single query against one (smaller) piece of the
database. You could also wrap the queries in a shell script, so that
you only need to launch a single script rather than n separate
queries.

Generally there are 3 levels of segmentation used for BLAST: by the
query, by the database, or by the number of queries. Most
implementations use one form or another, but rarely, if ever, a
combination, as far as I know.

General theory:
R.C. Braun, K.T. Pedretti, T.L. Casavant, T.E. Scheetz, C.L. Birkett,
C.A. Roberts. "Parallelization of local BLAST service on workstation
clusters". Future Generation Computer Systems, 2001, vol. 17,
pp 745-754.

Segmenting the number of queries:
R. Clifford and A.J. Mackey. "Disperse: a simple and efficient
approach to parallel database searching". Bioinformatics, 2000,
vol. 16, no. 6, pp 564-565.

Segmenting the database:
D.R. Mathog. "Parallel BLAST on split databases". Bioinformatics,
2003, vol. 19, no. 14, pp 1865-1866.

A.E. Darling, L. Carey, W. Feng. "The Design, Implementation, and
Evaluation of mpiBLAST". ClusterWorld Conference & Expo and the 4th
International Conference on Linux Clusters: The HPC Revolution 2003.

Kp

-- 
Karl Podesta
Dublin City University, Ireland
National Institute for Cellular Biotechnology, Ireland
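
P.S. The "single script" wrapper for a split database could look
roughly like the following. This is only a minimal sketch, written in
Python rather than shell, and it rests on several assumptions: the
database has already been split into fragments, the legacy NCBI
blastall binary is on the PATH, and all jobs run on the local machine
(on a real cluster each command would be sent to a different node,
e.g. over ssh or through a queueing system). The helper names
fragment_command, run_command and search_fragments are made up for
illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def fragment_command(query, fragment, program="blastp"):
    """Build the blastall command line for one database fragment."""
    # -m 8 asks for tabular output; without -o the results go to stdout.
    return ["blastall", "-p", program, "-d", fragment, "-i", query, "-m", "8"]


def run_command(cmd):
    """Run one search and return its standard output as text."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


def search_fragments(query, fragments, run=run_command):
    """Search every fragment in parallel and merge the tabular hits."""
    commands = [fragment_command(query, frag) for frag in fragments]
    # One worker per fragment, so all n searches are started together.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        outputs = list(pool.map(run, commands))
    hits = [line for out in outputs
            for line in out.splitlines() if line.strip()]
    # The bit score is the last tab-separated column of -m 8 output.
    hits.sort(key=lambda line: float(line.split("\t")[-1]), reverse=True)
    return hits
```

With hypothetical fragment names, a call such as
search_fragments("query.fa", ["db.00", "db.01", "db.02"]) would start
three searches at once and return the merged hit lines, best bit score
first. The sketch ranks by bit score because E-values reported by each
job are computed against that fragment's size rather than the size of
the whole database.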