Because parallel BLAST is such a common problem, numerous free/open-source implementations exist. Obviously mpiBLAST won't work on a Unix cluster without message passing, and it's unclear to me whether MPI and Condor will play nicely with each other on Windows (has anybody had success with this?). If there are other reasons mpiBLAST is unsuitable for you, I'd like to hear about them. The software is still being actively developed and we are open to suggestions for features!

As part of writing a grant for the mpiBLAST project I did some research on other free, open-source parallel BLAST options. Here's a brief overview of what I was able to find. If I'm missing any significant projects or I've got the details wrong, please correct me. Also, this only covers parallelizations that use database segmentation. Because query segmentation is easy, so many programs have been written to use it exclusively that I'd be hard pressed to list them all.

NBLAST
- designed for NxN comparisons of sequence databases, i.e. every database entry gets BLAST searched against every other database entry
- stores results in ASN.1 format
- adjusts e-values using the database length only, providing approximately correct e-values
- uses MoBiDiCK for job startup on a cluster
- there is a paper describing it here: http://www.biomedcentral.com/1471-2105/3/13/
- uses unmodified NCBI blastall

blast.pm
- database segmentation
- part of the mollusc package
- written in Perl, works under Unix
- uses rsh/ssh for job startup (needs password-free login to cluster machines)
- adjusts e-values using a linear-regression model that provides approximate e-value statistics
- supports text output formats only
- uses unmodified NCBI blastall

dBlast
- database segmentation
- free only for non-commercial use
- written in Perl, works under Unix
- requires manual database distribution
- requires OpenPBS for job management
- e-value adjustments are (purportedly) accurate.
  dBlast uses both the effective db length and the effective query length to calculate e-values. Their clever method for e-value adjustment inspired us to make some changes for the next mpiBLAST release to give accurate e-value statistics.
- supports text output formats only
- requires compiling a modified NCBI blastall
- see http://www.cmbi.kun.nl/software/dBlast/ for more info

parallelblast by David Mathog
- database segmentation
- written in Perl/C, works under Unix
- uses PVM, and optionally SGE
- does approximate e-value adjustment using the effective db length
- supports text and HTML output formats
- requires compiling a modified NCBI blastall
- http://bioinformatics.oupjournals.org/cgi/content/abstract/19/14/1865?ijkey=13CoOSo3fnITz&keytype=ref

mpiBLAST
- database segmentation
- written in C++, works under Unix/Windows
- requires MPI, optionally PBS, SGE, LSF, or Condor
- e-value adjustments are approximate, based on db length (but as previously mentioned, the next release will include accurate e-value statistics)
- supports all of the NCBI blastall output formats (text, HTML, XML, ASN.1)
- requires compiling the NCBI Toolkit
- includes code to interface a wwwblast server with mpiBLAST + PBS
- more info at http://mpiblast.lanl.gov

Also, since you're considering BLAST under Windows, you may want to check into what the Cornell Theory Center is using for parallel BLAST on their Windows cluster. I don't know whether their software is publicly or freely available, however.

None of the freely available options (that I am aware of) currently implement combined query and database segmentation.

I've found the lack of a comprehensive resource for information on parallel BLAST frustrating. Hopefully this e-mail will prove to be a useful resource for people considering parallel BLAST options.
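
P.S. For anyone wondering what the approximate, database-length-only e-value adjustment amounts to, here is a minimal sketch. It is not the code of any tool listed above. It only assumes that an e-value scales linearly with the length of the database searched, so a hit found in one fragment is rescaled by (total length / fragment length) before the per-fragment results are merged. The names Hit, rescale_evalue and merge_fragment_hits are made up for illustration. The sketch ignores effective query and database lengths, which is exactly why this kind of adjustment is only approximate.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """One BLAST hit as reported by a search against a single fragment."""
    query_id: str
    subject_id: str
    bit_score: float
    evalue: float  # computed against the fragment, not the whole database


def rescale_evalue(evalue: float, fragment_length: int, total_length: int) -> float:
    """Scale a fragment-level e-value up to the full database length.

    Assumption: e-values are proportional to database length, so the
    correction is a simple ratio. Effective-length corrections are ignored.
    """
    if fragment_length <= 0 or total_length < fragment_length:
        raise ValueError("fragment length must be positive and <= total length")
    return evalue * (total_length / fragment_length)


def merge_fragment_hits(
    hits_by_fragment: Dict[str, List[Hit]],
    fragment_lengths: Dict[str, int],
) -> List[Hit]:
    """Rescale every hit, then merge all fragments into one ranked list."""
    total_length = sum(fragment_lengths.values())
    merged: List[Hit] = []
    for fragment, hits in hits_by_fragment.items():
        for hit in hits:
            adjusted = rescale_evalue(
                hit.evalue, fragment_lengths[fragment], total_length
            )
            merged.append(replace(hit, evalue=adjusted))
    # Best hits first: lowest e-value, ties broken by highest bit score.
    merged.sort(key=lambda h: (h.evalue, -h.bit_score))
    return merged
```

To use it, fill hits_by_fragment with the parsed hits from each fragment search and fragment_lengths with the residue or base count of each fragment. The accurate approaches mentioned above (dBlast, the next mpiBLAST release) go further than this simple ratio.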

-Aaron
darling(at)cs.wisc.edu

On Thu, 26 Feb 2004, Micha Bayer wrote:

> Hi,
>
> does anyone know of a non-commercial, open source/free package that
> provides a parallelisation of BLAST (apart from mpiBLAST which is not
> suitable for us).
>
> I am interested in something that would split input files into single
> query sequences, partition the database and collate the results (ideally
> with an adjustment of the e-values etc).
>
> It looks like some of the commercial packages like Paracel do all of the
> above but I really need an open source version and before I get writing
> my own I want to make sure I have tried all the available options.
>
> I am looking to run a service both on a Windows XP based Condor pool and
> on a cluster that uses OpenPBS but has no message passing capabilities
> to speak of.
>
> cheers
>
> Micha
>
>
> --
> --------------------------------------------------
> Dr Micha M Bayer
> Grid Developer, Bridges Project
> National eScience Centre, Glasgow Hub
> 246c Kelvin Building
> University of Glasgow
> Glasgow G12 8QQ
> Scotland, UK
> Email: michab@dcs.gla.ac.uk
> Project home page: http://www.brc.dcs.gla.ac.uk/projects/bridges/
> Personal Homepage: http://www.brc.dcs.gla.ac.uk/~michab/
> Tel.: +44 (0)141 330 2958
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters