[Bioclusters] cluster computing options

Andrew Fant bioclusters@bioinformatics.org
Fri, 18 Jan 2002 16:14:24 -0500 (EST)


Amit,
    Depending on how many systems you are planning to put into play for this grid,
you might not want to go far into the structural simulation and quantum chemistry
space.  While there are exceptions, most of those codes do not scale well beyond
about 32 processors.  Also, have you considered the I/O requirements?  In addition
to passing relatively small (2K or so) messages between processes to keep
themselves in sync, that class of program usually requires very large scratch
files as well.  I have seen a large AMBER job utterly thrash a fairly high-end NAS
fileserver.  If you want to look at that kind of code, you do have a few options.
In quantum chemistry, NWChem from the Environmental Molecular Sciences Laboratory
at Pacific Northwest National Laboratory is designed to work well on Linux
clusters, as is GAMESS from the Gordon group at Iowa State University/Ames
Laboratory.  For MM/MD work, NAMD (from the Theoretical Biophysics Group at UIUC)
is essentially free to download, as is AMMP from Georgia State.  A good place to
start when looking for code like this is http://sal.kachinatech.com.  It's not
always the most current, but it is not ashamed to link to other sites that provide
more specialized coverage.

     Another option that would let you build up your chops in biocomputing would
be to talk to a bioinformatics type about workflow applications, where several
different programs are chained together algorithmically to solve more complex
problems.  The nice thing about trying something like this is that it leads
naturally to meta-clustering, where clusters separated by WAN links are brought
in to cooperate on grand-challenge class problems.  An interesting example of
workflows (on a VERY grand scale) can be seen at
http://www.cs.virginia.edu/~legion/papers/hpdc01.pdf.

     I hope this helps.  Post if you have any more questions.  It's nice to see some
traffic on the list.

Andy

On 17-Jan-2002 chris dagdigian wrote:
> 
> Hi Amit,
> 
> You will find that most life science applications are not parallel-aware.
> The majority are standalone binaries or algorithms that are run
> serially or in "embarrassingly parallel" mode.
>
> There are starting to be exceptions, especially for those researchers
> who are doing chemistry, structural and molecular modelling work. The
> problem is that lots of those programs are commercial (supplied by
> companies like MSI and Accelrys) and others are made available under all
> sorts of license clauses.  An example of this is AMBER
> (http://www.amber.ucsf.edu/amber/amber.html), which happens to be
> MPI-aware but is available to academics under a license that apparently
> costs $400 USD...
> 
> -chris
> 
> 
> Amit Murthy wrote:
> 
>> Hi,
>> 
>> I am approaching biocomputing with only a computer engineering
>> background. I intend to set up a Globus grid locally in order to learn
>> more about grid computing. In this context I would like to run a sample
>> biocomputing-related problem and benchmark the grid with different
>> numbers of nodes. The focus will be more on learning about grid
>> computing. I need to identify a biocomputing problem to act as an
>> application on top of the cluster.
>> 
>> I am looking for suggestions from people as to what problems I should
>> select for the benchmarking. It will be helpful if the code for the
>> biocomputing part is available as open source and has already been
>> parallelized.
>> 
>> Any suggestions/pointers will be helpful.
>> 
>> Thanks and regards,
>> 
>> Amit
>> 
>> 
>> 
>> _______________________________________________
>> Bioclusters maillist  -  Bioclusters@bioinformatics.org
>> http://bioinformatics.org/mailman/listinfo/bioclusters
>> 
> 
> 
> -- 
> Chris Dagdigian, <dag@sonsorol.org>
> Life Science IT & Research Computing Freelancer
> Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
> Yahoo IM: craffi

-- 
Andrew Fant           |                                 | email: fant@vrtx.com
HPC Geek              |                                 | phone: (617)444-6100
Vertex Pharmaceuticals| Disclaimer: Who would be crazy  |
Cambridge, MA  02139  | enough to claim these opinions? |  ICBM: 42.35N 71.09W