[Bioclusters] http://www.sistina.com/products_gfs.htm

Goran Ceric bioclusters@bioinformatics.org
Mon, 13 May 2002 13:46:06 -0700

You could run GFS on Fibre Channel shared storage and have multiple NFS
servers serve files to your cluster from it. This would help even if you
keep everything local to the nodes, because you still have to copy
everything to the nodes, and I assume you do that from your master node/NFS
server. Copying 3-4 GB of data to 100 nodes from a single machine over Fast
Ethernet/NFS (or ssh) can take hours.
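
A rough back-of-the-envelope estimate of that copy time, as a minimal
sketch. It assumes the single server link is the only bottleneck, Fast
Ethernet runs at its nominal 100 Mbit/s, and 1 GB = 10^9 bytes. The
function name and the efficiency parameter are made up for illustration.

```python
def copy_time_hours(data_gb, n_nodes, link_mbit_per_s=100.0, efficiency=1.0):
    """Hours needed to push data_gb to each of n_nodes through one server link.

    efficiency is the fraction of the nominal line rate actually achieved
    (1.0 is the best case; NFS or ssh overhead would lower it).
    """
    total_bytes = data_gb * 1e9 * n_nodes
    bytes_per_second = link_mbit_per_s * 1e6 / 8 * efficiency
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    for gb in (3, 4):
        hours = copy_time_hours(gb, 100)
        print(f"{gb} GB to 100 nodes: {hours:.1f} h at full line rate")
```

At full line rate this works out to roughly 6.7 to 8.9 hours for 3-4 GB,
and protocol overhead only makes it worse. Several servers reading from
the same shared storage would divide that time between them.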

Goran Ceric
System Administrator
Washington University, St. Louis
Department of Genetics, Eddy Lab

-----Original Message-----
From: bioclusters-admin@bioinformatics.org
[mailto:bioclusters-admin@bioinformatics.org]On Behalf Of Ivo Grosse
Sent: Monday, May 13, 2002 10:32 AM
To: bioclusters@bioinformatics.org
Subject: Re: [Bioclusters] http://www.sistina.com/products_gfs.htm

Hi Joe,

thanks for your great *general* answer.

Hi Joe and Chris and others,

let me try to make my question more *specific*:

0. we often use Blast, and we often blast two large sets against each
other, e.g. the human genome against the mouse genome. In that example,
one genome (e.g. mouse) will be the database, and we will chop up the
human genome into, say, 101-kb pieces overlapping by 1 kb, and then run
those 30,000 101-kb pieces against the mouse database using SGE. We
(in our group) do NOT need or want Mosix.
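
The chopping step in 0. can be sketched as follows. This is only an
illustration: it treats the genome as a single sequence of about 3 x 10^9
bases (an assumption; in practice one would chunk per chromosome), and
chunk_bounds is a hypothetical helper name.

```python
def chunk_bounds(seq_len, piece=101_000, overlap=1_000):
    """Yield half-open (start, end) intervals that cover [0, seq_len).

    Consecutive pieces share `overlap` bases, so each new piece starts
    piece - overlap bases after the previous one.
    """
    if not 0 <= overlap < piece:
        raise ValueError("overlap must be non-negative and smaller than piece")
    if seq_len <= 0:
        return
    step = piece - overlap
    start = 0
    while True:
        end = min(start + piece, seq_len)
        yield start, end
        if end >= seq_len:
            break
        start += step


if __name__ == "__main__":
    # Small example: a 250-kb sequence gives three overlapping pieces.
    for start, end in chunk_bounds(250_000):
        print(start, end)
    # Assumed genome length of 3e9 bases, treated as one sequence.
    print(sum(1 for _ in chunk_bounds(3_000_000_000)), "pieces")
```

With a 100-kb step, a 3-Gb sequence yields the 30,000 pieces mentioned
above.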

1. the (mouse) database will live in RAM (of each slave node), and the
way in which we feed the database to the RAM for each of the 30,000
jobs is as follows:

- cp the database to /tmp/ of ALL of the slave nodes.

- start the 30,000 jobs through SGE, where the database is READ from
/tmp/ (on the local node) and the output is WRITTEN to the central
file server.

This is, of course, much faster than reading a GB-size database from
the central file server 30,000 times.
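
The two steps above could be scripted roughly like this. It is a sketch
that only builds the command lines and does not run them. The node names,
file names and the /central/out/ path are hypothetical; the database files
assume a nucleotide database built with formatdb, and the search assumes
the legacy NCBI blastall program. Each Blast command would still have to
be wrapped in a job script and submitted through SGE.

```python
import shlex


def stage_commands(db_files, nodes, dest="/tmp/"):
    """Build one scp command per node that copies the database files to dest."""
    return [["scp", *db_files, f"{node}:{dest}"] for node in nodes]


def blast_command(query, out, db="/tmp/mouse", program="blastn"):
    """Build a blastall call that reads the node-local database copy."""
    return ["blastall", "-p", program, "-d", db, "-i", query, "-o", out]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    db_files = ["mouse.nhr", "mouse.nin", "mouse.nsq"]
    nodes = [f"node{i:02d}" for i in range(1, 4)]

    for cmd in stage_commands(db_files, nodes):
        print(shlex.join(cmd))

    # One search per query piece; the output goes to the central file server.
    cmd = blast_command("/central/queries/piece_00001.fa",
                        "/central/out/piece_00001.blast")
    print(shlex.join(cmd))
```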

2. another group here at CSHL is currently in the process of preparing
the installation of a new cluster, and they have some good reasons for
choosing Mosix.  But once in a while they also need to run Blast jobs
of a size similar to ours.  The question is: can Mosix + GFS + DFSA
support a protocol similar to 1.?

Best regards, Ivo


Instead of writing N identical replicas of the database to the N slave
nodes, one could keep just one copy of the database on /pvfs/, which is
accessible from all of the slave nodes.  Then, however, the GB-size
database would need to be read over the network 30,000 times.  Is
this correct?
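
To put numbers on the two options, here is a small sketch. It assumes a
1-GB database, 100 slave nodes, and that every job re-reads the whole
database over the network with no caching on the client side. All three
are assumptions made for illustration, not statements about how PVFS
behaves.

```python
def network_traffic_gb(db_gb, n_nodes, n_jobs):
    """Return (replicated, shared) network traffic in GB.

    replicated: the database is copied once to every node, and all jobs
                then read the local copy.
    shared:     every job reads the full database from the shared file
                system (assumes no client-side caching).
    """
    replicated = db_gb * n_nodes
    shared = db_gb * n_jobs
    return replicated, shared


if __name__ == "__main__":
    replicated, shared = network_traffic_gb(db_gb=1, n_nodes=100, n_jobs=30_000)
    print(f"replicate to /tmp/: {replicated} GB")
    print(f"read from shared:   {shared} GB")
```

Under these assumptions, replication moves 100 GB once, while reading
from the shared copy moves 30,000 GB (30 TB) over the course of the run.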


Do you know a smarter (than 1.) way of running the Blast jobs?
