[Bioclusters] Re: Mosix and Blast

Michael Will bioclusters@bioinformatics.org
Mon, 22 Mar 2004 20:01:25 +0000 (UTC)

>> 1. Will Mosix migrate commands that logically should not be migrated, like
>> ifconfig?
> I don't know.  I asked the folks on the open-mosix list to comment here
> on these questions.

This should not be a problem: Mosix does migrate your processes, but it still
keeps a stub on the original (home) node if you opened a file or device.
So even if ifconfig ends up running on a different node, its device access
should still go across the network to the stub on its home node, and from
there to the correct device driver.

Besides, ifconfig's runtime is too short for it to be a candidate for migration.

>> 2. Is MFS/DFSA a striped network file system like PVFS or something more? I
>> am somewhat confused by its /mfs/nodename... syntax. 
As I understand it, it is similar to having an NFS mount of every Mosix
node's local disks.

You do not share one filesystem; rather, each node has all the others' mounted.

The path you write to determines which node's local storage receives the data.

If one of them is full, it does not automatically spill over to the others.
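
To make the path-based addressing concrete, here is a minimal Python sketch.
It assumes only the /mfs/nodename/... layout mentioned in the question; the
helper names (mfs_path, pick_node) and the free-space lookup are hypothetical
and not part of Mosix. Since a full disk does not spill over to other nodes,
the caller has to choose the target node itself:

```python
import posixpath

MFS_ROOT = "/mfs"  # assumed mount point, from the /mfs/nodename syntax above


def mfs_path(node, path_on_node):
    """Return the MFS path addressing path_on_node on one node's local disk."""
    return posixpath.join(MFS_ROOT, str(node), path_on_node.lstrip("/"))


def pick_node(nodes, needed_bytes, free_bytes):
    """Return the first node with at least needed_bytes free, or None.

    free_bytes is a caller-supplied function (hypothetical) that maps a node
    name to its free space in bytes, for example one built on
    shutil.disk_usage(mfs_path(node, "/")).free.
    """
    for node in nodes:
        if free_bytes(node) >= needed_bytes:
            return node
    return None


if __name__ == "__main__":
    # Made-up free-space figures, for illustration only.
    free = {"node1": 10, "node2": 5_000}
    target = pick_node(["node1", "node2"], 1_000, free.get)
    print(mfs_path(target, "/scratch/hits.out"))  # /mfs/node2/scratch/hits.out
```

The node is part of the path, so "which disk" is decided entirely by the
writer and never by the filesystem.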

>> 4. It's not clear whether Mosix migrates processes using shared memory. 
>> (AFAIK blast uses shared memory.) Even if Mosix migrates such processes, is
>> it possible that different threads of the same blast invocation will wind
>> up on different nodes?
> No.  BLAST uses pthreads, and Mosix could not migrate a thread.  I
> believe it migrates at the process level.  Migrating BLAST would
> probably be a Very Bad Thing(TM), as BLAST mmaps the database(s).  I do
> not know what Mosix does with mmaps, but I have difficulty visualizing
> how moving the memory map away from where the IO is occurring could be a
> win.
Mosix does not have distributed shared memory yet. People are working on it,
but unless you have a really fast (low latency) interconnect it might not be that

>> 5. Mosix docs underscore that it migrates processes to the nodes where they
>> perform heavy I/O. However, I wonder whether it will be beneficial for our
>> mode of work: we run many invocations of blast (almost) simultaneously
>> until we run out of CPUs, and the processes all read the same database when
>> they start. What is the best way to utilize Mosix here?

Check out mpiBLAST, which has the right approach for you:
- a special formatdb splits the database into separately searchable pieces
- you copy them out so each node has one database piece on its local disk
- all following searches on the same data go like this:
  a patched blast search is started on each node,
  each one searches its local piece of the database,
  and the results are then combined.
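
The split / search-per-piece / combine pattern above can be sketched in a few
lines of Python. This is only an illustration of the data flow, not mpiBLAST's
implementation: toy_search is a made-up stand-in that counts shared k-mers
instead of running BLAST, the round-robin split is an assumption, and a thread
pool stands in for "one search per node".

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_database(records, n_fragments):
    """Deal records round-robin into n_fragments separately searchable pieces."""
    fragments = [[] for _ in range(n_fragments)]
    for i, record in enumerate(records):
        fragments[i % n_fragments].append(record)
    return fragments


def toy_search(query, fragment, k=3):
    """Toy stand-in for a BLAST search: score = number of distinct shared k-mers."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    hits = []
    for header, seq in fragment:
        seq_kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        score = len(query_kmers & seq_kmers)
        if score:
            hits.append((score, header))
    return hits


def search_all(query, fragments):
    """Search every fragment concurrently and merge the hits, best score first."""
    workers = max(1, len(fragments))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_fragment = pool.map(lambda frag: toy_search(query, frag), fragments)
        merged = [hit for hits in per_fragment for hit in hits]
    return sorted(merged, key=lambda hit: (-hit[0], hit[1]))
```

Because each piece is searched independently, the merged result here matches
a search over the whole database. Real BLAST statistics such as E-values
depend on the database size, so combining real results needs more care than
a plain sort by score.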

You might want to hack it to not rely on MPI, but still use 
the partial databases instead of one complete one from the same node:


Also, the Mosix FAQ hints that you could actually use MPI on Mosix, 
but I have no details:


Michael Will
Michael Will, Linux Sales Engineer
Tel:  415-954-2822  Toll Free:  888-PENGUIN Fax:  415-954-2899