[Bioclusters] Large memory blast servers

Wed, 10 Dec 2003 12:25:49 -0800

On Wed, 2003-12-10 at 08:59, Michael Cariaso wrote:
> It seems that a version of BLAST that could address 64G would be a 
> significant benefit to the community. Does anyone have any idea what 
> would be involved in adding PAE support into NCBI-BLAST?

My understanding is that PAE (Physical Address Extension) is a processor
feature that the kernel can support so that a 32-bit machine can use up
to 64 GB of physical memory. However, even on a 32-bit machine with PAE
and a kernel that supports it, each individual process can still only
address 2 or 3 GB[1].
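Here is where these numbers come from. The following sketch does the arithmetic. It assumes a 32-bit virtual address space per process, PAE's 36-bit physical addresses, and the common Linux 3 GB user / 1 GB kernel split. The exact split is a kernel configuration detail, so treat it as an assumption. The footnote's 4 GB is the full 32-bit space.

```python
GIB = 2 ** 30  # one binary gigabyte


def addressable_gib(address_bits: int) -> int:
    """Amount of memory reachable with an address of the given width, in GiB."""
    return 2 ** address_bits // GIB


# A 32-bit pointer limits each process's *virtual* address space.
per_process_virtual = addressable_gib(32)   # 4 GiB
# PAE widens *physical* addresses to 36 bits; only the kernel sees these.
pae_physical = addressable_gib(36)          # 64 GiB
# Assumed 3G/1G split: the top 1 GiB of every process is reserved for the kernel.
kernel_reserved = 1
usable_per_process = per_process_virtual - kernel_reserved  # 3 GiB

print(per_process_virtual, pae_physical, usable_per_process)  # 4 64 3
```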

In other words, I believe that one cannot add PAE support to a
user-space application.  OTOH, if you have a machine with enough RAM,
the entire database could still live in RAM in the kernel's disk cache.
Even though a single BLAST process can't directly address more than a
few gigabytes of RAM, when it reads different parts of the database it
will get them at RAM speed, because the kernel has already cached them.
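To illustrate the distinction, consider a process that streams a multi-gigabyte database file through a small fixed buffer. Its own memory use stays tiny. On Linux, the pages it reads generally stay in the kernel's page cache, and later reads by this or another BLAST process are then served from RAM, as long as memory pressure doesn't evict them. This is a minimal sketch. The function name and chunk size are hypothetical, and whether pages stay cached is up to the kernel.

```python
def warm_page_cache(paths, chunk_size=1 << 20):
    """Read each file sequentially and discard the data.

    The process only ever holds one chunk_size buffer. The kernel keeps the
    file pages in its page cache (subject to available RAM), so the amount of
    cached data is not bounded by the per-process address space.
    Returns the total number of bytes read.
    """
    total = 0
    buf = bytearray(chunk_size)
    for path in paths:
        # Unbuffered binary mode: readinto() fills our single reusable buffer.
        with open(path, "rb", buffering=0) as f:
            while True:
                n = f.readinto(buf)
                if not n:  # 0 means end of file
                    break
                total += n
    return total
```

For example, `warm_page_cache(["db.00.nsq", "db.01.nsq"])` would pre-read two (hypothetical) database volume files before a batch of searches.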

When Tim Cutts said that "only 3 GB can be used by any single BLAST
job", he was referring to the per-process limit; a single BLAST run can
still benefit from larger amounts of RAM through the kernel disk cache.

With mpiBLAST, you split up the database into several chunks, each of
which can (potentially) fit into the RAM of an individual compute node. 
In this case, you have a tradeoff, which I think gets back to your
original question.  Say you want 24 GB of RAM across your entire
cluster; you can choose between fewer nodes with more RAM in each (say,
3 nodes with 8 GB), which requires less inter-node communication, or you
can choose more nodes with less RAM in each (say, 6 nodes with 4 GB),
which gives you more processors.
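The tradeoff can be tabulated for a fixed RAM budget. This is a sketch only. It assumes each node has 2 CPUs, a hypothetical figure for dual-processor boxes that you should adjust to your hardware. The function name is also hypothetical.

```python
def cluster_options(total_ram_gb, ram_per_node_options, cpus_per_node=2):
    """For a fixed total RAM budget, list (GB per node, nodes, CPUs) per layout.

    cpus_per_node is an assumption about the hardware, not a property of
    mpiBLAST.
    """
    options = []
    for ram_per_node in ram_per_node_options:
        if total_ram_gb % ram_per_node:
            continue  # skip layouts that don't divide the budget evenly
        nodes = total_ram_gb // ram_per_node
        options.append((ram_per_node, nodes, nodes * cpus_per_node))
    return options


for ram, nodes, cpus in cluster_options(24, [8, 4]):
    print(f"{nodes} nodes x {ram} GB -> {cpus} CPUs")
# 3 nodes x 8 GB -> 6 CPUs
# 6 nodes x 4 GB -> 12 CPUs
```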

My understanding of what Tim Cutts was saying is that having more nodes
to get more processors is worth the extra inter-node communication
required.  Apparently (I have no direct experience, though I've been
reading the list posts), mpiBLAST requires relatively little inter-node
communication.  In that case, more nodes/less RAM per node is a win from
a performance standpoint, and getting hardware that's more common
(<=4GB/node) is also a win from a cost standpoint.
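A toy model can make the "more nodes wins when communication is cheap" argument concrete. The model is entirely hypothetical and not based on mpiBLAST measurements: runtime = search work / CPUs + a per-node communication cost × nodes. With a small per-node cost, the 6-node layout finishes first. With a large one, the 3-node layout wins.

```python
def toy_runtime(work, nodes, cpus_per_node, overhead_per_node):
    """Hypothetical runtime model: perfectly parallel search plus a
    communication/merge cost that grows linearly with node count."""
    return work / (nodes * cpus_per_node) + overhead_per_node * nodes


WORK = 1200.0  # hypothetical CPU-seconds for one batch of queries
for overhead in (1.0, 100.0):
    few = toy_runtime(WORK, nodes=3, cpus_per_node=2, overhead_per_node=overhead)
    many = toy_runtime(WORK, nodes=6, cpus_per_node=2, overhead_per_node=overhead)
    print(f"overhead={overhead}: 3 nodes {few:.0f}s, 6 nodes {many:.0f}s")
# overhead=1.0: 3 nodes 203s, 6 nodes 106s
# overhead=100.0: 3 nodes 500s, 6 nodes 700s
```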

Regards,
Mitch

[1] - I think there's a relatively recent patch that will give you 4GB
per-process VM space, at the cost of extra context-switch latency. 
Still, not the kind of numbers you've been talking about.