[Bioclusters] Xserve G5 memory

Joe Landman bioclusters@bioinformatics.org
Tue, 05 Oct 2004 16:29:59 -0400

Aaron Darling wrote:

> [I just finished composing this as Joe's post showed up]


> Under most modern operating systems, BLAST databases automatically get 
> cached in memory by the OS as they are searched.  The operating system 
> typically uses all available memory (unused by currently active 
> applications) to cache frequently used data from files on disk.  The 
> memory cache for data stored on disk is referred to as a "buffer 
> cache".  Different operating systems have different policies regarding 
> which data gets stored in the buffer cache, but the common theme is 
> that they all attempt to cache the most frequently used (MFU) data in 
> the buffer cache.
> To the best of my knowledge, NCBI's blastall code uses memory-mapped 
> I/O whenever possible.  Although I am not aware of any, there may be 
> some Darwin-specific limitation that prevents memory-mapped files from 
> getting added to the buffer cache.  This may be an avenue for further 
> investigation.

There are noticeable differences in the behavior of mmap between the 
other Unixen and Linux.  Under Irix, there were methods (OK, going from 
memory here) to pin the mmapped file into memory for a bit.  You can do 
it (sort of) under Linux, by running a companion program that does 
nothing but sit there and hold the file open after reading through it.  
The OS will force IO requests to the memory mapped arena.
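
For the curious, here is a minimal sketch of that companion-program idea 
in Python.  It is not anything that ships with BLAST, and the names 
touch_pages and hold_open are made up for illustration.  It maps a file 
read-only, reads one byte from every page so the OS has to fault the 
whole thing in, and then sits on the mapping.  Whether the pages actually 
stay resident afterwards is up to the OS; nothing here locks them.

```python
import mmap
import os
import time


def touch_pages(path):
    """Map `path` read-only and read one byte from each page.

    Returns (mapping, file_object, pages_touched).  The caller must keep
    the mapping alive for as long as the file should stay mapped.
    """
    f = open(path, "rb")
    size = os.fstat(f.fileno()).st_size
    if size == 0:
        f.close()
        raise ValueError("cannot mmap an empty file")
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    touched = 0
    for offset in range(0, size, mmap.PAGESIZE):
        m[offset]  # reading the byte forces this page to be faulted in
        touched += 1
    return m, f, touched


def hold_open(path, seconds):
    """Touch every page of `path`, then hold the mapping for `seconds`."""
    m, f, touched = touch_pages(path)
    try:
        time.sleep(seconds)  # hypothetical hold time; a real one would loop
    finally:
        m.close()
        f.close()
    return touched
```

You would start something like hold_open("/path/to/db.nsq", 3600) in the 
background before the searches (the path is a placeholder), with one call 
per database volume.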

Do I recommend this?  Heck no.

Anyone else remember memory management techniques (still in common use) 
under Fortran?  This has that same hackish flavor, and it effectively 
defeats the buffer cache.   Not sure if this is a good idea.  Buffer 
cache under Linux usually works pretty well, but then again, I am 
running into some rather annoying buffer cache issues on a 32 GB Linux 
machine right now.   With 2.6 you can tune the behavior a bit.

What about Apple?  I think the best approach may be to ask the Apple 
kernel folks.

> A quick hack to ensure the DB stays in memory would be creating a 
> RAMdisk and copy the DB onto it.  I think OS X supports RAMdisks.(?)  
> Unfortunately, the memory consumed by the RAMdisk can't be used for 
> anything else when you aren't running BLAST.

The thing which I have not looked into is whether or not RAM disks are 
subject to being paged out, or if these pages are locked (the latter I hope).

> -Aaron
> Victor M.Ruotti wrote:
>> Hi Juan,
>> How exactly do you hold your databases in memory? Do you do it through 
>> programming? It may help to describe how exactly this is done. I am 
>> also curious to know how you do it.
>> Victor
>> On Oct 5, 2004, at 12:09 PM, Juan Carlos Perin wrote:
>>> I have been running benchmarks with blastall on several different
>>> machines.  We've come to realize that one of the biggest differences
>>> affecting search times is how the machines actually maintain the
>>> search databases in memory.  E.g., on our IBM 8-way machine, the
>>> databases are held in memory, which seems to be an effect of the
>>> architecture of the machine, and search times become incredibly fast
>>> after an initial run, which stores the database in memory.  The same
>>> effect seems to take place on our Dual Xeon Dell (PE 1650), which
>>> also outpaces the Xserves significantly after an initial run to
>>> populate the db in memory.
>>> It would appear that the Xserves dump the db from memory after each
>>> search, even when submitting batch jobs with multiple sequences in a
>>> file.  Is anyone aware of how this functions, and how this effect
>>> might be changed to allow the db to stay in memory longer?  Thanks
>>> Juan Perin
>>> Child. Hospital of Philadelphia

Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615