[Bioclusters] Xserve G5 memory

Joe Landman bioclusters@bioinformatics.org
Tue, 05 Oct 2004 16:40:41 -0400


Note: nt is a huge database.  How much memory do you have?  Unless you 
have 8-10 GB of RAM, I would imagine you would be paging.  A RAM disk 
for nt is most definitely not the approach I would recommend.

I would strongly urge you to look at the "-v" switch for formatdb, and 
break your nt up into databases about 1/3 the size of your physical RAM.

       formatdb -p F -o T -v 512 nt

should break nt into roughly 512 MB segments.  If your machine has 1 GB 
of memory, look at using -v 333; if it has 4 GB or more, I would 
recommend -v 1024.
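The 1/3-of-RAM rule is easy to script.  A minimal sketch (RAM_MB=3072 is
just an illustrative value; on OSX you could query physical memory with
something like `sysctl hw.physmem`):

```shell
# Sketch only: pick a formatdb -v volume size of roughly one third
# of physical RAM, in MB.  RAM_MB here is an assumed example value.
RAM_MB=3072
SEG_MB=$((RAM_MB / 3))
echo "formatdb -p F -o T -v ${SEG_MB} nt"
# With RAM_MB=3072 this prints: formatdb -p F -o T -v 1024 nt
```

When formatdb splits a database this way it writes numbered volumes
(nt.00, nt.01, ...) plus an alias file, so a plain `blastall -d nt`
should still find all the pieces.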

Joe

Juan Carlos Perin wrote:

>Well, for these benchmarks we focused on the larger NT database.  This is
>the only database that shows significant lag in search runs due to its
>large size, but I have confirmed that the database is not overflowing into
>swap space.  We didn't initially suspect swapping, since we have plenty of
>memory, but I verified it anyway.
>
>OSX can be kernel-tuned through 'sysctl', but it isn't obvious how to tune
>it so that a flat-file database is retained in memory for this type of
>application.  
>
>One suggestion, from Aaron Darling, is to create a RAM disk in OSX, which
>I will look into.  If it works out, we may very well dedicate a node to
>this, since search times when the DB is in memory are so much faster, and
>that is very important in our scenario.
>
>Thanks
>
>On 10/5/04 4:01 PM, "Joe Landman" <landman@scalableinformatics.com> wrote:
>
>  
>
>>Unless you have a memory-mapped process reading these same files at the
>>same time, it is likely that they are only showing up in buffer cache.
>>OSX probably has a very different cache retention policy compared to AIX
>>and Linux.  This is usually a kernel tunable.
>>
>>Ask your Apple folks about how to tweak the kernel.
>>
>>Another very important issue is the size of the file compared to the size
>>of memory (more specifically, buffer cache).  If the file overflows
>>memory, the mmap mechanism will happily ask the kernel to start paging.
>>This is "A Bad Thing(tm)".  You want your database indices small enough to
>>hold in memory for good performance.  You don't want them too small,
>>though, as you will start to pay costs associated with increased file
>>activity.
>>
>>Which database is being used?
>>
>>Victor M.Ruotti wrote:
>>
>>    
>>
>>>Hi Juan,
>>>How exactly do you hold your databases in memory?  Do you do it through
>>>programming?  It may help to describe exactly how this is done.  I am
>>>also curious to know how you do it.
>>>
>>>Victor
>>>On Oct 5, 2004, at 12:09 PM, Juan Carlos Perin wrote:
>>>
>>>      
>>>
>>>>I have been running benchmarks with blastall on several different
>>>>machines.  We've come to realize that one of the biggest differences
>>>>affecting search times is how the machines actually maintain the search
>>>>databases in memory.
>>>>
>>>>E.g., on our IBM 8-way machine the databases are held in memory, which
>>>>seems to be an effect of the architecture of the machine, and search
>>>>times become incredibly fast after an initial run stores the database
>>>>in memory.  The same effect seems to take place on our dual-Xeon Dell
>>>>(PE 1650), which also outpaces the Xserves significantly after an
>>>>initial run populates the db in memory.
>>>>
>>>>It would appear that the Xserves dump the db from memory after each
>>>>search, even when submitting batch jobs with multiple sequences in a
>>>>file.  Is anyone aware of how this works, and how this behavior might
>>>>be changed to allow the db to stay in memory longer?  Thanks
>>>>
>>>>Juan Perin
>>>>Children's Hospital of Philadelphia
>>>>

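For anyone who still wants to try the RAM disk idea from the quoted thread
despite the caveats above, a rough OSX-only sketch follows.  hdiutil's
ram:// pseudo-device takes a size in 512-byte sectors, and the 2 GB figure
and "blastdb" volume name are illustrative assumptions, not a tested
recipe:

```shell
#!/bin/sh
# Rough sketch, OSX only: create a 2 GB RAM disk for BLAST databases.
# ram:// sizes are in 512-byte sectors: 2 GB / 512 B = 4194304 sectors.
SECTORS=$((2 * 1024 * 1024 * 1024 / 512))
if command -v hdiutil >/dev/null 2>&1; then
    # Attach an unformatted RAM-backed device; hdiutil prints /dev/diskN.
    DEV=$(hdiutil attach -nomount ram://${SECTORS})
    # Format and mount it (appears at /Volumes/blastdb).
    diskutil erasevolume HFS+ blastdb ${DEV}
    # Copy the formatted db there, point blastall's -d at it, and run
    # `hdiutil detach ${DEV}` when finished.
else
    echo "hdiutil not found; this step requires OSX (sectors=${SECTORS})" >&2
fi
```

Note that the memory backing the RAM disk is no longer available to
blastall itself, which is part of why segmenting the database is the
safer route.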

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615