[Bioclusters] Xserve G5 memory
Joe Landman
bioclusters@bioinformatics.org
Tue, 05 Oct 2004 16:40:41 -0400
Note: nt is a huge database. How much memory do you have? Unless you
have 8-10 GB of RAM, I would imagine you would be paging. A RAM disk
for nt is most definitely not the approach I would recommend.
I would strongly urge you to look at the "-v" switch for formatdb, and
break your nt up into volumes about 1/3 the size of your physical RAM.

    formatdb -p F -o T -v 512 nt

should break nt into about 512 MB segments. If your memory is 1 GB,
look at using -v 333. If your memory is 4 GB or larger, I would
recommend -v 1024.
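
As a rough sketch of the arithmetic above, the sizing rule can be written
as a small helper. The function names, the 1000-units-per-GB convention
(chosen so that 1 GB gives 333), and reading "-v 1024" as a cap for larger
machines are assumptions made for illustration, not anything formatdb
provides. The units of -v should also be checked against formatdb's own
help text before relying on the numbers.

```python
def suggest_volume_size(ram_gb, cap=1024):
    """Return a candidate value for formatdb's -v switch.

    Rule of thumb from this thread: about 1/3 of physical RAM,
    with 1024 taken as the ceiling for machines with 4 GB or more.
    The cap and the function name are illustrative assumptions.
    """
    ram_units = int(ram_gb * 1000)  # count 1 GB as 1000 so that 1 GB -> 333
    return min(ram_units // 3, cap)


def formatdb_args(db_name, ram_gb):
    """Build the formatdb command line used in this thread as an argument list."""
    volume = suggest_volume_size(ram_gb)
    return ["formatdb", "-p", "F", "-o", "T", "-v", str(volume), db_name]


if __name__ == "__main__":
    # Print the suggested command for a few memory sizes (in GB).
    for ram_gb in (1, 2, 4, 8):
        print(ram_gb, "GB:", " ".join(formatdb_args("nt", ram_gb)))
```

The helper only builds the argument list; it does not run formatdb, so the
output can be inspected before anything is reformatted.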
Joe
Juan Carlos Perin wrote:
>Well, for these benchmarks we focused on the larger NT database. This is
>the only database that really shows significant lag time in search runs due
>to its large size, but I have confirmed that the database is not overflowing
>into swap space. This wasn't initially assumed because we have plenty of
>memory, but I verified it anyway.
>
>OSX can be 'kernel tuned' through 'sysctl' but it doesn't seem obvious how
>to do this to allow retention of a flatfile database for this type of an
>application.
>
>One suggestion, from Aaron Darling, is to create a RAM disk in OSX,
>which I will look into. If that works, we may very well just dedicate
>a node to this, since the search time when the DB is in memory is so
>much faster, and that is very important in our scenario.
>
>Thanks
>
>
>
>On 10/5/04 4:01 PM, "Joe Landman" <landman@scalableinformatics.com> wrote:
>
>
>
>>Unless you are using a memory mapped process reading these same files at
>>the same time, it is likely that they are only showing up in buffer
>>cache. OSX probably has a very different cache retention policy as
>>compared to AIX and Linux. This is usually a kernel tunable.
>>
>>Ask your Apple folks about how to tweak the kernel.
>>
>>Another very important issue is the size of the file as compared to the
>>size of memory (more specifically buffer cache). If the file overflows
>>memory, the mmap mechanism will happily ask the kernel to start paging.
>>This is "A Bad Thing(tm)". You want your database indices small enough
>>to hold in memory for good performance. You don't want them too small,
>>as you will start to pay some costs associated with increased file
>>activity.
>>
>>Which database is being used?
>>
>>Victor M.Ruotti wrote:
>>
>>
>>
>>>Hi Juan,
>>>How exactly do you hold your databases in memory? Do you do it through
>>>programming? It may help to describe how exactly this is done. I am
>>>also curious to know how you do it.
>>>
>>>Victor
>>>On Oct 5, 2004, at 12:09 PM, Juan Carlos Perin wrote:
>>>
>>>
>>>
>>>>I have been running benchmarks with blastall on several different
>>>>machines. We've come to realize that one of the biggest differences
>>>>affecting search times is how the machines actually maintain the
>>>>search databases in memory.
>>>>
>>>>E.g., on our IBM 8-way machine, the databases are held in memory,
>>>>which seems to be an effect of the architecture of the machine, and
>>>>search times become incredibly fast after an initial run, which
>>>>stores the database in memory. The same effect seems to take place
>>>>on our Dual Xeon Dell (PE 1650), which also outpaces the Xserves
>>>>significantly after an initial run to populate the db in memory.
>>>>
>>>>It would appear that the Xserves dump the db from memory after each
>>>>search, even when submitting batch jobs with multiple sequences in a
>>>>file. Is anyone aware of how this functions, and how this effect
>>>>might be changed to allow the db to stay in memory longer? Thanks
>>>>
>>>>Juan Perin
>>>>Child. Hospital of Philadelphia
>>>>
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 612 4615