[Bioclusters] batching of blast searches

Joseph Landman bioclusters@bioinformatics.org
18 Mar 2003 10:17:44 -0500


Hi Justin:

  The databases that fit into memory still have a nonzero load time, but
it is not bad.  The problem that I have been trying to understand better
is how to tune the page cache system so that it gives up pages more
readily.  Linux appears (by default in most distributions) to hold onto
its cache as long as possible, which could significantly reduce the
in-core size of the process's mmap.  This means more IO subsystem
traffic, and the costs associated with that.
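
  For a sense of what "giving up pages" looks like from the application
side, here is a minimal Python sketch.  It is not the kernel-level tuning
discussed above; it assumes a Linux or other POSIX system where
os.posix_fadvise is available, and the helper name drop_cached_pages is
made up for illustration.  The call is only a hint: the kernel may ignore
it, and dirty pages are not dropped until they have been written back.

```python
import os


def drop_cached_pages(path):
    """Hint that the cached pages of `path` are no longer needed.

    Returns True if the hint was issued, False if this platform does not
    expose posix_fadvise. This is advisory only; the kernel decides what
    (if anything) to evict.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, length=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```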

  It is still faster than real disk IO though.  You will have to pay
the program startup cost, including the db load, the memory allocation,
etc., for each program instance.  If you are lucky, the db is already in
the page cache (or even better, in a locked/shared mmap (shmem) region).
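
  A rough way to see the page cache effect is to map a file and touch one
byte per page, then do it again.  The sketch below is in Python rather
than BLAST's own code, touch_pages is a hypothetical helper, and the path
in the usage note is a placeholder.  It fails on an empty file, since a
zero-length file cannot be mapped.

```python
import mmap
import time


def touch_pages(path):
    """Map `path` read-only and read one byte from every page.

    Returns (pages_touched, elapsed_seconds). Reading a byte forces the
    page to be resident, either from the page cache or from disk.
    """
    with open(path, "rb") as f:
        start = time.perf_counter()
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pages = 0
            for offset in range(0, len(m), mmap.PAGESIZE):
                m[offset]  # fault the page in if it is not resident
                pages += 1
        elapsed = time.perf_counter() - start
    return pages, elapsed
```

  Calling touch_pages("/path/to/db") twice in a row should typically show
the second call finishing faster if the pages stayed in cache, though the
numbers depend on memory pressure and on what else is running.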

Joe



On Tue, 2003-03-18 at 09:54, Justin Powell wrote:
> On 18 Mar 2003, Joseph Landman wrote:
> 
> >   You are fighting the database load (actually an mmap) time, as well as
> > the queue latency, against a sequence comparison time, which is
> > dominated by the search portion.
> 
> I'm not sure I understand how the whole linux page cache and mmap thing
> works, but naively I would assume that the VM system knows that the
> pages in the cache still contain the mmaped database, and can bring them
> in using hardware page mapping rather than any kind of memory to memory
> copying? In which case database loading (assuming the whole db fits in
> memory) should be insignificant on subsequent runs?
> 
> justin powell
-- 
Joseph Landman <landman@scalableinformatics.com>