[Bioclusters] BioPerl and memory handling

Mike Cariaso cariaso at yahoo.com
Mon Nov 29 18:03:57 EST 2004

This message is being cross posted from bioclusters to
bioperl. I'd appreciate a clarification from anyone in
bioperl who can speak more authoritatively than I can.

Perl does have a garbage collector. It is not wildly
sophisticated. As you've suggested it uses simple
reference counting. This means that circular
references will cause memory to be held until the program exits.
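
To make the reference-counting point concrete, here is a minimal
sketch in plain Perl. The Node class and its field names are
hypothetical and exist only for illustration; weaken() comes from the
core Scalar::Util module. It counts DESTROY calls to show that a
strong cycle is not freed at scope exit, while the same structure with
a weakened back-reference is.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Counts how many Node objects have been destroyed so far.
my $destroyed = 0;

package Node;    # hypothetical class, for illustration only
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}
sub DESTROY { $destroyed++ }

package main;

# Case 1: a strong cycle. Each node holds a counted reference to the
# other, so neither reference count reaches zero at scope exit.
{
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;
}
my $after_cycle = $destroyed;

# Case 2: the same shape, but the back-reference is weakened, so it
# no longer contributes to the parent's reference count.
{
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent});
}
my $after_weak = $destroyed;

print "destroyed after strong cycle:   $after_cycle\n";
print "destroyed after weakened cycle: $after_weak\n";
```

The first count stays at 0 because the two nodes keep each other
alive; the second rises to 2 because both nodes are freed as soon as
the block ends.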

However I think you are overstating the inefficiency
in the system. While the perl GC *may* not release
memory to the system, it does at least allow memory to
be reused within the process. 

If the system instead behaved as you describe, I think
perl would hemorrhage memory and would be unsuitable
for any long running processes. 

However, I can say with considerable certainty that
BPLite is able to handle blast reports which
cause SearchIO to thrash. I've attributed this to
BPLite being a true stream processor, while SearchIO
seems to slurp the whole file and object hierarchy
into memory.
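
The difference between the two models can be sketched in plain Perl.
This is not the actual BPLite or SearchIO code; the tab-separated
record format and the function names parse_all and record_iterator are
assumptions made up for the example. The slurp version builds every
record before the caller sees any of them, while the stream version
hands back one record per call, so only the current record needs to be
alive at a time.

```perl
use strict;
use warnings;

# Hypothetical report: one "query<TAB>hit<TAB>score" line per hit.
my $report = join '', map { "q$_\thit$_\t" . ($_ * 10) . "\n" } 1 .. 5;

# Turn one input line into a record (a plain hash reference).
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($query, $hit, $score) = split /\t/, $line;
    return { query => $query, hit => $hit, score => $score };
}

# Slurp model: every record is built and held before returning.
sub parse_all {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        push @records, parse_line($line);
    }
    return \@records;
}

# Stream model: return an iterator that yields one record per call
# and undef at end of input.
sub record_iterator {
    my ($fh) = @_;
    return sub {
        my $line = <$fh>;
        return unless defined $line;
        return parse_line($line);
    };
}

# In-memory filehandles stand in for a real report file.
open my $fh1, '<', \$report or die "cannot open in-memory report: $!";
my $slurp_total = 0;
$slurp_total += $_->{score} for @{ parse_all($fh1) };

open my $fh2, '<', \$report or die "cannot open in-memory report: $!";
my $next = record_iterator($fh2);
my $stream_total = 0;
while (my $rec = $next->()) {
    $stream_total += $rec->{score};    # previous record is already freed
}

print "slurp total: $slurp_total, stream total: $stream_total\n";
```

Both approaches compute the same total; what differs is peak memory,
which grows with the report size in the slurp model and stays roughly
constant in the stream model.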

I know that SearchIO is the preferred blast parser, but
it seems that BPLite is not quite dead, for the
reasons above. If this is in fact the unique benefit of
BPLite, perhaps the documentation should be clearer
about this, as I suspect I'm not the only person to
have had to reengineer a substantial piece of code to
adjust between their different models. Had I known of
this difference early on I would have chosen BPLite.

So, bioperlers (especially Jason Stajich) can you shed
any light on this vestigial bioperl organ?

--- Malay <mbasu at mail.nih.gov> wrote:

> Michael Cariaso wrote:
> > Michael Maibaum wrote:
> > 
> >>
> >> On 10 Nov 2004, at 18:25, Al Tucker wrote:
> >>
> >>> Hi everybody.
> >>>
> >>> We're new to the Inquiry Xserve scientific
> cluster and trying to iron 
> >>> out a few things.
> >>>
> >>> One thing we seem to be coming up against is
> an out of memory 
> >>> error when getting large sequence analysis
> results (5,000 seq - at 
> >>> least- and above) back from BTblastall. The
> problem seems to be with 
> >>> BioPerl.
> >>>
> >>> Might anyone here know if BioPerl knows
> enough not to try and 
> >>> access more than 4gb of RAM in a single process
> (an OS X limit)? I'm 
> >>> told Blastall and BTblastall are and will chunk
> problems accordingly, 
> >>> but we're not certain if BioPerl is when called
> to merge large Blast 
> >>> results back together. It's the default version
> 1.2.3 that's supplied 
> >>> btw, and OS X 10.3.5 with all current updates
> just short of the 
> >>> latest 10.3.6 update.
> >>
> >>
> >> BioPerl tries to slurp up the entire results set
> from a BLAST query, 
> >> and build objects for each little bit of the
> result set and uses lots 
> >> of memory. It doesn't have anything smart at all
> about breaking up the 
> >> job within the result set, afaik.
> >>
> This is not really true. SearchIO module as far as I
> know works on stream.
> >>  I ended up stripping out results that hit a
> certain threshold size to 
> >> run on a different, large memory opteron/linux
> box and I'm 
> >> experimenting with replacing BioPerl with
> BioPython etc.
> >>
> >> Michael
> > 
> > 
> > You may find that the BPLite parser works better
> when dealing with 
> > large blast result files. It's not as clean or
> maintained, but it does 
> > the job nicely for my current needs, which
> overloaded the usual parser.
> There is basically no difference between BPLite and
> other BLAST parser 
> interfaces in Bioperl.
> The problem lies in the core of Perl itself. Perl
> does not release 
> memory to the system even after the reference count
> of an object created 
> in the memory goes to 0, unless the program is
> actually over. Perl 
> object system is highly inefficient to handle large
> number of objects 
> created in the memory.
> -Malay
> _______________________________________________
> Bioclusters maillist  - 
> Bioclusters at bioinformatics.org

Mike Cariaso
