[Bioclusters] BioPerl 1.2.3 and memory handling

Tue Nov 30 16:22:18 EST 2004

On Nov 30, 2004, at 4:10 PM, Mike Cariaso wrote:

> Al,
>
> While I'm certainly learning a bit from the
> bioperlers, we seem to have strayed a bit from your
> original question.
>
> If you don't need to see the alignments, you might
> wish to investigate if your software can be made to
> use blast's table output ("blastall -m 8" I believe).
> Perhaps the bioperl parser will recognize the format,
> and will be able to complete since it will have no
> alignments to eat up memory. If it's not automatically
> recognized, writing a parser for this might be pretty
> simple.
>
This is the 'blasttable' format in SearchIO. It will be more efficient 
since there is less data to store, but it may still suffer from the 
memory overhead of creating Result/Hit/HSP objects even if they don't 
contain the alignment information. Bioperl 1.2.3 is pretty old so it 
might not have this format; upgrading to 1.4 or the upcoming 1.5 
release is suggested if you want to take advantage of bugfixes and new 
functionality.

If you are running WU-BLAST you can specify the -noseqs option to not 
see the alignment data and the modern SearchIO::blast (I think only 
since the 1.4 bioperl release) will properly construct HSPs for you 
with the start/end information but no alignment sequences.

My feeling is you should use SearchIO if you want the flexibility of 
changing algorithms, versions of programs, or output options without 
having to change your script, since the script is written against the 
object API rather than against one particular report format.
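
To illustrate the point about the object API, here is a minimal sketch, 
assuming a BioPerl recent enough to ship the format you ask for. The 
sub name print_hsp_coordinates and the script name in the comment are 
made up for this example; the SearchIO calls are the usual 
next_result/next_hit/next_hsp iteration. The report format is just a 
parameter, so the loop does not change when the format does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print one tab-delimited line per HSP from a search report.
# $format is a SearchIO format name, e.g. 'blast', 'blasttable' or 'blastxml'.
# (print_hsp_coordinates is a hypothetical helper name for this sketch.)
sub print_hsp_coordinates {
    my ($file, $format) = @_;

    # Loaded at call time, so BioPerl is only needed when a report is parsed.
    require Bio::SearchIO;

    my $in = Bio::SearchIO->new(-file => $file, -format => $format);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t",
                    $result->query_name, $hit->name,
                    $hsp->start('query'), $hsp->end('query'),
                    $hsp->start('hit'),   $hsp->end('hit'),
                    $hsp->evalue), "\n";
            }
        }
    }
    return;
}

# Hypothetical invocation: perl hsp_coords.pl report.out blasttable
print_hsp_coordinates(@ARGV) if @ARGV == 2;
```

Note that this still builds Result/Hit/HSP objects for every record, so 
it trades memory and speed for the ability to swap formats.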

If speed is what you want then convert things down to tab-delimited 
format; the parser is then as follows, and you get to do whatever you 
want with the columns.

while (<>) {
    chomp;
    my @fields = split(/\t/, $_);
    # do something with an HSP
}
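
As a slightly fuller sketch of the same idea, the fields can be given 
names so the rest of the script does not depend on column positions. 
This assumes the standard 12-column layout of "blastall -m 8" output; 
the sub name parse_m8_line, the sample lines and the e-value cutoff are 
all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order of NCBI "blastall -m 8" tabular output.
my @COLUMNS = qw(query subject pct_identity aln_length mismatches gap_opens
                 q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash ref keyed by column name.
# Returns undef for blank lines, comment lines, or lines whose field
# count does not match the expected 12 columns.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @fields = split /\t/, $line;
    return unless @fields == @COLUMNS;
    my %hsp;
    @hsp{@COLUMNS} = @fields;
    return \%hsp;
}

# Made-up sample lines, for illustration only (not real BLAST results).
my @sample = (
    "q1\ts1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t380",
    "q1\ts2\t45.00\t80\t40\t4\t5\t84\t30\t109\t0.5\t30.1",
);

my $max_evalue = 1e-5;    # hypothetical cutoff
for my $line (@sample) {
    my $hsp = parse_m8_line($line) or next;
    next if $hsp->{evalue} > $max_evalue;
    print join("\t", @{$hsp}{qw(query subject q_start q_end evalue)}), "\n";
}
```

In a real script the loop would read from a file or STDIN instead of 
@sample; only one line is held in memory at a time, which is the whole 
point of this approach.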

I personally use a combination of approaches, trying to find the right 
tool for the job.

> If you need the alignments but don't need all the
> statistics, you might wish to use the BPLite parser,
> which manages to handle some reports that the SearchIO
> parser cannot.
>
> If you need both, you can probably still use BPLite,
> but you'll need to do a bit more work.
>
> Sadly, I don't believe that the XML (-m 7) format is
> handled by bioperl yet. That would probably solve all
> of these issues.

The format parser is called blastxml and it has been supported since 
Bio::SearchIO was written as it was in fact the first one I wrote 
because I wanted to write a SAX-like environment from the outset.

[jason at lugano SearchIO]$ cvs log blastxml.pm | grep -A2 -P 'revision 1\.1\s+'
revision 1.1
date: 2001/10/22 02:56:32;  author: jason;  state: Exp;
initial commit of SearchIO modules and new Search objects

>
>
> That'll teach you to ask a question! ;)
> Mike Cariaso
>
>
>
>
> --- Al Tucker <act at comm.rockefeller.edu> wrote:
>
>> Hi everybody.
>>
>> We're new to the Inquiry Xserve scientific cluster
>> and trying to iron
>> out a few things.
>>
>> One thing we seem to be coming up against is an
>> out of memory
>> error when getting large sequence analysis results
>> (5,000 seq - at
>> least- and above) back from BTblastall. The problem
>> seems to be with
>> BioPerl.
>>
>> Might anyone here know if BioPerl knows enough
>> not to try and
>> access more than 4gb of RAM in a single process (an
>> OS X limit)? I'm
>> told Blastall and BTblastall are and will chunk
>> problems accordingly,
>> but we're not certain if BioPerl is when called to
>> merge large Blast
>> results back together. It's the default version
>> 1.2.3 that's supplied
>> btw, and OS X 10.3.5 with all current updates just
>> short of the
>> latest 10.3.6 update.
>>
>> - Al Tucker
>
>
> =====
> Mike Cariaso
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/