[Bioclusters] BioPerl 1.2.3 and memory handling

Mike Cariaso cariaso at yahoo.com
Tue Nov 30 16:27:47 EST 2004


sweet merciful crap. 

When is O'Reilly finally gonna pay you guys to write
the book?


--- Jason Stajich <jason.stajich at duke.edu> wrote:

> 
> On Nov 30, 2004, at 4:10 PM, Mike Cariaso wrote:
> 
> > Al,
> >
> > While I'm certainly learning a bit from the bioperlers, we seem
> > to have strayed a bit from your original question.
> >
> > If you don't need to see the alignments, you might wish to
> > investigate whether your software can be made to use BLAST's
> > tabular output ("blastall -m 8", I believe). Perhaps the bioperl
> > parser will recognize the format, and will be able to complete
> > since it will have no alignments to eat up memory. If it's not
> > automatically recognized, writing a parser for it might be pretty
> > simple.
> >
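Mike's suggestion that a parser for the tabular format "might be pretty simple" can be sketched in Perl, the language used elsewhere in this thread. This minimal sketch assumes the 12-column layout of NCBI `blastall -m 8` output (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score); the helper `parse_m8_line` and its hash keys are hypothetical names, not part of BioPerl.

```perl
use strict;
use warnings;

# Column order of NCBI "blastall -m 8" tabular output (12 columns).
# The key names are illustrative, not taken from BioPerl.
my @M8_COLUMNS = qw(
    query_id subject_id percent_identity aln_length mismatches gap_opens
    q_start q_end s_start s_end evalue bit_score
);

# Hypothetical helper: split one tabular line into a hash reference.
# Returns nothing (undef in scalar context) for blank or "#" lines,
# and dies if the line does not have exactly 12 columns.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @values = split /\t/, $line;
    die "expected 12 columns, got " . scalar(@values) . "\n"
        unless @values == @M8_COLUMNS;
    my %hsp;
    @hsp{@M8_COLUMNS} = @values;    # hash slice: column name => value
    return \%hsp;
}

my $hsp = parse_m8_line("q1\tsp|P12345\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370\n");
print "$hsp->{subject_id} $hsp->{evalue}\n";    # prints "sp|P12345 1e-100"
```

Each line becomes a small hash, so no alignment text is ever stored.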
> This is the 'blasttable' format. It will be more efficient since
> there is less data to store, but it may still suffer from the memory
> overhead of creating Result/Hit/HSP objects even if they don't
> contain the alignment information. Bioperl 1.2.3 is pretty old, so
> it might not have this; upgrading to 1.4 or the upcoming 1.5 release
> is suggested if you want to take advantage of bugfixes and new
> functionality.
> 
> If you are running WU-BLAST, you can specify the -noseqs option to
> suppress the alignment data, and the modern SearchIO::blast (I think
> only since the 1.4 bioperl release) will properly construct HSPs for
> you with the start/end information but no alignment sequences.
> 
> My feeling is you should use SearchIO if you want the flexibility of
> changing algorithms, program versions, or output options without
> having to change script code that expects an API for the objects.
> 
> If speed is what you want, then convert things down to tab-delimited
> format; the parser is as follows, and you get to do whatever you
> want with the columns:
>
> while (<>) {
>     chomp;
>     my @fields = split(/\t/, $_);
>     # do something with an HSP
> }
> 
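Because tabular output lists every hit for a query on consecutive lines, the tab-delimited loop above can be extended to hand back one query at a time, so memory is bounded by the largest single query rather than by the whole report. This is a minimal sketch, not a BioPerl API: `each_query`, its callback interface, and the e-value cutoff are hypothetical choices, and it assumes the same 12-column `-m 8` layout (e-value in column 11, bit score in column 12).

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: read "blastall -m 8" lines from $fh and call
# $callback->($query_id, \@hits) once per query. Only the current
# query's hits are held in memory; each hit is an array ref of the
# 12 tab-separated columns. Assumes each query's hits are contiguous.
sub each_query {
    my ($fh, $max_evalue, $callback) = @_;
    my ($current, @hits);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;        # skip blank and comment lines
        my @f = split /\t/, $line;
        (my $evalue = $f[10]) =~ s/^e/1e/;     # defensive: accept "e-105" style values
        next if $evalue > $max_evalue;         # drop hits above the cutoff
        if (defined $current && $f[0] ne $current) {
            $callback->($current, [@hits]);    # flush the previous query
            @hits = ();
        }
        $current = $f[0];
        push @hits, \@f;
    }
    $callback->($current, [@hits]) if defined $current;   # flush the last query
    return;
}

# Usage on a small in-memory report (q2's only hit fails the cutoff).
my $report = "q1\ts1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n"
           . "q1\ts2\t80.0\t90\t18\t0\t5\t94\t3\t92\t1e-10\t60.2\n"
           . "q2\ts3\t70.0\t40\t12\t0\t1\t40\t1\t40\t0.5\t30.1\n";
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
each_query($fh, 1e-5, sub {
    my ($query, $hits) = @_;
    printf "%s\t%d hits\tbest bit score %s\n",
        $query, scalar @$hits, max(map { $_->[11] } @$hits);
});
# prints: q1<TAB>2 hits<TAB>best bit score 180
```

Passing a callback rather than returning a list is what keeps memory flat: nothing outlives the current query unless the callback chooses to keep it.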
> I personally use a combination of approaches, trying to find the
> right tool for the job.
> 
> > If you need the alignments but don't need all the statistics, you
> > might wish to use the BPLite parser, which manages to handle some
> > reports that the SearchIO parser cannot.
> >
> > If you need both, you can probably still use BPLite, but you'll
> > need to do a bit more work.
> >
> > Sadly, I don't believe that the XML (-m 7) format is handled by
> > bioperl yet. That would probably solve all of these issues.
> 
> The format parser is called blastxml, and it has been supported since
> Bio::SearchIO was written; in fact it was the first one I wrote,
> because I wanted to build a SAX-like environment from the outset.
> 
> [jason at lugano SearchIO]$ cvs log blastxml.pm | grep -A2 -P 'revision 1\.1\s+'
> revision 1.1
> date: 2001/10/22 02:56:32;  author: jason;  state: Exp;
> initial commit of SearchIO modules and new Search objects
> 
> >
> > That'll teach you to ask a question! ;)
> > Mike Cariaso
> >
> > --- Al Tucker <act at comm.rockefeller.edu> wrote:
> >
> >> Hi everybody.
> >>
> >> We're new to the Inquiry Xserve scientific cluster and trying to
> >> iron out a few things.
> >>
> >> One thing we seem to be coming up against is an out-of-memory
> >> error when getting large sequence analysis results (at least
> >> 5,000 sequences) back from BTblastall. The problem seems to be
> >> with BioPerl.
> >>
> >> Might anyone here know whether BioPerl knows enough not to try to
> >> access more than 4 GB of RAM in a single process (an OS X
> >> limit)? I'm told Blastall and BTblastall do, and will chunk
> >> problems accordingly, but we're not certain whether BioPerl does
> >> when called to merge large Blast results back together. It's the
> >> default version, 1.2.3, that's supplied, btw, and OS X 10.3.5 with
> >> all current updates, just short of the latest 10.3.6 update.
> >>
> >> - Al Tucker
> >
> > =====
> > Mike Cariaso
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters at bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 


=====
Mike Cariaso

