[Biodevelopers] XML for huge DB?
Joseph Landman
landman at scalableinformatics.com
Thu Jul 31 12:44:52 EDT 2003
On Thu, 2003-07-31 at 12:26, Dan Bolser wrote:
> No, the problem is that a big results file can grab 50% of the 4GB
> memory on the system. When I run 4 processes (a file of this size
> takes about 1 hour to process with XML::Simple), as soon as more
> than one process encounters a big file I am scuppered.
Have a look at XML::Twig
"XML::Twig - A perl module for processing huge XML documents in tree
mode."
http://search.cpan.org/author/MIROD/XML-Twig-3.10/Twig.pm
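A minimal sketch of the streaming pattern, assuming NCBI BLAST XML
element names (Hsp, Hsp_evalue, Hsp_score) and a placeholder file
name: a handler fires on each <Hsp>, prints one line, and purges the
tree so memory stays bounded.

  #!/usr/bin/perl
  use strict;
  use warnings;
  use XML::Twig;

  my $twig = XML::Twig->new(
      twig_handlers => {
          # called once per <Hsp>; the element is freed afterwards
          'Hsp' => sub {
              my ( $t, $hsp ) = @_;
              print $hsp->first_child_text('Hsp_evalue'), "\t",
                    $hsp->first_child_text('Hsp_score'),  "\n";
              $t->purge;    # discard everything parsed so far
          },
      },
  );
  $twig->parsefile('blast_report.xml');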
> I am looking for a memory-light way of parsing the BLAST results
> files from XML, i.e. one HSP at a time with a print event for
> each, rather than the whole-file-at-a-time processing of
> XML::Simple....
You might also look at Bioperl to handle this; its Bio::SearchIO
module provides a neat interface for exactly this task (sketch below).
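Something along these lines, using Bio::SearchIO with the blastxml
format (the file name is again a placeholder); it walks results, hits,
and HSPs one at a time instead of slurping the whole report:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Bio::SearchIO;

  my $in = Bio::SearchIO->new( -format => 'blastxml',
                               -file   => 'blast_report.xml' );

  # iterate result -> hit -> HSP, printing one line per HSP
  while ( my $result = $in->next_result ) {
      while ( my $hit = $result->next_hit ) {
          while ( my $hsp = $hit->next_hsp ) {
              print join( "\t", $result->query_name,
                                $hit->name, $hsp->evalue ), "\n";
          }
      }
  }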
XML::Simple slurps the entire file into memory for parsing, which is
not a good idea for big documents. XML::SAX is an option, but you have
to work harder to write your callbacks and parsers. The callbacks
under Twig are easy to write as closures.
XML::Twig's next_sibling() method may also be useful for this.
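To illustrate the closure point: the handler below captures the
lexical $count, so per-file state needs no globals or handler classes
(again assuming <Hsp> elements from NCBI BLAST XML, with the file name
taken from the command line).

  use strict;
  use warnings;
  use XML::Twig;

  my $count = 0;    # lexical captured by the handler closure
  my $twig  = XML::Twig->new(
      twig_handlers => {
          'Hsp' => sub { $count++; $_[0]->purge },
      },
  );
  $twig->parsefile( $ARGV[0] );
  print "$count HSPs in $ARGV[0]\n";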
--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615