[Biodevelopers] XML for huge DB?
Alex Milowski
alex at milowski.com
Thu Jul 31 12:39:28 EDT 2003
On Thursday, July 31, 2003, at 09:02 AM, Dan Bolser wrote:
> Hello,
>
> How can I use XML efficiently to parse multiple blast results
> files?
>
> I want to parse them on a multi processor environment, without
> hitting the system memory limit.
>
> This is likely to happen, as big files take the most time, so the
> processes tend to work on big files at the same time, leading
> to a system memory outage....
You need to parse your XML in a "streaming" fashion. If you are using
Java, for most people that means using SAX. You should write a
ContentHandler (org.xml.sax package) that gathers your results. The SAX
ContentHandler is a call-back style API, so programming against it can
get complicated, but it doesn't have to.

Many C/C++ parsers have similar call-back style APIs. Basically, you
want to interface with the parser directly and get the essential
information as efficiently as possible.

If you plan to use Java 2, check out version 1.4.x and the
javax.xml.parsers and org.xml.sax packages.
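
As a rough illustration of the call-back approach described above, here
is a minimal sketch of a SAX handler. It assumes the BLAST XML output
marks each hit identifier with a Hit_id element; the class name
BlastHitCollector and the parse() helper are made up for this example,
and the syntax is more recent Java than 1.4.x. Treat it as a starting
point, not a finished parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps only the text of <Hit_id> elements
// (element name is an assumption about the input) and ignores the rest.
// DefaultHandler implements ContentHandler with empty methods.
class BlastHitCollector extends DefaultHandler {
    private final List<String> hitIds = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inHitId = false;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("Hit_id".equals(qName)) {
            inHitId = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks,
        // so append rather than assign.
        if (inHitId) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Hit_id".equals(qName)) {
            hitIds.add(text.toString().trim());
            inHitId = false;
        }
    }

    List<String> getHitIds() {
        return hitIds;
    }

    // Streams the document through the handler; no tree is built.
    static List<String> parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BlastHitCollector handler = new BlastHitCollector();
        parser.parse(in, handler);
        return handler.getHitIds();
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up document standing in for a real results file.
        String xml = "<BlastOutput><Hit><Hit_id>gi|1</Hit_id></Hit>"
                   + "<Hit><Hit_id>gi|2</Hit_id></Hit></BlastOutput>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parse(in));
    }
}
```

Only the text of the element being read is held in memory at any time.
For very large files you would probably write each result out from
endElement instead of accumulating everything in a list.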
Alex Milowski FAX: (707) 598-7649
alex at milowski.com
"The excellence of grammar as a guide is proportional to the paucity of
the inflexions, i.e. to the degree of analysis effected by the language
considered."
Bertrand Russell in a footnote of Principles of Mathematics