[Biodevelopers] XML for huge DB?
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Thu Jul 31 18:43:23 EDT 2003
On 31 Jul 2003, Michael Gruenberger wrote:
> I agree with the other posters, but if you want to continue using your
> XML::Simple package, a quick 'hack' might be to check whether you are
> already parsing a large file in one of your other processes, and only
> parse files larger than a certain size when there is enough memory and
> no other process is parsing a large file.
>
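Michael's "one large file at a time" idea could be sketched as follows, in Java to match the streaming advice elsewhere in this thread. This is a minimal sketch under stated assumptions: the class name LargeFileGate, the shared lock-file path and the size threshold are all hypothetical. It only covers the "no other process is parsing a large file" part and does not check free system memory. java.nio file locks are held per JVM, so it also assumes each worker runs as its own process.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;

// Hypothetical helper: small files are processed freely in parallel, but only
// one process at a time may work on a file at or above the size threshold.
public class LargeFileGate {
    private final Path lockFile;      // shared by all worker processes (hypothetical path)
    private final long sizeThreshold; // in bytes; an assumed value to tune per machine

    public LargeFileGate(Path lockFile, long sizeThreshold) {
        this.lockFile = lockFile;
        this.sizeThreshold = sizeThreshold;
    }

    public boolean isLarge(Path input) throws IOException {
        return Files.size(input) >= sizeThreshold;
    }

    // Runs the job directly for small files. For large files it first takes an
    // exclusive lock on lockFile, blocking until no other process holds it.
    public <T> T run(Path input, Callable<T> job) throws Exception {
        if (!isLarge(input)) {
            return job.call();
        }
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            return job.call();
        } // lock is released, then the channel is closed
    }
}
```

A worker would then call something like gate.run(path, () -> parseBlastFile(path)), where parseBlastFile stands for whatever parsing routine is in use (a hypothetical name). Small files never touch the lock, so they keep all processors busy while large files are serialized.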
> As you have a .cam.ac.uk address, is there anything you could use on
> mole.bio.cam.ac.uk? Maybe they would be willing to share some code?!
?
What is this?
Ta,
Dan.
>
> Michael.
>
> On Thu, 2003-07-31 at 16:39, Alex Milowski wrote:
> > On Thursday, July 31, 2003, at 09:02 AM, Dan Bolser wrote:
> >
> > > Hello,
> > >
> > > How can I use XML efficiently to parse multiple BLAST results
> > > files?
> > >
> > > I want to parse them in a multiprocessor environment without
> > > hitting the system memory limit.
> > >
> > > This is likely to happen: big files take the most time, so the
> > > processes tend to work on big files at the same time, leading
> > > to a system memory outage....
> >
> > You need to parse your XML in a "streaming" fashion. If you are using
> > Java, for most people that means using SAX. You should write a
> > ContentHandler (org.xml.sax package) that gathers your results. The
> > SAX ContentHandler is a call-back style API, so programming can get
> > complicated, but it doesn't have to.
> >
> > Many C/C++ XML APIs have a similar call-back style. Basically, you
> > want to interface with the parser directly and extract the essential
> > information as efficiently as possible.
> >
> > If you plan to use Java 2, check out version 1.4.x and the
> > javax.xml.parsers and org.xml.sax packages.
> >
> > Alex Milowski FAX: (707) 598-7649
> > alex at milowski.com
> >
> > "The excellence of grammar as a guide is proportional to the paucity of
> > the inflexions, i.e. to the degree of analysis effected by the language
> > considered."
> >
> > Bertrand Russell in a footnote of Principles of Mathematics
> >
> >
> > _______________________________________________
> > Biodevelopers mailing list
> > Biodevelopers at bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/biodevelopers
>
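A minimal sketch of the streaming approach Alex describes, using only the JDK's javax.xml.parsers and org.xml.sax packages. It assumes NCBI BLAST XML output with Hit, Hit_id and Hsp_evalue elements; the class name BlastHitHandler and the e-value cutoff are hypothetical illustrations, not part of any BLAST tool. The handler never builds a document tree: it keeps only the IDs of hits whose best HSP e-value passes the cutoff, so memory grows with the number of retained hits rather than with the size of the file.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Streams a BLAST XML file and keeps only the IDs of hits whose best HSP
// e-value is at or below maxEvalue. Element names assume NCBI BLAST XML.
public class BlastHitHandler extends DefaultHandler {
    private final double maxEvalue;
    private final List<String> keptHits = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentHitId;
    private double bestEvalue;

    public BlastHitHandler(double maxEvalue) {
        this.maxEvalue = maxEvalue;
    }

    public List<String> getKeptHits() {
        return keptHits;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // collect this element's text from scratch
        if (qName.equals("Hit")) {
            currentHitId = null;
            bestEvalue = Double.POSITIVE_INFINITY;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "Hit_id":
                currentHitId = text.toString().trim();
                break;
            case "Hsp_evalue":
                bestEvalue = Math.min(bestEvalue, Double.parseDouble(text.toString().trim()));
                break;
            case "Hit":
                if (currentHitId != null && bestEvalue <= maxEvalue) {
                    keptHits.add(currentHitId);
                }
                break;
            default:
                break;
        }
    }

    public static List<String> parse(InputStream in, double maxEvalue) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BlastHitHandler handler = new BlastHitHandler(maxEvalue);
        parser.parse(in, handler);
        return handler.getKeptHits();
    }

    // Usage: java BlastHitHandler <blast-xml-file> <max-evalue>
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            for (String id : parse(in, Double.parseDouble(args[1]))) {
                System.out.println(id);
            }
        }
    }
}
```

SAX may deliver an element's text across several characters() calls, which is why the handler accumulates into a StringBuilder and only reads it in endElement.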