[Biodevelopers] XML for huge DB?
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Thu Jul 31 18:28:42 EDT 2003
On Thu, 31 Jul 2003, Patrick McConnell wrote:
> You are better off using SAX instead of DOM. What we do is filter Hsps and
> Hits using a streaming technology (such as SAX), and then we parse the rest
> with DOM. But, if you need all the Hsps and Hits, then you must use SAX or
> load balancing.
Yup, cheers, SAX is the way forward.
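A minimal sketch of the streaming filter described above, using Python's
standard xml.sax module. The element names Hit_id and Hsp_evalue are
assumed to follow the NCBI BLAST XML output format, and the e-value cutoff
is a hypothetical filter criterion; neither is specified in this thread.

```python
import io
import xml.sax


class HspFilter(xml.sax.ContentHandler):
    """Keep (hit_id, evalue) pairs for HSPs at or below an e-value cutoff."""

    def __init__(self, max_evalue):
        super().__init__()
        self.max_evalue = max_evalue
        self.kept = []
        self._hit_id = None
        self._text = []

    def startElement(self, name, attrs):
        # Start collecting character data for the element just opened.
        self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks.
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "Hit_id":            # assumed BLAST XML element name
            self._hit_id = value
        elif name == "Hsp_evalue":      # assumed BLAST XML element name
            evalue = float(value)
            if evalue <= self.max_evalue:
                self.kept.append((self._hit_id, evalue))
        self._text = []


def filter_hsps(source, max_evalue=1e-5):
    """Stream-parse a file name or binary stream; only kept HSPs stay in memory."""
    handler = HspFilter(max_evalue)
    xml.sax.parse(source, handler)
    return handler.kept


# Tiny in-memory document standing in for a real BLAST report.
sample = (b"<BlastOutput><Hit><Hit_id>a</Hit_id><Hit_hsps>"
          b"<Hsp><Hsp_evalue>1e-20</Hsp_evalue></Hsp>"
          b"<Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp>"
          b"</Hit_hsps></Hit></BlastOutput>")
kept = filter_hsps(io.BytesIO(sample))
```

Because the handler stores only the HSPs that pass the cutoff, memory use
depends on the number of kept records rather than on the size of the file.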
>
> Load balance based on file size. When your threads (or processes) ask for
> another document to parse, you must give them one based on the size of the
> documents the other threads are parsing. But I feel like the large
> documents are still going to dominate the CPU time, and thus you will only
> be left with a bunch of large documents in the end.
I thought about this too, but I hate anything complex;)
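For reference, one possible reading of the size-based scheme, sketched
under stated assumptions: the scheduler tracks the sizes of the files
currently being parsed and hands out the largest pending file that still
fits a memory budget. The function name pick_next, the budget parameter,
and the largest-first choice are all hypothetical; file size is used as a
rough stand-in for parse memory.

```python
def pick_next(pending, in_progress_sizes, budget):
    """Choose the next (name, size) to parse, or None if the worker should wait.

    pending           -- list of (name, size) tuples not yet started
    in_progress_sizes -- sizes of the files other workers are parsing now
    budget            -- total size allowed in flight at once (assumed limit)
    """
    if not pending:
        return None
    if not in_progress_sizes:
        # Nothing is running, so hand out the largest file even if it
        # exceeds the budget; otherwise it could never be scheduled.
        return max(pending, key=lambda item: item[1])
    remaining = budget - sum(in_progress_sizes)
    fitting = [item for item in pending if item[1] <= remaining]
    if not fitting:
        return None
    # Largest-first, so big documents are not all left until the end.
    return max(fitting, key=lambda item: item[1])


pending = [("a.xml", 900), ("b.xml", 300), ("c.xml", 50)]
choice = pick_next(pending, in_progress_sizes=[600], budget=1000)
```

Here one worker is already parsing a 600-unit file, so only 400 units of
budget remain and b.xml is chosen over the larger a.xml.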
I found a really neat way to do massive dumps to MySQL without
incurring either of the usual overheads: increasingly slow index
updates, or (very) large prepared files for LOAD DATA INFILE.
Simply LOAD DATA INFILE from a named pipe. It works perfectly,
and multiple processors (with a common file system) can cooperate
like a charm.
I found this solution in a MySQL bug report.
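A rough sketch of the named-pipe idea, assuming a POSIX system. The
function names, the table name, and the tab-separated row layout are
hypothetical (tab and newline are the LOAD DATA INFILE default
delimiters). In the demo a reader thread stands in for the MySQL server,
which would instead run the generated statement and needs read access to
the pipe on the same host.

```python
import os
import tempfile
import threading


def load_statement(fifo_path, table):
    """Build the statement a MySQL client would issue against the pipe."""
    return f"LOAD DATA INFILE '{fifo_path}' INTO TABLE {table}"


def write_rows(fifo_path, rows):
    """Write rows as tab-separated lines into the pipe.

    open() blocks until a reader (the server, in real use) opens the pipe,
    so no large intermediate file is ever written to disk.
    """
    with open(fifo_path, "w") as fifo:
        for row in rows:
            fifo.write("\t".join(str(field) for field in row) + "\n")


def demo(rows):
    """Push rows through a FIFO; a thread plays the part of the server."""
    fifo_path = os.path.join(tempfile.mkdtemp(), "hits.fifo")
    os.mkfifo(fifo_path)
    received = []

    def reader():
        with open(fifo_path) as fifo:
            received.append(fifo.read())

    thread = threading.Thread(target=reader)
    thread.start()
    write_rows(fifo_path, rows)
    thread.join()
    os.remove(fifo_path)
    return received[0]


text = demo([("q1", "h1", 1e-20), ("q2", "h2", 0.5)])
```

Each parsing process could write to its own pipe in the same way, with
one LOAD DATA INFILE statement issued per pipe.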
Thanks again,
Dan.
>
> -Patrick
> Dan Bolser <dmb at mrc-dunn.cam.ac.uk>@bioinformatics.org on 07/31/2003
> 12:02:17 PM
>
> Please respond to biodevelopers at bioinformatics.org
>
> Sent by: biodevelopers-admin at bioinformatics.org
>
>
> To: biodevelopers at bioinformatics.org
> cc:
>
> Subject: [Biodevelopers] XML for huge DB?
>
> Hello,
>
> How can I use XML efficiently to parse multiple blast results
> files?
>
> I want to parse them in a multiprocessor environment, without
> hitting the system memory limit.
>
> This is likely to happen, as big files take the most time, so the
> processes tend to work on big files at the same time, leading
> to a system memory outage....
>
> Cheers,
> Dan.
>
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biodevelopers