[Biodevelopers] NCBI XML
Alex Milowski
alex at milowski.com
Thu Jan 16 01:05:16 EST 2003
On Wednesday, January 15, 2003, at 09:51 PM, Joe Landman wrote:
> The other problem for structured documents of this nature is that the
> size of them almost precludes real parsing efforts. A parser is going
> to build up data structures which represent the content of the
> document, and these structures should be of comparable size to the
> document in various cases.
>
> We probably need to start looking at things differently in the file
> systems, and handling the output somewhat differently (and more
> succinctly).
>
Part of my interest is that I've been working on event-parsing schemes
for XML that should be of good use in this area. There are lots of
useful things you can do in an event-oriented environment where you
only look at small subtrees at any point in time. This would then
allow you to traverse a large document (e.g. genome data), doing
whatever you do, without having to try to "load" it into some data
structure first.
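
To make that concrete, here is a rough sketch of the idea in Python, using
the standard library's xml.etree.ElementTree.iterparse. It is only an
illustration, not the scheme I have been working on. The <records>,
<record> and <seq> element names and the two helper functions are made up
for the example and are not taken from the NCBI DTDs.

```python
import io
import xml.etree.ElementTree as ET


def iter_subtrees(source, tag):
    """Yield each completed <tag> element from source, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Once the caller is done with this subtree, drop everything
            # hanging off the root so the whole document is never held.
            root.clear()


# Hypothetical stand-in document; a real file would be opened from disk.
SAMPLE = (
    b"<records>"
    b"<record id='a'><seq>ACGT</seq></record>"
    b"<record id='b'><seq>GGCATT</seq></record>"
    b"</records>"
)


def sequence_lengths(source):
    """Map each record id to the length of its sequence text."""
    return {
        rec.get("id"): len(rec.findtext("seq", ""))
        for rec in iter_subtrees(source, "record")
    }
```

Calling sequence_lengths(io.BytesIO(SAMPLE)) visits one <record> subtree at
a time, so memory use depends on the largest record rather than on the size
of the whole document.
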
I've just found BSML [1] so I'm going to take a look at that to see if
it is any better.
[1] http://www.bsml.org Bioinformatic Sequence Markup Language
Alex Milowski FAX: (707) 598-7649
alex at milowski.com
"The excellence of grammar as a guide is proportional to the paucity of
the inflexions, i.e. to the degree of analysis effected by the language
considered."
Bertrand Russell in a footnote of Principles of Mathematics