[BiO BB] Parsing GenBank XML?

Mike Marchywka marchywka at hotmail.com
Fri Sep 5 07:28:57 EDT 2008




> So it looks like I can't combine SAX with XSLT unless I somehow
> trigger an XSLT parse of one 'chunk' of XML per SAX 'new-record'
> event...
>
> I'll try!
>
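One way to get the "one chunk per record" effect without wiring SAX to XSLT is Python's standard-library `xml.etree.ElementTree.iterparse`. It builds a small tree for each record and lets you discard it once the record is handled. This is a swapped-in technique, not what either poster used, and only a sketch. It assumes the efetch XML has a `GBSet` root whose direct children are `GBSeq` records. Those tag names are an assumption, since the pasted sample is not reproduced in this thread. Each finished record element could just as well be passed to an XSLT processor, one record at a time.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag="GBSeq"):
    """Yield one {child_tag: text} dict per record without keeping old records.

    Assumes (hypothetically) that records are direct children of the root
    element, as in a <GBSet><GBSeq>...</GBSeq>...</GBSet> document.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's "start"
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # The whole record subtree is complete here; flatten its
            # immediate children (nested containers give empty strings).
            yield {child.tag: (child.text or "").strip() for child in elem}
            root.clear()  # drop finished records so memory stays bounded


if __name__ == "__main__":
    sample = b"""<GBSet>
  <GBSeq><GBSeq_locus>A1</GBSeq_locus><GBSeq_length>10</GBSeq_length></GBSeq>
  <GBSeq><GBSeq_locus>B2</GBSeq_locus><GBSeq_length>20</GBSeq_length></GBSeq>
</GBSet>"""
    for record in iter_records(io.BytesIO(sample)):
        print(record["GBSeq_locus"], record["GBSeq_length"])
```

Passing a filename instead of a `BytesIO` object streams a file from disk the same way.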

All of my eutils scripts pretty much use the text format wherever it is not ambiguous.
If you need XML or SOAP, fine, but normally it doesn't add much, and
formatting and unformatting everything can be slow. For most situations, reasonably formatted
text is just fine.
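Here is an illustration of why the text format is often enough. efetch can return GenBank flat files (for example with `rettype=gb` and `retmode=text`). In these files each header line carries a keyword in columns 1 to 12 and its value from column 13 on, and long values wrap onto indented continuation lines. The helper below is a hypothetical, simplified sketch. It reads only the header of one record and stops at `FEATURES`, `ORIGIN`, or the `//` record terminator. It does not handle the feature table or the sequence, and sub-keywords such as `ORGANISM` come out as their own entries.

```python
def parse_header(lines):
    """Return (keyword, value) pairs from one GenBank flat-file header.

    Simplified sketch: a non-blank keyword area (columns 1-12) starts a new
    field, a blank one continues the previous field, and parsing stops at the
    feature table, the sequence, or the end-of-record marker.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break  # the header ends here
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            fields.append([keyword, value])
        elif fields and value:
            fields[-1][1] += " " + value  # continuation line
    return [tuple(field) for field in fields]
```

Because it accepts any iterable of lines, an open file handle works directly, and printing each pair gives a plain "label value" listing. For a file holding several records it returns only the first header.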

>
>> I've got my own hard-coded C++ that I use for my string-processing rules source
>> code, FDA AERA SGML parsing, SOAP utilities, etc., which outputs all the fields
>> in a simple "label value" per-line format, but there are SAX libraries in just about
>> every language. Personally, I finally gave up on Perl, as its speed, at least under Cygwin,
>> was unpredictable and degraded quickly when you ran out of physical memory.
>>
>>> Date: Thu, 4 Sep 2008 15:29:20 +0100
>>> From: dan.bolser at gmail.com
>>> To: BBB at bioinformatics.org
>>> Subject: [BiO BB] Parsing GenBank XML?
>>>
>>> Hi,
>>>
>>> Dumb / noob question I am sure but... I am parsing the results of a
>>> GenBank query obtained using esearch / efetch:
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
>>>
>>>
>>> The XML looks like this...
>>>
>>> http://pastebin.com/f3ef02d85
>>>
>>> the only difference being that the real document has (possibly)
>>> millions of 's.
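The quoted suggestion was to stream the document with a SAX library and write every field as one "label value" line. That approach can be sketched with Python's built-in `xml.sax` module. This is an illustrative handler, not Mike's C++ code. It emits a line only for elements that directly contain text, so container elements such as a record wrapper produce nothing. It also assumes elements hold either text or child elements, not both (mixed content). It never keeps more than one element's text in memory, which is what matters when the file has millions of records.

```python
import xml.sax


class LabelValueHandler(xml.sax.ContentHandler):
    """Stream an XML document and emit 'label value' for each text-bearing element."""

    def __init__(self, emit):
        super().__init__()
        self.emit = emit  # callable that receives one output line
        self.text = []    # text chunks seen since the last start/end tag

    def startElement(self, name, attrs):
        self.text = []  # a new element starts; drop the parent's whitespace

    def characters(self, content):
        self.text.append(content)  # may be called several times per element

    def endElement(self, name):
        value = " ".join("".join(self.text).split())  # collapse whitespace
        if value:  # containers with only whitespace emit nothing
            self.emit(f"{name} {value}")
        self.text = []
```

For a file on disk, `xml.sax.parse("result.xml", LabelValueHandler(print))` streams it and prints one line per field. The filename is a placeholder.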
