I hear you!

I have a workaround in Bio::SearchIO::blastxml which does some munging
as it goes. Because we process each report one at a time in the parser,
I have to have the middle code layer strip out these repeated header
lines before allowing the lower-level XML lib to handle the stream.
Not ideal, but it works.

I think Warren added XML to WU-BLAST, but unfortunately he implemented
the same problems too!
http://blast.wustl.edu/blast/parameters.html#mformat

-jason

On May 11, 2005, at 11:10 PM, Joe Landman wrote:

> Simple problem: take NCBI BLAST XML output and parse it. It is an
> XML document after all, so it should be easy ... right?
>
> Sort of ...
>
> The NCBI XML output file is really a container of XML documents.
> You cannot hand the container to be parsed to an XML Parser, as it
> (the container) is not a valid XML document (a valid XML document
> has exactly one <?xml version=""?> tag in it according to the
> standards on w3c.org).
>
> So here is my (perl based) "solution" (read as hack).
>
>     # assume entire document in $all, though this is Bad(TM)
>     # for huge documents, very wasteful of memory resources.
>     #
>     @sub_documents = split(/\<\?xml version=\"1.0\"\?>/,$all);
>     shift @sub_documents;
>
> Now, each sub_document is in fact a valid XML document, that you
> can happily and easily parse.
>
>     foreach (@sub_documents)
>      {
>        # do stuff with $_ which is now a valid XML document
>      }
>
> If there are any NCBI folks lurking here, is there a nice way to
> make the -m 7 output generate a single large valid XML document so
> we can use the huge document parsers, rather than using hacks like
> the above?
>
> As XML documents can be containers themselves, it seems to make
> sense to make the entire output parseable without giving xmllint
> (and other XML parsers) fits.
>
> [landman at crunch-r.scalableinformatics.com:/big] 137 >xmllint tomato_test1.1
> tomato_test1.1:7365: parser error : XML declaration allowed only at the start of the document
> <?xml version="1.0"?>
> ^
> tomato_test1.1:7366: parser error : Extra content at the end of the document
> <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dt
> ^
>
> Thanks.
>
> Joe
>
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
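
The split-based hack quoted above needs the whole file in $all. A
streaming variant that holds only one report in memory at a time might
look roughly like the sketch below. It is only an illustration, not the
code in Bio::SearchIO::blastxml: each_xml_document is a hypothetical
helper name, and the sketch assumes every XML declaration starts on its
own line (as the xmllint output suggests for -m 7 files).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or the NCBI tools): read a
# concatenated BLAST XML stream from a filehandle and call $callback
# once per embedded document, so only one report is held in memory.
# Assumption: each "<?xml ...?>" declaration begins its own line.
sub each_xml_document {
    my ($fh, $callback) = @_;
    my $doc   = '';
    my $count = 0;
    while (my $line = <$fh>) {
        # A declaration at the start of a line marks the next report;
        # hand off whatever has been collected so far.
        if ($line =~ /^\s*<\?xml\b/ && $doc =~ /\S/) {
            $callback->($doc);
            $count++;
            $doc = '';
        }
        $doc .= $line;
    }
    # Flush the final report, if any.
    if ($doc =~ /\S/) {
        $callback->($doc);
        $count++;
    }
    return $count;
}

# Usage on an in-memory sample; a real run would open the -m 7 output
# file instead and pass each document to an XML parser in the callback.
my $sample = <<'XML';
<?xml version="1.0"?>
<BlastOutput><n>1</n></BlastOutput>
<?xml version="1.0"?>
<BlastOutput><n>2</n></BlastOutput>
XML

open(my $in, '<', \$sample) or die "cannot open sample: $!";
my @docs;
my $n = each_xml_document($in, sub { push @docs, $_[0] });
close($in);
print "found $n documents\n";
```

Unlike the split version, each document passed to the callback keeps
its own <?xml ...?> declaration, so it can be handed to a parser as is.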