Hi Folks:

I'm working on a quick parser project, and I just spent too much time chasing down a bug. Short version: I need mpiBLAST (based upon NCBI BLAST) to produce identical output for the same input across multiple machines with the same data sets and databases. In theory this is not too difficult, and it was something we had solved a while ago for a different case.

Ok, I had suggested using XML via the -m 7 output, and then simply parsing the document and returning its contents in a specific order. Works well ... sort of. The resulting XML from a BLAST run starts out with

  <?xml version="1.0"?>
  <BlastOutput_reference>~Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ~Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), ~"Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>

and then it gives the rest of the hits ... and then it gives ...

  <?xml version="1.0"?>
  <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
  <BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_version>blastx 2.2.10 [Oct-19-2004]</BlastOutput_version>

Is this valid? See http://www.w3.org/TR/2004/REC-xml-20040204/#sec-well-formed .

I am trying to track down a bug in the XML parser, and I ran into that second XML declaration. Basically, xmllint complains:

  [landman at crunch:~] 124 >xmllint /big/tomato_test1.1
  /big/tomato_test1.1:7365: parser error : XML declaration allowed only at the start of the document
  <?xml version="1.0"?>
  ^
  /big/tomato_test1.1:7366: parser error : Extra content at the end of the document
  <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dt
  ^

This makes me think that the output is not well-formed XML: the spec allows the XML declaration only at the very start of a document, and a document has exactly one root element, so a second declaration followed by a second <BlastOutput> cannot belong to the same document. I do have a few options here; they are hacks, but they are options.
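One of those hacks could look roughly like the following Python. It is a minimal, hypothetical sketch, assuming each embedded document begins with its own `<?xml` declaration and that this string never appears inside CDATA or comments. It cuts the file at every declaration, parses each piece on its own with the standard library's `xml.etree.ElementTree`, and then lists hits in a fixed order. The helper names (`split_xml_documents`, `parse_concatenated`, `canonical_hits`) are made up for illustration, and the element names `Hit`, `Hit_id` and `Hsp_evalue` are assumed from the NCBI BlastOutput layout.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: every embedded document begins with its own "<?xml" declaration,
# and that byte sequence never appears inside a CDATA section or a comment.
_XML_DECL = re.compile(rb"<\?xml\s")


def split_xml_documents(data: bytes) -> List[bytes]:
    """Cut the input at each XML declaration: one chunk per document."""
    starts = [m.start() for m in _XML_DECL.finditer(data)]
    if not starts:
        # No declaration at all: treat the whole input as one document.
        return [data] if data.strip() else []
    # Anything before the first declaration is discarded.
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


def parse_concatenated(data: bytes) -> List[ET.Element]:
    """Parse each chunk on its own and return the root element of each."""
    return [ET.fromstring(chunk) for chunk in split_xml_documents(data)]


def canonical_hits(roots: List[ET.Element]) -> List[Tuple[float, str]]:
    """Return (best E-value, Hit_id) pairs from all documents in a fixed order.

    Element names Hit, Hit_id and Hsp_evalue are assumed from the NCBI
    BlastOutput layout.
    """
    hits = []
    for root in roots:
        for hit in root.iter("Hit"):
            evalues = [float(e.text) for e in hit.iter("Hsp_evalue") if e.text]
            best = min(evalues, default=float("inf"))
            hits.append((best, hit.findtext("Hit_id", default="")))
    # Sorting on (E-value, Hit_id) yields the same order no matter which
    # order the documents were written in.
    return sorted(hits)
```

Usage would be along the lines of `canonical_hits(parse_concatenated(open(path, "rb").read()))`. Splitting on raw bytes rather than decoded text leaves each chunk's declaration for the parser to interpret, and sorting on (E-value, Hit_id) is only one possible canonical order.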
Is the -m 7 output generally considered to be valid XML by the people who consume it, or do you need to run it through parsers that have been made more lenient? Any thoughts? I am sure others have solved issues like this in the past. I am ok with being forgiving in what I read, but this is breaking the parser, so I need to either fix the parser or be more careful about what I am parsing.

Thanks!

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615