[Bioclusters] blast output (-m 7) in XML, and the XML spec

Joaquin Zaragoza jzaragoza at lbk.ars.usda.gov
Sun Apr 24 17:37:32 EDT 2005


When you blast more than one sequence, BLAST concatenates the results into a
single file.  So the file you are parsing is really several complete XML
documents run together, each starting with its own "<?xml version="1.0"?>"
declaration: one document for each input FASTA sequence.
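If that is the case, one workaround is to split the file back into its
individual documents before parsing.  Below is a minimal sketch using only the
Python standard library.  It assumes every document begins with an
"<?xml ...?>" declaration.  The names split_blast_xml and parse_all are made up
for this example, and the sample input is a toy stand-in, not real BLAST output.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: every concatenated document begins with an XML declaration
# such as <?xml version="1.0"?>. The \s keeps <?xml-stylesheet ...?> from matching.
_XML_DECL = re.compile(r"<\?xml\s")


def split_blast_xml(text):
    """Split concatenated XML output into standalone document strings.

    Text before the first declaration is discarded; if no declaration is
    found, an empty list is returned.
    """
    starts = [m.start() for m in _XML_DECL.finditer(text)]
    bounds = starts + [len(text)]
    # Slice from each declaration up to the next one (or the end of text).
    docs = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [d for d in docs if d]


def parse_all(text):
    """Parse each document separately and return its root element."""
    return [ET.fromstring(doc) for doc in split_blast_xml(text)]


if __name__ == "__main__":
    # Toy stand-in for two concatenated -m 7 reports.
    sample = (
        '<?xml version="1.0"?>\n<BlastOutput><BlastOutput_query-def>q1'
        '</BlastOutput_query-def></BlastOutput>\n'
        '<?xml version="1.0"?>\n<BlastOutput><BlastOutput_query-def>q2'
        '</BlastOutput_query-def></BlastOutput>\n'
    )
    roots = parse_all(sample)
    print([r.findtext("BlastOutput_query-def") for r in roots])  # ['q1', 'q2']
```

Each piece is then a normal single-root document, so any strict parser should
accept it.  Splitting on a regex is blunt, though: a literal "<?xml " inside a
comment or CDATA section would also trigger a split.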

Hope that helps.


Joaquin Zaragoza
USDA-ARS-LIRU



-----Original Message-----
From: bioclusters-bounces+jzaragoza=lbk.ars.usda.gov at bioinformatics.org
[mailto:bioclusters-bounces+jzaragoza=lbk.ars.usda.gov at bioinformatics.org]
On Behalf Of Joe Landman
Sent: Saturday, April 23, 2005 7:50 AM
To: Clustering, compute farming & distributed computing in life science
informatics
Subject: Re: [Bioclusters] blast output (-m 7) in XML, and the XML spec

Hi Tim:

   I wanted to make sure my understanding of the standard was correct.
I usually use xmllint to check that a document is well-formed.

   The problem is that this is an issue with NCBI BLAST, and I haven't
been very successful at getting our patches included in the past, so I am
hesitant to fix the code.  It might be easier (and is the current plan) to
write a simple shoe-horn function in the post-processing parser to fix this.
I will look at the XML output code, and if we can fix it easily and
maintain the patch as we do for the Opteron bits, then we might do this.
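Here is one hypothetical shape such a shoe-horn could take.  This is a sketch,
not the fix actually adopted.  It strips every "<?xml ...?>" and
"<!DOCTYPE ...>" declaration and wraps the remaining <BlastOutput> elements in
a single synthetic root.  The root name BlastOutputSet is invented for this
example.  The DOCTYPE is dropped rather than moved because it declares
BlastOutput as the root, which would no longer match once everything is
wrapped.  The sample DOCTYPE is modeled on typical BLAST output and may differ
between versions.

```python
import re
import xml.etree.ElementTree as ET

# An XML declaration, e.g. <?xml version="1.0"?> (not <?xml-stylesheet ...?>).
_XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")
# A DOCTYPE without an internal subset. DOCTYPEs containing [...] are not
# matched and are left in place (not handled by this sketch).
_DOCTYPE = re.compile(r"<!DOCTYPE\s[^>\[]*>")


def shoehorn(text, wrapper="BlastOutputSet"):
    """Rewrite concatenated XML documents as one well-formed document.

    Every XML declaration and DOCTYPE is removed, the remaining root
    elements are wrapped in a single `wrapper` element (a made-up name,
    not part of NCBI's DTD), and one declaration is put back on top.
    """
    body = _DOCTYPE.sub("", _XML_DECL.sub("", text)).strip()
    return f'<?xml version="1.0"?>\n<{wrapper}>\n{body}\n</{wrapper}>\n'


if __name__ == "__main__":
    report = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" '
        '"NCBI_BlastOutput.dtd">\n<BlastOutput>...</BlastOutput>\n'
    )
    root = ET.fromstring(shoehorn(report + report))
    print(root.tag, len(root))  # BlastOutputSet 2
```

The downstream parser then walks the children of the wrapper, one per query.
The result is well-formed but is no longer valid against the BLAST DTD.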

   Thanks for your note on this.

Joe

Tim White wrote:
> The line:
> 
> <?xml version="1.0"?>
> 
> can appear only once in a well-formed XML file, right at the top.  Also,
> if a DOCTYPE tag appears, it must follow this tag and precede the root
> element (formally this is called a "document type declaration").  So it
> seems to me that you need to arrange for the second "<?xml...?>" tag to
> be removed, and for the "<!DOCTYPE...>" tag appearing after it to be
> either removed as well, or moved to just after the initial "<?xml...?>"
> tag.
> 
> I would not expect any XML parser to happily read your document as it
> stands, though of course some might.  (As an aside, the "culture"
> surrounding XML generally discourages leniency in parsing, in the hopes
> of ensuring that every XML document is well-formed, and will therefore
> be interpreted the same way by all XML tools.)
> 
> Hope this helps,
> Tim
> 
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
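Tim's point can be checked without xmllint by handing the concatenated text to
Python's expat-based parser.  A conforming parser reports a fatal error here
rather than recovering.  The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as a single XML document,
    otherwise (False, the parser's error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


if __name__ == "__main__":
    single = '<?xml version="1.0"?>\n<BlastOutput/>\n'
    doubled = single + single  # concatenated output, in miniature
    print(is_well_formed(single))      # (True, None)
    print(is_well_formed(doubled)[0])  # False
```

Deleting only the second declaration is not enough on its own.  Two sibling
root elements are still rejected, and expat reports "junk after document
element".  So any fix also has to leave the file with exactly one root element.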

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
