htmLawed 1.1: filtering MS HTML
Some examples illustrating the use of the htmLawed PHP script to filter (purify) HTML containing the non-standard, proprietary tags generated by Microsoft Office applications such as Word. (htmLawed also transforms character code-points that are discouraged or invalid in XML, such as the ones these applications use for dashes and curly quotation marks.)
Example texts are shown both before and after the clean-up. The PHP code used is:
$out = htmLawed($in, $config);
See the htmLawed documentation for how to customize the filtering through htmLawed's $config parameter.
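The discouraged code-points mentioned in the introduction come from Windows-1252, the single-byte encoding Word-generated HTML often uses, in which bytes such as 0x93/0x94 are curly double quotes and 0x96/0x97 are en/em dashes; in Unicode the same values (U+0080 to U+009F) are C1 control characters. The following is a minimal sketch of the idea only, not htmLawed's implementation (htmLawed's own behaviour is set through $config, as documented): a hypothetical function, replace_ms_chars, that swaps a few such bytes for numeric character references. It assumes the input really is single-byte Windows-1252; applied to UTF-8 text it would corrupt multi-byte characters, whose continuation bytes fall in the same 0x80 to 0x9F range.

```php
<?php
// Hypothetical helper, NOT htmLawed's implementation.
// Assumes $in is single-byte Windows-1252 text; do not run it on UTF-8 input.
function replace_ms_chars(string $in): string
{
    // Windows-1252 byte => numeric character reference of the Unicode character
    $map = [
        "\x85" => '&#8230;', // horizontal ellipsis
        "\x91" => '&#8216;', // left single quotation mark
        "\x92" => '&#8217;', // right single quotation mark
        "\x93" => '&#8220;', // left double quotation mark
        "\x94" => '&#8221;', // right double quotation mark
        "\x96" => '&#8211;', // en dash
        "\x97" => '&#8212;', // em dash
    ];
    return strtr($in, $map);
}

echo replace_ms_chars("\x93Where is he?\x94 asked Mary \x96 and Jane."), "\n";
// prints: &#8220;Where is he?&#8221; asked Mary &#8211; and Jane.
```

Numeric references were chosen because they survive any output encoding; a real filter might instead emit the UTF-8 characters directly.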
1. Proprietary o:p tags, unquoted attributes, discouraged code-points
Input code:
<div class=3DSection1>
<p class=3DMsoNormal>A sample:</p>
<p class=3DMsoNormal><o:p> </o:p></p>
<p class=3DMsoNormal>“Where is he?” asked
both Mary – the one so lovely – and Jane.</p>
</div>
Output code:
<div class="3DSection1">
<p class="3DMsoNormal">A sample:</p>
<p class="3DMsoNormal"> </p>
<p class="3DMsoNormal">“Where is he?” asked
both Mary – the one so lovely – and Jane.</p>
</div>
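Two of the changes in example 1 are easy to state in isolation: the o:p tags are dropped while their content is kept, and unquoted attribute values are wrapped in double quotes. The hypothetical helpers below, strip_o_p and quote_attributes (not part of htmLawed), mimic just those two steps with regular expressions. They assume simple input like example 1's, where unquoted values contain no whitespace, quotes or '>' and quoted values contain no name=value text; htmLawed itself checks far more, so this only illustrates the before/after difference.

```php
<?php
// Hypothetical helpers, NOT htmLawed's code: they reproduce two of the
// changes visible in example 1 on simple input.

// Remove <o:p> and </o:p> tags but keep whatever they enclose.
function strip_o_p(string $html): string
{
    return preg_replace('#</?o:p\b[^>]*>#i', '', $html);
}

// Wrap unquoted attribute values in double quotes, inside start tags only.
function quote_attributes(string $html): string
{
    return preg_replace_callback(
        '#<[a-z][^>]*>#i', // every start tag
        fn (array $tag): string => preg_replace(
            '#(\s[a-z][\w:-]*)=([^\s"\'>]+)#i', // name=value without quotes
            '$1="$2"',
            $tag[0]
        ),
        $html
    );
}

$in = '<p class=3DMsoNormal><o:p> </o:p></p>';
echo quote_attributes(strip_o_p($in)), "\n";
// prints: <p class="3DMsoNormal"> </p>
```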
2. Mis-quoted and unquoted attributes, untidy markup
Input code:
<div class=Section1>
<p class=MsoNormal>A table:</p>
<p class=MsoNormal><o:p> </o:p></p>
<table class=MsoTableColorful3 border=1 cellspacing=0 cellpadding=0
style='border-collapse:collapse;border:none;mso-border-alt:solid black 2.25pt;
mso-yfti-tbllook:480;mso-padding-alt:0in 5.4pt 0in 5.4pt;mso-border-insideh:
.75pt solid silver'>
<tr style='mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt'>
<td width=295 valign=top style='width:221.4pt;border-top:2.25pt;border-left:
4.5pt;border-bottom:1.0pt;border-right:1.0pt;border-color:black;border-style:
solid;mso-border-top-alt:2.25pt;mso-border-left-alt:4.5pt;mso-border-bottom-alt:
.75pt;mso-border-right-alt:.75pt;mso-border-color-alt:black;mso-border-style-alt:
solid;background:black;mso-shading:white;mso-pattern:solid black;padding:
0in 5.4pt 0in 5.4pt;height:34.6pt'>
<p class=MsoNormal style='mso-yfti-cnfc:517'><i style='mso-bidi-font-style:
normal'><span style='color:white;mso-bidi-font-weight:bold'>1<o:p></o:p></span></i></p>
</td>
<td width=295 valign=top style='width:221.4pt;border-top:solid black 2.25pt;
border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 2.25pt;
mso-border-top-alt:solid black 2.25pt;mso-border-bottom-alt:solid black .75pt;
mso-border-right-alt:solid black 2.25pt;background:teal;mso-shading:white;
mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt;height:34.6pt'>
<p class=MsoNormal style='mso-yfti-cnfc:1'>2</p>
</td>
</tr>
<tr style='mso-yfti-irow:0;mso-yfti-lastrow:yes'>
<td width=295 valign=top style='width:221.4pt;border-top:none;border-left:
solid black 4.5pt;border-bottom:solid black 2.25pt;border-right:solid black 1.0pt;
mso-border-top-alt:solid silver .75pt;mso-border-top-alt:silver .75pt;
mso-border-left-alt:black 4.5pt;mso-border-bottom-alt:black 2.25pt;
mso-border-right-alt:black .75pt;mso-border-style-alt:solid;background:teal;
mso-shading:white;mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt'>
<p class=MsoNormal style='mso-yfti-cnfc:4'><b style='mso-bidi-font-weight:
normal'><i style='mso-bidi-font-style:normal'>3<o:p></o:p></i></b></p>
</td>
<td width=295 valign=top style='width:221.4pt;border-top:none;border-left:
none;border-bottom:solid black 2.25pt;border-right:solid black 2.25pt;
mso-border-top-alt:solid silver .75pt;background:#DDE6E6;mso-shading:white;
mso-pattern:gray-25 teal;padding:0in 5.4pt 0in 5.4pt'>
<p class=MsoNormal>4</p>
</td>
</tr>
</table>
<p class=MsoNormal><o:p> </o:p></p>
</div>
Output code:
<div class="Section1">
<p class="MsoNormal">A table:</p>
<p class="MsoNormal"> </p>
<table class="MsoTableColorful3" border="1" cellspacing="0" cellpadding="0" style="border-collapse:collapse;border:none;mso-border-alt:solid black 2.25pt; mso-yfti-tbllook:480;mso-padding-alt:0in 5.4pt 0in 5.4pt;mso-border-insideh: .75pt solid silver">
<tr style="mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt">
<td valign="top" style="width:221.4pt;border-top:2.25pt;border-left: 4.5pt;border-bottom:1.0pt;border-right:1.0pt;border-color:black;border-style: solid;mso-border-top-alt:2.25pt;mso-border-left-alt:4.5pt;mso-border-bottom-alt: .75pt;mso-border-right-alt:.75pt;mso-border-color-alt:black;mso-border-style-alt: solid;background:black;mso-shading:white;mso-pattern:solid black;padding: 0in 5.4pt 0in 5.4pt;height:34.6pt; width: 295px;">
<p class="MsoNormal" style="mso-yfti-cnfc:517"><i style="mso-bidi-font-style: normal"><span style="color:white;mso-bidi-font-weight:bold">1</span></i></p>
</td>
<td valign="top" style="width:221.4pt;border-top:solid black 2.25pt; border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 2.25pt; mso-border-top-alt:solid black 2.25pt;mso-border-bottom-alt:solid black .75pt; mso-border-right-alt:solid black 2.25pt;background:teal;mso-shading:white; mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt;height:34.6pt; width: 295px;">
<p class="MsoNormal" style="mso-yfti-cnfc:1">2</p>
</td>
</tr>
<tr style="mso-yfti-irow:0;mso-yfti-lastrow:yes">
<td valign="top" style="width:221.4pt;border-top:none;border-left: solid black 4.5pt;border-bottom:solid black 2.25pt;border-right:solid black 1.0pt; mso-border-top-alt:solid silver .75pt;mso-border-top-alt:silver .75pt; mso-border-left-alt:black 4.5pt;mso-border-bottom-alt:black 2.25pt; mso-border-right-alt:black .75pt;mso-border-style-alt:solid;background:teal; mso-shading:white;mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt; width: 295px;">
<p class="MsoNormal" style="mso-yfti-cnfc:4"><b style="mso-bidi-font-weight: normal"><i style="mso-bidi-font-style:normal">3</i></b></p>
</td>
<td valign="top" style="width:221.4pt;border-top:none;border-left: none;border-bottom:solid black 2.25pt;border-right:solid black 2.25pt; mso-border-top-alt:solid silver .75pt;background:#DDE6E6;mso-shading:white; mso-pattern:gray-25 teal;padding:0in 5.4pt 0in 5.4pt; width: 295px;">
<p class="MsoNormal">4</p>
</td>
</tr>
</table>
<p class="MsoNormal"> </p>
</div>
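In example 2, besides putting every attribute value in double quotes and turning the line breaks inside the style values into spaces, htmLawed dropped each td's width=295 attribute (deprecated on td in HTML 4.01) and appended the same value to the inline style as width: 295px;. The htmLawed documentation describes the $config option that governs this conversion of deprecated attributes. The function below is only a hypothetical sketch of that one step, not htmLawed's code: width_attr_to_style takes an attribute array (name => value) and assumes a bare integer width means pixels.

```php
<?php
// Hypothetical helper, NOT htmLawed's code: move a deprecated width
// attribute into the inline style, the way example 2 turns
// <td width=295 style='...;height:34.6pt'> into
// <td style="...;height:34.6pt; width: 295px;">.
function width_attr_to_style(array $attrs): array
{
    if (!isset($attrs['width'])) {
        return $attrs; // nothing to move
    }
    $width = (string) $attrs['width'];
    unset($attrs['width']);

    // Assumption: a bare integer means pixels; anything else (e.g. "50%")
    // is copied unchanged.
    $css = 'width: ' . (ctype_digit($width) ? $width . 'px' : $width) . ';';

    // Make sure an existing declaration list ends with ';' before appending.
    $style = rtrim($attrs['style'] ?? '');
    if ($style !== '' && substr($style, -1) !== ';') {
        $style .= ';';
    }
    $attrs['style'] = $style === '' ? $css : $style . ' ' . $css;
    return $attrs;
}

print_r(width_attr_to_style([
    'width'  => '295',
    'valign' => 'top',
    'style'  => 'padding:0in 5.4pt 0in 5.4pt',
]));
```

For this input, modelled on the last cell of example 2, the returned style is padding:0in 5.4pt 0in 5.4pt; width: 295px;, which matches the end of that cell's style in the output.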
3. Additional removal of selected element-specific attributes
Input code:
<table class=MsoTableColorful3 border=1 cellspacing=0 cellpadding=0
style='border-collapse:collapse;border:none;mso-border-alt:solid black 2.25pt;
mso-yfti-tbllook:480;mso-padding-alt:0in 5.4pt 0in 5.4pt;mso-border-insideh:
.75pt solid silver'>
<tr style='mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt'>
<td width=295 valign=top style='width:221.4pt;border-top:2.25pt;border-left:
4.5pt;border-bottom:1.0pt;border-right:1.0pt;border-color:black;border-style:
solid;mso-border-top-alt:2.25pt;mso-border-left-alt:4.5pt;mso-border-bottom-alt:
.75pt;mso-border-right-alt:.75pt;mso-border-color-alt:black;mso-border-style-alt:
solid;background:black;mso-shading:white;mso-pattern:solid black;padding:
0in 5.4pt 0in 5.4pt;height:34.6pt'>
<p class=MsoNormal style='mso-yfti-cnfc:517'><i style='mso-bidi-font-style:
normal'><span style='color:white;mso-bidi-font-weight:bold'>1<o:p></o:p></span></i></p>
Output code:
<table border="1" cellspacing="0" cellpadding="0">
<tr style="mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt">
<td valign="top" style="width: 295px;">
<p style="mso-yfti-cnfc:517"><i style="mso-bidi-font-style: normal"><span style="color:white;mso-bidi-font-weight:bold">1</span></i></p>
</td>
</tr>
</table>
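Unlike example 2, example 3 removes some attributes only from particular elements: class and style from table, class from p, and the original style from td (its remaining style is the width: 295px converted from width), while tr keeps its style. htmLawed accepts element-specific rules like these through its optional third argument, $spec (htmLawed($in, $config, $spec)); its documentation gives the rule syntax, which is not reproduced here. The output also closes the td, tr and table elements left open by the truncated input. The sketch below imitates both effects with two hypothetical functions, drop_attributes and close_open_tags, that are not part of htmLawed. The deny list is simply read off this example's output, and the tag closer assumes properly nested input with no comments and no '>' inside attribute values.

```php
<?php
// Hypothetical sketch, NOT htmLawed's code.

// Per-element deny list, read off the output of example 3.
$deny = [
    'table' => ['class', 'style'],
    'td'    => ['style'],
    'p'     => ['class'],
];

// Return $attrs without the attributes denied for $element.
function drop_attributes(string $element, array $attrs, array $deny): array
{
    $blocked = $deny[strtolower($element)] ?? [];
    return array_diff_key($attrs, array_flip($blocked));
}

// Append end tags for elements still open at the end of a fragment.
function close_open_tags(string $html): string
{
    $void = ['area', 'base', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta', 'wbr'];
    $stack = [];
    preg_match_all('#<(/?)([a-z][a-z0-9:]*)[^>]*>#i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as [$whole, $slash, $name]) {
        $name = strtolower($name);
        if (in_array($name, $void, true) || substr($whole, -2) === '/>') {
            continue; // empty elements never need an end tag
        }
        if ($slash === '') {
            $stack[] = $name; // start tag: remember it
            continue;
        }
        // End tag: forget the innermost matching start tag and anything after it.
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i] === $name) {
                array_splice($stack, $i);
                break;
            }
        }
    }
    foreach (array_reverse($stack) as $open) {
        $html .= "</$open>"; // close innermost first
    }
    return $html;
}

print_r(drop_attributes('table', ['class' => 'MsoTableColorful3', 'border' => '1', 'style' => 'border:none'], $deny));
echo close_open_tags('<table border="1"><tr><td><p>1</p>'), "\n";
// prints: <table border="1"><tr><td><p>1</p></td></tr></table>
```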