htmLawed 1.1: filtering MS HTML

These examples illustrate use of the htmLawed PHP script to filter and purify HTML that contains non-standard, proprietary tags generated by Microsoft Office applications such as Word. (htmLawed also converts the character code points that XML discourages, such as those the applications use for dashes and curly quotation marks, into numeric character entities.)

Example texts are shown both before and after the clean-up. The PHP code used is:

$out = htmLawed($in, $config);

Refer to the htmLawed documentation to customize the filtering through htmLawed's $config parameter.
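
As a minimal sketch of how the call above might be wired up, the snippet below builds a $config array and passes it to htmLawed(). The file location htmLawed.php, the helper names ms_filter_config() and filter_ms_html(), and the choice of clean_ms_char => 1 are assumptions made for illustration; the configuration actually used for the examples on this page is not stated here.

```php
<?php
// Hypothetical helper (not part of htmLawed): settings for MS Office clean-up.
function ms_filter_config(): array
{
    return array(
        // Assumed value: per the htmLawed documentation, 1 replaces the
        // discouraged Microsoft code points with character entities.
        'clean_ms_char' => 1,
    );
}

// Hypothetical wrapper: loads the library on first use and filters $in.
function filter_ms_html(string $in): string
{
    // Assumed location of the htmLawed script; adjust to your installation.
    require_once __DIR__ . '/htmLawed.php';
    return htmLawed($in, ms_filter_config());
}
```

With the library in place, $out = filter_ms_html($in); is equivalent to calling htmLawed($in, $config) with the array returned by ms_filter_config().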

1. Proprietary o:p tags, unquoted attributes, discouraged code-points

Input code »
<div class=3DSection1>

<p class=3DMsoNormal>A sample:</p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>“Where is he?” asked
both Mary – the one so lovely – and Jane.</p>

</div>

Output code »
<div class="3DSection1">

<p class="3DMsoNormal">A sample:</p>

<p class="3DMsoNormal">&nbsp;</p>

<p class="3DMsoNormal">&#8220;Where is he?&#8221; asked
both Mary &#8211; the one so lovely &#8211; and Jane.</p>

</div>
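
The quotation marks and dashes in this example arrive as single bytes from the Windows-1252 range and leave as numeric entities. The following standalone sketch shows that kind of mapping for a few code points; it is an illustration only, not htmLawed's own code, and the function name ms_chars_to_entities() is hypothetical. It assumes the input is single-byte Windows-1252 text rather than UTF-8.

```php
<?php
// Standalone illustration, not htmLawed's implementation.
// Assumes $in is single-byte Windows-1252 text; on UTF-8 input these byte
// values can occur inside multi-byte sequences and must not be replaced.
function ms_chars_to_entities(string $in): string
{
    $map = array(
        "\x91" => '&#8216;', // left single quotation mark
        "\x92" => '&#8217;', // right single quotation mark
        "\x93" => '&#8220;', // left double quotation mark
        "\x94" => '&#8221;', // right double quotation mark
        "\x96" => '&#8211;', // en dash
        "\x97" => '&#8212;', // em dash
    );
    // strtr() with an array replaces every listed byte in a single pass.
    return strtr($in, $map);
}

echo ms_chars_to_entities("\x93Where is he?\x94 asked both Mary \x96 and Jane."), "\n";
// prints: &#8220;Where is he?&#8221; asked both Mary &#8211; and Jane.
```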

2. Mis-quoted and unquoted attributes, untidy markup

Input code »
<div class=Section1>

<p class=MsoNormal>A table:</p>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

<table class=MsoTableColorful3 border=1 cellspacing=0 cellpadding=0
style='border-collapse:collapse;border:none;mso-border-alt:solid black 2.25pt;
mso-yfti-tbllook:480;mso-padding-alt:0in 5.4pt 0in 5.4pt;mso-border-insideh:
.75pt solid silver'>
<tr style='mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt'>
<td width=295 valign=top style='width:221.4pt;border-top:2.25pt;border-left:
4.5pt;border-bottom:1.0pt;border-right:1.0pt;border-color:black;border-style:
solid;mso-border-top-alt:2.25pt;mso-border-left-alt:4.5pt;mso-border-bottom-alt:
.75pt;mso-border-right-alt:.75pt;mso-border-color-alt:black;mso-border-style-alt:
solid;background:black;mso-shading:white;mso-pattern:solid black;padding:
0in 5.4pt 0in 5.4pt;height:34.6pt'>
<p class=MsoNormal style='mso-yfti-cnfc:517'><i style='mso-bidi-font-style:
normal'><span style='color:white;mso-bidi-font-weight:bold'>1<o:p></o:p></span></i></p>
</td>
<td width=295 valign=top style='width:221.4pt;border-top:solid black 2.25pt;
border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 2.25pt;
mso-border-top-alt:solid black 2.25pt;mso-border-bottom-alt:solid black .75pt;
mso-border-right-alt:solid black 2.25pt;background:teal;mso-shading:white;
mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt;height:34.6pt'>
<p class=MsoNormal style='mso-yfti-cnfc:1'>2</p>
</td>
</tr>
<tr style='mso-yfti-irow:0;mso-yfti-lastrow:yes'>
<td width=295 valign=top style='width:221.4pt;border-top:none;border-left:
solid black 4.5pt;border-bottom:solid black 2.25pt;border-right:solid black 1.0pt;
mso-border-top-alt:solid silver .75pt;mso-border-top-alt:silver .75pt;
mso-border-left-alt:black 4.5pt;mso-border-bottom-alt:black 2.25pt;
mso-border-right-alt:black .75pt;mso-border-style-alt:solid;background:teal;
mso-shading:white;mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt'>
<p class=MsoNormal style='mso-yfti-cnfc:4'><b style='mso-bidi-font-weight:
normal'><i style='mso-bidi-font-style:normal'>3<o:p></o:p></i></b></p>
</td>
<td width=295 valign=top style='width:221.4pt;border-top:none;border-left:
none;border-bottom:solid black 2.25pt;border-right:solid black 2.25pt;
mso-border-top-alt:solid silver .75pt;background:#DDE6E6;mso-shading:white;
mso-pattern:gray-25 teal;padding:0in 5.4pt 0in 5.4pt'>
<p class=MsoNormal>4</p>
</td>
</tr>
</table>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

Output code »
<div class="Section1">
    <p class="MsoNormal">A table:</p>
    <p class="MsoNormal">&nbsp;</p>
    <table class="MsoTableColorful3" border="1" cellspacing="0" cellpadding="0" style="border-collapse:collapse;border:none;mso-border-alt:solid black 2.25pt; mso-yfti-tbllook:480;mso-padding-alt:0in 5.4pt 0in 5.4pt;mso-border-insideh: .75pt solid silver">
        <tr style="mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt">
            <td valign="top" style="width:221.4pt;border-top:2.25pt;border-left: 4.5pt;border-bottom:1.0pt;border-right:1.0pt;border-color:black;border-style: solid;mso-border-top-alt:2.25pt;mso-border-left-alt:4.5pt;mso-border-bottom-alt: .75pt;mso-border-right-alt:.75pt;mso-border-color-alt:black;mso-border-style-alt: solid;background:black;mso-shading:white;mso-pattern:solid black;padding: 0in 5.4pt 0in 5.4pt;height:34.6pt; width: 295px;">
            <p class="MsoNormal" style="mso-yfti-cnfc:517"><i style="mso-bidi-font-style: normal"><span style="color:white;mso-bidi-font-weight:bold">1</span></i></p>
            </td>
            <td valign="top" style="width:221.4pt;border-top:solid black 2.25pt; border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 2.25pt; mso-border-top-alt:solid black 2.25pt;mso-border-bottom-alt:solid black .75pt; mso-border-right-alt:solid black 2.25pt;background:teal;mso-shading:white; mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt;height:34.6pt; width: 295px;">
            <p class="MsoNormal" style="mso-yfti-cnfc:1">2</p>
            </td>
        </tr>
        <tr style="mso-yfti-irow:0;mso-yfti-lastrow:yes">
            <td valign="top" style="width:221.4pt;border-top:none;border-left: solid black 4.5pt;border-bottom:solid black 2.25pt;border-right:solid black 1.0pt; mso-border-top-alt:solid silver .75pt;mso-border-top-alt:silver .75pt; mso-border-left-alt:black 4.5pt;mso-border-bottom-alt:black 2.25pt; mso-border-right-alt:black .75pt;mso-border-style-alt:solid;background:teal; mso-shading:white;mso-pattern:solid teal;padding:0in 5.4pt 0in 5.4pt; width: 295px;">
            <p class="MsoNormal" style="mso-yfti-cnfc:4"><b style="mso-bidi-font-weight: normal"><i style="mso-bidi-font-style:normal">3</i></b></p>
            </td>
            <td valign="top" style="width:221.4pt;border-top:none;border-left: none;border-bottom:solid black 2.25pt;border-right:solid black 2.25pt; mso-border-top-alt:solid silver .75pt;background:#DDE6E6;mso-shading:white; mso-pattern:gray-25 teal;padding:0in 5.4pt 0in 5.4pt; width: 295px;">
            <p class="MsoNormal">4</p>
            </td>
        </tr>
    </table>
    <p class="MsoNormal">&nbsp;</p>
</div>
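
The indented output above is consistent with htmLawed's tidy option, which the htmLawed documentation describes as beautifying (1) or compacting (-1) the markup. A minimal sketch, assuming that option was in play; the helper names ms_tidy_config() and filter_and_tidy() and the file path are hypothetical.

```php
<?php
// Hypothetical helper: clean-up settings plus beautified output.
function ms_tidy_config(): array
{
    return array(
        'clean_ms_char' => 1, // assumed: replace discouraged code points
        'tidy'          => 1, // assumed: 1 asks htmLawed to indent the output
    );
}

// Hypothetical wrapper: loads the library on first use, then filters and tidies.
function filter_and_tidy(string $in): string
{
    require_once __DIR__ . '/htmLawed.php'; // assumed location of the script
    return htmLawed($in, ms_tidy_config());
}
```

The width: 295px declarations appended to each td style in the output presumably come from htmLawed converting the deprecated width attribute into CSS, which needs no extra setting if that conversion is left at its default.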

3. Additional removal of selected element-specific attributes

Input code »
<table class=MsoTableColorful3 border=1 cellspacing=0 cellpadding=0
style='border-collapse:collapse;border:none;mso-border-alt:solid black 2.25pt;
mso-yfti-tbllook:480;mso-padding-alt:0in 5.4pt 0in 5.4pt;mso-border-insideh:
.75pt solid silver'>
<tr style='mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt'>
<td width=295 valign=top style='width:221.4pt;border-top:2.25pt;border-left:
4.5pt;border-bottom:1.0pt;border-right:1.0pt;border-color:black;border-style:
solid;mso-border-top-alt:2.25pt;mso-border-left-alt:4.5pt;mso-border-bottom-alt:
.75pt;mso-border-right-alt:.75pt;mso-border-color-alt:black;mso-border-style-alt:
solid;background:black;mso-shading:white;mso-pattern:solid black;padding:
0in 5.4pt 0in 5.4pt;height:34.6pt'>
<p class=MsoNormal style='mso-yfti-cnfc:517'><i style='mso-bidi-font-style:
normal'><span style='color:white;mso-bidi-font-weight:bold'>1<o:p></o:p></span></i></p>

Output code »
<table border="1" cellspacing="0" cellpadding="0">
    <tr style="mso-yfti-irow:-1;mso-yfti-firstrow:yes;height:34.6pt">
        <td valign="top" style="width: 295px;">
            <p style="mso-yfti-cnfc:517"><i style="mso-bidi-font-style: normal"><span style="color:white;mso-bidi-font-weight:bold">1</span></i></p>
        </td>
    </tr>
</table>
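
Element-specific attribute removal of this kind can be requested through htmLawed's optional third argument, $spec, in which a rule such as table=-class denies the class attribute for table elements only. The rule string below is a guess that matches the output above (table loses class and style, td loses its original style, p loses class); it is not necessarily the one used to produce it, and the helper names and file path are hypothetical.

```php
<?php
// Hypothetical helper: per-element attribute rules in htmLawed's $spec syntax.
// "-name" denies that attribute for the named element; rules are separated by ";".
function ms_attribute_spec(): string
{
    return 'table=-class,-style; td=-style; p=-class';
}

// Hypothetical wrapper: filters $in with the element-specific rules applied.
function filter_with_spec(string $in): string
{
    require_once __DIR__ . '/htmLawed.php'; // assumed location of the script
    $config = array('tidy' => 1);           // assumed, to match the indented output
    return htmLawed($in, $config, ms_attribute_spec());
}
```

Keeping the rules in $spec rather than in $config's deny_attribute setting limits each removal to the named element, which is why tr, i and span keep their style attributes in the output above.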
