<![CDATA[ ... ]]> seemingly not supported at all
If you use HTMLParser to extract the text of a document, you're relying on text nodes being returned as Text.
If you feed in a trivial document like this:
<p>I hope this works.</p>
The text does come back in a single Text node.
If you feed in a document like this instead:
<p><![CDATA[I hope this works.]]></p>
Now you get no Text nodes at all. Instead you get a Tag: "![CDATA[I" is its name, and "hope", "this" and "works.]]" are somehow treated as attributes, even though they look nothing like attributes.
This also causes serious problems when a CDATA section appears inside <script>, which is of course the most common place it occurs.
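
Until this is handled by the parser itself, one possible stopgap is to unwrap CDATA sections before the document reaches HTMLParser. The following is only a minimal sketch using the Java standard library; the class name CdataUnwrap and its methods are hypothetical and not part of HTMLParser. It assumes the CDATA section sits in ordinary markup such as the <p> example above, so its body is escaped and will come back as plain text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; not part of the HTMLParser API.
public class CdataUnwrap {
    // Matches one CDATA section; DOTALL lets the body span several lines.
    private static final Pattern CDATA =
        Pattern.compile("<!\\[CDATA\\[(.*?)\\]\\]>", Pattern.DOTALL);

    // Escapes markup characters so the unwrapped body is read as text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Replaces every CDATA section with its escaped body.
    static String unwrap(String html) {
        Matcher m = CDATA.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(escape(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <p>I hope this works.</p>
        System.out.println(unwrap("<p><![CDATA[I hope this works.]]></p>"));
    }
}
```

This sketch does not cover the <script> case: escaping the body there would change the script source, so script content would need separate handling.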