#298 <![CDATA[ ... ]]> seemingly not supported at all

v2.0
open
nobody
None
5
2013-11-18
2013-09-09
Trejkaz
No

If you use HTMLParser to extract the text of a document, you rely on text nodes being returned as Text.

If you feed in a trivial document like this:

<p>I hope this works.</p>

The text does come back in a single Text node.

If you feed in a document like this instead:

<p><![CDATA[I hope this works.]]></p>

Now you get no Text nodes at all. You get a Tag! Its name is "![CDATA[I", and "hope", "this" and "works.]]" somehow become attributes, despite not looking like attributes at all.
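To illustrate how such a result could arise, here is a minimal sketch of a naive tag splitter. It is not HTMLParser's actual code; the class `NaiveTagSplit` and the method `splitTagAt` are hypothetical names made up for this example. It assumes a lexer that treats everything between a '<' and the next '>' as a tag and splits it on whitespace, which is only a guess at the cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of HTMLParser.
class NaiveTagSplit {
    /**
     * Finds the first '<' at or after 'from', takes everything up to the
     * next '>', and splits it on whitespace. The first token plays the role
     * of the tag name; the remaining tokens play the role of attributes.
     */
    static List<String> splitTagAt(String html, int from) {
        int open = html.indexOf('<', from);
        if (open < 0) {
            return new ArrayList<>();
        }
        int close = html.indexOf('>', open + 1);
        if (close < 0) {
            return new ArrayList<>();
        }
        String inner = html.substring(open + 1, close).trim();
        return new ArrayList<>(Arrays.asList(inner.split("\\s+")));
    }

    public static void main(String[] args) {
        String doc = "<p><![CDATA[I hope this works.]]></p>";
        // Offset 3 skips the leading <p> so the CDATA section is the next "tag".
        System.out.println(splitTagAt(doc, 3));
    }
}
```

Under that assumption, the CDATA section splits into the same four tokens reported above: the "name" followed by three "attributes". A lexer that recognised the `<![CDATA[` prefix before falling back to tag parsing would avoid this.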

This also causes serious problems when a CDATA section appears inside <script>, which is of course the most common place for it to occur.
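Until the parser handles CDATA itself, one possible workaround is to rewrite CDATA sections before the input reaches the parser. The sketch below uses only the Java standard library; `CdataUnwrapper` and `unwrap` are hypothetical names, not part of HTMLParser. It replaces each section with its body and escapes markup characters so the body reads as plain text. It targets CDATA in ordinary markup such as the <p> example; inside <script>, where content is raw text, the body would presumably need to be passed through unescaped instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; not part of HTMLParser.
class CdataUnwrapper {
    // Matches one CDATA section. DOTALL lets the body span lines; the
    // reluctant quantifier stops at the first "]]>".
    private static final Pattern CDATA =
            Pattern.compile("<!\\[CDATA\\[(.*?)\\]\\]>", Pattern.DOTALL);

    /** Replaces each CDATA section with its body, escaped as plain text. */
    static String unwrap(String html) {
        Matcher m = CDATA.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Escape '&' first so the entities added afterwards stay intact.
            String body = m.group(1)
                    .replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
            // quoteReplacement keeps '$' and '\' in the body literal.
            m.appendReplacement(out, Matcher.quoteReplacement(body));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unwrap("<p><![CDATA[I hope this works.]]></p>"));
    }
}
```

With this pre-processing step, the second example document becomes identical to the first one, so the text should come back in a single Text node again.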
