If an HTML file is actually a valid XHTML file, it may start with the XML declaration:
<?xml version="1.0" encoding="some_encoding" ?>
HTMLParser apparently returns this as a string node. Is that the expected behaviour?
Also, I wonder what happens in general when this sort of PI is encountered in the middle of a document.
e.g.:
<p>The value is <?php ... ?></p>
I assume that this shouldn't be text, either. But if I do a search and replace through the text for the pattern "<\?.*\?>", I might intercept some cases I'm not supposed to:
<p>The value is &lt;?php ... ?&gt;</p>
Because I receive that value decoded, it will match the same expression.
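To illustrate the distinction I'm after, here is a minimal sketch using Python's standard-library html.parser rather than the HTMLParser this post is about, so it says nothing about how that parser actually behaves. It assumes a parser that reports processing instructions through a separate callback (handle_pi in Python's case). The names PICollector and scan are made up for the example. It feeds in a real PI and an entity-escaped one, then checks which of the two the pattern would hit in the decoded text.

```python
import re
from html.parser import HTMLParser

# The search pattern from the post, applied to decoded text.
PI_PATTERN = re.compile(r"<\?.*\?>")


class PICollector(HTMLParser):
    """Hypothetical helper: records PIs and decoded text separately."""

    def __init__(self):
        # convert_charrefs=True means text arrives already decoded,
        # which is the situation described in the post.
        super().__init__(convert_charrefs=True)
        self.pis = []
        self.text = []

    def handle_pi(self, data):
        # Called for <? ... > constructs found in the markup itself.
        self.pis.append(data)

    def handle_data(self, data):
        # Called for ordinary text, with entities such as &lt; decoded.
        self.text.append(data)


def scan(markup):
    """Return (list of PIs, concatenated decoded text) for the markup."""
    collector = PICollector()
    collector.feed(markup)
    collector.close()
    return collector.pis, "".join(collector.text)


# A real processing instruction in the middle of the document.
real_pis, real_text = scan("<p>The value is <?php ... ?></p>")

# The same characters written as escaped text content.
escaped_pis, escaped_text = scan("<p>The value is &lt;?php ... ?&gt;</p>")

print("real PI:   ", real_pis, "| pattern hits text:",
      bool(PI_PATTERN.search(real_text)))
print("escaped PI:", escaped_pis, "| pattern hits text:",
      bool(PI_PATTERN.search(escaped_text)))
```

With a parser that works this way, the real PI never shows up in the text, and the only thing the pattern can match is the escaped case, which is exactly the one I'm not supposed to touch. So a search and replace on the decoded text looks like the wrong place to handle this.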