I was serializing the DOM of a page containing <noframes> and noticed a
problem. For background, <noframes> was a child of <html> and the page also
contained a <frameset>, but <noframes> was outside <frameset>. After
parsing, the entire contents of <noframes> were stored in the DOM as one
big, unparsed TEXT node.
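To make that DOM shape concrete, here is a minimal Java sketch. It does not use NekoHTML at all: it builds the described tree by hand with the JDK's DOM API (an assumption, reproducing the shape rather than NekoHTML's parser), and the class name NoframesCheck and the helper isUnparsedNoframes are hypothetical. The helper flags a NOFRAMES element whose only child is a single TEXT node that still contains markup.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NoframesCheck {

    // Hypothetical helper: true if el is a NOFRAMES element whose only child
    // is one TEXT node that still contains '<' (i.e. unparsed markup).
    static boolean isUnparsedNoframes(Element el) {
        if (!"noframes".equalsIgnoreCase(el.getNodeName())) {
            return false;
        }
        Node only = el.getFirstChild();
        return only != null
                && only.getNextSibling() == null
                && only.getNodeType() == Node.TEXT_NODE
                && only.getNodeValue().indexOf('<') >= 0;
    }

    // Builds the DOM shape described above by hand (stand-in, not NekoHTML).
    static Document buildExample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("HTML");
        Element noframes = doc.createElement("NOFRAMES");
        // The whole body of <noframes> ends up as a single text node.
        noframes.appendChild(doc.createTextNode("\n<p>some text</p>\n"));
        html.appendChild(noframes);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildExample();
        Element noframes = (Element) doc.getElementsByTagName("NOFRAMES").item(0);
        System.out.println(isUnparsedNoframes(noframes));
    }
}
```

Against a real NekoHTML DOM the same check should apply to whatever element the parser produces for <noframes>; the case-insensitive name comparison is there because element-name case depends on parser settings.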
Because the contents are stored as an unparsed TEXT node, serializing the
DOM escapes everything inside <noframes>. That is, this markup parsed by
NekoHTML...
<html>
...
<noframes>
<p>some text</p>
</noframes>
...
</html>
...became this after serialization...
<html>
...
<noframes>
&lt;p&gt;some text&lt;/p&gt;
</noframes>
...
</html>
Looking at the change history, this seems to be the result of a "fix" in
release 1.9.14, specifically #2854697, which maps to NekoHTML bug #87:
http://sourceforge.net/p/nekohtml/bugs/87/
Can someone explain how this "fixes" anything? Apparently a
StackOverflowError no longer occurs, but the change leaves the DOM in a
state where the serializer must take special care to output the contents
of <noframes> as they were originally written. How does that help? Was
this really the intention of the "fix", or is this a bug/regression?
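For illustration, the "special care" a serializer would need could look like the minimal Java sketch below. It assumes a plain JDK DOM (built by hand, not by NekoHTML), and the class NoframesSerializer and its methods are hypothetical. Text is escaped as usual, except a TEXT node whose parent is NOFRAMES, which is written verbatim. Attributes, comments and void elements are deliberately not handled, to keep it short.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NoframesSerializer {

    static String escape(String s) {
        // Escape '&' first so the entities added afterwards are not re-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static void serialize(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                serializeChildren(node, out);
                break;
            case Node.ELEMENT_NODE:
                // Attributes are omitted in this sketch.
                String name = node.getNodeName().toLowerCase();
                out.append('<').append(name).append('>');
                serializeChildren(node, out);
                out.append("</").append(name).append('>');
                break;
            case Node.TEXT_NODE:
                Node parent = node.getParentNode();
                boolean raw = parent != null
                        && "noframes".equalsIgnoreCase(parent.getNodeName());
                // Unparsed <noframes> content is emitted as-is; all other text is escaped.
                out.append(raw ? node.getNodeValue() : escape(node.getNodeValue()));
                break;
            default:
                // Comments, processing instructions, etc. are skipped here.
                break;
        }
    }

    static void serializeChildren(Node node, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            serialize(c, out);
        }
    }

    static String toHtml(Node node) {
        StringBuilder out = new StringBuilder();
        serialize(node, out);
        return out.toString();
    }

    static Document buildExample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("HTML");
        Element noframes = doc.createElement("NOFRAMES");
        noframes.appendChild(doc.createTextNode("<p>some text</p>"));
        Element p = doc.createElement("P");
        p.appendChild(doc.createTextNode("a<b"));
        html.appendChild(noframes);
        html.appendChild(p);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(buildExample()));
    }
}
```

For the example document this should print `<html><noframes><p>some text</p></noframes><p>a&lt;b</p></html>`: the <noframes> body round-trips, while ordinary text is still escaped.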
Note that <noscript> appears to be parsed normally. Why the difference
with <noframes>?
Jake