Hello!
I use nekohtml-0.9.5.jar in my project.
Sorry, I know that is an old version, but I cannot build my project with the
latest nekohtml because of a library dependency.
I'd like to parse an HTML5 page (written for an iPhone site)
that contains fragments such as:
<a href="foo.html">
<p>some sentence</p>
<p>some sentence</p>
</a>
That is invalid as HTML4 or XHTML, but it is valid HTML5.
In this case I do not get the correct result. Instead I get something like:
<a href="foo.html"></a>
<p><a href="foo.html">some sentence</a></p>
<p>some sentence</p>
I found that this can be fixed by replacing the following source code fragment and rebuilding:
org.cyberneko.html.HTMLElements:
184: new Element(A, "A", Element.INLINE, BODY, null),
To:
184: new Element(A, "A", 0, BODY, null),
With that change I got the result I wanted.
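
To make my guess about the cause clearer, here is a small toy model. It is NOT the real nekohtml code, only my assumption of how a tag balancer might behave: when a block element starts, it closes the inline elements that are still open and reopens them inside the block. The class name InlineBalancerSketch, the method balance and the flag anchorIsInline are names I made up for this sketch, and I left out the href attribute to keep it short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: a guess at the balancing rule, not the nekohtml implementation.
class InlineBalancerSketch {

    // The fragment from this post, already split into tokens (href omitted).
    static final List<String> SAMPLE = List.of(
            "<a>", "<p>", "some sentence", "</p>",
            "<p>", "some sentence", "</p>", "</a>");

    // anchorIsInline = true  models "A" flagged as Element.INLINE,
    // anchorIsInline = false models "A" with flags 0 (treated like a block).
    static String balance(List<String> tokens, boolean anchorIsInline) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // open elements, innermost first

        for (String token : tokens) {
            if (token.startsWith("</")) {
                String name = token.substring(2, token.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag: the element was already closed
                }
                // Close everything up to and including the named element.
                String popped;
                do {
                    popped = open.pop();
                    out.append("</").append(popped).append(">");
                } while (!popped.equals(name));
            } else if (token.startsWith("<")) {
                String name = token.substring(1, token.length() - 1);
                boolean inline = name.equals("a") && anchorIsInline;
                List<String> closed = new ArrayList<>();
                if (!inline) {
                    // A block starts: close the inline elements that are still open.
                    while (!open.isEmpty() && open.peek().equals("a") && anchorIsInline) {
                        String popped = open.pop();
                        out.append("</").append(popped).append(">");
                        closed.add(popped);
                    }
                }
                out.append("<").append(name).append(">");
                open.push(name);
                // Reopen the closed inline elements inside the new block, outermost first.
                for (int i = closed.size() - 1; i >= 0; i--) {
                    out.append("<").append(closed.get(i)).append(">");
                    open.push(closed.get(i));
                }
            } else {
                out.append(token); // plain text
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance(SAMPLE, true));
        System.out.println(balance(SAMPLE, false));
    }
}
```

With anchorIsInline set to true the sketch gives the same shape as the wrong result above, and with false it keeps both paragraphs inside the link. If the real balancer works in a similar way, that would explain why changing the flag helps, but I have not confirmed this in the nekohtml source.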
Now I have two questions:
1. Does the latest nekohtml support HTML5 Documents?
2. Does my approach have any problems?
Thank you very much for reading my broken English.
Susumu ISHIGAMI