Re: [Htmlunit-user] Disable Tidy

Gael Harbonnier wrote:

> Ok I started to study the source code but I noticed that the html is 
> well parsed when I call
>
> WebResponse rep = webClient.getWebConnection().getResponse(new 
> WebRequestSettings(url));
> System.out.println(rep.getContentAsString());

I've never seen NekoHTML fail to parse legal html.  The only time I've 
seen it get confused is when the html is really badly formed.

What you get from asXml() is the DOM tree after NekoHTML has parsed it.  
If the elements you want aren't there then NekoHTML stripped them out.  
If you believe that the html is correct and that NekoHTML has a bug then 
either post some html samples here or send them to the author of 
NekoHTML - http://people.apache.org/~andyc/neko/doc/html/
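To narrow down which elements were stripped before posting a sample, you can compare the tag names in the raw response against those in the asXml() output. The following is only a rough sketch: the TagDiff class and its methods are hypothetical helpers, not part of HtmlUnit or NekoHTML, and it assumes you already have both strings in hand (for example from rep.getContentAsString() and page.asXml()). The regex scan is crude and can be fooled by a "<" inside script text.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagDiff {
    // Matches the name of an opening tag such as "<div" or "<INPUT".
    // Closing tags, comments, doctypes and "<?xml" do not match.
    private static final Pattern OPEN_TAG =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    // Collects the distinct, lower-cased element names found in the markup.
    public static Set<String> tagNames(String markup) {
        Set<String> names = new TreeSet<>();
        Matcher m = OPEN_TAG.matcher(markup);
        while (m.find()) {
            names.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return names;
    }

    // Element names present in the raw html but absent from the parsed DOM.
    public static Set<String> missingTags(String rawHtml, String parsedXml) {
        Set<String> missing = tagNames(rawHtml);
        missing.removeAll(tagNames(parsedXml));
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample strings, standing in for the real response and DOM.
        String raw = "<html><body><form><input name='q'></form></body></html>";
        String parsed = "<html><head/><body><input name=\"q\"/></body></html>";
        System.out.println(missingTags(raw, parsed)); // prints [form]
    }
}
```

This only reports element names that disappear entirely. It won't notice an element that was moved or one of several duplicates being dropped, so treat the result as a hint about where to look in the html.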

An easy way to see if your html is legal is to run it through the W3C 
validator - http://validator.w3.org/

If NekoHTML can't parse the html then I'd almost guarantee that it won't 
pass the validator either.

Hope this helps.

-- 
Mike Bowler
President, Gargoyle Software Inc.
Website: http://www.GargoyleSoftware.com
Weblog : http://www.SphericalImprovement.com/blogs/mbowler/
