From: Mike B. <mb...@Ga...> - 2005-05-11 11:10:04
Gael Harbonnier wrote:
> Ok I started to study the source code but I noticed that the html is
> well parsed when I call
>
> WebResponse rep = webClient.getWebConnection().getResponse(new
> WebRequestSettings(url));
> System.out.println(rep.getContentAsString());

I've never seen NekoHTML fail to parse legal HTML. The only time I've
seen it get confused is when the HTML is really badly formed.

getContentAsString() returns the raw response before any parsing,
whereas what you get from asXml() is the DOM tree after NekoHTML has
parsed it. If the elements you want aren't there, then NekoHTML
stripped them out.

If you believe that the HTML is correct and that NekoHTML has a bug,
then either post some HTML samples here or send them to the author of
NekoHTML - http://people.apache.org/~andyc/neko/doc/html/

An easy way to see if your HTML is legal is to run it through the W3C
validator - http://validator.w3.org/ If NekoHTML can't parse the HTML,
then I'd almost guarantee that it won't pass the validator either.

Hope this helps.

--
Mike Bowler
President, Gargoyle Software Inc.
Website: http://www.GargoyleSoftware.com
Weblog : http://www.SphericalImprovement.com/blogs/mbowler/