From: Denis N. A. <den...@ca...> - 2005-04-01 18:50:39
Hi Marc,

I had to find the time to look into it before posting again...

On a practical level: I locally modified the CVS HEAD of htmlunit to behave as proposed, and the changes amount to fewer than 10 statements in the production code, including a setter/getter pair. The included unit tests, at least, show no impact on the rest of the functionality.

I see your point: most (all?) common browsers give us only one view of the original, unresolved document. I do think, though, that the functionality belongs in htmlunit, for three main reasons.

First, I find it easier to leave the entities untouched than to convert them to characters in htmlunit and then convert the characters back to entities in webtest.

Second, the two conversions together won't reproduce the original document. If the original document contains a mix of entities and characters, the final document will contain only characters (as now) or only entities.

Third, other users of htmlunit may also need unconverted entities. The necessary conversion code would then be duplicated.

By the way, I could not test htmlunit with maven 1.0.2: maven complained about attempting to execute scripts that had been garbage collected. Is this known?

Best,
dna

On 25 Mar 05, at 13:53, Marc Guillemot wrote:

> Hi Denis,
>
> I now think that htmlunit should resolve the entities as it does and
> that it would be wrong to keep the entity code "as is". My motivation
> comes from the comparison with browsers: except for view source, which
> is comparable to WebResponse.getContentAsString(), the different ways
> to access the source show the resolved entities: innerHTML or
> innerText (for IE) in JavaScript, and View Selection Source (for
> Mozilla). Therefore I think that htmlunit behaves like browsers, which
> is correct, and that we should handle it only in webtest.
>
> Marc.
>
> Denis N. Antonioli wrote:
>> Hi
>> I'm using htmlunit through webtest (<http://webtest.canoo.com>, for
>> those that don't know it).
>> In the present case, webtest lets htmlunit generate a DOM of an HTML
>> page before querying the document with XPath.
>> For example, <verifyxpath xpath="/html/body/h2" text="Resultate"/>
>> makes sure that an h2 header displays the text 'Resultate'.
>> My problem is that sometimes the text I want webtest to verify
>> uses HTML entities:
>> <verifyxpath xpath="/html/body/h2" text="Resultate f&uuml;r das Team 33"/>
>> With Marc's help, I've found that htmlunit always generates
>> a DOM in which all entities have been resolved.
>> nekohtml seems to provide a feature
>> (http://apache.org/xml/features/scanner/notify-builtin-refs) to report
>> when/where the source contains entities (see the description at
>> <http://cvs.apache.org/~andyc/neko/doc/html/settings.html#notify-builtin-html-refs>).
>> Does anyone know whether it is possible to get a DOM tree in which the
>> text nodes contain entities instead of characters?
>> Does it make sense?
>> Has it already been tried?
>> Would it be difficult?
>> dna

-- 
Beware of the bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
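Denis's second argument, that resolving entities in htmlunit and converting the characters back to entities in webtest does not reproduce the original page, can be illustrated with a small Java sketch. This is a hypothetical illustration, not htmlunit or webtest code: the class EntityRoundTrip, its decode/encode helpers and the four-entry entity table are assumptions made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not htmlunit or webtest code): resolving entities
 * and then re-encoding them does not reproduce a source that mixes named
 * entities with literal characters.
 */
public class EntityRoundTrip {

    // Deliberately tiny entity table (an assumption); a real one covers all HTML entities.
    private static final Map<String, Character> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("amp", '&');
        ENTITIES.put("uuml", '\u00fc'); // u-umlaut
        ENTITIES.put("auml", '\u00e4'); // a-umlaut
        ENTITIES.put("ouml", '\u00f6'); // o-umlaut
    }

    /** Replaces known named references such as "&uuml;" with their characters, as a resolving DOM builder would. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i) : -1;
            if (semi > i) {
                Character resolved = ENTITIES.get(s.substring(i + 1, semi));
                if (resolved != null) {
                    out.append(resolved.charValue());
                    i = semi + 1; // skip the whole "&name;" reference
                    continue;
                }
            }
            out.append(c); // unknown or malformed reference: keep the character as-is
            i++;
        }
        return out.toString();
    }

    /** Turns every character found in the table back into its named reference. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            String name = null;
            for (Map.Entry<String, Character> e : ENTITIES.entrySet()) {
                if (e.getValue().charValue() == c) {
                    name = e.getKey();
                    break;
                }
            }
            if (name != null) {
                out.append('&').append(name).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "f&uuml;r M\u00fcller"; // mixes an entity and a literal u-umlaut
        String dom = decode(original);             // what a resolving parser exposes
        String restored = encode(dom);              // what a re-encoding workaround yields
        System.out.println(restored);
        System.out.println(original.equals(restored));
    }
}
```

Running main prints `f&uuml;r M&uuml;ller` and then `false`: the literal character in the source comes back as an entity, so the original mix of entities and characters is lost, exactly as described in the second point.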