Re: [Htmlunit-user] Handling of entities in htmlunit?

Hi Marc

I had to find the time to look into it before posting again...

On a practical level: I locally modified the htmlunit head from cvs to 
behave as proposed, and the changes amount to fewer than 10 statements 
in the production code, including a getter/setter pair. The included 
unit tests, at least, show no impact on the rest of the functionality.

I see your point: most (all?) common browsers give us only a view on 
the original document. I do think, though, that the functionality 
belongs in htmlunit, for three main reasons.

First, I find it easier to leave the entities untouched than to 
convert them to characters in htmlunit and then convert the characters 
back to entities in webtest.

Second, the two conversions won't reproduce the original document. 
Should the original document contain a mix of entities and characters, 
the final document will contain only characters (as now) or only 
entities.
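
To illustrate this second point, here is a minimal sketch in plain Java. 
The class name, the two-entry entity table and the decode/encode helpers 
are hypothetical and only stand in for what htmlunit and webtest would 
each do; this is not code from either project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityRoundTrip {
    // Hypothetical two-entry table; a real one would cover all HTML entities.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&uuml;", "\u00fc"); // u with diaeresis
        ENTITIES.put("&auml;", "\u00e4"); // a with diaeresis
    }

    // Stand-in for the parser side: resolve every known entity to its character.
    static String decode(String text) {
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            text = text.replace(e.getKey(), e.getValue());
        }
        return text;
    }

    // Stand-in for the test-tool side: turn every known character back into an entity.
    static String encode(String text) {
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            text = text.replace(e.getValue(), e.getKey());
        }
        return text;
    }

    public static void main(String[] args) {
        // Mixed source: one entity (&uuml;) and one literal character (\u00e4).
        String original = "f&uuml;r K\u00e4se";
        String decoded = decode(original);   // contains only characters
        String reencoded = encode(decoded);  // contains only entities

        System.out.println("decoded equals original:   " + decoded.equals(original));
        System.out.println("reencoded equals original: " + reencoded.equals(original));
    }
}
```

Neither the decoded nor the re-encoded string matches the mixed 
original, because the information about which form the author used is 
lost at the first conversion.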

Third, other users of htmlunit may also need unconverted entities. The 
necessary conversion code would then be duplicated.

By the way, I could not test htmlunit with maven 1.0.2: maven 
complained about attempting to execute scripts that had been garbage 
collected. Is this a known problem?

Best
	dna

On 25 March 05, at 13:53, Marc Guillemot wrote:

> Hi Denis,
>
> I now think that htmlunit should resolve the entities as it does and 
> that it would be wrong to keep the entity code "as is". My motivation 
> comes from the comparison with browsers: except for view source, which 
> is comparable to WebResponse.getContentAsString(), the different 
> methods to access the source show the resolved entity: in js innerHTML 
> or innerText (for IE) and View selection source (for Mozilla). 
> Therefore I think that htmlunit behaves like browsers, which is 
> correct, and that we should handle it only in webtest.
>
> Marc.
>
> Denis N. Antonioli wrote:
>> Hi
>> I'm using htmlunit through webtest (<http://webtest.canoo.com>, for 
>> those that don't know it).
>> In the present case, Webtest lets htmlunit generate a dom of an html 
>> page before querying the document with xpath.
>> For example <verifyxpath xpath="/html/body/h2" text="Resultate"/> 
>> makes sure that a h2 header displays the text 'Resultate'.
>> My problem is that the text I want webtest to verify sometimes 
>> uses html entities:
>> <verifyxpath xpath="/html/body/h2" text="Resultate f&amp;uuml;r das 
>> Team 33"/>
>> With the help of Marc, I've found that htmlunit is always generating 
>> a dom where all entities have been resolved.
>> nekohtml seems to provide a feature 
>> (http://apache.org/xml/features/scanner/notify-builtin-refs) to tell 
>> when/where the source contains entities (see description at 
>> <http://cvs.apache.org/~andyc/neko/doc/html/settings.html#notify-builtin-html-refs>).
>> Does someone know if it is possible to get a dom tree in which the 
>> text nodes contain entities instead of characters?
>> Does it make sense?
>> Was it already tried?
>> Would it be difficult?
>>     dna
>
-- 
Beware of the bugs in the above code;
I have only proved it correct, not tried it.
   -- Donald Knuth
