Re: [Htmlunit-user] Handling of entities in htmlunit?

What I really want is to write simple, robust tests that handle
extended character sets (internationalization?) without excessive
magic.

In the present situation, I have an html page that comes from the
server with an encoding and gets translated by htmlunit into a
different encoding. As far as I know, that last encoding can be
determined by the web server, by the content of the html page or by
some default.
On the other side, I write tests in xml documents which are again
translated by an xml parser (ant) into a different encoding.

The first problem I have is: how do I code non-ascii characters
(such as an uuml or a nbsp) in the xml document?

(1) <verifyxpath xpath="/html/body/h2" text="Resultate für das Team 33" />
(2) <verifyxpath xpath="/html/body/h2" text="Resultate f&amp;uuml;r das Team 33" />
(3) <verifyxpath xpath="/html/body/h2" text="Resultate f&#x00fc;r das Team 33" />

Then comes: how do I do it in a way that makes the xml document
robust in a multi-platform environment? I have seen enough files with ü
or é written on a PC become unreadable once moved to, e.g., a Mac or
a Linux computer. I'd rather not use solution (1).
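
To make the portability worry concrete, here is a minimal sketch in plain
Java. It assumes the file is written as ISO-8859-1 on one machine and read
as UTF-8 on another; the class and method names are hypothetical and only
serve the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encode the text with one charset and decode the resulting bytes with
    // another, as happens when a file travels without its encoding information.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String raw = "f\u00fcr";     // solution (1): a literal u-umlaut
        String ref = "f&#x00fc;r";   // solution (3): ASCII characters only

        String rawMoved = reinterpret(raw, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        String refMoved = reinterpret(ref, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // The literal umlaut is damaged by the charset mismatch...
        System.out.println("raw survives: " + rawMoved.equals(raw));
        // ...while the ASCII-only form has the same bytes in both charsets.
        System.out.println("ref survives: " + refMoved.equals(ref));
    }
}
```

The same reasoning applies to solution (2), since '&amp;uuml;' is also made
of ascii characters only.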

So, in solution (2), there is a comparison of the ascii characters
'&uuml;', which will most probably be the same in all situations.
In solution (3), am I sure that htmlunit will always translate to
unicode?
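
For reference, a small sketch of what the xml parser hands over for
solutions (2) and (3). It uses only the JDK's own DOM parser, not ant or
webtest, so it shows the xml side of the comparison only; the class and
method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeDemo {
    // Parse a one-element xml document and return its "text" attribute as
    // the application sees it, after the parser has resolved references.
    static String textAttribute(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getAttribute("text");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Solution (2): &amp; resolves to '&', leaving the seven ascii
        // characters "&uuml;" in the attribute value.
        String two = textAttribute("<verifyxpath text=\"f&amp;uuml;r\"/>");
        // Solution (3): the numeric reference resolves to the single
        // character U+00FC.
        String three = textAttribute("<verifyxpath text=\"f&#x00fc;r\"/>");

        System.out.println(two.equals("f&uuml;r"));
        System.out.println(three.equals("f\u00fcr"));
    }
}
```

So (2) can only match a page text that still contains the entity, while (3)
can only match a page text in which the entity has been resolved to a
character.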

Best
	dna

On 1 Apr. 05, at 21:09, Brad Clarke wrote:

> I don't really understand why you'd want this. Once the document has
> already been parsed, why does it matter if it was an entity or not?
> What exactly are you testing that you need to know an entity was used?
>
> The test failures are a known bug in a supporting library that will be
> fixed when that library is released again.
>
> Brad C
>
> --- "Denis N. Antonioli" <den...@ca...> wrote:
>
>> Hi Marc
>>
>> I had to find the time to look into it before posting again...
>>
>> On a practical level: I locally modified the htmlunit head from cvs to
>> behave as proposed, and the changes amount to fewer than 10 statements
>> in the production code, including a setter/getter pair. The included
>> unit tests, at least, show no impact on the rest of the functionality.
>>
>> I see your point: most (all?) common browsers give us only a view of
>> the original document. I do think, though, that the functionality
>> belongs in htmlunit, for three main reasons.
>>
>> First, I find it easier to leave the entities untouched, rather than
>> converting them to characters in htmlunit and converting the characters
>> back to entities in webtest.
>>
>> Second, the two conversions won't be equivalent to the original
>> document. Should the original document contain a mix of entities and
>> characters, the final document will contain only characters (as now)
>> or only entities.
>>
>> Third, the need for unconverted entities may arise for other users of
>> htmlunit. The necessary conversion code would then be replicated.
>>
>>
>> By the way, I could not test htmlunit with maven 1.0.2: maven
>> complained about attempting to execute scripts that had been garbage
>> collected. Is this a known problem?
>>
>>
>> Best
>> 	dna
>>
>> On 25 Mar. 05, at 13:53, Marc Guillemot wrote:
>>
>>> Hi Denis,
>>>
>>> I now think that htmlunit should resolve the entities as it does and
>>> that it would be wrong to keep the entity code "as is". My motivation
>>> comes from the comparison with browsers: except for view source, which
>>> is comparable to WebResponse.getContentAsString(), the different
>>> methods to access the source show the resolved entity: in js, innerHTML
>>> or innerText (for IE), and View Selection Source (for Mozilla).
>>> Therefore I think that htmlunit behaves like browsers, which is
>>> correct, and that we should handle it only in webtest.
>>>
>>> Marc.
>>>
>>> Denis N. Antonioli wrote:
>>>> Hi
>>>> I'm using htmlunit through webtest (<http://webtest.canoo.com>, for
>>>> those that don't know it).
>>>> In the present case, Webtest lets htmlunit generate a dom of an html
>>>> page before querying the document with xpath.
>>>> For example, <verifyxpath xpath="/html/body/h2" text="Resultate"/>
>>>> makes sure that an h2 header displays the text 'Resultate'.
>>>> My problem is that, sometimes, the text I want webtest to
>>>> verify uses html entities:
>>>> <verifyxpath xpath="/html/body/h2" text="Resultate f&amp;uuml;r das Team 33"/>
>>>> With Marc's help, I've found that htmlunit always generates
>>>> a dom in which all entities have been resolved.
>>>> nekohtml seems to provide a feature
>>>> (http://apache.org/xml/features/scanner/notify-builtin-refs) to tell
>>>> when/where the source contains entities (see the description at
>>>> <http://cvs.apache.org/~andyc/neko/doc/html/settings.html#notify-builtin-html-refs>).
>>>> Does someone know if it is possible to get a dom tree in which the
>>>> text nodes contain entities instead of characters?
>>>> Does it make sense?
>>>> Was it already tried?
>>>> Would it be difficult?
>>>>     dna
--
The past is indestructible; sooner or later all things return,
and one of the things that return is the project of abolishing the past.
   -- J.L. Borges, "Nathaniel Hawthorne"
