Re: [Htmlunit-user] Handling of entities in htmlunit?

It seems like (1) would be the appropriate thing to do. While I understand your
cross-platform concern, I think the appropriate place to address it is as close
to the problem as possible: create a way for webtest to read something like (2)
or (3) and turn it into (1) before passing it along to htmlunit. (I know little
of webtest's or ant's encoding issues, but since they are xml based I'm actually
quite surprised you can't do this already.)
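
A minimal sketch of such a preprocessing step, in Java: it turns named and
numeric references such as the ones in (2) or (3) into the literal characters
of (1). The class name EntityDecoder, the method decode, and the four-entry
entity table are assumptions made for illustration. This is not code from
webtest or htmlunit, and a real table would need the full HTML entity set.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of webtest or htmlunit.
final class EntityDecoder {
    // Deliberately tiny table (assumption); extend it to the full HTML entity set.
    private static final Map<String, String> NAMED = Map.of(
            "uuml", "\u00fc",
            "eacute", "\u00e9",
            "nbsp", "\u00a0",
            "amp", "&");

    // Matches &#xFC; (hex), &#252; (decimal) and &uuml; (named) references.
    private static final Pattern REF = Pattern.compile(
            "&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String text) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.charAt(0) == '#') {
                boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
                int cp = Integer.parseInt(body.substring(hex ? 2 : 1), hex ? 16 : 10);
                // Leave the reference untouched if it is not a valid code point.
                replacement = Character.isValidCodePoint(cp)
                        ? new String(Character.toChars(cp))
                        : m.group();
            } else {
                // Unknown names are passed through unchanged.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The input is scanned once, so an escaped ampersand followed by "uuml;" comes
out as the text "&uuml;" rather than being decoded a second time.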

Also, after briefly looking at the character set determination in htmlunit it's
entirely possible that we're interpreting something incorrectly since I don't see
anything that will use a char set in the html file (which seems a little backwards
anyway but it should be possible).
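
For illustration only, here is a sketch of the lookup order described in the
quoted message (web server first, then the content of the page, then a
default). It is not htmlunit's actual logic. The class name CharsetSniffer,
the 1024-byte scan window and the ISO-8859-1 fallback are all assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not htmlunit code.
final class CharsetSniffer {
    // Finds "charset=..." in a Content-Type header or in a <meta> tag.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    static Charset detect(String contentTypeHeader, byte[] body) {
        // 1. Charset announced by the web server.
        Charset fromHeader = find(contentTypeHeader);
        if (fromHeader != null) {
            return fromHeader;
        }
        // 2. Charset declared in the page. The prefix is decoded as ISO-8859-1
        //    because every byte maps to a char, so ASCII markup stays readable.
        int len = Math.min(body.length, 1024); // scan window is an assumption
        Charset fromMeta = find(new String(body, 0, len, StandardCharsets.ISO_8859_1));
        if (fromMeta != null) {
            return fromMeta;
        }
        // 3. Assumed default.
        return StandardCharsets.ISO_8859_1;
    }

    private static Charset find(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: treat as not declared.
            return null;
        }
    }
}
```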

Brad C

PS Yahoo Mail converted your (3) into (1) when I replied to this message! :D

--- "Denis N. Antonioli" <den...@ca...> wrote:

> what I really want is to write simple, robust tests that handle 
> extended character sets (internationalization?) without excessive 
> magic.
> 
> In the present situation, I have an html page that comes from the 
> server with an encoding and gets translated by htmlunit into a 
> different encoding. As far as I know, that last encoding can be 
> determined by the web server, by the content of the html page or by 
> some default.
> On the other side, I write tests in xml documents which are again 
> translated by an xml parser (ant) into a different encoding.
> 
> The first problem I have is: how to code in the xml document non-ascii 
> characters (such as an uuml or a nbsp)?
> 
> (1) <verifyxpath xpath="/html/body/h2" text="Resultate für das Team 33" />
> (2) <verifyxpath xpath="/html/body/h2" text="Resultate f&amp;uuml;r das Team 33" />
> (3) <verifyxpath xpath="/html/body/h2" text="Resultate für das Team 33" />
> 
> 
> Then comes the second problem: how to do it in a way that keeps the xml
> document robust in a multi-platform environment? I have seen enough files
> with ü or é written on a PC become unreadable once moved to, e.g., a Mac or
> a Linux computer. I'd rather not use solution (1).
> 
> So, in solution (2), there is a comparison of the ascii characters
> '&uuml;', which will most probably be the same in all situations.
> In solution (3), can I be sure that htmlunit will always translate to
> unicode?
> 
> 
> Best
> 	dna
> 
> 
> On 1 Apr 05, at 21:09, Brad Clarke wrote:
> 
> > I don't really understand why you'd want this. Once the document has
> > already been parsed, why does it matter if it was an entity or not?
> > What exactly are you testing that you need to know an entity was used?
> >
> > The test failures are a known bug in a supporting library that will be
> > fixed when that library is released again.
> >
> > Brad C
> >
> > --- "Denis N. Antonioli" <den...@ca...> wrote:
> >
> >> Hi Marc
> >>
> >> I had to find the time to look into it before posting again...
> >>
> >> On a practical level: I locally modified the htmlunit head from cvs to
> >> behave as proposed, and the changes amount to less than 10 statements
> >> in the production code, including a setter/getter pair. The included
> >> unit tests, at least, show no impact on the rest of the functionality.
> >>
> >> I see your point: most (all?) common browsers give us only a view of
> >> the original document. I do think, though, that the functionality
> >> belongs in htmlunit, for three main reasons.
> >>
> >> First, I find it easier to leave the entities untouched, rather than
> >> converting them to characters in htmlunit and converting the characters
> >> back to entities in webtest.
> >>
> >> Second, the two conversions won't be equivalent to the original
> >> document. Should the original document contain a mix of entities and
> >> characters, the final document will contain only characters (as now)
> >> or only entities.
> >>
> >> Third, the need for unconverted entities may arise for other users of
> >> htmlunit. The necessary conversion code would then be replicated.
> >>
> >>
> >> By the way, I could not test htmlunit with maven 1.0.2, maven
> >> complained about attempting to execute scripts that had been garbage
> >> collected. Is this known?
> >>
> >>
> >> Best
> >> 	dna
> >>
> >> On 25 Mar 05, at 13:53, Marc Guillemot wrote:
> >>
> >>> Hi Denis,
> >>>
> >>> I now think that htmlunit should resolve the entities as it does and
> >>> that it would be wrong to have the entity code "as is". My motivation
> >>> comes from the comparison with browsers: except view source, which is
> >>> comparable to WebResponse.getContentAsString(), the different methods
> >>> to access the source show the resolved entity: in js innerHTML or
> >>> innerText (for IE) and View selection source (for Mozilla). Therefore
> >>> I think that htmlunit behaves like browsers, which is correct, and
> >>> that we should handle it only in webtest.
> >>>
> >>> Marc.
> >>>
> >>> Denis N. Antonioli wrote:
> >>>> Hi
> >>>> I'm using htmlunit through webtest (<http://webtest.canoo.com>, for
> >>>> those that don't know it).
> >>>> In the present case, Webtest lets htmlunit generate a dom of an html
> >>>> page before querying the document with xpath.
> >>>> For example <verifyxpath xpath="/html/body/h2" text="Resultate"/>
> >>>> makes sure that a h2 header displays the text 'Resultate'.
> >>>> I have the problem that, at some point, the text I want webtest to
> >>>> verify uses html entities:
> >>>> <verifyxpath xpath="/html/body/h2" text="Resultate f&amp;uuml;r das Team 33"/>
> >>>> With the help of Marc, I've found that htmlunit is always generating
> >>>> a dom where all entities have been resolved.
> >>>> nekohtml seems to provide a feature
> >>>> (http://apache.org/xml/features/scanner/notify-builtin-refs) to tell
> >>>> when/where the source contains entities (see description at
> >>>> <http://cvs.apache.org/~andyc/neko/doc/html/settings.html#notify-builtin-html-refs>).
> >>>> Does someone know if it is possible to get a dom tree in which the
> >>>> text nodes contain entities instead of characters?
> >>>> Does it make sense?
> >>>> Was it already tried?
> >>>> Would it be difficult?
> >>>>     dna
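
On the question raised in the thread of whether a numeric character reference
in the test file ends up as Unicode: a conforming xml parser resolves such
references itself, so the attribute value the test step receives should be the
same string as the literal character. The sketch below checks this with the
JDK's own parser. The class name CharRefDemo and the choice of the decimal
reference for ü are assumptions for illustration, not what ant or webtest
necessarily do internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of webtest or htmlunit.
final class CharRefDemo {
    // Parses a one-element xml document and returns its "text" attribute.
    static String textAttribute(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("text");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String prefix = "<verifyxpath xpath=\"/html/body/h2\" text=\"Resultate f";
        String suffix = "r das Team 33\"/>";
        String literal = textAttribute(prefix + "\u00fc" + suffix);   // literal character
        String escaped = textAttribute(prefix + "&amp;uuml;" + suffix); // escaped entity name
        String numeric = textAttribute(prefix + "&#252;" + suffix);   // numeric reference
        System.out.println(literal.equals(numeric));
        System.out.println(escaped.contains("&uuml;"));
    }
}
```

The literal and numeric forms compare equal after parsing, while the escaped
form keeps the six ascii characters of the entity name, which matches the
behaviour discussed above.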
