Re: [Htmlunit-user] Handling of entities in htmlunit?


I don't really understand why you'd want this. Once the document has already been
parsed, why does it matter whether it was an entity or not? What exactly are you testing
that requires knowing an entity was used?

The test failures are a known bug in a supporting library that will be fixed when
that library is released again.

Brad C

--- "Denis N. Antonioli" <den...@ca...> wrote:

> Hi Marc
> 
> I had to find the time to look into it before posting again...
> 
> On a practical level: I locally modified the htmlunit head from cvs to 
> behave as proposed, and the changes amount to fewer than 10 statements 
> in the production code, including a setter/getter pair. The included 
> unit tests, at least, show no impact on the rest of the functionality.
> 
> I see your point: most (all?) common browsers give us only a view of 
> the original document. I do think, though, that the functionality belongs 
> in htmlunit for three main reasons.
> 
> First, I find it easier to leave the entities untouched, rather than 
> converting them to characters in htmlunit and converting the characters 
> back to entities in webtest.
> 
> Second, the two conversions won't be equivalent to the original 
> document. Should the original document contain a mix of entities and 
> characters, the final document will contain only characters (as now) or 
> only entities.
> 
> Third, the need for unconverted entities may arise for other users of 
> htmlunit. The necessary conversion code would then be duplicated.
> 
> 
> By the way, I could not test htmlunit with maven 1.0.2, maven 
> complained about attempting to execute scripts that had been garbage 
> collected. Is this known?
> 
> 
> Best
> 	dna
> 
> On 25 mars 05, at 13:53, Marc Guillemot wrote:
> 
> > Hi Denis,
> >
> > I now think that htmlunit should resolve the entities as it does and 
> > that it would be wrong to have the entity code "as is". My motivation 
> > comes from the comparison with browsers: except for view source, which is 
> > comparable to WebResponse.getContentAsString(), the different methods 
> > to access the source show the resolved entity: in js innerHTML or 
> > innerText (for IE) and View selection source (for Mozilla). Therefore 
> > I think that htmlunit behaves like browsers, which is correct, and that 
> > we should handle it only in webtest.
> >
> > Marc.
> >
> > Denis N. Antonioli wrote:
> >> Hi
> >> I'm using htmlunit through webtest (<http://webtest.canoo.com>, for 
> >> those that don't know it).
> >> In the present case, Webtest lets htmlunit generate a dom of an html 
> >> page before querying the document with xpath.
> >> For example <verifyxpath xpath="/html/body/h2" text="Resultate"/> 
> >> makes sure that a h2 header displays the text 'Resultate'.
> >> My problem is that the text I want webtest to verify sometimes 
> >> uses html entities:
> >> <verifyxpath xpath="/html/body/h2" text="Resultate f&amp;uuml;r das 
> >> Team 33"/>
> >> With the help of Marc, I've found that htmlunit is always generating 
> >> a dom where all entities have been resolved.
> >> nekohtml seems to provide a feature 
> >> (http://apache.org/xml/features/scanner/notify-builtin-refs) to tell 
> >> when/where the
> >> source contains entities (see description at 
> >> <http://cvs.apache.org/~andyc/neko/doc/html/settings.html#notify-builtin-html-refs>).
> >> Does someone know if it is possible to get a dom tree in which the 
> >> text nodes contain entities instead of characters?
> >> Does it make sense?
> >> Was it already tried?
> >> Would it be difficult?
> >>     dna
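
The second argument in the thread (two conversions are not equivalent to the original document) can be illustrated with a small sketch. This is not htmlunit or webtest code: the class EntityRoundTrip, its two methods, and the two-entry entity table are hypothetical, chosen only to show that decoding entities and then re-encoding them loses track of which characters were entities in the source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityRoundTrip {
    // Tiny illustrative table (an assumption for this sketch);
    // real HTML defines far more named entities.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&uuml;", "\u00fc");
        ENTITIES.put("&auml;", "\u00e4");
    }

    /** Replaces each known entity with its character, as a resolved DOM would hold it. */
    public static String decode(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Replaces each known character with its entity, as a later conversion step would. */
    public static String encode(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            result = result.replace(e.getValue(), e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // The source mixes an entity (&uuml;) with a literal character (\u00e4).
        String original = "Resultate f&uuml;r das Team 33, gew\u00e4hlt";
        String decoded = decode(original);    // only characters remain
        String reencoded = encode(decoded);   // only entities remain

        System.out.println(decoded);
        System.out.println(reencoded);
        // The round trip does not reproduce the mixed original.
        System.out.println(original.equals(reencoded));
    }
}
```

After decoding, nothing records that only one of the two characters was written as an entity, so re-encoding has to turn both or neither into entities.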
