From: Denis N. A. <den...@ca...> - 2005-04-04 21:23:02
What I really want is to write simple, robust tests that handle extended character sets (internationalization?) without excessive magic.

In the present situation, I have an HTML page that comes from the server in one encoding and is translated by htmlunit into a different encoding. As far as I know, that encoding can be determined by the web server, by the content of the HTML page, or by some default.

On the other side, I write tests in XML documents, which are again translated by an XML parser (Ant) into a different encoding.

The first problem I have is: how do I write non-ASCII characters (such as a u-umlaut or a non-breaking space) in the XML document?

(1) <verifyxpath xpath="/html/body/h2" text="Resultate für das Team 33" />
(2) <verifyxpath xpath="/html/body/h2" text="Resultate f&uuml;r das Team 33" />
(3) <verifyxpath xpath="/html/body/h2" text="Resultate f&#252;r das Team 33" />

Then comes the second problem: how do I do it in a way that keeps the XML document robust in a multi-platform environment? I have seen enough files with ü or é written on a PC become unreadable once moved to, e.g., a Mac or a Linux computer. I'd rather not use solution (1).

So, in solution (2), the comparison is on the ASCII characters '&uuml;', which will most probably be the same in all situations.

In solution (3), can I be sure that htmlunit will always translate to Unicode?

Best
dna

On 1 Apr. 05, at 21:09, Brad Clarke wrote:

> I don't really understand why you'd want this. Once the document has already been
> parsed, why does it matter whether it was an entity or not? What exactly are you testing
> that you need to know an entity was used?
>
> The test failures are a known bug in a supporting library that will be fixed when
> that library is released again.
>
> Brad C
>
> --- "Denis N. Antonioli" <den...@ca...> wrote:
>
>> Hi Marc
>>
>> I had to find the time to look into it before posting again...
>>
>> On a practical level: I locally modified the htmlunit head from CVS to behave as proposed, and the changes amount to fewer than 10 statements in the production code, including a setter/getter pair. The included unit tests, at least, show no impact on the rest of the functionality.
>>
>> I see your point: most (all?) common browsers give us only a view of the original document. I do think, though, that the functionality belongs in htmlunit, for three main reasons.
>>
>> First, I find it easier to leave the entities untouched, rather than converting them to characters in htmlunit and converting the characters back to entities in webtest.
>>
>> Second, the two conversions won't be equivalent to the original document. Should the original document contain a mix of entities and characters, the final document will contain only characters (as now) or only entities.
>>
>> Third, other users of htmlunit may also need unconverted entities. The necessary conversion code would then be replicated.
>>
>>
>> By the way, I could not test htmlunit with Maven 1.0.2; Maven complained about attempting to execute scripts that had been garbage collected. Is this known?
>>
>>
>> Best
>> dna
>>
>> On 25 Mar. 05, at 13:53, Marc Guillemot wrote:
>>
>>> Hi Denis,
>>>
>>> I now think that htmlunit should resolve the entities as it does, and that it would be wrong to keep the entity code "as is". My motivation comes from the comparison with browsers: except for View Source, which is comparable to WebResponse.getContentAsString(), the different methods to access the source show the resolved entity: in JavaScript, innerHTML or innerText (for IE), and View Selection Source (for Mozilla). Therefore I think that htmlunit behaves like browsers, which is correct, and that we should handle it only in webtest.
>>>
>>> Marc.
>>>
>>> Denis N. Antonioli wrote:
>>>> Hi
>>>> I'm using htmlunit through webtest (<http://webtest.canoo.com>, for those who don't know it).
>>>> In the present case, webtest lets htmlunit generate a DOM of an HTML page before querying the document with XPath.
>>>> For example, <verifyxpath xpath="/html/body/h2" text="Resultate"/> makes sure that an h2 header displays the text 'Resultate'.
>>>> My problem is that, at times, the text I want webtest to verify uses HTML entities:
>>>> <verifyxpath xpath="/html/body/h2" text="Resultate f&uuml;r das Team 33"/>
>>>> With the help of Marc, I've found that htmlunit always generates a DOM in which all entities have been resolved.
>>>> NekoHTML seems to provide a feature (http://apache.org/xml/features/scanner/notify-builtin-refs) to tell when/where the source contains entities (see the description at <http://cvs.apache.org/~andyc/neko/doc/html/settings.html#notify-builtin-html-refs>).
>>>> Does anyone know whether it is possible to get a DOM tree in which the text nodes contain entities instead of characters?
>>>> Does it make sense?
>>>> Has it already been tried?
>>>> Would it be difficult?
>>>> dna
>
> _______________________________________________
> Htmlunit-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
>
-- 
The past is indestructible; sooner or later all things come back, and one of the things that come back is the plan to abolish the past.
-- J.L. Borges, "Nathaniel Hawthorne"
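A minimal Java sketch of the XML side of the first question in this thread, assuming only the JDK's built-in DOM parser (`javax.xml.parsers`) rather than Ant, webtest or htmlunit themselves; the class `EntityDemo` and its constants are hypothetical. It parses the three spellings of the `verifyxpath` attribute and compares the resolved value with the expected string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: not part of htmlunit or webtest.
public class EntityDemo {
    // The expected text, written with a Java Unicode escape so that this
    // source file itself stays pure ASCII.
    static final String EXPECTED = "Resultate f\u00fcr das Team 33";

    // Variant (1): a literal u-umlaut, saved as UTF-8 and declared as UTF-8.
    static final byte[] RAW_UTF8 =
        ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<verifyxpath xpath=\"/html/body/h2\" text=\"Resultate f\u00fcr das Team 33\"/>")
            .getBytes(StandardCharsets.UTF_8);

    // Variant (3): a numeric character reference; the bytes are plain ASCII.
    static final byte[] NUMERIC_REF =
        "<verifyxpath xpath=\"/html/body/h2\" text=\"Resultate f&#252;r das Team 33\"/>"
            .getBytes(StandardCharsets.US_ASCII);

    // Variant (2): the HTML entity &uuml;, which XML does not predefine.
    static final byte[] NAMED_ENTITY =
        "<verifyxpath xpath=\"/html/body/h2\" text=\"Resultate f&uuml;r das Team 33\"/>"
            .getBytes(StandardCharsets.US_ASCII);

    // Parses the bytes with the JDK's DOM parser and returns the root
    // element's "text" attribute, with all references already resolved.
    static String textAttribute(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getAttribute("text");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("(1) raw UTF-8 matches: " + textAttribute(RAW_UTF8).equals(EXPECTED));
        System.out.println("(3) &#252; matches:    " + textAttribute(NUMERIC_REF).equals(EXPECTED));
        try {
            textAttribute(NAMED_ENTITY);
            System.out.println("(2) &uuml; parsed");
        } catch (SAXException e) {
            System.out.println("(2) &uuml; rejected: " + e.getMessage());
        }
    }
}
```

With a plain XML parser, variants (1) and (3) should resolve to the same Java string, while (2) is expected to be rejected: `&uuml;` is an HTML entity, not one of XML's five predefined entities (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), so it only parses if the document declares it in a DTD. Variant (1) stays correct only as long as the file's bytes really are in the declared encoding, which is the cross-platform risk raised in the question; (3) sidesteps that by keeping the file pure ASCII.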