It looks like a bug, but I couldn't quickly locate anything that reads the
content type from the meta tags. Does it do the same thing with other encodings?
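For what it's worth, here is a rough sketch of the kind of meta-tag charset
sniffing I'd expect to find somewhere in the parsing path. This is NOT HtmlUnit
code; the class name `MetaCharsetSniffer` and the regex are my own illustration
of the idea, using only the JDK:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only -- not taken from HtmlUnit's sources.
public class MetaCharsetSniffer {

    // Matches <meta http-equiv="Content-Type" content="...; charset=XXX">
    // and captures the declared charset name.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\s+http-equiv=[\"']?Content-Type[\"']?\\s+content=[\"']?[^\"'>]*charset=([\\w-]+)",
        Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared in a meta tag, or null if none is found. */
    public static String extractCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<HTML><HEAD>"
            + "<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-16\">"
            + "</HEAD><BODY></BODY></HTML>";
        System.out.println(extractCharset(page)); // prints UTF-16
    }
}
```

If nothing like this runs before the page body is decoded, that would be
consistent with the behaviour you describe: the declared UTF-16 charset is
never picked up, and the script tag is mis-read.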
--- Vinay Murthy <vin...@gm...> wrote:
> Hi,
>
> I did notice a certain thread on the mailing list that discussed
> problems relating to the content-type of pages and how we could solve
> this problem by extending DefaultPageCreator.
>
> But, the page that I am trying to access has content-type set to
> text/html, which means I can use the DefaultPageCreator as such.
> However, in addition to this, the page also has the character set
> encoding set to UTF-16. The problem is that, with the UTF-16 charset
> in place, HtmlUnit is not able to detect any JavaScript on the page.
> Here is a dummy page that I created to test this out:
>
> <HTML>
> <HEAD>
> <TITLE> Page Two </TITLE>
> <META http-equiv="Content-Type" content="text/html; charset=UTF-16">
> <Script language="JavaScript" type="text/javaScript" src="test.js"></Script>
> </HEAD>
> <BODY>
> </BODY>
> </HTML>
>
> I found that if I remove the charset specification, the JavaScript gets
> detected and the file, test.js, is loaded. I really do not have
> control over the source of the actual page that I am accessing.
>
> I have been using the HtmlUnit sources from 10th March 2005. I would
> appreciate your response to this.
>
> Regards
> Vinay