From: Niklas V. <nik...@ja...> - 2004-10-22 15:25:20
OK, so maybe the hyphen is not the issue. The problem is that the documents don't have a complete ending tag. In my example,

i) <! DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" >

is skipped altogether. I suppose the reason is that the space at the end of the tag (before the >) confuses the parser, making it expect a URL such as "http://www.w3.org/TR/html4/loose.dtd" at the end. But simply removing the last space, which gives us

ii) <! DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

works just fine with htmlunit.

This is very strange behaviour, wouldn't you say? If it is OK to exclude the ending URL, then it should of course be OK to have spaces at the end of the tag (before the >), in particular since an HTML parser is not supposed to care too much about spaces. It just seems a bit too picky :)

/ Niklas
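
To make the expectation above concrete, here is a minimal sketch of a doctype check that tolerates whitespace before the closing >. It is not how htmlunit or its underlying parser actually works. The class name DoctypeWhitespaceSketch, the method looksLikeDoctype and the regular expression are all hypothetical and only illustrate that the trailing URL and the trailing spaces can both be treated as optional.

```java
import java.util.regex.Pattern;

public class DoctypeWhitespaceSketch {
    // Hypothetical pattern, for illustration only:
    //   "<!", optional whitespace, DOCTYPE, a root element name, PUBLIC,
    //   a quoted public identifier, an OPTIONAL quoted system identifier
    //   (the DTD URL), then OPTIONAL whitespace before ">".
    private static final Pattern DOCTYPE = Pattern.compile(
            "<!\\s*DOCTYPE\\s+\\w+\\s+PUBLIC\\s+\"[^\"]*\"(\\s+\"[^\"]*\")?\\s*>",
            Pattern.CASE_INSENSITIVE);

    // Returns true if the whole string matches the lenient pattern above.
    public static boolean looksLikeDoctype(String tag) {
        return DOCTYPE.matcher(tag).matches();
    }

    public static void main(String[] args) {
        // Variant i) from the message: space before ">"
        String withSpace =
                "<! DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" >";
        // Variant ii) from the message: no space before ">"
        String withoutSpace =
                "<! DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">";

        System.out.println(looksLikeDoctype(withSpace));
        System.out.println(looksLikeDoctype(withoutSpace));
    }
}
```

With a check written this way, variants i) and ii) are treated the same, which is the behaviour argued for above.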