From: Glanville, J. <Jay...@Na...> - 2003-06-04 14:13:14
I was debugging one of my tests and found something rather interesting: if I place a <font> tag inside an anchor (<a>) tag, the parser duplicates that anchor. According to the HTML 4.01 Transitional DTD, however, a <font> inside an <a> is valid.

I have included three things in this email: the original HTML document I'm testing, the test case method, and the System.out output. My HTML document contains only two anchor tags, yet the output reports three anchors. The output of the HtmlPage.asXml() method shows what happened in the first paragraph: the parser has left an empty copy of the anchor (<a href="http://www.google.ca"/>) in front of the <font> tag and placed a second copy, containing the link text, inside the <font> tag. This doesn't match my source.

As a check, I've validated my original document with the HTML validator at http://validator.w3.org/, which reports "This Page Is Valid HTML 4.01 Transitional!"

Is this expected behaviour, or should I enter it as a bug on the SourceForge web site?
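As a parser-independent sanity check, the anchor count in the raw source can be confirmed without building a DOM at all. The sketch below is a hypothetical helper (the class name `AnchorTagCount` is illustrative, not part of HtmlUnit) that counts `<a ...>` start tags with a regular expression. It is deliberately crude: it does not account for comments, script bodies, or `>` characters inside attribute values, but for the document in this message it should report 2, not the 3 anchors that `getAnchors()` returns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorTagCount {
    // Matches an opening <a> or <a ...> tag, case-insensitively.
    // Closing </a> tags and other tags such as <abbr> do not match.
    private static final Pattern ANCHOR_START =
            Pattern.compile("<a(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    /** Counts <a> start tags in raw HTML text, without parsing it into a DOM. */
    public static int countAnchorStartTags(String html) {
        Matcher m = ANCHOR_START.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The two paragraphs from the test document, copied verbatim.
        String html =
              "<p><a href=\"http://www.google.ca\"><font size=\"+1\">Google</font></a>\n"
            + "<p><font size=\"+1\"><a href=\"http://www.google.ca\">Google</a></font>\n";
        System.out.println("anchor start tags in source: " + countAnchorStartTags(html));
    }
}
```

If the parser were preserving the source structure, `page.getAnchors().size()` in the test would equal this count.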
JDG

----- start of html document -----
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Bad HTML</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-5" >
</head>
<body>
<p><a href="http://www.google.ca"><font size="+1">Google</font></a>
<p><font size="+1"><a href="http://www.google.ca">Google</a></font>
</body>
</html>
----- end of html document -----

----- start of test -----
public final void testBadHtml() throws Exception {
    final WebClient client = new WebClient();
    final URL url = new URL( "http://localhost:8080/dummyserver/badhtml.html" );
    final HtmlPage page = (HtmlPage) client.getPage( url );
    assertEquals( "Bad HTML", page.getTitleText() );

    // Print every anchor the parser found, along with its text
    List anchors = page.getAnchors();
    for ( int i = 0; i < anchors.size(); i++ ) {
        HtmlAnchor currentAnchor = (HtmlAnchor) anchors.get( i );
        System.out.println( "current anchor: " + currentAnchor );
        System.out.println( "current anchor text: " + currentAnchor.asText() );
    }

    // Dump the DOM as the parser actually built it
    System.out.println( page.asXml() );
}
----- end of test -----

----- start of output -----
current anchor: HtmlAnchor[<a href="http://www.google.ca">]
current anchor text:
current anchor: HtmlAnchor[<a href="http://www.google.ca">]
current anchor text: Google
current anchor: HtmlAnchor[<a href="http://www.google.ca">]
current anchor text: Google
<html>
  <head>
    <title>
      Bad HTML
    </title>
    <meta content="text/html; charset=ISO-8859-5" http-equiv="Content-Type"/>
  </head>
  <body>
    <p>
      <a href="http://www.google.ca"/>
      <font size="+1">
        <a href="http://www.google.ca">
          Google
        </a>
      </font>
    </p>
    <p>
      <font size="+1">
        <a href="http://www.google.ca">
          Google
        </a>
      </font>
    </p>
  </body>
</html>
----- end of output -----

--
Jay Glanville
Web Developer
jay...@na...
(613) 725-2030 x393