[Htmlparser-developer] Label Scanning bug
From: <dha...@or...> - 2003-05-09 11:58:45
Hi all,

I found a mistake in the LabelScanner while doing some testing. I have attached the changed code and a test case for it. Derrick, can you please include it in the next release for me?

Basically, a string like

    <label>John Doe<label>Jane Doe</label>

gets parsed as

    <LABEL>John Doe<LABEL>Jane Doe</LABEL></LABEL>

instead of

    <LABEL>John Doe</LABEL><LABEL>Jane Doe</LABEL>

after a call to toHtml() on the single LabelTag. It is also parsed as a single node instead of two distinct nodes.

I am also turning OptionTag into a CompositeTag but am getting a similar problem there. I am trying to work that out as well.

I am also facing a strange problem with a certain piece of code. Perhaps someone can help me out (I think it's a bug and have logged it already). Consider the string:

    String testHTML = new String(
        "<LABEL value=\"Google Search\">Google</LABEL>" +
        "<LABEL value=\"AltaVista Search\">AltaVista" +
        "<LABEL value=\"Lycos Search\"></LABEL>" +
        "<LABEL>Yahoo!</LABEL>" +
        "<LABEL>\nHotmail</LABEL>" +
        "<LABEL value=\"ICQ Messenger\">" +
        "<LABEL>Mailcity\n</LABEL>" +
        "<LABEL>\nIndiatimes\n</LABEL>" +
        "<LABEL>\nRediff\n</LABEL>\n" +
        "<LABEL>Cricinfo" +
        "<LABEL value=\"Microsoft Passport\">" +
        "<LABEL value=\"AOL\"><SPAN>AOL</SPAN></LABEL>" +
        "<LABEL value=\"Time Warner\">Time <B>Warner <SPAN>AOL </SPAN>Inc.</B>"
    );

I added the LabelScanner to the parser and parsed the string. Strangely, instead of returning a node count of 13 (the number of LABEL tags), I get 17. Also, when I look at the output of every node (using toHtml()), everything up to "Microsoft Passport" is correct and I am getting LABEL tags as well. But the next node I get is a String node with the value #alue="AOL"># (without the hashes), and that entire tag got messed up.

Any ideas? I have attached a test file for that purpose. You'll also have to use the new LabelScanner.java file. It's quite strange.

Regards,
Dhaval
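
For reference, the behaviour expected above (a new opening LABEL tag implicitly closes a label that is still open, so each opening tag yields its own node) can be sketched in plain Java. This is only an illustration, not the attached LabelScanner.java and not the htmlparser API: the class LabelSplitSketch and the method splitLabels are hypothetical names, the regex-based scan assumes no '>' appears inside attribute values, and unlike toHtml() it does not upper-case tag names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of htmlparser.
class LabelSplitSketch {
    // Matches an opening <label ...> or a closing </label>, ignoring case.
    // Assumption: attribute values never contain '>'.
    private static final Pattern LABEL_TAG =
            Pattern.compile("<(/?)label\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns one string per opening label tag. A label that is still open
    // when the next label starts (or when the input ends) is closed
    // implicitly by appending "</label>".
    static List<String> splitLabels(String html) {
        List<String> nodes = new ArrayList<>();
        Matcher m = LABEL_TAG.matcher(html);
        int openStart = -1; // start index of the currently open label, or -1
        while (m.find()) {
            boolean isClosingTag = !m.group(1).isEmpty();
            if (isClosingTag) {
                if (openStart >= 0) {
                    // Explicit close: emit the label including its end tag.
                    nodes.add(html.substring(openStart, m.end()));
                    openStart = -1;
                }
                // A stray </label> with no open label is ignored.
            } else {
                if (openStart >= 0) {
                    // New label begins while one is open: close the old one.
                    nodes.add(html.substring(openStart, m.start()) + "</label>");
                }
                openStart = m.start();
            }
        }
        if (openStart >= 0) {
            // Input ended with a label still open.
            nodes.add(html.substring(openStart) + "</label>");
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (String node : splitLabels("<label>John Doe<label>Jane Doe</label>")) {
            System.out.println(node);
        }
        // prints:
        // <label>John Doe</label>
        // <label>Jane Doe</label>
    }
}
```

Under these assumptions, every opening LABEL tag produces exactly one node, so the 13-tag test string above would give 13 nodes. Text that falls between a closed label and the next opening tag is dropped by this sketch; a real scanner would return it as a separate string node.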