[Htmlparser-developer] Label Scanning bug


Hi all,

I found a bug in the LabelScanner while doing some testing. I have attached 
the changed code and a test case for it. Derrick, can you please include it 
in the next release?

Basically a string like <label>John Doe<label>Jane Doe</label>

gets parsed as 
<LABEL>John Doe<LABEL>Jane Doe</LABEL></LABEL>

instead of 
<LABEL>John Doe</LABEL><LABEL>Jane Doe</LABEL>

after a call to toHtml() on the single LabelTag.

Also, it is parsed as a single node instead of two distinct nodes. 
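
To make the expected result concrete, here is a minimal standalone sketch of
the splitting I have in mind. It is not the LabelScanner code and does not use
the htmlparser API at all; the class and method names (LabelSplitSketch,
splitLabels) are made up for illustration, and it drops tag attributes for
brevity. The only idea it shows is that a new opening <label> should end the
label that is currently open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of htmlparser.
public class LabelSplitSketch {
    // Matches an opening <label ...> or a closing </label>, case-insensitively.
    private static final Pattern LABEL_TAG =
            Pattern.compile("<(/?)label\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns one string per label. An open label is ended either by an
     * explicit </label> or by the start of the next <label>.
     * Attributes of the opening tag are dropped in this sketch.
     */
    public static List<String> splitLabels(String html) {
        List<String> labels = new ArrayList<>();
        Matcher m = LABEL_TAG.matcher(html);
        int contentStart = -1; // index just past the open tag, or -1 if none is open
        while (m.find()) {
            boolean isClosing = !m.group(1).isEmpty();
            if (contentStart >= 0) {
                // Close the label that is currently open.
                labels.add("<LABEL>" + html.substring(contentStart, m.start()) + "</LABEL>");
                contentStart = -1;
            }
            if (!isClosing) {
                contentStart = m.end(); // a new label starts here
            }
        }
        if (contentStart >= 0) {
            // Input ended while a label was still open.
            labels.add("<LABEL>" + html.substring(contentStart) + "</LABEL>");
        }
        return labels;
    }

    public static void main(String[] args) {
        for (String label : splitLabels("<label>John Doe<label>Jane Doe</label>")) {
            System.out.println(label);
        }
    }
}
```

For the example string above this yields two entries, <LABEL>John Doe</LABEL>
and <LABEL>Jane Doe</LABEL>, which is the two-node result I would expect from
the scanner.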

I am also turning OptionTag into a CompositeTag but am running into a similar 
problem there. I am trying to work that out as well.

I am also facing a strange problem with a certain piece of code. Perhaps 
someone can help me out (I think it's a bug and have logged it already).

Consider the string:

String testHTML =
					"<LABEL value=\"Google Search\">Google</LABEL>" +
					"<LABEL value=\"AltaVista Search\">AltaVista" +
					"<LABEL value=\"Lycos Search\"></LABEL>" +
					"<LABEL>Yahoo!</LABEL>" +
					"<LABEL>\nHotmail</LABEL>" +
					"<LABEL value=\"ICQ Messenger\">" +
					"<LABEL>Mailcity\n</LABEL>" +
					"<LABEL>\nIndiatimes\n</LABEL>" +
					"<LABEL>\nRediff\n</LABEL>\n" +
					"<LABEL>Cricinfo" +
					"<LABEL value=\"Microsoft Passport\">" +
					"<LABEL value=\"AOL\"><SPAN>AOL</SPAN></LABEL>" +
					"<LABEL value=\"Time Warner\">Time <B>Warner <SPAN>AOL </SPAN>Inc.</B>";

I added the LabelScanner to the parser and parsed the string. Strangely, 
instead of a node count of 13 (the number of LABEL tags), I get 17. Also, when 
I look at the output of every node (using toHtml()), everything up to 
"Microsoft Passport" is correct and I get LABEL tags as expected. But the next 
node I get is a String node with the value #alue="AOL"># (without the hashes), 
and that entire tag is mangled. Any ideas? I have attached a test file for 
that purpose. You'll also have to use the new LabelScanner.java file. It's 
quite strange.
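
As a sanity check on the expected count of 13, here is a small standalone
sketch that simply counts the opening LABEL tags in the test string. It does
not touch the parser or the LabelScanner; the names LabelCountSketch and
countOpeningTags are made up for illustration, and the string is the one
quoted above.

```java
import java.util.Locale;

// Hypothetical illustration only: counts opening tags by plain text search.
public class LabelCountSketch {
    static final String TEST_HTML =
            "<LABEL value=\"Google Search\">Google</LABEL>" +
            "<LABEL value=\"AltaVista Search\">AltaVista" +
            "<LABEL value=\"Lycos Search\"></LABEL>" +
            "<LABEL>Yahoo!</LABEL>" +
            "<LABEL>\nHotmail</LABEL>" +
            "<LABEL value=\"ICQ Messenger\">" +
            "<LABEL>Mailcity\n</LABEL>" +
            "<LABEL>\nIndiatimes\n</LABEL>" +
            "<LABEL>\nRediff\n</LABEL>\n" +
            "<LABEL>Cricinfo" +
            "<LABEL value=\"Microsoft Passport\">" +
            "<LABEL value=\"AOL\"><SPAN>AOL</SPAN></LABEL>" +
            "<LABEL value=\"Time Warner\">Time <B>Warner <SPAN>AOL </SPAN>Inc.</B>";

    /** Counts opening tags named tagName (case-insensitive); closing tags are ignored. */
    public static int countOpeningTags(String html, String tagName) {
        String haystack = html.toLowerCase(Locale.ROOT);
        String needle = "<" + tagName.toLowerCase(Locale.ROOT);
        int count = 0;
        for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + 1)) {
            int end = i + needle.length();
            if (end < haystack.length()) {
                char next = haystack.charAt(end);
                // The tag name must be followed by '>' or whitespace,
                // so that e.g. <labels> is not counted.
                if (next == '>' || Character.isWhitespace(next)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOpeningTags(TEST_HTML, "LABEL")); // prints 13
    }
}
```

So if every opening LABEL tag ends up as its own LabelTag, the top-level node
count should be 13, which is the number I was expecting from the parser.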

Regards,
Dhaval