Thread: [nekohtml-user] help with parsing bad html anchors

  I am trying to parse some particularly annoying HTML, and 
unfortunately Neko is not handling it very well.  In the HTML document, 
I have links that look like this:  "<a class=foo 1' href='/link/text'>1</a>"

The "href" attribute gets dropped completely: the anchor ends up with 
only a "class" attribute and a second attribute named "1" whose value 
is an empty string.

Any suggestions?

This is pretty important for me to be able to parse, so I'm even willing 
to patch the parser if necessary.
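
Short of patching the parser, one possible stopgap is to strip the stray 
token from the markup before it ever reaches Neko. The following is only 
a minimal sketch using java.util.regex, under the assumption that the 
junk always looks like a bare token followed by a dangling quote (as in 
1' above) and that no attribute value contains ">". The class and method 
names are hypothetical and are not part of NekoHTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NekoHTML: removes stray tokens such as 1'
// from <a ...> start tags so that a parser sees only well-formed attributes.
class AnchorCleaner {
    // An <a ...> start tag. Assumes no '>' appears inside attribute values.
    private static final Pattern ANCHOR_OPEN =
        Pattern.compile("<a(?=\\s)[^>]*>", Pattern.CASE_INSENSITIVE);

    // One whitespace-separated token inside the tag:
    //   group 1: optional "=value" part (single-quoted, double-quoted or unquoted)
    //   group 2: optional dangling quote directly before whitespace or '>'
    private static final Pattern TOKEN = Pattern.compile(
        "\\s+[^\\s=>'\"]+(=(?:'[^']*'|\"[^\"]*\"|[^\\s>'\"]+))?(['\"](?=[\\s>]))?");

    // Drops tokens that have a dangling quote but no "=value" part.
    static String cleanTag(String tag) {
        Matcher m = TOKEN.matcher(tag);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean stray = m.group(1) == null && m.group(2) != null;
            m.appendReplacement(out, stray ? "" : Matcher.quoteReplacement(m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Applies cleanTag to every <a ...> start tag and leaves all other text alone.
    static String clean(String html) {
        Matcher m = ANCHOR_OPEN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(cleanTag(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The cleaned string would then be handed to the parser in place of the 
original document. Because well-formed attributes are consumed as whole 
tokens, quoted values that contain spaces are left untouched; only a 
valueless token ending in a lone quote is removed.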

Thanks!!!  :-)

-Donnie Pinkston