I have the following HTML:
<img src="http://www.union.wisc.edu/emails/2014/images/wud_e_Revelry_14_0726.jpg"" alt="Revel!" width="1056" height="738" border="1"http://www.union.wisc.edu/emails/2014/images/hoofer_f_CouncilMeeting_14_0533.jpg border-color="CCCCCC"/>
But the parser does not recognize this tag as a StartTag and skips it. Here is the Jericho log:
2014-05-23 11:46:49 jericho [ERROR] StartTag img at (p0) has missing whitespace after quoted attribute value at position (p81)
2014-05-23 11:46:49 jericho [ERROR] StartTag img at (p0) contains attribute name with invalid first character at position (p81)
2014-05-23 11:46:49 jericho [ERROR] StartTag img at (p0) has missing whitespace after quoted attribute value at position (p83)
2014-05-23 11:46:49 jericho [ERROR] StartTag img at (p0) has missing whitespace after quoted attribute value at position (p132)
2014-05-23 11:46:49 jericho [ERROR] StartTag img at (p0) rejected because it contains too many errors
2014-05-23 11:46:49 jericho [ERROR] Encountered possible StartTag at (p0) whose content does not match a registered StartTagType
The problem is that the browser (Firefox) is still able to parse and render the image, so I need Jericho to accept the tag as well: I am trying to extract the src attribute and rewrite the URL.
Another similar test case:
OK, I think I have found a solution. If I set Attributes.setDefaultMaxErrorCount() to a value higher than the default of 2, the parser no longer rejects the tag and I am able to get the attributes.
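For anyone who finds this later, here is a minimal sketch of that workaround, assuming the Jericho jar is on the classpath. The class name LenientSrcRewriter, the helper rewriteImgSrc, the prefix parameters, and the limit of 10 are all made up for illustration. Only the Jericho calls themselves come from the library.

```java
import net.htmlparser.jericho.Attribute;
import net.htmlparser.jericho.Attributes;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.OutputDocument;
import net.htmlparser.jericho.Source;
import net.htmlparser.jericho.StartTag;

public class LenientSrcRewriter {

    // Hypothetical helper: replaces oldPrefix with newPrefix in every img src value.
    public static String rewriteImgSrc(String html, String oldPrefix, String newPrefix) {
        // Global, static setting: it affects every Source parsed afterwards.
        // 10 is an arbitrary illustrative value above the default of 2.
        Attributes.setDefaultMaxErrorCount(10);

        Source source = new Source(html);
        OutputDocument output = new OutputDocument(source);

        for (StartTag tag : source.getAllStartTags(HTMLElementName.IMG)) {
            Attributes attributes = tag.getAttributes();
            if (attributes == null) continue;

            Attribute src = attributes.get("src");
            if (src == null || !src.hasValue()) continue;

            String value = src.getValue();
            if (value.startsWith(oldPrefix)) {
                // Replace only the text between the quotes; the rest of the tag
                // is copied to the output unchanged.
                output.replace(src.getValueSegment(),
                        newPrefix + value.substring(oldPrefix.length()));
            }
        }
        return output.toString();
    }
}
```

Because the limit is a static default, it is probably cleaner to set it once at application startup rather than inside a helper as done here. How high it needs to be depends on how broken the input is; the tag above logs four attribute errors before it is rejected.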
Sorry for not responding; email notifications weren't working for a few days. I'm glad you found the solution!