A number of sites out there are built with very poor HTML. One thing that seems to happen too often is the same attribute being entered multiple times in the same element. This causes an error when using XML tools, so the duplicates need to be dropped while the document is being read in.
The attached patch checks whether a given attribute already exists while the attribute list is being populated in the startElement phase. Only the first instance of a given attribute is used.
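For illustration only, here is a minimal sketch of the "first instance wins" rule in Python. It is not the attached patch. It uses the standard library's `html.parser.HTMLParser`, which reports duplicate attributes as repeated pairs, and the names `dedupe_attributes` and `FirstAttributeParser` are hypothetical.

```python
from html.parser import HTMLParser


def dedupe_attributes(attrs):
    """Return the attributes as a dict, keeping the first value seen for each name.

    `attrs` is a list of (name, value) pairs, possibly with repeated names.
    Hypothetical helper, not part of the attached patch.
    """
    unique = {}
    for name, value in attrs:
        # Later duplicates are ignored, so the first instance is the one used.
        if name not in unique:
            unique[name] = value
    return unique


class FirstAttributeParser(HTMLParser):
    """Collects (tag, attributes) for each start tag with duplicates dropped."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes attrs as a list of pairs, duplicates included.
        self.elements.append((tag, dedupe_attributes(attrs)))


parser = FirstAttributeParser()
parser.feed('<a href="first.html" href="second.html" class="x">link</a>')
print(parser.elements)
```

With this input the second `href` is discarded, so the resulting element carries a single `href` and can be handed to XML tools without a duplicate-attribute error.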
Disallow multiple duplicate attributes in elements