If you are parsing undecoded UTF-8, hparser.c:1861 issues the warning "Parsing of undecoded UTF-8 will give garbage when decoding entities..." whenever p_state->argspec_entity_decode is true. That flag is set in HTML-Parser-3.56/hparser.c:729:
    if (a == ARG_ATTR || a == ARG_ATTRARR || a == ARG_DTEXT) {
        p_state->argspec_entity_decode++;
    }
But as far as I can tell, entities in attribute values are only actually decoded if p_state->attr_encoded is false. So it seems to me that the setting of p_state->argspec_entity_decode should also depend on p_state->attr_encoded, something along the lines of:
    if (((a == ARG_ATTR || a == ARG_ATTRARR) && !p_state->attr_encoded) || a == ARG_DTEXT) {
        p_state->argspec_entity_decode++;
    }
I'm not sure if p_state->attr_encoded is available at that point, but you get the idea :-)
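
To make the suggestion concrete, here is a small stand-alone sketch of the proposed rule. It is not the real hparser.c code: the enum, the struct p_state_sketch and the function note_argspec_item are hypothetical stand-ins I made up for illustration, and only ARG_ATTR, ARG_ATTRARR, ARG_DTEXT, attr_encoded and argspec_entity_decode are names taken from the source. It also assumes attr_encoded already has its final value when the argspec is processed, which is exactly the part I'm unsure about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real definitions in HTML-Parser;
   ARG_OTHER is a placeholder for every other argspec code. */
enum argcode_sketch { ARG_OTHER, ARG_ATTR, ARG_ATTRARR, ARG_DTEXT };

struct p_state_sketch {
    bool attr_encoded;          /* true: attribute values are left undecoded */
    int  argspec_entity_decode; /* count of argspec items needing entity decoding */
};

/* Proposed rule: attribute arguments only count as "entity decoding"
   when attr_encoded is off; decoded text (ARG_DTEXT) always counts. */
void note_argspec_item(struct p_state_sketch *p_state, enum argcode_sketch a)
{
    assert(p_state != NULL);
    if (((a == ARG_ATTR || a == ARG_ATTRARR) && !p_state->attr_encoded)
        || a == ARG_DTEXT) {
        p_state->argspec_entity_decode++;
    }
}
```

With attr_encoded set, a handler that only asks for attr or attrarr would leave the counter at zero, so the warning would not be triggered for it; a handler asking for dtext would still bump the counter as before.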