When a meta charset can decode but not encode, HTMLScanner.isEncodingCompatible() throws an UnsupportedOperationException from String.getBytes(). The attached patch handles this case; a test case is included.
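To reproduce the failure outside NekoHTML, here is a minimal sketch using a hypothetical decode-only Charset (DecodeOnlyCharset is made up for illustration, not a real JDK charset). Its newEncoder() throws UnsupportedOperationException, which is what the Charset javadoc specifies for charsets that do not support encoding, such as auto-detect charsets. For non-built-in charsets, String.getBytes(Charset) obtains an encoder from the charset, so the exception surfaces there. Checking Charset.canEncode() first avoids it. The names DecodeOnlyDemo and encodeIfSupported are also illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class DecodeOnlyDemo {

    /** Hypothetical decode-only charset standing in for a real auto-detect charset. */
    static final class DecodeOnlyCharset extends Charset {
        DecodeOnlyCharset() {
            super("x-decode-only-demo", new String[0]);
        }

        @Override
        public boolean contains(Charset cs) {
            return cs == this;
        }

        @Override
        public CharsetDecoder newDecoder() {
            // Borrow a real decoder; only the missing encoder matters here.
            return StandardCharsets.ISO_8859_1.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            // Charset.newEncoder() is documented to throw this for decode-only charsets.
            throw new UnsupportedOperationException("decode-only charset");
        }

        @Override
        public boolean canEncode() {
            return false;
        }
    }

    /** Returns the encoded bytes, or null when cs cannot encode (canEncode() == false). */
    static byte[] encodeIfSupported(String text, Charset cs) {
        if (!cs.canEncode()) {
            return null; // calling text.getBytes(cs) here would throw UnsupportedOperationException
        }
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        Charset decodeOnly = new DecodeOnlyCharset();
        try {
            "<meta charset=\"demo\">".getBytes(decodeOnly);
            System.out.println("getBytes succeeded");
        } catch (UnsupportedOperationException e) {
            System.out.println("getBytes threw UnsupportedOperationException");
        }
        System.out.println("guarded result is null: " + (encodeIfSupported("x", decodeOnly) == null));
    }
}
```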
In the patch, I returned false after two UnsupportedOperationExceptions. On reflection, I don't think that's desirable. Both encodings are valid, or there would have been an UnsupportedEncodingException. Unless something further can be done to check whether the new encoding will work, I think the right thing is to trust the document about its content. The ignore-specified-charset feature is already available if the client isn't willing to trust the document.
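A hypothetical sketch of the policy suggested above, assuming the compatibility check round-trips an ASCII probe string through both charsets. The method names, the probe string, and DecodeOnlyCharset are illustrative and are not NekoHTML's actual code. The sketch takes Charset objects rather than names, so it assumes the UnsupportedEncodingException case (an unknown name) was already handled during lookup. It tries one direction, falls back to the other when the first charset is decode-only, and returns true instead of false when both are decode-only.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EncodingCompatibilitySketch {

    // Hypothetical ASCII probe: compatible encodings should round-trip it unchanged.
    static final String PROBE = "<meta http-equiv=\"Content-Type\" content=\"text/html\">";

    /** Encodes PROBE with one charset and decodes the bytes with the other. */
    static boolean canRoundtrip(Charset encodeWith, Charset decodeWith) {
        // Throws UnsupportedOperationException if encodeWith is decode-only.
        byte[] bytes = PROBE.getBytes(encodeWith);
        return PROBE.equals(new String(bytes, decodeWith));
    }

    /** Hypothetical check: if neither charset can encode, trust the document. */
    static boolean isEncodingCompatible(Charset current, Charset declared) {
        try {
            return canRoundtrip(current, declared);
        } catch (UnsupportedOperationException first) {
            try {
                // current is decode-only, so only decode with it.
                return canRoundtrip(declared, current);
            } catch (UnsupportedOperationException second) {
                // Both charsets are valid but decode-only: trust the declaration.
                return true;
            }
        }
    }

    /** Hypothetical decode-only charset used only for this demonstration. */
    static final class DecodeOnlyCharset extends Charset {
        DecodeOnlyCharset(String name) {
            super(name, new String[0]);
        }

        @Override
        public boolean contains(Charset cs) {
            return cs == this;
        }

        @Override
        public CharsetDecoder newDecoder() {
            // Borrow an ASCII-compatible decoder so decoding the probe works.
            return StandardCharsets.ISO_8859_1.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            throw new UnsupportedOperationException(name() + " cannot encode");
        }

        @Override
        public boolean canEncode() {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset a = new DecodeOnlyCharset("x-decode-only-a");
        Charset b = new DecodeOnlyCharset("x-decode-only-b");
        System.out.println(isEncodingCompatible(StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
        System.out.println(isEncodingCompatible(StandardCharsets.UTF_16, StandardCharsets.ISO_8859_1));
        System.out.println(isEncodingCompatible(a, StandardCharsets.UTF_8));
        System.out.println(isEncodingCompatible(a, b));
    }
}
```

Run as written, main prints true, false, true, true. The ASCII-compatible pair round-trips. UTF-16 output, which starts with a byte-order mark, does not decode back as ISO-8859-1. A single decode-only charset is checked in the reverse direction. Two decode-only charsets fall through to trusting the document.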
Patch applied. Many thanks and sorry for the delay.