The htmlcxx library is encoding agnostic. You need to know the original encoding of the text.
Actually, this isn't correct. htmlcxx is limited to "basic" HTML (as opposed to full XML) and expects an encoding that's compatible with the first half of ISO-8859-1, i.e. the 7-bit ASCII range, which UTF-8 also shares.
Other encodings are likely to fail.
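To illustrate the point, here is a rough sketch of a pre-check you could run on the raw bytes before handing them to the parser. Both helper names are made up for this example and are not part of htmlcxx. It is only a heuristic, on the assumption that the input is ordinary text: ASCII-compatible encodings (ASCII, ISO-8859-1, UTF-8) don't contain NUL bytes in normal text, while UTF-16 and UTF-32 do.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of htmlcxx): true when the raw bytes start
// with a UTF-16 or UTF-32 byte order mark.
bool has_wide_bom(const std::string& bytes) {
    auto starts_with = [&](const char* bom, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, bom, n) == 0;
    };
    return starts_with("\xFF\xFE", 2)              // UTF-16 LE (also the prefix of UTF-32 LE)
        || starts_with("\xFE\xFF", 2)              // UTF-16 BE
        || starts_with("\x00\x00\xFE\xFF", 4);     // UTF-32 BE
}

// Hypothetical helper (not part of htmlcxx): heuristic check that the input
// is in an ASCII-compatible encoding. It rejects input with a UTF-16/32 BOM
// and input containing NUL bytes, which show up when ASCII characters are
// stored as 16-bit or 32-bit code units.
bool looks_ascii_compatible(const std::string& bytes) {
    if (has_wide_bom(bytes)) {
        return false;
    }
    return bytes.find('\0') == std::string::npos;
}
```

If the check fails, the safer route is probably to convert the document to UTF-8 first and only then parse it. Note that the check can't tell ISO-8859-1 from UTF-8; you still need to know the original encoding to interpret non-ASCII text afterwards.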