The Parser class has a setEncoding() method that you can call to force
the character set instead of relying on the default.
Unfortunately, the parser used by SiteCapturer is not exposed
publicly, so you will need to either add accessors and rebuild the
library, or subclass SiteCapturer and override process() to set the
encoding before proceeding.
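
For illustration only, here is a small sketch in plain Java (no htmlparser
classes involved) of what is probably happening to your pages: bytes in a
multi-byte charset get decoded as ISO-8859-1, one character per byte. The
class name EncodingDemo, the decode() helper, the sample text and the choice
of UTF-8 as the page charset are all my own assumptions, not anything taken
from SiteCapturer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it is not part of htmlparser.
class EncodingDemo {

    /** Decodes raw page bytes using the named charset. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two Chinese characters, assumed to be served as UTF-8 (3 bytes each).
        String original = "\u4e2d\u6587";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: every byte becomes its own character, so the text is garbled.
        String wrong = decode(raw, "ISO-8859-1");
        // Right charset: the original text comes back intact.
        String right = decode(raw, "UTF-8");

        System.out.println("wrong charset, chars: " + wrong.length());
        System.out.println("right charset, matches: " + right.equals(original));
    }
}
```

Calling setEncoding() with the page's real charset before the parse starts
is meant to avoid the first case.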
Derrick
Ke Deng wrote:
> Hi,
> I use htmlparser v1.6. After I use SiteCapturer to download a site,
> I found the charset of page is changed: if the charset of page is not
> ISO-8859-1 but multiple bytes charset, the page captured by htmlparser
> contains many confused code.
> How to resolve this problem? Is there a way to set correct charset
> before capture?
> Regards,
> Karl.