Re: [Htmlparser-user] set correct charset when use SiteCapturer class
From: Derrick O. <Der...@Ro...> - 2006-10-10 11:58:17
The Parser class has a setEncoding() method that can be used.
Unfortunately, the parser used by the SiteCapturer is not exposed publicly, so you will need to either add accessors and rebuild the library, or subclass SiteCapturer and override process() to set the encoding before proceeding.

Derrick

Ke Deng wrote:
> Hi,
> I use htmlparser v1.6. After I use SiteCapturer to download a site,
> I found the charset of page is changed: if the charset of page is not
> ISO-8859-1 but multiple bytes charset, the page captured by htmlparser
> contains many confused code.
> How to resolve this problem? Is there a way to set correct charset
> before capture?
> Regards,
> Karl.
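
For anyone hitting the same symptom, here is a minimal sketch of why the captured pages look garbled. It uses only the Java standard library, not the htmlparser API; the class name CharsetDemo, the helper decode(), and the choice of UTF-8 as the page's real charset are assumptions made for illustration. Decoding multi-byte content as ISO-8859-1 turns every byte into its own character, which is the kind of damage described in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of htmlparser.
class CharsetDemo {

    // Decode raw page bytes using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two CJK characters, standing in for a multi-byte page (assumed UTF-8).
        String original = "\u4e2d\u6587";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each of the 6 bytes becomes a separate character.
        String wrong = decode(raw, "ISO-8859-1");
        // Right charset: the original two characters come back.
        String right = decode(raw, "UTF-8");

        System.out.println("bytes on the wire: " + raw.length);
        System.out.println("chars when read as ISO-8859-1: " + wrong.length());
        System.out.println("round trip with UTF-8 matches: " + right.equals(original));
    }
}
```

With the library itself, the corresponding step would be to call setEncoding() on the Parser with the page's actual charset before the page is processed, as described in the reply above.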