Re: [Httplib2-discuss] Best way of retrieving a page as a unicode string?
From: Simon W. <si...@si...> - 2007-12-09 01:20:07
On 8 Dec 2007, at 22:10, Joe Gregorio wrote:
> Oh, if it were only that simple :) For example, look at the
> charset sniffing rules for JSON (RFC 4627) and XML 1.0
> <http://www.w3.org/TR/REC-xml/#sec-guessing>. Your
> best bet will probably be to use:
>
> <http://chardet.feedparser.org/>

But surely you can attempt to decode using the charset declared in the
Content-Type header, and then fall back on the chardet library if a
decoding error occurs? Or am I missing something really scary?

Cheers,

Simon
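
For illustration, the "declared charset first, chardet as fallback" idea could be sketched roughly as below. This is only a sketch under assumptions, not anything httplib2 provides: the helper names `declared_charset` and `decode_body` are hypothetical, chardet is treated as an optional third-party import, and the final UTF-8-with-replacement step is an added assumption so that the function never raises.

```python
from email.message import Message

try:
    import chardet  # third-party detector; optional in this sketch
except ImportError:
    chardet = None


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body, content_type=None):
    """Decode bytes using the declared charset, then fall back to detection."""
    charset = declared_charset(content_type)
    if charset:
        try:
            return body.decode(charset)
        except (UnicodeDecodeError, LookupError):
            pass  # declared charset was wrong or unknown; try detection

    if chardet is not None:
        guess = chardet.detect(body)["encoding"]
        if guess:
            try:
                return body.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass  # the guess did not work either

    # Assumed last resort: substitute undecodable bytes instead of raising.
    return body.decode("utf-8", errors="replace")


# Possible use with httplib2 (not executed here):
#   resp, content = httplib2.Http().request(url)
#   text = decode_body(content, resp.get("content-type"))
```

One limitation of this approach: a single-byte charset such as ISO-8859-1 decodes any byte sequence without error, so a mislabelled page can decode "successfully" into the wrong text and the fallback never runs. It also ignores charset declarations inside the document itself, such as the XML rules linked above.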