[Httplib2-discuss] Best way of retrieving a page as a unicode string?
From: Simon W. <si...@si...> - 2007-12-08 21:54:38
Is there a supported way of getting hold of a page as a Python unicode string with httplib2? As far as I can tell I need to do this:

    import httplib2

    h = httplib2.Http()
    headers, content = h.request('http://simonwillison.net/', 'GET')
    content_type = headers.get('content-type', '')
    if 'charset=' in content_type:
        # maxsplit=1 so the unpacking always yields exactly two parts
        junk, charset = content_type.split('charset=', 1)
    else:
        charset = 'iso-8859-1'
    unicode_content = content.decode(charset)

Even the above doesn't look like it would properly solve the problem. I'm not sure that iso-8859-1 is the best assumption for a default encoding, and I should probably be catching any unicode decoding exceptions and falling back on something else if they occur.

Shouldn't this be handled by the library in some way?

Cheers,
Simon Willison
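The fallback idea mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: decode_body is a hypothetical helper, not something httplib2 provides, and it assumes the body arrives as a byte string, as on Python 3. The iso-8859-1 default follows what RFC 2616 specifies for text/* media types; whether that is the right default for a given site is a judgment call.

```python
from email.message import Message


def decode_body(content, content_type="", default="iso-8859-1"):
    """Decode a response body (bytes) to text.

    Hypothetical helper for illustration; not part of httplib2.
    """
    charset = None
    if content_type:
        # Let the standard library parse the header, so quoted values and
        # extra parameters after charset are handled for us.
        msg = Message()
        msg["content-type"] = content_type
        charset = msg.get_content_charset()

    # Try the declared charset first, then the default.
    for name in (charset, default):
        if not name:
            continue
        try:
            return content.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes that do not fit: try the next one.
            continue

    # Last resort: never raise, substitute undecodable bytes instead.
    return content.decode("utf-8", errors="replace")


# Possible usage with httplib2 (needs network access, so left commented out):
#   import httplib2
#   headers, content = httplib2.Http().request('http://simonwillison.net/', 'GET')
#   text = decode_body(content, headers.get('content-type', ''))
```

Because iso-8859-1 maps every byte to a character, the default never fails to decode, so the final "replace" branch only matters if a stricter default is passed in.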