[Httplib2-discuss] Best way of retrieving a page as a unicode string?
From: Simon W. <si...@si...> - 2007-12-08 21:54:38
Is there a supported way of getting hold of a page as a Python unicode
string with httplib2? As far as I can tell I need to do this:
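import httplib2

h = httplib2.Http()
headers, content = h.request('http://simonwillison.net/', 'GET')
content_type = headers.get('content-type', '')
if 'charset=' in content_type:
    # Take the text after 'charset=', dropping any later parameters or quotes
    charset = content_type.split('charset=', 1)[1].split(';')[0].strip(' "\'')
else:
    # No charset declared; fall back to the RFC 2616 default for text/* types
    charset = 'iso-8859-1'
unicode_content = content.decode(charset)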
Even this doesn't look like it properly solves the problem: I'm not
sure iso-8859-1 is the best default encoding to assume, and I should
probably catch any unicode decoding exceptions and fall back on
something else if they occur. Shouldn't the library handle this in
some way?
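
For illustration, here is a minimal sketch of the fallback described above. It is not part of httplib2: the helper names charset_from_content_type and decode_body are hypothetical, the header parsing is delegated to the standard library's email.message.Message, and the code assumes Python 3, where the response body is a bytes object.

```python
import codecs
from email.message import Message


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return a usable codec name from a Content-Type value (hypothetical helper)."""
    if not content_type:
        return default
    msg = Message()
    msg["content-type"] = content_type
    # get_content_charset() lowercases and unquotes the charset parameter,
    # and returns None when the header carries no charset at all.
    charset = msg.get_content_charset()
    if charset is None:
        return default
    try:
        codecs.lookup(charset)  # reject names Python has no codec for
    except LookupError:
        return default
    return charset


def decode_body(content, content_type, default="iso-8859-1"):
    """Decode bytes with the declared charset, falling back to the default."""
    charset = charset_from_content_type(content_type, default)
    try:
        return content.decode(charset)
    except UnicodeDecodeError:
        # The declared charset was wrong; decode with the default instead,
        # replacing any bytes that still cannot be decoded.
        return content.decode(default, "replace")
```

With the request above, this would be called as decode_body(content, headers.get('content-type', '')). Because iso-8859-1 maps every byte value to a character, the fallback always produces a string, though not necessarily the text the server intended.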
Cheers,
Simon Willison