There are a ton of bug reports with regard to Unicode conversions.  I don't know what the best way to deal with them all is - so far I've been ignoring them.  I suspect the right answer is to somehow go through all the code, and everywhere there may be a Unicode value (e.g. anything input on the command line as a URL, anything obtained from os.walk(), or anything returned in XML from S3), make sure we're treating it as Unicode.  That feels like a painful bit of work though...
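
To make the idea of converting at the input boundaries concrete, here is a minimal sketch. The helper name to_unicode is hypothetical (nothing by that name is assumed to exist in the code today), and UTF-8 is an assumed default; the real encoding may need to come from the locale or the config.

```python
# Hypothetical helper, for illustration only: decode byte strings to text
# at the point where they enter the program.
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_unicode(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings with `encoding`."""
    if isinstance(value, text_type):
        return value  # already text, leave it alone
    if isinstance(value, bytes):
        # Strict decoding: invalid input raises UnicodeDecodeError here,
        # at the boundary, instead of failing somewhere deeper in the code.
        return value.decode(encoding)
    raise TypeError("expected a string, got %r" % type(value))
```

Each boundary (command-line arguments, os.walk() results, values parsed out of the S3 XML) would pass through a call like this once, so the rest of the code could assume it only ever sees text.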

Any suggestions on how to go about doing so?  If only we could turn on some kind of type checking and catch wherever we may be string-ifying what should be UTF-8-encoded unicode values...
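
One crude way to approximate that kind of type checking at runtime might be a decorator that refuses byte strings. This is only a sketch: require_text and build_key are made-up names for illustration, and on Python 2 every plain string literal is a byte string, so it would likely flag a lot of call sites at first.

```python
import functools

text_type = type(u"")  # unicode on Python 2, str on Python 3


def require_text(func):
    """Hypothetical decorator: raise TypeError if any argument is a byte string."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for value in list(args) + list(kwargs.values()):
            if isinstance(value, bytes) and not isinstance(value, text_type):
                raise TypeError(
                    "%s() got a byte string %r; expected text"
                    % (func.__name__, value)
                )
        return func(*args, **kwargs)
    return wrapper


@require_text
def build_key(prefix, name):
    # Made-up example function that joins two path components.
    return prefix + u"/" + name
```

Wrapping a handful of central functions this way and running the existing test cases might at least show where undecoded values are slipping through.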

Thanks,
Matt