#20 7-bit source str literals

Status: closed-out-of-date
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-10-10
Created: 2004-08-06
Creator: Anders J. Munch
Private: No

I'd like to see a warning for non-unicode non-ascii
string literals.

Example:

# -*- coding: latin-1 -*-
x = "blåbærgrød"

This is a likely error: the encoding declaration
doesn't actually do anything for a non-unicode string
literal, which just keeps its raw bytes. Consequently,
when you pass it to some unicode-capable presentation
device, unicode(x) will bomb with a UnicodeDecodeError
exception.
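
To make the failure concrete, here is a minimal sketch in Python 3 terms. It assumes that `bytes` stands in for the Python 2 `str` type and that Python 2's `unicode(x)` decodes with the default `ascii` codec (the usual setting unless a site has changed it). The names `raw`, `failed` and `text` are only illustrative.

```python
# Python 3 sketch: bytes plays the role of the Python 2 str type.
raw = b"bl\xe5b\xe6rgr\xf8d"  # the latin-1 bytes of the example literal

try:
    # Roughly what unicode(x) attempts in Python 2 (assumed default: ascii).
    raw.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the encoding named in the declaration works fine.
text = raw.decode("latin-1")
```

Here `failed` ends up true, while the explicit latin-1 decode recovers the intended characters.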

Hence I'd like a warning for any non-unicode string
literal that contains non-ascii characters typed
literally (i.e. not entered as escape sequences). The
fix to avoid such a warning would, depending on your
intent, be either:

x = 'bl\xe5b\xe6rgr\xf8d' # intent: raw byte string with bit-8-set values

or

# -*- coding: latin-1 -*-
x = u"blåbærgrød" # intent: latin-1 character string
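
The check requested above could be sketched roughly as follows. This is not pychecker's implementation, only an illustration using the standard `tokenize` module under Python 3. The function name `find_non_ascii_str_literals` is hypothetical, and the sketch assumes the source has already been decoded to text.

```python
import io
import tokenize


def find_non_ascii_str_literals(source):
    """Return (line_number, literal) pairs for string literals that have
    no u prefix but contain non-ASCII characters typed literally."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        text = tok.string
        # The prefix (u, r, b, ...) is everything before the first quote.
        quote_pos = min(i for i in (text.find("'"), text.find('"')) if i != -1)
        prefix = text[:quote_pos].lower()
        if "u" in prefix:
            continue  # explicit unicode literal: intent is already stated
        if any(ord(ch) > 127 for ch in text):
            hits.append((tok.start[0], text))
    return hits


# The sample source is built with \u escapes so that this sketch itself
# stays 7-bit clean; the resulting string contains the real characters.
sample = (
    "# -*- coding: latin-1 -*-\n"
    'x = "bl\u00e5b\u00e6rgr\u00f8d"\n'    # plain literal, non-ASCII: flagged
    'y = u"bl\u00e5b\u00e6rgr\u00f8d"\n'   # unicode literal: accepted
    "z = 'bl\\xe5b\\xe6rgr\\xf8d'\n"       # escape sequences: accepted
)
warnings_found = find_non_ascii_str_literals(sample)
```

Only the literal on line 2 of `sample` is reported; the `u`-prefixed literal and the escaped byte string pass, matching the two fixes suggested above.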

Discussion

    • status: open --> open-out-of-date
     
  • Logged In: YES
    user_id=384806

    Close this: Python 2.5 makes this a syntax error, so
    there is no longer any need for pychecker to do anything.

    (BTW, those question marks are not what I wrote, but then
    that's a SourceForge bug.)

     
    • status: open-out-of-date --> closed-out-of-date