Python 3 support?

Matti
2014-08-21
2016-04-03
  • Matti

    Matti - 2014-08-21

    Are there any plans to add support for Python 3? Or is it somehow possible to run a local Python 3 install instead of the bundled 2.7?
    As you know, there are some syntax changes from 2.x to 3.x, e.g. "class C(MyBaseClass, metaclass=MyMetaClass)".
    I've written a pep8/pyflakes linter that parses Python code and displays indicators and annotations for the warnings it outputs. As long as the parsed source code stays backwards compatible with 2.7, PythonScript running 2.7 will suffice, but any Python 3-only syntax won't work.
    Ideally I would like to have the option to switch Python versions.

     
  • Dave Brotherstone

    Unfortunately, you can't switch between 2 and 3 without C++ code changes in PythonScript. There are currently no plans to switch to Python 3 for one simple reason: strings.

    Strings in Python 2 are effectively byte arrays, and can hold text encoded in ASCII, any single-byte encoding, or UTF-8. Scintilla uses UTF-8 to store text, and all text manipulations are performed with UTF-8 and byte offsets. Python 3 stores all strings as UTF-16. What this means is that all the lengths in Python 3 are character lengths, not byte lengths, but Scintilla needs byte lengths. So the (PythonScript) user has to be extra careful when using offsets and lengths in Python 3.
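
    A minimal Python 3 sketch of the character-length versus byte-length gap described above. It uses only built-ins; the sample word is just an illustration, and nothing here calls PythonScript or Scintilla.

```python
# Python 3: len() on str counts code points; len() on bytes counts bytes.
text = 'D\u00e4nemark'          # 'Dänemark', written with an escape to avoid encoding surprises
utf8 = text.encode('utf-8')     # UTF-8 bytes, the form Scintilla works with per the post above

char_len = len(text)            # length in characters (code points)
byte_len = len(utf8)            # length in UTF-8 bytes

print(char_len, byte_len)
```

    The two numbers differ as soon as the text contains anything outside ASCII, which is exactly the case a script author would have to watch for.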

    There may be things we could do to help with this, but ultimately it would be down to the author of the script to ensure that they don't run into problems when characters outside of ASCII come up. I don't think the win for Python 3 is great enough to justify the extra problems that are likely to come up.

    Cheers,
    Dave.

     
  • Matti

    Matti - 2014-08-22

    Well, at some point the time will come to move to Python 3. Maintenance of 2.7 ends in 2020.
    The str/bytes changes from Python 2 to 3 must be manageable in C++.
    Script authors already have to be aware of this in Python, but it's not hard to keep code compatible with both 2.7 and 3. So script authors could easily handle the needed changes, whether they choose to use 2, 3, or both.
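
    As a rough illustration of code that behaves the same on 2.7 and 3, here is a sketch of a byte-length helper. The name utf8_length is hypothetical and not part of PythonScript; it relies only on the fact that `bytes` is an alias for `str` on Python 2.7.

```python
def utf8_length(text):
    """Return the UTF-8 byte length of text on Python 2.7 and 3 (hypothetical helper)."""
    if isinstance(text, bytes):
        # Python 2 str or Python 3 bytes: already a byte string.
        # Assumption: the bytes are UTF-8 encoded already.
        return len(text)
    # Python 2 unicode or Python 3 str: encode before measuring.
    return len(text.encode('utf-8'))


print(utf8_length(u'D\u00e4nemark'))
print(utf8_length(b'abc'))
```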

     
  • Franklin Lee

    Franklin Lee - 2016-03-26

    Does (C)Python 3 really use UTF-16?

    >>> import sys
    >>> s = ' ' * 1000
    >>> sys.getsizeof(s)
    1025
    >>> sys.getsizeof(bytes(s, 'utf-8'))
    1017
    

    Wouldn't it use at least two bytes per ASCII character?

    (Really really late, I know. Development is on GitHub. It's also not really relevant, since Python 3 would be using bytes instead of str if you wanted byte offsets, and the discussion would be about how to make that more convenient to use.)

    Edit: I didn't realize there was a nested reply system.

    Edit: Python str uses the smallest possible fixed encoding for each string, which means the internal representation's char width is the width of the biggest character in that string. See https://www.python.org/dev/peps/pep-0393/. It also says, "the specification chooses UTF-8 as the recommended way of exposing strings to C code."

    >>> delta = '\u0394'
    >>> s = delta * 500 + ' ' * 500
    >>> b = bytes(delta, 'utf-8') * 500 + b' ' * 500
    >>> sys.getsizeof(s)
    2038
    >>> sys.getsizeof(b)
    1517
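
    A rough way to observe the per-string character width that PEP 393 describes, assuming CPython 3.3 or later. sys.getsizeof reports implementation details, so treat this as a sketch; bytes_per_char is a hypothetical helper name.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate CPython's storage per character for a string made only of ch."""
    # Subtracting the size of the one-character string cancels the fixed
    # object header, leaving n extra characters' worth of storage.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


# ASCII letter, Greek capital delta, and a character outside the BMP.
widths = [bytes_per_char(c) for c in ('a', '\u0394', '\U0001F600')]
print(widths)
```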
    
     

    Last edit: Franklin Lee 2016-03-26
  • Dave Brotherstone

    Yes, PEP 393 changed the way things are stored internally since 3.3. But really, it's not the internal structure that matters, it's what len(...) returns.

    Python 3:

    >>> len('Dänemark')
    8
    

    Python 2:

    >>> len('Dänemark')
    9
    

    You can probably imagine any number of scenarios where a script does something like an editor.search, looks at the len() of the result, then uses that in a call to editor.setTarget(). This goes against the principle of least surprise.
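
    One way a Python 3 script could bridge the gap in a scenario like that is to convert character positions to UTF-8 byte offsets before passing them on. The sketch below is plain Python 3; char_to_byte_offset and byte_span are hypothetical helpers, and how the resulting offsets would be handed to the editor is deliberately left out.

```python
def char_to_byte_offset(text, char_index):
    """Return the UTF-8 byte offset of the character at char_index (hypothetical helper)."""
    # Everything before char_index is encoded and measured in bytes.
    return len(text[:char_index].encode('utf-8'))


def byte_span(text, needle):
    """Return (start, end) byte offsets of the first match of needle, or None."""
    start = text.find(needle)          # character index, as Python 3 reports it
    if start < 0:
        return None
    end = start + len(needle)          # still a character index
    return char_to_byte_offset(text, start), char_to_byte_offset(text, end)


doc = 'D\u00e4nemark und Schweden'     # 'Dänemark und Schweden'
span = byte_span(doc, 'mark')
print(span)
```

    Encoding the prefix on every call is linear in the document length, so this is only meant to show the idea, not an efficient approach for large buffers.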

    Having said that, if somebody wanted to produce a Python 3 version, i.e. send a pull request, I'd be more than happy to release a Python3Script. But I don't currently have the bandwidth to make all the necessary changes (and to do all the thinking about how to deal with all the string-based editor methods).

     
  • Randall McDougall

    2 points (off the top):
    • I don’t have any interest in learning two similar-but-different syntaxes for what purport to be different versions of the same language, or in trying to keep them straight, especially when the one in use here was declared dead years ago. It’s the one I’ll soon be forced to forget anyway.
    • Looking at your own example:

    Python 3:

    >>> len('Dänemark')
    8

    Python 2:

    >>> len('Dänemark')
    9

    Seriously, which result makes more sense in a text-editing context? From a programming end they are both explainable, but not so much to someone from an end-user background trying to write macros. And on top of that, you want to tell them that it’s going to change in a few years anyway?

    As someone who’s done this for a long time, but mostly avoided Python until recently … well, it looked like it would be worth taking an interest in initially, but this sort of thing makes me think it should just be avoided for a half dozen more years to see if there’s anything left, since most people don’t seem willing to take the new version seriously.

     
