#1 pythonwin editor does strange things with non-english chars

closed-fixed
pythonwin (177)
5
2004-04-22
2003-04-07
No

Reported by a few people, but to repro:
* Paste "# ça ne va pas" into pywin (or any other
string with an extended character)
* Move to the ç
* Press Backspace

Some other character appears. This will be related to
pywin.idle not being multi-character aware.
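
The symptom can be reproduced outside the editor. What follows is a minimal sketch in modern Python 3, not PythonWin code: it assumes the text buffer holds UTF-8 bytes and that Backspace removes one byte, which the report does not confirm. The function names `backspace_bytewise` and `backspace_charwise` are made up for illustration.

```python
def backspace_bytewise(buf, pos):
    """Delete the single byte before pos (the assumed byte-oriented behaviour)."""
    if pos == 0:
        return buf, 0
    return buf[:pos - 1] + buf[pos:], pos - 1


def backspace_charwise(buf, pos):
    """Delete the whole UTF-8 character that ends at pos."""
    if pos == 0:
        return buf, 0
    start = pos - 1
    # UTF-8 continuation bytes look like 0b10xxxxxx; step back to the lead byte.
    while start > 0 and (buf[start] & 0xC0) == 0x80:
        start -= 1
    return buf[:start] + buf[pos:], start


text = "# ça ne va pas".encode("utf-8")
# Caret just after 'ç', which occupies two bytes in UTF-8.
pos = text.index("ç".encode("utf-8")) + 2

broken, broken_pos = backspace_bytewise(text, pos)
fixed, fixed_pos = backspace_charwise(text, pos)

# The byte-wise result is no longer valid UTF-8, so a stray character shows up.
print(broken.decode("utf-8", errors="replace"))
print(fixed.decode("utf-8"))
```

In this model a second byte-wise Backspace removes the leftover lead byte and the text becomes valid again, which would match the "press Backspace twice" behaviour described in the comments.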

Discussion

  • Tony Meyer

    Tony Meyer - 2003-05-09

    Logged In: YES
    user_id=552329

    As another example:

    In IDLE (0.8):
    >>> u = u'questa \xe8 bella'
    >>> u
    u'questa \xe8 bella'
    >>> print u
    questa bella

    In PythonWin (152):
    >>> u = u'questa \xe8 bella'
    >>> u
    u'questa \xe8 bella'
    >>> print u
    questa bell

    (note that if you copy the result of the 'print u', or type
    something after it, the a appears. *very* confusing!)

     
  • Oleg Noga

    Oleg Noga - 2003-08-18

    Logged In: YES
    user_id=551440

    So there are 3 bugs:
    ------------
    First bug (this report, 716708):

    Backspace works incorrectly (Del works fine). I have to press
    Backspace twice to remove a non-English character. This
    bug is present in every version of pythonwin from 1.46 on and,
    very probably, in earlier versions too.
    ------------

    Second bug (see followup by anadelonbrin, 2003-05-09 10:38):

    The editor does not display the characters that follow a
    non-English character. Another way to reproduce it:
    - Open a .py file containing non-English strings encoded in the
    system default encoding and look at those strings in the editor:
    the ends of the strings are not displayed.
    - Move the cursor to the beginning of such a string and step
    through it with the right arrow key. The cursor sticks at the
    visible end of the string and stays there for as many key presses
    as there are invisible characters.
    - Move the cursor to the invisible end of the string and type
    English characters: qwertyu... The invisible end of the string
    appears, but the newly typed characters (qwertyu...) are still
    not visible.
    - Save the file and look at it in Notepad. The document data is
    fine.
    So this bug only affects how pythonwin displays strings in the
    editor; it does not affect the document data.

    This bug did not exist in version 1.48, which we still use, but
    it is present in versions 1.50 ... 1.55.
    --------------

    Third bug (seems to be unreported so far):

    I can't type the \xFE character in the editor. It is "ю" (the
    Cyrillic small letter "yu"). A dot is typed instead of this
    character. Interestingly, the dot is on the same keyboard key as
    the Cyrillic "ю". The editor opens files containing this
    character, displays it correctly, and deletes and copies it fine.
    This bug is present in every version of pythonwin from 1.46 on
    and, very probably, in earlier versions too.

     
  • Gilles Lenfant

    Gilles Lenfant - 2003-11-28

    Logged In: YES
    user_id=122383

    Same strange charset behaviour when copying text (with non-ASCII
    chars) from pythonwin to a regular text editor
    (notepad or any other).

    It seems that pythonwin copies utf-8 to the clipboard rather
    than the "natural" default encoding (cp1252 for western european
    languages).
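
    The garbling this would produce is easy to show. A minimal sketch in
    modern Python 3, assuming (as guessed above, not confirmed) that the
    clipboard receives UTF-8 bytes and the receiving editor reads them
    as cp1252:

```python
original = "# ça ne va pas"

# Assumed behaviour: the text is placed on the clipboard as UTF-8 bytes...
clipboard_bytes = original.encode("utf-8")

# ...and the receiving application interprets those bytes as cp1252.
pasted = clipboard_bytes.decode("cp1252")
print(pasted)

# The damage is reversible as long as every byte maps to a cp1252 character.
repaired = pasted.encode("cp1252").decode("utf-8")
print(repaired == original)
```

    Each non-ASCII character turns into two cp1252 characters, because
    its UTF-8 form is two bytes long.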

     
  • Henrik Weber

    Henrik Weber - 2004-01-07

    Logged In: YES
    user_id=121229

    This behaviour can be worked around by switching off unicode
    support. The easiest way to do this is in the __init__.py file.

    Replace the line beginning with is_platform_unicode by this:
    is_platform_unicode = 0

    With this pythonwin works normally at least with the german
    umlauts (I haven't tried the other ones).

     
  • Kanich Vladimir

    Kanich Vladimir - 2004-01-22

    Logged In: YES
    user_id=304670

    I have the same problem with encoding 'cp1250' (the default
    encoding in sk/cz). Without changing the __init__.py file (as
    recommended by hweber), the editor swallows half of the characters
    with char codes above 127. It seems that the editor routine
    in use is byte-oriented and not Unicode-aware.
    Setting the variable is_platform_unicode = 0 also helps for
    encoding 'cp1250'.

     
  • Oleg Noga

    Oleg Noga - 2004-02-25

    Logged In: YES
    user_id=551440

    One of the bugs has a solution: the editor not displaying characters
    after a non-English one (see followup by anadelonbrin, 2003-05-09 10:38).

    The solution is to replace scintilla.dll with the latest version. Just
    download the scintilla source from SourceForge, build it, and replace
    scintilla.dll with SciLexer.dll (rename SciLexer.dll to
    scintilla.dll).

    It does not solve the bugs with the clipboard, Backspace, and the
    Cyrillic small letter "yu" ("ю").

     
  • Oleg Noga

    Oleg Noga - 2004-02-25

    Logged In: YES
    user_id=551440

    In view.py, KeyDotEvent() always generates ".", but the key can
    produce a non-English character instead of "."!
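
    To illustrate the difference, here is a toy model in modern Python 3.
    It is not the actual view.py code: the layout table and both handler
    functions are hypothetical names invented for this sketch. It only
    contrasts a handler bound to the physical key that hardcodes "." with
    one that inserts whatever character the active layout produces.

```python
# Hypothetical layout table: which character the "period" key types per layout.
PERIOD_KEY_CHAR = {
    "en": ".",
    "ru": "ю",  # on the standard Russian layout this key types 'ю'
}


def on_period_key_hardcoded(buffer, layout):
    """Mimics the reported behaviour: the layout is ignored, '.' is inserted."""
    return buffer + "."


def on_period_key_layout_aware(buffer, layout):
    """Inserts the character that the key produces under the active layout."""
    return buffer + PERIOD_KEY_CHAR[layout]


print(on_period_key_hardcoded("мен", "ru"))
print(on_period_key_layout_aware("мен", "ru"))
```

    With the English layout both handlers behave the same, which would
    explain why the problem only shows up for non-English input.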

     
  • Kanich Vladimir

    Kanich Vladimir - 2004-02-26

    Logged In: YES
    user_id=304670

    With the solution described by oleg_noga (replacing
    scintilla.dll) I tried editing a full string of special Slovak and
    Czech characters (codepage Cp1250) in the PythonWin editor
    with full success. Thank you for the solution. N.B. With this
    arrangement, the Backspace key still has to be pressed twice.

     
  • Mark Hammond

    Mark Hammond - 2004-04-22

    Logged In: YES
    user_id=14198

    Had a couple of reports that this is fixed in build 200 :)

     
  • Mark Hammond

    Mark Hammond - 2004-04-22
    • status: open --> closed-fixed
     
  • kxroberto

    kxroberto - 2006-01-11

    Logged In: YES
    user_id=972995

    pywin build 203
    From News Message-ID:
    <1136989316.179566.98110@g44g2000cwa.googlegroups.com>

    Neil Hodgson wrote:
    > Robert:
    > PythonWin did have some Unicode support but I think Mark Hammond was
    > discouraged by bugs. In pythonwin/__init__.py there is a setting
    > is_platform_unicode = 0 with a commented out real test for Unicode on
    > the next line. Change this to 1 and restart and you may see
    >
    > >>> x = u'sytest3\\\u041f\u043e\u0448\u0443\u043a.txt'
    > >>> print x
    > sytest3\Пошук.txt
    > >>>

    Thanks for that hint. But I found that it is still not consistent,
    or is even buggy:

    After "is_platform_unicode = <auto>", scintilla displays some unicode
    as you showed, but the win32 functions (e.g. MessageBox) still do not
    pass through wide unicode. And pasting/inserting/parsing in scintilla
    doesn't work correctly:

    PythonWin 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32.
    Portions Copyright 1994-2004 Mark Hammond (mhammond@skippinet.com.au) -
    see 'Help/About PythonWin' for further copyright information.
    >>> x = u'sytest3\\\u041f\u043e\u0448\u0443\u043a.txt'
    >>> print x
    sytest3\Пошук.txt
    >>> print "sytest3\Пошук.txt"
    sytest3\?????.txt

    !!!

    --------

    Then I tried switching more settings to utf-8 in __init__.py:

    default_platform_encoding = "utf-8" #"mbcs" # Will it ever ...this?
    default_scintilla_encoding = "utf-8" # Scintilla _only_ supports this ATM

    Pasting around in scintilla then works correctly. But MessageBox then
    shows plain utf-8 encoded chars. Even German umlauts are no longer
    displayable on my machine, and when opening document files with
    above-128 chars, Pythonwin breaks (because the files are not valid
    utf-8 streams, I guess):

    >>> Traceback (most recent call last):
      File "C:\PYTHON23\Lib\site-packages\pythonwin\pywin\scintilla\document.py", line 27, in OnOpenDocument
        text = f.read()
      File "C:\Python23\lib\codecs.py", line 380, in read
        return self.reader.read(size)
      File "C:\Python23\lib\codecs.py", line 253, in read
        return self.decode(self.stream.read(), self.errors)[0]
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 19983: unexpected code byte
    win32ui: OnOpenDocument() virtual handler (<bound method SyntEditDocument.OnOpenDocument of <pywin.framework.editor.color.coloreditor.SyntEditDocument instance at 0x00E356E8>>) raised an exception

    Thus the result is: no combination provides a real improvement so far.
    Wide unicode in the win32 functions is obviously not possible at all.
    I am switching back to the original setup.
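
    The failure on open comes from decoding legacy-encoded files strictly
    as utf-8. One common way around it is a decode with fallback. The
    following is a minimal sketch in modern Python 3, not a patch for
    document.py; the helper name `decode_with_fallback` and the choice of
    cp1252 as the fallback are assumptions made for illustration.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Try each encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep going with replacement characters
    # instead of raising, so the file can still be opened.
    last = encodings[-1]
    return data.decode(last, errors="replace"), last


# 0xa9 alone is not valid UTF-8, but it is the copyright sign in cp1252.
print(decode_with_fallback(b"\xa9 2004 Example"))
# Valid UTF-8 input is still decoded as UTF-8.
print(decode_with_fallback("ça ne va pas".encode("utf-8")))
```

    Trying utf-8 first is a reasonable order because text in a legacy
    8-bit encoding rarely happens to be valid utf-8, while the reverse
    decode almost always "succeeds" and silently garbles the text.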

     
