Bugs item #716708, was opened at 2003-04-07 14:43
Message generated for change (Comment added) made by kxroberto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=551954&aid=716708&group_id=78018
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: pythonwin
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: pythonwin editor does strange things with non-english chars
Initial Comment:
Reported by a few people, but to repro:
* Paste "# ça ne va pas" into pywin (or any other string with an extended character)
* Move to the ç
* Press Backspace
Some other character appears. This is probably related to
pywin.idle not being multibyte-character aware.
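The symptom is consistent with a Backspace that deletes one byte rather than one character. A minimal sketch, assuming the buffer holds UTF-8 bytes; `byte_backspace` and `char_backspace` are hypothetical helpers for illustration, not PythonWin functions:

```python
# Hypothetical helpers, not PythonWin code: they contrast a byte-oriented
# Backspace with a character-oriented one on a UTF-8 buffer.

def byte_backspace(buf: bytes) -> bytes:
    """Remove the last byte, as a byte-oriented editor routine would."""
    return buf[:-1]


def char_backspace(buf: bytes, encoding: str = "utf-8") -> bytes:
    """Remove the last whole character by decoding before deleting."""
    text = buf.decode(encoding)
    return text[:-1].encode(encoding)


line = "# ç".encode("utf-8")       # 'ç' occupies two bytes: b'\xc3\xa7'
after_byte = byte_backspace(line)  # b'# \xc3', a dangling lead byte
after_char = char_backspace(line)  # b'# '

print(after_byte.decode("utf-8", errors="replace"))  # '# ' plus U+FFFD
print(after_char.decode("utf-8"))
```

One byte-wise Backspace leaves a stray lead byte that a display may render as "some other character"; a second press removes it, which would match the later reports that Backspace has to be pressed twice.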
----------------------------------------------------------------------
Comment By: Robert Kiendl (kxroberto)
Date: 2006-01-11 17:31
Message:
Logged In: YES
user_id=972995
pywin build 203
From News Message-ID:
<113...@g4...>
Neil Hodgson wrote:
> Robert:
> PythonWin did have some Unicode support, but I think Mark Hammond was
> discouraged by bugs. In pythonwin/__init__.py there is a setting
> is_platform_unicode = 0 with a commented-out real test for Unicode on
> the next line. Change this to 1 and restart and you may see
>
> >>> x = u'sytest3\\\u041f\u043e\u0448\u0443\u043a.txt'
> >>> print x
> sytest3\Пошук.txt
> >>>
Thanks for that hint. But I found that it is still inconsistent or even
buggy: after setting "is_platform_unicode = <auto>", Scintilla displays
some Unicode as you showed, but the win32 functions (e.g. MessageBox)
still do not pass wide Unicode through. And pasting/inserting/parsing in
Scintilla doesn't work correctly:
PythonWin 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32.
Portions Copyright 1994-2004 Mark Hammond (mha...@sk...) - see 'Help/About PythonWin' for further copyright information.
>>> x = u'sytest3\\\u041f\u043e\u0448\u0443\u043a.txt'
>>> print x
sytest3\Пошук.txt
>>> print "sytest3\Пошук.txt"
sytest3\?????.txt
!!!
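One plausible reading of the `?????` result: the pasted Cyrillic text went into a byte string through the ANSI code page, which has no Cyrillic letters. A sketch assuming cp1252 (the usual Western European ANSI code page) and Python's "replace" error handler:

```python
# Assumption: the pasted text is converted to a byte string via cp1252.
# cp1252 has no Cyrillic letters, so each one becomes '?'.
cyrillic = "\u041f\u043e\u0448\u0443\u043a"  # the five letters from the session above

as_ansi = cyrillic.encode("cp1252", errors="replace")
print(as_ansi)  # b'?????'

# The same text survives intact in UTF-8 (two bytes per letter here).
as_utf8 = cyrillic.encode("utf-8")
print(len(cyrillic), len(as_utf8))  # 5 10
```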
--------
Then I tried to use more UTF-8 in __init__.py:
default_platform_encoding = "utf-8" #"mbcs" # Will it ever ...this?
default_scintilla_encoding = "utf-8" # Scintilla _only_ supports this ATM
Pasting around in Scintilla then works correctly. But MessageBox then
shows raw UTF-8-encoded characters. Even German umlauts are no longer
displayable on my machine, and when opening document files with byte
values above 127, PythonWin breaks (because the files are not valid
UTF-8 streams, I guess):
>>> Traceback (most recent call last):
  File "C:\PYTHON23\Lib\site-packages\pythonwin\pywin\scintilla\document.py", line 27, in OnOpenDocument
    text = f.read()
  File "C:\Python23\lib\codecs.py", line 380, in read
    return self.reader.read(size)
  File "C:\Python23\lib\codecs.py", line 253, in read
    return self.decode(self.stream.read(), self.errors)[0]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 19983: unexpected code byte
win32ui: OnOpenDocument() virtual handler (<bound method SyntEditDocument.OnOpenDocument of <pywin.framework.editor.color.coloreditor.SyntEditDocument instance at 0x00E356E8>>) raised an exception
Thus the result is: no combination provides a real improvement so far.
Wide Unicode in win32 functions is obviously not possible at all. I'm
switching back to the original setup.
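The UnicodeDecodeError in the traceback is what strict UTF-8 decoding does with a file saved in a single-byte code page. A minimal sketch of a fallback reader, assuming such files are cp1252; `decode_document` is a hypothetical helper, not the fix PythonWin adopted:

```python
from typing import Tuple


def decode_document(raw: bytes, fallback: str = "cp1252") -> Tuple[str, str]:
    """Decode file bytes as UTF-8, falling back to a single-byte code page.

    Hypothetical helper, not PythonWin's document loader. Returns the text
    and the name of the encoding that succeeded.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Strict UTF-8 rejects bytes such as 0xA9 ('©' in cp1252) that do not
        # start a valid sequence. cp1252 itself leaves five byte values
        # undefined, so this decode can still raise on unusual input.
        return raw.decode(fallback), fallback


print(decode_document("Grüße".encode("utf-8")))  # ('Grüße', 'utf-8')
print(decode_document(b"Copyright \xa9 2004"))   # ('Copyright © 2004', 'cp1252')
```

Trying UTF-8 first is a heuristic: a cp1252 file whose bytes happen to form valid UTF-8 would be decoded as UTF-8.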
----------------------------------------------------------------------
Comment By: Mark Hammond (mhammond)
Date: 2004-04-22 10:48
Message:
Logged In: YES
user_id=14198
Had a couple of reports that this is fixed in build 200 :)
----------------------------------------------------------------------
Comment By: Kanich Vladimir (kanich)
Date: 2004-02-26 08:13
Message:
Logged In: YES
user_id=304670
With the solution described by oleg_noga (replacing scintilla.dll) I
tried to edit a full string of special Slovak and Czech characters (code
page cp1250) in the PythonWin editor with full success. Thank you for the
solution. N.B. In this arrangement, Backspace still has to be pressed
twice.
----------------------------------------------------------------------
Comment By: Oleg Noga (oleg_noga)
Date: 2004-02-25 16:02
Message:
Logged In: YES
user_id=551440
In view.py, KeyDotEvent() always generates ".", but the key pressed can
produce a non-English character instead of "."!
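A sketch of the idea, assuming a hypothetical handler that is told which character the key actually produced (`on_dot_key`, `insert` and `show_completions` are illustrative names, not PythonWin's API): insert what was typed, and only treat it as attribute access when it really is a dot.

```python
from typing import Callable


def on_dot_key(typed: str,
               insert: Callable[[str], None],
               show_completions: Callable[[], None]) -> None:
    """Handle the key that produces '.' on a US layout.

    On a Russian layout the same physical key produces 'ю', so the handler
    inserts whatever character was typed instead of a hard-coded '.'.
    """
    insert(typed)
    if typed == ".":
        # Only a real dot should trigger attribute completion.
        show_completions()


buffer = []
popups = []
on_dot_key(".", buffer.append, lambda: popups.append("completion"))
on_dot_key("ю", buffer.append, lambda: popups.append("completion"))
print("".join(buffer), len(popups))  # .ю 1
```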
----------------------------------------------------------------------
Comment By: Oleg Noga (oleg_noga)
Date: 2004-02-25 15:32
Message:
Logged In: YES
user_id=551440
The bug where the editor does not display characters after a non-English
one (see followup by anadelonbrin, 2003-05-09 10:38) has a solution:
replace scintilla.dll with the latest version. Just download the
Scintilla source from SourceForge, build it, and replace scintilla.dll
with SciLexer.dll (rename SciLexer.dll to scintilla.dll).
This does not solve the bugs with the clipboard, Backspace, and the
Cyrillic small letter yu ("ю").
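The replacement step can be scripted. A minimal sketch, assuming you have already built SciLexer.dll and know the directory holding PythonWin's scintilla.dll (both paths are placeholders you supply); `install_scilexer` is a hypothetical helper and keeps a backup of the original DLL:

```python
import shutil
from pathlib import Path


def install_scilexer(scilexer: Path, pythonwin_dir: Path) -> Path:
    """Copy a freshly built SciLexer.dll over PythonWin's scintilla.dll.

    If scintilla.dll already exists it is first copied to scintilla.dll.bak.
    Returns the path of the installed DLL.
    """
    target = pythonwin_dir / "scintilla.dll"
    if target.exists():
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    # Copying under the new name is the "rename SciLexer.dll" step.
    shutil.copy2(scilexer, target)
    return target


# Example call; both paths are placeholders for your own build and install:
# install_scilexer(Path(r"C:\scintilla\bin\SciLexer.dll"),
#                  Path(r"C:\Python23\Lib\site-packages\pythonwin"))
```

Close PythonWin before running it, since Windows will not let you overwrite a DLL that is currently loaded.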
----------------------------------------------------------------------
Comment By: Kanich Vladimir (kanich)
Date: 2004-01-22 09:33
Message:
Logged In: YES
user_id=304670
I have the same problem with encoding 'cp1250' (the default encoding in
SK/CZ). Without changing the __init__.py file (as recommended by hweber),
the editor swallows half of the characters with char codes above 127. It
seems that the editor routine used is byte-oriented and not Unicode-aware.
Setting the variable is_platform_unicode = 0 also helps for encoding
'cp1250'.
----------------------------------------------------------------------
Comment By: Henrik Weber (hweber)
Date: 2004-01-07 12:17
Message:
Logged In: YES
user_id=121229
This behaviour can be worked around by switching off Unicode support.
The easiest way to do this is in the __init__.py file. Replace the line
beginning with is_platform_unicode with this:
is_platform_unicode = 0
With this, PythonWin works normally, at least with the German umlauts
(I haven't tried the other ones).
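The edit can also be applied with a short script. A minimal sketch, assuming you pass the path of the __init__.py file that defines is_platform_unicode at the start of a line; `disable_platform_unicode` is a hypothetical helper, and it saves a copy of the original file as `__init__.py.orig`:

```python
import re
from pathlib import Path


def disable_platform_unicode(init_py: Path) -> bool:
    """Replace the first line starting with is_platform_unicode by '= 0'.

    Returns True if such a line was found. Commented-out lines (starting
    with '#') are left alone because the pattern is anchored at column 0.
    """
    text = init_py.read_text()
    new_text, count = re.subn(r"(?m)^is_platform_unicode\b.*$",
                              "is_platform_unicode = 0", text, count=1)
    if count:
        init_py.with_name(init_py.name + ".orig").write_text(text)
        init_py.write_text(new_text)
    return bool(count)
```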
----------------------------------------------------------------------
Comment By: Gilles Lenfant (glenfant)
Date: 2003-11-28 15:53
Message:
Logged In: YES
user_id=122383
Same strange charset behaviour when copying text (with non-ASCII chars)
from PythonWin to a regular text editor (Notepad or any other).
It seems that PythonWin copies UTF-8 to the clipboard rather than the
"natural" default encoding (cp1252 for Western European languages).
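What Notepad shows is consistent with that guess: UTF-8 bytes read back as cp1252 turn each accented letter into two characters. A small demonstration with plain Python codecs (nothing PythonWin-specific):

```python
text = "questa è bella"

# Clipboard data written as UTF-8 but read by a cp1252 application.
as_seen_in_notepad = text.encode("utf-8").decode("cp1252")
print(as_seen_in_notepad)  # questa Ã¨ bella

# Reversing the mistake recovers the original text.
print(as_seen_in_notepad.encode("cp1252").decode("utf-8"))  # questa è bella
```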
----------------------------------------------------------------------
Comment By: Oleg Noga (oleg_noga)
Date: 2003-08-18 16:08
Message:
Logged In: YES
user_id=551440
So there are 3 bugs:
------------
First bug (this report, 716708):
Backspace works incorrectly (Del works OK): I must press Backspace twice
to remove a non-English character. This bug is present in every version
of PythonWin from 1.46 on, and very probably in earlier versions as well.
------------
Second bug (see followup by anadelonbrin, 2003-05-09 10:38):
The editor does not display characters after a non-English one.
Another way to reproduce it:
- Open a .py file with non-English strings encoded in the system default
encoding. Look at the non-English strings in the editor: it does not
display the string ends! Now move the cursor to the beginning of a string
and move it with the right-arrow key. The cursor gets stuck at the end of
the visible string and stays there for as many key presses as there are
invisible characters. Move the cursor to the invisible end of the string.
Now type English characters: qwertyu... The invisible end of the string
appears, but the newly typed characters (qwertyu...) are still not
visible. Save the file and look at it in Notepad: the document data is OK.
So this bug is in how PythonWin displays strings in the editor; it does
not affect the document data.
There was no bug like this in version 1.48, which we still use, but it is
present in later versions (1.50 ... 1.55).
--------------
Third bug (seems unreported yet):
I can't type the \xFE character in the editor. It is the "ю" letter
(Cyrillic small letter yu). A dot is typed instead of this character.
Interestingly, the dot is on the same keyboard key as the Cyrillic small
letter yu. The editor opens files with this character, displays them
correctly, and deletes and copies them OK.
This bug is present in every version of PythonWin from 1.46 on, and very
probably in earlier versions as well.
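The byte \xFE only means "ю" in the Cyrillic code page. A quick check with Python's codecs, assuming the reporter's system code page is cp1251:

```python
# The same byte decodes to different letters depending on the code page.
byte = b"\xfe"
print(byte.decode("cp1251"))  # ю (Cyrillic small letter yu, U+044E)
print(byte.decode("cp1252"))  # þ (Latin small letter thorn, U+00FE)
```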
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2003-05-09 09:38
Message:
Logged In: YES
user_id=552329
As another example:
In IDLE (0.8):
>>> u = u'questa \xe8 bella'
>>> u
u'questa \xe8 bella'
>>> print u
questa è bella
In PythonWin (152):
>>> u = u'questa \xe8 bella'
>>> u
u'questa \xe8 bella'
>>> print u
questa è bell
(Note that if you copy the result of 'print u', or type something after
it, the 'a' appears. *Very* confusing!)
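One hypothesis that reproduces this output exactly (it is not confirmed anywhere in this thread): the display is handed the length in characters while the buffer holds UTF-8 bytes, so it stops one byte short for every two-byte character. A minimal sketch:

```python
s = "questa \xe8 bella"
data = s.encode("utf-8")

print(len(s), len(data))  # 14 15: 'è' takes two bytes in UTF-8

# Showing only len(s) bytes drops one trailing ASCII character per extra
# byte, which matches the "bell" output above.
shown = data[: len(s)].decode("utf-8")
print(shown)  # questa è bell
```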
----------------------------------------------------------------------