[pywin32-bugs] [ pywin32-Bugs-716708 ] pythonwin editor does strange things with non-english chars
OLD project page for the Python extensions for Windows
From: SourceForge.net <no...@so...> - 2006-01-11 16:31:22
Bugs item #716708, was opened at 2003-04-07 14:43
Message generated for change (Comment added) made by kxroberto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=551954&aid=716708&group_id=78018

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: pythonwin
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: pythonwin editor does strange things with non-english chars

Initial Comment:
Reported by a few people, but to repro:
* Paste "# ça ne va pas" into pywin (or any other string with extended char)
* Move to the ç
* Press Backspace

Some other character appears. This will be related to pywin.idle not being multi-character aware.

----------------------------------------------------------------------

Comment By: Robert Kiendl (kxroberto)
Date: 2006-01-11 17:31

Message:
Logged In: YES
user_id=972995

pywin build 203

From News Message-ID: <113...@g4...>

Neil Hodgson wrote:
> Robert:
> PythonWin did have some Unicode support but I think Mark Hammond was
> discouraged by bugs. In pythonwin/__init__.py there is a setting
> is_platform_unicode = 0 with a commented out real test for Unicode on
> the next line. Change this to 1 and restart and you may see
>
> >>> x = u'sytest3\\\u041f\u043e\u0448\u0443\u043a.txt'
> >>> print x
> sytest3\Пошук.txt
> >>>

Thanks for that hint. But I found that it is still not consistent, or even buggy:

After "is_platform_unicode = <auto>", scintilla displays some unicode as you showed, but the win32 functions (e.g. MessageBox) still do not pass through wide unicode. And pasting/inserting/parsing in scintilla doesn't work correctly:

PythonWin 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32.
Portions Copyright 1994-2004 Mark Hammond (mha...@sk...)
- see 'Help/About PythonWin' for further copyright information.
>>> x = u'sytest3\\\u041f\u043e\u0448\u0443\u043a.txt'
>>> print x
sytest3\Пошук.txt
>>> print "sytest3\Пошук.txt"
sytest3\?????.txt

!!!

--------

Then I tried in __init__.py to do more utf-8:

default_platform_encoding = "utf-8" #"mbcs" # Will it ever ...this?
default_scintilla_encoding = "utf-8" # Scintilla _only_ supports this ATM

Pasting around in scintilla then works correctly. But MessageBox then shows plain utf-8 encoded chars. Even German umlauts are not displayable any more on my machine, and when opening document files with above-128 chars, PythonWin breaks (because the files are not valid utf-8 streams, I guess):

>>> Traceback (most recent call last):
  File "C:\PYTHON23\Lib\site-packages\pythonwin\pywin\scintilla\document.py", line 27, in OnOpenDocument
    text = f.read()
  File "C:\Python23\lib\codecs.py", line 380, in read
    return self.reader.read(size)
  File "C:\Python23\lib\codecs.py", line 253, in read
    return self.decode(self.stream.read(), self.errors)[0]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 19983: unexpected code byte
win32ui: OnOpenDocument() virtual handler (<bound method SyntEditDocument.OnOpenDocument of <pywin.framework.editor.color.coloreditor.SyntEditDocument instance at 0x00E356E8>>) raised an exception

Thus the result is: no combination provides a real improvement so far. Wide unicode in win32 functions is obviously not possible at all. I am switching back to the original setup.
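
The traceback above is what a strict UTF-8 decode does with a file that was saved in a legacy code page (byte 0xa9 is a continuation byte in UTF-8 and cannot start a character). A minimal sketch of the usual workaround, trying UTF-8 first and falling back to a code page, is given below. It is written for current Python 3, not for the Python 2.3 used in the thread, and it is not PythonWin's actual code: the helper name decode_with_fallback and the choice of cp1252 as the fallback are assumptions made for illustration.

```python
# Sketch only: decode_with_fallback is a hypothetical helper, not part of
# PythonWin. The fallback order (utf-8, then cp1252) is an assumption.

def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_used) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte value to a character, so this cannot fail.
    return data.decode("latin-1"), "latin-1"


# Bytes as a cp1252 editor would have saved them: 0xe9 is é, 0xa9 is ©.
legacy = b"caf\xe9 \xa9"
text, used = decode_with_fallback(legacy)   # strict UTF-8 fails, cp1252 succeeds

# Valid UTF-8 input is still decoded as UTF-8.
text2, used2 = decode_with_fallback("ça".encode("utf-8"))
```

A fallback like this only guesses: a cp1252 file whose bytes happen to form valid UTF-8 would be decoded as UTF-8, so an explicit per-file encoding setting is more reliable where one is available.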
----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2004-04-22 10:48

Message:
Logged In: YES
user_id=14198

Had a couple of reports that this is fixed in build 200 :)

----------------------------------------------------------------------

Comment By: Kanich Vladimir (kanich)
Date: 2004-02-26 08:13

Message:
Logged In: YES
user_id=304670

With the solution described by oleg_noga (replacing scintilla.dll) I tried to edit a full string of special Slovak and Czech characters (codepage cp1250) in the PythonWin editor, with full success. Thank you for the solution.
N.B. With this arrangement, Backspace requires pressing the BACKSPACE key twice.

----------------------------------------------------------------------

Comment By: Oleg Noga (oleg_noga)
Date: 2004-02-25 16:02

Message:
Logged In: YES
user_id=551440

view.py, KeyDotEvent() always generates "." but it can be a non-English character instead of "."!

----------------------------------------------------------------------

Comment By: Oleg Noga (oleg_noga)
Date: 2004-02-25 15:32

Message:
Logged In: YES
user_id=551440

One bug has a solution: editor does not display characters after non-English ones (see followup by anadelonbrin, 2003-05-09 10:38)

The solution is to replace scintilla.dll with the latest version. Just download the scintilla source from SourceForge, build it and replace scintilla.dll with SciLexer.dll (rename SciLexer.dll to scintilla.dll).

It does not solve the bugs with the clipboard, backspace and the Cyrillic small "u" ("ю") letter.

----------------------------------------------------------------------

Comment By: Kanich Vladimir (kanich)
Date: 2004-01-22 09:33

Message:
Logged In: YES
user_id=304670

I have the same problem with encoding 'cp1250' (default encoding in sk/cz). Without changing the __init__.py file (as recommended by hweber), the editor swallows half of the characters with char codes above 127. It seems that the editor routine used is byte-oriented and not UNICODE.
Setting the variable is_platform_unicode = 0 also helps for encoding 'cp1250'.

----------------------------------------------------------------------

Comment By: Henrik Weber (hweber)
Date: 2004-01-07 12:17

Message:
Logged In: YES
user_id=121229

This behaviour can be worked around by switching off unicode support. The easiest way to do this is in the __init__.py file. Replace the line beginning with is_platform_unicode by this:

is_platform_unicode = 0

With this, pythonwin works normally, at least with the German umlauts (I haven't tried the other ones).

----------------------------------------------------------------------

Comment By: Gilles Lenfant (glenfant)
Date: 2003-11-28 15:53

Message:
Logged In: YES
user_id=122383

Same strange charset behaviour when copying text (with non-ASCII chars) from pythonwin to a regular text editor (notepad or any other). It seems that pythonwin copies utf-8 to the clipboard rather than the "natural" default encoding (cp1252 for western European languages).

----------------------------------------------------------------------

Comment By: Oleg Noga (oleg_noga)
Date: 2003-08-18 16:08

Message:
Logged In: YES
user_id=551440

So there are 3 bugs:

------------
First bug (this report, 716708):
Backspace works incorrectly (Del works OK). I must press backspace two times to remove a non-English character.

This bug is present in every version of pythonwin from 1.46 and, very probably, in the earlier versions.
------------
Second bug (see followup by anadelonbrin, 2003-05-09 10:38):
The editor does not display characters after non-English ones.

Another way to reproduce:
- Open a py file with non-English strings, encoded with the system default encoding. Look at the non-English strings in the editor: the editor does not display the string ends!
Now move the cursor to the beginning of a string and move it with the right arrow key. The cursor sticks at the end of the string and stays there for as many key presses as there are invisible characters. Move the cursor to the invisible end of the string. Now type English characters: qwertyu... The invisible end of the string will appear, but the newly typed characters (qwertyu...) are still not visible.
Save the file. Look at it in notepad. The document data is OK. So this bug is in how pythonwin displays strings in the editor; it does not affect document data.

There was no bug like this in version 1.48, which we still use. But it is present in later versions (1.50 ... 1.55).

--------------
Third bug (seems unreported yet):
Can't type the \xFE character in the editor. It is the "ю" (Cyrillic small "u") letter. A dot is typed instead of this character. Interestingly, the dot is on the same keyboard key as the Cyrillic small "u" letter.
The editor opens files with this character, displays them correctly, and removes and copies them OK.

This bug is present in every version of pythonwin from 1.46 and, very probably, in the earlier versions.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-05-09 09:38

Message:
Logged In: YES
user_id=552329

As another example:

In IDLE (0.8):
>>> u = u'questa \xe8 bella'
>>> u
u'questa \xe8 bella'
>>> print u
questa è bella

In PythonWin (152):
>>> u = u'questa \xe8 bella'
>>> u
u'questa \xe8 bella'
>>> print u
questa è bell

(note that if you copy the result of the 'print u', or type something after it, the a appears. *very* confusing!)
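
Several comments in this thread point at the same likely mechanism: the editor buffer holds multi-byte (UTF-8) text while some operations count bytes, not characters. The sketch below shows how that would produce the reported symptoms (a stray character after one Backspace, two Backspaces needed per character, and odd text when the clipboard is read as cp1252). It is an illustration in current Python 3 under that assumption, not PythonWin or Scintilla code, and both helper functions are hypothetical.

```python
# Illustration only: these helpers are hypothetical and do not come from
# PythonWin or Scintilla. The cursor is assumed to sit at the end of the buffer.

def backspace_bytewise(buf):
    """Delete a single byte before the cursor (a byte-oriented editor)."""
    return buf[:-1]


def backspace_charwise(buf, encoding="utf-8"):
    """Delete one whole character before the cursor, however many bytes it uses."""
    return buf.decode(encoding)[:-1].encode(encoding)


# The text up to and including the ç from the original report.
before_cursor = "# ç".encode("utf-8")         # b'# \xc3\xa7': ç is two bytes

# One byte-wise Backspace leaves the lead byte 0xC3 behind ...
half_deleted = backspace_bytewise(before_cursor)      # b'# \xc3'
# ... which a cp1252 display would render as a different character.
shown = half_deleted.decode("cp1252")                  # '# Ã'

# A second byte-wise Backspace finally removes the character.
fully_deleted = backspace_bytewise(half_deleted)       # b'# '

# A character-aware Backspace gets there in one step.
one_step = backspace_charwise(before_cursor)           # b'# '

# UTF-8 bytes on the clipboard, read by an application expecting cp1252.
pasted = "ça".encode("utf-8").decode("cp1252")         # 'Ã§a'
```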