From: Kazutoshi S. <k_s...@f2...> - 2012-08-25 07:06:03
Hi Mike,

First, please don't start a new topic by replying to another thread.
You can see the problem here: your new topic is shown deep inside
another, unrelated thread, and is hard to find.
http://jedit.9.n6.nabble.com/jEdit-users-Set-and-move-mark-for-selections-tp5000244p5000369.html

maxwell wrote:
> I've been using jEdit for several years now, and recently I'm noticing a
> changed behavior when editing Unicode (UTF-8) files. Specifically, the
> cursor used to treat combining characters (like accent marks--U+0301
> "Combining Acute Accent", for example) as separate characters.
(snip)
> Now only the separate deletion works, and that only for
> the <Backspace> key (assigned to "Delete Previous Character", not the
> <Delete> key (assigned to "Delete Next Character").
(snip)
> This incorrect behavior occurs using jEdit v4.5.1, Java 1.6.0_33, and
> Windows 7. The behavior is correct (the combining characters are treated
> correctly, as individual characters) in jEdit v4.5pre1 under Linux (KDE).

The behavior was intentionally changed as a bug fix, and the new
behavior was released in jEdit 4.5.1.
http://www.jedit.org/CHANGES45.txt
> - Made basic edit operations aware of characters above BMP and combining
> character sequences.
> (SF.net bug #3040720 - Kazutoshi Satoda)
http://jedit.svn.sourceforge.net/jedit/?view=rev&rev=21259
http://sourceforge.net/support/tracker.php?aid=3040720

I thought the new behavior was the most general one among the possible
behaviors. It was taken from an example in Unicode Standard Annex #29.
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
> For example, on a given system the backspace key might delete by code
> point, while the delete key may delete an entire cluster.

If it turns out that another behavior is also in general demand, an
option to select the behavior may be a valid feature request. But I am
not familiar with how combining acute and similar characters are used.
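
To make the UAX #29 example quoted above concrete, here is a minimal Java sketch of the two deletion styles: Backspace removing a single code point, and Delete removing a whole cluster. This is not jEdit's actual code. The class and method names are hypothetical, and it assumes that java.text.BreakIterator's character instance is a good enough approximation of grapheme cluster boundaries.

```java
import java.text.BreakIterator;

// Hypothetical illustration only; not taken from the jEdit sources.
class ClusterDeleteSketch {
    // Backspace style: remove only the last code point before the caret.
    static String deleteCodePointBefore(String text, int caret) {
        if (caret <= 0) {
            return text;
        }
        // Step back one code point (two chars for a surrogate pair).
        int start = text.offsetByCodePoints(caret, -1);
        return text.substring(0, start) + text.substring(caret);
    }

    // Delete style: remove the whole cluster that starts at the caret.
    static String deleteClusterAfter(String text, int caret) {
        if (caret >= text.length()) {
            return text;
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int end = it.following(caret);
        if (end == BreakIterator.DONE) {
            end = text.length();
        }
        return text.substring(0, caret) + text.substring(end);
    }

    public static void main(String[] args) {
        // 'x', 'a' + combining acute (U+0301), 'y'
        String text = "xa\u0301y";
        // Caret just after the accent: Backspace removes the accent only.
        System.out.println(deleteCodePointBefore(text, 3).equals("xay"));
        // Caret just before 'a': Delete removes 'a' and its accent together.
        System.out.println(deleteClusterAfter(text, 1).equals("xy"));
    }
}
```

With the caret after the accent, the Backspace-style operation leaves the base 'a' in place, while the Delete-style operation at the start of the sequence removes base and accent together. That asymmetry is the behavior described in the quoted report.
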
Please provide some references if you post a feature request.

For now, if you really want to delete only the base character of a
combining sequence, a macro that operates directly on the buffer may be
an option. It can be assigned to the [Delete] key, for example.

I think it is a bug that "Display Character Code" shows only the code of
the base character (or high surrogate). Would it be OK if it showed all
the codes of the combining sequence? For example, 'a' + combining acute
would be shown as "int=[97,769] hex=[61,301]".

--
k_satoda
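
As an illustration of the proposed "Display Character Code" output, the following Java sketch formats every code point of a sequence in the "int=[...] hex=[...]" form mentioned above. It is only a sketch: the class and method names are hypothetical, it does not reflect how jEdit implements the action, and it assumes Java 8 or later for String.codePoints().

```java
import java.util.StringJoiner;

// Hypothetical illustration only; not taken from the jEdit sources.
class CharCodeSketch {
    // Lists all code points of the given sequence in decimal and hex.
    static String describe(String cluster) {
        StringJoiner dec = new StringJoiner(",", "[", "]");
        StringJoiner hex = new StringJoiner(",", "[", "]");
        // codePoints() yields one int per code point, so a surrogate pair
        // is reported as a single value rather than two halves.
        cluster.codePoints().forEach(cp -> {
            dec.add(Integer.toString(cp));
            hex.add(Integer.toHexString(cp));
        });
        return "int=" + dec + " hex=" + hex;
    }

    public static void main(String[] args) {
        // 'a' followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(describe("a\u0301")); // int=[97,769] hex=[61,301]
    }
}
```

Iterating by code point rather than by char also covers the high-surrogate case: a character above the BMP is shown as one code instead of only its first half.
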