#426 Custom define chars for BreakIterator not to break a line

closed-rejected
core (195)
5
2012-03-01
2012-02-20
tvojeho
No

Hi,

I would like to be able to custom define non-whitespace marks that do not cause BreakIterator to break a line.

In my case, when I use jEdit to display text containing a non-ASCII right quotation mark (in Czech usage this is the character “, as I mentioned in http://sourceforge.net/tracker/?func=detail&aid=3488310&group_id=588&atid=100588), the BreakIterator in soft wrap breaks the line and puts the quotation mark alone on the next line.

Quotation mark conventions, and perhaps the use of other non-ASCII marks, differ between locales, so more flexibility is needed to accommodate different language customs.

Regards, tvojeho

Discussion

  • As for your original problem in #3488310, I ran a test in the BeanShell
    console (you can find it in the Console plugin).

    BeanShell> breaker = java.text.BreakIterator.getLineInstance(new Locale("cz"))
    [checksum=0xfdf522de]
    BeanShell> breaker.setText("„abc“")
    BeanShell> breaker.first()
    0
    BeanShell> breaker.next()
    4
    BeanShell> breaker.next()
    5
    BeanShell> breaker.next()
    -1

    Now I wonder whether the problem is in jEdit or in BreakIterator itself. If
    it is a conventional quotation mark, why does BreakIterator (even with a
    specific locale) say there is a line break before it?

    Don't you think the problem is in BreakIterator?

     
    • assigned_to: nobody --> k_satoda
    • status: open --> pending
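
For readers without the Console plugin, the BeanShell session above can be approximated as a standalone Java program. This is only a sketch: the class name `LineBreakProbe` and the helper `boundaries` are hypothetical, and the reported boundaries may differ between JRE versions, so no output is claimed here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of jEdit.
class LineBreakProbe {
    /** Returns every boundary the line BreakIterator reports for text, in order. */
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getLineInstance(locale);
        breaker.setText(text);
        List<Integer> result = new ArrayList<>();
        // first() yields the start offset; next() yields DONE (-1) after the last boundary.
        for (int pos = breaker.first(); pos != BreakIterator.DONE; pos = breaker.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "\u201Eabc\u201C"; // the quoted sample from the session: „abc“
        // The session used "cz"; the ISO 639 language code for Czech is "cs",
        // so both are probed here for comparison.
        System.out.println("cs: " + boundaries(text, Locale.forLanguageTag("cs")));
        System.out.println("cz: " + boundaries(text, Locale.forLanguageTag("cz")));
    }
}
```

A boundary reported between `c` and the closing quote (offset 4 in the session above) is the break that moves the quotation mark onto its own line.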
     
  • tvojeho
    2012-02-27

    • status: pending --> open
     
  • tvojeho
    2012-02-27

    I am not sure. I must say that I only became aware that something like BreakIterator existed when jEdit's behavior changed after the 2012-02-10 daily build and I found your post http://sourceforge.net/tracker/?func=detail&atid=300588&aid=2483695&group_id=588.

    From your Console test, it looks like the issue is in BreakIterator, which as I understand it is a Java problem, not a jEdit one. So this FR is meant to make it possible to modify the behaviour of BreakIterator independently of the locale settings, if possible, or perhaps to keep the older line-break code as an optional setting for cases such as mine where the new code behaves unexpectedly.

    Regards, tvojeho

     
  • Max Funk
    2012-02-27

    Perhaps the feature request could be a "Wrap behaviour" combo box in
    "Buffer options" with "Default (current locale)", "Whitespace only", "English", ...,
    with the selected locale passed to the BreakIterator initialisation?

     
    • status: open --> pending
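
The "Wrap behaviour" option suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not jEdit code: the enum `WrapBehaviour`, its constants, and `breakOffsets` are hypothetical names, and "Whitespace only" is interpreted here as "a wrap is allowed only where whitespace is followed by non-whitespace".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical option type; jEdit has no such setting.
enum WrapBehaviour {
    DEFAULT_LOCALE, WHITESPACE_ONLY, ENGLISH;

    /** Offsets strictly inside text where a soft wrap would be allowed. */
    List<Integer> breakOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        if (this == WHITESPACE_ONLY) {
            // Assumed meaning: break only where whitespace is followed by non-whitespace.
            for (int i = 1; i < text.length(); i++) {
                if (Character.isWhitespace(text.charAt(i - 1))
                        && !Character.isWhitespace(text.charAt(i))) {
                    offsets.add(i);
                }
            }
            return offsets;
        }
        // The other choices only select which locale is handed to BreakIterator.
        Locale locale = (this == ENGLISH) ? Locale.ENGLISH : Locale.getDefault();
        BreakIterator breaker = BreakIterator.getLineInstance(locale);
        breaker.setText(text);
        for (int pos = breaker.first(); pos != BreakIterator.DONE; pos = breaker.next()) {
            if (pos > 0 && pos < text.length()) {
                offsets.add(pos);
            }
        }
        return offsets;
    }
}
```

With `WHITESPACE_ONLY`, a quotation mark can never be separated from the word it touches, whatever BreakIterator reports for the current locale.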
     
  • While the cause seems to be in the JRE, it is possible to work around
    such problems in jEdit, especially when the cause is unlikely to be
    fixed in the near future. That looks to be the case here:
    http://www.google.co.jp/search?q=site%3Abugs.sun.com+BreakIterator+getLineInstance
    The problem seems to affect only a very small set of quotation marks.

    So I put hard-coded workarounds in r21221.

    With those workarounds, the original problem reported in #3488310 no
    longer happens.

    Do you still want more control on the user side?

     
  • Max Funk
    2012-02-27

    r21221 will work for »German« but not «French»

    de.wikipedia.org/wiki/Anführungszeichen#Andere_Sprachen

     
  • Did you see the problem with «French»? If you only looked at the diff, please
    note that the cases written there are those that the underlying
    BreakIterator handles wrongly. See the comment there for more details.

    In my local test, BreakIterator gives the right results (no line breaks) for
    «French». If you actually saw the problem with a build that includes the
    workaround, please report your locale (Locale.getDefault()). It may
    differ from mine, "ja_JP", which could explain the different test result.

     
  • Max Funk
    2012-02-27

    No. I saw it with German (+ German locale). From testing (without your fix), it seemed to me that the implemented version covers the most frequent uses, i.e. „German (variant 1)“, “English”, «French». However, the German locale should deliver »German (variant 2)«.
    To make the fix better than the original, one should take into account that these quotation marks have no unified direction (left or right).

     
  • tvojeho
    2012-02-27

    Thank you for the fix, Kazutoshi. I am waiting to test it in a daily version, but if it behaves as Max predicted - the most frequent uses, i.e. „German/Czech (variant 1)“, “English”, «French», then it will work for me.

    But seeing the various quotation marks used in different locales on the wiki page, I think it would be welcome if users could adjust the usage individually.

     
  • tvojeho
    2012-02-27

    • status: pending --> open
     
  • Max Funk
    2012-02-27

    For all cases in the Wikipedia table, I would suggest "never break between
    a quotation mark and non-whitespace", which would be

    && !("”’»›“„‘‚«‹".indexOf(prev) >= 0 && !Character.isWhitespace(next))
    && !(!Character.isWhitespace(prev) && "”’»›“„‘‚«‹".indexOf(next) >= 0);

    instead of

    && !("”’»›".indexOf(prev) >= 0 && !Character.isWhitespace(next))
    && !(!Character.isWhitespace(prev) && "“„‘‚«‹".indexOf(next) >= 0);
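
To show how the suggested condition behaves outside jEdit, here is a self-contained sketch that applies it as a filter over the boundaries reported by BreakIterator. The class `QuoteAwareBreaks` and its methods are hypothetical names; the actual code in r21221 is not reproduced here and may be structured differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "never break between quote and non-whitespace" rule.
class QuoteAwareBreaks {
    // The ten marks from the suggestion, in the same order: ” ’ » › “ „ ‘ ‚ « ‹
    static final String QUOTES =
        "\u201D\u2019\u00BB\u203A\u201C\u201E\u2018\u201A\u00AB\u2039";

    /** True if a break between prev and next would split a quote from adjacent non-whitespace. */
    static boolean gluedByQuote(char prev, char next) {
        return (QUOTES.indexOf(prev) >= 0 && !Character.isWhitespace(next))
            || (!Character.isWhitespace(prev) && QUOTES.indexOf(next) >= 0);
    }

    /** BreakIterator boundaries strictly inside text, minus those rejected by the rule. */
    static List<Integer> breakOffsets(String text) {
        BreakIterator breaker = BreakIterator.getLineInstance();
        breaker.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = breaker.first(); pos != BreakIterator.DONE; pos = breaker.next()) {
            if (pos <= 0 || pos >= text.length()) {
                continue; // start and end of text are not wrap positions
            }
            if (!gluedByQuote(text.charAt(pos - 1), text.charAt(pos))) {
                offsets.add(pos);
            }
        }
        return offsets;
    }
}
```

Because the rule ignores the direction of the mark, it covers both »German« and «French» usage. It also suppresses every break next to a quote in text written without spaces, which matters for CJK text.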

     
  • tvojeho
    2012-02-29

    Hi, Kazutoshi,
    I have tested all of these »«›’‘"><‹„“” in my locale ("cz") with the 2012-02-28 daily build, and the line breaking works fine now.

    Thanks, tvojeho

     
  • I found that the workaround causes an unwanted soft wrap in some CJK
    text that contains a quote in “…” style without surrounding whitespace.
    I'll add another check for this case; hopefully looking at one more
    next/prev char will be enough. I'll close this request if that succeeds.

     
  • The additional check mentioned in the last comment has been added in r21236.

    Closing as Rejected, since the requested feature is no longer necessary.
    Please submit another bug if you find something wrong. Thanks.

     
    • status: open --> closed-rejected