#426 Custom define chars for BreakIterator not to break a line

closed-rejected
core (195)
5
2012-03-01
2012-02-20
tvojeho
No

Hi,

I would like to be able to define custom non-whitespace characters that do not cause BreakIterator to break a line.

In my case, when I use jEdit to display text containing a non-ASCII right quotation mark (in Czech usage it is the HTML code “, as I mentioned in http://sourceforge.net/tracker/?func=detail&aid=3488310&group_id=588&atid=100588), the BreakIterator in soft wrap breaks the line and puts the quotation mark alone on the next line.

The different quotation mark practices in various locales, and perhaps other non-ASCII marks as well, need more flexibility to accommodate different language customs.

Regards, tvojeho

Discussion

  • Kazutoshi Satoda

    As for your original problem in #3488310, I did a test with BeanShell
    console (you can find it in Console plugin).

    BeanShell> breaker = java.text.BreakIterator.getLineInstance(new Locale("cz"))
    [checksum=0xfdf522de]
    BeanShell> breaker.setText("„abc“")
    BeanShell> breaker.first()
    0
    BeanShell> breaker.next()
    4
    BeanShell> breaker.next()
    5
    BeanShell> breaker.next()
    -1

    Now I wonder whether the problem is in jEdit or in BreakIterator itself. If
    it is a conventional quotation, why does the BreakIterator (even with a
    specific locale) say there is a line break before it?

    Don't you think the problem is in BreakIterator?
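
For readers who want to repeat the BeanShell check above outside jEdit, a minimal Java sketch follows. The class and method names (`LineBreakProbe`, `boundaries`) are made up for illustration and are not part of jEdit. It uses the language tag `cs` for Czech alongside the root locale; the session above passed `cz`, which is the country code rather than the ISO 639 language code. The reported boundaries depend on the break rules shipped with the JRE, so no expected output is shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreakProbe {
    /** Collects every boundary the line-break iterator reports for the text, in order. */
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getLineInstance(locale);
        breaker.setText(text);
        List<Integer> result = new ArrayList<>();
        // first() returns the start of the text; next() returns DONE (-1) after the last boundary.
        for (int pos = breaker.first(); pos != BreakIterator.DONE; pos = breaker.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // The string „abc“ from the session above, written with escapes to avoid encoding issues.
        String text = "\u201Eabc\u201C";
        System.out.println("cs:   " + boundaries(text, Locale.forLanguageTag("cs")));
        System.out.println("root: " + boundaries(text, Locale.ROOT));
    }
}
```

A boundary reported at index 4 for this string corresponds to the break opportunity before the closing quotation mark that the session above shows.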

     
  • Kazutoshi Satoda

    • assigned_to: nobody --> k_satoda
    • status: open --> pending
     
  • tvojeho

    tvojeho - 2012-02-27
    • status: pending --> open
     
  • tvojeho

    tvojeho - 2012-02-27

    I am not sure. I must say that I only became aware that something like BreakIterator existed when the jEdit behavior changed after the 2012-02-10 daily build and I found your post http://sourceforge.net/tracker/?func=detail&atid=300588&aid=2483695&group_id=588.

    From your Console test, it looks like the issue is in BreakIterator, which as I understand it is a Java problem, not a jEdit one. So I think this FR is meant to make it possible to modify the behaviour of BreakIterator independently of the locale settings, if possible, or perhaps to keep the older line break code as an optional setting for cases such as mine where the new code behaves unexpectedly.

    Regards, tvojeho

     
  • Max Funk

    Max Funk - 2012-02-27

    Perhaps the feature request would be a "Wrap behaviour" combo box in
    "Buffer options" with "Default (current locale)", "Whitespace only", "English", ...
    and passing the selected locale to BreakIterator initialisation?

     
  • Kazutoshi Satoda

    • status: open --> pending
     
  • Kazutoshi Satoda

    While the cause seems to be in the JRE, it is possible to have workarounds
    for such problems in jEdit, especially when the cause seems unlikely to be
    fixed in the near future. And this looks to be the case.
    http://www.google.co.jp/search?q=site%3Abugs.sun.com+BreakIterator+getLineInstance
    The problem seems to happen on a very small set of quotation marks.

    So I put hard-coded workarounds in r21221.

    With those workarounds, the original problem reported in #3488310 no
    longer happens.

    Do you still want more control on user side?

     
  • Max Funk

    Max Funk - 2012-02-27

    r21221 will work for »German« but not «French»

    de.wikipedia.org/wiki/Anführungszeichen#Andere_Sprachen

     
  • Kazutoshi Satoda

    Did you see the problem with «French»? If you just saw the diff, please
    note that the cases written there are those which are wrongly handled by
    the underlying BreakIterator. See more details in the comment there.

    In my local test, BreakIterator gives the right results (no line breaks) for
    «French». If you actually saw the problem with a build after the
    workaround, please report your locale (Locale.getDefault()). It may be
    different from mine, "ja_JP", which could explain the different test result.

     
  • Max Funk

    Max Funk - 2012-02-27

    No. I saw it with German (+ German locale). From testing (without your fix), it seemed to me that the implemented version is one of the most frequent uses, i.e. „German (variant 1)“, “English”, «French». However, the German locale should deliver »German (variant 2)«.
    To make the fix better than the original, one should take into account that these quotation marks have no unified direction (left or right).

     
  • tvojeho

    tvojeho - 2012-02-27

    Thank you for the fix, Kazutoshi. I am waiting to test it in a daily version, but if it behaves as Max predicted - the most frequent uses, i.e. „German/Czech (variant 1)“, “English”, «French», then it will work for me.

    But seeing the various quotation marks used in different locales on the wiki page, I think it would be welcome if the user could adjust the usage individually.

     
  • tvojeho

    tvojeho - 2012-02-27
    • status: pending --> open
     
  • Max Funk

    Max Funk - 2012-02-27

    For all cases in the wikipedia table, I would suggest "never break between
    quotation mark and non-whitespace", that would be

    && !("”’»›“„‘‚«‹".indexOf(prev) >= 0 && !Character.isWhitespace(next))
    && !(!Character.isWhitespace(prev) && "”’»›“„‘‚«‹".indexOf(next) >= 0);

    instead of

    && !("”’»›".indexOf(prev) >= 0 && !Character.isWhitespace(next))
    && !(!Character.isWhitespace(prev) && "“„‘‚«‹".indexOf(next) >= 0);
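
The two clauses Max proposes can be tried in isolation with a small sketch like the one below. The class and method names (`QuoteBreakRule`, `allowsBreak`) are hypothetical, and the surrounding condition in jEdit's r21221 is not reproduced here; only the proposed "never break between quotation mark and non-whitespace" rule is shown, with `prev` and `next` being the characters on either side of a candidate break.

```java
public class QuoteBreakRule {
    // The ten quotation marks from the proposal, in the same order: ” ’ » › “ „ ‘ ‚ « ‹
    // Every mark is treated as possibly opening or closing, since direction differs by language.
    static final String QUOTES =
        "\u201D\u2019\u00BB\u203A\u201C\u201E\u2018\u201A\u00AB\u2039";

    /** Returns false when the proposed rule would veto a break between prev and next. */
    static boolean allowsBreak(char prev, char next) {
        return !(QUOTES.indexOf(prev) >= 0 && !Character.isWhitespace(next))
            && !(!Character.isWhitespace(prev) && QUOTES.indexOf(next) >= 0);
    }

    public static void main(String[] args) {
        System.out.println(allowsBreak('c', '\u201C')); // letter, then “ : vetoed
        System.out.println(allowsBreak('\u00BB', 'G')); // », then letter : vetoed
        System.out.println(allowsBreak(' ', '\u201E')); // space, then „ : allowed
        System.out.println(allowsBreak('a', 'b'));      // no quotation mark involved : allowed
    }
}
```

Because the rule ignores direction, it also vetoes a break on both sides of a quotation mark that sits directly between two non-whitespace characters.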

     
  • tvojeho

    tvojeho - 2012-02-29

    Hi, Kazutoshi,
    I have tested all these »«›’‘"><‹„“” on my locale ("cz") with the 2012-02-28 daily and the line breaking works fine now.

    Thanks, tvojeho

     
  • Kazutoshi Satoda

    I found that the workaround exhibits an unwanted soft wrap with some CJK
    text which has a quote in“…”style without enclosing whitespace.
    I'll add another check for this case; hopefully looking at one more
    next/prev char will be enough. I'll close this request if it succeeds.

     
  • Kazutoshi Satoda

    Another check mentioned in the last comment has been added in r21236.

    Closing as Rejected as the requested feature has now become unnecessary.
    Please submit another bug if you find something wrong. Thanks.

     
  • Kazutoshi Satoda

    • status: open --> closed-rejected
     
