
#721 Newline characters are lost in Japanese target documents

Milestone: 3.1
Status: closed-fixed
Owner: Yu Tang
Labels: filter (12)
Priority: 5
Updated: 2015-03-11
Created: 2014-11-29
Creator: Yu Tang
Private: No

The source documents (.properties files) contain "\n" in their string resources.
Some or all segments are translated into non-ASCII characters.
Then the target documents are generated.
"\n" sequences preceded by an escaped Unicode character are lost or changed to a space character in the target documents.
See the attached screenshot.
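
To make the shape of such a resource concrete, here is a minimal sketch. It uses the standard java.util.Properties parser purely for illustration, not OmegaT's own Java Resource Bundles filter. The class name PropertiesNewlineSketch, the helper loadValue, and the key "key" are hypothetical. It shows that a "\n" written after an escaped Unicode character is a real line feed once the file is parsed, so a correct target document has to write it back.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: this is NOT OmegaT's filter, just the JDK parser.
final class PropertiesNewlineSketch {

    /** Parses .properties text and returns the value stored under the given key. */
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File text: an escaped hiragana character, an escaped line feed,
        // and another escaped hiragana character (backslashes doubled for Java).
        String fileContent = "key=\\u3042\\n\\u3044";
        String value = loadValue(fileContent, "key");

        // The parsed value has three characters; the middle one is a line feed.
        System.out.println(value.length());
        System.out.println(value.charAt(1) == '\n');
    }
}
```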

1 Attachments

Discussion

  • Didier Briel

    Didier Briel - 2014-11-29

    What happens if you use the ASCII encoding?

    Didier

     
    • Yu Tang

      Yu Tang - 2014-11-29

      It is the same. The encoding is already set to US-ASCII.
      I don't think I changed it from default.

       
  • Didier Briel

    Didier Briel - 2014-11-29

    OK, I was confused by the Yen sign in place of the backslash (\), but I remember now that this is usual on Japanese systems.

    It looks like a bug. This code probably hasn't been touched in a long time.

    Didier

     
  • Yu Tang

    Yu Tang - 2014-11-29

    See attached screenshot for real examples in action.
    These files are taken from OmegaT /trunk.

     
  • Didier Briel

    Didier Briel - 2014-11-29

    I confirm it's probably not a new bug. Judging from the bundles, the behavior was already the same in 2.6 (but I suspect it is much older).

    Didier

     
  • Yu Tang

    Yu Tang - 2014-12-03

    I have good news and bad news.
    The good news is that I found the cause in the source code.
    The bad news is that org.omegat.core.segmentation.Segmenter#glue() seems to be the cause.
    A comment in the source says "For translation to Japanese does not add any spaces."
    So the Java Resource Bundles filter is not the only one affected; all filters are.
    See the screenshot for the HTML filter.

     
    • Didier Briel

      Didier Briel - 2014-12-03

      HTML is not a reliable test case, as carriage returns are supposed to be transformed into spaces.

      Let's move to the technical mailing list (dev-tech), as SourceForge comments are not very practical for discussion.

      Didier

       
      • Yu Tang

        Yu Tang - 2014-12-04

        OK, I'll keep digging into it and post to dev-tech next time.
        Thank you for the guidance.

         
  • Didier Briel

    Didier Briel - 2015-01-09
    • status: open --> accepted
    • assigned_to: Yu Tang
    • Group: 2.6 --> 3.1
     
  • Yu Tang

    Yu Tang - 2015-01-10

    Fixed in /trunk.
    The new implementation leaves all line breaks (both \n and \r) and tabs intact in CJK translations.
    Spaces are removed as before, but spaces after a line break or tab are NOT removed, so that indentation is kept.

    For example:

    Foo. Bar. -> Foo。Bar。 // space removed
    Foo.\n\nBar. -> Foo。\n\nBar。 // line breaks are retained
    List:\n  Item1 -> List:\n  Item1 // spaces after a line break are retained too
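
The rule described in this comment can be sketched roughly as follows. This is only an illustration, not the code committed to /trunk. The class CjkGlueSketch and its methods filterSeparator and glue are hypothetical names, and the sketch assumes that each separator holds only the whitespace found between two source segments.

```java
import java.util.List;

// Illustration only: a hypothetical stand-in, not OmegaT's Segmenter#glue().
final class CjkGlueSketch {

    /**
     * Drops plain spaces up to the first line break or tab, then keeps the
     * rest of the separator unchanged (including any indentation after it).
     * Returns an empty string when the separator has no line break or tab.
     */
    static String filterSeparator(String separator) {
        for (int i = 0; i < separator.length(); i++) {
            char c = separator.charAt(i);
            if (c == '\n' || c == '\r' || c == '\t') {
                return separator.substring(i);
            }
        }
        return "";
    }

    /** Joins translated segments, placing the filtered separator after each one. */
    static String glue(List<String> segments, List<String> separators) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.size(); i++) {
            sb.append(segments.get(i));
            if (i < separators.size()) {
                sb.append(filterSeparator(separators.get(i)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+3002 is the ideographic full stop used in the examples above.
        String a = glue(List.of("Foo\u3002", "Bar\u3002"), List.of(" "));
        String b = glue(List.of("Foo\u3002", "Bar\u3002"), List.of("\n\n"));
        String c = glue(List.of("List:", "Item1"), List.of("\n  "));

        // Show line breaks as a visible backslash-n so the result fits on one line.
        System.out.println(a.replace("\n", "\\n"));
        System.out.println(b.replace("\n", "\\n"));
        System.out.println(c.replace("\n", "\\n"));
    }
}
```

Scanning for the first line break or tab keeps the sketch short. A real implementation would also have to decide how to treat separators that mix other whitespace characters.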
    
     
  • Didier Briel

    Didier Briel - 2015-01-10
    • summary: Newline characters are lost in target document with Java Resource Bundles filter --> Newline characters are lost in Japanese target documents
    • status: accepted --> open-fixed
     
  • Didier Briel

    Didier Briel - 2015-03-11

    Closed in the released version 3.1.9 of OmegaT.

    Didier

     
  • Didier Briel

    Didier Briel - 2015-03-11
    • status: open-fixed --> closed-fixed
     
