#721 Newline characters are lost in Japanese target documents

Milestone: 3.1

Status: closed-fixed

Owner: Yu Tang

Labels: filter (12)

Priority: 5

Updated: 2015-03-11

Created: 2014-11-29

Creator: Yu Tang

Private: No

The source documents (.properties files) contain "\n" in string resources.
Some or all segments are translated into non-ASCII characters.
Then generate the target documents.
"\n" sequences preceded by an escaped Unicode character are lost or changed to a space character in the target documents.
See attached screenshot.
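
One way to check a generated target file for this symptom is to load it with `java.util.Properties` and see whether the value still contains a line break. The following is only a minimal sketch: the class name, the `loadValue` helper, and the `greeting` key are hypothetical and are not taken from OmegaT or from the attached files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesNewlineCheck {

    // Hypothetical helper: parses .properties text and returns the value for one key.
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File text as it should appear on disk:
        // escaped hiragana, an escaped newline, escaped hiragana.
        String expected = "greeting=\\u3042\\n\\u3044\n";
        // File text showing the reported symptom: the escaped newline is gone.
        String broken = "greeting=\\u3042\\u3044\n";

        System.out.println(loadValue(expected, "greeting").contains("\n"));
        System.out.println(loadValue(broken, "greeting").contains("\n"));
    }
}
```

Comparing the loaded values rather than the raw file text sidesteps differences in how the escapes are written on disk.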

1 Attachments

Discussion

Didier Briel - 2014-11-29

What happens if you use the ASCII encoding?

Didier

- Yu Tang - 2014-11-29
  
  It is the same. The encoding is already set to US-ASCII.
  I don't think I changed it from default.
  
  encoding.png
  

Didier Briel - 2014-11-29

OK, I was confused by the Yen sign instead of the \, but I remember now it's usual on Japanese systems.

It looks like a bug. This code probably hasn't been touched in a long time.

Didier


Yu Tang - 2014-11-29

See attached screenshot for real examples in action.
These files are taken from OmegaT /trunk.

realworldexamples.png


Didier Briel - 2014-11-29

I confirm it's probably not a new bug. Judging by the bundles, it already behaved this way in 2.6 (but I suspect it is much older).

Didier


Yu Tang - 2014-12-03

I have good news and bad news.
The good news is that I found the cause in the source code.
The bad news is that org.omegat.core.segmentation.Segmenter#glue() seems to be the cause.
The source comment says "For translation to Japanese does not add any spaces."
So the Java Resource Bundles filter is not the only one affected. All filters are affected.
See the screenshot for the HTML filter.

html-filter.png

- Didier Briel - 2014-12-03
  
  HTML is not reliable, as carriage returns are supposed to be transformed into spaces.
  
  Let's move to the technical mailing list (dev-tech), as SourceForge comments are not very practical for discussion.
  
  Didier
  
  - Yu Tang - 2014-12-04
    
    OK, I'll keep digging into it and post to dev-tech next time.
    Thank you for the guidance.
    

Didier Briel - 2015-01-09

status: open --> accepted

assigned_to: Yu Tang

Group: 2.6 --> 3.1

Yu Tang - 2015-01-10

Fixed in /trunk.
The new implementation leaves all line breaks (both \n and \r) and tabs intact in CJK translations.
Spaces are removed as before, but spaces after a line break or tab are NOT removed, in order to keep indentation.

For example:

Foo. Bar. -> Foo。Bar。 // space removed
Foo.\n\nBar. -> Foo。\n\nBar。 // line break(s) are retained
List:\n Item1 -> List：\n Item1 // spaces after line break are retained too
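
The rule described above can be sketched roughly as follows. This is not the code committed to /trunk. The class and method names are hypothetical, and it assumes that any spaces standing before the first line break or tab in a gap between segments are dropped along with ordinary inter-sentence spaces.

```java
import java.util.Arrays;
import java.util.List;

public class CjkGlueSketch {

    // Decides what to emit between two glued segments for a CJK target.
    // Keeps everything from the first line break or tab onward, so that
    // indentation after a line break survives; otherwise emits nothing.
    static String cjkSeparator(String whitespace) {
        for (int i = 0; i < whitespace.length(); i++) {
            char c = whitespace.charAt(i);
            if (c == '\n' || c == '\r' || c == '\t') {
                return whitespace.substring(i);
            }
        }
        return ""; // plain spaces only: removed, as before the fix
    }

    // Joins translated segments, using the whitespace found between the
    // corresponding source segments (one separator per gap).
    static String glue(List<String> segments, List<String> separators) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.size(); i++) {
            out.append(segments.get(i));
            if (i < separators.size()) {
                out.append(cjkSeparator(separators.get(i)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(glue(Arrays.asList("Foo。", "Bar。"), Arrays.asList(" ")));
        System.out.println(glue(Arrays.asList("Foo。", "Bar。"), Arrays.asList("\n\n")));
        System.out.println(glue(Arrays.asList("List：", "Item1"), Arrays.asList("\n ")));
    }
}
```

The three calls in `main` mirror the three examples given above.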

Didier Briel - 2015-01-10

summary: Newline characters are lost in target document with Java Resource Bundles filter --> Newline characters are lost in Japanese target documents

status: accepted --> open-fixed

Didier Briel - 2015-03-11

Closed in the released version 3.1.9 of OmegaT.

Didier


Didier Briel - 2015-03-11

status: open-fixed --> closed-fixed