OmegaT's bundle properties filter handles source and target files as US-ASCII.
According to the Java 8 specification, Java properties files should be encoded in ISO-8859-1;
from Java 9 onward, the default is UTF-8.
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/PropertyResourceBundle.html
OmegaT can raise MalformedInputException when opening a properties file that contains non-ASCII Latin characters.
From the API note in the Java specification:
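A minimal sketch of how this kind of exception can arise, assuming the input is decoded with a strict US-ASCII decoder. This is not OmegaT's actual filter code; the class name `StrictDecodeDemo` and the helper `decodeStrict` are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of OmegaT.
class StrictDecodeDemo {

    // Decodes bytes strictly: invalid input raises an exception
    // instead of being silently replaced with U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // A one-line properties file saved as ISO-8859-1; the accented
        // letter is stored as the single byte 0xE9.
        byte[] latin1 = "greeting=caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);

        try {
            decodeStrict(latin1, StandardCharsets.US_ASCII);
            System.out.println("decoded without error");
        } catch (MalformedInputException e) {
            // Bytes above 0x7F are not valid US-ASCII.
            System.out.println("US-ASCII decoding failed: " + e);
        }

        // The same bytes decode cleanly when the matching charset is used.
        System.out.println(decodeStrict(latin1, StandardCharsets.ISO_8859_1).trim());
    }
}
```

The same bytes are also rejected by a strict UTF-8 decoder, because 0xE9 followed by a line feed is not a valid UTF-8 sequence.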
PropertyResourceBundle can be constructed either from an InputStream or a Reader, which represents a property file. Constructing a PropertyResourceBundle instance from an InputStream requires that the input stream be encoded in UTF-8. By default, if a MalformedInputException or an UnmappableCharacterException occurs on reading the input stream, then the PropertyResourceBundle instance resets to the state before the exception, re-reads the input stream in ISO-8859-1, and continues reading. If the system property java.util.PropertyResourceBundle.encoding is set to either "ISO-8859-1" or "UTF-8", the input stream is solely read in that encoding, and throws the exception if it encounters an invalid sequence. If "ISO-8859-1" is specified, characters that cannot be represented in ISO-8859-1 encoding must be represented by Unicode Escapes as defined in section 3.3 of The Java™ Language Specification whereas the other constructor which takes a Reader does not have that limitation. Other encoding values are ignored for this system property. The system property is read and evaluated when initializing this class. Changing or removing the property has no effect after the initialization.
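The fallback described in the API note can be approximated as follows. This is a simplified sketch, not the JDK's or OmegaT's implementation: it decodes the whole byte array at once instead of re-reading a stream, and the names `PropertiesFallbackSketch`, `decodeWithFallback`, and `load` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class, not part of OmegaT or the JDK.
class PropertiesFallbackSketch {

    // Tries strict UTF-8 first; on a malformed or unmappable sequence,
    // decodes the same bytes as ISO-8859-1 instead.
    static String decodeWithFallback(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Every byte sequence is valid ISO-8859-1, so this cannot fail.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    // Loads properties through a Reader, so the text is used as already
    // decoded; Unicode escapes such as \u00e9 are still honored.
    static Properties load(byte[] bytes) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(decodeWithFallback(bytes)));
        return props;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "greeting=caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "greeting=caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(load(utf8).getProperty("greeting"));
        System.out.println(load(latin1).getProperty("greeting"));
    }
}
```

Trying UTF-8 first is safe for legacy files in practice, because ISO-8859-1 text with accented characters is rarely also valid UTF-8. A filter option that forces a single encoding would skip the fallback and report the error instead.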
We should add a preference option to select the default encoding for the properties filter, and the default should be UTF-8 for future files.
There were previous discussions in:
- RFE#807 Allow changing charset for .properties
- RFE#1083 Improve default output encoding settings
The code in your branch seems to work. Thank you.
The fix was merged into master as 8bb46dc1.
Released in 6.0.