#1116 BundleProperties filter should accept UTF-8 properties file

5.8
closed-fixed
None
5
2023-09-27
2022-09-23
No

OmegaT's bundle properties filter treats source and target files as US-ASCII.

According to the Java 8 standard specification, Java properties files should be encoded in ISO-8859-1;
from Java 9 onward, PropertyResourceBundle reads them as UTF-8 by default.
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/PropertyResourceBundle.html

OmegaT can raise a MalformedInputException when opening a properties file that contains non-ASCII Latin characters, such as accented letters.
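
The failure can be reproduced outside OmegaT with a strict US-ASCII decoder. The sketch below is only an illustration, not the filter's actual code; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of OmegaT.
public class AsciiDecodeFailureSketch {

    /** Returns true if strict US-ASCII decoding rejects the bytes as malformed. */
    public static boolean failsAsAscii(byte[] raw) {
        try {
            // A decoder from newDecoder() reports malformed input by default
            // instead of silently replacing it.
            StandardCharsets.US_ASCII.newDecoder().decode(ByteBuffer.wrap(raw));
            return false;
        } catch (MalformedInputException e) {
            // Any byte >= 0x80 is not valid US-ASCII.
            return true;
        } catch (CharacterCodingException e) {
            // Other coding errors are not the case illustrated here.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] plain = "title=Hello".getBytes(StandardCharsets.UTF_8);
        byte[] accented = "title=Caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(failsAsAscii(plain));     // false
        System.out.println(failsAsAscii(accented));  // true
    }
}
```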

From the API note in the Java specification:

PropertyResourceBundle can be constructed either from an InputStream or a Reader, which represents a property file. Constructing a PropertyResourceBundle instance from an InputStream requires that the input stream be encoded in UTF-8. By default, if a MalformedInputException or an UnmappableCharacterException occurs on reading the input stream, then the PropertyResourceBundle instance resets to the state before the exception, re-reads the input stream in ISO-8859-1, and continues reading. If the system property java.util.PropertyResourceBundle.encoding is set to either "ISO-8859-1" or "UTF-8", the input stream is solely read in that encoding, and throws the exception if it encounters an invalid sequence. If "ISO-8859-1" is specified, characters that cannot be represented in ISO-8859-1 encoding must be represented by Unicode Escapes as defined in section 3.3 of The Java™ Language Specification whereas the other constructor which takes a Reader does not have that limitation. Other encoding values are ignored for this system property. The system property is read and evaluated when initializing this class. Changing or removing the property has no effect after the initialization.
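
The default behaviour described in the note (try UTF-8, then re-read as ISO-8859-1 on a decoding error) could be imitated roughly as follows. This is a minimal sketch under my own assumptions, not the JDK's or OmegaT's implementation; `decodeWithFallback` is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or OmegaT.
public class PropertiesDecodingSketch {

    /** Decodes strictly as UTF-8; on a coding error, re-decodes as ISO-8859-1. */
    public static String decodeWithFallback(byte[] raw) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Covers MalformedInputException and UnmappableCharacterException.
            // ISO-8859-1 maps every byte to a character, so this cannot fail.
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        String line = "greeting=caf\u00e9";
        byte[] asUtf8 = line.getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = line.getBytes(StandardCharsets.ISO_8859_1);
        // Both byte sequences decode back to the same text.
        System.out.println(line.equals(decodeWithFallback(asUtf8)));
        System.out.println(line.equals(decodeWithFallback(asLatin1)));
    }
}
```

A fallback like this cannot distinguish an ISO-8859-1 file whose bytes happen to form valid UTF-8, which is one reason an explicit encoding setting is still useful.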

We should add an option in the preferences to select the default encoding for the properties filter, and the default should be UTF-8 for future files.
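
As a rough idea of what a configurable encoding might look like, the sketch below loads properties through a Reader with a caller-supplied charset and falls back to UTF-8 when none is set. The class name, the `load` method and the `DEFAULT_ENCODING` constant are assumptions made up for illustration; they are not OmegaT's filter API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not OmegaT's filter implementation.
public class ConfigurableEncodingSketch {

    /** Assumed default when the user has not selected an encoding. */
    public static final Charset DEFAULT_ENCODING = StandardCharsets.UTF_8;

    /** Loads properties from raw bytes using the configured charset, or UTF-8 if null. */
    public static Properties load(byte[] raw, Charset configured) {
        Charset charset = (configured != null) ? configured : DEFAULT_ENCODING;
        Properties props = new Properties();
        // Properties.load(Reader) leaves decoding to the Reader, so any charset works.
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        byte[] utf8File = "name=caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1File = "name=caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // No encoding configured: read as UTF-8.
        System.out.println("caf\u00e9".equals(load(utf8File, null).getProperty("name")));
        // Legacy file: the user selects ISO-8859-1 explicitly.
        System.out.println("caf\u00e9".equals(
                load(latin1File, StandardCharsets.ISO_8859_1).getProperty("name")));
    }
}
```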

There were previous discussions in:
- RFE#807 Allow changing charset for .properties
- RFE#1083 Improve default output encoding settings

Discussion

  • Hiroshi Miura

    Hiroshi Miura - 2022-09-24
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -12,3 +12,6 @@
    
     We should add an option to select encoding default for properties filter in preference, and default should be UTF-8 for future files.
    
    +There was previous discussions on
    +- [RFE#807 Allow changing charset for .properties](https://sourceforge.net/p/omegat/feature-requests/807/)
    +- [RFE#1083 Improve default output encoding settings](https://sourceforge.net/p/omegat/feature-requests/1083/)
    
     
  • Hiroshi Miura

    Hiroshi Miura - 2022-09-24
    • summary: BundleProperties fIlter opens source file as US-ASCII --> BundleProperties fIlter should accept UTF-8 properties file
     
  • Jean-Christophe Helary

    The code in your branch seems to work. Thank you.

     
  • Hiroshi Miura

    Hiroshi Miura - 2022-10-02
    • status: open --> open-fixed
     
  • Hiroshi Miura

    Hiroshi Miura - 2022-10-02

    The fix was merged into master as 8bb46dc1.

     
  • Hiroshi Miura

    Hiroshi Miura - 2023-09-27
    • status: open-fixed --> closed-fixed
     
  • Hiroshi Miura

    Hiroshi Miura - 2023-09-27

    Released in 6.0.

     
