
#7 file_name_encoding not present in source code, only in docs

None
closed
None
5
2015-02-13
2014-09-05
Otto Frost
No

I searched the source tree com.sos-berlin.products.sources-1.7-4189
Searching for: file_name_encoding
sos\scheduler\jobdoc\params\param_file_name_encoding.sosdoc(10): name="file_name_encoding"
sos\scheduler\jobdoc\params\param_file_name_encoding.sosdoc(15): OptionName="file_name_encoding" Alias=""
Found 2 occurrence(s) in 1 file(s)

This parameter cannot work, because it is missing from the sources.

Discussion

  • Otto Frost

    Otto Frost - 2014-09-05

    However, SOSFtpOptionsSuperClass.java contains:

    @JSOptionDefinition(
    name = "FileNameEncoding",
    description = "Set the encoding-type of a file name",
    key = "FileNameEncoding",
    type = "SOSOptionString",
    mandatory = false)

    and in SOSFTPCommand.java
    public static final String FILENAME_ENCODING = "Filename_encoding";

    and in SOSFTPCommand.java

    protected String doEncoding(final String pstrStringToEncode, final String pstrEncoding) throws Exception {

        @SuppressWarnings("unused")
        final String conMethodName = conClassName + "::doEncoding";
    
        // Zeichen Unicode
        // ------------------------------
        // Ä, ä \u00c4, \u00e4
        // Ö, ö \u00d6, \u00f6
        // Ü, ü \u00dc, \u00fc
        // ß \u00df
    
        // see http://www.fileformat.info/info/unicode/char/search.htm
    
        final String conUTF8UmlU = "\u00fc"; // "ü";
        final String conUTF8UmlBigA = "\u00c4"; // LATIN CAPITAL LETTER A WITH DIAERESIS "Ã\\?";
        final String conUTF8UmlBigO = "\u00d6"; // "Ã\\?";
        final String conUTF8UmlBigU = "\u00dc"; // "Ã\\?";
        final String conUTF8UmlA = "\u00e4"; // "ä";
        final String conUTF8UmlO = "\u00f6"; // "ö";
        final String conUTF8UmlS = "\u00DF";
    
        String strEncodedString = pstrStringToEncode;
        if (pstrEncoding.length() > 0) {
            byte[] iso88591Data = pstrStringToEncode.getBytes(Charset.forName(pstrEncoding));
            strEncodedString = new String(iso88591Data, Charset.forName(pstrEncoding));
            strEncodedString = strEncodedString.replaceAll(conUTF8UmlU, "ü");
            strEncodedString = strEncodedString.replaceAll(conUTF8UmlBigA, "Ä");
            strEncodedString = strEncodedString.replaceAll(conUTF8UmlBigU, "Ü");
            strEncodedString = strEncodedString.replaceAll(conUTF8UmlBigO, "Ö");
            strEncodedString = strEncodedString.replaceAll(conUTF8UmlA, "ä");
            strEncodedString = strEncodedString.replaceAll(conUTF8UmlO, "ö");
            strEncodedString = strEncodedString.replaceAll(conUTF8UmlS, "ß");
            getLogger().debug(String.format("Encode String '%1$s' to/in '%2$s' using '%3$s'", pstrStringToEncode, strEncodedString, pstrEncoding));
        }
        return strEncodedString;
    } // private String doEncoding
    

    // doEncoding is broken; the whole concept is wrong.
    // It only handles German characters.
    // What about Danish, Swedish, Norwegian, French,
    // Vietnamese, Korean, Chinese, Japanese?

    // Standard Java routines should be used.

    // It should be possible to set the filename encoding
    // for both source and target.

    // A local file may be on a CIFS or NFS mount, or under a different user account, where a different charset encoding is needed for the filename.
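To make the point above concrete, here is a minimal sketch using only standard `java.nio.charset` classes. The class `FileNameRecode` and its methods `roundTrip` and `recode` are hypothetical names chosen for illustration and are not part of JADE. `roundTrip` mirrors the shape of the quoted method (encode and decode with the same charset), which either leaves the name unchanged or loses characters. `recode` shows what a separate source and target charset would look like.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JADE sources.
class FileNameRecode {

    // Same shape as the quoted round trip: encode and decode with ONE charset.
    // Mappable characters come back unchanged; unmappable ones become '?'.
    static String roundTrip(String name, Charset cs) {
        return new String(name.getBytes(cs), cs);
    }

    // Reinterpret raw file name bytes: decode with the source charset,
    // then encode with the target charset.
    static byte[] recode(byte[] rawName, Charset source, Charset target) {
        return new String(rawName, source).getBytes(target);
    }

    public static void main(String[] args) {
        String name = "\u00e5\u00e4\u00f6.txt"; // the test file name from this thread

        // Round trip through a charset that can represent the name: no change.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1).equals(name)); // true

        // Round trip through a charset that cannot: the characters are lost.
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII)); // ???.txt

        // Explicit source and target charsets: 10 UTF-8 bytes become 7 Latin-1 bytes.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = recode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length + " -> " + latin1.length); // 10 -> 7
    }
}
```

Because the conversion is driven by two `Charset` arguments rather than a fixed list of umlauts, the same helper would cover any language both charsets can represent.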

     

    Last edit: Otto Frost 2014-09-05
  • Otto Frost

    Otto Frost - 2014-09-08

    I have tested transferring the file åäö.txt
    Windows 2008 <-> Ubuntu 12.04 using SFTP,
    with JADE executing on Windows 2008.
    It works fine in both directions without FileNameEncoding parameter.
    On linux filenames are in UTF-8

     
  • Oliver Haufe

    Oliver Haufe - 2015-02-06

    I know that the FileNameEncoding option is not used in JADE.
    Whether it works depends on the filesystem and the user's locale.
    I think we will remove this option.

     
  • Oliver Haufe

    Oliver Haufe - 2015-02-06
    • status: open --> closed
    • assigned_to: Oliver Haufe
    • Group: -->
     
  • Otto Frost

    Otto Frost - 2015-02-09

    This is an important feature.
    SFTP transfers the filename as raw bytes, but if the local and remote sides use different charset encodings for file names, the name has to be converted.

    SAP has issued note "1906648 - SFTP Adapter: File names with Umlaut characters", but our testing so far unfortunately shows that the SAP function is broken. The note indicates that SAP considers this an important feature.

    Trusting the locale for conversion will not work well for remote file systems such as SFTP, and will be problematic for CIFS/NFS mounts or local directories shared between users (different users may have different locales).
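As an illustration of why an assumed charset is risky, the sketch below decodes the same file name bytes under two different assumptions. `FileNameDecode` and `decodeStrict` are hypothetical names, not JADE code; the calls used are standard `java.nio.charset` API. A wrong assumption either silently yields a garbled name or, with a strict decoder, can be detected and reported.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JADE sources.
class FileNameDecode {

    // Strict decode: throws instead of silently substituting characters.
    static String decodeStrict(byte[] raw, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT)
                 .decode(ByteBuffer.wrap(raw))
                 .toString();
    }

    public static void main(String[] args) {
        String name = "\u00e5\u00e4\u00f6.txt"; // the test file name from this thread

        // UTF-8 bytes read as ISO-8859-1: every byte is valid, so the result
        // is a silently garbled name with two characters per original letter.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length() + " chars instead of " + name.length());

        // ISO-8859-1 bytes read as UTF-8: not valid UTF-8, so a strict
        // decoder reports the mismatch.
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        try {
            decodeStrict(latin1, StandardCharsets.UTF_8);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

Only the second direction is detectable; the first produces a valid but wrong name, which is why an explicitly configured charset per endpoint is more reliable than guessing from the locale.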

     

    Last edit: Otto Frost 2015-02-09
  • Otto Frost

    Otto Frost - 2015-02-09

    The team behind sshfs has seen the need.

    For file names in a different charset encoding, it offers these options:
    -o from_code=CHARSET
    original encoding of file names (default: UTF-8)
    -o to_code=CHARSET
    new encoding of the file names (default: ISO-8859-2)

     
  • Oliver Haufe

    Oliver Haufe - 2015-02-09

    That's right. I have opened a ticket for filename encoding. See https://change.sos-berlin.com/browse/JADE-248

     
  • Oliver Haufe

    Oliver Haufe - 2015-02-09
    • status: closed --> pending
     
  • Uwe Risse

    Uwe Risse - 2015-02-13
    • status: pending --> closed
     
  • Uwe Risse

    Uwe Risse - 2015-02-13

    Please follow this in the jira ticket.
    https://change.sos-berlin.com/browse/JADE-248

     

