Re: [JSch-users] Jsch ChannelSftp and character encodings
From: Oberhuber, M. <Mar...@wi...> - 2007-09-25 13:46:23
Hello,

I found another problem with using the local platform default
encoding for ChannelSftp:

* Take a local Windows box with default encoding ("Cp1252") and a
  remote Linux (RHEL4) box with UTF-8 encoding.
* Some Unicode characters are encoded in UTF-8 as bytes that cannot
  be expressed in Cp1252. For instance, the UTF-8 encoding of the
  character 'č' includes 0x8d, and byte 0x8d is not defined in Cp1252.
* As a result, when reading the directory with the standard encoding,
  Jsch replaces the character 'č' with a question mark '?'.
* But even glob_remote cannot map the question mark back to
  the original bytes.
  --> As a result, remote files whose names include the 'č'
      character cannot be worked on at all!
  --> No stat(), no ls(), etc. No Jsch ChannelSftp command
      at all works on such files.

I thus find this issue REALLY problematic, and I'd vote to
 - provide a ChannelSftp.setControlEncoding()
 - have the default encoding be UTF-8

Cheers,
--
Martin Oberhuber
Wind River Systems, Inc.
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm

> -----Original Message-----
> From: jsc...@li...
> [mailto:jsc...@li...] On Behalf
> Of Oberhuber, Martin
> Sent: Friday, September 21, 2007 11:48 AM
> To: Atsuhiko Yamanaka
> Cc: jsc...@li...
> Subject: Re: [JSch-users] Jsch ChannelSftp and character encodings
>
> Hello Atsuhiko,
>
> I'm not a big expert on encodings, but it seems to me
> that unconditionally forcing UTF-8 is not the right
> thing to do, at least for the following reasons:
>
> 1.) As I understand it, on a UNIX box every user is free
>     to choose his own encoding. User A could be using UTF-8
>     but user B could be using ISO8859-1 or whatever he prefers.
>     In the shell, the change is made by setting an environment
>     variable.
>     But how would the SSHD know what encoding a user prefers?
>     It's running as root, isn't it? So how would it convert
>     from UTF-8 to the user's preferred encoding?
>
> 2.) Although the RFC seems to recommend UTF-8, it looks like
>     practical implementations do not use it to recode.
>
> 3.) Old versions of Jsch defaulted to something else, so if
>     files with extended chars were written with an old Jsch,
>     they cannot be read properly with a new Jsch when you
>     force UTF-8.
>
> Because of all these reasons, I still think the better way
> is to allow the client to choose the default encoding, as I was
> proposing. If it turns out that UTF-8 is the correct default,
> the client can call
>    Channel.setDefaultEncoding("UTF-8");
> otherwise, the client's favorite encoding can be set.
>
> But as I said, I'm not the big expert on encodings and I'm
> happy to discuss this.
>
> Cheers,
> --
> Martin Oberhuber
> Wind River Systems, Inc.
> Target Management Project Lead, DSDP PMC Member
> http://www.eclipse.org/dsdp/tm
>
> > -----Original Message-----
> > From: Atsuhiko Yamanaka [mailto:ym...@jc...]
> > Sent: Thursday, September 20, 2007 4:58 AM
> > To: Oberhuber, Martin
> > Cc: jsc...@li...
> > Subject: Re: [JSch-users] Jsch ChannelSftp and character encodings
> >
> > Hi,
> >
> > +-From: "Oberhuber, Martin" <Mar...@wi...> --
> > |_Date: Wed, 19 Sep 2007 11:47:37 +0200 _______________________
> > |
> > |I'm wondering if anybody thought about the case yet where
> > |I'd like to transfer files via Sftp, where the file names
> > |use non-ASCII foreign language characters, and the character
> > |encoding on the local system is different than the remote.
> > |
> > |Say, I want to transfer from a Windows box to a Linux box.
> > |On Windows, my encoding is Cp1252
> > |On remote Linux, my encoding is UTF-8
> > |I want to transfer file "möchte"
> > |
> > |Currently, channel always seems to encode Java Unicode Strings
> > |with the platform default encoding (Cp1252 in my case). On the
> > |remote, file names will not appear as expected.
> >
> > Yes, it is a bug/incompleteness of jsch.
> >
> > As far as I have understood, we have to send filenames in UTF-8 over
> > the sftp protocol. For example, its IETF draft[1] says as follows,
> >
> >   8.1.1. Opening a File
> >   Files are opened and created using the SSH_FXP_OPEN message.
> >       byte   SSH_FXP_OPEN
> >       uint32 request-id
> >       string filename [UTF-8]
> >       uint32 desired-access
> >       uint32 flags
> >       ATTRS  attrs
> >
> > On the other hand, in the current jsch implementation,
> > filenames have been sent in the local default encoding.
> >
> > I'll fix it in the next version, but it will cause
> > trouble for others.
> > It seems to me that OpenSSH's (for example, openssh-4.7p1)
> > sftp-server has not implemented such encoding conversion.
> > So, as for the above case, if the remote host does not use
> > UTF-8, users will get unexpected results.
> > This is the reason I had not implemented it.
> >
> > |To fix this, I think there should be
> > |   Channel.setControlEncoding(String encoding)
> > |So I can specify the encoding to use for file and
> > |path names on the remote. At the time the Java Unicode
> > |String for arguments is converted to a byte array, it
> > |should do so with the default encoding specified by me.
> >
> > Unfortunately, the client does not have the initiative to choose the
> > encoding and filenames must be sent in UTF-8 according to the RFC.
> >
> >
> > [1] http://tools.ietf.org/html/draft-ietf-secsh-filexfer-13
> >
> >
> > Sincerely,
> > --
> > Atsuhiko Yamanaka
> > JCraft,Inc.
> > 1-14-20 HONCHO AOBA-KU,
> > SENDAI, MIYAGI 980-0014 Japan.
> > Tel +81-22-723-2150
> >     +1-415-578-3454
> > Fax +81-22-224-8773
> > Skype callto://jcraft/
> >
>
> _______________________________________________
> JSch-users mailing list
> JSc...@li...
> https://lists.sourceforge.net/lists/listinfo/jsch-users
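
For reference, the failure described at the top of this message can probably be reproduced without Jsch at all. The following is only a minimal sketch using plain JDK classes: it takes the bytes a UTF-8 server would send for a file name, decodes them with the client's charset, and encodes them again, as a client would when passing the listed name back to stat(). The class name, the method names, and the sample file name (which uses U+010D, a character whose UTF-8 encoding contains byte 0x8d) are made up for illustration and are not part of Jsch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Jsch: checks whether a file name
// survives a decode/encode round trip through the client's charset.
class FilenameRoundTrip {

    // Decode the bytes received from the server with the client charset,
    // then encode the resulting String again with the same charset.
    static byte[] roundTrip(byte[] wireBytes, Charset clientCharset) {
        String decoded = new String(wireBytes, clientCharset);
        return decoded.getBytes(clientCharset);
    }

    // True if the bytes sent back to the server equal the bytes received,
    // assuming the server stores the name as UTF-8.
    static boolean survives(String name, Charset clientCharset) {
        byte[] wireBytes = name.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(wireBytes, roundTrip(wireBytes, clientCharset));
    }

    public static void main(String[] args) {
        // U+010D is encoded in UTF-8 as 0xC4 0x8D.
        String name = "\u010d.txt";
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("Cp1252 round trip ok: " + survives(name, cp1252));
        System.out.println("UTF-8 round trip ok:  "
                + survives(name, StandardCharsets.UTF_8));
    }
}
```

When the round trip through Cp1252 changes the bytes, the name the client sends back no longer matches any file on the server, which would explain why stat() and ls() fail for such files. Decoding and encoding with the same charset the server used avoids the loss, which is the motivation for making the encoding configurable.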