Re: [JSch-users] Jsch ChannelSftp and character encodings
From: Oberhuber, M. <Mar...@wi...> - 2007-09-21 09:49:47
PS: it seems that even on Linux, the platform default encoding
differs by distribution. Red Hat distributions use UTF-8 as the
default encoding, but SuSE seems to use ISO-8859-1 as default.
And, as I said, users can change it in their shells.

Cheers,
--
Martin Oberhuber
Wind River Systems, Inc.
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm

> -----Original Message-----
> From: Oberhuber, Martin
> Sent: Friday, September 21, 2007 11:48 AM
> To: 'Atsuhiko Yamanaka'
> Cc: jsc...@li...
> Subject: RE: [JSch-users] Jsch ChannelSftp and character encodings
>
> Hello Atsuhiko,
>
> I'm not a big expert on encodings, but it seems to me
> that unconditionally forcing UTF-8 is not the right
> thing to do, at least for the following reasons:
>
> 1.) As I understand it, on a UNIX box every user is free
>     to choose his own encoding. User A could be using UTF-8
>     but user B could be using ISO8859-1 or whatever he prefers.
>     In the shell, the change is made by setting an environment
>     variable.
>     But how would the SSHD know what encoding a user prefers?
>     It's running as root, isn't it? So how would it convert
>     from UTF-8 to the user's preferred encoding?
>
> 2.) Although the RFC seems to recommend UTF-8, it looks like
>     practical implementations do not use it to recode.
>
> 3.) Old versions of Jsch defaulted to something else, so if
>     files with extended chars were written with old Jsch
>     they cannot be read properly with new Jsch when you
>     force UTF-8.
>
> For all these reasons, I still think the better way
> is to allow the client to choose the default encoding, as I was
> proposing. If it turns out that UTF-8 is the correct default,
> the client can call
>    Channel.setDefaultEncoding("UTF-8");
> otherwise, the client's preferred encoding can be set.
>
> But as I said, I'm not a big expert on encodings and I'm
> happy to discuss this.
>
> Cheers,
> --
> Martin Oberhuber
> Wind River Systems, Inc.
> Target Management Project Lead, DSDP PMC Member
> http://www.eclipse.org/dsdp/tm
>
> > -----Original Message-----
> > From: Atsuhiko Yamanaka [mailto:ym...@jc...]
> > Sent: Thursday, September 20, 2007 4:58 AM
> > To: Oberhuber, Martin
> > Cc: jsc...@li...
> > Subject: Re: [JSch-users] Jsch ChannelSftp and character encodings
> >
> > Hi,
> >
> > +-From: "Oberhuber, Martin" <Mar...@wi...> --
> > |_Date: Wed, 19 Sep 2007 11:47:37 +0200 _______________________
> > |
> > |I'm wondering if anybody has thought about the case yet where
> > |I'd like to transfer files via Sftp, where the file names
> > |use non-ASCII foreign language characters, and the character
> > |encoding on the local system is different from the remote.
> > |
> > |Say, I want to transfer from a Windows box to a Linux box.
> > |On Windows, my encoding is Cp1252
> > |On remote Linux, my encoding is UTF-8
> > |I want to transfer file "möchte"
> > |
> > |Currently, channel always seems to encode Java Unicode Strings
> > |with the platform default encoding (Cp1252 in my case). On the
> > |remote, file names will not appear as expected.
> >
> > Yes, it is a bug/incompleteness of jsch.
> >
> > As far as I understand, we have to send filenames in UTF-8 over
> > the sftp protocol. For example, its IETF draft[1] says the following:
> >
> >   8.1.1. Opening a File
> >     Files are opened and created using the SSH_FXP_OPEN message.
> >        byte   SSH_FXP_OPEN
> >        uint32 request-id
> >        string filename [UTF-8]
> >        uint32 desired-access
> >        uint32 flags
> >        ATTRS  attrs
> >
> > On the other hand, in the current jsch implementation,
> > filenames have been sent in the local default encoding.
> >
> > I'll fix it in the next version, but it will cause
> > trouble for others.
> > It seems to me that OpenSSH's sftp-server (for example, in
> > openssh-4.7p1) has not implemented such encoding conversion.
> > So, as for the above case, if the remote host does not use
> > UTF-8, users will get unexpected results.
> > This is the reason I had not implemented it.
> >
> > |To fix this, I think there should be
> > |   Channel.setControlEncoding(String encoding)
> > |so I can specify the encoding to use for file and
> > |path names on the remote. At the time the Java Unicode
> > |String for arguments is converted to a byte array, it
> > |should do so with the default encoding specified by me.
> >
> > Unfortunately, the client does not have the initiative to choose the
> > encoding and filenames must be sent in UTF-8 according to the RFC.
> >
> >
> > [1] http://tools.ietf.org/html/draft-ietf-secsh-filexfer-13
> >
> >
> > Sincerely,
> > --
> > Atsuhiko Yamanaka
> > JCraft,Inc.
> > 1-14-20 HONCHO AOBA-KU,
> > SENDAI, MIYAGI 980-0014 Japan.
> > Tel +81-22-723-2150
> >     +1-415-578-3454
> > Fax +81-22-224-8773
> > Skype callto://jcraft/
> >
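
The mismatch discussed in this thread can be reproduced with nothing but the JDK. The sketch below is only an illustration and is not JSch code: the class `FilenameEncodingDemo` and its helpers `sftpString`, `decode` and `hex` are hypothetical names. It builds the length-prefixed `string` field that the quoted draft uses for `filename`, once in ISO-8859-1 (which encodes "ö" as the same single byte as Cp1252) and once in UTF-8, and then decodes the raw bytes the way a host with the other encoding would.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names exist in JSch.
final class FilenameEncodingDemo {

    // SSH "string" field: uint32 big-endian byte length, then the raw bytes.
    static byte[] sftpString(String name, Charset cs) {
        byte[] payload = name.getBytes(cs);
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Interpret raw filename bytes the way a host using charset cs would.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // Lower-case hex dump, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "m\u00f6chte"; // the file name from the thread

        // Client sends in its single-byte local encoding: 6 payload bytes.
        System.out.println(hex(sftpString(name, StandardCharsets.ISO_8859_1)));
        // prints 000000066df663687465

        // Client sends UTF-8 as the draft asks: 7 payload bytes.
        System.out.println(hex(sftpString(name, StandardCharsets.UTF_8)));
        // prints 000000076dc3b663687465

        // A server that stores the bytes unconverted shows the wrong name
        // whenever its users read them with the other encoding.
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(name.equals(decode(latin1, StandardCharsets.UTF_8)));      // prints false
        System.out.println(name.equals(decode(utf8, StandardCharsets.ISO_8859_1)));   // prints false
    }
}
```

Either fixed choice therefore breaks for some remote hosts if the server passes bytes through unchanged, which is the case made above for letting the client pick the encoding.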