Seems I misunderstood the setServerAlive option. I thought it was in seconds but it seems to be in milliseconds. Thus, I sent server alive messages every 10 msec which I guess could have something to do with the "rekeying process" exception. I don't seem to get those if change the interval to 10 seconds (which is what I had intended to do).

I still get the server error message:

"keepalive@jcraft.com" is an unsupported global request

Perhaps this is the way it is supposed to work - can you verify this Atsuhiko? It does seem to poll and download files correctly despite of this message.

/Bengt

2012/10/18 Bengt Rodehav <bengt@rodehav.com>
Here is some more information.

After a while it seems like the polling seems to start working again although I still get the "rekeying process" now and then.

I looked at Serv-U's log (the sftp server) and it looks like this:

[02] Thu 18Oct12 08:21:13 - (969463) User "user" logged in
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_USERAUTH_SUCCESS: successful login
[30] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_GLOBAL_REQUEST: user: keepalive@jcraft.com; wants reply: 1
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_REQUEST_FAILURE: "keepalive@jcraft.com" is an unsupported global request
[30] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_GLOBAL_REQUEST: user: keepalive@jcraft.com; wants reply: 1
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_REQUEST_FAILURE: "keepalive@jcraft.com" is an unsupported global request
[30] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_GLOBAL_REQUEST: user: keepalive@jcraft.com; wants reply: 1
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_REQUEST_FAILURE: "keepalive@jcraft.com" is an unsupported global request
[30] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_GLOBAL_REQUEST: user: keepalive@jcraft.com; wants reply: 1
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_REQUEST_FAILURE: "keepalive@jcraft.com" is an unsupported global request
[30] Thu 18Oct12 08:21:13 - (969463) SSH_FXP_INIT: client version 3 (JSCH-0.1.44)
[31] Thu 18Oct12 08:21:13 - (969463) SSH_FXP_VERSION: server version 3
[30] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_GLOBAL_REQUEST: user: keepalive@jcraft.com; wants reply: 1
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_REQUEST_FAILURE: "keepalive@jcraft.com" is an unsupported global request
[30] Thu 18Oct12 08:21:13 - (969463) {0x00000001} SSH_FXP_REALPATH: path: "~"
[31] Thu 18Oct12 08:21:13 - (969463) {0x00000001} SSH_FXP_NAME: path: "/"
[30] Thu 18Oct12 08:21:13 - (969463) {0x00000002} SSH_FXP_STAT: path: "/sample/sftp2file"
[31] Thu 18Oct12 08:21:13 - (969463) {0x00000002} SSH_FXP_ATTRS
[30] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_GLOBAL_REQUEST: user: keepalive@jcraft.com; wants reply: 1
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_REQUEST_FAILURE: "keepalive@jcraft.com" is an unsupported global request
[30] Thu 18Oct12 08:21:13 - (969463) {0x00000003} SSH_FXP_OPENDIR: path: "/sample/sftp2file"
[30] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_GLOBAL_REQUEST: user: keepalive@jcraft.com; wants reply: 1
[31] Thu 18Oct12 08:21:13 - (969463) SSH2_MSG_REQUEST_FAILURE: "keepalive@jcraft.com" is an unsupported global request
[31] Thu 18Oct12 08:21:13 - (969463) {0x00000003} SSH_FXP_HANDLE: directory handle: 0x00000000
[32] Thu 18Oct12 08:21:13 - (969463) {0x00000004} SSH_FXP_READDIR: handle: 0x00000000
[33] Thu 18Oct12 08:21:13 - (969463) {0x00000004} SSH_FXP_NAME: found 1 items
[32] Thu 18Oct12 08:21:13 - (969463) {0x00000005} SSH_FXP_READDIR: handle: 0x00000000
[31] Thu 18Oct12 08:21:13 - (969463) {0x00000005} SSH_FX_EOF: End of file.
[30] Thu 18Oct12 08:21:13 - (969463) {0x00000006} SSH_FXP_CLOSE: handle: 0x00000000
[31] Thu 18Oct12 08:21:13 - (969463) {0x00000006} SSH_FX_OK: Success.
[02] Thu 18Oct12 08:21:13 - (969463) Closed session
[02] Thu 18Oct12 08:21:13 - (969463) User "user" logged out

To me it looks like Serv-U does not recognize/support the keepalive request. Is this something that has to be explicitly supported by the sftp server?

/Bengt



2012/10/17 Bengt Rodehav <bengt@rodehav.com>
Before enabling setServerAlive on the production server I tested it locally (using Serv-U as sftp server) to see that it actually will work. I set the Camel option (for the sftp component) serverAliveInterval to 10. I poll for files every 30 seconds. When I do this, it seems Jsch can no longer connect to the server. I get the following exception:

2012-10-17 14:29:50,350 | DEBUG | sample/sftp2file | SftpConsumer                     | t.file.remote.RemoteFileConsumer  126 | Could not connect to: Endpoint[sftp://user@localhost/sample/sftp2file?delay=30000&delete=true&disconnect=true&include=(?i).*(?%3C!%5C.TMP)&password=******&serverAliveInterval=10&stepwise=false]. Will try to recover.
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot connect to sftp://user@localhost:22
at org.apache.camel.component.file.remote.SftpOperations.connect(SftpOperations.java:121)[123:org.apache.camel.camel-ftp:2.9.1]
at org.apache.camel.component.file.remote.RemoteFileConsumer.connectIfNecessary(RemoteFileConsumer.java:144)[123:org.apache.camel.camel-ftp:2.9.1]
at org.apache.camel.component.file.remote.RemoteFileConsumer.recoverableConnectIfNecessary(RemoteFileConsumer.java:123)[123:org.apache.camel.camel-ftp:2.9.1]
at org.apache.camel.component.file.remote.RemoteFileConsumer.prePollCheck(RemoteFileConsumer.java:56)[123:org.apache.camel.camel-ftp:2.9.1]
at org.apache.camel.component.file.GenericFileConsumer.poll(GenericFileConsumer.java:89)[110:org.apache.camel.camel-core:2.9.1]
at org.apache.camel.impl.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:140)[110:org.apache.camel.camel-core:2.9.1]
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:92)[110:org.apache.camel.camel-core:2.9.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)[:1.6.0_25]
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)[:1.6.0_25]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)[:1.6.0_25]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)[:1.6.0_25]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)[:1.6.0_25]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)[:1.6.0_25]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)[:1.6.0_25]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)[:1.6.0_25]
at java.lang.Thread.run(Thread.java:662)[:1.6.0_25]
Caused by: com.jcraft.jsch.JSchException: timeout in wating for rekeying process.
at com.jcraft.jsch.Session.write(Session.java:1192)[122:org.apache.servicemix.bundles.jsch:0.1.44.2]
at com.jcraft.jsch.DHG1.init(DHG1.java:127)[122:org.apache.servicemix.bundles.jsch:0.1.44.2]
at com.jcraft.jsch.Session.receive_kexinit(Session.java:547)[122:org.apache.servicemix.bundles.jsch:0.1.44.2]
at com.jcraft.jsch.Session.connect(Session.java:291)[122:org.apache.servicemix.bundles.jsch:0.1.44.2]
at org.apache.camel.component.file.remote.SftpOperations.connect(SftpOperations.java:93)[123:org.apache.camel.camel-ftp:2.9.1]
... 15 more

The polling now stops and the files are not retrieved. Am I doing something wrong here? What is the "rekeying process"?

Thanks,

/Bengt



2012/10/17 Bengt Rodehav <bengt@rodehav.com>
OK - I will let you know when we've tested with setServerAliveInterval. Mind you, it will take a couple of months until we know (or at least believe) that we have fixed the problem since it occurs about once a  month today.

/Bengt


2012/10/17 Atsuhiko Yamanaka <ymnk@jcraft.com>
Hi,

   +-From: Bengt Rodehav <bengt@rodehav.com> --
   |_Date: Wed, 17 Oct 2012 09:56:26 +0200 ____
   |
   |But, how is it possible that the read() call can hang for several days?
   |Does Jsch use any default read timeout? If not, how can I configure it?
   |Finally I would hope that the TCPKeepAlive at the socket level would kick
   |in after a while and mark the other end as "dead" and then close
   |the socket.

If setServerAliveInterval() does not resolve the problem,
TCPKeepAlive will not. I just want to get a hint to find where is the problem.


Sincerely,
--
Atsuhiko Yamanaka
JCraft,Inc.
1-14-20 HONCHO AOBA-KU,
SENDAI, MIYAGI 980-0014 Japan.
Tel +81-22-723-2150