From: Markus B. <ma...@vp...> - 2009-08-18 08:27:00
Hi Dave. Thanks for looking into that. Inline...

----- "David Meredith (STFC,DL,ESC)" <dav...@st...> wrote:
> Hi Markus,
>
> I was thinking about the issues you reported last night and I may have
> an idea why it's failing. I think it may be related to a feature that I
> added to the GridFTPFileSystem that prevented gridftp servers from
> un-mounting the home directory after prolonged idle time (i.e. the tcp
> connection stayed alive, but after say 60 secs of idle time some of the
> NGS gridftp servers would un-mount the user's home dir, which caused
> any subsequent VFS operations to fail). Adding this feature to punt the
> remote server into re-mounting the user's home dir if un-mounted sounds
> like a potential candidate for the multi-threading issues.

Would it be easy enough for me to try to disable this feature, so I can
test whether I get more reliable multi-threaded transfers? Where would I
have to look?

> The performance issue is (I think) due to the following:
> Presumably you are using the vfs copy function. If true, then this
> function only uses optimisations when copying between two gridftp
> servers, for doing the 3rd party transfer. Otherwise, the method falls
> back to the VFS byte streaming approach, e.g. when copying between two
> different file systems, which includes local file:// and gsiftp://.
> This default byte streaming approach does not use any protocol-specific
> optimisations to improve performance; rather, it simply streams bytes
> using input and output streams (a bit pipe). Therefore, some work is
> still needed to improve this method, specifically adding switches to
> use any optimisations for copying between local file:// and a remote
> protocol (e.g. gsiftp get and put).

At this stage, I'm not overly concerned about the performance issues.
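[For readers following along: the byte-streaming fallback Dave describes
amounts to a plain read/write loop between two streams. A minimal sketch
in plain Java, not the actual commons-vfs code; the class name, method
name, and buffer size here are illustrative only:]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    /**
     * Copies all bytes from in to out the way a generic VFS fallback
     * would: no protocol-specific optimisations, just a bit pipe.
     * Returns the number of bytes copied.
     */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // illustrative buffer size
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream("hello gridftp".getBytes()), out);
        System.out.println(n + " bytes copied: " + out);
    }
}
```

[Every read goes through the JVM and back out, which is why a gsiftp
get/put optimisation, as Dave suggests, would beat it.]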
At the moment I'm just piping together input and output streams myself
(the input stream comes from a DataHandler that gets its data from a web
service method), and that's probably a much bigger bottleneck than
anything else. Also, I'm not transferring that many files, or really big
ones, so I can live with that.

> These issues definitely need looking at, but unfortunately I am on
> leave and out of office a great deal over the next couple of weeks
> (however, I'll take a look when I can and keep you updated regarding
> progress). If a proposal that I recently submitted gets funded, this
> will give me time to do all this, fingers crossed; otherwise it's best
> effort.

That would be great. Good luck. ...to all of us :-)

BTW, anything new regarding the transfer-of-lots-of-small-files issue?

Cheers,
Markus

> dave
>
> -----Original Message-----
> From: Markus Binsteiner [mailto:ma...@vp...]
> Sent: 17 August 2009 04:07
> To: com...@li...
> Subject: Re: [Commonsvfsgrid-developers] Gridftp problems when using
> multiple threads
>
> Some more info:
>
> First, I forgot to say: if I use only a single thread, nothing ever
> fails.
>
> Also, we did some more testing and captured some gridftp server logs.
> For a connection that works, this is what the logs say:
>
> [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [CLIENT]: PASV^M
> [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [SERVER]: 227 Entering Passive Mode (132,181,39,23,156,66)^M
> [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [CLIENT]: TYPE A^M
> [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [SERVER]: 200 Type set to A.^M
> [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [CLIENT]: CWD /home/users/bestgrid/grid-admin^M
> [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [SERVER]: 250 CWD command successful.^M
> [31204] Mon Aug 17 14:33:15 2009 :: hqrouter.vpac.org:40368: [CLIENT]: MLST C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> [31204] Mon Aug 17 14:33:15 2009 :: hqrouter.vpac.org:40368: [SERVER]: 250-status of C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> Type=dir;Modify=20090817020959;Size=4096;Perm=cfmpel;UNIX.mode=0755;UNIX.owner=grid-admin;UNIX.group=grid-admin;Unique=f-2dcc151; C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> 250 End.^M
> [31204] Mon Aug 17 14:33:19 2009 :: hqrouter.vpac.org:40368: [CLIENT]: PWD^M
> [31204] Mon Aug 17 14:33:19 2009 :: hqrouter.vpac.org:40368: [SERVER]: 257 "/home/users/bestgrid/grid-admin" is current directory.^M
>
> A similar transaction (I think) failing:
>
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: PASV^M
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]: 227 Entering Passive Mode (132,181,39,23,156,64)^M
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: TYPE A^M
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]: 200 Type set to A.^M
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: CWD /home/users/bestgrid/grid-admin^M
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]:
> 250 CWD command successful.^M
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: MLST C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]: 250-status of C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> Type=dir;Modify=20090817020959;Size=4096;Perm=cfmpel;UNIX.mode=0755;UNIX.owner=grid-admin;UNIX.group=grid-admin;Unique=f-2dcc151; C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> 250 End.^M
> [31181] Mon Aug 17 14:33:43 2009 :: Closed connection from hqrouter.vpac.org:40367
>
> So it seems that either the client can't interpret the server's
> response to the PWD or, more likely, the server never even gets the
> request (the log doesn't show "[CLIENT]: PWD"; the connection just
> closes).
>
> In the client debug log I have an exception like:
>
> ERROR - isConnected error
> org.globus.ftp.exception.ServerException: The server uses unknown
> communication protool. Custom message: (error code 2) [Nested exception
> message: Custom message: Cannot parse 'PWD' reply: 200 Type set to I.].
> Nested exception is org.globus.ftp.exception.FTPReplyParseException:
> Custom message: Cannot parse 'PWD' reply: 200 Type set to I.
>   at org.globus.ftp.FTPClient.getCurrentDir(FTPClient.java:345)
>   at org.apache.commons.vfs.provider.gridftp.cogjglobus.GridFtpFileSystem.isConnected(GridFtpFileSystem.java:176)
>   at org.apache.commons.vfs.provider.gridftp.cogjglobus.GridFtpFileSystem.getClient(GridFtpFileSystem.java:144)
>
> This seems suspicious: Cannot parse 'PWD' reply: 200 Type set to I.
>
> Any chance the client is mixing up connections somehow?
>
> Any other ideas?
>
> Cheers,
> Markus
>
> Markus Binsteiner wrote:
> > Hi.
> >
> > I'm wondering whether anybody here has experience with this:
> >
> > I wrote a grid job submission client that tries to make job
> > submission as easy as possible for the user.
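[For readers following along: "Cannot parse 'PWD' reply: 200 Type set to
I." is exactly what a reply mix-up on a shared control channel would
look like: the PWD caller reads the reply belonging to another thread's
TYPE I command. A deterministic toy sketch of that failure mode in plain
Java; the queue stands in for the single control-channel socket, and
none of these class or method names come from the real libraries:]

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ReplyMixUp {

    // Stand-in for one shared FTP control channel: replies arrive in
    // the order the commands were sent, and nothing ties a reply back
    // to the thread that sent the matching command.
    private final Deque<String> wire = new ArrayDeque<>();

    void send(String command) {
        // Fake server: answer each command immediately, in order.
        if (command.equals("TYPE I")) {
            wire.add("200 Type set to I.");
        } else if (command.equals("PWD")) {
            wire.add("257 \"/home/users/bestgrid/grid-admin\" is current directory.");
        }
    }

    String readReply() {
        return wire.poll(); // whoever reads next gets the oldest reply
    }

    public static void main(String[] args) {
        ReplyMixUp channel = new ReplyMixUp();
        // Thread A sends TYPE I but is descheduled before reading its
        // reply; thread B then sends PWD and reads next:
        channel.send("TYPE I");
        channel.send("PWD");
        // B tries to parse this as a PWD (257 ...) reply and fails:
        System.out.println(channel.readReply()); // 200 Type set to I.
    }
}
```

[This only reproduces the symptom if two threads really do share one
control connection, which is the open question about the client.]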
> > For example, it uploads files from the user's desktop to the job
> > directory on the cluster where the job will run. I'm using
> > commons-vfs-grid to do all the gridftp things.
> >
> > Now, every now and then the upload of one of the input files fails.
> > The reason is not always the same; sometimes it's, for example:
> >
> > Could not determine the type of file "gsiftp://ng2.vpac.org/home/grid-admin"
> >
> > and sometimes it's:
> >
> > Could not write to "gsiftp://ng2.vpac.org/home/grid-admin/C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner/simpleTestTarget.txt82
> >
> > Sometimes it's something else...
> >
> > If I repeat the same job submission a few seconds later, it works
> > fine.
> >
> > I tried to understand what's happening, but it's really hard because
> > I can't reproduce the errors; they just happen once in a while.
> >
> > To get more data, I wrote a gridftp test client which executes
> > several different tests, and one thing I can reproduce is that, if I
> > use multiple threads that do the same action (e.g. upload several
> > files at once to the same gridftp location), the more threads I use,
> > the more errors I get. That may have nothing to do with my initial
> > error, but it's still a problem, since I sometimes need to do
> > multiple transfers in different threads.
> >
> > Roughly, if I use 10 threads, I get about 20 failures out of 100
> > file transmissions. Sometimes more, sometimes less. For 3 threads, I
> > get around 7.
> >
> > I use a new FileSystem for every transfer.
> > I set it up something like:
> >
> > -----------------
> > fsmanager = VFSUtil.createNewFsManager(false, false, true, true,
> >         true, true, true, null);
> > // the fsmanager is created only once
> > -----------------
> >
> > FileSystemOptions opts = new FileSystemOptions();
> >
> > if (fileUrl.startsWith("gsiftp")) {
> >     GridFtpFileSystemConfigBuilder builder =
> >             GridFtpFileSystemConfigBuilder.getInstance();
> >     builder.setGSSCredential(opts, gsscredential);
> > }
> >
> > FileObject fileRoot = fsmanager.resolveFile(mp.getRootUrl(), opts);
> >
> > FileSystem fileBase = fileRoot.getFileSystem();
> >
> > // do the transfer....
> >
> > Anything wrong here?
> >
> > This not only happens with simultaneous uploads. It also happens
> > with downloads, or if I just do an ls on a remote directory
> > simultaneously...
> >
> > One theory I have is that maybe the client somehow times out more
> > easily when there are several connections at the same time, but I
> > wouldn't have a clue where to look for something like this.
> >
> > I'm not sure whether commons-vfs is threadsafe, but since I'm
> > creating a new FileSystem for every transfer, that shouldn't really
> > be an issue, or am I wrong? Or is there some automatic pooling or
> > such happening down the track, either in the commons-vfs or cog
> > libraries?
> >
> > Any ideas on what could be the issue? I don't think it's a server
> > error, because I used the command-line C client to do something
> > equivalent and that was fine (and much, much faster :-) )...
> >
> > Cheers,
> > Markus
> >
> > ------------------------------------------------------------------------------
> > Let Crystal Reports handle the reporting - Free Crystal Reports 2008
> > 30-Day trial. Simplify your report design, integration and
> > deployment - and focus on what you do best, core application coding.
> > Discover what's new with Crystal Reports now.
> > http://p.sf.net/sfu/bobj-july
> > _______________________________________________
> > Commonsvfsgrid-developers mailing list
> > Com...@li...
> > https://lists.sourceforge.net/lists/listinfo/commonsvfsgrid-developers
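[For readers following along: Markus's "automatic pooling down the
track" theory is plausible. If the single FileSystemManager caches
FileSystem instances per root URI and options, as stock commons-vfs
managers do, then every thread may land on the same GridFtpFileSystem
and thus the same control connection, despite the new-FileSystem-per-
transfer intent. One defensive pattern is to scope the client per
thread. A hypothetical sketch in plain Java; FakeClient is a made-up
stand-in, not a class from commons-vfs or CoG:]

```java
public class PerThreadClients {

    // Stand-in for the real GridFTP client / control connection.
    static class FakeClient {}

    // Each thread lazily gets its own client, so no control channel is
    // ever shared between threads.
    static final ThreadLocal<FakeClient> CLIENT =
            ThreadLocal.withInitial(FakeClient::new);

    /**
     * Fetches the thread-local client on two fresh threads and reports
     * whether they saw distinct instances (they always do).
     */
    static boolean distinctAcrossTwoThreads() {
        final FakeClient[] seen = new FakeClient[2];
        Thread t1 = new Thread(() -> seen[0] = CLIENT.get());
        Thread t2 = new Thread(() -> seen[1] = CLIENT.get());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0] != null && seen[1] != null && seen[0] != seen[1];
    }

    public static void main(String[] args) {
        System.out.println("distinct clients: " + distinctAcrossTwoThreads());
    }
}
```

[The real fix would have to live where GridFtpFileSystem obtains its
FTPClient, but the isolation principle is the same.]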