From: Meredith, D. (STFC,DL,ESC) <dav...@st...> - 2009-08-25 10:05:31

Hi folks, sorry, I'm currently in an ICAT meeting for the next two days. I will get back to you as soon as I can (probably next week now).

------------------------
David Meredith
STFC eScience Centre
Daresbury Laboratory
Warrington
Cheshire WA4 4AD
Tel: 01925 603762 (Direct Line)
email: dav...@st...

-----Original Message-----
From: Markus Binsteiner [mailto:ma...@vp...]
Sent: Sun 23/08/2009 7:49 AM
To: commonsvfsgrid-developers
Subject: Re: [Commonsvfsgrid-developers] Gridftp problems when using multiple threads

Hi.

Did some more investigating... I'm pretty sure now that the problem only occurs when I access the *same* filesystem/gridftp-server concurrently. If I have 10 threads accessing different filesystems, everything is all right.

So, maybe it's a limitation in the cog gridftp client class? It's supposed to support parallel transfers, but those probably need to be set up differently from what I'm doing.

To test this, I tried to change the commons-vfs-grid code to create a new GridFTPClient object for every filesystem access. But I'm not sure whether I really succeeded in doing that or whether it just didn't help...

Any other suggestions?

Cheers,
Markus

----- "David Meredith (STFC,DL,ESC)" <dav...@st...> wrote:
> Hi Markus,
>
> Take a look at getClient() and createAndConfigureGridFtpClient() in GridFtpFileSystem.java. Note line 159: idleClient.changeDir(defdir);
> I'll have a crack at this later today also.
>
> I didn't really get very far with the many-small-files issue (again due to time issues). We also have to update to jglobus 1.7, which may help. I suspect that using gridftp get and put (rather than byte streaming) to transfer between file:// and gsiftp:// will also address this.
>
> dave
>
>
> -----Original Message-----
> From: Markus Binsteiner [mailto:ma...@vp...]
> Sent: 18 August 2009 09:26
> To: Meredith, David (STFC,DL,ESC)
> Cc: com...@li...
> Subject: Re: [Commonsvfsgrid-developers] Gridftp problems when using multiple threads
>
> Hi Dave.
>
> Thanks for looking into that. Inline...
>
> ----- "David Meredith (STFC,DL,ESC)" <dav...@st...> wrote:
>
> > Hi Markus,
> >
> > I was thinking about the issues you reported last night, and I may have an idea why it's failing. I think it may be related to a feature that I added to the GridFTPFileSystem to prevent gridftp servers from un-mounting the home directory after prolonged idle periods (i.e. the TCP connection stayed alive, but after, say, 60 secs of idle time, some of the NGS gridftp servers would un-mount the user's home dir, which caused any subsequent VFS operations to fail). This feature, which punts the remote server into re-mounting the user's home dir if it has been un-mounted, sounds like a potential candidate for the multi-threading issues.
>
> Would it be easy enough for me to try to disable this feature so I can test whether I get more reliable multi-threaded transfers? Where would I have to look?
>
> > The performance issue is (I think) due to the following:
> > Presumably you are using the vfs copy function. If so, this function only uses optimisations when copying between two gridftp servers (i.e. for a 3rd-party transfer). Otherwise, the method falls back to the VFS byte-streaming approach, e.g. when copying between two different file systems, which includes local file:// and gsiftp://. This default byte-streaming approach does not use any other protocol-specific optimisations to improve performance; it simply streams bytes using input and output streams (a bit pipe). Therefore, some work is still needed to improve this method, specifically adding switches to use any available optimisations when copying between local file:// and a remote protocol (e.g. gsiftp get and put).
>
> At this stage, I'm not overly concerned about the performance issues.
> At the moment I'm just piping together input & output streams myself (input stream from a datahandler that gets its data from a webservice method), and that's probably a much bigger bottleneck than anything else. Also, I'm not really transferring that many files, or really big ones, so I can live with that.
>
> > These issues definitely need looking at, but unfortunately I am on leave and out of the office a great deal over the next couple of weeks (however, I'll take a look when I can and keep you updated regarding progress). If a proposal that I recently submitted gets funded, this will give me time to do all this (fingers crossed); otherwise it's best effort.
>
> That would be great. Good luck.
> ...to all of us :-)
>
> BTW, anything new regarding the transfer of lots-of-small-files issue?
>
> Cheers,
> Markus
>
> > dave
> >
> > -----Original Message-----
> > From: Markus Binsteiner [mailto:ma...@vp...]
> > Sent: 17 August 2009 04:07
> > To: com...@li...
> > Subject: Re: [Commonsvfsgrid-developers] Gridftp problems when using multiple threads
> >
> > Some more info:
> >
> > First, I forgot to say: if I use only a single thread, nothing ever fails.
> >
> > Also, we did some more testing and captured some gridftp server logs.
> >
> > For a connection that works, this is what the logs say:
> >
> > [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [CLIENT]: PASV^M
> > [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [SERVER]: 227 Entering Passive Mode (132,181,39,23,156,66)^M
> > [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [CLIENT]: TYPE A^M
> > [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [SERVER]: 200 Type set to A.^M
> > [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [CLIENT]: CWD /home/users/bestgrid/grid-admin^M
> > [31204] Mon Aug 17 14:33:14 2009 :: hqrouter.vpac.org:40368: [SERVER]: 250 CWD command successful.^M
> > [31204] Mon Aug 17 14:33:15 2009 :: hqrouter.vpac.org:40368: [CLIENT]: MLST C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> > [31204] Mon Aug 17 14:33:15 2009 :: hqrouter.vpac.org:40368: [SERVER]: 250-status of C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> >  Type=dir;Modify=20090817020959;Size=4096;Perm=cfmpel;UNIX.mode=0755;UNIX.owner=grid-admin;UNIX.group=grid-admin;Unique=f-2dcc151; C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> > 250 End.^M
> > [31204] Mon Aug 17 14:33:19 2009 :: hqrouter.vpac.org:40368: [CLIENT]: PWD^M
> > [31204] Mon Aug 17 14:33:19 2009 :: hqrouter.vpac.org:40368: [SERVER]: 257 "/home/users/bestgrid/grid-admin" is current directory.^M
> >
> > A similar transaction (I think) failing:
> >
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: PASV^M
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]: 227 Entering Passive Mode (132,181,39,23,156,64)^M
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: TYPE A^M
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]: 200 Type set to A.^M
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: CWD /home/users/bestgrid/grid-admin^M
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]: 250 CWD command successful.^M
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [CLIENT]: MLST C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> > [31181] Mon Aug 17 14:33:13 2009 :: hqrouter.vpac.org:40367: [SERVER]: 250-status of C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> >  Type=dir;Modify=20090817020959;Size=4096;Perm=cfmpel;UNIX.mode=0755;UNIX.owner=grid-admin;UNIX.group=grid-admin;Unique=f-2dcc151; C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner^M
> > 250 End.^M
> > [31181] Mon Aug 17 14:33:43 2009 :: Closed connection from hqrouter.vpac.org:40367
> >
> > So it seems that somehow the client can't interpret the server response for the PWD or, more likely, the server doesn't even get the request (the log doesn't show "[CLIENT]: PWD"; the server just closes the connection).
> >
> > In the client debug log I have an exception like:
> >
> > ERROR - isConnected error
> > org.globus.ftp.exception.ServerException: The server uses unknown communication protool. Custom message: (error code 2) [Nested exception message: Custom message: Cannot parse 'PWD' reply: 200 Type set to I.]. Nested exception is org.globus.ftp.exception.FTPReplyParseException: Custom message: Cannot parse 'PWD' reply: 200 Type set to I.
> >     at org.globus.ftp.FTPClient.getCurrentDir(FTPClient.java:345)
> >     at org.apache.commons.vfs.provider.gridftp.cogjglobus.GridFtpFileSystem.isConnected(GridFtpFileSystem.java:176)
> >     at org.apache.commons.vfs.provider.gridftp.cogjglobus.GridFtpFileSystem.getClient(GridFtpFileSystem.java:144)
> >
> > This seems suspicious: Cannot parse 'PWD' reply: 200 Type set to I.
> >
> > Any chance the client is mixing up connections somehow?
> >
> > Any other ideas?
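Markus's question describes a familiar failure mode. If two threads share one FTP control connection, one thread can read the reply that was meant for the other. The Java sketch below is a minimal, stdlib-only simulation that assumes this is the cause; the thread itself does not confirm it. `FakeControlChannel` and `InterleavingDemo` are hypothetical names, not jglobus or commons-vfs classes. The sketch replays one ordering that produces the same mismatch as the debug log, where 'PWD' receives "200 Type set to I.". It then shows that making each command/reply pair a single synchronized exchange removes the mismatch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for ONE FTP control connection: replies come back in command order. */
class FakeControlChannel {
    private final Deque<String> repliesOnTheWire = new ArrayDeque<>();

    /** Writing a command makes the "server" queue the matching reply. */
    void send(String command) {
        repliesOnTheWire.addLast(replyFor(command));
    }

    /** Reading returns whatever reply is next, regardless of which caller sent which command. */
    String readReply() {
        return repliesOnTheWire.pollFirst();
    }

    /** Guarded variant: send and read form one atomic exchange on this connection. */
    synchronized String exchange(String command) {
        send(command);
        return readReply();
    }

    static String replyFor(String command) {
        switch (command) {
            case "TYPE I": return "200 Type set to I.";
            case "PWD":    return "257 \"/home/users/bestgrid/grid-admin\" is current directory.";
            default:       return "500 Unknown command.";
        }
    }
}

class InterleavingDemo {
    /** Replays one unlucky ordering step by step, without real threads. */
    static String replySeenByPwd() {
        FakeControlChannel shared = new FakeControlChannel();
        shared.send("TYPE I");      // thread A writes its command
        shared.send("PWD");         // thread B writes its command before A has read its reply
        return shared.readReply();  // thread B reads first and receives A's reply
    }

    /** Many threads share one channel but only call exchange(); counts mismatched replies. */
    static int mismatchesWithExchange(int threads, int callsPerThread) {
        FakeControlChannel shared = new FakeControlChannel();
        AtomicInteger mismatches = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            String command = (t % 2 == 0) ? "TYPE I" : "PWD";
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    String reply = shared.exchange(command);
                    if (!FakeControlChannel.replyFor(command).equals(reply)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return mismatches.get();
    }
}
```

Serialising every exchange keeps replies matched, but the threads then take turns on a single connection. Markus's experiment of giving each access its own GridFTPClient avoids sharing altogether, at the cost of one control connection per client.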
> >
> > Cheers,
> > Markus
> >
> > Markus Binsteiner wrote:
> > > Hi.
> > >
> > > I'm wondering whether anybody here has got experience with this:
> > >
> > > I wrote a grid job submission client that tries to make job submission as easy as possible for the user. For example, it uploads files from the user's desktop to the job directory on the cluster where the job will run. I'm using commons-vfs-grid to do all the gridftp things.
> > >
> > > Now, every now and then the upload of one of the input files fails. The reason is not always the same; sometimes it's, for example:
> > > Could not determine the type of file "gsiftp://ng2.vpac.org/home/grid-admin"
> > > and sometimes it's:
> > > Could not write to "gsiftp://ng2.vpac.org/home/grid-admin/C_AU_O_APACGrid_OU_VPAC_CN_Markus_Binsteiner/simpleTestTarget.txt82"
> > >
> > > Sometimes it's something else...
> > >
> > > If I repeat the same job submission a few seconds later, it works fine.
> > >
> > > I tried to understand what's happening, but it's really hard because I can't reproduce the errors; they just happen once in a while.
> > >
> > > To get more data, I wrote a gridftp test client which executes several different tests, and one thing I can reproduce is this: if I use several threads that do the same action (e.g. upload several files at once to the same gridftp location), then the more threads I use, the more errors I get. That may have nothing to do with my initial error, but it's still a problem, since I sometimes need to do multiple transfers in different threads.
> > >
> > > Roughly, if I use 10 threads, I get about 20 failures out of 100 file transmissions. Sometimes more, sometimes less. For 3 threads, I get around 7.
> > >
> > > I use a new FileSystem for every transfer. I set it up something like:
> > >
> > > -----------------
> > > fsmanager = VFSUtil.createNewFsManager(false, false, true, true,
> > >         true, true, true, null);
> > > // the fsmanager is created only once
> > > -----------------
> > >
> > > FileSystemOptions opts = new FileSystemOptions();
> > >
> > > if (fileUrl.startsWith("gsiftp")) {
> > >     GridFtpFileSystemConfigBuilder builder = GridFtpFileSystemConfigBuilder
> > >             .getInstance();
> > >     builder.setGSSCredential(opts, gsscredential);
> > > }
> > >
> > > FileObject fileRoot = fsmanager.resolveFile(mp.getRootUrl(), opts);
> > >
> > > FileSystem fileBase = fileRoot.getFileSystem();
> > >
> > > // do the transfer....
> > >
> > > Anything wrong here?
> > >
> > > This doesn't only happen with simultaneous uploads. It also happens with downloads, or if I just do an ls on a remote directory simultaneously...
> > >
> > > One theory I have is that maybe somehow the client times out more easily when there are several connections at the same time, but I wouldn't have a clue as to where to look for something like this.
> > >
> > > I'm not sure whether commons-vfs is thread-safe, but since I'm creating a new FileSystem for every transfer, that shouldn't really be an issue, or am I wrong? Or is there some automatic pooling or such happening down the track, either in the commons-vfs or cog libraries?
> > >
> > > Any ideas on what could be the issue? I don't think it's a server error, because I used the command-line C client to do something equivalent and that was fine (and much, much faster :-) )...
> > >
> > > Cheers,
> > > Markus
> > >
> > > ----------------------------------------------------------------------------
> > > Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial.
> > > Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july
> > > _______________________________________________
> > > Commonsvfsgrid-developers mailing list
> > > Com...@li...
> > > https://lists.sourceforge.net/lists/listinfo/commonsvfsgrid-developers
> >
> > --
> > Scanned by iCritical.
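David points at getClient() and the idle client. The stack trace shows getClient() calling isConnected(), which sends PWD through getCurrentDir(). One way to make sure two threads never touch the same control connection is an exclusive lease. A thread removes the idle client from its slot before probing or using it, and hands it back when finished. The Java sketch below shows that pattern under stated assumptions. `ClientLease` and its factory, probe and close callbacks are hypothetical; this is not the actual GridFtpFileSystem code. In a real provider the factory would presumably create and log in a new GridFTPClient, and the probe would be something like the PWD-based isConnected() check.

```java
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical exclusive-lease holder for connection-like clients.
 * A client is owned by at most one caller at a time.
 */
class ClientLease<C> {
    private final Object slotLock = new Object();
    private final Supplier<C> factory;      // assumption: creates and logs in a fresh client
    private final Predicate<C> stillUsable; // assumption: liveness probe, e.g. a PWD round trip
    private final Consumer<C> closer;       // closes clients that are dead or surplus
    private C idle;                         // at most one parked client, guarded by slotLock

    ClientLease(Supplier<C> factory, Predicate<C> stillUsable, Consumer<C> closer) {
        this.factory = Objects.requireNonNull(factory);
        this.stillUsable = Objects.requireNonNull(stillUsable);
        this.closer = Objects.requireNonNull(closer);
    }

    /** Empties the idle slot first, so the probe and all later use are exclusive to this caller. */
    C getClient() {
        C candidate;
        synchronized (slotLock) {
            candidate = idle;
            idle = null;
        }
        if (candidate != null) {
            if (stillUsable.test(candidate)) {
                return candidate;       // only this caller holds it now
            }
            closer.accept(candidate);   // dead connection: discard it
        }
        return factory.get();           // no usable idle client: open a new, unshared one
    }

    /** Parks the client for reuse, or closes it if another client is already parked. */
    void putClient(C client) {
        synchronized (slotLock) {
            if (idle == null) {
                idle = client;
                return;
            }
        }
        closer.accept(client);
    }
}
```

With this shape, reusing an idle client and creating a new one per access no longer conflict. Reuse stays safe as long as a client never sits in the idle slot while another thread is using it.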