Thread: [Soaplab-users] Question regarding input and output files
From: Daniel B. <Ble...@ma...> - 2010-10-05 20:57:38
Hi,

We have a (perhaps) unusual situation. I have defined our command-line processes in Soaplab2 and can properly access them from a client. What we would like to do is pass the names of files residing on our cluster to the services. The service would process the input and write out a new file, also residing on the cluster filesystem. The reason is that our inputs and outputs tend to be hundreds of megabytes, and it is not really feasible to pass them across the network.

When I tried to specify a filename on the cluster filesystem, I got this error:

* org.soaplab.share.SoaplabException: Soaplab::NotValidInputs [Input] Refusing to resolve the reference: /mi3c/projects/ImageLinks/MRA-0001/MR_20080811/0003_3DTOF/

Is there any way this can be done using Soaplab?

Thanks,
-dan

--
Daniel Blezek, PhD
Medical Imaging Informatics Innovation Center
P 127 or (77) 8 8886
T 507 538 8886
E ble...@ma...
Mayo Clinic
200 First St. S.W.
Harwick SL-44
Rochester, MN 55905
mayoclinic.org

"It is more complicated than you think." -- RFC 1925
From: Martin S. <mar...@gm...> - 2010-10-06 10:19:04
This is a good and valid question and concern. There are several options, but you always have to keep in mind that this "refusing" was done on purpose, to keep Soaplab2 services secure enough. Some of the following options may open your file system too much.

In all options, the input files will need to be referenced as URLs, not passed as direct data. By default, all input files defined in ACD files are specified as also accepting URLs (so you do not need to change the ACD files).

But there is another issue: the size of the files. When an input file is referenced by its URL, Soaplab2 fetches it, stores it in a server-side file, and passes this new file name to the application. So the file is actually physically copied. If you wish to avoid that, see the last option in the list below.

1) Soaplab2 can access your input data using a true URL (using the http or ftp protocols). That can be done only if you have a web server able to serve your data. Such a server does not need to be visible from outside; it could be used just by the Tomcat server that contains the Soaplab2 web application. It can be, for example, referenced as 'localhost' (but your users must know it). This is probably the "cleanest" way.

2) Or one can use the fact that Soaplab2 does not refuse to serve data if they sit in its "sandbox". Therefore, you can either make sure that your sandbox is where your data are, or you can create symbolic links in the sandbox pointing to your data elsewhere (on the same file system). You can define which sandbox directory Soaplab2 will use with the property "working.dir" (see the Configuration guide for where to put this property). Be careful: this directory must be writable by the user running Tomcat. Your users, however, need to know the full path to the sandbox. (I think it is a bit unfortunate that Soaplab2 does not treat a simple file name without an absolute path as a file in the sandbox. Perhaps I should fix this in the next Soaplab2 release, but I never expected that dealing directly with files in the sandbox would be used much.)

3) Or you can write your own Java class, a simple extension of the SowaJob (or EmbossJob) class that overrides only the method "isDataReferenceSafe". If it returns true, all local files will be allowed to be read. Or you can return true only for some files.

4) Last but not least: in the ACD file, you can define your inputs not as inputs but as strings. Your application then gets these strings and uses them as file names. Soaplab2 will not know that these strings are actually file names, so no checking is done. It will depend on your application what it reads and what it does not.

All the options above are more or less for input files. As for output files, Soaplab2 creates them in the "result" directory (which can also be defined by a property, the "results.dir" property). These files can also be delivered to the users as URLs (which means that big files will be passed over the http protocol, which does not need to read the whole file into memory first).

Please do not hesitate to ask for further clarification.

Cheers,
Martin

--
Martin Senger
email: mar...@gm...,mar...@ka...
skype: martinsenger
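Option 3 says that "isDataReferenceSafe" may return true only for some files, but the thread does not say how such a decision might be made. The following is a minimal sketch of one possible check, assuming a hypothetical helper class ReferenceAllowList that is not part of Soaplab2. The signature of "isDataReferenceSafe" is not given in this thread, so the sketch deliberately does not extend SowaJob; a subclass would presumably pass the reference string to a helper like this, and that should be checked against the Soaplab2 source. The root directory used in the usage note is taken from the error message earlier in the thread.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (NOT part of Soaplab2): decides whether a local file
 * reference lies under one of a fixed set of allowed root directories.
 */
final class ReferenceAllowList {

    private final List<Path> allowedRoots = new ArrayList<>();

    ReferenceAllowList(List<String> roots) {
        // Normalize each root once so later comparisons are purely lexical.
        for (String root : roots) {
            allowedRoots.add(Paths.get(root).toAbsolutePath().normalize());
        }
    }

    /** Returns true only if the reference resolves to a path under an allowed root. */
    boolean isAllowed(String reference) {
        if (reference == null || reference.isEmpty()) {
            return false;
        }
        final Path candidate;
        try {
            // normalize() collapses "." and ".." so "root/../elsewhere" is rejected.
            candidate = Paths.get(reference).toAbsolutePath().normalize();
        } catch (InvalidPathException e) {
            return false; // not a usable path at all
        }
        for (Path root : allowedRoots) {
            // Path.startsWith compares whole name elements, so
            // "/data/projectsX" does not match the root "/data/projects".
            if (candidate.startsWith(root)) {
                return true;
            }
        }
        return false;
    }
}
```

With a single allowed root of /mi3c/projects, a call such as isAllowed("/mi3c/projects/ImageLinks/MRA-0001/MR_20080811/0003_3DTOF/") would return true, while a reference that climbs out of the root with ".." would be rejected. Note that this check is lexical only: it does not resolve symbolic links, so a link inside an allowed root can still point elsewhere.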
From: Daniel B. <Ble...@ma...> - 2010-10-06 13:35:45
Hi Martin,

Thanks for the response. From reading the docs (very good, btw), I had come to these same conclusions. We would like to use Taverna, but our processing model really doesn't fit all that well in the grid model, as this question indicates.

Best,
-dan