Home

Salvatore Pinto

Grid-Cache system

Software overview

The Grid-Cache system is an HTTPS client/server application that provides access to local and remote file repositories using cache-based optimization. For remote repositories, the server acts as a mirror of the remote directory, so the download from the remote repository is transparent to the client. In addition, the client (which can be a grid job) needs credentials only for the Grid-Cache server, not for the remote repositories.

Access to local and remote repositories can be restricted through customizable rules and custom per-file ACLs. The system also supports all the basic operations on remote and local repositories (write, read, list), following the Grid Security Infrastructure (GSI) paradigm.

To optimize access to remote repositories over slow networks (such as the Internet), the server maintains a local cache of the most recently accessed files. When a job requests a file (via the client application), the server checks whether the file is in the local cache. If it is, the file is sent to the client directly; if not, the file is first downloaded from the remote repository and then sent to the client.
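The lookup described above is a read-through cache. The following Python sketch is only an illustration of that flow, not the Grid-Cache server's actual code: the function name serve_file, the cache directory layout and the fetch_remote callback are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable


def serve_file(name: str, cache_dir: Path,
               fetch_remote: Callable[[str], bytes]) -> bytes:
    """Return a file's content, contacting the remote repository only on a miss.

    `name` is assumed to be a plain relative file name; a real server would
    also have to validate it and apply its access rules first.
    """
    cached = cache_dir / name
    if cached.is_file():
        # Cache hit: answer from the local copy, no remote access needed.
        return cached.read_bytes()

    # Cache miss: download from the remote repository (hypothetical callback),
    # store the copy for later requests, then return the content.
    data = fetch_remote(name)
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```

With this structure, a second request for the same file never reaches the remote repository, which is where the gain over a slow network comes from.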

Data transfer is performed via an internal HTTPS server, but other protocols are supported as well (e.g. GridFTP, direct access via an NFS share). The client can be any browser equipped with an authorized user X509 certificate or X509 proxy certificate, or a standard HTTPS-enabled command-line application for Linux or Windows (such as cURL or wget). A sample generic command-line client, named secp, is provided in the source package.
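Since any HTTPS client that can present an X509 certificate should work, a script can also fetch files with nothing but the Python standard library. The sketch below is not the secp client and makes several assumptions: the helper names, the server URL and the certificate paths in the commented usage are hypothetical, and it assumes the certificate (or proxy) file is in PEM format.

```python
import shutil
import ssl
import urllib.request
from typing import Optional


def make_client_context(cert_file: str, key_file: Optional[str] = None,
                        ca_dir: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that presents a client certificate to the server.

    If `key_file` is None, the private key is read from `cert_file`, which
    is assumed to hold both certificate and key (as a proxy file usually does).
    `ca_dir` is an optional directory of trusted CA certificates.
    """
    ctx = ssl.create_default_context(capath=ca_dir)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def download(url: str, dest: str, ctx: ssl.SSLContext) -> None:
    """Stream the response body of an HTTPS GET request into a local file."""
    with urllib.request.urlopen(url, context=ctx) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


# Hypothetical usage; host, path and file names are placeholders:
# ctx = make_client_context("/tmp/x509up_u1000",
#                           ca_dir="/etc/grid-security/certificates")
# download("https://gridcache.example.org/repo/data.bin", "data.bin", ctx)
```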

Main Features

  • Integrated GSI-compliant HTTPS server, with X509 proxy certificate and grid-mapfile support
  • Sharing of both local and remote repositories
  • Remote repository access customizable via shell scripts
  • Support for file READ, file WRITE, directory LIST and directory CREATE operations, on both local and remote repositories
  • Internal cache system to optimize READ operations on remote repositories
  • Internal cache "expire time" and "maximum size" policy, customizable via shell scripts
  • Per-file ACLs via a custom ACL file or a custom shell script
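To illustrate how an "expire time" and a "maximum size" limit can work together, here is a small Python sketch of one possible pruning policy. It is an assumption-laden illustration, not the project's implementation (which, as stated above, is customizable via shell scripts): the function name prune_cache, its parameters and the use of the file modification time as the age of a cache entry are all choices made for this example.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_cache(cache_dir: Path, expire_seconds: float, max_bytes: int,
                now: Optional[float] = None) -> List[Path]:
    """Delete expired files, then the oldest ones until the cache fits max_bytes.

    Returns the list of removed paths, in removal order.
    """
    now = time.time() if now is None else now
    removed: List[Path] = []
    kept = []  # (mtime, size, path) for every file that has not expired

    # Pass 1: apply the "expire time" rule.
    for path in cache_dir.rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        if now - st.st_mtime > expire_seconds:
            path.unlink()
            removed.append(path)
        else:
            kept.append((st.st_mtime, st.st_size, path))

    # Pass 2: apply the "maximum size" rule, evicting the oldest files first.
    total = sum(size for _, size, _ in kept)
    kept.sort(key=lambda entry: entry[0])
    for _, size, path in kept:
        if total <= max_bytes:
            break
        path.unlink()
        removed.append(path)
        total -= size
    return removed
```

Evicting the oldest entries first keeps the most recently stored files available, which matches the idea of a cache holding the last accessed files; a different ordering could be substituted without changing the overall structure.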

Download

.

Project history

The Grid-Cache server project started at ESA-ESRIN as a tool that lets Grid-Processing On Demand (G-POD) jobs access remote data repositories which use non-GSI-compliant protocols, shared credentials and strict resource access limits (e.g. no more than two concurrent file downloads).

Over the years, the system extended its features, adding support for the HTTPS protocol (now the main protocol for data requests and other operations), access to local repositories and write operations, thus becoming a complete alternative to GridFTP.

Version 2.0 of the system, completely rewritten to support HTTPS, was released as Open Source in 2011 (under the GPLv3 license) and has been under active development since then.

Because HTTPS is easier than GridFTP to handle in firewall and network configuration, the Grid-Cache server is now replacing the Globus GridFTP server on the G-POD Grid servers located on external and internal Cloud infrastructures.

TODO list

A partial TODO list is here

If you would like to request a new feature or you find something else that should be changed, you can contact the project maintainers through the SourceForge ticketing system or directly by email at eo-gpod@esa.int

Documentation

The following documentation is available on the wiki:

Further documentation is provided inside the source package