Galago Temporary Files

David Fisher

This page describes the temporary files that Galago creates, so that appropriate storage locations can be chosen for them.

The index construction algorithms used by Galago store two types of temporary files: partially sorted data stream files and tupleflow job data files.

Partially sorted files consist of streams of tupleflow data types in sorted order. They are stored in the temporary directory specified in ~/.galago.conf; see [Galago Configuration]. Their file names begin with the prefix "tupleflow". Note that the default location for these files on Unix is /tmp/, and on OS X it is /var/folders/xx/xxxxxxxxxxx/-Tmp-/.
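Because these files share a common name prefix, it is easy to check how much space they occupy. The following Python sketch is not part of Galago; the function name tupleflow_temp_usage is hypothetical, and the caller is assumed to pass in whichever temporary directory is actually configured.

```python
import os


def tupleflow_temp_usage(temp_dir, prefix="tupleflow"):
    """Return (file_count, total_bytes) for regular files in temp_dir
    whose names start with the given prefix."""
    count = 0
    total = 0
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            # Only count regular files; skip folders and symlinks.
            if entry.name.startswith(prefix) and entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total
```

For instance, tupleflow_temp_usage("/tmp") would report the sorter files left in the default Unix location.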

Tupleflow job files are stored in the folder specified by the "galagoJobDir" parameter. If this parameter is not specified, then a new folder is created in the temporary directory specified in ~/.galago.conf; see [Galago Configuration]. This folder contains one data folder for each tupleflow stage connection, and possibly one folder with the job stage information.
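The fallback rule described above can be sketched in Python as follows. This illustrates the rule only and is not Galago's actual implementation; the function resolve_job_dir and the folder prefix "galago-job-" are hypothetical, and the parameters are assumed to be available as a plain dictionary.

```python
import tempfile


def resolve_job_dir(params, temp_dir):
    """Return the folder to use for tupleflow job files.

    params   -- dictionary of job parameters (assumed representation)
    temp_dir -- the configured temporary directory
    """
    job_dir = params.get("galagoJobDir")
    if job_dir:
        # The parameter was given explicitly, so use it as-is.
        return job_dir
    # Otherwise create a new, uniquely named folder in the temporary directory.
    # The "galago-job-" prefix is made up for this example.
    return tempfile.mkdtemp(prefix="galago-job-", dir=temp_dir)
```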

Both types of temporary data can be very large, so they should be stored on a hard disk.
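Before starting a large build, it may be worth confirming that the chosen locations have enough free space. A minimal Python sketch, assuming you have your own estimate of the space required (Galago does not provide one here, and the function name has_free_space is hypothetical):

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the file system holding path has at least
    required_bytes of free space."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_bytes
```

This can be run once against the temporary directory and once against the job directory, since the two may sit on different disks.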

If Galago is run in DRMAA mode, then the job directory must be stored in a cross-mounted location, that is, one that every computation node can access.
Galago will be most efficient if the sorter files are stored on storage local to each of the computation nodes.
For example, the job directory might be stored in:


While the temporary sort files could be stored in:



