--- a
+++ b/doc/README.bw
@@ -0,0 +1,132 @@
+The functions with the prefix "bw_" (for "beowulf") have a specialized
+purpose and a high-level user interface. They are made for clusters
+whose machines may be unavailable at times, or become unavailable
+during a job, typically because they also have Windows installed and
+users sometimes reboot them into Windows. Temporary unavailability
+of the central machine during the job is also allowed for.
+
+Prerequisites:
+
+-- One central machine with a Unix-like OS which is running most of
+   the time.
+
+-- Some other machines which at least sometimes run a Unix-like OS and
+   an SSH server.
+
+-- Authentication to the central machine gives passwordless access to
+   the other machines. To start the job while logged into a machine
+   different from the central machine, there should be passwordless
+   access to the central machine from there. Otherwise you might be
+   prompted for a password and it might work, but this is not tested.
+
+-- All Octave-related software used by the jobs is available on each
+   machine. To start the job while logged into a machine other than
+   the central machine, the startup files in use and the arguments
+   file (see below) must be available there. Using a network
+   filesystem for the home directories is recommended.
+
+The user has to supply a function which will be called with different
+sets of arguments. The supplied function is of the form
+
+function result = f (args[, args_id])
+
+i.e. it accepts an argument "args", which may be a structure or
+cell-array to accommodate a set of arguments, and optionally "args_id",
+which is the index of "args" within all its possible values (given in
+a cell-array, see below). "result" may of course also be a structure
+or cell-array, to accommodate more than one value.
+
+For each set of arguments, the function is run on one of the
+currently available machines. The user supplies a one-dimensional
+cell-array with one set of arguments (i.e. the value of "args") in
+each entry. The cell-array must be stored in a file under the data
+directory (see below) and remain there until computation is finished
+(in case the scheduler needs restarting).
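+
+A minimal sketch of such a function and its arguments file follows.
+The names "my_function" and "my_args", the fields "a" and "b", and
+the variable name "args" in the saved file are hypothetical, and the
+default data directory is assumed to exist; the variable name and
+file format actually expected are not specified in this document.
+
+function result = my_function (args)
+  # "args" is one entry of the cell-array built below
+  result.sum = args.a + args.b;
+endfunction
+
+# one-dimensional cell-array, one set of arguments per entry
+args = cell (1, 3);
+for id = 1:3
+  args{id} = struct ("a", id, "b", 10 * id);
+endfor
+save (tilde_expand ("~/bw-data/my_args"), "args");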
+
+The current state is kept in a variable "state" saved to a file whose
+name is sprintf("%s-%s.state", functionname, argumentsfilename) within
+a state directory.
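+
+As an illustration, for a function named "my_function" and an
+arguments file named "argument_filename" (both hypothetical, and
+assuming the file name is used as given), the name is built as:
+
+sprintf ("%s-%s.state", "my_function", "argument_filename")
+# ans = my_function-argument_filename.state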
+
+Some of the functions read the startup files fullfile(OCTAVE_HOME (),
+"share/octave/site/m/startup/bwrc") and then "~/.bwrc", if it
+exists. In these files, the following configuration variables can
+be set:
+
+-- "computing_machines": cell-array of addresses (strings),
+
+-- "central_machine": single address (string),
+
+-- "data_dir" (optional): data directory for argument files (default:
+   "~/bw-data"),
+
+-- "state_dir" (optional): state directory (default: "~/.bw-state"),
+
+-- "min_save_interv" (optional): minimal time interval in seconds
+   between saves of the state (default: 10). The state contains, among
+   other things, all results of the user function computed so far. If
+   saving these takes a long time (you can test this by saving some
+   results with Octave's "save" function), set min_save_interv to a
+   higher value.
+
+-- "connect_timeout" (optional): timeout for connection attempts in
+   seconds (default: 30). The scheduler waits at least this long
+   before contacting a machine again, even if the connection was
+   refused before the timeout.
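+
+A "~/.bwrc" might look like the following sketch. It assumes that
+the startup files are ordinary Octave scripts in which the variables
+are set by plain assignment; the host names are placeholders, and
+the optional variables are shown with their default values.
+
+computing_machines = {"node1.example.org", "node2.example.org"};
+central_machine = "central.example.org";
+data_dir = "~/bw-data";      # optional
+state_dir = "~/.bw-state";   # optional
+min_save_interv = 10;        # optional, seconds
+connect_timeout = 30;        # optional, seconds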
+
+
+
+To start a job:
+
+Prepare a user function for your job with the above properties,
+prepare a cell-array of argument sets for the function and save it in
+the data directory. On any of the machines, run from Octave:
+
+bw_start ("my_function", "argument_filename");
+
+This starts the scheduler on the central machine in the background
+(with nohup) and returns. You can then log out. If the job had been
+running before, e.g. if the scheduler had been killed for some reason,
+it is restarted.
+
+
+To inspect jobs:
+
+bw_list ();
+
+
+To retrieve results:
+
+bw_retrieve (<arguments documented within the function>)
+
+
+To restart all pending jobs:
+
+bw_start () # without arguments
+
+This may be necessary if the scheduler has been killed, the central
+machine was restarted, or, for example, Kerberos tickets have expired.
+
+
+To stop a job and/or remove the statefile:
+
+bw_clear (<arguments documented within the function>)
+
+
+
+
+Technical notes:
+
+The scheduler forks a child process for each configured computing
+machine; each child opens a permanent ssh connection to a permanent
+Octave process running remotely. Sets of arguments (a single variable
+each) are sent over the connection and the respective results (a
+single variable each) are sent back. If a connection becomes
+unavailable, the child process tries to restart it. The configured
+computing machines are continuously scanned for available machines.
+
+Advisory locking is used to avoid starting more than one scheduler for
+a single combination of user_function/argument_file.
+
+
+
+Olaf Till <olaf.till@uni-jena.de>, 2009-03-29