[TOC]
If $options->{cross_validate} or the --cross-validate command-line option is set, the run is modified to drop out wells and then predict them. If a number is specified, that number of wells is dropped: if 1 is specified, a deterministic loop over all wells is used; if more than 1, a Monte Carlo selection of the wells to be dropped is used.
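The two selection modes described above can be sketched as follows. This is a minimal illustration in Python rather than the tool's own Perl, and it is not the actual implementation: the function name `dropout_sets`, the `n_trials` parameter, and the fixed seed are all assumptions made for the example, since the source does not state how many Monte Carlo draws are made.

```python
import random

def dropout_sets(wells, n_drop, n_trials=10, seed=0):
    """Yield lists of wells to drop (hypothetical helper, not part of the tool)."""
    if n_drop == 1:
        # Deterministic leave-one-out: each well is dropped exactly once.
        for well in wells:
            yield [well]
    else:
        # Monte Carlo: draw n_drop distinct wells at random for each trial.
        # n_trials is an assumed parameter; the source does not specify it.
        rng = random.Random(seed)
        for _ in range(n_trials):
            yield rng.sample(wells, n_drop)

wells = ["W1", "W2", "W3", "W4"]
loo = list(dropout_sets(wells, 1))               # one set per well
mc = list(dropout_sets(wells, 2, n_trials=3))    # three random pairs
```

With a single well dropped, every well is predicted once; with more than one, each trial removes a random subset.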
If the text string "specified" is used, the list of wells to be dropped is read from a file. This file is named in $options->{cross_validate_specified} and cannot be given on the command line. In this case, only one run is performed. Where multiple well-horizon intersections occur, care must be taken that the SPRINT technique of appending asterisks to the well names does not cause problems. The suggested solution is to ensure that all well names in the input files are unique.
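One way to check the uniqueness recommendation before a run is a simple duplicate scan. The sketch below is in Python, is not part of the tool, and assumes the well names have already been extracted from the input files into a list; the function name `duplicate_well_names` is hypothetical.

```python
from collections import Counter

def duplicate_well_names(names):
    """Return the well names that occur more than once (hypothetical helper)."""
    # Strip surrounding whitespace so "A-1" and "A-1 " count as the same name.
    counts = Counter(name.strip() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)

dupes = duplicate_well_names(["A-1", "B-2", "A-1 ", "C-3"])
```

An empty result means every name is unique, so asterisk-suffixed names should not be needed to tell intersections apart.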
If a file called "cross_validation_supplement.txt" exists, it is assumed to be a renamed cross_validation_stats.txt file from a previous run, and is used to set the weighting factors for the --not-montecarlo option for methods that do not already exist in the current run.
To ensure statistically valid results, a radius can be specified using --validation-radius; wells within this distance of an excluded well will also be dropped.
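The radius rule can be illustrated with a short Python sketch. It is not the tool's code and makes several assumptions: distance is taken as horizontal (x, y) distance, the comparison is inclusive, and the names `wells_to_drop` and `coords` are invented for the example.

```python
import math

def wells_to_drop(excluded, coords, radius):
    """Return the excluded wells plus any well within `radius` of one of them."""
    dropped = set(excluded)
    for name, (x, y) in coords.items():
        for ex in excluded:
            ex_x, ex_y = coords[ex]
            # Assumed: plain 2D Euclidean distance, inclusive of the radius.
            if math.hypot(x - ex_x, y - ex_y) <= radius:
                dropped.add(name)
                break
    return dropped

coords = {"A": (0.0, 0.0), "B": (30.0, 40.0), "C": (300.0, 400.0)}
result = wells_to_drop(["A"], coords, radius=100.0)
```

Here well B lies 50 units from the excluded well A and is dropped with it, while well C at 500 units is kept.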
The default action is to calculate cross-validation statistics on the last
horizon only. If $options->{cross_validate_all_horizons} is set then the
residuals, and statistics, are computed for all horizons.
Setting $options->{cross_validation_weights}{well_name} assigns a weight to that well, giving a weighted sum in the RMS calculation. Distances can be computed using a Perl one-liner, e.g.:
perl -F, -lane 'next if /^X/; $F[3] =~ s/ /_/g; $d = sqrt(($F[0]-3561125.56)**2 + ($F[1]-5855129.57)**2); print "$F[3] $d"' 08_TVDSS_all_xyz.csv > distance_from_Field.txt
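The effect of per-well weights on the RMS can be sketched as below. The source only says that the weights give a weighted sum in the RMS calculation, so the exact formula is an assumption: this Python sketch uses sqrt(sum(w * r^2) / sum(w)) and gives unlisted wells a weight of 1. The function name `weighted_rms` is hypothetical.

```python
import math

def weighted_rms(residuals, weights):
    """Weighted RMS of residuals keyed by well name (assumed formula)."""
    num = 0.0
    den = 0.0
    for well, r in residuals.items():
        w = weights.get(well, 1.0)  # assumed default weight for unlisted wells
        num += w * r * r
        den += w
    return math.sqrt(num / den)

residuals = {"W1": 3.0, "W2": 4.0}
unweighted = weighted_rms(residuals, {})
weighted = weighted_rms(residuals, {"W1": 1.0, "W2": 0.0})
```

With equal weights this reduces to the ordinary RMS; a weight of 0 removes a well from the statistic entirely.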
Cross validation is designed to be run as multiple parallel processes,
both on multiple cores of one computer and on multiple computers all
accessing the same shared network drive. Extensive locking
is used to attempt to prevent interference between the runs, although not
all network file systems are friendly to this sort of approach. When the
number of parallel processes accessing the same network file system
becomes large, the overhead of handling all the locking can be a
bottleneck. Also, there may be situations where a shared network file
system is not available. Xval therefore supports a second method of
running in parallel, by sharing the cross validation report files. The
file cross_validation_report.txt contains all the results of the cross
validation runs, and this file is parsed from time to time to create the
cross_validation_stats.txt file, which in turn is read to determine the
weights in the (recommended) not-montecarlo semi-optimisation mode of
running. In order to merge in results from other cross-validation runs
running in parallel, Xval also reads any files named
cross_validation_report_extra*.txt if they exist. These are intended to
be copied in (preferably by rsync) from other locations running the same
analysis in parallel, but which cannot necessarily access the network
file system in use.
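The merging of report files described above might look roughly like the following Python sketch. It is not Xval's code: only the two file name patterns come from the text, while the function name `collect_report_lines` and the treatment of the files as plain lines are assumptions, since the report format is not described here.

```python
import glob
import os

def collect_report_lines(directory):
    """Read the main report plus any cross_validation_report_extra*.txt files."""
    paths = [os.path.join(directory, "cross_validation_report.txt")]
    pattern = os.path.join(directory, "cross_validation_report_extra*.txt")
    paths += sorted(glob.glob(pattern))
    lines = []
    for path in paths:
        if not os.path.exists(path):
            continue  # the main report may not have been written yet
        with open(path) as handle:
            # Assumed: one result per non-blank line.
            lines.extend(line.rstrip("\n") for line in handle if line.strip())
    return lines
```

Files copied in from another site, for example as cross_validation_report_extra_siteB.txt (a made-up name), would then be picked up the next time the statistics are rebuilt.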