# Just Launched: You can now import projects and releases from Google Code onto SourceForge

We are excited to release new functionality to enable a 1-click import from Google Code onto the Allura platform on SourceForge. You can import tickets, wikis, source, releases, and more with a few simple steps.

## [99decf]: PDL / Book / Operations.pod Maximize Restore History

### Operations.pod    403 lines (316 with data), 17.0 kB

```=head1 Operating on PDLs

PDL variables are structured (conventional) arrays of numbers.
The most basic operations on them are the familiar numerical
operations. All of the familiar C-like mathematical, logical, and
bitwise operators available within Perl can be used on PDLs.

Because PDLs are multidimensional constructs, the notion of
I<operator> is generalized from scalars. Each operator has a number
of I<active dimensions> that are required to do its job, and
additional dimensions are simply looped over (or, more precisely,
simple arithmetic operators have zero active dimensions (they act
on points). Collapse operators have one active dimension (they
summarize data over a single dimension, returning a PDL that has
been collapsed to have one fewer dimensions than it started
with). A few operators, notably matrix multiplication, have more
than one active dimension.

All of the standard Perl operators apply to PDLs, with the usual
precedence rules. They are listed in Table [tab:Perl-operators].
Most arithmetic operators behave exactly as one would expect
within Perl, except threaded over the entire PDL. If you supply
both a PDL and a Perl scalar to any of the two-parameter
operators (like multiplication), the scalar is promoted to a PDL,
so the result is a PDL with appropriate threading.

=for html  <img width=400 src="Operations/operator-precedence-table.png">

Operator Precedence Table for PDL.

PDLs can be used in Perl's I<boolean context>. As in the C and Perl
environments, values that are zero are treated as boolean FALSE,
and values that are nonzero are treated as TRUE. The operators
that return boolean values (like the comparison operators C<==>
and C<!=>) all return arithmetic 1 or 0, so you can use boolean

pdl> \$a = xvals(10);
pdl> \$b = ((\$a % 2) != 0);
pdl> print \$a * \$b; # mask only odd values of \$a
[0 1 0 3 0 5 0 7 0 9]
pdl> print \$a->where(\$b); # Filter only odd values of \$a
[1 3 5 7 9]

Perl has only two types of boolean operators: C-style
short-circuiting binary and ternary operators ( C<&&>, C<||>, and C<?:>),
and bitwise operators. The short-circuiting operators treat
values in true boolean context (nonzero values are TRUE, zero
values are FALSE), but may only be used for single-valued PDLs
(and, of course, Perl scalars). That restriction is because the
short-circuiting operators don't evaluate their second argument
if it won't affect the output, and that would yield
non-deterministic behavior for a PDL expression with many values.

The bitwise operators may be used on any PDL and operate on each
bit independently. They are used for generating mask PDLs and for
other threaded Boolean expressions. The operators that return
boolean values (like the comparison operators and logical-NOT C<!>)
return either 0 or 1, so that the bitwise operators give the
standard Boolean results: e.g.

pdl> \$a = xvals(10);
pdl> \$b = ((\$a % 2) == 0) | ((\$a%3)==0));
pdl> print \$a->where( ((\$a%2)==0) | ((\$a%3)==0);
[0 2 3 4 6 8 9]
pdl> print \$a->where( ((\$a%2)==0) & ((\$a%3)==0);
[0 6]

The bitwise operators can have counter-intuitive results if you
use them with arithmetic values but expect the values to be
treated as booleans. If you want to treat an arithmetic value as
a boolean with the bitwise operators, you should convert it to a
true boolean expression (0 or 1) first. The most terse way to
accomplish that is to invert it twice: C<!> always returns 0 or 1.

pdl> \$a = xvals(10);
pdl> print \$a & 5 #bitwise mask!
[0 1 0 1 4 5 4 5 0 1]
pdl> print \$a->where(\$a & 5);
[1 3 4 5 6 7 9]
pdl> print \$a->where((!!\$a) & 5);
[1 2 3 4 5 6 7 8 9]
pdl> print \$a->where((!!\$a) & 4);
Empty
pdl> print \$a->where((!!\$a) & !!4);
[1 2 3 4 5 6 7 8 0]

The C<\$a & 5> expression performs a bitwise mask of each element of
C<\$a>, keeping the 1's and 4's bit only (5 in binary is C<0101>). The
second expression, C<(!!\$a) & 5>, returns true for every nonzero
element of C<\$a>, because C<!> always returns 0 or 1. The third
expresion, C<(!!\$a) & 4>, returns C<false> for every element, because
C<!!\$a> is always 0 or 1, and 4 is C<0100> in binary: its 1's bit is
clear. The final expression, C<(!!\$a) & !!4>, returns true for every
element because both sides of the C<&> operator have been converted
to true boolean values (0 or 1).

The simplest operators have no active dimensions, simply
threading over their arguments and returning a PDL with the same
shape. Collapse operators (also called "reduce" operators in the
documentation) have one active dimension, and remove it --
collapsing an N-dimensional PDL into an (C<N-1>)-dimensional PDL.
Collapsing is useful for summing, averaging, collecting
statistics, and many other things. Many statistics functions
exist in both whole-PDL and collapse form, and when they do, the
whole-PDL version generally has a more abbreviated name -- for
example, avg gets the mean value of an entire PDL, while C<average>
only collapses one dimension.

PDL includes several explicit collapse/reduction operators that
collapse the 0 dim of the PDL; but you can also perform more
generalized reductions using the C<reduce> function supplied in the
ancillary C<PDL::reduce> module included with PDL itself (see [sub:General-purpose-collapse/reduction:-reduce]
, below). C<reduce> also lets you collapse over multiple arbitrarily
chosen dims, so you don't have to perform gyrations to get the
target dims into the 0 slot (using, for example, C<mv> and C<clump>).

B<A word of warning on collapse operators:>

Collapse operators are handy, but (currently) do not necessarily
scale well to huge operations. The operator is applied
sequentially (the simplest possible method) along the dimension
being collapsed; this is very general but can yield unintuitive
results in a few edge cases. For example, summing over 1,000,000
single-precision floats (that are of approximately equal size) is
a bad idea, because float only stores 5 significant digits; each
successive term will underflow the precision. Similarly, adding 10e6
double-precision floats will lose six orders of magnitude of
precision, for the same reason, so that the result will have at
most (13 - 6) seven orders of magnitude of precision. Tree
accumulation is more precise, and may one day be introduced to
the threading engine if there is sufficient demand. You might be
the person to demand and/or implement it...

=head3 Arithmetic collapse: C<prodover> and C<sumover>

The C<sumover> and C<prodover> functions perform summing and product,
respectively, along the 0 dim of a PDL. For example, to take the
arithmetic mean of each row of C<\$pdl>, you can use the expression

\$pdl->sumover / \$pdl->dim(0)

(though you would probably use C<average>, described below,
instead). Similarly, the geometric mean is

\$pdl->prodover ** (1/\$pdl->dim(0)).

The related functions C<sum> and C<prod> work on the whole of \$pdl and
return a single Perl scalar.

=head3 Logical collapse: C<andover> and C<orover>

The C<andover> and C<orover> operators provide true boolean (not
bitwise) masking of elements across rows. They are complementary
to C<all> (for C<andover>) and C<any> (for C<orover>), which work over the
whole PDL argument and return a single Perl scalar. For example,
C<< \$mask->any >> returns a Perl scalar true value if any bit at all is
M-PDL mask indicating whether each row contains even a single
nonzero value.

=head3 Statistical collapse: C<average> and C<statsover>

The C<average> collapse is complementary to C<avg>, which takes the
average of an entire PDL. For slightly more comprehensive
statistical work, you can use C<statsover> (complementary to C<stats>),
which returns many stats (the mean, median, RMS, min, max,
standard deviation, and population RMS) as separate PDLs in list
context; see the online documentation for more details.

PDL comes with an ancillary module, C<PDL::reduce>, that supplies a
general-purpose C<reduce> function. C<reduce> is useful for collapsing
many dims at once, or single dims other than the 0 dim of a PDL,
without having to rearrange the dim list of the source. For
each row contains a single set bit, but C<< \$mask->reduce('or',1) >>.
will return an N-PDL mask indicating whether each column contains
a single set bit. (Otherwise you'd have to say something like
C<< \$mask->mv(1,0)->orover >>). You must explicitly use C<PDL::reduce>
before you can invoke the reduce operator.

=head2 Combination operators: PDLs and Perl lists

Perl arrays/lists are different than PDLs, but often you will
want to assemble a PDL from a list of values (or other PDLs), or
to break apart a PDL into a list of components. For example, if
you have a list of images that you've read in with C<rim> or C<rfits>
(Chap. [cha:File-I/O]), you can stick them together into a single
image cube. Alternatively, you can slice up large chunks of data
into lists of smaller chunks, that are appropriate for looping
over.

=head3 Global glomming / shredding: pdl and list

The PDL constructor, C<pdl()>, can combine multiple PDLs into one,
glomming a bunch of stuff together into a single data array. For
example, if you have a collection of images you've read into a
perl list, you can make them into a data cube as follows:

for \$name(@fnames) { push(@images, rim(\$name)); } # Read some images
\$cube = pdl(@images); # \$cube is ( Wmax x Hmax x n )

You can mix-and-match different types or shapes of data, and the
resulting PDL will be large enough to contain everything. Unless
you specify a data type (see [sec:Getting-values-into]), the
resulting PDL will be promoted to the most general numeric type
in the input.

The inverse of C<pdl()> is C<list>, which will break all the elements
out of a PDL and return them as a Perl array containing all the
elements of the original. For example, list is useful if you have
to loop over the elements of a PDL:

\$pdl = sequence(3,3);
for \$el( list \$pdl ) {
print \$el,"\n";
}

is an expensive way to print the integers from 0 through 8. The
problem with C<list> is that it loses all dimensional information,
so you might prefer to have more control. Read on!

If you have a collection of same-sized, same-type PDLs, you can
concatenate them with C<cat>:

\$cube = cat @images;

It works the same, in that special case, as the pdl constructor
(in [sub:Global-glomming-/], above). The main difference is that
C<cat> is less forgiving about its input -- so it can help catch
coding or input errors that pdl would not flag. C<cat> is named
after the UNIX shell command of the same name, which concatenates
files.

C<cat>'s companion is C<dog>, which splits a PDL into a list along just
its last dimension, so

@images = dog \$cube

exactly undoes the effect of catting a collection of PDLs
together.

You can extend a PDL along the 0 dimension with C<append>. All other
dimensions must agree, thread-wise, between the original PDL and
what you are appending to it (Remember the threading rules from [sub:Threading-rules]
?). So if you say:

\$two_panel = append \$image_left, \$image_right;

or

\$two_panel = \$image_left->append(\$image_right);

you can combine two images (with the same height) into a single
one. You can append multiple PDLs together, so

\$two_panel = append \$image_left,
zeroes(10,  \$image_left->dim(1)),
\$image_right;

will produce a two-panel image with a 10-pixel-wide dark border
between the two images. You can combine append with the
transposition operators (C<mv>, C<xchg>, and C<reorder>; [sub:Rearranging-a-dim]
) to append along any dimension of the input PDL, but read on:
there is a better way to do that...

=head3 Finer control - use C<glue>

The C<glue> method acts just like C<append>, but along a specified
axis. You can say:

\$two_panel = \$image_left->append(\$image_right);

or, exactly equivalently,

\$two_panel = \$image_left->glue(0,\$image_right);

To get a vertically stacked two-panel image, you can say:

\$two_panel = \$image_left->glue(1,\$image_right);

which eliminates the dimensional gymnastics you would have to do
with C<append>.

Interpolation of values between integer array locations is a
mainstay of scientific computing. PDL has a large number of ways
to do it. The most extensive built-in method is C<interpND>, which
supports many interpolation methods on arbitrarily dimensioned
data sets with a regular grid and can be used in conjunction with
C<ndcoords> (Section [sub:Index-PDLs]) to retool entire data sets at
once.

If you are resampling an entire data set (for example, to
distort, enlarge, or shrink an image, or to change its coordinate
system), you probably don't want simple interpolation at the
locations of your new grid points; PDL has a utility, described
in Chapter [cha:Coordinate-Transformations], to do optimized
resampling under a variety of coordinate transformations.

=head3 Interpolate virtually any regular grid: C<interpND>

The workhorse interpolator for PDL is C<interpND>. It works almost
exactly like the C<indexND> operator described in [sub:IndexND],
except that it allows fractional indices (C<indexND> converts its
index variable to an integer format before using it), and does
not maintain a dataflow connection. It accepts an C<n>-dimensional
source PDL and an index PDL, and collapses the index by lookup
into the source: each row of the index is treated as a pixel
coordinate location to look up in the source. This means that you
can look up an arbitrary collection of points in a single Perl
operation, or (by manipulating an index PDL created, e.g., with
C<ndcoords>) carry out coordinate transformations on data arrays.

Here's an example of simple interpolation of two points from an
image:

pdl> \$image = sequence(10,10) + 100;
pdl> print \$image->interpND(pdl(3,1));
113
pdl> print \$image->interpND(pdl([3,1],[3,2.5]));
[113 128.1]

You can set both the interpolation method and how boundaries are
handled. The default method is linear interpolation, but
sampling, cubic, and even FFT coefficient interpolation are
supported:

pdl> \$coords = pdl([3,1],[3,2.5],[-1,-1]);
pdl> print \$image->interpND(\$coords, {method=>'sample'});
[113 133 100]
pdl> print \$image->interpND(\$coords, {method=>'linear'});
[113 128.1 100]
pdl> print \$image->interpND(\$coords, {method=>'cubic'});
[113 128.1 10]
pdl> print \$image->interpND(\$coords,{method=>'fft'});
[ 113 147.674061 199.00011 ]

(the FFT mode only supports periodic boundary conditions). Other
boundary conditions are:

pdl> print \$image->interpND(\$coords, {bound=>'extend'});
[ 113 128.110 100 ]
pdl> print \$image->interpND(\$coords,{bound=>'truncate'});
[ 113 128.110 0 ]
pdl> print \$image->interpND(\$coords, {bound=>'periodic'});
[ 113 128.1 199 ]
pdl> print \$image->interpND(\$coords, {bound=>'forbid'});
index out-of-bounds in range

=head3 Interpolate on a 1-D irregular grid: interpol, interpolate

For simple linear interpolation on a non-uniform grid, you can
use C<interpolate> or C<interpol>. They will both use linear
interpolation to estimate y values on a piecewise-linear curve
formed by a collection of points C<(x_{i},y_{i})> in the plane. At
the moment, there is no non-uniform interpolator for non-uniform
N-dimensional arrays).

The first routine, C<interpolate>, allows extrapolation as well as
interpolation. It returns a Perl list containing two elements:
the interpolated y values, and an error mask that is zero for
normal points and 1 for points whose x value was outside the
original bounds. You can ignore the error mask by using
C<interpolate> in scalar context. For example:

pdl> \$coords = pdl([1,5],[1.1,6],[2,7],[2.5,0]);
pdl> \$x = xvals(10)*0.25 + 1;
pdl> p \$x
[1 1.25 1.5 1.75 2 2.25 2 2.75 3 3.25]
pdl> (\$y, \$err) = \$x->interpolate(\$coords->((0)), \$coords->((1)));
pdl> print \$y;
[5 6.16667 6.44444 6.72222 7 3.5 0 -3.5 -7 -10.5]
pdl> print \$err;
[0 0 0 0 0 0 0 1 1 1]

If you don't want to extrapolate, you can use C<interpol> instead;
C<interpol> does the same type of interpolation, but crashes if a
value is out of bounds. It only returns a single value - the
vector of interpolated y values.

```