Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo

Close

[8a974e]: PDL / Book / Operations.pod Maximize Restore History

Download this file

Operations.pod    403 lines (316 with data), 17.0 kB

=head1 Operating on PDLs

PDL variables are structured (conventional) arrays of numbers. 
The most basic operations on them are the familiar numerical 
operations. All of the familiar C-like mathematical, logical, and 
bitwise operators available within Perl can be used on PDLs.

Because PDLs are multidimensional constructs, the notion of 
I<operator> is generalized from scalars. Each operator has a number 
of I<active dimensions> that are required to do its job, and 
additional dimensions are simply looped over (or, more precisely, 
I<threaded over>; see[sec:Dimensionality-and-Threading]). Most 
simple arithmetic operators have zero active dimensions (they act 
on points). Collapse operators have one active dimension (they 
summarize data over a single dimension, returning a PDL that has 
been collapsed to have one fewer dimensions than it started 
with). A few operators, notably matrix multiplication, have more 
than one active dimension.

=head2 Expressions with PDLs

All of the standard Perl operators apply to PDLs, with the usual 
precedence rules. They are listed in Table [tab:Perl-operators]. 
Most arithmetic operators behave exactly as one would expect 
within Perl, except threaded over the entire PDL. If you supply 
both a PDL and a Perl scalar to any of the two-parameter 
operators (like multiplication), the scalar is promoted to a PDL, 
so the result is a PDL with appropriate threading.

=for html  <img width=400 src="Operations/operator-precedence-table.png">

 Operator Precedence Table for PDL.


=head3 PDLs as boolean values; logicals and masks

PDLs can be used in Perl's I<boolean context>. As in the C and Perl 
environments, values that are zero are treated as boolean FALSE, 
and values that are nonzero are treated as TRUE. The operators 
that return boolean values (like the comparison operators C<==> 
and C<!=>) all return arithmetic 1 or 0, so you can use boolean 
arithmetic to mask out values: 

  pdl> $a = xvals(10);
  pdl> $b = (($a % 2) != 0);
  pdl> print $a * $b; # mask only odd values of $a
  [0 1 0 3 0 5 0 7 0 9]
  pdl> print $a->where($b); # Filter only odd values of $a
  [1 3 5 7 9]

Perl has only two types of boolean operators: C-style 
short-circuiting binary and ternary operators ( C<&&>, C<||>, and C<?:>), 
and bitwise operators. The short-circuiting operators treat 
values in true boolean context (nonzero values are TRUE, zero 
values are FALSE), but may only be used for single-valued PDLs 
(and, of course, Perl scalars). That restriction is because the 
short-circuiting operators don't evaluate their second argument 
if it won't affect the output, and that would yield 
non-deterministic behavior for a PDL expression with many values. 

The bitwise operators may be used on any PDL and operate on each 
bit independently. They are used for generating mask PDLs and for 
other threaded Boolean expressions. The operators that return 
boolean values (like the comparison operators and logical-NOT C<!>) 
return either 0 or 1, so that the bitwise operators give the 
standard Boolean results: e.g.

  pdl> $a = xvals(10);
  pdl> $b = (($a % 2) == 0) | (($a%3)==0));
  pdl> print $a->where( (($a%2)==0) | (($a%3)==0);
  [0 2 3 4 6 8 9]
  pdl> print $a->where( (($a%2)==0) & (($a%3)==0);
  [0 6]

The bitwise operators can have counter-intuitive results if you 
use them with arithmetic values but expect the values to be 
treated as booleans. If you want to treat an arithmetic value as 
a boolean with the bitwise operators, you should convert it to a 
true boolean expression (0 or 1) first. The most terse way to 
accomplish that is to invert it twice: C<!> always returns 0 or 1.

  pdl> $a = xvals(10);
  pdl> print $a & 5 #bitwise mask!
  [0 1 0 1 4 5 4 5 0 1]
  pdl> print $a->where($a & 5); 
  [1 3 4 5 6 7 9]
  pdl> print $a->where((!!$a) & 5); 
  [1 2 3 4 5 6 7 8 9]
  pdl> print $a->where((!!$a) & 4); 
  Empty
  pdl> print $a->where((!!$a) & !!4);
  [1 2 3 4 5 6 7 8 0]


The C<$a & 5> expression performs a bitwise mask of each element of 
C<$a>, keeping the 1's and 4's bit only (5 in binary is C<0101>). The 
second expression, C<(!!$a) & 5>, returns true for every nonzero 
element of C<$a>, because C<!> always returns 0 or 1. The third 
expresion, C<(!!$a) & 4>, returns C<false> for every element, because 
C<!!$a> is always 0 or 1, and 4 is C<0100> in binary: its 1's bit is 
clear. The final expression, C<(!!$a) & !!4>, returns true for every 
element because both sides of the C<&> operator have been converted 
to true boolean values (0 or 1).

=head3 Collapse/reduce: Summarizing by row


The simplest operators have no active dimensions, simply 
threading over their arguments and returning a PDL with the same 
shape. Collapse operators (also called "reduce" operators in the 
documentation) have one active dimension, and remove it -- 
collapsing an N-dimensional PDL into an (C<N-1>)-dimensional PDL. 
Collapsing is useful for summing, averaging, collecting 
statistics, and many other things. Many statistics functions 
exist in both whole-PDL and collapse form, and when they do, the 
whole-PDL version generally has a more abbreviated name -- for 
example, avg gets the mean value of an entire PDL, while C<average> 
only collapses one dimension. 

PDL includes several explicit collapse/reduction operators that 
collapse the 0 dim of the PDL; but you can also perform more 
generalized reductions using the C<reduce> function supplied in the 
ancillary C<PDL::reduce> module included with PDL itself (see [sub:General-purpose-collapse/reduction:-reduce]
, below). C<reduce> also lets you collapse over multiple arbitrarily 
chosen dims, so you don't have to perform gyrations to get the 
target dims into the 0 slot (using, for example, C<mv> and C<clump>).

B<A word of warning on collapse operators:>

Collapse operators are handy, but (currently) do not necessarily 
scale well to huge operations. The operator is applied 
sequentially (the simplest possible method) along the dimension 
being collapsed; this is very general but can yield unintuitive 
results in a few edge cases. For example, summing over 1,000,000 
single-precision floats (that are of approximately equal size) is 
a bad idea, because float only stores 5 significant digits; each 
successive term will underflow the precision. Similarly, adding 10e6
double-precision floats will lose six orders of magnitude of 
precision, for the same reason, so that the result will have at 
most (13 - 6) seven orders of magnitude of precision. Tree 
accumulation is more precise, and may one day be introduced to 
the threading engine if there is sufficient demand. You might be 
the person to demand and/or implement it...

=head3 Arithmetic collapse: C<prodover> and C<sumover>

The C<sumover> and C<prodover> functions perform summing and product, 
respectively, along the 0 dim of a PDL. For example, to take the 
arithmetic mean of each row of C<$pdl>, you can use the expression 

  $pdl->sumover / $pdl->dim(0)

(though you would probably use C<average>, described below, 
instead). Similarly, the geometric mean is 

  $pdl->prodover ** (1/$pdl->dim(0)).

The related functions C<sum> and C<prod> work on the whole of $pdl and 
return a single Perl scalar.

=head3 Logical collapse: C<andover> and C<orover>

The C<andover> and C<orover> operators provide true boolean (not 
bitwise) masking of elements across rows. They are complementary 
to C<all> (for C<andover>) and C<any> (for C<orover>), which work over the 
whole PDL argument and return a single Perl scalar. For example, 
if C<$mask> contains a two-dimensional C<NxM>-PDL bit mask, then 
C<< $mask->any >> returns a Perl scalar true value if any bit at all is 
set in the mask, and C<< $mask->orover >> returns a one-dimensional 
M-PDL mask indicating whether each row contains even a single 
nonzero value.

=head3 Statistical collapse: C<average> and C<statsover>

The C<average> collapse is complementary to C<avg>, which takes the 
average of an entire PDL. For slightly more comprehensive 
statistical work, you can use C<statsover> (complementary to C<stats>), 
which returns many stats (the mean, median, RMS, min, max, 
standard deviation, and population RMS) as separate PDLs in list 
context; see the online documentation for more details.

=head3 General purpose collapse/reduction: C<reduce>

PDL comes with an ancillary module, C<PDL::reduce>, that supplies a 
general-purpose C<reduce> function. C<reduce> is useful for collapsing 
many dims at once, or single dims other than the 0 dim of a PDL, 
without having to rearrange the dim list of the source. For 
example, if C<$mask> contains a two-dimensional NxM-PDL bit mask, 
then C<< $mask->orover >> returns an M-PDL mask indicating whether 
each row contains a single set bit, but C<< $mask->reduce('or',1) >>.
will return an N-PDL mask indicating whether each column contains 
a single set bit. (Otherwise you'd have to say something like 
C<< $mask->mv(1,0)->orover >>). You must explicitly use C<PDL::reduce>
before you can invoke the reduce operator. 

=head2 Combination operators: PDLs and Perl lists

Perl arrays/lists are different than PDLs, but often you will 
want to assemble a PDL from a list of values (or other PDLs), or 
to break apart a PDL into a list of components. For example, if 
you have a list of images that you've read in with C<rim> or C<rfits> 
(Chap. [cha:File-I/O]), you can stick them together into a single 
image cube. Alternatively, you can slice up large chunks of data 
into lists of smaller chunks, that are appropriate for looping 
over. 

=head3 Global glomming / shredding: pdl and list

The PDL constructor, C<pdl()>, can combine multiple PDLs into one, 
glomming a bunch of stuff together into a single data array. For 
example, if you have a collection of images you've read into a 
perl list, you can make them into a data cube as follows:

  for $name(@fnames) { push(@images, rim($name)); } # Read some images
  $cube = pdl(@images); # $cube is ( Wmax x Hmax x n )

You can mix-and-match different types or shapes of data, and the 
resulting PDL will be large enough to contain everything. Unless 
you specify a data type (see [sec:Getting-values-into]), the 
resulting PDL will be promoted to the most general numeric type 
in the input.

The inverse of C<pdl()> is C<list>, which will break all the elements 
out of a PDL and return them as a Perl array containing all the 
elements of the original. For example, list is useful if you have 
to loop over the elements of a PDL:

  $pdl = sequence(3,3);
  for $el( list $pdl ) {
      print $el,"\n";
  }

is an expensive way to print the integers from 0 through 8. The 
problem with C<list> is that it loses all dimensional information, 
so you might prefer to have more control. Read on!

=head3 Gathering/scattering: C<cat> and C<dog>

If you have a collection of same-sized, same-type PDLs, you can 
concatenate them with C<cat>:

  $cube = cat @images;

It works the same, in that special case, as the pdl constructor 
(in [sub:Global-glomming-/], above). The main difference is that 
C<cat> is less forgiving about its input -- so it can help catch 
coding or input errors that pdl would not flag. C<cat> is named 
after the UNIX shell command of the same name, which concatenates 
files. 

C<cat>'s companion is C<dog>, which splits a PDL into a list along just 
its last dimension, so 

  @images = dog $cube

exactly undoes the effect of catting a collection of PDLs 
together. 

=head3 Extending a PDL: C<append>

You can extend a PDL along the 0 dimension with C<append>. All other 
dimensions must agree, thread-wise, between the original PDL and 
what you are appending to it (Remember the threading rules from [sub:Threading-rules]
?). So if you say:

  $two_panel = append $image_left, $image_right;

or

  $two_panel = $image_left->append($image_right);

you can combine two images (with the same height) into a single 
one. You can append multiple PDLs together, so

  $two_panel = append $image_left, 
                    zeroes(10,  $image_left->dim(1)), 
                    $image_right;

will produce a two-panel image with a 10-pixel-wide dark border 
between the two images. You can combine append with the 
transposition operators (C<mv>, C<xchg>, and C<reorder>; [sub:Rearranging-a-dim]
) to append along any dimension of the input PDL, but read on: 
there is a better way to do that...

=head3 Finer control - use C<glue>


The C<glue> method acts just like C<append>, but along a specified 
axis. You can say:

  $two_panel = $image_left->append($image_right);

or, exactly equivalently,

  $two_panel = $image_left->glue(0,$image_right);

To get a vertically stacked two-panel image, you can say:

  $two_panel = $image_left->glue(1,$image_right);

which eliminates the dimensional gymnastics you would have to do 
with C<append>.

=head2 Interpolation


Interpolation of values between integer array locations is a 
mainstay of scientific computing. PDL has a large number of ways 
to do it. The most extensive built-in method is C<interpND>, which 
supports many interpolation methods on arbitrarily dimensioned 
data sets with a regular grid and can be used in conjunction with 
C<ndcoords> (Section [sub:Index-PDLs]) to retool entire data sets at 
once.

If you are resampling an entire data set (for example, to 
distort, enlarge, or shrink an image, or to change its coordinate 
system), you probably don't want simple interpolation at the 
locations of your new grid points; PDL has a utility, described 
in Chapter [cha:Coordinate-Transformations], to do optimized 
resampling under a variety of coordinate transformations.

=head3 Interpolate virtually any regular grid: C<interpND>

The workhorse interpolator for PDL is C<interpND>. It works almost 
exactly like the C<indexND> operator described in [sub:IndexND], 
except that it allows fractional indices (C<indexND> converts its 
index variable to an integer format before using it), and does 
not maintain a dataflow connection. It accepts an C<n>-dimensional 
source PDL and an index PDL, and collapses the index by lookup 
into the source: each row of the index is treated as a pixel 
coordinate location to look up in the source. This means that you 
can look up an arbitrary collection of points in a single Perl 
operation, or (by manipulating an index PDL created, e.g., with 
C<ndcoords>) carry out coordinate transformations on data arrays. 

Here's an example of simple interpolation of two points from an 
image:

  pdl> $image = sequence(10,10) + 100;
  pdl> print $image->interpND(pdl(3,1));
  113
  pdl> print $image->interpND(pdl([3,1],[3,2.5]));
  [113 128.1]

You can set both the interpolation method and how boundaries are 
handled. The default method is linear interpolation, but 
sampling, cubic, and even FFT coefficient interpolation are 
supported: 

  pdl> $coords = pdl([3,1],[3,2.5],[-1,-1]);
  pdl> print $image->interpND($coords, {method=>'sample'});
  [113 133 100]
  pdl> print $image->interpND($coords, {method=>'linear'});
  [113 128.1 100]
  pdl> print $image->interpND($coords, {method=>'cubic'});
  [113 128.1 10]
  pdl> print $image->interpND($coords,{method=>'fft'});
  [ 113 147.674061 199.00011 ]

(the FFT mode only supports periodic boundary conditions). Other 
boundary conditions are:

  pdl> print $image->interpND($coords, {bound=>'extend'});
  [ 113 128.110 100 ]
  pdl> print $image->interpND($coords,{bound=>'truncate'});
  [ 113 128.110 0 ]
  pdl> print $image->interpND($coords, {bound=>'periodic'});
  [ 113 128.1 199 ]
  pdl> print $image->interpND($coords, {bound=>'forbid'});
  index out-of-bounds in range

=head3 Interpolate on a 1-D irregular grid: interpol, interpolate

For simple linear interpolation on a non-uniform grid, you can 
use C<interpolate> or C<interpol>. They will both use linear 
interpolation to estimate y values on a piecewise-linear curve 
formed by a collection of points C<(x_{i},y_{i})> in the plane. At 
the moment, there is no non-uniform interpolator for non-uniform 
N-dimensional arrays). 

The first routine, C<interpolate>, allows extrapolation as well as 
interpolation. It returns a Perl list containing two elements: 
the interpolated y values, and an error mask that is zero for 
normal points and 1 for points whose x value was outside the 
original bounds. You can ignore the error mask by using 
C<interpolate> in scalar context. For example:

  pdl> $coords = pdl([1,5],[1.1,6],[2,7],[2.5,0]);
  pdl> $x = xvals(10)*0.25 + 1;
  pdl> p $x
  [1 1.25 1.5 1.75 2 2.25 2 2.75 3 3.25]
  pdl> ($y, $err) = $x->interpolate($coords->((0)), $coords->((1)));
  pdl> print $y;
  [5 6.16667 6.44444 6.72222 7 3.5 0 -3.5 -7 -10.5]
  pdl> print $err;
  [0 0 0 0 0 0 0 1 1 1]

If you don't want to extrapolate, you can use C<interpol> instead; 
C<interpol> does the same type of interpolation, but crashes if a 
value is out of bounds. It only returns a single value - the 
vector of interpolated y values.