=head1 Operating on PDLs PDL variables are structured (conventional) arrays of numbers. The most basic operations on them are the familiar numerical operations. All of the familiar C-like mathematical, logical, and bitwise operators available within Perl can be used on PDLs. Because PDLs are multidimensional constructs, the notion of I<operator> is generalized from scalars. Each operator has a number of I<active dimensions> that are required to do its job, and additional dimensions are simply looped over (or, more precisely, I<threaded over>; see[sec:Dimensionality-and-Threading]). Most simple arithmetic operators have zero active dimensions (they act on points). Collapse operators have one active dimension (they summarize data over a single dimension, returning a PDL that has been collapsed to have one fewer dimensions than it started with). A few operators, notably matrix multiplication, have more than one active dimension. =head2 Expressions with PDLs All of the standard Perl operators apply to PDLs, with the usual precedence rules. They are listed in Table [tab:Perl-operators]. Most arithmetic operators behave exactly as one would expect within Perl, except threaded over the entire PDL. If you supply both a PDL and a Perl scalar to any of the two-parameter operators (like multiplication), the scalar is promoted to a PDL, so the result is a PDL with appropriate threading. =for html <img width=400 src="Operations/operator-precedence-table.png"> Operator Precedence Table for PDL. =head3 PDLs as boolean values; logicals and masks PDLs can be used in Perl's I<boolean context>. As in the C and Perl environments, values that are zero are treated as boolean FALSE, and values that are nonzero are treated as TRUE. The operators that return boolean values (like the comparison operators C<==> and C<!=>) all return arithmetic 1 or 0, so you can use boolean arithmetic to mask out values: pdl> $a = xvals(10); pdl> $b = (($a % 2) != 0); pdl> print $a * $b; # mask only odd values of $a [0 1 0 3 0 5 0 7 0 9] pdl> print $a->where($b); # Filter only odd values of $a [1 3 5 7 9] Perl has only two types of boolean operators: C-style short-circuiting binary and ternary operators ( C<&&>, C<||>, and C<?:>), and bitwise operators. The short-circuiting operators treat values in true boolean context (nonzero values are TRUE, zero values are FALSE), but may only be used for single-valued PDLs (and, of course, Perl scalars). That restriction is because the short-circuiting operators don't evaluate their second argument if it won't affect the output, and that would yield non-deterministic behavior for a PDL expression with many values. The bitwise operators may be used on any PDL and operate on each bit independently. They are used for generating mask PDLs and for other threaded Boolean expressions. The operators that return boolean values (like the comparison operators and logical-NOT C<!>) return either 0 or 1, so that the bitwise operators give the standard Boolean results: e.g. pdl> $a = xvals(10); pdl> $b = (($a % 2) == 0) | (($a%3)==0)); pdl> print $a->where( (($a%2)==0) | (($a%3)==0); [0 2 3 4 6 8 9] pdl> print $a->where( (($a%2)==0) & (($a%3)==0); [0 6] The bitwise operators can have counter-intuitive results if you use them with arithmetic values but expect the values to be treated as booleans. If you want to treat an arithmetic value as a boolean with the bitwise operators, you should convert it to a true boolean expression (0 or 1) first. The most terse way to accomplish that is to invert it twice: C<!> always returns 0 or 1. pdl> $a = xvals(10); pdl> print $a & 5 #bitwise mask! [0 1 0 1 4 5 4 5 0 1] pdl> print $a->where($a & 5); [1 3 4 5 6 7 9] pdl> print $a->where((!!$a) & 5); [1 2 3 4 5 6 7 8 9] pdl> print $a->where((!!$a) & 4); Empty pdl> print $a->where((!!$a) & !!4); [1 2 3 4 5 6 7 8 0] The C<$a & 5> expression performs a bitwise mask of each element of C<$a>, keeping the 1's and 4's bit only (5 in binary is C<0101>). The second expression, C<(!!$a) & 5>, returns true for every nonzero element of C<$a>, because C<!> always returns 0 or 1. The third expresion, C<(!!$a) & 4>, returns C<false> for every element, because C<!!$a> is always 0 or 1, and 4 is C<0100> in binary: its 1's bit is clear. The final expression, C<(!!$a) & !!4>, returns true for every element because both sides of the C<&> operator have been converted to true boolean values (0 or 1). =head3 Collapse/reduce: Summarizing by row The simplest operators have no active dimensions, simply threading over their arguments and returning a PDL with the same shape. Collapse operators (also called "reduce" operators in the documentation) have one active dimension, and remove it -- collapsing an N-dimensional PDL into an (C<N-1>)-dimensional PDL. Collapsing is useful for summing, averaging, collecting statistics, and many other things. Many statistics functions exist in both whole-PDL and collapse form, and when they do, the whole-PDL version generally has a more abbreviated name -- for example, avg gets the mean value of an entire PDL, while C<average> only collapses one dimension. PDL includes several explicit collapse/reduction operators that collapse the 0 dim of the PDL; but you can also perform more generalized reductions using the C<reduce> function supplied in the ancillary C<PDL::reduce> module included with PDL itself (see [sub:General-purpose-collapse/reduction:-reduce] , below). C<reduce> also lets you collapse over multiple arbitrarily chosen dims, so you don't have to perform gyrations to get the target dims into the 0 slot (using, for example, C<mv> and C<clump>). B<A word of warning on collapse operators:> Collapse operators are handy, but (currently) do not necessarily scale well to huge operations. The operator is applied sequentially (the simplest possible method) along the dimension being collapsed; this is very general but can yield unintuitive results in a few edge cases. For example, summing over 1,000,000 single-precision floats (that are of approximately equal size) is a bad idea, because float only stores 5 significant digits; each successive term will underflow the precision. Similarly, adding 10e6 double-precision floats will lose six orders of magnitude of precision, for the same reason, so that the result will have at most (13 - 6) seven orders of magnitude of precision. Tree accumulation is more precise, and may one day be introduced to the threading engine if there is sufficient demand. You might be the person to demand and/or implement it... =head3 Arithmetic collapse: C<prodover> and C<sumover> The C<sumover> and C<prodover> functions perform summing and product, respectively, along the 0 dim of a PDL. For example, to take the arithmetic mean of each row of C<$pdl>, you can use the expression $pdl->sumover / $pdl->dim(0) (though you would probably use C<average>, described below, instead). Similarly, the geometric mean is $pdl->prodover ** (1/$pdl->dim(0)). The related functions C<sum> and C<prod> work on the whole of $pdl and return a single Perl scalar. =head3 Logical collapse: C<andover> and C<orover> The C<andover> and C<orover> operators provide true boolean (not bitwise) masking of elements across rows. They are complementary to C<all> (for C<andover>) and C<any> (for C<orover>), which work over the whole PDL argument and return a single Perl scalar. For example, if C<$mask> contains a two-dimensional C<NxM>-PDL bit mask, then C<< $mask->any >> returns a Perl scalar true value if any bit at all is set in the mask, and C<< $mask->orover >> returns a one-dimensional M-PDL mask indicating whether each row contains even a single nonzero value. =head3 Statistical collapse: C<average> and C<statsover> The C<average> collapse is complementary to C<avg>, which takes the average of an entire PDL. For slightly more comprehensive statistical work, you can use C<statsover> (complementary to C<stats>), which returns many stats (the mean, median, RMS, min, max, standard deviation, and population RMS) as separate PDLs in list context; see the online documentation for more details. =head3 General purpose collapse/reduction: C<reduce> PDL comes with an ancillary module, C<PDL::reduce>, that supplies a general-purpose C<reduce> function. C<reduce> is useful for collapsing many dims at once, or single dims other than the 0 dim of a PDL, without having to rearrange the dim list of the source. For example, if C<$mask> contains a two-dimensional NxM-PDL bit mask, then C<< $mask->orover >> returns an M-PDL mask indicating whether each row contains a single set bit, but C<< $mask->reduce('or',1) >>. will return an N-PDL mask indicating whether each column contains a single set bit. (Otherwise you'd have to say something like C<< $mask->mv(1,0)->orover >>). You must explicitly use C<PDL::reduce> before you can invoke the reduce operator. =head2 Combination operators: PDLs and Perl lists Perl arrays/lists are different than PDLs, but often you will want to assemble a PDL from a list of values (or other PDLs), or to break apart a PDL into a list of components. For example, if you have a list of images that you've read in with C<rim> or C<rfits> (Chap. [cha:File-I/O]), you can stick them together into a single image cube. Alternatively, you can slice up large chunks of data into lists of smaller chunks, that are appropriate for looping over. =head3 Global glomming / shredding: pdl and list The PDL constructor, C<pdl()>, can combine multiple PDLs into one, glomming a bunch of stuff together into a single data array. For example, if you have a collection of images you've read into a perl list, you can make them into a data cube as follows: for $name(@fnames) { push(@images, rim($name)); } # Read some images $cube = pdl(@images); # $cube is ( Wmax x Hmax x n ) You can mix-and-match different types or shapes of data, and the resulting PDL will be large enough to contain everything. Unless you specify a data type (see [sec:Getting-values-into]), the resulting PDL will be promoted to the most general numeric type in the input. The inverse of C<pdl()> is C<list>, which will break all the elements out of a PDL and return them as a Perl array containing all the elements of the original. For example, list is useful if you have to loop over the elements of a PDL: $pdl = sequence(3,3); for $el( list $pdl ) { print $el,"\n"; } is an expensive way to print the integers from 0 through 8. The problem with C<list> is that it loses all dimensional information, so you might prefer to have more control. Read on! =head3 Gathering/scattering: C<cat> and C<dog> If you have a collection of same-sized, same-type PDLs, you can concatenate them with C<cat>: $cube = cat @images; It works the same, in that special case, as the pdl constructor (in [sub:Global-glomming-/], above). The main difference is that C<cat> is less forgiving about its input -- so it can help catch coding or input errors that pdl would not flag. C<cat> is named after the UNIX shell command of the same name, which concatenates files. C<cat>'s companion is C<dog>, which splits a PDL into a list along just its last dimension, so @images = dog $cube exactly undoes the effect of catting a collection of PDLs together. =head3 Extending a PDL: C<append> You can extend a PDL along the 0 dimension with C<append>. All other dimensions must agree, thread-wise, between the original PDL and what you are appending to it (Remember the threading rules from [sub:Threading-rules] ?). So if you say: $two_panel = append $image_left, $image_right; or $two_panel = $image_left->append($image_right); you can combine two images (with the same height) into a single one. You can append multiple PDLs together, so $two_panel = append $image_left, zeroes(10, $image_left->dim(1)), $image_right; will produce a two-panel image with a 10-pixel-wide dark border between the two images. You can combine append with the transposition operators (C<mv>, C<xchg>, and C<reorder>; [sub:Rearranging-a-dim] ) to append along any dimension of the input PDL, but read on: there is a better way to do that... =head3 Finer control - use C<glue> The C<glue> method acts just like C<append>, but along a specified axis. You can say: $two_panel = $image_left->append($image_right); or, exactly equivalently, $two_panel = $image_left->glue(0,$image_right); To get a vertically stacked two-panel image, you can say: $two_panel = $image_left->glue(1,$image_right); which eliminates the dimensional gymnastics you would have to do with C<append>. =head2 Interpolation Interpolation of values between integer array locations is a mainstay of scientific computing. PDL has a large number of ways to do it. The most extensive built-in method is C<interpND>, which supports many interpolation methods on arbitrarily dimensioned data sets with a regular grid and can be used in conjunction with C<ndcoords> (Section [sub:Index-PDLs]) to retool entire data sets at once. If you are resampling an entire data set (for example, to distort, enlarge, or shrink an image, or to change its coordinate system), you probably don't want simple interpolation at the locations of your new grid points; PDL has a utility, described in Chapter [cha:Coordinate-Transformations], to do optimized resampling under a variety of coordinate transformations. =head3 Interpolate virtually any regular grid: C<interpND> The workhorse interpolator for PDL is C<interpND>. It works almost exactly like the C<indexND> operator described in [sub:IndexND], except that it allows fractional indices (C<indexND> converts its index variable to an integer format before using it), and does not maintain a dataflow connection. It accepts an C<n>-dimensional source PDL and an index PDL, and collapses the index by lookup into the source: each row of the index is treated as a pixel coordinate location to look up in the source. This means that you can look up an arbitrary collection of points in a single Perl operation, or (by manipulating an index PDL created, e.g., with C<ndcoords>) carry out coordinate transformations on data arrays. Here's an example of simple interpolation of two points from an image: pdl> $image = sequence(10,10) + 100; pdl> print $image->interpND(pdl(3,1)); 113 pdl> print $image->interpND(pdl([3,1],[3,2.5])); [113 128.1] You can set both the interpolation method and how boundaries are handled. The default method is linear interpolation, but sampling, cubic, and even FFT coefficient interpolation are supported: pdl> $coords = pdl([3,1],[3,2.5],[-1,-1]); pdl> print $image->interpND($coords, {method=>'sample'}); [113 133 100] pdl> print $image->interpND($coords, {method=>'linear'}); [113 128.1 100] pdl> print $image->interpND($coords, {method=>'cubic'}); [113 128.1 10] pdl> print $image->interpND($coords,{method=>'fft'}); [ 113 147.674061 199.00011 ] (the FFT mode only supports periodic boundary conditions). Other boundary conditions are: pdl> print $image->interpND($coords, {bound=>'extend'}); [ 113 128.110 100 ] pdl> print $image->interpND($coords,{bound=>'truncate'}); [ 113 128.110 0 ] pdl> print $image->interpND($coords, {bound=>'periodic'}); [ 113 128.1 199 ] pdl> print $image->interpND($coords, {bound=>'forbid'}); index out-of-bounds in range =head3 Interpolate on a 1-D irregular grid: interpol, interpolate For simple linear interpolation on a non-uniform grid, you can use C<interpolate> or C<interpol>. They will both use linear interpolation to estimate y values on a piecewise-linear curve formed by a collection of points C<(x_{i},y_{i})> in the plane. At the moment, there is no non-uniform interpolator for non-uniform N-dimensional arrays). The first routine, C<interpolate>, allows extrapolation as well as interpolation. It returns a Perl list containing two elements: the interpolated y values, and an error mask that is zero for normal points and 1 for points whose x value was outside the original bounds. You can ignore the error mask by using C<interpolate> in scalar context. For example: pdl> $coords = pdl([1,5],[1.1,6],[2,7],[2.5,0]); pdl> $x = xvals(10)*0.25 + 1; pdl> p $x [1 1.25 1.5 1.75 2 2.25 2 2.75 3 3.25] pdl> ($y, $err) = $x->interpolate($coords->((0)), $coords->((1))); pdl> print $y; [5 6.16667 6.44444 6.72222 7 3.5 0 -3.5 -7 -10.5] pdl> print $err; [0 0 0 0 0 0 0 1 1 1] If you don't want to extrapolate, you can use C<interpol> instead; C<interpol> does the same type of interpolation, but crashes if a value is out of bounds. It only returns a single value - the vector of interpolated y values.