=head1 Selection and Location in PDLs Indexing and manipulating pieces of arrays is central to many PDL operations. Slicing, dicing, and indexing are selection operations -- they select (or extract) subfields from a source array and arrange them for use by other operations. Slicing is the act of selecting affine chunks -- linear or rectangular N-dimensional subfields that are regularly sampled; normal array subfields are called âslices. Dicing is similar but without the affine constraint: selection of an arbitrary set of locations along one or more axes of an array. Indexing is the selection of a completely arbitrary collection of elements from an array. PDL treats selection operators slightly differently from most other array languages. Array selections, including slices, dices, and indexed selections, maintain their connection to the original parent variable unless they are explicitly severed (via the copy or sever operators). This is possible because PDL distinguishes the operations of global assignment (C<=>) and computed assignment (C<.=>) (See Section [sec:Controlling-Dataflow:-copy]). That behavior lets you represent your data multiple ways simultaneously, depending on which form is most convenient. Slicing, dicing, and indexing are so basic to data extraction and manipulation that PDL slightly modifies the syntax of Perl to make these operations more convenient. The modified syntax is called C<NiceSlice> syntax, and you can enable it with the Perl command C<use PDL::NiceSlice>. Slicing syntax and methods are described in detail in Section [sec:Selection-Operators], below. The opposite of selection is location, which generates indices where a particular condition is true in an array. PDL has several location operators, including the unique C<where> operator that selects corresponding elements from related arrays. These operations are described in Section [sec:Location-Operators]. =head2 A quick tour of selection Here is a simple example that illustrates some of the selection and indexing operations in PDL. Consider a color image of a star field: $starfield = rim('starfield.fits'); might read in the star field as a 1000x1000x3 image. Then $subfield=$starfield->(500:599,500:599); is a 100x100x3 subfield of the original image, and $red = $starfield->(:,:,(0)); $blue = $starfield->(:,:,(1)); $green = $starfield->(:,:,(2)); lets you access the individual color planes as 1000x1000 PDLs (the parentheses around the C<(0)>, C<(1)>, and C<(2)> indicate that the final dimension is to be dropped -- without the parentheses you'd get three 1000x1000x1 PDLs). You can then change the color balance, for example, by modifying the red color plane: $red *= 2; will affect not just the variable C<$red>, but also the original C<$starfield> too (and C<$subfield> and any other selection you have made from C<$starfield>). The selections are merely different representations of the original data in C<$starfield>. To make a separate PDL you can make an explicit copy in the initial assignment, as in: $red = $starfield->(:,:,(0))->copy; or, after the fact, use the sever method on C<$red> ( [sec:Controlling-Dataflow:-copy] ): $red->sever; $red /= 2; will not affect C<$starfield> or C<$stars>, because sever breaks the connection between C<$red> and its source PDL C<$starfield> even after the initial assignment. If you have a list of star locations as a 2xn PDL (called, say, C<$xylist>), you can extract a subfield around each star all at once: $stars = $starfield->range($xylist - 5, 11, 'truncate'); will return an nx11x11x3 PDL that contains an 11x11-pixel subfield centered around each star. That is handy if you want to do the same thing to the neighborhood around each star -- for example, $stars->mv(0,3) *= rvals(11,11) + 0.1; will amplify the tail of the brightness distribution around each star: the C<mv(0,3)> shifts the color-plane index out of the active dimension at the beginning of the dim list, to a thread dimension at the end, making an 11x11x3xn array. The C<rvals> routine creates an 11x11 PDL whose elements contain distance (in pixels) from the center of the image, so the region around each star is amplified far from the central star, and the central star itself is reduced in brightness. The opposite of selection is location. Here's an example of how to use location to generate an C<$xylist> to find all the red stars. $starthresh = 500; $red_simple_xy = indexND( $red >= $starthresh ); That makes C<$red_simple_xy> a C<2xn> list of all the pixel coordinates for which the red color plane exceeds some brightness threshold. One minor problem is that C<$red_simple_xy> may contain multiple entries for a single star, if that star has more than one pixel brighter than the threshold. One solution is to find only local maxima in the image. You can use range to extract the region around each pixel in the entire image, and then use the threading engine to find which pixels are local maxima: $ndc = ndcoords(3,3)-1; $starthresh = 500; $redmax = $red > $starthresh and $red == $red-> range( $ndc, [$red->dims], 't')-> clump(2)-> maximum; $red_xy = indexND($redmax); Here, C<ndcoords> returns a C<2x3x3> index array, each row of which is a vector containing the coordinates of that row in dims 1 and 2. The C<range> call returns a C<3x3x1000x1000> array; clumping the first two dims yields a C<9x1000x1000> array, which is reduced to a C<1000x1000> array by the maximum call. Thus the right hand side of the C<==> is an image, each pixel of which has the value of the brightest pixel in its C<3x3> neighborhood within C<$red>, so C<$redmax> gets a Boolean image with true pixels wherever C<$red> exceeds the threshold and is also a local maximum. Finally, the C<indexND> operator returns a C<2xn> array containing the locations of all the true pixels in C<$redmax>. =head2 Selection Operators PDL is extremely flexible in its ability to reshape, cut up, reconstruct, and represent data in multiple ways. Most vectorized languages feature a way to cut slices out of a large array and copy them to a new variable; PDL goes one step farther, by allowing you to represent the original data in multiple ways simultaneously. Conceptually, a slice, index, or transpose of an array B<remains attached to the original array> unless you explicitly sever it. That connection is referred to as I<dataflow>, because data flows between the original PDL and its children. The basic slicing syntax in PDL is supplied with the special module C<PDL::NiceSlice>, which modifies the way the Perl compiler parses your script, to add new syntax for slicing. Slicing, dicing (selection of particular rows/columns), indexing (selection of particular elements), and ranging (selection of an arbitrary collection of slices) are all supported. =head3 NiceSlice - array subfield syntax Subfields of a PDL are selected with the C<NiceSlice> operator, which takes two forms: juxtaposed and null method. The juxtaposed syntax looks like this: C<< $a(<slicing-stuff>) >>, while the null method syntax looks like this: C<< $a->(<slicing stuff>) >>. The juxtaposed syntax only works on variables; the null method syntax works on both variables and expressions that return a PDL, as in C<< $a->sumover->(3) >>. The C<< <slicing-stuff> >> is a comma-separated list of slice specifiers, as in C<< $a->(3:5,(4),$b,*2) >>. Each slice specifier indicates what should happen to the corresponding dimension of the output, as follows: =over =item * B< C<n> > - a lone number means that the single corresponding generalized row of C<$a> is used, making this a trivial dim (of size 1). For example, if C<$a> is a 3x4-PDL, then C<< $a->(1) >> is a 1x4-PDL. =item * B< C<(n)> > - a lone number (or single-element PDL) in parentheses means that the single corresponding generalized row of C<$a> is used, but that dimension (which is trivial -- it has a size of just 1) is omitted from the output dim list. For example, if C<$a> is a 3x4-PDL , then C<< $a->(1) >> is a 1x4-PDL and C<< $a->((1)) >> is a 4-PDL. =item * B< C<$pdl> > - a PDL with 1 or more elements uses the corresponding generalized rows of C<$a>, in the same dimensional structure as the PDL. For example, C<< $a=sequence(5); $b=pdl(4,1); print $a->($b); >> prints C<[ 4 1 ]>. =item * B< C<n:m> > - two numbers (or variables) separated by a colon is a range to include from the corresponding dimension. Negative numbers are interpreted modulo the last element, so (e.g.) C<2:-1> grabs everything from the third element to the last one. =item * B< C<n:m:s> > - three numbers separated by two colons is an affine range: the C<s> is a step value, allowing sparse slices through the source PDL. Negative values of C<s> step backwards, so (for example) C<-1:0:-1> reverses the order of the elements along a particular dimension. =item * B< C<:> > - uses the whole corresponding dimension =item * B< C<*n> > - inserts a dummy dimension of the given size. =back =head3 NiceSlice Examples Here are some interactive examples of how to use NiceSlice, in the perldl shell: pdl> $a=xvals(5,4)+10*yvals(5,4); print $a; [ [ 0 1 2 3 4] [10 11 12 13 14] [20 21 22 23 24] [30 31 32 33 34] ] pdl> print $a->(:,2); [ [20 21 22 23 24] ] pdl> print $a->(:,(2)); [20 21 22 23 24] pdl> print $a->(0:-1:2,(0)); [0 2 4] pdl> $a->(0:-1:2,(0)) .= 99; pdl> print $a->(0:-1:2,(0)); [99 99 99] pdl> print $a->(:,(0)); [99 1 99 3 99] pdl> $b = pdl(3,4); print $a->($b,(1)); [13 14] pdl> print $a->((2),(3),*4); [32 32 32 32] B< I< A warning > > Nice slicing is, well, very nice -- but it does have some warts because of how Perl 5 implements language modifications. In particular, if you use the nice slice syntax in any file, script, or perl module, you need to include the command C<use PDL::NiceSlice;> somewhere near the top of the file, to ensure that the file is parsed correctly. The C<PDL::NiceSlice> module will preprocess your code on-the-fly, identify nice slicing syntax, and convert it to a normal Perl method call to the method C<nslice> , before Perl can parse it. This normally works well, but because Perl's quoting syntax is so complicated, C<PDL::NiceSlice> doesn't properly recognize most quote constructs. So saying C<print "myval is $val ($units)\n";> will give you something different than you want. You can avoid that by not using as much string interpolation: C<print "myval is $val (".$units.")";> or C<printf "myval is %s (%s)",$val,$units;> should work fine. You can also shut off nice slicing with C<no PDL::NiceSlice;>, and resume by using it again just after your quote. =head3 Slice - string-controlled subfields of a PDL The C<slice> method works almost exactly like C<NiceSlice>, except that it accepts a single string that contains the arguments. The string should consist of the same arguments that you would pass to C<NiceSlice>, with the exception of PDL indexing. Only numeric values and ranges are accepted. C<slice> was once the main way to create subarrays of PDLs, but once C<NiceSlice> became available it is mainly kept around for legacy reasons. =head3 Dice - pull arbitrary rows from a PDL The C<dice> method performs the function of PDL indexing with C<NiceSlice>: it allows you to pull arbitrary collections of generalized rows from a source PDL. Dicing with C<dice> is deprecated, because the C<NiceSlice> syntax (or even C<slice>) is preferred. =head3 Index - select elements from a 1-D PDL This is used for extracting arbitrary elements from a 1-D PDL. For example: pdl> $a = xvals(100); print $a->index(pdl(43,10,21)); [43 10 21] The counterpart of C<index> is which, which extracts indices from a 1-D PDL wherever a particular condition is met (see [sub:which]). =head3 IndexND - select elements from an N-D PDL You can extract and manipulate an arbitrary collection of elements from an C<N>-dimensional PDL with C<indexND>. C<IndexND> is a reduce operator: it collapses an index PDL by one dimension, using the vector in each row to look up a single value in a source PDL. Each row of the index PDL is treated as a vector that indexes an element of the source PDL, and you get back the collection of locations pointed to by the index. That makes C<indexND> a reduce operator on the index PDL. C<indexND> is handy both for extracting data and for marking the source data set via dataflow: if you have a collection of image coordinates as a C<2xN> PDL, you can assign to the index PDL and mark the original image. C<IndexND> can accept and handle boundary conditions, in case your index might run off the edge of the source PDL - see the write up for C<range>, below, for details. pdl> $a = xvals(5,4)+10*yvals(5,4); print $a; [ [ 0 1 2 3 4] [10 11 12 13 14] [20 21 22 23 24] [30 31 32 33 34] ] pdl> $idx = pdl([[2,3],[4,3]], ..( > [[0,0],[0,1]], ..( > [[0,2],[3,3]], ..( > [[1,3],[0,3]]); pdl> print ($b = $a->indexND($idx)); [ [32 34] [ 0 10] [20 33] [31 30] ] pdl> $b .= 99; print $a; [ [99 1 2 3 4] [99 11 12 13 14] [99 21 22 23 24] [99 99 99 99 99] ] The C<indexND> call returns the elements addressed in each row of C<$idx>. C<$idx> is a C<2x2x4> PDL, so the elements are returned as a C<2x4> PDL. They remain connected to C<$a>, so setting them updates the elements of C<$a>. C<IndexND> is implemented via a convenience interface to the slightly more general range; please read the discussion of range, below, for more information on the limits of C<indexND>. If you want to interpolate values from arbitrary locations, you should look for C<interpND>, which is discussed in Chapter [cha:Basic-mathematics]. =head3 Range - select subfields from an N-D PDL The most general selection operator in PDL is C<range>, which selects an arbitrary collection of subfields from the original PDL and returns them collated in a form suitable for threading. It is useful for interpolation, convolution, averaging, marking arbitrary locations in an original data set, or performing local operations at a set of arbitrary locations in a data set. C<range> works similarly to C<indexND> (above), except that each indexed location can refer not only to a scalar but also to an C<N>-D rectangular subfield of the original source array. This is handy, for example, for vectorizing some types of image processing: it is possible to "stack up" subregions of a large data set for threaded processing by a vectorized algorithm. You call range with a source PDL and an index, just like C<indexND> -- but two optional arguments can follow -- a I<size array>, and a I<boundary condition>: $out = $source->range($index, $size, $boundary); will extract a collection of ranges from C<$index>, and return them in C<$out>. The C<$index> must have at least one dimension, and each row of C<$index> is treated as a single vector pointing at a particular value in C<$source>. If you specify a single index location as a row vector, then range is essentially an expensive slice, with controllable boundary conditions. If C<$index>'s 0th dimension has size higher than the number of dimensions in C<$source>, then C<$source> is treated as though it had trivial dummy dimensions of size 1, up to the required number to be indexed by C<$index> -- so if your source array is 1-D and your index array is a list of 3-vectors, you get B<two> dummy dimensions of size 1 on the end of your source array. B<Range sizes> The C<$size> field allows you to extract C<N>-D rectangular ranges from C<$source>. If C<$size> is undef or zero, then you get a single sample out of C<$source> for each row of C<$index>. This behavior is similar to C<indexND>. If C<$size> is positive then you get a range of values from C<$source> at each location, and the output has extra dimensions allocated for them. C<$size> can be a scalar, in which case it applies to all dimensions, or an N-vector, in which case each element is applied independently to the corresponding dimension in C<$source>. Each element of C<$size> should be non-negative. If an element of C<$size> is positive, then the corresponding output dim is made to have the indicated size. If an element is zero, then the corresponding output dim is omitted entirely. This allows you to distinguish, for example, between a C<3x1x2> output range at each location and a C<3x2> output range at each location (with the last output coordinate running over the third input coordinate). B<Boundary conditions> The C<$boundary> is a number, string, or list ref indicating the type of boundary conditions to use when the extracted ranges reach beyond the boundaries of C<$source>. If you specify no boundary conditions the default is to forbid boundary violations on all axes. If you specify exactly one boundary condition, it applies to all axes. If you specify more (for example, as elements of a list ref), then they apply to dimensions in the order in which they appear, and the last one applies to all subsequent dimensions. =over =item * B<0> or C<"f"> C<"forbid"> (default) Ranges are not allowed to cross the boundary of the original PDL. Disallowed ranges throw an error. The errors are thrown at evaluation time, not at the time of the range call (this is the same behavior as slice). =item * B<1> or C<"t"> C<"truncate> - Values outside the original piddle get the special value BAD if you've got bad value support compiled into your PDL and set the badflag for the source PDL; or 0 if you haven't (you must set the badflag if you want BADs for out-of-bound values, otherwise you get 0). Reverse dataflow works OK for the portion of the child that is in-bounds. The out-of-bounds part of the child is reset to (BAD|0) during each dataflow operation, but execution continues. =item * B<2> or C<"e"> C<"extend> - Values that would be outside the original PDL point instead to the nearest allowed value within the PDL. =item * B<3> or C<"p"> C<"periodic> - Periodic boundary conditions apply: the numbers in C<$index> are applied, strict-modulo the corresponding dimensions of C<$source>. This is equivalent to duplicating the C<$source> piddle throughout C<N>-D space. =item * B<4> or C<"m"> C<"mirror"> - Mirror-reflection periodic boundary conditions apply. =back B<Output Dimensionality> C<range> threads over both C<$index> and C<$source>. The returned dimension list is stacked up like this: (index thread dims), (index dims (size)), (source thread dims) The first few dims of the output correspond to the thread dims of C<$index> (beyond the 0 dim). They allow you to pick out individual ranges from a large, threaded collection, so that the output normally has the same dimensionality as the C<$index>, but collapsed by one dimension. The middle few dims of the output correspond to the size dims specified in C<$size>, and contain the range of values that is extracted at each location in C<$source>. Every nonzero element of C<$size> is copied to the dimension list here, so that if you feed in (for example) C<"$size = [2,0,1]"> you get an index dim list of C<"(2,1)">. The last few dims of the output correspond to extra dims of C<$source> beyond the number of dims indexed by C<$index>. These dims act like ordinary thread dims, because adding more dims to C<$source> just tacks extra dims on the end of the output. Each source thread dim ranges over the entire corresponding dim of C<$source>. B<Examples> Here are basic examples of range operation, showing how to get ranges out of a small matrix. The first few examples show extraction and selection of individual chunks. The last example shows how to mark loci in the original matrix (using dataflow). pdl> $src = 10*xvals(10,5)+yvals(10,5) pdl> print $src->range([2,3]) # Cut out a single element 23 pdl> print $src->range([2,3],1) # Cut out a single 1x1 block [ [23] ] pdl> print $src->range([2,3], [2,1]) # Cut a 2x1 chunk [ [23 33] ] pdl> print $src->range([[2,3]],[2,1]) # Trivial list of 1 chunk [ [ [23] [33] ] ] pdl> print $src->range([[2,3],[0,1]], [2,1]) # two 2x1 chunks [ [ [23 1] [33 11] ] ] pdl> # A 2x2 collection of 2x1 chunks pdl> print $src->range([[[1,1],[2,2]],[[2,3],[0,1]]],[2,1]) [ [ [ [11 22] [23 1] ] [ [21 32] [33 11] ] ] ] =head2 Location Operators Location operators are the opposite of indexing operators: they return the elements or locations where a particular expression is true, allowing you to filter a large array and act on an arbitrary subset of it. PDL's location and filtering operators are C<where>, C<which>, and C<whichND>. Each operator accepts a PDL filter expression, and returns either source elements or index values corresponding to the locations where the filter expression is true. The C<where> operator combines location and selection into a single step: it returns the actual elements of the source PDL, so that you can copy them out or act on them directly (via dataflow, the elements remain connected to the original data). The other two operators return indices of the locations where the source expression is true. Location operators use PDLs to represent sets (the set of elements for which a condition is true). The null set is correctly represented -- if the filtering condition is false everywhere, C<where>, C<which>, and C<whichND> will each return the special I<empty> PDL, which has 0 elements. =head3 The C<where> operator The C<where> operator rolls up the operations of location and selection in a single routine. You can say: $out = $source->where( condition($source) ) to retrieve all (and only) the elements of $out that correspond to true elements of the expression C<condition($source)>. The data remain connected back to the original C<$source>. For example: $source = - (xvals(10) % 2); print "Source is $source.\n"; $zeroes = $source->where( !$source ); $zeroes .= xvals($zeroes); print "Source is now $source.\n"; outputs: Source is [0 -1 0 -1 0 -1 0 -1 0 -1]. Source is now [0 -1 1 -1 2 -1 3 -1 4 -1]. Often, you'd like to select the same collection of elements from several PDLs at once, and where can handle that. For example, if C<$x> and C<$y> contain coordinates of a collection of rocks, and C<$mass> contains the mass of each rock, then you can say: ($xsub, $ysub, $msub) = where($x, $y, $mass, $mass > 10 & $x < 0 ); to select the coordinates and mass of just the rocks with a mass greater than 10, that also happen to be placed to be left of the origin. (Notice that the example uses the bitwise-and operator C<&>, not the more familiar logical-and operator C<&&> - to find out why, check out [sec:PDLs-as-logical], below). =head3 The C<which> operator The simplest indexing function is C<which>. It accepts a PDL expression and returns a list of all the offset locations where the expression is true. If the source expression has more than one dimension, then it is B<flattened> to one dimension first using C<flat> ([sub:Rearranging-a-dim]) so that it can be indexed with a single number. Note that this may or may not be a useful way to address thread dimensions, depending on your application. You can use the returned index list either in C<index> or in a C<NiceSlice> expression, to get access to the elements. For example: $dex = which( $source==5 ); $fives = $source->($dex); # niceslice $fives = $source->index($dex); # index =head3 The C<whichND> operator For any kind of indexing that is more sophisticated than C<which>, you probably want C<whichND>, which returns a collection of vectors into the source expression, rather than simple offsets. If you call C<whichND> in scalar context, you get back an C<nxm> PDL whose 0 dim runs across dimension in the vectors and whose 1 dim runs across found locations. If you call it in array context, you get back C<n> separate C<m>-PDLs, each of which contains one dimension of the entire list of vectors. The scalar output of C<whichND> is suitable for use with C<indexND> and range. Here is an example of using C<whichND> to extract purple areas from an RGB image. The demo PDL logo is a JPEG image, with pixel values running from 0-255. We first generate a mask expression that is true when the red and blue components are over 40 and 100, respectively, and green is under 70 -- the purple portion of the RGB palette. The C<whichND> operator yields the coordinates of all nonzero pixels in the mask. pdl> #wget http://pdl.perl.org/images/pdllogo.jpg pdl> $a=rim('pdllogo.jpg'); # $a is 250(X) x 150(Y) x 3(RGB) pdl> $mask = andover ( ($a->mv(2,0) * pdl(1,-1,1)) > pdl(40,-70,100) ); pdl> $coords = whichND($mask); Now C<$pixels> is a C<2x4298> PDL containing the coordinates of every purple pixel in the image. After extracting them with C<indexND>, it is possible to change the original image: pdl> $pixels = $a->indexND($coords) # 4298 x 3 pdl> $pixels .= pdl(10,40,200)->(*1); # flows to $a pdl> wim($a,'pdllogo-blue.jpg'); [float Figure: [Figure 3.1: whichND example. Left: original image. Right: after pixel selection and processing with whichND and indexND ([sub:whichND] ). ]