=head1 Slicing, Dicing and Threading dims with PDL Fundamental to any vectorised data language such as PDL is the ability to manipulate subsets of data in convenient ways. PDL provides the facilities to change the size and dimensionality of data, to take contiguous and non-contiguous subsections of data along dimensions and to take completely arbitrary subsets of data meeting arbitrary criteria. A key powerful feature is the ability to manipulate these subsets of data, and if desired to propagate these changes back to the original data B<automatically>. This includes passing data to user-written subroutines, which may call standard external C code, which do not know or care about whether the data is a subset or not. That sounds pretty abstract - but here is a concrete example: with PDL one could for example select all the pixels in an image greater than a certain value or meeting some other condition. This might serve to isolate a bright star or galaxy. One could then pass the pixel values and their locations to a photometry subroutine (which is just written to work on data arrays not caring whether it is a subset or not) which would fit the pixels with some model and replace them in the array. These changed pixels would then be automatically changed in the original image. This sort of abstraction is extremely powerful as it allows for very concise and clear code. We'll start by looking at the simplest operations to extract simple slices of piddles, and look at increasingly more complex kinds of slices. =head2 Finding piddle dimensions. PDL data arrays can take arbitrary sizes and dimensions. Finding the current dimensions is straight-forward with the C<dims> function which returns a list: $data = zeroes(100,20,3); print dims($data); ($nx, $ny, $nz) = dims($data); The number of elements in a piddle is equally easy: print nelem($data); =head2 The C<slice> function - regular subsets along axes Earlier we saw how to extract a rectangular subset of a piddle: $section = $gal(337:357,178:198); The piddle C<$gal> was a 2D image, we used array syntax (compliments of C<PDL::NiceSlice>) to extract a contiguous subset ranging from pixel 337 to 357 along the first dimension, and 178 through 198 along the second. Behind the scenes, this is implemented by the C<slice> function. Use the on-line documentation: perldl> help slice to explore the full set of options. C<slice> is probably the most frequently used PDL function so we will explore it in some detail. But first we notice that C<slice> is implemented via a named function. Through the magic of C<PDL::NiceSlice> and source filtering you can access C<slice> functionality in a form very similar to the vector array syntax found in many array computation languages such as FORTRAN-90 and Matlab. =head3 The basic slicing specification. The argument syntax to C< slice > is just a list of ranges, the simplest if of the form C< A:B > to specify the start and end pixels. This generalises to arbitrary dimensions; $data = zeroes(1000); $sec = $data(0:20); $data = zeroes(100,100,20); $sec = $data(0:20,40:60,1:3); Note that PDL, just like Perl and C, uses B<ZERO OFFSET> arrays. i.e. the first element is numbered 0, not 1. Just like Perl you can use C<-N> to refer to the last elements: $data = zeroes(1000); $sec = $data(-10:-1); # Elements 990 to 999 (last) One can also specify a step in the slice using the form C< A:B:C > where C< C > is the step. Here is an example: perldl> $x = sequence(24); # Create a piddle of increasing value perldl> print $x [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23] perldl> print $x(16:22:2) [16 18 20 22] Quite often one wants all the elements along many of the dimensions, one can just use ` C< : > ' or just omit the specifier altogether: perldl> $a = zeroes(10,20,3) perldl> print dims $a(:,5:10,:) 10 6 3 perldl> print dims $a(,5:10,) 10 6 3 Omitting the range allows specification of just one index along the dimension: $z = zeroes 100,200; $col = $z(42,:); # Column 42 (Dims = 1x200) $row = $z(:,42); # Row 42 (Dims = 100x1) Since the second argument to slice is just a Perl character string it is easy to manipulate: $x1 = 2; $x2 = 42; $sec = $data($x1:$x2); =head3 Modifying slices. Here's the biggy: perldl> $x = sequence(24); perldl> print $x; [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23] perldl> $slice = $x(4:20:2); perldl> print $slice; [4 6 8 10 12 14 16 18 20] All very well. But now we modify the slice using the assignment operator. perldl> $slice .= 0; perldl> print $slice; [0 0 0 0 0 0 0 0 0] perldl> print $x; [0 1 2 3 0 5 0 7 0 9 0 11 0 13 0 15 0 17 0 19 0 21 22 23 24] Modifying the slice automatically modfies the original data! However it is done ( C<$slice++> etc. work just as well). All the PDL slicing and dicing functions work this way, from the simplest rectangular slices to the most complex conditional slices. This is because they use a fundamental PDL feature known as B<dataflow>. =head3 Does a slice consume memory? What if we have a big array and make a slice of most of it: $x = zeroes (2000,2000); $slice = $x(10:1990,10:1990); $slice++; If you monitor the memory consumed by the PDL process on your computer (UNIX/Linux users can try the C<top> command) you will see that the amount of memory consumed does not go up - B< even when the slice is modified > . This is because the way PDL is written allows many of the simple operations on slices to be optimised - i.e. a temporary internal copy of the slice need not be made. Of course sometimes - for example when passing to an external subroutine - this optimisation is not possible. But the book-keeping of propagating the changes back to the original piddle is handled automatically. =head3 Advanced slice syntax C<slice> has some advanced syntactical features which allow dimensions to be inserted or removed (this comes in quite useful when passing 2D arrays to functions expecting 1D arguments and visa-versa, this comes in extremely useful when using PDL's advanced B<threading> features (see I<PDL threading and the signature> later. If a dimension is of size unity it can be removed using C< () >: $z = zeroes 100,30; $col = $z(42,:); # Column 42 - 2D (Dims = 1x30) $col = $z((42),:); # Column 42 - 1D (Dims = 30) And then one can put them back again using ` C< * > ': $col2 = $col(*,:,*); # Dims now = 1x30x1 This can even be used to insert more than one element along the dimension: $t = $z(:,*3,:); # Dims now 100x3x30 This sort of thing is very useful for advanced threading trickery. =head3 PDL's Method notation At this point we would like to introduce the varied notations for calling C< slice > and it's friends. This is because it will be commonly seen in PDL code and is very handy. While at first unfamilar to C and FORTRAN users it is not rocket science, PDL users will quickly become used to it. As we mentioned in Chapter 2 piddles are implemented as Perl objects. Objects can have their own personal functions, known as methods. The difference between a method and a function is that a method can only be used on the class of object it belongs too. And methods have a new notation for calling them. This means names (which can get in short suppply) can be re-used for different objects. Many of PDL's functions are available as methods too, in fact once you started using the more advanced features you will find that many of them are only available as methods. (PDL by default defines a lot of functions, which while useful do clutter Perl's namespace, at some point we had to stop!). For example here are 3 different ways of calling C< slice > : $t = slice($z,":,*3,:"); # Function call (old style) $t = slice $z,":,*3,:"; # Function call (old style) $t = $z->slice(":,*3,:"); # Method call (old style) $t = $z(:,*3,:); # Vector syntax (NiceSlice style) $t = $z->(:,*3,:); # Method call (NiceSlice style) The C<PDL::NiceSlice> style vector syntax is the most concise and readable. The method call syntax (either old style or C<PDL::NiceSlice> style) is also readable. You may need to understand the original slicing syntax to understand legacy PDL codes and for the cases where C<PDL::NiceSlice> syntax can not or is not used. See the on-line docs for C<PDL::NiceSlice> for details. =head2 The C<dice> and C<dice_axis> functions - irregular subsets along axes As well as take regular slices along axes via the C< slice > function, another common requirement is to take B< irregular slices > , by which we mean a list of arbitrary coordinates. This operation is referred to in PDL as B< dicing > a piddle. The C< dice_axis > function performs a dice along a specified axis: $a = sequence(10,20); $b = dice_axis $a, 0, [3,7,9]; # Dice along axis 0 $b .= 42; # Alters columns [3,7,9]; print $b; For a 2D piddle dicing along axis 0 selects columns, dicing along axis 1 selects rows. In general in N-dimensions dicing along a given axis reduces the number of elements along that axis, but the number of dimensions remains unchanged. The C< dice > function allows all axes to be specified at once: $z = zeroes 10,20,50; print dims dice $z,[2,3,5],[10,11,12],[30..35,39,40]; The list of axes in the dice can be specified using Perl's ` C< [] > ' list reference notation or using a 1D piddle: $z = sequence 10,20; $dice = long(random(10)*10); # Select random columns $sel = $z->dice_axis(0,$dice); =head2 Using C<mv>, C<xchg> and C<reorder> - transposing dimensions We saw earlier how arguments to C<slice> can be used to add and remove dimensions. More sophisticated tricks can be performed with a whole suite of PDL methods. C<xchg> simply swaps two dimensions: $z = zeroes(3,4); $t = $z->xchg(0,1); # Axes 0 and 1 swapped, dims now = 4,3 This is a simple matrix transpose. The method C<< $z->transpose >> and the equivalent operator C<~$z> also do this, though they also make a copy (i.e. return a new piddle) not a slice and can operate on 1D piddles (i.e. convert a row vector into a column vector). Somethimes this is what you want. C<xchg> works like C<slice> and C<dice> - changes affect the original. Also C<xchg> generalises to N-dimensions: $z = zeroes(3,4,5,6,7); $t = $z->xchg(1,3); # Dims now 3,6,5,4,7 A different way of switching dimensions around is provided by C< $z->mv(A,B) > which justs moves the axis C< A > to posiition C< B > : $z = zeroes(3,4,5,6,7); $t = $z->mv(1,3); # Dims now 3,5,6,4,7 Finally one can completely re-order dimensions: $z = zeroes(3,4,5,6,7); $t = $z->reorder(4,3,0,2,1); # Dims now 7,6,3,5,4 Note C<reorder> is our first example of a pure PDL method - it does not exist as a function and can only be called using the C<< $z->reorder(...) >> syntax. =head2 Combining dimensions with C<clump> We've now seen a whole slew of functions for changing the ordering of dimensions. It is now time to look at some more complicated operations. The first of these is something we have already seen in Chapter 1. This is the C<clump> function for combining dimensions together. Suppose we have a 3-D datacube piddle: perldl> $a = xvals(5,3,2); perldl> print $a; [ [ [0 1 2 3 4] [0 1 2 3 4] [0 1 2 3 4] ] [ [0 1 2 3 4] [0 1 2 3 4] [0 1 2 3 4] ] ] We have seen before we can apply a 1-D function like C<sumover> to the rows - and using dimension manipulating functions to any of the axes. But say we wanted to sum over the first B<TWO> dimensions? i.e. replace our datacube with a 1-D vector containing the sums of each plane. What we need to do is to `clump' the first two dimensions together to make one dimension, and then use C<sumover>. Surprisingly enough this is what C<clump> does: perldl> $b = $a->clump(2); # Clump first two dimensions together perldl> print $b; [ [0 1 2 3 4 0 1 2 3 4 0 1 2 3 4] [0 1 2 3 4 0 1 2 3 4 0 1 2 3 4] ] perldl> $c = sumover $b; perldl> print $c; [30 30] Now we know about C<mv> it is also easy to sum over the last two dimensions: perldl> print sumover $a->mv(0,2)->clump(2) [0 6 12 18 24] It is also possible using the special form C< clump(-1) > to clump B< all > the dimensions together: perldl> $x = sequence(10,20,30,40); perldl> print dims $x->clump(-1); 240000 perldl> print sumover $x->clump(-1); # Same as sum($x) 28799880000 Uncannily this is almost exactly how the C< sum > function is implemented in PDL. =head2 Adding dimensions with C<dummy> After our first look at threading in Chapter 2 we know how to add a vector to rows of an image: perldl> print $a = pdl([1,0,0],[1,1,0],[1,1,1]); [ [1 0 0] [1 1 0] [1 1 1] ] perldl> print $b = pdl(1,2,3); [1 2 3] perldl> print $a+$b; [ [2 2 3] [2 3 3] [2 3 4] ] But say we wanted to add the vector to the columns. You might think to transpose C<$a> : perldl> print $a->xchg(0,1)+$b; [ [2 3 4] [1 3 4] [1 2 4] ] But the result is the transpose of the desired result. We could of course just transpose the result but a cleaner method is to use C< dummy > to change the dimensions of C<$b> : perldl> print $b->dummy(0); # Result has dims 1x3 [ [1] [2] [3] ] C< dummy > just inserts a `dummy dimension' of size unity at the specified place. C< dummy(0) > put's it at position 0 - i.e. the first dimension. The result is a column vector. Then we easily get what we want: perldl> print $a + $b->dummy(0); [ [2 1 1] [3 3 2] [4 4 4] ] Because of the threading rules the unit dimension makes C< $b > implicitly repeat along axis 0. i.e. it is as if C<< $b->dummy(0) >> B<looked> like: [ [1 1 1] [2 2 2] [3 3 3] ] C< dummy > can also be used to insert a dimension of size >1 with the data B<explicitly> repeating: perldl> print dims $b->dummy(0,10); 10 3 perldl> print $b->dummy(0,10); [ [1 1 1 1 1 1 1 1 1 1] [2 2 2 2 2 2 2 2 2 2] [3 3 3 3 3 3 3 3 3 3] ] =head2 Completely general subsets of data with C< index > , C< which > and C< where> Our look at advanced slicing concludes with a look at completely general subsets, specified using arbitrary conditions. Let's make a piddle of real numbers from 0 to 1: perldl> print $a = sequence(10)/10; [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9] We can make a conditional array whose values are 1 where a condition is true using standard PDL operators. For example for numbers below 0.2 and above 0.9: perldl> print $a<0.25 | $a>0.85; [1 1 1 0 0 0 0 0 0 1] We'll use this as an example of an arbitrary condtion. Using C< which > we can return a piddle containing the positions of the elements which match the condition: perldl> $idx = which($a<0.25 | $a>0.85); print $idx; [0 1 2 9] i.e. elements 0..2 and 9 in the original piddle are the ones we want. We can select these using the C< index > function: perldl> print $a->index($idx); [0 0.1 0.2 0.9] So here we have an arbitrary, non-contiguous slice. However thanks to the magic of PDL we can still modify this as if it was still a more boring kind of slice and have our results affect the original: perldl> $a->index($idx) .= 0; # Set indexed values to zero perldl> print $a; [0 0 0 0.3 0.4 0.5 0.6 0.7 0.8 0] In fact PDL posesses a convenience function called C< where > which actually lets you combine these steps at once: $a = sequence(10)/10; $a->where($a<0.25 | $a>0.85) .= 0; print $a; # Same result as above i.e. we make a subset of values B< where > a certain condition is true. You can of course use index with explicit values: # Increment first and last values $a = sequence(10); $a->index(pdl(0,9))++; What if you had a 2-D array? C< index > is obviously one-dimensional. What happens is an implicit C< clump(-1) > (i.e. the whole array is viewed as 1-D): perldl> $a = sequence(10,2); perldl> $a->index(pdl(0,9)) .= 999; perldl> print $a; [ [999 1 2 3 4 5 6 7 8 9] [ 10 11 12 13 14 15 16 17 18 999] ] You can of course use C< where > too for any number of dimensions: # e.g. make a cube with a sphere of 1's in the middle: $cube = rvals(100,100,100); $tmp = $cube->where($cube<20); $cube .= 0; $tmp .= 1; =head2 PDL threading and signatures Slicing and indexing arbritary subsets of data is certainly a fundamental aspect of any array processing language and PDL is no exeption (as you can tell from the preceding examples). In PDL those functions might be even more important since they are absolutely vital in using PDL B< threading > , the fast (and automagic) vectorised iteration of "elementary operations" over arbitrary slices of multidimensional data. First we have to explain what we mean by B< threading > in the context of PDL, especially since the term B< threading > already has a distinct meaning in computer science that only partly agrees with its usage within PDL. In the following we will explain the general use of PDL threading and highlight the close interplay with those slicing and dicing routines that you have just become familiar with ( C< slice > , C< xchg > , C< mv > , etc). But first things first: what is PDL threading? =head3 Threading B< Threading > already has been working under the hood in many examples we have encountered in previous sections. It allows for fast processing of large amounts of data in a scripting language (such as C< perl >). And just to be sure, PDL B< Threading > ist B< not > quite the same as threading in the computer science sense. Both concepts are related but more about that later. =head3 A simple example As a starting point, we look at one of the PDL projection operators (they make N-1 dimensional piddles from N dim input piddles). So we need some data to try our code on. This time, we use the image of a tiny fluorescent bead that was recorded with a fluorescent microscope: use PDL::IO::Pic; $im = rpic 'beads.jpg'; # image stored in the JPEG format The following code snippet calculates the maxima of all rows of this image C< $im > : $max = $im->maximum; We rewrite this example slightly so that we can see the dimensions of the piddles involved using a little helper routine (see box) to print out the shape of piddles in the course of computations: ($max = $im->pdim('Input')->maximum)->pdim('Result'); that generates the following output: Input Byte D [256,256] Result Byte D [256] \begin_inset Float figure placement H wide false sideways false status open \begin_inset Box Boxed position "c" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 width "110text%" special "none" height "55theight%" height_special "none" status open Since is important to keep track of the dimensions of piddles when using (and especially when introducing) B< PDL threading > we quickly define a shorthand command (a method) that lets us print the dimensions of piddles nicely formatted as needed: { package PDL; sub pdim { # pretty print type+dimensions and # allow for optional string arg my ($this) = @_; print (($#_ > 0 ? $_[1]\t : \begin_inset Quotes eld \begin_inset Quotes erd ). $this->info("%T %D \backslash n")); # use info to print type and dims return $this; }} Two observations: note how we temporarily switched into the package PDL so that pdim can be used as a method on piddles and we made the function return the piddle argument so that it can be seamlessly integrated into method invocation chains: $a->pdim("Dims")->maximum; A small utility routine So let's dissect what has happened. If you look at the documentation of C< maximum > it says This function reduces the dimensionality of a piddle by one by taking the maximum along the 1st dimension. In this respect C< maximum > behaves quite differently from C< max > . C< max > will always return a scalar with a value equal to that of the largest element of a (possibly multidimensional) piddle. C< maximum > , however, is by definition an operation that takes the maximum only over a one-dimensional vector. If the input piddle is of higher dimension this B< elementary operation > is automatically iterated over all one-dimensional subslices of the input piddle And, most imporatantly, this automatic iteration (we call it the B< threadloop >) is implemented as fast optimized C loops. As a convention, these subslices are by default taken along the first dimensions of the input piddle. In our current example the subslices are one-dimensional and therefore taken along the first dimension. All results are placed in an appropriately sized output piddle of N-1 dimension s, one value for each subslice on which the operation was performed. Now it should be no surprise that $im3d = sequence short, 5,10,3; # a 3D image (volume) $max = $im3d->pdim('Input')->maximum; print $max->pdim('Result') . " \n"; generates Input Short D [5,10,3] Result Short D [10,3] [ [ 4 9 14 19 24 29 34 39 44 49] [ 54 59 64 69 74 79 84 89 94 99] [104 109 114 119 124 129 134 139 144 149] ] As expected the above command sequence creates a 2D piddle (size C<[10,3]>) of maxima of all rows of the original volume data. =head3 Why bother? Why should we go through this at length? Quickly you will realize that many more complicated operations can be assembled from the iteration of an elementary operation (that is if you keep reading this chapter). Those elementary operations that ship with the basic PDL distribution make the building blocks for your more complicated real world applications; threading just makes sure it will all happen quickly enough and without too much syntactical effort from your side (you still will have to get your head round the idea). So let's expand our example a little further and postpone the why and how for a small while. =head3 More examples Now suppose we do not want to calculate the maxima along the first dimension but rather along the second (the column maxima). However, we just heard that C< maximum > works by definition on the first dimension. How can we make it do the same thing on the second dimension? Here is where the dimension manipulation routines come in handy: we use a to make a piddle in which the original second dimension has been moved to first place. Guess how that is done: yes, using C<xchg> we get what we want If you check C<pdim>'s output you see how the originally second dimension of size 10 has been moved to the first dimension (step 1->2) and, accordingly, maximum now does its work on all the columns of the original input piddle C<$im3d> (step 3). Again PDL has automatically iterated the elementary functionality of maximum (calculate the maximum of a one-dimensional vector) over all subslices of the data and created an appropriately sized piddle (here of shape C<[5,3]>) to hold the resulting elements. This general scheme works for most PDL functions. For example, let's say you have a stack of images (represented by a 3D piddle) and you want to convolve each image with the same kernel. That's easy. Make sure the image dimensions (x and y) are the leading dimension in your piddle: $convolved = $stack->conv2d($kernel); And if your image stack is organized differently, e.g. the leading dimension is the z dimension, say in a [8,256,256] shaped piddle just use C< mv > to obtain the desired result: $convolved = $stack->mv(0,2)->conv2d($kernel); These (admittedly simple) examples show the general principle: an elementary operation is iterated over subslices of one or several multidimensional piddles. Sometimes the dimensions of the input piddles involved need to be manipulated so that iteration happens as desired (e.g. over the intended subslices) and the result has the intended dimensionality. Formulating your problem at hand in a way that makes use of threading rather than resorting to nested C< for > -loops at the perl level can make the difference between a script that is executed faster than you can type and one that is crawling along and giving you plenty of time to have your long overdue lunch break. =head3 Why threading and why call it B< threading > ? So what are the advantages of relying on threading to perform things you can achieve in perl also with explicit C< for > loops and the C< slice > command? There are several (very good) reasons. The more you use PDL for your daily work the quicker you will appreciate this. Before we get into the details of the why and how let's admit: PDL is by no means the first data language that supports this type of automatical implicit looping facility: the authors have in fact been inspired by several previous data language implementations, most notably C< Yorick > Similar concepts are also implemented in APL and J, although well hidden by a wealth of terminology and notation very different from that of most other conventional computer languages . What we think distinguishes PDL from these previous languages is the consistent support of threading throughout PDL, the tight integration with the PDL preprocessor (dealt with in a separate chapter) and the conceptual interplay with the dimension manipulation routines. The first and most important reason to use B< PDL threading > is simply B< speed > . The alternative to threading are loops at the perl level. That is certainly a viable alternative, however, if we rewrite our maximum routine along these lines a quick benchmark test will prove our point. First of all, here is the code that does the equivalent of C< maximum > on 2D input without using threading sub mymax { # we only cover the case of 2D input my ($pdl) = @_; die "can only deal with 2D input" unless $pdl->getndims == 2; $result = PDL->zeroes($pdl->type,$pdl->getdim(1)); my $tmp; for (my $i=0;$i<$pdl->getdim(1);$i++) { ($tmp = $result->slice("($i)")) .= $pdl->slice(",($i)")->max; } return $result; } We have written it so that C< mymax > can just deal with 2D input piddles. A routine for the general n-dimensional case would have been more involved . Note that we explicitly have to create an output piddle of the desired type and size. By comparison, the corresponding threading routine is much more concise: sub mythreadmax { my ($pdl) = @_; return $pdl->maximum; } In fact, we only wrapped C< maximum > in another subroutine to have the same calling overhead as C< mymax > . We are trying to be fair (even though we are biased). So let's compare the performance of C< mymax > versus C< mythreadmax > . How? Remember that we are using perl, after all, and that there is (almost) always a module that does just what you need. Here and now C< Benchmark > Our benchmarking script looks like this use Benchmark; use PDL; $a = sequence(10,300); timethese(0, { # run each for at least 3 CPU secs 'Perl loops' => '$pl = mymax $a;', 'PDL thread' => '$pt = mythreadmax $a;', }); If we run this script it generates PDL thread: 4 wallclock secs ( 2.48 usr + 0.64 sys = 3.12 CPU) @ 12802.88/s (n=40009) Perl loops: 3 wallclock secs ( 1.80 usr + 1.23 sys = 3.03 CPU) @ 12.86/s (n=39) That proves our point: while the example using threading is executed at a rate of nearly 13,000 per second using explicit loops has brought down the speed to less than 13/s, a very significant difference. Obviously, the difference between threading and explicit looping depends somewhat on the nature of the elementary operation and the piddles in question. The difference becomes most striking the more elementary operations are involved and the faster an individual elementary operation can be performed. The advantage of threading will level off as the time for performing the elementary operation becomes comparable or even greater than that required to execute the explicit looping code. Another distinct advantage becomes apparent when comparing the code required to implement the equivalent of the C< maximum > functionality explicitly in perl code. We have to write extra code to create the right size ouput piddle, explicitly handle dimension sizes, etc. All in all the code is much less concise and also less general. With the requirement to deal with all dimensions, loop limits, etc yourself you increase the probability of introducing errors into your code. When using threading, PDL checks all dimensions for you, makes sure it loops over the correct indices internally and keeps you from having to do the bookkeeping: after all, B< that > is what computers are good at. Even though PDL threading makes your life much easier in one respect by taking care of some of the "messy" details it leaves you with another task: you have to find the places in your algorithm/problem where threading can effectively be used and help to make for speedy execution even when using an (almost inevitably slower) scripting language. But finding such places and making use of these vectorised features is the key to using an array-oriented high level language like PDL successfully. This is what the programmer new to PDL and used to low-level programming has to learn: avoid explicit loops where possible and try to use automatically performed B< thread loops > instead. There is yet another benefit that comes with the threading approach. By looking at places where threading can be efficiently used you are also rethinking your problem in a way so that it can be very effectively parallelize d! The keen reader has probably already observed that those internal automatic loops of elementary operations over subslices do not have to be performed sequentially. use Benchmark; use PDL; $a = sequence(10,300); timethese(0, { # run each for at least 3 CPU secs 'PDL mulithreaded' => '$pl = maximum $a;', 'PDL singlethreaded' => '$a->add_threading_magic(1,10); $pt = maximum $a;', }); The command $a->add_threading_magic(1,10); explicitly tells PDL to use 10 (possibly concurrently run depending on your computer hardware) threads when performing the threadloop over dimension 1. For example, on an Cygwin PC with 2 AMD Opteron CPUs this script yields the following output: =head3 The general case: PDL functions and their signature Having made the case for PDL threading let's study its own messy details. PDL threading is a powerful tool. And as usual you have to pay a price for power: complexity. The general rules for PDL threading can be confusing at first. But there is hope: you can first study the more simple cases and work up to more difficult examples as you go. So let's continue our tour of threading. The first question arises naturally: how can one find out about the dimension of subslices in a elementary operation of a function in PDL? We know from the preceding examples that some PDL functions work on a one-dimensional subvector of the data and generate a zero-dimensional result (a scalar) from each of the processed subslices, for example: C< maximum > , C< minimum > , C< sumover > , C< prodover > , etc. Two-dimensional convolution ( C< conv2d >), on the other hand, consumes a 2D subslice in an elementary operation. But how do we get this information in general for any given function? It is easy: you just have to check the function's B< signature > ! The signature is a string that contains this information in concise symbolic form: it names the parameters of a function and the dimensions of these parameters in an elementary operation. Additionally, it specifies which of these parameters are input parameters and which are output parameters. Finally, for some functions it contains information about special type conversions that are to be performed at run-time. Generally, you can find the signature of a function using the perldl online help system. Just type C< sig <funcname> > at the command prompt, e.g.: perldl> sig maximum Signature: maximum(a(n); [o]c()) The interesting part is the formal argument list in parentheses that follows the function name: a(n); [o]c() This signature states that C< maximum > is a function with two arguments named C< a > and C< c > . Wait a minute: above it seemed that C< maximum > only takes one argument and returns a result! The apparent contradiction is resolved by noting that the formal argument C< c > is flagged with the C< [o] > option identifying C< c > is an output argument. This seems to suggest that we could C< maximum > also call as maximum($im, $result); This is in fact possible and an intended feature of PDL that is useful in B< tight loops > where it helps to avoid unneccesary reallocation of variables (see below). In general, however, we will call functions in the usual way that can be written symbolically as: output_arg_list = function(input_arg_list) or equivalently, using the method notation: output_arg_list = input_piddle_1->function(rest_of_arg_list) The other important information supplied by the signature is the dimensionality of each of these arguments in an elementary operation. Each formal parameter carries this information in a list of formal dimension identifiers enclosed in parentheses. So indeed C< a(n) > marks C< a > as a one-dimensional argument. Additionally, each dimension has a B< named > size in a signature, in this example C< n > . C< c() > has an empty list of dimension sizes: it is declared to be zero-dimensional (a scalar). If piddles that are supplied as runtime arguments to a function have more dimensions than specified for their respective formal arguments in the signature then these dimensions are treated by PDL as B< extra dimensions > and lead to the operation being B< threaded > over the appropriate subslices, just what we have seen in the simple examples above. As mentioned before a higher dimensional piddle can be viewed as an array (again B< not > in the perl array sense) of lower dimensional subslices. Anybody who has ever worked with matrix algebra will be familiar with the concept. For some of the following examples it will be useful to illustrate this concept in somewhat more detail. Let's make a piddle first, a simple 3D piddle: $pdl = sequence(3,4,5); A boring piddle, you say? Yes, boring, but simple enough to clearly see what is going on in the following. First we look at it as a 3D array of 0D subslices. Since we know the syntax of the C< slice > method already we can write down all 0D subslices, no problem: $pdl->slice("($i),($j),($k)"); Well, obviously we have not written down all 3*4*5 = 60 subslices literally but rather in a more concise way. It is understood that C< $i > can have any value between 0 and 2, C< $j > between 0 and 3 and C< $k > between 0 and 4. To emphasize this we sometimes write $pdl->slice("($i),($j),($k)") $i=0..2; $j=0..3; $k=0..4 With the meaning as above (and '..' C< not > meaning the perl list operator). In that way we enumerate all the sublices. Quite analogously, when dealing with an elementary operation that consumes 1D slices we want to view C< $pdl > as an C< [ > 4,5 C< ] > array of 1D subslices: $pdl->slice(":,($i),($j)") $i=0..3; $j=0..4 And similarly, as a C< [ > 5 C< ] > array of 2D subslices $pdl->slice(":,:,($i)") $i=0..4 You see how we just insert a ':' for each complete dimension we include in the subslice. In fig. XXX the situation is illustrated graphically for a 2D piddle. Depending on the dimensions involved in an elementary operation we therefore often group the dimensions (what we call the B< shape >) of a piddle in a form that suggests the interpretation as an array of subslices. For example, given our 3D piddle above that has a shape C< [ > 3,4,5 C< ] > we have the following different interpretations: ()[3,4,5] a 3,4,5-array of 0D slices (3)[4,5] a 4,5-array of 1D slices (of shape [3]) (3,4)[5] a 5-array of 2D slices (of shape [3,4]) (3,4,5)[] a 0D array of 3D slices (of shape [3,4,5]) The dimensions in parentheses suggest that these are used in the elementary operation (mimicking the signature syntax); in the context of threading we call these the B< elementary dimensions > . The following group of dimensions in rectangular brackets are the B< extra dimensions > . Conversely, given the elementary/extra dims notation we can easily obtain the shape of the underlying piddle by appending the extradims to the elementary dims. For example, a C< [ > 3,6,1 C< ] > array of 2D subslices (3,4) (3,4)[3,6,1] identifies our piddle's shape as C< [ > 3,4,3,6,1 C< ] > Alright, the principles are simple. But nothing is better than a few examples. Again a typical imaging processing task is our starting point. We want to convert a colour image to greyscale. The input image is represented as a two-dimensional array of triples of RGB colour coordinates, or in other words, a piddle of shape C< [3,n,m] > . Without delving too deeply into the details of digital colour representation it suffices to note that commonly a grey value B< i > corresponding to a colour represented by a triple of red, green and blue intensities B< (r,g,b) > is obtained as a weighted sum: \begin_inset Formula \[ i=\frac{77}{256}r+\frac{150}{256}g+\frac{29}{256}b\] A straight forward way to compute this weighted sum in PDL uses the C< inner > function. This function implements the well-known B< inner product > between two vectors. In a elementary operation C< inner > computes the sum of the element-by-element product of two one-dimensional subslices (vectors) of equal length: \begin_inset Formula \[ c=\sum_{i=0}^{n-1}a_{i}b_{i}\] Now you should already be able to guess C< inner > 's signature: perldl> sig inner Signature: inner(a(n); b(n); [o]c()) C< a(n); b(n); [o]c(); > : two one-dimensional input parameters C< a(n) > and C< b(n) > and a scalar output parameter C< c() > . Since C< a > and C< b > both have the same named dimension size C< n > the corresponding dimension sizes of the actual arguments will have to match at runtime (which will be checked by PDL!). We demonstrate the computation starting with a colour triple that produces a sort of yellow/orange on an RGB display: $yel = byte [255, 214, 0]; # a yellowish pixel $conv = float([77,150,29])/256; # conversion factor $i = inner($yel,$conv)->byte; # compute and convert to byte print "$i \backslash n"; 202 Now threading makes extending this example to a whole RGB image very straightforward: use PDL::IO::Pic; # IO for popular image formats $im = rpic 'pdllogo.jpg'; # a colour image from the book dataset $grey = inner($im->pdim('COLOR'),$conv); # threaded inner product over all pixels $gb = $grey->byte; # back to byte type COLOR Byte D [3,500,300] The code needs no modification! Let us analyze what is going on. We know that C< $conv > has just the required number of dimensions (namely one of size 3). So this argument doesn't require PDL to perform threading. However, the first argument C< $im > has two B< extra dimensions > (shape C< [ > 500,300 C< ] >). In this case threading works (as you would probably expect) by iterating the inner product over the combination of all 1D subslices of C< $im > with the one and only subslice of C< $conv > creating a resulting piddle (the greyscale image) that is made up of all results of these elementary operations: a 500x300 array of scalars, or in other words, a 2D piddle of shape C< [ > 500,300 C< ] > . We can more concisely express what we have said in words above in our new way to split piddle arguments in elementary dimensions and extra dimensions. At the top we write C< inner > 's signature and at the bottom the C< slice > expressions that show the subslices involved in each elementary operation: Piddles $im $conv $grey Signature a(n); b(n); [o]c() Dims (3)[500,300] (3)[] ()[500,300] Slices ":,($i),($j)" ":" "($i)($j)" Remember that the slice notation at the bottom does not mean that you have to generate all these slices yourself. It rather tells you which subslices are used in a elementary operation. It is a way to keep track what is going on behind the scenes when PDL threading is at work. Threading makes it possible that we can call the greyscale conversion with piddles representing just one RGB pixel (shape C< [ > 3 C< ] >), a line of RGB pixels (shape C< [ > 3,n C< ] >), RGB images (shape C< [ > 3,m,n C< ] >), volumes of RGB data (shape C< [ > 3,m,n,o C< ] >), etc. All we have to do is wrap the code above into a small subroutine that also does some type conversion to complete it: sub rgbtogr { my ($im) = @_; my $conv = float([77,150,29])/256; # conversion factor my $grey = inner $im, $conv; return $grey->convert($im->type); # convert to original type } =head3 You can write your own threading routines Did you notice? By writing this little routine we have created a new function with its own signature that will thread as appropriate. It has B<inherited> the ability to thread from C<inner>. So what is the signature of C<rgbtogr>? It is nowhere written explicitly and we can't use the C<sig> function to find out about it C<sig> will only know about functions that were created using C<PDL::PP> or if we explicitly specified the signature in the PDL documentation but from the properties of C<inner> and the definition of C<rgbtogr> we can work it out. As input it takes piddles with a size of the first dimension of 3 and returns for each of the 1D subslices a 0D result (the greyvalue). In other words, the signature is a(tri = 3); [o] b() There is some new syntax in this signature that we haven't seen before: writing C< tri = 3 > signifies that in a elementary operation C< rgbtogr > will work on 1D subslices (we have encountered this before); additionally, the size of the first dimension (named suggestively C< tri >) B< must > be three. You get the idea. What we have just seen is worth keeping in mind! By using PDL functions in our own subroutines we can make new functions with the ability to thread over subslices. Obviously, this is useful. We will come back to this feature when we talk about other ways of defining threading functions using PDL::PP below. =head3 Matching threading dimensions After this small digression, back to the subject at hand: what happens when both piddle arguments have extra dimensions? Well, the extra dimensions have to match. Otherwise we wouldn't know how to sensibly pair the subslices, right? So when do extra dimensions match? It is quite simple: corresponding extra dimensions have to have the same size in both piddle arguments. Corresponding extra dimensions are those that occur in both piddles. However, one piddle can have more extra dimensions than the other without causing a mismatch. That sounds strange? Ok, here is an example. We use one of the fundamental arithmetic operations in PDL, addition implemente d by the ' C< + > ' operator. You know already that in an array-oriented language like PDL addition is performed element-by-element on scalars. So the signature of ' C< + > ' comes as no surprise a(); b(); [o] c() two scalars are summed to yield a scalar result. And when we use higher dimensional piddles in an addition this elementary operation is performed over all 0D subslices, as before. So let's go through a few cases. First make some simple piddles $a = pdl [1,2,3]; $b = pdl [1,1,1]; $c = ones 3,2; $d = pdl [3,4]; print $a + $b, "\n"; No big deal. Extradims for both piddles have shape C< [ > 3 C< ] > obviously matching, resulting in [2 3 4] Next, print $a + $c; [ [2 3 4] [2 3 4] ] Alright, this probably is exactly what you expected but let us go through our new terminology and check that we can formally agree with what we intuitive ly expected anyway. C< $a > 's B< extradim > (s) has shape C< [ > 3 C< ] > , those of C< $c > shape C< [ > 3 2 C< ] > . The B< corresponding > B< extradim > (s) in this case is just the first one for the piddles involved. It is equal to 3 in both input piddles, so clearly matches. $a $c a(); b(); [o] c() ()[3] ()[3,2] ????? Now, here is something we have not explicitly discussed yet: what is the shape of the automatically created output piddle given the shape of the extradims of the input piddles involved? Well, the result is created so that it has as many extradims as that input B< piddle > (s) with the most extradims. Additionally, the shape will match that of the input piddles. In our current example that leaves us with a result with extradim shape C< [ > 3,2 C< ] > : [o] c() ()[3,2] Remembering that we obtain the shape of the output piddle by appending the shapes of the extradims to that of the elementary dimensions (here a scalar, i.e. 0D) that leaves us with a result piddle of shape C< [ > 3,2 C< ] > . In the next example we want to multiply C< $c > with C< $d > so that each row of C< $c > is multiplied by the corresponding element of C< $d > or expressed in slices $result->slice("($i),($j)") = $c->slice("($i),($j)") * $d->slice("($j)") $i=0..2, $j=0..1 How do we achieve that by threading? $result = $c*$d is not the right way. Why? Well, the extradims don't match, C< [ > 3,2 C< ] > does not match C< [ > 2 C< ] > since 2 is not equal to 3. Just to see how PDL checks this let us actually execute the command. The slightly obscure error message is something like this PDL: PDL::Ops::mult(a,b,c): Parameter 'b' PDL: Mismatched implicit thread dimension 0: should be 3, is 2 Caught at file (eval 344), line 4, pkg main This is PDL's way to tell you that the extra dimensions don't match. So how do we do it? We use one of the dimension manipulation methods again. This time C< dummy > comes in handy. We want to multiply each element in the nth row of C< $c > with the nth element of C< $d > . So we have to repeat each element of C< $d > as many times as there are elements in each row of C< $c > . This is exactly what we can achieve by inserting a dummy dimension of size $c-> B< getdim > (0) as dimension 0 of C< $d > : perldl> print $d->dummy(0,$c->getdim(0))->pdim("New dims"); New dims Double D [3,2] [ [3 3 3] [4 4 4] ] Using this trick we have a our threaded multiplication do what we want. And now the extra dimensions B< match > (!): $result = $c * $d->dummy(0,$c->getdim(0)); print $result; Using our symbolic way of writing down the slices that are paired in a elementar y operation we can see that we achieve what we wanted $c $d->dummy(0,$c->getdim(0)) $result "($i),(j)" "($i),($j)" "($i),($j)" But hang on, we want to verify (somewhat formally) that the right subslices of the original C< $d > are used in each elementary operation. That is easily achieved by noting that the slice "($i),($j)" of the dummied C< $d > is equivalent to the subslice "($j)" of the original 1D piddle C< $d > . So we finally arrive at $c $d $result "($i),(j)" "($j)" "($i),($j)" While this kind of analysis seems probably not justified when dealing with such a simple example it comes in very handy when looking at more complex threaded code. But before we try our understanding on such an example we look once again at the way extra dims have to match in a thread loop. In the previous example, we had to find out about the size of C< $c > 's first dimension (using C< getdim(0) >) to make a dummy dimension that would fit C< $c > 's extradims in the threaded multiplication. Since similar situations occur very often when writing threaded PDL code the matching rules for extra dimensions allow a dimension size of 1 to match any other dimension size: it is the B< elastic > dimension size in a sense that it grows in a thread loop as required. As in the thread loop the corresponding extra dimension is marched through all its positions (e.g. C< slice(":,($i)") $i=0..n-1 >) the elastic dimension just uses its one and only position 0 repeatedly ( C< slice(":,(0)") $i=0..n-1 >). Therefore, an equivalent and more concise way to write the threaded multiplicat ions makes use of this and the fact that a dummy dimension of size 1 is created by default if the second argument is omitted (see C< help dummy >) print $c->pdim('c') * $d->dummy(0)->pdim('dd'); $A [l,m] $B [n,l] $AB [n,m] $AB = inner $A->dummy(1), $B->xchg(0,1) $A->dummy(1) $B->xchg(0,1) $AB (l)[1,m] (l)[n] ()[n,m] ":,(0),($j)" ":,($i)" "($i)($j)" Going back to the original piddles C<$A> and C<$B> we see that the slice expressions change to $A $B $AB ":,($j)" "($i),:" "($i),($j)" and that means $AB->slice("($i),($j)") = inner $A->slice(":,($j)"), $B->slice("($i),:") $i=0..n-1, $j=0..m-1 and that is exactly the definition of the matrix product as we explained above! Our bit of formalism has sort of "proved" it. You see that the slice and dimension matching formalism we developed can really be helpful when you try to verify that your complicated threading expression does what you want it to do. However, as you get more experience with threading we strongly suspect that you don't need this any more; you will rather develop a much better "feeling" how to write down the right combination of dimension manipulations to achieve the desired result in a thread loop.