#80 better rle() output size handling and documentation

Chris Marshall

To support PDL threading, the rle() routine returns its results with the same size as the input piddle. That makes threaded computations work, at the expense of extra memory usage. In fact, the rld() routine takes care to allocate the maximum possible size so that the threaded output can hold all of the rle-encoded input data.

Because rle() has a fixed output size, the common use case of processing a 1-D piddle requires an extra step to determine where the valid rle() output data ends. This is non-intuitive and a bit of a pain for 1-D data. In addition, the docs don't clearly state how to calculate the number of good elements.
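To make the extra step concrete, here is a minimal sketch in plain Python (not PDL code) that models the fixed-size behaviour on an ordinary list. It assumes, as I understand the PDL convention, that unused output slots hold a zero count, so the valid data ends at the first zero. The names rle_padded and valid_length are hypothetical and exist only for this illustration.

```python
def rle_padded(values):
    """Run-length encode a list, padding both outputs to len(values).

    Models the fixed output size described above; the zero padding is an
    assumption about how unused slots are marked.
    """
    counts, uniques = [], []
    for v in values:
        if uniques and uniques[-1] == v:
            counts[-1] += 1          # extend the current run
        else:
            uniques.append(v)        # start a new run
            counts.append(1)
    pad = len(values) - len(counts)
    return counts + [0] * pad, uniques + [0] * pad


def valid_length(counts):
    """Return the number of entries before the first zero count."""
    for i, c in enumerate(counts):
        if c == 0:
            return i
    return len(counts)


counts, uniques = rle_padded([5, 5, 5, 2, 2, 7])
n_valid = valid_length(counts)       # the "extra step" for 1-D data
print(counts[:n_valid], uniques[:n_valid])
```

The caller has to compute n_valid and slice both outputs before the result is usable, which is the step the current docs leave unexplained.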

If rle() by default returned the smallest possible output dimensions, truncating to the maximum number of runs (at most n) found over the thread dimensions, then the obvious 1-D example would behave more intuitively.
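The proposed truncation could look roughly like the following plain-Python model, where each row of a list of lists stands in for one slice along the thread dimension. This is only a sketch of the idea under that assumption, not the PDL implementation; rle_rows_truncated is a hypothetical name, and zero padding for shorter rows is assumed.

```python
def rle_rows_truncated(rows):
    """Run-length encode each row, then pad only to the longest encoding."""
    encoded = []
    for row in rows:
        counts, uniques = [], []
        for v in row:
            if uniques and uniques[-1] == v:
                counts[-1] += 1      # extend the current run
            else:
                uniques.append(v)    # start a new run
                counts.append(1)
        encoded.append((counts, uniques))

    # Output width is the largest number of runs over all rows, never more
    # than the input row length.
    width = max((len(c) for c, _ in encoded), default=0)
    counts_out = [c + [0] * (width - len(c)) for c, _ in encoded]
    uniques_out = [u + [0] * (width - len(u)) for _, u in encoded]
    return counts_out, uniques_out


# Threaded case: rows with different run counts share one output width.
print(rle_rows_truncated([[1, 1, 1, 1], [1, 2, 2, 3]]))
# 1-D case: no padding remains, so no extra step is needed.
print(rle_rows_truncated([[5, 5, 5, 2, 2, 7]]))
```

With a single row the output carries no padding at all, which is the more intuitive 1-D behaviour requested here. Rows with fewer runs than the widest row still need the zero-count check.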