Now that we have parallel thread distribution, I would
like to see an enhancement to allow full use of the
specified number of processors regardless of the size
of the thread dimension. For example, a thread loop
with an odd size currently never uses multiple CPUs
(unless I misunderstand things).
Doing this correctly would involve appropriate
padding/no-ops for the "missing" values. One
possibility would be a special slice operation
that returns a virtual padded pdl.
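
To make the padding idea concrete, here is a rough sketch in Python rather than PDL. It is not how PDL distributes work, and `padded_chunks` is a hypothetical helper invented for this illustration. It assumes each worker gets the same number of slots (the ceiling of size / workers), with slots past the end of the dimension marked `None` so the worker treats them as no-ops.

```python
def padded_chunks(size, workers):
    """Split range(size) into `workers` equal-length slot lists.

    Hypothetical helper: slots that fall past the end of the
    dimension are filled with None, standing in for a no-op.
    """
    if size < 0 or workers < 1:
        raise ValueError("size must be >= 0 and workers >= 1")
    chunk = -(-size // workers)  # ceiling division
    plan = []
    for w in range(workers):
        start = w * chunk
        # Real indices where they exist, None for the padded "missing" values.
        plan.append([i if i < size else None
                     for i in range(start, start + chunk)])
    return plan


def run_worker(slots, fn):
    """Apply fn to each real index in a worker's slots, skipping padding."""
    return [fn(i) for i in slots if i is not None]
```

With a dimension of size 7 and 4 workers, `padded_chunks(7, 4)` gives `[[0, 1], [2, 3], [4, 5], [6, None]]`, so all four workers are used and only one padded slot is wasted. The trade-off is that some sizes leave a worker fully padded, e.g. size 5 with 4 workers.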
Now fixed as of https://github.com/PDLPorters/pdl/commit/aea7a988639908b3c5927477abfa5d92043eeede
In fact, see https://github.com/PDLPorters/pdl/commit/55f344510cc7d3f2fab159e2be84a5bc3326eca1