From: David P G. <gr...@us...> - 2013-06-11 22:27:02
Hi Josh,

In general, I think the reason to have map/reduce/scan operations in the library (as opposed to having the user define them in their application via the normal iteration methods the library provides) is to give efficient, parallel implementations of those operations. In terms of semantics, I think that means the operations should be defined as may-parallel: the implementation may execute some operations in parallel, but may also execute some of them sequentially. This gives the implementation the most flexibility to heuristically pick an appropriate task size that dynamically matches the available resources.

Doing a good job of dividing the work into chunks and executing them in parallel takes some amount of clever code, so it is appealing to write that code once and share it across multiple data structures. The trick, however, is doing that in a way that the abstraction enabling a single map or reduce implementation to work on multiple collection types doesn't introduce so much overhead that the performance gained by clever structuring of the parallel work is lost to the sequential overhead of object allocation, indirection, and interface invocation. My intuition is that a fully general implementation would lose enough performance that we'd end up writing specialized ones for many of the data structures, but I could be wrong. It would be interesting to measure how large the performance gap actually is, and to understand how much of it is fundamental versus how much could be closed by better optimization of X10's sequential object-oriented language features.

For the specific case of Rail, we should bring back map/reduce functions for Rail before we release 2.4. Since Rail is a @NativeRep class, we will probably do that by putting static functions into x10.util.ArrayUtils (or similar) so that we can write them once in X10.

--dave
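The may-parallel semantics with heuristic task-sizing described above can be sketched roughly as follows. This is a Java illustration (X10's own constructs would differ); the class name `SumTask` and the fixed `THRESHOLD` are hypothetical, standing in for whatever adaptive heuristic a real implementation would use:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch of a may-parallel reduction: below a (hypothetical) threshold the
// work runs sequentially; above it, the range is split and the two halves
// may run on different worker threads -- or not, at the runtime's discretion.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1024; // assumed cutoff; a real heuristic would adapt to resources
    final long[] data;
    final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {       // sequential base case
            long acc = 0;
            for (int i = lo; i < hi; i++) acc += data[i];
            return acc;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                      // left half may execute in parallel
        long right = new SumTask(data, mid, hi).compute();
        return left.join() + right;
    }
}

public class MayParallelReduce {
    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 49995000
    }
}
```

Note that the caller cannot observe whether any particular chunk ran in parallel, which is exactly the freedom the may-parallel definition grants the implementation.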
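Static utility functions of the sort proposed for x10.util.ArrayUtils might look like the following sketch (again in Java for illustration; the class name `ArrayUtilsSketch` and these signatures are assumptions, not the actual X10 API):

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Illustrative stand-in for the proposed static map/reduce helpers:
// free-standing generic functions rather than methods on the array type,
// mirroring the plan to keep them outside the @NativeRep class itself.
public class ArrayUtilsSketch {
    // Apply f to each element of a, writing results into out.
    static <T, U> U[] map(T[] a, Function<T, U> f, U[] out) {
        for (int i = 0; i < a.length; i++) out[i] = f.apply(a[i]);
        return out;
    }

    // Fold the elements of a with op, starting from unit.
    static <T> T reduce(T[] a, BinaryOperator<T> op, T unit) {
        T acc = unit;
        for (T x : a) acc = op.apply(acc, x);
        return acc;
    }

    public static void main(String[] args) {
        Integer[] xs = {1, 2, 3, 4};
        Integer[] doubled = map(xs, x -> x * 2, new Integer[xs.length]);
        System.out.println(Arrays.toString(doubled)); // [2, 4, 6, 8]
        System.out.println(reduce(xs, Integer::sum, 0)); // 10
    }
}
```

Writing the operations once as static generic functions is what lets a single X10 source definition serve every element type, at the cost of the indirection overhead discussed above.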