From: Josef S. <js...@ya...> - 2006-08-29 09:34:25
Hi Greg,

--- Greg Hood <gh...@ps...> wrote:
> Upi,
> I have been looking at the MOOSE code, and thinking about certain
> issues involved in parallelizing it, and have some serious concerns.
>
> The greatest concern I have is with the many places in the basecode
> that make an implicit assumption that elements are locally resident
> in the node's memory, and that only one thread will be actively
> modifying them. For example, if the elements are distributed over
> many nodes, then Element::relativeFind() will potentially require
> information on 2 or more nodes. This will cause the code to block
> for indefinite periods of time while the interprocess communication
> is performed and the remote nodes do what they need to do. The
> simplest way of dealing with this would be to allow only one active
> thread over the entire set of nodes on which MOOSE is running.
> However, this would be disastrous in terms of performance -- network
> setup would be much slower than doing it on a single node. If we
> allow multiple active threads on each node to avoid the performance
> hit, then every method that directly or indirectly calls one of
> these methods requiring off-node information will potentially block.
> While this occurs, incoming requests from other nodes must be
> handled, and some of those may involve the Element in question.
> Some form of locking will thus be needed (probably on a per-Element
> basis). The difficult thing is that each of the places in the code
> where a potentially blocking call will occur will have to release
> the Element lock, and must leave the Element (as well as any kernel
> data structures) in a safe and consistent state. I can't see this
> being done without rewriting many sections of code. The most
> troublesome situations will be when modifications are being made to
> the element tree, such as when new elements are being created or old
> ones destroyed.
> Once the network is set up, things may not be so bad, but the
> network needs to get set up in order to run it.

MOOSE has a solid foundation using decent design patterns. Since these patterns were introduced, several hurricanes have struck. My guess is that among them were the lack of a cohesive and comprehensive architectural design, personnel changes, premature optimization, and just wanting to get some pieces done. I was going to warn you about these issues.

I think it's completely unreasonable to try to parallelize the code in its current incarnation. In a nutshell, it's just entirely too tightly coupled, with very little apparent cohesion in any of the classes: Elements contain Connections, Connections contain Elements, Fields contain Connections and Elements, etc., etc. This has led to the dreaded "header.h", AKA KitchenSink.h.

I've been trying to get the code back in line with the (maybe just apparent) original design patterns. It's difficult. What's most intimidating is that all roads lead to rewriting the MOOSE preprocessor and subsequently regenerating a bunch of code from the .mh files. The problem is that many of the generated files have since been edited by hand. Ugh.

The core of MOOSE needs to be a library -- something to be used by programmers. Creating libraries requires greater attention to implementation details in order to provide accepted and expected behaviors. We need to be able to present a clean API to this library, ideally through SWIG. The GENESIS parser, ReadCell, Plot, etc. should use that API to access the core library. However, core architectural issues must be addressed before this will be attainable.

> [...]
> --Greg

joe

js...@ya...
Software Engineer
Linux/OSX C/C++/Java