From: Alan M. G. <alm...@gm...> - 2010-07-04 01:26:38
On Thu, Jul 1, 2010 at 12:11 PM, Kevin Cameron <iv...@gr...> wrote:
>
> Since someone mentioned VCS: I worked on a parallel processing version
> of VCS (VCS-MT back in the '90s). Prior to that I worked on a parallel
> VHDL simulator. My advice is -
>
> 1. Don't do a lock-step/SMP implementation.
>
> SMP is on its way out, and lock-step algorithms stop you taking
> advantage of parallelism in the designs.

I'm doing a threadpool-based implementation.

In addition, ordering dependencies imply that two events with a
dependency on one another also access similar memory areas. So if we
have some sort of hook for sending particular tasks to particular
processors on a NUMA arch, we can send dependent tasks to processors
that have fast access to that particular memory.

For example, if a task is dependent on some other task, we put it on a
chain attached to the "first" task that needs to execute "in order".
That chain can be transformed (after completing the task) into a
stealable deque, so that the same processor (mostly) handles a set of
tasks that are dependent on one another, and therefore will mostly
access the same memory. If the other processors become idle, they then
steal tasks from an overloaded processor.

I still have to work out fast multithreaded work-stealing deques
though; memfences make my head spin.

For that matter, commodity SMPs have SMP semantics but NUMA timings
anyway. Cache miss, anyone?

> Even if it does work, you would probably be better off doing a
> proper compiled-code back-end.

Certainly. But multithreaded is sexy, and gives IVerilog a chance to be
put in the spotlight, with more eyeballs to go through the
compiled-code back-end. JITted interpreters are not as sexy as they
once were (even tracing JIT is getting mainstream) ;) IMO anyway.

The main problem is that compiler research has been in the limelight
for a long time, with everyone squeezing every little bit of juice out
of .... a single processor.
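
Going back to the chain-to-deque idea in the threadpool reply above, here
is a minimal sketch of how it might look. It is only an illustration: the
names Task, Worker and run_one are hypothetical, the deques are protected
by plain locks (so it sidesteps the memfence question entirely), and the
"workers" are stepped by hand rather than run on real threads.

```python
import threading
from collections import deque

class Task:
    """A unit of work plus the chain of tasks that must run after it."""
    def __init__(self, name):
        self.name = name
        self.chain = []  # dependents, released when this task completes

class Worker:
    """One worker with a lock-protected deque (not lock-free)."""
    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()
        self.lock = threading.Lock()

    def push(self, task):
        with self.lock:
            self.tasks.append(task)

    def pop_own(self):
        # The owner takes the most recently pushed task.
        with self.lock:
            return self.tasks.pop() if self.tasks else None

    def steal(self):
        # A thief takes the oldest task, from the opposite end.
        with self.lock:
            return self.tasks.popleft() if self.tasks else None

def run_one(worker, others, log):
    """Run one task on `worker`, stealing from `others` if it is idle."""
    task = worker.pop_own()
    if task is None:
        for victim in others:
            task = victim.steal()
            if task is not None:
                break
    if task is None:
        return False
    log.append((worker.wid, task.name))
    # The finished task's chain becomes stealable work on the same worker.
    for dep in task.chain:
        worker.push(dep)
    return True

# Hypothetical example: b and c both depend on a (but not on each other).
a, b, c = Task("a"), Task("b"), Task("c")
a.chain = [b, c]
w0, w1 = Worker(0), Worker(1)
w0.push(a)
log = []
run_one(w0, [w1], log)  # w0 runs a, releasing b and c onto its own deque
run_one(w1, [w0], log)  # w1 is idle, so it steals the oldest task from w0
run_one(w0, [w1], log)  # w0 runs what is left on its own deque
```

Dependent tasks stay on the worker that ran their predecessor unless
another worker runs dry, which is the locality property described above.
A fast version would replace the locks with a proper lock-free deque.
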
JIT isn't sexy, because it's just compiler research where you're
optimizing the compiler while you're optimizing its output. People get
taught compilers in college. They didn't (until recently) get taught
the stuff Dijkstra's been spamming his EWDs about: cooperating
processors and the bins they go through to communicate with each
other. AFAICT anyway. After all, I took up electronics in college ;) So
what little I know about what they teach in college compsci is from my
cousin who took that. In a university in a third-world country. For
all I know some university somewhere teaches semaphores to freshman
college students.

I suggest using GNU Lightning, although I hear LLVM is getting some
traction. LLVM feels kinda heavy to me though, which is why I prefer
the header-only Lightning. But Lightning development is kinda slow and
spurty...

>
> 2. Divide up the design into big chunks (statically or dynamically), and
> process them in separate executables.
>
> That scales to networks and can avoid stalls that you get with
> lock-step.

This requires creating a new ivl target. I'll think about this later.

Most parallelizing commercial simulators (whose ads I've seen, anyway)
do this but require partitioning "by hand"; one part of my proposal is
how to (in effect) do partitioning automatically. For that matter,
SIMBUS seems to be the iverilog answer to by-hand partitioning,
although I haven't actually looked deeply into that. But anyway, I'll
think about mvvp first before hacking into ivl targets.

>
> 3. Parallel processing things like the event-scheduling (on SMP) doesn't
> work.
>
> Tightly coupled stuff tends to run into the problem that it dirties
> the cache lines.

That's why there's a proposal for event-local scheduling, with the
schedules of each event being merged at a later parallelizable stage.
Basically we're transforming the simulator into what is effectively a
glorified MapReduce, with the output of the map being the schedules
produced by each executed event, and a merge operation on the schedule
structures serving as the reduction function.

>
> For a static version of (2) it's mostly a front-end job rather than
> backend/runtime effort.

I agree, see above.

Sincerely,
AmkG
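
For illustration, the map/merge shape of event-local scheduling described
above could look roughly like the sketch below. Everything in it is an
assumption made for the example, not the actual mvvp design: events are
(time, name) tuples, execute_event and merge_schedules are made-up names,
and each event simply schedules two follow-ups. The merge is done in one
call here, though it could also be done pairwise as a tree.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def execute_event(event):
    """Map step: run one event and return the events it schedules.

    The result is a private, time-sorted list of (time, name) pairs,
    so no shared scheduler structure is touched while events run.
    """
    time, name = event
    # Hypothetical behaviour: every event schedules two follow-ups.
    return sorted([(time + 2, name + ".x"), (time + 1, name + ".y")])

def merge_schedules(schedules):
    """Reduce step: merge the per-event sorted schedules into one."""
    return list(heapq.merge(*schedules))

current = [(0, "a"), (0, "b")]
with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map returns results in input order, one schedule per event.
    local_schedules = list(pool.map(execute_event, current))
next_schedule = merge_schedules(local_schedules)
```

Because each event only writes to its own schedule, the workers do not
contend on a shared event queue; the only shared step is the merge.
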