Hi Etienne,
In a previous posting you had the following to say on flushing strategy.
Here are some more ideas.
Currently you have _svmf_flush(_svmt_word *pword). This granularity is very
inefficient on many current-generation architecture implementations. The
next best granularity would be the cache line. Best of all would be to
generate the code for a whole "block" and then call iflush once with a
pointer and a length. The iflush can then be tuned to a particular
architecture (or, more precisely, to an implementation of the architecture).
With cache sizes increasing (from good fractions of a MB to several MB),
flushing whole caches is a bad idea, as it will greatly affect performance.
CPU bandwidth and memory bandwidth are increasing with successive generations
....
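To make the pointer-plus-length idea concrete, here is a rough sketch of what a ranged iflush could look like. Everything in it is an assumption for illustration: the name iflush_range, the 64-byte CACHE_LINE_SIZE, and the flush_one_line stub, which only counts calls where a real port would issue the architecture's own cache instruction. On GCC-style compilers the whole routine could probably be replaced by __builtin___clear_cache(begin, end).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed line size; a real port would set this per implementation. */
#define CACHE_LINE_SIZE 64

/* Counts the lines touched so the sketch can be checked without hardware. */
static size_t lines_flushed = 0;

/* Hypothetical per-line primitive. A real version would issue the
 * architecture-specific flush/invalidate instruction for this line. */
static void flush_one_line(void *line)
{
    (void)line;
    lines_flushed++;
}

/* Flush every cache line that overlaps [start, start + length). */
static void iflush_range(void *start, size_t length)
{
    uintptr_t addr;
    uintptr_t end;

    if (length == 0)
        return;

    /* Round the start address down to a cache-line boundary. */
    addr = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    end = (uintptr_t)start + length;

    for (; addr < end; addr += CACHE_LINE_SIZE)
        flush_one_line((void *)addr);
}
```

The point of the sketch is that the caller pays one call per generated block, and the number of per-line operations depends on how many lines the block spans rather than on how many words it contains.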
Let's consider the following cases.
1. Implementation with no caches, then obviously iflush will have to do
nothing.
2. Implementation with split I and D caches and no coherency between them.
In this case one can either flush on a cache-line basis or, if the routine
figures out that the size of the generated code is >= the cache size, just
flush the whole caches.
3. Implementation with multi-level caches, with unified caches occurring at
higher levels. In this case we only need to flush to the first unified level
(this will work only for the UP case).
4. MP implications: depending on the coherency of the various cache levels,
one will have to invalidate all the other caches and write out the
generated code to main memory.
5. For NUMA ??
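The first four cases could be folded into a small decision routine along the following lines. This is only a sketch under stated assumptions: the enum and struct names are made up for illustration, the MP case is simplified to "no coherency at all between processors", and NUMA is deliberately left out since it is still an open question.

```c
#include <stddef.h>

/* Hypothetical description of the cache organisation of one implementation. */
enum cache_model {
    CACHE_NONE,                /* case 1: no caches                        */
    CACHE_SPLIT_INCOHERENT,    /* case 2: split I and D, no coherency      */
    CACHE_UNIFIED_UPPER_LEVEL  /* case 3: unified cache at a higher level  */
};

enum flush_action {
    FLUSH_NOTHING,
    FLUSH_BY_LINE,
    FLUSH_WHOLE_CACHE,
    FLUSH_TO_UNIFIED_LEVEL,
    FLUSH_TO_MEMORY_AND_INVALIDATE_OTHERS
};

struct cache_desc {
    enum cache_model model;
    size_t cache_size;       /* size in bytes of the cache to be flushed   */
    int is_multiprocessor;   /* nonzero for MP (assumed non-coherent here) */
};

/* Pick a flushing strategy for a block of generated code. */
static enum flush_action
choose_flush_action(const struct cache_desc *c, size_t code_length)
{
    if (c->model == CACHE_NONE)
        return FLUSH_NOTHING;                          /* case 1 */
    if (c->is_multiprocessor)
        return FLUSH_TO_MEMORY_AND_INVALIDATE_OTHERS;  /* case 4 */
    if (c->model == CACHE_UNIFIED_UPPER_LEVEL)
        return FLUSH_TO_UNIFIED_LEVEL;                 /* case 3, UP only */
    /* case 2: flush line by line unless the block is as big as the cache */
    return code_length >= c->cache_size ? FLUSH_WHOLE_CACHE : FLUSH_BY_LINE;
}
```

For example, with a 32 KB split cache on a UP machine, a 1 KB block would be flushed line by line, while a 64 KB block would trigger a whole-cache flush.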
Bottom line: I think it is better to give iflush a second parameter, the
size, and leave the implementation specific to a particular architecture
implementation.
What say you?
Later,
-Gunda
Hi Grzegorz,
You are doing some very interesting work.
A simple comment: we will probably need to be ready to fine tune the
flushing strategy for specific architectures. Should we act upon the
"data" cache or the "instruction" cache, or both? What is the
ideal granularity of this action (single word, cache line, or
general flush)? What do we flush (write-back buffer only, all entries
in the cache, ...)? Yep, a lot of fun ahead...
The simplest strategy, in the short term, would be a full cache flush
(of both instruction and data caches). As this only happens "once"
for each executed method (at the end of method preparation), the
simple approach might have no significant impact on the running time,
while allowing the inline-threaded engine to work on modern
processors.
Of course, once we're done with the inline-threaded engine, we will
have to attack the multi-processor cache coherency problem...
Thanks a lot for this very important work.
Etienne