From: Wolfgang W. <ww...@gm...> - 2005-03-12 19:59:43
Hello everybody!
It's time to get some life onto the list again.
1. Coroutines
-------------
Using stream processing and kernels allows us to support the modern
architectures expected in the future: multi-core SMP-like systems,
which will probably be in consumer boxes within a few years (see
recent "Spektrum der Wissenschaft"), as well as more radical ones like
the Cell architecture.
However, one of the problems is how to adequately implement streams
and kernels on a (currently) standard UP (uniprocessor) box.
One solution for that could be the use of threads combined with the
use of coroutines. For details, see Knuth, "The Art of Computer
Programming" or the documentation for coroutine.{cc,h} in
devel-ray/src/lib/threads/.
The coroutine implementation in there is actually a thread-safe port
of libPCL (portable coroutine library) by Davide Libenzi.
Basically, coroutines allow fast cooperative user-space switching of
contexts without system interaction.
(On my linux-2.6 UP system, a coroutine context switch is 7 times as fast
as a thread context switch.)
Coroutine switching is done cooperatively which additionally eliminates
locking required for concurrent threads.
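
To make the idea concrete, here is a minimal sketch of a cooperative
switch between two contexts. It does not use coroutine.{cc,h} or libPCL;
it assumes the POSIX <ucontext.h> calls (getcontext, makecontext,
swapcontext) are available, and the names coro_demo, producer and run are
made up for the illustration. Note that swapcontext may also save the
signal mask on some systems, so it is not necessarily as cheap as a
hand-written switch.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

// Hypothetical demo: a producer coroutine hands values to the main context
// one at a time. No threads and no locks; control moves only at swapcontext().
namespace coro_demo {

ucontext_t main_ctx;      // saved context of the caller
ucontext_t producer_ctx;  // saved context of the coroutine
int slot = 0;             // value passed between the two contexts
bool done = false;        // set by the producer when it has finished

void producer() {
    for (int i = 1; i <= 3; ++i) {
        slot = i * 10;
        swapcontext(&producer_ctx, &main_ctx);  // yield to the consumer
    }
    done = true;
    // Returning from here resumes uc_link, i.e. main_ctx.
}

std::vector<int> run() {
    static char stack[64 * 1024];  // private stack for the coroutine
    std::vector<int> received;
    done = false;

    int rc = getcontext(&producer_ctx);
    assert(rc == 0);
    (void)rc;
    producer_ctx.uc_stack.ss_sp = stack;
    producer_ctx.uc_stack.ss_size = sizeof stack;
    producer_ctx.uc_link = &main_ctx;
    makecontext(&producer_ctx, producer, 0);

    for (;;) {
        swapcontext(&main_ctx, &producer_ctx);  // resume the producer
        if (done) break;
        received.push_back(slot);
    }
    return received;
}

}  // namespace coro_demo
```

Because the producer only gives up control at its own swapcontext() call,
slot never needs a mutex: the two contexts can never touch it at the same
time.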
2. VM
-----
After Andrew's remarks concerning the VM and a recent feeling on
my side that we need more performance, I thought
about changing the (planned) internal VM layout somewhat.
For some background, we were having the following problems:
- We need a garbage collector because the user of the VM cannot be
relied on to properly deallocate objects. However, during the GC
run, other threads need to be stopped which is not easily possible
using pthreads and similar APIs.
- The VM has a two-stage lookup/dereferencing scheme for pointers.
Advantage: It makes safe pointers easy (the user cannot crash the VM)
and allows on-demand requests of objects over the network to be
implemented at the VM level.
However, using real pointers would of course be faster, and the impact
is not limited to the VM because all exported objects need to use the
same notion of pointers as the VM itself.
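
For reference, the two-stage lookup mentioned above can be sketched
roughly as follows. This is only an illustration under my own
assumptions, not the actual VM code; HandleTable and its member names
are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical handle table: a VM "pointer" is an index into slots_,
// and the slot holds the real address (or nullptr once deleted).
class HandleTable {
public:
    using Handle = std::uint32_t;

    // Register an object and return the handle the VM code will use.
    Handle add(void* object) {
        slots_.push_back(object);
        return static_cast<Handle>(slots_.size() - 1);
    }

    // Stage 1: bounds-check the index. Stage 2: load the real address.
    // A forged or out-of-range handle yields nullptr instead of a crash.
    void* resolve(Handle h) const {
        return h < slots_.size() ? slots_[h] : nullptr;
    }

    // Explicit deletion clears the shared slot, so every copy of the
    // handle resolves to nullptr afterwards.
    void remove(Handle h) {
        if (h < slots_.size()) slots_[h] = nullptr;
    }

private:
    std::vector<void*> slots_;
};
```

The extra load in resolve() is the price paid on every dereference, and
exported objects would have to go through the same table.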
So, a possible way to go seems to be the following:
- Use real pointers in the VM. The VM can use a checking mode where
every pointer value is first looked up in a hash table to verify
its validity. This is much slower than the previous two-stage lookup
(which is O(1)) but the non-checking version of the VM (for real
renderings) is faster.
- Network distribution or on-demand requests of objects then need to
be handled at a higher level. (This will probably call for an
indirection layer for objects capable of such on-demand loading.)
- The garbage collection is performed by the conservative GC
(http://www.hpl.hp.com/personal/Hans_Boehm/gc/) which is also used
in gcc and other projects.
Hans Boehm et al. spent a lot of time writing a state-of-the-art
GC which nowadays even supports multi-threading (by stopping other
threads; they have platform-dependent code for that part).
This also eliminates some development work for us.
- SPL (in my current design) allows for explicit deletion of objects
(despite the GC). If the user explicitly deletes all his allocated
objects, the application will run faster because the GC has less work
(I verified that with boehm_gc).
We can allow the user to not use garbage collection (and do his own
memory management) if he wishes to do so (at his own risk, as always).
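
The checking mode from the first point above could look roughly like
this. It is a sketch under my own assumptions: PointerRegistry and
load_int are hypothetical names, and only pointers to the start of an
object are validated (interior pointers would need more work).

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical registry of live object addresses for a checking VM build.
class PointerRegistry {
public:
    void on_allocate(const void* p) { live_.insert(p); }
    void on_delete(const void* p)   { live_.erase(p); }
    bool is_valid(const void* p) const { return live_.count(p) != 0; }

private:
    std::unordered_set<const void*> live_;  // hash table of known objects
};

// Load an int through a VM pointer. Checking is a template parameter so
// that the non-checking instantiation contains no hash lookup at all.
template <bool Checking>
bool load_int(const PointerRegistry& reg, const int* p, int& out) {
    if (Checking && !reg.is_valid(p)) {
        return false;  // stale or forged pointer: refuse instead of crashing
    }
    out = *p;
    return true;
}
```

A checking VM would instantiate load_int<true>, and the VM used for real
renderings would instantiate load_int<false>, which trusts the pointer.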
The main problems with this approach are:
- The VM needs to know the object base location to do dynamic casts which
are also needed for virtual function calls. This can be solved by
attaching an offset value to all base classes in all instances.
Actually, no additional memory is required unless the object is a
(base) class of non-zero size but has _no_ virtual functions. These
cases are probably rare.
- Pointer size is no longer constant for the VM since we support 32bit
and 64bit systems. This means that the compiler cannot easily calculate
offsets for the target machine. The easiest solution that comes to my
mind is that the VM assembly uses 2 values for each address/offset
specification, one for 32bit and one for 64bit systems. The VM can
then select the correct version when loading the assembly file.
- Explicit deletion changes behaviour. Previously, a pointer to an
explicitly deleted object would be NULL after deleting the object
because the shared indirection layer index contains NULL. Now, a
pointer is not automagically NULL and dereferencing a deleted object
may crash the VM (or trigger an assertion for the checking VM).
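
The two-values-per-offset idea from the second point could be sketched
like this. The struct layout and the names DualOffset, select_offset and
select_for_host are assumptions for the illustration, not a proposal for
the actual assembly format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: every address/offset operand in the VM assembly
// carries one value per pointer width.
struct DualOffset {
    std::uint32_t for32;  // offset when pointers are 4 bytes
    std::uint32_t for64;  // offset when pointers are 8 bytes
};

// Pick the value that matches a given pointer size (in bytes).
constexpr std::uint32_t select_offset(DualOffset d, unsigned pointer_size) {
    return pointer_size == 8 ? d.for64 : d.for32;
}

// What a loader running on this host would pick while reading the file.
constexpr std::uint32_t select_for_host(DualOffset d) {
    return select_offset(d, sizeof(void*));
}
```

For example, a member that follows two pointers would be emitted by the
compiler as DualOffset{8, 16}; the loader resolves it once, so the
interpreter loop itself never sees the second value.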
3. SPL
------
Just by chance, I stumbled across a tool called "treecc"
(www.southern-storm.com.au/treecc.html). It helps in compiler development
mainly by protecting the programmer from forgetting cases.
I am currently evaluating whether to use it for the SPL compiler implementation.
This is the reason why compilation of the code currently fails just
before the end in the spl/ directory.
Since treecc is quite small and not very common, I will put it into the
3rdparty/ directory once it is clear that it will be used.
The same holds for the garbage collector. Binary distributions of it do
exist, but we must make sure that we have the thread-safe version, and
we may also want to play with parallel marking and other
compile-time tuning.
(Things already work in my development version but I would like to avoid
adding lots of 3rdparty code to the CVS which will be removed again later.)
4. RT core
----------
Any more ideas/suggestions on core design / stream processing / SIMD?
(Andrew: I would be very happy to see your ideas before you have to
leave us.)
Regards,
Wolfgang