This version introduces a small optimisation:
cx_stk_t.sig_sa is now initialized once per thread; the initialization code has moved from CX_CATCH_SIGNAL() to CX_INIT().
(In fact, this should have been done a long time ago :) )
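A minimal sketch of the idea, assuming POSIX sigaction() and C11 _Thread_local. The names demo_stk_t, demo_init() and demo_catch_signal() are hypothetical stand-ins for cx_stk_t, CX_INIT() and CX_CATCH_SIGNAL(), whose real definitions are not shown here. The struct sigaction is filled in once per thread, so the per-catch path only has to install the already prepared action.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical per-thread context; the real cx_stk_t layout is not shown here. */
typedef struct {
    struct sigaction sig_sa; /* prepared once by demo_init() */
    int sa_ready;            /* nonzero once sig_sa has been filled in */
} demo_stk_t;

/* One context per thread (C11 thread-local storage). */
static _Thread_local demo_stk_t demo_stk;

/* Counts delivered signals so the effect can be observed. */
static volatile sig_atomic_t demo_hits;

static void demo_handler(int sig) {
    (void)sig;
    demo_hits = demo_hits + 1;
}

/* Hypothetical stand-in for CX_INIT(): build the sigaction once per thread. */
static void demo_init(void) {
    memset(&demo_stk.sig_sa, 0, sizeof demo_stk.sig_sa);
    demo_stk.sig_sa.sa_handler = demo_handler;
    sigemptyset(&demo_stk.sig_sa.sa_mask);
    demo_stk.sig_sa.sa_flags = 0;
    demo_stk.sa_ready = 1;
}

/* Hypothetical stand-in for CX_CATCH_SIGNAL(): no setup work here any more,
 * it only installs the prepared action. Note that POSIX signal dispositions
 * are process-wide even though sig_sa itself is stored per thread. */
static int demo_catch_signal(int sig) {
    if (!demo_stk.sa_ready)
        return -1; /* demo_init() was not called on this thread */
    return sigaction(sig, &demo_stk.sig_sa, NULL);
}
```

Each thread calls demo_init() once; every later demo_catch_signal() call reuses the same prepared struct instead of rebuilding it.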
BTW, for the first time I've checked how long the test program takes to run:
on my machine it's only 15 milliseconds...
In those 15 milliseconds, 11 threads crash (some of them several times and in different ways) and overflow their stacks and buffers... and the program still returns success, since every crash is caught ;)