On Mon, Mar 19, 2012 at 11:29 AM, Andy Hefner <ahefner@...> wrote:
> On Mon, Mar 19, 2012 at 1:07 AM, Matthew Mondor
> <mm_lists@...> wrote:
> >> Well, the system seems quite a bit faster using --with-__threads=no so far.
> The last time I looked closely at ECL performance (on x86 Linux,
> admittedly a couple years ago), threaded builds wasted a tremendous
> amount of time in ecl_process_env doing a TLS lookup via
> pthread_getspecific at the entry of every function (compare with
> non-threaded builds, which simply load the pointer from a global
> variable). I wonder if you're seeing this, or something else.
Not really. He was discussing the effect of --with-__threads, which
switches between using POSIX threads and compiler-assisted thread-local
storage (__thread).
Regarding the environment problem itself, my experience varies.
* In OS X it is a considerable hit, but not terrible. Last time I measured
it, a multithreaded ECL running with a single thread took between 10% and
20% more time than a single-threaded one on general tasks, such as running
the test suite. This includes not only the environment lookup but also
garbage collection, which can no longer benefit from the generational
algorithm (Boehm's library does not support it in multithreaded code).
* In OS X, TLS invariably ends up using the POSIX threads mechanism. It
does not matter whether ECL is statically or dynamically linked; the
mechanism is dictated by the operating system. In Linux, the generated code
does not differ that much from OS X, even in the TLS case. Or did you see a
big performance improvement when going non-PIC? In any case, this is
something to explore.
> I'm sure ECL would see a nice gain by
> passing the environment pointer directly as a function argument (in
> the fashion of GHC's LLVM backend, minus the custom calling
> convention), but it would be an intrusive change.
It would be very intrusive and I am not sure whether it would help that
much -- each function call would carry two hidden arguments (narg and the
environment) instead of just one (narg). I recall that, in the past, the
number of arguments was more critical than a piece of code that is
frequently executed. As I said before, profiling on the worst platforms
does not seem to show such a terrible performance hit, especially compared
with all the other things a multithreaded ECL needs (locking, for instance,
or different handling of special variables, etc.). Again, I do not discard
it -- it could be explored with
> Static linking would fix the problem too, but... LGPL. =/
LGPL is not incompatible with static linking. One just has to provide the
object files needed to relink the program against a new copy of ECL.
> I've probably mentioned all this before; if so, apologies for repeating myself.
No apologies. This is the kind of discussion that is needed here :)
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)