On Mon, Mar 19, 2012 at 11:29 AM, Andy Hefner <ahefner@gmail.com> wrote:
On Mon, Mar 19, 2012 at 1:07 AM, Matthew Mondor
<mm_lists@pulsar-zone.net> wrote:

>> Well, the system seems quite faster using --with-__threads=no so far.

The last time I looked closely at ECL performance (on x86 Linux,
admittedly a couple years ago), threaded builds wasted a tremendous
amount of time in ecl_process_env doing a TLS lookup via
pthread_getspecific at the entry of every function (compare with
non-threaded builds, which simply load the pointer from a global
variable). I wonder if you're seeing this, or something else.

Not really. He was discussing the effect of --with-__threads, which switches between the POSIX threads mechanism and compiler-assisted thread-local storage.
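
To make the distinction concrete, here is a minimal sketch of the three ways a runtime could locate its environment: a plain global (non-threaded build), a POSIX thread-specific key, and compiler-assisted TLS. All names (struct env, env_from_global, env_from_posix_key, env_from_compiler_tls) are hypothetical and are not ECL's actual identifiers; the __thread keyword is the GCC/Clang extension, and a real runtime would allocate one environment per thread rather than reuse a static one.

```c
/* Hypothetical sketch: three ways a runtime might find its environment.
 * None of these names are ECL's real identifiers. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct env { int nvalues; };   /* stand-in for the real environment */

/* 1. Non-threaded build: one global pointer, a single load. */
static struct env global_env_storage = { 1 };
static struct env *global_env = &global_env_storage;

static struct env *env_from_global(void) { return global_env; }

/* 2. Threaded build, POSIX thread-specific data: a library call per lookup. */
static pthread_key_t env_key;
static pthread_once_t env_key_once = PTHREAD_ONCE_INIT;
static struct env posix_env_storage = { 2 };

static void make_env_key(void) { pthread_key_create(&env_key, NULL); }

static struct env *env_from_posix_key(void)
{
    pthread_once(&env_key_once, make_env_key);
    struct env *e = pthread_getspecific(env_key);
    if (e == NULL) {               /* first use in this thread */
        e = &posix_env_storage;    /* assumption: a real runtime allocates per thread */
        pthread_setspecific(env_key, e);
    }
    return e;
}

/* 3. Threaded build, compiler-assisted TLS (GCC/Clang __thread extension). */
static __thread struct env tls_env_storage = { 3 };

static struct env *env_from_compiler_tls(void) { return &tls_env_storage; }
```

How cheap the third variant is depends on the platform and on whether the code is position-independent, so timing the three accessors in a tight loop on each target is the only reliable comparison.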

Regarding the environment problem itself, my experience varies.

* In OS X it is a considerable hit, but not terrible. Last time I measured it, a multithreaded ECL running with a single thread took between 10% and 20% more time than a single-threaded one to do general things, such as running the test suite. This includes not only the environment lookup but also garbage collection, which can no longer benefit from the generational algorithm (Boehm's library does not support it in multithreaded code).

* In OS X, TLS invariably ends up using the POSIX threads mechanism. It does not matter whether the code is statically or dynamically loaded; the mechanism is dictated by the operating system. In Linux, the generated code does not differ that much from OS X, even in the TLS case. Or did you experience a big performance improvement when going non-PIC? In any case this is something to explore.

I'm sure ECL would see a nice gain by
passing the environment pointer directly as a function argument (in
the fashion of GHC's LLVM backend, minus the custom calling
convention), but it would be an intrusive change.

It would be very intrusive, and I am not sure it would help that much: each function call would carry two extra arguments instead of just one (narg). I recall that, in the past, the number of arguments was more critical than having a piece of code that is executed frequently. As I said before, profiling on the worst platforms does not seem to show such a terrible performance hit, especially when compared with everything else a multithreaded ECL needs (locking, for instance, or different handling of special variables, etc.). Again, I do not rule it out; it could be explored with some work.
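
For reference, the two calling styles under discussion look roughly like this. It is only a sketch with hypothetical names (struct env, current_env, add_lookup, add_explicit) and is not code generated by ECL; it assumes the GCC/Clang __thread extension for the per-thread environment.

```c
/* Hypothetical sketch of the two calling styles; not ECL-generated code. */
#include <assert.h>

struct env { long calls; };                /* stand-in for the real environment */

static __thread struct env current_env;    /* assumed per-thread environment */

/* Style A: only narg is passed; the callee looks the environment up on entry. */
static long add_lookup(int narg, long a, long b)
{
    struct env *e = &current_env;          /* lookup repeated in every function */
    (void)narg;
    e->calls++;
    return a + b;
}

/* Style B: the environment travels as an explicit argument next to narg,
 * so there is no lookup, but every call carries one more argument. */
static long add_explicit(struct env *e, int narg, long a, long b)
{
    (void)narg;
    e->calls++;
    return a + b;
}
```

Style B trades the lookup on entry for an extra argument on every call, which is exactly the cost that would have to be measured before making such a change.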

Static linking would fix the problem too, but.. LGPL. =/

LGPL is not incompatible with static linking. One just has to provide the binaries needed to relink with a new copy of ECL.
 
I've probably mentioned all this before; if so, apologies for repeating myself.

No apologies. This is the type of discussion that is needed here :)

Juanjo

--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com