On Mon, May 24, 2010 at 8:25 PM, Gabriel Dos Reis <gdr@integrable-solutions.net> wrote:
I'm ignorant of what ECL's collector works.  I have so far assumed that
it just reuses the defaul H-W collector; but apparently, that is not the case?

Well, the Boehm-Weiser garbage collector is a pretty flexible library. It includes a lot of features:
* Precise collection
* Generational garbage collection with write barriers
* Parallel garbage collection shared among threads
* Macros for thread-local allocation
* Macros for inlining allocation without locks
* Facilities to extend the mark phase
* Finalization
...
the list is huge and most of it is not well documented. Making an optimal use requires a man-month of continuous investigation -- but only _after_ all other performance sinks have been eliminated, for we may run the risk of premature optimization choosing the wrong strategy.
 
I understand that second point.  Even if the client's algorithms are
suboptimal (in terms of theoretical complexity), should we still expect
order of magnitude between runs with ECL and other Lisps on the same
algorithm implementation?

I am not sure I expressed myself properly. What I meant is that the user code may be making assumptions about the implementation and it strengths leading indeed to significant performance differences.

The first reason is that you might be exercising functions that are / were also suboptimal in ECL itself. For instance using too much DELETE vs. other loops, or using DELETE-DUPLICATES. In particular those functions were operating on sequences using ELT. That means DELETE and friends could move from O(n) to O(n^2) which for a list with 10 elements is a 10x in time. Indeed the times for runnings the ANSI tests with sequences went from 1 second per file to about 0.09 seconds. Not that it matters much because the whole test suite is dominated by other things, such as the compiler tests, but it gives you an idea of the impact of simple things on the overall performance.

Sometimes it is just a simpler things --gentemp I recall, was a huge bottleneck because of using FORMAT-- that make destroy the user's experience and cause the impression that the whole implementation is a disaster. Here I am probably to be blamed because I do not have the time or resources to profile all those bottlenecks, but they are easy to find with time and patience.

The second reason why things can get pretty bad performancewise you have already experienced: the compiler. Here again stupid design choices, such as the algorithms that decide when to unbox or not, or the expectation from user code that type inference is there and works, thus removing all explicit type declarations, or even an improper use of declarations (we had this problem in pprint.lsp, where automatic type checking of arguments was stupidly activated on all functions), may decrease performance by a factor 10 or more just because of introducing boxed values, additional function calls, excesive branching, ...

But again there is no fundamental limitation. The proof is that the sequence functions that I implemented run *faster* in ECL with appropriate type declarations than in other implementations. Have a look at them: *now* they are not fundamentally different from the algorithms that SBCL uses, same asymptotic complexity and all that.

But if you want a short answer to your question

"should we still expect order of magnitude between runs with ECL and other Lisps on the same algorithm implementation?"

this is not solved in one day :-)

Juanjo 

--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://tream.dreamhosters.com