James Y Knight <foom@...> writes:
> I've got some interest in this project, so I'm going to now follow up
> with the appropriate people inside ITA to get things moving.
I take it that means you have some interest from a prospective
consultant? That is good; I think (keeping one eye on what I might do
after my present sources of funding run out) that establishing that
well-posed commercial demand for customization or improvement can be
met is most definitely in my own interest. I am currently employed,
and so can take on only a limited amount of consultancy; however, if
no-one else steps up to the plate I would be happy to try if only to
establish the precedent.
> I'm not entirely sure whether we've got especially hard-to-compile
> code, or if sbcl's compiler is just slow in general. It may be that
> there's some macros or inline functions that SBCL has a particularly
> hard time with, causing everything to slow down, but it's my gut
> feeling that SBCL's compiler is just slow, in general.
I think it is slow, but that there is some low-hanging fruit.
The principals will correct me if I'm wrong, but I believe that Juho
Snellman and David Lichteblau have done some work in investigating
performance and scalability problems in the compiler and the fasl
loader; in particular Juho has looked at the effects of
*top-level-lambda-max*, byte compilers, IR1 interpreters and similar,
while David has found some big-O badness in various places (which may
or may not help for the common case thanks to the pesky constant
factors); if the bottlenecks in ITA's compilation lie in those big-O
places, then a closer look would be warranted.
I think there's one particular source of slowness where many top-level
forms are involved, though I suspect from your description that this
isn't ITA's problem: when the compiler sees a toplevel form
(foo bar), where foo is a macro, it actually ends up compiling,
effectively,
  (funcall (lambda () (foo bar)))
and I think the overhead of this is noticeable. I think this is one
of the things that WHN's fopcompile thing solves; a bit more on this
below, but as I say I would be surprised if your system involves so
many toplevel forms that that is the bottleneck.
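To see whether this overhead matters for a given system, one can time
the compilation of a file consisting of nothing but trivial toplevel
forms (a sketch only; the filename and the form count here are
arbitrary):

  ;; Generate a file of many trivial toplevel forms, then time its
  ;; compilation; each form gets compiled as its own toplevel thunk.
  (with-open-file (s "/tmp/many-forms.lisp"
                     :direction :output :if-exists :supersede)
    (dotimes (i 10000)
      (format s "(defvar *v~D* ~D)~%" i i)))
  (time (compile-file "/tmp/many-forms.lisp"))

If the per-form cost dominates there but not in your real build, the
bottleneck is elsewhere.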
> There is one odd thing we do, which exacerbates our problem. We
> actually load all the source, and then compile it. So the comparison
> with CMUCL is somewhat unfair, because sbcl has no evaluator, it has
> to compile everything both times. I know everyone will say "wow
> that's ugly, you shouldn't be doing that", and I won't disagree with
> you, but it's not going away any time soon for a number of reasons.
As Nikodemus has said, this problem can essentially be solved with
relatively little effort, by tidying up and committing Brian Downing's
evaluator (either turned on by default or as an sb-interpreter
contrib). That brings us a little closer to acceptable performance,
and is clearly the lowest-hanging fruit: from your timings, this would
likely chop off around 30% of the time to produce your builds.
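For concreteness, the workflow you describe amounts to something like
the following (with *system-files* standing in as a placeholder for
your actual file list):

  ;; Without an evaluator, LOAD of a source file compiles every form,
  ;; so this builds everything twice: once at LOAD time, and once at
  ;; COMPILE-FILE time.  An interpreting LOAD would remove the first
  ;; pass entirely.
  (defvar *system-files* '("a.lisp" "b.lisp"))  ; placeholder list
  (dolist (file *system-files*)
    (load file))                    ; pass 1: compiled on the fly
  (dolist (file *system-files*)
    (load (compile-file file)))     ; pass 2: file compilation proper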
> So, SBCL starts with a disadvantage vs. cmucl as it has to compile
> everything twice, but discounting that, it's still taking 1.5x as
> long. (note that the debug level is lower on sbcl because debug 2
> takes afaict literally forever). So here are the results I'd like to see:
This is one of the things that makes me think that your bottleneck
lies in large functions rather than lots of small functions: I can
easily believe that there's some N^large or even large^N algorithm in
the compiler somewhere, and adding the instrumenting code for debug 2
in medium-sized components makes N itself large.
> 1) Normal compilation should be faster, ideally even faster than
> CMUCL. Safety and speed do matter.
It's a bit difficult to estimate the scope of this, though there are
some things that could be done. I believe that in Juho's measurements
the presence of the byte compiler in CMUCL for byte-compiling
top-level forms (as discussed above) is one reason for the difference
in speeds; guessing, another might be SBCL's improved support for
type safety with the split of CONTINUATION into CTRAN and LVAR
structures and the addition of CASTs; this might increase the sizes of
compiler data structures enough to be noticeable.
> 2) Setting compilation-speed 3 should be *much* faster, and I don't
> give a darn about the speed of the compiled code. Some safety is
> still nice, though.
I think that this is a worthwhile aim even if the code LOADed is
interpreted (using Brian Downing's evaluator), and indeed that would
provide a mechanism: we could have not only #<interpreted-function...>
but also #<minimally-compiled-function...>, where the only compilation
performed is that required by the Spec.
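In the meantime, the standard knob for requesting this is the OPTIMIZE
declaration; how much effect COMPILATION-SPEED currently has in SBCL
is another question, but the request itself would look like:

  ;; Ask the compiler to favour compilation speed over code quality,
  ;; while keeping some safety; the actual effect of each quality is
  ;; implementation-dependent.
  (declaim (optimize (compilation-speed 3) (speed 0)
                     (safety 1) (debug 1)))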
> I have not done any profiling of the compiler or anything like that.
My bottom line is that I would like to see this work done, and while I
am currently constrained in how much time I can give to it, the
constraints are negotiable, so if for whatever reason there is no
other offer we can work something out.