I started this list of projects for a friend. I think a good
introductory project should have easily measurable impact, even
without a complete patch, and not require to grasp too much of the
system *at once* to get traction. I didn't worry too much about domain-
specific difficulty, since said friend is in PLT (more theory than
implementation, but that's still fixable ;) If you have more ideas, do
reply. If you're interested in doing one of the following, I don't
think help will be hard to come by.
Here's my completely unordered list (no implicit gradient of
difficulty or logical grouping, it's in the order I thought it up):
1. Peephole, either at the virtual-operation or assembly -level would
be cool. I only have the draft of a skeleton ready for either.
2. We've recently been informed of some performance problems in an
internal data structure used to represent set of constraints (that are
known to be true at a certain program point). We used to use hash
tables but switched to something bit-vector--based, and there seems to
be serious performance regressions on some inputs. The interface is
very well (modularly) defined and explained.
3. SSE intrinsics are coming to x86-64 soon. It should be fun to
rewrite some block (mostly sequence-oriented) operations to exploit
4. Port SSE (floats and intrinsics) to x86.
5. Rewriting the way hashes are computed might be useful. We could for
example blit relevant data to a temporary buffer and incrementally
hash that buffer (e.g. with murmur hash).
6. The register allocator is also nicely modular. We currently have a
fairly naive coloring-based allocator. Live-range splitting and a
smarter algorithm would be good. It will probably be important to have
a speed VS quality knob.
7. Representation selection (e.g., whether a given variable's value is
represented as a machine or boxed integer) is a performance issue.
That too is nicely modularised and would benefit from something like
live-range splitting. It's a fairly unique problem, but I think it can
be seen as a variant of spilling optimisation.
8. Speeding the compiler up by making some analyses optional would be
highly appreciated. There's some very preliminary work on the bug
9. In the same vein, modifying the way analyses are run to allow top-
down rewriting would lead to interesting stuff. It's currently very
hard to perform high level rewriting because forms tend to be
rewritten into inline loops first. That project can scale very high.
There are tons of interesting approaches here!
10. Loop optimisations at the IR2 level... Or even an SSA pass, which
would likely be useful for 6 and 7.
11. Green threads? Win32 looks like it'll offer a challenge.
12. EQ-based hashing isn't very hot. It'd be nice to see consistent
address-based hashes, even in the presence of a copying GC. I can
think of at least one scheme that wouldn't be too space/time
inefficient and would play nice(ish) with our use of cheney, at least
on gencgc platforms.
13. (dynamic-extent) allocation pools. I'd expect that'll represent as
much (if not more) work thinking a good interface up as implementing it.