From: Matthias B. <mbe...@co...> - 2013-04-11 12:54:18
Hi,

I'm considering applying for Google Summer of Code 2013. I have a bit of
experience in implementing Common Lisp compilers (LLVM-based, in this
case)[1] and type systems[2]. I do not, however, have any relevant
experience with the SBCL code base.

There are a couple of project ideas posted on the SBCL website.[3] I must
admit that I have trouble deciding which of the projects I find most
interesting. There are so many to choose from! Several of them piqued my
interest. :)

I particularly like:

- 1.6 (efficient interpretation)
- 2.2 (exploiting switch/case)
- 3.5 (precise type derivation)
- 5.5 (hybrid GC)
- 5.6 (integrating the MPS collector)

What is the relative priority of these proposals?

Thank you,
Matthias

[1]: git clone http://matthias.benkard.de/code/toilet.git
[2]: http://matthias.benkard.de/typecore/typecore.pdf
     http://matthias.benkard.de/typecore/icfp_abstractum_extensum.pdf
[3]: http://www.sbcl.org/gsoc2013/ideas/
From: Christophe R. <cs...@ca...> - 2013-04-12 13:40:25
Matthias Benkard <mbe...@co...> writes:

> I do not, however, have any relevant experience with the SBCL code base.

One way of strengthening any possible application to the program might be
to quickly get some experience -- maybe by developing and posting a
trivial improvement (say, fixing a typo in a comment somewhere).

> There are a couple of project ideas posted on the SBCL website.[3] I
> must admit that I have trouble deciding which of the projects I find
> most interesting. There are so many to choose from! Several of them
> piqued my interest. :)
>
> I particularly like:
>
> - 1.6 (efficient interpretation)
> - 2.2 (exploiting switch/case)
> - 3.5 (precise type derivation)
> - 5.5 (hybrid GC)
> - 5.6 (integrating the MPS collector)
>
> What is the relative priority of these proposals?

I would say that the most important thing is to do something that
interests you personally: that's the factor that, in my experience, is the
most likely to lead to a positive outcome of your work over three months.
This might lead to you hybridizing the project ideas -- they are just
ideas, after all.

I myself don't have a clear prioritization of these ideas; they are all
interesting and might be nice to have. Some of them might be more niche
than others, but I don't know what niches the overall userbase inhabits.
Other people might have firmer opinions on what would be nice to have.

Also, it's worth saying that if you are interested in many of these, then
success in achieving one is likely to give you the skills and knowledge
to make executing the rest substantially easier :)

Cheers,

Christophe
From: Paul K. <pv...@pv...> - 2013-04-13 15:42:41
Christophe Rhodes wrote:
> Matthias Benkard <mbe...@co...> writes:
>
>> I do not, however, have any relevant experience with the SBCL code base.
>
> One way of strengthening any possible application to the program might
> be to quickly get some experience -- maybe by developing and posting a
> trivial improvement (say, fixing a typo in a comment somewhere).

I second that suggestion. I'll also note that there are currently 39 open
bugs marked as easy on our bugtracker:
https://bugs.launchpad.net/sbcl/+bugs?field.tag=easy . And, like pretty
much all open source projects, our manual could use some love.

>> There are a couple of project ideas posted on the SBCL website.[3] I
>> must admit that I have trouble deciding which of the projects I find
>> most interesting. There are so many to choose from! Several of them
>> piqued my interest. :)
>>
>> I particularly like:
>>
>> - 1.6 (efficient interpretation)
>> - 2.2 (exploiting switch/case)
>> - 3.5 (precise type derivation)
>> - 5.5 (hybrid GC)
>> - 5.6 (integrating the MPS collector)
>>
>> What is the relative priority of these proposals?
>
> I would say that the most important thing is to do something that
> interests you personally: that's the factor that in my experience is the
> most likely to lead to a positive outcome of your work over three
> months.

Same here. I believe that intrinsic motivation is the best predictor of
success for this kind of task. Still, some comments:

1.6 has the interesting characteristic that the vast majority of the code
would probably be independent of SBCL, or at least well modularised. It
could be a contribution to other CL implementations as well. The task
basically consists of writing regular code that's incidentally used by
SBCL, and the experience would be applicable to plenty of non-CL
situations. Plus, if performance is good enough, the interpreter might
help improve the object system.

2.2 is probably the most likely to lead to transparent improvements, and
that's always fun to brag about. There's some code, but only enough to
get a minimal (and not very tastefully implemented) primitive going.
There's no prior work (for SBCL) on exploiting such a primitive in
(E)CASE, but there's at least one paper from scheme-land on that topic.

3.5 and 5.6 are very much long shots. You have some experience with type
systems; that helps a lot for 3.5, but SBCL's compiler (particularly
what's left after removing the constraint propagation pass) is a bit
idiosyncratic, which may cause a bit of swearing from time to time (:

I wouldn't be surprised if 5.5 doesn't help much in practice: I have a
feeling that people rewrite their programs when GC becomes a bottleneck.
Regardless, I believe it would be a good change, and I expect that some
companies that use SBCL with huge pointerful heaps would benefit. Looking
ahead a couple of months/years, a mark/sweep GC will hopefully be easier
to parallelise or make (mostly-)concurrent than a copying one, so the
project is a bit of a technical investment as well.

Paul Khuong
From: Martin M. <mar...@gm...> - 2013-04-22 03:05:12
>> Matthias Benkard <mbenkard <at> common-lisp.net> writes:
>>
>>> I particularly like:
>>>
>>> - 1.6 (efficient interpretation)
>>> - 2.2 (exploiting switch/case)
>>> - 3.5 (precise type derivation)
>>> - 5.5 (hybrid GC)
>>> - 5.6 (integrating the MPS collector)
>
> 3.5 and 5.6 are very much long shots. You have some experience with type
> systems; that helps a lot for 3.5, but SBCL's compiler (particularly
> what's left after removing the constraint propagation pass) is a bit
> idiosyncratic, which may cause a bit of swearing from time to time (:
>
> Paul Khuong

Hello,

I'm also interested in 5.6 (MPS GC) as well as 6.11 (GCed special
variables) and 6.12 (precise stack scanning). I'll be starting a CS PhD
program in the fall, working on bioinformatics problems where current C++
software uses in excess of 0.5 TB of RAM. I'd like to be able to use SBCL
for any work where C/C++ isn't truly required, so my biggest motivation is
space efficiency.

How far from fully precise (and fully collected) would 6.11 and 6.12 put
SBCL on x64 Linux? http://common-lisp.net/~dlw/LispSurvey.html seems to
suggest that it already is on PPC, but
http://www.sbcl.org/manual/History-and-Implementation-of-SBCL.html
indicates that it wouldn't make sense on x86. Does x64 have enough
registers to justify this for SBCL?

Thanks,
Martin
From: Paul K. <pv...@pv...> - 2013-04-22 22:56:34
Martin Muggli wrote:
[...]
> I'm also interested in 5.6 (MPS GC) as well as 6.11 (GCed special
> variables) and 6.12 (precise stack scanning).

The latter two are in section 6 because:

6.11: AFAIK, only one system has run into that limit: kpreid's E (?)
implementation that bound gensymed symbols as special. The task
description might be misleading, so I'll write some details. SBCL
implements thread-local special bindings by allocating indices in
fixed-size (per-thread) tables on demand, as variables are bound
dynamically. Indices are allocated by incrementing a global counter and
are never re-used, even if a symbol is eventually GCed. This isn't ideal,
and could be alleviated by computing the set of indices still in use
during major GCs. I believe that task would constitute a very nice
introduction to the GC... but I don't think its impact would be major.

6.12: I don't think anyone has a good angle for that task. I'm sure
nearly every committer has thought of approaches, but I don't know of any
that seems assuredly workable. If you're not already familiar with SBCL,
it's probably not the best choice.

> I'll be starting a CS PhD program in the fall, working on bioinformatics
> problems where current C++ software uses in excess of 0.5 TB of RAM. I'd
> like to be able to use SBCL for any work where C/C++ isn't truly
> required, so my biggest motivation is space efficiency.

Unless that's mostly unboxed data, I don't think your biggest issue will
be conservativeness as much as the fact that our GC is mostly copying. If
it is mostly unboxed data, conservativeness may very well not be an issue
at all; the vast majority of programs work fine, after all. I also
wouldn't be surprised if you end up doing something like ITA (now Google):
they use SB-ALIEN to represent huge data on the C heap. Perhaps you want
to try to come up with a project that would directly help your use case?

> How far from fully precise (and fully collected) would 6.11 and 6.12 put
> SBCL on x64 Linux? http://common-lisp.net/~dlw/LispSurvey.html seems to
> suggest that it already is on PPC, but
> http://www.sbcl.org/manual/History-and-Implementation-of-SBCL.html
> indicates that it wouldn't make sense on x86. Does x64 have enough
> registers to justify this for SBCL?

I don't think that being conservative only on registers would be a big
problem, and I'm not convinced that increasing register pressure to
eliminate rare GC issues is a good tradeoff, but opinions on this issue
diverge. Still, if we're ever precise on the stack, a partitioned
register set could be a build-time feature; there's little point until
then.

I do feel that stack conservativeness is a pressing issue: the stack is
much bigger than the register file, and stray references can easily hang
around for a very long time.

Paul Khuong
From: Nikodemus S. <nik...@ra...> - 2013-04-27 13:02:54
On 23 April 2013 01:56, Paul Khuong <pv...@pv...> wrote:
>> I'll be starting a CS PhD program in the fall, working on bioinformatics
>> problems where current C++ software uses in excess of 0.5 TB of RAM. I'd
>> like to be able to use SBCL for any work where C/C++ isn't truly
>> required, so my biggest motivation is space efficiency.
>
> Unless that's mostly unboxed data, I don't think your biggest issue will
> be conservativeness as much as the fact that our GC is mostly copying.
> If it is mostly unboxed data, conservativeness may very well not be an
> issue at all; the vast majority of programs work fine, after all. I also
> wouldn't be surprised if you end up doing something like ITA (now
> Google): they use SB-ALIEN to represent huge data on the C heap. Perhaps
> you want to try to come up with a project that would directly help your
> use case?

A quick suggestion: a way to point an array-header into an mmap'ed file
or memory containing unboxed data. Current array-headers (used for
non-simple arrays) always point to simple-arrays, which provide the
actual storage; for large datasets, having one point to an mmap'd file
instead would seem like a boon.

(There are at least a couple of libraries floating around that do
something similar by sticking a widetag and length into foreign memory,
but that's not ideal: it means you cannot examine arbitrary foreign
memory as a lisp array, only foreign memory whose first two words are
free for you to write into.)

Cheers,

-- nikodemus
From: Paul K. <pv...@pv...> - 2013-04-28 08:00:04
Nikodemus Siivola wrote:
> A quick suggestion: a way to point an array-header into an mmap'ed file
> or memory containing unboxed data.
[...]
> (There are at least a couple of libraries floating around that do
> something similar by sticking a widetag and length into foreign memory,
> but that's not ideal: it means you cannot examine arbitrary foreign
> memory as a lisp array, only foreign memory whose first two words are
> free for you to write into.)

Arrays displaced into foreign memory would certainly be useful. However,
on most POSIX platforms, it's easy to mmap a file in and prefix that with
an anonymous mapping for the header. That already covers a lot of my use
cases.

Paul Khuong
From: Martin M. <mar...@gm...> - 2013-05-02 20:32:43
Paul Khuong <pvk <at> pvk.ca> writes:
> Nikodemus Siivola wrote:
>> A quick suggestion: a way to point an array-header into an mmap'ed file
>> or memory containing unboxed data.
> [...]
>
> Arrays displaced into foreign memory would certainly be useful. However,
> on most POSIX platforms, it's easy to mmap a file in and prefix that
> with an anonymous mapping for the header. That already covers a lot of
> my use cases.
>
> Paul Khuong

Thanks for your thoughts, Nikodemus and Paul.

Unfortunately, I'm not working with a concrete problem just yet, so a
problem-targeted project isn't practical. Hence, I'd be more interested
in improving space efficiency in general. And I agree, unboxed arrays and
mmap'd files are already workable and efficient, though they could be
made more friendly. I'm more interested in the dynamic graph structures
dealt with by garbage collectors, though.

In response to Paul Khuong's comment about the mostly-copying GC being a
better target, I started reading The GC Handbook (Jones '11) and various
papers, and learned about an interesting newish GC method for
larger-than-core working sets called bookmarking. There are links and
discussion at http://lambda-the-ultimate.org/node/2391, but the basic
idea is that the GC cooperates with the OS kernel: it advises the kernel
about which pages are best to evict, and keeps notes about references in
pages so that it can limit collection to resident pages. I would be
interested in implementing this.

Do you think this would have a place in SBCL (even if just as a
compile-time feature)?

- Martin Muggli
From: Paul K. <pv...@pv...> - 2013-05-02 21:03:12
Martin Muggli wrote:
> In response to Paul Khuong's comment about the mostly-copying GC being a
> better target, I started reading The GC Handbook (Jones '11) and various
> papers, and learned about an interesting newish GC method for
> larger-than-core working sets called bookmarking.
>
> There are links and discussion at
> http://lambda-the-ultimate.org/node/2391, but the basic idea is that the
> GC cooperates with the OS kernel: it advises the kernel about which
> pages are best to evict, and keeps notes about references in pages so
> that it can limit collection to resident pages. I would be interested in
> implementing this.
>
> Do you think this would have a place in SBCL (even if just as a
> compile-time feature)?

That approach depends on patching the kernel. AFAIK, no contemporary
production operating system supports the interface needed by bookmarking
GC... and I don't expect we'll be merging in anything that requires
kernel patches any time soon.

Paul Khuong
From: Martin M. <mar...@gm...> - 2013-05-03 00:27:43
Paul Khuong <pvk <at> pvk.ca> writes:
> That approach depends on patching the kernel. AFAIK, no contemporary
> production operating system supports the interface needed by bookmarking
> GC... and I don't expect we'll be merging in anything that requires
> kernel patches any time soon.
>
> Paul Khuong

Hmmm... you're right. I saw the need for mincore() and madvise(), and
found those present in production OSes, but now I see it also really
needs a "callback" from the OS before page evictions so it can make
bookmarks. I also see some interest in cooperative GC on the linux-mm
list, but it looks like it's still an open wishlist item.

In any case, maybe MPS integration is the next best thing.

Are there thoughts about what folks would like to see happen regarding
the two existing collectors? It seems like there would be an opportunity
to have a consistent interface for MPS, CHENEYGC, and GENCGC, such that
the code base wouldn't need so many LISP_FEATURE_GENCGC ifdefs (which
would become more complex with a third GC).

Martin Muggli
From: Paul K. <pv...@pv...> - 2013-05-03 02:08:58
Martin Muggli wrote:
> In any case, maybe MPS integration is the next best thing.
>
> Are there thoughts about what folks would like to see happen regarding
> the two existing collectors? It seems like there would be an opportunity
> to have a consistent interface for MPS, CHENEYGC, and GENCGC, such that
> the code base wouldn't need so many LISP_FEATURE_GENCGC ifdefs (which
> would become more complex with a third GC).

I believe it would make much more sense to try to integrate MPS and see
how that task guides refactoring. It's far too easy to end up
implementing a convoluted design that doesn't even enable the changes
we're ultimately interested in.

Paul Khuong
From: Martin M. <mar...@gm...> - 2013-05-03 16:08:44
Paul Khuong <pvk <at> pvk.ca> writes:
> Martin Muggli wrote:
>> In any case, maybe MPS integration is the next best thing.
>
> I believe it would make much more sense to try to integrate MPS and see
> how that task guides refactoring. It's far too easy to end up
> implementing a convoluted design that doesn't even enable the changes
> we're ultimately interested in.

Sorry, my question was meant to refine the goals of MPS integration, not
as an alternative proposal. I just submitted a proposal, limiting the
scope to 64-bit Linux, so obviously the other collectors would need to
stay around.

Thanks for all the feedback!

Martin
From: Matthias B. <mbe...@co...> - 2013-04-13 20:29:26
Hi,

Paul Khuong wrote:
> I second that suggestion. I'll also note that there are currently 39
> open bugs marked as easy on our bugtracker:
> https://bugs.launchpad.net/sbcl/+bugs?field.tag=easy . And, like
> pretty much all open source projects, our manual could use some love.

Heh. Way to go getting me to actually do something productive for a
change. ;) I'll have a look and try to poke around a bit.

Christophe Rhodes wrote:
> I would say that the most important thing is to do something that
> interests you personally

I'm sure that's true. :)

Thanks,
Matthias