From: Luke G. <luk...@sy...> - 2007-04-09 09:17:47
|
Howdy! Following up discussions at ILC and #lisp..

Franz have an exciting new concept called "zero-cons" programming. This is the result of looking frankly at what we know about the garbage-collected heap and following it to its logical conclusion: in Lisp we can't rely on any characteristics of the heap whatsoever, therefore we have to avoid it as much as possible to get efficient and/or predictable behaviour.

This is patently absurd, but it highlights an important point. If you don't know how the heap behaves then you can't write efficient programs that use it, any more than you could write efficient programs on a sequence that may be a list or may be a vector. So suppose that a nebulous and ill-defined heap is next to useless. Do we conclude that heap-free programming is the way forward? FUCK NO! Instead I'd conclude that we need to start defining and explaining the characteristics of our heap so that we can use it unashamedly.

I reckon that SBCL can lead the Lisp world forward by nailing down the performance characteristics of the pervasive garbage-collected heap. This jrm post shows a glimpse of what loveliness can be ours:

http://groups.google.com/group/comp.lang.lisp/msg/a61cc0b9c525391a

If I had a hundred million dollars to direct Lisp development I'd:

1. Write down the performance characteristics of gencgc in a document for average (and also below-average) Lisp hackers to understand.

2. Find the parts that are embarrassing or confusing and use them to improve the garbage collector. For example the constant factor of filling the heap with zeros would perhaps be eliminated here.

3. Say "here's how the heap works, you can depend on it!" Watch my users present papers at Lisp conferences showing programs that cons copiously but deliberately and thrash the pants off Franz's revoltingly contorted programs for reasons simple enough to shout out from the audience.

4. Consider innovative ways to add new and useful properties to the garbage collector, e.g. macros to say "most of the stuff I'm allocating here will be garbage when this form returns so please try not to promote it to the next generation".

My $0.02 :-)

P.S. If "zero-cons programming" really takes off I'm going to be too embarrassed to admit being a Lisp hacker at parties.

P.P.S. My impression is that the AllegroCache team at Franz don't know what properties the Allegro GC is supposed to have and furthermore don't trust that it would really work as intended. (Maybe because it has bugs or maybe it's something deeper, like that conservative GC is a lose in gigabytes of heap?) |
From: Frode V. F. <fr...@cs...> - 2007-04-09 19:46:39
|
Hi Luke and sbcl-ers,

Luke Gorrie <luk...@sy...> writes:

> Following up discussions at ILC and #lisp..

This makes me wish even more I was there, rather than enjoying the snowfall (!) here in Oslo.

> [..] So suppose that a nebulous and ill-defined heap is next to
> useless. Do we conclude that heap-free programming is the way
> forward? FUCK NO! Instead I'd conclude that we need to start
> defining and explaining the characteristics of our heap so that we
> can use it unashamedly. [..]

This line of thinking fits in nicely with the ideas I've been basing Movitz memory management on: no GC strategy is likely to be great for all applications, therefore GC is "pluggable". I think Movitz as it stands currently almost proves this is feasible (only one "real" GC strategy/plug-in is implemented).

I'll offer two observations in light of this:

1. Having pluggable GC implies GC is properly factored out of the remainder of the system, and therefore its characteristics should be relatively easy to capture.

2. The obvious: pluggable GC means that GC can be replaced, hopefully with something that is (more) suitable to the particular application.

I'm using "GC" as shorthand for memory management here, i.e. how objects are allocated and space is reclaimed. I hope you'll forgive me for making this somewhat off-topic here. I think it's a very interesting subject. And I do think it would be very valuable for SBCL to specify its GC characteristics.

-- Frode Vatvedt Fjeld |
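[Editorial aside: Frode's "pluggable GC" point — factoring memory management behind a narrow interface so its characteristics are capturable and the strategy is swappable — can be sketched roughly as follows. This is a toy Python model with made-up names, NOT Movitz's actual design.]

```python
# Toy model of a "pluggable" GC interface: the mutator only sees
# allocate/collect, so the strategy behind them can be swapped.
# All names here are hypothetical -- this is NOT Movitz's design.
from abc import ABC, abstractmethod

class GCStrategy(ABC):
    @abstractmethod
    def allocate(self, nbytes):
        """Return the address of a fresh nbytes-sized block."""

    @abstractmethod
    def collect(self, roots):
        """Reclaim space; return bytes recovered."""

class BumpPointerNoGC(GCStrategy):
    """Simplest possible plug-in: bump-allocate, never reclaim."""
    def __init__(self, size):
        self.size, self.top = size, 0

    def allocate(self, nbytes):
        if self.top + nbytes > self.size:
            raise MemoryError("heap exhausted")
        addr, self.top = self.top, self.top + nbytes
        return addr

    def collect(self, roots):
        return 0  # this strategy reclaims nothing

gc = BumpPointerNoGC(1024)
a = gc.allocate(16)  # first block at address 0
b = gc.allocate(16)  # next block bumped to 16
```

Any strategy implementing these two operations could be plugged in behind the same mutator code, which is observation 2 above.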
From: Luke G. <luk...@sy...> - 2007-04-10 18:48:34
|
Howdy!

Below the line is a response from John Foderaro of Franz which he asked me to post here as he's not a list subscriber. (BTW, nor am I, I'm reading and posting via the marvellous and Lisp-associated gmane.org). Thanks for responding, John!

--------

I wasn't at the talk but I think you've really come away with a total misunderstanding of what we're doing at Franz to build scalable applications. No one here is advocating that you should go to the trouble of writing cons-free code everywhere. That's a lot of work for very little gain.

You should read Joe Marshall's message (to which you link) more carefully. He says that it is possible to cons like crazy in an application but that

1. you must first tune it carefully
2. you must not introduce code that likes to keep data around for a while.

So if you're prepared to write an application, tune it, and then burn it into a ROM so it's never modified then you can have your massive-consing program run just fine. The fact is though that programs evolve. People add new features, some of which may create long- or medium-lived objects. As Joe Marshall notes, in a generational gc that's going to cause needless promotion of those objects. Now if you were gc'ing at a normal or slower rate many of those medium-lived objects would never make it into the older generations, so adding those new features would not be a problem.

Thus a rapid-consing program is like a wheel spinning very fast. If you add any weight to the edge (equivalent to adding more functionality to the program) you have to carefully rebalance the wheel or it will be torn off its axle. What we're advocating is that by reducing the consing in the most used parts of the application you can avoid putting stress on the gc. This is equivalent to a wheel turning slowly. You can then modify the application and it will continue to run just as efficiently without re-tuning.

There's no need to go to all this trouble for most applications. Lisp's memory management is efficient and does its job without any tuning. However when you must work with very large data sets then it becomes important. Our customers are doing just that. They have to process a few hundred million records and do analysis on them and they have to do it in 24 hours since the next day they get another few hundred million records. They are constantly tweaking their application and the analysis they do. They can't be tuning the gc parameters all the time.

I've written many Lisp applications and I feel that anything can be written in Lisp, however you can't just write it in Lisp any way you please and have it work. You must write it in one of the appropriate styles for the problem you're trying to solve. This is true not only of Lisp but of any programming language.

-- John Foderaro, Franz Inc. |
From: John F. <jk...@fr...> - 2007-04-11 04:50:09
|
Luke Gorrie <luke.gorrie <at> synap.se> writes:
> Howdy!

I'm posting this for Luke Gorrie -- John Foderaro

Hi John!

I take your point. I have considered these issues before but didn't address them in my initial post. I'll try to do so now.

> > Thus a rapid-consing program is like a wheel spinning very fast. If you
> > add any weight to the edge (equivalent to adding more functionality to the
> > program) you have to carefully rebalance the wheel or it will
> > be torn off its axle.

Must it be so? In Erlang we have a trick to safely allocate a lot of memory without affecting garbage collection one way or the other. We spawn a new short-lived process that inhibits GC, does lots of work and conses lots of temporary data, then sends the final result up to its parent and terminates. This is GC-efficient because the whole heap of the new Erlang process is discarded upon termination. That's possible because each process has a private heap and there are no inter-heap pointers. The parent process's heap isn't affected beyond receiving a copy of the result. I tend to think of the new process as acting like a nursery generation for its parent.

I have discussed with Nikodemus and Juho the idea of a similar Lisp special form like this:

  (defspecial with-fresh-nursery (allocation &body body)
    "Execute BODY and inhibit GC up to ALLOCATION bytes of consing.
  Garbage collection of the newly allocated data is forced upon completion.
  Data that has become garbage is discarded without cost;
  Data that is still alive is promoted to the parent generation by copying;
  Each page of older-generation memory that was modified by BODY is scanned.
  If more than ALLOCATION bytes are consed then GC behaviour is undefined."
    ...)

Then one could write code like:

  (with-fresh-nursery (expt 10 6)
    (let ((line (read-line)))
      (format t "You entered a ~D-character line." (length line))))

and if someone told you "that conses unnecessarily" you could look them straight in the eye and say "so what?" The performance would be predictable and good: exactly as Joe Marshall describes.

I don't propose to use this everywhere. Only in the places where you would otherwise go out of your way to avoid consing. Maybe this direction is worth considering? The heap is an incredibly convenient data structure and it would seem a major win to make it usable in high-performance code.

> > Our customers are doing just that. They have to process a few
> > hundred million records and do analysis on them and they have to do
> > it in 24 hours since the next day they get another few hundred
> > million records.

Coincidentally I work on Erlang systems that are generating daily reports and processing realtime data for a lot of telecom operators including at least one that was listed in Jans's slides. They've processed billions and billions of events and I've done a lot of optimisation work on them. Hard work :-)

AllegroCache itself sounds like an excellent product to me, btw.

Cheers,
-Luke |
From: John F. <jk...@fr...> - 2007-04-11 14:38:39
|
John Foderaro <jkf <at> franz.com> writes:
> [actually the message before was from Luke]

I think we agree that the ideal solution is to just cons without worrying about allocation and garbage collection. Fortunately for most Lisp programs you can do just this.

Less than ideal (yet sometimes necessary) solutions we've discussed are

a. code parts of your program to avoid allocations by reusing data objects (the cons-free approach)

b. introduce allocation regions and make the code explicitly choose where to do allocation (the with-fresh-nursery approach).

The advantage of the cons-free approach is that you can do it today and it's portable Common Lisp. The with-fresh-nursery approach is not available yet and would likely only be implemented in a few Common Lisps. Putting that aside let's look at the with-fresh-nursery approach.

You mention in your documentation for with-fresh-nursery:

>> Data that has become garbage is discarded without cost;

The problem is that just determining if something is garbage has a cost (except perhaps with reference-counting gc's, but in that case you're really spreading the cost over the computation rather than paying it at gc time, so there is always a cost to determining if something is garbage). Compare that to the cons-free approach where you never have to determine if something is garbage. (Since you used read-line in your example I assume that Jans discussed the non-consing read-line in his talk, so you know how that works.)

Your example is this:

  (with-fresh-nursery (expt 10 6)
    (let ((line (read-line)))
      (format t "You entered a ~D-character line." (length line))))

Are you saying that you're creating a megabyte allocation region just for one call to read-line? Then you call read-line and then invoke a gc where you have to scan for pointers from older generations into the new allocation region in order to free this generation. The overhead seems tremendous.

Perhaps what you meant was:

  (with-fresh-nursery (expt 10 6)
    (dotimes (i 1000000000)
      (let ((line (read-line)))
        (format t "You entered a ~D-character line." (length line))
        (process-the-line line))))

I.e. you call read-line a billion times to read and process some data file. This would have the positive effect of isolating the tenuring of objects to this particular nursery, so that other lisp threads in the system running in different nurseries would not see their newly created objects prematurely tenured to an older generation. However this application would still cause many gc's and that's overhead that is hard to avoid. Erlang's restriction of no cross-heap pointers is their secret to making this work efficiently (at the cost of extra work by the programmer to work around this restriction).

If you consider the case of reading through a 16gb file with read-line, if you use the normal read-line you'll allocate and reclaim 16gb. If you use our portable non-consing one then you'll allocate about 80 bytes. There's no way the normal read-line plus gc will ever come close to the performance of the non-consing one. Most of the time the performance does matter. But when someone gives you a 16gb file......

-- John Foderaro
Franz Inc. |
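[Editorial aside: the buffer-reuse idea behind a "non-consing read-line" can be illustrated with a rough sketch. This is NOT Franz's actual implementation — it is a hypothetical Python analogy with made-up names — but it shows how one shared buffer replaces a fresh string allocation per line.]

```python
# Hypothetical model of a non-consing read-line: every call fills the
# same growable buffer instead of allocating a new string. The caller
# must copy the buffer if it wants to keep the line. Illustrative only.
import io

class LineReader:
    def __init__(self, stream):
        self.stream = stream
        self.buf = bytearray()  # reused across calls; no per-line allocation

    def read_line(self):
        """Fill the shared buffer with the next line.
        Returns the buffer, or None at end of file."""
        self.buf.clear()
        while True:
            ch = self.stream.read(1)
            if not ch:
                return self.buf if self.buf else None
            if ch == b"\n":
                return self.buf
            self.buf += ch

reader = LineReader(io.BytesIO(b"alpha\nbeta\n"))
first = bytes(reader.read_line())   # copy out only if you must keep it
second = bytes(reader.read_line())  # the next call reuses the buffer
```

Over a 16gb file this allocates one buffer total rather than one string per line, which is the shape of the trade John describes.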
From: Luke G. <luk...@sy...> - 2007-04-11 18:09:31
|
Hi John!

The performance characteristics I ascribed to with-fresh-nursery are based on my understanding of how a generational copying garbage collector such as SBCL's gencgc is supposed to work. Let me start by laying bare my conception in more detail:

| Data that has become garbage is discarded without cost;
| Data that is still alive is promoted to the parent generation by copying;

Live objects are copied into the next generation at some cost. Dead objects cost nothing because the oldspace is reused as a whole without bothering to look at the garbage objects (much like when Erlang discards a heap).

| Each page of older-generation memory that was modified by BODY is scanned.

With a remembered set (write barrier) we can keep track of which pages of memory in older generations have been modified recently enough to contain a pointer into the new generation. We only have to scan the pages that have been so modified by our SETF's etc. If we don't SETF many/any old objects then there's little or no cost.

Furthermore I expect the constant factors of with-fresh-nursery to be extremely small ("resetting the consing frontier" -- epsilon). Certainly not proportional to the ALLOCATION argument which is just a hint to the "is it time to GC yet?" mechanism. SBCL already supports this kind of hint via the SETF'able (SB-EXT:BYTES-CONSED-BETWEEN-GCS) which defaults to 12MB.

Essentially what I'm saying is that AFAIK one can engineer a garbage collector to behave this way and if that's true then it's so good that I'd like to see it promoted from the runtime system into the language. Perhaps we'll discover that I have some important facts wrong. I'm not a GC expert by any means. I'm posting based on a few weekends of hacking at the GC in CMUCL/SBCL and on having sketched these ideas to a fair few people with memory management / lisp implementation backgrounds without being totally shot down.
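[Editorial aside: the remembered-set mechanism described above — scan only the old-generation pages dirtied since the last collection — can be modelled in a few lines. A toy Python sketch with made-up page sizes; real collectors track this per hardware-sized page.]

```python
# Toy remembered set: a write barrier records which old-generation
# "page" each store dirties, so a minor GC scans only those pages
# instead of the whole old generation. Sizes are made up for clarity.
PAGE_SIZE = 4
old_gen = [None] * 32   # flat stand-in for the old generation (8 pages)
remembered = set()      # pages that may now point into the nursery

def barrier_write(addr, value):
    """Store into the old generation, remembering the dirtied page."""
    old_gen[addr] = value
    remembered.add(addr // PAGE_SIZE)

barrier_write(5, "young")   # dirties page 1
barrier_write(6, "young")   # page 1 again -- no new scan work
barrier_write(21, "young")  # dirties page 5
pages_to_scan = len(remembered)  # 2 pages to scan, not all 8
```

If the body does no writes into old objects, `remembered` stays empty and the minor collection scans nothing old, which is the "little or no cost" case Luke claims.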
So anyway, if we accept these properties of with-fresh-nursery then I can offer a very different performance analysis of processing a lot of lines like this:

  (dotimes (i 1000000000)
    (with-fresh-nursery (expt 10 6)
      (let ((line (read-line)))
        (format t "You entered a ~D-character line." (length line))
        (process-the-line line))))

If we assume that read-line/format/process-the-line don't modify objects in old generations or leave new live data around then the cost would be:

  nothing for removing the garbage (this is free);
  nothing for promoting new live objects (there aren't any);
  nothing for scanning old generations (none have been modified within
    with-fresh-nursery so none can contain pointers to the new objects)

which makes the sum total memory management cost only the repeated epsilon constant factor of with-fresh-nursery. This is like Joe Marshall's post describes except that with-fresh-nursery makes it dependable, i.e. independent of what happens to the heap before and after this code runs. (This dependability seems to answer one objection that Scott McKay made to my pro-cons lobbying at ILC.)

I would also like to stress that "research" like with-fresh-nursery is the last step on my "here's what I'd do with a big budget" list. I see tremendous value to the application programmer in first simply documenting the exact performance characteristics of GC and thinking about whether small improvements could make it more useful.

Cheers,
-Luke

P.S. I was tempted to respond to a few side-points but this is unusually stimulating and I'd like to save something for later :-) |
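[Editorial aside: the cost accounting Luke uses — live objects cost a copy, garbage costs nothing because the old space is abandoned wholesale — is the standard property of a copying collector. A toy Python model, purely illustrative:]

```python
# Toy copying collector: collection cost is proportional to the LIVE
# data reachable from the roots; garbage is reclaimed for free by
# abandoning the old space as a whole. Addresses are just dict keys.

def collect(heap, roots):
    """Copy live objects into a fresh heap.
    Returns (new_heap, forwarding_map, objects_copied)."""
    new_heap = {}
    forward = {}            # old address -> new address
    copied = 0
    worklist = list(roots)
    while worklist:
        addr = worklist.pop()
        if addr in forward:
            continue        # already copied
        new_addr = len(new_heap)
        forward[addr] = new_addr
        new_heap[new_addr] = heap[addr]   # children still old addresses
        copied += 1
        worklist.extend(heap[addr])
    # fix up child pointers to point into the new heap
    for a, children in new_heap.items():
        new_heap[a] = [forward[c] for c in children]
    return new_heap, forward, copied

# 1000 objects in the heap, but only a 3-object chain is live:
# the collection touches 3 objects, not 1000.
heap = {i: [] for i in range(1000)}
heap[0] = [1]
heap[1] = [2]
new_heap, fwd, cost = collect(heap, roots=[0])
```

In Luke's no-live-data scenario the root set into the nursery is empty, `cost` is zero, and discarding the 997 (here: all) garbage objects is a single pointer reset.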
From: John F. <jk...@fr...> - 2007-04-12 03:00:47
|
>> (dotimes (i 1000000000)
>>   (with-fresh-nursery (expt 10 6)
>>     (let ((line (read-line)))
>>       (format t "You entered a ~D-character line." (length line))
>>       (process-the-line line))))
>>
>> If we assume that read-line/format/process-the-line don't modify
>> objects in old generations or leave new live data around then the cost
>> would be:

If process-the-line doesn't create or modify objects then what's the point of calling it? In an actual application process-the-line will do some work, and that work will be reflected in changes to the heap. In a typical scenario process-the-line is calling functions to store data in collections of vectors which are eventually written to disk (the AllegroGraph scenario).

>> nothing for removing the garbage (this is free);
>> nothing for promoting new live objects (there aren't any);

There will likely be some of these.

>> nothing for scanning old generations (none have been modified within
>> with-fresh-nursery so none can contain pointers to the new objects)

Again, there may well be pointers to this data from old space.

Also some big costs you forgot to include:

1. you have to scan the stacks for all threads in the system as your write barrier won't detect data stored on the stack or in registers saved to the stack. In a typical web-based application you've got 10 to 20 stacks to scan when the with-fresh-nursery form ends.

2. you have to scan all data in all other nurseries. This is because the write barrier won't work between nurseries since between nurseries the "old" data isn't at lower addresses than the new data.

I'm not saying that what you want to do is impossible. It is more work than you think however, and you really only get a super payoff if your application conses like crazy and accomplishes nothing that modifies the state of the lisp. So it might be good for an application that reads a file and looks for context-free errors. |
From: Alexander K. <ale...@gm...> - 2007-04-12 08:57:54
|
On 4/12/07, John Foderaro <jk...@fr...> wrote:
> >> (dotimes (i 1000000000)
> >>   (with-fresh-nursery (expt 10 6)
> >>     (let ((line (read-line)))
> >>       (format t "You entered a ~D-character line." (length line))
> >>       (process-the-line line))))
> >>
> >> If we assume that read-line/format/process-the-line don't modify
> >> objects in old generations or leave new live data around then the cost
> >> would be:
>
> If process-the-line doesn't create or modify objects
> then what's the point of calling it? In an actual application
> process-the-line will do some work and that work will be reflected
> in changes to the heap. In a typical scenario process-the-line
> is calling functions to store data in collections of vectors which
> are eventually written to disk (the AllegroGraph scenario).

Erlang does this without changing the heap. This works if you have a function where arguments and return values are passed by value. Alternatively, if the arguments and return values are immutable, the same holds.

In the case where you have immutable objects, you can keep the immutable objects in a separate heap, so the arguments given to the function and the return values will live in a separate heap from the heap used in the function.

If you don't have immutable objects, you can use the object returned as the (only) root pointer in the heap and copy that particular object out of the heap. If the allocation point of the return object is known, it can also be allocated in the heap belonging to the calling function instead of the heap belonging to the function, so you end up with one level less of deep-copying.

Alexander |
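[Editorial aside: Alexander's "return value as the only root" idea can be sketched as a hypothetical Python analogy — a scratch list stands in for the function's private heap, and only the returned object is copied out to the caller; everything else allocated during the call is dropped wholesale.]

```python
# Toy model of Erlang-style private-heap execution: run a function
# against its own scratch heap, deep-copy only the result into the
# caller's world, and discard the scratch heap in one go. Illustrative.
import copy

def in_fresh_heap(fn, *args):
    """Run fn with a private scratch heap; only the result survives."""
    scratch = []                  # stand-in for the process-private heap
    result = fn(scratch, *args)
    return copy.deepcopy(result)  # the result is the only root copied out
    # scratch (and all its garbage) is reclaimed wholesale here

def build(scratch, n):
    """Allocate lots of temporary data, return a small final answer."""
    for i in range(n):
        scratch.append([i] * 10)  # 10n cells of soon-to-be garbage
    return scratch[-1][:3]        # small final result

out = in_fresh_heap(build, 1000)
```

The copy at the boundary is the "one level of deep-copying" Alexander mentions; allocating the result directly in the caller's heap would remove it.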
From: John F. <jk...@fr...> - 2007-04-12 12:50:17
|
Alexander Kjeldaas <alexander.kjeldaas <at> gmail.com> writes:
> Erlang does this without changing the heap....

All programming languages are based on compromises. The language tells the programmer: if you restrict yourself to X I will give you power Y. I haven't studied Erlang yet but from what you and others have written it seems like it has rather severe restrictions on how data objects are manipulated but that there's a big payoff as well. It's no wonder Erlang is a popular language.

Lisp makes very few restrictions. When writing a Lisp system you're required to support all the things a user can legitimately do but you should optimize for those things he's likely to do. I read in this thread "As an Erlang programmer I'm willing to write Lisp code like I write Erlang code in order to get the consing benefit." The problem is that none of the rest of the Lisp system is written like Erlang code and if you really want to make use of existing Lisp functions and libraries you will need to call code you haven't written. Thus you can't count on 100% Erlang-like discipline in memory usage.

You could of course build your own heap system on top of lisp. A heap would be a large vector. You could create erlang-like objects in this vector and create erlang-like functions that understood these objects that you created inside this one huge lisp vector.

-- John Foderaro |
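[Editorial aside: John's closing suggestion — a private heap built inside one big vector — might look roughly like this toy model, with cons cells stored as slot pairs and integer indices standing in for pointers. Entirely illustrative: a real version would need tags to distinguish addresses from immediate data.]

```python
# Toy "heap inside one big vector": cons cells are pairs of slots in a
# flat array, addressed by integer index; a bump pointer allocates.
# Dropping the whole array reclaims every object at once. Illustrative.
NIL = -1

class VectorHeap:
    def __init__(self, ncells):
        self.slots = [NIL] * (2 * ncells)  # car/cdr slot pairs
        self.free = 0                      # bump-pointer allocation frontier

    def cons(self, car, cdr):
        """Allocate a cell; return its address (an even slot index)."""
        addr = self.free
        self.slots[addr] = car
        self.slots[addr + 1] = cdr
        self.free += 2
        return addr

    def car(self, addr):
        return self.slots[addr]

    def cdr(self, addr):
        return self.slots[addr + 1]

h = VectorHeap(100)
lst = h.cons(1, h.cons(2, NIL))  # the list (1 2), living inside h.slots
```

From the host language's point of view the entire private heap is one object, which is exactly the property that makes wholesale discard cheap.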
From: Luke G. <luk...@sy...> - 2007-04-12 22:23:08
|
Howdy!

To respond to the main points in reverse order:

> Also some big costs you forgot to include
>
> 1. you have to scan the stacks for all threads in the system as
> your write barrier won't detect data stored on the stack or in
> registers saved to the stack. [...]
> 2. you have to scan all data in all other nurseries. [...]

Thanks for pointing out the stacks! You're right that this means there is more engineering effort than I had originally estimated. The with-fresh-nursery abstraction I've defined would require some kind of remembered set similar to a write barrier over the stacks and I haven't considered this element before. I'll keep it in mind when my own copy of the Garbage Collection book arrives next week :-) Possibly a realistic implementation of with-fresh-nursery would have some extra caveats in its contract when used in a multithreaded environment too. It would be fun to find out one day if this all can be made to work :-)

>>> (dotimes (i 1000000000)
>>>   (with-fresh-nursery (expt 10 6)
>>>     (let ((line (read-line)))
>>>       (format t "You entered a ~D-character line." (length line))
>>>       (process-the-line line))))
>>>
>>> If we assume that read-line/format/process-the-line don't modify
>>> objects in old generations or leave new live data around then the cost
>>> would be:
>
> If process-the-line doesn't create or modify objects
> then what's the point of calling it? In an actual application
> process-the-line will do some work and that work will be reflected
> in changes to the heap.

Interestingly this whole discussion originates from a time years ago when I was writing an application very much like my code snippet there! It was an ethernet switch and IP router that would churn a lot of garbage while processing packets in an event loop but generate almost no new live data in the process (just e.g. small entries in the ARP cache). The Joe Marshall post was a response to Raymond Toy referring to my experience of trying to make CMUCL do the memory management for me in this application instead of having to reuse buffers and so on. It turns out that the design of the CMUCL/SBCL garbage collector sounds like it should be extremely efficient in such a program but in practice there are a few important gotchas :-)

I'm behind on my work just now but when time permits I will open a new thread about how it looks for an application developer to write that kind of program based on the SBCL garbage collector. Otherwise it seems to me that we are basically in agreement on the technical issues and have thrashed out a lot of details that may even be of use to someone some day :-)

Cheers!
-Luke

P.S. I would like to clarify that although I used Erlang for one example the characteristics I've ascribed to with-fresh-nursery were based on the design of SBCL's garbage collector (gencgc). |
From: Luke G. <luk...@sy...> - 2007-04-14 13:55:13
|
Luke Gorrie <luk...@sy...> writes:
> I'm behind on my work just now but when time permits I will open a new
> thread about how it looks for an application developer to write that
> kind of program based on the SBCL garbage collector.

.. actually this leads to such digressions that I think it's more appropriate to move it back to my blog. I'm very happy to discuss this stuff in any forum where other people are interested though, so anybody please feel free to pick me up on anything controversial. I'm heading off on vacation from Thursday 'til the end of the month so I'll avoid picking fights right now. :-)

Cheers,
-Luke |
From: Luke G. <luk...@sy...> - 2007-04-29 20:34:24
|
Luke Gorrie <luk...@sy...> writes:
> Interestingly this whole discussion originates from a time years ago
> when I was writing an application very much like my code snippet there!
> It was an ethernet switch and IP router that would churn a lot of
> garbage while processing packets in an event loop but generate almost
> no new live data in the process (just e.g. small entries in the ARP
> cache).

Hey, I just stumbled on the post I made to cmucl-imp at the time which I'd like the archive to include a reference to:

http://osdir.com/ml/lisp.cmucl.devel/2003-08/msg00303.html |