From: Sam S. <sd...@gn...> - 2004-04-29 20:24:49
|
> * Bruno Haible <un...@hf...g> [2004-04-29 19:14:52 +0000]: > > A cleaner way to implement weak pointers. The ultimate weak HT test is making *DOCUMENTATION* in init.lisp a weak HT. it still fails for me. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Ernqvat guvf ivbyngrf QZPN. |
From: Sam S. <sd...@gn...> - 2004-06-03 22:31:12
|
Bruno, are you done with weak HTs? it is mentioned in NEWS but you did not make *DOCUMENTATION* a weak HT! -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> The only time you have too much fuel is when you're on fire. |
From: Bruno H. <br...@cl...> - 2004-06-04 11:55:18
|
Sam wrote: > are you done with weak HTs? They work now. A few more optimization patches are coming in that improve the efficiency of hash lookup. > it is mentioned in NEWS but you did not make *DOCUMENTATION* a weak HT! You can make *DOCUMENTATION* a weak hash table, under your responsibility. I wouldn't do it probably, because the cost of weak hash tables (and any weak stuff in general) is quite high: After every GC (even the small generational GCs) all weak hash tables are scanned specially. And the gain is just to be able to throw away a string here and there when a symbol is uninterned. I haven't benchmarked it, though. Therefore what I say here is speculation. Bruno |
From: Sam S. <sd...@gn...> - 2004-06-04 13:11:46
|
> * Bruno Haible <oe...@py...t> [2004-06-04 13:46:04 +0200]: > > You can make *DOCUMENTATION* a weak hash table, under your > responsibility. I wouldn't do it probably, because the cost of weak > hash tables (and any weak stuff in general) is quite high: After every > GC (even the small generational GCs) all weak hash tables are scanned > specially. And the gain is just to be able to throw away a string here > and there when a symbol is uninterned. in that case it should be discarded and documentation should be kept with the objects they document. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Man has 2 states: hungry/angry and sate/sleepy. Catch him in transition. |
From: Bruno H. <br...@cl...> - 2004-04-29 20:49:59
|
Sam wrote: > > A cleaner way to implement weak pointers. > > The ultimate weak HT test is making *DOCUMENTATION* in init.lisp a weak > HT. > it still fails for me. Yes. The current status is: weakptr in MIXED BLOCKS: 32-bit OK, WIDE OK MIXED PAGES: OK PURE BLOCKS: OK hashweak in MIXED BLOCKS: 32-bit OK, WIDE OK MIXED PAGES: crash during rehash PURE BLOCKS: crash defhash in MIXED BLOCKS: 32-bit KO, WIDE KO MIXED PAGES: OK PURE BLOCKS: OK Before digging into the weak HT, I need to understand them. There are 4 types of weak HTs, but we don't tell the user which type to use in which situation. AFAICS, :KEY suitable for example a) to attach information to the key object, that lives just as long as the key, b) to implement canonicalization tables, for which the key does not occur in the value :VALUE ?? :EITHER suitable for example to model relations between objects, e.g. the "is husband of" relation: the relationship dies when either of the two married individuals dies. :BOTH ?? And how to implement a canonicalization tables, for which the key occurs in the value? :KEY doesn't fit, and :BOTH doesn't fit either. Can you add more examples to this explanations? Also we should mention that when a key from a (key, value) pair in a weak :KEY HT becomes inaccessible, the entire pair is dropped from the table, i.e. it will not appear in MAPHASH any more. (It's not just the value which is replaced with NIL.) After understanding this, I'll look for the bugs (I see at least 2 of them for the moment: 1. the loop that scans the weak HTs comes after the loop which scans the weak pointers, 2. the hash table count of a weak HT is not decremented when a pair is dropped.) Bruno |
From: Sam S. <sd...@gn...> - 2004-04-29 21:28:47
|
> * Bruno Haible <oe...@py...t> [2004-04-29 22:42:33 +0200]: > > Before digging into the weak HT, I need to understand them. There are > 4 types of weak HTs, but we don't tell the user which type to use in > which situation. AFAICS, > > :KEY suitable for example > a) to attach information to the key object, that lives just as long > as the key, > b) to implement canonicalization tables, for which the key does > not occur in the value > > :VALUE ?? mail assignment to warships. when a warship is sunk, we want to remove the mail->warship map. > :EITHER suitable for example > to model relations between objects, e.g. the "is husband of" > relation: the relationship dies when either of the two > married individuals dies. > > :BOTH ?? same as :EITHER, but instead of "is the spouse of" we are recording "is the spouse/widow of". > And how to implement a canonicalization tables, for which the key > occurs in the value? :KEY doesn't fit, and :BOTH doesn't fit either. why ? I think :BOTH could be used here... > Also we should mention that when a key from a (key, value) pair in a > weak :KEY HT becomes inaccessible, the entire pair is dropped from the > table, i.e. it will not appear in MAPHASH any more. (It's not just the > value which is replaced with NIL.) yes. PS. if you reached a milestone with your CLOS work, do you mind adding a line to src/NEWS? (e.g., "faster/more compliant clos") -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> There's always free cheese in a mouse trap. |
From: Bruno H. <br...@cl...> - 2004-04-30 11:21:42
|
Sam wrote: > > :VALUE ?? > > mail assignment to warships. > when a warship is sunk, we want to remove the mail->warship map. OK. But when the key is dead, we want to remove the (key, value) pair as well, right? (A warship cannot be sent mail if its name has been forgotten.) So we end up with :EITHER for this usecase. > > :BOTH ?? > > same as :EITHER, but instead of "is the spouse of" we are recording > "is the spouse/widow of". How can this work? Once the spouse/husband is dead, the other one cannot refer to it any more. This :BOTH is semantic and implementatory nonsense: 1. what should gethash return when the value is dead but the key is not? 2. maphash will call its argument with value = unbound. 3. what is the entry good for when the key is dead? > > And how to implement a canonicalization tables, for which the key > > occurs in the value? :KEY doesn't fit, and :BOTH doesn't fit either. > > why ? I think :BOTH could be used here... Think more... I've found the following issues/questions in the code: 1. spvw_garcol loop is not correct 2. count is not decremented by the GC 3. hashtabl.d hash a %2 instead of /2 4. weakkvt is an array, why? replace, subseq etc. make no sense because the weakkvt is only used as an auxiliary objects by the hash tables. Did you do it so that the maximum length is = 2^24 not 2^16? 5. when is the value marked when the type is :key? You need to be able to do these things correctly: (let ((m1 (make-hash-table :weak :key)) (m2 (make-hash-table :weak :key)) (m3 (make-hash-table :weak :key)) (m4 (make-hash-table :weak :key)) (x1 (list 1))) (let ((x2 (list 2)) (x3 (list 3)) (x4 (list 4)) (x5 (list 5))) (setf (gethash x1 m1) x2) (setf (gethash x2 m2) x3) (setf (gethash x3 m3) x4) (setf (gethash x4 m4) x5)) (gc) (gethash (gethash (gethash (gethash x1 m1) m2) m3) m4)) must return x5. Of course this must work independently of the order of m1, m2, m3, m4 in the all_weakkvts list. Missing comments: 6. maphash looks only at the key, is this sufficient for all types? yes, because spvw_garcol kills the entire entry. 7. In spvw_garcol, there is no need to call gc_mark for :EITHER. Put a comment there! In summary, I've found so many problems that I think before fixing that, it's good to implement more primitive operations. (Just like you cannot get PROG right if you haven't implemented LET, BLOCK and TAGBODY before.) I propose the following types of objects: weak-pointer is ok as it is weak-and a box holding two values x and y, as long as both x and y are alive. no weak-or because that would be the same semantic nonsense as :BOTH above. weak-mapping a box holding two values x and y, as long as x is alive. Implements a mapping x -> y. This is the semantic equivalent to a weak hash table of type :KEY with just 1 entry. weak-and-mapping a box holding three values x, y and z, as long as x and y are alive. Implements a mapping (x, y) -> z. weak-or-mapping a box holding three values x, y and z, as long as x or y are alive. Implements a mapping (x, y) -> z. THEN, when you've implemented these elementary bricks, it should be much clearer how to fix the weak hash tables. Btw, I do need efficient weak hash sets and weak hash tables for CLOS (weak hash sets for the subclasses list, and weak hash tables for the object -> #<eql-specializer object> mapping). Bruno |
From: Sam S. <sd...@gn...> - 2004-04-30 14:01:33
|
Please note that even if we cannot think of a good application, we still should implement it. It is quite boring to tell the user again an again "you don't need it because I do not understand your need". CMUCL has this, and you can emulate its behavior when in doubt. > * Bruno Haible <oe...@py...t> [2004-04-30 13:13:55 +0200]: > >> > :BOTH ?? >> >> same as :EITHER, but instead of "is the spouse of" we are recording >> "is the spouse/widow of". > > How can this work? Once the spouse/husband is dead, the other one > cannot refer to it any more. no, we want to know who is/was the spouse. > This :BOTH is semantic and implementatory nonsense: > 1. what should gethash return when the value is dead but the key is > not? if the key is alive, then the value is ALSO alive - the key keeps it alive! > 2. maphash will call its argument with value = unbound. no, see above: both are just fine. > 3. what is the entry good for when the key is dead? the key is not dead - it is kept alive by the value. > 4. weakkvt is an array, why? replace, subseq etc. make no sense > because the weakkvt is only used as an auxiliary objects by the > hash tables. this is not a problem because weakkvt is not available to the user, so replace, subseq &c will never see it. we discussed that when I implemented it and that's what _YOU_ said! > Did you do it so that the maximum length is = 2^24 not 2^16? yes. > 5. when is the value marked when the type is :key? > You need to be able to do these things correctly: > (let ((m1 (make-hash-table :weak :key)) > (m2 (make-hash-table :weak :key)) > (m3 (make-hash-table :weak :key)) > (m4 (make-hash-table :weak :key)) > (x1 (list 1))) > (let ((x2 (list 2)) (x3 (list 3)) (x4 (list 4)) (x5 (list 5))) > (setf (gethash x1 m1) x2) > (setf (gethash x2 m2) x3) > (setf (gethash x3 m3) x4) > (setf (gethash x4 m4) x5)) > (gc) > (gethash (gethash (gethash (gethash x1 m1) m2) m3) m4)) > must return x5. Of course this must work independently of the > order of m1, m2, m3, m4 in the all_weakkvts list. all values in m1-4 should be marked. then, in a second pass, all un-marked keys are removed. thus it will take 4 GCs after x1 is dead to remove x4 from m4. > In summary, I've found so many problems that I think before fixing that, > it's good to implement more primitive operations. (Just like you cannot > get PROG right if you haven't implemented LET, BLOCK and TAGBODY before.) LET, BLOCK and TAGBODY are useful in their own right. the objects you propose are more or less useless. > no weak-or because that would be the same semantic > nonsense as :BOTH above. it just means that you need a second GC-bit: "GC inside GC" I suggest that you think of weak HTs as optimizations of the following model: consider a usual HT, but with objects (keys and values) wrapped inside weak pointers: key only for :KEY, value only for :VALUE, none for non-weak and both for :EITHER. The only type that does not fit this model is :BOTH. Somehow Emacs has :BOTH, you can look at the sources. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Never let your schooling interfere with your education. |
From: Bruno H. <br...@cl...> - 2004-05-02 15:09:29
|
> weak-pointer is ok as it is > weak-and a box holding two values x and y, > as long as both x and y are alive. > weak-or ... > weak-mapping a box holding two values x and y, > as long as x is alive. Implements a mapping > x -> y. This is the semantic equivalent to a > weak hash table of type :KEY with just 1 entry. > weak-and-mapping a box holding three values x, y and z, as > long as x and y are alive. Implements a mapping > (x, y) -> z. > weak-or-mapping a box holding three values x, y and z, > as long as x or y are alive. Implements a > mapping (x, y) -> z. OK, Thanks for the discussions, Sam. I'm nearly done implementing these bricks, plus weak-lists and weak-alists. The algorithm I'm using is O(N log N), where N is the number of weak pointers. Bruno |
From: Bruno H. <br...@cl...> - 2004-04-30 11:42:12
|
Let me compare the weak pointer stuff with other implementations: - LispWorks has MAKE-HASH-TABLE :WEAK-KIND and SET-HASH-TABLE-WEAK very similar to what you did, (and IMO SET-HASH-TABLE-WEAK is overkill) - Guile has 3 kinds: weak key, weak value, and "doubly weak" (probably means :EITHER.) Explanations in http://www.nada.kth.se/~mdj/guile-ref/guile-ref_81.html - XEmacs has 3 kinds, weak key, weak value, and :EITHER. See http://www.tau.ac.il/cc/pages/docs/xemacs/lispref_639.html They also have weak-lists. Quite useful for keeping ordered collections of weak elements. http://www.tau.ac.il/cc/pages/docs/xemacs/lispref_117.html#SEC119 - MIT scheme has only weak-key hash tables. Among these I prefer the XEmacs spec: all that they offer is well explained and immediately useful. Bruno |
From: Sam S. <sd...@gn...> - 2004-04-30 14:15:57
|
> * Bruno Haible <oe...@py...t> [2004-04-30 13:34:41 +0200]: > > Let me compare the weak pointer stuff with other implementations: GNU Emacs: make-hash-table is a built-in function in `src/fns.c'. (make-hash-table &rest KEYWORD-ARGS) Create and return a new hash table. Arguments are specified as keyword/argument pairs. The following arguments are defined: :test TEST -- TEST must be a symbol that specifies how to compare keys. Default is `eql'. Predefined are the tests `eq', `eql', and `equal'. User-supplied test and hash functions can be specified via `define-hash-table-test'. :size SIZE -- A hint as to how many elements will be put in the table. Default is 65. :rehash-size REHASH-SIZE - Indicates how to expand the table when it fills up. If REHASH-SIZE is an integer, add that many space. If it is a float, it must be > 1.0, and the new size is computed by multiplying the old size with that factor. Default is 1.5. :rehash-threshold THRESHOLD -- THRESHOLD must a float > 0, and <= 1.0. Resize the hash table when ratio of the number of entries in the table. Default is 0.8. :weakness WEAK -- WEAK must be one of nil, t, `key', `value', `key-or-value', or `key-and-value'. If WEAK is not nil, the table returned is a weak table. Key/value pairs are removed from a weak hash table when there are no non-weak references pointing to their key, value, one of key or value, or both key and value, depending on WEAK. WEAK t is equivalent to `key-and-value'. Default value of WEAK is nil. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Microsoft: announce yesterday, code today, think tomorrow. |
From: Bruno H. <br...@cl...> - 2004-04-30 14:39:09
|
Sam wrote: > :weakness WEAK -- WEAK must be one of nil, t, `key', `value', > > `key-or-value', or `key-and-value'. If WEAK is not nil, the table > returned is a weak table. Key/value pairs are removed from a weak > hash table when there are no non-weak references pointing to their > key, value, one of key or value, or both key and value, depending on > WEAK. WEAK t is equivalent to `key-and-value'. Default value of WEAK > is nil. This documentation is just as insufficient as the LispWorks one. We have to write a better one. Bruno |
From: Bruno H. <br...@cl...> - 2004-04-30 14:30:47
|
Sam wrote: > Please note that even if we cannot think of a good application, we still > should implement it. It is quite boring to tell the user again an > again "you don't need it because I do not understand your need". But if we have 4 different options, that a newbie cannot understand, we need to explain them! And describing some frequent use cases is a good explanation. > >> > :BOTH ?? > >> > >> same as :EITHER, but instead of "is the spouse of" we are recording > >> "is the spouse/widow of". > > > > How can this work? Once the spouse/husband is dead, the other one > > cannot refer to it any more. > > no, we want to know who is/was the spouse. Ah, I see. So this :BOTH is a misnomer, because both pointers are also conditionally strong ones, depending on the liveness of the other pointer. > > Did you do it so that the maximum length is = 2^24 not 2^16? > > yes. ok, that's a good reason. > > 5. when is the value marked when the type is :key? > > You need to be able to do these things correctly: > > (let ((m1 (make-hash-table :weak :key)) > > (m2 (make-hash-table :weak :key)) > > (m3 (make-hash-table :weak :key)) > > (m4 (make-hash-table :weak :key)) > > (x1 (list 1))) > > (let ((x2 (list 2)) (x3 (list 3)) (x4 (list 4)) (x5 (list 5))) > > (setf (gethash x1 m1) x2) > > (setf (gethash x2 m2) x3) > > (setf (gethash x3 m3) x4) > > (setf (gethash x4 m4) x5)) > > (gc) > > (gethash (gethash (gethash (gethash x1 m1) m2) m3) m4)) > > must return x5. Of course this must work independently of the > > order of m1, m2, m3, m4 in the all_weakkvts list. > > all values in m1-4 should be marked. > then, in a second pass, all un-marked keys are removed. > thus it will take 4 GCs after x1 is dead to remove x4 from m4. That's OK. GC never guarantees that objects die immediately. But if our chain has now length N, not only length 4. How do you implement this behaviour in a way that is not O(N^2) ? > LET, BLOCK and TAGBODY are useful in their own right. > the objects you propose are more or less useless. Then hide them in the SYSTEM package and don't document them. We may find them useful later. > it just means that you need a second GC-bit: "GC inside GC" Possibly... I have been thinking in the same direction. I hope it's possible to implement without 2 GC bits. > I suggest that you think of weak HTs as optimizations of the following > model: > consider a usual HT, but with objects (keys and values) wrapped inside > weak pointers: key only for :KEY, value only for :VALUE, none for > non-weak and both for :EITHER. > The only type that does not fit this model is :BOTH. That's why we need to understand the elementary bricks first. I think a BOTH cell can be implemented as a (cons (weak-mapping x y) (weak-mapping y x)) The first half keeps y alive as long as x is alive, the second half does the contrary. > Somehow Emacs has :BOTH, you can look at the sources. Nah. Think yourself! Especially the Emacs sources: nothing in the Emacs sources appears to be worth copying. The language is outdated, the GC is unfixably slow, the design is not object oriented, the multibyte character support is hard to use correctly, the X11 code has race conditions, the regexps has hacks that don't work in multibyte locales, etc. Bruno |
From: Sam S. <sd...@gn...> - 2004-04-30 14:53:50
|
> * Bruno Haible <oe...@py...t> [2004-04-30 16:23:09 +0200]: > >> >> > :BOTH ?? >> >> >> >> same as :EITHER, but instead of "is the spouse of" we are recording >> >> "is the spouse/widow of". >> > >> > How can this work? Once the spouse/husband is dead, the other one >> > cannot refer to it any more. >> >> no, we want to know who is/was the spouse. > > Ah, I see. So this :BOTH is a misnomer, because both pointers are also > conditionally strong ones, depending on the liveness of the other pointer. the :WEAK argument tells us who must be dead for us to remove the HT cell. >> Somehow Emacs has :BOTH, you can look at the sources. > > Nah. Think yourself! Especially the Emacs sources: nothing in the > Emacs sources appears to be worth copying. The language is outdated, > the GC is unfixably slow, the design is not object oriented, the > multibyte character support is hard to use correctly, the X11 code has > race conditions, the regexps has hacks that don't work in multibyte > locales, etc. wow! is XEmacs any better? is there a single program (besides CLISP before I touched it :-) which you consider written well? :-) -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> MS Windows: error: the operation completed successfully. |
From: Sam S. <sd...@gn...> - 2004-04-30 20:27:21
|
> * Bruno Haible <oe...@py...t> [2004-04-30 16:23:09 +0200]: > > Ah, I see. So this :BOTH is a misnomer, because both pointers are also > conditionally strong ones, depending on the liveness of the other pointer. > >> The only type that does not fit this model is :BOTH. > > That's why we need to understand the elementary bricks first. I think > a BOTH cell can be implemented as a > (cons (weak-mapping x y) (weak-mapping y x)) > The first half keeps y alive as long as x is alive, the second half does > the contrary. :BOTH can be viewed as two objects which refer to each other. E.g., A=(B . dataA), B=(A . dataB). As soon as _both_ A & B are not visible from the outside, they both can be collected (== removed from the HT), but as long as either is alive, both are alive (and must remain the HT), handling of :BOTH is indeed a "GC inside a GC". i.e., two passes are needed: first we mark everything the usual way, then we mark all children of elements of :BOTH HTs (but not the elements themselves!), only then we clean-up the weak HTs. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Lisp is a language for doing what you've been told is impossible. - Kent Pitman |
From: Bruno H. <br...@cl...> - 2004-04-30 20:58:28
|
Sam wrote: > E.g., A=(B . dataA), B=(A . dataB). > As soon as _both_ A & B are not visible from the outside, they both can > be collected (== removed from the HT), but as long as either is alive, > both are alive (and must remain the HT), handling of :BOTH is indeed a > "GC inside a GC". Yes. > i.e., two passes are needed: first we mark everything the usual way, then > we mark all children of elements of :BOTH HTs (but not the elements > themselves!) When you do this, you cannot reclaim a cycle between A and B. I.e. when A contains a slot pointing to B, and contains a slot pointing to A, but A and B are not pointed to from outside, you would fail to reclaim them. Therefore forget about the concept of "mark all children of an element but not the element itself". It's broken. No good GC algorithm works this way. Can you formulate a non-broken algorithm how to handle this :BOTH case? May be O(N^2). We'll see how to optimize it afterwards. Bruno |
From: Bruno H. <br...@cl...> - 2004-04-30 15:53:14
|
Sam wrote: > is XEmacs any better? Yes, at least its data structures are more object oriented. > is there a single program (besides CLISP before I touched it :-) which > you consider written well? :-) Yes: the kernel is model to copy from. glibc is written well. Many GNU utilities (coreutils e.g.) are written well. Bruno |
From: Sam S. <sd...@gn...> - 2004-04-30 15:57:19
|
> * Bruno Haible <oe...@py...t> [2004-04-30 17:45:34 +0200]: > >> is there a single program (besides CLISP before I touched it :-) which >> you consider written well? :-) > > Yes: the kernel is model to copy from. glibc is written well. Many GNU > utilities (coreutils e.g.) are written well. gcc? vim? bash? cmucl? -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Computers are like air conditioners: they don't work with open windows! |
From: Bruno H. <br...@cl...> - 2004-04-30 17:01:06
|
Sam wrote: > > Yes: the kernel is model to copy from. glibc is written well. Many GNU > > utilities (coreutils e.g.) are written well. > > gcc? vim? bash? cmucl? You want me to say negative things? Oh no. - gcc: Some parts are written well, especially the newer ones. The way it organizes its system dependencies (the config/ directory) is a model to copy from, when it gets very complicated. - vim: Haven't ever needed to look into its source. This proves that it's robust and well documented. - bash: The way each built-in has its own source file is nice. The parser is ugly. The interface with the readline library was horrible but has improved. - cmucl: Well, you know the main motivation for the SBCL project... Bruno |