I frankly don't know what to think of the following idea: we could provide per-generation GC epoch markers (in addition to the global one) and give users a finer-grained view of how GC may have affected the heap. The code itself is tiny (see the end of this message).
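A consumer of such markers only needs EQ comparisons: remember a generation's marker, and later compare it with the current one. The portable sketch below simulates this under stated assumptions. *GC-EPOCH-PER-GENERATION* stands in for the patch's **GC-EPOCH-PER-GENERATION**, SIMULATE-GC stands in for the patched SUB-GC, and +HIGHEST-GENERATION+ is assumed to be 5. GENERATION-COLLECTED-SINCE-P is a hypothetical helper, not SBCL API.

```lisp
;;; Hypothetical, portable stand-in for the patch.  None of these names
;;; are SBCL API; SIMULATE-GC only mimics how the patched SUB-GC would
;;; update the markers, it does not collect anything.
(defconstant +highest-generation+ 5) ; assumed value of sb!vm:+highest-normal-generation+

(defvar *gc-epoch* (cons nil nil))
(defvar *gc-epoch-per-generation*
  (make-array (1+ +highest-generation+) :initial-element *gc-epoch*))

(defun simulate-gc (gen)
  "Install one fresh marker as the global epoch and as the epoch of
every generation from 0 up to GEN."
  (let ((marker (cons nil nil)))
    (setf *gc-epoch* marker)
    (loop for i from 0 upto (min gen +highest-generation+)
          do (setf (aref *gc-epoch-per-generation* i) marker))
    marker))

(defun generation-collected-since-p (gen snapshot)
  "True if generation GEN has been collected since its marker was SNAPSHOT."
  (not (eq snapshot (aref *gc-epoch-per-generation* gen))))

;; Usage: snapshot generation 2, run a nursery-only "GC", then check.
(let ((old (aref *gc-epoch-per-generation* 2)))
  (simulate-gc 0)
  (generation-collected-since-p 2 old)) ; => NIL
```

A fresh cons serves as the marker because EQ on it is the cheapest possible "has anything changed?" test.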
The reason *I* want this is at <http://discontinuity.info/~pkhuong/hashing/>. consistent-eq2.lisp uses these hints to implement a specialised weak EQ hash table that avoids rehashing the whole table when only a few keys have moved. In the best case (e.g. only 1 of 16M keys in the table has moved), this can cut the rehash time by a factor of 1000 or more. Of course, tracking which generation each object is in adds some overhead (writes take about twice as long). What makes that weak EQ hash table interesting is that it can provide identity-based hash values that don't change across GCs (i.e., a non-sucky SXHASH for EQ-compared objects).
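The selective-rehash idea can be illustrated with a portable simulation. This is not consistent-eq2.lisp's code. Object "addresses" and "generations" are simulated struct slots, and SIMULATE-GC, REFRESH, TABLE-PUT, TABLE-GET and the other names are hypothetical. The table files each entry under the generation its key was in when it was hashed, and remembers that generation's marker. Only groups whose marker has changed get rehashed. Weakness is deliberately left out.

```lisp
;;; Hypothetical sketch: selective rehashing driven by per-generation
;;; epoch markers.  Addresses and generations are simulated; no SBCL API.
(defconstant +generations+ 6)

(defvar *epochs* (make-array +generations+ :initial-element (cons nil nil)))
(defvar *heap* '() "Every simulated object, so SIMULATE-GC can move them.")
(defvar *next-address* 0)

(defstruct (obj (:constructor %make-obj))
  (address 0 :type fixnum)
  (generation 0 :type fixnum))

(defun make-obj ()
  (let ((o (%make-obj :address (incf *next-address*))))
    (push o *heap*)
    o))

(defun simulate-gc (gen)
  "Move (re-address and promote) every object in generations 0..GEN,
then give those generations one fresh epoch marker."
  (let ((marker (cons nil nil)))
    (dolist (o *heap*)
      (when (<= (obj-generation o) gen)
        (setf (obj-address o) (incf *next-address*)
              (obj-generation o) (min (1+ (obj-generation o))
                                      (1- +generations+)))))
    (loop for i from 0 to (min gen (1- +generations+))
          do (setf (aref *epochs* i) marker))))

(defstruct entry key value bucket)

;; GROUPS: per generation, the entries hashed while their key was there.
;; SNAPSHOTS: that generation's marker as of the table's last refresh.
(defstruct table buckets groups snapshots)

(defun new-table (size)
  (make-table :buckets (make-array size :initial-element nil)
              :groups (make-array +generations+ :initial-element nil)
              :snapshots (copy-seq *epochs*)))

(defun %add-entry (table entry)
  "Bucket ENTRY by its key's current address; file it by current generation."
  (let* ((key (entry-key entry))
         (buckets (table-buckets table))
         (i (mod (obj-address key) (length buckets))))
    (setf (entry-bucket entry) i)
    (push entry (aref buckets i))
    (push entry (aref (table-groups table) (obj-generation key)))))

(defun refresh (table)
  "Rehash only the entries filed under a generation whose marker changed
since the last refresh; return how many entries were rehashed."
  (let ((stale '())
        (buckets (table-buckets table)))
    (dotimes (g +generations+)
      (unless (eq (aref (table-snapshots table) g) (aref *epochs* g))
        (setf stale (append (aref (table-groups table) g) stale)
              (aref (table-groups table) g) '()
              (aref (table-snapshots table) g) (aref *epochs* g))))
    (dolist (e stale (length stale))
      ;; Unlink from the bucket chosen by the old address, then re-add.
      (setf (aref buckets (entry-bucket e))
            (delete e (aref buckets (entry-bucket e)) :test #'eq))
      (%add-entry table e))))

(defun find-entry (table key)
  (let ((buckets (table-buckets table)))
    (find key (aref buckets (mod (obj-address key) (length buckets)))
          :key #'entry-key :test #'eq)))

(defun table-get (table key)
  (refresh table)
  (let ((e (find-entry table key)))
    (if e (values (entry-value e) t) (values nil nil))))

(defun table-put (table key value)
  (refresh table)
  (let ((e (find-entry table key)))
    (if e
        (setf (entry-value e) value)
        (%add-entry table (make-entry :key key :value value)))
    value))
```

The sketch relies on objects only ever getting older. A key filed under generation g can therefore move only in a GC that also replaces g's marker, which is why the markers for every generation from 0 up to the collected one must be updated together.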
Given such stable IDs, even EQ hash tables need not be aware of the GC. For instance, t-hash.lisp seems to perform better than our current hash tables (in space usage and in read and write performance), but it doesn't include any logic to rehash after GCs. When it uses consistent-eq2's GET-ID as its hash function, reads and writes become somewhat slower than with our hash tables, but it never has to rehash. I'll see if I can find a way to close the gap or, failing that, add logic to rehash after GCs.
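Hashing on a stable ID instead of an address can be sketched as follows. GET-ID here is a hypothetical stand-in for consistent-eq2's GET-ID. To keep the example portable, it is backed by a standard, non-weak EQ hash table. A real implementation's EQ table would itself rehash after GC, which is exactly the cost consistent-eq2 aims to avoid. The point is only that the outer table's bucket choice depends on the ID, never on the address. MAKE-ID-TABLE, ID-TABLE-PUT and ID-TABLE-GET are hypothetical names.

```lisp
;;; Hypothetical sketch: a chained table whose buckets depend only on a
;;; stable per-object ID.  *IDS* is a plain (non-weak) EQ hash table,
;;; an assumption made purely for portability.
(defvar *ids* (make-hash-table :test #'eq))
(defvar *next-id* 0)

(defun get-id (object)
  "Return an integer that identifies OBJECT and never changes."
  (or (gethash object *ids*)
      (setf (gethash object *ids*) (incf *next-id*))))

(defun make-id-table (size)
  (make-array size :initial-element nil))

(defun id-table-put (table key value)
  "Store VALUE under KEY; the bucket index is derived from GET-ID only."
  (let* ((i (mod (get-id key) (length table)))
         (cell (assoc key (aref table i) :test #'eq)))
    (if cell
        (setf (cdr cell) value)
        (push (cons key value) (aref table i)))
    value))

(defun id-table-get (table key)
  "Look KEY up without assigning it an ID if it has none yet."
  (let* ((id (gethash key *ids*))
         (cell (and id (assoc key (aref table (mod id (length table)))
                              :test #'eq))))
    (if cell (values (cdr cell) t) (values nil nil))))
```

In this sketch, each operation pays for an extra ID lookup in exchange for buckets that never need to be rebuilt after a GC.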
Should the diff (+ some cold-init magic) go in? Everything else described above could be a contrib.
Also, how worried are we about the rehashing issue? Currently, rehashing after GC means that hash table operations aren't necessarily O(1), even in an amortised analysis.
diff --git a/src/code/gc.lisp b/src/code/gc.lisp
index 042ab05..787ed87 100644
@@ -191,6 +191,8 @@ run in any thread.")
;;; small to measure. -- JES, 2007-09-30
(declaim (type cons *gc-epoch*))
(defvar *gc-epoch* (cons nil nil))
+(declaim (type (simple-array cons (#. (1+ sb!vm:+highest-normal-generation+))) **gc-epoch-per-generation**))
+(defglobal **gc-epoch-per-generation** (make-array (1+ sb!vm:+highest-normal-generation+) :initial-element *gc-epoch*))
(defun sub-gc (&key (gen 0))
@@ -223,7 +225,10 @@ run in any thread.")
(let ((start-time (get-internal-run-time)))
- (setf *gc-epoch* (cons nil nil))
+ (let ((marker (cons nil nil)))
+ (setf *gc-epoch* marker)
+ (loop for i upto (min gen sb!vm:+highest-normal-generation+)
+ do (setf (aref **gc-epoch-per-generation** i) marker)))
(let ((run-time (- (get-internal-run-time) start-time)))
;; KLUDGE: Sometimes we see the second getrusage() call
;; return a smaller value than the first, which can