From: William H. N. <wil...@ai...> - 2004-02-02 21:10:16
|
The current *FOP-STACK* code seems confusing and buggy. I propose to replace it with something simpler and (currently) a little slower (but very likely to be optimizable if anyone cares). Before I do so, I thought I'd give people a chance both to comment on the general idea, and possibly to check whether some other benchmark shows more of a slowdown. CHAPTER THE FIRST: DEMONSTRATING A BUG First, at the command line, inspect the FOP-STACK variables: sb-fasl::*fop-stack-pointer* (length sb-fasl::*fop-stack*) Now, compile and LOAD (the .fasl produced from) this file (which creates its own file, and does a recursive LOAD nested in the LOAD you execute from the command line): (in-package :cl-user) (let ((source "/tmp/faslbug.lisp")) ;; Create a program with enough symbols in it that it's likely to ;; overflow any minimal *FOP-STACK*, causing it to be resized. (with-open-file (f source :direction :output :if-exists :supersede) (print `(defstruct zoom ,@(loop for i from 0 below 200 collecting (intern (format nil "SLOT~D" i)))) f)) (load (compile-file source))) Then, inspect the FOP-STACK variables again: sb-fasl::*fop-stack-pointer* (length sb-fasl::*fop-stack*) You should find that the length of the *FOP-STACK* has grown, but the *FOP-STACK-POINTER* hasn't moved up correspondingly. As far as I can see, (1) the space above *FOP-STACK-POINTER* is permanently allocated and (2) it doubles in size every time something like this happens. (You should be able to make it double in size as many times as you like by LOADing the outer .fasl above repeatedly.) CHAPTER THE SECOND: PESSIMIZATION OF THE REVISED CODE IMHO the original code involved here is pretty ugly and confusing, so the patch below just rips it out and replaces it with something which seems easier to understand (and uses more CL-standard library machinery instead of recreating analogous machinery from scratch). I present the patch for review, instead of just slamming it into CVS, because I don't want to pessimize stuff unnecessarily, and this has the potential to do so, especially since currently adjustable-vector-of-T-with-FILL-POINTER operations aren't optimized very well. But I think the proper fix for that is to optimize those operations, rather than trying to work around them. I'd hope it should be possible to compile those operations fast enough that they aren't a bottlneck compared to the disk i/o required to get the FOPs into memory in the first place. A simple benchmark on my system showed 0.21 seconds of real time 0.21 seconds of user run time 0.0 seconds of system run time 28 page faults and 2165328 bytes consed. to load 19 .fasl files, totalling 735079 bytes, using the old code (sbcl-0.8.7.36), and 0.247 seconds of real time 0.25 seconds of user run time 0.0 seconds of system run time 32 page faults and 2181656 bytes consed. using the new system, so the pessimization seems to be detectable. I think I am happy to live with this level of pessimization (because the new cleaner approach seems much easier to make right, and reasonably easy to optimize in the future) but I wanted to run this by people first. In particular, if someone with a different combination of .fasl files and system characteristics has much more pessimization to report, maybe this isn't such a good idea after all. CHAPTER THE THIRD: THE PATCH Index: src/code/fop.lisp =================================================================== RCS file: /cvsroot/sbcl/sbcl/src/code/fop.lisp,v retrieving revision 1.26 diff -u -r1.26 fop.lisp --- src/code/fop.lisp 12 Dec 2003 04:35:23 -0000 1.26 +++ src/code/fop.lisp 2 Feb 2004 19:51:21 -0000 @@ -37,7 +37,7 @@ ;;; Define a pair of fops which are identical except that one reads ;;; a four-byte argument while the other reads a one-byte argument. The -;;; argument can be accessed by using the Clone-Arg macro. +;;; argument can be accessed by using the CLONE-ARG macro. ;;; ;;; KLUDGE: It would be nice if the definition here encapsulated which ;;; value ranges went with which fop variant, and chose the correct @@ -161,7 +161,7 @@ (unless (= *current-fop-table-index* expected-index) (bug "fasl table of improper size")))) (define-fop (fop-verify-empty-stack 63 :stackp nil) - (unless (= *fop-stack-pointer* *fop-stack-pointer-on-entry*) + (unless (zerop (length *fop-stack*)) (bug "fasl stack not empty when it should be"))) ;;;; fops for loading symbols @@ -340,7 +340,7 @@ (macrolet ((frob (name op fun n) `(define-fop (,name ,op) - (call-with-popped-things ,fun ,n)))) + (call-with-popped-args ,fun ,n)))) (frob fop-list-1 17 list 1) (frob fop-list-2 18 list 2) Index: src/code/load.lisp =================================================================== RCS file: /cvsroot/sbcl/sbcl/src/code/load.lisp,v retrieving revision 1.31 diff -u -r1.31 load.lisp --- src/code/load.lisp 18 Oct 2003 10:14:52 -0000 1.31 +++ src/code/load.lisp 2 Feb 2004 19:51:23 -0000 @@ -146,64 +146,47 @@ ;;;; the fop stack -;;; (This is in a SIMPLE-VECTOR, but it grows down, since it is -;;; somewhat cheaper to test for overflow that way.) -(defvar *fop-stack* (make-array 100)) -(declaim (simple-vector *fop-stack*)) - -;;; the index of the most recently pushed item on the fop stack -(defvar *fop-stack-pointer* 100) - -;;; the current index into the fop stack when we last recursively -;;; entered LOAD -(defvar *fop-stack-pointer-on-entry*) -(declaim (type index *fop-stack-pointer* *fop-stack-pointer-on-entry*)) - -(defun grow-fop-stack () - (let* ((size (length (the simple-vector *fop-stack*))) - (new-size (* size 2)) - (new-stack (make-array new-size))) - (declare (fixnum size new-size) (simple-vector new-stack)) - (replace new-stack (the simple-vector *fop-stack*) :start1 size) - (incf *fop-stack-pointer-on-entry* size) - (setq *fop-stack-pointer* size) - (setq *fop-stack* new-stack))) +;;; (This is to be bound by LOAD to an adjustable (VECTOR T) with +;;; FILL-POINTER, for use as a stack with VECTOR-PUSH-EXTEND.) +(defvar *fop-stack*) +(declaim (type (vector t) *fop-stack*)) ;;; Cache information about the fop stack in local variables. Define a ;;; local macro to pop from the stack. Push the result of evaluation -;;; if specified. +;;; if PUSHP. (defmacro with-fop-stack (pushp &body forms) (aver (member pushp '(nil t :nope))) - (let ((n-stack (gensym)) - (n-index (gensym)) - (n-res (gensym))) - `(let ((,n-stack *fop-stack*) - (,n-index *fop-stack-pointer*)) - (declare (simple-vector ,n-stack) (type index ,n-index)) + (with-unique-names (fop-stack) + `(let ((,fop-stack *fop-stack*)) + (declare (type (vector t) ,fop-stack)) (macrolet ((pop-stack () - `(prog1 - (svref ,',n-stack ,',n-index) - (incf ,',n-index))) - (call-with-popped-things (fun n) - (let ((n-start (gensym))) - `(let ((,n-start (+ ,',n-index ,n))) - (declare (type index ,n-start)) - (setq ,',n-index ,n-start) - (,fun ,@(make-list n :initial-element - `(svref ,',n-stack - (decf ,n-start)))))))) + `(vector-pop ,',fop-stack)) + (call-with-popped-args (fun n) + `(%call-with-popped-args ,fun ,n ,',fop-stack))) ,(if pushp - `(let ((,n-res (progn ,@forms))) - (when (zerop ,n-index) - (grow-fop-stack) - (setq ,n-index *fop-stack-pointer* - ,n-stack *fop-stack*)) - (decf ,n-index) - (setq *fop-stack-pointer* ,n-index) - (setf (svref ,n-stack ,n-index) ,n-res)) - `(prog1 - (progn ,@forms) - (setq *fop-stack-pointer* ,n-index))))))) + `(vector-push-extend (progn ,@forms) ,fop-stack) + `(progn ,@forms)))))) + +;;; Call FUN with N arguments popped from STACK. +(defmacro %call-with-popped-args (fun n stack) + ;; N's integer value must be known at macroexpansion time. + (declare (type index n)) + (with-unique-names (n-stack old-length new-length) + (let ((argtmps (make-gensym-list n))) + `(let* ((,n-stack ,stack) + (,old-length (fill-pointer ,n-stack)) + (,new-length (- ,old-length ,n)) + ,@(loop for i from 0 below n collecting + `(,(nth i argtmps) + (aref ,n-stack (+ ,new-length ,i))))) + (declare (type (vector t) ,n-stack)) + (setf (fill-pointer ,n-stack) ,new-length) + ;; (For some applications it might be appropriate to FILL the + ;; popped area with NIL here, to avoid holding onto garbage. For + ;; sbcl-0.8.7.something, though, it shouldn't matter, because + ;; we're using this only to pop stuff off *FOP-STACK*, and the + ;; entire *FOP-STACK* can be GCed as soon as LOAD returns.) + (,fun ,@argtmps))))) ;;;; Conditions signalled on invalid fasls (wrong fasl version, etc), ;;;; so that user code (esp. ASDF) can reasonably handle attempts to @@ -357,8 +340,7 @@ ;;; ;;; Return true if we successfully load a group from the stream, or ;;; NIL if EOF was encountered while trying to read from the stream. -;;; Dispatch to the right function for each fop. Special-case -;;; FOP-BYTE-PUSH since it is real common. +;;; Dispatch to the right function for each fop. (defun load-fasl-group (stream) (when (check-fasl-header stream) (catch 'fasl-group-end @@ -366,6 +348,10 @@ (loop (let ((byte (read-byte stream))) + ;; stale code from before rewrite of *FOP-STACK* as + ;; adjustable vector (probably worth rewriting when next + ;; anyone needs to debug FASL stuff) + #| ;; Do some debugging output. #!+sb-show (when *show-fops-p* @@ -386,26 +372,10 @@ byte (1- (file-position stream)) (svref *fop-funs* byte)))) + |# ;; Actually execute the fop. - (if (eql byte 3) - ;; FIXME: This is the special case for FOP-BYTE-PUSH. - ;; Benchmark to see whether it's really worth special - ;; casing it. If it is, at least express the test in - ;; terms of a symbolic name for the FOP-BYTE-PUSH code, - ;; not a bare '3' (!). Failing that, remove the special - ;; case (and the comment at the head of this function - ;; which mentions it). - (let ((index *fop-stack-pointer*)) - (declare (type index index)) - (when (zerop index) - (grow-fop-stack) - (setq index *fop-stack-pointer*)) - (decf index) - (setq *fop-stack-pointer* index) - (setf (svref *fop-stack* index) - (svref *current-fop-table* (read-byte stream)))) - (funcall (the function (svref *fop-funs* byte)))))))))) + (funcall (the function (svref *fop-funs* byte))))))))) (defun load-as-fasl (stream verbose print) ;; KLUDGE: ANSI says it's good to do something with the :PRINT @@ -420,16 +390,14 @@ (let* ((*fasl-input-stream* stream) (*current-fop-table* (or (pop *free-fop-tables*) (make-array 1000))) (*current-fop-table-size* (length *current-fop-table*)) - (*fop-stack-pointer-on-entry* *fop-stack-pointer*)) + (*fop-stack* (make-array 100 :fill-pointer 0 :adjustable t))) (unwind-protect (loop while (load-fasl-group stream)) - (setq *fop-stack-pointer* *fop-stack-pointer-on-entry*) (push *current-fop-table* *free-fop-tables*) - ;; NIL out the stack and table, so that we don't hold onto garbage. + ;; NIL out the table, so that we don't hold onto garbage. ;; - ;; FIXME: Couldn't we just get rid of the free fop table pool so - ;; that some of this NILing out would go away? - (fill *fop-stack* nil :end *fop-stack-pointer-on-entry*) + ;; FIXME: Could we just get rid of the free fop table pool so + ;; that this would go away? (fill *current-fop-table* nil)))) t) CHAPTER FOUR: THE PLAN I'll probably check this in later this week if no one has a really shocking horror story about performance. (And then even if someone using CVS comes up with such a horror story later, presumably it won't be too hard to revert to the old code.) -- William Harold Newman <wil...@ai...> In examining the tasks of software development versus software maintenance, most of the tasks are the same -- except for the additional maintenance task of "understanding the existing product". -- Robert L. Glass, _Facts and Fallacies of Software Engineering_ PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C |