Juho Snellman <jsnell@...> writes:
> Attached is a patch that implements %sxhash-substring using FNV 1a
> hashing algorithm (just lisp, no assembler). It's about as fast as the
> old algorithm, but gets significantly fewer collisions.
This certainly is a nice property.
> This has a
> noticable effect on performance of anything that uses hash-tables (for
> example, applying the patch results in ~5% improvement on the COMPILER
> benchmark in cl-bench and ~30% improvement on HASH-STRINGS).
I'd probably dispute your reading of the COMPILER benchmark -- this
may have an effect, but (judging from the variance of supposedly
unchanged tests) I don't think 5% is a significant measurement. 30%
probably is, though :-)
> Then for the bad news. Multiplication mod 2**32 is required, and as
> far as I can tell that's only available on x86 => the new algorithm needs
> to be conditionalized somehow. Using SB!C:FIND-MODULAR-VERSION doesn't
> seem possible, since it's loaded too late in the build process.
Actually, it should be possible to use it. I'll explain in a little
more detail below.
> For now
> I've settled on a kludgy
> #+(or (and sb-xc-host !x86) (and (not sb-xc-host) x86))
> #-(not (or (and sb-xc-host !x86) (and (not sb-xc-host) x86)))
> , but that seem rather sub-optimal. :-)
Indeed -- in fact, it needs some work.
Firstly, there are not just one but two reader macros that do
read-time conditionalization: the ordinary Common Lisp one, with #+
and #-, which dispatches on the *features* list of the host Common
Lisp; and our SBCL-as-application--defined #!+ and #!-, which dispatch
on the *features* list of the Lisp that is to come -- that is, on
*shebang-features*, which becomes *features* when the new SBCL is
So firstly, #+(... x86 ...) is a no-no, because that means it
dispatches on whether the host Common Lisp is running on an x86 --
meaning that if you're attempting a cross-compilation from a host x86
lisp to a PPC, say, you will lose :-). So at least that part wants to
be #!+x86 (i.e. "compile this bit if I will eventually be running on
an x86", rather than "compile this bit if I am currently being built
on an x86").
Secondly, the file in question (target-sxhash.lisp) is built early,
true, but with a :not-host -- that is, it will only ever be built by
sbcl-as-application, and never by the host Lisp in and of itself,
because we do not need to /execute/ this file to run the
cross-compiler (but we /do/ need to compile the file with the
cross-compiler to produce the new lisp). So the conditionals can be
simplified to #!+x86 and #!-x86.
But thirdly, since this file is only built by the cross-compiler, the
cross-compiler is loaded by the time this file is compiled; and at
this time, modular function compilation is a known beast. So it
_should_ be replacable by
(defun %sxhash-substring ...
#.(if (sb!c:find-modular-function '* 32)
unless I'm missing something.
> HASH-STRINGS [ 11.36] 0.73
> +#+(or (and sb-xc-host !x86) (and (not sb-xc-host) x86))
> +(defun %sxhash-substring (string &optional (count (length string)))
> + ;; Fowler/Noll/Vo-hashing. Requires efficient multiplication modulo
> + ;; 2**32.
> + (declare (optimize (speed 3) (safety 0)))
> + (declare (type string string))
> + (declare (type index count))
> + (let ((result 2166136261))
> + (declare (type (unsigned-byte 32) result))
> + (unless (typep string '(vector nil))
> + (dotimes (i count)
> + (declare (type index i))
> + (setf result (ldb (byte 32 0)
> + (* (logxor (char-code (aref string i)) result)
> + 16777619)))))
> + (logand result most-positive-fixnum)))
So, my other question: Is this algorithm (and these constants)
specialized for hashing strings of 7-bit (or 8-bit) characters to
32-bit values? I presume so, but is there a generalization (a) to
21-bit (or 32-bit) characters, possibly with a non-uniform
distribution, and (b) to 64-bit values?
Thanks very much for this investigation; once these things are sorted
out, it will spur implementation of modular multiplication on non-x86
platforms, too :-)
http://www-jcsu.jesus.cam.ac.uk/~csr21/ +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%") (pprint #36rJesusCollegeCambridge)