Once we start the project, the GSoC student can be bound to deliverables that are possible within the time frame. However the full scope of the project might take more. I can work on those parallely.

If I can be kept in the loop for discussions and I can do more detailed analysis and I hope we can find ways we can collaborate with each other.

Amitav


On Tue, Apr 23, 2013 at 5:22 AM, Paul Khuong <pvk@pvk.ca> wrote:
Amitav Mohanty wrote:
I am interested in contributing to SBCL; but I am no more a student so I
can't go through GSoC. However, since I am beginning to work on SBCL, I
will need some guidance. Yesterday we talked about it at #sbcl too. We
talked about a GSoC student getting preference and I agree to that.

I am interested in the task 'stronger hash functions and specialised hash
tables'. Robert would like to join in and both of us would like to work on
it. As we are both engaged by our jobs, we would not be bound to a
timeline; but we are interested in contributing.

Robert already has done some work related to hashes
https://github.com/brown/city-hash/blob/master/city-hash.lisp

I'm CCing Aenik Shah, as he's a student interested in the project.

Discussions on #sbcl seemed to converge on two points:

1. Implementing a strong block hash function and hacking it for incremental hashing would be useful. We could use the function for strings and bit vectors, but also for object graphs: feeding data to a block hash function (rather than mixing fixnum-sized hash values) should result in a higher-quality distribution. I expect assembly code implementations of the inner loop will eventually be interesting, but getting something working in CL would be a good first step, and doesn't even have to be tightly integrated with SBCL.

2. We want to improve the hash function for immediate data (characters, fixnums, pointer addresses, etc.), but speed doesn't seem to be too much of an issue: curently, only sxhash of symbols has an impact, and that's handled as a cache lookup. For everything else, the common use case (gethash/(setf gethash)) goes through a full call to the hash function, which then dispatches on the object's type before computing a hash value. So it's not too bad if hashing immediate data (after type dispatch) is up to twice or thrice as slow, as long as the distribution of hash values is significantly improved in return.

I'm not sure how you could best collaborate with a GSoC student. I suppose it would be reasonable to try and coordinate to implement hash functions in CL. After that, it will clearly depend on the direction the student will wish to work in.

Paul Khuong