From: <don...@is...> - 2004-04-28 18:02:04
> Alternatively, just mark the table as requiring a rehash.
> Actually, just create all HTs as GC-INVARIANT and turn them into
> non-invariant as needed.

I've not followed this thread carefully, but it does sound related to something that I recall being discussed before. I gather you're trying to save rehashing by knowing that there's nothing in the table that needs to be rehashed.

Why not arrange for each table to contain internally a list of the keys that have to be rehashed? This seems like a more precise generalization (at the cost of space) of the scheme you suggest (which stores one bit). It saves a lot when a large table has only a few keys that need to be rehashed. The cost in space could be limited by a policy that if there are too many elements that need to be rehashed you just iterate over the whole table as before.

I hope that the code does a lazy form of rehashing, so that if the table is not accessed between two GCs there's no rehash cost. Depending on how the tables work, you might arrange that you don't need to worry about rehashing when you write, only when you read. This would be especially worthwhile for situations where there's a large initialization phase in which data is entered into the table but nothing is retrieved.

====

I'm just about to start experimenting with a large amount of data where all of this GC performance becomes highly relevant. The (time) output for the phase that reads the data into a list of 7.5 million entries:

Real time: 518.6605 sec.
Run time: 389.74 sec.
Space: 6894053224 Bytes
GC: 339, GC time: 90.07 sec.

On input I printed a dot for every 1000 entries. I noticed what I interpreted as patterns of GC delays. Is there some way to record when GCs start and end? Ideally there would be a GC hook that is called after GC but gives you access to some data stored from before it: run time, real time. It would also be nice to be able to do something (that uses a limited amount of space) before the GC, like print a message.

I already had to move to the biggest machine I have available. The current image is ~600MB. Any advice is welcome.
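
P.S. To make the rehash-list suggestion above concrete, here is a rough sketch. It is in Python only for brevity and says nothing about how the real tables are implemented. MovableKey, RehashListTable, note_gc and max_tracked are all names I made up for illustration, and a GC is simulated by changing a key's address field by hand.

```python
class MovableKey:
    """Stands in for a heap object whose hash depends on its address."""
    def __init__(self, name, address):
        self.name = name
        self.address = address  # a simulated GC may change this


def hash_of(key):
    # Address-dependent keys hash on their current address; everything
    # else (ints, strings) hashes on its value and so survives a GC.
    return key.address if isinstance(key, MovableKey) else hash(key)


class RehashListTable:
    def __init__(self, nbuckets=64, max_tracked=8):
        self.buckets = [[] for _ in range(nbuckets)]
        self.max_tracked = max_tracked
        # Entries that must be rehashed after a GC.
        # None means "too many to track, rehash the whole table".
        self.tracked = []
        self.stale = False
        self.moved = 0  # entries moved by rehashing, for demonstration only

    def _index(self, key):
        return hash_of(key) % len(self.buckets)

    def note_gc(self):
        # Called after a (simulated) GC. Only sets a flag; the work is
        # deferred until the table is next accessed. A table holding no
        # address-dependent keys stays valid and is never marked stale.
        if self.tracked is None or self.tracked:
            self.stale = True

    def _rehash_if_stale(self):
        if not self.stale:
            return
        self.stale = False
        if self.tracked is None:
            # Fallback policy: iterate over everything, as before.
            entries = [e for b in self.buckets for e in b]
            for b in self.buckets:
                b.clear()
        else:
            # Precise policy: pull only the listed entries out of the
            # buckets they were filed under before the GC.
            entries = self.tracked
            for e in entries:
                old = self.buckets[e[2]]
                old[:] = [x for x in old if x is not e]
        for e in entries:
            e[2] = self._index(e[0])
            self.buckets[e[2]].append(e)
        self.moved += len(entries)

    def put(self, key, value):
        self._rehash_if_stale()
        idx = self._index(key)
        for e in self.buckets[idx]:
            if e[0] == key:
                e[1] = value
                return
        entry = [key, value, idx]  # idx remembers the current bucket
        self.buckets[idx].append(entry)
        if isinstance(key, MovableKey) and self.tracked is not None:
            if len(self.tracked) < self.max_tracked:
                self.tracked.append(entry)
            else:
                self.tracked = None  # too many: fall back to full rehash

    def get(self, key, default=None):
        self._rehash_if_stale()
        for e in self.buckets[self._index(key)]:
            if e[0] == key:
                return e[1]
        return default


table = RehashListTable()
for i in range(1000):
    table.put(i, i * i)              # GC-invariant keys
a = MovableKey("a", address=1000)
b = MovableKey("b", address=2000)
table.put(a, "A")
table.put(b, "B")

a.address, b.address = 5001, 7002    # the simulated GC moves both objects
table.note_gc()                      # no rehash work happens here
print(table.get(a), table.get(b), table.get(10))  # A B 100
print(table.moved)                                # 2
```

In this run the rehash after the simulated GC touches 2 entries rather than all 1002. The sketch rehashes on any access, reads and writes alike; making writes free of rehashing would need more care, since a write into a stale table could otherwise create a duplicate entry.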