From: <don...@is...> - 2017-12-20 20:06:53
|
Joe...@t-... writes: > Regarding concurrent hash-tables, I'd relax the requirements about iteration: > - Iteration must terminate. > - The set of keys need not match the serializable database isolation level. > - It's OK for some keys to be missing from the enumeration! That's the price > an application pays for going through a table without adequate outer locking. > - No key shall be presented twice(?) (except if it's removed + re-added meanwhile) > - Iteration must not crash. > > IOW, it's OK during iteration to present keys that have been removed > concurrently. There's no need to implement snapshots. But the data > structures should never be seen in such an inconsistent state that the > iteration loops infinitely or crashes. I think that if you have a thread that keeps adding new keys, it's ok for the iteration to never terminate. I don't think it's ok to miss keys that were present at the beginning of the iteration and never removed. Nor do I think it's ok to present the same key twice in any case. I do agree that the iteration should not crash. Here's what I wrote before: For instance, I hope hash table iteration could be described in terms of all of the keys in the table at the beginning plus all those added during the iteration being ordered somehow, and the iteration visiting them in that order (along with the associated values at the time the key is visited), but skipping those that were not in the table (either cause they were deleted earlier or not yet added) at the time the iteration reaches their position. I hope the informal specification above is clear. (BTW, I think that the statement: the iteration visits once every key that was in the table at the beginning of the iteration but never deleted during the iteration, never visits any key that was not in the table at the beginning and not added during the iteration, and visits either once or zero times those that were added or deleted during the iteration is implied by that but fails to guarantee some desired ordering properties guaranteed by the more operational description.) To be a bit more explicit, here's an example. I think I'm asking for something that every programmer will expect. Suppose an iteration over a table starts in a state with the table empty, and another thread performs the following operations in the following order: operation resulting state add A => 1 (A=>1) add B => 2 (A=>1,B=>2) remove A (B=>2) add A => 3 (B=>2,A=>3) remove B (A=>3) add B => 4 (A=>3,B=>4) and now the iteration returns. Suppose the iteration simply collects and returns a list of all the keys and values in the order visited. The following results from the iteration are allowable: NIL - the iteration could have finished before the first add all of the resulting states listed above are allowed when interpreted as iterations results, since the entire iteration could have been performed in that state (A=>1,B=>4), even though the table was never in such a state, because it could visit A right after A was added and B just before the end) (B=>4) is allowed because A might have been visited during the time it was not present, and then B visited at the end. However (B=>4,A=>1) is NOT allowed, because after B became related to 4, A was never related to 1. Note that my model does require that the table can be viewed as having gone through a sequence of states corresponding to the operations having been performed in some order. So if the first two adds had been done by two different threads, we might have a choice of the order in which they were done. But this choice could be limited by other evidence, such as reads that determine what's in the table at different ordered points in time. The specification above also rules out clearly ridiculous outcomes such as (C=>4) or (A=>9). |