Thread: [Pyobjc-dev] Pre-PEP: Mutable keys in dictionaries
Brought to you by:
ronaldoussoren
From: Just v. R. <ju...@le...> - 2003-02-05 23:54:46
|
Here's a quick pre-PEP. Allow mutable objects as dictionary keys in certain cases by adding a new protocol. dict.__setitem__() and __contains__() would work like this bit of pseudo code: def __setitem__(self, key, value): if not hashable(key): if hasattr(key, "__immutable_copy__"): key = key.__immutable_copy__() else: raise TypeError, "unhashable key" ...insert value with key into dict... def __contains__(self, key): if not hashable(key): if hasattr(key, "__temporary_immutable__"): key = key.__temporary_immutable__() else: raise TypeError, "unhashable key" ...test for key membership... (other methods would be similar) When implemented in C in dictobject.c, this would add no overhead for the "normal" case (when keys _are_ directly hashable). The __temporary_immutable__ part of the protocol is needed to avoid unneccesary copying in those cases where the key isn't actually inserted into the dict, but is only used for membership testing. It would return a wrapper around the mutable object, defining __eq__ and __hash__. (This idea is stolen from the sets.py module.) If an implementer doesn't care about this efficiency, __temporary_immutable__ can be an alias to __immutable_copy__. Use Cases: - sets.py could be written more efficiently and/or could be more easily be rewritten in C. - Again, the troubles of bridging Python to Objective-C have triggered this idea. Due to the nature of Objective-C in general, and Cocoa in particular, we can never be sure beforehand whether a string is mutable or not. We can detect mutability at runtime, but depending on this is dangerous as it exposes an implementation detail that may change between releases of _any_ API that returns strings. Therefore code that works in one setup may break in another, or even in the the same setup after an arbitrary library upgrade, simply because an API suddenly returns a mutable string where it didn't before. Yes this sucks, but it's the way it is. <Add link to post from bbum explaining this situation in detail> Backwards compatibilty: 100%. Overhead added None Implementation There's quite a bit of code duplication in those areas of dictobject.c that matter for this proposal, so implementing it will take longer than the 10 minutes I originally estimated <wink>. Perhaps this is a good opportunity to refactor these bits into inline functions? Not being a C guru I can't judge whether this is possible or desirable with the set of compilers that Python must be built with. Just |
From: Guido v. R. <gu...@py...> - 2003-02-06 18:58:41
|
> Here's a quick pre-PEP. > > > Allow mutable objects as dictionary keys in certain cases by adding a > new protocol. dict.__setitem__() and __contains__() would work like this > bit of pseudo code: > > def __setitem__(self, key, value): > if not hashable(key): > if hasattr(key, "__immutable_copy__"): > key = key.__immutable_copy__() > else: > raise TypeError, "unhashable key" > ...insert value with key into dict... > > def __contains__(self, key): > if not hashable(key): > if hasattr(key, "__temporary_immutable__"): > key = key.__temporary_immutable__() > else: > raise TypeError, "unhashable key" > ...test for key membership... > > (other methods would be similar) > > When implemented in C in dictobject.c, this would add no overhead for > the "normal" case (when keys _are_ directly hashable). I have a small doubt about this claim. Inserting code into PyDict_SetItem(), even if it is normally never executed, may reduce code locality and hence cause the code to miss the cache more often than before. This *may* be addressed by rearranging the code, but there's a limit to that. I have a countersuggestion. Could you do this as a subclass of dict? A subclass in Python would be inefficient, but might still be good enough for your second use case (the ObjC bridge). If not, a subclass in Python might be feasible -- it would be a little bit more work than integrating it into dictobject.c, but you have a lot less convincing to do, and you can even get it to work with Python 2.2. > The __temporary_immutable__ part of the protocol is needed to avoid > unneccesary copying in those cases where the key isn't actually inserted > into the dict, but is only used for membership testing. It would return > a wrapper around the mutable object, defining __eq__ and __hash__. (This > idea is stolen from the sets.py module.) If an implementer doesn't care > about this efficiency, __temporary_immutable__ can be an alias to > __immutable_copy__. > > > Use Cases: > > - sets.py could be written more efficiently and/or could be more easily > be rewritten in C. > > - Again, the troubles of bridging Python to Objective-C have triggered > this idea. Due to the nature of Objective-C in general, and Cocoa in > particular, we can never be sure beforehand whether a string is > mutable or not. We can detect mutability at runtime, but depending on > this is dangerous as it exposes an implementation detail that may > change between releases of _any_ API that returns strings. Therefore > code that works in one setup may break in another, or even in the the > same setup after an arbitrary library upgrade, simply because an API > suddenly returns a mutable string where it didn't before. Yes this > sucks, but it's the way it is. <Add link to post from bbum explaining > this situation in detail> > > > Backwards compatibilty: > > 100%. > > Overhead added > > None > > > Implementation > > There's quite a bit of code duplication in those areas of dictobject.c > that matter for this proposal, so implementing it will take longer than > the 10 minutes I originally estimated <wink>. Perhaps this is a good > opportunity to refactor these bits into inline functions? Not being a C > guru I can't judge whether this is possible or desirable with the set of > compilers that Python must be built with. You can't rely on inline. OTOH, GCC -O3 often inlines static functions without needing a hint. --Guido van Rossum (home page: http://www.python.org/~guido/) |
From: Just v. R. <ju...@le...> - 2003-02-06 20:17:15
|
[JvR] > > When implemented in C in dictobject.c, this would add no overhead > > for the "normal" case (when keys _are_ directly hashable). [GvR] > I have a small doubt about this claim. Inserting code into > PyDict_SetItem(), even if it is normally never executed, may reduce > code locality and hence cause the code to miss the cache more often > than before. This *may* be addressed by rearranging the code, but > there's a limit to that. Ok, so this will need to be addressed somehow. > I have a countersuggestion. > > Could you do this as a subclass of dict? A subclass in Python would > be inefficient, but might still be good enough for your second use > case (the ObjC bridge). If not, a subclass in Python might be > feasible -- it would be a little bit more work than integrating it > into dictobject.c, but you have a lot less convincing to do, and you > can even get it to work with Python 2.2. But it doesn't solve our problem. We have no control over what dictionaries are used, either by our users directly or by the Python library they use. We _do_ have control over the keys that are causing troubles: our main problem is that we cannot rely on NSString instances to be immutable, yet we need the ability to use them as keys in (regular) dicts, just as if they were Python strings. Just |
From: Guido v. R. <gu...@py...> - 2003-02-06 20:08:35
|
> > Could you do this as a subclass of dict? A subclass in Python would > > be inefficient, but might still be good enough for your second use > > case (the ObjC bridge). If not, a subclass in Python might be > > feasible -- it would be a little bit more work than integrating it > > into dictobject.c, but you have a lot less convincing to do, and you > > can even get it to work with Python 2.2. > > But it doesn't solve our problem. We have no control over what > dictionaries are used, either by our users directly or by the Python > library they use. We _do_ have control over the keys that are causing > troubles: our main problem is that we cannot rely on NSString instances > to be immutable, yet we need the ability to use them as keys in > (regular) dicts, just as if they were Python strings. OK, go ahead and try to implement your idea, and report benchmarks here. --Guido van Rossum (home page: http://www.python.org/~guido/) |
From: Neal N. <ne...@me...> - 2003-02-06 20:50:59
|
On Thu, Feb 06, 2003 at 08:42:59PM +0100, Just van Rossum wrote: > > [GvR] > > Could you do this as a subclass of dict? A subclass in Python would > > be inefficient, but might still be good enough for your second use > > case (the ObjC bridge). If not, a subclass in Python might be > > feasible -- it would be a little bit more work than integrating it > > into dictobject.c, but you have a lot less convincing to do, and you > > can even get it to work with Python 2.2. > > But it doesn't solve our problem. We have no control over what > dictionaries are used, either by our users directly or by the Python > library they use. We _do_ have control over the keys that are causing > troubles: our main problem is that we cannot rely on NSString instances > to be immutable, yet we need the ability to use them as keys in > (regular) dicts, just as if they were Python strings. Couldn't you add an as_key() method to NSString? Then dict access would always be like: dict[nsstring.as_key()] Neal |
From: Just v. R. <ju...@le...> - 2003-02-06 21:03:18
|
Neal Norwitz wrote: > Couldn't you add an as_key() method to NSString? Then dict > access would always be like: > > dict[nsstring.as_key()] That's not my idea of transparent bridging, so no ;-) Just |