From: SourceForge.net <no...@so...> - 2006-12-22 14:24:32
Bugs item #1152612, was opened at 2005-02-26 22:17
Message generated for change (Comment added) made by leouserz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1152612&group_id=12867

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core
Group: Deferred
Status: Open
Resolution: None
Priority: 2
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: vars(obj) returns PyStringMap instead of DictType

Initial Comment:
When getting an object's __dict__, the type() of the dictionary object returns PyStringMap. This causes a problem because types.DictType does not match PyStringMap. Some existing Marshallers (in my case, xmlrpclib) expect an Instance's __dict__ to be a DictType when marshalling an Instance (such as an Exception).

It looks like types.DictType should match org.python.core.PyStringMap. When getting the __dict__ of an Instance in CPython, it returns a type of DictType.

-Steve
leo...@nu...

----------------------------------------------------------------------

Comment By: leouser (leouserz)
Date: 2006-12-22 14:24

Message:
Logged In: YES
user_id=1277399
Originator: NO

It's possible that this could be fixed by just ditching the PyStringMap used internally and switching over to PyDictionary. From experimenting with gutting PyStringMap and replacing its internal arrays and hashing with a HashMap, I was able to get an increase in performance. Given that the Dictionary appears remarkably similar to that implementation (it forwards to its Hashtable, yuck), there may not be a performance reason to stick with PyStringMap (assuming that is the reason that there is a PyStringMap).

leouser

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2006-12-22 19:10:30
>Comment By: Khalid Zuberi (kzuberi)
Date: 2006-12-22 14:10

Message:
Logged In: YES
user_id=18288
Originator: NO

The only (little) help I can add is to note Samuele's recent reference to performance & PyStringMap:

http://article.gmane.org/gmane.comp.lang.jython.devel/2610

- kz

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2006-12-22 19:30:26
Comment By: leouser (leouserz)
Date: 2006-12-22 19:30

Message:
Logged In: YES
user_id=1277399
Originator: NO

Hmmm, speed-wise I'm not sure; I guess it depends upon how quickly the PyString is going to return from a hashCode call. From gutting PyStringMap and replacing it with a Map that used the interned strings, I saw a boost in performance on the test I was running. So from that angle PyStringMap didn't seem that speedy.

I would suspect that PyString would return as quickly as the String. Its hashCode hashes the internal string and caches the value, so I would expect equivalent behavior between the two. Also, it looks like it should have a speedy equals method, as long as the string is interned. So I don't see any terrible issues using it as a key.

I think using a PyDictionary would make the instances more compliant with Python, given that I can take the dict from a Python instance and use non-strings for keys.

leouser

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2006-12-22 19:36:40
>Comment By: Samuele Pedroni (pedronis)
Date: 2006-12-22 19:36

Message:
Logged In: YES
user_id=61408
Originator: NO

The issue is all the places that have an already interned String, not a PyString. Converting a String to a PyString involves an allocation, and allocations are still costly.

As for hashCode vs. identityHashCode, it is quite possible that the performance trade-offs of the two have changed over time since 1.1. Implementing identityHashCode is not straightforward on moving GCs.

----------------------------------------------------------------------
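The point about already interned Strings can be illustrated with a minimal Java sketch. This is not Jython code: the class `InternedKeyDemo` and its methods are made up for illustration, and `IdentityHashMap` merely stands in for any table that compares keys by reference. The sketch shows that an interned String can be found by identity alone, while a freshly allocated String with the same contents cannot, which is why code that already holds an interned String would prefer not to allocate a wrapper just to do a lookup.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; nothing here is part of Jython.
class InternedKeyDemo {

    // Builds an identity-keyed map holding one entry under the interned form of key.
    static Map<String, Integer> identityMapWith(String key, int value) {
        Map<String, Integer> map = new IdentityHashMap<>();
        map.put(key.intern(), value);
        return map;
    }

    // Looks a key up by reference identity, interning it first if asked to.
    static Integer lookup(Map<String, Integer> map, String key, boolean internFirst) {
        return map.get(internFirst ? key.intern() : key);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = identityMapWith("spam", 1);
        // new String(...) forces a distinct object with the same contents.
        String fresh = new String("spam");
        System.out.println(lookup(map, fresh, false)); // null: different object
        System.out.println(lookup(map, fresh, true));  // 1: interned reference matches
    }
}
```

Whether identity hashing is actually faster than String.hashCode() on a given JVM is exactly the open question in the comment above; the sketch only shows the semantics, not the cost.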
From: SourceForge.net <no...@so...> - 2006-12-22 19:54:41
Comment By: leouser (leouserz)
Date: 2006-12-22 19:54

Message:
Logged In: YES
user_id=1277399
Originator: NO

Hmm, I was thinking about having a PyString cache. Instead of calling new PyString("STRING OF SOMETHING"), pass the call off to a factory method and have it return a cached PyString. I was under the impression yesterday that PyStringMap was getting PyStrings anyway and that they were passing on their Strings, so I'm not sure how switching to PyDictionary is going to add any costs in this regard.

Yes, hashCode should be faster than System.identityHashCode(). Native methods add overhead that you won't ever see with a simple accessor method. String just returns a newly calculated hashCode or a cached one.

The performance difference I saw yesterday may not even be centered around the difference between identityHashCode and hashCode; it may just be that the HashMap is more efficient in how it stores and retrieves things than PyStringMap.

leouser

----------------------------------------------------------------------
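The cache idea from the 19:54 comment might look roughly like the following sketch. It is an assumption-laden illustration, not a patch: `WrappedString` is a hypothetical stand-in for a wrapper such as PyString, and the factory method `of` and the `ConcurrentHashMap`-backed cache are choices made here, not anything confirmed in the thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a string wrapper such as PyString; not Jython's class.
final class WrappedString {
    private static final ConcurrentMap<String, WrappedString> CACHE =
            new ConcurrentHashMap<>();

    final String value;

    private WrappedString(String value) {
        this.value = value;
    }

    // Factory method: returns the cached wrapper for value, allocating only on a miss.
    static WrappedString of(String value) {
        WrappedString cached = CACHE.get(value);
        if (cached == null) {
            WrappedString created = new WrappedString(value);
            // putIfAbsent returns the existing entry if another thread won the race.
            WrappedString raced = CACHE.putIfAbsent(value, created);
            cached = (raced != null) ? raced : created;
        }
        return cached;
    }
}
```

With this shape, WrappedString.of("x") returns the same object on every call, so repeated conversions of the same String cost a map lookup instead of an allocation. The trade-off is that the cache as written never evicts entries, so it would grow with the number of distinct strings seen.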
From: SourceForge.net <no...@so...> - 2006-12-22 20:38:34
Comment By: leouser (leouserz)
Date: 2006-12-22 20:38

Message:
Logged In: YES
user_id=1277399
Originator: NO

Yup, I did some fiddling with PyJavaClass so that it used a PyDictionary instead of a PyStringMap. Performance-wise, it improved, but not to the degree that it improved with PyStringMap. Even having the Strings interned in the PyDictionary did not give us as big a boost as PyStringMap did. This may just mean that PyDictionary could use some additional tweaking. Swapping in a HashMap will help a little, as there will be less lock acquisition going on. But I can't believe that is the key to the better performance I was seeing.

leouser

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2006-12-22 20:48:22
Comment By: leouser (leouserz)
Date: 2006-12-22 20:48

Message:
Logged In: YES
user_id=1277399
Originator: NO

Aha, PyStringMap does have a magic method, __finditem__(String data). This gets invoked first, and if we go directly to the table in PyDictionary we see a pretty good boost in performance there. I guess the default __finditem__ method of PyStringMap is less performant than PyDictionary's __finditem__ chain.

leouser

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2006-12-22 20:58:25
|
Bugs item #1152612, was opened at 2005-02-26 22:17
Message generated for change (Comment added) made by pedronis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1152612&group_id=12867

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core
Group: Deferred
Status: Open
Resolution: None
Priority: 2
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: vars(obj) returns PyStringMap instead of DictType

Initial Comment:
When getting an object's __dict__, the type() of the dictionary object returns PyStringMap. This causes a problem because types.DictType does not match PyStringMap. Some existing Marshallers (in my case, xmlrpclib) expect an Instance's __dict__ to be a DictType when marshalling an Instance (such as an Exception). It looks like types.DictType should match org.python.core.PyStringMap. When getting the __dict__ of an Instance in CPython, it returns a type of DictType.

-Steve
leo...@nu...

----------------------------------------------------------------------

>Comment By: Samuele Pedroni (pedronis)
Date: 2006-12-22 20:58

Message:
Logged In: YES
user_id=61408
Originator: NO

notice that we want the synchronization. Although Python makes no strong promises about what happens on implementations without the GIL, Python style is influenced by the presence of the GIL in CPython; this means that builtin types should have "atomic" behavior. Now there are dissenting opinions (http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm) on this, but the bottom line (because it has happened) is that if we miss some synchronization related to some of the listed ops, someone will end up filing a bug because some code is not behaving as it does on CPython. We have had this kind of report from experienced Pythoneers (for example some Twisted contributors), and telling them to add more locks themselves doesn't really work or scale in practice, because it is too annoying, especially if the code is meant to run primarily on CPython.

----------------------------------------------------------------------

Comment By: leouser (leouserz)
Date: 2006-12-22 20:48

Message:
Logged In: YES
user_id=1277399
Originator: NO

aha, PyStringMap does have a magic method, __finditem__(String data). This gets invoked first, and if we go directly to the table in PyDictionary we see a pretty good boost in performance there. I guess the default __finditem__ method of PyStringMap is less performant than PyDictionary's __finditem__ chain.

leouser

----------------------------------------------------------------------

Comment By: leouser (leouserz)
Date: 2006-12-22 20:38

Message:
Logged In: YES
user_id=1277399
Originator: NO

yup, I did some fiddling with PyJavaClass so that it used a PyDictionary instead of a PyStringMap. Performance-wise, it improved but not to the degree that it improved with PyStringMap. Even having the Strings interned in the PyDictionary did not give us as big a boost as PyStringMap did. This may just mean that PyDictionary could use some additional tweaking. Swapping in a HashMap will help a little as there will be less lock acquisition going on. But I can't believe that is the key to the better performance I was seeing.

leouser

----------------------------------------------------------------------

Comment By: leouser (leouserz)
Date: 2006-12-22 19:54

Message:
Logged In: YES
user_id=1277399
Originator: NO

hmm, I was thinking about having a PyString cache. Instead of calling new PyString("STRING OF SOMETHING"), pass the call off to a factory method and have it return a cached PyString. I was under the impression yesterday that PyStringMap was getting PyStrings anyway and that they were passing on their Strings. So I'm not sure how switching to PyDictionary is going to add any costs in this regard.

Yes, hashCode should be faster than System.identityHashCode(). Native methods add overhead that you won't ever see with a simple accessor method. String just returns a newly calculated hashCode or a cached one. The performance difference I saw yesterday may not even be centered around the difference between identityHashCode and hashCode; it may just be that the HashMap is more efficient in how it stores and retrieves things than PyStringMap.

leouser

----------------------------------------------------------------------

Comment By: Samuele Pedroni (pedronis)
Date: 2006-12-22 19:36

Message:
Logged In: YES
user_id=61408
Originator: NO

the issue is all the places that have an already interned String, not a PyString. Going from String to PyString involves an allocation. Allocations are still costly.

As for using hashCode vs. identityHashCode, it is quite possible that the performance trade-offs of the two have changed over time since 1.1. Implementing identityHashCode is not straightforward on moving GCs.

----------------------------------------------------------------------

Comment By: leouser (leouserz)
Date: 2006-12-22 19:30

Message:
Logged In: YES
user_id=1277399
Originator: NO

hmmm, speed-wise I'm not sure; I guess it depends upon how quickly the PyString is going to return from a hashCode call. From gutting PyStringMap and replacing it with a Map that used the interned strings I saw a boost in performance on the test I was running. So from that angle PyStringMap didn't seem that speedy.

I would suspect that PyString would return as quickly as the String. Its hashCode hashes the internal string and caches the value, so I would expect equivalent behavior between the two. Also, it looks like it should have a speedy equals method, as long as the string is interned. So I don't see any terrible issues using it as a key.

I think using a PyDictionary would make the instances more compliant with Python, given that I can take the dict from a Python instance and use non-strings for keys.

leouser

----------------------------------------------------------------------

Comment By: Khalid Zuberi (kzuberi)
Date: 2006-12-22 19:10

Message:
Logged In: YES
user_id=18288
Originator: NO

The only (little) help I can add is to note Samuele's recent reference to performance & PyStringMap:

http://article.gmane.org/gmane.comp.lang.jython.devel/2610

- kz

----------------------------------------------------------------------

Comment By: leouser (leouserz)
Date: 2006-12-22 14:24

Message:
Logged In: YES
user_id=1277399
Originator: NO

it's possible that this could be fixed by just ditching the PyStringMap used internally and switching over to PyDictionary. From experimenting with gutting PyStringMap and replacing its internal arrays and hashing with a HashMap, I was able to get an increase in performance. Given that the Dictionary appears remarkably similar to that implementation (it forwards to its Hashtable, yuck), there may not be a performance reason to stick with the PyStringMap (assuming that is the reason that there is a PyStringMap).

leouser

----------------------------------------------------------------------
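The PyString cache idea from the 19:54 comment can be sketched roughly as follows. This is only an illustration in plain Java 8+, not Jython code: `WrappedStr` is a hypothetical stand-in for a string wrapper such as PyString, and the unbounded static cache is an assumption made for brevity (a real cache would need an eviction or weak-reference policy).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a string wrapper such as PyString; not Jython's class.
final class WrappedStr {
    // Assumption: an unbounded cache, kept simple for illustration only.
    private static final ConcurrentMap<String, WrappedStr> CACHE = new ConcurrentHashMap<>();

    private final String value;
    private int hash; // 0 means "not computed yet"

    private WrappedStr(String value) {
        this.value = value;
    }

    // Factory method: returns the cached wrapper and allocates only on first use.
    static WrappedStr of(String s) {
        return CACHE.computeIfAbsent(s, WrappedStr::new);
    }

    String value() {
        return value;
    }

    // Hashes the wrapped string once and caches the result.
    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = value.hashCode();
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true; // fast path when both sides came from the cache
        }
        return o instanceof WrappedStr && ((WrappedStr) o).value.equals(value);
    }

    public static void main(String[] args) {
        WrappedStr a = WrappedStr.of("__dict__");
        WrappedStr b = WrappedStr.of(new String("__dict__"));
        System.out.println(a == b); // same cached instance for equal strings
    }
}
```

With a factory like this, repeated lookups of the same name reuse one wrapper, which addresses the allocation cost raised in the 19:36 comment at the price of a cache lookup per call.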
From: SourceForge.net <no...@so...> - 2006-12-22 21:16:05
Bugs item #1152612, was opened at 2005-02-26 22:17
Message generated for change (Comment added) made by leouserz

Comment By: leouser (leouserz)
Date: 2006-12-22 21:16

Message:
Logged In: YES
user_id=1277399
Originator: NO

I think we may be able to get away with a ConcurrentHashMap. It should offer better scalability than the one lock per table that comes with the Hashtable, and it is safer than the HashMap, which doesn't have any synchronization. I'm seeing roughly the same golden times I was seeing yesterday with a PyDictionary that has had its __finditem__(String) overridden and its Hashtable replaced with a ConcurrentHashMap.

leouser

----------------------------------------------------------------------
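As a small illustration of the "atomic" per-operation behavior discussed in the 20:58 and 21:16 comments, the sketch below has several threads update one key in a shared table. It is plain Java 8+ and does not use any Jython class; `ConcurrentDictSketch` and `countWith` are hypothetical names introduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: a shared table updated from several threads.
final class ConcurrentDictSketch {
    // Starts `threads` workers that each increment the same key `perThread` times,
    // then returns the final count stored in the table.
    static int countWith(Map<String, Integer> table, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // merge() is atomic on ConcurrentHashMap, so no update is lost.
                    table.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return table.get("hits");
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new ConcurrentHashMap<>();
        System.out.println(countWith(table, 4, 10_000));
    }
}
```

Passing a plain HashMap instead would make the result unreliable, since its operations are not synchronized; that is the gap the comments above are concerned with.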
From: SourceForge.net <no...@so...> - 2006-12-22 21:36:29
Bugs item #1152612, was opened at 2005-02-26 22:17
Message generated for change (Comment added) made by leouserz

Comment By: leouser (leouserz)
Date: 2006-12-22 21:36

Message:
Logged In: YES
user_id=1277399
Originator: NO

it may be more scalable with threads but it is not more scalable with memory. ConcurrentHashMap appears to be a hog in terms of what it consumes. That nixes it for general mass usage.

leouser

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2007-05-19 18:24:34
Bugs item #1152612, was opened at 2005-02-26 22:17
Message generated for change (Comment added) made by amak

>Comment By: Alan Kennedy (amak)
Date: 2007-05-19 18:24

Message:
Logged In: YES
user_id=647684
Originator: NO

Could solving this problem be as simple as changing org/python/modules/types.java to read like this?

    dict.__setitem__("DictType", new PyTuple(new PyObject[] {
        PyType.fromClass(PyDictionary.class),
        PyType.fromClass(PyStringMap.class),
    }));

The isinstance operator takes a tuple as a parameter, as can be seen in the definition for StringTypes.

----------------------------------------------------------------------
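The tuple proposal in the 2007-05-19 comment relies on an "instance of any of these types" check. A rough plain-Java analogue is sketched below; `GeneralDict` and `StringKeyedMap` are hypothetical stand-ins for PyDictionary and PyStringMap, and `DICT_TYPES` plays the role of the proposed DictType tuple. None of this is Jython's API.

```java
// Illustration only: membership test against a group of types.
final class TypeTupleSketch {
    // Hypothetical stand-ins for the two mapping classes; not the real ones.
    static class GeneralDict {}
    static class StringKeyedMap {}

    // Plays the role of the proposed DictType tuple.
    static final Class<?>[] DICT_TYPES = { GeneralDict.class, StringKeyedMap.class };

    // Returns true if obj is an instance of any of the candidate classes.
    static boolean isInstanceOfAny(Object obj, Class<?>... candidates) {
        for (Class<?> c : candidates) {
            if (c.isInstance(obj)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Object namespace = new StringKeyedMap();
        // Membership in the group succeeds...
        System.out.println(isInstanceOfAny(namespace, DICT_TYPES));
        // ...but an exact comparison against one member type still fails.
        System.out.println(namespace.getClass() == GeneralDict.class);
    }
}
```

The second check hints at a limit of the approach: code that compares type(x) directly against DictType, rather than going through isinstance, would not match a tuple.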
From: SourceForge.net <no...@so...> - 2007-05-19 19:56:11
Bugs item #1152612, was opened at 2005-02-26 17:17
Message generated for change (Comment added) made by fwierzbicki

>Comment By: Frank Wierzbicki (fwierzbicki)
Date: 2007-05-19 15:56

Message:
Logged In: YES
user_id=193969
Originator: NO

I don't think making a PyStringMap pass as an instance of DictType makes sense, since it does not implement the full interface of a DictType. For example, it cannot take non-strings as keys, while one would expect something of type DictType to allow that.
---------------------------------------------------------------------- Comment By: Alan Kennedy (amak) Date: 2007-05-19 14:24 Message: Logged In: YES user_id=647684 Originator: NO Could solving this problem be as simple as changing the org/python/modules/types.java to read like this dict.__setitem__("DictType", new PyTuple(new PyObject[] { PyType.fromClass(PyDictionary.class)), PyType.fromClass(PyStringMap.class)), })); The isinstance operator takes a tuple as a parameter, as can be seen in the definition for StringTypes. ---------------------------------------------------------------------- Comment By: leouser (leouserz) Date: 2006-12-22 16:36 Message: Logged In: YES user_id=1277399 Originator: NO it may be more scalable with threads but it is not more scalable with memory. ConcurrentHashMap appears to be a hog in terms of what it consumes. That nixes is for general mass usage. leouser ---------------------------------------------------------------------- Comment By: leouser (leouserz) Date: 2006-12-22 16:16 Message: Logged In: YES user_id=1277399 Originator: NO I think we may be able to get away with a ConcurrentHashMap. It should offer better scalability than the one lock per table that comes with the Hashtable and is safer than the HashMap which doesn't have any syncronization. Im seeing roughly the same golden times I was seeing yesterday with a PyDictionary that has had its __finditem__(String) overriden and its Hashtable replaced with a ConcurrentHashMap. leouser ---------------------------------------------------------------------- Comment By: Samuele Pedroni (pedronis) Date: 2006-12-22 15:58 Message: Logged In: YES user_id=61408 Originator: NO notice that we want the synchronization. Although it makes no strong promises about what happens with implementations withouh the GIL, Python style is influenced by the presence of the GIL in CPython this means that builtin types should have an "atomic" behavior. 
Now the are dissenting opionions (http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm) on this but the bottom line (because it has happened) is that if we miss some synchronized related to some of the listed ops, someone will ends up filing a bug because some code is not behaving like on CPython. We had this kind of reports from experienced Pythoneers (for example some twisted contributors), and telling them to add more locks themself doesn't really work or scale in practice, because is too annoying especially if the code is to run on top CPython primarely. ---------------------------------------------------------------------- Comment By: leouser (leouserz) Date: 2006-12-22 15:48 Message: Logged In: YES user_id=1277399 Originator: NO aha, PyStringMap does have a magic method, __finditem__(String data) this gets invoked first, and if we go directly to the table in PyDictionary we see a pretty good boost in performance there. I guess the default __finditem__ method of PyStringMap is less performant than PyDictionary's __finditem__ chain. leouser ---------------------------------------------------------------------- Comment By: leouser (leouserz) Date: 2006-12-22 15:38 Message: Logged In: YES user_id=1277399 Originator: NO yup, I did some fiddling with PyJavaClass so that it used a PyDictionary instead of a PyStringMap. Performance wise, it improved but not to the degree that it improved with PyStringMap. Even having the Strings interned in the PyDictionary did not give us as big a boost as PyStringMap did. This may just mean that PyDictionary could use some additional tweaking. Swapping in a HashMap will help a little as there will be less lock acquisition going on. But I can't believe that is the key to the better performance I was seeing. 
leouser ---------------------------------------------------------------------- Comment By: leouser (leouserz) Date: 2006-12-22 14:54 Message: Logged In: YES user_id=1277399 Originator: NO Hmm, I was thinking about having a PyString cache. Instead of calling new PyString("STRING OF SOMETHING"), pass the call off to a factory method and have it return a cached PyString. I was under the impression yesterday that PyStringMap was getting PyStrings anyway and that they were passing on their Strings. So I'm not sure how switching to PyDictionary is going to add any costs in this regard. Yes, hashCode should be faster than System.identityHashCode(). Native methods add overhead that you won't ever see with a simple accessor method. String just returns a newly calculated hashCode or a cached one. The performance difference I saw yesterday may not even be centered around the difference between identityHashCode and hashCode; it may just be that the HashMap is more efficient in how it stores and retrieves things than PyStringMap. leouser ---------------------------------------------------------------------- Comment By: Samuele Pedroni (pedronis) Date: 2006-12-22 14:36 Message: Logged In: YES user_id=61408 Originator: NO The issue is all the places that have an already interned String, not a PyString. String to PyString involves an allocation. Allocations are still costly. As for hashCode vs. identityHashCode, it is quite possible that the performance trade-offs of the two have changed over time since 1.1. Implementing identityHashCode is not straightforward on moving GCs. ---------------------------------------------------------------------- Comment By: leouser (leouserz) Date: 2006-12-22 14:30 Message: Logged In: YES user_id=1277399 Originator: NO Hmmm, speed-wise I'm not sure; I guess it depends upon how quickly the PyString is going to return a hashCode call.
From gutting PyStringMap and replacing it with a Map that used the interned strings I saw a boost in performance on the test I was running. So from that angle PyStringMap didn't seem that speedy. I would suspect that PyString would return as quickly as the String. Its hashCode hashes the internal string and caches the value. So I would expect equivalent behavior between the two. Also, it looks like it should have a speedy equals method, as long as the string is interned. So I don't see any terrible issues using it as a key. I think using a PyDictionary would make the instances more compliant with Python, given that I can take the dict from a Python instance and use non-strings for keys. leouser ---------------------------------------------------------------------- Comment By: Khalid Zuberi (kzuberi) Date: 2006-12-22 14:10 Message: Logged In: YES user_id=18288 Originator: NO The only (little) help I can add is to note Samuele's recent reference to performance & PyStringMap: http://article.gmane.org/gmane.comp.lang.jython.devel/2610 - kz ---------------------------------------------------------------------- Comment By: leouser (leouserz) Date: 2006-12-22 09:24 Message: Logged In: YES user_id=1277399 Originator: NO It's possible that this could be fixed by just ditching the PyStringMap used internally and switching over to PyDictionary. From experimenting with gutting PyStringMap and replacing its internal arrays and hashing with a HashMap, I was able to get an increase in performance. Given that the Dictionary appears remarkably similar to that implementation---> forwarding to its Hashtable(yuck), there may not be a performance reason to stick with the PyStringMap(assuming that is the reason that there is a PyStringMap). leouser ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1152612&group_id=12867 |
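For readers following the newest comment in the thread (making DictType a tuple), here is a minimal plain-Python sketch of what that would and would not fix. StringMap, is_dict_like and is_exact_dict_type are hypothetical names made up for this illustration; StringMap only stands in for PyStringMap as "a mapping that is not a dict" and is not Jython's actual class.

```python
# Hypothetical stand-in for org.python.core.PyStringMap: a mapping type
# that is deliberately NOT a subclass of the built-in dict.
class StringMap(object):
    def __init__(self):
        self._table = {}

    def __setitem__(self, key, value):
        self._table[key] = value

    def __getitem__(self, key):
        return self._table[key]


# Assumption: DictType redefined as a tuple of types, as proposed in the thread.
DictType = (dict, StringMap)


def is_dict_like(obj):
    """isinstance accepts a tuple of classes, so either mapping matches."""
    return isinstance(obj, DictType)


def is_exact_dict_type(obj):
    """A type object never compares equal to a tuple, so this never matches."""
    return type(obj) == DictType
```

With a tuple, isinstance-style checks accept both mappings, but any caller that compares type(obj) against DictType directly, or uses it as a lookup key, stops matching even for a real dict. Whether the marshaller from the original report would benefit therefore depends on how it performs its check, which the thread does not show.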