Thread: [pygccxml-development] Serious bug in pygccxml (scopedef.py)
Brought to you by:
mbaas,
roman_yakovenko
From: Matthias B. <ba...@ir...> - 2006-03-15 09:44:01
|
Hi, I just had a case where a query for a single class produced a result that contained that class twice (this something that shouldn't be possible at all). It took me about an hour to find that bug. The problem is the scopedef_t class (from which class_t and namespace_t are derived). This class overrides the equality operator __eq__() and the implementation of that operator has unwanted side effects, i.e. it modifies the instances! The method calls self._sorted_list() with its declaration list as argument. _sorted_list() sorts the list inplace, so this means the declaration list gets modified. If you do that while iterating over the list you get a messed up result where a declaration may appear twice (while other declarations haven't been seen at all). I think this is quite a serious bug (that is difficult to track down). A quick fix would be to replace the last line in scopedef_t.__eq__() with: return self._sorted_list( self.declarations[:] ) \ == other._sorted_list( other.declarations[:] ) That is, instead of passing declarations directly we pass a copy of that list (using slice notation, you might also use the copy module). But of course, these involves two copy operations which makes the operator even more inefficient. I have to admit that I dont't really understand why those operators are overridden at all. In which case does it happen that two separate declaration objects refer to the same declaration? (probably when a declaration is included in two separate include files, right? But why isn't the first declaration then reused in the first place?). I think in my case I could switch to comparing id(decl) instead of decl itself. But I think for someone else reading the code this is not really obvious why I have to do that... - Matthias - PS: I also don't understand the purpose of the (static!) method _sorted_list() as this method just calls sort() and does nothing else. Why wasn't sort() used in the first place? |
From: Roman Y. <rom...@gm...> - 2006-03-15 10:27:18
|
On 3/15/06, Matthias Baas <ba...@ir...> wrote: > Hi, > > I just had a case where a query for a single class produced a result > that contained that class twice (this something that shouldn't be > possible at all). > It took me about an hour to find that bug. The problem is the scopedef_t > class (from which class_t and namespace_t are derived). This class > overrides the equality operator __eq__() and the implementation of that > operator has unwanted side effects, i.e. it modifies the instances! The > method calls self._sorted_list() with its declaration list as argument. > _sorted_list() sorts the list inplace, so this means the declaration > list gets modified. If you do that while iterating over the list you get > a messed up result where a declaration may appear twice (while other > declarations haven't been seen at all). I think this is quite a serious > bug (that is difficult to track down). > > A quick fix would be to replace the last line in scopedef_t.__eq__() with= : > > return self._sorted_list( self.declarations[:] ) \ > =3D=3D other._sorted_list( other.declarations[:] ) > > That is, instead of passing declarations directly we pass a copy of that > list (using slice notation, you might also use the copy module). > But of course, these involves two copy operations which makes the > operator even more inefficient. I committed proposed fixed > I have to admit that I dont't really understand why those operators are > overridden at all. pygccxml.project_reader_t has 2 modes to parse group of files. I needed to write some test to check that both methods return same answer. Also I needed to test cache algorithm. In general those 2 methods exists for unit tests. > In which case does it happen that two separate > declaration objects refer to the same declaration? There is no such case >(probably when a > declaration is included in two separate include files, right? But why > isn't the first declaration then reused in the first place?). It is do re-used. > I think in my case I could switch to comparing id(decl) instead of decl > itself. I think that this is a better solution, than comparing decls >But I think for someone else reading the code this is not really > obvious why I have to do that... Matthias, it is not the first time I provide "not so user friendly" interfa= ce. Can you show some code you have to write now and how I think it should be written. > - Matthias - Thank you. > PS: I also don't understand the purpose of the (static!) method > _sorted_list() as this method just calls sort() and does nothing else. > Why wasn't sort() used in the first place? Convenience. Python sorts list in place and return value of sort function is None. This function allows you to write something like this: soorted_list ( x ) =3D=3D sorted_list( y ) while without function you need to write x.sort() y.sort() x =3D=3D y -- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
From: Matthias B. <ba...@ir...> - 2006-03-15 13:00:12
|
Roman Yakovenko wrote: > I committed proposed fixed Thanks! >> I have to admit that I dont't really understand why those operators are >> overridden at all. > > pygccxml.project_reader_t has 2 modes to parse group of files. I > needed to write some > test to check that both methods return same answer. > > Also I needed to test cache algorithm. > > In general those 2 methods exists for unit tests. > >> In which case does it happen that two separate >> declaration objects refer to the same declaration? > > There is no such case > >> (probably when a >> declaration is included in two separate include files, right? But why >> isn't the first declaration then reused in the first place?). > > It is do re-used. But if those special == and != operators are only required for unit testing, wouldn't it then be better to move them to the unit tests and keep the standard operators for the regular users? From what I've seen the operators implement rather expensive operations (copying, sorting, comparing several values, etc.) and if this could just as well be done by comparing the reference of the instance, this would be much more efficient. >> But I think for someone else reading the code this is not really >> obvious why I have to do that... > > Matthias, it is not the first time I provide "not so user friendly" interface. > Can you show some code you have to write now and how I think it should > be written. It's in the __call__() method of the ParentFilter class in experimental/filters.py. First, the method looked like this: def __call__(self, decl): return decl.parent==self.parent Because of the bug in the == operator I had a look at the implementation and found out that the operator is doing quite a lot things. With the latest fix you've applied to scopedef.py the above code would work again, but it would not be as efficient as the following line: return id(decl.parent)==id(self.parent) >> PS: I also don't understand the purpose of the (static!) method >> _sorted_list() as this method just calls sort() and does nothing else. >> Why wasn't sort() used in the first place? > > Convenience. Python sorts list in place and return value of sort > function is None. > This function allows you to write something like this: > > soorted_list ( x ) == sorted_list( y ) > > while without function you need to write > > x.sort() > y.sort() > x == y ok, I see. But sorted_list() has the side effect of modifying its argument (which might not have been intentional). That's why the Python developers chose not to return the list from sort() (see note (7) here: http://docs.python.org/lib/typesseq-mutable.html) Maybe the semantics you actually had in mind was this: def _sorted_list( some_list ): result = copy.copy(some_list) result.sort() return result But of course, this makes the operation more expensive... - Matthias - |
From: Roman Y. <rom...@gm...> - 2006-03-15 13:29:41
|
On 3/15/06, Matthias Baas <ba...@ir...> wrote: > But if those special =3D=3D and !=3D operators are only required for unit > testing, wouldn't it then be better to move them to the unit tests and > keep the standard operators for the regular users? From what I've seen > the operators implement rather expensive operations (copying, sorting, > comparing several values, etc.) and if this could just as well be done > by comparing the reference of the instance, this would be much more > efficient. I can argue about this point, but I understand you. > >> But I think for someone else reading the code this is not really > >> obvious why I have to do that... > > > > Matthias, it is not the first time I provide "not so user friendly" int= erface. > > Can you show some code you have to write now and how I think it should > > be written. > > It's in the __call__() method of the ParentFilter class in > experimental/filters.py. First, the method looked like this: > > def __call__(self, decl): > return decl.parent=3D=3Dself.parent > > Because of the bug in the =3D=3D operator I had a look at the implementat= ion > and found out that the operator is doing quite a lot things. With the > latest fix you've applied to scopedef.py the above code would work > again, but it would not be as efficient as the following line: > > return id(decl.parent)=3D=3Did(self.parent) :-) return decl.parent is self.decl.parent is it better ? :-))))) > >> PS: I also don't understand the purpose of the (static!) method > >> _sorted_list() as this method just calls sort() and does nothing else. > >> Why wasn't sort() used in the first place? > > > > Convenience. Python sorts list in place and return value of sort > > function is None. > > This function allows you to write something like this: > > > > soorted_list ( x ) =3D=3D sorted_list( y ) > > > > while without function you need to write > > > > x.sort() > > y.sort() > > x =3D=3D y > > ok, I see. But sorted_list() has the side effect of modifying its > argument (which might not have been intentional). That's why the Python > developers chose not to return the list from sort() (see note (7) here: > http://docs.python.org/lib/typesseq-mutable.html) > Maybe the semantics you actually had in mind was this: > > def _sorted_list( some_list ): > result =3D copy.copy(some_list) > result.sort() > return result > > But of course, this makes the operation more expensive... I prefer expensive and work, to quick and may be work :-) -- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |