Thread: [pygccxml-development] Serious bug in pygccxml (scopedef.py)

Brought to you by: mbaas, roman_yakovenko

pygccxml-development

[pygccxml-development] Serious bug in pygccxml (scopedef.py)

From: Matthias B. <ba...@ir...> - 2006-03-15 09:44:01

Hi,

I just had a case where a query for a single class produced a result 
that contained that class twice (this something that shouldn't be 
possible at all).
It took me about an hour to find that bug. The problem is the scopedef_t 
class (from which class_t and namespace_t are derived). This class 
overrides the equality operator __eq__() and the implementation of that 
operator has unwanted side effects, i.e. it modifies the instances! The 
method calls self._sorted_list() with its declaration list as argument. 
_sorted_list() sorts the list inplace, so this means the declaration 
list gets modified. If you do that while iterating over the list you get 
a messed up result where a declaration may appear twice (while other 
declarations haven't been seen at all). I think this is quite a serious 
bug (that is difficult to track down).

A quick fix would be to replace the last line in scopedef_t.__eq__() with:

         return self._sorted_list( self.declarations[:] ) \
                == other._sorted_list( other.declarations[:] )

That is, instead of passing declarations directly we pass a copy of that 
list (using slice notation, you might also use the copy module).
But of course, these involves two copy operations which makes the 
operator even more inefficient.

I have to admit that I dont't really understand why those operators are 
overridden at all. In which case does it happen that two separate 
declaration objects refer to the same declaration? (probably when a 
declaration is included in two separate include files, right? But why 
isn't the first declaration then reused in the first place?).

I think in my case I could switch to comparing id(decl) instead of decl 
itself. But I think for someone else reading the code this is not really 
obvious why I have to do that...

- Matthias -

PS: I also don't understand the purpose of the (static!) method 
_sorted_list() as this method just calls sort() and does nothing else. 
Why wasn't sort() used in the first place?

Re: [pygccxml-development] Serious bug in pygccxml (scopedef.py)

From: Roman Y. <rom...@gm...> - 2006-03-15 10:27:18

On 3/15/06, Matthias Baas <ba...@ir...> wrote:
> Hi,
>
> I just had a case where a query for a single class produced a result
> that contained that class twice (this something that shouldn't be
> possible at all).
> It took me about an hour to find that bug. The problem is the scopedef_t
> class (from which class_t and namespace_t are derived). This class
> overrides the equality operator __eq__() and the implementation of that
> operator has unwanted side effects, i.e. it modifies the instances! The
> method calls self._sorted_list() with its declaration list as argument.
> _sorted_list() sorts the list inplace, so this means the declaration
> list gets modified. If you do that while iterating over the list you get
> a messed up result where a declaration may appear twice (while other
> declarations haven't been seen at all). I think this is quite a serious
> bug (that is difficult to track down).
>
> A quick fix would be to replace the last line in scopedef_t.__eq__() with=
:
>
>          return self._sorted_list( self.declarations[:] ) \
>                 =3D=3D other._sorted_list( other.declarations[:] )
>
> That is, instead of passing declarations directly we pass a copy of that
> list (using slice notation, you might also use the copy module).
> But of course, these involves two copy operations which makes the
> operator even more inefficient.

I committed proposed fixed

> I have to admit that I dont't really understand why those operators are
> overridden at all.

pygccxml.project_reader_t has 2 modes to parse group of files. I
needed to write some
test to check that both methods return same answer.

Also I needed to test cache algorithm.

In general those 2 methods exists for unit tests.

> In which case does it happen that two separate
> declaration objects refer to the same declaration?

There is no such case

>(probably when a
> declaration is included in two separate include files, right? But why
> isn't the first declaration then reused in the first place?).

It is do re-used.

> I think in my case I could switch to comparing id(decl) instead of decl
> itself.

I think that this is a better solution, than comparing decls

>But I think for someone else reading the code this is not really
> obvious why I have to do that...

Matthias, it is not the first time I provide "not so user friendly" interfa=
ce.
Can you show some code you have to write now and how I think it should
be written.

> - Matthias -

Thank you.

> PS: I also don't understand the purpose of the (static!) method
> _sorted_list() as this method just calls sort() and does nothing else.
> Why wasn't sort() used in the first place?

Convenience. Python sorts list in place and return value of sort
function is None.
This function allows you to write something like this:

soorted_list ( x ) =3D=3D sorted_list( y )

while without function you need to write

x.sort()
y.sort()
x =3D=3D y



--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/

Re: [pygccxml-development] Serious bug in pygccxml (scopedef.py)

From: Matthias B. <ba...@ir...> - 2006-03-15 13:00:12

Roman Yakovenko wrote:
> I committed proposed fixed

Thanks!

>> I have to admit that I dont't really understand why those operators are
>> overridden at all.
> 
> pygccxml.project_reader_t has 2 modes to parse group of files. I
> needed to write some
> test to check that both methods return same answer.
> 
> Also I needed to test cache algorithm.
> 
> In general those 2 methods exists for unit tests.
> 
>> In which case does it happen that two separate
>> declaration objects refer to the same declaration?
> 
> There is no such case
> 
>> (probably when a
>> declaration is included in two separate include files, right? But why
>> isn't the first declaration then reused in the first place?).
> 
> It is do re-used.

But if those special == and != operators are only required for unit 
testing, wouldn't it then be better to move them to the unit tests and 
keep the standard operators for the regular users? From what I've seen 
the operators implement rather expensive operations (copying, sorting, 
comparing several values, etc.) and if this could just as well be done 
by comparing the reference of the instance, this would be much more 
efficient.

>> But I think for someone else reading the code this is not really
>> obvious why I have to do that...
> 
> Matthias, it is not the first time I provide "not so user friendly" interface.
> Can you show some code you have to write now and how I think it should
> be written.

It's in the __call__() method of the ParentFilter class in 
experimental/filters.py. First, the method looked like this:

     def __call__(self, decl):
         return decl.parent==self.parent

Because of the bug in the == operator I had a look at the implementation 
and found out that the operator is doing quite a lot things. With the 
latest fix you've applied to scopedef.py the above code would work 
again, but it would not be as efficient as the following line:

         return id(decl.parent)==id(self.parent)

>> PS: I also don't understand the purpose of the (static!) method
>> _sorted_list() as this method just calls sort() and does nothing else.
>> Why wasn't sort() used in the first place?
> 
> Convenience. Python sorts list in place and return value of sort
> function is None.
> This function allows you to write something like this:
> 
> soorted_list ( x ) == sorted_list( y )
> 
> while without function you need to write
> 
> x.sort()
> y.sort()
> x == y

ok, I see. But sorted_list() has the side effect of modifying its 
argument (which might not have been intentional). That's why the Python 
developers chose not to return the list from sort() (see note (7) here: 
http://docs.python.org/lib/typesseq-mutable.html)
Maybe the semantics you actually had in mind was this:

     def _sorted_list( some_list ):
         result = copy.copy(some_list)
         result.sort()
         return result

But of course, this makes the operation more expensive...

- Matthias -

Re: [pygccxml-development] Serious bug in pygccxml (scopedef.py)

From: Roman Y. <rom...@gm...> - 2006-03-15 13:29:41

On 3/15/06, Matthias Baas <ba...@ir...> wrote:
> But if those special =3D=3D and !=3D operators are only required for unit
> testing, wouldn't it then be better to move them to the unit tests and
> keep the standard operators for the regular users? From what I've seen
> the operators implement rather expensive operations (copying, sorting,
> comparing several values, etc.) and if this could just as well be done
> by comparing the reference of the instance, this would be much more
> efficient.

I can argue about this point, but I understand you.

> >> But I think for someone else reading the code this is not really
> >> obvious why I have to do that...
> >
> > Matthias, it is not the first time I provide "not so user friendly" int=
erface.
> > Can you show some code you have to write now and how I think it should
> > be written.
>
> It's in the __call__() method of the ParentFilter class in
> experimental/filters.py. First, the method looked like this:
>
>      def __call__(self, decl):
>          return decl.parent=3D=3Dself.parent
>
> Because of the bug in the =3D=3D operator I had a look at the implementat=
ion
> and found out that the operator is doing quite a lot things. With the
> latest fix you've applied to scopedef.py the above code would work
> again, but it would not be as efficient as the following line:
>
>          return id(decl.parent)=3D=3Did(self.parent)

:-)  return decl.parent is self.decl.parent is it better ? :-)))))

> >> PS: I also don't understand the purpose of the (static!) method
> >> _sorted_list() as this method just calls sort() and does nothing else.
> >> Why wasn't sort() used in the first place?
> >
> > Convenience. Python sorts list in place and return value of sort
> > function is None.
> > This function allows you to write something like this:
> >
> > soorted_list ( x ) =3D=3D sorted_list( y )
> >
> > while without function you need to write
> >
> > x.sort()
> > y.sort()
> > x =3D=3D y
>
> ok, I see. But sorted_list() has the side effect of modifying its
> argument (which might not have been intentional). That's why the Python
> developers chose not to return the list from sort() (see note (7) here:
> http://docs.python.org/lib/typesseq-mutable.html)
> Maybe the semantics you actually had in mind was this:
>
>      def _sorted_list( some_list ):
>          result =3D copy.copy(some_list)
>          result.sort()
>          return result
>
> But of course, this makes the operation more expensive...

I prefer expensive and work, to quick and may be work :-)

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/