#20 Unsafe Object constructor/assignment semantics

open
None
5
2007-01-22
2006-11-06
Kevin J Bluck
No

In my opinion, there is a design (as opposed to
implementation) bug in the PyCXX Object class
interface. In a nutshell, the constructor and
assignment operators overloaded on PyObject* make it
way too easy for programmers to write leaks into their
code.

The constructor has two problems. First and most
important is the defaulted "owned" parameter. The
default is to assume a "borrowed" reference. This I
think was likely a convenience since usually the
PyObject* was being collected from a function
parameter, which are nearly always passed as borrowed
references. Conversely, however, most Python API
functions return new references. Unfortunately,
allowing this parameter to default allows programmers
to easily ignore the question of whether the PyObject*
they are passing is new or borrowed. They are able to
thoughtlessly write something like
Object(PyInt_FromString("100")) and the compiler
happily accepts it without complaint. They will
probably never realize they've just written a resource
leak. Even turning off the default will not completely
fix the constructor, because the second problem is that
the meaning of the 'bool' parameter is obscure. It is
altogether too easy to get confused about whether'true'
means new or borrowed, nor will mistakes in this regard
be easy to pick out during mainenance review.

The second, even worse problem is the assignment
operator. There is no way at all to tell operator=
which sort of reference your PyObject* is. In order to
pass a new reference safely to the assignment operator,
you have to remember to manually decref the Object
instance at some point to avoid creating a resource
leak. Requiring the programmer to write a follow-up
statement in some circumstances but not others to make
a function behave properly is grossly unsafe behavior.

So, my recommendations:

1. Remove completely constructor overload
Object(PyObject*, bool = false) and replace it with
Object(ref_type, PyObject*). ref_type is an enumerated
type at Object class scope, something like

enum{ borrowed = false, new_ref = true }

So, you would write Object( borrowed, ptr ) instead of
Object( ptr ) and Object( new_ref, ptr ) instead of
Object( ptr, true ). Not allowing a default value
forces the programmer to make an explicit decision for
every case, which is appropriate for such an important
question. Putting the ref_type param first both makes
it impossible to default the value later out of
misguided desire for "convenience", and putting that
mnemonic enum type first also allows reviewers to more
easily identify the programmer's assumption and more
easily spot when he made the wrong one. Putting it
second would all to often cause it to be lost in the
noise to the right of long C Python API calls.

2. Remove completely operator=(PyObject*) and don't
replace it any other overload of operator=(). It's just
too unsafe, and the requirement to write a secondary
cleanup statement in some cases removes most of its
syntactic sugar value anyway. Replace it instead with a
explicit public member function that mimics the
constructor interface: reset(ref_type, PyObject*).
(This would replace the existing protected set()
method, which has problems similar to the constructor.)

There is no way to fix these interface issues without
breaking client code. The good news is that these
changes will not *silently* break client code. Every
case where these overloads are used will no longer
compile, so programmers will have to visit each call
site and have the opportunity to retroactively decide
on the correct course of action. In many cases, they'll
be finding and fixing bugs in their own code.
Obviously, this is an extreme action that should be
reserved for a major version upgrade. But I sincerely
think biting the bullet on this matter will improve
both the PyCXX library and the client code that uses it.

Discussion

    • assigned_to: nobody --> barry-scott
    • status: open --> pending
     
  • Logged In: YES
    user_id=28665
    Originator: NO

    I'm going to remove the operator= as its clearly broken.

    The change to the Object( PyObject ) ctor I'm going to experiment with.
    I wonder just how often it is called outside of the PyCXX code itself?
    It may be used so rarely that its no big deal to update caller code.

     
  • Logged In: YES
    user_id=1312539
    Originator: NO

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
    • status: pending --> closed
     
  • Logged In: YES
    user_id=28665
    Originator: NO

    I did not mean to close this.

    I tried to do the change to see what it would entail.

    Implementing this change is very disruptive inside the code and to the API.

    You have a good point, but I'm not sure how I will address it.
    Maybe I'll use #ifdef to allow you to choice the API at compile time.

    I should first clean up the use of set in the existing code. There
    is far to much bouncing between Py::Object and PyObject* inside the
    code.

     
    • status: closed --> open