#5153 Tcl_ListObjReplace may release deleted elements too early

obsolete: 8.5.13
closed-fixed
Jan Nijtmans
5
2013-01-02
2012-12-27
No

in Tcl_ListObjReplace, the deleted elements are Tcl_DecrRefCount'ed BEFORE new elements are Tcl_IncrRefCount'ed. Isn't that a problem if the inserted element happens to be in the list (and only in the list). It would be freed, and the freed memory would be added back to the list. It wouold have to be from custom code (for example, moves an element to the end). Not from script or Tcl_Eval where the objects are ref'ed on the stack and hence will not be released after the Tcl_DecrRefCount.

Discussion

1 2 > >> (Page 1 of 2)
    • milestone: --> 4024681
     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-27

    • milestone: 4024681 --> obsolete: 8.5.13
    • status: open --> open-accepted
     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-27

    Agreed. Same in 8.4 and 8.5 as well

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-27

    Fixed in all branches

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-27

    • status: open-accepted --> closed-fixed
     
  • Jan, two nitpicking questions ;)

    (1) there's an early return path (AttemptNewList); in that case the incRef'ed objv[*] are leaked. Admittedly that happens only for very large lists / lowmem conditions, but ...

    (2) at the point of the former late-incrRef code, there was previously a comment explaining "no memcopy because we need to iterate anyway". Since we no longer do, what about using memcopy (or memmove) ?

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-27

    >Jan, two nitpicking questions ;)
    Please do! Your 'nitpicking' is always appreciated!
    >there's an early return path ....
    There already was a (possible) memory leak in this case, if
    the objv array contains values with refcount 0. Strictly speaking,
    if incrementing refcounts first, then in case of TCL_ERROR
    they should be decremented again before returning. But
    doing this in 8.x is a behaviour change: it might free objects
    that wouldn't be freed before. Since it's so rare, I would
    prefer to keep it as is, at least in 8.0. For 9.0, better
    fix it, that would prevent a leak in all situations.
    > what about using memcopy
    Good idea. Not worth to do it for 8.x, but for 9.0
    this code could definetely have an overhoul.

     
  • Don Porter
    Don Porter
    2012-12-27

    Where's the demo script? The tests?

     
  • > [Don] Where's the demo script? The tests?

    Unfortunately, as Ashok points out, the script level alone can't trigger this (extra refs in the stack frame).

     
  • > [Jan] it might free objects that wouldn't be freed before

    Oh, you're right ! Strike that suggestion :}

    This is yet another reason for my distrust of the choice "create objects at refcount zero".
    So far I had only aesthetic concerns about the two distinct delete-paths (1->0 and 0->-1), but this is a really concrete case where it does matter: thanks !

     
  • Uh, what about a TclDecrRefCountNoDeleteAtZero ? (ie one that frees only at -1)
    (could stay internal... or not, given Ashok's goals ;)

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-27

    >Uh, what about a TclDecrRefCountNoDeleteAtZero ?
    Are you serious? Please don't go that way.
    >....my distrust of the choice "create objects at
    > refcount zero"....
    This choice served Tcl well, but it must be used
    with care. transfering an object with refCount=0
    means: please remove it when you are done with
    it. transfering an object with refCount=1 means
    that the caller expects it to exist when the function
    call returns. Unfortunately, Tcl_ListObjReplace
    violates that rule in the TCL_ERROR case, which
    causes a possible memory leak.

    For 8.4, the escape path doesn't exist, so
    everything is OK.
    For 8.[56], the refcounts could be restored by:
    <http://core.tcl.tk/tcl/info/d5147fc677>
    For 9.x, a Tcl_DecrRefCount would be
    better here, noting a POTENTIAL INCOMPATIBILITY.

    Alex, (and Don) how do you think about that?

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-27

    • status: closed-fixed --> pending-fixed
     
    • status: pending-fixed --> open-fixed
     
  • In response to dgp's request, here is a critcl code fragment to illustrate the problem. As said earlier, because of ref counts, not clear how to demo from a script level.

    critcl::ccommand crash {} {
    Tcl_Obj *objP;
    if (objc != 2) {
    Tcl_WrongNumArgs(interp, 1, objv, "LIST");
    return TCL_ERROR;
    }

    /* Just demo - No error checking */

    if (Tcl_IsShared(objv[1])) {
    Tcl_SetResult(interp, "Need unshared list obj for test", TCL_STATIC);
    return TCL_ERROR;
    }

    /* Nonsensical but legal */
    Tcl_ListObjIndex(interp, objv[1], 1, &objP);
    Tcl_ListObjReplace(interp, objv[1], 1, 2, 1, &objP);
    Tcl_SetObjResult(interp, objv[1]);
    return TCL_OK;
    }

    The following will then result in a panic on windows:
    C:\src\tarray\src>tclsh86t
    % proc history args {}
    % crash
    invalid command name "crash"
    % source tarray.critcl
    % crash
    wrong # args: should be "crash LIST"
    % set v [list ghi fgr tke]
    ghi fgr tke
    % crash $v[set v ""]
    ghi fgr
    % set x "afasdf"
    <Panic>
    Tracing verifies that the second element is freed via TclpFree, put on the cache and then added back to the list, then assigned to x

     
  • Verified crash does not happen with Jan's trunk patch

     
  • Don Porter
    Don Porter
    2012-12-29

    Thank you for the demo to clarify matters.

    In the demo, the caller of T_LOR appears to
    be buggy in its management of refcounts.

    In the demo, the value "fgr" comes into the
    command procedure of [crash] with a refcount
    of 1, accounting for the reference that's stored in
    the list.

    After the call to Tcl_ListObjIndex() another reference
    to that value is stored in objP. To account for the
    second reference, at that point there should be a

    Tcl_IncrRefCount(objP);

    With that correct refcount management in place, there
    should be no issue left with the Tcl_ListObjReplace()
    that we've all been using a long time now.

    A balancing Tcl_DecrRefCount(objP) before returning
    from the [crash] command procedure would then account
    for that variable going out of scope.

    It's true that a buggy C caller can crash the program. That's
    been part of the power and responsibility of C callers as long
    as I've used Tcl. The Tcl C library fairly explicit does not attempt
    to armor plate itself against buggy C callers.

    Unless there's some other non-buggy demo of this issue, I'd
    say there's not a bug here in the sense of something that demands
    to be fixed. If that's correct, then I'd go on to say we ought not
    be messing with the 8.4 and 8.5 implementations over this.

    The improvement in additional safety against buggy C callers is
    something I'd consider in 8.6 patchlevels, but even there only
    when weighed against whatever negative impact might come
    along with the patch. Does it significantly impact performance
    in any reasonable scenarios?

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-29

    Let me respectfully disagree with your diagnostics that
    there is no bug here. The Tcl_ListObjReplace is expected
    to be fire-and-forget: when the objP object is not
    Tcl_IncRefCount'ed means that after the
    Tcl_ListObjReplace call this object is no longer
    used by the caller. In this case, that's true,
    so the caller should not need to increment the
    refCount. Yes, the example code is
    meaningless, but valid. From the documentation
    cannot be derived that the refcount should
    be incremented before the Tcl_ListObjReplace
    call, and decremented immediately after.

    The only thing my patch does is do the
    increment before the decrement, as suggested
    by Ashok, nothing else changed. I now corrected
    the escape path as well, restoring the
    refcounts as they were.

    Extreme care should be taken doing
    this in any 8.x releases, but with 4 pair of
    eyes focussed on this bug and all
    existing test-cases as back-up, I think
    this change should be fine.

    I am trying to turn this into a test-case
    for the test suite, but that's not
    finished yet.

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-29

    > but even there only
    > when weighed against whatever negative impact might come
    > along with the patch. Does it significantly impact performance
    > in any reasonable scenarios?
    Changing the order of calls doesn't influence the performance,
    I don't see any possible negative impact here.

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-29

    From the Tcl_ListObjIndex documentation:
    > The reference count for the list element is not incremented;
    > the caller must do that if it needs to retain a pointer to the element.
    In this case, the caller doesn't retain a pointer to the element,
    it only passes it to Tcl_ListObjReplace. After that, objP
    is not used any more. The function Tcl_ListObjReplace
    is retaining a pointer to the element (in the list
    representation), so this function should (and does)
    increment the refCount.

     
  • FWIW, I agree with Jan's analysis on all points.

    Re the naked refcount-- , I'm still wondering whether it shouldn't have a more explicit macro (like TclDecrRefCountNoDeleteAtZero : that wasn't a joke), for two reasons:

    (1) the need to use it, though rare, cannot be worked around; it is a consequence of the unfortunate choice create-at-zero mentioned earlier in this thread

    (2) A macro keeps the abstraction level for source compat (e.g. in case we want to steal a few bits in the refcount)

     
  • Jan Nijtmans
    Jan Nijtmans
    2012-12-29

    See:
    <https://sourceforge.net/tracker/?func=detail&aid=3598875&group_id=10894&atid=360894>

     
  • Don Porter
    Don Porter
    2013-01-02

    Looking again in a new year, I have no compelling
    objections to keep the patch everywhere it is committed.

    I do like to have demos and tests and maintainer review
    as a general matter, even with no serious harm here.

     
  • Don Porter
    Don Porter
    2013-01-02

    • status: open-fixed --> closed-fixed
     
  • Jan Nijtmans
    Jan Nijtmans
    2013-01-03

    >I do like to have demos and tests and maintainer review
    >as a general matter, even with no serious harm here.
    A Demo is given already by Ashok, I'm working on integrating
    Ashok's test case in the test suite. Tried to use the
    existing "testobj" command for this, but is looks like
    that's not possible. Upcoming.

     
1 2 > >> (Page 1 of 2)