Menu

#5132 Tcl_InitStubs() Broken As Designed

open
8
2014-11-30
2012-11-20
Don Porter
No

A stubs-enabled extension encounters Tcl in three distinct ways.

1) It compiles against a tcl.h file from some release of Tcl. With
USE_TCL_STUBS active, the particular tcl.h file determines the
layout of the stubs table that the extension is compiled to assume.

2) It links against some libtclstub.a file. That libtclstub.a file in turn
was compiled against some tcl.h file and from it got a value of
TCL_STUBS_MAGIC compiled into it. As far as I can tell there is
nothing in place to make sure that the two tcl.h files so far mentioned
are the same file, or at a minimum define the same TCL_STUBS_MAGIC
value. Until now it's never been an issue since we've never changed
that value.

3) The extenson [load]s into some Tcl interp. The extension calls
Tcl_InitStubs() passing in the interp value. T_IS pulls a table pointer
out of the iPtr->stubTable field of that interp for vetting for suitability.

What's important is that the table in iPtr->stubTable has a compatible
layout to the one that was compiled into the extension. But the
mechanism in place cannot check this. To illustrate this, consider
compiling an extension containing the call

Tcl_InitStubs(interp, "9.0", 0);

but compiling it against a Tcl 8 header file, so all of it's other calls
into the Tcl library are coded based on the assumptions of the Tcl 8
stub table layout.

Next imagine linking this against libtclstubs9.a so that the TCL_STUBS_MAGIC
value the Tcl_InitStubs() call will be checking will be the one for the Tcl 9 table.

Then [load] in a Tcl 9 interp. All of Tcl_InitStubs() checks will pass, but we'll
be left with an extension compiled to Tcl 8 layout expectations making use
of a Tcl 9 stub table, and that's gonna crash hard.

Now, if the extension writer would have just stuck with the recommended
convention of

Tcl_InitStubs(interp, TCL_VERSION, 0)

he could not get in this trouble. But that is our fault for giving him that
knob to tweak in the first place. Tcl_InitStubs ought not have
a "version" argument for the coder to get wrong.

Discussion

<< < 1 2 3 > >> (Page 2 of 3)
  • Jan Nijtmans

    Jan Nijtmans - 2012-11-30

    Merged the (modified) "novem-review" branch to "novem"
    Whatever check we will do in HasStubSupport still has
    to be decided, it now has a full tclversion and a magic
    parameter. so anything can be put there as need arises.
    For now, I kept the check the same as in Tcl 8.6.

    I don't see the need to remove the Tcl_InitStubs()
    function itself. It is fully hidden, but allows the future
    (unversioned) tclstub.a to function for Tcl 8 as well.

    Further on, the win32 build was broken due to
    other changes, fixed that.

    Don, Is that sufficient for you to merge the
    novem-remove-string-result branch?

     
  • Don Porter

    Don Porter - 2012-11-30

    Merge completed. Thanks.

    We're closing in on agreement what the reformed
    TclInitStubs() should look like and do. Just not quite
    there yet. That can continue without the merge held
    back, and that's a good thing.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-05

    Updated the "bug-3588687" branch, trying to get closer to our common goal.
    Two remarks:
    - I don't think tcl.h should be modified, for binary compatibility with
    older 8.6.x releases. So, in stead defined TclInitStubs() in tclInt.h,
    and let the linking occur through Tcl_InitStubs(). We have it, so why
    not use it? In Tcl 9.0 we can then make the step to change
    Tcl_InitStubs to a macro.
    - The usage of TCL_MAJOR_VERSION and TCL_VERSION in tclStubLib.c
    hinders the adoption of a version-less stub library. I was under the
    impression that the tcl version should be brought in through tcl.h and
    that tclStubLib.c should not reference TCL_VERSION nor
    TCL_MAJOR_VERSION. So I suggest to remove the check in
    HasStubSupport(), in stead check for tclversion[0] == version[0]
    in TclInitStubs. Still that would not be 100% as desired, but
    at least the stub library then becomes completely independant
    from tcl.h. In Tcl 9.0 we could rename tclstub9.0.a to tclstub.a
    (and change tcl.h to use a macro), and this stub library would
    be usable for Tcl 8.6 (using Tcl_InitStubs), 9.0, 9.1 ... (using
    TclInitStubs)

    Wouldn't that be better (although we aren't there yet)? Feedback appreciated!

     
  • Don Porter

    Don Porter - 2012-12-06

    Found time to examine this again. First a simple Q.
    What's the value of partitioning some of the code into
    another new file, tclStubLibCompat.c ?

     
  • Don Porter

    Don Porter - 2012-12-06

    The calls to Tcl_PkgRequireEx() within TclInitStubs
    are themselves relying on the stubs mechanism.
    In order to call them with certainty they will work,
    we need to check that the magic value of the stub
    table in the loading interp matches the
    TCL_STUB_MAGIC value with which the stubs
    library was compiled. We must check for three-way
    agreement. Two-way is not enough.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-06

    The value of a separate tclStubLibCompat.c is, that it can be left
    out when the stub library is linked using Tcl 9 headers. I am trying
    to reach the goal of a version-independant stub library, the difference
    is in tcl.h only.

    The minimum requirement of Tcl_PkgRequireEx to work is that:
    - the entry in the stub table stays the same
    - the signature of Tcl_PkgRequireEx stays the same.
    I see no reason to change the entry or the signature,
    so there is no magic check needed to make it work for
    sure. The additional checks are meaningless IMHO.

    The real problem is the signature change in other
    functions, which are not used in the stub library,
    such as Tcl_EvalEx(). Or in the Tcl_Obj structure,
    which will (I assume) change the length field
    from int to size_t. On 32-bit platforms that doesn't
    make a difference (all those types are 32-bit) but
    on 64-bit platforms it totally breaks. Therefore I
    propose to change the magic value from
    (int) 0xFCA3BACF
    to
    (int) (0xFCA3BACB + sizeof(size_t))
    Then the magic value on 32-bit platforms stays
    the same, on 64-bit platforms it changes. Then
    the current check suffices to detect the problem.

    So, I agree with you - as you stated earlier:
    >Since I'd prefer not to change the TCL_STUBS_MAGIC value
    I prefer not to change the TCL_STUBS_MAGIC
    as well. But striving for a version-independant
    stub library, on 64-bit platforms I don't see
    another possibility. On 32-bit platforms I
    don't see a reason to change it.

     
  • Don Porter

    Don Porter - 2012-12-06

    I must be slow or something (Don't answer that! :) )

    If the tclStubLibCompat.c gets left out, then you do not
    get a libtclstubs.a that's version-free. You get a libtclstubs.a
    that will not link against an extension compiled with Tcl 8
    headers. Right?

    If you commit to keeping including tclStubLibCompat.c
    in the stubs library, though, seems you could end up with
    a libtclstubs.a that's truly version-free, and sensitive only
    to changes in TCL_STUB_MAGIC. But if we need to keep
    including it, then the value of multiple source files no longer
    makes sense to me.

    That said, please have a look at my new checkin on the
    bug-3588687 branch. I think it moves things closer to
    achieving our increasingly shared goals.

     
  • Don Porter

    Don Porter - 2012-12-06

    We may have different visions of what the
    TCL_STUB_MAGIC value tells us. For me,
    if the TCL_STUB_MAGIC value stays the same,
    that means:

    1) The interp has a stubs table.
    2) The first entry in the table is a routine
    with the same signature and compatible
    functioning as Tcl_PkgRequireEx().

    and that's it! Other compatibility checks get
    deferred to what T_PRE judges and that
    gets controlled by the usual version numbering
    scheme.

    In that vision, the only reason to change TCL_STUB_MAGIC
    is either

    1) A stubs table rethink so radical it no longer allows the
    first slot to be assigned to T_PRE(); or
    2) An incompatibility that cannot be sorted out by T_PRE()
    using Tcl version numbers.

    An example of 2) would be some kind of development fork
    where the forks had incompatible stub tables, but were making
    claim to the same set of version numbers. I don't fully grok your
    64-bit claims yet, but it's possible they'd be in this category.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-06

    Let's do an experiment, to show what I want to tell.

    Please check out the branche "novem-support". It is a
    modified 8.6, mainly pkgb.so is modified to use
    Tcl_PkgRequireEx(..., "8.6-". ...), and uses the
    Tcl_EvalEx() function. Clearly this is wrong: It would
    allow pkgb.so to be loaded by both Tcl 8.6 and Tcl 9.0,
    but Tcl_EvalEx will have a different signature, so that will
    lead to a crash. I'll show that.

    Then there is the branch "novem-demo-bug-3588687",
    which is a modified "novem" with a Tcl_EvalEx signature
    change: the numBytes parameter is changed to a
    Tcl_WideInt (will be size_t in Tcl9, but this way I can
    demonstrate the crash on 32-bit platforms as well)

    Let's run this tclsh9.0 and load the pkgb.so from
    the "novem-support" 8.6 build:

    $ tclsh9.0
    % load ../../tcl8.6/unix/dltest/pkgb.so
    % info commands pkgb_*
    pkgb_unsafe pkgb_sub
    % pkgb_unsafe
    Crash!
    Aborted (core dumped)
    $

    This pkgb.so was compiled and linked against
    the unmodified Tcl 8.6 tcl.h and libtclstub86.a.
    How come that this was not detected? Whatever
    test in tclStubLib.c cannot detect that: pkgb.so
    was compiled with a legacy stub library.

    So, the only way I see to make Tcl_StubInit give
    a nice error message is change the TCL_STUB_MAGIC
    magic value for 64-bit builds in Tcl9. Then, even
    though pkgb.so was linked with a legacy stub
    library it would detect that and give:
    "interpreter uses an incompatible stubs mechanism"
    in stead of crashing.

    If you have another idea how to prevent a crash
    in this case, I'm all ear. That's why my conclusion
    is that TCL_STUB_MAGIC should change on
    64-bit platforms, it can stay the same on 32-bit
    platforms. Do you get the point from this?

    Regards,
    Jan Nijtmans

     
  • Don Porter

    Don Porter - 2012-12-06

    I'm only starting, but wanted to quickly observe
    that Tcl_PkgRequireEx() doesn't accept arguments
    like "8.6-". It's only built for single version numbers,
    not requirement ranges. For those you have to call
    Tcl_PkgRequireProc(), which TclInitStubs doesn't do.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-06

    >Tcl_PkgRequireEx() doesn't accept arguments like "8.6-"

    Somehow it works, so we might have hit an unrelated bug in tclPkg.c
    somewhere here .......

     
  • Don Porter

    Don Porter - 2012-12-06

    I've got trains and family to content with, so may be until tomorrow until
    I can reply well, but if you haven't checked what the latest bug-3588687
    code would do in this case, that might be worth trying.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-06

    > >Tcl_PkgRequireEx() doesn't accept arguments like "8.6-"
    > Somehow it works, so we might have hit an unrelated bug in tclPkg.c
    > somewhere here .......
    TIP #268 specifies that ''The existing functions Tcl_PkgRequire(Ex) are
    re-implemented in terms of this function (=Tcl_PkgRequireProc)",
    so Tcl_PkgRequireEx() accepts "8.6-" just fine. That's what I
    see and it fits with the TIP description (and with my expectations).
    I don't see a bug here, after all.

     
  • Don Porter

    Don Porter - 2012-12-07

    I'm getting a clearer picture now. Hope to have more extended remarks
    tomorrow. Meanwhile, just for fun you might give this one a try:

    Tcl_InitStubs(interp, "8.6 9", 0);

    :)

     
  • Don Porter

    Don Porter - 2012-12-07

    Just so I can start from an agreed assumption in the morning, we agree
    that

    Tcl_InitStubs(interp, "8.6-", 0)

    is *never* a sensible thing to do, right? Any code including that is
    wrongheaded and buggy from the get-go.

    (In that sense it's like my example in the original ticket here.
    Compiling against Tcl 8 headers, but passing "9.0" as an argument
    to Tcl_InitStubs() cannot ever work and is always a mistake.)

    There's no compat issue with making Tcl_InitStubs refuse to follow
    broken orders. We just need to redesign it so that such mistakes are
    reported errors, not panics or crashes.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-07

    >we agree that
    Tcl_InitStubs(interp, "8.6-", 0)
    >is *never* a sensible thing to do, right

    On 64-bit platforms we agree on that, but
    on 32-bit platforms I see no problem.

    > Compiling against Tcl 8 headers, but passing
    > "9.0" as an argument to Tcl_InitStubs() cannot ever
    > work and is always a mistake.)
    There is no reason at all to do that: It can only work
    when the extension uses the common subset of
    Tcl 8.6 and 9.0 API's, but then it makes no sense
    to specify "9.0". Specifying "8.6-" would make more sense.

    The consensus should be:
    Tcl_InitStubs(interp, "8.6", 0)
    means that this extension uses the 8.6 API, so compiles and
    runs with any 8.x release with x >= 6.
    Tcl_InitStubs(interp, "9.0", 0)
    means that this extension uses the 9.0 API, so compiles and
    runs with any 9.x release.
    Tcl_InitStubs(interp, "8.6-", 0)
    means that when compiled with Tcl 8.6 headers it will run with
    Tcl 8.6, and when compiled with Tcl 9.0 headers it will run
    with any 9.x release. In Tcl9 some API's will be removed,
    others will be added. Any extension specifying "8.6-" can
    only use the common subset API's of 8.6 and 9.0. (
    assuming the stub entrie positions are kept equal)

    This also means that no extension can claim "8.6-"
    during the alpha/beta period of Tcl 9. One method
    to enforce that is in tcl.h 9.0 specifying:
    #if TCL_RELEASE_LEVEL == TCL_FINAL_RELEASE
    #define Tcl_InitStubs(interp, version, exact) \ TclInitStubs(interp, version, exact, TCL_VERSION, TCL_STUB_MAGIC)
    else
    #define Tcl_InitStubs(interp, version, exact) \ TclInitStubs(interp, TCL_PATCH_LEVEL, 1, TCL_VERSION, TCL_STUB_MAGIC)
    #endif

    But - as proven - claiming "8.6-" compatibility cannot ever work
    across the 8.6/9.0 border on 64-bit platforms. That's what the
    MAGIC change is needed for.

    Regards,
    Jan Nijtmans

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-07

    >we agree that Tcl_InitStubs(interp, "8.6-", 0)
    >is *never* a sensible thing to do, right
    Thinking more about it, I agree! It would promise
    that it will work with Tcl 10 as well, which
    is nonsence, of course.
    Tcl_InitStubs(interp, "8.6 9", 0) should be used.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-07

    I now turned pkgb.so in core-8-5-branch and trunk into a
    Tcl9 interoperability test. And closed the "novem-support" branch.

    Tcl_InitStubs(interp, "8.6 9", 0) unfortunately doesn't work, it leads
    to a crash. That must be a bug somewhere in tclPkg.c. Luck
    enough, "8.6-9.1" works, and that's good enough for this test.

    Now the challenge is that pkgb.so loaded in any Tcl interpreter
    should either give an error message, either succeed, but
    never crash. On 32-bit platforms that's no problem: The
    signature of the used API will at most change some parameters
    from int to size_t, but they all are the same size. On 64-bit
    platforms I see no other solution than to change the MAGIC
    value.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-07

    My idea of an unversioned stub library is now in the "novem-unversioned-stub"
    branch. The only thing missing is the MAGIC value change when the API
    changes in an incompatible way (novem-64bit-sizes)

     
  • Don Porter

    Don Porter - 2012-12-07

    Ok, two things are becoming clear.
    1) How I was misunderstanding/forgetting how
    Tcl_PkgRequireEx() had expanded in function,
    and how that impacts *all deployed* stubs-enabled
    extensions. (eeeuuuuwwww!!!)
    2) Those areas where we still suffer from fundamental
    disagreement.

    I'm going to work on code demonstrations. That seems to
    be the way we're most successful clearly demonstrating
    points to one another.

    While I work on that... Thanks for the test case. I've merged
    it over to the bug-3588687 branch. On that branch, build the
    pkgb.so for testing. Built another on the trunk.

    Then over on the novem-demo-bug-3588687 branch, `make shell`:
    % load path/to/bug-3588687/pkgb.so
    version conflict for package "Tcl": have 9.0a0, need 8
    % load path/to/trunk/pkgb.so
    % pkgb_unsafe
    Crash!
    make: *** [shell] Aborted

    I submit this demonstrates that bug-3588687 branch
    is a fix for the problem demonstrated by pkgb.so.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-07

    >I submit this demonstrates that bug-3588687 branch
    >is a fix for the problem demonstrated by pkgb.so.
    But for the fix you had to recompile pkgb.so and link
    to a modified stub library. So this doesn't work
    with already-compiled pkgb.so binaries.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-07

    The challenge is: Some person A build an extension foo.so.
    Another person B, which doesn't have the source code of
    that extension, tries to load it in Tcl9, and it crashes.
    So, the challenge is, how can Tcl9 detect that without
    making modifications to foo.so. We don't want to
    backport tclStubLib.c modifications to the field everywhere,
    giving additional restrictions to people. Joe English
    already expressed his reluctance to modify
    tclStubLib.c in current releases <= 8.6. I tried it already -
    before I found the alternative solution in tclLoad.c -
    which was put on-hold on your request.....

    It became clear to me that the solution must be
    found in Tcl9, not by modifying the 8.x stub library!

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-12-07

    Please try again with the current "novem-demo-bug-3588687" branch.
    You will get:
    % load path/to/trunk/pkgb.so
    interpreter uses an incompatible stubs mechanism

    This demonstrates that this solution works for
    already-built extensions, no need to recompile
    and link to a newer stub library.

     
  • Don Porter

    Don Porter - 2012-12-07

    Ok, you're just repeating my starting point.

    The routine Tcl_InitStubs() and therefore all
    deployed binary stubs-enabled extensions -- the
    ones linked against the "versioned stubs lib"
    -- suffer from a fundamentally broken design. If any
    of them passed arguments requesting or
    accepting "Tcl 9", then those deployed, binaries
    are going to crash if they are [load]ed into novem.
    It's ugly and it's broken, but we can't turn back time.
    This was botched, and we're going to suffer for that botch.

    All we can do is get a fixed unversioned stubs library
    out and people started using it before Tcl 9 loading
    attempts get widespread.

    If we really must must must do something that will
    allows any old foo.so linked to a versioned stubs
    library that's been lying around untouched to
    [load] into Tcl 9 and not crash, then the only solution
    is to change TCL_STUB_MAGIC in Tcl 9 (for all
    platforms, which I will demonstrate). Then the
    [load] will not crash, but will instead generate the
    error message:

    "This interpreter does not support stubs-enabled extensions."

    and as a consequence, those new extensions linked to the
    new unversioned stubs library would be unable to generate any
    better error message than:

    "interpreter uses an incompatible stubs mechanism"

    Consider, though, which foo.so are going to crash. Only those
    which passed a 'version" string argument that accepted a Tcl 9
    interp. I submit that no one has done this, other than folks
    like us who are probing and demonstrating the brokenness
    if Tcl_InitStubs(). So, we would be sacrificing the ability to
    produce more meaningful error messages like

    "version conflict for package "Tcl": have 9.0a0, need 8"

    for people actively making extensions today, using the
    latest tools and advice for the sake of preventing crashes
    in foo.so that do not exist outside the labs of torture testers
    like us.

    I prefer that we keep TCL_STUBS_MAGIC unchanged.
    Get better error messages. And if any of those mythical
    foo.so appear, then they crash. Sorry. We limped along
    with a misdesigned Tcl_InitStubs for a decade, and that's
    the pain.

     
  • Don Porter

    Don Porter - 2012-12-07

    Demonstrating on the tip of novem-demo-bug-3588687:

    % load path/to/trunk/pkgb.so
    interpreter uses an incompatible stubs mechanism
    % load path/to/bug-3588687/pkgb.so
    interpreter uses an incompatible stubs mechanism

    And in reality, the first result for deployed foo.so will be:

    "This interpreter does not support stubs-enabled extensions."

     
<< < 1 2 3 > >> (Page 2 of 3)