A stubs-enabled extension encounters Tcl in three distinct ways.
1) It compiles against a tcl.h file from some release of Tcl. With
USE_TCL_STUBS active, the particular tcl.h file determines the
layout of the stubs table that the extension is compiled to assume.
2) It links against some libtclstub.a file. That libtclstub.a file in turn
was compiled against some tcl.h file and from it got a value of
TCL_STUBS_MAGIC compiled into it. As far as I can tell there is
nothing in place to make sure that the two tcl.h files so far mentioned
are the same file, or at a minimum define the same TCL_STUBS_MAGIC
value. Until now it's never been an issue since we've never changed
that value.
3) The extenson [load]s into some Tcl interp. The extension calls
Tcl_InitStubs() passing in the interp value. T_IS pulls a table pointer
out of the iPtr->stubTable field of that interp for vetting for suitability.
What's important is that the table in iPtr->stubTable has a compatible
layout to the one that was compiled into the extension. But the
mechanism in place cannot check this. To illustrate this, consider
compiling an extension containing the call
Tcl_InitStubs(interp, "9.0", 0);
but compiling it against a Tcl 8 header file, so all of it's other calls
into the Tcl library are coded based on the assumptions of the Tcl 8
stub table layout.
Next imagine linking this against libtclstubs9.a so that the TCL_STUBS_MAGIC
value the Tcl_InitStubs() call will be checking will be the one for the Tcl 9 table.
Then [load] in a Tcl 9 interp. All of Tcl_InitStubs() checks will pass, but we'll
be left with an extension compiled to Tcl 8 layout expectations making use
of a Tcl 9 stub table, and that's gonna crash hard.
Now, if the extension writer would have just stuck with the recommended
convention of
Tcl_InitStubs(interp, TCL_VERSION, 0)
he could not get in this trouble. But that is our fault for giving him that
knob to tweak in the first place. Tcl_InitStubs ought not have
a "version" argument for the coder to get wrong.
Merged the (modified) "novem-review" branch to "novem"
Whatever check we will do in HasStubSupport still has
to be decided, it now has a full tclversion and a magic
parameter. so anything can be put there as need arises.
For now, I kept the check the same as in Tcl 8.6.
I don't see the need to remove the Tcl_InitStubs()
function itself. It is fully hidden, but allows the future
(unversioned) tclstub.a to function for Tcl 8 as well.
Further on, the win32 build was broken due to
other changes, fixed that.
Don, Is that sufficient for you to merge the
novem-remove-string-result branch?
Merge completed. Thanks.
We're closing in on agreement what the reformed
TclInitStubs() should look like and do. Just not quite
there yet. That can continue without the merge held
back, and that's a good thing.
Updated the "bug-3588687" branch, trying to get closer to our common goal.
Two remarks:
- I don't think tcl.h should be modified, for binary compatibility with
older 8.6.x releases. So, in stead defined TclInitStubs() in tclInt.h,
and let the linking occur through Tcl_InitStubs(). We have it, so why
not use it? In Tcl 9.0 we can then make the step to change
Tcl_InitStubs to a macro.
- The usage of TCL_MAJOR_VERSION and TCL_VERSION in tclStubLib.c
hinders the adoption of a version-less stub library. I was under the
impression that the tcl version should be brought in through tcl.h and
that tclStubLib.c should not reference TCL_VERSION nor
TCL_MAJOR_VERSION. So I suggest to remove the check in
HasStubSupport(), in stead check for tclversion[0] == version[0]
in TclInitStubs. Still that would not be 100% as desired, but
at least the stub library then becomes completely independant
from tcl.h. In Tcl 9.0 we could rename tclstub9.0.a to tclstub.a
(and change tcl.h to use a macro), and this stub library would
be usable for Tcl 8.6 (using Tcl_InitStubs), 9.0, 9.1 ... (using
TclInitStubs)
Wouldn't that be better (although we aren't there yet)? Feedback appreciated!
Found time to examine this again. First a simple Q.
What's the value of partitioning some of the code into
another new file, tclStubLibCompat.c ?
The calls to Tcl_PkgRequireEx() within TclInitStubs
are themselves relying on the stubs mechanism.
In order to call them with certainty they will work,
we need to check that the magic value of the stub
table in the loading interp matches the
TCL_STUB_MAGIC value with which the stubs
library was compiled. We must check for three-way
agreement. Two-way is not enough.
The value of a separate tclStubLibCompat.c is, that it can be left
out when the stub library is linked using Tcl 9 headers. I am trying
to reach the goal of a version-independant stub library, the difference
is in tcl.h only.
The minimum requirement of Tcl_PkgRequireEx to work is that:
- the entry in the stub table stays the same
- the signature of Tcl_PkgRequireEx stays the same.
I see no reason to change the entry or the signature,
so there is no magic check needed to make it work for
sure. The additional checks are meaningless IMHO.
The real problem is the signature change in other
functions, which are not used in the stub library,
such as Tcl_EvalEx(). Or in the Tcl_Obj structure,
which will (I assume) change the length field
from int to size_t. On 32-bit platforms that doesn't
make a difference (all those types are 32-bit) but
on 64-bit platforms it totally breaks. Therefore I
propose to change the magic value from
(int) 0xFCA3BACF
to
(int) (0xFCA3BACB + sizeof(size_t))
Then the magic value on 32-bit platforms stays
the same, on 64-bit platforms it changes. Then
the current check suffices to detect the problem.
So, I agree with you - as you stated earlier:
>Since I'd prefer not to change the TCL_STUBS_MAGIC value
I prefer not to change the TCL_STUBS_MAGIC
as well. But striving for a version-independant
stub library, on 64-bit platforms I don't see
another possibility. On 32-bit platforms I
don't see a reason to change it.
I must be slow or something (Don't answer that! :) )
If the tclStubLibCompat.c gets left out, then you do not
get a libtclstubs.a that's version-free. You get a libtclstubs.a
that will not link against an extension compiled with Tcl 8
headers. Right?
If you commit to keeping including tclStubLibCompat.c
in the stubs library, though, seems you could end up with
a libtclstubs.a that's truly version-free, and sensitive only
to changes in TCL_STUB_MAGIC. But if we need to keep
including it, then the value of multiple source files no longer
makes sense to me.
That said, please have a look at my new checkin on the
bug-3588687 branch. I think it moves things closer to
achieving our increasingly shared goals.
We may have different visions of what the
TCL_STUB_MAGIC value tells us. For me,
if the TCL_STUB_MAGIC value stays the same,
that means:
1) The interp has a stubs table.
2) The first entry in the table is a routine
with the same signature and compatible
functioning as Tcl_PkgRequireEx().
and that's it! Other compatibility checks get
deferred to what T_PRE judges and that
gets controlled by the usual version numbering
scheme.
In that vision, the only reason to change TCL_STUB_MAGIC
is either
1) A stubs table rethink so radical it no longer allows the
first slot to be assigned to T_PRE(); or
2) An incompatibility that cannot be sorted out by T_PRE()
using Tcl version numbers.
An example of 2) would be some kind of development fork
where the forks had incompatible stub tables, but were making
claim to the same set of version numbers. I don't fully grok your
64-bit claims yet, but it's possible they'd be in this category.
Let's do an experiment, to show what I want to tell.
Please check out the branche "novem-support". It is a
modified 8.6, mainly pkgb.so is modified to use
Tcl_PkgRequireEx(..., "8.6-". ...), and uses the
Tcl_EvalEx() function. Clearly this is wrong: It would
allow pkgb.so to be loaded by both Tcl 8.6 and Tcl 9.0,
but Tcl_EvalEx will have a different signature, so that will
lead to a crash. I'll show that.
Then there is the branch "novem-demo-bug-3588687",
which is a modified "novem" with a Tcl_EvalEx signature
change: the numBytes parameter is changed to a
Tcl_WideInt (will be size_t in Tcl9, but this way I can
demonstrate the crash on 32-bit platforms as well)
Let's run this tclsh9.0 and load the pkgb.so from
the "novem-support" 8.6 build:
$ tclsh9.0
% load ../../tcl8.6/unix/dltest/pkgb.so
% info commands pkgb_*
pkgb_unsafe pkgb_sub
% pkgb_unsafe
Crash!
Aborted (core dumped)
$
This pkgb.so was compiled and linked against
the unmodified Tcl 8.6 tcl.h and libtclstub86.a.
How come that this was not detected? Whatever
test in tclStubLib.c cannot detect that: pkgb.so
was compiled with a legacy stub library.
So, the only way I see to make Tcl_StubInit give
a nice error message is change the TCL_STUB_MAGIC
magic value for 64-bit builds in Tcl9. Then, even
though pkgb.so was linked with a legacy stub
library it would detect that and give:
"interpreter uses an incompatible stubs mechanism"
in stead of crashing.
If you have another idea how to prevent a crash
in this case, I'm all ear. That's why my conclusion
is that TCL_STUB_MAGIC should change on
64-bit platforms, it can stay the same on 32-bit
platforms. Do you get the point from this?
Regards,
Jan Nijtmans
I'm only starting, but wanted to quickly observe
that Tcl_PkgRequireEx() doesn't accept arguments
like "8.6-". It's only built for single version numbers,
not requirement ranges. For those you have to call
Tcl_PkgRequireProc(), which TclInitStubs doesn't do.
>Tcl_PkgRequireEx() doesn't accept arguments like "8.6-"
Somehow it works, so we might have hit an unrelated bug in tclPkg.c
somewhere here .......
I've got trains and family to content with, so may be until tomorrow until
I can reply well, but if you haven't checked what the latest bug-3588687
code would do in this case, that might be worth trying.
> >Tcl_PkgRequireEx() doesn't accept arguments like "8.6-"
> Somehow it works, so we might have hit an unrelated bug in tclPkg.c
> somewhere here .......
TIP #268 specifies that ''The existing functions Tcl_PkgRequire(Ex) are
re-implemented in terms of this function (=Tcl_PkgRequireProc)",
so Tcl_PkgRequireEx() accepts "8.6-" just fine. That's what I
see and it fits with the TIP description (and with my expectations).
I don't see a bug here, after all.
I'm getting a clearer picture now. Hope to have more extended remarks
tomorrow. Meanwhile, just for fun you might give this one a try:
Tcl_InitStubs(interp, "8.6 9", 0);
:)
Just so I can start from an agreed assumption in the morning, we agree
that
Tcl_InitStubs(interp, "8.6-", 0)
is *never* a sensible thing to do, right? Any code including that is
wrongheaded and buggy from the get-go.
(In that sense it's like my example in the original ticket here.
Compiling against Tcl 8 headers, but passing "9.0" as an argument
to Tcl_InitStubs() cannot ever work and is always a mistake.)
There's no compat issue with making Tcl_InitStubs refuse to follow
broken orders. We just need to redesign it so that such mistakes are
reported errors, not panics or crashes.
>we agree that
Tcl_InitStubs(interp, "8.6-", 0)
>is *never* a sensible thing to do, right
On 64-bit platforms we agree on that, but
on 32-bit platforms I see no problem.
> Compiling against Tcl 8 headers, but passing
> "9.0" as an argument to Tcl_InitStubs() cannot ever
> work and is always a mistake.)
There is no reason at all to do that: It can only work
when the extension uses the common subset of
Tcl 8.6 and 9.0 API's, but then it makes no sense
to specify "9.0". Specifying "8.6-" would make more sense.
The consensus should be:
Tcl_InitStubs(interp, "8.6", 0)
means that this extension uses the 8.6 API, so compiles and
runs with any 8.x release with x >= 6.
Tcl_InitStubs(interp, "9.0", 0)
means that this extension uses the 9.0 API, so compiles and
runs with any 9.x release.
Tcl_InitStubs(interp, "8.6-", 0)
means that when compiled with Tcl 8.6 headers it will run with
Tcl 8.6, and when compiled with Tcl 9.0 headers it will run
with any 9.x release. In Tcl9 some API's will be removed,
others will be added. Any extension specifying "8.6-" can
only use the common subset API's of 8.6 and 9.0. (
assuming the stub entrie positions are kept equal)
This also means that no extension can claim "8.6-"
during the alpha/beta period of Tcl 9. One method
to enforce that is in tcl.h 9.0 specifying:
#if TCL_RELEASE_LEVEL == TCL_FINAL_RELEASE
#define Tcl_InitStubs(interp, version, exact) \ TclInitStubs(interp, version, exact, TCL_VERSION, TCL_STUB_MAGIC)
else
#define Tcl_InitStubs(interp, version, exact) \ TclInitStubs(interp, TCL_PATCH_LEVEL, 1, TCL_VERSION, TCL_STUB_MAGIC)
#endif
But - as proven - claiming "8.6-" compatibility cannot ever work
across the 8.6/9.0 border on 64-bit platforms. That's what the
MAGIC change is needed for.
Regards,
Jan Nijtmans
>we agree that Tcl_InitStubs(interp, "8.6-", 0)
>is *never* a sensible thing to do, right
Thinking more about it, I agree! It would promise
that it will work with Tcl 10 as well, which
is nonsence, of course.
Tcl_InitStubs(interp, "8.6 9", 0) should be used.
I now turned pkgb.so in core-8-5-branch and trunk into a
Tcl9 interoperability test. And closed the "novem-support" branch.
Tcl_InitStubs(interp, "8.6 9", 0) unfortunately doesn't work, it leads
to a crash. That must be a bug somewhere in tclPkg.c. Luck
enough, "8.6-9.1" works, and that's good enough for this test.
Now the challenge is that pkgb.so loaded in any Tcl interpreter
should either give an error message, either succeed, but
never crash. On 32-bit platforms that's no problem: The
signature of the used API will at most change some parameters
from int to size_t, but they all are the same size. On 64-bit
platforms I see no other solution than to change the MAGIC
value.
My idea of an unversioned stub library is now in the "novem-unversioned-stub"
branch. The only thing missing is the MAGIC value change when the API
changes in an incompatible way (novem-64bit-sizes)
Ok, two things are becoming clear.
1) How I was misunderstanding/forgetting how
Tcl_PkgRequireEx() had expanded in function,
and how that impacts *all deployed* stubs-enabled
extensions. (eeeuuuuwwww!!!)
2) Those areas where we still suffer from fundamental
disagreement.
I'm going to work on code demonstrations. That seems to
be the way we're most successful clearly demonstrating
points to one another.
While I work on that... Thanks for the test case. I've merged
it over to the bug-3588687 branch. On that branch, build the
pkgb.so for testing. Built another on the trunk.
Then over on the novem-demo-bug-3588687 branch, `make shell`:
% load path/to/bug-3588687/pkgb.so
version conflict for package "Tcl": have 9.0a0, need 8
% load path/to/trunk/pkgb.so
% pkgb_unsafe
Crash!
make: *** [shell] Aborted
I submit this demonstrates that bug-3588687 branch
is a fix for the problem demonstrated by pkgb.so.
>I submit this demonstrates that bug-3588687 branch
>is a fix for the problem demonstrated by pkgb.so.
But for the fix you had to recompile pkgb.so and link
to a modified stub library. So this doesn't work
with already-compiled pkgb.so binaries.
The challenge is: Some person A build an extension foo.so.
Another person B, which doesn't have the source code of
that extension, tries to load it in Tcl9, and it crashes.
So, the challenge is, how can Tcl9 detect that without
making modifications to foo.so. We don't want to
backport tclStubLib.c modifications to the field everywhere,
giving additional restrictions to people. Joe English
already expressed his reluctance to modify
tclStubLib.c in current releases <= 8.6. I tried it already -
before I found the alternative solution in tclLoad.c -
which was put on-hold on your request.....
It became clear to me that the solution must be
found in Tcl9, not by modifying the 8.x stub library!
Please try again with the current "novem-demo-bug-3588687" branch.
You will get:
% load path/to/trunk/pkgb.so
interpreter uses an incompatible stubs mechanism
This demonstrates that this solution works for
already-built extensions, no need to recompile
and link to a newer stub library.
Ok, you're just repeating my starting point.
The routine Tcl_InitStubs() and therefore all
deployed binary stubs-enabled extensions -- the
ones linked against the "versioned stubs lib"
-- suffer from a fundamentally broken design. If any
of them passed arguments requesting or
accepting "Tcl 9", then those deployed, binaries
are going to crash if they are [load]ed into novem.
It's ugly and it's broken, but we can't turn back time.
This was botched, and we're going to suffer for that botch.
All we can do is get a fixed unversioned stubs library
out and people started using it before Tcl 9 loading
attempts get widespread.
If we really must must must do something that will
allows any old foo.so linked to a versioned stubs
library that's been lying around untouched to
[load] into Tcl 9 and not crash, then the only solution
is to change TCL_STUB_MAGIC in Tcl 9 (for all
platforms, which I will demonstrate). Then the
[load] will not crash, but will instead generate the
error message:
"This interpreter does not support stubs-enabled extensions."
and as a consequence, those new extensions linked to the
new unversioned stubs library would be unable to generate any
better error message than:
"interpreter uses an incompatible stubs mechanism"
Consider, though, which foo.so are going to crash. Only those
which passed a 'version" string argument that accepted a Tcl 9
interp. I submit that no one has done this, other than folks
like us who are probing and demonstrating the brokenness
if Tcl_InitStubs(). So, we would be sacrificing the ability to
produce more meaningful error messages like
"version conflict for package "Tcl": have 9.0a0, need 8"
for people actively making extensions today, using the
latest tools and advice for the sake of preventing crashes
in foo.so that do not exist outside the labs of torture testers
like us.
I prefer that we keep TCL_STUBS_MAGIC unchanged.
Get better error messages. And if any of those mythical
foo.so appear, then they crash. Sorry. We limped along
with a misdesigned Tcl_InitStubs for a decade, and that's
the pain.
Demonstrating on the tip of novem-demo-bug-3588687:
% load path/to/trunk/pkgb.so
interpreter uses an incompatible stubs mechanism
% load path/to/bug-3588687/pkgb.so
interpreter uses an incompatible stubs mechanism
And in reality, the first result for deployed foo.so will be:
"This interpreter does not support stubs-enabled extensions."