Tcl / Read-Only Patches / #330 atfork patch

David N. Welton - 2004-03-25

atfork.diff

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David N. Welton - 2004-06-16

newatfork.diff

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David N. Welton - 2004-06-16

Logged In: YES
user_id=240

Updated patch.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zoran Vasiljevic - 2004-06-17

Logged In: YES
user_id=95086

Hi David,

I'm currently reviewing the patch. I have applied it to the
core-8-4-branch and get problem with thread.test. The
offending part:

# Some tests require the testthread command

-set ::tcltest::testConstraints(testthread) \
- [expr {[info commands testthread] != {}}]
+#set ::tcltest::testConstraints(testthread) \
+# [expr {[info commands testthread] != {}}]

-if {$::tcltest::testConstraints(testthread)} {
+tcltest::constraints::cset testthread [expr {[info commands
testthread] != {}}]
+
+if {[::tcltest::constraints::cset testthread]} {

The "::tcltest::constraints::cset" is unknown.
I do a simple:

patch -p0 < newatfork.diff
cd unix
make
make test
....
Test file error: invalid command name
"::tcltest::constraints::cset"
while executing
"::tcltest::constraints::cset testthread"
(file
"/usr/local/homes/zv/sf/core-8-4-branch/tests/thread.test"
line 1)
timer.test
....

Another thing is: you should beware of pthread_once. This is not
defined in many posix-thread implementations. Safer is to use
mutex and a static int to simulate the behaviour.

I will yet have to make some tests here. Where I have a bad
feeling is the lock order of mutexes. You are basically looping
thru all Tcl-created mutexes in an linear fashion but this
might
lead to deadlocks I'm afraid. The Tcl mutexes are of dynamic
nature and users can creaet and lock them in any arbitrary
order. I still have to verify that this won't dwarf the
effort, i.e.
lead to a deadlock just before the fork. This will take some
time.

Also, the patch provided is ment for 8.4 branch only, right?

Zoran

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Donal K. Fellows - 2004-06-17

Logged In: YES
user_id=79902

Speaking as someone who recently went through the core test
suite fixing up constraint setup and use, that should just
refer to:

# Set operation...
tcltest::testConstraint testthread \
[llength [info commands testthread]]
# Read operation...
if {[tcltest::testConstraint testthread]} { ...

No variable or namespace inside the tcltest namespace should
be referred to directly.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David N. Welton - 2004-06-17

Logged In: YES
user_id=240

Don't mind the bit at the top of the test patch.... I
probably accidentally included some of the big change I made
to the test suite.

As for pthread_once and the mutex locking/unlocking, that,
and the rest of the core functionality of the patch is all
Jeff's. I just prettied it up some, and added some comments
and tests.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David N. Welton - 2004-12-05

Logged In: YES
user_id=240

Hi Zoran - did anything ever come of this patch?

Thanks,
Dave

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zoran Vasiljevic - 2004-12-05

Logged In: YES
user_id=95086

The problem is as I already described:
This patch is looping thru all Tcl-created mutexes in an linear
fashion but this might lead to deadlocks I'm afraid.
The Tcl mutexes are of dynamic nature and users (extensions)
can create and *lock* them in any *arbitrary* order.
This will lead to deadlock if some extension is holding a lock-hierarchy
and somebody (another thread) tries to fork at that time.
This is very tricky business...

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David N. Welton - 2004-12-06

Logged In: YES
user_id=240

It is tricky all right. What you describe is not very
likely, but still an impediment to a clean fix. The bug is
real though and, for instance on Debian, Tcl cannot fork(),
which has been noticed by several people to date. We need
to figure out either 1) a good fix, or 2) a workaround that
at least lets people do what they need. A workaround might
include a way for users to specify if they have any tricky
packages that need special care.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zoran Vasiljevic - 2004-12-08

Logged In: YES
user_id=95086

"What you describe is not very
likely, but still an impediment to a clean fix"

Oh, please believe me: it is *very* likely. Think of heavy Tcl
threading apps like AOLserver. There you have the immediate
problem of exec and cgi processing which are/would be significantly
affected.
I would rather vote for a cleaner fix.
BTW, would you please point me to some of those bug-reports
you mentioned? I would like to re-check them and see if we can
find some solution.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David N. Welton - 2004-12-08

Logged In: YES
user_id=240

The two URL's in the original 'patch report', above, are
links to the bugs filed against Tcl and Tclx. I've seen
people ask about this every so often on the newsgroup.

I guess if you say so it could happen, you're the expert.
That part of the patch came from Jeff (I just cleaned it up
and added comments/tests and so on).

What might might be doable as a workaround? Would it be
possible to include some of this logic from the patch in the
[fork] implementation itself, so that in the manual you
could say: "be sure that you don't hold any of the mutexes
before forking".

I don't know what the answer is, but this bug has been
sitting there for a year and a half at this point, and I'd
like to see some discussion on it with the goal of figuring
out an answer. Maybe it's appropriate to do that on
tcl-core so more minds are brought to bear.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zoran Vasiljevic - 2004-12-08

Logged In: YES
user_id=95086

Thanks for the hint about the links. Oh,. I just did not read
the original posting right. Sorry for this.

I will look into this. Yeah, it would be good to bring this
topic to the tcl core list to see what other people would
say about that. I can do this after having peeked into
the reports.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zoran Vasiljevic - 2004-12-17

Logged In: YES
user_id=95086

Just a life-sing.. I'm still looking into this...

Aparently, the *main* problem is with the Tcl notifier
thread and its handling in the child process, i.e. the fact
that child has not
have access to the notifier thread because it is simply not
there
after the fork.

Attempting to lock all mutexes in master and then release them
again after the fork in both master and child processes, plus
re-initializing the notifier subsystem (effectively
re-creating the
notifier thread) in the child process is a logical way of
solving
this puzzle *only* if you have control of the lock hierarchy...

In the nature of Tcl lib, this is not guaranteed, hence it
will lead possibly (and very probably) to various
deadlock-embrace situations. Also, the patch as provided,
will destroy the reference count of the notifier thread
which is not good but this is a minor issue which can be
fixed. I see no problem in this.
I do see problems in wholesale locking of Tcl mutexes though.
This is the main problem and we have to see if this step
is really needed and what would be the consequence of not
doing it. I'm afraid some hard compromises would be required
and those may/will have consequences and lead to some
special-cases which would be pretty unelegant, design-wise.

Well, this all requires more thinking...

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zoran Vasiljevic - 2004-12-17

Logged In: YES
user_id=95086

Some more thoughts...

This patch actually attempts to solve 2 problems
at a time:

1. Assure all Tcl mutexes are in the sane state after
the fork, so that the Tcl library in the child process is
in the sane state as well.

2. Reinitialize (actually, re-start) the notifier thread
in the child, since it is not replicated by the fork call
but is/(will be) needed by the Tcl lib in the child.

Only the step 2. is the actual fix of the initial problem
as described in the references to bug-reports above.
This ain't that difficult to solve by pthread_atfork
mechanics and should not introduce any unwanted
side effects. At least I can't think of any at this point.

The step 1. attempts to solve the more difficult problem
of the state of internal mutexes used by the Tcl library
after the fork. This is far more complex than it looks like
on the first glance (deadlock embrace issues)...

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zoran Vasiljevic - 2004-12-17

labels: 342014 --> 49. Threading
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Daniel A. Steffen - 2006-05-27

Logged In: YES
user_id=90580

attached patch to HEAD that implements a pthread_atfork()
handler for the Mac OS X specific notifier in
tclMacOSXNotify.c (used when configured with
--enable-corefoundation), this allows unthreaded CF tcl to
[fork] without hanging at notifier finalization: the handler
recreates CoreFoundation state and notifier thread in the child.
Note that pthread_atfork() is available starting with OS X
10.4 Tiger only, so tcl built on earlier releases will not
include this fix.

Because vfork() is used by the core on OS X, the atfork
handler does not run for [exec] and [open |], so core
commands are not affected; the handler only gets called when
an extension/embedder calls fork(), such as TclX's [fork].

On Tiger, this makes [fork] safe to use from tcl with
--disable-threads irrespective of corefoundation
configuration option; but of course it doesn't fix the
generic stale locked mutex issue with threaded tcl and fork
also discussed here, so threaded CF tcl can still deadlock
in the child if any locks are held in the parent at the time
of the [fork].

patch committed to HEAD and core-8-4-branch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Daniel A. Steffen - 2006-05-27

patch implementing atfork handler for mac os x notifier

tclMacOSXNotify-atfork_HEAD.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

atfork patch

The Tool Command Language implementation

Group

Searches

Help

#330 atfork patch

Discussion