From: Jeff A. <ja...@fa...> - 2018-03-05 19:41:51
Bugs have come up over the years that relate to multiple Python interpreters and their state with threads (#2465, #2513, #2505, #2507, #2199). In circumstances where there is more than one interpreter and more than one JVM worker thread, properties that users hoped would be independently settable turn out not to be, or get mixed up. Mostly, these are things we keep in the sys module, aka PySystemState. We've claimed victory on this problem a few times (closed a few issues) but, like a badly stretched carpet, nailing it down in one place makes it wrinkle up somewhere else. This makes me suspect that something fundamental may be wrong.

Apparently [1], it is too strong to say (C)Python is broken in this area of threads and interpreters. Yet CPython is certainly in some difficulty making the same carpet lie flat [2].

[1] https://mail.python.org/pipermail/python-ideas/2017-May/045770.html
[2] https://docs.python.org/3/c-api/init.html#bugs-and-caveats (2nd paragraph)

Do our problems stem from copying a flawed model? Is it just more painfully obvious in Jython because of the absence of the GIL, and the type of applications people build on the JVM?

My understanding of the C API is that a PyThreadState points to the tip of the stack of active PyFrames, and so at any moment where that thread is not actually running in a CPU, it holds all the state necessary to resume. Any reference in the resumed code to module-global state, e.g. imported modules, will be resolved to the values prevailing before suspension.

In CPython a PyThreadState is paired with an OS thread, and in Jython with a JVM Thread, in such a way that any executing code that doesn't already have it in a local variable can look up the current PyThreadState, and hence all the stack and interpreter state. "Paired" in Jython means we use a ThreadLocal to get from Thread to state. In CPython it is the same, with the addition of the GIL to police exclusive access to interpreter resources when that thread becomes the unique current thread.

Also, every PyThreadState references one PyInterpreterState (many-one), where the import mechanism to use is referenced, and where the sys module (path, meta_path), builtins and codec registry all sit, so a subsequent import statement will search the right places. Anything implicitly dependent on the interpreter state (such as print, through sys) will behave consistently. Or so we hope.

The CPython implementation allows for multiple PyInterpreterState objects (sub-interpreters), each with its own collection of PyThreadState objects. The C API [2] warns us that GIL manipulation combined with multiple sub-interpreters is delicate because of the assumption that one OS thread maps to at most one PyThreadState.

The assumption is that the OS/JVM thread will lead us to the correct PyThreadState, and so to the correct PyInterpreterState, but certain sequences of operations expose the assumption as unreliable. I argue that this is an incorrect basis for finding the right interpreter (as [2] effectively warns us).

In a compelling case for Jython, a Java thread pool is shared by multiple sub-interpreters, each of which may have made changes to the module search path, builtins, available codecs or sys.std[in|out|err]. The pool has a queue of tasks, each waiting for any available JVM thread. An object representing a task was created in a particular sub-interpreter.
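To make the pairing concrete, here is a minimal sketch in Java (InterpState, ThreadState and PyRuntime are hypothetical stand-ins, not the actual runtime classes) of how a ThreadLocal takes us from the current Thread to a thread state, and through it to one interpreter's state:

    // Minimal sketch only: InterpState, ThreadState and PyRuntime are
    // hypothetical stand-ins, not the actual runtime classes.
    final class InterpState {
        final String name;                  // stands in for sys/PySystemState etc.
        InterpState(String name) { this.name = name; }
    }

    final class ThreadState {
        final InterpState interp;           // many ThreadStates -> one InterpState
        ThreadState(InterpState interp) { this.interp = interp; }
    }

    final class PyRuntime {
        static final InterpState DEFAULT = new InterpState("default");

        // The "pairing": each JVM Thread lazily acquires a ThreadState.
        private static final ThreadLocal<ThreadState> CURRENT =
                ThreadLocal.withInitial(() -> new ThreadState(DEFAULT));

        static ThreadState getThreadState() { return CURRENT.get(); }
        static void setThreadState(ThreadState ts) { CURRENT.set(ts); }
    }

A pooled worker thread keeps whichever ThreadState it last acquired (or falls back to the default), so getThreadState() answers according to the thread's history, not according to the sub-interpreter in which the task it is now running was created.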
If the task involves any code compiled from Python, or "Python-aware" Java, that code can only execute as expected in the context the programmer has created for it in the state of that interpreter. For example, if execution encounters an import statement, the search path should be as defined where the task was prepared. If there is a call to print(), and sys.stdout has been redefined in the interpreter that originated the task, that's the sys.stdout that print() should find.

When a task is given to a worker thread, it doesn't matter that some other sub-interpreter holds a PyThreadState that represented this thread in the past, or that none does. We need a way to map from a task to its defining sub-interpreter, and (probably) to create a PyThreadState in that sub-interpreter for the OS/JVM thread that happens to be running the task, at least before code is executed that references the thread and interpreter. Our only clue to this is the task object itself.

Although this is a compelling case for the JVM, it may not seem so for CPython. However, the same surely applies wherever a C extension loses interpreter context for an object that transfers between threads. In fact, such loss of context happens all the time when interpreting CPython byte code, and the context has to be found each time we create a PyFrame, for example by a call to PyThreadState_GET(); it is just that the thread has hardly ever changed. Even for the JVM, we should not think this applies only to objects explicitly created to represent tasks: any call-back, in fact any invocation of a Python-defined operation (like PyObject._add, when it leads to an __add__ method defined in Python), needs the same consideration if objects may be handled by different threads.

At first, it seems as if every PyObject must have a reference to its owning interpreter. I think this is not the case. Since it is only the execution of code that raises the issue, and a (Python) function or method seems always able to find the module it belongs to, I expect only module objects to need this reference. The places where we must resolve the interpreter are those where we currently pick up the thread or interpreter state by a call to runtime support. Subject to a deeper look, I think the slow path via the module is only needed where presently there is no ThreadState and we fall back on the "default interpreter". But in the absence of certainty, we shouldn't have to guess. A rough sketch of the idea is appended below.

This solution is controversial, I don't doubt.

Jeff

--
Jeff Allen
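As a concrete illustration of the proposal (hypothetical names again, reusing the classes from the earlier sketch): the task captures a reference to its owning interpreter when it is created, and the worker binds a ThreadState for that interpreter before running any Python-aware code, restoring the previous state afterwards.

    // Sketch of the proposal, reusing the hypothetical classes above.
    final class InterpreterTask implements Runnable {
        private final InterpState owner;    // captured where the task was created
        private final Runnable body;        // the Python-aware work to run

        InterpreterTask(InterpState owner, Runnable body) {
            this.owner = owner;
            this.body = body;
        }

        @Override
        public void run() {
            ThreadState previous = PyRuntime.getThreadState();
            // Make the current JVM thread resolve to the owning sub-interpreter,
            // whatever ThreadState this pooled thread held before.
            PyRuntime.setThreadState(new ThreadState(owner));
            try {
                body.run();                 // import, print() etc. see owner's state
            } finally {
                PyRuntime.setThreadState(previous);
            }
        }
    }

Then, for example, pool.submit(new InterpreterTask(interpA, work)) behaves the same whichever pooled thread happens to pick the task up. Whether the owner reference lives on the task object, or is reached through the module of the code being run (as I suggest above), is a detail of the same idea.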