|
From: Tom H. <th...@cy...> - 2004-10-13 08:27:30
|
There is an issue with a number of kernels in the 2.4.7-2.4.9 time
frame which seem to have had the getpid system call changed to return
the ID of the process (i.e. of the main thread) but which have not yet
had the gettid system call added to get the ID of the current thread.

I'm not entirely sure if this is an issue in the master kernel source
or only in certain RedHat patched kernels, but it has come up a number
of times - bug 82114 is the main bug but it has others attached.

The problem is that VG_(gettid) needs to get the ID of the current
thread and it assumes that if the gettid system call returns ENOSYS
then the getpid system call will do that. On the broken systems it
gets the wrong ID and all sorts of weird things happen.

I have a patch of sorts, but it is horrible. Basically if gettid fails
then instead of using getpid we do a readlink on /proc/self and use
what it points to as the thread ID.

This seems to work, but it is horrible, so what do people think about
putting it in? or does anybody have a better way to get the real ID of
the current thread on these systems?

Tom

--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Nicholas N. <nj...@ca...> - 2004-10-13 10:04:57
|
Hi,

I don't understand this stuff so well, but here goes anyway...

On Wed, 13 Oct 2004, Tom Hughes wrote:

> The problem is that VG_(gettid) needs to get the ID of the current
> thread and it assumes that if the gettid system call returns ENOSYS
> then the getpid system call will do that.

Is that assumption valid?

> This seems to work, but it is horrible, so what do people think about
> putting it in? or does anybody have a better way to get the real ID of
> the current thread on these systems?

What does the patch look like? Is it big, or small but conceptually
horrible? If the latter, I would say put it in if it works and fixes
a problem that is affecting a number of people. Just my two cents.

N
|
|
From: Tom H. <th...@cy...> - 2004-10-13 10:16:00
|
In message <Pin...@he...>
Nicholas Nethercote <nj...@ca...> wrote:
> On Wed, 13 Oct 2004, Tom Hughes wrote:
>
>> The problem is that VG_(gettid) needs to get the ID of the current
>> thread and it assumes that if the gettid system call returns ENOSYS
>> then the getpid system call will do that.
>
> Is that assumption valid?

In principle, yes.

Linux uses a 1-1 model where every thread in the program is a distinct
object at the kernel level. Specifically the kernel treats each thread
pretty much the same as it does a process so it has a slot in the
process table and a process ID value.

Until recently the getpid system call returned the process ID of the
thread which called it. As a result the getpid() function in glibc had
the same behaviour. That is wrong as POSIX says that getpid() should
return the same value for all threads in a process.

As part of the move to the new threading system the getpid system call
was changed to return the process ID of the root thread so that it
behaves as POSIX says it should. At the same time a gettid system call
was added which does what getpid used to do and returns the ID of the
thread which calls it.

Except that it seems there was a small gap between these two steps, or
at least that RedHat released some kernels with the first change patched
in and not the second.

So valgrind's assumption that if gettid doesn't exist then getpid will
do what gettid now does is normally correct except for this small group
of problem kernels.

>> This seems to work, but it is horrible, so what do people think about
>> putting it in? or does anybody have a better way to get the real ID of
>> the current thread on these systems?
>
> What does the patch look like? Is it big, or small but conceptually
> horrible? If the latter, I would say put it in if it works and fixes
> a problem that is affecting a number of people. Just my two cents.

It's tiny - this is the whole thing:

Index: coregrind/vg_mylibc.c
===================================================================
RCS file: /home/kde/valgrind/coregrind/vg_mylibc.c,v
retrieving revision 1.81
diff -u -r1.81 vg_mylibc.c
--- coregrind/vg_mylibc.c 15 Jul 2004 12:59:41 -0000 1.81
+++ coregrind/vg_mylibc.c 19 Jul 2004 13:31:47 -0000
@@ -232,8 +232,14 @@
    ret = VG_(do_syscall)(__NR_gettid);
-   if (ret == -VKI_ENOSYS)
-      ret = VG_(do_syscall)(__NR_getpid);
+   if (ret == -VKI_ENOSYS) {
+      Char pid[16];
+
+      if ((ret = VG_(do_syscall)(__NR_readlink, "/proc/self", pid, sizeof(pid))) >= 0) {
+         pid[ret] = '\0';
+         ret = VG_(atoll)(pid);
+      }
+   }
    return ret;
 }

Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
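
For readers who want to try the idea outside Valgrind, the following is a minimal user-space sketch of the same fallback. It is not the patch above: it goes through the libc syscall() and readlink() wrappers rather than VG_(do_syscall), so a missing gettid shows up as -1 with errno set to ENOSYS instead of a negative return value. The helper names id_from_proc_self and current_thread_id are hypothetical. The sketch also reserves one byte for the terminator, because readlink() does not write one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve the /proc/self symlink and parse its
 * target as a decimal ID. Returns -1 if the link cannot be read or
 * its target is not a plain number. */
static long id_from_proc_self(void)
{
    char buf[32];
    char *end;
    long id;

    /* readlink() does not NUL-terminate, so leave room for one byte. */
    ssize_t len = readlink("/proc/self", buf, sizeof(buf) - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';

    errno = 0;
    id = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -1;
    return id;
}

/* Hypothetical helper: ask the kernel for the thread ID and fall back
 * to /proc/self only when the gettid system call does not exist. */
static long current_thread_id(void)
{
    long ret = syscall(SYS_gettid);
    if (ret == -1 && errno == ENOSYS)
        ret = id_from_proc_self();
    return ret;
}
```

One caveat worth checking before relying on this: on current kernels /proc/self resolves to the process (thread group) ID rather than the calling thread's ID, so the fallback only gives a per-thread answer on kernels that behave as described in this thread. Linux 3.17 and later provide /proc/thread-self as the per-thread link.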
|
|
From: Nicholas N. <nj...@ca...> - 2004-10-13 10:29:41
|
On Wed, 13 Oct 2004, Tom Hughes wrote:

>> What does the patch look like?
>
> It's tiny - this is the whole thing:

[snip]

If you add an explanatory comment, I have no objections to it.

N
|