From: Andraz T. <And...@gu...> - 2003-05-29 23:23:47
I am trying to debug a large, heavily multithreaded program (Cinelerra) with Valgrind, but I think I have found a bug in Valgrind instead. Basically, in one case locking a mutex just after it was unlocked blocks. The log is this:

    Unlocking mutex: 0x491c3c70
    --17547-- PTHREAD[25]: pthread_mutex_unlock mx 0x491C3C70 ...
    Locking mutex: 0x491c3c70
    --17547-- PTHREAD[25]: pthread_mutex_lock mx 0x491C3C70 ...
    --17547-- PTHREAD[25]: pthread_mutex_lock mx 0x491C3C70: BLOCK

No lines in between were deleted, so these events happen one after another. This is definitely wrong behaviour, as a lock right after an unlock should not block.

I am using Debian unstable, glibc 2.3.1, gcc 3.3. I tested this with Valgrind 1.9.6 and with the CVS version.

Maybe worth mentioning is that thread 25 wasn't the thread that created & locked the mutex in the first place, but this should not be an issue, since this is valid behaviour in pthreads.

Cinelerra uses mutex locking all over the place, but this behaviour is 100% reproducible: it always happens with the same mutex.

Bye
Andraz
From: Jeremy F. <je...@go...> - 2003-05-29 23:40:03
On Thu, 2003-05-29 at 16:23, Andraz Tori wrote:
> Maybe worth mentioning is that thread 25 wasn't the thread that created
> & locked the mutex in the first place, but this should not be an issue,
> since this is valid behaviour in pthreads.

No, it isn't. You can't lock with one thread and unlock with another: pthread_mutex_unlock should return EPERM (calling thread does not own mutex).

The bug is that Valgrind should flag this as an error.

J
From: Jeremy F. <je...@go...> - 2003-05-30 00:34:47
On Thu, 2003-05-29 at 17:01, Andraz Tori wrote:
> Thanks, I didn't know that. De facto this is valid behaviour in
> pthreads... Everything works as expected if Valgrind is not run...
>
> If this is true, then mutexes are useless for interthread synchronisation
> of 'data ready' states?

They're already useless for that; I'm guessing you really want to be using condition variables, which have the behaviour you're looking for. With any luck, the synchronization code is only in one or two places...

> > The bug is that Valgrind should flag this as an error.
>
> The whole of Cinelerra (quite a huge program) is based upon this kind of
> behaviour. It uses mutexes in order to synchronize 'data delivery':

That's exactly what a condvar is for.

> So my question is... would it be possible for Valgrind to support this
> buggy behaviour by some switch or something?

If you come up with a patch we can look at it. But I think your time would be better spent fixing the code: what you're asking is equivalent to "but my program always uses memory after it has been freed, and it works fine without Valgrind".

> The problem is that Cinelerra works just fine if it is run without
> Valgrind...

Oh yes. Have you tried it with NPTL?

J
From: Igmar P. <mai...@jd...> - 2003-05-31 10:04:19
On Fri, 29 May 2003, Jeremy Fitzhardinge wrote:
> On Thu, 2003-05-29 at 16:23, Andraz Tori wrote:
>
> > Maybe worth mentioning is that thread 25 wasn't the thread that created
> > & locked the mutex in the first place, but this should not be an issue,
> > since this is valid behaviour in pthreads.
>
> No, it isn't. You can't lock with one thread and unlock with another:
> pthread_mutex_unlock should return EPERM (calling thread does not own
> mutex).

With a big if attached: the mutex has to be created as an error-checking mutex. If it isn't, things just hang, crash or do other weird stuff. Use PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP instead of PTHREAD_MUTEX_INITIALIZER.

> The bug is that Valgrind should flag this as an error.
>
> J

Indeed.

Igmar
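A minimal sketch of the error-checking behaviour described above. It uses the portable POSIX call pthread_mutexattr_settype() with PTHREAD_MUTEX_ERRORCHECK instead of PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP, which is a GNU extension only visible when _GNU_SOURCE is defined. The helper names foreign_unlock_result() and unlock_from_other_thread() are hypothetical. For this mutex type POSIX requires an unlock by a non-owning thread to fail, so the second thread should get EPERM back instead of silently releasing the lock.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK in glibc headers */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t checked_mutex;

/* Runs in a second thread that does NOT own checked_mutex. */
static void *unlock_from_other_thread(void *arg)
{
    int *result = arg;
    *result = pthread_mutex_unlock(&checked_mutex);
    return NULL;
}

/* Locks an error-checking mutex in the calling thread, then lets another
   thread try to unlock it. Returns that thread's error code. */
int foreign_unlock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_t other;
    int result = -1;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&checked_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&checked_mutex);    /* this thread is now the owner */
    pthread_create(&other, NULL, unlock_from_other_thread, &result);
    pthread_join(other, NULL);             /* join makes 'result' visible here */
    pthread_mutex_unlock(&checked_mutex);  /* the owner's unlock succeeds */
    pthread_mutex_destroy(&checked_mutex);
    return result;
}
```

With a default ("fast") mutex the same foreign unlock is undefined behaviour, which is why it can appear to work outside Valgrind.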
From: Andraz T. <And...@gu...> - 2003-05-30 00:01:24
On Fri, 30.05.2003 at 01:37, Jeremy Fitzhardinge wrote:
> On Thu, 2003-05-29 at 16:23, Andraz Tori wrote:
> > Maybe worth mentioning is that thread 25 wasn't the thread that created
> > & locked the mutex in the first place, but this should not be an issue,
> > since this is valid behaviour in pthreads.
> No, it isn't. You can't lock with one thread and unlock with another:
> pthread_mutex_unlock should return EPERM (calling thread does not own
> mutex).

Thanks, I didn't know that. De facto this is valid behaviour in pthreads... Everything works as expected if Valgrind is not run...

If this is true, then mutexes are useless for interthread synchronisation of 'data ready' states?

> The bug is that Valgrind should flag this as an error.

The whole of Cinelerra (quite a huge program) is based upon this kind of behaviour. It uses mutexes in order to synchronize 'data delivery':

When data is ready, the server thread unlocks A (the 'data ready for processing' mutex) and blocks on B (the 'data is processed' mutex). Before that, the client thread is waiting on the locked A. When the server thread unlocks it, the client thread can go on until it finishes. When it finishes, it locks A again, unlocks B, and locks (and this time blocks) on A again. The server thread was waiting on the locked B, and since B has now been unlocked by the client, it knows the data has been processed and can happily continue.

So my question is... would it be possible for Valgrind to support this buggy behaviour by some switch or something?

The problem is that Cinelerra works just fine if it is run without Valgrind...

Bye
Andraz
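The A/B handshake described above can be expressed with two unnamed POSIX semaphores instead of two mutexes. Unlike a mutex, a semaphore has no owning thread, so one thread may sem_post() what another thread sem_wait()s on, which is exactly the "unlock in one thread, lock in another" behaviour the protocol relies on. This is a minimal sketch assuming one server and one client; data_ready, data_processed, run_handshake() and ROUNDS are hypothetical names, and the client's "processing" is just summing the delivered values.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define ROUNDS 5

static sem_t data_ready;      /* plays the role of mutex "A" */
static sem_t data_processed;  /* plays the role of mutex "B" */
static int buffer;            /* the delivered data */
static int total;             /* the client's running result */

static void *client(void *unused)
{
    (void)unused;
    for (int i = 0; i < ROUNDS; i++) {
        sem_wait(&data_ready);      /* block until the server delivers */
        total += buffer;            /* "process" the data */
        sem_post(&data_processed);  /* tell the server we are done */
    }
    return NULL;
}

/* Runs the server side in the calling thread; returns the client's total. */
int run_handshake(void)
{
    pthread_t tid;

    sem_init(&data_ready, 0, 0);      /* pshared = 0: shared between threads */
    sem_init(&data_processed, 0, 0);  /* initial count 0: nothing posted yet */
    total = 0;
    pthread_create(&tid, NULL, client, NULL);

    for (int i = 1; i <= ROUNDS; i++) {
        buffer = i;                   /* produce the next item */
        sem_post(&data_ready);        /* "unlock A": any thread may post */
        sem_wait(&data_processed);    /* "lock B": wait for the client */
    }

    pthread_join(tid, NULL);
    sem_destroy(&data_ready);
    sem_destroy(&data_processed);
    return total;
}
```

With ROUNDS set to 5 the server delivers 1 through 5, so run_handshake() returns 15. Unlike the pthread_* calls, sem_wait() reports errors by returning -1 and setting errno; real code should retry when errno is EINTR.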
From: Julian S. <js...@ac...> - 2003-05-30 00:07:57
> So my question is... would it be possible for Valgrind to support this
> buggy behaviour by some switch or something?
>
> The problem is that Cinelerra works just fine if it is run without
> Valgrind...

If what you say is true, then it only works by luck and is certainly not compliant with the POSIX pthread standard. It sounds like you need to use the "condition variable" facility of POSIX pthreads to achieve the "data ready" signalling you want, or perhaps the POSIX semaphore functions (sem_*).

J
From: Andraz T. <And...@gu...> - 2003-05-30 00:29:04
On Fri, 30.05.2003 at 02:07, Julian Seward wrote:
> > So my question is... would it be possible for Valgrind to support this
> > buggy behaviour by some switch or something?
> >
> > The problem is that Cinelerra works just fine if it is run without
> > Valgrind...
>
> If what you say is true, then it only works by luck and is certainly
> not compliant with the POSIX pthread standard. It sounds like you
> need to use the "condition variable" facility of POSIX pthreads to
> achieve the "data ready" signalling you want, or perhaps the
> POSIX semaphore functions (sem_*).

Well... I don't know why it works, but the fact is that the whole program (heavily multithreaded) is written in this manner, and up to now there were no problems with it... Maybe it is not what POSIX says, but it seems to work...

I just added error checking:

    if(pthread_mutex_unlock(&mutex)) printf("Mutex unlock error\n");

No error is ever reported (when not running under Valgrind). Valgrind reports a warning, but returns success!

Does this mean that pthreads do not do what they should? I don't know. I am not an expert, just a user with a problem, so I am sorry if I say something stupid. :)

bye
andraz
From: Jeremy F. <je...@go...> - 2003-05-30 00:55:14
On Thu, 2003-05-29 at 17:28, Andraz Tori wrote:
> Well... I don't know why it works, but the fact is that the whole program
> (heavily multithreaded) is written in this manner, and up to now there
> were no problems with it...
> Maybe it is not what POSIX says, but it seems to work...

That's an illusion. There's no certainty until you fix your code. I suspect that even if pthread mutexes were working in the way you think they should, your code would still have race conditions and/or be deadlock-prone.

Condition variables are pretty easy to use: a pthread_cond_t is always paired with a pthread_mutex_t. To sleep on one, waiting for an event to happen, you use:

    pthread_mutex_lock(&mutex);
    while(!test_condition()) {
        // cond_wait() atomically releases the mutex and
        // goes to sleep on the condvar; it re-acquires the
        // mutex before returning
        pthread_cond_wait(&condvar, &mutex);
    }
    // act on event
    pthread_mutex_unlock(&mutex);

At the other end, when you want to generate your event, you do:

    pthread_mutex_lock(&mutex);
    set_condition_true();
    pthread_cond_signal(&condvar);    // or cond_broadcast()
    pthread_mutex_unlock(&mutex);

> Does this mean that pthreads do not do what they should? I don't know. I
> am not an expert, just a user with a problem, so I am sorry if I say
> something stupid. :)

By default pthread mutexes are "fast", which means they don't check for errors and don't report them; they just behave in undefined ways. It so happens that in this case it seems to behave the way you expect, but you can't rely on it: someone might upgrade their library and completely break the program. If you create an error-checking pthread mutex, it should start reporting errors.

J
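A self-contained version of the wait/signal pattern above, as a sketch: the hypothetical data_ready flag stands in for test_condition() and set_condition_true(), and wait_for_value() and producer() are illustrative names. Because the flag is checked under the mutex before waiting, the event is not lost even if the producer signals before the consumer reaches pthread_cond_wait(), and the while loop covers spurious wakeups, which POSIX explicitly permits.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condvar = PTHREAD_COND_INITIALIZER;
static bool data_ready = false;   /* the condition; protected by mutex */
static int shared_value;          /* the payload; protected by mutex */

/* Signalling side: make the condition true, then wake a waiter. */
static void *producer(void *arg)
{
    pthread_mutex_lock(&mutex);
    shared_value = *(const int *)arg;
    data_ready = true;                /* set_condition_true() */
    pthread_cond_signal(&condvar);    /* or pthread_cond_broadcast() */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Waiting side: start a producer, sleep until data_ready, consume. */
int wait_for_value(int value_to_send)
{
    pthread_t tid;
    int result;

    pthread_create(&tid, NULL, producer, &value_to_send);

    pthread_mutex_lock(&mutex);
    while (!data_ready)               /* test_condition(); re-check after every wakeup */
        pthread_cond_wait(&condvar, &mutex);  /* unlocks, sleeps, relocks */
    result = shared_value;
    data_ready = false;               /* consume the event for the next round */
    pthread_mutex_unlock(&mutex);

    pthread_join(tid, NULL);          /* value_to_send outlives the producer */
    return result;
}
```

Note that the same thread always locks and unlocks the mutex here; only the condvar carries the signal between threads.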
From: Andraz T. <And...@gu...> - 2003-05-30 01:09:42
On Fri, 30.05.2003 at 02:32, Jeremy Fitzhardinge wrote:
> On Thu, 2003-05-29 at 17:01, Andraz Tori wrote:
> > Thanks, I didn't know that. De facto this is valid behaviour in
> > pthreads... Everything works as expected if Valgrind is not run...
> >
> > If this is true, then mutexes are useless for interthread synchronisation
> > of 'data ready' states?
> They're already useless for that; I'm guessing you really want to be
> using condition variables, which have the behaviour you're looking
> for. With any luck, the synchronization code is only in one or two
> places...

I am afraid it is not so easy... Cinelerra is a strange beast...

    kocka:/home/minmax/progs/hvirtual-1.1.6/cinelerra# grep Mutex *.[Ch]|wc -l
    104

Approximately half of these mutexes are used for synchronisation like this, and this does not include plugins... :)

The problem is that the main maintainer has, mildly said, a strange attitude towards any external work. In my experience he would never accept an external patch to fix this.

Heh... my other question is: would it be possible to modify Valgrind so it would not report memory leaks that occurred just once (at the same code line)? Again, the main maintainer isn't willing to add deletions on exit (even when provided with a proper patch), so I am looking for some other way to find just the leaks that happen repeatedly while using the program. There are hundreds of one-time leaks, so writing suppressions for all of them is out of the question...

> > > The bug is that Valgrind should flag this as an error.
> >
> > The whole of Cinelerra (quite a huge program) is based upon this kind of
> > behaviour. It uses mutexes in order to synchronize 'data delivery':
> That's exactly what a condvar is for.

Thanks! But aren't condvars slower than mutexes? (I would imagine so.)

> > So my question is... would it be possible for Valgrind to support this
> > buggy behaviour by some switch or something?
> If you come up with a patch we can look at it. But I think your time
> would be better spent fixing the code: what you're asking is
> equivalent to "but my program always uses memory after it has been
> freed, and it works fine without Valgrind".

I would gladly fix the code, but I am afraid it won't be accepted into the main branch of Cinelerra, which means the work would be useless. At least in my experience. So I just try to work on trivial bugs, because anything nontrivial isn't accepted.

> > The problem is that Cinelerra works just fine if it is run without
> > Valgrind...
> Oh yes. Have you tried it with NPTL?

No, I am already afraid. :)

But the bottom line: why does neither pthreads nor Valgrind return an error from the unlock call? Valgrind reports a warning, but the call still returns success (perror reports no error)... even though it is obvious there was a failure.

I did create an error-checking pthread mutex (as you suggested in your next message)... it is not reporting errors. The unlock is done with:

    printf("Unlocking mutex: %p\n", &mutex);
    if(pthread_mutex_unlock(&mutex)) {perror("Mutex::unlock");}

Here it is, the relevant part of the log file... mutex 0x41a9f2a4 is being unlocked, perror says "Success", and on the next use it blocks on lock...

    Unlocking mutex: 0x41a9f2a4
    --24562-- PTHREAD[25]: pthread_mutex_unlock mx 0x41A9F2A4 ...
    --24562-- PTHREAD[25]: pthread_mutex_lock mx 0x409BB1F0 ...
    --24562-- PTHREAD[25]: pthread_mutex_unlock mx 0x409BB1F0 ...
    --24562-- PTHREAD[25]: pthread_mutex_lock mx 0x409C28C8 ...
    --24562-- PTHREAD[25]: pthread_mutex_unlock mx 0x409C28C8 ...
    --24562-- PTHREAD[25]: pthread_key_validate key 0x3
    --24562-- PTHREAD[25]: pthread_getspecific_ptr
    --24562-- PTHREAD[25]: pthread_mutex_lock mx 0x409BB1F0 ...
    --24562-- PTHREAD[25]: pthread_mutex_unlock mx 0x409BB1F0 ...
    --24562-- PTHREAD[25]: pthread_mutex_lock mx 0x409BB1F0 ...
    --24562-- PTHREAD[25]: pthread_mutex_unlock mx 0x409BB1F0 ...
    --24562-- PTHREAD[25]: pthread_mutex_lock mx 0x409C28C8 ...
    --24562-- PTHREAD[25]: pthread_mutex_unlock mx 0x409C28C8 ...
    Mutex::unlock: Success
    Locking mutex: 0x41a9f2a4
    --24562-- PTHREAD[25]: pthread_mutex_lock mx 0x41A9F2A4 ...
    --24562-- PTHREAD[25]: pthread_mutex_lock mx 0x41A9F2A4: BLOCK

bye
andraz
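One detail that may explain the "Success" line: POSIX thread functions such as pthread_mutex_unlock() report failure by returning an error number and are not required to set errno, so perror() can print "Success" even when the call failed. If the quoted snippet is complete, perror() only runs when pthread_mutex_unlock() returns nonzero, so the "Mutex::unlock: Success" line in the log would itself suggest that the unlock did return an error. A sketch of checking the return value directly; checked_unlock() and unlock_unheld_errorcheck_mutex() are hypothetical names, and the demo unlocks an error-checking mutex that no thread holds, which POSIX requires to fail with EPERM.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK in glibc headers */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* pthread_* calls return an error number instead of setting errno,
   so pass that number to strerror() rather than calling perror(). */
int checked_unlock(pthread_mutex_t *m, const char *where)
{
    int rc = pthread_mutex_unlock(m);
    if (rc != 0)
        fprintf(stderr, "%s: %s\n", where, strerror(rc));
    return rc;
}

/* Demo: unlock an error-checking mutex that no thread holds. */
int unlock_unheld_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    /* On glibc this prints "Mutex::unlock: Operation not permitted". */
    rc = checked_unlock(&m, "Mutex::unlock");
    pthread_mutex_destroy(&m);
    return rc;
}
```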
From: Jeremy F. <je...@go...> - 2003-05-30 06:53:39
On Thu, 2003-05-29 at 18:09, Andraz Tori wrote:
> The problem is that the main maintainer has, mildly said, a strange
> attitude towards any external work. In my experience he would
> never accept an external patch to fix this.

Ah well, that's unfortunate.

> Thanks! But aren't condvars slower than mutexes? (I would imagine so.)

Don't know; I've never measured them. But that's unimportant if mutexes are the wrong mechanism to use.

> Here it is, the relevant part of the log file...
> mutex 0x41a9f2a4 is being unlocked, perror says "Success", and on the
> next use it blocks on lock...

OK, we should make Valgrind mutexes more picky.

J
From: Igmar P. <mai...@jd...> - 2003-05-31 10:36:11
On Fri, 29 May 2003, Jeremy Fitzhardinge wrote:
> They're already useless for that; I'm guessing you really want to be
> using condition variables, which have the behaviour you're looking for.
> With any luck, the synchronization code is only in one or two places...

I don't agree; they're used to protect global variables from having multiple threads messing with them:

    pthread_mutex_lock(&mutex);
    for (....;...;...) {
        /* Walk linked list */
        .....
    }
    pthread_mutex_unlock(&mutex);

This works fine, at least in my app with at least 5 threads running, not counting incoming connections. I use rwlocks whenever I can, which means you must be able to read a data structure without modifying it. Condition variables are the way to go when it comes to thread synchronisation; I use them to process an application shutdown.

I've seen weird things happening when Valgrind monitors threads, including SIGFPEs flying around :). I'll see if I can reproduce it again; I know what was happening when those things started.

> > > The bug is that Valgrind should flag this as an error.
> >
> > The whole of Cinelerra (quite a huge program) is based upon this kind of
> > behaviour. It uses mutexes in order to synchronize 'data delivery':
>
> That's exactly what a condvar is for.

Indeed. Mutexes can't be used for that.

> > So my question is... would it be possible for Valgrind to support this
> > buggy behaviour by some switch or something?
>
> If you come up with a patch we can look at it. But I think your time
> would be better spent fixing the code: what you're asking is equivalent
> to "but my program always uses memory after it has been freed, and it
> works fine without Valgrind".
>
> > The problem is that Cinelerra works just fine if it is run without
> > Valgrind...
>
> Oh yes. Have you tried it with NPTL?

That requires kernel patching and a complete glibc rebuild :)

Just to wonder, OT: can both LinuxThreads and NPTL be built into the same glibc?

>
> J

Igmar
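Igmar's linked-list case, rewritten with the read-write lock he mentions, as a minimal sketch: readers take pthread_rwlock_rdlock() and may walk the list concurrently, while a writer takes pthread_rwlock_wrlock() for exclusive access. The struct node type and the list_push()/list_sum() helpers are hypothetical, and nodes are never freed, to keep the example short.

```c
#define _XOPEN_SOURCE 700   /* expose pthread_rwlock_* in glibc headers */
#include <pthread.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static struct node *head = NULL;
static pthread_rwlock_t list_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writer: needs exclusive access because it modifies the list. */
int list_push(int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;              /* fill in the node before taking the lock */
    pthread_rwlock_wrlock(&list_lock);
    n->next = head;
    head = n;
    pthread_rwlock_unlock(&list_lock);
    return 0;
}

/* Reader: only walks the list, so many readers may hold the lock at once. */
int list_sum(void)
{
    int sum = 0;
    pthread_rwlock_rdlock(&list_lock);
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;
    pthread_rwlock_unlock(&list_lock);
    return sum;
}
```

As with mutexes, each rwlock is unlocked by the same thread that locked it; the lock protects data, it does not signal between threads.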
From: Jeremy F. <je...@go...> - 2003-06-02 02:30:43
On Sat, 2003-05-31 at 03:15, Igmar Palsenberg wrote:
> On Fri, 29 May 2003, Jeremy Fitzhardinge wrote:
>
> > They're already useless for that; I'm guessing you really want to be
> > using condition variables, which have the behaviour you're looking for.
> > With any luck, the synchronization code is only in one or two places...
>
> I don't agree; they're used to protect global variables from having
> multiple threads messing with them ...

I think we're violently agreeing on that.

> That requires kernel patching and a complete glibc rebuild :)
>
> Just to wonder, OT: can both LinuxThreads and NPTL be built into the same
> glibc?

RH9 seems to have things set up so you can run either version.

J
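On a system where either threads library can be selected, glibc reports which one is active through confstr() with _CS_GNU_LIBPTHREAD_VERSION (from the shell, getconf GNU_LIBPTHREAD_VERSION). A sketch assuming glibc; pthread_impl() is a hypothetical name, and on C libraries that lack the constant it falls back to "unknown". On an NPTL system the string starts with "NPTL"; a LinuxThreads build reports a linuxthreads version string instead.

```c
#define _GNU_SOURCE   /* make GNU-specific names visible in glibc headers */
#include <stdio.h>
#include <unistd.h>

/* Returns a description of the threads implementation, or "unknown".
   The result lives in a static buffer, so this is not reentrant. */
const char *pthread_impl(void)
{
    static char buf[128];
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns 0 when the value is unavailable */
    if (confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf) > 0)
        return buf;
#endif
    snprintf(buf, sizeof buf, "%s", "unknown");
    return buf;
}
```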