|
From: Christoph B. <bar...@or...> - 2009-02-05 15:26:51
|
Hi,
today I had to learn that the attached program is incorrect. It is not allowed
to destroy the barrier while not all threads have left the
pthread_barrier_wait() call.
Unfortunately neither DRD nor Helgrind warn about this error. Could you please
improve the tools to detect such errors?
Christoph
#include <pthread.h>
#include <stdlib.h>

pthread_barrier_t * barrier;

void * thread(void * arg) {
    pthread_barrier_wait(barrier);
    return NULL;
}

int main() {
    pthread_t tid;

    barrier = (pthread_barrier_t *) malloc(sizeof(*barrier));
    pthread_barrier_init(barrier, NULL, 2);

    pthread_create(&tid, NULL, thread, NULL);

    pthread_barrier_wait(barrier);
    pthread_barrier_destroy(barrier);
    free(barrier);

    pthread_join(tid, NULL);
    return 0;
}
|
|
From: Julian S. <js...@ac...> - 2009-02-05 15:34:03
|
On Thursday 05 February 2009, Christoph Bartoschek wrote:
> Hi,
>
> today I had to learn that the attached program is incorrect. It is not
> allowed to destroy the barrier while not all threads have left the
> pthread_barrier_wait() call.
>
> Unfortunately neither DRD nor Helgrind warn about this error. Could you
> please improve the tools to detect such errors?

Um, where is the bug in this program? To me it looks OK: the barrier
is not destroyed until after both parent and child have passed it.

J
|
|
From: Christoph B. <bar...@or...> - 2009-02-05 17:35:16
|
On Thursday, 5 February 2009, Julian Seward wrote:
> On Thursday 05 February 2009, Christoph Bartoschek wrote:
> > Hi,
> >
> > today I had to learn that the attached program is incorrect. It is not
> > allowed to destroy the barrier while not all threads have left the
> > pthread_barrier_wait() call.
> >
> > Unfortunately neither DRD nor Helgrind warn about this error. Could you
> > please improve the tools to detect such errors?
>
> Um, where is the bug in this program? To me it looks OK: the barrier
> is not destroyed until after both parent and child have passed it.

This also looked OK to me until now, but the standard does not seem to
guarantee that it works.

I have asked about this in comp.programming.threads:

http://groups.google.com/group/comp.programming.threads/browse_thread/thread/4f65535d6192aa50/a5f4bf1e3b437c4d?lnk=st&q=#a5f4bf1e3b437c4d

My explanation is that the threads still need access to the barrier after
being woken up from the wait.

When the last thread reaches the barrier, all waiting threads are woken up,
but they have not yet finished pthread_barrier_wait(). If the first thread
to leave pthread_barrier_wait() destroys the barrier, the other threads
cannot perform their final tasks in pthread_barrier_wait().

Christoph
|
|
From: tom f. <tf...@al...> - 2009-02-05 17:47:24
|
Christoph Bartoschek <bar...@or...> writes:
> On Thursday, 5 February 2009, Julian Seward wrote:
> > On Thursday 05 February 2009, Christoph Bartoschek wrote:
> > > Hi,
> > >
> > > today I had to learn that the attached program is incorrect. It is not
> > > allowed to destroy the barrier while not all threads have left the
> > > pthread_barrier_wait() call.
> > >
> > > Unfortunately neither DRD nor Helgrind warn about this error. Could you
> > > please improve the tools to detect such errors?
> >
> > Um, where is the bug in this program? To me it looks OK: the barrier
> > is not destroyed until after both parent and child have passed it.
>
> This also looked OK to me until now, but the standard does not seem to
> guarantee that it works.
>
> I have asked about this in comp.programming.threads:
>
> http://groups.google.com/group/comp.programming.threads/browse_thread/thread/4f65535d6192aa50/a5f4bf1e3b437c4d?lnk=st&q=#a5f4bf1e3b437c4d
>
> My explanation is that the threads still need access to the barrier after
> being woken up from the wait.
>
> When the last thread reaches the barrier, all waiting threads are woken up,
> but they have not yet finished pthread_barrier_wait(). If the first thread
> to leave pthread_barrier_wait() destroys the barrier, the other threads
> cannot perform their final tasks in pthread_barrier_wait().

I'm getting a bit off topic, but ..

Perhaps I'm just not understanding the linked-to discussion, but given
this interpretation -- how could one ever delete a barrier?

It sounds like the only safe way to destroy the barrier is if you've
joined every thread which could have possibly used it. Given that
constraint, I'm not sure how real world software could reasonably deal
with this.

So is the idea essentially that we might as well forget about
destroying barriers? What am I missing?

-tom
|
|
From: Christoph B. <bar...@or...> - 2009-02-05 18:35:38
|
On Thursday, 5 February 2009, tom fogal wrote:
> I'm getting a bit off topic, but ..
>
> Perhaps I'm just not understanding the linked-to discussion, but given
> this interpretation -- how could one ever delete a barrier?
>
> It sounds like the only safe way to destroy the barrier is if you've
> joined every thread which could have possibly used it. Given that
> constraint, I'm not sure how real world software could reasonably deal
> with this.
>
> So is the idea essentially that we might as well forget about
> destroying barriers? What am I missing?

1. There is the proposal to fix the standard by allowing the thread that
   gets the return value of PTHREAD_BARRIER_SERIAL_THREAD to destroy the
   barrier.

2. You can delete the barrier as soon as you know that all threads left the
   call to pthread_barrier_wait(). This can be done by other synchronisation
   primitives like another barrier, a lock, a condvar or a join.

Christoph
|
|
From: Julian S. <js...@ac...> - 2009-02-06 19:19:37
|
Christoph,

I understand, by reading the thread that Tom Fogal refers to ..

> http://groups.google.com/group/comp.programming.threads/browse_thread/thread/4f65535d6192aa50/a5f4bf1e3b437c4d?lnk=st&q=#a5f4bf1e3b437c4d

.. that the POSIX pthreads standard is in a way broken: you cannot know
when you are the last thread to leave the barrier, and so there is no
safe way to destroy the barrier without using yet another synchronisation
operation to somehow guarantee that all the threads really have left
the barrier.

That is a correct understanding, yes?

Now it is indeed the case that both Helgrind and DRD do report
destruction of a barrier which has waiting threads. You can easily
verify this using the regression test case helgrind/tests/bad_bar.c.

However, in your example, all threads are considered by Helgrind and
DRD to have left the barrier before you destroy it. Hence no error
is reported.

If you can suggest a criterion that distinguishes the case you
consider an error from a "safe" destruction of a barrier, that would be
very helpful. But given that the POSIX spec is basically broken, I don't
see how it would be possible to construct such a criterion.

J

On Thursday 05 February 2009, Christoph Bartoschek wrote:
> On Thursday, 5 February 2009, tom fogal wrote:
> > I'm getting a bit off topic, but ..
> >
> > Perhaps I'm just not understanding the linked-to discussion, but given
> > this interpretation -- how could one ever delete a barrier?
> >
> > It sounds like the only safe way to destroy the barrier is if you've
> > joined every thread which could have possibly used it. Given that
> > constraint, I'm not sure how real world software could reasonably deal
> > with this.
> >
> > So is the idea essentially that we might as well forget about
> > destroying barriers? What am I missing?
>
> 1. There is the proposal to fix the standard by allowing the thread that
>    gets the return value of PTHREAD_BARRIER_SERIAL_THREAD to destroy the
>    barrier.
>
> 2. You can delete the barrier as soon as you know that all threads left the
>    call to pthread_barrier_wait(). This can be done by other synchronisation
>    primitives like another barrier, a lock, a condvar or a join.
>
> Christoph
|
|
From: Bart V. A. <bar...@gm...> - 2009-02-07 14:51:49
|
On Fri, Feb 6, 2009 at 8:19 PM, Julian Seward <js...@ac...> wrote:
> If you can suggest a criterion that distinguishes the case you
> consider an error from a "safe" destruction of a barrier, that would be
> very helpful. But given that the POSIX spec is basically broken, I don't
> see how it would be possible to construct such a criterion.

How about comparing the vector clocks of the most recent
barrier_wait() calls with the vector clock of the thread destroying
the barrier? This should make it possible to find out whether the
barrier_wait() calls and the destruction of the barrier were ordered
via a synchronization operation, where the destruction is either an
explicit barrier_destroy() call or a free() call that implicitly
destroys the barrier.

Bart.
|
|
From: Bart V. A. <bar...@gm...> - 2009-02-21 16:35:02
|
On Thu, Feb 5, 2009 at 4:26 PM, Christoph Bartoschek
<bar...@or...> wrote:
> today I had to learn that the attached program is incorrect. It is not allowed
> to destroy the barrier while not all threads have left the
> pthread_barrier_wait() call.
>
> Unfortunately neither DRD nor Helgrind warn about this error. Could you please
> improve the tools to detect such errors?
>
> Christoph
>
> #include <pthread.h>
> #include <stdlib.h>
>
> pthread_barrier_t * barrier;
>
> void * thread(void * arg) {
> pthread_barrier_wait(barrier);
> return NULL;
> }
>
> int main() {
> pthread_t tid;
>
> barrier = (pthread_barrier_t *) malloc(sizeof(*barrier));
> pthread_barrier_init(barrier, NULL, 2);
>
> pthread_create(&tid, NULL, thread, NULL);
>
> pthread_barrier_wait(barrier);
> pthread_barrier_destroy(barrier);
> free(barrier);
>
> pthread_join(tid, NULL);
> return 0;
> }
The latest trunk revision of DRD (r9214 or later) should now always
print an error message for the above example: depending on how threads
are scheduled, DRD will either complain that a barrier is being
destroyed that is still in use or that synchronization between
pthread_barrier_wait() and pthread_barrier_destroy() was missing. The
last case can be forced by inserting sleep(1) just before the call to
pthread_barrier_destroy().
Thanks for reporting this.
Bart.
|