|
From: Nicholas N. <nj...@cs...> - 2005-03-13 18:01:41
Attachments:
pth_exit.c
x
|
Hi,

In the subversion tree, two regtests are hanging for me, with my
Debian 3.0, 2.4.29 kernel system.

1. pth_exit hangs near the end. I augmented it with debugging printf
statements (see attachment), and I get the following output.

==11609== Nulgrind, a binary JIT-compiler.
==11609== Copyright (C) 2002-2005, and GNU GPL'd, by Nicholas Nethercote.
==11609== Using LibVEX rev 1020, a library for dynamic binary translation.
==11609== Copyright (C) 2004, and GNU GPL'd, by OpenWorks LLP.
==11609== Using valgrind-3.0.0.CVS, a dynamic binary instrumentation framework.
==11609== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==11609== For more details, rerun with: -v
==11609==
pre-1
pre-sleep
pre-2
pre-sleep
pre-3
pre-sleep
pre-4
pre-sleep
pre-exit
post-sleep
post-sleep
post-sleep
post-sleep

It then hangs here. If I hit Ctrl-C it then prints the message from
TL_(fini) and finishes normally.

The --trace-syscalls=yes log (attached; note that it's for the program
without the debugging printf statements) indicates that it's getting stuck
in a poll() call -- if I let the program run, the repeating lines at the
end of the log get added to about once every 2 seconds.

Any ideas what the problem is, or how I could debug this further?

---

Also, I find that none/tests/manythreads takes about 150 seconds on my
machine, which is more than half of the time for the whole (reduced SVN
tree) suite.. Could we reduce the number of threads by 90% to 1000?

N
|
|
From: Julian S. <js...@ac...> - 2005-03-13 18:22:17
|
> 1. pth_exit hangs near the end. [...]
>
> The --trace-syscalls=yes log (attached; note that it's for the program
> without the debugging printf statements) indicates that it's getting stuck
> in a poll() call -- if I let the program run, the repeating lines at the
> end of the log get added to about once every 2 seconds.
>
> Any ideas what the problem is, or how I could debug this further?
Interesting. OOo on SuSE 9.1 also hangs at exit but will then
exit after Control-C; there are several threads that have called
exit_group but two which are stuck in 'poll'. I'll make enquiries.
This looks just slightly simpler to debug than OOo :-)
> ---
>
> Also, I find that none/tests/manythreads takes about 150 seconds on my
> machine, which is more than half of the time for the whole (reduced SVN
> tree) suite.. Could we reduce the number of threads by 90% to 1000?
Hmm. I had dealings with that one yesterday and I thought rev 3311 fixed
it. Even on my weedy 1 GHz VIA machine, it now takes 4.1 seconds. This
10000ness of the test is valuable, because it guarantees that if the test
completes then the address space manager is not leaking thread stacks
since each thread stack is 2M.
In vg_memory.c find this line:
if (0) show_segments("unmap_range(BEFORE)");
and change it to if (1). Re-run. You'll get a tremendous amount of
crap, but if the address space manager is working OK it should settle
down to showing about 40-50 segments, basically staying constant
after about the first 10MB or so of logfile spewage. If it's still
broken the number of segments might be rising endlessly.
J
|
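The argument above for keeping 10000 threads comes down to arithmetic on
address space. The sketch below is only an illustration and is not taken
from the manythreads test or from Valgrind's sources. It assumes that "2M"
means 2 MiB and that the process has a 4 GiB (32-bit) address space; the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address space consumed if the stack of every thread were leaked.
   Hypothetical helper, not part of the test suite. */
static uint64_t leaked_stack_bytes(uint64_t threads, uint64_t stack_bytes)
{
    return threads * stack_bytes;
}

/* How many leaked stacks fit before the address space runs out.
   Hypothetical helper; ignores every other mapping in the process. */
static uint64_t max_leaked_stacks(uint64_t address_space_bytes,
                                  uint64_t stack_bytes)
{
    assert(stack_bytes > 0);
    return address_space_bytes / stack_bytes;
}

/* Assumed figures: 2 MiB per thread stack (the "2M" quoted above) and a
   4 GiB address space (an assumption about a 32-bit machine). */
static const uint64_t STACK_BYTES = 2ULL * 1024 * 1024;
static const uint64_t SPACE_BYTES = 4096ULL * 1024 * 1024;
```

Under those assumptions, max_leaked_stacks(SPACE_BYTES, STACK_BYTES) is
2048, so a run that leaks every stack cannot get anywhere near 10000
threads, whereas leaked_stack_bytes(1000, STACK_BYTES) is still below
SPACE_BYTES. In practice less space is available than the assumed 4 GiB,
so the real limit would be lower than 2048.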
|
From: Nicholas N. <nj...@cs...> - 2005-03-13 18:46:53
|
On Sun, 13 Mar 2005, Julian Seward wrote:
>> Also, I find that none/tests/manythreads takes about 150 seconds on my
>> machine, which is more than half of the time for the whole (reduced SVN
>> tree) suite.. Could we reduce the number of threads by 90% to 1000?
>
> Hmm. I had dealings with that one yesterday and I thought rev 3311 fixed
> it. Even on my weedy 1 GHz VIA machine, it now takes 4.1 seconds. This
> 10000ness of the test is valuable, because it guarantees that if the test
> completes then the address space manager is not leaking thread stacks
> since each thread stack is 2M.
>
> In vg_memory.c find this line:
>
> if (0) show_segments("unmap_range(BEFORE)");
>
> and change it to if (1). Re-run. You'll get a tremendous amount of
> crap, but if the address space manager is working OK it should settle
> down to showing about 40-50 segments, basically staying constant
> after about the first 10MB or so of logfile spewage. If it's still
> broken the number of segments might be rising endlessly.
The number is increasing. Check out www.cs.utexas.edu/~njn/trace.bz2,
which got up to 20MB and 973 segments before I killed it.
N
|
|
From: Jeremy F. <je...@go...> - 2005-03-14 00:42:12
|
Nicholas Nethercote wrote:
> Also, I find that none/tests/manythreads takes about 150 seconds on my
> machine, which is more than half of the time for the whole (reduced
> SVN tree) suite.. Could we reduce the number of threads by 90% to 1000?
Something is very wrong then; it takes about 2 seconds for me. How long
does it take on that machine with 2.4.0?
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-14 00:47:40
|
On Sun, 13 Mar 2005, Jeremy Fitzhardinge wrote:
>> Also, I find that none/tests/manythreads takes about 150 seconds on my
>> machine, which is more than half of the time for the whole (reduced SVN
>> tree) suite.. Could we reduce the number of threads by 90% to 1000?
>
> Something is very wrong then; it takes about 2 seconds for me. How long does
> it take on that machine with 2.4.0?
4 seconds. Julian's been looking at the problem in SVN.
N
|
|
From: Julian S. <js...@ac...> - 2005-03-14 00:54:36
|
Uh, this was caused by me messing with the low-level address space
manager. I'll chase it. It's leaking thread stacks.
It would be very helpful if you could look at one of the various
threaded-program-hangs-at-exit cases. I fixed a bunch of tls stuff
earlier, so you should have some chance of running threaded tests
now.
J
> > Also, I find that none/tests/manythreads takes about 150 seconds on my
> > machine, which is more than half of the time for the whole (reduced
> > SVN tree) suite.. Could we reduce the number of threads by 90% to 1000?
>
> Something is very wrong then; it takes about 2 seconds for me. How long
> does it take on that machine with 2.4.0?
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-14 01:08:23
|
On Mon, 14 Mar 2005, Julian Seward wrote:
> It would be very helpful if you could look at one of the various
> threaded-program-hangs-at-exit cases. I fixed a bunch of tls stuff
> earlier, so you should have some chance of running threaded tests
> now.
pth_exit still hangs for me.
N
|