From: Konstantin S. <kon...@gm...> - 2010-02-07 18:53:20
On Fri, Feb 5, 2010 at 8:00 PM, Julian Seward <js...@ac...> wrote:
>
> The log is quite useful. It might be that there is a race
> between the handling for sys_clone and for sys_exit_group. I'm not
> sure I understand the details though.
>
> sys_exit_group happens when the main thread exits. It marks
> all other threads in the same thread group as "to be forced
> to exit". If any of these threads are blocked in syscalls
> then they are hit on the head with sigvgkill to get them out
> of the syscall. Or something like that. (see function
> PRE_(sys_exit_group)).
>
> So, I suspect the problem is, there is a child thread
> that has just been created by clone
> (by a call to do_syscall_clone_amd64_linux)
> but which is not yet marked
> as being in the same thread group as its parent
> (which happens a few hundred instructions after the child's
> startup, in thread_wrapper (called by run_a_thread_NORETURN called
> by ML_(start_thread_NORETURN), which is the start point
> for the child on the host cpu).
>
> Then the parent exits, but the child is not marked as also-to-exit
> because it is not marked as in the same thread group as
> its parent. So it stays alive. This is I think what happened
> to tid=281 in the logfile you sent.
>
> It would be best to mark the child's thread group before
> creating it. But I don't understand the meaning of thread groups,
> and how these relate to what VG_(gettid) and VG_(getpid) return.
>
> I could chase this if you can refine the test case into something
> that reliably hangs every time -- the current 5% failure rate is going to
> make it impossible to investigate.
>
> One thing you could do is to insert a spin-wait loop in
>
Indeed, the patch below makes the bug manifest itself every time.
The process either hangs (top shows it as a zombie) or continues to print
stuff forever.
--kcc
--- coregrind/m_syswrap/syswrap-linux.c (revision 11037)
+++ coregrind/m_syswrap/syswrap-linux.c (working copy)
@@ -214,11 +214,20 @@
vg_assert(0);
}
+static void spin_loop(int c, int tid) {
+ static volatile int z;
+ VG_(printf)("spinning: %d\n", tid);
+ while(c--) {
+ z++;
+ }
+ VG_(printf)("done: %d\n", tid);
+}
+
Word ML_(start_thread_NORETURN) ( void* arg )
{
ThreadState* tst = (ThreadState*)arg;
ThreadId tid = tst->tid;
-
+ spin_loop(1 << 25, tid);
run_a_thread_NORETURN ( (Word)tid );
/*NOTREACHED*/
vg_assert(0);
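
The window that the spin loop widens can be illustrated with a toy model of the suspected race. This is only a sketch of the reasoning in the quoted analysis, not Valgrind code: `ToyThread`, `toy_exit_group`, `toy_survivors` and the group constants are hypothetical names, and the two interleavings are played out sequentially rather than with real threads.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not Valgrind's. */
enum { TOY_NO_GROUP = 0, TOY_GROUP = 100 };

typedef struct {
    int tid;
    int threadgroup;   /* TOY_NO_GROUP until the thread records its group */
    int exit_pending;  /* nonzero once the thread has been told to exit */
} ToyThread;

/* Models the exit_group step: flag every thread already recorded as a
   member of `group`. Returns the number of threads flagged. */
static int toy_exit_group(ToyThread *threads, size_t n, int group)
{
    int flagged = 0;
    assert(threads != NULL);
    for (size_t i = 0; i < n; i++) {
        if (threads[i].threadgroup == group) {
            threads[i].exit_pending = 1;
            flagged++;
        }
    }
    return flagged;
}

/* Plays one interleaving of parent and child and returns how many
   threads were never told to exit. */
static int toy_survivors(int child_marks_first)
{
    ToyThread threads[2] = {
        { 1, TOY_GROUP,    0 },  /* parent, already in its group */
        { 2, TOY_NO_GROUP, 0 },  /* freshly cloned child, not yet marked */
    };
    int survivors = 0;

    if (child_marks_first)
        threads[1].threadgroup = TOY_GROUP;  /* child wins the race */

    toy_exit_group(threads, 2, TOY_GROUP);   /* parent exits */

    if (!child_marks_first)
        threads[1].threadgroup = TOY_GROUP;  /* too late: already skipped */

    for (size_t i = 0; i < 2; i++)
        if (!threads[i].exit_pending)
            survivors++;
    return survivors;
}
```

In this model `toy_survivors(1)` leaves no thread behind, while `toy_survivors(0)` leaves the child unflagged, which matches the suspicion that a child delayed before setting its thread group is missed by the parent's exit.
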
> ML_(start_thread_NORETURN) [make sure gcc doesn't just optimise it
> away] to delay the point where the child sets up its .threadgroup
> field. This might make the hang happen more often. Can you try that?
>
> J
>
> On Tuesday 02 February 2010, Konstantin Serebryany wrote:
> > Hi Julian,
> >
> > Any luck with this hang?
> > Anything I can help with?
> >
> > --kcc
> >
> > On Thu, Jan 28, 2010 at 10:37 AM, Konstantin Serebryany <kon...@gm...> wrote:
> > > Sent a log off list
> > > With logging on it does not really want to hang.
> > > Instead (with ~5% probability) it loops forever.
> > > I think this is the same bug -- the process misses its own death time...
> > >
> > > --kcc
> > >
> > > On Thu, Jan 28, 2010 at 10:40 AM, Julian Seward <js...@ac...> wrote:
> > >> On Wednesday 27 January 2010, Julian Seward wrote:
> > >> > On Wednesday 27 January 2010, Konstantin Serebryany wrote:
> > >> > > I've minimized the problem to a small test (below).
> > >> > > It spawns many threads and doesn't join them before exiting.
> > >> > > It will hang (or loop forever) one out of 40-100 runs:
> > >> > > % g++ -g -lpthread hang.cc
> > >> > > % for((i=10;i<=99;i++)); do date; time ~/valgrind/trunk/inst/bin/valgrind --tool=none --trace-syscalls=yes --trace-signals=yes -q ./a.out 2> $i.log ; done
> > >> >
> > >> > Ok; managed to reproduce it. 2 threads were still stuck in some
> > >> > syscall (don't know which yet). Investigating.
> > >>
> > >> I can reproduce it, but only in the case where there is no logging,
> > >> which isn't useful. If you have a logfile where it hangs for
> > >> --trace-syscalls=yes --trace-signals=yes, can you compress it and
> > >> send it to me? afaics the log is about 40MB long, but it should
> > >> bzip2 nicely.
> > >>
> > >> J
>
>
>
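
The test file referred to in the quoted thread is not reproduced in this message. A minimal sketch of the kind of program described there (spawn many threads, never join them, then exit) might look like the following. It is an assumption about the shape of such a test, not the original hang.cc; `idle_worker` and `spawn_unjoined_threads` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker: blocks in a syscall so that it is likely still
   running when the main thread returns. */
static void *idle_worker(void *arg)
{
    (void)arg;
    sleep(1);
    return NULL;
}

/* Starts up to `count` threads and deliberately never joins them.
   Returns the number of threads actually created. */
static int spawn_unjoined_threads(int count)
{
    int created = 0;
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, idle_worker, NULL) != 0)
            break;  /* stop at the first failure, e.g. resource limits */
        created++;
    }
    return created;
}
```

A `main` that calls `spawn_unjoined_threads(100)` and then returns would end the process while the workers are still alive, which is the situation discussed above. Whether this particular sketch triggers the hang under Valgrind has not been checked.
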