|
From: Eyal L. <ey...@ey...> - 2007-11-11 22:54:53
|
I am trying to use helgrind and so far I have hit a showstopper.

Simplified, my software is a number of servers and clients. The clients execute a heartbeat connection to one server (1/sec). Bringing up the servers and clients all looks fine, but very soon (90s after it actually starts running) one server exits. Another exits a few minutes later, then the last one survives for much longer.

The vg logs always end with this report:

    Helgrind: Fatal internal error -- cannot continue.
    Helgrind: mk_SHVAL_ShR(tset=8192,lset=1): FAILED
    Helgrind: max allowed tset=8191, lset=131071
    Helgrind: program has too many thread sets or lock sets to track.
    ...

My servers were effectively idle at this point; the real work had not even started. Only a dozen threads had been "created" by any server by the time it failed.

The program uses pthreads, nothing fancy. The testing is done on Linux Debian 3.1 on x86, libc-2.3.2. Latest vg off svn.

--
Eyal Lebedinsky (ey...@ey...) |
|
From: Julian S. <js...@ac...> - 2007-11-11 23:02:37
|
> Helgrind: Fatal internal error -- cannot continue.
> Helgrind: mk_SHVAL_ShR(tset=8192,lset=1): FAILED
> Helgrind: max allowed tset=8191, lset=131071
> Helgrind: program has too many thread sets or lock sets to track.

Yes. It's a problem I'm tracking. Really it is trying to squeeze too much information into a 32-bit word. It's probably fixable properly at some performance loss.

In the meantime, at hg_main.c:902 reduce N_LSID_BITS from 17 to 16 and try again.

J |
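The limits in the error message (8191 = 2^13 - 1 thread sets, 131071 = 2^17 - 1 lock sets) suggest that the two IDs share 30 bits of the 32-bit word. The following is a minimal sketch of that kind of packing, not the code in hg_main.c: only N_LSID_BITS comes from the thread, while the 2-bit tag, the field order, and the names `pack_shval`, `shval_tset` and `shval_lset` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* N_LSID_BITS is the constant named in the thread. Everything else here
   (tag width, field order, function names) is an illustrative assumption. */
#define N_LSID_BITS 17
#define N_TAG_BITS  2                                /* assumed */
#define N_TSID_BITS (32 - N_TAG_BITS - N_LSID_BITS)  /* 13 when N_LSID_BITS is 17 */

#define MAX_LSID ((1u << N_LSID_BITS) - 1u)          /* 131071 */
#define MAX_TSID ((1u << N_TSID_BITS) - 1u)          /* 8191 */

/* Pack a tag, a thread-set ID and a lock-set ID into one 32-bit word.
   Returns false, leaving *out untouched, if either ID does not fit. */
static bool pack_shval(uint32_t tag, uint32_t tset, uint32_t lset,
                       uint32_t *out)
{
    assert(tag < (1u << N_TAG_BITS));
    if (tset > MAX_TSID || lset > MAX_LSID)
        return false;
    *out = (tag << (32 - N_TAG_BITS)) | (tset << N_LSID_BITS) | lset;
    return true;
}

/* Extract the thread-set ID from a packed word. */
static uint32_t shval_tset(uint32_t w)
{
    return (w >> N_LSID_BITS) & MAX_TSID;
}

/* Extract the lock-set ID from a packed word. */
static uint32_t shval_lset(uint32_t w)
{
    return w & MAX_LSID;
}
```

Under these assumptions, `pack_shval(tag, 8192, 1, &w)` fails in the same way as the reported `mk_SHVAL_ShR(tset=8192,lset=1)`, and lowering N_LSID_BITS hands the freed bit to the thread-set field.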
|
From: Eyal L. <ey...@ey...> - 2007-11-12 11:50:45
|
Sorry to say, but this did not actually fix the problem. My first test ran past the point where it failed before, but it did fail later, so I can now see that the problem is not resolved. The message is slightly different now, though (tset/lset):

    Helgrind: Fatal internal error -- cannot continue.
    Helgrind: mk_SHVAL_ShR(tset=16384,lset=1): FAILED
    Helgrind: max allowed tset=16383, lset=65535
    Helgrind: program has too many thread sets or lock sets to track.

Julian Seward wrote:
>> Helgrind: Fatal internal error -- cannot continue.
>> Helgrind: mk_SHVAL_ShR(tset=8192,lset=1): FAILED
>> Helgrind: max allowed tset=8191, lset=131071
>> Helgrind: program has too many thread sets or lock sets to track.
>
> Yes. It's a problem I'm tracking. Really it is trying to squeeze
> too much information into a 32-bit word. It's probably fixable properly
> at some performance loss.
>
> In the meantime, at hg_main.c:902 reduce N_LSID_BITS from 17 to 16 and
> try again.
>
> J

--
Eyal Lebedinsky (ey...@ey...) |
|
From: Julian S. <js...@ac...> - 2007-11-12 12:15:13
|
> > In the meantime, at hg_main.c:902 reduce N_LSID_BITS from 17 to 16 and
> > try again.

Change to 15 and try again.

J |
|
From: Eyal L. <ey...@ey...> - 2007-11-12 10:09:57
|
Julian,

Thanks, this fixes it.

Can I ask a question then? I notice that I am getting race reports for situations where I am rather sure that the only active thread is the main thread. My programs do not apply locks in these sections of the code, for example during server shutdown after all threads were join'ed.

Is this possible or should I check my program again?

cheers
Eyal

Julian Seward wrote:
>> Helgrind: Fatal internal error -- cannot continue.
>> Helgrind: mk_SHVAL_ShR(tset=8192,lset=1): FAILED
>> Helgrind: max allowed tset=8191, lset=131071
>> Helgrind: program has too many thread sets or lock sets to track.
>
> Yes. It's a problem I'm tracking. Really it is trying to squeeze
> too much information into a 32-bit word. It's probably fixable properly
> at some performance loss.
>
> In the meantime, at hg_main.c:902 reduce N_LSID_BITS from 17 to 16 and
> try again.
>
> J

--
Eyal Lebedinsky (ey...@ey...) |
|
From: Julian S. <js...@ac...> - 2007-11-12 10:26:24
|
> I notice that I am getting race reports for situations where I
> am rather sure that the only active thread is the main thread.

It depends what you mean by the "only active thread". Are the other threads still alive but blocked somehow? Or (as below) have they really all exited?

> My programs do not apply locks in these sections of the code,
> for example during server shutdown after all threads were
> join'ed.

If all threads merge into one using pthread_join, then the remaining thread is allowed to access the shared data without a lock. The exact rule (which is somewhat more general than the sentence above implies) is described in detail in the docs, which are unfortunately hard to build. Try this:

    $ (cd docs && make html-docs)
    $ konqueror docs/html/hg-manual.html

Click on "6.4.4. Restoration of Exclusive Ownership".

J |
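The shutdown pattern discussed above can be sketched as follows. This is an illustration of the general idea only, not code from Eyal's servers: `shared_counter`, `counter_lock`, `worker` and `run_and_shutdown` are hypothetical names. Workers update shared data under a mutex while they run; once every worker has been joined, the main thread reads the data without taking the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical shared state, protected by counter_lock while workers run. */
static int shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the counter once, holding the lock. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&counter_lock);
    shared_counter++;
    pthread_mutex_unlock(&counter_lock);
    return NULL;
}

/* Start n workers, join them all, then read the counter without the lock. */
static int run_and_shutdown(int n)
{
    pthread_t tid[MAX_WORKERS];
    int rc;

    assert(n >= 0 && n <= MAX_WORKERS);
    shared_counter = 0;               /* no other thread exists yet */

    for (int i = 0; i < n; i++) {
        rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < n; i++) {
        rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
    }
    (void)rc;

    /* Every worker has been joined, so only this thread can still
       reach shared_counter; the read below takes no lock. */
    return shared_counter;
}
```

If a race is still reported at a point like the final read, a reasonable first check is whether some thread was detached or left blocked rather than actually joined, which is the distinction Julian asks about.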
|
From: Eyal L. <ey...@ey...> - 2007-11-12 12:41:39
|
15 really did it (completed one servers test now).

And to think that some call this computer "science"...

Thanks

Julian Seward wrote:
>>> In the meantime, at hg_main.c:902 reduce N_LSID_BITS from 17 to 16 and
>>> try again.
>
> change to 15 and try again.
>
> J

--
Eyal Lebedinsky (ey...@ey...) |
|
From: Julian S. <js...@ac...> - 2007-11-12 14:33:19
|
> And to think that some call this computer "science"...

No, this is software "engineering" :-)

J |
|
From: Konstantin S. <kon...@gm...> - 2007-11-14 09:52:42
|
In my case I had to go down to 14. :(
This is rather risky, since we will then be running out of lock sets instead.

On Nov 12, 2007 3:41 PM, Eyal Lebedinsky <ey...@ey...> wrote:
> 15 really did it (completed one servers test now).
> |
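The trade-off behind this concern can be worked out from the two error messages earlier in the thread, which imply that the thread-set and lock-set IDs share 30 bits. The sketch below assumes exactly that; the 30-bit total and the helper names `max_tset` and `max_lset` are illustrative, not taken from hg_main.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption inferred from the reported limits (13 + 17 and 14 + 16 bits):
   thread-set and lock-set IDs together occupy 30 bits. */
#define N_ID_BITS_TOTAL 30u

/* Largest lock-set ID that fits for a given N_LSID_BITS setting. */
static uint32_t max_lset(unsigned n_lsid_bits)
{
    assert(n_lsid_bits > 0 && n_lsid_bits < N_ID_BITS_TOTAL);
    return (1u << n_lsid_bits) - 1u;
}

/* Largest thread-set ID that fits once the lock-set bits are taken. */
static uint32_t max_tset(unsigned n_lsid_bits)
{
    assert(n_lsid_bits > 0 && n_lsid_bits < N_ID_BITS_TOTAL);
    return (1u << (N_ID_BITS_TOTAL - n_lsid_bits)) - 1u;
}
```

Under this assumption, N_LSID_BITS = 17 gives limits of 8191 thread sets and 131071 lock sets, 16 gives 16383 and 65535, 15 gives 32767 and 32767, and 14 gives 65535 and 16383. Every bit moved to the thread-set field halves the room left for lock sets.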
|
From: Julian S. <js...@ac...> - 2007-11-15 03:54:49
|
Eyal, Konstantin,

On Wednesday 14 November 2007 10:52, Konstantin Serebryany wrote:
> In my case I had to go down to 14. :(
> This is rather risky since we will be running out of lock sets then.
>
> On Nov 12, 2007 3:41 PM, Eyal Lebedinsky <ey...@ey...> wrote:
> > 15 really did it (completed one servers test now).

I made an experimental version of Helgrind which fixes this. With it, up to 2^30 thread sets and 2^30 lock sets are allowed, so for all practical purposes you should never run out of either.

However, the cost is a slowdown in the region of 0% to 20% and approximately a 20% increase in memory consumption.

J |