|
From: Darryl M. <dar...@ne...> - 2008-11-02 04:59:05
|
tom fogal wrote: > One thing I neglected to think about earlier is how one would do a > `Can I still run?' check. There is no notification to a user process > when a context switch comes back. > > Remember that valgrind instruments instructions. You means a context switch occurred due to timeslice exhaustion whilst execution was in user-space. So the scenario you describe is referring to resumption of user-space execution. Why does it need to know ? I can not see the problem, my strategy works just fine in the situation you describe. Maybe you are confusing resumption of execution in user-space due to timeslice exhaustion with resumption of execution in user-space due to a syscall return. The syscall return case I do have things to talk about (as previously written). To add; as it is under emulation then it is true to say the user-space part of the thread spends most of its time executing valgrind code as opposed to simulated application/client code. Since it is executing valgrind code then VG has control. Thinking through your example above there would be an issue if valgrind allows a process to run an infinite loop of pure application/client code, i.e. a CPU bound application that never makes a syscall/libc/valgrind/etc function call. If I understand correctly if everything is instrumented then this never happens there is always a point where valgrind has control. If not then simply inject a checkpoint/safe-point during instrumentation. My only concern so far has been syscalls. When any thread makes a syscall you never know how long its going to take or if it will block, so you treat the situation as-if they always block, even when you know they don't. > I had a long diatribe about how you essentially want the JVM's safe > points and how ridiculously difficult it will be, as you'd have to > write a parallel scheduler. Which I still think is true, in your > general case of the choosing which thread to run. It depends what you mean by a scheduler. Anything that manipulates what gets to run could be called a scheduler. The points I've put forward propose to create artificial waits, artificial temporary suspension, artificial low priority. Artificial meaning from the application/client code perspective the observable result is as-if. I certainly wouldn't call whats proposed a parallel scheduler, it can't make something thats not runnable (due to being blocked in the kernel) runnable! The kernel can. But it can put something that is running to sleep, and thats a big enough wrench/spanner to achieve the goal. I also think you misinterpted something as I wasn't providing any emphasis on difficulty levels; you're definitely adding that meaning yourself. Your challenges have been great to hear and a good sounding board but there is nothing added that wasn't already thought about or seems a show stopper. > You really don't need that functionality though. All you want is to > guarantee that <= 1 thread is running, at all times. > > A solution to your issue -- introduce a new mutex, within valgrind. > Wrap every function you possibly can. Acquire the lock before the > function, and release it after the function. You're idea of adding a mutex lock around everything, that may work. I wouldn't be wrapping anything I would be expecting valgrind to instrument the application/client in that way on my behalf. It also won't be on a function call basis or a line of code basis as your example shows. It would be one a unit (like your notes lean towards) of work basis that is only allowed to have a single load, store or load&store with memory within it or a single function call/jump. So yes a single asm insn might need a lock around it. So code actually becomes: acquire(&vg_runnable_thread) vg_scheduler_pre_client_hook(); // run some application/client code work sys_call(); vg_scheduler_post_client_hook(); release(&vg_runnable_thread) Note that you could then obtain/influence threading control with the vg_scheduler_pre_client_hook() and the vg_scheduler_post_client_hook(). You can obviously understand the major performance penalty for your scheme. Now if you think about optimizing it for a moment... I think you arrive at the solution I've been discussing, valgrind already does cool stuff at watching memory in a byte glandular way and hey its only syscalls that are a problem in having valgrind bend the scheduling rules of the kernel. So why not intercept system calls in a not so different way to your example above, infact we can do away with the acquire() relead() stuff and roll in into the vg_scheduler_xxxx() stuff whilst wrapping all syscalls. I agree with your approach but maybe I'm a little further down the road on it. > In this scheme, you don't get to control which thread runs. But I > think Julian stressed, and I stress now, that you really don't want to > do that. Well you can control threads if you want as my version above explains. But Yes I think I've concluded on that point. It is not absolute control that is wanted it is the ability to bend the existing scheduling rules in the direction of exposing more bugs. Add to this the want of an audit trail of memory access between point A and point B, the audit must match up 100% with the SMP/CPU/memory view of a real world system. > It sounds like, if you wanted to do this, you *would* need to lock at > every instruction, since you'd need some kind of global hash table or > other data structure which would maintain this list. Ah yes, now you see. Yes a memory access would involve the questions being asked, are we interested in this extent ? do we log this access ? is this access an application error ? Doesn't memcheck already do a lot of this stuff, this was the point of valgrind to leverage on this. I'm just adding a threading twist to it. > By the way, have you heard of software transactional memory? ``STM''. > I'm not sure if there are any open-source systems which implement it. > However, an STM system must keep exactly this, except they of course > do it to provide better parallelism than the absurdity that is > threads. No thats a new one for me, I will research some as you suggest. I disagree thread aren't absurd :) > Thinking about those gives me the idea that you could probably avoid > this wrapping in regions of code which could provably not r/w a shared > variable, perhaps by knowing that the thread holds no locks. Euh. Just because a thread holds no locks doesn't mean that the thread (that holds no locks) won't violate the multi-threaded design by reading from or writing to memory it shouldn't. The reason it shouldn't is because that location is guarded by a lock (which it didn't take before access). A memory access audit will provide that info. > If you only need a hint, not a hard guarantee, change your wrappers to > set and restore the thread priority instead of grab a lock. I imagine > that'd be much cheaper. There is no feedback / assurity from the kernel (or whatever scheduler) you as using of which threads have been given the opportunity to run. We want to offer up a roll-call to all threads, some may run, some may not run, but they were offered the opportunity. I disagree its hard to implement a timeout, there are many such mechanisms to do that. Don't forget its the kernel that actually implements the timeout, a futex() would probably do the trick and provide a mechanism to be woken up to boot. You misunderstand the point of system call interception, the point is to pass back control to valgrind, not to modify the "error return code of a system call" but to modify the "exit from system call and return to caller assembly code". One is a code like a variable, the other is code as in a program. A system call maybe implemented as a function call, jump point, call gate, software interrupt, whatever... but before that moment there is assembly code around it to setup the stack/data correctly and assembly code after it where execution resumes that might deal with storing errno. We just want to "wrapper" that (aka intercept that). Now if ALL code is instrumented under emulation then there may not be a need to intercept syscalls if we can ensure control is passed back to valgrind BEFORE application/client code. This would then check a thread specific memory location to see if there is an outstanding inception event (a latching flag) for this thread and if there is it calls a function. This function then works out what the nature of the interception is and deals with it. You completely miss the point that GDB has no support for byte granular memory checking. That is the main feature of valgrind looking to be leveraged. The application/client program needs to communicate with the byte granular memory checker, whatever and wherever that is. I think gdb watchpoints only work on natural sized memory locations (4/8 bytes) they don't seem suited to the goals. Watchpoints can have the option of hardware assistance but in limited numbers. STM-like may certainly be the way, I'll see what I can find on that avenue. Darryl |
|
From: tom f. <tf...@al...> - 2008-11-02 18:01:54
|
Darryl Miles <dar...@ne...> writes: > tom fogal wrote: > > One thing I neglected to think about earlier is how one would do a > > `Can I still run?' check. There is no notification to a user process > > when a context switch comes back. > > > > Remember that valgrind instruments instructions. [snip] > Why does it need to know ? It doesn't; I wrote this before I understood your `acquire a VG-internal lock at syscall entry' implementation, and didn't come back and edit. > > I had a long diatribe about how you essentially want the JVM's safe > > points and how ridiculously difficult it will be, as you'd have to > > write a parallel scheduler. Which I still think is true, in your > > general case of the choosing which thread to run. > > It depends what you mean by a scheduler. [snip] > I certainly wouldn't call whats proposed a parallel scheduler, it can't > make something thats not runnable (due to being blocked in the kernel) > runnable! [snip] I don't see how that's not a scheduler. Each thread essentially has a valgrind-specific state -- whether or not you'd like it to artifically suspend or not -- and valgrind should choose at various points whether or not to pause or delay the thread based on that information. But this part of the discussion is pedantic at this point. > > You really don't need that functionality though. All you want is to > > guarantee that <= 1 thread is running, at all times. > > > > A solution to your issue -- introduce a new mutex, within valgrind. > > Wrap every function you possibly can. Acquire the lock before the > > function, and release it after the function. > > You're idea of adding a mutex lock around everything, that may work. I > wouldn't be wrapping anything I would be expecting valgrind to > instrument the application/client in that way on my behalf. Yes -- I'm trying to describe how you might hack valgrind to do what you want. [snip] > Note that you could then obtain/influence threading control with the > vg_scheduler_pre_client_hook() and the vg_scheduler_post_client_hook(). Yes.. but you don't need to. The locks around the syscall already assure you that only one thread can run. However, those hooks would be the logical place to modify the internal hash table. [snip] > I think you arrive at the solution I've been discussing [...] > > So why not intercept system calls in a not so different way to your > example above, infact we can do away with the acquire() relead() stuff > and roll in into the vg_scheduler_xxxx() stuff whilst wrapping all syscalls. Valgrind already does this. Look up function wrapping in the manual. > > In this scheme, you don't get to control which thread runs. [...] > > But Yes I think I've concluded on that point. [...] > [snip -- we agree] > > > It sounds like, if you wanted to do this, you *would* need to lock at > > every instruction, since you'd need some kind of global hash table or > > other data structure which would maintain this list. > > Ah yes, now you see. Yes a memory access would involve the questions > being asked, are we interested in this extent ? do we log this access ? > is this access an application error ? > > Doesn't memcheck already do a lot of this stuff, this was the point of > valgrind to leverage on this. I'm just adding a threading twist to it. Well, it doesn't add it to an internal data structure, as far as I know .. just reports it immediately. Yes though, seems like most of the pieces you want are already there. > > By the way, have you heard of software transactional memory? ``STM''. [snip] > > I'm not sure if there are any open-source systems which implement it. > > However, an STM system must keep exactly this, except they of course > > do it to provide better parallelism than the absurdity that is > > threads. > > No thats a new one for me, I will research some as you suggest. I > disagree thread aren't absurd :) This is way off-topic -- but threads don't scale, && they're much too difficult to get right. I think it's an `Edward Lee' that has a paper on how a better approach to parallelism would be to start from determinism, and add non-determinism, instead of the reverse like we currently do with threads. Anyway with any luck most everything will be data-parallel in a decade, and threads will be this strange thing that only operating system programmers actually use. > > Thinking about those gives me the idea that you could probably avoid > > this wrapping in regions of code which could provably not r/w a shared > > variable, perhaps by knowing that the thread holds no locks. > > Euh. Just because a thread holds no locks doesn't mean that the thread > (that holds no locks) won't violate the multi-threaded design by reading > from or writing to memory it shouldn't. The reason it shouldn't is > because that location is guarded by a lock (which it didn't take before > access). Hrm, yeah, sounds like !(B => A) here, but the point still stands in general. You should come across these sorts of optimizations as you look into STM; my particular example is flawed, but this system is going to be slow, so you'll need some sort of optimization here. > > If you only need a hint, not a hard guarantee, change your wrappers to > > set and restore the thread priority instead of grab a lock. I imagine > > that'd be much cheaper. [snip] > I disagree its hard to implement a timeout, there are many such > mechanisms to do that. It's just that jumping to timeouts normally means mucking with signals, which I gather is painful in something like valgrind. I could be wrong there. > You misunderstand the point of system call interception, the point is to > pass back control to valgrind, not to modify the "error return code of a > system call" but to modify the "exit from system call and return to > caller assembly code". [...] Okay, that makes sense; the wording was a little confusing. > Now if ALL code is instrumented under emulation then there may not be a > need to intercept syscalls if we can ensure control is passed back to > valgrind BEFORE application/client code. I don't think this is possible, for the reasons mentioned at the very top of this email -- we can't know when/where a context switch will `jump back to', and the code is already translated at that point. > You completely miss the point that GDB has no support for byte granular > memory checking. Ahh, yes, I did miss that. -tom |
|
From: Darryl M. <dar...@ne...> - 2008-11-03 00:46:07
|
tom fogal wrote: > Darryl Miles <dar...@ne...> writes: >> tom fogal wrote: > It doesn't; I wrote this before I understood your `acquire a > VG-internal lock at syscall entry' implementation, and didn't come > back and edit. > > I don't see how that's not a scheduler. Each thread essentially has a > valgrind-specific state -- whether or not you'd like it to artifically > suspend or not -- and valgrind should choose at various points whether > or not to pause or delay the thread based on that information. System calls would be left to run, i.e. any thread that wanted to jump into a system call should be left to do so (T1) but in the act of doing this at least one thread that was artificially blocked in user-space (in valgrind scheduler code) would be released to run (T2). If there are no threads to run because they are all running them thats a cool situation too. On the return from syscall of that first thread (T1) it would by default be intercepted and a simple check made. If there is already another running thread in user-space i.e. the T2 thread we unblocked then T1 will be suspended by the VG scheduler. If there was no T2 thread running then the T1 thread will be left to continue running. So thats how to get thread control so only one thread runs application/client code at one time. Sure we allow multiple threads to be running inside the kernel, running includes being delayed or blocked inside the kernel as every syscall should be treated as-if it will block. I hope that clears up how thread management is possible to do. I'm not averse to calling the valgrind thread management a "scheduler" it's just not a parallel scheduler to the one in the kernel, the one in the kernel has a different set of input events/wait-queues to the one what valgrind has. But none the less both have influence over the runability of threads and specifically the runability of application/client code being examined. Which at the end of the day is the goal. >> Note that you could then obtain/influence threading control with the >> vg_scheduler_pre_client_hook() and the vg_scheduler_post_client_hook(). > > Yes.. but you don't need to. The locks around the syscall already > assure you that only one thread can run. > > However, those hooks would be the logical place to modify the internal > hash table. The reasons for wanting only one thread to run at once: * To get a consistent log of memory access, we've now serialized all application/client code memory access. * To be able to force scheduler situations/scenarios that improve the chances of causing a threading bug to show up. This is the enforced yield, enforced the point of making the current thread that is happy to yeild go to sleep. Like nanosleep() can do. "The locks around syscalls" I'm not sure on that terminology, I don't intend to make syscalls mutually exclusive, far from it. My design states the opposite of this. The terminology I have used was to "intercept" syscalls, i.e. allow vg to do something before and after (aka wrapper). The design I hold up for discussion allows all threads to be inside the kernel at the same time, we deal with serializing application/client code on the "return from syscall". We don't need to serialize kernel calls nor put locks around them, I'm not sure where this misunderstanding comes from I don't think its from me. That will cause deadlocks. The mutexes you proposed were there to protect tiny fragments of application/client code so that only one thread of application/client code was running at any one time. They are not there to do anything in relation to syscalls, unfact during syscall entry you'd need to release that lock so allow another user-space thread to run. I still cite my technique is better and the logical progression after you try your mutex approach that way and find out performance sucks to much. > Valgrind already does this. Look up function wrapping in the manual. Absolutely. > Well, it doesn't add it to an internal data structure, as far as I > know .. just reports it immediately. Yes though, seems like most of > the pieces you want are already there. This is where you STM sounds good, a transaction log (or journal) of all access. This I'm proposing an internal data structure would be added for this functionality, I understand it may not currently work that way. Also I'm only interested in the specific locations I tell valgrind to watch I don't need a complete memory view, I don't need any reverse engineering of whats going on. > This is way off-topic -- but threads don't scale, && they're much too > difficult to get right. I think it's an `Edward Lee' that has a paper > on how a better approach to parallelism would be to start from > determinism, and add non-determinism, instead of the reverse like we > currently do with threads. > > Anyway with any luck most everything will be data-parallel in a > decade, and threads will be this strange thing that only operating > system programmers actually use. Yes maybe we should start another thread on this issue, an interesting discussion point. What CPUs currently in production provide hardware assistance to this data access model. What languages/compilers/tools ? What real world problems need it (stuff we just cant do today without it) ? Maybe a decade is optimistic, thats about 4 iterations of language/software technology these days, or 2.5 iterations of CPU/hardware. > It's just that jumping to timeouts normally means mucking with > signals, which I gather is painful in something like valgrind. I > could be wrong there. futex() doesn't use signals, but allows a thread to go to sleep with the ability to receive a wakeup-now event. nanosleep() also allows for a delay. Without use of signals. alarm() in many version of Unix require the use of signals, of which the SIGALRM might be something the application/client is using so needs special care. >> Now if ALL code is instrumented under emulation then there may not be a >> need to intercept syscalls if we can ensure control is passed back to >> valgrind BEFORE application/client code. > > I don't think this is possible, for the reasons mentioned at the very > top of this email -- we can't know when/where a context switch will > `jump back to', and the code is already translated at that point. Ah I don't understand the correlation between needing to know when a context switch occurs, ensuring valgrind has CPU control at the time we need it to and the instrumentation/translation. If it's a case that in order to use this feature full stop we need to leave the hooks in place inside the translated code then that might be a command line option to enable the API which will add a few asm instructions to the normal code path to checking if a flag is on and making a branch/call where the hooks are needed. Darryl |
|
From: tom f. <tf...@al...> - 2008-11-02 18:21:15
|
Darryl Miles <dar...@ne...> writes:
> Julian Seward wrote:
> > I can't claim to really understand this, but I do have a couple of
> > questions:
> >
> > * "it knows that T2 is blocked in the kernel"
> > how does V know that a syscall it is about do perform (on behalf
> > of the client application) will or will not block?
> > AFAICT that is unknowable from user space. And how would it
> > distinguish a block from merely a long wait?
>
> It would work on the basis that Valgrind knows a syscall is being
> performed, its running the application/client under emulation and it has
> already intercepted the entrypoints to all syscalls.
>
> Valgrind already does a similar thing (but at a higher level up) for
> malloc/free/realloc etc.
You missed his question -- valgrind sees (essentially):
read(42, &mybuf, 1024);
Is this system call going to block? Or are we going to return to user
code without any context switches? It is impossible to know a priori,
unless we can peek into the kernel.
Anyway I don't think this question is relevant anymore -- with the
lock approach, yuo don't care whether things block.
> > Do you have any concise fragments of code illustrating what problem
> > it is you are really trying to solve?
>
> "Concise" might seem a little subjective.
Look at it from a developer point of view. Think of a stupid, 10 line
program that you don't actually want to debug because it's easy, and
show the output you'd want to get from valgrind on that program.
I'll make up an example:
T1 T2 T3 T4
acquire(&x) acquire(&x) read(abc) write(abc)
write(abc) read(abc) y = abc*18 z =42/abc
release(&x) release(&x) ... whatever ..
Darryl (I think) wants to see is something like:
while in T1's C.S. `x', the variable `abc':
saw a write from T1 of size whatever
saw a write from T4 of size whatever
saw a read from T3 of size whatever
saw a read from T4 of size whatever
(of course, that's just one possible ordering that I'm making up)
Where those `saw a' lines define a total ordering of the memory access
to `abc'.
Having this sort of audit is useful for deriving/proving ideas such
as, `if I can guarantee that T1 always writes before T* reads, then my
system will always have the correct result'. This is a bit
lower-level than a simple race detection algorithm, which takes these
kinds of accesses and has built-in logic to decide whether or not they
are acceptable. Furthermore, as I understand `Eraser' at least [1],
there is no notion of `history' -- variables have a state, and there
is a state transition graph which changes based on which locks are
held as data are accessed, but there is nothing which keeps track of
the path travelled through that state transition graph.
[snip]
I don't think the rest is relevant until we see a first stab at an
implementation.
-tom
[1] I have a vague recollection that Helgrind was based on that
algorithm, back in the day..
|
|
From: Darryl M. <dar...@ne...> - 2008-11-03 01:56:12
|
tom fogal wrote: > Darryl Miles <dar...@ne...> writes: > > You missed his question -- valgrind sees (essentially): > > read(42, &mybuf, 1024); > > Is this system call going to block? Or are we going to return to user > code without any context switches? It is impossible to know a priori, > unless we can peek into the kernel. > > Anyway I don't think this question is relevant anymore -- with the > lock approach, yuo don't care whether things block. No system calls will be blocked or otherwise impeded by the valgrind scheduler. I'm not sure on this use of the term "with the lock" in relation to talking about syscalls. There is no lock around syscalls, only around application/client code, this is to ensure only one application/client thread is running at any one time. So from the above and in answer to your direct question. It does not matter if the read() syscall (or any other syscall) blocks, since it does not break the design. As and when context-switches occur they occur, again it does not matter where and when they occur. The design I propose does not break if a context switch occurs at the wrong moment nor break from not knowing when a context switch occured. >>> Do you have any concise fragments of code illustrating what problem >>> it is you are really trying to solve? >> "Concise" might seem a little subjective. > > Look at it from a developer point of view. Think of a stupid, 10 line > program that you don't actually want to debug because it's easy, and > show the output you'd want to get from valgrind on that program. > > I'll make up an example: > > T1 T2 T3 T4 > acquire(&x) acquire(&x) read(abc) write(abc) > write(abc) read(abc) y = abc*18 z =42/abc > release(&x) release(&x) ... whatever .. > > Darryl (I think) wants to see is something like: > > while in T1's C.S. `x', the variable `abc': > saw a write from T1 of size whatever > saw a write from T4 of size whatever > saw a read from T3 of size whatever > saw a read from T4 of size whatever should be "saw a read from T2 of size whatever" in last line, but yes understood on your example. > (of course, that's just one possible ordering that I'm making up) > > Where those `saw a' lines define a total ordering of the memory access > to `abc'. Yes yes yes the audit of memory access. The issue where the scheduler comes into play is to put code in T1 during the "acquire(&x)" to basically instruct valgrind with: "Hey I've got my application/client code lock with 'acquite(&x)' now but I want you to try as hard as you can to run T2,T3,T4 as far as you can for a little while, then come back to me". So T1 is artificially put to sleep just after the lock acquisition. This moment would also atomically enable the earmark areas of memory the developer is interested in for this event. Valgrind then might resume T2. T2 will then attempt the same acquire(&x) in doing this it will call futex() syscall to attempt to gain the lock and if unsuccessful go to sleep. No we know from the example it will be unsuccessful but inline with my rules on scheduling because it is making a syscall it will in the process of doing that take up T3. T2 will continue into the kernel and ultimately be put to sleep. Valgrind has resumed T3. T3 proceeds to make syscall read() and in the process of doing that takes T4 and enters the syscall. Valgrind has resumed T4. T4 proceeds to make syscall write() and finds there are no other threads to wake and enters the syscall. The recap at this point in time all 4 threads are inside the kernel. T1 is blocked sleeping on semaphore/timeout, T2 is blocked forever waiting to aquire lock, T3 is doing a read() and T4 is doing a write(). Lets say T3 returns from syscall first. When it does so it checks to see if it can continue to run application/client code, it does this by checking that nothing else is running (in userspace). It finds nothing is running so it proceeds to run and execute "y=abc*18". Now T4 returns from syscall. It also check to see if T4 can continue to run application/client code but it finds T3 is already running so it suspends itself in the valgrind scheduler and puts itself to sleep. Now T1 returns from syscall, due to timeout. This timeout was imposed by valgrind to artificially reduce the scheduling priority of T1 to force other threads to run first. This returns and finds T3 then does something to relinquish the single thread at a time lock, T1 takes over compeletes. I'm sure you can work fill in the rest of the picture. Obviously the exact order of event might change over many runs, you indicate this understanding from your comment "one possible ordering" well the same is try of the above that is one possible ordering. You can add to it that you can presume a state where only T1 is running at the start. You can then see how maybe using HPET timers and other weighting factors you can run another thread for a period of time before deciding to artificially put it to sleep. Then select another thread which has not run and wake that one up. I'm sure you appreciate that to get a sequential log of ordered memory access out of valgrind you're going to need to serialize application/client code to be simulate one-thread-at-time. I'm also happy to depart from the "report on all memory information for the entire life of the process" paradigm. To replace with a more specific selection of memory between an arbitrary point A and point B during runtime execution. It is more useful throw out the "entire life of the process" design goal, since Point A and Point B should do most people, if that design goal is too difficult/costly to achieve. Than it is to throw out the "all memory information" design goal. i.e. it's highly likely in some circumstance to want to see all memory access between two points in time, than is it to see all access specific memory over the life of the process. The 2nd case isn't so useful. Ideally the design needs to also allow for overlapping (in same thread or different thread) point As and point Bs. So it is like a multitude of sets all being watched by different watcher contexts that are setup and torn down. Darryl |
|
From: Julian S. <js...@ac...> - 2008-11-03 08:46:39
|
> > T1 T2 T3 T4 > > acquire(&x) acquire(&x) read(abc) write(abc) > > write(abc) read(abc) y = abc*18 z =42/abc > > release(&x) release(&x) ... whatever .. > > > > Darryl (I think) wants to see is something like: > > > > while in T1's C.S. `x', the variable `abc': > > saw a write from T1 of size whatever > > saw a write from T4 of size whatever > > saw a read from T3 of size whatever > > saw a read from T4 of size whatever Getting a trace of a particular memory location is trivial; either hack the Lackey tool a bit, or simply use the following flags to Helgrind (I think DRD has similar flags): --trace-addr=0xXXYYZZ show all state changes for address 0xXXYYZZ --trace-level=0|1|2 verbosity level of --trace-addr [1] The rest of the long discussions, re scheduling, I don't understand the purpose of. The normal thing to do w.r.t. thread debugging, once you have a log of which thread accessed which location in which order, is to feed that stuff into one of several by-now standard race detection algorithms. Either a pure happens-before algorithm, as exemplified by DRD and Helgrind in 3.4, or one based on locksets, of which Helgrind in 3.3 partially exemplifies. What does messing with the scheduling get you that standard race detection algorithms don't? J |
|
From: tom f. <tf...@al...> - 2008-11-03 19:40:59
|
Julian Seward <js...@ac...> writes: > > > > Darryl (I think) wants to see is something like: > > > > > > while in T1's C.S. `x', the variable `abc': > > > saw a write from T1 of size whatever > > > saw a write from T4 of size whatever > > > saw a read from T3 of size whatever > > > saw a read from T4 of size whatever > > The rest of the long discussions, re scheduling, I don't understand the > purpose of. [snip] > What does messing with the scheduling get you that standard race > detection algorithms don't? Of course, you only see the error if it actually happens. Darryl is trying to force the problems to occur. Writing that though, maybe a static analysis tool would suit his needs better. -tom |
|
From: Julian S. <js...@ac...> - 2008-11-03 19:48:36
|
> > What does messing with the scheduling get you that standard race > > detection algorithms don't? > > Of course, you only see the error if it actually happens. Darryl is > trying to force the problems to occur. I'm not sure I'm convinced about that. A lockset based race detector can report violations of a locking discipline, that could lead to races, even if they didn't happen in this particular run. Indeed, that insensitivity to scheduling is basically the main advantage of the lockset algorithm family. J |
|
From: Darryl M. <dar...@ne...> - 2008-11-03 22:19:15
|
Julian Seward wrote: >>> What does messing with the scheduling get you that standard race >>> detection algorithms don't? >> Of course, you only see the error if it actually happens. Darryl is >> trying to force the problems to occur. > > I'm not sure I'm convinced about that. A lockset based race detector > can report violations of a locking discipline, that could lead to > races, even if they didn't happen in this particular run. Indeed, > that insensitivity to scheduling is basically the main advantage of > the lockset algorithm family. Not looking to replace that technique, no one technique is best; I'm looking to supplement it. Providing a new debugging angle on the "threading problem" for which the developer can see and verify his design is working as expected. That the developer can dictate the rules/law of operation and have a tool perform runtime verification of those laws. But before that getting the necessary data to perform that type of audit is the main task. Writing testcases to prove the design of some widget or sub-system interaction would be the way to prove that component is designed well. The work can also be reused when migrating the code to another platform/host/cpu/environment for re-verification there. The thingy being checked could be a threading primitive or could be basic interaction verification of operation with key parts of a program. For the most part my scheduling discussion is really to address the "sequential access to memory view" of what is going on. Since I can not see how you can provide that without only one thread running application/client code at any one time. It's much less about trying to force problem to occur, since there is a pretty easy way to achieve that with nanosleep() just after taking a lock. It's not ideal but it would work fine with testcases. The memory audit is far more important a goal at this stage. Darryl |