|
From: Edward W. <ew...@ta...> - 2006-10-06 20:08:16
|
Hi,

I'm researching ways of adding speculative (or non-speculative) parallelism into the translation-run component in Valgrind. Over the last couple of days, I've been experimenting with just invoking run_thread_for_a_while (in scheduler.c) in a separate thread.

I do this by invoking the clone syscall to start a helper thread that invokes run_thread_for_a_while when VG_(scheduler)() tells it to. The clone syscall I use includes the VKI_CLONE_VM|VKI_CLONE_FS| etc. flags, with a stack I give it allocated from VG_(am_alloc_VgStack)(). For now, everything runs serially, i.e. VG_(scheduler)() stops when run_thread_for_a_while() runs. No rocket science so far ...

However, when I run valgrind (with the "none" tool) all works for a while, and then my helper thread crashes with the message:

==16024== Stack overflow in thread 0: can't grow stack to 0xBE873FD4 (sig=11)

I've checked the stack of the helper thread, and it's at 0x621DBFEC, so it's nowhere near the location it is complaining about.

I'm pretty sure I'm doing something trivially stupid. Any help will be greatly appreciated.

Thanks,
Ed
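For readers following the thread, here is a minimal user-space sketch of the setup described in this message. It assumes Linux and glibc's clone() wrapper, not Valgrind's internal clone path or VG_(am_alloc_VgStack)(). The names helper_main, run_in_helper, struct helper_job and HELPER_STACK_SIZE are hypothetical, and the helper's work is only a stand-in for run_thread_for_a_while. The parent blocks until the helper finishes, mirroring the serial hand-off described above. It does not attempt to reproduce or diagnose the stack-overflow report.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define HELPER_STACK_SIZE (256 * 1024)  /* arbitrary size chosen for this sketch */

/* Visible to both parent and helper because CLONE_VM shares the address space. */
struct helper_job {
    int input;
    int output;
};

/* Hypothetical stand-in for the work the helper would run. */
static int helper_main(void *arg)
{
    struct helper_job *job = arg;
    assert(job != NULL);
    job->output = job->input + 1;
    return 0;  /* becomes the helper's exit status */
}

/* Run helper_main on a separately allocated stack and wait for it.
   Returns the helper's result, or -1 on failure. */
int run_in_helper(int input)
{
    struct helper_job job = { input, 0 };

    char *stack = mmap(NULL, HELPER_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Assumes a downward-growing stack (true on x86), so pass the top address. */
    pid_t pid = clone(helper_main, stack + HELPER_STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, &job);
    if (pid == -1) {
        munmap(stack, HELPER_STACK_SIZE);
        return -1;
    }

    /* Serial hand-off: the caller does nothing until the helper has exited. */
    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(stack, HELPER_STACK_SIZE);

    if (waited != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return job.output;
}
```

Calling run_in_helper(41) from a main() returns the value the helper wrote into the shared struct. One way to confirm which stack the helper is actually running on is to print the address of a local variable inside helper_main and check that it falls within the mapped range.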
|
From: Nicholas N. <nj...@cs...> - 2006-10-06 22:03:50
|
On Fri, 6 Oct 2006, Edward Walker wrote:

> I'm researching ways of adding speculative (or non-speculative)
> parallelism into the translation-run component in Valgrind.

Can I ask what is the motivation for this? I'm curious...

> Over the last couple of days, I've been experimenting with just invoking
> run_thread_for_a_while (in scheduler.c) in a separate thread.
>
> I do this by invoking the clone sys call that starts a helper thread
> that invokes run_thread_for_a_while when VG_(scheduler)() tells it to.
> The clone syscall I use includes the VKI_CLONE_VM|VKI_CLONE_FS| etc.
> flags, with a stack I give it allocated from VG_(am_alloc_VgStack)().
> For now, everything runs serially, i.e. VG_(scheduler)() stops when
> run_thread_for_a_while() runs. No rocket science so far ...
>
> However, when I run valgrind (with the "none" tool) all works for a
> while, and then my helper thread crashes with the message:
>
> ==16024== Stack overflow in thread 0: can't grow stack to 0xBE873FD4
> (sig=11)
>
> I've checked the stack of the helper thread, and its 0x621DBFEC, so it's
> nowhere near the location where it is complaining about.
>
> I'm pretty sure I'm doing something trivially stupid. Any help will be
> greatly appreciated.

I don't think you're doing something trivially stupid, rather something that Valgrind is not at all designed to handle. Valgrind's execution is deliberately serialised, for reasons described in the first paragraph of section 2.3.9 in http://www.valgrind.org/docs/phd2004.pdf. I'm surprised it ran successfully for as long as it did.

Nick
|
From: Edward W. <ew...@ta...> - 2006-10-07 00:13:53
|
>> I'm researching ways of adding speculative (or non-speculative)
>> parallelism into the translation-run component in Valgrind.
>
> Can I ask what is the motivation for this? I'm curious...

The motivation is actually pretty simple: performance. I'm investigating alternative basic-block (BB) issue techniques and methods for breaking input/output/anti dependencies. The pervasiveness of multi-core makes this very relevant, and thread speculation is just _one_ possible technique which enables this.

> I don't think you're doing something trivially stupid, rather something that
> Valgrind is not at all designed to handle. Valgrind's execution is
> deliberately serialised, for reasons described in the first paragraph of
> section 2.3.9 in http://www.valgrind.org/docs/phd2004.pdf. I'm surprised it
> ran successfully for as long as it did.

Sure. I'm in an area where I hear that comment quite often: "this application can't be parallelized". That's often not true if we can identify all the data dependencies and investigate novel issuing techniques that can break those dependencies.

Do you have any ideas as to where this stack exception could be coming from? Any help would be greatly appreciated. I don't have this problem if I do it in QEMU, for example, but valgrind makes for a better platform to start from because of its more efficient implementation. Alternatively, I'll be moving back to QEMU if this is just not possible.

Many thanks in advance,
Ed
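As a small illustration of what "breaking" anti and output dependencies can mean, the sketch below uses renaming, which is one textbook technique and not necessarily the one being investigated in this message. All function and type names are hypothetical. In the first version a single temporary is reused, so the second pair of statements must wait for the first. In the second version each value gets its own name, leaving two independent chains that could in principle be issued in parallel.

```c
#include <assert.h>

struct result {
    int x;
    int y;
};

/* One temporary, t, is reused, which creates false dependencies. */
struct result with_false_deps(int a, int b)
{
    struct result r;
    int t;
    t = a + b;    /* S1: writes t */
    r.x = t * 2;  /* S2: reads t  (true dependency on S1) */
    t = a - b;    /* S3: writes t (anti dependency on S2, output dependency on S1) */
    r.y = t * 3;  /* S4: reads t  (true dependency on S3) */
    return r;
}

/* Renaming t into t1 and t2 removes the anti and output dependencies.
   Only the true dependencies S1->S2 and S3->S4 remain. */
struct result renamed(int a, int b)
{
    struct result r;
    int t1 = a + b;  /* S1 */
    int t2 = a - b;  /* S3, now independent of S1 and S2 */
    r.x = t1 * 2;    /* S2 */
    r.y = t2 * 3;    /* S4 */
    return r;
}

/* Both versions must compute the same values for any inputs. */
void check_equivalent(int a, int b)
{
    struct result before = with_false_deps(a, b);
    struct result after = renamed(a, b);
    assert(before.x == after.x);
    assert(before.y == after.y);
}
```

The renamed version trades extra storage for independence, which is the usual cost of this approach.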
|
From: Nicholas N. <nj...@cs...> - 2006-10-07 01:43:04
|
On Fri, 6 Oct 2006, Edward Walker wrote:

>>> I'm researching ways of adding speculative (or non-speculative)
>>> parallelism into the translation-run component in Valgrind.
>>
>> Can I ask what is the motivation for this? I'm curious...
>
> The motivation is actually pretty simple: performance. I'm
> investigating alternative BB issue techniques and methods for breaking
> input/output/anti dependencies. The pervasiveness of multi-core makes
> this very relevant and speculative thread speculation is just _one_
> possible technique which enables this.

So you're trying to make Valgrind go faster, or dynamic binary translation systems in general?

>> I don't think you're doing something trivially stupid, rather something
>> that Valgrind is not at all designed to handle. Valgrind's execution is
>> deliberately serialised, for reasons described in the first paragraph of
>> section 2.3.9 in http://www.valgrind.org/docs/phd2004.pdf. I'm surprised
>> it ran successfully for as long as it did.
>
> Sure. I'm in an area where I hear that comment quite often: "this
> application can't be parallelized".

I didn't say that.

> Just curious if you have any ideas as to where this stack exception
> could be coming from? Any help would be greatly appreciated. I don't
> have this problem if I do it in QEMU for example, but valgrind makes for
> a better platform to start from because of its more efficient
> implementation. Alternatively, I'll be moving back to QEMU if this is
> just not possible.

I don't know. Others might.

Nick
|
From: Edward W. <ew...@ta...> - 2006-10-07 02:28:56
|
Hi Nick,

>> So you're trying to make Valgrind go faster, or dynamic binary translation
>> systems in general?

Dynamic binary translation systems in general. I work on the NSF TeraGrid, a distributed high performance computing infrastructure in the US, and one of our biggest challenges is supporting the plethora of ISAs that we host in our distributed infrastructure. So a research challenge is to enable _some_ HPC applications to migrate between systems connected by our 30 Gbps network backbone across the country. There are lots of people working on using VMMs like Xen to enable this, but I think for many situations user-space dynamic binary translators like valgrind are sufficient, without the overhead of setting up entire OS VMs as containers for a running HPC job across sites.

And since I am in the area of HPC, performance (plus reliability) is a very important consideration. Fortunately, many of our systems are now multi-core, e.g. here in Texas we support a system with 1300 Woodcrest compute nodes, each of which offers the application two dual-core processors. Furthermore, Intel has already announced the prospect of CPUs with up to 80 cores, so things will be getting better (or worse, depending on how you look at it). So if we can leverage this trend of ever-increasing core counts and provide a certain level of location transparency through processor emulation, this will be a big win for computational science in general.

I know valgrind is more than a processor emulator, unlike QEMU and QuickTransit. But unlike QuickTransit, Valgrind is open-source, and unlike QEMU, Valgrind is more efficient.

It's not absolutely critical that I pick the fastest platform for my research, but being in HPC, it's a lot easier to convince my colleagues if real performance gains can be seen fairly quickly.

Sorry if I did not convey all this information in my previous email. Email is sometimes not the best vehicle to convey complex motivation.

>> Sure. I'm in an area where I hear that comment quite often: "this
>> application can't be parallelized".
> I didn't say that.

Sorry. I didn't mean to misinterpret your statement.

- Ed