|
From: Paul M. <pa...@sa...> - 2004-03-03 11:20:40
|
I have ported valgrind to PowerPC and I have it running quite nicely,
on Linux of course.

A patch against valgrind-2.1.0 to add PowerPC support is at:

http://ozlabs.org/~paulus/ppc-valgrind.patch.gz

I didn't include the patch in this message because it is moderately
large (280kB uncompressed) and I thought the mailing list software
might reject it.

It would be good if you (valgrind developers) could look at the patch
and give me any comments you have. I would like to get this stuff
included in future releases.

I have ended up putting in quite a few ifdefs in the code, as that was
the quickest way to make the changes I needed to get it working on
PowerPC. Now that it is working I think it would be worth looking
over what things need to be abstracted out and what might be the best
way to do that. Hopefully that will reduce the number of ifdefs.

There are still a few limitations and things left to do:

* It only supports 32-bit instructions at present (doesn't handle the
  64-bit part of the PowerPC architecture).

* It doesn't handle Altivec instructions (Altivec provides
  short-vector SIMD capabilities, a bit like MMX or SSE).

* I haven't generated any suppression files for libc.so or ld.so. I
  seem to get lots of errors from them, particularly for programs that
  use dlopen. I have some prototype code to improve the valid bit
  computations for ADD and CMP0 (compare against 0) which reduces the
  reported errors a lot. Glibc seems to like using the add-fefefeff
  and or-7f7f7f7f tricks a lot. I have a formula for getting the
  valid bits for the result of an ADD exactly without too much effort.

* The code to set cache parameters in cachegrind is pretty minimal at
  present.

* I punted on VG_(saneUInstr) for now. I really should go back and
  fill in the parts for the new UInstrs I added.

Regards,
Paul.
|
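The "add-fefefeff and or-7f7f7f7f tricks" mentioned above read like the word-at-a-time zero-byte test that string routines such as strlen commonly use. That interpretation is an assumption, as is the function name `has_zero_byte`; neither is taken from the patch or from glibc source. A minimal sketch in C of what such a test looks like:

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if any of the four bytes of x
 * is zero.  This is the classic word-at-a-time test, written with the
 * two constants named in the message. */
int has_zero_byte(uint32_t x)
{
    /* Adding 0xfefefeff is the same as subtracting 0x01010101
     * modulo 2^32; a zero byte borrows and ends up with its top bit set. */
    uint32_t t = x + 0xfefefeffu;

    /* OR-ing with 0x7f7f7f7f and inverting leaves 0x80 in exactly
     * those bytes of x whose top bit was clear. */
    uint32_t m = ~(x | 0x7f7f7f7fu);

    return (t & m) != 0;
}
```

Because the test depends on carries travelling through an ADD, a checker that treats every bit above an undefined bit as undefined would presumably flag words that are only partly initialised, which may be why a more precise rule for ADD cuts down the reported errors.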
|
From: Nicholas N. <nj...@ca...> - 2004-03-03 11:57:17
|
On Wed, 3 Mar 2004, Paul Mackerras wrote:
> I have ported valgrind to PowerPC and I have it running quite nicely,
> on Linux of course.
>
> A patch against valgrind-2.1.0 to add PowerPC support is at:
>
> http://ozlabs.org/~paulus/ppc-valgrind.patch.gz

Wow. I'm impressed!

> It would be good if you (valgrind developers) could look at the patch
> and give me any comments you have. I would like to get this stuff
> included in future releases.
>
> I have ended up putting in quite a few ifdefs in the code, as that was
> the quickest way to make the changes I needed to get it working on
> PowerPC. Now that it is working I think it would be worth looking
> over what things need to be abstracted out and what might be the best
> way to do that. Hopefully that will reduce the number of ifdefs.
>
> There are still a few limitations and things left to do:
>
> * It only supports 32-bit instructions at present (doesn't handle the
>   64-bit part of the PowerPC architecture).
>
> * It doesn't handle Altivec instructions (Altivec provides
>   short-vector SIMD capabilities, a bit like MMX or SSE).

I just had a quick look. Here are some thoughts that pop immediately into
my head:

- Could you write up a description of the main changes you made, which
  parts were affected a lot, which parts weren't affected much, anything
  you've learnt? No substitute for reading the entire patch, but valuable
  as a high-level introduction.

- Some big changes have been made ("full virtualization") since 2.1.0. I
  don't know how much these changes would affect your patch.

- How much have you tested it? Does it run big programs like OpenOffice
  and Mozilla? Does it pass all the regression tests? ("make regtest")

- All the #ifdef-ery is ugly, as you say. In particular, it will be
  painfully obvious to you that UCode is heavily geared towards x86. I
  see you've had to add 17 new UInstrs. And handling Altivec would
  require more (and you can see how awful the MMX/SSE UInstrs are).

Julian, Jeremy and I have given quite a lot of thought to architecture
ports. The key requirement, in our minds, is that we don't want an NxM
code explosion, where N is the number of architectures, and M is the
number of skins. (Also, factor in a xP for P different operating
systems.) With the "fat UCode" approach you've taken, you unfortunately
have this problem, as you've seen with all the skin changes that were
required.

Finding a design that supports multiple architectures without requiring
skin rewriting is extremely challenging. We have a Wiki
(www.goop.org/twiki/bin/view/Valgrind/ValgrindFutures) where we've
discussed various things. Much of it is 6--9 months old, and a bit out
of date. But the "IrKnowledge" and "IrOpinions" pages show some of our
current thinking; they are a distillation of several failed
half-attempts to come up with a concrete design.

So I'm personally not convinced the approach you've taken is the right
way to go -- it might be acceptable for two architectures, but what
happens when someone wants a SPARC port, or an ARM port, etc?
Obviously there must be some architecture-specific bits (eg. asm
disassembly and code-regeneration), but keeping those bits separate from
the skins seems, to me, crucial.

Anyway, I congratulate you heartily for undertaking such a big task.
Even if your patch doesn't get included as-is, it should provide an
extremely valuable base for further work.

I won't say more for the moment; I'll be interested to see what other
people think.

As a first step, would you be interested in bundling it up (with "make
dist") and putting it somewhere so people can try it? I can link to it
from the Valgrind page.

[Hmm... Doug -- would you be interested in doing the same with your
x86/BSD patch?]

N
|
|
From: Doug R. <df...@nl...> - 2004-03-03 17:29:08
|
On Wed, 2004-03-03 at 11:51, Nicholas Nethercote wrote:
>
> [Hmm... Doug -- would you be interested in doing the same with your
> x86/BSD patch?]

My subversion tree is publicly accessible. You can check out my
'reasonably stable' tree (based on valgrind cvs from about 10th
January) with:

    $ svn co svn://svn.rabson.org/repos/valgrind/branches/dfr \
          valgrind

or you can look at the tree I've been syncing to a more recent
valgrind cvs (29th February) with:

    $ svn co svn://svn.rabson.org/repos/valgrind/branches/dfr-merge \
          valgrind

I haven't been making public snapshots regularly, since I'm still
fixing stuff fairly often and the few people I've been testing this
stuff with seem happy using subversion to keep in sync.
|
|
From: Madhu M K. <mm...@ya...> - 2004-03-03 19:16:35
|
Doug Rabson <df...@nl...> said on March 3, 2004:
> On Wed, 2004-03-03 at 11:51, Nicholas Nethercote wrote:
> >
> > [Hmm... Doug -- would you be interested in doing the same with your
> > x86/BSD patch?]
>
> My subversion tree is publicly accessible. You can check out my
> 'reasonably stable' tree (based on valgrind cvs from about 10th
> January) with:

I can second that. The FreeBSD port works well - there are a few edge
cases (zombie processes and getting stuck in poll loops), but mostly,
it works perfectly.

Cheerio,
M

Madhu M Kurup /* Nemo Me Impune Lacessit */ mmk at yahoo-inc dt com
|
|
From: Julian S. <js...@ac...> - 2004-03-03 19:44:48
|
> > A patch against valgrind-2.1.0 to add PowerPC support is at:
> >
> > http://ozlabs.org/~paulus/ppc-valgrind.patch.gz

Amazing. /me is also impressed. How much work was it?

> I just had a quick look. Here are some thoughts that pop immediately into
> my head:
>
> - Could you write up a description of the main changes you made, which
>   parts were affected a lot, which parts weren't affected much, anything
>   you've learnt? No substitute for reading the entire patch, but valuable
>   as a high-level introduction.

Request seconded by me ...

> - Some big changes have been made ("full virtualization") since 2.1.0. I
>   don't know how much these changes would affect your patch.
>
> - How much have you tested it? Does it run big programs like OpenOffice
>   and Mozilla? Does it pass all the regression tests? ("make regtest")

Yes, this is critical. How stable is it?

> - All the #ifdef-ery is ugly, as you say. In particular, it will be
>   painfully obvious to you that UCode is heavily geared towards x86. I
>   see you've had to add 17 new UInstrs. And handling Altivec would
>   require more (and you can see how awful the MMX/SSE UInstrs are).
>
> Julian, Jeremy and I have given quite a lot of thought to architecture
> ports. The key requirement, in our minds, is that we don't want an NxM
> code explosion, where N is the number of architectures, and M is the
> number of skins. (Also, factor in a xP for P different operating
> systems.) With the "fat UCode" approach you've taken, you unfortunately
> have this problem, as you've seen with all the skin changes that were
> required.

We need to make a strategic decision about porting soon. If we do
nothing someone for sure will do basically the same as Paul has done,
but for AMD64, and then we will surely be in #ifdef hell.

> As a first step, would you be interested in bundling it up (with "make
> dist") and putting it somewhere so people can try it? I can link to it
> from the Valgrind page.

Yes, good plan. Let others hammer on it a bit and see how it holds
together.

Congratulations on excellent hackery.

J
|
|
From: Paul M. <pa...@sa...> - 2004-03-04 01:12:50
|
Julian Seward writes:
> Amazing. /me is also impressed. How much work was it?
About 4 weeks' work, on my own time, since I started hacking on it. I
had been looking at the code and thinking about what would be needed
before that.
> > - How much have you tested it? Does it run big programs like OpenOffice
> > and Mozilla? Does it pass all the regression tests? ("make regtest")
>
> Yes, this is critical. How stable is it?
Mozilla started up but seemed to hang. I need to do some more work on
the pthreads stuff.
> Congratulations on excellent hackery.
Thanks.
Paul.
|
|
From: Julian S. <js...@ac...> - 2004-03-03 19:46:37
|
On Wednesday 03 March 2004 19:29, Madhu M Kurup wrote:
> Doug Rabson <df...@nl...> said on March 3, 2004:
> > On Wed, 2004-03-03 at 11:51, Nicholas Nethercote wrote:
> > > [Hmm... Doug -- would you be interested in doing the same with your
> > > x86/BSD patch?]
> >
> > My subversion tree is publicly accessible. You can check out my
> > 'reasonably stable' tree (based on valgrind cvs from about 10th
> > January) with:
>
> I can second that. The FreeBSD port works well - there are a few edge
> cases (zombie processes and getting stuck in poll loops), but mostly,
> it works perfectly.

In that case, can one or the other of you use the autotest scripts in
nightly/ to run overnight tests, so we can monitor the status?

For that matter, doing the same for the ppc port wouldn't be a bad
idea.

Thanks,
J
|
|
From: Doug R. <df...@nl...> - 2004-03-04 10:19:25
|
On Wed, 2004-03-03 at 19:47, Julian Seward wrote:
> On Wednesday 03 March 2004 19:29, Madhu M Kurup wrote:
> > Doug Rabson <df...@nl...> said on March 3, 2004:
> > > On Wed, 2004-03-03 at 11:51, Nicholas Nethercote wrote:
> > > > [Hmm... Doug -- would you be interested in doing the same with your
> > > > x86/BSD patch?]
> > >
> > > My subversion tree is publicly accessible. You can check out my
> > > 'reasonably stable' tree (based on valgrind cvs from about 10th
> > > January) with:
> >
> > I can second that. The FreeBSD port works well - there are a few edge
> > cases (zombie processes and getting stuck in poll loops), but mostly,
> > it works perfectly.
>
> In that case, can one or the other of you use the autotest scripts in
> nightly/ to run overnight tests, so we can monitor the status?
>
> For that matter, doing the same for the ppc port wouldn't be a bad
> idea.

I'll probably set this up fairly soon.
|
|
From: Julian S. <js...@ac...> - 2004-03-03 19:56:38
|
> reported errors a lot. Glibc seems to like using the add-fefefeff
> and or-7f7f7f7f tricks a lot. I have a formula for getting the
> valid bits for the result of an ADD exactly without too much effort.

Ha! Tell me what it is (pretty please). I have been wondering how to do
that for a long time, without success, since it would also improve accuracy
on x86.

> * I punted on VG_(saneUInstr) for now. I really should go back and
>   fill in the parts for the new UInstrs I added.

Yes ... do. You'd be amazed how many hours of debugging saneUInstr
has saved us collectively.

J
|
|
From: Paul M. <pa...@sa...> - 2004-03-04 01:12:50
|
Julian Seward writes:
> > reported errors a lot. Glibc seems to like using the add-fefefeff
> > and or-7f7f7f7f tricks a lot. I have a formula for getting the
> > valid bits for the result of an ADD exactly without too much effort.
>
> Ha! Tell me what it is (pretty please). I have been wondering how to do
> that for a long time, without success, since it would also improve accuracy
> on x86.

If you have values A and B, with valid bits QA and QB, let

    A_min = A & ~QA
    A_max = A | QA
    B_min = B & ~QB
    B_max = B | QB

If we then let the sum S = A + B, then

    QS = QA | QB | ((A_min + B_min) ^ (A_max + B_max))

Proof:

Let C be the carry-in to each bit of the sum. Then

    S = A ^ B ^ C

Thus QS = QA | QB | QC. If we can work out QC, we are set.

Now, C is monotonic in A and B, in the sense that if you change a bit
in A or B from 0 to 1, you may get bits in C changing from 0 to 1, but
you won't get bits changing from 1 to 0. (I can give a proof of this
if you like; it involves looking at which bit positions of the sum
generate or propagate carries.)

If we consider the range of possible A and B values from A_min to
A_max and B_min to B_max, we therefore have the smallest C value (in
an unsigned sense) for A_min + B_min, and the largest C value for
A_max + B_max. In other words

    C_min = (A_min + B_min) ^ A_min ^ B_min
    C_max = (A_max + B_max) ^ A_max ^ B_max

so

    QC = C_min ^ C_max
       = (A_min + B_min) ^ (A_max + B_max) ^ A_min ^ A_max ^ B_min ^ B_max
       = (A_min + B_min) ^ (A_max + B_max) ^ QA ^ QB

since QA = A_max ^ A_min and similarly for QB. Then

    QS = QA | QB | ((A_min + B_min) ^ (A_max + B_max) ^ QA ^ QB)

However, the only time the value of a bit of QC matters is when we
have 0 in the corresponding bits of QA and QB. Therefore we can drop
the "^ QA ^ QB" in the QC term of this expression, yielding:

    QS = QA | QB | ((A_min + B_min) ^ (A_max + B_max))

Regards,
Paul.
|
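The formula above can be sketched in a few lines of C. This is an illustration only, assuming 32-bit operands and the convention implied by the A_min/A_max definitions, namely that a 1 bit in QA or QB marks the corresponding value bit as undefined. The function name `add_vbits` is hypothetical and is not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: valid bits of S = A + B, following
 *   QS = QA | QB | ((A_min + B_min) ^ (A_max + B_max))
 * A set bit in qa, qb or the result means "undefined". */
uint32_t add_vbits(uint32_t a, uint32_t qa, uint32_t b, uint32_t qb)
{
    uint32_t a_min = a & ~qa;   /* every undefined bit of A taken as 0 */
    uint32_t a_max = a | qa;    /* every undefined bit of A taken as 1 */
    uint32_t b_min = b & ~qb;
    uint32_t b_max = b | qb;

    /* Bits where the smallest and largest possible sums differ are
     * the bits that an undefined input can influence via a carry. */
    return qa | qb | ((a_min + b_min) ^ (a_max + b_max));
}
```

For example, with A = 0 where only bit 4 is undefined and B = 0 fully defined, the result marks only bit 4 as undefined, because no carry can leave that position. With A = 0xFF where bit 0 is undefined and B = 1, the possible sums are 0xFF and 0x100, so bits 0 to 8 all come out undefined.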
|
From: Paul M. <pa...@sa...> - 2004-03-04 00:49:38
|
Nicholas Nethercote writes:
> - Could you write up a description of the main changes you made, which
> parts were affected a lot, which parts weren't affected much, anything
> you've learnt? No substitute for reading the entire patch, but valuable
> as a high-level introduction.
I'll do that, tonight hopefully.
> - Some big changes have been made ("full virtualization") since 2.1.0. I
> don't know how much these changes would affect your patch.
I'll check out the cvs repository on sourceforge.net and look at what
has changed. I assume that sourceforge.net is the right place to go?
> - How much have you tested it? Does it run big programs like OpenOffice
> and Mozilla? Does it pass all the regression tests? ("make regtest")
Not yet, still working on that.
> Julian, Jeremy and I have given quite a lot of thought to architecture
> ports. The key requirement, in our minds, is that we don't want an NxM
> code explosion, where N is the number of architectures, and M is the
> number of skins. (Also, factor in a xP for P different operating
> systems.) With the "fat UCode" approach you've taken, you unfortunately
> have this problem, as you've seen with all the skin changes that were
> required.
Mostly in the memcheck skin. I think that with the other skins, we
could avoid the need for changes in the skins with some suitable
abstractions. (I can see one exception to that though: the code in
cachegrind that works out what the cache parameters are.)
> Finding a design that supports multiple architectures without requiring
> skin rewriting is extremely challenging. We have a Wiki
> (www.goop.org/twiki/bin/view/Valgrind/ValgrindFutures) where we've
> discussed various things. Much of it is 6--9 months old, and a bit out
> of date. But the "IrKnowledge" and "IrOpinions" pages show some of our
> current thinking; they are a distillation of several failed
> half-attempts to come up with a concrete design.
Interesting. If we allow skins to extend the ucode then we will
inevitably have architecture-specific code in the skins. We could
import the ucode extensions from memcheck into the core and then not
allow skins to extend the ucode. That shouldn't be too bad since
skins can always use CCALL to do any funky stuff they need to do.
Memcheck is a bit of a special case since it is the "primary" skin and
because it is important that the valid bit computations are reasonably
fast. For that reason I wouldn't like to see all the tag computations
done as CCALLs. Some of them could be done with ordinary ucode
instructions (e.g. AND, OR) but some of them couldn't be done easily
and efficiently that way (e.g. Tag_PCast40).
> So I'm personally not convinced the approach you've taken is the right
> way to go -- it might be acceptable for two architectures, but what
> happens when someone wants a SPARC port, or an ARM port, etc?
> Obviously there must be some architecture-specific bits (eg. asm
> disassembly and code-regeneration), but keeping those bits separate from
> the skins seems, to me, crucial.
I agree that that is the direction to head. Valgrind is such a useful
tool though that I want to have something that is usable on PPC now,
without having to wait until we have worked out the exact right way to
do things. :)
> As a first step, would you be interested in bundling it up (with "make
> dist") and putting it somewhere so people can try it? I can link to it
> from the Valgrind page.
Good idea. I have put it at
http://ozlabs.org/~paulus/valgrind-2.1.0-ppc.tar.bz2
This includes fixes for a couple of bugs I found last night.
Regards,
Paul.
|
|
From: Tom H. <th...@cy...> - 2004-03-04 07:31:28
|
In message <164...@ca...>
Paul Mackerras <pa...@sa...> wrote:
> Nicholas Nethercote writes:
>
> > - Some big changes have been made ("full virtualization") since 2.1.0. I
> > don't know how much these changes would affect your patch.
>
> I'll check out the cvs repository on sourceforge.net and look at what
> has changed. I assume that sourceforge.net is the right place to go?
No - it was moved to the KDE repository some time ago. Full
details of how to check it out are on the web site.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Nicholas N. <nj...@ca...> - 2004-03-04 10:05:43
|
On Thu, 4 Mar 2004, Paul Mackerras wrote:
> > But the "IrKnowledge" and "IrOpinions" pages show some of our
> > current thinking; they are a distillation of several failed
> > half-attempts to come up with a concrete design.
>
> Interesting. If we allow skins to extend the ucode then we will
> inevitably have architecture-specific code in the skins. We could
> import the ucode extensions from memcheck into the core and then not
> allow skins to extend the ucode. That shouldn't be too bad since
> skins can always use CCALL to do any funky stuff they need to do.
> Memcheck is a bit of a special case since it is the "primary" skin and
> because it is important that the valid bit computations are reasonably
> fast. For that reason I wouldn't like to see all the tag computations
> done as CCALLs. Some of them could be done with ordinary ucode
> instructions (e.g. AND, OR) but some of them couldn't be done easily
> and efficiently that way (e.g. Tag_PCast40).

Our idea was to have a platform-independent language in which skins
would write their instrumentation. It would presumably look a lot
like UCode. As for skin-specific UInstrs, we thought ditching them would
be ok; Memcheck is the only one that uses them, and they're not really
necessary -- even Tag_PCast40 is just a NEG, SBB, OR which is currently
expressible in UCode.

We could have a special instruction in this instrumentation language with
which a skin can create any (arch-specific) instruction it wants by just
specifying the naked bytes, so skins could generate any instruction if
they really wanted (ie. make the common case easy, and the uncommon case
possible).

SIMD instructions and registers are a big complication. It seems that
programs will increasingly use them in ways that normal integer registers
are currently used, which will require having a sensible way to instrument
them.

> > So I'm personally not convinced the approach you've taken is the right
> > way to go -- it might be acceptable for two architectures, but what
> > happens when someone wants a SPARC port, or an ARM port, etc?
> > Obviously there must be some architecture-specific bits (eg. asm
> > disassembly and code-regeneration), but keeping those bits separate from
> > the skins seems, to me, crucial.
>
> I agree that that is the direction to head. Valgrind is such a useful
> tool though that I want to have something that is usable on PPC now,
> without having to wait until we have worked out the exact right way to
> do things. :)

No problem -- I'm sure other PowerPC users will be happy you've done so :)
And you've done a massively helpful thing by concretely identifying the
x86-specific parts of Valgrind.

> Good idea. I have put it at
>
> http://ozlabs.org/~paulus/valgrind-2.1.0-ppc.tar.bz2
>
> This includes fixes for a couple of bugs I found last night.

Thanks. I've linked to this from valgrind.kde.org. Keep updating it if
you make improvements (and the patch, too).

Paul, are you subscribed to valgrind-developers?

N
|
|
From: Jeremy F. <je...@go...> - 2004-03-04 22:45:08
|
On Thu, 2004-03-04 at 01:59, Nicholas Nethercote wrote:
> Our idea was to have a platform-independent language in which skins
> would write their instrumentation. It would presumably look a lot
> like UCode. As for skin-specific UInstrs, we thought ditching them would
> be ok; Memcheck is the only one that uses them, and they're not really
> necessary -- even Tag_PCast40 is just a NEG, SBB, OR which is currently
> expressible in UCode.

Yes, but not portably. It relies on the x86 behaviour of NEG setting
carry depending on whether its argument is zero or not. Unless we
define the NEG UOp to include this behaviour, it isn't very portable.
Paul's implementation of Tag_PCast40 is very different, and uses the
PPC's extensive set of bit-swizzling instructions.

It seems to me that the PCast* operations can be composed out of various
fairly generic pieces which we could make into UCode operations (things
to expand any 0 bit to all zero bits, any 1 bit to all 1 bits, etc).

> We could have a special instruction in this instrumentation language with
> which a skin can create any (arch-specific) instruction it wants by just
> specifying the naked bytes, so skins could generate any instruction if
> they really wanted (ie. make the common case easy, and the uncommon case
> possible).

Well, since we're planning to have one anyway to carry the client's
instructions through, it will just be a matter of inserting those.

J
|
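One of the generic pieces described above, expanding any 1 bit to all 1 bits, can be written without relying on a carry flag. The following is a minimal sketch in portable C; the name `pcast_any_to_all` is hypothetical, and the code shows only that one step, not the exact semantics of Tag_PCast40 in either the x86 or the PPC code.

```c
#include <stdint.h>

/* Hypothetical helper: returns all ones if any bit of q is set,
 * and zero otherwise, using no flags and no branches. */
uint32_t pcast_any_to_all(uint32_t q)
{
    /* For any nonzero q, (q | -q) has its top bit set, because -q
     * keeps the lowest set bit of q and inverts every bit above it. */
    uint32_t top = (q | (uint32_t)(0u - q)) >> 31;   /* 1 if q != 0 */

    /* 0 - 1 wraps to 0xFFFFFFFF; 0 - 0 stays 0. */
    return (uint32_t)(0u - top);
}
```

Written this way, the operation needs only negate, OR, shift and subtract, which most architectures provide directly, so it does not depend on how a particular NEG instruction sets carry.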