|
From: Josep M. P. C. <jos...@bs...> - 2006-12-20 14:34:23
Attachments:
valgrind-cell.diff
|
Hi,

I have been trying to use valgrind on the Cell architecture. In order to make use of the Cell SPEs, I have defined two missing system calls. Find attached the diff against version 3.2.1.

Yours faithfully,
Josep M. Perez |
|
From: Nicholas N. <nj...@cs...> - 2006-12-20 22:19:19
|
On Wed, 20 Dec 2006, Josep M. Perez Cancer wrote:

> I have been trying to use valgrind on the Cell architecture.

Wow... have you had any success? AIUI, Cell has a PowerPC at its core and then 8 vector CPUs (SPUs), right? It looks like the Linux programming model is to push off units of computation to the SPUs via syscalls. Is that right? In that case, all the work done by the SPUs won't be visible to Valgrind.

Hmm, looking at
www.power.org/resources/devcorner/cellcorner/CellTraining_Track1/CourseCode_L2T1H1-30_LinuxonCell-Kernel.pdf
it seems that SPUs might be able to call code on the main CPU? That would present difficulties for Valgrind. Also, the SPUs' local memories are mapped into the main address space? That will be difficult to handle too. The signal handling details look tricky as well.

> In order to make use of the Cell SPEs, I have defined two missing system
> calls. Find attached the diff against version 3.2.1.

Nick |
|
From: Josep M. P. C. <jos...@bs...> - 2006-12-21 10:33:54
|
Nicholas Nethercote wrote:

> On Wed, 20 Dec 2006, Josep M. Perez Cancer wrote:
>
>> I have been trying to use valgrind on the Cell architecture.
>
> Wow... have you had any success? AIUI, Cell has a PowerPC at its core
> and then 8 vector CPUs (SPUs), right? It looks like the Linux
> programming model is to push off units of computation to the SPUs via
> syscalls. Is that right? In that case, all the work done by the SPUs
> won't be visible to Valgrind.
>
> Hmm, looking at
> www.power.org/resources/devcorner/cellcorner/CellTraining_Track1/CourseCode_L2T1H1-30_LinuxonCell-Kernel.pdf
> it seems that SPUs might be able to call code on the main CPU? That
> would present difficulties for Valgrind. Also, the SPUs' local
> memories are mapped into the main address space? That will be
> difficult to handle too.
> The signal handling details look tricky as well.

Sorry I haven't been more specific. This patch is just a small step towards supporting the Cell architecture. Before applying this patch, all Cell-specific system calls would fail. The patch just lets the program continue running, but does not solve any of the points that you raised. Specifically, it doesn't take into account anything that happens on the SPUs, nor any DMA transfer. For example, memcheck can't detect any DMA accesses.

Just to give my personal view of what would have to be implemented:

1. Cell BE specific system call support (done).
2. SPU code to UCode translation and UCode to SPU code translation.
3.a. Adding a translation phase to the preamble of the spu_run system call.
3.b. Or replacing some libspe functions so that they do the translation of the SPU code before sending it to their local store.
4. Adding the ability to handle several independent address spaces inside valgrind.
5. Adding the ability to recognise mmapped SPU memory in the PPU address space.
6. Making the SPU side capable of recognising DMA commands as memory accesses affecting both the SPU address space and either the PPU address space or the address space of another SPU.
7. Adding a mechanism to cope with SPU code (and data) that does not fit in the SPU memory after translation.
8. Adding a mechanism to send gathered data and requests from SPU to PPU and from PPU to SPU.
9. Adding a mechanism to track mailboxes (reads from empty mailboxes, ...).
10. Adding support for SPU events.
11. Porting the existing tools.

This list is probably missing some things. Nevertheless, implementing full support for the Cell BE architecture will require a huge amount of work. At this point, I only need point one implemented, which is what this patch does. Please feel free to comment on missing items.

Josep M. Perez

>> In order to make use of the Cell SPEs, I have defined two missing
>> system calls. Find attached the diff against version 3.2.1.
>
> Nick |
|
From: Josef W. <Jos...@gm...> - 2006-12-22 11:13:40
|
On Thursday 21 December 2006 11:33, Josep M. Perez Cancer wrote:

> 4. Adding the ability of handling several independent address spaces
> inside valgrind.

VG would probably use one PPU process per PPU/SPU, so there is no need to support independent address spaces.

It really would be nice to get memcheck working when multiple processes are communicating via shared memory. This should be possible when every process runs in VG: they have to share VG's meta information for the shared address ranges.

DMA transfers in the case of Cell are probably easy after implementing the general feature.

Sharing meta information among different VG processes could even be a starting point to speed up multithreaded code on multiprocessor (-core) machines.

> 5. Adding a mechanism to cope with SPU code (and data) that after the
> translation doesn't fit on the SPU memory.

I cannot imagine you ever want to run VG itself partly on an SPU. Why would you?

Josef |
|
From: Josep M. P. C. <jos...@bs...> - 2006-12-22 11:56:59
|
Josef Weidendorfer wrote:

> On Thursday 21 December 2006 11:33, Josep M. Perez Cancer wrote:
>
>> 4. Adding the ability of handling several independent address spaces
>> inside valgrind.
>
> VG would probably use one PPU process per PPU/SPU, so there is no
> need to support independent address spaces.
>
> It really would be nice to get memcheck working when multiple
> processes are communicating via shared memory. This should be
> possible when every process runs in VG: they have to share VG's meta
> information for the shared address ranges.
>
> DMA transfers in the case of Cell are probably easy after
> implementing the general feature.
>
> Sharing meta information among different VG processes could even be
> a starting point to speed up multithreaded code on multiprocessor
> (-core) machines.

I fully agree. That would help, and it is generic enough to be useful on all platforms.

>> 5. Adding a mechanism to cope with SPU code (and data) that after the
>> translation doesn't fit on the SPU memory.
>
> I cannot imagine you ever want to run VG itself partly on an SPU.
> Why would you?

As you mentioned above, you could think of an SPU thread as an independent process with shared memory and special communication mechanisms. In fact, it is possible to build an SPU binary and run it from the command line. It would be nice if we could run memcheck on such a program, especially since the local store has no memory protection.

Furthermore, a program with both PPU and SPU parts typically starts its SPU parts by launching SPU binaries that have been embedded into the program itself. In that case, it would help if we could start the main program under memcheck and have the spawned SPU "threads" run automatically under memcheck as well (even if independently). The last step would be to use the shared memory infrastructure that you mentioned to coordinate the PPU and SPU parts.

For example, if the SPU program does a DMA transfer to bring a piece of main memory into its local store, it could also bring in the validity information for that memory. The SPU program could then run with memory checks. At some point the SPU program may send its output back to main memory; at that point, the updated memory information for the output data could also be sent back to the PPU, which could merge it with its existing information.

Josep M. Perez

> Josef |