|
From: Oriol P. <aut...@gm...> - 2007-06-06 13:52:41
|
Hi,

My name is Oriol. I'm playing with Valgrind, trying to implement a tool
for generating traces of PowerPC binaries.

I found that some instructions, like dcbst (Data Cache Block Store) and
dcbt (Data Cache Block Touch), are translated to no-ops in the toIR.c
source file. I don't think that's strictly correct, because these
instructions sometimes do access memory.

I know they are neither loads nor stores, but they do access memory,
transferring bytes between it and the cache.

What do you think?

Oriol
|
From: Nicholas N. <nj...@cs...> - 2007-06-06 22:04:16
|
On Wed, 6 Jun 2007, Oriol Prat wrote:
> I'm playing with Valgrind trying to implement a tool for generating
> traces of powerpc binaries.
> I found that some instructions like dcbst (Data Cache Block Store) or
> dcbt (Data Cache Block Touch) are translated to no-op in toIR.c code
> file.
> I think it's not strictly true because this kind of instructions
> sometimes does memory accesses.
>
> I know that they are neither loads nor stores, but they access to
> memory giving bytes to the cache.

Hmm, yes. From
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/dcbst.htm :

  The dcbst instruction causes any modified copy of the block to be
  copied to main memory.

  If RA is not 0, the dcbst instruction computes an effective address
  (EA) by adding the contents of general-purpose register (GPR) RA to
  the contents of GPR RB. Otherwise, the EA is the contents of RB. If
  the cache block containing the addressed byte is in the data cache
  and is modified, the block is copied to main memory.

  The dcbst instruction may be used to ensure that the copy of a
  location in main memory contains the most recent updates. This may be
  important when sharing memory with an I/O device that does not
  participate in the coherence protocol. In addition, the dcbst
  instruction can ensure that updates are immediately copied to a
  graphics frame buffer.

  Treat the dcbst instruction as a load from the addressed byte with
  respect to address translation and protection.

In other words, it copies a value from the cache to main memory if main
memory is not up-to-date with respect to the cache. This is required
with I/O devices that are not aware of the possible difference between
the cache and main memory. Memcheck should treat it as a load, e.g.
check that the address is addressable.
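The EA rule quoted above (RA = 0 means "use zero", not "use the contents of r0") is easy to get wrong in a decoder, so here is a minimal C sketch of it. The `ppc_regs` type and the `ppc_xform_ea` function are hypothetical illustrations, not Valgrind's guest-state layout.

```c
#include <stdint.h>

/* Hypothetical register file: 32 GPRs of a 64-bit guest.
 * This is NOT Valgrind's guest-state layout, just an illustration. */
typedef struct {
    uint64_t gpr[32];
} ppc_regs;

/* Effective address for X-form cache instructions such as dcbst/dcbt:
 *   EA = (RA == 0 ? 0 : GPR[RA]) + GPR[RB]
 * Unsigned arithmetic wraps modulo 2^64, as in 64-bit mode. */
uint64_t ppc_xform_ea(const ppc_regs *r, unsigned ra, unsigned rb)
{
    /* RA == 0 selects the literal value 0, not the contents of r0. */
    uint64_t base = (ra == 0) ? 0 : r->gpr[ra];
    return base + r->gpr[rb];
}
```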
And from
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/dcbt.htm :

  The dcbt instruction may improve performance by anticipating a load
  from the addressed byte. The block containing the byte addressed by
  the effective address (EA) is fetched into the data cache before the
  block is needed by the program. The program can later perform loads
  from the block and may not experience the added delay caused by
  fetching the block into the cache. Executing the dcbt instruction
  does not invoke the system error handler.

  If general-purpose register (GPR) RA is not 0, the effective address
  (EA) is the sum of the content of GPR RA and the content of GPR RB.
  Otherwise, the EA is the content of GPR RB.

  Consider the following when using the dcbt instruction:

  * If the EA specifies a direct store segment address, the instruction
    is treated as a no-op.
  * The access is treated as a load from the addressed cache block with
    respect to protection. If protection does not permit access to the
    addressed byte, the dcbt instruction performs no operations.

  Note: If a program needs to store to the data cache block, use the
  dcbtst (Data Cache Block Touch for Store) instruction.

In other words, it's a prefetch, copying a value from main memory to the
cache. Memcheck should also treat it as a load.

Nick
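Because dcbt acts on the whole block containing the byte at EA, a trace tool would typically record the block rather than the single byte. A minimal sketch, assuming a 128-byte block (the real size is implementation-specific and must be a power of two); `CACHE_BLOCK_BYTES`, `cache_block_base` and `cache_blocks_spanned` are hypothetical names.

```c
#include <stdint.h>

/* Assumed block size; the real value depends on the PowerPC implementation. */
#define CACHE_BLOCK_BYTES 128u

/* First byte of the cache block that contains ea. */
uint64_t cache_block_base(uint64_t ea)
{
    /* Clear the low-order offset bits (valid because the size is a power of two). */
    return ea & ~(uint64_t)(CACHE_BLOCK_BYTES - 1);
}

/* Number of cache blocks touched by the byte range [addr, addr + len); len > 0. */
uint64_t cache_blocks_spanned(uint64_t addr, uint64_t len)
{
    uint64_t first = cache_block_base(addr);
    uint64_t last  = cache_block_base(addr + len - 1);
    return (last - first) / CACHE_BLOCK_BYTES + 1;
}
```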
|
From: Paul M. <pa...@sa...> - 2007-06-06 22:56:16
|
Oriol Prat writes:
> I'm playing with Valgrind trying to implement a tool for generating
> traces of powerpc binaries.
> I found that some instructions like dcbst (Data Cache Block Store) or
> dcbt (Data Cache Block Touch) are translated to no-op in toIR.c code
> file.
> I think it's not strictly true because this kind of instructions
> sometimes does memory accesses.
>
> I know that they are neither loads nor stores, but they access to
> memory giving bytes to the cache.
>
> What do you think about?

Those instructions don't generate an exception if the memory address
isn't mapped or can't be accessed, and they don't cause any transfer of
data between registers and memory. In other words, they don't affect
the architected state of the machine at all. They are just hints, and
in fact the architecture specification does not require them to do
anything at all. So treating them as no-ops is fine.

Paul.
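To illustrate Paul's point that these are pure hints: GCC and Clang expose a comparable hint through `__builtin_prefetch`, and whatever the compiler emits for it (possibly nothing), the program's result is unchanged. On PowerPC GCC typically lowers it to dcbt or dcbtst, but that depends on the target and flags. The prefetch distance of 8 elements below is an arbitrary assumption.

```c
#include <stddef.h>

/* Sum an array while hinting that an element 8 positions ahead will be read.
 * __builtin_prefetch is a GCC/Clang builtin; it never changes the result,
 * and the compiler may emit no instruction for it at all. */
long sum_with_prefetch(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8], 0 /* read */, 3 /* high temporal locality */);
        s += a[i];
    }
    return s;
}

/* Same computation without any hint, for comparison. */
long sum_plain(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Any difference between the two functions can only show up in timing, never in the returned value.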
> Consider the following when using the dcbt instruction:
>
> * If the EA specifies a direct store segment address, the instruction
>   is treated as a no-op.
> * The access is treated as a load from the addressed cache block with
>   respect to protection. If protection does not permit access to the
>   addressed byte, the dcbt instruction performs no operations.

Note that there can be *no exceptions* (of any kind) as a result of dcbt
and dcbst. Therefore, *any* address is perfectly valid.

> Note:
> If a program needs to store to the data cache block, use the dcbtst
> (Data Cache Block Touch for Store) instruction.
>
> In other words, it's a prefetch, copying a value from main memory to the
> cache. Memcheck should treat it also as a load.

dcbt (and dcbtst) is strictly a _hint_ about _performance_ that the
programmer or compiler chooses to give to the hardware. The hardware is
free to ignore *all* hints, and some implementations *do* ignore
dcbt/dcbtst entirely. dcbt/dcbtst does not alter semantics in *any* way.

In contrast, dcbz _does_ alter the semantics. Namely: store zero into
the entire cache line (and therefore, logically, into memory) without
fetching the line from memory.

Memcheck (and everything except Cachegrind) should treat dcbt and dcbtst
as no-ops. Even if the address is "wild", there is no change to the
output. It might be somewhat slower [or faster, because performance is
not necessarily monotonic with caching when there is contention], but
that is the only possible effect.

--
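The contrast drawn above between dcbt and dcbz can be shown with a toy model over a flat byte array. This is a sketch assuming a 128-byte block and a memory array large enough to hold the addressed block; `model_dcbt` and `model_dcbz` are hypothetical names, and the translation and protection checks real hardware performs are not modelled.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 128u   /* assumed cache block size */

/* dcbt: a performance hint only; no architected state changes. */
void model_dcbt(uint8_t *mem, uint64_t ea)
{
    (void)mem;
    (void)ea;
}

/* dcbz: the whole block containing ea logically becomes zero in memory.
 * No bounds checking: the caller must ensure the block lies inside mem. */
void model_dcbz(uint8_t *mem, uint64_t ea)
{
    uint64_t base = ea & ~(uint64_t)(BLOCK - 1);
    memset(mem + base, 0, BLOCK);
}
```

A tool like Memcheck would therefore need to see dcbz as a block-sized store, while dcbt can stay a no-op.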
|
From: Nicholas N. <nj...@cs...> - 2007-06-06 23:57:45
|
On Wed, 6 Jun 2007, John Reiser wrote:
> dcbt (and dcbtst) is strictly a _hint_ about _performance_ that the
> programmer or compiler chooses to give to the hardware.

Ah, my mistake. Thanks for the correction.

So the good news is that Valgrind is correct and we don't need to change
anything :)

Nick
|
From: Oriol P. <aut...@gm...> - 2007-06-07 08:57:19
|
Hi,

The processor may or may not ignore the instruction, so if it does
execute the instruction and the address is valid, some memory-to-cache
traffic could occur. I know there is no change to the processor state,
but I agree with John that Cachegrind should know about these possible
cache accesses.

I don't know if the other architectures supported by Valgrind have
similar prefetch instructions.

I would propose a new IR statement like Ist_Cache or Ist_Prefetch that
wouldn't change the processor state but would carry a memory reference
and a data length. Then tools could look at these memory addresses and
count them or not.

Does that make sense? Is it absurd? :)

Oriol
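One possible shape for the proposed statement, sketched in C as a plain record rather than as real VEX IR. `prefetch_stmt`, `pf_kind`, `pf_stats` and `on_prefetch` are hypothetical names that do not exist in Valgrind; the point is only that the statement carries an address and a length and leaves guest state untouched, so each tool decides whether to count it.

```c
#include <stdint.h>

/* Hypothetical kinds of cache-management events (not VEX names). */
typedef enum {
    PF_TOUCH_LOAD,   /* dcbt   */
    PF_TOUCH_STORE,  /* dcbtst */
    PF_WRITEBACK     /* dcbst  */
} pf_kind;

/* Hypothetical statement payload: an address and a length, no state change. */
typedef struct {
    pf_kind  kind;
    uint64_t addr;   /* effective address */
    uint32_t len;    /* bytes covered, e.g. one cache block */
} prefetch_stmt;

/* Per-kind counters a tracing tool might keep. */
typedef struct {
    uint64_t count[3];
    uint64_t bytes[3];
} pf_stats;

/* A tracing tool could count these events; a tool like Memcheck would ignore them. */
void on_prefetch(pf_stats *st, const prefetch_stmt *s)
{
    st->count[s->kind]++;
    st->bytes[s->kind] += s->len;
}
```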
|
From: Josef W. <Jos...@gm...> - 2007-06-07 10:32:11
|
On Thursday 07 June 2007, Oriol Prat wrote:
> I don't know if the other architectures supported by Valgrind have
> such kind of prefetch instructions.

There is a prefetch instruction as part of the SSE instruction set on
x86. I do not know if there is a "write-back-cacheline" instruction.

> I would propose a new IR instruction like Ist_Cache or Ist_Prefetch
> that would'nt change the processor state but have a structure with a
> memory reference and a data length. Then tools could access to these
> memory addresses and count them or not.
>
> Has it sense?

Yes, it could be useful.

However, for Cachegrind with its simple cache model, it probably
would not (and should not) change anything.

Cachegrind only decides whether a memory access induces a miss or a
hit, assuming a synchronous cache without any prefetch functionality.

Cachegrind does not record whether a cache line was modified or not.
So executing dcbst ("write back modified cache line") has no influence
on Cachegrind's internal simulation state or its results.

Similarly, meaningful simulation of prefetching is not really possible,
as you need to decide whether a prefetch completed in time or not.
Since Cachegrind has no notion of time, the easiest approach is a
worst-case (1) or best-case (2) simulation, i.e.
(1) every prefetch is unsuccessful, or (2) every prefetch completes in
time, before the data is actually used.
You could say that Cachegrind currently always does (1), and thus any
software prefetch instructions, like dcbt, can be safely ignored.

Callgrind can optionally simulate a stream prefetcher with semantics
(2), i.e. a best-case scenario. For this, your suggestion of adding an
IR statement for software prefetch instructions is useful and should be
taken into account.
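The two timeless policies (1) and (2) can be made concrete with a toy direct-mapped cache. This sketch assumes 64 lines of 64 bytes and tracks no dirty bits, which is exactly why a dcbst-style write-back hint has nothing to act on. All names are hypothetical; this is not Cachegrind's or Callgrind's actual simulator.

```c
#include <stdint.h>
#include <stdbool.h>

#define LINES      64u   /* assumed number of lines */
#define LINE_BYTES 64u   /* assumed line size */

typedef enum {
    PF_IGNORE,   /* (1) worst case: prefetches never help */
    PF_PERFECT   /* (2) best case: prefetches always arrive in time */
} pf_policy;

typedef struct {
    uint64_t  tag[LINES];    /* full line number stored as the tag */
    bool      valid[LINES];
    uint64_t  misses;
    pf_policy policy;
} toy_cache;

/* Bring the line holding addr into the cache; optionally count a miss. */
static void install(toy_cache *c, uint64_t addr, bool count_miss)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned set = (unsigned)(line % LINES);
    if (!c->valid[set] || c->tag[set] != line) {
        if (count_miss)
            c->misses++;
        c->valid[set] = true;
        c->tag[set] = line;
    }
}

/* A real load or store. */
void cache_access(toy_cache *c, uint64_t addr)
{
    install(c, addr, true);
}

/* A software prefetch (dcbt-like): free and instant under (2), dropped under (1). */
void cache_prefetch(toy_cache *c, uint64_t addr)
{
    if (c->policy == PF_PERFECT)
        install(c, addr, false);
}

/* A write-back hint (dcbst-like): no dirty state is modelled, so nothing happens. */
void cache_writeback(toy_cache *c, uint64_t addr)
{
    (void)c;
    (void)addr;
}
```

Feeding the same sequential stream through both policies gives one miss per line under (1) and none under (2), so the two settings can only provide bounds, not a prediction.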
Josef
|
|
From: Oriol P. <aut...@gm...> - 2007-06-07 11:14:35
|
I agree, Josef.
So, one question:
does anybody know how to access the process state?
I'm interested in calculating the address that these instructions access.
At run time I need the values of the rA and rB registers that are
encoded in the instruction.
Is there any way to do that through the Valgrind API?
I already have the opcode decoder; I only need the location of these
registers within the process state in memory.
Thanks,
Oriol
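Extracting the rA and rB fields from the instruction word is mechanical once the X-form layout is known. A minimal sketch, using the extended opcodes for dcbst (54), dcbtst (246), dcbt (278) and dcbz (1014) under primary opcode 31; `xform_fields`, `decode_xform` and `is_cache_mgmt_insn` are hypothetical names. Reading the run-time values of those registers from the guest state is not shown, since that depends on the Valgrind tool interface.

```c
#include <stdint.h>
#include <stdbool.h>

/* Register and extended-opcode fields of an X-form instruction. */
typedef struct {
    unsigned ra, rb, xo;
} xform_fields;

/* IBM bit numbering (bit 0 = MSB): primary opcode 0-5, RA 11-15,
 * RB 16-20, extended opcode 21-30. */
static xform_fields decode_xform(uint32_t insn)
{
    xform_fields f;
    f.ra = (insn >> 16) & 0x1F;
    f.rb = (insn >> 11) & 0x1F;
    f.xo = (insn >> 1) & 0x3FF;
    return f;
}

/* True for dcbst, dcbtst, dcbt and dcbz; fills *out with the decoded fields. */
bool is_cache_mgmt_insn(uint32_t insn, xform_fields *out)
{
    if ((insn >> 26) != 31)
        return false;
    xform_fields f = decode_xform(insn);
    switch (f.xo) {
    case 54:    /* dcbst  */
    case 246:   /* dcbtst */
    case 278:   /* dcbt   */
    case 1014:  /* dcbz   */
        *out = f;
        return true;
    default:
        return false;
    }
}
```

With rA and rB decoded, the effective address follows the rule quoted earlier in the thread: GPR[rB], plus GPR[rA] only when rA is not 0.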
|