|
From: Ivo R. <iv...@iv...> - 2016-07-17 20:07:40
|
Dear developers,
as you may be aware, we are currently working on sparcv9 support
for Valgrind [1]. The ISA includes crypto instructions, such as md5,
sha256, AES, Camellia, Montgomery multiplication and squaring, and others [2].
Some of these instructions can have inputs of up to several kilobytes,
utilizing sparc register windows and floating-point registers.
We have some doubts about the best way of implementing support for these
complex instructions in the VEX frontend and eventually backend (isel).
So far we have come up with 3 possible approaches:
1. Describe a crypto instruction with IR opcodes.
Pros: clean approach, does not require any isel support, works for
cross-arch analysis
Cons: bloated IR tree, especially after tool instrumentation; processing
is expected to be slow
2. Utilize a clean helper function which will compute required output.
Pros: IR tree will be small, instrumentation relatively fast
Cons: unclear how to pass inputs to and outputs from the helper efficiently,
logic is "hidden" behind the helper
3. Add a new IR opcode for a crypto instruction.
Pros: clear intent, smallest IR tree
Cons: need to add support to VEX isel and all tools,
unclear how to effectively allocate virtual/host registers
I would like to hear your thoughts, comments, suggestions.
Thank you,
I.
[1] https://bitbucket.org/iraisr/valgrind-solaris
[2] http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/sparc-architecture-2015-2868130.pdf
|
|
From: Julian S. <js...@ac...> - 2016-07-24 12:52:31
|
> as you may be aware, we are currently working on sparcv9 support
> for Valgrind [1]. Part of the ISA are crypto instructions, such as md5,
> sha256, AES, Camellia, Montgomery multiplication and squaring, and others [2].
> Some of these instructions can have inputs of up to several kilobytes,
> utilizing sparc register windows and floating-point registers.
Hmm, I see that XMONT* access 7 register windows (!). That must be a
complete nightmare to implement in hardware.
> 1. Describe a crypto instruction with IR opcodes.
> Pros: clean approach, does not require any isel support, works for
> cross-arch analysis
> Cons: bloated IR tree, especially after a tool instrumentation, expecting
> slow processing
Agree .. I don't think this is a good solution.
> 2. Utilize a clean helper function which will compute required output.
> Pros: IR tree will be small, instrumentation relatively fast
> Cons: unclear how to pass effectively inputs and outputs from the helper,
> logic is "hidden" behind the helper
This is probably your best bet, if you can do it. Can you split the
problem into a sequence of C helper calls, in which each helper takes
256 or 384 bits of input, and returns a 128 bit result? I used this
technique recently to implement arm64 AES and SHA instructions -- have
a look.
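
A minimal sketch of the "sequence of helper calls" idea, under stated
assumptions: the U128 type, helper_step() and process_blocks() are
hypothetical names invented for illustration, and the arithmetic is a toy
placeholder rather than a real cipher or hash. It only shows the shape: each
call takes 256 bits of input (128 bits of running state plus one 128-bit
block) and produces a 128-bit result that feeds the next call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit result type; not a VEX type. */
typedef struct { uint64_t hi, lo; } U128;

/* Toy stand-in for one step of a crypto round: 256 bits in, 128 bits out.
   The mixing below is a placeholder, not a real algorithm. */
static void helper_step(U128 *out, uint64_t aHi, uint64_t aLo,
                        uint64_t bHi, uint64_t bLo)
{
    assert(out != NULL);
    out->hi = (aHi ^ bHi) + ((aLo << 7)  | (aLo >> 57));
    out->lo = (aLo ^ bLo) + ((aHi << 13) | (aHi >> 51));
}

/* Process a large input as a chain of fixed-size helper calls. Each call
   consumes the running 128-bit state plus one 128-bit input block
   (two 64-bit words) and yields the next state. */
static U128 process_blocks(U128 state, const uint64_t *words, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        U128 next;
        helper_step(&next, state.hi, state.lo, words[2 * i], words[2 * i + 1]);
        state = next;
    }
    return state;
}
```

In a real frontend each helper_step() call would correspond to one helper
call emitted into the IR, so the IR stays small while a large input is still
consumed piece by piece.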
> 3. Add a new IR opcode for a crypto instruction.
> Pros: clear intent, smallest IR tree
> Cons: need to add support to VEX isel and all tools,
> unclear how to effectively allocate virtual/host registers
That sounds complex and difficult from a register allocation point of view.
There are two other possible solutions:
[1] This is a horrible hack. Try to avoid it. It can cause
the guest program to observe different results natively vs on Valgrind,
if it is buggy. But anyway:
Generate IR like this:
* move the (guest) SP down by (eg) 1024 bytes.
* copy all guest registers into the newly created area on the guest
stack
* call a dirty helper function to do the computation, passing it the
SP value as a parameter
* copy values out of memory area back into guest registers
* move SP back up 1024 bytes
Problem is that if the guest program has for any reason stored values
on the stack below SP then they will be corrupted. We had an obscure
and longstanding bug on x86_64 for this reason. Also, if the program
takes a signal in the middle of this sequence then the state may be
corrupted. (Not entirely sure about that, but ..)
[2] This is better but whether it actually works depends on the exact
details of which registers are accessed, and whether you can describe
that in the dirty-helper side-effect annotations. Which -- I suspect
you will have problems with because of the register windows. Anyway:
* generate a single dirty helper call, passing it a pointer to the
VexGuestSPARC64State struct and any other params you require.
* Write C to do the operations directly on that state
* [the difficult bit] make sure you can actually describe, in the
IRDirty::fxState area, which parts of the register state the
helper reads and writes. If you can't, it's game over :-(
See guest_x86_toIR.c "FNSAVE m108" for an example.
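
The "difficult bit" above can be illustrated with a self-contained toy model.
Everything here is an assumption made for illustration: ToyGuestState,
ToyFxState, toy_helper() and writes_are_declared() are hypothetical and are
not VEX types or functions, and the real VexGuestSPARC64State layout is
different. The model only mirrors the idea that each annotation names an
effect (read, write, modify) together with a byte offset and size in the
guest state, and that the helper must not touch anything outside the
declared ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much simplified guest state. */
typedef struct {
    uint64_t gpr[32];   /* current register window only */
    uint64_t fpr[32];   /* floating-point registers */
} ToyGuestState;

typedef enum { FX_READ, FX_WRITE, FX_MODIFY } ToyFx;

/* Models one side-effect annotation: a byte range of the guest state. */
typedef struct { ToyFx fx; size_t offset; size_t size; } ToyFxState;

/* Annotations for a toy helper that reads fpr[0..7], modifies fpr[8..9]. */
static const ToyFxState toy_fx[] = {
    { FX_READ,   offsetof(ToyGuestState, fpr), 8 * sizeof(uint64_t) },
    { FX_MODIFY, offsetof(ToyGuestState, fpr) + 8 * sizeof(uint64_t),
                 2 * sizeof(uint64_t) },
};

/* The helper operates directly on the guest state it is handed. */
static void toy_helper(ToyGuestState *st)
{
    uint64_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc ^= st->fpr[i];
    st->fpr[8] ^= acc;
    st->fpr[9] += acc;
}

/* Returns 1 if every byte that differs between 'before' and 'after' lies
   inside a range annotated as written or modified, 0 otherwise. */
static int writes_are_declared(const ToyGuestState *before,
                               const ToyGuestState *after,
                               const ToyFxState *fx, size_t nfx)
{
    const unsigned char *b = (const unsigned char *)before;
    const unsigned char *a = (const unsigned char *)after;
    for (size_t i = 0; i < sizeof(ToyGuestState); i++) {
        if (a[i] == b[i])
            continue;
        int covered = 0;
        for (size_t k = 0; k < nfx; k++)
            if (fx[k].fx != FX_READ &&
                i >= fx[k].offset && i < fx[k].offset + fx[k].size)
                covered = 1;
        if (!covered)
            return 0;
    }
    return 1;
}
```

A check of this kind is only a debugging aid for the annotations; it says
nothing about whether register windows that live outside the guest state can
be described at all.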
J
|
|
From: Ivo R. <iv...@iv...> - 2016-07-27 19:37:34
|
2016-07-24 14:52 GMT+02:00 Julian Seward <js...@ac...>:
> > 2. Utilize a clean helper function which will compute required output.
> > Pros: IR tree will be small, instrumentation relatively fast
> > Cons: unclear how to pass effectively inputs and outputs from the helper,
> > logic is "hidden" behind the helper
>
> This is probably your best bet, if you can do it. Can you split the
> problem into a sequence of C helper calls, in which each helper takes
> 256 or 384 bits of input, and returns a 128 bit result? I used this
> technique recently to implement arm64 AES and SHA instructions -- have
> a look.
>
Yes, thank you, I've had a look. But it seems arm64 instructions
are not so complex - they take 3 or 4 registers/operands.
Sparc crypto instructions are complex - they can take several tens of
registers, but no actual operands.
> [2] This is better but whether it actually works depends on the exact
> details of which registers are accessed, and whether you can describe
> that in the dirty-helper side-effect annotations. Which -- I suspect
> you will have problems with because of the register windows. Anyway:
> * generate a single dirty helper call, passing it a pointer to the
> VexGuestSPARC64State struct and any other params you require.
> * Write C to do the operations directly on that state
> * [the difficult bit] make sure you can actually describe, in the
> IRDirty::fxState area, which parts of the register state the
> helper reads and writes. If you can't, it's game over :-(
>
I took this approach with the initial 'md5' implementation:
https://bitbucket.org/iraisr/valgrind-solaris/commits/dd966b975760920e14342ad5bc076109a3c6942c
We do not maintain all register windows in the guest state.
After some prototyping, we maintain only the current register window
in the guest state, as all normal operations cannot address the other ones,
anyway. The other register windows are on the stack, where they would
eventually end up anyway.
So I think this approach will suit our needs the best.
Thank you for your response!
I.
|