From: Paul F. <pa...@so...> - 2025-11-06 07:35:38
https://sourceware.org/cgit/valgrind/commit/?id=85a06581cc9e725103a2dda0acfcf95b95944755

commit 85a06581cc9e725103a2dda0acfcf95b95944755
Author: Paul Floyd <pj...@wa...>
Date:   Thu Nov 6 08:33:42 2025 +0100

    doc: add a text file describing client syscall argument handling

    To help understand all of the shuffling that gets done for "syscall syscall".

Diff:
---
 docs/internals/client_syscall_arguments.txt | 112 ++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

diff --git a/docs/internals/client_syscall_arguments.txt b/docs/internals/client_syscall_arguments.txt
new file mode 100644
index 0000000000..1ef113db84
--- /dev/null
+++ b/docs/internals/client_syscall_arguments.txt
@@ -0,0 +1,112 @@
+Client Syscall Arguments
+========================
+
+This document describes how Valgrind handles arguments for client syscalls.
+Everything described here takes place in VG_(client_syscall), syswrap-main.c.
+
+
+Data Structures
+~~~~~~~~~~~~~~~
+
+There are 3 data structures that get used during the argument handling.
+
+1. VexGuestArchState, the usual storage for registers.
+2. SyscallArgLayout, contains info about where the arguments are.
+3. SyscallArgs (two copies in SyscallInfo), contains the argument values.
+
+Flow
+~~~~
+
+The first main step in the function is to call the PRE syscall wrapper.
+That may perform the syscall (or simulate the syscall) and it may
+also mark the syscall as blocking. If the PRE did not mark the syscall
+as completed the function will proceed to make either a non-blocking or a
+blocking call. Lastly the POST gets called, if required.
+
+All of the above can be complicated by the fact that some platforms have
+a "syscall syscall". Most platforms have a libc function called "syscall()".
+On some platforms libc shuffles the arguments and just performs the
+requested syscall directly. Other platforms have a syscall for performing
+syscalls. There may even be more than one such syscall.
+In these cases it is the kernel that shuffles the arguments to pass them on
+to the appropriate syscall.
+
+The main platforms that have a "syscall syscall" are Darwin and FreeBSD.
+Linux mips32 also has some special handling for syscall syscall.
+
+In Valgrind when there is a "syscall syscall" we don't want to just pass
+all of the parameters through. If we did that then the "syscall syscall" PRE
+wrapper would need to handle all other kinds of syscalls, probably by some
+kind of second level of recursive call. This is not the approach that has been
+taken. Instead the arguments get "canonicalised" so that the PRE sees
+"syscall(SYS_write)" as if it were just a normal direct write syscall.
+
+The argument layout for such "syscall syscalls" is the same as normal syscalls
+but offset by one in register/stack positions. The first argument will be the
+number of syscall or __syscall itself. The second argument will be the number
+of the target normal syscall, followed by the target arguments.
+
+
+Flow in Detail
+~~~~~~~~~~~~~~
+
+1. Get the canonical arguments.
+Call getSyscallArgsFromGuestState().
+This stores the canonical arguments (syscall syscall format gets shuffled)
+in the SyscallArgs structure.
+
+2. Get the syscall argument layout
+This just initialises the fields of the SyscallArgLayout structure. The layout
+will be different depending on whether it is a normal syscall or a syscall
+syscall. It cannot be canonicalised - we can shuffle around the values but we
+can't shuffle around where they are stored.
+
+4. Call the syscall PRE wrapper
+The argument values are passed in a pointer to SyscallArgs. The fields of that
+structure are used by the ARGX and SARGX macros to access the argument values
+in the wrapper.
+
+The argument layout is passed in a pointer to SyscallArgLayout.
+The fields of this structure are used indirectly by the PRE_REG_READX macros
+(X being an integer for the argument position). For each argument the
+PRE_REG_READX macro uses a PRAX macro which in turn uses either PSRAn for
+stack accesses or PRRAn for register accesses. In the case of amd64 the
+location of argument 6 depends on whether it is a normal syscall or a syscall
+syscall. In the former case it will be in a register. In the latter case it
+will be on the stack. There is special handling for this case.
+
+If the syscall has not been completed by the PRE then either step 5 or step 6
+will be executed for blocking and non-blocking syscalls respectively.
+
+5. Perform a blocking syscall
+This is the more complicated of the two as we need to release the global lock,
+change to using the guest signal mask, do the syscall, restore the Valgrind
+signal mask and request the global lock again.
+
+A call to putSyscallArgsIntoGuestState() is made. The PRE may have changed some
+of the arguments so we need to put them back into VexGuestArchState.
+
+The syscall (and the signal mask handling) is performed in a call to
+do_syscall_for_client(). This takes the arguments other than the syscall number
+from VexGuestArchState.
+
+6. Perform a non-blocking syscall.
+
+This is much simpler. It performs the syscall via VG_(do_syscall).
+The arguments are passed via struct SyscallArgs (possibly modified by the PRE
+wrapper).
+
+7. Call VG_(post_syscall)()
+This will call the POST wrapper if required.
+
+Future Work
+~~~~~~~~~~~
+
+The flow would be simpler if do_syscall_for_client() used struct SyscallArgs
+to get the arg values like VG_(do_syscall). That would avoid having to
+put modified arguments back into the guest state. I have not checked, so
+I am not certain that the modified guest state is not visible after the syscall.
+
+The handling of "syscall syscall" does an excessive amount of shuffling,
+especially for the syscall number.
+I think that this can be simplified.
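
The "canonicalisation" of argument values described in the new document, and
the fact that the layout itself cannot be canonicalised, might be sketched
roughly as follows. This is only an illustration under stated assumptions, not
Valgrind's code: DemoGuestState, DemoSyscallArgs, DEMO_SYS_syscall,
demo_canonicalise() and demo_slot_of_arg() are all hypothetical names, and the
flat arg_slot[] array stands in for whatever mix of registers and stack slots a
real platform uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical syscall number for the "syscall syscall"; real values are
   platform specific. */
#define DEMO_SYS_syscall 0UL
#define DEMO_NARGS 8

/* Hypothetical stand-in for the raw guest state: the syscall number
   register plus the argument slots in the order the client filled them. */
typedef struct {
    unsigned long sysno_reg;
    unsigned long arg_slot[DEMO_NARGS + 1];
} DemoGuestState;

/* Hypothetical stand-in for the canonical argument values that a PRE
   wrapper would see. */
typedef struct {
    unsigned long sysno;
    unsigned long arg[DEMO_NARGS];
} DemoSyscallArgs;

/* Copy the argument values into canonical form. For a "syscall syscall"
   the first slot holds the target syscall number and every real argument
   sits one slot later, so the values are shifted down by one.
   Returns true if the call was a "syscall syscall". */
static bool demo_canonicalise(const DemoGuestState *gst,
                              DemoSyscallArgs *canon)
{
    assert(gst != NULL && canon != NULL);

    bool indirect = (gst->sysno_reg == DEMO_SYS_syscall);
    size_t shift = indirect ? 1 : 0;

    canon->sysno = indirect ? gst->arg_slot[0] : gst->sysno_reg;
    for (size_t i = 0; i < DEMO_NARGS; i++)
        canon->arg[i] = gst->arg_slot[i + shift];

    return indirect;
}

/* The layout cannot be canonicalised: canonical argument n still lives
   in slot n (normal syscall) or slot n + 1 ("syscall syscall"). */
static size_t demo_slot_of_arg(bool indirect, size_t n)
{
    return n + (indirect ? 1 : 0);
}
```

With sysno_reg set to DEMO_SYS_syscall and slots starting {4, 1, ...}, this
sketch yields a canonical sysno of 4 and arg[0] of 1, while demo_slot_of_arg()
still reports that arg[0] is stored in slot 1. Keeping values and locations
separate in this way mirrors the split between SyscallArgs and SyscallArgLayout
that the document describes.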