Disclaimer: All of the following information and changes were gathered and tested on an x86/WOW64 Win7 system only. Various aspects might differ on other OS versions and architectures.
User callbacks start in user space at ntdll!KiUserCallbackDispatcher. This function receives the callback ID, the callback code address, a pointer to the input data and the size of the input data.
The input data usually follows the arguments on the stack with just a little padding. The data seems to always start with an undocumented structure, CAPTUREBUF, which holds details about the call. Each kind of callback has its own customized structure, always with CAPTUREBUF at the beginning.
The kernel stores everything on the user stack, so CAPTUREBUF makes it possible to fix up any self-referential pointers in the data.
In order to let e.g. Memcheck see the passed data as correctly initialized, Valgrind would need to copy the whole callback parameter block to e.g. the guest stack, fix up any self-referential pointers in the contents and execute the original callback.
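As a rough illustration of the copy-and-fix-up step, here is a small Python model (not Valgrind code). The list of pointer offsets stands in for whatever CAPTUREBUF actually records; its real layout is undocumented, so `relocate_block` and its parameters are purely hypothetical, and pointers are assumed to be 32-bit little-endian as on x86.

```python
import struct

def relocate_block(block, old_base, new_base, pointer_offsets):
    """Copy `block` and rebase every 32-bit pointer that refers into the block.

    pointer_offsets is a hypothetical stand-in for the captured-pointer list.
    """
    out = bytearray(block)
    for off in pointer_offsets:
        (ptr,) = struct.unpack_from("<I", out, off)
        # Only pointers into the block itself move; external ones are kept.
        if old_base <= ptr < old_base + len(block):
            struct.pack_into("<I", out, off, ptr - old_base + new_base)
    return bytes(out)

# A 16-byte block at 0x0012F000: an external pointer at offset 0 and a
# self-referential pointer at offset 4 that points at the payload (offset 8).
block = struct.pack("<II8s", 0x77001000, 0x0012F008, b"payload!")
moved = relocate_block(block, 0x0012F000, 0x00200000, [0, 4])
```

After the call, the pointer at offset 4 refers to the payload at its new location, while the external pointer and the payload bytes are unchanged.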
This would be possible, because this is exactly what the mechanism was created to do (CAPTUREBUF as in "captured pointers"). Unfortunately, in what is probably the most widely used environment, WOW64, wow64user.dll converts the kernel x64 structures to userland x86 ones, and both the full input data size and the array of captured pointers are lost in the process: nothing further down the road checks the data size, and any pointers can simply refer to the original values.
So at least for WOW64, every user callback needs to be analyzed and forwarded correctly to the original handler. On the other hand, this has the advantage of knowing what is going on during the callback, which might prove useful for some new Valgrind tools.
In order to know what kind of marshalling to apply even though the IDs can change at any time, one needs to know at least the name of the callback. The names are obtained by resolving the function pointers stored in user32!_apfnDispatch[].
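The name lookup can be pictured with the following Python sketch. It assumes the dispatch table has already been read as a list of addresses and that a symbol loader can map an address to a name; the function name, the addresses and the symbol names are all made up for illustration.

```python
def callback_ids_by_name(dispatch_table, symbol_at):
    """Map callback names to their current IDs.

    dispatch_table: function addresses read from _apfnDispatch, in order;
                    the index of an entry is the callback ID.
    symbol_at:      address -> symbol name, as a symbol loader might report it.
    """
    ids = {}
    for cb_id, addr in enumerate(dispatch_table):
        name = symbol_at.get(addr)
        if name is not None:  # entries without a symbol stay unknown
            ids[name] = cb_id
    return ids

# Made-up addresses and names, for illustration only.
table = [0x77D01000, 0x77D01230, 0x77D01480]
symbols = {0x77D01000: "__fnEXAMPLE_A", 0x77D01480: "__fnEXAMPLE_B"}
ids = callback_ids_by_name(table, symbols)
```

Keying the result by name rather than by ID means the marshalling rules keep working when a different build reorders the table.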
To leave the guest process more pristine, this symbol loading is again done in the loader, with the communication using shared memory and events. One command gets the address of _apfnDispatch, and a second one resolves the function pointers located there. All of this is done during global init.
A table of known user callbacks and their data contents has been introduced. It uses a silly textual format so that new structure layouts can be defined rapidly. However, it still needs the manual analysis part, which sucks.
If the callback is known, it is forwarded properly to the guest stack, taking care of any pointers (mostly UNICODE_STRINGs). If the callback is not known, the original input data buffer is passed by pointer only, and the memory is not marked as readable for Valgrind. This seems to work, but of course produces Memcheck warnings.
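The two paths can be sketched in Python as follows. This is only a model under stated assumptions: a "layout" is reduced to a list of offsets of 32-bit pointers (the real textual format is not reproduced here), and `forward_callback`, `Forwarded` and the callback name `__fnEXAMPLE` are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

class Forwarded(NamedTuple):
    arg_addr: int          # pointer handed to the original handler
    copy: Optional[bytes]  # bytes placed on the guest stack, if any
    marked_defined: bool   # whether the tool was told the bytes are initialized

def forward_callback(name, data_addr, data, known_layouts, guest_sp):
    pointer_offsets = known_layouts.get(name)
    if pointer_offsets is None:
        # Unknown callback: pass the original buffer by pointer only.
        return Forwarded(data_addr, None, False)
    # Known callback: reserve 4-byte aligned room below the guest stack pointer.
    new_addr = (guest_sp - len(data)) & ~0x3
    out = bytearray(data)
    for off in pointer_offsets:
        (ptr,) = struct.unpack_from("<I", out, off)
        if data_addr <= ptr < data_addr + len(data):
            struct.pack_into("<I", out, off, ptr - data_addr + new_addr)
    return Forwarded(new_addr, bytes(out), True)

layouts = {"__fnEXAMPLE": [0]}  # one pointer field at offset 0
data = struct.pack("<I8s", 0x1004, b"abcdefgh")
known = forward_callback("__fnEXAMPLE", 0x1000, data, layouts, 0x8000)
unknown = forward_callback("__fnOTHER", 0x1000, data, layouts, 0x8000)
```

In this model the unknown case returns `marked_defined=False`, which corresponds to the Memcheck warnings mentioned above.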
The NtCallbackReturn() wrapper was modified to save the syscall parameters before popping the syscall stack; otherwise the return values from the callback would be invalid.
This section describes how Valgrind for Windows intercepts KiUserCallbackDispatcher.
General:
WIN32/WOW64:
WIN64:
==== ============ =================== ==== ======== ========================
Original ntdll!KiUserCallbackDispatcher Hooked ntdll!KiUserCallbackDispatcher
==== ============ =================== ==== ======== ========================
0000 83c404       add esp,4           0000 ff15     call [user_callback_
0003 5a           pop edx                  xxxxxxxx       dispatcher_ptr]
0004 64a118000000 mov eax,fs:[18h]    0006 90909090 nop*4
000a 8b4030       mov eax,[eax+30h]   000a 909090   nop*3
000d 8b402c       mov eax,[eax+2Ch]   000d 909090   nop*3
0010 ff1490       call [eax+edx*4]    0010 ff1490   call [eax+edx*4]
0013 33c9         xor ecx,ecx         0013 33c9     xor ecx,ecx
0015 33d2         xor edx,edx         0015 33d2     xor edx,edx
0017 cd2b         int 2Bh             0017 cd2b     int 2Bh
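The patch in the hooked column can be reproduced with a few lines of Python. This is a sketch for checking the byte counts, not the code Valgrind uses; `build_win32_hook`, `apply_hook` and the pointer address 0x00403000 are made up, while the original bytes are copied from the listing above.

```python
import struct

# Bytes of the original routine, copied from the listing (offsets 0x00-0x18).
ORIGINAL = bytes.fromhex(
    "83c404" "5a" "64a118000000" "8b4030" "8b402c" "ff1490" "33c9" "33d2" "cd2b"
)

def build_win32_hook(dispatcher_ptr_addr):
    """call [dispatcher_ptr_addr] followed by nops up to offset 0x10."""
    call_indirect = b"\xff\x15" + struct.pack("<I", dispatcher_ptr_addr)
    return call_indirect + b"\x90" * 10  # nop*4 + nop*3 + nop*3

def apply_hook(code, patch):
    """Overwrite the start of `code` with `patch`, keeping the tail intact."""
    return patch + code[len(patch):]

patched = apply_hook(ORIGINAL, build_win32_hook(0x00403000))
```

The patch is exactly 0x10 bytes long, so the `call [eax+edx*4]` at offset 0x10 and everything after it stay where they were.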
======== ============================ ======== =======================================
Stack layout at the entry time of     Stack layout at the entry time of
KiUserCallbackDispatcher              VG_(win_sc_win32_user_callback_dispatch)
======== ============================ ======== =======================================
                                      ESP+0    return address (into
                                               VG_(catch_user_callback_dispatcher))
                                      +0x04    address of dispatcher table (->EAX)
ESP+0    bogus return address         +0x08    KiUserCallbackDispatcher+0x0006 (->EIP)
+0x04    ID of the callback handler   +0x0c    ID of the callback handler (->EDX)
+0x08    ptr to CAPTUREBUF (ESP+0x10) +0x10    ptr to CAPTUREBUF (ESP+0x18)
+0x0c    size of the CAPTUREBUF (S)   +0x14    size of the CAPTUREBUF (S)
+0x10    start of CAPTUREBUF          +0x18    start of CAPTUREBUF
...                                   ...
+0x0c+S  last word in CAPTUREBUF      +0x14+S  last word in CAPTUREBUF
+0x10+S  caller of system call stub   +0x18+S  caller of system call stub
+0x14+S  1st syscall argument         +0x1c+S  1st syscall argument
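To make the original (left-hand) layout concrete, the following Python sketch decodes a fabricated stack image according to the offsets in the table. `read_win32_callback_frame` is a hypothetical helper, and the addresses and payload are invented test values.

```python
import struct

def read_win32_callback_frame(stack, esp):
    """Decode the frame seen at entry to the unhooked dispatcher.

    `stack` holds the memory starting at address `esp`.
    """
    # +0x00 bogus return address, +0x04 ID, +0x08 ptr, +0x0c size
    _bogus_ret, cb_id, buf_ptr, size = struct.unpack_from("<4I", stack, 0)
    start = buf_ptr - esp                      # 0x10 in the table
    capturebuf = stack[start:start + size]
    (caller,) = struct.unpack_from("<I", stack, start + size)  # +0x10+S
    return cb_id, capturebuf, caller

esp = 0x0012F000
payload = b"CAPTURE!"                          # stands in for S = 8 bytes
stack = (struct.pack("<4I", 0, 2, esp + 0x10, len(payload))
         + payload
         + struct.pack("<II", 0x7C90E4F4, 0x1234))
frame = read_win32_callback_frame(stack, esp)
```

For the hooked layout, the same fields sit 8 bytes higher because of the two extra slots pushed in front of them.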
Notes:
==== ============== ===================== ==== ============== =====================
Original ntdll!KiUserCallbackDispatcher Hooked ntdll!KiUserCallbackDispatcher
==== ============== ===================== ==== ============== =====================
0000 648b0d00000000 mov ecx,fs:[0h]       0000 648b0d00000000 mov ecx,fs:[0h]
0007 ba8000e877     mov edx,offset        0007 ba8000e877     mov edx,offset
                !KiUserApcDispatcher+0x48                 !KiUserApcDispatcher+0x48
000c 8d442410       lea eax,[esp+10h]     000c 8d442410       lea eax,[esp+10h]
0010 894c2410       mov [esp+10h],ecx     0010 894c2410       mov [esp+10h],ecx
0014 89542414       mov [esp+14h],edx     0014 89542414       mov [esp+14h],edx
0018 64a300000000   mov fs:[0h],eax       0018 64a300000000   mov fs:[0h],eax
001e 83c404         add esp,4             001e ff15xxxxxxxx   call [user_callback_
0021 5a             pop edx                                         dispatcher_ptr]
0022 64a130000000   mov eax,fs:[30h]      0024 90909090       nop*4
0028 8b402c         mov eax,[eax+2Ch]     0028 909090         nop*3
002b ff1490         call [eax+edx*4]      002b ff1490         call [eax+edx*4]
002e 50             push eax              002e 50             push eax
002f 6a00           push 0                002f 6a00           push 0
0031 6a00           push 0                0031 6a00           push 0
0033 e8a4f70000     call ZwCallbackReturn 0033 e8a4f70000     call ZwCallbackReturn
======== ============================ ======== =======================================
Stack layout at the entry time of     Stack layout at the entry time of
KiUserCallbackDispatcher              VG_(win_sc_wow64_user_callback_dispatch)
======== ============================ ======== =======================================
                                      ESP+0    return address (into
                                               VG_(catch_user_callback_dispatcher))
                                      +0x04    address of dispatcher table (->EAX)
ESP+0    0 (bogus return address)     +0x08    KiUserCallbackDispatcher+0x0024 (->EIP)
+0x04    ID of the callback handler   +0x0c    ID of the callback handler (->EDX)
+0x08    ptr to CAPTUREBUF (ESP+0x18) +0x10    ptr to CAPTUREBUF (ESP+0x20)
+0x0c    0 (no size S)                +0x14    0 (no size S)
+0x10    ?fs:[0]                      +0x18    ?fs:[0]
+0x14    ?KiUserApcDispatcher+0x48    +0x1c    ?KiUserApcDispatcher+0x48
+0x18    start of CAPTUREBUF          +0x20    start of CAPTUREBUF
...                                   ...
+0x14+S  last word in CAPTUREBUF      +0x1c+S  last word in CAPTUREBUF
Notes:
==== ========== ===================== ==== ============== =====================
Original ntdll!KiUserCallbackDispatcher Hooked ntdll!KiUserCallbackDispatcher
==== ========== ===================== ==== ============== =====================
0000 488b4c2420 mov rcx,[rsp+20h]     0000 90             nop
0005 8b542428   mov edx,[rsp+28h]     0001 ff1425nnnnnnnn call [user_callback_
                                                                dispatcher_ptr]
0009 448b44242c mov r8d,[rsp+2Ch]     0008 909090909090   nop*6
000e 65488b0425 mov rax,gs:[60h]      000e 65488b0425     mov rax,gs:[60h]
     60000000                              60000000
0017 4c8b4858   mov r9,[rax+58h]      0017 4c8b4858       mov r9,[rax+58h]
001b 43ff14c1   call [r9+r8*8]        001b 43ff14c1       call [r9+r8*8]
001f 33c9       xor ecx,ecx           001f 33c9           xor ecx,ecx
0021 33d2       xor edx,edx           0021 33d2           xor edx,edx
0023 448bc0     mov r8d,eax           0023 448bc0         mov r8d,eax
0026 e82f010000 call ZwCallbackReturn 0026 e82f010000     call ZwCallbackReturn
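The 64-bit patch uses a different encoding from the 32-bit one: `ff 14 25 nnnnnnnn` is `call [disp32]` with an absolute, sign-extended 32-bit address, whereas `ff 15` would be RIP-relative in 64-bit mode. The Python sketch below rebuilds the hooked bytes from the listing; `build_win64_hook` and the example address are hypothetical, and restricting the pointer slot to the low 2 GiB is a simplifying assumption of this sketch.

```python
import struct

def build_win64_hook(dispatcher_ptr_addr):
    """nop; call [abs disp32]; nop*6, covering offsets 0x00-0x0d."""
    if not 0 <= dispatcher_ptr_addr <= 0x7FFFFFFF:
        # disp32 is sign-extended, so this sketch only accepts the low 2 GiB.
        raise ValueError("pointer slot must be addressable with a positive disp32")
    return (b"\x90"
            + b"\xff\x14\x25" + struct.pack("<I", dispatcher_ptr_addr)
            + b"\x90" * 6)

patch = build_win64_hook(0x00403000)
```

The patch is 0x0e bytes long, so execution resumes unchanged at the `mov rax,gs:[60h]` at offset 0x0e.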
======== ============================== ======== =====================================
Stack layout at the entry time of       Stack layout at the entry time of
KiUserCallbackDispatcher                VG_(win_sc_win64_user_callback_dispatch)
======== ============================== ======== =====================================
                                        RSP+0    return address (into
                                                 VG_(catch_user_callback_dispatcher))
                                        +0x08    ? rcx=buffer
                                        +0x10    ? rdx=length
                                        +0x18    ? r8=callback ID
                                        +0x20    ? r9=KiUserCallbackDispatcher+8 ->RIP
RSP+0    ? (1)                          +0x28    ? (1)
+0x08    ?                              +0x30    ?
+0x10    ?                              +0x38    ?
+0x18    ?                              +0x40    ?
+0x20    ptr to CAPTUREBUF (RSP+0x58)   +0x48    ptr to CAPTUREBUF (RSP+0x80)
+0x28    (32bit) size of CAPTUREBUF (S) +0x50    (32bit) size of CAPTUREBUF (S)
+0x2c    (32bit) ID of callback handler +0x54    (32bit) ID of callback handler
+0x30    caller of system call          +0x58    caller of system call
+0x38    ? (2)                          +0x60    ? (2)
+0x40    ?                              +0x68    ?
+0x48    ?                              +0x70    ?
+0x50    ?                              +0x78    ?
+0x58    start of CAPTUREBUF            +0x80    start of CAPTUREBUF
...                                     ...
+0x50+S  last word in CAPTUREBUF        +0x78+S  last word in CAPTUREBUF
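The relation between the two columns is a constant shift of 0x28 bytes (the return address plus the four slots listed above it). The Python sketch below decodes a fabricated stack image in both variants; `read_win64_callback_frame`, `HOOK_SHIFT` and the test values are hypothetical, and the slots marked `?` in the table are simply filled with zeros.

```python
import struct

HOOK_SHIFT = 0x28  # five 8-byte slots that precede the original frame

def read_win64_callback_frame(stack, rsp, hooked=False):
    """Return (callback ID, CAPTUREBUF bytes); `stack` starts at address `rsp`."""
    base = HOOK_SHIFT if hooked else 0
    # 64-bit pointer, 32-bit size, 32-bit ID at +0x20/+0x28/+0x2c (original).
    buf_ptr, size, cb_id = struct.unpack_from("<QII", stack, base + 0x20)
    start = buf_ptr - rsp
    return cb_id, stack[start:start + size]

rsp = 0x0000000000127000
payload = b"CAPTURE!"
original = (bytes(0x20)
            + struct.pack("<QII", rsp + 0x58, len(payload), 5)
            + bytes(0x28)
            + payload)
hooked = bytes(HOOK_SHIFT) + original

plain = read_win64_callback_frame(original, rsp)
shifted = read_win64_callback_frame(hooked, rsp - HOOK_SHIFT, hooked=True)
```

Both calls find the same buffer: the stored pointer is an absolute address, so only the offsets relative to RSP change.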
Notes: