QEMUSlice2 Code
Brought to you by:
xfu2013
Read the documentation in qemu-doc.html or on http://wiki.qemu.org

**** If anything goes wrong, place a breakpoint at raise_exception_err ****

- QEMU team

----------------- The following are MOD logs by Dr. Xiang Fu -----------

(1) --------- OBJECTIVE: instruction trace ---------
*** Take popcnt as an example. ***
1. Modify ops_sse.h: add void trace(uint32_t eip).
2. Modify ops_sse_header.h: add the DEF_HELPER_1(...) declaration, and call trace in disas_insn.
3. The problem is that the system keeps complaining about a parameter mismatch in the macro definition. The PROBLEM is with TARGET_ULONG: the system always treats it as i32, and it always crashed.
4. Debug: size calculation (tcg.cc 2060). Debug the system by:
   ./configure --enable-debug --disable-pie
   sudo make install
   gdb qemu-system-i386
   run -m 512 -hda winxp.img
5. Solution: don't pass TARGET_ULONG, pass ENV instead. Then use env->eip.

(2) ---------------- IMPORTANT CODE NOTES ----------------
1. CPU opcodes are defined in cpu.h; there is an ENUM construct.
2. int_helper.cc defines the handling (emulation) of integer operations.
3. ops_sse_header.h defines all helper functions for all instructions.
   translate.c disas_insn is the key!!! TO UNDERSTAND ITS LOGIC, use GDB:
   run -m 512 -hda winxp.img -no-kvm
4. The body of disas_insn contains the handling for each instruction. However, the functions do not seem to be wrapped well enough to serve as a disassembler: it cannot actually print the instruction.
5. MEMORY READING: achieved using MMU translation, via env->tlb_table[mmu_idx][page_idx]. Once addr is obtained, we can call the ldsb ... functions to retrieve a word, or read it directly if we do not care about machine endianness, as we are pretty sure to work on an x86 platform here. The cpu_ldub_code ... functions are defined in include/qemu/bswap.h. They could be combined with the libdis library to disassemble (more convenient than modifying translate.c).

(3) --------------- WRITE a disassembler ---------------
Mostly OK. There is one sample file in the test directory.
Need to copy libdisasm.so to /usr/lib (or specify the directory in configure). In Makefile.target and Makefile, append -ldisasm to the "libs +=" line.
Note: CPU_LDL... are defined in include/exec/cpu_all.h

(4) ------------------- TRACE NOTEPAD.exe -------------------
(a) CR[3] stores the page directory base address.
(b) Read the TIB and PEB to get the process name. Note, however, that about 50% of the time FS:[0] does not point to the right data.
(c) To solve (b), we need to study target-i386/seg_helper.c. Here is a summary:
[1] load_segment(env, *e1, *e2, selector) retrieves a SEGMENT descriptor entry from the LDT/GDT, based on the selector value. The selector could be CS/DS/ES/FS/GS etc. (16 bits). Its 3rd bit (bit 2) decides which DT to read (LDT or GDT). Each entry has two words, and they are stored into e1 and e2.
[2] With e1 and e2, the base and limit of the segment can be calculated using get_seg_base(e1, e2) and get_seg_limit.
[3] tss_load_seg(env, seg_reg, selector), given a selector, updates the SegmentCache[seg_reg] entry. This is an INTERNAL (not real hardware) cache for the segment information. NOTE that it has side effects. We should try to avoid calling it.
[4] tss_load_seg(...) calls cpu_x86_load_seg_cache(...). cpu_x86_load_seg_cache() is a simple function that sets the contents of the SegmentCache[seg_reg] entry.
(d) Plan: to get the right FS:[0], we will do:
   load_segment(env, e1, e2, VALUE_OF_FS)
   base = get_seg_base(e1, e2);
Problem: what is the value of the FS register? It should be somewhere in CPUX86State. It seems that VALUE_OF_FS is CPUX86State.segs[R_FS].selector (this is evidenced by the implementation of "push fs" in translate.c). However, this eventually leads to the problem: for about 50% of the processes, the dump of FS:[0] is NOT right!
(e) After dumping every 100 instructions of the same process, we find that 50% of the time, when the instruction address is 0x0804xxxx (kernel space), the FS register is not pointing to the TIB structure.
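A minimal sketch of the selector and descriptor decoding summarized in (c) and (d), assuming the standard x86 segment descriptor layout. The names selector_uses_ldt, selector_index, seg_base and seg_limit are illustrative stand-ins written for this note; they are not the QEMU functions themselves.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative stand-ins (hypothetical names), not QEMU's own helpers. */

/* Bit 2 of a selector is the table indicator: 0 = GDT, 1 = LDT. */
static bool selector_uses_ldt(uint16_t selector)
{
    return (selector & 0x4) != 0;
}

/* Bits 15..3 index the descriptor table; each entry is two 32-bit words. */
static uint16_t selector_index(uint16_t selector)
{
    return (uint16_t)(selector >> 3);
}

/* Base bits 15..0 sit in e1[31:16], bits 23..16 in e2[7:0],
 * and bits 31..24 in e2[31:24]. */
static uint32_t seg_base(uint32_t e1, uint32_t e2)
{
    return (e1 >> 16) | ((e2 & 0xffu) << 16) | (e2 & 0xff000000u);
}

/* Limit bits 15..0 sit in e1[15:0], bits 19..16 in e2[19:16].
 * If the granularity bit (e2 bit 23) is set, the limit counts 4 KiB pages. */
static uint32_t seg_limit(uint32_t e1, uint32_t e2)
{
    uint32_t limit = (e1 & 0xffffu) | (e2 & 0x000f0000u);
    if (e2 & (UINT32_C(1) << 23)) {
        limit = (limit << 12) | 0xfffu;
    }
    return limit;
}
```

With a selector value in hand, code in the style of load_segment would pick the table from selector_uses_ldt(), fetch the two words at selector_index(), and pass them to the base and limit helpers.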
Clearly, the FS value is saved somewhere ... It might be possible to retrieve FS from the TSS (Task State Segment); however, the logic is quite complex: we would need to look at why the kernel is invoked (e.g., via JMP, via IRET, or via interrupt). We'll use a slightly costly but doable approach: simply check whether FS:[0x18] holds the address of FS:[0]. If it does not match, simply go ahead, because this is not the instruction at which to take the process name.

(5) ----------------- MINI Project 6: I/O -----------------
(1) How is keyboard input handled?
QEMU uses a VNC console. In the console, when a key is pressed, a function named "kbd_put_keycode" in ui/input.c is called. Then the function ps2_put_keycode in hw/ps2.c is called. NOTE that the keycode is a keyboard scan code (not ASCII). Note that in ps2.c there are two states: PS2Keyboardstate and PS2BoardState. Each maintains a queue of events, so ps2_put_keycode simply appends the keycode to the queue. hw/ps2.c also provides a function ps2_read_data() to read out the data; this is clearly prepared for the "in" instructions. Sequence of functions: kbd_read_data -> ps2_read_data.

Now let's look at the QEMU emulator side. When an instruction "INB eax, 0x64" is met, disas_insn is called. Note the INB opcode is 0xec. The translated code includes "cpu_inb" in ioport.c. It then calls ioport_read, which, based on a function pointer array "ioport_read_table", determines which function to call. The table is initialized by register_ioport_read. Using GDB (set a bp on register_ioport_read), we find that a general function ioport_readw_thunk is registered, which calls IORange->ops->read. We didn't spend time investigating how the registration is done. Next we'll use a breakpoint in GDB to find out which I/O port is used for getting user input from the keyboard. (Here a software BP seems OK; a hardware BP seems slightly faster.)

**********************************************************
1) gdb qemu-system-i386
2) b main
3) run -m 256 -hda winxp.img -no-kvm
4) !!! When we see SIGUSR1, type
   >> handle SIGUSR1 noprint
   Note: don't type "handle SIGUSR1 ignore" -> it's going to break the running of the system.
5) Ctrl+C and then set the breakpoint on kbd_read_data.
   !!! PAY ATTENTION: IF WE STOP AT THE BREAKPOINT, THE ENTIRE SYSTEM IS FROZEN (BECAUSE THE KEYBOARD IS CAPTURED).
   !!! We need to let the system continue and print out the information we want.
   *** Set a BP at line 324 of pckbd.c (which is the branch for handling keyboard events):
   >> b pckbd.c:325 if val==0x1e     (if "a" is pressed; the scan code for 'a' is 0x1e)
   >> commands
   >> backtrace     (we want to see who calls the keyboard function)
   >> silent
   >> cont
   >> end
Somehow, the above still freezes the system. Not sure why; the same trick works well with other breakpoints on other parts of the system. BUT we are able to see the dump of the backtrace:
   ioport.c
      cpu_inb calls
      ioport_read calls (retrieves func pointer from ioport_read_table) port:???
   memory.c
      memory_region_iorange_read
      memory_region_read_access calls
   pckbd.c
      kbd_read_data     (note that the input parameter 'addr' is NOT used)
CONCLUSION: the "INB EAX, 0x60" instruction is used to read from the KEYBOARD. Port number is 0x60!!!
**********************************************************

(6) --------------- PORTING TO C++ ------------------------
Objective: add an instruction class to the system so that we can use the STL.
Approach: establish a folder called "traceinstr", and create the Makefile as usual. Then in the root folder Makefile, add a ".PHONY" rule for traceinstr, and add the rule "make -C traceinstr" to compile the directory as a target.
The real trouble is the linker. Most of the system is compiled using "cc" (the C compiler), but it needs to link with the files in "traceinstr". We have to use extern "C" to wrap the functions to be imported by the C language part.
Note that we also need to add "-L../traceinstr -linstr" to the Makefile of the i386-softmmu folder.
Then in rules.mak replace the "LINK" definition with "g++" (however, this will trigger an error about main missing in multiboot.o). Then go to pc-bios/optionrom/Makefile and change the call of $(LD) directly to "ld"!!! (This allows the BIOS stuff to use the C compiler and C linker, while the other parts of qemu are still OK.)

(7) ------------------ MEMORY TRACING ----------------------------
Objective: we'll track which addresses each instruction reads from and writes to.
Idea: instrument cpu_x86_ldsb_code etc. Using GDB, we find that all the memory load and store operations are essentially defined as macros in include/exec/softmmu_header.h:85. [It looks like one macro body instantiated for many different data sizes, something like a C language version of a C++ template.] This saves code space, but is very UGLY and creates a lot of trouble.
If we want to instrument and print cr3, note that at line 85 the CPUArchState can be CPUX86State or CPUAlphaState (which does not have cr3). The trick is to also use MACROs: define a macro "dummy_cr3" and use it in the definition in softmmu_header.h. Then "dummy_cr3" is SPECIALLY redefined in cpu.h in target_i386.

(8) ----------------- NETWORK ---------------------------
The trick is to specify the model rtl8139; see run.sh in the qemu_image folder.

(9) ------------------ DEBUG MEMORY ERRORS -----------------------
Sometimes a bad read causes segmentation-related faults, and we want to intercept the error. Setting a BP on "gen_exception" does not work; instead,
**** set a breakpoint on raise_exception_err of mem_helper.c in target_i386.
!!! It seems that exception_index 14 and error_code 0 often cause a BLUESCREEN. Using backtrace we can find the error.
One problem we have with ops_sse.h is that if we adjust WAIT_INS to a smaller number like 101, the system will crash with a bluescreen. Using the above technique, we can identify that when we try to read the process name too early (reading the FS:[0] structure->0x3c offset), the contained address is usually an illegal address. This causes the problem.
To fix the problem, we need a way to tell if an address is a legal address. READ mem_helper.c in target_i386. It seems that we could use the cpu_memory_rw_debug(...) function to verify if an address is legal (it basically checks the validity bit in the page table and checks the access rights). Use it in ops_sse.h where we try to read the pFilePath.

(10) ------------------- SNAPSHOT and Monitor ------------------------
To avoid restarting the system every time, we can use the monitor, by appending "-monitor stdio" to the command. We can use the "savevm name" and "loadvm name" commands to save and restore snapshots (albeit it's a bit time consuming, about 30 seconds, but it's better than restarting the system). Using the monitor we can also use the mouse_move xoff,yoffset command to reset the mouse position, which is convenient.
NOTE THAT, however, it does not work well with GDB!!! (It's actually OK, as long as we use exactly the same qemu-system-i386 command arguments as when the VM snapshot was taken.)

(11) --------------------------- system interrupt handling observation ----
When an interrupt occurs, a piece of the OS interrupt handler code is inserted (ending with an IRET). Notice that this can be an I/O event and can happen anywhere.

(12) ------------------ revisit memory tracing -------------------------
From translate.c, by studying disas_insn and a memory storing instruction (such as push), we can trace to the file tcg_op.h and the micro operation tcg_gen_qemu_st32, where "st" stands for "store". It translates into the micro-instruction INDEX_op_qemu_st32. Then we need to study where INDEX_op_qemu_st32 is executed.
*** INDEX_op_qemu_st32 ***
Doing a grep search for the above, we find tcg/i386/tcg-target.c:1348
*** static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, ***
This function contains the logic. It calls tcg_out_tlb_load to load the TLB, and it writes to the TLB using tcg_out_qemu_st_direct. It then calls tcg_out_modrm_offset.
*** These functions are used to generate TCG micro-operation code, and to generate the corresponding machine instructions on the target i386 system for these micro-opcodes.
Using GDB, we find that these tcg_out_*** functions are actually called by *** cpu_x86_exec ***. It calls tb_find_slow/fast to generate/locate the tb_code, and then it calls tcg_qemu_tb_exec to execute the generated code. tcg_qemu_tb_exec is defined as a macro:
   code_gen_prolog(env, tb_ptr)
   code_gen_prologue = code_gen_buffer + code_gen_buffer_size - 1024;
At code_gen_buffer is a function that takes env and tb_ptr as two parameters and executes them. Now the problem is to figure out the source code of code_gen_buffer.
Using GDB, we find that the code_gen_prolog address is 0x8bf8d04 (use x/16i $eip at line 599 of cpu-exec.c). NOTE!! We need to use a hardware breakpoint here!!! Then we step into b71e5c00 (the code_gen_prolog buffer). However, we cannot see source (most likely because this is dynamically generated code). Stepping through a couple of instructions, we find that it's calling helper_trace2.
So, in summary, it's the dynamic compilation of binary code that builds the machine code to call helper_trace2 for each instruction. It's tcg_out_qemu_st that generates the binary instruction that stores the memory words.

** tcg_out_qemu_st *** logic: (supports multiple addressing modes; addressing parameters are stored in args; seems not easy to handle [too many cases])
tcg_out_tlb_load:
   tcg_out_mov(r0, addrlo)   (e.g., instantiated to EAX <- EBX?)
      tcg_out_modrm(opc, ret, arg): opc must be an x86 opcode (e.g., MOV 0x8b EAX, EBX)
   tcg_out_mov(r1, addrlo)   (r1 <- addrlo)
   tcg_out_shifti(r0, 8)   [shift r0 8 bits to the right]
   tcg_arithi(r1 AND (TARGET_PAGE_MASK | ((1 << s_bits) - 1)))
   tcg_arith(r0 AND ((CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS))
   tcg_out_modrm_sib_offset: opcode 141 is LEA, r0 + offset(tlb_table[mem_index]). The function is responsible for generating the INDEX BASED ADDR MODE instruction given any opcode.
   tcg_out_modrm_offset: CMP r1, 0(r0)
   jne slow path
   r1 <- addrlo
   ...
Clearly this is to generate the instructions that WRITE the TLB table.
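The check that the emitted instructions perform can be pictured in plain C. This is a minimal model, not QEMU code: TlbEntry, tlb_byte_offset and tlb_fast_store are hypothetical names, the entry layout is simplified, and the constants (12 page bits, 16-byte entries, 256 entries) are assumptions chosen to match a shift by 8; the real values depend on the build.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed constants for this sketch; the real ones come from the QEMU build. */
#define PAGE_BITS       12
#define PAGE_MASK       (~((UINT32_C(1) << PAGE_BITS) - 1))
#define TLB_ENTRY_BITS  4      /* log2 of the assumed entry size (16 bytes) */
#define TLB_SIZE        256

/* Simplified TLB entry (hypothetical layout). */
typedef struct {
    uint32_t  addr_write;  /* page-aligned guest address mapped for stores */
    uintptr_t addend;      /* host address = guest address + addend */
} TlbEntry;

/* Mirrors "shift right, then AND": the result is the byte offset of the
 * entry inside the table, i.e. index * entry size. */
static uint32_t tlb_byte_offset(uint32_t addr)
{
    return (addr >> (PAGE_BITS - TLB_ENTRY_BITS))
           & ((uint32_t)(TLB_SIZE - 1) << TLB_ENTRY_BITS);
}

/* Returns true on a TLB hit and stores the host address in *host.
 * Returns false when the generated code would take the "jne slow path". */
static bool tlb_fast_store(const TlbEntry *table, uint32_t addr,
                           unsigned s_bits, uintptr_t *host)
{
    uint32_t index = tlb_byte_offset(addr) >> TLB_ENTRY_BITS;
    /* Keeping the low s_bits in the tag makes a misaligned access miss. */
    uint32_t tag = addr & (PAGE_MASK | ((UINT32_C(1) << s_bits) - 1));

    if (table[index].addr_write != tag) {
        return false;
    }
    *host = table[index].addend + addr;
    return true;
}
```

In this model a hit costs one comparison and one addition, which is why the generated code only needs a helper call when the comparison fails.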
Then it calls tcg_out_qemu_st_direct, which generates the MOVL instruction to save a 32-bit word using register indexed mode. --> So it does generate a store instruction.
Similarly, tcg_out_ld/tcg_out_st: basically register based MOVL.

******* Line 1072 of tcg/i386/tcg-target.c has the explanation of all local variables !!!!
ADDRLO_IDX contains the index into the ARGS (lower part of the address), ADDRLO_IDX+1 stores the higher part, MEM_INDEX is the memory context index, and s_bits is the log2 of the access size. *****

Note: helper_ldb(wlq)_mmu and helper_stb(wlq)_mmu are defined in softmmu_template.h. Their ENV is exactly the same as the subsequent helper_trace2 parameter; however, env->eip is not always the pc_start. So the question is: why is pc_start different from env->eip?

(13) ------------------ revisit MEMORY TRACING Part 2 ---------------------
It seems that the key to the current problem of memory tracing is the difference between env->eip and pc_start.

Experiment 1: in the handle_trace2 function, print both values.
Result: env->eip contains the address of the first instruction of a code block (no branches).

Experiment 2: check if CPUX86State has anything similar to eip.
Result: CPUX86State does not store anything about EIP other than the env->eip data attribute. Now the problem is how handle_trace2 gets the program counter. Note that it's disas_insn that passes handle_trace2 the cpu_env (a global variable for TCG micro-operations).

Experiment 3: we then need to figure out how the EIP register is updated by micro operations. Look at how the NOP instruction is done! Break at translate.c:6879, where the NOP instruction is handled, and look at how EIP is updated. Actually the new EIP is returned. It seems that the micro-instructions do not update EIP (nor do they maintain the EIP in the global state).

Experiment 4: ADD a new GLOBAL VARIABLE called my_eip, which is updated whenever handle_trace2 is called. See if it is called BEFORE or AFTER the memory access.
Conclusion: READ operations are always logged before handle_trace2 is called. Strangely, WRITE operations are not logged. THESE READ operations are called only when used to load instructions. ??? Is cpu_x86_ldxxx not called whenever a LOGICAL MEMORY READ ACCESS is performed by an instruction, but called lazily on a TLB flush ????

Experiment 5: add a hook to helper_ldb(lq)_mmu and helper_st(blq)_mmu. The question is whether READ/WRITE are recorded correctly.
Conclusion: the helper_ld and helper_st functions are called by tci.c (the interpreter of intermediate TB code), e.g., for the ldu8 opcode of TCG.

*** Experiment 6: see if helper_ldb(lq)_mmu is called for every memory read. Use GDB.
Conclusion: the system does not always call helper_ldb and helper_st when the instruction has to read or write memory. It only does so occasionally when there is a jump (the translation block has to change) and occasionally on some push operations (but with no clear pattern to follow). It looks like the TLB caches the result at the level of helper_ldb.

Experiment 7: revisit tcg_out_qemu_ld and tcg_out_qemu_st and see their relation with each instruction.
They do not correspond to the machine instructions (instead, they are actually called by the disas_insn function). Similarly, there is no direct correspondence with the helper_ldb(wlq)_mmu and helper_st functions!!!

Experiment 8: read the source file of disas_insn (translate.c) and check how addl (%ecx), %ebx is handled. Opcode: 0x01.
They are calling qemu_gen_ld_32u .. qemu_gen_st_32u, which generate the intermediate TCG code. They are called by disas_insn, and the passed addr is the operand id (not the real address). However, helper_ldb_mmu is not called correspondingly.

Experiment 9: examine the generated code for a sequence of PUSH instructions.
Start from disas_insn and set a breakpoint where it handles the PUSH instruction, at line 5132. It consists of two micro operations: first move the register contents to a temporary variable T0, and then push T0 onto the stack with gen_push_T0, which first reduces ESP by 4 and then calls gen_op_st_T0_A0, which eventually calls *** tcg_gen_qemu_st32 ***!
Note *** tcg_ctx.gen_opc_ptr points to the location of the current micro-instruction. It calls tcg_gen_code(tcg_ctx, char *ptr code_buf) to generate machine code from the intermediate code.
!!! tcg_optimize does some optimizations like liveness analysis and operand optimization and REALLOCATES REGISTERS. So the result does not strictly follow the micro operations generated for each instruction!!!

Experiment 10: find out how to disable the optimization.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--------------------------------------------------------------
Disable the first two macros, USE_LIVENESS_ANALYSIS and USE_TCG_OPTIMIZATIONS, in tcg/tcg.c.
handle_instr verified, no problem (passing pc_start is no problem).
It seems that disabling USE_TCG_OPTIMIZATIONS does not HELP!!!! Strange!!! We'll have to check how the intermediate code is translated next.
--------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

----------------------------------------------------------------------
Experiment 11: find out how instructions are dynamically recompiled.
----------------------------------------------------------------------
Step 1. Find a TB (translation block) that contains at least one push instruction for which no helper_stl_mmu is called. This can be done using breakpoints on translate.cc:5132 and counting the number of times tb_find_slow is called.
Data: take the pc_start values 0xf1ec6, 0xf1ec7, 0xf1ec8 (all push). Using env->eip we know that the TB (translation block) starts at 0xf1ec6. Doing "info b" we know that when 0xf1ec6 (as pc_start) is hit, tb_find_slow has been hit 16 (or 17) times (just manually controlled).
**** We could set a conditional breakpoint at tb_find_slow (when pc==0xf1ec6).
Discovery: tb_find_slow first tries to locate the TB via a hash, in case it has already been generated.
It turns out the TB needs to be generated, so it calls tb_gen_code; note that the passed PC is 0xf20d8.

Step 2. Find out how the intermediate code is generated.
tc_code is stored at tb->tc_ptr (code_gen_ptr) at 0xb31e6b30. It then calls cpu_gen_code(), which calls gen_intermediate_code first, which in turn calls gen_intermediate_code_internal. Note that DisasContext->tb always points to the current TB.
Intermediate code: the opcode is stored at tcg_ctx.gen_opc_ptr; num_insns records the number of instructions in this TB.
push instruction: tcg_ctx.gen_opc_ptr (0x8c1cde2). Note that opcodes are listed in tcg/tcg-opc.h (counting end as 0, nop as 1, ... and then mov_i32 is 10). Arguments are stored at tcg_ctx.gen_opparam_ptr (0x8c1d30c).
The push instruction generates the opcodes from 0x8c1cddc to 0x8c1cdee (use the x command). These opcodes are defined in tcg/tcg-opc.h:
   11 11 10 10 11 23 120 10 8
   10: mov_i32, 11: movi_i32, 23: sub_i32, 120 (0x78): qemu_st32, 8: call
Note: at 0xf1ec6, 0xf1ec7, 0xf1ec8 there are three consecutive PUSH instructions. All of them are translated into the same sequence of 10 micro-operations; each micro-operation code (opcode/opc) is 16 bits long. Again, notice that opcode 120 (0x78) is qemu_st32.
   11 11 8 10 10 11 23 120 10 8

Step 3. Find out how the intermediate code is translated.
Set a breakpoint at 8107 (to break out of the loop of disas_insn); at 8135 it generates the icount. Currently the number of instructions translated is 11. The micro-instructions run from 0x8c1cddc to 0x8c1ce92 (182 bytes, i.e., 91 micro-ops, on average about 9 for each instruction).
When gen_intermediate_code() finishes, it returns to cpu_x86_gen_code. The generated code is stored at the global variable gen_code_buf (and also tb->tc_ptr): 0xb31e6b30. It then calls tcg_gen_code() to generate the x86 code.
Now let's look at tcg_gen_code(). In tcg_gen_code(), s->code_ptr (initial value: 0xb31e6b30) always points to the current location of the x86 code. It first calls tcg_gen_code_common.
In tcg_gen_code_common(), it first calls tcg_liveness_analysis. As we disabled the MACRO earlier, the dummy version of tcg_liveness_analysis() is called. It then calls tcg_reg_alloc_start(s), which has to go through the entire list of temporary variables and registers and set their status.
Back in tcg_gen_code_common(), there is a processing loop starting at line 2258 of tcg.c; it processes the register allocation for every intermediate instruction. Note that GDB nicely displays the ENUM string of each opcode value. The first one processed is INDEX_op_movi_i32 (value: 11; use "p (int)opc" to display it in GDB).
It first allocates registers for the movi micro-operation by calling tcg_reg_alloc_op, which actually generates x86 code. Use gdb x/16i 0xb31e6b30 (the original starting point). The first two micro-operations generate the call helper_trace2(env, 0xf1ec6) for the instruction at 0xf1ec6. Every movi32 corresponds to about 2 to 3 machine instructions. These MOV micro-operations are translated to x86 code using functions like tcg_out_ld and tcg_out_modrm_offset.
Now observe opcode 120 (at index 7). Let's observe the logic: it's going to generate an instruction at 0xb31e6b4e (which is 6 instructions away from the call of helper_trace2). Stepping through tcg_reg_alloc_op, we found that it calls tcg_out_mov, and at line 1985 of tcg.c it calls tcg_out_op(s, opc, new_args, const_args), where the opc is INDEX_op_qemu_st32.
tcg_out_op(...INDEX_op_qemu_st32...) is located at tcg-target.c:1667. This is a big switch case; at line 1919, it calls tcg_out_qemu_st(...), which first calls tcg_out_tlb_load(). This generates 9 instructions at 0xb31e6b5e (9 instructions away from helper_trace2).
They are (not sure yet how this corresponds to loading the TLB):

  mov %ecx,%eax
  mov %ecx,%edx
  shr $0x8,%eax
  and $0xfffff003,%edx
  and $0xff0,%eax
  lea 0x360(%ebp,%eax,1),%eax
  cmp (%eax),%edx
  mov %ecx,%edx
  jne 0xb31e6b82
  add 0x8(%eax),%edx

Now it's going to generate code at 0xb31e6b85 (which is 19 instructions away from the call of helper_trace2) by tcg_out_qemu_st_direct. It calls tcg_out_modrm_offset(.. OPC_MOV_EvGv ...), which generates 1 instruction starting from 0xb31e6b85: mov %edi,(%edx). tcg_out_qemu_st next calls add_qemu_ldst_label() to add the current context of the store into an ldst label, but it does not generate ANY more instructions!!!!

---------------------------------------------------------------------
Conclusion: when generating the x86 code for INDEX_op_qemu_st32, it does not generate the call for helper_stl_mmu at all!!! Instead, it seems to flush the TLB. Note that ldst_labels are created instead, and they are executed at the end of the TB, which is strange.
---------------------------------------------------------------------

Next to figure out: add_qemu_ldst_label()??? What do these ldst_labels do?

------------------------------------------------------------------------
Experiment 12: find out add_qemu_ldst_label and TLB_flush logic
------------------------------------------------------------------------

(1) tcg/i386/tcg-target.c defines tcg_out_tlb_load. It loads the TLB given addrlo_idx (which can be used to locate the address); it seems to compute address = s->args[addrlo_idx] + s->args[addrlo_idx]<<16. Then it computes the address of env->tlb_table[mem_index][0], tests for a TLB hit (using a comparison), and conditionally jumps to a branch that loads the TLB. It seems that parameter "which" could be used to determine if it's a read or a write.
To figure it out do the following GDB experiment:

*** Set a conditional breakpoint at tb_find_slow when pc==0xf1ec6, and step into tcg_out_tlb_load. Let's check the addr:

  addrlo_idx = 1
  args[addrlo_idx] = 1
  args[addrlo_idx+1] = 0

It seems that the logic of computing the address is MUCH MORE complex than this. Give this up for now unless we have no other solution.

(2) tcg/i386/tcg-target.c, tcg_out_qemu_st_direct: has two parameters, datalo and datahi (they seem to be the two parts of the address). It first generates a MOVL instruction and then creates an ldst label. Cannot step into add_qemu_ldst_label; it seems to be a macro.

(3) add_qemu_ldst_label: creates an ldst label structure, which holds addrlo_reg, addrhi_reg, datalo_reg and datahi_reg.

(4) tcg_out_qemu_ld_slow_path takes the TCGLabelQemuLdst and calls the helper method (helper_st/ld_mmu) to read/write. The address is passed on the stack to the helper_st/ld call by pushing the contents of addrlo_reg/addrhi_reg in real x86 code.

Now the question is: when is tcg_out_qemu_ld_slow_path called? Call path:

  tcg_gen_code_common (the_end at line 2327 of tcg/tcg.c) -> tcg_out_tb_finalize -> tcg_out_qemu_ld(st)_slow_path

DON'T UNDERSTAND THE LOGIC HERE: why perform all the read/write operations at the end of the x86 code for a Translation Block? (Why delay them?)

*** tcg_gen_code_common: most operators except for mov are handled by tcg_reg_alloc_op (which, e.g., handles the INDEX_op_st32 opcode); that performs the translation. After the processing for loop, it calls tcg_out_tb_finalize (at i386/tcg-target.c:1648), which reads TCGContext->qemu_ldst_labels one by one and performs the action for each. It seems that raddr corresponds to the x86 code of the ld/st. Note at line 1559 it generates a JUMP instruction to raddr! It seems to be weaving the logic of ld/st in! [So the code is actually generated at the end by modifying the code.]
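The compare-and-branch sequence emitted by tcg_out_tlb_load (shr $0x8 / and $0xff0 / and $0xfffff003 / cmp / jne / add 0x8(...)) can be read as a softmmu TLB lookup. The following is only a simplified model in plain C, not QEMU's code: the TlbEntry layout (four 32-bit fields, 16 bytes), the 256-entry table size, and all function names are assumptions chosen so that the constants match the generated instructions.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout of one TLB entry for a 32-bit guest on a 32-bit host. */
typedef struct {
    uint32_t addr_read;
    uint32_t addr_write;   /* tag compared by "cmp (%eax),%edx" for a store */
    uint32_t addr_code;
    uint32_t addend;       /* added to the guest address on a hit */
} TlbEntry;

#define TLB_SIZE   256u          /* assumed number of entries per MMU index */
#define PAGE_MASK  0xfffff000u

/* Mark every entry invalid (all ones never equals a masked tag). */
static void tlb_flush(TlbEntry *tlb)
{
    memset(tlb, 0xff, TLB_SIZE * sizeof(TlbEntry));
}

/* "shr $0x8 ; and $0xff0": page number modulo 256, scaled by 16 bytes. */
static uint32_t tlb_byte_offset(uint32_t addr)
{
    return (addr >> 8) & 0xff0u;
}

/* "and $0xfffff003": page tag plus the low bits, so that an unaligned
 * 4-byte access never matches and falls into the slow path. */
static uint32_t tlb_tag(uint32_t addr)
{
    return addr & (PAGE_MASK | 3u);
}

/* Returns 1 on a TLB hit and stores the translated address in *host;
 * returns 0 on a miss, which is where the generated jne would go. */
static int tlb_store_fast_path(const TlbEntry *tlb, uint32_t addr,
                               uint32_t *host)
{
    const TlbEntry *e = &tlb[tlb_byte_offset(addr) / sizeof(TlbEntry)];

    if (e->addr_write != tlb_tag(addr)) {
        return 0;                 /* miss: helper_stl_mmu would be called */
    }
    *host = addr + e->addend;     /* "add 0x8(%eax),%edx" */
    return 1;
}
```

Under this reading, the 0x8 in "add 0x8(%eax),%edx" would be the distance from addr_write to addend inside one entry, and the final "mov %edi,(%edx)" is the store itself.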
--------------------------------------------------------------------
Experiment 13: Watch the behavior of tcg_out_tb_finalize. Correlate with the x86 code generation for opcode 120 (0x78)
--------------------------------------------------------------------

Set up:
(1) Set a conditional breakpoint at tb_find_slow when pc==0xf1ec6, step into tcg_gen_code_common() and record the code generated for the instructions one by one, together with the addresses of the ldst_labels. Run another round and break on handle instruction to get the disassembly of these two instructions.
(2) Step into tcg_out_tb_finalize, look at the generation of x86 code and compare the change of the JMP.
(3) Set breakpoints on the corresponding instructions and go step by step.

Data below: x86 code (tb->tc_ptr: 0xb31e6b30)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Instruction 1: (@f1ec6) PUSH EBX
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Intermediate Code: stored at tcg_ctx.gen_opc_ptr 0x8c1cddc

  11 (movi) 11 8 (call) 10 (mov_i32) 10 11 23 (sub_i32) 120 (qemu_st32) 10

Generated Code: tb->tc_ptr 0xb31e6b30

Opcode 120: s->code_ptr: 0xb31e6b4e to 0xb31e6b85

  0xb31e6b4e <code_gen_buffer+2894>: mov %eax,%edi
  0xb31e6b50 <code_gen_buffer+2896>: mov %eax,0x80(%esp)
  0xb31e6b57 <code_gen_buffer+2903>: mov %ecx,0x84(%esp)
  0xb31e6b5e <code_gen_buffer+2910>: mov %ecx,%eax
  0xb31e6b60 <code_gen_buffer+2912>: mov %ecx,%edx
  0xb31e6b62 <code_gen_buffer+2914>: shr $0x8,%eax
  0xb31e6b65 <code_gen_buffer+2917>: and $0xfffff003,%edx
  0xb31e6b6b <code_gen_buffer+2923>: and $0xff0,%eax
  0xb31e6b71 <code_gen_buffer+2929>: lea 0x360(%ebp,%eax,1),%eax
  0xb31e6b78 <code_gen_buffer+2936>: cmp (%eax),%edx
  0xb31e6b7a <code_gen_buffer+2938>: mov %ecx,%edx
  0xb31e6b7c <code_gen_buffer+2940>: jne 0xb31e6b82 <code_gen_buffer+2946>
  0xb31e6b82 <code_gen_buffer+2946>: add 0x8(%eax),%edx
  0xb31e6b85 <code_gen_buffer+2949>: mov %edi,(%edx)

--------- the following are for the next opcode mov_i32

  0xb31e6b87 <code_gen_buffer+2951>: mov 0x84(%esp),%edi
  0xb31e6b8e <code_gen_buffer+2958>: mov %edi,%esi

An ldst label is generated, see s->nb_qemu_ldst_labels. Display s->qemu_ldst_labels:

  p s->qemu_ldst_labels[0]
  $26 = {is_ld = 0 (write), opc = 2, addrlo_reg = 1, addrhi_reg = 0, datalo_reg = 7, datahi_reg = 0, mem_index = 0, raddr = 0xb31e6b87 <code_gen_buffer+2951> "", label_ptr = {0xb31e6b7e <code_gen_buffer+2942> "", 0x0}}

NOTE:!!!! raddr is 0xb31e6b87 (which is RIGHT AFTER the last instruction generated!!! Note the last instructions are somehow added before the processing of the next opcode).

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Instruction 2: (@f1ec7) PUSH EDI, 3rd instruction (PUSH ESI; PUSH EBX)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Intermediate Code: stored at tcg_ctx.gen_opc_ptr 0x8c1cdee

  11 11 8 10 11 23 120 10

Code starts from 0xb31e6bb1.

  0xb31e6bb1 <code_gen_buffer+2993>: mov %eax,%edi
  0xb31e6bb3 <code_gen_buffer+2995>: mov %eax,0x80(%esp)
  0xb31e6bba <code_gen_buffer+3002>: mov %ecx,0x84(%esp)
  0xb31e6bc1 <code_gen_buffer+3009>: mov %ecx,%eax
  0xb31e6bc3 <code_gen_buffer+3011>: mov %ecx,%edx
  0xb31e6bc5 <code_gen_buffer+3013>: shr $0x8,%eax
  0xb31e6bc8 <code_gen_buffer+3016>: and $0xfffff003,%edx
  0xb31e6bce <code_gen_buffer+3022>: and $0xff0,%eax
  0xb31e6bd4 <code_gen_buffer+3028>: lea 0x360(%ebp,%eax,1),%eax
  0xb31e6bdb <code_gen_buffer+3035>: cmp (%eax),%edx
  0xb31e6bdd <code_gen_buffer+3037>: mov %ecx,%edx
  0xb31e6bdf <code_gen_buffer+3039>: jne 0xb31e6be5 <code_gen_buffer+3045>
  0xb31e6be5 <code_gen_buffer+3045>: add 0x8(%eax),%edx
  0xb31e6be8 <code_gen_buffer+3048>: mov %edi,(%edx)
  0xb31e6bea <code_gen_buffer+3050>: add %al,(%eax)
  0xb31e6bec <code_gen_buffer+3052>: add %al,(%eax)

Later, at 0xb31e6bea, the instructions for the next opcode are:

  0xb31e6bea <code_gen_buffer+3050>: mov 0x84(%esp),%edi
  0xb31e6bf1 <code_gen_buffer+3057>: mov %edi,%esi

It generates one ldst_label.
Dumping s->qemu_ldst_labels[1] we have:

  (gdb) p s->qemu_ldst_labels[1]
  $33 = {is_ld = 0, opc = 2, addrlo_reg = 1, addrhi_reg = 0, datalo_reg = 7, datahi_reg = 0, mem_index = 0, raddr = 0xb31e6bea <code_gen_buffer+3050> "", label_ptr = {0xb31e6be1 <code_gen_buffer+3041> "", 0x0}}

Note: raddr is 0xb31e6bea (which is the next immediate code_ptr for the next opcode).

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
At line 1654 of i386/tcg-target.c tcg_out_tb_finalize
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

There are 6 ldst labels. Process s->qemu_ldst_labels[0]: it calls tcg_out_qemu_st_slow_path. It first changes the word at label_ptr[0] (noted as 0xb31e6b73; the dump of label 0 shows 0xb31e6b7e) to s->code_ptr - label_ptr[0]. So the original instruction at

----------------------------
  0xb31e6b7c <code_gen_buffer+2940>: jne 0xb31e6b82 <code_gen_buffer+2946>
----------------------------

is changed to

  0xb31e6b7c <code_gen_buffer+2940>: jne 0xb31e6e3a <code_gen_buffer+3642>

(the beginning of the code to be generated by tcg_out_qemu_st_slow_path!!!!!)

---------------------------
Then after the helper_stl_mmu call is generated, it uses raddr to jump back to the instruction for the next immediate micro-operation!!!!
----------------------------------------------------------

So the logic is very clear: for the INDEX_op_qemu_st32 opcode, tcg_gen_code_common first generates the fast-path logic:
(1) load/check the TLB
(2) if TLB hit, access memory through the TLB entry
(3) otherwise jump to the helper_ld/st_mmu code (generated and hooked up/wired by tcg_out_qemu_st_slow_path!!!)

This is the reason why helper_ld/st_mmu is NOT called every time!

---------------------------------------------------------
Conclusion, call graph of the generated X86 CODE:

  load TLB
  check if TLB hit
  branch {go ahead} : {slow ld/st which calls helper function}

Big question: RIGHT BEFORE the TLB load, we should insert a call to record the memory reference. Now the problem is how can we get the address that is being read/written?

Algorithm Idea: to insert a call of tcg_out_trace_mem(...)
at the beginning of tcg_out_qemu_st(...). The function will have all of the parameters of tcg_out_qemu_st(...) to calculate the address to access. The logic to calculate the address should be similar to that of tcg_out_qemu_st_slow_path(), which takes tcg_ctx and label. It pushes mem_index, data_reg, addrlo_reg, TCG_AREG0 (all this information comes from the label) and calls a function similar to qemu_st_helper(tcg_ctx, function_ptr).

-------------------------------------------------------------
Experiment 14. Add the memory tracer
-------------------------------------------------------------

(1) Define helper_trace_mem(unsigned int addr, unsigned int size, unsigned int bRead) in ops_sse.h. It will be called RIGHT AFTER helper_trace2 (for st/ld instructions), and RIGHT BEFORE the TLB load. No need to pass EIP, because we can access the global variable global_eip. The logic is to simply call the corresponding trace_mem function. <DONE>

(2) Define tcg_out_trace_mem(mem_index, data_reg, data_reg2, addrlo_reg, addrhi_reg) in i386/tcg-target.c. This function imitates tcg_out_qemu_st_slow_path and takes all of its parameters. It calculates the access address and then calls helper_trace_mem(..). <DONE. Used some #ifdef tricks for different archs>

(3) In tcg_out_qemu_st() and tcg_out_qemu_ld() in tcg-target.c, call tcg_out_trace_mem(..) to perform the trace. The call should be placed right before line 1367, tcg_out_tlb_load.

To verify:
(1) Set a conditional breakpoint at tb_find_slow when pc==0xf1ec6, step into tcg_gen_code_common() and record the opcode. Then set breakpoints at disas_insn and then handle_instr, and also handle_mem_read and handle_mem_write. <OK.>
(2) ******************** However, the system eventually fails on CPL_MASK (dpl < cpl error); need to check details *** line 652 of seg_helper.c ************************************

Design: set a breakpoint at 652 of seg_helper.c and figure out the meaning of dpl and cpl. CPL is the current privilege level.
RPL is the requested privilege level. DPL is the descriptor privilege level.

Use a stupid way, BINARY SEARCH (ignore BP), to find out the last instruction involved (set a breakpoint on helper_trace2). Not quite reliable due to thread interleaving.

Attempt 2: notice that env->cr[3] 0x39000 is quite suspicious (never encountered before). Set a watch on it (use watch ((CPUX86State*)0x08da4b90)->cr[3]); this caught where cr3 is changed (switch_tss). This does not help a lot.

Attempt 3: note that at raise_exception_err env->eip is always 0xecc (because the handler address is 0xecc). env->sysenter_eip is 0x804def6f.

Attempt 4: gdb setup:

  (gdb) handle SIGUSR1 noprint
  Signal        Stop  Print  Pass to program  Description
  SIGUSR1       No    No     Yes              User defined signal 1
  (gdb) b ops_sse.h:2255
  Breakpoint 1 at 0x82e245f: file /home/csc288/qemu/qemu-1.4.0/target-i386/ops_sse.h, line 2255.
  (gdb) b raise_exception_err if exception_index!=14
  Breakpoint 2 at 0x82cacd1: file /home/csc288/qemu/qemu-1.4.0/target-i386/excp_helper.c, line 122.

Attempt 5: change to branch.exe (from srss.exe) to observe and see the result. Observation: now even the raise_exception_err BP does not work (maybe we need to remove the condition). This looks like a memory corruption bug where some important region is overwritten. Hard to find!

Attempt 6: Note, it may be caused by the additional read of instructions (it always reads 15 bytes, which could reach past the end of the code region). Disable the read-15-bytes parts (in ops_sse.h), and see if it crashes. DOES NOT WORK!

Attempt 7: observation: it seems that when we change the process to trace, it is the process being traced that crashes. So the problem should not be located in the qemu_out_xxx calls. It still resides in the traceinstr package. Now disable the calls of Trace::handleMemRead and Trace::handleMemWrite and just printf() the message in handle_mem_read and handle_mem_write. RESULT: It still crashed! Try removing the printf as well. Still does not work.
Attempt 8: switch back to srss.exe and run GDB (switching in between to the winxp guest window causes trouble and freezes the system). Still the same. Now try to redefine HANDLE_MEM_READ and HANDLE_MEM_WRITE. It seems that function calls are the problem. Try to not call anything (including printf) and just modify local variables. See what's going on. Now it's not throwing the error. It seems to be the extra function call that causes the problem.

Attempt 9: make an extra call directly in tcg/i386/tcg-target.c, replace it with a dummy function and see what's going on. OK. Now replace the dummy function with a printf and see what's going on. It seems to break when we put printf in. When it has multiple levels of calls it seems ok.

Attempt 10: figure out how STACK_REG is used and why it should be changed. Read tcg_out_qemu_st_slow_path and observe what it does.

Observation: tcg_out_qemu_st_slow_path calls tcg_out_calli to generate the call instruction and its displacement (target). It looks fine. Note that it has a JMP_short 5 right after the call. It then generates OPC_JMP_long and advances code_ptr by 4; this is clearly to reserve the place (JMP_addr) for the later st/ld label processing to fill in the eventual address to jump back to. Then tcg_out_addi(...TCG_REG_CALL_STACK...) (it's like add $0x10, %esp). Then it generates the jump back to raddr. So a call of tcg_out_qemu_st_slow_path generates the following instructions:

------------------
  push $0x0
  push %edi
  push %esi
  push %ebp                        // the above pushes 4 params in reverse order
  call 0x82f09ca <helper_stl_mmu>
  jmp 0xb31e6a3f                   (jump 5 bytes away, skip next instr)
  0xb31e6a3a jmp 0xb31e6945        (THE START OF NEXT MICROCODE AFTER THIS)
  0xb31e6a3f add $0x10,%esp
  jmp 0xb31e6945                   (jmp raddr, the next immediate instruction)
----------------------------------------

Now the problem: why add 0x10 (16) to %esp? In GDB, use tb *addr (hardware breakpoint!!!) to set a breakpoint on the first push $0x0 instruction and the other instructions to observe.
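One likely reading of the add $0x10,%esp, assuming the helper is called with the 32-bit cdecl convention where the caller removes its own arguments: four 4-byte pushes take 16 bytes, so the caller adds 16 back after the call returns. The small sketch below only does this arithmetic; the function names are made up for illustration and nothing here is QEMU code.

```c
#include <stdint.h>

enum { STACK_WORD = 4 };   /* size of one pushed 32-bit argument */

/* Bytes the caller adds back to ESP after a cdecl call with nargs
 * 32-bit arguments (the callee's ret only pops the return address). */
static uint32_t cdecl_cleanup_bytes(unsigned nargs)
{
    return nargs * STACK_WORD;
}

/* ESP after pushing nargs 32-bit arguments (the stack grows downward). */
static uint32_t esp_after_pushes(uint32_t esp, unsigned nargs)
{
    return esp - nargs * STACK_WORD;
}
```

With 4 arguments this gives 0x10, matching the store slow path above; with 3 arguments it gives 12 bytes.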
Somehow, stepping does not work that well. Initially ESP is 0xb02f4e50; after 3 pushes (notice that sometimes it's 3 pushes) and then the call to helper_ldl_mmu, ESP is 0x...4e44 (note that it's 12 bytes away!). So by the C calling convention, after a function call is completed the caller resets ESP and takes away the parameters. It seems that we are doing it right regarding pushing parameters.
----------------------------------------

Attempt 11: given that it's always a cpl/dpl/rpl problem in accessing a segment, let's check the values of env->hflags. Observation: even for the same process the hflags may change.

Attempt 12: hflags changes too frequently, so first get close to the crash point and then watch the dpl expression.

Observation: cr3_to_trace is 0x5042000, and normally cpl is 0 and dpl is 0. Note that somehow cpl is changed to 3. cpl is calculated using the following formula (from seg_helper.c): env->hflags & 3. Set the watch point "watch *0x8da4bc8 & 3"; the address is retrieved using p/x &(env->hflags). Found that the value is changed from 0 to 3 at line 995 of cpu.h, in cpu_x86_set_cpl. The call chain is helper_iret_protected -> helper_ret_protected -> cpu_x86_set_cpl.

Next step into code_gen_buffer() and cpu_x86_exec. The current global_eip is 0x806eeec5; the next eip is env->eip 0xec6 (here is line 2200 of seg_helper.c). Then after quite a while it reaches raise_exception_err.

Now do a comparison: observe other cases of helper_iret_protected and see what's going on. It seems that helper_iret_protected does not always hit cpu_x86_set_cpl! (In many branches it pops ESP.) It depends on the value of parameter shift.

!!!! Actually line 2200 of seg_helper.c is ONLY hit once, and that directly triggers the exception !!! (Note: to repeat this, type the "run" command directly; don't let Windows do a warm restart!)
---- Now let's set a BP on helper_trace2 (it's called for every instruction) and see how many more will be called. !!! After 3 instructions it's the exception!

********************************************************************
eip_ins are 134854, 134860; env->cr[3] is 0x39000; note that cr3_to_trace is 0x4dc2000.

Next: set a bp on tb_find_slow; once hit, it then hits disas_insn for the following:
(1) pc_start 134854 (0x20ec6): mov R, Iv (mov immediate number to R)
(2) pc_start 134860 (0x20ecc): INT 16 (looks like an I/O request), code placed at tcg_ctx.gen_opc_ptr (index 16), opc 11, list of micro-opcodes:
      movi (pc)
      st_tl (save to cpu_env->eip)
      gen_helper_raise_interrupt at 0xb6cb5643!!!!!!!!! (later set a BP on it)
********************************************************************

++++++++++++++++++++++++++++++++++
Now we could simplify the breakpoint process:
  b disas_insn if pc_start==134860
(it will be hit twice, ignore the 1st hit). However, it's kind of slow; we could simply add a branch in disas_insn to speed it up.
>>>
  b translate.c:4272
  b raise_exception_err if exception_index!=14
>>>
++++++++++++++++++++++++++++++++++

Continue the observation (3): the TB has only these two instructions, then it starts to generate code.
The generated code is:

  0xb6cba9a1 <code_gen_buffer+61688225>: mov %ebp,(%esp)
  0xb6cba9a4 <code_gen_buffer+61688228>: mov $0x20ecc,%ebx
  0xb6cba9a9 <code_gen_buffer+61688233>: mov %ebx,0x4(%esp)
  0xb6cba9ad <code_gen_buffer+61688237>: mov $0x12,%ebx
  0xb6cba9b2 <code_gen_buffer+61688242>: mov %ebx,0x0(%ebp)
  0xb6cba9b5 <code_gen_buffer+61688245>: call 0x82e2362 <helper_trace2>
  0xb6cba9ba <code_gen_buffer+61688250>: movl $0xecc,0x20(%ebp)
  0xb6cba9c1 <code_gen_buffer+61688257>: mov %ebp,(%esp)
  0xb6cba9c4 <code_gen_buffer+61688260>: mov $0x10,%ebx
  0xb6cba9c9 <code_gen_buffer+61688265>: mov %ebx,0x4(%esp)
  0xb6cba9cd <code_gen_buffer+61688269>: mov $0x2,%ebx
  0xb6cba9d2 <code_gen_buffer+61688274>: mov %ebx,0x8(%esp)
  0xb6cba9d6 <code_gen_buffer+61688278>: mov %eax,0x80(%esp)
  *** 0xb6cba9dd <code_gen_buffer+61688285>: *** call 0x82ca99c <helper_raise_interrupt>

Clearly, 0xb6cba9dd is the one which raises the interrupt!

------------ set a BP at 0xb6cba9dd --------------------------------
Now simplify the debugging: hb *0xb6cba9d6 (use hb in case of overwriting). DOES NOT WORK!!! Instead set a bp at helper_raise_interrupt (condition intno==16) after the first hit on raise_exception_err.

++++++++++++++++++++++++++++++++++
Now we could simplify the breakpoint process
>>>
  b helper_raise_interrupt if intno==16
  b raise_exception_err if exception_index!=14
>>>
++++++++++++++++++++++++++++++++++
********************************************************************

Attempt 13: study the logic of helper_raise_interrupt
>>>
  b helper_raise_interrupt if intno==16
  b raise_exception_err if exception_index!=14
>>>

Observation: when it's called, env->cr[3] is 0x39000, next_eip_addend is 2, intno is 16, error_code is 0 (according to the interrupt list found via google, int16/ah=0 is to get a keystroke). Current env->eip is 0xecc and next eip is 0xece (because of next_eip_addend).

!!! env->hflags is 0x4008c7 and it & 3 is 0x3. Use this to compare with the regular version !!!

!!!!!!!!!!!!!!!!!!!!
It's clear helper_raise_interrupt belongs to the last INSTRUCTION. The next helper_trace2 is never hit; it jumps directly to the exception.
!!!!!!!!!!!!!!!!!!!!

Now recompile and then check. Modify target-i386/cpu.h. It seems that helper_raise_interrupt with intno==16 is NEVER called!

Conjecture here: the use of external calls somehow changes the relative speed of the threads, and thread 0x39000, when it triggers an exception, somehow has its env variable messed up. Notice that thread 0x39000 does call INT 16 multiple times; its env->hflags is 0x44 (different from the 0x4008c7 seen previously).

Attempt 14: check back again on env->hflags and see why it's changed
>>>
  b raise_exception_err if exception_index!=14
  b cpu_x86_set_cpl if env->cr[3]==0x39000
  display/x s->hflags
  display/x s->hflags & 3
>>>
The hflags is shown to be flipped several times, ranging from 2b4 to 0x400b4.
>>>
When it is close to the crash, Ctrl+C and enter a watch point (to save time, watch is too slow)
>>>
  "watch *0x8da4bc8 & 3" (the address is retrieved earlier by p/x s->hflags)
>>>

The reason cpu_x86_set_cpl(3) is called is that shift is 1 and is_iret=1, and also new_eflags & VM_MASK is 1. See seg_helper.c:2026; set a breakpoint on it (condition on env->cr[3]==0x39000).

Observation: in most cases it is returning new_eflags 0x206. Tracing into the call POPL(ssp, sp, sp_mask, new_eflags), we find that POPL is a macro definition that does cpu_ldl_kernel(env, SEG_ADDL(ssp, sp, sp_mask)) and then sp+4. It reads the data from the kernel stack. It actually calls cpu_ldl_kernel; addresses are like 8055014c.

NOTE THAT after the dumping message shows up, cpu_x86_set_cpl for process 0x39000 will be called only ONCE!!! Now let's observe the last case in which it is called. The last addr it is trying to retrieve from is 0x8054d6c4; the new_eflags it retrieves is 0x23006. Tried to run it a second time within the same GDB instance; the value is ALWAYS 0x8054d6c4!!! Now let's set a breakpoint on 0x8054d6c4! cpu_stl_kernel.
The cpu_stl_kernel is never hit but the cpu_ldl_kernel is hit, which is strange. Let's try the physical address: cpu_ldl_kernel eventually calls ldl_p(0xa004a6c4); this is always a fixed address as well. Now set a breakpoint on stl_p(0xa004a6c4). Very time consuming... Still not hit.

Attempt 14: study the logic of helper_raise_interrupt again. It must be pushing eflags onto the stack. Check specifically the use of VM_MASK. It seems that on a syscall or interrupt in protected mode, VM_MASK is off (on env->eflags). See do_interrupt_protected at line 792 of seg_helper.c.
>>>
  b raise_exception_err if exception_index!=14
  (wait for some time until close to the crash, and then)
  b helper_raise_interrupt if intno==16
>>>
Check the value of env->eflags at the last helper_raise_interrupt (note that VM_MASK is 0x0002 0000). Observation: the env->eflags of process 0x39000 is 0x23002 (it is SET!)

Attempt 15: now the question is why env->eflags for process 0x39000 has VM_MASK enabled (it is never enabled after the system starts to trace srss.exe). We need to watch when this happens.

Design:
(1) in the source code add a global variable last_39000_eip
(2) modify helper_trace2 to update this variable and print it out when env->eflags & VM_MASK changes.

Observation: switch made from 1 to 0: 20ec6, c83f9; from 0 to 1: 806ef788, 806ef804. Each of these instructions is hit at most 4 times. Given the above:
>>>
  b raise_exception_err if exception_index!=14
  b ops_sse.h:2265 (at 2265 we add a branch to check the process id condition to stop at c83f9)
  b ops_sse.h:2269 (at 2269 we add a branch to check the process id condition to stop at 22ec6)
>>>
Note that the bp at 2265 is only hit once, after the first raise_exception_err that leads to the problem. It changes the flags from 0x23202 to switch OFF VM_FLAG. But then it is switched from 1 to 0 by 8063f804, and again it is switched from 0 to 1 by 20ec6. The BP at 20ec6 is ONLY hit ONCE, right before the exception.
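To make the eflags values in these notes easier to compare, here is a minimal sketch that decodes the three fields involved. The mask values are the architectural EFLAGS bit positions (VM is bit 17, IOPL is bits 12-13, IF is bit 9); the helper function names are made up for illustration.

```c
#include <stdint.h>

#define IF_MASK    0x00000200u   /* interrupt enable flag, bit 9 */
#define IOPL_MASK  0x00003000u   /* I/O privilege level, bits 12-13 */
#define VM_MASK    0x00020000u   /* virtual-8086 mode, bit 17 (131072) */

/* Nonzero when the flags describe a virtual-8086 mode task. */
static int eflags_vm86(uint32_t eflags)
{
    return (eflags & VM_MASK) != 0;
}

/* I/O privilege level, 0..3. */
static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & IOPL_MASK) >> 12;
}

/* Nonzero when maskable interrupts are enabled. */
static int eflags_if(uint32_t eflags)
{
    return (eflags & IF_MASK) != 0;
}
```

Decoded this way, 0x23002 and 0x23202 both have VM set with IOPL 3 and differ only in IF, while 0x202 is an ordinary protected-mode value with VM clear.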
It's the one that SETs the VM_FLAG, and it submits the INT 16 in VM_FLAG mode. That triggers do_interrupt_protected()->cpu_x86_set_cpl(), based on the condition that env->eflags & 0x20000 (VM_FLAG) is 1.

***** Now let's look at the two instructions 0x20ec6 and 0x20ecc; the following is the analysis from disas_insn *****

The following are the last 4 records recorded:
********************************************************************
  eflags switched from 0 to 131072 at 20ec6 (switch is already done; it should actually be the last eip that does the switch). Last eip is 806eeec5
  eflags switched from 131072 to 0 at 806ef788 (last eip is 20ecc, actual change)
  eflags from 0 to 131072 already at c83f9 (last eip is 804dfa41, actual change)
  eflags from 131072 to 0 at 8063f804 (last eip is 20ece, actual change)
********************************************************************

Instructions at:
  *** 806eeec5: iret (turn on VM flag)
      806ef804: mov R, Iv (actually no effect, it's just the next immediate instruction after 20ece)
      806ef788: mov Ev, Iv (actually NO effect on VM, it's just the next immediate instruction after 20ecc)
  *** 804dfa41: iret (turn on VM flag)
      20ec6: mov R, Iv (actually no EFFECT)
      20ecc: INT 16 (turn OFF VM flag)
  *** 20ece: ILLEGAL_OP on the first visit!!! On the second visit les Gv, still ILLEGAL_OP.
      c83f9: pushf (actually no EFFECT)

It seems that it's the instruction at 20ece (les Gv) that causes the ILLEGAL_OP error! Next, compare it with a normal run. In the CORRECT VERSION, 20ece generates ILLEGAL_OP as well; HOWEVER, it is not hit twice!
>>>>>
  b disas_insn if pc_start==0x20ec6 || pc_start==0x20ece || pc_start==0x20ecc
  b ops_sse.h:2266 (where we added a condition to check eip_in is 0x20ec6, 20ecc, and 20ece)
  b raise_exception if exception_index!=14
>>>>>
observation: ***** !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  disas 0x20ec6 (mov)
  -> disas 20ecc (INT 16)
  -> execute 20ec6
  -> 20ecc (INT 16)
  -> do_interrupt_protected gets e1/e2 by cpu_ldl_kernel, and then calculates dpl=0, cpl=3! next_eip = 0xece
  -> raise_exception (GPF, general protection error)
  -> do_interrupt_protected (13, GPF), next_eip=0xecc -> still dpl<cpl (but is_int is 0), so it won't raise an exception, but it pushes segment selectors, sets cpl back to 0, and sets env->eip to 806ef788 [this is actually the interrupt handler, calculated from e1 and e2]
  -> disas 20ece
  -> execute 20ece
  -> RAISE_EXCEPTION (index=6, invalid opcode)
  -> do_interrupt_protected (intno=6, invalid opcode)
  -> it then jumps to eip 0x806ef804 (interrupt handler)
  -> ... a lot of instructions ...
  -> disas 0x20ec6 (mov)
  -> disas 0x20ecc (INT 16)
  -> execute 20ec6 (now VM mask is 1)
  -> exec 20ecc (INT 16)
  -> do_interrupt_protected -> cpl is 3, causes an exception
  -> do_interrupt_protected (13, is_int=0)
  -> 806ef788 (mov instruction) .... a lot
  -> disas_insn 20ece
  -> exec 20ece
  -> raise_exception (6, invalid opcode)
  -> do_interrupt_protected (index 6)
  -> 0x806ef804 -> 0x806ef809 -> ... !!! over 1000 instructions (this is exactly the same as the FIRST ONE!) ...
  -> crash (stop: 0x0000007f unexpected kernel mode trap); 0x8 means double fault according to MSDN.

Guess: the same error at 0x20ece is repeated twice and this leads to the crash. The last do_interrupt_protected is (int 45, unknown service). LAST EIP: 0x8050b895: INT 45.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

----> It seems that taking the exception twice causes the problem.
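The dpl/cpl decision in the chain above can be summarized in a few lines. This is a minimal sketch of the check as these notes describe it (cpl taken from hflags & 3, a software interrupt refused with GPF when the gate's dpl is below cpl, no refusal when is_int is 0), not the actual do_interrupt_protected code. The function names and the EXCP_NONE value are made up; the DPL position (bits 13-14 of the descriptor's second word) is the architectural one.

```c
#include <stdint.h>

#define HF_CPL_MASK     3u    /* low two bits of hflags hold the CPL */
#define DESC_DPL_SHIFT  13    /* DPL field inside descriptor word e2 */

enum { EXCP_NONE = -1, EXCP_GPF = 13 };

/* Current privilege level as computed in these notes: env->hflags & 3. */
static unsigned cpl_from_hflags(uint32_t hflags)
{
    return hflags & HF_CPL_MASK;
}

/* Descriptor privilege level of a gate, taken from its second word. */
static unsigned dpl_from_e2(uint32_t e2)
{
    return (e2 >> DESC_DPL_SHIFT) & 3u;
}

/* Returns EXCP_GPF when a software interrupt (is_int != 0) goes through
 * a gate whose DPL is more privileged than the current CPL; exceptions
 * and hardware interrupts (is_int == 0) are not subject to this check. */
static int gate_privilege_check(uint32_t hflags, uint32_t e2, int is_int)
{
    if (is_int && dpl_from_e2(e2) < cpl_from_hflags(hflags)) {
        return EXCP_GPF;
    }
    return EXCP_NONE;
}
```

With the logged values, hflags 0x4008c7 gives cpl 3, so INT 16 through a dpl 0 gate yields GPF (13), while the same gate with hflags 0x44 (cpl 0) passes, and the follow-up delivery of exception 13 passes because is_int is 0.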
Check around the code between 0x20e00 and 0x20f00 (set BP on disas_insn, condition on pc_start)

--------- now env->eflags & VM_FLAG is 0 ->
          20e74: push
          20e76: push
          20e78: push
          20e7a: push
          20e7c: mov Ev, Gv
          20e7f: mov R, Iv
          20e82: mov seg, Gv
          20e84: mov Seg, Gv
          20e86: mov R, Iv
          20e89: push
          20e8b: mov
          20e91: Arithmetic
          20e98: push
          20e9a: pop
          20e9b: push ds
          20e9c: pop ds
          20e9d: Arithmetic
          20ea0: Arithmetic
          20ea3: mov
  ------> 20ea5: mov Ev, Gv
  loop -> 20ea8: mov
          20eaa: Arithmetic
  ------> 20eb4: jecxz (loop)
          20eb6: push
          20eb7: pop ds
  ------> 20eb8: call im (-2850)
          20ebb: push
          20ebc: mov Ev, Gv
          20ebe: Arithm
          20ec1: push/pop
          20ec2: push
          20ec3: push ds
          20ec4: push
          20ec5: mov R, Ib
          20ec7: mov R, Iv
          20eca: mov Ev, Gv
          20ecd: INT 19 (0x13, disk request)
          !!! strangely, at first it did not hit 20ec6, 20ecc and 20ece
          20ecf: jcc Jv
          Then it hits 20ec6: INT 16 (0x10 - VGA request?)

Check the binary code starting from 20ec5:

  20ec5: (all in hex) b4 41 bb aa 55 8a 56 4 cd 13 cd 13
  After 10 sec (1st time 20ec6 is hit): cf 66 b8 12 0 0 0 cd 10 c4 c4
  ?????!!!!! the code segment is changed!!!

Compare with the version where tcg_out_trace_mem is commented out:

  20ec5: (all in hex) b4 41 bb aa 55 8a 56 4 cd 13 cd 13 (SAME!)
  After 10 sec (1st time 20ec6 is hit): cf 66 b8 12 0 0 0 cd 10 c4 c4 (SAME!)

BUT AGAIN, 20ecc is NEVER CALLED again in the CORRECT version.

Attempt 15: Now set an additional variable cr3_to_trace2 and print out the instructions of 0x39000. Analyze the instructions before 0x20ecc. Observation: it immediately crashes the system. Notice that the memory tracing has been disabled!!! So it's the relative speed of the process that caused the problem! Now, let's only call handle_instr for instructions in the range 0x20000 to 0x20f00. Still does not help.

Trace using GDB. Found that the program is stuck at: os_host_main_loop_wait <- main_loop_wait <- main_loop <- main(). Found that if we disable the check of cr3_to_trace2 in translate.c it is then fine.
The regular run produces exactly the same stack trace of os_host_main_loop_wait <- main_loop_wait <- main_loop <- main(). So Ctrl+C in GDB does not help; 99% of the time the system is waiting for I/O. Need to set a breakpoint on handle instruction.

Attempt 15: Now enable cr3_to_trace2 again and set a breakpoint on helper_trace2 when env->cr[3]==0x39000 (and without the condition). See what is going on. By running to helper_trace2 we found that the system is running in cr3 process 0. With the conditional breakpoint on 0x39000, helper_trace2 is still hit.

!!!!! AFTER REMOVING the isInsOfProcess() check the system got to work! !!!! What the ...... heck .... could not figure out why ....

Attempt 16: now remove cr3_to_trace2 and resume the normal execution (enable trace mem): still does NOT WORK!!!!

Attempt 17: now check other processes and see what's the reason for the crash.

  winlogon.exe   ok
  csrss.exe      ok
  lsass.exe      ok
  svchost.exe    ok
  logonui.exe    ok
  spoolsv.exe    ok
  userinit.exe   ok
  alg.exe        ok
  rundll32.exe   ok
  wscntfy.exe    ok
  dumprep.exe    didn't check
  notepad.exe    crash
  branch.exe     (

!!!!!!!!!!!!!!! Still the same error with 0x39000. It seems that process 0x39000 is responsible for handling keyboard I/O. It's always complaining about an invalid opcode at 0x20ece (les GS)!!! Verify next time.

Attempt 20: It seems 0x20ece is the cause (multiple hits cause the problem). Set a breakpoint on helper_trace with condition 0x20ecc and 0x20ece and see how it differs (on branch.exe) between the GOOD and BAD versions.

GOOD VERSION (branch.exe): 20ecc and 20ece are only hit ONCE! Check if other instructions are ever hit. We set a bp on helper_trace2 if env->cr[3] is 0x39000. It seems to be running all the time; only when the program being traced is running is it not invoked. ?? Does the OS ever switch?

BAD VERSION (branch.exe): it's hit twice altogether. But after 0x20ece is hit, it runs in kernel mode for over 1000000 instructions. Not sure what the purpose of the process is.
20ecc will be hit twice.

Attempt 21: Find a way to skip the execution of 20ece and see how it goes. (1) Where does 20ece start (translated from x86)? (2) Where does the next instruction start? (3) Do the jump.
>>>> (1) Set a BP on disas_insn conditioned on 20ece. It seems that 0x20ece is the first instruction of this TB; however, env->eip is 0xece (missing the 20 at the beginning). Not sure if this is caused by a segment or not. It calls gen_exception -> we can simply skip this by resetting EIP.
(2) GDB is too slow, so simply COMMENT OUT the gen_exception statement at line 5547 of translate.c (copy the code of illegal_op and comment out the statement which generates the exception) --> does not work, and it gets stuck in an infinite loop of 0x39000 in GDB!!! Simply disabling the if check does not work either! (It needs a 2nd restart and skips srss.exe due to a Windows mechanism.) Checking the effect on branch.exe shows that it ends in a black screen. Now set a condition to skip the exception on 0x20ece ONLY. Does not work either!!!
----> Conclusion: neither disabling exception generation for 0x20ece only nor disabling it in all cases works. There are other instructions after 0x20ece and they will trigger exceptions as well.

Attempt 22: Figure out why 0x20ec6 is reached via iret from 0x80xxxxx.
(1) Figure out the functionality and process name of 0x39000.
>>> Using QEMU does not work. Cannot read the process structure; maybe the address is in real mode.
>>> Use WinDbg to examine the list of all processes. 0x39000 is the process named "System".
>>> From the internet: the System process has the EPROCESS structure for each process; it seems to be managing the processes and system calls. However, it's strange that 0x20ece is invoked and runs in normal user mode.

Attempt 23: Check the ORIGINAL version and see if 0x20ec6 is EVER HIT and what the CPL mode is.
>>> Note that the helper_trace2 function is not there, so we have to set a bp on raise_exception_err if exception_index!=14.
The system does raise a general protection error (13) at 0xecc, but no further error at 0xece. Also set a bp at disas_insn conditioned on pc_start==0x20ecc and pc_start==0x20ece. The behavior is exactly the same as in the revised version (with helper_trace2 added).
Now use a bp on disas_insn with a condition to check how execution switches between 0x39000 and other processes. After the initial timer screen, it starts to switch to other processes (most of the time EIP is in the 0x806xxxxx range; the first instruction is 0x8069e090 and env->eflags is 2, in kernel mode). Running in 0x39000 takes pretty long before it switches to another process. Some of the (first) instructions switched to (in 0x39000): 0x8069e090, 0x804ea161, 0x804f1c48, 0x80589c93. It's strange that they are not jumped to from gen_exception or gen_interrupt; we'll need to figure out how it jumps from one process (identified by cr[3]) to another.

Attempt 24: Check why 0x20ec6 is accessed (jumped to from iret). Set a bp on *** 806eeec5: iret (turn on VM flag), with ip in the range 0x20ec0 to 0x20ecf (if too slow, create an if branch in helper_trace2).
---> (result) Logic of iret: (1) set CC op, (2) jmp pc (0x806eeec5), (3) call helper_iret_protected (setting a bp on it shows the logic): it calls POPL(ssp, sp, sp_mask) and gets the new EIP 0xec6!!! At this moment eflags is 0x202.
Pretty early, 0x20ec1 (push) is hit (eflags 0x202), followed by:
0x20ec2: push
0x20ec3: push ds
0x20ec4: push
0x20ec5: mov R, Ib
0x20ec7: mov R, Iv
0x20eca: mov Ev, Gv
0x20ecd: INT 19 (0x13, eflags 0x202, disk request!!!!) visited
Strangely it's hit multiple times, and then the system starts loading. Then it hits 0x806eeec5 (iret) and then 0x20ec6.
Now we are clear about the iret logic: the EIP is the FIRST WORD popped from the kernel stack (ssp, sp, sp_mask).

Attempt 25: Set a BP at helper_trace on 0x806eeec6 and 0x20ecd and see how it's pushed. After helper_trace finishes, si in GDB; we should see the calls of helper_gen_interrupt and helper_iret respectively.
<<< (1) Create an if branch at line 2259 of ops_sse.h inside helper_trace2 to capture eip_in in the range.
(2) Set a BP at line 2259 of ops_sse.h.
(3) Observation on 0x20ecd: it does NOT push anything itself! It just sets env->exception_next_eip to 0xecf in helper_gen_interrupt -> then it calls cpu_loop, which later calls do_interrupt_real()!!! [instead of protected]; this is determined by env->hflags & HF_SVMI_MASK (0x44 & 1<<21). (Next eip is 0xecf.) do_interrupt_real does use PUSHW to push the information: cpu_compute_eflags(env) [0x202], old_cs: 0x2000, old_eip: 0xecf. Note that PUSHW is defined as cpu_stw_kernel at ssp + sp & sp_mask.
During the last push of EIP for the last 0x20ecd: SSP: 0x22f30, SP (esp): 0x67fe4 (when retrieved it's 0x67fe2 in POPW). The corresponding helper_iret_real matches.
(4) Observation on 0x806eeec5: it's the iret instruction. It calls helper_iret_protected (but NOT helper_iret_real!!!!) and takes out old_eip: 0xec6, cs: 0x2000, eflags: 0x23206. The difference between helper_iret_real and helper_iret_protected is that in helper_iret_real the PUSHed and POPped values are 16-bit words!!!
The first 5 hits of 0x20ecd are matched with helper_iret_real, but the LAST one is matched with helper_iret_protected!!! (which is kind of wrong!!!!) ---------------------------!!!!!
Note that EACH CALL of 0x20ecd is PAIRED WITH a helper_iret_real in the first five calls. Then after the 30 seconds of waiting, 0x20ecd is hit again (multiple times), this time with SSP: 0x22f30, SP (esp): 0x57fe4, eip: 0xecf (still do_interrupt_real). Every hit of 0x20ecd is still matched (a corresponding helper_iret_real is called).
Then after a while, helper_iret_protected is called by 0x806eeec5 (iret) --> (NOTE: it does not match any hit in the 0x20ecx range!!!) SSP: 0x0, sp: 0xf9e6374c. It reads out new_eip: 0xec6.
Now the question is WHO PUSHes and calls? Set a BP on cpu_stl_kernel if ptr>=0xf9e63740 && ptr<=0xf9e63750. The problem is that this BP is NEVER EVER HIT!!!!! (so there is a memory corruption HERE!!!!)
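The 16-bit versus 32-bit frame difference noted in (4) can be illustrated with a small sketch. This is not QEMU code: `pop_iret_frame` is a hypothetical helper that only mimics the effect of three POPW or three POPL operations on a little-endian stack image, using the frame values recorded above.

```python
import struct

def pop_iret_frame(stack: bytes, sp: int, width: int):
    """Pop (eip, cs, eflags) from a little-endian stack image.

    width=2 mimics three 16-bit POPW operations (real-mode style),
    width=4 mimics three 32-bit POPL operations (protected-mode style).
    Hypothetical helper for illustration only.
    """
    fmt = {2: "<H", 4: "<I"}[width]
    popped = []
    for _ in range(3):
        (value,) = struct.unpack_from(fmt, stack, sp)
        popped.append(value)
        sp += width
    return tuple(popped), sp

# Frame values taken from the log: the INT at 0x20ecd pushes 16-bit words,
# the iret at 0x806eeec5 pops 32-bit words.
frame_real = struct.pack("<HHH", 0x0ECF, 0x2000, 0x0202)
frame_prot = struct.pack("<III", 0x0EC6, 0x2000, 0x23206)

if __name__ == "__main__":
    for name, frame, width in (("real", frame_real, 2), ("prot", frame_prot, 4)):
        values, new_sp = pop_iret_frame(frame, 0, width)
        print(name, [hex(v) for v in values], "sp advanced by", new_sp)
```

Reading the 32-bit frame with 16-bit pops would yield a wrong cs and eflags, which is why each push should be paired with a pop of the same width.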
Work to do: set a watchpoint on the RAW ADDR that the system READs in cpu_ldl_kernel. It's reading VAddr 0xf9e6374c, real addr 0xa1a6d74c; then it calls ldl_le_p, which really reads the address 0xa1a6d74c --> 0xec6. (In two consecutive debugging sessions, all the above values are the same!)
Now it's easy: set a hardware read/write watchpoint on 0xa1a6d74c.
>>> *** use command awatch *0xa1a6d74c (speed is pretty fast!)
The contents of 0xa1a6d74c change several times: 0 -> 209 -> 1 -> 209 -> 0x806eec9e -> ...
Using ignore 4 (4 is the awatch bp number) we find that it is hit 53 times. So ignore it 52 times and then check the value (doesn't quite work; sometimes it's only hit twice).
--> Finally found that it is OVERWRITTEN with 0xec6 one instruction before 0xb339934f!!! The next action is helper_trace_mem (write to 0xf9e6374c) and env->cr[3]=0x39000. !!!!!!!!!!!!!! env->eip is 0x806eeec4 (just one instruction before 0x806eeec5).
Dump of instructions:
0x806eee64:
0x806eee6b: mov Ev, Gv
0x806eee6d: arith
0x806eee72: mov Gv, Ev
0x806eee75: push Iv
0x806eee7a: push Iv
0x806eee7f: push Iv
0x806eee84: push Iv
0x806eee89: push Iv
0x806eee8e: mov R, Iv
0x806eee93: arith
0x806eee98: mov R, Iv
0x806eee9d: GRP1
0x806eeea3: arith
0x806eeea5: push
0x806eeea6: pushf
0x806eeea7: GRP1
0x806eeeae: GRP1
0x806eeeb5: push Iv
0x806eeeba: mov R, Iv
0x806eeebf: Op A, Iv
* 0x806eeec4: push
0x806eeec5: iret

Attempt 26: Verify that the push instruction at 0x806eeec4 is the one which pushes 0xec6 onto the stack.
>>> Modify helper_trace2 and add an if branch on 0x806eeec4 and 0x806eeec5.
>>> Hit the breakpoint, then add a bp on do_interrupt_protected and find the address being read.
>>> Run it again and set an awatch point on the address.
>>> Check if this is done by 0x806eeec4 and 0x806eeec5.
--------------------------------------------------------------------------
Conclusion:!!! Confirmed: 0xec6 is purposely pushed by 0x806eeec4 so that iret can jump to it.
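A side check on the popped frame (eip 0xec6, cs 0x2000, eflags 0x23206): the sketch below decodes the eflags bits and computes the linear address under virtual-8086 addressing. The bit positions and the `(cs << 4) + ip` rule are standard x86. That this frame is a deliberate switch into virtual-8086 mode is only one possible reading, not something the log confirms. The function names are hypothetical.

```python
# Standard x86 EFLAGS bit positions.
VM_FLAG = 1 << 17      # virtual-8086 mode
IOPL_SHIFT = 12        # I/O privilege level occupies bits 12-13
IF_FLAG = 1 << 9       # interrupt enable

def decode_eflags(eflags: int) -> dict:
    """Extract the fields relevant to this investigation (hypothetical helper)."""
    return {
        "VM": bool(eflags & VM_FLAG),
        "IOPL": (eflags >> IOPL_SHIFT) & 3,
        "IF": bool(eflags & IF_FLAG),
    }

def v86_linear_address(cs: int, ip: int) -> int:
    """Real-mode / virtual-8086 address formation: segment * 16 + offset."""
    return (cs << 4) + ip

if __name__ == "__main__":
    # Values popped by helper_iret_protected at 0x806eeec5, as recorded in the log.
    print(decode_eflags(0x23206))
    print(hex(v86_linear_address(0x2000, 0x0EC6)))
```

With VM set, cs 0x2000 and ip 0xec6 form linear address 0x20ec6, which is the address that keeps showing up in the earlier attempts.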
Now the problem is that is 0xec6 is an immediate number or from some other registers! Need to dump instructions from 0x806eee9d to 0x806eeec5. Job to do: write a function to dump instructions starting at an address. ---------------------------------------------------------------------------- Attempt 27: write a function printInstr(begin_addr, end_addr) - print all instructions in the range. >>> b ops_sse.h:2259 >>> then call print_instrRange(0x806eee64, 0x806eeec6), got the dump below --------------- @EIP 0x806eee64: length: (7): movl %fs:0x40, %esi @EIP 0x806eee6b: length: (2): mov %esp, %eax @EIP 0x806eee6d: length: (5): sub $0x00000210, %eax @EIP 0x806eee72: length: (3): movl %eax, 0x4(%esi) @EIP 0x806eee75: length: (5): push $0x00000000 @EIP 0x806eee7a: length: (5): push $0x00000000 @EIP 0x806eee7f: length: (5): push $0x00000000 @EIP 0x806eee84: length: (5): push $0x00000000 @EIP 0x806eee89: length: (5): push $0x00002000 @EIP 0x806eee8e: length: (5): mov $0x806EF6CC, %eax @EIP 0x806eee93: length: (5): sub $0x806EEEC6, %eax * @EIP 0x806eee98: length: (5): mov $0x806EEEC6, %edx * @EIP 0x806eee9d: length: (6): and $0x00000FFF, %edx @EIP 0x806eeea3: length: (2): add %edx, %eax @EIP 0x806eeea5: length: (1): push %eax @EIP 0x806eeea6: length: (1): pushf @EIP 0x806eeea7: length: (7): orl $0x00020000, (%esp) @EIP 0x806eeeae: length: (7): orl $0x00003000, (%esp) @EIP 0x806eeeb5: length: (5): push $0x00002000 @EIP 0x806eeeba: length: (5): mov $0x806EEEC6, %eax @EIP 0x806eeebf: length: (5): and $0x00000FFF, %eax @EIP 0x806eeec4: length: (1): push %edx @EIP 0x806eeec5: length: (1): iret @EIP 0x806eeec6: length: (4): mov $0x0012, %ax @EIP 0x806eeeca: length: (2): addb %al, (%eax) * @EIP 0x806eeecc: length: (2): int $0x10 @EIP 0x806eeece: length: (2): les %esp, %eax @EIP 0x806eeed0: length: (2): addb %al, (%eax) @EIP 0x806eeed2: length: (2): addb %al, (%eax) @EIP 0x806eeed4: length: (2): addb %al, (%eax) @EIP 0x806eeed6: length: (2): addb %al, (%eax) @EIP 0x806eeed8: length: 
(2): addb %al, (%eax) @EIP 0x806eeeda: length: (2): addb %al, (%eax) @EIP 0x806eeedc: length: (2): addb %al, (%eax) @EIP 0x806eeede: length: (2): addb %al, (%eax) ----------------- So it's pushing %edx, but its source is from two immediate numbers (look at the two instructions with * above), it's intention should be actually return to 0x806eeec6? From the sequence of POPLs in helper_ret_protected (in POP order), we have new_eip - 0xec6 new_cs - 0x00002000 new_eflags - result of pushf @0x80teeea6 ------------ Conjecture 1: intention is to actually jump back to 0x806eeec6? There are only three instructions after it. It's basically to int 0x10 (ax=0x12) - looks like a VGA I/O request then the instruction les at 0x806eeece is to load result eax into es:[esp] Interestingly there are no more instructions after that! When the BP (at 0x806eeec4) is hit a second time, dump instructions displays the same. -------------- Now simply look at another couple of iret instructions and see if it is arranging PUSHes before iret. examples: 0x804dfa41(no), 0xc01d9 (no), 0x804df9a6(no) ------------------ dump the instructions around 0x20ec6 we have: They are EXACTLY the same around the code 0x806eeec6!!!!!! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11 @EIP 0x20e75: length: (5): push $0x00000000 @EIP 0x20e7a: length: (5): push $0x00000000 @EIP 0x20e7f: length: (5): push $0x00000000 @EIP 0x20e84: length: (5): push $0x00000000 @EIP 0x20e89: length: (5): push $0x00002000 @EIP 0x20e8e: length: (5): mov $0x806EF6CC, %eax @EIP 0x20e93: length: (5): sub $0x806EEEC6, %eax @EIP 0x20e98: length: (5): mov $0x806EEEC6, %edx @EIP 0x20e9d: length: (6): and $0x00000FFF, %edx @EIP 0x20ea3: length: (2): add %edx, %eax @EIP 0x20ea5: length: (1): push %eax @EIP 0x20ea6: length: (1): pushf @EIP 0x20ea7: length: (7): orl $0x00020000, (%esp) @EIP 0x20eae: length: (7): orl $0x00003000, (%esp) @EIP 0x20eb5: length: (5): push $0x00002000 @EIP 0x20eba: length: (5): mov $0x806EEEC6, %eax @EIP 0x20ebf: length: (5): and $0x00000FFF, %eax @EIP 0x20ec4: length: (1): push %edx @EIP 0x20ec5: length: (1): iret @EIP 0x20ec6: length: (4): mov $0x0012, %ax @EIP 0x20eca: length: (2): addb %al, (%eax) @EIP 0x20ecc: length: (2): int $0x10 @EIP 0x20ece: length: (2): les %esp, % ------------------ Conjecture: maybe the part of the code around 0x806eeec6 (push the addr in) is to enforce to jump to 0x20ec6 (switching real mode and protected mode?) >>>> now the question is who's calling and leads to 0x20ec6 eventually!!!!! Attempt 28: keep an arraylist of addresses and then dump it to file when hit 0x20ec6 --> 64MB limit is exceeded at least 10 times. Find the entry point leads to 0x20ec6: use print_instrRange to print instructions. 
0x806e339b 16298285 There is a loop repeated many times: @EIP 0x806f30a2: length: (4): andw $0x00, (%edi) @EIP 0x806f30a6: length: (1): inc %edi @EIP 0x806f30a7: length: (1): inc %edi @EIP 0x806f30a8: length: (1): inc %eax @EIP 0x806f30a9: length: (5): cmp $0x00001000, %eax @EIP 0x806f30ae: length: (2): jc 0xFFFFFFDB @EIP 0x806f3089: length: (5): cmpw $0xFFFF, (%edi) @EIP 0x806f308e: length: (2): jz 0x00000014 > It's a counter loop that increaes (%edi): use vi to remove the above loop, vi command: -----------------------> -----------------------> Finally found the entry of the interrupt handler 0x8052d41f 0x8052d421 0x8054800f 0xf896094e 0xf896094f 0xf8960951 0xf8960955 0xf896095b 0xf8960961 0xf8960963 0xf8960968 0x806f3110 -----------1st instruction here!!!!! 0x806f3112 0x806f3113 0x806f3115 0x806f3118 ------------------------------------------------------ %s/0x806f30a2\n.*a3\n.*.....8e//g It's 0xf8960937 who triggers it!!!!!!!!!!!!!!!!! --------------------------------------------------- >>> now stop at 0xf8960937 and see how it goes Interestingly the bp never hits! 
In ddebug session: the first time it's hit: last_eip: 0xf9e5c937 the second time it's hit: last_eip: 0xf9e5c968 2nd debug session: 1st hit: last_eip: 0xf9e5c937 2nd hit: last_eip: 0xf9e5c968 Instruction dump below: [NOTE the instructions with *] * @EIP 0xf9e5c937: length: (3): lcall *0x2C(%eax) @EIP 0xf9e5c93a: length: (5): push $0xF9E5D900 @EIP 0xf9e5c93f: length: (5): lcall 0xFFFFFA39 @EIP 0xf9e5c944: length: (2): mov %bl, %al @EIP 0xf9e5c946: length: (1): pop %edi @EIP 0xf9e5c947: length: (1): pop %ebx @EIP 0xf9e5c948: length: (1): pop %esi @EIP 0xf9e5c949: length: (1): leave @EIP 0xf9e5c94a: length: (3): ret $0x0004 * @EIP 0xf9e5c94d: length: (1): int3 @EIP 0xf9e5c94e: length: (1): push %ebx @EIP 0xf9e5c94f: length: (2): xor %ebx, %ebx @EIP 0xf9e5c951: length: (4): cmpb %bl, 0x8(%esp) @EIP 0xf9e5c955: length: (6): movl %ebx, 0xF9E5E6AC @EIP 0xf9e5c95b: length: (6): movl %ebx, 0xF9E5E6A8 @EIP 0xf9e5c961: length: (2): jz 0x0000000A @EIP 0xf9e5c963: length: (5): movl 0xF9E5C328, %eax * @EIP 0xf9e5c968: length: (3): lcall *0x2C(%eax) @EIP 0xf9e5c96b: length: (5): push $0xF9E5D900 After recompile, hit 0xf9e5c937 (cr3:0x39000: -> 0x806f3110 -- (use "watch absolute addr of env->cr[3]) to verify that no context switch) --> 0x806eeec4. 2nd hit: 0xf9e5c968 --> ... > 0x805eeec4 -> crash. Now the question is who triggers 0xf9e5c968. Need to look at 0xf9e5c94e (see above code dump). When 0xf9e5c94e is hit, its last eip is 0x8054800f! Break on 0x8054800f (ljmp 0x804d76A0), --------------- the above analysis is not accurate, 0x806f3310 is NOT the first instruction ---------------------------------- --------------------------------------------------------------------- Attempt 28: It's hard to tell the difference between function call. Our conjecture is that process 0x39000 is a process providing interrupt handlers. Very likely the 0x20ec6 is eventually triggered by an interrupt. 
Modification: in helper_trace record last_cr3, if it is switching from other process to 0x39000, record the last_cr3 and last_eip. Chomp the dump of instructions and compare. -----> Observation: (1) the first context switch is from cr3 0x0, eip: 0x4047a9. Dump: @EIP 0x4047a0: length: (1): inc %ebx @EIP 0x4047a1: length: (3): addb %dh, -0x24(%edx) @EIP 0x4047a4: length: (5): movl 0x0046A418, %eax *** @EIP 0x4047a9: length: (3): mov %eax, %cr3 So actually process 0x39000 is the same as process 0!!! (2) the second context switch is from cr3: 0x542e000 eip: 0x804e1f6c cr3: 0x50ae000 eip: 0x804e1f6c ---- dump ---> !!!!!!!!!!!!!---------------- @EIP 0x804e9634: length: (1): push %ebx @EIP 0x804e9635: length: (1): push %esi @EIP 0x804e9636: length: (1): push %edi @EIP 0x804e9637: length: (6): movl %fs:0x00000124, %eax @EIP 0x804e963d: length: (3): movl 0x44(%eax), %ebx @EIP 0x804e9640: length: (3): movl %ebx, -0x24(%ebp) @EIP 0x804e9643: length: (6): lcall *0x804D7650 Interesting, if run in GDB, it's cr3: 0x516e000 eip: 0x804e9634 ----> could not capture it in GDB, the cr3 and eip is ALWAYS switching slightly, but the first instruction is always the same: (cli insruction) 0x804e0f69 - check what it is. @EIP 0x804e0f68: length: (1): nop * @EIP 0x804e0f69: length: (1): cli @EIP 0x804e0f6a: length: (6): movl 0xFFDFF03C, %ecx @EIP 0x804e0f70: length: (3): leal 0x50(%ecx), %eax @EIP 0x804e0f73: length: (4): movb $0x89, 0x5(%eax) @EIP 0x804e0f77: length: (1): pushf @EIP 0x804e0f78: length: (7): andl $0xFFFFBFFF, (%esp) It looks like that the context switch occurs for the push instruction, and note that it is NOT actually an INT instruction. Attempt 30: check the INT instruction and see how it is switched to process 0x39000. 1> check out several instructions that trigger int N. (1) cr3: 0x463c000 (this can change though), pc: 0x8050b895, int 0x2d Interestingly. the BP is only HIT once! there are many other processes invoking interrupts, e.g., 0x39000 itself. 
The following is a list: (1) 0x20980 int 0x10 (2) 0x20d79 int 0x15 (3) 0x206cb int 0x13 At 0x8050b895, the instructions are: @EIP 0x8050b887: length: (2): mov %edi, %edi @EIP 0x8050b889: length: (1): push %ebp @EIP 0x8050b88a: length: (2): mov %esp, %ebp @EIP 0x8050b88c: length: (3): movl 0x10(%ebp), %eax @EIP 0x8050b88f: length: (3): movl 0x8(%ebp), %ecx @EIP 0x8050b892: length: (3): movl 0xC(%ebp), %edx *@EIP 0x8050b895: length: (2): int $0x2D //x86 debug service @EIP 0x8050b897: length: (1): int3 @EIP 0x8050b898: length: (1): pop %ebp @EIP 0x8050b899: length: (3): ret $0x000C After the INT 2d -> it visits 0x804e0032 (still cr[3] 0x44fc000), using the breakpoints we can find htat it executes within the same originator process 0x44fc0000 for some instructions and then it jumps to ---------------------------------------------------------- cr3: 0x39000, eip_in: 0x804dc0b4.!!! my_last_eip is 0x804dc0b1, and (actually both processes these instructions are located in the same addr range. Note the instruction at 0x804dc0b1 at the following dump (when cr3 is 0x39000) @EIP 0x804dc0ab: length: (3): movl 0x18(%edx), %eax @EIP 0x804dc0ae: length: (3): movl %eax, 0x1C(%ecx) *@EIP 0x804dc0b1: length: (3): mov %eax, %cr3 *@EIP 0x804dc0b4: length: (4): movw 0x30(%edx), %ax @EIP 0x804dc0b8: length: (4): movw %ax, 0x66(%ecx) @EIP 0x804dc0bc: length: (3): ret $0x0008 @EIP 0x804dc0bf: length: (3): leal (%ecx), %ecx ---------------------------------------------------------- !!! conclusion: when INT 2d occurs, it jumps to the interrupt handler (still in the same addr space, same cr3 page table). Then somewhere later it switches to 0x39000. Setting a BP at helper_raise_exception (NEVER hit). It seems that it's the INT 2D causes the problem. Attempt 31. Now change the process to watch to branch.exe and look at when it is switching to 0x39000. <<< 1. find out if it's hitting 0x8050b895 in branch.exe. >>> no it didn't hit 0x8050b895 <<< 2. 
modify the conditional breakpoint and check the first time it's switching from branch.exe to 0x39000 >> after some dumps of helper_trae, it hits the switch ONLY once, and then it comes to the exception!!!! 3. now find out the EIP of the instruction >>> eip: 0x804e0f69 (in 0x39000: cli instruction), and the last instruction in branch.exe (0x804e9634). Interestingly, the trace has only several instructions there, as shown below: -------------trace dump v2---------------- #### CR3: c059000, Size: 367, TOTAL: 4683 ... @806ecdc0 [cr3: 1000d000, visited: 1]: add [eax], al @806ecdc7 [cr3: 1000d000, visited: 1]: add [eax], al @806ecdcd [cr3: 1000d000, visited: 1]: add [eax], al @806ecdd2 [cr3: 1000d000, visited: 1]: add [eax], al @806ecdd5 [cr3: 1000d000, visited: 1]: add [eax], al @806ecdd6 [cr3: 1000d000, visited: 1]: add [eax], al @806ecddb [cr3: 1000d000, visited: 1]: add [eax], al @806ecddd [cr3: 1000d000, visited: 1]: add [eax], al ------found last 20ec6------------------------ 0x8056bfc6 0x8056bfc7 >>> set a BP on 806ecdc0 and check the instructions (note env->cr3 is 0x39000) dump is below: @EIP 0x806ecdc0: length: (7): testb $0xFF, 0xFFDFF050 @EIP 0x806ecdc7: length: (6): jnz 0xFFFFFE99 @EIP 0x806ecdcd: length: (5): push $0x000000D1 @EIP 0x806ecdd2: length: (3): sub $0x04, %esp @EIP 0x806ecdd5: length: (1): push %esp @EIP 0x806ecdd6: length: (5): push $0x000000D1 @EIP 0x806ecddb: length: (2): push $0x1C @EIP 0x806ecddd: length: (5): lcall 0x00001ECF ---> jumps 0x806eecac. This can be verified in process 0x39000 dump But interestingly, 0x806ecddd is THE LAST INSTRUCTION recorded for process branch.exe. Need to figure out why the cr3 is switched (it's just an lcall)A At 0x806ecddd, it later hits helper_trace_mem (why is this called? check disas_insn later -- to push next addr), it's writing 0xf8f17d4c. --> interestingly, it is calling helper_trace2 directly without calling helper_stl_mmu!!! why???? 
It's actually ok because the helper_stl_mmu is actually handled together by the ld/st_labels. Do an experiment and see if every qemu_ld ---> helper_trace_mem --> helper_ldl_mmu. >>> b tcg_out_qemu_ld first find an instruction which does ld 0xfe05b compw $0x94, %cs:(%esi) the translation of the instruction starts from 0xb31e6063 add_qemu_ldst_label starts at 0xb31e608c (actually it does not take any actual translation). call helper_mem_trace starts from 0xb31e608c >>> then b helper_trace on the instruction; b helper_trace_mem and then b helper_ldl_mmu ---> it's calling helper_ldl_mmu without calling helper_trace_mem ----> the current implementation is wrong, helper_trace_mem is NOT called at all! the tcg_out_tbl_load (at its end) has a conditional JUMP which direcltyjumps to the next instruction!!!!!!!! So needs to lift the qemu_out_trace_mem UP!!!!!! Verification of the change: b helper_trace on 0xfe05b, then b on helper_trace_mem and helper_ldl_mmu: VERIFIED FIXED. Now repeat the experiment on branch.exe again. check the following instr: 0x806ecddd. @EIP 0x806ecddd: length: (5): lcall 0x00001ECF ---> jumps 0x806eecac. >>> now the problem is that it only records no more than 20 instructions. Fixed the but, it's in Trace::addInstruction(...) Last instruction recorded is the following: @804e1f4c [cr3: 0ec6d000, visited: 0]: mov ax, 0x0023 @804e1f50 [cr3: 0ec6d000, visited: 0]: sub esp, 0x30 @804e1f53 [cr3: 0ec6d000, visited: 0]: mov ds, ax ------found last 20ec6------------------------ It seems after 5 instructions in @7c92xxxx range, it executes around 15000 instructions in the 804exxxx range and never gets out (and then crashed). Last instruction is @7c92289c (jz 0x7 - that is jz 0x7c9228a3). Set a BP on it.. Could not capture on it. The system skipped several instructions at the beginning. >>> record the trace without memory trace and see how is it different. (1) Correct Trace: to 0x00401014, captured over 889k instructions. 
Instruction 804e1f25 is always hit after some 7c92xxxx instruction.
(2) Incorrect Trace: about 16k instructions. Interestingly, the incorrect trace dumps the contents of instructions correctly!
!!! GUESS: somehow cpl or the segment registers are not set correctly.

Attempt 31. Switch back to srss.exe and record the list of instructions of 0x39000 and see what the difference is (similarly, add some code to dump the instructions before hitting 0x20ec6).
<<<< ???? Write a simple program to search for instructions that occur in trace_bad.txt but not in trace_good.txt <<<<
>>> (1) tac trace_bad.txt > reverse_trace_bad.tx
>>> (2) Write a simple python script to find the first departure point between trace_bad and trace_good.
> First departure: 0x806ee297 --- guess: it's an interrupt handler function that jumps depending on some values.
The sequence of departure instructions (the instruction is the same in both traces and the next instruction is different) is:
-----------------------------------
size of hash table is 2884927
departure point: 0x806ee297 - RET
departure point: 0x804e37fd - jz 0x00000052 -- THIS SEEMS TO BE THE ONE
    Then after 0x804e37ff, none of the instructions were visited before. About 43k instructions executed --> in trace_good, the jz is only hit once.
departure point: 0x8054b131 - ret $0x000C
departure point: 0x804e2acc - ret
departure point: 0x8054b03c - ret 0x8
departure point: 0x804da2de - ret
departure point: 0x804fa5ee - ret 0x8

Attempt 32. To verify eflags and hflags, we'll switch back to srss.exe to monitor.
(1) Get its trace first and then dump the eflags/hflags. Develop a function that dumps the eflags.
Print eflags and hflags at instruction 0x7c92289a (the first instruction): e_flags and h_flags are exactly the same at 7c92289a! Then dump the e_flags (202) and h_flags (400b4) when it's dumping the trace.
---- good_trace: dumped at 0x800ca1f6, still eflags (202) and h_flags (400b4)
---- bad_trace: dumped at 0x20ec6, eflags (23002) and hflags (4008c7) !!!!
Now inspect the situation at 0x804f08a3 (the departure point):
--- good_trace: first couple of hits 202 and 400ab4, last couple of hits 202 and 4000b4. Seems to be OK though.
--- bad_trace: 0x804f08a3 is hit multiple times; last hit: 202 and 4000b4, first couple of hits: 400ab4 (note: ab4)
----> Completely the same; there is no way to explain the different behavior of the comparison with $0x20FD at 0x804f08a3.

Attempt 33. Now remove the printf function and perform the analysis again. After replacing it with a dummy function, found that the problem is with the parameters passed!!!! Strangely: if passing all 4 parameters to dummy_func, it malfunctions. If passing any ONE parameter not as a constant, it hangs! Comparison below:
----------------------------------------------------------
good version (passing all 1's as parameters):
0x832efd2 <helper_trace_mem+106>: movl $0x1,0xc(%esp)
0x832efda <helper_trace_mem+114>: movl $0x1,0x8(%esp)
0x832efe2 <helper_trace_mem+122>: movl $0x1,0x4(%esp)
0x832efea <helper_trace_mem+130>: movl $0x1,(%esp)
0x832eff1 <helper_trace_mem+137>: call 0x8459174 <dummy_func(unsigned int, unsigned int, unsigned int, int)>
----> Parameters ARE pushed onto the stack. Setting a bp at 0x832efd2 with b *0x832efd2, we have ESP value 0xb02fde00; after the call is finished: (void *) 0xb02fde00, so ESP is back to its normal status.
------------------------------------------------------------- ------------------------------------------------------------- bad version (pass one REAL parameter): 0x832efd2 <helper_trace_mem+106>: mov -0x1c(%ebp),%eax 0x832efd5 <helper_trace_mem+109>: mov 0xec(%eax),%edx 0x832efdb <helper_trace_mem+115>: mov 0x85e92ac,%eax 0x832efe0 <helper_trace_mem+120>: mov -0x24(%ebp),%ecx 0x832efe3 <helper_trace_mem+123>: mov %ecx,0xc(%esp) 0x832efe7 <helper_trace_mem+127>: mov %edx,0x8(%esp) 0x832efeb <helper_trace_mem+131>: mov -0x20(%ebp),%edx 0x832efee <helper_trace_mem+134>: mov %edx,0x4(%esp) 0x832eff2 <helper_trace_mem+138>: mov %eax,(%esp) 0x832eff5 <helper_trace_mem+141>: call 0x8459178 <dummy_func(unsigned int, unsigned int, unsigned int, int ----> parameters ARE pushed into the stack, set a bp at 0x832efd2 by b *0x832efd2, we have ESP value: 0xb02fde00, after the call is finished:(void *) 0xb02fde00, so the ESP is back to its normal status. **** (GOODVERSION!!!) By setting a BP at helper_trace2, we found that the system is involved in a small infinite loop of around 30 instructions 0xfffffff0 --> ljmp $E05B 0xfe05b --> cmpw $0x94, %cs:(%esi)A --> calls dummy_func 0xfe062 --> jnz 0xC031E58E 0xfe066 --> xor %eax, %eax ... 0xfc493 0xfc495 lidt %cs: (%esi) hint the dummy_func twice 0xfc49b lgdt %cs:(%esi) hit the dummy_func twice --> after this instruction env->gdt->base becomes 0xfd3a8!!!! 0xfc4a1--> mov cr0, %eax 0xfc4a4--> mov %eax, %cr0 0xfc4a8--> ljmp 0xC4B3:0xC4B3 0xfc4ab -- ljmp $0xC4B3, 0xC4B3 ---> 0xfc4b3 (it is accomplished via helper_ljmp_protected: new_cs=8, new_eip=fc4b3, next_eip_addend=8 load e1 = 0xffff, e2 = 0xcf9b00, cpl and dpl are both 0. Then it jumps to 0xfc4b3 (new_eip)) ------------------------------------------------------------- *** (BAD VERSION!!!) 
same setting 0xfffffff0 --> ljmp $E05B 0xfe05b --> cmpw $0x94, %cs:(%esi)A --> calls dummy_func 0xfe062 --> jnz 0xC031E58E 0xfe066 --> xor %eax, %eax 0xfe068 --> mov 0xfe06a --> mov $7000, %sp 0xfe070 --> mov $416c, %dx 0xfe076 --> ljmp 0x5566e40c 0xfc480 --> mov %ax, %cx 0xfc483 --> cli 0xfc484 --> cld 0xfc485 --> mov $0x008f, %ax 0xfc48b --> out %al, $0x70 0xfc48d --> in $0x71, %al 0xfc48f --> in $0x92, %al 0xfc491 --> or $0x2, %al 0xfc495 lidt %cs: (%esi) hint the dummy_func twice 0xfc49b lgdt %cs:(%esi) hit the dummy_func twice 0xfc4a1--> mov cr0, %eax 0xfc4a4--> mov %eax, %cr0 0xfc4a8--> ljmp 0xC4B3:0xC4B3 0xfc4ab -- ljmp $0xC4B3, 0xC4B3 back to 0xfffffff0 AGAIN!!! ---> 0xfffffff0 (it is accomplished via helper_ljmp_protected: new_cs=8, new_eip=fc4b3, next_eip_addend=8 *** load e1 = 0, e2 = 0 Different!!!!! it loads dt from env->gdt {selector 0, base 0, limit 55} It's not changed after lidt and lgdt. --------------------------------------------------- Attempt 34: Analyze how dummy_func would affect lgdt %cs: (%esi) -------------------------------------------------- *** good version: before call helper_trace_mem dump registers: eax 0xc480 50304 --> 0 ecx 0xf0000 983040 edx 0xfd3a0 1037216 ebx 0xfd3a0 1037216 esp 0xb02fde40 0xb02fde40 ebp 0x28da4b90 0x28da4b90 esi 0x0 0 edi 0x2 2 eip 0xb31e6453 0xb31e6453 <code_gen_buffer+1107> eflags 0x246 [ PF ZF IF ] cs 0x73 115 ss 0x7b 123 ds 0x7b 123 es 0x7b 123 fs 0x0 0 gs 0x33 51 After call helper_trace_mem dump registers, changes eax, eip. Note that it's 0xb31e64d5 (mov %ecx, 0xc4(%ebp)) writes into env->gdt->base! While 0x...9b's translated code starts at 0xb31e642b! %ecx is from helper_ldl_mmu (addr: 0xfd3a2, mmu_idx:0) -> 0xfd3a8 *bad version --> the problem is with second parameter: addr is 0x2 -> which is definitely not right. %esi in emulator is 0, env->segs[1]->base is 0xf0000 (CS). Good version is the same. So it's where to load the address that matters! ---> next set BP on helper_ldl_mmu!!!!!!!!!!!!!!!!! 
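For reference when comparing the two load addresses above: lgdt reads a 6-byte pseudo-descriptor, with the 16-bit limit at the operand address and the 32-bit base at operand address + 2. The sketch below parses such an operand from a flat memory image. The operand address 0xfd3a0 and the base 0xfd3a8 come from the good-version values in the log; the limit value written into the test image is a placeholder, and `parse_pseudo_descriptor` is a hypothetical helper rather than QEMU code.

```python
import struct

def parse_pseudo_descriptor(mem: bytes, addr: int):
    """Read the lgdt/lidt memory operand: 16-bit limit, then 32-bit base.

    Returns (base, limit, base_load_addr), where base_load_addr is the
    address the 32-bit base is fetched from (operand address + 2).
    """
    limit, base = struct.unpack_from("<HI", mem, addr)
    return base, limit, addr + 2

if __name__ == "__main__":
    mem = bytearray(0x100000)            # 1 MiB flat image, zero filled
    operand = 0xFD3A0                    # operand address seen in the good version
    PLACEHOLDER_LIMIT = 0x37             # assumption: the real limit is not given in the log
    struct.pack_into("<HI", mem, operand, PLACEHOLDER_LIMIT, 0xFD3A8)

    base, limit, base_addr = parse_pseudo_descriptor(bytes(mem), operand)
    print(hex(base), hex(limit), hex(base_addr))
```

Under this layout the base load should come from 0xfd3a2, as in the good version. The bad version's address of 0x2 does not correspond to operand + 2 for any operand inside the 0xf0000 segment, so the address is being lost before helper_ldl_mmu is reached.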
------------------------------------------------------ Attempt 35: check helper_ldl_mmu how it's different ----------------------------------------------------- <<<< (1) set BP on helper_trace2 and display/x eip_in (2) run until 0xf4e9b (where it tries to load lgdt) (3) then start step by step and see how parameter 0xfd3a2 (good version) and bad version (0x2) is passed to helper_ldl_mmu --------------- good version --------------- -------------- bad version -------------- called helper_ldl_mmu twice 1st time: (for lidt) addr ix 0xfd3e2, return: 0xfd3e6 2nd time: (for lgdt) addr is 2 Note that the two envs->regs and env->segs are exactly the same! Check translate.c line 7359 for the logic: gen_lea_modrm: A0 <- disp A0 <- A0 + seg *reg_ptr = OR_A0 *offset_ptr = disp T1 <- [A0] A0 <- A0 + 2 T0 <- [A0] base <- T0 (cpu_T[0]) limit <- T1 (cpu_T[1]) The point is why A0 <- disp + seg yields 2? Find out the corresponding instructions to the two micro operations in gen_lea_modrm: tcg_ctx.gen_opc_ptr is 0x28c1ce7c (A0<-disp) microcode: 11 (movi_32) ends 0x28c1ce82 (end of A0<-A0+2):miccode: 18 (ld_32), 22 (add_i32) (corresponds to tcg_ctx.gen_opc_ptr[80] to [82](included)) checking tcg/i386/tcg.c tcg_gen_code, then correspond to code at A0 <- disp is just saved in args[15] (TCGTemp*), it is not really translated into an instruction A0 <- A0 + seg is then located from 0xb31e642b to 0xb31e642e, as dumped below: --------------------------------------------------------------------- !!!!!NOTE THAT THE FOLLOWING RUNNING TRACE IS EXACTLY THE SAME AS THE GOOD TRACE EXCEPT instruction 0xb31e64a4 (where %ecx has different value) ****************** translation of lgdt %cs(%esi) ************************* 0xb31e642b : mov 0x54(%ebp),%ecx %ecx has the set value (cs in emulator) At run time: its %ecx receive value 0xf0000 (this is CS selector) 0xb31e642e : mov $0xd3a0,%ebx (%ebx has the const value) 0xb31e6433 : lea (%ebx,%ecx,1),%edx (this is the add edx = ebx + ecx) # after this point 
$edx is 0xfd3a0 # there is a helper_trace_mem call RIGHT AFTER this # after the helper_trace_mem call: edx is 0xfd3a0 #------------------ seems to be doing T1 <- [A0] now ------------------ Then we have the following sequence of instructions before helper_ldl_mmu #----------------------------------------------------------------------- 0xb31e6436 : mov %edx,%ebx #save edx to ebx. ebx = 0xfd3a0 now 0xb31e6438 : mov %eax,0x80(%esp) #esp is now 0xb02fde50, save $eax to 0xb02fded0 (val: 0xc480) 0xb31e643f : mov %ecx,0x8c(%esp) #save ecx (0xf000) to 0xb02fdedc 0xb31e6446 : mov %edx,0x88(%esp) #save edx (0xfd3a0) to 0xb02fded8!!! # later you will see these are generated by # check of clobber registers and save them at line 1934!!! # again, this is to preserve registers eax, ecx, edx ---------------------------------------------------- #// set a watch point on 0xb02fded8 here, see who's changing it! 0xb31e644d : push $0x1 #param 4: bRead 1 0xb31e644f : push $0x1 #param 3: size 1 0xb31e6451 : push %ebx #param 2: addr 0xfd3a0 (read 0xfd3a0) correct! 0xb31e6452 : push %ebp #param 1: env 0xb31e6453 : call 0x832ef68 <helper_trace_mem> 0xb31e6458 : add $0x10,%esp #at this point $esp is 0xb02fde40, reverse # to its orignal status, to 0xb02fde50! 0xb31e645b : mov %ebx,%eax #eax has now 0xfd3a0 0xb31e645d : mov %ebx,%edx #edx has now 0xfd3a0 0xb31e645f : shr $0x8,%eax #eax has now 0xfd3 0xb31e6462 : and $0xfffff001,%edx #edx is still 0xfd000 0xb31e6468 : and $0xff0,%eax #eax is now 0xfd0 0xb31e646e : lea 0x35c(%ebp,%eax,1),%eax #eax is now 0x28da5ebc # guess operand T1? 0xb31e6475 : cmp (%eax),%edx 0xb31e6477 : mov %ebx,%edx #edx has 0xfd3a0 again 0xb31e6479 : jne 0xb31e65d7 <code_gen_buffer+1495> #---------------------------------------------------------------- Then we have another call of helper_trace_mem!, as seen below: note that the actual cpu_ldl_mmu is not done yet! because they need to do the read/write operations at the end of the block. 
the following is the memory trace for read operation to read [A0+2] #---------------------------------------------------------------- 0xb31e647f : add 0xc(%eax),%edx # eax = 0x28da5ebc 0xb31e6482 : movzwl (%edx),%ebx #$edx is 0xb06be3a0 now, ebx=0xfd3a0, # after ebx -> 0x37??? 0xb31e6485 : mov 0x88(%esp),%esi # esi is from [88+esp], # should be value of A0: 0xfd3a0 # because A0 in s->temp[15] has # the information about its location # in memory! 0xb31e648c : lea 0x2(%esi),%ecx # ecx is addr parameter # from 2 + esi 0xb31e648f : mov %ecx,0x88(%esp) # save ecx to stack # [0xb02fded8] = 0xfd3a2 0xb31e6496 : push $0x1 #4th param bRead --> this is to read 0xb31e6498 : push $0x2 #3th param size --> 2 0xb31e649a : push %ecx #2nd param addr --> 0xfd3a2 0xb31e649b : push %ebp #1st param env --> stored at $ebp (0x28da4b90) 0xb31e649c : call 0x832ef68 <helper_trace_mem> ************************************************************************** Then the handling after helper_trace_mem is similar ********************************************************************** 0xb31e64a1 add $0x10,%esp #esp is 0xb02fde50 0xb31e64a4 mov %ecx,%eax # eax = 2 !!!! in the good trace $ecx is !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^ !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^ !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^ # ecx is 0xfd3a2. So in the bad version, $ecx value is DESTRUCTED # somehow it's not saved!!!!! 
!!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^ !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^ !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^ 0xb31e64a6 mov %ecx,%edx # edx = 2 0xb31e64a8 shr $0x8,%eax # eax = 0 0xb31e64ab and $0xfffff003,%edx #edx = 2 0xb31e64b1 and $0xff0,%eax #eax = 0 0xb31e64b7 lea 0x35c(%ebp,%eax,1),%eax #eax = 0x28da4eec 0xb31e64be cmp (%eax),%edx # 0xb31e64c0 mov %ecx,%edx #edx = 2 0xb31e64c2 jne 0xb31e65f2 ********************************************************************** Now call the helper_ldl_mmu! 0xb31e65f4 push %ecx #push param 2: addr (2)!!!!! 0xb31e65f5 push %ebp #push param 1: env 0xb31e65f6 call 0x82f0d6e <helper_ldl_mmu> ********************************************************************** +++++++++++++++++++++= FINALLY FINALLY Problem is helper_trace_mem did NOT preserve ECX register, but this is according to Intel ABI (ECX, EDX, and EAX need not be preserved!!!) . But ECX register is used by the CALLER! That's the problem! Fix: enfoce preserve ECX, EDX but not EAX register! Question 1. check if there is any special flags for pre-serving registers ----------------------------------------------- Attempt 36: figure out why %ecx is allocated for storing temporary. Analyze which generates the following instruction pairs: 1. 0xb31e6446 : mov %edx,0x88(%esp) #save edx (0xfd3a0) to 0xb02fded8!!! 0xb31e6485 : mov 0x88(%esp),%esi # esi is from [88+esp], 2. 0xb31e648f : mov %ecx,0x88(%esp) # save ecx to stack !!! 
but there is no corresponding restore instruction
--------------------------------------------------
<<< approach:
(1) set BP at helper_trace2 and disas_insn, stop at 0xfc49b (lgdt %cs:(%esi))
(2) observe the micro-op code generated for each step
(3) set BP at tcg_gen_code_common and watch the instruction range
*** use tcg_ctx->gen_opc_ptr and its difference with tcg_ctx->gen_opc_buf
    to infer the location and content of the micro-code
CROSS-REFERENCE with the previous attempt to check why the restoration code
is missing for the second helper_trace_mem!
<<<
>>>
Pseudo-code                 Micro-code        x86 Instructions
gen_lea_modrm:
  A0 <- disp                11 (movi_32)      0xb31e642b (no-code gen)
  A0 <- A0 + seg            18 (ld_i32),      0xb31e642b
                            22 (add_i32),     0xb31e642e,
T1 <- [A0]                  114 (ld16u)       0xb31e6436
A0 <- A0 + 2                11, (movi_32)     0xb31e6485
                            22, (add_i32)     0xb31e6485
T0 <- [A0]                  116 (ld32)        0xb31e648f
                            [!!!diff from 114!, because it's loading T0], (load
T0 <- T0 & 0xffffff         11 (movi32),      0xb31e64cd
                            31 (and_i32)      0xb31e64cd
base <- T0 (cpu_T[0])       21 (st_i32)       0xb31e64d5
limit <- T1 (cpu_T[1])      21 (st_i32)       0xb31e64db
#from debug A0: 0xfd3a0
****************************************************************
Conclusion:
1. 0xb31e6446 : mov %edx,0x88(%esp)   # belongs to T1 <- [A0]
   0xb31e6485 : mov 0x88(%esp),%esi   # belongs to A0 <- A0+2
2. 0xb31e648f : mov %ecx,0x88(%esp)   # belongs to T0 <- [A0]
**************************************************************
-------------------------------------------------------------
Attempt 37: based on attempt 36, now look specifically at 0xb31e6485 [A0<-A0+2]
(MICROcode 11, 22, index: 84, 85), check why it's reading from memory;
<<< do x/16i 0xb31e6485 from time to time to check the code generated.
-------------------------------------------------------------
Observation: for the opcode at 84 (11, movi_32), it just calls tcg_reg_alloc_movi,
but does not generate any real x86 code. It saves constant 2 into the ops/args array.
At opcode 85 (22, add_i32), it calls tcg_reg_alloc_op(...).
It first copies the constant, then it calls tcg_regset_set(allocated_regs, s->reserved_regs),
then it uses a for loop to retrieve the arguments. For the 1st arg, its type is
TEMP_VAL_MEM (value 2; note that 0 is dead, 1 is register, 2 is MEM, and 3 is const).
It calls tcg_reg_alloc, which allocates reg 6, then it generates a load instruction
which loads from memory (esp+0x88), and records this information in
s->reg_to_temp[reg] (reg = 6), mapping the register to the arg number (which is used to
retrieve the value record at s->temps[arg]). That generates the instruction at 0xb31e6485.
Note that it sets ts->mem_coherent to 1 (because currently the value is the same as the memory).
In this debug session: ts is located at s->temps[15] (arg=15). The reg allocated is number 6 (esi).
-----------------------------------------------------------
Attempt 38: how are the MEMORY VARIABLES saved (their register values saved into memory)?
Check instruction 0xb31e648f [T0 <- [A0]] (microcode 116, index 86, ld32).
At this moment A0 is already A0+2, saved in argument 15.
Check why it's saved to memory but NEVER read out!!!
Note that instruction 0xb31e648f writes the value of A0 (stored in %ecx) into memory!
-----------------------------------------------------------
+++ the idea is similar. This time it's reading out of the register directly.
arg is still 15 (A0, see attempt 37). Note that its type is NOT TEMP_VAL_CONST,
it's indicated as TEMP_VAL_REG. It, as usual, calls tcg_reg_alloc_op (just like 114):
(1) it first copies s->reserved_regs to allocated_regs, it seems to be a
    bit string 0x48 01001000
(2) then it identifies the first argument to be 15 (this is A0),
    located at s->temps[15]. val_type=1 (REG), reg=1, val is 0xd3a0,
    mem_coherent is 0. Then it uses tcg_regset_test_reg(arg_ct->u.regs, reg)
    for the if branch. Note that the reg value is 1, and arg_ct->u.regs is 0xfa
    (it seems to test if the register is contained in some constraint set).
(3) It then sets allocated_regs (again, verified that allocated_regs is
    a bit string). Its value is now 0x32 = 0011 0010. The logic of
    tcg_regset_set_reg(d,r) is defined as (d) |= 1L << (r).
    Here it's setting bit 2 (from right).
(4) Then because the ld instruction is defined in tcg.h as
    an OPF_CALL_CLOBBER instruction (meaning it clobbers call registers
    and potentially updates globals),
    !!! at line 1934 of tcg.c it calls tcg_reg_free!!!!
    if the register is in tcg_target_call_clobber_regs!
    It is initialized in i386/tcg-target.c (EAX, ECX, EDX are set),
    and its value is now: 0x7 (111). Notice that the const values of EAX... here
    are TCG_REG_EAX etc! Values defined as
    TCG_REG_EAX = 0, ECX=1, EDX=2, EBX=3, ESP=4, EBP=5, ESI=6, EDI=7
    (note that this is different from the definition in target-i386/cpu.h!!!!)
    So here reg=1 actually means ECX!!!!!
    !!!!!!!!###########!!!!!!!!!!!!!!!!
    At line 1937 it's a simple loop which synchronizes (saves reg to mem!!!)
    It synchronizes ECX (1). The logic of tcg_reg_free is simple:
    it reads s->reg_to_temp[reg] which returns the arg_id; the memory
    variable can be retrieved using s->temps[arg_id]. Then if it's not
    memory coherent, the system generates a tcg_out_st instruction.
    !!!!!!!!!!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
    It then sets ts->mem_coherent to 1???? This is quite suspicious as
    the st operation is not done yet??? need to verify
    !!!!!!!!!!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
    !!!!!!!!###########!!!!!!!!!!!!!!!!
Conclusion: the registers ARE properly saved when the instruction is
identified as a CLOBBER instruction! QEMU will save the CLOBBER registers
(EAX, ECX, EDX) to memory BEFORE doing the ld or st instruction!
------------------------------------------------------------
Attempt 39: now the question: when generating the code for ldl tasks,
why does the system read directly from %ecx?
s->temps[15] at the end of handling of T1<-[A0] is set as a memory variable,
and ecx has been saved to memory!
Need to look at instruction:
0xb31e64a4 mov %ecx,%eax # eax = 2 !!!!
It corresponds to the following micro code operation
T0 <- [A0]   116 (ld32)   0xb31e648f [!!!diff from 114!, because it's loading T0],
check the instruction at 0xb31e64a4 to identify the translation,
while continuing the exploration right after attempt 38
------------------------------------------------------------
Observation: it's falling into tcg_out_tlb_load, which takes the register
%ecx directly as the parameter (so there is no check of the argument). This is
actually set by tcg_reg_alloc_op: because earlier tcg_reg_alloc_op has loaded
A0 into the register and has synchronized it, it assumed that the value won't be
messed up when it calls tcg_out_ld --> tcg_out_tlb_load. But actually before
tcg_out_tlb_load, there is a call to helper_trace_mem which might mess up the
register values!!!!
-----------------------------------------------------------------
Conclusion: REAL CAUSE OF THE BUG!!!!!!!!!!!!!
---------------------------------------------------------------
###: tcg_out_tlb_load has the assumption that all registers are EXACTLY the
same as at tcg_out_qemu_load, because tcg_reg_alloc_op has already
cleared all register issues (saving and synchronizing them). But because
we inserted a function call before it, the CLOBBER registers might be messed up.
In this case, we ONLY need to PUSH registers EAX, ECX, EDX onto the stack,
and at the end, POP them out.
-------------------------------------------------------
#######################################################
Attempt 40: push EAX, ECX, EDX onto the stack and pop them out
at the end of tcg_out_trace_mem.
#######################################################
-------------------------------------------------------
(1) works.
(2) recover the mem_handle function, with logic commented out. works.
(3) now enable the real memory recording logic. works
(4) remove dummy function. works.
(5) register windows. works.
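The fix of Attempt 40 can be sketched as follows. This is only an illustration, not the actual tcg_out_trace_mem code: the function name emit_preserved_call and the byte-vector emitter are hypothetical, and the pushing of the helper's arguments (env, addr, size, bRead) is left out. The x86 encodings used (push r32 = 0x50+r, pop r32 = 0x58+r, call rel32 = 0xE8) and the register numbers EAX=0, ECX=1, EDX=2 from Attempt 38 are the only facts relied on.

```cpp
#include <cstdint>
#include <vector>

// Register numbers as listed in Attempt 38 (same as the x86 encoding).
enum Reg : std::uint8_t { EAX = 0, ECX = 1, EDX = 2 };

// Hypothetical emitter: surrounds "call rel32" with push/pop of the
// caller-saved registers, so code generated after the call still finds
// the values it expects in EAX, ECX and EDX.
// (Argument pushes for the helper are omitted in this sketch.)
inline void emit_preserved_call(std::vector<std::uint8_t>& out, std::int32_t rel32) {
    const Reg clobbered[3] = {EAX, ECX, EDX};

    // push %eax ; push %ecx ; push %edx
    for (Reg r : clobbered)
        out.push_back(static_cast<std::uint8_t>(0x50 + r));

    // call rel32 (opcode 0xE8 followed by a little-endian displacement)
    out.push_back(0xE8);
    const std::uint32_t disp = static_cast<std::uint32_t>(rel32);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((disp >> (8 * i)) & 0xFF));

    // pop %edx ; pop %ecx ; pop %eax (reverse order of the pushes)
    for (int i = 2; i >= 0; --i)
        out.push_back(static_cast<std::uint8_t>(0x58 + clobbered[i]));
}
```

Popping in reverse order matters: the stack is LIFO, so EDX (pushed last) must be restored first.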
-----------------------------------------------------------------
Task 41: add a config file
-----------------------------------------------------------------
(1) place the file in traceinstr/config.txt (5 min)
(2) add a function init_traceinstr() in handle.h and handle.cc (10 min)
(3) the init_traceinstr() initializes a number of global variables. (10 min)
-----------------------------------------------------------------
Task 42: add dependency analysis
-----------------------------------------------------------------
(1) Later for memory management, add an additional layer of abstraction called
    dependencyCache which maps from instruction to instruction. At this moment,
    simply use a C++ hash map. [15 min]. DONE. (see accessHistory.h)
(2) Add configuration trace range, to trace instructions in a specific range.
    [25 min] DONE. (see trace.c)
(3) In the Memory Access handlers, add the dependency access [20 min]. DONE
(4) Testing the above.
    [0] add a range to dump instructions. [10 min]. DONE.
    [1] get a proper range (about 20 instructions). [15 min]. DONE
    [2] check the initial dump [15 min]. DONE.
    [3] modify dump to include data dependency [20 min]. DONE.
    [4] examine and check data dependency [40 min]
--------------------------------------------------------------------
Task 43: analyze the problem of instruction dump error and failure of fflush
--------------------------------------------------------------------
(1) fflush problem. When running ./run.sh, it exits or crashes early 2 out of 3 times.
    Run in gdb: strangely it hits exit(0), but the output of trace.dump() did not
    show up, including all the printf statements around it. Somehow it seems that
    the printf()'s are directly stopped by exit(). Also trace.dump() is somehow
    run in parallel with other printfs. How would that be possible?
    It's supposed to be sequential.
    Answer: the threads/processes running on the emulator are actually
    loaded as real threads in the OS? So the printfs can be from others?
    --- very strangely, even if the exit is protected with getchar, the exit()
    call still gets executed! It seems that getchar() does not work at all.
    Notice that not using monitor mode does not help either.
    Temporary solution: reducing the number of instructions to capture to 300
    solves the problem temporarily.
----------------------------------------------------------------
Task 44: problem of dump for system lib instructions.
----------------------------------------------------------------
(1) add into CONFIG file a file to write. DONE.
(1) Identify the problematic instruction. svchost.exe, 5c679cc0. DONE
(2) Set a BP and find the real instruction.
    env: segs = {{selector = 35, base = 0, limit = 4294967295, flags = 13628160}, {
    instruction is CMPL $0x...., %ecx.
(3) Compare the values of segment registers. DONE.
    The problem is that update_instr passes an empty buffer (contents not filled in)
    to add_instr when the instruction is not there.
(4) Fix. DONE. Add one function has_instruction into the system so that it
    checks if the instruction is there before update_instruction is called;
    if the instruction is not there, fill out the instruction buffer first.
---------------------------------------------------------------
Task 45: Register access? Think about solution!
--------------------------------------------------------------
(1) study what is available in the libdisasm package
    x86_insn_t -> operands (type: x86_oplist_t)
                  operand_count (no need to use this)
    x86_operand_list has a next pointer
    x86_opt_list->op (type: x86_op_t)
    x86_op_t -> type (x86_op_type)
             -> data (cast as x86_reg_t)
    if the type is op_register then it's a register, but the handling of
    op_expression could be more complex.
    x86_operand_foreach(op_src/op_dest) can read the target and source operand.
    Now the only problem is how to handle the op_expression:
    read the register from data.expression.base or index.
(2) think about implementation plan.
    Add a function to InstrInfo class: parseInOutReg().
    Pseudo-code as below:
        for each input operand (use x86_operand_foreach call)
            if type is reg, then add the register code
            if type is op_expression, then add the register base.
    Note:!!!! jz conditional branch's flag register is not captured!!!
(3) use GDB to do the coding first and exploration. DONE.
(4) finish the implementation and testing. In progress. see below.
(5) fix op_expression. DONE.
(6) fix the src/dest problem. TEMPORARY solution: do not handle op_expression
    for those index based or displacement based addressing modes (ignore the
    dependency on the registers). This might impact the pointer arithmetic,
    we'll handle it later. DONE.
(7) Fix instruction dump to dump all registers. DONE.
(6) fix conditional logic on flag registers. DONE.
    --> define a boolean function bool isReadFromFlags(x86_insn_t *insn)
    --> add those instructions that are conditional jumps, conditional calls
    --> define a boolean function bool isWriteToFlags(x86_insn_t *insn)
    --> add arithmetic, logic, and flag_manip instructions, check instr_group
    --> find out the ID of the eflags -- call x86_flag_reg. DONE.
(7) Add the register dependency. Use the accessHistory class to add a dependency.
    (a) declare the accessHistory. DONE.
    (b) add to handle_instruction. For write register, update the accessHistory;
        for read register, add the dependency. DONE
    (c) Debug. DONE.
----------------------------------------------------------
Task 46: dump trace when process exits
----------------------------------------------------------
(1) find out the following information.
    (a) read about process exit in windows. System call and interrupt number?
        zwTerminateProcess -> set EAX: 0x0101
        It eventually executes the SYSENTER instruction
        ----> so SYSENTER with eax = 0x0101 IS THE time to dump the trace. DONE.
    (b) find out the interrupt handler in QEMU --> seems to be helper_raise_sysenter.
        see translate.c:7141.
    (c) Need to understand how the sysenter is translated.
        *** key: how to read EAX. Is it safe to read from env->regs[0]? ***
        (c.1) understand the disas_insn logic at 7141.
            tcg_ctx->gen_opc_ptr = 0x28c2618c
            (a) generate a jump to itself! Two steps
                INDEX_op_movi_i32 (mov PC to a temp)   0xb
                INDEX_st (mov temp -> env->eip)        0x15
                0xb (mov)
            (b) gen_helper_sysenter two opcodes: 0x8 (call),
            Now tcg_ctx->gen_opc_ptr is 0x28c26194
        (c.2) understand how these opcodes are handled.
            Set a bp at 2259 of tcg.c (this is the loop that processes microcode
            one by one). Initially gen_opc_buf is located at 0x28c2617c (this is
            16 bytes earlier). So we need to hit it 8 times (2 bytes per code).
            s->code_ptr is: 0xb6ea4383
            The code is:
            0xb6ea4383 : movl $0x7c90eb8d,0x20(%ebp)   //to set env->eip
            0xb6ea438a : mov %ebp,(%esp)
            0xb6ea438d : mov %eax,0x80(%esp)           //push param env
            0xb6ea4394 : call 0x82fa589 <helper_sysenter>
            It seems that EAX is not protected. env->regs[0] may be different
            from the real value of EAX (emulated) (because env->regs[0] has only
            the value from the beginning of the block).
        (c.3) understand the logic of helper_sysenter.
            It resets all segment registers' bases and limits.
            It sets EIP to env->sysenter_eip = 0x804def6f
            @EIP 0x804def6f: length: (5): mov $0x00000023, %ecx
            @EIP 0x804def74: length: (2): push $0x30
            @EIP 0x804def76: length: (2): pop %fs
            @EIP 0x804def78: length: (2): mov %cx, %ds
            @EIP 0x804def7a: length: (2): mov %cx, %es
            @EIP 0x804def7c: length: (6): movl 0xFFDFF040, %ecx
            Note how the system handles %ecx at 0x804def6f: its corresponding
            tcg_ctx->gen_opc_ptr is 0x28c26184, and it has two operations:
            movl_T0_im, mov_reg_T0. Clearly we need to look at mov_reg_T0 ->
            it eventually calls tcg_gen_ext32u_tl(cpu_regs[reg], t0) (here reg is 1,
            which stands for ecx; eax is 0 I guess). Note that cpu_regs[reg] maps
            from reg number to a temporarily allocated reg (reg renaming).
            It generates mov_i32 (reg to reg) at 0x28c26184 (microcode: a)
            Now set a BP at tcg.c:2259; it later leads to tcg_reg_alloc_mov
            which calls tcg_regset_set(allocated_regs, s->reserved_regs).
            *** if cpu_regs[1=ecx] is 6, then its data store is located in
            tcg_ctx->temps[6], as shown below
            {base_type = TCG_TYPE_I32, type = TCG_TYPE_I32, val_type = 2, reg = 3,
             val = 0, mem_reg = 5, mem_offset = 4, fixed_reg = 0, mem_coherent = 1,
             mem_allocated = 1, temp_local = 0, temp_allocated = 0,
             next_free_temp = 0, name = 0x851fc2b "ecx"}
            Inside tcg_reg_alloc_mov there are a lot of cases to handle: e.g., the mem
            and the register itself are not coherent, etc.
            Note the following attributes of TCGTemp in tcg/tcg.h:
            * fixed_reg
            * mem_coherent means the REGISTER has been synchronized (saved) to
              the TCGContext->temps array.
(2) define the algorithm. DONE
    1. add an attribute EAX_BEFORE_SYSENTER to CPUX86State (5 min)
    2. in the part of disas_insn (line 7141) which handles the SYSENTER
       instruction, add code to save the current EAX register value to the global
       attribute EAX_BEFORE_SYSENTER. (20 min)
       (a) declare a function: gen_save_regs_before_SYSENTER()
       (b) find out how to st to env
           Similar to the following:
           gen_op_mov_TN_reg(OT_LONG, 0, R_EAX); //check what's the proper ot size? 4?
           tcg_gen_st_tl(cpu_T[0], cpu_env, offsetof(CPUX86State, eip));
    3. in helper_sysenter, read out EAX_BEFORE_SYSENTER; if it is 0x0101,
       then this is to terminate the process. (10 min)
(3) implementation. DONE.
(4) testing.
    (a) set BP at seg_helper.c:2251 (inside helper_sysenter), watch the value of
        EAX captured. The gen_op_mov_reg_T0 ... has problems; debug it. Worked now!
----------------------------------------------------------------
Task 47: check data dependency on branch.exe again.
---------------------------------------------------------------
Done. 30 minutes.
--------------------------------------------------------------
Task 48: add control dependency
-------------------------------------------------------------
0. add Instruction::addControlDependency() 5 min. DONE.
1. record last instruction - declare it in Trace class. 5 min
2. modify Trace::updateInstruction. 5 min. DONE.
3. check srss.exe. 10 min. DONE.
4. check branch.exe. 15 min. SKIPPED
-----------------------------------------------------------
Task 49: Establish FTP repository between VM instances
--------------------------------------------------------
Solution:
(1. not working). use FTP. However, MS DOS has some stupid error and does not
    support passive FTP mode and we always get 500 port illegal port.
(2. use the TAP device). Pretty much follow the instruction at
    http://en.wikibooks.org/wiki/QEMU/Networking. Install and properly edit the
    qemu-ifup and qemu-ifdown scripts. Note that a lot of Linux commands such as
    openvpn and firestarter need to be installed. DOES NOT WORK.
(3. following KVM instruction) https://help.ubuntu.com/community/KVM/Networking.
    Basically modify /etc/network/interfaces to set up the bridge adapter directly.
    See below:
    ----------------------
    auto eth0
    iface eth0 inet manual

    auto br0
    iface br0 inet dhcp
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0

    auto eth1
    iface eth1 inet manual

    auto br1
    iface br1 inet static
        address 169.254.236.150
        network 169.254.236.0
        netmask 255.255.255.0
        broadcast 169.254.236.255
        gateway 169.254.236.100
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0
    ------------------------
    In run.sh, notice that the type of adaptor is important. PCI-virtio is not
    recognized by the guest XP; it needs to be replaced with rtl8139. Also need to
    use "br0" to replace "hn0" in the original command. Note that the host can
    capture all traffic by tcpdump on "br0". Run.sh string see below:
    Then sftp can be provided by psftp, download from www.chiark.greenend.org.uk.
(2) Adding a second parameter.
    The trick is to duplicate the handling of br0 and add br1. Note that the
    169.254.*.* network adaptor needs to be set as STATIC IP ADDRESS.
    So the handling is slightly different for br1.
    **** don't forget to enable "br1" in /usr/local/etc/qemu/bridge.conf
    ############## 2nd adaptor still cannot work ##### check later.
---------------------------------------------------------------
Task 50: study VM snapshot
---------------------------------------------------------------
Implementation: in QEMU monitor use the commands savevm and loadvm.
loadvm needs about 1 minute. But it's better than nothing.
Check why KVM is not possible later.
*** to make save/load faster, use -drive file=winxp.img,cache=unsafe
It shortens load/save to 5 seconds!!!
----------------------------------------------------------------
Task 50: Slice Algorithm
----------------------------------------------------------------
Implementation Steps:
(1) InstrInfo: add a boolean attribute bCondBranch. Init to true in its
    constructor when it is a conditional branch. [15 min]. DONE
(2) in Trace add a data member Vector slice. [30 min]. DONE.
    in Instruction add a boolean marker, bInslice.
    Add a function setSlice():
        Q = new Queue {instr, all branch instructions}
        mark all in slice
        while Q is not empty:
            ins = Q.removeFirst()
            if ins->hasNoDataDependency, mark in slice
            for each data dependency of ins: add it into queue
        go over all the instructions and add them
        reverse the slice.
(3) Trace::dumpSlice(int low, int upper_limit) [10 min] DONE.
    dump each slice.
(3.5) Add configuration. [20 min]. DONE.
(4) At this moment, can study the slice. [1 hr]
    *1. bug. CMP is not included in calculation. <-- it's the problem
        in the tracing algorithm. FIXED.
    *2. bug. instruction at 0x401034 (for slicing to 0x40103e) is included
        (which should not be included). <--. The slice is too coarse.
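The setSlice() pseudo-code of the Slice Algorithm task can be sketched as a worklist traversal. This is a minimal sketch under assumptions: the Instr struct, its deps field (indices of the instructions it has a data dependency on) and the name computeSlice are hypothetical stand-ins for the real Instruction and Trace classes, which are not shown in these notes.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, simplified view of one traced instruction.
struct Instr {
    bool isCondBranch = false;       // plays the role of bCondBranch
    std::vector<std::size_t> deps;   // indices of instructions it reads data from
    bool inSlice = false;            // plays the role of bInslice
};

// Seeds the queue with the slicing criterion and every conditional branch,
// then follows data dependencies backwards until the queue is empty.
// Returns the indices of the slice members in trace order.
inline std::vector<std::size_t> computeSlice(std::vector<Instr>& trace,
                                             std::size_t criterion) {
    std::deque<std::size_t> queue;

    // Mark on enqueue so that no instruction is queued twice.
    auto enqueue = [&](std::size_t i) {
        if (!trace[i].inSlice) {
            trace[i].inSlice = true;
            queue.push_back(i);
        }
    };

    enqueue(criterion);
    for (std::size_t i = 0; i < trace.size(); ++i)
        if (trace[i].isCondBranch) enqueue(i);

    while (!queue.empty()) {
        const std::size_t cur = queue.front();
        queue.pop_front();
        for (std::size_t d : trace[cur].deps) enqueue(d);
    }

    // Collect marked instructions by walking the trace once.
    std::vector<std::size_t> slice;
    for (std::size_t i = 0; i < trace.size(); ++i)
        if (trace[i].inSlice) slice.push_back(i);
    return slice;
}
```

Because the result is collected by scanning the trace from the start, it is already in execution order and the final "reverse the slice" step is not needed in this variant. Seeding with every conditional branch is also what makes the slice coarse, as noted in bug *2.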
--------------------------------------------------------------
Task 51: Network Problem Again: cannot FTP or SFTP to host network
--------------------------------------------------------------
Summary of http://translate.google.de/translate?hl=en&ie=UTF-8&sl=de&tl=en&u=http://qemu-buch.de/de/index.php/QEMU-KVM-Buch/_Netzwerkoptionen
* syntax for -net nic vlan=0, macaddr=xx:xx, name=nic1, model=rtl8139|e1000..,
* by default each nic is attached to a VLAN (-net user), default 10.0.2.0 network,
  with DHCP server at 10.0.2.2; the host is also accessible from 10.0.2.2
* -net user,restrict=y forbids log on connection to host
* "info network" in QEMU monitor to watch network status.
* port redirection is to redirect a PORT (say 12345 on host) to a PORT of the
  GUEST system (say 22).
* TAP is a software adaptor. In TAP mode, the VLAN is connected to the TAP device;
  *** in run.sh insert
      sudo tunctl -t tap0 -u csc288
      at the end -> sudo tunctl -d tap0
* in /etc/qemu-ifup, add one line
      #!/bin/sh
      sudo /sbin/ifconfig $1 10.0.2.100 [note here the host is set to 10.0.2.100]
  In /etc/qemu-ifdown, set it to
      sudo /sbin/ifconfig $1 down
  TAP alone does not work! No DHCP, and cannot connect though.
* in /etc/network/interfaces, first create a bridge br0 attached to eth0; then
  in /etc/qemu-ifup, bridge the br0 to tap0.
  Use sudo "brctl show" to display the information.
#######################------------------#############################
Solution:
(1) do not create the bridge statically. Still use the old /etc/network/interfaces
    file to bring up eth0 (DHCP) and eth1 (static address).
(2) prepare startbridge.sh in qemu_images. This file adds the bridge, adds a tap
    and bridges them.
    sudo tunctl -t tap0 -u csc288    #create tap0
    sudo ip addr flush dev eth0      #will drop IP from eth0
    sudo ip addr flush dev tap0      #will drop IP from tap0
    sudo ifconfig tap0 0.0.0.0 up    # strangely this line is required
    sudo brctl addbr br0
    sudo brctl addif br0 eth0 tap0   #now eth0 and tap0 are bridged
    sudo ip link set dev br0 up
    sudo dhcp br0                    # will set up routing table
    Use 10.0.2.2 as default gateway. It seems that there is no way to mix both the
    10.0.2.x and 169.254.0.0 networks (so we skip eth1 here)
(3) run.sh setting: notice that both script and downscript are NOT used.
    -net nic,model=rtl8139 -net tap,vlan=0,ifname=tap0,script=no,downscript=no
SFTP is running too slow!!! change /etc/ssh/sshd_config to allow longer login time
!!!! Make sure "route -n" produces the right routing table. If not, do
"sudo service networking restart".
------------------------------------------------------------
Task 52: Add samba support
------------------------------------------------------------
(1) sudo apt-get install samba
(2) modify the /etc/samba/ config file to enable file sharing, starting with
    [homes]
    ...
    COMMENT OUT ### valid users = %S
    then have the following
    ----------------
    comment = Network Logon Service
    path = /home/samba/smbuser
    public=yes
    security = share
    guest ok = yes
    guest only = yes
    read only = no
    force account = smbuser
(3) sudo smbpasswd -a smbuser (to add a new user)
(4) sudo smbd reload
----------------
In Windows XP, visit Network Places and add a network place.
In a command window do the following:
    net use X: \\10.0.2.16\smbuser
Then the X drive is available.
-----------------------------------------------------------
Task 53: snapshot problem
----------------------------------------------------------
After network is enabled, cannot do snapshot
Solution:
(1) experiment: if network is disabled, can we do snapshot? - STILL no
(2) observation: even after disabling the entire network, dropping the connection,
    and replacing it with the user network, it still does not work.
(3) debugging into QEMU: found that the system is reloaded and helper_trace2 is
    being executed. The system is trapped in some kernel service (infinite loop).
    Might be some device problem that causes the error.
(4) attempt1: drop the network device and the smb link in the "network places"
    and try it again. Worked! It seems that it is the smb link in "network places"
    causing the trouble
(5) attempt2: enable the tap device and try the snapshot again. Strangely, the
    guest OS has exactly the same IP 10.0.2.15 and this dynamic IP is assigned by
    DHCP at Hofstra. We decide to use a different MAC and see how it works.
    --- DOES NOT WORK!!!! TAP device not working.
(6) attempt3: use the user network; however, there was a blue screen. Try it again.
    STILL NOT WORKING, CANNOT duplicate (4).
    ------------------> by debugging, we found that Windows XP is stuck in an
    infinite loop in process 0x39000.
------------------------------------------------------------
Task 53: Binary Rewriting
------------------------------------------------------------
Implement Trace::writeSliceIntoExecutable(char *filename, vector<Section*>) [3 hrs]
    //1. move to entry
    //2. add instruction one by one, without readjusting addr
    //3. finish writing.
Plan:
(1) add definition SectionInfo, BinaryWriter [15 min] DONE.
(2) add function Trace::writeSliceIntoExecutable(fileName, vector<SectionInfo*>)
    [5 min]. call binwrite.writeSliceToExec. DONE.
(3) add CONFIG reading for Vector Info [25 min]. DONE.
(3.5) get all section data from b3.exe. [15 min]. DONE
(4) Implement a naive algorithm which FIRST CLEARS all areas with NOP, and simply
    writes the instructions ONE BY ONE. [65 min including debugging]
    (4.1) declaration of all related functions [15 min]. DONE
    (4.2) writeBytes(FILE *file, int location, char *instr, size) [10 min]. DONE.
    (4.3) writeInstruction(Instruction*, FILE *file, int location) [10 min]. DONE.
    (4.4) getSectionInFileOffSet(vecSections, Instruction) return -1 if could
          not find. [15 min] DONE.
    (4.4) writeSlice (just write each instruction) [15 min]. DONE.
    (4.5) testing [30 min]. DONE.
(5) Add the function clear section. (15 min). DONE
(6) Test write Slicing using b3.exe. [1 hr]. DONE.
-----------------------------------------------------------------
Task 54: Snapshot FINAL SOLUTION!!!!!!!!!!!!!!!
-----------------------------------------------------------------
(1) try user net. CORRECT SEQUENCE IMPORTANT!!!
    (a) stopvm - delvm if any, restart
    (b) stopvm - savevm (use a new id, maybe related) - quit immediately
        *** IMPORTANT! don't do anything before quitting!!!
    (c) restart - loadvm (without stop)
    SUCCESS!!!!
(2) try tap device. Problem: IP is the same as host.
    ****** IMPORTANT. CHECK route -n FIRST to make sure 10.0.2.2 is the default
    gateway for all traffic!!!!
    **************------------------ *********************
    ***!!!! use STATIC IP 10.0.2.17 for the guest (if it has the
    conflicting 10.0.2.15 ip assigned!!!!) #***********
    **************------------------ *********************
    Now set the X: drive using
        net use X: \\10.0.2.15\smbuser
(3) now the snapshot problem again!!!! not working. info cpus found that cpu
    status is false.
    **** still follow (1). do a couple of ***info cpus*** frequently.
----------------- FINALLY SET ----------------------------------------
-----------------------------------------------------------------
Task 57: Solve the Oversize Slice Problem
-----------------------------------------------------------------
Problem: say a function a() is called and uses parameters supplied by
multiple functions (e.g., b, c, d, e, g). Let's say x only uses the result from d,
but the current slicing algorithm will include b, c, e, g as well. This occurs a
lot for system functions such as those in ntdll.
Idea: associate a time-stamp with each register writing/memory writing operation.
When a dependency link is established, the dependency should be tagged with the
timestamp of the object that it is accessing.
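The time-stamp idea can be illustrated with a small sketch. It is not the project's accessHistory or dependLink code: the class AccessHistory, the struct DependLink and all member names below are hypothetical, and a location is simply a 32-bit key (a memory address or a register id). The sketch only shows a read being tagged with the time stamp of the write it observes.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical dependency link: which instruction produced the value,
// and the time stamp of that write.
struct DependLink {
    std::uint32_t writerEip;
    long readTimeStamp;
};

// Hypothetical access history: remembers the last write to each location.
class AccessHistory {
public:
    // Called for every register/memory write.
    void recordWrite(std::uint32_t location, std::uint32_t eip, long timestamp) {
        lastWrite_[location] = Entry{eip, timestamp};
    }

    // Called for every read; fills 'out' and returns true if the location
    // was written earlier in the trace.
    bool linkForRead(std::uint32_t location, DependLink& out) const {
        auto it = lastWrite_.find(location);
        if (it == lastWrite_.end()) return false;
        out.writerEip = it->second.eip;
        out.readTimeStamp = it->second.timestamp;
        return true;
    }

private:
    struct Entry {
        std::uint32_t eip;
        long timestamp;
    };
    std::unordered_map<std::uint32_t, Entry> lastWrite_;
};
```

With the tag in place, two executions of the same writer instruction can be told apart by their time stamps, which is what lets a slicer skip dependencies that belong to a different call.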
#### Implementation Plan:
(1) collect current slice size: 28739 for b3.exe (total trace size: 50656)
(2) introduce a global timer of long. (name: lTimestamp) [10 min] update the lTimestamp in handle_instruction. DONE.
(3) define a new class called dependLink (in instr.h) [15 min]. DONE
(4) define the comparison function for dependLink (in instr.h) [15 min]. DONE.
(5) modify the data dependency of Instruction class [10 min]. DONE
(6) update the write/read dependency for mem. [30 min]
    (a) update accessHistory add time stamp [15 min]. DONE.
    (b) update mem read and write [15 min]. DONE.
(7) update the write/read dependency for reg [30 min]
(8) test. [1 hr] trace size: 47272, slice size: 27140. Does not improve. Reason: we did not take advantage of the time stamp information in the slice alg.
(9) ALG: add another timestamp to each dependency link (the creation time). So a link has two time stamps: readTimeStamp (the time that the dependee is created) and createTimeStamp (the time that the link is created, and thus the time of writing to the destination value). Given two links LinkA -> LinkB, they have to match the condition: linkA.readTimeStamp = linkB.createTimeStamp
(10) Implementation:
    (a) add creatTimeStamp to dependLink class and change all related functions. [30 min]. DONE.
    (b) modify data dependency of memory. [20 min]. DONE
    (c) modify data dependency of reg [15 min]. DONE
    (c.2) test first [20 min]. DONE.
    (d) modify dependency algorithm [20 min]. DONE.
    (e) testing [30 min]. DONE
    IMPROVEMENT: slice size: 5944 (reduced to 20% of original size).
-----------------------------------------------------------------
Task 56: Revisit the slicing algorithm. Fix it
-----------------------------------------------------------------
1. Fix the jump instruction issue.
Idea: Need to think about the control dependency. Each Instruction will have a "prev link" which indicates the prior instruction right before it.
However, in most cases the prior instruction, if replaced with NOP, will still lead to the current instruction. In this case, there is no control dependency on it. Only when the previous instruction is a JUMP, BRANCH, CALL, or INT3 do we need to add an explicit CONTROL DEPENDENCY between them. The handling of RET needs special care: to set up the prev link we need to trace back and find the CONTROL DEPENDENCY. The control dependency of an instruction should include all control dependency of a prev link. If the prev link.
1. Implementation Plan:
(1) Make the following modification in Instruction class:
    (1) add data member: priorInstruction [5 min]
    (2) add comments for set<dependLink*> controlDependency [5 min] previous instruction in
    (3) in Trace::updateInstruction, update priorInstruction/controlDependency.
        (3.1) add a global variable Instruction *prevInstruction to Trace. [5 min] DONE.
        (3.2) add a function setControlDependency() [5 min]. DONE.
        (3.2) add a function findPriorInstruction (which traces back for the prior instruction) [15 min] DONE.
        (3.4) add an inline function which tells if an instruction isJUMP() [10 min] DONE
        (3.5) add an inline function which tells if an instruction isRET() [10 min] DONE
        (3.5.1) add an inline function for adding control dependency. [15 min] DONE.
        (3.6) Complete setControlDependency() [15 min] DONE.
        (3.7) Test findPriorInstruction [15 min] Use SRSS.exe DONE.
        (3.8) Test setControlDependency [15 min] Use SRSS.exe DONE.
        (3.9) Change algorithm of tracing [20 min]
            (a) add a function genDirectControlDependency() - return a set of dependLinks [65 min]. DONE
            (a.1) change the add control dependency algorithm [15 min] DONE.
            (b) build it into algorithm [5 min]. DONE
            (c) test the addControlDependency first [5 min] DONE.
            (d) test the selectively add control dependency function [15 min]. DONE
            (e) test the slice algorithm [15 min]. add a local variable to record queue size.
        (3.20) Test use SRSS.exe [15 min].
        DONE
            (a.1) fix the dump [10 min] DONE
            (a.2) fix the setPriorInstr [10 min]. DONE
        (3.21) Test use b4.exe [1.5 min]
            (a.1) set up the watching stats. [10 min]. DONE.
            (a.2) check why it's too slow. [75 min] It seems to be the control dependency size that causes the problem.
            (a.3) remove the oldest timestamp. [25 min] DONE. Still memory problem.
            (a.4) run the program b3 again and see what is actually the problem. [30 min] It seems to be a corrupted b3.exe. Reconfigure the system. DONE.
            (a.5) study the trace file again. Problem: slice has only one instruction. Control dependency is not taken in! [15 min]
            (a.6) fix the above bug [60 min] trace into Trace::setSlice and go step by step.
                (1) fixed one bug related to queue push
                (2) control dependency has no readAccessTS. (fixed)
            (a.6) fix the data dependency link problem [30 min] When doing the inclusion set, should do -1!
            (a.7) problem: register dependency not taken into account. [60 min] It seems that -1 does not occur to register dependency. Fix: in selectivelyAddControl dependency, timeStamp and accessTime are the same, fix that!
            (a.8) there is still a bug related to the "-1" problem. Debug: trace into instruction 0x401034 and 0x401031 and 0x401024, and observe the timestamp associated with each of the links. Fix: the problem is that when handling memRead and memWrite, the actual time stamp should be -1. FIXED
-----------------------------------------------------------------
Task 57: Slicing Algorithm. Avoid visiting the same control dependency link multiple times.
-----------------------------------------------------------------
use cache.
(1) introduce a cache set and compare function in Trace class. [15 min]
(2) use the cache in selectiveAddControlDependency. [10 min]
(3) test [15 min]
    *** set::count() DOES NOT work! --> replacing count() with find() does not work either. Very strangely, linkComp works.
    New experiment: in updateInstruction, just create another two dependLink and see if we could break at the comparison function.
    Fixed the stupid error: forgot to add the instruction to the set of visited. Took 3 hrs to fix it.
-----------------------------------------------------------------
Task 58: Slicing Algorithm Problem: slicing stops at 0x401005.
-----------------------------------------------------------------
Observation: it seems to be a simple bug of "<" vs "<=". CLEARED. done.
-----------------------------------------------------------------
Task 59: Examine the slice. See if it's executable.
-----------------------------------------------------------------
Observation: the sliced program is stuck in an INFINITE LOOP! check later.
Infinite loop: 0x401298 to 0x4012AC. However, the behavior departs from 0x401277 and 0x40127D.
Found bug: 0x401277 depends on 0x401275 (XOR ESI, ESI). But instruction at 0x401275 is not included in slice.
Check dump: *** the problem again is the time stamps not matching each other. There is a gap of around 0x30 between the access time of 0x401277 and 0x40127d and the two depend links can NOT be connected.
Debugging design:
(1) set BP at 0x401275, 0x401277, and 0x40127d
(2) conditional breakpoint on the dependLink constructor. check time stamp.
Observation: the problem is caused by the context switch. At 0x401277 there is a context switch, and 0x401277 is later executed twice (resumed) and the time stamps cannot be connected.
Solution: once it is found that it is a context switch, add the current time stamp as the access time to each dependlink.
Question: how to discover that this is a return from a context switch? Check the global previous instruction insn_type.
Implementation:
(1) in Trace::setControlDependency, add a case switch for IRET, and add a call to Instruction::updateDependencyForIRET. [5 min]. DONE
(2) implement updateDependencyForIRET.
    (a) for control dependency, if the priorInstr already exists, just update the time stamp. [10 min] DONE.
    (b) for data dependency, first find the max smaller ts in all dependencies, then check those which contain this max-smaller ts and add the current ts. [20 min] DONE
(3) Debugging
    (a) walk through all newly added instructions. [60 min]. Need to send commands using QEMU Monitor, otherwise hit the bp too fast. DONE
    (a.1) fixed a small bug related to NULL. [5 min]
    (b) check the resulting file. The "XOR" issue is fixed. Now the problem is that JMP is missing.
-----------------------------------------------------------------
Task 60: Examine the loss of control dependency of JMP instruction.
-----------------------------------------------------------------
(1) collect the trace and check 0x401275. It seems that it did not trace back to 0x401070.
(2) debug: set conditional breakpoint at 0x401275 in slicing, check what is going on.
    (a.1) it seems that 0x004013cf is in the slice, still need to check if it is WRITTEN. VERIFIED -> OK!
    (a.2) debug again, check if the queue is accessed RIGHT. Verified OK.
(3) new problems? at 0x004012da
-----------------------------------------------------------------
Task 61: Slicing Algorithm Bug 3. at 0x4013CA
-----------------------------------------------------------------
Observation: the function is called, it's entirely empty, and it does not RETURN to the right place.
(1) check why 4013CA is in the slice: because the following instructions have dependencies on it:
    (1) 0x804e1f25 (looks like context switch, NOT IN SLICE). check later
    (2) 0x403e5e (ret, NOT IN SLICE): to return
(2) debug: check how 0x4013CA is listed in SLICE. Set conditional BP at trace.cc:213, 286, also set on push link to Queue. Found that 0x4013CA is pushed by __0x4013ca__________ to the queue.
Bug found: 0x4013ca is listed as control dependency of itself!!!
Observation: 0x4013ca is reached from an IRET, then updateDependencyForIRET is called. The problem is that the update of the lastInstr occurs too early!!!
--> actually not: there is a recursive call of setControlDependency (which should actually be setPriorInstr) -> then the "lastInstr" is not updated correctly. Correct it and still keep the set lastInstr operation. -----> FIXED. 0x4013CA is not in slice any more.
-----------------------------------------------------------------
Task 62: Fix RET/IRET logic
-----------------------------------------------------------------
The RET/IRET should be included in control dependency. Simply add a control dependency.
-----------------------------------------------------------------
Task 63: Slicing Algorithm Bug 4. 0x00403DFB
-----------------------------------------------------------------
Problem: function call without pushing the parameters. But these parameters are accessed by system calls. E.g., 0x7C8017FD has data dependency on the push instruction, but the push instruction is not included. The problem is that those instructions (which are NOT supposed to be in the slice) are not wiped with NOP instructions. So they still have the dependency on the pushed parameters.
For Task 62, there is also a problem: what if the target instruction has no data dependency on a function at all? Then the entire function should NOT be included.
[1] Fix for 62: make the slicing algorithm a multiple pass algorithm. In each pass, go over the instructions whose PriorInstr is a call again, search the entire trace and look for those CALL/RET pairs (forward); if there is any instruction in slice in between, add the RET instruction as the control dependency for the subsequent instruction, and redo the slicing. Repeat until no more is added.
[2] Algorithm:
    /** It checks the trace in the forward fashion: for each call instruction, find the corresponding SUBSEQUENT INSTRUCTION, then search in the trace and see if there is any instruction in slice; if there is any, add the corresponding RET instruction as the control dependency for the SUBSEQUENT instruction.
    */
    int Trace::checkCallPairs(queue toProcess);
    /** scan from ts1 to ts2 and see if there is any instruction in slice */
    bool Trace::hasInstructionInSlice(long ts1, long ts2);
    // some assisting functions
    Instruction *getSubsequentInstr(Instruction *ins); //get the subsequent instruction in EXECUTABLE (not in trace)
    Instruction *getInstruction(unsigned int addr, unsigned int cr3);
[3] For the above, we need to introduce a new history class for the ENTIRE trace. As the trace can be very huge, we only RECORD THE ADDRESS of instructions being hit at each time. This class should be at the logical/abstract level and be later expanded to include support for disk-mem exchange operations. As for the packing/unpacking, the different versions of instructions should be modeled by the Instruction class (and later queried by timestamp).
    class instrAddrHistory{
    public:
        instrAddrHistory(unsigned int cr3);
        unsigned int getCR3();
        long getSize(); //return the total number of instructions (addr) recorded
        void appendInstrAddr(unsigned int addr);
        unsigned int getInstrAddr(long timestamp);
    };
    Add a hash table to Trace class which maps from cr3 to instrAddrHistory
-----------------------------------------------------------------
Task 64: Slicing Algorithm Bug 4.
-----------------------------------------------------------------
Now fix the CALL pair RET problem.
(1) add an addrInstrHist map to Trace [5 min] DONE.
(2) in Trace::updateInstruction, push the address to the history, including testing [30 min] DONE.
(3) add prototype of the following functions. [10 min] DONE
    int Trace::callInSliceCalls(queue toProcess);
    bool Trace::hasInstructionInSlice(long ts1, long ts2);
(4) implement handleInSliceCalls [120 min]
    (a.1) there is an infinite loop problem. Fix it by adding a bit of information about access. Algorithm: added one bool bit for representing the in slice status.
(6) test handleInSliceCalls [30 min]
-----------------------------------------------------------------
Task 65: Slicing Algorithm Bug 5.
Infinite loop at 0x00403DFB
-----------------------------------------------------------------
Problem: parameter pushing. The algorithm thought that some functions (like those in imported dlls) are not contained in slice, but actually they are, because we do not modify those system dlls. This leads to some missing parameter pushing operations when calling these functions.
Fix idea: if a function (not in slice range) has some function in slice, then all function instructions (with proper timestamp) are labeled as in slice.
Implementation:
(1) add parameter vecSectInfo to setSlice, and handleInSliceCalls [5 min] DONE.
(2) add a function processInSliceCall(ts1, ts2) [5 min] DONE.
(3) implement processInSliceCall(ts1, ts2) [125 min]
    (1) issue 1? infinite loop. fix
    (2) issue 2: call/pair loop repeated too many times. fix.
    (3) issue 3: infinite recursion again. fixed. slice size: 43557 (entire size: 50997)
    (4) remove the add RET in control link.
(4) debug processInSliceCall [15 min]
(5) test the resulting file [15 min] Infinite loop solved.
-----------------------------------------------------------------
Task 66: Add CONFIG parameter FILE_TO_ANALYZE
-----------------------------------------------------------------
(1) add FILE_TO_ANALYZE in config file and add implementations to it. [15 min]. DONE
(2) test it. [10 min]. DONE.
-----------------------------------------------------------------
Task 67: collect stats
-----------------------------------------------------------------
Collect: (1) total in range instructions. (2) total number of instructions in slice
Implementation:
(1) add two attributes: nInRangeInstr, nInRangeSliceSize into trace class [5 min] DONE
(2) in Trace::addInstruction increment nInRangeInstr [5 min]. DONE.
(3) in Trace::writeSlice increment nInRangeSliceSize [10 min] DONE.
(4) test [90 min]
    (1) trace. DONE.
    (2) testing DONE. in range slice size: 1169, in range trace size: 2470.
-----------------------------------------------------------------
Task 68: test slicing functions
-----------------------------------------------------------------
idea: the same function called twice. One is used and the other is not. Make sure that the other one does not appear in the slice.
Implementation:
(1) compile and dump the program. [10 min]
(2) run slice algorithm and then analyze it [40 min]
Test program in b1.cpp -> b1.exe, run as b5.exe
Entry: 0x401030
SLICE_AT: 0x00401068
Found problems:
(1) parameters are NOT pushed (still related to the function body). Function body is not in slice! error in previous algorithm.
(2) the function is still being called twice.
Fix Debugging:
**************
(1) parameters are NOT pushed (still related to the function body). Function body is not in slice! error in previous algorithm.
**************
Set condition BP on 0x401048/401057 where 0x401005 is called. Found several problems:
(1) to test if an instruction is in slice, should use the timestamp. DONE. But this does not fix the two duplicate calls.
(2) the number of instructions included in slice does not seem right. ONLY 1 instruction added. --> this seems OK. 0x401026 is captured; however, it did not trace to 0x401023. Problem with 0x401026: when building the dependency it does not include the READ register. Disable the getDiff(input_reg, output_reg) in the constructor of InstrInfo. DOES NOT WORK. Now trace into InstrInfo constructor and check out how registers are being processed. Found the problem is with the handling of wr type of registers. But it forms a self-loop. Fix the Instruction::updateRegDependency. FIXED.
-----------------------------------------------------------------
Task 69: fix issue 2 (duplicate function calls)
----------------------------------------------------------------
(1) first change the linkComp comparison: two instructions are regarded as equal only when both the instruction and create timestamp are the same.
(2) made similar changes to handleMemRead and handleMemWrite --- still does not work.
debugging plan
(1) bp at instr.cc:252 and check what happens when instruction 0x401026 is hit twice. Strangely two links are added, but in the dump there is only one showing up.
(2) now the two links did show up. NOTE: dump file too large, introduce debug levels. But it still takes a very long time to produce the slice. --- still not fixed.
Debugging Plan: BP on setSlice and trace into the while loop. Check the instructions added one by one. Added one debug function for dumping the contents of queue.
Observation: tracing on the ESP/EBP registers caused the problem. It also greatly enlarges the slice generated. Now the question is: can we ignore the ESP/EBP tracing? If a "normal" compiler is used and ESP/EBP is not used to pass data, then we can safely ignore them.
Implementation Steps:
(1) find the MACRO code of ESP/EBP. It's actually in the big structure in disasm/ia32_reg.c; search for "eax". esp and ebp are at index 5 and 6 respectively. This is verified by the dump of trace as well. DONE.
(2) add system parameter TRACE_ESP_EBP. DONE.
(3) modify the logic for TRACE_ESP_EBP. DONE.
(4) check the size again. Still takes quite some time to generate the slice. DONE! Dual function call problems solved. Now the slice size is half of the in range trace.
-----------------------------------------------------------------
Task 70: New problem with seh_prolog4
----------------------------------------------------------------
The problem is that there is some instruction assignment to ESP, and disabling TRACE ESP/EBP lost the track.
Experiment 1: enable ESP trace again, see if the problem persists.
    All range: 40304/51002, In Range: 1313/2284, total links: 724,692 (this number is much larger); doubles the time.
Experiment 2: disable ESP trace.
    All range: 31495/51520, In range: 1007/2284, Total links: 300658, 2 minutes
Experiment 3: in the ENABLE version, ignore the instructions such as CALL/RET (because they will change ESP/EBP by default anyway).
    All range: 39772/51542, In Range: 1245/2284, Total Links: 845305.
    It DOES NOT HELP! The reason: there are still instructions inside functions (like pop) impacting the ESP value.
Experiment 4: add hist based visit check in setSlice, see the result.
    links visited: 513k. Improves about 40%.
    Slice Size: 30526, In Range Slice Size: 888, In Range Trace Size 2284
Experiment 5: performance is very bad. Check if this is the problem of hist swapping. Add stats to hist swapping. Also add the timing information.
    Stats: # total links visited: 409622, histSwaps: 15384, duration 232.000000 (sec) - 4 minutes
    It seems that there are too many history swaps.
Experiment 6: increase the size of history buffer to 20MB and see the result. Saves about 10%
    # total links visited: 407121, histSwaps: 29, duration 196.000000
    Slice Size: 26059, In Range Slice Size: 879, In Range Trace Size 2284
    *** so here the major bottleneck is still the total number of links visited.
Experiment 7: still use the same idea as experiment 3, ignore CALL/RET/PUSH/POP, only handle the specific MOV/ADD instructions that involve ESP/EBP. There is a timestamp related bug. Fixed the addDirectControl. Still need TO TEST IT! This greatly reduces the time spent on slicing.
    # total links visited: 280876, histSwaps: 8, duration 56.000000
    Slice Size: 31531, In Range Slice Size: 1019, In Range Trace Size 2284
    Experiment 7 does not fix the problem with seh_prolog4. Does not work. PUSH/POP not working. Has to cancel the optimization on the ESP/EBP.
-----------------------------------------------------------------------------
Task 71: fix problem with instruction 0x40129C depends on 0x401297
-----------------------------------------------------------------------------
Problem: AX instead of EAX register.
Implementation: add the handling in Instr handler and setInOutReg of InstrInfo. Fixed.
-----------------------------------------------------------------------------
Task 72: fix problem with instruction 0x4012AF (implicit register on EAX)
-----------------------------------------------------------------------------
Debug: set a breakpoint on the update_2 and update_3 functions in instr.cc and a conditional BP. The problem is that the x86_ea_t (expression) is not processed when a register is used as a base or index register for calculating the effective address. In handling implicit and explicit, handle the implicit.
Implementation:
(1) add an assisting function handleOpExpression(op, input_reg) to each update_func for input [5 min]
(2) logic: if op is expression, if index/base id is not 0, then add it to input. [15 min]
(3) debug [25 min] DONE. The slice size is increased though.
-----------------------------------------------------------------------------
Task 74: Comparative Study and find out why qemu loadvm is so slow and why cpu status is halted.
-----------------------------------------------------------------------------
(1) find out which functions in QEMU are responsible for loadvm (during loadvm, Ctrl+C and bt)
After loading the vm, the status is stuck on (looks like an infinite loop)
    qemu_run_all_timers () at qemu-timer.c:454 (calls the following)
    qemu_run_timers is visited many many times
It seems that it takes a very long time to reach the "break" in the following
    386 if (!qemu_timer_expired_ns(ts, current_time)) {
    387     break;
    388 }
Verified, it's this stall which causes the delay of loading. It seems that the loop/timestamp combination caused the problem.
SOLVED FINALLY! The problem is with the clock using host clock (merging clocks cause trouble).
Add an option --- -rtc clock=vm ---------
-----------------------------------------------------------------------------
Task 75: check performance problem
-----------------------------------------------------------------------------
(1) total hits: 695k, total links hit: 400k, time: 60 seconds. Strangely, performance increased over 10 times! Due to efforts in 74.
(2.1) Effort 1. check the timing of addDirectControlDependency. It's 7 sec.
(2.2) Effort 2. check the timing of the other parts of while loop. It's 58/60 seconds (most of it). So the majority is the main while loop. Set three sets of data and calculate each part.
(2.3) Effort 3. check the main while loop, split into three sections. 0 sec. t1, t2, all 0. t3 takes all of them. Haha, found that link->setReadAccessTime.count() accounts for most of the time! Even with the use of "find" it's no good.
    t1: 16, t2: 51, t3: 120.
    tried to replace link->setReadAccessTime.find()!=end() with a simple loop
    t1: 16, t2: 48, t3: 122, maxset size: 12.
    Decision: keep the link->find() solution and remove timing t1 to t3. Now the trace slicing time is down to 60 seconds.
-----------------------------------------------------------------------------
Task 73: Problem with 0x00403604
-----------------------------------------------------------------------------
Problem: at address 0x0041DE80 the values of the two versions do not match.
Reason: Instruction 0x004035c1 is set to NOP. 0x004035f0 should depend on it.
Debugging:
(1) verify the following fact: 0x4035c1 sets up the value of 0x0041de80, and then 0x004035f0 reads it. Verified. -> value changed to 0x003200d0. DONE.
(2) check the dump and verify the dependency is set up correctly. Note: 0x004035c1 is in the dependency list.
    In the dependency list: 0x00403f38 (once), 0x004035fc (many times - loop), 0x004035c1 (once)
Observation: the time stamps of 0x004035c1 are all 0x90xxx, but the timestamps of 0x4035fc are 0x60xxx. It seems that the older time stamp of 0x004035c1 is KICKED OUT.
Got to remove the logic that limits the number of time stamps kept. After the change is made: no big change on slice; however, time increased to 80 seconds. SOLVED!
Up to now, the SLICING ALGORITHM is completely working! YEAH!
-----------------------------------------------------------------------------
Task 73: Slicing Improvement: ESP/EBP problem
-----------------------------------------------------------------------------
Simple program:
    1 int x = f(a);
    2 int y = f(b);
    3 int z = x + 5;
    4 print(z)
Backward slicing from z, then the 2nd statement should NOT be called at all! But in binary slicing, function call f(a) pushes/pops parameters and CALL/RET modifies the ESP/EBP registers, which makes the function call f(b) depend on those instructions. See below:
    1   push EAX              ; ESP -=4
    2   call FUNC_F           ; ESP -=4
    3   PUSH EBP              ; ESP -=4
    4   MOV EBP, ESP          ; EBP = ESP
    5   //do something
    6   POP EBP               ; EBP recovers to OLD EBP, ESP+=4
    7   RET                   ; ESP+=4
    8   add ESP, 4            ; ESP+=4
    8.5 mov addr_var_x, EAX   ; x = f(a) done
    9   mov ECX, some_var_addr
    10  push ECX              ; ESP-=4
    11  call FUNC_F           ; ESP-=4
    12  PUSH EBP              ; ESP-=4
    13  MOV EBP, ESP          ; EBP = ESP,
    14  //do something
    15  POP EBP               ; EBP = old EBP, ESP+=4
    16  RET                   ; ESP+=4, depends on 15
    17  ADD ESP, 4            ; ESP+=4, depends on 16
    18  mov EDX, variable_z   ; depends on init
    19  mov ECX, var_a        ; depends on 8.5
    20  ADD EDX, ECX          ; depends 18, 19
    21  mov var_z, EDX        ; depends on 20
    22  push var_z            ; trace starts, depends on 21 AND 17!!! WHICH INCLUDES 11 to 17
    23  call print
At this moment the slice size is shown below:
    # total links visited: 546121, histSwaps: 8, duration 84.000000
    Slice Size: 42507, In Range Slice Size: 1411, In Range Trace Size 2284
Plan: will need a rather complex multiple pass algorithm. To accomplish this, we need to record the full time stamp and dependency information.
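The effect of the ESP chain can be replayed on the dependencies annotated in the listing above. The following is only an illustrative sketch, not the multiple pass algorithm planned here: buildListingGraph and backwardSlice are hypothetical names, the string labels are the listing's line numbers ("init" stands for the initial state), and only the dependencies explicitly written in the comments are encoded.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > Graph;

// Hypothetical helper: encode only the "depends on" comments of the listing.
// When traceEsp is false, the ESP edge from line 22 to line 17 is left out.
Graph buildListingGraph(bool traceEsp) {
    Graph g;
    g["16"].push_back("15");
    g["17"].push_back("16");
    g["18"].push_back("init");
    g["19"].push_back("8.5");
    g["20"].push_back("18");
    g["20"].push_back("19");
    g["21"].push_back("20");
    g["22"].push_back("21");
    if (traceEsp) g["22"].push_back("17");  // the ESP dependency
    return g;
}

// Plain backward reachability over the dependency edges, starting at `start`.
std::set<std::string> backwardSlice(const Graph &g, const std::string &start) {
    std::set<std::string> visited;
    std::vector<std::string> work(1, start);
    while (!work.empty()) {
        std::string cur = work.back();
        work.pop_back();
        if (!visited.insert(cur).second) continue;  // already in the slice
        Graph::const_iterator it = g.find(cur);
        if (it == g.end()) continue;                // no recorded dependencies
        for (std::size_t i = 0; i < it->second.size(); ++i)
            work.push_back(it->second[i]);
    }
    return visited;
}
```

Slicing from line 22 with the ESP edge enabled reaches lines 17, 16 and 15 of the second call; with the edge left out, the slice stays within lines 18 to 22 plus 8.5 and the initial state.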
-----------------------------------------------------------------------------
Task 74: Re-engineer the traceinstr package and introduce unit testing
-----------------------------------------------------------------------------
TraceManager - Trace - instrTrace - Instruction - InstrInfoStore - InstrInfo
Implementation Steps:
(1) establish folder, include handle.h and handle.cc first, and establish Makefile [2 hrs] DONE.
(2) remove global variables in handle.cc and introduce several new functions [1 hr]. DONE.
    (a) remove cr3_to_trace and add function isProcessNameToTrace
    (b) remove global variables trace, NAME_TO_TRACE, etc.
(3) define class declarations. [1 hr] DONE
(4) implement handle.cc, forward all calls to TraceManager [1 hr]
    (a) isProcessToBeTraced. DONE.
    (b) TraceManager Constructor. DONE.
    (c) remove init_Tracer. DONE.
    (d) forward call dump(). DONE.
    (e) add save(). DONE.
    (d) revise has_instruction. DONE
    (f) add_instr and handle_instr. DONE
(5) implement a dummy isProcessToBeTraced and test that it hits add_instr [0.5 hr]. DONE
-----------------------------------------------------------------------------
Task 75: experiment with monitor
-----------------------------------------------------------------------------
(1) find out where the QEMU monitor is. DONE.
    Functions related: do_loadvm(Monitor *mon, const qdict), defined in monitor.c
    The handle_user_command(char *str) command is more user friendly. No need to handle *mon and qdict. DONE
(2) add a new MENU of QEMU called "batch_analyze" to QEMU monitor which calls BatchAnalyzer. DONE
    (a) add an entry in hmp-commands.hx (seems to be help file)
    (b) add a function do_bprocess in monitor.c
(2) add a "BatchAnalyzer" class. DONE.
    (a) add an empty BatchAnalyzer class, singleton pattern. DONE.
    (b) add a wrapper function in handle.h. DONE.
    (c) in BatchAnalyzer class call loadvm. There is a global variable called default_mon; call command handle_user_command. Trouble is with the cross reference (linking).
    Make sure to wrap the functions in extern "C", and put the cross-ref part in dummy.cc. DONE.
    (d) finally solved. Need to remove the "STATIC" keyword before "handle_user_command" in monitor.c!!!!! - "static" hides the function from the linker!!!! Still does not work. Separate BatchAnalyzer.o from libinstr.so.
    (e) another problem. Both qemu-i386 and qemu-system-i386 are compiled. qemu-system-i386 passes through; however, qemu-i386 did not pass because there is no monitor.cc at all. Need to find out the way to handle it. ---> solution: the $(all-obj-y) includes the conditional inclusion of monitor.o, so just put BatchAnalyzer.o together with monitor.o (when they are cross-referencing each other!!!!!!!!).
    (f) gdb to find out if it works. Fix: we have to copy the .so files to /usr/lib first.
---- DONE NOW ----
-----------------------------------------------------------------------------
Task 76: Implement BatchAnalyzer class
-----------------------------------------------------------------------------
Design idea: since this is a batch processor, all jobs should have similar configurations.
(1) Nail down specification of config.txt. DONE
(2) create function: loadvm(const char* name) [20 min]. DONE
(3) create function: copyFileToVM(char *fileBasePath). DONE.
(4) Problem: after loadvm, the control does not resume. It seems to be the problem of special chars. DONE.
-----------------------------------------------------------------------------
Task 77: Figure out waiting for process termination (fast way)
-----------------------------------------------------------------------------
(1) check output. FAILED. Did not capture anything. Can be done later, e.g., capture the OUT instruction of the device to find out the chars being printed. Need to later figure out how I/O is processed.
(2) loadvm problem. DESIGN: could wait until 2 more new processes are discovered by instruction execution mode to know that VM is successfully loaded.
(3) general design: need to have an event triggering method; when 5 processes are discovered, trigger the next function.
-----------------------------------------------------------------------------
Task 78: Create Logger Class and Util class
-----------------------------------------------------------------------------
Design:
(1) It should support multiple logging modes
(2) It should display on screen
(3) It should have a task name
(4) There should be one logger for each task
Implementation:
(1) Util.isFileExist. DONE
(2) Util.createDir. DONE
(3) Logger. constructor and test. 20 min. DONE.
(4) implement the log and dump functions. DONE.
DONE.
-------------------------------------------------------------------------------
Task 79: Event Trigger Mechanism of BatchAnalyzer class
------------------------------------------------------------------------------
Design:
(1) load vm, triggers event (more than 3 processes now) - number of processes will be cleared to 0 in QEMU part initially right before loadvm
(2) net use command, triggers event (terminal print - command completed. Seems we can intercept sysenter call and check the data, check zwWaitReplyRequestPort!!!)
-------------------------------------------------------------------------------
Task 80: Implement Parse Configuration File of BatchAnalyzer
------------------------------------------------------------------------------
Implementation:
(1) Create a JOB Class to include common configurations of jobs and specifics. [20 min] DONE.
(2) implement BatchAnalyzer parseConfig. [220 min]
    (1) file operations. DONE.
    (2) parse line. DONE
    (3) Util string related functions. split etc. DONE.
    (4) Util string comparison. DONE.
    (4) Implement the parse function.
DONE
(3) in do_jobs, get the folders one by one, create a TraceManager class and carry out the job [30 min]
-------------------------------------------------------------------------------
Task 81: Generate Jobs and Gen Raw Trace
-------------------------------------------------------------------------------
Idea: continue the implementation of do_jobs(). Retrieve the jobs one by one.
(0) Change config.txt to type GEN_RAW_TRACE [5 min] DONE.
(1) in do_jobs, for each directory, get the sub_directory name, call genJOB(sub_dir) which returns a JOB. Temporarily add a function in the test folder to test do_jobs, but need to delete it later. [120 min] DONE.
(1.5) introduce Util::error_exit. 5 min. DONE.
(2) implement genJOB(sub_dir); depending on the job category, read the CONFIG file if necessary. Set up the job_base_path (this is an instance property). Based on the category of the job, read the config file correspondingly. [15 min] DONE.
(3) in do_jobs call execJOB(job). This is basically a switch case that calls the corresponding exec_TYPE_JOB function. [10 min] DONE.
(4) implement execGenRawTrace(job). It creates a TraceManager (init constructor with job instance). [15 min]
    (a) add TraceManager constructor. [8 min] Creates trace instances for the processes to monitor respectively.
    (b) furnish Job initializer to read in all details.
        (b.1) add SMB folder in Job [5 min]. DONE
        (b.2) add Util::getRelativePath. DONE.
-------------------------------------------------------------------------------
Task 82: Implement the event capturing mechanism
-------------------------------------------------------------------------------
Implementation Sequence:
(1) introduce struct tracer_event and define the first three categories. [20 min] DONE.
(2) add function send_event() to handle.h and handle.cc [10 min] DONE.
(3) implement the vm loaded event [15 min] DONE.
(4) implement clear_process_info and add it to loadvm [15 min]. DONE.
(5) implement the process terminate event.
[45 min]
    (a) find how process termination is discovered. [15 min] FOUND IT. It's in seg_helper.c, helper sys_enter.
    (b) place in sys_enter [15 min]. DONE
        NOTE that EAX can NOT be read directly from env; there is a global variable EAX_BEFORE_SYSENTER set in translate.c. DONE.
    (c) test it [15 min]. DONE
(6) implement the print string event. [1.5 hr]
    (a) use XP to compile many different printf statements and putchar and cout [15 min]. DONE
    (b) use IMM to find out the corresponding syscall, the EAX number, and where the data is [45 min]
        For putchar: EAX=0xc8, DATA LOCATED AT *(EDX+0xc)+0x30, it's a string terminated with 0
        For printf: yes, it is also *(EDX+0xc)+0x30, however, it's not terminated with 0
        For cout: it's printing character by character.
        Next question is: where is the data length? Need to trace through printf. It's located at *(EDX+0xC)+0x84
        So data is located at: *(EDX+0xC)+0x30
        Data size is located at: *(EDX+0xC)+0x84
    (c) figure out how EAX_BEFORE_SYSENTER is handled and maybe get the EBX_BEFORE_SYSENTER as well [30 min]
        (1) similarly create EDX_BEFORE_SYSENTER in CPUX86State. Copy it in the gen_save_regs function in translate.c [10 min] DONE.
        (2) verify the print function is 0xc8 in EAX. [15 min]. DONE. Verified: needs both EAX to be 0xc8 and ECX to be 0x0098007c
    (d) implementation in sys_enter in seg_helper.c [15 min]
        (1) create the conditional capture in helper_sysenter [5 min] DONE.
        (2) declare a function copy_string(char *buf, int &buf_content_size, ulong start_addr, int size) [5 min] DONE
        (3) add printf_buf, print_buf_size, and then call copy_string [5 min] DONE
        (3.5) test if copy_string is called and if the data is retrieved right. DONE. Found that there are actually two modes: 8-bit char and 16-bit wide char.
        (4) implement copy_string; whenever a "\n" is encountered, send the event and reset the buffer. [20 min] DONE
        (5) test copy_string in the test folder.
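The line-splitting behaviour of copy_string in step (d)(4) can be sketched as below. This is only an illustration under assumptions, not the code in seg_helper.c: PrintEventSink and LineAssembler are hypothetical names, the bytes are taken from a host buffer instead of guest memory, and the 16-bit mode simply keeps the low byte of each little-endian character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for send_event(): it just collects completed lines.
struct PrintEventSink {
    std::vector<std::string> lines;
    void send(const std::string &line) { lines.push_back(line); }
};

// Accumulates printed bytes and fires one event per '\n', then resets.
class LineAssembler {
public:
    explicit LineAssembler(PrintEventSink &sink) : sink_(sink) {}

    // Append `size` bytes. If `wide` is true the data is treated as 16-bit
    // little-endian characters and only the low byte is kept (assumption:
    // the traced programs print ASCII text).
    void append(const unsigned char *data, std::size_t size, bool wide = false) {
        const std::size_t step = wide ? 2 : 1;
        for (std::size_t i = 0; i + step <= size; i += step) {
            const char c = static_cast<char>(data[i]);
            if (c == '\n') {
                sink_.send(pending_);   // one event per completed line
                pending_.clear();       // reset the buffer
            } else if (c != '\0') {
                pending_.push_back(c);  // skip terminators (putchar case)
            }
        }
    }

    const std::string &pending() const { return pending_; }

private:
    PrintEventSink &sink_;
    std::string pending_;
};
```

Keeping the unfinished tail in pending() matters for cout, which was observed to print character by character: the event only fires once the newline finally arrives.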
-------------------------------------------------------------------------------
Task 83: Implement the event handling mechanism
-------------------------------------------------------------------------------
Idea: BatchAnalyzer maintains a vector of tasks. Each task is triggered by a certain event. Tasks are executed one by one and can time out. For example, in executing a job, the tasks are:
    (1) loadvm (trigger: none). complete status: timeout or loadvm event received
    (2) execCommand: "net use y: \\10.0.2.15\smbuser". complete status: "net use success message", or time out, or "failed"
    (3) execCommand: "copy y:\b1.exe .\" complete status: copy completed or time out or failed
    (4) execCommand: "b1.exe" complete status (process terminate captured or time out)
Implementation:
(1) define class Condition. [15 min]. DONE.
    getStatus()
    setStatus(evt)
    three statuses available: UNDECIDED, FAIL, SUCCESS
(2) define derived class ConditionOnLoadVM [60 min]. DONE
    The tricky linkage problem again.
(3) define derived class ConditionOnProcessTerminate [15 min]. DONE
    add an attribute on CR3.
(4) define derived class ConditionOnPrintString [15 min]. DONE
(5) define class Task, which has a success condition [30 min]. DONE
    Task(timeout)
    fire(); //calls do_job and starts timer thread
    do_job()
(6) define derived class loadVMTask [15 min]. DONE
    (a) define loadVM Task. DONE
    (b) add function addTask(). DONE
    (c) add configurations: LOADVM_TIMEOUT, NETUSE_TIMEOUT, COPY_TIMEOUT, TASK_TIMEOUT. DONE
    (d) get rid of the vtable problem. DONE. It's caused by a pure virtual function. However, the compiler complains about the constructor, which is silly.
    (e) add function startExecuteAllTasks(). DONE.
    (f) add semaphore. DONE.
    (g) add execNextJob(). DONE
    (h) add waitForAllTasksComplete(). DONE
    (i) implement fire(), which calls two methods, do_job and set_time_out. DONE.
    (j) finish the timeout event handler. DONE.
    (k) finish the loadVMTask::do_job.
DONE
    (l) add send_event to BatchAnalyzer so that an event can trigger execNextJob.
        Logic: find the current task, set the condition. Check the condition of the task; if it's not UNDECIDED, then either move on to the next task, or fail the entire job.
        Implementation Steps.
        (1) add a Logger instance to the Job class. [20 min] DONE.
            (a) add a JOB counter
        (2) add various log statements to each step of executing a job. [15 min]. DONE
            (a) add Logger instance to task; do not delete it as this is the logger for the job.
        (3) implement the logic of send_event (add a function to BatchAnalyzer) [20 min]
(7) add handle_timeout_event() to BatchAnalyzer [30 min]
    (a) add current job to BatchAnalyzer DONE.
    (b) add Util::EvtToString. DONE
    (c) add Job::toString(). DONE
    (d) add genTasksForGenRawTrace(Job *job). DONE
    (e) handle time out event. DONE
    (f) handle other events.
        (1) add checkEmptyList(). DONE
        (2) update execNextTask.
(8) test the framework up to now
    (1) test logging. Create two jobs and let it work. [25 min]. DONE
        Problem: it kills directory. FIXED.
    (2) fix the abort problem.
        Problem: when qemu tries to do_all_vm stop, it tries to stop all vcpus, which calls pthread_cond_wait, which assumes the global mutex is owned by the calling thread. However, the task itself is a new thread, which does not own the global mutex.
        Fix idea: when executing a task, don't create a new thread, just call do_job directly. Except for the load_vm task, all other tasks are asynchronous. The do_job of the task finishes immediately (then it's done). When the event comes in, the do_job of the next task will be called. Similarly, execNextJob will be called. So there is no need to use a semaphore at all. -- 8:00am 08/10
        (a) fix logger of BatchAnalyzer. DONE
        (b) add logger message to execNewJob. DONE
        (c) remove extra thread in executing task. DONE
        (d) implement the handle_event. DONE. --- 9:00am 08/10.
        (e) implement TraceManager destructor. DONE.
        (f) fix handle_event log. DONE.
        (g) fix the BatchAnalyzer log problem. DONE.
        (h) fix the execNextJob problem at the end of the job list. DONE
        (i) fix the display level problem. DONE
(9) define derived class netuseCommandTask --- 10:00am 08/10/2013
    (a) declare conditionOnMsg. Already defined. DONE.
    (b) declare class and its constructor. DONE
    (c) add Task::toString(). DONE.
    (d) add Task in job. DONE.
    (e) add config samba_ip. DONE
    (f) add NETUSE timeout. DONE
    (g) fix bug of execNextTask (should not remove the current task). pop should be done in handle_event. DONE.
    (h) fix the swapping sequence of fail and success message. DONE.
    (i) fix the problem of deleting task twice. DONE. --- 11:00am 08/10
    (j) find the fail string "was not". Done. However, it could take too long. Need to use the timeout to stop the task. DONE.
    (k) fix the check of timeout, and change timeout value. DONE.
    (j) fix the timeout issue, no event sent. DONE.
    (l) now another problem when timeout: it crashes with a segmentation fault. FIXED. This is because the task has already been deleted. DONE
    (m) another similar problem: when a thread comes back, the logger becomes invalid. Remove logger command for message. DONE
    (k) fix the qemu_cond_wait again. The problem occurs when the previous command timed out; check what happens if it does not time out. Seems it does not occur again
    --- TO DO ------------
    (o) check why it times out the net use command. The system does not abort when net use is successful. However, it aborts with the error on sema_condition wait when it tries to stop all VM. This might be because the system is doing I/O and it may have locked certain devices. DONE.
    (i) figure out handle_user_command and see if there is anything with the lock. Found that handle_user_command is done in main_loop in vl.c:2007 (maybe that's the safe place to call). DONE.
    (j) VERIFICATION: disable network, run net use, and then loadvm in monitor and see if the same error could occur. DONE. Verified.
    (j) in vl.c add two functions: (1) append_user_command, (2) execute_user_command. Keep a buffer of char * cmds[] to store commands. Test it in Util.cc first. This should be a big array maintained by two indexes.
        (a) make the framework go through. COULD STILL NOT GET THROUGH AFTER MAKE CLEAN. Just leave one target dir in config-host.mak (and remove all other objectives!!!!!)
            #################################################################################
            ############# !!! COPY config-host.mak TO config-host.mak.cp !!!
            #################################################################################
        (b) add a queue list to BatchAnalyzer. DONE
        (c) call it in main_loop.c:407. Still problems with linking. Add BatchAnalyzer.o in Makefile.objs. !!!!####????? Still problems.
        (d) replace all handle_user_commands in BatchAnalyzer with addCommand. DONE
        (e) fix the compiling error. Linker problem. Still does not work. Linker traces back to monitor.o which does not exist yet. Check how it is actually linked.
            *** making it an .so file does not work. DONE, not solved
        (g) check how it gets compiled if there are no calls to BatchAnalyzer.o.
            #0 handle_user_command (mon=0x28db2680, cmdline=0x28db2ac0 "") at /home/csc288/qemu/qemu-1.4.0/monitor.c:3963
            #1 0x082c08f5 in monitor_command_cb (mon=0x28db2680, cmdline=0x28db2ac0 "", opaque=0x0) at /home/csc288/qemu/qemu-1.4.0/monitor.c:4602
            #2 0x081f0120 in readline_handle_byte (rs=0x28db2ac0, ch=13) at readline.c:373
            #3 0x082c0849 in monitor_read (opaque=0x28db2680, buf=0xbfffe31c "\r\005\235\267", size=1) at /home/csc288/qemu/qemu-1.4.0/monitor.c:4588
            #4 0x081d7776 in qemu_chr_be_write (s=0x28c38550, buf=0xbfffe31c "\r\005\235\267", len=1) at qemu-char.c:164
            #5 0x081d88a7 in fd_chr_read (opaque=0x28c38550) at qemu-char.c:588
            #6 0x081b1616 in qemu_iohandler_poll (readfds=0x89ea1e0 <rfds>, writefds=0x89ea260 <wfds>, xfds=0x89ea2e0 <xfds>, ret=1) at iohandler.c:124
            #7 0x081b22ee in main_loop_wait (nonblocking=0) at main-loop.c:422
            ->> main-loop.o --> iohandler.o --> qemu-char.o (Makefile.objs) --> at line 164 of qemu-char.c, it calls sHandlerOpaque (so the monitor's function is passed as a function pointer at run time).
        (h) now the dependency relation is as below
            monitor.o <---> BatchAnalyzer.o --> libinstr
            monitor.o ---> libinstr
            main-loop.o ---> BatchAnalyzer.o (explicit). THIS CAUSED main-loop.o ---> (depends on) monitor.o, which it should NOT depend on (causes a loop).
            Solution:
            (1) declare a void (*f_cmd_handler)(char *) function pointer in main-loop.c, and call it. DONE.
            (2) in the BatchAnalyzer constructor, reset the function pointer. FIXED.
            Note: network has to be set up, otherwise loadvm is not successful.
    (j) fix the broken logger issue. It seems that the problem disappeared.
(10) define task taskCopy. DONE.
(11) task execTraceCommand. DONE.
    (a) define a similar class. 10 min. DONE.
    (b) call the creator of class. 30 min.
        (b1) framework. 10 min. DONE.
        (b2) create a new TraceManager for each job. DONE.
        (b2) addProgramToTrace. 10 min DONE
    (c) test. 15 min
        (b1) refine message log for task complete. DONE
        (b2) test completed.
(12) test and fix problem: 2ND JOB not able to capture load vm message. DONE
    (a) debug: check where clear_process_info is called and fix it. DONE.
    (b) test. Now both processes are captured successfully. DONE.
-------------------------------------------------------------------------------
Task 83: Raw Trace Trigger Mechanism
-------------------------------------------------------------------------------
Idea: the helper_trace2 function captures each instruction; it then calls TraceManager isProcessToBeTraced to set up the process status. Then handle_instr() is called for each instruction of a process to be traced. Also add_instr() might be called the 1st time an instruction is encountered.
Step 1. Make sure isProcessToBeTraced is handled properly. TraceManager keeps two mappings, from process name to cr3 and from cr3 to Trace. When a new process comes in, update the record.
    (1) check isProcessToBeTraced is called. 10 min. DONE.
    (2) add cr3ToTrace in TraceManager and create Trace class.
        (a) add Trace constructor. DONE.
        (b) addProcessInfo(cr3, procname);
    (3) test if process information is added and test logger for Trace. Set BP on isProcessToBeTraced, Trace::Trace
        Problem: the system is not able to capture the process (missed some of the processes).
        Debugging 1: (1) BP on taskAnalyze, (2) bp on helper_trace2. Found that the discovery of new CR3 is actually never hit!
        Debugging 2: (1) BP on taskAnalyze::do_job, (2) bp on helper_trace2, find all process ids and set a conditional bp to discover new IDs at the entry. Attempt: clear arrCR3. Failed.
        Debugging 3: (1) BP on taskAnalyze, (2) bp on helper_trace2 and (3) bp on helper_sysenter (seg_helper.cc:2315). Still could not catch anything.
        Guess: maybe the print msg discharged b1.exe too early. On the copy command, use c:\ to discharge. VERIFIED. Make a temporary change to the exit condition of taskAnalyze; later will need to trace on cr3.
    (4) test: set BP on Trace::Trace.
        Fix log path problem. DONE.
Step 2. Design the Trace class. The Trace class first provides a number of methods for updating instructions and memory references. Internally, it keeps the following components:
    (1) an instruction store (in RAM) and a map from address to instruction store
    (2) history of trace (timestamp, the address of the instruction being executed, and the memory address being referenced)
    Later in full trace, we'll establish the map between registers and mems. All stores will be supported by an in-memory cache that writes the contents to file from time to time.
Step 3. Implement a Cache class in support of components of Trace.
    (a) Definition of class. 40 min. DONE.
    (b) constructor. 20 min. DONE.
        (b1) set all properties
        (b2) create files
    (c) destructor temporary solution. 5 min. DONE.
    (d) appendRecord(char *bytes, int size). 20 min. DONE.
    (e) saveBlockToDisk. 20 min. DONE.
    (f) debug saveBlockToDisk. 40 min. DONE. Create cache of 5 records. Append 6 records and see how it is written.
    (g) implement saveToDisk. 15 min. DONE.
        (a) save current block
        (b) save to index.
        (c) call it in destructor.
    (h) debug saveToDisk. 15 min.
    (i) implement loadCache(char *filePath). DONE.
    (j) debug loadCache.
    (k) implement loadBlock(long long int id), assumption: Cache has been loaded. (20 min). DONE.
    (l) debug loadBlock. 60 min (stuck on a stupid memory overwrite; the segmentation fault report does not give the accurate location).
    (m) implement retrieveRecord(long long int id). 20 min. DONE
    (n) debug retrieveRecord. 20 min. DONE.
    (o) simple test. 15 min. DONE.
    (j) random test. 20 min. DONE.
Step 4. Implement the has_instr() function. DONE and tested.
    declare an internal structure instr_quick_info(unsigned int addr, int char), declare a hash_map on it.
    (e) unit testing. 20 min
Step 5. Implement the add_instr(). DONE.
    There is going to be a global InstrInfo instance and a cacheInstrStore. The global instance writes into cacheInstrStore.
    (1) add the InstrInfo class. 15 min.
DONE.
    (2) add the cacheInstrStore and InstrInfo instance. 10 min. DONE.
    (2) add load_instr method. 10 min. DONE
    (3) add writeToCache method. 15 min. DONE.
    (4) add loadFromCache method. 10 min. DONE.
    (5) testing writeToCache and loadFromCache. 20 min DONE.
    (6) implement add_instr. 15 min. DONE.
    (7) simple test of add_instr. 15 min. ---> found problems with hash.
Step 6. Implement the handle_instr(), handle_mem_read, and handle_mem_write
    Idea: add an InstrExecRecord, which includes addr (timestamp is implicit), memoryReadRange, memWriteRange. Backed up by Cache.
    (1) add InstrExecRecord definition. 20 min. DONE.
    (2) add InstrExecRecord instance and the cache instance. 10 min. DONE.
    (3) implement InstrExecRecord.exec(addr). 10 min. DONE. 08/16/2013 8:45am
    (4) implement InstrExecRecord.updateMemRead(addr). 15 min. DONE.
    (5) implement InstrExecRecord.updateMemWrite(addr). 10 min. DONE
    (6) unit test updateMemRead, updateMemWrite. 20 min. DONE.
    (7) unit test update_instr(). 15 min. DONE. 10:00AM
    (8) implement InstrExecRecord.appendRecordToCache(). 15 min. DONE
    (9) implement InstrExecRecord.loadRecordFromCache(id i). 15 min. DONE
    (10) simple debug of append and loadrecord. 15 min. DONE.
    (11) unit test serialization. 20 min. 11:00AM
    (10) hook up with qemu. 30 min. DONE.
    (11) simple debug on trace. 180 min
        (a) has_instr. DONE.
        (b) handle_instr. DONE.
        (c) add_instr. DONE.
        (d) handle_mem_read. NETUSE slowed down to about 100 seconds, over 10 times slower. Problems of reading memory first.
            (d.1 15 min) add a simple hash trick to TraceManager::cr3 get and see if it improves speed. --> shortened to 53 seconds. Without tracing at all, it's 17 seconds.
            (d.2 15 min) change TraceManager::getInstance to inline. Improves to 27 seconds
            (d.3 30 min) now move TraceManager::handle_mem_read and handle_mem_write both back to inline and header file. Still 27 seconds. Does not improve a lot
            (d.4 15 min) move Trace::handle_mem_read to inline header file as well. Improved to 24 sec.
            (d.5 5 min) use -O2 flag. Best performance 22 sec. No big difference.
            (d.6 15 min) handle_mem_read not showing up. FIXED.
            (d.7 15 min) problem: handle_mem_read is hit first. Change to all logs.
        (e) handle_mem_write. DONE.
            (e.1 15 min) solve non-consecutive problem. Bug solved. 8:15AM 08/17
            (e.2 30 min) check other non-consecutive issues. Found the problem: some instructions need to read two pairs of memory slots (such as the COMPARE instruction). DONE
                Solution:
                [1] in InstructionExecRecorder, add two sets of memReadStartAddr2 and memWriteStartAddr2 [8 min] DONE
                [2] update the logic [10 min] DONE
                [3] remove the error detection logic for startAddr<endAddr [5 min]. DONE
                [4] test [8 min] 9:45AM 08/17/2013
    (12) add InstrExecRecorder dump. 53 min
        (a) add a config item. DUMP_ENABLED 1/0. [10 min] DONE.
        (b) add InstrInfo dump function. [15 min]. DONE.
        (c) add InstrExecRecorder dump function. [30 min]. DONE
            (c.1) add a function to Cache to get last id. DONE.
            (c.2) add the trace to the InstrExecRecorder, and add a number of functions for reloading instrProcessor. [20 min]
        (c) call InstrInfo dump function in the appendToCache [8 min]
        (d) debug [10 min] set bp on InstrExecRecorder::dump DONE minor adjust.
        (e) test, using winxp image data [10 min]
    (13) integrate testing. 30 min. DONE. Logging speed is fast, less than 1 second. DONE - 11:45 08/17/2013
    (14) update the process terminate mechanism for task (change to process terminate task). (90 min) 12:00PM 08/17/2013
        (a) change ConditionOnProcessTerminate, change the cr3 to a set of cr3 ints. [5 min]. DONE
        (b) add one function for adding a CR3 [8 min]. DONE
        (c) change setStatus to remove one cr3 from the set. If the set is empty, set the status to satisfied. [8 min]. DONE.
        (d) create a new type of event in event.h (new_process_to_trace, cr3) [8 min]. DONE
        (e) change TraceManager::isProcessToBeTraced, call send_event [8 min]. DONE 12:30pm.
        1:00PM
        (f) change BatchAnalyzer::handle_event and add the handling of new_process_to_trace, modify the current top task. [15 min]. DONE
        (g) debug and trace if terminate process event can terminate task [30 min] set BP on the above functions
        (h) integrate testing [15 min]. DONE. 1:26PM.
    2:00pm 08/17/2013
    (15) solve the issue of cleaning up when a job is completed.
        (a) implement the destructor - delete the trace manager and set it to NULL [15 min]. DONE
        (b) call destroyCurrentTraceManager in BatchAnalyzer::clearTasks() [5 min]. DONE
        (c) debug [15 min].
            (c.1) needs to fix the destructor of Cache. DONE.
            (c.2) check why history is not deleted. Strange, the destructor is never being called.
        done: recorded about 650k instructions and the trace size is about 6.5MB (for about 1 second of execution). So a 4GB max file size could support around 700 seconds (10 minutes) of running. DONE; 3:10PM 08/17/2013
-------------------------------------------------------------------------------
Task 84: Design Slicing Algorithm
-------------------------------------------------------------------------------
(1) needs to add job specific config file. Specifies the slice starting point.
(2) algorithm: for instrStore and execHistory, use the read only mode. Then create a copy of fullInstrStore and fullExecHistory, also supported by Cache.
    1st: forward processing to populate the fullInstrStore and fullExecHistory and establish the dependency between time stamps. So here we need to keep a memory read/write cache and a register read/write cache. Each instruction needs to keep a copy of the registers being read and written.
    2nd: trace backward from the starting point (seems no need to keep all the edge information).
-------------------------------------------------------------------------------
Task 85: Add and process slice config and other supporting classes, and the framework
-------------------------------------------------------------------------------
(1) add the config file: (1) SLICE_AT.
Check WinXP image. [20 min] DONE.
(2) add the genSliceTasks() and make it triggered. [15 min]. DONE.
(3) make the config parsed in the job. [30 min]. DONE
(4) add the config file: FULL_TRACE, and add genFullTraceTasks(). [15 min]. DONE.
(5) add ConditionOnSynchTaskCompleted. Declare a static method for generating task ID. [15 min]. DONE.
    methods:
    static: createNewConditionOnTask()
    regular: int getID()
(6) add taskSynchronized, which takes a ConditionOnSyncTask and has an internal ID for the condition. When it finishes, it triggers a send_event. DONE.
    (a) create a new event_type and ID value [8 min]. DONE.
    (b) add the taskSynchronized class [10 min] DONE ----
-------------------------------------------------------------------------------
Task 86: Implement Full-Trace Function.
-------------------------------------------------------------------------------
(1) add taskFullTrace [10 min]. DONE.
(2) call loadCache to load the raw_trace instrStore [10 min]. DONE.
(3) call loadCache to load the raw_trace execHistory [5 min]. DONE
(4) create InstrExecRecorder with raw_execHistory [5 min]. DONE
(5) drill down to each folder of raw_trace. [15 min]. DONE.
(6) Modify the InstrExecRecorder so that it does not depend on trace, but on instrStore only! [30 min]
(7) merge the above into trace::loadTraceFromDisk(); [20 min] DONE.
(8) declare Trace::expandFromRawTrace(trace); [5 min]. DONE
(9) create the instrStore and exechistory and so on. [15 min]. DONE
(10) Load InstrStore. DONE.
    (a) set up the loop to read the raw instructions one by one. DONE
    (b) call the add instruction one by one - BUT arrBytes not initialized yet!!!. FIXED. DONE!!!
    (c) update the registers for add_instruction. Copy the implementation from OLD.
    ----------- TO DO --------------------------- 9:00AM 08/20/2013
    (d) update appendToCache for InstrInfo, to append set of input/output registers. [20 min] DONE.
    (e) update loadFromCache for InstrInfo [20 min]. DONE.
    9:40AM DONE
(11) create a memory write cache; at this moment, use unordered_map. But wrap it with an inline function. [15 min]
    (a) declare CachedMap; internally it has an unordered_map at this moment. [20 min]. DONE 10:00AM.
    10:45AM
    (b) declare an instance of CachedMap in InstrExecRecorder. [10 min]. DONE.
    (c) declare a class called dependLink [45 min]
        data members: flag, type, timestamp, ESP/EBP value and encoding [15 min]. DONE
        function: serializeTo(ptr) [15 min]. DONE.
        function: deserializeFrom(ptr) [15 min]. DONE
        unit_test: [20 min]. DONE.
    (d) fix the old errors in unit testing. DONE
(12) create a register write cache. [10 min]. DONE.
(13) update the logic
    (a) update the logic of handle_mem_write [10 min]. DONE
    (b) add an array of dependLinks and the count. Initialize count in handle_instr. [10 min]. DONE.
    (c) update the logic of handle_mem_read [15 min]. DONE. 9:00AM 08/21/2013
    (d) update the logic of handle_instr and update the register updates. Note the handling of the old version. [20 min]. DONE
    (e) record the ESP/EBP value. 1st attempt: simply copy the ESP/EBP values for every instruction.
        (e1) add function gen_save_esp_ebp [15 min]. DONE
        (e2) insert it into disas_insn blindly [10 min]. DONE
        (e3) test the system performance [20 min] DONE.
            (a) bp on before/after save esp/ebp functions. net use slowed down to 40 seconds (27 vs 40), around 30% slow down. 10:00AM 08/21/2013
    (f) modify the raw_trace generation.
        (f.0) remove the ESP_AFTER value, because it can be recorded by the previous instruction [10 min]. DONE
        (f.1) add two flags: change_ESP, change_EBP to InstrExecRecorder [5 min]. DONE
        (f.1.5) add esp_val and ebp_val to handle_instr functions in various classes. [15 min] DONE
        (f.1.6) clean the directory, clear all dirs not i386 arch. [30 min]. DONE 335MB -> 15MB
        (f.2) in handle_instr: record the ESP_BEFORE value [8 min]. DONE
        (f.3) in handle_instr: for the older instruction, record the ESP_AFTER value [8 min].
        DONE 11:20am 08/21/2013
        (f.4) in appendToRecord, compare the ESP/EBP value and set the flag [15 min]. DONE.
        (f.5) appendToRecord, serialize the ESP/EBP value [15 min]. DONE
        (f.6) retrieveRecord, deserialize the ESP/EBP value [10 min]. DONE
        (f.7) fix unit testing. [10 min]. DONE.
        (f.8) dump instruction, dump the information. [10 min]. DONE.
        (f.9) debug/test [20 min]. DONE
            (a) check trace PUSH/POP instructions. DONE. Working. 12pm 08/21/2013
    (d) minor fixes [30 min]
        (d.1) code review memory write [5 min]. DONE
        (d.2) code review memory read [5 min]. DONE
        (d.3) code review register write [5 min]. DONE
        (d.4) code review register read [10 min]. DONE 1pm 08/21/2013
        (d.5) implement the dump() about links [15 min]. DONE
        (d.5) implement the append and retrieve () about links [15 min]. DONE
    (e) integration testing and debugging
        (e.1) environment set up. [5 min]. DONE 2pm 08/21/2013
        (e.1.5) expand gen_full_trace, call handle_instr, handle_mem_read, handle_mem_write specifically. [30 min]
        (e.1.8) debug BP on handle_instr [15 min]. DONE
        (e.1.9) fix load InstrExecRecorder [5 min]. DONE
        (e.1.10) fix get_total_size problem. Fix the Cache size problem. [20 min]. Regenerate raw trace first. DONE 3:20pm 08/21/2013
        3:30pm
        (e.2) debug BP on handle_instr [20 min]
            (a) fix the eip==-1 problem. DONE.
            (b) fix log error on no find register problem. DONE.
            (c) fix the set of registers problem. Needs to clear reg sets. DONE.
            (d) fix the timestamp increment problem. DONE.
            (d) continue fixing the set of registers problem. It's caused by a bug in InstrInfo serialization. DONE
        -------------------- 4:30pm 08/21/2013.
        (e.2) debug BP on memory write (Trace.cc:92). [8 min]. DONE
            (e.2.1) fix bug on size in mock_mem_access
        (e.3) debug BP on memory read. [20 min]. DONE.
            (e.3.1) fix CachedMap not find case. unordered_map somehow now works right. No bug.
        --------------------- TOO MANY ERRORS IN REG READ/WRITE, check it later
        8:30 08/22/2013
        (e.4) debug BP on register read/write [15 min]
            (a) bp on Trace.cc:99 and see how the registers are handled. Trace each addition to map. [25 min]
                It seems that the map is working. Check if this is caused by missing calls of approx_regcode. It is called.
                The system is complaining about reg_code 51. Check what the register is. Found that register 51 is cr3, 81 is eflags, and 85 is eip.
            (b) read the OLD logic about processing registers. [15 min]
                It seems that it's simply ignored in Instruction::updateRegDependency if the cache returns NULL.
                In libdis.h there are a number of functions defined to retrieve register IDs.
                unsigned int x86_sp_reg(void);
                unsigned int x86_fp_reg(void);
                unsigned int x86_ip_reg(void);
                unsigned int x86_flag_reg(void);
            (c) define a collection of register constants in InstrExecRecorder.cc as private, dismiss the warning when necessary, and ignore EIP. DONE.
                There will be some warnings initially about registers, but eventually it will be fine. 9:30AM 08/22/2013.
        10:00AM 08/22/2013
        (e.5) debug BP on dump
            (a) debug into InstrExecRecorder [15 min]. DONE.
            (b) remove the Util::error_exit in error finding instruction at 423. [15 min]. DONE
            (c) fix the destructor of Trace. [15 min]. DONE.
            (d) problem with CachedMap destructor again. It seems to be always causing trouble. Remove template and make it fixed type. [25 min] 11:20AM 08/22/2013
            (e) Trace destructor causes problem [60 min]. Strange problem. Could not figure it out for a while ... remove struct quick_instr_info and replace it with std::pair ... 12:30PM still not solved.
                --- new attempt: download valgrind and check memory problem.
                ~~~~ STUPID. deleted the folder ~~~~~~~~~~ should have set up the git earlier!!!!! fxxk!
1:34 set up git
Take the 11:00AM version of yesterday.
------------------------- oops, price paid for stupid rm -fr ! ------------------------------------
12pm 08/21/2013
(e.1) environment set up. [5 min]. DONE
*** app crash. Use Valgrind to find the problem:
    (1) use writeRegMemCache[xxx] = xxx. (illegal op, but strangely not caught by the compiler)
    (2) declare dependLink arr [5] (should be dependLink *arr = new dependLink [5]).
9:00AM 08/23/2013
-------------------------------------------------------------------------------
Task 87: Use Valgrind to remove memory errors and buffer overflow [30 min]
-------------------------------------------------------------------------------
(1) identify the buffer overflow place. Found that the app crashes at the 69th instruction. It seems that the stack canary word is located at ebp-0xc. It is modified.
(2) use watch to find out the problem
    Use "watch *0xbfffccdc" to catch it. It is caused by dependLink.serializeTo(ptr), which is called by InstrExecRecorder.appendToCache
    Found that the buffer is not big enough. ---- DONE!!!
-------------------------------------------------------------------------------
Task 88: fix the complaints about special registers. [20 min]
-------------------------------------------------------------------------------
(information) registers 65-70: segment registers. 49-56: cr registers. 71-72: ldtr/gdtr registers
Implementation: add an inline function that checks for a special register, and do not generate the complaint message for it. [15 min] DONE. 10:00AM 08/23/2013
-------------------------------------------------------------------------------
Task 89: check the max link problem [15 min]
-------------------------------------------------------------------------------
(1) code review. [5 min]
(2) implementation and debugging: add an isVisited function.
[10 min]
-------------------------------------------------------------------------------
Task 90: make sure that the history is showing up [15 min]
-------------------------------------------------------------------------------
(1) code review [10 min]
(2) implementation and debugging [5 min]
    Trace around 18MB, instructions: 600k, around 30 bytes per instruction record.
-------------------------------------------------------------------------------
Task 91: check the logger problem: it exits and is deleted too early. [30 min]
-------------------------------------------------------------------------------
(1) add a string path to each Logger, or check it [8 min]. DONE.
(2) debug into InstrExecRecorder and check its logger path [8 min]
    It's using the logger from the rawTrace.
(3) plan for revision [5 min]
    (1) get the file_name from rawPath in BatchAnalyzer::gen_full_trace.
    (2) in Trace::constructFromRawTrace add an additional parameter
    (3) create Logger
    (4) when finished, delete trace.
(4) implementation [10 min]
(5) debug. BP on Trace.cc:330 [15 min]
11:00AM 08/23/2013
-------------------------------------------------------------------------------
Task 92: check the logger problem of raw trace. [25 min]
-------------------------------------------------------------------------------
(1) debug and code review [5 min]
(2) plan of revision [10 min]
    reset the member to NULL to avoid deleting it. [5 min]
(3) debugging [5 min]
-------------------------------------------------------------------------------
Task 93: Misc Tasks
-------------------------------------------------------------------------------
(1) handle event 106. [15 min] DONE.
(2) solve the cache -> block_size memory allocation error. Check Cache destructor. [10 min] DONE.
(3) fix minor dump problem [5 min] DONE.
2:00PM 08/23/2013
(5) read dump and find more bugs [10 min]
(6) fix timestamp dump issue. [5 min]
(7) instruction 2 reg dependency problem.
[30 min]
    (a) It's the problem of generating raw trace, 5 registers recorded for instruction @7c92289c [5 min]
    (b) dump_set [10 min]
    (c) debug into full trace, BP on InstrInfo.cc:227 [20 min]
        processing registers is ok.
        need to check serialize_set. ok.
        check deserialization. bp on Trace.cc:101.
        Found the problem: deserialize_set. DONE.
3:20PM 08/23/2013
(8) improve layout of output. [5 min]. DONE
(9) problems with instruction @7c9228a3, register dependency not right [30 min]
    check processing of registers. BP on InstrInfo.cc:229
    Implicit ops should be all treated as input_reg!
(10) check the ESP updates problem. BP on InstrExecRecorder.cc:67 at 7c9228de. not solved yet.
4:30PM 08/23/2013
7:30PM 08/23/2013
Continue with (10): found the problem. Information is lost when replaying handle_instr. [10 min]
(1) create a new function: expandFromRaw(): [30 min]
    (a) copy memRead and regSet from raw
    (b) update the memory access
    (c) update the register access
(2) update Trace.cc [8 min] DONE.
(3) revert the original implementation, make sure it's only called in raw mode [10 min]. DONE
(4) test raw mode [15 min]. DONE
(5) test full mode [30 min]
    (a) fix the timestamp. DONE.
    (b) fix the ESP link. DONE.
    (c) FIX the mem link. DONE (around 60k mem accesses without source)
    (d) fix the 0xFFFFF ESP/EBP value. DON'T FIX. Leave 0xFFFFF
    (c) dump the memory warning.
9:00 AM 08/24/2013
-------------------------------------------------------------------------------
Task 94: Solve the ESP Link (0xFFFF) problem (25 min)
-------------------------------------------------------------------------------
Problem: when there are instructions like MOV EAX, [EBP+10], and the EBP value does not change during the last instruction, EBP_VAL is set to 0xFFFF.
Idea: declare two attributes LATEST_ESP and LATEST_EBP, init to 0xFFFF. Whenever ESP_VAL_AFTER is changed, update the LATEST_ESP value.
Implementation: declare the two attributes in trace.h and update in InstrExecRecorder.expandFromLink. Then test -- DONE.
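
The Task 94 idea can be sketched as follows. This is only an illustration under assumptions: ExecRecord and StackRegTracker are hypothetical names (they are not the real trace.h or InstrExecRecorder code), and the sketch assumes a record carries 0xFFFF whenever the instruction did not change the register.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel from the log: "no value recorded for this instruction".
constexpr std::uint32_t kUnknown = 0xFFFF;

// Hypothetical, stripped-down execution record (not the real InstrExecRecorder).
struct ExecRecord {
    long long ts = 0;                    // timestamp of the instruction
    std::uint32_t espAfter = kUnknown;   // set only when the instruction changed ESP
    std::uint32_t ebpAfter = kUnknown;   // set only when the instruction changed EBP
};

// Remembers the most recent ESP/EBP values seen while replaying records.
class StackRegTracker {
public:
    void observe(const ExecRecord& r) {
        assert(r.ts > lastTs_ && "records must arrive in timestamp order");
        lastTs_ = r.ts;
        if (r.espAfter != kUnknown) latestEsp_ = r.espAfter;
        if (r.ebpAfter != kUnknown) latestEbp_ = r.ebpAfter;
    }
    std::uint32_t latestEsp() const { return latestEsp_; }
    std::uint32_t latestEbp() const { return latestEbp_; }

private:
    long long lastTs_ = -1;
    std::uint32_t latestEsp_ = kUnknown;  // plays the role of LATEST_ESP
    std::uint32_t latestEbp_ = kUnknown;  // plays the role of LATEST_EBP
};
```

With such a tracker, an instruction like MOV EAX, [EBP+10] that follows an instruction which left EBP untouched can take its EBP value from latestEbp() instead of the sentinel.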
-------------------------------------------------------------------------------
Task 95: Develop control dependency link
-------------------------------------------------------------------------------
Idea: an instruction has a control dependency on the previous instruction if, when the previous instruction is replaced with NOP, the control flow can no longer reach the current instruction (however, an exception is made for jump/control instructions). Even if the previous instruction naturally flows into the current instruction, there is still a control dependency.
Implementation:
(0) design [15 min]. DONE
(1) declare two attributes in trace.h: nxtImmediateAddr [5 min]. DONE
(2) update nxtImmediateAddr in expandFromRaw [15 min]. DONE
10:00AM 08/24/2013
(3) define function isTransferControl [10 min] - check old implementation. DONE
    (a) add type to InstrInfo [10 min]. DONE
    (b) add serialization support [15 min]. DONE.
    (c) unit test (add real code) [20 min]. DONE
    (d) insert 4 inline functions for checking the type [15 min]. DONE
11:15AM
(4) declare and set attribute in InstrExecRecorder: isLastInstrTransferControl [10 min]. DONE
(5) add the control dependency logic, note: the possibility of context switch! [15 min]. DONE.
(6) debug and testing [30 min]
    (a) check CLINK. ok
    (b) context switch not ok.
        check instruction 3, why the length is not ok.
        check when length is constructed; it is ok in InstrInfo::load. ok
        check ... found the bug. loadInstr is called after the value is assigned.
    (c) add logic to isRET.
--- DONE
1:30PM 08/24/2013
-------------------------------------------------------------------------------
Task 96: add task stop VM to stop vm (25 min)
-------------------------------------------------------------------------------
(1) Design [5 min]
(2) add class stopVM and method stop VM [8 min]
(3) add the task to gen_full_trace [8 min]
(4) test and debug [5 min] --- DONE
-------------------------------------------------------------------------------
Task 97: design and set up slice framework [40 min]
-------------------------------------------------------------------------------
(1) Design [15 min] DONE
(2) implement genTasksForOneSlice() [15 min]
(3) implement taskFullTrace [15 min]
(3) Trace *loadFullTrace(job_path)
(2) implement genTaskForOneSlice [10 min]. DONE
(3) implement taskOneSlice [15 min]. DONE.
2:40pm 08/24/2013
-------------------------------------------------------------------------------
Task 98: implement Trace::loadFullTrace [35 min]
-------------------------------------------------------------------------------
(1) Design [5 min]
(2) simulate load raw trace [10 min]
(3) debugging [20 min]
    (a) fix the name problem of full trace
-------------------------------------------------------------------------------
Task 99: set up the slice framework (ignore the PE parsing first)
-------------------------------------------------------------------------------
(1) Design [10 min] DONE.
(2) add Trace::slice(job) [5 min]
(3) search for the timestamp that contains the instruction [20 min]
9:40AM 08/25/2013
-------------------------------------------------------------------------------
Task 100 (Yeah!): add bInSlice flags to both InstrStore and ExecRecord
-------------------------------------------------------------------------------
1. Misc. correct documentation in trace.h [15 min]. DONE
2. add bInSlice to InstrStore and serialize it [15 min]. DONE
3. unit test InstrInfo [10 min]. DONE
4. add bInSlice to InstrExecRecorder and serialize it [15 min]
5. unit test.
[25 min]
6. regenerate the raw and full traces. DONE.
-------------------------------------------------------------------------------
Task 101: change the control link to include both the ESP and EBP value
-------------------------------------------------------------------------------
1. change the definition. DONE
2. change the serialization. DONE
3. change the dump information. DONE
4. change the call. DONE
5. test. DONE.
-------------------------------------------------------------------------------
Task 102: expand the interface of one_slice
-------------------------------------------------------------------------------
1. introduce class section. DONE.
2. add the section to the function. DONE
3. create a dummy section for testing purpose. DONE
12:00PM 08/26/2013
-------------------------------------------------------------------------------
Task 103: port the binWriter class
-------------------------------------------------------------------------------
1. copy and set up makefile. [15 min] DONE
2. clearAllSecitons [10 min] DONE.
3. writeFile [15 min]. DONE.
4. writeBytes [10 min]. DONE.
5. writeInstruction [10 min]. DONE.
6. getInstrInFileOffset [15 min]. DONE
-------------------------------------------------------------------------------
Task 104: add section investigation function to binWriter
-------------------------------------------------------------------------------
1. add prototype [10 min]. DONE
2. read about PE format. [60 min]. DONE
   Data to read:
   (1) magic code "50 45" at 0x0d8
   (2) offset to PE header at 0x3C
   (2) number of sections: 2 bytes at 0x0de-0xd8 = 0x6.
   (3) image base: 4 bytes at 0x10c-0xd8 = 0x34
   (4) sections start at 0x1d0-0xd8 = 0xF8, each has 0x28 bytes
       offset: virtual size 0x8, 4 bytes
               virtual address 0xc, 4 bytes
               in file location 0x14, 4 bytes
               characteristics: 0x24 (4 bytes). Macro: EXECUTE bit 0x20000000
3. implementation [60 min]. DONE
4. test [60 min]
   (1) make the framework [10 min].
   DONE
   (2) copy the file and establish the folder [10 min] DONE
   (3) test function [10 min]
9:10AM 08/28/2013
-------------------------------------------------------------------------------
Task 105: test notepad [25 min]
-------------------------------------------------------------------------------
(1) copy notepad [10 min]
(2) modify code [8 min]
(3) test. [5 min] DONE.
9:25AM
-------------------------------------------------------------------------------
Task 106: copy slice file to job folder [1 hr]
-------------------------------------------------------------------------------
(1) implement copy function in Util [10 min] DONE
(2) test the copy function in Util [8 min] DONE
(3) implement the set up of the slice file [20 min]
    (a) read about how job is passed. Trace->name has the information. [15 min]
    (b) logic: the file to copy is the file which matches Trace->name, the destination directory is the name of the Trace. [10 min]
    (c) set up logger. [10 min] DONE.
    (d) implement (b). [15 min] DONE.
DONE. 10:52AM
-------------------------------------------------------------------------------
Task 107: clear in-slice flag and set up slice framework function [40 min]
-------------------------------------------------------------------------------
(1) clear slice flag and debug [15 min]. DONE
(2) slice algorithm design [25 min]
    while loop and look back:
      if the current ts in slice
        for each dependency ts (memory, register, esp, ebp)
          mark in slice
        for each control link
          if previous instruction is not RET
          otherwise search ESP value not to exceed min_marked value,
            if found data dependency on non-section instructions, then need to mark all
1:00PM
-------------------------------------------------------------------------------
Task 108: Implementation: slicing algorithm
-------------------------------------------------------------------------------
(1) Design [30 min].
DONE
(2) Declare function searchFunction(startTS, minTS, vecSections); it returns a timestamp such that there are no dependency points between startTS and that timestamp. [30 min]. DONE
(3) implement the algorithm [1hr]. DONE.
(4) implement the searchFunction() [30 min]. DONE
(5) debugging [20 min]
9:00AM 08/29/2013
-------------------------------------------------------------------------------
Task 109: Implement update Cache (3 hrs)
-------------------------------------------------------------------------------
(1) create Cache::updateRecord(id, buf, size) [30 min]. DONE.
(2) unit test Cache::updateRecord [30 min]. DONE.
(3) refactor InstrInfo::appendToCache and unit test it [30 min]. DONE
    (a) add an attribute id - but no need to serialize it.
(4) add InstrInfo::updateCache() [20 min]. DONE.
(5) unit test InstrInfo::updateCache() [20 min]. DONE
(5) refactor InstrExecRecorder::appendToCache [20 min]. DONE
(6) add InstrExecRecorder::updateCache [20 min]. DONE
(7) unit test updateCache [20 min]. DONE
12:00PM
2:00PM
-------------------------------------------------------------------------------
Task 110: Debug Slicing Algorithm (2.5 hrs)
-------------------------------------------------------------------------------
(1) use the updateCache [10 min]
(2) debug the mainloop. [1.5 min] bp on Trace.cc:204
    (a) bug on missing updateCache. DONE
    (b) similar. DONE.
    (c) bug with full_trace generation. remove EIP from the register dependence. DONE
    (d) fix the processFunction (condition on it). DONE.
    (d.1) fix missing of map getInstrID
    (e) refactor the handling of transfer control.
        (e3) define a function search for control link [15 min]
        (e4) call it after search for control link [5 min]
        (e5) debug [10 min]
            (e.5.1) make control link the last when building link
            (e.5.2) set attribute needToBeVisited
            BP on Trace.cc:279 and 224 --> identified it's a bug. Found the problem. It's the serialization.
7:00pm
    (f) debug processFunction. BP on 318, 346. FIXED.
    (g) fix the addrInSection. fixed.
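
The core of the "while loop and look back" design from Task 107 can be sketched as below. This is a simplified illustration, not the code in Trace.cc: TraceEntry and markBackwardSlice are hypothetical names, all dependency kinds (memory, register, ESP, EBP) are merged into one list, and the control-link, RET, and processFunction handling discussed in the log is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical trace entry: index in the vector == timestamp.
struct TraceEntry {
    std::vector<std::size_t> deps;  // timestamps this entry depends on (all earlier)
    bool inSlice = false;
};

// Marks the criterion and everything it transitively depends on.
// Because every dependency points to an earlier timestamp, one backward
// pass over the trace is enough; no worklist is needed.
void markBackwardSlice(std::vector<TraceEntry>& trace, std::size_t criterion) {
    assert(criterion < trace.size());
    trace[criterion].inSlice = true;
    for (std::size_t ts = criterion + 1; ts-- > 0;) {
        if (!trace[ts].inSlice) continue;      // not in slice: nothing to propagate
        for (std::size_t dep : trace[ts].deps) {
            assert(dep < ts);                  // dependencies always point to the past
            trace[dep].inSlice = true;
        }
    }
}
```

A real implementation would also have to write each changed flag back to the record cache (the updateCache step of Task 109), which this in-memory sketch does not model.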
9:00AM
-------------------------------------------------------------------------------
Task 111: Debug Slicing Algorithm (2.5 hrs)
-------------------------------------------------------------------------------
(1) extra dependency on 0x40102a. timestamp 591585. check when it has the flag set. -- there seem to be bugs about serialization. [30 min]
    (a) set a BP on updateCache and see when it's updated. -- IT'S NEVER HIT. Now the question is who writes to that address?
    (b) BP Trace.cc:327 --> fixed. Found that updateCache() is not called after the init of the slice and tobevisited flags.
(2) continue debugging: start from 0x40106d
    Found problem: control link should be the last one. DONE
    The above introduced a bug, fix it. BP on Trace.cc: 228, 240, 248. DONE. work ok.
    Still has extra function call to remove (problem, extra ESP/EBP)
-------------------------------------------------------------------------------
Task 112: Remove extra function call
-------------------------------------------------------------------------------
Idea: handle control link first and then handle ESP/EBP links
Implementation:
(1) define findTSWithESP(minTS, bool bESP) [10 min]. DONE
(2) separate the for loop [10 min]. DONE.
(3) handle the next curTS [8 min]. DONE.
Still problems. seems need to add reverse link.
---> algorithm design: add a new flag called ESP_DELAY_FLAG. When a timestamp (instruction) has an ESP delay flag, it needs to be examined.
(1) if it is visited during the main while loop, treat it like a normal instruction. In other words, if we have something like
      add esp, 4
      sub esp, 4
      add esp, 4
      sub esp, 4
    we would include all these instructions instead of optimizing them away.
(2) During processFunction, the processor first does a pass over all instructions in the function body; if none has any hard real DATA/CONTROL dependency, then the function can be skipped.
    [this needs to be verified for those instructions with ESP_DELAY_FLAG]
    for each instruction with ESP_DELAY_FLAG, search for the timestamp (before the call) instruction and make sure that there is no ESP modifying instruction in between the call and the target. If that is fine the entire function can be removed.
-------------- IMPLEMENTATION PLAN ---------------------------------------------------
(1) add an ESP_DELAY_PROCESS_FLAG to InstrExecRecorder. Note, needs to change char to short int for the flag. Add two functions to set and get the attribute. unit testing [25 min]. DONE
(2) In trace.cc, merge the second loop with the first loop, just call the setESPDelayProcessFlag [10 min] DONE.
(3) modify findTSWithESP, add the logic for checking no esp updating instructions. [15 min]. DONE
(4) update the processFunction, include a loop to check all esp delay instructions. [15 min]. DONE
(5) debug [25 min]
    (1) bp trace.cc256. find problem with serialization. FIXED.
    (2) error with full trace memory link at 0x401062. check --
8:30AM 08/31/2013.
-------------------------------------------------------------------------------
Task 113: Debug the algorithm
-------------------------------------------------------------------------------
(1) error with full trace memory link at 0x401062. check --
    Read code of InstrExecRecorder.cc. check RawTrace first. The addresses being read/written do not match.
    Need to regenerate all traces. Problem fixed.
(2) Debug the sequence of instructions being processed. BP on Trace.cc:260
    Mostly ok, found one bug in processFunction
(3) problemProcessFunction: check of isEspDelay. BP on Trace.cc:412. DONE
(4) bug: setEbpDelay does not work. BP on Trace.cc:286, and then BP on 420. check 0x401029.
    Found the problem: Ebp not serialized. fixed.
(5) bug: check why findTSWithEBP returns -1. Check why the EBP value is 0xffffffff. It's caused by loadFromCache.
    (a) add a function called searchForESPValue(ts start, bool ESP) [15 min]
    (b) bp on 211. FIXED.
(6) need to clarify the semantics of findTSWithESP. --> findTSWithESP_AFTER
    debug: BP on 221
    verify in winxp image: esp case ok. and ebp case ok.
(7) decide what to do with findTSWithESP. DONE.
(8) debug the new strategy findTSWithESP_AFTER again. DONE.
12:30PM
2PM
(9) check processFunction return. DONE.
(10) for the else case, should set the return instruction in slice. DONE.
(11) for esp/ebp delay instructions, if processed directly, should set them in slice.
-------------------------------------------------------------------------------
Task 114: add logging message.
-------------------------------------------------------------------------------
(0) fix the bug why the log disappears. DONE
(1) log process instruction. DONE
(2) log data link. DONE
(3) log esp link. DONE
(4) log call processFunction DONE.
-------------------------------------------------------------------------------
Task 115: find out the -1 problem in processFunction
-------------------------------------------------------------------------------
(1) Problem: last logged operation:
      process ts 599210 @0x402c05 inSlice:1 needVisit: 1 ESPDelay: 0 EBPDelay:
      process control link ts 599209 @0x408571
      unexpcted tsRet -1 in processFunction()!
(2) check 0x408571 in winxp image. The call instruction is at 0x402c00.
(3) read the log and check:
      ESP: 0x12ff74 at 0x402c00
      ESP: 0x12ff74 at 0x408571
    So the search should actually work but it crashed.
(4) gdb into the case. Found the problem: expected_esp is not right, it is 0xffff.
(5) create and call function getESP_BEFORE_VALUE DONE
-------------------------------------------------------------------------------
Task 116: find out another -1 problem in processFunction
-------------------------------------------------------------------------------
(1) problem. before CALL and after RET, the ESP actually does not match!!!
      process ts 569476 @0x7c911bff inSlice:1 needVisit: 1 ESPDelay: 0 EBPDelay:
      add data link ts 569473 @0x7c910c00
      process control link ts 569475 @0x7c910c02
      unexpcted tsRet -1 in processFunction()!
(2) fix idea: The problem is that sometimes in the 0x7c section, the RET instruction does not have the exact stack pointer when coming back from a function. Use a second criterion when processing functions: as long as the next immediate instruction is the expected next immediate instruction, let it pass, but generate a warning in the logger.
9:00 09/04/2013
-------------------------------------------------------------------------------
Task 117: find out yet another -1 problem in processFunction
-------------------------------------------------------------------------------
(1) debug: set BP on Trace.cc:503 and ignore it 7 times.
    Error point: timestamp 555893, it should be back to 555873.
    The problem is that diffESP is 36, greater than the threshold 32.
(2) solution: assumption: recursive calls should not destroy their own stack. So if an exact ESP match cannot be found, we'll just check the next immediate address and ignore the use of diffESP.
-------------------------------------------------------------------------------
Task 118: find out yet yet another -1 problem in processFunction
-------------------------------------------------------------------------------
(1) problem timestamp: 394875. It seems to be caused by
      7c918de7 -> ... syscall --> 0x80 range --> sysexit --> 7c90eac7 [no match of previous instruction]
    But 7c90eac5 is NOT the prior instruction of 7c90eac7!!!
(2) verify in winxp image. 0x7c90eac7 soon leads to zwContinue, which jumps to the entry 0x4013d7.
    So 0x7c90eac7 is not right after a particular call; the stack is set up by the process loading procedure.
    So this is a little bit like the return oriented programming attack. Nothing new here! Arrange the stack and return to the corresponding code to accomplish the logic, but this kind of setting is set up by the OS!
(3) solution idea: ignore the case where there are no matching calls. Generate a warning message.
2:30PM 09/04/2013
-------------------------------------------------------------------------------
Task 117: improve speed
-------------------------------------------------------------------------------
(1) Improve processFunction speed by building a pre-process table. DONE.
(2) in Trace class introduce two members: callTable and tsToCallID [40 min]
    (a) class CallRetRecord [40 min]
    (b) test CallRetRecord [20 min]
8:30AM 09/05/2013
    (b) declare call table and tsToCallID in Trace [15 min]
    (b.2) test the creation [15 min]
(3) implement function setCallTable()
    (a) declaration and compile [15 min]. DONE
    (b) set up the loop to process each instruction/timestamp [15 min]. DONE
    (c) handle call [15 min]. DONE
    (d) handle RET [15 min]. DONE.
    (d) add function searchForCall. [20 min]. DONE
    (e) add nxtImmEIP into CallRetRecord and update all functions. [20 min]. DONE.
    (d) handle ret [15 min] DONE
    (f) debug into setupCallTable [20 min]
2:00PM 09/05/2013
    (a) found bug of getESP_Value_Before, id 61, check later.... It seems to be the problem of loadFromCache. FIXED. needs to call eipToCallID.
    (b) searchForCall does not return the timestamp. FIXED
    (c) fix the stackIdx problem. FIXED.
    (d) fix the ID 64 not in range problem. CCR updateCache problem, got to add callID
    (e) examine the search for call result. bp on 283, 299, 346
    (f) still the serialization problem. load 67 but load 56. FIXED.
    (g) check the not found case. Involved in infinite loop at 5152. Problem area: 5213.
        Problem is caused by the widening from int to long long int: 0xFFFFFFFF is expanded to 0x00000000FFFFFFFF (zero-extended rather than sign-extended).
8:30AM 09/06/2013
-------------------------------------------------------------------------------
Task 118: Test pre-built function table correctness
-------------------------------------------------------------------------------
(1) generate the dump.
    [10 min] bp on Trace.cc:419
(2) examine the dump
    (a) check the first 10 matches [20 min] OK
    (b) check 5 calls in 0x4010 range. [10 min] ok.
    (c) check the no match case. [10 min] ok.
9:30AM 09/06/2013
-------------------------------------------------------------------------------
Task 119: Use the pre-built function information
-------------------------------------------------------------------------------
(1) Design. [15 min]
(2) update the algorithm of slice [15 min]
(3) update the processFunction using the call table [25 min]
(4) Debug processFunction:
    (a) step through [25 min]
    (b) test hasDependee case. [25 min]
        Problem: entireFunctionDependee needs fix.
2:30PM 09/06/2013
(5) Problem: entireFunctionDependee needs fix.
    Change the logic: when the entry instruction is not in the section, then mark the entire function as entire function dependee [15 min] OK.
(6) the -1 logic. and change warning level. [15 min]
(7) problem: second slice fails. found the problem: raw trace missing. [25 min]
-------------------------------------------------------------------------------
Task 120: Write to file
-------------------------------------------------------------------------------
(1) figure out how to call binwriter. [10 min] DONE.
(2) call binwriter and write the file [25 min] DONE.
(3) debug [15 min]
    (a) step through. OK
(4) test. [20 min]
    Problem: 0x40103d (b=2) and 0x401057 (call ...) should not be included!
    Strangely they did not show up in the slicing algorithm.
    Debug: set write_slice
9:00AM
-------------------------------------------------------------------------------
Task 121: debug slicing. Find out why every executed instruction is included in slice
-------------------------------------------------------------------------------
(1) bp on Trace.cc:579 and see if each InstrInfo is in slice. [20 min]
(2) conditional BP too slow. Use customized if branch to check if InstrInfo is messed up. [20 min]
    They are never hit. So the problem is in write slice.
(3) double check using xp image. works!
-------------------------------------------------------------------------------
Task 122: verify if the slice is executable
-------------------------------------------------------------------------------
(1) bp on entry of main. [15 min]
10:00AM
(2) problem at 0x00406a61. Comparative study of the missing instruction: Break at call 0x00401330. [20 min]
    The problem is at function call 0x00404A5c, it changes the value of EDI (but the correct version does not)
(3) check function 0x00404a5c. Problem is with 0x0040473f. It changes the value of EDI.
    There is a bunch of pop instructions in the function body which reset the EDI register, and the sliced version is not right. It may be caused by the function call handling. [20 min]
(4) Guess of the reason:
    [1] one instruction might be depended on for multiple reasons, esp and memory. The current code avoids processing it twice, got to revert it.
    [2] first pass of function call match too relaxed, got to strengthen it. [20 min]
11:00AM
(5) cancel the redundancy check in dependency construction [15 min] DONE.
(6) strengthen function call match first check [15 min] DONE.
(7) run the system again and then check. [20 min]
    (a) unit test (a) generate full trace, and (b) slice.
    STILL DOES NOT WORK.
(8) check if the slice works well by log
      0x00404A61 @ts 479689 --> 0x402741 @ts 479681 --> 0x40270c @ts 479531
    All of the above instructions are included.
    Compare the ESP value side by side. At 0x40272f the ESP does not match.
    So need to change the logic again. When the 2nd criterion is used to find the function match, then the function cannot be USED! set the bOutHasDependency to true. fixed.
2:00PM 09/07/2013.
-------------------------------------------------------------------------------
Task 123: Found new problem. 0x00403d7a.
-------------------------------------------------------------------------------
Problem: At 0x404c0c ESI value is not right
ESI value is from 0x404459.
It is in slice, however, it is popping a bad value.
Guess of the reason: when an instruction is included in slice, it should forever be included. Thus, producing an executable slice should be incremental: if an instruction is included in the slice, it may invalidate the decisions made by earlier passes.
Implementation plan:
(1) declare a boolean variable bNeedsVisit [5 min] DONE
(2) before skipping a timestamp, check its instruction, if in slice, set bNeedsMoreVisit [10 min] DONE
(3) at the end, set the instrStore [5 min] SKIPPED.
(4) add setInSlice(long long int ts) to Trace class [15 min]
(5) replace all ier->setInSlice [15 min]
(6) testing. [20 min] done. BUT DOES NOT FIX THE problem.
Had to check each function call and see if the pair of ESP/EBP is ok.
The pairs of ESP/EBP are ok. However, the problem is with the contents, which are not popped correctly.
Check the trace
  0x00404c0c (call esi, esi value not right) --> @0x404459 (pop esi) at 458042 --> @404431 (push esi) at 458025
By setting bps, we found that until 0x004039b5, everything matches.
*** problem found: the call esi instruction did not yield the register dependency on esi!!!!
(1) check the raw trace. no finding.
(2) BP on InstrInfo.cc:261. Problem is in register set up. It seems that for instruction call esi, esi is not included in op_ro.
(3) try op_src and op_dest. they don't work.
(4) try op_explicit. It is included in op_explicit. Use a temporary logic: if the explicit register is not included in the write set, then include it in the read set.
(5) dependency problem fixed.
-------------------------------------------------------------------------------
Task 124: Another bug: 00409ad6
-------------------------------------------------------------------------------
Observation: bad and correct behaviors depart at 0x00409ace (call esi)
Problem is that the instruction at 0x00409AB3 is ignored!!
(1) check the set generation, BP InstrInfo.cc:289. Register set is ok.
    EAX is included in the output_reg of XOR eax, eax
(2) Now the problem is to check why 0x00409AB3 is not included.
    The problem is that there is a setnz al instruction in between, which is approximated to eax. --> caused the problem!!!!
    Needs to redesign the register handling algorithm!!!!
-------------------------------------------------------------------------------
Task 125: Fix register handling
-------------------------------------------------------------------------------
(1) remove approx_reg_code [5 min] DONE
(2) declare find_reg_code(int reg, int *reg_codes), returns the number of reg codes [10 min] DONE
(3) call find_reg_code [15 min] DONE
(4) call gen_reg_code in register dependency analysis [15 min] DONE
(4) implement find_reg_code [120 min]
    (a) declare all registers [20 min]
    (b) declare map [60 min]
    (c) debug. [20 min]
    (d) double check all register mapping [20 min]
(5) check handling of ESP/EBP. check to avoid duplicate register.
(6) modify inSameGroup instruction to include the mapping
(7) debug. ok [20 min]
(8) debug the case on 0x00409ac3. [15 min]
    It should have two dependencies:
      0x00409abd (setnz al)
      0x00409ab3 (xor eax, eax)
(9) inspect full trace. [15 min] ok.
(10) inspect now the slicing result. [15 min]
(11) fix bug at Trace.cc:527
(12) test [20 min]
SLICE ALGORITHM WORKS NOW!
-------------------------------------------------------------------------------
Task 126: Add stats report
-------------------------------------------------------------------------------
(1) declare a function named stats_report() [5 min] DONE.
(2) report the stats in trace [15 min] DONE.
(3) report the stats in InstrStore [10 min]. DONE
(4) test. DONE.
-------------------------------------------------------------------------------
Task 127: Study a simple i/o input program.
-------------------------------------------------------------------------------
(1) create a simple getchar() program with a branch. [15 min]
(2) debug it in windows and see how it works.
    It goes through several layers/wrappers of read, and finally calls
      Kernel32.readFilea
      --> Kernel32.readConsoleA
      --> kernel32.7c8713f9 (its first parameter 0x0040f440 stores the I/O value)
      --> ntdll.CsrClientCallServer (at 7c8715bb)
      --> at 7c9132f3 calls zwRequestWaitReplyPort
      --> at 7c90e3eb calls KiFastSysCall
      --> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
(3) trace it in qemu.
    (a) fix bug on loading program. DONE
(4) trace it on qemu. Around 2500 instructions in between
      ts: 600884 7c90eb8d (begin)
      ts: 602599 7c90eb94 (end)
--- logic below -------
There are too many in, out instructions, trace from address 0x41f440.
Sequence of events from backward is:
  timeStamp: 602645, ins @7c87160d: repz movs es:[edi], ds:[esi]
    read: (start: 0x25069c, end: 0x25069e)
    write: (start: 0x41f440, end: 0x41f442)
    , DEPLINKS: , R: 602638 , R: 602639 , R: 602644 , M: 264761
  -- verified. This is called right after CsrClientCallServer
  -- trace 0x25069c --> **** THIS IS NOT RIGHT!!!!! It should be a timestamp between 600884 and 602559 (it is verified that 0x25069c is overwritten during the syscall) !!!
  timeStamp: 264761, ins @7c90256d: repz movs es:[edi], ds:[esi]
    read: (start: 0x12eea0, end: 0x12eed7)
    write: (start: 0x25069c, end: 0x2506d3)
    , DEPLINKS: , R: 264745 , R: 264747 , R: 264757 , M: 262754 , C: 264760 ESP: 0x12ed50 EBP: 0x12ed58
----------------
Another strange fact: there are no IN instructions in between 600884 and 602599.
Now try to parse the logic between 600884 and 602599. In the following @ is followed by timestamp.
(1) @600887, it pops fs; actually it sets fs to 0x00000030 [this must be the one for kernel], then the input cx register is copied to ds and es. This will affect the calculation of the virtual address (segment), but not affect the translation from va to physical addr yet.
(2) @600974, it resets fs:[0], SEH handler.
(3) @601292, there is an OUT instruction, then it reads from 0x800ca300. @601300, it repeats roughly the same.
(4) @601797, it resets cr3.
(5) @602099, lldt instruction.
(6) @602106, reset cr3.
-----------------
Note lldt is reset twice during the period. This may have something to do with
the reason why 0x25069c cannot be traced. CR3 is also reset twice (one for the
switch into kernel mode and the other).

8:30AM 09/12/2013
-------------------------------------------------------------------------------
Task 128: Continue study how I/O works
-------------------------------------------------------------------------------
(1) trace into sendkey
    BP on monitor.cc:4602
    It eventually calls ps2_queue to queue the keyboard event.
(2) data is read by ps2_read_data. It is called by the following sequence of functions:
    #0 ps2_read_data (opaque=0x28df9e60) at hw/ps2.c:191
    #1 0x0814c057 in kbd_read_data (opaque=0x28ddf2ac, addr=0, size=1) at hw/pckbd.c:323
    #2 0x082b44cc in memory_region_read_accessor (opaque=0x28ddf2d0, addr=0, value=0xaa0fdd70, size=1, shift=0, mask=255) at /home/csc288/qemu/qemu-1.4.0/memory.c:322
    #3 0x082b4709 in access_with_adjusted_size (addr=0, value=0xaa0fdd70, size=1, access_size_min=1, access_size_max=1, access=0x82b441b <memory_region_read_accessor>, opaque=0x28ddf2d0) at /home/csc288/qemu/qemu-1.4.0/memory.c:370
    #4 0x082b4a04 in memory_region_iorange_read (iorange=0x28df9df8, offset=0, width=1, data=0xaa0fdd70) at /home/csc288/qemu/qemu-1.4.0/memory.c:415
    #5 0x082ace1d in ioport_readb_thunk (opaque=0x28df9df8, addr=96) at /home/csc288/qemu/qemu-1.4.0/ioport.c:186
    #6 0x082ac940 in ioport_read (index=0, address=96) at /home/csc288/qemu/qemu-1.4.0/ioport.c:70
    #7 0x082ad599 in cpu_inb (addr=96) at /home/csc288/qemu/qemu-1.4.0/ioport.c:310
    #8 0x082fc0d5 in helper_inb (port=96) at /home/csc288/qemu/qemu-1.4.0/target-i386/misc_helper.c:77
    #9 0xafa8ed96 in code_gen_buffer ()
Plan:
[1] restart the raw trace and set BP at helper_inb, trace into it while running b10.exe [15 min]
    helper_inb is called too often, try kbd_read_data
[2] find out the corresponding eip and cr3.
    eip is 0x806f48ae, cr3 is 0x39000 [it's clearly not the cr3 of the target process]
    Instruction dump below:
    @EIP 0x806f48ae: length: (1): in %dx, %al
    @EIP 0x806f48af: length: (3): ret $0x0004
    @EIP 0x806f48b2: length: (2): mov %edi, %edi
    @EIP 0x806f48b4: length: (2): xor %eax, %eax
    @EIP 0x806f48b6: length: (4): movl 0x4(%esp), %edx
    @EIP 0x806f48ba: length: (2): in %dx, %ax
    @EIP 0x806f48bc: length: (3): ret $0x0004
After switching from cr3 0x39000 to 0xec400000 (the process to trace), the first
instruction is located at 0x804dbf63. Note that it appeared twice: @601798,
@602107, both after a mov cr3, eax instruction.
It seems that somehow, somewhere it is switched to process 0x39000
Next experiment:
(1) first verify if the process id is always 0xec400000 --> verified yes
(2) check when the switch happens, bp on helper_trace2 when cr3 is not 0xec400000
    The first instruction of 39000 is 0x804dbf67, and the last EIP of the process is 0x804dbf63
Experiment 2: check what's happening after 0x804dbf60 (switch cr3), first stop
at handle_instr to stop at the process. Then bp at helper_trace2. Too slow, add
if branches.
Interesting observation: before the switch of cr3, the instructions are as follows:
    (gdb) print print_instrRange(0x804dbf60, 0x804dbf70, env)
    @EIP 0x804dbf60: length: (3): mov %eax, %cr3
    @EIP 0x804dbf63: length: (4): movw %cx, 0x66(%ebp)
    @EIP 0x804dbf67: length: (2): ljmp 0x00000005
    @EIP 0x804dbf69: length: (3): leal (%ecx), %ecx
    @EIP 0x804dbf6c: length: (3): movl 0x18(%ebx), %eax
    @EIP 0x804dbf6f: length: (3): movl 0x3C(%ebx), %ecx
After the switch of cr3, there comes the long jump instruction, which jumps to
0x81f8f5c4 (with the new cr3 0x39000); then it switches back.
So in summary, the IN instruction is executed during the period in which the
trace is not recorded.
Experiment 3: check how IN is handled. There are several cases and some AL values do get saved.
******************* PAGE TABLE TRANSLATION ************************************
Task 4: study how a memory word is retrieved. It calls cpu_ldub_code. Debug into it.
It's defined in include/exec/softmmu_header.h:98. By delving into the logic it's
possible to get the hardware address either from the soft MMU logic or from the
page table logic.
******************* PAGE TABLE TRANSLATION ************************************
Experiment 5: generate another log.
Log1:
    ts: 600884 7c90eb8d (begin)
    ts: 602599 7c90eb94 (end)
    1st cr3 switch: 601797 (relative to begin: 913)
Log2: from 600599 to 602316
    first cr3 switch: 601515 (relative to begin: 916)
Exactly the same amount of instructions
So it's not the context switch, it's a fixed jump. And it's not using a busy loop. It must be using some type of

9:30AM 09/13/2013
-------------------------------------------------------------------------------
Task 129: Fix the logic about telling "non-voluntary" context switch
-------------------------------------------------------------------------------
Idea: jump, call, sysenter and their subsequent instructions should not be
regarded as a context switch. There is a slight chance that there really is a
timer interrupt after these instructions. We ignore that at this moment.
Implementation Steps:
(1) in Trace class declare a boolean attribute bRecordEnabled, init true; declare lastEIP [5 min] DONE.
(2) define InstrInfo::getNxtImmAddr [5 min] DONE
(2) declare a function checkRecordStatus() [20 min] DONE
    check if it is a context switch [non voluntary]; if yes, then stop the recorder;
    if the mode is stopped, check if the ip is close to lastEIP, and re-enable the recorder. [20 min]
(3) use it and update all related functions. [15 min] DONE.
(4) debug, bp on the above logic. [20 min] DONE 2:00PM 09/13/2013
(5) TEST. read log file
    Problems:
    (a) dump EIP one instruction earlier. fixed. [15 min]
    (b) fix problem at iret. [15 min] DONE.
    (c) fix problem at sysenter. [15 min] DONE.
    (d) cr3 problem.
        Verified it's ok.
    (e) test the full trace and slicing.
        (e.1) handle the iret problem. DONE - rate is still around 60%. 4:00PM.
-------------------------------------------------------------------------------
Task 130: Handle CR3
-------------------------------------------------------------------------------
Idea: if the current instruction is modifying CR3, then for the next immediate
instruction's helper_trace, send event new_cr3. TraceManager receives the event
and dispatches it to the current trace, adding the new_cr3 to the trace (mapping
it to the current trace). When CR3 is switched back, TraceManager removes the
CR3-to-trace mapping.

9:00AM 09/14/2013
Implementation:
(1) in InstrInfo class add an inline function to check if it is modifying CR3.
    debug into the function OK.
(1.5) record a new trace. [15 min] DONE.
(2) add a type event for cr3_changed [15 min] DONE.
(3) when CR3 changes, send event cr3_changed and debug it [20 min] DONE
(3.5) debug it. [20 min] DONE. 10:15AM
(4.1) TraceManager::setTraceNeedsCR3Update [15 min]
(4.2) declare TraceManager::handle_cr3_change and use it in BatchAnalyzer [15 min] DONE.
(4.3) implement handle_cr3_change. [25 min] DONE. 2:40PM
(4.4) in InstrExecRecorder call setTraceNeedsCR3Update, move cr3_to_watch to Trace class [25 min]
(4.5) fix the handle_cr3_change [20 min]
(5) debug Trace::handle_instr [20 min]
    (5.1) fix the out of memory issue. DONE.
    (5.2) fix the setTraceNeedsUpdateCR3. DONE.
(6) problem: proc_status is not updated. FIX:
    (6.1) TraceManager::getCR3ToWatch [10 min] DONE.
    (6.2) TraceManager::getCR3ToRemoveWatch [5 min] DONE.
    (6.3) getCR3ToWatch() and getCR3ToRemoveWatch() in handle.h [10 min]
    (6.4) update CR3_to_watch by calling getCR3ToWatch() and getCR3ToRemoveWatch() in ops_sse.h [15 min]
    (6.5) debug: starts from TraceManager::handle_cr3_event, and then trace into helper_trace2. [20 min]. OK.
(7) problem: cr3 not switched back to original. Remove the protection.
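Sketch for the Task 130 idea (illustration only, NOT the actual TraceManager
code): when a traced process loads a new CR3, the new value is mapped to the
same trace, and the mapping is dropped when CR3 is switched back. The names
CR3Watch, add_trace, on_cr3_changed and trace_for are hypothetical, and integer
trace ids stand in for Trace objects.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the CR3 bookkeeping described in Task 130.
class CR3Watch {
public:
    // Start tracing the process whose page directory base is home_cr3.
    void add_trace(uint32_t home_cr3, int trace_id) {
        owner_[home_cr3] = trace_id;
        home_[trace_id] = home_cr3;
    }

    // Called for the instruction right after a "mov cr3, eax".
    // old_cr3 is the value before the write, new_cr3 the value after it.
    void on_cr3_changed(uint32_t old_cr3, uint32_t new_cr3) {
        auto it = owner_.find(old_cr3);
        if (it == owner_.end()) return;        // old CR3 is not being traced
        int id = it->second;
        if (new_cr3 == home_[id]) {
            // Switched back to the traced process: drop the temporary mapping.
            if (old_cr3 != home_[id]) owner_.erase(it);
        } else {
            // Switched away: keep recording the new CR3 under the same trace.
            owner_[new_cr3] = id;
        }
    }

    // Returns the id of the trace that records this CR3, or -1 if none.
    int trace_for(uint32_t cr3) const {
        auto it = owner_.find(cr3);
        return it == owner_.end() ? -1 : it->second;
    }

private:
    std::map<uint32_t, int> owner_;  // cr3 -> trace id
    std::map<int, uint32_t> home_;   // trace id -> original cr3
};
```

With the values seen in Task 128, the switch 0xec400000 -> 0x39000 maps 0x39000
to the trace, and the switch back removes it again. Nested switches are not
handled by this sketch.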
-------------------------------------------------------------------------------
Task 131: fix CR3 handling code
-------------------------------------------------------------------------------
(1) in Trace class, change cr3_to_watch to a vector, and modify the setCR3_to_watch.
    Treat the vector as a stack. If adding an existing cr3_to_watch, pop error.
    Create a function removeCR3_to_watch. Move both functions to the .cc file [20 min] DONE.
(2) Modify the TraceManager::handle_cr3 [15 min]
(3) Debug:
    (1) general logic of handle_cr3_change. fix logic at 164. [15 min] OK.
    (2) check the logic of continuous adding. [20 min] ok.
    (3) FIX THE PROBLEM FROM 39000->00e4c000. Change the logic to pop until the
        last one. Add function to check whether a CR3 is in the watch list. pop warning.
(3) run and test. [20 min]

8:30AM 09/17/2013
-------------------------------------------------------------------------------
Task 132: fix CR3 handling bugs
-------------------------------------------------------------------------------
(1) the warning message in removecr3. DONE.
(2) fix the double delete error. FOUND the problem: when the trace pops the cr3,
    TraceManager did not remove the cr3 correspondingly. This causes recording of
    more instructions than necessary! Add TraceManager parameter to trace.
(3) problem with task completion. When deleting a trace, need to add additional
    logic to check the trace's cr3 watch list one by one and remove these cr3
    from the watch list. FIXED.
-------------------------------------------------------------------------------
Task 133: Now read the trace of b10.exe, and find out how the i/o is processed
-------------------------------------------------------------------------------
Trace to follow 0x401014 (getchar)
    Kernel32.readFilea
    --> Kernel32.readConsoleA
    --> kernel32.7c8713f9 (its first parameter 0x0040f440 stores the I/O value)
    --> ntdll.CsrClientCallServer (at 7c8715bb)
    --> at 7c9132f3 calls zwRequestWaitReplyPort
    --> at 7c90e3eb calls KiFastSysCall
    --> at 7c90eb8d does SYSENTER (EAX: 0xc8, ...
        then complex message format)
Problem: lost track of 401010. Did not even record the instructions until back from 401019.
[1] Debug: set bp on helper_trace2 and check why 0x401010 is not hit. BP on 2394 of ops_sse.h.
    This time it works. Guess: maybe an inaccurate context switch leads to the
    problem? [context switch right after jump].
    7c8713f9: 590165
    7c8715bb: 590168
    START: 590672 @7c90eb8d
    END: 592000 @7c90eb94
    IN instruction and access of 0x25069c are still out of range.
    Job next: check if the END address is really the end address.
    Debug process: bp on 7c8715bb (this address is hit ONLY ONCE during execution),
    and then on 7c90eb8d. It should return at 7c90e3ed, and then return to 7c9132f8.
    So the correct exit address should be 908725 @7c9132f8.
    Now corrected: START: 590672, END: 908725
    During this period, there are IN instructions at: 691019, 691070, 691112, 691184,
    During this period, there is no access of 0x25069c
    --> trace data
    [a] 0x401019 (@2255318) EAX -> 2255302 (@402bd3)
        --> depends on memory 0x12ff48,
        --> @2255261 (@402bc4)
        --> depends on 0x41f440
        --> timeStamp: 2255084, ins @40c0bd: mov [ebx], al
            write: (start: 0x41f440, end: 0x41f440)
        --> timeStamp: 2250831, ins @7c87160d: repz movs es:[edi], ds:[esi]
            read: (start: 0x25069c, end: 0x25069e)
            write: (start: 0x41f440, end: 0x41f442)
        --> 236530 [WRONG]. It must be written somewhere between 590672 and 908725
[2] effort: study the IN instructions between 590672 and 908725.
    691019, 691070, 691112, 691184,
    691019 --> does not work.
-------------------------------------------------------------------------------
Task 134: Disable the CR3 switch logic and see if we can still capture the IN
INSTRUCTION, and then vary the input load and see if the number of IN
instructions changes.
-------------------------------------------------------------------------------
[1] disable the TraceManager::handle_instr [disable all, adding and switching CR3],
    also in Trace::handle_instr disable the setModifyCR3.
    [10 min]
[2] read the dump file. Check the following points
    --> ntdll.CsrClientCallServer (at 7c8715bb)
    --> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
    It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1
    (only once in entire duration).
    Identified:
    START: timeStamp: 473357, ins @7c8715bb: call [0x7C801034]
    END: timeStamp: 475131, ins @7c9132f8: cmp edi, ebx
    NOTE that 7c8715c1 (only hit once during entire execution) is located at
    timestamp 475159 [this is verified]
CONCLUSION: There is kernel mode code running under another cr3, and we need to
keep track of the physical memory for I/O!!!
Idea: we turn on the tracing of physical memory beginning from every sysenter,
and end it at sysexit. Build a reverse page table and map the physical memory
being written back to the process that issued the sysenter [map it back to
virtual addr].

9:00AM 09/18/2013
-------------------------------------------------------------------------------
Task 135: Modify helper_trace_mem in tcg/i386/tcg-target.c so that the
physical address will be calculated
-------------------------------------------------------------------------------
[1] develop a function vmAddrToPhyAddr, simulate
    /home/csc288/qemu/qemu-1.4.0/include/exec/softmmu_header.h [1 hr]
    (a) trace into cpu_ldub_code and study the logic. [20 min]
        Page table access and virtual address translation are provided in
        cpu_x86_handle_mmu_fault.
        It seems that vmAddrToPhyAddr can be done without accessing the page
        table. If trace_mem_access is called AFTER the memory access is done,
        then the accessed page is already in the TLB. This can be directly used
        for calculating the physical address. Need to verify if the host address
        in the code is really the physical address.
    (b) study of the cpu_ldub_code function
        TARGET_PAGE_BITS is 12, defined in target-i386/cpu.h
        CPU_TLB_BITS is 8 and CPU_TLB_SIZE is defined as (1 << CPU_TLB_BITS),
        i.e. 256. The index mask is CPU_TLB_SIZE - 1, which is 255 (8 bits of 1's).
        page_index is actually the index of the page inside the TLB.
        [it is assumed that the page is always loaded active in the TLB when it is accessed].
        The entry of the TLB table is defined in include/exec/cpu-defs.h.
        ??? in CPUTLBEntry, the field addend is added to the virtual address to
        get the physical address; not sure about the use of addr_read, etc.
        unlikely is defined as a macro
        "include/qemu/osdep.h:#define unlikely(x) __builtin_expect(!!(x), 0)"
        It is only a branch-prediction hint: it evaluates to the truth value of x
        and tells the compiler that x is expected to be false.
        The "unlikely" condition in cpu_ldub_code compares the TLB entry with
        addr & TARGET_PAGE_MASK. TARGET_PAGE_MASK is defined as 0xFFFFF000
        (i.e., ~((1<<TARGET_PAGE_BITS)-1)). This is used to tell if the address
        is in the TLB. So here addr & TARGET_PAGE_MASK IS the "REAL PAGE INDEX"!!!!
        (note that variable page_index is actually the real page index mod the TLB size).
        If it is in the TLB, then the ELSE BRANCH is executed. It first generates
        the host address (physical addr) by adding the addend. Then it uses the
        host_addr to retrieve the contents. So here, the host_addr is actually
        the ADDRESS in the host which stores the data. So here it is confirmed
        that host_addr here IS the physical address. [revised in the conclusion
        of this task: it is the host address, not the guest physical address]
        If it does not match the TLB entry read_addr, there could be several
        complications, e.g., unaligned access across pages.
        case 1. I/O read. The I/O address is outside the normal page addresses.
                It is forwarded to io_read
        case 2. unaligned access. It first checks if this is TCG generated code.
                Then it calls slow_ldl_mmu. It handles the case of spanning over
                two pages. It loads the data in two pieces and then merges the
                data. When loading the data it is calling slow_ldl_cmmu.
        case 3. unaligned access in the same page. It is simply treated as normal access.
        case 4. not in TLB. call tlb_fill
    (c) logic of tlb_fill (located at /home/csc288/qemu/qemu-1.4.0/target-i386/mem_helper.c:137)
        It first calls handle_mmu_fault.
        It first checks that cr0 is set. Then it checks the PAE flag. Different treatment.
        Read page table (logic starts from line 717!!!)
        The page directory base address is stored in env->cr[3]
        E.g., addr is 0x80087000, env->cr[3] = 0x39000
        pde_addr is env->cr[3] + ((addr>>20) & 0xffc) = 0x39800 [note 0xffc is 1111 1111 1100]
        Interpretation: 0x39000 is the starting address of the first level page table.
        Note the operation (addr>>20) & 0xffc. This essentially takes the left
        most 10 bits, and then multiplies by 4 (because each entry in the page
        table is 4 bytes). This yields (from addr 0x80087000) the offset of the
        entry in the first page table: 0x800. Thus pde_addr 0x39800 is the
        corresponding entry address in the 1st-level page table. Note that the
        1st-level page table is called the page directory.
        line: 718 pde = ldl_phys(pde_addr); this is the page directory entry,
        which holds the address of the 2nd-level page table.
        pde in the debug session is 0x3b003. It is a combination of address and
        flags. It has to be first verified with PG_PRESENT_MASK=1. Thus it is a
        real address; otherwise this is a page fault.
        Then it checks the cr[4] PSE bit, which decides whether the page size is
        4MB. --> here the page size is 4KB.
        Then check the PAGE_ACCESSED_MASK, and save it if this is the first time
        the page is accessed.
        pte_addr is calculated as pte_addr = ((pde & ~0xfff) + ((addr >> 10) & 0xffc))
        Here ((addr >> 10) & 0xffc) takes bits 12-21 of addr (the middle 10 bits)
        and multiplies them by 4; the result is added to the table address in pde.
        So pde & ~0xfff is the beginning address of the 2nd-level page table, and
        pte_addr is the address of the page entry. pte_addr is now 0x3b21c.
        pte is then the real page entry; its value is 0x87063. Similarly, its
        lower bits are padded with flags.
        ptep is calculated as pte & pde (not sure why it's needed) -- seems to be
        getting the conjunction of flags
        line: 821. virt_addr is the page starting address for the address;
        virt_addr is 0x80087000 by clearing out the last 12 bits.
        line: 841 do the mapping
        page_offset is the offset of the address in the page. That is 0x0 for 0x80087000
        paddr is (pte & TARGET_PAGE_MASK) + page_offset, which is 0x87000
        Note that TARGET_PAGE_MASK wipes the last 12 bits of pte to 0.
        So the pte entry with its last 12 bits set to 0 is the REAL ENTRY ADDRESS
        of the page. paddr is the physical address.
        Then it calls tlb_set_page; the physical page is actually mapped into a
        host memory page.
        It calls phy_page_find, which returns the section that the physical page
        is located in. Here "section" seems to be the private data structure
        used by QEMU for memory management.
        Then at line 270, addend = (uintptr_t)memory_region_get_ram_ptr(section->mr)
        memory_region_get_ram_ptr (mr=0x28d86918) at /home/csc288/qemu/qemu-1.4.0/memory.c:1150
        It goes through a list of ram blocks and checks if a block offset matches
        the given address, then it returns the corresponding host address.
        So the addend is actually the corresponding host address (in the
        emulator) for the physical address.
************************************************************************************************************
Conclusion: in the TLB, a virtual address is actually mapped to a host address
(instead of the real physical address). In the page table, a virtual address is
mapped to a physical address. To translate a physical address, need to call
memory_region_get_ram_ptr to calculate the addend.
************************************************************************************************************
2:30PM
-------------------------------------------------------------------------------
Task 136: design the physical memory tracing system.
-------------------------------------------------------------------------------
Q1. trace the real physical address or the host address?
Q1.1 is helper_trace_mem called after or before the memory access?
    [1] check tcg/i386/tcg-target.c: helper_trace_mem is placed before the
        memory read/write. So the pages may not be in the cache yet.
    [2] make an experiment and try to move it after the real memory read/write
        and see if it works.
    [3] strangely, the dump seems ok; however, there are a lot of unrelated
        ERROR messages on non-consecutive memory access.
    VERIFIED, cannot be placed anywhere.
    but only at the beginning, coz data might be destructed. Thus, host_addr is not usable
Decision: take the host addr, it might be still faster.
Design in general:
(1) add a function vaToha which maps from virtual address to host address
(2) insert the call vaToha into helper_trace_mem; it's going to slow it down a little bit
(3) modify handle_mem_read, handle_mem_write and parameters phyaddr, phyaddr2
(4) declare capture_physical_mem in handle.h and use it in handle_mem_trace
(5) provide a function in QEMU for building reverse_page table, it calls add_entry page table
(6) provide a data structure called page_table, it can be used for modeling both
    the reverse_page table and the regular page table.
(7) How to monitor the writing to page table?
-------------------------------------------------------------------------------
Task 137: Implement the function va_to_ha()
-------------------------------------------------------------------------------
(1) declare the function in /home/csc288/qemu/qemu-1.4.0/include/exec/softmmu_header.h [DONE]
    simulate tlb_fill in /home/csc288/qemu/qemu-1.4.0/target-i386/mem_helper.c:137
(2) add the logic for handling other cases. [DONE]
(3) debug. place it in helper_trace_mem

9:00AM 09/19/2013
(4) add the handling of unaligned access within the same page. [15 min] DONE
(5) handle I/O processing. [20 min] DONE
(6) handle unaligned access across pages [90 min]
(8) debug unaligned I/O access [20 min]
    check the regular handling of 0xf1bce --> it's always mapped to I/O port 1
(9) unaligned I/O processing has to be very accurate. Cannot be over approximated. FIX it [30 min] DONE
(10) fix the unaligned access across pages. [20 min].
    debug and find if it returns the same.
    va: 0x1ffe5ffd, ha1: 0xa98e2ffc, ha2: 0xaa0fde18
    Now switch to the real logic and see what is going on: check what happens
    if the branch is disabled. Still does not work.
    Identified that it's the reload tlb logic that causes the problem.
(11) Try to figure out why reload tlb causes the problem.
    Enable the branch and see how many times it was hit. The bad news is that it
    is hit many many times before causing the blue screen. It happens at around
    35000 times, however not at a fixed count.
    Effort 2: add a global variable as "last addr", record it and place a bp on it.
    Effort 3: bp on raise_exception first. Confirmed: it's 216c4.
        The problem is it goes to iotable, which is wrong. Observe the values at 148.
        Normal case: 147->148->194 [tlb_addr: 0x21000]
        Bad case: 147->148->150 [tlb_addr: 0x21010]
        check the value of TARGET_PAGE_MASK
        The problem seems to be that tlb_addr is 0x21010 (not cleanly aligned).
        Debug into tlb_fill and see what's the problem.
        The tlb_addr: 0x21010 is caused by the following:
            } else if (memory_region_is_ram(section->mr)
                       && !cpu_physical_memory_is_dirty(
                              section->mr->ram_addr
                              + memory_region_section_addr(section, paddr))) {
                te->addr_write = address | TLB_NOTDIRTY;
            }
        So the problem is that we refilled the TLB but did not do the write that
        clears the TLB_NOTDIRTY tag. Need to clear the tag.
        Directly modifying it back does not seem to solve the problem.
        Make a minor change to check if it is reloaded and then change. Still does not work.
    Attempt 4: comment out the reload again. Still bug.
    Attempt 5: comment out the read also. Remove all early returns. Add a check of -1 at the end.
        -1 check does not work. Found that read also causes the problem.
        Make another experiment: enable write but disable read.
        Strangely, it still does not work. Both need to be disabled; however, it
        does not always occur.
    Attempt 6: compare the trace after the last memory write of 0x216c4.
        @EIP 0xc01d9: length: (1): iret
        @EIP 0x20ece: length: (2): les %esp, %eax
        Strangely, the bps are never hit!
    Attempt 7: just skip the load_tlb for 0x216c4 and see what happens. Does not work.
    Attempt 8: create a function similar to tlb_fill, without filling the tlb
        table, in target-i386/mem_helper.c. This time it works.
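Sketch for Attempt 8 (illustration only, NOT the code added to mem_helper.c): a
page walk that only reads the page table and never writes accessed/dirty bits
or TLB entries, following the non-PAE 4KB case studied in Task 135. The names
PhysMem and walk_page_table are hypothetical, and a std::map stands in for
guest physical memory (the role ldl_phys plays in QEMU). The PSE (4MB page),
PAE and A20-mask cases are left out.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for guest physical memory: address -> 32-bit word.
using PhysMem = std::map<uint32_t, uint32_t>;

constexpr uint32_t kPresent = 1u;            // bit 0 of a PDE/PTE
constexpr uint32_t kFrameMask = 0xfffff000u; // clears the low 12 flag bits

// Read-only walk of a non-PAE two-level page table with 4KB pages.
// On success stores the guest physical address in *paddr and returns true.
// Returns false where the real code would raise a page fault.
bool walk_page_table(const PhysMem &mem, uint32_t cr3, uint32_t va,
                     uint32_t *paddr) {
    // Top 10 bits of va, times 4, index the page directory.
    uint32_t pde_addr = (cr3 & kFrameMask) + ((va >> 20) & 0xffc);
    auto pde_it = mem.find(pde_addr);
    if (pde_it == mem.end() || !(pde_it->second & kPresent)) return false;
    uint32_t pde = pde_it->second;

    // Middle 10 bits of va, times 4, index the 2nd-level page table.
    uint32_t pte_addr = (pde & kFrameMask) + ((va >> 10) & 0xffc);
    auto pte_it = mem.find(pte_addr);
    if (pte_it == mem.end() || !(pte_it->second & kPresent)) return false;
    uint32_t pte = pte_it->second;

    // Page frame from the PTE plus the offset inside the page.
    *paddr = (pte & kFrameMask) + (va & 0xfff);
    return true;
}
```

With the values from the debug session (cr3 = 0x39000, pde 0x3b003 stored at
0x39800, pte 0x87063 stored at 0x3b21c), va 0x80087000 walks to paddr 0x87000.
The result is still a guest physical address; getting the host address needs
the addend step described in the Task 135 conclusion.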
7:30AM
-------------------------------------------------------------------------------
Task 138: test precision of the tlb_fill simulator
-------------------------------------------------------------------------------
[1] Design:
    (1) capture a VA which uses get_ha read first
        va: fd094, ha: 0xaa442094
    (2) then set a bp on tlb_fill and step back to the main function and see what the ha is
        verified
    (3) repeat for get_ha write
        va: 0xe0c3c, ha: 0x899ddc3c verified
    (5) check large page
        bp on mem_helper.c:512, check the return and then check the helper_ldl
        va: 0x806f9088, size: 0x400000 (4MB). -> ha: 0x89ff6088 verified.
        The va_add_large_page only changes the TLB attribute for a full flush
        when a large page is invalidated. It's not going to change the translation.
    (4) check the case it returns -1. The first write returns -1.
        bp on write and then bp on tlb_fill and the entire get_ha function
        va: 0x4ea005, on the second visit it translates to 0x97d7a005
        verified, it works. The system will generate a page fault, and after
        some interrupt handler, it handles the page fault properly and will call
        va_to_ha properly.
DONE. 9:00AM
-------------------------------------------------------------------------------
Task 139: Improve the precision of unaligned access
-------------------------------------------------------------------------------
[1] change the starting addr and size handling.
[2] debug: bp on softmmu_header.h:193 DONE
[3] fix the documentation of va_to_ha
[2] debug: break helper_ldl(b)_mmu DONE.
-------------------------------------------------------------------------------
Task 140: improve memory read/write storage
-------------------------------------------------------------------------------
(1) add class memRange. all members public. supports functions [20 min] DONE
    isMergableWith(start, length)
    mergeWith(start, length)
    all inline functions
(2) test memRange [30 min] DONE
    (1) test isMergable
    (2) test mergeWith
(3) add class memRangeManager.
    support functions: [20 min]
    addRange(start, length)
    getCount
    getArrRanges
    copyFrom(memRangeManager)
(3) test memRangeManager. [25 min] DONE. 12:00PM
-------------------------------------------------------------------------------
Task 141: Modify InstrExecRecorder class to accommodate the memRangeManager
-------------------------------------------------------------------------------
(1) remove readMemAddr etc. [DONE]
(2) fix the expandFromRaw. [DONE]
(3) fix the mock [DONE]
(4) fix handle_instr [DONE] 2:30PM
(5) fix memory access [DONE]
(7) add memRange.serializeTo and deserialize [20 min] DONE
(6) fix serialize [15 min] [DONE]
(7) fix deserialize [15 min] DONE.
(8) fix dump [10 min] DONE.
(9) fix testRecorder [15 min]
(1) test. [20 min] DONE 4:30PM.

10:00AM 09/21/2013
-------------------------------------------------------------------------------
Task 142: Set up the physical mem tracing mechanism
-------------------------------------------------------------------------------
Idea: whenever there is a CR3 change instruction, switch to physical memory tracing mode
[1] in handle.h add function isModifyCR3(cr3, eip, opcode) [15 min] DONE
[2] in helper_trace2, add a branch to test isModifyCR3 and call two functions: DONE
    buildPageTable(cr3), and setPhyMemTrace(cr3) [15 min]
[3] implement the setPhyMemTrace function in handle.cc [15 min] DONE. DONE
[4] debug [20 min] DONE. 10:40 DONE. 11:00AM
-------------------------------------------------------------------------------
Task 143: Read Page Table
-------------------------------------------------------------------------------
[1] set up the framework. Declare the following: [15 min] DONE
    (1) build_page_map in include/exec/softmmu_header.h
    (2) add_page_map in handle.h
[2] debug into the template [10 min]
[2] implement build_page_map in softmmu_header.h [1 hr]
    (1) copy from va_to_ha DONE.
    (2) handle_segment mapping case. DONE
    (3) handle CR4 case. skip it at this moment. DONE.
    (4) handle non pae case. add a function. DONE.
    (5) handle normal 4k page case. DONE.
    (6) virtual page number to physical page number. DONE 4:20PM
    (7) remove in pte_to_ha. DONE. [10 min]
    (8) fix build_map_for_non_pae [25 min]
    (9) add function set_page_size(cr3, pagesize); [10 min]
    (10) fix build_page_map [15 min]
[3] debug
    (1) all branches of build_page_map. DONE. The other two branches were never encountered. [10 min]
    (2) debug into build_map_for_non_pae [15 min]
    (3) fix << problem. [5 min] DONE
    (4) debug pte_to_ha. [15 min] DONE. 10:15AM 09/22/2013
    (5) check how many pages are generated. done. Around 10k pages * 4k = 40MB?
-------------------------------------------------------------------------------
Task 144: Create page_map class
-------------------------------------------------------------------------------
[1] define a page_map class, extended later. [15 min] DONE
[2] function clear() [5 min] DONE
[3] function add_page() [5 min]
[4] function ppage_to_vpage [5 min] DONE
[5] function vpage_to_ppage [5 min] DONE
[6] function va_to_ha [10 min] DONE
[7] function ha_to_va [10 min] DONE
[8] debug and verify the entire system.
    [1] get cr3, and then set a BP on tcg_target.c:1252
        get three sets of va and ha
        va: 0x20044, ha: 0x98030044
        va: 0xf74dbc28, ha: 0x97dccc28
        va: 0xe10010e4, ha: 0x8c08f0e4
    [2] fix set page size. DONE
    [3] add_page_map has not been done yet.
    [2] bp on Trace::handle_instr and call trace.pagemap.va_to_ha

    7:30AM 09/23/2013
    There are still bugs: va 0x401010 does not work!!!
    0x00401010: first page number 0x001, second page number 0x001, page offset: 0x010
    BP on target_i386/mem_helper.c: 182
    Found a page number calculation error. Fixing it still did not correct the problem.
    Found that page_width is still not right. Fixed.
    Using ha, we can directly use "x/10i $ha_value" to verify that the stored
    binary instructions correspond to code!
    page_map success now
[9] test efficiency of the system. OK. A little bit slow. Later run profiling tools on it.
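Sketch for the Task 144 page_map (illustration only, NOT the actual class): a
two-way map between virtual page numbers and host page numbers for one CR3.
The class name PageMap and the member names are hypothetical. It assumes 4KB
pages only, a 32-bit host (the ha values in this log fit in 32 bits), and at
most one virtual page per host page, which does not hold for shared pages.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch of a per-CR3 page map, 4KB pages only.
class PageMap {
public:
    static constexpr uint32_t kPageBits = 12;
    static constexpr uint32_t kOffsetMask = (1u << kPageBits) - 1;

    void clear() {
        v2h_.clear();
        h2v_.clear();
    }

    // Record that virtual page number vpage is backed by host page hpage.
    void add_page(uint32_t vpage, uint32_t hpage) {
        v2h_[vpage] = hpage;
        h2v_[hpage] = vpage;
    }

    // Translate a virtual address; returns false if the page is not mapped.
    bool va_to_ha(uint32_t va, uint32_t *ha) const {
        return translate(v2h_, va, ha);
    }

    // Reverse translation, used to map a captured host address back.
    bool ha_to_va(uint32_t ha, uint32_t *va) const {
        return translate(h2v_, ha, va);
    }

private:
    using Map = std::unordered_map<uint32_t, uint32_t>;

    // Swap the page number through the map and keep the page offset.
    static bool translate(const Map &m, uint32_t in, uint32_t *out) {
        auto it = m.find(in >> kPageBits);
        if (it == m.end()) return false;
        *out = (it->second << kPageBits) | (in & kOffsetMask);
        return true;
    }

    Map v2h_;  // virtual page number -> host page number
    Map h2v_;  // host page number -> virtual page number
};
```

For example, after add_page(0xf74db, 0x97dcc), the va 0xf74dbc28 from the debug
session above translates to ha 0x97dccc28, and ha_to_va gives the va back.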
9:30AM 09/23/2013
-------------------------------------------------------------------------------
Task 145: make the CR3 memory change working.
-------------------------------------------------------------------------------
[1] Design [20 min] DONE
[2] change InstrExecRecorder handle_mem_read/handle_mem_write, change size to real size, update documents [5 min] DONE
[3] change Trace::handle_mem_read/handle_mem_write, change size to real size [5 min] DONE
[4] change handle.cc handle_mem_read [10 min] DONE
[5] test if everything works. [10 min] DONE
[7] implement TraceManager::handle_phy_mem_access(unsigned int addr, int realsize) [10 min] 10:30AM
[8] implement Trace::handle_phy_mem_access [15 min] DONE.
[8] in tcg/i386/tcg-out.c:1249, call va_to_ha and then call TraceManager::handle_phy_mem_access [8 min] DONE
[9] debug [20 min]
    [1] tcg-out DONE
    [2] TraceManager::handle_phy_mem DONE
    [3] Trace::handle_phy_mem DONE. 11:20AM
[10] fix the bTracePhyMem logic enable/disable [20 min] DONE
    add lastCR3ChangeEIP to Trace.
[11] debug it. [25 min]
    (1) check it is disabled OK.
    (2) if record enabled is false, do not trace physical memory. OK. 12:30PM.
[12] debug the trace memory, see where it gets captured.
    BP on 1410. Did not get hit once. 3:00PM
[13] disable the check on RecordEnabled and see the result. Still does not work.
[14] figure out if it's a bug of the capture code or of the entire mechanism.
    (1) find the physical address of 0x0025069c; it's accessed by the following
        instruction, which eventually contains the data:
        timeStamp: 475197, ins @7c87160d: repz movs es:[edi], ds:[esi]
            read: (start: 0x25069c, end: 0x25069e)
            write: (start: 0x41f440, end: 0x41f442)
        va: 0x0025069c, ha: 0x9803469c
        verified: 0x9803469c: 0x000a0d61
        Run another time: always the same address. Now we can set a watch point;
        use a hardware breakpoint, otherwise it's too slow.
        awatch *0x9803469c
        Captured!
        the eip is 0x75b443d5, next eip is 0x75b443d5, cr3 is 0x5dee000
        The translation back from ha is correct; however, Trace::handle_phy_mem_access is not called!
        @EIP 0x75b443d5: length: (2): repz movsl %ds:(%esi), %es:(%edi)
        Debug Idea:
        (1) embed logic at helper_trace_mem and helper_trace2 when eip is 0x75b443d5
            ops_sse.h:2374
            tcg-target.c:1251
            in handle_instr bp on 7c87160d, and then bp on va_to_ha and get the
            hw address. Verify whether it contains data or not.
    (2) modify the system and find the write access on 0x0025069c
        Found the bug: passed the va not the ha!
    (2.5) check the two address translation case. passed!
    (3) Problem: it recorded too many addresses!
        Verified, it is recording kernel space addresses. We need to modify
        build_page_map to EXCLUDE those special pages.
        List all the flags: PG_USER_MASK 1<<2 in mem_helper.c:197, 213, 233
    Still minor problems:
    [1] (gdb) p/x addr
        $12 = 0x8993e300
        (gdb) p/x va
        $13 = 0x7ffe0300
        check page table 511-992
        This one seems ok, check xp image!!!! verified, this is a legal address.

8:30AM 09/24/2013
-------------------------------------------------------------------------------
Task 146: fix the CR3 memory capacity problem.
-------------------------------------------------------------------------------
[1] add a printf and see the memory range. [25 min]
    There are two problems: [a] merge problem, [b] address not in range.
[2] merge problem [15 min]
    Check the merge problem first.
    (a) enlarge the capacity and see what's going on.
    Non-mergeable ranges do cause problems
[3] solve the merge problem [45 min]
    (a) in memoryManager implement a minimize() function [25 min] DONE
    (b) test it [10 min] DONE
    (b) call minimize when capacity is reached [10 min] DONE
    DONE
10:00AM
[4] go back, shrink the capacity and find out the not-in-memory ranges [15 min]
    Not-in-range mems: 0x8993e2f0, 0x8d011620
    p1no: 550 , p2no: 318 pde: 0x2423163 page_no: 8993e
    It seems that page_no 8993e is never added; check pagemap result ha_to_va and va_to_ha
    add test logic to code in pageMap::add_page and IT IS NEVER HIT!
    check how come it's added into range.
    Found the bug at trace.h:1441
    Now switch back to 500. Now it's fine (even 200 is not fine).
11:30AM
[5] now verify the correctness of the analysis.
    --> ntdll.CsrClientCallServer (at 7c8715bb)
    --> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
    It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1 (only once in entire duration).
    Start: timeStamp: 473548, ins @7c8715bb: call [0x7C801034]
    End: timeStamp: 475755, ins @7c8715c1: cmp [ebp-0x9C], edi
    During period (473548, 475755), there are 2 cr3 change instructions:
    timeStamp: 474518, ins @804dbf60: mov cr3, eax read: (start: 0x250688, end: 0x25269f) read: (start: 0x250690, end: 0x250693) read: (start: 0x7c9110d8, end: 0x7c9110d8) read: (start: 0x7ffe02f0, end: 0x7ffe02f0) read: (start: 0x7c911059, end: 0x7c911059) read: (start: 0x7c902688, end: 0x7c90268b) read: (start: 0x7ffe0300, end: 0x7ffe0307) , ESP: 0xf7584c34 -> 0xf74dbc70
    timeStamp: 474827, ins @804dbf60: mov cr3, eax
        a lot of read/write OK. 0x25069c there.
[6] improve the packing code. [15 min]
[7] generate the full trace and check the dependency.
    first check the instruction that reads 0x25069c after timestamp 475755.
    then check which instruction it depends on:
    timeStamp: 476542, ins @7c87160d: repz movs es:[edi], ds:[esi] read: (start: 0x25069c, end: 0x25069e) write: (start: 0x41f440, end: 0x41f442)
2:30PM
    Found that there is a segmentation fault: id=475984
    (1) bp on the deserialize function. Did not find the problem. It seems that it starts to break at index 10.
    (2) bp on memRangeManager::serialize when its count is greater than 10, look at the memory dump and check it back
        found an additional bug: mock_memory_access
        Every time it broke at a different index: 10, 36
        Found that the problem may be caused by InstrExecRecorder.serialize - size. The size is not right!
7:30PM
    Check how it's serialized. In serialization: size is ok, 1109 given around 180 entries in myRead_VA.
    timestamp: 474693, size: 1109, its position: 9512044.
    Guess: maybe there is memory overwriting. Check the record size of cache.
    New bug: fix append_record error. Could not find out why total_size is being overwritten. After total_size 16388 it generates the error
9:00AM 09/25/2013
-------------------------------------------------------------------------------
Task 146: Fix the cache append problem
-------------------------------------------------------------------------------
[1] find out how it is inconsistent.
    (a) BP on Cache.cc:90
    (b) then BP on InstrInfo::appendToCache
    (c) also BP on InstrInfo::loadFromCache - disable first.
    Observe until id 16388 = 16 * 1024 + 4
[2] Observation: error occurs at 16389, the nxtRecordInBlock is 16384 which is wrong.
    Our guess is that there is a loadCache at 16383 (which leads to nxtRecordInBlock) but it never recovers to the latest position.
    Strangely, did not hit 16384 loadcache, but the nxtRecordInBlock is set to 16384. Need to set a watchpoint.
    break on 1, and then enable 2, hit a couple of times and then set the watchpoint. DOES NOT WORK. too many swaps.
[3] Attempt 3: declare attribute lastloadID and check the ID.
    Last load ID is 174. verify it can be repeated. hit again.
    now set a condition on the Cache::retrieveRecord bp: id==174. It's hit 10 times; need to set the ignore count.
    Call chains of the last couple of calls:
    isModifyCR3(0x81f8f5ee)
    Trace::handle_instr: 787 -> isModifyCR3(0x81f8f5ee)
    Trace::checkRecordStatus 0x81f8f5f0
    InstrExecRecorder->dump (0x81f8f5ee) --> setInstrProcessor->InstrInfo::loadFromCache
    FAILED, the number of times that it is hit is not stable.
[4] Attempt 4: break on Util::error_exit and check the eip being added. Then find out the previous instruction, set a BP on handle instruction of the previous instruction
    (1) eip being added: 0x81f8f5c4 (verified hit the same spot); the previous instruction is 0x800ca220
    (2) bp on Cache.cc:93, then bp on Trace::handle_instr where condition is eip==0x800ca220
        check how many times it is hit before the fault. It's only hit one time (but wait two seconds).
        There are multiple rounds of instruction execution, now it's 0x800ca223. Strange, after BP, cannot repeat 1.
[5] Attempt 5: check all InstrInfo::loadFromCache and see if they are paired.
    Disabled 3 instructions at line 787
    The problem seems to be fixed. Now the problem is that it seems to be super slow.
[6] Attempt 6: try a different size. Worked like a charm. Increased from 16kb to 64kb buffer and no swaps.
[7] now verify the correctness of the analysis.
    --> ntdll.CsrClientCallServer (at 7c8715bb)
    --> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
    It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1 (only once in entire duration).
    Start: timeStamp: 473389, ins @7c8715bb: call [0x7C801034]
    End: timeStamp: 475191, ins @7c8715c1: cmp [ebp-0x9C], edi
    @25069c is written by the mov cr3 instruction at 474668
    Now generate the full trace. There is still a segmentation fault.
-------------------------------------------------------------------------------
Task 147: Fix the full trace generation problem.
-------------------------------------------------------------------------------
[1] break at 474667, add a conditional branch. Problem: deserialization.
    The count in mrRead_VA is 161; however, the total size of the record is only 242, which is clearly not right.
[2] Check the serialization. Set a breakpoint at InstrExecRecorder::serialize when its eip is 0x804dbf60
    size is 1277, timestamp: 475212
    last line of x/100wx buf
    x/100wx buf
    0x087c91be 0x91b9ff00 0xe500017c 0x017c91b9
    FOUND THE PROBLEM: Cache::appendRecord size is only char!!!!
8:45AM
[3] inspect full trace. There are still problems with full trace deserialization. Check it.
    Debug:
    (1) use generate_raw mode, and check the serialization. BP on InstrExecRecorder.cc:267
        timestamp: 475617, eip: 804dbf60, mrRead_VA size 1260, mrWrite_VA size 10, total size: 1265
        in Cache::serializeTo, position in block 345698 to 346963, dump of the first 20 words.
        0x5dc5066a: 0x804dbf60 0xce00100f 0x00000000 0x307ffe00
        0x5dc5067a: 0x91d3d400 0x9200017c 0xba7c91d2 0x90268800
        0x5dc5068a: 0x1800047c 0x207c9026 0xfe030000 0x9400087f
        0x5dc5069a: 0x027c91d5 0x91d59c00 0xa400027c 0x027c91d5
        0x5dc506aa: 0xfb000400 0xc000027f 0x0c7ffb00 0xfb00ce00
        seems no problem.
    (2) now use the full trace mode, check the deserialization. BP on InstrExecRecorder.cc
        It did not hit the breakpoint. Broke at 208065.
    (3) add condition 208065 and see what's going on.
        Error occurs between 208062 and 208063 (at 208063 the eip should be 804fbd60).
        The problem is with the record 804fbd60 (it's saving only 3 bytes)
        arrPositionArry is not set right, the entries before and after it are all right.
    (4) fix the depend links problems first. increase to 1000.
    (5) recover to raw case. Set a conditional bp in Cache::appendRecord(size) when size is smaller than 10.
        The BP is never hit. go back to full case again and look at the result. It broke again right before mov cr3 instruction.
11:30AM
    (6) The problem must be in serialization. use gen_raw mode.
        BP on InstrExecRecorder.cc:267
        Then inspect the following:
        EIP: 0x804dbf60 ts: 342754 mrRead_VA size: 31 blockID: 5 nxtRecordIdx: 15073 blocksize (its position): 297039 size:209 next record index: 15074
        serializeTo is called again
        eip: 804dbf63 ts: 342755 mrRead_VA size:0 blockID: 3 nxtRecordIdx: 15074 blocksize (its position): 297248 size: 21 next record index: 15075
        Strangely, when it is deserialized, the index was completely different.
        Need to set BP on Cache::saveCurrentBlockToDisk
        Guess: the problem might still be in Cache::serialization, when it's saving the arrPosition.
        Then bp on InstrExecRecorder::serializeTo and InstrExecRecorder::loadFromCache
        check the next record
2:30PM fix the problem: enlarge from byte to short int.
    Debug: serialization problem. Cannot get the serialization working. Always broke at ID 267!!!
3:00PM debug into save block 53, and see how the sizes are saved.
    Problem: maybe the cache is not completely saved!!!! call delete cache.
3:30PM shrink the test data set to 1 and see what's the problem. fixed
3:40PM still check the size problem. Inspect the data file. Verified it's the data file corrupted. fwrite is not reliable! Split short int into 2 bytes and then try it.
4:05PM still does not work. trace into saveCurrentBlock
    Guess: the problem might be the calculation of the startIdx??
4:20PM fix the calculation of startIdx and revert back to write of short int. DONE!
4:30PM test the full trace again.
4:33PM New bug found: system segmentation fault at pagemap destructor. add NULL protection
7:30PM Problem again: full trace does not contain write to 0x25069c
    now verify the correctness of the analysis. [raw mode]
    --> ntdll.CsrClientCallServer (at 7c8715bb)
    --> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
    It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1 (only once in entire duration).
    Start: timeStamp: 473267, ins @7c8715bb: call [0x7C801034]
    End: timeStamp: 475069, ins @7c8715c1: cmp [ebp-0x9C], edi
    @25069c is written by the mov cr3 instruction at 474546
    Now generate the full trace: generated. Note timestamp moved -1.
    Problem: timestamp: 475084 does not depend on 474546 (474545); instead, it depends on 205063
    Debug: set a BP on InstrExecRecorder.cc:297
    Found the error in mock_mem
8:40PM Now fix the extra memory link problem.
8:50PM check slicing
    Runs too slow. Could not stop.
8:00AM 09/27/2013
-------------------------------------------------------------------------------
Task 148: Fix the slicing.
-------------------------------------------------------------------------------
[1] move job1 to job2 and job2 to job1, and reset the experiment. check if it is still stuck. [15 min] Does not work
[2] Try enlarging the store size and see what's happening. [15 min] fixed a bug, increased to 1M entries.
[3] code inspection. Check how block_size is initialized [15 min]
    See the problem. The block_size is loaded from the raw trace. We have to regenerate the raw trace. completed
9:00AM
[4] inspect the slice generated for b1.exe. OK. [15 min]
[5] inspect the slice generated for b10.exe. [15 min]
    Found problem at 0x407e45. Compare the trace. The problem is that the ESI at 0x407e42 is not the right value
9:30AM
[6] Add a memory check support to Cache first. If the target address is over the limit, then stop the application. [30 min] Fixed.
9:50AM
[5] inspect the slice generated for b10.exe [30 min]
    Pair by pair compare
    TO DO: fix the trace problem at 0x407e45. The problem is that the ESI at 0x407e42 is not the right value
    ESI is from 0x402702
    The value of ESI is from 0x4026cb (the push instruction is pushed three times 0, 0x00410008 (iob), 0 in regular execution). It is also pushing the same value into the stack in bad trace.
    Now need to figure out why it's popping bad values out. Now check the ESP value if they match each other.
    yes; the value 0x41d008 is stored at 0x12feec.
    Check the pop instruction at 0x402702. The problem is the error trace is 12 bytes away when doing the pop.
    Next, start from the first pop, do the pair by pair comparison.
    Difference occurs at 0x40c4bc, the stack structure is different now. --> found that it departs at 0x0040C047.
    *** inspect the algorithm ***
    Found the problem: 0x0040c4c1 is not included in slice (ADD ESP, 0XC). Verified it's hit only once.
    In the slicing algorithm 0x40c4e7 call instruction depends on 0x0040c4c1 and it is skipped in slice. Dump below:
    timeStamp: 477231, ins @40c4c1: add esp, 0x0C , ESP: 0x12fed8 -> 0x12fee4 , DEPLINKS: , R: 477230 and ESP value: 0x12fee4, C: 477230 ESP: 0x12fee4 EBP: 0x12ff14
    timeStamp: 477235, ins @40c4e7: call 0x00000011 write: (start: 0x12fee0, end: 0x12fee3) , ESP: 0x12fee4 -> 0x12fee0 , DEPLINKS: , R: 477231 and ESP value: 0x12fee0
2:30PM
    Idea: check timestamp 477813 and 477817 in slice algorithm; strangely 477817 is not hit. Check the log.
    Identified the problem: when the call instruction is replaced by the skip/NOP, there is an ESP dependency; when we set the previous instruction as needsToVisit, we skipped the ESP dependency.
    Problem: the ESP dependency at 477813 is not handled properly.
    [1] add the dump information for setInSlice. DONE
    [2] Debug 477813 and see why it's not listed as in slice. During debug, it is said to be not EspDelay(). Strange. Fixed.
    [3] verify if the fix is successful. ALL GOOD.
    [4] create another simple example b4.exe. Verified, works ok.
7:30AM 09/28/2013
-------------------------------------------------------------------------------
Task 149: Mining Conditional Branches
-------------------------------------------------------------------------------
[1] in config, add a new job called mine_conditions [5 min] DONE
[2] update the Job class and add a new job category JOB_MINE_CONDITIONSA [5 min] DONE
[3] update BatchAnalyzer and update the following [56 min]
    execJob [5 min] DONE.
    execMINE_CONDITION_JOB [8 min] DONE
    gen_MINE_COND_JOB [8 min] DONE.
    create class taskCondJob [15 min] DONE.
    declare Trace::mine_cond_slices [10 min] DONE.
    Debug: [20 min] DONE.
9:30AM
[4] Implement Trace::mine_cond_slices(Job *job), simulate the framework of one slice. first call collect_conditions and then extract_slice_for_condition [30 min] DONE.
[5] declare gen_slice_for_branch() [20 min] DONE.
[6] think about collect_conditions algorithm [20 min]
10:45AM
[5] implement vector<long long int> collect_conditions(vector<sectionInfo>). Collect the set of conditional branches. Avoid loop points [60 min]
[6] debug first part of gen_slice_for_branch [20 min]
[7] debug the function collect_branches [30 min] DONE.
12:30PM
[8] implement function extract_slice_for_condition(long long int ts, string src, string file_path)
    Idea: loop back from the ts, and mark all data dependencies. If one point is visited multiple times with the same ESP value, then mark that as a loop area. From the start to the end of the loop area, perform the control dependency analysis until it is self-contained.
7:30 AM 10/01/2013
-------------------------------------------------------------------------------
Task 150: refactor the onslice algorithm
-------------------------------------------------------------------------------
[1] create a new function full_slice(ts1, ts2) [30 min] DONE
[2] test the slice algorithm. [10 min] DONE.
9:00AM
-------------------------------------------------------------------------------
Task 151: Data Slice and Identify Single Occurrence Component
-------------------------------------------------------------------------------
[1] add a function init_data_slice [60 min] DONE
[2] test init_data_slice [30 min]
    set a bp at init_data_slice and change timestamp to 477436. (jnz ...) DONE.
[3] completely check the data dependency one by one. Too complex to trace. [30 MIN]. DONE.
[4] handle instructions like xor eax, eax [2 hr]
    [a] collect stats for 477437. Total size: 23592.
    [b] in InstrInfo declare flag FLAG_NO_DATA_DEPENDENCY [5 min] DONE.
    [c] in InstrInfo declare function examine_no_data_dependency(): first check those where (inReg set - outReg set) is empty and list them, and then decide the algorithm [20 min]
        [c.1] implement Util::getSetdiff
    [d] inspect generated instructions that are identified as no reg data dependency [1 hr]
    [e] now compare the data slice: 23210 (reduced about 200 instructions).
8:30am 10/02/2013
[5] apply the algorithm to one_slice: slice 3:
    Trace Size: 524692, in slice: 451858, Percentage: 86.12%
    Instruction Store Size: 46807, in slice: 40611, Percentage: 86.762664%
    Instruction Store Size (excluding imported DLL): 3096, in slice: 1792, Percentage: 57.881137%
    [a] modify the algorithm. [30 min]
    [b] verify if the new slice is working. OK. However, it does not improve that much.
        Trace Size: 524692, in slice: 451593, Percentage: 86.07%
        Instruction Store Size: 46807, in slice: 40531, Percentage: 86.591749%
        Instruction Store Size (excluding imported DLL): 3096, in slice: 1787, Percentage: 57.719638%
    [c] inspect the log [90 min]
        Found problem: NO REG DEP here for ts 477157 @0x7c801892
        ins @7c801892: inc eax
        insert conditional BP to check it. Fixed the bug. The call of examine_no_reg_dep() is made after the type is set!
        Found 3 more problems: 472986, 471574, 475575. fixed another bug.
    [d] verify how the new data slice algorithm helps reduce the size. Check the instructions in dump one by one, and check the log, and open the full trace. [30 min]
        Most of the records do not actually reduce the size.
    [e] check timestamp 466639, how is its inslice set? conditional BP on InstrExecRecorder::updateCache. Verified, it's ok. added by function processing.
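Note (sketch only, not the InstrInfo code): the register filter from [4][c] could look roughly like the following. set_diff is a hypothetical stand-in for Util::getSetdiff, and the register sets are modeled as sets of names, which is an assumption. The filter "inReg set - outReg set is empty" only produces candidates: inc eax passes it too, which matches the false NO REG DEP report seen above, so a second check is needed. The zeroing-idiom test used here (xor r, r / sub r, r) is an assumed choice for that second check, not necessarily the one in the code base.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Hypothetical counterpart of Util::getSetdiff: elements of a that are not in b.
RegSet set_diff(const RegSet &a, const RegSet &b) {
    RegSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// First-pass filter: every register the instruction reads is also written by it.
// This only marks a candidate; "inc eax" satisfies it but still depends on eax.
bool reads_only_what_it_writes(const RegSet &inReg, const RegSet &outReg) {
    return !inReg.empty() && set_diff(inReg, outReg).empty();
}

// Assumed second check: the result is a constant regardless of the input value.
bool is_zeroing_idiom(const std::string &mnemonic,
                      const std::string &dst, const std::string &src) {
    assert(!mnemonic.empty());
    return (mnemonic == "xor" || mnemonic == "sub") && dst == src;
}
```

With this split, xor eax, eax passes both checks and can carry FLAG_NO_DATA_DEPENDENCY, while inc eax passes only the first and keeps its register link.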
    Final stats:
    Trace Size: 524692, in slice: 453436, Percentage: 86.42%
    Instruction Store Size: 46807, in slice: 42494, Percentage: 90.785566%
    Instruction Store Size (excluding imported DLL): 3096, in slice: 1802, Percentage: 58.204134%
    +++ Task completed: Task generate one slice for: /home/samba/smbuser/slice_jobs/job3
11:00AM
-------------------------------------------------------------------------------
Task 152: function processing of data slicing
-------------------------------------------------------------------------------
[1] algorithm design [45 min]
[2] step 1. define function void identifySingleOccuranceComponent(long long int ts, long long *tsStart, long long *tsEnd) [10 min] DONE
[3] step 2. Modify the gen_slice_for_branch algorithm [30 min]
    (1) add two vectors: vecSOCStart, vecSOCEnd and update these two vectors during the loop DONE.
[4] Debug:
    Problem 1: init data slice is very large. Not sure if it is right. Trace into init_data_slice and check the timestamps being visited.
    Found problem: long long int overflow (as long)
    It seems that ts 477434 is not cleared. Got to call clear_slice-tags(). Fixed the problem.
    TO DO: fix the identifySOC function. Start call should be fixed with corresponding ret call.
9:00AM 10/03/2013
-------------------------------------------------------------------------------
Task 153: Misc. tasks of data slicing
-------------------------------------------------------------------------------
[1] improve the branch collection. Add a hash_map to avoid visiting the same branch again. [15 min] no room for improvement. DONE.
[2] move the set visit function to generate full trace. [20 min]. DONE.
[3] Algorithm Design [30 min]. DONE.
10:00AM
[4] fix the IdentifySOC algorithm. Remove the logic of add/minus 1. [15 min] DONE.
[5] debug the first 5 occurrences of IdentifySOC. [15 min] OK.
[6] examine collect_branch again and see if there can be further improvement. [10 min] no room. DONE.
[7] modify full_slice algorithm prototype, add a boolean variable bSOC. [5 min] DONE.
[8] algorithm design of full_slice [15 min] DONE.
[9] quick look at thin slicing [5 min] DONE.
11:00AM
[8] full_slice SOC component design: [30 min]
    [1] call full_slice on start, and end-1 because end will be reached anyway. [5 min] DONE.
    [1.5] start and end should be added in slice initially. [5 min] DONE.
    [2] data link and reg link will be added as usual. [5 min] DONE.
    [3] esp link and ebp link will be added as usual, but out-of-range links will not be added, because the esp/ebp value is guaranteed at entry [5 min] DONE.
    [4] control link will be added as usual, but out-of-range links will not be added because the start point will be reached through a jump [10 min]
[9] Debug. [70 min]
    [1] test the EBP and ESP case [10 min] DONE.
    [2] test the control link case [10 min] DONE.
    [3] set ts = 477437 and debug the first 2 cases [20 min] DONE. There seem to be some problems
    [4] inspect the log of 5 cases [30 min]
        (1) found bug. dependency not as expected. VERIFIED ACTUALLY OK.
8:00PM
[10] binWriter.assembleJMP [30 min]
    [1] find out the two kinds of jmp length. [45 min]
        [a] jmp short: EB + OFFSET (positive up to 7e, negative up to 80)
        [b] long jump: e9 + 4 bytes offset (however, there are limits 0x09000000).
    [2] create function int asJMP(unsigned int eip, unsigned int target, char *buf)
7:30AM 10/04/2013
-------------------------------------------------------------------------------
Task 154: binWriter
-------------------------------------------------------------------------------
[1] add function asJSP(curValue, expVal, buf) [15 min] DONE.
[2] test asJSP [15 min] DONE.
8:30AM
[3] add function asJBP(curValue, expVal, buf) [10 min] DONE.
[4] test asJSP [10 min] DONE.
[5] Algorithm Design. declare all function prototypes [20 min]
    declare function binWriter::writeDataSlice(Trace)
    in Trace, need to make getEsp_after and getEsp_before public.
[6] implement writeDataSlice [1hr.] DONE.
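Note (sketch only, not the binWriter code): the two jmp forms listed under binWriter.assembleJMP can be assembled roughly as below. encode_jmp is a hypothetical stand-in for asJMP. It follows the standard x86 rule that the displacement is counted from the end of the jmp instruction (2 bytes for the EB form, 5 bytes for the E9 form) and uses the full signed 8-bit range for the short form, so its limits may differ from the narrower ones noted in the log.

```cpp
#include <cassert>
#include <cstdint>

// Encodes "jmp target" for an instruction located at address eip.
// Writes the bytes into buf and returns the length written (2 or 5).
int encode_jmp(std::uint32_t eip, std::uint32_t target, unsigned char *buf) {
    assert(buf != nullptr);

    // Short form: EB rel8, displacement relative to eip + 2.
    const std::int64_t rel8 =
        static_cast<std::int64_t>(target) - (static_cast<std::int64_t>(eip) + 2);
    if (rel8 >= -128 && rel8 <= 127) {
        buf[0] = 0xEB;
        buf[1] = static_cast<unsigned char>(rel8 & 0xFF);
        return 2;
    }

    // Near form: E9 rel32, displacement relative to eip + 5, little-endian.
    // Unsigned arithmetic wraps modulo 2^32, which is what the CPU expects.
    const std::uint32_t rel32 = target - (eip + 5u);
    buf[0] = 0xE9;
    for (int i = 0; i < 4; ++i) {
        buf[1 + i] = static_cast<unsigned char>((rel32 >> (8 * i)) & 0xFFu);
    }
    return 5;
}
```

For example, encode_jmp(0x401000, 0x402000, buf) produces E9 FB 0F 00 00, since 0x402000 - 0x401005 = 0xFFB.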
10:15AM
[7] implement initEntryPoint.
    [7.1] implement asINIT_ESP and test it [30 min] DONE
    [7.2] check trace and examine the entry point and then decide if need to catch the entry point [15 min]
        The entry point occurs at 306916. DONE.
    [7.3] implement binWriter::genEntryPoint [30 min] DONE
11:30AM
[9] implement Trace::findTSWithEIP() [15 min] DONE
[10] test findTSWithEIP [10 min] DONE.
[11] update the logic with tsEntry [15 min] DONE.
[12] debug into it [30 min]
    [x1] found one bug on get_ESP_VALUE_BEFORE and get_esp_value_after. DONE.
8:33PM
[13] now implement handle_SOC. [1 hr]
    [1] declare prototype [15 min]
    [2] implement it. [45 min]
    [3] implement writePartialSlice [15 min]
    [4] update writeInstruction to verify buffer. [15 min]
[14] debug through [20 min]
    bug1. problem with writeInstruction.
7:30AM 10/05/2013
-------------------------------------------------------------------------------
Task 155: Test the Data Slice Algorithm
-------------------------------------------------------------------------------
[1] debug through:
    (1) break on Trace::gen_slice_for_branch, set ts to 477437, and then break on binWriter::writeDataSlice. [30 min]
        [1] fix asJMP bug. DONE.
        [2] partialTrace bug. DONE
        [3] fix the eipBridge problem. DONE.
        [4] fix the bridge not in section problem.
    Algorithm Design: 30 min
[2] fix the bridge not in section problem.
    Idea: require that the last instruction in SOC be in section. Make the following changes:
    (1) update esp_after_soc to be the value before the last instruction
    (2) the last instruction is replaced with bridge component
    (3) when generating the partial slice, do not include the last instruction (but the slicing algorithm will guarantee that the last instruction is hit).
9:00AM implementation. [1 hr]
    finish (1), (2), (3) DONE
    add log message for writeSOC.
    debugging:
    [1] fix eipBridge bug. DONE.
    [2] fix tsEnd bug. DONE.
    [3] fix the addr 7cxx bug.
    [4] fix the merge SOC problem.
10:30AM fix the addr 7cxx bug when it's a single instruction. [20 min]
    Idea: whenever such a raw slice instruction is found, expand SOC as well.
    fix identifySOC. Still does not work. check ts: 308356
    Problem: 308356 is added later in the later segment. So the process needs to repeat itself until no further instructions are added.
11:30AM Implementation to fix 7cxx bug:
    [1] add Trace::getSliceSize() - calculate slice size [15 min]
    [2] add an additional loop - 15 min
    [3] add assist function: isTSInAnySOC(vecSOCStart, vecSOCEnd) DONE
    [4] add assist function: insertSOC(vecSOCStart, vecSOCEnd) DONE.
    [5] modify the algorithm DONE
    ---------------------
    [6] debug isTSInAnySOC [15 min] FIXED one bug.
    [7] debug insertSOC [35 min] redid the logic. DONE
        solve trouble "case skip should not ..."
        Idea: can allow it to be NOP, because it is guaranteed to be reached by the algorithm.
        check the problem of suspicious index. fixed the bug. DONE.
    [8] debug the algorithm [30 min]
        fix the 7cxx problem. LINE 339 ---> problem. Fix: fix the verifyNop function. DONE.
    [9] fix the ESP/EBP problem.
        add tsLastEnd; if tsLastEnd is equal to tsStart, then should skip the bridge.
    [10] fix overwrite 2. improve the ESP and EBP.
        Idea: the original sequence of timestamps (instructions), if there are no conditional jumps, will for sure lead to the next SOC. If they make any modifications to data, they are not referenced by any later SOC anyway, thus no dependency; if they depend on any previous SOC, they make no change to the control flow because there are no conditional jumps. So, when the gap is TOO SMALL, we just need to keep the original instructions.
        Implementation:
        [1] implement verifyNoCondJumps(Trace *trace, Timestamp tsStart, tsEnd) [15 min] DONE.
        [2] debug and test verifyNoCondJumps [15 min] DONE.
        [3] modify the algorithm: when the call to verifyAllNops fails, call verifyNoCondJumps [15 min] DONE.
    [11] fix the logic of writeSOC bridging component.
        [1] modify writePartialTrace add a bool flag [10 min] DONE
        [2] modify the writeSOC algorithm [30 min]
            Alg: generate the buffer of bridge component first, then check the gap.
            if gap>0
                if gap<bridgeSize: write those instructions directly
                else write the components
            else //gap<=0
                write the components //but face the failure of visiting back the next immediate instr.
            DONE.
    [12] THERE ARE BIGGER PROBLEMS WITH THE ALGORITHM
    ----------------
    Problem: tsLastEnd: 355578 (@403b94), tsStart 355591
    When building the bridge component it overwrites @403b9b (ts: 355399)
    ---> have to redo the data slice algorithm
    --------------
7:00AM 10/08/2013
-------------------------------------------------------------------------------
Task 155: Redo the Data Slice Algorithm
-------------------------------------------------------------------------------
[0] Alg Design [20 min] DONE
[1] declare an SOC class [15 min] DONE
[2] declare the SOCManager class [20 min] DONE
    1. addSOC
    2. getSize
    3. getSOC
[3] modify the main algorithm in Trace to call SOCManager methods
    [1] move identifySOC to SOCManager. [15 min] [DONE]
    [2] declare SOCManager in gen_data_slice and then move findTS in SOCManager [15 min] DONE.
8:30AM
    [3] move insert_into_vec into SOCManager [15 min] DONE
    [4] move mergeSOC into SOCManager [15 min]
    [5] refine the main algorithm in Trace gen_data_slice and add functions as needed [30 min]
10:00AM
[4] work on main algorithm
    (1) add a Trace::setIER_II(ts) inline function [8 min]
    (2) modify the interface of add(). [8 min]
    (3) finish the addTS [25 min]
    (4) algorithm design: addSOC [10 min]
        first search for SOC to insert
        if it can be merged with previous one, merge it
        otherwise call insertSOC to literally add one SOC
    (5) define a function full_slice_all_soc [10 min]
11:15AM
    (6) implement addSOC [20 min]
    (7) algorithm design of insertSOCAt [20 min]
        //1. call SOC->setBridgePoint
        //2. if fail, merge it with nextSOC
        //3. else: really insert the SOC
    (8) implement insertSOC [15 min]
    (9) modify addSOC Logic [20 min] DONE.
9:00PM
    (10) double check insertSOC design [20 min]. DONE
    (11) implement SOC::setBridgeTo(). [25 min]. DONE
8:00AM 10/09/2013
-------------------------------------------------------------------------------
Task 156: Test and Debug the Data Slice Algorithm
-------------------------------------------------------------------------------
[1] unit test insert_into_vector [15 min]
[2] unit test remove_vec [15 min]
[3] initial debug gen_data_slice [10 min] OK. will visit later
[4] implement check_all_soc [25 min]
    (1) verify in descending order [10 min]
    (2) verify SOC::bridge [15 min]
9:10AM
[5] fix the "IMPOSSIBLE" error. line 46. [15 min] DONE.
[6] debug findInsertLocation [10 min] DONE.
[7] check the case 477420. fix the bug on get_room [10 min]
[8] fix one minor bug in setBridge [5 min]
[8] debug through SOC::get_room [15 min] DONE.
[9] modify get_room (set a minimal size needed) [10 min] DONE.
[10] debug through SOC::gen_bridge [15 min]. DONE.
10:30AM
[11] debug through SOC::setBridgeTo [25 min] DONE.
[12] fix SOC::gen_bridge bug [15 min]
11:30AM
[13] improve init_data_slice efficiency. [20 min] DONE.
    [1] change the return value of Trace::set_slice. if return 0, then no updates, return 1 updated the timestamp only, 2 both.
    [2] change the total of init_data_slice
[14] fixed one nasty bug of reloading ier in init_data_slice [25 min] DONE.
[15] debug SOCManager::identifySOC [10 min] DONE.
[16] debug SOCManager::insertSOCAt [15 min] DONE.
    [1] add logger message for full_slice
7:30PM
[17] debug and verify findTS [10 min] DONE
[18] initial debug of addSOC [60 min]
    fixed bMerged bug. fixed another bug about merge. added delete code.
[19] debug through addTS [10 min] done.
[20] debug through verify_bridge_fine.
[20] debug verify_all_soc.
    fix bug1: ignore bridge fine for 0. add one more parameter to verifyBridge.
    have to unset inslice for those on the path.
    When verify bridge fails, should merge.
[21] separate out the check of descending order.
7:45AM 10/10/2013
[22] Debug the strange descending order problem
    Insert the check descending order at the beginning of addTS.
    Problem is with identifySOC, remove the isInSlice(). DONE.
[23] New problem. too many bridge fails. Check the reason.
    (1) check why 477380 and 477381 were added to slice. It's caused by a RET which relies on CALL that is in the bridge.
10:30AM
[24] Improve the identifySOC; will lose some precision. [30 min] DONE.
    check the call table and start the search from the last matching call.
    1st pass: 71 SOCs -> 41 SOCs merge (speed improved and more SOCs than [23])
    2nd pass: 45 -> 22 SOCs
    3rd pass: 23 -> 23
[25] Implement binWriter::writeDataSlice [45 min]
[26] debug through writeDataSlice
    (1) fix one bug related to loop.
    (2) add log to writeSOC
    SEEMS working, needs to regenerate log and check.
-------------------------------------------------------------------------------
Task 156: check the problem of raw trace
-------------------------------------------------------------------------------
[1] copy back job2 AND 3. they work fine [15 min]
[2] try job1 again. Still stuck
    gdb and trace helper_trace2 and handle_instr and check the problem [30 min]
    Somehow after recompiling it works.
!!!!!!!!!!!!!!!!!!!!!!!!!!!
[3] new branch timestamp to test is 486953 corresponding to eip 40103c!!!.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Now verify if it works
[1] hit the main function
[2] skip the first conditional branch. DONE.
Problems:
[1] it includes printf, which it shouldn't
[2] execution of printf breaks.
Continue to fix the context switch problem and then come back to revisit the problem.
Slice Stats: Trace Size: 539345, in slice: 183354, Percentage: 34.00% Instruction Store Size: 48113, in slice: 23528, Percentage: 48.901544% Instruction Store Size (excluding imported DLL): 3247, in slice: 2709, Percentage: 83.430859% full slice size: 539344 ------------------------------------------------------------------------------- Task 157: handle the context switch problem ------------------------------------------------------------------------------- In raw processing, checkRecordStatus if context switch happens at if-branch the analysis is inaccurate. What if multi-threaded programs? Thread Id can be determined using FS:[0x18]. can be handled later (add to InstrExecRecorder the threadID). For now to detect context switch, for each InstrInfo, include another address called targetAddr for jmp or conditional jump instructions. The last 4 bytes or the last 2 bytes should be the relative or absolute address. Several complications: (1) sysenter does not return exactly at the same address! (gap of around 6 bytes!) (2) transfer control (jmp call) are easy to handle. RET will need InstrExecRecorder! 7:30AM 10/11/2013 [0] Algorithm Design [20 min] DONE. Implementation Plan: [1] declare InstrInfo::getTargetAddr() as inline function. [20 min] DONE. [2] debug [1]. [20 min] DONE. 8:30AM [3] declare InstrExecRecorder::getRETTarget() as inline function. [30 min] [1] algorithm design [45 min] [2] in ops_sse.h:helper_trace2(), if opcode is c2 or c3, take the value at address ESP_BEFORE [10 min] DONE [3] change the definition of handle_instr, add one more parameter [20 min] DONE 9:30AM [4] debug [2]. [20 min] DONE. [9] update Trace::checkRecordStatus [40 min] DONE. 10:30AM [10] code inspection [20 min] [11] debug through the function [30 min] [1] problem. ret value is not as expected. Need to record lastRET_ADDR [DONE] [2] fix the page_map problem. 11:40AM [12] continue debug. [10] run and test [15 min] 6:00PM [13] check the mysterious pagemap problem. 
[30 min] Now blue screen, check if saving esp causes the problem. FIXED. Verified: ESP_BEFORE is only valid when ESP is being changed, so need to add a condition when retrieving the value! [14] check the switch warning problem. Found the problem: the target address problem can be much more complex. It can use different addressing modes. It could be register indexed, e.g., CALL [EDX]. However, we cannot actually save every register. This approach could be too costly. Instead use another approach. In Trace class declare a pair of interrupt handler vectors (start and end). Whenever the start is encountered, record it, and when the exit is met, back off from it. 7:45PM [15] new interrupt trace design. [1] in trace class declare INT_HANDLER_SIZE, int [] ih_start, int [] ih_end [15 min] DONE [2] define inline function isInterruptHandler() [10 min] DONE [3] refine the algorithm, if it's interrupt handler, enter the status, record the expected Instruction [30 min] [4] debug the algorithm [20 min] [1] fix the memory tracing problem. After recompiling it does not show any more. Seems working and improved speed. [5] debug and run and test [20 min] Regenerate full log (around 10% less in size) 442437 corresponding to eip 40103c! New size: Trace Size: 493066, in slice: 154682, Percentage: 31.37% Instruction Store Size: 55035, in slice: 18431, Percentage: 33.489598% Instruction Store Size (excluding imported DLL): 3247, in slice: 2662, Percentage: 81.983369% Improved about 2%. Still the same problem. printf crashed and it should not be included at all. 7:45AM 10/12/2013 ------------------------------------------------------------------------------- Task 158: Devise a method for collecting reverse trace. ------------------------------------------------------------------------------- Basic idea: add a reverse_pointer attribute to each node and print it when necessary. [0] algorithm design [30 min] DONE.
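The reverse_pointer idea of Task 158 can be sketched as below. This is only a minimal illustration, assuming each record stores the timestamp of the node that pulled it into the slice and that a negative value ends the chain (the dumps later in this log show negative markers such as -3 for SOC End and -4 for SEED). The names Node, kNoPointer and reverseTrace are hypothetical and are not taken from Trace.cc. The pointer is kept signed on purpose, because negative sentinels do not compare correctly against timestamps in an unsigned field.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <type_traits>
#include <vector>

// Hypothetical record: one entry per executed instruction (timestamp).
struct Node {
    long long reverse_pointer;  // ts of the node that pulled this one in; negative ends the chain
};

// Negative sentinels only work if the field is signed.
static_assert(std::is_signed<decltype(Node::reverse_pointer)>::value,
              "reverse_pointer must be a signed type");

const long long kNoPointer = -1;  // assumed sentinel, for illustration only

// Follow reverse pointers starting at ts until a negative sentinel or an
// unknown timestamp is reached. maxSteps guards against cycles.
std::vector<long long> reverseTrace(const std::map<long long, Node>& trace,
                                    long long ts, std::size_t maxSteps = 1000) {
    std::vector<long long> chain;
    while (ts >= 0 && chain.size() < maxSteps) {
        std::map<long long, Node>::const_iterator it = trace.find(ts);
        if (it == trace.end()) break;
        chain.push_back(ts);
        ts = it->second.reverse_pointer;
    }
    return chain;
}
```

The step limit is a design choice of this sketch: a chain that loops back on itself stops after maxSteps entries instead of running forever.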
8:15AM [1] add an attribute "long long int reverse_pointer", and one flag {bReversePointer} in InstrExecRecorder, add functions for clearing flags [15 min] DONE. [2] modify serialization and deserialization [15 min] DONE. [3] clear flag when the analysis finishes [10 min] DONE. [3] test serialization [30 min] DONE. 9:15AM [4] define function reverse_trace(long long int ts) in Trace [20 min] [5] modify Trace::setSlice() add attribute reverse_pointer (source), modify InstrExecRecorder::setSlice() [1] add additional attribute to InstrExecRecorder and Trace [20 min] DONE [2] add the attribute and serialization support to InstrInfo [60 min] DONE [3] setEspDelay and setControl etc. all should have source [15 min] DONE. [3] modify the calls of ::setInSlice [45 min] DONE. [4] unit test InstrInfo [30 min] [1] fix serialization [2] fix one bug in InstrExecRecorder::handle_instr [5] debug through reverse_trace [15 min] [5.1] fix the infinite loop problem at 438251. Fix: add a comparison to the setReversePointer function. Working now! 4:00PM [6] test the system by calling it once at the end of trace for the printf function [20 min] Regenerate full log (around 10% less in size) 442437 corresponding to eip 40103c! reverse trace on the following: timeStamp: 428481, ins @401022: call 0x00000076 Finds that 428481 (call printf) depends on 442286 [7] Check 442286, now the report does not report any dependence. In full-dump there is no instruction depending on 442286. Set a BP and check why 442286 is included. It is propagated from another occurrence of the same instruction. 08:00AM 10/13/2013 ------------------------------------------------------------------------------- Task 159: Analyze why printf is included ------------------------------------------------------------------------------- [1] ts=442437, (eip 40103c!)
[2] bp on trace.h 921, 994, 1007, and 1352, ------------ [1] fix one bug in [2] observation: 442286 InstrInfo (@eip: 0x402943) has reverse pointer to 435494, this is caused by ts 435481, it should be propagated to 442286. Found the problem at Trace.cc:432, the instrProcessor did not load properly. But the value has been updated before. Set BP on clear_reversePointer and set_reversepointer [3] problem: clear_slice does not assign -5, append "L" after the constants definition. Did not fix the problem. When updating the value, 443394>-5 is not evaluated as true. Found the problem: reverse_pointer is declared as unsigned long long int, so -5 is converted to a huge unsigned value in the comparison. Shoot! Fixed. The reverse_trace has 33 steps now. -------------- [3] now analyze the reverse trace. Problem is caused by a multiple occurrence instruction in a function. This seems to be propagated too much. [4] add logger flush() function [5 min] DONE. [5] continue the analysis ts=442437, (eip 40103c!) then bp on Trace.cc:851 *** Observation 1: 435494 depends on 435481 (@402943 pop ebx) but 442286 is included in slice because @eip:402943 is included and it's the same instruction at line Trace.cc: function 0x402935 is CALLED MANY TIMES The separator of printf() and getchar() in b20.exe is 435198. ts: 428481 calls printf (@eip: 401022) --435192 here (pop ebx) @402943: 435198, add esp, 4 435199, calls getchar (@40102a) --435378 is here (push ebx) @40290a: -- depends on 435192 xxx --435481 is here (pop ebx) @402943 (in seh_epilog4 function) - pops 0 --435494 is here (push ebx) @409590:-- depends on 435481. push input parameter '0' file handler. xxx--442286 is here (pop ebx) @402943: -- depends on 435378, included in slice because of 435378 xxx (M) ---435502 is here (mov esi, [ebp+8]):@408f34 reads from the ebx pushed by 435494.
reads input parameter '0' 442433 cmp ..., 0x64 442439 calls printf("nok") 447072 add esp (after printf) Problem: that 442286 is included is actually normal; the problem is that ebx is actually not used in getchar, but it is regarded as information passed!!! Observation: [1] there is no cr3 change between 435192 and 435378, so the ebx dependency should have no problem. [2] check if 435502 is really reading from 435494. Verified: ok. Both access 12fed4. Value is 0. Seems to be the file handler for the read function. [3] check why 442286 is visited: because 442289 @0x402947 is in slice. Recorded using reverse_trace function. ------------------------------------------------------------------------------- Task 160: Handle multiple occurrence timestamp ------------------------------------------------------------------------------- [1] improve the algorithm. When a timestamp is not in slice but the corresponding InstrInfo is: check if the instruction has no side effect (would not trigger exception), if yes, do not put it in slice (so it will read its data dependency from register or memory, but it will never trigger exception). After the adjustment: Trace Size: 493066, in slice: 154659, Percentage: 31.37% Instruction Store Size: 55035, in slice: 18427, Percentage: 33.482329% Instruction Store Size (excluding imported DLL): 3247, in slice: 2662, Percentage: 81.983369% No big improvement. ------------------------------------------------------------------------------- Task 161: improve the algorithm by ignoring esp/ebp links without any other usage ------------------------------------------------------------------------------- ts=429635 [1] add the attributes. OK. [2] fix unit test. OK. [3] regenerate the trace. There are bugs; fix them. [4] introduce bNoDataSlice instead. OK. [5] new bug 0x4012AD.
put in conditional jump 7:30AM 10/16/2013 ------------------------------------------------------------------------------- Task 162: Fix slicing algorithm problems ------------------------------------------------------------------------------- ts=429635 [1] 0x40129E is not included. Check the problem. In full trace: ts is 284230 for 0x40129e. In branch slice trace: 284230 is not included because 284231 is marked as bNoData. Add dumping information and check details. Problem: neededForReg and neededForMem are all 0. [1.5] need to update the serialization and unit test. DONE. 9:00AM [2] problem with 0x004012e5 --> 0x00403e4b (call esi). In full trace: eip 0x00403e4b is at ts: 292643. The problem is 292640 is not processed at all. The problem is 292643 call esi is an instruction that needs data dependency. Fix: [2.5] declare function isJumpNeedsData, the logic is to check if this is a jump and it has dependency on any of the registers, then it should be regarded as a jump instruction that needs data; or its input operand is not constant. 9:45AM [1] add the function isJumpNeedsData and add a flag, and add a void set function [10 min] DONE. [2] in function setInputOutput reg update the flag. [20 min] DONE. [3] debug (set bp on the set function) [45 min] [3.1] fix one bug about type. DONE [3.2] for EIP in read/write should ignore it. DONE [3.3] remove the [rw] of jmp/call, because it's always updating EIP/ESP. OK now. [3.4] check why flag is not set. Set cond bp on 0x403e4b. It is set. [3.5] improve the setJmpNeedData function [10 min] OK. [3.6] set conditional BP in trace. [10 min] DONE. 8:30PM [4] Problem with eip 0x004028fc: it's not included in slice, but ts: 309044 (sub esp, eax) depends on it. The problem is that 309044 (sub esp, eax) is depended on for ESP. So for the ESP dependent case, we need to distinguish between depending on esp and depending on mem. For example, when there is no reg and mem dependency.
PUSH ECX does not have to propagate the dependency on ECX (does not need ECX for ESP). SUB ESP, EAX needs to propagate the dependency onto EAX (does need EAX for ESP). So the logic is: when there is no REG and MEM dependency (except ESP) and the instruction is being depended on for ESP, then push/pop does not need data dependency and all other instructions will need to propagate data dependency. [4.1] declare flag pushpop_reg_const_operand_only set the flag [5 min] DONE [4.2] in setInputOutput Reg, set the pushpop_reg_const_operand_only [15 min] [4.3] implement the isNeedDataforEspEbp [5 min] DONE [4.4] debug [15 min] [4] run and test and check ts 292640-292643 [10 min] 9:00AM 10/17/2013 ------------------------------------------------------------------------------- Task 162: Fix slicing algorithm problems ------------------------------------------------------------------------------- ts=429635 [1] Problem 1: eip@0x403fd7 (mov ebp, esp) [ts:@279775] is not included, but the leave instruction at 0x404007 [ts:@283850] depends on it. In the full trace, 283850 mistakenly depends on [timeStamp: 283828, ins @804df995: pop ebp] The problem is caused by context switch. @@800ca21d: pop ebx 9:15AM 10/17/2013 ------------------------------------------------------------------------------- Task 163: Investigate context switch problem's complete solution ------------------------------------------------------------------------------- [1] intercept on raise exception and add an event called Interrupt. Print out the relevant values [2] code inspection: note target-i386/int_helper.c and excp_helper.c. raise_interrupt is more generic than raise_exception; raise_exception simply calls raise_interrupt and sets the exception id as the interrupt number, sets "is_int" to 0, and sets the next_eip_offset to 0 (to resume at the original instruction). Guess: timer interrupt should not be using raise_exception but raise_interrupt.
raise_exception mainly has GPF (general protection fault), raise_interrupt should have I/O requests as well 10:45AM [3] Implementation Plan: [1] declare struct int_record and a new event in event.h [15 min] DONE. [2] define TraceManager::handle_interrupt [10 min] [DONE] [3] define Trace::handle_interrupt [10 min] DONE. [4] in excp_helper.c call send_event [20 min] DONE [5] in ops_sse.h declare function isCR3ToTrace [15 min] DONE. [5] debug [20 min] DONE. 11:45AM [6] IMPROVE the tracing in Trace::handle_interrupt [5 min] [7] inspect the raw trace generated. [30 min] Observation: almost every Interrupt is accompanied by context switch. 4:00PM Development Plan: Trace::checkRecordStatus. Logic: (0) declare a flag (JUST RECEIVED INTERRUPT). [5 min] DONE. (1) update the stack. If the interrupt event just arrived, switch the interrupt flag and push one token onto the stack. If the opcode is iret then pop the stack (note iret's opcode, check xp image) [30 min] DONE (2) keep the rest of the logic [10 min] DONE. (3) debug [15 min] 5:00PM Problem 1: expectedEIP problem. Need to add last_eip in Trace and use it to predict the next_eip. Still not working. Attempt 2: catch iret instruction. (4) inspect log [20 min] Seems now ok. Sometimes instructions repeat themselves and the expInstr is the next one but it seems ok. *** ts=427098 for eip 0x40103c 7:45PM Still problems with nested interrupt iret. Inspect the log, look at all the warnings. [1] most exceptions have int_no:0xe and error_no:4 or 6; both can trigger a warning or not trigger a warning [2] for most warning messages, the address is the next instruction (mostly jumps/calls) [3] only two exceptions @403fdc and @404028. For @403fdc it needs 3 iret to return to the right place. [4]*** in all interrupt information next_eip_addon is 0. Looks suspicious. Debug Plan: [1] conditional bp on @403fdc [2] then bp on raise_interrupt, raise_interrupt2, and checkRecordException.
Observation: 0x403fdc raises tlb_fill error -> 0x804e1f25 -> iret to 0x80 range. Verified; there is no interrupt in between. So an interrupt can actually contain multiple iret before returning to the target. Check why there are so many levels. Debug Plan: directly return true for 0x403fdc. Adjust the value of eipExpected so that we can capture all iret. Problem is Here: timeStamp: 276914, ins @804e1fca: test [ebp+0x70], 0x00000200 read: (start: 0xf750bdd4, end: 0xf750bdd7) , ESP: 0xf750bd64 -> 0xf750bd58 timeStamp: 276915, ins @81f8f5c4: push esp This is clearly a swap, and it is not captured by interrupt. 7:30AM 10/18/2013 ------------------------------------------------------------------------------- Task 164: Investigate context switch problem's complete solution AGAIN ------------------------------------------------------------------------------- Debug Plan: [1] conditional bp on @403fdc [2] then bp on checkRecordException and decrease the idxExpect so that all instructions will be captured [3] conditional bp on @804e1fca helper_trace2, and then step by step and see how it gets into @81f8f5c4. Observation: [a] right after 0x804e1fca, the current TLB block ends and it enters #0 cpu_x86_exec (env=0x28dbd190) at /home/csc288/qemu/qemu-1.4.0/cpu-exec.c:321 There is a huge branch checking interrupts. It first calls cpu_svm_check_intercept_param(env, SVM_EXIT_INTR, 3270); and then *** do_interrupt_x86_hardirq(env, intno, 1); defined in target-i386/seg_helper.c:1293 Note: do_interrupt_all in seg_helper.c:1196 Debug Plan 2: Figure out the call sequence [1] b excp_helper.c:111 (raise_interrupt2), then Trace::handle_instr, Trace::checkRecord, do_interrupt_all. Verified: it's raise_interrupt -> do_interrupt_all -> Trace::handle_instr. So we can move the logic to do_interrupt_all 9:45AM Implementation Plan: [1] move the logic from excp_helper.c to do_interrupt_all in seg_helper.c [15 min] DONE.
[2] add two checks: [15 min] [DONE] (1) get the first 16 bits of the target address, if they do not match, generate ERROR message (Util::error) (2) when idx>1 generate a warning: nested interrupt. [3] generate the raw and inspect @403fdc [30min] [1] handle the case where sometimes next_eip is 0. There are three such warnings captured: [1] 0x7c900719->0x7c902f06: verified ok [2] ins @805633f1: jnz 0x00009600 -> @805633f7. OK. [3] @805788ea: --> 804dc750. ok. Warnings are caused by 0xb1 and 0x9e. Seems no need to figure out the details. [2] observe nested: all good. No more than 3 layers of nesting. [3] observe @403fdc. Now fine with nested interrupt. There is a strange repetition of code as shown in the following, not sure if it will have an impact. Note here: the execution has proceeded to 0x403ff0 (after the nested interrupt on @403fdc returns successfully), then it gets an interrupt which directly returns to 0x403fdc again. ----------------------------- INTERRUPT: int_no: 0xe, is_interrupt: 0, error_no: 4, nxteip: 0x403fdc -- Context Switch! -- Context Switch BACK! timeStamp: 263660, ins @403ff0: mov ebx, 0xFFFF0000 timeStamp: 263661, ins @403fdc: mov eax, [0x0041D400] read: (start: 0x41d400, end: 0x41d403) timeStamp: 263662, ins @403fe1: and [ebp-0x8], 0x00 read: (start: 0x12ffb4, end: 0x12ffb7) write: (start: 0x12ffb4, end: 0x12ffb7) timeStamp: 263663, ins @403fe5: and [ebp-0x4], 0x00 read: (start: 0x12ffb8, end: 0x12ffbb) write: (start: 0x12ffb8, end: 0x12ffbb) timeStamp: 263664, ins @403fe9: push ebx write: (start: 0x12ffa0, end: 0x12ffa3) , ESP: 0x12ffa4 -> 0x12ffa0 timeStamp: 263665, ins @403fea: push edi write: (start: 0x12ff9c, end: 0x12ff9f) , ESP: 0x12ffa0 -> 0x12ff9c timeStamp: 263666, ins @403feb: mov edi, 0xBB40E64E timeStamp: 263667, ins @403ff0: mov ebx, 0xFFFF0000 ------------------------------- [4] generate the full trace. [10min] [5] generate the branch, use ts=404629 for 0x40103c. Seems to fix the 403fdc problem, but new problems come up.
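The interrupt bookkeeping from Task 163 (push one token on an interrupt, pop on iret) together with the nesting warning from Task 164 can be sketched as follows. This is a simplified illustration only: ContextTracker and its methods are hypothetical names, not the code in Trace::checkRecordStatus, and the sketch assumes that every interrupt is matched by exactly one iret, which the observations above show does not always hold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracker; not the project's Trace::checkRecordStatus.
class ContextTracker {
public:
    // Called when an interrupt event arrives; nextEip is where execution
    // should resume. Returns true when the interrupt is nested, i.e. more
    // than one interrupt is pending afterwards.
    bool onInterrupt(std::uint32_t nextEip) {
        pending_.push_back(nextEip);
        return pending_.size() > 1;
    }

    // Called for every executed instruction. Returns true if the instruction
    // belongs to the traced context (no interrupt pending when it executes).
    bool onInstruction(bool isIret) {
        if (pending_.empty()) return true;
        if (isIret) pending_.pop_back();  // the iret itself still belongs to the handler
        return false;
    }

    // EIP expected once the innermost pending interrupt returns (0 if none).
    std::uint32_t expectedEip() const {
        return pending_.empty() ? 0 : pending_.back();
    }

    std::size_t depth() const { return pending_.size(); }

private:
    std::vector<std::uint32_t> pending_;  // one token per unfinished interrupt
};
```

Keeping the resume address as the token, rather than a bare marker, allows a check that the first instruction after the final iret really lands where it was expected.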
------------------------------------------------------------------------------- Task 165: check writeSOC. DONE. ------------------------------------------------------------------------------- 11:30AM ------------------------------------------------------------------------------- Task 166: During every iteration, redo the init data slice again and see what's going on. ------------------------------------------------------------------------------- *use ts=404629 for 0x40103c. [1] Implementation [15 min] DONE. [2] Debug [15 min] [1. problem. Add program entry into slice]. Read the log.. full_slice is not working. fixed [2]. Still problems: (1) printf is still included. (2) infinite loop at the beginning. 3:30PM ------------------------------------------------------------------------------- Task 166: check why printf is included again ------------------------------------------------------------------------------- *use ts=404629 for 0x40103c. printf eip: 0x401022 (ts=392742) [1] bp on Trace::gen..., set ts=404629, and reverse_trace on 392742 ============================== Reverse ID: 0, ts: 392742, ins @401022: call 0x00000076 Reverse ID: 1, ts: 399467, ins @40112b: ret Reverse ID: 2, ts: 399468, ins @401027: add esp, 0x04 Reverse ID: 2, ts: -3, -> SOC End ============================== Problem: 399468 should be treated as bNoDataDependency at all! BP on 399468; it's included because of the esp link. [2] even fixed the above, the ret at 399467 is still treated as control link because the function has dependee. processFunction identifies printf() as bHasDependee because of 392741 has isEspDelayDependent() and could not find one before the entry. processFunction has bug! it's not recovering EIP! 392743 could not find a corresponding ts with same ESP! Found the problem: 392743 is not cleared for EspProcessDelay flag! 7:30PM. [3] check why findTS did not find anything. [45 min] fixed the bug. DONE. [3.5] also double check the last connection point. OK. 
[4] fix the clear_in_slicetags, and add bSOC to processFunction [30 min] [5] the tsEsp search is delaying. [5] debug: use ts=404629 and break on processFunction of tsRightAfterRet=399468. Observe 399468. 7:30AM 10/19/2013. Observe the log.txt, sliceat: ts=404629 for 0x40103c (jnz ...) printf call: 392742, 399467 read the processFunction log and see why there are real data dependency. (in ts reverse order) 399457: because of 399480 push fs:[], it reads from 399457 move fs:[], ecx 399223: because of 400490 in getchar. It's in sysenter code, push [ebx], must be some kernel structure. 399186: because of 400935 @804d917e: xadd [ecx], eax (looks like some stats updates) 399156: caused by 400729 ins @8056452a: xadd [ecx], eax 399122: caused by 400798 ins @804e2a00: mov ebx, [eax] 399092: caused by 402006 @804d91b9: xadd [ecx], eax Trouble with kernel structure. Debug effort 2: check 399457 and 399223's reverse trace. Analysis shown below: Problme: why is 40159d included? ====================================== reverse trace for ts: 399457 the mov fs:[], ecx ====================================== Reverse ID: 0, ts: 399457, ins @402938: mov fs:[], ecx Reverse ID: 1, ts: 399480, ins @4028f5: push fs:[] Reverse ID: 2, ts: 399484, ins @402908: sub esp, eax Reverse ID: 3, ts: 399485, ins @40290a: push ebx Reverse ID: 4, ts: 399486, ins @40290b: push esi Reverse ID: 5, ts: 399487, ins @40290c: push edi Reverse ID: 6, ts: 399491, ins @402917: push eax Reverse ID: 7, ts: 399493, ins @40291b: push [ebp-0x8] Reverse ID: 8, ts: 399499, ins @402934: ret Reverse ID: 9, ts: 399508, ins @401500: push esi Reverse ID: 10, ts: 399514, ins @4016ba: mov esi, [ebp+0x8] Reverse ID: 11, ts: 399530, ins @404779: push esi Reverse ID: 12, ts: 399547, ins @4047a1: pop esi Reverse ID: 13, ts: 399550, ins @4016de: or [esi+0xC], 0x00008000 Reverse ID: 14, ts: 399610, ins @404091: mov eax, [esi+0xC] Reverse ID: 15, ts: 399617, ins @4040b3: or eax, 0x01 Reverse ID: 16, ts: 399620, ins @4040be: jnz 
0x0000000B Reverse ID: 17, ts: 399621, ins @4040c9: mov eax, [esi+0x8] Reverse ID: 18, ts: 399626, ins @4040d5: call 0x000000BC Reverse ID: 19, ts: 399635, ins @4041b6: ret Reverse ID: 20, ts: 399636, ins @4040da: pop ecx Reverse ID: 21, ts: 399638, ins @4040dc: call 0x00005403 Reverse ID: 22, ts: 404483, ins @4095c9: ret Reverse ID: 23, ts: 404484, ins @4040e1: add esp, 0x0C Reverse ID: 24, ts: 404492, ins @4040fe: push esi Reverse ID: 25, ts: 404497, ins @404196: mov eax, [ebp+0x8] Reverse ID: 26, ts: 404500, ins @4041b2: mov eax, [eax+0x10] Reverse ID: 27, ts: 404504, ins @404105: cmp eax, 0xFF Reverse ID: 28, ts: 404505, ins @404108: jz 0x00000032 Reverse ID: 29, ts: 404506, ins @40410a: push esi Reverse ID: 30, ts: 404511, ins @404196: mov eax, [ebp+0x8] Reverse ID: 31, ts: 404514, ins @4041b2: mov eax, [eax+0x10] Reverse ID: 32, ts: 404518, ins @404111: cmp eax, 0xFE Reverse ID: 33, ts: 404519, ins @404114: jz 0x00000026 Reverse ID: 34, ts: 404520, ins @404116: push edi Reverse ID: 35, ts: 404522, ins @404118: call 0x00000079 Reverse ID: 36, ts: 404531, ins @4041b6: ret Reverse ID: 37, ts: 404533, ins @404120: push esi Reverse ID: 38, ts: 404539, ins @404196: mov eax, [ebp+0x8] Reverse ID: 39, ts: 404542, ins @4041b2: mov eax, [eax+0x10] Reverse ID: 40, ts: 404545, ins @40412d: and eax, 0x1F Reverse ID: 41, ts: 404551, ins @404138: jmp 0x00000007 Reverse ID: 42, ts: 404552, ins @40413f: mov al, [eax+0x4] Reverse ID: 43, ts: 404555, ins @404146: jnz 0x00000009 Reverse ID: 44, ts: 404556, ins @40414f: cmp [esi+0x18], 0x00000200 Reverse ID: 45, ts: 404557, ins @404156: jnz 0x00000017 Reverse ID: 46, ts: 404558, ins @40416d: mov ecx, [esi] Reverse ID: 47, ts: 404563, ins @404178: jmp 0x00000016 Reverse ID: 48, ts: 404564, ins @40418e: pop esi Reverse ID: 49, ts: 404566, ins @404190: ret //control link Reverse ID: 50, ts: 404567, ins @401599: pop ecx //need visit link Reverse ID: 51, ts: 404569, ins @40159d: mov [ebp-0x4], 0xFFFFFFFE //---- problem. XXXX. 
need visit link Reverse ID: 52, ts: 404609, ins @4015a9: mov eax, [ebp-0x1C] //ok. Reverse ID: 53, ts: 404625, ins @40102f: mov [ebp-0x4], eax //ok. Reverse ID: 54, ts: 404628, ins @401038: cmp [ebp-0x4], 0x61 //ok. Reverse ID: 55, ts: 404629, ins @40103c: jnz 0x00000011 //ok. Reverse ID: 55, ts: -4, -> SEED! ====================================== END OF reverse trace for ts: 399457 ====================================== sliceat: ts=404629 for 0x40103c (jnz ...) 3:00pm 10/19/2013 [1] Check timestamp 404569, it is not added in slice at all [2] check why it's in reverse_trace. check why it's added to 404567's reverse trace 404567 adds 404569 as the reverse pointer because of the setNeedVisit link 404569 reaches 404567 because of control link ok. [3] check why 404569's reverse pointer points to 404609. OK. So 404569 is set as control visit because 404570 call... is skipped, and it has to be hit [4] make improvement to processFunction. [1] fix the add order [10 min] [2] fix the bOutEntireDependee [10 min] Need more elaboration [5] Add ReverseTrace Link Type. [30 min] [1] add. [5 min] DONE [2] clear in_slice_flags for InstrInfo as well. DONE. [2] fix all syntax errors. [20 min] DONE. [3] serialization [20 min] DONE. [4] unit test [15 min] DONE. --- TO DO PROESSFUNCTION!!!!!!!!!!!!! 9:00AM 10/20/2013 ------------------------------------------------------------------------------- Task 167: improve the processFunction ------------------------------------------------------------------------------- [1] algorithm design [1 hr] DONE [2] refactor call entry. [15 min] DONE. [3] declare and use checkFunctionNoChangeOnESPEBP(tsEntry, tsRet) [15 min] DONE. 11:00AM [4] implement checkFunctionNoChange [20 min] DONE. [5] verify printf() getchar() do not change esp/ep [10 min] DONE. [6] algorithm design checkDependee(). [20 min] scan backward if is needed for mem mark and directly return if is needed for reg, check if its reg is delayable [7] update callRetRecord. [50 min] DONE. 
[7.1] add an array of registers to protect and a counter [5 min] DONE [7.2] add method addRegProtected [8 min] DONE [7.3] add method isRegprotected [5 min] DONE. [7.3] update serialization and unit test it [20 min] 10:30AM 10/21/2013 [8] add the following to InstrInfo [1] hasExactlyOneRegOperand(bool bAsRead) [20 min] get the insn, and then check all in/out records. [2] unit test, provide a list of sample instructions [40 min] DONE. 7:30AM 10/22/2013 [2.5] re-implement hasExactlyOneRegOp and unit test it [1.5 hr] very tricky case, blame it on the bad design of libdisasm. 10:00AM [3] Trace::isAccessOneRegFromMem(long ts, bool bRead), readFromMem, writeToMem [3.0] memRange::getTotalSize() [10 min] DONE. [3.0] InstrExecRecorder::getWriteMemSize, getReadMemSize [10 min] DONE. [3.1] implement isAccessOneReg [15 min] DONE. Check it has one reg, one writeMem or one readMem, and check hasExactlyOneRegOperand [3.2] implement read and writeOneReg [8 min] DONE [3.3] testTrace::constructSampleCall [40 min] 11:30AM [3.35] debug the constructSampleCall [45 min] [3.4] unit test isReadRegFromMem and isWriteRegFromMem [20 min] DONE 8:00PM [4] Trace::collectRegProtected [30 min] DONE scan forward 20 instructions, for each ts if isWriteOneregFromMem get reg, mem addr and save to arrRegStoreAddr; scan backward 20 instructions, for each ts if isReadOneRegFromMem verify if ok, update the CallRetRecord 9:45AM 10/23/2013 [5] call collectRegProtected in collect call and debug it [45 min] [5.1] code inspection and make the change [DONE] [5.2] debug through the collectReg [DONE] [6] regenerate the full trace and check the registers protected by printf. [1] observe printf in winxp [10 min] It does not protect any register except ebp. 1:30PM [7] bug fix: need to redo the getOnlyRegs of an instruction, change its parameters to InstrInfo itself. [7.1] Algorithm Design [30 min] DONE. [7.2] Modify InstrInfo::hasExactlyOneRegOperand and add a parameter setReg [15 min] DONE.
[7.3] update the algorithm for func_check_operand [15 min] - skipped, no need. DONE [7.4] update the algorithm of getOnlyOneReg --> call some function in InstrInfo.cc [20 min] DONE. [7.4] update the algorithm of collectRegProtected [10 min] [7.5] debugging [15 min] sliceat: ts=404571 for 0x40103c (jnz ...) printf call: 392678, 399409 Debug plan: [1] b Trace::gen and set the ts, and then break on Trace::collectRegProtected and conditional branch on 399467 [a] fix one bug in match call ID. [b] fix algorithm of first visit. [c] printf should protect EBP, EBX, ESI, EDI but the algorithm did not find it. See if increasing the search range could help. --> 100. Now works!!! 4:30PM [6] update the algorithm of Trace::hasDependeeInFunctionBody() [6.0] algorithm design [20 min] OK. [6.1] implement Trace::isFunctionProtectingReg() [15 min] DONE. [6.2] call isFunctionProtectingReg in hasDependeeInFunctionBody() [10 min] DONE. [6.3] Debugging [1] fix the bug in the check on ESP/EBP. [OK] [2] fix unhandled case for eip: 404171. [6.5] check on printf(), it seems that it still has that memory dependency problem on fs:[0]. Has data dependency on 399399. 7:30PM [7] update the algorithm of processFunction [7.0] declare an unsigned int inSliceCount for InstrInfo, and remove the bMultiOccurance tag in InstrInfo [10 min] DONE. [7.1] update the Trace::setInSlice(long long int ts) [15 min] DONE. [7.2] declare unmark_inslice() for IER and II, but keep reverse pointer [15 min] DONE. [7.3] define Trace::unmark_inslice(long long int ts), clear the slice tag for IER, and reduce the count on the InstrInfo by one, and if it reaches 0, clear the slice tag for InstrInfo. The reason to keep the reverse_pointer is just in case it is needed to forward the dependency [10 min] DONE. [7.4] declare delayDependencyForFunction(long long int tsEntry, long long int tsRet) [5 min] DONE. 9:00AM 10/24/2013 [7.5] call delayRegDependency() in processFunction [30 min] DONE. [7.6] implement delayRegDependency(). [30 min] DONE.
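The counting scheme of [7.0] to [7.3] can be sketched as follows: each execution record (IER) carries its own slice tag, the shared instruction record (II) counts how many of its executions are in the slice, and unmarking keeps the reverse pointer. StaticInstr, ExecRecord and MiniTrace are hypothetical, heavily simplified stand-ins for InstrInfo, InstrExecRecorder and Trace; the real classes are not shown in this log.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for InstrInfo: one static instruction at an eip.
struct StaticInstr {
    unsigned int inSliceCount = 0;  // executions of this instruction currently in the slice
    bool inSlice() const { return inSliceCount > 0; }
};

// Hypothetical stand-in for InstrExecRecorder: one execution at a timestamp.
struct ExecRecord {
    unsigned int eip = 0;
    bool inSlice = false;
    long long reverse_pointer = -1;  // kept on unmark so a dependency can still be forwarded
};

struct MiniTrace {
    std::map<long long, ExecRecord> records;     // keyed by timestamp
    std::map<unsigned int, StaticInstr> instrs;  // keyed by eip

    // Mark one execution as in slice and remember which ts required it.
    void setInSlice(long long ts, long long source) {
        ExecRecord& r = records.at(ts);
        if (!r.inSlice) {
            r.inSlice = true;
            ++instrs[r.eip].inSliceCount;
        }
        r.reverse_pointer = source;
    }

    // Clear the tag on the execution and drop one count on the instruction.
    // The instruction leaves the slice only when its count reaches 0.
    void unmarkInSlice(long long ts) {
        ExecRecord& r = records.at(ts);
        if (!r.inSlice) return;
        r.inSlice = false;
        StaticInstr& s = instrs[r.eip];
        if (s.inSliceCount > 0) --s.inSliceCount;
        // r.reverse_pointer is intentionally left untouched
    }
};
```

With a counter instead of a single flag, unmarking one execution of an instruction that runs many times does not drop the other executions from the slice.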
10:00AM [7.7] debug delayRegDependency [20 min] sliceat: ts=404571 for 0x40103c (jnz ...) printf call: 392678, 399409 [problem 1] has to serialize countInSlice. [7.8] debug processFunction [30 min] [problem 1] fix handling of other types of data dependency. DONE. [problem 2] missing updateCache(). DONE. 11:04AM [7.9] problem: 399409 is included as the last timestamp of SOC and is included. Fix the identifySOC. DONE. [7.10] check why the last ret 399409 is inSlice. (reverse link points to 399410 for control link). caused by memlink 399499. [7.12] fix bug tsEntry<tsSOCStart, 322391 to 333791 [7.12] dump the dependency of 399399 modify reverse_trace DONE. Dump below: printf call: 392678, 399409 7:30PM Check the trace [1] Problem: 404708 is included in slice. It's greater than the sliceat point. it is caused by 404571 has 40 times of access (starnge) must be something wrong with serialization. Regenerate the raw, full, and branch trace. Fixed [2] Problem 2: crashed at complaining not descending order. bp on full_slice tsStart==289491. When processing 289490 (searching for tsEnd of SOC), strangely it did not find 289491. Still the problem of visitedOnce ----- ---------------------------------------------- 7:30AM 10/25/2013 Continue on the descending order problem. Conjecture: problem is with the countInSlice number Sliceat: Timestamp: 404737 0x40103c printf (392684 @401022 --> 399409 @0x40112b) Still the problem is 399399. Fixed the problem, but now the algorithm works very slow. Now dumps below: ====================================== reverse trace for ts: 399399 ====================================== Reverse ID: 0, ts: 399399, Type: MEM_LINK ins @402938: mov fs:[], ecx ------------ printf finishes at 399409 ------------- Reverse ID: 1, ts: 399422, Type: ESP_LINK ins @4028f5: push fs:[] //OK. 
Reverse ID: 2, ts: 399426, Type: ESP_LINK ins @402908: sub esp, eax Reverse ID: 3, ts: 399427, Type: ESP_LINK ins @40290a: push ebx Reverse ID: 4, ts: 399428, Type: ESP_LINK ins @40290b: push esi Reverse ID: 5, ts: 399429, Type: ESP_LINK ins @40290c: push edi Reverse ID: 6, ts: 399433, Type: ESP_LINK ins @402917: push eax Reverse ID: 7, ts: 399435, Type: MEM_LINK ins @40291b: push [ebp-0x8] Reverse ID: 8, ts: 399441, Type: CONTROL_LINK ins @402934: ret ..------------------------------------------------------------------------------------- 8:30AM Problem: 399441 should match 399420 so that the entire call is not processed. [1] verify using Windows XP 399420: @4014d0 call seh_prolog4 399441: return to @4014d5 It establishes a new exception handler and adjusts esp (enlarges stack frame). It pushes fs:[] to let the new exception handler to point to the existing SEH handler. Algorithm Discussion: for instructions modifying fs, send alert and record the fs:[0] address. For instruction modifying fs:[0], send event and entail information of the new fs:[0] value. When processing function, if an instruction is modifying fs:[0] value and it is being dependent on memory link, unmark the instruction and add a delayed link. 10:00AM ------------------------------------------------------------------------------- Task 168: fix call/ret pairing routine ------------------------------------------------------------------------------- [1] debug and set conditional bp on 399441 [15 min] [2] remove the last statement in the function [10 min] [3] causes desc order check to fail. Temporarily enable it after we fix the push fs:[] issue. 10:20AM ------------------------------------------------------------------------------- Task 169: identify fs preserving function and delay handling ------------------------------------------------------------------------------- [1] Algorithm Design [25 min] 11:00AM [2] Capture FS modifying. [2.1] define FS_0 value in env [8 min] DONE. 
    [2.2] define event NEW_FS_0 [10 min] DONE.
    [2.3] in process instruction in ops_sse.h, whenever the FS_0 value changes from the older one, send an event [15 min] DONE
    [2.4] fix TraceManager, Trace handle event. In Trace add the FS_0 value and set it to the new value. [20 min] DONE.
    [2.5] debug and get the new FS value. Should be in the 7ffdxxxx range. [15 min]
          bug1. it should be initialized [DONE]. verified. fs0 set. [DONE]
    [2.6] fix the old problem of the non-existing page map again. pagemap is 0 again. --> fixed: SIMPLY REBUILT THE ENTIRE SYSTEM AGAIN.
    [2.7] observation: FS_0 ACTUALLY NEVER CHANGES!!!
3:40PM
[3] Capture re-writing of SEH handler.
    [1] declare class FSChangeRecord. In Trace declare a Cache that keeps the record of fs0, name it histChangeFS0, record the ts and the value of SEH [30 min] DONE
    [2] code inspection and unit test FSChangeRecord [20 min]. DONE.
    [3] add FCR. [10 min] DONE.
7:30PM
    [1] in helper_trace_mem, if the write addr is fs_0, set the flag to collect fs:[0] [15 min] DONE.
        [0] in ops_sse.h, declare a flag bCollectFS0Content. [5 min] DONE.
        [2] in helper_trace_mem set the flag. [10 min] DONE.
    [2] in helper_trace2, if the flag to collect fs:[0] is set, unmark the flag, collect the value and send the event to Trace [15 min]
        [1] declare event resetSEH [10 min] DONE.
        [2] in helper_trace2 send the event [10 min] DONE.
        [2.5] set the flags in helper_trace_mem [5 min] DONE.
        [3] BatchAnalyzer to Trace, handle_reset_SEH [15 min] DONE.
        [4] debug and verify Trace has the event [10 min] NOT WORKING
8:30AM 10/26/2013
Redesign Algorithm [45 min] DONE
9:30AM
[1] in helper_trace_mem if the write addr is fs_0, send an event for SEH_CHANGE_ALERT
    [1] in event.h define the event [10 min] DONE.
    [2] in helper_trace_mem send the event [10 min] DONE.
    [3] in BatchAnalyzer, TraceManager, and Trace handle the event [15 min] DONE.
    [3.5] in BatchAnalyzer declare two functions for managing NEED_SEH. [5 min] DONE.
[4] in Trace::handle_seh_change_alert call BatchAnalyzer::setNeedsSEH increase a counter [10 min] DONE. [5] in handle.h declare function isNeedSEH [5 min] DONE. [6] debug and capture the SEH_CHANGE_ALERT [15 min] DONE. 10:40AM [2] collect the SEH value [0] algorithm design [15 min] DONE [1] in helper_trace2 check if SEH is needed, if it is needed send the event (cr3, eip, value) [10 min] DONE [2] in BatchAnalyzer, TraceManager handle the event [10 min] [DONE] [3] Trace handle the event, check (eip, value) if it is as expected [15 min] [DONE] [4] continue the logic, push it into FCS record [10 min] [DONE] [5] debug through [15 min] [1] should collect only when in record mode. DONE. [2] collect raw trace. DONE. 12:00PM [3] construct full trace. [3.1] in Trace::constructFullTraceFromRaw Trace, simply set the FCR to the right path [10 min] DONE [3.15] in Trace::expandFromRow add a log message if a instruction is saving SEH. [10 min] DONE. [3.2] debug and test [10 min] DONE ------------------- TO do. [3.3] do the samething for loadFullTrace [10 min] DONE. [3.4] develop Trace::isFunctionPreserveSEH(tsCall, tsRet, idSEHHint) [30 min] ------------------------------- 7:30 10/28/2013 [1] update the search of SEH records [20 min] DONE. [2] call searchforseh in setupCallTable [20 min] [2.1] in CallRetRecord add a flag and the update function [10 min] DONE. [2.2] fix the serialization problem in unit test. [20 min] DONE. 8:30AM [2.2] call it in setupCallTable [30 min] DONE [2.3] finish the searchSEH [30 min] [DONE] [2.4] code inspection [30 min] 10:30AM [2.2] debug through isPreserveFunctionSEH [1] fix loadFullTrace bug, pathSCR not set. [15 min] [2] fix loop bug [10 min] [3] fix the loop logic. [15 min] [4] fix the appendRecord error [30 min] try call resetToLast(). not working. debug and check last 3 appends. Observation: broke and cache block size. problem: search for all did not reset it. Now seg fault. --- strange, cannot find out the problem. ------------ need bp later. 
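The isFunctionPreserveSEH check planned in [3.4] can be sketched as follows. This is only a minimal illustration, assuming the fs:[0] change history is available as a list of (ts, value) records sorted by ascending timestamp. The struct `SehRecord`, the helper `sehValueBefore`, and the free-function signature are hypothetical names, not the actual Trace or FSChangeRecord code, and the idSEHHint parameter is left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One observed write to fs:[0] (head of the SEH chain). Hypothetical layout.
struct SehRecord {
    long long ts;         // timestamp of the instruction that wrote fs:[0]
    std::uint32_t value;  // value of fs:[0] after that write
};

// Finds the fs:[0] value in effect just before timestamp ts.
// hist must be sorted by ascending ts. Returns false if nothing was
// recorded before ts.
inline bool sehValueBefore(const std::vector<SehRecord>& hist, long long ts,
                           std::uint32_t* out) {
    auto it = std::lower_bound(
        hist.begin(), hist.end(), ts,
        [](const SehRecord& r, long long t) { return r.ts < t; });
    if (it == hist.begin()) return false;
    *out = (it - 1)->value;
    return true;
}

// A call spanning tsCall..tsRet is treated as SEH-preserving when fs:[0]
// holds the same value right before the call and right after the return.
inline bool isFunctionPreserveSEH(const std::vector<SehRecord>& hist,
                                  long long tsCall, long long tsRet) {
    std::uint32_t before = 0;
    std::uint32_t after = 0;
    if (!sehValueBefore(hist, tsCall, &before)) return false;
    if (!sehValueBefore(hist, tsRet + 1, &after)) return false;
    return before == after;
}
```

With a history in which a function installs a new handler and later restores the old one, the check returns true for the enclosing call range, so the fs:[0] writes inside it need not pull the whole call into the slice.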
7:30PM
    [4.1] recompile, rebuild, and regenerate the full trace.
    [4.2] still broke at 98166, use a watchpoint to find out the problem. loadCache 3797 problem. It's the serialization problem. 3817 is already not right.
    [4.3] try calling resetToLast() in append() and see if it works.
    [4.4] this was overwritten. Check the logic of Cache::loadBlock --- debug -- check the contents of 3817 and check when it is written to disk.
8:30AM 10/20/2013
Continue the debug.
[1] b Trace.cc:360 if retID==98083
[2] load ccr.loadFromCache(3817) and see if it is messed up. Confirmed it's messed up.
[3] b Trace.cc:360 if retID==97747 and repeat 2 to see if it's messed up. Verified it's messed up.
    So we need to debug, set a watchpoint, and see how it gets messed up.
[4] set a breakpoint at ccr.appendToCache when callID is 3817 and retrieve where it is stored
    size is 34
    content is:
    0xbfffda8c: 0x013c8401 0x00000000 0x12fa4800 0xffffff00
    0xbfffda9c: 0xffffffff 0x919b78ff 0x000ee97c 0xffff0000
    It is appended in the last position (3816).
    It is stored at: offset 26578, 0x6640f7da (curBlockID is 3)
    (gdb) p this->block
    $36 = 0x66409008 "\001k\001\001"
    (gdb) p this->curBlockSize
    $37 = 26578
    (gdb) p this->block + this->curBlockSize
    $38 = 0x6640f7da ""
[5] check when it is saved, whether it's the same content, and check when it's loaded.
    1. the first time it's saved it's fine. The first time it's loaded it's ok.
    2. after multiple loads and saves it's ok.
    3. check if it's changed after an updateCache. It's still fine.
    4. set a watchpoint (blockID=3 should imply that at location 26578 the content is 0x013c8401)
       watch this->curBlockID!=3 || this->block[26578]--0x013c8401
       does not work, only captured that when it is loaded, it's messed up.
    5. try to locate the last Cache::writeCurrentBlockToDisk. Insert the check at the beginning and end of writeCurrentBlockToDisk().
       Findings: first throwing error at 97693.
    6. b Trace.cc:413 if i==97693 and check what's going on. Delve into searchForCall, the stack has 11 calls in it.
       display this->callTable->test() on each iteration. After the loop it is fine. Found that it gets messed up after the append call at line 450.
       450 long long int cid = ccr.appendToCache(this->callTable);
    7. to repeat. [1] b Trace.cc:413 if i==97693 and then [2] b 450, test it before and after. VERIFIED. Now check why it behaves like this.
8:45AM 10/30/2013
[1] Repeat the experiment 7: [15 min]
    7. to repeat. [1] b Trace.cc:413 if i==97693 and then [2] b 450, test it before and after. VERIFIED. Now check why it behaves like this.
[2] debug into the last appendRecord, don't see any difference [15 min]
[3] do a comparative study of callTable->test() [45 min]
    [1] before: in saveCurBlockToDisk, startIdx is 39640, nxtRecordInBlock is 679. posOfSizeIndex: 25106, curBlockSize: 25098.
        For the block to load, startIdx is 39540. Does not look right: only 100 bytes of difference?
        The 1st block is shown below:
        (gdb) x/8wx this->block
        0x66209008: 0x01016b01 0x00000000 0x12f9ac00 0x01017800
        0x66209018: 0x00000000 0x9105c800 0x000c007c 0xffff0000
    [2] after: Same
    Verified: the records are too close to each other, and the later writes overwrite the earlier ones, which messes up the block.
[4] check how vecStartIdx is loaded. [30 min]
    break on Cache.cc:158
    Observation: it happens when working in the LAST block (which has not yet reached full capacity). It seems that vecBlockSize[] is not updated correctly.
10:45AM
[5] introduce a function: updateLastBlockIdxSize() [20 min] DONE.
    If the idx of the next block is already in vec, update the size of the idx of the next block.
[6] debug updateLastBlockIdxSize() [30 min] Fixed the problem
[7] remove the test functions.
11:45AM
[8] now pick up where we stopped. Check the isFunctionPreserveSEH call in setupCallTable. BP on 451 and display. Seems ok.
[9] check the read of 0xFFFFFFFF problem. b ops_sse.h:2419
    It seems that reading 0xfddf0000 (in kernel mode) always returns 0xFFFFFFFF.
[10] Sliceat: Timestamp: 404629 0x40103c printf (392742 @401022 --> 399467 @0x40112b) set BP at 451 if i==399467 verified it's true [11] slice. Trace Size: 455751, in slice: 136261, Percentage: 29.90% Instruction Store Size: 48115, in slice: 13929, Percentage: 28.949392% Instruction Store Size (excluding imported DLL): 3247, in slice: 2225, Percentage: 68.524792% Printf is still there. [12] check the fs reading instruction is still there. (timestamp: 399457 @eip: 0x402938). 9:00AM 10/31/2013 ------------------------------------------------------------------------------- Task 170: Examine the printf in slice again. ------------------------------------------------------------------------------- [1] run it again. [30 min] Sliceat: Timestamp: 404629 0x40103c printf (392742 @401022 --> 399467 @0x40112b) set BP at 451 if i==399467 verified it's true The fs instruction is at is 399457 [2] Algorithm design: Trace full_slice. [15 min] 9:45AM [3] Implementation: [3.0] modify getCallEntry and retrive the SEH [15 min] DONE. [3.1] Modify ::hasDataDependency. First check if the function is preservingSEH, and then check ier if it is writing to seh. [20 min] DONE. [3.2] Define delayMemSEHReference() [1 hr]. 11:30AM [3.3] call delayMemSEHReference [30 min] DONE. --- debug --- Sliceat: Timestamp: 404629 0x40103c printf (392742 @401022 --> 399467 @0x40112b) set BP at 451 if i==399467 verified it's true The fs instruction is at is 399457 [3.3] Debug into getCallEntry [10 min] OK. DONE. [3.4] debug into hasDataDependency [10 min] b Trace.cc:1091 [3.4.1] declare Trace::isWriteToSEH(long long int ts, sehHint) [3.4.2] modify the hasDataDependendee and take sehHint [10 min] DONE [3.5] debug into isWriteToSEH. DONE. 7:30PM [3.6] debug delayMemSEHRefernce [40 min] DONE. [3.7] Fix the VecSOC desc order problem. failed at addTS ts=387378 bp on ts==387379 Problem with 387641 its countInSlice is changed and not to be used as a boundary any more. 
    *** need to rethink the SOC identification and merging algorithm.
8:30AM 11/1/2013
[3.8] check the addSOC algorithm again.
    [1] read the algorithm. [30 min]
    [2] modify the algorithm. [30 min]
10:30AM
[3.9] debug the addSOC algorithm
    [1] change insertSOC to return the resulting SOC. [20 min] DONE.
    [2] fixed the copyFrom issue [15 min] DONE.
    [3] check the first couple of adds. [15 min]
    [4] fix the merge algorithm. [15 min]
7:30PM
    [5] check why it's violating the reverse order again. Last add: 387657. bp on it.
        Found the problem: a <= problem. mergeID should be socIdx+1.
    [6] improve the <= problem in findSOCToInsert. [20 min]
    [7] fix bridge problem. [15 min]
    [8] fix the check desc order problem.
    [5] check line 71 (did not hit)
[3.6] debug processFunction for printf [20 min]
7:30AM 11/02/2013
-------------------------------------------------------------------------------
Task 171: improve speed.
-------------------------------------------------------------------------------
[1] add a bool flag to full_slice_all_soc(); when the flag is true, check modified SOCs only [15 min]
[2] test and run [15 min] DONE.
8:30AM
-------------------------------------------------------------------------------
Task 172: check printf again.
-------------------------------------------------------------------------------
printf (392742 @401022 --> 399467 @0x40112b)
set BP at 451 if i==399467, verified it's true
The fs instruction is at 399457
[1] observe processFunction and see what causes the function printf to be included. [1 hr]
    dependency:
    [1] 399457 (seh writing) skipped ok.
    [2] 399223 (also seh writing) skipped ok.
    [3] 399186 Problem:
        timeStamp: 400935, ins @804d917e: xadd [ecx], eax
        read: (start: 0xe1339748, end: 0xe133974b)
        write: (start: 0xe1339748, end: 0xe133974b)
        , DEPLINKS: , R: 400934 , M: 399186
        Also 398592.
    It seems there are only these two; verify it later.
[2] modify the program so that it prints out all violations in processFunction. [30 min]
    Observation: it has a lot of unknown dependencies.
    Take some examples and study them.
    [1] 399121, it is introduced by 399122 on reg dependency. Got to add a limit that restricts the reverse_pointer to inside the function.
    There are over 20 memory dependencies between scanf and printf, as shown below:
    -- has mem dependency at 399186
    -- has mem dependency at 399156
    -- has mem dependency at 399122
    ...
    -- has mem dependency at 394560
    -- has mem dependency at 394405
    [2] study these dependencies and see if we can remove any
    -- has mem dependency at 399186 -- depended by internal syscall instructions (internal)
    -- has mem dependency at 399156 -- internal
    -- has mem dependency at 399122 -- internal
    -- has mem dependency at 399092 -- internal
    -- has mem dependency at 399051 -- internal (lock)
    -- has mem dependency at 399038 -- internal
    -- has mem dependency at 398790 -- internal
    -- has mem dependency at 398774 -- internal
    -- has mem dependency at 398674 -- internal (looks like a lock inc and dec)
    -- has mem dependency at 398592 -- internal
    -- has reg dependency at 398586 -- INTERNAL REG DEPENDENCY ON CR0!!! (switch cr0 and back and forth)
    -- has mem dependency at 398551 -- some global var internal
    -- has mem dependency at 398546 -- internal
    -- has mem dependency at 398501 -- internal
    -- has mem dependency at 398500 -- internal
    -- has mem dependency at 398360 -- internal
    -- has mem dependency at 398359 -- internal
    -- has mem dependency at 398338 -- internal
    -- has mem dependency at 398182 -- internal looks like lock
    -- has mem dependency at 397988 -- internal
    -- has mem dependency at 396514 -- internal
    -- has mem dependency at 396507 -- internal
    -- has mem dependency at 396504 -- internal looks like a counter
    -- has mem dependency at 396501 -- internal
    -- has mem dependency at 395364 -- ins @402543: mov [ebp-0x211], al !!! looks like preparing some internal data structures, but it is read by scanf
    -- has mem dependency at 395359 -- *** also in range @4025xx
    -- has mem dependency at 394560 -- ***** depended by 400579 check why.???
    -- has mem dependency at 394405 -- similar to above
    -- has mem dependency at 394403 -- similar to above
    -- has mem dependency at 394274 *** similar to above but in @7c range
    -- has mem dependency at 394170 *** similar
    -- has mem dependency at 394166 *** similar
    -- has mem dependency at 392979 ***
    [3] start winxp and check 395359 [30 min]
        timeStamp: 395359, ins @40192a: inc [esi]
        read: (start: 0x12fcfc, end: 0x12fcff)
        write: (start: 0x12fcfc, end: 0x12fcff)
        , DEPLINKS: , R: 395346 , M: 395272 , C: 395358
        ESP: 0x12fc98 EBP: 0x12ff20
    Observation: 0x40192a is visited multiple times. It's part of write_char; clearly [esi] is the counter of the number of characters written. The mem addr of the counter is 0x0012fcfc (when printf() is finished it has counter value 9 - 9 chars printed). During the call of getchar, 0x0012fcfc is overwritten with some value 982b0000. In the other printf, it is cleared to 0 again and used as a counter. When calling getchar, the esp is 0x0012ff70 (higher than 0x0012fcfc). So getchar does use 0x12fcfc as part of its temporary local stack frame and somehow passes the region to the syscall.
    Check the dependee:
        timeStamp: 400579, ins @80578677: repz movs es:[edi], ds:[esi]
    The first related @7c... instruction is 0x7c91ec82; it is verified that after the syscall, the area of 0x0012fcfc is modified. The bytes are copied somehow to a kernel buffer, but it does not seem to be useful to me here???
    Notice that 400579 also depends on other memory bytes in the same region. Clearly, all of them belong to the message structure passed by CsrClientCallServer. 0x0012FCEC is the message passed to CsrClientCallServer (the message structure). And clearly 0x0012FCFC is in some type of union structure and included as extra bytes. The kernel routine then blindly copies it from the user stack to the kernel buffer first, without actually using it.
    Observation: to verify, check how 0x0012fcfc is used.
At 400579, copy range is shown as below read: (start: 0x12fcec, end: 0x12fd03) write: (start: 0xf74dbcc4, end: 0xf74dbcdb) ==> 0x12fcfc is copied to 0xf74dbcd4. Then it is accessed by TimeStamp: 400860, ins @8056a652: movs es:[edi], ds:[esi] read: (start: 0xf74dbcd4, end: 0xf74dbcd7) write: (start: 0xe117fac0, end: 0xe117fac3) --> never used. and also: timeStamp: 403065, ins @8056a652: movs es:[edi], ds:[esi] read: (start: 0xf74dbcd4, end: 0xf74dbcd7) write: (start: 0xe117fac0, end: 0xe117fac3) --> never used timeStamp: 406070, ins @8056a652: movs es:[edi], ds:[esi] read: (start: 0xf74dbcd4, end: 0xf74dbcd7) write: (start: 0xe117fac0, end: 0xe117fac3) --> never used It seems that they will not impact user code! Then it is overwritten by a push instruction. *************************************8 note: get_char is from 399469 to 404625. ************************************** Task 2: observe 399186. 400935, ins @804d917e: xadd [ecx], eax read: (start: 0xe1339748, end: 0xe133974b) Seems to be a lock, always +1/-1 in the kernel code section. Check how it's included. 400935 is added because the entire function from 400463 to 402174 is added (It's a KiFastSysCall), it seems that sysenter/sysexit needs processing (protect registers). ===================================================================================================== 9:35AM 11/03/2013 ********************** 9 [1] double check how 395359 is included: generate the reverse_trace. The problem (previously analyzed): unused stack contents part of the union of the request message sent by CsrClientRequestServer. It is copied to kernel buffer, however, actually never used. 
======================================
Reverse ID: 0, ts: 395359, Type: NEED_VISIT ins @40192a: inc [esi]
Reverse ID: 1, ts: 400579, Type: MEM_LINK ins @80578677: repz movs es:[edi], ds:[esi]
    **** copied the entire request_message structure, where the part modified by 395359 is actually not used
    **** the part is 0x12fcfc
Reverse ID: 2, ts: 400840, Type: REG_LINK ins @8056a621: lods eax, ds:[esi]
    *** the content is essentially from 0x12fcec (the first word); this determines the message type
Reverse ID: 3, ts: 400842, Type: REG_LINK ins @8056a623: lea ecx, [eax+0x3]
    *** use it as a pointer (offset)? -- anyway part of the message parsing
Reverse ID: 4, ts: 400843, Type: REG_LINK ins @8056a626: and ecx, 0x0000FFFC
    *** still part of message parsing
Reverse ID: 5, ts: 400844, Type: REG_LINK ins @8056a62c: shr ecx, 0x02
    *** still part of message parsing
Reverse ID: 6, ts: 400863, Type: MEM_LINK ins @8056a658: repz movs es:[edi], ds:[esi]
    *** So this actually determines the copy size; it relies on 400844 by copying its contents
Reverse ID: 7, ts: 401872, Type: MEM_LINK ins @8056a658: repz movs es:[edi], ds:[esi]
    *** still copying contents from that one
Reverse ID: 8, ts: 402192, Type: MEM_LINK ins @7c91eb96: sub [ecx], edi
    ***
Reverse ID: 9, ts: 402213, Type: REG_LINK ins @7c8715f8: mov esi, [ebp-0x38]
    *** the above is to determine the buffer location
Reverse ID: 10, ts: 402220, Type: MEM_LINK ins @7c87160d: repz movs es:[edi], ds:[esi]
    *** the above is to copy the I/O reading contents
Reverse ID: 11, ts: 404386, Type: REG_LINK ins @409184: mov al, [ecx]
Reverse ID: 12, ts: 404391, Type: MEM_LINK ins @409192: mov [ebx], al
Reverse ID: 13, ts: 404560, Type: REG_LINK ins @404172: movzx eax, [ecx]
Reverse ID: 14, ts: 404568, Type: MEM_LINK ins @40159a: mov [ebp-0x1C], eax
Reverse ID: 15, ts: 404609, Type: REG_LINK ins @4015a9: mov eax, [ebp-0x1C]
Reverse ID: 16, ts: 404625, Type: MEM_LINK ins @40102f: mov [ebp-0x4], eax
Reverse ID: 17, ts: 404628, Type: NEED_VISIT ins @401038: cmp [ebp-0x4], 0x61
=============================
*********************************************************************************************************
***!!! Summary (1): again, verified that the problem is the imprecise handling of the block-move (repz movs) instruction: part of the copied data is actually never used. When processing the dependency of 400840 on 400579, it should piggyback the information about the address that it depends on. When 400579 selects its next dependee, it can then choose the right one.
****
[1] in the toProcess array, in addition to the ts, specify the reason why it is needed (reg or mem address to be read). Then, depending on the data_needed, add the corresponding link. For example, at 400579, if it is said that the address 0x12fcfc is needed, then it looks at the depend links and picks the matching one. For example, for a PUSH EAX instruction, if it is needed because of the value, then both EAX AND ESP should be included; if it is needed only because of ESP, then only ESP is included.
    Could declare a Trace::dispatchDependency(Queue&, ts, DataSource), then based on the instruction info, dispatch the address dependency.
<< ---
Task 2: observe 399186.
    400935, ins @804d917e: xadd [ecx], eax
    read: (start: 0xe1339748, end: 0xe133974b)
    Seems to be a lock, always +1/-1 in the kernel code section. Check how it's included.
    Check 400935. Strangely there is no reverse trace for 400935. 400935 is added because the entire function 400463 to 402174 is added.
    Manually constructed trace:
    400935 --> 400936 (nowhere)
    400935 --> 402100 (similarly xadd) [local dependency goes nowhere] -> 403140 (also added because the entire function is added)
    403140 --> 404222 (this is the last xadd included; there are other xadd, they are not included, maybe because they are after 404629, the slicing point)
    404222 -> 404226 --> ... 404231
    Now the question is why 404222 is included. 404225 (jz) has an indirect data dependency on 404222. Set BP on 404225. It's a conditional jump so it needs data propagation. Think ....
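The Summary (1) idea above, carrying the needed address along with the timestamp so that a block copy only follows the links that feed the bytes actually read, might look like the sketch below. All names here (`ByteRange`, `MemLink`, `mapNeededToSource`, `dispatchMemDependency`) are hypothetical and only illustrate the proposed Trace::dispatchDependency; the real record layout is not shown in this log.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Inclusive byte range, matching the "(start: ..., end: ...)" form of the dumps.
struct ByteRange {
    std::uint32_t start;
    std::uint32_t end;
};

inline bool overlaps(const ByteRange& a, const ByteRange& b) {
    return a.start <= b.end && b.start <= a.end;
}

// For a block copy (e.g. repz movs) from src to dst, maps the destination
// bytes that a later instruction needs back to the source bytes they came
// from. Returns false if the needed bytes were not written by this copy.
inline bool mapNeededToSource(const ByteRange& src, const ByteRange& dst,
                              const ByteRange& needed, ByteRange* out) {
    if (!overlaps(dst, needed)) return false;
    const std::uint32_t lo = std::max(dst.start, needed.start);
    const std::uint32_t hi = std::min(dst.end, needed.end);
    out->start = src.start + (lo - dst.start);
    out->end = src.start + (hi - dst.start);
    return true;
}

// One memory dependency link: the earlier instruction and the bytes it wrote.
struct MemLink {
    long long ts;
    ByteRange written;
};

// Keeps only the links whose written bytes overlap the needed source bytes.
inline std::vector<long long> dispatchMemDependency(
        const std::vector<MemLink>& links, const ByteRange& neededSrc) {
    std::vector<long long> selected;
    for (const MemLink& link : links) {
        if (overlaps(link.written, neededSrc)) selected.push_back(link.ts);
    }
    return selected;
}
```

Applied to the copy at 400579 (read 0x12fcec..0x12fd03, write 0xf74dbcc4..0xf74dbcdb), a request for 0xf74dbcd4..0xf74dbcd7 maps back to 0x12fcfc..0x12fcff, while a request for the first word maps to 0x12fcec..0x12fcef and would no longer select the write at 395359.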
8:30AM 11/04/2013
[1] Analysis and algorithm design. [1.5 hr]
    Apparently, the data itself is not affected by any user-level data (could user-level control flow affect its value? -- it's a question). It does not directly impact any user-level data, but it does affect the path inside the function call.
    Our question should formally be framed as: if the function call is replaced with NOP, would it affect the execution of the dependee???? This depends on how many memory references we need to check.
    Examination of all dependencies below
    -- has mem dependency at 399186 -- depended by internal syscall instructions (internal)
    -- has mem dependency at 399156 -- internal (mem addr e1126480) xadd lock like
    -- has mem dependency at 399122 -- internal (mem addr e117f798) xadd lock like, similar
    -- has mem dependency at 399092 -- internal (mem addr 0x81d803c0) xadd lock like
    -- has mem dependency at 399051 -- internal (lock 0x800ca300)
    -- has mem dependency at 399038 -- internal (0x8055a540) looks like counter of lock
    -- has mem dependency at 398790 -- internal (0x81d80447)
    -- has mem dependency at 398774 -- internal (0x81d805d0)
    -- has mem dependency at 398674 -- internal (looks like a lock inc and dec)
    -- has mem dependency at 398592 -- internal (looks like a global const, no write 0x80042004)
    -- has reg dependency at 398586 -- INTERNAL REG DEPENDENCY ON CR0!!! (switch cr0 and back and forth) -- preserve CR0, may add CR0 to the preserved registers -- will not solve the problem. -- NEED TO CHECK LATER!!!!
    -- has mem dependency at 398551 -- some global var internal (0x8055196c)
    -- has mem dependency at 398546 -- internal (0x8055196c)
    -- has mem dependency at 398501 -- internal
    -- has mem dependency at 398500 -- internal
    -- has mem dependency at 398360 -- internal
    -- has mem dependency at 398359 -- internal
    -- has mem dependency at 398338 -- internal
    -- has mem dependency at 398182 -- internal looks like lock
    -- has mem dependency at 397988 -- internal
    -- has mem dependency at 396514 -- internal
    -- has mem dependency at 396507 -- internal
    -- has mem dependency at 396504 -- internal looks like a counter
    -- has mem dependency at 396501 -- internal
    -- has mem dependency at 395364 -- ins @402543: mov [ebp-0x211], al !!! looks like preparing some internal data structures, but it is read by scanf
    -- has mem dependency at 395359 -- *** also in range @4025xx
    -- has mem dependency at 394560 -- ***** depended by 400579 check why.???
    -- has mem dependency at 394405 -- similar to above
    -- has mem dependency at 394403 -- similar to above
    -- has mem dependency at 394274 *** similar to above but in @7c range
    -- has mem dependency at 394170 *** similar
    -- has mem dependency at 394166 *** similar
    -- has mem dependency at 392979 ***
----------------------------------------------------------------
Solution???? hard.
[1] register protection list. Add cr0. Need to actually collect the cr0 value. This is doable.
    printf (392742 @401022 --> 399467 @0x40112b)
[2] inaccuracy problem. Solvable.
[3] internal lock problem (hard to solve) - ok.
    call printf
        acquire(lock) ... release(lock)
    ret
    call scanf
        acquire(lock) ... release(lock)
    ret
    If the value of the lock is preserved, then we can skip the printf.
    Then we need a two-pass process. In the first pass we run the slice analysis and record:
    [1] the instructions that write to the lock [releases]
    [2] search along the call and find those that first write to the lock [acquires]
    [3] record these two addresses (instructions)
    In the second pass, whenever these instructions are encountered, record the memory content and patch it into the instruction record. In the slice analysis, we could add the address of the preserved memory to the call/ret pair.
[4] internal data structures that ARE modified during the call. Solvable. E.g., 396654. It could be some global counter of a running stats collector. It has no influence on user code; however, its value is affected indirectly (control dependency, because one more call would add 1 to the counter). It does not have any influence on the user code (i.e., it has no real data dependency), and does not really have any control influence. In this case, any prior value (to be depended on) would not affect any of the control path or user code execution. This is also solvable.
NOTE *** printf (392742 @401022 --> 399467 @0x40112b)
Another Example: 396514. It looks like a part of some data structure (a pointer). Check its data source:
    396514, ins @804dc315: mov [eax], ecx, write: (start: 0x81f1e880, end: 0x81f1e883)
    1. addr eax is from esi @ 396459 --> pop esi --> 396450 push esi -> 396420 ecx -> eax -> 396411 -> 0x81fde888 -> 299218. ... lost trace. Seems not related to user code.
    2. content ecx is from -> 396509 (0xf74dbc68) -> ebp-10 (seems to be a reserved memory buffer address in a local buffer).
    Check its influence: 396514. 396530 -> 396577 -> 396580 (jnz), might affect control flow. !!!
    Check why it's used at 403329:
        timeStamp: 403329, ins @804dc256: cmp [ecx+0x60], 0x00
    It's used by a jz. ecx comes from some global memory addr. It may be preserved by the function though.
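The "if the value of the lock is preserved, then we can skip the call" rule discussed above reduces to comparing what a set of addresses holds at the call and at the return. The following is a minimal sketch under the assumption that the second pass yields one address-to-value snapshot at each of the two points; `Snapshot`, `isPreserved`, and `canSkipCall` are hypothetical names, not functions from the code base.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Memory contents (address -> 32-bit value) captured at one point in the trace.
using Snapshot = std::map<std::uint32_t, std::uint32_t>;

// An address is preserved if it was captured at both points and holds the
// same value at the return as at the call (e.g. a lock that was acquired
// and released inside the function).
inline bool isPreserved(const Snapshot& atCall, const Snapshot& atRet,
                        std::uint32_t addr) {
    auto a = atCall.find(addr);
    auto b = atRet.find(addr);
    return a != atCall.end() && b != atRet.end() && a->second == b->second;
}

// The call can be left out of the slice only if every address that later
// instructions depend on is preserved across it.
inline bool canSkipCall(const Snapshot& atCall, const Snapshot& atRet,
                        const std::vector<std::uint32_t>& dependedAddrs) {
    for (std::uint32_t addr : dependedAddrs) {
        if (!isPreserved(atCall, atRet, addr)) return false;
    }
    return true;
}
```

An address missing from either snapshot is treated as not preserved, which errs on the side of keeping the call in the slice.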
_________________________________________________________________ _________________________________________________________________ [1] Implementation Plan: Solve the address preservation problem first, and then accuracy, and then the register preservation cr0. 9:00 11/05/2013 ------------------------------------------------------------------------------- Task 173: Check Function Address Resolution (estimated 8 hrs of work). Will solve register and address preservation. ------------------------------------------------------------------------------- [1] define class recordRequestProcessor, and serialization[90 min] instrAddr flag for memwrite or register_value short register_value keep an internal cache. 7:30AM 11/06/2013 [2] experiment with mem addr reading, find a function call (printf) and read out the value of addresses at the entry and exit. DONE. NOTE *** printf (392742 @401022 --> 399467 @0x40112b) ts: 392742: @401022 399156 -- internal (mem addr e1126480) xadd lock like, @804d91b9 399122 -- internal (mem addr e117f798) xadd lock like, similar, @804e2a25 399092 -- internal (mem addr 0x81d803c0) xadd lock like, @804d91b9 399467 -- @0x40112b Idea: at any of these four EIPs, print out the values at e1126480, e117f798, and 81d803c0. Do it in ops_sse.h. 9:15AM 11/06/2013 [2.1] implementation modify ops_sse.h. [20 min] DONE. [2.2] interprest the results [15 min] DONE. Observation: [1] collection has to be done in the right context. At the beginning or end of call it does not work. [2] the values are different at each run! The first value changes 3->2. The second value can be a very large number. But 2nd and 3rd value is preserved. 11:45AM [3] define GEN_RECORD_REQUEST mode in config.txt and in BatchAnalyzer parse it [20 min] DONE. [3.1] test if GEN_PRESERVE_RECORD_REQUEST mode is set. [10 min] DONE. 
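The record request described in Task 173 [1] (an instruction address plus the registers and memory ranges to capture there, with serialization for the cache) could be laid out roughly as below. This is a sketch only: the field names, the little-endian 32-bit encoding, and the `MemRequest` struct are assumptions made for illustration, and the actual recordRequestProcessor format may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One memory range to capture. Hypothetical layout.
struct MemRequest {
    std::uint32_t addr;
    std::uint32_t size;
};

// What to record when execution reaches instrAddr.
struct RecordRequest {
    std::uint32_t instrAddr = 0;
    std::vector<std::uint32_t> regs;  // register ids to capture
    std::vector<MemRequest> mems;     // memory ranges to capture

    void addRegRequest(std::uint32_t reg) { regs.push_back(reg); }
    void addMemRequest(std::uint32_t addr, std::uint32_t size) {
        mems.push_back({addr, size});
    }

    // Layout: instrAddr, reg count, regs..., mem count, (addr, size)...
    std::vector<std::uint8_t> serialize() const {
        std::vector<std::uint8_t> buf;
        put32(buf, instrAddr);
        put32(buf, static_cast<std::uint32_t>(regs.size()));
        for (std::uint32_t r : regs) put32(buf, r);
        put32(buf, static_cast<std::uint32_t>(mems.size()));
        for (const MemRequest& m : mems) {
            put32(buf, m.addr);
            put32(buf, m.size);
        }
        return buf;
    }

    // Returns false on a truncated buffer or on trailing bytes.
    static bool deserialize(const std::vector<std::uint8_t>& buf,
                            RecordRequest* out) {
        RecordRequest r;
        std::size_t pos = 0;
        std::uint32_t n = 0;
        if (!get32(buf, &pos, &r.instrAddr) || !get32(buf, &pos, &n)) return false;
        for (std::uint32_t i = 0; i < n; ++i) {
            std::uint32_t reg = 0;
            if (!get32(buf, &pos, &reg)) return false;
            r.regs.push_back(reg);
        }
        if (!get32(buf, &pos, &n)) return false;
        for (std::uint32_t i = 0; i < n; ++i) {
            MemRequest m = {0, 0};
            if (!get32(buf, &pos, &m.addr) || !get32(buf, &pos, &m.size)) return false;
            r.mems.push_back(m);
        }
        if (pos != buf.size()) return false;
        *out = r;
        return true;
    }

private:
    // Appends v as 4 little-endian bytes.
    static void put32(std::vector<std::uint8_t>& buf, std::uint32_t v) {
        for (int i = 0; i < 4; ++i) {
            buf.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xffu));
        }
    }
    // Reads 4 little-endian bytes at *pos and advances *pos.
    static bool get32(const std::vector<std::uint8_t>& buf, std::size_t* pos,
                      std::uint32_t* v) {
        if (*pos + 4 > buf.size()) return false;
        *v = 0;
        for (int i = 0; i < 4; ++i) {
            *v |= static_cast<std::uint32_t>(buf[*pos + i]) << (8 * i);
        }
        *pos += 4;
        return true;
    }
};
```

A fixed byte order keeps the cache file readable regardless of host endianness; a round trip through serialize() and deserialize() should reproduce the same request.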
7:30PM [4] modify processFunction()A [4.1] in Trace define rrProcessor (record_request) processor it should be actually static and a cache for it [5 min] DONE [4.2] in Trace create init_record_request() will remove or create a new record request. The cache path should be related to job only. (instead of trace). [25 min] DONE. [4.3] in BatchAnalyzer when starting a job, calling init_record_request [10 min] DONE. [4.4] in Trace define load_rrProcessor() to load it from cache [20 min] DONE. [4.5] in Trace constructor, load the rrProcessor. [10 min] DONE. 7:30AM 11/07/2013 [4.6] in processFunction register dependency, just create two requests for call and return instruction to add collection request. [20 min] DONE. 9:30AM [4.7] in processFunction handle the memory dependency. [4.7.1] in Trace define findLastTSWriteTo(long long int tsSearchPoint, unsigned int) [15 min] DONE. [4.7.2] update the processFunction correspondingly [15 min] DONE. 7:00PM. [4.7.3] debug into the memory handling [15 min] (1) check the "should have only one register being read" problem. DONE. Sliceat: Timestamp: 404629 0x40103c printf (392739 @401022 --> 399455 @0x40112b) temporarily disabled it. (2) check the findLatestTSWriteTo [10 min] DONE. (3) check the call in processFunction [10 min] two bugs: size and mrWrite. DONE. [4.8] re-implement the RecordRequestProcessor [4.8.1] implement RecordRequest class [1 hr] (1) data members and constructor and destructor [10 min] DONE. (2) addRegRequest(unsigned int reg) [8 min] DONE. (3) addMemRequest() [5 min] DONE. (4) serialize() [10 min] DONE. (5) deserialize [10 min] DONE. (6) code inspection [10 min] [4.8.2] RecordRequestprocessor [1] reorganize data and public interface [10 min] DONE [2] add eipToId [8 min] DONE [3] destructor [5 min] DONE. 
7:30AM 11/08/2013
    [4] addReg [8 min] DONE
    [5] addMem [8 min] DONE
8:30AM
    [6] saveToCache [15 min] DONE
    [7] loadFromCache [15 min] DONE
    [8] set up unit test framework [1 hr]
        design: mod3: 0: reg only, 1: mem only, 2: both reg and mem
        [8.1] fix bug in getId DONE
        [8.2] fix bug in loadFromCache. DONE
        [8.3] fix deserialize bug. DONE.
        [8.4] fix another deserialize bug. DONE.
10:00AM
[5] Misc testing
    [1.1] delete RecordRequestProcessor when the job is completed. DONE.
    [1.2] fix bug in processFunction. DONE
    [1.3] Now the problem is that the processing is greatly slowed down. Found that searchForLatestWrite is the most time consuming.
        Idea: in full_slice_all_soc only slice it when it is greater than the last size.
        Sliceat: Timestamp: 404629 0x40103c
        printf (392739 @401022 --> 399455 @0x40112b)
        * problem: when findLatestWrite searches for -1, it is very time consuming. And there are lots of them.
        Idea: have a WriteLink in InstrExecRecorder. Whenever it is writing, update its writeLink to point to lastWrite based on the Cache.
11:45AM
[6] Improve the findLatestWrite efficiency
    [1] similar to arrDependLink, add arrWriteLink and a counter [5 min] DONE.
    [2] serialization and deserialization of InstrExecRecorder and unit test it [15 min] DONE.
    [3] add code for handling arrWriteLink [15 min] DONE.
    [4] generate the full trace and test [10 min]
        [1] fix counter problem.
        [1] Problem. appendCache ..
7:30PM
        [2] Solve problem appendCache. Save the entire cache first. DONE.
        [3] generate the full trace. DONE
        Sliceat: Timestamp: 404629 0x40103c
        printf (392739 @401022 --> 399455 @0x40112b)
    [5] update the logic of findLatestWrite and debug it [30 min] DONE.
    [6] debug findLatestWrite
        [1] fix isWriteTo [5 min] DONE.
        [2] Problem: did not call save_... disable the Util::error_exit. Fixed.
NOW gen_mem request list:
---------------------
8:30AM 11/09/2013
-------------------------------------------------------------------------------
Task 174: Continue on preservation of mem and register.
-------------------------------------------------------------------------------
[0] Algorithm Design [0.75 hr]
9:15AM
[1] define boolean flag bNextInstrSetRecordRequest, int rr_id, unsigned int reg_to_record [], int reg_count, unsigned int mem_to_record [], int size [], int mem_count [10 min] DONE.
[2] in Trace::handle_instr set bNextInstrSetRecordRequest [10 min]
[3] debug if the setNextInstrSetRecord is ok. [80 min]
    [1] problem with vpage table build-up again. Rebuild from scratch. Solved.
    [2] the Trace.cc:1534 is never hit, the problem is that init_rr_processor is called. Solved.
    [3] branch mode error. Rebuilt.
    [4] new stats
        Sliceat: Timestamp: 404629 0x40103c
        printf (392742 @401022 --> 399467 @0x40112b)
    [5] check bug on Trace.cc:1360 (replace Error_exit with log msg). Solved.
    [6] fix bug on save_rr. DONE
    [6] check if Trace.cc:1544 is hit in 0 mode. Fix the load_rr problem. NOW completely working.
11:30AM
[4] in Trace::handle_instr set the record to request [20 min]
    [1] add protected Trace::setRecordRequest(int id) [15 min] DONE.
    [2] call it in Trace::handle_instr [5 min] DONE.
    [3] debug and verify if everything is ok. [10 min] DONE.
9:30AM 11/10/2013
[5] provide function get_record_request [30 min]
    [1] define isNeedRecord() and pass it to TraceManager -> Trace [20 min] DONE.
    [2] debug to see if isNeedRecord is ever hit. [15 min] DONE.
10:45AM
    [3] define get_record_request(int *pEIPToCollect, int eipcollectPoint, int *pCountRegRequest, int *pCountMemRequest, unsigned int **pArrReg, unsigned int **pArrMem, int **pArrMemSize); [30 min]
    [4] debug both. [20 min] DONE.
12PM
=======================================================
8:45AM 11/11/2013
[5] Solution for collecting register value
    [1] right after the call to disas_insn, call collect_reg_value(env, pc_ptr); [5 min] DONE.
    [2] declare gen_save_regs_after_instr(env, pc_ptr) similar to gen_save_esp [5 min] DONE.
[3] in handle.h and handle.cc define isNeedReg(unsigned int eip, env->cr[3]) and map it to TraceManager and Trace [15 min] DONE. [4] update gen_save_regs_after_instr correspondingly [5 min] DONE. [5] debug and verify if it is ok [10 min] [5.1] fix buf_limit problem. OK. now 9:50AM [6] Fix the reg translation problem. [1] find out where R_ESP and others are defined. cpu.h [5 min] DONE. Problems now, the regs save only handles the first 6 registers. What about the others? [1hr] (1) flags is computed at run time using gen_compute_eflags(temp_reg) (2) dr registers. generated using gen_helper_movl_drN_T0 (3) cr registers. gen_helper_read_crN (4) xmm registers tcg_gen_st32_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,xmm_regs[reg].XMM_L(0))); etc. So will need a big switch case to handle the registers 11:10AM [2] define a storage of registers in env [8 min] DONE. [3] define translator table reg_part_to_whole in trace.h[15 min] DONE. [3] update RequestRecord::addRegRequest, use the part_to_whole table [10 min] DONE [4] debug addRegRequest and regenerate everything[15 min] DONE. 
Sliceat: Timestamp: 404658 0x40103c printf (3927771 @401022 --> 399496 @0x40112b) Generated dpeendendies: -GenRecordRequest for mem dependency at 399215 --GenRecordRequest for mem dependency at 399185 --GenRecordRequest for mem dependency at 399151 --GenRecordRequest for mem dependency at 399121 --GenRecordRequest for mem dependency at 399080 --GenRecordRequest for mem dependency at 399067 --GenRecordRequest for mem dependency at 398819 --GenRecordRequest for mem dependency at 398803 --GenRecordRequest for mem dependency at 398703 --GenRecordRequest for mem dependency at 398621 --GenRecordRequest for reg: 49 dependency at 398615 --GenRecordRequest for mem dependency at 398580 --GenRecordRequest for mem dependency at 398575 --GenRecordRequest for mem dependency at 398530 --GenRecordRequest for mem dependency at 398529 --GenRecordRequest for mem dependency at 398389 --GenRecordRequest for mem dependency at 398388 --GenRecordRequest for mem dependency at 398367 --GenRecordRequest for mem dependency at 398211 --GenRecordRequest for mem dependency at 398017 --GenRecordRequest for mem dependency at 396543 --GenRecordRequest for mem dependency at 396536 --GenRecordRequest for mem dependency at 396533 --GenRecordRequest for mem dependency at 396530 --GenRecordRequest for mem dependency at 395393 --GenRecordRequest for mem dependency at 395388 --GenRecordRequest for mem dependency at 394589 --GenRecordRequest for mem dependency at 394434 --GenRecordRequest for mem dependency at 394432 --GenRecordRequest for mem dependency at 394303 --GenRecordRequest for mem dependency at 394199 --GenRecordRequest for mem dependency at 394195 --GenRecordRequest for mem dependency at 393008 [5] define get_regs_to_record and propgate to trace [30 min] [6] update the save_regs_to_table, and call get_regs_to_record and debug it [30 min] 9:20AM 11/12/2013 [7] debug the get_regs_to_record. [problem 1]. recorded data not right. Regenerate. [20 min] Fixed [problem 2]. not the entire reg is added. 
fixed
  [problem 3]. regenerate the trace. DONE.
11:00 AM
[7] big switch cases [1 hr]
  [7.1] handle EFLAGS [20 min] DONE
    (1) observe translate.c for handling of EFLAGS, read pushf [10 min]
    (2) add the code [10 min]
  [7.2] handle EAX to EDI. [10 min]
  [7.3] handle CR registers. [20 min] DONE.
    (1) observe translate.c [10 min]
    (2) add and debug [10 min]
  [7.4] fix DR register. DONE.
  [7.5] cs to es [15 min] DONE
  [7.6] ldtr [15 min] observe from sldt instruction.
  [7.7] gdtr [10 min] observe sgdt instruction. DONE.
7:00PM
[8] update the get_record_request call in ops.seh [20 min] DONE
[7] debug into the record_request [15 min]
  [1] check first 5 mem read. DONE.
  [2] check register values. Problem 804b00e1.
8:30AM 11/13/2013
[8] debug problem. 0x804b00e1 is not handled in translate.c, because it is part of the OS routine, and it is already translated and in buffer before the process is loaded.
9:00AM
  Fix: develop static version of check reg [20 min] DONE.
[9] debug again, check register values. [15 min] DONE.
9:40AM
[6] send the value.
  [6.1] create an event to inform that a value is coming in. [30 min] DONE.
    [1] add the event [8 min] DONE.
    [2] handle the event [12 min] DONE.
    [3] debug [8 min] DONE.
10:30AM
  [6.2] declare class RecordValueProcessor, constructor takes parameters of regCount, memCount, regValueArr, [estimate: 128 min]
    [6.1] add attribute rr_id_processed and copy rr_id [5 min] DONE
    [6.2] add and declare class RecordValueProcessor and add the .cc file [8 min] DONE
    [6.3] constructor and data members, add and initialize cache in Trace [25 min] DONE
11:30AM
    [6.4] addRecord(int rrid, int regCount, int memCount, ...) [15 min] DONE.
    [6.4] serializeTo [10 min] DONE.
    [6.5] deserializeFrom [10 min] DONE.
    [6.7] function getRegValue() [15 min] DONE.
    [6.8] function getMemValue() [15 min] DONE.
    [6.9] call addRecord in Trace.cc and update addRecord [10 min] DONE.
    [6.10] destroy rvRecord and save everything [8 min] DONE.
7:30PM
    [6.11] call getRegValue and getMemValue at process function. [80 min]
      [1] add the TS information into the request [15 min] DONE.
      [2] add loadFromCache(long long int idx) [10 min] DONE.
      [2] declare RecordValueProcessor::loadByTS(ts, int idxHint) [15 min] DONE.
      [3] call getRegValue in processFunction [10 min] DONE.
      [4] call getMemValue in processFunction [10 min] DONE.
------------------ TO DO.
9:00AM 11/14/2013
-------------------------------------------------------------------------------
Task 175: Test and verify the handling of MEM and REG dependency
-------------------------------------------------------------------------------
[1] unit test the getRegValue and getMemValue of recordValueProcessor. [90 min]
  *1. index calc. FIXED.
  *2. reg. FIXED.
  *3. loading reg err. FIXED.
  *4. expected reg value. FIXED.
  *5. expected mem value. FIXED.
[2] check recording buf exceeded problem. Instruction 0x7c910c71 might be copying more than 6k bytes!!! At this moment, we could only add a limit.
  timeStamp: 11760, ins @7c910c71: repz stos es:[edi], eax write: (start: 0x140688, end: 0x141e87) , DEPLINKS: , R: 11735 , R: 11757 , R: 11759
  Solved. Capped at 0.5k.
[3] check how recordValueProcessor is saved. break on Trace destructor. b Trace.cc:232
  recorded about 29k records. Data recorded about 1.3MB. Instruction exec history: about 14MB.
[4] check full trace how recordValueProcessor is loaded. b Trace.cc:80 DONE.
[5] check branch trace how recordValueProcessor is loaded. did not hit. check when the trace is constructed.
  Sliceat: Timestamp: 404664 0x40103c printf (392771 @401022 --> 399496 @0x40112b)
[6] problem: loadTS error. At first function call. search problem. 404659 is not included but 404660 and 404658 are included. It seems that the recording of rr_ts is wrong.
8:30PM
  Fix in Trace.cc (rr_id assignment + 1). recollect raw trace, full trace, and branch.
***************************
  Sliceat: Timestamp: 404569 0x40103c printf (392682 @401022 --> 399407 @0x40112b)
[7] fix the register conversion problem. FIXED.
[8] cannot find record for 399408.
  fix eipcall-1. regenerate raw, full, branch trace. from record mode.
9:00AM 11/15/2013
[9] complaining cannot find 404651 (first processFunction). regenerate the raw, full, branch trace.
  Sliceat: Timestamp: 404627 0x40103c //printf (392771 @401022 --> 399496 @0x40112b)
  Problem 1: search does not work. Broke the 4th time. It's caused by the ts+2 problem. See problem 2.
  Problem 2: the rr_ts is the actual ts + 2. BP on Trace.cc:1614 and then 2100. and then 2502. Still the +1 problem. check the ts match eip problem. Solved.
[10] problem of mismatched size. The first call of getMemValue has a problem. timestamp 403992 needs 28 bytes. The latest write to the same address is 402652, which writes 4 bytes. Then it tries to load 28 bytes from the 4-byte write. This caused the problem. We will fix this using more accurate memory tracing later. At this moment take the smaller one.
11:00AM 11/17/2013
[10] solve mismatched size. For any size>4, directly return true (means matching values fails, hasDataDependency). DONE.
[11] check mem case. --> problem. But the values seem ok.
--------------------- TO DO ----------------------
[13] another problem, cannot find rv record. Occurs in one load_vm, however, not the first. The problem is that 401550 is not recorded (eip: 804e53f1). Record again and see if it re-occurs.
  timeStamp: 395329, ins @402543: mov [ebp-0x211], al
  timeStamp: 384535, ins @7c911533: call 0xFFFFF699 write: (start: 0x12fd0c, end: 0x12fd0f) , ESP: 0x12fd10 -> 0x12fd0c , DEPLINKS: , R: 384534 and ESP value: 0x12fd0c
[11] check reg case --> problem. The recorded value seems not right (all 0's), but locating the ts is successful.
  Plan: debug the entire process again. check the memory problem first.
11/18/2013 8:30AM
--------------------------------------------------------------------------------------
Task 176: continue debugging the value recording system
--------------------------------------------------------------------------------------
[1] error in size mismatch to record.
  Caused by writing 1 or 4 bytes. Fix: take the minimum and print a warning. DONE.
[2] another problem, cannot find rv record. Occurs in one load_vm, however, not the first.
  [1] modify the code to bool [15 min]
  [2] check how often it occurs [10 min]
    [1] first occurrence at 384535, it's a call; second time: 384579, another call. check how often it occurs. 80 in 995 times. Around 10% of the cases, they are not hit. Could be any instruction. Temporarily marked as hasDataDependency: true. Handle it later.
11:00AM
[3] problem: case skip should not occur. Seems the problem is identifySOC. set breakpoint on socmanager.cc:133 if tsEnd=356225 --> so it seems that identifySOC is not the problem. It's the mergeSOC. Let it run and continue and read the log.
  Problem: 356197 addsoc(356197, 356197) merges with 356198->356225 and 356197 is a RET. check addSOC condition tsStart==356197 and see how it's appended. Found the problem. It directly sets the SOC even if it is a RET instruction. In this case, we add another branch to handle it. DONE.
[4] message "ts xxx should have only one register ..." seems suspicious, the ts value is too large.
8:00PM
[5] now the problem: extremely slow. Need to modify the recordValueProcessor to add a cache.
  [5.1] declare a CachedMap tsToId in trace.h [5 min] DONE.
  [5.2] in loadByTs, use tsToId to find the id. [10 min] DONE
  [5.3] move constructor, and establish CachedMap [12 min] DONE
  [5.4] unit testing [5 min] DONE.
  [5.5] debug into loadByTs [15 min] DONE.
  [5.6] still too slow. RV_SIZE too small. need to regenerate. --> solved.
  Solved!
---------------- to do --------------------
8:45am 11/19/2013
[4] check printf, why it's included.
  Sliceat: Timestamp: 404589 0x40103c printf (392705 @401022 --> 399427 @0x40112b)
  DUMP below (note: register handling is wrong. only check mem!)
  --processFunction ts 399427
  -- has mem dependency at 398750 on 0x81ef8417, first bytes: 4 and 2 [1 byte reading!!! check] depended by 403714
     It's the same piece of code called in printf and scanf. It's impacting control flow in the interrupt handler.
  -- has mem dependency at 398319 on 0x81fde888, first bytes: ffffff90 and ffffffe0 depended by 401022
  -- has mem dependency at 398142 on 0x8068ceb4, first bytes: ffffffed and ffffffef depended by 400900
--------------------------------------------------------------------------------------
Task 177: check register handling.
--------------------------------------------------------------------------------------
[1] check the cr0 register changing instruction.
  [1] get the addr of the instruction: timeStamp: 396649, ins @804dbfd4: mov cr0, ecx
  [2] set a conditional BP on translate.c:747 if addr== 0x804dbfd4 --> it's never hit.
[2] check if the cr0 modification instruction is getting recorded
  check timestamp 399649 --> it's not recorded, because it's not the LAST! Got to search backward.
7:30AM 11/20/2013
[3] new data.
  Sliceat: Timestamp: 404589 0x40103c printf (392705 @401022 --> 409220 @0x40112b)
[4] check the cr0 register changing instruction.
  timeStamp: 401615, ins @804dbfd4: mov cr0, ecx
[5] check eip 0x804dbfd4 in Trace::processFunction and see what's happening. It is recorded for 0x7c90e3eb and 0x7c90eb94 (eipCall and eipRet). The same function call appeared 3 times. It seems that the mod cr0 instruction also occurred 3 times inside the function.
8:45AM
[6] now in non_request mode, check translate.c and see if these two instructions (0x7c90e3eb and 0x7c90eb94) are ever recorded. [20 min]
  bp on lines 748 and 756 of translate.c
  Then bp on Trace::handle_instr
  It seems that we need to wait long enough to hit 748 and 756
  set BP on Trace::handle_instr if addr==0x7c90e3eb || addr==0x7c90eb94 && this->execRecorder>390000 && ts<4020000
  7c90e3eb -> 7c90e8eb (bNext.. is set) (rr_id is 9), then bp on ops_sse.h:2518 (it is hit), problem is that env->arrReg[xxx] are all 0's.
  Problem: env->arrRegs never gets a value other than 0.
[7] check the code which is embedded in translate.c
  bp on gen_save_reg_to_env. It seems that the code is doing the job. Need to check the translated code.
10:15AM
[8] bp on Trace::handle_instr for 0x7c90e3eb and trace into its real instructions. See if we got 9 similar code segments which copy into env->arrRegs.
  [1] env->arrRegs address:
  [2] debug observation:
  Problem: code is generated, however, there is a jump which directs the control directly to the next instruction.
11:20AM
[9] Redesign: move gen_save_regs before the beginning of each instruction.
  [1] @instr_to_record(Trace::handle_instr) --> set bNextInstr --> if needed for reg, copy the reg to record first //mem addr later because we do not know it yet
  [2] the execution of the instruction performs the copy of registers (these are register values before the instruction)
  [3] @instr_next, set the memory addr to copy
  [4] @instr_next before its execution, copy registers and mem values.
11:30AM
  [5] implementation:
    (1) patch documentation [10 min] DONE.
    (2) move gen_save_regs [8 min] DONE.
    (3) @instr_to_record setbNextInstr [8 min] DONE.
    (4) check [2], [3], [4] [15 min] DONE
    (5) debug on ops_sse.h copy part [10 min] DONE. works.
    (6) debug on save record part [15 min] works.
    (7) debug on register comparison part of discharge [15 min]
        Sliceat: Timestamp: 404589 0x40103c printf (392705 @401022 --> 399427 @0x40112b)
        done works!
    (8) check the mem case. works.
    (9) read the branch slice log.
DUMP BELOW: -- RESOLVED mem dependency at 399146 on 0xe1339748, first bytes: 38 and 38 -- RESOLVED mem dependency at 399116 on 0xe153aee0, first bytes: 2 and 2 -- RESOLVED mem dependency at 399082 on 0xe1177798, first bytes: ffffffa0 and ffffffa0 -- RESOLVED mem dependency at 399052 on 0x81d545e0, first bytes: 2 and 2 -- RESOLVED mem dependency at 399011 on 0x800ca300, first bytes: 0 and 0 -- RESOLVED mem dependency at 398998 on 0x8055a540, first bytes: 1 and 1 -- has mem dependency at 398750 on 0x81d54667, first bytes: 4 and 2 -- RESOLVED mem dependency at 398734 on 0x81d547f0, first bytes: 0 and 0 -- RESOLVED mem dependency at 398634 on 0x81d546cc, first bytes: 0 and 0 -- has mem dependency at 398552 on 0x80042004, first bytes: ffffffe0 and ffffffe0 -- REMOVED reg dependency at 398546 for reg 49 (0 vs 0) -- RESOLVED mem dependency at 398511 on 0x8055196c, first bytes: 1 and 1 -- RESOLVED mem dependency at 398506 on 0x8055a440, first bytes: 58 and 58 -- has mem dependency at 398461 on 0xffdff124, first bytes: 70 and ffffffb8 -- RESOLVED mem dependency at 398460 on 0xffdff128, first bytes: 0 and 0 -- RESOLVED mem dependency at 398320 on 0x81e20ae4, first bytes: ffffff88 and ffffff88 -- has mem dependency at 398319 on 0x81fde888, first bytes: ffffff90 and ffffffe0 -- RESOLVED mem dependency at 398298 on 0x81fde884, first bytes: 0 and 0 -- has mem dependency at 398142 on 0x8068ceb4, first bytes: fffffff0 and fffffff2 -- RESOLVED mem dependency at 397948 on 0xe122cfd8, first bytes: ffffffe1 and ffffffe1 -- RESOLVED mem dependency at 396474 on 0x81f1e880, first bytes: 78 and 78 -- has mem dependency at 396466 on 0x81f1e88f, first bytes: 6 and 5 -- has mem dependency at 396458 on 0x81f1e853, first bytes: d and e -- has mem dependency at 395322 on 0x12fcfc, first bytes: ffffffeb and 9 -- has mem dependency at 394523 on 0x12fd00, first bytes: 4e and 0 -- has mem dependency at 394366 on 0x12fcf8, first bytes: ffffff96 and 70 ==> 7:30AM 11/21/2013 
--------------------------------------------------------------------------------------
Task 178: improve memory dependency granularity.
--------------------------------------------------------------------------------------
Idea: let's say i1 reads memory written by i2 and i2 is a movsb of a large region. i2 will have a lot of memory dependencies. When propagating from i1 to i2, the mem depend link will attach the address and size. We will keep an additional map from timestamp to memRangeManager. When propagating i2, for each mem link, it checks if the associated mem range is affected.
[1] in InstrExecRecorder, update the call of memLink correspondingly. Up to now, all mem links have the information of dependence. Optimize by saving it for movsb related only. Declare all functions necessary. [30 min]
9:30AM
[2] code inspection in InstrExecRecorder [15 min] DONE.
[3] in Trace::full_slice, when propagating the memory link, for the destination, check if it has more than 4 bytes of write; if yes, use the cache map to add the memory access [1 hr]
  [3.1] declare new attributes in dependLink DONE
  [3.2] declare Trace::isTsInSliceWritesTo DONE
  [3.3] declare Trace::tsToMrM DONE
  [3.4] update clear_in_slice_tags DONE.
  [3.5] declare updateTsInSliceWrite DONE.
  [3.6] inspect the logic again. [15 min] OK.
11:20AM
[4] work on dependLink methods [45 min]
  [4.1] setMemLink clear the addrStart and addrEnd [5 min] DONE
  [4.2] implement addMemAccess [15 min] DONE
  [4.3] update the serialization [15 min] DONE
  [4.4] unit test [10 min] DONE
  [4.5] update all access of mem link DONE.
11:50AM
[5] work on Trace functions [70 min]
  [5.1] hasMultipleWrites [15 min] DONE.
  [5.2] isTsInSliceWritesTo [10 min] DONE
  [5.2] clearTsToMrM [15 min] DONE.
  [5.3] updateTsInSliceWriteTo [15 min] DONE
  [5.4] unit test of trace [15 min]
7:30PM
    [1] fix fail on memrange test1 DONE
    [2] fix fail on memrange 2. DONE
    [3] testIsReadRegFromMem, trace loading problem.
        Problem: the call of tshasMultipleWrites (loading an old ts during the mock_mem of the latest ts causes the loss of data). The problem is with mock_mem.
7:30AM 11/22/2013
--------------------------------------------------------------------------------------
Task 179: Debugging code on improving memory dependency granularity.
--------------------------------------------------------------------------------------
[1] the mock memory problem. Idea: pass the raw trace instead,
  [1.1] add parameter to mock memory and other related functions [12 min] DONE
  [1.2] modify mock memory code [8 min] DONE
  [1.3] unit test case again [10 min] DONE
[2] regenerate raw trace still in non-request mode. [5 min]
[3] debug the generate full_trace
  [3.1] bp on 107 bug1. in the loop to enlarge size. fixed.
  full trace size: 33.5MB -> 33.6MB (nearly negligible)
  New slice criteria:
  Sliceat: Timestamp: 404572 0x40103c printf (392679 @401022 --> 399410 @0x40112b)
10:00AM
[4] debug the branch_slice part.
  [1] Trace.cc:559 DONE.
  [2] Trace.cc:631 Fix the updateTsInSlice logic. DONE.
  [3] check problem of link.tsDependee changes. DONE
  [4] fix the end and length problem. DONE.
  [5] fix the error on multiple read/write case. DONE
  [6] fix another tsCur case. DONE.
7:30PM
  [7] case 388428. Problem: timeStamp: 388428, ins @7c910c71: repz stos es:[edi], eax, write: (start: 0x323440, end: 0x323c3f) it is write only. DONE.
  [7] check bMW1 case. Fix: need to add range. DONE.
  [8] fix updateTsInSliceWrite(tsTarget...) add one more parameter. The idea is to go through each active range and apply it to the dependLink (see if there is any intersection)
7:00AM 11/23/2013
[1] debug clearTsToMrM. Verified OK.
[2] collect the dump.
  Sliceat: Timestamp: 404572 0x40103c printf (392679 @401022 --> 399410 @0x40112b)
10:00AM
[3] check the rest of the unresolved case.
The following are the dependencies that are not resolved
--processFunction ts 399410
-- has mem dependency at 398733 on 0x81d54e17, first bytes: 4 and 2 --> KiAdjustQuantumThread
-- has mem dependency at 398535 on 0x80042004, first bytes: ffffffe0 and ffffffe0 --> SwapContext
-- has mem dependency at 398444 on 0xffdff124, first bytes: ffffffb8 and 20 --> KiUnlockDispatchDatabase
-- has mem dependency at 398347 on first write on 0x81f1e88e --> KiUnwaitThread
-- has mem dependency at 398344 on first write on 0x81f1e853 --> KiUnwaitThread
-- has mem dependency at 398341 on 0x81f1e88f, first bytes: 1 and 6 --> KiUnwaitThread
-- has mem dependency at 398293 on 0x81fde888, first bytes: ffffffe0 and 28 -> KiUnlinkThread
-- has mem dependency at 398116 on 0x8068ceb4, first bytes: ffffffe8 and ffffffea
[4] check timestamp 399410
  It has a lot of similar dec/cmp patterns at 0x804e3bc9 (also there is a check of jge with 4). Use WinDbg to check ==> 0x804e3bc9 is part of the function nt!CcWriteBehind (wrong. checked the code, not right)
  Search for the bytecode of the assembly.
  kd> s -b 80000000 88000000 fe 49 6f 8a
  804f91cd fe 49 6f 8a 41 6f 84 c0-7f 51 2a 51 6e 8b 41 44 .Io.Ao...Q*Qn.AD
  805486e0 58 87 54 80 00 00 00 00-1c 87 54 80 78 bb 65 80 X.T.......T.x.e.
  80673008 fe 49 6f 8a 00 00 00 00-1c 87 54 80 78 bb 65 80 .Io.......T.x.e
  Disassemble it to verify: So it's a part of KiAdjustQuantumThread
  nt!KiAdjustQuantumThread+0x19:
  804f91cd fe496f dec byte ptr [ecx+6Fh]
  804f91d0 8a416f mov al,byte ptr [ecx+6Fh]
  804f91d3 84c0 test al,al
  804f91d5 7f51 jg nt!KiAdjustQuantumThread+0x74 (804f9228)
  nt!KiAdjustQuantumThread+0x23:
  804f91d7 2a516e sub dl,byte ptr [ecx+6Eh]
  804f91da 8b4144 mov eax,dword ptr [ecx+44h]
  804f91dd 8a4063 mov al,byte ptr [eax+63h]
  804f91e0 feca dec dl
  804f91e2 3ad3 cmp dl,bl
  Set a BP on it, it's called many times. Set a BP on scanf first (in b20.exe) and then set a BP on KiAdjustQuantumThread. List stack frame.
00 ba947a78 804f97e6 nt!KiAdjustQuantumThread+0x19 01 ba947ab4 bf8aec51 nt!KeWaitForMultipleObjects+0x32c 02 ba947d30 bf8c8594 win32k!RawInputThread+0x4f3 **** 03 ba947d40 bf800ff4 win32k!xxxCreateSystemThreads+0x60 04 ba947d54 8053c808 win32k!NtUserCallOneParam+0x23 05 ba947d54 7c90eb94 nt!KiFastCallEntry+0xf8 06 006dffe0 75b653d6 ntdll!KiFastSystemCallRet The C source code is shown as below: VOID KiAdjustQuantumThread ( IN PKTHREAD Thread ) /*++ Routine Description: If the current thread is not a time critical or real time thread, then adjust its quantum in accordance with the adjustment that would have occurred if the thread had actually waited. N.B. This routine is entered at SYNCH_LEVEL and exits at the wait IRQL of the subject thread after having exited the scheduler. Arguments: Thread - Supplies a pointer to the current thread. Return Value: None. --*/ { PKPRCB Prcb; PKTHREAD NewThread; // // Acquire the thread lock and the PRCB lock. // // If the thread is not a real time or time critical thread, then adjust // the thread quantum. // Prcb = KeGetCurrentPrcb(); KiAcquireThreadLock(Thread); KiAcquirePrcbLock(Prcb); if ((Thread->Priority < LOW_REALTIME_PRIORITY) && (Thread->BasePriority < TIME_CRITICAL_PRIORITY_BOUND)) { Thread->Quantum -= WAIT_QUANTUM_DECREMENT; **** //corresponds to THE instruction if (Thread->Quantum <= 0) { // // Quantum end has occurred. Adjust the thread priority. // Thread->Quantum = Thread->QuantumReset; // // Compute the new thread priority and attempt to reschedule the // current processor as if a quantum end had occurred. // // N.B. The new priority will never be greater than the previous // priority. 
// Thread->Priority = KiComputeNewPriority(Thread, 1); if (Prcb->NextThread == NULL) { if ((NewThread = KiSelectReadyThread(Thread->Priority, Prcb)) != NULL) { NewThread->State = Standby; Prcb->NextThread = NewThread; } } else { Thread->Preempted = FALSE; } } } // // Release the thread lock, release the PRCB lock, exit the scheduler, // and return. // KiReleasePrcbLock(Prcb); KiReleaseThreadLock(Thread); KiExitDispatcher(Thread->WaitIrql); return; } =================> From the disassembly we can infer that ECX points to the _KTHREAD structure and the offset 0x6F is the "quantum" field, see below: kd> dt _KTHREAD ntdll!_KTHREAD +0x000 Header : _DISPATCHER_HEADER ... +0x06e PriorityDecrement : Char +0x06f Quantum : Char *** ... So the instruction is to decrement the quantum by 1, if quantum runs out (<0) [4] check timestamp 398535, similar approach kd> s -b 80000000 86000000 89 41 04 8b 66 28 8b 46 20 [the binary code of the 3 instructions] 80540b1f 89 41 04 8b 66 28 8b 46-20 89 43 18 fb 8b 47 44 .A..f(.F .C...G It's part of the SwapContext. Could not find more information but it should be related to modifying some thread or process structure. [5] check 398444. -- has mem dependency at 398444 on 0xffdff124, first bytes: ffffffb8 and 20 KiUnlockDispatchDatabase [6] has mem dependency at 398347, 398344 (nt!KiUnwaitThread) 80500299 004633 add byte ptr [esi+33h],al 8050029c 384e33 cmp byte ptr [esi+33h],cl 8050029f 7d03 jge nt!KiUnwaitThread+0xa2 (805002a4) nt!KiUnwaitThread+0x9f: 805002a1 884e33 mov byte ptr [esi+33h],cl nt!KiUnwaitThread+0xa2: 805002a4 c6466e00 mov byte ptr [esi+6Eh],0 [7] check 398293 on 0x81fde888, first bytes: ffffffe0 and 28 --> nt! KiUnlinkThread [8] check 398116 on 0x8068ceb4, first bytes: ffffffe8 and ffffffea --> this one could not find. Up to now we have the following!!!! All related to context switch. 
=======================================================================================================
-- has mem dependency at 398733 on 0x81d54e17, first bytes: 4 and 2 --> KiAdjustQuantumThread
-- has mem dependency at 398535 on 0x80042004, first bytes: ffffffe0 and ffffffe0 --> SwapContext
-- has mem dependency at 398444 on 0xffdff124, first bytes: ffffffb8 and 20 --> KiUnlockDispatchDatabase
-- has mem dependency at 398347 on first write on 0x81f1e88e --> KiUnwaitThread
-- has mem dependency at 398344 on first write on 0x81f1e853 --> KiUnwaitThread
-- has mem dependency at 398341 on 0x81f1e88f, first bytes: 1 and 6 --> KiUnwaitThread
-- has mem dependency at 398293 on 0x81fde888, first bytes: ffffffe0 and 28 -> KiUnlinkThread
-- has mem dependency at 398116 on 0x8068ceb4, first bytes: ffffffe8 and ffffffea --->nt!NtRequestWaitReplyPort --> it's increasing the LpcpNextMessageId
=======================================================================================================
8:30AM 11/25/2013
--------------------------------------------------------------------------------------
Task 180: Figure out whether the above are actually proactively called by printf, or whether they are part of the context switch.
--------------------------------------------------------------------------------------
[1] figure out 398116
  Found that it's part of NtRequestWaitReplyPort, and it's increasing variable LpcpNextMessageId: see the following:
  nt!NtRequestWaitReplyPort+0x550:
  80597130 83660c00 and dword ptr [esi+0Ch],0
  80597134 a1b4d96680 mov eax,dword ptr [nt!LpcpNextMessageId (8066d9b4)]
  80597139 894628 mov dword ptr [esi+28h],eax
  8059713c ff05b4d96680 inc dword ptr [nt!LpcpNextMessageId (8066d9b4)]
  80597142 750a jne nt!NtRequestWaitReplyPort+0x56e (8059714e)
  nt!NtRequestWaitReplyPort+0x564:
  80597144 c705b4d9668001000000 mov dword ptr [nt!LpcpNextMessageId (8066d9b4)],1
  nt!NtRequestWaitReplyPort+0x56e:
  8059714e 83662c00 and dword ptr [esi+2Ch],0
[2] set a BP and see how it's invoked
  [1] be on 0x401010 and 401022
  [2] be at 0x8059713c and see how many times it's invoked
  Note: use bp /p process_id 0x8059713c (otherwise there are too many distractions)
  It shows that the scanf triggers 3 calls of NtRequestWaitReplyPort
  Note that pt in WinDbg is not that reliable.
[3] study 398116 again. See how it's depended. Dump below:
  ======================================
  Reverse ID: 0, ts: 398116, Type: MEM_LINK ins @8057882c: inc [-0x7F97314C]
  Reverse ID: 1, ts: 400885, Type: MEM_LINK ins @8057882c: inc [-0x7F97314C]
  ======================================
  Sliceat: Timestamp: 404572 0x40103c printf (392679 @401022 --> 399410 @0x40112b)
  400885 is another NtRequestWaitReplyPort in printf, and it is reading the timestamp for updating the global LpcpNextMessageId. The next instruction depends on 400885 through the resulting EFLAGS value.
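
The NtRequestWaitReplyPort fragment disassembled in [1] can be read as the small C++ model below. This is only a sketch for reasoning about the dependency chain, not kernel source: `MessageIdState`, `MemLink` and `allocate_message_id` are hypothetical names, `next_message_id` stands in for nt!LpcpNextMessageId, and the reset-to-1 branch is inferred from the `jne` / `mov ...,1` pair. It illustrates why two otherwise unrelated calls end up chained by a MEM_LINK: every `inc` reads the value left by the previous `inc`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the global nt!LpcpNextMessageId.
struct MessageIdState {
    std::uint32_t next_message_id = 1;
    long long last_writer_ts = -1;  // timestamp of the latest write, -1 if none
};

// One recorded memory dependency: reader_ts read what writer_ts wrote.
struct MemLink {
    long long reader_ts;
    long long writer_ts;
};

// Mirrors the disassembly: read the id, increment the global, and reset
// it to 1 if the increment wrapped around to 0.
std::uint32_t allocate_message_id(MessageIdState& st, long long ts,
                                  std::vector<MemLink>& links) {
    assert(ts >= 0);                               // timestamps are non-negative
    std::uint32_t id = st.next_message_id;         // mov eax, [LpcpNextMessageId]
    if (st.last_writer_ts >= 0) {
        links.push_back({ts, st.last_writer_ts});  // inc reads the old value
    }
    st.next_message_id += 1;                       // inc [LpcpNextMessageId]
    if (st.next_message_id == 0) {                 // jne falls through on wrap
        st.next_message_id = 1;                    // mov [LpcpNextMessageId], 1
    }
    st.last_writer_ts = ts;
    return id;
}
```

Calling it once at ts 398116 and once at ts 400885 yields a single link (400885 reads the write of 398116), which matches the two MEM_LINK entries in the dump above.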
8:30AM 11/26/2013
--------------------------------------------------------------------------------------
Task 181: fix the back tracking algorithm
--------------------------------------------------------------------------------------
Dump below:
======================================
Reverse ID: 0, ts: 398116, Type: MEM_LINK ins @8057882c: inc [-0x7F97314C]
Reverse ID: 1, ts: 400885, Type: MEM_LINK ins @8057882c: inc [-0x7F97314C]
======================================
Sliceat: Timestamp: 404572 0x40103c printf (392679 @401022 --> 399410 @0x40112b)
[1] check the reverse_trace algorithm [10 min]
  Problem: 403090 has no reverse pointer. check how it is included.
[2] check how 403090 is included in slice. [20 min]
  Problem is the realDependee is set to -1.
[3] modify hasDataDependee and set the realDataDependee [20 min]
  The new dump is shown below:
[5] check 404571 why it's need_visit. [30 min]
  It's not via Trace->setInSlice. set a BP on ier->setInSlice
  Got to fix the reverseLinkType in clear link. --> trouble. could not get it cleared.
[6] 2nd attempt: set bp on IER::serialize. Found that type 6 is caused by SOC(404572,404572) -> setNeedControl. Update correspondingly. New dump below:
  ...
  Reverse ID: 59, ts: 404512, Type: NEED_VISIT ins @40159d: mov [ebp-0x4], 0xFFFFFFFE //mem
  Reverse ID: 60, ts: 404552, Type: REGI_LINK ins @4015a9: mov eax, [ebp-0x1C] //read eax
  Reverse ID: 61, ts: 404568, Type: MEM_LINK ins @40102f: mov [ebp-0x4], eax //read mem
  Reverse ID: 62, ts: 404571, Type: REGI_LINK ins @401038: cmp [ebp-0x4], 0x61 // needed by jz at 404572
  New problem area: 404512 is depended on by 404552, which is strange. Set bp on it. Found that it's ok. It's because of skipping function calls.
[7] check again.
====================================== reverse trace for ts: 398116 ======================================
Reverse ID: 0, ts: 398116, Type: MEM_LINK ins @8057882c: inc [-0x7F97314C]
Reverse ID: 1, ts: 400885, Type: MEM_LINK ins @8057882c: inc [-0x7F97314C]
Reverse ID: 2, ts: 403090, Type: ALL_FUNCTION ins @8057882c: inc [-0x7F97314C]
Reverse ID: 3, ts: 403937, Type: MEM_LINK ins @8056a658: repz movs es:[edi], ds:[esi]
Reverse ID: 4, ts: 404254, Type: REGI_LINK ins @7c81ac42: mov eax, [ebp-0x8C]
Reverse ID: 5, ts: 404255, Type: MEM_LINK ins @7c81ac48: mov [esi], eax
Reverse ID: 6, ts: 404279, Type: REGI_LINK ins @7c8018ce: test [ebp-0x1C], 0x01
Reverse ID: 7, ts: 404280, Type: CONTROL_LINK ins @7c8018d2: jz 0x0000003C
Reverse ID: 8, ts: 404281, Type: NEED_VISIT ins @7c8018d4: mov [ebp-0x4], ebx
Reverse ID: 9, ts: 404284, Type: CONTROL_LINK ins @7c8018dd: jnz 0x00000007
Reverse ID: 10, ts: 404285, Type: NEED_VISIT ins @7c8018e4: or [ebp-0x4], 0xFF
Reverse ID: 11, ts: 404286, Type: CONTROL_LINK ins @7c8018e8: jmp 0x00000026
Reverse ID: 12, ts: 404287, Type: REGI_LINK ins @7c80190e: cmp esi, ebx
Reverse ID: 13, ts: 404288, Type: CONTROL_LINK ins @7c801910: jge 0xFFFFFF80
Reverse ID: 14, ts: 404289, Type: NEED_VISIT ins @7c801890: xor eax, eax
Reverse ID: 15, ts: 404291, Type: ESP_LINK ins @7c801893: call 0x00000C78
Reverse ID: 16, ts: 404294, Type: REGI_LINK ins @7c802515: pop ecx
Reverse ID: 17, ts: 404299, Type: ESP_LINK ins @7c80251a: push ecx
Reverse ID: 18, ts: 404300, Type: ESP_LINK ins @7c80251b: ret
Reverse ID: 19, ts: 404301, Type: ESP_LINK ins @7c801898: ret 0x0014
Reverse ID: 20, ts: 404377, Type: NEED_VISIT ins @4094da: pop edi
Reverse ID: 21, ts: 404381, Type: CONTROL_LINK ins @4094de: ret
Reverse ID: 22, ts: 404382, Type: NEED_VISIT ins @409596: add esp, 0x0C
Reverse ID: 23, ts: 404384, Type: CONTROL_LINK ins @40959c: jmp 0x00000019
Reverse ID: 24, ts: 404385, Type: NEED_VISIT ins @4095b5: mov [ebp-0x4], 0xFFFFFFFE
Reverse ID: 25, ts: 404413, Type: REGI_LINK ins @4095c1: mov eax, [ebp-0x1C]
Reverse ID: 26, ts: 404431, Type: REGI_LINK ins @4040ef: cmp eax, 0xFF
Reverse ID: 27, ts: 404432, Type: CONTROL_LINK ins @4040f2: jz 0x00000088
Reverse ID: 28, ts: 404433, Type: REGI_LINK ins @4040f8: test [esi+0xC], 0x82
Reverse ID: 29, ts: 404434, Type: CONTROL_LINK ins @4040fc: jnz 0x00000053
Reverse ID: 30, ts: 404435, Type: MEM_LINK ins @4040fe: push esi
Reverse ID: 31, ts: 404440, Type: REGI_LINK ins @404196: mov eax, [ebp+0x8]
Reverse ID: 32, ts: 404443, Type: REGI_LINK ins @4041b2: mov eax, [eax+0x10]
Reverse ID: 33, ts: 404447, Type: REGI_LINK ins @404105: cmp eax, 0xFF
Reverse ID: 34, ts: 404448, Type: CONTROL_LINK ins @404108: jz 0x00000032
Reverse ID: 35, ts: 404449, Type: MEM_LINK ins @40410a: push esi
Reverse ID: 36, ts: 404454, Type: REGI_LINK ins @404196: mov eax, [ebp+0x8]
Reverse ID: 37, ts: 404457, Type: REGI_LINK ins @4041b2: mov eax, [eax+0x10]
Reverse ID: 38, ts: 404461, Type: REGI_LINK ins @404111: cmp eax, 0xFE
Reverse ID: 39, ts: 404462, Type: CONTROL_LINK ins @404114: jz 0x00000026
Reverse ID: 40, ts: 404463, Type: NEED_VISIT ins @404116: push edi
Reverse ID: 41, ts: 404465, Type: MEM_LINK ins @404118: call 0x00000079
Reverse ID: 42, ts: 404474, Type: ESP_LINK ins @4041b6: ret
Reverse ID: 43, ts: 404476, Type: MEM_LINK ins @404120: push esi
Reverse ID: 44, ts: 404482, Type: REGI_LINK ins @404196: mov eax, [ebp+0x8]
Reverse ID: 45, ts: 404483, Type: REGI_LINK ins @404199: test eax, eax
Reverse ID: 46, ts: 404484, Type: CONTROL_LINK ins @40419b: jnz 0x00000017
Reverse ID: 47, ts: 404485, Type: REGI_LINK ins @4041b2: mov eax, [eax+0x10] //data
Reverse ID: 48, ts: 404488, Type: NEED_VISIT ins @40412d: and eax, 0x1F //control
Reverse ID: 49, ts: 404494, Type: CONTROL_LINK ins @404138: jmp 0x00000007 //control
Reverse ID: 50, ts: 404495, Type: NEED_VISIT ins @40413f: mov al, [eax+0x4] //control
Reverse ID: 51, ts: 404498, Type: CONTROL_LINK ins @404146: jnz 0x00000009 //control
Reverse ID: 52, ts: 404499, Type: REGI_LINK ins @40414f: cmp [esi+0x18], 0x00000200 //data
Reverse ID: 53, ts: 404500, Type: CONTROL_LINK ins @404156: jnz 0x00000017 //control
Reverse ID: 54, ts: 404501, Type: NEED_VISIT ins @40416d: mov ecx, [esi] //control
Reverse ID: 55, ts: 404506, Type: CONTROL_LINK ins @404178: jmp 0x00000016 //control
Reverse ID: 56, ts: 404507, Type: NEED_VISIT ins @40418e: pop esi //control
Reverse ID: 57, ts: 404509, Type: CONTROL_LINK ins @404190: ret //function included
Reverse ID: 58, ts: 404510, Type: NEED_VISIT ins @401599: pop ecx //ok. basic block
Reverse ID: 59, ts: 404512, Type: NEED_VISIT ins @40159d: mov [ebp-0x4], 0xFFFFFFFE //because of skipping function call
Reverse ID: 60, ts: 404552, Type: REGI_LINK ins @4015a9: mov eax, [ebp-0x1C] //ok.
Reverse ID: 61, ts: 404568, Type: MEM_LINK ins @40102f: mov [ebp-0x4], eax //ok.
Reverse ID: 62, ts: 404571, Type: REGI_LINK ins @401038: cmp [ebp-0x4], 0x61 //ok.
====================================== END OF reverse trace for ts: 398116 ======================================

7:30AM 11/27/2013
--------------------------------------------------------------------------------------
Task 182: Algorithm Design how to handle printf
--------------------------------------------------------------------------------------
[1] simplified problem
    scanf
      [1] do something serious
      [2] send request
      [3] OS routine: ntWaitReplyPort -> increase id, if id<0 then ...; if id>0 then ...
    printf
      [1] do something else
      [2] send request
      [3] OS routine: ntWaitReplyPort -> increase id, if id<0 then ...; if id>0 then ...
    The global id chains the two together. Now the problem is: can we remove the printf block?
    Algorithm idea:
      [1] identify that particular INC instruction as a non-interfering instruction
      [2] incremental removal. Test removing printf and see if it affects the trace generation.
[2] check 398293 and verify if the algorithm works.
(-- has mem dependency at 398293 on 0x81fde888, first bytes: ffffffe0 and 28 -> KiUnlinkThread)
Code is shown below:
805001a6 095154 or dword ptr [ecx+54h],edx ; thread->waitStatus |= edx param
805001a9 8b415c mov eax,dword ptr [ecx+5Ch] ; eax <- waitBlockList
805001ac 56 push esi
nt!KiUnlinkThread+0x7:
805001ad 8b10 mov edx,dword ptr [eax] ; edx <- _KWAIT_BLOCK.WaitListEntry.F_LINK
805001af 8b7004 mov esi,dword ptr [eax+4] ; esi <- _KWAIT_BLOCK.WaitListEntry.B_LINK
805001b2 8916 mov dword ptr [esi],edx ***; this performs the removal. prevElement.F_LINK = cur.F_LINK
805001b4 897204 mov dword ptr [edx+4],esi ; this performs the removal. nextElement.B_LINK = cur.B_LINK
805001b7 8b4010 mov eax,dword ptr [eax+10h]
805001ba 3b415c cmp eax,dword ptr [ecx+5Ch]
805001bd 75ee jne nt!KiUnlinkThread+0x7 (805001ad)
The pseudo code can be found in ReactOS. We can infer that ecx must be pointing to _KTHREAD, thus ecx+54h is the wait status.
The mov dword ptr [esi],edx is the operation that removes a block from the waiting list.
Check how it's depended on by the instruction in printf; could not find the instruction at first.
It's the instruction at 804fb1b4!!!
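The two stores at 805001b2 and 805001b4 follow the usual pattern for unlinking a node from a circular doubly linked list. A minimal sketch of that pattern is given here, assuming a node layout with the forward link at offset 0 and the backward link at offset 4, as the listing suggests. The names ListEntry, init_list, insert_tail, unlink_entry and is_empty are hypothetical and are not the kernel's own definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for the kernel list node. On 32-bit x86 the forward
// link would sit at +0 and the backward link at +4, matching the listing.
struct ListEntry {
    ListEntry* flink;  // next entry
    ListEntry* blink;  // previous entry
};

// Make head an empty circular list (both links point back at head).
inline void init_list(ListEntry* head) {
    head->flink = head;
    head->blink = head;
}

// Append e just before head, i.e. at the tail of the list.
inline void insert_tail(ListEntry* head, ListEntry* e) {
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

// Unlink cur from its list; the comments map each step to the listing.
inline void unlink_entry(ListEntry* cur) {
    ListEntry* next = cur->flink;  // mov edx,[eax]
    ListEntry* prev = cur->blink;  // mov esi,[eax+4]
    prev->flink = next;            // mov [esi],edx
    next->blink = prev;            // mov [edx+4],esi
}

// A list is empty when the head's forward link points at the head itself.
inline bool is_empty(const ListEntry* head) {
    return head->flink == head;
}
```

Because the store through esi writes into a neighbouring node rather than into the node being removed, any later code that reads that neighbour's forward link picks up a memory dependency on the unlink.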
----------------
nt!KeReleaseSemaphore:
804fb172 8bff mov edi,edi
804fb174 55 push ebp
804fb175 8bec mov ebp,esp
804fb177 51 push ecx
804fb178 53 push ebx
804fb179 56 push esi
804fb17a 57 push edi
804fb17b ff1514774d80 call dword ptr [nt!_imp__KeRaiseIrqlToDpcLevel (804d7714)]
804fb181 8b7508 mov esi,dword ptr [ebp+8]
804fb184 8b5e04 mov ebx,dword ptr [esi+4]
804fb187 8ac8 mov cl,al
804fb189 8b4510 mov eax,dword ptr [ebp+10h]
804fb18c 8d3c03 lea edi,[ebx+eax]
804fb18f 3b7e10 cmp edi,dword ptr [esi+10h]
804fb192 884dff mov byte ptr [ebp-1],cl
804fb195 7f04 jg nt!KeReleaseSemaphore+0x29 (804fb19b)
nt!KeReleaseSemaphore+0x25:
804fb197 3bfb cmp edi,ebx
804fb199 7d0f jge nt!KeReleaseSemaphore+0x38 (804fb1aa)
nt!KeReleaseSemaphore+0x29:
804fb19b e868570400 call nt!KiUnlockDispatcherDatabase (80540908)
804fb1a0 68470000c0 push 0C0000047h
804fb1a5 e8566b0400 call nt!ExRaiseStatus (80541d00)
nt!KeReleaseSemaphore+0x38:
804fb1aa 85db test ebx,ebx
804fb1ac 897e04 mov dword ptr [esi+4],edi
804fb1af 7511 jne nt!KeReleaseSemaphore+0x50 (804fb1c2)
nt!KeReleaseSemaphore+0x3f:
804fb1b1 8d4608 lea eax,[esi+8]
804fb1b4 3900 cmp dword ptr [eax],eax *** //
804fb1b6 740a je nt!KeReleaseSemaphore+0x50 (804fb1c2)
--------------
We need to figure out what esi is. Our guess is that it is pointing to _KSEMAPHORE (thus offset 0x10 is the "limit" attribute).
esi+8 is the _KSEMAPHORE -> waitListHead!!!
So printf is using the queue structure and scanf is using it too.

1:30PM 11/29/2013
[1] Continue to figure out the logic:
    (1) is it really not preserving the value in the queue list? Or is it a bug?
[2] modify the code so that it dumps the first 4 bytes.
    Sliceat: Timestamp: 404572 0x40103c
    printf (392679 @401022 --> 399410 @0x40112b)
    -- has mem dependency at 398293 on 0x81fde888, first 4 bytes: 81e20ae0 and 81e1f928, size: 4
[3] use WinDbg to verify if the above memory recording is true.
    Information of 398293 is shown below:
    805001b2 8916 mov dword ptr [esi],edx ***; this performs the removal. prevElement.F_LINK = cur.F_LINK
[2] Use WinDbg: set bp at 0x401010, 0x401022, and 0x401027 (begin and end of call printf),
    then ba e1 on 0x805001b2 and find out the address it is writing to.
    Start the program again and check the contents at 0x401022 and 0x401027.
    [2.1] problem: 0x805001b2 is hit too many times, while in the dump log it's only hit twice during the printf call.
          Address of [esi]: 89915028, 898989e0, 89890528, 89925968, 89b3a990
          Most likely, there are too many thread switches.
          Related calls:
          nt!KiUnlinkThread --> seems to be removing all waiting objects in the thread
          baccfe6c 8050040b nt!KiUnwaitThread+0x12 --> make the thread not wait on any object, make it ready to run.
          baccfe98 804ff18c nt!KiWaitTest+0xab --> test the object waited on by threads and release them
          baccffa4 804ff34b nt!KiTimerListExpire+0x7a
          baccffd0 80540d5d nt!KiTimerExpiration+0xaf
          baccfff4 80540a2a nt!KiRetireDpcList+0x46
          baccfff8 b6ba7854 nt!KiDispatchInterrupt+0x2a --> interrupt dispatch
          Check ReactOS code:

9:00AM 12/02/2013.
--------------------------------------------------------------------------------------
Task 183: solve the printf problem
--------------------------------------------------------------------------------------
Idea: ignore instructions belonging to sysenter. The only problem is that we'll not be able to capture
interrupt handler changes made by malware. So later, this should be set as an option. But implement it first.
9:30AM
[1] Review the checkRecordLogic. [30 min]
    [1.1] when will bJustReceivedInterrupt be set to true? [15 min]
          It's triggered in seg_helper.cc::do_interrupt_all(); any interrupt will trigger this function.
          But we are not sure if sysenter will trigger it.
          [1.1.1] design a lab to verify it on 7c90eb8b and 7c90eb8d (sysenter)
                  [1] set a conditional bp on 7c90eb8b and 7c90eb8d in Trace::handle_instr
                  [2] once it's hit, set a breakpoint on handle_interrupt and see if it's ever hit.
                  Conclusion: sysenter does not trigger do_interrupt_all.
    [1.2] check bDelayedOneStep. [15 min] --> ok.
    [1.3] how is bTracePhyMem set?
          [1] bTracePhy is true and the capturing of memory is done through handle_phy_mem_access, which is called
              by handle_trace_mem for every memory access. But currently it needs bRecordEnabled.
          [2] bTracePhyMem is set to true in setPhyMemTraceMode; it currently records the cr3Modify eip.
              It is called in ops_sse.h:helper_trace2. When an instruction is found to modify CR3, the mode is set.
          [3] bTracePhyMem is set to false when the trace comes back (from the other cr3) and the eip is
              not the same as that of the cr3 switch.
[2] Problem 1: decide how to deal with cr3 switch.
    A CR3 switch only occurs in a system call when getchar() waits for another process which collects the input and
    then transfers the user input into the address space. So the original cr3 detection can be disabled.
    Need to declare a new mode in all related functions. If it is to trace kernel mode, then use the original
    idea; otherwise, bTracePhyMem is turned on at a system call.
11:00AM
[3] Implementation:
    [1] introduce a boolean variable trace_kernel in config.txt and parse it, and declare a boolean var
        in Trace for trace_kernel as an instance attribute. Need to handle all constructors and copy functions. [30 min] DONE.
    [2] in checkRecordStatus modify the algorithm. Based on opcode 134 (sysenter), enter the tracing mode.
        Based on bTraceKernel, decide whether to enable or disable bTracePhyMem. [30 min] DONE.
11:50AM
    [3] in setPhyMemTraceMode add the protection based on bTraceKernel, and in Trace::handle_instr
        (the logic for disabling bTracePhy). [30 min] DONE.
    [4] debug: verify the bTraceKernel mode first [20 min]
        Verify if syscalls are still there. DONE.
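The mode switch described in the implementation steps can be sketched as a small state update keyed on the fast system call opcodes. This is only an illustration under stated assumptions: the encodings 0F 34 (SYSENTER) and 0F 35 (SYSEXIT) are standard x86, but the type TraceMode, its fields, and the update rule in on_instruction are hypothetical simplifications and not the project's actual checkRecordStatus logic.

```cpp
#include <cassert>
#include <cstdint>

enum class FastCall { None, Sysenter, Sysexit };

// Classify a two-byte opcode: 0F 34 is SYSENTER, 0F 35 is SYSEXIT.
inline FastCall classify(std::uint8_t byte1, std::uint8_t byte2) {
    if (byte1 != 0x0F) return FastCall::None;
    if (byte2 == 0x34) return FastCall::Sysenter;
    if (byte2 == 0x35) return FastCall::Sysexit;
    return FastCall::None;
}

// Hypothetical reduced tracer state. The field names echo the flags in the
// notes, but the way they are updated below is an assumption.
struct TraceMode {
    bool trace_kernel;   // would come from config.txt
    bool in_syscall;     // true between sysenter and sysexit
    bool trace_phy_mem;  // capture memory by physical address
    explicit TraceMode(bool tk)
        : trace_kernel(tk), in_syscall(false), trace_phy_mem(false) {}
};

// Update the mode for one executed instruction.
inline void on_instruction(TraceMode& m, std::uint8_t b1, std::uint8_t b2) {
    switch (classify(b1, b2)) {
    case FastCall::Sysenter:
        m.in_syscall = true;
        // When kernel instructions are not traced, turn on physical-memory
        // capture for the duration of the system call.
        m.trace_phy_mem = !m.trace_kernel;
        break;
    case FastCall::Sysexit:
        m.in_syscall = false;
        m.trace_phy_mem = false;
        break;
    case FastCall::None:
        break;
    }
}
```

Keeping the opcode test separate from the state update makes it easy to check each part on its own, for example with a conditional breakpoint that fires only when classify returns Sysenter.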
    [5] debug: verify the bTraceKernel=false mode [30 min]
        [5.1] found that the return is not right. Should capture sysexit. opcode: 0x135
3:45PM
        [5.2] fix a problem: the code terminates too early. DONE.
        [5.3] the opcodes are not right, check again.
              [1] find the timestamp of sysenter: 987 0xf 0x34
                  sysexit: 5605 0xf 0x35
              [2] make the change. Still not working. (check opcode 2nd byte, still not working).
9:30AM 12/03/2013.
        [5.4] sysexit problem, not returning to an address that is close. SOLVED.
              Now the full dump is only 31MB, almost 60% of the original.
        [5.5] modify to include sysenter. DONE.
        [5.6] strangely only improved 7%. Check.
----------------------------
Problem: error in reading memory.
--processFunction ts 238737
ERROR in reading mem 0x12fd0f
ERROR reading mem at 237748 or 231647, set as hasDataDepend

8:30AM 12/04/2013
--------------------------------------------------------------------------------------
Task 184: check the memory reading problem on 0x12fd0f.
--------------------------------------------------------------------------------------
[1] new stats
    Sliceat: Timestamp: 240572 0x40103c
    printf (235876 @401022 --> 238737 @0x40112b)
[2] set a BP at 2377438 DONE.
[3] check data
    timeStamp: 237748, ins @402543: mov [ebp-0x211], al write: (start: 0x12fd0f, end: 0x12fd0f) , DEPLINKS: , R: 237722 and EBP value: 0x12ff20, R: 237747
    timeStamp: 231647, ins @7c911533: call 0xFFFFF699 write: (start: 0x12fd0c, end: 0x12fd0f) , ESP: 0x12fd10 -> 0x12fd0c , DEPLINKS: , R: 231646 and ESP value: 0x12fd0c
    In the branch slice, it tries to read 0x12fd0f. It's the read on 231647 that fails.
    Problem is with the read mem algorithm.
11:00AM
[4] implementation [20 min] DONE.
[5] debug [20 min]
[6] result: problem is still memory mismatch:
    has mem dependency at 237748 on 0x12fd0f, first 4 bytes: 7c and 0, size: 1
[7] check WinDbg on @402543, see its function and see why it's depended on.
    timeStamp: 237748, ins @402543: mov [ebp-0x211], al write: (start: 0x12fd0f, end: 0x12fd0f) , DEPLINKS: , R: 237722 and EBP value: 0x12ff20, R: 237747
    timeStamp: 239766, ins @7c913309: mov eax, [esi+0x20] read: (start: 0x12fd0c, end: 0x12fd0f) , DEPLINKS: , R: 239694 , M: 231647 , M: 237748 , C: 239765 ESP: 0x12fca4 EBP: 0x12fcac
    Observation:
    [a] the instruction at @402543 copies the printf string byte by byte into 0x12FD0F. (it is also accessed in the same loop)
    [b] setting a hardware breakpoint also verifies that it is actually accessed by the instruction in scanf @7c913309.
        It is a part of CsrClientCallServer.
[8] figure out the logic of @7c913309.
    In instruction @7c913309, mov eax, [esi+0x20], esi points to the PCSR_API_MESSAGE ApiMessage.
    Using WinDbg, we are able to figure out offset 0x20. From the ReactOS source code we can infer that
    offset 0x20 is ApiMessage->Status.
    The problem is: why is the byte at 0x12fd0f not overwritten by CsrClientCallServer itself?
    Reading the ReactOS documentation again --> CsrClientCallServer only sets the status when the status is a failure! So when it returns,
    it directly returns whatever ORIGINAL data is contained in ApiMessage.
    It seems that CsrClientCallServer fills in the fields of the ApiMessage and then forwards the request.
-------------------------------------------------------------------------------
Now the question is: should ApiMessage ever be INITIALIZED??? Check its value in the WinXP image.

9:00AM 12/05/2013
[9] check why ApiMessage->Status is never initialized
    [9.1] set a BP at 0x7c913309 after the call of scanf, and check what function is calling it.
          kernel32.readConsoleA -> kernel32.7c8713f9 -> CsrClientCallServer -> @7c913309 (return ApiMsg->Status which is not initialized)
          readConsoleA declares APIMessage as a local variable and then calls CsrClientCallServer. But
          APIMessage->Status itself is never initialized.
          APIMessage->Status is located at 0x12fd0c, and APIMessage is located at 0x12fcec.
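The addresses in [9.1] can be cross-checked with a little arithmetic: a 4-byte field at offset 0x20 of a structure at 0x12fcec covers 0x12fd0c to 0x12fd0f, which is exactly the range read at 239766, and its last byte is the one written at 237748. A small sketch of that check follows; ByteRange, field_range and overlaps are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Inclusive byte range, in the same style as the (start, end) pairs in the trace records.
struct ByteRange {
    std::uint32_t start;
    std::uint32_t end;
};

// Range covered by a field of `size` bytes at `offset` inside a structure at `base`.
inline ByteRange field_range(std::uint32_t base, std::uint32_t offset, std::uint32_t size) {
    ByteRange r;
    r.start = base + offset;
    r.end = base + offset + size - 1;
    return r;
}

// Two inclusive ranges overlap when each one starts no later than the other ends.
inline bool overlaps(ByteRange a, ByteRange b) {
    return a.start <= b.end && b.start <= a.end;
}
```

With the values from the notes, field_range(0x12fcec, 0x20, 4) gives 0x12fd0c to 0x12fd0f, and that range overlaps the single byte 0x12fd0f written by the printf copy loop. This is why a one-byte write can show up as a dependency of a four-byte read.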
    [9.2] guess: APIMsg->Status may be set in NtRequestWaitReplyPort.
          Experiment: set 0x12fd0c to 1 and see what changes.
          Verified: NtRequestWaitReplyPort does clear the status to 0.
          Now the problem is: if the value is originally 0, will the clear-to-0 action still be executed?
    [9.3] Delve into NtRequestWaitReplyPort and check where APIMessage->Status is cleared.
          Hardware breakpoint does not work, because it is modified in kernel mode.
    [9.4] read the trace and find out if the memory writes are recorded for the sysenter instruction.
          [1] locate the timestamps of the following:
              Sliceat: Timestamp: 240572 0x40103c
              printf (235876 @401022 --> 238737 @0x40112b)
              scanf (238738 @40102a --> 240567 @401092)
              call CsrClientCallServer 239685 @7c8715bb
              call NtRequestWaitReplyPort 239734 @7c9132f3
              sysenter 239739 @7c90eb8d <---- however, no mem write is recorded for this instruction!!!
    [9.5] debug the checkRecordStatus
          [1] check how record mem is done. --> comment out the check on bRecordEnabled in handle_phy_mem_access
          [2] regenerate raw trace and full trace.
          [3] problem: handle_phy_mem is called but the translation from ha to va always yields 0xFFFFFFFF.
              Reason: page map is not built yet. It's built only when the instruction is modifying CR3.
              Change the logic for when it's a syscall.
    [9.6] modify isModifyCR3
          [1] add function isNeedCollectPhyMem(cr3, eip, byte1, byte2)
          [2] add the logic of checking syscall DONE.
          [3] fix the handle_mem_read.

9:00AM 12/06/2013
--------------------------------------------------------------------------------------
Task 185: Fix problem of memory recording
--------------------------------------------------------------------------------------
[1] complaint about mismatch. Solved.
10:20AM
[2] solve the memory capacity problem.
    [1] declare a CachedMap for phyMem [5 min] DONE.
    [2] add functions for enablePhyMem tracing and disable phyMemTracing [8 min] DONE
    [3] modify handle_mem_read logic: if the address is already written, don't add it [10 min] DONE
    [4] modify handle_mem_write, add the tracing [10 min] DONE
    [5] debug and see the result. [10 min] DONE.
[3] for !bTraceIntoSyscall mode, don't trace kernel memory.
    [1] read about page table directory. In a 32-bit page table entry for a 4KB page, bits 31:12 hold the
        physical page frame address, and bit 2 (U/S) indicates if it is supervisor access only (value 0). [5 min]
    [2] read build_page_pae. It seems to be skipping the non-user pages already!
        break on mem_helper.c:233
7:20AM 12/07/2013.
    [3] add a function isKernelMem in trace.h [10 min] FIXED.
9:30 12/09/2013
    [4] regenerate the trace
        Sliceat: Timestamp: 240565 0x40103c
        printf (235870 @401022 --> 238731 @0x40112b)
        scanf (238733 @40102a --> 240560 @401092)
    [5] branch slice. The memory dependency is now cleared, which is good.
        Still has a number of unknown dependencies.

10:00 12/09/2013
--------------------------------------------------------------------------------------
Task 186: Fix the unknown dependencies
--------------------------------------------------------------------------------------
[1] data dump below, analyze each.
    NOTE: 239732 (belongs to scanf, and it's a sysenter which has a lot of reads)
    -- has unknown dependency at 237742 depended by 239732 (AdvMemRead) on 12fd0f.
    -- has unknown dependency at 237737 depended by 239732 (AdvMemRead) on 12fcfc-12fcff
    -- has unknown dependency at 236938 depended by 239732 (AdvMemRead) on 12fd00-12fd04
    -- has unknown dependency at 236783 depended by 239732
    -- has unknown dependency at 236781 -same
    -- has unknown dependency at 236652 -same
    -- has unknown dependency at 236548 -same
    -- has unknown dependency at 236544 -same
[2] check dependency 237737.
    See if it's really being depended on by scanf.
    237737:@40192a inc [esi]
    -- has unknown dependency at 236107
    Use WinDbg, set the BPs in the following sequence: @40192a (ba), @40112b (bp) and then memory bp on 12fcfc.
    Note: use /p to indicate the process to capture, e.g., ba r4 /p 898338d0 0x12fcfc
    Problem: the instruction @40192a is hit many times. Check its logic? No debugging information.
[3] regenerate b21.exe and place it.
    Sliceat: Timestamp: 240981 0x40103d
    printf (235862 @40101d --> 238723 @0x4011fb)
    scanf (238728 @40102e --> 240977 @40110e)
    --processFunction ts 238723
    Cannot find RV record for 237744
    ERROR reading mem at 237744 or 204067, set as hasDataDepend
    -- set ts 235862 @0x40101d in slice
    Function included! add into slice RET 238723 @0x4011fb
    ---> debug...

8:30AM 12/11/2013
[4] check the reading mem problem. Regenerate the data.
    [1] request mode 1. DONE.
    [2] request mode 0.
        Sliceat: Timestamp: 240983 0x40103d
        printf (235863 @40101d --> 238724 @0x4011fb)
        scanf (238729 @40102e --> 240979 @40110e)
        --------------------------------------------------------
        --processFunction ts 238724
        Cannot find RV record for 237745
        ERROR reading mem at 237745 or 204068, set as hasDataDepend
        --------------------------------------------------------
    [3] debug: set conditional breakpoint. [30 min]
        It is the read of 237745 that errors, but b1 is ok (204068).
        Records below:
        timeStamp: 237745, ins @404c6b: and [eax+0x70], 0xFD write: (start: 0x321f00, end: 0x321f03) , DEPLINKS: , R: 237744
        timeStamp: 204068, ins @40baa1: and [ecx+0x70], 0xFD write: (start: 0x321f00, end: 0x321f03) , DEPLINKS: , R: 204067
    [4] debug again and check why the RV record cannot be found.
        The problem is that @404c6b is not recorded at all, but @40baa1 is recorded. Set a bp on it.
        Check when incNeedRecord is called.
        The problem is that both instructions do have their records in rrProcessor.
        At 0x404c6f it is actually recording. Now this time it is able to locate the RV record, but the
        countMem is 0!
    [5] would it be caused by the change of InstrExecRecorder::handle_mem_read/write (added one parameter
        of bTracePhyMem)? Unit test it. --> found problem with RecordValueProcessor. Check later.
9:00AM 12/12/2013.
    [6] check the unit test problem. FIXED.
    [7] regenerate the raw trace, full trace and branch trace again. Still could not fix the problem.
        Each time the branch is generated differently (it should be a deterministic process).
        Still reports cannot find RV record.
10:00AM
    [8] recompile from scratch. mode 1: done. Still the problem of cannot find RV record 237742 @404c6b
    [9] check @404c6b in the log and see how many times it has occurred. It has only occurred twice.
    [10] debug design.
         [1] conditional bp on eip of @404c6f and trace into it and see if it is calling rvProcessor->addRecord;
             also bp on Trace::handle_value_alert
             Problem: this->bNextInstrToRecord is not set!
11:30AM
         [2] check if @404c6b is ever contained in the request. Verified no.
         [3] check why @404c6b is not included in the request
             [3.1] start in mode1 again and generate the raw trace and full trace. DONE
             [3.2] in branch trace, set conditional bp to handle function 0x4011fb,
                   then set conditional bp on @404c6b DONE
                   Sliceat: Timestamp: 240980 0x40103d
                   printf (235861 @40101d --> 238722 @0x4011fb)
                   scanf (238727 @40102e --> 240976 @40110e)
                   237742 (@404c6b) is not added because it's not in slice.
                   --> verify: at the end of branch_slice check this->rrProcessor->getRR_ID_ForEIP(0x404c6b)
                   Verified it's not there.
             [3.3] on mode0, generate the raw trace and full trace
             [3.4] check if @404c6b is ever included. The first time it's not included.
8:30AM 12/13/2013
             [3.5] check the second time. Generate the trace and see if we get the getRV error. [30 min]
                   It's the second time that it reports it cannot find RVRecord for 237743. The question is why
                   it needs to? 237743 is not in the request anyway.
                   Debug: set BP in hasDataDependency and then check ts == 237743; the problem is that
                   237743 is now set as in slice.
                   [3.5.1] check why 237743 is set in slice.
                           [1] read the log file; verified that 237743 is not set in slice yet.
                           [2] at the beginning of branch_slice, check if 237743 is in slice; then at the beginning of
                               hasDataDependency check if 237743 is in slice. [30 min]
                               From the beginning it is in slice.
                               Slice for 237743 is cleared at the beginning of branch_slice; after init_data slice it is also false.
9:30AM
                           [3] 20 min check when it is set to true at 237743 [30 min]
                               Set a conditional bp on InstrExecRecorder->setInSlice()
                               It's added because 238982 has mem dependency on 237743
                               Fix one bug in Trace::setInSlice()
                               Information: 237743 and 238982
                               timeStamp: 237743, ins @404c6b: and [eax+0x70], 0xFD write: (start: 0x321f00, end: 0x321f03) , DEPLINKS: , R: 237742
                               timeStamp: 238982, ins @401c25: test [eax+0x70], 0x02 read: (start: 0x321f00, end: 0x321f00) , DEPLINKS: , R: 238981 , M: 237743
                           [4] comparative study. [30 min]
                               Generate the raw trace, the full trace and the log twice.
                               Sliceat: Timestamp: 240980 0x40103d
                               printf (235861 @40101d --> 238722 @0x4011fb)
                               scanf (238727 @40102e --> 240976 @40110e)
                               1st branch pass: 61.83%. 2nd branch pass: 65.39%
11:00AM
                           [5] compare the difference in slice logs, use binary search [20 min]
                               The first line of difference is line 27520:
                               log_1: ! generate data slice for 240968 to reach bridge 240977
                               log_2: ! generate data slice for 240975 to reach bridge 240977
                               The difference is in the generated slice; this is the first time "to reach bridge 240977" occurs.
11:20AM
                           [6] debug "to reach bridge 240977" and set a conditional bp there. [20 min]
                               --> the next action is sm.addTS(240967)
                               The new SOC generated is tsStart = 238735, tsEnd = 240974; this causes the bridge operation 240975 to 240977.
                               The question: why is tsEnd not 240968???
                               --> the only difference is the call of ii->getCountInSlice()
                               Check if clear_slice clears the count.
                               Found the fxxx bug! It cost more than 6 hours of debugging!
                               clearCount is called after updateCache()! SOLVED!
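Step [5] above locates the first differing line of two slice logs by binary search. One way to do this mechanically is to search over the length of the longest common prefix, because "the first k lines are identical" can only go from true to false as k grows. The sketch below assumes both logs are already loaded as vectors of lines; the function name first_difference is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the 0-based index of the first line at which the two logs differ.
// If one log is a prefix of the other, return the length of the shorter log.
inline std::size_t first_difference(const std::vector<std::string>& a,
                                    const std::vector<std::string>& b) {
    std::size_t lo = 0;                             // the first lo lines are known to match
    std::size_t hi = std::min(a.size(), b.size());  // the common prefix cannot be longer
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo + 1) / 2;   // candidate prefix length, always > lo
        // Only lines [lo, mid) need comparing; everything before lo already matches.
        if (std::equal(a.begin() + lo, a.begin() + mid, b.begin() + lo)) {
            lo = mid;       // prefix of length mid matches, look further down
        } else {
            hi = mid - 1;   // a difference lies inside the first mid lines
        }
    }
    return lo;
}
```

A plain linear scan would find the same index with the same total amount of comparison work; the halving mainly mirrors how the search is done by hand in an editor or with diff on partial files.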
Now check the rest of the unknown dependencies.
--------------------------------------------------------------------------------------
Task 187: Fix the unknown dependencies
--------------------------------------------------------------------------------------
[1] data dump below, analyze each.
    Sliceat: Timestamp: 240980 0x40103d
    printf (235860 @40101d --> 238721 @0x4011fb)
    scanf (238726 @40102e --> 240976 @40110e)
[2] check why there is unknown dependency, or at least fix the dump message.
    The problem is that 238717 is in slice and it has a reverse_pointer>tsRet, but it matches none of the
    isNeededForMem(), isNeededForReg() or isNeededForVisit() cases. Got to check why it's included. Check the log file.
    --processFunction ts 238722
    -- has unknown dependency at 238717; # it is a "pop ebx", it is in slice because it is SOC end. --> in this case, it should be ignored.
    -- has unknown dependency at 238715 # it is "pop edi", needed by 238728 (in scanf) for register.
    Problem: 238728 did add the register dependency to it, so why is it not set for isNeededForReg()?
    Set a bp and see how 238715 is added.
    --> found the bug. In real data dependency, forgot to set that it is needed for reg.
    Now need to regenerate the raw and full traces.

9:00AM 12/15/2013
[3] check whether init_data slice has a bug.
    Debug into Trace::init_data slice for timestamp 240980
    Problem: 240978 has no data read!!!
    Re-generate the raw trace and check that @401036 has a memory read.
    After completely recompiling the project it's recorded. Now generate the branch trace.
    Need to recollect the raw trace. The first couple of unknown dependencies are listed below:
    Sliceat: Timestamp: 240980 0x40103d
    printf (235860 @40101d --> 238721 @0x4011fb)
    scanf (238726 @40102e --> 240976 @40110e)
    --processFunction ts 238721
    -- has unknown dependency at 238716 - included by socend, can be excluded.
       -- DISCHARGED
    *** -- REMOVED reg dependency at 238714 for reg 5 (0 vs 0) - just solved
    ***-- has reg dependency at 238714 for reg 8 (322ed8 vs 9) ----- register edi/edi not right. Register 5 esp value does not seem right!
    -- has unknown dependency at 238711 -- it's writing to fs:[0], why is it not in the protect mem list?
       -- we are stuck at how it is setInSlice. Now it is included because of multiple occurrences of the same instruction.
    -- has unknown dependency at 238675 - mov edi, edi, depended by 238686 (but it's within the printf function); why should it be included?
       It's included because of SOC; again it can be discharged.
       -- DISCHARGED
    -- has unknown dependency at 238671 -- add eax, 0x20, only depended by 238672. Set in slice as SOCEnd.
    -- has unknown dependency at 238668
    -- has unknown dependency at 238642
    -- has unknown dependency at 238636
    -- has unknown dependency at 238635
    -- has unknown dependency at 238634
    -- has unknown dependency at 238631

9:00AM 12/16/2013
--------------------------------------------------------------------------------------
Task 188: Discharge the SOCEnd/SOCBegin in slice case and other cases.
--------------------------------------------------------------------------------------
[1] check how SOC begin/end is included in slice [10 min]
    It finally propagates to reversePointerType.
[2] add a switch case [15 min] DONE
[3] test and run [10 min]
    Sliceat: Timestamp: 240980 0x40103d
    printf (235860 @40101d --> 238721 @0x4011fb)
    scanf (238726 @40102e --> 240976 @40110e)
[4] disappears again. Check the "something wrong" with data dependency.
    [1] check the result of hasDataDepend. --> it seems hasDataDependency is solved. [15 min]
    [2] read "something wrong" message [10 min] DONE.
10:30AM
[5] check problem: "can skip" error message.
    [1] identify where it is [5 min]
        tsStart=238040, tsEnd=238066 --> these are located in printf, but shouldn't the entire printf already be skipped?
        Set a bp on processFunction for 238721? Did not hit.
11:45AM
    [2] check how soc(238040, 238066) is added. [10 min]
        read log: ! generate data slice for 238717 to reach bridge 238722
    [3] check why 238717 is hit (it should actually be skipped)
        [3.1] add a bp on process function first, bp on Trace.cc:1442 first
        [2] bp on soc.cc:40
            It reveals that it is soc: soc(238714,238716) setBridge to soc(238723, 238724)
            Now the problem is why soc(238714,238716) would be included.
            It's caused by addTS(238715). Why?
            It's added because it has multiple occurrences!!! ---> this is caused by the following!
        Another question is that processFunction for printf is never called!!!! Figure it out later!

9:00AM 12/18/2013
--------------------------------------------------------------------------------------
Task 189: Find out why the printf function is not included
--------------------------------------------------------------------------------------
Sliceat: Timestamp: 240980 0x40103d
printf (235860 @40101d --> 238721 @0x4011fb)
scanf (238726 @40102e --> 240976 @40110e)
[1] read the log file and find out if scanf is included and if printf is included.
    The problem is that scanf itself is not included either!
    It seems that the in-slice ts added for the bridge node is not right!
9:15AM
[2] check code [15 min] Fixed. And unit testing. Fixed! But still not solving the problem.
9:50AM
[3] Solve the case skip problem.
    The problem occurs at 238767->238769. The problem is that 238768 is included, and its tsEntry
    is smaller than 238767. Check how 238767->238769 is included as SOC.
    [3.1] set a bp at 238767 at addSOC. [20 min]
          It's caused by sm.addTS(238768)
          Now the problem: why is 238767 included?
    [3.2] trace into 238768
          Found the problem: matchCallTS is set to ts itself!
10:30AM
    [3.3] fix code! [15 min]
          [3.3.1] fix getCallForRet [5 min] DONE
          [3.3.2] fix code in socmanager. [5 min] DONE.
    [3.4] new problem on caseSkip: 239251->239254, check where error occurs.
          [15 min] It's caused by adding ts 239252.
          The problem is that 239252 does not have a corresponding call! It should be 239241 (does not match esp).
11:00AM
    [3.5] check why 239252 does not have a corresponding call [15 min]
          bp on 239252 Trace::searchForCall retID==239252.
          The second criterion is used but given up again. Check why.
          It seems that there was a worry (in some earlier stage of the implementation) that a call does not preserve ESP/EBP.
          It seems that this problem is fixed by resetting esp/ebp of each SOC.
    [3.6] re-enable the use of the second criterion and see if unit testing is working. [15 min] DONE.
    [3.7] Now the new case 239240->239254, check why [15 min]
          The problem: 239253 has a deeper matchCALL.
11:40AM
    [3.8] fix the matchCall calculation again in identifySOC. [15 min] DONE.
    [3.9] run and check [15 min]
          New problem with 138516->141715. Still the problem of search. Fix it [15 min]
          tsEntry is 138433, check the identifySOC function.
          Found the problem: 138516->141715 is the result of merging.
    [3.10] the problem is verify_and_reset_soc
           Need to declare a function SOC::getMinMatchCallTS; if the minMatchTS is smaller than that of the soc, continue to merge, until it is defined.
           ====> TO DO!!!

9:00AM 12/19/2013
--------------------------------------------------------------------------------------
Task 189: Fix the tsEntry < tsStart problem
--------------------------------------------------------------------------------------
[1] Algorithm design [20 min] DONE.
[2] implementation [35 min]
    [2.1] define SOC::getMinMatchCallTS [10 min] DONE.
    [2.2] in verify_and_reset_soc call getMinMatchCallTS; if the matchTS is smaller than the tsStart, enter
          the same branch; notice that in the next iteration, the check will be performed again. [20 min] DONE
    [2.3] debug. [15 min]
          [2.3.1] check getMinMatchCallTS [8 min] done.
          [2.3.2] debug the verify_and_reset ... [10 min] done.
[3] find a new place where it does not work (tsEntry<tsStart).
    It's caused by the full_slice call when merging.
    Question: do we need the full_slice when merging SOCs? We can skip it because bSuccess is set to false anyway, and there will be
    a complete full_slice for each SOC a second time.
[4] observation of lab results.
    Sliceat: Timestamp: 240980 0x40103d
    printf (235860 @40101d --> 238721 @0x4011fb)
    scanf (238726 @40102e --> 240976 @40110e)
    238721 is added because of a read register error
3:11PM
[5] new problem: full_slice_all_soc called by insertSOC throws the tsCall < tsEntry complaint.
    Problem: full_slice_all_soc is called for each insertSOC.
    Adding SOC 147187 -> 147196, but merged with socPrev 147187 -> 147207, and 147201 has a call at 145xxx.
[6] solution: design a function findSOCStart(ts) which returns the proper ts for SOCStart which is smaller
    than ts; get the logic from identifySOC, and then modify identifySOC. [55 min]
8:00PM
    [6.1] define findSOCStart(ts) and take the logic [15 min] DONE
    [6.2] modify identifySOC [10 min] DONE
    [6.3] unit testing [5 min] DONE.
    [6.4] modify insertSOC [10 min] --> seems to already solve the problem without modifying SOC.
    [6.4] run and debug [15 min] --> still does not work. Need to think about the SOC algorithm again.

9:00AM 12/20/2013.
[7] Rethink the logic of sm.addTS [1 hr]
    [7.1] check the logic of Trace::branch_slice -> break when slice size is equal to that of the last iteration.
          It first inits the raw data slice, and then adds each instruction that is in slice.
          Then it calls full_slice_all_soc, and checks and verifies SOC.
    [7.2] check the logic of socmanager.addTS: identifySOC and add it.
          Note that the addSOC logic seems to be very complex; it makes the overly general assumption that an SOC may be appended in the middle.
          ---> check it later
          Also in insertSOC, when an SOC is really added it calls full_slice_all_soc (which is a lot of repeated work).
          The point of full_slice_all_soc is to find out those that cannot be used for bridging.
    [7.3] sm.verify_and_reset_soc merges SOCs if bridge fails or tsEntry<tsStart (for calls); returns true if everything is ok.
    [7.4] algorithm improvement idea:
        [1] branch_slice: each iteration inherits the SOCs generated in the last iteration; the purpose is to find the right set of SOCs.
            During each iteration:
            (1) clear and reset the initial raw data slice
            (2) verify and reset SOCs, merge SOCs if bridge fails --> all SOCs bridge fine and cover the init raw slice
            (3) full_slice_all_soc, will generate control dependency inside SOC and generate data slices outside of SOC
            (4) for each time stamp in slice, add and find SOCs for each time stamp, call sm.addTS (but do not call full_slice)
            (5) if step (4) does not yield any SOCs, then break
        [1.2] call and write all SOCs.
10:00AM
[8] implementation
    [8.1] change the branch_slice algorithm [40 min] DONE.
        [8.1.5] code inspection of gen_branch_slice [10 min] DONE.
    [8.2] change socmanager.addTS (modify the return value) [5 min] DONE.
        [8.2.1] change identifySOC logic [10 min] DONE.
        [8.2.2] change addSOC logic. [10 min] DONE.
        [8.2.3] change insertSOC logic. [15 min]
    [8.3] check the logic of verify_and_reset_soc
3:45PM
--------------------------------------------------------------------------------------
Task 190: fix the complaint about memory error.
--------------------------------------------------------------------------------------
[1] break at "something wrong" message and check what is going on [10 min]
    print the reversePoinerType instead. Mostly reports reversePointer type 7.
[2] check type 1. ts: 235425
    Sliceat: Timestamp: 240980 0x40103d
    printf (235861 @40101d --> 238722 @0x4011fb)
    scanf (238727 @40102e --> 240976 @40110e)
    ----------------------
    timeStamp: 235425, ins @4019e8: mov fs:[], ecx write: (start: 0x7ffdd000, end: 0x7ffdd003) , DEPLINKS: , R: 235424
    It is located before printf, I would not worry about that at this moment
[3] check the printf function 238722. It looks like it is skipped. 236425 is set as seh delay.
    But still the rate is about 61%.
[4] check accuracy of the stats report. It's ok. The non-imported instruction store size is still big.
7:30PM
--------------------------------------------------------------------------------------
Task 191: fix the slicing algorithm error.
--------------------------------------------------------------------------------------
[1] 7c8017fd's dependence is not resolved yet; there is only one occurrence of this instruction
    timeStamp: 127006, ins @7c8017fd: mov ecx, [ebp+0x8]
    The problem is that the memory read operation of this instruction is not recorded!
    , DEPLINKS: , R: 127001 and EBP value: 0x12ff94, C: 127005 ESP: 0x12ff94 EBP: 0x12ff94
[2] Confirmed 7c8017fd's memory read is not recorded. Set a conditional BP.
    Note: not right: bTracePhyMem is true when the mem addr is captured.
9:00AM 12/21/2013
[3] regenerate the raw trace and look at whether it's the interrupt handling causing the problem.
    It seems that there is a lot of trouble related to recording around context switch. But debug this problem first
9:45AM
[4] debug plan:
    [1] set a BP on Trace::handle_mem_read first and then disable it
    [2] set a conditional bp on 7c8017f5 which has the read access recorded.
    [3] set a watchpoint on trace->bPhyTraceMem
    [4] set a conditional bp on 7c8017fd
    Problem is to study why bPhyTraceMem is changed.
    Observation: 7c8017f5 has the memory recorded because there is no record of writeEIP for that particular physical mem addr. Its bTracePhyMem is still wrong!
    Check if 7c8017f5 is between Syscall back.
10:10AM
[5] check how bTracePhyMem is changed [30 min]
    change src code. Found the problem: when interrupt switches back, it does not disable bTracePhy. Fixed.
    check again. Now works. Regenerate all traces. DONE.
11:30AM
[6] check if the slice is working. Still not working
12:00PM
--------------------------------------------------------------------------------------
Task 192: fix the slicing omission error.
--------------------------------------------------------------------------------------
[1] problem statement: function call 40149b (ts: 126979) -> ret 406206 (ts: 127075)
[2] guess: it may be caused by an additional setting in slice operation for the program entry
[3] implementation:
    [3.1] remove the adding of program entry [8 min] DONE
    [3.2] in socmanager::writeSOCs, if the program entry is not in any SOC, then create a bridge to it. [15 min] DONE
    [3.3] test [15 min]
[4] the first instruction is still included. Check why:
    [4.1] first instruction ts: 126978: (@40149b)
        timeStamp: 126978, ins @40149b: call 0x00004CD1
    [4.2] check why 126978 is included. It's included because of other reasons.
[5] the first SOC is: tsStart = 126978, tsEnd = 127137
    So the problem is why tsStart is 126978 (the call is included but its ret@406206 is not included).
    ---------------------------------------
    Need to check binWriter and find why the ret@406206 is not included.
11:45PM 12/22/2013
[6] fix bug in MOCManager::handleProgramEntry. [10min]
[7] fix bug in full_slice for soc: the soc begin should not be in slice, it will be reached by bridge anyway. [10 min] DONE. fixed.
--------------------------------------------------------------------------------------
Task 193: fix another slicing error.
--------------------------------------------------------------------------------------
[1] problem is in setArgV. Comparative study.
    The problem is that when it returns, the stack RET is not right: one is at 0x12FFC4, and the other at 0x12FF88.
    Found the problem: function 0x0040586B parse_commandline does not reset the stack to its original status! (entering and exiting esp value not the same!)
9:00AM 12/23/2013
[2] check parse_commandline again. [15 min]
    The problem is that the function call at 0x40586b does not change the esp, but the previous two push instructions reduce the esp value by 8.
    The question is: why isn't esp_delay handled? The corresponding instruction is 202778.
    timeStamp: 202778, ins @405867: push esi write: (start: 0x12ff60, end: 0x12ff63) , ESP: 0x12ff64 -> 0x12ff60 , DEPLINKS: , R: 202777 and ESP value: 0x12ff60, R: 202769
    The return instruction
    timeStamp: 204094, ins @4057d2: ret read: (start: 0x12ff5c, end: 0x12ff5f) , ESP: 0x12ff5c -> 0x12ff60 , DEPLINKS: , R: 204093 and ESP value: 0x12ff60, M: 202780
[3] check why 202778 is not handled [10 min]
[4] check processFunction 204094 (ret), why esp delay link is not handled. [30 min]
    The problem is that none of the instructions in 202780->204094 is marked with EspDelay. So there is no need to actually delay the esp.
    Rethink the esp delay algorithm.
10:20AM 12/23/2013
[5] start from the last instruction of the failing function and check ESP chains. [30 min]
    [5.1] check the handling of instruction @00405873 (ts: 204096)
        From the log: 204096 added an esp link to ts: 204094
    [5.2] check the processing of the function again and check 204094.
        Found the bug at line 1508: when it is EspDelay it should not skip.
        Now fixed, the ESP at 0x0040586b seems fine. But the return in the container function is still not ok.
[6] compare the ESP of the new problem.
    The problem is the leave instruction (mov esp<- ebp; pop ebp)
    The problem is that the EBP value is not the same at the leave instruction.
11:00AM
[7] check the ebp value problem. [20 min] DONE.
    The problem is that 158732 @4057d6 (mov ebp, esp) is not included to save the esp value in ebp, which leads to an error at the leave instruction.
    Involved instructions: 158732 @4057d6
                           204093 @4057d1: leave
11:40AM
[8] trace the dependency chain of 204093 and see how it did not get to 158732 [20 min]
    204093->204072
    204072 is not handled at all
    [8.1] It seems that 204093 is only handled in init raw data slice, but not in full_slice, check: [5 min] --> it is actually hit
        It has included 204072 as ebp delay.
        check log
        check new log for 204072 (40bc76: pop ebp)
        does not work, too long a chain to follow
    [8.2] try reverse direction: start from 158732 and see who depends on it
        158732->158753 (@4019b0) [it is actually included in the slice, the problem is why it does not set 158732 into slice].
    [8.3] trace 158753 (@4019b0): it is depended on by 199541 (@0x4019f6: pop ebp), which resets ebp. However, it does not set an ebp link to 148753. [5 min]
    [8.4] trace into 199541 and check why it does generate mem link to 158753.
        The problem identified: bNoDataPropagation is set to false
        Found the problem: the if-elseif case forgot to check ebp case! fixed.
3:50PM
--------------------------------------------------------------------------------------
Task 194: fix another slicing error.
--------------------------------------------------------------------------------------
[1] observation.
[2] the problem is that the instruction at 0x00405964 depends on 0x00405953 on EDX, but 0x00405953 is not included
[3] get the stats.
    145675 (jnb ...)
    145674 (@405964: cmp eax, edx)
    145671 (@405953 lea edx, [eax+0x800])
[4] problem: 145675 did not include 145674 as dependency.
[5] trace into 145675
    fixed bug
[6] check again.
------------------- WORKING! and printf skipped ---------------------------------------------------
Trace Size: 267373, in slice: 99947, Percentage: 37.38%
Instruction Store Size: 48397, in slice: 9494, Percentage: 19.616918%
Instruction Store Size (excluding imported DLL): 3511, in slice: 2430, Percentage: 69.211051%
----------------------------------------------------------------------------------------------------
11:00AM 12/25/2013
--------------------------------------------------------------------------------------
Task 195: check if there is anything that can be improved
--------------------------------------------------------------------------------------
[1] check the call at 0x00401418 and see why it is included.
    ts: 232089, its corresponding ret is 235847
    It complains about: 235845: xor EAX, EAX. should have only one register being read! Stopped delay dependency!
[2] conditional BP on 235845. [15 min]
    the problem is that for xor eax, eax it does not identify eax at all!
    Fixed. Now a new "Something wrong" error!!!!
-----------------
9:00AM 12/26/2013
[3] it complains about SEH preservation.
    [3.1] check how SEH is checked. [15 min]
        There is a way to use scr or ccr to check
    [3.2] re-run and check the return at 0x404fe6 [15 min]
        Still got the same problem
        "Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 1, ts: 235426, eip: 4019e8!"
        It's 238713 that depends on 235426 (another mov fs:[], ecx)
9:40AM
    [3.3] conditional bp on hasDataDependency and see how 235426 is discharged. [15 min]
        There are two problems:
        [3.3.1] setNeedMem is not set by 238713 DONE.
        [3.3.2] in processFunction, no need to check isWriteSEH, just use memory check. DONE.
    [3.4] check if it's ok now. Rebuild and regenerate [15 min]
        Has problem in generated slice. [15 min]
    [3.5] check problem with 7c910337 reverse_pointer type 10. [REGI_LINK delay] [15 min]
        Fixed.
        Crash on 004058aa (ts:147225), it depends on 0040589e (147222, 147221) but it's not listed.
    [3.6] check why 147225 ignores links on 147222 and 147221
        Problem is with the handling of processFunction 147220
        -- REMOVED reg dependency at 147219 for reg 1 (0 vs 0)
        the value is definitely not 0! It's a string located at 0x00010000!!!
        check the register recording algorithm. This is not correct!!!
        Instruction info is listed below!
        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        timeStamp: 147219, ins @7c812c84: mov eax, [eax+0x48]
        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
11:00AM
--------------------------------------------------------------------------------------
Task 196: check the register recording algorithm
--------------------------------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
timeStamp: 147219, ins @7c812c84: mov eax, [eax+0x48]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[1] check if @7c812c84 is in the request list.
    The problem is that it does not trigger isNeedRecord.
[2] check the log and see why @7c812c84 is not in the list; instead tsRet and tsEntry are recorded.
    ret: 0x7c812c87 (next @40589e)
    entry: 0x7c812c78 (@405898 call), next @7c812c7e
[3] check if these values are recorded.
    It is recording the value @0x40589e (triggered by @405898), recording one register 1.
    Recorded value is 0x142378. Recorded ts is 147215, for eip 0x405898,
[4] fixed the reg problem. DONE. Now working.
Slice size:
Trace Size: 267373, in slice: 111624, Percentage: 41.75%
Instruction Store Size: 47560, in slice: 9470, Percentage: 19.911690%
Instruction Store Size (excluding imported DLL): 3511, in slice: 2404, Percentage: 68.470521%
11:30AM 12/28/2013
--------------------------------------------------------------------------------------
Task 197: see if there are any improvements
--------------------------------------------------------------------------------------
[1] check the last call. 0x00401418->0x0040141d and see why they are included
[2] timestamp: ret is 235847
[3] check log; records are shown below
    --processFunction ts 235847
    -- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
    -- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
    -- REMOVED reg dependency at 235845 for reg 81 (246 vs 246)
    -- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
    -- has reg dependency at 235843 for reg 5 (12ff88 vs 12ff84)
    -- set ts 235847 @0x404fe6 in slice
    Function included!
    add into slice RET 235847 @0x404fe6
    timeStamp: 235843, ins @404fc5: pop esi read: (start: 0x12ff7c, end: 0x12ff7f) , ESP: 0x12ff7c -> 0x12ff80 , DEPLINKS: , R: 235842 and ESP value: 0x12ff80, M: 235144
[4] the problem is that it tries to compare register dependency on ESP.
[5] 235843 is needed for esp by 235849 at 0x40141e: CMP EAX, ESI.
[6] fix: change the collection of register timestamp. and see the result (tsRet+1)
    --processFunction ts 235847
    -- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
    -- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
    -- REMOVED reg dependency at 235845 for reg 81 (246 vs 246)
    -- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
    -- REMOVED reg dependency at 235843 for reg 5 (12ff88 vs 12ff88)
    -- REMOVED reg dependency at 235843 for reg 7 (0 vs 0)
    -- REMOVED reg dependency at 235843 for reg 7 (0 vs 0)
    -- REMOVED reg dependency at 235843 for reg 5 (12ff88 vs 12ff88)
    -- delay dependency on reg: 13 and ts: 235847 to 232088
    -- delay dependency on reg: 17 and ts: 235845 to 232080
    -- set ts 232080 @0x4055ff in slice
    -- delay dependency on reg: 21 and ts: 235845 to 232080
    -- set ts 232080 @0x4055ff in slice
    -- delay dependency on reg: 81 and ts: 235845 to 232086
    -- set ts 232086 @0x40140a in slice
    -- delay dependency on reg: 93 and ts: 235845 to 232080
    -- set ts 232080 @0x4055ff in slice
    -- delay dependency on reg: 13 and ts: 235843 to 232088
    -- delay dependency on reg: 15 and ts: 235843 to 232084
    -- set ts 232084 @0x405604 in slice
    -- delay dependency on reg: 97 and ts: 235843 to 232084
    -- set ts 232084 @0x405604 in slice
    -- delay dependency on reg: 100 and ts: 235843 to 232088
    -- set ts 232088 @0x401416 in slice
    -- set SEH delay 235427 -> 232064
    -- set ts 232064 @0x7c90ee05 in slice
    Function skipped!
    add visit link 232088 @0x401416
--- new stats:
Trace Size: 267373, in slice: 57085, Percentage: 21.35%
Instruction Store Size: 48165, in slice: 8778, Percentage: 18.224852%
Instruction Store Size (excluding imported DLL): 3511, in slice: 1928, Percentage: 54.913130%
+++ Task completed: Task generate branch slice for: /home/samba/smbuser/slice_jobs/job1
Worked! Now the last function called is setargv! Check it later.
9:00AM 01/03/2014
--------------------------------------------------------------------------------------
Task 198: check setargv is really needed
--------------------------------------------------------------------------------------
9:00AM
[1] check the timestamps of setargv
    eip is 0x004013f4. ts: 158729
    after ret is 0x004013f9, ts: 204107
    problem on processFunction 204106: ERROR in reading register 5 for 158729 or 204106.
[2] regenerate all data slices.
    [2.1] record request mode.
[3] --processFunction ts 204104
    first couple of messages.
    ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104 //check later
    ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104 //check later
    ERROR in reading register 1n-- ERROR reading reg value at 158728 or 204104 //check later
    -- has mem dependency at 204097 on first write on 0x421308 // depended on by 235852: eip:0x401434, it's preparing argv.
    -- has mem dependency at 204096 on first write on 0x421304 // this is argc
    The problem is that argc and argv are never needed by any other instruction during execution! Check
[4] check dependency on 235852: it is depended on by 235853 (push argc), and then the call instruction (due to esp)
10:10AM
[5] read log and check how 235852 is added in slice, and then how 204097 is added.
    (1) 204097 is added using memlink from 235852 [not ok, as 235852 is not dependent on mem]
[6] conditional BP on 235852.
    Found the problem. bNoDataPropagation is not accurate enough.
    235852 is needed for esp (register), however, its esp is not dependent on its memory.
    Need more accurate analysis!!!
[7] algorithm design to refine analysis:
    [7.1] read the current design. Current design is not accurate enough.
[8] algorithm design: Add InstrProcessor::NeedPropagateMemLink and InstrProcessor::NeedPropagateRegLink.
    Based on bNoDataPropagation, add more levels of control. For example, if a push instruction is not needed for mem, should return false.
7:20PM
[9] design I: [25 min]
    The InstrInfo class provides the following:
    getOutputDependenceMatrix(int mtr[4][4])
    index: reg, mem, esp, ebp. values: 0: no; 1: yes; -1: unknown
    For example, push EAX looks like the following
        (input)    reg       mem   esp   ebp
        (out) reg  0
              mem  1 (eax)   0     0     0
              esp  0         0     1     0
              ebp  0         0     0     0
    When checking whether a link needs to be propagated, check if the instruction is needed for reg, or for mem, or for esp, or for ebp.
    Then, based on which data output is needed, update the rows correspondingly (e.g., if the PUSH EAX instruction is only needed in memory, then keep only the mem row).
[10] design II: [20 min]
    Treat the memDependeLink as a special case: add a function isMemInputReallyNeeded() to InstrProcessor.
    Always return true for unknown cases; for the push case, if it's only needed for Esp, return false.
    When bPropagate data is set (isNeededForReg, or isNeededForMem), and an instruction has mem dependelink, the only case we could think of is:
    push [0x401010] which is needed for reg.
[11] make decision: use approach 2 only. Instructions with two outputs are rare.
[12] implementation: add a function isMemInputReallyNeeded(). [15 min] solved.
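
The "design II" rule above can be sketched as follows. This is only a minimal illustration under stated assumptions: the Opcode enum, the NeedFlags struct and the free-function signature are hypothetical stand-ins, since the real InstrProcessor/InstrInfo interfaces are not shown in this log.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real InstrInfo/InstrProcessor types are not shown in the log.
enum class Opcode { Push, Other };

// Which outputs of an instruction the slice actually needs (assumed representation).
struct NeedFlags {
    bool neededForReg = false;
    bool neededForMem = false;
    bool neededForEsp = false;
    bool neededForEbp = false;
};

// Sketch of design II: answer "false" only when we are sure the memory input
// cannot influence any needed output; be conservative everywhere else.
bool isMemInputReallyNeeded(Opcode op, const NeedFlags& need) {
    if (op == Opcode::Push) {
        // A push changes ESP by a fixed amount, independent of the pushed value.
        const bool onlyEsp = need.neededForEsp && !need.neededForReg &&
                             !need.neededForMem && !need.neededForEbp;
        if (onlyEsp) {
            return false;  // the mem dependency link need not be followed
        }
    }
    return true;  // unknown or mixed cases: keep the link
}

// Small usage example mirroring the 235852 case (push needed only for esp).
inline void exampleUsage() {
    NeedFlags espOnly;
    espOnly.neededForEsp = true;
    assert(!isMemInputReallyNeeded(Opcode::Push, espOnly));
}
```

The conservative default matches the "always return true for unknown cases" note: a wrong "true" only enlarges the slice, while a wrong "false" would break it.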
[13] check the updated dependency list of the function, see below:
    --processFunction ts 204104
    ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104
    ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104
    ERROR in reading register 1n-- ERROR reading reg value at 158728 or 204104
    -- has mem dependency at 199498 on first write on 0x4209a8
    -- has mem dependency at 195804 on 0x321ef8, first 4 bytes: 420580 and 322c98, size: 4
    -- has mem dependency at 168552 on 0x12fb4e, first 4 bytes: 420000 and 320210, size: 2
    ==> it seems that 199498 mov [0x4209a8] is part of initmbctable. It's writing to ptmbcinfo (a global variable).
    It's depended on by 201174 (@401c04), a part of setlocalinfo <- vscanf.
9:30AM 01/04/2014
--------------------------------------------------------------------------------------
Task 199: find another thing to improve
--------------------------------------------------------------------------------------
[1] verify if the last improvement is working. Verified, working.
[2] Think about the case of 199498, and see if it can be improved. Information below.
    199498 is saving information to ptmbcinfo.
    timeStamp: 199498, ins @4071f0: mov [0x4209A8], ebx write: (start: 0x4209a8, end: 0x4209ab) , DEPLINKS: , R: 195779 , C: 199497 ESP: 0x12ff28 EBP: 0x12ff5c
    It's depended on by 201174 (@401c04), a part of setlocalinfo <- vscanf.
    201174 is depended on by 201175, which is a branch
    Then it is depended on by 201176
    [2.2] check 201174, it is a part of function ___
        its return is 0x401c43, ts: 201185.
        --processFunction ts 201185
        -- set ts 201185 @0x401c43 in slice
        Function included!
        add into slice RET 201185 @0x401c43
        check why function 201185 is included
[3] conditional debug of processFunction 201185.
    The function is included because it changes esp/ebp
[4] explore whether, if the change of esp/ebp is ignored, the function still has a dependee.
    failed, because it needs to collect the recording information.
Current stats:
Trace Size: 267371, in slice: 57077, Percentage: 21.35%
Instruction Store Size: 47554, in slice: 8772, Percentage: 18.446398%
Instruction Store Size (excluding imported DLL): 3511, in slice: 1923, Percentage: 54.770721%
[5] slicing improvement: after the check of bNoChangeOnEspEbp, check whether on the same callEIP the difference of ESP and EBP is the same, and check if the minor adjustment can fit into the slot.
    Call binWriter to generate the binary instruction.
    Once it passes, this information should be passed to binWriter when writing the slice.
10:30AM
[6] Detailed Alg Design:
    [1] declare class CallAdjustFailureRecord(eip). Supports methods addEIP, findEIP. Itself keeps a simple hash function. Needs serialization.
    [2] unit testing CallAdjustFailureRecord.
    [3] declare class CallAdjustRecord (eip, esp_change, ebp_change), use CachedMap.
    [4] in Trace destructor, call the eipToCallAdjustRecord clear and see if the destructor of CallAdjustRecord is called.
    [5] add bool Trace::AdjustCall(ts,tsRet); it should first check if eip belongs to the CallAdjustFailureRecord, and then check if it is manageable to replace the instruction with the adjustment of esp.
    [6] update the binWriter::writePartrialTrace. Call Trace::eipToCallAdjustRecord and see if there is any record to replace.
11:00AM
[7] Implementation
    [1] CallAdjustFailureRecordProcessor [1 hr]
        [1] vector<unsigned int> vecEIP, map<eip, 1>, Cache [8 min] DONE.
        [2] function addEIP() [8 min] DONE
        [3] function serializeTo(char *) [15 min] DONE.
        [4] function deserlizeFrom(char *) [10 min] DONE.
        [5] hasEIP [5 min] DONE.
        [6] loadFromCache and saveToCache [15 min] DONE.
        [5] unit testing [20 min]
            [1] something wrong with rrv loadFromCache. OK.
            [2] needs to adjust the serialization to save integer by integer.
3:00PM
            [3] debug through. Found the problem.
3:50PM
    [2] define class CallAdjustRecord [DONE]
        [1] define data members: [5 min] DONE.
        [2] public operations: [30 min] DONE.
            (1) define class CallAdjustRecord(eip, espChange, ebpChange) - just data class [10 min]
            (2) function int size asReplacement(char *buf), use binWriter functions. [20 min]
        [3] unit test [25 min]
4:45PM
9:00AM 1/5/2014
    [3] define class CallAdjustRecordProcessor [estimated 2 hrs]
        [1] data members [15 min] DONE.
            (1) string basePath <- from Trace directory
            (2) CallAdjustFailureRecordProcessor
            (3) vector of CAR*
            (4) map of eip to CAR*
        [2] public operations: [1 hr]
            (1) constructor: based on Job::REQUEST_MODE decide whether to load the cache for CallAdjustFailureRecordProcessor or create it as new. First get the current job from BatchAnalyzer, and then job->job_path. [15 min] DONE
            (2) destructor: remove the list of CAR*. [5 min] DONE
            (3) protected: getCAR(eip) -> pointer to CallAdjustRecord [5 min] DONE
            (4) protected: addCAR(eip, espChange, ebpChange) [8 min] DONE
            (5) public: tryAddCar(eip, esp_change, ebpChange) -> return false if failed. It first checks CallAdjustFailureProcessor to see if this is a failure record, and then it gets the current car and checks whether the espChange and ebpChange are the same; if they are the same, it attempts to serialize it. For all failures, mark the callAdjustFailureRecordProcessor. [25 min]
6:50PM
        [3] include CallAdjustRecordProcessor in Trace [40 min]
            (1) declare CallAdjustRecordProcessor in Trace [8 min] DONE.
            (2) change the logic on hasDataDependee based on isFunctionNoChangeESPEBP. [15 min]
            (3) change the logic of binWriter [15 min]
        [4] debug into it. [30 min]
            [1] request mode [15 min]
            [2] work mode [20 min]
9:00AM 1/6/2014
[1] test the CallAdjustRecordProcessor
    [1] 1. the request mode [20 min] ok.
        bp on the constructor and destructor and then start the raw mode.
        trace_record ok, full_trace ok.
    [2] 2. test 2nd chance code in Trace.cc:1508 [15 min] DONE.
    [3] fix the addCAR problem. [10 min] fixed.
    [3] 3. check the binWriter case. [20 min]
        Observation: in raw mode none was collected
10:40AM
    [4] use the capture mode.
        Found the problem: the EIPs pushed into the carp are not the "call" instruction.
    [5] check Trace.cc:1515 again. Fixed the bug
        first replace: eip 0x401355 (call heapSetInformation)
        second replace: eip 0x405058
        Fixed. New stats are:
        Trace Size: 267373, in slice: 56912, Percentage: 21.29%
        Instruction Store Size: 47547, in slice: 8717, Percentage: 18.333438%
        Instruction Store Size (excluding imported DLL): 3511, in slice: 1916, Percentage: 54.571347%
        Trace size: 21.35->21.29%.
    [5] check the setarg call again.
        ret: timeStamp: 204106, ins @40588d: ret
        --processFunction ts 204106
        ERROR in reading register 5n-- ERROR reading reg value at 158730 or 204106
        ERROR in reading register 5n-- ERROR reading reg value at 158730 or 204106
        ERROR in reading register 1n-- ERROR reading reg value at 158730 or 204106
        -- has mem dependency at 199500 on first write on 0x4209a8
        -- has mem dependency at 195806 on 0x321ef8, first 4 bytes: 420580 and 322c98, size: 4
        -- has mem dependency at 168554 on 0x12fb4e, first 4 bytes: 420000 and 320210, size: 2
        -- has mem dependency at 168531 on 0x12fb4c, first 4 bytes: 420000 and 320210, size: 2
        -- has mem dependency at 168232 on 0x12fb32, first 4 bytes: 427c80 and 320210, size: 2
        check whether the setlocal function is skipped (for 199500)
        ********************************************************************************
        for 0x401e2b (call setLocaleUpdate)
        --processFunction ts 238992
        -- has reg dependency at 238990 for reg 5 (12fcf8 vs 12fcfc)
        -- dependency on [esi] ---> it seems that the reg 5 is not right! Check later!!!
        -- has mem dependency at 238987 on 0x12fd18, first 4 bytes: 8148 and 8101, size: 1
        -- set ts 238992 @0x401c43 in slice
        Function included!
        add into slice RET 238992 @0x401c43
01/07/2014 8:30AM
--------------------------------------------------------------------------------------
Task 200: add the branch exit
--------------------------------------------------------------------------------------
Idea: find an empty hole big enough and insert the branch to call the TerminateProcess function.
[1] modify Trace::branch_slice and insert the call on binWriter
    [1] in gen_branch_slice call binWriter writeProgramExit [8 min]
    [2] in binWriter header define writeProgramExit [8 min]
    [3] implement it [30 min]
9:50AM
[2] genExitCode
    [2.1] skeleton [15 min]
    [2.2] genCallTerminate [20 min]
    [2.3] implement writeLittleEndian [10 min]
10:30
    [2.4] finish genCallTerminate logic [20 min]
11:30
    [2.5] finish the jmp logic. [DONE]
-- 6:50AM 01/08/2014
    [2.6] finish the logic of changeBranch (10 min) [DONE]
    [2.7] handle the logic of failure of inserting branching exit. Remove it from folder. (15 min) [DONE]
7:30AM
[3] Debugging
    [1] WriteProgramExit [ok]
    [2] findHole. problem minDist. FIXED.
    [3] need to flush after writing the first step.
    [4] asJMP.
    [5] genExitCode.
    [6] genTerminateProcess
        There are problems with genTerminateProcess.
10:00AM
    [7] fix genTerminateProcess. Don't use strcpy. Use memcpy. [10 min]
    [8] the terminate code is not written into the right place. Fix it. [15 min]
10:55AM
    [9] target address of TerminateProcess is wrong (it is 1e1c, the correct one should be 1e16) - 6 bytes away.
    [10] Now fix the visiting logic in findHole.
11:00AM 1/9/2014
        [10.1] define bool checkIsHole(Trace* trace,unsigned int eipStart, int size, int fidTarget, int fidSource) [25 min]
        [10.2] add a second fid of source, read the instruction opcode and then check trace->hasInstruction, and call checkIsHole() [10 min] DONE.
        [10.3] modify genProgramExit and add parameter source filename [5 min] DONE.
        [10.4] modify gen_slice_for_branch to pass parameter source file name [5 min] DONE.
        [10.5] modify the caller. [8 min] DONE. Success.
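
The byte-level steps of Task 200 (little-endian writes, the jmp displacement, and copying code with memcpy rather than strcpy) can be illustrated with the sketch below. It is not the binWriter code: the signature of writeLittleEndian and the helpers encodeJmpRel32 and copyIntoHole are assumptions made for illustration. Only the x86 facts are certain: `jmp rel32` is opcode E9 followed by a 32-bit little-endian displacement measured from the address of the next instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Store a 32-bit value least-significant byte first (signature is an assumption).
void writeLittleEndian(std::uint8_t* buf, std::uint32_t value) {
    assert(buf != nullptr);
    for (int i = 0; i < 4; ++i) {
        buf[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Hypothetical helper: encode "jmp rel32" placed at address `from`, jumping to `target`.
// The displacement is relative to the NEXT instruction, i.e. from + 5.
std::size_t encodeJmpRel32(std::uint8_t* buf, std::uint32_t from, std::uint32_t target) {
    assert(buf != nullptr);
    buf[0] = 0xE9;                                    // jmp rel32 opcode
    writeLittleEndian(buf + 1, target - (from + 5));  // wraps correctly for backward jumps
    return 5;                                         // instruction length in bytes
}

// Hypothetical helper: copy generated code into a hole in the image.
// memcpy is required because machine code routinely contains 0x00 bytes,
// at which strcpy would stop copying.
void copyIntoHole(std::uint8_t* hole, const std::uint8_t* code, std::size_t size) {
    assert(hole != nullptr && code != nullptr);
    std::memcpy(hole, code, size);
}
```

For example, a jump from 0x401000 to 0x401e16 encodes as E9 11 0E 00 00; the two trailing zero bytes are exactly what strcpy would drop. A wrong instruction length in the "next instruction" base shifts the target by a constant number of bytes; whether that is what caused the 6-byte offset noted in [9] is not stated in the log.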
9:00AM
--------------------------------------------------------------------------------------
Task 201: generate all branches and build the program to collect running results.
--------------------------------------------------------------------------------------
[1] build all branches. [30 min]
    reports error on case skip
[2] regenerate raw and full slice mode 0 -- strangely the network does not work
[3] modify the collection program. DONE.
8:00AM 1/11/2014
[4] check samba configuration
    try command "net use y: \\169.254.236.150\smbuser"
    Observation: try "dir" -> it's very slow, then try "ipconfig /all": it's extremely slow. Not sure what's going on.
    "notepad" is also working, but slow.
    "ipconfig" - never reports anything, just hangs. -> after 30 minutes no response
9:00AM
--------------------------------------------------------------------------------------
Task 202: check qemu image networking problem again.
--------------------------------------------------------------------------------------
[1] recompile and reset. Does not work
10:45AM
[2] read previous logs about network setup. Check how to diagnose.
    Using tcpdump, it seems that the xp vm is sending requests to the tap0 device (captured by tcpdump), however, there is never a response.
[3] read about TAP device: tap is for link layer (layer 2), tun is for routing (layer 3). A user program connects to the tap/tun device to receive packets.
[4] read about bridge: a bridge of two adaptors simply merges these two.
    --- strangely, the host regards it as 10.0.2.16.
******************************************************************************************
******************************************************************************************
-------------------------------------------------------------
-- solved the problem. Needs to set static IP 10.0.2.15 for the br0.
(manually, see the new startup.sh in qemu_image)
-------------------------------------------------------------
******************************************************************************************
******************************************************************************************
1/12/2014 10:40AM
--------------------------------------------------------------------------------------
Task 202: check the case skip problem
--------------------------------------------------------------------------------------
[1] identify the ts that broke things. 132786
[2] regenerate full trace.
    Problem: tsEntry:126977, tsCur: 127073, tsEnd:130412, tsStart: 126990 (the start of SOC)
    First check if 126977 and 127073 are a matching pair. Confirmed. The call is about init_security_cookie.
[3] find out how SOC 126990 to 130412 is established.
    It is the result of verify_and_reset_SOCs.
    The problem is that the ts introduced in the merged slice introduced a call
    Problem is that the verify function already reports false, but it still continues to slice.
[4] another case skip problem: this is because the last SOC (instead of SOCPrev) is not verified yet
    Fixed: new problem: cannot handle instruction size 6.
9:00AM 01/13/2014
--------------------------------------------------------------------------------------
Task 203: fix slicing algorithm bugs
--------------------------------------------------------------------------------------
[1] identify ts: for cannot handle instruction size 6.
    error thrown by binWriter changeBranch.
    instruction bytes: 0x0f 0x84 0xc1 0x00 0x00 0x00 0x00
    handle size 6.
10:00AM
[2] check merging logic infinite loop problem;
    problem: timestamp 203769 is added over and over again.
    Fix the logic of sm.addSOC --> when it's already in slice, no need to return true.
    Found the bug: soc is not inserted when its index is 0.
11:00AM
[3] run the 367 branches. Found new error: 146858.
    [138th] set conditional bp
    Problem with soc(146103->146247) bridge to 146249, soc id 4.
    It seems that it needs one more iteration of fullslice.
    Verified it's the problem of verify_and_reset
    Problem with verify_bridge
7:30PM
[4] new problem: ts=242259 tsCur 238465
    check how trace->tsToMrM is used. Fixed.
11:00AM 01/14/2014
[1] now complete run of all 327 branches
[2] Still got a lot of c005 errors.
[3] these programs crash the debugger itself. Check the bridge for program entry.
8:30AM 01/14/2014
[5] bug found: bridge right after ts does not work, has to be in the same thread. Add a Util::error_exit on location.
    [5.1] regenerate the raw and full trace. DONE. [15 min]
    [5.2] run and capture error. [5 min]
        discovered the error. It is the id 1 branch (2nd branch).
        generate file location error for 126979.
        This is caused by a bug that discovers the context switch one instruction late, so one instruction of the interrupt handler is cut in.
9:00AM
[6] bug fix: context switch handling
    [6.1] read the code about Context Switch [20 min]
        [6.1.1] gen raw trace and check it again.
    [6.2] recompile and see if the problem persists. [10 min]
        Now works for 4 examples
[7] check brc_3, why the exit code is something different.
    [7.1] first branch is still not right. Debug into it [15 min]
    [7.2] found the problem is the program entry handling, not handled. [10 min]
        The problem is that the last instruction is a CALL, and it is marked off. So the control directly jumps to the next one. Need to uncomment the unmark statement
    [7.3] fix and test [10 min]
        [7.3.1] another bug at branch 2. The problem is that the program entry is not added in slice.
10:30AM
[8] another bug: terminate branch itself. It seems that the JE branch is not right in brc_2. The jmp is not right.
    Fixed the bug in changeBranch. SOLVED
10:40AM
[9] another bug: brc_0 is not right. Still program entry. Fixed.
11:00AM
[10] let it run for a while. Broke at slice 7. Check it later.
[11] add configure DUMP_ENABLED handle. DONE.
11:50AM [11] now the scanf example does not work. First it broke at a function; then, the printf is included. [1] recompile and regenerate the trace in both modes. mode 1: mode 0; [2] found that printf is skipped successfully (saving about 20% of instr store instructions), but it still broke at _minit. [3] trace into the problem and see why it is broke. (ts=240744) Broke at 0x40606a. -> 0x406112 The problem is at 0x405d42 the esp value is different, causing ret value different. Problem found: 0x00405d34 (pop) is not included. [4] new problem found: 0x405d52 (pop) is not included which causes the problem- VERIFY LATER. Study the two calls at: 405d5a (ret: 144636) is skipped, 405d5f (ret: 144648) is called, 405d64 (ret, ts: 144649) failed. Function 144637(ret ts) is skipped, however, the delay dependency on reg (14) [bp] is relayed to 144612, but there is no delay of esp. Conditional bp on 144637 process function. Problem: only EBP delay is recorded. Check these two instructions! Problem: 144637 is inslice (next call), it depends on 144636 for esp, however, it's not needed for esp. Check: how the esp of 144649 is broken It seems that RET does not propagate the esp link. Fixed. 9:00AM 01/17/2014 -------------------------------------------------------------------------------------- Task 204: fix slicing algorithm bugs -------------------------------------------------------------------------------------- [1] check another bug at 0x004085a5. mov[edi+constant], eax. Both edi and eax are not right. The problem is that eip 0x4085a5 (ts: 231916) depends on 0x408597 (ts: 231891) ==> 231916 is not needed for mem and does not propagate data But its data destination needs register, hence actually bDataNoPropagation should be set to false. [2] debug 231916 [20 min] Proposed solution: add a method in InstrInfo to tell if an instruction needs to write to memory, and the destination has register, set bNoDataPropation to false, and set bNoDataRegDependency to false. 
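The proposed rule in [2] above can be sketched in a few lines of C. This is only an illustration under assumptions: DepFlags and adjust_dep_flags are hypothetical names that stand in for the real bNoDataPropagation / bNoDataRegDependency fields and the planned InstrInfo method, whose actual code is not shown in these notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-instruction flags named in the notes. */
typedef struct {
    bool no_data_propagation;  /* plays the role of bNoDataPropagation   */
    bool no_reg_dependency;    /* plays the role of bNoDataRegDependency */
} DepFlags;

/*
 * Hypothetical helper, not the project's real API.
 * If the instruction writes to memory and the memory operand is addressed
 * through a register (e.g. mov [edi+const], eax), the address register is
 * still needed, so both "no dependency" flags are cleared.
 * In every other case the flags are left as they were.
 */
void adjust_dep_flags(DepFlags *flags, bool writes_memory, bool mem_operand_has_reg)
{
    assert(flags != NULL);
    if (writes_memory && mem_operand_has_reg) {
        flags->no_data_propagation = false;
        flags->no_reg_dependency = false;
    }
}
```

For the failing case at 0x4085a5 (mov [edi+constant], eax), both arguments would be true, so the dependency on the instruction that sets edi (ts 231891) is kept.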
9:30AM [3] Implementation: [3.1] Create InstrInfo::isMemOperandContainReg() [3r min] DONE [1] declare a flag for hasMemoperandContaingReg [5 min] DONE. [2] declare the set and get function [8 min] DONE. [3] declare a checkRegUpdate function [8 min] DONE [4] update the setInputOutput reg [5 min] DONE [5] debug. [15 min] DONE [3.2] Modify InstrExecRecorder::isMemInputReallyNeeded [15 min] SKIP. [3.3] in Trace.cc change the logic to change value on bNoDataPropagation and bNoRegDep. [15 min] DONE [3.4] debug through [30 min] Now the new stats eip 0x4085a5 (ts: 231916) depends on 0x408597 (ts: 231891) [3.1] 45 min DONE. does not pass unit testing. fixed. another unrelated unit testing bug. fix destructor of CallAdjustRecord. [3.2] 8 min -> skipped [3.3] 15 min [3.3.1] regenerate the trace. [1] raw, [2] full and doc, [3] branch check @4085a5 (it's not there anymore) Regenerate the mode 1 and then mode 0 9:00AM 01/18/2014 [4] new problem: 0x004019ac. (ts: 234937) The problem is that its memroy dependency link is not propagated. [4.1] conditional BP on 234937 and find out the cause. [30 min] Problem is line 642 introduced in yesterday's code. 9:35AM [4.2] fix: add the check on isNeededForReg as well, and then check [15 min] fixed. 10:00AM [5] new problem: 0x40cc8f. [5.1] comparative study of trace. [35 min] failed. There are so many instances of 0x40cc8f. Regenerate the entire thing. mode 1 and mode 0 Now the problem is located at eip: 0x406cdc (this is the first time that the time stamp is hit: ts: 193869) The address to write is not right. Problem: esi value is not ok. It relies on 192920 (@40ce44 pop esi), and the instruction in the slice is NOP (problem). So the problem is that The ESI register is changed in the function that contains @40ce44, but its value is not recovered. go to check the slicing log. 
Function is: 181506@406cb2 -----------------------------------------------------------------> 192936@40ce44 (skipped)
inside 406cb2 it calls (181624@40ce81 ---> @40ce44:192920 192929 @40ce51 ) is included because 170185@40ce81 --> 181486@40ce51 is INCLUDED!
@0x40ce81: ts: 181624), the function is skipped.
In unsliced version: 0x40ce81 --> 0x40ce44 --> 0x406cb2 --> 0x40ce81 --> 0x40ce44 --> 0x406cdc
Both functions (@0x40ce81, @0x406cb2) preserve the esi value
In sliced version (buggy): 0x40ce81 --> 0x40ce44(nop) --> 0x406cb2 (function skipped did not call 0x40ce81) --> 0x406cdc (error on esi)
Check log: 193869 (@406cdc) --> 192920 in slice (0x40ce44) --> processFunction ts 192936 [the ret for function 0x406cb2] --> 0x40ce44 delayed to 181477 (@0x40ce44) [but the log shows that it is included in slice]
Next check: in binWriter call trace->setIER_II(181477) and then check if it's in slice.
Strangely IER is in slice but II is not in slice. Strangely, after the first set of inSlice, the II becomes false. [[check!!!]
Found the problem: Trace::setInSlice first checks ier->setInSlice(); if it's already set, it does not call ii->setInSlice(), which sets the counter; thus the counter is always 1, even if there are multiple increments.
But when applying delayRegDependence, if an instruction has multiple delay links, the counter is decremented too many times.
In the design, the count in InstrInfo reflects the number of distinct locations in the slice. The call of unmark slice does not seem to follow this semantics strictly.
Change the implementation of unmarkInSlice: if the ier is not in slice, don't do it. FIXED.!!!!
8:30AM 01/20/2014
--------------------------------------------------------------------------------------
Task 204: make sure the 1st 10 slices are working
--------------------------------------------------------------------------------------
1. manual analysis and running both prove that they are working.
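The counter semantics behind the unmarkInSlice fix above can be illustrated with a minimal C sketch. The type and function names here (InstrInfo, ExecRecord, mark_in_slice, unmark_in_slice) are simplified assumptions for illustration; they are not the project's real Trace / InstrInfo / InstrExecRecorder code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Static instruction: counts how many distinct executions are in the slice. */
typedef struct {
    int slice_count;
} InstrInfo;

/* One dynamic execution (one timestamp) of a static instruction. */
typedef struct {
    bool in_slice;
    InstrInfo *ii;
} ExecRecord;

/* Marking is idempotent: a record that is already in the slice adds nothing. */
void mark_in_slice(ExecRecord *r)
{
    assert(r != NULL && r->ii != NULL);
    if (r->in_slice) {
        return;
    }
    r->in_slice = true;
    r->ii->slice_count++;
}

/*
 * Unmarking mirrors marking: if the record is not in the slice, do nothing.
 * Without this guard, several delay links to the same record would each
 * decrement the shared counter and drive it below the true number of
 * in-slice executions.
 */
void unmark_in_slice(ExecRecord *r)
{
    assert(r != NULL && r->ii != NULL);
    if (!r->in_slice) {
        return;
    }
    r->in_slice = false;
    r->ii->slice_count--;
}

/* The static instruction stays in the slice while any execution needs it. */
bool instr_in_slice(const InstrInfo *ii)
{
    assert(ii != NULL);
    return ii->slice_count > 0;
}
```

With two records sharing one InstrInfo, marking both and then unmarking the first one twice leaves the count at 1, so the instruction is still emitted instead of being turned into a NOP.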
9:30AM -------------------------------------------------------------------------------------- Task 205: chain all the parses together and create a new task type. -------------------------------------------------------------------------------------- [0] planning [20 min] --9:45 AM implementations (expected to complete 11:45AM) [1] Modify the config.txt [5 min] DONE. [2] add Job category in header file [5 min] DONE. [3] process the category [10 min]. DONE [4] create class taskBatchBranchSlice [15 min] [5] call taskBatchBranchSlice [10 min] DONE [6] debug the above [15 min] --10:45AM [7] refine implementation of taskBatchBranchSlice [40 min] [1] add a parameter to Trace::gen_branch_slice to change JOB::PRESERVE_REQUEST_MODE value, if the input value is -1, keep the orginal value. [8 min] DONE. [2] change the taskBranchSlice correspondingly and make the compile through [10 min] DONE. [3] fix the others [8 min] DONE. [4] debug through [15 min] --11:50AM [8] debug the above [20 min] [1] found the problem related to job_cateogory. [2] fix: [2.1] create class taskChangeJobCateogory [20 min] [2.2] insert taskChangeJobCategory [20 min] [2.3] debug [15 min] [9] test first 10 slices [20 min] [1] problem. segmentation fault! 10:00AM 01/21/2014 [9.1] fix the segmentation fault problem. The problem is that the taskChangeCategory has no logger property. Fix that. [25 min] [9.2] find out the gen_branch segmentation fault problem. [15 min] problem is that full trace is not there yet. The problem is that the raw trace is not saved to disk. 11:15AM [9.3] start a raw trace mode running and find out when it is written to disk. [15 min] It's called by the destructor of TraceManager. [9.4] Solution: add a new task to delete all Traces [20 min] DONE. [9.5] new problem. the second loadvm is timedout. rebuild all. problem 1. the "log" is counted as one job. fixed problem 2. the vm should be resumed. fixed 9:30AM 01/22/2014 [9.5] the vm is still stopped. Need a command to resume the vm. 
add a resume task after the loadvm task. recompile all. bp on taskContVM::do_job Problem: it never actually hits the taskContVM. Add the "cont " command to loadVM Now it seems to work (after appending a cont command) -> ng helper_trace2 at least [9.6] problem: send_evt problem on TraceManager::myinst (already null), it always pop a segmentation fault. Check why. Recompile first. [1] fix destructor of TraceManager. DONE. [2] add TraceManager::createInstance() in loadvm. DONE. [3] need to clear numCR3. check: bp on ops_sse.h:2386 check when the vm loaded signal is setn. [4] problem: callAdjustRecord destructor error when deleting raw trace in gen_full_trace. SOLVED. -=----------------- [10] problem: out of memory. Check if Cache has destructor. Use leak detector. problems seems to be fixed. [11] rr_processor loadFromCache (round 2) b Trace.cc:2213 (data is the ame for two parses)A The problem is with line 279 of Cache.cc if total_size = 0, should not be added with arrPosition. [12] out of memory again. 8:30AM 01/25/2014 [13] run the system again. Still crashes because of memory leak. Check val_run using full_mode. [1] leak on cacheRV. Found the problem when loadCache, there is a memory leak. It's a possible leak. leave it. [14] run valgrind on branch slice and check leak. problem in x86_disassembly a lot of memory leaks. Problem is that x86_disas cals calloc to intialize insn->op field, but did not release it. is there any x86_destroy_insn functions? see below-- x86_oplist_free(&insn); //does the real job x86_cleanup(); //this function simply returns 1? -- to do: find each x86_disasm(...) and call x86_oplist_free correspondingly. fixed [15] fix the mapTsToId problem: Now owrking the first time! add another source. 9:30AM 01/26/2014 [1] continue the exploration of memory problem. There are around 560MB memory not freed for the PC emulator. Check in the raw mode (reloading snapshot), if it is getting worse. Reachable is about the same. 
It seems that we can stop the exploration of mem leak. [2] run the batch again and see if it terminates right. error in cannot open raw_traces. The problem: the full_trace task tries to open it but raw trace is not there. bp on taskSaveTrace::synch_job(). strangely taskSaveTrace is only hit once. It seems that taskSaveTrace does not save the trace. remove the raw_trace there. Problem initVM recreates the trace!!! Solution: move the part of the code to initVM. [3] now the problem of rr_processor unit test fails. double free. just checked. avoid double free. check rrProcessor.saveToCache() is called. add to save_rr_processor to taskTraces. 12:00PM 01/28/2014. It pops an error in save_rr_processor in the second pass. should be fixed [4] out of memory again. This time try to break on pc.c:929 It is not hit the second time. Try valgrind again. It still times out. Try enlarge mem capacity. does not work. ram device error. [5] check mtrace. embedded mtrace code. the trick is to use "export MALLOC_TRACE=/tmp/t". If it's not in tmp, it seems not passing. The progra mstopped at about 400MB (request 200MB). [6] try setting OOM killer exceptions. 9:30AM 01/29/2014 -------------------------------------------------------------------------------------- Task 206: solve the out of memory problem and other problems of batch branch -------------------------------------------------------------------------------------- [1] try change the trace size. seems to solve the problem. [2] error: cache must be empty before saving to cache. Strangely the save_rr_processor is called in the second pass, where the Job::REQUEST MODE is 1. It's the change it back causing the problem. [3] job2 starts from slicing immediately. The category is not reset back. Now seems working. [4] move a 3rd file to processing. passed [5] check 10 branches each file. and then run the results. [6] smbd connect still too slow, try adjust /etc/samba/smb.conf, enable the tcp_option= NO_DELAY. seems not helping. 
Seems stil lnot working. reboot. [7] smbd: it seems to be the problem of xp side. The initial request is not sent until several minutes later. [8] double free delete this->cur_job; [9] samba is still too slow. Try to think about the solution. [10] try take another snapshot with it net use ... already loaded. problem with snapshot. inactive CPU. (info cpus -> halted even after cont command). 2:30PM 01/31/2014 -------------------------------------------------------------------------------------- Task 207: solve the SLOW net map samba drive problem -------------------------------------------------------------------------------------- Check the following functions. qemu_run_all_timers () at qemu-timer.c:454 (calls the following) qemu_run_timers is visited many many times It seems that it takes a very long time to reach the "break" in the following 386 if (!qemu_timer_expired_ns(ts, current_time)) { 387 break; 388 } ******* SOLUTION **************** [1] attempt 1: add a base timestamp to -rtc clock=vm option. does not work. [2] search "halted" for the "cpus command" the "(halted)" message is printed by hmp_info_cpus. It is reading value cpu->value->halted. It is generated by qmp_query_cpus. It's actually getting the env->halted. interestingly when helper_trace2 is called, env->halted is always 0. (this is for snap111) When loading snap222, the env->halted is always 1 when doing qmp_query_cpus, however, it is 0, when helper_trace2 is hit. set a watch point on it and see how it's changed? It is called by do_hlt <- helper_halt. While loading snap111, it's not easy to hit the do_hlt. It seems that both snap111 (good one) and snap222 (bad one) do turn on/off the env->halted. The question is: maybe it is unrelated? (or maybe related)? [3] research how is helper_hlt called. It is triggered by a hlt instruction (see translate.c). a hlt instruction halts CPU until the next interrupt (e.g., timer interrupt). [4] check timer interrupt. 
from intel documentation, timer interrupt is generated by apic (later verified wrong. should be i8254 chip. there is a qemu simulator for it).
timer interrupt is triggered in acpi_pm_tmr_update (wrong: should be pit_update_timer in i8254.c)
[5] check interrupt handling
IRQ: 0
hardware interrupt is done using do_interrupt_x86_hardirq
Compare snap111 and snap222: when sending a keyboard event, do_interrupt_x86_hardirq is called (intno: 147),
however, in the failed snap222 it is called much less frequently than in snap111.
A lot of intno 61, 98 etc. suspect 61 is the timer interrupt.
********************************************************************
Guess: the messed-up snapshot has an invalid CLOCK value, which does not trigger the timer interrupt.
********************************************************************
9:00AM 02/01/2014
[6] read hw/apic carefully. If possible, figure out how hardware interrupt is raised. [1 hr]
location: hw/apic.c. Interesting functions listed below:
NMI - non-maskable interrupt
SMI - system management interrupt (when OS is suspended. CPU in system management mode)
APIC supports several "delivery" methods of interrupt: local, smm, external, and bus deliver.
For non-maskable interrupts (NMI), it calls cpu_interrupt to directly pass the non-maskable or system management interrupts in. For others, it sets the irq using apic_set_irq.
** apic_update_irq (signals CPU when an IRQ is pending)
** apic_set_irq
both call cpu_interrupt, which changes env; may be the ones used by the timer
** apic_get_interrupt gets the highest priority interrupt currently in apic
Now timer related functions:
*** apic_timer_update
*** apic_timer
Interestingly, these two timers are not called in snap111 (the good one).
It seems that APIC is the wrong place to look at. qemu_mod_timer and hw/i8254 is the place to look at. Intel i8254 is the programmable timer.
[7] read hw/i8254.c (programmable timer) [0.5 hr]
8254 uses pit_set_gate to send out information.
the important function is pit_irq_timer_update
-----------------
*pit_irq_timer_update: it computes the expire time and irq level and calls pit_set_irq [seems to be called for every timer interrupt]
next_delay is defined as (expire_time-current_time)/get_ticks_per_sec()
get_ticks_per_sec() returns 1G. expire_time-current_time in GDB shows something like 843
-----------------
10:30AM 02/01
[8] read qemu-timer.c [1 hr]
qemu_next_alarm_deadline is calculated as the smaller of the delta of host timer and rt timer relative to expire_time.
qemu_del_timer stops a timer (but does not deallocate it). It basically removes the timer from the linked list.
qemu_mod_timer modifies the given timer so that it fires once expire_time is reached. expire_time is the absolute time in ns. It changes the expire time of the given timer "ts" and inserts it back into the list of active timers of its associated clock. There should be three clocks: vm, host, and real time.
*qemu_run_timers ->
  (1) get the current time
  (2) if the active timer is not expired, return; wait until the next time tick to check.
  (3) if there are expired timers, call ts->cb [which is set to i8254.c:pit_irq_timer_update]
Summary: the logic here: main_loop_wait -> qemu_run_all_timers (on vm, host, real)
qemu_run_timers(vm) triggers pit_irq_timer_update, which immediately issues a qemu_set_irq request and then updates the timer (to set the next expire time)
qemu_set_irq -> pic_irq_request -> cpu_interrupt
Observation on snap222: pit_irq_timer_update is still called frequently.
So what's the difference between snap111 and snap222????
11:30AM
[9] attempt: explore the relation between [30 min]
do_interrupt_x86_hardirq are called (intno: 147) AND pit_irq_timer_update
Design: first break on pit_irq_timer_update and then do_interrupt_x86_hardirq and see if it's paired, and record interrupt number in snap111 and then repeat it in snap222.
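The qemu_run_timers / qemu_mod_timer logic summarized under [8] above can be modeled with a small, self-contained C sketch. This is a simplified model written for these notes, not QEMU's code: Timer, timer_mod, run_timers, Periodic and catch_up_ticks are all hypothetical names. It also shows why a periodic callback that re-arms itself by one period can fire many times in a single pass when the current time is far ahead of expire_time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Timer Timer;
struct Timer {
    int64_t expire_time;          /* absolute time in ns */
    void (*cb)(void *opaque);     /* callback run on expiry */
    void *opaque;
    Timer *next;                  /* active list, sorted by expire_time */
};

/* Re-arm a timer: unlink it if it is active, then insert it in sorted order. */
void timer_mod(Timer **head, Timer *t, int64_t expire_time)
{
    Timer **pp;

    assert(head != NULL && t != NULL);
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            break;
        }
    }
    t->expire_time = expire_time;
    for (pp = head; *pp != NULL && (*pp)->expire_time <= expire_time; pp = &(*pp)->next) {
        /* skip timers that expire no later than this one */
    }
    t->next = *pp;
    *pp = t;
}

/* Fire every timer whose expire_time has been reached; return how many fired. */
int run_timers(Timer **head, int64_t now)
{
    int fired = 0;

    assert(head != NULL);
    for (;;) {
        Timer *t = *head;
        if (t == NULL || t->expire_time > now) {
            break;                /* earliest timer not expired yet */
        }
        *head = t->next;          /* unlink before the callback may re-arm it */
        t->next = NULL;
        t->cb(t->opaque);
        fired++;
    }
    return fired;
}

/* A periodic source that re-arms itself one period after its last expiry. */
typedef struct {
    Timer timer;
    Timer **head;
    int64_t period;
} Periodic;

static void periodic_cb(void *opaque)
{
    Periodic *p = opaque;
    timer_mod(p->head, &p->timer, p->timer.expire_time + p->period);
}

/* How many times does one run_timers() pass fire a timer of this period? */
int catch_up_ticks(int64_t period, int64_t now)
{
    Timer *head = NULL;
    Periodic p = { { 0, periodic_cb, NULL, NULL }, &head, period };

    assert(period > 0);
    p.timer.opaque = &p;
    timer_mod(&head, &p.timer, period);
    return run_timers(&head, now);
}
```

In this model, catch_up_ticks(838, 4190) fires five times in one pass, because the timer is five periods behind the clock when the pass starts.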
[9.1] Observation: in snap111, pit_irq_timer does not trigger do_interrupt_x86_hardirq. The interval between each neighboring pit_irq_timer is 838 and 54923725 (ns). All vm clock triggered. drill down into pit_irq_timer: it's calling hpet_handle_legacy_irq (when irq_level is 0) -> gsi_handler -> 8259 -> pic_update_irq -> pic_irq_request The irq_level in pit_irq_timer_update is alternative between 1 and 0 (because 54923725 %65535 is alternating between 0 and another non-zero number) ****** Note from wiki: On the PC, the BIOS (and thus also DOS) traditionally maps the master 8259 interrupt requests (IRQ0-IRQ7) to interrupt vector offset 8 (INT08-INT0F) and the slave 8259 (in PC/AT and later) interrupt requests (IRQ8-IRQ15) to interrupt vector offset 112 (INT70-INT77). So timer interrupt is int 8. ***** [9.2] attempt 2: set bp on do_interrupt_x86_hardirq and bp on intno: 8 But do_interrupt_x86_hardirq the reported interrupt numbers are mostly 177. Ocassionally 113. keyboard event is 147. Observation 2: in snap222 it is also receiving hardware interrupt 177. Different: when sending a key, interrupt 61 did not occur. Got to do a comparative debugging for int no 61. 6:50PM 02/01/2014 comparative study of hardware interrupt 61. [10] study interrupt 61. Design: break on do_interrupt_x86_hardirq and see how it reached from 147 (twice) to 61. cpu_exec triggered interrupt 147 (keyboard event) Problem: cannot direct "n" in gdb there are long jumps. capture the first 147 and then break on cpu_x86_exec then bp on helper_trace2 for each instruction Observation: the interrupt handler for interrupt 147 is executed on the same sequence. However, interrupt 61 just did not show up. Seems not working. 7:30PM [11] coming back to the timer interrupt. Purpose: figure out the intno for timer interrupt. Candidate interrupt numbers are 177, 61, 68, 98 Actually confirmed in another way, numCR3 in snap222 is never increased. So timer interrupt is handled wrong, or scheduling is not right. 
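The BIOS mapping quoted from the wiki in [9.1] above can be written as a small C function. The name bios_pic_vector is an assumption made for this sketch. It only describes the traditional BIOS/DOS default; protected-mode operating systems usually reprogram these offsets, which may be why vector 8 never shows up in the breakpoints above.

```c
#include <assert.h>

/*
 * Traditional BIOS mapping of 8259 IRQ lines to interrupt vectors:
 *   master IRQ0-IRQ7  -> INT 08h-0Fh
 *   slave  IRQ8-IRQ15 -> INT 70h-77h
 * The caller must pass an IRQ number in the range 0..15.
 */
int bios_pic_vector(int irq)
{
    assert(irq >= 0 && irq <= 15);
    if (irq <= 7) {
        return 0x08 + irq;        /* master 8259 */
    }
    return 0x70 + (irq - 8);      /* slave 8259 */
}
```

Under this default mapping the timer on IRQ 0 would arrive as vector 8 and the last slave line, IRQ 15, as vector 0x77.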
[12] Trace on pit_irq_timer_update
read PIT 8254 documentation. Mode 2 is the rate generator mode.
There seem to be too many timer interrupts generated! It seems that in both snap111 and snap222 images, there is a gap between current time and expire time, and it takes a lot of iterations to reach the point where the two are equal.
???maybe should break on qemu_del_timer in line 269 of pit_irq_timer_update.
8:30AM 02/02/2014
[13] Study the logic of qemu_run_timers again.
A clock has a sequence of timers. Each timer has an expire time in nanoseconds.
Question: what is the difference between a timer and an alarm timer?
struct qemu_alarm_timer can be regarded as a class qemu_alarm_timer which has three methods: start, stop, and rearm. It has no relation with QEMUTimer, but has a data member (depending on OS) pointing to a real timer of the OS.
qemu_next_alarm_deadline computes the "smaller" deadline of host and virtual timers; it is calculated using real system timers.
*** show_available_alarms shows two alarms, dynticks and unix_timer; note that an alarm timer has a name.
qemu_get_clock_ns gets the "real" (lasted time) for vm clock.
qemu_mod_timer: updates the timer's expire_time (given as a parameter) and inserts it back into the clock as an active timer.
qemu_run_timers: takes the current time (from the host system or a real timer) and uses a for loop to repeatedly advance the timer (maybe many times). Each time it calls pit_irq_timer_update to generate IRQ signals. However, most of them are ignored.
event_notifier_set is defined in util/event_notifier-posix.c:92
qemu_clock_warp does not actually do anything, it just returns because use_icount is 0.
The purpose of the alarm clock is to stop the vcpu from the thread of execution code and execute the other thread to check interrupts.
10:30AM
Summary: qemu_run_timers runs the 3 clock timers. Each clock has a sequence of timers, however, it seems that only one timer gets updated.
Each timer, when updated, will call pit_irq_timer_update to trigger IRQ.
Problem/Question/Puzzle1: for each qemu_run_timers the pit_irq_timer_update is called many times, because there is a huge gap to the real time stamp read (or maybe this delay is caused by debugging???...)
10:45AM
[14] explore pit_irq_timer_update again and drill down to details of how signals are sent.
[1] pit_get_out - generates the out-pin value. At mode 2, if the value%count is 0, out is 0; otherwise it is 1. According to the i8254 manual, the low edge represents a clock pulse (signal) to the i8259 PIC or other controller.
[2] pit_irq_timer logic:
  1. calculate the output line (1-bit) level (high-1, low-0)
  2. call qemu_set_irq(s->irq, level) to output the signal. the s->irq defines which controller handles the output channel
     Here each QEMUTimer corresponds to a PITChannel (one of the 3 counters on i8254). Depending on which counter it is, the output line is connected to a different place. According to i8254 documentation, channel (counter) 0 generates the timer interrupt, channel 1 generates the DRAM refresh signal, channel 2 generates signals to the speaker.
  3. drill down into qemu_set_irq; it is calling:
     [1] hpet_handle_legacy_irq (defined in hw/hpet.c) - high precision event timer: hpet is the next generation of timer device (10MHz), better than i8254 (1MHz). it basically forwards the request (irq:0, level:0) to the gsi_handler
     [2] gsi handler (in pc.c). GSI stands for "global system interrupts"; it consists of 8259 and apic irqs. depending on the irq line number n, it sends to 8259 or apic (second chip). In our case, it's level 0 (low edge), and irq number 0: timer interrupt. So it is sent to the 8259 programmable interrupt controller (PIC). 8259's handler will be called.
     [3] pic_set_irq by 8259: (look at i8259.c)
         irq: 0 (line of irq. one of 8 lines)
         level: 0 (low voltage edge)
         elcr is the edge/level control register (marks level-triggered lines); the timer interrupt is edge triggered.
From the code line 174 of i8259.c, we have: s->last_irr is used for edge triggering (remembers the last state for the edge); change is triggered at level 1 (rising edge).
*** s->irr (interrupt request register) is set (on the corresponding bit)
level 1 will update the s->irr (however, it's already marked), then pic_update_irq is called.
[4] pic_update_irq calls pic_get_irq first, it returns -1? not sure why, then it calls pic_irq_lower, which sends pic_irq_request(0,0)
[5] apic_accept_pic_intr: According to chapter 10.5.5 of the Intel Manual, when a local interrupt is sent to APIC, it is subject to a number of criteria for acceptance. If the interrupt is accepted, it is logged into the IRR register.
DO_UPCAST(APICCommonState, busdev.qdev, d) finds the container APICCommonState object that contains d as the qdev member. this is a constant-time operation (by doing some pointer offset arithmetic).
LINT0 is the "Local Interrupt Pin 0" (input pin)
lvt is the local vector table.
MSR_APIC_ENABLE is defined as 1<<12
It seems that the apic interrupt accepted value is always 0 (rejected)
---------------------------------------
So APIC never delivers anything!???? No breakpoint hit! The timer interrupt is always rejected by the APIC controller!
Sequence: qemu_run_timers -> pit_irq_timer_update -> i8254 (timer) -> hpet -> gsi -> i8259 -> apic controller -> rejected (never accepted)
To verify: tomorrow, make a change to the code: disable the pit_irq_timer_update and see if things still work.
Verified: it is ACTUALLY NEEDED! Otherwise the screen is not refreshed!
8:30AM 02/03/2014
[15] continue yesterday's experiment on qemu_run_timers, disable pit_irq_timer_update and see what happens. ???????????????? [STILL CANNOT EXPLAIN]
1. we know that screen updates are not working, but try helper_trace2 and see if the process number is increasing.
   [1] in snap111: numCR3 is increasing (although screen is not showing up)
   [2] in snap222: helper_trace2 is never hit!
Analysis?
[1] apparently, the vm clock running is still useful!
So somehow the interrupt (for sure) is SENT OUT TO cpu.
[2] in snap111: the VM clock timer interrupt is used to refresh screen only. There is another interrupt for process scheduling purposes.
[3] in snap222: the VM clock is used for scheduling purposes (maybe?)
[16] try to understand cpu_x86_exec and main_loop, how does it switch to cpu_x86_exec [SOLVED]
main_loop_wait: poll all select I/O interrupts and run timers
cpu_x86_exec: execute tb blocks
How does the switch occur? main_loop_wait and cpu_x86_exec belong to two threads!
main_loop_wait runs the timers and processes I/O
cpu_x86_exec runs the emulator code!
10:15AM
[17] clearly snap111 is relying on another interrupt for scheduling processes. Try to figure out that interrupt and its irq/interrupt number.
set a watchpoint on env->cr[3] and see how it is changed. [30 min]
Observations:
[1] *** cr3 is changed by cpu_x86_update_cr3 in helper.c, called by helper_write_crN for translating an instruction.
[2] the current EIP is: 0x804dbf60
    env->interrupt_request is 0
    env->interrupt_injected is 0 as well.
[3] check do_interrupt_x86_hardirq and see what's the related interrupt number
    seems both 177, 65 can trigger
[4] check do_interrupt_all
    It seems that 177 triggers the most
    it is called by do_interrupt_x86_hardirq, and called in cpu_x86_exec()! [which checks interrupts before each tb block of code in cpu-exec.c]
[5] read about cpu_x86_exec():
    1. first check exception and interrupt based on env->exception_index>=0; do interrupt if its value is <= 0x10000 (EXCP_INTERRUPT)
       if it's an interrupt, it then checks and performs the following: handle INTERRUPT_DEBUG, IRQ poll, SIPI, variety of interrupts.
       env->exception_index is 5 (stands for EXCP_IRQ)
    2. then it runs a tb using tcg_qemu_tb_exec().
11:30AM
[5] verify interrupt 177. in do_interrupt_all add a line to not treat int no 177. see if it impacts anything on snap111.
Verified. it actually blocks process interleaving context switch.
Interestingly snap222 also gets interrupt 177.
We need to do a comparative study.
[6] comparative study of handling of interrupt 177. [failed]
[6.1] snap222.
Questions: who sends the interrupt 177? How is the int number determined?
calls do_interrupt_protected.
interrupt dc is 0x28dc4260
ptr is 0x8003f988
type is 14
selector is 8
offset is 0x81f8f5c4 (next eip) address
But it does not reach the change cr3 instruction.
[6.2] snap111. check what's different.
env->eip is changed to 81f8f5c4, the same
Cannot tell the difference!
[6.3] find out why it's interrupt 177.
intno is generated by cpu_get_pic_interrupt() in pc.c.
  ->apic_get_interrupt()
  ->apic_irq_pending()
  ->get_highest_priority_int()
p/x s->irr
$31 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x20000, 0x0, 0x0}
It looks like it's irq5. (parallel port 2?) (later verified wrong: 5 is only the index of the word in s->irr; the source is irq 9 at the ioapic, see the backtrace below)
Int number is computed as i = 5 * 32 + bit_position of leftmost of 0x20000 (17) = 177.
Next figure out who's placing 0x20000 on the s->irr, set a mem bp on it.
*** now we have the nice discovery ****!!!
#0 set_bit (tab=0x28dcf830, index=177) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:60
#1 0x082726c8 in apic_set_irq (s=0x28dce510, vector_num=177, trigger_mode=1) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:390
#2 0x08271f6c in apic_bus_deliver (deliver_bitmask=0xbffff04c, delivery_mode=1 '\001', vector_num=177 '\261', trigger_mode=1 '\001') at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:240
#3 0x08272298 in apic_deliver_irq (dest=1 '\001', dest_mode=1 '\001', delivery_mode=1 '\001', vector_num=177 '\261', trigger_mode=1 '\001') at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:290
#4 0x08275349 in ioapic_service (s=0x28ded6c8) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:71
#5 0x08275460 in ioapic_set_irq (opaque=0x28ded6c8, vector=9, level=1) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:102
#6 0x08126d7a in qemu_set_irq (irq=0x28de9f5c, level=1) at hw/irq.c:38
#7 0x08284e12 in gsi_handler (opaque=0x28dd6c50, n=9, level=1) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../pc.c:98
#8 0x08126d7a in qemu_set_irq (irq=0x28de5294, level=1) at hw/irq.c:38
#9 0x080bf3d0 in pm_update_sci (s=0x28ddac60) at hw/acpi_piix4.c:106
#10 0x080bf45b in pm_tmr_timer (ar=0x28ddb1fc) at hw/acpi_piix4.c:115
#11 0x080be2ca in acpi_pm_tmr_timer (opaque=0x28ddb1fc) at hw/acpi.c:389
#12 0x081e35f4 in qemu_run_timers (clock=0x28c458d0) at qemu-timer.c:394
-------------------------
It is still qemu_run_timers, but it calls acpi_pm_tmr_timer directly!!! (so we were looking at the wrong timer!)
**************************************************************
* it is still triggered by the vm_clock, but this time it's a different timer
It's the acpi_pm_tmr_timer *** (in acpi module)
according to 10.5.4 of intel documentation, the LVT timer register determines the vector number to deliver.
Note that before it delivers to piix4, it calls the following:
*** qemu_system_wakeup_request(QEMU_WAKEUP_REASON_PMTIMER);
The associated irq is 9 on sci.
Because irq->n is 9, it sets s->irr to 0x200 (see bit 9) at the ioapic; the vector is determined by s->ioredtbl[9] (last two hex digits)
Summary: the VM clock drives both the i8254 PIT timer and the ACPI PM timer. They go through different interrupts. The PIT goes through the 8259 on interrupt 0, however, it's never delivered; the ACPI PM timer goes through ioapic pin 9, vector number 177, and is delivered by the APIC as hardware interrupt 177.
------------*********************************************
The above confirmed that interrupt 177 (0xb1) is one of the working timer interrupts
[1] the i8254 PIT timer is responsible for refreshing the screen, however, it is not used for scheduling
[2] the ACPI PM timer delivered through the APIC (interrupt 177) is responsible for process scheduling.
----------- *********************************************
4:00PM
[18] verify 177 is the intno that triggers the context switch. [15 min]
  [1] declare a global variable last_intno and set it in do_interrupt_all
  [2] bp on cpu_x86_update_cr3
Observation: Most likely it's 177, but it could be 14 or 65
[19] check do_interrupt_x86_hardirq on int_no 65,
  [1] who's triggering 65?
from do_interrupt_x86_hardirq and then check who's writing to s->irr. watch on s->irr[2]. verified: 65 is i/o write.
  [2] where does it go? it will also cause the switch of cr3 because of another process space routine.
  [3] check 14. - did not capture it again.
Conclusion: 177 is the timer interrupt no.
7:00PM
[19] Figure out why 177 does not trigger the scheduling in snap222. Record the instructions and then compare.
  [1] in the main function open the log file [15 min] DONE.
  [2] use a global variable to control helper_trace2 and dump the instruction into the file [20 min] --
  [3] start the capture starting from do_interrupt_all for 177
  [4] end the capture at the next do_interrupt_all
Observation: the sequence departs at the 15th instruction!
 1 ins @81f8f5c4: push esp
 2 ins @81f8f5c5: push ebp
 3 ins @81f8f5c6: push ebx
 4 ins @81f8f5c7: push esi
 5 ins @81f8f5c8: push edi
 6 ins @81f8f5c9: sub esp, 0x54
 7 ins @81f8f5cc: mov ebp, esp //ESP = OLD_ESP-0x68
 8 ins @81f8f5ce: mov [esp+0x44], eax //it's old ESP-0x24, save EAX to ESP-10, curr stack frame
 9 ins @81f8f5d2: mov [esp+0x40], ecx //it's old ESP-0x28, save ECX ..
10 ins @81f8f5d6: mov [esp+0x3C], edx //it's old ESP-0x2C, save EDX ...
*** 11 ins @81f8f5da: test [esp+0x70], 0x00020000 //it's old ESP+8
It should reflect the CS register stored as parameter
10:00AM
[20] Redo the experiment. Record multiple occurrences of cpu_x86_update_cr3.
  [1] bp on do_interrupt_all for 177, bp on cpu_x86_update_cr3. record ok.txt, nok.txt.
  Strangely, cpu_x86_update_cr3 is not hit that frequently.
  The problem is that there is a lot of disruption. There are a lot of sysenter calls from the user program syscalls, which switch cr3.
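The vector numbers in this investigation (177 from the sixth word of s->irr holding 0x20000, and 65 watched through s->irr[2]) follow from treating the IRR as eight 32-bit words. A minimal C sketch of that arithmetic, under the assumption of that layout; irr_set_vector and irr_highest_vector are hypothetical names for this illustration, not QEMU's functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IRR_WORDS 8   /* 8 x 32 bits = 256 possible vectors */

/* Set the request bit for one vector: word = vector / 32, bit = vector % 32. */
void irr_set_vector(uint32_t irr[IRR_WORDS], int vector)
{
    assert(irr != NULL && vector >= 0 && vector < IRR_WORDS * 32);
    irr[vector / 32] |= (uint32_t)1 << (vector % 32);
}

/* Return the highest pending vector (word * 32 + bit), or -1 if none is set. */
int irr_highest_vector(const uint32_t irr[IRR_WORDS])
{
    assert(irr != NULL);
    for (int word = IRR_WORDS - 1; word >= 0; word--) {
        if (irr[word] == 0) {
            continue;
        }
        for (int bit = 31; bit >= 0; bit--) {
            if (irr[word] & ((uint32_t)1 << bit)) {
                return word * 32 + bit;
            }
        }
    }
    return -1;
}
```

For the dump {0x0, 0x0, 0x0, 0x0, 0x0, 0x20000, 0x0, 0x0}, the set bit is bit 17 of word 5, so the result is 5 * 32 + 17 = 177. Vector 65 is bit 1 of word 2, which is why the watch goes on s->irr[2].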
      Check the processes (CR3) and their names
      p/x arrCR3
      $4 = {0x0, 0x39000, 0xb35e000, 0x5dee000, 0xbccb000, 0x6d39000, 0x62c4000,
        0x39000 - seems to be the OS kernel
        b35e000 - unknown
        6d39000 - svchost.exe (checks and loads services)
        bccb000 - wuauclt.exe (windows auto update client)
        5dee000 - csrss.exe (controls threading and windows console)
        62c4000 - services.exe (service control manager)
        62ce000 - lsass.exe (local security authentication server)
      hit sequence: 6d3900, bccb00, 6d3900, bccb00, 39000, bccb00, 39000, 62c4000<->6d3900, b35e000, (then long time no switching), bccb00, 6d3900, 62c4000,
11:00AM
  [2] attempt 2: select a process that is being switched to, not as a result of sysenter.
      Choose 0xbccb000 (wuauclt.exe - windows auto update client).
      Design: (1) bp on do_interrupt_all,
              (2) bp on cpu_x86_update_cr3 which sets new_cr3 to 0xbccb000,
              (3) set bLog to 1 at do_interrupt_all, stop at the new_cr3 breakpoint and FFLUSH the filelog.
      The instruction that changes cr3 is located at 806ecbc8.
      Strangely, the recorded cr3 for all instructions is bccb000.
      Failed: system processes are calling each other. Need to find a process that is frequently working. Failed.
  [3] attempt 3: start a new application program calc.exe and see how frequently process context switch occurs.
      Failed. calc.exe is never switched to. (reason: it does not have any computation).
  [4] attempt 4: start iexplore and see how it works. too slow. give up.
  [5] attempt 5: create a batch loop
      Using p BatchAnalyzer::myinst->sendCommandToVM("...") saves a lot of work.
      failed: the batch file is not treated as a process. It looks like a part of cmd.exe
3:00PM
  [6] attempt 6: create a time consuming batch task. [pure CPU no I/O] [30 min]
      [1] create a time consuming loop, experiment with it on the xp vm. DONE.
      [2] copy it to samba DONE.
      [3] try running the new file and get its cr3/proc id.
      [4] see when it's swapped to (this must be a real context switch, because no one is requesting service from it). DONE.
          mainly switched from 5dee000 (csrss.exe). start to take
          Still not recorded right. It may be the first couple of system calls (switching between services).
6:30PM
  [7] attempt 7: use WinDbg to trace into the timer interrupt handler and check the logic
      *** ins @81f8f5c4: push esp
      does not work, need to find the idt table.
      Doing "!idt -a" in windbg shows @8053c3fa
      Try the instructions listed in http://winprogger.com/the-epic-of-apic/ about apic.
      [1] reload the symbols: .reload
      [2] !apic
      Does not work. Still cannot read apic contents. 0xfffe0000 is not accessible.
  7.2 try to locate the context_switch routine in xp.
      ??? intno 177 (0xb1) is not in the list
      Found that HalpDispatchInterruptHandler is a wrapper of KiDispatchException, and handles APIC signals.
      Maybe we could trace from KiQuantumEnd: it checks the quantum of each thread in the list and does the swap.
      It is called by KiDispatchInterrupt -> KiQuantumEnd
      It reads _KPRCB->QuantumEnd (0x88c offset)
      The ISR entry point is:
        806d4a41 ff3524f0dfff    push dword ptr ds:[0FFDFF024h]
        806d4a47 c60524f0dfff02  mov byte ptr ds:[0FFDFF024h],2
        806d4a4e 832528f0dffffb  and dword ptr ds:[0FFDFF028h],0FFFFFFFBh
        806d4a55 fb              sti
        806d4a56 ff1528e46c80    call dword ptr [hal!_imp__KiDispatchInterrupt (806ce428)]
        806d4a5c fa              cli
        806d4a5d e8fea7ffff      call hal!HalpEndSoftwareInterrupt (806cf260)
        806d4a62 ff25f8e46c80    jmp dword ptr [hal!_imp_Kei386EoiHelper (806ce4f8)]
      ------------------------------------------------
      Basically it just calls KiDispatchInterrupt. It seems not to distinguish any interrupt number, because KiDispatchInterrupt will read the APIC signals.
      *** all source code available in ReactOS!!!
      Use the windbg commands !pcr to display the current processor control region, and dt _KPRCB (find the address of the pcr entry) to find the QuantumEnd field.
      We can then set a hardware breakpoint (mem write) on it.
      It's updated by KeUpdateRunTime<-KeUpdateSystemTime (not sure which called it)
      irql (using !pcr) is 0x001c
        Irql: 0000001c IRR: 00000004 IDR: ffff20f8
      Another idea:
      [1] get the binary code of KiDispatchInterrupt (mov byte ptr ds:[0FFDFF024h]), search for the code that follows it, and get the EIP in qemu/xp.
      [2] trace which hardware interrupt is triggering it.
10:00AM 02/05/2014
--------------------------------------------------------------------------------------
Task 207: study the logic of process/thread scheduling and clock.
--------------------------------------------------------------------------------------
[1] study the logic of scheduling. (ReactOS) [1 hr]
  [1.1] HalpDispatchInterrupt2ndEntry: it calls KiDispatchInterrupt
  [1.2] KiDispatchInterrupt: handles both hardware and software
        get the pcr (_KPCR) and prcb (_KPRCB, the kernel processor state data)
        Note: prcb->TimerRequest, prcb->CurrentThread, prcb->NextThread
        Logic:
          //prcb->QuantumEnd means the current thread's quantum has ended
          if(prcb->QuantumEnd){
              KiQuantumEnd(); //calculate QuantumEnd of the current thread: if yes, schedule another; if no, keep it;
                              //but strangely it does not update prcb->QuantumEnd;
                              //GUESS: prcb->QuantumEnd is updated by another procedure
          } else{
              context switch and get a new thread to run;
              call KiContextSwitch to perform Context Switch
          }
  [1.3] KiQuantumEnd:
        get the current thread from prcb->Thread
        check thread->Quantum (note: it's different from prcb->QuantumEnd)
        if it runs out of quantum (quantum<0):
          reassign the priority of the current thread
          pick the next thread and schedule it to run.
  [1.4] KeUpdateSystemTime: called by HalpClockInterrupt, HalpInterruptDispatch and KiInterruptDispatch:
        1. call KiWriteSystemTime
        2. call KeUpdateRunTime
  [1.5] KeUpdateRunTime:
        Update usertime, systemtime, interrupt time count (by clock cycles or quarter?)
        Update Thread->Quantum (minus a certain amount)
        if(Thread->Quantum<=0){
            prcb->QuantumEnd=1; //so that's where it is updated
        }
[2] Locate the 1st 15 bytes of the binary code
  [1] KiQuantumEnd
      Use WinDbg: locate the following code, which corresponds to if(Thread->Quantum<=0...)
        804ff081 33db    xor ebx,ebx
        804ff083 385e6f  cmp byte ptr [esi+6Fh],bl
        804ff086 8845ff  mov byte ptr [ebp-1],al
        804ff089 7f6c    jg nt!KiQuantumEnd+0x95 (804ff0f7)
        nt!KiQuantumEnd+0x29:
        804ff08b 8b4644  mov eax,dword ptr [esi+44h]
        804ff08e 385869  cmp byte ptr [eax+69h],bl
        804ff091 740c    je nt!KiQuantumEnd+0x3d (804ff09f)
      Byte string: "\x33\xdb\x38\x5e\x6f\x88\x45\xff\x7f\x6c\x8b\x46\x44\x38\x58"
      Implementation:
        [1] In helper_trace2 declare a private function isKiQuantumEnd()
        [2] trace into it and print the last_intno
      --> failed. could not capture KiQuantumEnd (maybe because it's not that frequent?)
  [2] try KeUpdateRunTime:
      Try the following:
        80540388 806b6f03  sub byte ptr [ebx+6Fh],3
        8054038c 7f19      jg nt!KeUpdateRunTime+0x133 (805403a7)
      This corresponds to the source:
        if ((CurrentThread->Quantum -= 3) <= 0) {
            Prcb->QuantumEnd = TRUE;
            HalRequestSoftwareInterrupt(DISPATCH_LEVEL);
        }
      Still could not get it right. Or is it caused by different code? Does not work. It seems that the data structure is different?
      Well then, test the preceding instructions at the beginning of the function that load the prcb etc.
        80540274 a11cf0dfff    mov eax,dword ptr ds:[FFDFF01Ch]
        80540279 53            push ebx
        8054027a ff80c4050000  inc dword ptr [eax+5C4h]
      Another try:
        8054027a ff80c4050000  inc dword ptr [eax+5C4h]
        80540280 8b9824010000  mov ebx,dword ptr [eax+124h]
        80540286 8b4b44        mov ecx,dword ptr [ebx+44h]
      Still not successful. Maybe it's the offset of the eax stuff.
      Attempt 2: only capture the skeleton. Mark the don't-care bytes as \xF7. not working. cannot find the sequence
      Attempt 3: try to read the logic of the interrupt handler of 177; it does not look like any of them. Failed.
  [2] failed. Could not locate or trace into any of the KeUpdateRunTime functions! Strange! guess?
      THE INTERRUPT handler of 177 checks 0x20000 to verify that intno is int 177.
      Check WinDbg again on HalpClockInterrupt. Found it! It's the handler of int 177!!!!
        hal!HalpClockInterrupt:
        806d4d50 54                push esp
        806d4d51 55                push ebp
        806d4d52 53                push ebx
        806d4d53 56                push esi
        806d4d54 57                push edi
        806d4d55 83ec54            sub esp,54h
        806d4d58 8bec              mov ebp,esp
        806d4d5a 89442444          mov dword ptr [esp+44h],eax
        806d4d5e 894c2440          mov dword ptr [esp+40h],ecx
        806d4d62 8954243c          mov dword ptr [esp+3Ch],edx
        806d4d66 f744247000000200  test dword ptr [esp+70h],20000h
        806d4d6e 75b8              jne hal!V86_Hci_a (806d4d28)
      ***********************************************
      !!! -> but on VBox WinXp it's hooked as interrupt 0x30!
      *************************************************
      Compare with the following on the interrupt handler of 177
         1 ins @81f8f5c4: push esp
         2 ins @81f8f5c5: push ebp
         3 ins @81f8f5c6: push ebx
         4 ins @81f8f5c7: push esi
         5 ins @81f8f5c8: push edi
         6 ins @81f8f5c9: sub esp, 0x54
         7 ins @81f8f5cc: mov ebp, esp //ESP = OLD_ESP-0x68
         8 ins @81f8f5ce: mov [esp+0x44], eax //it's old ESP-24, save EAX to ESP-10, curr stack frame
         9 ins @81f8f5d2: mov [esp+0x40], ecx //it's old ESP-28, save ECX ..
        10 ins @81f8f5d6: mov [esp+0x3C], edx //it's old ESP-34, save EDX ...
        *** 11 ins @81f8f5da: test [esp+0x70], 0x00020000 //it's old ESP+8
[3] Comparative study of the QEMU and VBox WinXP HalpClockInterrupt handlers!
    VBox: 0x806d4d50
      --> 1. //to test if it is the timer interrupt?
          806d4d66 f744247000000200 test dword ptr [esp+70h],20000h
      --> 2. //call begin system interrupt
          hal!HalpClockInterrupt+0xa3:
          806d4df3 e850f9ffff call hal!HalBeginSystemInterrupt (806d4748)
      --> 3. //call KeUpdateSystemTime
          806d4f35 0f843df5ffff je hal!KeUpdateSystemTime (806d4478)
    This leads to:
    Qemu: 0x81f8f5c5
      --> 1. //to test if it is the timer interrupt?
          81f8f5c5 44247000000200 test dword ptr [esp+70h],20000h
      The code begins to differ from:
          @EIP 0x81f8f65b: length: (2): jnz 0x0000000F
      There are some extra instructions, but it still follows roughly the same logic.
      ---> then it departs at one branch and it never calls KeUpdateSystemTime
    Trouble is that the logic of the two routines is completely different!!!
    It may be caused by a different device driver for the hardware?
    Trouble ... *************** TO DO
--------------------------------------------------------------------------------------
Task 208: Figure out how context switch is done.
--------------------------------------------------------------------------------------
[1] find the context switch code in KiDispatchInterrupt.
      nt!SwapContext
      80540ab0 0ac9        or cl,cl
      80540ab2 26c6462d02  mov byte ptr es:[esi+2Dh],2
      80540ab7 9c          pushfd
      80540ab8 8b0b        mov ecx,dword ptr [ebx]
[2] find the corresponding procedure in QEMU
    Identified the code!: (!804dbec0!!!!)
      @EIP 0x804dbec0: length: (1): pushf
      @EIP 0x804dbec1: length: (2): movl (%ebx), %ecx
      @EIP 0x804dbec3: length: (7): cmpl $0x00, 0x994(%ebx)
      @EIP 0x804dbeca: length: (1): push %ecx
      @EIP 0x804dbecb: length: (6): jnz 0x0000013A
      @EIP 0x804dbed1: length: (7): cmpl $0x00, 0x8056198
    last_int_no is 177
    A keyboard event could also trigger it.
    verified in snap222. It's not triggered.
    Note: not every 177 triggers it.
    !!!!!!!!!!!! TO DO !!!!!!!!!!!!!!!!!!!!!!!!!!!!
[3] record the code and find how it's triggered.
    start from interrupt 177 and stop at the first hit of 0x804dbec0
8:00AM 02/06/2014
[4] record the code and do a comparative study.
    Implementation:
    [1] perform the experiment again on do_interrupt_all (147, keyboard event)
        Verified: in snap111, every 147 triggers SwapContext.
        In snap222, the first 147 triggers SwapContext but the rest don't.
9:30AM
    [2] mode 1: recording mode.
        start: do_interrupt_all, triggered by 147 (keyboard event). set bLog=1
        end: 0x804dbec0 is hit. -> ok.txt
    [3] mode 2: to check snap222
        start: do_interrupt_all, triggered by 147.
               set bLog=1
        end: do_interrupt_all, 147 again, and set bLog=0
    [4] observation: compared ok.txt and nok.txt; the difference starts from
        nok.txt
          1630 ins @80518611 (cr3: 39000): xchg [ecx], eax
          1631 ins @80518613 (cr3: 39000): test eax, eax
          1632 ins @80518615 (cr3: 39000): jnz 0x0000001E
          1633 ins @80518633 (cr3: 39000): ret 0x0004
        ok.txt
          1630 ins @80518611 (cr3: b35e000): xchg [ecx], eax
          1631 ins @80518613 (cr3: b35e000): test eax, eax
          1632 ins @80518615 (cr3: b35e000): jnz 0x0000001E
          1633 ins @80518617 (cr3: b35e000): and [-0x7FAAC520], eax
10:00AM
    [5] figure out the logic of the key processing algorithm and check how it gets to SwapContext.
        Approach: use WinDbg, first dump !idt -a and check the key event handler, it's
          ace36e200000031: 899cc15c i8042prt!I8042KeyboardInterruptService (KINTERRUPT 899cc120)
        Verified: it's the ISR that handles the keyboard event.
        First couple of instructions (from WinDbg, dumped below):
        ---------------
        kd> uf i8042prt!I8042KeyboardInterruptService
        i8042prt!I8042KeyboardInterruptService:
        ba9a8495 6a18          push 18h
        ba9a8497 68a8b79aba    push offset i8042prt!`string'+0x154 (ba9ab7a8)
        ba9a849c e8ff000000    call i8042prt!_SEH_prolog (ba9a85a0)
        ba9a84a1 8b7d0c        mov edi,dword ptr [ebp+0Ch]
        ba9a84a4 8b7728        mov esi,dword ptr [edi+28h]
        ba9a84a7 837e3001      cmp dword ptr [esi+30h],1
        ba9a84ab 0f854f010000  jne i8042prt!I8042KeyboardInterruptService+0xa2 (ba9a8600)
        ----------
        It corresponds to the following in ok.txt, recorded for (snap111) [by searching "push.*18"]
        ---------
        ins @f85c0495 (cr3: b35e000): push 0x18
        ins @f85c0497 (cr3: b35e000): push 0xF85C37A8
        ins @f85c049c (cr3: b35e000): call 0x00000104
        ins @f85c05a0 (cr3: b35e000): push 0xF85C3274
    [6] figure out the key ISR logic (and see how it reaches the SwapContext).
        Logic comments are in ok_comment.txt (in qemu_image)
        Summary of the logic of I8042KeyboardInterruptService:
          (1) call SEH_prolog
          (2) perform in al, 0x63 (port 0x63) [it's one of the 8042 i/o ps2 device ports]
          (3) call I8xGetBytesAsynchronousA
          (4) call I8xQueueCurrentKeyboardInput
          (5) _SEH_epilog
          (6) back to KiInterruptDispatch
        (WRONG) Conclusion: keyboard still does not DIRECTLY trigger SwapContext. It still needs something like a timer interrupt.
    [7] compare with nok.txt, see if all major points are there. All there.
        Conclusion: confirmed. It's the problem of the timer interrupt.
    [8] observe the rest of ok_comment.txt and see how SwapContext is actually triggered.
        81f8f5c5 44247000000200 test dword ptr [esp+70h],20000h
12:15PM
    [9] The conclusion above IS NOT RIGHT. The last_intno reported at SwapContext is 147.
        So it must be the second 147 handler that triggers it.
        Check using WinDbg:
        [1] ba e1 I8042KeyboardInterruptService
        [2] hit it the second time, then ba e1 SwapContext and see what's going on.
        After the iretd of the 2nd I8042KeyboardInterruptService, it enters
          intelppm.-> popProcessorIdle -> KiIdleLoop -> HalClearSoftwareInterrupt -> KiRetireDPCList -> then SwapContext!!!
        Mark the above in ok_comment.txt
        Problem: but in the qemu version, it may be in a completely different context (when the key is pressed).
        --> check if KiIdleLoop is ever executed???
    [10] identify KiIdleLoop using WinDbg.
        KiIdleLoop code from WinDbg:
        ----------------
        nt!KiIdleLoop+0x10:
        80540cc0 fb            sti
        80540cc1 90            nop
        80540cc2 90            nop
        80540cc3 fa            cli
        80540cc4 3b6d00        cmp ebp,dword ptr [ebp]
        80540cc7 740d          je nt!KiIdleLoop+0x26 (80540cd6)
        nt!KiIdleLoop+0x19:
        80540cc9 b102          mov cl,2
        80540ccb ff15a8764d80  call dword ptr [nt!_imp_HalClearSoftwareInterrupt (804d76a8)]
        80540cd1 e841000000    call nt!KiRetireDpcList (80540d17)
        The pattern sti nop nop cli is unique, capture it.
        [a] implement isIdIdleLoop.
        [b] test it. It's never hit for the 4-instruction version.
        [c] conclusion: KiIdleLoop is NEVER hit!!!!!!
        Conclusion: it may be because KiIdleLoop itself relies on the timer interrupt to be scheduled. (so for snap111 it's a problem?)
10:00PM
    [11] go back to analyze ok_comment.txt
        The problem is what happens after the 2nd keyboard event returns. In WinDbg it goes to KiIdleLoop.
        But in QEMU's version, it goes directly to the call of SwapContext.
      [11.1] Need to find out:
        [1] dump memory: add function print_mem, e.g., to print the bytes at 0x804dbe90
            print_mem(0x804dbe90, 16, env)
        [2] windbg search
            s 80000000 L2000000 90 90 90 "search for 3 nops between 8000000 and a000000"
      [11.2] search for
            @804dbe0f (cr3: b35e000): cli
            ins @804dbe10 (cr3: b35e000): cmp eax, [eax
        --- more details here
            @EIP 0x804dbe0f: length: (1): cli
            @EIP 0x804dbe10: length: (2): cmpl (%eax), %eax
            @EIP 0x804dbe12: length: (2): jz 0x0000001F
        -- sometimes need to call print_mem twice because of a page fault; the 5 bytes are below:
            0x804dbe0f: fa 3b 00 74 1d
        Search in WinDbg: s 80000000 L70000000 fa 3b 00 74 1d, found two and identified one
        *** KiDispatchInterrupt!
            80540a06 8d8380090000  lea eax,[ebx+980h]
            80540a0c fa            cli
            80540a0d 3b00          cmp eax,dword ptr [eax]
            80540a0f 741d          je nt!KiDispatchInterrupt+0x2e (80540a2e)
            nt!KiDispatchInterrupt+0x11:
            80540a11 55            push ebp
            80540a12 ff33          push dword ptr [e
      [11.3] now continue to analyze the logic of KiDispatchInterrupt!
        Analysis done in ok_comment.txt
        The basic logic follows the source code in ReactOS for KiDispatchInterrupt:
        it takes the else if(prcb->NextThread) branch and swaps a thread.
      [11.4] compare ok_comment.txt and nok.txt, mainly check the important calls
        The difference is that prcb->QuantumEnd is 0 and prcb->NextThread is 0
****************************************************************************************
Conclusion: the culprit is whoever updates prcb->NextThread (snap222 never has prcb->NextThread updated).
Needs to check: (interrupt 177 in QEMU and 0x30 in WinDbg)
****************************************************************************************
*** WinDbg:
    hal!HalpClockInterrupt:
    806d4d50 54 push esp
    806d4d51 55 push ebp
    806d4d52 53 push ebx
*** QEMU:
    1 ins @81f8f5c4: push esp
    2 ins @81f8f5c5: push ebp
    3 ins @81f8f5c6: push ebx
!!!!! KiQuantumEnd resets prcb->NextThread!
8:15AM 02/07/2014
    [12] Design: identify who's updating prcb->NextThread
      [1] in WinDbg find who's updating prcb->NextThread
        [a] bp on KiDispatchInterrupt and find the check on prcb->NextThread
            prcb->NextThread is located at ffdff128
        [b] set a write BP on prcb->NextThread
        Observation: prcb->NextThread is modified in
          (1) KiQuantumEnd (when it's found that the QuantumEnd flag is set, switch thread).
          (2) also in the else if branch of KiDispatchInterrupt (when prcb->NextThread is found to be not null), this time to switch and clear it (see ReactOS)
          (3) KiReadyThread <- ExReleaseResource
          (4) KiUnlockDatabase
          (5) KiReadyThread<- .... <-win32!CreateSystemThread
          (6) KiAdjustQuantumThread
          ... too many to analyze.
9:45AM
    [13] Conjecture: HalpClockInterrupt calls KeUpdateSystemTime, which then calls KiCheckForTimerExpiration, which then sets the software interrupt.
        Check if there is any difference between snap111 and snap222.
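Comparisons such as ok.txt against nok.txt, or snap111 against snap222, come down to finding the first line where the executed addresses diverge. The sketch below does that on in-memory trace lines, comparing only the EIP after the '@' and deliberately ignoring the cr3 field, since the two recordings can run under different CR3 values. The function names are hypothetical, and the line format is assumed to match the "ins @<hex> (cr3: ...)" lines shown in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Extract the hex EIP that follows '@' in a trace line such as
 * "ins @80518611 (cr3: 39000): xchg [ecx], eax".
 * Returns 0 on success, -1 if the line has no parsable "@<hex>" field. */
static int trace_eip(const char *line, unsigned long *eip)
{
    const char *at = strchr(line, '@');
    char *end;
    if (at == NULL)
        return -1;
    *eip = strtoul(at + 1, &end, 16);
    return end == at + 1 ? -1 : 0;
}

/* Return the index of the first line whose EIP differs between the two
 * traces, or n if the first n lines agree. CR3 is ignored on purpose. */
static size_t first_divergence(const char *const *a, const char *const *b,
                               size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long ea = 0, eb = 0;
        int ra = trace_eip(a[i], &ea);
        int rb = trace_eip(b[i], &eb);
        /* Differ if only one side parses, or both parse to different EIPs. */
        if (ra != rb || (ra == 0 && ea != eb))
            return i;
    }
    return n;
}
```

Reading the two log files into line arrays first keeps the comparison itself trivial; the index returned is zero-based relative to the arrays passed in.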
[13.1] with the help of WinDbg, provide the annotated comments of HalpClockInterrupt below Steps: bp on do_interrupt_all if intno==177, and then comparative study of code ************************************************************************************ HalpClockInterrupt *********************************************************************************** @EIP 0x81f8f5c4: length: (1): push %esp @EIP 0x81f8f5c5: length: (1): push %ebp @EIP 0x81f8f5c6: length: (1): push %ebx @EIP 0x81f8f5c7: length: (1): push %esi @EIP 0x81f8f5c8: length: (1): push %edi @EIP 0x81f8f5c9: length: (3): sub $0x54, %esp @EIP 0x81f8f5cc: length: (2): mov %esp, %ebp @EIP 0x81f8f5ce: length: (4): movl %eax, 0x44(%esp) @EIP 0x81f8f5d2: length: (4): movl %ecx, 0x40(%esp) @EIP 0x81f8f5d6: length: (4): movl %edx, 0x3C(%esp) @EIP 0x81f8f5da: length: (8): testl $0x00020000, 0x70(%esp) @EIP 0x81f8f5e2: length: (6): jnz 0x00000130 #jne hal!V86_Hci_a @EIP 0x81f8f5e8: length: (6): cmpw $0x08, 0x6C(%esp) @EIP 0x81f8f5ee: length: (2): jz 0x00000025 @EIP 0x81f8f5f0: length: (4): movw %fs, 0x50(%esp) #copy segment registers @EIP 0x81f8f5f4: length: (4): movw %ds, 0x38(%esp) @EIP 0x81f8f5f8: length: (4): movw %es, 0x34(%esp) @EIP 0x81f8f5fc: length: (4): movw %gs, 0x30(%esp) @EIP 0x81f8f600: length: (5): mov $0x00000030, %ebx @EIP 0x81f8f605: length: (5): mov $0x00000023, %eax @EIP 0x81f8f60a: length: (3): mov %bx, %fs @EIP 0x81f8f60d: length: (3): mov %ax, %ds @EIP 0x81f8f610: length: (3): mov %ax, %es @EIP 0x81f8f613: length: (7): movl %fs:0x0, %ebx #set fs:[0] the SEH handler pointer @EIP 0x81f8f61a: length: (11): movl $0xFFFFFFFF, %fs:0x0 #set to 0xFFFF (for kernel) @EIP 0x81f8f625: length: (4): movl %ebx, 0x4C(%esp) @EIP 0x81f8f629: length: (6): cmp $0x00010000, %esp @EIP 0x81f8f62f: length: (6): jc 0x000000BB # Abios_Hci_a (checking intno) @EIP 0x81f8f635: length: (8): movl $0x00000000, 0x64(%esp) @EIP 0x81f8f63d: length: (1): cld @EIP 0x81f8f63e: length: (3): movl 0x60(%ebp), %ebx @EIP 0x81f8f641: 
length: (3): movl 0x68(%ebp), %edi @EIP 0x81f8f644: length: (3): movl %edx, 0xC(%ebp) @EIP 0x81f8f647: length: (7): movl $0xBADB0D00, 0x8(%ebp) @EIP 0x81f8f64e: length: (3): movl %ebx, (%ebp) @EIP 0x81f8f651: length: (3): movl %edi, 0x4(%ebp) @EIP 0x81f8f654: length: (7): testb $0xFF, 0xFFDFF050 @EIP 0x81f8f65b: length: (2): jnz 0x0000000F jne hal!Dr_Hci_a (another handler) # *********************************************************************** #in the following (HalpClockInterrupt+0x99) the code is completely different # *********************************************************************** @EIP 0x81f8f65d: length: (5): mov $0x81F8F588, %edi # -------- to find out the call, trace into the long jump and get the signuatre instructions # ------- then search in WinDbg!!! @EIP 0x81f8f662: length: (5): ljmp 0xFE54B700 #!!! jump to KiDispatchInterrupt!!! # the rest of the code will actually never be hit! # continue to ljmp FE54B700! (804dad62) @EIP 0x804dad62: length: (6): incl 0xFFDFF5C4 @EIP 0x804dad68: length: (2): mov %esp, %ebp @EIP 0x804dad6a: length: (3): movl 0x24(%edi), %eax @EIP 0x804dad6d: length: (3): movl 0x29(%edi), %ecx @EIP 0x804dad70: length: (1): push %eax @EIP 0x804dad71: length: (3): sub $0x04, %esp @EIP 0x804dad74: length: (1): push %esp @EIP 0x804dad75: length: (1): push %eax @EIP 0x804dad76: length: (1): push %ecx @EIP 0x804dad77: length: (6): lcall *0x804D75D8 # call _imp_HalBeginSystemInterrupt #HalBeginSystemInterrupt mainly sets up the interrupt vector @EIP 0x804dad7d: length: (2): or %eax, %eax @EIP 0x804dad7f: length: (2): jz 0x00000038 @EIP 0x804dad81: length: (3): sub $0x0C, %esp @EIP 0x804dad84: length: (7): cmpl $0x00, 0x8056198C @EIP 0x804dad8b: length: (7): movl $0x00000000, -0xC(%ebp) @EIP 0x804dad92: length: (2): jnz 0x0000002D @EIP 0x804dad94: length: (3): movl 0x1C(%edi), %esi @EIP 0x804dad97: length: (3): movl 0x10(%edi), %eax @EIP 0x804dad9a: length: (1): push %eax #eax is the interrupt context @EIP 0x804dad9b: length: (1): 
push %edi #edi should be the interrupt number @EIP 0x804dad9c: length: (3): lcall *0xC(%edi) # call the interruptservice routin #------------- **************************************************************** # --- see what is the interrupt service routine # It's f850c31e # ----------------------------------------------------------------------------- # //problem could not find the corresponding code in WinDbg! #!!!! got to use lm command to display memory range first # find that 0xf8... mem range corresponds to 0xb0... range! # use 0xf850c334 9 bytes as signature to search. #!!!!! corresponds to ACPIInterruptServiceRoutine!!!! #------------------------------------------------------------------------ @EIP 0xf850c31e: length: (2): mov %edi, %edi @EIP 0xf850c320: length: (1): push %ebp @EIP 0xf850c321: length: (2): mov %esp, %ebp @EIP 0xf850c323: length: (1): push %ecx @EIP 0xf850c324: length: (1): push %ecx @EIP 0xf850c325: length: (1): push %ebx @EIP 0xf850c326: length: (1): push %esi @EIP 0xf850c327: length: (1): push %edi @EIP 0xf850c328: length: (5): lcall 0x0000F84C #ACPIIoReadPm1Status @EIP 0xf850c32d: length: (2): mov %eax, %ebx @EIP 0xf850c32f: length: (5): lcall 0xFFFFEA35 #ACPIGpeIsEvent @EIP 0xf850c334: length: (2): test %al, %al @EIP 0xf850c336: length: (5): mov $0x00010000, %edi @EIP 0xf850c33b: length: (2): jz 0x00000004 @EIP 0xf850c33d: length: (2): or %edi, %ebx @EIP 0xf850c33f: length: (7): testb $0x01, 0xF851F279 @EIP 0xf850c346: length: (2): jnz 0x00000008 @EIP 0xf850c348: length: (2): test %ebx, %ebx @EIP 0xf850c34a: length: (2): jnz 0x00000004 @EIP 0xf850c34c: length: (2): mov %edi, %ebx @EIP 0xf850c34e: length: (2): mov %ebx, %esi @EIP 0xf850c350: length: (3): and $0x11, %esi @EIP 0xf850c353: length: (3): movl %esi, -0x4(%ebp) @EIP 0xf850c356: length: (2): jz 0x0000001B @EIP 0xf850c358: length: (1): push %esi @EIP 0xf850c359: length: (5): lcall 0x0000F45B # call CLEAR_PM1_STATUS_BITS @EIP 0xf850c35e: length: (3): test $0x01, %bl @EIP 0xf850c361: 
length: (2): jz 0x0000000A @EIP 0xf850c363: length: (5): movl 0xF851F590, %eax #PmHalDispatchTable @EIP 0xf850c368: length: (3): lcall *0xC(%eax) #LOOKS LIKE THE REAL HANDLER # ---------------------- ACPITimerCarry @EIP 0x806f4e08: length: (1): push %ebx @EIP 0x806f4e09: length: (6): movl 0x806F90A8, %edx #edx->hal!TimerInfo @EIP 0x806f4e0f: length: (1): in %dx, %eax #MUST BE READING CLOCK VALUE @EIP 0x806f4e10: length: (2): mov %eax, %ebx @EIP 0x806f4e12: length: (6): movl 0x806F90B8, %ecx # three attributes of hal!TimerInfo @EIP 0x806f4e18: length: (5): movl 0x806F90AC, %eax @EIP 0x806f4e1d: length: (6): movl 0x806F90B0, %edx @EIP 0x806f4e23: length: (2): add %ecx, %eax @EIP 0x806f4e25: length: (3): adc $0x00, %edx @EIP 0x806f4e28: length: (2): xor %eax, %ebx @EIP 0x806f4e2a: length: (2): and %ecx, %ebx @EIP 0x806f4e2c: length: (2): add %ebx, %eax @EIP 0x806f4e2e: length: (3): adc $0x00, %edx @EIP 0x806f4e31: length: (6): movl %edx, 0x806F90B4 #update the three attributes of TimerInfo @EIP 0x806f4e37: length: (5): movl %eax, 0x806F90AC @EIP 0x806f4e3c: length: (6): movl %edx, 0x806F90B0 @EIP 0x806f4e42: length: (1): pop %ebx @EIP 0x806f4e43: length: (1): ret # ----- back to ACPIInterruptService @EIP 0xf850c36b: length: (2): mov %esi, %eax @EIP 0xf850c36d: length: (2): not %eax @EIP 0xf850c36f: length: (2): and %eax, %ebx @EIP 0xf850c371: length: (2): test %ebx, %ebx @EIP 0xf850c373: length: (2): jz 0x00000061 -- will jump 0x61 bytes away and kill the following (most likely) -- seems TO BE AFFECTED #-- SEEMS TO BE AFFECTED BY THE TimerCarry results (ebx value) #6:45PM test if 0xf850c375 is ever hit. # ??? actually the following until 0xf850c3d6 is never hit # -- in WinDbg it IS HIT? 
@EIP 0xf850c375: length: (3): movl 0xC(%ebp), %esi @EIP 0xf850c378: length: (3): add $0x30, %esi @EIP 0xf850c37b: length: (2): movl (%esi), %eax @EIP 0xf850c37d: length: (2): not %eax @EIP 0xf850c37f: length: (2): test %eax, %ebx @EIP 0xf850c381: length: (2): jnz 0x00000004 @EIP 0xf850c383: length: (2): or %edi, %ebx @EIP 0xf850c385: length: (2): test %ebx, %edi @EIP 0xf850c387: length: (2): jz 0x00000009 @EIP 0xf850c389: length: (2): push $0x00 @EIP 0xf850c38b: length: (5): lcall 0xFFFFE7F9 @EIP 0xf850c390: length: (1): push %ebx @EIP 0xf850c391: length: (5): lcall 0x0000F423 @EIP 0xf850c396: length: (2): movl (%esi), %eax @EIP 0xf850c398: length: (5): mov $0x80000000, %edi @EIP 0xf850c39d: length: (2): or %edi, %ebx @EIP 0xf850c39f: length: (2): mov %eax, %edx @EIP 0xf850c3a1: length: (1): push %eax @EIP 0xf850c3a2: length: (2): or %ebx, %edx @EIP 0xf850c3a4: length: (2): mov %esi, %ecx @EIP 0xf850c3a6: length: (3): movl %eax, -0x8(%ebp) @EIP 0xf850c3a9: length: (6): lcall *0xF851C314 @EIP 0xf850c3af: length: (3): cmpl %eax, -0x8(%ebp) @EIP 0xf850c3b2: length: (2): jnz 0xFFFFFFED @EIP 0xf850c3b4: length: (2): not %eax @EIP 0xf850c3b6: length: (2): and %ebx, %eax @EIP 0xf850c3b8: length: (3): orl %eax, -0x4(%ebp) @EIP 0xf850c3bb: length: (3): testl %edi, -0x4(%ebp) @EIP 0xf850c3be: length: (2): jz 0x00000013 @EIP 0xf850c3c0: length: (3): movl 0xC(%ebp), %eax @EIP 0xf850c3c3: length: (2): push $0x00 @EIP 0xf850c3c5: length: (2): push $0x00 @EIP 0xf850c3c7: length: (3): add $0x34, %eax @EIP 0xf850c3ca: length: (1): push %eax @EIP 0xf850c3cb: length: (6): lcall *0xF851C390 @EIP 0xf850c3d1: length: (3): movl -0x4(%ebp), %esi @EIP 0xf850c3d4: length: (2): xor %eax, %eax #--- the above is never hit in snap111 --- strange, directly return @EIP 0xf850c3d6: length: (1): pop %edi @EIP 0xf850c3d7: length: (2): test %esi, %esi @EIP 0xf850c3d9: length: (1): pop %esi @EIP 0xf850c3da: length: (3): setnz %al @EIP 0xf850c3dd: length: (1): pop %ebx @EIP 0xf850c3de: length: (1): 
leave @EIP 0xf850c3df: length: (3): ret $0x0008 #---------- back to KiDispatchInterrupt @EIP 0x804dad9f: length: (4): cmpl $0x00, -0xC(%ebp) @EIP 0x804dada3: length: (2): jnz 0x00000045 @EIP 0x804dada5: length: (3): add $0x0C, %esp @EIP 0x804dada8: length: (1): cli @EIP 0x804dada9: length: (6): lcall *0x804D75DC #call _HalEndSystemInterrupt @EIP 0x804dadaf: length: (5): ljmp 0x00004B4C #-------------- NEW TO ANALYZE!!! 9:30AM 02/08/2014 # -- this is KiExceptionExit @EIP 0x804df8fb: length: (1): cli @EIP 0x804df8fc: length: (7): testl $0x00020000, 0x70(%ebp) #check IRQ is 0x20000 (timer) @EIP 0x804df903: length: (2): jnz 0x00000008 @EIP 0x804df905: length: (4): testb $0x01, 0x6C(%ebp) #check IRQL is 1 @EIP 0x804df909: length: (2): jz 0x00000036 #--------- the following will be skipped (however, it will sometimes be hit) #in snap111 it is hit especially after sending a key #in snap222 it never fired # the following is HIT only if(IRQ has 0x20000 set || IRQL!=1) # SO the following is HIT only when it's NOT timer interrupt. 
@EIP 0x804df90b: length: (6): movl 0xFFDFF124, %ebx #FFDFF124 points KTHREAD, now ebx has Thread @EIP 0x804df911: length: (4): movb $0x00, 0x2E(%ebx) # set _KTHREAD->Alerted to 0 @EIP 0x804df915: length: (4): cmpb $0x00, 0x4A(%ebx) # _KTHREAD->ApcState->UserAPCPending @EIP 0x804df919: length: (2): jz 0x00000026 #if no ApcState->UserAPCPending skip following @EIP 0x804df91b: length: (2): mov %ebp, %ebx @EIP 0x804df91d: length: (5): mov $0x00000001, %ecx @EIP 0x804df922: length: (6): lcall *0x804D7648 #call _imp_KfRaiseIrq @EIP 0x804df928: length: (1): push %eax @EIP 0x804df929: length: (1): sti @EIP 0x804df92a: length: (1): push %ebx @EIP 0x804df92b: length: (2): push $0x00 @EIP 0x804df92d: length: (2): push $0x01 @EIP 0x804df92f: length: (5): lcall 0x00005F3E #call KiDeliverApc @EIP 0x804df934: length: (1): pop %ecx @EIP 0x804df935: length: (6): lcall *0x804D7670 #call _imp_KfLowerIrql @EIP 0x804df93b: length: (1): cli @EIP 0x804df93c: length: (2): ljmp 0xFFFFFFCF @EIP 0x804df93e: length: (1): nop #----------- the above will be skipped @EIP 0x804df93f: length: (4): movl 0x4C(%esp), %edx @EIP 0x804df943: length: (7): movl %fs:0x50, %ebx @EIP 0x804df94a: length: (7): movl %edx, %fs:0x0 @EIP 0x804df951: length: (6): test $0x000000FF, %ebx @EIP 0x804df957: length: (2): jnz 0x00000050 @EIP 0x804df959: length: (8): testl $0x00020000, 0x70(%esp) @EIP 0x804df961: length: (6): jnz 0x000000C7 @EIP 0x804df967: length: (7): testw $0xFFF8, 0x6C(%esp) @EIP 0x804df96e: length: (2): jz 0x00000079 @EIP 0x804df970: length: (4): movl 0x3C(%esp), %edx @EIP 0x804df974: length: (4): movl 0x40(%esp), %ecx @EIP 0x804df978: length: (4): movl 0x44(%esp), %eax @EIP 0x804df97c: length: (5): cmpw $0x08, 0x6C(%ebp) @EIP 0x804df981: length: (2): jz 0x0000000E @EIP 0x804df983: length: (3): leal 0x30(%ebp), %esp @EIP 0x804df986: length: (2): pop %gs @EIP 0x804df988: length: (1): pop %es @EIP 0x804df989: length: (1): pop %ds @EIP 0x804df98a: length: (3): leal 0x50(%ebp), %esp @EIP 0x804df98d: 
length: (2): pop %fs
@EIP 0x804df98f: length: (3): leal 0x54(%ebp), %esp
@EIP 0x804df992: length: (1): pop %edi
@EIP 0x804df993: length: (1): pop %esi
@EIP 0x804df994: length: (1): pop %ebx
@EIP 0x804df995: length: (1): pop %ebp
@EIP 0x804df996: length: (7): cmpw $0x0080, 0x8(%esp)
@EIP 0x804df99d: length: (6): ja 0x000000A7
@EIP 0x804df9a3: length: (3): add $0x04, %esp
@EIP 0x804df9a6: length: (1): iret
# ------------ return now
# ---- FINISH ????? did not update timer etc.
Summary: ClockInterrupt -> ACPIInterruptService -> ACPITimerCarry -> ExceptionExit
  Seems no important calls are placed.
Conjecture: if we disable TimerCarry, what would happen?
  [1] check if snap222 is also calling TimerCarry.
      @EIP 0xf850c368: length: (3): lcall *0xC(%eax) #LOOKS LIKE THE REAL HANDLER
      # ---------------------- ACPITimerCarry
      ...
      # ----- back to ACPIInterruptService
      @EIP 0xf850c36b: length: (2): mov %esi, %eax
  [2] test snap111 and snap222
      add a conditional bp at helper_trace2 (on f850c368)
      both are hit
  [3] try disabling 0xf850c368 by directly setting the eip_in to f850c36b
      see the results of snap111 and snap222
      verified, it's NOT ACPITimerCarry which causes the difference.
  [4] try disabling the entire service by replacing eip
      @EIP 0x81f8f5c4: length: (1): push %esp
      with
      @EIP 0x804df9a6: length: (1): iret
      Does not work in helper_trace2. It has to be done in disas_insn() in translate.c
      Translate the instruction to iret (0xcf).
      The change is in translate.c:4425; also bp on ops_sse.h:2423
      Verified: if the interrupt is disabled (iret) directly, it would not even trigger the timer interrupt again (because it ignores the READ_REG_C).
  [5] try disabling 0xf850c368 (call of ACPITimerCarry) by replacing the three bytes at 0xf850c368, 0xf850c369, 0xf850c36a with NOP instructions.
      See if it would change the behavior of snap111.
      Verified, disabling ACPITimerCarry does not actually affect it. Need to find another one.
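The NOP replacement in [5] amounts to overwriting a byte range with 0x90. In these notes the change was made at translation time in disas_insn(); the sketch below is not that code. It only shows the same idea on a plain local buffer holding a copy of guest code, with a bounds check, and patch_nops is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrite `len` bytes at guest address `addr` with NOP (0x90) inside a
 * local copy of guest code whose first byte sits at guest address `base`.
 * Returns 0 on success, -1 if the range falls outside the buffer. */
static int patch_nops(uint8_t *buf, size_t buf_len, uint32_t base,
                      uint32_t addr, size_t len)
{
    assert(buf != NULL);
    if (addr < base)
        return -1;
    size_t off = (size_t)(addr - base);
    if (off > buf_len || len > buf_len - off)
        return -1;
    memset(buf + off, 0x90, len);
    return 0;
}
```

Patching whole instructions matters: the call at 0xf850c368 is three bytes long, so all three must be replaced or the decoder will resynchronise in the middle of the old instruction.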
[6] try to disable the lcall to the interrupt service routine at 0x804dad9c
    @EIP 0x804dad9c: length: (3): lcall *0xC(%edi) # call the interrupt service routine
    Verified: cannot disable. It disrupts the entire service.
[7] disable ACPIGpeIsEvent
    @EIP 0xf850c32f: length: (5): lcall 0xFFFFEA35 #ACPIGpeIsEvent
    @EIP 0xf850c334: length: (2): test %al, %al
    Verified: it can actually be disabled.
[8] disable _imp_HalBeginSystemInterrupt
    @EIP 0x804dad77: length: (6): lcall *0x804D75D8 # call _imp_HalBeginSystemInterrupt
    #HalBeginSystemInterrupt mainly sets up the interrupt vector
    @EIP 0x804dad7d: length: (2): or %eax, %eax
    Result: blue screen.
[9] try ACPITimer again
    @EIP 0xf850c368: length: (3): lcall *0xC(%eax) #LOOKS LIKE THE REAL HANDLER
    @EIP 0xf850c36b: length: (2): mov %esi, %eax
    Verified: ACPITimer can be disabled. No effect.
---------------------
[10] --- TO DO: find out what the constant 0x20000 and EBP+70h mean (what is the data structure at ESP?); find the KiExceptionExit branches that are hit.
    !!! it should be related to PCR or PRCB
    *** VERIFIED: EBP+70h is the IRR field of _KPCR (offset 0x28)
    So the _KPCR address is EBP+70h-28h = EBP+48h
    So we can infer that
        [EBP+70h] is the IRR field
        [EBP+6ch] is the IRQL field (irql LEVEL)
    KiExceptionExit checks whether IRR is 0x20000 and IRQL is 1. According to the calculation 32*IRQL + left most bit, the related intno in WinDbg is 17 + 32*1 = 49.
[11] It seems that the timer interrupt cannot be disabled because of the following line
    @EIP 0x804df92f: length: (5): lcall 0x00005F3E #call KiDeliverApc
    Verified: if disabled, the entire qemu system freezes; it does not accept commands and also blocks gdb.
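Sketch of the calculation in note [10] (32*level + index of the set bit). The helper names are hypothetical, and the code assumes the mask has a single bit set, as with 0x20000 (bit 17).

```c
#include <stdint.h>

/* Index of the lowest set bit of `mask`, or -1 if no bit is set. */
static int set_bit_index(uint32_t mask)
{
    if (mask == 0) {
        return -1;
    }
    int index = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}

/* 32 * level + bit index, following the formula in note [10].
 * Returns -1 when the mask is empty. */
static int intno_from_level_and_mask(unsigned level, uint32_t mask)
{
    int bit = set_bit_index(mask);
    if (bit < 0) {
        return -1;
    }
    return (int)(32u * level) + bit;
}
```

With level 1 and mask 0x20000 this gives 17 + 32*1 = 49, the number quoted in [10].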
[12] Check again how this part of the code is visited
    @EIP 0x804df8fc: length: (7): testl $0x00020000, 0x70(%ebp) #check IRQ is 0x20000 (timer)
    @EIP 0x804df903: length: (2): jnz 0x00000008
    @EIP 0x804df905: length: (4): testb $0x01, 0x6C(%ebp) #check IRQL is 1
    @EIP 0x804df909: length: (2): jz 0x00000036
    #--------- the following will be skipped (however, it will sometimes be hit)
    #in snap111 it is hit especially after sending a key
    #in snap222 it never fired
    # the following is HIT only if(IRQ has 0x20000 set || IRQL!=1)
    # SO the following is HIT only when it's NOT timer interrupt.
    @EIP 0x804df90b: length: (6): movl 0xFFDFF124, %ebx #FFDFF124 points KTHREAD, now ebx has Thread
    @EIP 0x804df911: length: (4): movb $0x00, 0x2E(%ebx) # set _KTHREAD->Alerted to 0
    @EIP 0x804df915: length: (4): cmpb $0x00, 0x4A(%ebx) # _KTHREAD->ApcState->UserAPCPending
    @EIP 0x804df919: length: (2): jz 0x00000026 #if no ApcState->UserAPCPending skip following
    *** observation: in snap111, most of these come from testb $0x01, 0x6C(%ebp) (check IRQ1); in snap222, the same jz at 0x804df909 is hit, but it never falls into the 0x804df90b branch.
    *** find who's writing to 0x6c(%ebp) - not working. Too many writes to it.
[13] Consider the call of SwapContext again.
    @EIP 0x804dbec0: length: (1): pushf
    SwapContext is never invoked in snap222 except the first time. Hits of SwapContext are rare for 177 in snap111 as well.

9:00AM 02/10/2014
--------------------------------------------------------------------------------------
Task 208: Find out why snap222 does not trigger SwapContext
--------------------------------------------------------------------------------------
Observation:
[1] SwapContext is never called in snap222
[2] _KPRCB->QuantumEnd is updated in KeUpdateRunTime, but it never shows up in snap111.

[1] try to locate KeUpdateRunTime.
    [1] get the signature string:
    [2] add a function isKeUpdateRunTime: still could not get KeUpdateRunTime. Strange???
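Sketch of the signature idea: scan a buffer of code bytes for a known byte sequence. The real isKeUpdateRunTime is not shown in these notes, so find_signature and its arguments are hypothetical, and the signature bytes are whatever is copied from the WinDbg listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of `sig` inside `code`,
 * or -1 if it does not occur. `code` is assumed to be a copy of guest
 * bytes that were already read through the MMU.
 */
static long find_signature(const uint8_t *code, size_t code_len,
                           const uint8_t *sig, size_t sig_len)
{
    if (sig_len == 0 || sig_len > code_len) {
        return -1;
    }
    for (size_t i = 0; i + sig_len <= code_len; i++) {
        if (memcmp(code + i, sig, sig_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

A longer signature gives fewer false hits, but the code must already be paged in, otherwise the guest read fails and nothing matches.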
10:00AM 02/11/2014
    [3] try another snippet of isKeUpdateRunTime. Still too slow. Doing a dir does not work.
        It seems that if we do a dir before turning on bLog=1 and then do it again, it works (pages loaded?).
        To improve the speed, comment out the disasm and sprintf lines.
        Still could not capture any code of KeUpdateRunTime. Give it up.
[2] observation: it seems _KPRCB->QuantumEnd is a fixed number
    Approach: read the code of KiDispatchInterrupt; there is a branch like
        if(prcb->QuantumEnd){
            prcb->QuantumEnd= 0;
            KiQuantumEnd;
        }
    prcb->QuantumEnd (after setting a write BP on it) is modified in KeUpdateRunTime; it is assigned the value of ESP.
    The address of prcb->QuantumEnd is identified as ffdff9ac.
    [2.1] in the helper_mem function (read/write) set a bp and check who is accessing ffdff9ac.
        [1] read. Verified: it is read in KiDispatchInterrupt. The code is only slightly different from the WinDbg version.
        [2] write.
            [1] performed at the end of KiDispatchInterrupt
            [2] also found where it is written; looks like KeUpdateRunTime
                observation: it is called much less frequently after the snapshot is loaded.
                Identified: it is a part of KeUpdateRunTime!
                [that explains why, a while (>10 seconds) after the snapshot is loaded, KeUpdateRunTime is not hit again! strange!]
The following is the partial code from QEMU:
############ YEAH!!! FINALLY FOUND the KeUpdateRunTime !!!! ##############################
############ !!!!!!!!!!!!!!!!!!!!!! KeUpdateRunTime !!!!!!!!!!!!!! in QEMU !!!!!!!!!!!!!
@EIP 0x804e39c1: length: (4): subb $0x03, 0x6F(%ebx) # CurrentThread->Quantum-=3
### 0x6f(ebx), i.e. CurrentThread->Quantum, is NOT always located at the same place!!!
### but usually they have values like 0x00000000fa (not big value) @EIP 0x804e39c5: length: (2): jg 0x0000001B #if >0, skip; else update QuantumEnd @EIP 0x804e39c7: length: (6): cmpl 0x12C(%eax), %ebx @EIP 0x804e39cd: length: (2): jz 0x00000013 @EIP 0x804e39cf: length: (6): movl %esp, 0x9AC(%eax) #write ESP to prcb->QuantumEnd @EIP 0x804e39d5: length: (5): mov $0x00000002, %ecx @EIP 0x804e39da: length: (6): lcall *0x804D7654 #nt!_imp_HalRequestSoftwareInterrupt @EIP 0x804e39e0: length: (1): pop %ebx @EIP 0x804e39e1: length: (3): ret $0x0004 ############ !!!!!!!!!!!!!!!!!!!!!! End of KeUpdateRunTime !!!!!!!!!!!!!! in QEMU !!!!!!!!!!!!! [2.3] set a breakpoint at 0x804e39c1 and 0x804e39cf and see how frequently they are hit. Observation: interestingly both are hit ONCE and ONLY ONCE!!!! why???? [2.4] try to locate KeUpdateSystemTime. use the above two breakpoints and set then bp on helper_trace2 and display print_instrRange(eip_in, eip_in+1, env), until it hits the next instruction after ret. ############ YEAH!!!!!!! FINALLY FOUND THE KeUpdateSystemTime !!! 
###################### @EIP 0x804e373d: length: (5): mov $0xFFDF0000, %ecx @EIP 0x804e3742: length: (3): movl 0x8(%ecx), %edi @EIP 0x804e3745: length: (3): movl 0xC(%ecx), %esi @EIP 0x804e3748: length: (2): add %eax, %edi @EIP 0x804e374a: length: (3): adc $0x00, %esi @EIP 0x804e374d: length: (3): movl %esi, 0x10(%ecx) @EIP 0x804e3750: length: (3): movl %edi, 0x8(%ecx) @EIP 0x804e3753: length: (3): movl %esi, 0xC(%ecx) @EIP 0x804e3756: length: (6): subl %eax, 0x80551994 @EIP 0x804e375c: length: (5): movl 0x80551980, %eax @EIP 0x804e3761: length: (2): mov %eax, %ebx @EIP 0x804e3763: length: (6): jg 0x0000008A @EIP 0x804e3769: length: (5): mov $0xFFDF0000, %ebx @EIP 0x804e376e: length: (3): movl 0x14(%ebx), %ecx @EIP 0x804e3771: length: (3): movl 0x18(%ebx), %edx @EIP 0x804e3774: length: (6): addl 0x80551990, %ecx @EIP 0x804e377a: length: (3): adc $0x00, %edx @EIP 0x804e377d: length: (3): movl %edx, 0x1C(%ebx) @EIP 0x804e3780: length: (3): movl %ecx, 0x14(%ebx) @EIP 0x804e3783: length: (3): movl %edx, 0x18(%ebx) @EIP 0x804e3786: length: (2): mov %eax, %ebx @EIP 0x804e3788: length: (2): mov %eax, %ecx @EIP 0x804e378a: length: (6): movl 0x80551984, %edx @EIP 0x804e3790: length: (3): add $0x01, %ecx @EIP 0x804e3793: length: (3): adc $0x00, %edx @EIP 0x804e3796: length: (6): movl %edx, 0x80551988 @EIP 0x804e379c: length: (6): movl %ecx, 0x80551980 @EIP 0x804e37a2: length: (6): movl %edx, 0x80551984 @EIP 0x804e37a8: length: (1): push %eax @EIP 0x804e37a9: length: (5): movl 0xFFDF0000, %eax @EIP 0x804e37ae: length: (3): add $0x01, %eax @EIP 0x804e37b1: length: (2): jnc 0x00000008 @EIP 0x804e37b3: length: (6): incl 0x8055671C @EIP 0x804e37b9: length: (5): movl 0x80556718, %eax @EIP 0x804e37be: length: (7): imull 0x8055671C, %eax @EIP 0x804e37c5: length: (2): add %ecx, %eax @EIP 0x804e37c7: length: (5): movl %eax, 0xFFDF0000 @EIP 0x804e37cc: length: (1): pop %eax @EIP 0x804e37cd: length: (5): and $0x000000FF, %eax @EIP 0x804e37d2: length: (7): leal -0x7FAA6400(,%eax,8), %ecx 
@EIP 0x804e37d9: length: (2): movl (%ecx), %edx @EIP 0x804e37db: length: (2): cmp %edx, %ecx @EIP 0x804e37dd: length: (2): jz 0x0000000E @EIP 0x804e37df: length: (3): cmpl -0x4(%edx), %esi @EIP 0x804e37e2: length: (2): jc 0x00000009 @EIP 0x804e37e4: length: (2): ja 0x00000027 @EIP 0x804e37e6: length: (3): cmpl -0x8(%edx), %edi @EIP 0x804e37e9: length: (2): jnc 0x00000022 @EIP 0x804e37eb: length: (1): inc %eax @EIP 0x804e37ec: length: (1): inc %ebx @EIP 0x804e37ed: length: (5): and $0x000000FF, %eax @EIP 0x804e37f2: length: (7): leal -0x7FAA6400(,%eax,8), %ecx @EIP 0x804e37f9: length: (2): movl (%ecx), %edx @EIP 0x804e37fb: length: (2): cmp %edx, %ecx @EIP 0x804e37fd: length: (2): jz 0x00000052 @EIP 0x804e37ff: length: (3): cmpl -0x4(%edx), %esi @EIP 0x804e3802: length: (2): jc 0x0000004D @EIP 0x804e3804: length: (2): ja 0x00000007 @EIP 0x804e3806: length: (3): cmpl -0x8(%edx), %edi @EIP 0x804e3809: length: (2): jc 0x00000046 @EIP 0x804e380b: length: (6): movl 0xFFDFF020, %ecx @EIP 0x804e3811: length: (6): leal 0x80559984, %eax @EIP 0x804e3817: length: (6): leal 0x8A0(%ecx), %edx @EIP 0x804e381d: length: (4): cmpl $0x00, 0x18(%eax) @EIP 0x804e3821: length: (2): jnz 0x0000002E @EIP 0x804e3823: length: (1): cli @EIP 0x804e3824: length: (6): incl 0x870(%ecx) @EIP 0x804e382a: length: (3): movl %edx, 0x18(%eax) @EIP 0x804e382d: length: (3): movl %ebx, 0x10(%eax) @EIP 0x804e3830: length: (6): add $0x00000860, %ecx @EIP 0x804e3836: length: (3): movl 0x4(%ecx), %ebx @EIP 0x804e3839: length: (3): movl %eax, 0x4(%ecx) @EIP 0x804e383c: length: (2): movl %eax, (%ebx) @EIP 0x804e383e: length: (2): movl %ecx, (%eax) @EIP 0x804e3840: length: (3): movl %ebx, 0x4(%eax) @EIP 0x804e3843: length: (1): sti @EIP 0x804e3844: length: (5): mov $0x00000002, %ecx @EIP 0x804e3849: length: (6): lcall *0x804D7654 @EIP 0x804e384f: length: (7): cmpb $0x00, 0x805530C1 @EIP 0x804e3856: length: (2): jnz 0x0000003C @EIP 0x804e3858: length: (7): cmpl $0x00, 0x80551994 #comp KiTickOffset @EIP 
0x804e385f: length: (2): jg 0x00000021 @EIP 0x804e3861: length: (5): movl 0x8055198C, %eax #eax:= [KeMatxTickOffset] @EIP 0x804e3866: length: (6): addl %eax, 0x80551994 @EIP 0x804e386c: length: (3): pushl (%esp) @EIP 0x804e386f: length: (5): lcall 0x0000003E # call KeUpdateRunTime @EIP 0x804e3874: length: (1): cli @EIP 0x804e3875: length: (6): lcall *0x804D75DC #_imp__HalEndSystemInterrupt @EIP 0x804e387b: length: (5): ljmp 0xFFFFC080 #nt!KiExceptionExit # --------- END Of KeUpdateSystemTime ---------------------------------- ##### [2.4] set bp on 0x804e3858 (compare KiTickOffset) and @EIP 0x804e373d (beginning of KeUpdateSystemTime) Observation: KeUpdateSystemTime is also called only ONCE!!!! forever!!! check who's calling KeUpdateSystemTime. The problem is that it never returns from @EIP 0x804e3875: length: (6): lcall *0x804D75DC #_imp__HalEndSystemInterrupt [2.5] set a conditional bp on 0x804e3875 and delve into _imp__HalEndSystemInterrupt @EIP 0x806eec50: length: (2): xor %ecx, %ecx #hit many times, but KeUpdateSystem is #never called again. @EIP 0x806eec52: length: (4): movb 0x4(%esp), %cl @EIP 0x806eec56: length: (6): movb -0x7F911DA8(%ecx), %cl @EIP 0x806eec5c: length: (10): movl $0x00000000, 0xFFFE00B0 @EIP 0x806eec66: length: (3): cmp $0x41, %cl @EIP 0x806eec69: length: (2): jc 0x00000011 # will actually jump to 7a @EIP 0x806eec6b: length: (1): push %ecx @EIP 0x806eec6c: length: (5): lcall 0xFF9DB58A @EIP 0x806eec71: length: (1): nop @EIP 0x806eec72: length: (5): lcall 0xFF9DB547 @EIP 0x806eec77: length: (3): ret $0x0008 @EIP 0x806eec6b: length: (1): push %ecx @EIP 0x806eec6c: length: (5): lcall 0xFF9DB58A @EIP 0x806eec71: length: (1): nop @EIP 0x806eec72: length: (5): lcall 0xFF9DB547 @EIP 0x806eec77: length: (3): ret $0x0008 @EIP 0x806eec7a: length: (7): cmpb $0x00, 0xFFDFF096 # !!! 
jump here @EIP 0x806eec81: length: (7): movb $0x00, 0xFFDFF095 @EIP 0x806eec88: length: (2): jz 0xFFFFFFE3 @EIP 0x806eec8a: length: (5): push $0x00000041 @EIP 0x806eec8f: length: (5): lcall 0xFF9DB567 #??? @EIP 0x806eec94: length: (1): push %ebx @EIP 0x806eec95: length: (1): push %ecx @EIP 0x806eec96: length: (1): sti @EIP 0x806eec97: length: (7): movb $0x00, 0xFFDFF096 ### will breakhere @EIP 0x806eec9e: length: (6): lcall *0x806EC430 --> calls 804dbe03 # this is KiDispatchInterrupt. @EIP 0x806eeca4: length: (1): cli #----------- NEVER REACHED @EIP 0x806eeca5: length: (1): pop %ecx @EIP 0x806eeca6: length: (1): pop %ebx @EIP 0x806eeca7: length: (2): ljmp 0xFFFFFFC4 @EIP 0x806eeca9: length: (3): leal (%ecx), %ecx @EIP 0x806eecac: length: (2): xor %eax, %eax @EIP 0x806eecae: length: (4): movb 0x4(%esp), %al @EIP 0x806eecb2: length: (6): movb -0x7F911DA8(%eax), %al @EIP 0x806eecb8: length: (1): nop @EIP 0x806eecb9: length: (5): lcall 0xFF9DB4F6 @EIP 0x806eecbe: length: (5): lcall 0xFF9DB531 @EIP 0x806eecc3: length: (4): movl 0xC(%esp), %eax @EIP 0x806eecc7: length: (3): shr $0x04, %ecx @EIP 0x806eecca: length: (6): movb -0x7F906F78(%ecx), %cl @EIP 0x806eecd0: length: (2): movb %cl, (%eax) @EIP 0x806eecd2: length: (5): mov $0x00000001, %eax @EIP 0x806eecd7: length: (1): sti @EIP 0x806eecd8: length: (3): cmp $0x02, %cl @EIP 0x806eecdb: length: (2): jnc 0x00000009 @EIP 0x806eecdd: length: (7): movb $0x02, 0xFFDFF095 @EIP 0x806eece4: length: (3): ret $0x000C @EIP 0x806eece7: length: (1): int3 [2.6] it seems that KeUpdateSystemTime -> KeUpdateRunTime ->halEndSystemInterrupt (multiple times) First instruction of KeUpdateSystemTime @EIP 0x804e373d: length: (5): mov $0xFFDF0000, %ecx Just check what is the previous eip. The previous instruction is located at 0x806f46d4. 
(ljmp)

9:30AM 02/12/2014
[2.7] record the previous 100 instructions [estimate: 40 min]
    [1] declare a queue and append each eip to it [15 min] DONE
    [2] declare a function that prints the contents of the queue [10 min] DONE
    [3] break on 0x804e373d, look at the previous 100 instructions, and find the lcall or ljmp instructions [15 min]
        Found that KeUpdateSystemTime is invoked by a ljmp.
        Search for the instruction sequence of the container function.
        There are a lot of in/out accesses to ports 70/71 (verified: it is the real time clock).
        Search for the following two signatures:
        @EIP 0x806ece14: length: (2): mov $0x0C, %al
        @EIP 0x806ece16: length: (2): out %al, $0x70
        There are many candidates. Trick: (out 0xc, 0x70) is repeated twice in the QEMU version.
        The following is the list of candidates:
        (1) HalStartProfileInterrupt (not right): the second output is 0xd to port 0x70.
        (2) HalStopProfileInterrupt: not right, it outputs 0b, 0c, 0d
        (3) HalpSetWakeAlarm - ends with output 0d to 0x70
        None of them works!
    [4] another attempt: check how many times the following instruction is hit:
        @EIP 0x806ecf7c: length: (6): cmpl %ebx, 0x806F4ED4
        This instruction is hit ONLY once!!!!
    [5] maybe we should display 150 instructions instead.
        Found that the instruction sequence starts from 0x806ecd34.
        Strangely, 0x806ecd34 is ONLY HIT ONCE! It's triggered by hardware interrupt 209.
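Sketch of the queue from [2.7]: a fixed-size ring buffer that keeps the last 100 eip values. The actual queue added to the QEMU tree is not shown in these notes; EipHistory and the two functions below are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define EIP_HISTORY_LEN 100

typedef struct {
    uint32_t eip[EIP_HISTORY_LEN];
    size_t next;  /* slot that the next record is written to */
    size_t count; /* number of valid entries, at most EIP_HISTORY_LEN */
} EipHistory;

/* Append one eip; once the buffer is full the oldest entry is overwritten. */
static void eip_history_record(EipHistory *h, uint32_t eip)
{
    h->eip[h->next] = eip;
    h->next = (h->next + 1) % EIP_HISTORY_LEN;
    if (h->count < EIP_HISTORY_LEN) {
        h->count++;
    }
}

/* Copy the most recent entries into `out`, oldest first.
 * Returns the number of entries copied (at most out_len). */
static size_t eip_history_dump(const EipHistory *h, uint32_t *out, size_t out_len)
{
    size_t n = h->count < out_len ? h->count : out_len;
    /* position of the oldest entry that will be copied */
    size_t start = (h->next + EIP_HISTORY_LEN - n) % EIP_HISTORY_LEN;
    for (size_t i = 0; i < n; i++) {
        out[i] = h->eip[(start + i) % EIP_HISTORY_LEN];
    }
    return n;
}
```

Calling eip_history_record from the per-instruction trace helper and eip_history_dump at the breakpoint gives the instructions leading up to the hit; raising EIP_HISTORY_LEN to 150 covers the case in [5].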
-------------------- @EIP 0x806ecd34: length: (1): push %esp @EIP 0x806ecd35: length: (1): push %ebp @EIP 0x806ecd36: length: (1): push %ebx @EIP 0x806ecd37: length: (1): push %esi @EIP 0x806ecd38: length: (1): push %edi @EIP 0x806ecd39: length: (3): sub $0x54, %esp @EIP 0x806ecd3c: length: (2): mov %esp, %ebp @EIP 0x806ecd3e: length: (4): movl %eax, 0x44(%esp) @EIP 0x806ecd42: length: (4): movl %ecx, 0x40(%esp) @EIP 0x806ecd46: length: (4): movl %edx, 0x3C(%esp) @EIP 0x806ecd4a: length: (8): testl $0x00020000, 0x70(%esp) @EIP 0x806ecd52: length: (2): jnz 0xFFFFFFBA @EIP 0x806ecd54: length: (6): cmpw $0x08, 0x6C(%esp) @EIP 0x806ecd5a: length: (2): jz 0x00000025 @EIP 0x806ecd7f: length: (7): movl %fs:0x0, %ebx @EIP 0x806ecd86: length: (11): movl $0xFFFFFFFF, %fs:0x0 @EIP 0x806ecd91: length: (4): movl %ebx, 0x4C(%esp) @EIP 0x806ecd95: length: (6): cmp $0x00010000, %esp @EIP 0x806ecd9b: length: (6): jc 0xFFFFFF49 @EIP 0x806ecda1: length: (8): movl $0x00000000, 0x64(%esp) @EIP 0x806ecda9: length: (1): cld @EIP 0x806ecdaa: length: (3): movl 0x60(%ebp), %ebx @EIP 0x806ecdad: length: (3): movl 0x68(%ebp), %edi @EIP 0x806ecdb0: length: (3): movl %edx, 0xC(%ebp) @EIP 0x806ecdb3: length: (7): movl $0xBADB0D00, 0x8(%ebp) @EIP 0x806ecdba: length: (3): movl %ebx, (%ebp) @EIP 0x806ecdbd: length: (3): movl %edi, 0x4(%ebp) @EIP 0x806ecdc0: length: (7): testb $0xFF, 0xFFDFF050 @EIP 0x806ecdc7: length: (6): jnz 0xFFFFFE99 @EIP 0x806ecdcd: length: (5): push $0x000000D1 @EIP 0x806ecdd2: length: (3): sub $0x04, %esp @EIP 0x806ecdd5: length: (1): push %esp @EIP 0x806ecdd6: length: (5): push $0x000000D1 @EIP 0x806ecddb: length: (2): push $0x1C @EIP 0x806ecddd: length: (5): lcall 0x00001ECF @EIP 0x806eecac: length: (2): xor %eax, %eax @EIP 0x806eecae: length: (4): movb 0x4(%esp), %al @EIP 0x806eecb2: length: (6): movb -0x7F911DA8(%eax), %al @EIP 0x806eecb8: length: (1): nop @EIP 0x806eecb9: length: (5): lcall 0xFF9DB4F6 @EIP 0x800ca1af: length: (2): out %al, $0x7E @EIP 0x800ca1b1: length: 
(7): movzxb 0x800CA300, %ecx @EIP 0x800ca1b8: length: (1): ret @EIP 0x806eecbe: length: (5): lcall 0xFF9DB531 @EIP 0x800ca1ef: length: (1): push %eax @EIP 0x800ca1f0: length: (5): lcall 0x00000006 @EIP 0x800ca1f6: length: (1): pushf @EIP 0x800ca1f7: length: (1): push %eax @EIP 0x800ca1f8: length: (1): push %ebx @EIP 0x800ca1f9: length: (2): out %al, $0x7E @EIP 0x800ca1fb: length: (5): movl 0x800CA300, %eax @EIP 0x800ca200: length: (2): mov %eax, %ebx @EIP 0x800ca202: length: (4): movb 0x10(%esp), %bl @EIP 0x800ca206: length: (8): lock cmpxchgl %ebx, 0x800CA300 @EIP 0x800ca20e: length: (2): jnz 0xFFFFFFED @EIP 0x800ca210: length: (2): cmp %bh, %bl @EIP 0x800ca212: length: (2): jnc 0x00000004 @EIP 0x800ca216: length: (3): rol $0x08, %ebx @EIP 0x800ca219: length: (2): cmp %bh, %bl @EIP 0x800ca21b: length: (2): ja 0x00000008 @EIP 0x800ca21d: length: (1): pop %ebx @EIP 0x800ca21e: length: (1): pop %eax @EIP 0x800ca21f: length: (1): popf @EIP 0x800ca220: length: (3): ret $0x0004 @EIP 0x800ca1f5: length: (1): ret @EIP 0x806eecc3: length: (4): movl 0xC(%esp), %eax @EIP 0x806eecc7: length: (3): shr $0x04, %ecx @EIP 0x806eecca: length: (6): movb -0x7F906F78(%ecx), %cl @EIP 0x806eecd0: length: (2): movb %cl, (%eax) @EIP 0x806eecd2: length: (5): mov $0x00000001, %eax @EIP 0x806eecd7: length: (1): sti @EIP 0x806eecd8: length: (3): cmp $0x02, %cl @EIP 0x806eecdb: length: (2): jnc 0x00000009 @EIP 0x806eecdd: length: (7): movb $0x02, 0xFFDFF095 @EIP 0x806eece4: length: (3): ret $0x000C @EIP 0x806ecde2: length: (7): cmpb $0x00, 0x806F97A4 @EIP 0x806ecde9: length: (2): jz 0x00000007 @EIP 0x806ecdf0: length: (5): movb 0x806F4ECC, %al @EIP 0x806ecdf5: length: (2): or %al, %al @EIP 0x806ecdf7: length: (2): jz 0x00000018 @EIP 0x806ecdf9: length: (7): addb $0x56, 0x806F4ECD @EIP 0x806ece00: length: (2): jnc 0x0000000F @EIP 0x806ece0f: length: (5): lcall 0xFFFFFC89 @EIP 0x806eca98: length: (1): push %eax @EIP 0x806eca99: length: (1): pushf @EIP 0x806eca9a: length: (1): cli @EIP 
0x806eca9b: length: (6): leal 0x806FDF20, %eax @EIP 0x806ecaa1: length: (6): popl 0x806F4E84 @EIP 0x806ecaa7: length: (1): pop %eax @EIP 0x806ecaa8: length: (1): ret @EIP 0x806ece14: length: (2): mov $0x0C, %al @EIP 0x806ece16: length: (2): out %al, $0x70 @EIP 0x806ece18: length: (2): pushf @EIP 0x806ece1a: length: (2): popf @EIP 0x806ece1c: length: (2): ljmp 0x00000002 @EIP 0x806ece1e: length: (2): in $0x71, %al @EIP 0x806ece20: length: (2): pushf @EIP 0x806ece22: length: (2): popf @EIP 0x806ece24: length: (2): ljmp 0x00000002 @EIP 0x806ece26: length: (2): mov $0x0C, %al @EIP 0x806ece28: length: (2): out %al, $0x70 @EIP 0x806ece2a: length: (2): pushf @EIP 0x806ece2c: length: (2): popf @EIP 0x806ece2e: length: (2): ljmp 0x00000002 @EIP 0x806ece30: length: (2): in $0x71, %al @EIP 0x806ece32: length: (2): pushf @EIP 0x806ece34: length: (2): popf @EIP 0x806ece36: length: (2): ljmp 0x00000002 @EIP 0x806ece38: length: (5): lcall 0xFFFFFC7C @EIP 0x806ecab4: length: (1): push %eax @EIP 0x806ecab5: length: (6): pushl 0x806F4E84 @EIP 0x806ecabb: length: (6): leal 0x806FDF20, %eax @EIP 0x806ecac1: length: (1): popf @EIP 0x806ecac2: length: (1): pop %eax @EIP 0x806ecac3: length: (1): ret @EIP 0x806ece3d: length: (5): movl 0x806F4EAC, %eax @EIP 0x806ece42: length: (2): xor %ebx, %ebx @EIP 0x806ece44: length: (6): movl 0x806F4EB0, %ecx @EIP 0x806ece4a: length: (6): addb %cl, 0x806F4EC4 @EIP 0x806ece50: length: (2): sbb %ebx, %eax @EIP 0x806ece52: length: (7): cmpl $0x00, 0x806F9920 @EIP 0x806ece59: length: (6): jz 0x00000123 @EIP 0x806ecf7c: length: (6): cmpl %ebx, 0x806F4ED4 @EIP 0x806ecf82: length: (6): jz 0x00007752 @EIP 0x806f46d4: length: (6): ljmp *0x806EC40C @EIP 0x804e373d: length: (5): mov $0xFFDF0000, %ecx ################## Conclusion: it seems to be INTERRUPT NO 209 who triggers the KeUpdateRunTimer and KeUpdateSystemTime. QUESTION IS: who's generating interrupt 209? in snap222, there is no 209 at all; in snap111, there is ONLY one 209. 
If load the system from the image, we are getting a lot of 209's. ################## -------------------------------------------------------------------------------------- Task 209: Find out who's generating interrupt 209. -------------------------------------------------------------------------------------- [1] figure out the irq and irr number of 209. Check cpu_get_pic_interrupt(env); debug: in qemu command line, do loadvm snap111 and immediately ctrl+c. cpu_exec.c:330 209 is generated by apic_get_interrupt(env->apic_state); It's irql 5 and 0x20000 , which actually should generate interrupt 177. If it's irq 6 and 0x20000, then it generates 209 [2] use mem hardware breakpoint to trace on s->tab[6] found the following: #### !!!! yeah finally found that it's the rtc_period_timer!!!! #0 set_bit (tab=0x28dcf830, index=209) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:60 #1 0x08272810 in apic_set_irq (s=0x28dce510, vector_num=209, trigger_mode=0) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:390 #2 0x08272307 in apic_bus_deliver (deliver_bitmask=0xbffff01c, delivery_mode=0 '\000', vector_num=209 '\321', trigger_mode=0 '\000') at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:277 #3 0x082723e0 in apic_deliver_irq (dest=1 '\001', dest_mode=1 '\001', delivery_mode=0 '\000', vector_num=209 '\321', trigger_mode=0 '\000') at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:290 #4 0x08275491 in ioapic_service (s=0x28ded7b8) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:71 #5 0x08275608 in ioapic_set_irq (opaque=0x28ded7b8, vector=8, level=1) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:111 #6 0x08126d7a in qemu_set_irq (irq=0x28dea040, level=1) at hw/irq.c:38 #7 0x08284f5a in gsi_handler (opaque=0x28de5228, n=8, level=1) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../pc.c:98 #8 0x08126d7a in qemu_set_irq (irq=0x28de5330, level=1) at hw/irq.c:38 #9 0x0810f271 in hpet_handle_legacy_irq (opaque=0x28deb738, n=1, level=1) at hw/hpet.c:677 #10 0x08126d7a in 
qemu_set_irq (irq=0x28e0781c, level=1) at hw/irq.c:38
#11 0x08280d46 in qemu_irq_raise (irq=0x28e0781c) at /home/csc288/qemu/qemu-1.4.0/hw/irq.h:14
#12 0x082814e1 in rtc_periodic_timer (opaque=0x28e07178) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../mc146818rtc.c:200
#13 0x081e35f4 in qemu_run_timers (clock=0x28c45a18) at qemu-timer.c:394
#14 0x081e3813 in qemu_run_all_timers () at qemu-timer.c:452
#15 0x081b62b4 in main_loop_wait (nonblocking=0) at main-loop.c:436
#16 0x08235954 in main_loop () at vl.c:2007
#17 0x0823c991 in main (argc=14, argv=0xbffff744, envp=0xbffff780) at vl.c:4341
#####################
Conclusion: somehow the RTC timer is NEVER triggered! But the RTC timer is what triggers KeUpdateSystemTime -> KeUpdateRunTime -> SwapContext.
#####################

7:30PM 02/13/2014.
--------------------------------------------------------------------------------------
Task 210: Try to solve the problem of drifting timer.
--------------------------------------------------------------------------------------
[1] bp on qemu_run_all_timers, then step into qemu_run_timers -> then delve into qemu_run_timers(vm_clock)
[2] change the clock values and see how it goes.
[3] observation: it seems that cpu_get_clock() returns the real time from the host (instead of vm clock!)
    qemu_get_clock_ns() -> check clock_type (VM_CLOCK, correct) -> check itype use_icount (check)
    If use_icount is set, it will call gen_io_start()/gen_io_end() for i/o operations.
[4] if we simply change use_icount to 1, qemu_get_clock_ns() is read from the icount (instruction count), which is currently 0. This eventually leads to a segmentation fault. That is to be expected because some instructions are not translated with io/start.
[5] keep use_icount at 0, but change all the timers and see the effects.
    Walking vmclock->active_timers->next->next ... there are three timers: pit_irq_timer, rtc_update_timer, rtc_periodic_timer.
    It seems that the third timer triggers the 209 interrupt.
    Now change all of their expire_time values to the current time (in the debugger).
    Observation: it still does not work on snap222. Interrupt 209 is triggered only once and not again.
    Guess: still not enough 209s triggered.
[6] figure out why ts->expire_time is changed back.
    It is also affected by PITChannel* opaque->next_transition_time, which is changed in pit_irq_timer_update.

9:00AM 02/14/2014
[7] figure out how the 3rd timer is actually called.
    Set the expire_time of all three timers and then watch *(&(clock->active_timers)).
    Four timers are found:
    $4 = (QEMUTimer *) 0x28e07d18 /PIT
    $5 = (QEMUTimer *) 0x28e208a0 /apic_pm_tmr_timer
    $6 = (QEMUTimer *) 0x28ded728 /rtc_periodic_timer
    $7 = (QEMUTimer *) 0x28ded748 /rtc_update_timer
    Recorded events:
    [1] hit qemu_run_all_timers again
    [2] mod PIT; the head is temporarily set to apic_pm_tmr_timer
    [3] in qemu_mod_timer_ns: the apic_pm_tmr_timer (currently the top one) is also expired, so the pit timer is re-inserted before apic_pm_tmr_timer and its expire time is reset. The operation is an insert into the list. Now PIT is the first timer again.
    [4] ** after the callback ts->cb, the expire_time was reverted (need to change later)
        The pit timer is again expiring in the loop.
        .... it is almost always the pit timer that is getting updated
    [5] set a conditional bp on qemu_mod_timer_ns, with the condition that it is not the pit timer.
        Found that the apic_pm_tmr_timer is updated by an out instruction (io port write, addr: 45056).

10:45AM
    [6] add the condition again and look at how rtc_periodic_timer is updated. Reset the expire_time of every timer.
        It is called inside qemu_run_timers(vm_clock) due to the loop. Need to understand it completely!!!
    for(;;) {
        ts = clock->active_timers; //ts points to head of clock timer list
        if (!qemu_timer_expired_ns(ts, current_time)) {
            //if ts->expire_time>current_time break out of loop
            break;
        }
        //if ts->expire_time<=current_time (i.e., the head timer expired)
        clock->active_timers = ts->next;
        ts->next = NULL;

        /* run the callback (the timer list can be modified) */
        //call the callback function to
        //(1) send the interrupt out
        //(2) modify the timer->expire_time based on opaque (advance to the
        //    next transition time) and then insert the timer back
        ts->cb(ts->opaque); // it will call qemu_mod_timer_ns for all timers

        // the following is from qemu_mod_timer_ns
        //pt points to the 2ND timer at this moment
        //ts is actually the first timer
        pt = &ts->clock->active_timers;
        for(;;) { //go search for the 1st NOT EXPIRED timer
            t = *pt;
            if (!qemu_timer_expired_ns(t, expire_time)) {
                break;
            }
            pt = &t->next;
        }
        ts->expire_time = expire_time; //set 1st timer NOT EXPIRED
        ts->next = *pt; //link to the NOT EXPIRED timer
        *pt = ts;
        //AFTER THIS IS DONE the clock->active_timers points to
        //shape: EXPIRED, EXPIRED, NOT_EXPIRED, NOT_EXPIRED, NOT_EXPIRED
    }
    //---------------------------------------------------
    //CONCLUSION:
    // the outer loop is supposed to update actually ONE expired timer ONLY!!!
    //! so we should make the timers expire
    //---------------------------------------------------
---------------------
[8] check the status of all timers in snap222.
    Current_time: 0x701ff06b531
    $4 = (QEMUTimer *) 0x28e07d18 /pit_irq_timer 0x6fe778196cb (expired)
    $5 = (QEMUTimer *) 0x28e208a0 /apic_pm_tmr_timer 0x6fec62748a9 (expired)
    $6 = (QEMUTimer *) 0x28ded728 /rtc_periodic_timer 0x134e83cf833f61e1 (not expired)
    $7 = (QEMUTimer *) 0x28ded748 /rtc_update_timer 0x134e83cfbc7b8b00 (not expired)

    After resetting the expire time of all of them to current_time, and after 100 clock updates:
    Current_time: 0x741f11fc12b
    $4 = (QEMUTimer *) 0x28e07d18 /pit_irq_timer 0x7401bb625f1 (expired)
    $7 = (QEMUTimer *) 0x28ded748 /rtc_update_timer 0x7402908cab0 (expired)
    $5 = (QEMUTimer *) 0x28e208a0 /apic_pm_tmr_timer 0x740400f72a1 (expired)
    $6 = (QEMUTimer *) 0x28ded728 /rtc_periodic_timer 0x134e83cf842dcd09 (UPDATED NOT RIGHT)
        -> it's reading from the host time resource!!!! [that's the reason it's NEVER GOING to be updated!]
    Note: 209 is triggered by rtc_periodic_timer!!!!! That explains it.
[9] check how rtc_periodic_timer gets its expire time updated; set a hw bp on it.
    Found that RTCState->next_periodic_time decides it, as shown below:
        static void rtc_periodic_timer(void *opaque)
        {
            RTCState *s = opaque;

            periodic_timer_update(s, s->next_periodic_time);
    What about resetting s->next_periodic_time as well? It will set things right.
[10] new problem: pit_irq_timer updates too slowly. It obstructs the updates of the other timers!
    Understand how it gets updated: it uses PITChannelState->next_transition_time.
    Basically, next_transition_time is incremented by 65545 every time, and it is not affected by the current (real) time, so there will be some drift as it proceeds.
    As it progresses, the PIT timer more or less live-locks qemu_update_timer. It wastes a lot of cycles (generating hundreds or thousands of tiny incremental interrupts, most of which are ignored because they repeat each other).
    Proposed mod: in pit_irq_timer, get the current time using qemu_get_clock_ns(s->irq_timer->clock). DONE.
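Sketch of the idea behind the proposed mod: when a periodic timer has fallen far behind the clock, re-base it on the current time instead of replaying every missed period. This is not the actual patch to pit_irq_timer (which is not shown in these notes); the function names and the plain int64_t nanosecond values are assumptions for illustration.

```c
#include <stdint.h>

/* Number of whole periods a timer scheduled at `scheduled` is behind `now`.
 * Each of these would cost one more pass through the timer loop. */
static int64_t missed_periods(int64_t scheduled, int64_t period, int64_t now)
{
    if (period <= 0 || now <= scheduled) {
        return 0;
    }
    return (now - scheduled) / period;
}

/* Next expiry for a periodic timer. If advancing by one period still
 * leaves the timer in the past, jump to now + period so the timer
 * fires once instead of once per missed period. */
static int64_t next_expiry(int64_t scheduled, int64_t period, int64_t now)
{
    int64_t next = scheduled + period;
    if (next <= now) {
        next = now + period;
    }
    return next;
}
```

Without the re-base, a timer that is N periods behind stays at the head of the list for N callbacks and the timers behind it in the list have to wait.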
    Manually set rtc_periodic_timer->expire_time and then ((RTCState *)s->opaque)->next_periodic_time to the current time.
    Now interrupt 209 is triggered frequently; HOWEVER, the guest still does not accept keyboard input!

8:30AM 02/15/2014
[11] repeat the experiment on snap222 and see what is wrong.
    [a] in qemu_run_all_timers(), drill into qemu_update_timer() for vm_clock, then set current_time as the expire_time of all timers linked by ts->next->next ...
        Specifically, identify the rtc_periodic_timer and update its opaque attribute, which is actually an RTCState: set ((RTCState *)s->opaque)->next_periodic_time to the current time.
    [b] break on do_interrupt_all and see if 209 is triggered.
    [c] check whether KeUpdateSystemTime and KeUpdateRunTime are invoked.
        KeUpdateSystemTime: 0x806ecd34 (hit), 0x804e373d (hit)
        KeUpdateRunTime: 0x804e39c1 (CurrentThread->Quantum-=3) (hit)
        -- the above two are hit multiple times --
        SwapContext: @EIP 0x804dbec0: length: (1): pushf (hit) (hit less frequently, but hit multiple times)
    Not sure what is broken on snap222, but no processes are detected. Strange, even with SwapContext already working! Will revisit snap222 later.
[12] Repeat the above on snap111 and save a snapshot as snap333.
    Working! But something is wrong and triggers a blue screen.
    b raise_exception and see what's wrong: the exception index is 6 or 7, but could not identify how it is generated.
    make clean install. Still not working.
    It may be caused by an invalid clock value. Try updating all timers and their opaque state correspondingly.
    Seems to work. Wait 10 minutes.
    Observation: the net use command is much faster than in snap111 (probably the interleaving improves the response to network packets!!!)
******************************************************************************************
------------------------------------------------------------------------------------------
Snapshot Problem Finally Solved Completely!
Points: see [11] of 8:30AM 02/15/2014 notes!
got to update all opaque state of the four timers associated with the vm clock (bp on the corresponding ts->cb function).
Cause: it's caused by the RTC periodic timer; interrupt 209 is NEVER fired.
We fixed the clock value and REMOVED ONE bug related to the PIT timer (update using the recent value to avoid live lock by the pit timer).
------------------------------------------------------------------------------------------
******************************************************************************************
10:30AM

[13] now generate the new image and change the job sequence (so that we can save the time-costly step of net use). [30 min]
    [1] make the updates
    [2] remove test use (strangely, when using net use, it is still quite vulnerable when saving VM. if not using net use, it is good - does network transmission cause some trouble in the buffer i/o channel?)
    [3] completely recompile and check. still not working. strangely, if snap333 is loaded first, loadvm snap555 would work. ???
    11:30AM
    [4] need to check into the details of the raise_exception.
        [1] find out where it is throwing.
            exception_index: 6 (invalid opcode)
            global_eip: 0x20ece
            set a conditional bp on it (have to code it), too slow
            It's an instruction: les %esp, %eax
            The exception is generated by disas_insn for the les instruction, because the ModRM mod field is 3. Note translate.c:5673
    [5] trace the last 150 instructions before 0x20ece and see what is going on.
        the last couple of instructions end with iret, which gets into 0x20ece. see below:
        ------------
        @EIP 0xc01d5: length: (1): popa
        @EIP 0xc01d6: length: (1): pop %ds
        @EIP 0xc01d7: length: (1): pop %es
        @EIP 0xc01d8: length: (1): popf
        @EIP 0xc01d9: length: (1): iret
        ------------
        There is an interrupt right before 0x20ece.
        @EIP 0x20ec6: length: (4): mov $0x0012, %ax
        @EIP 0x20eca: length: (2): addb %al, (%eax)
        @EIP 0x20ecc: length: (2): int $0x10
        It's a BIOS interrupt call (related to GUI cursor etc.)
        Interestingly: 20ec6 is never HIT in other snapshots (also xxec6 is never hit in other snapshots <- it may be caused by others? check interrupt) ????
    7:00PM
    [6] break on 20ec6 and check the last interrupt number
        last_intno is 8. verified it's always 8 for do_interrupt_all before hitting 0x20ec6.
        --> triggers 804e0f69. check if in snap333 it's popping interrupt 8. ???
        ---> yes eventually. VERIFIED. snap333 will also crash as long as interrupt 8 is there!
        --> only occurred once. could not verify.
    [7] figure out who's triggering interrupt 8? Should be tab[0] is 0x00000100 --> did not catch it.
        interrupt 8 is not triggered from cpu-exec.cc30
        current eip is: 0x804e1f25 (then break on helper_trace2 on it)
        0x804e1f25 is hit many times. not a good way to find out who's triggering interrupt 8 (error code is 2).
        It seems to be caused by
            void tlb_fill(CPUX86State *env, target_ulong addr, int is_write, int mmu_idx,
                          uintptr_t retaddr)
            {
                int ret;

                ret = cpu_x86_handle_mmu_fault(env, addr, is_write, mmu_idx);
        the ret is 1 and then it triggers an exception.
        The problem seems to be: illegal write of memory.
            (gdb) p/x env->cr[3]
            $27 = 0x62ce000
            (gdb) p/x addr
            $28 = 0xf7901ffc
            (gdb) p/x is_write
            $29 = 0x1
            (gdb) p/x global_eip
            $30 = 0x804e1f25
            (gdb) p print_instrRange(0x804e1f25, 0x804e1f26, env)
            @EIP 0x804e1f25: length: (7): movw $0x0000, 0x2(%esp)
        it is trying to save to: 0xf7901ffc
        when calling tlb_fill, it generates the error (guess: protection error)
        candidates: PG_ERROR_P_MASK, error_code |= (is_write << PG_ERROR_W_BIT); PG_ERROR_U_MASK, PG_ERROR_I_D_MASK;
        it seems that error_code is set to is_write << PG_ERROR_W_BIT
        general exception_index: EXCP0E_PAGE (14)
        ==> it seems that interrupt 14 triggers 0x804e1f25, and the usual address to load is in the 0x203c range.
        Later: check who's triggering interrupt 14? does snap333 get interrupt 14?
        [1] in snap333 there are lots of interrupt 14 as well.
        [2] 14 is the x86 page fault (#PF).
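To make the error_code candidates above easier to read, here is a small sketch that decodes an x86 page-fault error code into its architectural bits. The PF_* names and both helper functions are local to this sketch (they are not QEMU identifiers); only the bit positions come from the x86 architecture.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural bit layout of the x86 page-fault error code.
 * The PF_* names are made up for this sketch. */
enum {
    PF_PRESENT = 1u << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE   = 1u << 1, /* 0: read access,      1: write access */
    PF_USER    = 1u << 2, /* 0: supervisor mode,  1: user mode */
    PF_RSVD    = 1u << 3, /* reserved bit set in a paging-structure entry */
    PF_IFETCH  = 1u << 4  /* fault was an instruction fetch */
};

struct pf_info {
    int protection; /* 1 if the page was present (protection fault) */
    int write;
    int user;
    int rsvd;
    int ifetch;
};

static struct pf_info decode_pf_error(uint32_t ec)
{
    struct pf_info i;
    i.protection = (ec & PF_PRESENT) != 0;
    i.write      = (ec & PF_WRITE) != 0;
    i.user       = (ec & PF_USER) != 0;
    i.rsvd       = (ec & PF_RSVD) != 0;
    i.ifetch     = (ec & PF_IFETCH) != 0;
    return i;
}

/* Writes a one-line description into buf; returns snprintf's result. */
static int describe_pf_error(uint32_t ec, char *buf, size_t len)
{
    struct pf_info i = decode_pf_error(ec);
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s, %s, %s%s%s",
                    i.protection ? "protection violation" : "page not present",
                    i.write ? "write" : "read",
                    i.user ? "user" : "supervisor",
                    i.rsvd ? ", reserved bit set" : "",
                    i.ifetch ? ", instruction fetch" : "");
}
```

If the error code really is 2 as noted, it decodes to a supervisor-mode write to a page that is not present, rather than a protection violation on a present page.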
7:30PM 02/16/2014

Guess: is it the lock causing the problem?
loadvm snap555 and check the clock value.
    (1) set a breakpoint at savevm.c:2311. loadvm_state()
    (2) then BP on qemu_run_all_timers. check the state of APICTimer
    check (a) what is the difference between RTC_periodic_timer and RTC_update_timer
          (b) what is the difference between next_periodic_time and next_alarm_time
          seems to be the difference between periodic and oneshot
    ONLY update the PIT timer and RTC_periodic_timer, ignore the apic timer and rtc_update_timer. got bluescreen
    [1] test 1. update PIT, rtc_PERIODIC_timer, and RTC_UPDATE_TIMER. Does not work.
    has to start from the analysis of the blue screen
    **** bp on do_interrupt_all if intno==14, ignore the first 736 times ***
    Found the problem, it triggers a page exception on itself very early (from 300 times).
    Need to use binary search to find the first time it triggers the problem.
8:30AM 02/17/2014

[8] continue the analysis, try to identify which interrupt 14 triggers the problem. Use binary search [30 min]
    [1] b savevm.c:2311 first and then bp on do_interrupt_all
        100 (too large) -> 50 -> 25 -> 12 --> 6 --> 3
        use ignore bp 5 times, it will be trapped
[9] check what the interrupts are before ignore bp 5. There are about 10 interrupts handled.
    Mostly 209 and 177 (periodic and pit timer interrupts)
9:30AM
[10] record the last 10 interrupts. [20 min]
    {209, 177, 209, 209, 65, 14, 209, 14, 14, 14}
    {130, 209, 130, 130, 209, 65, 209, 14, 14, 14}
    {65, 177, 209, 209, 209, 209, 14, 14, 14, 14}
    {209, 209, 209, 177, 209, 14, 209, 14, 14, 14}
    {209, 177, 65, 209, 209, 209, 14, 14, 14, 14}
    {209, 209, 177, 209, 65, 209, 14, 14, 14, 14}
    {130, 209, 130, 130, 209, 65, 209, 14, 14, 14}
    Now the question is: may 209 trigger something strange that causes the segmentation fault?
[11] now the question: is the number of 209 interrupts fixed before the crash? [15 min]
    1st time: 36, 37, 36, 35, 47
    not fixed.
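The manual binary search over the breakpoint ignore count in [8] (100 -> 50 -> 25 -> 12 -> 6 -> 3) can be sketched as a plain bisection. This is only an illustration: the predicate below is a stand-in for "restore the snapshot, ignore the breakpoint N times, and see whether the crash has already happened", and every name in it is hypothetical.

```c
#include <assert.h>

/* Hypothetical predicate: returns nonzero if the failure has already occurred
 * once the breakpoint has been hit `hits` times. Must be monotone: once it
 * reports failure for some count, it does so for every larger count. */
typedef int (*already_failed_fn)(int hits, void *ctx);

/* Returns the smallest n in (lo, hi] for which failed(n) is true.
 * Precondition: failed(lo) is false and failed(hi) is true. */
static int first_failing_hit(int lo, int hi, already_failed_fn failed, void *ctx)
{
    assert(lo < hi);
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (failed(mid, ctx)) {
            hi = mid;   /* failure is at or before mid */
        } else {
            lo = mid;   /* still healthy after mid hits */
        }
    }
    return hi;
}

/* Demo predicate for experimenting: the run fails from the hit stored in ctx. */
static int fails_at_or_after(int hits, void *ctx)
{
    return hits >= *(const int *)ctx;
}
```

In a real session each probe is a fresh loadvm followed by gdb's `ignore <bpnum> <count>`, so the number of probes matters; bisection needs about log2(hi - lo) of them.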
[12] record the last 100 instructions and see what they are
    It seems that it's always 0x804df14a triggering the problem (which should be a part of the handler for 209).
    @EIP 0x804df148: length: (2): pop %fs
    @EIP 0x804df14a: length: (3): leal 0x54(%ebp), %esp
    @EIP 0x804e1f25: length: (7): movw $0x0000, 0x2(%esp)
    @EIP 0x804e1f25: length: (7): movw $0x0000, 0x2(%esp)
    @EIP 0x804e1f2c: length: (1): push %ebp
    @EIP 0x804e1f25: length: (7): movw $0x0000, 0x2(%esp)
    @EIP 0x804e1f25: length: (7): movw $0x0000, 0x2(%esp)
    @EIP 0x804e1f2c: length: (1): push %ebp
[13] set bp on 0x804df14a -> it's hit multiple times. Most likely the last hit is: >40.
    Adding a second condition env->cr[3]==0x62ce0000 guarantees the hit.
11:00AM
[14] trace into the bp on 0x804df14a [env->cr[3]==0x62ce0000] and trace it step by step in binary and see how the interrupt is thrown.
    cr3 0x62ce0000 is services.exe
    First of all, it seems that it's the 0x804df14d pop %edi causing the problem.
    A normal address is something like: 0xf800cdb4
    Debug:
    [1] bp on savevm.c:2311 (last line of loadvm)
    [2] bp on ops_sse.h:2473 (capture 0x804df14a)
    [3] bp on do_interrupt_all if intno==14

FOUND THE PROBLEM!!!! HA HA HA HA ...
Details: when i>2 [process idx in arrCR3 is greater than 2], it tries to read the process name/file path using
    target_ulong FilePath = cpu_ldl_data(env, pFilePath);
When the page is not there, it triggers an interrupt and fails if it is currently not in the right privilege mode.
So the page fault in helper_trace2 actually killed other processes (timer triggered swap); that explains why it's not switching?

================================ primary fix =====================
1. before loading memory with cpu_ldl_data, check the availability of the data in the page table.
   add a function cpu_ldl_data_safe(int &res), set res to -1 if it fails. [120 min]
   [1] cpu_ldl_data
   [2] cpu_ldu_code
================================other fixes===========================
1. in loadvm_state in savevm.c: initialize arrCR3, numCR3
2.
in helper_trace last part, if proc_state is DETERMINED, should skip the check of the process name
3:30PM 02/17/2014.

--------------------------------------------------------------------------------------
Task 211: fix helper_trace1
--------------------------------------------------------------------------------------
TO DOs:
================================ primary fix =====================
1. before loading memory with cpu_ldl_data, check the availability of the data in the page table.
   add a function cpu_ldl_data_safe(int &res), set res to -1 if it fails. [120 min]
   [1] cpu_ldl_data
   [2] cpu_ldu_code
================================other fixes===========================
1. in loadvm_state in savevm.c: initialize arrCR3, numCR3
2. in helper_trace last part, if proc_state is DETERMINED, should skip the check of the process name
==============================================================================
[1] study cpu_ldl_data and check what we can improve. [20 min]
    1. cpu_ldl_data is defined in softmmu_header.h:250 (include/exec/softmmu_header.h)
    2. it checks the softmmu table; if the address is in the table, it just loads from softmmu
    3. otherwise it calls helper_ldl_mmu() to load mmu (for the address)
    4. helper_ldl_mmu is defined in include/exec/softmmu_template.h:97
       it calls tlb_fill to perform the job, which then calls cpu_x86_handle_mmu_fault()
       --> checks page table (and permissions)
       --> if the page is loaded it calls tlb_set_page() to update the tlb with the page information (physical addr)
    Conclusion: it seems fine to reload the tlb; however, if there is a page fault, it's unexpected and the interrupt cannot return to the original code; that's no good.
[2] improve idea: [10 min]
    [a] before each cpu_ldl_data, if va_to_ha returns -1, skip the cpu_ldl_data
        --- potentially, if the page is never loaded, the procedure of capturing the process name could fail. Will check later.
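A self-contained sketch of the "check before load" idea in [2]: walk the guest page table by hand and refuse the load when the page is not present, so that no guest page fault is ever raised from inside the tracing helper. This is a toy model, not the project's va_to_ha or QEMU's API: guest RAM is a plain byte array, only 32-bit non-PAE paging with 4 KiB pages is handled (no 4 MiB pages), and all names are hypothetical. The result flag is passed as a pointer because the sketch is in C rather than C++.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   0x1000u
#define PAGE_OFFSET 0x0fffu
#define PTE_PRESENT 0x1u

/* Toy guest RAM: stand-in for the emulator's guest physical memory. */
struct guest_ram {
    uint8_t *base;
    uint32_t size;
};

/* Bounds-checked 32-bit read of guest physical memory (host byte order). */
static int phys_read32(const struct guest_ram *ram, uint32_t pa, uint32_t *out)
{
    if (pa > ram->size || ram->size - pa < 4) {
        return -1;
    }
    memcpy(out, ram->base + pa, 4);
    return 0;
}

/* Two-level walk: page directory -> page table -> page frame.
 * Returns 0 and sets *pa, or -1 if any level is not present. */
static int va_to_pa(const struct guest_ram *ram, uint32_t cr3, uint32_t va,
                    uint32_t *pa)
{
    uint32_t pde, pte;
    if (phys_read32(ram, (cr3 & ~PAGE_OFFSET) + (va >> 22) * 4, &pde) < 0 ||
        !(pde & PTE_PRESENT)) {
        return -1;
    }
    if (phys_read32(ram, (pde & ~PAGE_OFFSET) + ((va >> 12) & 0x3ffu) * 4,
                    &pte) < 0 || !(pte & PTE_PRESENT)) {
        return -1;
    }
    *pa = (pte & ~PAGE_OFFSET) | (va & PAGE_OFFSET);
    return 0;
}

/* Guarded load: sets *res to 0 on success, -1 if the data is not mapped.
 * A read that would cross a page boundary is refused rather than split. */
static uint32_t ldl_data_safe(const struct guest_ram *ram, uint32_t cr3,
                              uint32_t va, int *res)
{
    uint32_t pa, val = 0;
    assert(res != NULL);
    if ((va & PAGE_OFFSET) > PAGE_SIZE - 4 ||
        va_to_pa(ram, cr3, va, &pa) < 0 ||
        phys_read32(ram, pa, &val) < 0) {
        *res = -1;
        return 0;
    }
    *res = 0;
    return val;
}
```

The caller skips the process-name lookup whenever res comes back as -1 and retries on a later instruction, which matches the caveat above that capture can fail if the page is never loaded.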
[3] implementation steps:
    [1] repeat the error: (a) bp savevm.c:2311, (b) bp on do_interrupt_all if intno==14, (c) b helper_trace line ops_sse.h:2473 and display eip, capture 62ec00, display count209 (should be a value around 40 to 50)
    [2] before the problematic cpu_ldl_data add the check
    [3] insert the logic and see if we can save it from the crash.
        --> after the logic is inserted, interestingly, the check logic is only hit twice before the crash point.
        verified --> it does help. However, there are other crash points.
10:30AM 02/18/2014
[4] add all other check points [1 hr] Working!!!!!
11:20AM
[5] now use snap555 for the batch analysis and see what is going on.
    Problem: we have to wait a couple of seconds before the y:\ drive is loaded. Found that it's a problem of the snapshots.
[6] save a new snapshot named snap666
[7] try the new snap666. --> still a problem. Does not work. Seems we need to do a dir before it.
[8] add a dir task and see what's going on. suddenly it's very slow.
[9] apply the optimizations
    1. in loadvm_state in savevm.c: initialize arrCR3, numCR3
    2. in helper_trace last part, if proc_state is DETERMINED, should skip the check of the process name
[10] delete unused vms.
SUCCESS! 9:00AM 02/20

--------------------------------------------------------------------------------------
Task 212: test the system
--------------------------------------------------------------------------------------
[1] generate the trace and then run it [1 hr] OK.
[2] strangely IMM cannot debug the generated slice (which did not occur before). Check it later
[3] call IsDebuggerPresent and generate a new program.
    finding: IsDebuggerPresent only reports true when the process is really run under a debugger. When WinDbg is connected, it's not discovered.
[4] tried the other trick of checking PEB, also not working
[5] had to use INT 2D. Use the INT2D trick, service 1 (print debug string), print a NULL string. The EAX return value will be different. now works.
    copied to /home/samba/smbuser
[6] collect all the traces of checkdebug.exe.
    The problem is that the process is not captured!
8:45AM 02/21
[7] check why the process is not captured but the process terminate event is sent?
    [7.1] check when the process term event is sent? it's sent by seg_helper.c, sysenter.
    problem: the process is actually never captured. When send_evt performs erase cr3 in the setCR3 of the condition, the size is already 0, but it triggers the termination of the capture process.
[8] consider an algorithm fix for capturing the process.
    The ideal way would be to trigger the page fault and load the page.
    If in normal mode, cpu_ldl_memory --> tlb_fill -> set up interrupt ... -> in the next cpu cycle, it picks the hardware interrupt -> do_interrupt_all ---> set up the trap frame and the next EIP (basically to repeat it) --> OS routine loads the page --> redo the instruction.
    Now the problem is that if the instruction is not REPEATABLE it causes failure.
[9] design idea: first test if the page is in RAM; if not, check if the instruction is JMP or CALL. If it is, avoid loading the page (go to the next instruction); if not, set env->eip to the next one (so the current instruction would not have to be done twice).
10:00AM
[10] experiment: in one page-not-in-RAM case, load the page and examine what's the next EIP and check the instruction type. [20 min]
    $2 = 0x804df14a
    @EIP 0x804df14a: length: (3): leal 0x54(%ebp), %esp
    Then do_interrupt_all is hit, and next_eip is 0x804df148 (which is env->eip because it's not updated yet)
    next eip hit: 0x804e1f25
    Now if we reset env->eip would that solve the problem?
    Does not work, it will recursively trigger the page fault infinitely many times. the page fault handler itself is triggering page faults.
    maybe forbid loading memory when it's in the 0x80 range? does remove the failure bug.
    Strangely, none of the breakpoints is hit. The process is not captured
[11] set a BP on discovering new CR3.
    see if the CR3 is captured
    b ops_sse.h:2549
    It seems that the new process is 0xf32b000, but its proc_status is already set to 0 (NO)
    the problem is that the same cr3 occurred twice!
    set a watch point on proc_status[7] and arrCR3[7] and arrCR3[8]
    After setting the watch point, it becomes ok (strange). must be some timing problem?
11:20AM
[12] solve the problem of copy y:\se (l) first.
    [1] check how sendCommandToVM is accomplished. it calls handle_usr_command hmp_send_key
    8:00PM
    [2] prepare a large buffer before hmp_send_key and str concat it. [20 min]
        check what string is generated. Verified no problem with this function.
        Maybe introduced as a bug by the fix for 11?
    [3] attempt 3: try disabling the fix for [11] and the problem still occurs. Could not figure out why.
        fix [11] first.
        removed sleep statement, seems still 40-50% chance of getting it wrong.
8:30AM 02/22

--------------------------------------------------------------------------------------
Task 213: Fix the problem that process name is not captured.
--------------------------------------------------------------------------------------
[1] move b21.exe back and see if the problem persists. [20 min]
    it seems the problem persists. It seems that the copy problem never occurs.
    if we change the file name to checkdebug the problem shows up. It may have to do with the size of the file name?
    enlarge buffer -> still does not solve the problem. All readline bufs are declared with 4k, which should be sufficient. Strange.
    change the name to the same size and see if the problem persists. does not occur
    change checkdebugger.exe --> b1234567890.exe: problem occurs
    b212.exe --> repeated 8 times, problem never occurs
    b11112222333344445555.exe -> problem every time.
    CONFIRMED: a longer file name causes the problem. Must be some buffer overflow. will check it later.
[2] now concentrate on the process name not captured problem.
    [2.1] collect potential cr3 process ids:
        new 8th CR3 value: 0xf36b000 ---new 9th CR3 value: 0x5f33000
        new 8th CR3 value: 0xf32b000 0xf2ad000
        --new 8th CR3 value: 0xf36b000 ---new 9th CR3 value: 0x5f33000
        Never the same!
[3] Debug:
    [1] in all set to NO (with i>=8) set bp
    [2] in check file name (with i>=8 and eip<=0x800000) set bp
    The problem is that arrCR3 has a duplicated entry: winlog.exe 5f3300 is written into two neighboring slots.
    identified problem:
        /home/csc288/qemu/qemu-1.4.0/target-i386/translate.c:8215
        arrCR3[cr3Count++] = env->cr[3];
    Comment out this section of code!
    Found the problem and now it's working!
10:45AM
[4] collect the trace and do the experiment
    293 slices to generate
[5] problem: cannot get the file offset of 0x7c80xxxx. The problem is caused by the generated bridge, which overwrites 0x7c80xxxx (not in range).
    [5.1] generate the full dump first.
        for soc: tsStart = 146327, tsEnd = 146535, bridge: 146536
        it looks like the traces completely do not match each other.
        clean the full trace. error again.
        Problem: {tsStart = 146327, tsEnd = 146535, bModified = false, tsBridge = 146536, room = 8, tsNextStart = 14656
        ---------------- DATA BELOW ------------------------
        timeStamp: 146535, ins @403790: call [0x420208] read: (start: 0x420208, end: 0x42020b) write: (start: 0x12ff24, end: 0x12ff27) , ESP: 0x12ff28 -> 0x12ff24 , DEPLINKS: , R: 146534 and ESP value: 0x12ff24, M: 72984
        timeStamp: 146536, ins @7c80c6cf: mov edi, edi , DEPLINKS: , R: 146517 , C: 146535 ESP: 0x12ff24 EBP: 0x12ff84
        ---------------- DATA ABOVE------------------------
        the problem is that the CALL cannot be the end of an SOC. Check how the SOC is extended.
        --> still not solved. recompile it later.
9:00AM 02/23/2014
    [1] find out the slice point 159421
    [2] check how the soc end is identified as a call instruction.
        set a conditional bp at socmanager::identifySOC. does not capture it
    [3] try to set a watch point on sm.vecSOCs[8].tsEnd
        could not capture it.
    [4] search tsEnd in soc.
        Result: soc (146535, 146535) is first created as a single SOC; then it is merged with socNext (146327, 146534) at socmanager.cc:79.
    [5] check verifyBridge, set a conditional bp there.
        problem is in setBridgeTo: it returns true (which should actually be false)
        in setBridgeTo, if the tsEnd is a call instruction, then it should directly return false.
        The bridge itself is actually checked against call instructions in get_room().
    [6] fix setBridge [20 min]
        --> seems to be fixed, but only one soc. problem is that it's always one soc.
        maybe singleSOC should be expanded for a single ts that is a jump/call.
    [7] test: 293 branches: started: 11:13AM --> 8:45AM

--------------------------------------------------------------------------------------
Task 213: perform experiment
--------------------------------------------------------------------------------------
[1] collect and copy the branch slices. There are some infinite loop slices. Put them in problem_slices
    Problem files: 60/293
[2] check the good files. 223 lines.
[3] kernel debug mode.
9:00AM 02/25/2014
[4] compare. Yeah! it works!!!! branch 97 discovered the difference!
    23c223
    < slices\b212.exe\brc_97\b212.exe: 0x11220001^M
    ---
    > slices\b212.exe\brc_97\b212.exe: 0x22330001^M
[5] apply it to b1.exe and see if it works.
    Verified: no difference! DONE!
10:45AM 02/25/2014

--------------------------------------------------------------------------------------
Task 214: strengthen the time-out function of runproc.cc
--------------------------------------------------------------------------------------
[1] just change the time out value in milliseconds. Now changed to 3 seconds.
    solved. All timed-out runs get exit value 0x103.
11:30AM 02/25/2014

--------------------------------------------------------------------------------------
Task 215: Fix the long bin executable file name problem
--------------------------------------------------------------------------------------
[1] pick one .exe that is trapped in infinite loop.
    Example: branch_40
    Trapped in a loop close to 0x00409aa6 to 0x00409aaa (there is no update of the value).
    The problem is that 0x00409aaa is ONLY hit once and it passes.
    Problem: at 0x004012FC the stack pointer location is different.
    at 0x0040125D the call instruction did not actually balance the ESP.
    It did ADD ESP, 0x10 [which reverts the effect of the call instruction on ESP] (however, it ignores the fact that the previous 4 ESP instructions are ignored).
[2] regenerate branch 40 and get the corresponding slice eip. ts=163443.
    Not working. Need to set a conditional BP to capture it.
[3] verify if branch 40 is the problematic one.
[4] study why the error occurs:
    [1] where is the function that adjusts ESP?
        it is generated by CallAdjustRecord::asReplacement() in binWriter::writePartialTraceToFile.
    [2] if the function is skipped, and if the function has a replacement (not nops), then we'll need to add a dependency to the previous ESP writing instruction!
        In this case, we'll need to add a progESP function after the call of processFunction in Trace.cc
    [4] b binWriter.cc:433 is not hit!
8:45AM 02/26/2014
[5] try to generate an infinite loop branch.
    [0] generate the full trace.
    [1] set a conditional bp on CallAdjustRecord::asReplacement(), need to check that it is being called by the binWriter!
    [2] get the ts eip: [0x403843] ts: 141819
    [3] generate the trace in both mode 1 and 0. Verified the slice is not working.
        reason: at 0x403843, it's no go. CAPTURE SLICE DONE.
    [4] Fix idea: in the Trace class add a function propagateESPEBPLink when a car record is available.
9:00AM 02/27/2014
    [5] Implement propagateEspEbpLink - do it directly inside trace::full_slice
        [5.1] check if a car record exists for EIP; if not, directly return
              call Trace::isFunctionNoChangeOnESPEBP
        [5.2] if esp changes, find the latest esp writing instruction and add an ESP delay link to it
              call delayRegDependency
        [5.3] if ebp changes, find the latest ebp writing instruction and add an EBP delay link to it
        9:40AM
        [5.4] test/debug ts: 141819, eip: 0x403843
            problem: did not capture 0x403843; the problem is that the function itself is identified as having a dependency, and is included?
            How come 0x403843 is not included in the slice?
            [1] enable log and check the following timestamps:
                137803 and check how it's NOT included.
                137825 (the RET)
            Strangely: 137803 and 137802 are included in the slice. When writing, they are not in the slice.
            check when it's disabled, break on InstrInfo::setInSlice() and unmarkInSlice()
            Observation: tryAddCAR is called and used to add it; but finally it is identified as having a data dependency.
            Question: for every iteration, should we clear CallAdjustRecord?????
8:40AM 02/28/2014
[6] figure out whether timestamp 0x403843 is contained in the slice.
    [6.1] find out how many iterations [10 min]
        ts: 141819, eip: 0x403843 (id = 4)
    [6.2] verify if it is hit in the last iteration at all [10 min]
        set a conditional BP and see how it's hit. captured in pass3.
    [6.3] problem analysis:
        [1] car is added even if the call is identified to have a data dependency in its body, which is no good.
        [2] car is not cleared at each pass, which is no good.
9:20AM
[7] fix the problem.
    [7.1] move the place of the esp/ebp check. [5 min] DONE.
    [7.2] add CallAdjustRecord::clear() [5 min]
    [7.3] call CallAdjustRecord::clear at the beginning of every pass [10 min] DONE.
    [7.4] double check the code of propagating ESP/EBP [10 min] DONE
    [7.5] debug ts:141819 and binWriter.cc and check if it's hit again. [15 min]
        b Trace::gen_slice_for_branch
        set ts to 141819
        b binWriter.cc:412
[8] now new problem.
    the generated program crashed close to the termination point.
    [8.1] analyze the problem. It looks like it starts from 0x403d30 (the jump to the last section is not handled right).
    [8.2] generate it again. It seems the problem persists.
    [8.3] in binWriter.cc set a conditional bp on 0x403d30
        observation: it is writing the instruction and the next few instructions.
        Know that the file offset is from 12592 to 12602 (where it's messed up with other data or instructions).
        Observation: it writes 12592 (2 bytes), and then 12594 (1 byte); then this is the last SOC, and it closes fid.
        Then it continues to binWriter::writeProgramExit.
        Problem is asJMP. Fix the logic. fixed.
[8] generate 20 slices and test. There is one exception.
[9] generate 40 slices and test. Around 4 slices with problems: 27, 40, 51
    0x103, 0x25021 etc.
[10] generate the entire slice
    Bug: crash. Problem: REQ mode is set to 0. FIXED. It's in the Trace constructor.
8:30AM 03/01/2014
[11] generate the entire slices (293)
    Bug: found most of them still crash with code 0x103.
    It seems to be the problem of runproc.exe
    Found that it is still the problem of slicing.
[12] check slice 108. Compare with a correct slice. It's the stack problem again.
    find the earliest slice with problems.
    It seems that it's runproc.exe which drains the resources eventually. It did not release the resources of a process
10:00AM
[13] modify runproc.cc [30 min] DONE.
    It has greatly reduced the number of 103s. The earliest slice is 40.
10:30AM
[14] compare the log file again. verified. It successfully detects the error.
11:00AM

--------------------------------------------------------------------------------------
Task 216: Fix the long bin executable file name problem
--------------------------------------------------------------------------------------
[1] check how sendCommandToVM is accomplished.
    it calls handle_usr_command hmp_send_key
[2] enlarge the hmp_send_key buffer and see if it helps
    [2.1] repeat the error first (create a 30 char long name)
        total: 10 fail: 9
    [2.2] enlarge the buffer in hmp_send_key
        total: 5 fail: 1
    So enlarging the buffer of hmp_send_key does not work.
[3] study the logic of hmp_send_key [15 min]
    All chars are actually sent to hmp_send_key.
[4] analyze the workflow.
    main-loop.c -> handle_user_command_dummy -> BatchAnalyzer::take_and_exec_cmd_from_buffer() -> only called handle_user_command ONCE!
    How about using a loop?
    added 1s wait time before punching each key. does not work.
[5] get into the details of sending a keyboard input.
    qmp_send_key -> kbd_put_keycode -> qemu_put_kbd_event
    Check which part is missing
[6] in kbd_put_keycode re-enable the printf
    Translation from keycode to index is done by index_from_key
2:30PM
[7] add from keycode to key string
    (index_from_key) -> index (keycode_from_keyvalue) -> keycode -> kbd_put_keycode in ui_input.c
    in ui_input.c work on the following:
    [7.1] key_from_index
    [7.2] key_from_keycode
    [7.3] insert into kbd_input_key and see how it works.
    Conclusion: at the level of kbd_put_keycode it is still fine. It may be that the keyboard events are pressed too often and are missed by the windows kernel.
[8] add a magic number to slow down the entering of keycodes.
[9] test: total: 10 fail: 0
[10] regenerate all tests. ALL done.
9:30AM 03/03/2014

--------------------------------------------------------------------------------------
Task 217: find an experiment with a packer
--------------------------------------------------------------------------------------
[1] find packer. Themida
[2] problem: crashes QEMU, problem found in the code va_to_ha.
[3] problem area: include/exec/softmmu_header.h:176
        fprintf(stderr, "ERROR in I/O unalgined access. Count is %d\n", count); exit(8);
[4] read the va to ha logic: [20 min]
    It seems to be hit multiple times with no problems
[5] check the error address.
    It's 0x425ffe.
[6] set a conditional bp on 0x425ffe and check the logic.
    observation: 0x425ffe is first translated to an I/O address and then translated into a complete address.
    @EIP 0x425527: length: (5): push $0x000008BE
    @EIP 0x42552c: length: (3): movl %ebx, (%esp)
    @EIP 0x42552f: length: (3): popl (%edx,%eax)
    the popl instruction at 0x42552f triggers the helper_trace_mem function.
[7] check the meaning of TARGET_PAGE_MASK
    TARGET_PAGE_BITS is 12
    #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)   ==> 2^12 = 4k = 0x00001000
    #define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)   ==> 0xFFFFF000
    tlb_addr & ~TARGET_PAGE_MASK is equivalent to tlb_addr & 0x00000FFF (the low 12 bits, i.e. the flag bits below the page address)
    tlb_addr is retrieved from the softmmu table (given the page index)
    found: tlb_addr: 0x425010 (low bits 0x010 are nonzero).
[8] check the 1st part: page_idx: 0x25
    because va&(realsize-1) is 0, it jumps directly to the iotable access.
[9] check the 2nd part: same addr returned
[10] check the 3rd and 4th path
    now va is 0x426000 -> page index is 0x26 [different page now]
    tlb_addr: 0x426000 -> now since its & with ~TARGET_PAGE_MASK is 0, it's not treated as I/O.
    So the 3rd addr is mapped as a regular addr, and the 4th addr is mapped as a regular addr.
    That's why it crashes: it cannot hold 2.
    But actually the second address is consecutive, so we can simply increase the length by 1.
[11] improve the algorithm so that the length can be incremented by 1.
9:00AM 03/04/2014
    It runs and reports memRange out of range.
    Verified it's caused by the problem of detect_vm
9:30AM
[12] enlarge memRange range size. [20 min] DONE.
[13] test testvm.exe again. It takes a lot of time to run. The program timed out.
10:15AM
[15] running result: it throws a dialog saying that the program can only run on the computer on which it was protected.
[14] run testvm.exe without the analysis platform [20 min]
[15] try to figure out why the themida (testvm.exe) takes so much time to run.
    break on Cache::savetoDisk and Cache::saveCurrentBlockToDisk
    Problem: InstrStore size too small. Change it
    [15.1] test with job 1. OK.
    [15.2] run testvm. Still too slow. Try gprofiling and see what's going on.
[16] use gprof, add -pg to Makefile (rules.mak)
    Use command: gprof /usr/local/bin/qemu-system-i386 gmon.out > ana.txt
    [16.1] first, run it for b21.exe. after pg is linked, no end.
    [16.2] run the testvm.exe
          %   cumulative   self              self     total
         time   seconds   seconds    calls  ms/call  ms/call  name
        50.00      0.01     0.01      740     0.01     0.01  int128_2_64
        50.00      0.02     0.01                             aio_ctx_prepare
         0.00      0.02     0.00  5437623     0.00     0.00  update_proc_stats
        Note int128_2_64 and aio_ctx_prepare.
        Suspect that the gprof data is not accurate enough. It seems that gprof does not give an accurate report and it does not include the I/O time.
[16] remove the -pg profiler
8:30PM
[17] try callgrind (a part of valgrind) - generated callgrind_run.sh
[18] install kcachegrind
[19] run - it seems to be producing more accurate results. Let it run.
    Since it's too slow, use callgrind control to start recording in the middle.
    callgrind_control -i on
    Still waiting ... It fails at copying.
[20] check the build_page_map logic, when is it called? It's not very efficient.
    It is called at each ENTRANCE of sysenter. comment out the count and see if it makes things run faster.
    set a global counter and print how many times it is called, once every 1000 times.
    Verified: at a certain point, it starts to build the page table too often. Wait and see if it can finish the task.
[21] It breaks at the memRangemanager capacity 500.
    Strangely handle_phy_memory access is called at every helper_trace in tcg/i386/tcg-target.c:1254
    Need to study the logic of helper_trace_mem
10:00AM 03/06/2014

--------------------------------------------------------------------------------------
Task 218: double check the handling of physical memory
--------------------------------------------------------------------------------------
[1] set a breakpoint and study the logic
    b tcg-target.c:1255
    In the trace, there is actually bTracePhyMem protecting it.
[2] observation: (a) setPhyTraceMode is called in ops_sse.h: helper_trace2
                 (b) it's unset in disablePhyMemTrace in Trace.cc
[3] algorithm design for speeding it up.
    [1] check when bTracePhyMem is modified. get all of them
        trace.h: setPhyMemTraceMode
                 enablePhyMemTrace
                 disablePhyMemTrace
[4] Design: [40 min]
    [1] add bNeedsTracePhyMem as an integer to trace.h [5 min] DONE.
    [2] in BatchAnalyzer::execRawTrace set it to 0 [8 min] DONE
    [3] modify setPhyMemTraceMode and enablePhyMemTrace in Trace [5 min] DONE
    [4] modify disablePhyMemTrace [5 min] DONE
    [5] run and compare performance [10 min]
        check if it's ever disabled --> no significant improvement
[5] still crashes on the memory range limit.
9:00AM
[6] fix the memory range limit problem.
    [6.0] modify the program and record the cr3 of the trace, and the eip of the last instruction before entering syscall (change EIP mode). DONE.
        [1] in the Trace class, there is last_eip, cr3 [5 min]
    [6.1] repeat the problem twice and see if it is the same EIP. [25 min]
        Skipped. There seems to be one time that the program executed normally
    9:45AM
    [6.2] simply enlarge the memRangeManager size and unit test it [20 min]
        OK. no bug found
    10:45AM
    [6.5] for the timeout event of execRawTrace, try sendkey ctrl-c. [1 hr]
        [a] how to send key. DONE.
        [b] add a command sendkey to BatchAnalyzer [10 min] DONE
        [c] add an attribute Task:bTaskAnalyze to Task and set it to false by default; in taskAnalyze initialize to true.
[8 min] DONE [d] at line 1182, change the handling of taskAnalyze. send a control-c to command module [10 min] DONE [e] debug: [1] case loadVM [8 min] [2] exec taskAnalyze [10 min] --> there are bugs related to this. fix it later. address 3 first. [3] test the program testvm [15 min] --> full trace is never generated. The problem is that raw_trace is not there. Problem: full trace generation algorithm is hit; however, it is switched to some other thread in the qmeu emulator about i/o locking. 8:30AM 03/08/2014 [4] check if the PROCESS_TERMINATE signal is received. b BatchAnalyzer.cc:69, 1100, 1208 Verified, it's the timestamp problem. 9:10AM [5] add timeouts for different tasks (rawtrace, fulltrace, slicetrace and remove the original TASK_TIMEOUT). Then update the call of TASK_TIMEOUT to corresponding tasks. [20 min] Actually no need because fulltrace and branch slices have no timeouts. 9:20AM [6] verify if taskFullTrace and taskBranchSlice are hit. [10 min] b taskBranSlice::gen_branch_slice b taskFullTrace::gen_full_trace Found the problem, the reason is that in branch_slice, tsEIP is not found (hit) yet. Trace.cc [1153] 9:40AM [7] fix the problem, add exception handling here. [35 min] 10:15AM [8] handle segmentation in second batch. [15 min] found the problem->memory consumption too big. Still occasionally I/O thread lock up. Not sure what's the problem. 10:30AM [9] check when trace is seraillized to disk. taskSaveTraces::do_job, it's hit. So not likely the file locking. stop the vm in saveTraceJob. does not help. recompile. 10:45AM [10] still not working. vm wait no good. try another job. Memory problem is solved. 11:04AM [11] try notepad.exe The control-c method is not working. Notepad continues to run and never ends. 
11:00AM 03/11/2014
--------------------------------------------------------------------------------------
Task 219: improve the process termination
--------------------------------------------------------------------------------------
[1] investigate the task terminate command [15 min] DONE
Use taskkill /f /im notepad.exe
[2] design the process termination algorithm [25 min] DONE
  [1] determine the process name. Can collect it in taskAnalyze
  [2] check timeout. For each task there is a timeout event.
  [3] line 1201 of BatchAnalyzer.cc is where the job is terminated.
[3] implement it and test it [25 min]
  [4] shoot the command
  [5] find the process terminate event handling.
      BatchAnalyzer.cc:69
      BatchAnalyzer.cc:1226 add a conditional branch.
  [6] it seems that it never captures the process terminate event for notepad.exe.
      The system is running but notepad never shows up, and taskkill is also not working.
      After the first unrelated PROCESS TERMINATE EVENT is received, mouse_move does not work.
  [7] when stopvm is called -> never called. helper_trace is called.
[4] check b21.exe. It is working fine.
[5] check b21 and let it time out to see if it is ok. It seems to be working.
VERIFIED it is working. Break at BatchAnalyzer.cc:69, 1226, 1205
It seems that ctrl-c is pretty quick at terminating.
[6] run notepad.exe again and give it a long break.
It seems one of the svchost.exe processes terminates and then the entire system stops responding (helper_trace2 is still capturing).
Disable the sendkeyCommand killtask.
[7] It looks like NOTEPAD.exe freezes the system. Not sure why.
4:00PM
[8] check why NOTEPAD.exe freezes the system.
[8.1] break on qemu_run_all_timers when it freezes and check the timers of vm_clock.
It looks like rtc_update_timer does not look normal. Before the freeze its value is:
1115771634328 --> after --> 86400771630298
(it seems that it needs to be hit by GDB first, and then the 2nd time it is hit it turns out to be the big number).
But it went back to normal --> 1345165842103 The program prints ROCESS EXITS! DUMP THE TRACE b seg_helper.c:2345 The first PROCESS EXITS is 5dee000 5dee000 - csrss.exe (controls threading and windows console) Which is NO GOOD. Why is this process get killed? ------------------- Description of csrss.exe (from Internet microsoft.com) csrss.exe is is the user-mode portion of the Win32 subsystem; Win32.sys is the kernel-mode portion. Csrss stands for Client/Server Run-Time Subsystem, and is an essential subsystem that must be running at all times. Csrss is responsible for console windows, creating and/or deleting threads, and implementing some portions of the 16-bit virtual MS-DOS environment. http://www.neuber.com/taskmanager/process/csrss.exe.html ------------------------- Need to check out the last moment of csrss.exe! [9] declare append_eip, remove_eip, and print_queue in handle.h and also handle.cc. [20 min] 0x804dbdbd 0x804dbdc1 0x804dbdc4 0x804dbdca 0x804dbdcc 0x804dbdd2 0x804dbdd8 0x804dbddb 0x804dbeb9 0x804dbebb 0x804dbec0 seg_helper.c:2345 before sysenter, it's 0x75b44df9 0x75b44dfb 0x75b44dfd 0x75b44e00 0x75b44e03 0x75b44e06 0x75b44e08 dump below: EIP 0x75b44ded: length: (3): andl $0x00, (%ecx) @EIP 0x75b44df0: length: (7): leal 0x75B480E0(,%eax,8), %edx @EIP 0x75b44df7: length: (2): movl (%edx), %ecx @EIP 0x75b44df9: length: (2): cmp %edx, %ecx @EIP 0x75b44dfb: length: (2): jz 0x0000001A @EIP 0x75b44dfd: length: (3): movl 0x4(%esi), %edi @EIP 0x75b44e00: length: (3): leal -0x10(%ecx), %eax @EIP 0x75b44e03: length: (3): cmpl %edi, 0x1C(%eax) @EIP 0x75b44e06: length: (2): jnz 0x00000009 @EIP 0x75b44e08: length: (3): movl 0x18(%eax), %ebx It's hit many times. Check the above again tomorrow. 
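The "last N instructions" queue from step [9] can be sketched as a fixed-size ring buffer. The names append_eip, remove_eip and print_queue come from the log; everything else here (the `EipQueue` struct, the signatures, the capacity of 10, `eip_queue_init` and `eip_queue_at`) is an assumption made for illustration and is not the code in handle.h / handle.cc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define EIP_QUEUE_CAP 10u   /* assumed capacity: keep the last 10 EIPs */

typedef struct {
    uint32_t eips[EIP_QUEUE_CAP];
    unsigned head;    /* index of the oldest entry */
    unsigned count;   /* number of valid entries */
} EipQueue;

void eip_queue_init(EipQueue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Append an EIP; once the queue is full the oldest entry is overwritten. */
void append_eip(EipQueue *q, uint32_t eip)
{
    assert(q != NULL);
    unsigned tail = (q->head + q->count) % EIP_QUEUE_CAP;
    q->eips[tail] = eip;
    if (q->count < EIP_QUEUE_CAP) {
        q->count++;
    } else {
        q->head = (q->head + 1) % EIP_QUEUE_CAP;
    }
}

/* Remove the oldest EIP into *out. Returns 1 on success, 0 if empty. */
int remove_eip(EipQueue *q, uint32_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return 0;
    }
    *out = q->eips[q->head];
    q->head = (q->head + 1) % EIP_QUEUE_CAP;
    q->count--;
    return 1;
}

/* Return the i-th entry, where 0 is the oldest. */
uint32_t eip_queue_at(const EipQueue *q, unsigned i)
{
    assert(q != NULL && i < q->count);
    return q->eips[(q->head + i) % EIP_QUEUE_CAP];
}

/* Print the queue from oldest to newest, one address per line. */
void print_queue(const EipQueue *q, FILE *out)
{
    for (unsigned i = 0; i < q->count; i++) {
        fprintf(out, "0x%08x\n", (unsigned)eip_queue_at(q, i));
    }
}
```

Calling append_eip on every traced instruction of the process of interest and print_queue at the PROCESS EXITS breakpoint would give a dump like the address lists above, at constant memory cost.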
9:00AM -------------------------------------------------------------------------------------- Task 220: improve the process termination -------------------------------------------------------------------------------------- [1] check the handling of process termination [10 min] sysenter, EAX requeest 0x101 is to terminate process. does other registers impact the semantics such as ECX or EBX? gdb) p/x env->ECX_BEFORE_SYSENTER $5 = 0x69ffd4 (gdb) p/x env->EDX_BEFORE_SYSENTER $6 = 0x69fed0 Strangely, append_eip is never called but still able to dump. [2] re-check the last 10 instructions before process termination of csrss.exe (5dee000) [15 min] 0x75b448dc 0x75b448dd ** lcall 7c9010ed 0x7c9010ed 0x7c9010f1 0x7c9010f3 0x7c9010f6 0x7c9010f8 0x7c9010fb 0x7c9010ff 0x7c901101 ** return 0x75b448e3 ** cmp $0x07, %si 0x75b448e7 0x75b448e9 0x75b448ef 0x75b448f2 0x75b448f5 ** lcall 0x7c90e88e mov $0x00000101, %eax 0x7c90e893 mov $0x7FFE0300, %edx 0x7c90e898 lcall *(%edx) 0x7c90eb8b 0x7c90eb8d This must be sysQuickCall [3] check if any of thehese has been hit. Verified: 0x75b448dd is actually ONLY hit once, now the problem boils down to what is the unique path that leads to the problem. check the prevoius 100 instr. ... 0x804df184 sysexit 0x7c90eb94 ret 0x7c90e384 ret 0x10 0x75b446be test %eax .. 0x75b4470a ** lcall will be called multiple times 0x7c901005 * always hit after the above lcall??? ... 0x7c90102b ret ** hit many times 0x75b44710 0x75b44fe1 ** hit many times! ... 0x75b448dc ** hit ONCE 0x75b448dd ** only HIT ONCE! 0x7c9010ed ... 0x7c9010ff 0x7c901101 0x75b448e3 [4] from the above it seems that 0x75b448dc's problem. 
dump @EIP 0x75b448d0: length: (2): jnz 0xFFFFFFEB @EIP 0x75b448d2: length: (1): cmpsb %ds:(%esi), %es:(%edi) @EIP 0x75b448d3: length: (2): addb %al, (%eax) @EIP 0x75b448d5: length: (3): addb %dl, -0x18(%edi) @EIP 0x75b448d8: length: (1): stc @EIP 0x75b448d9: length: (1): push %es @EIP 0x75b448da: length: (2): addb %al, (%eax) *** @EIP 0x75b448dc: length: (1): push %ebx It seems that it's 0x75b448da cuts in some interrupt and causes the problem. Problem: 75b448da is never hit! (VERIFIED) strangely it is never hit. check without the condition on CR3. (still not working) verified: strangely the previous instruction 0x75b448da is never hit. [5] conjecture: someone prepared the stack so that it jumps to the path to terminate the process. check 0x75b44fe1 , what are usually the next addr. 0x75b449a2 0x75b448a4 It seems to be alternativing between these two. The problem is who is pushing these addresses to the stack? Need to set a conditional BP and check the save wordsA 8:30AM 03/14/2014 [6] add value capture code snippet to line 1287 of tcg/i386/tcg-target.c (helper_trace_mem) and see who is pushing value 0x75b448dc to the stack? [30 min] [6.1] is helper_trace_mem done before or after the mem operation? It seems to be before the real operation. [6.2] then we should block on read instruction first, get what is the stack address that 0x75b448dc is read from the stack. DOES NOT WORK. it crashes program. [6.3] in ops_sse.h helper_trace2, in the next instruction (0x75b448dc), read the esp value from env->ESP_VAL_BEFORE. value is 0x69fee0, 0x52fee0, 0x101fee0 not stable, but 0x69fee0 is the most frequent. (especially the first time GDB is run) 9:30AM [6.4] 6.3 does not work. Find out where is cpu_stl_code [15 min] it's defined in target-i386/soft_mmu.h:306 9:50AM [6.5] set a conditional BP there and see who is saving 0x75b448dc [15 min] include/exec/softmmu_header.h DOES NOT WORK. because cpu_stl_xxx might not even be called! 
10:30AM
[6.6] study the logic of where helper_trace_mem is called again! [30 min]
tcg_out_trace_mem is called in tcg_out_qemu_ld and tcg_out_qemu_st, which are always called for memory reads and writes.
It basically generates a branch using three calls: tcg_out_tlb_load, tcg_out_qemu_st_direct, add_qemu_ldst_label.
tcg_out_tlb_load: it calculates the address, tests some bits in the softmmu table entry, and based on that generates two branches. The first branch will be load_direct
tcg_out_qemu_st_direct:
add_qemu_ldst_label: creates a label that will be processed later to be wired.
11:00AM
[6.7] modifying tcg to create a function like helper_trace_mem could be too costly.
Instead, set a bp on helper_trace_mem when the addr being written is 0x69fee0 or 0x52fee0 or 0x101fee0
Verified, it's hit too many times!!!!
[6.8] check tcg_out_qemu_st_direct
For a 4-byte operation, it directly calls tcg_out_modrm_offset: it just generates a 2-byte instruction (pseudo), which does not look like a direct access.
Strangely, it is generating a single MOV instruction.
--- try to understand the logic completely.
[6.9] browse softmmu_header.h 8:30AM 03/15/2014 [6.10] read tcg_out_tlb_load (tcg-target.c): [30 min] the following is the code generated, see comments 0xb5134b8d <code_gen_buffer+2957>: call 0x834104a <helper_trace_mem> #addr to write is 0x6ffc 0xb5134b92 <code_gen_buffer+2962>: add $0x10,%esp 0xb5134b95 <code_gen_buffer+2965>: pop %edx 0xb5134b96 <code_gen_buffer+2966>: pop %ecx 0xb5134b97 <code_gen_buffer+2967>: pop %eax 0xb5134b98 <code_gen_buffer+2968>: mov %ecx,%eax #copy addr into %eax #ecx is the address %6ffc 0xb5134b9a <code_gen_buffer+2970>: mov %ecx,%edx #copy addr into %edx 0xb5134b9c <code_gen_buffer+2972>: shr $0x8,%eax 0xb5134b9f <code_gen_buffer+2975>: and $0xfffff003,%edx 0xb5134ba5 <code_gen_buffer+2981>: and $0xff0,%eax 0xb5134bab <code_gen_buffer+2987>: lea 0x360(%ebp,%eax,1),%eax #load MMU entry 0xb5134bb2 <code_gen_buffer+2994>: cmp (%eax),%edx #see if entry matches 0xb5134bb4 <code_gen_buffer+2996>: mov %ecx,%edx 0xb5134bb6 <code_gen_buffer+2998>: jne 0xb5134bbc <code_gen_buffer+3004> #jump to slow path 0xb5134bbc <code_gen_buffer+3004>: add 0x8(%eax),%edx #store the offset to softmmu *** Note the instruction label_ptr[0] = s->code_ptr; recorded 0xb5134bba (which is the target of the address of branch instruction), it will be overwritten later. *** offsetof(CPUArchState, tlb_table[mem_index][0]) is the way to access global variable. *** It looks like the %edx contains the actual address. 
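The address arithmetic in the generated code above (shr $0x8, and $0xff0, and $0xfffff003, lea 0x360(...)) can be reproduced in plain C to check the annotations. This is a sketch that only mirrors the constants visible in the disassembly; the function names are hypothetical, and 0x360 and the 16-byte entry size are specific to this build, not general QEMU constants.

```c
#include <assert.h>
#include <stdint.h>

/* Constants read off the disassembly above (assumptions for this build). */
#define GUEST_PAGE_BITS 12u      /* 4 KB guest pages                        */
#define TLB_ENTRY_BITS  4u       /* 16-byte entries: shr $0x8 is 12 - 4     */
#define TLB_INDEX_MASK  0xff0u   /* and $0xff0,%eax: 256 entries * 16 bytes */
#define TLB_FIELD_OFF   0x360u   /* lea 0x360(%ebp,%eax,1),%eax             */

/* Byte offset, relative to the env pointer in %ebp, of the compared field. */
uint32_t tlb_entry_offset(uint32_t addr)
{
    return TLB_FIELD_OFF +
           ((addr >> (GUEST_PAGE_BITS - TLB_ENTRY_BITS)) & TLB_INDEX_MASK);
}

/*
 * Value compared against the TLB entry: the page number plus the low
 * alignment bits. For a 4-byte access the mask is 0xfffff003, so an
 * unaligned address keeps non-zero low bits and cannot match the entry.
 */
uint32_t tlb_compare_value(uint32_t addr, uint32_t access_size)
{
    assert(access_size != 0 && (access_size & (access_size - 1u)) == 0);
    uint32_t page_mask = ~((1u << GUEST_PAGE_BITS) - 1u);
    return addr & (page_mask | (access_size - 1u));
}
```

For the address 0x6ffc from the annotation, tlb_entry_offset gives 0x3c0 (entry index 6) and tlb_compare_value(0x6ffc, 4) gives 0x6000. If that equals the value stored in the entry, the fast path is taken and the direct MOV executes; otherwise the jne goes to the slow path.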
See below tcg_out_mov(s, type, r1, addrlo); //at the beginning here it moves addrlo (which is from a dynamically allocated register) [%ecx], into r1 (which is %edx) tcg_out_qemu_st_directly just generate One instruction: 0xb5134bbf <code_gen_buffer+3007>: mov %edi,(%edx) #this is to perform the save into MMU *** here tcg_out_qemu_st_direct(s, data_reg, data_reg2, TCG_REG_L1, 0, 0, opc); data_reg is the register that contains the 32-bit data, (i.e., it's %edi) TCG_REG_L1 is the register that contains the target address (i.e., it's %edx) - the target address is actually the address of the softmmu entry. 9:30AM * hardware breakpoint to figure out the relationship between addresses, see annotations above (from helper_trace_mem) 10:00AM Implementation steps: [1] in CPUArchState (cpu.h) add last_write_val [10 min] DONE. [2] add a move instruction that writes into env->last_write_val, simulate env->ESP_VAL_BEFORE [60 min] tcg_gen_st_tl(cpu_T[0], cpu_env, offsetof(CPUX86State, ESP_VAL_BEFORE) initial implementation does not work. Modify from: tcg_out_modrm_sib_offset(s, OPC_LEA + P_REXW, //opcode r0, //register --> destination register TCG_AREG0, //rm --> ebp->points to actually the CPUArchState structure's base r0, //index // contains the corresponding MMU entry index 0x360[ebp+eax+1] -> eax //it is the eax here. 0, // shift offsetof(CPUArchState, tlb_table[mem_index][0]) //offset ); It generates the following 0x360 must be the result of offset 0xb5134bab <code_gen_buffer+2987>: lea 0x360(%ebp,%eax,1),%eax #load MMU entry 10:50AM break into tcg_out_qemu_stl and observe the values: Effort 2: in tcg_out_qemu_st_direct: [1] push r0. DONE [2] set r0 to 0. DONE [3] then call tcg_out_modrm_sib_offset to set r0 [4] generate the save code [3] debug into tcg_out_qemu_st_direct [15 min] Checked the code generated but could not debug into it, Seems to be ok and did not crash app. 11:45AM [4] now check who's pushing 0x75b448dc as the return address. 
@EIP 0x75b448dc: length: (1): push %ebx Now it's hit! the eip is: @EIP 0x75b448d7: length: (5): lcall 0x000006FE @EIP 0x75b448dc: length: (1): push %ebx Shoot! wasted one days' effort! It's the problem of dumping at a wrong offset! Now break on 0x75b448d7 and see the last 100 instructions. 0x804df12a ... 0x804df184 0x7c90eb94 0x7c90e384 0x75b446be //hit many times. .. 0x75b4470a 0x7c901005 //hit many times ... 0x7c90102b 0x75b44710 //hit many times .. 0x75b448d6 //hit once [8] now use binary search to handle the following: 0x75b44710 //hit multiple times #leal -0x100(%ebp), %eax 0x75b44716 #push eax 0x75b44717 #leal 0x20(%ebp), eax 0x75b4471a #push eax 0x75b4471b #lcall 0x000006b8 0x75b44dd3-- new func 0x75b44dd5 # push %ebp 0x75b44dd6 #mov %esp, %ebp 0x75b44dd8 #movl 0x8(%ebp), %ecx 0x75b44ddb #push %ebx 0x75b44ddc #push %esi 0x75b44ddd #movl 0xc(%ebp), %esi 0x75b44de0 #movl 0x4(%esi), eax 0x75b44de3 #and 0x00FF, %eax 0x75b44de8 #test %ecx, %ecx 0x75b44dea #push %edi 0x75b44deb #jz 0x5 0x75b44ded #andl 0x0, (%ecx) 0x75b44df0 #leal 0x75b480e4, eax, 8, %edx 0x75b44df7 # movl (%dcx), %ecx 0x75b44df9 # cmp %edx, %ecx 0x75b44dfb # jz 0x1a 0x75b44dfd # movl 0x4(esi), %edi 0x75b44e00 # leal -0x10(%ecx), Teax 0x75b44e03 #cmpl %edi, 0x1c(%eax) 0x75b44e06 #jnz %0x000009 0x75b44e08 #movl 0x18(%eax), %ebp 0x75b44e0b #cmpl (%esi), %ebx 0x75b44e0d #jz 0x11 0x75b44e1e #movl 0x8(%ebp), %ecx 0x75b44e21 #test %ecx, %ecx 0x75b44e23 #jz 0xF4 0x75b44e25 //multiple times #movl 0x20(%eax), %edx 0x75b44e28 #movl %edx, (%ecx) 0x75b44e2a #jmp 0xFFFFFFED 0x75b44e17 #pop %edi 0x75b44e18 #pop %esi 0x75b44e19 #pop %esp 0x75b44e1a #pop %ebp 0x75b44e1b #ret 0x0008 0x75b44720 //multiple times # mov %eax, %edi 0x75b44722 # test %edi, %edi 0x75b44724 // mutiple times #jnz 0x155 0x75b44879 // multiple times # cmp 0x1, %esi ***** 0x75b4487d // multiple tims #jz 0x11f (jmp not taken) **** 0x75b44883 //hit once #cmp 0x6, %si 0x75b44887 //hit once #jnz 0x4f (jmp taken) 0x75b448d6 //hit once # //next 
job: try to find out the meaning of the code
8:00AM 03/16/2014
[5] try to figure out the logic at 0x75b44879, print memory first
0x75b44879: 66 83 fe 01 0f 84 19 01 00 00 66 83 fe 06 75 4d
Use windbg command:
s -b 80000000 Lf0000000 66 83 fe 01 0f 84 19 01 00 00 66 83 fe 06 75 4d
s -b b0000000 L40000000 66 83 fe 01 0f 84 19 01 00 00 66 83 fe 06 75 4d
Did not find it.
Then search for 75b44722
s -b 80000000 L10000000 85 ff 0f 85 4f 01 00 00 53
Then search for 0x75b44e17: too many
Then search for 75b44e1e
8:30AM 03/18/2014
[6] search for 0x75b44710
10:30AM
Tried all combinations; strangely, none worked.
[7] another attempt: in WinDbg type: !process 0 0
find csrss.exe
Then attach to the process using the .process command and search again.
Does not work.
[8] read the help file of the "s" command. Found that unless "L?" is used, the search range will not exceed 256MB, which is 0x10000000.
[9] To search the full address range:
s 0x00000000 L?0xffffffff 66 8e fe 06
[10] try the process limit again. This time use the command
.process /i proc_id
The "/i" forces the target to run and break in the specified process.
Now use
s 0x00000000 L?0xffffffff 66 8e fe 06
to search the entire address space.
!!!! *****************************************************************************
Found a similar instruction to cmp 0x6, %si (0x75b44883), located at address 0x75b4474c!!!!
However, it is not listed by the lm command!
Use !address 0x75b4474c to find it out. Did not find any useful information.
*** .reload /f /v (force immediate load; now all the modules are loaded)
******* IMPORTANT LESSON *******
.reload /f /v
s 0x00000000 L?0xffffffff 66 8e fe 06
!!!! *****************************************************************************
[11] search for the instruction at 0x75b44710; it is also located at address 0x75b44710 in windbg.
[12] Now use WinDbg to study the code again.
0x75b44710 //hit multiple times #leal -0x100(%ebp), %eax 0x75b44716 #push eax 0x75b44717 #leal 0x20(%ebp), eax 0x75b4471a #push eax 0x75b4471b #lcall 0x000006b8 #//CsrLocateThreadByClientID 0x75b44dd3-- new func 0x75b44dd5 # push %ebp 0x75b44dd6 #mov %esp, %ebp 0x75b44dd8 #movl 0x8(%ebp), %ecx 0x75b44ddb #push %ebx 0x75b44ddc #push %esi 0x75b44ddd #movl 0xc(%ebp), %esi 0x75b44de0 #movl 0x4(%esi), eax 0x75b44de3 #and 0x00FF, %eax 0x75b44de8 #test %ecx, %ecx 0x75b44dea #push %edi 0x75b44deb #jz 0x5 0x75b44ded #andl 0x0, (%ecx) 0x75b44df0 #leal 0x75b480e4, eax, 8, %edx 0x75b44df7 # movl (%dcx), %ecx 0x75b44df9 # cmp %edx, %ecx 0x75b44dfb # jz 0x1a 0x75b44dfd # movl 0x4(esi), %edi 0x75b44e00 # leal -0x10(%ecx), Teax 0x75b44e03 #cmpl %edi, 0x1c(%eax) 0x75b44e06 #jnz %0x000009 0x75b44e08 #movl 0x18(%eax), %ebp 0x75b44e0b #cmpl (%esi), %ebx 0x75b44e0d #jz 0x11 0x75b44e1e #movl 0x8(%ebp), %ecx 0x75b44e21 #test %ecx, %ecx 0x75b44e23 #jz 0xF4 0x75b44e25 //multiple times #movl 0x20(%eax), %edx 0x75b44e28 #movl %edx, (%ecx) 0x75b44e2a #jmp 0xFFFFFFED 0x75b44e17 #pop %edi 0x75b44e18 #pop %esi 0x75b44e19 #pop %esp 0x75b44e1a #pop %ebp 0x75b44e1b #ret 0x0008 ----------------------- # return from CsrLocateThreadByClientID 0x75b44720 //multiple times # mov %eax, %edi # esi value always remains unchanged, mostly 1 #sometimes 6 0x75b44722 # test %edi, %edi #edi is something like 0x171938 #looks like thread id #type: PCSR_THREAD 0x75b44724 // mutiple times #jnz 0x155 (jmp) //# in Wdbg it is to jump 0x263 bytes away 0x75b44879 // multiple times # cmp 0x1, %esi ***** // 0x75b4487d // multiple tims #jz 0x11f (jmp not taken) **** 0x75b44883 //hit once #cmp 0x6, %si 0x75b44887 //hit once #jnz 0x4f (jmp taken) 0x75b448d6 //hit once # // CONCLUSION ------------------- [1] entire function is CSRSRV!CsrApiRequestThread !!! [2] so 0x75b44710 corresponds to the following !!! the if branch!!! //the following is from reactOS. 
CsrThread = CsrLocateThreadByClientId(&CsrProcess,
                                      &ReceiveMsg.Header.ClientId);   #esi must be the LPCMessage
/* Did we find a thread? */
if (!CsrThread)     //0x75b44724
{
    /* This wasn't a CSR Thread, release lock */
    CsrReleaseProcessLock();
    /* If this was an exception, handle it */
    if (MessageType == LPC_EXCEPTION)
    ...
}
//--> 0x75b44879
if (MessageType != LPC_REQUEST)     //LPC_REQUEST is defined as 1, cmp 0x1, esi
{
    //--> 0x75b4487d
    /* It's not an API, check if the client died */
    if (MessageType == LPC_CLIENT_DIED)     //LPC_CLIENT_DIED is 6 // 0x75b4488e cmp 0x6, %si
    {
        /* Now we reply to the dying client */
        ReplyPort = CsrThread->Process->ClientPort;
        /* Reference the thread */
        CsrLockedReferenceThread(CsrThread);
        /* Destroy the thread in the API Message */
        CsrDestroyThread(&ReceiveMsg.Header.ClientId);
        /* Check if the thread was actually ourselves */
        if (CsrProcess->ThreadCount == 1)
        {
            /* Kill the process manually here */
            ****** CsrDestroyProcess(&CsrThread->ClientId, 0);
            ***** So here, it kills the CsrThread *****
        }
        /* Remove our extra reference */
        CsrLockedDereferenceThread(CsrThread);
12:30PM
[8] Study the logic of
NTSTATUS NTAPI CsrApiRequestThread ( IN PVOID Parameter )
It seems to have some relation with timeout. The function itself is responsible for receiving requests from user threads and handling them. It basically serves the client requests (API calls).
There is a branch for the case where the message is LPC_CLIENT_DIED: it checks if this is the only thread of the process and, if so, destroys the process.
The question for us is: why does CSRSS.exe kill itself?
8:45AM 03/19/2014
[9] break on the branch to CsrDestroyProcess and check signal 6.
[1] qemu side: b ops_sse.h:2480 [2] windbg side: !process 0 0 .process /i the PROCESS_ID of csrss.exe g (to run to the process) .reload /f /v (reload symbols force) ba e1 0x75b448d6 (this is where check thread count and terminate process will go o) Observation: 0x75b448d6 is NEVER invoked in windbt, instead 0x75b44883 is hit many times It's si = 7 (LPC_EXCEPTION) that triggers 0x75b448d6. So there could be something wrong initiated from notepad.exe and caused csrss.exe to kill itself. The pseudo code from reactos is listed below //----------------------------------------------------------------- if (MessageType == LPC_EXCEPTION) { /* Kill the process */ //***** NOTE HERE, it's terminating the CSR PROCESS!!!! ****** NtTerminateProcess(CsrProcess->ProcessHandle, STATUS_ABANDONED); /* Destroy it from CSR */ CsrDestroyProcess(&ReceiveMsg.Header.ClientId, STATUS_ABANDONED); /* Return a Debug Message */ DebugMessage = (PDBGKM_MSG)&ReceiveMsg; DebugMessage->ReturnedStatus = DBG_CONTINUE; ReplyMsg = &ReceiveMsg; ReplyPort = CsrApiPort; /* Remove our extra reference */ CsrDereferenceThread(CsrThread); } //------------------------------------------------------------------------- 10:00AM [10] Modify the system so that we can trace the last 100 instructions of notepad.exe. [10.1] check if the pid (cr3) is the same for trace: f3a5000, f45e000 It looks like that the addr is always 0x0fxxxxxx [10.2] change the appending of instruction [10.3] observation: 0xbf8e4e6d .... 0xbf8e4e6b after csrss.exe died, the notepad.exe is still running. [10.4] set a breakpoint at the deadth of csrss.exe and see what's the last instruction of notepad.exe (see whether they are triggering anything). observation: 0x804dbc61 ... 0x804dbf60 (looks like context switch) Running the second time returns the last last array of instructions. Not enough for making a prediction. 10:45AM [10.5] enlarge the array of instructions to 2000 and then make a dump. 
[15 min] Still in kernel mode 11:00AM [10.6] modify the adding instruction mode, do not add 0x80xxx instruction. [15 min] observation: first run: 0xf849ac6d 0xf849ac6e 0xf849ac6f 0xf849ac70 --- 0xf81a37a4 0xf81a37a5 0xf81a33ec 0xf81a33ed --- 0xf812d413 0xf812d414 0xf812d415 0xf812d416 Every time it's different. [10.7] try to remove 0xf8 range and see what's the last instruction. 0x7c9011ab 0x7c9011ac 0x7c9011ad --- 2nd time same. 3rd time also same. [10.8] Study the code in VM. 0x7c80a68d 0x7c80a68f 0x7c80a690 0x7c80a691 0x7745aad8 #jmp COMCTL.7745aadd 0x7745aadd #pop ebp 0x7745aade #ret 4 0x773d40e6 0x773d40e9 0x773d40ea #jmp 773d40ee 0x773d40ee 0x773d40ef #pop 0x773d40f0 0x773d40f1 #ret 4 0x773d4255 0x773d4257 0x773d4258 0x773d4259 #ret 4 0x773d42d4 0x773d42e1 0x773d42e2 #ret c 0x7c9011a7 # mov %esi->%esp 0x7c9011a9 0x7c9011aa 0x7c9011ab 0x7c9011ac 0x7c9011ad #ret 10. SHOULD RETURN TO 0x7c91CBAB (or maybe others) Note program entry: 0x0100739D. check if it's hit [10.9] set a conditional bp ON 0X7C9011AD and see what's the next. it's hit 10 times (1st run), which is significantly more than expected verified it's always hit 10 times. The next one is 0x804e1f25 (looks like an illegal addr complaint) 7:50PM 03/19/2014 [10.10] break on the last conditional bp and check if there are any interrupts b on ops_sse.h:2482 and ignore it 9 times then when it's hit, break on helper_trace2 break on helper_trace_mem break on helper_raise_interrupt break on helper_raise_exception --> helper_trace_mem is hit first, access: 0x773d1c2c (read) Then helper_trace2 is hit (eip: 0x804e1f25) --> why no raise_interrupt It did not hit ldl_mem and directly goes to cpu_x86_exec [ meand MMU hit] check the other hits. It seems that the env->ESP_VAL_BEFORE suddenly jumps from 0x7fa14 to 0x773d1c2c. Then it invokes cpu_loop_exit, which tricks a long jump and we are not able to trace Check who is setting the ESP to 0x773d1c2c. So it's 7c9011a9. --> mov %esi -> %esp causing problem. 
Maybe it's the ESP value problem. check later. 10:00AM 03/20/2014 [11] continue to figure out the problem [11.1] analyze the exact location of the error [15 min] verify 3 times. ___th hit of 0x7c9011ad (10th) __ th hit of 0x7c80a68d (1st) [11.2] check the ESP is the same [1 hr] Starting from 0x7c80a68d and check/display ESP value of each instructioni [20 min] It always context switch in between Context Switch happens at the jmp instruction at 0x7745aad8 (jmp 5) set a conditional bp at 0x7745aadd (too slow change code) and similar for all context switches Verified: upto 0x7c9011a7 the ESP value is always correct. It's the value of ESI (which assigned to esp) causes problem. esi is from: [11.3] repeat the error, conditional break on 0x7c9011a7 __ th hit causes the problem. (10th time) [11.4] check ESP value for each all normal before 10th [11.5] check who's changing ESP_VAL_BEFORE [1] ignore bp on 0x7c9011a7 9 times [2] then set breakpoints [3] it is verified that ESP is indeed changed (before calling of helper_trace2). [11.6] check where is the esi value from It's set by 0x773D40E9 reading from 7F9C0 (result: 7FA04) **** TO DO **** break pon 0x773D40E9 and watch the content being written. 9:00AM 03/21/2014 [12] analyze the memory saved by 0x773D40E9 [45 min] set bp on 0x773D40E9 and trace into the helper_trace_mem and see what is the contents saved. [1] figure how many times of its call to reach error ONLY hit one time [2] check out the value It is reading 0x7f9c0 to esi. verified it is reading 0x773d1c1c. So the POP %esi instruction has no problem. The question is who is writing to 0x7f9c0 with value 0x773d1c1c? [13] analyze the winxp version and check who is writing to 0x7f9c0 with [20 min] value 0x7fa04? It is caused by instruction at 0x773d40a0 push esi (first time) Found that esi has value 0x7fa04 from the beginning of the module is loaded. 10:30AM 03/21/2014 [14] check ESI value at 0x773d40a0 and 0x773d42b3. 
[30 min] [14.1] in translate.c, add statements for saving esi save_reg_esp_ebp_before_instr then conditional bp on 773d40a0 and 773d42b3 check the value of esi Found that somehow the esi value is changed between the two BPs! check out where it is. Found: the esi value is changed in kernel routine, and the latest one is 7c9103ae. (before it there is an instruction which loads esi with value from stack [ebp+12], it seems like a dll name] [14.2] check when it occurs set conditional bp on 7c9103b8 and ignore it and see how many times to reach the switch. hit 40 times. 8:30AM 03/22/2014 [15] continue the research on ESI problem. [20 min] [15.1] research what is the use of 7c9103ae. set breakpoint on 773d42b3 (entry) --> how many times hit 7c9103ab (change esi) --> 773d40a0 (push esi) write to memory) Observation: 7c9103ab is hit many times before reaching 773d40a0, it can be pointing anywhere before it has the value 0x7FA04 that reaches 773d40a0. 1st hit: 0x7F7A0 (WindowsShell.manifest) 2nd hit: same 3rd hit: same 4th hit: 0xA3CF8 (global path) 5th hit: 0x7F1D8 (WindowsShell.manifest) 6th hit: 0x773D1C1C (Desktop/Control Panel) 7th hit: 0x773D1C00 (SmoothScroll) 8th hit: 0x773D1B88 (Microsoft\Current Version\ ...) 9th hit: 0x773D1B60 (EnableBoloomTips) 10th hit: 0x773D29A0 (Microsoft\...CurrentVersion...) ---------------> 0X7C3D40A0 (now ESI has value 0x7FA04) [15.2] in winxp: start from 773d42b3 (entry) to 0x773d40a0 and see how 7c9103ae is called. 
1st hit by 0x773d42cf (it hits all of them) and then reach 773d40a0 --> 773d4211 -> invoked 6 times and the value of esi is recovered to 0x7fa04 --> 773d4234, 773d4239 (4 calls) -> invoked another 4 times of 773d03ae and esi is recovered to 0x7fa04 --> 773d4250 --> calls 0x773d40a0 10:00AM [15.3] break on 773d4211, 773d4234, 773d4239 and see what is the value of esi before them [20 min] 773d4211: 7fa04 773d4234: --> 0x773d1c1c, 7f9dc 773d4239: [15.4] break on 0x7c9103ae and see if it matches [15.1] 1st: 7F7A0, ESP: 7f454 xp_ESP: 7f454 2nd: 7c97ce28: ESP: 7f044 xp_ESP: 7f044 3rd: 7c97ce28, ESP: 7f03c xp_ESP: 7f03c 4th: a49f8, ESP: 7f130 xp_ESP: 7f130 5th: 7f1d8, ESP: 7f0b4 xp_ESP: 7f0b4 *** --0x773d4225, ESI: 7fa04 ESP:7f9d8 , xp_ESP: 7f9d8 6th: 773d1c1c, ESP: 7f754 xp_ESP: 7f768 (now different) !!! 7th: 773d1c00, ESP: 7f750 xp_ESP: 7f764 8th: 773d1b88, ESP: 7f754 xp_ESP: 7f768 9th: 773d1b60, ESP: 75750 xp_ESP: 7f764 --> recovered to 773d1c1c (did not recover to 0x7FA0C) -- 0x773d4234, ESI: 773d1c1c, ESP: 7f9dc, xp_ESP: 7f9dc -- 0x773d4239 10th: 773d1c1c, 7f9dc 11:30AM [15.5] study why the esp difference starting from 0x773d4225 (call) Found that four calls related to registry key open and registration. The first one at 773d6ff4 leads to the 6th call of 773d0a1e bp on 773d6ff4 and check data 0x773d6ff4, ESI: 77d48f75 ESP: 7f79c xp_ESP: 7f79c xp_ESI: 774d8f75 0x77dd6b2f, ESI: 0x7c4 ESP: 7f760 xp_ESP, 7f774 xp_ESI: 0x40 (strage it's never hit) 9:00AM 03/24/2014 [16] Use binary search to study from where 773d6ff4 to 77dd6b2f, the esp starts to change. 
[16.1] First set breakpoint on
773d42b3 (entry) --> how many times hit 7c9103ab (change esi) --> 773d40a0 (push esi, write to memory)
When the first point is hit, set BP at 773d6ff4 and 77dd6b2f
0x773d6ff4, ESI: 77d48f75 ESP: 7f79c xp_ESP: 7f79c xp_ESI: 774d8f75
---> there is a context switch; strangely 0x773d6ff4 is hit a second time
ESI: 77d48f75, ESP: 7f788
0x77dd6b2f, ESI: 0x7c4 ESP: 7f760 xp_ESP: 7f774 xp_ESI: 0x40 (strange, it's never hit)
[16.2] figure out why a context switch will reset the ESP value. Why isn't it reset to 7f79c?
Add a new branch (eip_in & 0xFF000000) == 0x77000000
Verified: it's not the problem of context switch
[16.3] continue with helper_trace2
Strangely it hits: 773d6fe2 (not the regOpenReg function)
The 2nd time it's hit it jumps to 77dd6a78
after 0x773d4211
0x773d6ff4: jumps to 0x773d6fdc (not right)
[16.4] check again. First break on 773d42b3, 773d4211, then 773d6ff4 and then 773d6a78 (the 1st instr)
--> it hits: 773d6fdc
It triggers a context switch when doing a memory read. Check which address it is reading:
9:30AM 03/25/2014
[17] check again the instruction at 773d6ff4
[17.1] check the instruction
  [1] bp on 773d42b3, 773d4211 first
  [2] then bp on 773d6ff4 (call openRegistryxxx)
Observation: gets a core dump when calling print_instrRange at 0x773d6ff4. It does not succeed in printing the instruction.
[17.2] read the memory contents at the target address
[17.3] trace into helper_trace2 for the instruction and see how it ends up without the call to helper_trace2 completing.
Found the problem: in helper_trace2, when it tries to perform the memory read, it fails because it triggers a page fault.
Found the problem: when it reads 12 bytes, it is crossing the boundary of the page and the check did not catch that. So copying 16 bytes is quite a dangerous operation that leads to memory read failure.
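The page-boundary failure found in [17.3] can be avoided by translating again whenever the copy steps onto a new page. The sketch below illustrates that idea only. `copy_guest_bytes`, the callback type, and the toy address space are hypothetical, and the real va_to_ha in the tree may have a different signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE        0x1000u                  /* 4 KB pages */
#define GUEST_PAGE_OFFSET_MASK (GUEST_PAGE_SIZE - 1u)

/* Assumed translator shape: host pointer for a guest VA, or NULL if unmapped. */
typedef const uint8_t *(*va_to_ha_fn)(uint32_t va, void *opaque);

/*
 * Copy up to max_len bytes starting at guest address va. The address is
 * translated at the first byte and again each time the copy reaches a page
 * boundary (low 12 bits zero). Copying stops early if a page is unmapped.
 * Returns the number of bytes actually copied.
 */
size_t copy_guest_bytes(uint32_t va, uint8_t *dst, size_t max_len,
                        va_to_ha_fn translate, void *opaque)
{
    assert(dst != NULL && translate != NULL);
    const uint8_t *host = NULL;
    size_t copied = 0;
    while (copied < max_len) {
        uint32_t cur = va + (uint32_t)copied;
        if (copied == 0 || (cur & GUEST_PAGE_OFFSET_MASK) == 0) {
            host = translate(cur, opaque);
            if (host == NULL) {
                break;              /* next page not mapped: stop here */
            }
        }
        dst[copied++] = *host++;
    }
    return copied;
}

/* Toy address space for demonstration: exactly one mapped page. */
typedef struct {
    uint32_t base;                      /* page-aligned guest address */
    uint8_t  data[GUEST_PAGE_SIZE];
} ToyPage;

const uint8_t *toy_translate(uint32_t va, void *opaque)
{
    const ToyPage *p = opaque;
    if ((va & ~GUEST_PAGE_OFFSET_MASK) != p->base) {
        return NULL;
    }
    return &p->data[va & GUEST_PAGE_OFFSET_MASK];
}
```

With a single mapped page at 0x773d6000, a 15-byte request at 0x773d6ff4 returns 12 bytes and stops at the boundary instead of faulting, which matches the 12-byte read noted in [17.3]. The caller then has to decode with fewer bytes or retry once the next page is present.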
[18] try to fix the above problem: in the loop which tries to copy 15 bytes, check if the current address is on a page boundary (that is, addr & 0x00000FFF == 0x00000000: the last 12 bits are 0 for a 4 KB page); if so, perform a second check with va_to_ha and stop if it fails.
It now works.
[19] need to check how it works for instruction 773d6ffe
The page is now located here.
[20] to do: BREAK ON ops_sse.h:2644 and check every possible case.
Notepad case successfully solved.
9:00AM 03/26/2014
--------------------------------------------------------------------------------------
Task 221: test notepad.exe
--------------------------------------------------------------------------------------
[1] run batch analysis with a large time interval and see how it works.
It wrongly cleared all tasks.
[2] try adding a parameter to clearTaskList(int upperLimitCount) and pass 1 in. [20 min]
Continue to next task directly.
9:45AM 03/28/2014
--------------------------------------------------------------------------------------
Task 222: fix the problem of frozen analyze task.
--------------------------------------------------------------------------------------
[1] check the sequence of events fired. The list of events is defined in
//BatchAnalyzer::execBatchBranchSlice
this->addTask(new taskChangeJobCategory(job, logger, Job::GEN_RAW_TRACE, 1));
this->genTasksForGenRawTrace(job);
this->addTask(new taskSaveTraces(job, logger));
this->addTask(new taskChangeJobCategory(job, logger, Job::GEN_FULL_TRACE, 1));
this->genTasksForFullTrace(job);
[2] run and see how it works. It's frozen, never jumps from full trace to the slice job.
It seems to be actually ok. Found that full_trace is slowed by tsHasMultipleWrites
[3] try enlarging the block size to 64MB. Found that it is over the limit.
Changing both to 16M entries per block does not work. 8MB does not either. Try 4MB.
9:00AM 03/29/2014
--------------------------------------------------------------------------------------
Task 223: improve the performance of loadBlock by adding a cache.
--------------------------------------------------------------------------------------
[0] Recompile and get a good setting. [60 min]
[1] Observe the performance of loadBlock and analyze when it is called most often. [15 min]
    Observation: it is switching back and forth too many times between two blocks.
[2] Design idea: maybe set up a backup block in the Cache. When loading, switch.
[3] Need to redesign Cache:
    [1] create an array of two blocks, keep two copies of every attribute, and declare an active_block_idx
    [2] for loadBlock(): if the alternative block idx is the same as the id, then just point to the other; if not, then load the alternative block and set it to the other.
    [3] for saveCurrentBlock() still do the same
    [4] check all the others
11:00AM
[4] Implementation.
    [1] change all data definitions of cache.h [20 min]
    [2] correct syntax errors in each of the functions. [90 min]
        [1] createCache DONE.
        [2] loadCache DONE
        [3] ~Cache DONE
        [5] appendRecord DONE
        [7] updateRecord DONE
        [8] updateLastBlockIndexSize [DONE]
9:45AM 03/30/2014
-------------------
        [9] saveCurrentBlockToDisk [DONE]
10:15AM
        [10] saveToDisk [DONE]
            [10.1] refactor saveCurrentBlockToDisk to saveBlockToDisk(i) [15 min]
            [10.2] refactor saveCurrentBlockToDisk [5 min]
            [10.3] saveToDisk, depending on the order. [20 min]
        [6] retrieveRecord DONE
10:45AM
        [9] loadBlock
            [9.1] refactor loadBlockInto [15 min]
            [9.2] redo loadBlock [15 min]
9:15AM 04/01/2014
[5] Unit testing
    [1] destructor problem [10 min]
    [2] testLoadCache [15 min]
9:40AM
    [3] appendCache [15 min]
    [4] loadBlock again. [15 min]
    [5] loadBlock issue 2: load rec2, line number: 69
10:30AM
    [6] dependLink error. [25 min]
        check saveCache...
7:50PM
    [7] fix issues in loadBlock [30 min]
    [8] bad alloc problem [30 min]
10:00AM 04/02/2014
    [9] fix the problem of 10000 records???? Fixed.
        All unit tests are now cleared.
        -- to do: the fwrite at line 230 can be improved
9:30AM 04/03/2014
    [10] test the entire thing on qemu_image. Problem.
        Segmentation fault in Cache::loadBlockInto. Fixed: it's because the request mode was set to 0.
9:00AM 04/04/2014
    [11] exec job ends too early problem. Solved: the copy time was too short.
10:30AM
    [12] should show a progress bar for the full trace analysis [20 min]
    [13] check how the full trace is generated.
        Still very slow though. Found that 2 chunks of block are still not enough. May consider multiple buffers.
        Observation: after 39% it slows down a lot.
        48% - 11:22AM
        55% - 11:35AM
        62% - 11:45AM
        77% - 12:11PM
        82% - 12:26PM
        87% - 12:35PM
        About 7% per 10 minutes.
9:00AM 04/05/2014
    [14] add reports of loadBlockIntoIDs and saveBlock invocation counts. [20 min]
    [15] handle SIGFPE, arithmetic exception. FIXED.
    [16] try to dump the ratio of load and save blocks.
        Everything ok before 33%.
    [17] mysterious problem: cannot print out Cache::loadTimes!
        Clean and recompile.
9:00AM 04/06/2014
    [18] fix the ratio of load and save blocks.
        Found the problem: the first %d format specifier caused it (it should actually be %lld).
        Now the stats:
        ---progress: 37%, loadBlock: 34991, saveBlock: 2, load/save: 11663 ----
        ---progress: 38%, loadBlock: 35889, saveBlock: 2, load/save: 11963 ----
        ---progress: 39%, loadBlock: 36868, saveBlock: 3, load/save: 9217 ----
        ---progress: 46%, loadBlock: 43915, saveBlock: 3, load/save: 10978 ----
    [19] add another attribute called actual load.
        ---progress: 38%, loadBlock: 35589, saveBlock: 2, actualLoad: 250, load/save: 11863, load/actLoad: 141 ----
        ---progress: 39%, loadBlock: 36299, saveBlock: 2, actualLoad: 310, load/save: 12099, load/actLoad: 116 ----
        ...
        ---progress: 48%, loadBlock: 45691, saveBlock: 3, actualLoad: 1237, load/save: 11422, load/actLoad: 36 ----
        ---progress: 49%, loadBlock: 49115, saveBlock: 3, actualLoad: 1296, load/save: 12278, load/actLoad: 37 ----
        ---progress: 55%, loadBlock: 70037, saveBlock: 4, actualLoad: 2517, load/save: 14007, load/actLoad: 27 ----
        ---progress: 56%, loadBlock: 71713, saveBlock: 4, actualLoad: 2630, load/save: 14342, load/actLoad: 27 ----
        ---progress: 65%, loadBlock: 86129, saveBlock: 4, actualLoad: 3627, load/save: 17225, load/actLoad: 23 ----
        ---progress: 66%, loadBlock: 91041, saveBlock: 5, actualLoad: 4090, load/save: 15173, load/actLoad: 22 ----
        Observation: from 65% to 66%, there are about 5000 loads and about 500 actual loads. The actual ratio is 10:1.
        1% costs about 1 minute. 500 actual loads is about 200MB * 500 = 100GB of reading. That's a lot.
        At 77% to 78% it's roughly the same ratio.
9:00AM 04/07/2014
    [20] try to improve by adding more buffers. Make it generic [3 hrs]
        [1] createCache
        [2] loadCache
        [3] saveToDisk
        [4] appendRecord
        [5] updateRecord
        [6] updateLastBlockIdxSize
        [7] saveBlockToDisk
        [8] saveCurrentBlocktoDisk
        [9] saveToDiskA
        [10] retrieveRecord
        [11] loadBlockInto
        [12] loadBlock
09:30AM 04/08/2014
    [21] unit test the change
        [1] loadCache problem. [10 min]
        [2] fix arithmetic fault. [15 min]
        [2] block_size too large problem. Increase process heap size. Call malloc_stats() to check memory usage. [20 min]
            Found that testTrace exceeds the memory limit, at around 3.2GB.
            At the end of it, testAddInstr() wasted about 300MB.
        [4.5] fix the percent problem (lSize/100)
        [5] now it breaks in constructSampleTrace(), at about 1.9GB.
            Before the call the heap use is about 5MB.
            A trace takes 667MB (each block takes about 256 * 500k = 112MB. So it has two cache blocks, one for instrStore and the other for sequence)
            When constructFullTraceFromRawTrace is called, it is at 667MB.
            It first creates a trace, which consumes 667MB --> 1.32GB
            trace->instrStore creates another 336MB.
            --> 1.75 GB (so this should be freed first)
            trace->execHistory creates another 336MB --> 1.97 GB
            After the fix, reduced from 700MB to 120MB.
        [6] stack overflow error. Fixed.
        [7] make ad-hoc allocation of disk blocks.
            [1] fix Cache::Cache: only init the first block.
            [2] fix loadBlockInto
            [3] fix saveBlock
            Reduced another 10MB.
09:00AM 04/09/2014
        [8] test the entire system.
            Segmentation fault on vpage->hpage translation.
            Could not replicate the error unfortunately. Address it later.
            5 success, 3 fail.
        [9] performance comparison on buffer number and size.
            Comparison:
            Buffer setting: 500k
            2 buffer:
            ---progress: 48%, loadBlock: 45691, saveBlock: 3, actualLoad: 1237
            ---progress: 49%, loadBlock: 49115, saveBlock: 3, actualLoad: 1296
            ==> load: 3424, actual load: 59. Ratio: 58:1
            3 buffer:
            ---progress: 48%, loadBlock: 47311, saveBlock: 3, actualLoad: 193
            ---progress: 49%, loadBlock: 51571, saveBlock: 3, actualLoad: 242
            ==> load: 4260, actual load: 49. Ratio: 86:1
            4 buffer:
            ---progress: 48%, loadBlock: 48169, saveBlock: 3, actualLoad: 9,
            ---progress: 49%, loadBlock: 52507, saveBlock: 3, actualLoad: 9,
            ==> load: 4338, actual load: 0!!!!
            ---progress: 98%, loadBlock: 147447, saveBlock: 7, actualLoad: 1683
            ---progress: 99%, loadBlock: 150041, saveBlock: 7, actualLoad: 1740
            ==> load: 2594, actual load: 157, ratio: 16
            8 buffer:
            --progress: 98%, loadBlock: 147467, saveBlock: 7, actualLoad: 25,
            --progress: 99%, loadBlock: 150071, saveBlock: 7, actualLoad: 25,
            ==> 0!
            Now the problem is that most slices are broken (throw exception).
09:15AM 04/11/2014
        [10] fix the occasional memory corruption problem.
            Could not repeat it again. Strange.
            Can now repeat.
        [11] found the problem of the notepad1.exe crash: something is wrong with ntdll during slicing.
10:30AM
        [11] run valgrind. (val_run.sh)
            [1] problem with task. [we'll let them leak anyway, not big]
                There is a possibility that tasks are deleted (get time out event called).
            [2] invalid read in saveToCache of RecordRequestProcessor. Fixed.
            [3] invalid memory read of getTrace(): read of bytes already freed in handle_mem-write->getTrace. (solved)
                Call destroyInstance instead.
9:30AM 04/12/2014
            [4] seems to clear all of them. Run 1 branch slice.
                But still got the segmentation fault in ha_to_pa.
                Set a bp on taskSaveTraces, which deletes TraceManager and traces, and see how many times it is hit before the next crash.
                Verified, it is still the TraceManager destructor problem. See when it is called.
                It is a synchronization problem. The TraceManager instance has been destroyed but someone else is still using it.
                Typical scenario:
                Thread 1:
                    TraceManager *tm = TraceManager::getInstance(); //s1
                    if(tm!=NULL){ //s2
                        Trace *tr = tm->getTrace(cr3); //s3
                    }
                Thread 2:
                    TraceManager::destroyInstance(); //s4
                If s4 is executed between s1 and s3, that's no good. Even if it's after s3 it's no good as well.
                So actually ALL of handle_mem_read, handle_mem_write, handle_instruction should be protected by a mutex lock, and TraceManager::destroyInstance, which is triggered by taskSaveTraces, should be handled carefully as well.
11:00AM
        [5] add synchronization protection.
            [1] declare a mutex lock and protect all functions in handle.cc [20 min] DONE.
            [1.5] double check everything [15 min] DONE.
            [2] examine BatchAnalyzer.cc and add protection to functions that call TraceManager.
                It seems protection is only needed on the call to destroyInstance of TraceManager, because the others do not conflict with it (they are guaranteed to be sequential; otherwise the currently running qemu threads (system emulation) will race against it). [20 min] DONE.
            [3] test [1 hr]
                [1] Problem: it locks up the system. VERY HARD TO DEBUG.
                [2] replace all pmutex_lock with mylock and the others with myunlock.
                    Found that there are two consecutive locks.
                    Confirmed the 4th lock creates the problem. isNeedResetCR3 problem.
                    Not sure who holds the lock.
                    Add a piece of code to remember the lock owner.
                    It's called by send_event, which already has the lock, and then
                [3] set a recursive lock to allow the same thread to own/request the same lock twice.
                    Added in BatchAnalyzer::do_jobs
4:30PM
                    Remove the lock protection on clearTasks() in BatchAnalyzer.
                    It now works.
5:00PM
                [4] run it 5 times to verify there are no more memory crashes.
                    1. but it seems to be locked in wait_for_io
                    2. It now breaks at every 2nd pass.
10:00AM 04/14/2014
        [6] gdb the problem.
            Segmentation fault: Cache::loadBlockInto (id=0, destID=0).
            The problem is with RecordRequestProcessor::loadFromCache.
            It is called by Trace::Trace depending on the GEN_PRESERVE_REQUEST_MODE.
            And it is called by TraceManager::isProcessToBeTraced. Strangely it is calling the constructor of Trace::Trace.
            The problem is that after the raw trace is done, the TraceManager still has the name of the process in setNamesProcessToTrace.
            In addition, the emulator should not be started at all. Find out who calls it (break on BatchAnalyzer::stopvm and contvm).
            No one actually calls contvm, but it is still fired.
            Still multiple places of error.
            [6.1] attempt 1: since gdb has problems displaying static fields, check the assembly instructions of TraceManager::myinst and find its location.
                Found the problem. After the first round, it does call BatchAnalyzer::createInstance() and then helper_trace2 is called; it then calls isProcessToBeProcessed.
                Question: is it called before loadvm?
                Fix: adjust the sequence of loadvm and initVM (which adds the process to load).
                Also comment out taskContVM. Why is it needed for branch slice? Comment it out.
        [7] now it complains that TraceManager is NULL after the 1st half round.
            DONE.
9:00AM 04/14/2014
        [8] broke again on TraceManager::isProcessToBeTraced, when it finds that the process is being traced.
            Break on taskInitTM::synch_job and BatchAnalyzer::loadvm.
            Trace::Trace seems to be called at the right place.
            Check how vecBlockIdx is initialized in Cache.cc.
            Problem is that RecordRequestProcessor has no data at all! This crashes loadBlockInto.
            Check if RecordRequestProcessor is EVER SAVED!
            The RecordRequestProcessor is inconsistent: it reports that there is one record, but the block is not modified.
            [8.1] set breakpoints on RecordRequestProcessor operations and see if they are ever called.
                Problem is RecordRequestProcessor->cacheRRP->curBlockModified[0] is set to false when saving to disk.
                The block is indeed written into the file. But it's loaded.
                Then in Trace::constructFullTraceFromRawTrace, because the job mode is GEN_PRESERVE_REQUEST_MODE, it initializes the rr_processor, which is not good, because it should actually load it.
                Modify Trace::Trace: add a condition that the job category is gen raw trace.
                Seems to fix the problem.
            [8.2] verify if branch_slice is ever called.
                Called.
10:15AM
            [8.3] robustness test. Run it five times. Change it to 2 branch_slice.
                2 success, 1 fail. (deadlock on os_host_main_loop_wait)
7:30PM 04/14/2014
--------------------------------------------------------------------------------------
Task 224: check the generated slice for notepad.exe crash problem.
--------------------------------------------------------------------------------------
[1] Check the generated .exe file and see why it crashes.
    Always breaks at 0x7c913395/96. It's a part of the stricmp function. The passed address is not right.
    Try to look for a common starting point from the collection of return addresses in the xp stack.
    Candidates: 7c8171b5 (does not work).
    Use WinDbg to inspect the stack: candidate: 7c90eac7.
[2] Problem: not able to stop at any instruction in IMM.
[3] Try in WinDbg.
    Use the command sxe ld to break on any module load; found that the last module loaded is winspool.drv.
[4] Found the path 7c9272e6 -> 7c91b0dc -> 7c913396 (broke)
[5] Immunity Debugger is not able to stop at the breakpoint.
    Have to use WinDbg for the faulty version.
    [1] break on 7c9272e6, then break on 7c91b0dc, and then check the difference
    Need to figure it out tomorrow.
9:00AM 04/15/2014
[6] Continue analysis with IMM + WinDbg (look at the difference) [30 min]
    (use sxe ld and sxd ld to enable or disable events on module load)
    7c9272e6 -> 7c91b0dc -> 7c913396
    Problem is with 0x7c91b1b2 (when it calls stricmp, the 2nd parameter, which should be a pointer to a string, is not right; its value is 90909091, which is an illegal address)
[7] Find out where the address is from [30 min]
    The following sequences are listed in reverse order
    -----------------------------CORRECT VERSION --------------------
    7c91b1b1 push edi (edi should have 775fb0ec, "msvcrt.dll")
    -> 7c91b1af add edi, eax (edi should be 774e0000, eax is 0011b0ec)
    -> 7c91b1a9 mov edi, [esi+18] (esi should be 001a2b58, it then loads 774e0000 from 001a2b70)
    -> 7c91b102 mov esi, [ebp+10] (ebp is 0x0007e7bc)
    -----------------------------WRONG VERSION --------------------
    7c91b1b1 push edi (edi has 91909090, points to nowhere)
    -> 7c91b1af add edi, eax (edi is 01000000, eax is 90909090)
    -> 7c91b1a9 mov edi, [esi+18] (esi is 001a1ee0, it then loads 01000000 from 001a1ef8)
    -> 7c91b102 mov esi, [ebp+10] (ebp is 0x0007fa78)
    -================ Conclusion:
    =====[1] edi is the base of the module (01000000 is the base of notepad.exe and 774e0000 is the base of ole32.dll)
    =====[2] eax at 7c91b1af is maybe the OFFSET to a module name (import table)? Just a guess??
    Clearly, notepad.exe's PE header is somehow overwritten by 9090909090.
    So, there is something wrong with the module that writes into the binary executable.
11:00AM
[8] Figure out the meaning of EAX.
    [30 min]
    [8.1] At 0x7c91b197 it first calls
        PVOID NTAPI RtlImageDirectoryEntryToData ( PVOID BaseAddress, BOOLEAN MappedAsImage, USHORT Directory, PULONG Size )
        ==== working version
        baseAddr: 774e0000 (base of ole32.dll)
        mappedAsImage: 1,
        Directory: 1,
        size: 0007e7a0
        It retrieves OptionalHeader.DataDirectory[1] (according to the ReactOS source),
        which seems to be the import table, because [0] is the export table.
    [8.2] At 0x7c91b19c: MOV EBX, EAX #move EAX to EBX
        Now EBX and EAX both have 0x775fb04c (the virtual address attribute of the import table)
        -> it should be the import address table BEGINNING ADDRESS
    [8.3] At 0x7c91b19e: MOV EAX, [EBX+C] #move content at 0x775fb04c+c
        Note: EBX currently points to IMAGE_IMPORT_DIRECTORY:
            long rvaImportImportTable
            long timeDateStamp
            long forwardChain
            long rvaModuleName ###!!!!
        So +0xC points to the field rvaModuleName (relative address).
        So we can confirm that it is loading the module name at strcmpi, and this area (import table) is wiped by the current implementation.
11:30AM
    [8.4] observe the PE header at 0x01000000 [30 min]
        Both versions: import table address and size correct: 7604
        Observe:
        01007604: correct version has the right information (+12 is the rvaModuleName)
        01007604: WRONG version: everything is wiped out starting from 0x01007410 (all wiped with 0x90909090)!!!!
        Found the bug now!
    [8.5] found that 90909090 starts from 0x01001000. START ADDR!!!! OF the .text section!
        So the current algorithm is WRONG! It overwrites the DATA stored in the .text section.
7:30PM
    [8.6] read the original logic [20 min]
        It's done in clearAllSections. Call graph below:
        binWriter::writeDataSlice -> clearAllSections
        Read the writeSOC logic: it calls writePartrialTrace, which writes the instructions in InstrInfo one by one.
8:00PM
    [8.7] implement the conservative algorithm [1 hr]
        [1] change the declaration of the function and make it compile [10 min] DONE.
        [2] change the algorithm [15 min] DONE
        [3] unit testing [10 min]
        [4] testing [30 min]
[9] Problem again: crash on creating exit point.
08:30AM 04/17/2014
[10] Check how it is used to insert
    Problem 1:
    jmp size: 5 > branch size: 2!
    Failed in writing program exit. Removing directory: /home/samba/smbuser/slice_jobs/job_notepad/branch_slices/notepad.exe/brc_22!!!!!!!!!!!!-----
    Problem 2:
    cannot find hole for 40-byte jump.
    Failed in writing program exit. Removing directory: /home/samba/smbuser/slice_jobs/job_notepad/branch_slices/notepad.exe/brc_27!!!!!!!!!!!!-----
    Problem 3:
    Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 13, ts: 4033348, eip: 1007155!
    Breakpoint 2, Util::error_exit (fmtstr=0xb7c3b930 "Could not locate SOC for ts: %lld\n")
[12] Fix problem 2 first
    ts=4038671
[11] Fix problems 1 and 2.
    Attempt: instead of searching only for '\x90', search for both \x90 and \x00.
    Not sure of the effects yet.
9:00AM 04/18/2014
--------------------------------------------------------------------------------------
Task 225: fix the SOC not found problem
--------------------------------------------------------------------------------------
[1] Fix problem 2 first
    ts=4038671
[2] Set a breakpoint at branch_slice and set the ts to 4038671.
    Yes, can repeat the error.
9:15AM
[3] Error analysis: [20 min]
    tsStart = 4033356
    tsEnd = 4062616
    maxTS = 4062616
    It broke because tsEnd is equal to maxTS.
    Think about the semantics of maxTS.
[4] Check source code at SOCmanager::195 [15 min]
10:00AM
[5] Make the change to SOCmanager and test it [20 min]
    Worked!
10:40AM
10:00AM 04/16/2014
--------------------------------------------------------------------------------------
Task 226: run check on notepad.exe
--------------------------------------------------------------------------------------
Some minor exceptions; verified that it does not distinguish between the environments.
Around 80 branches collected.
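The filler-byte search tried in Task 224 item [11] (accept both \x90 and \x00 when looking for a hole large enough for a jump, e.g. the 5-byte jmp of Problem 1) can be sketched as follows. This is a minimal illustration, not the binWriter.cc code: findHole and isFiller are hypothetical names, and the function only looks at raw bytes. Treating 0x00 as free space is an assumption that can be wrong when zero bytes inside a section are live data, which is why the log itself says the effects are not yet known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A byte counts as filler if it is a NOP (0x90) or a zero byte (0x00).
inline bool isFiller(uint8_t b) { return b == 0x90 || b == 0x00; }

// Returns the offset of the first run of at least minHoleSize consecutive
// filler bytes in code, or -1 if no such run exists.
// Hypothetical helper for illustration only.
std::ptrdiff_t findHole(const std::vector<uint8_t>& code, std::size_t minHoleSize) {
    assert(minHoleSize > 0);
    std::size_t runStart = 0;
    std::size_t runLen = 0;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (isFiller(code[i])) {
            if (runLen == 0) runStart = i;   // a new run begins here
            if (++runLen >= minHoleSize) {
                return static_cast<std::ptrdiff_t>(runStart);
            }
        } else {
            runLen = 0;                      // run broken by a real byte
        }
    }
    return -1;
}
```

A real implementation would also have to confirm that the candidate bytes are not part of an instruction already placed in the slice.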
10:00AM 04/18/2014
--------------------------------------------------------------------------------------
Task 227: fix jump size problem
--------------------------------------------------------------------------------------
[1] Find out where it complains that it cannot find the jump size. [20 min]
    Set breakpoints on the 4 locations in binWriter.cc that complain about the size of the jump.
    [a] slice 22 captured: jump size > branch size (5>2)
        ts: tsBranch=4033592
    [b] slice 45: jump size > branch size (5>2)
        ts: tsBranch=4027658
    [c] slice 46: jump size > branch size (5>2)
        ts: tsBranch=4033594
    It finally broke at Trace::~Trace(), CallAdjustRecordProcessor broke
8:30AM
[2] Solve the jump size problem. [15 min]
    Observation: the main problem is a 2-byte JE instruction; check tsBranch 4033592 again.
9:15AM
[3] Solution: develop a second-chance function getImmediateHole(int bytes) - usually five.
    Design [15 min]
9:30AM
[4] Implementation
    [1] create binWriter.cc::static int findHoleImmediate(Trace *trace, int fid, int fidSrc, vector<sectionInfo*>* vecSection, unsigned int eipStartSearch, int minHoleSize, unsigned int* holeStartEIP);
        //start the search at the next immediate instruction after eipStartSearch
        [5 min] DONE.
    [2] add function getSectionContainAddr, modify addrInAnySection [15 min]. DONE
    [3] read bytes one by one; if \x90 or \x00, make the hole. If not, call trace->getInstrInfoID and then load the InstrInfo. Check if the instruction is in the slice. If not, then expand the hole and update the logic [20 min]
        FAILED: the solution does not work. The next immediate instruction is included in the slice (its eip is 0x01003543).
        Check why it's included in the slice.
    [3] unit test [20 min]
9:00AM 04/21/2014
[5] Look into case 4033592 and see why the instruction is already in the slice right after the jz.
    Set a conditional bp on InstrInfo::setInSlice (0x01003543).
    It's caused by SOCEnd.
09:20AM
[6] Check why it's listed as SOCEnd:
    Since the tsSeed is a branch instruction, it's identified as a transfer control instruction and cannot be SOCEnd. So the next instruction is listed as SOCEnd.
[7] Check what quick trick we can use: check tsReversePointer.
[8] Implementation: DONE.
[9] Still not working: tsReversePointer is set a second time to a positive value.
    It is included because the subsequent one has a register dependency on it.
    Now the SOC becomes: 4027335, 4033595
    Another idea: check countInSlice. It would not work. It will be counted in the slice.
[10] There seems to be a bug to be fixed: countInSlice.
3:00PM 04/22/2014
[11] Re-run the program with 4033592 again.
    Set a bp on Trace::gen_slice_for_branch and then set a bp on binWriter::writeProgramExit.
    It broke because of the 2nd instruction after it. Check in the winxp image (addr is the next instr after 0x01003543).
    It's a JE instruction and 4033157 (which is an earlier timestamp) depends on it.
10:00AM 04/23/2014
[12] Check timestamp 4033592 (0x01003543) in winxp. [20 min]
    Strangely still found that it is visited before the next je.
[13] Re-run the program and check 4033592 again.
    Set a bp to check how 01003541, 01003543, 01003544, 01003546 are included.
    Found that they are added because of TYPE_OF_ALL_FUNCTION.
    So need to make an exception for that case.
9:00AM 4/24/2014
[14] Add the TYPE_ALL_FUNCTION case in findHoleImmediate() [20 min]
    ts: 4033592 (0x01003543)
[15] Fix the problem of exceeding the boundary 01003554.
    Now timestamp 4033592 works.
[16] Test timestamps
    ts: tsBranch=4027658
    ts: tsBranch=4033594
    Works.
[17] Run these three branches.
    Found that the 4033592 branch fails (throws exception).
10:00AM 04/24/2014
--------------------------------------------------------------------------------------
Task 228: fix exception throwing branches
--------------------------------------------------------------------------------------
[1] Observe branch 4033592: broke at 0x0100750c.
    -> 0100297b --> 0x010049b6
9:15AM 04/25/2014
    Continue the analysis.
    Start two instances and compare:
    0100750c same
    0100297b - different. There is a shift of 4 bytes in ESP.
    010049b6 - completely different sets of parameters passed.
    So the question is: what causes the 4-byte shift in ESP?
[2] Comparative study:
    The difference occurs at the call at 0x0010750c (call entry).
    The problem occurs at 0x0100239b: the calculation of the bridge connection is not accurate.
[3] Try running the algorithm twice and see what the difference is.
    Manual run.
    ts: 4033592 (0x01003543) {set REQUEST mode to 1 first}
[4] Problem: when it finishes generate_branch, the rr_processor destructor is not called.
    Fixed the save trace problem.
    VERIFIED it's still the same problem of ESP adjust size.
    So now the problem: the bridge adjustment of ESP size is not right (adjusted 4 bytes more). Trace into where it comes from.
9:00AM 04/26/2014
[5] Set a breakpoint at the write to 0100239b (add ESP, -28).
    Problem: did not capture it.
[6] Set a breakpoint at binWriter. Found the bridge at 0x0100293b.
[7] Set another bp on tsBridge to capture the generating process and compare.
9:00AM 04/27/2014
[1] Check the tsBridge conditional BP and then do the same and compare the ESP value.
    Bridge starts at 0x0100293b. (nextSOC eip: 0x01002940)
    ESP_BEFORE_next_SOC: 7fef4
    EBP_BEFORE_next_SOC: 7ff1c
    ESP_AFTER_cur_SOC: 7ff1c
    EBP_AFTER_cur_SOC: 7ff1c
    WinXP:
    ESP_BEFORE_next_SOC: 7fef4
    EBP_BEFORE_next_SOC: 7ff1c
    ESP_AFTER_cur_SOC: 7ff1c
    EBP_AFTER_cur_SOC: 7ff1c
    Error version:
    ESP_AFTER_cur_SOC: 7ff20
    EBP_AFTER_cur_SOC: 7ff20
    The problem: 0x01002938 (push ebp) is not put in the slice.
[2] Check why 0x01002938 is not included in the slice. 0x0100293b depends on it.
    Trace from 0x01002939. bp captured 0x01002939 (MOV EBP, ESP).
    Note that both ier->isNeededForReg() and Mem() are false.
    The soc (2947617 -> 2947639)
    It has a link (ESP link) but it is skipped because it is not a RETURN instruction. But it is an SOC end.
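The ESP values recorded on 04/27 are consistent with a bridge adjustment equal to ESP_BEFORE_next_SOC minus ESP_AFTER_cur_SOC. The sketch below works that arithmetic through; bridgeEspDelta and asImm32 are hypothetical helper names, and reading the operand of "add ESP, -28" as hexadecimal is an assumption. With the WinXP values the delta is 0x7fef4 - 0x7ff1c = -0x28; with the error version's 0x7ff20 it becomes -0x2c, i.e. 4 bytes more, which is the size of the missing push ebp.

```cpp
#include <cassert>
#include <cstdint>

// Signed amount to add to ESP so that the stack pointer left behind by the
// current SOC matches what the next SOC expects on entry.
// Hypothetical helper; the real bridge code in binWriter may differ.
constexpr int32_t bridgeEspDelta(uint32_t espAfterCurSoc, uint32_t espBeforeNextSoc) {
    return static_cast<int32_t>(espBeforeNextSoc - espAfterCurSoc);
}

// The same delta as the 32-bit immediate of an "add esp, imm32" instruction.
constexpr uint32_t asImm32(int32_t delta) { return static_cast<uint32_t>(delta); }

// Values taken from the log entry above (WinXP run vs. error version).
static_assert(bridgeEspDelta(0x7ff1cu, 0x7fef4u) == -0x28, "correct bridge");
static_assert(bridgeEspDelta(0x7ff20u, 0x7fef4u) == -0x2c, "4 bytes too many");
```

Comparing the delta computed from the emulated run against the one computed from the generated slice gives a quick check on whether a push or pop was dropped from the slice.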
[3] New problem is with 0x01004568 (ESP messed up). Use the same method to debug it.
    It's 0x01004567 that is messed up.
12:30pm 04/29/2014
[4] Set a bp at binWriter::gen_bridge
    (gdb) p/x esp_after_soc
    $8 = 0x7fee0 (matches xp)
    (gdb) p/x ebp_after_soc
    $9 = 0x7ff1c (matches xp)
    (gdb) p/x esp_before (next tsTarget: eip: 0x01004577)
    $10 = 0x7fe2c (matches)
    (gdb) p/x ebp_before
    $11 = 0x7fedc (matches)
    Found two problems:
    (a) assemble ebp: should call assemble EBP
    (b) assemble ESP error: the ADD ESP, 0xFFFFFF4C is wrong
    Strangely, it's working again.
    Fix the ebp problem. Generate the slice for 4033592 again.
[5] Still got signal 0x502. This time it broke at 0x010049b6.
    The esp value is not the same.
    Problem analysis: the SOC end does not have an ESP dependency link, so it did not propagate to the instruction at 0x0100457a.
[6] Fix idea:
    [1] set a boolean variable which records whether the instruction has ESP or EBP propagated.
    [2] create a function propagateEBPESP(int esp=1, tsStart, tsEnd): search backward for the latest ESP modification instructions and mark them as ESP_DELAY
        call functions: find the one that changes ESP.
9:15AM 04/30/2014
    [3] Implementation:
        [1] create Trace::findTsWithoutESPAfter. Model it on the implementation of findTsWithESPAfter, and make it also work for ebp. [15 min] DONE.
        [2] in Trace::full_slice set two boolean variables which record whether an instruction has ESP or EBP propagated [10 min] DONE.
9:00AM 05/01/2014
        [3] create Trace::propagateESPEBP(int esp=1, tsStart, tsEnd) which propagates the ESP/EBP link [20 min] DONE
9:30AM
    [4] test over the 4033592 case, eip: 0x010049b6. Find the soc first. [30 min]
        [1] capture the SOC of 0x010049b6 [5 min]
            Problem: it did not capture 0x010049b6
        [2] try breaking on 878 and 881.
    [5] broke at 0x0100750c --> 0x0100297b.
        Still cannot capture the real code.
    [5] check the slice generated, still got the 502 error.
9:00AM 05/02/2014
    [6] check the new 502 error.
        01007509 -> 01004592
        The problem is with the instruction 0x0100457b: one PUSH instruction is not taken (it's located at 0x0100457a and it is skipped in the slice)
        [6.1] break on 0x0100457b and check its dependency. It should be SOCEnd.
            advance 876 after the BP is found
        [6.2] the problem seems to be tsFindESPNotEqual. Trace again.
            Still got the 502 error. Check again.
            Problem: 0x0100462e (test call given input "c:\windows\notepad.exe" returns 1 in EAX)
        [6.3] trace into the above problem.
            Problem is with 0x01004500: registerClassW has different returns.
            Then it differs at 0x77D4AE44A
            because of 0x77D4A566 (differs on ESI value),
            caused by 77D4A550 (before the call all params seem to be the same)
            Why did registerClassW fail? If we place the file in win (verified it's not the location problem).
            Summary: sequence of error: 0x01007509 -> 0x01004592 -> 0x0100462e --> 0x01004550 --> 0x77d4a550 (RegisterClassExW)
9:30AM 05/03/2014
            There seems to be something wrong with RegisterClassExW (check if the information is passed right)
    [7] check 0x77d4a550 in winxp.
        WinDbg does not provide symbols, have to analyze manually.
        Correct version: the third word (WNDPROC lpfnWndProc;) is not null: 0x01003429
        Problematic version: the third word is 0x00000000
        Hit too many times. Need to follow the chain of
        The payload is located at 0x6fdf0 (the 3rd field is at 0x6fdf8)
        0x01007509 -> 0x01004592 -> 0x0100462e -> then set the hw breakpoint.
        Found that at 0x0100462e it is still not there.
        Found the problem: the instruction at 0x01004539 which pushes the parameter to the stack (lpfnWndProc 0x01003429) is SKIPPED!
        It is copied to 0x6fb58 at 0x77d4a485 (it is also included in the wrong version)
        Then the content at 0x6fb58 is cleared at 0x77dd72ef
        When 77d4a550 (RegisterClassExW) is called, it's the copied data at 0x6fb50 that is passed!!!
        Now the question is why isn't the data dependency being produced in the trace alg?
    [8] Find out the timestamp which actually reads the information.
        Figure out whether any of the following are hit multiple times.
        0x01007509, 0x01004592, 0x0100462e, 0x01004550 ALL hit once.
        Then set a bp on 77d4a550 (how many times is it hit?)
        It is called three times AFTER 0x01004550 is hit.
        [8.1] Trace algorithm debug: [15 min]
            set bp at 0x01007509, 0x01004592, 0x100462e, and 0x01004550
            set bp at 0x77d4a550 (and see how many times it's hit)
            seed ts: 4033592
            0x01004550 timestamp: 3730860
            0x77d4a550 hit times: only 1 time before 0x01004550, ts: 3732503
            the corresponding 0x7c90eb8d (syscall) must be located at 3732507
        [8.2] check the dependency of 3732507: its eip is 0x804df184.
            Only one dependency, on 3732503.
        [8.3] check the dependency of 3732506: eip: 7c90eb8b
            Found that it has 22 dependency links. Problem: its eip should be 0x7c90eb8d?
            link 1: esp_link: 3732505
            link 2: type 12: 3732500 (TYPE_MEM_LINK_ADV)
            link 3: type 12: 3732498
            but none of the TYPE_MEM_LINK_ADV links is actually processed.
            The problem is that bDataProgatation is true, BECAUSE it's set to true earlier at the beginning!
3:45PM 05/03/2014
    [9] debug into timestamp 3732506: eip: 7c90eb8b (it should be 7c90eb8d)
        First of all, two problems:
        [1] why the eip mismatch?
        [2] why bNoDataProgation?
        Solve problem [2] first.
        For the instruction, isNeededForReg() and isNeededForMem() are both false; that's the reason data propagation is not allowed.
        This seems wrong for a kernel service. Think about the case of registerClass. It does not generate any memory writes and then it's going to cause a problem (when passing a null pointer).
        Another problem is: there should actually be a register dependency, because EAX is being read.
        [2.1] check why isNeededForReg is not called.
            Trace from timestamp 3732512.
            Problem: THE CONTEXT SWITCH INSTRUCTION IS NOT MARKED WITH A REGISTER DEPENDENCY.
            It changes the EAX, EBX, ECX, EDX, ESP registers. Should mark all these registers.
            3732514: 0x77d4a563, MOV EAX, EBX (EBX->EAX)
                deplink: 2 (reg link on ebx, control link: 3732513)
            3732513: 0x77d4a55d (JE 77d6fa25)
                deplink: reg link: 3732511
            3732512: 0x77d4a55a (MOVZ SI, EAX)
                deplink: reg link: 3732503 (but it should be 3732507): guess the original version is that the EAX return is 0 and nothing is changed, so there is no dependency on it.
            3732511: 0x77d4a558 (TEST EBX, EBX)
                deplink: reg link on 3732510
            3732510: 0x77d4a555 (MOV EBX, [EBX+14])
                deplink: 3
                reglink: 3732423 (outdated, should depend on the sysenter instruction)
                memlink: 3730852
                controlink: 3732509
            3732509: 0x77d49835 (RET 1C)
                deplink: 3
                reglink: 3
                ESP link: 3732508 (ok)
                memlink: 3732502 (takes the return addr from ebp+4) ok.
                control link: 3732508
            3732508: 7c90eb94 (RET)
                deplink: 3
                ESPLINK: 3732505
                memlink: 3732505 (call pushes ret)
                control link: 3732507
            *** 3732507: 0x804df184 sysexit - (*** SHOULD NOT BE RECORDED!!! ***)
                deplink: reg link on 3732503 [does not look right]
            *** 3732506: 0x7c90eb8b (missing 0x7c90eb8d!!!!)
                deplink: 22 of them!
                ESP link: 3732505
                mem adv link: 3732500
                ...
            3732505: 0x77d49833 (call [edx])
                ESP link: 3732502
                reg link: 3732504
            3732504: 0x77d4982e (mov edx, 0x77fe0300)
                no dep link
            3732503: 0x77d49829 (mov eax, 11e8)
                control link: 3732502
            3732502: 0x77d4a550 (call 77d49829)
            Summary:
            [1] problem that caused the missing code: all bMultipleWrites && code 0x7c90eb8d (or 0x7c90eb8b) should be marked as propagating memory (needs all memory contents).
            [2] issue 1: during trace generation, 0x7c90eb8d should be recorded instead of 0x7c90eb8b
            [3] issue 2: during trace generation, sysexit should not be recorded
            [4] issue 3: eax, ebx, ecx, edx, esp should be recorded as updated
            Decision: fix [1] first.
9:00AM 05/04/2014
        [1] NEW ISSUES. AFTER [1] is fixed, still not working.
            Check these points: 0x01007509, 0x01004592, 0x0100462e, 0x01004550
            Problem is that the push instruction is still not included in the slice.
[2] trace into 3732506 and check each dep link
    memlink: 3732505, 3732500, 3732498, 3732497, 3732496, 3732495, 3732493, 3732456, 3732446, 3732490, 3732487, 3732477, 3731039, 3731040, 3731052, 3731068, 3731049, 3731048, 3731050 ...
    For ts: 3732495, eip: 77d4a536
    For ts: 3731039, eip: 77d4a5b3
    For ts: ...
    there are many layers of info forwarding, not sure which one.
[3] check how these first-layer dependencies are handled:
    problem: isTsInSliceWriteTo: the tsToMrM has no record at all.
    Because no one is actually reading from the memory that is WRITTEN by the ts.
    Now fixed.
9:00AM 05/04/2014
--------------------------------------------------------------------------------------
Task 229: trace collection problem.
--------------------------------------------------------------------------------------
[1] think about the change of adding all memory dependencies of a syscall/sysenter.
    think about a case where a printf is never needed.
    the syscall might write to some memory slots that the others depend on (kernel memory). These will be skipped.
    Otherwise, there will be no dependency on the printf. Should be ok.
[2] issue 1: during trace generation, 0x7c90eb8d should be recorded instead of 0x7c90eb8b
    Fix Trace::checkRecordStatus about context switch. bRecordEnable should still be true, delay one step.
[3] issue 2: during trace generation, sysexit should not be recorded
    Fix Trace::checkRecordStatus about context switch. bRecordEnable should still be false.
[4] issue 3: eax, ebx, ecx, edx, esp should be recorded as updated
    in InstrExecRecorder::expandFromRaw, if it's sysenter, update this->writeRegAccessMap.
8:45PM 05/06/2014
[5] implementation plan: address issues 1 and 2 in Trace::checkRecordStatus.
    [5.1] trace into the logic
        Problem: at 0x7c90eb8d bRecordEnabled is disabled (it should be delayed one step).
11:00AM 05/10/2014
    [5.1] fixed. 0x7c90eb8d fixed.
    [5.2] check how the sysexit is handled. break on Trace.cc:1885
        Seems to be ok. verify again.
        it's handled correctly.
    [5.3] verify issues 1 and 2.
        break at Trace::gen_slice_for ...
        set a breakpoint at eip 0x7c90eb8d and see how many links it has.
        [1] check the problem that the system receives a segmentation fault.
            cache->vecBlockIdx[] break.
            It's RecordRequestProcessor::loadFromCache that broke.
            Still got the segmentation fault, even after changing the mode. -- stuck at this point.
            In full trace mode, it is to load. But the load fails.
        [2] check if RecordRequestProcessor is ever saved. Verified.
11:30AM 05/12/2014
            Need to enable saving trace::rr_processor in BatchAnalyzer.cc
            Fixed
        [3] now verify if 0x7c90eb8d has the links
            DONE. working!
        [4] verify sysexit is not recorded.
            check if there is any instruction recorded for 0x804df184 or 0x80xxxxxx.
            Problem: sysexit not handled yet.
8:30AM 05/11/2014
    [5.4] check the handling of sysexit. [30 min]
        Set a breakpoint on 0x804df184 and 0x804d1f25 in Trace::handle_instr and checkRecords.
        Found the error: did not delay one step.
    [5.5] Fix and check again: [30 min]
        Observation: still not removing 0x80000000 range instructions completely, because 0x7c90eb94 returns to a 0x80000000 range instruction.
    [5.6] check how the interrupt event is handled [30 min]
        search for INT_EVT
        seg_helper.c:1219
        trace again on 0x7c90eb94.
        the interrupt is INDEED raised by 0x7c90eb94.
        Trace::handle_interrupt sets bJustReceivedInterrupt.
        It seems that 0x7c90eb94 generates a page fault and the CPU jumps to 0x806ecd34.
        it seems to be handling it right, ALBEIT the context switch should not have the one step delay.
10:30AM
    [5.7] fix the extra one step delay in sudden context switches for page faults. [30 min]
        debugging: check the 0x7c90eb8d case and the other switch case from 0x7c90eb94.
    [5.8] verify if the fix is working.
        Seems to work, no 0x80XXXXYY instructions are hit.
12:15PM
[6] fix the register issuance. break on InstrExecRecorder.cc:244
[7] generate the slices.
9:00AM 05/14/2014
[8] run all the slices in regular mode.
    Perfect! no exceptions at all!
    and some do pop up notepad.
    copy the file for diff test.
[9] run in debug mode.
    difference: none!
    PROBLEM SOLVED.
9:10AM 05/14/2014
--------------------------------------------------------------------------------------
Task 330: compare the slices for test_vm job
--------------------------------------------------------------------------------------
test_vm is: Themida
[1] problem: error cannot find instruction at 0x0
    change record time to 2 minutes and see how it works.
    It breaks at the first call to gen_slice_for_branch
7:00PM 05/16/2014
[2] debug into the problem.
    Problem: generate the 1st branch -> final step: writedataSlice
    write the last soc: 385751->385751
    write instruction 0x604000 file_offset 2105856
    the problem is that the eip is located at 0x604000, which is the last byte of the 3rd section.
[3] found the problem: 0x604000 may be an instruction which is completely out of the range.
    Copy the file and analyze how many sections there are.
    Found that there are two sections listed by QEMU:
        0x401000 (start in mem), size: 0x20000
        0x423000 (start in mem), size: 0x1e1000
    But as reported in IMM there is only one section: 0x400000, 0x205000, which includes the range.
    Check the PE header. (0x205000 is reported by the size of image).
    There are at least 5 sections listed, but only 3 sections are listed in QEMU.
    Check binWriter::getExecutableSections.
[4] finally the bug: it is in binWriter::writeSOC. it should NOT use vec[1] in the soc list.
11:30AM 05/17/2014
[5] new problem: crash at verifyAllNOPS.
    the 2nd visit of line 95 of binWriter.cc crashes the program.
    Problem is 0x42494e (it's regarded as NOP).
    [5.1] check in xp. Verified it's a 0xbb. The question is why it is written.
    [5.2] check why binWriter::writeInstruction is called and why it calls verifyAllNops.
        The problem is that the instruction written into the file does not match the one on disk.
        (gdb) x/5bx toWrite
        0x6f96dfe3: 0xba 0x00 0x30 0x02 0xf0 (mov edx, ...)
        ins @42494e: mov edx, 0xF0023000
        (gdb) x/5bx verifBuf
        0x85d4e3f4: 0xbb 0x01 0x31 0x03 0xf1 (mov ebx, f1033101)
    [5.3] check tomorrow what is at 0x42494e (set a hardware breakpoint).
        Verified: it's run-time code extraction.
    [5.4] Discussion: should we enforce writing it?
        [1] possibility 1: it could be the source and make the decoded instruction actually different.
        [2] possibility 2: keep it and let it be extracted. The problem is that program exit does not work.
10:20AM 05/18/2014
    [5.5] Check possibility 1, enforce writing. [30 min]
        It does not produce an executable file.
    [5.6] try possibility 2.
        Still got exception 0x80000004 at 0x424959
        In the original program, it's already changed to the extracted code; but in the sliced program, it's still not touched.
        So the problem is that THE SLICING ALGORITHM DOES NOT DEAL WITH THE SELF-PACKING ALGORITHM!!!
11:00AM 05/20/2014
--------------------------------------------------------------------------------------
Task 331: handle self-extraction in slicing algorithm
--------------------------------------------------------------------------------------
[1] algorithm design:
    idea: whenever handling an instruction, add a memory dependency reference on the last write.
    This can be added when generating the full trace.
7:00pm
[2] Implementation: in moc_mem_access, add the memory link.
    It still crashes. The problem is control flow.
8:45AM 05/21/2014
[3] verify the self-extraction is working. [VERIFIED]
    [1] change back to the 20 seconds version and see if it is working.
        There are occasional deadlocks. Check it later.
    [2] in InstrExecRecorder set a conditional bp and check if 0x424959 propagates to other instructions.
        Problem: one instruction at 0x424959 has established over 5 links.
        timestamp: 1038921, 1038990, 1038994, 1039998, 1039002
        eip: 0x424945, 0x424946, 0x424946, 0x424946, 0x424946
        So 0x424946 is hit multiple times.
        But it should actually be the instruction at 0x424943 which writes into these instructions.
        (verified, the problem is caused by a shift of 2 instructions in recording).
        So it is 0x424943 that writes to the same instruction (different location); the recording is actually right.
10:30AM
[4] check the generated slice again. [15 min]
    Does not work
[5] problem: the program hangs.
    It is blocked in os_main_wait_loop, when requesting the i/o lock.
[6] solve the i/o lock problem: set a global boolean variable bTerminated and check it before acquiring the lock
    [6.1] check the normal running of os_main_wait_loop
        found that there is a global variable called qemu_shutdown_requested
11:30PM
    [6.2] in BatchAnalyzer::stopvm set the qemushutdown_requested to 1; and in loadvm set the shutdown_requested to 0. [15 min]
        Actually just call qemu_system_reset_request. [not working]
        [1] add a global variable bVMTerminated. [5 min]
        [2] change the implementation of qemu_shutdown_requested and check bVMTerminated [5 min]
        [3] set the variable in loadvm and stopvm of BatchAnalyzer [10 min]
12:00PM
        [4] test: it seems that the init thread is running. It's just very slow.
            It seems that the threads are having a livelock.
            The mutex lock check is locking the system up.
            Displaying the lock, it shows that thread 10014 is the owner of the lock
            thread 10014 waits for a lock for mem_write (mylock)
            the lock is held by thread 10031, which does the analysis of slicing.
        [5] set a breakpoint on the thread of Trace::gen_slice (mylock). get its thread id first.
            [1] b BatchAnalyzer.cc:491
            [1.5] pthread_create
            [2] get the thread number
            [3] b mylock thread thread_id
            Found that the problem is caused by mylock of the thread.
            The task of slicing and generating the full trace is a long running thread.
            ** test: try to disable mylock in send_event
            unable to catch it.
        [6] test it again.
            Question: would disabling mylock in send_event cause synchronization errors?
            (one thread doing read/write mem but the trace/traceManager is deleted?)
9:00AM 05/22/2014
    [6.1] new problem: failed because of time out.
        Guess the full-trace timeout is too short. Redo.
    [6.2] it turns out to be gen raw trace: execNextTask crashes the entire thing.
        The problem turns out to be TraceManager::destroyInstance.
        The problem is caused by the termination of the main thread.
        So we should actually remove the is_shutdown_request() change in vl.c.
        Instead, insert the bVMTerminated logic into is_vmstop_request() in vl.c
        problem: still throws an exception on vmstate<VMSTATE_MAX
        change the state to PAUSED.
        Now seems to be fixing point 5.
10:00AM 05/23/2014
[7] it still hangs, because there are too many SOCs.
    Try to see if the modification to verifyAllNops caused the problem.
    Look at ::writeInstruction. Still no effect.
    Check what the problem is (is thread 1, the main thread, costing too much?)
    Verified that the main thread is working while the full trace is generated.
[8] test if it works on notepad.exe.
    the main loop thread is still executing.
    The generation speed is about 2 minutes per slice.
[9] make a modification in main-loop.c:422
    add a sleep statement, on condition of bVMTerminated.
    Note: move the actual definition to dummy.cc
    New experiment: speed is about the same
8:30AM 05/24/2014
[10] think about what caused the low speed.
    [10.1] disable the mem dependency and check.
        Verified: when the mem dependency is brought into the scenario, it makes it very slow.
9:30AM 05/24/2014
    [10.2] do the experiment again.
        There are only two SOCs resulting: 1153635 -> 1153636 and 1153627 -> 1153628, only 4 instructions involved?
10:30AM
    [10.3] recover the mock_readinstr and see what the SOCs are.
        the first loop that propagates the dependency takes forever.
        Takes about 15 minutes to generate the first slice (28764 SOCs!)
    [10.4] print the data and cache loading and see what the problem is.
9:30AM 05/26/2014
    [10.5] refine the print and check
        It looks like getESP is consuming a lot of time.
    [10.6] add a print_stats in the Cache class for debugging purposes. 15 min.
        there are about 1 million timestamps.
        Cache load is not that crazy however, most of the see if reset priority of thread is ok.
2:00PM
    [10.7] try adjusting the priority of the branch slice thread and see what happens.
7:45PM 05/27/2014
    [10.8] adjust the priority of the main thread as well and see what happens.
        Found that the original approach does not work. Priority is always 0.
        seems not quite helpful. Need to set the policy to RR.
9:00AM 05/28/2014
    [10.9] debug and check the effect of SCHED_RR.
        It seems to be of no use. Delete all thread config statements.
9:4rAM
    [10.10] research the problem again and check the bottleneck.
        use a sampling approach. Most of the time is spent on getting the ESP value.
5:00PM 05/29/2014
    [10.11] check the feasibility of improving the getESP_BEFORE productivity. [30 min]
        [a] read the logic of getESP_BEFORE [done]
        [b] think about the current solution of CachedMap [done]
            no need to use CachedMap
        [c] think about the solution.
            Idea: keep an array list (Cache) of ESP/EBP values to store the entire record of ESP/EBP value changes.
            Part of the records may be located on the disk, but the recently used records can be stored in chunks.
            When a request comes in, perform a binary search on the chunk.
    [10.12] Implementation Steps [Estimate: 2.5 hrs]
5:40PM 05/29/2014
        [1] declare class BinSearchTable. It includes a Cache, and allows operations to add a value pair. [5 min] DONE.
        [2] constructor BinSearchTable(char *cachePath) [5 min] DONE.
        [3] void addValuePair(long long int ts, unsigned int value). Mainly to call Cache::appendRecord [15 min]
        [4] long long int getSize() [5 min]
        [5] void saveToDisk [5 min]
6:00PM 05/29/2014
        [6] unsigned int value getValueForTS(long long int ts). Perform binary search on it [30 min]
            [6.1] retrieveRecord(long long int *ts, unsigned int *value) [15 min]
8:00PM 05/29/2014
        [8] unit testing 1. 1 -> N, 3 times the size of the cache [20 min]
        [9] unit testing 1 -> N, 3 times, but a prime number of entries. [25 min] DONE.
4:50PM 05/30/2014
        [10] add a faster version of quick_search using the latest loaded cache.
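A minimal in-memory sketch of the BinSearchTable idea from [10.11] and [10.12]. It assumes records are appended in increasing timestamp order and that a lookup should return the last recorded ESP/EBP value at or before the requested timestamp; the disk-backed Cache, the chunking, and the exact return convention of getValueForTS are not reproduced here, and the class name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sketch only: the real table is described as backed by a disk Cache.
class BinSearchTableSketch {
public:
    // Append one (timestamp, value) record; timestamps must be strictly increasing.
    void addValuePair(long long ts, unsigned int value) {
        assert(entries_.empty() || entries_.back().ts < ts);
        entries_.push_back({ts, value});
    }

    long long getSize() const { return static_cast<long long>(entries_.size()); }

    // Find the value in effect at `ts`, i.e. the latest record with record.ts <= ts.
    // Returns false when no record exists at or before `ts`.
    bool getValueForTS(long long ts, unsigned int *value) const {
        // upper_bound yields the first record with record.ts > ts.
        auto it = std::upper_bound(
            entries_.begin(), entries_.end(), ts,
            [](long long key, const Entry &e) { return key < e.ts; });
        if (it == entries_.begin()) return false;
        *value = std::prev(it)->value;
        return true;
    }

private:
    struct Entry {
        long long ts;
        unsigned int value;
    };
    std::vector<Entry> entries_;
};
```

Because only value changes are stored, one O(log n) search replaces a walk back through the trace for each ESP query.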
        [11] unit test it.
8:45PM 05/30/2014
        [7] Trace::constructEspChangeTable(bool bESP)
            [1] declare two BinSearchTables, one for esp and one for ebp [8 min]
            [2] in Trace::constructFullTrace from raw trace, build the binSearch table [15 min]
            [3] in Trace::loadTrace() deserialize the two binSearchTables. [15 min]
        [10] modify Trace::getESPAfter [10 min]
        [11] modify Trace::getESPBefore [15 min]
8:40PM 05/31/2014
        [12] debug: load trace problem. solved
8:50PM 06/01/2014
        [13] debug: problem of consecutive ts. UNSOLVED YET.
            inspect code. It seems ok. The problem is solved.
        [14] too slow in clear_inslice_tags.
            Found the problem: too many passes. Every time bNoMod is false.
            Find out who is writing to bNoMod due to the call of verify_all_socs.
7:30PM 06/02/2014
        [15] debug into verify_all_socs and see how it returns false on bNoMod.
            It seems that soc id 0 always gets false in its bModified; let's check what is going on.
            Found the problem: an extra i++ caused the problem!!!! (fxxx!) introduced by testing code.
        [16] now back to the problem: cannot read byte at specific address.
4:00PM 06/03/2014
            [16.1] check the "EIP" it is writing into. give up the write if the address is NOT right.
                Problem: read byte 512 in section 2 (0x604000).
                the problem is that the real size (in mem) is 0x1024 but the in-file size is 512.
                So need to make two updates. First, read the in-file size and then control the loop.
                [a] fix sectionInfo
                [b] fix the loop on reading byte/writing byte.
                Seems to be fixed. Need to verify. Very slow though.
        [17] generate 10 slices and run them.
4:00PM 06/04/2014
        [18] it turns out that slice 3 is still too slow. Check it again.
            in Trace.cc:1279 change i to 3
            Pass2: 7:48PM - 8:07PM (20 minutes)
            Pass3: 8:08PM - 8:27PM (20 minutes)
            found that there are 157545 SOCs. Too many.
            Most time is spent on verify_desc_soc.
            Pass4: 8:07-8:49PM
            Pass5: 8:49PM-
5:00PM 06/05/2014
        [19] try to improve it: remove verify_desc_order()
            pass2: 8:25PM - 8:39PM (14 min)
            pass3: 8:40PM - 8:50PM (10 min) improved half the speed
            pass4: now analyze each step.
                clear_tag: 8:51PM - 8:54PM, 3 min
                init_data_slice: 8:55PM - 8:55PM, 1 min
                slice_all_soc: 30 seconds
                add new ts to soc manager: 8:56PM - 9:05PM, 10 min
                verify_and_reset: 9:09PM - does not take much time.
4:00PM 06/06/2014
        [20] check it again and record the number of SOCs after verify_and_reset.
            pass1: smsize: 157210
                first SOC tsStart = 1153664, tsEnd = 1153668 -> 157202
            pass2: 157457 (after adding ts) (added by the previous rounds) -> 157457
                92809 modified
                many consecutive SOCs modified
            pass3:
        [21] analyze the bottleneck of the for loop of addSOC.
            random sampling:
                ii->set_inslice - 3
                sm.addTS (findSOC) - 15
            So socmanager.addTS findSOC is the most costly operation
        [22] try to improve SOCManager::findSOC
            after improvement of findSOC
            pass2: 9:59PM -> 10:02PM, 2 minutes!
4:00PM 06/07/2014
--------------------------------------------------------------------------------------
Task 332: Speed-up
--------------------------------------------------------------------------------------
[1] replace the sample programs.
[2] improve clear_init_slice.
    [a] in InstrExecRecorder add an integer pass attribute (5 min) DONE.
    [b] modify the interface of setInSlice and add the pass number. (10 min) DONE.
    [c] clear all related syntax errors (20 min) DONE.
    [d] modify the isInSlice interface and implementation and clear all syntax errors (20 min) DONE
    [e] modify clear_init_slice (10 min) DONE
    [f] handle serialization DONE
    [g] unit testing [15 min] DONE.
[3] test on b21.exe (there are lots of exceptions). Must be introduced by the improvement.
[4] run on the themida job first and see if it improves.
4:00PM 06/08/2014
[5] debug into the themida job and look at the performance.
    pass 2: sm.getSize() 157204.
    full_slice does not take much time.
    Problem: more and more passes
[6] debug into one pass and check the time.
    found that one addTS operation could take a lot of time. There are too many SOCs, and merging them takes too much time.
8:00PM
[7] heuristics: if pass > 5 or the number of SOCs is greater than 1000, don't do SOC, do one slice directly.
4:00PM 06/09/2014
--------------------------------------------------------------------------------------
Task 333: Debug the Speed-up Algorithm
--------------------------------------------------------------------------------------
[1] use b21.exe
    Found problem: brc_1
[2] IMM broke at b21.exe of brc_1. (actually it broke on all of the EXE files)
[3] use IMM to debug IMM.
    It looks like a path problem, too long.
[4] cannot figure out why. Use windbg to debug it. brc_0 works.
[5] use HxD to perform a binary diff between the correct version and the wrong version.
    The first difference occurs at file offset 0x400 (it is the start of the .text section).
    The first byte diff is at 0x405 (file offset), which corresponds to 0x00401005 (jmp MAIN instruction)
    It is changed to NOP.
    but it does not seem to explain why IMM crashes on it.
    One observation: why is a NOP placed????
[6] place the faulty b21.exe as b22.exe in the same folder. IMM does not crash anymore.
    It seems to be the path problem.
[7] verified. Now need to copy the file to the ...\sdk path to do the check.
[8] comparative study of the two versions. Find out where it breaks.
    The problem: 0x4014A0 (jump Start) is not included.
[9] debug into the slicing algorithm, line 1228
    source seed eip: 0x40134e
    the init_data_slice looks pretty normal
    check the full_slice_all_soc now
    Strange: the soc is only 126949->126945.
4:30PM 06/10/2014
[10] verify the only-SOC problem: only ONE SOC identified. small range.
    Observation: handleProgramEntry does insert a new SOC at the program entry.
    the bridge for the first SOC is 0x40616e
    The first instruction is 0x40149b.
    Problem is that II is identified as not in slice.
    Check handleProgramEntry again and check why II is NOT identified in slice.
    Problem is that inSlice(pass_no) returns true, but ii is not in slice.
    See the problem: need a deep clean which clears the ts of all IERs.
3:30AM 06/11/2014
[11] implement clear_init_slice_deep.
[12] debug into slice 1 and check handleProgramEntry.
    Removed the problem. But there are occasionally other exceptions.
[13] generate the full slice for b21.exe.
[14] new problem: too many timeouts.
3:00PM 06/11/2014
--------------------------------------------------------------------------------------
Task 334: Fix the timeout problem
--------------------------------------------------------------------------------------
[1] find which slice times out. id = brc_4.
[2] comparative study of brc_4.
    Problem: the RET of 0x4019E4.
[3] find out the ts of brc_4. ts is 0x405fcb. It's called by _mtinit
8:48AM 06/13/2014
[4] check how many times processFunction is called.
    It is called many times; need a conditional bp.
[5] set a conditional BP so that it stops when processFunction is called for 0x4019E4.
    processFunction is not called at all.
[6] so we do have to trace back from 0x405fcb.
[7] display the last EIP included in the trace.
    There are 8 SOCs being processed. The last two are:
    : /x lastEIP = 0x406183
    1: /x lastEIP = 0x406174
[8] check what is located at 0x406174. The question is WHY 0x406174 did not trace to
[9] the problem is with the call at 0x401341 (call PRE-LOG)
    0x4019a0 to 0x4019e4 (call of pre_log)
[10] the problem is the handling of 0x401348 (cmp instruction): why does it not lead to the RET instruction at 0x4019e4?
    Problem is that 0x401348 is not hit at all?
9:00AM 06/14/2014
[11] debug 0x401348. In previous rounds get the ts of 0x401348, and then in the 4th round
    check if it is in slice, and if it is, how it is set.
    ts is 126945.
    At the beginning of the 4th round, during pass 1: it is not set.
    After all the passes, it is still not set.
    Strange: after debugging, it actually generates a correct slice.
    The address 0x401348 is overwritten with a jump instruction correctly.
[12] regenerate slices and observe behaviors.
    Still found a lot of problems. generate 8 slices.
[13] the problem now starts from brc_5. Check what the problem is.
    New problem: there are actually two problems.
    First, the ESP value is not maintained correctly, and then the RET at the end of the function security_init_cookie is not included in the slice.
    Decision: check the ESP problem first.
    Analyze instruction 0x4061A2. Look at why its ESP dependency is not propagated.
[14] debug 0x4061a2. set a conditional BP in Trace::full_slice.
    verify first if it generates the bad exe file.
    Problem: it's not producing exactly the same exe file after the code is changed.
    A second time, it reproduces the error trace.
    Try it again. Problem: 3rd time, it does not work again.
    Have to record the trace first and then reproduce.
    verified twice.
    Now it's the problem of 0x4061a2, which does not propagate ESP properly. check why.
11:00AM 06/14/2014
[14] check 0x4061a2
    it's in slice and bEspProcessDelay is true.
    The ESP dependency leads to ts: 126842 (eip: 0x40619e).
    It is ignored because the SOC starts at 126843 (eip: 0x40619f).
[15] check the list of SOCs.
    At pass4, soc 9 (starting from 0, second last soc); there are 11 socs.
    126843 (eip: 0x40619e), tsEnd = 126906 (
    vec10: 126832 (eip: 0x406174), tsEnd = 126833 (also: 0x406174??)
    The problem is then, why is the first call included?
    It is called because of the setting of the last SOC.
    When pass exceeds 5, there is ONLY one SOC. Then it is done that way.
[16] check the slicing in the last SOC and see how it would become like this.
    Now in the last SOC, the esp link does propagate to 126827 (0x40619e)
    Now look at the handling of timestamp 126827 (0x40619e): it has an esp link.
    propagates to eip: 0x406182, ts: 126837
    Is it set in slice? Strangely ier is in slice but ii is not!!!
    Check the handling of 0x4061a2 and see how tsTarget is set in slice: only setEspProcessDelay is called.
    A problem of the pass number approach: since flag is not cleared.
    if in another pass a setEspDelay is called, it will be used to update the pass number, and then the other inSlice flags will be confused!!!!
    needs to be fixed.
3:45 06/15/2014
--------------------------------------------------------------------------------------
Task 335: Fix the bug on pass_no
--------------------------------------------------------------------------------------
[1] add attributes and fix all functions. [30 min]
[2] unit test. OK.
[3] generate 10 slices.
[4] new problem: a lot of 0xC00005 errors. Check brc_5
    comparative study: break at 0x4013b5.
    It seems that the generated bridge is WRONG.
    At 0x40601E it tries to jump to 0x406011, which is in the middle of an instruction.
[5] break on brc_5 and check how the slice is generated.
    [a] what is the slice target?
        the target slice is 0x40601E.
        Strangely, findHole locates 0x406011 as the starting address of the hole.
9:00AM 06/17/2014
[6] continue debugging: break on brc_5 and check writeProgramExit.
    b Trace.cc:1247
    Found the problem. The hole_start is in the middle of an in-slice instruction.
    Strangely, trace->has_instr(eip, opcode) should actually have checked it.
    buggy implementation.
10:00AM
[7] proposed fix: [20 min]
    introduce two variables. If the instruction is in the trace then set the block.
10:30AM
[8] test the fix. check brc5. [15 min]
[9] generate 30 slices and test.
    There are still quite some c0005 errors.
[10] check branch br_11
    break at _mtinit
    break at 00406112
    Problem: 00405ccb (push instruction) is not in slice.
    This leads to 0x4019AC not reading the right instruction,
    which relies on 0x405cc9. 0x405ccb is skipped and leads to the problem.
[11] debug: break on br_11, and then set a conditional bp on 0x00405ccb and see how it's propagating the dependency link. pay attention to SOC.
    Observation:
    tsSocStart: 130256, tsSocEnd: 144667
    tsCur: 142345, so it's not the soc problem. check why.
    (a) 142345, eip: 0x4019ac. depends on: espLink: 0x4019a5 ...
    (b) check eip: 0x405cd0 and check its dependency. (ts: 142342)
        depends on: espLink
        did not propagate because it is not bEspProcessDelay.
    (c) check 0x4019a0. espLink: 142342.
    Problem: the ESP flag is NOT set correctly!
    Found the problem. ier->setInSlice does an extra cleaning job.
[12] new problem: a lot of process kills when running.
    slice brc_18. 0x103 error code.
    Problem: 0x004013D5 is not included in the slice; 0x004013DF depends on it.
    This leads to the problem that it does not jump to 0x004013DF.
8:45AM 06/18/2014
[13] break on brc_18 and check the dependencies of 0x4013DF; it should have a dependency on 0x4013D5 (jnz). [20 min]
    did not hit.
9:00AM
[14] find out the last ts that is hit in brc_18. [15 min]
    The slice starts from 0x4070d6.
    0x4013f4 (setargv) -> 0x4057e8 -> 0x40724b
[15] check 0x4013f4.
    0x4013f4:
        [1] esp link: 0x405924
        [2] 0x4013ef set need visit.
    0x4013ef:
        [1] reglink: ignored because no data propagation
        [2] control link on 0x405924: ret of getEnvironmentStrA
            process the control link and find that the function can be skipped.
            set the previous instruction as need visit: 0x4013e5.
    0x4013e5:
        [1] reglink: ignore
        [2] control link on 0x7c812c92.
            process the function and skip the function. set need visit on 0x4013d5 ***
10:10AM
    0x4013d5:
        It is not in slice because it is set to be NeedVisit only.
        It does propagate the data dependency (conditional on reg values) and sets the previous TEST EAX INSTRUCTION into the slice.
    Conclusion: bug: if the instruction before a function call is a JUMP, it is not right to set it only to needvisit; it should be directly in slice. Otherwise, the control link will not be properly propagated.
10:40AM
[16] fix: in processFunction, if the dependee is a control transfer, then set it in slice.
[17] debug case 18. propagates to 0x4013d3.
[18] fix setIER_II
[19] debug case 18 again.
[20] test 20 cases. all good.
[21] try all slices. same except slice 95. Could be caused by the response to the keyboard.
[22] generate the themida slice. too slow.
    stuck on slice 12.
9:15 AM 06/19/2014
--------------------------------------------------------------------------------------
Task 336: check the speed problem of slice 12.
--------------------------------------------------------------------------------------
[1] repeat the experiment: set bp at Trace.cc:1297, modify i to 12.
    check how it slows down:
        read 2099717
        set inSlice: 2099667, 1049079, 1049083, 1049091
        read 2099736
        read 2099738
    block switch occurs: 4179912, 1608840
    set slice of 1048523, 2516929, ...
    because the order of toProcess is random.
[2] add an integer and check the number of processing items.
    tid: 1, 3, 16545, 26009, 29385, 29803, 78528, 78688, 79202, 79437, 79672, 80103, 80109, 80110, 80116, 80121,
    It seems that it's thrashing.
[3] improvements that can be made:
    Make toProcess a heap instead of a FIFO.
    ts: 17591, 256333, 5435259, 5435259, 760082, 5435259
    tid: 1034259, 1034260, 1435475, 1479031, 1479114, 1600447, 2655052, -> end: 2794517
    time: 2 minutes, end: 2794547
[3] improvements: avoid duplicate id processing.
    It actually never hits. the priority queue already handles it.
[4] more improvements to make
    [a] line 369 of Cache.cc can be improved.
        time: 1 minute. saves about half.
[5] observation: init_data_slice is CALLED many times! SOCmanager.cc:61
[6] observation: about 1 hr to 1.5 hrs per slice.
    Desperately need to analyze the performance.
8:50AM 06/20/2014
[7] verify the 7 slices generated. 2 are ok and 5 generate exceptions.
[8] analyze the first branch which generates the error.
    The first branch ends about 30 instructions into execution.
9:00AM
[9] compare it with the correct version.
    The problem is that the instruction at 0x00604062 is skipped (while strangely the instruction before it, also used for self-extraction, is kept).
9:30AM
[10] check the current implementation and see if it works for job1.
    seems fine, no bugs introduced.
10:00am
[11] check the algorithm: if there is no SOC handling at all, what's the speed?
    full slice still takes more than 20 minutes
[12] check the most time-expensive operation by sampling:
    tsCur recover: 2
    Trace.cc:949: 8
    add link: 1
[13] plan: add priority_queue to processing.
    1 add counter. add Util::clock functions.
12:00pm
[14] Util::clock:
    [1] startClock. 10 min.
    [2] endClock. 5 min
    [3] getDuration. 5 min
    [4] add counter. 5 min
[15] run and collect data:
    [1] slice 12: first run of full_slice:
        (gdb) p sec
        $3 = 1477.6700000000001
        (gdb) p cntProcessed
        $4 = 5049499
        ===== second run ===============
        (gdb) p sec
        $6 = 1533.54
        (gdb) p cntProcessed
        $7 = 5049499
[16] improvement: add a priority_heap so that each timestamp is processed.
    Refactor the code. Needs a lot of coding; do it later.
9:00AM 06/21/2014
[17] Implementation:
    [1] Modify trace.h and change InstrExecRecorder [10 min] DONE
        setInslice, setEsp, setNeedVisit etc.
    [2] Change the implementation and correct all syntax errors [30 min] DONE
    [3] run unit testing [10 min] DONE.
10:00AM
    [4] run on job1. [15 min]
        [a] remove the bug of duplicates.
        Around 8 c0005 errors. Acceptable (out of 147 slices)
    [5] test slice 12 of themida. [15 min]
        (gdb) p sec
        $4 = 1541.9200000000001
        (gdb) p cntProcessed
        $5 = 4416542
        Strangely, did not improve the time
        (gdb) p sec
        $7 = 1498.9000000000001
        (gdb) p cntProcessed
        $8 = 4416542
        Does not gain much.
[18] proposal of another improvement. Remove the reloading.
9:30AM 06/22/2014
[19] Implementation:
    [a] make a snapshot before proceeding
    [b] idea: declare a bunch of local variables (boolean) to represent the results of function calls and make copies of the dependentLinks.
9:50AM
    [c] details:
        [c1] add a copy constructor to dependLink [8 min] DONE.
        [c2] declare an array of dependLink of MAX_WRITE_LINK [5 min]
        [c3] do the copy [5 min] DONE
        [c4] declare all the boolean variables for IER functions [40 min]
            [a] mem link. DONE
            [b] reg link. DONE
            [c] esp link. DONE
            [d] ebp link. DONE
            [e] control link.
            DONE
    [d] check the effect
        b Trace.cc:1315
        b Util::getDuration
        (gdb) p sec
        $2 = 1460.95
        (gdb) p cntProcessed
        $3 = 4416542
        Verified: slight change.
    [e] verification: ok, around 8 c0005s out of 108 slices.
    [f] observation: most GDB ctrl-c stops land in the handling at the end of the control link.
9:30AM 06/23/2014
[20] improve the handling of links
    [1] declare the FastCachedMap class (template), takes long long int [30 min]
        [a] constructor takes capacity [5 min]
        [b] hasKey [10 min]
        [c] get(long long int key) [5 min]
        [d] add(key, value) [8 min]
    [2] unit test it [20 min]
10:00AM
    [3] improve the implementation [20 min]
    [4] debug the implementation [15 min]
        [4.1] issue 1: the place where it is cleared is wrong.
        Achieved around 10x improvement
        (gdb) p sec
        $22 = 154.22
        (gdb) p cntProcessed
        $23 = 4416542
    [5] run the algorithm on job1 and see if it works.
        around 14 c005 out of 110. ok.
[21] try to improve the handling of deep cleaning of the cache.
    The problem is that instrStore is already cleared. Check if it caused any errors.
9:15 AM 06/24/2014
--------------------------------------------------------------------------------------
Task 337: check the speed problem of slice 12 --> further improve speed of clear_init_slice.
--------------------------------------------------------------------------------------
[1] try to understand why no deep clean is not ok. generate the slice. [30 min]
    [a] break at brc_2.
    [b] analyze the difference.
        Problem: instruction 0x4014a0 is not included
    [c] trace into trace generation and see what the last EIP is.
        lastEIP is 0x4014A0.
        Now the problem is why it is not written into the slice.
        Use the following to find out its ts:
        p this->getInstrInfoID(0x4014a0, 0xe9)
        $3 = 45962
        Then call ii->loadFromCache(45962)
        When calling ii->isInSlice() it returns true.
    [d] continue to check why 0x4014A0 is not included in the slice when written to file.
        Problem: the instruction is actually written! strange.
        So we are looking at the wrong slice. The generated slice did work.
        UNDERSTAND THE PROBLEM.
        The problem is that the previous pass POLLUTES the next slice! (because the pass number still starts from 0!)
[2] idea: change pass no from "char" to int. This is going to increase the data stored for each instruction by about 12 bytes (three 32-bit words). Should be worth it. If there are 4 million instructions * 12 bytes = 400MB. No good.
    alternative design: store a short integer slice number (2 more bytes).

10:40AM 06/24/2014
[3] implementation:
    [1] declare a short int slice_no, init to 0. [5 min] DONE.
    [2] modify the serialization and unit test it [15 min] DONE
    [3] add a STATIC SHORT INT slice_no and provide a static function to set it [8 min] DONE
    [4] modify deep_clean to reset slice_no, and write it [10 min] DONE.
    [5] modify gen_slice to take a slice number and change the static slice number [10 min] DONE.
    [6] at the beginning of gen_slice, if the pass_no is 0, do a deep clean [5 min] DONE.
11:05AM
    [7] modify each isXYZ function: if slice_id is smaller, return false. [15 min]
    [8] inspect InstrExecRecorder code again. [15 min]
11:30AM
    [9] generate 30 slices and see the result.
    [10] found problem. First slice: brc_2: error 0x080003.
    [11] debug: generate slice 2 and see what's the problem. When it's generated separately, it's ok.
    [12] problem: the instruction at 0x00401341 depends on 0x0040133c but it is ignored. Debug it.
         set a conditional breakpoint at Trace.cc:738 if ier->eip==0x00401341
         The link is an esp link on 0x40133c. setEspDelay is set properly. ier->isInSlice(pass) is true, but ii->isInSlice() is false.
    [13] check it again. Strangely, after setEspDelay, isInSlice() is true.
         Found the problem. Before round 2, pass_no_inslice is set to 2 by the previous branch generation.
         Now when slice_id is updated by setEsp but the pass_no is not updated, it leads to a wrong result.
2:00PM
    [14] whenever slice_id needs to be set, check if this is the first time it is set (for this slice) and clear the pass_no. Declare a function for it.
    [15] debug slice 2. OK.
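The slice-number stamping in steps [3][1]-[14] above can be sketched as follows. This is only a minimal illustration of the idea, not the actual InstrInfo/InstrExecRecorder code: the class name `InstrFlags` and the members `touch()`, `fresh()` and `passNo()` are hypothetical, and serialization is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-instruction record (names are illustrative only).
// Flags written while generating an older slice are treated as unset,
// so no full deep clean is needed between slices.
class InstrFlags {
public:
    // Static slice number shared by all records (step [3][3] in the log).
    static void setCurrentSlice(uint16_t s) { current_slice_ = s; }

    void setInSlice(uint8_t pass) { touch(); in_slice_ = true; pass_no_ = pass; }
    void setEspDelay()            { touch(); esp_delay_ = true; }

    // Each isXYZ query returns false when the stored slice number is older.
    bool isInSlice()  const { return fresh() && in_slice_; }
    bool isEspDelay() const { return fresh() && esp_delay_; }
    uint8_t passNo()  const { return fresh() ? pass_no_ : 0; }

private:
    bool fresh() const { return slice_no_ >= current_slice_; }

    // First write in a new slice: drop every stale flag, including pass_no
    // (the bug found in step [13] was a pass_no left over from the last slice).
    void touch() {
        if (!fresh()) {
            slice_no_ = current_slice_;
            in_slice_ = false;
            esp_delay_ = false;
            pass_no_ = 0;
        }
    }

    uint16_t slice_no_ = 0;
    uint8_t pass_no_ = 0;
    bool in_slice_ = false;
    bool esp_delay_ = false;
    static uint16_t current_slice_;
};

uint16_t InstrFlags::current_slice_ = 0;
```

The design choice here is that every setter goes through one helper, so a setter such as setEspDelay cannot refresh the slice number while leaving an old pass number behind.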
    [16] gen 100 slices and see how they work. Around 15 c0005 in 111 slices.
    [17] check setInSlice implementation (clear other flags part). It actually gives the same number of c0005 errors.
    [18] now check the effects on the slice.

9:15 AM 06/25/2014
--------------------------------------------------------------------------------------
Task 338: check the speed problem of slice 12 --> further improve speed of other factors.
--------------------------------------------------------------------------------------
[1] timing all components.
    init_data_slice: about 1.5 minutes
    full_slice_soc: about 3 minutes
    loop that adds TS: about 1.5 minutes
    init_data_slice again: 1.5 min
    full_slice_soc again: about 3 min
    writeDataSlice: about 2 minutes
    report_stats: 1 minute
    Total: > 15 minutes
    full_slice_soc is still the most expensive one.
[2] sampling on full_slice
    recover tsFirst: 1
    mem link: 1
    dependLink copy: 1
    toProcess.pop: 2
    delayRegDependency: 1
[3] check if gen_slice can be improved. Make a change to the heuristics on break on soc.

8:45AM 06/26/2014
[4] check the improvement
    init_data_slice: 73 seconds
    addTS: 327 seconds (too many initDataSlice)
    init_data_slice again: 399 seconds
    full_slice_soc again:
    check entire efficiency:
    9:10AM slice 12
    10:11 slice 19 (around 10 minutes each)
    11:06 slice 21 (around 30 minutes each)
    4:04pm slice 38 (around 20 minutes each)
    7:20pm slice 69 (around 6 minutes each)
    8:50pm slice 86 (around 6 minutes each)
    8:43AM slice 174 (around 6 minutes each)

8:45AM 06/27/2014
[5] test the slices generated.
    [1] most are exceptions.
    [2] the debug version generates a lot of Int3 interrupts, which stop windbg, but there is no "normal" 0x1122 difference between the two versions.
    Will solve the slice problem later. For now, concentrate on the speed problem.
9:30AM
[6] add a class called timer to project. Move startClock, endClock and getDuration() to it and fix all compiler errors.
[20 min] 9:40 [7] add timer for init_data_slice, addTS, full_slice 9:50am analyze the stats on slice 12. [6] add the data stats to init_data_slice and all other relevant functions. init_data_slice: 67 seconds identifySOC: 9 seconds, 9 seconds, 8 seconds, 0.01 seconds, ... Algotether: 173 seconds writeProgram: 40 seconds. full_slice: 143 seconds. gen_slice: takes 439 seconds. 10:45AM [7] try to improve init_data_slice [7.1] measure init_data_slice performance 3 times. [20 min] 46.93, 57.32, 49.75, 57.2, 54. Avg = 53 [8] improve init_data_slice by adding boolean flags [20 min] remove the reload and the current link 68.93, 56, 63, no good , why? 58, 56, drop two extra see the result. 56.93, 53. not much improvement though. [9] try save one updateCache(). does not work. 3:15PM [10] improve the update only if changed policy. see if it helps. why only 0.03 seconds? needs to check. [11] fix error in clear_slice and try it again. [11.5] unit test. [11.6] found that need to do a deep clean of the slices for each test iteration! 60 seconds. [12] verify if the algorithm change. OK. [13] check the speed. improved a lot but suspicious. 9:00AM 06/28/2014. [14] check validity. Too many exceptions. Only 2 out of 39 are correct ones. Most exceptions are 0xc000001d. (illegal instruction). This must have something to do with code extraction. 9:00AM [15] compare brc_0 and real .exe [30 min] IMMM breaks on the brc_0. Use windbg instead. windbg fails as well. Check the program header and see what is wrong. cannot see much. [16] check the IMM crash problem. Use IMM to debug IMM. It broke in a strlen function call. Found the problem. It may be because of the name t0.exe.exe. Problem solved! Never have two .exes! [17] compare brc_0 and real themide.exe 9:30AM Problem: 0x00604030 depends on 0x0060402b and it is not included. [30 min] Observation: Note: arrCOPY can first save the time. It did call setEspProcessDelay on 0x0060402b. It seems that pass_no is not cleaned. 
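The timer class from steps [6] and [7] of Task 338 (startClock, endClock, getDuration, plus a counter) could look roughly like the sketch below. The method names come from the log; the use of std::chrono::steady_clock, the accumulation across several start/end pairs, and the `count()` accessor are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>

// Minimal accumulating timer. Each startClock()/endClock() pair adds to
// the total, so one Timer can cover repeated calls of a single phase
// (e.g. every call to init_data_slice within one gen_slice).
class Timer {
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

public:
    void startClock() {
        start_ = Clock::now();
        running_ = true;
    }

    void endClock() {
        if (!running_) return;  // ignore an unmatched endClock()
        total_ += Clock::now() - start_;
        running_ = false;
        ++count_;
    }

    // Accumulated time in seconds.
    double getDuration() const {
        return std::chrono::duration<double>(total_).count();
    }

    // Number of completed start/end pairs.
    long count() const { return count_; }

private:
    Clock::time_point start_{};
    Clock::duration total_{0};
    bool running_ = false;
    long count_ = 0;
};
```

One Timer per phase (init_data_slice, addTS, full_slice) would then give a per-phase breakdown like the ones recorded above without stopping in gdb to print `sec`.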
[18] fix deep clear
     Observation: fixed the previous problem. It broke on code extraction again.
[19] fix the arrLink copy first.
     data initial: 106 seconds, 101 seconds, 102 seconds
     after change: 101 seconds, 102 seconds
[20] check the new problem. The slice skips the simple decoding at 0x00604060.
     tmp breakpoint: 0x004247cf.
     Problem: 0x00424949 (instruction MOV EAX, 0X48692121) is treated as NOP instructions when writing slices. Check the algorithm.
     Found the problem. It is OVERWRITTEN by 0x90 in clearAllSections because it's not in the slice.
     related logic: find code extract at verifyAllNops
[21] proposed fix: at clearAllSections, read the byte at the place; if it does not match the opcode of the instruction, then give a warning and skip.

9:15 AM 06/29/2014
--------------------------------------------------------------------------------------
Task 339: check the incorrect slice problem
--------------------------------------------------------------------------------------
[1] find out the branch. The branch eip is 0x424974. It is a self-extracted instruction which is changed by the code at 0x424943 to 0x424947.
    Broke at 0x004247d9 -> 0x004248d3
    The problem is that 0x004248d3 is replaced with a "JMP 0x004247ed".
    Now, 0x004247ed is identified as a self-extraction area and it is not touched.
    The jmp instruction is likely to be one from the writeExit case.
[2] find out who's generating the JMP instruction at 0x004248d3.
    //set a breakpoint at asJMP and watch its target addr
    It's part of writeProgramExit. 0x004248d3 is identified as the first jump addr.
    eipExit: 0x004247d9 is identified in findHole
[3] need to update the logic of findHole (should avoid the self-extract part).
    The problem is that the area is identified as a proper area for placing the program exit because:
    [1] it is not an instruction executed before (yes, ok), and the byte is 90 or 00.
    But it lacks the check that it will be OVERWRITTEN.
    To fix: add another condition for checking overwrites.
    This needs help from run time trace recording. TO DO!!!!!!

--------------------------------------------------------------------------------------
Task 340: handling of self-extraction. Identify the memory range that is self extracted.
--------------------------------------------------------------------------------------
[1] Design:
    [1] Changes on Trace
        [a] add a string executable_path to Trace data [5 min]
        [b] change Trace constructor() and the TraceManager::isProcessToBeTraced [15 min]
        [c] verify raw trace recording is ok [15 min]
        -- seems no need for full-trace to know about it
    [2] Trace has a mem range manager which keeps track of addrInAny section for each memory write.
        [a] add a Cache to MemManager and a constructor [10 min]
        [b] add MemManager::saveToDisk [15 min]
        [c] add MemRangeManager::loadFromCache [15 min]
        [d] unit testing [30 min]
[2] implementation
    [1] implement MemManager
        [a] add a Cache constructor [8 min] DONE.
        [b] add MemManager::saveToDisk [15 min] DONE.
        [c] add MemManager::loadFromCache [15 min] DONE.
8:50AM 07/01/2014
        [d] unit test. Create 10000 and use a small buffer. DONE.
            Problems: need DEBUG! Found the problem: saveToDisk() is called twice.
        [e] found problem with addRange.
        [f] fix unit test problem. Fixed. Note: the efficiency of memRangeManager needs some improvement when the number of items > 1000.
    [2] add executable_path to Trace data and constructor [15 min]
10:20AM
    [3] debug on Trace() and verify the constructor is working. [10 min]
        ok but mem crash. Recompile. ok. Just need to recompile the entire thing.
11:15AM
    [4] add memRangemanager to Trace
        [1] add data definition [5 min]. DONE.
        [2] in Trace() and destructor create a Cache and create the memRangemanager. [15 min] DONE.
        [3] debug it. [5 min] DONE.
        [4] add vecSections into trace in constructor [15 min] DONE.
        [5] debug it [5 min] DONE.
        [6] in Trace::handle_mem_write use vecSections to tell if it is a write into a section. [15 min] DONE. OK
        [7] debug.
        [8] copy memRangemanager in full_trace.
[15 min] [9] debug problem: always broke on delete [] ok. solved. 8:30AM 07/02/2014 [5] verify if vecSecs is set up correctly on branch slice [20 min] [5.1] need to modify loadFullTrace. DONE. 9:10AM [6] in binWriter::findHole supply a new parameter of memRangeManager [a] add function to memRangeManager:: bool hasAddr [15 min] DONE [b] change definition and implementation of findHole [10 min] [c] debug findHole and see how it works. [15 min] [d] test [15 min] NOW It's working [e] more testing: generate 20 slices and see the result. [7] found new problem: slice 12 c0005 11:00AM -------------------------------------------------------------------------------------- Task 341: check problem of slice 12 (0xc000005 problem) -------------------------------------------------------------------------------------- [1] compare the slice. Found the problem: 00604062 is not included but the instruction that writes to the same location 00604060 is included. This leads to the "jmp" instruction at 0x00423014 is not processed right. [2] re-generate the slice 12. and check the processing of 0x00423014 problem: bNoDataPropagation is identified becaues the instruction does not need anything (it's a simple jump fixed) fix: whenever an instruction is the result of self-extraction should propagate all data dependency! [3] new problem: 0x00423014 is not in any section! because it's out of section range! There must be a bug: [1] 0x00423014 is located in section "lgksxcmc" (size 0x1e100 and start: 0x00423000). [2] but it is not in the self-extraction mrm! 5:30PM 07/03/2014 TO DO - check how mrmSelfExtract did not include 0x00423014? -- SO there is a bug!!! [1] start in raw mode and see if it's captured. Problem: 0x423014 is destroyed when save to disk [2] set a conditional breakpoint. Strangely, it did not capture it! Shoot, found the problem. It's the destructor! 8:30AM 07/04/2014 [3] generate 20 slices and check it. Problem: generation stops at slice 39. 
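The memory range bookkeeping used in Tasks 340 and 341 (record every written range, then ask `hasAddr` before placing a hole) can be sketched as below. This is not the project's memRangeManager: the class name `RangeSet`, the half-open range convention and the std::map storage are assumptions. A sorted map also keeps lookups logarithmic, which may help with the slowdown noted above for more than 1000 items.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Set of disjoint half-open address ranges [start, end).
// Overlapping or touching ranges are merged on insert.
class RangeSet {
public:
    void addRange(uint64_t start, uint64_t end) {
        if (start >= end) return;
        auto it = ranges_.upper_bound(start);  // first range starting after 'start'
        if (it != ranges_.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= start) {       // previous range overlaps or touches
                start = prev->first;
                end = std::max(end, prev->second);
                it = ranges_.erase(prev);
            }
        }
        // Swallow every following range that begins inside the new one.
        while (it != ranges_.end() && it->first <= end) {
            end = std::max(end, it->second);
            it = ranges_.erase(it);
        }
        ranges_[start] = end;
    }

    // True if addr lies inside any recorded range.
    bool hasAddr(uint64_t addr) const {
        auto it = ranges_.upper_bound(addr);
        if (it == ranges_.begin()) return false;
        --it;
        return addr < it->second;
    }

    std::size_t size() const { return ranges_.size(); }

private:
    std::map<uint64_t, uint64_t> ranges_;  // start -> end
};
```

With such a set filled from the memory-write handler, a findHole-style check would reject a candidate location whenever `hasAddr` is true for any byte of it, since that byte is going to be overwritten by the self-extraction at run time.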
10:00AM [4] new problem: now every slice results in exception. [5.1] debug slice 0. the problem is that after self-extraction the contents at 0x00423014 is still not right. Found the problem: the contents are changed. [5.2] check why it is writing to 0x00423014. did not get it overwritten in findHole firstJump: 424974. 4248d3. eipExit: 0x42463a. file_offset is 135700 [5.3] break on lseek when offset is 135700 -- FOUND THE PROBLEM: it writes the DECODED instruction into the place. So Fix idea: fix writeInstruction in binWriter. No problem: it broke. 10:00AM 07/06/2014 [5] debug the broke problem: problem is findHoleImmediate Problem: it complains that it cannot find instruction 0x4042f9. (line 693) Reason: 0x4042f9 is a self-extracted instruction (its corresponding location in file does not correspond to the self-extracted form). Easy fix: just return -1 when it tries to locate such a hole starting from the instruction Long run problem: the entire writeProgramExit will anyway fail because the target location is SELF-EXTRACTED. overwrite it will fail. ===> need to think over the entire solution!!!! 9:00AM 07/08/2014 [6] modify findHomeImmediate and return false when the instruction is self-extracted. [20 min] DONE. [7] self a bp at the point and see how it works. [30 min] verified: it indeed kills the entire folder (so branch 3 is skipped) [8] generate 30 slices and see how it works. a lots of exceptions (but no error windows). compare. This time did not generate a lot of int3 in windbg. The current slice did not find any difference. [9] check the failed slice. First broken one is br16. check 0x004249cd. check 0x00424b5c. check 0x00426324. did not reach 0x00563516. it broke afer zwContinue. Start debug from 0x00426324. No much difference in stack. it seems that conents at 0x0012fcd4 is not the same. The sliced version misses one word. It complains about the access of eab3867e It broke after the 3rd shift-f9. Note: after the STI it should return to 0x00424c90. 
And it did return. Find parameter 2 of zwContinue, and +xb8 is the next EIP: 0x004236325. Setting software BP did hit it in both. Then last visited: 0x004272f1 00561587 005615b9. (note some info skipped). 005616a9 0050c652 So there is something wrong in the big loop. About 4 hrs generating 35 slices. 10 minutes each (500 seconds). Most of them suffer the same problem from slice 12. 9:00AM 07/10/2014 [10] analyze slice 12. need to regenerate the slice. Now it's slice 16. [11[ check slice 16. There is a big loop which hard to get out. Hard to find out Use binary search to find out how many steps to reach the point of break. First use animate over and see how it works. use "run trace" in debug menu of IMM to find out what are the previous instructions right before crash. NOTe ecx has the same as the EAX. It must be something like a "jmp ecx" or "jmp eax" instruction. Find it! (use the trace over to save time)!!! (call 0x004266ca) Detailed procedure: [1] first a couple of shift+f9 until 0x00426324 (STI) [2] then shift+f7 into 0x7c90 range and set a hardware breakpoint at 0x00426325 (right after the STI instruction) [3] then shfit+f9 to 0x00426325 [4] in IMM->View, clear and open run trace. Then in Debug->run trace. [5] view the trace and find out the "call 0x004266ca" instruction the latest call. This part is still extracted, so set a hardware breakpoint on it. 0x004266ca is actually hit 3 times and then broke. [6] Compare it with the regular version of Themider. Interestingly it never hits 0x004266ca! [7] compare the traces of the two. in the wrong trace: it takes only 169 instructions to reach 0x004266ca. Trace over is too carse, use trace into Still trace into takes 269 steps to reach So just do manual comparison. not working (caused by RTDSC???) [8] do in trace comparison again... 
Trick: can use "set condition" to set the number of "command count" to limit the number of execution steps in the correct trace (coz it never hits 0x004266ca) Two traces differ from: found the previous observation was wrong. The correct trace actually hits 0x004266ca!!! [9] binary search. Collect how many instrcutions until it hits the crash point in bad trace (t16). [9.1] Too many instructions: set breakpoint at 0x004266ca and check how many instructions to hit the point. [9.2] Observe the trace and set a bp at 0x00513cf0 which is within the 65535 range. not work. [9.3] observe the trace. The crash is caused by the instruction "jmp ecx" at 0x004e4643. (it's only hit once). But in the correct trace it's never hit. 7PM 07/11/2014 [10] binary search. First use trace over and then look at the difference. Use the limit the call approach. [10.1] incorrect trace. set hardware bp on 0x004e4643 (jmp ecx). 0x00426324 (STI) -> 0x7C90EAF0 -> 0x004e4643 (trace over -> 12 instructions). So set hardware BP on 0x00426324 and then 0x7c90eaf0 and then trace into. There are too many of them use Trace Over again. Found that at one zwContinue - eip (at 0xb8 of the input parameter 1 of zwContinue) returns to 0x0042632b. From Trace into it seems that the system direct reaches 0x004e4643 after the sysenter for zwContinue. [11] another try: trce into from 0x00426325 to 0x00424643. Given the trace: do the binary search. Binary search: 004e4310 (202) [did not hit] 005505f0 (1320) [hit] but the jump to 0x004e3ed2 (which eventually falls to 0x004e4310 never occurs) that's right 0x005505f0 the 4th time hit: 0x00438807, 0x0046f907, 004d62b6, 0x004e3ed2 correct trace 0x004db2b6 0x004c6876. There are over 17k instructions between each call. @#@#$ hard to analyze. 9:30PM [12] collect the two traces and compare. Note!!! the last two checkboxes in the file dialog needs to be checked! Then use windiff to compare the two files. The trace even recorded the register values!!! 
The different starts from 0x00550514 where ECX and EDI are the same!!! Check why they lead to the different ECX value for "jmp ecx" at 0x005505f0. So there must be someway that the slicing messed up somewhere and could not compute the right slice given the intensive data computing and we could not figure from where. What we know is that at location 0x0042bae6 it reads out ECX, and it does not work then. -- think about it later [13] generate all slices possible. broke at slice 39. 9:00AM 07/12/2014. Check slice 39. Why it's broke [14] check slice 39. does not give a lot of information. Needs to bp into slice 39. 9:00AM 07/13/2014 [14] check slice 39. break on it. it is sliceing 0x432084. Problem: the instruction has a dependency link on itself! Check why [14.1] check instruction 0x432084. It is a JNZ instruction. It depends on the previous instruction "CMP DL, 75". [14.2] check what is the repvious instruction. the previous ts is indeed located at 0x432081. So the problem is: how is the depLink constructed. [14.3] has to regenerate the raw trace and set a bp in InstrExecRecorder.cc on eip 0x432084. In the new trace generation it's ok. Problem still break on slice 39. Needs to improve breakpoint again. 9:30AM 07/14/2014 [15] check slice 39. The problem is in the raw trace gneration. set a breakpoint on when it produces the self-pointing depLink; and then trace on slice 39. Strange: could not capture the production of self-pointing depLink, but the slicing algorithm fails on it. Now the eip is 0x43f96a. Still the same problem: it depends on the previous instruction 11:00AM 07/18/2014 [16] read the slice full trace expansion algorithm and see if there are any chance that it may produce self-pointing depLink. [17] debugging plan: [1] run batch slice and find out where it broke [a] make a check in REGI (init_data_slice) and check tsTarget<tsCur, if not, error_exit [b] b gen_slice... and start from slice 37 (note: to do true clear of flags manually got branch 39. 
        ts: 19457710 eip: 0x432084
    [2] regenerate full slice and stop at that point
    [3] generate trace again.

9:00AM 07/19/2014
[18] debug the full_trace
    [18.1] break on ts: 19457709 and observe 3 more timestamps [20 min]
    [18.1.1] it breaks on Cache::loadBlock(0) -> loadBlockID the 4th time. Strange. Not likely a timeout problem. Recompile the entire project again.
             It broke when loading rr_processor. The binfile is 0.
             The problem seems to be that the rr_processor directory is wiped out every time. It is called by init_rr_processor because GEN_REQUEST_MODE is 1, and then load_rr_processor is called. Rethink its logic.
10:30AM
    [18.1.2] fix rr_processor, and then regenerate the ts and eip.
             (gdb) p ts
             $4 = 19451515
             (gdb) p/x ier->eip
             $5 = 0x432084
    [18.1.3] check the full load again. break on 19451514.
             19451514 -> (eip: 0x431fb9)
             19451515 -> (eip: 0x432081)
             19451516 -> (eip: 0x432084) depends on 19451515
             19451517 -> (eip: 0x4320db) depends on others 19491478
             Conclusion: in branch_slice, the timestamp 19451515 is essentially 19451516
    [18.1.4] further check
             branch_slice:
               ts: 0, eip: 0x7c92289a
               ts: 10000, eip: 0x7c9220e0
               ts: 19451500, eip: 0x431cea
             full_mode:
               ts: 0, eip: 7c92289a
               ts: 10000, eip: 7c9220e0
               ts: 19451500, eip: 431ce8
             Seems that there is a one-instruction difference (a compaction) in the raw slice.

3:30PM 07/22/2014
[19] BINARY SEARCH for the incompatible place.
     approach: compare the full mode and branch_slice ts:
       break at line 217 of InstrExecRecorder.cc in full_mode
       break at Trace::gen_slice in branch mode and then use Trace->loadIER_II() to load and check.
     ts: 8000000, full: 0x563419, branch: 0x563419
     ts: 16400000, full: 56db6b, branch: 56db6b
     ts: 16400010, full: 56db8c, branch: 56db8c
     ts: 16400020, full: 56dddd, branch: 56dddd
     ts: 16400021, full: 56dddf, branch: 56dddf
     ts: 16400022, full: 56dde1, branch: 56dde1
     ts: 16400023, full: 56dde2, branch: 7c90eaec ****!!!!!! HERE'S THE POINT OF DIFFERENCE. Why would branch slice miss one!!!
     ts: 16400024, full: 7c90eaec, branch: 7c90eaf0
     ts: 16400025, full: 7c90eaf0, branch: 7c90eaf3
     ts: 16500000, full: 76b42a67, branch: 76b42a6a
     ts: 17000000, full: 571769, branch: 57176b
     ts: 18000000, full: 57175b, branch: 57175d
     ts: 19000000, full: 42677f, branch: 426781
     ****************************************************************
     ts: 19451512: full: 431f42, branch: 431fb6 ****
     ts: 19451513: full: 431fb6, branch: 431fb9
     ts: 19451514: full: 431fb9, branch: 432081
     ts: 19451515: full: 432081, branch: 432084
     ts: 19451516: full: 432084, branch: 4320db
     Strangely, it departs from trace id: 19451512; the EIPs are completely different from each other!
     Check the winxp image and look at what is located at
       0x431f42: JNZ 431FB6
       0x431fb6: CMP DL, 95
       0x431fb9: JNZ 432081
     So there is a difference of ONE between the raw trace and the full trace. Need to re-do the timing of the full trace. The trick is to use the raw trace instead of the newly constructed full trace.

07/30/2014 9:00am
[1] identify exactly the location of the error using binary search [20 min]
9:30
[2] analyze why branch slice misses the following record and causes the shift of one.
     ts: 16400022, full: 56dde1, branch: 56dde1
     ts: 16400023, full: 56dde2, branch: 7c90eaec ****!!!!!! HERE'S THE POINT OF DIFFERENCE. Why would branch slice miss one!!!
    [2.1] break on construct full trace timestamp 16500022 and see what full trace is generated. [15 min]
          Found the problem: line 230 of InstrExecRecorder.cc skips the construction of the record because there is no record of the instruction in InstrStore.
    [2.2] first fix: the log error should be replaced by a Util::error call, because there is no way to proceed.
10:00
    [2.3] verify if it's always the same for 56dde2 (eip). Construct the raw trace first and then the full trace.
          [a] 1st time: 56dde2
          [b] 2nd time: 56dde2 again.
10:30
    [2.4] break on 56dde2 and see how it's handled. break on helper_trace2. Check how many times 56dde2 is hit..
only hit once 10:45 [2.5] trace into 56dde2 temporarily change the timeout from 240 to 2400 because it will terminate the thread. It did add to the instructore store: opcode is 0xf. first 4 bytes are: 0x0f 0x3f 0x07 0x0b Problem: it seems that the instruction 0x0f, 3f, 07, 0b is not recognized as an instruction. Check it again. The problem is that x86_disasm reports that it's an invalid instruction. So it looks like a trick by the packer (it's an invalid instruction and then triggers the interrupt handler?) 11:15 [2.6] check 56dde1 and 56dde2 in theimider program. Use IMM At 56dde1, there is an INC EAX command (0x40) At 56dde2, this is an invalid instruction (0x0f 0x3f) Using IMM to run it breaks at 56dde2 which complains about an invalid instruction but then shift+f9 jumps to 56dded (I suspect that theimider has set up a certain exception handler technique to jump to it). [2.7] fix for the qemu: InstrInfo::load_instr (when the len is 0, which means that the current instruction is invalid, generate an alert message, and take at least one byte for length). This will allow that the search of the opcode will still succeed. 11:50 [2.8] unit testing. multiple errors. all passed. [2.9] run batch slice again and see if it breaks on 56dde2 full slice problem is avoided. branch_slice: set bp on Trace.cc:1445 and then re-init count and let it start directly at slice 39. Note line 1223 has a Util::error exit, it will break if the slice number is no ok. [2.10] problem: complain about rr_processor. Let it run fro mslice 0. Seems to be ok. Already processed to slice 45. 11:30AM 07/31/2014. There are still about 1/3 103s. Slicing did not find the debugger. Seems need to analyze themider and see what's the technique it is using. 8:4AM 08/01/2014. -------------------------------------------------------------------------------------- Task 342: speed up SOC problem. -------------------------------------------------------------------------------------- 8:55AM [1] collect stats. 
Let it run 1 hr and see how many slices it is generating. GENERATE branch 22 of 242 ***************** !!!! init_data_sice takes 93.410000 seconds. !!! identify SOCs takes 95.680000 seconds! !!!! init_data_sice takes 107.260000 seconds 10:00AM [2] design of modification. [1] in addTS: directly call addSOC. [5 min] DONE. [2] add private function: hasSOCStart and hasSOCEnd [15 min] DONE. [3] change identifySOC: remove getCountInSlice. It seems that insertSOC needs not to be done. [5 min] DONE. [4] fix getSOCStart [15 min] DONE. [4] debug: take a slice and do it. [25 min] [4.1] found problem with addTS. SOCManager::addTS, 7:15pm [3] RUN ANOTHER ROUND. for 1 hr. Does not improve a lot. GENERATE branch 23 of 242. Prolem: MAX soc size is too small. 10:00PM readjust the max soc size to 128000 from 128 and pass iteration from 5 to 10. Look at the result. 10:15pm start. 8:45AM 08/02/2014 [4] figure out why it's so slow. Sampling findInsertionLocation: 4 hasSOCStart/end: 5 It seems that most of the time are spent on sequential search. 9:15AM [5] collect the running data for slice 0. !! identify SOCs takes 276.960000 seconds! TOO MANY SOCs, use direct/one pass slicing! [6] improvement: [20 min] findInsertionLocation: use binary search. hasSOCStart and hasSOCEnd, all use binary search. DONE. after hasSOCStart and hasSOCEnd, improved to 135 seconds. [7] improvement on findInsertionLocation, just add a check for the last case. [15 min] New timing: 122 seconds. Not improve much. Still has to try binary search. --> push to later [8] increase MAX SOC limits again. first slice has 157451 SOCs. Stats listed below: !!! identify SOCs takes 345.100000 seconds! !!! write program takes 13.610000 !!! gen_slice takes 445.900000 seconds! [9] debug into the 0'th slice and look at how slices are propagated and see if there is anything we can do. Look at how init_slice are collected. first init_slice: 68. First couple of ts: 1153628. 1153627 (merged with previous one). 1153623. 
1153622 (merge with previous. 1153618). First problem: do we really have to mark every ts, whose instruction is marked? Now, because of the check of ii->isInSlice(), first init_slice grows from 68 to 28763 -> 28674 ->28678->157456->. Another problem is when pass reaches the limit, it did not report fail. 7:30PM [10] remove the logic on ii and repair the logic on pass no. no need to fix pass. [11] debug slice #0. sm size: 51 -> 55 -> 1102 -> 221-> 209 There are too many passes and it broke. [12] check why there are so many passes set a breakpoint at the last line of socmanager::addTS and set breakpoint at all bSuccess=false in socmanager::verify_and_reset_soc. also break on bModified: 1153634, 1153636 Other more important causes: gen_bridge introduce new dependency. fix dump operations of InstrInfo and InstrexeCRecorder It's already existing. [13] check why setBridge needs to move along the chain. This seems a bad choice to introduce more. For exapmle: tsEnd: 1039116 (dec [edi]), 1039117 (inc edi), 1039118 (dec ecx) trace into setBridgeTo for 1039117. Problem: 1039117 is marked in slice? why? 08:45AM 08/04/2014 [14] check why 1039117 is included in slice and check when it is on bridge. [20 min] Observation: ier does not show 1039117 is in slice, but ii->isInSlice() is true. Set another breakpoint on ii->setInSlice on eip: 424945, instruction is "inc edi" 424945 is set in slice in init_data_slice for timestamp: 1039113. It is dependended by 1039116 (0x424945) When the setBridge is called: the current soc list has two socs: (1153635, 1153636), (1153627, 1153628), to SOC which tries to be added is: (1039116). Its bridge on 1039117 is failed because 1039117 has the same eip as 1039113, which is already in slice [1039113 is not added yet as an SOC]. This seems fine. Then it introduces new dependencies. 
The entire thing looks like a loop:
    @424943: dec [edi]     #<--- jump back from 0x424947
    @424945: inc edi
    @424946: dec ecx
    @424947: jnz 0xFFFFFFFC
Check how many of these are actually in the slice (altogether 68 data points):
    1153635: @424974, @424974: jnz 0x0000001B
    1153628: @424953, @424953: sub edx, 0xF0000000
    1153627: @42494e, @42494e: mov edx, 0xF0023000   #load contents of 0xf0023000 into edx
    == the rest are in the loop; strangely it did not discover 1039116: @424943
    1039116: @424943
    1039113: @424945
    1039112: @424943
    1039109: @424945
    1039108: @424943
    1039105: @424945
    1039104: @424943
    1039096: @424943
    repeating until:
    1038925: @424945
    ----
    1038923: @42493d: lea edi, [ebp+0x86B1935]   (this may be loading from a global init data variable)
** So the entire slice should include the first 3 instructions and the decoding procedure which reads from ebp+0x86b1935 and performs the 4-instruction decoding. Then the result is compared with 0xF0000000. The slice should have 4 SOCs only!
** But the algorithm does not yield the best SOC solution. The current SOC handling works like this:
    Add (1153635, 1153636): ok.
    Add (1153627, 1153628): ok.
    Now add 1039116: it intends to use 1039117 as the bridge but cannot, because 1039113 is in the slice (but not added yet).
7:40PM
continue the trace into the slicing algorithm.
    soc0. (1153635 @424974, 1153636) --> why is 1153636 added?
          -- 1153636 is included because 1153635 is a jump instruction. Actually, there seems to be no need.
    soc1. (1153628, @424953), (1153627, @42494e)
    soc2. (1039116 @424943 dec [edi], which is read by @42494e). Single soc (1039116, 1039116), but when finding the bridge, 1039117 (@424945 inc edi) is already in the slice (from an earlier iteration; 1039117 itself is not in the IER slice).
          Then 1039118 @424946 dec ecx is not in the slice, so it is used as the bridge (2 bytes) -> it uses @424947 as well.
    soc3. (1039113 @424945 inc edi, 1039113) -> this is wrong, because 1039116 is already used. WRONG. It uses the same bridge for overwriting.
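The first step of the SOC construction walked through above, merging consecutive in-slice timestamps into one (tsStart, tsEnd) pair, can be sketched as below. This covers only the grouping; bridges, room and EIP-based merging are left out. The struct layout and the function name `groupIntoSOCs` are hypothetical, and only the field names tsStart and tsEnd come from the gdb dumps in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One run of consecutive timestamps that are all in the slice.
struct SOC {
    long long tsStart;
    long long tsEnd;
};

// Group an unordered list of in-slice timestamps into maximal runs of
// consecutive values. The result is sorted by tsStart.
std::vector<SOC> groupIntoSOCs(std::vector<long long> ts) {
    std::sort(ts.begin(), ts.end());
    ts.erase(std::unique(ts.begin(), ts.end()), ts.end());  // drop duplicates

    std::vector<SOC> socs;
    for (long long t : ts) {
        if (!socs.empty() && socs.back().tsEnd + 1 == t) {
            socs.back().tsEnd = t;   // extends the current run
        } else {
            socs.push_back({t, t});  // starts a new run
        }
    }
    return socs;
}
```

For the timestamps above, 1153635 and 1153636 form one SOC and 1153627 and 1153628 another, while 1039116 stays a single-element SOC until the bridge search decides whether it has to grow.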
8:45AM 08/05/2014
[1] read about Hadoop for future improvement. [0.75 hr]
    The algorithm needs major improvement.
9:30AM
[2] design algorithm for improvement. [0.25 hr]
9:45AM
[3] implementation
    [1] hasSOCStart and hasSOCEnd. [1 hr]
        [a.1] should establish a hashmap for socStartEIP, socEndEIP. [8 min] DONE.
        [a.2] hasSOCStart and hasSOCEnd: change the parameter to EIP [10 min] DONE.
        [a.3] change call to hasSOCStart [8 min] DONE.
        [a.4] function addSOCStart and addSOCEnd [8 min] DONE.
        [a.5] modify insert_into_vec, add a parameter of trace. [15 min] DONE.
        [a.6] recheck the implementation at any place that modifies soc.tsStart or tsEnd. [30 min] DONE
11:00AM
        [a.6] debug into the trace [15 min]
        [b.1] debug into addSOCStart, addSOCEnd, hasSOCStart, hasSOCEnd, removeSOCStart, removeSOCEnd
        [b.2] when it gets to soc3 (1039113, @424945), it should return true on eip.
              It can now compact the loop, but it's too aggressive. It now includes soc 1 (1153628, 1153627), which is too far away (when doing the full slice, it's going to introduce too much).
12:00pm
        [b.3] Util::error_exit on "already exists in map". Check line 443 and examine what is being removed and what is being added.
              The first hit generates the fault.
              $35 = 714089
              (gdb) p soc.tsStart
              $36 = 714078
        [b.4] one more, but related to line 431. Hit 3 times. When line 431 tries to remove 385774, it broke.
7:30PM
              set a BP at InstrExecRecorder: 385774 in trace.h and enable it after line 431
              It may be removed by another ts which has the same EIP 0x604049.

8:45AM 08/06/2014
[4] debug the bridge algorithm.
    [4.1] for the first slice, after the first pass, check all SOCs.
          The first pass generates 4 SOCs:
          (1153635, 1153636),
          (1039104, 1153628), //THIS ONE IS OVER EXPANDED.
          (1039093, 1039101)
          (1038923, 1039090)
          This leads to an explosion of dependencies
9:15AM [20 min]
    [4.2] find out why in the first pass, it expands from (1039104, 1039116) to 1153628.
        break on the first slice and break before pass++.
        It merges from soc 6: 1, 2, 2, 3, 4, 5, 2 (sm.size).
        The merge happens at slice count 5 -> 2 (after addTS). The related SOCs are:
          (gdb) p sm.vecSOCs[2]
          {tsStart = 1039116, tsEnd = 1039116, bModified = true, tsBridge = 1039118, room = 3, tsNextStart = 1153627}
          {tsStart = 1039111, tsEnd = 1039114, bModified = true, tsBridge = 1039115, room = 3, tsNextStart = 1039116}
          {tsStart = 1039109, tsEnd = 1039109, bModified = true, tsBridge = 1039110, room = 1, tsNextStart = 1039111}
        It looks like all the bridges will fail. The next ts to add is 1039108; identifySOC merges and expands to 1153627.
        Problem: the search for the loop end goes directly from 1039116 to 1153626. Needs more examination.
        Found that between 1039116 and 1153626 execution is still inside the loop. So the dependency is only reading a very small portion of the data generated by the loop.
        Verified: the merge of slices does not cause any problem though; the last one is 1153626 (0x424949), which is just out of the loop. No more improvement can be done for it.
        After the first pass, the SOCs are (1153635, 1153636), (1039104, 1153628), (1039093, 1039101), (1038923, 1039090).
        Note: during the set-bridging process, a lot of init_data_slice calls are made, which is not necessary because these bridges are later merged.
        verify_and_reset_soc seems not ok: it did not count the bridges. Then in the second iteration, it introduces new data dependency.
11:30AM
  [4.3] check why verify_soc failed to identify failed bridges. Pass the first pass and then trace into it.
        soc0 does not need a bridge.
        soc1 (1039104, 1153628) uses bridge 1153629: @42498f. Success.
        soc2 (1039093, 1039101) uses bridge 1039102: @424946; it is part of the SOC but not in slice. [SO THIS IS WRONG HERE!]
        soc3 (1038923, 1039090) uses @424947 as the bridge, which is not fine either.
  [4.4] proposed fix:
7:45PM
    [1] in InstrInfo add a flag IN_SOC, and add functions setInSOC(), unmarkInSOC(), isInSOC(), and update clear_in_slicetags in Trace(). [15 min] DONE.
    [1.5] unit test. [10 min] DONE.
    [2] refactor: SOCManager::findNextSOCEnd(). [15 min] DONE.
    [2.5] test the current implementation. [10 min] DONE.
    [3] modify soc::setBridgeTo so that it keeps searching until there is enough room, and make sure to update the tsEnd. [20 min] DONE.
9:20AM 08/07/2014
    [4] debug again. [30 min]
        Problem: too slow. Found an infinite loop in setBridge.
      [4.1] fix it and add soc end logic.
      [4.2] handling bug: eip not in map.
      [4.3] further protection when removing and adding soc end.
        Problem: isInSOC is never hit because setInSOC is never called!
11:00AM
    [5] need to revamp the implementation of SOC: set the data members to be protected and set up the set methods.
      [5.1] add protected resetII_SOCsInRange(long long int tsStart, long long int tsEnd); [10 min] DONE.
      [5.3] change all public attributes to protected. [5 min] DONE.
      [5.4] add inline get functions. [10 min] DONE.
      [5.5] add inline set functions. [10 min] DONE.
      [5.6] fix all syntax errors. [20 min] DONE. --- UNIT TEST fails!
      [5.7] make it compatible with the old version (use on trace). All unit tests now pass.
8:45AM 08/08/2014
    [6] debug the slice algorithm. [40 min]
      [a] get and set methods: setTsStart, setTsEnd OK. setIISOCInRange OK. setBridgeTo OK.
          hasSOCStart, addSOCStart, removeSOCStart, removeSOCEnd, addSOCEnd ALL ok.
9:25AM
      [b] Trace::gen_slice [15 min]
        [b.1] bug: get_room is wrong. A bridge should not be ok for loop iterations. So, setBridge should include it in the SOC as well.
9:35AM [25 min]
      [c] fix SOC::setBridgeTo. Success. Now after the 1st round the 3 SOCs are:
            {tsStart = 1153635, tsEnd = 1153636,}
            {tsStart = 1153627, tsEnd = 1153628}
            {tsStart = 1038923, tsEnd = 1039117}
          They are the most compact SOCs around the data slices.
          Now the problem is that after several rounds, new dependencies are introduced and the 3rd SOC grows very large.
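NOTE on [4.4][1] and the "isInSOC is never hit" problem: a minimal C++ sketch of a per-EIP IN_SOC flag and a bridge check built on it. This is only an illustration under assumptions: the class layout, the flag's bit position, InfoMap, loopEipAt(), markSOCRange() and canBeBridge() are hypothetical and are not the project's actual code. The timestamp-to-EIP mapping (ts modulo 4) is inferred from the trace listings of the decode loop.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for InstrInfo: one record per code address (EIP).
class InstrInfo {
public:
    void setInSOC()      { flags_ |= static_cast<std::uint32_t>(IN_SOC); }
    void unmarkInSOC()   { flags_ &= ~static_cast<std::uint32_t>(IN_SOC); }
    bool isInSOC() const { return (flags_ & IN_SOC) != 0; }

private:
    enum : std::uint32_t { IN_SOC = 1u << 0 };  // bit position is an assumption
    std::uint32_t flags_ = 0;
};

using InfoMap = std::unordered_map<std::uint32_t, InstrInfo>;

// Inferred from the log: inside the decode loop, ts % 4 selects the EIP
// (1039116 -> @424943, 1039113 -> @424945, 1039102 -> @424946).
inline std::uint32_t loopEipAt(long long ts) {
    static const std::uint32_t kLoop[4] = {0x424943, 0x424945, 0x424946, 0x424947};
    return kLoop[ts % 4];
}

// Marks every EIP executed in [tsStart, tsEnd] as belonging to a SOC.
// If this is never called, canBeBridge() accepts everything.
inline void markSOCRange(InfoMap& infos, long long tsStart, long long tsEnd) {
    for (long long ts = tsStart; ts <= tsEnd; ++ts) {
        infos[loopEipAt(ts)].setInSOC();
    }
}

// A bridge gets overwritten, so its EIP must not be code that a SOC executes.
inline bool canBeBridge(const InfoMap& infos, std::uint32_t eip) {
    auto it = infos.find(eip);
    return it == infos.end() || !it->second.isInSOC();
}
```

With soc2 (1039093, 1039101) marked, the candidate bridge 1039102 (@424946) is rejected, while @42498f (the bridge that worked for soc1) is still accepted.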
10:30AM 08/08/2014
--------------------------------------------------------------------------------------
Task 343: simplify the SOC dependency problem (shrink its size)
--------------------------------------------------------------------------------------
[1] trace into the full slice of SOC [1153635, 1153636] and see if there is any new dependency introduced. [10 min]
    1153636 introduced a new dependency on 1038938 (which is not necessary).
    Root cause: 1153636 should not be included. tsEnd can be a jump instruction. The next instruction will be its bridge. The only problem is the writeProgramExit --> verified: writeProgramExit will OVERWRITE the conditional jump instruction anyway.
11:45AM [15 min]
  [1.1] comment out getSOCEnd's part. Problem: needs to comment out the check on jump control in setBridgeTo as well.
        Seems ok now, but now there are only two SOCs. Needs checking.
7:30PM
  [1.2] debug into getSOCEnd and see why there are only two SOCs now.
        sm size shrinks from 3 to 2 at i=1039109.
        Normal: from 1039117 to 1153627 there are lots of loop iterations (which are not included in the data slice).
[2] trace into the full slice of SOC [1153627, 1153628] and see if there is any new dependency introduced. [10 min] DONE.
9:30AM 08/09/2014
[3] trace into slicing. Two slices:
      soc0: [1153635, 1153635]
      soc1: [1038923, 1153628], but the actual data slice should be from 1039116 (downward)
  [1] soc0: [1153635, 1153635]. @424974: jnz 0x0000001B
      ii->isJumpNeedData() returns true and forces bNoDataProgation to be false (will propagate data dependency).
      Dependencies:
        1153628. This is in the second slice. CORRECT: the program exit does need the register condition.
        1153628: @424953: sub edx, 0xF0000000
          1039096: ins @424943: dec [edi]
          1039100: ins @424943: dec [edi]
          1039104: same
          1039108: same
          1039112: same
          1039116: same
          It has memory dependencies because it is self-extracted (6 bytes as the result of extraction).
      So one register dependency and 6 memory dependencies.
      All these are in SOC1 [1038923, 1153628] and they did not introduce more than the original set of the data slice.
  [2] soc1: [1038923, 1153628], but the actual data slice should be from 1039116 (downward)
      1153628: propagates to 1153627: mov edx, 0xF0023000
        1038964: @424943: dec [edi]   [the following are the self-extraction data slice - should be originally there]
        1038968, @424943: dec [edi]
        1038972, 1038976, 1038980: @424943: dec [edi]
      1153626: control link
        Propagates to 1038924, 1038928, 1038932 ... 6 links (all self-extraction links)
      1153625: jnz 0xFFFFFFFC (control link)
      1153624: @424946: dec ecx (this is reasonable, because it is needed by 1153625)
      1153622: @424943: dec [edi] (WHY???)
      1153621: ins @424947: jnz 0xFFFFFFFC (control link), it leads to 1153620
      1153620: @424946: dec ecx, depends on 1153616
      1153619: @424943: dec [edi] (why?????)
11:15AM
    [2.1] check why @424943: dec [edi] is included.
          They are introduced as setVisit (because of the control link of the block).
          Check propagation of such instructions:
            bVisitControlLink is true, bNoDataProgation is true.
            ii->hasMemopWithRegOp (because of EDI), then bNoDataProgation is set to false!!!
            Propagated dependency to: ins @424945: inc edi
            Strangely, it does not have a memory dependency link! The other is the control link!
Problems Fix:
11:30AM
  [1] check why ts:1153619: @424943 does not have a memory link.
      Break on Trace::expandFromRaw and check the construction of the memory link.
      No one writes to it; the contents are originally in the PE image, so it should be fine. But strangely ts shifts by -1.
      New problem: it crashes on appendRecord. The problem is that CallAdjustRecord, when deleting itself, failed to write to cache. ==> FIXED.
7:15PM 08/09/2014
  [2] hasMemopWithReg should not enable bNoDataProgation. ==> OK, there is already code handling it.
  [3] For SOC, if bSOC is set, the initial slice should include everything that is already in slice; should do a pre-scan first.
7:30PM
      ==> a for loop which does the job.
  [4] tsEnd, if not originally in slice, only needs to be set to needVisit. ==> modified
8:20PM
  [5] TEST [3] AND [4]. Seems fine.
      ----------------
      soc0: [1153635, 1153635]
      soc1: [1038923, 1153628], but the actual data slice should be from 1039116 (downward)
      -------------
      After full_slice of all SOCs, check what the newly added slice instructions are.
      Newly added TS: 1038922, ins @424938: mov ecx, 0x00007000   //NOTE: right before 1038923 (reasonable, it's the loop counter)
      The second pass leads to the following SOCs:
      -------------------------
      soc0: {tsStart = 1153635, tsEnd = 1153635}
      soc1: {tsStart = 1038922, tsEnd = 1153628}   [only the tsStart is shifted once to the left]
      -------------------------
      Result: only 10 instructions in slice (in store).
      [STOP HERE] multiple pass passing error, check. DONE.
9:00AM 08/11/2014
--------------------------------------------------------------------------------------
Task 344: check the function processing error.
--------------------------------------------------------------------------------------
[1] read the processFunction code and check the error message.
[2] take slice 12. Problem instruction: timestamp 3116233, eip: 7c91003f.
      ins @7c91039f: ret
    Find out how it's included in slice. Run slice to slice 12.
    processFunction: tsStart: 3115445, tsEntry: 3115510, tsRightAfterRet: 3116236
9:00AM 08/12/2014 [30 min]
[3] debug into processFunction and check when ts:3116233 is added.
  [a] set a breakpoint at the call to Trace::gen_slice; after the 1st run change the id.
  [b] set a conditional bp at InstrExecRecorder->setInslice (compile).
  [c] set a conditional bp at Trace::processFunction on tsRightAfterRet: 3116236; take slice 12.
  Observation:
  [1] instruction 3116233 is first set in slice for full_slice of SOC: 3115445, 3116317, because ii is inSlice. Strangely, it is not clear why ii is in slice.
  [2] check processFunction: why is it not identified if it's already in slice?
      When processing 3116233, its reverse pointer is set to 5435027.
      It directly returns false, which seems not right. The instruction is not needed for mem, reg, visit etc.
      Fixed. There are two bugs (one is a simple condition problem).
[4] still problems with slice 16, in the following:
      Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 3, ts: 5596085, eip: 7c913288!
      Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 3, ts: 5595297, eip: 7c915187!
      Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 3, ts: 5595247, eip: 7c91506b!
  [1] check how 5596085 is set. Break on slice 16.
      It is set in slice by processFunction (entire function in slice) (5591054, 5596291) [callentry, rightAfterRet].
      Reports error in (5596022, 5596087) [a function nested inside].
      The question is: why is its reversePointerType 3?
      The first time, the linkType is set to 7. It's 3, and not updated because it does not check the pass number.
9:15AM 08/13/2014
[5] seems to fix the complaint about processFunction. Check the timing problem. DONE.
9:40AM
[6] slice 18 again has a problem. Check.
  [6.1] add a slice quality report (doesn't cost anything); add a timing report for the slice quality report.
  [6.2] issues: slice 12 fails on SOC identification. Check why.
  [6.3] identify data: slice 18. (5596022, 5596087) for ts 5596085.
        Strangely did not find it. Found that it's slice 19.
        It seems that it first needs one full slice, and then in the second slice it broke.
        Problem: this->pass_no_inslice is not cleared. Fixed.
9:00AM 08/15/2014
--------------------------------------------------------------------------------------
Task 345: try to improve the data slice again.
--------------------------------------------------------------------------------------
[1] observe slice 12.
  [1] let slice 0 do the init
  [2] set slice id to 12
  [3] break on init_data slice and check the slice size.
      init data slice is 2790153 (big number!)
      first pass: sm size: 99.
      2nd pass: sm size: 77
      3rd pass: sm size: 66
      4th pass: sm size: 63
      5th pass: sm size: 61
      6th pass: sm size: 57
  [4] observe init_data_slice: what data are included.
      [1] 5435262, @426dac: jno 0x00000008
      [2] 5435245, @426d8c: sub dx, 0x9B5D
      [3] 5435221, @426471: sub edx, ecx
      [4] 5435220, @42646b: xor ecx, 0x020A4E99
      [5] 5435218, @426462: mov edx, [esp]
      [6] 5435217, ins @426461: pop ecx
      [7] 5435216, @426460: push edx
      [8] 5435215, add edx, 0xE9ECBFEB
      [9] 5435214, @426454: or edx, 0x78AD0634
      [10] 5435213, @42644e: or edx, 0x67660CBB
      [11] 5435212, @426449: mov edx, 0x42A846CA
      [12] 5435211, @426446: mov [esp], edx
      [13] 5435208, @42643e: add edx, ebp
      [14] 5435206, ins @42643b: add edx, ebx
      [15] 5435204, @426438: add ebx, ecx
      [16] 5435203, @426432: xor ecx, 0x39C7B881
      [17] 5435202, ins @426430: neg ecx
      [18] 5435201, ins @42642b: mov ecx, 0x0C5722D4
      [19] 5435199, @426428: neg ebx
      [20] 5435198, ins @426423: mov ebx, 0x5E99253A
      [21] 5435194, ins @426417: mov edx, eax
      [22] 5435192, @426415: pop eax
      [23] 5435191, @426412: mov [esp], ebx
      [24] 5435188, @42640a: xor ebx, edi
      [25] 5435187, @426405: mov ebx, 0xA38B5286
      [26] 5435186, ins @426400: mov edi, 0x6B5A7E3F
   XX [27] 5236091, ins @42619c: pop [ecx]   //introduced by memory link.
  [5] figure out why 5236091 is included by 5435245.
      So, the memory link is introduced by the self-extraction. Otherwise,
---------------------------------------------------
New goal - add an instruction trace for data tracing
---------------------------------------------------
2PM-3PM 08/18/2015 [1 hr]
// [1] add a new option of datatracing "dtr" [20 min]
     [a] modify hmp-commands.h
     [b] modify monitor.c
     [c] modify handle.h
//[2] give up direction 1.
     [10 min]
     [a] reverse the change on hmp-commands.h
[3] modify the config.txt [10 min]
[4] add constant for data trace
[5] add the framework genExecDataTrace() [10 min]
[6] add BatchAnalyzer::genTasksForGenDataTrace() [10 min]
3-5PM 08/18/2015 [1 hr]
[7] fix cateogryToName [1 hr]
[8] read about how a keyboard event is processed [1 hr]
    debug trace into it.
  [a] handle_user_command (in monitor.c) -> "sendkey x"
  [b] hmp_send_key
  [c] qmp_send_key, e.g., for key "d". The value is *(keylist->value); values are:
        {kind = KEY_VALUE_KIND_QCODE, {data = 0x27, number = 39, qcode = Q_KEY_CODE_D}}
      keycode is 32. Note the call of keycode_from_keyvalue(p->value);
  [d] kbd_put_keyboard
  [e] qemu_put_keyboard_event
  [f] ps2_put_keyboard
  [g] ps2_queue --> the opaque parameter is the PS2State, which can be monitored (see who's reading it).
      Use a command like "awatch *0x28e08d4c" (monitors both read and write access).
  [h] -> found that the following functions are called:
        #0 0x08156dc2 in ps2_read_data (opaque=0x28e08d00) at hw/ps2.c:191
        #1 0x0814e497 in kbd_read_data (opaque=0x28e07b44, addr=0, size=1) at hw/pckbd.c:323
  [g] find out the current EIP of the instruction:
      approach: set a bp on helper_trace2 and print eip_in; the "in" instruction is located right before 0x806f48af (the next instruction).
        @EIP 0x806f48ae: length: (1): in %dx, %al   //*****
        @EIP 0x806f48af: length: (3): ret $0x0004
        @EIP 0x806f48b2: length: (2): mov %edi, %edi
        @EIP 0x806f48b4: length: (2): xor %eax, %eax
  [h] the problem is that env->arrRegs has nothing but 0 (not actually updated, because of the dynamic run-time translation by qemu).
  [g] *** in gen_save_regs_before_instr, modify the condition so that for instruction 0x806f48ae, the instruction register values are recorded. !!!!!
1:00PM-2:00PM
[1] should indicate the registers to RECORD!!!!
  [a] in translate.c:800
  [b] debug into it.
      1. b ps2_queue (on keyboard)
      2. b ops_sse.h:2480 and see how it's been read into the EDX register.
         display/x eip_in
         display/x env->arrRegs
         Expected data: {kind = KEY_VALUE_KIND_QCODE, {data = 0x27, number = 39, qcode = Q_KEY_CODE_D}} keycode: 0x20
         keycode is 32.
      3. b ps2_read_data
      Strange. The observation is that many IN instructions are executed before ps2_read_data and the register data are not updated.
[2] check if ps2_read_data is getting the keycode.
    Yes, the val read is 0x20 (32) (for char d)
    --> memory_region_read_accessor
          *value |= (tmp & mask) << shift;   //note tmp is 0x20
    --> *value is   // no change, value to return is still 0x20
    --> iorange_read (address is 0x60)
    --> helper_inb (addr is 0x60), still returns 0x20
    --> then it goes to helper_trace2 for the next instruction; dump of regs:
          env->arrRegs = {0x0, 0x0, 0x2ee0, 0x60, 0x1, 0x8055068c, 0x805506a0,
    --> it seems that EDX has the right value but EAX does not have it.
[3] trace into ps2_read_data and trace why it's not writing to EAX. It should be located at:
      p/x &(env->arrRegs[0])
      $21 = 0x28dcf55c (env->arrRegs[0])
    Observation: esp is copied to 0x28dcf550, ebp is copied to 0x28dcf554, and the $eax value is copied to 0x28dc6390.
[4] found the problem: because registers are copied before the instruction, the capture should be taken RIGHT before the instruction AFTER the in instruction.
    change the capture address to 0x60 ---> still does not work.
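NOTE on [4]: a toy C++ model of the snapshot-timing reasoning only, assuming the register copy runs before each instruction. Regs, execIn(), Snapshots and traceInInstruction() are hypothetical names; this is not QEMU code and does not model TCG. Since the log ends with "still does not work", the timing issue shown here is evidently not the whole story.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file: only the two registers involved in `in %dx, %al`.
struct Regs {
    std::uint32_t eax = 0;
    std::uint32_t edx = 0;
};

// Models `in %dx, %al`: the byte read from the port replaces AL only.
inline void execIn(Regs& r, std::uint8_t portValue) {
    r.eax = (r.eax & 0xFFFFFF00u) | portValue;
}

struct Snapshots {
    Regs beforeIn;    // taken by the hook that runs before the `in`
    Regs beforeNext;  // taken by the hook before the following instruction
};

// Runs the `in` at 0x806f48ae with DX = 0x60 and records both snapshots.
inline Snapshots traceInInstruction(std::uint8_t portValue) {
    Regs cpu;
    cpu.edx = 0x60;           // keyboard data port, already loaded into DX
    Snapshots s;
    s.beforeIn = cpu;         // EDX is visible here, AL is still stale
    execIn(cpu, portValue);
    s.beforeNext = cpu;       // only now does AL hold the keycode
    return s;
}
```

For the keycode 0x20 from the log, the snapshot taken before the `in` shows EDX = 0x60 but EAX = 0, matching the observation in [2]; the snapshot taken before the next instruction (ret $0x0004 at 0x806f48af) is the one that would contain 0x20 in AL.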