File Date Author Commit
 QMP 2013-08-22 csc288 csc288 [174374] Initial Source
 audio 2013-08-22 csc288 csc288 [174374] Initial Source
 backends 2013-08-22 csc288 csc288 [174374] Initial Source
 block 2013-08-22 csc288 csc288 [174374] Initial Source
 bsd-user 2013-08-22 csc288 csc288 [174374] Initial Source
 default-configs 2013-08-22 csc288 csc288 [174374] Initial Source
 disas 2013-08-22 csc288 csc288 [174374] Initial Source
 docs 2013-08-22 csc288 csc288 [174374] Initial Source
 fpu 2013-08-22 csc288 csc288 [174374] Initial Source
 fsdev 2013-08-22 csc288 csc288 [174374] Initial Source
 gdb-xml 2013-08-22 csc288 csc288 [174374] Initial Source
 hw 2014-02-18 Xiang Fu Xiang Fu [f8dab5] FINALLY FIXED the batch processing (speed up lo...
 include 2014-03-21 Xiang Fu Xiang Fu [d3cdbb] fix errors about notepad.exe, half way.
 ldscripts 2013-08-22 csc288 csc288 [174374] Initial Source
 libcacard 2013-08-22 csc288 csc288 [174374] Initial Source
 linux-headers 2013-08-22 csc288 csc288 [174374] Initial Source
 linux-user 2013-08-22 csc288 csc288 [174374] Initial Source
 net 2013-08-22 csc288 csc288 [174374] Initial Source
 pc-bios 2013-08-22 csc288 csc288 [174374] Initial Source
 pixman 2014-03-02 Xiang Fu Xiang Fu [076e16] FIRST VERSION THAT CAPTURES detection of debugg...
 qapi 2013-08-22 csc288 csc288 [174374] Initial Source
 qga 2013-08-22 csc288 csc288 [174374] Initial Source
 qobject 2013-08-22 csc288 csc288 [174374] Initial Source
 qom 2013-08-22 csc288 csc288 [174374] Initial Source
 roms 2013-08-22 csc288 csc288 [174374] Initial Source
 runproc 2014-01-11 csc288 csc288 [52f858] added runproc
 scripts 2013-08-22 csc288 csc288 [174374] Initial Source
 slirp 2013-08-22 csc288 csc288 [174374] Initial Source
 stubs 2013-08-22 csc288 csc288 [174374] Initial Source
 sysconfigs 2013-08-22 csc288 csc288 [174374] Initial Source
 target-alpha 2013-08-22 csc288 csc288 [174374] Initial Source
 target-arm 2013-08-22 csc288 csc288 [174374] Initial Source
 target-cris 2013-08-22 csc288 csc288 [174374] Initial Source
 target-i386 2015-09-09 Xiang Fu Xiang Fu [593977] added processing of data tracing.
 target-lm32 2013-08-22 csc288 csc288 [174374] Initial Source
 target-m68k 2013-08-22 csc288 csc288 [174374] Initial Source
 target-microblaze 2013-08-22 csc288 csc288 [174374] Initial Source
 target-mips 2013-08-22 csc288 csc288 [174374] Initial Source
 target-openrisc 2013-08-22 csc288 csc288 [174374] Initial Source
 target-ppc 2013-08-22 csc288 csc288 [174374] Initial Source
 target-s390x 2013-08-22 csc288 csc288 [174374] Initial Source
 target-sh4 2013-08-22 csc288 csc288 [174374] Initial Source
 target-sparc 2013-08-22 csc288 csc288 [174374] Initial Source
 target-unicore32 2013-08-22 csc288 csc288 [174374] Initial Source
 target-xtensa 2013-08-22 csc288 csc288 [174374] Initial Source
 tcg 2014-03-21 Xiang Fu Xiang Fu [d3cdbb] fix errors about notepad.exe, half way.
 tests 2013-08-22 csc288 csc288 [174374] Initial Source
 trace 2013-08-22 csc288 csc288 [174374] Initial Source
 traceinstr 2015-09-09 Xiang Fu Xiang Fu [593977] added processing of data tracing.
 ui 2014-03-02 Xiang Fu Xiang Fu [076e16] FIRST VERSION THAT CAPTURES detection of debugg...
 util 2013-08-22 csc288 csc288 [174374] Initial Source
 ! 2013-08-22 csc288 csc288 [174374] Initial Source
 .exrc 2013-08-22 csc288 csc288 [174374] Initial Source
 .gitignore 2014-06-21 Xiang Fu Xiang Fu [25ba97] changed gitignore to ignore test data
 .gitmodules 2013-08-22 csc288 csc288 [174374] Initial Source
 .mailmap 2013-08-22 csc288 csc288 [174374] Initial Source
 .vimrc 2015-09-09 Xiang Fu Xiang Fu [593977] added processing of data tracing.
 CODING_STYLE 2013-08-22 csc288 csc288 [174374] Initial Source
 COPYING 2013-08-22 csc288 csc288 [174374] Initial Source
 COPYING.LIB 2013-08-22 csc288 csc288 [174374] Initial Source
 Changelog 2013-08-22 csc288 csc288 [174374] Initial Source
 HACKING 2013-08-22 csc288 csc288 [174374] Initial Source
 LICENSE 2013-08-22 csc288 csc288 [174374] Initial Source
 MAINTAINERS 2013-08-22 csc288 csc288 [174374] Initial Source
 Makefile 2013-08-22 csc288 csc288 [174374] Initial Source
 Makefile.objs 2013-08-22 csc288 csc288 [174374] Initial Source
 Makefile.target 2013-08-22 csc288 csc288 [174374] Initial Source
 README 2015-09-09 Xiang Fu Xiang Fu [593977] added processing of data tracing.
 TODO 2013-08-22 csc288 csc288 [174374] Initial Source
 VERSION 2013-08-22 csc288 csc288 [174374] Initial Source
 aio-posix.c 2013-08-22 csc288 csc288 [174374] Initial Source
 aio-win32.c 2013-08-22 csc288 csc288 [174374] Initial Source
 arch_init.c 2013-08-22 csc288 csc288 [174374] Initial Source
 async.c 2013-08-22 csc288 csc288 [174374] Initial Source
 balloon.c 2013-08-22 csc288 csc288 [174374] Initial Source
 block-migration.c 2013-08-22 csc288 csc288 [174374] Initial Source
 block.c 2013-08-22 csc288 csc288 [174374] Initial Source
 blockdev-nbd.c 2013-08-22 csc288 csc288 [174374] Initial Source
 blockdev.c 2013-08-22 csc288 csc288 [174374] Initial Source
 blockjob.c 2013-08-22 csc288 csc288 [174374] Initial Source
 bt-host.c 2013-08-22 csc288 csc288 [174374] Initial Source
 bt-vhci.c 2013-08-22 csc288 csc288 [174374] Initial Source
 cmd.c 2013-08-22 csc288 csc288 [174374] Initial Source
 cmd.h 2013-08-22 csc288 csc288 [174374] Initial Source
 configure 2013-08-22 csc288 csc288 [174374] Initial Source
 coroutine-gthread.c 2013-08-22 csc288 csc288 [174374] Initial Source
 coroutine-sigaltstack.c 2013-08-22 csc288 csc288 [174374] Initial Source
 coroutine-ucontext.c 2013-08-22 csc288 csc288 [174374] Initial Source
 coroutine-win32.c 2013-08-22 csc288 csc288 [174374] Initial Source
 cpu-exec.c 2013-08-22 csc288 csc288 [174374] Initial Source
 cpus.c 2013-08-22 csc288 csc288 [174374] Initial Source
 cputlb.c 2013-08-22 csc288 csc288 [174374] Initial Source
 device_tree.c 2013-08-22 csc288 csc288 [174374] Initial Source
 disas.c 2013-08-22 csc288 csc288 [174374] Initial Source
 dma-helpers.c 2013-08-22 csc288 csc288 [174374] Initial Source
 dump-stub.c 2013-08-22 csc288 csc288 [174374] Initial Source
 dump.c 2013-08-22 csc288 csc288 [174374] Initial Source
 err.txt 2015-09-09 Xiang Fu Xiang Fu [593977] added processing of data tracing.
 error.txt 2014-03-02 Xiang Fu Xiang Fu [076e16] FIRST VERSION THAT CAPTURES detection of debugg...
 exec.c 2013-08-22 csc288 csc288 [174374] Initial Source
 gdbstub.c 2013-08-22 csc288 csc288 [174374] Initial Source
 hmp-commands.hx 2013-08-22 csc288 csc288 [174374] Initial Source
 hmp.c 2013-08-22 csc288 csc288 [174374] Initial Source
 hmp.h 2013-08-22 csc288 csc288 [174374] Initial Source
 iohandler.c 2013-08-22 csc288 csc288 [174374] Initial Source
 ioport.c 2013-08-22 csc288 csc288 [174374] Initial Source
 kvm-all.c 2013-08-22 csc288 csc288 [174374] Initial Source
 kvm-stub.c 2013-08-22 csc288 csc288 [174374] Initial Source
 main-loop.c 2014-05-28 Xiang Fu Xiang Fu [46d76b] STILL stuck on performance issue. too slow.
 memory.c 2013-08-22 csc288 csc288 [174374] Initial Source
 memory_mapping-stub.c 2013-08-22 csc288 csc288 [174374] Initial Source
 memory_mapping.c 2013-08-22 csc288 csc288 [174374] Initial Source
 migration-exec.c 2013-08-22 csc288 csc288 [174374] Initial Source
 migration-fd.c 2013-08-22 csc288 csc288 [174374] Initial Source
 migration-tcp.c 2013-08-22 csc288 csc288 [174374] Initial Source
 migration-unix.c 2013-08-22 csc288 csc288 [174374] Initial Source
 migration.c 2013-08-22 csc288 csc288 [174374] Initial Source
 monitor.c 2013-08-22 csc288 csc288 [174374] Initial Source
 nbd.c 2013-08-22 csc288 csc288 [174374] Initial Source
 os-posix.c 2013-08-22 csc288 csc288 [174374] Initial Source
 os-win32.c 2013-08-22 csc288 csc288 [174374] Initial Source
 page_cache.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qapi-schema-test.json 2013-08-22 csc288 csc288 [174374] Initial Source
 qapi-schema.json 2013-08-22 csc288 csc288 [174374] Initial Source
 qdict-test-data.txt 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-bridge-helper.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-char.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-coroutine-io.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-coroutine-lock.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-coroutine-sleep.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-coroutine.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-doc.texi 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-img-cmds.hx 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-img.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-img.texi 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-io.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-log.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-nbd.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-nbd.texi 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-options-wrapper.h 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-options.h 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-options.hx 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-seccomp.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-tech.texi 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu-timer.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qemu.sasl 2013-08-22 csc288 csc288 [174374] Initial Source
 qmp-commands.hx 2013-08-22 csc288 csc288 [174374] Initial Source
 qmp.c 2013-08-22 csc288 csc288 [174374] Initial Source
 qtest.c 2013-08-22 csc288 csc288 [174374] Initial Source
 readline.c 2013-08-22 csc288 csc288 [174374] Initial Source
 rules.mak 2014-03-21 Xiang Fu Xiang Fu [d3cdbb] fix errors about notepad.exe, half way.
 rules.mak.VERBSE 2013-08-22 csc288 csc288 [174374] Initial Source
 rules.mak.old 2013-08-22 csc288 csc288 [174374] Initial Source
 savevm.c 2014-02-18 Xiang Fu Xiang Fu [f8dab5] FINALLY FIXED the batch processing (speed up lo...
 spice-qemu-char.c 2013-08-22 csc288 csc288 [174374] Initial Source
 startgit.sh 2013-09-03 csc288 csc288 [4e8912] more changes made on slicing algorithm.
 tcg-runtime.c 2013-08-22 csc288 csc288 [174374] Initial Source
 tci.c 2013-08-22 csc288 csc288 [174374] Initial Source
 test1 2013-08-22 csc288 csc288 [174374] Initial Source
 thread-pool.c 2013-08-22 csc288 csc288 [174374] Initial Source
 thunk.c 2013-08-22 csc288 csc288 [174374] Initial Source
 timer.txt 2014-03-02 Xiang Fu Xiang Fu [076e16] FIRST VERSION THAT CAPTURES detection of debugg...
 trace-events 2013-08-22 csc288 csc288 [174374] Initial Source
 translate-all.c 2013-08-22 csc288 csc288 [174374] Initial Source
 translate-all.h 2013-08-22 csc288 csc288 [174374] Initial Source
 user-exec.c 2013-08-22 csc288 csc288 [174374] Initial Source
 version.rc 2013-08-22 csc288 csc288 [174374] Initial Source
 vl.c 2014-05-28 Xiang Fu Xiang Fu [46d76b] STILL stuck on performance issue. too slow.
 xbzrle.c 2013-08-22 csc288 csc288 [174374] Initial Source
 xen-all.c 2013-08-22 csc288 csc288 [174374] Initial Source
 xen-mapcache.c 2013-08-22 csc288 csc288 [174374] Initial Source
 xen-stub.c 2013-08-22 csc288 csc288 [174374] Initial Source

Read Me

Read the documentation in qemu-doc.html or on http://wiki.qemu.org
****
 if anything goes wrong, place a breakpoint at raise_exception_err
****
- QEMU team
----------------- the following are MOD logs by Dr. Xiang Fu -----------
(1) --------- OBJECTIVE: instruction trace
  *** Take popcnt as an example. ***
1. Modify ops_sse.h: add void trace(uint32_t eip).
2. Modify ops_sse_header.h: add the DEF_HELPER_1(...) and call
   trace in disas_insn.
3. The problem is that the system always complains about a parameter
mismatch in the macro definition. The PROBLEM is with TARGET_ULONG: the system
always treats it as i32. And it always crashed.
4. To debug:
size calculation (tcg.cc 2060). Debug the system by 
  ./configure --enable-debug --disable-pie
  sudo make install
  gdb qemu-system-i386
  run -m 512 -hda winxp.img 
5. solution: don't pass TARGET_ULONG, pass ENV instead. Then use env->eip.
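
A minimal sketch of the fix in step 5, using a mock CPU state in place of QEMU's CPUX86State. MockCPUState, helper_trace_env and last_traced_eip are hypothetical names for illustration only; the one idea taken from the notes is to pass env and read env->eip instead of passing the address by value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for CPUX86State; only the field used here. */
typedef struct MockCPUState {
    uint32_t eip;               /* guest instruction pointer */
} MockCPUState;

/* Last EIP seen by the helper, kept so a caller can inspect it. */
uint32_t last_traced_eip;

/* The helper receives the CPU state pointer rather than a TARGET_ULONG
 * value, and reads the instruction pointer from it. */
void helper_trace_env(MockCPUState *env)
{
    assert(env != NULL);
    last_traced_eip = env->eip;
    printf("trace: eip=0x%08x\n", (unsigned)env->eip);
}
```

In the real tree the helper would be declared through DEF_HELPER_1(...) with a pointer argument; the exact macro arguments are not shown here because they depend on the QEMU version.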

(2) ---------------- IMPORTANT CODE NOTES
 1. CPU op codes are defined in cpu.h; there is an ENUM construct.
 2. int_helper.cc defines the handling (emulation) of integer operations.
 3. ops_sse_header.h defines all helper functions for all instructions.
   translate.c disas_insn is the key!!!
   TO UNDERSTAND ITS LOGIC, use GDB
   run -m 512 -hda winxp.img -no-kvm
 4. The body of disas_insn contains the handling for each instruction. However,
 the functions do not seem to be wrapped as a disassembler, so it cannot
 actually print the instruction.
 5. MEMORY READING: achieved using MMU translation.
       env->tlb_table[mmu_idx][page_idx]
       Once addr is obtained, we can call the ldsb ... functions to retrieve a word.
       Or read it directly if we do not care about machine endianness,
	as we are pretty sure to work on the x86 platform here.
	The cpu_ldub_code ... function is defined in include/qemu/bswap.h.
	It could be combined with the libdis library to disassemble (more
	convenient than modifying translate.c).
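
A rough sketch of the two ideas in item 5: how a page_idx can be derived from a guest address, and how a word can be read byte by byte so that host endianness does not matter. The constants SKETCH_PAGE_BITS and SKETCH_TLB_SIZE are assumptions for a 32-bit x86 guest and may not match the values in this tree; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; check the real TARGET_PAGE_BITS / TLB size in the tree. */
#define SKETCH_PAGE_BITS 12     /* 4 KiB pages */
#define SKETCH_TLB_SIZE  256    /* entries per MMU mode (assumption) */

/* Index of the TLB entry that would cover a guest virtual address. */
unsigned sketch_tlb_index(uint32_t vaddr)
{
    return (vaddr >> SKETCH_PAGE_BITS) & (SKETCH_TLB_SIZE - 1);
}

/* Read a 32-bit little-endian value from a byte buffer. Assembling the
 * value from bytes gives the same result on any host endianness. */
uint32_t sketch_ldl_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```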
(3) --------------- WRITE a  disassembler
  Mostly OK. There is one sample file in the test directory.
  Need to copy libdisasm.so to /usr/lib (or specify the directory in configure).
  In Makefile.target and Makefile, append -ldisasm to the "libs +=" line.
  Note: CPU_LDL... are defined in include/exec/cpu_all.h
(4) ------------------- TRACE NOTEPAD.exe -------------------
  (a) CR[3] stores the base address of the page directory (the top-level page table).
  (b) Read the TIB and PEB to get the process name. Note, however, that
       FS[0] is not pointing to the right data about
	50% of the time.
  (c) To solve (b), we need to study target-i386/seg_helper.c. Here is summary:
	[1] load_segment(env, *e1, *e2, selector) retrieves a SEGMENT
		descriptor entry from the LDT/GDT, based on the selector value.
		The selector could be CS/DS/ES/FS/GS etc. (16 bit). Its 3rd bit
		decides which DT to read (LDT or GDT).

		Each entry has two words, and they are stored into e1 and e2.
	[2] Now with e1 and e2, the base and limit of the segment can be
		calculated using get_seg_base(e1, e2) and get_seg_limit.
	[3] tss_load_seg(env, seg_reg, selector), given a selector, updates the
		SegmentCache[seg_reg] entry. This is an INTERNAL (not real
		hardware) cache for the segment information.
		NOTE that it has side effects. We should try to avoid calling it.
	[4] tss_load_seg(...) calls cpu_x86_load_seg_cache(...).
		cpu_x86_load_seg_cache() is a simple function that
		sets the contents of the SegmentCache[seg_reg] entry.
  (d) plan: to get the right FS[0]. We will do:
	load_segment(env, e1, e2, VALUE_OF_FS)
	base = get_seg_base(e1, e2);
	Problem: what is the value of FS register? should be somewhere in
	CPUX86State.
	It seems that VALUE_OF_FS IS the CPUX86State.segs[R_FS].selector
	(this can be evidenced by the implementation of "push fs" in 
	translate.c -- however, this eventually leads to the problem:
	for about 50% of the processes, the dump of FS:[0] is NOT right!)
  (e) After dumping every 100th instruction of the same process, we find that
	50% of the time, when the instruction address is 0x0804xxxx (kernel space),
	the FS register is not pointing to the TIB structure. Clearly,
	the FS value is saved somewhere ...

	It might be possible to retrieve FS from the TSS (Task State Segment). However,
	the logic is quite complex: we would need to look at why the kernel is
	invoked (e.g., via JMP, via IRET, or via an interrupt).

	We'll use a slightly costly but doable approach:
	simply check whether FS:[0x18] holds the address of FS:[0]. If it does not match,
	simply go ahead, because this is not the instruction at which to take the
	process name.
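
The pieces of section (4) can be sketched as follows: decoding the table-indicator bit of a selector, computing base and limit from the two descriptor words e1 and e2 (standard x86 descriptor layout), and the FS:[0x18] self-pointer check. This is a sketch under assumptions, not the code in seg_helper.c: all names are hypothetical, and mock_read_u32 is a mock guest-memory reader that maps a single TIB at an example address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEL_TI_MASK 0x4u        /* 3rd bit of a selector: 0 = GDT, 1 = LDT */
#define DESC_G_BIT  (1u << 23)  /* granularity bit in the high word e2 */

bool selector_uses_ldt(uint16_t selector)
{
    return (selector & SEL_TI_MASK) != 0;
}

/* Base is split across both descriptor words. */
uint32_t sketch_seg_base(uint32_t e1, uint32_t e2)
{
    return (e1 >> 16) | ((e2 & 0xffu) << 16) | (e2 & 0xff000000u);
}

/* Limit is 20 bits; with G set it counts 4 KiB pages. */
uint32_t sketch_seg_limit(uint32_t e1, uint32_t e2)
{
    uint32_t limit = (e1 & 0xffffu) | (e2 & 0x000f0000u);
    if (e2 & DESC_G_BIT) {
        limit = (limit << 12) | 0xfffu;
    }
    return limit;
}

/* Hypothetical guest-memory reader: returns false if addr is unmapped. */
typedef bool (*read_u32_fn)(uint32_t addr, uint32_t *out);

/* Mock memory: one TIB at 0x7ffde000 whose self pointer sits at +0x18. */
bool mock_read_u32(uint32_t addr, uint32_t *out)
{
    if (addr == 0x7ffde000u + 0x18u) {
        *out = 0x7ffde000u;
        return true;
    }
    return false;
}

/* FS points at a TIB only if the word at FS:[0x18] is the FS base itself. */
bool fs_points_to_tib(uint32_t fs_base, read_u32_fn read_u32)
{
    uint32_t self;
    assert(read_u32 != NULL);
    if (!read_u32(fs_base + 0x18u, &self)) {
        return false;
    }
    return self == fs_base;
}
```

The check costs one extra guest read per traced instruction, which matches the "costly but doable" trade-off described above.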

(5) ----------------- MINI Project 6: I/O--------
	(1) How is keyboard input handled?
	   QEMU uses a VNC console. In the console, when a key is pressed,
       a function named "kbd_put_keycode" in ui/input.c is called. Then the
	function ps2_put_keycode in hw/ps2.c is called. NOTE that the
	keycode is a "keyscan code" (not ASCII). Note that in ps2.c there
	are two states: PS2Keyboardstate and PS2BoardState. Each maintains
	a queue of events, so ps2_put_keycode simply appends the
	keycode to the queue. hw/ps2.c also provides a function ps2_read_data()
	to read out the data; this is clearly prepared for the "in" instructions.
	Sequence of functions: kbd_read_data->ps2_read_data.
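
The put/read pair described above behaves like a small ring buffer. The sketch below is an illustration of that idea only, not the code in hw/ps2.c: SketchQueue, sketch_put_keycode, sketch_read_data and the capacity of 16 are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUEUE_SIZE 16    /* assumed capacity */

typedef struct SketchQueue {
    uint8_t data[SKETCH_QUEUE_SIZE];
    int rptr;                   /* next slot to read */
    int wptr;                   /* next slot to write */
    int count;                  /* number of queued scancodes */
} SketchQueue;

/* Producer side (console): append one scancode; drop it if full. */
bool sketch_put_keycode(SketchQueue *q, uint8_t scancode)
{
    assert(q != NULL);
    if (q->count >= SKETCH_QUEUE_SIZE) {
        return false;
    }
    q->data[q->wptr] = scancode;
    q->wptr = (q->wptr + 1) % SKETCH_QUEUE_SIZE;
    q->count++;
    return true;
}

/* Consumer side (guest "in" instruction): pop the oldest scancode,
 * or return 0 when the queue is empty. */
uint8_t sketch_read_data(SketchQueue *q)
{
    uint8_t val;
    assert(q != NULL);
    if (q->count == 0) {
        return 0;
    }
    val = q->data[q->rptr];
    q->rptr = (q->rptr + 1) % SKETCH_QUEUE_SIZE;
    q->count--;
    return val;
}
```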

  	Now let's look at the QEMU emulator side. When an instruction "INB eax, 0x64"
 is met, disas_insn is called. Note the INB opcode is 0xec.
 The translated code includes "cpu_inb" in ioport.c. It then calls ioport_read,
 which, based on a function pointer array "ioport_read_table", determines
 which function to call. The table is initialized by register_ioport_read.
 Using GDB (set a bp on register_ioport_read), we can find that
  a general function, ioport_readw_thunk, is registered, which calls
  IORange->ops->read. We didn't spend time investigating how the registration
  is done. Next we'll use a breakpoint in GDB to find out which I/O port is used
  for getting user input from the keyboard. (Here a software BP seems OK; a hardware
	BP seems slightly faster.)
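
The dispatch through a function pointer table can be sketched like this. It is a simplified model of the mechanism described above, not the code in ioport.c: the sketch_* names, the table layout and the 0xff value for unassigned ports are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 0x10000    /* x86 has a 16-bit I/O port space */

typedef uint32_t (*sketch_port_read_fn)(void *opaque, uint32_t port);

/* One handler and one opaque device pointer per port. */
static sketch_port_read_fn sketch_read_table[SKETCH_MAX_PORTS];
static void *sketch_opaque[SKETCH_MAX_PORTS];

/* Fill in the table entry for one port. */
void sketch_register_ioport_read(uint32_t port, sketch_port_read_fn fn,
                                 void *opaque)
{
    assert(port < SKETCH_MAX_PORTS);
    sketch_read_table[port] = fn;
    sketch_opaque[port] = opaque;
}

/* What an emulated byte-sized "in" would do: look up and call the handler. */
uint32_t sketch_inb(uint32_t port)
{
    sketch_port_read_fn fn = sketch_read_table[port & 0xffffu];
    if (fn == NULL) {
        return 0xffu;           /* unassigned port (assumed value) */
    }
    return fn(sketch_opaque[port & 0xffffu], port) & 0xffu;
}

/* Example device: returns the pending scancode and ignores the port
 * argument, much like kbd_read_data ignores 'addr'. */
uint32_t sketch_kbd_read(void *opaque, uint32_t port)
{
    (void)port;
    return *(const uint8_t *)opaque;
}
```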

   **********************************************************
	1) gdb qemu-system-i386
	2)  b main
	3) run -m 256 -hda winxp.img -no-kvm 
	4) !!! when we see SIGUSR1, type
	>> handle SIGUSR1 noprint
		Note: don't type "HANDLE SIGUSR1 ignore" -> it's going to
		break the running of the system
	5) Ctrl+c and then set the breakpoint on kbd_read_data.
		!!! PAY ATTENTION: IF WE STOP AT THE BREAKPOINT, THE
		ENTIRE SYSTEM IS FROZEN (BECAUSE KEYBOARD IS CAPTURED).
		!!! We need to let the system continue and print out
		!!! the information we want
		*** set BP at line 324 of pckbd.c (which is the branch for
		handling keyboard events)
		>> b pckbd.c:325 if val==0x1e (if "a" is pressed, the scancode
			for 'a' is 0x1e)
		>> commands
		  >> silent (must be the first command in the list to take effect)
		  >> backtrace (we want to see who calls the keyboard function)
		  >> cont
		  >> end
		Somehow, the above still freezes the system -- not sure why; the same
		trick works well with other breakpoints in other parts of the system.
	        BUT, we are able to see the dump of the backtrace:

	 	ioport.c  cpu_inb  calls
		          ioport_read calls (retrieves func pointer from
					      ioport_read_table) port:???
	         memory.c  memory_region_iorange_read
				memory_region_read_access calls
		 pckbd.c        kbd_read_data note that the input parameter
				'addr' is NOT used.
			  
  CONCLUSION: the "INB EAX, 0x60" instruction is used to read from KEYBOARD. Port number is 0x60!!!			  

   **********************************************************

(6) --------------- PORTING TO C++ ------------------------
   Objective: add an instruction class to the system so that we can use STL.
   Approach: establish a folder called "traceinstr", and create the Makefile as usual. Then in the root folder Makefile, add a ".PHONY" rule about traceinstr, and add the rule "make -C traceinstr" to compile the directory as a target.
   The real trouble is the linker. Most of the system is compiled using "cc" (the C
compiler), but it needs to link with the files in "traceinstr". We have to use
extern "C" to wrap the functions to be imported by the C language part.
   Note we also need to add "-L../traceinstr -linstr" to the Makefile of the i386-softmmu folder.
   Then in rules.mak replace the "LINK" definition with "g++" (however, this will trigger an error about main missing in multiboot.o). Then go to pc-bios/optionrom/Makefile and change the call of $(LD) directly to "ld"!!! (This is to allow the
BIOS stuff to use the C compiler and C linker, while the other parts of qemu
would still be OK.)
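
A sketch of the extern "C" wrapping, written so that the same declarations compile under both cc and g++. The function names instr_record and instr_count are hypothetical and are not taken from traceinstr; the definitions at the bottom are stand-ins so that the sketch links by itself.

```c
#include <assert.h>
#include <stdint.h>

/* Shared declarations. Under a C++ compiler the guard gives these
 * functions C linkage (no name mangling), so C callers can link to them. */
#ifdef __cplusplus
extern "C" {
#endif

void instr_record(uint32_t eip);    /* hypothetical entry point */
uint32_t instr_count(void);         /* hypothetical query function */

#ifdef __cplusplus
}
#endif

/* Stand-in definitions. In the setup described above these would live in
 * a C++ source file under traceinstr and could use STL containers. */
static uint32_t recorded;

void instr_record(uint32_t eip)
{
    (void)eip;
    assert(recorded != UINT32_MAX); /* guard against counter overflow */
    recorded++;
}

uint32_t instr_count(void)
{
    return recorded;
}
```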

(7) ------------------ MEMORY TRACING ----------------------------
  Objective: we'll track which addresses each instruction reads from and writes to.
  Idea: instrument cpu_x86_ldsb_code etc. Using GDB, we can find out that
 all the memory loading and saving operations are essentially defined as
 macros in include/exec/softmmu_header.h:85. [It looks like a macro of
 the same function for many different data sizes, something like
 a C language version of a C++ template.] This saves code space, but is
 very UGLY and creates a lot of trouble. If we want to instrument and print
 cr3, note that at line 85 the CPUArchState can be CPUX86State or CPUAlphaState
 (which does not have cr3).

  The trick is to also use MACROs: define a macro "dummy_cr3" and use it
 in the definition in softmmu_header.h. Then "dummy_cr3" is SPECIALLY redefined
 in cpu.h in target-i386.
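
The dummy_cr3 trick can be sketched with two mock state types. Everything except the macro name dummy_cr3 is an assumption made for illustration (the Mock* types, SKETCH_TARGET_I386, sketch_traced_cr3); the real headers are organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two mock CPU state types: only the x86 one has control registers. */
typedef struct MockX86State   { uint32_t cr[5]; } MockX86State;
typedef struct MockAlphaState { uint64_t pc;    } MockAlphaState;

/* "Target header" part: a target that has CR3 defines the macro itself. */
#define SKETCH_TARGET_I386 1
#if SKETCH_TARGET_I386
typedef MockX86State MockArchState;
#define dummy_cr3(env) ((env)->cr[3])
#else
typedef MockAlphaState MockArchState;
#endif

/* "Generic header" part: fall back to a constant for targets that did
 * not define it, so the shared code compiles for every target. */
#ifndef dummy_cr3
#define dummy_cr3(env) (0u)
#endif

/* Shared instrumentation code uses the macro and never touches cr[]. */
uint32_t sketch_traced_cr3(MockArchState *env)
{
    assert(env != NULL);
    return (uint32_t)dummy_cr3(env);
}
```

Setting SKETCH_TARGET_I386 to 0 selects the mock Alpha state and the same function then returns 0, which is the point of the fallback.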

(8) ----------------- NETWORK ---------------------------
  The trick is to specify the model rtl8139, see run.sh in the qemu_image folder.

(9) ------------------ DEBUG MEMORY ERRORS -----------------------
  Sometimes, when a read goes wrong (causing segmentation-related faults), we want
 to intercept the error. Setting a BP on "gen_exception" does not work; instead,

  **** set a breakpoint on raise_exception_err of mem_helper.c in target-i386. !!! It seems that exception_index 14 and error_code 0 often cause a BLUESCREEN.
Using backtrace we can find the error.

   One problem we have with ops_sse.h is that if we adjust
 WAIT_INS to a smaller number like 101, the system will crash with a bluescreen.
 Using the above technique, we can identify that when we try to read
 the process name too early (using the FS:[0] structure->0x3c offset),
 the contained address is usually an illegal address. This causes the
 problem.

  To fix the problem, we need a way to tell whether an address is a legal address.
READ mem_helper.c in target-i386. It seems that we could use the
cpu_memory_rw_debug(...) function to verify that an address is legal (it basically
checks the validity bit in the page table and checks the access rights). Use it
in ops_sse.h where we try to read the pFilePath.
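
The "check before dereferencing" idea can be sketched as a guarded read. The sketch does not call cpu_memory_rw_debug itself; mock_rw_debug is a hypothetical stand-in that returns 0 on success and -1 for an unmapped address, and the mock memory contains a single 4 KiB page at an arbitrary example address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mock guest memory: one mapped 4 KiB page. */
#define MOCK_PAGE_BASE 0x00020000u
#define MOCK_PAGE_SIZE 0x1000u
uint8_t mock_page[MOCK_PAGE_SIZE];

/* Hypothetical stand-in for a debug read of guest memory:
 * returns 0 on success, -1 if any byte of the range is unmapped. */
int mock_rw_debug(uint32_t addr, uint8_t *buf, uint32_t len)
{
    if (addr < MOCK_PAGE_BASE || len > MOCK_PAGE_SIZE ||
        addr - MOCK_PAGE_BASE > MOCK_PAGE_SIZE - len) {
        return -1;
    }
    memcpy(buf, &mock_page[addr - MOCK_PAGE_BASE], len);
    return 0;
}

/* Follow a guest pointer only after confirming it can be read. */
bool sketch_read_guest_u32(uint32_t addr, uint32_t *out)
{
    uint8_t raw[4];
    assert(out != NULL);
    if (mock_rw_debug(addr, raw, sizeof raw) != 0) {
        return false;           /* illegal address: skip, do not fault */
    }
    *out = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8)
         | ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
    return true;
}
```

With a guard like this, reading the process name too early only produces a skipped sample instead of a guest page fault.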


(10) ------------------- SNAPSHOT and Monitor ------------------------
To avoid restarting the system every time, we can use the monitor, by
appending "-monitor stdio" to the command.

 We can use the "savevm name" and "loadvm name" commands to save and restore
snapshots (albeit it's a bit time-consuming, about 30 seconds, but it's
better than restarting the system).

 Using the monitor we can also use the mouse_move xoff,yoffset command to reset
the mouse position, which is convenient.

  NOTE THAT, however, it is not working well with GDB!!! (It's actually OK, as long
as exactly the same qemu-system-i386 command arguments are used as when
the vmsnapshot was taken.)

(11) --------------------------- system interrupt handling observation ----
  When an interrupt occurs, a piece of the OS interrupt handler code
 (with an IRET) is inserted. Notice that this can
 be an I/O event and can happen anywhere.
  
(12) ------------------ revisit memory tracing -------------------------
  From translate.c, by studying disas_insn and a memory-storing instruction
(such as push), we can trace into file tcg-op.h to the
micro operation tcg_gen_qemu_st32, where "st" stands for "store"; it translates
into a micro-instruction INDEX_op_qemu_st32. Then we need to study
where INDEX_op_qemu_st32 is executed.

 *** INDEX_op_qemu_st32 *** 
 Doing a grep search of the above, we can find tcg/i386/tcg-target.c:1348 *** static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,***
This function contains the logic. It calls tcg_out_tlb_load to load
the TLB, and it writes the data using tcg_out_qemu_st_direct. It then
calls tcg_out_modrm_offset. *** These functions take the
TCG micro-operation code and generate the corresponding machine instructions
on the target i386 system for these micro-opcodes.

  Using GDB, we could find that these tcg_out_*** functions are actually 
called by *** cpu_x86_exec ***, it calls tb_find_slow/fast to generate/locate
the tb_code and then it calls tcg_qemu_tb_exec to execute the generated code
 tcg_qemu_tb_exec is defined as a macro:
	code_gen_prolog(env, tb_ptr)
	code_gen_prologue = code_gen_buffer + code_gen_buffer_size - 1024;

   At code_gen_buffer is a function that takes env and tb_ptr as two parameters and executes the block; now the problem is to figure out the source code of
code_gen_buffer. Using GDB, we could find that the code_gen_prolog address is 0x8bf8d04 (use x/16i $eip at line 599 of cpu-exec.c). NOTE!! need to use a hardware breakpoint here!!! Then we step into b71e5c00 (the code_gen_prolog buffer). However, we could not see the source (most likely because this is dynamically generated
code). Stepping through a couple of instructions, we find that
it's calling helper_trace2.

  So, in summary, it's the dynamic compilation of binary code that builds the
machine code to call helper_trace2 for each instruction.  It's the
tcg_out_qemu_st that generates the binary instruction that
stores the memory words. 

  ** tcg_out_qemu_st *** logic:  (supports multiple addressing modes, addressing
parameters stored in args, seems not easy to handle [too many cases])
          tcg_out_tlb_load: 
		tcg_out_mov(r0, addrlo) (e.g., instantiated to EAX <- EBX?)
		     tcg_out_modrm(opc, ret, arg): opc must be X86 opcode (e.g., MOV 0x8b EAX, EBX)
		tcg_out_mov(r1, addrlo) (r1 <- addrlo)
		tcg_out_shifti (r0, 8) [shift r0 8 bits to the right; the generated code is shr $0x8]
		tgen_arithi(r1 AND (TARGET_PAGE_MASK | ((1 << s_bits)-1)))
		tgen_arithi(r0 AND (CPU_TLB_SIZE-1)<<CPU_TLB_ENTRY_BITS)
		tcg_out_modrm_sib_offset opcode: 141 is LEA r0 + offset(tlb_table[mem_index]), the function is responsible for generating the INDEX BASED
ADDR MODE instruction given any opcode.
		tcg_out_modrm_offset CMP r1, 0(r0)
		jne slow path r1 <-addrlo
		... clearly it's to generate instructions to READ/CHECK the TLB table.
		
          then call tcg_out_qemu_st_direct: generate the MOVL instruction to
save 32-bit word using register indexed mode. --> so it does generate
a save instruction. Similarly: tcg_out_ld/tcg_out_st:  basically register based MOVL
 

 ******* line 1072 of tcg/i386/tcg-target.c has the explanation of
   all local variables !!!! ADDRLO_IDX contains the index into the ARGS (lower
	part of the address), ADDRLO_IDX+1 stores the higher part,
	MEM_INDEX: memory context index; S_BITS: log2 of the access size.


 ***** note: helper_ldb(wlq)_mmu, helper_stb(wlq)_mmu defined in softmmu_template.h. Their ENV is exactly the same as the subsequent helper_trace2 parameter,
however env->eip is not always the pc_start. So the question is: why is
pc_start different from env->eip?


(13) ------------------ revisit MEMORY TRACING Part 2 ---------------------
 It seems that the key to the current problem of memory tracing is 
the difference between env->eip and pc_start
 Experiment 1: in handle_trace2 function, print both the instructions. 
Result: env->eip contains the address of the first instruction of a 
code block (no branches)

 Experiment 2: check if CPUX86State has anything similar to eip.  
Result: CPUX86State does not store anything about EIP other than the
env->eip data attribute. Now the problem is how does handle_trace2 get
the program counter. Note that it's disas_insn who passes handle_trace2
the cpu_env (a global variable for TCG micro-operations)

 Experiment 3: we then need to figure out how EIP register is updated
by micro operations. Look at how NOP instruction is done! Break at
translate.c:6879, where NOP instruction is handled and look at how 
EIP is updated. Actually the new EIP is returned. It seems that the
micro-instructions do not update EIP (more maintain the EIP in 
the global state).

 Experiment 4: ADD a new GLOBAL VARIABLE called my_eip, which is
updated whenever the handle_trace2 is called. See if it is called
BEFORE or AFTER the memory access.
 Conclusion: READ operations are always logged before handle_trace2 is called.
Strangely WRITE operations are not logged. THESE READ operations are called
only when used to load instructions.
  ??? cpu_x86_ldxxx is not called whenever a LOGICAL MEMORY READ ACCESS
is performed by an instruction, it is called lazily when TLB flush????

 Experiment 5: Add a hook to the helper_ldb(lq)_mmu and helper_st(blq)_mmu.
Question is READ/WRITE recorded correctly.

 Conclusion: helper_ld and helper_st functions are called by tci.c
(interpreter of intermediate TB code), e.g., for the ldu8 opcode of TCG.
 *** 

 Experiment 6: see if helper_ldb(lq)_mmu is called for every memory read.
Use GDB.
 Conclusion: the system does not always call helper_ldb and helper_st
when the instruction has to read or write memory. It only does them
 occasionally when it's a jump (translation block has to change)
 and occasionally on some push operations (but no pattern clearly to follow).
 Looks like TLB operation caches the right at the level of helper_ldb. 

 Experiment 7: revisit tcg_out_qemu_ld and tcg_out_qemu_st and see
their relation with each instruction. They do not correspond to the
machine instructions (instead, they are actually called by the
disas_insn function). Similarly there is no direct correspondence with
helper_ldb(wlq)_mmu and helper_st functions!!!

 Experiment 8: read source file of disas_insn (translate.c) and check
 how addl %(ecx), %ebx is handled. Opcode: 0x01. 
 They are calling tcg_gen_qemu_ld32u .. tcg_gen_qemu_st32 which generate
 the intermediate TCG code. They are called by disas_insn
 and the passed addr is the operand id (not the real address).
 However, the helper_ldb_mmu are not called correspondingly.

 Experiment 9: examine the generated code for a sequence of PUSH instructions.
Start from disas_insn and set a breakpoint at where it handles PUSH instruction,
it's at line 5132, which consists of two micro operations, first move
the register contents to a temporary variable T0 and then push_T0 into
stack. where gen_push_T0,  first reduces ESP by 4, and then calls 
gen_op_st_T0_A0 which eventually calls ***tcg_gen_qemu_st32***!
  Note *** tcg_ctx.gen_opc_ptr points to the location of the current
 micro-instruction. It calls tcg_gen_code(tcg_ctx, char *ptr code_buf)
to generate from intermediate code to machine code. 

!!! tcg_optimize does
some optimizations like liveness analysis and operand optimization
and REALLOCATES REGISTERS. So it's not strictly following the 
micro operations generated for each instruction!!!

 Experiment 10. find out how to disable the optimization
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--------------------------------------------------------------
disable the first two macros: USE_LIVENESS_ANALYSIS and USE_TCG_OPTIMIZATIONS
in tcg/tcg.c
handle_instr verified no problem (passing pc_start no problem).
It seems that disable USE_TCG_OPTIMIZATIONS does not HELP!!!! Strange!!!
We'll have to check how the intermediate code is translated next.
--------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

----------------------------------------------------------------------
 Experiment 11. Find out how instructions are dynamically recompiled.
----------------------------------------------------------------------
 Step 1. Find out a TB (translation block) that contains at least one
push instruction that does not have a helper_stl_mmu called.
      		This can be done using breakpoints on translate.c:5132
and counting the number of times tb_find_slow is called.
		Data: take the pc_start 0xf1ec6, 0xf1ec7, 0xf1ec8 (all push),
using env->eip we know that the TB (translation block) starts at 0xf1ec6,
doing "info b" we know that when 0xf1ec6 (as pc_start) is hit,
tb_find_slow has been hit 16 (or 17) times - just manually control.
**** We could set a conditional breakpoint at tb_find_slow (when pc==0xf1ec6)
		Discovery: tb_find_slow first tries to locate, by a hash, whether
the TB has already been generated. It turns out the TB needs
to be generated, so it calls tb_gen_code; note that the passed
PC is 0xf20d8.

Step 2. Find out how intermediate code is generated
		tc_code is stored at tb->tc_ptr (code_gen_ptr) at 0xb31e6b30.
It then calls cpu_gen_code(), which calls gen_intermediate_code first. It
first calls gen_intermediate_code_internal.  Note DisasContext->tb always
points to the current TB. 
		Intermediate code: opcode is stored at tcg_ctx.gen_opc_ptr,
num_insns records the number of instructions in this TB.
   push instruction:      tcg_ctx.gen_opc_ptr (0x8c1cde2); note opcodes are defined in tcg/tcg-opc.h (count end as 0, nop as 1 ... and then mov_i32 is 10). Arguments are stored at tcg_ctx.gen_opparam_ptr (0x8c1d30c).
  push instruction generates the opcodes from 0x8c1cddc to 0x8c1cdee (use the x
command). These opcodes are defined in tcg/tcg-opc.h
     11 11  10 10 11 23 120 10 8
	10: mov_i32, 11: movi_i32, 23: sub_i32, 120 (0x78): qemu_st32,
	8: call

  Note: at 0xf1ec6, 0xf1ec7, 0xf1ec8 there are three consecutive PUSH instructions. All of them are translated into the same sequence of 10 micro-operations:
each micro-operation code (opcode/opc) is 16-bits long. Again, notice opcode
120 (0x78) is qemu_st32.
     11 11 8 10 10 11 23 120 10 8 
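
  The opcode dump above can be decoded mechanically. The sketch below is
only an illustration: the number-to-name table is copied from the values
observed in this GDB session (they depend on the QEMU build), and uop_name,
count_uop and uops_in_range are hypothetical helper names, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcode numbers as observed in these notes (build-specific). */
static const char *uop_name(uint16_t opc)
{
    switch (opc) {
    case 8:   return "call";
    case 10:  return "mov_i32";
    case 11:  return "movi_i32";
    case 23:  return "sub_i32";
    case 120: return "qemu_st32";
    default:  return "?";
    }
}

/* The micro-op sequence recorded for one PUSH instruction. */
static const uint16_t push_uops[] = {11, 11, 8, 10, 10, 11, 23, 120, 10, 8};

/* Count how many entries of a 16-bit opcode stream equal opc. */
static size_t count_uop(const uint16_t *ops, size_t n, uint16_t opc)
{
    size_t hits = 0;

    assert(ops != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == opc) {
            hits++;
        }
    }
    return hits;
}

/* Number of 16-bit micro-ops stored in the byte range [start, end). */
static size_t uops_in_range(uint32_t start, uint32_t end)
{
    return (end - start) / sizeof(uint16_t);
}
```

  With these, count_uop(push_uops, 10, 120) confirms exactly one qemu_st32
per PUSH, and uops_in_range gives the micro-op count for any address range
dumped with the x command.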

Step 3. Find out how intermediate code is translated
  set a breakpoint at 8107 (to break out of the loop of disas_insn); at
8135 it generates the icount. Currently the number of instructions
translated is: 11. Micro-instructions from 0x8c1cddc to 0x8c1ce92 (182 bytes,
i.e., 91 micro-ops, about 8 per instruction). When gen_intermediate_code()
finishes, it returns to cpu_x86_gen_code.
   The generated code is stored at global variable gen_code_buf (and also
tb->tc_ptr): 0xb31e6b30. It then calls tcg_gen_code() to generate the x86 code.
Now let's look at tcg_gen_code().

   In tcg_gen_code(), s->code_ptr (initial value: 0xb31e6b30) always points to the current location of the x86 code. It first calls tcg_gen_code_common.
   In tcg_gen_code_common(), it first calls tcg_liveness_analysis. As
we disabled the MACRO earlier, the dummy version of tcg_liveness_analysis()
is called. It then calls tcg_reg_alloc_start(s), which 
has to actually go through the entire list of 
temporary variables and registers and set their status. 
   Back to tcg_gen_code_common(), it then has a processing loop starting from
line 2258 of tcg.c, which processes the register allocation for
every intermediate instruction. Note that
it nicely displays the ENUM string of each opcode value. The first being
processed is INDEX_op_movi_i32 (value: 11, using "p (int)opc" to display
it in GDB). It first allocates registers for the movi micro-operation by
calling tcg_reg_alloc_op, which actually generates
x86 code.  Use gdb x/16i 0xb31e6b30 (the original starting point).
For the first two micro-operations, it is generating
the helper_trace2(env, 0xf1ec6) for instruction 0xf1ec6. Every movi32 corresponds to about 2 to 3 machine instructions.  For these MOV instructions, they
are translated using functions like tcg_out_ld and tcg_out_modrm_offset
to generate x86 code.
  Now observe the opcode 120 (at index 7), let's observe the logic: it's
going to generate instructions at 0xb31e6b4e (which is 6 instructions
away from the call of helper_trace2). Stepping through the
tcg_reg_alloc_op, we found that it calls tcg_out_mov and at line 1985 of tcg.c
it calls tcg_out_op(s, opc, new_args, const_args) where the opc is
INDEX_op_qemu_st32.
       in tcg_out_op(...INDEX_op_qemu_st32...), we find that it is located
at tcg-target.c: 1667. This is a big switch case, at line 1919, it calls
tcg_out_qemu_st(...), it first calls tcg_out_tlb_load(). This generates 
9 instructions at 0xb31e6b5e (9 instructions away from helper_trace2).
   They are: (not sure about how it corresponds to loading TLB)
    mov %ecx,%eax; mov %ecx,%edx; shr $0x8,%eax;
    and $0xfffff003,%edx; and $0xff0,%eax;
    lea 0x360(%ebp,%eax,1),%eax; cmp (%eax),%edx; mov %ecx,%edx;
    jne 0xb31e6b82; add 0x8(%eax),%edx
   Now it's going to generate code at 0xb31e6b85 (which is 19 instructions
away from the call of helper_trace2) by tcg_out_qemu_st_direct. It calls
tcg_out_modrm_offset (.. OPC_MOV_EvGv...) that generates 1 instruction
starting from 0xb31e6b85: mov %edi, (%edx). 
   tcg_out_qemu_st next calls add_qemu_ldst_label() to add the current
context of store into ldst label, but it does not generate ANY more
instructions!!!!
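
  The arithmetic done by the generated TLB-check instructions can be
replayed in plain C. This is a sketch under assumptions: the constants are
copied from the disassembly of this particular build (shr $0x8, and $0xff0,
and $0xfffff003, lea 0x360(...)), the guest address is assumed to be in %ecx
and env in %ebp, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the disassembly above; specific to that build. */
#define TLB_SHIFT     8u           /* shr $0x8,%eax               */
#define TLB_OFF_MASK  0xff0u       /* and $0xff0,%eax             */
#define TLB_TAG_MASK  0xfffff003u  /* and $0xfffff003,%edx        */
#define TLB_TABLE_OFF 0x360u       /* lea 0x360(%ebp,%eax,1),%eax */

/* Byte offset (relative to env) of the TLB entry selected by addr. */
static uint32_t tlb_entry_offset(uint32_t addr)
{
    return TLB_TABLE_OFF + ((addr >> TLB_SHIFT) & TLB_OFF_MASK);
}

/* Value compared with the first word of the entry (cmp (%eax),%edx).
 * The low two bits are kept, so a misaligned 4-byte access yields a
 * tag that cannot match a page-aligned entry. */
static uint32_t tlb_tag(uint32_t addr)
{
    return addr & TLB_TAG_MASK;
}

/* The same entry offset written as "page number modulo 256, times 16". */
static uint32_t tlb_entry_offset_alt(uint32_t addr)
{
    uint32_t off = TLB_TABLE_OFF + (((addr >> 12) & 0xffu) << 4);

    assert(off == tlb_entry_offset(addr));
    return off;
}
```

  For example, a 4-byte store to guest address 0x0012f7a0 selects the entry
at env+0x650 and compares the tag 0x0012f000 with it.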

---------------------------------------------------------------------
---------------------------------------------------------------------
  Conclusion: when generating the x86 code for INDEX_op_qemu_st32,
it does not generate the call for helper_stl_mmu at all!!! Instead,
it flushes TLB. Note that ldst_labels are created instead, and
they are processed at the end of the TB, which is strange.
--------------------------------------------------------------------
--------------------------------------------------------------------

Next to figure out: add_qemu_ldst_label()??? What do these ldst_labels do?

------------------------------------------------------------------------
Experiment 12: find out add_qemu_ldst_label and TLB_flush logic
-----------------------------------------------------------------------
	
 (1) tcg/i386/tcg-target.c defines tcg_out_tlb_load, it loads TLB given
  addrlo_idx (which can be used to locate the address), it computes
  address = s->args[addrlo_idx] + s->args[addrlo_idx]<<16; Then
  it computes the address of env->tlb_table[mem_index][0], and then
  tests TLB hit (using a comparison), conditional jump to a branch to
 load TLB. It seems that parameter "which" could be used to determine
  if it's read or write.  To figure it out do the following GDB experiment:
    *** set a conditional breakpoint at tb_find_slow when pc==0xf1ec6, and
step into the tcg_out_tlb_load. Let's check the addr:
	addrlo_idx = 1
	args[addrlo_idx] = 1
	args[addrlo_idx+1] = 0
	Seems that the logic of computing address is MUCH MORE complex than this,give this up first unless we do not have any other solutions

  (2) tcg/i386/tcg-target.c. tcg_out_qemu_st_direct: has two parameters:
   datalo, datahi (seems to be the two parts of the address), it first
   generates a MOVL instruction and then creates an ldst label, 
   cannot step into add_qemu_ldst_label, seems to be a macro

  (3) add_qemu_ldst_label: creates a ldst label structure, it has information
 of addrlo_reg, addrhi_reg, datalo_reg, datahi_reg.

  (4) tcg_out_qemu_ld_slow_path takes the TCGLabelQemuLdst and calls the
  helper method (helper_st/ld_mmu) to read/write. The address is pushed
  into the stack for helper_st/ld command by pushing the contents of
  addrlo_reg/addrhi_reg in real x86 code.
     Now the question is when tcg_out_qemu_ld_slow_path is called? 
  Call path: tcg_gen_code_common (the_end at line 2327 of tcg/tcg.c) ->
                   tcg_out_tb_finalize -> tcg_out_qemu_ld(st)_slow_path

        DON'T UNDERSTAND THE LOGIC HERE: why perform all the read/write
  operations at the end of the X86 code for a Translation Block? (Why 
  delay them)?
     *** tcg_gen_code_common:   most operators except for mov will be
 handled by tcg_reg_alloc_op (which, e.g., handles INDEX_op_st32 opcode),
 that performs the translation. After the for loop of processing, it 
 calls tcg_out_tb_finalize (at i386/tcg-target.c:1648) it reads
 TCGContext->qemu_ldst_labels one by one and performs the action
 one by one. It seems that raddr corresponds to the x86 code of the ld/st.
 Note at line 1559, it generates a JUMP instruction to raddr! It seems
 to be weaving the logic of ld/st in! [so the code is actually generated
 at the end by modifying the code].
                   

--------------------------------------------------------------------
Experiment 13: Watch the behavior of tcg_out_tb_finalize. Correlate
with the x86 code generation for opcode 120 (0x78)
--------------------------------------------------------------------
  Set up: (1) We could set a conditional breakpoint at tb_find_slow when pc==0xf1ec6), step into tcg_gen_code_common() and record the opcode
generated by the instructions one by one and the address of ldst_labels.
Run another round and Break on handle instruction to get the disassembly of 
these two instructions.

	(2) Step into tcg_out_tb_finalize and look at the generation of
x86 code and compare the change of JMP.

	(3) set breakpoint on the corresponding instructions and do a step
by step.

   Data below:
  x86 code (tb->tc_ptr: 0xb31e6b30)
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  Instruction 1:  (@f1ec6) PUSH EBX
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  Intermediate Code: Stored tcg_ctx.gen_opc_ptr 0x8c1cddc
   11 (movi) 11 8 (call) 10 (mov_32) 10 11 23 (sub_i32) 120 (qemu_st32) 
	10 
  Generated Code: tb->tc_ptr 0xb31e6b30
  Opcode 120: s->code_ptr: 0xb31e6b4e to 0xb31e6b85
   0xb31e6b4e <code_gen_buffer+2894>:	mov    %eax,%edi
   0xb31e6b50 <code_gen_buffer+2896>:	mov    %eax,0x80(%esp)
   0xb31e6b57 <code_gen_buffer+2903>:	mov    %ecx,0x84(%esp)
   0xb31e6b5e <code_gen_buffer+2910>:	mov    %ecx,%eax
   0xb31e6b60 <code_gen_buffer+2912>:	mov    %ecx,%edx
   0xb31e6b62 <code_gen_buffer+2914>:	shr    $0x8,%eax
   0xb31e6b65 <code_gen_buffer+2917>:	and    $0xfffff003,%edx
   0xb31e6b6b <code_gen_buffer+2923>:	and    $0xff0,%eax
   0xb31e6b71 <code_gen_buffer+2929>:	lea    0x360(%ebp,%eax,1),%eax
   0xb31e6b78 <code_gen_buffer+2936>:	cmp    (%eax),%edx
   0xb31e6b7a <code_gen_buffer+2938>:	mov    %ecx,%edx
   0xb31e6b7c <code_gen_buffer+2940>:	
    jne    0xb31e6b82 <code_gen_buffer+2946>
   0xb31e6b82 <code_gen_buffer+2946>:	add    0x8(%eax),%edx
   0xb31e6b85 <code_gen_buffer+2949>:	mov    %edi,(%edx)
 --------- the following are for the next opcode mov32_i
   0xb31e6b87 <code_gen_buffer+2951>:	mov    0x84(%esp),%edi
   0xb31e6b8e <code_gen_buffer+2958>:	mov    %edi,%esi


 generated on label, see s->nb_qemu_ldst_labels
 display s->qemu_ldst_labels:
    p s->qemu_ldst_labels[0]
$26 = {is_ld = 0 (write), opc = 2, addrlo_reg = 1, 
	addrhi_reg = 0, datalo_reg = 7, datahi_reg = 0, 
	mem_index = 0, raddr = 0xb31e6b87 <code_gen_buffer+2951> "", 
	  label_ptr = {0xb31e6b7e <code_gen_buffer+2942> "", 0x0}}
  NOTE:!!!! raddr is 0xb31e6b87 (which is RIGHT AFTER the last instruction
  generated!!! Note the last instructions are some how added before the
  processing of next opcode). 




  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  Instruction 2:  (@f1ec7) PUSH EDI, 3rd instruction (PUSH ESI; PUSH EBX)
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  Intermediate Code: Stored tcg_ctx.gen_opc_ptr 0x8c1cdee
	11 11 8 10 11 23 120 10  
   Code starts from 0xb31e6bb1.
       0xb31e6bb1 <code_gen_buffer+2993>:	mov    %eax,%edi
   0xb31e6bb3 <code_gen_buffer+2995>:	mov    %eax,0x80(%esp)
   0xb31e6bba <code_gen_buffer+3002>:	mov    %ecx,0x84(%esp)
   0xb31e6bc1 <code_gen_buffer+3009>:	mov    %ecx,%eax
   0xb31e6bc3 <code_gen_buffer+3011>:	mov    %ecx,%edx
   0xb31e6bc5 <code_gen_buffer+3013>:	shr    $0x8,%eax
   0xb31e6bc8 <code_gen_buffer+3016>:	and    $0xfffff003,%edx
   0xb31e6bce <code_gen_buffer+3022>:	and    $0xff0,%eax
   0xb31e6bd4 <code_gen_buffer+3028>:	lea    0x360(%ebp,%eax,1),%eax
   0xb31e6bdb <code_gen_buffer+3035>:	cmp    (%eax),%edx
   0xb31e6bdd <code_gen_buffer+3037>:	mov    %ecx,%edx
   0xb31e6bdf <code_gen_buffer+3039>:	
    jne    0xb31e6be5 <code_gen_buffer+3045>
   0xb31e6be5 <code_gen_buffer+3045>:	add    0x8(%eax),%edx
   0xb31e6be8 <code_gen_buffer+3048>:	mov    %edi,(%edx)
   0xb31e6bea <code_gen_buffer+3050>:	add    %al,(%eax)
   0xb31e6bec <code_gen_buffer+3052>:	add    %al,(%eax)
  later at 0xb31e6bea the instructions for the next opcode are:
  0xb31e6bea <code_gen_buffer+3050>:	mov    0x84(%esp),%edi
   0xb31e6bf1 <code_gen_buffer+3057>:	mov    %edi,%esi


   It generates one ldst_label. dump s->qemu_ldst_labels[1] we have
  (gdb) p s->qemu_ldst_labels[1]
$33 = {is_ld = 0, opc = 2, addrlo_reg = 1, addrhi_reg = 0, datalo_reg = 7, 
  datahi_reg = 0, mem_index = 0, raddr = 0xb31e6bea <code_gen_buffer+3050> "", label_ptr = {0xb31e6be1 <code_gen_buffer+3041> "", 0x0}}

  Note: raddr is 0xb31e6bea (which is the next immediate code_ptr for
 next opcode)


  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 At line 1654 of i386/tcg-target.c tcg_out_tb_finalize
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

 There are 6 ldst labels.
 Process s->qemu_ldst_labels[0]: calls tcg_out_qemu_st_slow_path
	It first changes the 4-byte field at label_ptr[0] (which is 0xb31e6b7e, see the label dump above) to s->code_ptr - label_ptr[0].
	
So the original instruction at
----------------------------
   0xb31e6b7c <code_gen_buffer+2940>:	
    jne    0xb31e6b82 <code_gen_buffer+2946>
----------------------------
  is changed to 
  0xb31e6b7c <code_gen_buffer+2940>:	
    jne    0xb31e6e3a <code_gen_buffer+3642> (the beginning of the code
 to be generated by tcg_out_qemu_st_slow_path!!!!!)
---------------------------

  Then after the helper_stl_mmu call is generated, it uses the raddr to
 jump back to the instruction for the next immediate micro-operation!!!!
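
  The jne patching can be checked by hand. On x86 a near jump stores a 32-bit
displacement that is relative to the address right after the displacement
field. The sketch below applies that rule to the addresses recorded in this
session; rel32_for and jump_target are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement to store in a 4-byte rel32 field located at field_addr
 * so that the jump lands on target. */
static uint32_t rel32_for(uint32_t field_addr, uint32_t target)
{
    return target - (field_addr + 4u);
}

/* Inverse: where a jump with this rel32 field actually goes. */
static uint32_t jump_target(uint32_t field_addr, uint32_t rel32)
{
    uint32_t target = field_addr + 4u + rel32;

    assert(rel32_for(field_addr, target) == rel32);
    return target;
}
```

  With the field at 0xb31e6b7e, a displacement of 0 gives the original target
0xb31e6b82 (fall through to the fast path), and retargeting to the slow path
at 0xb31e6e3a needs a displacement of 0x2b8.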

----------------------------------------------------------
So the logic is very clear:
  for the INDEX_op_qemu_st32 opcode, tcg_gen_code_common first generates
 the following logic
        (1) load/check TLB
        (2) if TLB hit, load from TLB
        (3) otherwise jump to the helper_ld/st_mmu code (generated
	and hooked up/wired by the tcg_out_qemu_st_slow_path!!!)
 This is the reason why helper_ld/st_mmu is NOT called every time!
 --------------------------------------------------------- 

Conclusion call graph of the generated X86 CODE:
      load TLB
      check if TLB hit
      branch {go ahead} : {slow ld/st which calls helper function}
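
  A toy model of the call graph above shows why the helper is only seen
occasionally. Everything here is invented for illustration (a 256-entry
direct-mapped table, 4 KB pages, the ToyTlb type); it is not the QEMU data
structure, only the same hit/miss idea.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TLB_ENTRIES 256u
#define TOY_PAGE_MASK   0xfffff000u
#define TOY_EMPTY       0xffffffffu  /* never equals a page-aligned address */

typedef struct {
    uint32_t tag[TOY_TLB_ENTRIES];   /* cached page address per slot        */
    unsigned slow_calls;             /* times the slow path ("helper") ran  */
} ToyTlb;

static void toy_tlb_reset(ToyTlb *t)
{
    for (unsigned i = 0; i < TOY_TLB_ENTRIES; i++) {
        t->tag[i] = TOY_EMPTY;
    }
    t->slow_calls = 0;
}

/* Model one guest store. Returns 1 on a fast-path hit, 0 when the
 * slow path had to run and refill the entry. */
static int toy_store(ToyTlb *t, uint32_t addr)
{
    uint32_t idx = (addr >> 12) & (TOY_TLB_ENTRIES - 1u);
    uint32_t page = addr & TOY_PAGE_MASK;

    assert(t != 0);
    if (t->tag[idx] == page) {
        return 1;            /* hit: the store goes straight to memory   */
    }
    t->slow_calls++;         /* miss: here helper_st*_mmu would be called */
    t->tag[idx] = page;
    return 0;
}
```

  Three consecutive pushes to the same stack page take the slow path once
and the fast path twice, which matches the "no clear pattern" seen when
breaking on the helpers.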

 Big question: RIGHT BEFORE the TLB load, we should insert a call
to record the memory reference. Now the problem is how can we get
the address that is being read/written?

Algorithm Idea: to insert a call of tcg_out_trace_mem(...) at the
beginning of tcg_out_qemu_st(...). The function will have
 all of the parameters of tcg_out_qemu_st(...) to calculate 
 the address to access.
   Now the logic to calculate the address should be similar to
 that of tcg_out_qemu_st_slow_path(), it takes tcg_ctx and label.
 It pushes mem_index, data_reg, addrlo_reg, TCG_AREG0 (all these
 information come from label) and calls
 a function similar to qemu_st_helper(tcg_ctx, function_ptr).

-------------------------------------------------------------
Experiment 14. Add the memory tracer
-------------------------------------------------------------
(1) define helper_trace_mem(unsigned int addr, unsigned int size, unsigned int bRead) in ops_sse.h. It will be called RIGHT AFTER
helper_trace2 (for st/ld instructions), and RIGHT BEFORE TLB load. 
 No need to pass EIP, because we can access global variable global_eip.
 The logic is to simply call the corresponding trace_mem function.
<DONE>

(2) define tcg_out_trace_mem(mem_index, data_reg, data_reg2, addrlo_reg, addrhi_reg) in i386/tcg-target.c. This function simulates tcg_out_qemu_st_slow_path and takes all of its parameters. It calculates the access address and then calls helper_trace_mem(..)
<DONE. Used some #ifdef tricks for different arch>

(3) in tcg_out_qemu_st() and tcg_out_qemu_ld() in tcg-target.c, call tcg_out_trace_mem(..) to perform the trace. The call should be placed right
 before line 1367 tcg_out_tlb_load.
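
  A minimal sketch of what the body of helper_trace_mem could look like,
assuming it only appends to a fixed-size log. The MemEvent record, trace_log
and TRACE_CAP are hypothetical names for this illustration; only the function
signature and global_eip come from the plan above. The sketch touches static
data only and calls no other function.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_CAP 1024u

typedef struct {
    uint32_t eip;      /* instruction that made the access */
    uint32_t addr;     /* guest address accessed           */
    uint32_t size;     /* access size in bytes             */
    uint32_t is_read;  /* 1 = read, 0 = write              */
} MemEvent;

static MemEvent trace_log[TRACE_CAP];
static unsigned trace_len;
static uint32_t global_eip;  /* assumed to be set by the per-instruction hook */

/* Record one memory access; events are dropped once the log is full. */
static void helper_trace_mem(unsigned int addr, unsigned int size,
                             unsigned int bRead)
{
    MemEvent *e;

    if (trace_len >= TRACE_CAP) {
        return;
    }
    e = &trace_log[trace_len++];
    e->eip = global_eip;
    e->addr = addr;
    e->size = size;
    e->is_read = bRead ? 1u : 0u;
}
```

  The log can then be drained outside of the generated code path, for
example when the translation block finishes.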

To verify:
     
   (1) We could set a conditional breakpoint at
tb_find_slow (when pc==0xf1ec6), step into tcg_gen_code_common() and record the opcode. Then set breakpoints at disas_insn and then handle_instr, and also
handle_mem_read and handle_mem_write. < OK.>

   (2) ********************
  However the system eventually fails on CPL_MASK (dpl < cpl error), need to check details
  *** line 652 of seg_helper.c
  ************************************
  Design: set a breakpoint at 652 of seg_helper.c and figure out the meaning
 of dpl and cpl.
   CPL is the current privilege level. RPL is the requested privilege level. DPL is the descriptor privilege level. Use a stupid way, BINARY SEARCH (ignore BP), to find out the last instruction involved (set a breakpoint on helper_trace2). Not quite reliable due to thread interleaving.
   Attempt 2: notice that env->cr[3] 0x39000 is quite suspicious (never encountered before). Set a watch on it (use watch ((CPUX86State*)0x08da4b90)->cr[3]);
this caught where cr3 is changed (switch_tss).  This does not help a lot.
   Attempt 3: note that in raise_exception_err env->eip is always 0xecc (because
the handler address is 0xecc). env->sysenter_eip is 0x804def6f
   Attempt 4: gdb setup:
 	(gdb) handle SIGUSR1 noprint
Signal        Stop	Print	Pass to program	Description
SIGUSR1       No	No	Yes		User defined signal 1
(gdb) b ops_sse.h:2255
Breakpoint 1 at 0x82e245f: file /home/csc288/qemu/qemu-1.4.0/target-i386/ops_sse.h, line 2255.
(gdb) b raise_exception_err if exception_index!=14
Breakpoint 2 at 0x82cacd1: file /home/csc288/qemu/qemu-1.4.0/target-i386/excp_helper.c, line 122.
  Attempt 5: change to branch.exe (from srss.exe) to observe and see the result.
 Observation: now even the raise_exception_err BP does not work (maybe we need
to remove the condition). This looks like a memory corruption bug where
some important region is overwritten. Hard to find!
  Attempt 6: Note, it may be caused by the additional read of instructions (it
always reads 15 bytes, which could reach past the end of the code region).
Disable the read-15-bytes part (in ops_sse.h), and see if it crashes. DOES
NOT WORK!
  Attempt 7: observation: it seems that when we change the process to trace,
it is the process being traced that crashes. So the problem should not be located
in the qemu_out_xxx calls. It still resides in the traceinstr package. Now
disable the call of Trace::handleMemRead and Trace::handleMemWrite and
just printf() the message in handle_mem_read and handle_mem_write.
         RESULT: It still crashed! Try removing the printf as well. Still
 does not work.
  Attempt 8: switch back to srss.exe and run GDB (switching back and forth with
 the winxp guest window causes trouble, freezing the system). Still
 the same. Now try to redefine the HANDLE_MEM_READ and HANDLE_MEM_WRITE.
    It seems that function calls are the problem. Try to not call anything
 (including printf) and just modify local variables. See what's going on.
 Now it's not throwing the error. It seems to be the extra function call that
 causes the problem.

  Attempt 9: make an extra call directly in tcg/i386/tcg-target.c replace
 it with a dummy function and see what's going on. OK. Now replace the
 dummy function with a printf and see what's going on.. It seems to break it
 when we put printf in. When it has multiple levels of calls it seems ok.

  
 Attempt 10: figure out how the STACK_REG works and why it should be changed. Read
  tcg_out_qemu_st_slow_path and observe what it does.
  	Observation: tcg_out_qemu_st_slow_path calls tcg_out_calli to
 generate the code of call and the displacement (target). It looks fine.
		Note that it has a JMP_short 5 right after the call.
  It then generates OPC_JMP_long and  advances the code_ptr by 4;
  this is clearly to set up the place (JMP_addr) for the later
  st/ld label processing to fill out the eventual address to jump back.
  Then tcg_out_addi(...TCG_REG_CALL_STACK...) (it's like add $0x10, %esp).
  Then it generates the jump back to raddr. So a call of tcg_out_qemu_st_slow_path
  generates the following instructions:
            ------------------
		 	push   $0x0
			push   %edi
			push   %esi
			push   %ebp
			//the above pushes 4 params in reverse seq
			call   0x82f09ca <helper_stl_mmu>
		        jmp    0xb31e6a3f (jump 5 bytes away, skip next instr)
   0xb31e6a3a 	        jmp    0xb31e6945 (THE START OF NEXT MICROCODE AFTER THIS)
   0xb31e6a3f 		add    $0x10,%esp
    			jmp    0xb31e6945 (jmp raddr the next immediate instruction)

	----------------------------------------
	Now the problem, why add 0x10 (16) to %esp? In GDB, use tb *addr (
hardware breakpoint!!!) to set a breakpoint on the first push $0x0 instruction and 
the other instructions to observe. Somehow, step by step does not work that well.
 Initially, the ESP is 0xb02f4e50, after 3 pushes  (notice that sometimes
it's 3 pushes). and then the call to helper_ldl_mmu, ESP is 0x...4e44 (note
that it's 12 bytes away!). So by the C language calling convention, after
a function call is completed, when resetting ESP/EBP, the caller takes
away the parameters. It seems that we are doing it right regarding pushing
parameters.
        ----------------------------------------
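
  The stack arithmetic above, written out. This assumes 32-bit stack slots
and a caller-cleans-up (cdecl style) convention for the helper call; the two
function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOT 4u  /* one pushed 32-bit parameter */

/* Bytes the caller adds back to ESP after a call with n pushed args. */
static uint32_t cleanup_bytes(unsigned n_args)
{
    return n_args * STACK_SLOT;
}

/* ESP after n pushes (the stack grows downwards). */
static uint32_t esp_after_pushes(uint32_t esp, unsigned n_pushes)
{
    assert(esp >= n_pushes * STACK_SLOT);
    return esp - n_pushes * STACK_SLOT;
}
```

  Four pushed parameters give cleanup_bytes(4) = 0x10, which is the
"add $0x10,%esp" in the listing; three pushes starting from 0xb02f4e50 give
0xb02f4e44, the 12-byte difference observed in GDB.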
              
  Attempt 11: given that it's always the problem of cpl/dpl/rpl in accessing
a segment, let's check the values of env->hflags
	Observation: even for the same process the hflags may change.

  Attempt 12: hflags changes too frequently, first get close to the crash point
then watch on the dpl expression.
  Observation: cr3_to_trace is 0x5042000, and normally cpl is 0 and dpl is 0.
 Note that somehow cpl is changed to 3; cpl is calculated using the
 following formula (from seg_helper.c):
          env->hflags & 3
   set watch point "watch *0x8da4bc8 & 3"; the address is retrieved using
 p/x &(env->hflags).
   Found that the value is changed from 0 to 3 at line 995 of cpu.h! cpu_x86_set_cpl. It's called by helper_iret_protected and then helper_ret_protected
and then cpu_x86_set_cpl. Next step into code_gen_buffer() and cpu_x86_exec. The current global_eip is 0x806eeec5. next eip is env->eip 0xec6. (here is line 2200 of seg_helper.c). Then after quite a while it reaches the raise_exception_err.
   Now do a comparison, observe other cases of helper_iret_protected and see
 what's going on. It seems that helper_iret_protected does not always
 hit the cpu_x86_set_cpl! (in many branches it pops ESP). It depends on the
 value of parameter shift. !!!! Actually line 2200 of seg_helper.c is ONLY hit
 one time, and that directly triggers the exception !!! (note: to repeat
 this we need to type the "run" command directly, don't let windows
 warm restart!) ---- now let's set a BP on helper_trace2 (it's called
 for every instruction) and see how many more will be called. !!!
  after 3 instructions it's exception! 
   ********************************************************************
  eip_ins are 134854, 134860, env->cr[3] is 0x39000, note that cr3_to_trace
 is 0x4dc2000.
  Next: set a bp on tb_find_slow, hit, then it hits disas_insn for the following
	(1) pc_start 134854 (0x20ec6): mov R, Iv (mov immediate number to R)
	(2) pc_start 134860 (0x20ecc): INT 16 (looks like an I/O request), code placed at
	tcg_ctx.gen_opc_ptr (index 16), opc 11,
		list of micro-opcode:
			movi (pc)
			st_tl (save to cpu_env->eip)
			gen_helper_raise_interrupt
	at 0xb6cb5643!!!!!!!!! (later set a BP on it)
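
  The numbers above also hint at how pc_start relates to env->eip. The
sketch below ASSUMES that pc_start is a linear address, i.e. code segment
base plus EIP; under that assumption the recorded values imply a segment base
of 0x20000. linear_pc and implied_cs_base are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of an instruction: segment base plus offset (EIP).
 * That pc_start is computed this way is an assumption of this sketch. */
static uint32_t linear_pc(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}

/* Segment base implied by an observed (pc_start, eip) pair. */
static uint32_t implied_cs_base(uint32_t pc_start, uint32_t eip)
{
    assert(pc_start >= eip);
    return pc_start - eip;
}
```

  With pc_start 134854 and env->eip 0xec6 the implied base is 0x20000, and
0x20000 + 0xecc gives 134860, the pc_start of the INT instruction.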

   ********************************************************************
   ++++++++++++++++++++++++++++++++++
	Now we could simplify the breakpoint process
	b disas_insn if pc_start==134860 (it will be hit twice, ignore the
1st hit). However, it's kind of slow; could simply add a branch
in disas_insn to speed it up. 
	>>> 
	b translate.c:4272
	b raise_exception_err if exception_index!=14
	>>>
   ++++++++++++++++++++++++++++++++++
	Continue the observation (3):
	the TB has only these two instructions, then it starts to generate 
	code. The generated code is:
	0xb6cba9a1 <code_gen_buffer+61688225>:	mov    %ebp,(%esp)
   0xb6cba9a4 <code_gen_buffer+61688228>:	mov    $0x20ecc,%ebx
   0xb6cba9a9 <code_gen_buffer+61688233>:	mov    %ebx,0x4(%esp)
   0xb6cba9ad <code_gen_buffer+61688237>:	mov    $0x12,%ebx
   0xb6cba9b2 <code_gen_buffer+61688242>:	mov    %ebx,0x0(%ebp)
   0xb6cba9b5 <code_gen_buffer+61688245>:	
    call   0x82e2362 <helper_trace2>
   0xb6cba9ba <code_gen_buffer+61688250>:	movl   $0xecc,0x20(%ebp)
   0xb6cba9c1 <code_gen_buffer+61688257>:	mov    %ebp,(%esp)
   0xb6cba9c4 <code_gen_buffer+61688260>:	mov    $0x10,%ebx
   0xb6cba9c9 <code_gen_buffer+61688265>:	mov    %ebx,0x4(%esp)
   0xb6cba9cd <code_gen_buffer+61688269>:	mov    $0x2,%ebx
   0xb6cba9d2 <code_gen_buffer+61688274>:	mov    %ebx,0x8(%esp)
   0xb6cba9d6 <code_gen_buffer+61688278>:	mov    %eax,0x80(%esp)
 ***  0xb6cba9dd <code_gen_buffer+61688285>:	
***    call   0x82ca99c <helper_raise_interrupt>

   Clearly, 0xb6cba9dd is the one which raises the interrupt!
 ------------ set a BP at 0xb6cba9dd --------------------------------

  Now simplify the debug: hb *0xb6cba9d6 (use hb in case of overwriting). DOES
NOT WORK!!! Set a bp at helper_raise_interrupt (condition intno==16) after the first hit on raise_exception_err

   ++++++++++++++++++++++++++++++++++
	Now we could simplify the breakpoint process
	>>> 
	b helper_raise_interrupt if intno==16
	b raise_exception_err if exception_index!=14
	>>>
   ++++++++++++++++++++++++++++++++++

   ********************************************************************

Attempt 13: study the logic of helper_raise_interrupt
	>>> 
	b helper_raise_interrupt if intno==16
	b raise_exception_err if exception_index!=14
	>>>
Observation: when it's called, env->cr[3] is 0x39000, next_eip_addend is 2,
intno is 16, error_code is 0 (according to a google search of the interrupt list, int16/ah=0
is to get keystroke). Current env->eip is 0xecc and next eip is 0xece (because
of next_eip_addend).
   !!! env->hflags is 0x4008c7 and it & 3 is 0x3. Use this to compare with
 regular version !!!
!!!!!!!!!!!!!!!!!!!!
  It's clear helper_raise_interrupt belongs to the last INSTRUCTION. Next
helper_trace2 is never hit, jumps directly to exception
!!!!!!!!!!!!!!!!!!!!
   Now recompile and then check. Modify target-i386/cpu.h. It seems that
the helper_raise_interrupt when intno==16 is NEVER called!
   Conjecture here: use of external calls somehow changes the relative
speed of the threads, and when thread 0x39000 triggers an exception 
it somehow has its env variable messed up.
	Notice that thread 0x39000 does call the INT 16 multiple times;
its env->hflags is 0x44 (different from the previous 0x4008c7).
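
The two hflags values above can be compared by hand. A minimal sketch, assuming only the formula quoted earlier in these notes (cpl = env->hflags & 3); the function name cpl_from_hflags is made up for illustration:

```python
# Hypothetical helper: applies the formula quoted in these notes
# (cpl = env->hflags & 3) to an hflags value.
def cpl_from_hflags(hflags):
    return hflags & 3

# hflags observed at the helper_raise_interrupt call that precedes the exception
print(hex(0x4008c7), "-> cpl", cpl_from_hflags(0x4008c7))  # cpl 3
# hflags observed when 0x39000 issues INT 16 in the other runs
print(hex(0x44), "-> cpl", cpl_from_hflags(0x44))          # cpl 0
```

So the failing call runs with cpl 3 while the other INT 16 calls run with cpl 0, which is the difference followed up in the next attempts.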

Attempt 14: check back again on env->hflags and see why it's changed

	>>> 
	b raise_exception_err if exception_index!=14
        b cpu_x86_set_cpl if env->cr[3]==0x39000
	display/x s->hflags
	display/x s->hflags & 3
	>>>
  the hflags is shown to be flipped several times ranging from 2b4 to 0x400b4 .
   >>> When it is close to the crash, Ctrl+C and enter a watch point (to save time, watch is too slow)
	>>>
	"watch *0x8da4bc8 & 3"  (the address is retrieved earlier by 
		p/x s->hflags)
	>>>
	The reason cpu_x86_set_cpl(3) is called is that shift is 1
	and is_iret=1, and also new_eflags & VM_MASK is set.
	see seg_helper.c:2026, set a breakpoint on it (condition on
	env->cr[3]==0x39000).
	Observation: in most cases, it is returning new_eflags 0x206. Tracing
into the call POPL(ssp, sp, sp_mask, new_eflags) we find that POPL
 is a macro definition that does cpu_ldl_kernel(env, SEG_ADDL(ssp, sp, sp_mask)) and
then sp+4. It reads the data from the kernel stack. It actually calls
cpu_ldl_kernel; addresses are like 8055014c. 
	NOTE THAT after the dumping message shows up, the cpu_x86_set_cpl
for process 0x39000 will be only called ONCE!!!
	Now let's observe the last case in which it is called. The last
addr it is trying to retrieve from is 0x8054d6c4; the new_eflags
it retrieves is 0x23006. 
	Tried to run it a second time, within the same GDB instance,
the value is ALWAYS 0x8054d6c4!!!
	Now let's set a breakpoint on 0x8054d6c4! cpu_stl_kernel. The
cpu_stl_kernel is never hit but the cpu_ldl_kernel is hit, which
is strange.
	Let's try the physical address in cpu_ldl_kernel it eventually
calls ldl_p(0xa004a6c4), this is always a fixed address as well, now
set a breakpoint on stl_p(0xa004a6c4). Very time consuming... Still not hit.

	
Attempt 14: study the logic of helper_raise_interrupt again. It must be
pushing eflags into the stack. Check specifically the use of
VM_MASK. It seems that on a syscall or interrupt in protected mode, 
VM_MASK is off (on env->eflags). (See do_interrupt_protected at line 792
of seg_helper.c.)
	>>> 
	b raise_exception_err if exception_index!=14
	wait for some time until close to crash and then
	b helper_raise_interrupt if intno==16
	>>>
 Check the value of env->eflags at the last helper_raise_interrupt
(note that VM_MASK is 0x0002 0000).
  Observation: the env->eflags of the process 0x39000 is 0x23002 (it is SET!)

Attempt 15: now the question is why the env->eflags for process 0x39000
has VM_MASK enabled (it is never enabled after the system starts to 
trace srss.exe). We need to watch when this happens.
           Design:
		(1) in source code add a global variable last_39000_eip
		(2) modify helper_trace2 instruction to update the
	last variable and print it out when env->eflags & VM_MASK changes.
	Observation: switch made from 1 to 0: 20ec6, c83f9
		from 0 to 1, 806ef788, 806ef804,  
	Each of these instructions are only hit up to 4 times.
	Given the above:	
	>>>  
	b raise_exception_err if exception_index!=14
	b ops_sse.h:2265 (at 2265 we add a branch to check process id condition to stop at c83f9)
	b ops_sse.h:2269 (at 2269 we add a branch to check process id condition to stop at 22ec6)
	>>> 
	Note that bp at 2265 is only hit once, after the first raise_exception_err leads to the problem. It changes the flags from 0x23202 to swap OFF VM_FLAG.
But then it is switched from 1 to 0 by 8063f804, but again, it is
switched from 0 to 1 by 20ec6.
		The BP at 20ec6 is ONLY hit ONCE! right before the exception.
It's the one that SETs the VM_FLAG. and submits the INT 16 in the VM_FLAG
mode, it triggers do_interrupt_protected()->cpu_x86_set_cpl(), based
on the condition that env->eflags & 0x20000 (VM_FLAG) is 1.
***** Now let's look at the two instructions 0x20ec6 and 0x20ecc, the
following is the analysis from disas_insn


***** the following are the last 4 records recorded ***********
********************************************************************
eflags switched from 0 to 131072 at 20ec6 (switch is already done, actually should be the last eip who does the switch). Last eip is 806eeec5
eflags switched from 131072 to 0 at 806ef788 (last eip is 20ecc, actual change)
eflags from 0 to 131072 already at c83f9 (last eip is 804dfa41, actual change)
eflags from 131072 to 0 at 8063f804 (last eip is 20ece, actual changes)
********************************************************************
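
The records above are the kind of output the "Design" step describes: remember the previous eip and report whenever env->eflags & VM_MASK changes. A minimal sketch of that bookkeeping, not the actual helper_trace2 code; the function name and the sample trace are made up, and only the VM bit of the sample eflags values matters (131072 is 0x20000, i.e. VM_MASK):

```python
VM_MASK = 0x20000  # 131072 in the records above

def vm_flag_transitions(trace):
    """trace: iterable of (eip, eflags) in execution order.
    Returns one (old_vm, new_vm, eip_where_seen, last_eip) tuple per change."""
    changes = []
    last_eip = None
    last_vm = None
    for eip, eflags in trace:
        vm = eflags & VM_MASK
        if last_vm is not None and vm != last_vm:
            changes.append((last_vm, vm, eip, last_eip))
        last_vm = vm
        last_eip = eip
    return changes

# Illustrative trace only: eflags values other than the VM bit are invented.
sample = [(0x806eeec4, 0x202), (0x806eeec5, 0x202),
          (0x20ec6, 0x23202), (0x20ecc, 0x23202), (0x806ef788, 0x2)]
for old, new, eip, last in vm_flag_transitions(sample):
    print("eflags VM %d -> %d at %x (last eip %x)" % (old, new, eip, last))
```

As in the records, the change is noticed one instruction late, so the instruction that actually flips the flag is last_eip, not the eip where the change is seen.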
   Instructions at 
       ***	806eeec5: iret (turn on VM flag)
		806ef804: mov R, Iv (actually no effect, it's just the next immediate instruction 20ece)
		806ef788: mov Ev, Iv (actually NO effect on VM, it's just the next immediate instruction after 203cc)
       ***	804dfa41: iret (turn on VM flag)
       		20ec6: mov R, Iv (actually no EFFECT)
       		20ecc: INT 16 (turn OFF VM flag)
       ***	20ece: ILLEGAL_OP the first visit!!!  second visit les Gv,
			still ILLEGAL_OP.
		c83f9: pushfA (actually no EFFECT)

It seems that it's the instruction 20ece (les Gv) causes ILLEGAL_OP error!
Next to compare it with a normal run. In the CORRECT VERSION, 20ece is generating ILLEGAL_OP as well, HOWEVER, it is not hit twice!
   >>>>>
	b disas_insn if pc_start==0x20ec6 || pc_start==0x20ece || pc_start==0x20ecc
	b ops_sse.h:2266 (where we added condition to check eip_in is
		0x20ec6, 20ecc, and 20ece)
	b raise_exception if exception_index!=14
   >>>>>	
	observation: ***** 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
	disas 0x20ec6 (mov)-> disas 20ecc (INT 16) -> execute 20ec6 -> 20ecc (INT 16) -> do_interrupt_protected getting e1/e2 by cpu_ldl_kernel, and then calculate dpl=0, cpl = 3! next_eip = 0xece -> raise_exception(GPF - general
protection error) -> do_interrupt_protected (13 - GPF), next_eip=0xecc -> 
still dpl<cpl (but is_int is 0), so won't raise exception. but push segment
selectors, set cpl back to 0. sets back to env->eip to 806ef788 [this is actually the interrupt handler, calculated from e1 and e2]
            disas 20ece -> execute 20ece -> RAISE_EXCEPTION (index=6, invalid opcode) - do_interrupt_protected(intno=6 invalid opcode) -> it jumps then
to eip 0x806ef804 (interrupt handler)-> ... a lot of instructions ...
	    -> disas 0x20ec6 (mov) -> disas 0x20ecc (INT 16) ->
		execute 20ec6 (now VM mask is 1) -> exec 20ecc (INT 16) 
		-> do_interrupt_protected -> cpl is 3 causes an exception
		 -> do_interrupt_protected(13, is_int=0) -> 806ef788 (mov instruction).... a lot
		   -> disas_insn 20ece -> exec 20ece -> raise_exception (6, invalid opcode) -> do_interrupt_protected (index6) -> 0x806ef804 -> 0x806ef809 -> 
   ... !!! over 1000 instructions (this is exactly the same as FIRST ONE!)
  ... 
   ->  crash (stop: 0x0000007f unexpected kernel mode trap), 0x8 means 
	double fault from MSDN. Guess: the same error 0x20ece is repeated
        twice and this leads to the crash. do_interrupt_protected is (int45, unknown service). LAST EIP: 0x8050b895: INT 45.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
----> It seems doing the exception twice causes the problem. Check around
the code between 0x20e00 to 0x20f00 (set BP on disas_insn condition on
pc_start)

     --------- now env->eflags & VM_FLAG is 0
  -> 	20e74: push 20e76: push 20e78: push 20e7a: push 20e7c: mov Ev, Gv
	20e7f: mov R, Iv 20e82: mov seg, Gv  20e84: mov Seg, Gv
	20e86: mov R, Iv 20e89: push 20e8b: mov   20e91: Arithmetic
	20e98: push  20e9a: pop  20e9b: push ds  20e9c: pop ds
	20e9d: Arithmetic 20ea0: Arithmetic 20ea3: mov   
        -------> 20ea5: mov Ev, Gv
	loop  -> 20ea8: mov  20eaa: Arithmetic 
	-------> 20eb4: jecxz (loop) 
	20eb6: push  20eb7: pop ds 
        -----> 20eb8: call im (-2850), 20ebb: push, 20ebc: mov Ev, Gv
	20ebe: Arithm, 20ec1: push/pop, 20ec2: push, 20ec3: push ds
	20ec4: push, 20ec5: mov R, Ib   20ec7: mov R, Iv
	20eca: mov Ev, Gv, 20ecd: INT 19  (0x13, disk request)
	   !!! strangely, first did not hit 20ec6, 20ecc and 20ece

	20ecf: jcc Jv, 

	Then it hits 20ec6: INT 16(0x10 - VGA request?)
	Check the binary code starting from 20ec5
	20ec5:  (all in hex) b4 41 bb aa 55 8a 56 4 cd 13 cd 13
        After 10 sec (1st time 20ec6 is hit):         
			     cf 66 b8 12 0  0  0  cd 10 c4   c4
      ?????!!!!! the code segment is changed!!!
	compare with the version where tcg_out_trace_mem is commented out
	20ec5:  (all in hex) b4 41 bb aa 55 8a 56 4 cd 13 cd 13 (SAME!)
        After 10 sec (1st time 20ec6 is hit):         
			     cf 66 b8 12 0  0  0  cd 10 c4   c4 (SAME!)
	BUT AGAIN, 20ecc is NEVER CALLED again in the CORRECT version.
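
To line the two byte dumps up with the addresses seen in disas_insn, here is a toy decoder. It is a sketch under stated assumptions, not a general disassembler: it knows only the few opcodes present in these dumps, it assumes 16-bit operand size (real/V86 mode, so the 0x66 prefix selects a 32-bit immediate), and it uses only the first ten bytes of the "before" dump. The name decode16 is made up.

```python
def decode16(code, base):
    """Decode a few 16-bit-mode opcodes; returns a list of (address, text)."""
    out = []
    i = 0
    while i < len(code):
        start = i
        wide = code[i] == 0x66           # operand-size prefix
        if wide:
            i += 1
        op = code[i]
        if op == 0xB4:                   # mov ah, imm8
            text, size = "mov ah, 0x%02x" % code[i + 1], 2
        elif op in (0xB8, 0xBB):         # mov ax/bx, imm16 (imm32 with 0x66)
            n = 4 if wide else 2
            imm = int.from_bytes(code[i + 1:i + 1 + n], "little")
            reg = ("e" if wide else "") + {0xB8: "ax", 0xBB: "bx"}[op]
            text, size = "mov %s, 0x%x" % (reg, imm), 1 + n
        elif op == 0x8A and code[i + 1] == 0x56:   # mov dl, [bp+disp8]
            text, size = "mov dl, [bp+0x%x]" % code[i + 2], 3
        elif op == 0xCD:                 # int imm8
            text, size = "int 0x%02x" % code[i + 1], 2
        elif op == 0xCF:
            text, size = "iret", 1
        elif op == 0xC4 and code[i + 1] >> 6 == 3:
            # les needs a memory operand; a register form is an invalid opcode
            text, size = "les with register operand (invalid opcode)", 2
        else:
            text, size = "db 0x%02x" % op, 1
        out.append((base + start, text))
        i += size
    return out

before = bytes.fromhex("b441bbaa558a5604cd13")
after = bytes.fromhex("cf66b812000000cd10c4c4")
for label, code in (("before", before), ("after", after)):
    print(label)
    for addr, text in decode16(code, 0x20ec5):
        print("  %x: %s" % (addr, text))
```

Under these assumptions the "before" bytes give instructions at 20ec5, 20ec7, 20eca and 20ecd (int 0x13), and the "after" bytes give iret at 20ec5, a mov at 20ec6, int 0x10 at 20ecc and the invalid les at 20ece, which agrees with the addresses observed in both runs.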

    
  Attempt 15: Now set an additional variable cr3_to_trace2 and print
out the instructions of 0x39000. Analyze the instruction before 0x20ecc.
  Observation: it immediately crashes the system. Notice that the memory
tracing has been disabled!!! So it's the relative speed of the process
that caused the problem!
	Now, let's only call handle_instr for instructions in
 range 0x20000 to 0x20f00. Still does not help. Trace using GDB.
 Found that the program stuck at: os_host_main_loop_wait <- main_loop_wait
 <- main_loop <- main(). Found that if we disable the check of
 cr3_to_trace2 in translate.c it is then fine. The  regular running
 produces exactly the same dump of stack trace of os_host_main_loop_wait
 <-main_loop_wait <-main_loop <-main(). So the Ctrl+c in GDB does not work,
99% of the time, the system is waiting for I/O. Need to set breakpoint on
 handle instruction.

  Attempt 15: Now enable cr3_to_trace2 again and set breakpoint on
 helper_trace2 when env->cr[3]==0x39000 (and without the condition). See
 what is going on. By running on helper_trace2 we found that the
 system is running in cr3 process 0. By conditional breakpoint on 
 0x39000, the helper_trace2 is still hit.
  !!!!! AFTER REMOVING the isInsOfProcess() check the system got to work!
  !!!! What the ...... heck .... could not figure why ....

  Attempt 16: now remove the cr3_to_trace2 and resume the normal execution
 (enable trace mem) - still does NOT WORK!!!!

  Attempt 17: now check another process and see what's the reason of crash.
winlogon.exe ok. try csrss.exe ok. 
  lsass.exe. ok.
  svchost.exe. ok
  logonui.exe. ok.
  spoolsv.exe. ok.
  userinit.exe. ok
  alg.exe. ok.
  rundll32.exe. ok
  wscntfy.exe ok
  dumprep.exe didn't check.
 
  try notepad.exe (crash), branch.exe (
 !!!!!!!!!!!!!!! Still the same error with 0x39000. Seems that process
 0x39000 is responsible for handling keyboard I/O. It's always complaining
 about invalid opcode at 0x20ece (les GS).!!! Verify next time.

   
 Attempt 20: It seems 0x20ece is the cause (multiple hits causes problem).
 Set a breakpoint on helper_trace condition 0x20ecc and 0x20ece and see
 how's different on (branch.exe) on GOOD and BAD versions.
    GOOD VERSION (branch.exe): 20ecc and 20ece is only hit ONCE! , check
if other instructions are ever hit. We set a bp on helper_trace2 if
env->cr[3] is 0x39000. It seems to be running all the times, only when
the program being traced is running, it is not invoked. ??does the OS ever switch?
    BAD VERSION (branch.exe): it's altogether hit twice. But after 0x20ece
 is hit, it's running in kernel mode for over 1000000 instructions. Not sure
what's the purpose of the process. 20ecc will be hit twice.

 Attempt 21: find a way to skip the execution of 20ece and see how it goes.
   (1) where does 20ece start from (translated from x86)?
   (2) where does next instruction start from?
   (3) do the jump.
  >>>> (1) set BP on disas_insn condition on 20ece. It seems that 0x20ece
	is the first instruction of this TB; however, env->eip is 0xece (missing
	the 20 at the beginning)? Not sure if it's caused by a segment or not.
	It calls gen_exception -> we can simply skip this by resetting EIP. 
       (2) gdb is too slow, let's simply COMMENT OUT the gen_exception statement (line 5547 of translate.c -> copy code of illegal_op and comment out
	the statement which generates the exception) --> does not work and
	ends up in an infinite loop of 0x39000 in GDB!!! Simply disabling the
  	if check does not work either! (needs 2nd restart and skips the
	srss.exe due to windows mechanism).  Checking the effects on branch.exe
	shows that it is involved in blackscreen.  Now set a condition to 
	skip exception on 0x20ece ONLY. Does not work either!!!
  ----> conclusion: to either disable exception generation for 0x20ece or
	to disable all for any cases would not work. There are other 
	instructions after 0x20ece and they will trigger exception as well.

 Attempt 22: Figure out why 0x20ec6 is called from iret from 0x80xxxxx.
   (1) Figure out the functionality and process name of 0x39000
    >>> using QEMU does not work. Cannot read the process structure, maybe
	the address is in real mode.
    >>> use WinDbg to examine the list of all processes. 0x39000 is the
	process named "System".
    >>> From internet found that the System process has all EPROCESS structure
	for each process, it seems to be managing the processes and
	managing system calls. However, it's strange why 0x20ece will 
	be invoked and run in normal user mode.

 Attempt 23: check the ORIGINAL version and see if 0x20ec6 is EVER HIT and
	what's the CPL mode.
	>>> note that the helper_trace2 function is not there. We have to
	set bp on raise_exception_err if exception_index!=14. The system 
	does raise a general protection error (13) at 0xecc , but no
	further error at 0xece. Also set a bp at disas_insn condition
	pc_start==0x20ecc and pc_start==0x20ece. It is exactly the same
	as the revised version (adding helper_trace2).

	Now use bp on disas_insn and condition to check how the process
	switch between 0x39000 and other processes. After the initial timer
	screen, it starts to switch to other processes (most of time
	EIP in 0x806xxxxx range, first instruction
	is 0x8069e090 and env->eflags is 2 - in kernel mode).
	Running in 0x39000 takes pretty long before switch to another process.
	
		Some instructions (1st) being switched to (in 0x39000):
			0x8069e090, 0x804ea161, 0x804f1c48, 0x80589c93
		It's strange that they are not jumped from gen_exception 
	or gen_interrupt; we'll need to figure out how it's jumping
	from one process (identified by cr[3]) to another. 

 Attempt 24: check why 0x20ec6 is accessed. (jumped from iret)
       bp on ***	806eeec5: iret (turn on VM flag), and ip in range
0x20ec0 and 0x20ecf (if too slow, create an if branch in helper_trace2).
	--->(result): 
	Logic of iret: (1) set CC op, (2) jmp pc (0x806eeec5), (3) calls
		helper_iret_protected (set a bp on it can see the logic):
		it calls POPL(ssp, sp, sp_mask) and gets the new EIP 0xec6!!!
		At this moment eflags is 0x202
	 pretty early
		0x20ec1 (push) is hit (eflags 0x202)
		0x20ec2: push
		0x20ec3: push ds
		0x20ec4: push
		0x20ec5: mov R, Ib
		0x20ec7: mov R, Iv
		0x20eca: mov Ev, Gv
		20ecd: INT 19 (0x13, eflags 0x202, disk request!!!!
			visited 
		Strangely it's hit multiple times! - and then system loading

		Then it hits 0x806eeec5 (iret) and then 0x20ec6
	Now we are clear about the iret logic, the EIP is the FIRST WORD
popped from kernel stack (ssp, sp, sp_mask). 

 Attempt 25: set a BP at helper_trace 0x806eeec6, and 0x20ecd and see how
	it's pushed. After helper_trace finished, si in GDB, we should
  	see the call of helper_gen_interrupt and helper_iret respectively.
	<<<
		(1) create at line 2259 of ops_sse.h inside helper_trace2 to
		capture eip_in in the range.
		(2) set a BP at 2259 of ops_sse.h
		(3) observation on 0x20ecd: it does NOT push anything! Just
			sets env->exception_next_eip to 0xecf in 
			helper_gen_interrupt -> then it calls cpu_loop
			which later calls do_interrupt_real()!!! [instead of
			protected] this is determined by env->hflags & HF_SVMI_MASK (0x44 & 1<<21). (next eip is 0xecf), it does use PUSHW to push the
information: cpu_compute_eflags(env) [0x202], old_cs:0x2000, old_eip: 0xecf
Note that PUSHW is defined as cpu_stw_kernel at ssp + sp & sp_mask.
During the last push EIP of the last 0x20ecd:
	SSP: 0x22f30, SP (esp): 0x67fe4 (when retrieved it's 0x67fe2 in POPW)
The corresponding helper_iret_real matches

		(4) observation on 0x806eeec5: it's the iret instruction.
it calls helper_iret_protected (but NOT helper_iret_real!!!!) , it takes
out old_eip:0xec6, cs: 0x2000, eflags: 0x23206). The difference between
helper_iret_real and helper_iret_protected is that PUSHed and POPPed are
16-bit word!!! The first 5 0x20ecd is matched with helper_iret_real,
but the LAST one is matched with helper_iret_protected!!! (which is kind
of wrong!!!!)---------------------------!!!!!
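
On the earlier question of why env->eip shows 0xec6/0xece while disas_insn sees 0x20ec6/0x20ece: with cs 0x2000, the usual real-mode / virtual-8086 address rule (segment * 16 + offset) gives exactly that difference. A minimal sketch, assuming that rule applies to these frames; linear_address is a made-up name:

```python
def linear_address(cs, ip):
    """Real-mode / virtual-8086 style address: segment * 16 + 16-bit offset."""
    return (cs << 4) + (ip & 0xFFFF)

# values popped by the iret at 0x806eeec5: cs 0x2000, old_eip 0xec6
print(hex(linear_address(0x2000, 0xec6)))   # 0x20ec6
# return address pushed for the INT at 0x20ecd: cs 0x2000, old_eip 0xecf
print(hex(linear_address(0x2000, 0xecf)))   # 0x20ecf
```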

   Note that EACH CALL of 0x20ecd is PAIRED WITH an helper_iret_real in the
five first calls. Then after the 30 seconds waiting, 0x20ecd is hit
again (multiple times), this time:
	SSP: 0x22f30, SP (esp): 0x57fe4 , eip: 0xecf (still do_interrupt_real)
Still being matched every time for 0x20ecd (a corresponding 
helper_iret_real is called).
	Then after a while, helper_iret_protected is called by 
0x806eeec5 (iret)--> It turns out that (NOTE: there is no match of any 
hit of 0x20ecx range!!!)
	SSP: 0x0, sp: 0xf9e6374c. It reads out new_eip: 0xec6.
	Now the question is WHO PUSHes and calls?
	Set a BP on cpu_stl_kernel if ptr>=0xf9e63740 && ptr<=0xf9e63750
The problem is that this BP is NEVER EVER HIT!!!!! (so there is a 
memory corruption HERE!!!!)

	Work to do: set a watch point on the RAW ADDR that the system is
READ for cpu_ldl_kernel. It's reading VAddr 0xf9e6374c, real addr:0xa1a6d74c 
, then it calls ldl_le_p which really reads the address 0xa1a6d74c --> 0xec6.
(in two consecutive debugging sessions, all the above values are the same!)
Now it's easy, let's set a hardware w/r breakpoint on 0xa1a6d74c.
 >>> *** use command awatch *0xa1a6d74c (speed is pretty fast!)
  The contents of 0xa1a6d74c changes several times 0 -> 209 -> 1 -> 209 -> 
 0x806eec9e -> use ignore 4 (the awatch bp number) we find that it is
 hit 53 times. 
   So ignore it 52 times and then check the value (doesn't quite work,
 sometimes it's only hit twice).
  --> finally find that it is OVERWRITTEN to 0xec6 at one instruction before
 0xb339934f!!! The next action is helper_trace_mem (write to 0xf9e6374c)
 and env->cr[3]=0x39000. 
  !!!!!!!!!!!!!!
  env->eip is 0x806eeec4 (just one instruction
 before 0x806eeec5).
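
A small sanity check on the guest-virtual / raw address pairs recorded in these notes: if the mapping is page based, the offset within the page should be the same on both sides. A minimal sketch, assuming 4 KiB pages; same_page_offset is a made-up name:

```python
PAGE_MASK = 0xFFF  # assumption: 4 KiB pages

def same_page_offset(guest_vaddr, raw_addr):
    """True if both addresses have the same offset within a 4 KiB page."""
    return (guest_vaddr & PAGE_MASK) == (raw_addr & PAGE_MASK)

# (guest virtual address, raw address actually read) pairs from these notes
pairs = [(0xf9e6374c, 0xa1a6d74c), (0x8054d6c4, 0xa004a6c4)]
for guest, raw in pairs:
    print(hex(guest), hex(raw), same_page_offset(guest, raw))
```

Both pairs agree in their low 12 bits, which is consistent with the raw address being the right target for the awatch.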


   Dump of instructions: 
	0x806eee64: 
	0x806eee6b: mov Ev, Gv
	0x806eee6d: arith
	0x806eee72: mov Gv, Ev
	0x806eee75: push Iv
	0x806eee7a: push Iv
	0x806eee7f: push Iv
	0x806eee84: push Iv
	0x806eee89: push Iv
	0x806eee8e: mov R, Iv
	0x806eee93: arith
	0x806eee98: mov R, Iv
	0x806eee9d: GRP1
	0x806eeea3: arith
	0x806eeea5: push
	0x806eeea6: pushf
	0x806eeea7: GRP1
	0x806eeeae: GRP1
	0x806eeeb5: push Iv
	0x806eeeba: mov R, Iv
	0x806eeebf: Op A, Iv
*	0x806eeec4: push
	0x806eeec5: iret 

 Attempt 26: verify that the push instruction 0x806eeec4 is the one which
 pushes 0xec6 into stack.
   >>> modify helper_trace2 and add an if branch on 0x806eeec4 and 0x806eeec5.
   >>> hit the breakpoint and then add do_interrupt_protected, find the address being read
   >>> run it again and set an awatch point on the address
   >>> check if this is done by 0x806eeec4 and 0x806eeec5.	
--------------------------------------------------------------------------
 Conclusion:!!! confirmed 0xec6 is purposely pushed by 0x806eeec4 so that iret
could jump to it. Now the question is whether 0xec6 is an immediate number or
comes from some other register! Need to dump instructions from 0x806eee9d to 0x806eeec5.
  Job to do: write a function to dump instructions starting at an address.
----------------------------------------------------------------------------

 Attempt 27: write a function printInstr(begin_addr, end_addr) - print
all instructions in the range.
  >>> b ops_sse.h:2259
  >>> then call print_instrRange(0x806eee64, 0x806eeec6), got the dump below
 ---------------
 @EIP 0x806eee64: length: (7): movl	%fs:0x40, %esi
@EIP 0x806eee6b: length: (2): mov	%esp, %eax
@EIP 0x806eee6d: length: (5): sub	$0x00000210, %eax
@EIP 0x806eee72: length: (3): movl	%eax, 0x4(%esi)
@EIP 0x806eee75: length: (5): push	$0x00000000
@EIP 0x806eee7a: length: (5): push	$0x00000000
@EIP 0x806eee7f: length: (5): push	$0x00000000
@EIP 0x806eee84: length: (5): push	$0x00000000
@EIP 0x806eee89: length: (5): push	$0x00002000
@EIP 0x806eee8e: length: (5): mov	$0x806EF6CC, %eax
@EIP 0x806eee93: length: (5): sub	$0x806EEEC6, %eax
* @EIP 0x806eee98: length: (5): mov	$0x806EEEC6, %edx
* @EIP 0x806eee9d: length: (6): and	$0x00000FFF, %edx
@EIP 0x806eeea3: length: (2): add	%edx, %eax
@EIP 0x806eeea5: length: (1): push	%eax
@EIP 0x806eeea6: length: (1): pushf	
@EIP 0x806eeea7: length: (7): orl	$0x00020000, (%esp)
@EIP 0x806eeeae: length: (7): orl	$0x00003000, (%esp)
@EIP 0x806eeeb5: length: (5): push	$0x00002000
@EIP 0x806eeeba: length: (5): mov	$0x806EEEC6, %eax
@EIP 0x806eeebf: length: (5): and	$0x00000FFF, %eax
@EIP 0x806eeec4: length: (1): push	%edx
@EIP 0x806eeec5: length: (1): iret	
@EIP 0x806eeec6: length: (4): mov	$0x0012, %ax
@EIP 0x806eeeca: length: (2): addb	%al, (%eax)
* @EIP 0x806eeecc: length: (2): int	$0x10
@EIP 0x806eeece: length: (2): les	%esp, %eax
@EIP 0x806eeed0: length: (2): addb	%al, (%eax)
@EIP 0x806eeed2: length: (2): addb	%al, (%eax)
@EIP 0x806eeed4: length: (2): addb	%al, (%eax)
@EIP 0x806eeed6: length: (2): addb	%al, (%eax)
@EIP 0x806eeed8: length: (2): addb	%al, (%eax)
@EIP 0x806eeeda: length: (2): addb	%al, (%eax)
@EIP 0x806eeedc: length: (2): addb	%al, (%eax)
@EIP 0x806eeede: length: (2): addb	%al, (%eax)


 -----------------

  So it's pushing %edx, but its source is from two immediate numbers
(look at the two instructions with * above). Is its intention
actually to return to 0x806eeec6?
  From the sequence of POPLs in helper_ret_protected (in POP order), we have
   new_eip - 0xec6
   new_cs - 0x00002000
   new_eflags - result of pushf @0x806eeea6
 ------------
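
 The arithmetic in the dump can be replayed to see what ends up on the stack before the iret. A minimal sketch: the constants are taken from the dump, the eflags value at pushf (0x202) is borrowed from an earlier observation, and the slot names beyond EIP/CS/EFLAGS are an assumption (the standard x86 frame for an iret that returns to virtual-8086 mode), not something confirmed in these notes. build_frame is a made-up name.

```python
VM_MASK = 0x00020000   # first orl on the pushed eflags
IOPL_3 = 0x00003000    # second orl on the pushed eflags

def build_frame(eflags_at_pushf):
    """Replay the pushes from 0x806eee75 to 0x806eeec4; last item = top of stack."""
    stack = []
    for _ in range(4):
        stack.append(0x00000000)                  # four push $0x00000000
    stack.append(0x00002000)                      # push $0x00002000
    eax = (0x806EF6CC - 0x806EEEC6) & 0xFFFFFFFF  # mov + sub
    edx = 0x806EEEC6 & 0x00000FFF                 # mov + and
    eax = (eax + edx) & 0xFFFFFFFF                # add %edx, %eax
    stack.append(eax)                             # push %eax
    stack.append(eflags_at_pushf | VM_MASK | IOPL_3)  # pushf, then two orl
    stack.append(0x00002000)                      # push $0x00002000
    stack.append(edx)                             # push %edx
    return stack

frame = build_frame(0x202)
# Assumed slot names, in pop order (top of stack first).
names = ["EIP", "CS", "EFLAGS", "ESP", "SS", "ES", "DS", "FS", "GS"]
for name, value in zip(names, reversed(frame)):
    print("%-6s %#x" % (name, value))
```

 With eflags 0x202 at the pushf this gives new_eip 0xec6, new_cs 0x2000 and new_eflags 0x23202, the same eflags value noted at the ops_sse.h:2265 breakpoint.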
 Conjecture 1: intention is to actually jump back to 0x806eeec6?
   There are only three instructions after it. It's basically to 
   int 0x10 (ax=0x12) - looks like a VGA I/O request
   then the instruction les at 0x806eeece is to load result eax into es:[esp]
   Interestingly there are no more instructions after that!
   When the BP (at 0x806eeec4) is hit a second time, dump instructions
 displays the same.
 --------------
  Now simply look at another couple of iret instructions and see if it
 is arranging PUSHes before iret.
  examples: 0x804dfa41(no), 0xc01d9 (no), 0x804df9a6(no)
 ------------------
    dump the instructions around 0x20ec6 we have:
  They are EXACTLY the same around the code 0x806eeec6!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11
 @EIP 0x20e75: length: (5): push	$0x00000000
@EIP 0x20e7a: length: (5): push	$0x00000000
@EIP 0x20e7f: length: (5): push	$0x00000000
@EIP 0x20e84: length: (5): push	$0x00000000
@EIP 0x20e89: length: (5): push	$0x00002000
@EIP 0x20e8e: length: (5): mov	$0x806EF6CC, %eax
@EIP 0x20e93: length: (5): sub	$0x806EEEC6, %eax
@EIP 0x20e98: length: (5): mov	$0x806EEEC6, %edx
@EIP 0x20e9d: length: (6): and	$0x00000FFF, %edx
@EIP 0x20ea3: length: (2): add	%edx, %eax
@EIP 0x20ea5: length: (1): push	%eax
@EIP 0x20ea6: length: (1): pushf	
@EIP 0x20ea7: length: (7): orl	$0x00020000, (%esp)
@EIP 0x20eae: length: (7): orl	$0x00003000, (%esp)
@EIP 0x20eb5: length: (5): push	$0x00002000
@EIP 0x20eba: length: (5): mov	$0x806EEEC6, %eax
@EIP 0x20ebf: length: (5): and	$0x00000FFF, %eax
@EIP 0x20ec4: length: (1): push	%edx
@EIP 0x20ec5: length: (1): iret	
@EIP 0x20ec6: length: (4): mov	$0x0012, %ax
@EIP 0x20eca: length: (2): addb	%al, (%eax)
@EIP 0x20ecc: length: (2): int	$0x10
@EIP 0x20ece: length: (2): les	%esp, %
------------------
  
   Conjecture: maybe the part of the code around 0x806eeec6 (push the addr in)
 is to enforce to jump to 0x20ec6 (switching real mode and protected mode?) 

>>>> now the question is who's calling and leads to 0x20ec6 eventually!!!!!

Attempt 28: keep an arraylist of addresses and then dump it to file when hit
0x20ec6 --> 64MB limit is exceeded at least 10 times.
   Find the entry point leads to 0x20ec6: use print_instrRange to print instructions.
	0x806e339b    16298285
        There is a loop repeated many times:
		@EIP 0x806f30a2: length: (4): andw	$0x00, (%edi)
		@EIP 0x806f30a6: length: (1): inc	%edi
		@EIP 0x806f30a7: length: (1): inc	%edi
		@EIP 0x806f30a8: length: (1): inc	%eax
		@EIP 0x806f30a9: length: (5): cmp	$0x00001000, %eax
		@EIP 0x806f30ae: length: (2): jc	0xFFFFFFDB

		@EIP 0x806f3089: length: (5): cmpw	$0xFFFF, (%edi)
		@EIP 0x806f308e: length: (2): jz	0x00000014
> It's a counter loop that increases (%edi): use vi to remove the above loop, 
vi command:
  ----------------------->
  -----------------------> Finally found the entry of the interrupt handler
  0x8052d41f
0x8052d421
0x8054800f
0xf896094e
0xf896094f
0xf8960951
0xf8960955
0xf896095b
0xf8960961
0xf8960963
0xf8960968
0x806f3110 -----------1st instruction here!!!!!
0x806f3112
0x806f3113
0x806f3115
0x806f3118
------------------------------------------------------
%s/0x806f30a2\n.*a3\n.*.....8e//g
It's 0xf8960937 who triggers it!!!!!!!!!!!!!!!!!
---------------------------------------------------
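
As an alternative to the vi substitution, the loop can be dropped from the address dump by range. A minimal sketch, assuming the dump has been loaded as a list of integers; the range bounds are the first and last addresses of the loop shown above, and strip_loop plus the sample list are made up for illustration:

```python
LOOP_LO = 0x806f3089  # first address of the counter loop shown above
LOOP_HI = 0x806f30ae  # last address of the counter loop shown above

def strip_loop(trace):
    """Drop every recorded eip that falls inside the counter loop."""
    return [eip for eip in trace if not (LOOP_LO <= eip <= LOOP_HI)]

# Illustrative dump only: one loop iteration followed by two later addresses.
sample = [0x806f3089, 0x806f308e, 0x806f30a2, 0x806f30a6, 0x806f30a7,
          0x806f30a8, 0x806f30a9, 0x806f30ae, 0x806f3110, 0x806f3112]
print([hex(eip) for eip in strip_loop(sample)])
```

This filters by address rather than by text pattern, so it also removes loop iterations that were cut off partway through.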

  >>> now stop at 0xf8960937 and see how it goes
 Interestingly the bp never hits!
  In 1st debug session: the first time it's hit: last_eip: 0xf9e5c937
                     the second time it's hit: last_eip: 0xf9e5c968
  2nd debug session: 1st hit: last_eip: 0xf9e5c937
			2nd hit: last_eip: 0xf9e5c968
  Instruction dump below: [NOTE the instructions with *]
*    	@EIP 0xf9e5c937: length: (3): lcall	*0x2C(%eax)
	@EIP 0xf9e5c93a: length: (5): push	$0xF9E5D900
	@EIP 0xf9e5c93f: length: (5): lcall	0xFFFFFA39
	@EIP 0xf9e5c944: length: (2): mov	%bl, %al
	@EIP 0xf9e5c946: length: (1): pop	%edi
	@EIP 0xf9e5c947: length: (1): pop	%ebx
	@EIP 0xf9e5c948: length: (1): pop	%esi
	@EIP 0xf9e5c949: length: (1): leave	
	@EIP 0xf9e5c94a: length: (3): ret	$0x0004
*	@EIP 0xf9e5c94d: length: (1): int3	
	@EIP 0xf9e5c94e: length: (1): push	%ebx
	@EIP 0xf9e5c94f: length: (2): xor	%ebx, %ebx
	@EIP 0xf9e5c951: length: (4): cmpb	%bl, 0x8(%esp)
	@EIP 0xf9e5c955: length: (6): movl	%ebx, 0xF9E5E6AC
	@EIP 0xf9e5c95b: length: (6): movl	%ebx, 0xF9E5E6A8
	@EIP 0xf9e5c961: length: (2): jz	0x0000000A
	@EIP 0xf9e5c963: length: (5): movl	0xF9E5C328, %eax
*	@EIP 0xf9e5c968: length: (3): lcall	*0x2C(%eax)
	@EIP 0xf9e5c96b: length: (5): push	$0xF9E5D900

 After recompile, hit 0xf9e5c937 (cr3:0x39000: -> 0x806f3110  
-- (use "watch absolute addr of env->cr[3]) to verify that no context switch) 
--> 0x806eeec4. 

   2nd hit: 0xf9e5c968 --> ... > 0x805eeec4 -> crash. Now the question is
 who triggers 0xf9e5c968. Need to look at 0xf9e5c94e (see above code dump). 
 When 0xf9e5c94e is hit, its last eip is  0x8054800f!

  Break on 0x8054800f (ljmp 0x804d76A0), 
--------------- the above analysis is not accurate, 0x806f3310 is NOT
the first instruction ----------------------------------

---------------------------------------------------------------------
Attempt 28: It's hard to tell the difference between function call. Our
conjecture is that process 0x39000 is a process providing interrupt handlers.
Very likely the 0x20ec6 is eventually triggered by an interrupt. 

  Modification: in helper_trace record last_cr3, if it is switching from 
other process to 0x39000, record the last_cr3 and last_eip. Chomp the dump
of instructions and compare.

-----> Observation:
  (1) the first context switch is from cr3 0x0, eip: 0x4047a9.
	Dump: 
	@EIP 0x4047a0: length: (1): inc	%ebx
	@EIP 0x4047a1: length: (3): addb	%dh, -0x24(%edx)
	@EIP 0x4047a4: length: (5): movl	0x0046A418, %eax
	*** @EIP 0x4047a9: length: (3): mov	%eax, %cr3
	So actually process 0x39000 is the same as process 0!!!

  (2) the second context switch is from cr3: 0x542e000 eip: 0x804e1f6c
					cr3: 0x50ae000 eip: 0x804e1f6c
---- dump --->
	!!!!!!!!!!!!!----------------
@EIP 0x804e9634: length: (1): push      %ebx
@EIP 0x804e9635: length: (1): push      %esi
@EIP 0x804e9636: length: (1): push      %edi
@EIP 0x804e9637: length: (6): movl      %fs:0x00000124, %eax
@EIP 0x804e963d: length: (3): movl      0x44(%eax), %ebx
@EIP 0x804e9640: length: (3): movl      %ebx, -0x24(%ebp)
@EIP 0x804e9643: length: (6): lcall     *0x804D7650

	Interesting, if run in GDB, it's cr3: 0x516e000 eip: 0x804e9634
  ----> could not capture it in GDB, the cr3 and eip are ALWAYS switching slightly, but the first instruction is always the same:  (cli instruction)
	0x804e0f69 - check what it is.
         @EIP 0x804e0f68: length: (1): nop	
	* @EIP 0x804e0f69: length: (1): cli	
	@EIP 0x804e0f6a: length: (6): movl	0xFFDFF03C, %ecx
	@EIP 0x804e0f70: length: (3): leal	0x50(%ecx), %eax
	@EIP 0x804e0f73: length: (4): movb	$0x89, 0x5(%eax)
	@EIP 0x804e0f77: length: (1): pushf	
	@EIP 0x804e0f78: length: (7): andl	$0xFFFFBFFF, (%esp)

  It looks like that the context switch occurs for the push instruction,
and note that it is NOT actually an INT instruction.


Attempt 30: check  the INT instruction and see how it is switched to 
process 0x39000.
  1> check out several instructions that trigger int N.
	(1) cr3: 0x463c000 (this can change though), pc: 0x8050b895, int 0x2d
	Interestingly, the BP is HIT only once! There are many other processes
   invoking interrupts, e.g., 0x39000 itself. The following is a list:
		(1) 0x20980 int 0x10
		(2) 0x20d79 int 0x15
		(3) 0x206cb int 0x13
       At 0x8050b895, the instructions are:
	@EIP 0x8050b887: length: (2): mov	%edi, %edi
	@EIP 0x8050b889: length: (1): push	%ebp
	@EIP 0x8050b88a: length: (2): mov	%esp, %ebp
	@EIP 0x8050b88c: length: (3): movl	0x10(%ebp), %eax
	@EIP 0x8050b88f: length: (3): movl	0x8(%ebp), %ecx
	@EIP 0x8050b892: length: (3): movl	0xC(%ebp), %edx
	*@EIP 0x8050b895: length: (2): int	$0x2D //x86 debug service
	@EIP 0x8050b897: length: (1): int3	
	@EIP 0x8050b898: length: (1): pop	%ebp
	@EIP 0x8050b899: length: (3): ret	$0x000C

     After the INT 2d -> it visits 0x804e0032 (still cr[3] 0x44fc000). Using
the breakpoints we can find that it executes within the same originator
process 0x44fc000 for some instructions and then it jumps to

----------------------------------------------------------
	cr3: 0x39000, eip_in: 0x804dc0b4.!!!
   my_last_eip is 0x804dc0b1 (actually, in both processes these instructions
are located in the same addr range). Note the instruction at 0x804dc0b1
in the following dump (when cr3 is 0x39000):
	@EIP 0x804dc0ab: length: (3): movl	0x18(%edx), %eax
	@EIP 0x804dc0ae: length: (3): movl	%eax, 0x1C(%ecx)
	*@EIP 0x804dc0b1: length: (3): mov	%eax, %cr3
	*@EIP 0x804dc0b4: length: (4): movw	0x30(%edx), %ax
	@EIP 0x804dc0b8: length: (4): movw	%ax, 0x66(%ecx)
	@EIP 0x804dc0bc: length: (3): ret	$0x0008
	@EIP 0x804dc0bf: length: (3): leal	(%ecx), %ecx
----------------------------------------------------------
!!! conclusion: when INT 2d occurs, it jumps to the interrupt handler 
(still in the same addr space, same cr3 page table). Then somewhere
later it switches to 0x39000. 
   Setting a BP at helper_raise_exception (NEVER hit). It seems that
it's the INT 2D causes the problem.

Attempt 31. Now change the process to watch to branch.exe and look at when
it is switching to 0x39000.
  <<<
   1. find out if it's hitting 0x8050b895 in branch.exe.
  >>> no it didn't hit 0x8050b895
  <<<
   2. modify the conditional breakpoint and check the first time
 it's switching from branch.exe to 0x39000
   >> after some dumps of helper_trace, it hits the switch ONLY once,
 and then it comes to the exception!!!!
   3. now find out the EIP of the instruction
   >>> eip: 0x804e0f69 (in 0x39000: cli instruction), and the last
 instruction in branch.exe (0x804e9634).
    Interestingly, the trace has only several instructions there,
   as shown below:
   -------------trace dump v2----------------
 #### CR3: c059000,   Size: 367, TOTAL: 4683
... 
@806ecdc0 [cr3: 1000d000, visited: 1]:  add     [eax], al
@806ecdc7 [cr3: 1000d000, visited: 1]:  add     [eax], al
@806ecdcd [cr3: 1000d000, visited: 1]:  add     [eax], al
@806ecdd2 [cr3: 1000d000, visited: 1]:  add     [eax], al
@806ecdd5 [cr3: 1000d000, visited: 1]:  add     [eax], al
@806ecdd6 [cr3: 1000d000, visited: 1]:  add     [eax], al
@806ecddb [cr3: 1000d000, visited: 1]:  add     [eax], al
@806ecddd [cr3: 1000d000, visited: 1]:  add     [eax], al

------found last 20ec6------------------------
0x8056bfc6
0x8056bfc7

>>> set a BP on 806ecdc0 and check the instructions (note env->cr3 is 0x39000)
dump is below:
@EIP 0x806ecdc0: length: (7): testb	$0xFF, 0xFFDFF050
@EIP 0x806ecdc7: length: (6): jnz	0xFFFFFE99
@EIP 0x806ecdcd: length: (5): push	$0x000000D1
@EIP 0x806ecdd2: length: (3): sub	$0x04, %esp
@EIP 0x806ecdd5: length: (1): push	%esp
@EIP 0x806ecdd6: length: (5): push	$0x000000D1
@EIP 0x806ecddb: length: (2): push	$0x1C
@EIP 0x806ecddd: length: (5): lcall	0x00001ECF ---> jumps 0x806eecac. 
    This can be verified in process 0x39000 dump
But interestingly, 0x806ecddd is THE LAST INSTRUCTION recorded for process
branch.exe. Need to figure out why the cr3 is switched (it's just an lcall).

  At 0x806ecddd, it later hits helper_trace_mem (why is this called? check
 disas_insn later -- to push next addr), it's writing 0xf8f17d4c. --> 
 interestingly, it is calling helper_trace2 directly without calling
 helper_stl_mmu!!! why????  It's actually ok because the helper_stl_mmu
 is actually handled together by the ld/st_labels.

  Do an experiment and see if every qemu_ld ---> helper_trace_mem --> helper_ldl_mmu. 
   >>> b tcg_out_qemu_ld first find an instruction which does ld
	0xfe05b cmpw $0x94, %cs:(%esi)
       the translation of the instruction starts from 0xb31e6063
			add_qemu_ldst_label starts at	   0xb31e608c (actually it does not take any actual translation).
		call helper_trace_mem starts from 0xb31e608c

   >>> then b helper_trace on the instruction; b helper_trace_mem and then
       b helper_ldl_mmu
     ---> it's calling helper_ldl_mmu without calling helper_trace_mem
  ----> the current implementation is wrong, helper_trace_mem is NOT called
 at all! The tcg_out_tlb_load (at its end) has a conditional JUMP which directly jumps to the next instruction!!!!!!!!
   So we need to lift the qemu_out_trace_mem UP!!!!!!

  Verification of the change: b helper_trace on 0xfe05b, then b on helper_trace_mem and helper_ldl_mmu: VERIFIED FIXED.

  Now repeat the experiment on branch.exe again. check the following instr:
0x806ecddd.
@EIP 0x806ecddd: length: (5): lcall	0x00001ECF ---> jumps 0x806eecac. 
>>> now the problem is that it only records no more than 20 instructions. Fixed
the bug, it's in Trace::addInstruction(...)

Last instruction recorded is the following:
@804e1f4c [cr3: 0ec6d000, visited: 0]:  mov     ax, 0x0023
@804e1f50 [cr3: 0ec6d000, visited: 0]:  sub     esp, 0x30
@804e1f53 [cr3: 0ec6d000, visited: 0]:  mov     ds, ax

------found last 20ec6------------------------
It seems after 5 instructions in @7c92xxxx range, it executes around 15000
instructions in the 804exxxx range and never gets out (and then crashed).
Last instruction is @7c92289c (jz 0x7 - that is jz 0x7c9228a3).
Set a BP on it.. Could not capture on it. The system skipped several instructions at the beginning.
>>> record the trace without memory trace and see how is it different.
  (1) Correct Trace: to 0x00401014, captured over 889k instructions. Instruction
804e1f25 is always hit after some 7c92xxxx instruction.
  (2) Incorrect Trace: about 16k instructions.  Interestingly, the incorrect
  trace dump the contents of instructions correctly!
 !!! GUESS: somehow cpl or segment registers are not set correctly.


Attempt 31. switch back to srss.exe and record the list of instructions
of 0x39000 and see what's the difference. (similarly, add some code to
dump the instructions before hitting 0x20ec6).
<<<<
 ???? write a simple program and search for the occurrence of instructions
in trace_bad.txt which are not in trace_good.txt
<<<<
  >>> (1) tac trace_bad.txt > reverse_trace_bad.tx 
  >>> (2) do a simple python script to find the first departure point 
	between trace_bad and trace_good.
   > first word departure 0x806ee297 
  --- guess: it's an interrupt handler function that depends on some values
  to do the jump
  The sequence of departure instructions (the instruction is the same
for both traces and the next instruction is different) is:
-=---------------------------------  size of hash table is 2884927
departure point: 0x806ee297 - RET
departure point: 0x804e37fd - jz        0x00000052 -- THIS SEEMS TO BE THE ONE
                        Then after 0x804e37ff, all not visited before
                        About 43k instructions executed
                                --> in trace_good, the jz is only hit once.
departure point: 0x8054b131 - ret       $0x000C
departure point: 0x804e2acc - ret
departure point: 0x8054b03c - ret 0x8
departure point: 0x804da2de - ret
departure point: 0x804fa5ee - ret 0x8
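
  The original script is not recorded in these notes. The following is only a
sketch of one way to compute "departure points" as defined above (same
instruction in both traces, different next instruction). It assumes each
trace has already been parsed into a list of instruction addresses; the
function names are made up for illustration.

```python
def successor_pairs(trace):
    """Map each address to the set of addresses that ever followed it."""
    succ = {}
    for cur, nxt in zip(trace, trace[1:]):
        succ.setdefault(cur, set()).add(nxt)
    return succ


def departure_points(good, bad):
    """Addresses seen in both traces whose successor in `bad` never
    follows them in `good`, in order of first occurrence in `bad`."""
    good_succ = successor_pairs(good)
    seen, result = set(), []
    for cur, nxt in zip(bad, bad[1:]):
        if cur in good_succ and nxt not in good_succ[cur] and cur not in seen:
            seen.add(cur)
            result.append(cur)
    return result


# Tiny made-up traces: both agree up to 0x104, then diverge.
good = [0x100, 0x102, 0x104, 0x110, 0x112]
bad = [0x100, 0x102, 0x104, 0x106, 0x108]
print([hex(a) for a in departure_points(good, bad)])
```

  Keeping a set of successors per address (instead of comparing the traces
line by line) tolerates loops that run a different number of times in the
two traces, which is probably why a hash table was used here.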


	
	


Attempt 32. To verify eflags and flags, we'll switch back to srss.exe to
monitor. (1) get its trace first and then dump the eflags/flags. Develop a
function that dump the eflags.
  print_eflags and flags at instruction 0x7c92289a (the first instruction)
 e_flags and h_flags are exactly the same at 7c92289a!
  Then dump the e_flags (202) and h_flags (400b4) when it's dumping the trace.
  ---- good_trace: dumped at  0x800ca1f6  still eflags (202) and h_flags (400b4)
  ---- bad_trace: dumped at 0x20ec6 eflags (23002) and hflags (4008c7)
!!!! now inspect the situation at 0x804f08a3 (the departure point)
  --- good_trace: first couple of hits 202 and 400ab4, last couple of hits
	202 and 4000b4. Seems to be OK though.
   --- bad_trace: 0x804f08a3 is hit multiple times, last time hit: 202 and 4000b4,
 first couple of hits: 400ab4 (note: ab4)
----> completely the same, there is no way to explain the different behavior
on the comparison with $0x20FD at 0x804f08a3.
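
  For reading the eflags values above, a small decoder helps. This is a
sketch, not the print_eflags function developed in this attempt; it only
uses the architectural EFLAGS bit layout and assumes the values in these
notes are hexadecimal. QEMU's hflags use a different, QEMU-specific layout
and are not decoded here.

```python
# Architectural EFLAGS bits (bit 1 is reserved and always reads as 1).
EFLAGS_BITS = [
    (0, "CF"), (2, "PF"), (4, "AF"), (6, "ZF"), (7, "SF"), (8, "TF"),
    (9, "IF"), (10, "DF"), (11, "OF"), (14, "NT"), (16, "RF"), (17, "VM"),
    (18, "AC"), (19, "VIF"), (20, "VIP"), (21, "ID"),
]


def decode_eflags(value):
    """Return (list of set flag names, IOPL) for an EFLAGS value."""
    names = [name for bit, name in EFLAGS_BITS if value & (1 << bit)]
    iopl = (value >> 12) & 3
    return names, iopl


for value in (0x202, 0x23002, 0x246):
    names, iopl = decode_eflags(value)
    print(f"{value:#x}: {' '.join(names)} IOPL={iopl}")
```

  Read this way, 0x202 is just IF, while the 0x23002 seen in the bad trace at
0x20ec6 has VM (virtual-8086 mode) set with IOPL 3 and IF cleared.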

Attempt 33. Now remove the printf function, and perform the analysis again.
After replacing it with a dummy function, found that the problem is with
the parameters passed!!!!
  Strangely: if passing all 4 parameters to dummy_func, it malfunctions.
		if passing any ONE parameter not as a constant, it hangs!
  Comparison below: 
----------------------------------------------------------
  good version (passing all 1's) as parameters:
   0x832efd2 <helper_trace_mem+106>:	movl   $0x1,0xc(%esp)
   0x832efda <helper_trace_mem+114>:	movl   $0x1,0x8(%esp)
   0x832efe2 <helper_trace_mem+122>:	movl   $0x1,0x4(%esp)
   0x832efea <helper_trace_mem+130>:	movl   $0x1,(%esp)
   0x832eff1 <helper_trace_mem+137>:	
    call   0x8459174 <dummy_func(unsigned int, unsigned int, unsigned int, int)>
  ----> parameters ARE pushed into the stack, set a bp at 0x832efd2 by
	b *0x832efd2, we have ESP value: 0xb02fde00, after the call is finished:(void *) 0xb02fde00, so the ESP is back to its normal status.
-------------------------------------------------------------

-------------------------------------------------------------
  bad version (pass one REAL parameter):
  0x832efd2 <helper_trace_mem+106>:	mov    -0x1c(%ebp),%eax
   0x832efd5 <helper_trace_mem+109>:	mov    0xec(%eax),%edx
   0x832efdb <helper_trace_mem+115>:	mov    0x85e92ac,%eax
   0x832efe0 <helper_trace_mem+120>:	mov    -0x24(%ebp),%ecx
   0x832efe3 <helper_trace_mem+123>:	mov    %ecx,0xc(%esp)
   0x832efe7 <helper_trace_mem+127>:	mov    %edx,0x8(%esp)
   0x832efeb <helper_trace_mem+131>:	mov    -0x20(%ebp),%edx
   0x832efee <helper_trace_mem+134>:	mov    %edx,0x4(%esp)
   0x832eff2 <helper_trace_mem+138>:	mov    %eax,(%esp)
   0x832eff5 <helper_trace_mem+141>:	
    call   0x8459178 <dummy_func(unsigned int, unsigned int, unsigned int, int
  ----> parameters ARE pushed into the stack, set a bp at 0x832efd2 by
	b *0x832efd2, we have ESP value: 0xb02fde00, after the call is finished:(void *) 0xb02fde00, so the ESP is back to its normal status.


**** (GOODVERSION!!!) By setting a BP at helper_trace2, we found that the system is involved in a small infinite loop of around 30 instructions
   0xfffffff0 --> ljmp $E05B
   0xfe05b -->    cmpw $0x94, %cs:(%esi)
	--> calls dummy_func
   0xfe062 --> 	jnz 0xC031E58E
   0xfe066 --> xor %eax, %eax
...
   0xfc493
   0xfc495               lidt %cs: (%esi)
		hit the dummy_func twice
   0xfc49b               lgdt %cs:(%esi)
                hit the dummy_func twice --> after this instruction 
			env->gdt->base becomes 0xfd3a8!!!!
   0xfc4a1--> mov cr0, %eax
   0xfc4a4--> mov %eax, %cr0
   0xfc4a8--> ljmp 0xC4B3:0xC4B3
   0xfc4ab -- ljmp $0xC4B3, 0xC4B3
  ---> 0xfc4b3 (it is accomplished via helper_ljmp_protected:
		new_cs=8, new_eip=fc4b3, next_eip_addend=8
		load e1 = 0xffff, e2 = 0xcf9b00, cpl and dpl are both 0.
		Then it jumps to 0xfc4b3 (new_eip))

-------------------------------------------------------------
*** (BAD VERSION!!!) same setting
  
   0xfffffff0 --> ljmp $E05B
   0xfe05b -->    cmpw $0x94, %cs:(%esi)
	--> calls dummy_func
   0xfe062 --> 	jnz 0xC031E58E
   0xfe066 --> xor %eax, %eax
   0xfe068 --> mov
   0xfe06a --> mov $7000, %sp
   0xfe070 --> mov $416c, %dx
   0xfe076 --> ljmp 0x5566e40c
   0xfc480 --> mov %ax, %cx
   0xfc483 --> cli
   0xfc484 --> cld
   0xfc485 --> mov $0x008f, %ax
   0xfc48b --> out %al, $0x70
   0xfc48d --> in $0x71, %al
   0xfc48f --> in $0x92, %al
   0xfc491 --> or $0x2, %al
   0xfc495               lidt %cs: (%esi)
		hit the dummy_func twice
   0xfc49b               lgdt %cs:(%esi)
                hit the dummy_func twice
   0xfc4a1--> mov cr0, %eax
   0xfc4a4--> mov %eax, %cr0
   0xfc4a8--> ljmp 0xC4B3:0xC4B3
   0xfc4ab -- ljmp $0xC4B3, 0xC4B3
   back to 0xfffffff0 AGAIN!!!
  ---> 0xfffffff0 (it is accomplished via helper_ljmp_protected:
		new_cs=8, new_eip=fc4b3, next_eip_addend=8
		*** load e1 = 0, e2 = 0 Different!!!!!
			it loads dt from env->gdt {selector 0, base 0, limit 55}
			It's not changed after lidt and lgdt.
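
  To make the e1/e2 difference concrete: e1 and e2 are the low and high
dwords of the 8-byte segment descriptor fetched for new_cs. The sketch below
decodes them using the standard x86 descriptor layout. The function name and
the returned field names are made up for illustration; this is not QEMU code.

```python
def decode_descriptor(e1, e2):
    """Decode the two dwords of an x86 segment descriptor."""
    base = (e1 >> 16) | ((e2 & 0xFF) << 16) | (e2 & 0xFF000000)
    limit = (e1 & 0xFFFF) | (e2 & 0x000F0000)
    if e2 & (1 << 23):              # G bit: limit is counted in 4 KiB pages
        limit = (limit << 12) | 0xFFF
    return {
        "base": base,
        "limit": limit,
        "present": bool(e2 & (1 << 15)),
        "dpl": (e2 >> 13) & 3,
        # S bit (12) set and executable bit (11) set -> code segment
        "code": bool(e2 & (1 << 12)) and bool(e2 & (1 << 11)),
    }


print(decode_descriptor(0xFFFF, 0xCF9B00))  # good version
print(decode_descriptor(0x0, 0x0))          # bad version
```

  The good pair decodes to a present, flat (base 0, limit 0xffffffff) code
segment with DPL 0, which matches "cpl and dpl are both 0" above. The bad
pair is an all-zero entry that is not even marked present, consistent with
the descriptor being read from a GDT whose base was never updated.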


---------------------------------------------------
Attempt 34: Analyze how dummy_func would affect lgdt %cs: (%esi)
--------------------------------------------------

*** good version:
   before call helper_trace_mem dump registers:
        eax            0xc480	50304  --> 0
	ecx            0xf0000	983040
	edx            0xfd3a0	1037216
	ebx            0xfd3a0	1037216
	esp            0xb02fde40	0xb02fde40
	ebp            0x28da4b90	0x28da4b90
	esi            0x0	0
	edi            0x2	2
	eip            0xb31e6453	0xb31e6453 <code_gen_buffer+1107>
	eflags         0x246	[ PF ZF IF ]
	cs             0x73	115
	ss             0x7b	123
	ds             0x7b	123
	es             0x7b	123
	fs             0x0	0
	gs             0x33	51
  After call helper_trace_mem dump registers, changes eax, eip.
  Note that it's 0xb31e64d5 (mov %ecx, 0xc4(%ebp)) writes into env->gdt->base!
While 0x...9b's translated code starts  at 0xb31e642b!
  %ecx is from helper_ldl_mmu (addr: 0xfd3a2, mmu_idx:0) -> 0xfd3a8

*bad version --> the problem is with second parameter: addr is 0x2 -> which
is definitely not right. %esi in emulator is 0, env->segs[1]->base is 0xf0000 (CS). Good version is the same. So it's where to load the address that matters!
---> next set BP on helper_ldl_mmu!!!!!!!!!!!!!!!!!

------------------------------------------------------
Attempt 35: check helper_ldl_mmu how it's different
-----------------------------------------------------
<<<< (1) set BP on helper_trace2 and display/x eip_in
     (2) run until 0xf4e9b (where it tries to load lgdt)
     (3) then start step by step and see how parameter 0xfd3a2 (good version)
	and bad version (0x2) is passed to helper_ldl_mmu

---------------
  good version
---------------
    
     
--------------
  bad version
--------------
	called helper_ldl_mmu twice
	1st time: (for lidt) addr is 0xfd3e2, return: 0xfd3e6
	2nd time: (for lgdt) addr is 2
	Note that the two envs->regs and env->segs are exactly the same!
	Check translate.c line 7359 for the logic:
		gen_lea_modrm: 
			A0 <- disp
			A0 <- A0 + seg
			*reg_ptr = OR_A0
			*offset_ptr = disp

			T1 <- [A0]
			A0 <- A0 + 2
			T0 <- [A0]
			base <- T0  (cpu_T[0])
			limit <- T1 (cpu_T[1])
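
   The same logic as a Python sketch (not the translate.c code): the memory
operand of lgdt is a 16-bit limit at A0 followed by a 32-bit base at A0+2.
The toy memory contents use values seen in this log (limit 0x37, base
0xfd3a8 at 0xfd3a0); the 0xffffff mask is assumed to apply for a 16-bit
operand size, as the micro-op listing in Attempt 36 suggests.

```python
import struct


def read_lgdt_operand(memory, a0, operand_size_16=True):
    """Return (base, limit) read from the pseudo-descriptor at a0."""
    (limit,) = struct.unpack_from("<H", memory, a0)      # T1 <- [A0]
    (base,) = struct.unpack_from("<I", memory, a0 + 2)   # T0 <- [A0 + 2]
    if operand_size_16:
        base &= 0xFFFFFF                                 # T0 <- T0 & 0xffffff
    return base, limit


# Toy guest memory: 1 MiB of zeros with one pseudo-descriptor in it.
memory = bytearray(0x100000)
struct.pack_into("<HI", memory, 0xFD3A0, 0x37, 0xFD3A8)

base, limit = read_lgdt_operand(memory, 0xFD3A0)
print(f"base={base:#x} limit={limit:#x}")
```

   In this toy memory, reading the base from address 2 instead of A0+2 would
return 0, which is the kind of result the bad version ends up with.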
   The point is why A0 <- disp + seg yields 2?
Find out the corresponding instructions to the two micro operations
in gen_lea_modrm: 
	tcg_ctx.gen_opc_ptr is 0x28c1ce7c (A0<-disp)  microcode: 11 (movi_32)
		    ends   0x28c1ce82 (end of A0<-A0+2): microcode: 18 (ld_32), 22 (add_i32)
		(corresponds to tcg_ctx.gen_opc_ptr[80] to [82](included))
            checking tcg/i386/tcg.c tcg_gen_code, then correspond to code at
		A0 <- disp is just saved in args[15] (TCGTemp*), it is not really translated into an instruction
		A0 <- A0 + seg is then located from
		0xb31e642b to 0xb31e642e, as dumped below:

---------------------------------------------------------------------
!!!!!NOTE THAT THE FOLLOWING RUNNING TRACE IS EXACTLY THE SAME
AS THE GOOD TRACE EXCEPT instruction 0xb31e64a4 (where %ecx has different value)
****************** translation of lgdt %cs:(%esi) *************************
0xb31e642b :	mov    0x54(%ebp),%ecx  %ecx has the set value (cs in emulator) 
	At run time: its %ecx receive value 0xf0000 (this is CS selector)
0xb31e642e :	mov    $0xd3a0,%ebx   (%ebx has the const value)
0xb31e6433 :	lea    (%ebx,%ecx,1),%edx (this is the add edx = ebx + ecx)
	# after this point $edx is 0xfd3a0
	# there is a helper_trace_mem call RIGHT AFTER this
	# after the helper_trace_mem call: edx is 0xfd3a0
#------------------ seems to be doing T1 <- [A0] now ------------------
Then we have the following sequence of instructions before helper_ldl_mmu
#-----------------------------------------------------------------------
   0xb31e6436 :	mov    %edx,%ebx   #save edx to ebx. ebx = 0xfd3a0 now
   0xb31e6438 :	mov    %eax,0x80(%esp)
		#esp is now 0xb02fde50, save $eax to 0xb02fded0 (val: 0xc480)
   0xb31e643f :	mov    %ecx,0x8c(%esp) #save ecx (0xf000) to 0xb02fdedc
   0xb31e6446 :	mov    %edx,0x88(%esp) #save edx (0xfd3a0) to 0xb02fded8!!!
		# later you will see these are generated by
		# check of clobber registers and save them at line 1934!!!
		# again, this is to preserve registers eax, ecx, edx
		----------------------------------------------------
#// set a watch point on 0xb02fded8 here, see who's changing it!
   0xb31e644d :	push   $0x1 #param 4: bRead 1
   0xb31e644f :	push   $0x1 #param 3: size 1
   0xb31e6451 :	push   %ebx #param 2: addr 0xfd3a0 (read 0xfd3a0) correct!
   0xb31e6452 :	push   %ebp #param 1: env
   0xb31e6453 :	call   0x832ef68 <helper_trace_mem>

   0xb31e6458 :	add    $0x10,%esp #at this point $esp is 0xb02fde40, reverts
	# to its original status, to 0xb02fde50!
   0xb31e645b :	mov    %ebx,%eax  #eax has now 0xfd3a0
   0xb31e645d :	mov    %ebx,%edx  #edx has now 0xfd3a0
   0xb31e645f :	shr    $0x8,%eax  #eax has now 0xfd3
   0xb31e6462 :	and    $0xfffff001,%edx #edx is still 0xfd000
   0xb31e6468 :	and    $0xff0,%eax #eax is now  0xfd0
   0xb31e646e :	lea    0x35c(%ebp,%eax,1),%eax   #eax is now 0x28da5ebc
		# guess operand T1? 
   0xb31e6475 :	cmp    (%eax),%edx
   0xb31e6477 :	mov    %ebx,%edx #edx has 0xfd3a0 again
   0xb31e6479 :	jne    0xb31e65d7 <code_gen_buffer+1495>
#----------------------------------------------------------------
  Then we have another call of helper_trace_mem!, as seen below:
 note that the actual cpu_ldl_mmu is not done yet! because they 
 need to do the read/write operations at the end of the block.
 the following is the memory trace for read operation to read [A0+2]
#----------------------------------------------------------------
   0xb31e647f :	add    0xc(%eax),%edx   # eax = 0x28da5ebc
   0xb31e6482 :	movzwl (%edx),%ebx       #$edx is 0xb06be3a0 now, ebx=0xfd3a0,
					 # after ebx -> 0x37???
   0xb31e6485 :	mov    0x88(%esp),%esi   # esi is from [88+esp], 
					 # should be value of A0: 0xfd3a0
					 # because A0 in s->temp[15] has
					 # the information about its location
					 # in memory!
   0xb31e648c :	lea    0x2(%esi),%ecx    # ecx is addr parameter
					 # from 2 + esi
   0xb31e648f :	mov    %ecx,0x88(%esp)   # save ecx to stack
					 # [0xb02fded8] = 0xfd3a2
   0xb31e6496 :	push   $0x1  #4th param  bRead --> this is to read 
   0xb31e6498 :	push   $0x2  #3th param  size --> 2
   0xb31e649a :	push   %ecx  #2nd param  addr --> 0xfd3a2
   0xb31e649b :	push   %ebp  #1st param  env --> stored at $ebp (0x28da4b90)
   0xb31e649c :	call   0x832ef68 <helper_trace_mem>
**************************************************************************
  Then the handling after helper_trace_mem is similar
**********************************************************************
   0xb31e64a1 add    $0x10,%esp  #esp is 0xb02fde50
   0xb31e64a4 mov    %ecx,%eax # eax = 2 !!!! in the good trace $ecx is
  !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^
  !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^
  !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^
	# ecx is 0xfd3a2. So in the bad version, the $ecx value is DESTROYED:
	# somehow it's not saved!!!!!
  !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^
  !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^
  !!!!!!!!!@#$@#$@#$@#$@$#@#$!!@#!#@@!#$@#$@##$%#$^#$$#^%$#%^$%^$%^$^
   0xb31e64a6 mov    %ecx,%edx # edx = 2
   0xb31e64a8 shr    $0x8,%eax # eax = 0
   0xb31e64ab and    $0xfffff003,%edx #edx = 2
   0xb31e64b1 and    $0xff0,%eax #eax = 0
   0xb31e64b7 lea    0x35c(%ebp,%eax,1),%eax #eax = 0x28da4eec
   0xb31e64be cmp    (%eax),%edx #
   0xb31e64c0 mov    %ecx,%edx #edx = 2
   0xb31e64c2 jne    0xb31e65f2 
**********************************************************************
  Now call the helper_ldl_mmu!
  0xb31e65f4 push   %ecx  #push param 2: addr (2)!!!!!
  0xb31e65f5 push   %ebp  #push param 1: env
  0xb31e65f6 call   0x82f0d6e <helper_ldl_mmu>

 
**********************************************************************
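
  The shr/and/lea sequences in the dump above (the ones just before each cmp)
can be replayed by hand. The sketch below only reproduces the arithmetic
visible in the generated code; the constants 0x35c, 0xff0 and the masks are
copied from the dump, and reading them as "TLB entry index" and "page tag"
is an interpretation, not something confirmed from the QEMU sources here.

```python
def tlb_lookup_operands(env, addr, low_mask):
    """Replay the address arithmetic done before the `cmp (%eax),%edx`.

    env      -- value of %ebp (pointer to the CPU state)
    addr     -- guest virtual address being accessed
    low_mask -- low bits kept in the compare value (0x1 or 0x3 in the dump)
    """
    # (addr >> 8) & 0xff0 equals ((addr >> 12) & 0xff) << 4:
    # a 256-entry table with 16-byte entries, indexed by page number.
    entry_offset = (addr >> 8) & 0xFF0
    tag = addr & (0xFFFFF000 | low_mask)
    entry_addr = env + 0x35C + entry_offset
    return tag, entry_addr


ENV = 0x28DA4B90
for addr, mask in ((0xFD3A0, 0x1), (0x2, 0x3)):
    tag, entry = tlb_lookup_operands(ENV, addr, mask)
    print(f"addr={addr:#x}: tag={tag:#x} entry={entry:#x}")
```

  The first line reproduces edx = 0xfd000 and eax = 0x28da5ebc from the first
load; the second reproduces edx = 2 and eax = 0x28da4eec for the clobbered
address 2.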

+++++++++++++++++++++=
FINALLY FINALLY
The problem is that helper_trace_mem did NOT preserve the ECX register. This is
allowed by the Intel ABI (ECX, EDX, and EAX need not be preserved by the callee!!!),
but the ECX register is still used by the CALLER after the call!
That's the problem!

Fix: enforce preserving the ECX and EDX registers, but not EAX!
    Question 1. check if there is any special flags for pre-serving registers
		
			
-----------------------------------------------
Attempt 36: figure out why %ecx is allocated for storing temporary.
 Analyze which generates the following instruction pairs:
   1. 
   0xb31e6446 :	mov    %edx,0x88(%esp) #save edx (0xfd3a0) to 0xb02fded8!!!
   0xb31e6485 :	mov    0x88(%esp),%esi   # esi is from [88+esp], 

   2. 
   0xb31e648f :	mov    %ecx,0x88(%esp)   # save ecx to stack
   !!! but there is no corresponding restore instruction
--------------------------------------------------
<<< approach:
  (1) set BP at helper_trace2 and disas_insn, stop at 0xfc49b (lgdt %cs:(%esi))
  (2) observe the micro-op code generated for each step
  (3) set BP at tcg_gen_code_common and watch the instruction range
*** use tcg_ctx->gen_opc_ptr and its difference with tcg_ctx->gen_opc_buf
to infer the location and content of the micro-code
	CROSS-REFERENCE with the previous attempt to check why the restoration code is missing for the second helper_trace_mem!
<<<
>>>   Pseudo-code                 Micro-code		x86 Instructions
gen_lea_modrm: 
  A0 <- disp			11 (movi_32)		0xb31e642b (no-code gen)
  A0 <- A0 + seg		18 (ld_i32), 		0xb31e642b
				22 (add_i32),		0xb31e642e,  
  T1 <- [A0]			114 (ld16u)		0xb31e6436 
  A0 <- A0 + 2			11, (movi_32)		0xb31e6485
				22, (add_i32)		0xb31e6485
  T0 <- [A0]			116 (ld32)		0xb31e648f
				[!!!diff from 114!,  
			 	because it's loading T0], 
				(load
  T0 <- T0 & 0xffffff		11 (movi32), 		0xb31e64cd
				31 (and_i32)		0xb31e64cd
  base <- T0  (cpu_T[0])	21 (st_i32)		0xb31e64d5 
  limit <- T1 (cpu_T[1])	21 (st_i32)		0xb31e64db
  #from debug A0: 0xfd3a0 
****************************************************************
Conclusion:
   1. 
   0xb31e6446 :	mov    %edx,0x88(%esp)  #belongs to T1<-[A0]
   0xb31e6485 :	mov    0x88(%esp),%esi   # belong to [A0<-A0+2]
   2. 
   0xb31e648f :	mov    %ecx,0x88(%esp)   # belongs to T0<-[A0] 
**************************************************************

-------------------------------------------------------------
Attempt 37: based on attempt 36, 
Now look at specifically 0xb31e6485 [A0<-A0+2] (MICROcode 11, 22, index: 84, 85), check why it's reading from memory; 
<<< do x/16i 0xb31e6485 to check from time to time about the code generated.
-------------------------------------------------------------

 Observation: for opcode at 84 (11, movi_32), it just calls tcg_reg_alloc_movi, but does not generate any real x86 code. It saves constant 2 into the ops/args array.
At opcode 85 (22, add_i32), it calls tcg_reg_alloc_op(...).
	It first copies the constant, then it calls tcg_regset_set(allocated_regs, s->reserved_regs), then it uses a for loop to retrieve the arguments.
For the 1st arg, its type is TEMP_VAL_MEM (value 2; note that 0 is dead, 1
is register, 2 is MEM, and 3 is const), so it calls tcg_reg_alloc, which allocates reg 6; then it generates a load instruction which loads from memory (esp+0x88),
and records in s->reg_to_temp[reg] (reg = 6) the arg number (which
is used to retrieve the value record at s->temps[arg]).
  That generates the instruction at 0xb31e6485. Note that it sets ts->mem_coherent to 1 (because currently the value is the same as the memory).

 In this debug session: ts is located at s->temps[15]. (arg=15).reg allocated
is number 6 (esi)

-----------------------------------------------------------
Attempt 38: how are the MEMORY VARIABLES saved (their register values saved
into memory)?
	Check instruction 0xb31e648f [T0<-[A0]] (microcode 116, index 86,
ld32). At this moment A0 is already A0+2, saved in argument 15.
  Check why it's saved to memory but NEVER read out!!!  Note that
  instruction 0xb31e648f writes the value of A0 (stored in %ecx) into
  memory! 
-----------------------------------------------------------
+++ the idea is similar. This time it's reading out of register directly.
arg is still 15 (A0), see attempt 37). Note that its type is NOT TEMP_VAL_CONST,
it's indicated as TEMP_VAL_REG.
   It, as usual, calls tcg_reg_alloc_op (just like 114):
	(1) it first copies s->reserved_regs to allocated_regs, it seems to
		be a bit string: 0x30 (decimal 48), i.e. 0011 0000 (ESP and EBP)
	(2) then it identifies the first argument to be 15 (this is A0)
	it's located at s->temps[15]. val_type=1 (REG), reg=1, val is 0xd3a0,
	mem_coherent is 0, then it uses tcg_regset_test_reg(arg_ct->u.regs, reg)
	for if branch. Note that reg value is 1, and arg_ct->u.regs is 0xfa.
	(it seems to test if the register is contained in some constraint set).
	(3) It then sets allocated_regs (again, verified that allocated_regs
	is a bit string). Its value is now 0x32 = 0011 0010; the logic
	of tcg_regset_set_reg(d,r) is defined as (d) |= 1L << (r). Here it's
	setting the 2nd bit from the right (bit index 1).
	(4) Then because the ld instruction is defined in tcg.h as an
		OPF_CALL_CLOBBER instruction (meaning it clobbers call registers
		and potentially updates globals), !!! at line 1934 of tcg.c
		it calls tcg_reg_free!!!! if the register is in
		tcg_target_call_clobber_regs! That set is initialized in
		i386/tcg-target.c (EAX, ECX, EDX are set), and its value is now 0x7 (111). Notice that
the constant values of EAX etc. here are TCG_REG_EAX etc., with values defined as
   TCG_REG_EAX = 0, ECX=1, EDX=2, EBX=3, ESP=4, EBP=5, ESI=6, EDI=7 (note that
this is different from the definition in target-i386/cpu.h!!!!)
  	So here reg=1 actually means ECX!!!!!
 	!!!!!!!!###########!!!!!!!!!!!!!!!!
	At line 1937 it's a simple loop which synchronizes (saves reg to mem!!!).
	It synchronizes ECX (1). The logic of tcg_reg_free is simple: it
 reads s->reg_to_temp[reg], which returns the arg_id; the memory variable
 can be retrieved using s->temps[arg_id]. Then if it's not memory coherent,
 the system generates a tcg_out_st instruction. 
	!!!!!!!!!!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
	It then sets ts->mem_coherent to 1????   This is quite suspicious as
	the st operation is not done yet??? need to verify
	!!!!!!!!!!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
	!!!!!!!!###########!!!!!!!!!!!!!!!!
	Conclusion: the registers ARE properly saved when the instruction
 is identified as a CLOBBER instruction! QEMU will save the CLOBBER
 registers (EAX, ECX, EDX) to memory BEFORE doing the ld or st instruction!
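
  The register-set arithmetic from steps (1) to (4) can be checked with a few
lines of Python. This is only a model of the bit operations quoted above
(d |= 1 << r and the corresponding test); the helper names mirror the TCG
macros but are re-implemented here. The starting value 0x30 is decimal 48,
the value of reserved_regs that yields 0x32 after ECX is added.

```python
# TCG register numbering for the i386 backend, as listed in this attempt.
TCG_REG = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
           "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def regset_set_reg(regset, reg):
    """Model of tcg_regset_set_reg(d, r): (d) |= 1 << (r)."""
    return regset | (1 << reg)


def regset_test_reg(regset, reg):
    """Model of tcg_regset_test_reg(d, r): is bit r set?"""
    return bool((regset >> reg) & 1)


def regset_names(regset):
    return [n for n, r in TCG_REG.items() if regset_test_reg(regset, r)]


# Call-clobbered set: EAX, ECX, EDX.
call_clobber = 0
for name in ("EAX", "ECX", "EDX"):
    call_clobber = regset_set_reg(call_clobber, TCG_REG[name])

# allocated_regs starts as reserved_regs, then ECX (reg=1) is added.
allocated = regset_set_reg(0x30, TCG_REG["ECX"])

print(hex(call_clobber), regset_names(call_clobber))
print(hex(allocated), regset_names(allocated))
print(regset_test_reg(0xFA, TCG_REG["ECX"]))  # constraint set from step (2)
```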

 
------------------------------------------------------------
Attempt 39: now the question is: when generating the code for the ldl,
why does the system read directly from %ecx? At the end of handling
T1<-[A0], s->temps[15] is set as a memory variable, and ecx has been
saved to memory!
  Need to look at instruction: 
   0xb31e64a4 mov    %ecx,%eax # eax = 2 !!!! 
  It  corresponds to the following micro code operation
  T0 <- [A0]			116 (ld32)		0xb31e648f
				[!!!diff from 114!,  
			 	because it's loading T0], 
  check instruction at 0xb31e64a4 to identify the translation, while
continue the exploration right after attempt 38
------------------------------------------------------------ 
  Observation: following into tcg_out_tlb_load, it takes the
register %ecx directly as the parameter (so there is no check of the
argument). This is actually set by tcg_reg_alloc_op: because
tcg_reg_alloc_op has earlier loaded A0 into a register and
synchronized it, it assumes that the value won't be
messed up when it calls tcg_out_ld --> tcg_out_tlb_load. But
actually, before tcg_out_tlb_load, there is a call to helper_trace_mem
which might mess up the register values!!!!

-----------------------------------------------------------------
Conclusion: REAL CAUSE OF THE BUG!!!!!!!!!!!!!
---------------------------------------------------------------

  ###: tcg_out_tlb_load assumes that all registers are
EXACTLY the same as on entry to tcg_out_qemu_ld, because
tcg_reg_alloc_op has already cleared all register issues (saving
and synchronizing them). But because we inserted
 a function call before it, the CLOBBER registers might be messed up.
 In this case, we ONLY need to PUSH registers EAX, ECX, EDX onto the
 stack, and at the end, POP them out.
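
  A toy model of this conclusion, in Python. It does not emit any x86 code;
it just treats the register file as a dict and shows why a value the
generated code keeps in ECX survives only if the call is wrapped in
push/pop. The stub that overwrites ECX with 2 is made up to mimic what was
observed in the bad version.

```python
CALLER_SAVED = ("eax", "ecx", "edx")


def helper_trace_mem_stub(regs):
    # Stand-in for the compiled C helper: under the i386 calling convention
    # it is free to overwrite EAX, ECX and EDX.
    regs["eax"], regs["ecx"], regs["edx"] = 0, 2, 0


def call_unprotected(regs):
    helper_trace_mem_stub(regs)


def call_protected(regs):
    stack = []
    for name in CALLER_SAVED:            # push eax; push ecx; push edx
        stack.append(regs[name])
    helper_trace_mem_stub(regs)
    for name in reversed(CALLER_SAVED):  # pop edx; pop ecx; pop eax
        regs[name] = stack.pop()


bad = {"eax": 0xC480, "ecx": 0xFD3A2, "edx": 0xFD3A0}
good = dict(bad)
call_unprotected(bad)
call_protected(good)
print(f"without push/pop: ecx={bad['ecx']:#x}")
print(f"with push/pop:    ecx={good['ecx']:#x}")
```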

-------------------------------------------------------
#######################################################

Attempt 40: push EAX, ECX, EDX onto the stack and pop them at
the end of tcg_out_trace_mem.
#######################################################
-------------------------------------------------------
(1) works.
(2) recover the mem_handle function, with logic commented out. works.
(3) now enable the real memory recording logic. works
(4) remove dummy function. works.
(5) register windows. works.

-----------------------------------------------------------------
Task 41: add a config file
-----------------------------------------------------------------
(1) place the file in traceinstr/config.txt (5 min)
(2) add a function init_traceinstr() in handle.h and handle.cc (10 min)
(3) the init_traceinstr() initializes a number of global variables. (10 min)
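
A sketch of what such an initialization could look like. The actual format of
config.txt is not recorded here, so the key=value syntax, the key names and
the defaults below are all assumptions for illustration (the settings
themselves, an instruction cap, a trace range and an output file, are ones
mentioned elsewhere in these notes). The real init_traceinstr() lives in
handle.cc and is C++; this Python version only shows the shape.

```python
def parse_config(text):
    """Parse 'key = value' lines; '#' starts a comment (assumed syntax)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed config line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def init_traceinstr(text):
    """Turn the parsed settings into the 'global variables' (here: a dict)."""
    cfg = parse_config(text)
    return {
        # int(x, 0) accepts both decimal and 0x-prefixed hex.
        "max_instructions": int(cfg.get("max_instructions", "300"), 0),
        "range_start": int(cfg.get("range_start", "0"), 0),
        "range_end": int(cfg.get("range_end", "0xffffffff"), 0),
        "output_file": cfg.get("output_file", "trace.txt"),
    }


example = """
# hypothetical traceinstr/config.txt
max_instructions = 300
range_start = 0x806ecdc0
range_end   = 0x806ecddd
"""
print(init_traceinstr(example))
```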

-----------------------------------------------------------------
Task 42: add dependency analysis
-----------------------------------------------------------------
(1) Later for memory management, add an additional layer of abstraction called
 dependencyCache which maps from instruction to instruction. At this moment,
 simply use C++ hash map. [15 min]. DONE. (see accessHistory.h)
(2) Add configuration trace range, to trace instructions in a specific range.
 [25 min] DONE. (see trace.c)
(3) In Memory Access handlers, add the dependency access [20 min] . DONE
(4)  Testing the above.
	[0] add a range to dump instructions. [10 min]. DONE.
	[1] get a proper range (about 20 instructions). [15 min]. DONE
	[2] check the initial dump [15 min]. DONE.
	[3] modify dump to include data dependency [20 min]. DONE.
	[4] examine and check data dependency [40 min]
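
A sketch of the dependency idea in Python (the real code is the C++ in
accessHistory.h, which is not reproduced here). It assumes the memory access
handler receives the instruction address plus (addr, size, is_read), similar
to the parameters of helper_trace_mem, and records "reader depends on last
writer" per byte. Class and method names are hypothetical.

```python
from collections import defaultdict


class DependencyCache:
    """Maps an instruction to the instructions whose stores it reads."""

    def __init__(self):
        self.last_writer = {}               # byte address -> writer eip
        self.depends_on = defaultdict(set)  # reader eip -> {writer eips}

    def on_mem_access(self, eip, addr, size, is_read):
        for byte_addr in range(addr, addr + size):
            if is_read:
                writer = self.last_writer.get(byte_addr)
                if writer is not None:
                    self.depends_on[eip].add(writer)
            else:
                self.last_writer[byte_addr] = eip


cache = DependencyCache()
cache.on_mem_access(0x401000, 0x12FF00, 4, is_read=False)  # store
cache.on_mem_access(0x401005, 0x12FF02, 2, is_read=True)   # overlapping load
cache.on_mem_access(0x40100A, 0x200000, 4, is_read=True)   # unrelated load
for reader, writers in cache.depends_on.items():
    print(f"{reader:#x} depends on {[hex(w) for w in sorted(writers)]}")
```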

--------------------------------------------------------------------
Task 43: analyze the instruction dump error and the failure of fflush
--------------------------------------------------------------------
(1) fflush problem. When running ./run.sh, it exits or crashes early 2 out of 3 times.
	run in gdb: strangely it hits exit(0), but the output of trace.dump() 
did not show up, including all the printf statements around it.
  Somehow it seems that the printf()'s are directly stopped by exit().
 Also trace.dump() is somehow run in parallel with other printfs. How would
 that be possible? It's supposed to be sequential.
	Answer: are the threads/processes running on the emulator actually
 backed by real threads in the OS, so that the printfs can come from
 others? --- very strangely, even if the exit is protected with getchar,
the exit() call still gets executed! It seems that getchar() does not work at
all. Notice that not using monitor mode does not help either.

      Temporary solution: reduce the number of instructions to capture to 300
solves the problem temporarily.

----------------------------------------------------------------
Task 44:  problem of dump for system lib instructions.
----------------------------------------------------------------
(1) add into CONFIG file a file to write. DONE.
(2) Identify the problematic instruction. svchost.exe, 5c679cc0. DONE
(3) Set a BP and find the real instruction.
	env: 
	 segs = {{selector = 35, base = 0, limit = 4294967295, flags = 13628160}, {
	instruction is CMPL $0x...., %ecx.
(4) Compare the values of segment registers. DONE. The problem is that
  update_instr passes an empty buffer (contents not filled in) to add_instr
 when the instruction is not there.
(5) Fix. DONE.
  Add a function has_instruction to the system that checks whether the
instruction is there before update_instruction is called; if the instruction
is not there, fill out the instruction buffer first. 
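
The fix can be sketched roughly as below. This is only an illustration: InstrTable, the Bytes alias and the fetch callback are hypothetical, and the signatures of the real has_instruction / update_instruction / add_instr are not known from these notes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical table of known instructions, keyed by address.
class InstrTable {
public:
    bool has_instruction(uint32_t addr) const {
        return table_.count(addr) != 0;
    }
    void add_instr(uint32_t addr, const Bytes& bytes) {
        table_[addr] = bytes;
    }
    // `fetch` reads the instruction bytes from guest memory. It is called
    // only when the instruction is unknown, so add_instr never receives
    // an empty, unfilled buffer.
    const Bytes& update_instruction(uint32_t addr,
                                    const std::function<Bytes(uint32_t)>& fetch) {
        if (!has_instruction(addr)) add_instr(addr, fetch(addr));
        return table_.at(addr);
    }
private:
    std::unordered_map<uint32_t, Bytes> table_;
};
```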


---------------------------------------------------------------
Task 45: Register access? Think about solution! 
--------------------------------------------------------------
(1) study what is available in the libdisasm package
  x86_insn_t -> operands (type: x86_oplist_t)
      operand_count (no need to use this)
  x86_operand_list has a next pointer
    x86_operand_list->op (type: x86_op_t)
  x86_op_t -> type (x86_op_type) -> data (cast as x86_reg_t)
    if the type is op_register then it's a register, but the 
  handling of op_expression could be more complex.

  Using x86_operand_foreach(op_src/op_dest) we can read the target and source
operands. Now the only problem is how to handle op_expression: read
the register from data.expression.base or index.
(2) think about implementation plan.
   Add a function to InstrInfo class: parseInOutReg(). Pseudo-code as below:
   for each input operand (use x86_operand_foreach call) 
       if type is reg. then add the register code
       if type is op_expression. then add the register base.

 Note:!!!! jz conditional branch's flag register is not captured!!!
(3) use GDB to do the coding first and exploration. DONE.
(4) finish the implementation and testing. In progress. see below.
(5) fix op_expression. DONE.
(6) fix the src/dest problem.
   TEMPORARY solution: do not handle op_expression for the index-based
or displacement-based addressing modes (ignore the dependency on
the registers). This might affect pointer arithmetic; we'll handle
it later. DONE.
(7) Fix instruction dump to dump all registers. DONE.
(8) fix conditional logic on flag registers. DONE.
  --> define a boolean function bool isReadFromFlags(x86_insn_t *insn)
	--> add those instructions that are conditional jumps, conditional calls
  --> define a boolean function bool isWriteToFlags(x86_insn_t *insn)
	--> add arithmetic, logic, and flag_manip instructions, check instr_group
  --> find out the ID of the eflags -- call x86_flag_reg. DONE.
(9) Add the register dependency.
  Use the accessHistory class to add a dependency.
  (a) declare the accessHistory. DONE.
  (b) add to handle_instruction. For write register, update the accessHistory;
	for read register, add the dependency. DONE
  (c) Debug. DONE.
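
A rough sketch of the parseInOutReg() pseudo-code from (2), including the flag handling. It deliberately does NOT use the libdisasm types: Operand, OpKind, RegUse and kFlagsReg are hypothetical stand-ins, and the op_expression branch follows the pseudo-code rather than the temporary workaround noted above.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-ins for the libdisasm operand types; the real
// x86_op_t / x86_reg_t carry more fields than are modeled here.
enum OpKind { kRegister, kExpression, kImmediate };
enum OpAccess { kRead = 1, kWrite = 2 };

struct Operand {
    OpKind kind;
    int access;  // bitmask of OpAccess
    int reg;     // register id when kind == kRegister, else -1
    int base;    // base register id when kind == kExpression, else -1
    int index;   // index register id when kind == kExpression, else -1
};

struct RegUse {
    std::set<int> in;   // registers read by the instruction
    std::set<int> out;  // registers written by the instruction
};

const int kFlagsReg = 100;  // hypothetical id standing in for eflags

RegUse parseInOutReg(const std::vector<Operand>& ops,
                     bool readsFlags, bool writesFlags) {
    RegUse use;
    for (const Operand& op : ops) {
        if (op.kind == kRegister) {
            if (op.access & kRead) use.in.insert(op.reg);
            if (op.access & kWrite) use.out.insert(op.reg);
        } else if (op.kind == kExpression) {
            // Address registers are read even when the memory operand
            // itself is the destination.
            if (op.base >= 0) use.in.insert(op.base);
            if (op.index >= 0) use.in.insert(op.index);
        }
    }
    // Conditional jumps read the flags; arithmetic/logic instructions write them.
    if (readsFlags) use.in.insert(kFlagsReg);
    if (writesFlags) use.out.insert(kFlagsReg);
    return use;
}
```

In the real code the two booleans would come from isReadFromFlags() and isWriteToFlags().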

----------------------------------------------------------
Task 46: dump trace when process exits
----------------------------------------------------------
 (1) find out the following information.
	(a) read about process exit in windows. System call and interrupt number?
		zwTerminateProcess -> 
			set EAX: 0x0101
		It eventually executes SYSENTER instruction
	   ----> so SYSENTER, eax= 0X0101 IS THE time to dump the trace. DONE.
        (b) find out the interrupt handler in QEMU
	   --> seems to be helper_raise_sysenter. see translate.c:7141.
	(c) Need to understand how the sysenter is translated.
	 *** key: how to read EAX, Is it safe to read from env->regs[0]? ***
 	  (c.1) understand the disas_insn logic at 7141.
		tcg_ctx->gen_opc_ptr = 0x28c2618c
		(a) generate a jump to itself! Two steps
			INDEX_op_movi_i32 (mov PC to a temp) 0xb
			INDEX_st (mov temp -> env->eip) 0x15
			0xb (mov)
		(b) gen_helper_sysenter
			two opcodes: 0x8 (call), 
		Now tcg->ctx->gen_opc_ptr is 0x28c26194
	  (c.2) understand how these opcodes are handled. 

		set a bp at 2259 of tcg.c (this is the loop that processes
		microcode one by one). Initially gen_opc_buf is located
		at 0x28c2617c (this is 16 bytes earlier). So we need to hit it
		8 times (2 bytes per opcode).
			s->code_ptr is:  0xb6ea4383

		The generated code is:
		 0xb6ea4383 :	movl   $0x7c90eb8d,0x20(%ebp) //to set env->eip
		 0xb6ea438a :	mov    %ebp,(%esp)
   		 0xb6ea438d :	mov    %eax,0x80(%esp) //push param env
   		 0xb6ea4394 :	call   0x82fa589 <helper_sysenter>

		It seems that EAX is not protected. env->regs[0] may be 
		different from the real (emulated) value of EAX, because
		env->regs[0] holds only the value from the beginning of
		the block.

	  (c.3) understand the logic of helper_sysenter.
		It resets the base and limit of all segment registers.
		It sets EIP to env->sysenter_eip = 0x804def6f

		@EIP 0x804def6f: length: (5): mov	$0x00000023, %ecx
		@EIP 0x804def74: length: (2): push	$0x30
		@EIP 0x804def76: length: (2): pop	%fs
		@EIP 0x804def78: length: (2): mov	%cx, %ds
		@EIP 0x804def7a: length: (2): mov	%cx, %es
		@EIP 0x804def7c: length: (6): movl	0xFFDFF040, %ecx

		Note how the system handles %ecx at 0x804def6f: its
		corresponding tcg_ctx->gen_opc_ptr is 0x28c26184, and it has two
		operations: movl_T0_im, mov_reg_T0.

		Clearly we need to look at mov_reg_T0 -> it eventually calls
		tcg_gen_ext32u_tl(cpu_regs[reg], t0) (here reg is 1, which stands
		for ecx; eax is 0 I guess).

			Note that cpu_regs[reg] maps from reg number to
		temporarily allocated reg (reg renaming). It generates
			mov_i32 (reg to reg) at 0x28c26184 (microcode: a)

		Now set a BP at tcg.c:2259, it is later leading to
		tcg_reg_alloc_mov which calls 
		  tcg_regset_set(allocated_regs, s->reserved_regs).

			*** if cpu_regs[1=ecx] is 6, then its data store
			is located in tcg_ctx->temps[6], as shown below
			{base_type = TCG_TYPE_I32, type = TCG_TYPE_I32, val_type = 2, reg = 3, val = 0, mem_reg = 5, mem_offset = 4, fixed_reg = 0, mem_coherent = 1, mem_allocated = 1, temp_local = 0, temp_allocated = 0, next_free_temp = 0, name = 0x851fc2b "ecx"}

		Inside tcg_reg_alloc_mov there are a lot of cases to handle: e.g., the mem and the register itself are not coherent, etc.
		Note the following attributes: TCGTemp in tcg/tcg.h
		* fixed_reg
		* mem_coherent means the REGISTER has been synchronized (saved) to the TCGContext->temps array.


	
 (2) define the algorithm. DONE
		1. add an attribute EAX_BEFORE_SYSENTER to X86EnvState (5 min)
		2. in the part of disas_insn (line 7141) which handles (20 min)
		the SYSENTER instruction, add code to save the current EAX
		register value to the global attribute EAX_BEFORE_SYSENTER.
			(a)  declare a function: gen_save_regs_before_SYSENTER()
			(b) find out how to st to env
			Similar to the following:
			gen_op_mov_TN_reg(OT_LONG, 0, R_EAX); //check what's the proper ot size? 4?
		        tcg_gen_st_tl(cpu_T[0], cpu_env, offsetof(CPUX86State, eip));

		3. in helper_sysenter, read out the EAX_BEFORE_SYSENTER,
		if it is 0x0101, then this is to terminate process. (10 min)
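
The three steps can be modeled outside QEMU as below. This is only a sketch of the runtime effect: CpuState, saveRegsBeforeSysenter and isTerminateProcess are hypothetical names, and in the real translator the save has to be emitted as a TCG op so that it runs when the translated block executes, not when it is translated.

```cpp
#include <cassert>
#include <cstdint>

// Service number seen in EAX for process termination, per the notes above.
const uint32_t kTerminateProcessCall = 0x0101;

// Hypothetical, heavily reduced model of the CPU state structure.
struct CpuState {
    uint32_t regs[8];            // regs[0] stands for EAX
    uint32_t eaxBeforeSysenter;  // the added EAX_BEFORE_SYSENTER attribute
};

// Step 2: runs right before the SYSENTER helper is called, with the live
// (emulated) EAX value. env.regs[0] is not used because it may be stale.
void saveRegsBeforeSysenter(CpuState& env, uint32_t liveEax) {
    env.eaxBeforeSysenter = liveEax;
}

// Step 3: inside the SYSENTER helper, decide whether to dump the trace.
bool isTerminateProcess(const CpuState& env) {
    return env.eaxBeforeSysenter == kTerminateProcessCall;
}
```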


 (3) implementation. DONE.
 (4) testing. 
	(a) set BP at seg_helper.c:2251 (inside helper_sysenter), watch the
	value of EAX captured. The gen_op_mov_reg_T0 ... has problems; debug it.
     Works now!
----------------------------------------------------------------
Task 47: check data dependency on branch.exe again. 
---------------------------------------------------------------
  Done. 30 minutes.

--------------------------------------------------------------
Task 48: add control dependency
-------------------------------------------------------------
 0. add Instruction::addControlDependency() 5 min. DONE.
 1. record last instruction - declare it in Trace class. 5 min
 2. modify Trace::updateInstruction. 5 min. DONE.
 3. check srss.exe. 10 min. DONE.
 4. check branch.exe. 15 min. SKIPPED

-----------------------------------------------------------
Task 49: Establish FTP repository between VM instances
--------------------------------------------------------
 Solution: (1. not working). use FTP. However, MS DOS has some stupid error: it does
not support passive FTP mode and we always get 500 port illegal port.
	(2. use the TAP device). Pretty much follow the instructions at
http://en.wikibooks.org/wiki/QEMU/Networking. Install and properly edit
the qemu-ifup and qemu-ifdown scripts. Note that a lot of Linux 
commands such as openvpn and firestarter need to be installed. DOES NOT WORK.
	(3. following the KVM instructions) https://help.ubuntu.com/community/KVM/Networking. Basically modify /etc/network/interfaces to set up the bridge
adapter directly. See below:
----------------------
auto eth0
iface eth0 inet manual

auto br0
iface br0 inet dhcp
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0

auto eth1
iface eth1 inet manual

auto br1
iface br1 inet static 
        address 169.254.236.150 
        network 169.254.236.0 
        netmask 255.255.255.0
        broadcast 169.254.236.255
        gateway 169.254.236.100 
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0
------------------------
	In run.sh, notice that the type of adaptor is important. PCI-virtio is not recognized by the guest XP; it needs to be replaced with rtl8139. Also need
to use "br0" to replace "hn0" in the original command. Note that the
host can capture all traffic by tcpdump on "br0". Run.sh string see below:

	Then sftp can be provided by psftp, download from www.chiark.greenend.org.uk.

	(2) Adding a second adaptor. The trick is to duplicate
the handling of br0 and add br1. Note that the 169.254.*.* network
adaptor needs to be set to a STATIC IP ADDRESS. So the handling is 
slightly different for br1.
**** don't forget to enable "br1" in /usr/local/etc/qemu/bridge.conf
	############## 2nd adaptor still does not work ##### check later.

---------------------------------------------------------------
Task 50: study VM snapshot
---------------------------------------------------------------
Implementation:
	in QEMU monitor use the commands savevm and loadvm. loadvm needs
about 1 minute. But it's better than nothing. Check why KVM is not possible later.

	*** to make save/load faster, use -drive file=winpx.img,cache=unsafe
It shortens load/save to 5 seconds!!!


----------------------------------------------------------------
Task 50: Slice Algorithm
----------------------------------------------------------------
Implementation Steps:
(1) In InstrInfo add a boolean attribute bCondBranch. Init to true in the
 constructor when it is a conditional branch. [15 min]. DONE
(2) in Trace add a data member Vector slice.  [30 min]. DONE.
	in Instruction add a boolean marker, bInslice.
    Add a function setSlice():
	Q = new Queue {instr, all branch instructions} mark all in slice
	while Q is not empty:
		ins = Q.removeFirst()
		if ins->hasNoDataDependency, mark in slice 
		for each data dependency of ins: add it into queue
	go over all the instructions and add them
	reverse the slice.
(3) Trace::dumpSlice(int low, int upper_limit) [10 min] DONE.
	dump each slice.
(3.5) Add configuration. [20 min]. DONE.
(4) At this moment, can study the slice. [1 hr]
  *1. bug. CMP is not included in calculation. <-- it's the problem in tracing algorithm. FIXED.
  *2. bug. instruction at 0x401034 (for slicing to 0x40103e) is included (which
 should not be included). <--. The slice is too coarse.
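
The setSlice() pseudo-code in (2) could look roughly like this. It is a sketch under assumptions: the Instr struct and index-based dependencies are hypothetical simplifications of the real Instruction/Trace classes, and collecting the marked entries in trace order takes the place of the final "reverse the slice" step.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical, reduced trace entry.
struct Instr {
    unsigned int addr;
    bool condBranch;            // corresponds to bCondBranch
    std::vector<int> dataDeps;  // indices of trace entries this one depends on
    bool inSlice;               // corresponds to bInslice
};

// Backward slice with a worklist: start from the target and all conditional
// branches, then pull in data dependencies transitively.
std::vector<int> setSlice(std::vector<Instr>& trace, int target) {
    const int n = static_cast<int>(trace.size());
    assert(target >= 0 && target < n);

    std::queue<int> work;
    for (int i = 0; i < n; ++i) {
        trace[i].inSlice = (i == target) || trace[i].condBranch;
        if (trace[i].inSlice) work.push(i);
    }
    while (!work.empty()) {
        int cur = work.front();
        work.pop();
        for (int dep : trace[cur].dataDeps) {
            if (!trace[dep].inSlice) {  // marking on push avoids revisits
                trace[dep].inSlice = true;
                work.push(dep);
            }
        }
    }
    // Collect marked entries in trace (execution) order.
    std::vector<int> slice;
    for (int i = 0; i < n; ++i) {
        if (trace[i].inSlice) slice.push_back(i);
    }
    return slice;
}
```

Seeding the queue with every conditional branch is what makes the slice coarse, which matches bug *2 above.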

--------------------------------------------------------------
Task 51: Network Problem Again: cannot FTP or SFTP to host network
--------------------------------------------------------------
  Summary of http://translate.google.de/translate?hl=en&ie=UTF-8&sl=de&tl=en&u=http://qemu-buch.de/de/index.php/QEMU-KVM-Buch/_Netzwerkoptionen

  *syntax: -net nic,vlan=0,macaddr=xx:xx,name=nic1,model=rtl8139|e1000..
  *by default each nic is attached to a VLAN (-net user), default 10.0.2.0
	network, with DHCP server at 10.0.2.2, host is also accessible from
	10.0.2.2
  *-net user,restrict=y forbids log on connection to host
  *"info network" in QEMU monitor to watch network status.
  * port redirection is to redirect a PORT (say 12345 on host) to a PORT
	of the GUEST system (say 22).
  * TAP is a software adaptor. In TAP mode, the VLAN is connected to the TAP device;	*** in run.sh insert 
	sudo tunctl -t tap0 -u csc288
	at the end -> sudo tunctl -d tap0
   * in /etc/qemu-ifup, add one line
	#!/bin/sh
	sudo /sbin/ifconfig $1 10.0.2.100  [note here the host is set to 10.0.2.100]

	In /etc/qemu-ifdown, set it to
	sudo /sbin/ifconfig $1 down	

   TAP alone does not work! No DHCP, and cannot connect either.

   * in /etc/network/interfaces, first create a bridge br0 attached
   to eth0; then in /etc/qemu-ifup, bridge br0 to tap0. Use
	"sudo brctl show" to display the information.
 #######################------------------#############################
  Solution: 
	(1) do not create bridge statically. Still use the old /etc/network/interfaces file to bring up eth0 (DHCP) and eth1 (static address).
	(2) prepare startbridge.sh in qemu_images. This file adds the bridge, adds a tap and bridges them.
		sudo tunctl -t tap0 -u csc288 #create tap0
		sudo ip addr flush dev eth0 #will drop IP from eth0
		sudo ip addr flush dev tap0 #will drop IP from tap0
		sudo ifconfig tap0 0.0.0.0 up # strangely this line is required
		sudo brctl addbr br0 
		sudo brctl addif br0 eth0 tap0 #now eth0 and tap0 are bridged
		sudo ip link set dev br0 up
		sudo dhcp br0 # will set up routing table

	Use 10.0.2.2 as default gateway.
	It seems that there is no way to mix both 10.0.2.x and 169.254.0.0 network (so we skip eth1 here)

	(3) run.sh setting: notice that both script,downscript are NOT used.
	-net nic,model=rtl8139 -net tap,vlan=0,ifname=tap0,script=no,downscript=no

	SFTP is running too slow!!! change /etc/ssh/sshd_config to allow longer login time

	!!!! Make sure "route -n" produces the right routing table. If not, do "sudo service networking restart".
		
------------------------------------------------------------
Task 52: Add samba support
------------------------------------------------------------
 (1) sudo apt-get install samba
 (2) modify /etc/samba/ config file to enable file sharing, starting with
 [homes] ...
	  COMMENT OUT ### valid users = %S
      then have the following----------------
	   comment = Network Logon Service
	   path = /home/samba/smbuser
	   public=yes
	   security = share
	   guest ok = yes
	   guest only = yes
	   read only = no
	   force account = smbuser

 
 (3) sudo smbpasswd -a smbuser (to add a new user)
 (4) sudo smbd reload

 ----------------
 In Windows XP, visit Network Places and add a network place.
  In command window do the following:
     net use X: \\10.0.2.16\smbuser
  Then X drive is available.

-----------------------------------------------------------
Task 53: snapshot problem
----------------------------------------------------------
After network is enabled, cannot do snapshot
Solution:
	(1) experiment if network is disabled, can we do snapshot? - STILL no
	(2) observation: even after disabling the entire network, dropping
the connection, and replacing it with the user network, it still does not work.
	(3) debugging into QEMU: found that the system is reloaded and
helper_trace2 is being executed. The system is trapped in some
kernel service (infinite loop). Might be some device problem that causes the
	error.
	(4) attempt1: drop the network device and the smb link in the
	"network places"  and try it again. Worked! It seems that it is
	the smb link in "network places" causing the trouble
	(5) attempt2: enable the tap device and try the snapshot again. 
	Strangely, the guest OS has exactly the same IP 10.0.2.15 and
	this dynamic IP is assigned by DHCP at Hofstra. We decide 
	to use a different MAC and see how it works. --- DOES NOT WORK!!!!
	TAP device not working.
	(6) attempt3: use the user network; however, there was a blue screen.
	try it again. STILL NOT WORKING, CANNOT duplicate (4).
------------------> by debugging, we found that Windows XP
is stuck in an infinite loop in process 0x39000.

------------------------------------------------------------
Task 53: Binary Rewriting 
------------------------------------------------------------
Implement Trace::writeSliceIntoExecutable(char *filename, vector<Section*>) [3 hrs]
	//1. move to entry
	//2. add instruction one by one, without readjusting addr
	//3. finish writing.

Plan: 
  (1) add definition SectionInfo, BinaryWriter [15 min]  DONE.
  (2) add function Trace::writeSliceIntoExecutable(fileName, vector<SectionInfo*>) [5 min]. call binwrite.writeSliceToExec. DONE.
  (3) add CONFIG reading for Vector Info [25 min]. DONE.
  (3.5) get all section data from b3.exe. [15 min]. DONE
  (4) Implement a naive algorithm which FIRST CLEARS all areas with NOP, and
  simply writes the instructions ONE BY ONE. [65 min including debugging]
	(4.1) declaration of all related functions [15 min]. DONE
	(4.2) writeBytes(FILE *file, int location, char *instr, size) 
		[10min]. DONE.
        (4.3) writeInstruction(Instruction*, FILE *file, int location) 
		[10 min]. DONE.
	(4.4) getSectionInFileOffSet(vecSections, Instruction) return -1 if
		could not find. [15 min] DONE.
	(4.5) writeSlice (just write each instruction) [15 min]. DONE.
	(4.6) testing [30 min]. DONE.
  (5) Add the function clear section. (15 min). DONE
  (6) Test write Slicing using b3.exe. [1 hr]. DONE.
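
A small sketch of the naive writer: clear a section with NOP (0x90), map a virtual address to a file offset, then write instruction bytes in place. The SectionInfo fields and the getInFileOffset/clearSection names are assumptions for illustration and may not match the real BinaryWriter.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

const unsigned char kNop = 0x90;  // x86 one-byte NOP

// Hypothetical section description, as it might be read from CONFIG.
struct SectionInfo {
    unsigned int virtualAddr;  // address of the section when loaded
    unsigned int size;         // section size in bytes
    long fileOffset;           // where the section starts in the file
};

// Write `size` bytes at an absolute file position.
bool writeBytes(FILE* file, long location, const unsigned char* bytes, size_t size) {
    if (std::fseek(file, location, SEEK_SET) != 0) return false;
    return std::fwrite(bytes, 1, size, file) == size;
}

// Overwrite a whole region with NOPs before the slice is written.
bool clearSection(FILE* file, long fileOffset, size_t size) {
    if (size == 0) return true;
    std::vector<unsigned char> nops(size, kNop);
    return writeBytes(file, fileOffset, nops.data(), size);
}

// Translate an instruction address into a file offset; -1 if no section matches.
long getInFileOffset(const std::vector<SectionInfo>& sections, unsigned int addr) {
    for (const SectionInfo& s : sections) {
        if (addr >= s.virtualAddr && addr - s.virtualAddr < s.size) {
            return s.fileOffset + static_cast<long>(addr - s.virtualAddr);
        }
    }
    return -1;
}
```

Because addresses are not readjusted, every instruction in the slice lands at its original offset and everything else stays NOP.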


-----------------------------------------------------------------
Task 54: Snapshot FINAL SOLUTION!!!!!!!!!!!!!!! 
-----------------------------------------------------------------
 (1) try user net. CORRECT SEQUENCE IMPORTANT!!!
	(a) stopvm - delvm if any, restart
	(b) stopvm - savevm (use a new id, maybe related) - quit immediately
		*** IMPORTANT! don't do anything before quitting!!!
	(c) restart - loadvm (without stop)
	SUCCESS!!!!
 (2) try tap device. Problem IP is the same as host.
    ****** IMPORTANT. CHECK route -n FIRST to make sure 10.0.2.2 is the
	default gateway for all traffic!!!!
    **************------------------ *********************
    ***!!!! use STATIC IP 10.0.2.17 for the guest (if it has the
	conflicting 10.0.2.15 ip assigned!!!!) #***********
    **************------------------ *********************
 	Now set X: drive use
	net use X \\10.0.2.15\smbuser

 (3) now the snapshot problem again!!!! not working. info cpus found that
	the cpu status is false.
	**** still follow (1). run ***info cpus*** a couple of times.
----------------- FINALLY SET ----------------------------------------


-----------------------------------------------------------------
Task 57: Solve the Oversize Slice Problem
-----------------------------------------------------------------
Problem: say a function a() is called and uses
parameters supplied by multiple functions (e.g.,
b, c, d, e, g). Let's say x only uses the result from d, but the
current slicing algorithm will include b, c, e, g as well.
This occurs a lot for system functions such as those in ntdll.

Idea: associate a time-stamp with each register writing/memory writing
operation. When a dependency link is established, the dependency should
be tagged with the timestamp of the object that it is accessing.

  ####
 Implementation Plan:
  (1) collect current slice size: 28739 for b3.exe (total trace size: 50656)
  (2) introduce a global timer of long. (name: lTimestamp) [10 min]
	update the lTimestamp in handle_instruction. DONE.
  (3) define a new class called dependLink (in instr.h) [15 min]. DONE
  (4) define the comparison function for dependLink (in instr.h) [15 min]. DONE.
  (5) modify the data dependency of Instruction class [10 min]. DONE
  (6) update the write/read dependency for mem. [30 min]
	(a) update accessHistory add time stamp [15 min]. DONE.
	(b) update mem read and write [15 min]. DONE.
 (7) update the write/read dependency for reg [30 min]
  (8) test. [1 hr]
	trace size: 47272, slice size: 27140. does not improve. Reason:
  we did not take advantage of the time stamp information in the slice alg.
  (9) ALG: add another timestamp to each dependency link (the creation time).
  So a link has two time stamps: readTimeStamp (the time that the
  dependee is created) and createTimeStamp (the time that the link
  is created, and thus the time of writing to the destination value). Given two
  links LinkA -> LinkB, they have to match the condition: 
	linkA.readTimeStamp == linkB.createTimeStamp
  (10) Implementation:
	(a) add createTimeStamp to dependLink class and change all related
	functions. [30 min]. DONE.
	(b) modify data dependency of memory. [20 min]. DONE
	(c) modify data dependency of reg [15 min]. DONE
	(c.2) test first [20 min]. DONE.
	(d) modify dependency algorithm [20 min]. DONE.
	(e) testing [30 min]. DONE
  IMPROVEMENT: slice size: 5944 (reduced to 20% of original size).
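
The matching condition from (9) can be sketched as follows. DependLink, LinkComp and linksToFollow are illustrative names only (the real dependLink class and its comparison function live in instr.h and may differ), and exact equality is used here without any of the later adjustments.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical dependency link with the two timestamps described above.
struct DependLink {
    unsigned int dependeeAddr;  // instruction this link points to
    long readTimeStamp;         // when the dependee produced the value
    long createTimeStamp;       // when the link itself was created
};

// Strict weak ordering so links can be kept in a std::set.
struct LinkComp {
    bool operator()(const DependLink& a, const DependLink& b) const {
        return std::tie(a.dependeeAddr, a.readTimeStamp, a.createTimeStamp) <
               std::tie(b.dependeeAddr, b.readTimeStamp, b.createTimeStamp);
    }
};

using LinkSet = std::set<DependLink, LinkComp>;

// Given the link that led us to a dependee, keep only those of the
// dependee's own links that were created by the same execution instance.
std::vector<DependLink> linksToFollow(const DependLink& incoming,
                                      const LinkSet& dependeeLinks) {
    std::vector<DependLink> out;
    for (const DependLink& l : dependeeLinks) {
        if (incoming.readTimeStamp == l.createTimeStamp) out.push_back(l);
    }
    return out;
}
```

For example, if a library instruction ran at time 10 (using a value from time 8) and again at time 20 (using a value from time 18), a link that read its result at time 20 follows only the second chain, so the caller behind the first run stays out of the slice.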


-----------------------------------------------------------------
Task 56: Revisit the slicing algorithm. Fix it
-----------------------------------------------------------------
 1. Fix the jump instruction issue.
	Idea: 
	Need to think about the control dependency.
 	Each Instruction will have a "prev link" which indicates the
    instruction executed right before it. However, in most cases the prior
    instruction, even if replaced with NOP, will still lead to the
    current instruction. In this case, there is no control dependency on it.
	Only when the previous instruction is a JUMP, BRANCH, CALL, or INT3 do we need to add an explicit CONTROL DEPENDENCY between them.
	The handling of RET is special: to set up the prev link we need
 to trace back and find the CONTROL DEPENDENCY.
	The control dependency of an instruction should include all
 control dependency of a prev link. If the prev link.
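
A sketch of the rule above, assuming a reduced Instruction type. InsnType, transfersControl and the field names are hypothetical, and the special trace-back for RET is left out.

```cpp
#include <cassert>
#include <set>

// Hypothetical instruction categories; the real code would derive these
// from the disassembler's instruction type.
enum InsnType { kOther, kJump, kCondBranch, kCall, kInt3, kRet };

struct Instruction {
    unsigned int addr;
    InsnType type;
    const Instruction* priorInstruction;               // the "prev link"
    std::set<const Instruction*> controlDependency;    // explicit control deps
};

// True for instructions that decide where execution goes next. A plain
// instruction replaced by NOP would still fall through to its successor.
bool transfersControl(InsnType t) {
    return t == kJump || t == kCondBranch || t == kCall || t == kInt3;
}

// Always record the prev link; add an explicit control dependency only
// when the previous instruction transfers control.
void setControlDependency(Instruction& cur, const Instruction* prev) {
    cur.priorInstruction = prev;
    if (prev != nullptr && transfersControl(prev->type)) {
        cur.controlDependency.insert(prev);
    }
}
```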

 1. Implementation Plan:
  (1) Make the following modification in Instruction class:
	(1) add data member: priorInstruction [5 min]
 	(2) add comments for set<dependLink*> controlDependency [5 min] 
		previous instruction in
	(3) in Trace::updateInstruction, update priorInstruction/controlDependency.
		(3.1) add a global variable Instruction *prevInstruction to Trace. [5 min] DONE.
		(3.2) add a function setControlDependency() [5 min]. DONE. 
		(3.3) add a function findPriorInstruction (which traces back for the prior instruction)
			[15 min] DONE.
		(3.4) add an inline function which tells if an instruction isJUMP() [10 min] DONE
		(3.5) add an inline function which tells if an instruction isRET() [10 min] DONE
		(3.5.1) add an inline function for adding control dependency. [15 min] DONE.
		(3.6) Complete setControlDependency() [15 min] DONE.
		(3.7) Test findPriorInstruction [15 min] Use SRSS.exe DONE.
		(3.8) Test setControlDependency [15 min] Use SRSS.exe DONE.
		(3.9) Change algorithm of tracing [20 min]
			(a) add a function genDirectControlDependency() 
				- return a set of dependLinks [65 min] . DONE
			(a.1) change the add control dependency algorithm [15 min] DONE.
			(b) build it into algorithm [5 min]. DONE
			(c) test the addControlDependency first [5 min] DONE.
			(d) test the selectively add control dependency function [15 min].  DONE
			(e) test the slice algorithm [15 min]. add a local variable to record queue size.
		(3.20) Test use SRSS.exe [15 min]. DONE
			(a.1) fix the dump [10 min] DONE
			(a.2) fix the setPriorInstr [10 min]. DONE
		(3.21) Test use b4.exe [1.5 min] 
			(a.1) set up the watching stats. [10 min]. DONE.
			(a.2) check why it's too slow. [75 min] It seems to be the control dependency
				size that causes the problem.
			(a.3) remove the oldest timestamp. [25 min] DONE. still memory problem.
			(a.4) run the program b3 again and see what is actually the problem. [30 min]
				It seems to be a corrupted b3.exe. Reconfigure the system. DONE.
			(a.5) study the trace file again. Problem: slice has only one instruction.
				Control dependency is not taken in! [15 min]	
			(a.6) fix the above bug [60 min]
				trace into Trace::setSlice and step by step.	
				(1) fixed one bug related to queue push
				(2) control dependency has no readAccessTS. (fixed)
			(a.7) fix the data dependency link problem [30 min]
				When doing the inclusion set, should do -1!
			(a.8) problem, register dependency not taken into account. [60 min]
				It seems that -1 does not apply to register dependency.
				Fix: in selectivelyAddControl dependency, timeStamp and accessTime are the same; fix that!
			(a.9) there is still a bug related to the "-1" problem.
				Debug: trace into instruction 0x401034 and 0x401031 and 0x401024, 
			and observe the timestamp associated with each of the links.
				Fix: the problem is that when handling memRead and memWrite,
			the actual time stamp should be -1. FIXED

-----------------------------------------------------------------
Task 57: Slicing Algorithm. Avoid visiting the same control dependency link multiple times. 
-----------------------------------------------------------------
 use cache.
 (1) introduce a cache set and compare function in Trace class. [15 min]
 (2) use the cache in selectiveAddControlDependency. [10 min]
 (3) test [15 min]
  *** set::count() DOES NOT work!
  --> replace count() with find()
  does not work.
 Very strangely, the linkComp works. New experiment: in updateInstruction, just create another two dependLinks and see if we could break at the comparison function.

  fixed the stupid error: forgot to add the instruction to the set of visited. took 3 hrs to fix it.
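
One way to make this kind of bug harder to repeat is to let a single insert() call both check and record the visit. A sketch, with hypothetical names (VisitedLinks, firstVisit) and a reduced DependLink; the comparator dereferences the pointers so that two separately allocated but equal links count as the same entry.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Hypothetical, reduced dependency link.
struct DependLink {
    unsigned int addr;
    long readTimeStamp;
    long createTimeStamp;
};

// Compare what the pointers point at. Without a custom comparator, a set
// of pointers orders by pointer value, so equal links at different
// addresses would be treated as distinct.
struct LinkComp {
    bool operator()(const DependLink* a, const DependLink* b) const {
        return std::tie(a->addr, a->readTimeStamp, a->createTimeStamp) <
               std::tie(b->addr, b->readTimeStamp, b->createTimeStamp);
    }
};

class VisitedLinks {
public:
    // Returns true only the first time an equivalent link is seen.
    // std::set::insert reports via .second whether the element was new,
    // so the link can never be checked without also being recorded.
    bool firstVisit(const DependLink* link) {
        return seen_.insert(link).second;
    }
private:
    std::set<const DependLink*, LinkComp> seen_;
};
```

The slicing loop would then skip any link for which firstVisit() returns false.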

-----------------------------------------------------------------
Task 58: Slicing Algorithm Problem: slicing stops at 0x401005.
-----------------------------------------------------------------
Observation: it seems to be a simple bug of < vs "<="
CLEARED. done.

-----------------------------------------------------------------
Task 59: Examine the slice. See if it's executable.
-----------------------------------------------------------------
Observation: the sliced program gets stuck in an INFINITE LOOP! check later.
	Infinite loop: 0x401298 to 0x4012AC. However, the behavior diverges at
0x401277 and 0x40127D. 
	Found bug: 0x401277 depends on 0x401275 (XOR ESI, ESI). But instruction at
0x401275 is not included in slice. Check dump: *** the problem again is the time stamp
not matching each other. There is a gap of around 0x30 between the access time of
0x401277 and 0x40127d and the two depend links can NOT be connected.
 
Debugging design:
  (1) set BP at 0x401275, 0x401277, and 0x40127d
  (2) constructor of dependLink constructor conditional breakpoint. check time stamp.

Observation: the problem is caused by the context switch. At 0x401277 there is a context switch,
and 0x401277 is later executed twice (resumed) and the time stamp cannot be connected.

Solution: once it is found that it is a context switch, add the current time stamp as the
access time to each dependlink.

Question: how to discover that this is a return from a context switch? check the global previous instruction insn_type.

Implementation:
  (1) in Trace::setControlDependency, add a case switch for IRET. and add a call to
	Instruction::updateDependencyForIRET. [5 min]. DONE
  (2) implement updateDependencyForIRET.
	(a) for control dependency, if the priorInstr already exists, just update the time stamp. [10min]
		DONE.
	(b) for data dependency, first find the max smaller ts in all dependencies, then check
		those which contains this max-smaller ts and add the current ts. [20 min] DONE
  (3) Debugging
	(a) walk through all newly added instructions. [60 min]. Need to send commands using QEMU
	Monitor, otherwise hit the bp too fast. DONE
		(a.1) fixed a small bug related to NULL. [5 min]
	(b) check the resulting file. the "XOR" issue is fixed. Now the problem is that JMP is missing.

-----------------------------------------------------------------
Task 60: Examine the loss of control dependency of JMP instruction.
-----------------------------------------------------------------
  (1) collect the trace and check 0x401275. It seems that it did not trace back to 0x401070.
  (2) debug: set conditional breakpoint at 0x401275 in slicing, check what is going on.
	(a.1) it seems that 0x004013cf is in the slice, still need to check if it is WRITTEN. VERIFIED -> OK!
	(a.2) debug again check if the queue is accessed RIGHT. verified Ok.
  (3) new problems? at 0x004012da
	


-----------------------------------------------------------------
Task 61: Slicing Algorithm Bug 3. at 0x4013CA 
-----------------------------------------------------------------
  Observation: the function is called, but it is entirely empty and does not RETURN to the right place.
  (1) check why 4013CA is in the slice: because the following instructions have 
	dependencies on it:
	(1) 0x804e1f25 (looks like context switch, NOT IN SLICE). check later
	(2) 0x403e5e (ret, NOT IN SLICE ): to return 
  (2) debug check how 0x4013CA is listed in SLICE. set conditional BP at
	trace.cc:213, 286, also set on push link to Queue. found that 
	0x4013CA is pushed by __0x4013ca__________ to the queue.
	Bug found: 0x4013ca is listed as control dependency of itself!!!
    Observation: 0x4013ca is reached from an IRET. then updateDependencyForIRET is called.
  The problem is that the update of the lastInstr occurs too early!!! --> actually not:
  there is a recursive call of setControlDependency (which should actually be
  setPriorInstr) -> so the "lastInstr" is not updated correctly. Correct it and still
  keep the set lastInstr operation.
 -----> FIXED. 0x4013CA is not in slice any more.


-----------------------------------------------------------------
Task 62: Fix RET/IRET logic
-----------------------------------------------------------------
  The RET/IRET should be included in control dependency. Simply add a control dependency.

-----------------------------------------------------------------
Task 63: Slicing Algorithm Bug 4. 0x00403DFB
-----------------------------------------------------------------
 Problem: function call without pushing the parameters. But these parameters are accessed
by system calls. E.g., 0x7C8017FD has a data dependency on the push instruction, but
the push instruction is not included. The problem is that those instructions (which are 
NOT supposed to be in the slice) are not wiped with NOP instructions. So they still have
the dependency on the pushed parameters.
  For Task 62, there is also a problem: what if the target instruction has no data dependency
on a function at all? Then the entire function should NOT be included.

  [1] Fix for 62: make the slicing algorithm a multi-pass algorithm. In each pass, go over
  the instructions whose PriorInstr is a call, search the entire trace (forward) for
  the CALL/RET pairs, and if there is any instruction in the slice in between, add the
  RET instruction as the control dependency for the subsequent instruction, and redo the slicing.
  Repeat until nothing more is added. 

  [2] Algorithm:
  	/** It checks the trace in the forward fashion: for each call instruction, find
		the corresponding SUBSEQUENT INSTRUCTION, then search the trace in between to
		see if there is any instruction in the slice; if there is, add the corresponding
		RET instruction as the control dependency for the SUBSEQUENT instruction.
	*/
      int Trace::checkCallPairs(queue toProcess);
	/**
		scan from ts1 to ts2 and see if there is any
		instruction in slice
	*/
      bool Trace::hasInstructionInSlice(long ts1, long ts2); 

	some assisting functions
	Instruction *getSubsequentInstr(Instruction *ins); //get the subsequent instruction in EXECUTABLE (not in trace)
	Instruction *getInstruction(unsigned int addr, unsigned int cr3);

 [3] For the above, we need to introduce a new history class for the ENTIRE trace. As the trace can
	be very huge, we only RECORD THE ADDRESS of instructions being hit at each time.
	This class should be at the logical/abstract level and be later expanded to include
	support for disk-mem exchange operations.

	As for the packing/unpacking, the different versions of instructions should be modeled
	by the Instruction class (and later queried by timestamp)

	class instrAddrHistory{
	public:
		instrAddrHistory(unsigned int cr3);
		unsigned int getCR3(); 
		long getSize();	//return the total number of instructions(addr) recorded
		void appendInstrAddr(unsigned int addr);
		unsigned int getInstrAddr(long timestamp);
	};
	Add a hash table to Trace class which maps from cr3 to instrAddrHistory
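
	A minimal in-memory sketch of the history class and the cr3 lookup table, assuming the
	timestamp is simply the position of the address in the per-process history. HistoryTable
	and forProcess are hypothetical names, and the disk-mem exchange is left out.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Records only the ADDRESS of each executed instruction, one history per process (cr3).
class instrAddrHistory {
public:
    explicit instrAddrHistory(unsigned int cr3) : cr3_(cr3) {}
    unsigned int getCR3() const { return cr3_; }
    // Total number of instruction addresses recorded so far.
    long getSize() const { return static_cast<long>(addrs_.size()); }
    void appendInstrAddr(unsigned int addr) { addrs_.push_back(addr); }
    // Assumption: timestamp == index into the history; throws if out of range.
    unsigned int getInstrAddr(long timestamp) const {
        return addrs_.at(static_cast<std::size_t>(timestamp));
    }

private:
    unsigned int cr3_;
    std::vector<unsigned int> addrs_;
};

// Hypothetical stand-in for the hash table added to Trace: cr3 -> history.
class HistoryTable {
public:
    // Returns the history for cr3, creating an empty one on first use.
    instrAddrHistory& forProcess(unsigned int cr3) {
        auto it = table_.find(cr3);
        if (it == table_.end()) {
            it = table_.emplace(cr3, instrAddrHistory(cr3)).first;
        }
        return it->second;
    }

private:
    std::unordered_map<unsigned int, instrAddrHistory> table_;
};
```

	Keeping four bytes per executed instruction is what makes a later swap-to-disk layer
	straightforward: the history is an append-only array.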

      
-----------------------------------------------------------------
Task 64: Slicing Algorithm Bug 4. 
-----------------------------------------------------------------
Now fix the CALL/RET pair problem.
(1) add an addrInstrHist map to Trace [5  min] DONE.
(2) in Trace::updateInstruction, push the address to the history, including testing [30 min] DONE.
(3) add prototype of the following functions. [10 min] DONE
      int Trace::handleInSliceCalls(queue toProcess);
      bool Trace::hasInstructionInSlice(long ts1, long ts2); 
(4) implement handleInSliceCalls [120 min]
	(a.1) there is an infinite loop problem. Fix it by adding a bit information about access.
	Algorithm: added one bool bit for representing the in slice status.
(6) test handleInSliceCalls [30 min]
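
A minimal sketch of the CALL/RET pass from steps (3) and (4), assuming a simplified
in-memory trace indexed by timestamp. TraceEntry, Kind and findMatchingRet are hypothetical
stand-ins for the real Trace and Instruction classes, and setting the in-slice bit on the
RET stands in for adding the control dependency and redoing the slicing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified trace entry (the real classes carry far more state).
enum class Kind { Other, Call, Ret };

struct TraceEntry {
    unsigned int addr;  // instruction address
    Kind kind;          // CALL, RET, or anything else
    bool inSlice;       // the one-bit "in slice" status
};

// Scan the timestamps strictly between ts1 and ts2 for any in-slice instruction.
bool hasInstructionInSlice(const std::vector<TraceEntry>& trace, long ts1, long ts2) {
    for (long ts = ts1 + 1; ts < ts2; ++ts) {
        if (trace[static_cast<std::size_t>(ts)].inSlice) return true;
    }
    return false;
}

// Find the RET that balances the CALL at callTs, or -1 if the trace ends first.
long findMatchingRet(const std::vector<TraceEntry>& trace, long callTs) {
    int depth = 0;
    const long n = static_cast<long>(trace.size());
    for (long ts = callTs; ts < n; ++ts) {
        const Kind k = trace[static_cast<std::size_t>(ts)].kind;
        if (k == Kind::Call) {
            ++depth;
        } else if (k == Kind::Ret && --depth == 0) {
            return ts;
        }
    }
    return -1;
}

// One forward pass: pull in the RET of every CALL/RET pair that encloses an
// in-slice instruction. Returns how many RETs were newly added.
int handleInSliceCalls(std::vector<TraceEntry>& trace) {
    int added = 0;
    const long n = static_cast<long>(trace.size());
    for (long ts = 0; ts < n; ++ts) {
        if (trace[static_cast<std::size_t>(ts)].kind != Kind::Call) continue;
        const long retTs = findMatchingRet(trace, ts);
        if (retTs < 0) continue;
        TraceEntry& ret = trace[static_cast<std::size_t>(retTs)];
        // Skipping RETs that are already in slice is what stops the pass from looping forever.
        if (!ret.inSlice && hasInstructionInSlice(trace, ts, retTs)) {
            ret.inSlice = true;
            ++added;
        }
    }
    return added;
}
```

A driver would repeat the pass (and the slicing) until handleInSliceCalls returns 0.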
 
-----------------------------------------------------------------
Task 65: Slicing Algorithm Bug 5. Infinite loop at 0x00403DFB
-----------------------------------------------------------------
Problem: parameter pushing.
The algorithm assumed that some functions (like those in imported dlls) are not contained in the
slice, but actually they are, because we do not modify those system dlls. This leads to some
missing parameter pushing operations when calling these functions.

Fix idea: if a function (not in slice range) has some instruction in slice, then all instructions
of the function (with proper timestamp) are labeled as in slice.

Implementation:
  (1) add parameter vecSectInfo to setSlice, and handleInSliceCalls [5 min] DONE.
  (2) add a function processInSliceCall(ts1, ts2) [5 min] DONE.
  (3) implement processInSliceCall(ts1, ts2) [125 min]
	(1) issue 1: infinite loop. fix
	(2) issue 2: call/pair loop repeated too many times. fix.
	(3) issue 3: infinite recursion again. fixed. slice size: 43557 (entire size: 50997)
	(4) remove the add RET in control link.
  (4) debug processInSliceCall [15 min]
  (5) test the resulting file [15 min] 
	Infinite loop solved.
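
	A minimal sketch of the fix idea, assuming ts1 and ts2 are the timestamps of a CALL and
	its RET in a timestamp-indexed trace. The Entry struct is hypothetical; only the name
	processInSliceCall comes from the notes above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical trace entry: address plus the in-slice bit.
struct Entry {
    unsigned int addr;
    bool inSlice;
};

// If any instruction executed in [ts1, ts2] is in slice, label ALL of them as in slice.
// Returns the number of entries whose status changed; 0 means nothing left to do,
// which is what lets a caller loop to a fixed point without recursing forever.
int processInSliceCall(std::vector<Entry>& trace, long ts1, long ts2) {
    const long n = static_cast<long>(trace.size());
    if (ts1 < 0) ts1 = 0;
    if (ts2 >= n) ts2 = n - 1;

    bool anyInSlice = false;
    for (long ts = ts1; ts <= ts2; ++ts) {
        if (trace[static_cast<std::size_t>(ts)].inSlice) {
            anyInSlice = true;
            break;
        }
    }
    if (!anyInSlice) return 0;

    int changed = 0;
    for (long ts = ts1; ts <= ts2; ++ts) {
        Entry& e = trace[static_cast<std::size_t>(ts)];
        if (!e.inSlice) {
            e.inSlice = true;
            ++changed;
        }
    }
    return changed;
}
```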

-----------------------------------------------------------------
Task 66: Add CONFIG parameter FILE_TO_ANALYZE
-----------------------------------------------------------------
  (1) add FILE_TO_ANALYZE in config file and add implementations to it. [15 min]. DONE
  (2) test it. [10 min]. DONE.

-----------------------------------------------------------------
Task 67: collect stats
-----------------------------------------------------------------
 Collect: (1) total in range instructions. (2) total number of instructions in slice
 Implementation:
	(1) add two attributes: nInRangeInstr, nInRangeSliceSize into trace class [5 min] DONE
	(2) in Trace::addInstruction increment nInRangeInstr [5min]. DONE.
	(3) in Trace::writeSlice increment nInRangeSliceSize [10 min] DONE.
	(4) test [90 min]
		(1) trace. DONE.
		(2) testing DONE. in range slice size: 1169, in range trace size: 2470.


-----------------------------------------------------------------
Task 68: test slicing functions
-----------------------------------------------------------------
  idea: the same function is called twice. The result of one call is used and the other is not.
   	Make sure that the other one does not appear in the slice.
  Implementation:
	(1) compile and dump the program. [10 min]
	(2) run slice algorithm and then analyze it [40 min]
		Test program in b1.cpp -> b1.exe , run as b5.exe
		Entry: 0x401030
		SLICE_AT: 0x00401068
  Found problems:
	(1) parameters are NOT pushed (still related to the function body). Function body is
	not in slice! error in previous algorithm.
	(2) the function is still being called twice.

  Fix Debugging:
	**************
	(1) parameters are NOT pushed (still related to the function body). Function body is
	not in slice! error in previous algorithm.
	**************
	
	Set condition BP on 0x401048/401057 where 0x401005 is called.	Found several problems:
	(1) to test if an instruction is in slice, should use the timestamp. DONE. but not fixing the two duplicate calls
	(2) the number of instructions included in slice does not seem right. ONLY 1 instruction added.
		--> this seems OK. 0x401026 is captured, however, it did not trace to 0x401023.
		Problem with 0x401026, when build dependency it does not include the READ register.
		Disable the getDiff(input_reg, output_reg) in the constructor of InstrInfo. DOES NOT WORK.
		Now trace into InstrInfo constructor and check out how registers are being processed.
		Found the problem is with the handling of wr type of registers. But it forms a self-loop.
		Fix the Instruction::updateRegDependency.  FIXED.

-----------------------------------------------------------------
Task 69: fix issue 2 (duplicate function calls)
---------------------------------------------------------------- 
(1) first change the linkComp comparison, two instructions are regarded as equal only when both
the instruction and create timestamp are the same.
(2) made similar changes to handleMemRead and handleMemWrite
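
A minimal sketch of the comparison rule in (1), assuming a link can be reduced to an
instruction address plus its create timestamp. The Link struct and its field names are
hypothetical; only the name linkComp comes from the notes.

```cpp
#include <set>
#include <tuple>

// Hypothetical link record: which instruction, and when that instance was created.
struct Link {
    unsigned int instrAddr;
    long createTs;
};

// Strict weak ordering over (instruction, create timestamp). Two links are
// equivalent only when BOTH fields match, so two hits of the same address at
// different times stay distinct in a std::set.
struct linkComp {
    bool operator()(const Link& a, const Link& b) const {
        return std::tie(a.instrAddr, a.createTs) < std::tie(b.instrAddr, b.createTs);
    }
};
```

With this ordering, a std::set<Link, linkComp> keeps one entry per execution of 0x401026
instead of collapsing both hits into one.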
--- still does not work. debugging plan
(1) bp at instr.cc:252 and check instruction 0x401026 hit twice, what happens. Strangely two
  links are added, but in the dump there is only one showing up.
 (2) now the two links did show up, NOTE dump file too large, introduce debug levels. But it still
takes a very long time to produce the slice.
--- still not fixed.
  Debugging Plan: BP on setSlice and trace into the while loop. check the instructions added one by one. Added one debug function for dumping the contents of queue.
	Observation: trace on the ESP/EBP register caused the problem. It also enlarges the
 slice generated greatly. 
	Now the question is: can we ignore the ESP/EBP tracing? if "normal" compiler is used
and ESP/EBP is not used to pass data, then we can safely ignore them.

 Implementation Steps:
	(1) find the MACRO code of ESP/EBP. It's actually in disasm/ia32_reg.c big structure, search for "eax", esp and ebp are at index 5 and 6 respectively. This is verified by the dump of trace
	as well. DONE.
	(2) add system parameter TRACE_ESP_EBP. DONE.
	(3) modify the logic for TRACE_ESP_EBP. DONE.
	(4) check the size again. Still take quite some time to generate the slice.
	DONE! dual function call problems solved. Now the slice size is half of the in range trace.
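
	A minimal sketch of the TRACE_ESP_EBP switch, assuming registers are identified by the
	table indexes found in step (1) (ESP = 5, EBP = 6). The function name and the plain
	vector of register ids are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Register ids as observed in step (1) of the notes (index into the register table).
const int REG_ESP_ID = 5;
const int REG_EBP_ID = 6;

// Returns the register ids that should take part in dependency tracking.
// When traceEspEbp is false, ESP and EBP are dropped so stack bookkeeping
// does not chain unrelated function calls together.
std::vector<int> filterTrackedRegs(const std::vector<int>& regIds, bool traceEspEbp) {
    if (traceEspEbp) return regIds;
    std::vector<int> kept;
    std::copy_if(regIds.begin(), regIds.end(), std::back_inserter(kept),
                 [](int id) { return id != REG_ESP_ID && id != REG_EBP_ID; });
    return kept;
}
```

	This is only safe under the assumption stated above: a "normal" compiler that does not
	pass data through ESP/EBP.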

-----------------------------------------------------------------
Task 70: New problem with seh_prolog4
---------------------------------------------------------------- 
  The problem is that some instructions assign to ESP directly, and with ESP/EBP tracing
disabled the slicer loses track of them.
  Experiment 1: enable ESP trace again, see if the problem persists. All range:  40304/51002, In Range: 1313/2284, total links: 724,692 (this number is much larger) doubles the time.
  Experiment 2: disable ESP trace. All range: 31495/51520, In range: 1007/2284, Total links: 300658, 2 minutes 
  Experiment 3: in the ENABLE version, ignore the instructions such as CALL/RET (because they will
change ESP/EBP by default anyway).
	All range:  39772, 51542 , In Range, 1245/2284     Total Links:  845305.
	It DOES NOT HELP! The reason, there are still inside function instructions (like pop)
	impacting ESP value.
  Experiment 4: add hist based visit check in setSlice, see the result. links visited: 513k. Improves about 40%. 
	Slice Size: 30526, In Range Slize Size: 888, In Range Trace Size 2284
  Experiment 5: performance is very bad. Check if this is the problem of hist swapping. Add stats to hist swapping.
  Also add the timing information.
		Stats: # total links visited: 409622, histSwaps: 15384, duration 232.000000 (sec) - 4 minutes
  It seems that there are too many history swaps.

  Experiment 6: increase the size of history buffer to 20MB and see the result.  Saves about 10%
  	# total links visited: 407121, histSwaps: 29, duration 196.000000
	Slice Size: 26059, In Range Slize Size: 879, In Range Trace Size 2284
   *** so here the major bottleneck is still the total number of links visited.
  Experiment 7: still use the same idea as experiment 3: ignore CALL/RET/PUSH/POP and only handle the specific MOV/ADD
 instructions that involve ESP/EBP. There was a timestamp-related bug; fixed addDirectControl. STILL NEED TO TEST IT!
  This greatly reduces the time spent on slicing.
	# total links visited: 280876, histSwaps: 8, duration 56.000000
	Slice Size: 31531, In Range Slize Size: 1019, In Range Trace Size 2284


  Experiment 7 does not fix the problem with seh_prolog4.  Does not work. PUSH/POP not working. Has to cancel the 
 optimization on the ESP/EBP.
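
  A minimal sketch of the visit check from Experiment 4, assuming the dependency links are
 available as a map from an instruction instance to the instances it depends on. DepGraph,
 SliceResult and backwardSlice are hypothetical names, not the real setSlice.

```cpp
#include <map>
#include <queue>
#include <set>
#include <vector>

// Hypothetical dependency graph: node id -> ids of the nodes it depends on.
using DepGraph = std::map<int, std::vector<int>>;

struct SliceResult {
    std::set<int> slice;  // every node reachable backward from the target
    long linksVisited;    // counter comparable to "# total links visited"
};

// Worklist-based backward slice. A node is queued only the first time it is
// seen, so self-loops and repeated dependencies cannot cause an infinite loop
// and each outgoing link is followed at most once.
SliceResult backwardSlice(const DepGraph& deps, int target) {
    SliceResult r{{}, 0};
    std::queue<int> toProcess;
    r.slice.insert(target);
    toProcess.push(target);
    while (!toProcess.empty()) {
        const int cur = toProcess.front();
        toProcess.pop();
        const auto it = deps.find(cur);
        if (it == deps.end()) continue;
        for (int dep : it->second) {
            ++r.linksVisited;
            if (r.slice.insert(dep).second) {
                toProcess.push(dep);
            }
        }
    }
    return r;
}
```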

-----------------------------------------------------------------------------
Task 71: fix problem with instruction 0x40129C depends on 0x401297
-----------------------------------------------------------------------------
Problem: AX instead of EAX register. 
Implementation: add the handling in Instr handler and setInOutReg of InstrInfo..
Fixed.

-----------------------------------------------------------------------------
Task 72: fix problem with instruction 0x4012AF (implicit register on EAX)
-----------------------------------------------------------------------------
Debug: set a breakpoint on the update_2 and update_3 functions in instr.cc and a conditional
BP. The problem is that the x86_ea_t (expression) is not processed when a register is used
as a base or index register for calculating effective address.
  When handling implicit and explicit operands, handle the implicit register use as well.

Implementation: 
  (1) add an assisting function handleOpExpression(op, input_reg) to each update_func for input [5 min]
  (2) logic: if op is expression, if index/base id is not 0, then add it to input.  [15 min]
  (3) debug [25 min] DONE. The slice size is increased though.
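
  A minimal sketch of the logic in step (2), assuming an operand carries a flag saying
  whether it is an address expression plus base/index registers whose id is 0 when unused.
  Reg, EffectiveAddr and Operand are hypothetical stand-ins for the disassembler's own types.

```cpp
#include <set>

// Hypothetical, simplified operand model.
struct Reg {
    int id;  // 0 means "no register"
};

struct EffectiveAddr {
    Reg base;
    Reg index;
    unsigned int scale;
    int disp;
};

struct Operand {
    bool isExpression;  // true when the operand is an effective-address expression
    EffectiveAddr ea;
};

// Registers used as base or index of an address expression are READ by the
// instruction, so they must be added to its input register set.
void handleOpExpression(const Operand& op, std::set<int>& inputRegs) {
    if (!op.isExpression) return;
    if (op.ea.base.id != 0) inputRegs.insert(op.ea.base.id);
    if (op.ea.index.id != 0) inputRegs.insert(op.ea.index.id);
}
```

  Adding these implicit reads is also why the slice grows: every memory operand now links
  back to the instruction that computed its address.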


-----------------------------------------------------------------------------
Task 74: Comparative Study and find out why qemu loadvm is so slow and why cpu status is halted.
-----------------------------------------------------------------------------
  (1) find out which functions in QEMU are responsible for loadvm (during loadvm, Ctrl+C and bt)
  After loading the vm, execution is stuck in the following (looks like an infinite loop)
	qemu_run_all_timers () at qemu-timer.c:454 (calls the following)
	qemu_run_timers is visited many many times

	It seems that it takes a very long time to reach the "break" in the following
	386             if (!qemu_timer_expired_ns(ts, current_time)) {
	387                 break;
	388             }

	Verified: this is where it gets stuck, which causes the delay in loading.

It seems that the loop/timestamp combination caused the problem.
SOLVED FINALLY! The problem is that the clock was using the host clock (merging clocks causes trouble).

Add an option ---
 -rtc clock=vm
---------



-----------------------------------------------------------------------------
Task 75: check performance problem
-----------------------------------------------------------------------------
 (1) total hits: 695k, total links hit: 400k, time: 60seconds. Strangely, performance increased
 over 10 times! Due to efforts in 74.
 (2.1) Effort 1. check the timing of addDirectControlDependency. it's 7 sec. 
 (2.2) Effort 2. check the timing of the other parts of while loop. It's 58/60 seconds (most of it)

 So the majority is the main while loop. Set three sets of data and calculate each part.
 (2.3) Effort 3. check the main while loop, split into three sections. t1 and t2 are both 0 sec;
 t3 takes all of the time. Found that link->setReadAccessTime.count() accounts for most of it!
 Even with the use of "find" it's no good. 
  t1: 16, t2: 51, t3: 120.
  tried to replace link->setReadAccessTime.find()!=end() with a simple loop
  t1: 16, t2: 48, t3: 122, maxset size: 12.

 Decision: keep the link->find() solution and remove timing t1 to t3.
 Now the trace slicing time down to 60 seconds.


-----------------------------------------------------------------------------
Task 73: Problem with 0x00403604
-----------------------------------------------------------------------------
Problem: at address 0x0041DE80 the values of the two versions do not match.
	Reason: Instruction 0x004035c1 is set to NOP, but 0x004035f0 depends on it.
Debugging:
	(1) verify the following fact: 0x4035c1 sets up the value of 0x0041de80, and then
	0x004035f0 reads it. Verified. -> value changed to 0x003200d0. DONE.
	(2) check the dump and verify the dependency is set up correctly.
		Note: 0x004035c1 is in the dependency list.
		In the dependency list: 0x00403f38 (once), 0x004035fc (many times - loop), 0x004035c1 (once)
	Observation: the timestamps of 0x004035c1 are all 0x90xxx, but the timestamps of 0x4035fc
	are 0x60xxx. It seems that the older timestamp of 0x004035c1 was KICKED OUT. Need to remove
	the logic that caps the number of timestamps kept.

	After the change is made: no big change on slice, however, time increased to 80seconds.
  SOLVED! 

  Up to now, the SLICING ALGORITHM is completely working! YEAH!


-----------------------------------------------------------------------------
Task 73: Slicing Improvement: ESP/EBP problem
-----------------------------------------------------------------------------
  Simple program: 
         1 int x = f(a);
         2 int y = f(b);
         3 int z = x + 5;
	 4 print(z)
  Backward slicing from z: the 2nd statement should NOT be included at all! But in binary
 slicing, the function call f(a) pushes/pops parameters and CALL/RET modify the ESP/EBP
 registers, which chains the function call f(b) into the dependencies of later instructions. See below:
       
1	push EAX		; ESP -=4
2	call FUNC_F		; ESP -=4
3           PUSH EBP		; ESP -=4
4 	    MOV EBP, ESP	; EBP = ESP
5	    //do something
6           POP EBP		; EBP recovers to OLD EBP, ESP+=4
7           RET			; ESP+=4
8	add ESP, 4		; ESP+=4
8.5     mov addr_var_x, EAX	; x = f(a) done
9       mov ECX, some_var_addr
10      push ECX		; ESP-=4
11	call FUNC_F		; ESP-=4
12          PUSH EBP		; ESP-=4
13	    MOV EBP, ESP	; EBP = ESP, 
14	    //do something
15          POP EBP		; EBP = old EBP, ESP+=4
16          RET			; ESP+=4, depends on 15
17	ADD ESP, 4		; ESP+=4, depends on 16
18	mov EDX, variable_z	; depends on init
19	mov ECX, var_a		; depends on 8.5
20      ADD EDX, ECX		; depends 18, 19
21      mov var_z, EDX		;  depends on 20
22	push var_z		; trace starts, depends on 21 AND 17!!! WHICH INCLUDES 11 to 17
23	call print

At this moment: the slice size is shown below:
# total links visited: 546121, histSwaps: 8, duration 84.000000
Slice Size: 42507, In Range Slize Size: 1411, In Range Trace Size 2284

  Plan: this will need a rather complex multiple-pass algorithm. To accomplish this, we need to record the full timestamp and
dependency information.

-----------------------------------------------------------------------------
Task 74: Re-engineer the traceinstr package and introduce unit testing
-----------------------------------------------------------------------------
  TraceManager - Trace - instrTrace - Instruction - InstrInfoStore - InstrInfo 
Implementation Steps:
  (1) establish folder, include handle.h and handle.cc first, and establish Makefile [2 hrs] DONE.
  (2) remove global variables in handle.cc and introduce several new functions [1 hr]. DONE.
	(a) remove cr3_to_trace and add function isProcessNameToTrace
	(b) remove global variables trace, NAME_TO_TRACE, etc.
  (3) define class declarations. [1 hr ] DONE
  (4) implement handle.cc forward all calls to TraceManager [1 hr]
	(a) isProcessToBeTraced. DONE.
	(b) TraceManager Constructor. DONE. 
	(c) remove init_Tracer. DONE.
	(d) forward call dump(). DONE.
	(e) add save(). DONE.
	(d) revise has_instruction.  DONE
	(f) add_instr and handle_instr. DONE
  (5) implement a dummy isProcessToBeTraced and test it hits add_instr [0.5 hr]. DONE
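
  A minimal sketch of the forwarding layer from steps (2) and (4), assuming handle.cc keeps
  only thin C-linkage entry points and all state lives in a TraceManager object. The
  signatures, the snake_case wrapper names and the set-based storage are hypothetical.

```cpp
#include <set>
#include <string>
#include <utility>

// Owns the state that used to live in global variables (NAME_TO_TRACE, trace, ...).
class TraceManager {
public:
    static TraceManager& instance() {
        static TraceManager tm;  // created on first use
        return tm;
    }
    void setProcessNameToTrace(const std::string& name) { nameToTrace_ = name; }
    bool isProcessToBeTraced(const std::string& name) const { return name == nameToTrace_; }
    void addInstr(unsigned int cr3, unsigned int addr) { seen_.insert({cr3, addr}); }
    bool hasInstruction(unsigned int cr3, unsigned int addr) const {
        return seen_.count({cr3, addr}) != 0;
    }

private:
    TraceManager() = default;
    std::string nameToTrace_;
    std::set<std::pair<unsigned int, unsigned int>> seen_;
};

// C-linkage wrappers: the emulator's C code calls these, and each one only
// forwards to the TraceManager instance.
extern "C" {
int is_process_to_be_traced(const char* name) {
    return TraceManager::instance().isProcessToBeTraced(name) ? 1 : 0;
}
void add_instr(unsigned int cr3, unsigned int addr) {
    TraceManager::instance().addInstr(cr3, addr);
}
int has_instruction(unsigned int cr3, unsigned int addr) {
    return TraceManager::instance().hasInstruction(cr3, addr) ? 1 : 0;
}
}
```

  Keeping the wrappers free of logic means the package can be unit tested through
  TraceManager alone, without the emulator.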


-----------------------------------------------------------------------------
Task 75: experiment with monitor
-----------------------------------------------------------------------------
 (1) find out where the QEMU monitor is. DONE.
     Functions related: do_loadvm(Monitor *mon, const qdict), defined in monitor.c
	the handle_user_command(char *str) command is more user friendly: no need to handle *mon
	and qdict. DONE
 (2) add a new MENU of QEMU called "batch_analyze" to QEMU monitor which calls BatchAnalyzer. DONE
	(a) add an entry in hmp-commands.hx (seems to be help file)
	(b) add a function do_bprocess in monitor.c
 (3) add a "BatchAnalyzer" class. DONE.
	(a) add an empty BatchAnalyzer class, singleton pattern. DONE.
	(b) add a wrapper function in handle.h. DONE.
	(c) in BatchAnalyzer class calls loadvm. 
		There is a global variable called default_mon, call command handle_user_command
		Trouble is with the cross reference (linking).
		Make sure to wrap the functions in extern "C", and put the cross-ref part
		in dummy.cc. DONE.
	(d) finally solved. need to remove the "static" keyword before "handle_user_command"
		in monitor.c! "static" gives the function internal linkage, so other object files cannot link to it.

		Still does not work.  separate BatchAnalyzer.o from libinstr.so. 
	(e) another problem. Both qemu-i386 and qemu-system-i386 are compiled. qemu-system-i386 builds,
	however qemu-i386 does not, because it has no monitor.cc at all. Need to find a way to handle it.
	---> solution: the $(all-obj-y) includes the conditional inclusion of
	monitor.o, so just put BatchAnalyzer.o together with monitor.o (since they cross-reference each other!).
	(f) gdb to find out if it works. Fix: we have to copy the .so files to /usr/lib first.
---- DONE NOW ----


-----------------------------------------------------------------------------
Task 76: Implement BatchAnalyzer class
-----------------------------------------------------------------------------
  Design idea: since this is a batch processor, all jobs should have similar configurations.
  (1) Nail down specification of config.txt. DONE
  (2) create function: loadvm(const char* name) [20 min]. DONE
  (3) create function: copyFileToVM(char *fileBasePath).  DONE.
  (4) Problem: after loadvm, the control does not resume. 
	It seems to be the problem of special chars.  DONE.

-----------------------------------------------------------------------------
Task 77: Figure out waiting for process termination (fast way) 
-----------------------------------------------------------------------------
  (1) check output. FAILED. did not capture anything. can be done later, e.g., capture the OUT instruction of the
	device to find out the chars being printed. Need to figure out later how I/O is processed.
  (2) loadvm problem. DESIGN: could wait until 2 more new processes are discovered by instruction execution mode to know that
	the VM is successfully loaded.
  (3) general design: need to have an event triggering method, when 5 processes are discovered, trigger the next function.
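
  A minimal sketch of the trigger in (3), assuming each executed instruction reports the
  CR3 of its process and a process counts as "discovered" the first time its CR3 is seen.
  ProcessWatcher and its callback are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <utility>

// Fires a callback once, when the number of distinct processes (CR3 values)
// observed reaches the threshold.
class ProcessWatcher {
public:
    ProcessWatcher(std::size_t threshold, std::function<void()> onReached)
        : threshold_(threshold), onReached_(std::move(onReached)) {}

    // Forget everything, e.g. right before a new loadvm.
    void clear() {
        seen_.clear();
        fired_ = false;
    }

    // Called for every executed instruction with the current CR3.
    void onInstruction(unsigned int cr3) {
        seen_.insert(cr3);
        if (!fired_ && seen_.size() >= threshold_) {
            fired_ = true;
            onReached_();
        }
    }

    std::size_t processCount() const { return seen_.size(); }

private:
    std::size_t threshold_;
    std::function<void()> onReached_;
    std::set<unsigned int> seen_;
    bool fired_ = false;
};
```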

-----------------------------------------------------------------------------
Task 78: Create Logger Class and Util class
-----------------------------------------------------------------------------
Design:
  (1) It should support multiple logging modes
  (2) It should display on screen
  (3) It should have a task name 
  (4) There should be one logger for each task
Implementation:
	(1) Util.isFileExist. DONE
	(2) Util.createDir. DONE
	(3) Logger. constructor and test. 20 min.  DONE.
	(4) implement the log and dump functions. DONE.
DONE.

-------------------------------------------------------------------------------
Task 79: Event Trigger Mechanism of BatchAnalyzer class
------------------------------------------------------------------------------
Design:
  (1) load vm, triggers event (more than 3 processes now) - number of processes will be cleared to 0 in QEMU part initially
		right before loadvm
  (2) net use command, triggers event (terminal print - command completed. Seems we can intercept sysenter call and
		check the data , check zwWaitReplyRequestPort!!!)

-------------------------------------------------------------------------------
Task 80: Implement Parse Configuration File of BatchAnalyzer
------------------------------------------------------------------------------
Implementation:
	(1) Create a JOB Class to include common configurations of jobs and specifics. [20 min] DONE.
	(2) implement BatchAnalyzer parseConfig. [220 min]
		(1) file operations. DONE.
		(2) parse line. DONE
		(3) Util string related functions. split etc. DONE.
		(4) Util string comparison. DONE. 
		(4) Implement the parse function . DONE
	(3) in do_jobs, get the folder one by one, create a TraceManager class and carry out the job [30 min]
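
	A minimal sketch of the parsing helpers in step (2). The notes do not give the layout of
	config.txt, so this ASSUMES one KEY=VALUE pair per line with '#' comment lines; the key
	JOB_TYPE used in the usage note is hypothetical too.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split s on a single-character delimiter.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::string part;
    std::istringstream in(s);
    while (std::getline(in, part, delim)) parts.push_back(part);
    return parts;
}

// Strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Assumed format: KEY=VALUE per line; blank lines, '#' comments and lines
// without '=' are skipped.
std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> cfg;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        cfg[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}
```

	For example, a file containing "JOB_TYPE = GEN_RAW_TRACE" and "LOADVM_TIMEOUT=30" would
	yield a map with those two keys.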

-------------------------------------------------------------------------------
Task 81: Generate Jobs and Gen Raw Trace
------------------------------------------------------------------------------
Idea: continue the implementation of do_jobs(). Retrieve the job one by one.
  (0) Change config.txt to type GEN_RAW_TRACE [5 min] DONE.
  (1) in do_jobs, for each directory, get the sub_directory name, call genJOB(sub_dir)
which returns a JOB.  Temporarily in the test folder add a function to test do_jobs,
but need to delete it later. [120 min] DONE.
  (1.5) introduce Util::error_exit. 5 min. DONE.
  (2) implement genJOB(sub_dir), depending on the job category, read the CONFIG file
if necessary. Set up the job_base_path (this is an instance property). Based on the
category of jobs, read the config file correspondingly.
[15 min]  DONE.
  (3) in do_jobs call execJOB(job). this is basically a switch case that calls
the corresponding exec_TYPE_JOB function. [10 min] DONE.
  (4) implement execGenRawTrace(job). It creates a TraceManager (init constructor
with job instance). [15 min]
	(a) add TraceManager constructor. [8 min] creates trace instances for the processes
		to monitor respectively.
	(b) furnish Job initializer to read in all details.
		(b.1) add SMB folder in Job [5 min]. DONE
		(b.2) add Util::getRelativePath. DONE.

-------------------------------------------------------------------------------
Task 82: Implement the event capturing mechanism 
------------------------------------------------------------------------------
Implementation Sequence:
	(1) introduce struct tracer_event and define the first three categories. [20 min] DONE.
	(2) add function send_event() to handle.h and handle.cc [10 min] DONE.
	(3) implement the vm loaded event [15 min] DONE.
	(4) implement clear_process_info and add it to loadvm[15 min]. DONE.
	(5) implement the process terminate event. [45 min]
		(a) find how process termination is discovered. [15 min]
			FOUND IT. it's in seg_helper.c, helper sys_enter.
		(b) place in sys_enter[15 min]. DONE
			NOTE that EAX can NOT be read directly from env, there is a global variable EAX_BEFORE_SYSENTER
			set in translate.c. DONE.
		(c) test it [15 min]. DONE
		
	(6) implement the print string event. [1.5 hr]
		(a) use XP to compile many different printf statements and putchar and cout [15 min]. DONE
		(b) use IMM to find out the corresponding syscall and the EAX number and where are the data [45 min]
				For putchar: EAX=0xc8, DATA LOCATED AT   *(EDX+0xc)+0x30, it's a string terminated with 0
				for printf:  Yes, it is also *(EDX+0xc)+0x30, however, it's not terminated with 0
				For cout, it's printing character by character.

			Next question is: where is the data length? Need to trace through printf.
						It's located at *(EDX+0xC)+0x84

			So data is located at: *(EDX+0xC)+0x30
				Data size is located at: *(EDX+0xC)+0x84
				
		(c) figure out how EAX_BEFORE_SYSENTER is handled and maybe get the EBX_BEFORE_SYSENTER as well [30min]
			(1) similarly create EDX_BEFORE_SYSENTER in CPUx86State. Copy it in the
		gen_save_regs function in translate.c [10 min] DONE.
			(2) verify the print function is 0xc8 in EAX. [15 min]. DONE.
				verified needs both EAX to be 0xc8 and ECX to be 0x0098007c
	
		(d) implementation in sys_enter in seg_helper.c [15 min]
			(1) create the conditional capture in helper_sysenter [5 min] DONE.
			(2) declare a function copy_string(char *buf, int &buf_content_size, ulong start_addr, int size) [5 min] DONE
			(3) add printf_buf, print_buf_size, and then  call copy_string [5 min] DONE
			(3.5) test if copy_string is called and if the data is retrieved right. DONE.
				Found that actually there are two modes. 8-bit char and 16-bit wide char.
			(4) implement copy_string, whenever encounter a "\n", send the event and
				reset the buffer. [20 min] DONE
			(5) test copy_string in the test folder.
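
	A minimal sketch of the capture logic in (b) and (d), run against a fake guest memory so it
	can be tested outside the emulator. It ASSUMES 32-bit little-endian fields, the 8-bit char
	mode only, and that the size at *(EDX+0xC)+0x84 is a 32-bit byte count. GuestMemory,
	copyString's signature and the callback are hypothetical; only the offsets come from the notes.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical guest memory model: a flat byte array indexed by guest address.
struct GuestMemory {
    std::vector<std::uint8_t> bytes;

    // Read a 32-bit little-endian value; throws if the address is out of range.
    std::uint32_t read32(std::uint32_t addr) const {
        std::uint32_t v = 0;
        for (int i = 3; i >= 0; --i) {
            v = (v << 8) | bytes.at(addr + static_cast<std::uint32_t>(i));
        }
        return v;
    }
};

// Offsets observed in the notes.
const std::uint32_t kBlockPtrOffset = 0x0C;  // block = *(EDX + 0xC)
const std::uint32_t kDataOffset = 0x30;      // data starts at block + 0x30
const std::uint32_t kSizeOffset = 0x84;      // size stored at block + 0x84

// Append the printed bytes to lineBuf; on every '\n' send the buffered line as
// an event and reset the buffer. Partial lines stay buffered for the next call.
void copyString(const GuestMemory& mem, std::uint32_t edx, std::string& lineBuf,
                const std::function<void(const std::string&)>& sendEvent) {
    const std::uint32_t block = mem.read32(edx + kBlockPtrOffset);
    const std::uint32_t size = mem.read32(block + kSizeOffset);
    for (std::uint32_t i = 0; i < size; ++i) {
        const char c = static_cast<char>(mem.bytes.at(block + kDataOffset + i));
        if (c == '\n') {
            sendEvent(lineBuf);
            lineBuf.clear();
        } else {
            lineBuf.push_back(c);
        }
    }
}
```

	Reading the explicit size instead of scanning for a terminating 0 matters because, as noted
	in (b), the printf data is not 0-terminated.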

	
-------------------------------------------------------------------------------
Task 83: Implement the event handling mechanism 
------------------------------------------------------------------------------
	Idea: BatchAnalyzer maintains a vector of tasks. Each task is triggered by
	a certain event. Tasks are executed one by one and can be timed out. For example,
	in executing a job, the tasks are:
		(1) loadvm (trigger: none). complete status: timeout or loadvm event received
		(2) execCommand: "net use y: \\10.0.2.15\smbuser". complete status: 
			"net use success message", or time out, or "failed"
		(3) execCommand: "copy y:\b1.exe .\" complete status: copy completed or time out
			or failed
		(4) execCommand: "b1.exe" complete status (process terminate captured or time out)
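
	A minimal sketch of this idea, assuming events arrive one at a time on a single thread and
	a timeout is delivered as just another event. It swaps the planned Condition class
	hierarchy for a std::function per task to stay short; Event, EventType and TaskRunner
	are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

enum class Status { Undecided, Fail, Success };
enum class EventType { VmLoaded, ProcessTerminated, PrintString, Timeout };

struct Event {
    EventType type;
    std::string text;  // printed line for PrintString events, otherwise unused
};

// A task is complete when its condition maps an incoming event to Success or Fail.
struct Task {
    std::string name;
    std::function<Status(const Event&)> condition;
};

class TaskRunner {
public:
    void addTask(Task t) { tasks_.push_back(std::move(t)); }

    // Feed one event to the current task and return the status of the whole job:
    // Undecided while tasks remain, Success when all are done, Fail on the first
    // failed or timed-out task.
    Status handleEvent(const Event& evt) {
        if (failed_) return Status::Fail;
        if (tasks_.empty()) return Status::Success;
        const Status s = (evt.type == EventType::Timeout)
                             ? Status::Fail
                             : tasks_.front().condition(evt);
        if (s == Status::Fail) {
            failed_ = true;
            tasks_.clear();
            return Status::Fail;
        }
        if (s == Status::Success) tasks_.pop_front();
        return tasks_.empty() ? Status::Success : Status::Undecided;
    }

    std::size_t remaining() const { return tasks_.size(); }

private:
    std::deque<Task> tasks_;
    bool failed_ = false;
};
```

	Because the next task is started from inside the event handler, no worker thread or
	semaphore is needed in this sketch.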

  Implementation:
	(1) define class Condition. [15 min]. DONE.
		getStatus()
		setStatus(evt)
		three status available: UNDECIDED, FAIL, SUCCESS
	(2) define derived class ConditionOnLoadVM [60 min]. DONE
		The tricky linkage problem again. 
	(3) define derived class ConditionOnProcessTerminate [15 min]. DONE
		add an attribute on CR3.
	(4) define derived class ConditionOnPrintString [15 min]. DONE
	(5) define class Task, which has a success condition [30 min]. DONE
		Task(timeout)
		fire(); //calls do_job and starts timer thread
		do_job()
	(6) define derived class loadVMTask [15 min]. DONE
		(a) define loadVM Task. DONE
		(b) add function addTask(). DONE
		(c) add configurations: LOADVM_TIMEOUT, NETUSE_TIMEOUT, COPY_TIMEOUT, TASK_TIMEOUT. DONE
		(d) get rid of the vtable problem. DONE.
			It's caused by pure virtual function. However, the compiler complains about 
			constructor, which is silly.
		(e) add function startExecuteAllTasks(). DONE.
		(f) add semaphore. DONE.
		(g) add execNextJob(). DONE
		(h) add waitForAllTasksComplete(). DONE
		(i) implement the fire() calls two methods, do_job and set_time_out. DONE.
		(j) finish the timeout event handler. DONE. 
		(k) finish the loadVMTask::do_job. DONE 
		(l) add send_event to BatchAnalyzer so that event could trigger execNextJob.
			Logic: find the current task, set the condition. check the condition of the
				task, if it's not UNDECIDED, then either move on to the next task, or fail the
				entire job.
			Implementation Steps. 
				(1) add a Logger instance to the Job class.	  [20 min] DONE.
					(a) add a JOB counter
				(2) add various log statements to each step of execute a job. [15 min]. DONE
					(a) add Logger instance to task, do not delete it as this is the logger
						for job.
				(3) implement the logic of send_event (add a function to BatchAnalyzer) [20 min]

	(7) add handle_timeout_event() to BatchAnalyzer [30 min]
			(a) add current job to BatchAnalyzer DONE.
			(b) add Util::EvtToString. DONE
			(c) add Job::toString(). DONE
			(d) add genTasksForGenRawTrace(Job *job). DONE
			(e) handle time out event. DONE
			(f) handle other events. 
				(1) add checkEmptyList(). DONE
				(2) update execNextTask.

	(8) test the framework up to now
		(1) test logging. Create two jobs and let it work. [25 min]. DONE
			Problem: it kills directory. FIXED.
		(2) fix the abort problem.
			Problem: when qemu tries to do_all_vm stop, it tries to stop all vcpu, which calls
			pthread_cond_wait, that needs the assumption the global mutex is owned by the 
			thread. However, the task itself is a new thread, which does not own the global
			mutex.
			Fix idea: when executing a task, don't create a new thread, just call the do_job
				directly. Except the load_vm task, all other tasks are asynchronous. 
				The do_job of the task will be finished immediately (then it's done). 
				When the event comes in, the do_job of the next task will be called.
				Similarly, execNextJob will be called.  
					So there is no need to use semaphore at all.
			-- 8:00am 08/10
			(a) fix logger of BatchAnalyzer. DONE
			(b) add logger message to execNewJob. DONE 
			(c) remove extra thread in executing task.DONE
			(d) implement the handle_event.  DONE.
			--- 9:00am 08/10.
			(e) implement TraceManager destructor. DONE.
			(f) fix handle_event log. DONE.
			(g) fix the BatchAnalyzer log problem. DONE.
			(h) fix execnextJob at the end of job list problem. DONE
			(i) fix the display level problem. DONE
		
	(9) define derived class netuseCommandTask
			--- 10:00am 08/10/2013
			(a) declare conditionOnMsg. already defined. DONE.
			(b) declare class and its constructor.DONE
			(c) add Task::toString(). DONE.
			(d) add Task in job. DONE.
			(e) add config samba_ip. DONE
			(f) add NETUSE timeout. DONE
			(g) fix bug of execNextTask (should not remove the current task). pop should
				be done in handle_event. DONE.
			(h) fix the swapping sequence of fail and success message. DONE.
			(i) fix the problem of deleting task twice. DONE.

			--- 11:00am 08/10
			(j) find the fail string "was not". done. However, it could take too long. Need to
					use the timeout to stop the task. DONE.
			(k) fix the check of timeout, and change timeout value. DONE.
			(j) fix the timeout issue, no event sent. DONE.
			(l) now another problem when timeout: it crashes on segmentation fault. FIXED. this is
				because the task has already been deleted. DONE
			(m) another similar problem when a thread comes back, logger becomes invalid. Remove
				logger command for message. DONE
			(k) fix the qemu_cond_wait again. The problem occurs when the previous command timedout,
				check what if it does not time out. seems it does not occur again

		--- TO DO ------------
			(o) check why the net use command times out. The system does not have an 
				abort when net use is successful. However, it aborts with the error on
				sema_condition wait when it tries to stop all VM. This might be because 
				the system is doing I/O and it may have locked certain devices. DONE.
			(i) figure out the handle_user_command and see if there is anything with the lock.
				Found that handle_user_command is done in main_loop in vl.c:2007 (maybe that's the
				safe place to call it).  DONE.
			(j) VERIFICATION: disable network run net use, and then loadvm in monitor and see
				if the same error could occur. DONE. Verified. 
			(j) in vl.c add two functions: (1) append_user_command, (2) execute_user_command. Keep
				a buffer of char * cmds[] to store commands. Test it in Util.cc first. 
				This should be a big array maintained by two indexes.
				(a) make the framework go through. 
 				COULD STILL NOT GET THROUGH AFTER MAKE CLEAN. Just leave
				one target dir in config-host.mak (and remove all other objectives!!!!!)
			#################################################################################
			############# !!! KEEP A COPY OF config-host.mak IN config-host.mak.cp !!! #############
			#################################################################################
				(b) add a queue list to BatchAnalyzer. DONE.
				(c) call it in main_loop.c:407. still problems with linking. Add BatchAnalyzer.o
						in Makefile.objs. !!!!####????? still problems.
				(d) replace all handle_user_commands in BatchAnalyzer with addCommand. DONE
				(e) fix the compiling error. linker problem. Still does not work. Linker traces back
					to monitor.o which does not exist yet. check how actually it is linked.
					*** make it an .so file does not work. DONE, not solved
				(g) check if no calls to BatchAnalyzer.o, how it gets compiled. 
					#0  handle_user_command (mon=0x28db2680, cmdline=0x28db2ac0 "")
    at /home/csc288/qemu/qemu-1.4.0/monitor.c:3963
#1  0x082c08f5 in monitor_command_cb (mon=0x28db2680, cmdline=0x28db2ac0 "", opaque=0x0)
    at /home/csc288/qemu/qemu-1.4.0/monitor.c:4602
#2  0x081f0120 in readline_handle_byte (rs=0x28db2ac0, ch=13) at readline.c:373
#3  0x082c0849 in monitor_read (opaque=0x28db2680, buf=0xbfffe31c "\r\005\235\267", size=1)
    at /home/csc288/qemu/qemu-1.4.0/monitor.c:4588
#4  0x081d7776 in qemu_chr_be_write (s=0x28c38550, buf=0xbfffe31c "\r\005\235\267", len=1)
    at qemu-char.c:164
#5  0x081d88a7 in fd_chr_read (opaque=0x28c38550) at qemu-char.c:588
#6  0x081b1616 in qemu_iohandler_poll (readfds=0x89ea1e0 <rfds>, writefds=0x89ea260 <wfds>, 
    xfds=0x89ea2e0 <xfds>, ret=1) at iohandler.c:124
#7  0x081b22ee in main_loop_wait (nonblocking=0) at main-loop.c:422

			->> main-loop.o --> iohandler.o --> qemu-char.o (Makefile.objs)
			--> at line 164 of qemu-char.c, it calls sHandlerOpaque (so monitor's function
				is passed as a function pointer at dynamic time).
			
				(h) now the dependency relation is as below
				monitor.o <---> BatchAnalyzer.o -->libinstr
				monitor.o ---> libinstr
				main-loop.o ---> BatchAnalyzer.o (explicit). THIS MAKES main-loop.o
					depend on monitor.o, which it should NOT depend on (it causes a loop).

				Solution:
					(1)	 declare a void (*f_cmd_handler)(char *) function pointer 
						in main-loop.c, and call it. DONE.
					(2)  in the BatchAnalyzer constructor, reset the function pointer. FIXED.
			Note: network has to be set up, otherwise loadvm is not successful.

			(j) fix the broken logger issue. It seems that the problem disappeared.
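
			Sketch of the decoupling in (h) together with the two-index command buffer from (j).
			This is a minimal illustration, not the code in the tree: CommandQueue,
			recording_handler and BatchAnalyzerSketch are hypothetical names, and the handler
			takes const char * here for convenience. The only point is that the main-loop side
			holds a plain function pointer, so it needs no link-time reference to monitor.o.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Main-loop side: a plain function pointer, filled in at run time.
typedef void (*cmd_handler_t)(const char *);
static cmd_handler_t f_cmd_handler = nullptr;

// Fixed-size ring buffer of pending commands, maintained by two indexes.
class CommandQueue {
public:
    explicit CommandQueue(std::size_t capacity)
        : slots_(capacity), head_(0), tail_(0), count_(0) {
        assert(capacity > 0);
    }

    // Role of append_user_command: returns false when the buffer is full.
    bool append(const std::string &cmd) {
        if (count_ == slots_.size()) return false;
        slots_[tail_] = cmd;
        tail_ = (tail_ + 1) % slots_.size();
        ++count_;
        return true;
    }

    // Role of execute_user_command: called from the main loop, runs at most
    // one queued command through the installed handler.
    bool executeOne() {
        if (count_ == 0 || f_cmd_handler == nullptr) return false;
        std::string cmd = slots_[head_];
        head_ = (head_ + 1) % slots_.size();
        --count_;
        f_cmd_handler(cmd.c_str());
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::vector<std::string> slots_;
    std::size_t head_, tail_, count_;
};

// Hypothetical stand-in for the monitor-side handler: records what it was given.
static std::vector<std::string> g_executed;
static void recording_handler(const char *cmd) { g_executed.push_back(cmd); }

// Hypothetical stand-in for the BatchAnalyzer constructor installing the handler.
struct BatchAnalyzerSketch {
    BatchAnalyzerSketch() { f_cmd_handler = recording_handler; }
};
```

			Commands queued before the handler is installed stay in the buffer and run once
			a BatchAnalyzerSketch has been constructed.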
	
 
	(10) define task taskCopy.		DONE.
	(11) task execTraceCommand. DONE.
		(a) define a similar class. 10 min. DONE.
		(b) call the creator of class. 30 min.
			(b1) framework. 10 min. DONE. 
			(b2) create a new TraceManager for each job. DONE.
			(b2) addProgramToTrace. 10 min DONE
		(c) test. 15 min
			(b1) refine message log for task complete. DONE
			(b2) test completed.
	(12) test and fix problem: 2ND JOB not able to capture load vm message. DONE
		(a) debug: check where clear_process_info is called and fix it. DONE.
		(b) test. Now both processes are captured successfully.DONE.

 

-------------------------------------------------------------------------------
Task 83: Raw Trace Trigger Mechanism
------------------------------------------------------------------------------
  Idea: the helper_trace2 function will capture each instruction, it then sends to
	TraceManager isProcessToBeTraced to set up the process status. Then the handle_instr()
	is called for each instruction for process to be traced. Also add_instr() might be
	called for the 1st time an instruction is encountered.

  Step 1. make sure isProcessToBeTraced is handled properly. When a new process to be
		added, in TraceManager keeps two mappings, from process name to cr3 and from cr3 to
		Trace. When a new process comes in, update the record.
		(1) check isProcesssToBeTraced is called. 10 min. DONE.
		(2) add cr3ToTrace in TraceManager and create Trace class.
			(a) add Trace constructor. DONE.
			(b) addProcessInfo(cr3, procname);
		(3) test if process information is added and test logger for Trace.
			set BP on isProcessToBeTraced, Trace::Trace
			Problem: the system is not able to capture the process (missed some of the processes).
			Debugging 1: (1) BP on taskAnalyze, (2) bp on helper_trace2. Found that the discovery of
				new CR3 is actually never hit!
			Debugging 2: (1) BP on taskAnalyze::do_job, (2) bp on helper_trace2, find all process
				ids and set condition bp to discover new ID at the entry. Attempt: clear arrCR3.
				failed.
			Debugging 3: (1) BP on taskAnalyze, (2) bp on helper_trace2 and (3) bp on helper_sysenter
				(seg_helper.cc:2315). Still could not catch anything. Guess: maybe the print msg
				discharged b1.exe too early. On the copy command, use c:\ to discharge. VERIFIED. Made a
				temporary change to the exit condition of taskAnalyze; later will need to trace on cr3.

		(4) test: set BP on Trace::Trace. Fix log path problem. DONE.
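
		Sketch of the two mappings from Step 1, assuming isProcessToBeTraced receives the
		cr3 and the process name (the real signature may differ). TraceManagerSketch,
		TraceSketch and traceFor are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>

struct TraceSketch {
    uint32_t cr3;
    std::string procName;
    TraceSketch(uint32_t c, const std::string &n) : cr3(c), procName(n) {}
};

class TraceManagerSketch {
public:
    // Register a program name that should be traced once it shows up.
    void addProgramToTrace(const std::string &name) { wanted_.insert(name); }

    // Called for every observed (cr3, name) pair. A cr3 that is already
    // known is answered from the cr3 map without a name comparison.
    bool isProcessToBeTraced(uint32_t cr3, const std::string &procName) {
        if (cr3ToTrace_.count(cr3)) return true;
        if (!wanted_.count(procName)) return false;
        addProcessInfo(cr3, procName);
        return true;
    }

    TraceSketch *traceFor(uint32_t cr3) {
        auto it = cr3ToTrace_.find(cr3);
        return it == cr3ToTrace_.end() ? nullptr : it->second.get();
    }

private:
    // Update both records when a new process comes in.
    void addProcessInfo(uint32_t cr3, const std::string &procName) {
        assert(cr3ToTrace_.count(cr3) == 0);
        nameToCr3_[procName] = cr3;
        cr3ToTrace_[cr3] = std::unique_ptr<TraceSketch>(new TraceSketch(cr3, procName));
    }

    std::unordered_set<std::string> wanted_;
    std::unordered_map<std::string, uint32_t> nameToCr3_;
    std::unordered_map<uint32_t, std::unique_ptr<TraceSketch>> cr3ToTrace_;
};
```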

  Step 2. Design the Trace class. The Trace class first provides a number of methods for updating
		instructions and memory references. Internally, it keeps the following components:
		(1) an instruction store (in RAM) and a map from address to instruction store
		(2) history of trace (timestamp and the address of the instruction being executed 
				and the memory address being referenced)
		Later in full trace, we'll establish the map between registers and mems.
		All stores will be supported by an in-memory cache that writes the contents to file 
		from time to time.

  Step 3. Implement a Cache class in support of components of Trace. 
		(a) Definition of class. 40 min. DONE.
		(b) constructor. 20 min. DONE.
				(b1) set all properties
				(b2) create files
		(c) destructor temporary solution. 5 min. DONE.
		(d) appendRecord(char *bytes, int size). 20 min. DONE.
		(e) saveBlockToDisk. 20 min . DONE.
		(f) debug saveBlockToDisk. 40 min. DONE. 
			create cache of 5 records.
			append 6 records and see how it is written.
		(g) implement saveToDisk. 15 min. DONE.
			(a) save current block
			(b) save to index.
			(c) call it in destructor.
		(h) debug saveToDisk. 15 min. 
		(i) implement loadCache(char *filePath). DONE.
		(j) debug loadCache. 
		(k) implement loadBlock(long long int id), assumption 	Cache has been loaded. (20 min). DONE.
		(l) debug loadBlock. 60 min (stuck on a stupid memory overwriting, the report
			of segmentation fault does not yield the accurate location).
		(m) implement retrieveRecord(long long int id). 20 min. DONE
		(n) debug retrieveRecord. 20 min. DONE.
		(o) simple test. 15 min. DONE.
		(j) random test. 20 min. DONE.
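
		Sketch of the Cache behaviour exercised in (f): records are appended to an in-memory
		block, and a full block is written out. To keep the example self-contained, a
		std::string named disk_ stands in for the file, and the index is kept in memory only.
		CacheSketch and its counters are hypothetical; only the names appendRecord,
		saveBlockToDisk and retrieveRecord come from the list above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class CacheSketch {
public:
    explicit CacheSketch(std::size_t recordsPerBlock)
        : recordsPerBlock_(recordsPerBlock), flushedRecords_(0) {
        assert(recordsPerBlock > 0);
    }

    // Append one record and return its id (ids start at 0).
    long long appendRecord(const char *bytes, int size) {
        assert(bytes != nullptr && size >= 0);
        Entry e = {disk_.size() + block_.size(), static_cast<std::size_t>(size)};
        index_.push_back(e);
        block_.append(bytes, e.size);
        if (pendingRecordCount() == recordsPerBlock_) saveBlockToDisk();
        return static_cast<long long>(index_.size()) - 1;
    }

    // Move the current block to the backing store and start a new block.
    void saveBlockToDisk() {
        disk_ += block_;
        block_.clear();
        flushedRecords_ = index_.size();
    }

    // Fetch a record by id from the backing store or from the current block.
    std::string retrieveRecord(long long id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < index_.size());
        const Entry &e = index_[static_cast<std::size_t>(id)];
        if (e.offset >= disk_.size())
            return block_.substr(e.offset - disk_.size(), e.size);
        return disk_.substr(e.offset, e.size);
    }

    std::size_t flushedRecordCount() const { return flushedRecords_; }
    std::size_t pendingRecordCount() const { return index_.size() - flushedRecords_; }

private:
    struct Entry {
        std::size_t offset;  // byte offset in the logical record stream
        std::size_t size;    // record length in bytes
    };
    std::size_t recordsPerBlock_;
    std::size_t flushedRecords_;
    std::vector<Entry> index_;
    std::string block_;  // records not yet written out
    std::string disk_;   // stand-in for the file contents
};
```

		With a cache of 5 records, appending 6 leaves 5 records flushed and 1 pending,
		which is the situation the debug step in (f) looks at.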

  Step 4. Implement the has_instr() function. DONE and tested.
		declare an internal structure instr_quick_info(unsigned int addr, int char), 
	declare a hash_map on it.

		(e) unit testing. 20 min

  Step 5. Implement the add_instr(). DONE.
		There is going to be a global InstrInfo instance and a cacheInstrStore.
	Global instance writes into cacheInstrStore.
		(1) add the InstrInfo class. 15 min. DONE.
		(2) add the cacheInstrStore and InstrInfo instance. 10 min. DONE. 
		(2) add load_instr method. 10 min. DONE
		(3) add writeToCache method. 15 min. DONE.
		(4) add loadFromCache method. 10 min. DONE.
		(5) testing writeToCache and loadFromCache. 20 min DONE.
		(6) implement add_instr. 15 min. DONE.
		(7) simple test of add_instr. 15 min. ---> found problems with hash.
		
  Step 6. Implement the handle_instr(), handle_mem_read, and handle_memwrite
	Idea: add an InstrExecRecord, includes addr, (timestamp) is implicit, memoryReadRange,
			memWriteRange. Backed up by Cache.
		(1) add InstrExecRecord definition. 20 min. DONE.
		(2) add InstrExecRecord instance and the cache instance. 10 min. DONE.
		(3) implement InstrExecRecord.exec(addr). 10 min. DONE.

		08/16/2013
		8:45am
		(4) implement InstrExecRecord.updateMemRead(addr). 15 min. DONE.
		(5) implement InstrExecRecord.updateMemWrite(addr). 10 min. DONE
		(6) unit test updateMemRead, updateMemWrite. 20 min. DONE.
		(7) unit test update_instr(). 15 min. DONE.

		10:00AM
		(8) implement InstrExecRecord.appendRecordToCache(). 15 min. DONE
		(9) implement InstrExecRecord.loadRecordFromCache(id i). 15 min. DONE
		(10) simple debug of append and loadrecord. 15 min. DONE.
		(11) unit test serialization. 20 min.

		11:00AM
		(10) hook up with qemu. 30 min. DONE.
		(11) simple debug on trace. 180 min
			(a) has_instr. DONE.
			(b) handle_instr. DONE.
			(c) add_instr. DONE.
			(d) handle_mem_read. . NETUSE slowed down to about 100 seconds. over 10 times slower.
				Problems of reading memory first.
				(d.1. 15 min) 
					add a simple hash trick to TraceManager::cr3 get and see if it's working to
					improve speed. 15 min --> shortened to 53 seconds.
					if without at all, it's 17 seconds.
				(d.2 15 min) change TraceManager::getInstance to inline. Improves to 27 seconds
				(d.3 30 min) now move TraceManager::handle_mem_read and handle_mem_write
					both back to inline and header file. Still 27 seconds. does not improve a lot
				(d.4 15 min) move Trace::handle_mem_read to inline header file as well.
					improved to 24 sec.
				(d.5 5 min) use -O2 flag. best performance 22sec. no big difference.
				(d.6 15 min) handle_mem_read not showing up. FIXED.
				(d.7 15 min) problem handle_mem_read is hit first. change to all logs.
				
			(e) handle_mem_write. DONE. 
				(e.1 15 min) solve non-consecutive problem. bug solved.
				8:15AM 08/17
				(e.2 30 min) check other non-consecutive issues. Found the problem: some instructions need to read
				two pairs of memory slots (such as the COMPARE instruction). DONE. Solution:
					[1] in InstructionExecRecorder, adds two sets of memReadStartAddr2 and memWriteStartAddr2 [8 min] DONE
					[2] update the logic [10 min] DONE
					[3] remove the error detection logic for startAddr<endAddr [5 min] . DONE
					[4] test [8 min]
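
				Sketch of the two-range idea in (e.2), under the assumption that an access
				adjacent to an existing range extends it and a non-adjacent access opens the
				second range. MemRange and MemAccessSketch are hypothetical names; the real
				recorder keeps separate read and write fields.

```cpp
#include <cassert>
#include <cstdint>

struct MemRange {
    uint32_t start = 0;
    uint32_t end = 0;   // one past the last byte
    bool used = false;

    // Accept [addr, addr + size) if the range is empty or the access touches it.
    bool tryAdd(uint32_t addr, uint32_t size) {
        if (!used) { start = addr; end = addr + size; used = true; return true; }
        if (addr == end) { end += size; return true; }            // grows upward
        if (addr + size == start) { start = addr; return true; }  // grows downward
        return false;
    }
};

struct MemAccessSketch {
    MemRange first;
    MemRange second;

    // Returns false only when the access fits neither of the two ranges.
    bool update(uint32_t addr, uint32_t size) {
        assert(size > 0);
        return first.tryAdd(addr, size) || second.tryAdd(addr, size);
    }
};
```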

		9:45AM 08/17/2013
		(12) add InstrExecRecorder dump. 53  min
			(a) add a config item. DUMP_ENABLED 1/0. [10 min] DONE.
			(b) add InstrInfo dump function. [15 min]. DONE.
			(c) add InstrExecRecorder dump function. [30 min]. DONE
				(c.1) add a function to Cache to get last id. DONE.
				(c.2) add the trace to the InstrExecRecorder, and add a number of functions for reloading instrProcessor. [20 min]
			(c) call InstrInfo dump function in the appendToCache [8 min]
			(d) debug [10 min]
				set bp on InstrExecRecorder::dump DONE minor adjust.
			(e) test, using winxp image data [10 min]
		(13) integrate testing. 30 min. DONE.
			Logging speed is fast. less than 1 second.
		DONE- 11:45 08/17/2013

		(14) update the process terminate mechanism for task (change to process terminate task). (90 min)
			12:00PM 08/17/2013
			(a) change ConditionOnProcessTerminate, change the cr3 to a set of cr3 ints. [5 min]. DONE
			(b) add one function for add a CR3 [8 min]. DONE
			(c) change the setStatus to remove one cr3 from the set. if set is empty, set the status to satisfied. [8 min]. DONE.
			(d) create a new type of event in event.h (new_process_to_trace, cr3) [8 min]. DONE
			(e) change TraceManager::isProcessToBeTraced, call send_event [8 min]. DONE
			12:30pm.
			1:00PM
			(f) change Batchanalyzer::handle_event and add the handling of new_process_to_trace, modify the current top
				task.[15 min]. DONE
			(g) debug and trace if terminate process event can terminate task [30 min]
				set BP on the above functions
			(h) integrate testing [15 min]. DONE.
			1:26PM.
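
			Sketch of steps (a) to (c): the condition holds a set of cr3 values and becomes
			satisfied when the last one is removed. The class name and isSatisfied are
			hypothetical, and treating cr3 == 0 as invalid is an assumption made only for this
			example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

class ConditionOnProcessTerminateSketch {
public:
    // (b) add one more process (identified by its cr3) to wait for.
    void addCR3(uint32_t cr3) {
        assert(cr3 != 0);  // assumption: 0 is never a valid cr3 here
        pending_.insert(cr3);
        satisfied_ = false;
    }

    // (c) a process terminated: drop its cr3; satisfied once the set is empty.
    void setStatus(uint32_t cr3) {
        if (pending_.erase(cr3) > 0 && pending_.empty()) satisfied_ = true;
    }

    bool isSatisfied() const { return satisfied_; }

private:
    std::set<uint32_t> pending_;
    bool satisfied_ = false;
};
```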

		2:00pm 08/17/2013
		(15) solve the issue when a job is completed.
			(a) implement the destructor - delete the trace manager and set it to NULL [15 min]		. DONE
			(b) call destroyCurrentTraceManager in BatchAnalyzer::clearTasks() [5 min]. DONE
			(c) debug [15 min] . 
				(c.1) needs to fix the destructor of Cache. DONE.
				(c.2) check why history is not deleted. Strange the destructor is never being called.
		done: recorded about 650k instructions and the trace size is about 6.5MB (for about 1 second of execution).
		So a 4GB max file size could support around 630 seconds (4096MB / 6.5MB per second, about 10 minutes) of running.

		DONE; 3:10/PM 08/17/2013

		
-------------------------------------------------------------------------------
Task 84: Design Slicing Algorithm 
------------------------------------------------------------------------------
	(1) needs to add job specific config file. Specifies the slice starting point.
	(2) algorithm: for instrStore and execHistory, use the read only mode. Then create a copy of fullInstrStore and fullExecHistory, also supported by Cache. 
			1st forward processing and populate the fullInstrStore and fullExecHistory, establish the dependency between
				time stamps. So here we need to keep memory read/write cache and register read/write cache.
				Instruction needs to keep a copy of registers being read and written.
			2nd trace back from the starting point and trace backward (seems no need to keep all the edge information). 
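
			Sketch of the two passes, under simplifying assumptions: a record is just the sets
			of registers and memory addresses it reads and writes, and the timestamp is the
			index in the trace. Unlike the note above, this version does store the dependency
			edges, for clarity. ExecRecordSketch, buildDependencies and sliceBackward are
			hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

struct ExecRecordSketch {
    std::vector<int> regsRead, regsWritten;
    std::vector<uint32_t> memRead, memWritten;
    std::vector<std::size_t> deps;  // timestamps this record depends on
};

// 1st pass (forward): link every read to the most recent writer of the
// same register or memory address, using two "last writer" caches.
void buildDependencies(std::vector<ExecRecordSketch> &trace) {
    std::unordered_map<int, std::size_t> lastRegWriter;
    std::unordered_map<uint32_t, std::size_t> lastMemWriter;
    for (std::size_t ts = 0; ts < trace.size(); ++ts) {
        ExecRecordSketch &r = trace[ts];
        r.deps.clear();
        for (int reg : r.regsRead) {
            auto it = lastRegWriter.find(reg);
            if (it != lastRegWriter.end()) r.deps.push_back(it->second);
        }
        for (uint32_t addr : r.memRead) {
            auto it = lastMemWriter.find(addr);
            if (it != lastMemWriter.end()) r.deps.push_back(it->second);
        }
        // Writes are applied after reads so an instruction never depends on itself.
        for (int reg : r.regsWritten) lastRegWriter[reg] = ts;
        for (uint32_t addr : r.memWritten) lastMemWriter[addr] = ts;
    }
}

// 2nd pass (backward): collect everything reachable from the starting point.
std::set<std::size_t> sliceBackward(const std::vector<ExecRecordSketch> &trace,
                                    std::size_t start) {
    assert(start < trace.size());
    std::set<std::size_t> inSlice;
    std::vector<std::size_t> work(1, start);
    while (!work.empty()) {
        std::size_t ts = work.back();
        work.pop_back();
        if (!inSlice.insert(ts).second) continue;  // already visited
        for (std::size_t d : trace[ts].deps) work.push_back(d);
    }
    return inSlice;
}
```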
		

	
-------------------------------------------------------------------------------
Task 85: Add and process slice config and other supporting classes, and the framework
------------------------------------------------------------------------------
	(1) add the config file: (1) SLICE_AT.  Check WinXP image. [20 min] DONE.
	(2) add the genSliceTasks() and make it triggered. [15 min]. DONE. 
	(3) make the config parsed in the job. [30 min]. DONE
	(4) add the config file: FULL_TRACE, and add genFullTraceTasks(). [15 min]. DONE.
	(5) add ConditionOnSynchTaskCompleted. declare a static method for generating task ID.  
			[15 min]. DONE.
			methods:
				static: createNewConditionOnTask()
				regular: int getID()
	(6) add taskSynchronized, which takes a ConditionOnSyncTask and has an internal ID for the condition.
			When it finishes, it triggers a send_event. DONE.
		(a) create a new event_type and ID value [8min]. DONE.
		(b) add the taskSynchronized class [10 min]
  DONE ----


-------------------------------------------------------------------------------
Task 86: Implement Full-Trace Function.
------------------------------------------------------------------------------
	(1) add taskFullTrace [10 min]. DONE.
	(2) call loadCache to load the raw_trace instrStore [10 min]. DONE.
	(3) call loadCache to load the raw_trace execHistory  [5 min]. DONE
	(4) create InstrExecRecorder with raw_execHistory [5 min]. DONE
	(5) drill down to each folder of raw_trace. [15 min]. DONE.
	(6) Modify the InstrExecRecorder so that it does not depend on trace, but 
		on instrStore only! [30 min]
	(7) merge the above into trace::loadTraceFromDisk(); [20 min] DONE.
	(8) declare Trace::expandFromRawTrace(trace); [5 min]. DONE
	(9) create the instrStore and exechistory and so on. [15 min]. DONE
	(10) Load InstrStore. DONE.
		(a) set up the loop to read the raw instruction one by one. DONE
		(b) call the add instruction one by one - BUT arrBytes not initialized yet!!!. FIXED. DONE!!!
		(c) update the registers for add_instruction. copy the implementation from OLD.
		----------- TO DO ---------------------------
		9:00AM 08/20/2013
		(d) update appendToCache for InstrInfo, to append set of input/output 
			registers. [20 min] DONE.
		(e) update loadFromCache for InstrInfo [20 min]. DONE.
		9:40AM DONE

	(11) create a memory write cache, at this moment, use unordered_map. But wrap it with
		an inline function. [15 min]
		(a) declare CachedMap, internally it has an unordered_map at this moment. 
			[20 min]. DONE
		10:00AM.
		10:45AM
		(b) declare an instance of CachedMap in InstrExecRecorder.[10 min]. DONE.
		(c) declare a class called dependLink [45 min]
				data members: flag, type, timestamp, ESP/EBP value and encoding [15 min]. DONE
				function: serializeTo(ptr) [15 min]. DONE.
				function: deserializeFrom(ptr) [15 min].DONE
				unit_test: [20 min]. DONE.
		(d) fix the old errors in unit testing. . DONE
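
		Sketch of the serializeTo/deserializeFrom pair from (c). The field widths and the
		14-byte layout are assumptions, and the extra capacity argument is not in the
		original interface; it is added here so that a buffer that is too small is rejected
		instead of overrun. DependLinkSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct DependLinkSketch {
    uint8_t flag;
    uint8_t type;
    uint64_t timestamp;
    uint32_t espEbpVal;

    static constexpr std::size_t kEncodedSize = 1 + 1 + 8 + 4;

    // Returns the number of bytes written, or 0 if the buffer is too small.
    std::size_t serializeTo(char *ptr, std::size_t capacity) const {
        if (ptr == nullptr || capacity < kEncodedSize) return 0;
        std::size_t off = 0;
        std::memcpy(ptr + off, &flag, sizeof flag);           off += sizeof flag;
        std::memcpy(ptr + off, &type, sizeof type);           off += sizeof type;
        std::memcpy(ptr + off, &timestamp, sizeof timestamp); off += sizeof timestamp;
        std::memcpy(ptr + off, &espEbpVal, sizeof espEbpVal); off += sizeof espEbpVal;
        assert(off == kEncodedSize);
        return off;
    }

    // Returns the number of bytes consumed, or 0 if too few bytes are available.
    std::size_t deserializeFrom(const char *ptr, std::size_t available) {
        if (ptr == nullptr || available < kEncodedSize) return 0;
        std::size_t off = 0;
        std::memcpy(&flag, ptr + off, sizeof flag);           off += sizeof flag;
        std::memcpy(&type, ptr + off, sizeof type);           off += sizeof type;
        std::memcpy(&timestamp, ptr + off, sizeof timestamp); off += sizeof timestamp;
        std::memcpy(&espEbpVal, ptr + off, sizeof espEbpVal); off += sizeof espEbpVal;
        return off;
    }
};
```

		The encoding uses host byte order, so it is only meant to be read back on the
		same machine that wrote it.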

	(12) create a register write cache. [10 min]. DONE.
	(13) update the logic
		(a) update the logic of handle_mem_write [10 min. [DONE]]
		(b) add an array of dependLinks and the count. Initialize count in handle_instr. [10 min]. DONE.
		(c) update the logic of handle_mem_read [15 min]. DONE.
		9:00AM 08/21/2013
		(d) update the logic of handle_instr and update the register updates. Note the handling of the old version.
			[20min]. DONE
		(e) record the ESP/EBP value. 1st attempt. simply copy the ESP/EBP values for every instruction.
			(e1) add function gen_save_esp_ebp [15 min]. DONE
			(e2) insert it into disas_insn blindly [10 min]. DONE
			(e3) test the system performance [20 min] DONE.
				(a) bp on before/after save esp/ebp functions. net use slowed down to 40 seconds (27 vs 40). around 30% slow down.
		10:00AM 08/21/2013
		(f) modify the raw_trace generation.
			(f.0) remove the ESP_AFTER value, cause it can be recorded by the previous instruction [10 min]. DONE
			(f.1) add two flags: change_ESP, change_EBP to InstrExecRecorder [5 min]. DONE
			(f.1.5) add esp_val and ebp_val to handle_instr functions in various classes. [15 min] DONE
			(f.1.6) clean the directory, clear all dirs not i386 arch. [30 min]. DONE
				335MB -> 15MB
			(f.2) in handle_instr: record the ESP_BEFORE value [8 min]. DONE
			(f.3) in handle_instr: for the older instruction, record the ESP_AFTER value [8 min]. DONE
		11:20am 08/21/2013
			(f.4) in appendToRecord, compare the ESP/EBP value and set the flag [15 min]. DONE.
			(f.5) appendToRecord, serialize the ESP/EBP value [15 min]. DONE
			(f.6) retrieveRecord, deserialize the ESP/EBP value [10 min]. DONE
			(f.7) fix unit testing. [10 min]. DONE.
			(f.8) dump instruction, dump the information. [10 min]. DONE.
			(f.9) debug/test [20 min]. DONE
				(a) check trace PUSH/POP instructions. DONE. working.
		12pm 08/21/2013	
		(d) minor fixes [30 min] 
			(d.1) code review memory write [5 min]. DONE
			(d.2) code review memory read [5 min]. DONE
			(d.3) code review register write [5 min]. DONE
			(d.4) code review register read [10 min]. DONE
		1pm 08/21/2013
			(d.5) implement the dump() about links [15 min]. DONE
			(d.5) implement the append and retrieve () about links [15 min]. DONE
		(e) integration testing and debugging
			(e.1) environment set up. [5 min]. DONE
			2pm 08/21/2013
			(e.1.5) expand gen_full_trace, call handle_instr, handle_mem_read, handle_mem_write specifically. [30 min]
			(e.1.8) debug BP on handle_instr [15 min]. DONE
			(e.1.9) fix load instrExecRecorder [5 min]. DONE
			(e.1.10) fix get_total_size problem. fix the Cache size problem. [20 min]. regenerate raw trace first. DONE
			3:20pm 08/21/2013
			3:30pm
			(e.2) debug BP on handle_instr [20 min]
				(a) fix the eip==-1 problem. DONE.
				(b) fix log error on no find register problem. DONE.
				(c) fix the set of registers problem. needs to clear reg sets. DONE.
				(d) fix the timestamp increment problem. DONE.
				(d) continue fix the set of registers problem.  It's caused by a bug in InstrInfo serialization.DONE
			-------------------- 4:30pm 08/21/2013.

			(e.2) debug BP on memory write (Trace.cc:92). [8 min]. DONE
				(e.2.1) fix bug on size in mock_mem_access
			(e.3) debug BP on memory read. [20 min]. DONE.
				(e.3.1) fix the CachedMap not-found case. unordered_map somehow works right now. no bug.

--------------------- TOO MANY ERRORS IN REG READ/WRITE, check it later
		8:30 08/22/2013
			(e.4) debug BP on register read/write [15 min]
				(a) bp on Trace.cc:99 and see how the registers are handled. Trace each addition to map. [25 min]
					It seems that the map is working. Check if this is caused by missing calls of approx_regcode.
					It is called. The system is complaining about reg_code 51. Check what is the register.
					Found that register 51 is cr3, 81 is eflags, and 85 is eip. 
				(b) read the OLD logic about processing registers. [15 min]
					It seems that it's simply ignored in Instruction::updateRegDependency if the cache returns NULL.
					In libdisas.h there are a number of functions defined to retrieve register ID.
						unsigned int x86_sp_reg(void);
						unsigned int x86_fp_reg(void);
						unsigned int x86_ip_reg(void);
						unsigned int x86_flag_reg(void);
				(c) define a collection of register constants in InstrExecRecorder.cc as private, and dismiss warning
						when necessary 
						and ignore EIP.
				DONE. There will be some warnings initially about registers, but eventually it will be fine.
		9:30AM 08/22/2013.
		10:00AM 08/22/2013
			(e.5) debug BP on dump 
				(a) debug into InstrExecRecorder [15 min]. DONE.
				(b) remove the Util::error_exit in error finding instruction at 423. [15 min]. DONE
				(c) fix the destructor of Trace. [15 min]. DONE.
				(d) problem with CachedMap destructor again. It seems to be always causing trouble. Remove
					template and make it fixed type. [25 min]
		11:20AM 08/22/2013
				(e) Trace destructor cause problem [60 min]. strange problem. could not figure out for a while ...
						remove struct quick_instr_info and replace it with std::pair ...
		12:30PM still not solved.
					--- new attempt: download valgrind and check memory problem.

		~~~~ STUPID. deleted the folder ~~~~~~~~~~ should have set up the git earlier!!!!! fxxk!
		1:34 set up git
			Take 11:00AM version of yesterday.
		------------------------- oops, price paid for stupid rm -fr ! ------------------------------------	
		12pm 08/21/2013	
			(e.1) environment set up. [5 min]. DONE
			*** app crash. 
		Use Valgrind to find problem:
			(1) used writeRegMemCache[xxx] = xxx. (illegal op, but strangely not caught by the compiler)
			(2) declared dependLink arr [5] (should be dependLink *arr = new dependLink [5]).



9:00AM 08/23/2013
-------------------------------------------------------------------------------
Task 87: Use Valgrind to remove memory errors and buffer overflow [30 min]
------------------------------------------------------------------------------
	(1) identify the buffer overflow place. Found when 69th instruction crashes app. It seems that
			the stack canary word is located at ebp-0xc. It is modified. 
	(2) use watch to find out the problem
		Use "watch *0xbfffccdc" to
		catch it.
		It is caused by dependLink.serializeTo(ptr). which is called by InstrExecRecorder.appendToCache
		Found that the buffer is not big enough.
---- DONE!!!
		
-------------------------------------------------------------------------------
Task 88: fix the complaints about special registers. [20 min]
------------------------------------------------------------------------------
	(information) register 65 - 70: segment registers. 49-56 cr registers, 71-72 ldtr/gdtr registers
	Implementation: add an inline function to check for a special register, and do not generate
			complaint message. [15 min] 
	DONE.
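
	Sketch of the inline check, using only the register code ranges listed above. The
	function name isSpecialRegister and the non-negative precondition are assumptions.

```cpp
#include <cassert>

// True for register codes that should not produce a complaint message:
// 49-56 control registers, 65-70 segment registers, 71-72 ldtr/gdtr.
inline bool isSpecialRegister(int regCode) {
    assert(regCode >= 0);
    const bool isControl = regCode >= 49 && regCode <= 56;
    const bool isSegment = regCode >= 65 && regCode <= 70;
    const bool isTableReg = regCode == 71 || regCode == 72;
    return isControl || isSegment || isTableReg;
}
```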

10:00AM 08/23/2013
-------------------------------------------------------------------------------
Task 89: check the max link problem [15 min]
------------------------------------------------------------------------------
	(1) code review. [5 min]
	(2) implementation and debugging: add a isVisited function. [10 min]

-------------------------------------------------------------------------------
Task 90: make sure that the history is showing up [15 min]
------------------------------------------------------------------------------
	(1) code review [10 min]
	(2) implementation and debugging [5 min]
	Trace around 18MB, instructions: 600k, around 30 bytes per instruction record.

-------------------------------------------------------------------------------
Task 91: check the logger problem, exits and deleted too early. [30 min]
------------------------------------------------------------------------------
	(1) add a string path to each Logger, or check it [8 min]. DONE.
	(2) debug into InstrExecRecorder and check its logger path [8 min]
			It's using the logger from the rawTrace.
	(3) plan for revision [5 min]
		(1) get the file_name from rawPath in BatchAnalyzer::gen_full_trace.
		(2) in Trace::constructFromRawTrace add an additional parameter
		(3) create Logger
		(4) when finish delete trace.
	(4) implementation [10 min]
	(5) debug. BP on Trace.cc:330 [15 min]

11:00AM 08/23/2013
-------------------------------------------------------------------------------
Task 92: check the logger problem of raw trace. [25 min]
------------------------------------------------------------------------------
	(1) debug and code review [5 min]
	(2) plan of revision [10 min]
		resets member to NULL to avoid deleting it. [5 min]
	(3) debugging [5 min]


-------------------------------------------------------------------------------
Task 93: Misc Tasks
------------------------------------------------------------------------------
	(1) handle event 106. [15 min] DONE.
	(2) solve the cache -> block_size memory allocation error. Check Cache destructor. [10 min] DONE.
	(3) fix minor dump problem [5 min] DONE.
2:00PM 08/23/2013
	(5) read dump and find more bugs [10 min]
	(6) fix timestamp dump issue. [5 min]
	(7) instruction 2 reg dependency problem. [30 min]
		(a) It's the problem of generating raw trace, 5 registers recorded for 
				instruction @@7c92289c [5 min]
		(b) dump_set [10 min]
		(c) debug into full trace, BP on  InstrInfo.cc:227 [20 min]
			processing registers is ok.
			need to check serialize_set.  ok.
			check deserialization. bp on Trace.cc:101.  
			Found the problem : deserialize_set	
		DONE.

3:20PM 08/23/2013	
	(8) improve layout of output. [5 min]. DONE
	(9) problems with instruction @7c9228a3, register dependency not right [30 min]
		check processing of registers. BP on InstrInfo.cc:229
			Implicit ops should be all treated as input_reg!
	(10) check the ESP updates problem. BP on InstrExecRecorder.cc:67 at 
			7c9228de. not solved yet.
4:30PM 08/23/2013
	
7:30PM 08/23/2013	
		Continue with (10): found the problem. Information loss when replay the handle_instr. [10 min]
		(1) create a new function: expandFromRaw(): [30 min]
				(a) copy memRead and regSet from raw
				(b) update the memory access
				(c) update the register access
		(2) update Trace.cc [8 min] DONE.
		(3) revert the original implementation , make sure it's only called in raw mode [10 min]. DONE
		(4) test raw mode [15 min] .DONE
		(5) test full mode	[30 min]
			(a) fix the timestamp. DONE.
			(b) fix the ESP link. DONE.
			(c) FIX the mem link. DONE (around 60k mem access without source)
			(d)	 fix the 0xFFFFF ESP/EBP value. DON'T FIX. Leave 0xFFFFF
			(c) dump the memory warning.
	

9:00 AM 08/24/2013			
-------------------------------------------------------------------------------
Task 94: Solve the ESP Link (0xFFFF) problem (25 min)
------------------------------------------------------------------------------
	Problem: when there are instructions like MOV EAX, [EBP+10], and the EBP value does not
	change during the last instruction, EBP_VAL is set to 0xFFFF. 

	Idea: declare two attributes LATEST_ESP and LATEST_EBP, init to 0xFFFF. Whenever ESP_VAL_AFTER
	is changed, update the LATEST_ESP value.

	Implementation: declare the two attributes in trace.h and update in InstrExecRecorder.expandFromLink.
	Then test
-- DONE.
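
	Sketch of the LATEST_ESP/LATEST_EBP idea, assuming a record carries 0xFFFF when the
	instruction did not change the register. StackTrackerSketch, afterInstr and
	ebpForLink are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t kUnknownVal = 0xFFFF;  // sentinel value used in the notes above

struct StackTrackerSketch {
    uint32_t latestEsp = kUnknownVal;
    uint32_t latestEbp = kUnknownVal;

    // Call once per replayed instruction with its recorded "after" values.
    void afterInstr(uint32_t espAfter, uint32_t ebpAfter) {
        if (espAfter != kUnknownVal) latestEsp = espAfter;
        if (ebpAfter != kUnknownVal) latestEbp = ebpAfter;
    }

    // Value to store in a link for an access such as MOV EAX, [EBP+10]:
    // fall back to the last known EBP when the record has only the sentinel.
    uint32_t ebpForLink(uint32_t recordedEbp) const {
        return recordedEbp != kUnknownVal ? recordedEbp : latestEbp;
    }
};
```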
		
-------------------------------------------------------------------------------
Task 95: Develop control dependency link
------------------------------------------------------------------------------
	Idea: an instruction has a control dependency on the previous instruction if, when the previous
instruction is replaced with NOP, the control flow cannot reach the current instruction (however, an
exception is made for jump/control instructions). Even if the previous instruction naturally flows to
the current instruction, there is still a control dependency.

	Implementation:
		(0) design [15 min]. DONE 
		(1) declare two attributes in trace.h: nxtImmediateAddr [5 min]. DONE
		(2) update nxtImmediateAddr in expandFromRaw [15 min]. DONE
10:00AM 08/24/2013
		(3) define function isTransferControl [10 min] - check old implementation. DONE
			(a) add type to InstrInfo [10 min]. DONE
			(b) add serialization support [15 min]. DONE. 
			(c) unit test (add real code) [20 min]. DONE 
			(d) insert 4 inline about checking type [15 min]. DONE
11:15AM
		(4) declare and set attribute in InstrExecRecorder: isLastInstrTransferControl [10 min]. DONE
		(5) add the control dependency logic, note: the possibility of context switch! [15 min]. DONE.
		(6) debug and testing [30 min] 
			(a) check CLINK. ok
			(b) context switch not ok.
				check instruction 3, why the length is not ok.
				check when length is constructed it is ok InstrInfo::load. ok
				check ... found the bug. loadInstr is called after the value is assigned. 
			(c) add logic to isRET.
--- DONE

1:30PM 08/24/2013
-------------------------------------------------------------------------------
Task 96: add task stop VM to stop vm (25 min)
------------------------------------------------------------------------------
  (1) Design [5 min]
  (2) add class stopVM and method stop VM [8 min]
  (3) add the task to gen_full_trace [8 min]
  (4) test and debug [5 min]
--- DONE

-------------------------------------------------------------------------------
Task 97: design and set up slice framework  [40 min]
------------------------------------------------------------------------------
	(1) Design [15 min] DONE
		(2) implement genTasksForOneSlice() [15 min] 
		(3) implement taskFullTrace [15 min] 
		(3) Trace *loadFullTrace(job_path)
	(2) implement genTaskForOneSlice [10 min]. DONE
	(3) implement taskOneSlice[15 min]. DONE.

2:40pm 08/24/2013
-------------------------------------------------------------------------------
Task 98: implement Trace::loadFullTrace [35 min]
------------------------------------------------------------------------------
	(1) Design [5 min]
	(2) simulate load raw trace [10 min]
	(3) debugging [20 min]
		(a) fix the name problem of full trace

-------------------------------------------------------------------------------
Task 99: set up the slice framework (ignore the PE parsing first)
------------------------------------------------------------------------------
	(1) Design [10 min] DONE.
	(2) add Trace::slice(job) [5 min]
	(3) search for the timestamp that contains the instruction [20 min]

9:40AM 08/25/2013
-------------------------------------------------------------------------------
Task 100 (Yeah!):  add bInSlice flags to both InstrStore and ExecRecord
------------------------------------------------------------------------------
	1. Misc. correct documentation in trace.h [15 min]. DONE
	2. add bInSlice to InstrStore and serialize it [15 min]. DONE
	3. unit test InstrInfo [10 min]. DONE
	4. add bInSlice to InstrExecRecorder and serialize it [15 min]
	5. unit test. [25 min]
	6. regenerate the raw and full traces. DONE.

-------------------------------------------------------------------------------
Task 101:  change the control link to include both the ESP and EBP value
------------------------------------------------------------------------------
	1. change the definition. DONE
	2. change the serialization. DONE
	3. change the dump information. DONE
	4. change the call. DONE
	5. test. DONE. 

-------------------------------------------------------------------------------
Task 102:  expand the interface of one_slice
------------------------------------------------------------------------------
	1. introduce class section. DONE.
	2. add the section to the function. DONE
	3. create a dummy section for testing purpose.  DONE

12:00PM 08/26/2013
-------------------------------------------------------------------------------
Task 103:  port the binWriter class
------------------------------------------------------------------------------
	1. copy and set up makefile. [15 min] DONE
	2. clearAllSecitons [10 min]  DONE.
	3. writeFile [15 min]. DONE. 
	4. writeBytes [10 min]. DONE.
	5. writeInstruction [10 min]. DONE. 
	6. getInstrInFileOffset [15 min]. DONE

-------------------------------------------------------------------------------
Task 104:  add section investigation function  to binWriter
------------------------------------------------------------------------------
	1. add prototype [10 min]. DONE
	2. read about PE format. [60 min]. DONE
		Data to read:
			(1) magic code "50 45" at 0x0d8
			(2) offset to PE header at 0x3C
			(3) number of sections: 2 bytes at 0x0de-0xd8 = 0x6.
			(4) image base: 4 bytes at 0x10c-0xd8 = 0x34
			(5) sections start at 0x1d0-0xd8 = 0xF8, each has 0x28 bytes
				offset: virtual size 0x8, 4 bytes
						virtual address 0xc, 4 bytes
						in-file location 0x14, 4 bytes
						characteristics: 0x24 (4 bytes).
							Macro :  EXECUTE bit 0x20000000
	3. implementation [60 min]. DONE
	4. test [60min]
		(1) make the framework [10min]. DONE
		(2) copy the file and establish the folder [10min] DONE
		(3) test function [10 min] 
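
Note: the offsets listed under Task 104 step 2 can be exercised with a small reader. This is a minimal sketch, not the binWriter code: SectionInfo, rd16, rd32 and readSections are hypothetical names, and it assumes a PE32 image whose section table starts 0xF8 bytes after the "PE" magic (a more careful reader would take the optional header size from the file header instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one section header (not the project's own type).
struct SectionInfo {
    uint32_t virtualSize;      // section header offset 0x08
    uint32_t virtualAddress;   // section header offset 0x0c (relative to image base)
    uint32_t fileOffset;       // section header offset 0x14
    uint32_t characteristics;  // section header offset 0x24
    bool isExecutable() const { return (characteristics & 0x20000000u) != 0; }
};

// Little-endian readers, independent of host byte order and alignment.
static uint32_t rd16(const std::vector<uint8_t>& b, std::size_t off) {
    assert(off + 2 <= b.size());
    return b[off] | (static_cast<uint32_t>(b[off + 1]) << 8);
}
static uint32_t rd32(const std::vector<uint8_t>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return rd16(b, off) | (rd16(b, off + 2) << 16);
}

// Returns false when the image is truncated or the "PE" magic is missing.
// Assumption: section table begins 0xF8 bytes after the PE magic (PE32).
bool readSections(const std::vector<uint8_t>& img, uint32_t& imageBase,
                  std::vector<SectionInfo>& out) {
    if (img.size() < 0x40) return false;
    const std::size_t pe = rd32(img, 0x3C);          // offset to PE header
    if (pe > img.size() || img.size() - pe < 0xF8) return false;
    if (img[pe] != 0x50 || img[pe + 1] != 0x45) return false;  // "PE"
    const std::size_t count = rd16(img, pe + 0x6);   // number of sections
    imageBase = rd32(img, pe + 0x34);
    const std::size_t table = pe + 0xF8;
    if ((img.size() - table) / 0x28 < count) return false;
    out.clear();
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t s = table + i * 0x28;      // each header is 0x28 bytes
        SectionInfo info = { rd32(img, s + 0x8), rd32(img, s + 0xc),
                             rd32(img, s + 0x14), rd32(img, s + 0x24) };
        out.push_back(info);
    }
    return true;
}
```

Usage idea: call readSections on the bytes of the copied .exe and keep only the entries where isExecutable() is true.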

9:10AM 08/28/2013
-------------------------------------------------------------------------------
Task 105:  test notepad [25 min] 
------------------------------------------------------------------------------
	(1) copy notepad [10 min]
	(2) modify code [8 min]
	(3) test. [5 min]
DONE. 9:25AM

-------------------------------------------------------------------------------
Task 106:  copy slice file to job folder [1 hr]
------------------------------------------------------------------------------
	(1) implement copy function in Util [10 min] DONE
	(2) test the copy function in Util [8 min] DONE
	(3) implement the set up of the slice file [20 min]
		(a) read about how job is passed. Trace->name has the information. [15 min]
		(b) logic: the file to copy is the file which matches Trace->name, the destination
			directory is the name of the Trace. [10 min]
		(c) set up logger. [10 min] DONE.
		(d) implement (b). [15 min] DONE.
DONE. 10:52AM

-------------------------------------------------------------------------------
Task 107:  clear in-slice flag and set up slice framework function  [40 min]
------------------------------------------------------------------------------
	(1) clear slice flag and debug [15 min]. DONE
	(2)  slice algorithm design [25 min]
		while loop and look back:
			if the current ts in slice 
				for each dependency ts (memory, register, esp, ebp)
					mark in slice
				for each control link
					if previous instruction is not RET
					otherwise search ESP value not to exceed min_marked value, if found data dependency
					on non-section instructions, then need to mark all 
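
Note: the data-dependency half of the design above ("if the current ts in slice, mark each dependency ts") can be sketched as a single backward pass. ExecRecord and sliceBackward are hypothetical stand-ins, not the Trace class; the control-link / RET handling is left out because the design above does not finish specifying it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified execution record: one entry per timestamp.
struct ExecRecord {
    std::vector<std::size_t> deps;  // earlier timestamps (memory, register, esp, ebp links)
    bool inSlice = false;
};

// Walk backward from the slicing criterion. Every record already in the
// slice pulls its dependencies into the slice. Because links only point to
// earlier timestamps, one pass from high to low is enough.
void sliceBackward(std::vector<ExecRecord>& trace, std::size_t criterionTs) {
    assert(criterionTs < trace.size());
    for (ExecRecord& r : trace) r.inSlice = false;  // clear in-slice flags first
    trace[criterionTs].inSlice = true;
    for (std::size_t ts = criterionTs + 1; ts-- > 0;) {
        if (!trace[ts].inSlice) continue;
        for (std::size_t dep : trace[ts].deps) {
            assert(dep < ts);  // a dependency must precede its user
            trace[dep].inSlice = true;
        }
    }
}
```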
1:00PM
-------------------------------------------------------------------------------
Task 108:  Implementation: slicing algorithm
------------------------------------------------------------------------------
	(1) Design [30 min]. DONE
	(2) Declare function searchFunction(startTS, minTS, vecSections). It returns a timestamp
		such that there are no dependency points between startTS and that timestamp. [30 min]. DONE
	(3) implement the algorithm [1hr]. DONE.
	(4) implement the searchFunction() [30 min]. DONE
	(5) debugging [20 min]

9:00AM 08/29/2013
-------------------------------------------------------------------------------
Task 109:  Implement update Cache (3 hrs)
------------------------------------------------------------------------------
	(1) create Cache::updateRecord(id, buf, size) [30 min]. DONE.
	(2) unit test Cache::updateRecord [30 min]. DONE.
	(3) refactor InstrInfo::appendToCache and unit test it [30 min]. DONE
		(a) add an attribute id - but no need to serialize it.
	(4) add InstrInfo::updateCache() [20 min]. DONE.
	(5) unit test InstrInfo::updateCache() [20 min]. DONE
	(5) refactor InstrExecRecorder::appendToCache [20 min]. DONE
	(6) add InstrExecRecorder::updateCache [20 min]. DONE
	(7) unit test updateCache[20min]. DONE
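
Note: a minimal sketch of the updateRecord(id, buf, size) idea, assuming records are serialized byte buffers addressed by the id returned at append time. RecordCache is a hypothetical in-memory stand-in, not the project's Cache class; the same-size rule is my assumption so that later records never move.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory stand-in for the cache of serialized records.
class RecordCache {
public:
    // Append a serialized record and return its id.
    std::size_t appendRecord(const char* buf, std::size_t size) {
        offsets_.push_back(data_.size());
        sizes_.push_back(size);
        data_.insert(data_.end(), buf, buf + size);
        return offsets_.size() - 1;
    }
    // Overwrite record `id` in place. The size must match the stored size.
    bool updateRecord(std::size_t id, const char* buf, std::size_t size) {
        if (id >= offsets_.size() || sizes_[id] != size) return false;
        std::copy(buf, buf + size, data_.begin() + offsets_[id]);
        return true;
    }
    // Copy record `id` back out (what a loadFromCache would deserialize).
    bool loadRecord(std::size_t id, char* out, std::size_t size) const {
        if (id >= offsets_.size() || sizes_[id] != size) return false;
        std::copy(data_.begin() + offsets_[id],
                  data_.begin() + offsets_[id] + size, out);
        return true;
    }
private:
    std::vector<char> data_;
    std::vector<std::size_t> offsets_;
    std::vector<std::size_t> sizes_;
};
```

The point for the slicer: a flag changed on an in-memory object is lost on the next load unless the object is serialized again and written back with updateRecord.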
12:00PM

2:00PM
-------------------------------------------------------------------------------
Task 110:  Debug Slicing Algorithm (2.5 hrs)
------------------------------------------------------------------------------
	(1) use the updateCache [10 min]
	(2) debug the mainloop.[1.5 min]
		bp on Trace.cc:204
		(a) bug on missing updateCache. DONE
		(b) similar. DONE.
		(c) bug with full_trace generation. remove EIP from the register dependence. DONE
		(d) fix the processFunction (condition on it). DONE.
			(d.1) fix missing of map getInstrID
		(e) refactor the handling of transfer control.
			(e3) define a function search for control link [15 min]
			(e4) call it after search for control link [5 min]
			(e5) debug [10 min]
				(e.5.1) make control link the last when building link
				(e.5.2) set attribute needToBeVisited
				BP on Trace.cc:279 and 224 --> identified it's a bug. Found the problem. It's the
				serialization.
7:00pm
		(f) debug processFunction. BP on 318, 346. FIXED.
		(g) fix the addrInSection. fixed.


9:00AM
-------------------------------------------------------------------------------
Task 111:  Debug Slicing Algorithm (2.5 hrs)
------------------------------------------------------------------------------
	(1) extra depending on 0x40102a. timestamp 591585. check when it has the flag set. 
		-- there seem to be bugs about serialization. [30 min]
			(a) set a BP on updateCache and see when it's updated. -- IT'S NEVER HIT. Now the question is:
		who writes to that address?
			(b) BP Trace.cc:327 --> fixed. Found that updateCache() was not called after the init of the
		slice and tobevisited flags.

	(2) continue debugging: start from 0x40106d
		Found problem: control link should be the last one. DONE
		The above introduced a bug, fix it.
		BP on Trace.cc: 228, 240, 248
		DONE. work ok.
		Still has extra function call to remove (problem, extra ESP/EBP)

-------------------------------------------------------------------------------
Task 112:  Remove extra function call
------------------------------------------------------------------------------
	Idea: handle control link first and then handle ESP/EBP links
	Implementation :
		(1) define findTSWithESP(minTS, bool bESP) [10 min]. DONE
		(2) separate the for loop [10 min]. DONE.
		(3) handle the next curTS [8 min]. DONE.
	Still has problems. It seems a reverse link needs to be added.
	---> algorithm design:
	add a new flag called ESP_DELAY_FLAG, when a timestamp (instruction) has an ESP delay flag,
	it needs to be examined.
		(1) if it is visited during the main while loop, treat it like a normal instruction. In
		other words, if we have something like
				add esp, 4
				sub esp, 4
				add esp, 4
				sub esp, 4
		We would include all these instructions instead of optimizing them
		(2) During processFunction, the processor first does a pass over all instructions in the
	function body. If none has any real DATA/CONTROL dependency, then the function can
	be skipped. [this needs to be verified for those instructions with ESP_DELAY_FLAG]
		For each instruction with ESP_DELAY_FLAG, search for the target timestamp (before the call
	instruction) and make sure that there is no ESP-modifying instruction in between the call
	and the target. If that is fine the entire function can be removed.
-------------- IMPLEMENTATION PLAN ---------------------------------------------------
	(1) add an ESP_DELAY_PROCESS_FLAG to InstrExecRecorder. Note, needs to change char to short int
		for the flag. Add two functions to set and get the attribute. unit testing [25 min]. DONE
	(2) In trace.cc, merge the second loop with the first loop, just call the setESPDelayProcessFlag [10 min] DONE.
	(3) modify findTSWithESP, add the logic for checking no esp updating instructions. [15 min]. DONE
	(4) update the processFunction, include a loop to check all esp delay instructions. [15 min]. DONE 
	(5) debug [25 min]
		(1) bp trace.cc256. find problem with serialization. FIXED.
		(2) error with full trace memory link at 0x401062. check --
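
Note: a sketch of why step (1) of the plan widens the flag field from char to short int, assuming the existing flags already occupy the low 8 bits (my guess, the log does not say). ExecFlags and its bit positions are hypothetical, not the InstrExecRecorder layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag word; bit positions are illustrative only.
class ExecFlags {
public:
    enum : uint16_t {
        IN_SLICE    = 1u << 0,
        NEEDS_VISIT = 1u << 1,
        ESP_DELAY   = 1u << 8,  // would not fit in an 8-bit char
        EBP_DELAY   = 1u << 9
    };
    ExecFlags() : bits_(0) {}
    // Set or clear one flag without touching the others.
    void set(uint16_t flag, bool on) {
        bits_ = static_cast<uint16_t>(on ? (bits_ | flag) : (bits_ & ~flag));
    }
    bool test(uint16_t flag) const { return (bits_ & flag) != 0; }
    // Value that has to be serialized; all 16 bits must be written out.
    uint16_t raw() const { return bits_; }
private:
    uint16_t bits_;
};
```

If the serializer still writes only one byte, the delay flags silently vanish after a reload, which matches the kind of serialization bug noted in the debug step above.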

8:30AM 08/31/2013.
-------------------------------------------------------------------------------
Task 113:  Debug the algorithm 
------------------------------------------------------------------------------
	(1) error with full trace memory link at 0x401062. check --
		Read code of InstrExecRecorder.cc. check RawTrace first. The addr being read/write do not match. 
		Need to regenerate all traces. Problem fixed.
	(2) Debug the sequence of instructions being processed.
		BP on Trace.cc:260 Mostly, found one bug in processFunction
	(3) problemProcessFunction: check of isEspDelay
		BP on Trace.cc:412. DONE
	(4) bug: setEbpDelay does not work.
		BP on Trace.cc:286, and then BP on 420. check 0x401029. Found the problem: Ebp not serialized.
		fixed.
	(5) bug: check why findTSWithEBP returns -1. Check why EBP value is 0xffffffff. It's caused by loadFromCache.
		(a) add a function called searchForESPValue(ts start, bool ESP) [15 min]
		(b) bp on 211.
		FIXED.
	(6) need to clarify the semantics of findTSWithESP. --> findTSWithESP_AFTER
		debug: BP on 221
		verify in winxp image: esp case ok. and ebp case ok.
	(7) decide what to do with findTSWithESP. DONE.
	(8) debug the new strategy findTSWithESP_AFTER again. DONE.
12:30PM
2PM
	(9) check processFunction return. DONE.
	(10) for the else case, should set the return instruction in slice. DONE.
	(11) for esp/ebp delay instructions, if processed directly, should set them in slice. 
		
-------------------------------------------------------------------------------
Task 114:  add logging message.
------------------------------------------------------------------------------
	(0) fix the bug why the log disappears.  DONE
	(1) log process instruction.  DONE
	(2) log data link. DONE
	(3) log esp link. DONE
	(4) log call processFunction
DONE.

-------------------------------------------------------------------------------
Task 115:  find out the -1 problem in processFunction
------------------------------------------------------------------------------
	(1) Problem: last logged operation:
		process ts 599210 @0x402c05 inSlice:1 needVisit: 1 ESPDelay: 0 EBPDelay:
     	process control link ts 599209 @0x408571
		unexpcted tsRet -1 in processFunction()!
	(2) check 0x408571 in winxp image. The call instruction is at 0x402c00.
	(3) read the log and check:
			ESP: 0x12ff74  	at 0x402c00
			ESP: 0x12ff74	at 0x408571
		So the search should actually work but it crashed.
	(4) gdb into the case. Found the problem: expected_esp is not right 0xffff.
	(5) create and call function getESP_BEFORE_VALUE
DONE

	


-------------------------------------------------------------------------------
Task 116:  find out another -1 problem in processFunction
------------------------------------------------------------------------------
	(1) problem. before CALL and after RET, the ESP actually does not match!!!
process ts 569476 @0x7c911bff inSlice:1 needVisit: 1 ESPDelay: 0 EBPDelay:
     add data link ts 569473 @0x7c910c00
     process control link ts 569475 @0x7c910c02
unexpcted tsRet -1 in processFunction()!
	(2) fix idea: The problem is that sometimes in the 0x7c section, the RET instruction
	does not have the exact stack pointer when coming back from a function.
	Use a second criterion when processing functions: as long as the next immediate
	instruction is the expected next immediate instruction, let it pass, but
	generate a warning in the logger.

9:00 09/04/2013
-------------------------------------------------------------------------------
Task 117:  find out yet another -1 problem in processFunction
------------------------------------------------------------------------------
	(1) debug: set BP on Trace.cc:503 and ignore it 7 times.
		Error point: timestamp 555893, it should be back to 555873. The problem is 
		that diffESP is 36, greater than the threshold 32.
	(2) solution: assumption: recursive calls should not destroy its own stack. So if
	exact ESP match could not be found, we'll just check next immediate address,
	ignore the use of diffESP.

-------------------------------------------------------------------------------
Task 118:  find out yet yet another -1 problem in processFunction
------------------------------------------------------------------------------
	(1) problem timestamp: 394875.
		It seems to be caused by 7c918de7 -> ... syscall --> 0x80range --> sysexit --> 7c90eac7 [no match of previous instruction]
		But 7c90eac5 is NOT the prior instruction of 7c90eac7!!!
	(2) verify in winxp image.  0x7c90eac7 soon leads to zwContinue that jumps to the
	entry 0x4013d7. So 0x7c90eac7 is not right after a particular call, the stack is set up by
	the process loading procedure. So this is a little bit like the return oriented programming
	attack. Nothing new here! arrange the stack and return to the corresponding code to accomplish
	the logic, but the kind of setting is set up by the OS!
	(3) solution idea: ignore the case where there is no matching call. Generate a warning message.
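
Note: the three outcomes from Tasks 116-118 (exact ESP match, fall back to the next immediate address, no matching call at all) can be sketched as below. CallSite, Match and matchReturn are hypothetical names; requiring the return address to match in the first criterion as well is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Match { ExactEsp, NextAddrOnly, None };

// Hypothetical record of one CALL seen earlier in the trace.
struct CallSite {
    long long ts;        // timestamp of the CALL
    uint32_t espBefore;  // ESP right before the CALL executed
    uint32_t nxtImmEIP;  // address of the instruction following the CALL
};

// calls: candidate CALLs, oldest first. Searches from the most recent one.
// tsCall receives the matched timestamp, or -1 when nothing matches.
Match matchReturn(const std::vector<CallSite>& calls, uint32_t espAfterRet,
                  uint32_t eipAfterRet, long long& tsCall) {
    tsCall = -1;
    // 1st criterion: stack pointer restored and return lands after the CALL.
    for (std::size_t i = calls.size(); i-- > 0;) {
        if (calls[i].espBefore == espAfterRet && calls[i].nxtImmEIP == eipAfterRet) {
            tsCall = calls[i].ts;
            return Match::ExactEsp;
        }
    }
    // 2nd criterion: ESP differs, but the return address is the expected one.
    // The caller should log a warning for this case.
    for (std::size_t i = calls.size(); i-- > 0;) {
        if (calls[i].nxtImmEIP == eipAfterRet) {
            tsCall = calls[i].ts;
            return Match::NextAddrOnly;
        }
    }
    // No matching call (e.g. a stack set up by the OS): warn and ignore.
    return Match::None;
}
```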


2:30PM 09/04/2013
-------------------------------------------------------------------------------
Task 117:  improve speed
------------------------------------------------------------------------------
	(1) Improve processFunction speed by building a pre-process table. DONE.
	(2) in Trace class introduce two members: callTable and tsToCallID [40 min]
		(a) class CallRetRecord [40min]
		(b) test CallRetRecord [20 min]
8:30AM 09/05/2013
		(b) declare call table and tsToCallID in Trace [15 min]
			(b.2) test the creation [15 min]
	(3) implement function setCallTable() 
		(a) declaration and compile [15 min]. DONE
		(b) set up the loop to process each instruction/timestamp [15 min]. DONE
		(c) handle call [15 min]. DONE
		(d) handle RET [15 min]. DONE.
		(d) add function searchForCall. [20 min]. DONE
		(e) add nxtImmEIP into CallRetRecord and update all functions. [20 min]. DONE.
		(d) handle ret [15 min] DONE
		(f) debug into setupCallTable [20 min]
2:00PM 09/05/2013
			(a) found bug of getESP_Value_Before, id 61, check later.... It seems
				to be the problem of loadFromCache. FIXED. needs to call eipToCallID. 
			(b) searchForCall does not return the timestamp. FIXED
			(c) fix the stackIdx problem. FIXED.
			(d) fix the ID 64 not in range problem. CCR updateCache problem, got to
				add callID
			(e) examine the search for call result.
				bp on 283, 299, 346
				
			(f) still the serialization problem. load 67 but load 56. FIXED.
			(g) check the not found case. 
					Involved in infinite loop at 5152,
					Problem area: 5213. Problem is caused by the extension from int to long
			long int: 0xFFFFFFFF is expanded to 0x00000000FFFFFFFF (no sign extension, so it is no longer -1).
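
Note: a sketch of the widening pitfall in (g), assuming the "not found" sentinel is -1 stored in a 32-bit value. The helper names are hypothetical; the log does not say which variable was involved.

```cpp
#include <cassert>
#include <cstdint>

// Signed 32-bit -> 64-bit: sign-extended, so -1 stays -1.
long long widenFromSigned(int32_t v) { return v; }

// Unsigned 32-bit -> 64-bit: zero-extended, so 0xFFFFFFFF becomes
// 0x00000000FFFFFFFF (4294967295), which no longer equals -1.
long long widenFromUnsigned(uint32_t v) { return v; }

// Sentinel test as a search loop would use it to stop.
bool isNotFound(long long ts) { return ts == -1; }
```

A loop that waits for isNotFound() never terminates when the value went through the unsigned path, which would explain an infinite loop like the one in (g).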

8:30AM 09/06/2013
-------------------------------------------------------------------------------
Task 118:  Test pre-built function table correctness
------------------------------------------------------------------------------
	(1) generate the dump. [10 min] bp on Trace.cc:419
	(2) examine the dump
		(a) check the first 10 matches [20 min] OK
		(b) check 5 calls in 0x4010 range. [10 min] ok.
		(c) check the no match case. [10 min]ok.
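
Note: a minimal sketch of a pre-built call table like the one checked above, assuming one pass over the trace with a stack of open calls. Step, CallTable and buildCallTable are hypothetical; only the names CallRetRecord, tsToCallID and nxtImmEIP come from the log, and the real setCallTable may match differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum class Kind { Other, Call, Ret };

// Hypothetical per-timestamp view of the trace.
struct Step { Kind kind; uint32_t eip; uint32_t len; };

struct CallRetRecord {
    long long tsCall;    // timestamp of the CALL
    long long tsRet;     // timestamp of the matching RET, -1 if none
    uint32_t nxtImmEIP;  // address right after the CALL
};

struct CallTable {
    std::vector<CallRetRecord> records;
    std::map<long long, std::size_t> tsToCallID;  // CALL and RET ts -> record id
};

CallTable buildCallTable(const std::vector<Step>& trace) {
    CallTable t;
    std::vector<std::size_t> open;  // ids of calls that have not returned yet
    for (std::size_t ts = 0; ts < trace.size(); ++ts) {
        const Step& s = trace[ts];
        if (s.kind == Kind::Call) {
            CallRetRecord r = { static_cast<long long>(ts), -1, s.eip + s.len };
            t.records.push_back(r);
            open.push_back(t.records.size() - 1);
            t.tsToCallID[r.tsCall] = t.records.size() - 1;
        } else if (s.kind == Kind::Ret && ts + 1 < trace.size()) {
            const uint32_t target = trace[ts + 1].eip;  // where the RET landed
            // Search open calls from the innermost one outward.
            for (std::size_t i = open.size(); i-- > 0;) {
                CallRetRecord& r = t.records[open[i]];
                if (r.nxtImmEIP == target) {
                    r.tsRet = static_cast<long long>(ts);
                    t.tsToCallID[r.tsRet] = open[i];
                    open.resize(i);  // drop this call and anything nested inside
                    break;
                }
            }
            // No match: the RET is left out of the table (the "no match" case).
        }
    }
    return t;
}
```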

9:30AM 09/06/2013
-------------------------------------------------------------------------------
Task 119:  Use the pre-built function information
------------------------------------------------------------------------------
	(1) Design. [15 min]
	(2) update the algorithm of slice [15 min]
	(3) update the processFunction using the call table [25 min]
	(4) Debug processFunction:
		(a) step through [25 min]
		(b) test hasDependee case. [25 min]
	Problem: entireFunctionDependee needs fix.
2:30PM 09/06/2013
	(5) Problem: entireFunctionDependee needs fix. Change the logic, when the entry instruction
	is not in the section, then mark the entire function as entire function dependee [15 min]
		OK.
	(6) the -1 logic. and change warning level. [15 min]
	(7) problem: second slice fails . found the problem. raw trace missing.[25 min]

-------------------------------------------------------------------------------
Task 120:  Write to file
------------------------------------------------------------------------------
	(1) figure out how to call binwriter. [10 min] DONE.
	(2) call binwriter and write the file [25 min] DONE.
	(3) debug [15 min]
		(a) step through. OK
	(4) test. [20 min]
		Problem: 0x40103d (b=2) and 0x401057 (call ...) should not be included! Strangely they did
	not show up in the slicing algorithm. Debug: set write_slice

9:00AM
-------------------------------------------------------------------------------
Task 121: debug slicing. Find out why every executed instruction is included in slice 
------------------------------------------------------------------------------
	(1) bp on Trace.cc:579 and see if each InstrInfo is in slice.  [20 min]
	(2) conditional BP too slow. Use customized if branch to check if InstrInfo is messed. [20 min]
		They are never hit. So the problem is in write slice.
	(3) double check using xp image. works!

-------------------------------------------------------------------------------
Task 122: verify if the slice is executable
------------------------------------------------------------------------------
	(1) bp on entry of main. [15 min]
10:00AM
	(2) problem at 0x00406a61. Comparative study of the missing instruction:
		Break at call 0x00401330. [20 min]
		 The problem is at function call 0x00404A5c, it changes the value of EDI (but the
		correct version does not)
	(3) check function 0x00404a5c. Problem is with 0x0040473f. It changes the value of EDI.
		There is a bunch of pop instructions in the function body which reset the EDI register,
		and the sliced version is not right.
		It may be caused by the function call handling. [20 min]
	(4) Guess of the reason: [1] one instruction might be depended on for multiple reasons, esp and memory.
		The current code avoids it being processed twice, got to revert it. [2] first pass of
		function call match too relaxed, got to strengthen it. [20 min]
11:00AM
	(5)  cancel the redundancy check in dependency construction [15 min] DONE.
	(6) strengthen function call match first check [15 min] DONE.
	(7) run the system again and then check. [20 min]
		(a) unit test (a) generate full trace, and (b) slice. STILL DOES NOT WORK.
	(8) check if the slice works well by log
		0x00404A61 @ts 479689 --> 0x402741 @ts 479681 --> 0x40270c @ts 479531
		All of the above instructions are included

		Compare the ESP value side by side. At 0x40272f the ESP does not match.
		So need to change the logic again. When 2nd criteria is used to find function match,
		then the function cannot be USED! set the bOutHasDependency to true.
		fixed.


2:00PM 09/07/2013.
-------------------------------------------------------------------------------
Task 123: Found new problem. 0x00403d7a.
------------------------------------------------------------------------------
	Problem: At 0x404c0c ESI value is not right
	ESI value is from 0x404459. It is in slice, however, it is popping bad value.

	Guess of the reason: when an instruction is included in slice, it should forever be included.
	Thus, producing an executable slice should be incremental. If an instruction is newly
	included in slice, it may invalidate the decisions of earlier passes.

	Implementation plan:
	(1) declare a boolean variable bNeedsVisit [5 min] DONE
	(2) before skipping a timestamp, check its instruction, if in slice, set bNeedsMoreVisit [10 min] DONE
	(3) at the end, set the instrStore [5 min] SKIPPED.
	(4) add setInSlice(long long int ts) to Trace class [15 min]
	(5) replace all ier->setInSlice [15 min]
	(6) testing. [20 min] done. BUT DID NOT FIX THE PROBLEM.

	Had to check each function call and see if the pair of ESP/EBP is ok. The pairs of ESP/EBP are ok.
However, the problem is the contents, which are not popped correctly.
	Check the trace
	0x00404c0c (call esi, esi value not right) --> 0x404459 (pop esi) at 458042
			--> 0x404431 (push esi) at 458025 

		By setting bps, we found that until 0x004039b5, everything matches.

	*** problem found: call esi instruction did not yield the register dependency on esi!!!!.
	(1) check the raw trace. no finding.
	(2) BP on InstrInfo.cc:261. Problem is in register set up.
		It seems that for instruction call esi, esi is not included in op_ro.
	(3) try op_src and op_dest. they don't work.
	(4) try op_explicit. It is included in op_explicit.
		Use a temporary logic. If the explicit register is not included in the write set, then
		include it in the read set.
	(5) dependency problem fixed.
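
Note: the temporary rule from step (4) above, sketched on plain register names. effectiveReadSet and RegSet are hypothetical; the real code works on the disassembler's operand lists (op_ro, op_explicit), which are not modeled here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> RegSet;

// reportedReads: registers the disassembler already marks as read
// explicitOps:   registers that appear as explicit operands
// writes:        registers the instruction writes
// Rule: an explicit register that is not written is treated as read.
RegSet effectiveReadSet(const RegSet& reportedReads,
                        const std::vector<std::string>& explicitOps,
                        const RegSet& writes) {
    RegSet reads = reportedReads;
    for (const std::string& r : explicitOps) {
        if (writes.count(r) == 0) reads.insert(r);
    }
    return reads;
}
```

For "call esi" this adds esi to the read set; for "pop esi" it adds nothing, since esi is in the write set.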


-------------------------------------------------------------------------------
Task 124: Another bug: 00409ad6
------------------------------------------------------------------------------
	Observation: bad and correct behaviors depart at 0x00409ace (call esi)
	Problem is that instruction at 0x00409AB3 is ignored!!
	(1) check the set generation, BP InstrInfo.cc:289.
		Register set is ok. EAX is included in the output_reg of XOR eax, eax
	(2) Now the problem is to check why 0x00409AB3 is not included.
		The problem is that there is a setnz al instruction in between, which
	is approximated to eax. --> caused the problem!!!!
	Needs to redesign the register handling algorithm!!!!

-------------------------------------------------------------------------------
Task 125: Fix register handling
------------------------------------------------------------------------------
	(1) remove approx_reg_code [5 min] DONE
	(2) declare find_reg_code(int reg, int *reg_codes) return number of reg code [10 min] DONE
	(3) call find_reg_code [15 min] DONE
	(4) call gen_reg_code in register dependency analysis [15 min] DONE
	(4) implement find_reg_code [120 min]
		(a) declare all registers [20 min]	
		(b) declare map [60 min]
		(c) debug. [20 min]
		(d) double check all register mapping [20 min]
	(5) check handling of ESP/EBP. check to avoid duplicate register.

	(6) modify inSameGroup instruction to include the mapping
	(7) debug. ok [20 min]
	(8) debug the case on 0x00409ac3.[15 min]  It should have two dependencies:
			0x00409abd (setnz al)
			0x00409ab3 (xor eax, eax);
	(9) inspect full trace. [15 min] ok.
	(10) inspect now the slicing result. [15 min]
	(11) fix bug at Trace.cc:527
	(12) test [20 min] SLICE ALGORITHM NOW WORKS!
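
Note: a sketch of the register mapping idea for the EAX family only, assuming each register is broken into non-overlapping pieces and the last writer is tracked per piece. The enum values and LastWriters are hypothetical; only the find_reg_code(int reg, int *reg_codes) signature comes from the plan above.

```cpp
#include <cassert>
#include <set>

// Hypothetical register ids and sub-register pieces (EAX family only).
enum Reg { REG_AL, REG_AH, REG_AX, REG_EAX };
enum Piece { PIECE_AL, PIECE_AH, PIECE_EAX_HI16, PIECE_COUNT };

// Fills reg_codes with the pieces a register covers; returns how many.
int find_reg_code(int reg, int* reg_codes) {
    switch (reg) {
    case REG_AL:  reg_codes[0] = PIECE_AL; return 1;
    case REG_AH:  reg_codes[0] = PIECE_AH; return 1;
    case REG_AX:  reg_codes[0] = PIECE_AL; reg_codes[1] = PIECE_AH; return 2;
    case REG_EAX: reg_codes[0] = PIECE_AL; reg_codes[1] = PIECE_AH;
                  reg_codes[2] = PIECE_EAX_HI16; return 3;
    default:      return 0;
    }
}

// Tracks, per piece, the timestamp of the last instruction that wrote it.
class LastWriters {
public:
    LastWriters() { for (int i = 0; i < PIECE_COUNT; ++i) ts_[i] = -1; }
    void recordWrite(int reg, long long ts) {
        int codes[PIECE_COUNT];
        const int n = find_reg_code(reg, codes);
        for (int i = 0; i < n; ++i) ts_[codes[i]] = ts;
    }
    // A read depends on the last writer of every piece it covers.
    std::set<long long> readDeps(int reg) const {
        int codes[PIECE_COUNT];
        const int n = find_reg_code(reg, codes);
        std::set<long long> deps;
        for (int i = 0; i < n; ++i)
            if (ts_[codes[i]] >= 0) deps.insert(ts_[codes[i]]);
        return deps;
    }
private:
    long long ts_[PIECE_COUNT];
};
```

With this, a read of eax after "xor eax, eax" followed by "setnz al" depends on both writers, which is the two-dependency result expected in step (8).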

-------------------------------------------------------------------------------
Task 126: Add stats report
------------------------------------------------------------------------------
	(1) declare a function named stats_report() [5 min] DONE.
	(2) report the stats in trace [15 min] DONE. 
	(3) report the stats in InstrStore [10 min]. DONE
	(4) test. DONE.

-------------------------------------------------------------------------------
Task 127: Study a simple i/o input program.
------------------------------------------------------------------------------
	(1) create a simple getchar() program with a branch. [15 min]
	(2) debug it in windows and see how it works.
		It goes through several layers/wrapper of read, and finally called
			Kernel32.readFilea --> Kernel32.readConsoleA
				--> kernel32.7c8713f9 (its first parameter 0x0040f440 stores the
					I/O value)
				--> ntdll.CsrClientCallServer (at 7c8715bb)
				--> at 7c9132f3 calls zwRequestWaitReplyPort
				--> at 7c90e3eb calls KiFastSysCall
				--> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
	(3) trace it in qemu. 
		(a) fix bug on loading program. DONE
	(4) trace it on qemu. Around 2500 instructions in between
		ts: 600884 7c90eb8d (begin)
		ts: 602599 7c90eb94 (end)
		--- logic below-------
		There are too many in, out instructions, trace from address 0x41f440.
		Sequence of events from backward is:
			timeStamp: 602645, ins @7c87160d: repz movs es:[edi], ds:[esi]
 read: (start: 0x25069c, end: 0x25069e)  write: (start: 0x41f440, end: 0x41f442) , DEPLINKS:  , R: 602638 , R: 602639 , R: 602644 , M: 264761 -- verified. This is called right after
 CsrClientCallServer

		--trace 0x25069c -->	 **** THIS IS NOT RIGHT!!!!! It should be a timestamp
between 600884 and 602559 (it is verified that 0x25069c is overwritten during the syscall)
!!!
			timeStamp: 264761, ins @7c90256d: repz movs es:[edi], ds:[esi]
 read: (start: 0x12eea0, end: 0x12eed7)  write: (start: 0x25069c, end: 0x2506d3) , DEPLINKS:  , R: 264745 , R: 264747 , R: 264757 , M: 262754 , C: 264760 ESP: 0x12ed50 EBP: 0x12ed58

---------------- Another strange fact: there are no IN instructions in between 600884 and 602599.
	Now try to parse the logic between 600884 and 602599. In the following @ is followed by timestamp.
	(1) @600887, it pops fs, actually it sets fs to 0x00000030 [this must be the one for kernel],
		then the input cx register is copied to ds and es. This will affect the calculation of
		virtual address (segment), but not affect the translation from va to physical addr yet.
	(2) @600974, it resets fs:[0], SEH handler. 
	(3) at 601292, there is an OUT instruction , then it reads from 0x800ca300.
			at 601300, it repeats roughly the same.
	(4) @601797, it resets cr3.
	(5) @602099, lldt instruction, 
	(6) @602106, reset cr3.
-----------------
	Note lldt is reset twice during the period. This may have something to do with the reason
why 0x25069c cannot be traced. CR3 is also reset twice (one for switch into kernel mode and
	the other).

			

8:30AM 09/12/2013			
-------------------------------------------------------------------------------
Task 128: Continue study how I/O works
------------------------------------------------------------------------------
	(1) trace into sendkey BP on monitor.cc:4602
		It eventually calls ps2_queue to queue the keyboard event.
	(2) data is read by ps2_read_data. It is called by the following sequence of functions:
		#0  ps2_read_data (opaque=0x28df9e60) at hw/ps2.c:191
#1  0x0814c057 in kbd_read_data (opaque=0x28ddf2ac, addr=0, size=1) at hw/pckbd.c:323
#2  0x082b44cc in memory_region_read_accessor (opaque=0x28ddf2d0, addr=0, value=0xaa0fdd70, size=1, shift=0, mask=255)
    at /home/csc288/qemu/qemu-1.4.0/memory.c:322
#3  0x082b4709 in access_with_adjusted_size (addr=0, value=0xaa0fdd70, size=1, access_size_min=1, access_size_max=1, 
    access=0x82b441b <memory_region_read_accessor>, opaque=0x28ddf2d0) at /home/csc288/qemu/qemu-1.4.0/memory.c:370
#4  0x082b4a04 in memory_region_iorange_read (iorange=0x28df9df8, offset=0, width=1, data=0xaa0fdd70)
    at /home/csc288/qemu/qemu-1.4.0/memory.c:415
#5  0x082ace1d in ioport_readb_thunk (opaque=0x28df9df8, addr=96) at /home/csc288/qemu/qemu-1.4.0/ioport.c:186
#6  0x082ac940 in ioport_read (index=0, address=96) at /home/csc288/qemu/qemu-1.4.0/ioport.c:70
#7  0x082ad599 in cpu_inb (addr=96) at /home/csc288/qemu/qemu-1.4.0/ioport.c:310
#8  0x082fc0d5 in helper_inb (port=96) at /home/csc288/qemu/qemu-1.4.0/target-i386/misc_helper.c:77
#9  0xafa8ed96 in code_gen_buffer ()

	Plan: [1] restart the raw trace and set BP at helper_inb, trace into it while running b10.exe [15 min]
			helper_inb is called too often, try kbd_read_data
		[2] find out the corresponding eip and cr3.
			eip is 0x806f48ae, cr3 is 0x39000 [it's clearly not the cr3 of the target process]
			Instruction dump below: 
			@EIP 0x806f48ae: length: (1): in        %dx, %al
@EIP 0x806f48af: length: (3): ret       $0x0004
@EIP 0x806f48b2: length: (2): mov       %edi, %edi
@EIP 0x806f48b4: length: (2): xor       %eax, %eax
@EIP 0x806f48b6: length: (4): movl      0x4(%esp), %edx
@EIP 0x806f48ba: length: (2): in        %dx, %ax
@EIP 0x806f48bc: length: (3): ret       $0x0004

	After switching from cr3 0x39000 to 0xec400000 (the process to trace), the first instruction
is located at 0x804dbf63. Note that it appeared twice: @601798, @602107, both after a mov cr3, eax
instruction. It seems that somehow, somewhere it is switched to process 0x39000

	Next experiment: (1) first verify if the process id is always 0xec400000 --> verified yes
	(2) check when the switch happens, bp on helper_trace2 when cr3 is not 0xec400000

		The first instruction of 39000 is 0x804dbf67, and the last EIP (of process is 0x804dbf63)

	Experiment 2: check what's happening after 0x804dbf60 (switch cr3), first stop at handle_instr
to stop at the process. Then bp at helper_trace2. Too slow, so add if branches. Interesting observation:
before the switch of cr3, the instruction is as follows:
(gdb) print print_instrRange(0x804dbf60, 0x804dbf70, env)
@EIP 0x804dbf60: length: (3): mov       %eax, %cr3
@EIP 0x804dbf63: length: (4): movw      %cx, 0x66(%ebp)
@EIP 0x804dbf67: length: (2): ljmp      0x00000005
@EIP 0x804dbf69: length: (3): leal      (%ecx), %ecx
@EIP 0x804dbf6c: length: (3): movl      0x18(%ebx), %eax
@EIP 0x804dbf6f: length: (3): movl      0x3C(%ebx), %ecx

	After the switch of cr3, there comes the long jump instruction which jumps to 0x81f8f5c4 (with the
new cr3 0x39000), then it switches back. So in summary, the IN instruction is executed during this
period in which the trace is not recorded.

	Experiment 3: check how IN is handled. There are several cases and some AL values
do get saved.

******************* PAGE TABLE TRANSLATION ************************************
	Task 4: study how memory word is retrieved. It calls cpu_ldub_code. Debug into it.
It's defined in include/exec/softmmu_header.h:98, by delving into the logic
 it's possible to get the hardware address either from the soft MMU logic or
page table logic.
******************* PAGE TABLE TRANSLATION ************************************

Experiment 5: generate another log.
Log1:
		ts: 600884 7c90eb8d (begin)
		ts: 602599 7c90eb94 (end)
	1st cr3 switch: 601797 (relative to begin: 913) 
Log2: from 600599 to 
		602316
	first cr3 switch: 601515 (relative to begin: 916)
Nearly the same number of instructions (913 vs. 916 to the first cr3 switch)
	So it's not the context switch, it's fixed jump. And it's not using busy loop. It must be using
some type of 

9:30AM 09/13/2013
-------------------------------------------------------------------------------
Task 129: Fix the logic about telling "non-voluntary" context switch
------------------------------------------------------------------------------
Idea: add the case that jump, call, sysenter and their subsequent instructions should not be
regarded as a context switch. There is a slight chance that there really is a timer interrupt after these
instructions. We ignore that at this moment.
	Implementation Steps:
		(1) in Trace class declare a boolean attribute bRecordEnabled init true, 
			declare lastEIP [5 min] DONE.
		(2) define InstrInfo::getNxtImmAddr [5min] DONE
		(2) declare a function checkRecordStatus() [20 min] DONE
			check if it is context switch [non voluntary], if yes, then stop
			the recorder; if the mode is stopped, check if the ip is close to lastEIP, and re-enable the recorder.  [20 min]
		(3) use it and update all related functions. [15 min] DONE.
		(4) debug, bp on the above logic. [20 min] DONE
2:00PM 09/13/2013
		(5) TEST. read log file
			Problems: (a) dump EIP one instruction earlier. fixed. [15 min] 
					(b) fix problem at iret. [15 min] DONE.
					(c) fix problem at sysenter. [15 min]  DONE.
					(d) cr3 problem.  Verified it's ok. 
					(e) test the full trace and slicing.
						(e.1) handle the iret problem.
		DONE - rate is still around 60%.
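
Note: a sketch of the checkRecordStatus() idea from the steps above, assuming the caller already knows whether the instruction transfers control. RecordGate is hypothetical, and "close to lastEIP" is read here as "within 15 bytes after lastEIP" (15 is the maximum x86 instruction length); the real threshold is not given in the log.

```cpp
#include <cassert>
#include <cstdint>

class RecordGate {
public:
    RecordGate()
        : bRecordEnabled_(true), hasLast_(false), lastWasTransfer_(false),
          lastEIP_(0), lastNxtImm_(0) {}

    // isTransfer: jump, call, ret, sysenter, iret, ... (decided by the caller).
    // Returns true when this instruction should be recorded.
    bool checkRecordStatus(uint32_t eip, uint32_t len, bool isTransfer) {
        if (bRecordEnabled_) {
            // EIP moved somewhere unexpected although the previous instruction
            // was not a transfer: treat it as a non-voluntary context switch.
            if (hasLast_ && !lastWasTransfer_ && eip != lastNxtImm_) {
                bRecordEnabled_ = false;  // lastEIP_ stays as the resume point
                return false;
            }
        } else {
            // Stopped: resume only when execution is back near lastEIP_.
            const bool close = eip >= lastEIP_ && eip - lastEIP_ <= kResumeWindow;
            if (!close) return false;
            bRecordEnabled_ = true;
        }
        hasLast_ = true;
        lastEIP_ = eip;
        lastNxtImm_ = eip + len;
        lastWasTransfer_ = isTransfer;
        return true;
    }
    bool enabled() const { return bRecordEnabled_; }

private:
    static const uint32_t kResumeWindow = 15;  // assumed threshold
    bool bRecordEnabled_;
    bool hasLast_;
    bool lastWasTransfer_;
    uint32_t lastEIP_;
    uint32_t lastNxtImm_;
};
```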
4:00PM.

-------------------------------------------------------------------------------
Task 130: Handle CR3
------------------------------------------------------------------------------
	Idea: if current instruction is modifying CR3, for next immediate instruction's helper_trace,
		send event new_cr3. TraceManager receives the event and dispatches to the current trace,
		add the new_cr3 to trace (mapped it to current trace).
			When CR3 is switched back, TraceManager removes the CR3-to-trace mapping.
9:00AM 09/14/2013
	Implementation:
		(1) in InstrInfo class add an inline function to check if it is modifying CR3.
			debug into the function 
			OK.
		(1.5) record a new trace. [15 min] DONE.
		(2) add a type event for cr3_changed [15 min]  DONE.
		(3) when CR3 changes, invoke send event cr3_changed and debug it [20 min] DONE
		(3.5) debug it. [20 min] DONE.
10:15AM 
		(4.1) TraceManager::setTraceNeedsCR3Update [15 min]
		(4.2) declare TraceManager::handle_cr3_change and use it in BatchAnalyzer [15 min] DONE.
		(4.3) implement handle_cr3_change. [25 min] DONE.
2:40PM
		(4.4) in InstrExecRecorder call setTraceNeedsCR3Update, move cr3_to_watch to Trace class [25 min]
		(4.5) fix the handle_cr3_change [20 min]
		(5)  debug Trace::handle_instr [20 min]
			(5.1) fix the out of memory issue. DONE.
			(5.2) fix the setTraceNeedsUpdateCR3. DONE.
		(6) problem: proc_status is not updated. FIX:
			(6.1) TraceManager::getCR3ToWatch [10 min] DONE.
			(6.2) TraceManager::getCR3ToRemoveWatch [5 min] DONE.
			(6.3) getCR3ToWatch() and getCR3ToRemoveWatch() in handle.h [10 min]
			(6.4) update CR3_to_watch by calling getCR3ToWatch() and getCR3ToRemoveWatch() in ops_sse.h
					[15 min]
			(6.5) debug: starts from TraceManager::handle_cr3_event, and then trace into helper_trace2. 
					[20 min]. OK.
		(7) problem: cr3 not switched back to original. Remove the protection.
			
-------------------------------------------------------------------------------
Task 131: fix CR3 handling code
------------------------------------------------------------------------------
	(1) in Trace class, change cr3_to_watch to a vector, and modify the setCR3_to_watch.
		Treat the vector as a stack. If adding an existing cr3_to_watch, raise an error.
		create a function removeCR3_to_watch. Move both functions to the .cc file [20 min] DONE. 
	(2) Modify the TraceManager::handle_cr3 [15 min]
	(3) Debug:
		(1) general logic of handle_cr3_change. fix logic at 164. [15 min] OK.
		(2) check the logic of continuous adding. [20 min] OK.
		(3) FIX THE PROBLEM FROM 39000->00e4c000. Change the logic to pop until the last one. Add a function
			to check whether a CR3 is in the watch list. Raise a warning.
	(4) run and test. [20 min] 
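
Sketch of the watch-list-as-stack idea from Task 131. This is only one reading of "pop until the last one": when CR3 switches back to a value already on the stack, everything above it is popped. The type Cr3WatchStack, the capacity MAX_WATCH and all function names are hypothetical, not the names used in the Trace class.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WATCH 16   /* hypothetical capacity */

typedef struct {
    uint32_t cr3[MAX_WATCH];
    int count;
} Cr3WatchStack;

/* Returns 1 if cr3 is already on the watch list, 0 otherwise. */
static int cr3_in_watch(const Cr3WatchStack *s, uint32_t cr3) {
    for (int i = 0; i < s->count; i++)
        if (s->cr3[i] == cr3)
            return 1;
    return 0;
}

/* Push a new CR3. Returns 0 on success, -1 on duplicate or overflow
 * (the caller is expected to report the error). */
static int cr3_push(Cr3WatchStack *s, uint32_t cr3) {
    if (cr3_in_watch(s, cr3) || s->count == MAX_WATCH)
        return -1;
    s->cr3[s->count++] = cr3;
    return 0;
}

/* CR3 switched back to `cr3`: pop every entry above it.
 * Returns the number of popped entries, or -1 if `cr3` is not
 * watched at all (the caller is expected to print a warning). */
static int cr3_pop_until(Cr3WatchStack *s, uint32_t cr3) {
    if (!cr3_in_watch(s, cr3))
        return -1;
    int popped = 0;
    while (s->cr3[s->count - 1] != cr3) {
        s->count--;
        popped++;
    }
    return popped;
}
```

Every popped entry also has to be removed on the TraceManager side, otherwise more instructions than necessary keep being recorded.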

8:30AM 09/17/2013
-------------------------------------------------------------------------------
Task 132:  fix CR3 handling bugs
------------------------------------------------------------------------------
	(1) the warning message in removecr3. DONE.
	(2) fix the double delete error. FOUND the problem, when trace pops the cr3, TraceManager did not remove the
		cr3 correspondingly. This causes recording of more instructions than necessary! Add TraceManager parameter
		to trace. 
	(3) problem with task completion. When delete trace, need to add additional logic to check the trace's cr3 
		watch list one by one and remove these cr3 in watch list. FIXED.
-------------------------------------------------------------------------------
Task 133:  Now read the trace of b10.exe, and find out how the i/o is processed
------------------------------------------------------------------------------
	Trace to follow
				0x401014 (getchar)
			Kernel32.readFilea --> Kernel32.readConsoleA
				--> kernel32.7c8713f9 (its first parameter 0x0040f440 stores the
					I/O value)
				--> ntdll.CsrClientCallServer (at 7c8715bb)
				--> at 7c9132f3 calls zwRequestWaitReplyPort
				--> at 7c90e3eb calls KiFastSysCall
				--> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
   Problem: lost track of 401010. Did not even record the instructions until back from 401019.

	[1] Debug: set bp on helper_trace2 and check why 0x401010 is not hit. BP on 2394 of ops_sse.h. This time it
		works. Guess: maybe an inaccurate context switch leads to the problem? [context switch right after jump].
	7c8713f9: 590165
	7c8715bb: 590168
	START: 590672 @7c90eb8d
	END: 592000 @7c90eb94
	IN instruction and access of 0x25069c is still out of range.

	Job next: check if the END address is really the end address.
		Debug process: bp on 7c8715bb (this address is hit ONLY ONCE during execution), and then on
			7c90eb8d. 
		It should return at 7c90e3ed, and then return to 7c9132f8.
	So the correct exit address should be 908725 @7c9132f8. 
	Now corrected: START: 590672, END: 908725

		During this period, there are IN instructions at:
			691019, 691070, 691112, 691184, 
		During this period, there is no access to 0x25069c --> trace data
	[a] 0x401019 (@2255318) EAX -> 2255302 (@402bd3) --> depends on memory 0x12ff48, --> @2255261 (@402bc4) 
				--> depends on 0x41f440
				--> timeStamp: 2255084, ins @40c0bd: mov    [ebx], al
					 write: (start: 0x41f440, end: 0x41f440)
				--> timeStamp: 2250831, ins @7c87160d: repz movs    es:[edi], ds:[esi]
				 read: (start: 0x25069c, end: 0x25069e)  write: (start: 0x41f440, end: 0x41f442)
				--> 236530 [WRONG]. It must be written somewhere between 590672 and 908725 

	[2] effort: study the IN instructions between 590672 and 908725.
			691019, 691070, 691112, 691184, 

		691019 --> does not work. 

-------------------------------------------------------------------------------
Task 134:  Disable the CR3 switch logic and see if we can still capture the IN INSTRUCTION, and then
vary the input load and see if the number of IN instruction is changed.
------------------------------------------------------------------------------
	[1] disable the TraceManager::handle_instr [disable all, adding and switching CR3], also in
		Trace::handle_instr disable the setModifyCR3.  [10 min]
	[2] read the dump file.
		Check the following points	
				--> ntdll.CsrClientCallServer (at 7c8715bb)
				--> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
		It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1 (only once in entire duration).
	
	Identified:
		START: timeStamp: 473357, ins @7c8715bb: call  [0x7C801034]
		END: timeStamp: 475131, ins @7c9132f8: cmp   edi, ebx
			NOTE that 7c8715c1 (only hit once during entire execution) is located at timestamp 475159 [this verified]

CONCLUSION: There is kernel-mode code running under a different cr3, so we need to keep track of the physical memory for I/O!!!
	Idea: turn on the tracing of physical memory beginning at every sysenter and ending at sysexit. Build
	a reverse page table and map the physical memory being written back to sysenter [map it back to virtual addr].

9:00AM 09/18/2013
-------------------------------------------------------------------------------
Task 135:  Modify helper_trace_mem in tcg/i386/tcg-target.c so that the physical address 
 will be calculated
-------------------------------------------------------------------------------
	[1] develop a function vmAddrToPhyAddr, simulate /home/csc288/qemu/qemu-1.4.0/include/exec/softmmu_header.h [1 hr]
		(a) trace into cpu_ldub_code and study the logic. [20 min]
			page table access and virtual address translation are provided in cpu_x86_handle_mmu_fault.
		It seems that vmAddrToPhyAddr can be done without the access of page table. If trace_mem_access
	is called AFTER memory access is done, then memory access is already in TLB. This can be directly 
	used for calculating the physical address.
			Need to verify if the host address in the code is really the physical address.
		(b) study of the cpu_ldub_code function
			TARGET_PAGE_BITS is 12, defined in target-i386/cpu.h
			CPU_TLB_SIZE is defined as 1<<CPU_TLB_BITS (CPU_TLB_BITS is 8), i.e. 256. The index mask CPU_TLB_SIZE-1 is thus 255 (8 bits of 1's)
			page_index is actually the index of page inside TLB. [it is assumed that page is always loaded
				active in TLB when it is accessed].
				the entry of TLB table is defined in include/exec/cpu-defs.h. ??? in CPUTLBEntry, the field
					addend is used to add with virtual address to get the physical address, not sure
					the use of addr_read, etc.
				unlikely is defined as a macro "include/qemu/osdep.h:#define unlikely(x)   __builtin_expect(!!(x), 0)"
			It is only a branch-prediction hint: logically it is just x, expected to be false most of the time. The
			"unlikely" condition in cpu_ldub_code compares the TLB entry's addr_read with addr & TARGET_PAGE_MASK.
			TARGET_PAGE_MASK is defined as 0xFFFFF000 (i.e., ~((1<<TARGET_PAGE_BITS)-1)). This is used to
			tell if the address is in TLB. So here addr & TARGET_PAGE_MASK IS the "REAL PAGE INDEX"!!!! (note that
			variable page_index is actually the real page index mod the TLB size).
				If it is in TLB, then the ELSE BRANCH is executed. It first generates the host address (physical addr)
			by adding the addend. Then it uses the host_addr to retrieve the contents. So here, the host_addr
			is actually the ADDRESS in the host which stores the data. So here it is confirmed that
			host_addr here IS the physical address.
				If it does not match the TLB entry addr_read, there could be several complications, e.g.,
			unaligned access across pages. 
				case 1. I/O read. The I/O address is outside the normal page addresses. It is forwarded to io_read
				case 2. unaligned access. It first checks if this is TCG generated code. Then it calls
					slow_ldl_mmu. It handles the case of spanning over two pages. It loads the data in two
				pieces and then merges the data. When loading the data it is calling slow_ldl_cmmu.
				case 3. unaligned access in the same page. It is simply treated as normal access. 
				case 4. not in TLB. call tlb_fill
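
Sketch of the fast-path check studied in (b), reduced to a byte access so that the alignment bits drop out of the comparison. The constants follow the notes above (TARGET_PAGE_BITS 12, CPU_TLB_BITS 8); the function names tlb_index, tlb_tag and tlb_hit are made up here and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE (1u << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1))   /* 0xFFFFF000 */
#define CPU_TLB_BITS     8
#define CPU_TLB_SIZE     (1u << CPU_TLB_BITS)

/* Slot in the TLB table: the page number modulo the TLB size. */
static uint32_t tlb_index(uint32_t va) {
    return (va >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
}

/* Value the slot must hold for a hit: the page base of the address. */
static uint32_t tlb_tag(uint32_t va) {
    return va & TARGET_PAGE_MASK;
}

/* A hit needs an exact match. Any flag bit set in the low 12 bits of
 * the stored addr_read makes the compare fail and forces the slow
 * path (I/O, refill, ...). */
static int tlb_hit(uint32_t addr_read, uint32_t va) {
    assert(CPU_TLB_SIZE == 256);
    return addr_read == tlb_tag(va);
}
```

For addr 0x80087000 the slot is 0x87 and the tag is 0x80087000.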
		(c) logic of tlb_fill (located at /home/csc288/qemu/qemu-1.4.0/target-i386/mem_helper.c:137)
				It first calls handle_mmu_fault. 
					it first checks whether paging (the PG bit) is enabled in cr0. Then it checks the PAE flag.  Different treatment.
					Read page table (logic starts from line 717!!!)
						the page directory base address is stored as a part of env->cr[3]
						E.g., addr is 0x80087000, env->cr[3] = 0x39000
						pde_addr is env->cr[3] + ((addr>>20) & 0xffc) = 0x39800 [note 0xffc is 1111 1111 1100]

							Interpretation: 0x39000 is the starting address of the first level page table.
						Note the operation (addr>>20) & 0xffc. This essentially takes the leftmost 10 bits and 
						then multiplies by 4 (because each entry in the page table is 4 bytes). This yields (
						from addr 0x80087000) the entry offset in the first-level page table: 0x800. Thus pde_addr 0x39800
						is the corresponding entry address in the 1st-level page table. Note that the 1st-level page
						table is called page_directory.

						line: 718 pde = ldl_phys(pde_addr), this is the page directory entry, which points to the 2nd-level page table
							pde in debug session is 0x3b003. It is a combination of the table address and flags.
								It has to be first verified with 
								PG_PRESENT_MASK=1. Thus it is a real address, otherwise this is a page fault.
								then it checks the cr[4] PSE bit, which decides whether the page size is 4MB. --> 
								here the page size is 4KB. Then check the PG_ACCESSED_MASK, and save it if this is the
								first time the page is accessed.
							pte_addr is calculated as pte_addr = ((pde & ~0xfff) + ((addr >> 10) & 0xffc))
								This is actually ((addr >> 10) & 0xffc) to take 
								bits 12-21 (the middle 10 bits), multiply by 4, and add to the table base.
								so pde & ~0xfff is the beginning address of the 2nd-level page table, and then
								pte_addr is the address of the page entry.
								pte_addr is now 0x3b21c.
							pte is then the real page entry; its value is 0x87063.
								Similarly, its lower bits are padded with flags.
							ptep is calculated as pte & pde (not sure why it's needed) -- seems to be getting the 
								conjunction of flags
						line: 821. virt_addr is the page starting address for the address
								virt_addr is 0x80087000 by clearing out the last 12 bits.
						line: 841 do the mapping
							page_offset is the offset of the address in the page. That is 0x0 for 0x80087000
							paddr is (pte & TARGET_PAGE_MASK) + page_offset is 0x87000
								note that TARGET_PAGE_MASK wipes the last 12 bits of pte to 0.
							So pte entry with last 12 bits 0 is the REAL ENTRY ADDRESS of the page.
							paddr is the physical address.
						Then it calls tlb_set_page, where the physical page is actually mapped to a host memory page.
							it calls phys_page_find, which returns the section that the physical page is located in.
							Here "section" seems to be the private data structure used by QEMU for 
							memory management.
						Then at line 270, addend = (uintptr_t)memory_region_get_ram_ptr(section->mr)
							memory_region_get_ram_ptr (mr=0x28d86918) at /home/csc288/qemu/qemu-1.4.0/memory.c:1150
							It goes through a list of ram blocks and checks if a block offset matches the
							given address, then it returns the corresponding host address.
						So the addend is actually the corresponding host address (in the emulator) for the physical address.
		************************************************************************************************************
		Conclusion: in TLB, virtual address is actually mapped to host address (instead of the real physical address).
			In page table, virtual address is mapped to physical address. To translate physical address, need
				to call memory_region_get_ram_ptr to calculate the addend.
		************************************************************************************************************
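
Worked check of the non-PAE 4KB page walk traced in (c), using the values seen in the debug session (cr3 0x39000, pde 0x3b003, pte 0x87063, addr 0x80087000). The helper names are made up for this note; only the arithmetic is taken from the walk above, and the a20 mask and the 4MB/PAE branches are left out.

```c
#include <assert.h>
#include <stdint.h>

#define PG_PRESENT_MASK 0x1u
#define PAGE_MASK_4K    0xfffff000u

/* Address of the page directory entry: top 10 bits of va, times 4. */
static uint32_t pde_addr(uint32_t cr3, uint32_t va) {
    return (cr3 & PAGE_MASK_4K) + ((va >> 20) & 0xffc);
}

/* Address of the page table entry: bits 12-21 of va, times 4. */
static uint32_t pte_addr(uint32_t pde, uint32_t va) {
    return (pde & PAGE_MASK_4K) + ((va >> 10) & 0xffc);
}

/* Guest physical address: page frame from the pte plus page offset. */
static uint32_t phys_addr(uint32_t pte, uint32_t va) {
    return (pte & PAGE_MASK_4K) + (va & 0xfff);
}

/* Bit 0 of a pde/pte says whether the entry is present. */
static int entry_present(uint32_t entry) {
    return (entry & PG_PRESENT_MASK) != 0;
}
```

With the session values this gives pde_addr 0x39800, pte_addr 0x3b21c and paddr 0x87000, matching the numbers above. The result is still a guest physical address; the host address needs the addend on top of it.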

	
2:30PM						
-------------------------------------------------------------------------------
Task 136:  design the physical memory tracing system.
-------------------------------------------------------------------------------
	Q1. trace the real physical address or the host address?
		Q1.1 is helper_trace_mem called after or before the memory access?
			[1]check tcg/i386/tcg-target.c, helper_trace_mem is placed before memory read/write. So
			they may not be in cache yet.
			[2] make an experiment and try to move them after the real memory/read/write and see if it works.
			[3] strangely, the dump seems ok, however, there are a lot of unrelated ERROR messages on 
				non-consecutive memory access. VERIFIED: the call cannot be moved; it can only be placed at the
				beginning, because the data might be destroyed. Thus, host_addr is not usable
	Decision: take the host addr, it might be still faster.

	Design in general:
	(1) add a function vaToha which maps from virtual address to host address
	(2) insert the call vaToha into helper_trace_mem, it's going to slow down it a little bit
	(3) modify handle_mem_read, handle_mem_write and parameters phyaddr, phyaddr2
	(4) declare capture_physical_mem in handle.h and use it in handle_mem_trace
	(5) provide a function in QEMU for building reverse_page table, it calls add_entry page table
	(6) provide a data structure called page_table, it can be used for modeling both reverse_page table and the
		regular page table.
	(7) How to monitor the writing to page table?

-------------------------------------------------------------------------------
Task 137:  Implement the function va_to_ha()
-------------------------------------------------------------------------------
	(1) declare the function in /home/csc288/qemu/qemu-1.4.0/include/exec/softmmu_header.h [DONE]
		simulate tlb_fill in /home/csc288/qemu/qemu-1.4.0/target-i386/mem_helper.c:137
	(2) add the logic for handling other cases. [DONE]
	(3) debug. place it in helper_trace_mem
9:00AM 09/19/2013
	(4) add the handling of unaligned access within the same page. [15 min] DONE
	(5) handle I/O processing. [20 min] DONE
	(6) handle unaligned access across pages [90 min]
	(8) debug unaligned I/O access[20 min]
			check the regular handling of 0xf1bce --> it's always mapped to I/O port 1
	(9) I/O processing unaligned has to be very accurate. Cannot be over approximated. FIX it [30 min]
			DONE
	(10) fix the unaligned access across pages. [20 min]. debug and find if it returns the same.
		va: 0x1ffe5ffd, ha1: 0xa98e2ffc, ha2: 0xaa0fde18
	Now switch to the real logic and see what is going on:
			check what happens if the branch is disabled. Still does not work.
		Identified: it's the reload-tlb logic that causes the problem.
		
	(11) Try to figure out why reload tlb causes the problem.
		enable the branch and see how many times it is hit.
		the bad news is that it is hit many, many times before causing the blue screen.
		It happens at around 35000 hits, however the count is not fixed.

		Effort 2: add a global variable as "last addr" record it and place a bp on it.
		Effort 3: bp on raise_exception first.

		Confirmed: it's 216c4. the problem is it goes to iotable, which is wrong.
		observe the values 148.
			Normal case: 147->148->194 [tlb_addr: 0x21000]
			Bad case: 147->148->150 [tlb_addr: 0x21010]
			check the value of TARGET_PAGE_MASK
			The problem seems to be that tlb_addr is 0x21010 (not cleanly page-aligned). debug into tlb_fill
		and see what's the problem.

				The tlb_addr: 0x21010 is caused by the following:
			} else if (memory_region_is_ram(section->mr)
			           && !cpu_physical_memory_is_dirty(
			                   section->mr->ram_addr
			                   + memory_region_section_addr(section, paddr))) {
			    te->addr_write = address | TLB_NOTDIRTY;
			}

	So the problem is that we refilled the TLB but did not do the write that clears the TLB_NOTDIRTY tag.
Need to clear the tag.
	Directly modifying it back does not seem to solve the problem. Make a minor change to check if it is reloaded and then change. Still does not work.
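
Small check of why tlb_addr 0x21010 takes the bad path: the low 12 bits of a TLB address carry flags, and 0x10 is the TLB_NOTDIRTY bit. The value (1 << 4) is assumed to match cpu-all.h in this QEMU version (it is consistent with the 0x21010 observed above); the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_MASK 0xfffff000u
#define TLB_NOTDIRTY     (1u << 4)   /* assumed value, see note above */

/* Any bit below the page mask is a flag, not part of the address. */
static int tlb_addr_has_flags(uint32_t tlb_addr) {
    return (tlb_addr & ~TARGET_PAGE_MASK) != 0;
}

/* Set when the entry refers to a clean RAM page. */
static int tlb_addr_is_notdirty(uint32_t tlb_addr) {
    return (tlb_addr & TLB_NOTDIRTY) != 0;
}

/* Page base with all flag bits stripped. */
static uint32_t tlb_addr_page(uint32_t tlb_addr) {
    assert(TLB_NOTDIRTY < 0x1000u);   /* the flag lives inside the page offset bits */
    return tlb_addr & TARGET_PAGE_MASK;
}
```

So 0x21000 is a plain entry, while 0x21010 is the same page with TLB_NOTDIRTY set, which is why the access is diverted away from the direct RAM path.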

		Attempt 4: comment out the reload again. Still buggy. 
		Attempt 5: comment out the read also. Remove all early returns. Add a check of -1 at the end.
			-1 check does not work.
			Found that read also causes problem. Make another experiment, enable write but disable read.
		Strangely, it still does not work. Both need to be disabled; however, the problem does not
		always occur.

		Attempt 6: compare the trace after the last memory write of 0x216c4.
			@EIP 0xc01d9: length: (1): iret
			@EIP 0x20ece: length: (2): les  %esp, %eax

		Strangely, the bp are never hit!

		Attempt 7: just skip the load_tlb for 0x216c4 and see what happens. Does not work.

		Attempt 8: create a function similar to tlb_fill that does not fill the tlb table, in
			target-i386/mem_helper.c. This time it works.


7:30AM
-------------------------------------------------------------------------------
Task 138:  test precision of the tlb_fill simulator
-------------------------------------------------------------------------------
	[1] Design: 
		(1) capture a VA which uses get_ha read first
				va: fd094, ha: 0xaa442094
		(2) then set a bp on tlb_fill and step back to the main function and see what's the ha 
				verified
		(3) repeat for get_ha write
				va: 0xe0c3c, ha: 0x899ddc3c
				verified
		(5) check large page
				bp on mem_helper.c:512, check the return and then check the helper_ldl
				va: 0x806f9088, size: 0x400000 (4MB). -> ha: 0x89ff6088
				verified. The va_add_large_page only changes the TLB attribute for full flush
				when large page invalidated. it's not going to change the translation.
		(4) check the case it returns -1. The first write returns -1.
				bp on write and then bp on tlb_fill and the entire get_ha function 
				va: 0x4ea005, second visit transfer to 0x97d7a005
				verified, it works. The system will generate a page fault, and after some 
			interrupt handler, it handles the page fault properly and will call the va_to_ha
			properly.
			
DONE.	

9:00AM
-------------------------------------------------------------------------------
Task 139:  Improve the precision of unaligned access
-------------------------------------------------------------------------------
	[1] change the starting addr and size handling.
	[2] debug: bp on softmmu_header.h:193  DONE
	[3] fix the documentation of va_to_ha
	[2] debug: break helper_ldl(b)_mmu
DONE.
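
Sketch of the start/size handling for an unaligned access that may cross a page boundary: the access is split into at most two pieces so that each piece can be translated on its own. Piece, PG_SIZE and split_access are hypothetical names for illustration, not the code in softmmu_header.h.

```c
#include <assert.h>
#include <stdint.h>

#define PG_SIZE 0x1000u

typedef struct {
    uint32_t start;   /* virtual address of the piece */
    uint32_t len;     /* number of bytes in the piece */
} Piece;

/* Split [va, va + size) at the page boundary.
 * Returns the number of pieces written to out (1 or 2). */
static int split_access(uint32_t va, uint32_t size, Piece out[2]) {
    uint32_t room = PG_SIZE - (va & (PG_SIZE - 1));   /* bytes left in this page */
    assert(size > 0 && size <= PG_SIZE);
    out[0].start = va;
    if (size <= room) {
        out[0].len = size;
        return 1;
    }
    out[0].len = room;
    out[1].start = va + room;     /* first byte of the next page */
    out[1].len = size - room;
    return 2;
}
```

For the 4-byte access at va 0x1ffe5ffd this gives 3 bytes in the first page and 1 byte at 0x1ffe6000, which is why two unrelated host addresses show up for one access.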

-------------------------------------------------------------------------------
Task 140:  improve memory read/write storage
-------------------------------------------------------------------------------
	(1) add class memRange. all members public. supports function [20 min] DONE
		isMergableWith(start, length)
		mergeWith(start, length)
		all inline function
	(2) test memRange [30 min] DONE
		(1) test isMergable
		(2) test mergeWith
	(3) add class memRangeManager. support functions: [20 min]
		addRange(start, length)
		getCount
		getArrRanges
		copyFrom(memRangeManager)
	(3) test memRangeManager. [25 min]
DONE.
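
Sketch of the memRange operations from Task 140, written as a C struct with free functions instead of the class. It assumes that "mergable" means overlapping or directly adjacent; the actual rule in memRange may differ. The names MemRange, is_mergable_with and merge_with are stand-ins.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t start;
    uint32_t length;
} MemRange;

/* 1 if [start, start+length) overlaps or touches the range r. */
static int is_mergable_with(const MemRange *r, uint32_t start, uint32_t length) {
    uint64_t r_end = (uint64_t)r->start + r->length;   /* exclusive end */
    uint64_t end = (uint64_t)start + length;           /* exclusive end */
    return start <= r_end && r->start <= end;
}

/* Grow r so that it also covers [start, start+length). */
static void merge_with(MemRange *r, uint32_t start, uint32_t length) {
    assert(is_mergable_with(r, start, length));
    uint64_t r_end = (uint64_t)r->start + r->length;
    uint64_t end = (uint64_t)start + length;
    uint32_t new_start = start < r->start ? start : r->start;
    uint64_t new_end = end > r_end ? end : r_end;
    r->start = new_start;
    r->length = (uint32_t)(new_end - new_start);
}
```

Example: the read (start: 0x25069c, end: 0x25069e) is a range of length 3; a following 1-byte access at 0x25069f is adjacent and merges into a range of length 4.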

12:00PM
-------------------------------------------------------------------------------
Task 141:  Modify InstrExecRecorder class to accommodate the memRangeManager
-------------------------------------------------------------------------------
	(1) remove readMemAddr etc. [DONE]
	(2) fix the expandFromRaw.  [DONE]
	(3) fix the mock [DONE]
	(4) fix handle_instr [DONE]
2:30PM
	(5) fix memory access  [DONE]
	(7) add memRange.serializeTo and deserialize [20 min] DONE
	(6) fix serialize  [15 min] [DONE]
	(7) fix deserialize [15 min] DONE.
	(8) fix dump [10 min] DONE.
	(9) fix testRecorder [15 min]
	(1) test. [20 min]
DONE
4:30PM.

10:00AM 09/21/2013
-------------------------------------------------------------------------------
Task 142:  Set up the physical mem tracing mechanism
-------------------------------------------------------------------------------
	Idea: whenever there is a CR3 change instruction, switch to physical memory tracing mode

	[1] in handle.h add function isModifyCR3(cr3, eip, opcode) [15 min] DONE
	[2] in helper_trace2, add a branch to test isModifyCR3 and call two functions:  DONE
		buildPageTable(cr3), and setPhyMemTrace(cr3) [15 min]	
	[3] implement the setPhyMemTrace function in handle.cc [15 min] DONE.
	[4] debug [20 min] DONE.
10:40 DONE.

11:00AM
-------------------------------------------------------------------------------
Task 143:  Read Page Table
-------------------------------------------------------------------------------
	[1] set up the framework. Declare the following: [15 min] DONE
		(1) build_page_map in include/exec/softmmu_header.h 
		(2) add_page_map in handle.h
	[2] debug into the template [10 min]
	[2] implement build_page_map in softmmu_header.h [1 hr]
		(1) copy from va_to_ha DONE.
		(2) handle_segment mapping case. DONE
		(3) handle CR4 case. skip it at this moment. DONE.
		(4) handle non pae case. add a function. DONE.
		(5) handle normal 4k page case. DONE.
		(6) virtual page number to physical page number. DONE
	4:20PM
		(7) remove in pte_to_ha. DONE. [10 min]
		(8) fix build_map_for_non_pae [25 min]
		(9) add function set_page_size(cr3, pagesize); [10 min]
		(10) fix build_page_map [15 min]
	[3] debug
		(1) all branches of build_page_map. DONE. the other two branches never encountered. [10 min]	
		(2) debug into build_map_for_non_pae [15 min]
		(3) fix << problem. [5 min] DONE
		(4) debug pte_to_ha. [15 min] DONE.
10:15AM 09/22/2013
		(5) check how many pages are generated. done. Around 10k pages * 4k = 40MB?
	

-------------------------------------------------------------------------------
Task 144:  Create page_map class
-------------------------------------------------------------------------------
	[1] define a page_map class, extended later. [15 min] DONE
	[2] function clear() [5 min] DONE
	[3] function add_page() [5 min]
	[4] function ppage_to_vpage [5 min] DONE
	[5] function vpage_to_ppage [5 min] DONE
	[6] function va_to_ha [10 min] DONE
	[7] function ha_to_va [10 min] DONE
	[8] debug and verify the entire system.
		[1] get cr3, and then set a BP on tcg_target.c:1252
			get three sets of va and ha
			va: 0x20044, ha: 0x98030044
			va: 0xf74dbc28, ha: 0x97dccc28
			va: 0xe10010e4, ha: 0x8c08f0e4
		[2] fix set page size. DONE
		[3] add_page_map has not been done yet.
		[2] bp on Trace::handle_instr and call trace.pagemap.va_to_ha
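
Sketch of what page_map has to provide: a page-granular map that can be queried in both directions (va_to_ha and ha_to_va). It uses a flat array with linear search just to show the idea; the capacity, the NO_ADDR marker and all names are hypothetical, and host addresses are assumed to fit in 32 bits as in the pairs captured above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS   12
#define OFFSET_MASK 0xfffu
#define MAX_PAGES   64            /* hypothetical capacity */
#define NO_ADDR     0xffffffffu   /* hypothetical "not mapped" marker */

typedef struct {
    uint32_t vpage[MAX_PAGES];    /* virtual page numbers */
    uint32_t hpage[MAX_PAGES];    /* host page numbers */
    int count;
} PageMap;

static void page_map_clear(PageMap *m) { m->count = 0; }

/* Record that the page of va is backed by the page of ha. */
static int page_map_add(PageMap *m, uint32_t va, uint32_t ha) {
    assert((va & OFFSET_MASK) == (ha & OFFSET_MASK));
    if (m->count == MAX_PAGES)
        return -1;
    m->vpage[m->count] = va >> PAGE_BITS;
    m->hpage[m->count] = ha >> PAGE_BITS;
    m->count++;
    return 0;
}

/* Forward lookup: virtual address to host address. */
static uint32_t page_map_va_to_ha(const PageMap *m, uint32_t va) {
    for (int i = 0; i < m->count; i++)
        if (m->vpage[i] == (va >> PAGE_BITS))
            return (m->hpage[i] << PAGE_BITS) | (va & OFFSET_MASK);
    return NO_ADDR;
}

/* Reverse lookup: host address back to virtual address. */
static uint32_t page_map_ha_to_va(const PageMap *m, uint32_t ha) {
    for (int i = 0; i < m->count; i++)
        if (m->hpage[i] == (ha >> PAGE_BITS))
            return (m->vpage[i] << PAGE_BITS) | (ha & OFFSET_MASK);
    return NO_ADDR;
}
```

The three captured va/ha pairs can serve as test data: after adding them, va 0x20044 maps to 0x98030044 and ha 0x97dccc28 maps back to 0xf74dbc28.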

7:30AM 09/23/2013
There are still bugs: va 0x401010 does not work!!!
	0x00401010: first-level page number 0x001, second-level page number 0x001, page offset: 0x010 
	
	BP on target_i386/mem_helper.c: 182
	Found a page number calculation error.

	Still did not correct the problem.
	Found that page_width is still not right.
	Fixed.

	using ha, we can directly use "x/10i $ha_value" to verify that the stored binary instructions
correspond to code!
	page_map success now
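
Quick way to double check the page number calculation by hand: split a 32-bit va into its 10/10/12 bit fields (non-PAE, 4KB pages). The function names are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Index into the page directory: top 10 bits. */
static uint32_t dir_index(uint32_t va) {
    return va >> 22;
}

/* Index into the 2nd-level page table: middle 10 bits. */
static uint32_t table_index(uint32_t va) {
    return (va >> 12) & 0x3ff;
}

/* Offset inside the 4KB page: low 12 bits. */
static uint32_t page_offset(uint32_t va) {
    assert((0x3ffu << 12 | 0xfffu) == 0x3fffffu);   /* the fields do not overlap */
    return va & 0xfff;
}
```

For va 0x00401010 this gives 1, 1 and 0x010, the values written above.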

	[9] test efficiency of the system. OK. a little bit slow. Later run profiling tools on it.
	
	
	
9:30AM  09/23/2013
-------------------------------------------------------------------------------
Task 145:  make the CR3 memory change working.
-------------------------------------------------------------------------------
	[1] Design [20 min] DONE
	[2] change InstrExecRecorder handle_mem_read/handle_mem_write, change size to real size,
		update documents [5 min] DONE
	[3] change Trace::handle_mem_read/handle_mem_write, change size to real size [5 min] DONE
	[4] change handle.cc handle_mem_read [10 min] DONE
	[5] test if everything works. [10 min] DONE
	[7] implement TraceManager::handle_phy_mem_access(unsigned int addr, int realsize) [10 min]

10:30AM
	[8] implement Trace::handle_phy_mem_access [15 min] DONE.
	[8] in tcg/i386/tcg-out.c:1249, call va_to_ha and then call TraceManager::handle_phy_mem_access
		[8 min] DONE
	[9] debug[20 min]
		[1] tcg-out DONE
		[2] TraceManager::handle_phy_mem DONE
		[3] Trace::handle_phy_mem DONE.

11:20AM
	[10] fix the bTracePhyMem logic enable/disable [20 min] DONE
		add lastCR3ChangeEIP to Trace.
	[11] debug it.	 [25 min]
		(1) check it is disabled  OK.
		(2) if record enabled is false, do not trace physical memory. OK.
12:30PM.
	[12] debug the trace memory, see where it gets captured.
		BP on 1410. Did not get hit once.
3:00PM
	[13] disable the check on RecordEnabled and see the result.
		Still does not work.
	[14] figure out if it's the bug of the capture code or it's the entire mechanism.
		(1) find the physical address of 0x0025069c; it is accessed by the following instruction
			and eventually contains the data:
				timeStamp: 475197, ins @7c87160d: repz movs es:[edi], ds:[esi]
 read: (start: 0x25069c, end: 0x25069e)  write: (start: 0x41f440, end: 0x41f442)
			va: 0x0025069c, ha: 0x9803469c
			verified: 0x9803469c:     0x000a0d61

	Run another time: always the same address.
		Now we can set a watch point, use hardware breakpoint otherwise it's too slow.
		awatch *9803469c
		Captured! the eip is 0x75b443d5
			next eip is 0x75b443d5, cr3 is 0x5dee000
		The translation back from ha is correct, however, Trace::handle_phy_mem_access is not called!

			@EIP 0x75b443d5: length: (2): repz movsl        %ds:(%esi), %es:(%edi)

		Debug Idea:
			(1) embed logic at helper_trace_mem and helper_trace2 when eip is 0x75b443d5
				ops_sse.h:2374
				tcg-target.c:1251

			in handle_instr bp on 7c87160d, and then bp on va_to_ha and get the hw address. verify
	whether it contains data or not.
		(2) modify the system and find the write access on 0x0025069c
				Found the bug: passed the va not the ha!
		(2.5) check two address translation case. passed!
		(3) Problem: it recorded too many addresses! Verified, it is recording kernel space addresses.
We need to modify  build_page_map to EXCLUDE those special pages. List all the flags:
			PG_USER_MASK 1<<2
		in mem_helper.c:197, 213, 233
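
Sketch of the filter to exclude kernel pages when building the page map: a page counts as user-accessible only if the user bit is set at both levels, the same way ptep combines pde and pte. PG_USER_MASK is 1<<2 as listed above; is_user_page is a made-up name, and the sample user-page entries in the note below are invented values.

```c
#include <assert.h>
#include <stdint.h>

#define PG_PRESENT_MASK (1u << 0)
#define PG_USER_MASK    (1u << 2)

/* 1 if the page is present and user-accessible at both levels. */
static int is_user_page(uint32_t pde, uint32_t pte) {
    uint32_t both = pde & pte;   /* flags set in the pde AND in the pte */
    assert(PG_USER_MASK == 0x4u);
    return (both & PG_PRESENT_MASK) && (both & PG_USER_MASK);
}
```

The kernel mapping seen earlier (pde 0x3b003, pte 0x87063) has no user bit and is rejected; invented entries such as pde 0x5def067 and pte 0x1234067 would be kept.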

		Still minor problems:
		[1] (gdb) p/x addr
			$12 = 0x8993e300
			(gdb) p/x va
			$13 = 0x7ffe0300
			check page table 511-992
			This one seems ok, check xp image!!!!
			verified, this is a legal address.

		
8:30AM  09/24/2013
-------------------------------------------------------------------------------
Task 146:  fix the CR3 memory capacity problem.
-------------------------------------------------------------------------------
	[1] add a printf and see the memory range. [25 min]
		There are two problems: [a] merge problem, [b] address not in range.
	[2] merge problem [15 min]
		Check the merge problem first.
		(a) enlarge the capacity and see what's going on.
		Non-mergeable ranges do cause problems
	[3] solve the merge problem [45 min]
		(a) in memRangeManager implement a minimize() function [25 min] DONE
		(b) test it [10 min] DONE
		(c) call minimize when it reaches capacity [10 min]  DONE
DONE 10:00AM
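
Sketch of one way minimize() can work: sort the ranges by start address and coalesce neighbours that overlap or touch. This is an assumption about the approach, not the actual implementation; Range, cmp_start and this minimize signature are made up for the note.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t start;
    uint32_t length;
} Range;

/* qsort comparator: order by start address. */
static int cmp_start(const void *a, const void *b) {
    const Range *x = a;
    const Range *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort r[0..count) and merge overlapping or adjacent ranges in place.
 * Returns the new number of ranges. */
static int minimize(Range *r, int count) {
    assert(count >= 0);
    if (count <= 1)
        return count;
    qsort(r, (size_t)count, sizeof r[0], cmp_start);
    int out = 0;   /* index of the range currently being grown */
    for (int i = 1; i < count; i++) {
        uint64_t out_end = (uint64_t)r[out].start + r[out].length;
        if (r[i].start <= out_end) {
            uint64_t end = (uint64_t)r[i].start + r[i].length;
            if (end > out_end)
                r[out].length = (uint32_t)(end - r[out].start);
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}
```

Calling this only when the manager reaches capacity keeps the common addRange path cheap.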
	[4] go back, shrink the capacity and find out the not-in-range addresses [15 min]
		Not-in-range addresses: 0x8993e2f0, 0x8d011620
		p1no: 550 , p2no: 318
		pde: 0x2423163
		page_no: 8993e
		It seems that page_no 8993e is never added, check pagemap result ha_to_va and va_to_ha

		add test logic to code in pageMap::add_page and  IT IS NEVER HIT!
		check how come it's added into range. Found the bug at trace.h:1441
		Now switch back to 500. Now it's fine (even 200 is not fine).

11:30AM
	[5] now verify the correctness of the analysis. 
				--> ntdll.CsrClientCallServer (at 7c8715bb)
				--> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
		It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1 (only once in entire duration).
		Start: timeStamp: 473548, ins @7c8715bb: call  [0x7C801034]
		End: timeStamp: 475755, ins @7c8715c1: cmp   [ebp-0x9C], edi
		During period (473548, 475755), there are 2 cr3 change instructions:
			timeStamp: 474518, ins @804dbf60: mov   cr3, eax
 read: (start: 0x250688, end: 0x25269f)  read: (start: 0x250690, end: 0x250693)  read: (start: 0x7c9110d8, end: 0x7c9110d8)  read: (start: 0x7ffe02f0, end: 0x7ffe02f0)  read: (start: 0x7c911059, end: 0x7c911059)  read: (start: 0x7c902688, end: 0x7c90268b)  read: (start: 0x7ffe0300, end: 0x7ffe0307) , ESP: 0xf7584c34 -> 0xf74dbc70

			timeStamp: 474827, ins @804dbf60: mov   cr3, eax a lot of read/write
		OK. 0x25069c there.
	[6] improve the packing code. [15 min]
	[7] generate the full trace and check the dependency. first check the instruction that reads 0x25069c after timestamp
		475755. then check which instruction it depends on:
			timeStamp: 476542, ins @7c87160d: repz movs es:[edi], ds:[esi]
 read: (start: 0x25069c, end: 0x25069e)  write: (start: 0x41f440, end: 0x41f442)

2:30PM
	Found that there is a segmentation fault: id=475984
		(1) bp on the deserialize function. did not find out the problem. it seems that it starts to break at index 10.
		(2) bp on memRangeManager::serialize when its count greater than 10, look at the memory dump and check it back
		found an additional bug: mock_memory_access

	Every time it broke at a different index: 10, 36
			Found that the problem may be caused by InstrExecRecorder.serialize - size. The size is not right!

7:30PM
	Check how it's serialized. In serialization: size is ok 1109 given around 180 entries in myRead_VA. 
	timestamp: 474693, size: 1109, its position: 9512044.

	Guess: maybe there is memory overwriting. Check the record size of cache.

	New bug: fix append_record error. Could not find out why total_size is being overwritten.
		After total_size 16388 it generates the error


9:00AM 09/25/2013
-------------------------------------------------------------------------------
Task 146:  Fix the cache append problem
-------------------------------------------------------------------------------
	[1] find out how it is inconsistent.
		(a) BP on Cache.cc:90
		(b) then BP on InstrInfo::appendToCache
		(c) also BP on InstrInfo::loadFromCache - disable first.
		Observe until id 16388 = 16 * 1024 + 4
	[2] Observation: error occurs at 16389, the nxtRecordInBlock is 16384 which is wrong.
		Our guess is that there is a loadCache at 16383 (which leads to nxtRecordInBlock) but it never recovers to the
		latest position.

		Strangely, did not hit the 16384 loadcache, but the nxtRecordInBlock is set to 16384. Need to set a watch point.
		break on 1, and then enable 2, hit a couple of times and then set the watch point. DOES NOT WORK. too many swaps.

	[3] Attempt 3:  declare attribute lastloadID and check the ID. Last load ID is 174.
			verify it can be repeated. hit again.
			now set condition on Cache::retrieveRecord bp condition id==174	, it's hit 10 times, got to ignore  times.
				Last couple of calls' call-chain
					isModifyCR3(0x81f8f5ee)
					Trace::handle_instr: 787-> isModifyCR3(0x81f8f5ee)
					Trace::checkRecordStatus 0x81f8f5f0
					InstrExecRecorder->dump (0x81f8f5ee) --> setInstrPorcessor->InstrInfo::loadFromCache

		FAILED, the number of times that it is hit is not stable.

  [4] Attempt 4: break on Util::error_exit and check the eip being added. Then find out the previous instruction, set a BP on
			handle instruction of the previous instruction
		(1) eip being added: 0x81f8f5c4 (verified hit the same spot)
			the previous instruction is 0x800ca220
		(2) bp on Cache.cc:93, then bp on Trace::handle_instr where condition is eip==0x800ca220 check how many times it is hit
			before fault. It's only hit one time (but wait two seconds).
			There are multiple rounds of instruction execution, now it's 0x800ca223. Strange, after BP, cannot repeat 1.

  [5] Attempt 5: check all InstrInfo::loadFromCache and see if they are paired. Disabled 3 instructions at line 787
		The problem seems to be fixed. Now the problem is that it seems to be super slow.
  [6] Attempt 6: try different size. Worked like a charm. Increased from 16kb to 64kb buffer and no swaps.
					
	[7] now verify the correctness of the analysis. 
				--> ntdll.CsrClientCallServer (at 7c8715bb)
				--> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
		It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1 (only once in entire duration).
		Start: timeStamp: 473389, ins @7c8715bb: call  [0x7C801034]
		End: timeStamp: 475191, ins @7c8715c1: cmp   [ebp-0x9C], edi
		@25069c is written by the mov cr3 instruction at 474668

		Now generate the full trace. There is still a segmentation fault.

-------------------------------------------------------------------------------
Task 147:  Fix the full trace generation problem.
-------------------------------------------------------------------------------
	[1] break at 474667, add a conditional branch. Problem deserialization. The count in mrRead_VA is 161 however, the
		total size of the record is only 242. Which is clearly not right.
	[2] Check the serialization. Set a breakpoint at InstrExecRecorder::serialize when its eip is 0x804dbf60
		size is 1277, timestamp: 475212
		last line of x/100wx buf
		x/100wx buf
		0x087c91be      0x91b9ff00      0xe500017c      0x017c91b9
	FOUND THE PROBLEM: Cache::appendRecord size is only char!!!! 
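
The truncation noted above can be reproduced in isolation. The sketch below is not the real Cache::appendRecord; both helper names are hypothetical, and it only shows what a one-byte size field does to a record size such as 1277.

```cpp
#include <cstdint>

// Hypothetical stand-in for a size field that is one byte wide.
// Only the low 8 bits of the size survive, so sizes above 255 are corrupted.
inline unsigned storeSizeAsByte(unsigned size) {
    std::uint8_t field = static_cast<std::uint8_t>(size);
    return field;
}

// Hypothetical stand-in for a size field that is two bytes wide.
// Sizes up to 65535 survive unchanged.
inline unsigned storeSizeAsShort(unsigned size) {
    std::uint16_t field = static_cast<std::uint16_t>(size);
    return field;
}
```

With these helpers, storeSizeAsByte(1277) comes back as 253 (1277 mod 256), while storeSizeAsShort(1277) keeps 1277.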

8:45AM 
	[3] inspect full trace.
		There are still problems with full trace deserialization. check it.
		Debug:
			(1) use generate_raw mode, and check the serialization. BP on InstrExecRecorder.cc:267
			timestamp: 475617, eip: 804dbf60, mrRead_VA size 1260, mrWrite_VA size 10, total size: 1265
			in Cache::serializeTo, position in block 345698 to 346963, dump of the first 20 words.

				0x5dc5066a:     0x804dbf60      0xce00100f      0x00000000      0x307ffe00
				0x5dc5067a:     0x91d3d400      0x9200017c      0xba7c91d2      0x90268800
				0x5dc5068a:     0x1800047c      0x207c9026      0xfe030000      0x9400087f
				0x5dc5069a:     0x027c91d5      0x91d59c00      0xa400027c      0x027c91d5
				0x5dc506aa:     0xfb000400      0xc000027f      0x0c7ffb00      0xfb00ce00

			seems no problem.

			(2) now use the full trace mode, check the deserialization BP on InstrExecRecorder.cc
				It did not hit the breakpoint. Broke at 208065.

			(3) add condition 208065 and see what's going on.
				Error occurs between 208062 and 208063 (at 208063 the eip should be 804fbd60).
				The problem is with the record 804fbd60 (it's saving only 3 bytes)
				arrPositionArry is not set right, the entries before and after it are all right.
			(4) fix the depend links problems first. increase to 1000.

			(5) recover to raw case. Set a conditional bp in Cache::appendRecord(size) when size is 
				smaller than 10. The BP is never hit.

				go back to full case again and look at the result. It broke again right before mov cr3 
				instruction.

11:30AM
			(6) The problem must be in serialization. use gen_raw mode. BP on InstrExecRecorder.cc:267

				Then inspect the following: 
					EIP: 0x804dbf60
					ts: 342754 mrRead_VA size: 31 
					blockID:  5  nxtRecordIdx:  15073 blocksize (its position): 297039 size:209 
					next record index: 15074 
				serializeTo is called again
					eip: 804dbf63
					ts: 342755 mrRead_VA size:0  
					blockID:  3          nxtRecordIdx:  15074 blocksize (its position): 297248 size:  21 
					next record index: 15075 

				Strangely, when it is deserialized, the index was completely different.
				Need to set BP on Cache::saveCurrentBlockToDisk

			Guess: the problem might be still in Cache::serialization, when it's saving the arrPosition.
	
	

				Then bp on InstrExecRecorder::serializeTo and InstrExecRecorder::loadFromCache check the
				next record

2:30PM fix the problem
				enlarge from byte to short int.
				Debug: serialization problem. Cannot get the serialization working. Always broke at ID 267!!!
				
3:00PM  debug into save block 53, and see how the sizes are saved. Problem: maybe the cache is not completely
	saved!!!! call delete cache.

3:30PM shrink the test data set to 1 and see what's the problem. fixed

3:40PM still check the size problem. Inspect the data file. Verified it's the data file corrupted. fwrite is
	not reliable!
	Split short int into 2 bytes and then try it.
4:05PM still does not work. trace into saveCurrentBlock

	Guess: the problem might be the calculation of the startIdx??

4:20PM fix the calculation of startIdx and revert back to write of short int. DONE!
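
A minimal sketch of the two ideas tried in this session: writing each 16-bit size as two explicit bytes, and guarding the start index of the range being saved. It works on an in-memory buffer instead of a file, and packSizes/unpackSizes are hypothetical names, not functions from Cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs sizes[startIdx .. startIdx + count) as little-endian byte pairs.
// Returns an empty vector when the requested range is out of bounds,
// which is the kind of startIdx mistake this sketch is meant to catch.
inline std::vector<std::uint8_t> packSizes(const std::vector<std::uint16_t>& sizes,
                                           std::size_t startIdx, std::size_t count) {
    std::vector<std::uint8_t> out;
    if (startIdx > sizes.size() || count > sizes.size() - startIdx) return out;
    out.reserve(count * 2);
    for (std::size_t i = startIdx; i < startIdx + count; ++i) {
        out.push_back(static_cast<std::uint8_t>(sizes[i] & 0xFF));  // low byte
        out.push_back(static_cast<std::uint8_t>(sizes[i] >> 8));    // high byte
    }
    return out;
}

// Rebuilds the 16-bit sizes from the byte pairs written by packSizes.
inline std::vector<std::uint16_t> unpackSizes(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        out.push_back(static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return out;
}
```

A round trip through these two functions returns the original sizes, so a mismatch after loading would point at the index arithmetic and not at the byte layout.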

4:30PM test the full trace again.

4:33PM New bug found:  system segmentation fault at pagemap destructor.
		add NULL protection

7:30PM
	Problem again: full trace does not contain write to 0x25069c
	 now verify the correctness of the analysis. [raw mode] 
				--> ntdll.CsrClientCallServer (at 7c8715bb)
				--> at 7c90eb8d does SYSENTER (EAX: 0xc8, ... then complex message format)
		It should return at 7c90e3ed, and then return to 7c9132f8, and then 7c8715c1 (only once in entire duration).
		Start: timeStamp: 473267, ins @7c8715bb: call  [0x7C801034]
		End: timeStamp: 475069, ins @7c8715c1: cmp   [ebp-0x9C], edi
		@25069c is written by the mov cr3 instruction at 474546

	Now generate the full trace:  generated. Note timestamp moved -1.
		Problem: timestamp: 475084 does not depend on 474546 (474545) instead, it depends on 205063

	Debug: set a BP on   InstrExecRecorder.cc:297
		Found the error in mock_mem

	
8:40PM
	Now fix the extra memory link problem.

8:50PM check slicing
	Run too slow. could not stop.

8:00AM 09/27/2013
-------------------------------------------------------------------------------
Task 148:  Fix the slicing.
-------------------------------------------------------------------------------
	[1] mov job1 to job2 and job2 to job1, and reset the experiment. check if it is still stuck. [15 min]
		Does not work
	[2] Try enlarge the store size and see what's happening. [15 min]
		fixed a bug, increased to 1M entries.
	[3] code inspection. Check how block_size is initialized [15 min]
		See the problem. The block_size is loaded from the raw trace. We have to regenerate the raw trace.
		completed
9:00AM
	[4] inspect the slice generated for b1.exe. OK.  [15 min]
	[5] inspect the slice generated for b10.exe.  [15 min]
			Found problem at 0x407e45 Compare the trace.
			The problem is that the ESI at 0x407e42 is not the right value
9:30AM
	[6] Add a memory check support to Cache first. If the target address is over the limit, then stop the application. [30 min] Fixed.

9:50AM
	[5] inspect the slice generated for b10.exe [30 min] Pair by Pair compare
	TO DO: fix the trace problem at 0x407e45.
	The problem is that the ESI at 0x407e42 is not the right value
			ESI is from 0x402702

			The value of ESI is from 0x4026cb (the push instruction is pushed three times 0, 0x00410008 (iob), 0
	in regular execution). It is also pushing the same value into the stack in bad trace. Now need to figure out
	why it's popping bad values out.

		Now check the ESP value if they match each other. yes; the value 0x41d008 is stored at 0x12feec.
	Check the pop instruction at 0x402702. The problem is the error trace is 12 bytes away when doing the pop.

		Next, start from the first pop, do the pair by pair comparison.

		Difference occurs at 0x40c4bc, the stack structure is different now. --> found that it departs
	at 0x0040C047. 

		*** inspect the algorithm ***
		Found the problem: 0x0040c4c1 is not included in slice (ADD ESP, 0XC). Verified it's hit only once.

		In the slicing algorithm 0x40c4e7 call instruction depends on 0x0040c4c1 and it is skipped in slice.

				Dump below:
					timeStamp: 477231, ins @40c4c1: add esp, 0x0C
				, ESP: 0x12fed8 -> 0x12fee4 , DEPLINKS:  , R: 477230 and ESP value: 0x12fee4, C: 477230 ESP: 0x12fee4 EBP: 0x12ff14


				timeStamp: 477235, ins @40c4e7: call    0x00000011
				 write: (start: 0x12fee0, end: 0x12fee3) , ESP: 0x12fee4 -> 0x12fee0 , DEPLINKS:  , R: 477231 and ESP value: 0x12fee0


2:30PM
	Idea: check timestamp 477813 and 477817 in slice algorithm, strangely 477817 is not hit. Check the log.

		Identified the problem: when the call instruction is replaced by the skip/NOP, there is a ESP dependency;
	when we set the previous instruction as needsToVisit, we skipped the ESP dependency. Problem: 
	the ESP Dependency at 477813 is not handled properly.

	[1] add the dump information for setInSlice. DONE
	[2] Debug 477813 and see why it's not listed as in slice. During debug, it is said to be not EspDelay(). Strange.
		Fixed.
	[3] verify if the fix is successful. ALL GOOD.
	[4] create another simple example b4.exe Verified, work ok.

	
7:30AM 09/28/2013		
-------------------------------------------------------------------------------
Task 149:  Mining Conditional Branches
-------------------------------------------------------------------------------
	[1] in config, add a new job called mine_conditions [5 min] DONE
	[2] update the Job class and add a new job category JOB_MINE_CONDITIONS [5 min] DONE
	[3] update BatchAnalyzer and update the following [56 min]
		execJob [5 min] DONE.
		execMINE_CONDITION_JOB [8 min] DONE
		gen_MINE_COND_JOB [8 min] DONE.
		create class taskCondJob [15 min] DONE.
		declare Trace::mine_cond_slices [10 min] DONE.
		Debug: [20 min] DONE.
9:30AM 

	[4] Implement Trace::mine_cond_slices(Job *job) , simulate the framework of one slice.
		first call collect_conditions and then extract_slice_for_condition [30 min] DONE.

	[5] declare gen_slice_for_branch() [20 min] DONE.

    [6] think about collect_conditions algorithm [20 min]

10:45AM

	[5] implement vector<long long int> collect_conditions(vector<sectionInfo>). Collect the
		set of condition branches. Avoid loop points [60 min]

	[6] debug first part of gen_slice_for branch [20 min]

	[7] debug the function collect_branches [30 min] DONE.
12:30PM.

	[8] implement function extract_slice_for_condition(long long int ts, string src, string file_path)
			Idea: loop back from the ts, and mark all data dependencies. If one point is visited multiple times
	with the same ESP value, then mark that as a loop area. From the start to end loop area, perform the 
	control dependency analysis until it is self-contained.
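
The first half of this idea (walk back from ts and mark all data dependencies) can be sketched as a worklist traversal. This is only an illustration under stated assumptions: DepLinks is a hypothetical in-memory table, not the trace format used here, and loop-area detection and the control dependency pass are left out.

```cpp
#include <map>
#include <set>
#include <vector>

using Timestamp = long long;

// Hypothetical dependency table: for each timestamp, the earlier
// timestamps whose results it reads.
using DepLinks = std::map<Timestamp, std::vector<Timestamp>>;

// Walks backward from ts and returns every timestamp reachable through
// data dependency links, including ts itself.
inline std::set<Timestamp> backwardDataSlice(const DepLinks& deps, Timestamp ts) {
    std::set<Timestamp> inSlice;
    std::vector<Timestamp> work{ts};
    while (!work.empty()) {
        Timestamp cur = work.back();
        work.pop_back();
        if (!inSlice.insert(cur).second) continue;  // already marked
        auto it = deps.find(cur);
        if (it == deps.end()) continue;             // no recorded dependencies
        for (Timestamp d : it->second) work.push_back(d);
    }
    return inSlice;
}
```

The visited set also keeps the walk finite when the same point is reached through several paths.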
	
7:30 AM	 10/01/2013
-------------------------------------------------------------------------------
Task 150:  refactor the onslice algorithm
-------------------------------------------------------------------------------
	[1] create a new function full_slice(ts1, ts2) [30 min] DONE
	[2] test the slice algorithm. [10 min] DONE.

9:00AM
-------------------------------------------------------------------------------
Task 151:  Data Slice and Identify Single Occurrence Component
-------------------------------------------------------------------------------
	[1] add a function init_data_slice [60 min] DONE
	[2] test init_data_slice [30 min]
		set a bp at init_data_slice and change timestamp to 477436. (jnz ...) DONE.
	[3] completely check the data dependency one by one. Too complex to trace. [30 MIN]. DONE.
	[4] handle instructions like xor eax, eax [2 hr]
		[a] collect stats for 477437. Total size: 23592.
		[b] in InstrInfo declare flag FLAG_NO_DATA_DEPENDENCY [5 min] DONE.
		[c] in InstrInfo declare function examine_no_data_dependency(), first check those
				inReg set - outReg set is empty and list and then decide the algorithm [20 min]
				[c.1] implement Util::getSetdiff
		[d] inspect generated instructions that are identified as no reg data dependency [1 hr]
		[e] now compare the data slice: 23210 (reduced about 200 instructions).
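
A sketch of the set test described in step [c], assuming registers are kept as sets of names. getSetDiff stands in for Util::getSetdiff with an assumed signature, and isNoRegDepCandidate is a hypothetical helper name.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Returns the elements of a that are not in b.
inline RegSet getSetDiff(const RegSet& a, const RegSet& b) {
    RegSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.begin()));
    return out;
}

// True when every register the instruction reads is also one it writes,
// i.e. (inRegs - outRegs) is empty. This only lists candidates.
inline bool isNoRegDepCandidate(const RegSet& inRegs, const RegSet& outRegs) {
    return getSetDiff(inRegs, outRegs).empty();
}
```

The set test alone is only a first filter: xor eax, eax passes it, but so does inc eax, whose result does depend on the old value of eax, so the candidates still need an opcode-level decision afterwards.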
8:30am 10/02/2013 
	[5] apply the algorithm to one_slice:
		slice 3: 
		Trace Size: 524692, in slice: 451858, Percentage: 86.12%
		Instruction Store Size: 46807, in slice: 40611, Percentage: 86.762664%
		Instruction Store Size (excluding imported DLL): 3096, in slice: 1792, Percentage: 57.881137%
		[a] modify the algorithm. [30 min]
		[b] verify if the new slice is working. OK. however, does not improve that much.

		Trace Size: 524692, in slice: 451593, Percentage: 86.07%
		Instruction Store Size: 46807, in slice: 40531, Percentage: 86.591749%
		Instruction Store Size (excluding imported DLL): 3096, in slice: 1787, Percentage: 57.719638%

		[c] inspect the log [90 min]
			Found problem: NO REG DEP here for ts 477157 @0x7c801892  ins @7c801892: inc  eax
			insert conditional BP to check it.
			Fixed the bug. the call of examine_no_reg_dep() is called after the type is set!

			Found 3 more problems: 472986, 471574, 475575.
				fixed another bug.

		[d] verify how the new data slice algorithm helps reducing the size. Check the instructions in dump one
			by one, and check the log, and open the full trace. [30 min]
			Most of the records do not actually reduce the size.

		[e] check timestamp 466639, how is its inslice set?
			conditional BP on InstrExecRecorder::updateCache. Verified, it's ok. added by function processing.

		Final stats:
		Trace Size: 524692, in slice: 453436, Percentage: 86.42%
		Instruction Store Size: 46807, in slice: 42494, Percentage: 90.785566%
		Instruction Store Size (excluding imported DLL): 3096, in slice: 1802, Percentage: 58.204134%
		+++ Task completed: Task generate one slice for: /home/samba/smbuser/slice_jobs/job3

11:00AM
-------------------------------------------------------------------------------
Task 152:  function processing of data slicing
-------------------------------------------------------------------------------
	[1] algorithm design [45 min]
	[2] step 1. define function void identifySingleOccuranceComponent(long long int ts, long long *tsStart, long long *tsEnd) [10 min] DONE
	[3] step 2.  Modify the gen_slice_for_branch algorithm  [30 min]
		(1) add two vectors: vecSOCStart, vecSOCEnd() and update these two vectors during the loop DONE.
	[4] Debug: 
		Problem 1: init data slice is very large. Not sure if it is right.	 Trace into init_data_slice and check the 
			timestamps being visited. Found problem long long int overflow (as long)
		It seems that ts 477434 is not cleared. Got to call clear_slice-tags().
		Fixed the problem.

TO DO: fix the identifySOC function. Start call should be fixed with corresponding ret call.

9:00AM 10/03/2013
-------------------------------------------------------------------------------
Task 153:  Misc. tasks of data slicing
-------------------------------------------------------------------------------
	[1] improve the branch collection. Add a hash_map to avoid visiting the same branch again. [15 min]
			no room for improvement. DONE.
	[2] move the set visit function to generate full trace. [20 min]. DONE.
	[3] Algorithm Design [30 min]. DONE.
10:00AM
	[4] fix the IdentifySOC algorithm. Remove the logic of add/minus 1. [15 min] DONE.
	[5] debug the first 5 occurrences of IdentifySOC. [15 min] OK.
	[6] examine collect_branch again and see if there can be further improvement. [10 min] no room. DONE.
	[7] modify full_slice algorithm prototype, add a boolean variable bSOC. [5 min] DONE.
	[8] algorithm design of full_slice [15 min] DONE.
	[9] quick look at thin slicing [5 min] DONE.
11:00AM 
	[8] full_slice SOC component design: [30 min]
		[1] call full_slice on start, and end-1 because end will be reached anyway. [5 min] DONE.
		[1.5] start and end should be added in slice initially. [5min] DONE.
		[2] data link and reg link will be added as usual. [5 min] DONE.
		[3] esp link and ebp link will be added as usual, but out of range link will not be added, because esp/ebp
			value guaranteed at entry [5 min] DONE.
		[4] control link will be added as usual, but out of range link will not be added because start point will
			be reached through jump [10 min]
	[9] Debug. [70 min] 
		[1] test the EBP and ESP case [10 min] DONE.
		[2] test the control link case [10 min]  DONE.
		[3] set ts = 477437 and debug the first 2 cases [20 min] DONE. There seems to be some problems
		[4] inspect the log of 5 cases [30 min]
			(1) found bug. dependency not as expected. VERIFIED ACTUALLY OK.
8:00PM
	[10] binWriter.asssembleJMP [30 min]
		[1] find out all two kinds of jmp length. [45 min]
			[a] jmp short EB + OFFSET (positive up to 7e, negative up to 80)
			[b] long jump, e9 + 4 bytes offset (however, there are limits 0x09000000).
		[2] create function int asJMP(unsigned int eip, unsigned int target, char *buf)

7:30AM 10/04/2013
-------------------------------------------------------------------------------
Task 154:  binWriter
-------------------------------------------------------------------------------
	[1] add function asJSP(curValue, expVal, buf) [15 min] DONE.
	[2] test asJSP [15 min] DONE.
8:30AM
	[3] add function asJBP(curValue, expVal, buf) [10 min] DONE.
	[4] test asJBP [10 min] DONE.
	[5] Algorithm Design. declare all function prototypes [20 min]
			declare function in binWriter::writeDataSlice(Trace)
			in Trace needs to make getEsp_after and getEsp_before public.
	[6] implement writeDataSlice [1hr.] DONE.
10:15AM
	[7] implement initEntryPoint.
		[7.1] implement asINIT_ESP and test it [30 min] DONE
		[7.2] check trace and examine the entry point and then decide if need to catch the entry point [15 min]
			The entry point occurs at 306916. DONE.
		[7.3] implement binWriter::genEntryPoint [30 min] DONE
11:30AM
	[9] implement Trace::findTSWithEIP() [15 min] DONE
	[10] test findTSWithEIP [10 min] DONE.
	[11] update the logic with tsEntry [15 min] DONE.
	[12] debug into it [30 min]
		[x1] found one bug on get_ESP_VALUE_BEFORE and get_esp_value_after .DONE.
8:33PM
	[13] now implement handle_SOC. [1 hr]
		[1] declare prototype [15 min]
		[2] implement it. [45 min]
		[3] implement writePartialSlice [15 min]
		[4] update writeInstruction to verify buffer. [15 min]
	[14] debug through [20 min]
		bug1. problem with writeInstruction.


7:30AM 10/05/2013
-------------------------------------------------------------------------------
Task 155:  Test the Data Slice Algorithm
-------------------------------------------------------------------------------
	[1] debug through:
		(1) break on Trace::gen_slice_for_branch, set ts to 477437, and then break on [30 min]
			binWriter::writeDataSlice.
			[1] fix asJMP bug. DONE.
			[2] partialTrace bug. DONE
			[3] fix the eipBridge problem. DONE.
			[4] fix the bridge not in section problem.
				Algorithm Design: 30 min
	[2] fix the bridge not in section problem. Idea: require that the last instruction in SOC be
			in section. Make the following changes:
				(1) update esp_after_soc to be the value before the last instruction
				(2) the last instruction is replaced with bridge component
				(3) when generating the partial slice, do not include the last instruction (but the
			slicing algorithm will guarantee that the last instruction is hit).
	9:00AM implementation. [1 hr]
		finish (1), (2), (3) DONE
		add log message for writeSOC.
		debugging:
			[1] fix eipBridge bug. DONE.
			[2] fix tsEnd bug.  DONE.
			[3] fix the addr 7cxx bug. 
			[4] fix the merge SOC problem.
	10:30AM fix  the addr 7cxx bug when it's a single instruction. [20 min] 
		Idea: whenever found such raw slice instruction, expand SOC as well. fix identifySOC.
		still not work. check ts: 308356
		Problem: 308356 is added later in the later segment. So the process needs to be repeating itself until
		no further instructions are added.

	11:30AM
		Implementation to fix 7cxx bug:
		[1] add Trace::getSliceSize() - calculate slice size [15 min]
		[2] add an additional loop - 15 min
		[3] add assist function: isTSInAnySOC(vecSOCStart, vecSOCEnd) DONE
		[4] add assist function: insertSOC(vecSOCStart, vecSOCEnd) DONE.
		[5] modify the algorithm DONE
---------------------
		[6] debug isTSInAnySOC [15 min] FIXED one bug.
		[7] debug insertSOC [35 min]
				redid the logic . DONE
				solve trouble "case skip should not ..." Idea: can allow it to be NOP. Because it will guarantee
			to be reached by the algorithm.
				check the problem of suspicious index. fixed the bug. 
				DONE.
		[8] debug the algorithm [30 min]
				fix the 7cxx problem.
				LINE 339 ---> problem.
				Fix: fix the verifyNop function. DONE.
		[9] fix the ESP/EBP problem.
				add tsLastEnd, if tsLastEnd is equal to tsStart, then should skip the bridge.
		[10] fix overwrite 2.
				improve the ESP and EBP. 
				Idea: the original sequence of timestamps (instructions), if there are no conditional jumps,
			then they will for sure lead to the next SOC. If they make any modifications to data, they are
			not referenced by any later SOC anyway, thus no dependency; if they depend on any previous SOC,
			they make no change to the control flow because there are no conditional jumps.
				So, when the gap is TOO SMALL, we just need to keep the original instructions.

			Implementation:
				[1] implement verifyNoCondJumps(Trace *trace, Timestamp tsStart, tsEnd) [15 min] DONE.
				[2] debug and test verifyNoCondJumps [15 min] DONE.
				[3] modify the algorithm: when call verifyAllNops fails, call verifyNoCondJumps [15 min] DONE.
		[11] fix the logic of writeSOC bridging component..
			[1] modify writePartialTrace add a bool flag [10 min] DONE
			[2] modify the writeSOC algorithm [30 min]
				Alg: generate the buffer of bridge component first, then check the gap.
				if gap>0
					if gap<bridgeSize:
						write those instructions directly
					else
						write the components
				else //gap<=0
					write the components //but face the failure of visiting back the next immediate instr. DONE.
		[12] THERE ARE BIGGER PROBLEMS WITH THE ALGORITHM ----------------
				Problem: tsLastEnd: 355578 (@403b94), tsStart 355591
				When build bridge component it overwrites @403b9b (ts: 355399)
				
	
	---> have to redo the data slice algorithm	
--------------
7:00AM 10/08/2013
-------------------------------------------------------------------------------
Task 155:  Redo the Data Slice Algorithm
-------------------------------------------------------------------------------
	[0] Alg Design [20 min] DONE
	[1] declare an SOC class [15 min] DONE
	[2] declare the SOCManager class [20 min] DONE
		1. addSOC
		2. getSize
		3. getSOC
	[3] modify the main algorithm in Trace to call SOCManager methods 
		[1] move identifySOC to SOCManager. [15 min] [DONE]
		[2] declare SOCManager in gen_data_slice and then move findTS in SOCManager[15 min] DONE.
8:30AM
		[3] move insert_into_vec into SOCManager [15 min] DONE
		[4] move mergeSOC into SOCManager [15 min]
		[5] refine the main algorithm in Trace gen_data_slice and add functions as needed [30 min]

10:00AM
	[4] work on main algorithm
		(1) add a Trace::setIER_II(ts) inline function [8 min]
		(2) modify the interface of add(). [8 min]
		(3) finish the addTS [25 min]
		(4) algorithm design: addSOC [10 min]
			first search for SOC to insert
			if it can be merged with previous one, merge it
			otherwise call insertSOC to literally add one SOC
		(5) define a function full_slice_all_soc [10 min]
11:15AM
		(6) implement addSOC [20 min]
		(7) algorithm design of insertSOC At [20 min]
			//1. call SOC->setBridgePoint 
			//2. if fail, merge it with nextSOC 
			//3. else: really insert the SOC
		(8) implement insertSOC [15 min]
		(9) modify addSOC Logic [20 min] DONE.
9:00PM
		(10) double check insertSOC design [20 min]. DONE
		(11) implement SOC::setBridgeTo(). [25 min]. DONE
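
The addSOC design in steps (4) and (7) (find the insert location, merge with a neighbour when possible, otherwise really insert) can be sketched as below. This is a simplified illustration: the method names addSOC, getSize and getSOC come from the list above, but their signatures, the ascending order, and the rule that touching ranges merge are assumptions, and the bridge logic (setBridgePoint) is left out entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

using Timestamp = long long;

struct SOC {
    Timestamp start;
    Timestamp end;
};

class SOCManager {
public:
    // Adds [start, end], merging with neighbours that overlap or touch it.
    void addSOC(Timestamp start, Timestamp end) {
        // Search for the insert location: first SOC starting after `start`.
        auto pos = std::upper_bound(
            socs_.begin(), socs_.end(), start,
            [](Timestamp s, const SOC& soc) { return s < soc.start; });
        if (pos != socs_.begin() && std::prev(pos)->end + 1 >= start) {
            pos = std::prev(pos);                  // merge into the previous SOC
            pos->end = std::max(pos->end, end);
        } else {
            pos = socs_.insert(pos, SOC{start, end});  // really insert
        }
        // Absorb following SOCs that the grown range now reaches.
        auto next = std::next(pos);
        while (next != socs_.end() && next->start <= pos->end + 1) {
            pos->end = std::max(pos->end, next->end);
            next = socs_.erase(next);
            pos = std::prev(next);
        }
    }

    std::size_t getSize() const { return socs_.size(); }
    const SOC& getSOC(std::size_t i) const { return socs_.at(i); }

private:
    std::vector<SOC> socs_;  // kept sorted by start, non-overlapping
};
```

Keeping the vector sorted and non-overlapping after every call makes an ordering check a simple linear scan.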

8:00AM 10/09/2013
-------------------------------------------------------------------------------
Task 156:  Test and Debug the Data Slice Algorithm
-------------------------------------------------------------------------------
	[1] unit test insert_into_vector [15 min]
	[2] unit test remove_vec [15 min]
	[3] initial debug gen_data_slice [10 min] OK. will visit later
	[4] implement check_all_soc [25 min]
		(1) verify in descending order  [10 min]
		(2) verify SOC::bridge  [15 min]
9:10AM
	[5] fix the "IMPOSSIBLE" error. line 46. [15 min] DONE.
	[6] debug findInsertLocation [10 min] DONE.
	[7] check the case 477420. fix the bug on get_room [10 min]
	[8] fix one minor bug in setBridge [5 min]
	[8] debug through SOC::get_room [15 min] DONE.
	[9] modify get_room(set a minimal size needed) [10 min] DONE.
	[10] debug through SOC::gen_bridge [15 min]. DONE.
	

10:30AM
	[11] debug through SOC::setBridgeTo [25 min] DONE.
	[12] fix SOC::gen_bridge bug [15 min]

11:30AM
	[13] improve init_data_slice efficiency. [20 min] DONE.
		[1] change the return value of Trace::set_slice. if return 0, then no updates, return 1 
			updated the timestamp only, 2 both.
		[2] change the total of init_data_slice
	[14] fixed one nasty bug of reloading ier in init_data_slice [25 min] DONE.
	[15] debug SOCManager::identifySOC [10 min] DONE.
	[16] debug SOCManager::insertSOCAt [15 min] DONE.
			[1] add logger message for full_slice

7:30PM
	[17] debug and verify findTS [10 min] DONE
	[18] initial debug of addSOC [60 min]
		fixed bMerged bug.
		fixed another bug about merge.
		added delete code.
	[19] debug through addTS [10 min] done.
	[20] debug through verify_bridge_fine.
	[20] debug verify_all_soc. 
		fix bug1: ignore bridge fine for 0.
		add one more parameter to verifyBridge.
		have to unset inslice for those on the path.
		When verify bridge fails, should merge.
	[21] separate out the check of descending order.

7:45AM 10/10/2013 
	[22] Debug the strange descending order problem
		Insert the check descending order at the beginning of addTS.
		Problem is with identifySOC, remove the isInSlice().
		DONE.
	[23] New problem. too many bridge fails. Check the reason.
		(1) check why 477380 and 477381 were added to slice. It's caused by a RET which relies on 
			CALL that is in the bridge.
10:30AM
	[24] Improve identifySOC (will lose some precision). [30 min] DONE.
		check the call table and start the search from the last matching call.
		1st pass 71 SOCs -> 41 SOCs merge (speed improved and more SOCs than [23])
		2nd pass 45 -> 22 SOCs
		3rd pass: 23 -> 23
	[25] Implement binWriter::writeDataSlice  [45 min]
	[26] debug through writeDataSlice
		(1) fix one bug related to loop.
		(2) add log to writeSOC
	SEEMS working needs to regenerate log and check.

			
-------------------------------------------------------------------------------
Task 156:  check the problem of raw trace
-------------------------------------------------------------------------------
	[1] copy back job2  AND 3 . they work fine [15 min]
	[2] try job1 again. Still stuck gdb and trace helper_trace2 and handle_instr and check the problem [30 min] 
		Somehow after recompiled it works.
!!!!!!!!!!!!!!!!!!!!!!!!!!!
	[3] new branch timestamp to test is 486953 corresponding to eip 40103c!!!.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
		Now verify if it works [1] hit the main function
								[2] skip the first conditional branch. DONE.
	Problems: 
		[1] it includes printf, which shouldn't 
		[2] execution of printf breaks.

	Continue to fix the context switch problem and then come back to visit the problem.

Slice Stats:
Trace Size: 539345, in slice: 183354, Percentage: 34.00%
Instruction Store Size: 48113, in slice: 23528, Percentage: 48.901544%
Instruction Store Size (excluding imported DLL): 3247, in slice: 2709, Percentage: 83.430859%

full slice size: 539344

-------------------------------------------------------------------------------
Task 157:  handle the context switch problem
-------------------------------------------------------------------------------
	In raw processing, checkRecordStatus if context switch happens at if-branch the 
analysis is inaccurate. What if multi-threaded programs? Thread Id can be determined using FS:[0x18].
can be handled later (add to InstrExecRecorder the threadID).

	For now to detect context switch, for each InstrInfo, include another address called targetAddr
for jmp or conditional jump instructions. The last 4 bytes or the last 2 bytes should be
the relative or absolute address.

	Several complications: (1) sysenter does not return exactly at the same address! (gap of around 6 bytes!)
(2) transfer control (jmp call) are easy to handle. RET will need InstrExecRecorder!
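
A sketch of the targetAddr idea for the easy cases, where the displacement sits in the trailing bytes of the instruction. The function name mirrors InstrInfo::getTargetAddr but the signature is an assumption, and only three relative forms are handled: E8 (call rel32), E9 (jmp rel32) and EB (jmp rel8). Conditional jumps, RET and indirect forms are not covered.

```cpp
#include <cstddef>
#include <cstdint>

// Computes the static target of a relative call/jmp located at eip.
// Returns false when the instruction is not one of the handled forms.
inline bool getTargetAddr(std::uint32_t eip, const std::uint8_t* bytes,
                          std::size_t len, std::uint32_t& target) {
    if (len == 5 && (bytes[0] == 0xE8 || bytes[0] == 0xE9)) {
        // Last 4 bytes: little-endian displacement from the next instruction.
        std::uint32_t rel = static_cast<std::uint32_t>(bytes[1]) |
                            (static_cast<std::uint32_t>(bytes[2]) << 8) |
                            (static_cast<std::uint32_t>(bytes[3]) << 16) |
                            (static_cast<std::uint32_t>(bytes[4]) << 24);
        target = eip + 5 + rel;  // wraps modulo 2^32 for backward jumps
        return true;
    }
    if (len == 2 && bytes[0] == 0xEB) {
        // Last byte: signed 8-bit displacement, sign-extended to 32 bits.
        std::int8_t rel = static_cast<std::int8_t>(bytes[1]);
        target = eip + 2 + static_cast<std::uint32_t>(static_cast<std::int32_t>(rel));
        return true;
    }
    return false;
}
```

For instance, the bytes E8 76 00 00 00 at address 0x401022 give the target 0x40109D.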

7:30AM 10/11/2013
	[0] Algorithm Design [20 min] DONE.
	Implementation Plan:
	[1] declare InstrInfo::getTargetAddr() as inline function. [20 min] DONE.
	[2] debug [1]. [20 min] DONE.
8:30AM
	[3] declare InstrExecRecorder::getRETTarget() as inline function. [30 min]
		[1] algorithm design [45 min]
		[2]  in ops_sse.h:helper_trace2(), if opcode is c2 or c3, take the value at address ESP_BEFORE [10 min] DONE
		[3]  change the definition of handle_instr, add one more parameter [20 min] DONE
9:30AM
	[4] debug [2]. [20 min]  DONE.
	[9] update Trace::checkRecordStatus [40 min] DONE.
10:30AM
	[10] code inspection [20 min]
	[11] debug through the function [30 min]
		[1] problem. ret value is not as expected. Need to record lastRET_ADDR [DONE]
		[2] fix the page_map problem.
11:40AM
	[12] continue debug.
	[10] run and test [15 min]

6:00PM
	[13] check the mysterious pagemap problem. [30 min] 
		Now blue screen, check if it's
	the save esp causes problem. FIXED. verified, the ESP_BEFORE is only valid when
	it is being changed. so need to add a condition when retrieving the value!

	[14] check the switch warning problem. Found the problem, the target address
		problem can be much more complex. It can be different addressing mode.
		It could be register indexed, e.g., CALL [EDX]. However, we cannot 
		actually save every register. This approach could be too costly. Instead
		use another approach.

	In Trace class declare a pair of interrupt handler vector (start and end)
	Whenever encounter the start, record it and when meet the exit back off from
	it.
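
A minimal sketch of this start/exit bookkeeping, assuming the handler entry and exit addresses are known up front. InterruptTracker is a hypothetical class, and the nesting counter is an assumption that the notes do not state.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

class InterruptTracker {
public:
    InterruptTracker(std::vector<std::uint32_t> starts, std::vector<std::uint32_t> ends)
        : ihStart_(std::move(starts)), ihEnd_(std::move(ends)) {}

    // Feeds one executed eip. Returns true when that instruction belongs to
    // an interrupt handler and should be kept out of the program's own trace.
    bool step(std::uint32_t eip) {
        if (contains(ihStart_, eip)) {  // entering a handler: record it
            ++depth_;
            return true;
        }
        if (depth_ > 0 && contains(ihEnd_, eip)) {  // exit: back off one level
            --depth_;
            return true;  // the exit instruction itself is still handler code
        }
        return depth_ > 0;
    }

private:
    static bool contains(const std::vector<std::uint32_t>& v, std::uint32_t eip) {
        return std::find(v.begin(), v.end(), eip) != v.end();
    }

    std::vector<std::uint32_t> ihStart_;
    std::vector<std::uint32_t> ihEnd_;
    int depth_ = 0;
};
```

This avoids computing jump targets altogether: only the handler boundaries need to be recognized, so indirect forms such as CALL [EDX] no longer matter.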

7:45PM 
	[15] new interrupt trace design.
		[1] in trace class declare INT_HANDLER_SIZE, int [] ih_start, int [] ih_end [15 min]
				DONE
		[2] define inline function isInterruptHandler() [10 min] DONE
		[3] refine the algorithm, if it's interrupt handler, enter the status, record
				the expected Instruction [30 min] 
		[4] debug the algorithm [20 min]
			[1] fix the memory tracing problem.  Recompiled not showing any more
			Seems working and improved speed.
		[5] debug and run and test [20 min]
		Regenerate full log (around 10% less in size)
			442437 corresponding to eip 40103c!	

New size: 
Trace Size: 493066, in slice: 154682, Percentage: 31.37%
Instruction Store Size: 55035, in slice: 18431, Percentage: 33.489598%
Instruction Store Size (excluding imported DLL): 3247, in slice: 2662, Percentage: 81.983369%
Improved about 2%.

Still the same problem. printf crashed and it should not be included at all.


7:45AM 10/12/2013
-------------------------------------------------------------------------------
Task 158:  Devise a method for collecting reverse track. 
-------------------------------------------------------------------------------
	Basic idea:  add a reverse_pointer attribute to each node and print it
when necessary.
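
A sketch of how such a chain could be printed: follow the reverse_pointer from one node to the next until there is none. The map-based storage, the function name reverseTrace and the sentinel value are assumptions for illustration; the visited set guards against a chain that loops back on itself.

```cpp
#include <map>
#include <set>
#include <vector>

using Timestamp = long long;

// Hypothetical sentinel meaning "no reverse pointer set".
const Timestamp NO_REVERSE_POINTER = -5;

// Follows reverse pointers starting at ts and returns the visited chain.
// reversePointer maps a timestamp to the timestamp that pulled it into the slice.
inline std::vector<Timestamp> reverseTrace(
        const std::map<Timestamp, Timestamp>& reversePointer, Timestamp ts) {
    std::vector<Timestamp> chain;
    std::set<Timestamp> seen;
    while (ts != NO_REVERSE_POINTER && seen.insert(ts).second) {
        chain.push_back(ts);
        auto it = reversePointer.find(ts);
        ts = (it == reversePointer.end()) ? NO_REVERSE_POINTER : it->second;
    }
    return chain;
}
```

The chain ends either at a node without a pointer or at the first node that would be visited twice.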

	[0] algorithm design [30 min] DONE.
8:15AM
	[1] add an attribute "long long int reverse_pointer", and one flag {bReversePointer}  
		in InstrExecRecorder, add functions for clear flags [15 min] DONE.
	[2] modify serialization and deserialization [15 min] DONE.
	[3] clear flag when finish the analysis [10 min] . DONE
	[3] test serialization [30 min] DONE.

9:15AM
	[4] define function reverse_trace(long long int ts) to Trace [20 min]
	[5] modify Trace::setSlice() add attribute reverse_pointer (source),
		modify InstrExecRecorder::setSlice() 
			[1] add additional attribute to InstrExecRecorder and Trace [20 min] DONE
			[2]	add the attribute and serialization support to InstrInfo [60 min] DONE
			[3] setEspDelay and setControl etc. all should have source [15 min] DONE.
			[3] modify the calls of ::setInSlice [45 min] DONE.
			[4] unit test InstrInfo [30 min]
				[1] fix serialization 
				[2] fix one bug in InstrExecRecorder::handle_instr
		
	[5] debug through reverse_trace [15 min]
			[5.1] fix the infinite loop problem at 438251
			Fix: add comparison operator to the setReversePointer function.

Working now!
4:00PM
	[6] test the system by calling it once at the end of trace for the printf function [20 min]
		Regenerate full log (around 10% less in size)
			442437 corresponding to eip 40103c!	
		reverse trace on the following:
		timeStamp: 428481, ins @401022: call    0x00000076
	
		Finds that 428481 (call printf) depends on  442286

	[7] Check 442286, now the report does not report any dependence. In full-dump there is no instruction
	depending on 442286. Set a BP and check why 442286 is included.
		It is propagated from another occurrence of the same instruction.

08:00AM 10/13/2013
-------------------------------------------------------------------------------
Task 159:  Analyze why printf is included
-------------------------------------------------------------------------------
	[1] ts=442437, (eip 40103c!)
	[2] bp on trace.h 921, 994, 1007,  and 1352, 

------------
	[1] fix one bug in 
	[2] observation: 442286 
			InstrInfo (@eip: 0x402943) has reverse pointer to 435494, this is caused by ts 435481, it should be propagated
		to 442286.
		Found the problem at Trace.cc:432, the instrProcessor did not load properly. But the value has been updated before.
	set BP on clear_reversePointer and set_reversepointer
	[3] problem: clear_slice does not assign -5. Appended "L" to the constant definitions; that did not fix the problem.
		When updating the value, 443394>-5 is not evaluated as true.
		Found the problem: reverse_pointer is declared as unsigned long long int. Shoot!
		Fixed.
	The reverse_trace now has 33 steps.
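
	Why 443394>-5 came out false, as a standalone sketch. The names `CLEARED`, `shouldUpdateUnsigned` and
	`shouldUpdateSigned` are hypothetical; only the types and the two numbers come from the notes above.

```cpp
// Marker value written when a reverse pointer is cleared.
const long long CLEARED = -5;

// Buggy variant: the field is unsigned. In a mixed signed/unsigned comparison
// the signed operand is converted to unsigned; the cast below spells out the
// conversion the compiler otherwise applies implicitly.
bool shouldUpdateUnsigned(long long candidate) {
    unsigned long long reverse_pointer = static_cast<unsigned long long>(CLEARED);
    // -5 has wrapped to 2^64 - 5, so no realistic timestamp is ever greater.
    return static_cast<unsigned long long>(candidate) > reverse_pointer;
}

// Fixed variant: the field is signed, so -5 stays -5.
bool shouldUpdateSigned(long long candidate) {
    long long reverse_pointer = CLEARED;
    return candidate > reverse_pointer;
}
```

	Appending "L" to the constant cannot help, because the conversion is driven by the type of the field, not of the literal.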

--------------
	[3] now analyze the reverse trace.

		Problem is caused by a multiple-occurrence instruction in a function. This seems to be propagated too much.

	[4] add logger flush() function [5 min] DONE.
	[5] continue the analysis
		ts=442437, (eip 40103c!)
		then bp on Trace.cc:851
		*** Observation 1: 435494 depends on 435481 (@402943 pop ebx)
		but 442286 is included in slice because @eip:402943 is included and it's the same
	 	instruction at line Trace.cc:
		function 0x402935 is CALLED MANY TIMES

		The separator of printf() and getchar() in b20.exe is 435198.
		ts: 428481 calls printf (@eip: 401022)
				--435192 here (pop ebx) @402943:
			435198, add esp, 4
			435199, calls getchar (@40102a)
				--435378 is here  (push ebx) @40290a: -- depends on 435192  xxx 
				--435481 is here  (pop ebx) @402943 (in seh_epilog4 function) - pops 0
				--435494 is here  (push ebx) @409590:-- depends on 435481. push input parameter '0' file handler.
				xxx--442286 is here  (pop ebx) @402943: -- depends on 435378, included in slice because of 435378 xxx (M)
				---435502 is here (mov esi, [ebp+8]):@408f34 reads from the ebx pushed by 435494. reads input parameter '0'
			442433 cmp ..., 0x64
			442439  calls printf("nok)
			447072  add esp (after printf)

	Problem: 442286 being included is actually normal; the real problem is that ebx is
		not actually used in getchar, but it is regarded as information passed!!!
	observation: 
		[1] there is no cr3 change between 435192 and 435378, so the ebx dependency should have no problem.
		[2] check if 435502 is really reading from 435494. Verified: ok. both accessing from 12fed4.  value is 0. Seems to
			be the file handler for the read function.
		[3] check why 442286 is visited: because 442289 @0x402947 is in slice. Recorded using reverse_trace function.

		

-------------------------------------------------------------------------------
Task 160:  Handle multiple occurrence timestamp
-------------------------------------------------------------------------------
	[1] improve the algorithm. When a timestamp is not in slice but the corresponding InstrInfo is: check if the instruction
	has no side effect (would not trigger an exception); if so, do not put it in the slice (so it will read its data dependency from
	register or memory, but it will never trigger an exception).

After the adjustment:
Trace Size: 493066, in slice: 154659, Percentage: 31.37%
Instruction Store Size: 55035, in slice: 18427, Percentage: 33.482329%
Instruction Store Size (excluding imported DLL): 3247, in slice: 2662, Percentage: 81.983369%

No big improvement.

-------------------------------------------------------------------------------
Task 161:   improve the algorithm by ignoring esp/ebp links without any other usage
-------------------------------------------------------------------------------
	ts=429635
	[1] add the attributes. OK.
	[2] fix unit test. OK.
	[3] regenerate the trace.
There are bugs; fix them.
	[4] introduce bNoDataSlice instead. OK.
	[5] new bug 0x4012AD. put in conditional jump

7:30AM 10/16/2013
-------------------------------------------------------------------------------
Task 162:   Fix slicing algorithm problems
-------------------------------------------------------------------------------
	ts=429635
	[1] 0x40129E is not included.  Check the problem
		In full trace: ts is 284230 for 0x40129e.
		In branch slice trace: 284230 is not included because 284231 is marked as bNoData 
		Add dumping information and check details. Problem: neededForReg and neededForMem all 0.
		[1.5] need to update the serialization and unit test. DONE.
9:00AM 
	[2] problem with 0x004012e5 --> 0x00403e4b (call esi).
		In full trace: eip 0x00403e4b is at ts: 292643. The problem is 292640 is not processed at all.
			The problem is 292643 call esi is an instruction that needs data dependency.
		Fix: [2.5] declare function isJumpNeedsData. The logic: if this is a jump and it depends on any of the
	registers, or its input operand is not constant, then it should be regarded as a jump instruction that needs data.
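
	Rough sketch of that rule. `DecodedInstr` and its three fields are invented for the example; the real check
	works on the InstrInfo operand records, which are not shown in these notes.

```cpp
// Hypothetical, simplified view of one decoded instruction.
struct DecodedInstr {
    bool isJumpOrCall;       // jmp / jcc / call family
    bool readsRegister;      // target depends on a register (EIP/ESP bookkeeping excluded)
    bool operandIsConstant;  // target is an immediate or relative constant
};

// A jump needs data if its target is computed from a register
// or, more generally, from any non-constant operand.
bool isJumpNeedsData(const DecodedInstr& in) {
    if (!in.isJumpOrCall) return false;
    return in.readsRegister || !in.operandIsConstant;
}
```

	Under this rule `call esi` needs data, while a call with a constant relative target does not.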

9:45AM
		[1] add the function isJumpNeedsData and add a flag, and add a void set function [10 min] .DONE
		[2] in function setInputOutput reg update the flag.  [20 min] DONE.
		[3] debug (set bp on the set function) [45 min] 
			[3.1] fix one bug about type. DONE
			[3.2] for EIP in read/write should ignore it. DONE
			[3.3] remove the [rw] of jmp/call, coz it's always updating EIP/ESP
			OK now.
			[3.4] check why flag is not set. set cond bp on 0x403e4b. It is set.
			[3.5] improve the setJmpNeedData function [10 min] OK.
			[3.6] set conditional BP in trace. [10 min] DONE.
8:30PM
		[4] Problem with eip 0x004028fc: it's not included in slice, but ts: 309044 (sub esp, eax) depends on it
			The problem is that 309044 (sub esp, eax) is depended on for ESP
			So for the ESP dependent case, we need to distinguish between depend on esp and depend on mem.
			For example, when there is no reg and mem dependency:
				PUSH ECX does not have to propagate the dependency on ECX (does not need ECX for ESP)
				SUB ESP, EAX needs to propagate the dependency onto EAX. (does need EAX for ESP)
			So the logic is: when there is no REG and MEM dependency (except ESP),
				if the instruction is being depended on for ESP, then push/pop does not need data dependency 
				and all other instructions will need to propagate data dependency.
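
			The decision above as a small sketch. `Kind`, `EspDependee` and `needDataForEsp` are placeholder
			names, and push/pop with a memory operand is ignored here for brevity.

```cpp
enum class Kind { Push, Pop, Other };

// Hypothetical summary of why an executed instruction is wanted by the slice.
struct EspDependee {
    Kind kind;
    bool neededForReg;  // a later instruction needs a non-ESP register it wrote
    bool neededForMem;  // a later instruction needs memory it wrote
    bool neededForEsp;  // a later instruction needs the ESP value it produced
};

// True if the instruction's input data (e.g. EAX in SUB ESP, EAX) must be
// followed backwards as well.
bool needDataForEsp(const EspDependee& d) {
    if (d.neededForReg || d.neededForMem) return true;  // ordinary data dependency
    if (!d.neededForEsp) return false;                  // nothing depends on it
    // Only the ESP link is left: push/pop move ESP by a fixed amount, so the
    // pushed/popped value is irrelevant; anything else computes ESP from its inputs.
    return d.kind == Kind::Other;
}
```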

				[4.1] declare flag pushpop_reg_const_operand_only set the flag  [5 min] DONE
				[4.2] in setInputOutput Reg, set the pushpop_reg_const_operand_only [15 min]
				[4.3] implement the isNeedDataforEspEbp [5 min] DONE
				[4.4] debug [15 min]
		[4] run and test and check ts 292640-292643 [10 min]

9:00AM 10/17/2013
-------------------------------------------------------------------------------
Task 162:   Fix slicing algorithm problems
-------------------------------------------------------------------------------
	ts=429635
	[1] Problem 1:  eip@0x403fd7 (mov ebp, esp) [ts:@279775] is not included, but it is 
		depended by the leave instruction at 0x404007 [ts:@283850]
			In the full trace, 283850 mistakenly depend on [timeStamp: 283828, ins @804df995: pop   ebp]
	The problem is caused by context switch. @@800ca21d: pop   ebx

 
9:15AM 10/17/2013
-------------------------------------------------------------------------------
Task 163:   Investigate context switch problem's complete solution
-------------------------------------------------------------------------------
	[1] intercept on raise exception and add a event called Interrupt. print out the relevant values
	[2] code inspection: note target-i386/int-helper.c and excp-helper.c
		raise_interrupt is more generic than raise_exception, raise_exception is to simply call raise_interrupt
			and set the exception id as the interrupt number, and set the "is_int" to 0, and
			set the next_eip_offset to 0 (to resume at the original instruction).
		Guess: timer interrupt should not be using raise_exception but raise_interrupt.
			raise_exception mainly has GPF (general protection fault),
			raise_interrupt should have I/O requests as well
10:45AM
	[3] Implementation Plan:
		[1] declare struct int_record and a new event in event.h [15 min] DONE.
		[2] define TraceManager::handle_interrupt [10 min] [DONE]
		[3] define Trace::handle_interrupt [10 min] DONE.
		[4] in excp_helper.c call send_event [20 min] DONE
		[5] in ops_sse.h declare function isCR3ToTrace [15 min] DONE.
		[5] debug [20 min] DONE.
11:45AM
		[6] IMPROVE the tracing in Trace::handle_interrupt [5 min]
		[7] inspect the raw trace generated. [30 min]
			Observation: almost every Interrupt is accompanied by context switch.

4:00PM
Development Plan:
	Trace::checkRecordStatus. 
		Logic:
			(0) declare a flag (JUST RECEIVED INTERRUPT). [5 min] DONE.
			(1) update the stack. If just got the event interrupt switch the flag of interrupt and push one token into stack. If
				the opcode is iret then pop the stack (note iret's opcode check xp image) [30 min] DONE
			(2) keep the rest of the logic [10 min] DONE.
			(3) debug [15 min]
5:00PM
				Problem 1: expectedEIP problem. Need to add last_eip in Trace and use it to predict
				the next_eip. Still not working
				Attempt 2: catch iret instruction.
			
			(4) inspect log [20 min]
			Seems ok now. Sometimes instructions repeat themselves and the expInstr is the next one, but it seems ok.
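
			Minimal sketch of the interrupt stack from step (1). `InterruptTracker` and its methods are
			made-up names; the only hard fact used is that IRET is opcode 0xCF on x86.

```cpp
#include <cstddef>
#include <vector>

// Tracks whether execution is currently inside an interrupt handler.
class InterruptTracker {
public:
    // Called when the Interrupt event arrives: push one token.
    void onInterrupt(unsigned long returnEip) { pending_.push_back(returnEip); }

    // Called for every executed instruction with its first opcode byte.
    void onInstruction(unsigned char opcode) {
        const unsigned char IRET = 0xCF;
        if (opcode == IRET && !pending_.empty()) pending_.pop_back();
    }

    bool inInterrupt() const { return !pending_.empty(); }
    std::size_t depth() const { return pending_.size(); }

private:
    std::vector<unsigned long> pending_;  // one token per interrupt not yet returned
};
```

			This assumes exactly one iret per interrupt, which is the simplest possible pairing rule.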

*** ts=427098 for eip 0x40103c

7:45PM
		Still problems with nested interrupt iret.
		Inspect the log, look at all the warnings.
			[1] most exceptions have int_no:0xe and error_no:4 or 6
				both can trigger warning or not trigger warning
			[2] for most warning messages, the address is the next instruction (mostly jumps/calls)
			[3] only two exceptions @403fdc and @404028.
					For @403fdc it needs 3 iret to return to the right place.
			[4]*** all interrupt information next_eip_addon is 0. Looks suspicious.

		Debug Plan:
			[1] conditional bp on @403fdc
			[2] then bp on raise_interrupt, raise_interrupt2, and checkRecordException.
			Observation: 0x403fdc raises tlb_fill error -> 0x804e1f25 -> iret to 0x80range
			Verified; there is no interrupt in between. So an interrupt can actually contain
		multiple iret before returning to the target. Check why there are so many levels
		Debug Plan: directly return true for 0x403fdc. Adjust the value of eipExpected so that we can
		capture all iret.

				Problem is Here: 
			timeStamp: 276914, ins @804e1fca: test  [ebp+0x70], 0x00000200
			 read: (start: 0xf750bdd4, end: 0xf750bdd7) , ESP: 0xf750bd64 -> 0xf750bd58
			timeStamp: 276915, ins @81f8f5c4: push  esp
				This is clearly a swap, it is not captured by interrupt.

7:30AM 10/18/2013
-------------------------------------------------------------------------------
Task 164:   Investigate context switch problem's complete solution AGAIN
-------------------------------------------------------------------------------
		Debug Plan:
			[1] conditional bp on @403fdc
			[2] then bp on checkRecordException and decrease the idxExpect so that all instructions will be
				captured
			[3] conditional bp on @804e1fca helper_trace2, and then step by step and see how it
				gets into @81f8f5c4.

		Observation: 
			[a] right after 0x804e1fca, the current TLB block ends and it enters
				#0  cpu_x86_exec (env=0x28dbd190) at /home/csc288/qemu/qemu-1.4.0/cpu-exec.c:321
				There is a huge branch checking interrupts.
					It first calls 
						cpu_svm_check_intercept_param(env, SVM_EXIT_INTR, 3270); 
						and then *** do_interrupt_x86_hardirq(env, intno, 1); defined in target-i386/seg_helper.c:1293
					Note: do_interrupt_all in seg_helper.c:1196

		Debug Plan 2: Figure out the call sequence
			[1] b excp_helper.c:111 (raise_interrupt2), then Trace::handle_instr, Trace::checkRecord, do_interrupt_all
				verified: it's raise_interrupt -> do_interrupt_all -> Trace::handle_instr
				So we can move the logic to do_interrupt_all

9:45AM
		Implementation Plan:
			[1] move the logic from excp_helper.c to do_interrupt_all in seg_helper.c [15 min] DONE.
			[2] add two check logic: [15 min] [DONE]
				(1) get the first 16 bit of the target address, if not match, generate ERROR message (Util::error)
				(2) when idx>1 generate a warning: nested interrupt. 
			[3] generate the raw and inspect @403fdc [30min]
				[1] handle the case where next_eip is sometimes 0.
				There are three such warnings captured:
					[1] 0x7c900719->0x7c902f06: verified ok 
					[2] ins @805633f1: jnz   0x00009600 ->  @805633f7. OK.
					[3] @805788ea:  --> 804dc750. ok.
				Warnings are caused by 0xb1 and 0x9e. Seems no need to figure out the details.
				[2] observe nested:  all good. no more than 3 layers of nested.
				[3] observe @403fdc.	 Now fine with nested interrupt.
					There is a strange repetition of code as shown in the following, not sure if it will have an impact
				Note here: the execution has proceeded to 0x403ff0 (after the nested interrupt on @403fdc returns successfully),
		Then it gets an interrupt which directly returns to 0x403fdc again.
-----------------------------
					INTERRUPT: int_no: 0xe, is_interrupt: 0, error_no: 4, nxteip: 0x403fdc
-- Context Switch!
-- Context Switch BACK!
timeStamp: 263660, ins @403ff0: mov ebx, 0xFFFF0000

timeStamp: 263661, ins @403fdc: mov eax, [0x0041D400]
 read: (start: 0x41d400, end: 0x41d403)
timeStamp: 263662, ins @403fe1: and [ebp-0x8], 0x00
 read: (start: 0x12ffb4, end: 0x12ffb7)  write: (start: 0x12ffb4, end: 0x12ffb7)
timeStamp: 263663, ins @403fe5: and [ebp-0x4], 0x00
 read: (start: 0x12ffb8, end: 0x12ffbb)  write: (start: 0x12ffb8, end: 0x12ffbb)
timeStamp: 263664, ins @403fe9: push    ebx
 write: (start: 0x12ffa0, end: 0x12ffa3) , ESP: 0x12ffa4 -> 0x12ffa0
timeStamp: 263665, ins @403fea: push    edi
 write: (start: 0x12ff9c, end: 0x12ff9f) , ESP: 0x12ffa0 -> 0x12ff9c
timeStamp: 263666, ins @403feb: mov edi, 0xBB40E64E

timeStamp: 263667, ins @403ff0: mov ebx, 0xFFFF0000
-------------------------------

		[4] generate the full trace. [10min]
		[5] generate the branch
			use ts=404629 for 0x40103c. Seems to fix the 403fdc problem, but new problems come up.



-------------------------------------------------------------------------------
Task 165:  check writeSOC.  DONE.
-------------------------------------------------------------------------------

11:30AM
-------------------------------------------------------------------------------
Task 166:  During every iteration, redo the init data slice again and see what's going on.
-------------------------------------------------------------------------------
			*use ts=404629 for 0x40103c. 
	[1] Implementation [15 min] DONE.
	[2] Debug [15 min]
		[1. problem. Add program entry into slice]. Read the log.. full_slice is not working. fixed
		[2]. Still problems: (1) printf is still included. (2) infinite loop at the beginning. 

3:30PM
-------------------------------------------------------------------------------
Task 166:  check why printf is included again
-------------------------------------------------------------------------------
			*use ts=404629 for 0x40103c. 
				printf eip: 0x401022 (ts=392742)
	[1] bp on Trace::gen..., set ts=404629, and reverse_trace on 392742
==============================
Reverse ID: 0, ts: 392742, ins @401022: call    0x00000076
Reverse ID: 1, ts: 399467, ins @40112b: ret
Reverse ID: 2, ts: 399468, ins @401027: add esp, 0x04
Reverse ID: 2, ts: -3, -> SOC End
==============================

	Problem: 399468 should be treated as bNoDataDependency at all! BP on 399468; it's included because of the esp link.

	[2] even fixed the above, the ret at 399467 is still treated as control link because the function has dependee.
		processFunction identifies printf() as bHasDependee because of 392741 has isEspDelayDependent() and
	could not find one before the entry.
			processFunction has bug! it's not recovering EIP!
		392743 could not find a corresponding ts with same ESP!
		Found the problem: 392743 is not cleared for EspProcessDelay flag!

7:30PM.
	[3] check why findTS did not find anything. [45 min]
			fixed the bug. DONE.
	[3.5] also double check the last connection point. OK.

	[4] fix the clear_in_slicetags, and add bSOC to processFunction [30 min]

	[5] the tsEsp search is delaying.
	[5] debug: use ts=404629 and break on processFunction of tsRightAfterRet=399468.

Observe 399468.

7:30AM 10/19/2013.
	Observe the log.txt, 
		sliceat: ts=404629 for 0x40103c (jnz ...)
				printf call: 392742, 399467
		read the processFunction log and see why there are real data dependency. (in ts reverse order)
			399457: because of 399480 push fs:[], it reads from 399457 move fs:[], ecx
			399223: because of 400490 in getchar. It's in sysenter code, push [ebx], must be some kernel structure.
			399186: because of 400935 @804d917e: xadd  [ecx], eax (looks like some stats updates)
			399156: caused by 400729  ins @8056452a: xadd  [ecx], eax 
			399122: caused by 400798 ins @804e2a00: mov   ebx, [eax]
			399092: caused by 402006 @804d91b9: xadd  [ecx], eax

		Trouble with kernel structure.

	Debug effort 2: check 399457 and 399223's reverse trace.
	Analysis shown below: Problem: why is 40159d included?

======================================
   reverse trace for ts: 399457 the mov fs:[], ecx 
======================================
Reverse ID: 0, ts: 399457, ins @402938: mov	fs:[], ecx
Reverse ID: 1, ts: 399480, ins @4028f5: push	fs:[]
Reverse ID: 2, ts: 399484, ins @402908: sub	esp, eax
Reverse ID: 3, ts: 399485, ins @40290a: push	ebx
Reverse ID: 4, ts: 399486, ins @40290b: push	esi
Reverse ID: 5, ts: 399487, ins @40290c: push	edi
Reverse ID: 6, ts: 399491, ins @402917: push	eax
Reverse ID: 7, ts: 399493, ins @40291b: push	[ebp-0x8]
Reverse ID: 8, ts: 399499, ins @402934: ret	
Reverse ID: 9, ts: 399508, ins @401500: push	esi
Reverse ID: 10, ts: 399514, ins @4016ba: mov	esi, [ebp+0x8]
Reverse ID: 11, ts: 399530, ins @404779: push	esi
Reverse ID: 12, ts: 399547, ins @4047a1: pop	esi
Reverse ID: 13, ts: 399550, ins @4016de: or	[esi+0xC], 0x00008000
Reverse ID: 14, ts: 399610, ins @404091: mov	eax, [esi+0xC]
Reverse ID: 15, ts: 399617, ins @4040b3: or	eax, 0x01
Reverse ID: 16, ts: 399620, ins @4040be: jnz	0x0000000B
Reverse ID: 17, ts: 399621, ins @4040c9: mov	eax, [esi+0x8]
Reverse ID: 18, ts: 399626, ins @4040d5: call	0x000000BC
Reverse ID: 19, ts: 399635, ins @4041b6: ret	
Reverse ID: 20, ts: 399636, ins @4040da: pop	ecx
Reverse ID: 21, ts: 399638, ins @4040dc: call	0x00005403
Reverse ID: 22, ts: 404483, ins @4095c9: ret	
Reverse ID: 23, ts: 404484, ins @4040e1: add	esp, 0x0C
Reverse ID: 24, ts: 404492, ins @4040fe: push	esi
Reverse ID: 25, ts: 404497, ins @404196: mov	eax, [ebp+0x8]
Reverse ID: 26, ts: 404500, ins @4041b2: mov	eax, [eax+0x10]
Reverse ID: 27, ts: 404504, ins @404105: cmp	eax, 0xFF
Reverse ID: 28, ts: 404505, ins @404108: jz	0x00000032
Reverse ID: 29, ts: 404506, ins @40410a: push	esi
Reverse ID: 30, ts: 404511, ins @404196: mov	eax, [ebp+0x8]
Reverse ID: 31, ts: 404514, ins @4041b2: mov	eax, [eax+0x10]
Reverse ID: 32, ts: 404518, ins @404111: cmp	eax, 0xFE
Reverse ID: 33, ts: 404519, ins @404114: jz	0x00000026   
Reverse ID: 34, ts: 404520, ins @404116: push	edi
Reverse ID: 35, ts: 404522, ins @404118: call	0x00000079
Reverse ID: 36, ts: 404531, ins @4041b6: ret	
Reverse ID: 37, ts: 404533, ins @404120: push	esi
Reverse ID: 38, ts: 404539, ins @404196: mov	eax, [ebp+0x8]
Reverse ID: 39, ts: 404542, ins @4041b2: mov	eax, [eax+0x10]
Reverse ID: 40, ts: 404545, ins @40412d: and	eax, 0x1F
Reverse ID: 41, ts: 404551, ins @404138: jmp	0x00000007
Reverse ID: 42, ts: 404552, ins @40413f: mov	al, [eax+0x4]
Reverse ID: 43, ts: 404555, ins @404146: jnz	0x00000009
Reverse ID: 44, ts: 404556, ins @40414f: cmp	[esi+0x18], 0x00000200
Reverse ID: 45, ts: 404557, ins @404156: jnz	0x00000017
Reverse ID: 46, ts: 404558, ins @40416d: mov	ecx, [esi]
Reverse ID: 47, ts: 404563, ins @404178: jmp	0x00000016
Reverse ID: 48, ts: 404564, ins @40418e: pop	esi
Reverse ID: 49, ts: 404566, ins @404190: ret		//control link	
Reverse ID: 50, ts: 404567, ins @401599: pop	ecx //need visit link
Reverse ID: 51, ts: 404569, ins @40159d: mov	[ebp-0x4], 0xFFFFFFFE //---- problem. XXXX. need visit link
Reverse ID: 52, ts: 404609, ins @4015a9: mov	eax, [ebp-0x1C] //ok.
Reverse ID: 53, ts: 404625, ins @40102f: mov	[ebp-0x4], eax //ok.
Reverse ID: 54, ts: 404628, ins @401038: cmp	[ebp-0x4], 0x61 //ok.
Reverse ID: 55, ts: 404629, ins @40103c: jnz	0x00000011 //ok.
Reverse ID: 55, ts: -4, -> SEED! 
======================================
   END OF reverse trace for ts: 399457
======================================

		sliceat: ts=404629 for 0x40103c (jnz ...)
3:00pm 10/19/2013
	[1] Check timestamp 404569,  it is not added in slice at all
	[2] check why it's in reverse_trace. check why it's added to 404567's reverse trace
		404567 adds 404569 as the reverse pointer because of the setNeedVisit link
		404569 reaches 404567 because of control link ok.
	[3] check why 404569's reverse pointer points to 404609.
		OK.
		So 404569 is set as control visit because 404570 call... is skipped, and it has to be hit
	[4] make improvement to processFunction.
		[1] fix the add order [10 min]
		[2] fix the bOutEntireDependee [10 min] Need more elaboration
	[5] Add ReverseTrace Link Type. [30 min]
			[1] add. [5 min] DONE
			[2] clear in_slice_flags for InstrInfo as well. DONE.
			[2] fix all syntax errors. [20 min] DONE.
			[3] serialization [20 min] DONE.
			[4] unit test [15 min] DONE.

--- TO DO PROCESSFUNCTION!!!!!!!!!!!!!
		
9:00AM 10/20/2013	
-------------------------------------------------------------------------------
Task 167:  improve the processFunction
-------------------------------------------------------------------------------
	[1] algorithm design [1 hr] DONE
	[2] refactor call entry. [15 min] DONE.
	[3] declare and use checkFunctionNoChangeOnESPEBP(tsEntry, tsRet) [15 min] DONE.
11:00AM
	[4] implement checkFunctionNoChange [20 min] DONE.
	[5] verify printf() getchar() do not change esp/ebp [10 min] DONE.
	[6] algorithm design  checkDependee(). [20 min]
		scan backward
			if is needed for mem mark and directly return
			if is needed for reg, check if its reg is delayable
	[7] update callRetRecord. [50 min] DONE.
		[7.1] add an array of registers to protect and a counter [5 min] DONE
		[7.2] add method addRegProtected [8 min] DONE
		[7.3] add method isRegprotected [5 min] DONE.
		[7.3] update serialization and unit test it [20 min]
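
		Possible shape of the [7.1]-[7.3] additions, as a sketch. The capacity of 8, the int register ids and
		`protectedCount` are assumptions; serialization and the rest of callRetRecord are left out.

```cpp
// Sketch of the protected-register part of a call/ret record.
class CallRetRecord {
public:
    static constexpr int kMaxProtected = 8;  // assumed capacity (x86 has 8 GPRs)

    // Returns false only if the array is already full.
    bool addRegProtected(int reg) {
        if (isRegProtected(reg)) return true;  // keep entries unique
        if (count_ >= kMaxProtected) return false;
        regs_[count_++] = reg;
        return true;
    }

    bool isRegProtected(int reg) const {
        for (int i = 0; i < count_; ++i)
            if (regs_[i] == reg) return true;
        return false;
    }

    int protectedCount() const { return count_; }

private:
    int regs_[kMaxProtected] = {};  // [7.1] array of registers to protect
    int count_ = 0;                 // [7.1] counter
};
```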
	
10:30AM 10/21/2013
	[8] add the following to InstrInfo
		[1] hasExactlyOneRegOperand(bool bAsRead) [20 min]
				get the insn, and then check all in/out records. 
		[2] unit test, provide a list of sample instructions [40 min] DONE.
7:30AM 10/22/2013
		[2.5] re-implement hasExactlyOneRegOp and unit test it [1.5 hr]
			very tricky case; blame it on the bad design of libdias.
10:00AM
		[3] Trace::isAccessOneRegFromMem(long ts, bool bRead), readFromMem, writeToMem 
				[3.0] memRange::getTotalSize() [10 min] DONE.
				[3.0] InstrExecRecorder::getWriteMemSize, getReadMemSize [10 min] DONE.
				[3.1] implement is AccessOneReg [15 min] DONE.
					check has one reg, one writeMem or one ReadMem, and check hasExactlyOneregOperand
				[3.2] implement read and writeOneReg [8 min] DONE
				[3.3] testTrace:: constructSampleCall [40 min]
11:30AM 		[3.35] debug the constructSampleCall [45 min]
				[3.4] unit test isReadRegFromMem and isWriteRegFromMem [20 min] DONE
8:00PM 
		[4] Trace::collectRegProtected [30 min] DONE
				scan forward 20 instructions
					for each ts
						if isWriteOneregFromMem
							get reg, mem addr and save to arrRegStoreAddr
				scan backward 20 instructions
					for each ts
						if isReadOneRegFromMem
							verify
							if ok, update the CallRetRecord
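
		The two scans written out as a self-contained sketch. `Step` is a stand-in for what
		isWriteOneRegFromMem / isReadOneRegFromMem report per timestamp, indices stand in for timestamps, and
		"verify" is assumed to mean "same register, same address".

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical per-timestamp summary of one executed instruction.
struct Step {
    bool savesReg;       // writes exactly one register to memory (e.g. push ebx)
    bool restoresReg;    // reads exactly one register from memory (e.g. pop ebx)
    int reg;             // register id, meaningful only if a flag is set
    unsigned long addr;  // memory address touched
};

// Registers saved near the entry and restored from the same address near the return.
std::set<int> collectRegProtected(const std::vector<Step>& trace,
                                  std::size_t tsEntry, std::size_t tsRet,
                                  std::size_t window = 20) {
    std::set<int> result;
    if (tsEntry > tsRet || tsRet >= trace.size() || window == 0) return result;

    // Forward scan from the entry: remember where each register was first saved.
    std::map<int, unsigned long> savedAt;
    for (std::size_t i = tsEntry; i <= tsRet && i - tsEntry < window; ++i) {
        if (trace[i].savesReg) savedAt.emplace(trace[i].reg, trace[i].addr);
    }

    // Backward scan from the return: a restore counts only if it reads the
    // same register back from the address recorded above.
    for (std::size_t i = tsRet; tsRet - i < window; --i) {
        if (trace[i].restoresReg) {
            auto it = savedAt.find(trace[i].reg);
            if (it != savedAt.end() && it->second == trace[i].addr)
                result.insert(trace[i].reg);
        }
        if (i == tsEntry) break;  // do not scan past the function entry
    }
    return result;
}
```

		The window is a parameter because 20 instructions may be too short for functions with long prologues.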
9:45AM 10/23/2013
		[5] call collectRegProtected in collect call and debug it [45 min]
			[5.1] code inspection and make the change [DONE]
			[5.2] debug through the collectReg [DONE]

		[6] regenerate the full trace and check the registers protected by printf.
			[1] observe printf in winxp [10 min] 
					It does not protect any register except ebp.
1:30PM
		[7] bug fix: need to redo the getOnlyRegs of an instruction, change its parameters to InstrInfo itself.
			[7.1] Algorithm Design [30 min] DONE.
			[7.2] Modify InstrInfo::hasExactlyOneRegOperand and add a parameter setReg [15 min] DONE.
			[7.3] update the algorithm for func_check_operand [15 min] - skipped no need. DONE
			[7.4] update the algorithm of getOnlyOneReg --> call some function in InstrInfo.cc [20 min] DONE.
			[7.4] update the algorithm of collectRegProtected [10 min] 
			[7.5] debugging [15 min]	

				sliceat: ts=404571 for 0x40103c (jnz ...)
				printf call: 392678, 399409 
				Debug plan: [1] b Trace::gen and set the ts, and then break on Trace::collectRegProtected and conditional branch
					on 399467
				[a] fix one bug in match call ID.
				[b] fix algorithm of first visit.
				[c] printf should protect EBP, EBX, ESI, EDI but the algorithm did not find them. See if increasing the
					search range could help. --> 100.
					Now works!!!

4:30Pm	
		[6] update the algorithm of Trace::hasDependeeInFunctionBody() 
			[6.0] algorithm design [20 min] OK.
			[6.1] implement Trace::isFunctionProtectingReg() [15 min] .DONE.
			[6.2] call isFunctionProtectingReg in hasDependeeInFunctionBody() [10 min] DONE.
			[6.3] Debugging
				[1] fix the bug on check on ESP/EBP. [OK]
				[2] fix unhandled case for eip: 404171.
			[6.5] check on printf(), it seems that it still has that memory dependency problem on fs:[0].
				has data dependency on 399399.

7:30PM [7] update the algorithm of processFunction
			[7.0] declare an unsigned int inSliceCount for InstrInfo, and remove the bMultiOccurance tag
					in InstrInfo [10 min] DONE.
			[7.1] update the Trace::setInSlice(long long int ts) [15 min] DONE.
			[7.2] declare unmark_inslice() for IER and II, but keep reverse pointer [15 min] DONE.
			[7.3] define Trace::unmark_inslice(long long int ts), clear the slice tag for IER, and reduce
					one count on the InstrInfo, and if it reaches 0, clear the slice tag for InstrInfo.
					The reason to keep the reverse_pointer is just in case it is to forward the dependency [10 min]
					DONE.
			[7.4] declare delayDependencyForFunction(long long int tsEntry, long long int tsRet) [5 min] DONE.
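
			Sketch of the counting scheme in [7.0]-[7.3]. `SliceMarks`, `ExecRecord` and the map containers are
			invented for the example; `ExecRecord` plays the role of the IER, and the sample values in any usage are arbitrary.

```cpp
#include <map>

struct InstrInfo {   // one per static instruction (eip)
    unsigned int inSliceCount = 0;
    bool inSlice = false;
};

struct ExecRecord {  // one per executed timestamp
    unsigned long eip = 0;
    bool inSlice = false;
    long long reverse_pointer = -5;  // survives unmarking
};

class SliceMarks {
public:
    void addRecord(long long ts, unsigned long eip) { records_[ts].eip = eip; }

    void setInSlice(long long ts, long long source) {
        ExecRecord& r = records_.at(ts);
        r.reverse_pointer = source;
        if (r.inSlice) return;           // count each timestamp once
        r.inSlice = true;
        InstrInfo& ii = instrs_[r.eip];
        ++ii.inSliceCount;
        ii.inSlice = true;
    }

    void unmarkInSlice(long long ts) {
        ExecRecord& r = records_.at(ts);
        if (!r.inSlice) return;
        r.inSlice = false;               // reverse_pointer is deliberately kept
        InstrInfo& ii = instrs_[r.eip];
        if (--ii.inSliceCount == 0) ii.inSlice = false;  // last occurrence gone
    }

    bool instrInSlice(unsigned long eip) const {
        auto it = instrs_.find(eip);
        return it != instrs_.end() && it->second.inSlice;
    }

    long long reversePointer(long long ts) const { return records_.at(ts).reverse_pointer; }

private:
    std::map<long long, ExecRecord> records_;
    std::map<unsigned long, InstrInfo> instrs_;
};
```

			A counter instead of a single bMultiOccurance flag lets the InstrInfo tag drop only when its last marked occurrence is unmarked.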

9:00AM 10/24/2013
			[7.5] call delayRegDependency() in processFunction [30 min] DONE.
			[7.6] implement delayRegDependency().[30 min] DONE.
10:00AM
			[7.7] debug delayRegDependency [20 min]
				sliceat: ts=404571 for 0x40103c (jnz ...)
				printf call: 392678, 399409 
				[problem 1] has to serialize countInSlice.
			[7.8] debug processFunction [30 min]
				[problem 1] fix handling of other types of data dependency. DONE.
				[problem 2] missing updateCache(). DONE.
11:04AM
			[7.9] problem: 399409 is included as the last timestamp of SOC and is included.
				Fix the identifySOC. DONE.
			[7.10] check why the last ret 399409 is inSlice. (reverse link points to 399410 for control link). caused by
				memlink 399499. 
			[7.12] fix bug tsEntry<tsSOCStart, 322391 to 333791
			[7.12] dump the dependency of 399399
				modify reverse_trace DONE.
		Dump below:
printf call: 392678, 399409 
7:30PM Check the trace
	[1] Problem: 404708 is included in slice.  It's greater than the sliceat point.
			it is caused by 404571 having 40 accesses (strange); must be something wrong with serialization.
		Regenerate the raw, full, and branch trace. Fixed
	[2] Problem 2: crashed, complaining about non-descending order. bp on full_slice tsStart==289491.
		When processing 289490 (searching for tsEnd of SOC), strangely it did not find 289491.
			Still the problem of visitedOnce -----
----------------------------------------------

7:30AM 10/25/2013 Continue on the descending order problem.
	Conjecture: problem is with the countInSlice number
		Sliceat: Timestamp: 404737 0x40103c
		printf (392684 @401022 -->   399409 @0x40112b)
	Still the problem is 399399.
	Fixed the problem, but now the algorithm runs very slowly.
  Now dumps below:

======================================
   reverse trace for ts: 399399
======================================
Reverse ID: 0, ts: 399399, 		 Type: MEM_LINK	ins @402938: mov	fs:[], ecx
------------ printf finishes at 399409 -------------
Reverse ID: 1, ts: 399422, 		 Type: ESP_LINK	ins @4028f5: push	fs:[] //OK.
Reverse ID: 2, ts: 399426, 		 Type: ESP_LINK	ins @402908: sub	esp, eax
Reverse ID: 3, ts: 399427, 		 Type: ESP_LINK	ins @40290a: push	ebx
Reverse ID: 4, ts: 399428, 		 Type: ESP_LINK	ins @40290b: push	esi
Reverse ID: 5, ts: 399429, 		 Type: ESP_LINK	ins @40290c: push	edi
Reverse ID: 6, ts: 399433, 		 Type: ESP_LINK	ins @402917: push	eax
Reverse ID: 7, ts: 399435, 		 Type: MEM_LINK	ins @40291b: push	[ebp-0x8]
Reverse ID: 8, ts: 399441, 		 Type: CONTROL_LINK	ins @402934: ret	
..-------------------------------------------------------------------------------------

8:30AM
Problem: 399441 should match 399420 so that the entire call is not processed.
	[1] verify using Windows XP
		399420: @4014d0 call seh_prolog4
		399441: return to @4014d5
	It establishes a new exception handler and adjusts esp (enlarges the stack frame).
It pushes fs:[] so that the new exception handler points to the existing SEH handler.

Algorithm Discussion: for instructions modifying fs, send alert and record the fs:[0] address.
For instruction modifying fs:[0], send event and entail information of the new fs:[0] value.
When processing function, if an instruction is modifying fs:[0] value and it is being dependent on memory link,
unmark the instruction and add a delayed link.

10:00AM
-------------------------------------------------------------------------------
Task 168:  fix call/ret pairing routine 
-------------------------------------------------------------------------------
	[1] debug and set conditional bp on 399441 [15 min]
	[2] remove the last statement in the function [10 min]
	[3] causes desc order check to fail. Temporarily enable it after we fix the push fs:[] issue.

10:20AM
-------------------------------------------------------------------------------
Task 169:  identify fs preserving function and delay handling
-------------------------------------------------------------------------------
	[1] Algorithm Design [25 min]
11:00AM
	[2] Capture FS modifying.
		[2.1] define FS_0 value in env [8 min] DONE.
		[2.2] defines event  NEW_FS_0 [10 min] DONE.
		[2.3] in process instruction in ops_sse.h, whenever FS_0 value changes from the older one, send an event  [15 min] DONE
		[2.4] fix TraceManager, Trace handle event. in Trace add FS_0 value and set it to new value.  [20 min] DONE.
		[2.5] debug and get the new FS value. should be 7ffdxxxx range. [15 min]
			bug1. it should be initialized [ DONE].
			verified. fs0 set. [DONE]
		[2.6] fix the old problem of non existing page map again.
			pagemap is 0 again. --> fixed: SIMPLY REBUILT THE ENTIRE SYSTEM AGAIN.
		[2.7] observation: FS_0 ACTUALLY NEVER CHANGES!!!
3:40PM
	[3] Capture re-writing of SEH handler.
		[1] declare class FSChangeRecord. 
				in Trace declare a Cache that keep the record of fs0, name it histChangeFS0, record the ts and the
				value of SEH [30 min] DONE
		[2] code inspection and unit test FSChangeRecord [20 min]. DONE.
		[3] add FCR. [10 min] DONE.
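
		One way the fs:[0] history could be kept and queried, as a sketch. `FS0History` and `valueAt` are
		hypothetical names (the notes only name FSChangeRecord and histChangeFS0); records are assumed to be
		appended in ascending ts order.

```cpp
#include <algorithm>
#include <vector>

// One rewrite of the SEH head pointer stored at fs:[0].
struct FSChangeRecord {
    long long ts;            // timestamp of the writing instruction
    unsigned long sehValue;  // new value stored at fs:[0]
};

class FS0History {
public:
    // Records must be added in ascending ts order.
    void add(long long ts, unsigned long value) { hist_.push_back({ts, value}); }

    // Value of fs:[0] in effect at ts; false if no change was recorded at or before ts.
    bool valueAt(long long ts, unsigned long& out) const {
        auto it = std::upper_bound(
            hist_.begin(), hist_.end(), ts,
            [](long long t, const FSChangeRecord& r) { return t < r.ts; });
        if (it == hist_.begin()) return false;
        out = (it - 1)->sehValue;  // latest record with record.ts <= ts
        return true;
    }

private:
    std::vector<FSChangeRecord> hist_;
};
```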
7:30PM 
		[1] in helper_trace_mem, if the write addr is fs_0, set the flag to collect fs:[0] [15 min] DONE.
			[0] in ops_sse.h, declare a flag bCollectFS0Content. [5 min] DONE.
			[2] in helper_trace_mem set the flag. [10 min] DONE.
 		[2] in helper_trace2, if the flag to collect fs:[0] is set, unmark the flag, collect the value and send the event
			to Trace [15 min]
			[1] declare event resetSEH [10 min] DONE.
			[2] in helper_trace2 send the event [10 min] DONE.
			[2.5] set the flags in helper_trace_mem [5 min] DONE.
			[3] BatchAnalyzer to Trace, handle_reset_SEH [15 min] DONE.
			[4] debug and verify Trace has the event [10 min] NOT WORKING
8:30AM 10/26/2013
	Redesign Algorithm [45 min] DONE
9:30AM
	[1] in helper_trace_mem if the write addr is fs_0, send an event for SEH_CHANGE_ALERT
		[1] in event.h define the event [10 min] DONE.
		[2] in helper_trace_mem send the event [10 min] DONE.
		[3] in BatchAnalyzer, TraceManager, and Trace handle the event [15 min] DONE.
		[3.5] in BatchAnalyzer declare two functions for managing NEED_SEH. [5 min] DONE.
		[4] in Trace::handle_seh_change_alert call BatchAnalyzer::setNeedsSEH increase a counter [10 min] DONE.
		[5] in handle.h declare function isNeedSEH [5 min] DONE.
		[6] debug and capture the SEH_CHANGE_ALERT [15 min] DONE.
10:40AM
	[2] collect the SEH value
		[0] algorithm design [15 min] DONE
		[1] in helper_trace2 check if SEH is needed, if it is needed send the event (cr3, eip, value) [10 min] DONE
		[2] in BatchAnalyzer, TraceManager handle the event [10 min] [DONE]
		[3] Trace handle the event, check (eip, value) if it is as expected [15 min] [DONE]
		[4] continue the logic, push it into FCR record [10 min] [DONE]
		[5] debug through [15 min]
			[1] should collect only when in record mode. DONE.
			[2] collect raw trace. DONE.
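
Sketch of the two-phase SEH collection above, assuming the write hook only raises a "need SEH" counter and the post-instruction hook reads the new fs:[0] content and emits (cr3, eip, value). The class and member names are hypothetical stand-ins for helper_trace_mem / helper_trace2 / BatchAnalyzer::setNeedsSEH; the real hooks are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event payload; the log only states that (cr3, eip, value) is sent.
struct SehValueEvent {
    uint32_t cr3;
    uint32_t eip;
    uint32_t value;
};

// Stand-in for the analyzer side. Names mirror the log, bodies are assumed.
class SehCollector {
public:
    explicit SehCollector(uint32_t fs0Addr) : fs0Addr_(fs0Addr) {}

    void setRecordMode(bool on) { recordMode_ = on; }

    // Phase 1 (memory-write hook): raise the alert if the write covers fs:[0].
    void onMemWrite(uint32_t addr, uint32_t size) {
        // [addr, addr+size) covers fs0Addr_ iff addr <= fs0Addr_ < addr+size.
        if (recordMode_ && addr <= fs0Addr_ && fs0Addr_ - addr < size) {
            ++needSeh_;  // plays the role of setNeedsSEH
        }
    }

    bool isNeedSEH() const { return needSeh_ > 0; }

    // Phase 2 (post-instruction hook): the write has landed, collect the value.
    void onInstrDone(uint32_t cr3, uint32_t eip, uint32_t valueAtFs0) {
        if (!isNeedSEH()) return;
        needSeh_ = 0;  // reset so the value is collected once per alert
        events_.push_back({cr3, eip, valueAtFs0});
    }

    const std::vector<SehValueEvent>& events() const { return events_; }

private:
    uint32_t fs0Addr_;
    bool recordMode_ = false;
    int needSeh_ = 0;
    std::vector<SehValueEvent> events_;
};
```

The split exists because at write-hook time the new value is not yet in memory; only after the instruction completes can fs:[0] be read back.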

12:00PM
	[3] construct full trace.
		[3.1] in Trace::constructFullTraceFromRaw Trace, simply set the FCR to the right path [10 min] DONE
		[3.15] in Trace::expandFromRow add a log message if an instruction is saving SEH. [10 min] DONE.
		[3.2] debug and test [10 min] DONE
------------------- TO do.
		[3.3] do the same thing for loadFullTrace [10 min] DONE.
		[3.4] develop Trace::isFunctionPreserveSEH(tsCall, tsRet, idSEHHint) [30 min]
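
Sketch for [3.4], assuming the FS change history is a ts-sorted list of (ts, SEH value) records and that "preserve" means the value in effect at the call equals the value in effect at the ret. The struct layout and the helper sehAt are assumptions; the idSEHHint parameter is left out because its exact meaning is not spelled out in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical record: timestamp of a write to fs:[0] and the value written.
struct FSChangeRecord {
    long long ts;
    uint32_t seh;
};

// Value of fs:[0] in effect at time ts: the last record with record.ts <= ts.
// Returns false if no write has been recorded up to ts.
bool sehAt(const std::vector<FSChangeRecord>& hist, long long ts, uint32_t& out) {
    auto it = std::upper_bound(
        hist.begin(), hist.end(), ts,
        [](long long t, const FSChangeRecord& r) { return t < r.ts; });
    if (it == hist.begin()) return false;
    out = std::prev(it)->seh;
    return true;
}

// True if the SEH head seen at the call is the same as the one seen at the ret.
bool isFunctionPreserveSEH(const std::vector<FSChangeRecord>& hist,
                           long long tsCall, long long tsRet) {
    uint32_t before = 0, after = 0;
    bool knownBefore = sehAt(hist, tsCall, before);
    bool knownAfter = sehAt(hist, tsRet, after);
    if (!knownBefore && !knownAfter) return true;  // fs:[0] never written so far
    return knownBefore && knownAfter && before == after;
}
```

Only the two endpoints are compared, so a function that installs its own handler and restores the old one before returning counts as preserving.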


-------------------------------
7:30 10/28/2013 
		[1] update the search of SEH records [20 min] DONE.
		[2] call searchforseh in setupCallTable [20 min]		
			[2.1] in CallRetRecord add a flag and the update function [10 min] DONE.
			[2.2] fix the serialization problem in unit test. [20 min] DONE.
8:30AM
			[2.2] call it in setupCallTable [30 min] DONE
			[2.3] finish the searchSEH [30 min] [DONE]
			[2.4] code inspection [30 min]
10:30AM
			[2.2] debug through isPreserveFunctionSEH 
				[1] fix loadFullTrace bug, pathSCR not set. [15 min]
				[2]  fix loop bug [10 min]
				[3] fix the loop logic. [15 min]
				[4] fix the appendRecord error [30 min]			
					try call resetToLast().  not working.
					debug and check last 3 appends. Observation: broke and cache block size.
					problem: search for all did not reset it.
					Now seg fault.
			--- strange, cannot find out the problem. ------------ need bp later.
7:30PM
					[4.1] recompile, rebuild, and regenerate the full trace.
					[4.2]. still broke at 98166, use watch point to find out problem.
						loadCache 3797 problem. It's the serialization problem.
						3817 is already not right.
					[4.3] try call resetToLast() in append() and see if it works.
					[4.4] this was overwritten.
					check the logic of Cache::loadBlock 
					--- debug -- check the contents of 3817 and check when it is written to disk.

8:30AM 10/29/2013
	Continue the debug. 
	[1] b Trace.cc:360 if retID==98083
	[2] load ccr.loadFromCache(3817) and see if it is messed. confirmed it's messed
	[3] b Trace.cc:360 if retID==97747 and repeat [2] to see if it's messed. Verified it's messed.
	So we need to debug and set watch point and see how it's messed
	[4] set a breakpoint at ccr.appendToCache when callID is 3817 and retrieve where it is stored
			size is 34
			content is: 
			0xbfffda8c:     0x013c8401      0x00000000      0x12fa4800      0xffffff00
			0xbfffda9c:     0xffffffff      0x919b78ff      0x000ee97c      0xffff0000
		It is appended in the last position (3816).
		It is stored at: offset 26578, 0x6640f7da (curBlockID is 3)
			(gdb) p this->block
			$36 = 0x66409008 "\001k\001\001"
			(gdb) p this->curBlockSize
			$37 = 26578
			(gdb) p this->block + this->curBlockSize
			$38 = 0x6640f7da ""


	[5] check when it is saved, if it's the same content.  and check when it's loaded.
			1. the first time it's saved it's fine. First time it's loaded it's ok.
			2. multiple loads and save it's ok.
			3. check if after an updateCache it's changed. It's still fine.
			4. set a watch point (blockID=3; it should imply that at location 26578 the content is 0x013c8401)
				watch this->curBlockID!=3 || this->block[26578]==0x013c8401
				does not work, only captured that when it is loaded, it's messed.
			5. try to locate the last Cache::writeCurrentBlockToDisk. Insert the check at the
				beginning and end of writeCurrentBlockToDisk().
				Findings: first throwing error at 97693.
			6. b Trace.cc:413 if i==97693 and check what's going on.
					Delve into searchForCall, the stack has 11 calls in it.
					display this->callTable->test() on each iteration. After the loop it is fine.

				Found that after line 450 the append call it messed.
				450  long long int cid = ccr.appendToCache(this->callTable);
			7. to repeat. [1] b Trace.cc:413 if i==97693 and then [2] b 450
				test it before and after. VERIFIED. now check why it performs like this.

8:45AM 10/30/2013
		[1] Repeat the experiment 7: [15 min]
			7. to repeat. [1] b Trace.cc:413 if i==97693 and then [2] b 450
				test it before and after. VERIFIED. now check why it performs like this.
		[2] debug into the last appendRecord don't see any difference [15 min]
		[3] do a comparative study of callTable->test() [45 min]
			[1] before:
					in saveCurBlockToDisk, startIdx is 39640, nxtRecordInBlock is 679. posOfSizeIndex: 25106,
					curBlockSize: 25098.
					for the block to load, startIdx is 39540. Does not look right: only 100 bytes of difference?
					1st block is shown below:
					(gdb) x/8wx this->block
					0x66209008:     0x01016b01      0x00000000      0x12f9ac00      0x01017800
					0x66209018:     0x00000000      0x9105c800      0x000c007c      0xffff0000
			[2] after:
				Same
			Verified: the records are too close to each other, and the later writes overwrite
			the earlier ones, which corrupts the data.
		[4] check how vecStartIdx is loaded. [30 min]
			break on Cache.cc:158
			Observation: when it is working in the LAST block (still not reaching the full capacity).
				It seems that vecBlockSize[] is not updated correctly.
10:45AM
		[5] introduce a function: updateLastBlockIdxSize() [20 min] DONE.
			if the idx of next block is already in vec, update the size of idx of next block.
		[6] debug updateLastBlockIdxSize() [30 min]
			Fixed the problem
		[7] remove the test functions.
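
Sketch of the idea behind updateLastBlockIdxSize(), assuming the cache keeps a per-block table of (first record index, record count) and that the bug was a stale entry for the last, still-growing block. The struct and its layout are hypothetical; the real Cache class and its vecStartIdx / vecBlockSize are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block table: one entry per on-disk block.
struct BlockIndex {
    std::vector<long long> startIdx;  // index of the first record in each block
    std::vector<long long> count;     // number of records held by each block

    // Called every time the current block is written out. If the block already
    // has an entry (it is the last, partially filled block), refresh that entry
    // instead of leaving the old size in place.
    void updateLastBlockIdxSize(std::size_t blockId, long long start, long long n) {
        if (blockId < startIdx.size()) {
            startIdx[blockId] = start;
            count[blockId] = n;
        } else {
            assert(blockId == startIdx.size());  // blocks are added in order
            startIdx.push_back(start);
            count.push_back(n);
        }
    }

    // Block that holds record `id`, or -1 if the table does not cover it.
    int findBlock(long long id) const {
        for (std::size_t b = 0; b < startIdx.size(); ++b) {
            if (id >= startIdx[b] && id < startIdx[b] + count[b]) {
                return static_cast<int>(b);
            }
        }
        return -1;
    }
};
```

With a stale count, records appended to the last block after its first save are invisible to the lookup, which is one way later writes can land on top of earlier ones.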
11:45AM
		[8] now pick up where we stopped.
			 check the isFunctionPreserveSEH call in setupCallTable 
				BP on 451 and display. seems ok.
		[9] check the read of 0xFFFFFFFF problem.
				b ops_sse.h:2419
				it seems that when reading 0xfddf0000 (in kernel mode) it always returns 0xFFFFFFFF.
		[10]
		Sliceat: Timestamp: 404629 0x40103c
		printf (392742 @401022 -->   399467 @0x40112b)
			set BP at  451 if i==399467
			verified it's true

		[11] slice.
			Trace Size: 455751, in slice: 136261, Percentage: 29.90%
			Instruction Store Size: 48115, in slice: 13929, Percentage: 28.949392%
			Instruction Store Size (excluding imported DLL): 3247, in slice: 2225, Percentage: 68.524792%
		Printf is still there.

		[12] check the fs reading instruction is still there. (timestamp: 399457 @eip: 0x402938).

9:00AM 10/31/2013

-------------------------------------------------------------------------------
Task 170:  Examine the printf in slice again.
-------------------------------------------------------------------------------
	[1] run it again. [30 min]
		Sliceat: Timestamp: 404629 0x40103c
		printf (392742 @401022 -->   399467 @0x40112b)
			set BP at  451 if i==399467
			verified it's true
		The fs instruction is at 399457
	[2] Algorithm design: Trace full_slice. [15 min]
9:45AM
	[3] Implementation:
		[3.0] modify getCallEntry and retrieve the SEH [15 min]	 DONE.
		[3.1] Modify ::hasDataDependency. First check if the function is preservingSEH, and then check 
			ier if it is writing to seh. [20 min] DONE.
		[3.2] Define delayMemSEHReference() [1 hr]. 
11:30AM
		[3.3] call delayMemSEHReference [30 min] DONE.
		--- debug ---
		Sliceat: Timestamp: 404629 0x40103c
		printf (392742 @401022 -->   399467 @0x40112b)
			set BP at  451 if i==399467
			verified it's true
		The fs instruction is at 399457
		[3.3] Debug into getCallEntry [10 min] OK. DONE.
		[3.4] debug into hasDataDependency [10 min]
				b Trace.cc:1091
			[3.4.1] declare Trace::isWriteToSEH(long long int ts, sehHint)
			[3.4.2] modify the hasDataDependendee and take sehHint [10 min] DONE 
		[3.5] debug into isWriteToSEH. DONE.
7:30PM
		[3.6] debug delayMemSEHReference [40 min] DONE.
		[3.7] Fix the VecSOC desc order problem.
			failed at addTS ts=387378
			bp on ts==387379
			Problem with 387641: its countInSlice is changed, so it can no longer be used as a boundary.
			*** need to rethink the SOC identification and merging algorithm.


8:30AM 11/1/2013
		[3.8] check the addSOC algorithm again.
			[1] read the algorithm. [30 MIN]
			[2] modify the algorithm. [30 min]
10:30AM
		[3.9] debug the addSOC algorithm
			[1] change insertSOC to return the resulting SOC. [20 min] DONE.
			[2] fixed the copyFrom issue [15 min] DONE.
			[3] check the first couple of add. [15 min] 
			[4] fix the merge algorithm. [15 min]
7:30PM
			[5] check why it's violating the reverse order again.
				Last add: 387657.
				bp on IT. Found the problem, <= problem.
				mergeID should be socIdx+1.
			[6] improve the <= problem in findSOCToInsert. [20 min]
			[7] fix bridge problem. [15 min]
			[8] fix the check desc order problem.


			[5] check line 71 (did not hit)
		
		[3.6] debug processFunction for printf [20 min]
	

7:30AM 11/02/2013	
-------------------------------------------------------------------------------
Task 171:  improve speed.
-------------------------------------------------------------------------------
	[1] add a bool flag to full_slice_all_soc() when flag is true, check modified soc only [15 min]
	[2] test and run [15 min]  DONE.

8:30AM
-------------------------------------------------------------------------------
Task 172:  check printf again.
-------------------------------------------------------------------------------
		printf (392742 @401022 -->   399467 @0x40112b)
			set BP at  451 if i==399467
			verified it's true
		The fs instruction is at 399457
	[1]  observe processFunction and see what causes function printf is included. [1 hr]
		dependency: 
			[1] 399457 (seh writing) skipped ok.
			[2] 399223 (also seh writing). skipped ok.
			[3] 399186 Problem:
			timeStamp: 400935, ins @804d917e: xadd  [ecx], eax
 read: (start: 0xe1339748, end: 0xe133974b)  write: (start: 0xe1339748, end: 0xe133974b) , DEPLINKS:  , R: 400934 , M: 399186
		Also 398592,  It seems only these two verify it later.

	[2] modify the program so that it print out all violations in process functions. [30 min]
		Observation: it has a lot of unknown dependency. Take some examples and study them.
		[1] 399121, it is introduced by 399122 on reg dependency. Need to add a limit that restricts the reverse_pointer
			to inside the function. There are over 20 memory dependencies between scanf and printf, as shown below:
			- has mem dependency at 399186
				-- has mem dependency at 399156
				-- has mem dependency at 399122
				...
				-- has mem dependency at 394560
				-- has mem dependency at 394405
		[2] study these dependency and see if we can remove any
				-- has mem dependency at 399186 -- depended by internal syscall instructions (internal)
				-- has mem dependency at 399156 -- internal
				-- has mem dependency at 399122 -- internal 
				-- has mem dependency at 399092 -- internal
				-- has mem dependency at 399051 -- internal (lock)
				-- has mem dependency at 399038 -- internal
				-- has mem dependency at 398790 -- internal
				-- has mem dependency at 398774 -- internal
				-- has mem dependency at 398674 -- internal (looks like a lock inc and dec)
				-- has mem dependency at 398592 -- internal
				-- has reg dependency at 398586 -- INTERNAL REG DEPENDENCY ON CR0!!! (switch cr0 and back and forth)
				-- has mem dependency at 398551 -- some global var internal
				-- has mem dependency at 398546 -- internal
				-- has mem dependency at 398501 -- internal
				-- has mem dependency at 398500 -- internal
				-- has mem dependency at 398360 -- internal
				-- has mem dependency at 398359 -- internal
				-- has mem dependency at 398338 -- internal
				-- has mem dependency at 398182 -- internal looks like lock
				-- has mem dependency at 397988 -- internal
				-- has mem dependency at 396514 -- internal
				-- has mem dependency at 396507 -- internal
				-- has mem dependency at 396504 -- internal look like a counter
				-- has mem dependency at 396501 -- internal 
				-- has mem dependency at 395364 -- ins @402543: mov [ebp-0x211], al !!! looks like preparing some internal data 
													structures, but it is read by scanf
				-- has mem dependency at 395359 -- *** also in range @4025xx
				-- has mem dependency at 394560 -- ***** depended by 400579 check why.???
				-- has mem dependency at 394405 -- similar to above
				-- has mem dependency at 394403 -- similar to above
				-- has mem dependency at 394274 *** similar to above but in @7crange
				-- has mem dependency at 394170 *** similar
				-- has mem dependency at 394166 *** similar
				-- has mem dependency at 392979 *** 
		[3] start winxp and check 395359 [30 min]
			timeStamp: 395359, ins @40192a: inc [esi]
 read: (start: 0x12fcfc, end: 0x12fcff)  write: (start: 0x12fcfc, end: 0x12fcff) , DEPLINKS:  , R: 395346 , M: 395272 , C: 395358 ESP: 0x12fc98 EBP: 0x12ff20
			Observation: 0x40192a is visited multiple times.
				It's part of the write_char, clearly [esi] is the counter of the number of characters written
				mem addr of the counter is 0x0012fcfc (when printf() is finished it has counter value 9 - 9 chars printed).
				During the call of getchar, 0x0012fcfc is overwritten with some value 982b0000.
				In the other printf, it is cleared to 0 again and used as a counter.
				When calling getchar, the esp is 0x0012ff70 (higher than 0x0012fcfc). So the getchar does use
				the 0x12fcfc as the temp local stack frame and passes somehow the region to the syscall.

			Check the dependee: timeStamp: 400579, ins @80578677: repz movs es:[edi], ds:[esi]
				The first related @7c... instruction is: 0x7c91ec82, it is verified that after the syscall, the area
				of 0x0012fcfc is modified. 

				They are copied somehow to kernel buffer, but it does not seem to be useful to me here???
				Notice that 400579 also depends on other memory bytes in the same region. Clearly, all of them
				belong to the message structure passed by CsrClientCallServer.

				0x0012FCEC is the message passed to CsrClientCallServer (the message structure). And clearly 0x0012FCFC
				is in some type of union structure and included as extra bytes. The kernel routine then blindly
				copies it from the user stack to the kernel buffer first, without actually using it.

			Observation: to verify check how 0x0012fcfc is used. At 400579, copy range is shown as below
					read: (start: 0x12fcec, end: 0x12fd03)  write: (start: 0xf74dbcc4, end: 0xf74dbcdb)
					==> 0x12fcfc is copied to 0xf74dbcd4.
					Then it is accessed by
					TimeStamp: 400860, ins @8056a652: movs  es:[edi], ds:[esi]
				 read: (start: 0xf74dbcd4, end: 0xf74dbcd7)  write: (start: 0xe117fac0, end: 0xe117fac3) 
					--> never used.

					and also: timeStamp: 403065, ins @8056a652: movs  es:[edi], ds:[esi]
				 read: (start: 0xf74dbcd4, end: 0xf74dbcd7)  write: (start: 0xe117fac0, end: 0xe117fac3) 
					--> never used

					timeStamp: 406070, ins @8056a652: movs  es:[edi], ds:[esi]
					 read: (start: 0xf74dbcd4, end: 0xf74dbcd7)  write: (start: 0xe117fac0, end: 0xe117fac3) 
					--> never used

					It seems that they will not impact user code!

						Then it is overwritten by a push instruction.
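
The address mapping used above (0x12fcfc ends up at 0xf74dbcd4) is plain offset arithmetic over the read/write ranges of the repz movs at 400579. A small sketch, assuming the ranges are inclusive as printed in the trace dump and that the copy runs forward; Range and mapThroughCopy are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Inclusive byte range, in the same form as the "read:/write:" trace output.
struct Range {
    uint32_t start;
    uint32_t end;
};

// Map a source address of a block copy to the destination address that
// receives the same byte. Returns false if addr lies outside the copy.
bool mapThroughCopy(Range src, Range dst, uint32_t addr, uint32_t& out) {
    if (addr < src.start || addr > src.end) return false;
    uint32_t offset = addr - src.start;
    if (offset > dst.end - dst.start) return false;  // ranges of unequal length
    out = dst.start + offset;
    return true;
}
```

For the copy at 400579: offset = 0x12fcfc - 0x12fcec = 0x10, and 0xf74dbcc4 + 0x10 = 0xf74dbcd4, which matches the address read later at 400860.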
			**************************************
			note: get_char is from 399469 to 404625.
			**************************************

			Task 2: observe 399186.	
			400935, ins @804d917e: xadd  [ecx], eax read: (start: 0xe1339748, end: 0xe133974b) 
				Seems to be a lock, always +1/-1 in the kernel code section.
			Check how it's included.

			400935 is added because the entire function from 400463 to 402174 is added
			(it's a KiFastSysCall); it seems that sysenter/sysexit needs processing (protect registers).

=====================================================================================================
	9:35AM 11/03/2013
		********************** 9
		[1] double check how 395359 is included: generate the reverse_trace.
			The problem (previously analyzed): unused stack contents part of the union of the
			request message sent by CsrClientRequestServer. It is copied to kernel buffer, however,
			actually never used.

======================================
Reverse ID: 0, ts: 395359,       Type: NEED_VISIT   ins @40192a: inc    [esi]
Reverse ID: 1, ts: 400579,       Type: MEM_LINK ins @80578677: repz movs    es:[edi], ds:[esi] 
**** copied the entire request_message structure, where the part modified by 395359 is actually not used
**** the part is 0x12fcfc
Reverse ID: 2, ts: 400840,       Type: REG_LINK ins @8056a621: lods eax, ds:[esi]
*** the contents is essentially from 0x12fcec (the first word) this determines the message type

Reverse ID: 3, ts: 400842,       Type: REG_LINK ins @8056a623: lea  ecx, [eax+0x3]
*** use it as a pointer (offset)? -- anyway part of the message parsing

Reverse ID: 4, ts: 400843,       Type: REG_LINK ins @8056a626: and  ecx, 0x0000FFFC
*** still part of message parsing

Reverse ID: 5, ts: 400844,       Type: REG_LINK ins @8056a62c: shr  ecx, 0x02
*** still part of message parsing

Reverse ID: 6, ts: 400863,       Type: MEM_LINK ins @8056a658: repz movs    es:[edi], ds:[esi]
*** So this actually determines the copy size, relies on 400844 by copying its contents

Reverse ID: 7, ts: 401872,       Type: MEM_LINK ins @8056a658: repz movs    es:[edi], ds:[esi]
*** still copying contents from that one

Reverse ID: 8, ts: 402192,       Type: MEM_LINK ins @7c91eb96: sub  [ecx], edi
*** 
Reverse ID: 9, ts: 402213,       Type: REG_LINK ins @7c8715f8: mov  esi, [ebp-0x38]
*** the above is to determine the buffer location
Reverse ID: 10, ts: 402220,          Type: MEM_LINK ins @7c87160d: repz movs    es:[edi], ds:[esi] 
*** the above is to copy the I/O reading contents
Reverse ID: 11, ts: 404386,          Type: REG_LINK ins @409184: mov    al, [ecx]
Reverse ID: 12, ts: 404391,          Type: MEM_LINK ins @409192: mov    [ebx], al
Reverse ID: 13, ts: 404560,          Type: REG_LINK ins @404172: movzx  eax, [ecx]
Reverse ID: 14, ts: 404568,          Type: MEM_LINK ins @40159a: mov    [ebp-0x1C], eax
Reverse ID: 15, ts: 404609,          Type: REG_LINK ins @4015a9: mov    eax, [ebp-0x1C]
Reverse ID: 16, ts: 404625,          Type: MEM_LINK ins @40102f: mov    [ebp-0x4], eax
Reverse ID: 17, ts: 404628,          Type: NEED_VISIT   ins @401038: cmp    [ebp-0x4], 0x61
=============================
*********************************************************************************************************
***!!!  Summary (1) : again, verified the problem is the inaccuracy of the movb instruction: part of the data
	is actually never used. When processing the dependency of 400840 on 400579, it should piggyback the information
	about the address that it depends on. Then, when 400579 selects its next dependee, it can choose
	the right one.
**** [1] in the toProcess array, in addition to the ts, specify the reason why it is needed (reg or 
	mem address to be read). Then, depending on the data needed, add the corresponding link.
	For example, at 400579, if the address 0x12fcfc is needed, then it looks at the depend links 
	and picks the matching one.
	For example, for a PUSH EAX instruction, if it is needed because of the value, then both EAX and ESP should be included; if only ESP is needed, then only ESP is included.

	Could declare a Trace::dispatchDependency(Queue&, ts, DataSource) then based on the instruction info, dispatch
the address dependency.
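
Sketch of the address-aware dispatch for a block copy, assuming each memory link of the copy records which source bytes its producer wrote. All types and the function name dispatchMemDependency are hypothetical; only the 400579 and 395359 ranges come from the trace above, and producer ts 1 is a made-up placeholder for whoever wrote the first word of the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct MemRange {  // inclusive, as in the trace dump
    uint32_t start;
    uint32_t end;
};

inline bool overlaps(const MemRange& a, const MemRange& b) {
    return a.start <= b.end && b.start <= a.end;
}

struct MemLink {  // producer instruction and the bytes it wrote
    long long producerTs;
    MemRange written;
};

struct CopyInstr {  // e.g. a repz movs with its read and write ranges
    long long ts;
    MemRange src;
    MemRange dst;
    std::vector<MemLink> memLinks;
};

// Given the destination bytes a later instruction actually needs, return only
// the producers that wrote the corresponding source bytes (forward copy assumed).
std::vector<long long> dispatchMemDependency(const CopyInstr& ins, MemRange neededDst) {
    std::vector<long long> producers;
    uint32_t lo = std::max(neededDst.start, ins.dst.start);
    uint32_t hi = std::min(neededDst.end, ins.dst.end);
    if (lo > hi) return producers;  // the needed bytes were not written by this copy
    MemRange neededSrc = {ins.src.start + (lo - ins.dst.start),
                          ins.src.start + (hi - ins.dst.start)};
    for (const MemLink& link : ins.memLinks) {
        if (overlaps(link.written, neededSrc)) producers.push_back(link.producerTs);
    }
    return producers;
}
```

With this, a reader of the first message word (0xf74dbcc4..c7) would no longer pull in 395359, which only wrote 0x12fcfc..ff.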
			
			

<< ---
			Task 2: observe 399186.	
			400935, ins @804d917e: xadd  [ecx], eax read: (start: 0xe1339748, end: 0xe133974b) 
				Seems to be a lock, always +1/-1 in the kernel code section.
			Check how it's included.
			Check 400935. Strangely there is no reverse trace for 400935.
			400935 is added because the entire function 400463 to 402174 is added.
			Manually constructed trace:
				400935 --> 400936 (no where)
				400935 --> 402100 (similarly xadd) [local dependency goes nowhere] ->403140 (also added because
				entire function added)
			403140 -->  404222 (this is the last xadd included, there are other xadd, they are not included, 
				maybe because they are after 404629 the slicing point)

			404222->404226-->...404231
			Now the question is why 404222 is included. 404225 (jz) has indirect data dependency on 404222
			set BP on 404225. It's a conditional jump so it needs data propagation. Think ....

8:30AM 11/04/2013
	[1] Analysis and algorithm design.  [1.5 hmin]
	Apparently, the data itself is not affected by any user level data (could user level code control flow
	affect its value? It's an open question.)

	It does not directly impact any user level data, but it affects the path inside the function call.
	
	Our question formally should be framed as: if the function call is replaced with NOP, would it affect
	the execution of the dependee????  This depends on how many memory references we need to check.
			Exam of all dependencies below
				-- has mem dependency at 399186 -- depended by internal syscall instructions (internal)
				-- has mem dependency at 399156 -- internal (mem addr e1126480) xadd lock like
				-- has mem dependency at 399122 -- internal (mem addr e117f798) xadd lock like, similar 
				-- has mem dependency at 399092 -- internal (mem addr 0x81d803c0) xadd lock like
				-- has mem dependency at 399051 -- internal (lock 0x800ca300)
				-- has mem dependency at 399038 -- internal (0x8055a540) look like counter of lock 
				-- has mem dependency at 398790 -- internal (0x81d80447)
				-- has mem dependency at 398774 -- internal (0x81d805d0) 
				-- has mem dependency at 398674 -- internal (looks like a lock inc and dec)
				-- has mem dependency at 398592 -- internal (look like a global const, no write 0x80042004)
				-- has reg dependency at 398586 -- INTERNAL REG DEPENDENCY ON CR0!!! (switch cr0 and back and forth)
						-- preserve CR0, may add CR0 to preserve register
						-- will not solve the problem. 
						-- NEED TO CHECK LATER!!!!
				-- has mem dependency at 398551 -- some global var internal (0x8055196c)
				-- has mem dependency at 398546 -- internal (0x8055196c)
				-- has mem dependency at 398501 -- internal
				-- has mem dependency at 398500 -- internal
				-- has mem dependency at 398360 -- internal
				-- has mem dependency at 398359 -- internal
				-- has mem dependency at 398338 -- internal
				-- has mem dependency at 398182 -- internal looks like lock
				-- has mem dependency at 397988 -- internal
				-- has mem dependency at 396514 -- internal
				-- has mem dependency at 396507 -- internal
				-- has mem dependency at 396504 -- internal look like a counter
				-- has mem dependency at 396501 -- internal 
				-- has mem dependency at 395364 -- ins @402543: mov [ebp-0x211], al !!! looks like preparing some internal data 
													structures, but it is read by scanf
				-- has mem dependency at 395359 -- *** also in range @4025xx
				-- has mem dependency at 394560 -- ***** depended by 400579 check why.???
				-- has mem dependency at 394405 -- similar to above
				-- has mem dependency at 394403 -- similar to above
				-- has mem dependency at 394274 *** similar to above but in @7crange
				-- has mem dependency at 394170 *** similar
				-- has mem dependency at 394166 *** similar
				-- has mem dependency at 392979 *** 


----------------------------------------------------------------
Solution???? hard.
	[1] register protection list. Add cr0. Need to actually collect cr0 value. This is doable
		printf (392742 @401022 -->   399467 @0x40112b)
	[2] in-accuracy problem. solvable.
	[3] internal lock problem (hard to solve) - ok.
			call printf
				acquire(lock)
				...
				release(lock)
			ret

			call scanf
				acquire(lock)
				...
				release(lock)
			ret

		If the value of lock is preserved, then we can skip the printf. 
		Then we need a two-pass process, first we slice analyze and record the 
			[1] the instructions that write to lock [releases]
			[2] search along the call and find those that first write to lock [acquires]
			[3] record these two addresses (instructions)

		In the second pass, whenever these instructions are encountered, record the memory content, and
		patch it to instrrecord.

		In the slice analyze, we could add the address of memory to call/ret pair that is preserved.

	[4] internal data structures that ARE modified during the call. Solvable.
		E.g.,  396654. It could be some global counter, or a running stats collector.
		It has no influence on user code; however, its value is affected indirectly (control dependency, 
		because one more call would add 1 to the counter). It does not have any influence
		on the user code (i.e., has no real data dependency), and does not really have any control
		influence.

		In this case, any prior value (to be depended on) would not affect any of the control path
		or user code execution.

		This is also solvable.

		NOTE *** printf (392742 @401022 -->   399467 @0x40112b)
		
		Another Example: 396514.
		It looks like a part of some data structure. (a pointer). 
		Check its data source
			396514, ins @804dc315: mov   [eax], ecx,  write: (start: 0x81f1e880, end: 0x81f1e883) 
			1. addr eax is from
					esi @ 396459 --> pop esi --> 396450 push esi -> 396420 ecx -> eax -> 396411
					-> 0x81fde888 -> 299218. ... lost trace. seems not related to user code

			2. content ecx is from
				->396509 (0xf74dbc68) -> ebp-10 (seems to be a reserved memory buffer address in local buffer).

		Check its influence: 396514.
			396530->396577->396580 (jnz) might affect control flow. !!!

		Check why it's used at 403329: timeStamp: 403329, ins @804dc256: cmp   [ecx+0x60], 0x00
			it's used by a jz.

			ecx is from: it's from some global memory addr. 
		It may be preserved by the function though.
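
Sketch of the check behind the two-pass idea in [3] (internal lock problem), assuming the second pass has recorded the content of each candidate address at the call and at the ret. MemSnapshot and preservedAddrs are hypothetical names; how the snapshots get collected is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical snapshot: address -> 32-bit content recorded at one timestamp.
typedef std::map<uint32_t, uint32_t> MemSnapshot;

// Return the candidate addresses whose content is the same at call and at ret.
// An address missing from either snapshot is treated as not preserved.
std::vector<uint32_t> preservedAddrs(const MemSnapshot& atCall,
                                     const MemSnapshot& atRet,
                                     const std::vector<uint32_t>& candidates) {
    std::vector<uint32_t> preserved;
    for (uint32_t addr : candidates) {
        MemSnapshot::const_iterator a = atCall.find(addr);
        MemSnapshot::const_iterator b = atRet.find(addr);
        if (a != atCall.end() && b != atRet.end() && a->second == b->second) {
            preserved.push_back(addr);
        }
    }
    return preserved;
}
```

A memory dependency that lands on a preserved address (an acquire/release pair that restores the lock word) would then not force the whole call into the slice.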

_________________________________________________________________
_________________________________________________________________
	[1] Implementation Plan: Solve the address preservation problem first, and then accuracy, and then
	the register preservation cr0.

9:00 11/05/2013
-------------------------------------------------------------------------------
Task 173: Check Function Address Resolution (estimated 8 hrs of work). Will solve register and address preservation.
-------------------------------------------------------------------------------
	[1] define class recordRequestProcessor, and serialization[90 min]
			instrAddr
			flag for memwrite or register_value
			short register_value
			keep an internal cache.
7:30AM 11/06/2013
	[2] experiment with mem addr reading, find a function call (printf) and read out the value of addresses at
		the entry and exit. DONE.
		NOTE *** printf (392742 @401022 -->   399467 @0x40112b)
			ts: 392742: @401022
				399156 -- internal (mem addr e1126480) xadd lock like, @804d91b9
				399122 -- internal (mem addr e117f798) xadd lock like, similar, @804e2a25 
				399092 -- internal (mem addr 0x81d803c0) xadd lock like, @804d91b9
				399467 -- @0x40112b
		Idea: at any of these four EIPs, print out the values at e1126480, e117f798, and 81d803c0. Do it in ops_sse.h.
9:15AM 11/06/2013
		[2.1] implementation modify ops_sse.h. [20 min] DONE.
		[2.2] interpret the results [15 min] DONE.

		Observation:
			[1] collection has to be done in the right context. At the beginning or end of call it does not work.
			[2] the values are different at each run! The first value changes 3->2. The second value can be a very large number.
			But 2nd and 3rd value is preserved.


11:45AM 	
	[3] define GEN_RECORD_REQUEST mode in config.txt and in BatchAnalyzer parse it [20 min] DONE.
		[3.1] test if GEN_PRESERVE_RECORD_REQUEST mode is set. [10 min] DONE.

7:30PM
	[4] modify processFunction()
		[4.1] in Trace define rrProcessor (record_request) processor it should be actually static and a cache for it [5 min] DONE
		[4.2] in Trace create init_record_request() will remove or create a new record request. The cache path
				should be related to job only. (instead of trace). [25 min] DONE.
		[4.3] in BatchAnalyzer when starting a job, calling init_record_request [10 min] DONE.
		[4.4] in Trace define load_rrProcessor() to load it from cache [20 min] DONE.
		[4.5] in Trace constructor, load the rrProcessor. [10 min] DONE.

7:30AM 11/07/2013
		[4.6] in processFunction register dependency, just create two requests for call and return instruction to
			add collection request. [20 min] DONE.
9:30AM
		[4.7] in processFunction handle the memory dependency.
			[4.7.1] in Trace define findLastTSWriteTo(long long int tsSearchPoint, unsigned int) [15 min] DONE.
			[4.7.2] update the processFunction correspondingly [15 min] DONE. 
7:00PM.
			[4.7.3] debug into the memory handling [15 min]
				(1) check the "should have only one register being read" problem. DONE.
					Sliceat: Timestamp: 404629 0x40103c
					printf (392739 @401022 -->   399455 @0x40112b)
					temporarily disabled it.
				(2) check the findLatestTSWriteTo [10 min] DONE.
				(3) check the call in processFunction [10 min]
						two bugs: size and mrWrite. DONE.
		[4.8] re-implement the RecordRequestProcessor 
			[4.8.1] implement RecordRequest class [1 hr]
					(1) data members and constructor and destructor [10 min] DONE.
					(2) addRegRequest(unsigned int reg) [8 min] DONE.
					(3) addMemRequest() [5 min]  DONE.
					(4) serialize() [10 min] DONE.
					(5) deserialize [10 min] DONE.
					(6) code inspection [10 min]
			[4.8.2] RecordRequestprocessor
					[1] reorganize data and public interface [10 min] DONE
					[2] add eipToId [8 min] DONE
					[3] destructor [5 min] DONE.
7:30AM 11/08/2013
					[4] addReg [8 min] DONE
					[5] addMem [8 min] DONE
8:30AM
					[6] saveToCache [15 min]  DONE
					[7] loadFromCache [15 min] DONE
					[8] set up unit test framework [1 hr]
						design: mod3: 0: reg only, 1 mem only, 2 both reg and mem	
						[8.1] fix bug in getId DONE
						[8.2] fix bug in loadFromCache. DONE.
						[8.3] fix deserialize bug.  DONE.
						[8.4] fix another deserialize bug. DONE.
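
Sketch of a RecordRequest with a serialize/deserialize round trip, of the kind the mod3 unit test above exercises (reg only, mem only, both). The byte layout (little-endian u32 fields, count-prefixed lists) and the bool-returning deserialize are assumptions, not the layout used by the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Append a 32-bit value in little-endian order.
static void put32(std::vector<uint8_t>& buf, uint32_t v) {
    for (int i = 0; i < 4; ++i) buf.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read a 32-bit little-endian value; fails instead of reading past the end.
static bool get32(const std::vector<uint8_t>& buf, std::size_t& pos, uint32_t& v) {
    if (pos > buf.size() || buf.size() - pos < 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(buf[pos + i]) << (8 * i);
    pos += 4;
    return true;
}

class RecordRequest {
public:
    uint32_t instrAddr = 0;                             // EIP the request is tied to
    std::vector<uint32_t> regs;                         // register ids to record
    std::vector<std::pair<uint32_t, uint32_t> > mems;   // (address, size) to record

    void addRegRequest(uint32_t reg) { regs.push_back(reg); }
    void addMemRequest(uint32_t addr, uint32_t size) { mems.push_back(std::make_pair(addr, size)); }

    std::vector<uint8_t> serialize() const {
        std::vector<uint8_t> buf;
        put32(buf, instrAddr);
        put32(buf, static_cast<uint32_t>(regs.size()));
        for (uint32_t r : regs) put32(buf, r);
        put32(buf, static_cast<uint32_t>(mems.size()));
        for (const auto& m : mems) { put32(buf, m.first); put32(buf, m.second); }
        return buf;
    }

    // Returns false on truncated or oversized input; `out` is untouched then.
    static bool deserialize(const std::vector<uint8_t>& buf, RecordRequest& out) {
        RecordRequest r;
        std::size_t pos = 0;
        uint32_t n = 0;
        if (!get32(buf, pos, r.instrAddr) || !get32(buf, pos, n)) return false;
        for (uint32_t i = 0; i < n; ++i) {
            uint32_t reg = 0;
            if (!get32(buf, pos, reg)) return false;
            r.regs.push_back(reg);
        }
        if (!get32(buf, pos, n)) return false;
        for (uint32_t i = 0; i < n; ++i) {
            uint32_t addr = 0, size = 0;
            if (!get32(buf, pos, addr) || !get32(buf, pos, size)) return false;
            r.mems.push_back(std::make_pair(addr, size));
        }
        if (pos != buf.size()) return false;  // trailing bytes: wrong record size
        out = r;
        return true;
    }
};
```

Bounds-checking every read and rejecting trailing bytes is a cheap guard against the kind of deserialize and record-size bugs listed in [8.1]-[8.4].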
	
10:00AM			
		[5] Misc testing
				[1.1] delete RecordRequestProcessor when job is completed. DONE.
				[1.2] fix bug in processFunction. DONE
				[1.3] Now the problem is that the processing is greatly slowed down. Found that searchForLatestWrite is the most
					time consuming.
					Idea: in full_slice_all_soc only slice it when it is greater than the last size.
					Sliceat: Timestamp: 404629 0x40103c
					printf (392739 @401022 -->   399455 @0x40112b)
				* problem: when findLatestWrite searches for -1, it is very time consuming. And there are lots of them.
		Idea: have a WriteLink in InstrExecRecorder. Whenever it is writing, update its writeLink to point to lastWrite 
		based on the Cache.
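
Sketch of the WriteLink idea, assuming one written address per instruction and timestamps equal to the record index. Each writing record stores the ts of the previous write to the same address, so findLatestWrite follows a short chain instead of scanning the whole trace backwards. WriteLinkIndex and its members are hypothetical; the real InstrExecRecorder keeps an arrWriteLink array instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct InstrRec {
    bool writes;          // does this instruction write memory?
    uint32_t writeAddr;   // address written (valid only if writes)
    long long prevWrite;  // ts of the previous write to writeAddr, or -1
};

class WriteLinkIndex {
public:
    // Append the next instruction; its ts is its position in the trace.
    long long append(bool writes, uint32_t addr) {
        long long ts = static_cast<long long>(recs_.size());
        long long prev = -1;
        if (writes) {
            auto it = lastWrite_.find(addr);
            if (it != lastWrite_.end()) prev = it->second;
            lastWrite_[addr] = ts;  // this instruction is now the latest writer
        }
        recs_.push_back({writes, addr, prev});
        return ts;
    }

    // Latest ts strictly before tsSearchPoint that wrote addr, or -1 if none.
    long long findLatestWrite(long long tsSearchPoint, uint32_t addr) const {
        auto it = lastWrite_.find(addr);
        long long ts = (it == lastWrite_.end()) ? -1 : it->second;
        while (ts != -1 && ts >= tsSearchPoint) {
            ts = recs_[static_cast<std::size_t>(ts)].prevWrite;  // walk the chain
        }
        return ts;
    }

private:
    std::vector<InstrRec> recs_;
    std::unordered_map<uint32_t, long long> lastWrite_;  // addr -> latest writer
};
```

The "no writer at all" case, which made the backward scan run to the start of the trace, becomes a single failed map lookup here.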
	
11:45AM	
		[6] Improve the findLatestWrite efficiency
			[1] similar to arrDependLink add arrWriteLink and a counter [5 min] DONE.
			[2] serialization and deserialization of InstrExecRecorder and unit test it [15 min] DONE.
			[3] add code for handling arrWriteLink [15 min] DONE.
			[4] generate the full trace and test [10 min]
				[1] fix counter problem.
				[1] Problem. appendCache ..
7:30PM			[2] Solve problem appendCache.
					save the entire cache first. DONE.
				[3] generate the full trace. DONE
					Sliceat: Timestamp: 404629 0x40103c
					printf (392739 @401022 -->   399455 @0x40112b)
			[5] update the logic of findLatestWrite and debug it [30 min] DONE.
			[6] debug findLatestWrite
				[1] fix isWriteTo [5 min] DONE.
				[2] Problem: did not call save_... disable the Util::error_exit 
		Fixed. NOW gen_mem request list:
---------------------
8:30AM 11/09/2013
-------------------------------------------------------------------------------
Task 174: Continue on preservation of mem and register.
-------------------------------------------------------------------------------
	[0] Algorithm Design [0.75 hr]
9:15AM
	[1] define boolean flag bNextInstrSetRecordRequest, int rr_id, unsigned int reg_to_record [], int reg_count,
			unsigned int mem_to_record [], int size [], int mem_count  [10 min] DONE.
	[2] in Trace::handle_instr setNextInstrSetRecordRequest [10 min]
	[3] debug if the setNextInstrSetRecord is ok. [80 min]
		[1] problem with vpage table build up again. Rebuild from scratch. Solved.
		[2] the Trace.cc:1534 is never hit, problem is the init_rr_processor is called. Solved.
		[3] branch mode error. rebuilt.
		[4] new stats
					Sliceat: Timestamp: 404629 0x40103c
					printf (392742 @401022 -->   399467 @0x40112b)
		[5] check bug on . Trace.cc:1360 (replace Error_exit with log msg). solved.
		[6] fix bug on save_rr. DONE
		[6] check if Trace.cc:1544 is hit in 0 mode. Fix the load_rr problem. NOW completely work.
11:30AM
	[4] in Trace::handle_instr set the record to request [20 min]
		[1] add protected Trace::setRecordRequest(int id) [15 min] DONE.
		[2] call it in Trace::handle_instr [5 min] DONE.
		[3] debug and verify if everything is ok. [10 min] DONE.

9:30AM 11/10/2013
	[5] provide function get_record_request [30 min]
		[1] define isNeedRecord() and pass it to TraceManager -> Trace [20 min] DONE.
		[2]  debug see if isNeedRecord is ever hit. [15 min] DONE.
10:45AM
		[3] define get_record_request(int *pEIPToCollect, int eipcollectPoint, 
					int *pCountRegRequest, int *pCountMemRequest, unsigned int **pArrReg, 
					unsigned int **pArrMem, int **pArrMemSize); [30 min]
		[4] debug both. [20 min] DONE.
12PM
=======================================================
8:45am 11/11/2013
		[5] Solution for collecting register value
			[1] right after call the disas_insn call collect_reg_value(env, pc_ptr); [5 min] DONE.
			[2] declare gen_save_regs_after_instr(env, pc_ptr) similar to gen_save_esp [5 min] DONE.
			[3] in handle.h and handle.cc define isNeedReg(unsigned int eip, env->cr[3]) and
				map it to TraceManager and Trace [15 min] DONE.
			[4] update gen_save_regs_after_instr correspondingly [5 min] DONE.
			[5] debug and verify if it is ok [10 min]
				[5.1] fix buf_limit problem. OK. now
9:50AM
		[6] Fix the reg translation problem.
			[1] find out where R_ESP and others are defined. cpu.h [5 min] DONE.
			Problems now, the regs save only handles the first 6 registers. What about the others? [1hr]
			(1) flags is computed at run time using gen_compute_eflags(temp_reg)
			(2) dr registers. generated using gen_helper_movl_drN_T0
			(3) cr registers. gen_helper_read_crN
			(4) xmm registers tcg_gen_st32_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,xmm_regs[reg].XMM_L(0)));
				etc. So will need a big switch case to handle the registers

11:10AM
			[2] define a storage of registers in env [8 min] DONE.
			[3] define translator table reg_part_to_whole in trace.h[15 min] DONE.
			[3] update RequestRecord::addRegRequest, use the part_to_whole table [10 min] DONE
			[4] debug addRegRequest and regenerate everything[15 min] DONE.
					Sliceat: Timestamp: 404658 0x40103c
					printf (3927771 @401022 -->   399496 @0x40112b)
			Generated dependencies:
-GenRecordRequest for mem dependency at 399215
--GenRecordRequest for mem dependency at 399185
--GenRecordRequest for mem dependency at 399151
--GenRecordRequest for mem dependency at 399121
--GenRecordRequest for mem dependency at 399080
--GenRecordRequest for mem dependency at 399067
--GenRecordRequest for mem dependency at 398819
--GenRecordRequest for mem dependency at 398803
--GenRecordRequest for mem dependency at 398703
--GenRecordRequest for mem dependency at 398621
--GenRecordRequest for reg: 49 dependency at 398615
--GenRecordRequest for mem dependency at 398580
--GenRecordRequest for mem dependency at 398575
--GenRecordRequest for mem dependency at 398530
--GenRecordRequest for mem dependency at 398529
--GenRecordRequest for mem dependency at 398389
--GenRecordRequest for mem dependency at 398388
--GenRecordRequest for mem dependency at 398367
--GenRecordRequest for mem dependency at 398211
--GenRecordRequest for mem dependency at 398017
--GenRecordRequest for mem dependency at 396543
--GenRecordRequest for mem dependency at 396536
--GenRecordRequest for mem dependency at 396533
--GenRecordRequest for mem dependency at 396530
--GenRecordRequest for mem dependency at 395393
--GenRecordRequest for mem dependency at 395388
--GenRecordRequest for mem dependency at 394589
--GenRecordRequest for mem dependency at 394434
--GenRecordRequest for mem dependency at 394432
--GenRecordRequest for mem dependency at 394303
--GenRecordRequest for mem dependency at 394199
--GenRecordRequest for mem dependency at 394195
--GenRecordRequest for mem dependency at 393008




			[5] define get_regs_to_record and propagate to trace [30 min]
			[6] update the save_regs_to_table, and call get_regs_to_record and debug it [30 min]
9:20AM 11/12/2013
			[7] debug the get_regs_to_record.
				[problem 1]. recorded data not right. Regenerate. [20 min] Fixed
				[problem 2]. not the entire reg is added. fixed
				[problem 3]. regenerate the trace.
				DONE.
11:00 AM
			[7] big switch cases [1 hr]
				[7.1] handle EFLAGS [20 min] DONE
					(1) observe translate.c for handling of EFLAGS, read pushf [10 min]
					(2) add the code  [10 min]
				[7.2] handle EAX to EDI. [10 min]
				[7.3] handle CR registers. [20 min] DONE.
					(1) observe translate.c [10 min]
					(2) add and debug [10 min]
				[7.4] fix DR register DONE.
				[7.5] cs to es [15 min] DONE
				[7.6] ldtr [15 min]  observe from sldt instruction.
				[7.7] gdtr [10 min] observe sgdt instruction. DONE.
7:00PM
			[8] update the get_record_request call in ops.seh [20 min] DONE
			[7] debug into the record_request [15 min]
				[1] check first 5 mem read. DONE.
				[2] check register values. 
					Problem 804b00e1.
8:30AM 11/13/2013
			[8] debug problem. 0x804b00e1 is not handled in translate.c, because it is part of the OS
				routine, and it is already translated and in buffer before the process loaded.
9:00AM
				Fix: develop static version of check reg [20 min] DONE.
			[9] debug again, check register values. [15 min] DONE.
				
9:40AM
	[6] send the value.
		[6.1] create an event to inform that a value is coming in. [30 min] DONE.
			[1] add the event [8 min] DONE.
			[2] handle the event [12 min] DONE.
			[3] debug [8 min] DONE.
10:30AM
		[6.2] declare class RecordValueProcessor, constructor takes parameters of regCount, 
			memCount, regValueArr, [estimate: 128 min]
			[6.1] add attribute rr_id_processed and copy rr_id [5 min] DONE
			[6.2] add and declare class RecordValueProcessor and add the .cc file [8 min] DONE
			[6.3] constructor and data members, add and initialize cache in Trace [25 min] DONE
11:30AM
			[6.4] addRecord(int rrid, int regCount, int memCount, ...) [15 min] DONE.
			[6.4] serializeTo [10 min] DONE.
			[6.5] deserializeFrom [10 min] DONE.
			[6.7] function getRegValue() [15 min] DONE.
			[6.8] function getMemValue() [15 min] DONE.
			[6.9] call addRecord in Trace.cc and update addRecord [10 min] DONE.
			[6.10] destroy rvRecord and save everything [8 min] DONE.
7:30PM
			[6.11] call getRegValue and getMemValue at process function. [80 min]
				[1] add the TS information into the request [15 min] DONE.
				[2] add loadFromCache(long long int idx) [10 min] DONE.
				[2] declare RecordValueProcessor::loadByTS(ts, int idxHint) [15 min]  DONE.
				[3] call getRegValue in processFunction [10 min] DONE.
				[4] call getMemValue in processFunction [10 min] DONE.
------------------ TO DO.

		
9:00AM 11/14/2013	
-------------------------------------------------------------------------------
Task 175: Test and verify the handling of MEM and REG dependency
-------------------------------------------------------------------------------
	[1] unit test the getRegValue and getMemValue of recordValueProcessor. [90 min]
				*1. index calc. FIXED
				*2. reg. FIXED.
				*3. loading reg err. FIXED
				*4. expected reg value. FIXED.
				*5. expected mem value. FIXED.
	[2] check recording buf exceeded problem. Instruction 0x7c910c71 might be copying more than 6k bytes!!!
		At this moment, we could only add a limit.
		timeStamp: 11760, ins @7c910c71: repz stos  es:[edi], eax
 		write: (start: 0x140688, end: 0x141e87) , DEPLINKS:  , R: 11735 , R: 11757 , R: 11759
		Solved. Capped at 0.5k.

	[3] check how is recordValueProcessor saved.
		break on Trace destructor. b Trace.cc:232
		recorded about 29k records. Data recorded about 1.3MB. Instruction exec history: about 14MB.

	[4] check full trace how recordValueProcessor is loaded. b Trace.cc:80 DONE.
	[5] check branch trace how recordValueProcessor is loaded. did not hit. check when the trace is constructed.
					Sliceat: Timestamp: 404664 0x40103c
					printf (392771 @401022 -->   399496 @0x40112b)
	[6] problem: loadTS error. At first function call.
			search problem.
			404659 is not included but 404660 and 404658 is included. It seems that
			the recording of rr_ts is wrong.
8:30PM
			Fix in Trace.cc (rr_id assignment + 1)
			recollect raw trace, full trace, and branch.
			***************************
					Sliceat: Timestamp: 404569 0x40103c
					printf (392682 @401022 -->   399407 @0x40112b)
	[7] fix the register conversion problem. FIXED.
	[8] cannot find record for 399408. fix eipcall-1. regenerate raw, full, branch trace. from record mode.
9:00AM 11/15/2013
	[9] complaining cannot find 404651  (first processFunction)
			regenerate the raw, full, branch trace.
					Sliceat: Timestamp: 404627 0x40103c
					//printf (392771 @401022 -->   399496 @0x40112b)
			Problem 1: search does not work. Broke the 4th time. It's caused by the ts+2 problem. See problem 2.
			Problem 2: the rr_ts is the actual ts + 2.
				BP on Trace.cc:1614 and then 2100. and then 2502.
				Still the +1 problem. check the ts match eip problem.
		solved.
	[10] problem of mismatch size. The first call of getMemValue gets problem.
		timestamp 403992 needs 28 bytes. latest write to the same is 402652 which writes 4 bytes. 
		Then it tries to load 28 bytes from the 4 bytes write. This caused problem.
		We will fix this using more accurate memory tracing later. At this moment take the smaller one.
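		Sketch of the "take the smaller one" workaround. sameOverlap is a hypothetical helper, not the
		actual getMemValue code: when the read size and the recorded write size differ, only the
		overlapping prefix is compared and a warning is printed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: compare the bytes a read needs against the bytes the
// latest write recorded, using only the common prefix when the sizes differ.
// Returns true when the compared bytes are identical.
inline bool sameOverlap(const unsigned char* readBuf, std::size_t readSize,
                        const unsigned char* writeBuf, std::size_t writeSize) {
    if (readSize != writeSize) {
        std::fprintf(stderr, "warning: size mismatch (read %zu, write %zu)\n",
                     readSize, writeSize);
    }
    const std::size_t n = std::min(readSize, writeSize);  // the smaller of the two
    return std::memcmp(readBuf, writeBuf, n) == 0;
}
```

		With the case above (a 28-byte read against a 4-byte write) only the first 4 bytes are compared,
		so a match here is weaker evidence than a full-size match.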

	
11:00AM 11/17/2013
	[10] solve mismatch size. For any size>4, directly return true (means matching values fails, hasDataDependency).		 DONE.
	[11] check mem case. --> problem. 
		But the values seems ok.
--------------------- TO DO ----------------------
	[13] another problem, cannot find rv record. Occurs in one load_vm, however, not the first.
			The problem is that 401550 is not recorded. (eip: 804e53f1).
			Record again and see if it re-occurs.
			timeStamp: 395329, ins @402543: mov [ebp-0x211], al
			timeStamp: 384535, ins @7c911533: call  0xFFFFF699
 	write: (start: 0x12fd0c, end: 0x12fd0f) , ESP: 0x12fd10 -> 0x12fd0c , DEPLINKS:  , R: 384534 and ESP value: 0x12fd0c
	[11] check reg case --> problem. the recorded value seems not right (all 0's), but the locate of ts is successful

Plan: debug the entire process again. check the memory problem first.


11/18/2013
8:30AM
--------------------------------------------------------------------------------------
Task 176: continue debugging the value recording system
--------------------------------------------------------------------------------------
	[1] error in size mismatch to record.
			Caused by writing 1 or 4 bytes. Fix: take the minimal and print a warning. DONE.
	[2] another problem, cannot find rv record. Occurs in one load_vm, however, not the first.
		[1] modify the code to bool [15 min]
		[2] check how often it occurs [10 min]
			[1] first occurrence at 384535, it's a call; second time: 384579 another call.
				check how often it occurs.
				80 in 995 times. Around 8% of the cases, they are not hit. Could be any instructions.
			temporarily marked as hasDataDependency: true. Later handle it.
11:00AM
	[3] problem: case skip should not occur.  Seems the problem is identifySOC.
			set breakpoint on socmanager.cc:133 if tsEnd=356225
			--> so it seems that identifySOC is not a problem. It's the mergeSOC. Let it run and continue and read the log.
			Problem: 356197 addsoc(356197, 356197) merge with 356198->356225 and 356197 is a RET.

		check addSOC condition tsStart==356197 and see how it's appended.		
		Found the problem. It directly sets the SOC even if it is a RET instruction. In this case, we add another branch to handle it. 		DONE.

	[4] message "ts xxx should have only one register ..." seems suspicious, the ts value too large.
8:00PM
	[5] now the problem, extremely slow. Needs to modify the recordValueProcessor to add a cache.
		[5.1]  declare a CachedMap tsToId in trace.h [5 min] DONE.
		[5.2]  in loadByTs, use tsToId to find the id.  [10 min] DONE
		[5.3]  move constructor, and establish CachedMap [12 min] DONE
		[5.4]  unit testing [5 min] DONE.
		[5.5] debug into loadByTs [15 min] DONE.
		[5.6] still too slow. RV_SIZE too small. need to regenerate. --> solved.
	
	Solved!
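	Sketch of the tsToId cache from [5.1]-[5.3]. TsToIdIndex is a hypothetical stand-in built on
	std::unordered_map; the real CachedMap type and its interface are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical index from timestamp to record id, built once so that a
// loadByTs-style lookup is a hash probe instead of a scan over all records.
class TsToIdIndex {
public:
    // tsOfRecord[id] is the timestamp stored with record number id.
    explicit TsToIdIndex(const std::vector<long long>& tsOfRecord) {
        for (std::size_t id = 0; id < tsOfRecord.size(); ++id) {
            tsToId_[tsOfRecord[id]] = static_cast<int>(id);
        }
    }

    // Returns the record id for ts, or -1 when no record was made at ts.
    int find(long long ts) const {
        auto it = tsToId_.find(ts);
        return it == tsToId_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<long long, int> tsToId_;
};
```

	Returning -1 for a missing timestamp lets the caller fall back to "hasDataDependency: true", as
	done for the unrecorded cases in [2].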

---------------- to do --------------------
8:45am 11/19/2013
	[4] check printf, why it's included.
					Sliceat: Timestamp: 404589 0x40103c
					printf (392705 @401022 -->   399427 @0x40112b)
		DUMP below (note: register handling is wrong. only check mem!)
--processFunction ts 399427
-- has mem dependency at 398750 on 0x81ef8417, first bytes: 4 and 2 [1 byte reading!!! check] depended by 403714
	It's the same piece of code called in printf and scanf.
	It's impacting control flow in the interrupt handler.
-- has mem dependency at 398319 on 0x81fde888, first bytes: ffffff90 and ffffffe0 depended by 401022
-- has mem dependency at 398142 on 0x8068ceb4, first bytes: ffffffed and ffffffef depended by 400900


 
--------------------------------------------------------------------------------------
Task 177: check register handling.
--------------------------------------------------------------------------------------
	[1] check the cr0 register changing instruction.
		[1] get the addr of the instruction: timeStamp: 396649, ins @804dbfd4: mov   cr0, ecx
		[2] set a conditional BP on translate.c:747 if addr==  0x804dbfd4 --> it's never hit.
	[2] check if the cr0 modification instruction is getting recorded
		check timestamp 399649 --> it's not recorded, because it's not the LAST! Need to search backward.

7:30AM 11/20/2013
	[3] new data.
					Sliceat: Timestamp: 404589 0x40103c
					printf (392705 @401022 -->   409220 @0x40112b)
	[4] check the cr0 register changing instruction.
		timeStamp: 401615, ins @804dbfd4: mov   cr0, ecx
	[5] check eip 0x804dbfd4 in Trace::processFunction and see what's happening.
		It is recorded for 0x7c90e3eb and 0x7c90eb94 (eipCall and eipRet).
		The same function call appeared 3 times. It seems that mod cr0 instruction also occurred 3 times inside the function.
8:45AM
	[6] now in non_request mode, check translate.c and see if these two instructions (0x7c90e3eb and 0x7c90eb94)
		are ever recorded. [20 min] 
			bp on lines 748 and 756 of translate.c
		Then bp on Trace::handle_instr	
			It seems that we need to wait long enough to hit 748 and 756
		set BP on Trace::handle_instr if addr==0x7c90e3eb || addr==0x7c90eb94 && this->execRecorder>390000 && ts<4020000
			7c90e3eb -> 7c90e8eb (bNext.. is set)A (rr_id is 9), 
		then bp on ops_sse.h:2518 (it is hit), problem is that env->arrReg[xxx] are all 0's. 
		Problem: env->arrRegs never gets a value other than 0.
	[7] check the code which is embedded in translate.c
		bp on gen_save_reg_to_env. It seems that the code is doing the job.
		Need to check the translated code
10:15AM
	[8] bp on Trace::handle_instr for 0x7c90e3eb and trace into its real instructions. See if we got 9 similar code segments
		which copies into env->arrRegs.
			[1] env->arrRegs address:
			[2] debug observation:
		Problem: code is generated, however, there is a jump which directs the control directly to the next instruction.
11:20AM
	[9] Redesign: gen_save_regs move them before the beginning of each instruction.
					
		[1] @instr_to_record(Trace::handle_instr) --> set bNextInstr
			--> if needed for reg, copy the reg to record first //mem addr later because we do not know it yet
		[2] the execution of the instruction, performs the copy of registers  (these are register values before the instruction)
		[3] @instr_next, set the memory addr to copy
		[4] @instr_next before its execution, copy registers and mem values. 

11:30AM
		[5] implementation:
			(1) patch documentation [10 min] DONE.
			(2) move gen_save_regs [8 min] DONE.
			(3) @instr_to_record setbNextInstr [8 min] DONE.
			(4) check [2], [3], [4] [15 min] DONE
			(5) debug on ops_sse.h copy part [10 min] DONE.
					works. 	
			(6) debug on save record part [15 min]
					works.
			(7) debug on register comparison part of discharge [15 min]
					Sliceat: Timestamp: 404589 0x40103c
					printf (392705 @401022 -->   399427 @0x40112b)
				done works!
			(8) check the mem case. works.
			(9) read the branch slice log. DUMP BELOW:
		-- RESOLVED mem dependency at 399146 on 0xe1339748, first bytes: 38 and 38
-- RESOLVED mem dependency at 399116 on 0xe153aee0, first bytes: 2 and 2
-- RESOLVED mem dependency at 399082 on 0xe1177798, first bytes: ffffffa0 and ffffffa0
-- RESOLVED mem dependency at 399052 on 0x81d545e0, first bytes: 2 and 2
-- RESOLVED mem dependency at 399011 on 0x800ca300, first bytes: 0 and 0
-- RESOLVED mem dependency at 398998 on 0x8055a540, first bytes: 1 and 1
-- has mem dependency at 398750 on 0x81d54667, first bytes: 4 and 2
-- RESOLVED mem dependency at 398734 on 0x81d547f0, first bytes: 0 and 0
-- RESOLVED mem dependency at 398634 on 0x81d546cc, first bytes: 0 and 0
-- has mem dependency at 398552 on 0x80042004, first bytes: ffffffe0 and ffffffe0
-- REMOVED reg dependency at 398546 for reg 49 (0 vs 0)
-- RESOLVED mem dependency at 398511 on 0x8055196c, first bytes: 1 and 1
-- RESOLVED mem dependency at 398506 on 0x8055a440, first bytes: 58 and 58
-- has mem dependency at 398461 on 0xffdff124, first bytes: 70 and ffffffb8
-- RESOLVED mem dependency at 398460 on 0xffdff128, first bytes: 0 and 0
-- RESOLVED mem dependency at 398320 on 0x81e20ae4, first bytes: ffffff88 and ffffff88
-- has mem dependency at 398319 on 0x81fde888, first bytes: ffffff90 and ffffffe0
-- RESOLVED mem dependency at 398298 on 0x81fde884, first bytes: 0 and 0
-- has mem dependency at 398142 on 0x8068ceb4, first bytes: fffffff0 and fffffff2
-- RESOLVED mem dependency at 397948 on 0xe122cfd8, first bytes: ffffffe1 and ffffffe1
-- RESOLVED mem dependency at 396474 on 0x81f1e880, first bytes: 78 and 78
-- has mem dependency at 396466 on 0x81f1e88f, first bytes: 6 and 5
-- has mem dependency at 396458 on 0x81f1e853, first bytes: d and e
-- has mem dependency at 395322 on 0x12fcfc, first bytes: ffffffeb and 9
-- has mem dependency at 394523 on 0x12fd00, first bytes: 4e and 0
-- has mem dependency at 394366 on 0x12fcf8, first bytes: ffffff96 and 70
			==>


7:30AM 11/21/2013
--------------------------------------------------------------------------------------
Task 178: improve memory dependency granularity.
--------------------------------------------------------------------------------------
	Idea: let's say i1 reads memory written by i2 and i2 is a movsb of a large region. i2 will have
	a lot of memory dependency. When propagating from i1 to i2, the mem depend link will attach the
	address and size. We will keep an additional map from timestamp to memRangeManager. When propagating i2,
	for each mem link, it checks if the associated mem range is affected.
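	Sketch of the range check in the idea above. MemRange, MemRangeSet and TsToRanges are hypothetical
	names (the log's own memRangeManager and tsToMrM are not shown here); ranges are inclusive, matching
	the "write: (start: ..., end: ...)" dumps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Inclusive address range [start, end].
struct MemRange {
    uint32_t start;
    uint32_t end;
};

// Two inclusive ranges overlap when each one starts before the other ends.
inline bool intersects(const MemRange& a, const MemRange& b) {
    return a.start <= b.end && b.start <= a.end;
}

// The parts of one large write (e.g. a rep movsb/stos) that the slice needs.
class MemRangeSet {
public:
    void add(const MemRange& r) { ranges_.push_back(r); }

    // True when the range attached to a mem link touches any needed range.
    bool affects(const MemRange& link) const {
        return std::any_of(ranges_.begin(), ranges_.end(),
                           [&](const MemRange& r) { return intersects(r, link); });
    }

private:
    std::vector<MemRange> ranges_;
};

// Hypothetical map: timestamp of the writing instruction -> needed ranges.
using TsToRanges = std::map<long long, MemRangeSet>;
```

	Usage: when propagating from i1 to i2, add the range i1 read to the set keyed by i2's timestamp;
	when propagating i2's own mem links, follow only the links whose range passes affects().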

	[1] in InstrExecRecorder, when update the call of memLink correspondingly.  up to now, all mem links have
		the information of dependence.  Optimize by saving it for movsb related only. Declare all functions necessary.
		[30 min]
9:30AM
	[2] code inspection in InstrExecRecorder [15 min] DONE.
	[3] in Trace::full_slice, when propagating the memory link, for the destination, check if it has
		more than 4 bytes of write, if yes, use the cache map to add the memory access [1 hr]
		[3.1] declare new attributes in dependLink DONE
		[3.2] declare Trace::isTsInSliceWritesTo DONE
		[3.3] declare Trace::tsToMrM DONE
		[3.4] update clear_in_slice_tags DONE.
		[3.5] declare updateTsInSliceWrite DONE.
		[3.6] inspect the logic again. [15 min] OK.

11:20AM
	[4] work on dependLink methods [45 min]
		[4.1] setMemLink clear the addrStart and addrEnd [5 min] DONE
		[4.2] implement addMemAccess [15 min] DONE
		[4.3] update the serialization [15  min] done 
		[4.4] unit test [10 min] DONE
		[4.5] update all access of mem link DONE.

11:50AM
	[5] work on Trace functions [70 min]
		[5.1] hasMultipleWrites [15 min] DONE.
		[5.2] isTsInSliceWritesTo [10 min] DONE
		[5.2]  clearTsToMrM [15 min] DONE.
		[5.3] updateTsInSliceWriteTo [15 min] DONE
		[5.4] unit test of trace [15 min]
7:30PM
			[1] fix fail on memrange test1 DONE
			[2] fix fail on memrange 2. DONE
			[3] testIsReadRegFromMem, trace loading problem.  Problem: the call of tshasMultipleWrites (load old ts during
					the mock_mem of the latest ts causes the loss of data).
					The problem is with mock_mem

7:30AM 11/22/2013
--------------------------------------------------------------------------------------
Task 179: Debugging code on improving memory dependency granularity.
--------------------------------------------------------------------------------------
[1] the mock memory problem.	 Idea: pass the raw trace instead,
	[1.1] add parameter to mock memory and other related functions [12 min] DONE
	[1.2] modify mock memory code [8 min] DONE
	[1.3] unit test case again [10 min] DONE
[2] regenerate raw trace still in non-request mode. [5 min]
[3] debug the generate full_trace
	[3.1] bp on 107
		bug1. in the loop to enlarge size. fixed.
	full trace size:   33.5MB -> 33.6MB  (nearly negligible)
	New slice criteria:
					Sliceat: Timestamp: 404572 0x40103c
					printf (392679 @401022 -->   399410 @0x40112b)
10:00AM
[4] debug the branch_slice part.
	[1] Trace.cc:559 DONE.
	[2] Trace.cc:631  Fix the updateTsInSlice logic. DONE.
	[3] check problem of link.tsDependee changes. DONE
	[4] fix the end and length problem. DONE.
	[5] fix the error on multiple read/write case. DONE
	[6] fix another tsCur case. DONE.
7:30PM
	[7] case 388428. Problem: 
		timeStamp: 388428, ins @7c910c71: repz stos es:[edi], eax, write: (start: 0x323440, end: 0x323c3f) 
		it is write only. DONE.
	[7] check bMW1 case. Fix need to add range. DONE.
	[8] fix updateTsInSliceWrite(tsTarget...) add one more parameter. The idea is to go through each active range and 
			apply it to the dependLink (see if there any intersection)

7:00AM 11/23/2013
	[1] debug clearTsToMrM  verified OK.
	[2] collect the dump.
					Sliceat: Timestamp: 404572 0x40103c
					printf (392679 @401022 -->   399410 @0x40112b)
10:00AM
	[3] check the rest of the unresolved case.
	The following are the dependency that are not resolved
--processFunction ts 399410
-- has mem dependency at 398733 on 0x81d54e17, first bytes: 4 and 2 --> KiAdjustQuantumThread
-- has mem dependency at 398535 on 0x80042004, first bytes: ffffffe0 and ffffffe0 --> SwapContext
-- has mem dependency at 398444 on 0xffdff124, first bytes: ffffffb8 and 20 --> KiUnlockDispatchDatabase
-- has mem dependency at 398347 on first write on 0x81f1e88e --> KiUnwaitThread
-- has mem dependency at 398344 on first write on 0x81f1e853 --> KiUnwaitThread
-- has mem dependency at 398341 on 0x81f1e88f, first bytes: 1 and 6 --> KiUnwaitThread
-- has mem dependency at 398293 on 0x81fde888, first bytes: ffffffe0 and 28 -> KiUnlinkThread
-- has mem dependency at 398116 on 0x8068ceb4, first bytes: ffffffe8 and ffffffea

	[4] check timestamp 399410
	It has a lot of similar dec/cmp patterns at 0x804e3bc9 (also there is a check of jge with 4).
	Use WinDbg to check
		==> 0x804e3bc9 is part of the function nt!CcWriteBehind (wrong. checked the code not right)

	Search for the bytecode of the assembly.
	kd> s -b 80000000 88000000 fe 49 6f 8a
804f91cd  fe 49 6f 8a 41 6f 84 c0-7f 51 2a 51 6e 8b 41 44  .Io.Ao...Q*Qn.AD
805486e0  58 87 54 80 00 00 00 00-1c 87 54 80 78 bb 65 80  X.T.......T.x.e.
80673008  fe 49 6f 8a 00 00 00 00-1c 87 54 80 78 bb 65 80  .Io.......T.x.e
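	What the "s -b" byte search above does can be sketched over an in-memory buffer. findPattern is a
	hypothetical helper using std::search; it returns offsets into the buffer, not kernel addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the offset of every occurrence of pat in mem,
// including overlapping ones, in increasing order.
inline std::vector<std::size_t> findPattern(const std::vector<uint8_t>& mem,
                                            const std::vector<uint8_t>& pat) {
    std::vector<std::size_t> hits;
    if (pat.empty()) return hits;  // nothing meaningful to search for
    auto it = mem.begin();
    while (true) {
        it = std::search(it, mem.end(), pat.begin(), pat.end());
        if (it == mem.end()) break;
        hits.push_back(static_cast<std::size_t>(it - mem.begin()));
        ++it;  // resume one byte later so overlapping matches are found
    }
    return hits;
}
```

	Each hit still has to be disassembled to confirm it is the instruction sequence being looked for,
	as done in the next step.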

	Disassemble it to verify: So it's a part of KiAdjustQuantumThread
		nt!KiAdjustQuantumThread+0x19:
		804f91cd fe496f          dec     byte ptr [ecx+6Fh]
		804f91d0 8a416f          mov     al,byte ptr [ecx+6Fh]
		804f91d3 84c0            test    al,al
		804f91d5 7f51            jg      nt!KiAdjustQuantumThread+0x74 (804f9228)

		nt!KiAdjustQuantumThread+0x23:
		804f91d7 2a516e          sub     dl,byte ptr [ecx+6Eh]
		804f91da 8b4144          mov     eax,dword ptr [ecx+44h]
		804f91dd 8a4063          mov     al,byte ptr [eax+63h]
		804f91e0 feca            dec     dl
		804f91e2 3ad3            cmp     dl,bl

	Set a BP on it, it's called many times. Set a BP on scanf first (in b20.exe) and then set a BP on kiAdjustQuantumThread.
List stack frame.
	00 ba947a78 804f97e6 nt!KiAdjustQuantumThread+0x19
	01 ba947ab4 bf8aec51 nt!KeWaitForMultipleObjects+0x32c
		02 ba947d30 bf8c8594 win32k!RawInputThread+0x4f3 ****
		03 ba947d40 bf800ff4 win32k!xxxCreateSystemThreads+0x60
		04 ba947d54 8053c808 win32k!NtUserCallOneParam+0x23
		05 ba947d54 7c90eb94 nt!KiFastCallEntry+0xf8
		06 006dffe0 75b653d6 ntdll!KiFastSystemCallRet

	The C source code is shown as below:

	VOID
KiAdjustQuantumThread (
    IN PKTHREAD Thread
    )

/*++

Routine Description:

    If the current thread is not a time critical or real time thread, then
    adjust its quantum in accordance with the adjustment that would have
    occurred if the thread had actually waited.

    N.B. This routine is entered at SYNCH_LEVEL and exits at the wait
         IRQL of the subject thread after having exited the scheduler.

Arguments:

    Thread - Supplies a pointer to the current thread.

Return Value:

    None.

--*/

{

    PKPRCB Prcb;
    PKTHREAD NewThread;

    //
    // Acquire the thread lock and the PRCB lock.
    //
    // If the thread is not a real time or time critical thread, then adjust
    // the thread quantum.
    //

    Prcb = KeGetCurrentPrcb();
    KiAcquireThreadLock(Thread);
    KiAcquirePrcbLock(Prcb);
    if ((Thread->Priority < LOW_REALTIME_PRIORITY) &&
        (Thread->BasePriority < TIME_CRITICAL_PRIORITY_BOUND)) {

        Thread->Quantum -= WAIT_QUANTUM_DECREMENT; **** //corresponds to THE instruction
        if (Thread->Quantum <= 0) {

            //
            // Quantum end has occurred. Adjust the thread priority.
            //

            Thread->Quantum = Thread->QuantumReset;

            //
            // Compute the new thread priority and attempt to reschedule the
            // current processor as if a quantum end had occurred.
            //
            // N.B. The new priority will never be greater than the previous
            //      priority.
            //

            Thread->Priority = KiComputeNewPriority(Thread, 1);
            if (Prcb->NextThread == NULL) {
                if ((NewThread = KiSelectReadyThread(Thread->Priority, Prcb)) != NULL) {
                    NewThread->State = Standby;
                    Prcb->NextThread = NewThread;
                }

            } else {
                Thread->Preempted = FALSE;
            }
        }
    }

    //
    // Release the thread lock, release the PRCB lock, exit the scheduler,
    // and return.
    //

    KiReleasePrcbLock(Prcb);
    KiReleaseThreadLock(Thread);
    KiExitDispatcher(Thread->WaitIrql);
    return;
}
=================>
	From the disassembly we can infer that ECX points to the _KTHREAD structure and the offset 0x6F is the "quantum" field, 
see below:
	kd> dt _KTHREAD
ntdll!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
	...
   +0x06e PriorityDecrement : Char
   +0x06f Quantum          : Char ***
	...

	So the instruction decrements the quantum by 1; the test/jg that follows takes the quantum-end path when the quantum runs out (<= 0)
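	Sketch of the dec / test / jg sequence in C++. KThreadSketch is a hypothetical struct that only
	reproduces the two offsets shown in the dt output (0x6e and 0x6f); everything else in _KTHREAD is
	replaced by padding, and the decrement of 1 is taken from the dec instruction.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: only the two fields shown in the dt dump are modelled.
struct KThreadSketch {
    unsigned char other[0x6e];      // everything before +0x06e, not modelled
    signed char PriorityDecrement;  // +0x06e
    signed char Quantum;            // +0x06f
};
static_assert(offsetof(KThreadSketch, Quantum) == 0x6f,
              "Quantum must sit at the offset used by dec byte ptr [ecx+6Fh]");

// dec byte ptr [ecx+6Fh]; mov al,[ecx+6Fh]; test al,al; jg <skip>
// Returns true when the jg is NOT taken, i.e. the quantum has run out (<= 0).
inline bool decrementQuantum(KThreadSketch& t) {
    t.Quantum = static_cast<signed char>(t.Quantum - 1);
    return t.Quantum <= 0;
}
```

	Because the field is a signed byte that is read, modified and written back on every call, each call
	creates a memory dependency on the previous one, which matches the unresolved link at 398733.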

	[4] check timestamp 398535, similar approach
		kd> s -b 80000000 86000000 89 41 04 8b 66 28 8b 46 20 [the binary code of the 3 instructions]
		80540b1f  89 41 04 8b 66 28 8b 46-20 89 43 18 fb 8b 47 44  .A..f(.F .C...G
		It's part of the SwapContext. Could not find more information but it should be related to modifying some 
	thread or process structure.
	[5] check 398444. -- has mem dependency at 398444 on 0xffdff124, first bytes: ffffffb8 and 20
		KiUnlockDispatchDatabase	
	[6] has mem dependency at 398347, 398344 (nt!KiUnwaitThread)
		80500299 004633          add     byte ptr [esi+33h],al
		8050029c 384e33          cmp     byte ptr [esi+33h],cl
		8050029f 7d03            jge     nt!KiUnwaitThread+0xa2 (805002a4)

		nt!KiUnwaitThread+0x9f:
		805002a1 884e33          mov     byte ptr [esi+33h],cl

		nt!KiUnwaitThread+0xa2:
		805002a4 c6466e00        mov     byte ptr [esi+6Eh],0
	[7] check 398293 on 0x81fde888, first bytes: ffffffe0 and 28 --> nt! KiUnlinkThread
	[8] check 398116 on 0x8068ceb4, first bytes: ffffffe8 and ffffffea --> this one could not find.
Up to now we have the following!!!! All related to context switch.
=======================================================================================================	
-- has mem dependency at 398733 on 0x81d54e17, first bytes: 4 and 2 --> KiAdjustQuantumThread
-- has mem dependency at 398535 on 0x80042004, first bytes: ffffffe0 and ffffffe0 --> SwapContext
-- has mem dependency at 398444 on 0xffdff124, first bytes: ffffffb8 and 20 --> KiUnlockDispatchDatabase
-- has mem dependency at 398347 on first write on 0x81f1e88e --> KiUnwaitThread
-- has mem dependency at 398344 on first write on 0x81f1e853 --> KiUnwaitThread
-- has mem dependency at 398341 on 0x81f1e88f, first bytes: 1 and 6 --> KiUnwaitThread
-- has mem dependency at 398293 on 0x81fde888, first bytes: ffffffe0 and 28 -> KiUnlinkThread
-- has mem dependency at 398116 on 0x8068ceb4, first bytes: ffffffe8 and ffffffea --->nt!NtRequestWaitReplyPort
			--> it's increasing the LpcpNextMessageId
=======================================================================================================

8:30AM 11/25/2013
--------------------------------------------------------------------------------------
Task 180: Figure out the above are actually proactively called by printf, or are they part
of the context switch.
--------------------------------------------------------------------------------------
	[1] figure out 398116
		Found that it's part of NtRequestWaitReplyPort, and it's increasing variable LpcpNextMessageId: see the following:
		nt!NtRequestWaitReplyPort+0x550:
		80597130 83660c00        and     dword ptr [esi+0Ch],0
		80597134 a1b4d96680      mov     eax,dword ptr [nt!LpcpNextMessageId (8066d9b4)]
		80597139 894628          mov     dword ptr [esi+28h],eax
		8059713c ff05b4d96680    inc     dword ptr [nt!LpcpNextMessageId (8066d9b4)]
		80597142 750a            jne     nt!NtRequestWaitReplyPort+0x56e (8059714e)

		nt!NtRequestWaitReplyPort+0x564:
		80597144 c705b4d9668001000000 mov dword ptr [nt!LpcpNextMessageId (8066d9b4)],1

		nt!NtRequestWaitReplyPort+0x56e:
		8059714e 83662c00        and     dword ptr [esi+2Ch],0
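		The disassembly above can be read as the following sketch. MessageIdCounter is a hypothetical
		stand-in for the global nt!LpcpNextMessageId, and the initial value of 1 is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the global message id counter.
struct MessageIdCounter {
    uint32_t next = 1;  // assumed starting value

    // mov eax,[LpcpNextMessageId]; mov [esi+28h],eax  -> hand out the current id
    // inc dword ptr [LpcpNextMessageId]               -> advance the global
    // jne ... / mov dword ptr [LpcpNextMessageId],1   -> on wrap to 0, restart at 1
    uint32_t take() {
        const uint32_t id = next;
        ++next;
        if (next == 0) next = 1;
        return id;
    }
};
```

		Every call both reads and writes the same global, so two calls from different requests are always
		linked by a MEM_LINK even when the requests are otherwise unrelated.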

	[2] set a BP and see how it's invoked
		[1] be on 0x401010 and 401022
		[2] be at 0x8059713c and see how many times it's invoked
			Note: use bp /p process_id 0x8059713c (otherwise there are too many distractions)
			It shows that scanf triggers 3 calls of NtRequestWaitReplyPort.
		Note that pt in WinDbg is not that reliable.

	[3] study 398116 again. See what depends on it.
		Dump below:
		======================================
		Reverse ID: 0, ts: 398116,       Type: MEM_LINK ins @8057882c: inc  [-0x7F97314C]
		Reverse ID: 1, ts: 400885,       Type: MEM_LINK ins @8057882c: inc  [-0x7F97314C]
		======================================
					Sliceat: Timestamp: 404572 0x40103c
					printf (392679 @401022 -->   399410 @0x40112b)

		400885 is another NtRequestWaitReplyPort in printf, and it is reading the timestamp for updating the global
	LpcpNextMessageId. The next instruction depends on 400885 through the resulting EFLAGS value.

	
8:30AM 11/26/2013
--------------------------------------------------------------------------------------
Task 181:  fix the back tracking algorithm
--------------------------------------------------------------------------------------
		Dump below:
		======================================
		Reverse ID: 0, ts: 398116,       Type: MEM_LINK ins @8057882c: inc  [-0x7F97314C]
		Reverse ID: 1, ts: 400885,       Type: MEM_LINK ins @8057882c: inc  [-0x7F97314C]
		======================================
					Sliceat: Timestamp: 404572 0x40103c
					printf (392679 @401022 -->   399410 @0x40112b)

	[1] check the reverse_trace algorithm [10 min]
			Problem: 403090 has no reverse pointer. check how it is included.
	[2] check how 403090 is included in slice. [20 min]
			Problem is that the realDataDependee is set to -1.
	[3] modify hasDataDependee and set the realDataDependee [20 min] The new dump is shown below:

	[5] check why 404571 is need_visit. [30 min]
		It's not via Trace->setInSlice. Set a BP on ier->setInSlice.
		Got to fix the reverseLinkType in clear link.
		--> trouble. Could not get it cleared.
	[6] 2nd attempt:
		set bp on IER::serialize.
		Found that type 6 is caused by SOC(404572,404572) -> setNeedControl
		update correspondingly.
		New dump below:
		...
Reverse ID: 59, ts: 404512, 		 Type: NEED_VISIT	ins @40159d: mov	[ebp-0x4], 0xFFFFFFFE //mem
Reverse ID: 60, ts: 404552, 		 Type: REGI_LINK	ins @4015a9: mov	eax, [ebp-0x1C] //read eax
Reverse ID: 61, ts: 404568, 		 Type: MEM_LINK	ins @40102f: mov	[ebp-0x4], eax //read mem
Reverse ID: 62, ts: 404571, 		 Type: REGI_LINK	ins @401038: cmp	[ebp-0x4], 0x61 // needed by jz at 404572

	New problem area: 404512 is depended on by 404552, which is strange. Set a BP on it. Found that it's ok. It's because of skipping
function calls.
	[7] check again.
======================================
   reverse trace for ts: 398116
======================================
Reverse ID: 0, ts: 398116, 		 Type: MEM_LINK	ins @8057882c: inc	[-0x7F97314C]
Reverse ID: 1, ts: 400885, 		 Type: MEM_LINK	ins @8057882c: inc	[-0x7F97314C]
Reverse ID: 2, ts: 403090, 		 Type: ALL_FUNCTION	ins @8057882c: inc	[-0x7F97314C]
Reverse ID: 3, ts: 403937, 		 Type: MEM_LINK	ins @8056a658: repz movs	es:[edi], ds:[esi]
Reverse ID: 4, ts: 404254, 		 Type: REGI_LINK	ins @7c81ac42: mov	eax, [ebp-0x8C]
Reverse ID: 5, ts: 404255, 		 Type: MEM_LINK	ins @7c81ac48: mov	[esi], eax
Reverse ID: 6, ts: 404279, 		 Type: REGI_LINK	ins @7c8018ce: test	[ebp-0x1C], 0x01
Reverse ID: 7, ts: 404280, 		 Type: CONTROL_LINK	ins @7c8018d2: jz	0x0000003C
Reverse ID: 8, ts: 404281, 		 Type: NEED_VISIT	ins @7c8018d4: mov	[ebp-0x4], ebx
Reverse ID: 9, ts: 404284, 		 Type: CONTROL_LINK	ins @7c8018dd: jnz	0x00000007
Reverse ID: 10, ts: 404285, 		 Type: NEED_VISIT	ins @7c8018e4: or	[ebp-0x4], 0xFF
Reverse ID: 11, ts: 404286, 		 Type: CONTROL_LINK	ins @7c8018e8: jmp	0x00000026
Reverse ID: 12, ts: 404287, 		 Type: REGI_LINK	ins @7c80190e: cmp	esi, ebx
Reverse ID: 13, ts: 404288, 		 Type: CONTROL_LINK	ins @7c801910: jge	0xFFFFFF80
Reverse ID: 14, ts: 404289, 		 Type: NEED_VISIT	ins @7c801890: xor	eax, eax
Reverse ID: 15, ts: 404291, 		 Type: ESP_LINK	ins @7c801893: call	0x00000C78
Reverse ID: 16, ts: 404294, 		 Type: REGI_LINK	ins @7c802515: pop	ecx
Reverse ID: 17, ts: 404299, 		 Type: ESP_LINK	ins @7c80251a: push	ecx
Reverse ID: 18, ts: 404300, 		 Type: ESP_LINK	ins @7c80251b: ret	
Reverse ID: 19, ts: 404301, 		 Type: ESP_LINK	ins @7c801898: ret	0x0014
Reverse ID: 20, ts: 404377, 		 Type: NEED_VISIT	ins @4094da: pop	edi
Reverse ID: 21, ts: 404381, 		 Type: CONTROL_LINK	ins @4094de: ret	
Reverse ID: 22, ts: 404382, 		 Type: NEED_VISIT	ins @409596: add	esp, 0x0C
Reverse ID: 23, ts: 404384, 		 Type: CONTROL_LINK	ins @40959c: jmp	0x00000019
Reverse ID: 24, ts: 404385, 		 Type: NEED_VISIT	ins @4095b5: mov	[ebp-0x4], 0xFFFFFFFE
Reverse ID: 25, ts: 404413, 		 Type: REGI_LINK	ins @4095c1: mov	eax, [ebp-0x1C]
Reverse ID: 26, ts: 404431, 		 Type: REGI_LINK	ins @4040ef: cmp	eax, 0xFF
Reverse ID: 27, ts: 404432, 		 Type: CONTROL_LINK	ins @4040f2: jz	0x00000088
Reverse ID: 28, ts: 404433, 		 Type: REGI_LINK	ins @4040f8: test	[esi+0xC], 0x82
Reverse ID: 29, ts: 404434, 		 Type: CONTROL_LINK	ins @4040fc: jnz	0x00000053
Reverse ID: 30, ts: 404435, 		 Type: MEM_LINK	ins @4040fe: push	esi
Reverse ID: 31, ts: 404440, 		 Type: REGI_LINK	ins @404196: mov	eax, [ebp+0x8]
Reverse ID: 32, ts: 404443, 		 Type: REGI_LINK	ins @4041b2: mov	eax, [eax+0x10]
Reverse ID: 33, ts: 404447, 		 Type: REGI_LINK	ins @404105: cmp	eax, 0xFF
Reverse ID: 34, ts: 404448, 		 Type: CONTROL_LINK	ins @404108: jz	0x00000032
Reverse ID: 35, ts: 404449, 		 Type: MEM_LINK	ins @40410a: push	esi
Reverse ID: 36, ts: 404454, 		 Type: REGI_LINK	ins @404196: mov	eax, [ebp+0x8]
Reverse ID: 37, ts: 404457, 		 Type: REGI_LINK	ins @4041b2: mov	eax, [eax+0x10]
Reverse ID: 38, ts: 404461, 		 Type: REGI_LINK	ins @404111: cmp	eax, 0xFE
Reverse ID: 39, ts: 404462, 		 Type: CONTROL_LINK	ins @404114: jz	0x00000026
Reverse ID: 40, ts: 404463, 		 Type: NEED_VISIT	ins @404116: push	edi
Reverse ID: 41, ts: 404465, 		 Type: MEM_LINK	ins @404118: call	0x00000079
Reverse ID: 42, ts: 404474, 		 Type: ESP_LINK	ins @4041b6: ret	
Reverse ID: 43, ts: 404476, 		 Type: MEM_LINK	ins @404120: push	esi
Reverse ID: 44, ts: 404482, 		 Type: REGI_LINK	ins @404196: mov	eax, [ebp+0x8]
Reverse ID: 45, ts: 404483, 		 Type: REGI_LINK	ins @404199: test	eax, eax
Reverse ID: 46, ts: 404484, 		 Type: CONTROL_LINK	ins @40419b: jnz	0x00000017
Reverse ID: 47, ts: 404485, 		 Type: REGI_LINK	ins @4041b2: mov	eax, [eax+0x10] //DATA
Reverse ID: 48, ts: 404488, 		 Type: NEED_VISIT	ins @40412d: and	eax, 0x1F //control
Reverse ID: 49, ts: 404494, 		 Type: CONTROL_LINK	ins @404138: jmp	0x00000007 //control
Reverse ID: 50, ts: 404495, 		 Type: NEED_VISIT	ins @40413f: mov	al, [eax+0x4] //control
Reverse ID: 51, ts: 404498, 		 Type: CONTROL_LINK	ins @404146: jnz	0x00000009 //control
Reverse ID: 52, ts: 404499, 		 Type: REGI_LINK	ins @40414f: cmp	[esi+0x18], 0x00000200 //data
Reverse ID: 53, ts: 404500, 		 Type: CONTROL_LINK	ins @404156: jnz	0x00000017 //control
Reverse ID: 54, ts: 404501, 		 Type: NEED_VISIT	ins @40416d: mov	ecx, [esi] // control
Reverse ID: 55, ts: 404506, 		 Type: CONTROL_LINK	ins @404178: jmp	0x00000016 //control
Reverse ID: 56, ts: 404507, 		 Type: NEED_VISIT	ins @40418e: pop	esi //control
Reverse ID: 57, ts: 404509, 		 Type: CONTROL_LINK	ins @404190: ret		//function included	
Reverse ID: 58, ts: 404510, 		 Type: NEED_VISIT	ins @401599: pop	ecx //ok. block basic
Reverse ID: 59, ts: 404512, 		 Type: NEED_VISIT	ins @40159d: mov	[ebp-0x4], 0xFFFFFFFE //because skipping function call
Reverse ID: 60, ts: 404552, 		 Type: REGI_LINK	ins @4015a9: mov	eax, [ebp-0x1C] //ok.
Reverse ID: 61, ts: 404568, 		 Type: MEM_LINK	ins @40102f: mov	[ebp-0x4], eax //ok.
Reverse ID: 62, ts: 404571, 		 Type: REGI_LINK	ins @401038: cmp	[ebp-0x4], 0x61 //ok.

======================================
   END OF reverse trace for ts: 398116
======================================

7:30AM 11/27/2013
--------------------------------------------------------------------------------------
Task 182:  Algorithm Design how to handle printf
--------------------------------------------------------------------------------------
	[1] simplified problem

		scanf
			[1] do something serious
			[2] send request
				[3] OS routine: NtRequestWaitReplyPort -> increase id, if id<0 then ...; if id>0 then ...


		printf 
			[1] do something else
			[2] send request
				[3] OS routine: NtRequestWaitReplyPort -> increase id, if id<0 then ...; if id>0 then ...

	The global id chains the two together. 
	Now the problem is: can we remove the printf block?
	Algorithm idea: [1] identify that particular INC instruction as a non-interfering instruction
					[2] incremental removal. Test remove printf and see if it affects the trace generation.
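
	The incremental-removal idea in [2] can be sketched on a toy model. Everything below is hypothetical
	(block functions, `run`, `removable`, the state dictionary); it only mirrors the simplified problem above
	and is not the tracer's real interface.

```python
# Toy model of the simplified problem: scanf and printf both bump a shared
# global id inside the same OS routine, which chains them together.
# All names are hypothetical; this is not the tracer's real interface.

def os_routine(state):
    """Stand-in for the kernel routine: increment the global message id."""
    state["next_id"] += 1
    if state["next_id"] == 0:      # wrap-around case: id is reset to 1
        state["next_id"] = 1


def printf_block(state):
    state["output"].append("hello")
    os_routine(state)


def scanf_block(state):
    os_routine(state)
    state["user_char"] = "a"


def run(blocks):
    """Execute the blocks and return the branch outcome we slice on."""
    state = {"next_id": 1, "output": [], "user_char": None}
    for block in blocks:
        block(state)
    return state["user_char"] == "a"   # models `cmp [ebp-0x4], 0x61`


def removable(blocks, candidate):
    """Test-remove one block; it is removable if the outcome is unchanged."""
    reduced = [b for b in blocks if b is not candidate]
    return run(blocks) == run(reduced)
```

	In this model the INC only feeds the id counter, never the branch, so removing the printf block leaves
	the outcome unchanged while removing the scanf block does not.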

	[2] check 398293 and verify if the algorithm works.
	(-- has mem dependency at 398293 on 0x81fde888, first bytes: ffffffe0 and 28 -> KiUnlinkThread)

	Code is shown as below:
		805001a6 095154          or      dword ptr [ecx+54h],edx ; thread->waitStatus |= edx param
		805001a9 8b415c          mov     eax,dword ptr [ecx+5Ch] ; eax <- waitBlockList
		805001ac 56              push    esi

		nt!KiUnlinkThread+0x7:
		805001ad 8b10            mov     edx,dword ptr [eax] ; edx <- _KWAIT_BLOCK.WaitListEntry
		805001af 8b7004          mov     esi,dword ptr [eax+4] ; esi<- KWAIT_BLOCK.WaitListEntry.B_LINK
		805001b2 8916            mov     dword ptr [esi],edx ***; this is to perform the removal. prev.F_LINK = cur.F_LINK
		805001b4 897204          mov     dword ptr [edx+4],esi ; this is to perform the removal. next.B_LINK = cur.B_LINK
		805001b7 8b4010          mov     eax,dword ptr [eax+10h]
		805001ba 3b415c          cmp     eax,dword ptr [ecx+5Ch]
		805001bd 75ee            jne     nt!KiUnlinkThread+0x7 (805001ad)

	The pseudocode can be found in ReactOS.	We can infer that ecx must be pointing to _KTHREAD,
	thus ecx+54h is the wait status. The mov dword ptr [esi],edx is the operation that removes a block from the waiting list.
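
	The two stores in the KiUnlinkThread loop are the usual doubly linked list removal. A minimal sketch,
	assuming the standard layout with the forward link at offset 0 and the backward link at offset 4
	(class and function names are made up for illustration):

```python
# Sketch of the list removal done by the two stores at 805001b2/805001b4.
# Assumes forward link at offset 0 and backward link at offset 4.
# ListEntry, insert_tail, unlink and names are illustrative names only.

class ListEntry:
    def __init__(self, name):
        self.name = name
        self.flink = self   # offset 0: next element
        self.blink = self   # offset 4: previous element


def insert_tail(head, entry):
    prev = head.blink
    entry.flink = head
    entry.blink = prev
    prev.flink = entry
    head.blink = entry


def unlink(entry):
    nxt = entry.flink       # mov edx, dword ptr [eax]
    prev = entry.blink      # mov esi, dword ptr [eax+4]
    prev.flink = nxt        # mov dword ptr [esi], edx
    nxt.blink = prev        # mov dword ptr [edx+4], esi


def names(head):
    """Walk the list from the head and collect the element names."""
    out, cur = [], head.flink
    while cur is not head:
        out.append(cur.name)
        cur = cur.flink
    return out
```

	The first store writes into the previous element, which is why the recorded write lands on an address
	owned by a neighbouring wait block rather than the block being removed.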

	Check how the instruction in printf depends on it: at first could not find the instruction. It's the instruction at
	804fb1b4!!!
----------------

	nt!KeReleaseSemaphore:
		804fb172 8bff            mov     edi,edi
		804fb174 55              push    ebp
		804fb175 8bec            mov     ebp,esp
		804fb177 51              push    ecx
		804fb178 53              push    ebx
		804fb179 56              push    esi
		804fb17a 57              push    edi
		804fb17b ff1514774d80    call    dword ptr [nt!_imp__KeRaiseIrqlToDpcLevel (804d7714)]
		804fb181 8b7508          mov     esi,dword ptr [ebp+8]
		804fb184 8b5e04          mov     ebx,dword ptr [esi+4]
		804fb187 8ac8            mov     cl,al
		804fb189 8b4510          mov     eax,dword ptr [ebp+10h]
		804fb18c 8d3c03          lea     edi,[ebx+eax]
		804fb18f 3b7e10          cmp     edi,dword ptr [esi+10h]
		804fb192 884dff          mov     byte ptr [ebp-1],cl
		804fb195 7f04            jg      nt!KeReleaseSemaphore+0x29 (804fb19b)

		nt!KeReleaseSemaphore+0x25:
		804fb197 3bfb            cmp     edi,ebx
		804fb199 7d0f            jge     nt!KeReleaseSemaphore+0x38 (804fb1aa)

		nt!KeReleaseSemaphore+0x29:
		804fb19b e868570400      call    nt!KiUnlockDispatcherDatabase (80540908)
		804fb1a0 68470000c0      push    0C0000047h
		804fb1a5 e8566b0400      call    nt!ExRaiseStatus (80541d00)

		nt!KeReleaseSemaphore+0x38:
		804fb1aa 85db            test    ebx,ebx
		804fb1ac 897e04          mov     dword ptr [esi+4],edi
		804fb1af 7511            jne     nt!KeReleaseSemaphore+0x50 (804fb1c2)

		nt!KeReleaseSemaphore+0x3f:
		804fb1b1 8d4608          lea     eax,[esi+8]
		804fb1b4 3900            cmp     dword ptr [eax],eax *** // 
		804fb1b6 740a            je      nt!KeReleaseSemaphore+0x50 (804fb1c2)

--------------
	We need to figure out what esi is; our guess is that it points to a _KSEMAPHORE (so offset 0x10 is the "limit"
	attribute). esi+8 is then _KSEMAPHORE -> waitListHead!!!

	So printf is using the queue structure and scanf is using it too.
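
	Read this way, the KeReleaseSemaphore listing above follows the logic sketched below. The field names and
	the Python list standing in for the wait list head are assumptions; only the offsets, the comparisons and
	the pushed status value come from the disassembly.

```python
# Sketch of the control flow in the KeReleaseSemaphore listing.
# Field names are guesses; offsets are from the disassembly.

STATUS_PUSHED_ON_ERROR = 0xC0000047   # value pushed at 804fb1a0


class Semaphore:
    def __init__(self, count, limit):
        self.signal_state = count   # [esi+4]
        self.wait_list = []         # stands in for the list head at esi+8
        self.limit = limit          # [esi+10h]


def release_semaphore(sem, adjustment):
    """Return True when the waiters must be examined after the release."""
    old = sem.signal_state                  # mov ebx, [esi+4]
    new = old + adjustment                  # lea edi, [ebx+eax]
    if new > sem.limit or new < old:        # jg / (not jge) -> raise status
        raise OverflowError(hex(STATUS_PUSHED_ON_ERROR))
    sem.signal_state = new                  # mov [esi+4], edi
    if old != 0:                            # test ebx, ebx / jne
        return False
    return len(sem.wait_list) > 0           # cmp [eax], eax / je (list empty)
```

	The compare at 804fb1b4 is therefore a read of the wait list head, which is how a list write made
	elsewhere becomes a memory dependency of this instruction.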

	
1:30PM 11/29/2013
	[1] Continue to figure out the logic: 
		(1) is it really not preserving the value in the queue list? Or is it a bug?
	[2] modify the code so that it dumps the first 4 bytes.
					Sliceat: Timestamp: 404572 0x40103c
					printf (392679 @401022 -->   399410 @0x40112b)
	-- has mem dependency at 398293 on 0x81fde888, first 4 bytes: 81e20ae0 and 81e1f928, size: 4

	[3] use WinDbg to verify if the above memory recording is true. Information of 398293 is shown below
		805001b2 8916            mov     dword ptr [esi],edx ***; this is to perform the removal. prev.F_LINK = cur.F_LINK
		[2] Use windbg to set bp at 0x401010, 0x401022, and 0x401027 (begin and end of call printf)
			then ba e1 on 0x805001b2 and find out the address that it is writing to.
			Start the program again and check the contents at 0x401022 and 0x401027
			[2.1]  problem: 0x805001b2 is hit too many times, while in the dump log it's only hit twice during
					the printf call.
				Address of [esi]: 89915028, 898989e0, 89890528, 89925968, 89b3a990 
				Most likely, there are too many thread switches.
			Related calls:
					nt!KiUnlinkThread -->  seems to be removing all waiting objects in the thread
				baccfe6c 8050040b nt!KiUnwaitThread+0x12 --> makes the thread stop waiting on any object and makes it ready to run.
				baccfe98 804ff18c nt!KiWaitTest+0xab --> test the object waited by threads and release them 
				baccffa4 804ff34b nt!KiTimerListExpire+0x7a
				baccffd0 80540d5d nt!KiTimerExpiration+0xaf
				baccfff4 80540a2a nt!KiRetireDpcList+0x46
				baccfff8 b6ba7854 nt!KiDispatchInterrupt+0x2a --> interrupt dispatch

			Check reactos code: 

9:00AM 12/02/2013.
--------------------------------------------------------------------------------------
Task 183:  solve the printf problem
--------------------------------------------------------------------------------------
	Idea: ignore instructions belonging to sysenter. The only problem is that we'll not be able to capture interrupt handler
	changes by malware. So later, this should be made an option. But implement it first.

	9:30AM
	[1] Review the checkRecordLogic.  [30 min]
		[1.1] when will bJustReceivedInterrupt be set to true? [15 min] It's triggered in seg_helper.cc::do_interrupt_all(); any
			interrupt will trigger this function. But we are not sure if sysenter will trigger it.
			[1.1.1] design a lab to verify it
				7c90eb8b and 7c90eb8d (sysenter)
				[1] set a conditional bp on 7c90eb8b and 7c90eb8d in Trace::handle_instr
				[2] once it's hit, set a breakpoint on handle_interrupt and see if it's ever hit.
			Conclusion: sysenter does not trigger the do_interrupt_all.
		[1.2] check bDelayedOneStep. [15 min] --> ok.
		[1.3] how is bTracePhyMem set?
			[1] when bTracePhyMem is true, the capturing of memory is done through handle_phy_mem_access, which is called
				by handle_trace_mem for every memory access. But currently it needs bRecordEnabled.
			[2] bTracePhyMem is set to true in setPhyMemTraceMode, which currently records the cr3Modify eip. It is called
					in ops_sse.h:helper_trace2. When an instruction is found to modify CR3, the mode is set.
			[3] bTracePhyMem is set to false when the Trace comes back (from the other cr3) and the eip is not the
					same as the cr3 switch.

	[2] Problem 1: decide how to deal with cr3 switch. CR3 switch only occurs in a system call when getchar() waits for
		another process which collects the input and then transfers the user input into the address space. So the original
		cr3 detection can be disabled. Need to declare a new mode in all related functions. If it is to
		trace kernel mode, then use the original idea; otherwise, bTracePhyMem is turned on at a system call.

	11:00AM
	[3] Implementation:
		[1] introduce a boolean variable trace_kernel in config.txt, parse it, and declare a boolean var in Trace
			for trace_kernel as an instance attribute. Need to handle all constructors and copy functions. [30 min] DONE.

		[2] in checkRecordStatus modify the algorithm. Based on opcode 0x134 (sysenter), enter the tracing mode.
		Based on bTraceKernel, decide whether to enable or disable bTracePhyMem [30 min] DONE.

	11:50AM
		[3] in setPhyMemTraceMode add the protection based on bTraceKernel and in Trace::handle_instr (the logic for disabling
				bTracePhy). [30 min] DONE.

		[4] debug: verify the bTraceKernel mode first [20 min] Verify if syscalls are still there. DONE.
		
		[5] debug: verify the bTraceKernel=false mode [30 min]
			[5.1] found that the return is not right. Should capture sysexit. opcode: 0x135
	3:45PM
			[5.2] fix a problem. the code terminates too early. DONE.
			[5.3] the opcode are not right, check again.
				[1] find the timestamp of
						sysenter: 987 0xf 0x34 
						sysexit: 5605 0xf 0x35
				[2]  made the change, still not working (checked the opcode's 2nd byte, still not working).

9:30AM 12/03/2013.
			[5.4] sysexit problem, not returning to an address that is close. SOLVED.
			Now full dump is only 31MB, almost 60% of the original.
			[5.5] modify to include sysenter. DONE.
			[5.6] strangely only improved 7%. check.
----------------------------
			Problem: error in reading memory.
				--processFunction ts 238737
				ERROR in reading mem 0x12fd0f
				ERROR reading mem at 237748 or 231647, set as hasDataDepend
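
	The sysenter/sysexit gating built in Task 183 can be sketched as below. This is only a guess at the shape
	of the logic: `RecordGate`, `should_record` and `opcode_of` are hypothetical names, and the mapping of
	bTraceKernel to the two behaviours is an assumption drawn from the notes above. The opcode numbers 0x134
	and 0x135 equal 0x100 | second byte for the byte pairs 0x0f 0x34 and 0x0f 0x35 listed in [5.3].

```python
# Hypothetical sketch of the sysenter/sysexit gating; not the tool's real code.
SYSENTER = 0x134   # bytes 0x0f 0x34, numbered as 0x100 | second byte
SYSEXIT = 0x135    # bytes 0x0f 0x35


def opcode_of(byte1, byte2):
    """Fold a 0x0f-prefixed two-byte opcode into one number, else use byte1."""
    return 0x100 | byte2 if byte1 == 0x0F else byte1


class RecordGate:
    def __init__(self, trace_kernel):
        self.trace_kernel = trace_kernel
        self.in_syscall = False
        self.trace_phy_mem = False

    def should_record(self, byte1, byte2):
        """Return True if this instruction should be kept in the trace."""
        op = opcode_of(byte1, byte2)
        if op == SYSENTER:
            self.in_syscall = True
            if not self.trace_kernel:
                # Kernel instructions are skipped, so capture memory instead.
                self.trace_phy_mem = True
            return True            # sysenter itself is included ([5.5])
        if op == SYSEXIT:
            self.in_syscall = False
            if not self.trace_kernel:
                self.trace_phy_mem = False
            return self.trace_kernel
        # Outside a system call everything is recorded; inside, only when
        # kernel tracing was requested.
        return self.trace_kernel or not self.in_syscall
```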
			
8:30AM 12/04/2013
--------------------------------------------------------------------------------------
Task 184:  check the memory reading problem on 0x12fd0f.
--------------------------------------------------------------------------------------
	[1] new stats
					Sliceat: Timestamp: 240572 0x40103c
					printf (235876 @401022 -->   238737 @0x40112b)
	[2] set a BP at 237748 DONE.
	[3] check data
		timeStamp: 237748, ins @402543: mov [ebp-0x211], al
 		write: (start: 0x12fd0f, end: 0x12fd0f) , DEPLINKS:  , R: 237722 and EBP value: 0x12ff20, R: 237747
		timeStamp: 231647, ins @7c911533: call  0xFFFFF699
 		write: (start: 0x12fd0c, end: 0x12fd0f) , ESP: 0x12fd10 -> 0x12fd0c , DEPLINKS:  , R: 231646 and ESP value: 0x12fd0c

		In the branch slice, it tries to read 0x12fd0f.
		It's the read on 231647 that fails. The problem is with the read mem algorithm.
11:00AM
	[4] implementation  [20 min] DONE.
	[5] debug [20 min] 
	[6] result: problem is still memory mismatch:
		has mem dependency at 237748 on 0x12fd0f, first 4 bytes: 7c and 0, size: 1
	[7] check windbg on @402543 and see its functions and see why it's depended on.
		timeStamp: 237748, ins @402543: mov [ebp-0x211], al
 		write: (start: 0x12fd0f, end: 0x12fd0f) , DEPLINKS:  , R: 237722 and EBP value: 0x12ff20, R: 237747
		timeStamp: 239766, ins @7c913309: mov   eax, [esi+0x20]
 		read: (start: 0x12fd0c, end: 0x12fd0f) , DEPLINKS:  , R: 239694 , M: 231647 , M: 237748 , C: 239765 ESP: 0x12fca4 EBP: 0x12fcac
		Observation:
				[a] the instruction at @402543 copies the printf string byte by byte into 0x12FD0F. (it is also accessed in the
					same loop)
				[b] setting a hardware breakpoint also verifies that it is actually accessed by the instruction in
					scanf @7c913309. It is a part of CsrClientCallServer.
	[8] figure out the logic of @7c913309.
			In instruction @7c913309, mov eax, [esi+0x20],
			the esi points to the PCSR_API_MESSAGE  	ApiMessage,
		Using windbg, we are able to figure out offset 0x20.
		From the ReactOS source code we can infer that offset 0x20 is ApiMessage->Status. The problem is: why is the
			byte at 0x12fd0f not overwritten by CsrClientCallServer itself?
		Reading the ReactOS documentation again -->
			CsrClientCallServer only sets the status when the status is a failure!
			So when it returns, it directly returns whatever ORIGINAL data is contained in ApiMessage.
			It seems that CsrClientCallServer fills in the fields of the ApiMessage and then forwards the request.
-------------------------------------------------------------------------------
	Now the question is: should ApiMessage ever be INITIALIZED???
	Check its value in the WinXP image.
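
	Why this produces a dependency on printf can be shown with a toy stack model. It is a sketch under stated
	assumptions: the helper names are hypothetical, the stack is a plain bytearray, and the "set Status only on
	failure" behaviour is the reading of the ReactOS code given above, not verified Windows behaviour. The
	addresses and the byte 0x7c come from the dumps in this task.

```python
# Toy model: printf leaves a byte at 0x12fd0f, and a later dword read of the
# never-initialised Status field at 0x12fd0c-0x12fd0f picks it up.
import struct

STACK_BASE = 0x12FC00
stack = bytearray(0x400)            # zero-filled stand-in for the user stack

STATUS_ADDR = 0x12FD0C              # dword read at 239766 covers 0x12fd0c-0x12fd0f
STATUS_UNSUCCESSFUL = 0xC0000001


def write_byte(addr, value):
    stack[addr - STACK_BASE] = value


def write_dword(addr, value):
    struct.pack_into("<I", stack, addr - STACK_BASE, value)


def read_dword(addr):
    return struct.unpack_from("<I", stack, addr - STACK_BASE)[0]


def client_call_server(failed):
    """Hypothetical model: Status is written only on failure, then returned."""
    if failed:
        write_dword(STATUS_ADDR, STATUS_UNSUCCESSFUL)
    return read_dword(STATUS_ADDR)  # models mov eax, [esi+0x20]
```

	After `write_byte(0x12FD0F, 0x7C)` (the printf copy loop), `client_call_server(failed=False)` returns a
	value whose top byte is 0x7c, so the slicer correctly reports the read as depending on printf's write.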

9:00AM 12/05/2013
	[9] check why ApiMessage->status is never initialized
		[9.1] set a BP at 0x7c913309 after the call of scanf, and check which function is calling it.
			kernel32.ReadConsoleA -> kernel32.7c8713f9 -> CsrClientCallServer -> @7c913309 (return ApiMsg->Status which is
				not initialized)
			ReadConsoleA declares APIMessage as a local variable and then calls CsrClientCallServer. But APIMessage->Status
		itself is never initialized. APIMessage->Status is located at 0x12fd0c, and APIMessage is located at 0x12fcec.

		[9.2] guess: APIMsg->Status may be set in NtRequestWaitReplyPort.
			Experiment: set 0x12fd0c to 1 and see what changes.
			Verified: NtRequestWaitReplyPort does clear the status to 0. Now the question is: if the value is originally 0, will
				the clear-to-0 action still be executed?

		[9.3] Delve into NtRequestWaitReplyPort and check where APIMessage->Status is cleared.
			Hardware breakpoint does not work, because it is modified in kernel mode.
		[9.4] read the trace and find out if the memory writes are recorded for the sysenter instruction.
			[1] locate the timestamp of the following:
					Sliceat: Timestamp: 240572 0x40103c
					printf (235876 @401022 -->   238737 @0x40112b)
					scanf (238738 @40102a   --> 240567 @401092)
						call CsrClientCallServer 239685 @7c8715bb
							call NtRequestWaitReplyPort 239734 @7c9132f3
								sysenter 239739 @7c90eb8d       <---- however, no mem write is recorded for this instruction!!!

		[9.5] debug the checkRecordStatus 
			[1] check how record mem is done. --> comment out the check on bRecordEnabled in handle_phy_mem_access
			[2] regenerate raw trace and full trace.
			[3] problem: handle_phy_mem is called but the translation from ha to va always yields 0xFFFFFFFF.
				Reason: page map is not built yet. It's built only when the instruction is modifying CR3. Change the logic so that
				it's also built on a syscall.
		[9.6] modify isModifyCR3
			[1] add function isNeedCollectPhyMem(cr3, eip, byte1, byte2)
			[2] add the logic of checking syscall  DONE.
			[3] fix the handle_mem_read.
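
	The failure in [9.5][3] (every translation yielding 0xFFFFFFFF until the page map exists) is what a reverse
	map keyed by page frame would do when it is still empty. A minimal sketch, assuming 4 KB pages and a
	dictionary from frame base to virtual page base; `ReversePageMap` and its methods are illustrative names,
	not the tool's:

```python
# Hypothetical reverse page map: frame base -> virtual page base.
PAGE_SIZE = 0x1000
PAGE_MASK = PAGE_SIZE - 1
INVALID_VA = 0xFFFFFFFF


class ReversePageMap:
    def __init__(self):
        self.frame_to_vpage = {}

    def build(self, mappings):
        """mappings: iterable of (virtual_page_base, frame_base) pairs."""
        for vpage, frame in mappings:
            self.frame_to_vpage[frame & ~PAGE_MASK] = vpage & ~PAGE_MASK

    def to_virtual(self, addr):
        """Translate back to a virtual address, or INVALID_VA if unmapped."""
        vpage = self.frame_to_vpage.get(addr & ~PAGE_MASK)
        if vpage is None:
            return INVALID_VA
        return vpage | (addr & PAGE_MASK)
```

	With this shape, building the map at sysenter as well as at a CR3 change is enough for the lookups made
	during the system call to succeed.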
				


9:00AM 12/06/2013
--------------------------------------------------------------------------------------
Task 185:  Fix problem of memory recording
--------------------------------------------------------------------------------------
	[1] complain about mismatch. solved.
10:20AM
	[2] solve the memory capacity problem.
		[1] declare a CachedMap for phyMem [5 min] DONE.
		[2] add functions for enablePhyMem tracing and disable phyMemTracing [8 min] DONE
		[3] modify handle_mem_read logic, if the address is already written, don't add it [10 min] DONE
		[4] modify handle_mem_write, add the tracing [10 min] DONE
		[5] debug and see the result. [10 min] DONE.
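
	Steps [1] to [4] amount to remembering only the first observed value of bytes that the trace itself has not
	written. A sketch of that rule, with hypothetical names (`PhyMemCache`, `on_read`, `on_write`) and a plain
	dictionary instead of the real CachedMap:

```python
# Hypothetical first-read cache: keep the value of a byte only if it is read
# before anything in the trace wrote it.

class PhyMemCache:
    def __init__(self):
        self.enabled = False
        self.written = set()    # addresses written while tracing
        self.initial = {}       # address -> first value seen by a read

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

    def on_write(self, addr, size):
        if self.enabled:
            self.written.update(range(addr, addr + size))

    def on_read(self, addr, data):
        """data: the bytes returned by the read starting at addr."""
        if not self.enabled:
            return
        for offset, value in enumerate(data):
            a = addr + offset
            if a in self.written or a in self.initial:
                continue        # already known; do not add it again
            self.initial[a] = value
```

	Skipping already-written and already-recorded bytes is what keeps the record small.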
	[3] for !bTraceIntoSyscall mode, don't trace those kernel memory.
		[1] read about page table directory. In a 32-bit page-table entry, bits 31:12 hold the physical page-frame address, and
				bit 2 (U/S) indicates if it is supervisor access only (value 0). [5 min]
		[2] read build_page_pae. It seems to be skipping the non-user page already!
				break on mem_helper.c:233
7:20AM 12/07/2013.
		[3] add a function is kernelMem in trace.h  [10min] FIXED.
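
	The page-entry check from [3][1] can be sketched as follows for a 32-bit, non-PAE, 4 KB page-table entry.
	The function names are illustrative and this is not the code added to trace.h; PAE entries are 64 bits
	wide and would need a different frame mask.

```python
# Decode the bits of a 32-bit (non-PAE) page-table entry that matter here.
PTE_PRESENT = 1 << 0
PTE_USER = 1 << 2            # U/S bit: 0 means supervisor-only
PTE_FRAME_MASK = 0xFFFFF000  # bits 31:12, physical page-frame address


def decode_pte(pte):
    return {
        "present": bool(pte & PTE_PRESENT),
        "user": bool(pte & PTE_USER),
        "frame": pte & PTE_FRAME_MASK,
    }


def is_kernel_only(pte):
    """True for a present page that user mode cannot access."""
    return bool(pte & PTE_PRESENT) and not (pte & PTE_USER)
```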

9:30 12/09/2013
	[4] regenerate the trace 
					Sliceat: Timestamp: 240565 0x40103c
					printf (235870 @401022 -->   238731 @0x40112b)
					scanf (238733 @40102a   --> 240560 @401092)
	[5] branch slice.
		The memory dependency is now cleared, which is good. There are still a number of unknown dependencies.

10:00 12/09/2013
--------------------------------------------------------------------------------------
Task 186:  Fix the unknown dependencies
--------------------------------------------------------------------------------------
	[1] data dump below, analyze each.
		NOTE: 239732 (belongs to scanf, and it's a sysenter which has a lot of reads)
		-- has unknown dependency at 237742 depended by 239732 (AdvMemRead) on 12fd0f.
		-- has unknown dependency at 237737 depended by 239732 (AdvMemRead) on 12fcfc-12fcff
		-- has unknown dependency at 236938 depended by 239732 (AdvMemRead) on 12fd00-12fd04
		-- has unknown dependency at 236783 depended by 239732
		-- has unknown dependency at 236781 -same 
		-- has unknown dependency at 236652 -same 
		-- has unknown dependency at 236548 -same
		-- has unknown dependency at 236544 -same 
	[2] check dependency 237737. See if it's really depended on by scanf
		237737:@40192a inc [esi]
		-- has unknown dependency at 236107

		Use WinDbg, set the BPs at the following sequence:
		@40192a (ba), @40112b (bp) and then memory bp on 12fcfc
		Note to use /p to indicate the process to capture
			e.g., ba r4 /p 898338d0 0x12fcfc

		Problem: the instruction @40192a is hit many times.
		check its logic?
		no debugging information.
	[3] regenerate b21.exe and place it.
					Sliceat: Timestamp: 240981 0x40103d
					printf (235862 @40101d -->   238723 @0x4011fb)
					scanf (238728 @40102e   --> 240977 @40110e)
--processFunction ts 238723
Cannot find RV record for 237744
ERROR reading mem at 237744 or 204067, set as hasDataDepend
 -- set ts 235862  @0x40101d in slice
	 	 Function included! add into slice RET 238723 @0x4011fb

---> debug...

8:30AM 12/11/2013
	[4] check the reading mem problem. Regenerate the data
		[1] request mode 1. DONE.
		[2] request mode 0.
					Sliceat: Timestamp: 240983 0x40103d
					printf (235863 @40101d -->   238724 @0x4011fb)
					scanf (238729 @40102e   --> 240979 @40110e)
--------------------------------------------------------
--processFunction ts 238724
Cannot find RV record for 237745
ERROR reading mem at 237745 or 204068, set as hasDataDepend
--------------------------------------------------------
		[3] debug: set conditional breakpoint. [30 min]
			it is the read of 237745 error. but b1 is ok (204068)
			Records below:
				timeStamp: 237745, ins @404c6b: and	[eax+0x70], 0xFD
				 write: (start: 0x321f00, end: 0x321f03) , DEPLINKS:  , R: 237744 
				timeStamp: 204068, ins @40baa1: and	[ecx+0x70], 0xFD
				 write: (start: 0x321f00, end: 0x321f03) , DEPLINKS:  , R: 204067 
		[4] debug again and check why RV record cannot be found.
			The problem is that @404c6b is not recorded at all., but @40baa1 is recorded. set a bp on it.
			check when incNeedRecord is called.
				The problem is that both instructions do have their records in rrProcessor
				at 0x404c6f it is actually recording.
			Now this time it is able to locate the RV record, but the countMem is 0!
		[5] would it be caused by the change of InstrExecRecorder::handle_mem_read/write (added one parameter of bTracePhyMem)
			unit test it. --> found problem with RecordValueProcessor.
		check later.

9:00AM 12/12/2013.
		[6] check the unit test problem. FIXED.
		[7] regenerate the raw trace, full trace and branch trace again. Still could not fix the problem. Each time
			the branch is generated differently (it should be a deterministic process). Still reports cannot
			find RV record.
10:00AM
		[8] recompile from scratch.
			mode 1: done. Still the problem of cannot find RV record 237742 @404c6b
		[9]  check @404c6b in log and see how many times it has occurred. It has only occurred twice.
		[10] debug design.
			[1] condition bp on eip of @404c6f and trace into it and see if it is calling rvProcessor->addRecord; also 
				bp on Trace::handle_value_alert
				Problem: this->bNextInstrToRecord is not set!
11:30AM
			[2] check @404c6b is ever contained in the request. Verified no.
			[3] check why @404c6b is not included in the request
				[3.1] start in mode1 again and generate the raw trace and full trace. DONE
				[3.2] in branch trace, set conditional bp to handle function 0x4011fb, then set conditional bp on @404c6b DONE
					Sliceat: Timestamp: 240980 0x40103d
					printf (235861 @40101d -->   238722 @0x4011fb)
					scanf (238727 @40102e   --> 240976 @40110e)
						237742 (@404c6b) is not added because it's not in slice. 	 
					--> verify: at the end of branch_slice check this->rrProcessor->getRR_ID_ForEIP(0x404c6b)
					Verified it's not there.
				[3.3] on mode0, generate the raw trace and full trace
				[3.4] check whether @404c6b is ever included. The first time it's not included.
8:30AM 12/13/2013
				[3.5] check the second time. generate the trace and see we got the getRV error. [30 min]
				It's the second time, that it reports it cannot find RVRecord for 237743, the problem is why it needs to? 
		237743 is not in the request anyway.
				Debug:
				set BP in hasDataDependency and then check ts ==237743, the problem is that 237743 is now set as in slice now.
					[3.5.1] check why 237743 is set in slice.
						[1] read the log file verified that 237743 is not set in slice yet.
						[2] at the beginning of branch_slice, check if 237743 is in slice; then at the beginning of [30 min]
								hasDataDependency check if 237743 is in slice.
								from the beginning it is in slice.
								slice for 237743 is cleared at the beginning of branch_slice,
								after init_data slice it is also false.
9:30AM
						[3] 20 min check when it is set to true at 237743 [30 min] 
								set a conditional bp on InstrExecRecorder->setInSlice()
								It's added because 238982 has mem dependency on 237743
								Fix one bug in Trace::setInSlice()
							Information: 237743 and 238982
								timeStamp: 237743, ins @404c6b: and [eax+0x70], 0xFD
								 write: (start: 0x321f00, end: 0x321f03) , DEPLINKS:  , R: 237742
								timeStamp: 238982, ins @401c25: test    [eax+0x70], 0x02
								 read: (start: 0x321f00, end: 0x321f00) , DEPLINKS:  , R: 238981 , M: 237743
						[4] comparative study. [30 min]
							generate the raw trace and full trace and the log twice.
							Sliceat: Timestamp: 240980 0x40103d
							printf (235861 @40101d -->   238722 @0x4011fb)
							scanf (238727 @40102e   --> 240976 @40110e)
							1st branch pass: 61.83%.
							2nd branch pass: 65.39%
11:00AM
						[5] compare the difference in slice logs, use binary search [20 min]
							The first line of difference is in 27520
							log_1: ! generate data slice for 240968 to reach bridge 240977
							log_2: ! generate data slice for 240975 to reach bridge 240977
						the difference is the generate slice for:  this is the first time "to reach bridge 240977" occurs.
11:20AM
						[6] debug "to reach bridge 240977" and set a conditional bp there. [20 min]
								--> the next action is to sm.addTS(240967)
								the new SOC generated is tsStart = 238735, tsEnd = 240974, this causes the
								bridge operation 240975 to 240977
								The question: why is tsEnd not 240968??? --> the only difference is the call of ii->getCountInSlice()
								check if clear_slice clears the count. Found the fxxx bug! It cost more than 6 hours of debugging!
								the clearCount is called after updateCache()!
						SOLVED! Now check the rest of unknown dependencies.
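
	The log comparison in step [5] (finding the first line where the two slice logs differ) can be sketched as
	below. It assumes both logs are already loaded as lists of lines; `first_difference` is a hypothetical
	helper. Binary search applies because "the first k lines are identical" can only flip from true to false
	once as k grows.

```python
# Binary search for the first differing line of two logs.

def first_difference(lines_a, lines_b):
    """Return the 0-based index of the first differing line, or None."""
    n = min(len(lines_a), len(lines_b))
    lo, hi = 0, n                      # invariant: the first lo lines match
    while lo < hi:
        mid = (lo + hi + 1) // 2
        # Lines before lo are known equal, so only compare the new part.
        if lines_a[lo:mid] == lines_b[lo:mid]:
            lo = mid
        else:
            hi = mid - 1
    if lo < n:
        return lo
    # One log is a prefix of the other (or they are identical).
    return None if len(lines_a) == len(lines_b) else n
```

	Add 1 to the result to get the line number an editor would show.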





--------------------------------------------------------------------------------------
Task 187:  Fix the unknown dependencies
--------------------------------------------------------------------------------------
	[1] data dump below, analyze each.
							Sliceat: Timestamp: 240980 0x40103d
							printf (235860 @40101d -->   238721 @0x4011fb)
							scanf (238726 @40102e   --> 240976 @40110e)
	[2] check why there is unknown dependency, or at least fix the dump message.
			The problem is that 238717 is inslice and it has a reverse_pointer>tsRet, but it is neither of the
				isNeededForMem(), isNeededForReg() or isNeededForVisit() case.
			Got to check why it's included. Check the log file.
--processFunction ts 238722
-- has unknown dependency at 238717; # it is a "pop ebx", it is in slice because it is SOC end.
							--> in this case, it should be ignored.
-- has unknown dependency at 238715  # it is "pop edi", needed by 238728 (in scanf) for register. Problem: 238728 did
											add the register dependency to it, why it's not set for isNeededForReg()?
											set a bp and see how 238715 is added.
										--> found the bug. In real data dependency, forgot to set that it is needed for reg.
				Now need to regenerate the raw and full traces.
9:00AM 12/15/2013
	[3] check init_data slice has bug.
			debug into Trace::init_data slice for timestamp 240980
			Problem: 240978 has no data read!!!
			re-generate the raw trace and check @401036 has memory read.
				After completely recompiling the project, it's recorded.
			Now generate the branch trace.
				Need to recollect the raw trace.
		The first couple of unknown dependencies are listed as below:
							Sliceat: Timestamp: 240980 0x40103d
							printf (235860 @40101d -->   238721 @0x4011fb)
							scanf (238726 @40102e   --> 240976 @40110e)
--processFunction ts 238721
-- has unknown dependency at 238716 - included by socend, can be excluded. -- DISCHARGED
*** -- REMOVED reg dependency at 238714 for reg 5 (0 vs 0) - just solved
***-- has reg dependency at 238714 for reg 8 (322ed8 vs 9) ----- register edi/edi not right. register 5 esp value does not seem right!
-- has unknown dependency at 238711 -- it's writing to fs:[0], why it's not in protect mem list?
		-- we are stuck at how it is setInSlice now
		it is included because of multiple occurrences of the same instruction.
-- has unknown dependency at 238675 - mov edi, edi, depended on by 238686 (but it's within the printf function), why should it be included? It's included because of SOC, again can be discharged. -- DISCHARGED
-- has unknown dependency at 238671 -- add eax, 0x20 only depended by 238672.  Set in slice as SOCEnd.
-- has unknown dependency at 238668
-- has unknown dependency at 238642
-- has unknown dependency at 238636
-- has unknown dependency at 238635
-- has unknown dependency at 238634
-- has unknown dependency at 238631

9:00AM 12/16/2013
--------------------------------------------------------------------------------------
Task 188:  Discharge the SOCEnd/SOCBegin in slice case and other cases.
--------------------------------------------------------------------------------------
	[1] check on SOC begin/end is included in slice [10 min] it finally propagates to reversePointerType
	[2] add a switch case [15 min] DONE
	[3] test and run [10 min]
							Sliceat: Timestamp: 240980 0x40103d
							printf (235860 @40101d -->   238721 @0x4011fb)
							scanf (238726 @40102e   --> 240976 @40110e)
	[4] disappears again. check the something wrong with data dependency.
		[1] check the result of hasDataDepend. --> it seems hasDataDependency is solved. [15 min]
		[2] read "something wrong" message [10 min] DONE.
10:30AM
	[5] check problem: "can skip" error message.
		[1] identify where it is [5 min]
			tsStart=238040, tsEnd=238066 --> that are located in printf, but the entire printf should be
		already skipped?
			set a bp on processFunction for 238721? did not hit.
11:45AM
		[2] check how soc(238040, 238066) is added. [10 min]
			read log: ! generate data slice for 238717 to reach bridge 238722
		[3] check why 238717 is hit (it should be skipped actually)
			[3.1] add a bp on process function first , bp on Trace.cc:1442 first
			[2] bp on soc.cc:40
				It reveals that it is soc: soc(238714,238716) setBridge to soc(238723, 238724)
				Now the problem is why would soc(238714,238716) included?
				It's caused by addTS(238715), why?
				It's added because it has multiple occurrences!!! ---> this is caused by the following!
				Another question is the processFunction for the printf is never called!!!!
			Figure it out later!

9:00AM 12/18/2013
--------------------------------------------------------------------------------------
Task 189:  Find out why the printf function is not included
--------------------------------------------------------------------------------------
	Sliceat: Timestamp: 240980 0x40103d
	printf (235860 @40101d -->   238721 @0x4011fb)
	scanf (238726 @40102e   --> 240976 @40110e)

	[1] read log file and find if scanf is included and if printf is included.
		The problem is that scanf itself is not included as well!
		It seems that the in-slice ts added for bridge node is not right!
9:15AM
	[2] check code [15 min] fixed. and unit testing. fixed! But still not solving the problem.
9:50AM
	[3] Solve the case skip problem.
		The problem occurs at 238767->238769. The problem is that 238768 is included, and its tsEntry is smaller than 238767.
		Check how 238767->238769 is included as SOC.
		[3.1] set a bp at 238767 at addSOC. [20 min]
			It's caused by sm.addTS(238768)
			Now the problem: why is 238767 included?
		[3.2] trace into 238768
			Found the problem: matchCallTS is set to ts itself!
10:30AM
		[3.3] fix code! [15 min]
			[3.3.1] fix getCallForRet [5 min] DONE
			[3.3.2] fix code in socmanager. [5 min] DONE.
		[3.4] new problem on caseSkip: 239251->239254, check where error occurs. [15 min]
			it's caused by adding ts 239252
			The problem is that 239252 does not have a corresponding call!, it should be 239241 (does not match esp).
11:00AM
		[3.5] check why 239252 does not have a corresponding call [15 min]
			bp on 239252 Trace::searchForCall retID==239252.
			second criteria is used but given up again. check why.
			It seems that there was a worry (in some earlier stage of the implementation) that a call does not preserve ESP/EBP.
		It seems that this problem is fixed by resetting esp/ebp of each SOC.
		[3.6] re-enable the use of second criteria and see if unit testing is working. [15 min] DONE.
		[3.7] Now the new case 239240->239254, check why [15 min]
			The problem: 239253 has a deeper matchCALL.
11:40AM
		[3.8] fix the matchCall calculation again in identifySOC. [15 min] DONE.
		[3.9] run and check [15 min] new problem with 138516->141715.
				still the problem of search.  fix it [15 min]
				tsEntry is 138433, check the identifySOC function.
			Found the problem 138516->141715 is the result of merging.
		[3.10] the problem is verify_and_reset_soc
			Need to declare a function SOC::getMinMatchCallTS; if the min matchCallTS is smaller than the SOC's tsStart, continue to merge
			until it is defined.
====> TO DO!!!
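
Sketch for [3.3]-[3.6]: one way to pair each ret with its call is to keep a stack of pending calls and also require the ESP to line up (the "second criteria" above). This is only an illustration, not the project's Trace::searchForCall. The names Ins and matchCalls are hypothetical, and it assumes 32-bit x86 where a call pushes a 4-byte return address.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical trace record: one executed instruction.
struct Ins {
    uint32_t ts;                          // time stamp
    enum Kind { OTHER, CALL, RET } kind;  // instruction class
    uint32_t espBefore;                   // ESP before the instruction executes
};

// Returns a map retTS -> callTS. A ret matches the innermost pending call
// whose pushed return address sits where the ret reads it
// (call.espBefore - 4 == ret.espBefore). Pending calls that sit deeper on
// the stack (lower address) are dropped as unmatched.
std::map<uint32_t, uint32_t> matchCalls(const std::vector<Ins>& trace) {
    std::map<uint32_t, uint32_t> retToCall;
    std::vector<Ins> pending;
    for (const Ins& i : trace) {
        if (i.kind == Ins::CALL) {
            pending.push_back(i);
        } else if (i.kind == Ins::RET) {
            // Discard calls whose return slot is below this ret's stack slot.
            while (!pending.empty() && pending.back().espBefore - 4 < i.espBefore)
                pending.pop_back();
            if (!pending.empty() && pending.back().espBefore - 4 == i.espBefore) {
                retToCall[i.ts] = pending.back().ts;
                pending.pop_back();
            }
            // Otherwise there is no matching call; never fall back to the
            // ret's own time stamp.
        }
    }
    return retToCall;
}
```

A ret with no entry in the returned map has no call inside the trace window, so the caller has to handle that case explicitly instead of reusing ts.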

9:00AM 12/19/2013
--------------------------------------------------------------------------------------
Task 189:  Fix the tsEntry < tsStart problem
--------------------------------------------------------------------------------------
	[1] Algorithm design [20 min] DONE.
	[2] implementation [35 min]
		[2.1] define SOC::getMinMatchCallTS  [10 min] DONE.
		[2.2] in verify_and_reset_soc call getMinMatchCallTS, if the matchTS is smaller than the tsStart, enter the same
			branch; notice that in the next iteration, the check will be performed again. [20 min] DONE
		[2.3] debug. [15 min]
			[2.3.1] check getMinMatchCallTS [8 min] done.
			[2.3.2] debug the verify_and_reset ... [10 min] done.
	[3] find a new place where it does not work (tsEntry<tsStart).
			It's caused by the full_slice call when merge. Question: do we need the full_slice when merging SOCs?
			We can skip it because anyway, the bSuccess is set to false, and there will be a complete full_slice for
			each SOC a second time..
	[4] observation of lab results.
	Sliceat: Timestamp: 240980 0x40103d
	printf (235860 @40101d -->   238721 @0x4011fb)
	scanf (238726 @40102e   --> 240976 @40110e)
			238721 is added because read register error
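
Sketch of the merge rule from [2.1]-[2.2]: if an SOC contains a ret whose matching call lies before tsStart, merge it with the previous SOC and check the merged SOC again. The struct layout and the name mergeUntilClosed are assumptions for illustration; only getMinMatchCallTS is a name from the notes, and its body here is a guess at the intent.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SOC: a contiguous range of time stamps plus, for each ret it
// contains, the time stamp of the matching call.
struct SOC {
    uint32_t tsStart, tsEnd;
    std::vector<uint32_t> matchCallTS;

    // Smallest matching-call time stamp, or tsStart if the SOC has no rets.
    uint32_t getMinMatchCallTS() const {
        uint32_t m = tsStart;
        for (uint32_t t : matchCallTS) m = std::min(m, t);
        return m;
    }
};

// socs must be sorted by tsStart and non-overlapping. Whenever an SOC has a
// ret whose call lies before its tsStart, it is merged into its predecessor
// and the merged SOC is re-checked, until every SOC is closed or no
// predecessor is left.
void mergeUntilClosed(std::vector<SOC>& socs) {
    for (std::size_t i = 0; i < socs.size();) {
        if (i > 0 && socs[i].getMinMatchCallTS() < socs[i].tsStart) {
            SOC& prev = socs[i - 1];
            prev.tsEnd = socs[i].tsEnd;
            prev.matchCallTS.insert(prev.matchCallTS.end(),
                                    socs[i].matchCallTS.begin(),
                                    socs[i].matchCallTS.end());
            socs.erase(socs.begin() + i);
            --i;  // the merged SOC is checked again in the next iteration
        } else {
            ++i;
        }
    }
}
```

The first SOC can still stay open if its call lies before the whole list; this sketch leaves that case to the caller.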

3:11PM
	[5] new problem: full_slice_all_soc called by insertSOC throws the tsCall < tsEntry complaint.
		Problem: full_slice_all_soc is called for each insertSOC.
		adding SOC 147187 -> 147196, but merged with socPrev 147187 -> 147207 , and the 147201 has a call at 145xxx.
	[6] solution: design a function findSOCStart(ts) which returns the ts proper for SOCStart which is smaller than
		ts, get the logic from identifySOC. and then modify identifySOC. [55 min]
8:00PM
		[6.1] define findSOCStart(ts) and take the logic [15 min] DONE
		[6.2] modify identifySOC [10 min] DONE
		[6.3] unit testing [5 min] DONE.
		[6.4] modify insertSOC [10 min]	 --> seems to already solve the problem without modifying SOC.
		[6.5] run and debug [15 min] --> still does not work. Need to think about the SOC algorithm again.

9:00AM 12/20/2013.
	[7] Rethink the logic of sm.addTS [1 hr]
			[7.1] check the logic of Trace::branch_slice -> break when slice size is equal to the last iteration.
				It first init raw data slice, and then add each instruction that is in slice.
				Then it full_slice_all_soc, and checks and verify SOC.
			[7.2] check the logic of socmanager.addTS, identifySOC and adds it.
					Note that addSOC logic seems to be very complex, it has too generalized assumption that
					an SOC may be appended in the middle. ---> check it later
					Also when insertSOC, when an SOC is really added it calls full_slice_all_soc (which is a lot of repeated job)
					The point of full_slice_all_soc is to find out those that cannot be used for bridging.
			[7.3]sm.verify_and_reset_soc merges SOC if bridge fails or tsEntry<tsStart (for calls), return true if everything is ok.
			[7.4] algorithm improve idea:
				[1] branch_slice: each iteration inherits the SOCs generated in the last iteration: the purpose is find the
						right set of SOC. During each iteration:
							(1) clear and reset the initial raw data slice
							(2) verify and reset SOCs, merge SOCs if bridge fails --> all SOCs bridge fine and cover the init raw slice
							(3) full_slice_all_soc, will generate control dependency inside SOC and generate data slices outside of SOC
							(4) for each time stamp in slice, add and find SOCs for each time stamp, call sm.addTS (but do not
									call full_slice)
							(5) if step (4) does not yield any new SOCs, then break
					[1.2] call and write all SOCs.
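
Sketch of the loop in [7.4]: steps (1)-(4) repeat until step (4) adds no new SOC. The four hooks are hypothetical placeholders for the real Trace/socmanager calls, and the iteration cap is an added safety assumption that is not in the notes.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical hooks standing in for steps (1)-(4) listed above.
struct SliceSteps {
    std::function<void()> resetRawDataSlice;        // (1)
    std::function<void()> verifyAndResetSocs;       // (2) merges SOCs whose bridge fails
    std::function<void()> fullSliceAllSoc;          // (3)
    std::function<std::size_t()> addSocsForSlice;   // (4) returns number of new SOCs
};

// Runs the steps until step (4) adds no new SOC or maxIter is reached.
// Returns the number of iterations executed.
std::size_t branchSlice(const SliceSteps& s, std::size_t maxIter) {
    std::size_t iter = 0;
    while (iter < maxIter) {
        ++iter;
        s.resetRawDataSlice();
        s.verifyAndResetSocs();
        s.fullSliceAllSoc();
        if (s.addSocsForSlice() == 0) break;  // (5) fixed point reached
    }
    return iter;
}
```

Since full_slice is no longer called from inside addTS, each iteration does one full pass instead of one pass per inserted SOC.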
10:00AM 
		[8] implementation
			[8.1] change the branch_slice algorithm [40 min] DONE.
				[8.1.5] code inspection of gen_branch_slice [10 min] DONE.
			[8.2] change socmanager.addTS (modify the return value) [5 min] DONE.
				[8.2.1] change identifySOC logic [10 min] DONE.
				[8.2.2] change addSOC logic. [10 min] DONE.
				[8.2.3] change insertSOC logic. [15 min]
			[8.3] check the logic of verify_and_reset_soc
			

3:45PM
--------------------------------------------------------------------------------------
Task 190:  fix the complaint about memory error.
--------------------------------------------------------------------------------------
	[1] break at "something wrong" message and check what is going on [10 min] print the reversePoinerType instead.
			mostly report reversePointer type 7.
	[2] check type 1.
			ts: 235425
	Sliceat: Timestamp: 240980 0x40103d
	printf (235861 @40101d -->   238722 @0x4011fb)
	scanf (238727 @40102e   --> 240976 @40110e)
----------------------
timeStamp: 235425, ins @4019e8: mov	fs:[], ecx
 write: (start: 0x7ffdd000, end: 0x7ffdd003) , DEPLINKS:  , R: 235424 
	It is located before printf, I would not worry about that at this moment

	[3] check the printf function 238722.
	It looks that it is skipped. 236425 is set as seh delay.
	But still the rate is about 61%.
	[4] check accuracy of the stats report. It's ok. The non-imported instruction store size is still big.

7:30PM	
--------------------------------------------------------------------------------------
Task 191:  fix the slicing algorithm error.
--------------------------------------------------------------------------------------
	[1] 7c8017fd's dependence is not resolved yet, there is only one occurrence of this instruction
timeStamp: 127006, ins @7c8017fd: mov	ecx, [ebp+0x8]
, DEPLINKS:  , R: 127001 and EBP value: 0x12ff94, C: 127005 ESP: 0x12ff94 EBP: 0x12ff94
		The problem is that the memory read operation of this instruction is not recorded!

	[2] Confirmed 7c8017fd's memory read is not recorded. set a conditional BP.
		Note: not right: bTracePhyMem is true when the mem addr is captured.

9:00AM 12/21/2013
	[3] regenerate the raw trace and look at if it's the interrupt handling causing the problem.
		It seems that there is a lot of trouble related to recording around context switches. But debug this problem first.
9:45AM
	[4] debug plan:
			[1] set a BP on Trace::handle_mem_read first and then disable it
			[2] set a conditional bp on 7c8017f5 which has the read access recorded.
			[3] set a watchpoint on trace->bPhyTraceMem
			[4] set a conditional bp on 7c8017fd
		Problem is to study why bPhyTraceMem is changed.

		Observation: 7c8017f5 has the memory recorded because there is no record of writeEIP for that particular physical mem
			addr. Its bTracePhyMem is still wrong! Check if 7c8017f5 is between Syscall back.
10:10AM
	[5] check how bTracePhyMem is changed	 [30 min]
			change src code.
			Found the problem: when interrupt switch back, it does not disable bTracePhy
			Fixed. check again
		Now works.
		Regenerate all traces. DONE.
11:30AM
	[6] check if the slice is working. Still not working

12:00PM
--------------------------------------------------------------------------------------
Task 192:  fix the slicing  omission error.
--------------------------------------------------------------------------------------
	[1] problem statement: function 
			call 40149b (ts: 126979) -> ret 406206 (ts: 127075)
	[2] guess: it may be caused by an additional setting in slice operation for the program entry
	[3] implementation:
		[3.1] remove the adding of program entry [8 min] DONE
		[3.2] in socmanager::writeSOCs, if the program entry is not in any SOC, then create a bridge to it. [15 min] DONE
		[3.3] test [15 min]
	[4] the first instruction is still included. check why:
		[4.1] first instruction ts: 126978: (@40149b) 
				timeStamp: 126978, ins @40149b: call    0x00004CD1
		[4.2] check why 126978 is included.
				It's included because of other reasons.
	[5] the first SOC is: tsStart = 126978, tsEnd = 127137
		So the problem is why tsStart 126978 (the call) is included but its ret@406206 is not included.
---------------------------------------
		Need to check binWriter and find why the ret@406206 is not included.

11:45PM 12/22/2013
	[6] fix bug in MOCManager::handleProgramEntry. 	 [10min]
	[7] fix bug in full_slice for soc, the soc begin should not be in slice, it will be reached by bridge anyway. [10 min] 
	DONE. fixed.

--------------------------------------------------------------------------------------
Task 193:  fix another slicing error.
--------------------------------------------------------------------------------------
	[1] problem is in setArgV. comparative study. The problem is when it returns, the stack RET is not right: one is at
		0x12FFC4, and the other at 0x12FF88.
		Found the problem: function 0x0040586B parse_commandline does not reset the stack to its original status! (entering
			and exiting esp value not the same!)
	
9:00AM 12/23/2013
	[2] check parse_commandline again. [15 min] The problem is that the function call at 0x40586b does not change
		the esp, but the previous two push instructions reduce the esp value by 8. The question is: why isn't esp_delay
		handled? The corresponding instruction is 202778.
		timeStamp: 202778, ins @405867: push	esi
 		write: (start: 0x12ff60, end: 0x12ff63) , ESP: 0x12ff64 -> 0x12ff60 , DEPLINKS:  , R: 202777 and ESP value: 0x12ff60, R: 202769 
		The return instruction timeStamp: 204094, ins @4057d2: ret	
 read: (start: 0x12ff5c, end: 0x12ff5f) , ESP: 0x12ff5c -> 0x12ff60 , DEPLINKS:  , R: 204093 and ESP value: 0x12ff60, M: 202780 
	[3] check why 202778 is not handled [10 min]
	[4] check processFunction 204094 (ret), why esp delay link is not handled. [30 min]
			The problem is that none of the instructions in 202780->204094 is marked with EspDelay.
			So there is no need to actually delay the esp.
		Rethink about the esp delay algorithm.
10:20AM 12/23/2013
	[5] start from the last instruction of the failing function and check ESP chains. [30 min]
		[5.1] check the handling of instruction @00405873 (ts: 204096)
				From the log: 204096 added an esp link to ts: 204094
		[5.2] check the processing of function again and check 204094.
				Found the bug at line 1508: when it is EspDelay it should not skip.
		Now fixed, the ESP at 0x0040586b seems fine. But the return in the container function is still not ok.
	[6] compare the ESP of the new problem.
			problem is the leave instruction (mov esp<- ebp; pop ebp)
			The problem is that the EBP value is not the same at the leave instruction.
11:00AM
	[7] check the ebp value problem. [20 min] DONE.
			The problem is that 158732 @4057d6 (mov ebp, esp) is not included to save the esp value in ebp,
		which leads to an error at the leave instruction.
			Involved instructions: 158732 @4057d6
								204093@4057d1: leave
11:40AM
	[8] trace the dependency chain of 204093 and see how it did not get to 158732 [20 min]
			204093->204072
			204072 is not handled at all
		[8.1] It seems that 204093 is only handled in init raw data slice, but not in full_slice, check: [5 min] --> it is actually hit
			It has included 204072 as ebp delay. check log
		check new log for 204072 (40bc76: pop ebp) does not work, too long a chain to follow
		[8.2] try reverse direction: start from 158732 and see who depends on it
			158732->158753 (@4019b0)[ it is actually included in the slice, the problem is why it does not
				set 158732 into slice].
		[8.3] trace 158753 (@4019b0), it is depended by 199541 (@0x4019f6: pop ebp),  which resets ebp.
			However, it does not set an ebp link to 158753. [5 min]
		[8.4] trace into 199541 and check why it does generate mem link to 158753.
			The problem identified: bNoDataPropagation is set to false
			Found the problem: the if-elseif case forgot to check ebp case! fixed.

3:50PM			
--------------------------------------------------------------------------------------
Task 194:  fix another slicing error.
--------------------------------------------------------------------------------------
	[1] observation.
	[2] the problem is the instruction at 0x00405964 depends on 0x00405953 on EDX, but 0x00405953 is not included
	[3] get the stats.
			145675 (jnb ...)
			145674 (@405964: cmp eax, edx)
			145671 (@405953: lea edx, [eax+0x800])
	[4] problem: 145675 did not include 145674 as dependency.
	[5] trace into 145675
			fixed bug
	[6] check again.
------------------- WORKING! and printf skipped ---------------------------------------------------
Trace Size: 267373, in slice: 99947, Percentage: 37.38%
Instruction Store Size: 48397, in slice: 9494, Percentage: 19.616918%
Instruction Store Size (excluding imported DLL): 3511, in slice: 2430, Percentage: 69.211051%
----------------------------------------------------------------------------------------------------


11:00AM 12/25/2013
--------------------------------------------------------------------------------------
Task 195:  check if there is anything that can be improved
--------------------------------------------------------------------------------------
	[1] check the call at 0x00401418 and see why it is included.
		ts: 232089, its corresponding ret is 235847
	It complains about: 235845: xor EAX, EAX. should have only one register being read! Stopped delay dependency!
	[2] conditional BP on 235845. [15 min]
			the problem is that for xor eax, eax it does not identify eax at all!
		Fixed. now new Something wrong error!!!!
-----------------
9:00AM 12/26/2013
	[3] it complains about SEH preservation.
		[3.1] check how SEH is checked.  [15 min] There is a way to use scr or ccr to check
		[3.2] re-run and check the return at 0x404fe6  [15 min]
			Still got the same problem
			"Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 1, ts: 235426, eip: 4019e8!"
			It's 238713 depends on 235426 (another mov fs:[], ecx)
9:40AM
		[3.3] conditional bp on hasDataDependency and see how 235426 is discharged. [15 min]
			There are two problems:
			[3.3.1] setNeedMem is not set by 238713 DONE.
			[3.3.2] in processFunction, no need to check isWriteSEH, just use memory check.
			DONE.
		[3.4] check if it's ok now. Rebuild and regenerate [15 min] Has problem in generated slice. [15 min]
		[3.5] check  problem with 7c910337 reverse_pointer type 10. [REGI_LINK delay] [15 min]
			Fixed. Crash on 004058aa (ts:147225), it depends on 0040589e (147222, 147221) but it's not listed.
		[3.6] check why 147225 ignores links on 147222 and 147221
			Problem is with the handling of processFunction 147220
				-- REMOVED reg dependency at 147219 for reg 1 (0 vs 0)
				the value is definitely not 0! It's a string located at 0x00010000!!!
			check the register recording algorithm. This is not correct!!!
			Instruction info is listed below!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
timeStamp: 147219, ins @7c812c84: mov	eax, [eax+0x48]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
		
	
11:00AM	
--------------------------------------------------------------------------------------
Task 196:  check the register recording algorithm
--------------------------------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
timeStamp: 147219, ins @7c812c84: mov	eax, [eax+0x48]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
	[1] check if @7c812c84 is in the request list. The problem is that it does not trigger isNeedRecord.
	[2] check the log and see why @7c812c84 is not in the list, instead tsRet and tsEntry is recorded.
	  ret: 0x7c812c87 (next @40589e)
	  entry: 0x7c812c78 (@405898 call), next @7c812c7e
	[3] check if these values are recorded.
		It is recording the value @0x40589e (by @405898 triggered), recording one register 1. Recorded value is 0x142378.
			Recorded ts is 147215, for eip 0x405898, 
	[4] fixed the reg problem. DONE. now working.
	Slice size:
		Trace Size: 267373, in slice: 111624, Percentage: 41.75%
		Instruction Store Size: 47560, in slice: 9470, Percentage: 19.911690%
		Instruction Store Size (excluding imported DLL): 3511, in slice: 2404, Percentage: 68.470521%

11:30AM 12/28/2013
--------------------------------------------------------------------------------------
Task 197:  see if there are any improvements
--------------------------------------------------------------------------------------
	[1] check the last call. 0x00401418->0x0040141d and see why they are included
	[2] timestamp: ret is 235847
	[3] check log and records are shown below
--processFunction ts 235847
-- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
-- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
-- REMOVED reg dependency at 235845 for reg 81 (246 vs 246)
-- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
-- has reg dependency at 235843 for reg 5 (12ff88 vs 12ff84)
 -- set ts 235847  @0x404fe6 in slice
	 	 Function included! add into slice RET 235847 @0x404fe6
timeStamp: 235843, ins @404fc5: pop	esi
 read: (start: 0x12ff7c, end: 0x12ff7f) , ESP: 0x12ff7c -> 0x12ff80 , DEPLINKS:  , R: 235842 and ESP value: 0x12ff80, M: 235144 
	[4] the problem is that it tries to compare register dependency on ESP.

	[5] 235843 is needed for esp by 235849 at 0x40141e: CMP EAX, ESI.
	[6] fix: change the collection of register timestamp. and see the result (tsRet+1)
--processFunction ts 235847
-- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
-- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
-- REMOVED reg dependency at 235845 for reg 81 (246 vs 246)
-- REMOVED reg dependency at 235845 for reg 1 (0 vs 0)
-- REMOVED reg dependency at 235843 for reg 5 (12ff88 vs 12ff88)
-- REMOVED reg dependency at 235843 for reg 7 (0 vs 0)
-- REMOVED reg dependency at 235843 for reg 7 (0 vs 0)
-- REMOVED reg dependency at 235843 for reg 5 (12ff88 vs 12ff88)
-- delay dependency on reg: 13 and ts: 235847 to 232088
-- delay dependency on reg: 17 and ts: 235845 to 232080
 -- set ts 232080  @0x4055ff in slice
-- delay dependency on reg: 21 and ts: 235845 to 232080
 -- set ts 232080  @0x4055ff in slice
-- delay dependency on reg: 81 and ts: 235845 to 232086
 -- set ts 232086  @0x40140a in slice
-- delay dependency on reg: 93 and ts: 235845 to 232080
 -- set ts 232080  @0x4055ff in slice
-- delay dependency on reg: 13 and ts: 235843 to 232088
-- delay dependency on reg: 15 and ts: 235843 to 232084
 -- set ts 232084  @0x405604 in slice
-- delay dependency on reg: 97 and ts: 235843 to 232084
 -- set ts 232084  @0x405604 in slice
-- delay dependency on reg: 100 and ts: 235843 to 232088
 -- set ts 232088  @0x401416 in slice
-- set SEH delay 235427 -> 232064
 -- set ts 232064  @0x7c90ee05 in slice
	 	 Function skipped! add visit link 232088 @0x401416
--- new stats:
Trace Size: 267373, in slice: 57085, Percentage: 21.35%
Instruction Store Size: 48165, in slice: 8778, Percentage: 18.224852%
Instruction Store Size (excluding imported DLL): 3511, in slice: 1928, Percentage: 54.913130%
+++ Task completed: Task generate branch slice for: /home/samba/smbuser/slice_jobs/job1

Worked! now the last function called is setargv! check it later.
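
Sketch of the check behind the "REMOVED reg dependency ... (a vs b)" lines above: a register written inside the function stops being a dependency if its recorded value after the ret equals the value before the call. The names RegFile and liveRegDependencies are hypothetical, and treating a missing recording as "still dependent" is a conservative assumption, not taken from the source.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using RegFile = std::map<int, uint32_t>;  // register id -> recorded value

// Returns the ids of registers whose value after the ret differs from the
// value before the call. A register missing from either snapshot is kept as
// a dependency because the comparison cannot be made.
std::vector<int> liveRegDependencies(const std::vector<int>& writtenRegs,
                                     const RegFile& beforeCall,
                                     const RegFile& afterRet) {
    std::vector<int> live;
    for (int r : writtenRegs) {
        auto b = beforeCall.find(r);
        auto a = afterRet.find(r);
        bool same = b != beforeCall.end() && a != afterRet.end() &&
                    b->second == a->second;
        if (!same) live.push_back(r);  // dependency stays, function is included
    }
    return live;
}
```

If the returned list is empty and there is no memory dependency, the function can be skipped and replaced by a visit link. The fix in [6] changes where the "after" snapshot is taken (tsRet+1), which is what turned reg 5 from "12ff88 vs 12ff84" into "12ff88 vs 12ff88".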

9:00AM 01/03/2014
--------------------------------------------------------------------------------------
Task 198:  check setargv is really needed
--------------------------------------------------------------------------------------
9:00AM
	[1] check the timestamps of setargv
		eip is 0x004013f4. ts: 158729
		after ret is 0x004013f9, ts: 204107
	problem on processFunction 204106: ERROR in reading register 5 for 158729 or 204106.
	[2] regenerate all data slices.
		[2.1] record request mode.
	[3] --processFunction ts 204104 first couple of messages.
ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104  //check later
ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104  //check later
ERROR in reading register 1n-- ERROR reading reg value at 158728 or 204104  //check later
-- has mem dependency at 204097 on first write on 0x421308 // depended by 235852: eip:0x401434, it's preparing argv.
-- has mem dependency at 204096 on first write on 0x421304 // this is argc

The problem is that argc and argv are never needed by any other instruction during execution! Check

	[4] check dependency on 235852, it is depended by 235853 (push argc), and then the call instruction (due to esp)

10:10AM
	[5] read log and check how 235852 is added in slice. and then how 204097 is added.
		(1) 204097 is added using memlink from 235852 [not ok, as 235852 is not dependent on mem]
	[6] conditional BP on 235852.
		Found the problem. bNoDataPropagation is not accurate enough. 235852 is needed for esp (register), however,
		its esp is not dependent on its memory. Need more accurate analysis!!!
	[7] algorithm design to refine analysis: 
		[7.1] read the current design. current design is not accurate enough.
	[8] algorithm design:
		Add InstrProcessor::NeedPropagateMemLink; and InstrProcessor::NeedPropagateRegLink.
			Based on bNoDataPropagation, add more level of control.
			For example, if a push instruction is not needed for mem, should return false.
7:20PM
	[9] design I: [25 min]
		In InstrInfo class provides the following:
			getOutputDependenceMatrix(int mtr[4][4])
				index: reg, mem, esp, ebp.
				values: 0, no; 1: yes; -1: unknown
				For example, push EAX looks like the following

				(out) \ (input)   reg       mem   esp   ebp
				reg               0         0     0     0
				mem               1 (eax)   0     0     0
				esp               0         0     1     0
				ebp               0         0     0     0

		When checking whether a link needs to be propagated, check whether the instruction is needed for reg, for mem,
		for esp, or for ebp. Then, based on which data outputs are needed, keep only the corresponding rows (e.g., if the
		PUSH EAX instruction is only needed for memory, then keep only the mem row).
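
Sketch of design I: a 4x4 output/input dependence matrix for push eax and the row-based check described above. DepMatrix, pushEaxMatrix and needsInput are hypothetical names. Treating an unknown entry (-1) as a dependency is a conservative assumption.

```cpp
#include <array>

enum Slot { REG = 0, MEM = 1, ESP = 2, EBP = 3 };

// dep[out][in]: 0 = no, 1 = yes, -1 = unknown
using DepMatrix = std::array<std::array<int, 4>, 4>;

// Matrix for "push eax", following the example above.
DepMatrix pushEaxMatrix() {
    DepMatrix m{};     // value-initialized: all entries are 0
    m[MEM][REG] = 1;   // the stored value comes from eax
    m[ESP][ESP] = 1;   // the new esp comes from the old esp
    return m;
}

// True if input `in` may feed any output that is actually needed.
// Only the rows of needed outputs are inspected.
bool needsInput(const DepMatrix& m, const std::array<bool, 4>& neededOut, Slot in) {
    for (int out = 0; out < 4; ++out)
        if (neededOut[out] && m[out][in] != 0) return true;
    return false;
}
```

With only the esp output needed, needsInput reports no dependency on eax, so no register link has to be propagated for the push.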

	[10] design II: [20 min]
		Treat the memDependeLink as a special case, add a function isMemInputReallyNeeded() to InstrProcessor. always return true
			for unknown cases; for push case, if it's only needed for Esp; return false;
		When bProgatate data is set (isNeededForReg, or isNeededForMem), and an instruction has mem dependelink, the only case
		we could think of is:
				push [0x401010] which is needed for reg.

	[11] make decision: use approach 2 only. Instructions with two outputs are rare.
	[12] implementation: add a function isMemInputReallyNeeded(). [15 min]
		solved.
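
Sketch of design II: isMemInputReallyNeeded() returns true for every unknown case and false only for a push that is needed solely for its ESP update. The InstrUse struct and its fields are hypothetical; the real InstrProcessor interface is not shown in these notes.

```cpp
// Hypothetical summary of how one instruction is used by the slice.
struct InstrUse {
    bool isPush;       // instruction is a push
    bool knownInstr;   // false: semantics are not modelled
    bool neededForReg;
    bool neededForMem;
    bool neededForEsp;
    bool neededForEbp;
};

// Conservative check: only a push that is needed solely for its ESP update
// can drop its memory input; everything else keeps it.
bool isMemInputReallyNeeded(const InstrUse& u) {
    if (!u.knownInstr) return true;
    bool onlyEsp = u.neededForEsp && !u.neededForReg &&
                   !u.neededForMem && !u.neededForEbp;
    if (u.isPush && onlyEsp) return false;
    return true;
}
```

This covers the 235852 case from [6], where the instruction is needed for esp only and its memory input should not pull 204097 into the slice.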

	[13] check the updated dependency list of the function, see below:
--processFunction ts 204104
ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104 
ERROR in reading register 5n-- ERROR reading reg value at 158728 or 204104 
ERROR in reading register 1n-- ERROR reading reg value at 158728 or 204104 
-- has mem dependency at 199498 on first write on 0x4209a8
-- has mem dependency at 195804 on 0x321ef8, first 4 bytes: 420580 and 322c98, size: 4
-- has mem dependency at 168552 on 0x12fb4e, first 4 bytes: 420000 and 320210, size: 2
==> it seems that 199498 mov [0x4209a8] is part of initmbctable. It's writing to ptmbcinfo (a global variable).
	It's depended on by 201174 (@401c04), a part of setlocalinfo <- vscanf.

			

9:30AM 01/04/2014
--------------------------------------------------------------------------------------
Task 199:  find another thing to improve
--------------------------------------------------------------------------------------
	[1] verify if the last improvement is working. verified, working.
	[2] Think about the case of 199498, and see if it can be improved. Information below.
		199498 is saving information to ptmbcinfo.
		timeStamp: 199498, ins @4071f0: mov	[0x4209A8], ebx
		 write: (start: 0x4209a8, end: 0x4209ab) , DEPLINKS:  , R: 195779 , C: 199497 ESP: 0x12ff28 EBP: 0x12ff5c
		It's depended on by 201174 (@401c04), a part of setlocalinfo <- vscanf.
		201174 is depended by 201175, which is a branch
		Then it is depended by 201176
		[2.2] check 201174, it is a part of function ___
				its return is 0x401c43, ts: 201185.
		--processFunction ts 201185
		 -- set ts 201185  @0x401c43 in slice
				 Function included! add into slice RET 201185 @0x401c43
		check why function 201185 is included
	[3] conditional debug of processFunction 201185.	
		The function is included because it changes esp/ebp
	[4] explore: if the esp/ebp change is ignored, will the function still have a dependee?
		failed, because needs to collect the recording information.
		Current stats:
				Trace Size: 267371, in slice: 57077, Percentage: 21.35%
				Instruction Store Size: 47554, in slice: 8772, Percentage: 18.446398%
				Instruction Store Size (excluding imported DLL): 3511, in slice: 1923, Percentage: 54.770721%
	[5] slicing improvement:
			after the check of bNoChangeOnEspEbp. check on if on the same callEIP the difference of ESP and EBP is the same
			and check if the minor adjustment can fit into the slot. call binWriter to generate the binary instruction.
			Once it passes, then this information should be passed to binwriter when writing the slice.
10:30AM
	[6] Detailed Alg Design:
		[1] declare class CallAdjustFailureRecord(eip). supports methods addEIP, findEIP. Itself keeps a 
			simple hash function.  Need serialization.
		[2] unit testing CallAdjustFailureRecord.
		[3] declare class CallAdjustRecord (eip, esp_change, ebp_change), use CachedMap.
		[4] in Trace destructor, call the eipToCallAdjustRecord clear and see if the destructor of CallAdjustRecord is called.
		[5] add bool Trace::AdjustCall(ts,tsRet), it should first check if eip belongs to the CallAdjustFailureRecord;
				and then check if it is manageable to replace the instruction with an adjustment of esp.
		[6] update the binWriter::writePartrialTrace. Call Trace::eipToCallAdjustRecord and see if there is 
				any record to replace.
11:00AM
	[7] Implementation
		[1] CallAdjustFailureRecordProcessor [1 hr]
			[1] vector<unsigned int> vecEIP, map<eip, 1>, Cache [8 min] DONE.
			[2] function addEIP()  [8 min] DONE
			[3] function serializeTo(char *) [15 min] DONE.
			[4] function deserlizeFrom(char *) [10 min]  DONE.
			[5] hasEIP [5 min] DONE.
			[6] loadFromCache and saveToCache [15 min] DONE.
			[7] unit testing [20 min]
				[1] something wrong with rrv loadFromCache. OK.
				[2] needs to adjust the serialization save integer by integer.
3:00PM			[3] debug through. found the problem.
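	Note: a minimal sketch of the failure record from [6][1] and [7][1], assuming 32-bit EIPs and a plain byte
	buffer in place of the real Cache. addEIP and hasEIP come from the plan above; serialize/deserialize and the
	buffer layout (count, then each EIP, written integer by integer in little-endian order) are assumptions for
	illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: remembers the EIPs of calls whose esp/ebp adjustment failed.
class CallAdjustFailureRecord {
public:
    void addEIP(uint32_t eip) {
        if (seen_.insert(eip).second) {
            order_.push_back(eip);  // keep insertion order for serialization
        }
    }

    bool hasEIP(uint32_t eip) const { return seen_.count(eip) != 0; }

    std::size_t size() const { return order_.size(); }

    // Writes the count, then each EIP, as 4 little-endian bytes each.
    std::vector<uint8_t> serialize() const {
        std::vector<uint8_t> out;
        putU32(out, static_cast<uint32_t>(order_.size()));
        for (uint32_t eip : order_) putU32(out, eip);
        return out;
    }

    // Rebuilds the record from a buffer; returns false if it is truncated.
    bool deserialize(const std::vector<uint8_t>& in) {
        order_.clear();
        seen_.clear();
        if (in.size() < 4) return false;
        const std::size_t n = getU32(in, 0);
        if (in.size() < 4 + n * 4) return false;
        for (std::size_t i = 0; i < n; ++i) addEIP(getU32(in, 4 + i * 4));
        return true;
    }

private:
    static void putU32(std::vector<uint8_t>& out, uint32_t v) {
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
    }

    static uint32_t getU32(const std::vector<uint8_t>& in, std::size_t pos) {
        uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
        return v;
    }

    std::vector<uint32_t> order_;
    std::unordered_set<uint32_t> seen_;
};
```

	Saving integer by integer keeps the format independent of struct padding, which is the adjustment noted in [5][2].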

3:50PM
		[2] define class CallAdjustRecord [DONE]
			[1] define data members: [5 min] DONE.
			[2] public operations: [30 min] DONE.
				(1) define class CallAdjustRecord(eip, espChange, ebpChange) - just data class [10 min]
				(2) function int size asReplacement(char *buf), use binWriter functions. [20 min]
			[3] unit test [25 min]
4:45PM 
		
9:00AM 1/5/2014	
		[3] define class CallAdjustRecordProcessor 	 [estimated 2 hrs]
			[1] data members [15 min] DONE.
				(1) string basePath <- from Trace directory	
				(2) CallAdjustFailureRecordProcessor 
				(3) vector of CAR*
				(4) map of eip to CAR*
			[2] public operations: [1 hr]
				(1) constructor: based on Job::REQUEST_MODE, decide whether to load the cache for CallAdjustFailureRecordProcessor
						or create it anew. First get the current job from BatchAnalyzer, and then job->job_path. [15 min] DONE
				(2) destructor: remove the list of CAR*. [5 min] DONE
				(3) protected: getCAR(eip) -> pointer to CallAdjustRecord [5 min] DONE
				(4) protected: addCAR(eip, espChange, ebpChange) [8 min] DONE
				(5) public: tryAddCar(eip, espChange, ebpChange) -> returns false on failure. It first checks CallAdjustFailureRecordProcessor to see if this eip is a failure record; then it gets the current CAR and checks whether espChange and ebpChange are the same; if they are, it attempts to serialize it. On any failure, mark the eip in CallAdjustFailureRecordProcessor. [25 min]
6:50PM
			[3] include CallAdjustRecordProcessor in Trace [40 min]
				(1) declare CallAdjustRecordProcessor in Trace [8 min] DONE.
				(2) change the logic on hasDataDependee based on isFunctionNoChangeESPEBP. [15 min]
				(3) change the logic of binWriter [15 min]
			[4] debug into it. [30 min]
				[1] request mode [15 min]
				[2] work mode [20 min]
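	Note: a rough sketch of the tryAddCar flow described in [2](5). The failure set stands in for
	CallAdjustFailureRecordProcessor, and fitsInSlot is a hypothetical stand-in for the "attempt to serialize" step:
	it assumes the slot is the 5 bytes of a "call rel32" and that each non-zero change costs an
	"add reg, imm8" (3 bytes) or "add reg, imm32" (6 bytes). The real check in binWriter may differ.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

struct CallAdjustRecord {
    uint32_t eip;
    int32_t espChange;
    int32_t ebpChange;
};

class CallAdjustRecordProcessor {
public:
    // Returns false when the call at eip cannot be replaced by a fixed esp/ebp adjustment.
    bool tryAddCar(uint32_t eip, int32_t espChange, int32_t ebpChange) {
        if (failures_.count(eip)) return false;  // already known to fail
        auto it = records_.find(eip);
        if (it == records_.end()) {
            if (!fitsInSlot(espChange, ebpChange)) return markFailure(eip);
            records_.emplace(eip, CallAdjustRecord{eip, espChange, ebpChange});
            return true;
        }
        // Every execution of the same call site must show the same changes.
        if (it->second.espChange != espChange || it->second.ebpChange != ebpChange) {
            return markFailure(eip);
        }
        return true;
    }

    bool isFailure(uint32_t eip) const { return failures_.count(eip) != 0; }

    const CallAdjustRecord* getCAR(uint32_t eip) const {
        auto it = records_.find(eip);
        return it == records_.end() ? nullptr : &it->second;
    }

private:
    // Assumed cost model: 0 bytes for no change, 3 for imm8, 6 for imm32.
    static int adjustBytes(int32_t change) {
        if (change == 0) return 0;
        return (change >= -128 && change <= 127) ? 3 : 6;
    }

    static bool fitsInSlot(int32_t espChange, int32_t ebpChange) {
        const int kCallRel32Size = 5;  // E8 + 4-byte displacement
        return adjustBytes(espChange) + adjustBytes(ebpChange) <= kCallRel32Size;
    }

    // Drops any record for eip, remembers the failure, and returns false.
    bool markFailure(uint32_t eip) {
        records_.erase(eip);
        failures_.insert(eip);
        return false;
    }

    std::unordered_map<uint32_t, CallAdjustRecord> records_;
    std::unordered_set<uint32_t> failures_;
};
```

	Once an eip is marked as a failure it stays one, so later passes can skip the check early.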

9:00AM 1/6/2014
	[1] test the CallAdjustRecordProcessor
		[1] 1. the request mode [20 min] ok.
			bp on the constructor and destructor and then start the raw mode.
				trace_record ok, full_trace. ok.
		[2] 2. test 2nd chance code in Trace.cc:1508 [15 min]  DONE.
		[3] fix the addCAR problem. [10 min] fixed.
		[3] 3. check the binWriter case. [20 min]
			Observation: in raw mode none was collected
10:40AM
		[4]  use the capture mode. 
			Found the problem: the EIPs pushed into the carp are not those of the "call" instructions.
		[5] check Trace.cc:1515 again. Fixed the bug
			first replace: eip 0x401355 (call heapSetInformation)
			second replace: eip 0x405058 
		Fixed. New stats are:
	Trace Size: 267373, in slice: 56912, Percentage: 21.29%
Instruction Store Size: 47547, in slice: 8717, Percentage: 18.333438%
Instruction Store Size (excluding imported DLL): 3511, in slice: 1916, Percentage: 54.571347%

  Trace slice: 21.35% -> 21.29%.
		[5] check the setarg call again. ret: timeStamp: 204106, ins @40588d: ret

--processFunction ts 204106
ERROR in reading register 5n-- ERROR reading reg value at 158730 or 204106 
ERROR in reading register 5n-- ERROR reading reg value at 158730 or 204106 
ERROR in reading register 1n-- ERROR reading reg value at 158730 or 204106 
-- has mem dependency at 199500 on first write on 0x4209a8
-- has mem dependency at 195806 on 0x321ef8, first 4 bytes: 420580 and 322c98, size: 4
-- has mem dependency at 168554 on 0x12fb4e, first 4 bytes: 420000 and 320210, size: 2
-- has mem dependency at 168531 on 0x12fb4c, first 4 bytes: 420000 and 320210, size: 2
-- has mem dependency at 168232 on 0x12fb32, first 4 bytes: 427c80 and 320210, size: 2
	check setlocal function is skipped (for 199500)

********************************************************************************
for 0x401e2b (call setLocaleUpdate)
--processFunction ts 238992
-- has reg dependency at 238990 for reg 5 (12fcf8 vs 12fcfc) -- dependency on [esi] ---> it seems that the reg 5 is not right!
	Check later!!!
-- has mem dependency at 238987 on 0x12fd18, first 4 bytes: 8148 and 8101, size: 1
 -- set ts 238992  @0x401c43 in slice
	 	 Function included! add into slice RET 238992 @0x401c43


01/07/2014
8:30AM
--------------------------------------------------------------------------------------
Task 200:  add the branch exit
--------------------------------------------------------------------------------------
	Idea: find an empty hole big enough and insert the branch to call the TerminateProcess function.

	[1] modify Trace::branch_slice and insert the call on binWriter
		[1] in gen_branch_slice call binWriter writeProgramExit [8 min]
		[2] in binWriter header define writeProgramExit  [8 min]
		[3] implement it [30 min]
9:50AM
	[2] genExitCode
		[2.1] skeleton [15 min]
		[2.2] genCallTerminate [20 min] 
		[2.3] implement writeLittleEndian [10 min]
10:30
		[2.4] finish genCallTerminate logic [20 min]
11:30
		[2.5] finish the jmp logic. [DONE]

--
6:50AM  01/08/2014
	[2.6] finish the logic of changeBranch (10 min) [DONE]
	[2.7] handle the logic of failure of inserting branching exit. remove it from folder.   (15 min) [DONE]
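	Note: a small sketch of what [2.3] and [2.5] amount to, assuming a 32-bit x86 target. writeLittleEndian is the
	name used in the plan; encodeJmpRel32 is a hypothetical helper name, not the actual binWriter function.
	The displacement of "jmp rel32" (opcode E9) is relative to the address of the next instruction, i.e. from + 5.

```cpp
#include <cstddef>
#include <cstdint>

// Writes the low 'size' bytes of value into buf, least significant byte first (size <= 4).
inline void writeLittleEndian(uint8_t* buf, uint32_t value, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        buf[i] = static_cast<uint8_t>(value >> (8 * i));
    }
}

// Encodes "jmp rel32" located at address 'from' and jumping to 'to'.
// Returns the number of bytes written (always 5).
inline std::size_t encodeJmpRel32(uint8_t* buf, uint32_t from, uint32_t to) {
    const uint32_t kSize = 5;                 // E9 + 4-byte displacement
    buf[0] = 0xE9;
    writeLittleEndian(buf + 1, to - (from + kSize), 4);  // wraps correctly for backward jumps
    return kSize;
}
```

	Forgetting to add the instruction length to 'from' gives a target that is off by exactly that length.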

7:30AM
	[3] Debugging
		[1] WriteProgramExit [ok]
		[2] findHole. problem minDist. FIXED.
		[3] need to flush after write the first step.
		[4] asJMP. 
		[5] genExitCode.
		[6] genTerminateProcess
		There are problems with genTerminateProcess. 
10:00AM
		[7] fix genTerminateProcess. Don't use strcpy. Use memcpy. [10 min]
		[8] the terminate code is not written into the right place. fix it. [15 min]
10:55AM
		[9] target address of TerminateProcess is wrong (it is 1e1c, the correct one should be 1e16) - 6 bytes away.
		[10] Now fix the visiting logic in findHole.

11:00AM 1/9/2014
			[10.1] define bool checkIsHole(Trace* trace,unsigned int eipStart, int size, int fidTarget, int fidSource) [25 min]
			[10.2] add a second fid of source, read the instruction opcode and then check the trace->hasInstruction,
					and call checkIsHole() [10 min] DONE.
			[10.3] modify genProgramExit and add parameter source filename [5 min] DONE.
			[10.4] modify gen_slice_for_branch pass parameter source file name [5 min] DONE.
			[10.5] modify the caller.  [8 min] DONE.

		Success.
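	Note on item [7] above: machine code routinely contains 0x00 bytes (for example the immediate in "push 0"),
	and strcpy stops at the first one. A minimal sketch of the difference follows; copyCode and
	bytesBeforeFirstZero are hypothetical helper names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies exactly 'len' bytes of machine code; zero bytes are copied like any other byte.
inline void copyCode(uint8_t* dst, const uint8_t* src, std::size_t len) {
    std::memcpy(dst, src, len);
}

// Number of payload bytes a strcpy-style copy would transfer: it stops at the first 0x00.
inline std::size_t bytesBeforeFirstZero(const uint8_t* src, std::size_t len) {
    const void* zero = std::memchr(src, 0, len);
    if (zero == nullptr) return len;
    return static_cast<std::size_t>(static_cast<const uint8_t*>(zero) - src);
}
```

	For a buffer starting with 6A 00 (push 0), a string copy would transfer only the first byte.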


9:00AM
--------------------------------------------------------------------------------------
Task 201:  generate all branches and build the program to collect running results. 
--------------------------------------------------------------------------------------
	[1] build all branches. [30 min]
		report error on case skip
	[2] regenerate raw and full slice
		mode 0 -- strangely the network does not work
	[3] modify the collection program. DONE.

8:00AM 1/11/2014
	[4] check samba configuration
		try command "net use y: \\169.254.236.150\smbuser"
	Observation: try "dir" -> it's very slow, then try "ipconfig /all" it's extremely slow. Not sure what's going on.
		"notepad" is also working however slow.
		"ipconfig" - never reports anything just hangs. -> after 30 minutes no response
9:00AM
--------------------------------------------------------------------------------------
Task 202:  check qemu image networking problem again.
--------------------------------------------------------------------------------------
	[1]  recompile and reset. does not work
10:45AM
	[2] read previous logs about network setup. Check how to diagnose.
		using tcpdump, it seems that the xp vm is sending request to tap0 device (captured by tcpdump), however,
		there is never a response.
	[3] read about TAP device: tap is for link layer 2, tun is for routing (layer 3). A user program connects to
		tap/tun device to receive packets.
	[4] read about bridge: a bridge of two adaptors is to simply merge these two guys.
	--- strangely, the host regards it as 10.0.2.16.
******************************************************************************************
******************************************************************************************
-------------------------------------------------------------
-- solved the problem. need to set static IP 10.0.2.15 for br0 (manually, see the new startup.sh in qemu_image)
-------------------------------------------------------------
******************************************************************************************
******************************************************************************************

1/12/2014
10:40AM - 
--------------------------------------------------------------------------------------
Task 202:  check the case skip problem
--------------------------------------------------------------------------------------
	[1] identify the ts where things broke.
		132786
	[2] regenerate full trace. 
		Problem:
		tsEntry:126977, tsCur: 127073, tsEnd:130412, tsStart: 126990 (the start of SOC)
	First check if 126977 and 127073 are a matching pair. Confirmed.
	The call is about init_security_cookie.

	[3] find out how SOC 126990 to 130412 is established.
		It is the result of verify_and_reset_SOCs. The problem is that the ts introduced in the merged slice introduced a call.
		Another problem: the verify function already reports false, but it still continues to slice.
	[4] another case skip problem:
			this is because the last SOC (instead of SOCPrev) is not verified yet
		Fixed: new problem: cannot handle instruction size 6.

9:00AM 01/13/2014
--------------------------------------------------------------------------------------
Task 203:  fix slicing algorithm bugs
--------------------------------------------------------------------------------------
	[1] identify ts: for cannot handle instruction size 6.
			error thrown by binWriter changeBranch.
			instruction bytes: 0x0f 0x84 0xc1 0x00 0x00 0x00 0x00
			handle size 6.
10:00AM
	[2] check merging logic infinite loop problem;
			problem: timestamp 203769 is added over and over again.
			Fix the logic of sm.addSOC --> when it's already in slice, no need to return true.
			Found the bug: soc is not inserted when its index is 0.
11:00AM
	[3] run the 367 branches. 
		found new error: 146858. [138th]
		set conditional bp
		Problem with soc(146103->146247) bridge to 146249, soc id 4.
		It seems that it needs one more iteration of fullslice.
		Verified it's the problem of verify_and_reset
		Problem with verify_bridge
7:30PM
	[4] new problem: ts=242259
			tsCur 238465
		check how trace->tsToMrM is used.
	fixed.
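	Note on item [1] above (instruction size 6): 0F 84 followed by a 4-byte displacement is the near form of JE,
	so the instruction is the first six of the bytes listed. A minimal decoder sketch for the two Jcc encodings
	follows; decodeJcc and CondBranch are hypothetical names, not the changeBranch code itself.

```cpp
#include <cstddef>
#include <cstdint>

struct CondBranch {
    std::size_t size;  // 2 for the short form, 6 for the near form
    uint8_t cond;      // condition code nibble (4 = E/Z)
    uint32_t target;   // absolute branch target
};

// Decodes a conditional jump at address eip. Handles 7x rel8 and 0F 8x rel32.
// Returns false for anything else or when fewer bytes than needed are available.
inline bool decodeJcc(const uint8_t* code, std::size_t avail, uint32_t eip, CondBranch* out) {
    if (avail >= 2 && (code[0] & 0xF0) == 0x70) {
        const int32_t rel8 = static_cast<int8_t>(code[1]);  // sign-extend the 8-bit displacement
        out->size = 2;
        out->cond = code[0] & 0x0F;
        out->target = eip + 2 + static_cast<uint32_t>(rel8);
        return true;
    }
    if (avail >= 6 && code[0] == 0x0F && (code[1] & 0xF0) == 0x80) {
        uint32_t rel32 = 0;
        for (int i = 0; i < 4; ++i) {
            rel32 |= static_cast<uint32_t>(code[2 + i]) << (8 * i);  // little-endian displacement
        }
        out->size = 6;
        out->cond = code[1] & 0x0F;
        out->target = eip + 6 + rel32;  // relative to the next instruction
        return true;
    }
    return false;
}
```

	Rewriting a branch in place has to respect the original size, since a 6-byte near jump cannot be squeezed into a 2-byte slot.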

11:00AM 01/14/2014
	[1]  now complete run of all 327 branches
	[2] Still got a lot of c005 errors. 
	[3] these programs crash the debugger itself. check the bridge for program entry.

8:30AM 01/14/2014
	[5]  bug found: bridge right after ts does not work, has to be in the same thread.  Add a Util::error_exit on
		location.
		[5.1] regenerate the raw and full trace. DONE. [15 min]
		[5.2] run and capture error. [5 min] discovered the error. It is id 1 branch (2nd branch).
				generate file location error for 126979. 
				This is caused by a bug that discovers context switch one instruction later, so there is
			one instruction in interrupt handler cut in.
	9:00AM
	[6] bug fix: context switch handling
		[6.1] read the code about Context Switch [20 min]
				[6.1.1] gen raw trace and check it again.
		[6.2] recompile and see if the problem persists. [10 min]
			Now works for 4 examples

	[7] check brc_3, why the exit code is something different.
		[7.1] first branch is still not right. debug into it [15 min]
		[7.2] found the problem is the program entry handling, not handled. [10 min]
				The problem is that the last instruction is a CALL, and it is marked off. So the
				control directly jumps to the next one.
				Need to uncomment out the unmark statement
		[7.3] fix and test [10 min]
			[7.3.1] another bug at branch 2. The problem is that the program entry is not added in slice.

	10:30AM
	[8] another bug: terminate branch itself.
		It seems that the JE branch is not right in brc_2. The jmp is not right.
		Fixed the bug in changeBranch. SOLVED

	10:40AM
	[9] another bug: brc_0 is not right. Still program entry.
		Fixed.

	11:00AM
	[10] let it run for a while. broke at slice 7. check it later.
	[11] add configure DUMP_ENABLED handle. DONE.

	11:50AM
	[11] now the scanf example does not work. First it broke at a function; then, the printf is included.
		[1] recompile and regenerate the trace in both modes.
			mode 1: mode 0;
		[2] found that printf is skipped successfully (saving about 20% of instr store instructions), but
			it still broke at _minit.
		[3] trace into the problem and see why it broke. (ts=240744)
			Broke at 0x40606a. -> 0x406112
			The problem is at 0x405d42 the esp value is different, causing ret value different.
			Problem found: 0x00405d34 (pop) is not included.
		[4] new problem found: 0x405d52 (pop) is not included which causes the problem-
		VERIFY LATER.
			Study the two calls at: 405d5a (ret: 144636) is skipped, 405d5f (ret: 144648) is called, 405d64 (ret, ts: 144649) failed.
		Function 144637(ret ts) is skipped, however, the delay dependency on reg (14) [bp] is relayed to 144612, 
		but there is no delay of esp.
		Conditional bp on 144637 process function.
			Problem: only EBP delay is recorded. 
		Check these two instructions!
		Problem: 144637 is inslice (next call), it depends on 144636 for esp, however, it's not needed for esp.

		Check: how the esp of 144649 is broken
		It seems that RET does not propagate the esp link.
		Fixed.

9:00AM 01/17/2014
--------------------------------------------------------------------------------------
Task 204:  fix slicing algorithm bugs
--------------------------------------------------------------------------------------
	[1] check another bug at 0x004085a5. mov[edi+constant], eax. Both edi and eax are not right.
		The problem is that eip 0x4085a5 (ts: 231916) depends on 0x408597 (ts: 231891)
			==> 231916 is not needed for mem and does not propagate data 
			But its data destination needs register, hence actually bDataNoPropagation should be set to false.
	[2] debug 231916 [20 min]
			Proposed solution: add a method in InstrInfo to tell if an instruction writes to memory
				and the destination operand contains a register; if so, set bNoDataPropagation to false, and set bNoDataRegDependency to false.
9:30AM
	[3] Implementation: 
		[3.1] Create InstrInfo::isMemOperandContainReg() [3r min] DONE
			[1] declare a flag for hasMemoperandContaingReg [5 min] DONE.
			[2] declare the set and get function [8 min] DONE.
			[3] declare a checkRegUpdate function [8 min] DONE
			[4] update the setInputOutput reg [5 min] DONE
			[5] debug. [15 min] DONE
		[3.2] Modify InstrExecRecorder::isMemInputReallyNeeded [15 min] SKIP.
		[3.3] in Trace.cc change the logic to change value on bNoDataPropagation and bNoRegDep. [15 min] DONE
		[3.4] debug through [30 min]
		Now the new stats eip 0x4085a5 (ts: 231916) depends on 0x408597 (ts: 231891)
				[3.1] 45 min DONE. 
					does not pass unit testing. fixed. 
					another unrelated unit testing bug.
						fix destructor of CallAdjustRecord.
				[3.2] 8 min -> skipped 
				[3.3] 15 min
					[3.3.1] regenerate the trace.
						[1] raw, [2] full and doc, [3] branch
			check @4085a5 (it's not there anymore)
			Regenerate the mode 1 and then mode 0

9:00AM 01/18/2014
	[4] new problem: 0x004019ac. (ts: 234937)
			The problem is that its memory dependency link is not propagated.
			[4.1] conditional BP on 234937 and find out the cause. [30 min]
				Problem is line 642 introduced in yesterday's code.
9:35AM
			[4.2] fix: add the check on isNeededForReg as well, and then check [15 min]
		fixed.
10:00AM
	[5] new problem: 0x40cc8f.	
			[5.1] comparative study of trace. [35 min]
			failed. There are so many instances of 0x40cc8f.	
			Regenerate the entire thing.
				mode 1 and mode 0
			Now the problem is located at eip: 0x406cdc (this is the first time that the time stamp is hit: ts:	 193869)
				The address to write is not right. Problem: esi value is not ok.
				It relies on 192920 (@40ce44 pop esi), and the instruction in the slice is NOP (problem). So the problem is that
		The ESI register is changed in the function that contains @40ce44, but its value is not recovered. go to check the
		slicing log.
				Function is:
					181506@406cb2 -----------------------------------------------------------------> 192936@40ce44 (skipped)
						inside 406cb2 it calls (181624@40ce81  ---> @40ce44:192920   192929 @40ce51 ) is included because
						170185@40ce81 --> 181486@40ce51 is INCLUDED!
								

		@0x40ce81: ts: 181624), the function is skipped.

	In unsliced version:
		0x40ce81 --> 0x40ce44 --> 0x406cb2 --> 0x40ce81 --> 0x40ce44 --> 0x406cdc
	Both functions (@0x40ce81, @0x406cb2) preserve the esi value


	In sliced version (buggy):
		0x40ce81 --> 0x40ce44(nop) --> 0x406cb2 (function skipped did not call 0x40ce81) --> 0x406cdc (error on esi) 
	Check log:
		193869 (@406cdc) --> 192920 in slice (0x40ce44)--> processFunction ts 192936 [the ret for function 0x406cb2]
			--> 0x40ce44 delayed to  181477 (@0x40ce44) [but the log shows that it is included in slice]

	Next check: in binWriter call trace->setIER_II(181477) and then check if it's in slice.
		Strangely IER is in slice but II is not in slice.

	Strangely, after the first set of inSlice, the II became false. [check!!!]
		Found the problem: in Trace::setInSlice it first checks ier->setInSlice(), if it's already set, it's not going to
			call ii->setInSlice(), which sets the counter; thus the counter is always 1, even if there are multiple
			increments. But when applying delayRegDependence, if an instruction has multiple delay links, it's decrementing
			the counter too many times.

		In the design, the count in InstrInfo reflects the number of distinct locations in the slice. The unmark-slice call
		does not seem to follow this semantics strictly. Change the implementation of unmarkInSlice: if the ier is not in slice,
		don't decrement.
	FIXED.!!!!
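	Note: a sketch of the counting rule behind this fix, under the interpretation that the InstrInfo counter
	holds the number of execution records (IER) of that instruction currently in the slice. The struct and
	function names are simplified stand-ins, not the real Trace/InstrInfo classes.

```cpp
// Static instruction: shared by every execution of the same EIP.
struct InstrInfo {
    int sliceCount = 0;
    bool inSlice() const { return sliceCount > 0; }
};

// One dynamic execution (one timestamp) of an instruction.
struct InstrExecRecord {
    InstrInfo* ii = nullptr;
    bool inSlice = false;
};

// Marks one execution as in-slice; the shared counter moves only on a real state change.
inline void setInSlice(InstrExecRecord& ier) {
    if (ier.inSlice) return;  // already counted, do not increment twice
    ier.inSlice = true;
    ++ier.ii->sliceCount;
}

// Unmarks one execution; a record that is not in the slice must not decrement the counter.
inline void unmarkInSlice(InstrExecRecord& ier) {
    if (!ier.inSlice) return;
    ier.inSlice = false;
    --ier.ii->sliceCount;
}
```

	With both guards in place, several delay links pointing at the same record can unmark it repeatedly without
	driving the counter below the number of records still in the slice.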


8:30AM 01/20/2014	
--------------------------------------------------------------------------------------
Task 204:  make sure the 1st 10 slices are working
--------------------------------------------------------------------------------------
	1. manual analysis and running both prove that they are working.

9:30AM 
--------------------------------------------------------------------------------------
Task 205:  chain all the parses together and create a new task type.
--------------------------------------------------------------------------------------
	[0] planning [20 min]
	--9:45 AM implementations (expected to complete 11:45AM)
	[1] Modify the config.txt [5 min] DONE.
	[2] add Job category in header file [5 min] DONE.
	[3] process the category [10 min]. DONE
	[4] create class taskBatchBranchSlice [15 min]
	[5] call taskBatchBranchSlice [10 min] DONE
	[6] debug the above [15 min]
	--10:45AM 
	[7] refine implementation of taskBatchBranchSlice [40 min]
		[1] add a parameter to Trace::gen_branch_slice to change JOB::PRESERVE_REQUEST_MODE value, if the input 
				value is -1, keep the original value. [8 min] DONE.
		[2] change the taskBranchSlice correspondingly and make it compile [10 min] DONE.
		[3] fix the others [8 min] DONE.
		[4] debug through [15 min]
	--11:50AM
	[8] debug the above [20 min]
		[1] found the problem related to job_cateogory.
		[2] fix: 
			[2.1] create class taskChangeJobCategory [20 min]
			[2.2] insert taskChangeJobCategory   [20 min]
			[2.3] debug [15 min]
	[9] test first 10 slices [20 min]
		[1] problem. segmentation fault!
10:00AM 01/21/2014
		[9.1] fix the segmentation fault problem. The problem is that the taskChangeCategory has no logger property. Fix that. 
				[25 min]
		[9.2] find out the gen_branch segmentation fault problem. [15 min]
				problem is that full trace is not there yet.
				The problem is that the raw trace is not saved to disk.
11:15AM
		[9.3] start a raw trace mode running and find out when it is written to disk.  [15 min]
			It's called by the destructor of TraceManager.
		[9.4] Solution: add a new task to delete all Traces [20 min] DONE.
		[9.5] new problem. the second loadvm timed out.
			rebuild all.
				problem 1. the "log" is counted as one job. fixed
				problem 2. the vm should be resumed. fixed

9:30AM 01/22/2014
		[9.5] the vm is still stopped. Need a command to resume the vm.
				add a resume task after the loadvm task.
				recompile all.
				bp on taskContVM::do_job
			Problem: it never actually hits the taskContVM. Add the "cont " command to loadVM
			Now it seems to work (after appending a cont command) -> ng helper_trace2 at least
		[9.6] problem: send_evt problem on TraceManager::myinst (already null), it always pops a segmentation fault.
			Check why. Recompile first.
			[1] fix destructor of TraceManager. DONE.
			[2] add TraceManager::createInstance() in loadvm. DONE.
			[3] need to clear numCR3.
				check: bp on ops_sse.h:2386
				check when the vm loaded signal is sent.
			[4] problem: callAdjustRecord destructor error when deleting raw trace in gen_full_trace. SOLVED.
-=-----------------
	[10] problem: out of memory. Check if Cache has destructor.
		Use leak detector.
		problem seems to be fixed.
	[11] rr_processor loadFromCache (round 2)
			b Trace.cc:2213 (data is the same for two parses)
			The problem is with line 279 of Cache.cc if total_size = 0, should not be added with arrPosition.
	[12] out of memory again.

8:30AM 01/25/2014
	[13] run the system again. Still crashes because of memory leak. Check val_run using full_mode.
		[1] leak on cacheRV. Found the problem when loadCache, there is a memory leak. It's a possible leak. leave it.
	[14] run valgrind on branch slice and check leak.
		problem in x86_disassembly a lot of memory leaks.
		Problem is that x86_disasm calls calloc to initialize the insn->op field, but does not release it.
			is there any x86_destroy_insn function?
			see below--
    x86_oplist_free(&insn); //does the real job
    x86_cleanup(); //this function simply returns 1?

		-- to do: find each x86_disasm(...) and call x86_oplist_free correspondingly.
		fixed
	[15]  fix the mapTsToId problem:
		Now working the first time!
	add another source.
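	Note on item [14] above: one way to make sure every disassembly call is paired with its free is a scope guard.
	The sketch below does not use the real disassembler; FakeInsn, fakeDisasm and fakeOplistFree are stand-ins
	for the instruction struct, x86_disasm(...) and x86_oplist_free, so that the pattern can be shown on its own.

```cpp
#include <cstdlib>

// Stand-in for an instruction struct that owns a heap-allocated operand list.
struct FakeInsn {
    void* operands = nullptr;
};

// Counts operand lists that were allocated and not yet freed.
inline int& liveOperandLists() {
    static int count = 0;
    return count;
}

// Stand-in for the disassembly call: allocates the operand list with calloc.
inline void fakeDisasm(FakeInsn* insn) {
    insn->operands = std::calloc(1, 16);
    if (insn->operands != nullptr) ++liveOperandLists();
}

// Stand-in for the free call: releases the operand list once.
inline void fakeOplistFree(FakeInsn* insn) {
    if (insn->operands == nullptr) return;
    std::free(insn->operands);
    insn->operands = nullptr;
    --liveOperandLists();
}

// Frees the operand list when the guard leaves scope, including on early return.
class InsnGuard {
public:
    explicit InsnGuard(FakeInsn* insn) : insn_(insn) {}
    ~InsnGuard() { fakeOplistFree(insn_); }
    InsnGuard(const InsnGuard&) = delete;
    InsnGuard& operator=(const InsnGuard&) = delete;

private:
    FakeInsn* insn_;
};
```

	In the real code the guard's destructor would call x86_oplist_free on the instruction it wraps.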

9:30AM 01/26/2014
	[1] continue the exploration of memory problem.
		There are around 560MB memory not freed for the PC emulator. 
		Check in the raw mode (reloading snapshot), if it is getting worse.
		Reachable is about the same. It seems that we can stop the exploration of mem leak.
	[2] run the batch again and see if it terminates right.
		error in cannot open raw_traces.
			The problem: the full_trace task tries to open it but raw trace is not there.
		bp on taskSaveTrace::synch_job().
		strangely taskSaveTrace is only hit once.
		It seems that taskSaveTrace does not save the trace. remove the raw_trace there. 
		Problem: initVM recreates the trace!!!
			Solution: move the part of the code to initVM.
	[3] now the problem of rr_processor
		unit test fails. double free. just checked. avoid double free.
		check rrProcessor.saveToCache() is called. add to save_rr_processor to taskTraces.
12:00PM 01/28/2014.
		It pops an error in save_rr_processor in the second pass. should be fixed
	[4] out of memory again. This time try to break on pc.c:929
		It is not hit the second time. Try valgrind again.
		It still times out. Try enlarge mem capacity. does not work. ram device error.
	[5] check mtrace.
		embedded mtrace code. the trick is to use "export MALLOC_TRACE=/tmp/t". If it's not in tmp, it seems not passing.
		The program stopped at about 400MB (request 200MB).
	[6] try setting OOM killer exceptions.

9:30AM  01/29/2014
--------------------------------------------------------------------------------------
Task 206:  solve the out of memory problem and other problems of batch branch
--------------------------------------------------------------------------------------
	[1] try change the trace size. seems to solve the problem.
	[2] error: cache must be empty before saving to cache.
		Strangely the save_rr_processor is called in the second pass, where the Job::REQUEST MODE is 1.
		It's the change it back causing the problem.
	[3] job2 starts from slicing immediately. The category is not reset back.
		Now seems working.
   
	[4] move a 3rd file to processing.  passed
	[5] check 10 branches each file. and then run the results.
	[6] smbd connect still too slow, try adjusting /etc/samba/smb.conf to enable the TCP_NODELAY socket option. seems not helping.
		Seems still not working. reboot.
	[7] smbd: it seems to be the problem of xp side. The initial request is not sent until several minutes later.
	[8] double free delete this->cur_job;
	[9] samba is still too slow. Try to think about the solution.
	[10] try taking another snapshot with "net use ..." already loaded. problem with snapshot.
		inactive CPU. (info cpus -> halted even after cont command).

2:30PM 01/31/2014
--------------------------------------------------------------------------------------
Task 207:  solve the SLOW net map samba drive problem
--------------------------------------------------------------------------------------
Check the following functions.
	qemu_run_all_timers () at qemu-timer.c:454 (calls the following)
	qemu_run_timers is visited many many times

	It seems that it takes a very long time to reach the "break" in the following
	386             if (!qemu_timer_expired_ns(ts, current_time)) {
	387                 break;
	388             }
******* SOLUTION ****************

	[1] attempt 1: add a base timestamp to -rtc clock=vm option. does not work.
	[2] search "halted" for the "cpus command"
		the "(halted)" message is printed by hmp_info_cpus.
		It is reading value cpu->value->halted. It is generated by qmp_query_cpus.
	It's actually getting the env->halted.
		interestingly when helper_trace2 is called, env->halted is always 0. (this is for snap111)
	When loading snap222, the env->halted is always 1 when doing qmp_query_cpus, however,
	it is 0, when helper_trace2 is hit. set a watch point on it and see how it's changed?
		It is set by do_hlt <- helper_hlt.
	While loading snap111, it's not easy to hit the do_hlt. It seems that
		both snap111 (good one) and snap222 (bad one) do turn on/off the env->halted.
	The question is: maybe it is unrelated? (or maybe related)?
	[3] research how is helper_hlt called. It is triggered by a hlt instruction (see translate.c).
		a hlt instruction halts CPU until the next interrupt (e.g., timer interrupt).
	[4] check timer interrupt.
		from intel documentation, timer interrupt is generated by apic (later verified wrong. should be i8254 chip. there is a qemu simulator for it).
		timer interrupt is triggered in acpi_pm_tmr_update (wrong: should be pit_update_timer in i8254.c)
	[5] check interrupt handling IRQ: 0
		hardware interrupt is done using do_interrupt_x86_hardirq
Compare snap111 and snap222:
		when sending a keyboard event
			do_interrupt_x86_hardirq are called (intno: 147)
			however, it is called much less frequently than snap111 (in failed snap222).
			A lot of intno 61, 98 etc. suspect 61 is the timer interrupt.

********************************************************************
Guess: the messed snapshot has CLOCK value invalid.  which does not trigger the timer interrupt.
********************************************************************
9:00AM 02/01/2014	
	[6] read hw/apic carefully. If possible, figure out how hardware interrupt is raised. [1 hr]
		location: hw/apic.c. Interested functions listed below:
			NMI - non maskable interrupt
			SMI - system management interrupt (when OS is suspended. CPU management mode)
		APIC supports several "delivery" methods of interrupt: local , smm, external,
		and bus deliver.

		For non-maskable interrupts (NMI) and system management interrupts (SMI), it calls cpu_interrupt to pass the
	interrupt in directly.
		For others, it sets the irq using apic_set_irq.
	** apic_update_irq (signals CPU when an IRQ is pending)
	** apic_set_irq
	both called cpu_interrupt, it changes env;
	may be the ones that used by timer
	** apic_get_interrupt gets the highest priority interrupt currently in apic

	Now timer related functions:
	*** apic_timer_update
	*** apic_timer
	Interestingly, these two timers are not called in snap111 (the good one).
	It seems that APIC is the wrong place to look at.

	qemu_mod_timer and hw/i8254 is the place to look at.
	Intel i8254 is the programmable timer.

	[7] read hw/i8254.c (programmable timer) [0.5 hr]
		8254 use pit_set_gate to send out information.
		the important function is pit_irq_timer_update
		-----------------
		*pit_irq_timer_update:
			it computes the expire time and irq level and calls pit_set_irq [seems
				to be called for every timer interrupt]	
			next_delay is defined as (expire_time-current_time)/get_ticks_per_sec()
				get_ticks_per_sec() returns 1G.
				expire_time-current_time in GDB shows something like 843
		-----------------
	10:30AM 02/01
	[8] read qemu-timer.c [1 hr]
			qemu_next_alarm_deadline is calculated as the smaller of the delta of
				host timer and rt timer relative to expire_time.
			qemu_del_timer is to stop a timer (but not deallocate it). It's basically to
		remove the timer from the linked list.
		qemu_mod_timer is to modify the current timer so that they will be fired after
			the expire_time. Expire_time is the absolute time in ns.
			Its function is to change the expire time of the given timer "ts" and insert
			it back into the list of active timers of its associated clock.
			There should be three clocks: vm, host, and real time.

		*qemu_run_timers ->
			(1) get the current time
			(2) if the active timer is not expired, return; wait until the next time tick to check.
			(3) if there are expired timers, call ts->cb [which is set to
				i8254.c:pit_irq_timer_update]

	Summary: the logic here:
			main_loop_wait -> qemu_run_all_timers (on vm, host, real) 
				It's qemu_run_timers(vm) that triggers
					pit_irq_timer_update will immediately shoot a qemu_set_irq request,
						and  then update the timer (to set the next expire time)
						qemu_set_irq -> pic_irq_request -> cpu_interrupt
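	Note: a toy model of the run-timers loop summarized above, to make the "fire, then re-arm" behavior concrete.
	This is not QEMU code: Clock, modTimer, delTimer, runTimers and nextDeadline are hypothetical names, time is a
	plain integer in ns, and the callback receives the expire time of the timer that fired.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

class Clock {
public:
    using Callback = std::function<void(int64_t expiredAt)>;

    // Removes the timer with this id from the active list (it can be re-armed later).
    void delTimer(int id) {
        active_.erase(std::remove_if(active_.begin(), active_.end(),
                                     [id](const Timer& t) { return t.id == id; }),
                      active_.end());
    }

    // (Re)arms a timer at an absolute expire time, keeping the list sorted by expire time.
    void modTimer(int id, int64_t expireNs, Callback cb) {
        delTimer(id);
        auto pos = std::find_if(active_.begin(), active_.end(),
                                [expireNs](const Timer& t) { return t.expire > expireNs; });
        active_.insert(pos, Timer{id, expireNs, std::move(cb)});
    }

    // Fires every timer whose expire time is <= now; stops at the first unexpired one.
    int runTimers(int64_t now) {
        int fired = 0;
        while (!active_.empty()) {
            if (active_.front().expire > now) break;  // head not expired yet
            Timer t = std::move(active_.front());
            active_.erase(active_.begin());           // remove first so the callback may re-arm
            t.cb(t.expire);
            ++fired;
        }
        return fired;
    }

    // Expire time of the earliest active timer, or -1 if none is armed.
    int64_t nextDeadline() const { return active_.empty() ? -1 : active_.front().expire; }

private:
    struct Timer {
        int id;
        int64_t expire;
        Callback cb;
    };
    std::vector<Timer> active_;
};
```

	A periodic callback that re-arms itself at expiredAt + period is fired once per missed period when the clock
	has jumped ahead, which is one way a single pass through the loop can take many iterations.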

	Observation on snap222: pit_irq_timer_update is still called frequently. So what's the
		difference between snap111 and snap222????

11:30AM
	[9] attempt: explore the relation between [30 min]
			do_interrupt_x86_hardirq are called (intno: 147)
	AND		pit_irq_timer_update
		Design: first break on pit_irq_timer_update and then do_interrupt_x86_hardirq and
	see if it's paired, and record interrupt number in snap111 and then repeat it in
	snap222.
			[9.1] Observation: in snap111, pit_irq_timer does not trigger do_interrupt_x86_hardirq.
	The interval between each neighboring pit_irq_timer is 838 and 54923725 (ns). All vm clock
	triggered.
			drill down into pit_irq_timer: it's calling hpet_handle_legacy_irq  (when irq_level is 0)
				-> gsi_handler
				-> 8259
				-> pic_update_irq
				-> pic_irq_request
			The irq_level in pit_irq_timer_update alternates between 1 and 0 (because
		54923725 % 65535 alternates between 0 and a non-zero number)
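	Note: the two intervals observed in [9.1] can be checked against the PIT input clock. The sketch assumes the
	standard i8254 input frequency of 1193182 Hz and a 1 ns clock resolution (get_ticks_per_sec() returning 1G,
	as noted earlier); pitTicksToNs is a hypothetical helper. One tick rounds down to 838 ns, and
	838 + 54923725 ns equals the duration of 65535 ticks.

```cpp
#include <cstdint>

constexpr uint64_t kPitFreqHz = 1193182;        // i8254 input clock in Hz
constexpr uint64_t kNsPerSec = 1000000000ULL;   // clock resolution: 1 ns per tick of the vm clock

// Converts a number of PIT ticks to nanoseconds, rounding down.
constexpr uint64_t pitTicksToNs(uint64_t ticks) {
    return ticks * kNsPerSec / kPitFreqHz;
}
```

	So the short interval is the one-tick phase and the long interval is the remainder of the same period.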
******
	Note from wiki: On the PC, the BIOS (and thus also DOS) traditionally maps the master 8259 interrupt requests (IRQ0-IRQ7) to interrupt vector offset 8 (INT08-INT0F) and the slave 8259 (in PC/AT and later) interrupt requests (IRQ8-IRQ15) to interrupt vector offset 112 (INT70-INT77).
	So timer interrupt is int 8.
*****
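	Note: the mapping quoted from the wiki is just "vector = base + IRQ line". A small sketch follows, with the
	BIOS defaults as default arguments; irqToVector is a hypothetical helper. An OS can reprogram both bases,
	so the vector seen for the timer in the guest does not have to be 8.

```cpp
// Maps an 8259 IRQ line (0..15) to its interrupt vector.
// masterBase covers IRQ0-IRQ7, slaveBase covers IRQ8-IRQ15; defaults are the BIOS values.
constexpr int irqToVector(int irq, int masterBase = 0x08, int slaveBase = 0x70) {
    return irq < 8 ? masterBase + irq : slaveBase + (irq - 8);
}
```

	With the defaults, IRQ0 (timer) maps to INT 08 and IRQ1 (keyboard) to INT 09.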

		[9.2] attempt 2: set bp on do_interrupt_x86_hardirq and bp on intno: 8
	But in do_interrupt_x86_hardirq the reported interrupt numbers are mostly 177. Occasionally 113.
		keyboard event is 147.

			Observation 2: in snap222 it is also receiving hardware interrupt 177.
		Difference: when sending a key, interrupt 61 did not occur.
	Got to do a comparative debugging for int no 61.

	6:50PM 02/01/2014 comparative study of hardware interrupt 61.
	[10] study interrupt 61.
		Design: break on do_interrupt_x86_hardirq and see how it reached from 147 (twice) to 61.
		cpu_exec triggered interrupt 147 (keyboard event)
			Problem: cannot directly "n" in gdb; there are long jumps.
			capture the first 147 and then break on cpu_x86_exec
			then bp on helper_trace2 for each instruction
	Observation: the interrupt handler for interrupt 147 is executed on the same sequence. However,
		interrupt 61 just did not show up.

		Seems not working.

	7:30PM
	[11] coming back to the timer interrupt. Purpose: figure out the intno for timer interrupt.
		Candidate interrupt numbers are 177, 61, 68, 98
		Actually confirmed in another way, numCR3 in snap222 is never increased. So timer 
	interrupt is handled wrong, or scheduling is not right.

	[12] Trace on pit_irq_timer_update
			read PIT 8254 documentation. Mode 2 is the rate generator mode.
			There seem to be too many timer interrupts generated!
			It seems that in both snap111 and snap222 images, there is a gap between
		current time and expire time, and it takes a lot of iterations to reach the point
		that the two are equal.
???maybe should break on qemu_del_timer in line 269 of pit_irq_timer_update.

	8:30AM 02/02/2014
	[13] Study the logic of qemu_run_timer again.
		A clock has a sequence of timer. Each timer has an expire time in nanoseconds.
		Question: what is the difference between a timer and an alarm timer?
		struct qemu_alarm_timer can be regarded as a class qemu_alarm_timer which
	has three methods: start, stop, and rearm. It has no relation with QEMUTimer, but
	has a data member (depending on the OS) that refers to a real timer of the OS.
		qemu_next_alarm_deadline computes the "smaller" deadline of host and virtual timers,
	it is calculated using real system timers.
		*** show_available_alarms shows two alarms dynticks and unix_timer, note that
	an alarm timer has a name.	
		qemu_get_clock_ns gets the "real" (elapsed) time for vm clock.
		qemu_modify_timer: modify the timer updates its expire_time (given as a parameter)
			and insert it back into the clock as the active timer. 
		qemu_run_timers: 
			take the current time (from the host system or a real timer)
			using a for loop to repeatedly increase the timer (maybe many times). Each time
		calls pit_irq_timer_update to generate IRQ signals. However, most of them are ignored.
	event_notifier_set is defined in		util/event_notifier-posix.c:92
	qemu_clock_warp does not actually do anything just return because use_icount is 0.
		The purpose of alarm clock is to stop vcpu from the thread of execution code and
	execute the other thread to check interrupts.

	10:30AM
	Summary: qemu_run_timers run the 3 clock timers. Each clock has a sequence of timers, however,
	it seems that only one timer gets updated. Each timer, when updated, will call 
	pit_irq_timer_update to trigger IRQ. 
		Problem/Question/Puzzle1: for each qemu_run_timer the pit_irq_timer_update is
		called many times, because there is a huge gap to the real time stamp read (or
		maybe this delay is caused by debugging???...)
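One possible explanation for Puzzle1 is that a periodic timer re-arms itself relative to its previous expire time, so a single run has to "catch up" when the clock has jumped ahead (for example while sitting at a breakpoint). The toy model below only illustrates that idea; the class and function names are invented and it is not QEMU's implementation.

```python
import heapq


class Clock:
    """Toy clock: a heap of timers ordered by expire time (not QEMU's structure)."""

    def __init__(self):
        self._timers = []   # entries are (expire_ns, sequence, callback)
        self._seq = 0

    def mod_timer(self, expire_ns, callback):
        """Arm a timer to fire once the clock reaches expire_ns."""
        heapq.heappush(self._timers, (expire_ns, self._seq, callback))
        self._seq += 1

    def run_timers(self, now_ns):
        """Fire every timer whose expire time is <= now_ns; return the count."""
        fired = 0
        while self._timers and self._timers[0][0] <= now_ns:
            expire_ns, _, callback = heapq.heappop(self._timers)
            callback(self, expire_ns)
            fired += 1
        return fired


def make_periodic(period_ns):
    """Callback that re-arms itself one period after its previous expire time."""
    def callback(clock, expired_at_ns):
        clock.mod_timer(expired_at_ns + period_ns, callback)
    return callback


clock = Clock()
clock.mod_timer(0, make_periodic(1000))
print(clock.run_timers(10_000))   # many callbacks in one run: the gap is 10 periods
print(clock.run_timers(10_000))   # nothing left to catch up on
```

If this model matches what happens in the emulator, the number of calls per run grows with the gap between the timer's expire time and the current clock value, which would fit the guess that debugging pauses make it worse.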

	10:45AM
	[14]  explore pit_irq_timer_update again and drill down to details how signals are sent.
		[1] pit_get_out  - generates the out-pin value. At mode 2, if the value%count 
			is 0, out is 0; otherwise it is 1. According to i8254 manual, the low edge
			represents a clock pulse (signal) to the i8259 PIC or other 
			controller.
		[2] pit_irq_timer logic:
				1. calculate the output line (1-bit) level (high-1, low-0)
				2. call qemu_set_irq(s->irq, level) to output the signal.
					the s->irq defines which controller handles the output channel
					Here each QEMUTimer corresponds to a PITChannel (one of the 3 counters
			on i8254). Depending on which counter it is, output line is connected to 
			different place.
					According to i8254 documentation, on i8254 channel (counter) 0
			generates timer interrupt, channel 1 generates DRAM refresh signal,
			channel 2 generates signals to speaker.
				3. drill down into qemu_set_irq it is calling:
					[1] hpet_handle_legacy_irq (defined in hw/hpet.c) - high precision event timer:
						hpet is the next generation of timer device (at least 10 MHz) better than
						i8254 (about 1.19 MHz).	
						it basically forwards the request (irq:0, level:0) to the gsi_handler
					[2] gsi handler (in pc.c)
						GSI stands for "global system interrupts" it consists of
							8259 and apic irqs.
						depending on the irq line number n, sends to 8259 or apic (second chip).
						In our case, it's level 0 (low edge), and irq number 0: timer interrupt.
						So it is sent to 8259 programmable interrupt controller (PIC).
						8259's handler will be called.
					[3] pic_set_irq by 8259: (look at i8259.c)
						irq: 0 (line of irq. one of 8 lines)
						level: 0 (low voltage edge)
							elcr is the edge/level control register (a set bit means level triggered)
							timer interrupt is edge triggered. From the code line 174 of i8259.c,
						we have: s->last_irr used for edge triggering (remember the last state
						for edge), change is triggered at level 1 (rising edge). 
							*** s->irr (interrupt request register) is set  (on the corresponding
								bit) 
						level 1 will update the s->irr (however, it's already marked)
						then pic_update_irq is called.
					[4] pic_update_irq
						calls pic_get_irq first, it returns -1? not sure why,
						then it calls pic_irq_lower, which sends pic_irq_request(0,0)
					[5] apic_accept_pic_intr:
						According to chapter 10.5.5 of Intel Manual, when a local interrupt
					is sent to APIC, it is subject to a number of criteria for acceptance.
					If the interrupt is accepted, it is logged into IRR register.
							DO_UPCAST(APICommonState, busdev.qdev, d) is to 
					find the container APICCommonState object that contains d as the qdev member.
					this is a constant operation (by doing some offset pointer moving).
						LINT0 is the "Local Interrupt Pin 0" (input pin)
						lvt is the local vector table.
							MSR_APIC_ENABLE is defined as 1<<12
						It seems that the apic interrupt accepted always to be 0 (rejected)
---------------------------------------
						So APIC never delivers anything!???? No breakpoint hit!
					The timer interrupt is always rejected by APIC controller!
					Sequence: qemu_run_timer ->  pit_irq_timer_update -> i8254 (timer)
						->hpet -> gsi -> i8259 -> apic controller -> rejected (never accepted)

					To verify: tomorrow, make a change to the code: disable the 
			pit_irq_timer_update and see if things still work.
					Verified: it is ACTUALLY NEEDED! Otherwise the screen is not refreshed!
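The edge/level handling described in step [3] (irr, last_irr, elcr) can be sketched as follows. This is a simplified model written from the description in these notes, not a copy of i8259.c; the class name and the way the request is acknowledged in the example are assumptions.

```python
class Pic:
    """Toy model of the 8259 input stage: irr, last_irr and elcr are bit masks."""

    def __init__(self, elcr=0):
        self.irr = 0        # interrupt request register
        self.last_irr = 0   # last seen line state, used for edge detection
        self.elcr = elcr    # bit set = that line is level triggered

    def set_irq(self, irq, level):
        mask = 1 << irq
        if self.elcr & mask:
            # Level triggered: the request simply follows the line.
            if level:
                self.irr |= mask
                self.last_irr |= mask
            else:
                self.irr &= ~mask
                self.last_irr &= ~mask
        else:
            # Edge triggered: only a 0 -> 1 transition latches a request.
            if level:
                if not self.last_irr & mask:
                    self.irr |= mask
                self.last_irr |= mask
            else:
                self.last_irr &= ~mask


pic = Pic()
pic.set_irq(0, 1)       # rising edge on IRQ 0 latches the request
print(pic.irr)
pic.irr = 0             # pretend the request was acknowledged
pic.set_irq(0, 1)       # line still high: no new edge, nothing latched
print(pic.irr)
```

In this model a level-0 call on an edge-triggered line never sets irr; it only re-arms the edge detector for the next level-1 call.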

8:30AM 02/03/2014
						
	[15] continue yesterday's experiment on qemu_run_timer, disable pit_irq_timer_update and
		see what happens. ???????????????? [STILL CANNOT EXPLAIN]
			1. we know that screen updates not working but try helper_trace2 and see
		if process number is increasing.
				[1] in snap111: numCR3 is increasing (although screen is not showing up)
				[2] in snap222: helper_trace2 is never hit!
			Analysis?
				[1] apparently, the vm clock running is still useful! So somehow
					the interrupt (for sure) is SENT OUT TO cpu.
				[2] in snap111: the VM clock timer interrupt is used to refresh screen only.
					There is another interrupt for process scheduling purpose.
				[3] in snap222: the VM clock is used for scheduling purpose (maybe?)
	[16] try to understand cpu_x86_exec and main_loop, how does it switch to cpu_x86_exec [SOLVED]
			main_loop_wait: poll all select I/O interrupts and run timers
			cpu_x86_exec: execute tb blocks
		How does the switch occur?
			main_loop_wait and cpu_x86_exec belong to two threads!
			main_loop_wait runs the timer and process I/O
			cpu_x86_exec run the emulator code!
10:15AM
	[17]  clearly snap111 is relying on another interrupt for scheduling processes. Try
		to figure out that interrupt and ireq/interrupt number.
	set a watchpoint on env->cr[3] and see how it is changed. [30 min]
	Observations:
	[1] *** cr3 is changed by cpu_x86_update_cr3  in helper.c
			called by helper_write_crN
			for translating an instruction.
	[2] the current EIP is: 0x804dbf60
			env->interrupt_request is 0
			env->interrupt_injected is 0 as well.
	[3] check do_interrupt_hardware_irq and see what's the related interrupt number
			seems both 177, 65 can trigger
	[4] check do_interrupt_all	
			It seems that 177 triggers the most
			it is called by do_interrupt_hardware_irq, and called in
			cpu_x86_exec()! [which checks interrupt before each tb block of code in cpu_exec.c]
	[5] read about cpu_x86_exec():
			1. first check exception and interrupt
				based on env->exception_index>=0, do interrupt
				if its value is <= 0x10000 (EXCP_INTERRUPT) it's interrupt, it then
				checks and performs the following:
							handle INTERRUPT_DEBUG, IRQ poll, SIPI, variety of interrupts. 
							env->exception_index is 5 (stands for EXCP_IRQ)	
			2. then it runs a tb. using tcg_qemu_tb_exec().
11:30AM
	[5] verify interrupt 177. in do_interrupt_all add a line to not treat int no 177. see
			if it impacts anything on snap111.
			Verified. it actually blocks process interleaving context switch.
			Interestingly snap222 also gets interrupt 177. We need to do a comparative study.

	[6] comparative study of handling of interrupt 177. [failed]
		[6.1] snap222. Questions: who sends the interrupt 177? How is the int number determined?
			calls do_interrupt_protected.
				interrupt dc is 0x28dc4260
				ptr is 0x8003f988
				type is 14
				selector is 8
				offset is 0x81f8f5c4 (next eip) address
			But it does not reach the change cr3 instruction.
		[6.2] snap111. check what's different.
				env->eip is changed to 81f8f5c4 the same
				Cannot tell the difference!
		[6.3] find out why it's interrupt 177.
				intno is generated by cpu_get_pic_interrupt() in pc.c.
					->apic_get_interrupt()
					->apic_irq_pending()
					->get_highest_priority_int()
				p/x s->irr
				$31 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x20000, 0x0, 0x0}
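				The non-zero word is index 5 of s->irr; each word covers 32 vectors, so this is a
				vector in 160-191, not IRQ 5 (first guess was irq5, parallel port 2).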
				Int number is computed as
				i = 5 * 32 + bit_position of leftmost of 0x20000 (17)
				=177.
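The computation above can be reproduced with a few lines. This is a sketch of the arithmetic only (scan the 8 x 32-bit words from the top and take the highest set bit); the function name is made up and it is not the QEMU routine itself.

```python
def highest_pending_vector(irr_words):
    """Return the highest set bit of an 8 x 32-bit bitmap as a vector number, or -1."""
    for i in range(len(irr_words) - 1, -1, -1):
        word = irr_words[i]
        if word:
            # bit_length() - 1 is the position of the most significant set bit
            return i * 32 + word.bit_length() - 1
    return -1


irr = [0x0, 0x0, 0x0, 0x0, 0x0, 0x20000, 0x0, 0x0]   # value dumped from gdb above
print(highest_pending_vector(irr))
```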
			Next figure out who's placing 0x20000 on the s->irr, set a mem bp on it.
		*** now we have the nice discovery ****!!!
		#0  set_bit (tab=0x28dcf830, index=177) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:60
#1  0x082726c8 in apic_set_irq (s=0x28dce510, vector_num=177, trigger_mode=1)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:390
#2  0x08271f6c in apic_bus_deliver (deliver_bitmask=0xbffff04c, delivery_mode=1 '\001', 
    vector_num=177 '\261', trigger_mode=1 '\001')
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:240
#3  0x08272298 in apic_deliver_irq (dest=1 '\001', dest_mode=1 '\001', delivery_mode=1 '\001', 
    vector_num=177 '\261', trigger_mode=1 '\001')
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:290
#4  0x08275349 in ioapic_service (s=0x28ded6c8)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:71
#5  0x08275460 in ioapic_set_irq (opaque=0x28ded6c8, vector=9, level=1)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:102
#6  0x08126d7a in qemu_set_irq (irq=0x28de9f5c, level=1) at hw/irq.c:38
#7  0x08284e12 in gsi_handler (opaque=0x28dd6c50, n=9, level=1)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../pc.c:98
#8  0x08126d7a in qemu_set_irq (irq=0x28de5294, level=1) at hw/irq.c:38
#9  0x080bf3d0 in pm_update_sci (s=0x28ddac60) at hw/acpi_piix4.c:106
#10 0x080bf45b in pm_tmr_timer (ar=0x28ddb1fc) at hw/acpi_piix4.c:115
#11 0x080be2ca in acpi_pm_tmr_timer (opaque=0x28ddb1fc) at hw/acpi.c:389
#12 0x081e35f4 in qemu_run_timers (clock=0x28c458d0) at qemu-timer.c:394

-------------------------
It is still qemu_run_timers, but it calls acpi_pm_tmr_timer directly!!! (so we were looking at the
wrong timer!)

**************************************************************
	* it is still triggered by the vm_clock, but this time it's a different timer
		It's the acpi_pm_tmr_timer *** (in acpi module)
		according to 10.5.4 of intel documentation, the LVT timer register determines
	the vector number to deliver.
		Note that before it delivers to piix4, it calls the following:
*** qemu_system_wakeup_request(QEMU_WAKEUP_REASON_PMTIMER);

		The associated irq is 9 on sci.
		Because its irq->n is 9, it generates s->irr to be 0x200 (see bit 9) at ioapic,
			it's determined by s->ioredtbl[9]	 (last two hex digits)
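To make "last two hex digits" concrete, here is a sketch that decodes a 64-bit I/O APIC redirection table entry using the standard field layout (vector in bits 0-7, delivery mode in bits 8-10, destination mode in bit 11, trigger mode in bit 15, mask in bit 16, destination in bits 56-63). The example entry value is hypothetical: it was constructed to match the arguments in the backtrace above, not read from the guest.

```python
def decode_ioredtbl(entry):
    """Split a 64-bit I/O APIC redirection table entry into its main fields."""
    return {
        "vector": entry & 0xFF,               # the "last two hex digits"
        "delivery_mode": (entry >> 8) & 0x7,
        "dest_mode": (entry >> 11) & 0x1,
        "trigger_mode": (entry >> 15) & 0x1,
        "masked": (entry >> 16) & 0x1,
        "dest": (entry >> 56) & 0xFF,
    }


# Hypothetical entry consistent with vector_num=177, delivery_mode=1,
# dest_mode=1, trigger_mode=1, dest=1 as seen in apic_deliver_irq above.
example_entry = 0x01000000000089B1
print(decode_ioredtbl(example_entry))
```

In gdb, printing s->ioredtbl[9] in hex and passing the value to a helper like this would show whether pin 9 is really programmed with vector 0xb1.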

	Summary: the VM clock drives both the i8254 PIT timer and the ACPI PM timer (acpi_pm_tmr_timer).
	They go through different interrupts. The PIT goes through i8259 interrupt 0, however, it's never
	delivered; the ACPI PM timer goes through ioapic pin 9, vector number 177, and is delivered by
	the APIC as hardware interrupt 177.

------------*********************************************
The above confirmed that interrupt 177 (0xb1) is one of the working timer interrupts
[1] i8254 PIT timer (via i8259): responsible for refreshing screen, however, not used for scheduling
[2] ACPI PM timer (ioapic pin 9, interrupt 177): responsible for process scheduling.
----------- *********************************************

	4:00PM
	[18] verify 177 is the intno that triggers the context switch. [15 min]
		[1] declare a global variable last_intno and set it in do_interrupt_all
		[2] bp on cpu_x86_update_cr3
		Observation: Most likely it's 177, but it could also be 14 or 65
	[19] check do_interrupt_hardware_irq on int_no 65, 
		[1] who's triggering 65? from do_interrupt_hardware_irq and then check who's writing
			to s->irr. watch on s->irr[2].
			verified: 65 is i/o write.
		[2] where does it go? 
			it will also cause the switch of cr3 because of another process space routine.
		[3] check 14. - did not capture it again.
	Conclusion: 177 is the timer interrupt no.

	7:00PM
	[19] Figure out why 177 does not trigger the scheduling in snap222. Record the instructions
		and then compare.
		[1] in the main function open the log file [15 min] DONE.
		[2] use a global variable to control helper_trace2 and dump the instruction into
				the file [20 min]
		--
		[3] start the capture starting from do_interrupt_all for 177
		[4] end the capture until the next do_interrupt_all
	Observation: the sequence departs at the 15th instruction!
	ins @81f8f5c4: push esp
    2 ins @81f8f5c5: push ebp
    3 ins @81f8f5c6: push ebx
    4 ins @81f8f5c7: push esi
    5 ins @81f8f5c8: push edi
    6 ins @81f8f5c9: sub  esp, 0x54
    7 ins @81f8f5cc: mov  ebp, esp //ESP = OLD_ESP-0x68
    8 ins @81f8f5ce: mov  [esp+0x44], eax //it's old ESP-24, save EAX to ESP-10, curr stack frame
    9 ins @81f8f5d2: mov  [esp+0x40], ecx //it's old ESP-28,  save ECX ..
   10 ins @81f8f5d6: mov  [esp+0x3C], edx //it's old ESP-2C, save EDX ...
   *** 11 ins @81f8f5da: test [esp+0x70], 0x00020000 //it's old ESP+8

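	[esp+0x70] is old ESP+8, which is the saved EFLAGS in the interrupt frame (EIP, CS, EFLAGS),
	not the stored CS register. 0x00020000 is the EFLAGS.VM bit, so the test checks for V86 mode.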
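The offset arithmetic in the listing can be checked mechanically. The sketch below assumes a hardware interrupt without privilege change and without error code, so that at handler entry the stack holds EIP, CS and EFLAGS at offsets 0, 4 and 8; the helper name is invented for illustration.

```python
PUSHES = 5              # push esp, ebp, ebx, esi, edi
FRAME_RESERVE = 0x54    # sub esp, 0x54

# How far ESP has moved below its value at handler entry after the prologue.
esp_delta = PUSHES * 4 + FRAME_RESERVE


def offset_from_entry_esp(disp):
    """Translate [esp+disp] after the prologue into an offset from the entry ESP."""
    return disp - esp_delta


# Assumed interrupt frame at handler entry (no error code, no stack switch).
FRAME = {0: "EIP", 4: "CS", 8: "EFLAGS"}
EFLAGS_VM = 1 << 17     # virtual-8086 mode flag

print(hex(esp_delta))
print(FRAME[offset_from_entry_esp(0x70)], hex(EFLAGS_VM))
```

Under these assumptions the test at instruction 11 reads the saved EFLAGS and masks the VM bit, which fits the later WinDbg listing where the jne target is hal!V86_Hci_a.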

	10:00AM			
	[20] Redo the experiment. Record multiple occurrences of cpu_x86_update_cr3.
		[1] bp on do_interrupt_all for 177, bp on cpu_x86_update_cr3.
		record ok.txt, nok.txt.
		Strangely, cpu_x86_update_cr3 is not hit that frequently.
		The problem is that there are a lot of disruption. There are a lot of sysenter
	from the user program syscalls, which switch cr3.
		Check the process(CR3) and their names
	p/x arrCR3
$4 = {0x0, 0x39000, 0xb35e000, 0x5dee000, 0xbccb000, 0x6d39000, 0x62c4000, 

	0x39000 - seems to be the OS kernel
	b35e000 - unknown 
	6d39000 - svchost.exe (checks and loads services)
	bccb000 - wuauclt.exe (windows auto update client)
	5dee000 - csrss.exe (controls threading and windows console)
	62c4000 - services.exe (service control manager)
	62ce000 - lsass.exe (local security authentication server)

hit sequence: 
6d39000, bccb000,  6d39000, bccb000, 39000, bccb000, 39000, 62c4000<->6d39000, b35e000, (then long
time no switching),  bccb000, 6d39000, 62c4000, 
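To read a hit sequence more easily, the CR3 values can be translated to the process names listed above. This is a small helper sketch; the table is copied from these notes (including the guess for 0x39000), and the function name is made up.

```python
# CR3 -> process name, as identified in the notes above.
CR3_NAMES = {
    0x39000: "kernel (guess)",
    0xb35e000: "unknown",
    0x6d39000: "svchost.exe",
    0xbccb000: "wuauclt.exe",
    0x5dee000: "csrss.exe",
    0x62c4000: "services.exe",
    0x62ce000: "lsass.exe",
}


def label_switches(cr3_sequence):
    """Replace each CR3 value with its process name, or its hex value if unknown."""
    return [CR3_NAMES.get(cr3, hex(cr3)) for cr3 in cr3_sequence]


print(label_switches([0x6d39000, 0xbccb000, 0x6d39000, 0xbccb000, 0x39000]))
```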

	11:00AM
	[2] attempt 2: select a process that is being switched to, not as a result of sysenter.
	Choose 0xbccb000 (wuauclt.exe - windows auto update client). 
	Design: (1) bp on do_interrupt_all, (2) bp on cpu_x86_update_cr3 which sets new_cr3
			to 0xbccb000, (3) sets bLog to 1 at do_interrupt_all and stop at new_cr3
			breakpoint and FFLUSH the filelog.
	The instruction that changes cr3 is located at 806ecbc8. Strangely, the recorded cr3 for
	all instructions is bccb000. Failed: system processes are calling each other. 
	Need to find a process that is frequently working.
		Failed.

	[3] attempt 3: start a new application program calc.exe and see how frequently process
		context switch occurs. Failed. calc.exe is never switched to. (reason: it does not
		have any computation).

	[4] attempt 4: start iexplore and see how it works. too slow. give up.
	
	[5] attempt 5: create a batch loop
		Using p BatchAnalyzer::myinst->sendCommandToVM("...") will save a lot of work.
		failed: batch file is not treated as a process. It looks like a part of the cmd.exe

3:00PM
	[6] attempt 6: create a time consuming batch task.[pure CPU no I/O]  [30 min]
		[1] create a time consuming loop, experiment it on xp vm . DONE.
		[2] copy it to samba DONE.
		[3] try run the new file and get its cr3/proc id.
		[4] see when it's swapped to (this must be a real context switch, because no one
			is requesting service from it). DONE.
		mainly switched from 5dee000 (csrss.exe). 
		start to take
		Still not recorded right. It may be the first couple of system calls (switching 
between services).


6:30PM
	[7] attempt7: use WinDbg to trace into the timer interrupt handler and check the logic
	***	ins @81f8f5c4: push esp
	does not work, need to find the idt table.
	Doing "!idt -a" in windbg shows @8053c3fa

		Try the instructions listed in http://winprogger.com/the-epic-of-apic/ about apic.
		[1] reload the symbols .reload
		[2] !apic
		Does not work. Still cannot read apic contents. 0xfffe0000 is not accessible.

		7.2 try to locate context_switch routine in xp. 
		??? intno 177 (0xb1) is not in the list 
		Found that HalpDispatchInterruptHandler is a wrapper of KiDispatchException, and 
handles APIC signals.
			May be we could trace from KiQuantumEnd: it checks quantum of each thread in 
	the list and does the swap.
			It is called by KiDispatchInterrupt -> KiQuantumEnd
				It reads _KPRCB->QuantumEnd (0x88c offset)
		The ISR entry point is:
			806d4a41 ff3524f0dfff    push    dword ptr ds:[0FFDFF024h]
806d4a47 c60524f0dfff02  mov     byte ptr ds:[0FFDFF024h],2
806d4a4e 832528f0dffffb  and     dword ptr ds:[0FFDFF028h],0FFFFFFFBh
806d4a55 fb              sti
806d4a56 ff1528e46c80    call    dword ptr [hal!_imp__KiDispatchInterrupt (806ce428)]
806d4a5c fa              cli
806d4a5d e8fea7ffff      call    hal!HalpEndSoftwareInterrupt (806cf260)
806d4a62 ff25f8e46c80    jmp     dword ptr [hal!_imp_Kei386EoiHelper (806ce4f8)]
------------------------------------------------
		Basically it just calls KiDispatchInterrupt. It seems not to distinguish
any interrupt number, because the KiDispatchInterrupt will read the APIC signals.

		*** all source code available in ReactOS!!!

	Use windbg commands !pcr to display the current processor control region (PCR),
and dt _KPRCB (find the address of pcr entry) to find the QuantumEnd field.
We can then set a hardware breakpoint (mem write) on it.

		It's updated by KeUpdateRunTime<-KeUpdateSystemTime (not sure which called it)
		irql (using !pcr) is 0x001c
		Irql: 0000001c
	                IRR: 00000004
	                IDR: ffff20f8

Another Idea:
	[1] get the binary code of the KiDispatchInterrupt (mov byte ptr ds:[0FFDFF024h]) and
		search for the next code get the EIP in the qemu/xp.
	[2] trace the hardware interrupt who's triggering it.

	

	
10:00AM  02/05/2014
--------------------------------------------------------------------------------------
Task 207:  study the logic of process/thread scheduling and clock.
--------------------------------------------------------------------------------------
[1] study the logic of scheduling. (ReactOS) [1 hr]
	[1.1] HalpDispatchInterrupt2ndEntry: it calls KiDispatchInterrupt
	[1.2] KiDispatchInterrupt: handles both hardware and software
		get the pcr (_KPCR) and prcb (_KPRCB, the kernel processor state data)
			Note: prcb->TimerRequest
					prcb->CurrentThread, prcb->NextThread 
		Logic:
			//prcb->QuantumEnd means the current thread's quantum has ended
			if(prcb->QuantumEnd){
				KiQuantumEnd(); //checks whether the current thread's quantum is used up:
				//if yes, schedule another; if no, keep it; but strangely it does not update
				//prcb->QuantumEnd; GUESS: prcb->QuantumEnd is updated by another procedure
			}
			else{
				context switch and get a new thread to run;
				call KiContextSwitch to perform Context Switch
			}
	[1.3] KiQuantumEnd: 
			get the current thread from prcb->Thread
			check thread->Quantum (note: it's different from prcb->QuantumEnd)
			if runs out of quantum (quantum<0): 
				reassigns priority of the current thread
				pick the next thread and schedule it to run.	
	[1.4] KeUpdateSystemTime: called by HalpClockInterrupt, HalpInterruptDispatch and
			KiInterruptDispatch:
				1. call KiWriteSystemTime
				2. call KeUpdateRunTime
	[1.5] KeUpdateRunTime:
			Update usertime, systemtime, interrupt time count (by clock cycles or quater?)
			Update Thread->Quantum (minus certain amount)
			if(Thread->Quantum<=0){
				prcb->QuantumEnd=1; //so that's where it is updated
			}
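The interplay between the clock tick and the QuantumEnd flag described in [1.2] to [1.5] can be simulated in a few lines. This is only a toy model of the logic as understood from ReactOS: the per-tick cost of 3 and the starting quantum of 6 are assumptions chosen for illustration, and all names are made up.

```python
CLOCK_TICK_COST = 3   # assumed amount subtracted from the quantum per clock tick


class Thread:
    def __init__(self, quantum):
        self.quantum = quantum


class Prcb:
    def __init__(self, thread):
        self.current_thread = thread
        self.quantum_end = False


def update_run_time(prcb):
    """Model of the clock-tick path: charge the running thread, flag quantum end."""
    thread = prcb.current_thread
    thread.quantum -= CLOCK_TICK_COST
    if thread.quantum <= 0:
        prcb.quantum_end = True
        return True    # stands in for requesting the dispatch software interrupt
    return False


prcb = Prcb(Thread(quantum=6))
print(update_run_time(prcb), prcb.quantum_end)   # first tick: quantum left
print(update_run_time(prcb), prcb.quantum_end)   # second tick: quantum used up
```

The point of the model: if the clock-tick path never runs (as suspected for snap222), quantum_end is never set and the dispatcher has no reason to pick another thread.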
[2] Locate the 1st 15 bytes of the binary code
	[1] KiQuantumEnd
		Use WinDbg:
			Locate the following code: corresponds to if(Thread->Quantum<=0...)
		804ff081 33db            xor     ebx,ebx
		804ff083 385e6f          cmp     byte ptr [esi+6Fh],bl
		804ff086 8845ff          mov     byte ptr [ebp-1],al
		804ff089 7f6c            jg      nt!KiQuantumEnd+0x95 (804ff0f7)

		nt!KiQuantumEnd+0x29:
		804ff08b 8b4644          mov     eax,dword ptr [esi+44h]
		804ff08e 385869          cmp     byte ptr [eax+69h],bl
		804ff091 740c            je      nt!KiQuantumEnd+0x3d (804ff09f)

	Byte string: "\x33\xdb\x38\x5e\x6f\x88\x45\xff\x7f\x6c\x8b\x46\x44\x38\x58"

		Implementation:	
			[1] In helper_trace2 declare a private function isKiQuantumEnd()
			[2] trace into it and prints the last_intno
		--> failed. could not capture KiQuantumEnd (maybe because it's not that frequent?)
	
	[2] try KeUpdateRunTime:	
		Try the following:
		80540388 806b6f03        sub     byte ptr [ebx+6Fh],3
		8054038c 7f19            jg      nt!KeUpdateRunTime+0x133 (805403a7)

	This corresponds to source:
		 if ((CurrentThread->Quantum -= 3) <= 0) {
				Prcb->QuantumEnd = TRUE;
				HalRequestSoftwareInterrupt(DISPATCH_LEVEL);
		 }
		Still could not get it right. Or is it caused by different code?
		Does not work. It seems that the data structure is different?
		Well then test the preceding instructions at the beginning of the function
	that loads prcb etc.
		80540274 a11cf0dfff      mov     eax,dword ptr ds:[FFDFF01Ch]
		80540279 53              push    ebx
		8054027a ff80c4050000    inc     dword ptr [eax+5C4h]

	Another try:
		8054027a ff80c4050000    inc     dword ptr [eax+5C4h]
		80540280 8b9824010000    mov     ebx,dword ptr [eax+124h]
		80540286 8b4b44          mov     ecx,dword ptr [ebx+44h]

	Still not successful. Maybe it's the offset of eax stuff.
	Attempt 2: only capture the skeleton. Mark the don't care as \xF7. not working. cannot
		find the sequence
	Attempt 3: try to read the logic of interrupt handler of 177, it does not look like any of them.
	Failed.
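One possible weakness of Attempt 2 is that a real byte value used as the "don't care" marker can collide with code bytes (0xF7, for instance, is the first byte of the test instruction in the handler listings). A sketch of a skeleton matcher that uses an out-of-band wildcard instead is shown below; the function name and the sample buffer are made up, and the pattern is the mov/push/inc sequence from the KeUpdateRunTime prologue with its address and displacement bytes wildcarded.

```python
def find_pattern(data, pattern):
    """Return the first offset where pattern matches data, or -1.

    pattern is a sequence of ints (must match) and None (matches any byte).
    """
    n, m = len(data), len(pattern)
    for start in range(n - m + 1):
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern)):
            return start
    return -1


# Skeleton of: mov eax, ds:[imm32]; push ebx; inc dword ptr [eax+disp32]
SKELETON = [0xA1, None, None, None, None, 0x53, 0xFF, 0x80, None, None, None, None]

# Sample buffer: a nop, the bytes from the WinDbg listing, then a ret.
sample = bytes.fromhex("90" "a11cf0dfff" "53" "ff80c4050000" "c3")
print(find_pattern(sample, SKELETON))
```

A matcher like this still fails if the guest kernel build differs in opcodes and not just in offsets, so a miss does not prove the function is never executed.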

[2] failed. Could not locate or trace into any of the KeUpdateRunTime functions! Strange!


	guess? THE INTERRUPT handler of 177 checks 0x20000 is to verify intno is int 177.
Check the WinDbg again on HalpClockInterrupt.
	Found it! It's the handler of int 177!!!!
hal!HalpClockInterrupt:
806d4d50 54              push    esp
806d4d51 55              push    ebp
806d4d52 53              push    ebx
806d4d53 56              push    esi
806d4d54 57              push    edi
806d4d55 83ec54          sub     esp,54h
806d4d58 8bec            mov     ebp,esp
806d4d5a 89442444        mov     dword ptr [esp+44h],eax
806d4d5e 894c2440        mov     dword ptr [esp+40h],ecx
806d4d62 8954243c        mov     dword ptr [esp+3Ch],edx
806d4d66 f744247000000200 test    dword ptr [esp+70h],20000h
806d4d6e 75b8            jne     hal!V86_Hci_a (806d4d28)
***********************************************
!!! -> but on VBox WinXp it's hooked as interrupt 0x30! 
*************************************************
Compare with the following on interrupt handler of 177 
	ins @81f8f5c4: push esp
    2 ins @81f8f5c5: push ebp
    3 ins @81f8f5c6: push ebx
    4 ins @81f8f5c7: push esi
    5 ins @81f8f5c8: push edi
    6 ins @81f8f5c9: sub  esp, 0x54
    7 ins @81f8f5cc: mov  ebp, esp //ESP = OLD_ESP-0x68
    8 ins @81f8f5ce: mov  [esp+0x44], eax //it's old ESP-24, save EAX to ESP-10, curr stack frame
    9 ins @81f8f5d2: mov  [esp+0x40], ecx //it's old ESP-28,  save ECX ..
   10 ins @81f8f5d6: mov  [esp+0x3C], edx //it's old ESP-2C, save EDX ...
   *** 11 ins @81f8f5da: test [esp+0x70], 0x00020000 //it's old ESP+8

[3] Comparative Study QEMU and VBox Winxp's HalpClockInterrupt Handler!

		VBox: 0x806d4d50 
	--> 1. //tests the VM bit of the saved EFLAGS (V86 check, see the jne to hal!V86_Hci_a), not the interrupt number
	806d4d66 f744247000000200 test    dword ptr [esp+70h],20000h
	--> 2.  //call begin system interrupt
	hal!HalpClockInterrupt+0xa3:
	806d4df3 e850f9ffff      call    hal!HalBeginSystemInterrupt (806d4748)
	--> 3.  //call KeUpdateSystemTime
	806d4f35 0f843df5ffff    je      hal!KeUpdateSystemTime (806d4478)

	This leads to:
	Qemu: 0x81f8f5c4
	--> 1. //tests the VM bit of the saved EFLAGS (V86 check), not the interrupt number
	81f8f5da f744247000000200 test    dword ptr [esp+70h],20000h
		Code begin different from: @EIP 0x81f8f65b: length: (2): jnz  0x0000000F
		There are some extra instructions, but still follow roughly the same logic.
---> then it diverges at one branch and it never calls KeUpdateSystemTime

	Trouble is that the logic of the two routines is completely different!!! It may be
caused by different device driver for hardware? Trouble ...

*************** TO DO
--------------------------------------------------------------------------------------
Task 208:  Figure out how context switch is done.
--------------------------------------------------------------------------------------
	[1] find the context switch code in KiDispatchInterrupt. 
			nt!SwapContext
		80540ab0 0ac9            or      cl,cl
		80540ab2 26c6462d02      mov     byte ptr es:[esi+2Dh],2
		80540ab7 9c              pushfd
		80540ab8 8b0b            mov     ecx,dword ptr [ebx]
		

	[2] find the corresponding procedure in QEMU
	Identified the code!: (!804dbec0!!!!)
		@EIP 0x804dbec0: length: (1): pushf
		@EIP 0x804dbec1: length: (2): movl      (%ebx), %ecx
		@EIP 0x804dbec3: length: (7): cmpl      $0x00, 0x994(%ebx)
		@EIP 0x804dbeca: length: (1): push      %ecx
		@EIP 0x804dbecb: length: (6): jnz       0x0000013A
		@EIP 0x804dbed1: length: (7): cmpl      $0x00, 0x8056198

	last_int_no is 177

	keyboard event could also trigger it.
	verified in snap222. It's not triggered.
	Note: not every 177 triggers it.

!!!!!!!!!!!!1 TO DO !!!!!!!!!!!!!!!!!!!!!!!!!!!!
	[3] record the code and find how it's triggered.
		start from interrupt 177 and stop at the first hit of 0x804dbec0

8:00AM 02/06/2014
	[4] record the code and comparative study.
	Implementation:
		[1] perform the experiment again on do_interrupt_all (147, keyboard event)
			Verified: snap111, every 147 triggers SwapContext.
				in snap222, the first 147 triggers SwapContext but the rest don't
9:30AM
		[2] mode 1: recording mode. 
			start: do_interrupt_all, triggered by 147 (keyboard event). set bLog=1
			end: 0x804dbec0 is hit.  -> ok.txt
		[3] mode 2: to check snap222
			start: do_interrupt_all, triggered by 147. set bLog=1
			end: do_interrupt_all, 147 again, and set bLog=0
		[4] observation:
			compared ok.txt and nok.txt, the difference starts from
	nok.txt
			1630 ins @80518611 (cr3: 39000): xchg    [ecx], eax
			1631 ins @80518613 (cr3: 39000): test    eax, eax
			1632 ins @80518615 (cr3: 39000): jnz 0x0000001E
			1633 ins @80518633 (cr3: 39000): ret 0x0004
	ok.txt
			1630 ins @80518611 (cr3: b35e000): xchg  [ecx], eax
			1631 ins @80518613 (cr3: b35e000): test  eax, eax
			1632 ins @80518615 (cr3: b35e000): jnz   0x0000001E
			1633 ins @80518617 (cr3: b35e000): and   [-0x7FAAC520], eax
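Finding the first point of divergence between ok.txt and nok.txt can be automated. The sketch below assumes the "ins @ADDR (cr3: X): TEXT" line format shown above and deliberately ignores the cr3 field, since the two snapshots run in different address spaces; the function names are invented.

```python
import re

# Matches e.g. "1632 ins @80518615 (cr3: 39000): jnz 0x0000001E"
LINE_RE = re.compile(r"ins @([0-9a-fA-F]+) \(cr3: [0-9a-fA-F]+\): (.*)")


def parse_trace(lines):
    """Return a list of (address, normalized instruction text), skipping other lines."""
    out = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            out.append((int(m.group(1), 16), " ".join(m.group(2).split())))
    return out


def first_divergence(trace_a, trace_b):
    """Index of the first differing entry, or -1 if the traces are identical."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))
    return -1


ok = parse_trace([
    "1632 ins @80518615 (cr3: b35e000): jnz   0x0000001E",
    "1633 ins @80518617 (cr3: b35e000): and   [-0x7FAAC520], eax",
])
nok = parse_trace([
    "1632 ins @80518615 (cr3: 39000): jnz 0x0000001E",
    "1633 ins @80518633 (cr3: 39000): ret 0x0004",
])
print(first_divergence(ok, nok))   # index into the parsed lists, not the file line number
```

For whole files, `parse_trace(open("ok.txt"))` and `parse_trace(open("nok.txt"))` would feed the same comparison.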
10:00AM
	[5] figure out the logic of the key processing algorithm and check how it gets to swapcontext.
		Approach: use WinDbg first dump !idt -a and check the key event handler, it's
			ace36e200000031:	899cc15c i8042prt!I8042KeyboardInterruptService (KINTERRUPT 899cc120)
			Verified: it's the ISR that handles keyboard event.
			First couple of instructions (from WinDbg dumped below):
---------------
kd> uf i8042prt!I8042KeyboardInterruptService
i8042prt!I8042KeyboardInterruptService:
ba9a8495 6a18            push    18h
ba9a8497 68a8b79aba      push    offset i8042prt!`string'+0x154 (ba9ab7a8)
ba9a849c e8ff000000      call    i8042prt!_SEH_prolog (ba9a85a0)
ba9a84a1 8b7d0c          mov     edi,dword ptr [ebp+0Ch]
ba9a84a4 8b7728          mov     esi,dword ptr [edi+28h]
ba9a84a7 837e3001        cmp     dword ptr [esi+30h],1
ba9a84ab 0f854f010000    jne     i8042prt!I8042KeyboardInterruptService+0xa2 (ba9a8600)
----------
	It corresponds to the following in ok.txt, recorded for (snap111) [by search "push.*18"]
---------
ins @f85c0495 (cr3: b35e000): push  0x18
ins @f85c0497 (cr3: b35e000): push  0xF85C37A8
ins @f85c049c (cr3: b35e000): call  0x00000104
ins @f85c05a0 (cr3: b35e000): push  0xF85C3274

	[6] figure out the key ISR logic (and see how it reaches the SwapContext). Logic comments
are in ok_comment.txt (in qemu_image)
	Summary of the logic of I8042KeyboardInterruptService:
		(1)  call SEH_prolog
		(2) perform in al, 0x63 (port 0x63)  [it's one of the 8042 i/o ps2 device ports]
		(3) call I8xGetBytesAsynchronousA
		(4) call I8xQueueCurrentKeyboardInput
		(5) _SEH_epilog
		(6) back to KiInterruptDispatch

(WRONG)	Conclusion: keyboard still does not DIRECTLY trigger SwapContext. It still needs
	something like timer interrupt.

	[7] compare with nok.txt, see if all major points are there. All there
	Conclusion: confirmed. It's the problem of timer interrupt.

	[8] observe the rest of the ok_comment.txt and see how SwapContext is actually triggered.
	81f8f5da f744247000000200 test    dword ptr [esp+70h],20000h

12:15PM
	[9] The conclusion above IS NOT RIGHT. The last_intno report of SwapContext is 147.
	So it must be the second 147 handler that triggers it. Check using WinDbg:
		[1] ba e1 I8042KeyboardInterruptService
		[2] hit it the second time and then ba e1 SwapContext and see what's going on. 
	After iretd of the 2nd I8042KeyboardInterruptService, it enters 
		intelppm.-> popProcessorIdle -> KiIdleLoop -> HalClearSoftwareInterrupt -> 
		KiRetireDPCList -> then SwapContext!!!
	Mark the above in ok_comment.txt

		Problem: but in the qemu version, it may be in a completely different context (when
	the key is pressed).

		--> check if KiIdleLoop is ever executed???

	[10] identify KiIdleLoop using WinDbg.
		KiIdleLoop code from WinDbg:
		----------------
		nt!KiIdleLoop+0x10:
		80540cc0 fb              sti
		80540cc1 90              nop
		80540cc2 90              nop
		80540cc3 fa              cli
		80540cc4 3b6d00          cmp     ebp,dword ptr [ebp]
		80540cc7 740d            je      nt!KiIdleLoop+0x26 (80540cd6)

		nt!KiIdleLoop+0x19:
		80540cc9 b102            mov     cl,2
		80540ccb ff15a8764d80    call    dword ptr [nt!_imp_HalClearSoftwareInterrupt (804d76a8)]
		80540cd1 e841000000      call    nt!KiRetireDpcList (80540d17)

	Pattern sti nop nop cli is unique, capture it.
		[a] implement isIdIdleLoop.
		[b] test it. It's never hit for the 4 instructions version.
		[c] conclusion: KiIdleLoop is NEVER hit!!!!!!
		Conclusion: it may be because the KiIdleLoop itself relies on timer interrupt to be
	scheduled. (so for snap111 it's a problem?)

10:00PM
	[11] go back to analyze ok_comment.txt
			The problem is what happened after the 2nd keyboard event returns. in WinDbg it's
	going to KiIdleLoop. But in QEMU's version, it goes directly to call of SwapContext
	immediately.
		[11.1] Need to find out:
			[1] dump memory: add function print_mem, e.g., to print the bytes at 0x804dbe90
				print_mem(0x804dbe90, 16, env)
			[2] windbg search
					s 80000000 L2000000 90 90 90
			"search for 3 nops between 80000000 and 82000000"
		[11.2] search for 
			@804dbe0f (cr3: b35e000): cli
			ins @804dbe10 (cr3: b35e000): cmp   eax, [eax]
			--- more details here
			@EIP 0x804dbe0f: length: (1): cli
			@EIP 0x804dbe10: length: (2): cmpl      (%eax), %eax
			@EIP 0x804dbe12: length: (2): jz        0x0000001F
		-- sometimes need to call print_mem twice because of page fault, the 5 bytes are
		below: 0x804dbe0f: fa 3b 00 74 1d 
		Search in WinDbg: s 80000000 L70000000 fa 3b 00 74 1d, find two and identified one
		*** KiDispatchInterrupt!
		80540a06 8d8380090000    lea     eax,[ebx+980h]
		80540a0c fa              cli
		80540a0d 3b00            cmp     eax,dword ptr [eax]
		80540a0f 741d            je      nt!KiDispatchInterrupt+0x2e (80540a2e)

		nt!KiDispatchInterrupt+0x11:
		80540a11 55              push    ebp
		80540a12 ff33            push    dword ptr [e
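
The signature search used in [11.2] can be sketched as a plain byte-pattern scan,
similar in spirit to the WinDbg s command. This assumes the guest memory range has
already been dumped into a host buffer; find_bytes is a hypothetical helper, not
part of QEMU or of this tree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns the offset of the first occurrence of `pat` in `buf` at or after
 * `start`, or -1 if there is none. Call again with start = previous hit + 1
 * to enumerate further matches.
 */
long find_bytes(const uint8_t *buf, size_t len, size_t start,
                const uint8_t *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > len)
        return -1;
    for (size_t i = start; i + pat_len <= len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    }
    return -1;
}
```

With the bytes of the KiDispatchInterrupt listing above (starting at the lea), the
5-byte pattern fa 3b 00 74 1d is found at offset 6, which corresponds to the cli at
80540a0c.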

		[11.3] now continue to analyze the logic of KiDispatchInterrupt!
			Analysis done in ok_comment.txt
			Basic logic follows the source code in ReactOS for KiDispatchInterrupt
			
			It takes the else-if (prcb->NextThread) branch and swaps in a thread.

		[11.4] compare ok_comment.txt and nok.txt, mainly check the important calls
			The difference is that prcb->QuantumEnd is 0 and prcb->NextThread is 0
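
To make the comparison concrete, here is a reduced model of the branch structure
these notes attribute to KiDispatchInterrupt (QuantumEnd first, otherwise
NextThread). The struct, enum and function names are hypothetical, and the DPC
retirement step is left out; this is a sketch of the decision only, not the
ReactOS or Windows code.

```c
#include <stdint.h>

/* Hypothetical, reduced view of the two PRCB fields the notes compare. */
struct prcb_view {
    uint32_t quantum_end;   /* prcb->QuantumEnd */
    uint32_t next_thread;   /* prcb->NextThread, 0 means none */
};

enum dispatch_path { PATH_NONE, PATH_QUANTUM_END, PATH_SWAP_NEXT_THREAD };

/* Reports which branch would be taken and clears the field that branch consumes. */
enum dispatch_path dispatch_path_taken(struct prcb_view *p)
{
    if (p->quantum_end) {
        p->quantum_end = 0;            /* flag is reset, then KiQuantumEnd runs */
        return PATH_QUANTUM_END;
    } else if (p->next_thread) {
        p->next_thread = 0;            /* cleared as the thread is switched in */
        return PATH_SWAP_NEXT_THREAD;
    }
    return PATH_NONE;                  /* both 0: no context switch */
}
```

With both fields at 0 the model returns PATH_NONE, i.e. no thread swap happens.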

****************************************************************************************
Conclusion: it's the guy who updated prcb->NextThread (the snap222 never has prcb->NextThread
updated). Needs to check: (interrupt 177 in QEMU and 0x30 in WinDbg)
****************************************************************************************
*** WinDbg:
hal!HalpClockInterrupt:
806d4d50 54              push    esp
806d4d51 55              push    ebp
806d4d52 53              push    ebx

*** QEMU:
	1 ins @81f8f5c4: push esp
    2 ins @81f8f5c5: push ebp
    3 ins @81f8f5c6: push ebx

 !!!!! KiQuantumEnd resets prcb->NextThread!

8:15AM 02/07/2014
	[12] Design: identify who's updating prcb->NextThread
		[1] in WinDbg find who's updating prcb->NextThread
			[a] bp on  KiDispatchInterrupt and find the check on prcb->NextThread
				prcb->NextThread is located at ffdff128
			[b] set a write BP on prcb->NextThread
				Observation: prcb->NextThread is modified in 
				(1) KiQuantumEnd (when it's found
					that the QuantumEnd flag is set, switch thread).
				(2) also in the else-if branch of KiDispatchInterrupt (when prcb->NextThread
					is found to be not null), this time to switch and clear it (see ReactOS)
				(3) KiReadyThread <- ExReleaseResource
				(4) KiUnlockDatabase
				(5) KiReadyThread<- .... <-win32!CreateSystemThread
				(6) KiAdjustQuantumThread ...
				too many to analyze.
9:45AM
	[13] Conjecture: HalpClockInterrupt calls KeUpdateSystemTime which then calls
			KiCheckForTimerExpiration then sets the software interrupt. Check if there is
			any difference between snap111 and snap222.
			[13.1]  with the help of WinDbg, provide the annotated comments of HalpClockInterrupt
below
				Steps: bp on do_interrupt_all if intno==177, and then comparative study of code
************************************************************************************
HalpClockInterrupt
*********************************************************************************** 
@EIP 0x81f8f5c4: length: (1): push      %esp
@EIP 0x81f8f5c5: length: (1): push      %ebp
@EIP 0x81f8f5c6: length: (1): push      %ebx
@EIP 0x81f8f5c7: length: (1): push      %esi
@EIP 0x81f8f5c8: length: (1): push      %edi
@EIP 0x81f8f5c9: length: (3): sub       $0x54, %esp
@EIP 0x81f8f5cc: length: (2): mov       %esp, %ebp
@EIP 0x81f8f5ce: length: (4): movl      %eax, 0x44(%esp)
@EIP 0x81f8f5d2: length: (4): movl      %ecx, 0x40(%esp)
@EIP 0x81f8f5d6: length: (4): movl      %edx, 0x3C(%esp)
@EIP 0x81f8f5da: length: (8): testl     $0x00020000, 0x70(%esp)
@EIP 0x81f8f5e2: length: (6): jnz       0x00000130 #jne hal!V86_Hci_a 
@EIP 0x81f8f5e8: length: (6): cmpw      $0x08, 0x6C(%esp)
@EIP 0x81f8f5ee: length: (2): jz        0x00000025
@EIP 0x81f8f5f0: length: (4): movw      %fs, 0x50(%esp) #copy segment registers
@EIP 0x81f8f5f4: length: (4): movw      %ds, 0x38(%esp)
@EIP 0x81f8f5f8: length: (4): movw      %es, 0x34(%esp)
@EIP 0x81f8f5fc: length: (4): movw      %gs, 0x30(%esp)
@EIP 0x81f8f600: length: (5): mov       $0x00000030, %ebx
@EIP 0x81f8f605: length: (5): mov       $0x00000023, %eax
@EIP 0x81f8f60a: length: (3): mov       %bx, %fs
@EIP 0x81f8f60d: length: (3): mov       %ax, %ds
@EIP 0x81f8f610: length: (3): mov       %ax, %es
@EIP 0x81f8f613: length: (7): movl      %fs:0x0, %ebx #set fs:[0] the SEH handler pointer 
@EIP 0x81f8f61a: length: (11): movl     $0xFFFFFFFF, %fs:0x0 #set to 0xFFFF (for kernel)
@EIP 0x81f8f625: length: (4): movl      %ebx, 0x4C(%esp)
@EIP 0x81f8f629: length: (6): cmp       $0x00010000, %esp
@EIP 0x81f8f62f: length: (6): jc        0x000000BB # Abios_Hci_a (checking intno)
@EIP 0x81f8f635: length: (8): movl      $0x00000000, 0x64(%esp)
@EIP 0x81f8f63d: length: (1): cld
@EIP 0x81f8f63e: length: (3): movl      0x60(%ebp), %ebx
@EIP 0x81f8f641: length: (3): movl      0x68(%ebp), %edi
@EIP 0x81f8f644: length: (3): movl      %edx, 0xC(%ebp)
@EIP 0x81f8f647: length: (7): movl      $0xBADB0D00, 0x8(%ebp)
@EIP 0x81f8f64e: length: (3): movl      %ebx, (%ebp)
@EIP 0x81f8f651: length: (3): movl      %edi, 0x4(%ebp)
@EIP 0x81f8f654: length: (7): testb     $0xFF, 0xFFDFF050
@EIP 0x81f8f65b: length: (2): jnz       0x0000000F #jne hal!Dr_Hci_a (another handler)
# ***********************************************************************
#in the following (HalpClockInterrupt+0x99) the code is completely different
# ***********************************************************************
@EIP 0x81f8f65d: length: (5): mov       $0x81F8F588, %edi
# -------- to find out the call, trace into the long jump and get the signature instructions
# -------  then search in WinDbg!!!
@EIP 0x81f8f662: length: (5): ljmp      0xFE54B700  #!!! jump to KiInterruptDispatch (not KiDispatchInterrupt, which is at 804dbe03)!!!
		# the rest of the code will actually never be hit!
		# continue to ljmp FE54B700! (804dad62)

@EIP 0x804dad62: length: (6): incl      0xFFDFF5C4
@EIP 0x804dad68: length: (2): mov       %esp, %ebp
@EIP 0x804dad6a: length: (3): movl      0x24(%edi), %eax
@EIP 0x804dad6d: length: (3): movl      0x29(%edi), %ecx
@EIP 0x804dad70: length: (1): push      %eax
@EIP 0x804dad71: length: (3): sub       $0x04, %esp
@EIP 0x804dad74: length: (1): push      %esp
@EIP 0x804dad75: length: (1): push      %eax
@EIP 0x804dad76: length: (1): push      %ecx
@EIP 0x804dad77: length: (6): lcall     *0x804D75D8 # call _imp_HalBeginSystemInterrupt
		#HalBeginSystemInterrupt mainly sets up the interrupt vector 
@EIP 0x804dad7d: length: (2): or        %eax, %eax
@EIP 0x804dad7f: length: (2): jz        0x00000038
@EIP 0x804dad81: length: (3): sub       $0x0C, %esp
@EIP 0x804dad84: length: (7): cmpl      $0x00, 0x8056198C
@EIP 0x804dad8b: length: (7): movl      $0x00000000, -0xC(%ebp)
@EIP 0x804dad92: length: (2): jnz       0x0000002D
@EIP 0x804dad94: length: (3): movl      0x1C(%edi), %esi
@EIP 0x804dad97: length: (3): movl      0x10(%edi), %eax
@EIP 0x804dad9a: length: (1): push      %eax       #eax is the interrupt context
@EIP 0x804dad9b: length: (1): push      %edi       #edi should be the interrupt number
@EIP 0x804dad9c: length: (3): lcall     *0xC(%edi) # call the interrupt service routine
#------------- ****************************************************************
# --- see what is the interrupt service routine 
# It's  f850c31e
# -----------------------------------------------------------------------------

# //problem could not find the corresponding code in WinDbg!
#!!!! got to use lm command to display memory range first
# find that 0xf8... mem range corresponds to 0xb0... range!
# use 0xf850c334 9 bytes as signature to search.
#!!!!! corresponds to ACPIInterruptServiceRoutine!!!!
#------------------------------------------------------------------------
@EIP 0xf850c31e: length: (2): mov       %edi, %edi
@EIP 0xf850c320: length: (1): push      %ebp
@EIP 0xf850c321: length: (2): mov       %esp, %ebp
@EIP 0xf850c323: length: (1): push      %ecx
@EIP 0xf850c324: length: (1): push      %ecx
@EIP 0xf850c325: length: (1): push      %ebx
@EIP 0xf850c326: length: (1): push      %esi
@EIP 0xf850c327: length: (1): push      %edi
@EIP 0xf850c328: length: (5): lcall     0x0000F84C #ACPIIoReadPm1Status
@EIP 0xf850c32d: length: (2): mov       %eax, %ebx
@EIP 0xf850c32f: length: (5): lcall     0xFFFFEA35 #ACPIGpeIsEvent
@EIP 0xf850c334: length: (2): test      %al, %al
@EIP 0xf850c336: length: (5): mov       $0x00010000, %edi
@EIP 0xf850c33b: length: (2): jz        0x00000004
@EIP 0xf850c33d: length: (2): or        %edi, %ebx
@EIP 0xf850c33f: length: (7): testb     $0x01, 0xF851F279
@EIP 0xf850c346: length: (2): jnz       0x00000008
@EIP 0xf850c348: length: (2): test      %ebx, %ebx
@EIP 0xf850c34a: length: (2): jnz       0x00000004
@EIP 0xf850c34c: length: (2): mov       %edi, %ebx
@EIP 0xf850c34e: length: (2): mov       %ebx, %esi
@EIP 0xf850c350: length: (3): and       $0x11, %esi
@EIP 0xf850c353: length: (3): movl      %esi, -0x4(%ebp)
@EIP 0xf850c356: length: (2): jz        0x0000001B
@EIP 0xf850c358: length: (1): push      %esi
@EIP 0xf850c359: length: (5): lcall     0x0000F45B # call CLEAR_PM1_STATUS_BITS
@EIP 0xf850c35e: length: (3): test      $0x01, %bl
@EIP 0xf850c361: length: (2): jz        0x0000000A
@EIP 0xf850c363: length: (5): movl      0xF851F590, %eax #PmHalDispatchTable
@EIP 0xf850c368: length: (3): lcall     *0xC(%eax) #LOOKS LIKE THE REAL HANDLER
# ---------------------- ACPITimerCarry
@EIP 0x806f4e08: length: (1): push      %ebx
@EIP 0x806f4e09: length: (6): movl      0x806F90A8, %edx #edx->hal!TimerInfo
@EIP 0x806f4e0f: length: (1): in        %dx, %eax  #MUST BE READING CLOCK VALUE
@EIP 0x806f4e10: length: (2): mov       %eax, %ebx
@EIP 0x806f4e12: length: (6): movl      0x806F90B8, %ecx # three attributes of hal!TimerInfo
@EIP 0x806f4e18: length: (5): movl      0x806F90AC, %eax
@EIP 0x806f4e1d: length: (6): movl      0x806F90B0, %edx
@EIP 0x806f4e23: length: (2): add       %ecx, %eax
@EIP 0x806f4e25: length: (3): adc       $0x00, %edx
@EIP 0x806f4e28: length: (2): xor       %eax, %ebx
@EIP 0x806f4e2a: length: (2): and       %ecx, %ebx
@EIP 0x806f4e2c: length: (2): add       %ebx, %eax
@EIP 0x806f4e2e: length: (3): adc       $0x00, %edx
@EIP 0x806f4e31: length: (6): movl      %edx, 0x806F90B4 #update the three attributes of TimerInfo
@EIP 0x806f4e37: length: (5): movl      %eax, 0x806F90AC
@EIP 0x806f4e3c: length: (6): movl      %edx, 0x806F90B0
@EIP 0x806f4e42: length: (1): pop       %ebx
@EIP 0x806f4e43: length: (1): ret
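
The arithmetic of the routine above, transcribed into C as a sketch. Only the
add/adc/xor/and sequence is taken from the trace; the parameter names and their
meanings (64-bit accumulated value, mask, hardware counter) are assumptions, and
the port read and the stores back to hal!TimerInfo are left to the caller.

```c
#include <stdint.h>

/*
 * acc      : 64-bit value built from the dwords at 0x806F90AC (low) and 0x806F90B0 (high)
 * msb_mask : the dword read from 0x806F90B8
 * hw_now   : the value returned by `in %dx, %eax`
 * Field meanings are assumed; the arithmetic follows the trace line by line.
 */
uint64_t timer_carry_step(uint64_t acc, uint32_t msb_mask, uint32_t hw_now)
{
    acc += msb_mask;                                     /* add %ecx,%eax ; adc $0,%edx */
    uint32_t fix = (hw_now ^ (uint32_t)acc) & msb_mask;  /* xor %eax,%ebx ; and %ecx,%ebx */
    acc += fix;                                          /* add %ebx,%eax ; adc $0,%edx */
    return acc;
}
```

The second addition only fires when the masked bit of the freshly read counter
disagrees with the same bit of the accumulated value.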

# ----- back to ACPIInterruptService
@EIP 0xf850c36b: length: (2): mov       %esi, %eax 
@EIP 0xf850c36d: length: (2): not       %eax
@EIP 0xf850c36f: length: (2): and       %eax, %ebx
@EIP 0xf850c371: length: (2): test      %ebx, %ebx
@EIP 0xf850c373: length: (2): jz        0x00000061
# -- will jump 0x61 bytes ahead and skip the following (most likely)
#-- SEEMS TO BE AFFECTED BY THE TimerCarry results (ebx value)
#6:45PM test if 0xf850c375 is ever hit.
# ??? actually the following until 0xf850c3d6 is never hit
# -- in WinDbg it IS HIT? 
@EIP 0xf850c375: length: (3): movl      0xC(%ebp), %esi
@EIP 0xf850c378: length: (3): add       $0x30, %esi
@EIP 0xf850c37b: length: (2): movl      (%esi), %eax
@EIP 0xf850c37d: length: (2): not       %eax
@EIP 0xf850c37f: length: (2): test      %eax, %ebx
@EIP 0xf850c381: length: (2): jnz       0x00000004
@EIP 0xf850c383: length: (2): or        %edi, %ebx
@EIP 0xf850c385: length: (2): test      %ebx, %edi
@EIP 0xf850c387: length: (2): jz        0x00000009
@EIP 0xf850c389: length: (2): push      $0x00
@EIP 0xf850c38b: length: (5): lcall     0xFFFFE7F9
@EIP 0xf850c390: length: (1): push      %ebx
@EIP 0xf850c391: length: (5): lcall     0x0000F423
@EIP 0xf850c396: length: (2): movl      (%esi), %eax
@EIP 0xf850c398: length: (5): mov       $0x80000000, %edi
@EIP 0xf850c39d: length: (2): or        %edi, %ebx
@EIP 0xf850c39f: length: (2): mov       %eax, %edx
@EIP 0xf850c3a1: length: (1): push      %eax
@EIP 0xf850c3a2: length: (2): or        %ebx, %edx
@EIP 0xf850c3a4: length: (2): mov       %esi, %ecx
@EIP 0xf850c3a6: length: (3): movl      %eax, -0x8(%ebp)
@EIP 0xf850c3a9: length: (6): lcall     *0xF851C314
@EIP 0xf850c3af: length: (3): cmpl      %eax, -0x8(%ebp)
@EIP 0xf850c3b2: length: (2): jnz       0xFFFFFFED
@EIP 0xf850c3b4: length: (2): not       %eax
@EIP 0xf850c3b6: length: (2): and       %ebx, %eax
@EIP 0xf850c3b8: length: (3): orl       %eax, -0x4(%ebp)
@EIP 0xf850c3bb: length: (3): testl     %edi, -0x4(%ebp)
@EIP 0xf850c3be: length: (2): jz        0x00000013
@EIP 0xf850c3c0: length: (3): movl      0xC(%ebp), %eax
@EIP 0xf850c3c3: length: (2): push      $0x00
@EIP 0xf850c3c5: length: (2): push      $0x00
@EIP 0xf850c3c7: length: (3): add       $0x34, %eax
@EIP 0xf850c3ca: length: (1): push      %eax
@EIP 0xf850c3cb: length: (6): lcall     *0xF851C390
@EIP 0xf850c3d1: length: (3): movl      -0x4(%ebp), %esi
@EIP 0xf850c3d4: length: (2): xor       %eax, %eax
#--- the above is never hit in snap111 --- strange, directly return
@EIP 0xf850c3d6: length: (1): pop       %edi
@EIP 0xf850c3d7: length: (2): test      %esi, %esi
@EIP 0xf850c3d9: length: (1): pop       %esi
@EIP 0xf850c3da: length: (3): setnz     %al
@EIP 0xf850c3dd: length: (1): pop       %ebx
@EIP 0xf850c3de: length: (1): leave
@EIP 0xf850c3df: length: (3): ret       $0x0008

#---------- back to KiInterruptDispatch (the code reached by the ljmp to 804dad62)
@EIP 0x804dad9f: length: (4): cmpl      $0x00, -0xC(%ebp)
@EIP 0x804dada3: length: (2): jnz       0x00000045
@EIP 0x804dada5: length: (3): add       $0x0C, %esp
@EIP 0x804dada8: length: (1): cli
@EIP 0x804dada9: length: (6): lcall     *0x804D75DC #call _HalEndSystemInterrupt
@EIP 0x804dadaf: length: (5): ljmp      0x00004B4C

#-------------- NEW TO ANALYZE!!! 9:30AM 02/08/2014
# -- this is KiExceptionExit
@EIP 0x804df8fb: length: (1): cli
@EIP 0x804df8fc: length: (7): testl     $0x00020000, 0x70(%ebp) #check IRQ is 0x20000 (timer)
@EIP 0x804df903: length: (2): jnz       0x00000008
@EIP 0x804df905: length: (4): testb     $0x01, 0x6C(%ebp) #check IRQL is 1
@EIP 0x804df909: length: (2): jz        0x00000036
#--------- the following will be skipped (however, it will sometimes be hit)
#in snap111 it is hit especially after sending a key
#in snap222 it never fired
# the following is HIT only if (0x20000 is set in [EBP+70h] || bit 0 of [EBP+6Ch] is set)
# i.e. either test succeeding enters it; otherwise the jz at 0x804df909 skips it.
@EIP 0x804df90b: length: (6): movl      0xFFDFF124, %ebx #FFDFF124 points KTHREAD, now ebx has Thread
@EIP 0x804df911: length: (4): movb      $0x00, 0x2E(%ebx)  #  set _KTHREAD->Alerted to 0
@EIP 0x804df915: length: (4): cmpb      $0x00, 0x4A(%ebx)  # _KTHREAD->ApcState->UserAPCPending
@EIP 0x804df919: length: (2): jz        0x00000026 #if no ApcState->UserAPCPending skip following
@EIP 0x804df91b: length: (2): mov       %ebp, %ebx
@EIP 0x804df91d: length: (5): mov       $0x00000001, %ecx
@EIP 0x804df922: length: (6): lcall     *0x804D7648 #call _imp_KfRaiseIrql
@EIP 0x804df928: length: (1): push      %eax
@EIP 0x804df929: length: (1): sti
@EIP 0x804df92a: length: (1): push      %ebx
@EIP 0x804df92b: length: (2): push      $0x00
@EIP 0x804df92d: length: (2): push      $0x01
@EIP 0x804df92f: length: (5): lcall     0x00005F3E #call KiDeliverApc
@EIP 0x804df934: length: (1): pop       %ecx
@EIP 0x804df935: length: (6): lcall     *0x804D7670 #call _imp_KfLowerIrql
@EIP 0x804df93b: length: (1): cli
@EIP 0x804df93c: length: (2): ljmp      0xFFFFFFCF
@EIP 0x804df93e: length: (1): nop
#----------- the above will be skipped
@EIP 0x804df93f: length: (4): movl      0x4C(%esp), %edx
@EIP 0x804df943: length: (7): movl      %fs:0x50, %ebx
@EIP 0x804df94a: length: (7): movl      %edx, %fs:0x0
@EIP 0x804df951: length: (6): test      $0x000000FF, %ebx
@EIP 0x804df957: length: (2): jnz       0x00000050
@EIP 0x804df959: length: (8): testl     $0x00020000, 0x70(%esp)
@EIP 0x804df961: length: (6): jnz       0x000000C7
@EIP 0x804df967: length: (7): testw     $0xFFF8, 0x6C(%esp)
@EIP 0x804df96e: length: (2): jz        0x00000079
@EIP 0x804df970: length: (4): movl      0x3C(%esp), %edx
@EIP 0x804df974: length: (4): movl      0x40(%esp), %ecx
@EIP 0x804df978: length: (4): movl      0x44(%esp), %eax
@EIP 0x804df97c: length: (5): cmpw      $0x08, 0x6C(%ebp)
@EIP 0x804df981: length: (2): jz        0x0000000E
@EIP 0x804df983: length: (3): leal      0x30(%ebp), %esp
@EIP 0x804df986: length: (2): pop       %gs
@EIP 0x804df988: length: (1): pop       %es
@EIP 0x804df989: length: (1): pop       %ds
@EIP 0x804df98a: length: (3): leal      0x50(%ebp), %esp
@EIP 0x804df98d: length: (2): pop       %fs
@EIP 0x804df98f: length: (3): leal      0x54(%ebp), %esp
@EIP 0x804df992: length: (1): pop       %edi
@EIP 0x804df993: length: (1): pop       %esi
@EIP 0x804df994: length: (1): pop       %ebx
@EIP 0x804df995: length: (1): pop       %ebp
@EIP 0x804df996: length: (7): cmpw      $0x0080, 0x8(%esp)
@EIP 0x804df99d: length: (6): ja        0x000000A7
@EIP 0x804df9a3: length: (3): add       $0x04, %esp
@EIP 0x804df9a6: length: (1): iret
# ------------ return now
# ---- FINISH ????? did not update timer etc.

Summary:  ClockInterrupt -> ACPIInterruptService -> ACPITimerCarry -> ExceptionExit
Seems no important calls placed. 

	Conjecture: if disable TimerCarry what would happen?
	[1] check if snap222 is also calling TimerCarry.
@EIP 0xf850c368: length: (3): lcall     *0xC(%eax) #LOOKS LIKE THE REAL HANDLER
# ---------------------- ACPITimerCarry
...
# ----- back to ACPIInterruptService
@EIP 0xf850c36b: length: (2): mov       %esi, %eax 
	[2] test snap111 and snap222
			add a conditional bp at helper_trace2 (on f850c368)
			both are hit
	[3] try disable 0xf850c368 by directly setting the eip_in to f850c36b
			see the results of snap111 and snap222
			verified, it's NOT ACPITimerCarry which causes the difference.
	[4] try disable the entire service by replacing eip
@EIP 0x81f8f5c4: length: (1): push      %esp with
@EIP 0x804df9a6: length: (1): iret
		Does not work in helper_trace2. It has to be done in disas_insn() in translate.c
		Translate the instruction to iret (0xcf).
		change is in translate.c:4425
		also bp on ops_sse.h:2423
	Verified: if the interrupt is disabled (iret) directly, it would not even trigger the
 timer interrupt again (because it ignores the READ_REG_C).
	[5] try disabling 0xf850c368 (call of ACPITimerCarry) by replacing the
		three bytes of that instruction at 0xf850c368, 0xf850c369, 0xf850c36a with NOPs.
		See if it would change the behavior of snap111.
		Verified, disabling ACPITimerCarry does not actually affect it. Need to find another one.
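
A sketch of the NOP patch from step [5], assuming the guest code bytes are available
in a writable host buffer. patch_nops and its parameters are hypothetical; in the
real experiment the change was made inside the emulator, not on a flat buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP 0x90

/*
 * Overwrites `n` bytes starting at code[off] with NOP (0x90).
 * Returns 0 on success, -1 if the range does not fit in the buffer.
 */
int patch_nops(uint8_t *code, size_t code_len, size_t off, size_t n)
{
    if (off > code_len || n > code_len - off)
        return -1;
    memset(code + off, X86_NOP, n);
    return 0;
}
```

For the 3-byte lcall at 0xf850c368 the call would use n = 3, so the next
instruction at 0xf850c36b stays intact.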
	[6] try disable the lcall interrupt service routine at 0x804dad9c
			 
@EIP 0x804dad9c: length: (3): lcall     *0xC(%edi) # call the interrupt service routine
		verified: cannot disable. It disrupts the entire service.

	[7] disable  the ACPIGpeIsEvent
@EIP 0xf850c32f: length: (5): lcall     0xFFFFEA35 #ACPIGpeIsEvent
@EIP 0xf850c334: length: (2): test      %al, %al
		Verified: it can actually be disabled.

	[8] disable  _imp_HalBeginSystemInterrupt
@EIP 0x804dad77: length: (6): lcall     *0x804D75D8 # call _imp_HalBeginSystemInterrupt
		#HalBeginSystemInterrupt mainly sets up the interrupt vector 
@EIP 0x804dad7d: length: (2): or        %eax, %eax
		blue screen.

	[9]  try ACPITimer again

@EIP 0xf850c368: length: (3): lcall     *0xC(%eax) #LOOKS LIKE THE REAL HANDLER
@EIP 0xf850c36b: length: (2): mov       %esi, %eax 
		Verified: ACPITimer can be disabled, with no effect.

---------------------
	[10]  --- TO DO find out the constant 0x20000 and EBP+70h (what is the data structure at
ESP?); find the KiExceptionExit branches that are hit.
		!!! it should be related to PCR or PRCB
		*** VERIFIED EBP+70h is the IRR field of _KPCR (offset 0x28)
		So _KPCR address is EBP+70h-28h = EBP+48h
		So we can infer that
			[EBP+70h] is the IRR field
			[EBP+6ch] is the IRQL field (irql LEVEL)
		KiExceptionExit is to verify if IRR is 0x20000 and IRQL is 1. According to the
calculation 32*IRQL + index of the set bit (0x20000 is bit 17), the intno related on WinDbg is
			17 + 32*1 = 49.
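
The calculation in [10] as a small worked example. The formula 32*level + bit index
is the one used in these notes and is not verified against any Windows
documentation; the function names are hypothetical.

```c
#include <stdint.h>

/* Index of the lowest set bit of v, or -1 if v is 0. */
int lowest_set_bit(uint32_t v)
{
    if (v == 0)
        return -1;
    int i = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        i++;
    }
    return i;
}

/* The notes' formula: intno = 32 * level + index of the set bit in mask. */
int intno_from_mask(uint32_t mask, unsigned level)
{
    int bit = lowest_set_bit(mask);
    if (bit < 0)
        return -1;
    return (int)(32u * level) + bit;
}
```

For mask 0x20000 and level 1 this gives 17 + 32 = 49 (0x31).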
	[11] It seems that timer interrupt cannot be disabled because of the following line
@EIP 0x804df92f: length: (5): lcall     0x00005F3E #call KiDeliverApc

		Verified: if disabled, the entire qemu system is frozen, not accepting commands
and also blocked gdb.


	[12] Check again how the part of the code is visited
@EIP 0x804df8fc: length: (7): testl     $0x00020000, 0x70(%ebp) #check IRQ is 0x20000 (timer)
@EIP 0x804df903: length: (2): jnz       0x00000008
@EIP 0x804df905: length: (4): testb     $0x01, 0x6C(%ebp) #check IRQL is 1
@EIP 0x804df909: length: (2): jz        0x00000036
#--------- the following will be skipped (however, it will sometimes be hit)
#in snap111 it is hit especially after sending a key
#in snap222 it never fired
# the following is HIT only if (0x20000 is set in [EBP+70h] || bit 0 of [EBP+6Ch] is set)
# i.e. either test succeeding enters it; otherwise the jz at 0x804df909 skips it.
@EIP 0x804df90b: length: (6): movl      0xFFDFF124, %ebx #FFDFF124 points KTHREAD, now ebx has Thread
@EIP 0x804df911: length: (4): movb      $0x00, 0x2E(%ebx)  #  set _KTHREAD->Alerted to 0
@EIP 0x804df915: length: (4): cmpb      $0x00, 0x4A(%ebx)  # _KTHREAD->ApcState->UserAPCPending
@EIP 0x804df919: length: (2): jz        0x00000026 #if no ApcState->UserAPCPending skip following

	*** observation: in snap111, most of these are coming from testb $0x01, 0x6C(%ebp) (check
IRQ1); in snap222, the same jz at 0x804df909 is hit, but it never falls through into the
0x804df90b branch.
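
The entry condition for the 0x804df90b block, read directly off the two test/jump
pairs at 0x804df8fc..0x804df909, can be written as a predicate. The parameter names
just describe where the values come from; what the two fields mean is not assumed
here, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Returns 1 if execution would reach 0x804df90b, 0 if the jz at 0x804df909
 * skips ahead to 0x804df93f.
 */
int enters_apc_check(uint32_t dword_at_ebp_70, uint8_t byte_at_ebp_6c)
{
    if (dword_at_ebp_70 & 0x00020000u)    /* testl $0x20000, 0x70(%ebp) ; jnz -> 0x804df90b */
        return 1;
    if ((byte_at_ebp_6c & 0x01u) == 0)    /* testb $0x01, 0x6C(%ebp) ; jz -> skip */
        return 0;
    return 1;                             /* fall through into 0x804df90b */
}
```

Logging the two inputs at 0x804df8fc in both snapshots would show which operand
keeps snap222 out of the branch.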

	*** find who's writing to 0x6c(%ebp) - not working. too many writes to it. 

	[13] Consider call of SwapContext again.
		@EIP 0x804dbec0: length: (1): pushf
	SwapContext is never invoked in snap222 except the first time.
    The hit of SwapContext is rare for 177 in snap111 as well.

9:00AM 02/10/2014
--------------------------------------------------------------------------------------
Task 208:  Find out why snap222 does not trigger SwapContext
--------------------------------------------------------------------------------------
Observation:
	[1] SwapContext is never called in snap222
	[2] _KPRCB->QuantumEnd is updated in KeUpdateRunTime, but it's never showing up in snap111.
[1] try locate KeUpdateRunTime.
	[1] get the signature string: 
	[2] add a function isKeUpdateRunTime: still could not get the KeUpdateRunTime. Strange???
	10:00AM 02/11/2014
	[3] try another snippet of isKeUpdateRunTime.
		Still too slow. Doing a dir does not work. It seems that before turning
	bLog=1, if we do dir, and then do it again it will work (pages loaded?)
		to improve the speed, comment out the disasm and sprintf line.
		Still could not capture any code of KeUpdateRunTime. give it up.
[2] observation: it seems _KPRCB->QuantumEnd is a fixed number
	Approach: read code of KiDispatchInterrupt, there is a branch about
		if(prcb->QuantumEnd){
			prcb->QuantumEnd= 0;
			KiQuantumEnd();
		}
		prcb->QuantumEnd (after setting a write BP on it) is modified in
	KeUpdateRunTime, where it is assigned the value of ESP.
	prcb->QuantumEnd is located at ffdff9ac.
	[2.1] in helper_mem function read/write set a bp and check who is accessing ffdff9ac.
		[1] read. Verified. it's been read in KiDispatchInterrupt. Code is only slightly
	different from the windbg version.
		[2] write. [1] performed at the end of KiDispatchInterrupt
				[2] found also where it is written, looks like KeUpdateRunTime
		observation: it's called much less frequently after the snapshot is loaded.	
				Identified: it's a part of KeUpdateRunTime! [that explains why, a while (>10 seconds)
		after the snapshot is loaded, KeUpdateRunTime is not hit again! strange!]
		The following is the partial code from QEMU:
############ YEAH!!! FINALLY FOUND the KeUpdateRunTime !!!! ##############################
############ !!!!!!!!!!!!!!!!!!!!!! KeUpdateRunTime !!!!!!!!!!!!!! in QEMU !!!!!!!!!!!!!
		@EIP 0x804e39c1: length: (4): subb      $0x03, 0x6F(%ebx) # CurrentThread->Quantum-=3
	### 0x6f(ebx), i.e. CurrentThread->Quantum, is NOT always located at the same place!!!
	### but usually they have values like 0x00000000fa (not big value)
		@EIP 0x804e39c5: length: (2): jg        0x0000001B #if >0, skip; else update QuantumEnd
		@EIP 0x804e39c7: length: (6): cmpl      0x12C(%eax), %ebx 
		@EIP 0x804e39cd: length: (2): jz        0x00000013
		@EIP 0x804e39cf: length: (6): movl      %esp, 0x9AC(%eax) #write ESP to prcb->QuantumEnd
		@EIP 0x804e39d5: length: (5): mov       $0x00000002, %ecx
		@EIP 0x804e39da: length: (6): lcall     *0x804D7654 #nt!_imp_HalRequestSoftwareInterrupt
		@EIP 0x804e39e0: length: (1): pop       %ebx
		@EIP 0x804e39e1: length: (3): ret       $0x0004
############ !!!!!!!!!!!!!!!!!!!!!! End of KeUpdateRunTime !!!!!!!!!!!!!! in QEMU !!!!!!!!!!!!!
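
The quantum logic in the snippet above, as a C sketch. The field offsets are from
the trace; treating the pointer compared at 0x12C(%eax) as the idle thread is an
assumption, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, reduced thread view: only the Quantum byte at offset 0x6F. */
struct thread_view {
    int8_t quantum;
};

/*
 * Charges 3 units to the current thread. Returns 1 if the quantum-end marker
 * was written (the point where the trace then requests software interrupt 2),
 * 0 otherwise. `marker` stands for the ESP value the trace stores.
 */
int charge_quantum(struct thread_view *cur, const struct thread_view *idle,
                   uint32_t *quantum_end, uint32_t marker)
{
    int q = (int)cur->quantum - 3;   /* subb $0x03, 0x6F(%ebx) */
    cur->quantum = (int8_t)q;
    if (q > 0)                       /* jg: quantum left, nothing to do */
        return 0;
    if (cur == idle)                 /* cmpl 0x12C(%eax), %ebx ; jz (assumed idle thread) */
        return 0;
    *quantum_end = marker;           /* movl %esp, 0x9AC(%eax) */
    return 1;
}
```

Since the stored value is ESP, QuantumEnd only needs to be non-zero for the later
check in KiDispatchInterrupt to fire.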

		[2.3] set a breakpoint  at 0x804e39c1 and 0x804e39cf and see how frequently they are hit.
		Observation: interestingly both are hit ONCE and ONLY ONCE!!!! why????

		[2.4] try to locate KeUpdateSystemTime.
			use the above two breakpoints, then set a bp on helper_trace2 and 
		display print_instrRange(eip_in, eip_in+1, env), until it hits the next instruction
		after ret.
############ YEAH!!!!!!! FINALLY FOUND THE KeUpdateSystemTime !!! ######################
@EIP 0x804e373d: length: (5): mov       $0xFFDF0000, %ecx
@EIP 0x804e3742: length: (3): movl      0x8(%ecx), %edi
@EIP 0x804e3745: length: (3): movl      0xC(%ecx), %esi
@EIP 0x804e3748: length: (2): add       %eax, %edi
@EIP 0x804e374a: length: (3): adc       $0x00, %esi
@EIP 0x804e374d: length: (3): movl      %esi, 0x10(%ecx)
@EIP 0x804e3750: length: (3): movl      %edi, 0x8(%ecx)
@EIP 0x804e3753: length: (3): movl      %esi, 0xC(%ecx)
@EIP 0x804e3756: length: (6): subl      %eax, 0x80551994
@EIP 0x804e375c: length: (5): movl      0x80551980, %eax
@EIP 0x804e3761: length: (2): mov       %eax, %ebx
@EIP 0x804e3763: length: (6): jg        0x0000008A
@EIP 0x804e3769: length: (5): mov       $0xFFDF0000, %ebx
@EIP 0x804e376e: length: (3): movl      0x14(%ebx), %ecx
@EIP 0x804e3771: length: (3): movl      0x18(%ebx), %edx
@EIP 0x804e3774: length: (6): addl      0x80551990, %ecx
@EIP 0x804e377a: length: (3): adc       $0x00, %edx
@EIP 0x804e377d: length: (3): movl      %edx, 0x1C(%ebx)
@EIP 0x804e3780: length: (3): movl      %ecx, 0x14(%ebx)
@EIP 0x804e3783: length: (3): movl      %edx, 0x18(%ebx)
@EIP 0x804e3786: length: (2): mov       %eax, %ebx
@EIP 0x804e3788: length: (2): mov       %eax, %ecx
@EIP 0x804e378a: length: (6): movl      0x80551984, %edx
@EIP 0x804e3790: length: (3): add       $0x01, %ecx
@EIP 0x804e3793: length: (3): adc       $0x00, %edx
@EIP 0x804e3796: length: (6): movl      %edx, 0x80551988
@EIP 0x804e379c: length: (6): movl      %ecx, 0x80551980
@EIP 0x804e37a2: length: (6): movl      %edx, 0x80551984
@EIP 0x804e37a8: length: (1): push      %eax
@EIP 0x804e37a9: length: (5): movl      0xFFDF0000, %eax
@EIP 0x804e37ae: length: (3): add       $0x01, %eax
@EIP 0x804e37b1: length: (2): jnc       0x00000008
@EIP 0x804e37b3: length: (6): incl      0x8055671C
@EIP 0x804e37b9: length: (5): movl      0x80556718, %eax
@EIP 0x804e37be: length: (7): imull     0x8055671C, %eax
@EIP 0x804e37c5: length: (2): add       %ecx, %eax
@EIP 0x804e37c7: length: (5): movl      %eax, 0xFFDF0000
@EIP 0x804e37cc: length: (1): pop       %eax
@EIP 0x804e37cd: length: (5): and       $0x000000FF, %eax
@EIP 0x804e37d2: length: (7): leal      -0x7FAA6400(,%eax,8), %ecx
@EIP 0x804e37d9: length: (2): movl      (%ecx), %edx
@EIP 0x804e37db: length: (2): cmp       %edx, %ecx
@EIP 0x804e37dd: length: (2): jz        0x0000000E
@EIP 0x804e37df: length: (3): cmpl      -0x4(%edx), %esi
@EIP 0x804e37e2: length: (2): jc        0x00000009
@EIP 0x804e37e4: length: (2): ja        0x00000027
@EIP 0x804e37e6: length: (3): cmpl      -0x8(%edx), %edi
@EIP 0x804e37e9: length: (2): jnc       0x00000022
@EIP 0x804e37eb: length: (1): inc       %eax
@EIP 0x804e37ec: length: (1): inc       %ebx
@EIP 0x804e37ed: length: (5): and       $0x000000FF, %eax
@EIP 0x804e37f2: length: (7): leal      -0x7FAA6400(,%eax,8), %ecx
@EIP 0x804e37f9: length: (2): movl      (%ecx), %edx
@EIP 0x804e37fb: length: (2): cmp       %edx, %ecx
@EIP 0x804e37fd: length: (2): jz        0x00000052
@EIP 0x804e37ff: length: (3): cmpl      -0x4(%edx), %esi
@EIP 0x804e3802: length: (2): jc        0x0000004D
@EIP 0x804e3804: length: (2): ja        0x00000007
@EIP 0x804e3806: length: (3): cmpl      -0x8(%edx), %edi
@EIP 0x804e3809: length: (2): jc        0x00000046
@EIP 0x804e380b: length: (6): movl      0xFFDFF020, %ecx
@EIP 0x804e3811: length: (6): leal      0x80559984, %eax
@EIP 0x804e3817: length: (6): leal      0x8A0(%ecx), %edx
@EIP 0x804e381d: length: (4): cmpl      $0x00, 0x18(%eax)
@EIP 0x804e3821: length: (2): jnz       0x0000002E
@EIP 0x804e3823: length: (1): cli
@EIP 0x804e3824: length: (6): incl      0x870(%ecx)
@EIP 0x804e382a: length: (3): movl      %edx, 0x18(%eax)
@EIP 0x804e382d: length: (3): movl      %ebx, 0x10(%eax)
@EIP 0x804e3830: length: (6): add       $0x00000860, %ecx
@EIP 0x804e3836: length: (3): movl      0x4(%ecx), %ebx
@EIP 0x804e3839: length: (3): movl      %eax, 0x4(%ecx)
@EIP 0x804e383c: length: (2): movl      %eax, (%ebx)
@EIP 0x804e383e: length: (2): movl      %ecx, (%eax)
@EIP 0x804e3840: length: (3): movl      %ebx, 0x4(%eax)
@EIP 0x804e3843: length: (1): sti
@EIP 0x804e3844: length: (5): mov       $0x00000002, %ecx
@EIP 0x804e3849: length: (6): lcall     *0x804D7654
@EIP 0x804e384f: length: (7): cmpb      $0x00, 0x805530C1
@EIP 0x804e3856: length: (2): jnz       0x0000003C
@EIP 0x804e3858: length: (7): cmpl      $0x00, 0x80551994 #comp KiTickOffset
@EIP 0x804e385f: length: (2): jg        0x00000021
@EIP 0x804e3861: length: (5): movl      0x8055198C, %eax #eax:= [KeMatxTickOffset]
@EIP 0x804e3866: length: (6): addl      %eax, 0x80551994
@EIP 0x804e386c: length: (3): pushl     (%esp)
@EIP 0x804e386f: length: (5): lcall     0x0000003E # call KeUpdateRunTime
@EIP 0x804e3874: length: (1): cli
@EIP 0x804e3875: length: (6): lcall     *0x804D75DC #_imp__HalEndSystemInterrupt
@EIP 0x804e387b: length: (5): ljmp      0xFFFFC080 #nt!KiExceptionExit
# --------- END Of KeUpdateSystemTime ---------------------------------- #####
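
The tick-offset bookkeeping visible at 0x804e3756 and 0x804e3858..0x804e386f can be
sketched as below. The global at 0x80551994 is the one the trace comments call
KiTickOffset; the parameter `reload` stands for the value loaded from 0x8055198C.
The function name is hypothetical, and the timer-table scan in between is omitted.

```c
#include <stdint.h>

/*
 * Subtracts `increment` from the tick offset. When the offset is no longer
 * positive it is reloaded and 1 is returned, which is the case where the
 * trace goes on to call KeUpdateRunTime. Otherwise returns 0.
 */
int advance_tick_offset(int32_t *tick_offset, int32_t increment, int32_t reload)
{
    *tick_offset -= increment;       /* subl %eax, 0x80551994 */
    if (*tick_offset > 0)            /* cmpl $0x00, 0x80551994 ; jg */
        return 0;
    *tick_offset += reload;          /* addl %eax, 0x80551994 */
    return 1;
}
```

If the increment passed in is small relative to the reload value, the run-time
update (and with it the quantum accounting) is reached correspondingly rarely.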

	[2.4] set bp on 0x804e3858 (compare KiTickOffset) and @EIP 0x804e373d 
		(beginning of KeUpdateSystemTime)
		Observation: KeUpdateSystemTime is also called only ONCE!!!! forever!!!
		check who's calling KeUpdateSystemTime.

	The problem is that it never returns from
@EIP 0x804e3875: length: (6): lcall     *0x804D75DC #_imp__HalEndSystemInterrupt

	[2.5] set a conditional bp on 0x804e3875 and delve into _imp__HalEndSystemInterrupt
		@EIP 0x806eec50: length: (2): xor       %ecx, %ecx #hit many times, but KeUpdateSystemTime is 
				#never called again.
		@EIP 0x806eec52: length: (4): movb      0x4(%esp), %cl
		@EIP 0x806eec56: length: (6): movb      -0x7F911DA8(%ecx), %cl
		@EIP 0x806eec5c: length: (10): movl     $0x00000000, 0xFFFE00B0
		@EIP 0x806eec66: length: (3): cmp       $0x41, %cl
		@EIP 0x806eec69: length: (2): jc        0x00000011 # will actually jump to 7a
		@EIP 0x806eec6b: length: (1): push      %ecx
		@EIP 0x806eec6c: length: (5): lcall     0xFF9DB58A
		@EIP 0x806eec71: length: (1): nop
		@EIP 0x806eec72: length: (5): lcall     0xFF9DB547
		@EIP 0x806eec77: length: (3): ret       $0x0008
		@EIP 0x806eec7a: length: (7): cmpb      $0x00, 0xFFDFF096 # !!! jump here
		@EIP 0x806eec81: length: (7): movb      $0x00, 0xFFDFF095
		@EIP 0x806eec88: length: (2): jz        0xFFFFFFE3
		@EIP 0x806eec8a: length: (5): push      $0x00000041
		@EIP 0x806eec8f: length: (5): lcall     0xFF9DB567 #???
		@EIP 0x806eec94: length: (1): push      %ebx
		@EIP 0x806eec95: length: (1): push      %ecx
		@EIP 0x806eec96: length: (1): sti
		@EIP 0x806eec97: length: (7): movb      $0x00, 0xFFDFF096 ### will break here 
		@EIP 0x806eec9e: length: (6): lcall     *0x806EC430 --> calls 804dbe03
			# this is KiDispatchInterrupt.
		@EIP 0x806eeca4: length: (1): cli #----------- NEVER REACHED
		@EIP 0x806eeca5: length: (1): pop       %ecx
		@EIP 0x806eeca6: length: (1): pop       %ebx
		@EIP 0x806eeca7: length: (2): ljmp      0xFFFFFFC4
		@EIP 0x806eeca9: length: (3): leal      (%ecx), %ecx
		@EIP 0x806eecac: length: (2): xor       %eax, %eax
		@EIP 0x806eecae: length: (4): movb      0x4(%esp), %al
		@EIP 0x806eecb2: length: (6): movb      -0x7F911DA8(%eax), %al
		@EIP 0x806eecb8: length: (1): nop
		@EIP 0x806eecb9: length: (5): lcall     0xFF9DB4F6
		@EIP 0x806eecbe: length: (5): lcall     0xFF9DB531
		@EIP 0x806eecc3: length: (4): movl      0xC(%esp), %eax
		@EIP 0x806eecc7: length: (3): shr       $0x04, %ecx
		@EIP 0x806eecca: length: (6): movb      -0x7F906F78(%ecx), %cl
		@EIP 0x806eecd0: length: (2): movb      %cl, (%eax)
		@EIP 0x806eecd2: length: (5): mov       $0x00000001, %eax
		@EIP 0x806eecd7: length: (1): sti
		@EIP 0x806eecd8: length: (3): cmp       $0x02, %cl
		@EIP 0x806eecdb: length: (2): jnc       0x00000009
		@EIP 0x806eecdd: length: (7): movb      $0x02, 0xFFDFF095
		@EIP 0x806eece4: length: (3): ret       $0x000C
		@EIP 0x806eece7: length: (1): int3

	[2.6] it seems that
		KeUpdateSystemTime -> KeUpdateRunTime -> HalEndSystemInterrupt (multiple times)
	First instruction of KeUpdateSystemTime
	@EIP 0x804e373d: length: (5): mov       $0xFFDF0000, %ecx
	Just check what is the previous eip.
		The previous instruction is located at 0x806f46d4. (ljmp)

	9:30AM 02/12/2014
	[2.7] record the previous 100 instructions [estimate: 40 min]
		[1] declare a queue and append each executed eip to it [15 min] DONE
		[2] declare a function that prints the contents of queue [10 min] DONE
		[3] break on 0x804e373d and see what are the previous 100 instructions and
			then find the lcall or ljmp instructions [15 min]
		found that KeUpdateSystemTime is invoked by a ljmp.
		Search for the instruction sequence of the containing function.
		There are a lot of in/out to 70/71 ports (verified it's real time clock)
		search for the following two signatures
		@EIP 0x806ece14: length: (2): mov       $0x0C, %al
		@EIP 0x806ece16: length: (2): out       %al, $0x70

		There are many candidates. Trick (out 0xc, 0x70) is repeated twice in the QEMU version.
		The following are the list of candidates:
			(1) HalStartProfileInterrupt (not right) the second is 0xd to 0x70 port.
			(2) HalStopProfileInterrupt not right, it outputs 0b, 0c, 0d
			(3) HalpSetWakeAlarm - ends with output 0d to 0x70
			none of them work!

		[4] another attempt: check how many times
		the following instruction is hit:
		EIP 0x806ecf7c: length: (6): cmpl      %ebx, 0x806F4ED4
			This instruction is hit ONLY once!!!!

		[5] maybe we should display 150 instructions instead.
		Found the instruction sequence starts from 0x806ecd34. 
Strangely, 0x806ecd34 is ONLY HIT ONCE! It's triggered by hardware interrupt 209.
--------------------
@EIP 0x806ecd34: length: (1): push      %esp
@EIP 0x806ecd35: length: (1): push      %ebp
@EIP 0x806ecd36: length: (1): push      %ebx
@EIP 0x806ecd37: length: (1): push      %esi
@EIP 0x806ecd38: length: (1): push      %edi
@EIP 0x806ecd39: length: (3): sub       $0x54, %esp
@EIP 0x806ecd3c: length: (2): mov       %esp, %ebp
@EIP 0x806ecd3e: length: (4): movl      %eax, 0x44(%esp)
@EIP 0x806ecd42: length: (4): movl      %ecx, 0x40(%esp)
@EIP 0x806ecd46: length: (4): movl      %edx, 0x3C(%esp)
@EIP 0x806ecd4a: length: (8): testl     $0x00020000, 0x70(%esp)
@EIP 0x806ecd52: length: (2): jnz       0xFFFFFFBA
@EIP 0x806ecd54: length: (6): cmpw      $0x08, 0x6C(%esp)
@EIP 0x806ecd5a: length: (2): jz        0x00000025
@EIP 0x806ecd7f: length: (7): movl      %fs:0x0, %ebx
@EIP 0x806ecd86: length: (11): movl     $0xFFFFFFFF, %fs:0x0
@EIP 0x806ecd91: length: (4): movl      %ebx, 0x4C(%esp)
@EIP 0x806ecd95: length: (6): cmp       $0x00010000, %esp
@EIP 0x806ecd9b: length: (6): jc        0xFFFFFF49
@EIP 0x806ecda1: length: (8): movl      $0x00000000, 0x64(%esp)
@EIP 0x806ecda9: length: (1): cld
@EIP 0x806ecdaa: length: (3): movl      0x60(%ebp), %ebx
@EIP 0x806ecdad: length: (3): movl      0x68(%ebp), %edi
@EIP 0x806ecdb0: length: (3): movl      %edx, 0xC(%ebp)
@EIP 0x806ecdb3: length: (7): movl      $0xBADB0D00, 0x8(%ebp)
@EIP 0x806ecdba: length: (3): movl      %ebx, (%ebp)
@EIP 0x806ecdbd: length: (3): movl      %edi, 0x4(%ebp)
@EIP 0x806ecdc0: length: (7): testb     $0xFF, 0xFFDFF050
@EIP 0x806ecdc7: length: (6): jnz       0xFFFFFE99
@EIP 0x806ecdcd: length: (5): push      $0x000000D1
@EIP 0x806ecdd2: length: (3): sub       $0x04, %esp
@EIP 0x806ecdd5: length: (1): push      %esp
@EIP 0x806ecdd6: length: (5): push      $0x000000D1
@EIP 0x806ecddb: length: (2): push      $0x1C
@EIP 0x806ecddd: length: (5): lcall     0x00001ECF
@EIP 0x806eecac: length: (2): xor       %eax, %eax
@EIP 0x806eecae: length: (4): movb      0x4(%esp), %al
@EIP 0x806eecb2: length: (6): movb      -0x7F911DA8(%eax), %al
@EIP 0x806eecb8: length: (1): nop
@EIP 0x806eecb9: length: (5): lcall     0xFF9DB4F6
@EIP 0x800ca1af: length: (2): out       %al, $0x7E
@EIP 0x800ca1b1: length: (7): movzxb    0x800CA300, %ecx
@EIP 0x800ca1b8: length: (1): ret
@EIP 0x806eecbe: length: (5): lcall     0xFF9DB531
@EIP 0x800ca1ef: length: (1): push      %eax
@EIP 0x800ca1f0: length: (5): lcall     0x00000006
@EIP 0x800ca1f6: length: (1): pushf
@EIP 0x800ca1f7: length: (1): push      %eax
@EIP 0x800ca1f8: length: (1): push      %ebx
@EIP 0x800ca1f9: length: (2): out       %al, $0x7E
@EIP 0x800ca1fb: length: (5): movl      0x800CA300, %eax
@EIP 0x800ca200: length: (2): mov       %eax, %ebx
@EIP 0x800ca202: length: (4): movb      0x10(%esp), %bl
@EIP 0x800ca206: length: (8): lock cmpxchgl     %ebx, 0x800CA300
@EIP 0x800ca20e: length: (2): jnz       0xFFFFFFED
@EIP 0x800ca210: length: (2): cmp       %bh, %bl
@EIP 0x800ca212: length: (2): jnc       0x00000004
@EIP 0x800ca216: length: (3): rol       $0x08, %ebx
@EIP 0x800ca219: length: (2): cmp       %bh, %bl
@EIP 0x800ca21b: length: (2): ja        0x00000008
@EIP 0x800ca21d: length: (1): pop       %ebx
@EIP 0x800ca21e: length: (1): pop       %eax
@EIP 0x800ca21f: length: (1): popf
@EIP 0x800ca220: length: (3): ret       $0x0004
@EIP 0x800ca1f5: length: (1): ret
@EIP 0x806eecc3: length: (4): movl      0xC(%esp), %eax
@EIP 0x806eecc7: length: (3): shr       $0x04, %ecx
@EIP 0x806eecca: length: (6): movb      -0x7F906F78(%ecx), %cl
@EIP 0x806eecd0: length: (2): movb      %cl, (%eax)
@EIP 0x806eecd2: length: (5): mov       $0x00000001, %eax
@EIP 0x806eecd7: length: (1): sti
@EIP 0x806eecd8: length: (3): cmp       $0x02, %cl
@EIP 0x806eecdb: length: (2): jnc       0x00000009
@EIP 0x806eecdd: length: (7): movb      $0x02, 0xFFDFF095
@EIP 0x806eece4: length: (3): ret       $0x000C
@EIP 0x806ecde2: length: (7): cmpb      $0x00, 0x806F97A4
@EIP 0x806ecde9: length: (2): jz        0x00000007
@EIP 0x806ecdf0: length: (5): movb      0x806F4ECC, %al
@EIP 0x806ecdf5: length: (2): or        %al, %al
@EIP 0x806ecdf7: length: (2): jz        0x00000018
@EIP 0x806ecdf9: length: (7): addb      $0x56, 0x806F4ECD
@EIP 0x806ece00: length: (2): jnc       0x0000000F
@EIP 0x806ece0f: length: (5): lcall     0xFFFFFC89
@EIP 0x806eca98: length: (1): push      %eax
@EIP 0x806eca99: length: (1): pushf
@EIP 0x806eca9a: length: (1): cli
@EIP 0x806eca9b: length: (6): leal      0x806FDF20, %eax
@EIP 0x806ecaa1: length: (6): popl      0x806F4E84
@EIP 0x806ecaa7: length: (1): pop       %eax
@EIP 0x806ecaa8: length: (1): ret
@EIP 0x806ece14: length: (2): mov       $0x0C, %al
@EIP 0x806ece16: length: (2): out       %al, $0x70
@EIP 0x806ece18: length: (2): pushf
@EIP 0x806ece1a: length: (2): popf
@EIP 0x806ece1c: length: (2): ljmp      0x00000002
@EIP 0x806ece1e: length: (2): in        $0x71, %al
@EIP 0x806ece20: length: (2): pushf
@EIP 0x806ece22: length: (2): popf
@EIP 0x806ece24: length: (2): ljmp      0x00000002
@EIP 0x806ece26: length: (2): mov       $0x0C, %al
@EIP 0x806ece28: length: (2): out       %al, $0x70
@EIP 0x806ece2a: length: (2): pushf
@EIP 0x806ece2c: length: (2): popf
@EIP 0x806ece2e: length: (2): ljmp      0x00000002
@EIP 0x806ece30: length: (2): in        $0x71, %al
@EIP 0x806ece32: length: (2): pushf
@EIP 0x806ece34: length: (2): popf
@EIP 0x806ece36: length: (2): ljmp      0x00000002
@EIP 0x806ece38: length: (5): lcall     0xFFFFFC7C
@EIP 0x806ecab4: length: (1): push      %eax
@EIP 0x806ecab5: length: (6): pushl     0x806F4E84
@EIP 0x806ecabb: length: (6): leal      0x806FDF20, %eax
@EIP 0x806ecac1: length: (1): popf
@EIP 0x806ecac2: length: (1): pop       %eax
@EIP 0x806ecac3: length: (1): ret
@EIP 0x806ece3d: length: (5): movl      0x806F4EAC, %eax
@EIP 0x806ece42: length: (2): xor       %ebx, %ebx
@EIP 0x806ece44: length: (6): movl      0x806F4EB0, %ecx
@EIP 0x806ece4a: length: (6): addb      %cl, 0x806F4EC4
@EIP 0x806ece50: length: (2): sbb       %ebx, %eax
@EIP 0x806ece52: length: (7): cmpl      $0x00, 0x806F9920
@EIP 0x806ece59: length: (6): jz        0x00000123
@EIP 0x806ecf7c: length: (6): cmpl      %ebx, 0x806F4ED4
@EIP 0x806ecf82: length: (6): jz        0x00007752
@EIP 0x806f46d4: length: (6): ljmp      *0x806EC40C
@EIP 0x804e373d: length: (5): mov       $0xFFDF0000, %ecx
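
Sketch for [2.7]: the "previous 100 instructions" history can be kept in a
fixed-size ring buffer. This is NOT the code used in the tree; EipRing,
eip_ring_push and the other names are hypothetical, and the depth of 100 is
just the value from the notes (150 was used later).

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define EIP_HISTORY 100   /* hypothetical depth; the notes use 100, then 150 */

typedef struct {
    uint32_t eip[EIP_HISTORY];
    size_t   next;    /* slot the next record will overwrite */
    size_t   count;   /* number of valid entries, at most EIP_HISTORY */
} EipRing;

static void eip_ring_init(EipRing *r)
{
    r->next = 0;
    r->count = 0;
}

/* Record one executed instruction address, overwriting the oldest entry. */
static void eip_ring_push(EipRing *r, uint32_t eip)
{
    r->eip[r->next] = eip;
    r->next = (r->next + 1) % EIP_HISTORY;
    if (r->count < EIP_HISTORY) {
        r->count++;
    }
}

/* Return the i-th most recent entry (0 = newest); caller ensures i < count. */
static uint32_t eip_ring_recent(const EipRing *r, size_t i)
{
    return r->eip[(r->next + EIP_HISTORY - 1 - i) % EIP_HISTORY];
}

/* Print the history oldest-first, in the same "@EIP" style as the traces. */
static void eip_ring_dump(const EipRing *r, FILE *out)
{
    for (size_t i = r->count; i-- > 0; ) {
        fprintf(out, "@EIP 0x%08x\n", (unsigned)eip_ring_recent(r, i));
    }
}
```

Usage idea: call eip_ring_push() once per traced instruction, and call
eip_ring_dump(&ring, stderr) from gdb when the breakpoint on 0x804e373d hits.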

##################
Conclusion: it seems to be INTERRUPT NO 209 that triggers KeUpdateRunTime and
KeUpdateSystemTime.
QUESTION IS: who's generating interrupt 209? In snap222, there is no 209 at all; in snap111,
there is ONLY one 209.
	If we load the system from the image, we get a lot of 209's.
##################

			
		

--------------------------------------------------------------------------------------
Task 209:  Find out who's generating interrupt 209.
--------------------------------------------------------------------------------------
	[1] figure out the irq and irr number of 209. Check 
		cpu_get_pic_interrupt(env);
		debug: in qemu command line, do loadvm snap111 and immediately ctrl+c.
		cpu_exec.c:330
		209 is generated by apic_get_interrupt(env->apic_state);

		It's tab index 5 and mask 0x20000, which actually should generate interrupt 177 (5*32+17).
		If it's tab index 6 and mask 0x20000, then it generates 209 (6*32+17).
	[2] use mem hardware breakpoint to trace on s->tab[6]
		found the following:
#### !!!! yeah finally found that it's the rtc_periodic_timer!!!!
		#0  set_bit (tab=0x28dcf830, index=209) at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:60
#1  0x08272810 in apic_set_irq (s=0x28dce510, vector_num=209, trigger_mode=0)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:390
#2  0x08272307 in apic_bus_deliver (deliver_bitmask=0xbffff01c, delivery_mode=0 '\000', 
    vector_num=209 '\321', trigger_mode=0 '\000')
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:277
#3  0x082723e0 in apic_deliver_irq (dest=1 '\001', dest_mode=1 '\001', delivery_mode=0 '\000', 
    vector_num=209 '\321', trigger_mode=0 '\000')
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../apic.c:290
#4  0x08275491 in ioapic_service (s=0x28ded7b8)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:71
#5  0x08275608 in ioapic_set_irq (opaque=0x28ded7b8, vector=8, level=1)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../ioapic.c:111
#6  0x08126d7a in qemu_set_irq (irq=0x28dea040, level=1) at hw/irq.c:38
#7  0x08284f5a in gsi_handler (opaque=0x28de5228, n=8, level=1)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../pc.c:98
#8  0x08126d7a in qemu_set_irq (irq=0x28de5330, level=1) at hw/irq.c:38
#9  0x0810f271 in hpet_handle_legacy_irq (opaque=0x28deb738, n=1, level=1) at hw/hpet.c:677
#10 0x08126d7a in qemu_set_irq (irq=0x28e0781c, level=1) at hw/irq.c:38
#11 0x08280d46 in qemu_irq_raise (irq=0x28e0781c) at /home/csc288/qemu/qemu-1.4.0/hw/irq.h:14
#12 0x082814e1 in rtc_periodic_timer (opaque=0x28e07178)
    at /home/csc288/qemu/qemu-1.4.0/hw/i386/../mc146818rtc.c:200
#13 0x081e35f4 in qemu_run_timers (clock=0x28c45a18) at qemu-timer.c:394
#14 0x081e3813 in qemu_run_all_timers () at qemu-timer.c:452
#15 0x081b62b4 in main_loop_wait (nonblocking=0) at main-loop.c:436
#16 0x08235954 in main_loop () at vl.c:2007
#17 0x0823c991 in main (argc=14, argv=0xbffff744, envp=0xbffff780) at vl.c:4341
#####################
Conclusion: somehow the RTC timer is NEVER triggered! But the RTC timer is used
to trigger KeUpdateSystemTime -> KeUpdateRunTime -> SwapContext.
#####################
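
Sketch for the tab[6] / 0x20000 observation in [1] and [2] of Task 209: a
vector number maps to a 32-bit word index and a bit mask. The helpers below
are hypothetical and only mirror the indexing idea of the set_bit() seen in
frame #0 of the backtrace (assuming 32-bit words); they are not copied from
apic.c.

```c
#include <stdint.h>

/* Word index of a vector in an array of 32-bit words. */
static int vec_word(int vector)
{
    return vector >> 5;
}

/* Bit mask of a vector inside its 32-bit word. */
static uint32_t vec_mask(int vector)
{
    return UINT32_C(1) << (vector & 0x1f);
}

/* Mark a vector as pending in the bitmap. */
static void vec_set(uint32_t *tab, int vector)
{
    tab[vec_word(vector)] |= vec_mask(vector);
}

/* Inverse mapping for a single-bit mask; returns -1 if mask is not one bit. */
static int vec_from(int word, uint32_t mask)
{
    for (int bit = 0; bit < 32; bit++) {
        if (mask == (UINT32_C(1) << bit)) {
            return word * 32 + bit;
        }
    }
    return -1;
}
```

With this mapping, word 5 with mask 0x20000 is vector 177 and word 6 with
the same mask is vector 209, which matches the two cases noted in [1].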


7:30PM 02/13/2014. 
--------------------------------------------------------------------------------------
Task 210:  Try to solve the problem of drifting timer.
--------------------------------------------------------------------------------------
	[1] bp on qemu_run_all_timers and then run into qemu_run_timers -> then delve into
			qemu_run_timer(vm_clock)
	[2] change the clock values and see how it's going.
	[3] observation: it seems that cpu_get_clock() returns the real time from the host (instead
		of vm clock!)
		qemu_get_clock_ns() -> check clock_type (VM_CLOCK, correct) -> check itype
			use_icount (check)
			if use_icount is set, it will call gen_io_start()/gen_io_end() for i/o operations.
	[4] if we simply change use_icount to 1, qemu_get_clock_ns() will try to read the time
		from the icount (instruction count), which is currently 0. This eventually
		leads to a segmentation fault.
		That's expected because some instructions are not translated with gen_io_start()/gen_io_end().
	[5] still keep the same use_icount as 0, but change all the timers and see the effects.
		doing vmclock->active_timers->next->next ...
		there are three timers: pit_irq_timer, rtc_update_timer, rtc_periodic_timer.
		It seems that the third timer triggers the 209 interrupt.
		now change all of their expire_time to the current time (in debugging)

		Observation: it still does not work on snap222. interrupt 209 is only triggered once
	and not again.

		Guess: still not enough 209 triggered.
	[6] figure out the question why ts->expire_time is changed back. It is also affected by
		PITChannel* opaque->next_transition_time, which is changed in pit_irq_timer_update
9:00AM 02/14/2014
	[7]  figure out how the 3rd timer is called actually.
			set the expire_time of all three timers and then watch *(&(clock->active_timers))
		there are four timers found:
		$4 = (QEMUTimer *) 0x28e07d18 /PIT
		$5 = (QEMUTimer *) 0x28e208a0 /apic_pm_tmr_timer
		$6 = (QEMUTimer *) 0x28ded728 /rtc_periodic_timer
		$7 = (QEMUTimer *) 0x28ded748 /rtc_update_timer

	Recorded events:
		[1] hit qemu_run_all_timers again
		[2] mod PIT and temporarily set to apic_pm_tmr_timer
		[3] in qemu_mod_timer_ns. The apic_pm_tmr_timer (currently the top one) is also 
			expired, re-insert the pit timer before apic_pm_tmr_timer and resets its expire time.
			the operation is an insert operation into the list. now PIT is the first timer again.
		[4] ** after the callback ts->cb, the expire_time was reverted (need to change later)
			pit timer is again expiring in the loop. ....
			it's almost always the pit timer getting updated
		[5] set condition bp on qemu_mod_timer_ns, condition is set to that it's not the
			pit timer.
			found that the apic_pm_tmr_timer is updated by an out instruction (io port writing,
			addr: 45056).
	10:45AM
		[6] add the condition again and look at how rtc_periodic_timer is updated.
			reset expire_time of everybody. It is called inside qemu_run_timer(vm_clock)
			due to the loop. Need to understand it completely!!!
			for(;;) {
				ts = clock->active_timers; //ts points to head of clock timer list
				if (!qemu_timer_expired_ns(ts, current_time)) { 
					//if ts->expire_time>current_time break out of loop
					break;
				}
				//if ts->expire_time<=current_time (i.e., the head timer expired)
				clock->active_timers = ts->next;
				ts->next = NULL;
				/* run the callback (the timer list can be modified) */
				//call the callback function to
				//(1) send the interrupt out
				//(2) modify timer->expire_time (based on opaque, advance it to the
				//		next transition time) and then insert the timer back
				ts->cb(ts->opaque);

				// it will call qemu_mod_timer_ns for all timers 
				// the following is from qemu_mod_timer_ns
					//pt points to 2ND timer this moment
					//ts is actually the first timer
					pt = &ts->clock->active_timers; 
					for(;;) { //go search for the 1st NOT EXPIRED timer
						t = *pt;
						if (!qemu_timer_expired_ns(t, expire_time)) {
							break;
						}
						pt = &t->next;
					}
					ts->expire_time = expire_time; //set 1st timer NOT EXPIRED
					ts->next = *pt; //link to the NOT EXPIRED timer
					*pt = ts; //AFTER THIS IS DONE the clock->active_timers list has the
						//shape: EXPIRED, EXPIRED, NOT_EXPIRED, NOT_EXPIRED, NOT_EXPIRED


			}
			//---------------------------------------------------
			//CONCLUSION:
			//	each pass of the outer loop is supposed to update ONE expired timer ONLY!!!
			//! so we should make the timers expire
			//---------------------------------------------------
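
Sketch of the mechanism analyzed above, as a self-contained toy (NOT the QEMU
code): a list of timers sorted by expire_time, a run loop that only pops the
head while it is expired, and a callback that re-arms itself. All names
(Clock, Timer, Periodic, timer_mod, timers_run) are hypothetical. It shows
why a timer whose expire_time is far in the future (like the
rtc_periodic_timer value taken from the host clock) never fires until it is
reset.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct Timer Timer;
struct Timer {
    int64_t expire_time;
    void (*cb)(void *opaque);
    void *opaque;
    Timer *next;
};

typedef struct { Timer *active; } Clock;   /* list sorted by expire_time */

/* Unlink ts from the list if it is there. */
static void timer_del(Clock *c, Timer *ts)
{
    for (Timer **pt = &c->active; *pt; pt = &(*pt)->next) {
        if (*pt == ts) {
            *pt = ts->next;
            ts->next = NULL;
            return;
        }
    }
}

/* Re-arm ts: insert it before the first timer that expires later. */
static void timer_mod(Clock *c, Timer *ts, int64_t expire_time)
{
    Timer **pt = &c->active;
    timer_del(c, ts);
    while (*pt && (*pt)->expire_time <= expire_time) {
        pt = &(*pt)->next;
    }
    ts->expire_time = expire_time;
    ts->next = *pt;
    *pt = ts;
}

/* Fire expired timers; only the head of the list is ever examined. */
static int timers_run(Clock *c, int64_t now)
{
    int fired = 0;
    for (;;) {
        Timer *ts = c->active;
        if (!ts || ts->expire_time > now) {
            break;                 /* head not expired: nothing behind it is */
        }
        c->active = ts->next;
        ts->next = NULL;
        ts->cb(ts->opaque);        /* callback may re-insert the timer */
        fired++;
    }
    return fired;
}

/* A periodic device: its own state decides the next expiry. */
typedef struct {
    Clock *clock;
    Timer timer;
    int64_t next_time;
    int64_t period;
    int fired;
} Periodic;

static void periodic_cb(void *opaque)
{
    Periodic *p = opaque;
    p->fired++;
    p->next_time += p->period;
    timer_mod(p->clock, &p->timer, p->next_time);
}

static void periodic_init(Periodic *p, Clock *c, int64_t first, int64_t period)
{
    p->clock = c;
    p->timer.cb = periodic_cb;
    p->timer.opaque = p;
    p->timer.next = NULL;
    p->next_time = first;
    p->period = period;
    p->fired = 0;
    timer_mod(c, &p->timer, first);
}
```

In this toy, resetting both the device state (next_time) and the timer's
expire_time to the current time is what makes a stale timer fire again,
which is the same kind of manual fix applied in gdb in the steps that follow.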


---------------------
	[8] check the status of all timers in snap222.
		Current_time: 0x701ff06b531
		$4 = (QEMUTimer *) 0x28e07d18 /pit_irq_timer 0x6fe778196cb (expired)
		$5 = (QEMUTimer *) 0x28e208a0 /apic_pm_tmr_timer 0x6fec62748a9 (expired)
		$6 = (QEMUTimer *) 0x28ded728 /rtc_periodic_timer 0x134e83cf833f61e1 (not expired)
		$7 = (QEMUTimer *) 0x28ded748 /rtc_update_timer 0x134e83cfbc7b8b00 (not expired)

	after resetting all of them(expire time) to current_time, after 100 clock updates
		Current_time: 0x741f11fc12b
		$4 = (QEMUTimer *) 0x28e07d18 /pit_irq_timer 0x7401bb625f1 (expired)
		$7 = (QEMUTimer *) 0x28ded748 /rtc_update_timer 0x7402908cab0 (expired)
		$5 = (QEMUTimer *) 0x28e208a0 /apic_pm_tmr_timer 0x740400f72a1 (expired)
		$6 = (QEMUTimer *) 0x28ded728 /rtc_periodic_timer 0x134e83cf842dcd09 (UPDATED NOT RIGHT)
			-> it's reading from the host time source!!!! [that's the reason it's 
				NEVER GOING to be updated!]
		Note: 209 is triggered by rtc_periodic_timer!!!!! That explains.

	[9]  check how rtc_periodic_timer's expire time gets updated; set a hw bp on it.
		Found that RTCState->next_periodic_time decides it,
		as shown below:
				static void rtc_periodic_timer(void *opaque)
				{
				    RTCState *s = opaque;

				    periodic_timer_update(s, s->next_periodic_time);
		What about resetting s->next_periodic_time as well?
		It will set things right.
	
	[10] new problem: pit_irq_timer updates too slowly. It will obstruct the updates of the other timers!
	understand how it's getting updated.
		it uses PITChannelState->next_transition_time.
			Basically, it increments PITChannelState->next_transition_time by 65545 every time,
		and it's not affected by the current (real) time. There will be some drift of time
		as it proceeds.

		As it progresses, the PIT timer kind of livelocks qemu_update_timer. First,
	it wastes a lot of cycles (generating hundreds/thousands of mini incremental interrupts)
	many many times (most are ignored because they repeat each other).
		Proposed mod: 
			in pit_irq_timer:
			get the current time using qemu_get_clock_ns(s->irq_timer->clock)
			DONE.
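
Sketch of the idea behind the proposed mod, under my own assumptions: instead
of advancing the expiry by one period per callback (which needs one callback
per missed period after a snapshot load), compute the next expiry directly
from the current time. naive_steps and catch_up are hypothetical helpers, not
QEMU functions, and period is assumed to be > 0. The actual change in the
tree simply uses the current time from qemu_get_clock_ns(); this is one
variant that also stays on the original period grid.

```c
#include <stdint.h>

/* Naive re-arm: one period per callback. Returns how many callbacks are
 * needed before the expiry is finally in the future. */
static int64_t naive_steps(int64_t last, int64_t period, int64_t now)
{
    int64_t n = 0;
    while (last <= now) {
        last += period;
        n++;
    }
    return n;
}

/* Catch-up re-arm: jump straight to the first grid point after now. */
static int64_t catch_up(int64_t last, int64_t period, int64_t now)
{
    if (last > now) {
        return last;               /* not due yet, keep the scheduled time */
    }
    return last + ((now - last) / period + 1) * period;
}
```

For example, with last = 100, period = 10 and now = 1000, the naive scheme
needs 91 callbacks, while catch_up() returns 1010 in one step.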

		manually modify the rtc_periodic_timer->expire_time and then
			((RTCState *)s->opaque)->next_periodic_time to current time.
		Now triggers interrupt 209 frequently; HOWEVER, it still does not accept keyboard!

8:30AM 02/15/2014
	[11] repeat the experiment on snap222 and see what is wrong.
		[a] in qemu_run_all_timers(), drill into qemu_update_timer() for vm_clock,
			then set the current_time as the expire_time for all timers linked
			by ts->next->next ...

			Specifically, identify the rtc_periodic_timer, update its opaque attribute
			which is actually an RTCState
				((RTCState *)s->opaque)->next_periodic_time to current time.

		[b] break on do_interrupt_all and see if 209 is triggered.

		[c] check KeUpdateSystemTime and KeUpdateRunTime and see if they are invoked.
			KeUpdateSystemTime: 0x806ecd34 (hit), 0x804e373d (hit)
			KeUpdateRunTime: 0x804e39c1 (CurrentThread->Quantum-=3) (hit)
			-- the above two are hit multiple times --
			SwapContext: @EIP 0x804dbec0: length: (1): pushf (hit) (hit less frequently,
				but is hit multiple times)

			Not sure what is broken on snap222, but no processes are detected though.
			Strange? even with SwapContext already working!
			Will revisit snap222 later.

	[12] Repeat the above on snap111 and save a snapshot as snap333. Working!
			But something is wrong which triggers blue screen.
				b raise_exception and see what's wrong.
				exception index is 6 or 7. But could not identify how it's generated.
				make clean install. Still not working.
			It may be caused by the invalid clock value. Try updating all timers and
			opaque correspondingly.
			Seems to work. wait 10 minutes.
			Observation: the net use command is much faster than in snap111 (probably the
		interleaving improves the response to network packets!!!)

******************************************************************************************
------------------------------------------------------------------------------------------
Snapshot Problem Finally Solved Completely!
		Points: see [11] of 8:30AM 02/15/2014 notes!
			got to update all opaque state of the four timers associated with the vm clock
		(bp on the corresponding ts->cb function).

		Cause: it's caused by the RTC periodic timer, interrupt 209 is NEVER fired.
		We fix the clock value and REMOVED ONE bug related to PIT-timer (update using
		the recent value to avoid live lock by pit timer)
------------------------------------------------------------------------------------------
******************************************************************************************

	10:30AM
	[13] now generate the new image and change the job sequence (so that we can save the 
time costly step of net use). [30 min]
		[1] make the updates
		[2] remove test use (strangely, when using net use, it is still quite vulnerable
			when saving VM. if not using net use, it is good - network transmission
			cause some trouble in buffer i/o channel?]
		[3] completely recompile and check. 
			still not working. strangely if load snap333 first, loadvm snap555 would work. 
			???
	11:30AM
		[4] need to check into the details of the raise_exception.
			[1] find out where it is throwing. exception_index: 6 (invalid opcode)
				global_eip: 0x20ece
				set a condition bp on it (have to code it), too slow
				It's an instruction: les %esp, %eax

				The exception is generated by disas_insn for the les instruction, because
				mode is 3. Note translate.c:5673

		[5] trace the last 150 instructions before 0x20ece and see what is going on.
			the last couple of instructions is iret, which gets into 0x20ece.
			see below:
			------------
				@EIP 0xc01d5: length: (1): popa
				@EIP 0xc01d6: length: (1): pop  %ds
				@EIP 0xc01d7: length: (1): pop  %es
				@EIP 0xc01d8: length: (1): popf
				@EIP 0xc01d9: length: (1): iret
			------------
			There is an interrupt right before 0x20ece.
				@EIP 0x20ec6: length: (4): mov  $0x0012, %ax
				@EIP 0x20eca: length: (2): addb %al, (%eax)
				@EIP 0x20ecc: length: (2): int  $0x10
			It's BIOS interrupt call (related to GUI cursor etc.)
			Interestingly: 20ec6 is never HIT in other snapshots (also xxec6 is never hit
		in other snapshots <- it may be caused by others? check interrupt) ????

	7:00PM 
		[6] break on 20ec6 and check the last interrupt number
			last_intno is 8. verified it's always 8 for do_interrupt_all before hitting
		0x20ec6. --> triggers 804e0f69.

			check if in snap333 it's popping interrupt 8. ??? ---> yes eventually.
			VERIFIED. snap333 will also crash as long as interrupt 8 is there! --> only 
				occurred once. could not verify.

		[7] figure out who's triggering interrupt 8?
			Should be tab[0] is 0x00000100
			--> did not catch it.
			interrupt 8 is not triggered from cpu-exec.cc30
			current eip is: 0x804e1f25 (then break on helper_trace2 on it) 
				0x804e1f25 is hit many times. not a good way to find out who's triggering
				interrupt 8 (error code is 2).
			It seems to be caused by
				void tlb_fill(CPUX86State *env, target_ulong addr, int is_write, int mmu_idx,
				              uintptr_t retaddr)
				{
				    int ret;

				    ret = cpu_x86_handle_mmu_fault(env, addr, is_write, mmu_idx);
			ret is 1 and then it triggers an exception.
			The problem seems to be: illegal write of memory.
				(gdb) p/x env->cr[3]
				$27 = 0x62ce000
				(gdb) p/x addr
				$28 = 0xf7901ffc
				(gdb) p/x is_write
				$29 = 0x1
				(gdb) p/x global_eip
				$30 = 0x804e1f25
				(gdb) p print_instrRange(0x804e1f25, 0x804e1f26, env)
				@EIP 0x804e1f25: length: (7): movw      $0x0000, 0x2(%esp)
				it is trying to save to: 0xf7901ffc
				when calling tlb_fill, it generates the error (guess: protection error)
					candidates: PG_ERROR_P_MASK, error_code |= (is_write << PG_ERROR_W_BIT);
						PG_ERROR_U_MASK, PG_ERROR_I_D_MASK;
					it seems that error_code is set to is_write <<PG_ERROR_W_BIT
				general exception_index: EXCP0E_PAGE (14)

		==> it seems that interrupt 14 triggers 0x804e1f25, and the usual address to load
			is in 0x203c range.

		Later: check who's triggering interrupt 14? does snap333 get interrupt 14?
			[1] in snap333 there are lots of interrupt 14 as well.
			[2] it seems 14 is the page fault.

		7:30PM 02/16/2014
			Guess: is it the lock causing the problem?
			loadvm snap555 and check the clock value. 
				(1) set a breakpoint at savevm.c:2311. loadvm_state()
				(2) then BP on qemu_run_all_timers.
					check the state of APICTimer
					check 
					(a) what is the difference between RTC_periodic_timer and RTC_update_timer
					(b) what is the difference between next_periodic_time and next_alarm_time
						seems to be the difference between periodic and oneshot
					ONLY updates the PIT timer and RTC_periodic_timer, ignore apic timer
			and rtc_update_timer. got bluescreen
			[1] test 1. update PIT, rtc_PERIODIC_timer, and RTC_UPDATE_TIMER.
			Does not work. has to start from the analysis of blue screen

			**** bp on do_interrupt_all if intno==14, ignore the first 736 times ***
			Found the problem, it triggers page exception on itself very early (from 300 times).
				Need to use binary search to find the first time it triggers the problem.

		8:30AM 02/17/2014
		[8] continue the analysis try to identify which interrupt 14 triggers the problem.
			Use binary search [30 min]
			[1] b savevm.c:2311 first and then bp on do_interrupt_all
			100 (too large) -> 50 -> 25 -> 12 --> 6 --> 3
			use ignore bp 5 times, it will be trapped
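
Sketch of the bisection used in [8]: find the first breakpoint hit at which
the fault has already happened, assuming the predicate is monotone (false for
all early hits, true from some hit on). first_bad and demo_bad_by are
hypothetical names; in the real session each probe is a manual
"loadvm + ignore N" run in gdb, not a function call.

```c
/* Predicate: nonzero if the problem is already visible at hit number n. */
typedef int (*bad_by_fn)(long n, void *ctx);

/* Smallest n in [lo, hi] for which bad(n) is true.
 * Assumes bad is monotone and bad(hi) is true. */
static long first_bad(long lo, long hi, bad_by_fn bad, void *ctx)
{
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (bad(mid, ctx)) {
            hi = mid;          /* fault already there: look earlier */
        } else {
            lo = mid + 1;      /* still fine: look later */
        }
    }
    return lo;
}

/* Demo predicate: ctx points to the (normally unknown) first bad hit. */
static int demo_bad_by(long n, void *ctx)
{
    return n >= *(long *)ctx;
}
```

If ignoring 5 hits lands on the faulting one, a search over hits 1..100 with
such a predicate would report hit 6, after about 7 probes instead of 100.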

		[9] check what are the interrupts before ignore bp 5.
			There are about 10 interrupts handled. Mostly are 209 and 177 (periodic and
		pit timer interrupts)

		9:30AM
		[10] record the last 10 interrupts. [20 min]
			{209, 177, 209, 209, 65, 14, 209, 14, 14, 14}
			{130, 209, 130, 130, 209, 65, 209, 14, 14, 14}
			{65, 177, 209, 209, 209, 209, 14, 14, 14, 14}
			{209, 209, 209, 177, 209, 14, 209, 14, 14, 14}
			{209, 177, 65, 209, 209, 209, 14, 14, 14, 14}
			{209, 209, 177, 209, 65, 209, 14, 14, 14, 14}
			{130, 209, 130, 130, 209, 65, 209, 14, 14, 14}




		Now the question is 209 may trigger something strange that causes segmentation fault?

		[11] now the question: is the number of 209 interrupts fixed before the crash? [15 min]
			1st time: 36, 37, 36, 35, 47 not fixed.

		[12] record the last 100 instructions and see what are they
			It seems that it's always 0x804df14a triggering the problem (which
	should be a part of the handler for 209).	
			@EIP 0x804df148: length: (2): pop       %fs
				@EIP 0x804df14a: length: (3): leal      0x54(%ebp), %esp
				@EIP 0x804e1f25: length: (7): movw      $0x0000, 0x2(%esp)
				@EIP 0x804e1f25: length: (7): movw      $0x0000, 0x2(%esp)
				@EIP 0x804e1f2c: length: (1): push      %ebp
				@EIP 0x804e1f25: length: (7): movw      $0x0000, 0x2(%esp)
				@EIP 0x804e1f25: length: (7): movw      $0x0000, 0x2(%esp)
				@EIP 0x804e1f2c: length: (1): push      %ebp

		[13] set bp on 0x804df14a -> it's hit multiple times. Most likely the last hit is:
			>40. and adding a second condition env->cr[3]==0x62ce0000 guarantees the hit.

		11:00AM
		[14] trace into the bp on 0x804df14a [env->cr[3]==0x62ce0000] and trace it step by
			step in binary and see how the interrupt is thrown.
			cr3 0x62ce0000 is services.exe
			First of all, it seems that it's the 0x804df14d pop %edi causing the problem.
			A normal address is something like: 0xf800cdb4
			Debug: [1] bp on savevm.c:2311 (last line of loadvm)
					[2] bp on ops_sse.h:2473(capture 0x804df14a)
					[3] bp on do_interrupt_all if intno==14

			FOUND THE PROBLEM!!!! HA HA HA HA ...
			Details: when i>2 [process idx in arrCR3 is greater than 2], it tries to
		read the process name/file path using
			target_ulong FilePath = cpu_ldl_data(env, pFilePath);
			When the page is not present, it triggers an interrupt, which fails if
		the CPU is currently not in the right privilege mode.

		So the page fault in helper_trace2 actually killed other processes (timer
		triggered swap); does that explain why it's not switching?

		================================ primary fix =====================
		1. before loading memory cpu_ldl_data, check the availability of data in page table.
		add a function cpu_ldl_data_safe(int *res), set *res to -1 if it fails. [120 min]
			[1] cpu_ldl_data
			[2] cpu_ldu_code
		================================other fixes===========================
		1. in loadvm_state in savevm.c: initialize arrCR3, numCR3
		2.  in helper_trace last part, if proc_state is DETERMINED, should skip
			the check of the process name


3:30PM 02/17/2014. 
--------------------------------------------------------------------------------------
Task 211:  fix helper_trace1
--------------------------------------------------------------------------------------
TO DOs:
		================================ primary fix =====================
		1. before loading memory cpu_ldl_data, check the availability of data in page table.
		add a function cpu_ldl_data_safe(int *res), set *res to -1 if it fails. [120 min]
			[1] cpu_ldl_data
			[2] cpu_ldu_code
		================================other fixes===========================
		1. in loadvm_state in savevm.c: initialize arrCR3, numCR3
		2.  in helper_trace last part, if proc_state is DETERMINED, should skip
			the check of the process name
==============================================================================
	[1] study the cpu_ldl_data and check what we can improve. [20 min]
			1. cpu_ldl_data is defined in softmmu_header.h:250 (include/exec/softmmu_header.h)
			2. it checks the softmmu table, if it's in table, it just load from softmmu
			3. otherwise it calls helper_ldl_mmu() to load mmu (for the address)
			4. helper_ldl_mmu is defined in include/exec/softmmu_template.h:97
				it calls tlb_fill to perform the job
				which then calls cpu_x86_handle_mmu_fault() --> checks page table (and
					permissions) --> if the page is loaded it calls tlb_set_page() to 
				update the tlb with the page information (physical addr)	
			Conclusion: it seems fine to reload tlb; however, if there is a page fault,
			it's unexpected and the interrupt cannot return to the original code; that's no
			good.
	[2] improve idea: [10 min]
		[a] before each cpu_ldl_data, if va_to_ha returns -1, skip the cpu_ldl_data
		--- potentially, if the page is never loaded, the procedure of capturing the
			process name could fail. Will check later.
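
Sketch of idea [a] as a self-contained toy, NOT the code in the tree: probe
the translation first and only load when the page is mapped. ToyGuest, the
toy va_to_ha and ldl_data_safe are all hypothetical stand-ins; the real
va_to_ha and cpu_ldl_data take a CPU env and walk the guest page table, which
this toy replaces with a flat "present" array.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_BITS 12
#define NPAGES    16

typedef struct {
    int     present[NPAGES];            /* toy page table: mapped or not */
    uint8_t mem[NPAGES << PAGE_BITS];   /* toy guest memory */
} ToyGuest;

/* Returns a host offset, or -1 when the page is not mapped. */
static long va_to_ha(const ToyGuest *g, uint32_t va)
{
    uint32_t page = va >> PAGE_BITS;
    if (page >= NPAGES || !g->present[page]) {
        return -1;
    }
    return (long)va;
}

/* Load 4 bytes without ever faulting: *res is 0 on success, -1 on failure. */
static uint32_t ldl_data_safe(const ToyGuest *g, uint32_t va, int *res)
{
    uint32_t v;
    /* A 4-byte load may straddle two pages, so probe first and last byte. */
    long first = va_to_ha(g, va);
    long last  = va_to_ha(g, va + 3);

    if (first < 0 || last < 0) {
        *res = -1;                      /* caller skips the name lookup */
        return 0;
    }
    memcpy(&v, &g->mem[first], sizeof v);
    *res = 0;
    return v;
}
```

The caller checks res and simply skips the process-name capture for this
round when it is -1, which is the trade-off noted in [a]: if the page is
never loaded, the name is never captured.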

	[3] implementation steps:
		[1] repeat the error: (a) bp savevm.c:2311, (b) bp on do_interrupt_all if intno==14,
			(c) b helper_trace line ops_sse.h:2473 and display eip, capture 62ec00, display count209 (should be value around 40 to 50)
		[2] before the problematic cpu_ldl_data add the check:
			insert the logic and see if we can save from the crash. --> after the logic is
			inserted, interestingly, the check logic is only hit twice before the crash point.
			verified --> it does help. However, there are other crash points.

	10:30AM 02/18/2014
	[4] add all other check points [1 hr] Working!!!!!
	11:20AM
	[5] now use snap555 for the batch analysis and see what is going on. Problem: 
		we have to wait a couple of seconds before the y:\ drive is loaded.
		Found that it's the problem of snapshots.
	[6] save a new snapshot named snap666
	[7] try the new snap666. --> still problem. Does not work. Seems need to do a dir before it. 
	[8] add a dir task and see what's going on. suddenly it's very slow.
	[9] apply the optimizations 
		1. in loadvm_state in savevm.c: initialize arrCR3, numCR3
		2.  in helper_trace last part, if proc_state is DETERMINED, should skip
			the check of the process name
	[10] delete unused vms.
	SUCCESS!

9:00AM 02/20
--------------------------------------------------------------------------------------
Task 212: test the system 
--------------------------------------------------------------------------------------
	[1] generate the trace and then run it [1 hr] OK.
	[2] strangely IMM cannot debug the generated slice (which did not occur before). Check it later
	[3] call isDebuggerPresent and generate a new program.
			finding: IsDebuggerPresent only checks when the process is really run under 
		debugger. When WinDbg is connected, it's not discovered.
	[4] tried the other trick of checking PEB, also not working
	[5] had to use INT 2D. Use the INT2D trick, service 1 (print debug string), print a NULL
		string. EAX return value will be different. now works. copied to /home/samba/smbuser
	[6] collect all the traces of checkdebug.exe.
		The problem is that the process is not captured!

8:45AM 02/21
	[7] check why the process is not captured but the process terminate event is sent?
		[7.1] check when process term event is sent?
			it's sent by seg_helper.c, sysenter.
			problem: the process is actually never captured. When the send_evt performs
			erase cr3 in the setCR3 of the condition, the size is already 0, but it
			triggers the termination of capture process.
	[8] consider algorithm fix for capturing process.
		Ideal way would be to trigger the page fault and load the page. If in normal mode,
	cpu_ldl_memory --> tlb_fill -> set up interrupt ... -> in next cpu cycle, it picks
the hardware interrupt -> do_interrupt_all ---> set up the trap frame and the next EIP
(basically to repeat it). --> OS routine load the page --> redo the instruction.
		Now the problem is that if the instruction is not REPEATABLE it causes failure.
	[9] design idea: first test if page is in RAM, if not check if the instruction is JMP or
		CALL, if they are, avoid loading page (go to next instruction);  if no, then
		set the env->eip to the next one (so the current instruction would not have to be
		done twice).
10:00AM
	[10] experiment: in one page not in RAM case, load the page and examine what's the next
	EIP and check the instruction type.  [20 min]
		$2 = 0x804df14a
		@EIP 0x804df14a: length: (3): leal      0x54(%ebp), %esp
	Then do_interrupt_all is hit, and  next_eip is 0x804df148. (which is env->eip because
		it's not updated yet)
		next eip hit:  0x804e1f25
	Now if we reset env->eip would that solve the problem?
		Does not work, it will recursively trigger page faults infinitely many times. 
	the page fault handler itself is triggering page faults.
		maybe forbid load memory when it's in 0x80range?
		does remove the failure bug.
		Strangely, none of the breakpoints is hit. The process is not captured

	[11] set a BP on discovering new CR3. see if the CR3 is captured
		b ops_sse.h:2549
			It seems that new process is 0xf32b000, but its proc_status is already set to 0 (NO)
		the problem is that the same cr3 occurred twice!
			set a watch point on proc_status[7] and arrCR3[7] and arrCR3[8]
			After setting the watch point, it becomes ok (strange). must be some timing problem?
			

11:20AM
	[12] solve the problem of copy y:\se (l) first.
			[1] check how  sendCommandToVM is accomplished.
				it calls handle_usr_command
					hmp_send_key
8:00PM
			[2] prepare a large buffer before hmp_send_key and str concat it. [20 min]
			check what's the string generated. Verified no problem with this function. Maybe
			introduced as a bug by the fix for [11]?


			[3] attempt 3: try disabling the fix for [11] and the problem still occurs. Could
				not figure out why.

			fix [11] first. removed sleep statement, seems still 40-50% of chance of getting it
wrong.


8:30AM 02/22
--------------------------------------------------------------------------------------
Task 213: Fix the problem that process name is not captured.
--------------------------------------------------------------------------------------
	[1]  move b21.exe back and see if the problem persists. [20 min]
			it seems the problem persists. It seems that the copy problem never occurs.
			if we change the file name to checkdebug the problem shows up.
			It may have to do with the size of file name?
			enlarge buffer -> still does not solve problem. All readline buf are declared 
		with 4k, which should be sufficient. Strange.
			change the name to the same size and see if problem persists. does not occur
			change checkdebugger.exe -->
					b1234567890.exe problem occurs
					b212.exe --> repeated 8 times, problem never occurs
					b11112222333344445555.exe
						-> problem every time.
			CONFIRMED: a longer file name causes the problem. Must be some buffer overflow.
			will check it later.

	[2] now concentrate on the process name not captured problem.
		[2.1] collect potential cr3 process ids:
			new 8th CR3 value: 0xf36b000
			---new 9th CR3 value: 0x5f33000
			new 8th CR3 value: 0xf32b000
			0xf2ad000
			--new 8th CR3 value: 0xf36b000
			---new 9th CR3 value: 0x5f33000
			Never the same!

	[3] Debug:
		[1] in all set to NO (with i>=8) set bp
		[2] in check file name (with i>=8 and eip<=0x800000) set bp
			The problem is that arrCR3 has a duplicated entry, winlog.exe 5f3300 is written
		into two neighboring slots.
			identified problem: /home/csc288/qemu/qemu-1.4.0/target-i386/translate.c:8215
8215               arrCR3[cr3Count++] = env->cr[3];
		Comment out this section of code!
		Found the problem and now it's working!
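	Note: the fix recorded above was to comment out the second write in translate.c. A
	defensive alternative, sketched below, is to route every insertion through one helper
	that refuses duplicates. The names arrCR3 and numCR3 come from these notes; record_cr3
	and the capacity MAX_CR3 are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CR3 64              /* assumed capacity, not taken from the real code */

static uint32_t arrCR3[MAX_CR3];
static size_t numCR3;

/* Returns the slot index of cr3, appending it only if it is not recorded yet.
 * Returns -1 when the table is full and cr3 is new. */
static int record_cr3(uint32_t cr3)
{
    assert(numCR3 <= MAX_CR3);
    for (size_t i = 0; i < numCR3; i++) {
        if (arrCR3[i] == cr3)
            return (int)i;      /* already known: do not create a second slot */
    }
    if (numCR3 == MAX_CR3)
        return -1;
    arrCR3[numCR3] = cr3;
    return (int)numCR3++;
}
```

	With a single entry point like this, a second writer cannot put the same CR3 into two
	neighboring slots, at the cost of a linear scan per insertion.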

10:45AM
	[4] collect the trace and do the experiment
	293 slices to generate

	[5] problem: cannot get file offset of 0x7c80xxxx. The problem is caused by the
		generate bridge which overwrites 0x7c80xxxx not in range.
		[5.1] generate the full dump first.
			for soc: tsStart = 146327, tsEnd = 146535, bridge: 146536
			it looks like the traces do not match each other at all.
		clean the full trace.
		error again.
		
		Problem: {tsStart = 146327, tsEnd = 146535, bModified = false, 
  tsBridge = 146536, room = 8, tsNextStart = 14656
---------------- DATA BELOW ------------------------
		timeStamp: 146535, ins @403790: call    [0x420208]
 read: (start: 0x420208, end: 0x42020b)  write: (start: 0x12ff24, end: 0x12ff27) , ESP: 0x12ff28 -> 0x12ff24 , DEPLINKS:  , R: 146534 and ESP value: 0x12ff24, M: 72984

timeStamp: 146536, ins @7c80c6cf: mov   edi, edi
, DEPLINKS:  , R: 146517 , C: 146535 ESP: 0x12ff24 EBP: 0x12ff84
---------------- DATA ABOVE------------------------
	the problem is that the CALL cannot be the end of an SOC. Check how SOC is extended.
--> still not solved. recompile it later.

	9:00AM 02/23/2014
	[1] find out the slice point
		159421
	[2] check how soc end is identified as a call instruction. set a conditional bp at
		socmanager::identifySOC.
		does not capture it
	[3] try to set a watch point on sm.vecSOCs[8].tsEnd
		could not capture it.
	[4] search tsEnd in soc.
		Result: soc (146535, 146535) is first created as a single SOC; then it is merged
		with socNext (146327, 146534) at socmanager.cc:79.
	[5] check verifyBridge
		set a conditional bp there.
			problem is in setBridgeTo it returns true (which should be actually a false)
		in setBridgeTo if the tsEnd is a call instruction, then it should be a direct false.
		The bridge itself is actually checked against call instruction in get_room().
	[6] fix setBridge [20 min] --> seems to be fixed, but only one soc.
		problem is that it's always one soc. maybe add singleSOC should be expanded for
			single ts that is a jump/call.
	[7] test: 293 branches:
			started: 11:13AM--> 

8:45AM
--------------------------------------------------------------------------------------
Task 213: perform experiment
--------------------------------------------------------------------------------------
	[1] collect and copy the branch slices.
		There are some infinite loop slices. Put them in problem_slices
			Problem files: 60/293
	[2] check the good files. 223 lines.
	[3] kernel debug mode.
9:00AM 02/25/2014
	[4] compare. Yeah! it works!!!! branch 97 discovered the difference!
		23c223
< slices\b212.exe\brc_97\b212.exe: 0x11220001^M
---
> slices\b212.exe\brc_97\b212.exe: 0x22330001^M

	[5] apply it to b1.exe and see if it works. Verified: no difference!
DONE!

10:45AM 02/25/2014
--------------------------------------------------------------------------------------
Task 214: strengthen the time-out function of runproc.cc
--------------------------------------------------------------------------------------
	[1]  just change the time out value in milliseconds. Now change to 3 seconds.
	solved. All timed out gets exit value 0x103.

11:30AM 02/25/2014
--------------------------------------------------------------------------------------
Task 215: Fix the long bin executable file name problem
--------------------------------------------------------------------------------------
	[1] pick one .exe that is trapped in infinite loop. 
		Example: branch_40
		Trapped in a loop close 0x00409aa6 to 0x00409aaa (there is no update of the
			value).
		The problem is that 0x00409aaa is ONLY hit once and it passes.
		Problem: at 0x004012FC the stack pointer location is different.
		at 0x0040125D the call instruction did not actually balance the ESP. It did
	ADD ESP, 0x10 [which reverts the effect of the call instruction on ESP];
	however, it ignores the fact that the previous 4 ESP instructions are ignored.

	[2] regenerate the branch 40 and get the corresponding slice eip.
			ts=163443. Not working. Needs to set a conditional BP to capture it.
	[3] verify if branch 40 is the problematic one.
	[4] study why the error occurs:
		[1] where is the function adjust ESP?
				it is generated by CallAdjustRecord::asReplacement() in 
				binWriter::writePartialTraceToFile.
		[2] if the function is skipped, and if the function has replacement (not nops),
			then we'll need to add dependency to the previous ESP writing instruction!
			In this case, we'll need to add a progESP function after the call of
			processFunction in Trace.cc
			
	[4]  b binWriter.cc:433 is not hit!
8:45AM 02/26/2014
	[5] try to generate an infinite loop branch.
		[0] generate the full trace. 
		[1] set a conditional CallAdjustRecord::asReplacement(), need to check that it is
				being called by the binWriter!
		[2] get the ts
			eip: [0x403843]
			ts: 141819 
		[3] generate the trace in both mode 1 and 0. Verified the slice is not working.
			reason: at 0x403843, it's no go. CAPTURE SLICE DONE.
		[4] Fix idea:in Trace class add a function propagateESPEBPLink when a car record
			is available.
9:00AM 02/27/2014
		[5] Implement propagateESPEBPLink - do it directly inside trace::full_slice
			[5.1] check if car record exists for EIP, if no, directly return
					call Trace::isFunctionNoChangeOnESPEBP
			[5.2] if esp changes, find the latest esp writing instruction and 
					add an ESP delay link to it
					call delayRegDependency
			[5.3] if ebp changes, find the latest ebp writing instruction and 
					add an EBP delay link to it 
			9:40AM
			[5.4] test/debug
				ts: 141819, eip: 0x403843
				problem: did not capture 0x403843, the problem is that the function
				itself is identified as has dependency, and is included? How come 0x403843
				is not included in slice?
				[1] enable log and check the following timestamps:
				137803 and check how it's NOT included.
				137825 (the RET)
				Strangely: 137803 and 137802 are included in slice. When writing, they are not
		in slice.
				check when it's disabled, break on InstrInfo::setInSlice() and unmarkInSlice()
				Observation: the tryAddCAR is called and used to add it; but
		finally it is identified as having data dependency.

		Question: for every iteration, should we clear CallAdjustRecord?????

8:40AM 02/28/2014
	[6] figure out whether eip 0x403843 (ts 141819) is contained in slice.
		[6.1] find out how many iterations [10 min]
				ts: 141819, eip: 0x403843
				(id = 4)
		[6.2] verify if it is hit in the last iteration at  all [10 min]
				set a BP conditional and see how it's hit.
				captured in pass3.
		[6.3] problem analysis:
			[1] car is added even if the call is identified to have data dependency in body.
				which is no good.
			[2] car is not cleared at each pass which is no good.
9:20AM
	[7] fix the problem.
		[7.1] move the place of esp/ebp check. [5 min] DONE.
		[7.2] add CallAdjustRecord::clear() [5 min]
		[7.3] call CallAdjustRecord::clear in every pass beginning [10 min] DONE.
		[7.4] double check the code of propagating ESP/EBP [10 min] DONE
		[7.4] debug ts:141819 and binWriter.cc and check if it's hit again. [15 min]
			b Trace::gen_slice_for_branch set ts to 141819
			b binWriter.cc:412
	[8] now new problem. the generated program crashed close to the termination point.
		[8.1] analyze the problem. It looks like that it starts from 0x403d30 (the
				jump to last section is not handled right).
		[8.2] generate it again. It seems problem persists.
		[8.3] in binWriter.cc set a conditional bp on 0x403d30
				observation: it is writing the instruction and the next few instructions.
				Know that file offset is from  12592 to 12602 (where it's messed with
			other data or instructions).
				Observation: it writes 12592 (2 bytes), and then 12594 (1 byte)
			then this is the last SOC, and it closes fid. Then it continues to
		binWriter::writeProgramExit.
				Problem is asJMP. Fix the logic.
				fixed.
	[8] generate 20 slices and test. There is One exception.
	[9] generate 40 slices and test.
			Around 4 slices with problems: 27, 40, 51
			0x103, 0x25021 etc.	
	[10] generate the entire slice
		Bug: crash. Problem REQ mode is set to 0. FIXED. It's in Trace constructor.

8:30AM 03/01/2014
	[11] generate the entire slices (293)
		Bug: found most of them still crashes with code 0x103.
		It seems to be the problem of runproc.exe
			Found that it is still the problem of slicing.
	[12] check slice 108. Compare with a correct slice.
		It's the stack problem again.
		find the earliest slice with problems.
		It seems that it's runproc.exe which drains the resource eventually.
		It did not release the resource of a process
10:00AM
	[13] modify runproc.cc [30 min] DONE.
		It has greatly reduced the number of 103s.
		The earliest slice is 40.
10:30AM
	[14]  compare the log file again. verified. It successfully detects the error.
		

11:00AM
--------------------------------------------------------------------------------------
Task 216: Fix the long bin executable file name problem
--------------------------------------------------------------------------------------
	[1] check how  sendCommandToVM is accomplished.
		it calls handle_usr_command
			hmp_send_key
	[2] enlarge the hmp_send_key buffer see if it helps
		[2.1] repeat the error first (create a 30 char long name)
			total: 10  
			fail: 9
		[2.2] enlarge the buffer in hmp_send_key
			total: 5
			fail: 1
		So enlarging the buffer of hmp_send_key does not work.

	[3] study the logic of hmp_send_key [15 min]
		All chars are actually sent to hmp_send_key.
	[4] analyze the workflow.
		main-loop.c -> handle_user_command_dummy
			-> BatchAnalyzer::take_and_exec_cmd_from_buffer()
			-> only called handle_user_command ONCE!
		How about using a loop? added 1s wait time before pushing each key. does not work.
	[5] get into the details of sending a key board input.
		qmp_send_key -> kbd_put_keycode-> qemu_put_kbd_event
		Check which part is missing
	[6] in kbd_put_keycode
		re-enable the printf
		Translation from keycode to index is done by index_from_key
2:30PM
	[7] add translation from keycode to key
			string (index_from_key) -> index (keycode_from_keyvalue) -> keycode -> kbd_put_keycode
			in ui_input.c
			in ui_input.c work on the following:	
		[7.1] key_from_index
		[7.2] key_from_keycode
		[7.3] insert into kbd_input_key see how it works.
	Conclusion: at the level of kbd_put_keycode is still fine.
		It may be that the keyboard events are sent too quickly and some are missed by the
	windows kernel.

	[8] add a magic number to slow down the entering of keycode.
	[9] test:
		total: 10 
		fail: 0
	[11] regenerate all tests.
ALL done.
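	Note on step [8]: the notes do not say how the "magic number" slows key entry. One
	simple way to pace the keys is a fixed sleep after each one, sketched below. Nothing
	here is the QEMU API: send_keys_paced, key_sink_fn, struct key_log and log_key are
	hypothetical names, and the sink callback stands in for whatever finally calls
	kbd_put_keycode.

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Callback that delivers one key to the guest (hypothetical stand-in). */
typedef void (*key_sink_fn)(char key, void *ctx);

/* Example sink that only records the keys it receives. */
struct key_log { char keys[64]; size_t len; };

static void log_key(char key, void *ctx)
{
    struct key_log *log = ctx;
    if (log->len + 1 < sizeof log->keys) {
        log->keys[log->len++] = key;
        log->keys[log->len] = '\0';
    }
}

/* Sends every character of text to sink, sleeping delay_ms after each key so
 * the guest keyboard driver has time to consume it. Returns the number sent. */
static size_t send_keys_paced(const char *text, long delay_ms,
                              key_sink_fn sink, void *ctx)
{
    struct timespec gap;
    size_t sent = 0;

    assert(text != NULL && sink != NULL && delay_ms >= 0);
    gap.tv_sec = delay_ms / 1000;
    gap.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (; text[sent] != '\0'; sent++) {
        sink(text[sent], ctx);
        nanosleep(&gap, NULL);
    }
    return sent;
}
```

	The delay is a tuning knob: too small and long file names are truncated again, too
	large and the batch run slows down.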
		

9:30AM 03/03/2014		
--------------------------------------------------------------------------------------
Task 217: find an experiment with a packer
--------------------------------------------------------------------------------------
	[1] find packer. Themida
	[2] problem: crashes QEMU, problem found in the code va_to_ha.
	[3] problem area:
		include/exec/softmmu_header.h:176
176                                                     fprintf(stderr, "ERROR in I/O unalgined access. Count is %d\n", count); exit(8);
	[4] read the va to ha logic: [20 min]
			It seems to be hit multiple times and no problems with it
	[5] check the error address. It's 0x425ffe.
	[6] set conditional bp on 0x425ffe and check the logic.
		observation: 0x425ffe is first translated to an I/O address and then translated into
	a complete address.
		@EIP 0x425527: length: (5): push        $0x000008BE
		@EIP 0x42552c: length: (3): movl        %ebx, (%esp)
		@EIP 0x42552f: length: (3): popl        (%edx,%eax)
	instruction 0x42552f popl instruction triggers the helper_trace_mem function.

	[7] check the meaning of TARGET_PAGE_MASK
		TARGET_PAGE_BITS is 12
		#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS) ==> 2^12 = 4k = 0x00001000
		#define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1) => 0xFFFFF000
			tlb_addr & ~TARGET_PAGE_MASK is equivalent to
			tlb_addr & 0x00000FFF
		tlb_addr is retrieved from softmmu table. (given the page index)
			found: tlb_addr: 0x425010.
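	The mask arithmetic in [7] can be checked with a few lines of C. The three macros
	mirror the definitions quoted above for a 32-bit target; page_low_bits, tlb_index and
	check_log_values are hypothetical helpers, and CPU_TLB_SIZE = 256 is an assumption
	chosen because it reproduces the page_idx values 0x25 and 0x26 recorded in this task.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE (1u << TARGET_PAGE_BITS)      /* 0x00001000 */
#define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1u))    /* 0xFFFFF000 */
#define CPU_TLB_SIZE     256u                          /* assumed table size */

/* Low 12 bits of a TLB entry address: nonzero means flag bits are set,
 * so the access leaves the fast RAM path. */
static uint32_t page_low_bits(uint32_t tlb_addr)
{
    return tlb_addr & ~TARGET_PAGE_MASK;               /* & 0x00000FFF */
}

/* Slot of a virtual address in the softmmu table. */
static uint32_t tlb_index(uint32_t va)
{
    return (va >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1u);
}

/* Re-derives the numbers written down in the notes. */
static void check_log_values(void)
{
    assert(TARGET_PAGE_MASK == 0xFFFFF000u);
    assert(page_low_bits(0x425010u) == 0x010u);  /* flag bits set: slow/I/O path */
    assert(page_low_bits(0x426000u) == 0u);      /* plain RAM page */
    assert(tlb_index(0x425ffeu) == 0x25u);
    assert(tlb_index(0x426000u) == 0x26u);
}
```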

	[8] check the 1st part: page_idx: 0x25
			because va&(realsize-1) is 0, it jumps directly to the iotable access.
	[9] check the 2nd part: same addr returned
	[10] check the 3rd and 4th path
		now va is 0x426000 -> page index is 0x26 [different page now]
			tlb_addr: 0x426000 -> now since its & with ~TARGET_PAGE_MASK is 0 (no flag bits set),
		it's not treated as I/O.
			So the 3rd addr is mapped a regular addr, and the 4th addr is mapped as a regular
		addr. That's the problem that it crashes, because it cannot hold 2.
		But actually the second address is consecutive, we can actually increase the
		length by 1.

	[11] improve the algorithm so that it can be incremented length by 1.
9:00AM 03/04/2014
		It runs and reports memRange out of range.
		Verified it's caused by the problem of detect_vm
9:30AM
	[12] enlarge memRange range size. [20 min] DONE.
	[13] test the testvm.exe again. It takes a lot of time to run.
		The program timed out.
10:15AM
	[15] running result: it throws a dialog that the program can only run in the computer
		that it is protected.
	[14] run testvm.exe without the analysis platform [20 min]
	[15] try to figure it out why the themida (testvm.exe) takes so much time to run.
		break on Cache::savetoDisk and Cache::saveCurrentBlockToDisk
	Problem: InstrStore size too small. Change it
		[15.1] test with job 1 OK.
		[15.2] run testvm. Still too slow. Try gprofiling and see what's going on.
	[16] use gprof, add -pg to Makefile (rules.mak)
		Use command: gprof /usr/local/bin/qemu-system-i386  gmon.out > ana.txt
		[16.1] first, run it for b21.exe.
			after pg is linked, no end.
		[16.2] run the testvm.exe
			 time   seconds   seconds    calls  ms/call  ms/call  name
		 50.00      0.01     0.01      740     0.01     0.01  int128_2_64
		 50.00      0.02     0.01                             aio_ctx_prepare
		  0.00      0.02     0.00  5437623     0.00     0.00  update_proc_stats
	Note int128_2_64 and aio_ctx_prepare. Suspected that the gprof data is not accurate enough.
	It seems that gprof does not report the accurate report and it does not include
	the I/O time.

	[16] remove the -pg profiler
8:30PM
	[17] try callgrind (a part of valgrind) - generated callgrind_run.sh
	[18] install kcachegrind	
	[19] run - it seems to be producing more accurate results.
		Let it run. Since it's too slow, use callgrind control to start recording in the middle.
	 callgrind_control -i on
		Still waiting ... It fails at copying.

	[20] check build_page_map logic, when is it called?
		It's not very efficient. It is called at each ENTRANCE of sysenter.
		comment out the count and see if it makes things runs faster.
		set a global counter and print how many times it is printed every 1000 times.
		Verified: at a certain point, it starts to build page table too often.
			Wait and see if it can finish the task.
	[21] It breaks at the memRangemanager capacity 500. Strangely
			handle_phy_memory access is called at every helper_trace in
			tcg/i386/tcg-target.c:1254
Need to study the logic of helper_trace_mem

10:00AM 03/06/2014
--------------------------------------------------------------------------------------
Task 218: double check the handling of physical memory
--------------------------------------------------------------------------------------
	[1]	 set a breakpoint and study the logic
		b tcg-target.c:1255
		In the trace, there is actually bTracePhyMem protecting it.
	[2] observation:
		(a) setPhyTraceMode is called in ops_sse.h: helper_trace2
		(b) it's unset in disablePhyMemTrace in Trace.cc
	[3] algorithm design for speeding it up.
		[1] check when bTracePhyMem is modified. get all of them
			trace.h:   setPhyMemTraceMode
						enablePhyMemTrace
						disablePhyMemTrace
	[4] Design: [40 min]
		[1] add bNeedsTracePhyMem as an integer to trace.h [5 min] DONE.
		[2] in BatchAnalyzer::execRawTrace set it to 0 [8 min] DONE
		[3] modify setPhyMemTraceMode and enablePhyMemTrace in Trace [5 min] DONE
		[4] modify disablePhyMemTrace [5 min] DONE
		[5] run and compare performance [10 min]
			check if it's ever disabled --> no significant improvement
	[5] still crashes on memory range limit.
9:00AM
	[6] fix the memory range limit problem.
		[6.0] modify the program and record the cr3 of the trace, and the eip  DONE.
			of the last instruction before entering syscall (change EIP mode).
			[1] in Trace class, there is last_eip, cr3 [5 min]
		[6.1] repeat the problem twice and see if it is the same EIP. [25 min]  Skipped.
			There seems to be once that the program executes normal
9:45AM
		[6.2] simply enlarge the memRangeManager size and unit testing it [20 min] OK. 
				no bug found
10:45AM
		[6.5] for timeout event of execRawTrace, try sendkey ctrl-c. [1 hr]
			[a] how to send key. DONE.
			[b] add a command sendkey to BatchAnalyzer [10 min] DONE
			[c] add an attribute Task:bTaskAnalyze to Task and set it to false by default;
				in taskAnalyze initialize to true. [8 min] DONE
			[d] at line 1182, change the handling of taskAnalyze. send a control-c to 
				command module [10 min] DONE
			[e] debug:
					[1] case loadVM [8 min]
					[2] exec taskAnalyze [10 min]
						--> there are bugs related to this. fix it later. address 3 first.
					[3] test the program testvm [15 min]
						--> full trace is never generated. The problem is that raw_trace 
					is not there.
							Problem: full trace generation algorithm is hit; however, it is
					switched to some other thread in the qemu emulator about i/o locking.
8:30AM 03/08/2014
				[4] check if the PROCESS_TERMINATE signal is received.
						b BatchAnalyzer.cc:69, 1100, 1208
					Verified, it's the timestamp problem.
9:10AM	
				[5] add timeouts for different tasks (rawtrace, fulltrace, slicetrace
					and remove the original TASK_TIMEOUT). Then update the call of TASK_TIMEOUT
					to corresponding tasks. [20 min] Actually no need because fulltrace
					and branch slices have no timeouts.
9:20AM
				[6] verify if taskFullTrace and taskBranchSlice are hit. [10 min]
					b taskBranchSlice::gen_branch_slice
					b taskFullTrace::gen_full_trace
					Found the problem, the reason is that in branch_slice, tsEIP is not found 
				(hit) yet. Trace.cc [1153]

9:40AM
				[7] fix the problem, add exception handling here. [35 min]
10:15AM
				[8] handle segmentation in second batch. [15 min]
					found the problem->memory consumption too big.
					Still occasionally I/O thread lock up. Not sure what's the problem.
10:30AM
				[9]
					check when trace is serialized to disk.
					taskSaveTraces::do_job, it's hit. So not likely the file locking.
					stop the vm in saveTraceJob.
					does not help. recompile.

10:45AM
				[10] still not working. vm wait no good.
					try another job. Memory problem is solved.

11:04AM
				[11] try notepad.exe
					The control-c method is not working. Notepad continues to run
				and never ends.


11:00AM 03/11/2014
--------------------------------------------------------------------------------------
Task 219: improve the process termination
--------------------------------------------------------------------------------------
	[1] investigate the task terminate command [15 min] DONE
		Use taskkill /f /im notepad.exe
	[2] design term process algorithm [25 min] DONE
		[1] determine the process name. can collect in taskAnalyze
		[2] check timeout. For each task there is a timeout event.
		[3] at line 1201 of BatchAnalyzer.cc is to terminate the job.
	[3] implement it and test it [25 min]
		[4] shoot the command
		[5] find the process terminate event handling.
				BatchAnalyzer.cc:69
				BatchAnalyzer.cc:1226 add a conditional branch.
		[6] it seems that it never captures the process terminate event for the notepad.exe
			the system is running but the notepad is never showing up, the taskkill is
			also not working.
			After the first non related PROCESS TERMINATE EVENT is received, mouse_move 
			does not work
		[7] check when stopvm is called. -> never called
			helper_trace is called
	[4] check the b21.exe. It is working fine.
	[5] check b21 and let it time-out see if it is ok. It seems to be working. VERIFIED
		it is working.
			break at BatchAnalyzer.cc:69, 1226, 1205
			It seems that ctrl-c is pretty quick at terminating 
	[6] run notepad.exe again and give it a long break.
			It seems one of the svchost.exe terminates and then the entire system is not
		responding (helper_trace2 is still capturing).
		Disable the sendkeyCommand killtask.
	[7] It looks like that the NOTEPAD.exe freezes the system. Not sure why.
4:00PM
	[8] check why NOTEPAD.exe freezes the system.
		[8.1] break on qemu_run_all_timers when it freezes and check the timers of vm_clock.
		It looks that rtc_update_timer does not look normal. Before freeze its value is:
			1115771634328 --> after -->
			86400771630298 (it seems that it needs to be hit first by GDB first, and then
				the 2nd time it is hit it turns out to be the big number).
		But it went back to normal --> 1345165842103
			The program prints PROCESS EXITS! DUMP THE TRACE
			b seg_helper.c:2345

		The first PROCESS EXITS is 5dee000
	5dee000 - csrss.exe (controls threading and windows console)
		Which is NO GOOD. Why is this process get killed?
	-------------------
		Description of csrss.exe (from Internet microsoft.com)
			csrss.exe is the user-mode portion of the Win32 subsystem; Win32.sys is the kernel-mode portion. Csrss stands for Client/Server Run-Time Subsystem, and is an essential subsystem that must be running at all times. Csrss is responsible for console windows, creating and/or deleting threads, and implementing some portions of the 16-bit virtual MS-DOS environment. http://www.neuber.com/taskmanager/process/csrss.exe.html 
-------------------------

		Need to check out the last moment of csrss.exe!

	[9] declare append_eip, remove_eip, and print_queue in handle.h and also
		handle.cc.  [20 min]
				0x804dbdbd 
				0x804dbdc1 
				0x804dbdc4 
				0x804dbdca 
				0x804dbdcc 
				0x804dbdd2 
				0x804dbdd8 
				0x804dbddb 
				0x804dbeb9 
				0x804dbebb 
				0x804dbec0 
		seg_helper.c:2345
			before sysenter, it's 
				0x75b44df9 
				0x75b44dfb 
				0x75b44dfd 
				0x75b44e00 
				0x75b44e03 
				0x75b44e06 
				0x75b44e08 
		dump below:
@EIP 0x75b44ded: length: (3): andl      $0x00, (%ecx)
@EIP 0x75b44df0: length: (7): leal      0x75B480E0(,%eax,8), %edx
@EIP 0x75b44df7: length: (2): movl      (%edx), %ecx
@EIP 0x75b44df9: length: (2): cmp       %edx, %ecx
@EIP 0x75b44dfb: length: (2): jz        0x0000001A
@EIP 0x75b44dfd: length: (3): movl      0x4(%esi), %edi
@EIP 0x75b44e00: length: (3): leal      -0x10(%ecx), %eax
@EIP 0x75b44e03: length: (3): cmpl      %edi, 0x1C(%eax)
@EIP 0x75b44e06: length: (2): jnz       0x00000009
@EIP 0x75b44e08: length: (3): movl      0x18(%eax), %ebx

	It's hit many times. Check the above again tomorrow.
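	Step [9] only names append_eip, remove_eip and print_queue. One plausible shape for
	them is a fixed-size ring buffer that always holds the most recent EIPs, sketched
	below. The signatures, the capacity EIP_QUEUE_CAP and the overwrite-oldest policy are
	assumptions for illustration, not the contents of handle.h / handle.cc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define EIP_QUEUE_CAP 16        /* assumed: how many recent EIPs to keep */

static uint32_t eip_queue[EIP_QUEUE_CAP];
static unsigned eip_head;       /* index of the oldest entry */
static unsigned eip_count;      /* number of valid entries */

/* Appends the newest EIP; when the queue is full the oldest one is dropped. */
static void append_eip(uint32_t eip)
{
    unsigned tail = (eip_head + eip_count) % EIP_QUEUE_CAP;
    eip_queue[tail] = eip;
    if (eip_count < EIP_QUEUE_CAP)
        eip_count++;
    else
        eip_head = (eip_head + 1) % EIP_QUEUE_CAP;
}

/* Removes and returns the oldest EIP; the queue must not be empty. */
static uint32_t remove_eip(void)
{
    uint32_t eip;
    assert(eip_count > 0);
    eip = eip_queue[eip_head];
    eip_head = (eip_head + 1) % EIP_QUEUE_CAP;
    eip_count--;
    return eip;
}

/* Prints the queued EIPs, oldest first, one per line. */
static void print_queue(FILE *out)
{
    for (unsigned i = 0; i < eip_count; i++)
        fprintf(out, "0x%x\n",
                (unsigned)eip_queue[(eip_head + i) % EIP_QUEUE_CAP]);
}
```

	Calling append_eip from the per-instruction trace hook and print_queue at the
	termination point would give a "last N instructions" dump like the lists above.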


9:00AM
--------------------------------------------------------------------------------------
Task 220: improve the process termination
--------------------------------------------------------------------------------------
	[1] check the handling of process termination [10 min]
			sysenter, EAX request 0x101 is to terminate process.
			do other registers impact the semantics such as ECX or EBX?
			(gdb) p/x env->ECX_BEFORE_SYSENTER
				$5 = 0x69ffd4
				(gdb) p/x env->EDX_BEFORE_SYSENTER 
				$6 = 0x69fed0
			Strangely, append_eip is never called but still able to dump.



	[2] re-check the last 10 instructions before process termination of csrss.exe (5dee000) [15 min]
		0x75b448dc 
		0x75b448dd  ** lcall 7c9010ed
		0x7c9010ed 
		0x7c9010f1 
		0x7c9010f3 
		0x7c9010f6 
		0x7c9010f8 
		0x7c9010fb 
		0x7c9010ff 
		0x7c901101 ** return  
		0x75b448e3 ** cmp       $0x07, %si 
		0x75b448e7 
		0x75b448e9 
		0x75b448ef 
		0x75b448f2 
		0x75b448f5 ** lcall  
		0x7c90e88e  mov       $0x00000101, %eax
		0x7c90e893  mov       $0x7FFE0300, %edx
		0x7c90e898  lcall     *(%edx)
		0x7c90eb8b 
		0x7c90eb8d This must be sysQuickCall 

	[3] check if any of these has been hit.
		Verified: 0x75b448dd is actually ONLY hit once, now the problem boils down to
	what is the unique path that leads to the problem. check the previous 100 instr.
...
0x804df184  sysexit
0x7c90eb94  ret
0x7c90e384  ret 0x10
0x75b446be  test %eax
..
0x75b4470a ** lcall will be called multiple times 
0x7c901005 *  always hit after the above lcall??? 
...
0x7c90102b ret **  hit many times
0x75b44710 
0x75b44fe1  ** hit many times! 
...
0x75b448dc  ** hit ONCE
0x75b448dd  ** only HIT ONCE!
0x7c9010ed 
...
0x7c9010ff 
0x7c901101 
0x75b448e3 
	
	[4] from the above it seems to be 0x75b448dc's problem. dump
		@EIP 0x75b448d0: length: (2): jnz       0xFFFFFFEB
		@EIP 0x75b448d2: length: (1): cmpsb     %ds:(%esi), %es:(%edi)
		@EIP 0x75b448d3: length: (2): addb      %al, (%eax)
		@EIP 0x75b448d5: length: (3): addb      %dl, -0x18(%edi)
		@EIP 0x75b448d8: length: (1): stc
		@EIP 0x75b448d9: length: (1): push      %es
		@EIP 0x75b448da: length: (2): addb      %al, (%eax) ***
		@EIP 0x75b448dc: length: (1): push      %ebx
	It seems that it's 0x75b448da cuts in some interrupt and causes the problem.
		Problem: 75b448da is never hit! (VERIFIED) strangely it is never hit.
	 	check without the condition on CR3. (still not working)
	verified: strangely the previous instruction 0x75b448da is never hit.

	[5]  conjecture: someone prepared the stack so that it jumps to the path to
	terminate the process. check 0x75b44fe1 , what are usually the next addr.
		0x75b449a2
		0x75b448a4
		It seems to be alternating between these two.
		The problem is who is pushing these addresses to the stack?
		Need to set a conditional BP and check the saved words

8:30AM 03/14/2014
	[6] add value capture code snippet to line 1287 of tcg/i386/tcg-target.c (helper_trace_mem)
		and see who is pushing value 0x75b448dc to the stack? [30 min]
		[6.1] is helper_trace_mem done before or after the mem operation?
			It seems to be before the real operation.
		[6.2] then we should block on read instruction first, get what is the stack
			address that 0x75b448dc is read from the stack. DOES NOT WORK. it crashes program.
		[6.3] in ops_sse.h helper_trace2, in the next instruction (0x75b448dc), read the
			esp value from env->ESP_VAL_BEFORE.
			value is 0x69fee0, 0x52fee0, 0x101fee0
			not stable, but 0x69fee0 is the most frequent. (especially the first time GDB is run)
9:30AM
		[6.4] 6.3 does not work. Find out where is cpu_stl_code [15 min]
			it's defined in target-i386/soft_mmu.h:306
9:50AM
		[6.5] set a conditional BP there and see who is saving 0x75b448dc [15 min]
			include/exec/softmmu_header.h
	DOES NOT WORK. because cpu_stl_xxx might not even be called!
10:30AM
		[6.6] study the logic of where helper_trace_mem is called again! [30 min]
			tcg_out_trace_mem is called at tcg_out_qemu_ld and tcg_out_qemu_st, they
		will be always called for memory read and write.
		It basically generates a branch using three calls tcg_out_tlb_load,
			tcg_out_qemu_st_direct, add_qemu_ldst_label.

			tcg_out_tlb_load:  it calculates the address and tests some bits in the softmmu table entry
		and based on that generates two branches. The first branch will be load_direct
			tcg_out_qemu_st_direct:
			add_qemu_ldst_label: creates a label that will be processed later to be wired.
11:00AM
		[6.7] modify tcg could be too costly to create a function like helper_trace_mem.
			Instead, set a bp on helper_trace_mem when the addr being written is
				0x69fee0 and 0x52fee0 and 0x101fee0
				verified, it's hit too many times!!!!
		[6.8] check tcg_out_qemu_st_direct
				for 4-bytes operation, it calls directly
					tcg_out_modrm_offset: it just generates a 2-byte instruction (pseudo).
					which does not look like the direct access. Strange
					it is generating one instruction of MOV.
	   --- try to understand the logic completely.
		[6.9]  browse softmmu_header.h

8:30AM 03/15/2014
		[6.10] read tcg_out_tlb_load (tcg-target.c): [30 min]
		the following is the code generated, see comments
   0xb5134b8d <code_gen_buffer+2957>:   call   0x834104a <helper_trace_mem>
		#addr to write is 0x6ffc
   0xb5134b92 <code_gen_buffer+2962>:   add    $0x10,%esp
   0xb5134b95 <code_gen_buffer+2965>:   pop    %edx
   0xb5134b96 <code_gen_buffer+2966>:   pop    %ecx
   0xb5134b97 <code_gen_buffer+2967>:   pop    %eax
  0xb5134b98 <code_gen_buffer+2968>:   mov    %ecx,%eax #copy addr into %eax
	#ecx is the address 0x6ffc
   0xb5134b9a <code_gen_buffer+2970>:   mov    %ecx,%edx #copy addr into %edx
   0xb5134b9c <code_gen_buffer+2972>:   shr    $0x8,%eax
   0xb5134b9f <code_gen_buffer+2975>:   and    $0xfffff003,%edx
   0xb5134ba5 <code_gen_buffer+2981>:   and    $0xff0,%eax
   0xb5134bab <code_gen_buffer+2987>:   lea    0x360(%ebp,%eax,1),%eax #load MMU entry
   0xb5134bb2 <code_gen_buffer+2994>:   cmp    (%eax),%edx #see if entry matches
   0xb5134bb4 <code_gen_buffer+2996>:   mov    %ecx,%edx
   0xb5134bb6 <code_gen_buffer+2998>:   jne    0xb5134bbc <code_gen_buffer+3004> #jump to slow path
   0xb5134bbc <code_gen_buffer+3004>:   add    0x8(%eax),%edx #store the offset to softmmu

***	Note the instruction label_ptr[0] = s->code_ptr; recorded 0xb5134bba (which is the
	target of the address of branch instruction), it will be overwritten later.
*** offsetof(CPUArchState, tlb_table[mem_index][0]) is the way to 
	access global variable.
*** It looks like the %edx contains the actual address. See below
	tcg_out_mov(s, type, r1, addrlo); //at the beginning
	here it moves addrlo (which is from a dynamically allocated register) [%ecx], 
into r1 (which is %edx)

 tcg_out_qemu_st_direct just generates one instruction:
   0xb5134bbf <code_gen_buffer+3007>:   mov    %edi,(%edx) #this is to perform the save into MMU
 *** here tcg_out_qemu_st_direct(s, data_reg, data_reg2, TCG_REG_L1, 0, 0, opc);
	data_reg is the register that contains the 32-bit data, (i.e., it's %edi)
	TCG_REG_L1 is the register that contains the target address (i.e., it's %edx) - 
		the target address is actually the address of the softmmu entry.
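
	A minimal C model of the fast path in the listing above, as a reading aid only.
	It is not QEMU source. Assumptions: a 16-byte entry with four 32-bit fields and
	256 entries (inferred from the "and $0xff0" mask and the 0x8(%eax) operand),
	and a 32-bit addend. ToyTlbEntry and toy_store_fast_path are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 16-byte entry. The lea in the listing points at addr_write,
 * so the addend sits 8 bytes after it (the 0x8(%eax) operand). */
typedef struct {
    uint32_t addr_read;
    uint32_t addr_write;
    uint32_t addr_code;
    uint32_t addend;      /* host address minus guest address (32-bit model) */
} ToyTlbEntry;

#define TOY_TLB_ENTRIES 256u

/* Returns 1 on a hit and stores the translated address in *host;
 * returns 0 when the slow path would be taken. */
static int toy_store_fast_path(const ToyTlbEntry *tlb, uint32_t guest_addr,
                               uint32_t *host)
{
    assert(tlb != NULL && host != NULL);

    /* shr $0x8 ; and $0xff0  ==  ((addr >> 12) & 0xff) * sizeof(entry) */
    uint32_t byte_off = (guest_addr >> 8) & 0xff0u;
    const ToyTlbEntry *e = &tlb[byte_off / sizeof(ToyTlbEntry)];

    /* and $0xfffff003: page number plus the two low bits, so a
     * misaligned 4-byte store never matches the page-aligned tag. */
    uint32_t tag = guest_addr & 0xfffff003u;

    if (e->addr_write != tag) {
        return 0;                      /* jne -> slow path */
    }
    *host = guest_addr + e->addend;    /* add 0x8(%eax),%edx */
    return 1;
}
```

	With the logged store address 0x6ffc this picks entry 6 and compares against tag 0x6000.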

9:30AM
* hardware breakpoint to figure out the relationship between addresses, see annotations
	above (from helper_trace_mem)
10:00AM
	Implementation steps:
	[1] in CPUArchState (cpu.h) add last_write_val [10 min] DONE.
	[2] add a move instruction that writes into env->last_write_val, simulate env->ESP_VAL_BEFORE
		[60 min]
		tcg_gen_st_tl(cpu_T[0], cpu_env, offsetof(CPUX86State, ESP_VAL_BEFORE)
		initial implementation does not work.
		Modify from:
		tcg_out_modrm_sib_offset(s, 
			OPC_LEA + P_REXW, //opcode
			r0, //register --> destination register
			TCG_AREG0, //rm --> ebp->points to actually the CPUArchState structure's base
			r0, //index // contains the corresponding MMU entry index  0x360[ebp+eax+1] -> eax
				//it is the eax here.
			0, // shift
			 offsetof(CPUArchState, tlb_table[mem_index][0]) //offset
			);
	It generates the following 0x360 must be the result of offset
   0xb5134bab <code_gen_buffer+2987>:   lea    0x360(%ebp,%eax,1),%eax #load MMU entry
10:50AM
	break into tcg_out_qemu_stl and observe the values:

		Effort 2: in tcg_out_qemu_st_direct:
			[1] push r0. DONE
			[2] set r0 to 0. DONE
			[3] then call tcg_out_modrm_sib_offset to set r0
			[4] generate the save code

	[3] debug into tcg_out_qemu_st_direct [15 min]
		Checked the code generated but could not debug into it, 
		Seems to be ok and did not crash app.
11:45AM
	[4] now check who's pushing 0x75b448dc as the return address.
		@EIP 0x75b448dc: length: (1): push      %ebx
		Now it's hit! the eip is: 
		@EIP 0x75b448d7: length: (5): lcall     0x000006FE
		@EIP 0x75b448dc: length: (1): push      %ebx
	Shoot! wasted one day's effort! It's the problem of dumping at a wrong offset!
	Now break on 0x75b448d7 and see the last 100 instructions.
0x804df12a 
...
0x804df184 
0x7c90eb94 
0x7c90e384 
0x75b446be  //hit many times.
..
0x75b4470a 
0x7c901005 //hit many times
...
0x7c90102b 
0x75b44710  //hit many times
..
0x75b448d6  //hit once

	[8] now use binary search to handle the following:
0x75b44710  //hit multiple times #leal -0x100(%ebp), %eax
0x75b44716 #push eax
0x75b44717 #leal 0x20(%ebp), eax
0x75b4471a #push eax
0x75b4471b #lcall 0x000006b8
0x75b44dd3-- new func 	
0x75b44dd5				# push %ebp
0x75b44dd6 				#mov %esp, %ebp
0x75b44dd8				#movl 0x8(%ebp), %ecx 
0x75b44ddb				#push %ebx 
0x75b44ddc 				#push %esi
0x75b44ddd 				#movl 0xc(%ebp), %esi
0x75b44de0				#movl 0x4(%esi), eax 
0x75b44de3 				#and 0x00FF, %eax
0x75b44de8 				#test %ecx, %ecx
0x75b44dea 				#push %edi
0x75b44deb 				#jz 0x5
0x75b44ded 				#andl 0x0, (%ecx)
0x75b44df0 				#leal 0x75b480e4, eax, 8, %edx
0x75b44df7				# movl (%edx), %ecx
0x75b44df9 				# cmp %edx, %ecx
0x75b44dfb 				# jz 0x1a
0x75b44dfd 				# movl 0x4(esi), %edi
0x75b44e00				# leal -0x10(%ecx), %eax 
0x75b44e03 				#cmpl %edi, 0x1c(%eax)
0x75b44e06				#jnz %0x000009 
0x75b44e08 				#movl 0x18(%eax), %ebp
0x75b44e0b 				#cmpl (%esi), %ebx
0x75b44e0d 				#jz 0x11
0x75b44e1e 				#movl 0x8(%ebp), %ecx
0x75b44e21 				#test %ecx, %ecx
0x75b44e23 				#jz 0xF4
0x75b44e25  //multiple times #movl 0x20(%eax), %edx
0x75b44e28 			#movl %edx, (%ecx)
0x75b44e2a 			#jmp 0xFFFFFFED
0x75b44e17			#pop %edi 
0x75b44e18			#pop %esi 
0x75b44e19			#pop %esp 
0x75b44e1a			#pop %ebp 
0x75b44e1b			#ret 0x0008 
0x75b44720  //multiple times # mov %eax, %edi 
0x75b44722					# test %edi, %edi 
0x75b44724  // multiple times #jnz 0x155
0x75b44879  // multiple times  	# cmp 0x1, %esi			*****
0x75b4487d  // multiple times	#jz 0x11f (jmp not taken) ****
0x75b44883  //hit once			#cmp 0x6, %si
0x75b44887  //hit once 			#jnz 0x4f  (jmp taken)
0x75b448d6  //hit once			#


//next job: try to find out the meaning of the code

	8:00AM 03/16/2014
	[5] try to figure out the logic at 0x75b44879, print memory first
	0x75b44879: 66 83 fe 01 0f 84 19 01 00 00 66 83 fe 06 75 4d 
	Use windbg command:
		s  -b 80000000 Lf0000000 66 83 fe 01 0f 84 19 01 00 00 66 83 fe 06 75 4d 
		s  -b b0000000 L40000000 66 83 fe 01 0f 84 19 01 00 00 66 83 fe 06 75 4d 

	Did not find it

		Then search for 75b44722
		s -b 80000000 L10000000  85 ff 0f 85 4f 01 00 00 53

		Then search for 0x75b44e17 too many
	Then search for 75b44e1e

8:30AM 03/18/2014
	[6] search for 0x75b44710
	10:30AM tried all combinations, did not work. strangely
	[7] another attempt:
		in WinDbg type: !process 0 0
			find csrss.exe
		Then attach to the process using .process command
		search again. Does not work.
	[8] read help file of "s" command, found that unless using "L?", the search
		range will not exceed 256MB which is 0x10000000.
	[9] To search full address range:
		s 0x00000000 L?0xffffffff 66 8e fe 06
	[10] try the process limit again.
		This time: use command .process /i proc_id
		the "/i" forces the target to run and break in the specified process.
		now use the s 0x00000000 L?0xffffffff 66 8e fe 06 to search for the entire addr space

		!!!! *****************************************************************************
		found similar address at cmp 0x6, %si (0x75b44883) located at address 0x75b4474c!!!!
		However, it is not listed by the lm command! 
		Use !address 0x75b4474c to find it out.
			did not find any useful information
		*** .reload /f /v (enforce immediate load; now all the modules are loaded)
		******* IMPORTANT LESSON ************* ##############!!!!!!!!!!!!!
			.reload /f /v
			s 0x00000000 L?0xffffffff 66 8e fe 06

		!!!! *****************************************************************************
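
	Sketch of what a byte-pattern search like "s -b" amounts to: a linear scan that
	does not depend on where the module is mapped. This is only an illustration in C,
	not how WinDbg is implemented; find_bytes is a made-up helper. The test data is
	the 16-byte dump from step [5].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first occurrence of pat in buf, or -1. */
static long find_bytes(const uint8_t *buf, size_t buf_len,
                       const uint8_t *pat, size_t pat_len)
{
    assert(buf != NULL && pat != NULL);
    if (pat_len == 0 || pat_len > buf_len) {
        return -1;
    }
    for (size_t i = 0; i + pat_len <= buf_len; i++) {
        size_t j = 0;
        while (j < pat_len && buf[i + j] == pat[j]) {
            j++;
        }
        if (j == pat_len) {
            return (long)i;   /* first match wins */
        }
    }
    return -1;
}
```

	In the dump from [5], the bytes 66 83 fe 06 sit 10 bytes after 0x75b44879, i.e. at
	0x75b44883, which is the cmp 0x6, %si instruction.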

	[11] search for instruction: 0x75b44710, it is also located on address 0x75b44710 in
		windbg.

	[12] Now use WinDbg to study the code again.
0x75b44710  //hit multiple times #leal -0x100(%ebp), %eax
0x75b44716 #push eax
0x75b44717 #leal 0x20(%ebp), eax
0x75b4471a #push eax
0x75b4471b #lcall 0x000006b8 #//CsrLocateThreadByClientID
0x75b44dd3-- new func 	
0x75b44dd5				# push %ebp
0x75b44dd6 				#mov %esp, %ebp
0x75b44dd8				#movl 0x8(%ebp), %ecx 
0x75b44ddb				#push %ebx 
0x75b44ddc 				#push %esi
0x75b44ddd 				#movl 0xc(%ebp), %esi
0x75b44de0				#movl 0x4(%esi), eax 
0x75b44de3 				#and 0x00FF, %eax
0x75b44de8 				#test %ecx, %ecx
0x75b44dea 				#push %edi
0x75b44deb 				#jz 0x5
0x75b44ded 				#andl 0x0, (%ecx)
0x75b44df0 				#leal 0x75b480e4, eax, 8, %edx
0x75b44df7				# movl (%edx), %ecx
0x75b44df9 				# cmp %edx, %ecx
0x75b44dfb 				# jz 0x1a
0x75b44dfd 				# movl 0x4(esi), %edi
0x75b44e00				# leal -0x10(%ecx), %eax 
0x75b44e03 				#cmpl %edi, 0x1c(%eax)
0x75b44e06				#jnz %0x000009 
0x75b44e08 				#movl 0x18(%eax), %ebp
0x75b44e0b 				#cmpl (%esi), %ebx
0x75b44e0d 				#jz 0x11
0x75b44e1e 				#movl 0x8(%ebp), %ecx
0x75b44e21 				#test %ecx, %ecx
0x75b44e23 				#jz 0xF4
0x75b44e25  //multiple times #movl 0x20(%eax), %edx
0x75b44e28 			#movl %edx, (%ecx)
0x75b44e2a 			#jmp 0xFFFFFFED
0x75b44e17			#pop %edi 
0x75b44e18			#pop %esi 
0x75b44e19			#pop %esp 
0x75b44e1a			#pop %ebp 
0x75b44e1b			#ret 0x0008 
----------------------- # return from CsrLocateThreadByClientID
0x75b44720  //multiple times # mov %eax, %edi  # esi value always remains unchanged, mostly 1
												#sometimes 6
0x75b44722					# test %edi, %edi  #edi is something like 0x171938
												#looks like thread id
									#type: PCSR_THREAD
0x75b44724  // multiple times #jnz 0x155 (jmp) //# in WinDbg it is to jump 0x263 bytes away
0x75b44879  // multiple times  	# cmp 0x1, %esi			***** //
0x75b4487d  // multiple times	#jz 0x11f (jmp not taken) ****
0x75b44883  //hit once			#cmp 0x6, %si
0x75b44887  //hit once 			#jnz 0x4f  (jmp taken)
0x75b448d6  //hit once			#

// CONCLUSION -------------------
	[1] entire function is CSRSRV!CsrApiRequestThread !!!
	[2]  so 0x75b44710 corresponds to the following !!! the if branch!!!
	//the following is from reactOS.
		CsrThread = CsrLocateThreadByClientId(&CsrProcess,
                                              &ReceiveMsg.Header.ClientId);
		#esi must be the LPCMessage
        /* Did we find a thread? */
        if (!CsrThread) //0x75b44724
        { 
            /* This wasn't a CSR Thread, release lock */
            CsrReleaseProcessLock();

            /* If this was an exception, handle it */
            if (MessageType == LPC_EXCEPTION) 
		...
		}

		//--> 0x75b44879
		if (MessageType != LPC_REQUEST) //LPC_REQUEST is defined as 1, cmp 0x1, esi
        {   //--> 0x75b4487d
            /* It's not an API, check if the client died */
            if (MessageType == LPC_CLIENT_DIED) //LPC_CLIENT_DIED is 6 // 0x75b4488e cmp 0x6, %si
            {
		/* Now we reply to the dying client */
                    ReplyPort = CsrThread->Process->ClientPort;

                    /* Reference the thread */
                    CsrLockedReferenceThread(CsrThread);

                    /* Destroy the thread in the API Message */
                    CsrDestroyThread(&ReceiveMsg.Header.ClientId);

                    /* Check if the thread was actually ourselves */
                    if (CsrProcess->ThreadCount == 1)
                    {
                        /* Kill the process manually here */
             ******           CsrDestroyProcess(&CsrThread->ClientId, 0); ***** 
				So here, it kills the CsrThread *****
                    }

                    /* Remove our extra reference */
                    CsrLockedDereferenceThread(CsrThread);


	12:30PM
	[8]
		Study the logic of NTSTATUS NTAPI CsrApiRequestThread 	( 	IN PVOID  	Parameter	) 	
	It seems to have some relation with timeout.
		The function itself is responsible for receiving user threads and handle their request.
		It basically serves the client requests (API calls).
		There is a branch for when the message is LPC_CLIENT_DIED: it checks if this is
	the only thread of the process and, if so, destroys the process.
	
		The question for us is: why does CSRSS.exe kill itself?

	8:45AM 03/19/2014
	[9] break on the branch to CsrDestroyProcess and check signal 6.
		[1] qemu side: b ops_sse.h:2480
		[2] windbg side: 
			!process 0 0 
			.process /i the PROCESS_ID of csrss.exe
			g (to run to the process)
			.reload /f /v (reload symbols force)
			ba e1 0x75b448d6 (this is where the thread-count check and process termination will go)
			Observation:
				0x75b448d6 is NEVER invoked in windbg, instead
				0x75b44883 is hit many times
			It's si = 7 (LPC_EXCEPTION) that triggers 0x75b448d6. So there could
	be something wrong initiated from notepad.exe and caused csrss.exe to kill itself.
	The pseudo code from reactos is listed below
	//-----------------------------------------------------------------
	if (MessageType == LPC_EXCEPTION)
            {
                /* Kill the process */
				//***** NOTE HERE, it's terminating the CSR PROCESS!!!! ******
                NtTerminateProcess(CsrProcess->ProcessHandle, STATUS_ABANDONED);

                /* Destroy it from CSR */
                CsrDestroyProcess(&ReceiveMsg.Header.ClientId, STATUS_ABANDONED);

                /* Return a Debug Message */
                DebugMessage = (PDBGKM_MSG)&ReceiveMsg;
                DebugMessage->ReturnedStatus = DBG_CONTINUE;
                ReplyMsg = &ReceiveMsg;
                ReplyPort = CsrApiPort;

                /* Remove our extra reference */
                CsrDereferenceThread(CsrThread);
            }
    //-------------------------------------------------------------------------
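
	The three-way check observed at 0x75b44879 / 0x75b44883 can be summarised as the
	sketch below. The numeric values 1, 6 and 7 are the ones noted in this log; the
	enum and function names are made up for illustration and are not the CSRSS or
	ReactOS source.

```c
#include <assert.h>

enum {
    TOY_LPC_REQUEST     = 1,   /* cmp 0x1, %esi at 0x75b44879 */
    TOY_LPC_CLIENT_DIED = 6,   /* cmp 0x6, %si  at 0x75b44883 */
    TOY_LPC_EXCEPTION   = 7    /* the value seen when 0x75b448d6 is reached */
};

typedef enum {
    PATH_API_REQUEST,          /* normal client API call */
    PATH_CLIENT_DIED,          /* clean up the dead client's thread */
    PATH_EXCEPTION_TERMINATE,  /* NtTerminateProcess + CsrDestroyProcess */
    PATH_OTHER
} ToyPath;

/* Maps a message type to the branch it would take. */
static ToyPath toy_classify_message(int message_type)
{
    assert(message_type >= 0);
    if (message_type == TOY_LPC_REQUEST) {
        return PATH_API_REQUEST;
    }
    if (message_type == TOY_LPC_CLIENT_DIED) {
        return PATH_CLIENT_DIED;
    }
    if (message_type == TOY_LPC_EXCEPTION) {
        return PATH_EXCEPTION_TERMINATE;
    }
    return PATH_OTHER;
}
```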


10:00AM
	[10] Modify the system so that we can trace the last 100 instructions of notepad.exe.
		[10.1] check if the pid (cr3) is the same for trace:
			f3a5000, f45e000
		It looks like that the addr is always 0x0fxxxxxx
		[10.2] change the appending of instruction
		[10.3] observation:
				0xbf8e4e6d 
				....
				0xbf8e4e6b 
			after csrss.exe died, the notepad.exe is still running.
		[10.4] set a breakpoint at the death of csrss.exe and see
			what's the last instruction of notepad.exe (see whether they are triggering
			anything).
			observation: 0x804dbc61 ... 0x804dbf60 (looks like context switch)
			Running the second time returns the last last array of instructions.
			Not enough for making a prediction.
	10:45AM
		[10.5] enlarge the array of instructions to 2000 and then make a dump. [15 min]
			Still in kernel mode
	11:00AM
		[10.6] modify the adding instruction mode, do not add 0x80xxx instruction. [15 min]
			observation: first run:
				0xf849ac6d 
				0xf849ac6e 
				0xf849ac6f 
				0xf849ac70 
			---
				0xf81a37a4 
				0xf81a37a5 
				0xf81a33ec 
				0xf81a33ed 
			---
				0xf812d413 
				0xf812d414 
				0xf812d415 
				0xf812d416 
			Every time it's different.
		[10.7] try to remove 0xf8 range and see what's the last instruction.
				0x7c9011ab 
				0x7c9011ac 
				0x7c9011ad 
			--- 2nd time same. 3rd time also same.
		[10.8]
			Study the code in VM.
				0x7c80a68d 
				0x7c80a68f 
				0x7c80a690 
				0x7c80a691 
				0x7745aad8  #jmp COMCTL.7745aadd
				0x7745aadd  #pop ebp 
				0x7745aade  #ret 4 
				0x773d40e6 
				0x773d40e9 
				0x773d40ea  #jmp 773d40ee
				0x773d40ee 
				0x773d40ef  #pop
				0x773d40f0 
				0x773d40f1  #ret 4
				0x773d4255 
				0x773d4257 
				0x773d4258 
				0x773d4259 #ret 4 
				0x773d42d4 
				0x773d42e1 
				0x773d42e2 #ret c 
				0x7c9011a7 # mov %esi->%esp
				0x7c9011a9 
				0x7c9011aa 
				0x7c9011ab 
				0x7c9011ac 
				0x7c9011ad #ret 10. 
		SHOULD RETURN TO 0x7c91CBAB (or maybe others)
		Note program entry: 0x0100739D. check if it's hit

		[10.9] set a conditional bp ON 0X7C9011AD and see what's the next.
			it's hit 10 times (1st run), which is significantly more than expected
			verified it's always hit 10 times.
			The next one is 0x804e1f25 (looks like an illegal addr complaint)

7:50PM 03/19/2014
		[10.10] break on the last conditional bp and check if there are any interrupts
			b on ops_sse.h:2482 and ignore it 9 times
			then when it's hit, break on helper_trace2
			break on helper_trace_mem
			break on helper_raise_interrupt
			break on helper_raise_exception
	--> helper_trace_mem is hit first, access: 0x773d1c2c (read)
	Then helper_trace2 is hit (eip: 0x804e1f25) --> why no raise_interrupt
		It did not hit ldl_mem and directly goes to cpu_x86_exec [means MMU hit]
	check the other hits. It seems that the env->ESP_VAL_BEFORE suddenly jumps 
	from 0x7fa14 to 0x773d1c2c.
		Then it invokes cpu_loop_exit, which triggers a long jump and we are not 
	able to trace

		Check who is setting the ESP to 0x773d1c2c.

	So it's 7c9011a9. --> mov %esi -> %esp causing problem.
	Maybe it's the ESP value problem. check later.
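
	The "last N instructions" recording used throughout step [10] can be sketched as a
	ring buffer keyed on CR3. This is a stand-alone illustration, not the code in
	ops_sse.h: EipTrace and the trace_* functions are made-up names, and treating
	every address at or above 0x80000000 as kernel (which covers both the 0x80xxx and
	the 0xf8xxx filters above) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAP 2000u          /* enlarged from 100 in [10.5] */

typedef struct {
    uint32_t eip[TRACE_CAP];
    size_t   next;               /* slot the next entry goes into */
    size_t   count;              /* number of valid entries */
    uint32_t cr3;                /* only this address space is recorded */
} EipTrace;

static void trace_init(EipTrace *t, uint32_t cr3)
{
    assert(t != NULL);
    t->next = 0;
    t->count = 0;
    t->cr3 = cr3;
}

/* Records eip if it belongs to the traced process and is user mode.
 * Returns 1 when recorded, 0 when filtered out. */
static int trace_add(EipTrace *t, uint32_t cr3, uint32_t eip)
{
    if (cr3 != t->cr3 || eip >= 0x80000000u) {
        return 0;
    }
    t->eip[t->next] = eip;
    t->next = (t->next + 1) % TRACE_CAP;
    if (t->count < TRACE_CAP) {
        t->count++;
    }
    return 1;
}

/* Returns the i-th oldest recorded eip (0 is the oldest). */
static uint32_t trace_get(const EipTrace *t, size_t i)
{
    assert(i < t->count);
    size_t start = (t->next + TRACE_CAP - t->count) % TRACE_CAP;
    return t->eip[(start + i) % TRACE_CAP];
}
```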

10:00AM 03/20/2014
	[11] continue to figure out the problem
		[11.1] analyze the exact location of the error [15 min] verify 3 times.
			___th hit of 0x7c9011ad (10th)
			__ th hit of 0x7c80a68d (1st)
		[11.2] check the ESP is the same [1 hr]
			Starting from 0x7c80a68d and check/display ESP value of each instruction [20 min]
			It always context switch in between	
			Context Switch happens at the jmp instruction at 0x7745aad8 (jmp 5)
			set a conditional bp at 0x7745aadd (too slow change code)
				and similar for all context switches
			Verified: upto 0x7c9011a7 the ESP value is always correct. It's the
				value of ESI (which assigned to esp) causes problem.
			esi is from:
		[11.3] repeat the error, conditional break on 0x7c9011a7 
			__ th hit causes the problem. (10th time)
		[11.4] check ESP value for each all normal before 10th
		[11.5] check who's changing ESP_VAL_BEFORE 
			[1] ignore bp on 0x7c9011a7 9 times
			[2] then set breakpoints
			[3] it is verified that ESP is indeed changed (before calling of helper_trace2).
		[11.6] check where is the esi value from
			It's set by 0x773D40E9 reading from 7F9C0 (result: 7FA04)
**** TO DO ****
		break on 0x773D40E9 and watch the content being written.
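
	The "ignore the first 9 hits, stop on the 10th" breakpoints used in [10.10] and
	[11.5] boil down to a per-address hit counter. A minimal sketch, assuming the
	check is keyed on both EIP and CR3; ToyCondBp and toy_bp_check are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t eip;      /* guest address to watch */
    uint32_t cr3;      /* address space of the traced process */
    unsigned ignore;   /* number of initial hits to skip */
    unsigned hits;     /* hits seen so far */
} ToyCondBp;

/* Returns 1 when the breakpoint should fire, 0 otherwise. */
static int toy_bp_check(ToyCondBp *bp, uint32_t cr3, uint32_t eip)
{
    assert(bp != NULL);
    if (eip != bp->eip || cr3 != bp->cr3) {
        return 0;                    /* other address or other process */
    }
    bp->hits++;
    return bp->hits > bp->ignore;    /* fire from hit number ignore+1 on */
}
```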

		
	9:00AM 03/21/2014
	[12] analyze the memory saved by 0x773D40E9 [45 min]
			set bp on 0x773D40E9 and trace into the helper_trace_mem and see
			what is the contents saved.
			[1] figure out how many calls it takes to reach the error: ONLY hit one time
			[2] check out the value
				It is reading 0x7f9c0 to esi.
				verified it is reading 0x773d1c1c.
				So the POP %esi instruction has no problem.
			The question is who is writing to 0x7f9c0 with value 0x773d1c1c?
	[13] analyze the winxp version and check who is writing to 0x7f9c0 with  [20 min]
					value 0x7fa04?
			It is caused by instruction at 0x773d40a0 push esi (first time)
			Found that esi has value 0x7fa04 from the beginning of the module is
		loaded.
	10:30AM 03/21/2014
	[14] check ESI value at 0x773d40a0 and 0x773d42b3.  [30 min]
		[14.1] in translate.c, add statements for saving esi save_reg_esp_ebp_before_instr
				then conditional bp on 773d40a0 and 773d42b3
				check the value of esi
			Found that somehow the esi value is changed between the two BPs!
			check out where it is.
				Found: the esi value is changed in kernel routine, and the latest one
		is 7c9103ae.	 (before it there is an instruction which loads esi with
		value from stack [ebp+12], it seems like a dll name]

		[14.2] check when it occurs
			set conditional bp on 7c9103b8 and ignore it and see how many times to reach
		the switch.
				hit 40 times.

8:30AM 03/22/2014
	[15] continue the research on ESI problem. [20 min]
		[15.1] research what is the use of 7c9103ae. set breakpoint on 773d42b3 (entry)
				--> how many times hit 7c9103ab (change esi) --> 773d40a0 (push esi) write
				to memory)	 
		Observation: 7c9103ab is hit many times before reaching 773d40a0, it can be 
			pointing anywhere before it has the value 0x7FA04 that reaches 773d40a0.
				1st hit: 0x7F7A0 (WindowsShell.manifest)
				2nd hit: same
				3rd hit: same
				4th hit: 0xA3CF8 (global path)
				5th hit: 0x7F1D8 (WindowsShell.manifest)
				6th hit: 0x773D1C1C (Desktop/Control Panel)
				7th hit: 0x773D1C00 (SmoothScroll)
				8th hit: 0x773D1B88 (Microsoft\Current Version\ ...)
				9th hit: 0x773D1B60 (EnableBalloonTips)
				10th hit: 0x773D29A0 (Microsoft\...CurrentVersion...)
				---------------> 0X7C3D40A0 (now ESI has value 0x7FA04)
		[15.2] in winxp: start from 773d42b3 (entry) to 0x773d40a0 and see how 7c9103ae is called.
			1st hit by 0x773d42cf (it hits all of them) and then reach 773d40a0
				--> 773d4211 -> invoked 6 times and the value of esi is recovered to 0x7fa04
				--> 773d4234, 773d4239 (4 calls) -> 
					invoked another 4 times of 773d03ae and esi is recovered to 0x7fa04
				--> 773d4250 --> calls 0x773d40a0
10:00AM
		[15.3] break on 773d4211, 773d4234, 773d4239 and see what is the value of esi before them
				[20 min]
				773d4211: 7fa04
				773d4234: --> 0x773d1c1c, 7f9dc
				773d4239: 
		[15.4] break on 0x7c9103ae and see if it matches [15.1]
			1st: 7F7A0, ESP: 7f454  xp_ESP: 7f454
			2nd: 7c97ce28: ESP: 7f044  xp_ESP:  7f044
			3rd: 7c97ce28, ESP: 7f03c xp_ESP: 7f03c
			4th: a49f8, ESP: 7f130 xp_ESP: 7f130
			5th: 7f1d8, ESP: 7f0b4 xp_ESP: 7f0b4
	***
			--0x773d4225, ESI: 7fa04  ESP:7f9d8   , xp_ESP:  7f9d8
			6th: 773d1c1c, ESP: 7f754 xp_ESP: 7f768 (now different) !!!
			7th: 773d1c00, ESP: 7f750 xp_ESP: 7f764
			8th: 773d1b88, ESP: 7f754 xp_ESP: 7f768
			9th: 773d1b60, ESP: 75750 xp_ESP: 7f764
		--> recovered to 773d1c1c (did not recover to 0x7FA0C)
			-- 0x773d4234, ESI: 773d1c1c, ESP: 7f9dc, xp_ESP: 7f9dc
			-- 0x773d4239
			10th: 773d1c1c, 7f9dc
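
	Comparing the ESP column against the xp_ESP column in [15.4] is a "first
	divergence" search over two recorded sequences. A small sketch of that
	comparison; first_divergence is a made-up helper, and the test data is the
	ESP / xp_ESP pairs listed above in the order they were logged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first position where a and b differ,
 * or -1 if the first n entries are identical. */
static long first_divergence(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

	On the logged data the first mismatch is the 6th hit of 0x7c9103ae, and from
	there on the two stacks are 0x14 bytes apart (7f754 vs 7f768, 7f750 vs 7f764).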
11:30AM 
	[15.5] study why the esp difference starting from 0x773d4225 (call)
			Found that four calls related to registry key open and registration.
			The first one at 773d6ff4 leads to the 6th call of 773d0a1e
			bp on 773d6ff4 and check data
			0x773d6ff4, ESI: 77d48f75 ESP: 7f79c   xp_ESP: 7f79c xp_ESI: 774d8f75
			0x77dd6b2f, ESI: 0x7c4 ESP: 7f760 xp_ESP, 7f774 xp_ESI:  0x40 (strange: it's never hit)


9:00AM 03/24/2014
	[16] Use binary search to study from where 773d6ff4 to 77dd6b2f, the esp starts to change.
		[16.1] First set breakpoint on 773d42b3 (entry)
				--> how many times hit 7c9103ab (change esi) --> 773d40a0 (push esi) write
				to memory)	 
			when the first point is hit, set BP at 773d6ff4, and 77dd6b2f
			0x773d6ff4, ESI: 77d48f75 ESP: 7f79c   xp_ESP: 7f79c xp_ESI: 774d8f75
---> there is a context switch
		strangely 0x773d6ff4 is hit a second time
						ESI: 77d48f75, ESP: 7f788
			0x77dd6b2f, ESI: 0x7c4 ESP: 7f760 xp_ESP, 7f774 xp_ESI:  0x40 (strange: it's never hit)

		[16.2] figure out why a context switch will reset the ESP value, why isn't it
				reset to 7f79c?
			add a new branch (eip_in & 0xFF000000) == 0x77000000
			Verified: it's not the problem of context switch

		[16.3] continue with the helper_trace2
				Strangely it hits: 773d6fe2 (not the regOpenReg function)
				2nd time it's hit it jumps to 77dd6a78
				after 0x773d4211
			0x773d6ff4: jumps to 0x773d6fdc (not right)

		[16.4] check again. First break on 773d42b3 , 773d4211, then 773d6ff4 and then
				773d6a78 (the 1st instr)
			--> it hits: 773d6fdc
			It triggers a context switch when doing memory read.
			check which address it is doing: 
9:30AM 03/25/2014
		[17] check again the instruction at 773d6ff4
			[17.1] check the instruction
				[1] bp on 773d42b3, 773d4211 first
				[2] then bp on 773d6ff4 (call openRegistryxxx)
			observation: gets a core dump when print_instrRange at 0x773d6ff4.
			is not successful at printing the instruction.
			[17.2] read the memory contents at the target address
			[17.3] trace into helper_trace2 of the instruction and see how it
				ends up without the calling of helper_trace2. 
		Found the problem: in helper_trace2 when it tries to perform memory read, it fails
	because it triggers a page fault. When it reads 12 bytes, it is
	crossing the boundary of the page and the check did not catch that. So copying 16 bytes
	is quite a dangerous operation that leads to memory read failure.
		[18] try to fix the above problem:
			in the loop which tries to copy 15 bytes, check if the current address
		is on a page boundary (that is, addr & 0x00000FFF == 0: the last 12 bits are 0 for a 4KB
		page); if so, perform a second check on va_to_ha, and if that fails, stop.
			It now works.
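
	A stand-alone sketch of the fix in [18]: copy up to 15 instruction bytes, but
	translate again whenever the copy steps onto a new 4KB page, and stop if that
	page is not mapped. The toy page table and toy_va_to_ha below are placeholders
	for the real va_to_ha, whose signature is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_MASK 0xFFFu     /* low 12 bits: offset within a 4KB page */

typedef struct {
    uint32_t       va_page;      /* page-aligned guest virtual address */
    const uint8_t *host_page;    /* where that page lives on the host */
} ToyMapping;

typedef struct {
    const ToyMapping *maps;
    size_t            count;
} ToyPageTable;

/* Placeholder translation: returns a host pointer or NULL if unmapped. */
static const uint8_t *toy_va_to_ha(const ToyPageTable *pt, uint32_t va)
{
    for (size_t i = 0; i < pt->count; i++) {
        if (pt->maps[i].va_page == (va & ~TOY_PAGE_MASK)) {
            return pt->maps[i].host_page + (va & TOY_PAGE_MASK);
        }
    }
    return NULL;
}

/* Copies at most max_len bytes starting at va; returns how many bytes
 * were copied before hitting an unmapped page. */
static size_t copy_instr_bytes(const ToyPageTable *pt, uint32_t va,
                               uint8_t *dst, size_t max_len)
{
    assert(pt != NULL && dst != NULL);
    const uint8_t *src = NULL;
    size_t n = 0;
    while (n < max_len) {
        uint32_t cur = va + (uint32_t)n;
        /* First byte, or first byte of a new page: translate again. */
        if (n == 0 || (cur & TOY_PAGE_MASK) == 0) {
            src = toy_va_to_ha(pt, cur);
            if (src == NULL) {
                break;           /* next page not mapped: stop, no fault */
            }
        }
        dst[n++] = *src++;
    }
    return n;
}
```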
		[19] needs to check how it works for instruction 773d6ffe  The page is now located here.
		[20] to do: 
			BREAK ON ops_sse.h:2644 and check every possible case.
			Notepad case successfully solved.
			


9:00AM 03/26/2014	
--------------------------------------------------------------------------------------
Task 221: test notepad.exe
--------------------------------------------------------------------------------------
	[1] run batch analysis with large time interval and see how it works.
		It wrongly cleared all tasks.
	[2] try add a parameter to clearTaskList(int upperLimitCount) and pass 1 in. [20 min]
	Continue to next task directly

9:45AM 03/28/2014
--------------------------------------------------------------------------------------
Task 222: fix the problem of frozen analyze task.
--------------------------------------------------------------------------------------
	[1] check the sequence of events fired. 
		the list of events are defined in
			//BatchAnalyzer::execBatchBranchSlice
			this->addTask(new taskChangeJobCategory(job, logger, Job::GEN_RAW_TRACE, 1));
			this->genTasksForGenRawTrace(job);
			this->addTask(new taskSaveTraces(job, logger));
			this->addTask(new taskChangeJobCategory(job, logger, Job::GEN_FULL_TRACE, 1));
			this->genTasksForFullTrace(job);
	[2] run and see how it works.
		It's frozen, never jump from full trace to the slice job.
		It seems to be actually ok.
		Found that full_trace is slowed by tsHasMultipleWrites
	[3] try enlarge block size to 64MB
		found that over limit
			change both to 16M entries per block does not work. 8mb does not.
			try 4MB.

9:00AM 03/29/2014
--------------------------------------------------------------------------------------
Task 223: improve the performance of loadBlock by adding a cache.
--------------------------------------------------------------------------------------
	[0] recompile and get a good setting. [60 min]
	[1] observe the performance of loadBlock and analyze when it is called most often. [15 min]
		Observation: it is switching back and forth too many times between two blocks.
	[2] design idea: maybe in the Cache set up a backup block. When loading, switch.
	[3] need to redesign Cache:
		[1] create an array of two blocks, for every attribute keep two copies, and declare
			an active_block_idx
		[2] for loadBlock() if the alternative block idx is the same as the id, then
			just point to the other; if not, then load the alternative block and set it to
			the other.
		[3] for saveCurrentBlock() still do the same
		[4] check all the others
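
	The two-block design in [3] can be sketched as below (in C for brevity; the real
	Cache class is not shown in this log). Only the bookkeeping is modelled: the disk
	read and the save of a dirty block are left as a comment. TwoSlotCache and the
	cache_* names are made up.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BLOCK (-1L)

typedef struct {
    long block_id[2];    /* which disk block each slot currently holds */
    int  active;         /* index of the slot in use (active_block_idx) */
    long load_calls;     /* every loadBlock request */
    long actual_loads;   /* requests that really had to read from disk */
} TwoSlotCache;

static void cache_init(TwoSlotCache *c)
{
    assert(c != NULL);
    c->block_id[0] = NO_BLOCK;
    c->block_id[1] = NO_BLOCK;
    c->active = 0;
    c->load_calls = 0;
    c->actual_loads = 0;
}

/* Makes block id the active block, reading it only when neither slot has it. */
static void cache_load_block(TwoSlotCache *c, long id)
{
    assert(c != NULL && id >= 0);
    c->load_calls++;
    if (c->block_id[c->active] == id) {
        return;                          /* already active */
    }
    int other = 1 - c->active;
    if (c->block_id[other] != id) {
        /* A real cache would save the evicted slot if it is dirty and
         * then read block id from disk into this slot. */
        c->block_id[other] = id;
        c->actual_loads++;
    }
    c->active = other;                   /* just switch slots */
}
```

	With this, ping-ponging between two blocks costs two disk reads in total instead
	of one per switch.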
11:00AM
	[4] implementation. 
		[1] change all data definitions of cache.h [20 min]
		[2] correct syntax error in each of the functions. [90 min]
			[1] createCache  DONE.
			[2] loadCache  DONE
			[3] ~Cache DONE
			[5] appendRecord DONE
			[7] updateRecord DONE
			[8] updateLastBlockIndexSize  [DONE]

9:45AM 03/30/2014
			-------------------
			[9] saveCurrentBlockToDisk [DONE]
10:15AM
			[10] saveToDisk [DONE]
				[10.1] refactor saveCurrentBlockToDisk to saveBlockToDisk(i) [15 min]
				[10.2] refactor saveCurrentBlockToDisk [5 min]
				[10.3] saveToDisk, depending on the order. [20 min]
			[6] retrieveRecord DONE
10:45AM
			[9] loadBlock
				[9.1] refactor loadBlockInto [15 min]
				[9.2] redo loadBlock [15 min]


9:15AM 04/01/2014
	[5] unit testing
		[1] destructor problem [10 min]			 
		[2] testLoadCache [15 min] 
9:40AM
		[3] appendCache [15 min]
		[4] loadBlock again. [15 min]
		[5] loadBlock issue 2 load rec2 line number: 69
10:30AM
		[6] dependLink error. [25 min]
		check saveCache...
7:50PM
		[7] fix issues in loadBlock  [30 min]
		[8] bad alloc problem [30 min]

10:00AM 04/02/2014
		[9] fix the problem of 10000 records????	
			fixed. It's 
	cleared all unit test.

	-- to do: the fwrite at line 230 can be improved

9:30AM 04/03/2014
	[10] test the entire thing on qemu_image. Problem. segmentation fault with 
		Cache::loadBlockInto.
		Fixed: it's because the request mode set to 0.

9:00AM 04/04/2014
	[11] exec job end too early problem. solved, the copy time is too short.
10:30AM
	[12] should show a progress bar of the full trace analysis [20 min]
	[13] check how the full trace is generated. Still very slow though. Found that
		2 chunks of block is still not enough. May consider multiple buffers.
		Observation: after 39% it's slowed down a lot.
		48% - 11:22AM
		55% - 11:35AM
		62% - 11:45AM
		77% - 12:11PM
		82% - 12:26PM
		87% - 12:35PM
About 7% per 10 minutes.
9:00AM 04/05/2014
	[14]  add reports of loadBlockIntoIDs and saveBlock invocation times. [20 min]
	[15] handle SIGFPE, Arithmetic exception. FIXED.
	[16] try to dump the ratio of load and save blocks. everything ok before 33%.
	[17] mysterious problem of cannot printing out Cache::loadTimes! Clean and recompile.

9:00AM 04/06/2014
	[18] fix the ratio of load and save blocks. found the problem, it's the first
		%d literal caused problem. (because it should actually be %lld)
		Now the stats:
		---progress: 37%, loadBlock: 34991, saveBlock: 2, load/save: 11663 ----
		---progress: 38%, loadBlock: 35889, saveBlock: 2, load/save: 11963 ----
		---progress: 39%, loadBlock: 36868, saveBlock: 3, load/save: 9217 ----
		---progress: 46%, loadBlock: 43915, saveBlock: 3, load/save: 10978 ----
	[19] add another attribute called actual load.
		---progress: 38%, loadBlock: 35589, saveBlock: 2, actualLoad: 250, load/save: 11863, load/actLoad: 141 ----
		---progress: 39%, loadBlock: 36299, saveBlock: 2, actualLoad: 310, load/save: 12099, load/actLoad: 116 ----
		...
		---progress: 48%, loadBlock: 45691, saveBlock: 3, actualLoad: 1237, load/save: 11422, load/actLoad: 36 ----
		---progress: 49%, loadBlock: 49115, saveBlock: 3, actualLoad: 1296, load/save: 12278, load/actLoad: 37 ----

		---progress: 55%, loadBlock: 70037, saveBlock: 4, actualLoad: 2517, load/save: 14007, load/actLoad: 27 ----
		---progress: 56%, loadBlock: 71713, saveBlock: 4, actualLoad: 2630, load/save: 14342, load/actLoad: 27 ----

		---progress: 65%, loadBlock: 86129, saveBlock: 4, actualLoad: 3627, load/save: 17225, load/actLoad: 23 ----
		---progress: 66%, loadBlock: 91041, saveBlock: 5, actualLoad: 4090, load/save: 15173, load/actLoad: 22 ----

	Observation: from 65% to 66%, there are about 5000 loads and 500 actual loads. Actual ratio is 10:1
	1% costs about 1 minute.
	500 actual loads is about 200MB * 500 = 100GB of reading. That's a lot.

	At 77% to 78% it's roughly the same ratio.
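	Worked check of the observation above: a small sketch that takes two progress
	samples and computes the per-step deltas. Sample, Delta, between and readVolumeMB
	are hypothetical names; the 200MB per actual load is the figure quoted in the log.

```cpp
#include <cassert>

// Two counters taken from one "---progress" line.
struct Sample { long long loadBlock; long long actualLoad; };

// Difference between two consecutive samples.
struct Delta { long long loads; long long actualLoads; double ratio; };

Delta between(const Sample& a, const Sample& b) {
    assert(b.loadBlock >= a.loadBlock && b.actualLoad >= a.actualLoad);
    Delta d;
    d.loads = b.loadBlock - a.loadBlock;
    d.actualLoads = b.actualLoad - a.actualLoad;
    // Ratio of requested loads to loads that really hit the disk.
    d.ratio = d.actualLoads ? static_cast<double>(d.loads) / d.actualLoads : 0.0;
    return d;
}

// Estimated read volume in MB, assuming every actual load reads mbPerLoad MB.
long long readVolumeMB(long long actualLoads, long long mbPerLoad) {
    return actualLoads * mbPerLoad;
}
```

	With the 65% and 66% lines this gives 4912 loads and 463 actual loads, i.e. roughly
	the 10:1 ratio noted above.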

9:00AM 04/07/2014
	[20] try improve by adding more buffer. Make it generic [3 hrs]
		[1] createCache
		[2] loadCache
		[3] saveToDisk
		[4] appendRecord
		[5] updateRecord
		[6] updateLoastBlockIdxSize
		[7] saveBlockToDisk
		[8] saveCurrentBlocktoDisk
		[9] saveToDiskA
		[10] retrieveRecord
		[11] loadBlockInto
		[12] loadBlock
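	Sketch of the generic multi-buffer idea in [20]. This is NOT the project's Cache
	class: BlockCache, its LRU replacement policy and the in-memory "disk" are assumptions
	made only to show how the loadBlock / actualLoad / saveBlock counters relate to the
	number of buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical N-buffer block cache with least-recently-used replacement.
class BlockCache {
public:
    BlockCache(std::size_t numBuffers, std::size_t numBlocks, std::size_t blockSize)
        : capacity_(numBuffers), disk_(numBlocks, std::vector<int>(blockSize, 0)) {
        assert(numBuffers > 0);
    }

    // Returns the in-memory copy of block `id`, reading it from "disk" on a miss.
    std::vector<int>& loadBlock(std::size_t id) {
        assert(id < disk_.size());
        ++loadCalls;
        for (std::list<Buffer>::iterator it = buffers_.begin(); it != buffers_.end(); ++it) {
            if (it->id == id) {
                // Hit: move the buffer to the most-recently-used position.
                buffers_.splice(buffers_.begin(), buffers_, it);
                return buffers_.front().data;
            }
        }
        if (buffers_.size() == capacity_) evictOldest();
        ++actualLoads;  // miss: one real read from the backing store
        buffers_.push_front(Buffer{id, false, disk_[id]});
        return buffers_.front().data;
    }

    // Writes one record and marks the containing buffer as modified.
    void updateRecord(std::size_t id, std::size_t offset, int value) {
        std::vector<int>& data = loadBlock(id);
        data.at(offset) = value;
        buffers_.front().modified = true;
    }

    long long loadCalls = 0, actualLoads = 0, saves = 0;

private:
    struct Buffer { std::size_t id; bool modified; std::vector<int> data; };

    // Drops the least-recently-used buffer, writing it back only if it changed.
    void evictOldest() {
        Buffer& victim = buffers_.back();
        if (victim.modified) { disk_[victim.id] = victim.data; ++saves; }
        buffers_.pop_back();
    }

    std::size_t capacity_;
    std::vector<std::vector<int> > disk_;  // stand-in for the on-disk blocks
    std::list<Buffer> buffers_;            // front = most recently used
};
```

	With a cyclic access pattern over 3 blocks, 2 buffers miss on every call while
	4 buffers miss only 3 times in total, which is the kind of cliff seen between the
	2-buffer and 4-buffer runs.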
09:30AM 04/08/2014
	[21] unit test the change
		[1] loadCache problem. [10 min]
		[2] fix arithmetic fault. [15 min]
		[2] block_size too large problem. increase process heap size.
			call malloc_stats() to check memory usage. [20 min]
				Found that in testTrace it exceeds memory usage. around 3.2GB.
				the end of it
			testAddInstr() wasted about 300MB.
		[4.5] fix the percent problem (lSize/100)
		[5] now broke on constructSampleTrace(), break at about 1.9GB. Before the call the
			heap use is about 5MB.
				A trace takes 667MB (each block takes about 256 * 500k = 112MB. So it
					has two cache blocks, one for instrStore and the other for sequence)
				When constructFullTraceFromRawTrace is called, it is 667MB
					it first creates a trace, which consumes another 667MB --> 1.32GB
						trace->instrStore creates another 336MB. --> 1.75 GB (so this should
							be freed first)
						trace->execHistory creates another 336MB --> 1.97 GB
			After the fix, reduced from 700MB to 120MB.
		[6] stack overflow error. fixed.
		[7] make ad-hoc allocation of disk block.
			[1] fix Cache::Cache only init the first block.
			[2] fix loadBlockInto
			[3] fix saveBlock
			reduced another 10MB.

09:00AM 04/09/2014
		[8] test the entire system. segmentation fault. on vpage->hpage translation.
		Could not replicate the error unfortunately.  Address it later.
			5 success, 3 fail.

	[9] performance comparison on buffer number and size.
	Comparison: 
	Buffer setting: 500k
		2 buffer:
			---progress: 48%, loadBlock: 45691, saveBlock: 3, actualLoad: 1237
			---progress: 49%, loadBlock: 49115, saveBlock: 3, actualLoad: 1296
			==> load: 3424, actual load: 59. Ratio: 58:1
		3 buffer:
			---progress: 48%, loadBlock: 47311, saveBlock: 3, actualLoad: 193
			---progress: 49%, loadBlock: 51571, saveBlock: 3, actualLoad: 242
			==> load: 4260, actual load: 49. Ratio: 86: 1
		4 buffer:
			---progress: 48%, loadBlock: 48169, saveBlock: 3, actualLoad: 9, 
			---progress: 49%, loadBlock: 52507, saveBlock: 3, actualLoad: 9, 
			==> load: 4338, actual load: 0!!!!
			---progress: 98%, loadBlock: 147447, saveBlock: 7, actualLoad: 1683
			---progress: 99%, loadBlock: 150041, saveBlock: 7, actualLoad: 1740
			==> load: 2594, actual load: 157, ratio: 16
		8 buffer:
			--progress: 98%, loadBlock: 147467, saveBlock: 7, actualLoad: 25, 
			--progress: 99%, loadBlock: 150071, saveBlock: 7, actualLoad: 25, 
			==> 0!
		Now the problem is that most slices are broken (throw exceptions).

09:15AM 04/11/2014
	[10] fix the occasional problem of memory broke.
			Could not repeat it again. strange. can now repeat.
	[11] found the problem of notepad1.exe crash: something wrong with ntdll during slicing.
10:30AM
	[11] run valgrind. (val_run.sh)
		[1] problem with task. [we'll let them leak anyway, not big]
			there is a potential that tasks deleted (get time out event called).
		[2] invalid read in saveToCache of RecordRequestProcessor. fixed.
		[3] invalid memory read in getTrace(): read of bytes already freed in
				handle_mem_write->getTrace. (solved) call the destroyInstance instead.

9:30AM 04/12/2014
		[4] seems to clear all of them. Run 1 branch slice. 
			But still got the segmentation fault. in ha_to_pa.
			set bp on taskSaveTraces which deletes TraceManager and traces
	and see how many times it is hit before the next crash.
			Verified, it is still the TraceManager destroyer problem. See when it is called.
		It is a synchronization problem. The TraceManager instance has been destroyed
	but someone else is still using it. Typical scenario:
		Thread 1: TraceManager *tm = TraceManager::getInstance();	//s1
					if(tm != NULL){					//s2
							Trace *tr = tm->getTrace(cr3);	//s3
					}
		Thread 2: TraceManager::destroyInstance();	//s4
	If s4 is executed between s1 and s3, that's no good.
		Even if it's after s3 it's no good as well, because tr is still in use.
		So actually ALL handle_mem_read, handle_mem_write, handle_instruction should
	be protected by a mutex lock, and TraceManager::destroyInstance, which is called by
	taskSaveTraces, should be handled carefully as well.
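	Sketch of the protection idea above, assuming a single lock shared by the users
	and by the destroyer. TraceManagerSketch and withInstance are hypothetical; the real
	TraceManager interface (getInstance / destroyInstance) is only mimicked here, and
	std::mutex stands in for whatever lock the project uses.

```cpp
#include <cassert>
#include <mutex>

class TraceManagerSketch {
public:
    // Runs fn on the live instance while holding the lock, so a concurrent
    // destroyInstance() cannot free it anywhere between s1 and s3.
    // Returns false when no instance exists.
    template <typename Fn>
    static bool withInstance(Fn fn) {
        std::lock_guard<std::mutex> guard(mutex());
        if (instance() == nullptr) return false;
        fn(*instance());
        return true;
    }

    static void createInstance() {
        std::lock_guard<std::mutex> guard(mutex());
        if (instance() == nullptr) instance() = new TraceManagerSketch();
    }

    // Takes the same lock as the users, so it waits until they are done.
    static void destroyInstance() {
        std::lock_guard<std::mutex> guard(mutex());
        delete instance();
        instance() = nullptr;
    }

    int tracesServed = 0;

private:
    static std::mutex& mutex() { static std::mutex m; return m; }
    static TraceManagerSketch*& instance() {
        static TraceManagerSketch* p = nullptr;
        return p;
    }
};
```

	Caveat of this design: fn must not call createInstance or destroyInstance itself,
	because std::mutex is not recursive and the thread would block on its own lock.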
		
	11:00AM	
		[5] add synchronization protection.
			[1] declare mutex lock and protect all functions in handle.cc [20 min] DONE.
			[1.5] make a double check of everything [15 min] DONE.
			[2] examine BatchAnalyzer.cc and add protections to functions that
				call TraceManager. It seems only need to put protection to
				the call of destroyInstance of TraceManager, because the others do
				not have conflict with it. (will be guaranteed to be sequential; otherwise
			the currently running qemu threads (system emulation) will race against it).
				[20 min] DONE.
			[3] test [1 hr]
				[1] Problem: it locks up the system. VERY HARD TO DEBUG.
				[2] replace all pmutex_lock to mylock and others to myunlock
				found that there are two consecutive lock.
					Confirmed 4th lock creates the problem.
					isNeedResetCR3 problem.
					Not sure who locks the guy. add a piece of code to remember the lock
					It's called by send_event has lock and then 
				[3] set recursive lock to allow same thread to own/request the same lock
					twice.
					Added in BatchAnalyzer::do_jobs
			4:30PM 
					remove the lock protection on clearTasks() in BatchAnalyzer
					It now works.
			5:00PM
				[4] run it 5 times to verify no memory crash anymore.
					1. but seems to be locked in wait_for_io
					2. 
				It now breaks at every 2nd pass.
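	Sketch of the recursive-lock step ([3] set recursive lock): the same thread takes
	the lock twice without deadlocking. std::recursive_mutex is used as a stand-in; with
	pthreads the usual equivalent is a mutex created with the PTHREAD_MUTEX_RECURSIVE
	attribute. g_traceLock, outerStep and innerStep are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

std::recursive_mutex g_traceLock;  // hypothetical global lock

// Called while the lock is already held by the same thread.
int innerStep() {
    std::lock_guard<std::recursive_mutex> guard(g_traceLock);  // second acquisition
    return 1;
}

// Takes the lock, then calls a function that takes it again.
// With a plain (non-recursive) mutex this nested acquisition would deadlock.
int outerStep() {
    std::lock_guard<std::recursive_mutex> guard(g_traceLock);  // first acquisition
    return 1 + innerStep();
}
```

	A recursive lock hides the double acquisition rather than removing it, so it may
	still be worth finding out who holds the lock first, as the log sets out to do.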
10:00AM 04/14/2014
		[6] gdb the problem.
			segmentation fault: Cache::loadBlockInto (id=0, destID=0).
			The problem is with RecordRequestProcessor::loadFromCache.
			It is called by Trace::Trace depending on the GEN_PRESERVE_REQUEST_MODE
		And it is called by TraceManager::isProcessToBeTraced.
			Strangely it is calling the constructor of Trace::Trace.
			The problem is that after the raw trace is done, the TraceManager still
		has the name of the process in the setNamesProcessToTrace.
			In addition, the emulator should not be started at all.
		Find out who calls it. (break on BatchAnalyzer::stopvm and contvm).
			No one actually calls contvm, but it is still fired.
		Still multiple places of error.

			[6.1] attempt 1: since gdb has problem with static field displaying,
		check the assembly instruction of TraceManager::myinst and find its location.
		Found the problem. After the first round, it does call the
		BatchAnalyzer::createInstace() and then helper_trace2 is called,
		it then calls isProcessToBeProcessed
				Question: is it called before loadvm?
				Fix: adjust the sequence of loadvm and initVM (which add the process to load).
				Also comment out taskContVM. Why is it needed for branch slice? comment it out.

		[7] now it complains about TraceManager is NULL after the 1st half round. DONE.
9:00AM 04/14/2014
		[8] broke again on TraceManager::isProcessToBeTraced, when it finds that
			the process is being traced
				break on taskInitTM::synch_job and BatchAnalyzer::loadvm
			Trace::Trace seems to be called at the right place.
			check how vecBlockIdx is initialized in Cache.cc
				Problem is that RecordRequestProcessor has no data at all! This crashes
			loadBlockInto. Check if RecordRequestProcessor is EVER SAVED!
				the RecordRequestProcessor is inconsistent: it reports that
			there is one record, but the block is not modified.
			[8.1] set breakpoint on RecordRequestProcessor operations and see if they
				are ever called.
				Problem is RecordRequestProcessor->cacheRRP->curBlockModified[0] 
			is modified to false when save to disk. The block is indeed written into
			the file. But it's loaded.
				Then in Trace::constructFullTraceFromRawTrace, because the job mode is
		GEN_PRESERVE_REQUEST_MODE, it initializes the rr_processor, which is not good.
		Because it should actually load it.
				Modify Trace::Trace add a condition that the job category is
		gen raw trace. Seems to be fixing the problem
			[8.2] verify if branch_slice is called ever. called
	10:15AM
			[8.3] robustness test. run it five times.
				change it to 2 branch_slice
				2 success, 1 fail. (deadlock on os_host_main_loop_wait)
7:30PM 04/14/2014
--------------------------------------------------------------------------------------
Task 224: check the generated slice for notepad.exe crash problem.
--------------------------------------------------------------------------------------
					
	[1] check the generated .exe file and see why it crashes.
			Always break at 0x7c913395/96. It's a part of stricmp function. Passed address
		is not right.
			Try to look for common starting point from the collection of return addresses
		in the xp stack.
			candidates: 7c8171b5 (does not work).
		Use WinDbg to inspect stack: candidate: 7c90eac7.
	[2] problem: not able to stop at any instruction in IMM.
	[3] try in WinDbg. 
		Use command sxe ld
		to break on any module loading, found that the last module loaded is
			winspool.drv
	[4] found the path
			7c9272e6
		->  7c91b0dc
		->  7c913396 (broke)
	[5] immunity debugger not able to stop at the breakpoint
		Have to use windbg for the faulty version

		[1] break on 7c9272e6 then break on 7c91b0dc and then check the difference
			 need to figure tomorrow.	

9:00AM 04/15/2014
	
	[6] continue analysis of IMM + WinDbg (look at the difference) [30 min]
		(use sxe ld and sxd ld to enable or disable events on module load)
		7c9272e6 -> 7c91b0dc -> 
			7c913396
		Problem is with 0x7c91b1b2 (when it calls stricmp, the 2nd parameter,
		which should be a pointer to a string, is not right: its value is 91909090, which is
		an illegal address)
	[7] Find out where the address is from [30 min]
		The following sequence are reversed
	-----------------------------CORRECT VERSION --------------------
		7c91b1b1 push edi (edi should have 775fb0ec, "msvcrt.dll")
	->  7c91b1af add edi, eax (edi should be 774e0000, eax is 0011b0ec)
	->  7c91b1a9 mov edi, [esi+18] (esi should be 001a2b58, it then loads 774e0000 from 001a2b70)
	->  7c91b102 mov esi, [ebp+10] (ebp is 0x0007e7bc)
	-----------------------------WRONG VERSION --------------------
		7c91b1b1 push edi (edi has 91909090, points to nowhere)
	->  7c91b1af add edi, eax (edi is 01000000, eax is 90909090)
	->  7c91b1a9 mov edi, [esi+18] (esi is 001a1ee0, it then loads 01000000 from 001a1ef8)
	->  7c91b102 mov esi, [ebp+10] (ebp is 0x0007fa78)

-================ Conclusion:
	=====[1] edi is the base of the module (01000000 is the base of the notepad.exe and 774e0000
			is the base of ole32.dll)
	=====[2] eax at 7c91b1af is maybe the OFFSET to a module name (import table)? just a guess??
			Clearly, notepad.exe's PE header is somehow overwritten by 9090909090.
			So, there is something wrong with the module that writes into the
			binary executable.
	11:00AM
	[8] Figure out what's the meaning of EAX. [30 min]
		[8.1] 
		At 0x7c91b197 it first call
				PVOID NTAPI RtlImageDirectoryEntryToData 	( 	
						PVOID  	BaseAddress,
						BOOLEAN  	MappedAsImage,
						USHORT  	Directory,
						PULONG  	Size 
				) 	
		==== working version
			baseAddr: 774e0000 (base of ole32.dll)
			mappedAsImage: 1, Directory: 1, size: 0007e7a0
		It retrieves OptionalHeader.DataDirectory[1] (according to ReactOS source)
		seems to be the import table, because [0] is the export table

		[8.2] At 0x7c91b19c: MOV EBX, EAX #move EAX to EBX
		Now EBX and EAX both have 0x775fb04c (the virtual address attribute of import table)
		-> it should be the import address table BEGINNING ADDRESS

		[8.3] At 0x7c91b19e: MOV EAX, [EBX+C]   #move content at 0x775fb04c+c
			Note: EBX currently points to IMAGE_IMPORT_DIRECTORY:
				long rvaImportImportTable
				long timeDateStamp
				long forwardChain
				long rvaModuleName ###!!!!
		So +0xC points to the field of rvaModuleName (relative address)
		So we can confirm that it is loading the module name at strcmpi, and this area (import
	table) is wiped by the current implementation.
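	Sketch of the layout reasoning in [8.3]. ImportDescriptorSketch is a hand-written
	stand-in for the 20-byte PE import directory entry (five 32-bit fields); the field
	names are my own, loosely following the list above. moduleNameAddress mimics the
	ADD EDI, EAX at 7c91b1af.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One entry of the PE import directory table (all fields are RVAs or 32-bit values).
struct ImportDescriptorSketch {
    std::uint32_t rvaImportLookupTable;   // +0x00
    std::uint32_t timeDateStamp;          // +0x04
    std::uint32_t forwarderChain;         // +0x08
    std::uint32_t rvaModuleName;          // +0x0C  <- what MOV EAX, [EBX+C] reads
    std::uint32_t rvaImportAddressTable;  // +0x10
};

static_assert(offsetof(ImportDescriptorSketch, rvaModuleName) == 0xC,
              "module name RVA is expected at offset 0xC");
static_assert(sizeof(ImportDescriptorSketch) == 20, "entry is expected to be 20 bytes");

// Address of the module name string: image base plus the name RVA.
// The addition wraps modulo 2^32, like the 32-bit ADD in the loader.
std::uint32_t moduleNameAddress(std::uint32_t imageBase, const ImportDescriptorSketch& d) {
    return imageBase + d.rvaModuleName;
}
```

	With the values from the log, base 774e0000 and RVA 0011b0ec give 775fb0ec, while
	base 01000000 and a wiped RVA of 90909090 give the bad pointer 91909090.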

	11:30AM 
		[8.4] observe the PE header at 0x01000000 [30 min]
			Both version: import table address and size correct: 7604
			Observe:
			01007604: correct version has the right information (+12 is the rvaModuleName)
			01007604: WRONG version: has all wiped out starting from 0x01007410 (all wiped
				with 0x90909090)!!!!
		Found the bug now!
		[8.5] found that 90909090 starts from 0x01001000. START ADDR!!!! OF .text section!
		So the current algorithm is WRONG! It overwrites the DATA stored in the .text section.
	7:30PM
		[8.6] read the original logic [20 min]
			it's done in clearAllSections.
			Call graph below:
				binWriter::writeDataSlice -> clearAllSections
			read the writeSOC logic:
				calls writePartrialTrace, which writes instruction in InstrInfo one by one.
	8:00PM
		[8.7] implement the conservative algorithm [1 hr]
			[1] change declaration of the function and make it compile [10 min] DONE.
			[2] change the algorithm [15 min] DONE
			[3] unit testing [10 min]
			[4] testing [30 min]
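	The log does not spell out the conservative algorithm, so the following is only one
	possible reading: instead of filling the whole .text section with 0x90, overwrite
	only byte ranges that the trace saw executed as instructions and that are not in the
	slice. Everything else (e.g. an import table stored inside .text) stays untouched.
	InstrSpan and clearConservatively are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One executed instruction, as an offset/length inside the section buffer.
struct InstrSpan { std::size_t offset; std::size_t length; bool inSlice; };

// NOP out executed instructions that are not part of the slice.
// Bytes never observed as code are left as they are, since they may be data.
void clearConservatively(std::vector<std::uint8_t>& text,
                         const std::vector<InstrSpan>& executed) {
    for (std::size_t k = 0; k < executed.size(); ++k) {
        const InstrSpan& in = executed[k];
        if (in.inSlice) continue;
        assert(in.offset + in.length <= text.size());
        for (std::size_t i = 0; i < in.length; ++i) text[in.offset + i] = 0x90;
    }
}
```

	The trade-off under this reading: code that was never executed in the trace also
	survives, so the slice may keep more bytes than strictly necessary.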

	[9] problem again: crash on creating exit point.

08:30AM 04/17/2014
	[10]		check how it is used to insert 
		Problem 1: 
		jmp size: 5 > branch size: 2!
Failed in writing program exit. Removing directory: /home/samba/smbuser/slice_jobs/job_notepad/branch_slices/notepad.exe/brc_22!!!!!!!!!!!!----- 

		Problem 2: 
		cannot find hole for 40-byte jump.
Failed in writing program exit. Removing directory: /home/samba/smbuser/slice_jobs/job_notepad/branch_slices/notepad.exe/brc_27!!!!!!!!!!!!-----

		Problem 3:
Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 13, ts: 4033348, eip: 1007155!
	Breakpoint 2, Util::error_exit (fmtstr=0xb7c3b930 "Could not locate SOC for ts: %lld\n")

	[12] fix problem 2 first
		ts=4038671
	[11] fix problem 1 and 2.
		Attempt: instead of searching for '\x90' search for both \x90 and \x00.
		not sure of the effects yet
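	Sketch of the hole search in the attempt above: find the first run of at least
	minHoleSize bytes that are all 0x90 or 0x00. findHole is a hypothetical name and the
	flat byte vector is an assumption; the real search works on sections of the binary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offset of the first run of minHoleSize filler bytes
// (0x90 or 0x00), or -1 if no such run exists.
long findHole(const std::vector<std::uint8_t>& bytes, std::size_t minHoleSize) {
    assert(minHoleSize > 0);
    std::size_t run = 0;  // length of the filler run ending at the current byte
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        bool filler = (bytes[i] == 0x90 || bytes[i] == 0x00);
        run = filler ? run + 1 : 0;
        if (run == minHoleSize) return static_cast<long>(i + 1 - minHoleSize);
    }
    return -1;
}
```

	Treating 0x00 as free space is a heuristic: zero bytes can also be live data, which
	is the same kind of risk as the import table wipe found earlier.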


9:00AM 04/18/2014
--------------------------------------------------------------------------------------
Task 225: fix the SOC not found problem
--------------------------------------------------------------------------------------
	[1] fix problem 2 first
		ts=4038671
	[2] set a breakpoint at branch_slice and set the ts to 4038671 , yes, can repeat the error
9:15AM
	[3] error analysis: [20 min]
		tsStart = 4033356
		tsEnd = 4062616
		maxTS = 4062616
		It broke because tsEnd is equal to maxTS
		Think about the semantics of maxTS. 
	[4] check source code at SOCmanager::195 [15 min]
10:00AM
	[5] make the change to SOCmanager and test it [20 min]
		Worked!
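	The actual change to SOCmanager is not recorded here. As an illustration only, one
	common way a "tsEnd equal to maxTS" case gets lost is a half-open range check; the
	sketch below uses an inclusive upper bound instead. Soc and locateSoc are
	hypothetical names, and the range is the one from [3].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A sequence of code (SOC) covering the timestamps tsStart..tsEnd, both inclusive.
struct Soc { long long tsStart; long long tsEnd; };

// Returns the index of the SOC containing ts, or -1 if none does.
// Using ts < tsEnd here would miss the very last timestamp of the trace.
int locateSoc(const std::vector<Soc>& socs, long long ts) {
    for (std::size_t i = 0; i < socs.size(); ++i) {
        if (ts >= socs[i].tsStart && ts <= socs[i].tsEnd) return static_cast<int>(i);
    }
    return -1;
}
```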
10:40AM


10:00AM 04/16/2014
--------------------------------------------------------------------------------------
Task 226: run check on notepad.exe
--------------------------------------------------------------------------------------
	Some minor exceptions, verified that it does not distinguish between the
	environment. Around 80 branches collected.
			

10:00AM 04/18/2014
--------------------------------------------------------------------------------------
Task 227: fix jump size problem
--------------------------------------------------------------------------------------
					
	[1] find out where it complains about cannot find jump size. [20 min]
		set breakpoints on 4 locations of binWriter.cc about complaining size of jump.
		[a] slice 22 
		captured jump size > branch size (5>2)
		ts: tsBranch=4033592
		[b] slice 45
		jump size > branch size (5 >2)
		ts: tsBranch=4027658
		[c] slice 46
		jump size > branch size  (5>2)
		ts: tsBranch=4033594

		It finally broke at Trace::~Trace(), CallAdjustRecordProcessor broke

8:30AM
	[2] solve the jump size problem. [15 min]
	Observation: main problem is a 2byte JE instruction, check tsBranch 4033592 again. 

9:15AM
	[3] solution: develop a second chance function getImmediateHole(int bytes) - usually
		five. Design [15 min]
9:30AM
	[4] implementation
			[1] create binWriter.cc::static int findHoleImmediate(Trace *trace, 
				int fid, int fidSrc, vector<sectionInfo*>* vecSection, 
				unsigned int eipStartSearch, int minHoleSize, unsigned int* holeStartEIP);
				//start the search at next immediate instruction of eipStartSearch [5 min] DONE.
			[2] add function getSectionContainAddr, modify addrInAnySection [15 min].  DONE
			[3] read byte one by one, if \x90 or \x00, make the hole.
				if not, call trace->getInstrInfoID and then load the InstrInfo.
				check if the instruction is in slice. If no, then expand the hole and 
				update the logic [20 min]
	FAILED the solution does not work. The next immediate instruction is included 
		in slice. (its eip is 0x01003543). Check why it's included in slice.
			[3] unit test [20 min]
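	Background for the "jump size 5 > branch size 2" complaint: on x86 a near JMP with
	a 32-bit displacement is opcode E9 plus 4 bytes, while a short JE is opcode 74 plus
	1 byte, so the former cannot overwrite the latter in place. A minimal sketch;
	encodeJmpRel32 and fitsInPlace are hypothetical helpers and the addresses used with
	them below are illustrative, not taken from the trace.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

const std::size_t kJmpRel32Size = 5;  // E9 + 4-byte displacement
const std::size_t kJccShortSize = 2;  // e.g. JE rel8: 74 + 1-byte displacement

// Encodes "JMP to" placed at address "from". The displacement is relative to
// the address of the instruction following the jump (from + 5), little-endian.
std::array<std::uint8_t, 5> encodeJmpRel32(std::uint32_t from, std::uint32_t to) {
    std::uint32_t rel = to - (from + static_cast<std::uint32_t>(kJmpRel32Size));
    std::array<std::uint8_t, 5> code = {{
        0xE9,
        static_cast<std::uint8_t>(rel & 0xFF),
        static_cast<std::uint8_t>((rel >> 8) & 0xFF),
        static_cast<std::uint8_t>((rel >> 16) & 0xFF),
        static_cast<std::uint8_t>((rel >> 24) & 0xFF)
    }};
    return code;
}

// True when the replacement jump can be written over the original branch.
bool fitsInPlace(std::size_t jumpSize, std::size_t branchSize) {
    return jumpSize <= branchSize;
}
```

	fitsInPlace(kJmpRel32Size, kJccShortSize) is false, which is why a nearby hole of
	at least 5 bytes is needed for the exit jump.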

9:00AM 04/21/2014
	[5] check into case 4033592 and see why the instruction is already in slice right after
		the jz. set a conditional bp on InstrInfo::setInSlice (0x01003543).
		It's caused by SOCEnd.
	09:20AM
	[6] check why it's listed as SOCEnd:
		Since the tsSeed is a branch instruction, it's identified as a transfer control
	instruction and cannot be SOCEnd. So the next instruction is listed as
	SOCEnd.
	[7] check what quick trick we can make: check tsReversePointer.
	[8] implementation: DONE.
	[9] still not working, tsReversePointer is set a second time to a positive value.
		It is included because the subsequent one has a register dependency on it.
			now the SOC becomes:
				4027335, 4033595
		Another idea: check countInSlice.
		It would not work. It will be counted in slice.
	[10] there seems to be a bug to be fixed: countInSlice.
3:00PM 04/22/2014
	[11] re-run the program with 4033592 again.
		set bp on Trace::gen_slice_for_branch and then set bp on binWriter::writeProgramExit.
		it broke because of the 2nd instruction after it.
		Check in winxp image. (addr is the next instr after  0x01003543)
		It's a JE instruction and is depended on by 4033157 (which is an earlier timestamp)
10:00AM 04/23/2014
	[12] check timestamp 4033592 (0x01003543) in winxp. [20 min]
		Strangely still found that it is visited before the next je
	[13] re-run the program and check 4033592 again.
		Set bp to check how is 01003541, 01003543, 01003544, 01003546 included 
		Found that they are added because of TYPE_OF_ALL_FUNCTION.
		So need to make an exception for that case.
9:00AM 4/24/2014
	[14] add the TYPE_ALL_FUNCTION in findHoleImmediate() [20 min]
		ts: 4033592 (0x01003543)
	[15] fix the problem of exceeding boundary 01003554
		Now timestamp 4033592 works.
	[16] test timestamps
		ts: tsBranch=4027658 
		ts: tsBranch=4033594 works
	[17] run these three branches.
		Found that 4033592 branch fails (throws exception).
		
		
10:00AM 04/24/2014
--------------------------------------------------------------------------------------
Task 228: fix exception throwing branches
--------------------------------------------------------------------------------------
	[1] observe Branch: 4033592
		broke at 0x0100750c. -> 0100297b --> 0x010049b6
9:15AM 04/25/2014 continue the analysis
	Start two instances and compare
		0100750c same
		0100297b - different. There is a shift of 4 bytes in ESP.
		010049b6 - completely different sets of parameters passed.
	So the question is: what causes the 4-byte shift in ESP?

	[2] comparative study: 
		Difference occurs at the call at 0x0100750c (call entry)
		The problem occurs at 0x0100239b, the calculation of the bridge connection is
	not accurate

	[3] try run the algorithm twice and see what's the difference. Manual run.
		ts: 4033592 (0x01003543) {set REQUEST mode to 1 first}
	[4] Problem: when it finishes generate_branch, rr_processor destroyer is not called.
		fixed the save trace problem. VERIFIED it's still the same problem
		of ESP adjust size.
	So now the problem: bridge adjustment of ESP size is not right (adjusted 4 bytes more).
Trace into where it is from.

9:00AM 04/26/2014
	[5] set a breakpoint at the writing to 0100239b (add ESP, -28).
		problem: did not capture it
	[6] set a breakpoint at  binWriter.
		found the bridge at 0x0100293b.
	[7] set another bp on tsBridge to capture the generating process and compare.

9:00AM 04/27/2014
	[1] check the tsBridge conditional BP and then do the same and compare ESP value.
		Bridge starts at 0x0100293b. (nextSOC eip: 0x01002940)
		ESP_BEFORE_next_SOC: 7fef4
		EBP_BEFORE_next_SOC: 7ff1c
		ESP_AFTER_cur_SOC: 7ff1c
		EBP_AFTER_cur_SOC: 7ff1c

	WinXP: 
		ESP_BEFORE_next_SOC: 7fef4 
		EBP_BEFORE_next_SOC: 7ff1c  
		ESP_AFTER_cur_SOC: 7ff1c
		EBP_AFTER_cur_SOC: 7ff1c

	Error version:
		ESP_AFTER_cur_SOC: 7ff20
		EBP_AFTER_cur_SOC: 7ff20
	The problem: 0x01002938 (push ebp) is not put in slice.

	[2] check why 0x01002938 is not included in slice. It is depended by 0x0100293b
	Trace from 0x01002939.
		bp captured 0x01002939 (MOV EBP, ESP)
		Note that both ier->isNeededForReg() and Mem() are false.
		The soc (2947617 -> 2947639)
			It has a link (ESP link) but it is skipped because it is
		not a RETURN instruction. But it is an SOC end.

	[3] new problem is with 0x01004568 (ESP messed). use the same method to debug it.
		It's the 0x01004567 messed up.
12:30pm 04/29/2014
	[4] set bp at binWriter::gen_bridge
		(gdb) p/x esp_after_soc
		$8 = 0x7fee0 (matches xp)
		(gdb) p/x ebp_after_soc
		$9 = 0x7ff1c (matches xp)
		(gdb) p/x esp_before (next tsTarget: eip: 0x01004577)
		$10 = 0x7fe2c (matches)
		(gdb) p/x ebp_before
		$11 = 0x7fedc (matches)
		Found two problems:
			(a) assemble ebp: should call assemble EBP
			(b) assemble ESP error:
					the ADD ESP, 0xFFFFFF4C is wrong
		Strangely, it's working again. Fix the ebp problem.
		Generate slice for 4033592 again.
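	Worked check of the bridge arithmetic, assuming the bridge has to move ESP from its
	value after the current SOC to its value before the next SOC with a single
	ADD ESP, imm32. espAdjustImm is a hypothetical helper. Under that assumption the gdb
	values above (0x7fee0 -> 0x7fe2c) give exactly 0xFFFFFF4C, i.e. -0xB4.

```cpp
#include <cstdint>

// Immediate for "ADD ESP, imm32" that turns espAfterCurSoc into espBeforeNextSoc.
// Unsigned subtraction wraps modulo 2^32, which yields the two's complement
// encoding of a negative adjustment.
std::uint32_t espAdjustImm(std::uint32_t espAfterCurSoc, std::uint32_t espBeforeNextSoc) {
    return espBeforeNextSoc - espAfterCurSoc;
}
```

	The same helper reproduces the earlier case: 0x7ff1c -> 0x7fef4 needs -0x28, while
	the error version starting from 0x7ff20 would need -0x2C, four bytes more.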

	[5] still got signal 0x502. This time broke at 0x010049b6.
		esp value is not the same.

		Problem analysis:
			SOC end does not have ESP dependent link, so it did not
	propagate to the instruction at 0x0100457a.

	[6] fix idea:
		[1] set a boolean variable which records if the instruction has
			ESP or EBP propagated.
		[2] create a function propagateEBPESP(int esp=1, tsStart, tsEnd)
				backward search the latestESP modification instructions
				and mark it as ESP_DELAY
			call functions: find the one changes ESP. 
9:15AM 04/30/2014
		[3] Implementation:
			[1] create Trace::findTsWithoutESPAfter. Simulate the implementation of
				findTsWithESPAfter, make it also work for ebp. [15 min] DONE.
			[2] in Trace::full_slice set two boolean variables which record if an instruction
				has ESP or EBP propagated [10 min] DONE.
9:00AM 05/01/2014
			[3] create Trace::propagateESPEBP(int esp=1, tsStart, tsEnd) which propagates
				the ESP/EBP link [20 min] DONE
9:30AM
			[4] test over 4033592 case, eip: 0x010049b6. Find the soc first. [30 min]
				[1] capture the SOC of 0x010049b6 [5 min]
					Problem: it did not capture 0x010049b6
				[2] try break on 878 and 881.  
			[5] broke at 0x0100750c--> 0x0100297b.
					Still cannot capture the real code.
			[5] check the slice generated, still got 502 error.

9:00AM 05/02/2014
	[6] check the new 502 error.
			01007509 -> 	01004592
			the problem is with the instruction 0x0100457b, one PUSH instruction is
	not taken (it's located at 0x0100457a and it is skipped in the slice)
			[6.1] break on 0x0100457b and check its dependency. It should be
				SOCEnd. advance 876 after the BP is found
			[6.2] the problem seems to be tsFindESPNotEqual.
				trace again.
				Still got the 502 error. Check again.

			Problem: 0x0100462e (test call given input "c:\windows\notepad.exe" return 1 in EAX)			
			[6.3] trace into above problem.	
				Problem is with 0x01004500 registerClassW has different returns.
					Then it differs at 0x77D4AE44A
					it's because 0x77D4A566 (differs on ESI value).
				 caused 77D4A550 (before the call all params seem to be the same)

			Why did registerClassW fail? if we place the file in win (verified it's not
				the location problem).

		Summary: sequence of error:
			0x01007509 -> 0x01004592 -> 0x0100462e --> 0x01004550 --> 0x77d4a550 (RegisterClassExW)


9:30AM 05/03/2014
	There seems to be something wrong with RegisterClassExW (check if the information is passed
	right)
	[7] check in winxp about 0x77d4a550.
			Windbg does not provide symbols, has to analyze manually.
			Correct Version: the third word (WNDPROC   lpfnWndProc;) is not null: 0x01003429
			Problematic Version: the third word is 0x00000000

		Hit too many times. Need to follow the chain of
			The payload is located at 0x6fdf0 (the 3rd field is at 0x6fdf8)
			0x01007509 -> 0x01004592 -> 0x0100462e -> then set the hw breakpoint.
			Found that at 0x0100462e it is still not there.

		Found the problem: the instruction at 0x01004539  which pushes the parameter
			to the stack (lpfnWndProc 0x01003429) is SKIPped!
				It is copied to 0x6fb58 at 0x77d4a485 (it is also included in wrong version)
					Then the content at 0x6fb58 is cleared at 0x77dd72ef
			When 77d4a550 (RegisterClassExW) is called, it's the copied data at 0x6fb50 that is passed!!!

		Now the question is why isn't the data dependency being produced in the trace alg?

	[8] Find out the timestamp which actually reads the information.
		Figure out any of the following are hit multiple times.
		0x01007509, 0x01004592, 0x0100462e, 0x01004550
			ALL hit once.
		Then set a bp on 77d4a550 (how many times is it hit)?
			It is called three times AFTER 0x01004550 is hit.

		[8.1] Trace algorithm debug:  [15 min]
			set  bp at 0x01007509, 0x01004592, 0x100462e, and 0x01004550
			set bp at 0x77d4a550 (and see how many times it's hit)

			seed ts: 4033592	
			0x01004550 timestamp: 3730860
			0x77d4a550 hit times:  only 1 time before 0x01004550 ts:3732503
			the corresponding 0x7c90eb8d (syscall) must be located at 3732507

		[8.2] check the dependency of  3732507: its eip is 0x804df184.
				only one dependency on 3732503.	
		[8.3] check dependency of 3732506: eip: 7c90eb8b
				found that it has 22 dependency links.
				Problem: its eip should be 0x7c90eb8d?
				link 1: esp_link: 3732505
				link 2: type 12: 3732500 (TYPE_MEM_LINK_ADV)
				link 3: type 12: 3732498
				but none of TYPE_MEM_LINK_ADV is actually processed. The problem is bDataProgatation is true. BECAUSE it's set to true earlier at the beginning!

	3:45PM 05/03/2014
	[9] debug into timestamp 3732506: eip: 7c90eb8b (it should be 7c90eb8d) first of all
		[1] two problems: why eip mismatch
		[2] why bNoDataProgation?
		Solve problem [2] first.
		For the instruction, isNeededForReg() and isNeededForMem() are both
		false; that's the reason data propagation is not allowed. This seems
		wrong for a kernel service. Think about the case of registerClass. It does not
		generate any memory writes, and then it's going to cause a problem (when passed a null
		pointer). Another problem: there should actually be a register dependency, because
		EAX is being read.

		[2.1] check why isNeededForReg is not called. Trace from timestamp 3732512
		Problem: CONTEXT SWITCH INSTRUCTION IS NOT MARKED REGISTER DEPENDENCY.
			They change EAX, EBX, ECX, EDX, ESP registers. Should mark all these registers.
			3732514:  0x77d4a563, MOV EAX, EBX (EBX->EAX)
				deplink: 2 (reg link on ebx, control link: 3732513)
			3732513: 0x77d4a55d (JE 77d6fa25)
				deplink: reg link: 3732511
			3732512: 0x77d4a55a (MOVZ SI, EAX)
				deplink: reg link: 3732503 (but it should be 3732507): guess original version is
					EAX return is 0 and nothing is changed so there is no dependency on it.
			3732511: 0x77d4a558 (TEST EBX, EBX)
				deplink: reg link: reg link on 3732510
			3732510: 0x77d4a555 (MOV EBX, [EBX+14])
				deplink: 3 
					reglink: 3732423 (out dated, should be depend on sysenter instruction)
					memlink: 3730852
					control link: 3732509
			3732509: 0x77d49835 (RET 1C)
				deplink: 3
					reglink: 3 ESP link: 3732508 (ok)
					memlink: 3732502 (take the return addr from ebp+4) ok.
					control link: 3732508
			3732508: 7c90eb94 (RET)
				deplink: 3
					ESPLINK: 3732505
					memlink: 3732505 (call pushes ret)
					control link: 3732507
			*** 3732507: 0x804df184 sysexit - (*** SHOULD NOT BE RECORDED!!! ***
				deplink:  reg link on 3732503 [does not look right]
			*** 3732506: 0x7c90eb8b  (missing 0x7c90eb8d!!!!)
				deplink:  22 of them!
					ESP link: 3732505
					mem adv link: 3732500 ...
			3732505: 0x77d49833 (call [edx])
					ESP link: 3732502
					reg link: 3732504
			3732504: 0x77d4982e (mov edx, 0x77fe0300)
					 no dep link
			3732503: 0x77d49829 (mov eax, 11e8)
					control link: 3732502
			3732502: 0x77d4a550 (call 77d49829)
				
			
	Summary:
		[1] problem that caused the miss of code: all bMultipleWrites && code 0x7c90eb8d (
				or 0x7c90eb8b) should be marked as propagating memory (needs all memory
				contents).
		[2] issue 1: during the trace generation, 0x7c90eb8d should be recorded 
			instead of 0x7c90eb8b
		[3] issue 2: during trace generation, sysexit should not be recorded
		[4] issue 3: eax, ebx, ecx, edx, esp should be recorded as updated 

	Decision fix: [1] first.

	9:00AM 05/04/2014
	[1] NEW ISSUES AFTER [1] is fixed, still not working
		check these points. 0x01007509, 0x01004592, 0x0100462e, 0x01004550
		Problem is that the push instruction is still not included in slice.
	[2] trace into  3732506 and check each dep link
		memlink: 3732505, 3732500, 3732498, 3732497, 3732496, 3732495, 3732493, 
			3732456, 3732446, 3732490, 3732487, 3732477, 3731039, 3731040, 3731052,
			3731068, 3731049, 3731048, 3731050 ...
		For ts: 3732495, eip: 77d4a536
		For ts: 3731039, eip: 77d4a5b3
		For ts: ... there are many layers of info forwarding, not sure which one.
	[3] check how these first-layer dependencies are handled:
		problem: isTsInSliceWriteTo the tsToMrM has no record at all. Because no one
	is actually reading from the memory that is WRITTEN by the ts.

	Now fixed.


9:00AM 05/04/2014
--------------------------------------------------------------------------------------
Task 229: trace collection problem.
--------------------------------------------------------------------------------------
	[1] think about the change of adding all memory dependencies of a syscall/sysenter.
		think about a case that a printf is never needed. the syscall might
	write to some memory slots that the others depend on (kernel memory).
	These will be skipped. Otherwise, there will be no dependency to the printf.
	Should be ok.

	[2] issue 1: during the trace generation, 0x7c90eb8d should be recorded 
			instead of 0x7c90eb8b
		Fix Trace::checkRecordStatus about context switch. bRecordEnable
			still should be true, delay one step.
	[3] issue 2: during trace generation, sysexit should not be recorded
		Fix Trace::checkRecordStatus about context switch. bRecordEnable
			still should be false.
	[4] issue 3: eax, ebx, ecx, edx, esp should be recorded as updated 
		in InstrExecRecorder::expandFromRaw, if it's sysenter, update the
			this->writeRegAccessMap.
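	Sketch of issue 3: when the instruction is sysenter, treat EAX, EBX, ECX, EDX and
	ESP as written, in addition to whatever the decoder reports. The bitset
	representation, the Reg enum and writtenRegs are assumptions; the real code updates
	this->writeRegAccessMap, whose type is not shown in this log.

```cpp
#include <bitset>

// Hypothetical register indices for the sketch.
enum Reg { R_EAX, R_EBX, R_ECX, R_EDX, R_ESI, R_EDI, R_EBP, R_ESP, NUM_REGS };
typedef std::bitset<NUM_REGS> RegSet;

// Returns the set of registers to record as updated by one instruction.
// For sysenter the kernel may have changed EAX, EBX, ECX, EDX and ESP by the
// time control returns to user code, so all five are marked.
RegSet writtenRegs(bool isSysenter, RegSet decodedWrites) {
    if (isSysenter) {
        decodedWrites.set(R_EAX);
        decodedWrites.set(R_EBX);
        decodedWrites.set(R_ECX);
        decodedWrites.set(R_EDX);
        decodedWrites.set(R_ESP);
    }
    return decodedWrites;
}
```

	Marking the registers as written makes later reads (such as the read of EAX after
	the call) depend on the sysenter instead of on an outdated earlier write.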

	8:45PM 05/06/2014
	[5] implementation plan: address issue 1 and 2 in Trace::checkRecordStatus.
		[5.1] trace into the logic
			Problem: at 0x7c90eb8d the bRecordEnabled is disabled (it should be delayed one
				step).

	11:00AM 05/10/2014
		[5.1] 0x7c90eb8d fixed.
		[5.2] check how the sysexit is handled. break on Trace.cc:1885
			Seems to be ok. verify again.
			it's handled correctly.
		[5.3] verify issue 1 and 2. break at the Trace::gen_slice_for ...
			set a breakpoint at eip 0x7c90eb8d and see how many links it has.
			[1] check the problem that the system receives segmentation fault.
				cache->vecBlockIdx[] break. It's RecordRequestProcessor::loadFromCache broke.
				Still got the segmentation fault. even change the mode. -- stuck at this point.
				In full trace mode, it is to load. But load fails.
			[2] check if RecordRequestProcessor is ever saved.
				Verified.

	11:30AM 05/12/2014
				Need to enable save trace::rr_processor in BatchAnalyzer.cc
				Fixed
			[3] now verify if 0x7c90eb8d has the links DONE. working!
			[4] verify sysexit is not recorded.
				check if there is any instruction recorded for 0x804df184 or 0x80xxxxxx.
				Problem: sysexit not handled yet.

	8:30AM 05/11/2014
		[5.4] check the handling of sysexit. [30 min]
			Set a breakpoint on 0x804df184 and 0x804d1f25 in Trace::handle_instr
		and checkRecords. Found the error, did not delay one step.
		[5.5] Fix and check again:  [30 min]
			Observation: still not removing 0x80000000 range instructions completely
			because 0x7c90eb94 returns to 0x8000000 range instruction.
		[5.6] check the event of interrupt how it is handled [30 min]
			search for INT_EVT
				seg_helper.c:1219
				trace again on 0x7c90eb94.
				interrupt is INDEED raised by 0x7c90eb94.
				Trace::handle_interrupt sets the bJustReceivedInterrupt.
			It seems that 0x7c90eb94 generates a page fault and CPU jumps to
			0x806ecd34.
				it seems to be handling it right, ALBEIT the context switch
			should not have one step delay.
	10:30AM 
		[5.7] fix the extra one step delay in sudden context switch for page faults. [30 min]
			debugging: check 0x7c90eb8d case and the other switch case from 0x7c90eb94.
		[5.8] verify if the fix is working. Seems to work, no 0x80XXXXYY instructions are
			hit.
	12:15PM
	[6] fix the register issuance.
		break on InstrExecRecorder.cc:244
	[7] generate the slices.

9:00AM 05/14/2014
	[8] run all the slices in regular mode. Perfect! no exceptions at all! and some do
		pop up notepad.
		copy the file for diff test.
	[9] run in debug mode. 
		difference: none!
	PROBLEM SOLVED.


9:10AM 05/14/2014
--------------------------------------------------------------------------------------
Task 330: compare the slices for test_vm job
--------------------------------------------------------------------------------------
	test_vm is:  Themida
	[1] problem: error cannot find instruction at 0x0
	change record time to 2 minutes and see how it works.
		It breaks at the first call to gen_slice_for_branch

7:00PM 05/16/2014
	[2] debug into the problem.
		Problem: generate the 1st branch
			-> final step: writedataSlice
			write the last soc: 385751->385751
				write instruction 0x604000
				file_offset 2105856
			the problem is that the eip is located at 0x604000
			which is the last byte of the 3rd section.
	[3] found the problem: 0x604000 may be an instruction which is completely
		out of the range.
			Copy the file and analyze how many sections are there.
		Found that: there are two sections listed by Qemu:
			0x401000 (startInMem), size 0x20000
			0x423000 (start in Mem), size: 0x1e1000

		But reported in IMM: there is only one section: 0x400000, 0x205000 which includes the
		range. Check the PE header.  (0x205000 is reported by the size of image).
			There are at least 5 sections listed, but only 3 sections are listed
		in QEMU.
			Check binWriter::getExecutableSections.

	[4] finally the bug: it is the binWriter::writeSOC. it should NOT use vec[1] in soc list.
11:30AM 05/17/2014 

	[5] new problem: crash at verifyAllNOPS. 2nd visit of line 95 of binWriter.cc crashes
	the program.
		Problem is 0x42494e(it's regarded as NOP).
		[5.1] check in xp. Verified it's a 0xbb. The question is
			why is it written?
		[5.2] check why binWriter::writeInstruction is called and why does it
		call verifyAllNops.
			The problem is that the instruction written into the file
		does not match the one in disk.
		(gdb) x/5bx toWrite
		0x6f96dfe3:     0xba    0x00    0x30    0x02    0xf0 (mov edx, ...)
			ins @42494e: mov       edx, 0xF0023000
		(gdb) x/5bx verifBuf
		0x85d4e3f4:     0xbb    0x01    0x31    0x03    0xf1 (mov ebx, f1033101)

		[5.3] check tomorrow what is at 0x42494e (set a hardware breakpoint).
			Verified: it's run-time code extraction.
		[5.4] Discussion: should we enforce writing it?
			[1] possibility 1: it could be the source and make the decoded instruction actually
				different.
			[2] possibility 2: keep it and let it extracted. The problem is that
				program exit does not work.
10:20AM 05/18/2014.
		[5.5] Check possibility 1, enforce writing. [30 min]
			It does not produce an executable file.
		[5.6] try possibility 2.
			Still got exception 0x80000004 at 0x424959
			In the original program, it's already changed to extracted code; but
		in the sliced program, it's still not touched.

			So the problem is that THE SLICING ALGORITHM DOES NOT DEAL WITH
		THE SELF-PACKING ALGORITHM!!!


11:00AM 05/20/2014
--------------------------------------------------------------------------------------
Task 331: handle self-extraction in slicing algorithm
--------------------------------------------------------------------------------------
	[1] algorithm design: idea: 
		Whenever handling an instruction add a memory dependency reference on the last
	write. This can be added when generating the full trace.
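A minimal sketch of the "dependency on the last write" idea, assuming a per-byte map from address to the timestamp of the most recent writer. The class and method names (LastWriteTracker, recordWrite, writersOfInstruction) are hypothetical and are not the ones used in Trace or moc_mem_access; the real trace keeps its records on disk rather than in a hash map.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>

using Timestamp = long long;

// Hypothetical helper: remembers, for every byte, which timestamp wrote it last.
class LastWriteTracker {
public:
    // Record that the instruction executed at `ts` wrote `size` bytes at `addr`.
    void recordWrite(std::uint32_t addr, std::uint32_t size, Timestamp ts) {
        assert(size > 0);
        for (std::uint32_t i = 0; i < size; ++i) {
            lastWriter_[addr + i] = ts;
        }
    }

    // Timestamps of the most recent writers of the bytes [eip, eip + len).
    // An empty result means the instruction bytes were never written during
    // the trace, i.e. they still come from the original image.
    std::set<Timestamp> writersOfInstruction(std::uint32_t eip, std::uint32_t len) const {
        std::set<Timestamp> deps;
        for (std::uint32_t i = 0; i < len; ++i) {
            auto it = lastWriter_.find(eip + i);
            if (it != lastWriter_.end()) {
                deps.insert(it->second);
            }
        }
        return deps;
    }

private:
    std::unordered_map<std::uint32_t, Timestamp> lastWriter_;
};
```

When an instruction is added to the full trace, each timestamp returned by writersOfInstruction would become a memory link, so the slicer pulls in the unpacking code that produced the instruction bytes.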

7:00pm
	[2] Implementation: in moc_mem_access,  add the memory link.
		It still crashes. The problem is control flow.
8:45AM 05/21/2014
	[3] verify the self-extraction is working. [VERIFIED]
		[1] change back to 20 seconds version and see if it is working.
			There are occasional deadlocks. Check it later.
		[2] in InstrExecRecorder set a conditional bp and check if 0x424959 propagates to
			other instructions. 
			Problem: for one instruction at 0x424959, it has established over 5 links.
			timestamp: 1038921, 1038990, 1038994, 1039998, 1039002
			eip: 0x424945,   0x424946, 0x424946, 0x424946, 0x424946
		So 0x424946 is hit multiple times.
			But it should be actually the instruction at 0x424943 which writes into
			these instructions. (verified, the problem is caused by 2 instructions
			shifting in recording). So it is 0x424943 who writes to the same instruction
			(different location), the recording is actually right.
10:30AM 
	[4] check the generated slice again. [15 min] Does not work
	[5] problem: the program hangs.
			It is blocked in os_main_wait_loop, when requesting for
		i/o lock.
	[6] solve i/o lock problem: set a boolean variable bTerminated as global and
		check it before acquiring the lock
		[6.1] check the normal running of os_main_wait_loop found that there is a global
		variable called qemu_shutdown_requested
11:30PM
		[6.2] in BatchAnalyzer::stopvm set the qemu_shutdown_requested to 1;
			and in loadvm set the shutdown_requested to 0. [15 min]
			Actually just call qemu_system_reset_request. [not working]
			[1] add a global variable bVMTerminated. [5 min]
			[2] change implementation of qemu_shutdown_requested and check the bVMTerminated [5 min]
			[3] set the variable in loadvm and stopvm of BatchAnalyzer [10 min]
12:00PM
			[4] test: it seems that the thread of init thread is running. It's just very slow.
	It seems that the threads are having a live lock. The mutex lock check is locking
	the system up.
		Displaying the lock, it shows that thread 10014 is the owner of the lock 
	thread 10014 waits for a lock for mem_write (mylock) the lock is held by
    thread 10031 which does the analysis of slicing.

			[5] set a breakpoint on the thread of Trace::gen_slice (mylock).
				get its thread id first.
				[1] b BatchAnalyzer.cc:491
				[1.5] pthread_create
				[2] get the thread number
				[3] b mylock thread thread_id
			Found that problem is caused by mylock of thread. The task of slicing and
		generate full trace is a long running thread.
			** test try to disable mylock in send_event
			unable to catch it.
			[6] test it again. Question: would disabling mylock in send_event cause
	synchronization errors? (one thread doing read/write mem but the trace/traceManager
	is deleted?)

9:00AM 05/22/2014
				[6.1] new problem: failed because of time out. Guess it's the full-trace
					timeout too short. Redo.
				[6.2] it turns out to be gen raw trace, execNextTask crashes the entire thing.
					The problem turns out to be TraceManager::destroyInstance.
				The problem is caused by the termination of the main thread.
				So, we should actually remove the is_shutdown_request() change in vl.c;
					instead, insert the bVMTerminated logic into is_vmstop_request() in vl.c
				problem: still throw exception on vmstate<VMSTATE_MAX
					change the state to PAUSED.
			Now seems to be fixing point 5.

10:00AM 05/23/2014

	[7] it still hangs, because there are too many SOCs.	
		Try see if the modification on verifyAllNops caused the problem.
		Look at ::writeInstruction.
		Still no effect. Check what's the problem (is thread 1 main thread
		costing too much?) Verified that when generating the full trace the main thread
		is working.
	[8] test if it works on notepad.exe.
		the main loop thread is still executing.
		The generation speed is about 2 minutes per slice.
	[9] make modification in main-loop.c:422 add a sleep statement. on condition of
		bVMTerminated. Note: move the actual definition to dummy.cc
		New experiment: speed is about the same

8:30AM 05/24/2014
	[10] think about what caused the low speed.
		[10.1] disable the mem dependency and check. Verified, when mem dependency is
	brought into scenario, it makes it very slow.
9:30AM 05/24/2014
		[10.2] do the experiment again. There is only two socs resulted.
			1153635 -> 1153636
		and 1153627-> 1153628,
		only 4 instructions involved?
10:30AM
		[10.3] recover the mock_readinstr and see what the SOCs are.
			the first loop that propagates the dependency takes forever. Takes about 15 minutes
		to generate the first slice (28764 socs!)
		[10.4] print the data and cache loading and see what's the problem.
9:30AM 05/26/2014
		[10.5] refine the print and check
			It looks like the getESP is consuming a lot of time.
		[10.6] add a print_stats in Cache class for debugging purpose. 15 min.
		there are about 1 million timestamps.
			Cache load is not that crazy however, most of the
			see if resetting the priority of the thread is ok.
	2:00PM 
		[10.7] try adjusting priority of the branch slice thread and see what happens.
7:45PM 05/27/2014
		[10.8] adjust the priority of main thread as well and see what happens.
			Found the original approach does not work. Priority is always 0.
			seems not quite helpful.
				Need to set policy to RR.
9:00AM 05/28/2014
		[10.9] debug and check the effect of SCHED_RR.
			It seems no use. Delete all threads config statements.
9:4rAM
		[10.10] research problem again and check the bottleneck.
			use sampling approach.
			Most of the time it is spent on getESP value.
5:00PM 05/29/2014
		[10.11] check the feasibility of improving the getESP_BEFORE productivity. [30 min]
			[a] read the logic of getESP_BEFORE [done] 
			[b] think about the current solution of CachedMap [done] no need to use CachedMap
			[c] think about the solution. Idea: keep an array list (Cache) of ESP/EBP value
		to store the entire ESP/EBP value change of records. Part of the records may be
		located on the disk, but the recently used records can be stored in chunks.
		When a request comes in, perform a binary search on the chunk.
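
A minimal in-memory sketch of the table described in [c], assuming records are appended in strictly increasing timestamp order and that a lookup should return the value set by the latest record at or before the requested timestamp. The method names follow the plan in this log, but the signature of getValueForTS and the lookup semantics are assumptions, and the on-disk chunking through Cache is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical, memory-only stand-in for the planned BinSearchTable.
class BinSearchTable {
public:
    // Append one (timestamp, register value) record; timestamps must increase.
    void addValuePair(long long ts, unsigned int value) {
        assert(records_.empty() || records_.back().first < ts);
        records_.emplace_back(ts, value);
    }

    long long getSize() const { return static_cast<long long>(records_.size()); }

    // Binary search for the last record with timestamp <= ts.
    // Returns false if no record is that old; otherwise stores its value.
    bool getValueForTS(long long ts, unsigned int* value) const {
        auto it = std::upper_bound(
            records_.begin(), records_.end(), ts,
            [](long long key, const std::pair<long long, unsigned int>& rec) {
                return key < rec.first;
            });
        if (it == records_.begin()) {
            return false;
        }
        *value = std::prev(it)->second;
        return true;
    }

private:
    std::vector<std::pair<long long, unsigned int>> records_;
};
```

Because only changes of ESP/EBP are stored, a lookup costs O(log n) in the number of changes instead of a backward scan over the trace.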

		[10.12] Implementation Steps [Estimate: 2.5 hrs]
	5:40PM 05/29/2014
			[1] declare class BinSearchTable. It includes a Cache, and allows operations
				to add value pairs. [5 min] DONE.
			[2] constructor BinSearchTable(char *cachePath) [5 min] DONE.
			[3] void addValuePair(long long int ts, unsigned int value). Mainly to
				call Cache::appendRecord [15 min]
			[4] long long int getSize() [5 min] 
			[5] void saveToDisk [5 min]
	6:00PM 05/29/2014
			[6] unsigned int value getValueForTS(long long int ts). Perform
				binary search on it [30 min]
			[6.1] retrieveRecord(long long int *ts, unsigned int *value) [15 min]
	8:00PM 05/29/2014
			[8] unit testing 1. 1 -> N 3 times size of the cache [20 min]
			[9] unit testing 1 -> N 3 times but prime number entries. [25 min] DONE.

	4:50PM 05/30/2014

			[10] add a faster version of quick_search using the latest loaded cache.
			[11] unit test it.

	8:45PM 05/30/2014
			[7] Trace::constructEspChangeTable(bool bESP) 
				[1] declare two BinSearchTable one for esp and one for ebp [8 min]
				[2] in Trace::constructFullTrace from raw trace, build the binSearch Table [15 min]
				[3] in Trace::loadTrace() deserialize the two binSearchTable. [15 min]
			[10] modify Trace::getESPAfter [10 min]
			[11] modify Trace::getESPBefore [15 min]

	8:40PM 05/31/2014
			[12] debug: load trace problem. solved

	8:50PM 06/01/2014
			[13] debug: problem of consecutive ts. UNSOLVED YET.
				inspect code. 
				It seems ok. The problem is solved.

			[14] too slow in clear_inslice_tags.
				Found the problem: too many passes.
				Every time bNoMod is false.
				Find out who is writing to bNoMod
				due to the call of verify_all_socs.

	7:30PM 06/02/2014
			[15] debug into verify_all_socs and see how it returns false
			on bNoMod.
			It seems that soc id 0 always gets false in its bModified, let's check 
		what is going on.
			Found the problem, an extra i++ caused the problem!!!! (fxxx!) introduced by testing
		code.

			[16] now back to the problem: cannot read byte at specific address.

	4:00Pm 06/03/2014
				[16.1] check the "EIP" it is writing into.
					give up the write if the address is NOT right.
					Problem: read byte 512 in section 2. (0x604000).
				the problem is that the real size (in mem) is 0x1024 but the
				in file size is 512. So need to make two updates. First, read the 
				in file size and then control the loop.
	
					[a] fix sectionInfo
					[b] fix the loop on reading byte/writing byte.
				Seem to be fixed. Need to verify. Very slow though.
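
A minimal sketch of the two-part fix in [16.1], assuming a section record that carries both the in-memory size and the in-file size. The struct layout and the function name fileOffsetFor are hypothetical and do not mirror the project's sectionInfo; the point is only that an address is backed by file bytes when its offset is below the smaller of the two sizes.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical section description (not the project's sectionInfo).
struct SectionInfo {
    std::uint32_t startInMem;  // virtual address where the section starts
    std::uint32_t sizeInMem;   // size of the section once loaded
    std::uint32_t fileOffset;  // offset of the section's raw data in the file
    std::uint32_t sizeInFile;  // size of the raw data stored on disk
};

// Translate a virtual address to a file offset.
// Returns false when the address is outside the section or lies in the part
// that exists only in memory and has no bytes in the file.
bool fileOffsetFor(const SectionInfo& s, std::uint32_t addr, std::uint32_t* offset) {
    const std::uint32_t backed = std::min(s.sizeInMem, s.sizeInFile);
    if (addr < s.startInMem || addr - s.startInMem >= backed) {
        return false;
    }
    *offset = s.fileOffset + (addr - s.startInMem);
    return true;
}
```

A read/write loop that calls this per byte, and gives up when it returns false, cannot run past the raw data of the section.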
			[17] generate 10 slices and run them.

	4:00PM 06/04/2014
			[18] it turns out the slice 3 is still too slow. Check it again.
				in Trace.cc:1279 change i to 3
				Pass2 : 7:48PM -8:07PM (20 minute)
				Pass3: 8:08PM - 8:27PM (20 minute)
					found that there are 157545 SOCs. Too many.
					Most time spent on verify_desc_soc.
				Pass4: 8:07-8:49PM
				Pass5: 8:49PM-

	5:00PM 06/05/2014
		[19] try to improve it, remove verify_desc_order()
				pass2: 8:25PM -8:39Pm (14 min)
				pass3: 8:40PM -8:50PM (10 min) improved half the speed
				pass4: now analyze each.
					clear_tag 8:51PM - 8:54pm 3 MIN
					init_data_slice: 8:55PM - 8:55PM 1 min
					slice_all_soc: 30 seconds
					add new ts to soc manager: 8:56PM - 9:05Pm 10 min
					verify_and_reset: 9:09PM - does not take much time.
	4:00PM 06/06/2014
		[20] check it again and record the number of SOCs after verify_and_reset.
				pass1: smsize: 157210 
						first SOC tsStart = 1153664, tsEnd = 1153668
					-> 157202
				pass2:  157457 (after adding ts) (added by the previous rounds)
					-> 157457
					92809 modified consecutive many SOCs modified
				pass3: 
		[21] analyze the bottleneck of the for loop of addSOC.
			random sampling:
				ii->set_inslice - 3
				sm.addTS (findSOC) - 15
			So socmanager.addTS findSOC is the most costly operation
		[22] try to improve SOCManager::findSOC 
			after improvement of findSOC
			pass2: 9:59PM ->10:02PM 2 minutes! 
				
				
4:00PM 06/07/2014
--------------------------------------------------------------------------------------
Task 332: Speed-up
--------------------------------------------------------------------------------------
	[1] replace the sample programs.
	[2] improve clear_init_slice.
		[a] in InstrExecRecorder add an integer pass attribute (5 min) DONE.
		[b] modify interface of setInSlice and add pass number. (10 min) DONE.
		[c] clear all related syntax errors (20 min) DONE.
		[d] modify isInSlice interface and implementation and clear all syntax errors (20 min) DONE
		[e] modify clear_init_slice (10 min) DONE
		[f] handle serialization DONE
		[g] unit testing [15 min] DONE.
	[3] test on b21.exe (there are lots of exceptions). Must be introduced by the improvement.
	[4] run on Themida job first and see if it improves.
4:00Pm 06/08/2014
	[5]  debug into themida job and look at the performance.
		pass 2: sm.getSize() 157204.
		full_slice does not take much time.
		Problem: more and more passes
	[6] debug into one pass and check the time.
		found that one addTS operation could take a lot of time.
		There are too many SOCs, and merge them takes too much time.
		8:00PM 
	[7] heuristic: if pass>5 or number of SOCs greater than 1000, don't do SOC, do one slice
	directly.

4:00PM 06/09/2014
--------------------------------------------------------------------------------------
Task 333: Debug the Speed-up Algorithm
--------------------------------------------------------------------------------------
	[1] use b21.exe. Found problem: brc_1
	[2] IMM broke at b21.exe of brc_1. (actually it broke on all of the EXE files)
	[3] use IMM to debug IMM. It looks like the path problem, too long.
	[4] cannot figure out why. Use windbg to debug it. brc_0 works.
	[5] use hxD to perform a binary diff between correct version and wrong version.
		First difference occurs at file offset 0x400. (it is the start of the .text
	section). The first byte diff is 0x405 (file offset) corresponds to
	0x00401005 (jmp MAIN instruction) It is changed to NOP. but it does not seem to
	explain why IMM crashes on it.
		One observation: why is NOP placed????
	[6] place the faulty b21.exe as b22.exe in the same folder. IMM does not crash anymore.
		It seems to be the path problem.
	[7] verified. Now need to copy the file to the ...\sdk path to do the check. 
	[8] comparative study of the two versions. Find out where it breaks.
		The problem: 0x4014A0 (jump Start) is not included.
	[9] debug into the slicing algorithm line 1228
			source seed eip: 0x40134e
		the init_data-slice looks pretty normal
		check the full_slice_all_soc now
		Strange: soc is only 126949->126945.
4:30PM 06/10/2014
	[10] verify the only SOC problem: only ONE SOC identified. small range.
		Observation: the handleProgramEntry does insert a new SOC at the program entry.
		bridge for the first SOC is 0x40616e
		First instruction is 0x40149b. Problem is that II is identified as not in slice.

		Check handleProgramEntry again and check why II is NOT identified in slice.
		Problem is inSlice (pass_no) returns true, but ii is not in slice.

		See the problem: need a deep clean which clears ts of all IER.

3:30AM 06/11/2014
	[11] implement the clear_init_slice_deep.
	[12] debug into slice 1 and check handleProgramEntry.
	Removed the problem. But there are occasionally other exceptions.
	[13] generate the full slice for b21.exe.
	[14] new problem: too many timeouts.
			
 
3:00PM 06/11/2014
--------------------------------------------------------------------------------------
Task 334: Fix the timeout problem
--------------------------------------------------------------------------------------
	[1] find which slice times out.
		id = brc_4.
	[2] comparative study of brc_4.
		Problem: the RET of 0x4019E4.
	[3] find out brc_4. ts.
		ts is 0x405fcb.
			It's called by _mtinit
8:48AM 06/13/2014
	[4] check out how many times processFunction is called. It is called many times need
	conditional bp.
	[5] set a conditional BP so that the processFunction is called for 0x4019E4.
		Process function is not called at all.
	[6] so we do have to trace back from 0x405fcb.
	[7] display the last EIP included in trace.
		There are 8 SOCs being processed.
	The last two are:
	: /x lastEIP = 0x406183
	1: /x lastEIP = 0x406174

	[8] check what is located at 0x406174.
	The question is WHY 0x406174 did not trace to 

	[9] problem is with the call at 0x401341 (call PRE-LOG)
			0x4019a0 to 0x4019e4 (call of pre_log)
	[10] problem is the handling of 0x401348 (cmp instruction), why it does not
		lead to the RET instruction at 0x4019e4.
		Problem is that 0x401348 is not hit at all?
9:00AM 06/14/2014
	[11] debug 0x401348. In previous rounds get the ts of 0x401348 and then
	in the 4th round. check if it is in slice, if it is how it is set?
		ts is 126945.
	At the beginning of 4th round: 
		during pass 1: it is not set.
		After all the passes, it is still not set.
	Strange: after debugging, it actually generate a correct slice. The address 0x401348
is overwritten with a jump instruction correctly.

	[12] regenerate slices and observe behaviors. Still found a lot of problems.
	generate 8 slices.
	[13] problem now starts from brc_5. Check what's the problem.
	New problem: there are actually two problems. First the ESP value is maintained not 
correctly and then the RET at the end of the function security_init_cookie is not 
included in the slice.
	Decision: check the ESP problem first. Analyze instruction 0x4061A2. Look at why
its ESP dependency is not propagated.
	[14] debug 0x4061a2. set a conditional BP in Trace::full_slice.
		verify first if it generates the bad exe file.	
		Problem: it's not producing exactly the same exe file after the code is changed.
	A second time, it reproduce the error trace. Try it again.
		Problem: 3rd time. It does not work again. Has to record the trace first and then
	reproduce.
		verified twice. Now it's the problem of 0x4061a2 which does not
	propagate ESP properly check why.
	11:00AM 06/14/2014
	[14] check 0x4061a2
		it's in slice and bEspProcessDelay is true.
		ESP dependency leads to ts: 126842 (eip: 0x40619e). 
It is ignored because of SOC starts at 126843 (eip: 0x40619f).
	[15] check the list of SOCs.
	At pass4, soc 9 (starting from 0, second last soc)	, there are 11 socs.
	126843 (eip: 0x40619e), tsEnd = 126906 (

	vec10: 126832 (eip: 0x406174), tsEnd = 126833(also: 0x406174??)

	The problem is then, why is the first call included? It is called because the setting
	of the last SOC. When pass exceeds 5, there is ONLY one SOC. Then it is done that way.

	[16] check the slicing in the last SOC and see how it would become like this.
	Now in the last SOC, the esp link does propagate to 126827 (0x40619e)

	Now look at the handling of timestamp 126827 (0x40619e):
		has esp link. propagates to eip: 0x406182, ts: 126837
		Is it set in slice? Strangely ier is in slice but ii is not!!!

	Check the handling of 0x4061a2 and see how tsTarget is set in slice:
		only setEspProcessDelay is called.
	A problem of the pass number approach: since flag is not cleared. if in another pass,
if an setEspDelay is called it will be used to update the pass number and then the other
inSlice flags will be confused!!!! needs to fix.


3:45 06/15/2014
--------------------------------------------------------------------------------------
Task 335: Fix the bug on pass_no 
--------------------------------------------------------------------------------------
	[1] add attributes and fix all functions. [30 min]
	[2] unit test.. OK.
	[3] generate 10 slices.
	[4] new problem a lot of 0xC00005 error. Check brc_5
		comparative study: break at 0x4013b5.
		It seems that the generated bridge is WRONG. At 0x40601E it tries to jump to 0x406011,
which is in the middle of an instruction..
	[5] break on brc_5 and check how the slice is generated.
		[a] what is the slice target?
			target slice is 0x40601E. Strangely, the findHole locates 0x406011 as the
	starting address of the hole.
9:00AM 06/17/2014
	[6] continue debug break on brc_5 and check writeProgramExit.
		b Trace.cc:1247
		Found the problem. The hole_start is started in the middle of an in slice instruction.
		Strangely, the trace->has_instr(eip, opcode) should actually have checked it. buggy
	implementation.
10:00AM
	[7] proposed fix: [20 min]
		introduce two variables. If the instruction is in trace then set the block.
10:30AM 
	[8] test the fix. check brc5. [15 min]
	[9] generate 30 slices and test.
	There are still quite some c0005 errors.
	[10] check branch br_11
		break at _mtinit
		break at 00406112
		Problem: 00405ccb (push instruction) is not in slice.
		This leads to 0x4019AC not reading the right instruction. which relies on 0x405cc9.
	0x405ccb is skipped and leads to the problem.
	[11] debug:
		break on br_11, and then set a conditional bp on 0x00405ccb and see how it's
	propagating the dependency link. pay attention to SOC.
		Observation: tsSocStart: 130256, tsSocEnd: 144667
		tsCur: 142345, so it's not the soc problem. check why.
		(a) 142345, eip: 0x4019ac.
			depends on: espLink: 0x4019a5 ...
		(b) check eip: 0x405cd0 and check its dependency. (ts:142342)
			depends on: espLink did not propagate because it is not bEspProcessDelay.
		(c) check 0x4019a0.
			espLink: 142342. Problem: ESP flag is NOT set correctly!
			Found the problem. ier->setInSlice does an extra of cleaning job.
	[12] new problem: a lot of process kill when running. slice brc_18. 0x103 error code.
		Problem: 0x004013D5 is not included in slice, it is dependent by 0x004013DF.
		This leads to the problem that it does not jump to 0x004013DF.

8:45AM 06/18/2014
	[13] break on brc_18. and check what's the dependency of 0x4013DF, it should have a dependency
	on 0x4013D5 (jnz). [20 min]
		did not hit.
	9:00AM
	[14] find out what's the last ts that is hit in brc_18. [15 min]
		The slice starts from 0x4070d6.
		0x4013f4 (setargv) ->  0x4057e8 -> 0x40724b
	[15] check 0x4013f4.
		0x4013f4:
			[1] esp link: 0x405924
			[2] 0x4013ef set need visit.
		0x4013ef:
			[1] reglink: ignored because no data propagation
			[2] control link on 0x405924: ret of getEnvironmentStrA
				process control link and find function can be skipped. set the previous
				instruction as need visit: 0x4013e5.
		0x4013e5:
			[1] reglink: ignore
			[2] control link on 0x7c812c92.
					process function and skip the function.
					set need visit on 0x4013d5 ***
	10:10AM
		0x4013d5: It is not in slice because it is set to be NeedVisit only.
			It does propagate the data dependency (conditional on reg values)
			and set the previous TEST EAX INSTRUCTION into slice.
	Conclusion: bug: if the instruction before a function call is a JUMP, it is not
		right to set it only to needvisit, it should be directly in slice; otherwise,
		the control link will not be properly propagated. 
	10:40AM
	[16] fix: at the processFunction if the dependee is transfer control, then set it in slice.
	[17] debug case 18. propagates to 0x4013d3.
	[18] fix setIER_II
	[19] debug case 18 again.
	[20] test 20 cases. all good.
	[21] try all slices. same except slice 95. Could be caused by response in keyboard.
	[22] generate Themida slice.
		too slow. stuck on slice 12.
			
		
9:15 AM 06/19/2014
--------------------------------------------------------------------------------------
Task 336: check the speed problem of slice 12.
--------------------------------------------------------------------------------------
	[1] repeat the experiment:
		set bp at Trace.cc:1297,  modify i to 12.
		check how it slows down:
		read 2099717
		set inSlice: 2099667, 1049079,  1049083, 1049091
		read 2099736
		read 2099738
	block switch occurs: 4179912, 1608840 set slice of 1048523, 2516929, ...
	because the order of toProcess is random order.

	[2] add an integer and check the number of processing items.
		tid: 1, 3, 16545, 26009, 29385, 29803, 78528, 78688, 79202, 79437, 79672, 
	80103, 80109, 80110, 80116, 80121,
		It seems that it's thrashing.
	[3] improvements that can be made:
		Make the toProcess a heap instead of FIFO.
		ts: 17591, 256333, 5435259, 5435259, 760082, 5435259
		tid: 1034259, 1034260, 1435475, 1479031, 1479114, 1600447, 2655052,
		-> end: 2794517

		time: 2 minutes, end: 2794547 

	[3] improvements: avoid duplicate id processing.
		It actually never hits. the priority queue already handles it.
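
A minimal sketch of a heap-based toProcess worklist with a duplicate guard, assuming items are identified by timestamp and that the largest timestamp should be handled first (a natural order for a backward slice, but an assumption here). The class name Worklist is hypothetical. Popping in sorted order keeps consecutive items close together, which is one way to reduce the cache block switches noted in [1].

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>

// Hypothetical worklist: max-heap of timestamps plus a "seen" set.
class Worklist {
public:
    // Schedule a timestamp. Returns false if it was already scheduled once.
    bool push(long long ts) {
        if (!seen_.insert(ts).second) {
            return false;  // duplicate, skip it
        }
        heap_.push(ts);
        return true;
    }

    bool empty() const { return heap_.empty(); }

    // Remove and return the largest pending timestamp.
    long long pop() {
        assert(!heap_.empty());
        long long ts = heap_.top();
        heap_.pop();
        return ts;
    }

private:
    std::priority_queue<long long> heap_;  // largest timestamp on top
    std::unordered_set<long long> seen_;   // everything ever scheduled
};
```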

	[4] more improvements to make
		[a] line 369 of Cache.cc can be improved.
		time: 1 minute. saves about half. 

	[5] observation:
		init_data_slice is CALLED many times! SOCmanager.cc:61

	[6] observation: about 1 hr to 1.5 hrs per slice. Desperately need to analyze the
	performance.

8:50AM 06/20/2014

	[7] verify the 7 slices generated. 2 ok and 5 generates exceptions.
	[8] analyze the first branch which generates the error.
		The first branch ends about 30 instructions from execution.
9:00AM
	[9] compare it with the correct version.
		The problem is that the instruction at 0x00604062 is skipped (while strangely
	the instruction before it, also used for self-extraction is kept).

9:30AM
	[10] check the current implementation and see if they work for the job1.
	seems fine no bugs introduced.

10:00am
	[11]  check the algorithm, if no SOC handling at all what's the speed?
		full slice still takes more than 20 minutes
	[12] check the most expensive time operation by sampling:
		tsCur recover: 2
		Trace.cc: 949: 8
		add link: 1
	[13] plan: add priority_queue to processing.
		1 add counter. add Util::clock functions.
12:00pm
	[14] Util::clock:
		[1] startClock. 10 min.
		[2] endClock. 5 min
		[3] getDuration. 5 min
		[4] add counter. 5 min
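
A minimal sketch of the timing helper planned in [14], assuming std::chrono::steady_clock as the time source and a duration reported in seconds as a double (matching the "p sec" values printed later). The class name StopClock and the counter methods are hypothetical; the log only names startClock, endClock, getDuration and a counter.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the Util clock functions plus a work counter.
class StopClock {
private:
    using Clock = std::chrono::steady_clock;

public:
    void startClock() { start_ = Clock::now(); end_ = start_; }
    void endClock() { end_ = Clock::now(); }

    // Elapsed seconds between the last startClock() and endClock().
    double getDuration() const {
        assert(end_ >= start_);
        return std::chrono::duration<double>(end_ - start_).count();
    }

    // Count one processed item (e.g. one timestamp taken from the worklist).
    void count() { ++cntProcessed_; }
    long long processed() const { return cntProcessed_; }

private:
    Clock::time_point start_{};
    Clock::time_point end_{};
    long long cntProcessed_ = 0;
};
```

steady_clock is chosen over system_clock in this sketch because it is monotonic, so a wall-clock adjustment during a 25-minute run cannot produce a negative duration.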
	[15] run and collect data:
		[1] slice 12: first run of full_slice: 
				(gdb) p sec
				$3 = 1477.6700000000001
				(gdb) p cntProcessed 
				$4 = 5049499

		===== second run ===============
				(gdb) p sec
				$6 = 1533.54
				(gdb) p cntProcessed 
				$7 = 5049499
	[16] improvement: add a priority_heap and each timestamp is processed.
		Refactor the code. Need a lot of coding do it later.

9:00AM 06/21/2014
	[17] Implementation:
		[1] Modify the trace.h and change InstrExecRecorder [10 min] DONE
				setInslice, setEsp, setNeedVisit etc. 
		[2] Change implementation and correct all syntax errors [30 min] DONE
		[3] run unit testing [10 min] DONE.
10:00AM
		[4] run on job1. [15 min]
			[a] remove the bug of duplicates.
			Around 8 c0005 error. Acceptable (out of 147 slices)
		[5] test slice 12 of Themida. [15 min]
			(gdb) p sec
			$4 = 1541.9200000000001
			(gdb) p cntProcessed
			$5 = 4416542
		Strangely, did not improve the time
			(gdb) p sec
			$7 = 1498.9000000000001
			(gdb) p cntProcessed 
			$8 = 4416542
		Does not gain much.
	[18] proposal of another improvement. Remove the reloading.
9:30AM 06/22/2014
	[19] Implementation:
		[a] make a snapshot before proceeding
		[b] idea: declare a bunch of local variables (boolean) to represent results of
			function calls and make copies of dependentLinks.  
9:50AM
		[c] details:
			[c1] add a copy constructor to dependLink [8 min] DONE.
			[c2] declare an array of dependLink of MAX_WRITE_LINK [5 min]
			[c3] do the copy [5 min] DONE
			[c4] declare all the boolean variables for IER functions [40 min]
				[a] mem link. DONE
				[b] reg link. DONE
				[c] esp link. DONE
				[d] ebp link. DONE
				[e] control link. DONE
		[d] check the effect
			b Trace.cc:1315
			b Util::getDuration
				(gdb) p sec
				$2 = 1460.95
				(gdb) p cntProcessed 
				$3 = 4416542
			Verify slight change.

		[e] verification: ok around 8 c0005s out of 108 slices. 
		[f] observation: most GDB ctrl-c stops at the handling at the end of
		the control link.

9:30AM 06/23/2014
	[20] improve the handling of link
		[1] declare FastCachedMap class (template) , takes long long int [30 min]
			[a] constructor takes capacity [5 min]
			[b] hasKey [10 min]
			[c] get(long long int key) [5 min]
			[d] add(key, value)  [8 min]
		[2] unit test it [20 min]
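
A minimal sketch of a fixed-capacity map with the interface listed in [1], assuming a direct-mapped layout where a key lands in slot key % capacity and a colliding key simply evicts the previous one. The log does not say how FastCachedMap is laid out internally, so the eviction policy and the clear() method are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical direct-mapped cache keyed by long long.
template <typename V>
class FastCachedMap {
public:
    explicit FastCachedMap(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    bool hasKey(long long key) const {
        const Slot& s = slots_[index(key)];
        return s.used && s.key == key;
    }

    // Precondition: hasKey(key) is true.
    V get(long long key) const {
        assert(hasKey(key));
        return slots_[index(key)].value;
    }

    // Store the pair, overwriting whatever key occupied the slot before.
    void add(long long key, const V& value) {
        Slot& s = slots_[index(key)];
        s.used = true;
        s.key = key;
        s.value = value;
    }

    // Forget all entries without releasing the storage.
    void clear() {
        for (Slot& s : slots_) {
            s.used = false;
        }
    }

private:
    struct Slot {
        bool used = false;
        long long key = 0;
        V value{};
    };

    std::size_t index(long long key) const {
        return static_cast<std::size_t>(
            static_cast<unsigned long long>(key) % slots_.size());
    }

    std::vector<Slot> slots_;
};
```

Lookups are a single array access with no allocation, which is the property that matters when the same links are queried millions of times per slice.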
10:00AM
		[3] improve the implementation [20 min]
		[4] debug implementation [15 min]
			[4.1] issue 1: the place it is cleared is wrong.
		Achieved around 10x improvements
		(gdb) p sec
		$22 = 154.22
		(gdb) p cntProcessed 
		$23 = 4416542

		[5] run the algorithm on job1 and see if it works. around 14 c005. out of 110 ok.
	[21] try to improve the handling of deep cleaning of cache. 
		The problem is that instrStore is already cleared. Check if it caused any errors.



9:15 AM 06/24/2014
--------------------------------------------------------------------------------------
Task 337: check the speed problem of slice 12 --> further improve speed of clear_init_slice.
--------------------------------------------------------------------------------------
	[1] try to understand why skipping the deep clean is not ok. generate the slice. [30 min]
		[a] break at brc_2.
		[b] analyze difference.
			Problem: instruction 0x4014a0 is not included
		[c] trace into trace generation and see what is the last EIP.
			lastEIP is 0x4014A0.
			Now the problem is why it is not written into slice.
			Use the following to find out its ts:
			 p this->getInstrInfoID(0x4014a0, 0xe9)
			$3 = 45962
			Then call ii->loadFromCache(45962)
			When calling ii->isInSlice() it returns true.
		[d] continue to check why 0x4014A0 is not included in slice when written to file.
			Problem: the instruction is actually written! strange.
			So we are looking at wrong slice. Generated slice did work.
		UNDERSTAND THE PROBLEM. The problem is that the previous passing POLLUTES the next
	slice! (because pass number still starts from 0!)

	[2] idea:
		change pass no from "char" to int, this is going to increase the data stored for
	each instruction for about 12 bytes (three 32-bit words). Should be worth it.
	If there are 4 million instructions * 12 bytes = 48MB. No good.

		alternative design: store a short integer slice number. (2 more bytes).

10:40AM 06/24/2014
	[3] implementation:
		[1] declare a short int slice_no, init to 0. [5 min] DONE.
		[2] modify the serialization and unit test it [15 min] DONE
		[3] add a STATIC SHORT INT slice_no and provide a static function to set it [8 min] DONE
		[4] modify deep_clean to reset slice_no, and writes it  [10 min] DONE.
		[5] modify gen_slice and pass a slice number and change 
			the static number of slice [10 min] DONE.
		[6] at the beginning of gen_slice, if the pass_no is 0, do a deep clean [5 min] DONE.
	11:05AM
		[7] modify each isXYZ function, if slice_id is smaller, returns false. [15 min]
		[8] inspect InstrExecRecorder code again. [15 min]
	11:30AM
		[9] generate 30 slices and see the result.
		[10] found problem. First slice: brc_2: error 0x080003.
		[11] debug: generate slice 2 and see what's the problem. When it's generated separately,
			it's ok.
		[12] problem instruction at 0x00401341 depends on 0x0040133c but it is ignored. Debug it.
			set a conditional breakpoint at Trace.cc:738 if ier->eip==0x00401341
			link is esp link on 0x40133c.
			setEspDelay is settled properly.
			ier->isInSlice(pass) is true. but ii->isInSlice() is false.
		[13] check it again.
			Strangely, after setEspDelay, the isInSlice() is true.
			Found the problem.
				Before round 2, the pass_no_inslice is set to 2 by previous branch generation
				Now when slice_id is updated by setEsp, but the pass_no is not updated,
				it's leading to wrong result.
	2:00PM
		[14] whenever needs to set the slice_id, check if this is the first time it needs to 
			be set and clear the pass_no. Declare a function for it.
		[15] debug slice 2. OK.
		[16] gen 100 slices and see how they work. around 15 c0005 in 111 slices
		[17] check setInSlice implementation (clear other flags part).
			It actually gives the same number of c0005 error.
		[18] now check the effects on the slice.
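	Sketch of the slice_no design from [3] and the fix in [14]: every flag query returns false when the stored slice number is smaller than the current one, and the first write in a new slice clears the stale pass number and flags. The class name InstrFlags and all member names are hypothetical, only the idea is from the notes above.

```cpp
#include <cassert>

// Hypothetical per-instruction flag record tagged with a slice number.
class InstrFlags {
public:
    // Static setter, as in step [3]: one current slice number for all records.
    static void setCurrentSlice(short sliceNo) { currentSlice_ = sliceNo; }

    void setInSlice(char passNo) {
        claimForCurrentSlice();
        inSlice_ = true;
        passNo_ = passNo;
    }

    void setEspDelay() {
        claimForCurrentSlice();
        espDelay_ = true;
    }

    // Step [7]: a record written by an earlier slice reads as "not set".
    bool isInSlice() const { return !isStale() && inSlice_; }
    bool hasEspDelay() const { return !isStale() && espDelay_; }
    char passNo() const { return isStale() ? 0 : passNo_; }

private:
    bool isStale() const { return sliceNo_ < currentSlice_; }

    // Step [14]: the first time a record is touched in a new slice, drop
    // everything the previous slice left behind before setting anything.
    void claimForCurrentSlice() {
        if (isStale()) {
            sliceNo_ = currentSlice_;
            passNo_ = 0;
            inSlice_ = false;
            espDelay_ = false;
        }
    }

    static short currentSlice_;
    short sliceNo_ = 0;
    char passNo_ = 0;
    bool inSlice_ = false;
    bool espDelay_ = false;
};

short InstrFlags::currentSlice_ = 0;
```

	With this layout no pass over all instructions is needed between slices. Bumping the static number invalidates every record at once, and the bug from [13] (slice_id updated by setEsp while pass_no stays old) cannot occur because both are reset in the same function.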
		
	
9:15 AM 06/25/2014
--------------------------------------------------------------------------------------
Task 338: check the speed problem of slice 12 --> further improve speed of other factors.
--------------------------------------------------------------------------------------
	[1] timing all components.
		init_data_slice: about 1.5 minute
		full_slice_soc: about 3 minutes
		loop that adds TS: about 1.5 minute
		init_data_slice again: 1.5 min
		full_slice_soc again: about  3 min
		writeDataSlice: about 2 minutes
		report_stats: 1 minute
		Total: > 15 minutes
		Still full_slice_soc is the most expensive one.
	[2] sampling on full_slice
			recover tsFirst: 1
			mem link: 1
			dependLink copy: 1
			toProcess.pop: 2
			delayRegDependency: 1
	[3] check if gen_slice can be improved. make change to heuristics on break on soc.
	8:45AM 06/26/2014
	[4] check the improvement 
		init_data_slice: 73 seconds
		addTS: 327 seconds (too many initDataSlice)
		init_data_slice again: 399 seconds
		full_slice_soc again:
		check entire efficiency:

		9:10AM slice 12
		10:11 slice 19 (around 10 minutes each)
		11:06 slice 21 (around 30 minutes each)
		4:04pm slice 38 (around 20 minutes each)
		7:20pm slice 69 (around 6 minutes each)
		8:50pm slice 86 (around 6 minutes each)
		8:43AM slice 174 (around 6 minutes each)

8:45AM 06/27/2014
	[5] test the slices generated.
		[1] most are exceptions.
		[2] the debug version generates a lot of Int3 interrupts, which stop windbg, but
		there is no "normal" 0x1122 difference between the two versions. Will solve the
		slice problem later. For now, concentrate on the speed problem.
	9:30AM
	[6] add a class called timer to project. Move startClock, endClock and getDuration(). to it
	and fix all compiler errors. [20 min]

	9:40
	[7] add timer for init_data_slice, addTS, full_slice

	9:50am analyze the stats on slice 12.
	[6] add the data stats to init_data_slice and all other relevant functions.
		init_data_slice: 67 seconds
		identifySOC: 9 seconds, 9 seconds, 8 seconds, 0.01 seconds, ...
			Altogether: 173 seconds
		writeProgram: 40 seconds.
		full_slice: 143 seconds.
		gen_slice: takes 439 seconds.
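	Sketch of the timer class from step [6] of 9:30AM. Only the three method names (startClock, endClock, getDuration) come from the notes. The use of std::chrono::steady_clock and the class name Timer are assumptions.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stopwatch: startClock() and endClock() bracket a region,
// getDuration() reports the elapsed wall-clock time in seconds.
class Timer {
public:
    void startClock() {
        start_ = Clock::now();
        end_ = start_;
    }

    void endClock() { end_ = Clock::now(); }

    double getDuration() const {
        return std::chrono::duration<double>(end_ - start_).count();
    }

private:
    // steady_clock is monotonic, so the result cannot go negative when the
    // system time is adjusted during a long slicing run.
    using Clock = std::chrono::steady_clock;
    Clock::time_point start_{};
    Clock::time_point end_{};
};
```

	Usage would be one Timer per measured function (init_data_slice, addTS, full_slice): call startClock() on entry, endClock() on exit, and log getDuration().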

	10:45AM 
	[7] try to improve init_data_slice
		[7.1] measure init_data_slice performance 3 times. [20 min]
		46.93, 57.32, 49.75, 57.2, 54. Avg = 53
	[8] improve init_data_slice by adding boolean flags [20 min]  
		remove the reload and the current link
		68.93, 56, 63,  no good , why?
		58, 56, 
		drop two extra see the result.
		56.93, 53. 
		not much improvement though. 
	[9] try save one updateCache(). does not work.
	3:15PM
	[10] improve the update only if changed policy. see if it helps.
		why only 0.03 seconds? needs to check.
	[11] fix error in clear_slice and try it again.
		[11.5] unit test.
		[11.6] found that need to do a deep clean of the slices for each test iteration!
		60 seconds. 
	[12] verify if the algorithm change.  OK.
	[13] check the speed. improved a lot but suspicious.
9:00AM 06/28/2014.
	[14] check validity.
		Too many exceptions. Only 2 out of 39 are correct ones.
		Most exceptions are 0xc000001d. (illegal instruction). This must have something to do
	with code extraction. 
9:00AM
	[15] compare brc_0 and real .exe [30 min]
		IMM breaks on the brc_0. Use windbg instead.
		windbg fails as well.
		Check the program header and see what is wrong. cannot see much.
	[16] check the IMM crash problem.
		Use IMM to debug IMM. It broke in a strlen function call.
		Found the problem. It may be because of the name t0.exe.exe. Problem solved! Never have two .exes!
	[17] compare brc_0 and real themide.exe
9:30AM
		Problem: 0x00604030 depends on 0x0060402b and it is not included.	 [30 min]
		Observation:
			Note: arrCOPY can first save the time.
			It did call setEspProcessDelay on 0x0060402b.
			It seems that pass_no is not cleaned.
	[18] fix deep clear
		Observation: fixed the previous. It broke on code extraction again.
	[19] fix the arrLink copy first.
		data initial: 106 seconds, 101 seconds, 102 seconds
		after change: 101 seconds,  102 seconds
	[20] check the new problem.
		The slice skips the simple decoding at 0x00604060.
		tmp breakpoint: 0x004247cf.

		Problem: 0x00424949 (instruction MOV EAX, 0x48692121) is written out as
	NOP instructions when writing slices. check the algorithm.

		Found the problem. It is OVERWRITTEN by 0x90 in clearAllSections because it's
		not in slice.
			related logic: find code extract at verifyAllNops

	[21] proposed fix: at clearAllSections. read the byte at the place, if it does not
	match the opcode of the instruction then give a warning and skip.
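	Sketch of the proposed fix in [21]: before overwriting an instruction with 0x90, compare the byte in the file image with the recorded opcode and skip on mismatch. The function name, the vector standing in for the section bytes and the warning text are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper for clearAllSections: NOP out `length` bytes at
// `offset` only if the first byte still equals the opcode recorded in the
// trace. A mismatch means the file holds the not-yet-extracted form of the
// code, so the bytes are left untouched.
bool nopOutIfOpcodeMatches(std::vector<std::uint8_t>& image,
                           std::size_t offset,
                           std::size_t length,
                           std::uint8_t expectedOpcode) {
    if (length == 0 || offset >= image.size() ||
        length > image.size() - offset) {
        return false;  // range does not fit in the image
    }
    if (image[offset] != expectedOpcode) {
        std::fprintf(stderr,
                     "warning: byte 0x%02x at offset %zu does not match "
                     "opcode 0x%02x, skipped\n",
                     static_cast<unsigned>(image[offset]), offset,
                     static_cast<unsigned>(expectedOpcode));
        return false;
    }
    for (std::size_t i = 0; i < length; ++i) {
        image[offset + i] = 0x90;  // NOP
    }
    return true;
}
```

	Checking only the first byte is cheap but weak. Comparing all recorded instruction bytes would be a stricter variant if the trace stores them.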

		
9:15 AM 06/29/2014
--------------------------------------------------------------------------------------
Task 339: check the incorrect slice problem
--------------------------------------------------------------------------------------
	[1] find out the branch
		branch eip is 0x424974.
		It is a self-extracted instruction which is changed by 0x424943 to 0x424947
		Broke at
			0x004247d9 -> 0x004248d3
		The problem is that 0x004248d3 is replaced with a "JMP 0x004247ed"
		Now, 0x004247ed is identified as self-extraction area and it is not touched.

		The jmp instruction is likely to be one in the writeExit Case.

	[2] find out who's generating the JMP instruction at 0x004248d3.
		//set a breakpoint at asJMP and watch its target addr
		It's part of writeProgramExit. 
		0x004248d3 is identified as the first Jump addr.
		eipExit: 0x004247d9 is identified in findHole

	[3] need to update the logic of findHole (should avoid the self-extract part).
		The problem is that the area is identified as proper area for placing program exit
	because: [1] it is not an instruction executed before (yes, ok), and the byte is 90
		or 00. But it lacks the check that it will be OVERWRITTEN.
		To fix: add another condition for checking overwrites.
		This needs help from run time trace recording.
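	Sketch of the extra condition for findHole in [3]: a byte is usable only if it is 0x90 or 0x00, was never executed, and is never overwritten at run time. The signature and the two callbacks are assumptions. In the real code the overwrite test would come from the run time trace recording.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical findHole: returns the offset of the first run of `need`
// usable bytes, or -1 if there is none.
long findHole(const std::vector<std::uint8_t>& bytes,
              std::size_t need,
              const std::function<bool(std::size_t)>& wasExecuted,
              const std::function<bool(std::size_t)>& isOverwritten) {
    if (need == 0) return -1;
    std::size_t run = 0;  // length of the current run of usable bytes
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const bool filler = bytes[i] == 0x90 || bytes[i] == 0x00;
        if (filler && !wasExecuted(i) && !isOverwritten(i)) {
            if (++run == need) return static_cast<long>(i + 1 - need);
        } else {
            run = 0;  // the new condition breaks the run like any other
        }
    }
    return -1;
}
```

	Without the isOverwritten test the first filler run wins even when the unpacker later decodes code into it, which is the failure seen at 0x004247d9.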

TO DO!!!!!!
--------------------------------------------------------------------------------------
Task 340: handling of self-extraction. Identify the memory range that is self extracted.
--------------------------------------------------------------------------------------

	[1] Design:
		[1] Changes on Trace
			[a] add a string executable_path to Trace data [5 min]
			[b] change Trace constructor() and the TraceManager::isProcessToBeTraced [15 min]
			[c] verify raw trace recording is ok [15 min]
			-- seems no need for full-trace to know about it 
		[2] Trace has a mem range manager which keeps track of addrInAny section
			for each memory write.
			[a] add a Cache to MemManager and a constructor [10 min]
			[b] add MemManager::saveToDisk [15 min]
			[c] add MemRangeManager::loadFromCache [15 min]
			[d] unit testing [30 min]

	[2] implementation
		[1] implement MemManager
			[a] add a Cache constructor [8 min] DONE.
			[b] add memManager::saveToDisk [15 min] DONE.
			[c] add MemManager::loadFromCache [15 min] DONE.
8:50AM 07/01/2014
			[d] unit test. create 10000 and use a small buffer. DONE.
				Problems: need DEBUG!
				Found the problem: saveToDisk() is called twice.
			[e] found problem with addRange.
			[f] fix unit test problem.
				fixed: note the efficiency of memRangeManager needs some improvement when
				number of items > 1000.
		[2] add executable_path to Trace data and constructor [15 min]
10:20AM
		[3] debug on Trace() and verify constructor is working. [10 min]
			ok but mem crash. recompile.
			ok. just need recompile the entire thing.
11:15AM
		[4] add memRangemanager to Trace
			[1] add data definition [5 min]. DONE.
			[2] in Trace() and destructor create a Cache and create the memRangemanager. [15 min]
				done.
			[3] debug it. [5 min] DONE.
			[4] add vecSections into trace in constructor [15 min] DONE.
			[5] debug it [5 min] DONE.
			[6] in Trace::handle_mem_write use vecSections to tell if it is a writing. [15 min] DONE. OK
			[7] debug.
			[8] copy memRangemanager in full_trace. [15 min]
			[9] debug
				problem: always broke on delete []
				ok. solved.

	8:30AM 07/02/2014
		[5] verify if vecSecs is set up correctly on branch slice [20 min]
			[5.1] need to modify loadFullTrace. DONE.
	9:10AM
		[6] in binWriter::findHole supply a new parameter of memRangeManager
			[a] add function to memRangeManager:: bool hasAddr [15 min] DONE
			[b] change definition and implementation of findHole [10 min]
			[c] debug findHole and see how it works. [15 min]
			[d] test [15 min] NOW It's working
			[e] more testing: generate 20 slices and see the result.
		[7] found new problem: slice 12 c0005
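	Sketch of a memory range manager with addRange and hasAddr as used in Task 340. Only those two method names are from the notes. The std::map storage, half-open ranges and merging of touching ranges are assumptions, chosen so that lookups stay logarithmic once there are more than 1000 items.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical MemRangeManager: a set of disjoint half-open ranges
// [start, end), stored as start -> end in an ordered map.
class MemRangeManager {
public:
    void addRange(std::uint64_t start, std::uint64_t end) {
        if (start >= end) return;
        auto it = ranges_.upper_bound(start);
        // Merge with the range that begins at or before `start`, if it
        // overlaps or touches the new one.
        if (it != ranges_.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= start) {
                start = prev->first;
                end = std::max(end, prev->second);
                it = ranges_.erase(prev);
            }
        }
        // Swallow every following range that the new one reaches.
        while (it != ranges_.end() && it->first <= end) {
            end = std::max(end, it->second);
            it = ranges_.erase(it);
        }
        ranges_[start] = end;
    }

    bool hasAddr(std::uint64_t addr) const {
        auto it = ranges_.upper_bound(addr);
        if (it == ranges_.begin()) return false;
        return addr < std::prev(it)->second;
    }

    std::size_t size() const { return ranges_.size(); }

private:
    std::map<std::uint64_t, std::uint64_t> ranges_;
};
```

	Each memory write of n bytes at address a would be recorded as addRange(a, a + n). findHole can then reject any candidate byte for which hasAddr returns true.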

		
11:00AM		
--------------------------------------------------------------------------------------
Task 341: check problem of slice 12 (0xc000005 problem)
--------------------------------------------------------------------------------------
	[1] compare the slice.
		Found the problem: 00604062 is not included but the instruction that
			writes to the same location 00604060 is included.
		This leads to the "jmp" instruction at 0x00423014 is not processed right.
	[2] re-generate the slice 12. and check the processing of 0x00423014
			problem: bNoDataPropagation is identified because the instruction does not
		need anything (it's a simple jump fixed)
			fix: whenever an instruction is the result of self-extraction should
		propagate all data dependency!
	[3] new problem: 0x00423014 is not in any section! because it's out of section range!
		There must be a bug:
			[1] 0x00423014 is located in section "lgksxcmc" (size 0x1e100 and start: 0x00423000).
			[2] but it is not in the self-extraction mrm!

5:30PM 07/03/2014
	TO DO - check how mrmSelfExtract did not include 0x00423014? -- SO there is  a bug!!!
	[1] start in raw mode and see if it's captured.
		Problem: 0x423014 is destroyed when save to disk
	[2] set a conditional breakpoint.
		Strangely, it did not capture it!
		Shoot, found the problem. It's the destructor!
8:30AM 07/04/2014
	[3] generate 20 slices and check it.
		Problem: generation stops at slice 39.
10:00AM
	[4] new problem: now every slice results in exception.
		[5.1] debug slice 0.
			the problem is that after self-extraction the contents at 0x00423014 is
		still not right.
			Found the problem: the contents are changed. 
		[5.2] check why it is writing to 0x00423014.
			did not get it overwritten in findHole
			firstJump: 424974. 4248d3.
			eipExit: 0x42463a.
			file_offset is 135700
		[5.3] break on lseek when offset is 135700
		-- FOUND THE PROBLEM: it writes the DECODED instruction into the place.
		So fix idea: fix writeInstruction in binWriter.
		New problem: it broke.

10:00AM 07/06/2014
	[5] debug the broke problem: problem is findHoleImmediate
		Problem: it complains that it cannot find instruction 0x4042f9. (line 693)
		Reason: 0x4042f9 is a self-extracted instruction (its corresponding location
	in file does not correspond to the self-extracted form).
		Easy fix: just return -1 when it tries to locate such a hole starting from the 
	instruction
		Long run problem: the entire writeProgramExit will anyway fail because the
	target location is SELF-EXTRACTED. overwrite it will fail.
	===> need to think over the entire solution!!!!

9:00AM 07/08/2014
	[6] modify findHoleImmediate and return false when the instruction is
		self-extracted. [20 min] DONE.
	[7] set a bp at the point and see how it works. [30 min]
		verified: it indeed kills the entire folder (so branch 3 is skipped)
	[8] generate 30 slices and see how it works.
		a lot of exceptions (but no error windows).
		compare.
		This time did not generate a lot of int3 in windbg.
		The current slice did not find any difference.
	[9] check the failed slice.
		First broken one is br16.
		check 0x004249cd.
		check 0x00424b5c.
		check 0x00426324.
		did not reach 0x00563516.
			it broke after zwContinue.
		Start debug from 0x00426324. Not much difference in stack.
		it seems that contents at 0x0012fcd4 is not the same. The sliced version misses one
		word.
			It complains about the access of eab3867e
	It broke after the 3rd shift-f9.

		Note: after the STI it should return to 0x00424c90.
		And it did return.
			Find parameter 2 of zwContinue, and +0xb8 is the next EIP: 0x004236325.
	Setting software BP did hit it in both.

		Then last visited: 0x004272f1
		00561587
		005615b9. (note some info skipped).
		005616a9
		0050c652

	So there is something wrong in the big loop.
	About 4 hrs generating 35 slices. 10 minutes each (500 seconds).

	Most of them suffer the same problem from slice 12.	

	9:00AM 07/10/2014

	[10] analyze slice 12. 	 need to regenerate the slice.
		Now it's slice 16.
	[11] check slice 16.
		There is a big loop which is hard to get out of.
		Hard to find out
		Use binary search to find out how many steps to reach the point of break.
		First use animate over and see how it works.
		use "run trace" in debug menu of IMM to find out what are the previous instructions
	right before crash. Note ecx has the same value as EAX. It must be something like
	a "jmp ecx" or "jmp eax" instruction.

		Find it! (use the trace over to save time)!!! (call 0x004266ca)
		Detailed procedure:
		[1] first a couple of shift+f9 until 0x00426324 (STI)
		[2] then shift+f7 into 0x7c90 range and set a hardware breakpoint at 0x00426325 (right 
			after the STI instruction)
		[3] then shift+f9 to 0x00426325
		[4] in IMM->View, clear and open run trace. Then in Debug->run trace.
		[5] view the trace and find out the "call 0x004266ca" instruction the latest call.
			This part is still extracted, so set a hardware breakpoint on it.
			0x004266ca is actually hit 3 times and then broke.
		[6] Compare it with the regular version of Themider.
			Interestingly it never hits 0x004266ca!
		[7] compare the traces of the two.
			in the wrong trace: it takes only 169 instructions to reach 0x004266ca.
			Trace over is too coarse, use trace into
				Still trace into takes 269 steps to reach
			So just do manual comparison. not working (caused by RDTSC???)
		[8] do in trace comparison again...
			Trick: can use "set condition" to set the number of "command count" to limit
		the number of execution steps in the correct trace (coz it never hits 0x004266ca)
			Two traces differ from:  found the previous observation was wrong.
			The correct trace actually hits 0x004266ca!!!
		[9] binary search. 
			Collect how many instructions until it hits the crash point in bad trace (t16).
			[9.1] Too many instructions: set breakpoint at 0x004266ca and check how many 
		instructions to hit the point.
			[9.2] Observe the trace and set a bp at 0x00513cf0 which is within the 65535 range.
				not work.
			[9.3] observe the trace. The crash is caused by the instruction "jmp ecx" at
		0x004e4643. (it's only hit once). But in the correct trace it's never hit.
 
			
			

	7PM 07/11/2014
		[10] binary search. First use trace over and then look at the difference. Use the
			limit the call approach.
			[10.1] incorrect trace. set hardware bp on 0x004e4643 (jmp ecx).
			0x00426324 (STI) -> 0x7C90EAF0 -> 0x004e4643 (trace over -> 12 instructions).
			So set hardware BP on 0x00426324 and then 0x7c90eaf0 and then trace into.
			There are too many of them use Trace Over again.
			Found that at one zwContinue - eip (at 0xb8 of the input parameter 1 of
		zwContinue) returns to 0x0042632b.
			From Trace into it seems that the system direct reaches 0x004e4643 after
		the sysenter for zwContinue.
		[11] another try: trace into from 0x00426325 to 0x00424643.
			Given the trace: do the binary search.
				Binary search: 
					004e4310 (202) [did not hit]
					005505f0 (1320) [hit] but the jump to 0x004e3ed2 (which eventually falls
		to 0x004e4310 never occurs)

				that's right 0x005505f0 the 4th time hit: 
					0x00438807, 0x0046f907, 004d62b6, 0x004e3ed2
										correct trace  0x004db2b6 0x004c6876.
				There are over 17k instructions between each call. @#@#$ hard to analyze.
	9:30PM 
		[12] collect the two traces and compare.
			Note!!! the last two checkboxes in the file dialog needs to be checked!
			Then use windiff to compare the two files.	
				The trace even recorded the register values!!!
			The difference starts from 0x00550514 where ECX and EDI are the same!!!
		Check why they lead to the different ECX value for "jmp ecx" at  0x005505f0.

		So there must be someway that the slicing messed up somewhere and could not compute
the right slice given the intensive data computing and we could not figure from where.
		What we know is that at location 0x0042bae6 it reads out ECX, and it does not work then.
		-- think about it later
		[13] generate all slices possible.
		broke at slice 39.

	9:00AM 07/12/2014. Check slice 39. Why it's broke
	[14] check slice 39. does not give a lot of information. Needs to bp into slice 39.

9:00AM 07/13/2014
	[14] check slice 39. break on it.
		it is slicing 0x432084.
		Problem: the instruction has a dependency link on itself! 
		Check why
		[14.1] check instruction 0x432084. It is a JNZ instruction. It depends 
	on the previous instruction "CMP DL, 75".
		[14.2] check what is the previous instruction.
			the previous ts is indeed located at 0x432081.
			So the problem is: how is the depLink constructed.
		[14.3] has to regenerate the raw trace and set a bp in InstrExecRecorder.cc on
			eip 0x432084.
		In the new trace generation it's ok. Problem still break on slice 39.
		Needs to improve breakpoint again.
					
	9:30AM 07/14/2014
	[15] check slice 39. The problem is in the raw trace generation.
		set a breakpoint on when it produces the self-pointing depLink; and then
		trace on slice 39.
		Strange: could not capture the production of self-pointing depLink, but
	the slicing algorithm fails on it.
		Now the eip is 0x43f96a.

		Still the same problem: it depends on the previous instruction

	11:00AM 07/18/2014
	[16] read the slice full trace expansion algorithm and see if there are
		any chance that it may produce self-pointing depLink.
	[17] debugging plan:
		[1] run batch slice and find out where it broke
			[a] make a check in REGI (init_data_slice) and check tsTarget<tsCur, if not,
				error_exit
			[b] b gen_slice... and start from slice 37 (note: to do true clear of flags manually
			got branch 39.
				ts: 19457710
				eip: 0x432084
	
		[2] regenerate full slice and stop at that point
		[3] generate trace again.
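	Sketch of the check in [17][1][a]: a dependency link must point to a strictly earlier timestamp, so a self-pointing depLink is caught at the first record that has one. The function name is hypothetical, and throwing an exception stands in for the error_exit of the notes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical guard for init_data_slice: tsTarget is the timestamp the
// current record (tsCur) depends on. It must be earlier than tsCur.
void checkDepLink(long long tsCur, long long tsTarget) {
    if (tsTarget >= tsCur) {
        throw std::runtime_error(
            "depLink of ts " + std::to_string(tsCur) +
            " points to ts " + std::to_string(tsTarget) +
            ", which is not earlier");
    }
}
```

	Failing at the first bad link gives the exact ts and eip to break on when the trace is regenerated, instead of a slicing loop that never terminates.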


	9:00AM 07/19/2014
	[18] debug the full_trace
		[18.1] break on ts: 19457709 and observe 3 more timestamps [20 min]		
			[18.1.1] it breaks on Cache::loadBlock(0) -> loadBlockID 4th time.
				Strange. Not likely a timeout problem.
				Recompile the entire project again. 
			It broke when loading  rr_processor. the binfile is 0.
				The problem seems to be that rr_processor directory is wiped out everytime.
				It is called by  init_rr_processor because the GEN_REQUEST_MODE is 1
				then load_rr_processor is called. Rethink it's logic. 
	10:30AM
			[18.1.2] fix rr_processor. and then regenerate the ts and eip.
			(gdb) p ts
				$4 = 19451515
				(gdb) p/x ier->eip
				$5 = 0x432084

			[18.1.3] check the full load again.
				break on 19451514.
					19451514 -> (eip: 0x431fb9)
					19451515 -> (eip: 0x432081)
					19451516 -> (eip: 0x432084) depends on 19451515
					19451517 -> (eip: 0x4320db) depends on others 19491478
			Conclusion: in branch_slice, the timestamp 19451515 is essentially 19451516

			[18.1.4] further check
			branch_slice:
				ts: 0, eip: 0x7c92289a
				ts: 10000, eip: 0x7c9220e0
				ts: 19451500: eip: 0x431cea
			full_mode:
				ts: 0, eip: 7c92289a
				ts:10000, eip: 7c9220e0
				ts: 19451500, eip: 431ce8
				ts: 

		Seems that there is a compact of raw slice of one instruction difference.

	3:30PM 07/22/2014
	[19] BINARY SEARCH of the incompatible place.
		approach: compare the full mode and branch_slice ts:
			break at line 217 of InstrExecRecorder.cc in full_mode
			break at Trace::gen_slice in branch mode and then use
			Trace->loadIER_II() to load and check.
			ts: 8000000, full: 0x563419, branch: 0x563419

			ts: 16400000, full: 56db6b, branch:  56db6b
			ts: 16400010, full: 56db8c , branch:  56db8c
			ts: 16400020, full: 56dddd, branch:  56dddd

			ts: 16400021, full: 56dddf, branch:  56dddf 
			ts: 16400022, full: 56dde1, branch:  56dde1
			ts: 16400023, full: 56dde2, branch:  7c90eaec  ****!!!!!! HERE'S THE DIFFERENT POINT. Why would branch slice miss one!!!
			ts: 16400024, full: 7c90eaec, branch:  7c90eaf0
			ts: 16400025, full: 7c90eaf0, branch:  7c90eaf3

			ts: 16500000, full: 76b42a67, branch: 76b42a6a
			ts: 17000000, full: 571769, branch: 57176b 
			ts: 18000000, full: 57175b, branch: 57175d * 
			ts: 19000000, full: 42677f, branch: 426781
		****************************************************************
			ts: 19451512: full: 431f42, branch:  431fb6 ****
			ts: 19451513: full: 431fb6, branch: 431fb9 
			ts: 19451514: full: 431fb9, branch: 432081
			ts: 19451515: full: 432081, branch: 432084
			ts: 19451516: full: 432084, branch: 4320db

	Strangely, it departs from trace id: 19451512 the EIPs are completely different
from each other!

	Check winxp image and look at what is located at
	0x431f42: JNZ 431FB6
	0x431fb6: CMP DL, 95
	0x431fb9: JNZ 432081

	So there is a ONE difference between raw trace and full trace. Needs to re-do the
timing of full trace. The trick is to use raw instead of the newly constructed full
trace
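	Sketch of the binary search in [19]: find the first timestamp at which the full-mode EIP and the branch-slice EIP differ. The two vectors stand in for the trace caches (an assumption). The search also assumes that once the traces diverge they stay different at every probed ts, which holds for a one-record shift only as long as neighbouring EIPs differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the first index where the two EIP sequences differ, or the
// length of the shorter one if no difference is found.
std::size_t firstDivergence(const std::vector<std::uint32_t>& full,
                            const std::vector<std::uint32_t>& branch) {
    std::size_t lo = 0;
    std::size_t hi = std::min(full.size(), branch.size());
    // Invariant: the probes so far matched below lo and differed at hi
    // (or hi is the end), so the answer lies in [lo, hi].
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (full[mid] == branch[mid]) {
            lo = mid + 1;  // still in sync here, look later
        } else {
            hi = mid;      // already diverged, look earlier
        }
    }
    return lo;
}
```

	On a trace of about 19 million records this needs roughly 25 probes instead of the manual stepping through 8000000, 16400000, 16400010 and so on. The timestamps next to the result should still be checked by hand, because a probe can hit a ts where the shifted trace has the same EIP by coincidence.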


	07/30/2014
	9:00am 
	[1] identify exactly the location of error using binary search [20 min]
	9:30
	[2] analyze why branch slice miss the following record and causes the shift of one. 	
			ts: 16400022, full: 56dde1, branch:  56dde1
			ts: 16400023, full: 56dde2, branch:  7c90eaec  ****!!!!!! HERE'S THE DIFFERENT POINT. Why would branch slice miss one!!!
		[2.1] break on construct full trace timestamp 16400022 and see what is the full trace generated. [15 min]
			Found the problem: line 230 of InstrExecRecorder.cc skips the construction of the
			record because there is no record of the instruction in InstrStore.
		[2.2] first fix: the log error should be replaced by a Util::error call, because there is no way to proceed.
	10:00
		[2.3] verify if it's always the same for 56dde2 (eip). construct the raw trace first and then the full trace.
			[a] 1st time: 56dde2
			[b] 2nd time: 56dde2 again.
	10:30
		[2.4] break on 56dde2 and see how it's handled. break on helper_trace2.
			check how many times 56dde2 is hit.. only hit once
	10:45
		[2.5] trace into 56dde2
			temporarily change the timeout from 240 to 2400 because it will terminate the thread.
			It did add to the instruction store: opcode is 0xf.
			first 4 bytes are: 0x0f    0x3f    0x07    0x0b
		Problem: it seems that the instruction 0x0f, 3f, 07, 0b is not recognized as an instruction. Check it again.
		The problem is that x86_disasm reports that it's an invalid instruction.
		So it looks like a trick by the packer (it's an invalid instruction and then triggers the interrupt handler?)

	11:15
		[2.6] check 56dde1 and 56dde2 in themider program. Use IMM 
			At 56dde1, there is an INC EAX command (0x40)
			At 56dde2, this is an invalid instruction (0x0f 0x3f)
			Using IMM to run it breaks at 56dde2 which complains about an invalid instruction but then shift+f9 jumps to
			56dded (I suspect that themider has set up a certain exception handler technique to jump to it).
		[2.7] fix for the qemu:
			InstrInfo::load_instr (when the len is 0, which means that the current instruction is invalid, generate an 
		alert message, and take at least one byte for length). This will allow that the search of the opcode will
		still succeed.	
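	Sketch of the fix in [2.7]: when the disassembler reports length 0 for an invalid instruction such as 0x0f 0x3f, warn and record a length of one byte so that a later lookup by eip and opcode still finds an entry. The function name and the message are assumptions. The real change lives in InstrInfo::load_instr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: map the decoder's length to the length that gets
// stored. A length of 0 means "invalid instruction" and is replaced by 1.
std::size_t instrLengthOrOne(std::size_t decodedLen, std::uint32_t eip) {
    if (decodedLen == 0) {
        std::fprintf(stderr,
                     "alert: invalid instruction at 0x%08x, "
                     "recording 1 byte\n",
                     static_cast<unsigned>(eip));
        return 1;
    }
    return decodedLen;
}
```

	The record for 56dde2 then exists in the instruction store, so the full trace no longer drops that timestamp and the one-record shift against the raw trace disappears.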

	11:50
		[2.8] unit testing. multiple errors. all passed.
		[2.9] run batch slice again and see if it breaks on 56dde2
			full slice problem is avoided. 
			branch_slice: set bp on Trace.cc:1445 and then re-init count and let it start directly at slice 39.
			Note line 1223 has a Util::error exit, it will break if the slice number is not ok.
		[2.10] problem: complain about rr_processor. Let it run from slice 0.
			Seems to be ok. Already processed to slice 45.
11:30AM 07/31/2014.
		There are still about 1/3 103s.
			Slicing did not find the debugger. Seems need to analyze themider and see what's the technique it is using.
			

8:4AM 08/01/2014.
--------------------------------------------------------------------------------------
Task 342: speed up SOC problem.
--------------------------------------------------------------------------------------
	8:55AM
	[1] collect stats. Let it run 1 hr and see how many slices it is generating.
		GENERATE branch 22 of 242 
		 *****************
		!!!! init_data_sice takes 93.410000 seconds.
		!!! identify SOCs takes 95.680000 seconds!
		!!!! init_data_sice takes 107.260000 seconds
	10:00AM
	[2] design of modification.
		[1] in addTS: directly call addSOC. [5 min] DONE.
		[2] add private function: hasSOCStart and hasSOCEnd [15 min] DONE.
		[3] change identifySOC: remove getCountInSlice. It seems that insertSOC needs not to be done. [5 min] DONE.
		[4] fix getSOCStart [15 min] DONE.
		[4] debug: take a slice and do it. [25 min]
			[4.1] found problem with addTS.
				SOCManager::addTS, 

	7:15pm
	[3] RUN ANOTHER ROUND. for 1 hr.
		Does not improve a lot. GENERATE branch 23 of 242.
	Problem: MAX soc size is too small.

	10:00PM readjust the max soc size to 128000 from 128 and pass iteration from 5 to 10. Look at the result.
		10:15pm start.

	8:45AM 08/02/2014
	[4] figure out why it's so slow. Sampling
		findInsertionLocation: 4
		hasSOCStart/end: 5
		It seems that most of the time are spent on sequential search.
	9:15AM
	[5] collect the running data for slice 0.
		!! identify SOCs takes 276.960000 seconds!
		TOO MANY SOCs, use direct/one pass slicing!
	[6] improvement: [20 min] 
		findInsertionLocation: use binary search.
		hasSOCStart and hasSOCEnd, all use binary search. DONE.
		after hasSOCStart and hasSOCEnd, improved to 135 seconds.
	[7] improvement on findInsertionLocation, just add a check for the last case. [15 min]
		New timing: 122 seconds. Not improve much. Still has to try binary search. --> push to later
	[8] increase MAX SOC limits again.
		first slice has 157451 SOCs.
		Stats listed below: !!! identify SOCs takes 345.100000 seconds!
			!!! write program takes 13.610000
			!!! gen_slice takes 445.900000 seconds!
	[9] debug into the 0'th slice and look at how slices are propagated and see if there is anything we can do.
		Look at how init_slice are collected.
			first init_slice: 68.
				First couple of ts: 1153628. 1153627 (merged with previous one). 1153623. 1153622 (merge with previous.
					1153618).
		
	First problem: do we really have to mark every ts, whose instruction is marked?
			Now, because of the check of ii->isInSlice(), first init_slice grows from 68 to 28763 -> 28674 ->28678->157456->.
	Another problem is when pass reaches the limit, it did not report fail.

	7:30PM 
	[10] remove the logic on ii and repair the logic on pass no.
		no need to fix pass.
	[11] debug slice #0.
		sm size: 51 -> 55 -> 1102 ->  221-> 209
		There are too many passes and it broke.
	[12] check why there are so many passes
		set a breakpoint at the last line of socmanager::addTS and set breakpoint at all bSuccess=false in
	socmanager::verify_and_reset_soc. 
		also break on 

		bModified: 1153634, 1153636

		Other more important causes: gen_bridge introduce new dependency. fix dump operations of InstrInfo and InstrExecRecorder
	It's already existing.

	[13] check why setBridge needs to move along the chain. This seems a bad choice to introduce more.
		For example: tsEnd: 1039116 (dec [edi]), 1039117 (inc edi), 1039118 (dec ecx)
		trace into setBridgeTo for 1039117.
		Problem: 1039117 is marked in slice? why?

	08:45AM 08/04/2014
	[14] check why 1039117 is included in slice and check when it is on bridge. [20 min]
		Observation: ier does not show 1039117 is in slice, but ii->isInSlice() is true.
		Set another breakpoint on ii->setInSlice on eip: 424945, instruction is "inc edi"
			424945 is set in slice in init_data_slice for timestamp: 1039113. It is depended on by 1039116 (0x424945)
		When the setBridge is called: the current soc list has two socs:
			(1153635, 1153636), (1153627, 1153628), 
			The SOC which tries to be added is: (1039116). Its bridge on 1039117 fails because 1039117 has the
		same eip as 1039113, which is already in slice [1039113 is not added yet as an SOC]. This seems fine.
		Then it introduces new dependencies. The entire thing looks like a loop:

		@424943: dec       [edi] #<--- jump back from 0x424947
		@424945: inc       edi
		@424946: dec       ecx
		@424947: jnz       0xFFFFFFFC

		Check how many of these are actually in slice: (altogether 68 data points).
			1153635: @424974, @424974: jnz       0x0000001B
			1153628: @424953, @424953: sub       edx, 0xF0000000
			1153627: @42494e, @42494e: mov       edx, 0xF0023000  #load contents of 0xf0023000 into edx

		
			== the rest are in the loop
			strangely it did not discover 1039116: @424943
			1039116: @424943
			1039113 @424945
			1039112: @424943
			1039109: @424945
			1039108: @424943
			1039105: @424945
			1039104: @424943
			1039096 @424943
				repeating until:
			1038925: @424945
			----
			1038923: @42493d: lea       edi, [ebp+0x86B1935] (this may be loading from a global init data variable)

	  ** So the entire slice should include the first 3 instructions and the decoding procedure which reads from
		ebp+0x86b1935 and performs the 4-instruction decoding. Then the result is compared with 0xF0000000.
		The slice should have 4 SOCs only!
	  ** but the algorithm does not yield the best SOC solution.

		The current SOC works like this
			Add (1153635, 1153636) ok, add (1153627, 1153628) ok,
			Now add 1039116, intends to use 1039117 as bridge but cannot, because 1039113 is in slice (but not added yet)

	7:40PM continue the trace into the slicing algorithm.
		soc0. (1153635, @424974, 1153636)  --> why is 1153636 added? -- 1153636 is included because 1153635 is a jump instruction.
	Actually, seems no need.

		soc1. (1153628, @424953), (1153627, @42494e)

		soc2. (1039116@424943 dec [edi], which is read by @42494e). single soc (1039116, 1039116), but when finding bridge,
			1039117 (@424945  inc edi is already in slice, in earlier iteration, 1039117 itself is not in IER slice)
			Then 1039118 @424946 dec ecx is not in slice, it is used as the bridge (2 bytes) -> it uses @424947 as well

		soc3. (1039113 @424945 inc edi, 1039113) -> this is wrong, because 1039116 is already used.   WRONG. it uses the
			same bridge for overwriting.

	8:45AM 08/05/2014
	[1] read about Hadoop for future improvement. [0.75 hr]
	The algorithm needs major improvement.
	9:30AM
	[2] design algorithm for improvement. [0.25 hr]
	9:45AM
	[3] implementation
		[1] hasSOCStart and hasSOCEnd. [1 hr]
			[a.1] should establish a hashmap for socstartEIP, socsendeip. [8 MIN] DONE.
			[a.2] hasSOCStart and hasSOCEnd change the parameter to EIP [10 min] DONE.
			[a.3] change call to hasSOCStart [8 min] DONE.
			[a.4] function addSOCStart and addSOCEnd [8 min] DONE.
			[a.5] modify insert_into_vec, add a parameter of trace.	[15 min] DONE.
			[a.6] recheck the implementation any place that modifies soc.tsStart or tsEnd. [30 min]. DONE
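
A minimal sketch of the EIP-keyed lookup from [a.1]-[a.4]. The class and method names here are hypothetical, not the real SOCManager code. Assumption: it keeps a count per EIP instead of a plain set, because several timestamps in a loop share one EIP (e.g. 1039113 and 1039117 are both @424945).

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical index of SOC boundary EIPs (illustrative only).
class SocEipIndex {
public:
    // Record one more SOC boundary at this EIP.
    void add(std::uint32_t eip) { ++count_[eip]; }

    // True if at least one SOC boundary is recorded at this EIP.
    bool has(std::uint32_t eip) const { return count_.find(eip) != count_.end(); }

    // Drop one boundary; returns false if the EIP was not recorded.
    bool remove(std::uint32_t eip) {
        auto it = count_.find(eip);
        if (it == count_.end()) return false;
        assert(it->second > 0);
        if (--it->second == 0) count_.erase(it);
        return true;
    }

private:
    std::unordered_map<std::uint32_t, int> count_;  // eip -> number of boundaries
};
```

One index would be used for SOC starts and another for SOC ends. With counting, removing the boundary of one timestamp does not erase the entry still needed by another timestamp with the same EIP.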
	11:00AM
			[a.6] debug into the trace [15 min]
				[b.1] debug into addSOCStart, addSOCEnd, hasSOCStart, hasSOCEnd, removeSOCStart, removeSOCEnd	
				[b.2] when get to soc3 (1039113, @424945, should return true on eip).
			It can now compact the loop, but it is overly aggressive. It now includes soc 1 (1153628, 1153627) which is
				too far away (when doing full slice, it's going to introduce too much).
	12:00pm
				[b.3] Util::error_exit on already exists in map.
					check line 443 and examine what is being removed and what is being added.
					first time hits, generates the fault.
						$35 = 714089
						>>>>>(gdb) p soc.tsStart
						$36 = 714078
				[b.4] one more but related to line 431. hit 3 times.
					When line 431 tries to remove 385774, it broke.
	7:30Pm
					set a BP at InstrExecRecorder: 385774 in trace.h and enable it after line 431
					It may be removed by another ts that has the same EIP 0x604049.


	8:45AM 08/06/2014
	[4] debug the bridge algorithm.
		[4.1] for the first slice, after the first pass, check all SOCs.
			The first pass generates 4 slices:
			(1153635, 1153636),
			(1039104, 1153628), //THIS ONE IS OVER EXPANDED.
			(1039093, 1039101)
			(1038923, 1039090)

			This leads to explosion of dependencies
		9:15AM [20 min]
		[4.2] find out why in the first pass, it expands from (1039104, 1039116) to 1153628.
			break on the first slice and break before pass++
			It merges from soc 6: 1, 2, 2, 3, 4, 5, 2 (sm.size)
			The merge happens at slice count 5 -> 2. (after addTS)
			the related SOCs are:
			(gdb) p sm.vecSOCs[2]
{tsStart = 1039116, tsEnd = 1039116, bModified = true, tsBridge = 1039118, room = 3, tsNextStart = 1153627}
{tsStart = 1039111, tsEnd = 1039114, bModified = true, tsBridge = 1039115, room = 3, tsNextStart = 1039116}
{tsStart = 1039109, tsEnd = 1039109, bModified = true, tsBridge = 1039110, room = 1, tsNextStart = 1039111}
			It looks like all the bridges will fail. The next ts to add is 1039108. identifySOC merges and expands to 1153627.

			Problem: the search for the loop end goes directly from 1039116 to 1153626. Needs more examination.
			Found that between 1039116 and 1153626 it is still inside the loop. So the dependency is only reading
	a very small portion of the data generated by the loop.
			Verified: the merge of slices does not cause any problem though. The last one is 1153626 (0x424949), which is
	just out of the loop. No further improvement can be made for it.
			After the first pass, the SOCs are
			(1153635, 1153636), (1039104, 1153628), (1039093, 1039101), (1038923, 1039090)
		Note: during the bridge-setting process, init_data_slice is called many times, which is not necessary because these
	bridges are later merged. verify_and_reset_soc does not seem ok: it did not count the bridges. Then in the second
	iteration, it introduces new data dependencies.
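
A rough sketch of the merge effect seen in [4.2]: neighbouring SOCs collapse into one when the gap between them is too small to hold a bridge. This is not the real identifySOC logic. The `SocRange` type, the function name and the `maxGap` rule are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct SocRange { long long tsStart; long long tsEnd; };

// Merge ranges that are separated by at most maxGap timestamps (hypothetical rule).
inline std::vector<SocRange> merge_socs(std::vector<SocRange> socs, long long maxGap) {
    std::sort(socs.begin(), socs.end(),
              [](const SocRange& a, const SocRange& b) { return a.tsStart < b.tsStart; });
    std::vector<SocRange> out;
    for (const SocRange& s : socs) {
        assert(s.tsStart <= s.tsEnd);
        // Gap = number of timestamps strictly between the two ranges.
        if (!out.empty() && s.tsStart - out.back().tsEnd - 1 <= maxGap) {
            out.back().tsEnd = std::max(out.back().tsEnd, s.tsEnd);
        } else {
            out.push_back(s);
        }
    }
    return out;
}
```

With maxGap = 1, the three small SOCs from the gdb dump, (1039109, 1039109), (1039111, 1039114) and (1039116, 1039116), become one range (1039109, 1039116), while (1153635, 1153636) stays separate.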

		11:30AM
		[4.3]  check why verify_soc failed to identify failed bridges. Pass the first pass and then trace into it.
				soc0 does not need a bridge
				soc1 (1039104, 1153628) uses bridge 1153629: @42498f. success.
				soc2 (1039093, 1039101) uses bridge 1039102: @424946, it's part of the SOC but not in slice. [SO THIS IS WRONG HERE!]
				soc3 (1038923, 1039090) uses @424947 as the bridge, which is not fine either.
		[4.4] proposed fix:
		7:45PM 
			[1] in InstrInfo add a flag IN_SOC, and add functions setInSOC(), unmarkInSOC(), isInSOC(), and 
					update clear_in_slicetags in Trace() [15 min] DONE.
			[1.5] unit test. [10 min] DONE.
			[2] refactor: SOCManager::findNextSOCEnd() [15 min] DONE.
			[2.5] test the current implementation. [10 min] DONE.
			[3] modify soc::setBridgeTo so that it keeps searching until there is enough room, and make sure to update
				the tsEnd [20 min] DONE.
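
A minimal sketch of the "keep searching until there is enough room" idea from [3]. It is not the actual soc::setBridgeTo. The `Instr` and `Soc` structs, the function name `find_bridge` and the `needed` byte count are assumptions for illustration. The loop is bounded by the trace length, so it cannot spin forever.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Instr { long long ts; std::uint32_t eip; int length; bool inSlice; };
struct Soc   { long long tsStart; long long tsEnd; long long tsBridge; int room; };

// trace must be sorted by ts. Returns true once `needed` contiguous bytes
// of non-slice instructions are found right after the (possibly grown) SOC.
inline bool find_bridge(Soc& soc, const std::vector<Instr>& trace, int needed) {
    assert(needed > 0);
    std::size_t i = 0;
    while (i < trace.size() && trace[i].ts <= soc.tsEnd) ++i;  // skip the SOC itself
    soc.tsBridge = -1;
    soc.room = 0;
    for (; i < trace.size(); ++i) {
        if (trace[i].inSlice) {
            // Cannot be overwritten: absorb it into the SOC and start over.
            soc.tsEnd = trace[i].ts;
            soc.tsBridge = -1;
            soc.room = 0;
            continue;
        }
        if (soc.tsBridge < 0) soc.tsBridge = trace[i].ts;
        soc.room += trace[i].length;
        if (soc.room >= needed) return true;
    }
    return false;  // ran off the trace without finding room
}
```

For the SOC (1039116, 1039116) with 1039117 (inc edi) already in slice, this grows tsEnd to 1039117 and picks 1039118 as the bridge, with room 3 when `needed` is 3 (dec ecx is 1 byte, jnz is 2 bytes).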

		9:20AM 08/07/2014
			[4] debug again. [30 min]
				problem. too slow. found an infinite loop in setBridge.
				[4.1] fix it and add soc end logic. 
				[4.2] handling bug: eip not in map.
				[4.3] further protection when removing and add soc end.
				Problem: isInSOC is never hit because setInSOC is never called!
		11:00AM
			[5] need to revamp the implementation of SOC, set the data members to be protected and set up the 
					set methods.
				[5.1] add protected resetII_SOCsInRange(long long int tsStart, long long int tsEnd); [10 min] DONE
				[5.3] change all public attributes to protected [5 min] DONE.
				[5.4] add inline get function [10 min] DONE.
				[5.5] add inline set function [10 min] DONE.
				[5.6] fix all syntax errors [20 min] DONE.
			--- UNIT TEST fails!
				[5.7] make it compatible with old version (use on trace) 
				all unit test now passed.
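
A small sketch of why [5] helps: once tsStart/tsEnd can only change through setters, the IN_SOC marks cannot be forgotten. All names here (`InstrFlags`, `TraceFlags`, `SocSketch`) are hypothetical stand-ins. Simplification: flags are keyed by timestamp, which may differ from how InstrInfo is really stored.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for InstrInfo; only the IN_SOC bit is modelled.
class InstrFlags {
public:
    void setInSOC()      { flags_ |= IN_SOC; }
    void unmarkInSOC()   { flags_ &= ~static_cast<unsigned>(IN_SOC); }
    bool isInSOC() const { return (flags_ & IN_SOC) != 0; }
private:
    enum : unsigned { IN_SOC = 1u << 0 };
    unsigned flags_ = 0;
};

using TraceFlags = std::map<long long, InstrFlags>;  // ts -> flags (simplified)

class SocSketch {
public:
    SocSketch(TraceFlags& trace, long long tsStart, long long tsEnd)
        : trace_(trace), tsStart_(tsStart), tsEnd_(tsEnd) {
        assert(tsStart <= tsEnd);
        mark(true);
    }
    long long tsStart() const { return tsStart_; }
    long long tsEnd() const   { return tsEnd_; }

    // Changing the end re-marks the covered range, so marks never go stale.
    void setTsEnd(long long tsEnd) {
        assert(tsEnd >= tsStart_);
        mark(false);
        tsEnd_ = tsEnd;
        mark(true);
    }

protected:
    // Set or clear IN_SOC on every instruction in [tsStart_, tsEnd_].
    void mark(bool on) {
        for (auto it = trace_.lower_bound(tsStart_);
             it != trace_.end() && it->first <= tsEnd_; ++it) {
            if (on) it->second.setInSOC(); else it->second.unmarkInSOC();
        }
    }
    TraceFlags& trace_;
    long long tsStart_;
    long long tsEnd_;
};
```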

		8:45AM 08/08/2014
			[6] debug the slice algorithm [40 min]
				[a] get and set methods
						setTsStart, setTsEnd OK.
						setIISOCInRange. OK.
						setBridgeTo OK.
						hasSOCStart, addSOCStart, removeSOCStart, removeSOCEnd, addSOCEnd ALL ok.
				9:25AM
				[b] Trace::gen_slice [15 min]
					[b.1] bug: get_room is wrong. bridge should not be ok for loop iterations.
						So, setBridge should include it in SOC as well.
				9:35AM [25 min]
				[c] fix SOC::setBridgeTo
					Success. Now after the 1st round the 3 SOCs are:
					{tsStart = 1153635, tsEnd = 1153636,}
					{tsStart = 1153627, tsEnd = 1153628}
					{tsStart = 1038923, tsEnd = 1039117}
				They are the most compact SOCs around the data slices.
					Now the problem is that after several rounds, new dependencies are introduced and the 3rd SOC grows very large.


10:30AM 08/08/2014
--------------------------------------------------------------------------------------
Task 343: simplify the SOC dependency problem (shrink its size)
--------------------------------------------------------------------------------------
	[1] trace into the full slice of SOC [1153635, 1153636] and see if there is any new dependency introduced [10 min]
			1153636 introduced new dependency on 1038938 (which is not necessary)
			root cause: 1153636 should not be included.
			tsEnd can be a jump instruction. The next instruction will be its bridge. The only problem is the writeProgramExit
			--> verified: writeProgramExit will OVERWRITE the conditional jump instruction anyway.
		11:45AM [15 min]
		[1.1] comment out  getSOCEnd's part. Problem. Needs to comment out check on jump control of setBridgeTo.
		seems ok now, but now only two SOCs. needs to check.

	7:30PM
		[1.2] debug into getSOCEnd and see why there are only two SOCs now.
			sm size shrinks from 3 to 2 at i=1039109	
			Normal, from 1039117 to 1153627 there are lots of loop iterations (which is not included in data slice).
	[2] trace into the full slice of SOC [1153627, 1153628] and see if there is any new dependency introduced [10 min] DONE.

9:30AM 08/09/2014
	[3] trace into slicing.
		Two slices: 
			soc0: [1153635, 1153635]
			soc1: [1038923, 1153628] but actual data slice should be from 1039116 (downward)

		[1] soc0: [1153635, 1153635].  @424974: jnz       0x0000001B 
			the ii->isJumpNeedData() returns true and forces the bNoDataProgation to be false (will propagate data
		dependency).
			Dependencies: 1153628 This is in the second slice. CORRECT. the program exit does need the register condition.
				1153628: @424953: sub       edx, 0xF0000000	
				1039096: ins @424943: dec       [edi] 
				1039100: ins @424943: dec [edi] 
				1039104: same
				1039108: same
				1039112: same
				1039116: same
			It has memory dependencies because it is self-extracted (6 bytes as the result of extraction). So one register dependency
		and 6 memory dependencies.
			All these are in the SOC1 [1038923, 1153628] and they did not introduce more than the original set of data slice

		[2] soc1:  [1038923, 1153628] but the actual data slice should be from 1039116 (downward)
				1153628: propagate to
					1153627: mov       edx, 0xF0023000
					1038964: @424943: dec       [edi] [the following are self-extraction data slice - should be originally there ]
					1038968, @424943: dec [edi]
					1038972, 1038976, 1038980: @424943: dec [edi]
				1153626: control link
					Propagate to 1038924, 1038928, 1038932 ... 6 links (all self-extraction links)
				1153625: jnz       0xFFFFFFFC (control link)
				1153624: @424946: dec       ecx (this is reasonable, because it is needed by 1153625)
				1153622: @424943: dec       [edi] (WHY???)
				1153621: ins @424947: jnz       0xFFFFFFFC (control link), it leads to 1153620
				1153620: @424946: dec ecx, depends on 1153616
				1153619: @424943: dec [edi] (why?????)
		11:15AM
			[2.1] check why @424943: dec [edi] is included.  They are introduced as setVisit (because of control link of the block)
				Check propagation of such instructions:
					bVisitControlLink is true, bNoDataProgation is true. 
					ii->hasMemopWithRegOp (because of EDI), then set bNoDataProgation to false!!!
						propagated dependency to: ins @424945: inc       edi
						Strangely, it does not have memory dependency link! The other is the control link!
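
A generic backward-slicing sketch over reg/mem/control links, to show how one flag can switch a whole class of links on or off. This is a textbook worklist traversal, not the real Trace::gen_slice. The types and the `followControl` flag are assumptions; the real code uses bVisitControlLink and bNoDataProgation with more cases.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <vector>

enum class LinkType { Reg, Mem, Control };
struct Link { long long from; LinkType type; };  // `from` = timestamp depended on

// ts -> links to the earlier timestamps it depends on
using DepGraph = std::unordered_map<long long, std::vector<Link>>;

inline std::set<long long> backward_slice(const DepGraph& deps, long long start,
                                          bool followControl) {
    std::set<long long> slice;
    std::vector<long long> work{start};
    while (!work.empty()) {
        long long ts = work.back();
        work.pop_back();
        if (!slice.insert(ts).second) continue;  // already visited
        auto it = deps.find(ts);
        if (it == deps.end()) continue;          // no recorded dependencies
        for (const Link& l : it->second) {
            assert(l.from < ts);                 // dependencies point back in time
            if (l.type == LinkType::Control && !followControl) continue;
            work.push_back(l.from);
        }
    }
    return slice;
}
```

Turning `followControl` on pulls in everything reachable through control links as well, which is the kind of growth seen above when the loop's control links drag in `dec [edi]`.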
					

		Problems Fix: 
		11:30am
			[1] check why ts:1153619: @424943 does not have memory link?
				break on Trace::expandFromRaw and check the construction of memory link.
				No one writes to it, the contents is originally in the PE image. so it should be fine. But strangely ts shifts -1.
				Now problems: it crashes on appendRecord. Problem is that CallAdjustRecord, when deleting itself, failed to write to cache.
			==> FIXED.

		7:15PM 08/09/2014
			[2] when hasMemopWithReg, should not enable bNoDataProgation.
			==> OK. there is already code handling it.

			[3] For SOC, if bSOC is set, the initial slice should include everything that is already in slice; should do a 
				pre-scan first. 
	
			7:30PM	
			==> a for loop which does the job. 
			[4] tsEnd, if not originally in slice, only need to be set to needVisit.
			==>modified

			8:20pm
			[5] TEST [3] AND [4].
				Seems fine.
----------------
			soc0: [1153635, 1153635]
			soc1: [1038923, 1153628] but actual data slice should be from 1039116 (downward)
-------------
				After full_slice all soc, check what are the newly added slice instructions.
				Newly added TS: 
						1038922, ins @424938: mov       ecx, 0x00007000 //NOTE. right before 1038923 (reasonable, it's the loop counter)
		Second pass leads to the following SOCs.
-------------------------
soc0:{tsStart = 1153635, tsEnd = 1153635}
soc1:{tsStart = 1038922, tsEnd = 1153628} [only the tsStart is shifted once to the left]
-------------------------
			Result: only 10 instructions in slice (in store).

		[STOP HERE] multiple pass passing error, check. DONE.
	


9:00AM 08/11/2014
--------------------------------------------------------------------------------------
Task 344: check the function processing error.
--------------------------------------------------------------------------------------
[1] read processFunction code and check the error message.
[2] take slice 12.
	Problem instruction: timestamp 3116233, eip: 7c91003f. ns @7c91039f: ret 
	find out how it's included in slice.
	Run slice to slice 12. 

	processFunction:  tsStart: 3115445, tsEntry: 3115510, tsRightAfterRet: 3116236
9:00AM 08/12/2014 [30 min]
[3] debug into processFunction and check when ts:3116233 is added.
	[a] set a breakpoint at call Trace::gen_slice, after the 1st run change the id
	[b] set a conditional bp at InstrExecRecorder->setInslice (compile)
	[c] set a conditional bp at Trace::processFunction on tsRightAfterRet: 3116236
	take slice 12.
	Observation: 
		[1] instruction 3116233 is first set in slice for full_slice of SOC: 3115445, 3116317
		because ii is inSlice. Strangely, don't know why ii is in slice.
		[2] check process function, why it's not identified if it's already in slice.
				when processing 3116233, its reverse pointer is set to 5435027.
				It directly returns false, seems not right.
				The instruction is not needed for mem, reg, visit etc.
			fixed. there are two bugs (one simple condition problem)
[4] still problems with slice 16, in the following:
	Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 3, ts: 5596085, eip: 7c913288!
	Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 3, ts: 5595297, eip: 7c915187!
	Something wrong in Trace::processFunction(), there should be no data (mem) dependency, reversePoinerType: 3, ts: 5595247, eip: 7c91506b!
	[1] check how 5596085 is set. break on slice 16.
		it is set in slice by processFunction (entire function in slice)
		(5591054, 5596291) [callentry, rightAfterRet]
	Reports error in (5596022, 5596087) [a function nested inside]
		The question is why is its reversePointerType 3?
			first time the linkType is set to 7. It's 3. and not updated because
		it does not check the pass number.
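
A small sketch of stamping the link type with the pass number, so a value left over from an earlier pass is treated as unset. The names are hypothetical. Assumption: within one pass the first value written wins.

```cpp
#include <cassert>

// Hypothetical per-instruction reverse link tag.
struct ReverseLink {
    int type = 0;   // link type bits (meaning not modelled here)
    int pass = -1;  // pass in which `type` was last written
};

// Overwrite a stale value from an earlier pass; keep the first value of this pass.
inline void set_link_type(ReverseLink& link, int type, int currentPass) {
    assert(currentPass >= 0);
    if (link.pass != currentPass) {
        link.type = type;
        link.pass = currentPass;
    }
}

// Read the type, ignoring anything written in a different pass.
inline int link_type(const ReverseLink& link, int currentPass) {
    return link.pass == currentPass ? link.type : 0;
}
```

With this, a leftover type 3 from pass 0 is replaced by 7 on the first write in pass 1, instead of silently staying 3.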

9:15AM 08/13/2014
[5] seems to fix the complaint about processFunction.
	check the timing problem. DONE.

9:40AM
[6] slice 18 again has problem. check.
	[6.1] add slice quality report (doesn't cost anything)
			add timing report for slice quality report.
	[6.2] issues:
			slice 12 fails on SOC identification. Check why.
	[6.3] identify data: slice 18.
		(5596022,5596087) for ts 5596085
		Strangely did not find it.
		found that it's slice 19.
	It seems that it first needs one full slice, and then in the second slice it broke.
		Problem: this->pass_no_inslice is not cleared. fixed.



9:00AM 08/15/2014
--------------------------------------------------------------------------------------
Task 345: try to improve the data slice again.
--------------------------------------------------------------------------------------
[1] observe slice 12.
	[1] let slice 0 do the init
	[2] set slice id to 12
	[3] break on init_data slice and check the slice size.
			init data slice is 2790153 (big number!)
			first pass: sm size: 99.
			2nd pass: sm size: 77
			3rd pass: sm size: 66
			4th pass: sm size: 63
			5th pass: sm size: 61
			6th pass: sm size: 57 
	[4] observe the init_data_slice what data are included.
		[1] 5435262, @426dac: jno       0x00000008
		[2] 5435245, @426d8c: sub       dx, 0x9B5D
		[3] 5435221, @426471: sub       edx, ecx
		[4] 5435220, @42646b: xor       ecx, 0x020A4E99
		[5], 5435218, @426462: mov       edx, [esp]
		[6] 5435217, ins @426461: pop       ecx
		[7] 5435216, @426460: push      edx
		[8] 5435215, add       edx, 0xE9ECBFEB
		[9] 5435214, @426454: or        edx, 0x78AD0634
		[10] 5435213,  @42644e: or        edx, 0x67660CBB
		[11] 5435212, @426449: mov       edx, 0x42A846CA
		[12] 5435211, @426446: mov       [esp], edx
		[13] 5435208, @42643e: add       edx, ebp
		[14] 5435206, ns @42643b: add       edx, ebx
		[15] 5435204, @426438: add       ebx, ecx
		[16] 5435203, @426432: xor       ecx, 0x39C7B881
		[17] 5435202, ins @426430: neg       ecx
		[18] 5435201, ins @42642b: mov       ecx, 0x0C5722D4
		[19] 5435199,  @426428: neg       ebx
		[20] 5435198, ins @426423: mov       ebx, 0x5E99253A
		[21] 5435194, ins @426417: mov       edx, eax
		[22] 5435192, @426415: pop       eax
		[23] 5435191, @426412: mov       [esp], ebx
		[24] 5435188, @42640a: xor       ebx, edi
		[25] 5435187, @426405: mov       ebx, 0xA38B5286
		[26] 5435186,  ins @426400: mov       edi, 0x6B5A7E3F
		XX [27] 5236091, ins @42619c: pop       [ecx] //introduced by memory link.
	[5] figure out why 5236091 is included by 5435245.
		So, the memory link is introduced by the self-extraction. Otherwise, 


---------------------------------------------------
	New goal - add an instruction trace for data tracing
---------------------------------------------------
	2PM-3PM 08/18/2015  [1 hr]
	// [1] add a new option of datatracing "dtr" [20 min]
		[a] modify hmp-commands.h
		[b] modify monitor.c
		[c] modify handle.h
	//[2] give up direction 1. [10 min]
		[a] reverse the change on hmp-commands.h
	[3] modify the config.txt [10 min]
	[4] add constant for data trace
	[5] add the framework genExecDataTrace() [10 min]
	[6] add BatchAnalyzer::genTasksForGenDataTrace() [10 min] 

	3-5PM 08/18/2015 [1 hr]	
	[7] fix cateogryToName [1 hr]  
	[8] read about how keyboard event is processed [1 hr] debug trace into it.
		[a] handle_user_command (in monitor.c) -> "sendkey x"
		[b] hmp_send_key
		[c] qmp_send_key, e.g., for key "d". The value is *(keylist->value) values are:
				{kind = KEY_VALUE_KIND_QCODE, {data = 0x27, number = 39, qcode = Q_KEY_CODE_D}}
				keycode is 32.
				note the call of keycode_from_keyvalue(p->value);
		[d] kbd_put_keyboard 
		[e] qemu_put_keyboard_event
		[f] ps2_put_keyboard
		[g] ps2_queue --> the opaque parameter is the PS2State, which can be monitored (see who's reading it).
		use command like "awatch *0x28e08d4c" (monitor the read and write access)
		[h] ->  found that the following functions are called
			#0  0x08156dc2 in ps2_read_data (opaque=0x28e08d00) at hw/ps2.c:191
			#1  0x0814e497 in kbd_read_data (opaque=0x28e07b44, addr=0, size=1) at hw/pckbd.c:323
		[g] find out the current EIP of the instruction :
			approach: set bp on helper_trace2 and print eip_in; the "in" instruction is located at the previous one
			0x806f48af (next instruction)
			@EIP 0x806f48ae: length: (1): in        %dx, %al //*****
			@EIP 0x806f48af: length: (3): ret       $0x0004
			@EIP 0x806f48b2: length: (2): mov       %edi, %edi
			@EIP 0x806f48b4: length: (2): xor       %eax, %eax

		[h] the problem is that env->arrRegs has nothing but 0 (not updated actually because dynamic run-time translation by qemu)	
		[g] *** in gen_save_regs_before_instr, modify the condition so that for instruction 0x806f48ae, the instruction register
				values are recorded.
		!!!!!

	1:00PM-2:00PM 
	[1]  should indicate the registers to RECORD!!!!
		[a] in  translate.c:800
		[b] debug into it.
			1. b ps2_queue (on keyboard)
			2. b ops_sse.h:2480 and see how it's been read in EDX register.
				display/x eip_in
				display/x env->arrRegs
				Expected data:
				{kind = KEY_VALUE_KIND_QCODE, {data = 0x27, number = 39, qcode = Q_KEY_CODE_D}}
				keycode: 0x20
				keycode is 32.
			3. b ps2_read_data
	Strange. Observation is that many IN instructions are executed before ps2_read_data
			and the data of registers are not updated. 
	[2] check if ps2_read_data is getting the keycode.
			Yes, val read is 0x20 (32) (for char d)
		--> memory_region_read_accessor
			*value |= (tmp & mask) << shift; 
			//note tmp is 0x20 --> *value is 
			// no change, value to return is still 0x20
		--> iorange_read (address is 0x60)
		--> helper_inb (addr is 0x60), still return 0x20
		--> then it goes to helper_trace2 for the next instruction, dump of regs
		 env->arrRegs = {0x0, 0x0, 0x2ee0, 0x60, 0x1, 0x8055068c, 0x805506a0, 
		--> it seems that EDX has the right value but EAX does not have it.
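
A tiny worked example of the quoted line `*value |= (tmp & mask) << shift;`, to confirm that a one-byte read of 0x20 passes through unchanged. The helper name is made up. Assumption: a size-1 access uses mask 0xff and shift 0.

```cpp
#include <cassert>
#include <cstdint>

// Same arithmetic as the quoted accessor line, isolated for checking.
inline void accumulate_read(std::uint64_t& value, std::uint64_t tmp,
                            std::uint64_t mask, unsigned shift) {
    assert(shift < 64);  // shifting by 64 or more would be undefined
    value |= (tmp & mask) << shift;
}
```

Starting from value = 0, tmp = 0x20 with mask 0xff and shift 0 leaves value at 0x20, which matches what helper_inb returns for port 0x60.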
	[3] trace into ps2_read_data and trace why it's not writing to EAX.
		it should be located at: p/x &(env->arrRegs[0])
		$21 = 0x28dcf55c (env->arrRegs[0])
		Observation: 
			esp is copied to 0x28dcf550,
			ebp is copied to 0x28dcf554,
			$eax value is copied to 0x28dc6390, 
	[4] found the problem: because registers are copied before instruction,
		should take it RIGHT before the instruction AFTER the in_instruction
		change the capture address to 0x60
---> still does not work.
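
A toy model of the timing issue in [4]: if registers are copied before each instruction, the result of `in` only shows up in the snapshot taken before the following instruction. This models the idea only, not QEMU's translation. All names are hypothetical, and it does not explain why the real capture still fails.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct Regs { std::uint32_t eax = 0; std::uint32_t edx = 0; };
using Step = std::function<void(Regs&)>;  // one emulated instruction

// Run the program, copying the registers BEFORE each instruction executes.
inline std::vector<Regs> run_with_pre_snapshots(const std::vector<Step>& program,
                                                Regs regs) {
    std::vector<Regs> snaps;
    for (const Step& ins : program) {
        assert(ins);            // every step must be callable
        snaps.push_back(regs);  // pre-instruction copy
        ins(regs);
    }
    return snaps;
}
```

For a three-step program (set edx to 0x60, read 0x20 into al, ret), the snapshot taken before the `in` has edx = 0x60 but eax still 0, and the value 0x20 first appears in the snapshot taken before the `ret`.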


	
	



		





	








