Hi all,

  I have spent a few days reading the strace code to get the big picture. My main goal, for now, is to find out how strace dispatches each syscall to its printing function. I found that the main function (in strace.c) first calls init(argc, argv) to initialize some important data structures such as “static struct tcb **tcbtab;”. The arguments are then processed (using getopt()) and the corresponding global flags are set (such as “cflag_t cflag”, which represents the -c/-C options), and finally sigaction() is used to install some signal handlers.

After that, the main trace loop function “static int trace(void)” is called to handle all the work.

The trace() function is very big, and I found that it eventually calls trace_syscall(tcp) to do the core output work, so I went into trace_syscall(tcp) to see what happens there.

trace_syscall() is defined in syscall.c, and it simply uses exiting(tcp) to decide which function to call: trace_syscall_exiting(tcp) or trace_syscall_entering(tcp). From there I came to understand why “strace sleep 2” produces its output the way it does (in fact, I later set a breakpoint there and saw it more clearly).
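To make the entering/exiting split concrete, here is a minimal sketch in C. It is not strace code: struct toy_tcb, toy_entering(), toy_exiting() and toy_trace_syscall() are hypothetical names, and the real functions do far more. It only illustrates how a single entry point can print the first half of a line at syscall entry and the second half at syscall exit, which is presumably why a blocking syscall shows its arguments first and its return value only after it returns.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for strace's struct tcb. */
struct toy_tcb {
    int in_syscall;   /* 0: next stop is a syscall entry; 1: next stop is an exit */
    char line[128];   /* text produced so far for the current syscall */
};

/* Syscall entry: print the name and arguments, leave the line open. */
static void toy_entering(struct toy_tcb *tcp, const char *name, const char *args)
{
    snprintf(tcp->line, sizeof(tcp->line), "%s(%s", name, args);
    tcp->in_syscall = 1;
}

/* Syscall exit: append the return value and close the line. */
static void toy_exiting(struct toy_tcb *tcp, long ret)
{
    size_t used = strlen(tcp->line);

    snprintf(tcp->line + used, sizeof(tcp->line) - used, ") = %ld", ret);
    tcp->in_syscall = 0;
}

/* One entry point that picks the half to run from the per-task state. */
static void toy_trace_syscall(struct toy_tcb *tcp, const char *name,
                              const char *args, long ret)
{
    assert(tcp != NULL);
    if (tcp->in_syscall)
        toy_exiting(tcp, ret);
    else
        toy_entering(tcp, name, args);
}
```

Calling toy_trace_syscall() twice on the same toy_tcb first produces the text up to the last argument and then appends the “) = ...” part on the second call.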

 

trace_syscall_entering(tcp) calls some other functions to populate some fields of the tcp structure. I think the most important output is done by the following code:

       if ((tcp->qual_flg & QUAL_RAW) && tcp->s_ent->sys_func != sys_exit)
              res = printargs(tcp);
       else
              res = tcp->s_ent->sys_func(tcp);

Apparently, for most syscalls the else branch is executed, and it dispatches to the right function through the s_ent structure, which stores the address of the output function as a function pointer (together with the name string, etc.). I then became interested in where and how strace assigns the right value to tcp->s_ent. I found that it is done in “static int get_scno(struct tcb *tcp)” by the line “tcp->s_ent = &sysent[scno];”. The table behind the global sysent pointer is defined as:

 const struct_sysent sysent0[] = {
#include "syscallent.h"
};

After looking at “syscallent.h”, I finally understood how strace integrates the syscall table. I think that if we want to add support for a new syscall, we can start from syscallent.h.
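The table-driven dispatch described above can be sketched in a few lines of C. This is only an illustration under assumptions: the toy_ names, the two-row table and the syscall numbers 0 and 1 are made up, and the real struct_sysent has more fields. The point is that the syscall number indexes a table, the table row carries a function pointer, and adding a syscall means adding a row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct toy_tcb;
typedef int (*toy_sys_func)(struct toy_tcb *tcp);

/* Hypothetical, reduced version of one syscall table row. */
struct toy_sysent {
    unsigned nargs;          /* number of arguments */
    const char *sys_name;    /* name used in the output */
    toy_sys_func sys_func;   /* decoder to call for this syscall */
};

struct toy_tcb {
    long u_arg[6];                    /* raw syscall arguments */
    const struct toy_sysent *s_ent;   /* row selected for the current syscall */
    char out[128];                    /* decoded text */
};

static int toy_sys_open(struct toy_tcb *tcp)
{
    snprintf(tcp->out, sizeof(tcp->out), "open(0x%lx, %ld)",
             (unsigned long)tcp->u_arg[0], tcp->u_arg[1]);
    return 0;
}

static int toy_sys_close(struct toy_tcb *tcp)
{
    snprintf(tcp->out, sizeof(tcp->out), "close(%ld)", tcp->u_arg[0]);
    return 0;
}

/* The row index plays the role of the syscall number (made-up numbers). */
static const struct toy_sysent toy_sysent_table[] = {
    { 2, "open",  toy_sys_open  },   /* toy scno 0 */
    { 1, "close", toy_sys_close },   /* toy scno 1 */
};

/* Select the table row for scno; returns -1 if scno is out of range. */
static int toy_get_scno(struct toy_tcb *tcp, long scno)
{
    size_t n = sizeof(toy_sysent_table) / sizeof(toy_sysent_table[0]);

    if (scno < 0 || (size_t)scno >= n)
        return -1;
    tcp->s_ent = &toy_sysent_table[scno];
    return 0;
}

/* Dispatch through the function pointer stored in the selected row. */
static int toy_dispatch(struct toy_tcb *tcp)
{
    assert(tcp->s_ent != NULL);
    return tcp->s_ent->sys_func(tcp);
}
```

In this sketch, supporting a new syscall would mean writing one more decoder and adding one more row to toy_sysent_table.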

 

Back to the real output functions: syscallent.h gives us the name of each output function, and each one is named after the corresponding syscall. For example, in file.c I found sys_open(), which just calls “static int decode_open(struct tcb *tcp, int offset)”. decode_open() does all the detailed work; it knows the meaning of each argument of the open() syscall.

Another interesting finding is that strace has low-level output functions that do the actual printing, while the higher-level functions just use them as a kind of API and do not care how the low-level functions work. Typical low-level output functions are tprintf(), printstr(), printpath(), printfd() and so on.
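The layering can be illustrated with a small C sketch. The toy_ functions below only borrow the idea: their signatures are simplified assumptions and do not match the real tprintf(), printfd() or printstr() in strace. Only the lowest function touches the output buffer, and the decoder is written purely in terms of the helpers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char toy_out[256];   /* stands in for the real output stream */
static size_t toy_len;      /* number of characters written so far */

/* Lowest layer: the only function that writes to the output buffer. */
static void toy_tprintf(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(toy_len < sizeof(toy_out));
    va_start(ap, fmt);
    n = vsnprintf(toy_out + toy_len, sizeof(toy_out) - toy_len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        /* vsnprintf may truncate; never advance past the buffer end. */
        size_t room = sizeof(toy_out) - toy_len - 1;
        toy_len += (size_t)n < room ? (size_t)n : room;
    }
}

/* Middle layer: small helpers that know how to print one kind of value. */
static void toy_printfd(int fd)
{
    toy_tprintf("%d", fd);
}

static void toy_printstr(const char *s)
{
    toy_tprintf("\"%s\"", s);
}

/* Upper layer: a decoder that knows the argument order of write(). */
static void toy_decode_write(int fd, const char *buf, unsigned long count)
{
    toy_tprintf("write(");
    toy_printfd(fd);
    toy_tprintf(", ");
    toy_printstr(buf);
    toy_tprintf(", %lu)", count);
}
```

With this split, changing the output format would mostly mean changing the lowest layers, which is also why it looks like a natural place to start for structured output.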

 

I will spend more time reading and debugging the code to understand its implementation, though I think there is no need to understand all of the code deeply to finish the GSoC project.

 


In this mail (http://sourceforge.net/p/strace/mailman/strace-devel/thread/4515571.KdWbzpdtLr%40vapier/#msg32095710), I found the statement “the advanced path decoding itself would be large enough to fill a whole 3 month GSOC project”.

So, are you suggesting that we should not choose “advanced path decoding” as the proposal?

 


I read the discussion in that mail and found that “Structured output” is also a good choice. From my current understanding of strace, we could finish the work by modifying only the output part of strace.

Regarding this mail (http://sourceforge.net/p/strace/mailman/message/32072591/):

Does it mean that I should first finish a very basic prototype that addresses some of the problems in the list and post the patch to the mailing list?

 


By the way, I found in this mail (http://sourceforge.net/p/strace/mailman/message/31924683/) that in the current strace “Printing of decoded C constructs is mostly open-coded”, that “Support of other formats inevitably means introducing some API for structured output”, and that “the strace code base would have a framework to call an output module and that would take care of the exact output details.”

So I am just wondering why strace hard-codes these decoding functions, and why not use a method like the one used in flex/bison, such as:

We first define a specification file (plain text with a specific grammar) like:

define sys_open: open ( $1, $2 ) = $0

Then strace parses this file, substitutes the $1, $2 and $0 variables with the actual values, and outputs the resulting string. I have used flex/bison before, and I think this may be better than the hard-coded way.

These are just my first thoughts, and I know they are immature (we would still need some special way to handle the arguments of complex syscalls, and that would require a great deal of work).
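The substitution step of this idea could be sketched in C as follows. This is a toy under stated assumptions, not a proposal for real code: toy_expand() is a hypothetical helper, it only handles single-digit $N placeholders, it takes $0 to be the return value, and it expects every value to be already formatted as a string, which is exactly the hard part for complex arguments.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand "$N" placeholders (N is a single digit) in tmpl using vals[N].
 * vals[0] is assumed to hold the return value, vals[1..] the arguments.
 * Returns 0 on success, -1 on an unknown index or if out is too small;
 * on failure the contents of out are unspecified apart from out[0].
 */
static int toy_expand(const char *tmpl, const char *const vals[], size_t nvals,
                      char *out, size_t outsz)
{
    size_t used = 0;

    assert(tmpl != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece = p;   /* default: copy one literal character */
        size_t piece_len = 1;

        if (p[0] == '$' && isdigit((unsigned char)p[1])) {
            size_t idx = (size_t)(p[1] - '0');

            if (idx >= nvals)
                return -1;
            piece = vals[idx];
            piece_len = strlen(piece);
            p++;                 /* skip the digit as well */
        }
        if (used + piece_len >= outsz)
            return -1;           /* keep room for the terminating NUL */
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }
    out[used] = '\0';
    return 0;
}
```

Given the template “open ( $1, $2 ) = $0” and the values "3", "\"/etc/passwd\"" and "O_RDONLY" for $0, $1 and $2, the sketch yields open ( "/etc/passwd", O_RDONLY ) = 3. Turning a raw argument into a string like O_RDONLY would still need per-type decoders.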



Thanks

Yangmin Zhu