From: Guillaume T. <gui...@po...> - 2004-06-30 08:13:13
Hello,

I tried to make a comparison between ELSA (Enhanced Linux System Accounting) and CSA (Comprehensive System Accounting). The starting point was to evaluate how intrusive the two projects are in the Linux kernel. The results of my analysis are the following.

If we want to improve accounting under Linux, I think that we need to work on the accounting part (of course), and we also need to provide some mechanism to manage groups of processes.

Currently, there is already a BSD-like accounting. BSD accounting gives fewer metrics than CSA does. In fact, the BSD accounting I/O statistics are broken. Thus, I wrote a small patch for kernel 2.6.7 that fixes the problem (attached to this email). This patch is based on the CSA I/O statistics part. There are also improvements concerning accounting in the -mm kernel tree.

The second problem is managing groups of processes. There are several projects, such as CKRM (classes), CSA (PAGG) and ELSA (banks). It seems that the choice is still open. CKRM is an ambitious project, while CSA (PAGG, to be more accurate) and ELSA provide simple mechanisms to manage groups of processes. Anyway, the accounting part must be independent of how processes are grouped together (class, pagg, bank, ...).

I paste the paper that compares ELSA and CSA at the end of this mail. If there are indentation problems, you can view it on the web site:
http://sourceforge.net/docman/display_doc.php?docid=23446&group_id=105806

I also put it on a Wiki page so you can add comments or fix mistakes or misunderstandings. The link is:
http://elsa.sourceforge.net/cgi-bin/elsa-wiki.pl

Thank you for your comments,
Best,
Guillaume

============================== ELSA versus CSA =================================

This document presents the differences between the two accounting tools ELSA (Enhanced Linux System Accounting) and CSA (Comprehensive System Accounting). We are interested in how those tools modify the Linux kernel. We will compare csa-job-pagg-2.6.5.patch with patch-2.6.7-elsa.
==========================
1) Diff output histogram
==========================

The "diffstat" tool shows the insertions, deletions, and modifications per file. It shows all modifications, not only kernel modifications.

1.1) ELSA
---------

    Makefile                  |    2
    drivers/Kconfig           |    2
    drivers/Makefile          |    1
    drivers/elsacct/Kconfig   |   61 +++
    drivers/elsacct/Makefile  |    5
    drivers/elsacct/bank.c    |  427 ++++++++++++++++++++++++
    drivers/elsacct/elsacct.c |  811 ++++++++++++++++++++++++++++++++++++++++++++++
    include/linux/bank.h      |  183 ++++++++++
    include/linux/elsacct.h   |   84 ++++
    include/linux/init_task.h |    1
    include/linux/sched.h     |    3
    kernel/exit.c             |    2
    kernel/fork.c             |   14
    13 files changed, 1595 insertions(+), 1 deletion(-)

There are 6 new files:

    drivers/elsacct/Kconfig
    drivers/elsacct/Makefile
    drivers/elsacct/bank.c
    drivers/elsacct/elsacct.c
    include/linux/bank.h
    include/linux/elsacct.h

This means that ELSA modifies 7 existing kernel files.

1.2) CSA
--------

    Documentation/job.txt        |  104 ++
    Documentation/pagg.txt       |  151 +++
    drivers/block/ll_rw_blk.c    |    4
    fs/exec.c                    |    4
    fs/read_write.c              |   25
    include/linux/csa.h          |  526 +++++++++++
    include/linux/csa_internal.h |   85 +
    include/linux/init_task.h    |    2
    include/linux/job.h          |  123 ++
    include/linux/pagg.h         |  223 ++++
    include/linux/paggctl.h      |  179 +++
    include/linux/sched.h        |   28
    init/Kconfig                 |   59 +
    kernel/Makefile              |    5
    kernel/csa.c                 | 1665 ++++++++++++++++++++++++++++++++++
    kernel/exit.c                |   13
    kernel/fork.c                |   19
    kernel/job.c                 | 2056 +++++++++++++++++++++++++++++++++++++++++++
    kernel/pagg.c                |  397 ++++++++
    mm/memory.c                  |   20
    mm/mmap.c                    |   10
    mm/mremap.c                  |    8
    mm/rmap.c                    |    3
    mm/swapfile.c                |    4
    24 files changed, 5706 insertions(+), 7 deletions(-)

There are 10 new files:

    Documentation/job.txt
    Documentation/pagg.txt
    include/linux/csa.h
    include/linux/csa_internal.h
    include/linux/job.h
    include/linux/pagg.h
    include/linux/paggctl.h
    kernel/csa.c
    kernel/job.c
    kernel/pagg.c

This means that CSA modifies 14 existing kernel files.
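The file counts above can be cross-checked mechanically. The following C sketch encodes both diffstat listings, marks the files that each patch creates, and derives how many pre-existing files each patch touches. The type and function names (diffstat_entry, summarize) are made up for this illustration. Note that the per-file number printed by diffstat is insertions plus deletions, so the column sums to 1596 (1595 + 1) for ELSA and 5713 (5706 + 7) for CSA.

```c
#include <assert.h>
#include <stddef.h>

/* One line of diffstat output, as listed in sections 1.1 and 1.2. */
struct diffstat_entry {
    const char *path;
    int changed; /* per-file count printed by diffstat (insertions + deletions) */
    int is_new;  /* 1 if the patch creates the file, 0 if it modifies one */
};

/* Totals derived from a list of entries. */
struct diffstat_summary {
    int files;         /* the "N files changed" figure */
    int new_files;     /* files created by the patch */
    int modified;      /* pre-existing files touched by the patch */
    int changed_lines; /* sum of the per-file counts */
};

struct diffstat_summary summarize(const struct diffstat_entry *e, size_t n)
{
    struct diffstat_summary s = {0, 0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        assert(e[i].changed >= 0);
        s.files++;
        s.new_files += e[i].is_new;
        s.changed_lines += e[i].changed;
    }
    s.modified = s.files - s.new_files;
    return s;
}

/* Figures copied from the ELSA diffstat (section 1.1). */
const struct diffstat_entry elsa[] = {
    {"Makefile", 2, 0},                     {"drivers/Kconfig", 2, 0},
    {"drivers/Makefile", 1, 0},             {"drivers/elsacct/Kconfig", 61, 1},
    {"drivers/elsacct/Makefile", 5, 1},     {"drivers/elsacct/bank.c", 427, 1},
    {"drivers/elsacct/elsacct.c", 811, 1},  {"include/linux/bank.h", 183, 1},
    {"include/linux/elsacct.h", 84, 1},     {"include/linux/init_task.h", 1, 0},
    {"include/linux/sched.h", 3, 0},        {"kernel/exit.c", 2, 0},
    {"kernel/fork.c", 14, 0},
};

/* Figures copied from the CSA diffstat (section 1.2). */
const struct diffstat_entry csa[] = {
    {"Documentation/job.txt", 104, 1},      {"Documentation/pagg.txt", 151, 1},
    {"drivers/block/ll_rw_blk.c", 4, 0},    {"fs/exec.c", 4, 0},
    {"fs/read_write.c", 25, 0},             {"include/linux/csa.h", 526, 1},
    {"include/linux/csa_internal.h", 85, 1}, {"include/linux/init_task.h", 2, 0},
    {"include/linux/job.h", 123, 1},        {"include/linux/pagg.h", 223, 1},
    {"include/linux/paggctl.h", 179, 1},    {"include/linux/sched.h", 28, 0},
    {"init/Kconfig", 59, 0},                {"kernel/Makefile", 5, 0},
    {"kernel/csa.c", 1665, 1},              {"kernel/exit.c", 13, 0},
    {"kernel/fork.c", 19, 0},               {"kernel/job.c", 2056, 1},
    {"kernel/pagg.c", 397, 1},              {"mm/memory.c", 20, 0},
    {"mm/mmap.c", 10, 0},                   {"mm/mremap.c", 8, 0},
    {"mm/rmap.c", 3, 0},                    {"mm/swapfile.c", 4, 0},
};
```

Calling summarize(elsa, 13) yields 13 files, 6 new and 7 modified; the same call on the 24-entry csa table yields 24 files, 10 new and 14 modified, matching the figures given above.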
1.3) Conclusion
---------------

If we just look at the results of the "diffstat" command, we can see that CSA is more intrusive than ELSA. This is true, but it is because CSA collects more information about memory and I/O than ELSA does. This is the subject of the next part of our analysis.

====================
2) Diff in detail
====================

We now look in detail at the differences between ELSA and CSA.

2.1) include/linux/sched.h [ELSA + CSA]
---------------------------------------

CSA: In order to get accounting information, CSA adds eleven new fields to the process structure (struct task_struct):

    ...
    #if defined(CONFIG_PAGG)
    /* List of pagg (process aggregate) attachments */
    struct pagg_list pagg_list;
    #endif
    /* i/o counters(bytes read/written, blocks read/written, #syscalls, waittime */
    unsigned long rchar, wchar, rblk, wblk, syscr, syscw, bwtime;
    #if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
    unsigned long csa_rss_mem1, csa_vm_mem1;
    clock_t csa_stimexpd;
    #endif
    ...

It also adds two fields to the memory management structure (struct mm_struct):

    ...
    unsigned long hiwater_rss, hiwater_vm;
    ...

Of course, to initialize those new fields, CSA adds new code in "include/linux/init_task.h".

ELSA: We added the field "bank_head" to the task_struct. This field holds information about the banks that are in the system. Like CSA, we added an initialization of this field in "include/linux/init_task.h".

2.2) drivers/block/ll_rw_blk.c [CSA]
------------------------------------

CSA adds four lines of code. It modifies the function "get_request_wait()" to update the "bwtime" field (block I/O wait time) of the current process. It also updates the fields "rblk" and "wblk" in the function "drive_stat_acct()".

2.3) fs/exec.c [CSA]
--------------------

Calls to csa_update_integrals() and update_mem_hiwater() are added to the do_execve() function, which is called by the sys_execve() system call. The execve() routine executes the program given as its argument.
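To see what these two hooks accumulate, here is a small user-space model. It is only a sketch: struct fake_task and the helpers change_memory() and average_rss() are hypothetical, and the struct merely stands in for the task_struct and mm_struct fields listed in section 2.1 (the real kernel functions read current->stime and current->mm instead). Times and sizes are in arbitrary units. In this sketch the integral is settled just before the memory size changes, so each interval of system time is charged at the size the task had during that interval.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the fields CSA adds to
 * task_struct (csa_*) and mm_struct (hiwater_*). Units are arbitrary:
 * "ticks" of system time and pages of memory. */
struct fake_task {
    long stime;                 /* system time consumed so far */
    long csa_stimexpd;          /* stime at the previous integral update */
    unsigned long rss;          /* current resident set size */
    unsigned long total_vm;     /* current virtual memory size */
    unsigned long csa_rss_mem1; /* sum of rss * elapsed stime */
    unsigned long csa_vm_mem1;  /* sum of total_vm * elapsed stime */
    unsigned long hiwater_rss;  /* largest rss seen at a hook */
    unsigned long hiwater_vm;   /* largest total_vm seen at a hook */
};

/* Same logic as csa_update_integrals(): charge the current sizes for
 * the system time elapsed since the previous call. */
void update_integrals(struct fake_task *t)
{
    long delta = t->stime - t->csa_stimexpd;

    assert(delta >= 0); /* stime never goes backwards */
    t->csa_stimexpd = t->stime;
    t->csa_rss_mem1 += (unsigned long)delta * t->rss;
    t->csa_vm_mem1 += (unsigned long)delta * t->total_vm;
}

/* Same logic as update_mem_hiwater(): keep the largest sizes seen. */
void update_hiwater(struct fake_task *t)
{
    if (t->hiwater_rss < t->rss)
        t->hiwater_rss = t->rss;
    if (t->hiwater_vm < t->total_vm)
        t->hiwater_vm = t->total_vm;
}

/* Hypothetical memory event (think page fault or mmap): settle the
 * integral at the old size, apply the new size, then refresh peaks. */
void change_memory(struct fake_task *t, unsigned long rss, unsigned long vm)
{
    update_integrals(t);
    t->rss = rss;
    t->total_vm = vm;
    update_hiwater(t);
}

/* Time-weighted average RSS, assuming the task started at stime 0. */
double average_rss(const struct fake_task *t)
{
    if (t->csa_stimexpd == 0)
        return 0.0;
    return (double)t->csa_rss_mem1 / (double)t->csa_stimexpd;
}
```

For example, a task that holds 100 pages for 10 ticks, 300 pages for 5 ticks and 50 pages for 5 ticks ends with csa_rss_mem1 = 2750 page-ticks, a time-weighted average of 137.5 pages, and hiwater_rss = 300. In this model the peak is only sampled when a hook runs, which is why the hooks are placed at the points where memory usage changes.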
csa_update_integrals() updates variables such as csa_stimexpd, csa_rss_mem1 and csa_vm_mem1, defined in the task_struct (see section 2.1). The code of the function is:

    static inline void csa_update_integrals(void)
    {
            long delta;

            if (current->mm) {
                    /* system time elapsed since the last update */
                    delta = current->stime - current->csa_stimexpd;
                    current->csa_stimexpd = current->stime;
                    /* accumulate memory size multiplied by elapsed time */
                    current->csa_rss_mem1 += delta * current->mm->rss;
                    current->csa_vm_mem1 += delta * current->mm->total_vm;
            }
    }

update_mem_hiwater() updates the variables hiwater_rss and hiwater_vm defined in the memory management structure (struct mm_struct). The code is:

    static inline void update_mem_hiwater(void)
    {
            if (current->mm) {
                    /* remember the largest resident and virtual sizes seen */
                    if (current->mm->hiwater_rss < current->mm->rss) {
                            current->mm->hiwater_rss = current->mm->rss;
                    }
                    if (current->mm->hiwater_vm < current->mm->total_vm) {
                            current->mm->hiwater_vm = current->mm->total_vm;
                    }
            }
    }

2.4) fs/read_write.c [CSA]
--------------------------

CSA also modifies the file fs/read_write.c in order to account for reads and writes. We found that these data, although present in the BSD accounting record, are not filled in by BSD process accounting. The modified functions are vfs_read(), vfs_write(), sys_readv(), sys_writev() and do_sendfile(). These modifications make it possible to collect the number of bytes read and written and the number of read and write system calls (the block counts are updated in drivers/block/ll_rw_blk.c, see section 2.2).

2.5) kernel/exit.c [ELSA + CSA]
-------------------------------

CSA: There are several modifications in the do_exit() function. First, there are two calls: one to csa_update_integrals() and another to update_mem_hiwater(). Then there is a call to "csa_acct()", a wrapper around the CSA end-of-process accounting; the record is written by the CSA code (csa.c) when a task within a job exits. Finally, there is a call to "detach_pagg_list_chk()".

ELSA: The process is removed from its bank by calling elsacct_process_remove(). If the bank becomes empty, it is automatically removed from kernel space.
Before it is removed, a callback is called to perform actions such as writing the accounting information to a file.

2.6) kernel/fork.c [ELSA + CSA]
-------------------------------

CSA: There is a call to "csa_clear_integrals()", which initializes the accounting fields when a new task is created, along with some other initializations. It is also in this file that the new process is attached to the same "process aggregate" (PAGG) containers as its parent.

ELSA: The fields of the new process are initialized, and the process is added to the same container (bank) as its parent.

2.7) mm/memory.c, mm/mmap.c, mm/mremap.c, mm/rmap.c and mm/swapfile.c [CSA]
---------------------------------------------------------------------------

In "mm/memory.c", the function zap_page_range() has been modified. It is called when the kernel needs to remove user pages within a given address range. It is used, for example, by the "madvise" system call, which lets a process advise the kernel about how it intends to use memory. CSA also modifies do_wp_page(), adding, as in the preceding function, calls to csa_update_integrals() and update_mem_hiwater(). do_wp_page() is called by handle_pte_fault() when a process tries to write to a shared page. Other modifications appear in do_swap_page() and do_anonymous_page(). The latter is called by do_no_page(), which creates a new page mapping when none exists.

Still on memory information, the file "mm/mmap.c" is modified. The modification occurs in do_mmap_pgoff(). This function is used when the kernel maps a file or a device into memory. In this file, calls that update the CSA accounting information have also been added to expand_stack(), which grows a stack virtual memory area. Similarly, a modification was made in do_brk().

The files "mm/mremap.c", "mm/rmap.c" and "mm/swapfile.c" have also been modified.
The modifications are, respectively, in move_vma() and do_mremap(), try_to_unmap_one(), and unuse_pte().

===============
3) Conclusion
===============

CSA is more intrusive than ELSA because the SGI developers added code to collect more accounting information. Currently, ELSA provides the same information as BSD-like process accounting (i.e. time accounting), but for a group of processes. As BSD accounting is already included in the Linux kernel, it seems more worthwhile to fix its I/O statistics than to create a new accounting tool.

Table-1 shows what kind of information is available with the different systems.

 ----------------------------------------------------------
| ACCOUNTING INFORMATION       |        SYSTEM USED        |
|------------------------------|---------------------------|
|                              | BSD/ELSA (1)|   CSA (1)   |
|==============================|=============|=============|
| Job ID (or Bank ID)          |      x      |      x      |
| Project ID                   |             |      x      |
| System billing units         |             |      x      |
|------------------------------|-------------|-------------|
| Process creation time        |      x      |      x      |
| User time                    |      x      |      x      |
| System time                  |      x      |      x      |
| Elapsed time                 |      x      |      x      |
|------------------------------|-------------|-------------|
| Average memory usage         |      x      |      x      |
| Hiwater memory usage (RSS)   |             |      x      |
| Hiwater memory usage (VM)    |             |      x      |
| Minor page faults            |      x      |      x      |
| Major page faults            |      x      |      x      |
| Number of swaps              |      x      |      x      |
|------------------------------|-------------|-------------|
| Chars read                   |             |      x      |
| Chars written                |             |      x      |
| Blocks read                  |             |      x      |
| Blocks written               |             |      x      |
| Blocks I/O wait time         |             |      x      |
| Read system calls            |             |      x      |
| Write system calls           |             |      x      |
|------------------------------|-------------|-------------|
| Max number of CPUs used      |             |             |
| Max number of CPUs available |             |             |
| Semaphore wait time          |             |             |
| Time connected to a CPU      |             |             |
 ----------------------------------------------------------
Table-1: Information available about accounting
(1) Information is taken from the source code.

The other issue in improving accounting is the management of groups of processes. Three projects are available: CKRM, CSA and ELSA. There is some discussion around this, but currently it is not easy to identify the best approach, so the discussion is still open.

============================ ELSA versus CSA [END] =============================