From: Jay L. <jl...@en...> - 2004-10-22 01:19:10
|
These two patches are the one we submitted to SuSE for Sles9 SP1. They are clean of CSA specific code. In earlier round of discussion, all partipants favored a common layer of accounting data collection. I believe these two patches are the super set that meets the needs of people who need enhanced BSD accounting. This patchset consists of two parts: acct_io and acct_mm, as we identified improved data collection in the area of IO and MM are useful to our customers. It is intended to offer common data collection method for various accounting packages including BSD accouting, ELSA, CSA, and any other acct packages that favor a common layer of data collection. 'acct_mm' defines a few macros that are no-op unless CONFIG_BSD_PROCESS_ACCT config flag is set on. Andrew, please consider including these two patches. Please let me know how i can help! Best Regards, --- Jay Lan - Linux System Software Silicon Graphics Inc., Mountain View, CA |
From: Jay L. <jl...@en...> - 2004-09-27 22:36:23
|
This is an effort of providing an enhanced accounting data collection. It is intended to offer common data collection method for various accounting packages including BSD accouting, ELSA, CSA, and any other acct packages that favor a common layer of data collection, separated from data presentation layer and management of process groups layer. This patchset consists of two parts: acct_io and acct_mm as we identified useful spots for improved data collection in the area of IO and MM. This patchset is to replace the previously submitted CSA patchset of four. The CSA kernel module is a standalone module. The csa_eop patch was to provide a hook for end-of-process handling and that can be considered separately unless there is enough common interest. Now that the patchset is down to IO and MM, i hope it is more appealing :) Comments? Best Regards, - jay --- Jay Lan - Linux System Software Silicon Graphics Inc., Mountain View, CA |
From: Jay L. <jl...@en...> - 2004-09-27 22:45:28
Attachments:
acct_io
|
1/2: acct_io Enhanced I/O accounting data collection. Signed-off-by: Jay Lan <jl...@sg...> |
From: Jay L. <jl...@en...> - 2004-09-27 22:52:41
Attachments:
acct_mm
|
2/2: acct_mm Enhanced MM accounting data collection. Signed-off-by: Jay Lan <jl...@sg...> |
From: Paul J. <pj...@sg...> - 2004-09-28 09:35:05
|
nits: 1) I'm not sure the "no-op if CONFIG_CSA not set" comments are worthwhile - it does not seem to be a common practice to mark macros that collapse under certain CONFIG's with such comments, and some code, such as in fork.c, would become quite a bit less readable if such comments were widely used. 2) Three of the added csa_update_integrals() lines have leading spaces, instead of a tab char, such as in: =================================================================== --- linux.orig/fs/exec.c 2004-09-27 11:57:40.201435722 -0700 +++ linux/fs/exec.c 2004-09-27 14:05:41.266160725 -0700 @@ -1163,6 +1164,9 @@ /* execve success */ security_bprm_free(&bprm); + /* no-op if CONFIG_CSA not set */ + csa_update_integrals(); <========= + update_mem_hiwater(); <========= return retval; } 3) Is it always the case that csa_update_integrals() and update_mem_hiwater() are used together? If so, perhaps they could be collapsed into one? Even the current->mm test inside them could be made one test, perhaps? 4) What kind of kernel text size expansion does this cause? There seem to be about a dozen of these calls. What are the pros and cons of inlining csa_update_integrals() and update_mem_hiwater()? Are these on hot enough kernel code paths that we should benchmark with and without these hooks enabled, both inline and out-of-line? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj...@sg...> 1.650.933.1373 |
From: Robin H. <ho...@sg...> - 2004-09-28 11:40:02
|
On Tue, Sep 28, 2004 at 02:33:50AM -0700, Paul Jackson wrote: > nits: > > 3) Is it always the case that csa_update_integrals() and > update_mem_hiwater() are used together? If so, perhaps > they could be collapsed into one? Even the current->mm > test inside them could be made one test, perhaps? This sounds like a really good idea. Maybe update_mem_hiwater should have the #ifdef CONFIG_CSA inside it. This really sounds like a good idea. Is update_mem_hiwater everywhere that csa_update_integrals needs to be? I seem to remember one or two places where that was not the case. Of course that was a few years ago and my memory is really fuzzy > > 4) What kind of kernel text size expansion does this cause? > There seem to be about a dozen of these calls. What are > the pros and cons of inlining csa_update_integrals() and > update_mem_hiwater()? Are these on hot enough kernel code > paths that we should benchmark with and without these hooks > enabled, both inline and out-of-line? The size was never very noticable. It usually did not even cause overflow to the next 4k page. The csa_job module added the biggest bloat, but I have nearly always compiled that as a module. I have benchmarked these hooks a very long time ago. The number and location has not changed appreciably. I ran three seperate tests. The first was without any csa config'd on. The second was with csa config'd on, but no job containers and the writing of the accounting file turned off. Last was with everything. We ran 7 runs of each config on a 32 cpu system. There was no delta between the two kernels that were not writing accounting files. Actually, the average for the with integrals was 0.3% above the without integrals, but this is well within the noise range. Originally, there was a 5% decrease in performance with the writing of the accounting data. There was another unfortunate side effect that some of the CSA metrics became much worse. This problem was later identified and fixed. At that point, CSA logging caused a 2.7% drop in AIM7 peak with shifting the transition point (I think that was the name I remember) towards a higher number of processes. Jack said that was due to a slight serializing of process exits. Robin |
From: Paul J. <pj...@sg...> - 2004-09-28 13:31:09
|
Robin wrote: > I have benchmarked these hooks a very long time ago. The number and > location has not changed appreciably. These results seem reasonable ... thanks. > The size was never very noticable. But would the time cost of being out of line be noticable either? Actually, being out of line might be a tick faster, if it reduced by a cache line what was needed for a common execution path. > Originally, there was a 5% decrease in performance with the writing of > the accounting data. There was another unfortunate side effect that some > of the CSA metrics became much worse. This problem was later identified > and fixed. Is there any non-trivial risk that some other "unfortunate side affect" exists today, that we'd find on benchmarking? I'm not sure its worth benchmarking again, but I slightly suspect it is, and if benchmarking was done, I'd do it with these calls both inline and out of line, to see what affect that had on runtime. If no affect on runtime, I'd tend toward the out of line calls - at least saving a little kernel text space. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj...@sg...> 1.650.933.1373 |
From: Robin H. <ho...@sg...> - 2004-09-28 14:34:43
|
> > Is there any non-trivial risk that some other "unfortunate side affect" > exists today, that we'd find on benchmarking? When I last owned csa, I was running benchmarks before each SGI release. The tests were a simple matter of grabbing belay or belay2 and running setting up an FC disk vault (one was usually attached that had 16 disks and use Jack's runit script to launch it. I would then take the output and use Jack's web page to graph and compare it to the previous. Additionally, every time I got access to a new larger system, I would run the tests on there and check for any odd affects of CSA. Nothing interesting ever popped up from LBS2.1.1 all the way through to LBS3.0. > > I'm not sure its worth benchmarking again, but I slightly suspect it is, > and if benchmarking was done, I'd do it with these calls both inline and > out of line, to see what affect that had on runtime. If no affect on > runtime, I'd tend toward the out of line calls - at least saving a > little kernel text space. AIM7 is far to big of a hammer to find this level of micro-optimization. You could probably find or write a simple microbenchmark which shows the difference that introducing the code causes, but I would doubt it would show the inline versus the callout. Either way, we have probably spent more time discussing benchmarking this than it is worth at this point of time. I would expect the do_no_page() path will be the easiest to identify the change. I have a simple test which maps a large region and then touches a large number of pages. The whole loop is surronded by reading of the Shub RTC register. This was done to determine the effect of quicklists on page faulting. That type of microbenchmark might be your best bet at finding the problem. Robin |
From: Jay L. <jl...@sg...> - 2004-10-02 00:42:48
|
Paul Jackson wrote: > nits: > > 1) I'm not sure the "no-op if CONFIG_CSA not set" comments > are worthwhile - it does not seem to be a common practice > to mark macros that collapse under certain CONFIG's with > such comments, and some code, such as in fork.c, would > become quite a bit less readable if such comments were > widely used. Yeah, that makes sense. Will be fixed in next posting. > > 2) Three of the added csa_update_integrals() lines have > leading spaces, instead of a tab char, such as in: > > =================================================================== > --- linux.orig/fs/exec.c 2004-09-27 11:57:40.201435722 -0700 > +++ linux/fs/exec.c 2004-09-27 14:05:41.266160725 -0700 > @@ -1163,6 +1164,9 @@ > > /* execve success */ > security_bprm_free(&bprm); > + /* no-op if CONFIG_CSA not set */ > + csa_update_integrals(); <========= > + update_mem_hiwater(); <========= > return retval; > } Caused by 'cut-n-paste'. Will be fixed. > > 3) Is it always the case that csa_update_integrals() and > update_mem_hiwater() are used together? If so, perhaps > they could be collapsed into one? Even the current->mm > test inside them could be made one test, perhaps? As Robin pointed out, there are a couple of instances that are not the case. Actually there are three. Thanks for your feedback, Paul! - jay |
From: Jay L. <jl...@en...> - 2004-10-22 01:33:44
Attachments:
acct_io
|
1/2: acct_io Enahanced I/O accounting data collection. Signed-off-by: Jay Lan <jl...@sg...> |
From: Andrew M. <ak...@os...> - 2004-10-22 02:18:09
|
Jay Lan <jl...@en...> wrote: > > 1/2: acct_io > > Enahanced I/O accounting data collection. > It's nice to use `diff -p' so people can see what functions you're hitting. > + current->syscr++; Should these metrics be per-thread or per-heavyweight process? > + if (ret > 0) { > + current->rchar += ret; > + } It's conventional to omit the braces if there is only one statement in the block. > =================================================================== > --- linux.orig/include/linux/sched.h 2004-10-01 17:01:21.412848229 -0700 > +++ linux/include/linux/sched.h 2004-10-01 17:09:42.723482260 -0700 > @@ -591,6 +591,9 @@ > struct rw_semaphore pagg_sem; There is no `pagg_sem' in the kernel, so this will spit a reject. > #endif > > +/* i/o counters(bytes read/written, #syscalls */ > + unsigned long rchar, wchar, syscr, syscw; > + These will overflow super-quick. Shouldn't they be 64-bit? > --- linux.orig/kernel/fork.c 2004-10-01 17:01:21.432379595 -0700 > +++ linux/kernel/fork.c 2004-10-01 17:09:42.732271376 -0700 > @@ -995,6 +995,7 @@ > p->real_timer.data = (unsigned long) p; > > p->utime = p->stime = 0; > + p->rchar = p->wchar = p->syscr = p->syscw = 0; We generally prefer p->rchar = 0; p->wchar = 0; etc. yes, the code which is there has already sinned - feel free to clean it up while you're there ;) |
From: Jay L. <jl...@en...> - 2004-11-04 23:48:46
|
Andrew Morton wrote: > Jay Lan <jl...@en...> wrote: > >>1/2: acct_io >> >>Enahanced I/O accounting data collection. >> > > > It's nice to use `diff -p' so people can see what functions you're hitting. Will be done that way next time. > > >>+ current->syscr++; > > > Should these metrics be per-thread or per-heavyweight process? Our customers found it useful per process. > > >>+ if (ret > 0) { >>+ current->rchar += ret; >>+ } > > > It's conventional to omit the braces if there is only one statement in the > block. Fixed. Also a few other places in the same siutation. > > >>=================================================================== >>--- linux.orig/include/linux/sched.h 2004-10-01 17:01:21.412848229 -0700 >>+++ linux/include/linux/sched.h 2004-10-01 17:09:42.723482260 -0700 >>@@ -591,6 +591,9 @@ >> struct rw_semaphore pagg_sem; > > > There is no `pagg_sem' in the kernel, so this will spit a reject. Fixed. That was from pagg, but i should not assume that. > > >> #endif >> >>+/* i/o counters(bytes read/written, #syscalls */ >>+ unsigned long rchar, wchar, syscr, syscw; >>+ > > > These will overflow super-quick. Shouldn't they be 64-bit? You are correct. Thanks! > > >>--- linux.orig/kernel/fork.c 2004-10-01 17:01:21.432379595 -0700 >>+++ linux/kernel/fork.c 2004-10-01 17:09:42.732271376 -0700 >>@@ -995,6 +995,7 @@ >> p->real_timer.data = (unsigned long) p; >> >> p->utime = p->stime = 0; >>+ p->rchar = p->wchar = p->syscr = p->syscw = 0; > > > We generally prefer > > p->rchar = 0; > p->wchar = 0; > etc. > > yes, the code which is there has already sinned - feel free to clean it up > while you're there ;) Will be corrected. Thanks, - jay > > > > ------------------------------------------------------- > This SF.net email is sponsored by: IT Product Guide on ITManagersJournal > Use IT products in your business? Tell us what you think of them. Give us > Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more > http://productguide.itmanagersjournal.com/guidepromo.tmpl > _______________________________________________ > Lse-tech mailing list > Lse...@li... > https://lists.sourceforge.net/lists/listinfo/lse-tech |
From: Jay L. <jl...@en...> - 2004-10-22 01:35:54
Attachments:
acct_mm
|
2/2: acct_mm Enhanced MM accounting data collection. Signed-off-by: Jay Lan <jl...@sg...> |
From: Andrew M. <ak...@os...> - 2004-10-22 02:28:06
|
Please don't send multiple patches under the same Subject:. It confuses me and breaks my patch processing tools (I strip out the "1/2" numbering because it becomes irrelevant). Please choose a meaningful and distinct title for each patch. See http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt Jay Lan <jl...@en...> wrote: > > 2/2: acct_mm > > Enhanced MM accounting data collection. > > Index: linux/include/linux/sched.h > =================================================================== > --- linux.orig/include/linux/sched.h 2004-10-01 17:16:35.105905373 -0700 > +++ linux/include/linux/sched.h 2004-10-14 12:15:33.450280955 -0700 > @@ -249,6 +249,8 @@ > struct kioctx *ioctx_list; > > struct kioctx default_kioctx; > + > + unsigned long hiwater_rss, hiwater_vm; > }; unsigned long hiwater_rss; /* comment goes here */ unsigned long hiwater_vm; /* and here */ > > extern int mmlist_nr; > @@ -593,6 +595,10 @@ > > /* i/o counters(bytes read/written, #syscalls */ > unsigned long rchar, wchar, syscr, syscw; > +#if defined(CONFIG_BSD_PROCESS_ACCT) > + u64 acct_rss_mem1, acct_vm_mem1; > + clock_t acct_stimexpd; > +#endif Please place the above three fields on separate lines and document them. It's not clear to me what, semantically, these fields represent. That's something which is appropriate for the supporting changelog entry. > +/* Update highwater values */ > +static inline void update_mem_hiwater(void) > +{ > + if (current->mm) { > + if (current->mm->hiwater_rss < current->mm->rss) { > + current->mm->hiwater_rss = current->mm->rss; > + } > + if (current->mm->hiwater_vm < current->mm->total_vm) { > + current->mm->hiwater_vm = current->mm->total_vm; > + } > + } > +} If this has more than one callsite then it it too big to inline. If it has a single callsite then it's OK to inline it, but it can and should be moved into the .c file. > + > +static inline void acct_update_integrals(void) > +{ > + long delta; > + > + if (current->mm) { > + delta = current->stime - current->acct_stimexpd; > + current->acct_stimexpd = current->stime; > + current->acct_rss_mem1 += delta * current->mm->rss; > + current->acct_vm_mem1 += delta * current->mm->total_vm; > + } > +} Consider caching `current' in a local variable - sometimes gcc likes to reevaluate it each time and it takes 14 bytes of code per pop. This function is too big to inline. > +static inline void acct_clear_integrals(struct task_struct *tsk) > +{ > + if (tsk) { > + tsk->acct_stimexpd = 0; > + tsk->acct_rss_mem1 = 0; > + tsk->acct_vm_mem1 = 0; > + } > +} Do any of the callers pass in a null `tsk'? |
From: Jay L. <jl...@en...> - 2004-11-04 23:55:13
|
Andrew Morton wrote: > Please don't send multiple patches under the same Subject:. It confuses me > and breaks my patch processing tools (I strip out the "1/2" numbering > because it becomes irrelevant). > > Please choose a meaningful and distinct title for each patch. See > http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt Will do better next time :) > > Jay Lan <jl...@en...> wrote: > >>2/2: acct_mm >> >>Enhanced MM accounting data collection. >> > > >>Index: linux/include/linux/sched.h >>=================================================================== >>--- linux.orig/include/linux/sched.h 2004-10-01 17:16:35.105905373 -0700 >>+++ linux/include/linux/sched.h 2004-10-14 12:15:33.450280955 -0700 >>@@ -249,6 +249,8 @@ >> struct kioctx *ioctx_list; >> >> struct kioctx default_kioctx; >>+ >>+ unsigned long hiwater_rss, hiwater_vm; >> }; > > > unsigned long hiwater_rss; /* comment goes here */ > unsigned long hiwater_vm; /* and here */ Will do. > > >> >> extern int mmlist_nr; >>@@ -593,6 +595,10 @@ >> >> /* i/o counters(bytes read/written, #syscalls */ >> unsigned long rchar, wchar, syscr, syscw; >>+#if defined(CONFIG_BSD_PROCESS_ACCT) >>+ u64 acct_rss_mem1, acct_vm_mem1; >>+ clock_t acct_stimexpd; >>+#endif > > > Please place the above three fields on separate lines and document them. > > It's not clear to me what, semantically, these fields represent. That's > something which is appropriate for the supporting changelog entry. Certainly! > > >>+/* Update highwater values */ >>+static inline void update_mem_hiwater(void) >>+{ >>+ if (current->mm) { >>+ if (current->mm->hiwater_rss < current->mm->rss) { >>+ current->mm->hiwater_rss = current->mm->rss; >>+ } >>+ if (current->mm->hiwater_vm < current->mm->total_vm) { >>+ current->mm->hiwater_vm = current->mm->total_vm; >>+ } >>+ } >>+} > > > If this has more than one callsite then it it too big to inline. > > If it has a single callsite then it's OK to inline it, but it can and > should be moved into the .c file. > > >>+ >>+static inline void acct_update_integrals(void) >>+{ >>+ long delta; >>+ >>+ if (current->mm) { >>+ delta = current->stime - current->acct_stimexpd; >>+ current->acct_stimexpd = current->stime; >>+ current->acct_rss_mem1 += delta * current->mm->rss; >>+ current->acct_vm_mem1 += delta * current->mm->total_vm; >>+ } >>+} > > > Consider caching `current' in a local variable - sometimes gcc likes to > reevaluate it each time and it takes 14 bytes of code per pop. > > This function is too big to inline. If not inline, where these functions do you suggest to place? How about kernel/acct.c? > > >>+static inline void acct_clear_integrals(struct task_struct *tsk) >>+{ >>+ if (tsk) { >>+ tsk->acct_stimexpd = 0; >>+ tsk->acct_rss_mem1 = 0; >>+ tsk->acct_vm_mem1 = 0; >>+ } >>+} > > > Do any of the callers pass in a null `tsk'? No. Just to be safe, i guess :( *** The revised version of acct_io and acct_mm should be posted soon. Thank you very much for reviewing and comments. - jay > > > > ------------------------------------------------------- > This SF.net email is sponsored by: IT Product Guide on ITManagersJournal > Use IT products in your business? Tell us what you think of them. Give us > Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more > http://productguide.itmanagersjournal.com/guidepromo.tmpl > _______________________________________________ > Lse-tech mailing list > Lse...@li... > https://lists.sourceforge.net/lists/listinfo/lse-tech |
From: Andrew M. <ak...@os...> - 2004-10-22 02:13:41
|
Jay Lan <jl...@en...> wrote: > > These two patches are the one we submitted to SuSE for Sles9 SP1. Did suse accept them? > They are clean of CSA specific code. > > In earlier round of discussion, all partipants favored a common > layer of accounting data collection. I believe these two patches are > the super set that meets the needs of people who need enhanced BSD > accounting. OK, well I shall assume that all participants are reading lse-tech, or that readers of lse-tech will pass on the appropriate heads-up. Because if I don't hear from anyone, this all goes in. > This patchset consists of two parts: acct_io and acct_mm, as we > identified improved data collection in the area of IO and MM are > useful to our customers. > > It is intended to offer common data collection method for various > accounting packages including BSD accouting, ELSA, CSA, and any other > acct packages that favor a common layer of data collection. > > 'acct_mm' defines a few macros that are no-op unless > CONFIG_BSD_PROCESS_ACCT config flag is set on. > > Andrew, please consider including these two patches. Please let me > know how i can help! These patches are really, really badly documented. This is by no means unusual, but I labour on. A nice description of what you set out to do and how you did it would be a nice thing to present. I have a few comments on the present implementation and shall follow up to those patches. |