From: Andrew M. <ak...@os...> - 2006-03-30 05:04:17
|
Shailabh Nagar <na...@wa...> wrote:
>
> Could you please include the following delay accounting patches
> in -mm ?

I'm at a loss to evaluate the suitability of this work, really. I always
am when accounting patches come along.

There are various people and various groups working on various different
things and there appears to be no coordination and little commonality of
aims. I worry that picking one submission basically at random will provide
nothing which the other groups can work on to build up their feature.

On the other hand, we don't want to do nothing until some uber-grand
all-singing, all-dancing statistics-gathering infrastructure comes along.

So I'm a bit stuck. What I would like to see happen is that there be some
coordination between the various stakeholders, and some vague plan which
they're all happy with as a basis for the eventual grand solution.

We already have various bits and pieces of statistics gathering in the
kernel and it's already a bit ad-hoc. Adding more one-requirement-specific
accounting code won't improve that situation.

But then, I said all this a year or two ago and nothing much has happened
since then. It's not your fault, but it's a problem.

Perhaps a good starting point would be a one-page bullet-point-form
wishlist of all the accounting which people want to get out of the kernel,
and a description of what the kernel<->user interface should look like.
Right now, I don't think we even have a picture of that.

We need a statistics maintainer, too, to pull together the plan,
coordinate, push things forwards. The first step would be to identify the
stakeholders, come up with that page of bullet-points.

Then again, maybe the right thing to do is to keep adding low-impact
requirement-specific statistics patches as they come along. But if we're
going to do it that way, we need an up-front reason for doing so, and I
don't know what that would be.

See my problem?
|
From: Balbir S. <ba...@in...> - 2006-03-30 06:26:44
|
On Wed, Mar 29, 2006 at 09:03:14PM -0800, Andrew Morton wrote:
> Shailabh Nagar <na...@wa...> wrote:
> >
> > Could you please include the following delay accounting patches
> > in -mm ?
>
> I'm at a loss to evaluate the suitability of this work, really. I always
> am when accounting patches come along.
>
> [...]
>
> Then again, maybe the right thing to do is to keep adding low-impact
> requirement-specific statistics patches as they come along. But if we're
> going to do it that way, we need an up-front reason for doing so, and I
> don't know what that would be.
>
> See my problem?

One of the issues we have tried to address is the ability to provide some
form of a common ground for all the statistics to co-exist. Various methods
were discussed for exchanging data between kernel and user space, genetlink
was suggested often and the clear winner.

To that end, we have created a taskstats.c file. Any subsystem wanting
to add their statistics and sending it to user space can add their own
types by extending taskstats.c (changing the version number) and creating
their own types using genetlink. They will have to do the following

1. Add statistics gathering in their own subsystem
2. Add a type to taskstats.c, extend it and use data from (1) and send
   it to user space.

The data from various subsystems can co-exist. I feel that this could serve as
the basic common infrastructure to begin with and refined later (depending on
the needs of other people).

Thoughts?
Balbir
|
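A minimal sketch of the "extend the record and change the version number" idea described above. It assumes a plain fixed-layout record; the names demo_stats, demo_stats_size and demo_stats_parse are hypothetical, and this is not the struct or the genetlink code from the posted patches. It only shows how a reader could accept records from an older sender once a second subsystem has appended fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout for illustration only; this is NOT the
 * structure from the posted taskstats patches. */
struct demo_stats {
	uint16_t version;        /* bumped whenever fields are appended */
	/* version 1: delay accounting fields */
	uint64_t cpu_delay_ns;
	uint64_t blkio_delay_ns;
	/* version 2: fields appended by a second subsystem */
	uint64_t read_chars;
	uint64_t write_chars;
};

/* Number of bytes a reader should consume for a given version, 0 if invalid. */
static size_t demo_stats_size(uint16_t version)
{
	if (version == 1)
		return offsetof(struct demo_stats, read_chars);
	if (version >= 2)
		return sizeof(struct demo_stats); /* newer sender: known prefix only */
	return 0;
}

/* Copy a received record into *out, zero-filling any fields the sender's
 * version did not provide. Returns 0 on success, -1 on a bad record. */
static int demo_stats_parse(const void *buf, size_t len, struct demo_stats *out)
{
	uint16_t version;
	size_t need;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (len < sizeof(version))
		return -1;
	memcpy(&version, buf, sizeof(version));
	need = demo_stats_size(version);
	if (need == 0 || len < need)
		return -1;
	memcpy(out, buf, need);
	return 0;
}
```

Appending fields only at the end, and never reordering them, is what lets an older record be read by newer code in this sketch; whether the real interface makes the same choice is not stated in this thread.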
From: Andrew M. <ak...@os...> - 2006-03-30 06:48:32
|
Balbir Singh <ba...@in...> wrote:
>
> [...]
>
> One of the issues we have tried to address is the ability to provide some
> form of a common ground for all the statistics to co-exist. Various methods
> were discussed for exchanging data between kernel and user space, genetlink
> was suggested often and the clear winner.
>
> To that end, we have created a taskstats.c file. Any subsystem wanting
> to add their statistics and sending it to user space can add their own
> types by extending taskstats.c (changing the version number) and creating
> their own types using genetlink. They will have to do the following
>
> 1. Add statistics gathering in their own subsystem
> 2. Add a type to taskstats.c, extend it and use data from (1) and send
>    it to user space.
>
> The data from various subsystems can co-exist. I feel that this could serve as
> the basic common infrastructure to begin with and refined later (depending on
> the needs of other people).
>

Sounds fine to me, but I'm not a stakeholder.

Trolling back through lse-tech gives us:

pnotify:
        Erik Jacobson <er...@sg...>

CSA accounting/PAGG/JOB:
        Jay Lan <jl...@en...>
        Limin Gu <li...@db...>

per-process IO statistics:
        Levent Serinol <lse...@gm...>

ELSA:
        Guillaume Thouvenin <gui...@bu...>

per-cpu time statistics:
        Erich Focht <ef...@es...>

Scalable statistics counters with /proc reporting:
        Ravikiran G Thirumalai <ki...@in...>
        (Kiran left IBM, but presumably the requirement lives on)

There was a long thread "A common layer for Accounting packages". Did it
come to a conclusion?

Anyway, if mostly everyone is mostly happy with what you propose then that
is good news.
|
From: Paul J. <pj...@sg...> - 2006-03-30 09:55:46
|
Andrew wrote:
> CSA accounting/PAGG/JOB:
>         Jay Lan <jl...@en...>
>         Limin Gu <li...@db...>

You can remove Limin Gu from this list. She has joined
the ranks of former-SGI employees, some time back.

We wish her well.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401
|
From: Dipankar S. <dip...@in...> - 2006-03-30 13:26:06
|
On Wed, Mar 29, 2006 at 10:47:37PM -0800, Andrew Morton wrote:
>
> Sounds fine to me, but I'm not a stakeholder.
>
> Trolling back through lse-tech gives us:
>
> Scalable statistics counters with /proc reporting:
>         Ravikiran G Thirumalai <ki...@in...>
>         (Kiran left IBM, but presumably the requirement lives on)

Not necessarily in that form. A lot of statistics have now become
per-cpu, something we wanted to achieve back then. Automatic /proc
reporting was an idea only tossed around, but /proc is now
deprecated for such things. There may be a need for fast export
of counters to userspace, but those requirements are not yet
clear. This is different from the per-task accounting infrastructure
that people are trying to develop.

Thanks
Dipankar
|
From: Shailabh N. <na...@wa...> - 2006-03-30 17:23:55
|
Andrew Morton wrote:

>Balbir Singh <ba...@in...> wrote:
>
>>On Wed, Mar 29, 2006 at 09:03:14PM -0800, Andrew Morton wrote:
>>
>>>Shailabh Nagar <na...@wa...> wrote:
>>>
>>>>Could you please include the following delay accounting patches
>>>>in -mm ?
>>>>
>>>I'm at a loss to evaluate the suitability of this work, really. I always
>>>am when accounting patches come along.
>>>
>>>There are various people and various groups working on various different
>>>things and there appears to be no coordination and little commonality of
>>>aims. I worry that picking one submission basically at random will provide
>>>nothing which the other groups can work on to build up their feature.
>>>
>>>On the other hand, we don't want to do nothing until some uber-grand
>>>all-singing, all-dancing statistics-gathering infrastructure comes along.
>>>
>>>So I'm a bit stuck. What I would like to see happen is that there be some
>>>coordination between the various stakeholders, and some vague plan which
>>>they're all happy with as a basis for the eventual grand solution.
>>>
>>>We already have various bits and pieces of statistics gathering in the
>>>kernel and it's already a bit ad-hoc. Adding more one-requirement-specific
>>>accounting code won't improve that situation.
>>>
>>>But then, I said all this a year or two ago and nothing much has happened
>>>since then. It's not your fault, but it's a problem.
>>>

Yes, I agree it is a problem. We found it ourselves while developing this
patchset. BSD accounting had some properties we liked (like availability
of stats for a process after it died) but the way to extend it or get access
to those stats while a process was alive wasn't all that good. Similarly,
CSA had needs like ours but not quite the same.

Our compromise solution, prompted by your comments on getting a consensus
for the use of a "statistics connector" for all accounting stakeholders, was
the taskstats interface, as described by Balbir below.

But it is not the complete solution or an attempt to get some common
accounting infrastructure, true :-(

>>>Perhaps a good starting point would be a one-page bullet-point-form
>>>wishlist of all the accounting which people want to get out of the kernel,
>>>and a description of what the kernel<->user interface should look like.
>>>Right now, I don't think we even have a picture of that.
>>>
>>>We need a statistics maintainer, too, to pull together the plan,
>>>coordinate, push things forwards. The first step would be to identify the
>>>stakeholders, come up with that page of bullet-points.
>>>
>>>Then again, maybe the right thing to do is to keep adding low-impact
>>>requirement-specific statistics patches as they come along.
>>>

Personally, this is the approach I favor, with unification happening
piecewise, at least as far as the collection of statistics is concerned.

The interface for making stats available outside would seem to be more
in need of a unified approach since we already have a profusion of export
methods, some legacy and some being introduced by folks like us.

>>>But if we're
>>>going to do it that way, we need an up-front reason for doing so, and I
>>>don't know what that would be.
>>>
>>>See my problem?
>>>
>>One of the issues we have tried to address is the ability to provide some
>>form of a common ground for all the statistics to co-exist. Various methods
>>were discussed for exchanging data between kernel and user space, genetlink
>>was suggested often and the clear winner.
>>
>>To that end, we have created a taskstats.c file. Any subsystem wanting
>>to add their statistics and sending it to user space can add their own
>>types by extending taskstats.c (changing the version number) and creating
>>their own types using genetlink. They will have to do the following
>>
>>1. Add statistics gathering in their own subsystem
>>2. Add a type to taskstats.c, extend it and use data from (1) and send
>>   it to user space.
>>
>>The data from various subsystems can co-exist. I feel that this could serve as
>>the basic common infrastructure to begin with and refined later (depending on
>>the needs of other people).
>>
>
>Sounds fine to me, but I'm not a stakeholder.
>
>Trolling back through lse-tech gives us:
>
>pnotify:
>        Erik Jacobson <er...@sg...>
>
>CSA accounting/PAGG/JOB:
>        Jay Lan <jl...@en...>
>        Limin Gu <li...@db...>
>
>per-process IO statistics:
>        Levent Serinol <lse...@gm...>
>
>ELSA:
>        Guillaume Thouvenin <gui...@bu...>
>
>per-cpu time statistics:
>        Erich Focht <ef...@es...>
>
>Scalable statistics counters with /proc reporting:
>        Ravikiran G Thirumalai <ki...@in...>
>        (Kiran left IBM, but presumably the requirement lives on)
>

To this list we can also add

Microstate accounting
        Peter Chubb <pe...@ch...>

I don't know if Peter is still interested in pursuing this or it was
rejected.

>There was a long thread "A common layer for Accounting packages". Did it
>come to a conclusion?
>

Unfortunately, not.

>Anyway, if mostly everyone is mostly happy with what you propose then that
>is good news.
>

It would seem like a good first step then, for me to contact the folks above
and see if they are able to use the interface we're proposing and modify it
if needed.

--Shailabh
|
From: Peter C. <pe...@ge...> - 2006-03-31 02:56:17
|
>>>>> "Shailabh" == Shailabh Nagar <na...@wa...> writes:

Shailabh> To this list we can also add

Shailabh> Microstate accounting Peter Chubb
Shailabh> <pe...@ch...> I don't know if Peter is still
Shailabh> interested in pursuing this or it was rejected.

It's still maintained in a sporadic sort of way --- I update it when
either I need it for something, or someone's downloaded it and asks
why it doesn't work against kernel X.Y.Z. I see a few downloads a
month.

My microstate accounting patch overlaps the delay accounting patch quite a
lot in functionality (but I think mine is cleaner except for interrupt
time accounting... which the delay accounting patch doesn't do. I
wanted to know how much time a thread *really* had on the processor,
subtracting off the time spent in interrupt handlers for some other
process).

--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia
|
From: Shailabh N. <na...@wa...> - 2006-03-31 05:27:46
|
Peter Chubb wrote:

>>>>>>"Shailabh" == Shailabh Nagar <na...@wa...> writes:
>
>Shailabh> To this list we can also add
>
>Shailabh> Microstate accounting Peter Chubb
>Shailabh> <pe...@ch...> I don't know if Peter is still
>Shailabh> interested in pursuing this or it was rejected.
>
>It's still maintained in a sporadic sort of way --- I update it when
>either I need it for something, or someone's downloaded it and asks
>why it doesn't work against kernel X.Y.Z. I see a few downloads a
>month.
>

So do you intend to pursue acceptance ? If so, do you think the
netlink-based taskstats interface provided by the delay accounting
patches could be an acceptable substitute for the interfaces you had
(from an old lkml post, they appear to be /proc/tgid/msa and a
syscall based one) ?

>My microstate accounting patch overlaps the delay accounting patch quite a
>lot in functionality (but I think mine is cleaner except for interrupt
>time accounting... which the delay accounting patch doesn't do. I
>wanted to know how much time a thread *really* had on the processor,
>subtracting off the time spent in interrupt handlers for some other
>process).
>

Thanks. Will incorporate into a note on the mechanisms of the other
accounting patches.

--Shailabh
|
From: Peter C. <pe...@ge...> - 2006-03-31 08:19:12
|
>>>>> "Shailabh" == Shailabh Nagar <na...@wa...> writes:

Shailabh> Peter Chubb wrote:

(microstate accounting patch)
>> It's still maintained in a sporadic sort of way --- I update it
>> when either I need it for something, or someone's downloaded it and
>> asks why it doesn't work against kernel X.Y.Z. I see a few
>> downloads a month.
>>

Shailabh> So do you intend to pursue acceptance ? If so, do you think
Shailabh> the netlink-based taskstats interface provided by the delay
Shailabh> accounting patches could be an acceptable substitute for the
Shailabh> interfaces you had (from an old lkml post, they appear to be
Shailabh> /proc/tgid/msa and a syscall based one) ?

I'd have to take a close look. The syscall interface is modelled on
getrusage(), and only lets you get your own or your children's data;
I'm not too worried about trashing it, as it should be possible to
emulate in terms of netlink (albeit at a cost; system calls are
relatively cheap).

/proc/<pid>/task/<tid>/msa lets you get at anything you own. I use
awk scripts to process the msa file in /proc/... and pipe it into
gnuplot at n second intervals; a netlink interface would need to have
an auxiliary program to read it and then squirt it into the scripts, I
think --- or is there a way to get ASCII out on demand? I quite often
use cat to do quick checks on what's going on too --- so overall I think
the /proc interface is desirable.

--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia
|
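For readers unfamiliar with the model referred to above, a short sketch of the standard getrusage() call, which likewise only reports on the caller or its waited-for children. This is the POSIX interface, not the microstate accounting syscall, whose exact signature is not given in this thread; the helper name cpu_time_usec is made up for the example.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Total user + system CPU time in microseconds for the given scope,
 * or -1 if getrusage() fails. */
static long long cpu_time_usec(int who)
{
	struct rusage ru;

	/* As with the interface described above, only "self" and
	 * "children" are available; there is no way to name another pid. */
	assert(who == RUSAGE_SELF || who == RUSAGE_CHILDREN);
	if (getrusage(who, &ru) != 0)
		return -1;
	return ((long long)ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
		+ ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}
```

A caller would use cpu_time_usec(RUSAGE_SELF) for its own usage and cpu_time_usec(RUSAGE_CHILDREN) for terminated, waited-for children.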
From: Shailabh N. <na...@wa...> - 2006-03-31 16:03:45
|
Peter Chubb wrote:

>>>>>>"Shailabh" == Shailabh Nagar <na...@wa...> writes:
>
>Shailabh> Peter Chubb wrote:
>
> (microstate accounting patch)
>
>>> It's still maintained in a sporadic sort of way --- I update it
>>>when either I need it for something, or someone's downloaded it and
>>>asks why it doesn't work against kernel X.Y.Z. I see a few
>>>downloads a month.
>>>
>
>Shailabh> So do you intend to pursue acceptance ? If so, do you think
>Shailabh> the netlink-based taskstats interface provided by the delay
>Shailabh> accounting patches could be an acceptable substitute for the
>Shailabh> interfaces you had (from an old lkml post, they appear to be
>Shailabh> /proc/tgid/msa and a syscall based one) ?
>
>I'd have to take a close look.
>

Please do ! As I mentioned in the other note where I summarize the various
accounting packages, I think it should be fairly easy for microstate
accounting to extend the structure returned by the taskstats interface.

> The syscall interface is modelled on
>getrusage(), and only lets you get your own or your children's data;
>I'm not too worried about trashing it, as it should be possible to
>emulate in terms of netlink (albeit at a cost; system calls are
>relatively cheap)
>
>/proc/<pid>/task/<tid>/msa lets you get at anything you own. I use
>awk scripts to process the msa file in /proc/... and pipe it into
>gnuplot at n second intervals; a netlink interface would need to have
>an auxiliary program to read it and then squirt it into the scripts, I
>think --- or is there a way to get ASCII out on demand?
>

No. The use of netlink pretty much means you have to use an auxiliary
program. We provide one already (as part of the documentation to the patches).
What netlink buys you is the ability to
- get data for a task after it has exited (i.e. netlink serves as a buffer)
- get data for a large number of tasks more efficiently than /proc

>I quite often
>use cat to do quick checks on what's going on too --- so overall I think
>the /proc interface is desirable.
>

Yes, /proc is more convenient both for cat'ting and also since it's used by
tools like top. Delay accounting patches also provide the "block I/O wait
(including swapin)" statistic through /proc/tgid/stat for convenience and
so that top etc. can use it while displaying per-task stats.

However, the question here is this: *if* a single, unified interface for
per-task statistics was deemed to be desirable (as Andrew is effectively
suggesting we explore), what would that interface be ? /proc-based,
netlink-based or syscall-based ?

I would submit it is netlink-based since it is a superset of /proc and
syscalls. Neither of the latter two can return data after a task has exited
(at least not easily... you can always invent infrastructure to buffer
per-task stats but it would be cumbersome), whereas netlink can, with
the help of an auxiliary program, provide the same data that /proc and
syscalls can.

The price paid by /proc and syscall users for unification is convenience,
not loss of functionality.

Would you agree ?

--Shailabh
|
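To illustrate the "auxiliary program" point above: once such a program has received a binary per-task record, turning it into one whitespace-separated ASCII line is enough for awk or gnuplot pipelines like the ones Peter describes. The sketch below covers only that formatting step, not the netlink receive path; the record layout and the names demo_task_record and demo_format_record are hypothetical and not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-task record as an auxiliary program might hold it
 * after receiving it from the kernel; not the layout from the patches. */
struct demo_task_record {
	uint32_t pid;
	uint64_t cpu_delay_ns;
	uint64_t blkio_delay_ns;
};

/* Format one record as "pid cpu_delay blkio_delay\n" so that awk can
 * split it on whitespace. Returns what snprintf() returns: the length
 * the full line needs, excluding the terminating NUL. */
static int demo_format_record(const struct demo_task_record *rec,
			      char *buf, size_t len)
{
	assert(rec != NULL && buf != NULL);
	return snprintf(buf, len, "%u %llu %llu\n",
			(unsigned int)rec->pid,
			(unsigned long long)rec->cpu_delay_ns,
			(unsigned long long)rec->blkio_delay_ns);
}
```

Writing each formatted line to stdout would let the existing awk scripts consume it unchanged, at the cost of running one extra process.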
From: Shailabh N. <na...@wa...> - 2006-03-31 06:43:01
|
Andrew Morton wrote: >Balbir Singh <ba...@in...> wrote: > > >>On Wed, Mar 29, 2006 at 09:03:14PM -0800, Andrew Morton wrote: >> >> >>>Shailabh Nagar <na...@wa...> wrote: >>> >>> >>>>Could you please include the following delay accounting patches >>>> in -mm ? >>>> >>>> >>>I'm at a loss to evaluate the suitability of this work, really. I always >>>am when accounting patches come along. >>> >>>There are various people and various groups working on various different >>>things and there appears to be no coordination and little commonality of >>>aims. I worry that picking one submission basically at random will provide >>>nothing which the other groups can work on to build up their feature. >>> >>>On the other hand, we don't want to do nothing until some uber-grand >>>all-singing, all-dancing statistics-gathering infrastructure comes along. >>> >>>So I'm a bit stuck. What I would like to see happen is that there be some >>>coordination between the various stakeholders, and some vague plan which >>>they're all happy with as a basis for the eventual grand solution. >>> >>>We already have various bits and pieces of statistics gathering in the >>>kernel and it's already a bit ad-hoc. Adding more one-requirement-specific >>>accounting code won't improve that situation. >>> >>>But then, I said all this a year or two ago and nothing much has happened >>>since then. It's not your fault, but it's a problem. >>> >>>Perhaps a good starting point would be a one-page bullet-point-form >>>wishlist of all the accounting which people want to get out of the kernel, >>>and a description of what the kernel<->user interface should look like. >>>Right now, I don't think we even have a picture of that. >>> >>>We need a statistics maintainer, too, to pull together the plan, >>>coordinate, push things forwards. The first step would be to identify the >>>stakeholders, come up with that page of bullet-points. 
>>> >>>Then again, maybe the right thing to do is to keep adding low-impact >>>requirement-specific statistics patches as they come along. But if we're >>>going to do it that way, we need an up-front reason for doing so, and I >>>don't know what that would be. >>> >>>See my problem? >>> >>> >>One of the issues we have tried to address is the ability to provide some >>form of a common ground for all the statistics to co-exist. Various methods >>were discussed for exchanging data between kernel and user space, genetlink >>was suggested often and the clear winner. >> >>To that end, we have created a taskstats.c file. Any subsystem wanting >>to add their statistics and sending it to user space can add their own >>types by extending taskstats.c (changing the version number) and creating >>their own types using genetlink. They will have to do the following >> >>1. Add statistics gathering in their own subsystem >>2. Add a type to taskstats.c, extend it and use data from (1) and send >> it to user space. >> >>The data from various subsystems can co-exist. I feel that this could serve as >>the basic common infrastructure to begin with and refined later (depending on >>the needs of other people). >> >> >> > >Sounds fine to me, but I'm not a stakeholder. > >Trolling back through lse-tech gives us: > >pnotify: > Erik Jacobson <er...@sg...> > >CSA accounting/PAGG/JOB: > Jay Lan <jl...@en...> > Limin Gu <li...@db...> > >per-process IO statistics: > Levent Serinol <lse...@gm...> > >ELSA: > Guillaume Thouvenin <gui...@bu...> > >per-cpu time statistics: > Erich Focht <ef...@es...> > >Scalable statistics counters with /proc reporting: > Ravikiran G Thirumalai <ki...@in...> > (Kiran feft IBM, but presumably the requirement lives on) > >There was a long thread "A common layer for Accounting packages". Did it >come to a conclusion? > >Anyway, if mostly everyone is mostly happy with what you propose then that >it good news. 
> > Following Andrew's suggestion, here's my quick overview of the various other accounting packages that have been proposed on lse-tech with a focus on whether they can utilize the netlink-based taskstats interface being proposed by the delay accounting patches. Please note that unification of statistics *collection* is not being discussed since that kind of merger can be done as these patches get accepted, if at all, into the kernel. To try and unify right away would hold every patch (esp. delay accounting !) hostage to the problems in every other patch unnecessarily. As long as the interface can be unified, the merger of the collection bits can always happen without affecting user space. Stakeholders of each of these patches, on cc, are requested to please correct any misunderstandings of what their patches do. Also, please comment on the observations about their patch's ability to use the netlink-based taskstats interface, code for which was posted at http://www.uwsg.indiana.edu/hypermail/linux/kernel/0603.3/1787.html Thanks, --Shailabh Summary The following can use the taskstats netlink-based interface by extending the returned data structure - Comprehensive System Accounting - per-process I/O stats - Microstate accounting - per cpu time stats The following patches' interface needs are independent of taskstats or subsumed by one of above: - Enhanced Linux System Accounting - pnotify - scalable statistics counters Details (please correct if these are misunderstood) 1. Comprehensive System Accounting (Jay Lan) -------------------------------------------- - Collect various per-task statistics and write an accounting record containing these stats at task exit. Interface similar to BSD process accounting but the accounting record structure is quite different. - CSA could utilize some stats collected/exported by delay accounting: blkio wait time, cpu run time for task - CSA only needs data to be available at task exit, not during the task's lifetime. 
Moreover, at task exit, it needs the accounting record to be written to a file. - CSA could utilize delay accounting's taskstats netlink interface to gather task data at exit through a userspace utility that then writes it out to its expected file. To do so, CSA would need the taskstats struct to be extended with whatever additional stats it needs. The additional stats could be selectively exported only on task exit to avoid imposing a space burden on users of delay accounting who query a process's statistics during its lifetime. Collection of the additional stats needed by CSA may be tied to pnotify and job patches which are still being reviewed/considered for acceptance. As such, unification in the collection of stats can be deferred until status of pnotify/job/CSA patches becomes more clear. 2. per-process I/O statistics (Levent Serinol) ---------------------------------------------- - Exports task->{rchar,wchar} through /proc/tgid/iostat (earlier version proposed export through /proc/tgid/stats) - No new stats collection. Just export of existing task fields - Problem with accepting the patch stems from the accuracy of the statistics in these fields. The fields are updated only in three cases today (sys_read/write, sys_readv/writev, do_sendfile) so they aren't accurate. Async I/O, memory-mapped I/O is not counted at the very least. CSA patches also export these fields through their accounting record but don't appear to be doing anything to improve accuracy of collection (or maybe it doesn't matter to them). BSD accounting, which ought to be using the sum of these fields for its ac_io field, doesn't (it hardcodes the output to zero). When the fate of task->rchar/wchar is decided, based on CSA's needs, those fields can be easily added to taskstats. 3. 
per-cpu time statistics (Erich Focht) ---------------------------------------- - Collects time spent by a task on each cpu of a system and exports it through new interface /proc/tgid/cpu - Statistic is needed for performance analysis/debugging (like schedstats) and not for production systems. - Unsure why push for acceptance was abandoned. Possibly due to one or more of: space overhead of allocating NR_CPUS variables in task_struct, time overhead of collecting the data ? - Can use taskstats interface to export the data by adding needed fields to struct taskstats and bumping up the version. 4. Microstate accounting (Peter Chubb) -------------------------------------- - Measure time spent by a thread in various interesting states, while accounting for interrupts, and export through /proc/tid/msa and through a syscall interface - Interesting states have some overlap with delay accounting - Exporting of per-task stats can be done through taskstats netlink interface 5. Enhanced Linux System Accounting (Guillaume Thouvenine) ---------------------------------------------------------- - Group tasks at a user level into "jobs" and aggregate, at user level, per-task statistics collected by CSA and/or BSD process accounting. - ELSA does not introduce any new requirement for either collection or export of statistics from the kernel. It can use either BSD and/or CSA's method of using an accounting file. - ELSA needs notification of forks and exits which it can already get through the process events connector in the kernel. Hence ELSA's needs are either met by the kernel today or are a strict subset of CSA (since BSD accounting is already there). 6. pnotify (Erik Jacobson) -------------------------- - Infrastructure for kernel modules to be notified when an event (like fork/exit/exec) happens to a task. Also provides some per-task data for the modules' convenience - pnotify isn't concerned with exporting data to userspace or collecting any stats. 
That's left to the kernel module that uses pnotify to get notifications. CSA is one expected user of pnotify. 7. Scalable statistics counters (Ravikiran Thirumalai, Dipankar Sarma) ---------------------------------------------------------------------- - Infrastructure for setting up per-cpu counters (not per-task necessarily) - No specific stats collection proposed as part of patch - May need an interface for fast export to userspace, but the requirements are not clear - Not per-task and unlikely to have unification prospects at interface level |
From: Guillaume T. <gui...@bu...> - 2006-03-31 07:32:39
|
On Fri, 31 Mar 2006 01:42:28 -0500 Shailabh Nagar <na...@wa...> wrote: > Following Andrew's suggestion, here's my quick overview > of the various other accounting packages that have been > proposed on lse-tech with a focus on whether they can > utilize the netlink-based taskstats interface being proposed > by the delay accounting patches. > > Please note that unification of statistics *collection* is not > being discussed since that kind of merger can be done as these > patches get accepted, if at all, into the kernel. To try and > unify right away would hold every patch (esp. delay accounting !) > hostage to the problems in every other patch unnecessarily. As > long as the interface can be unified, the merger of the > collection bits can always happen without affecting user space. > > Stakeholders of each of these patches, on cc, are requested to > please correct any misunderstandings of what their patches do. > > Also, please comment on the observations about their patch's > ability to use the netlink-based taskstats interface, code for which > was posted at > > http://www.uwsg.indiana.edu/hypermail/linux/kernel/0603.3/1787.html > [...] > > 5. Enhanced Linux System Accounting (Guillaume Thouvenine) ^^^^^^^^^^ Thouvenin > ---------------------------------------------------------- > > - Group tasks at a user level into "jobs" and aggregate, > at user level, per-task statistics collected by CSA and/or BSD > process accounting. > > - ELSA does not introduce any new requirement for either > collection or export of statistics from the kernel. It can use > either BSD and/or CSA's method of using an accounting file. > > - ELSA needs notification of forks and exits which it can already > get through the process events connector in the kernel. > > Hence ELSA's needs are either met by the kernel today or are a > strict subset of CSA (since BSD accounting is already there). The overview is very interesting and you have a very good comprehension of ELSA. 
As you said, ELSA groups tasks at user level and everything it needs is already in the kernel, so your patches don't cause any trouble for ELSA. As you said in the delay accounting documentation, delay statistics can also be collected for all tasks, and a tool like ELSA can aggregate the results for groups of processes. Cheers, Guillaume |
From: Shailabh N. <na...@wa...> - 2006-03-31 17:02:08
|
Guillaume Thouvenin wrote: >On Fri, 31 Mar 2006 01:42:28 -0500 >Shailabh Nagar <na...@wa...> wrote: > > > >>Following Andrew's suggestion, here's my quick overview >>of the various other accounting packages that have been >>proposed on lse-tech with a focus on whether they can >>utilize the netlink-based taskstats interface being proposed >>by the delay accounting patches. >> >>Please note that unification of statistics *collection* is not >>being discussed since that kind of merger can be done as these >>patches get accepted, if at all, into the kernel. To try and >>unify right away would hold every patch (esp. delay accounting !) >>hostage to the problems in every other patch unnecessarily. As >>long as the interface can be unified, the merger of the >>collection bits can always happen without affecting user space. >> >>Stakeholders of each of these patches, on cc, are requested to >>please correct any misunderstandings of what their patches do. >> >>Also, please comment on the observations about their patch's >>ability to use the netlink-based taskstats interface, code for which >>was posted at >> >>http://www.uwsg.indiana.edu/hypermail/linux/kernel/0603.3/1787.html >> >> >> >[...] > > >>5. Enhanced Linux System Accounting (Guillaume Thouvenine) >> >> > ^^^^^^^^^^ > Thouvenin > > >>---------------------------------------------------------- >> >>- Group tasks at a user level into "jobs" and aggregate, >>at user level, per-task statistics collected by CSA and/or BSD >>process accounting. >> >>- ELSA does not introduce any new requirement for either >>collection or export of statistics from the kernel. It can use >>either BSD and/or CSA's method of using an accounting file. >> >>- ELSA needs notification of forks and exits which it can already >>get through the process events connector in the kernel. >> >>Hence ELSA's needs are either met by the kernel today or are a >>strict subset of CSA (since BSD accounting is already there). 
>> >> > >The overview is very interesting and you have a very good comprehension >of ELSA. As you said ELSA is a group tasks at a user level and >everything is already in the kernel so your patches don't generate >troubles to ELSA. As you said in the delay accounting documentation, >delay statistics can also be collected for all tasks and a tool like >ELSA can aggregate results for groups of processes. > > >Chears, >Guillaume > > Thanks Guillaume. That's one "sign-off" on the taskstats interface then :-) --Shailabh |
From: Dipankar S. <dip...@in...> - 2006-03-31 17:16:33
|
On Fri, Mar 31, 2006 at 01:42:28AM -0500, Shailabh Nagar wrote: > 7. Scalable statistics counters (Ravikiran Thirumalai, Dipankar Sarma) > ---------------------------------------------------------------------- > > - Infrastructure for setting up per-cpu counters (not per-task necessarily) > > - No specific stats collection proposed as part of patch > - May have need for interface for fast export to userspace but requirements > not clear > - Not per-task and unlikely to have unification prospects at interface level This is very old stuff, so we don't need to consider it. It was meant for global counters and not for per-task counters. The main goal was to use per-cpu counters for global counters, and that goal has mostly been achieved since then using static and dynamic per-cpu allocation. Thanks Dipankar |
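The per-cpu scheme for global counters described above can be illustrated with a toy userspace model. This is only a sketch in Python, not kernel code: the class name PerCpuCounter and its methods are hypothetical, and a plain list stands in for the kernel's static or dynamic per-cpu allocation.

```python
class PerCpuCounter:
    """Toy model of one global counter split into one slot per CPU."""

    def __init__(self, nr_cpus):
        # One independent slot per CPU (hypothetical stand-in for per-cpu data).
        self.slots = [0] * nr_cpus

    def add(self, cpu, amount=1):
        # An update touches only the slot of the CPU doing the update,
        # so writers on different CPUs never share a location.
        self.slots[cpu] += amount

    def read(self):
        # The global value is only assembled on demand by summing all slots.
        return sum(self.slots)
```

The design choice being modeled is that updates are cheap and local, while the less frequent read pays the cost of walking every slot.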
From: Jay L. <jl...@en...> - 2006-04-10 17:15:51
|
I sent two pieces of feedback on 3/31, only to see them bounce back over the weekend. :( Here was my first feedback: Shailabh Nagar wrote: >> >>Following Andrew's suggestion, here's my quick overview >>of the various other accounting packages that have been >>proposed on lse-tech with a focus on whether they can >>utilize the netlink-based taskstats interface being proposed >>by the delay accounting patches. >> >>Please note that unification of statistics *collection* is not >>being discussed since that kind of merger can be done as these >>patches get accepted, if at all, into the kernel. To try and >>unify right away would hold every patch (esp. delay accounting !) >>hostage to the problems in every other patch unnecessarily. As >>long as the interface can be unified, the merger of the >>collection bits can always happen without affecting user space. >> >>Stakeholders of each of these patches, on cc, are requested to >>please correct any misunderstandings of what their patches do. > >To me, data collection and formatting before sending down to >userspace is a very important part. What this taskstats netlink >interface does is just to provide an interface to send "already >formatted" data to userspace. In other words, it will replace the >"writing accounting records to an accounting file" step currently >performed in BSD accounting and in CSA. If i understand it correctly, >you have delayacct.c sitting on top of the taskstats interface, and >all other accounting methods should build their own layer on top >of taskstats as well. For example, potentially BSD acct.c can replace >fput() (and other statements dealing with the accounting file) with >this interface. Same for CSA. > >This approach sounds right to me. Actually i am very glad that you >made the effort to provide a common ground here. Yet, this is only >one step. I will apply your patchset on top of 2.6.16-mm to see >what i get and give more feedback later.
And, here is the second one: > > > This taskstats thing is much more complicated than what Guillaume > used to have when he put up a prototype of doing ELSA over netlink. > One confusing point is the struct taskstats. If it is to be used > as the big data struct to contain all accounting data everybody > needs (as Shailabh suggested in his CSA analysis section), then > if at do_exit() every accounting method is to be invoked to > handle its own netlink transmission (as currently implemented in > delayed accounting), would it be a lot of overhead sending "grand > data" too many times? Maybe each layer should just format the data of > its interest when invoked from do_exit, and then we do one call > to genetlink to deliver the formatted struct taskstats data? > > Also, as you pointed out, CSA only retrieves data at the end of a task > but delayed accounting needs to retrieve data during the process. > So, i think we need more than one record type, not just the > struct taskstats, so that the user space delayed accounting > application can ask to get only the delayed accounting record. > > Honestly, this taskstats.c layer looks more like something > extracted from delayed accounting than a carefully designed > common ground to me. Patch 8/8 is more about documentation of delayed > accounting than the common ground for various accounting methods. > Can you please present us documentation of the design concept of > such a common layer? That would help me. I guess i also need to > catch up on genetlink to better understand the taskstats code. > > Regards. > - jay > Regards, - jay |
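The "each layer formats its own data, then one call delivers it" idea from the second piece of feedback above could look roughly like the following. This is a hedged sketch in Python rather than kernel C. The function names fill_delay_fields, fill_csa_fields and deliver_exit_record are hypothetical, the task and record are plain dictionaries, and the send callback merely stands in for the single genetlink transmission.

```python
def fill_delay_fields(task, record):
    """Hypothetical delay accounting layer: copies only its own fields."""
    record["cpu_delay_total"] = task["cpu_delay_total"]


def fill_csa_fields(task, record):
    """Hypothetical CSA layer: copies only its own fields."""
    record["rchar"] = task["rchar"]
    record["wchar"] = task["wchar"]


def deliver_exit_record(task, fillers, send):
    """Let every layer fill one shared record, then transmit it once."""
    record = {}
    for fill in fillers:
        fill(task, record)
    # One transmission per exiting task, however many layers contributed.
    send(record)
    return record
```

With this shape, adding another accounting layer means adding one more filler function, while the number of transmissions per task exit stays at one.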
From: Shailabh N. <na...@wa...> - 2006-04-10 21:45:16
|
Jay Lan wrote: > I made two feedback on 3/31 only to see them bounced > back over the weekend. :( > > Here was my first feedback: > > Shailabh Nagar wrote: > >> > >>Following Andrew's suggestion, here's my quick overview > >>of the various other accounting packages that have been > >>proposed on lse-tech with a focus on whether they can > >>utilize the netlink-based taskstats interface being proposed > >>by the delay accounting patches. > >> > >>Please note that unification of statistics *collection* is not > >>being discussed since that kind of merger can be done as these > >>patches get accepted, if at all, into the kernel. To try and > >>unify right away would hold every patch (esp. delay accounting !) > >>hostage to the problems in every other patch unnecessarily. As > >>long as the interface can be unified, the merger of the > >>collection bits can always happen without affecting user space. > >> > >>Stakeholders of each of these patches, on cc, are requested to > >>please correct any misunderstandings of what their patches do. > > > >To me, data collection and formation before sending down to > >userspace is very important part. What this taskstats netlink > >interface does is just to provide an interface to send "already > >formatted" data to userspace. In other words, it will replace > >"writing accounting records to an accounting file" step currently > >performed in BSD accouting and in CSA. Exactly. The writing of the accounting file can be done in userspace through a CSA-specific daemon reading the data. > If i understand it correctly, > >you have delayacct.c sitting on top of taskstats interface, and > >all other accounting methods should build their own layer on top > >of taskstats as well. Yes, all the new ones that are yet to be included in the kernel > For example, potentially BSD acct.c can replace > >fput() (and other statements dealing with acctounting file) with > >this interface. Same for CSA. Yes. 
I'm not sure if changing BSD would be useful (since I don't know how often it is used ?) but yes, it can be done and CSA is similar. > > > >This approach sounds right to me. Actually i am very glad that you > >made effort to provide a common ground here. Yet, this is only > >one step. I will apply your patchset on top of 2.6.16-mm to see > >what i get and give more feedback later. > > And, here is the second one: > >> >> >> This taskstats thing is much more complicated than what Guillaume >> used to have when he put up a prototype of doing ELSA over netlink. >> One confusing point is the struct taskstats. If it is to be used >> as the big data struct to contain all accounting data everybody >> needs (as Shailabh suggested on his CSA analysis section), then >> if at do_exit() every accounting methods are to be invoked to >> handle their netlink transmission (as currently implemented in >> delayed accounting), would it be a lot of overhead sending "grand >> data" too many times? Maybe each layer should just format data of >> their interest when invoked from do_exit, and then we do one call >> to genetlink to deliver formated struct taskstats data? > Good idea. One can already do this in the code we submitted by adding functions similar to delayacct_add_tsk() within the fill_pid() and fill_tgid() parts of the taskstats code. Then the delayacct_tsk_exit() routine will serve as the "one call" to deliver formatted data. However, using delayacct_tsk_exit (which does have delay accounting specific bits too) as the data delivery call isn't intuitive. So I'll separate out the taskstats_exit_pid as a separate call directly made within do_exit(). Will require some refactoring but it can be done. >> >> Also, as you pointed out, CSA only retrieve data at end of task >> but delayed accounting needs to retrieve data during the process. 
>> So, i think we need more than one record types, not just the >> struct taskstats, so that the user space delayed accounting >> application can specify to get only delayed accounting record. > A separate record type isn't needed, at least for now. For delay accounting, the data obtained during a process' lifetime is the same as the one expected at the end. So by itself, it has no need to distinguish records generated during the lifetime and those generated after a process exits. Yes, the additional fields added to the taskstats struct by CSA will be "unnecessary" for delay accounting users but they will have to be able to deal with that anyway (for the process exit records where CSA and delay will share a common exit record). So creating a separate record structure for the "during lifetime" records would save the transmission of a larger structure (which is relatively cheap) at the cost of the added complexity of tracking two types of records. At this point, the tradeoff isn't worth it for us. >> Honestly, this taskstats.c layer looks more like something >> extracted from delayed accounting than a carefully designed common >> ground to me. > If you have other specific suggestions about the interface and why it doesn't meet CSA's needs, we can work to fix them. >> Patch 8/8 is about documentation of delayed >> accounting than the common ground for various accounting methods. > True. Patch 8/8 was meant to document delay accounting alone. I'll extract the taskstats specific parts out. >> Can you please present us a documentation of design concept of >> such a common layer ? > Well, the design is fairly straightforward and is probably apparent by now. A common per-task accounting structure called taskstats exists. Userspace can use a NETLINK_GENERIC interface to send queries for statistics of a particular pid or tgid during the lifetime of a process. Specifying the pid gives the stats for just that pid. Specifying the tgid returns the sum of stats for all threads of the tgid.
Userspace can also choose to open the NETLINK_GENERIC socket in multicast and listen for per-pid and per-tgid statistics that are automatically sent from the kernel whenever a task exits. These stats are sent whenever there is any listener on the genetlink socket. The per-pid and per-tgid data are exactly the same as what you would get if a query could be done just before a task exited. Sending the per-tgid data at the exit of each pid/tid is necessary since there is no well-defined "tgid exit" point in the kernel (we do not define a thread group to cease existence when the thread group leader exits...rather it ceases to exist when the last thread of the thread group exits). Also, per-tgid accumulation is only done dynamically in the kernel, not maintained as a separate statistic (to avoid wasting time and space). So each time a tid from a tgid exits, one needs to collect and send the whole tgid's data in case userspace is trying to track the stats at a per-tgid level. The statistic structure contents are documented in include/linux/taskstats.h and by the accounting subsystem which fills in the fields. Currently delay accounting is the only user so all the fields are of the form XXX_count and XXX_delay_total where the former is a count of the number of values added in the latter. The latter is the cumulative "delay", in nanoseconds, seen by a pid waiting for the resource XXX. e.g. cpu_delay_total is the total time spent waiting for a cpu to run on, blkio_delay_total is the time spent waiting for sync block I/O to complete etc. As more per-task accounting packages get added to the kernel, they can define additional fields following the instructions in include/linux/taskstats.h and define their own userspace utilities similar to getdelays.c. Querying for data during a task's lifetime is done completely independently by all the utilities (using unicast queries and replies) - responses to queries by one are not seen by the others.
The stats sent on task exit are common and multicast to all listening utilities. Will add this to a separate taskstats doc in Documentation/. >> That would help me. I guess i also need to catch up on genetlink to >> better understand taskstats code. > Please do so soon. The usage of genetlink for taskstats has gone through a detailed review by Jamal etc. so there shouldn't be any genetlink issues that are pertinent to the potential CSA usage of taskstats. --Shailabh >> >> Regards. >> - jay >> |
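The XXX_count / XXX_delay_total convention and the on-demand per-tgid summation described in the design notes above can be modeled in a few lines. This is a userspace sketch in Python under stated assumptions, not the kernel structure: the TaskStats class is a hypothetical stand-in for struct taskstats with only two resources, the cpu_count and blkio_count names are inferred from the XXX_count pattern, and average_delay_ns and sum_tgid are illustrative helpers rather than functions from the patches.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskStats:
    """Hypothetical stand-in for struct taskstats (delays in nanoseconds)."""
    cpu_count: int = 0
    cpu_delay_total: int = 0
    blkio_count: int = 0
    blkio_delay_total: int = 0


def average_delay_ns(count, delay_total):
    """Mean delay per event: cumulative delay divided by the event count."""
    return delay_total / count if count else 0.0


def sum_tgid(thread_stats):
    """Build per-tgid stats on demand by summing the per-thread records."""
    total = TaskStats()
    for stats in thread_stats:
        for field in fields(TaskStats):
            current = getattr(total, field.name)
            setattr(total, field.name, current + getattr(stats, field.name))
    return total
```

A userspace tool in the style of getdelays.c would typically report the average per resource, which is why the count is carried alongside each cumulative total.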
From: Jay L. <jl...@en...> - 2006-04-10 22:33:33
|
Shailabh Nagar wrote: > Jay Lan wrote: > [ text deleted ] >>> This taskstats thing is much more complicated than what Guillaume >>> used to have when he put up a prototype of doing ELSA over netlink. >>> One confusing point is the struct taskstats. If it is to be used >>> as the big data struct to contain all accounting data everybody >>> needs (as Shailabh suggested on his CSA analysis section), then >>> if at do_exit() every accounting methods are to be invoked to >>> handle their netlink transmission (as currently implemented in >>> delayed accounting), would it be a lot of overhead sending "grand >>> data" too many times? Maybe each layer should just format data of >>> their interest when invoked from do_exit, and then we do one call >>> to genetlink to deliver formated struct taskstats data? >> >> > > Good idea. One can already do this in the code we submitted by adding > functions similar to delayacct_add_tsk() within the fill_pid() and > fill_tgid() parts > of the taskstats code. Then the delayacct_tsk_exit() routine will serve > as the > "one call" to deliver formatted data. > > However, using delayacct_tsk_exit (which does have delay accounting > specific > bits too) as the data delivery call isn't intuitive. So I'll separate > out the taskstats_exit_pid > as a separate call directly made within do_exit(). Will require some > refactoring but it > can be done. The "one call" to deliver formatted data should be placed between if (tsk->mm) { <statements to update tsk->mm hiwater data> ... } and exit_mm(tsk); since CSA needs to pick up data from tsk->mm. I would say that placing it immediately before exit_mm(tsk) would be perfect, since that is after BSD's "acct_process()" call, just in case somebody one day volunteers to clean up the BSD code. :) Regards, - jay > > >>> >>> Also, as you pointed out, CSA only retrieve data at end of task >>> but delayed accounting needs to retrieve data during the process.
>>> So, i think we need more than one record types, not just the >>> struct taskstats, so that the user space delayed accounting >>> application can specify to get only delayed accounting record. >> >> > A separate record type isn't needed, atleast for now. For delay > accounting, the data obtained during a > process' lifetime is the same as the one expected at the end. So by > itself, it has no need to distinguish > records generated during the lifetime and those generated after a > process exits. > > Yes, the additional fields added to the taskstats struct by CSA will be > "unnecessary" for delay accounting > users but they will have to be able to deal with that anyway (for the > process exit records where CSA and delay > will share a common exit record). > > So creating a separate record structure for the "during lifetime" > records trades off transmission of a larger structure (relatively cheap) > vs. the added complexity of tracking two types of records. > At this point, the tradeoff isn't worth it for us. > > >>> Honestly, this taskstats.c layer looks more like something >>> extracted from delayed accounting than a carefully designed common >>> ground to me. >> >> > If you have other specific suggestions about the interface and why it > doesn't meet CSA's needs, > we can work to fix them. > >>> Patch 8/8 is about documentation of delayed >>> accounting than the common ground for various accounting methods. >> >> > True. Patch 8/8 was meant to document delay accounting alone. I'll > extract the > taskstats specific parts out. > >>> Can you please present us a documentation of design concept of >>> such a common layer ? >> >> > Well, the design is fairly straightforward and is probably apparent by now. > A common per-task accounting structure called taskstats exists. > Userspace can use a NETLINK_GENERIC interface to send queries for > statistics of a particular pid or tgid during the lifetime of a process. 
> Specifying the pid gives the stats for just that pid. Specifying the > tgid returns > the sum of stats for all threads of the tgid. > > Userspace can also choose to open the NETLINK_GENERIC socket in > multicast and > listen for per-pid and per-tgid statistics that are automatically sent > from the kernel using a whenever a task exits. These stats are sent > whenever there is any listener on the genetlink socket. The per-pid and > per-tgid > data are exactly the same as what you would get if a query could be done > just before > a task exited. Sending the per-tgid data at the exit of each pid/tid is > necessary since > there is no well-defined "tgid exit" point in the kernel (we do not > define a thread group to > cease existence when the thread group leader exits...rather it ceases to > exist when the > last thread of the thread group exits). Also, per-tgid accumalation is > only done dynamically in the kernel, not maintained as a separate > statistic (to avoid wasting time and space). So each time a tid from a > tgid exits, one needs to collect and send the whole tgid's data in case > userspace is trying to track the stats at a per-tgid level. > > The statistic structure contents are documented in > include/linux/taskstats.h > and by the accounting subsystem which fills in the fields. Currently > delay accounting > is the only user so all the fields are of the form > XXX_count and XXX_delay_total > > where the former is a count of number of values added in the latter. > Latter is the > cumulative "delay", in nanoseconds, seen by a pid waiting for the > resource XXX. > e.g. cpu_delay_total is the total time spent waiting for a cpu to run > on, blkio_delay_total > is the time spent waiting for sync block I/O to complete etc. 
> > As more per-task accounting packages get added to the kernel, they can > define > additional fields following the instructions in > include/linux/taskstats.h and define their > own userspace utilities similar to getdelays.c > Querying for data during a task's lifetime is done completely > independently by all the utilities > (using unicast queries and replies) - responses to queries by one are > not seen by the others. > The stats sent on task exit are common and multicast to all listening > utilities. > > > Will add this to a separate taskstats doc in Documentation/. > >>> That would help me. I guess i also need to catch up on genetlink to >>> better understand taskstats code. >> >> > Please do so soon. The usage of genetlink for taskstats has gone through > a detailed review by Jamal etc. so there shouldn't be any genetlink > issues that are pertinent to the potential CSA usage of taskstats. > > > --Shailabh > > >>> >>> Regards. >>> - jay >>> |