Andrew Morton wrote:
>Balbir Singh <balbir@...> wrote:
>
>
>>On Wed, Mar 29, 2006 at 09:03:14PM -0800, Andrew Morton wrote:
>>
>>
>>>Shailabh Nagar <nagar@...> wrote:
>>>
>>>
>>>>Could you please include the following delay accounting patches
>>>> in -mm ?
>>>>
>>>>
>>>I'm at a loss to evaluate the suitability of this work, really. I always
>>>am when accounting patches come along.
>>>
>>>There are various people and various groups working on various different
>>>things and there appears to be no coordination and little commonality of
>>>aims. I worry that picking one submission basically at random will provide
>>>nothing which the other groups can work on to build up their feature.
>>>
>>>On the other hand, we don't want to do nothing until some uber-grand
>>>all-singing, all-dancing statistics-gathering infrastructure comes along.
>>>
>>>So I'm a bit stuck. What I would like to see happen is that there be some
>>>coordination between the various stakeholders, and some vague plan which
>>>they're all happy with as a basis for the eventual grand solution.
>>>
>>>We already have various bits and pieces of statistics gathering in the
>>>kernel and it's already a bit ad-hoc. Adding more one-requirement-specific
>>>accounting code won't improve that situation.
>>>
>>>But then, I said all this a year or two ago and nothing much has happened
>>>since then. It's not your fault, but it's a problem.
>>>
>>>Perhaps a good starting point would be a one-page bullet-point-form
>>>wishlist of all the accounting which people want to get out of the kernel,
>>>and a description of what the kernel<->user interface should look like.
>>>Right now, I don't think we even have a picture of that.
>>>
>>>We need a statistics maintainer, too, to pull together the plan,
>>>coordinate, push things forwards. The first step would be to identify the
>>>stakeholders, come up with that page of bullet-points.
>>>
>>>Then again, maybe the right thing to do is to keep adding low-impact
>>>requirement-specific statistics patches as they come along. But if we're
>>>going to do it that way, we need an up-front reason for doing so, and I
>>>don't know what that would be.
>>>
>>>See my problem?
>>>
>>>
>>One of the issues we have tried to address is the ability to provide some
>>form of a common ground for all the statistics to co-exist. Various methods
>>were discussed for exchanging data between kernel and user space, genetlink
>>was suggested often and was the clear winner.
>>
>>To that end, we have created a taskstats.c file. Any subsystem wanting
>>to add their statistics and sending it to user space can add their own
>>types by extending taskstats.c (changing the version number) and creating
>>their own types using genetlink. They will have to do the following
>>
>>1. Add statistics gathering in their own subsystem
>>2. Add a type to taskstats.c, extend it and use data from (1) and send
>> it to user space.
>>
>>The data from various subsystems can co-exist. I feel that this could serve as
>>the basic common infrastructure to begin with and refined later (depending on
>>the needs of other people).
>>
>>
>>
>
>Sounds fine to me, but I'm not a stakeholder.
>
>Trolling back through lse-tech gives us:
>
>pnotify:
> Erik Jacobson <erikj@...>
>
>CSA accounting/PAGG/JOB:
> Jay Lan <jlan@...>
> Limin Gu <limin@...>
>
>per-process IO statistics:
> Levent Serinol <lserinol@...>
>
>ELSA:
> Guillaume Thouvenin <guillaume.thouvenin@...>
>
>per-cpu time statistics:
> Erich Focht <efocht@...>
>
>Scalable statistics counters with /proc reporting:
> Ravikiran G Thirumalai <kiran@...>
> (Kiran left IBM, but presumably the requirement lives on)
>
>There was a long thread "A common layer for Accounting packages". Did it
>come to a conclusion?
>
>Anyway, if mostly everyone is mostly happy with what you propose then that
>is good news.
>
>
Following Andrew's suggestion, here's my quick overview
of the various other accounting packages that have been
proposed on lse-tech with a focus on whether they can
utilize the netlink-based taskstats interface being proposed
by the delay accounting patches.

Please note that unification of statistics *collection* is not
being discussed since that kind of merger can be done as these
patches get accepted, if at all, into the kernel. To try and
unify right away would hold every patch (esp. delay accounting !)
hostage to the problems in every other patch unnecessarily. As
long as the interface can be unified, the merger of the
collection bits can always happen without affecting user space.

Stakeholders of each of these patches, on cc, are requested to
please correct any misunderstandings of what their patches do.
Also, please comment on the observations about their patch's
ability to use the netlink-based taskstats interface, code for which
was posted at
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0603.3/1787.html

Thanks,
--Shailabh

Summary

The following can use the taskstats netlink-based
interface by extending the returned data structure:
- Comprehensive System Accounting
- per-process I/O stats
- Microstate accounting
- per-cpu time stats

The following patches' interface needs are independent
of taskstats or subsumed by one of the above:
- Enhanced Linux System Accounting
- pnotify
- scalable statistics counters

Details
(please correct if these are misunderstood)

1. Comprehensive System Accounting (Jay Lan)
--------------------------------------------
- Collect various per-task statistics and write an accounting
record containing these stats at task exit. Interface similar to
BSD process accounting but the accounting record structure is
quite different.
- CSA could utilize some stats collected/exported by delay
accounting: blkio wait time, cpu run time for task
- CSA only needs data to be available at task exit, not during the
task's lifetime. Moreover, at task exit, it needs the accounting record
to be written to a file.
- CSA could utilize delay accounting's taskstats netlink interface
to gather task data at exit through a userspace utility that then writes
it out to its expected file.
To do so, CSA would need the taskstats struct to be
extended with whatever additional stats it needs.
The additional stats could be selectively exported only on
task exit to avoid imposing a space burden on users of delay
accounting who query a process's statistics during its lifetime.
Collection of the additional stats needed by CSA may be tied to pnotify
and job patches which are still being reviewed/considered for acceptance.
As such, unification in the collection of stats can be deferred until
status of pnotify/job/CSA patches becomes more clear.
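To make the "export the extra stats only at exit" idea concrete, here is
a rough userspace sketch. It is not code from the posted patches: the
struct layout, the field names and the demo_* helpers are all made up
for illustration. The point is only that fields appended after the delay
accounting ones can be left out of replies to lifetime queries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration; NOT the struct from the patches. */
#define DEMO_TASKSTATS_VERSION 2

struct demo_taskstats {
	uint16_t version;
	/* Delay accounting style fields, returned on every query. */
	uint64_t cpu_run_ns;
	uint64_t blkio_delay_ns;
	/* Fields a CSA-like user might append; returned only at task exit. */
	uint64_t csa_chars_read;
	uint64_t csa_chars_written;
};

/* Number of leading bytes of the struct to export for this kind of reply. */
static size_t demo_export_len(int at_exit)
{
	return at_exit ? sizeof(struct demo_taskstats)
		       : offsetof(struct demo_taskstats, csa_chars_read);
}

/*
 * Copy the exportable prefix of *ts into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t demo_fill_reply(const struct demo_taskstats *ts, int at_exit,
			      void *buf, size_t buflen)
{
	size_t len;

	assert(ts != NULL && buf != NULL);
	len = demo_export_len(at_exit);
	if (len > buflen)
		return 0;
	memcpy(buf, ts, len);
	return len;
}
```

A lifetime query would call demo_fill_reply(ts, 0, ...) and get only the
shorter prefix; the exit path would pass at_exit = 1 and get the full
record for the userspace utility to write to CSA's file.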
2. per-process I/O statistics (Levent Serinol)
----------------------------------------------
- Exports task->{rchar,wchar} through /proc/tgid/iostat
(earlier version proposed export through /proc/tgid/stats)
- No new stats collection. Just export of existing task fields
- Problem with accepting the patch stems from the accuracy
of the statistics in these fields. The fields are updated only in
three cases today (sys_read/write, sys_readv/writev, do_sendfile)
so they aren't accurate. At the very least, async I/O and
memory-mapped I/O are not counted.
CSA patches also export these fields through their accounting
record but don't appear to be doing anything to improve accuracy
of collection (or maybe it doesn't matter to them). BSD accounting,
which ought to be using the sum of these fields for its ac_io field,
doesn't (it hardcodes the output to zero).
When the fate of task->rchar/wchar is decided, based on
CSA's needs, those fields can be easily added to taskstats.
3. per-cpu time statistics (Erich Focht)
----------------------------------------
- Collects time spent by a task on each cpu of a system
and exports it through new interface /proc/tgid/cpu
- Statistic is needed for performance analysis/debugging
(like schedstats) and not for production systems.
- Unsure why push for acceptance was abandoned. Possibly due
to one or more of: space overhead of allocating NR_CPUS variables
in task_struct, time overhead of collecting the data ?
- Can use taskstats interface to export the data by adding needed
fields to struct taskstats and bumping up the version.
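A minimal sketch of what "add fields and bump the version" could look
like from a consumer's point of view. Everything here is an assumption
for illustration (DEMO_NR_CPUS, the struct and the demo_* names are
hypothetical, not taken from any posted patch). It also shows where the
per-task space overhead mentioned above comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS        4	/* assumption; the kernel would use NR_CPUS */
#define DEMO_VERSION_BASE   1	/* hypothetical version without per-cpu data */
#define DEMO_VERSION_PERCPU 2	/* hypothetical version that adds per-cpu data */

struct demo_stats {
	uint16_t version;
	uint64_t cpu_run_ns;			/* total run time, since v1 */
	uint64_t percpu_run_ns[DEMO_NR_CPUS];	/* per-cpu run time, added in v2 */
};

/*
 * Time spent on the given cpu, or -1 if the reply predates the per-cpu
 * fields or the cpu index is out of range.
 */
static int64_t demo_cpu_time(const struct demo_stats *s, unsigned int cpu)
{
	assert(s != NULL);
	if (s->version < DEMO_VERSION_PERCPU || cpu >= DEMO_NR_CPUS)
		return -1;
	return (int64_t)s->percpu_run_ns[cpu];
}

/* Extra bytes per task that the per-cpu array costs. */
static size_t demo_percpu_overhead(void)
{
	return DEMO_NR_CPUS * sizeof(uint64_t);
}
```

The version check lets an older reply (without the per-cpu array) be
told apart from a newer one, so existing users are not broken when the
struct grows.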
4. Microstate accounting (Peter Chubb)
--------------------------------------
- Measure time spent by a thread in various interesting states,
while accounting for interrupts, and export through /proc/tid/msa
and through a syscall interface
- Interesting states have some overlap with delay accounting
- Exporting of per-task stats can be done through taskstats
netlink interface
5. Enhanced Linux System Accounting (Guillaume Thouvenin)
---------------------------------------------------------
- Group tasks at a user level into "jobs" and aggregate,
at user level, per-task statistics collected by CSA and/or BSD
process accounting.
- ELSA does not introduce any new requirement for either
collection or export of statistics from the kernel. It can use
BSD's and/or CSA's method of using an accounting file.
- ELSA needs notification of forks and exits which it can already
get through the process events connector in the kernel.
Hence ELSA's needs are either met by the kernel today or are a
strict subset of CSA (since BSD accounting is already there).
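For illustration only, the user-level aggregation that ELSA performs
might be sketched as below. This is not ELSA's code: the record layout,
the job_id field and demo_aggregate() are hypothetical, and real
per-task records would come from the BSD or CSA accounting file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record, as read from an accounting file. */
struct demo_task_record {
	int pid;
	int job_id;		/* job assigned to the task at user level */
	uint64_t cpu_ns;
};

/* Hypothetical per-job total built in userspace. */
struct demo_job_total {
	int job_id;
	uint64_t cpu_ns;
	size_t ntasks;
};

/*
 * Sum per-task records into per-job totals, in order of first appearance.
 * Returns the number of jobs found, or (size_t)-1 if more than max_jobs
 * distinct jobs are seen.
 */
static size_t demo_aggregate(const struct demo_task_record *recs, size_t nrecs,
			     struct demo_job_total *jobs, size_t max_jobs)
{
	size_t njobs = 0;

	assert(recs != NULL && jobs != NULL);
	for (size_t i = 0; i < nrecs; i++) {
		size_t j;

		/* Look for an existing total for this record's job. */
		for (j = 0; j < njobs; j++)
			if (jobs[j].job_id == recs[i].job_id)
				break;
		if (j == njobs) {
			if (njobs == max_jobs)
				return (size_t)-1;
			jobs[njobs].job_id = recs[i].job_id;
			jobs[njobs].cpu_ns = 0;
			jobs[njobs].ntasks = 0;
			njobs++;
		}
		jobs[j].cpu_ns += recs[i].cpu_ns;
		jobs[j].ntasks++;
	}
	return njobs;
}
```

Nothing in this needs kernel support beyond the per-task records and the
fork/exit notifications already mentioned.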
6. pnotify (Erik Jacobson)
--------------------------
- Infrastructure for kernel modules to be notified when an event
(like fork/exit/exec) happens to a task. Also provides some per-task data
for the modules' convenience
- pnotify isn't concerned with exporting data to userspace or collecting
any stats. That's left to the kernel module that uses pnotify to get
notifications. CSA is one expected user of pnotify.
7. Scalable statistics counters (Ravikiran Thirumalai, Dipankar Sarma)
----------------------------------------------------------------------
- Infrastructure for setting up per-cpu counters (not per-task necessarily)
- No specific stats collection proposed as part of patch
- May have need for interface for fast export to userspace but requirements
not clear
- Not per-task and unlikely to have unification prospects at interface level