From: Christoph H. <hc...@in...> - 2005-07-23 18:54:00
|
On Fri, Jul 22, 2005 at 10:33:21PM +0200, bert hubert wrote:
> On Fri, Jul 22, 2005 at 01:01:32PM -0700, Paul Jackson wrote:
> > Another vote in favor of relayfs here ...
>
> At OLS the 'SystemTAP' idea was presented, which has been partially
> implemented already, and it builds on relayfs as well. It dovetails nicely
> with kprobes.

And what exactly is this systemtap thing supposed to be? And why the heck
do they announce it at some conference and we should suddenly care about it? |
From: Christoph H. <hc...@in...> - 2005-07-23 18:53:32
|
On Fri, Jul 22, 2005 at 01:01:32PM -0700, Paul Jackson wrote:
> Another vote in favor of relayfs here ...
>
> I am reminded by my good colleagues at SGI that relayfs is a key
> to the Linux Trace Toolkit (LTT), which is in turn an important
> technology for some product(s) on which SGI is working.

I don't think anyone cares for product plans of particular companies.

That being said I wish LTT folks would make a little more progress so
we could actually include it. |
From: bert h. <ber...@ne...> - 2005-07-22 20:35:47
|
On Fri, Jul 22, 2005 at 01:01:32PM -0700, Paul Jackson wrote:
> Another vote in favor of relayfs here ...

At OLS the 'SystemTAP' idea was presented, which has been partially
implemented already, and it builds on relayfs as well. It dovetails nicely
with kprobes.

So it appears there is a sizeable amount of code which is building on
relayfs, iow, it is getting to be infrastructure.

I'm redoing diskstat to work with k/jprobes so it won't require a kernel
patch anymore, but it will still rely on relayfs. So it would be
tremendously helpful if relayfs would be part of the mainline.

I'll be banging out some HOWTO style documentation soonish.

Bert.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services |
From: Paul J. <pj...@sg...> - 2005-07-22 20:01:47
|
Another vote in favor of relayfs here ...

I am reminded by my good colleagues at SGI that relayfs is a key
to the Linux Trace Toolkit (LTT), which is in turn an important
technology for some product(s) on which SGI is working.

It is uses such as this which speak to the value of including
relayfs in the kernel.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401 |
From: Paul J. <pj...@sg...> - 2005-07-21 00:32:09
|
Bert wrote:
> the diskstat tools require relayfs

That way might lie the real value of relayfs, as a common
technology basis for specific tools that are developed and
maintained on top of relayfs.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401 |
From: bert h. <ber...@ne...> - 2005-07-20 21:45:42
|
> When I'm debugging something requiring detailed tracing, I don't want
> to have to think about whether the tracing tool has the particular
> behaviour, performance, data loss, and other such characteristics
> needed for my immediate needs. It is easier to code up some little
> ad hoc mechanism than it is to try to figure out whether some general
> purpose mechanism is suitable and how to use the generic mechanism.

You can do lots of modes with relayfs already - no ping-pong buffer,
n-buffer, lossy, not lossy etc etc. I currently use it in 'flight-recorder'
mode where new messages overwrite old ones.

It might be good to document different possible ways of using relayfs.

> If there are enough specific purposes for relayfs, fine. But beware
> of over generalizing its potential usefulness. There is always the
> risk of over designing it, adding additional flexibility and options
> in an effort to gain customers, at the expense of making it less and
> less obviously useful in a trivial way for any specific purpose.

It's currently pretty limited - but you can add more features on top of it,
in a modular fashion. I tend not to use the complex stuff, but you can layer
it if you want.

It'd be nice if we had some basic relaying infrastructure available that'd
cover most needs successfully. Advanced users can do advanced things if they
want.

Btw, the diskstat tools (http://ds9a.nl/diskstat) require relayfs. It'll be
released this Friday or so.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services |
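As an illustration of the 'flight-recorder' mode mentioned in the message above, here is a minimal userspace sketch. It is only a toy model of "new messages overwrite old ones", not relayfs code: the `FlightRecorder` class and its methods are hypothetical names invented for this example.

```python
from collections import deque


class FlightRecorder:
    """Toy model of a lossy 'flight-recorder' log: only the newest entries survive.

    Hypothetical illustration; this is not the relayfs API.
    """

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest item when full.
        self._entries = deque(maxlen=capacity)
        self.dropped = 0

    def write(self, message):
        if len(self._entries) == self._entries.maxlen:
            self.dropped += 1  # the oldest entry is about to be overwritten
        self._entries.append(message)

    def dump(self):
        """Return the surviving entries, oldest first."""
        return list(self._entries)


recorder = FlightRecorder(capacity=4)
for i in range(10):
    recorder.write(f"event {i}")

print(recorder.dump())
print("overwritten:", recorder.dropped)
```

The writer never blocks and never fails; the price is that anything older than the last `capacity` entries is gone by the time the log is dumped.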
From: Paul J. <pj...@sg...> - 2005-07-20 21:27:50
|
Steve wrote:
> The reason I would like to see this merged, so kernel hackers don't need
> to constantly write their own logging buffers every time you need to
> debug a complex area of the kernel.

But I doubt that relayfs, or anything resembling it, will accomplish
this purpose, at least for some of us, in many such situations.

When I'm debugging something requiring detailed tracing, I don't want
to have to think about whether the tracing tool has the particular
behaviour, performance, data loss, and other such characteristics
needed for my immediate needs. It is easier to code up some little
ad hoc mechanism than it is to try to figure out whether some general
purpose mechanism is suitable and how to use the generic mechanism.

Invariably in any particular situation, there is some almost trivial
way to hack in something adequate, for very little effort, doing things
that would be utterly useless in some other case.

Such tracing mechanisms work to obtain major subsystem isolation, by
exposing the flow of data and control back and forth across a major
boundary, such as using strace for the initial isolation of a problem
that might be in user space, or might be in the kernel.

But for detailed work within a subsystem, the corners that one can cut
with ad hoc tools often make them vastly superior to general purpose
tools.

Even the best equipped of carpenters sometimes throw together some
temporary scaffolding using rough cut 2x4's (2 inch by 4 inch cross
section lumber; I don't know what they're called in metric nations.)

If there are enough specific purposes for relayfs, fine. But beware
of over generalizing its potential usefulness. There is always the
risk of over designing it, adding additional flexibility and options
in an effort to gain customers, at the expense of making it less and
less obviously useful in a trivial way for any specific purpose.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401 |
From: Steven R. <ro...@go...> - 2005-07-18 13:28:14
|
On Sun, 2005-07-17 at 21:45 +0200, bert hubert wrote:
> On Sun, Jul 17, 2005 at 10:43:40AM -0500, Tom Zanussi wrote:
>
> > It is racey - in this mode, there's nothing to keep the kernel from
> > writing as much as it wants before the user side has a chance to read
> > any of it. The only way this can be used safely is to make sure the
> > kernel side isn't writing anything when the client is reading. This
> > would be typical of a flight-recording usage i.e. kernel writes a
> > bunch of data continuously, then stops and allows the client to read
> > whatever's in there.
>
> Or by numbering entries written out, when in flight-recording mode you
> wouldn't want to block the kernel.

Exactly! I've written a logging device to record data in the kernel that
a printk can't help with. I've used this in debugging interrupts, the
scheduler, and high speed network packets. Where a printk to a serial
would just slow things down, and going to the network is too expensive,
and complex if you happen to be debugging the network.

This tool is called logdev (http://www.kihontech.com/logdev) and uses a
ring buffer that is like the relayfs overwrite mode. It can do printk
like records and when something goes wrong, I dump the buffer to the
serial. Or I have a user space program reading it from a device. I
don't care about anything that happened earlier, I want to only know
what happened up to the point I dumped the buffer. Lately, I've been
using this with Ingo's RT patch, and when the system locks up, I dump
the buffer, and it shows quite nicely where the lockup occurred, and
why.

With Tom's help, I also have a version that uses relayfs as a backend in
overwrite mode. It's still a work in progress (so no web site yet!)
since there are some issues of using a single buffer for multiple CPUs.
This helps in debugging race conditions since you need to see how events
interleave.

> > > In fact, it appears this might even happen in non-overwrite mode.
> >
> > It shouldn't ever be able to happen in non-overwrite mode - if it
> > did, it would be a bug. Can you be more specific as to how you see
> > this happening in this mode?
>
> Yeah - you're right. The misunderstanding is because in both cases
> (overwrite and non-overwrite) data is lost, except that in one case you lose
> old data, and in the other new data.
>
> It might be a good idea to document this as well.
>
> Btw, I've already uncovered interesting things using relayfs, but I still
> don't see the case for having it merged :-)

The reason I would like to see this merged is so kernel hackers don't need
to constantly write their own logging buffers every time you need to
debug a complex area of the kernel.

-- Steve |
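The message above notes that debugging race conditions requires seeing how events from different CPUs interleave. One way to get that view from separate per-CPU logs, sketched here under loud assumptions, is to tag every record with a global sequence number and merge afterwards. This is not how logdev or relayfs does it; the record layout `(seq, cpu, text)` and the function name are hypothetical.

```python
import heapq


def merge_per_cpu_logs(per_cpu_logs):
    """Merge per-CPU record lists into one stream ordered by sequence number.

    Each record is a hypothetical (seq, cpu, text) tuple. Every input list
    must already be sorted by seq, which holds if each CPU appends in order.
    """
    return list(heapq.merge(*per_cpu_logs, key=lambda record: record[0]))


cpu0 = [(1, 0, "lock A taken"), (4, 0, "lock A released")]
cpu1 = [(2, 1, "waiting for lock A"), (3, 1, "irq 17"), (5, 1, "lock A taken")]

for seq, cpu, text in merge_per_cpu_logs([cpu0, cpu1]):
    print(f"{seq:3d} cpu{cpu}: {text}")
```

The trade-off is that a shared sequence counter is itself a point of contention between CPUs, which is part of why a single shared buffer is awkward in the first place.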
From: Tom Z. <za...@us...> - 2005-07-17 20:48:05
|
bert hubert writes:
> On Sun, Jul 17, 2005 at 10:43:40AM -0500, Tom Zanussi wrote:
>
> > It is racey - in this mode, there's nothing to keep the kernel from
> > writing as much as it wants before the user side has a chance to read
> > any of it. The only way this can be used safely is to make sure the
> > kernel side isn't writing anything when the client is reading. This
> > would be typical of a flight-recording usage i.e. kernel writes a
> > bunch of data continuously, then stops and allows the client to read
> > whatever's in there.
>
> Or by numbering entries written out, when in flight-recording mode you
> wouldn't want to block the kernel.
>
> > > In fact, it appears this might even happen in non-overwrite mode.
> >
> > It shouldn't ever be able to happen in non-overwrite mode - if it
> > did, it would be a bug. Can you be more specific as to how you see
> > this happening in this mode?
>
> Yeah - you're right. The misunderstanding is because in both cases
> (overwrite and non-overwrite) data is lost, except that in one case you lose
> old data, and in the other new data.

Just to clarify - in either mode, if you don't have a consumer or the
consumer can't keep up with the amount of data being written by the
kernel, you will of course lose data at some point. Normally you
wouldn't want to lose data; by using non-overwrite mode you're
implicitly letting relayfs know this i.e. if at any point all the
sub-buffers remain unread and the kernel is still trying to write into
them, let the client know (via the buffer-full callback) that this has
happened. Presumably you would then increase the buffer size or have
the kernel write less etc.

>
> It might be a good idea to document this as well.
>

Yes, I'll make it more explicit in the documentation.

> Btw, I've already uncovered interesting things using relayfs, but I still
> don't see the case for having it merged :-)

Glad to hear it. Can you say what if anything would convince you it
should be merged?

>
> Thanks for your answers, I think I get it all now.

No problem, and thanks for patch and other suggestions.

Tom |
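The non-overwrite behaviour described above (writes are refused once every sub-buffer is unread, and a buffer-full callback fires) can be modelled in a few lines. This is a simplified sketch, not relayfs itself: it stores one message per sub-buffer, and `NoOverwriteChannel`, `on_full` and `read_ready` are hypothetical names standing in for the real bookkeeping.

```python
class NoOverwriteChannel:
    """Toy model of no-overwrite mode, one message per sub-buffer for simplicity."""

    def __init__(self, n_subbufs, on_full):
        self.n_subbufs = n_subbufs
        self.slots = [None] * n_subbufs
        self.produced = 0       # sub-buffers filled by the writer so far
        self.consumed = 0       # sub-buffers released by the reader so far
        self.on_full = on_full  # stand-in for the buffer-full callback

    def write(self, message):
        if self.produced - self.consumed == self.n_subbufs:
            self.on_full(message)  # every sub-buffer is still unread
            return False           # the new data is lost
        self.slots[self.produced % self.n_subbufs] = message
        self.produced += 1
        return True

    def read_ready(self):
        """Return unread messages in order and mark them as consumed."""
        ready = [self.slots[i % self.n_subbufs]
                 for i in range(self.consumed, self.produced)]
        self.consumed = self.produced
        return ready


lost = []
chan = NoOverwriteChannel(n_subbufs=4, on_full=lost.append)
for i in range(6):
    chan.write(f"event {i}")

print(chan.read_ready())  # the events that fit before the buffer filled up
print(lost)               # the events that arrived while it was full
chan.write("event 6")     # accepted again once the reader has caught up
```

Compared with overwrite mode, the loss moves from the oldest data to the newest, and the writer at least learns that it happened.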
From: bert h. <ber...@ne...> - 2005-07-17 19:46:08
|
On Sun, Jul 17, 2005 at 10:43:40AM -0500, Tom Zanussi wrote:

> It is racey - in this mode, there's nothing to keep the kernel from
> writing as much as it wants before the user side has a chance to read
> any of it. The only way this can be used safely is to make sure the
> kernel side isn't writing anything when the client is reading. This
> would be typical of a flight-recording usage i.e. kernel writes a
> bunch of data continuously, then stops and allows the client to read
> whatever's in there.

Or by numbering entries written out, when in flight-recording mode you
wouldn't want to block the kernel.

> > In fact, it appears this might even happen in non-overwrite mode.
>
> It shouldn't ever be able to happen in non-overwrite mode - if it
> did, it would be a bug. Can you be more specific as to how you see
> this happening in this mode?

Yeah - you're right. The misunderstanding is because in both cases
(overwrite and non-overwrite) data is lost, except that in one case you lose
old data, and in the other new data.

It might be a good idea to document this as well.

Btw, I've already uncovered interesting things using relayfs, but I still
don't see the case for having it merged :-)

Thanks for your answers, I think I get it all now.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services |
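The "numbering entries" idea from the message above could look roughly like this on the reading side. The sketch assumes each record carries a monotonically increasing sequence number written by the producer; the `(seq, payload)` layout and both function names are hypothetical, and nothing here is part of relayfs.

```python
def order_snapshot(slots):
    """Sort a snapshot of ring slots, each a (seq, payload) tuple or None, oldest first."""
    return sorted((slot for slot in slots if slot is not None),
                  key=lambda slot: slot[0])


def find_gaps(sequence_numbers):
    """Return (first_missing, last_missing) ranges in an increasing list of numbers."""
    gaps = []
    for previous, current in zip(sequence_numbers, sequence_numbers[1:]):
        if current != previous + 1:
            gaps.append((previous + 1, current - 1))
    return gaps


# A ring that wrapped while it was being copied: slot order is not age order,
# and some entries were overwritten before the reader got to them.
snapshot = [(24, "d"), (25, "e"), (31, "f"), (17, "a"), (18, "b"), (19, "c")]
ordered = order_snapshot(snapshot)

print([seq for seq, _ in ordered])
print(find_gaps([seq for seq, _ in ordered]))
```

The reader still loses the overwritten entries, but it can tell exactly which ones are missing instead of silently treating a torn snapshot as complete.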
From: Tom Z. <za...@us...> - 2005-07-17 15:44:00
|
bert hubert writes:
> On Sat, Jul 16, 2005 at 06:13:55PM -0500, Tom Zanussi wrote:
>
> > relayfs itself only provides the buffering and file operations along
> > with the kernel API for clients as documented in
> > Documentation/filesystems/relayfs.txt. Applications still need some
> > kind of communication between the kernel and user space in order to
> > know when data is ready and how much is ready - the relay-apps stuff
> > tries to make this easy to do by allowing clients to ignore all those
> > details. It happens to use netlink for this, but clients can use
> > whatever they want to do this communication.
>
> Ok - that is good to know. What is missing from relayfs.txt is a demarcation
> of which system does what.
>
> As I see it there are three things currently:
>
> 1) Basic relayfs facilities, which only stuff data into N sub-buffers per
> CPU, but also offer a set of functions that could be called via userspace
> over some sort of communication channel.
>
> 2) klog which is a thin wrapper over relay_write
>
> 3) relay-app.h which lives in the kernel and communicates with librelay.c in
> user space, providing that communication.
>
> Is this correct?

Yes.

>
> > Then just run the kleak app, and when it finishes, you should have a
> > set of files, cpu0l...cpuX in your current directory containing all
> > the data you've logged.
>
> I've changed the fprintf(stderr, "netlink send error") to perror("netlink
> send error") and now it prints 'Connection refused', which makes heaps of
> sense since I did not use relay-app.h, but wrote directly to the channel.

Right - you need to insmod kleak.ko in order for the netlink socket to
be created in the kernel.

>
> > > 2) What kind of messages do I need to send/receive?
> >
> > Basically, the daemon needs to know, for a given per-cpu buffer, how
> > many sub-buffers have been produced and consumed, in order to know
> > which sections of the mmapped buffer to read. It also needs to notify
>
> I currently just write away without any userspace component, except that I
> mmap the entire relayfs file in which I see the four configured sub-buffers.
> I guess that in overwrite mode that would work?

Right - this sounds exactly like what overwrite mode is meant for -
flight-recording types of applications, where you don't have an active
reader in userspace and you're interested in the most recent data. If
you don't have an active reader and use no-overwrite mode, the buffer
will become full when it wraps around the first time, and subsequent
events will be lost (the buffer-full callback will tell you when this
happens).

>
> > The format is whatever the client writes into it - relayfs itself
> > doesn't impose any format at all. The client doesn't need librelay.c
> > to read the data itself - librelay.c is for managing the daemon side
> > of the application and writing ready data to disk as it becomes
> > available. It doesn't know anything about the actual data being written.
>
> Ok - so there is nothing in there except n stretches of data, and some
> padding? Each write is either IN a sub-buffer or not at all, it doesn't span
> sub-buffers?

Right, a write will never be split across sub-buffers.

>
> > > 4) What are the semantics for reading from that file?
> >
> > The file is a buffer broken up into sub-buffers. The client reads the
> > sub-buffers it knows are ready directly from the mmapped buffer.
> > The file can only be mmap()ed - there is no read() available.
>
> Indeed. So the idea is to wait for a ringbuffer to become 'full', read it,
> and wait for the next one to become full?

Right, as sub-buffers become full, the userspace part of the client
should read them, update the kernel part with how many it just
consumed, and wait around for more.

>
> > BTW, there's also documentation in relay-app.h, don't know if you saw
> > that.
>
> Yes - but it only makes sense after the 'separation of powers' within
> relayfs is clear. relayfs.txt talks rather cavalierly of 'clients' and
> 'calls' but does not make clear this client lives in userspace and can't
> just call kernel functions.
>
> Please consider the patch below. I'm not 100% sure if everything is correct,
> but I'd love to know.

Yes, on first reading, it all looks correct, and does a nice job of
clarifying things - thanks for taking the time to do this. :-)

>
> I'm wondering how relayfs could be operated safely in overwrite mode, btw -
> who's to say the kernel might not have zoomed past my sub-buffer once I'm
> notified of the crossing? The padding data I receive might be outdated by
> then. Sounds racey.

It is racey - in this mode, there's nothing to keep the kernel from
writing as much as it wants before the user side has a chance to read
any of it. The only way this can be used safely is to make sure the
kernel side isn't writing anything when the client is reading. This
would be typical of a flight-recording usage i.e. kernel writes a
bunch of data continuously, then stops and allows the client to read
whatever's in there.

>
> In fact, it appears this might even happen in non-overwrite mode.

It shouldn't ever be able to happen in non-overwrite mode - if it
did, it would be a bug. Can you be more specific as to how you see
this happening in this mode?

Thanks,

Tom

>
> diff -urBb -X linux-2.6.13-rc3-mm1/Documentation/dontdiff linux-2.6.13-rc3-mm1/Documentation/filesystems/relayfs.txt linux-2.6.13-rc3-mm1-ahu/Documentation/filesystems/relayfs.txt
> --- linux-2.6.13-rc3-mm1/Documentation/filesystems/relayfs.txt 2005-07-17 11:00:48.000638680 +0200
> +++ linux-2.6.13-rc3-mm1-ahu/Documentation/filesystems/relayfs.txt 2005-07-17 10:58:21.634889656 +0200
> @@ -23,6 +23,46 @@
>  the function parameters are documented along with the functions in the
>  filesystem code - please see that for details.
>
> +Semantics
> +=========
> +
> +Each relayfs channel has one buffer per CPU, each buffer has one or
> +more sub-buffers. Messages are written to the first sub-buffer until it
> +is too full to contain a new message, in which case it it is written to
> +the next (if available). At this point, userspace can be notified so it
> +empties the first ringbuffer, while the kernel continues writing to the
> +next.
> +
> +If notified that a sub-buffer is full, the kernel knows how many bytes
> +of it are padding, ie, unused. Userspace can use this knowledge to copy
> +only valid data.
> +
> +After copying, userspace can notify the kernel that a sub-channel has
> +been consumed.
> +
> +relayfs can operate in a mode where it will overwrite data not yet
> +collected by userspace, and not wait for it to consume it.
> +
> +relayfs itself does not provide for communication of such data between
> +userspace and kernel, allowing the kernel side to remain simple and not
> +impose a single interface on userspace. It does provide a separate
> +helper though, described below.
> +
> +Klog, relay-app & librelay
> +==========================
> +
> +relayfs itself is ready to use, but to make things easier, two
> +additional systems are provided. Klog is a simple wrapper to make
> +sending data to a channel simpler. relay-app is the kernel counterpart
> +of userspace librelay.c, combined these two files provide glue to
> +easily stream data, without having to bother with housekeeping.
> +
> +It is possible to use relayfs without relay-app & librelay, but you'll
> +have to implement communication between userspace and kernel, allowing
> +both to convey the state of buffers (full, empty, amount of padding).
> +
> +Klog, relay-app and librelay can be found on
> +http://relayfs.sourceforge.net
>
>  The relayfs user space API
>  ==========================
> @@ -34,7 +74,8 @@
>  open() enables user to open an _existing_ buffer.
>
>  mmap() results in channel buffer being mapped into the caller's
> -       memory space.
> +       memory space. Note that you can't do a partial mmap - you must
> +       map the entire file, which is NRBUF * SUBBUFSIZE.
>
>  poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
>         notified when sub-buffer boundaries are crossed.
> @@ -70,6 +109,9 @@
>      relayfs_create_dir(name, parent)
>      relayfs_remove_dir(dentry)
>      relay_commit(buf, reserved, count)
> +
> +  channel management typically called on instigation of userspace:
> +
>      relay_subbufs_consumed(chan, cpu, subbufs_consumed)
>
>    write functions:
> @@ -86,10 +128,9 @@
>      buf_unmapped(buf, filp)
>      buf_full(buf, subbuf_idx)
>
> -
> -A relayfs channel is made of up one or more per-cpu channel buffers,
> -each implemented as a circular buffer subdivided into one or more
> -sub-buffers.
> +As explained above, a relayfs channel is made of up one or more per-cpu
> +channel buffers, each implemented as a circular buffer subdivided into
> +one or more sub-buffers.
>
>  relay_open() is used to create a channel, along with its per-cpu
>  channel buffers. Each channel buffer will have an associated file
> @@ -123,24 +164,25 @@
>  data regardless of whether it's actually been consumed. In
>  no-overwrite mode, writes will fail i.e. data will be lost, if the
>  number of unconsumed sub-buffers equals the total number of
> -sub-buffers in the channel. In this mode, the client is reponsible
> -for notifying relayfs when sub-buffers have been consumed via
> -relay_subbufs_consumed(). A full buffer will become 'unfull' and
> -logging will continue once the client calls relay_subbufs_consumed()
> -again. When a buffer becomes full, the buf_full() callback is invoked
> -to notify the client. In both modes, the subbuf_start() callback will
> -notify the client whenever a sub-buffer boundary is crossed. This can
> -be used to write header information into the new sub-buffer or fill in
> -header information reserved in the previous sub-buffer. One piece of
> -information that's useful to save in a reserved header slot is the
> -number of bytes of 'padding' for a sub-buffer, which is the amount of
> -unused space at the end of a sub-buffer. The padding count for each
> -sub-buffer is contained in an array in the rchan_buf struct passed
> -into the subbuf_start() callback: rchan_buf->padding[prev_subbuf_idx]
> -can be used to to get the padding for the just-finished sub-buffer.
> -subbuf_start() is also called for the first sub-buffer in each channel
> -buffer when the channel is created. The mode is specified to
> -relay_open() using the overwrite parameter.
> +sub-buffers in the channel.
> +
> +In this mode, the userspace client is reponsible for notifying relayfs when
> +sub-buffers have been consumed via relay_subbufs_consumed(). A full buffer
> +will become 'unfull' and logging will continue once the client calls
> +relay_subbufs_consumed(). When a buffer becomes full, the buf_full()
> +callback is invoked to notify the client. In both modes, the subbuf_start()
> +callback will notify the client whenever a sub-buffer boundary is crossed.
> +
> +This can be used to write header information into the new sub-buffer or fill
> +in header information reserved in the previous sub-buffer. One piece of
> +information that's useful to save in a reserved header slot is the number of
> +bytes of 'padding' for a sub-buffer, which is the amount of unused space at
> +the end of a sub-buffer. The padding count for each sub-buffer is contained
> +in an array in the rchan_buf struct passed into the subbuf_start() callback:
> +rchan_buf->padding[prev_subbuf_idx] can be used to to get the padding for
> +the just-finished sub-buffer. subbuf_start() is also called for the first
> +sub-buffer in each channel buffer when the channel is created. The mode is
> +specified to relay_open() using the overwrite parameter.
>
>  kernel clients write data into the current cpu's channel buffer using
>  relay_write() or __relay_write(). relay_write() is the main logging
>
>
> --
> http://www.PowerDNS.com Open source, database driven DNS Software
> http://netherlabs.nl Open and Closed source services

--
Regards,

Tom Zanussi <za...@us...>
IBM Linux Technology Center/RAS |
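Two points confirmed in the exchange above are that a write is never split across sub-buffers and that the unused tail of a finished sub-buffer is recorded as padding. The following userspace sketch models just those two rules. It is an assumption-laden toy: `SubBufferedLog` and its fields are hypothetical names, it does not wrap around, and it is not the relayfs implementation.

```python
class SubBufferedLog:
    """Toy model: a byte buffer split into equal sub-buffers.

    A record that does not fit in the current sub-buffer starts the next one,
    and the unused tail of the old sub-buffer is recorded as padding.
    """

    def __init__(self, n_subbufs, subbuf_size):
        self.n_subbufs = n_subbufs
        self.subbuf_size = subbuf_size
        self.data = bytearray(n_subbufs * subbuf_size)
        self.padding = [0] * n_subbufs  # unused bytes at the end of each sub-buffer
        self.current = 0                # index of the sub-buffer being written
        self.offset = 0                 # write offset inside that sub-buffer

    def write(self, record):
        if len(record) > self.subbuf_size:
            raise ValueError("record larger than a sub-buffer")
        if self.offset + len(record) > self.subbuf_size:
            if self.current + 1 == self.n_subbufs:
                return False            # this sketch does not wrap: drop the record
            self.padding[self.current] = self.subbuf_size - self.offset
            self.current += 1
            self.offset = 0
        start = self.current * self.subbuf_size + self.offset
        self.data[start:start + len(record)] = record
        self.offset += len(record)
        return True

    def valid_bytes(self, index):
        """Return a finished sub-buffer's contents without its padding."""
        start = index * self.subbuf_size
        end = start + self.subbuf_size - self.padding[index]
        return bytes(self.data[start:end])


log = SubBufferedLog(n_subbufs=2, subbuf_size=16)
log.write(b"0123456789")  # fits in sub-buffer 0
log.write(b"abcdefgh")    # would not fit, so it starts sub-buffer 1

print(log.padding)
print(log.valid_bytes(0))
```

A reader that knows the padding count can copy only `valid_bytes(0)` and skip the stale tail, which is the use the patch text describes for the padding array.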
From: bert h. <ber...@ne...> - 2005-07-17 09:01:48
|
On Sat, Jul 16, 2005 at 06:13:55PM -0500, Tom Zanussi wrote: > relayfs itself only provides the buffering and file operations along > with the kernel API for clients as documented in > Documentation/filesystems/relayfs.txt. Applications still need some > kind of communication between the kernel and user space in order to > know when data is ready and how much is ready - the relay-apps stuff > tries to make this easy to do by allowing clients to ignore all those > details. It happens to use netlink for this, but clients can use > whatever they want to do this communication. Ok - that is good to know. What is missing from relayfs.txt is a demarcation of which system does what. As I see it there are three things currently: 1) Basic relayfs facilities, which only stuff data into N sub-buffers per CPU, but also offer a set of functions that could be called via userspace over some sort of communication channel. 2) klog which is a thin wrapper over relay_write 3) relay-app.h which lives in the kernel and communicates with librelay.c in user space, providing that communication. Is this correct? > Then just run the kleak app, and when it finishes, you should have a > set of files, cpu0l...cpuX in your current directory containing all > the data you've logged. I've changed the fprintf(stderr, "netlink send error") to perror("netlink send error") and now it prints 'Connection refused', which makes heaps of sense since I did not use relay-app.h, but wrote directly to the channel. > > 2) What kind of messages do I need to send/receive? > > Basically, the daemon needs to know, for a given per-cpu buffer, how > many sub-buffers have been produced and consumed, in order to know > which sections of the mmapped buffer to read. It also needs to notify I currently just write away without any userspace component, except that I mmap the entire relayfs file in which I see the four configured sub-buffers. I guess that in override mode that would work? 
> The format is whatever the client writes into it - relayfs itself > doesn't impose any format at all. The client doesn't need librelay.c > to read the data itself - librelay.c is for managing the daemon side > of the application and writing ready data to disk as it becomes > available. It doesn't know anything about the actual data being written. Ok - so there is nothing in there except n stretches of data, and some padding? Each write is either IN a sub-buffer or not at all, it doesn't span sub-buffers? > > 4) What are the semantics for reading from that file? > > The file is a buffer broken up into sub-buffers. The client reads the > sub-buffers it knows are ready directly from the mmapped buffer. > The file can only be mmap()ed - there is no read() available. Indeed. So the idea is to wait for a ringbuffer to become 'full', read it, and wait for the next one to become full? > BTW, there's also documentation in relay-app.h, don't know if you saw > that. Yes - but it only makes sense after the 'separation of powers' within relayfs is clear. relayfs.txt talks rather cavalierly of 'clients' and 'calls' but does not make clear this client lives in userspace and can't just call kernel functions. Please consider the patch below. I'm not 100% sure if everything is correct, but I'd love to know. I'm wondering how relayfs could be operated safely in overwrite mode, btw - who's to say the kernel might not have zoomed past my sub-buffer once I'm notified of the crossing? The padding data I receive might be outdated by then. Sounds racey. In fact, it appears this might even happen in non-overwrite mode. 
diff -urBb -X linux-2.6.13-rc3-mm1/Documentation/dontdiff linux-2.6.13-rc3-mm1/Documentation/filesystems/relayfs.txt linux-2.6.13-rc3-mm1-ahu/Documentation/filesystems/relayfs.txt
--- linux-2.6.13-rc3-mm1/Documentation/filesystems/relayfs.txt 2005-07-17 11:00:48.000638680 +0200
+++ linux-2.6.13-rc3-mm1-ahu/Documentation/filesystems/relayfs.txt 2005-07-17 10:58:21.634889656 +0200
@@ -23,6 +23,46 @@
 the function parameters are documented along with the functions in the
 filesystem code - please see that for details.
 
+Semantics
+=========
+
+Each relayfs channel has one buffer per CPU, each buffer has one or
+more sub-buffers. Messages are written to the first sub-buffer until it
+is too full to contain a new message, in which case it is written to
+the next (if available). At this point, userspace can be notified so it
+empties the first sub-buffer, while the kernel continues writing to the
+next.
+
+If notified that a sub-buffer is full, the kernel knows how many bytes
+of it are padding, ie, unused. Userspace can use this knowledge to copy
+only valid data.
+
+After copying, userspace can notify the kernel that a sub-buffer has
+been consumed.
+
+relayfs can operate in a mode where it will overwrite data not yet
+collected by userspace, and not wait for it to consume it.
+
+relayfs itself does not provide for communication of such data between
+userspace and kernel, allowing the kernel side to remain simple and not
+impose a single interface on userspace. It does provide a separate
+helper though, described below.
+
+Klog, relay-app & librelay
+==========================
+
+relayfs itself is ready to use, but to make things easier, two
+additional systems are provided. Klog is a simple wrapper to make
+sending data to a channel simpler. relay-app is the kernel counterpart
+of userspace librelay.c, combined these two files provide glue to
+easily stream data, without having to bother with housekeeping.
+
+It is possible to use relayfs without relay-app & librelay, but you'll
+have to implement communication between userspace and kernel, allowing
+both to convey the state of buffers (full, empty, amount of padding).
+
+Klog, relay-app and librelay can be found on
+http://relayfs.sourceforge.net
 
 The relayfs user space API
 ==========================
@@ -34,7 +74,8 @@
 open()   enables user to open an _existing_ buffer.
 
 mmap()   results in channel buffer being mapped into the caller's
-         memory space.
+         memory space. Note that you can't do a partial mmap - you must
+         map the entire file, which is NRBUF * SUBBUFSIZE.
 
 poll()   POLLIN/POLLRDNORM/POLLERR supported. User applications are
          notified when sub-buffer boundaries are crossed.
@@ -70,6 +109,9 @@
     relayfs_create_dir(name, parent)
     relayfs_remove_dir(dentry)
     relay_commit(buf, reserved, count)
+
+  channel management typically called on instigation of userspace:
+
     relay_subbufs_consumed(chan, cpu, subbufs_consumed)
 
   write functions:
@@ -86,10 +128,9 @@
     buf_unmapped(buf, filp)
     buf_full(buf, subbuf_idx)
 
-
-A relayfs channel is made of up one or more per-cpu channel buffers,
-each implemented as a circular buffer subdivided into one or more
-sub-buffers.
+As explained above, a relayfs channel is made up of one or more per-cpu
+channel buffers, each implemented as a circular buffer subdivided into
+one or more sub-buffers.
 
 relay_open() is used to create a channel, along with its per-cpu
 channel buffers. Each channel buffer will have an associated file
@@ -123,24 +164,25 @@
 data regardless of whether it's actually been consumed. In
 no-overwrite mode, writes will fail i.e. data will be lost, if the
 number of unconsumed sub-buffers equals the total number of
-sub-buffers in the channel. In this mode, the client is reponsible
-for notifying relayfs when sub-buffers have been consumed via
-relay_subbufs_consumed(). A full buffer will become 'unfull' and
-logging will continue once the client calls relay_subbufs_consumed()
-again. When a buffer becomes full, the buf_full() callback is invoked
-to notify the client. In both modes, the subbuf_start() callback will
-notify the client whenever a sub-buffer boundary is crossed. This can
-be used to write header information into the new sub-buffer or fill in
-header information reserved in the previous sub-buffer. One piece of
-information that's useful to save in a reserved header slot is the
-number of bytes of 'padding' for a sub-buffer, which is the amount of
-unused space at the end of a sub-buffer. The padding count for each
-sub-buffer is contained in an array in the rchan_buf struct passed
-into the subbuf_start() callback: rchan_buf->padding[prev_subbuf_idx]
-can be used to to get the padding for the just-finished sub-buffer.
-subbuf_start() is also called for the first sub-buffer in each channel
-buffer when the channel is created. The mode is specified to
-relay_open() using the overwrite parameter.
+sub-buffers in the channel.
+
+In this mode, the userspace client is responsible for notifying relayfs when
+sub-buffers have been consumed via relay_subbufs_consumed(). A full buffer
+will become 'unfull' and logging will continue once the client calls
+relay_subbufs_consumed(). When a buffer becomes full, the buf_full()
+callback is invoked to notify the client. In both modes, the subbuf_start()
+callback will notify the client whenever a sub-buffer boundary is crossed.
+
+This can be used to write header information into the new sub-buffer or fill
+in header information reserved in the previous sub-buffer. One piece of
+information that's useful to save in a reserved header slot is the number of
+bytes of 'padding' for a sub-buffer, which is the amount of unused space at
+the end of a sub-buffer. The padding count for each sub-buffer is contained
+in an array in the rchan_buf struct passed into the subbuf_start() callback:
+rchan_buf->padding[prev_subbuf_idx] can be used to get the padding for
+the just-finished sub-buffer. subbuf_start() is also called for the first
+sub-buffer in each channel buffer when the channel is created. The mode is
+specified to relay_open() using the overwrite parameter.
 
 kernel clients write data into the current cpu's channel buffer using
 relay_write() or __relay_write(). relay_write() is the main logging

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services |
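The Semantics text in the patch above says that userspace can use the padding count to copy only valid data, and that the whole file of NRBUF * SUBBUFSIZE bytes has to be mapped. A minimal userspace sketch of that arithmetic in C could look like the following. The helper names channel_map_len() and copy_valid_bytes() are hypothetical and are not part of relayfs, relay-app or librelay. How the padding count reaches userspace is deliberately left out, since relayfs itself does not define that channel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helpers for illustration only; neither function exists
 * in relayfs, relay-app or librelay.
 */

/* Length to pass to mmap(): the whole per-cpu file must be mapped. */
size_t channel_map_len(size_t n_subbufs, size_t subbuf_size)
{
    return n_subbufs * subbuf_size;
}

/*
 * Copy only the valid part of one finished sub-buffer.
 *
 * base        start of the mapped per-cpu buffer
 * subbuf_size size of each sub-buffer in bytes
 * subbuf_idx  index of the finished sub-buffer within the buffer
 * padding     unused bytes at the end of that sub-buffer, as reported
 *             by the kernel side through whatever channel the
 *             application uses
 * out         destination, at least subbuf_size bytes
 *
 * Returns the number of valid bytes copied.
 */
size_t copy_valid_bytes(const char *base, size_t subbuf_size,
                        size_t subbuf_idx, size_t padding, char *out)
{
    assert(padding <= subbuf_size);

    const char *start = base + subbuf_idx * subbuf_size;
    size_t valid = subbuf_size - padding;

    memcpy(out, start, valid);
    return valid;
}
```

A reader loop would call copy_valid_bytes() once per sub-buffer it has been told is ready, and afterwards report the consumed count back to the kernel side.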
From: Tom Z. <za...@us...> - 2005-07-16 23:14:21
|
bert hubert writes:
> Ok, I'm working furiously on my OLS presentation (Wednesday, 3pm, be
> there), but I'm running into a wall with relayfs, which I intend to use to
> convey large amounts of disk statistics towards userspace.
>
> Now, I've read Documentation/filesystems/relayfs.txt many times over, and I
> don't get it.
>
> It appears there is relayfs, and 'klog' on top of that. It also appears that
> to access relayed data from the kernel in userspace there is librelay.c.
>
> On reading librelay.c, I find code sending and receiving netlink
> messages, but relayfs.txt doesn't even contain the word netlink!

Hi,

relayfs itself only provides the buffering and file operations along
with the kernel API for clients as documented in
Documentation/filesystems/relayfs.txt. Applications still need some
kind of communication between the kernel and user space in order to
know when data is ready and how much is ready - the relay-apps stuff
tries to make this easy to do by allowing clients to ignore all those
details. It happens to use netlink for this, but clients can use
whatever they want to do this communication.

The klog patch just makes a couple of utility logging functions
available for use from anywhere within the kernel which allow the
client to not have to worry about whether or not there's a relayfs
channel ready to receive the data - you could just as well use
relay_write directly in say the IO function you want to trace, but
you'd have to do something like if(relay_channel) relay_write(). It
just allows you to unconditionally log regardless of whether there's a
channel ready or not.

If you just want to get something up and running without worrying
about the netlink channel and all that stuff, you can just modify the
kleak example as follows:

- apply the klog.patch

- in kleak.c, change init_relay_app("kleak", "cpu", NULL) to
  init_relay_app("diskstat", "cpu", NULL). The relayfs files will be
  created as /mnt/relay/diskstat/cpu0...cpuX, if you've mounted
  relayfs at /mnt/relay.

- in kleak-app.c, change

  static char *kleak_filebase = "/mnt/relay/kleak/cpu";

  to

  static char *kleak_filebase = "/mnt/relay/diskstat/cpu";

- log the data from the kernel functions using klog() or
  klog_printk(). The kleak.patch file shows how to do this for
  kmalloc/kfree, just do something similar in the functions you
  actually want to instrument. You can also use klog_printk() if you
  want to log as text.

Then just run the kleak app, and when it finishes, you should have a
set of files, cpu0...cpuX in your current directory containing all
the data you've logged.

If you still have problems and would be willing to share your code,
I'd be happy to get it going myself. Just let me know.

> I then launched the 'kleak-app' sample program, but told it to look at
> /relay/diskstat* instead of its own file, but it gives me unspecified
> netlink errors.

Can you give me more details about these errors?

> Things I need to know, and which I hope to find documented somewhere:
>
> 1) Do I need to do the netlink thing?

No, the example code uses netlink, but you could use anything you want
to communicate between the kernel and daemon.

> 2) What kind of messages do I need to send/receive?

Basically, the daemon needs to know, for a given per-cpu buffer, how
many sub-buffers have been produced and consumed, in order to know
which sections of the mmapped buffer to read. It also needs to notify
the kernel client of how many sub-buffers it's consumed. Basically
that's it - the rest is application management e.g. the buffer sizes
to use, when to start/stop logging, etc.

> 3) What is the exact format userspace sees in the relayfs file? Iow, can I
> access that file w/o using librelay.c?

The format is whatever the client writes into it - relayfs itself
doesn't impose any format at all. The client doesn't need librelay.c
to read the data itself - librelay.c is for managing the daemon side
of the application and writing ready data to disk as it becomes
available. It doesn't know anything about the actual data being
written.

> 4) What are the semantics for reading from that file?

The file is a buffer broken up into sub-buffers. The client reads the
sub-buffers it knows are ready directly from the mmapped buffer. The
file can only be mmap()ed - there is no read() available.

> 5) When using klog, is there only one channel?

There is only one channel, which is represented in the filesystem as a
set of per-cpu files.

> 6) does librelay.c talk to regular relayfs or to klog?

librelay.c talks to the client code in relay-app.h, which in turn uses
the relayfs kernel API to talk to relayfs.

BTW, there's also documentation in relay-app.h, don't know if you saw
that.

Hope that helps,

Tom |
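The answer to question 2 above boils down to two counters per per-cpu buffer: sub-buffers produced and sub-buffers consumed. The following C sketch shows one way a daemon might keep that bookkeeping. It assumes no-overwrite mode, so the kernel never gets more than a full buffer ahead of the reader. The struct and function names are hypothetical and are not taken from librelay.c or relay-app.h, and the transport that delivers the produced count (netlink in the example code) is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-cpu bookkeeping for a userspace daemon; these names
 * are illustrative and do not come from librelay.c or relay-app.h.
 * Assumes no-overwrite mode.
 */
struct subbuf_state {
    size_t n_subbufs; /* sub-buffers in this per-cpu buffer */
    size_t produced;  /* total sub-buffers the kernel has filled */
    size_t consumed;  /* total sub-buffers the daemon has read */
};

/* Number of finished sub-buffers waiting to be read. */
size_t subbufs_ready(const struct subbuf_state *s)
{
    assert(s->produced >= s->consumed);
    return s->produced - s->consumed;
}

/* Index, within the circular buffer, of the next sub-buffer to read. */
size_t next_subbuf_index(const struct subbuf_state *s)
{
    return s->consumed % s->n_subbufs;
}

/*
 * Record that 'count' sub-buffers have been read. The same count is
 * what the daemon would then report to the kernel client, which can
 * pass it on to relay_subbufs_consumed().
 */
void mark_consumed(struct subbuf_state *s, size_t count)
{
    assert(count <= subbufs_ready(s));
    s->consumed += count;
}
```

The counters grow monotonically, and the position in the mapped buffer is derived from them with a modulo, so wrap-around needs no special handling in the reader.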
From: bert h. <ber...@ne...> - 2005-07-16 21:12:56
|
Ok, I'm working furiously on my OLS presentation (Wednesday, 3pm, be
there), but I'm running into a wall with relayfs, which I intend to use to
convey large amounts of disk statistics towards userspace.

Now, I've read Documentation/filesystems/relayfs.txt many times over, and I
don't get it.

It appears there is relayfs, and 'klog' on top of that. It also appears that
to access relayed data from the kernel in userspace there is librelay.c.

On reading librelay.c, I find code sending and receiving netlink
messages, but relayfs.txt doesn't even contain the word netlink!

I then launched the 'kleak-app' sample program, but told it to look at
/relay/diskstat* instead of its own file, but it gives me unspecified
netlink errors.

Things I need to know, and which I hope to find documented somewhere:

1) Do I need to do the netlink thing?

2) What kind of messages do I need to send/receive?

3) What is the exact format userspace sees in the relayfs file? Iow, can I
   access that file w/o using librelay.c?

4) What are the semantics for reading from that file?

5) When using klog, is there only one channel?

6) does librelay.c talk to regular relayfs or to klog?

Don't get me wrong, relayfs sure looks nice for what I'm trying to do but
from userspace it is sort of a black box right now..

Thanks!

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services |