From: Erik W. <om...@cs...> - 2000-03-23 09:45:18
This message is intended to get everyone up to speed on the co-routines as they will be put into CVS in the near future. There are a lot of benefits and things that are now possible due to the addition of these things, but I've written up a lot of the reasoning behind them as well. This should probably end up in the developer documentation at some point. It's long, but I think it's required reading for anyone hacking the core code (gstreamer/gst/*.[ch]).

-----

First, a quick explanation of The Way Things Were(tm):

To actually run the pipeline, one would call an _iterate() function, either provided by the bin or thread. More specifically, the thread would call its own _iterate() in a loop once set RUNNING && PLAYING. During plan generation it would have discovered all elements that satisfy one of the following criteria:

a) GstSrc-derived
b) has a sink pad that's wired to an element outside the bin (i.e. a queue; in this case the element recorded is the one outside, i.e. the queue)

These are called 'entries' into the Bin, and are the scheduling drivers. Of course, behaviour is strictly undefined if there is more than one of these, and that's why things have changed...

The _iterate() function would then go through the list of entries and call their respective push() or equivalent function, meaning either gst_src_push() or gst_connection_push() (remember, the outside element is the entry in that case), in either case causing the push() function of the element to fire. In said push() function some work is done (read from disk, pull from queue, whatever) and a gst_pad_push() is called.
At this point the chain() function for the peered pad is called, which has the effect of calling the functional code for the next element in the pipeline, which just happens to create the 'pipeline' effect, hence the point of this whole description ;-)

Now, consider the multitude of cases where that whole model just doesn't cut it:

1) elements that can't respond to chain() calls but must pull their data (like a real bitstream-based element)
2) elements with more than one input or output

It turns out that these are more common than you might want to think. The whole OGI pipeline structure is built around the loop model, where the life of an element is while(1) {pull;process;push}. The various system-stream parsers create the problem of potentially doing large amounts of work serially if stuff is hooked to them directly. Obviously, you can solve this by putting thread boundaries between such misbehaved elements, at least in the mux/demux case. But there are problems with even that, as we've found at OGI. Co-routines were put in the OGI pipeline to deal with the problem that several of the elements were barely touching the CPU, yet causing huge OS-level overhead from constantly switching between them. Merging them into a single schedulable entity solves the problem, because typically there's a larger element close by to group each one with, so scheduling overhead drops drastically.

So, the solution is co-routines (also called cothreads; same thing). First, co-routines are a simple user-space method for switching between subtasks. They're based on setjmp()/longjmp() in their current form, though I've heard that there are other (more machine-specific) methods that are faster. Basically, setjmp() saves the current stack frame, PC, and so on to a structure, and longjmp() switches to one of these stack frames. That means that you save the stack for your current context and promptly switch (fork()-style) to some other context.
You get returned to just after the setjmp(), hence the fork()-style check before longjmp()'ing again, unless you like infinite loops ;-)

As they are implemented in GStreamer, the whole of the work is done in the Bin. GstThread pretty much goes along for the ride by not overriding things, which is the way things are supposed to be. The Pads help a lot, but are unaware of the actual mechanism.

The changes at the Pad level consist of a buffer pen and a function pointer (two currently, but that'll be fixed). Basically, there's a push and a pull function (this also conflicts with backwards buffer passing, but I'll figure something out) that's used for all transfer operations. In the _push() case, the buffer is placed in the peer's pen and the push() function pointer is called. This is assumed to do something that transfers the buffer to the peer, one way or another. The _pull() case is reversed and conditional: if there's a buffer in the pen, grab it; if not, call the pull() function pointer, then grab it.

When we move up a level into the Bin, we first come to the plan generation. The first thing done is to create the global cothread context, and then state for each of the elements. All the functionality is provided by a generalized library of sorts in cothread.[ch], which is where all the work of making it portable must be done (different chip architectures, changes in pthread guts, etc. will all render cothreads unusable). Also, the push() and pull() handler functions (soon to be merged into one, maybe switch()?) are set for all the pads.

Then in the iterate function (which has been abstracted out so each Bin subclass can provide its own), things actually get really simple. All it does is cothread_switch() to some arbitrary element's cothread state.
Currently it chooses the first on the list, but this can be modified later to provide context-driver functionality (where _iterate() actually terminates at some point, say when that driver starts running a second time; there are reasons for this, albeit complicated).

So, the execution trace gets interesting, but it boils down to the simple fact that any time a gst_pad_push() is done, the holding pen is filled and the appropriate switch handler is called (push() currently), which in the current implementation does nothing but cothread_switch() to the peer element. A gst_pad_pull() is similar, except it switches whenever the holding pen is empty and it wants to get a buffer.

The key that makes it work in both chain- and loop-function based environments is the wrapper that actually runs the element. When you create a cothread state, you have to provide a function pointer as the first bit of code to run when the cothread is actually created. This function is always provided by the Bin, and handles both cases. In the loop-function based case, it just calls the element's loop function. A neat trick is that it does so in a while(1) loop, so the element's 'loop function' doesn't actually have to loop. Not sure this is useful, but it's 100% free in the usual case, so why not? The chain-function based approach consists of a loop (while(1)) that runs gst_pad_pull() and calls the chain() function for that pad. Simple, eh?

Where it breaks is basically the same problem you find with pure chain-function setups, and that's in elements with multiple inputs. In this scheme they're currently round-robined. If they don't happen to want to take inputs on a 1-for-1 basis while producing output regularly, one leg or another is going to get lopped off, causing all kinds of scheduling nightmares. The solution in that case is: Don't Do It! Use loop-function based elements in any situation that comes even close to that.
This changes things in the sense that it's now not necessary to build things like the mp3parse and ac3parse elements, since mpg123 and ac3dec are bitstream-based elements and a good bitstream/getbits library should be capable of pulling data on demand. This way you just provide a function that does a gst_pad_pull() and be done with it.

What is not dealt with yet is state transitions in said elements. Consider the case of an MPEG video decoder that's got state to worry about. If you were to switch states and cause the plan to be regenerated (I'll try to do that write-up tomorrow sometime), you could end up switching back into the decoder element in the middle of a decode, while providing it data from a brand-new stream. Thus, some reset mechanism should be provided for elements that need such functionality. Entirely optional, of course.

Also, some of our discussions at work today pointed out the fact that we need a pretty complete implementation of select(2) for inter-element connections (I'm skirting around some competing terminology here... <g>), which in the pure co-routine case means just 'randomly' switching to some sourcing element. Whichever one comes back first 'wins the select'. In the thread-boundary case things get more interesting, since you'll have a queue attached. In fact, I believe that the Connection (queue) case isn't handled yet; I'll have to think through how that works and try it out.

A more generally useful capability is that of non-blocking pull()'s. Strictly speaking, this is a bit of an oxymoron in most cases, thus it's back to the queue case to figure out how that will work here. That's the only time you should really be blocking (on an empty queue), though that would mean that someone isn't doing their job anyway....

Also, note that all of the cothread code is partitioned into the Bin class.
This means that you can relatively trivially create your own subclass (or even override just that part of an existing class for a given instance [maybe?]) with custom scheduling routines, whether cothread-based or not. Though doing it without cothreads is going to get rather hairy in a lot of cases. That's where some of the Pipeline intelligence comes into play, and the virtual FROZEN state. Yet another thing to write up...

Sigh, I'm going to go sleep now.

TTYAL,
Omega

Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/

   __
  /  \            SEUL: Simple End-User Linux - http://www.seul.org/
 |    | M E G A   Helping Linux become THE choice
 _\  /_           for the home or office user