From: Scott W. <bau...@co...> - 2005-02-15 07:35:40
|
I'm very excited about finding Libvisual! I wanted to take a few days to grok how it all works and meshes together before I posted my comments, and I _think_ I'm getting a handle on this. If I'm completely off-base, feel free to bash me upside the head.

Before my comments start, let me point out that, in case it hasn't been reported yet, the frame limiter in 0.2.0 is broken, at least for the XMMS plugin: it's chewing up all available CPU time. I profiled one of the simpler plugins (the scope) to check this out, and when sized very small (roughly 100 x 50 by my eye) the "render" callback was still getting called more than 600 times per second, even though frames were (theoretically) being limited to 30/second.

So here are my initial thoughts and suggestions, for what they're worth:

It would appear to me that instead of pcm data being "pushed" to the visualizer engine, as is done in the visualizer plug-in models put forth by XMMS and WinAmp, the pcm data in Libvisual is being "pulled" via an Input plugin's VisPluginInputUploadFunc or by implementing an upload callback. The "pull" model works just fine, but Libvisual needs to add some means for synchronization and non-audio data, or else you'll cut out an entire class of visualizations.

That's probably unclear, so let me give an example. Let's say I'm decoding an MPEG or AVI file using FFMPEG (ffmpeg.sourceforge.net) - as I decode, I'm going to get interleaved "packets" of audio and video data. Depending on the codec, the display times between individual video frames can vary wildly, so a simple latency calculation won't sync the video to the audio. Instead, the codec (or ffmpeg, or the application itself) calculates and provides a "presentation time stamp", in stream-relative time, for when to display each frame of video. The video visualization plugin's job is to buffer frames and display them when the proper time comes.
So, if you stick with the "pull" data model, the application needs to expose a method for the visualization plugin to get the current playing stream time. Likewise, there needs to be a method to query visualization plugins to see if they can accept and handle certain special data types (so I don't exacerbate entropy by sending video packets to Goom, for instance) and an API for getting that special data to those that do (as simple as a "userdata" callback in addition to the "render" callback).

In the same light, consider a Karaoke (mp3+cdg or ogg+cdg) plugin. There are actually two data sources: a standard .mp3 or .ogg file, and a separate .cdg file that contains the karaoke lyrics and graphics that were ripped out of the subchannel data of an audio CD+G disc. A karaoke visualizer would get the .cdg data sent to it as a one-shot package at the start of the stream. Before returning from a "song_start" callback (and there should be one of these, as well as callbacks for "song_pause", "song_resume", and "song_end", so the thing ain't chewing up CPU cycles if Joe User needs to pause audio to do something CPU-intensive for a bit), it decodes the CDG data into frames and generates presentation time stamps for each frame. Again, during song playback it doesn't care a whiff about pcm data; it just wants to monitor stream playback time and display each frame synced to the audio output.

In fact, there's another standard using MIDI with karaoke lyrics that may not generate any pcm data at all (and while I'm at it, I might want my non-karaoke MIDI file player to generate data for a graphic piano keyboard visualizer in pass-through or hardware-synth mode, or use "regular" audio visualizers when using a software synth that creates regular pcm data).

In both of the above cases, the "render" callback may or may not actually draw a frame if the call happens between two presentation time stamps.
For that matter, Libvisual should not assume that any Actor actually draws a frame during any particular "render" call unless the Actor tells Libvisual that it _did_ draw a frame. Likewise, there needs to be a method or callback to the application to let it know that a new frame is available for drawing, so that an unchanged video buffer isn't getting re-blitted without cause. As an application writer, I obviously have a vested interest in any CPU-saving tweak possible.

As far as the interface for querying which special data types a visualization plugin can accept, I think something similar to the way a WinAmp input plug-in exposes which filetypes it can handle would work great. Off the top of my head, here are a few special data types of interest:

- Streaming Video
- Karaoke CDG
- Stream Tags (i.e. ID3V2 or APE)

Either in addition to, or in place of, the VisSongInfo stuff, you could also abstract special data types for:

- Artist/Track Title/Album/Year
- Still images associated with an artist/track (album cover art, artist photos, etc. in JPEG, BMP, GIF, PNG, etc.)

I also want to point out an obvious omission here (understandable, since you probably weren't considering a video class of visualizers). In addition to RGB and OpenGL display types, there should also be a YUV display type. The same rules of no-blit/no-morph between dissimilar display types should apply. The YUV420P format (SDL type: SDL_YV12_OVERLAY) should be sufficient out of the starting gate.

I also think there should be a non-gui method for getting/setting an individual visualizer's configuration settings via a serialized string. The application may or may not be able to decipher the contents of the command strings for a particular visualizer plug-in, but they can still be thought of as a "bookmark" of what the user likes, above and beyond the last-used configuration settings. Wouldn't it also be nice if the plug-ins exposed language-neutral author/copyright/credits/plug-in name & version info to the application?
A method of getting a list of presets (for those visualizers that support them) and selecting one by non-gui means would also be cool.

Finally, on my wish list (but I realize the added YUV mode adds another layer of complexity to this, and alpha-channelled shaped text is probably out of the question for that), I'd love to see some sort of overlay engine. Perhaps I, as an application author, want a scrolling ticker at the bottom of the screen announcing sports scores or happy hour specials (!), or some On Screen Display info for a TV or radio tuner card, when I'm running in full-screen mode (or otherwise). Not that big of a deal for me to add at application level, but a guy can dream, right?

In VisUI, a useful addition would be a tab or page widget for implementing multiple dialog box pages. I think wxWidgets has about the snazziest way of specifying a platform-neutral dialog box that I've seen.

One last question: is frequency spectrum analysis being done even for those visualizers that don't need it? If so, there should be a way to turn off those expensive FFTs if they don't help a given visualizer (and/or morph).

Keep up the great work!

S.W. |
From: Dennis S. <sy...@yo...> - 2005-02-15 10:37:46
|
On Tue, 2005-02-15 at 02:37 -0500, Scott Watson wrote:

> I'm very excited about finding Libvisual! I wanted to take a
> few days to grok how it all works and meshes together before
> I posted my comments, and I _think_ I'm getting a handle on
> this. If I'm completely off-base, feel free to bash me upside
> the head.

Well, I am delighted to see someone excited by libvisual, and I'd love to guide you through the mess we wrote :)!

> Before my comments start, let me point out that in case it hasn't
> been reported yet, the frame-limiter for 0.2.0 is broken, at least
> for the XMMS plugin. It's chewing up all available CPU time. I
> profiled one of the simpler plugins (the scope) to check this
> out and when sized very small (roughly 100 X 50 by my eye) the
> "render" callback was still getting called more than 600 times
> per second even though frames were (theoretically) being limited
> to 30/second.

I removed the xmms frame limiter completely; it was very borked, and we will replace the client completely with a rewrite that uses libvisual-display. lvdisplay is a library we're working on that provides display abstraction in the same fashion as libvisual does with visualisation plugins. It's going to be very powerful, and Vitaly is working on it. You can check out the work in libvisual-display within the CVS server.

> So here're my initial thoughts and suggestions, for what they're
> worth:
>
> It would appear to me that instead of pcm data being "pushed" to the
> visualizer engine as is done in the visualizer plug-in models put
> forth by XMMS and WinAmp, the pcm data in Libvisual is being "pulled"
> via an Input plugin's VisPluginInputUploadFunc or by implementing
> an upload callback.

Internally, the data is pushed to the plugins, but on the frontside we pull it. This is because VisInput also has its own plugins, VisInputPlugins. These plugins can act as a capture for alsa, jack, esd and such. We've written this because we have a standalone client in mind.
So the callback is nothing more than a pull implementation of what normally gets done in the plugin.

> The "pull" model works just fine, but Libvisual needs to add
> some means for synchronization and non-audio data or else you'll
> cut out an entire class of visualizations. That's probably
> unclear, so let me give an example:
>
> Let's say I'm decoding an MPEG or AVI file using FFMPEG
> (ffmpeg.sourceforge.net) - as I decode, I'm going to get
> interleaved "packets" of audio and video data. Depending on the
> codec, the display times between individual video frames can vary
> wildly so a simple latency calculation won't sync the video to the
> audio. Instead, the codec (or ffmpeg or the application itself)
> calculates and provides a "presentation time stamp" in stream
> relative time of when to display each frame of video. The video
> visualization plugin's job is to buffer frames, and display them
> when the proper time comes. So, if you stick with the "pull" data
> model, you need to have the ability for the application to expose
> a method for the visualization plugin to get the current playing
> stream time. Likewise, there needs to be a method to query
> visualization plugins to see if they can accept and handle certain
> special data types (so I don't exacerbate entropy sending video
> packets to Goom, for instance) and an API for getting that special
> data to those that do (as simple as a "userdata" callback in
> addition to the "render" callback.)

Well, actually it is possible to semi-push it before render, by writing the VisInput layer yourself; it isn't very hard. However, you're right. What we're going to do is rewrite VisAudio within a few months, to add support for more audio formats and floating-point audio, as well as cooler detections such as BPM detection. So we could take this into the design as well. I am not sure about the special data types, as we try to keep the framework only semi-specific.
> In the same light, consider a Karaoke (mp3+cdg or ogg+cdg) plugin.
> There are actually two data sources: a standard .mp3 or .ogg file,
> and a separate .cdg file that contains the karaoke lyrics and
> graphics that were ripped out of the subchannel data of an audio
> CD+G disc.
>
> A karaoke visualizer would get the .cdg data sent to it as a one-shot
> package at the start of the stream, and before returning from a
> "song_start" callback (and there should be one of these, as well as
> callbacks for "song_pause", "song_resume", and "song_end" so the
> thing ain't chewing up CPU cycles if Joe User needs to pause audio
> to do something CPU-intensive for a bit) it decodes the CDG data
> into frames, and generates presentation time stamps for each frame.
> Again, during song playback, it doesn't care a whiff about pcm data,
> it just wants to monitor stream playback time and display each frame
> synced to the audio output. In fact, there's another standard
> using MIDI with karaoke lyrics that may not generate any pcm data
> at all (and while I'm at it, I might want my non-karaoke MIDI file
> player to generate data for a graphic piano keyboard visualizer in
> pass-through or hardware-synth mode, or use "regular" audio
> visualizers when using a software-synth that creates regular pcm data.)

Aah, here we have a case! :) Well, what I suggest we could do is use a VisObject via VisParam: define a special KaraokeObject and send it as a parameter to the plugin's event loop. The event loop gets called every frame before render, so it's sufficient to synchronize using this.

> In both of the above cases, the "render" callback may or may not
> actually draw a frame if the call happens between two presentation
> time stamps.
> For that matter, Libvisual should not assume that
> any Actor actually draws a frame during any particular "render"
> call unless the Actor tells Libvisual that it _did_ draw a frame,
> and likewise there needs to be a method or callback to the application
> to let it know that a new frame is available for drawing so that
> an unchanged video buffer isn't getting re-blitted without cause. As
> an application writer, I obviously have a vested interest in any
> CPU-saving tweak possible.

Alright, we could return '0' or '1' from the render function depending on whether a frame was drawn; would this be sufficient? We also have an interest in CPU-saving tweaks since, with VJing, we combine multiple visuals and run them through transforms, so this is very important :)

> As far as the interface for querying which special data types a
> visualization plugin can accept, I think something similar to
> the way a WinAmp input plug-in exposes which filetypes it can
> handle would work great. Off the top of my head - here are a
> few special data types of interest:
>
> Streaming Video
> Karaoke CDG
> Stream Tags (i.e. ID3V2 or APE)
>
> Either in addition to, or in place of the VisSongInfo stuff,
> you could also abstract special data types for:
>
> Artist/Track Title/Album/Year
> Still Images associated with an artist/track (album cover art,
> artist photos, etc. in JPEG, BMP, GIF, PNG, etc.)

I think we could work this out using the VisObject system and set up some guidelines. We could also create a frontend around this. It's very interesting to see these suggestions, because they contain things I never thought of!

> I also want to point out an obvious omission here (understandable,
> since you probably weren't considering a video class of visualizers.)
> In addition to RGB and OpenGL display types, there should also be
> a YUV display type. The same rules of no-blit/no-morph between
> dissimilar display types should apply.
> YUV420P (SDL type:
> SDL_YV12_OVERLAY) format should be sufficient out of the starting
> gate.

Unless the need is REALLY there, I don't want to consider YUV, because it adds a load of complexity. I think the basics are actually easy to add, but the conversion (when needed) between yuv/rgb and vice versa is a pain. Though, if the need is there with sufficient reason, we will implement it (and might like some help with it).

> I also think there should be a non-gui method for getting/setting
> individual visualizer configuration settings by a serialized string.
> The application may or may not be able to decipher the contents of
> the command strings for a particular visualizer plug-in, but they
> can still be thought of as a "bookmark" of what the user likes
> above and beyond the last-used configuration settings. Wouldn't
> it also be nice if the plug-ins exposed to the application
> language-neutral author/copyright/credits/plug-in name & version
> info. A method of getting a list of presets (for those visualizers
> that support them) and selecting one by non-gui means would also
> be cool.

We're working on the serialization :), and i18n support regarding languages would be nice, yeah :).

> Finally, on my wish list (but I realize the added YUV mode adds
> another layer of complexity to this and alpha-channelled shaped
> text is probably out of the question for that) I'd love to see
> some sort of overlay engine: Perhaps I, as an application author,
> want a scrolling ticker at the bottom of the screen announcing
> sports scores or happy hour specials (!) or some On Screen Display
> info for a TV or radio tuner card when I'm running in full-screen
> mode (or otherwise.) Not that big of a deal for me to add at
> application level, but a guy can dream, right?

We've got visual_video_blit_overlay, which for RGB supports alpha channels. It's rather fast, as it's done in MMX. What you describe there with the overlay is very possible at this very moment.
Regarding the fullscreen stuff, lvdisplay will really help with this: it will be multihead-aware, so you can also use dualhead setups, one head for control and one with the graphics, for example. But I suspect that you have the need for YUV, so we could start investigating how to implement this best :)

> In VisUI, a useful addition would be a tab or page widget for
> implementing multiple dialog box pages. I think wxWidgets has
> about the snazziest way of specifying a platform-neutral dialog
> box that I've seen.

Could you point me to some wxWidgets docs? VisUI is there to be extended :)

> One last question: is frequency spectrum analysis being done even
> for those visualizers that don't need it? If so, there should be
> a way to turn off those expensive FFTs if they don't help a given
> visualizer (and/or morph).

I agree; you can control it by hand, but not easily. The VisAudio replacement that is upcoming (but at least 2 months away) will fix all this! :)

Thanks for your interest, and keep in mind we're really in diehard development; we're far from stable in both API and ABI. But on the positive side, we're open to suggestions, and we'd like to get it right now rather than in three years, when we've had 4 API-stable branches :) Keep your suggestions coming, as they only improve what we write :)

Thanks,

Dennis |