From: Scott W. <bau...@co...> - 2005-02-15 07:35:40
I'm very excited about finding Libvisual! I wanted to take a few days to grok how it all works and meshes together before I posted my comments, and I _think_ I'm getting a handle on this. If I'm completely off-base, feel free to bash me upside the head.

Before my comments start, let me point out that, in case it hasn't been reported yet, the frame-limiter for 0.2.0 is broken, at least for the XMMS plugin. It's chewing up all available CPU time. I profiled one of the simpler plugins (the scope) to check this out, and when sized very small (roughly 100 x 50 by my eye) the "render" callback was still getting called more than 600 times per second, even though frames were (theoretically) being limited to 30/second.

So here are my initial thoughts and suggestions, for what they're worth:

It would appear to me that instead of pcm data being "pushed" to the visualizer engine, as is done in the visualizer plug-in models put forth by XMMS and WinAmp, the pcm data in Libvisual is being "pulled" via an Input plugin's VisPluginInputUploadFunc or by implementing an upload callback. The "pull" model works just fine, but Libvisual needs to add some means for synchronization and non-audio data, or else you'll cut out an entire class of visualizations.

That's probably unclear, so let me give an example. Let's say I'm decoding an MPEG or AVI file using FFMPEG (ffmpeg.sourceforge.net) - as I decode, I'm going to get interleaved "packets" of audio and video data. Depending on the codec, the display times between individual video frames can vary wildly, so a simple latency calculation won't sync the video to the audio. Instead, the codec (or ffmpeg, or the application itself) calculates and provides a "presentation time stamp", in stream-relative time, of when to display each frame of video. The video visualization plugin's job is to buffer frames and display them when the proper time comes.
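To make the buffer-and-display idea concrete, here's a minimal sketch of the timing decision such a video actor would make on each "render" call. All names here (`video_frame`, `frame_is_due`, the millisecond units) are my own illustrations, not part of the Libvisual API; the only assumption is that the application can expose the current stream playback time:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffered frame carrying the presentation time stamp
 * the decoder (or the application) computed for it. */
typedef struct {
    int64_t pts_ms;   /* presentation time stamp, stream-relative, in ms */
    /* ... pixel data would live here ... */
} video_frame;

/* Decide whether the frame at the head of the queue should be shown now:
 * its PTS has been reached, but it isn't so stale that it should just be
 * dropped.  stream_time_ms is the current playing time the application
 * would expose to the plugin. */
static bool frame_is_due(const video_frame *f, int64_t stream_time_ms,
                         int64_t stale_threshold_ms)
{
    if (stream_time_ms < f->pts_ms)
        return false;                               /* too early: keep buffering */
    return (stream_time_ms - f->pts_ms) <= stale_threshold_ms;
}
```

The point is that between two presentation time stamps the render callback would simply do nothing, which is exactly why a fixed latency calculation can't substitute for real stream-time queries.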
So, if you stick with the "pull" data model, the application needs to be able to expose a method for the visualization plugin to get the current playing stream time. Likewise, there needs to be a method to query visualization plugins to see if they can accept and handle certain special data types (so I don't exacerbate entropy sending video packets to Goom, for instance), and an API for getting that special data to those that do (as simple as a "userdata" callback in addition to the "render" callback).

In the same light, consider a Karaoke (mp3+cdg or ogg+cdg) plugin. There are actually two data sources: a standard .mp3 or .ogg file, and a separate .cdg file that contains the karaoke lyrics and graphics that were ripped out of the subchannel data of an audio CD+G disc. A karaoke visualizer would get the .cdg data sent to it as a one-shot package at the start of the stream, and before returning from a "song_start" callback (and there should be one of these, as well as callbacks for "song_pause", "song_resume", and "song_end", so the thing ain't chewing up CPU cycles if Joe User needs to pause audio to do something CPU-intensive for a bit) it would decode the CDG data into frames and generate presentation time stamps for each frame. Again, during song playback, it doesn't care a whiff about pcm data; it just wants to monitor stream playback time and display each frame synced to the audio output.

In fact, there's another standard using MIDI with karaoke lyrics that may not generate any pcm data at all (and while I'm at it, I might want my non-karaoke MIDI file player to generate data for a graphic piano keyboard visualizer in pass-through or hardware-synth mode, or use "regular" audio visualizers when using a software synth that creates regular pcm data). In both of the above cases, the "render" callback may or may not actually draw a frame if the call happens between two presentation time stamps.
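One simple way to do the capability query is a bitmask that each plugin advertises, which the application checks before routing a packet. This is purely a sketch of the proposal; none of these constants or functions exist in Libvisual today:

```c
#include <stdbool.h>

/* Illustrative special-data-type IDs; not real Libvisual constants. */
enum vis_data_type {
    VIS_DATA_PCM   = 1 << 0,   /* ordinary audio samples            */
    VIS_DATA_VIDEO = 1 << 1,   /* decoded/compressed video packets  */
    VIS_DATA_CDG   = 1 << 2,   /* karaoke CD+G subchannel data      */
    VIS_DATA_TAGS  = 1 << 3    /* stream tags such as ID3v2 or APE  */
};

/* Each plugin would advertise a bitmask of what it accepts, so the
 * application never sends video packets to a pcm-only actor like Goom. */
static bool plugin_accepts(unsigned advertised_mask, enum vis_data_type t)
{
    return (advertised_mask & t) != 0;
}
```

A "userdata" callback would then only ever be invoked with data types the plugin declared, and the song_start/song_pause/song_resume/song_end lifecycle callbacks could be optional (NULL) entries in the same plugin descriptor.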
For that matter, Libvisual should not assume that any Actor actually draws a frame during any particular "render" call unless the Actor tells Libvisual that it _did_ draw a frame. Likewise, there needs to be a method or callback to the application to let it know that a new frame is available for drawing, so that an unchanged video buffer isn't getting re-blitted without cause. As an application writer, I obviously have a vested interest in any CPU-saving tweak possible.

As far as the interface for querying which special data types a visualization plugin can accept, I think something similar to the way a WinAmp input plug-in exposes which filetypes it can handle would work great. Off the top of my head, here are a few special data types of interest:

- Streaming video
- Karaoke CDG
- Stream tags (i.e. ID3v2 or APE)

Either in addition to, or in place of, the VisSongInfo stuff, you could also abstract special data types for:

- Artist/track title/album/year
- Still images associated with an artist/track (album cover art, artist photos, etc. in JPEG, BMP, GIF, PNG, etc.)

I also want to point out an obvious omission here (understandable, since you probably weren't considering a video class of visualizers): in addition to the RGB and OpenGL display types, there should also be a YUV display type. The same rules of no-blit/no-morph between dissimilar display types should apply. The YUV420P format (SDL type: SDL_YV12_OVERLAY) should be sufficient out of the starting gate.

I also think there should be a non-GUI method for getting/setting individual visualizer configuration settings via a serialized string. The application may or may not be able to decipher the contents of the command strings for a particular visualizer plug-in, but they can still be thought of as a "bookmark" of what the user likes, above and beyond the last-used configuration settings. Wouldn't it also be nice if the plug-ins exposed language-neutral author/copyright/credits/plug-in name & version info to the application?
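As a footnote to the YUV420P suggestion: part of its appeal for a video display type is that it's a cheap planar format - one full-resolution luma (Y) plane plus two quarter-resolution chroma (U, V) planes, i.e. 12 bits per pixel versus 24 for RGB. A small sketch of the buffer math (the helper name is mine, and this assumes even width/height as YUV420P effectively requires):

```c
#include <stddef.h>

/* Bytes needed for one YUV420P frame (the layout SDL calls
 * SDL_YV12_OVERLAY): a full-res Y plane plus U and V planes
 * subsampled 2x2, giving 1.5 bytes per pixel overall. */
static size_t yuv420p_frame_size(size_t width, size_t height)
{
    size_t y_plane = width * height;                /* full-res luma      */
    size_t chroma  = (width / 2) * (height / 2);    /* each of U and V    */
    return y_plane + 2 * chroma;
}
```

So a 320x240 frame needs only 115200 bytes, exactly half of the 230400 a 24-bit RGB buffer of the same size takes - which matters when a video actor is pushing 30 frames a second.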
A method of getting a list of presets (for those visualizers that support them) and selecting one by non-GUI means would also be cool.

Finally, on my wish list (though I realize the added YUV mode adds another layer of complexity to this, and alpha-channelled shaped text is probably out of the question for that), I'd love to see some sort of overlay engine. Perhaps I, as an application author, want a scrolling ticker at the bottom of the screen announcing sports scores or happy hour specials (!), or some On Screen Display info for a TV or radio tuner card when I'm running in full-screen mode (or otherwise). Not that big of a deal for me to add at the application level, but a guy can dream, right?

In VisUI, a useful addition would be a tab or page widget for implementing multiple dialog box pages. I think wxWidgets has about the snazziest way of specifying a platform-neutral dialog box that I've seen.

One last question: is frequency spectrum analysis being done even for those visualizers that don't need it? If so, there should be a way to turn off those expensive FFTs when they don't help a given visualizer (and/or morph).

Keep up the great work!

S.W.