This list is closed; nobody may subscribe to it.
|
From: Alastair M. <ala...@aj...> - 2000-05-01 17:51:21
|
Just adding my 2 cents worth:

Arne Schirmacher wrote: [some snippage may have occurred]

> 1. make a library. That shouldn't be too difficult. Just put all your *.o
> files into a lib and write a public header file. Please make it g++
> compatible from the start

Ditto. I tend to favor C++ for apps. This doesn't necessarily mean coming up with class wrappers, just so long as g++ doesn't barf on the C code.

> ... My first guess would be int dv_decode(void *compressed_data, void
> *uncompressed_data, struct some_dv_info *info). The info struct should have
> some basic information like width, height, result code etc. The pointer
> compressed_data is one full frame of NTSC or PAL data. Uncompressed_data
> should be the RGB format as it is used in playdv.c right now.

Who handles the audio stream(s)? Is the split done upstream of, downstream of, or by dv_decode()? (Probably by it, perhaps also with options or alternate functions to return just the image data or just one or more audio channels' data.)

Also, some flags/options/whatever to discriminate between interlaced and progressive video streams (is there anything in the signal that indicates this, or does the user just have to know?), and perhaps provision for de-interlacing (scan doubling, with optional doubling of the frame rate). Yes, this can all be done by the application post-decompression if you really don't want it here.

For editing apps (e.g. Broadcast 2000, mentioned below, although I don't know if that one supports it) some way of getting at timecode data is useful.

An alternative to simply returning pointers to the various uncompressed data streams might be to provide for pointers to functions that get called for the various data streams at the end of each frame. The former would likely be more useful in editing apps, the latter more useful in playback apps.

> 2. make a video4linux compatible framework. I do not have any experience
> with v4l or v4l2, but since it is supposed to be a general purpose
> framework, we should extend it such that a camcorder with the ieee1394
> subsystem and libdv codec can be used just as any other TV frame grabbing
> card. Only with superior quality of course.

I like this idea. It'd be nice to have xawtv or whatever able to play 1394 DV cam input as well as composite or tuner input. (How well apps like xawtv are structured to make this simple I don't know. In an ideal world there might be some sort of general "stream of images" connection, the way most files are considered a "stream of characters", with lower-level layers taking care of what kind of stream it really is.)

> 3. Broadcast2000 support. I did not try it yet, don't know what is required
> for supporting it, but having DV support in broadcast2000 would be very
> cool.

Bcast 2000 officially doesn't support DV because the author doesn't have any DV equipment. But yes, it'd be a nice addition. (Personally I haven't had much success with any sort of video under bcast2000, but have used its audio capabilities quite a bit. There are some UI things I'd change. It'd be nice to see an NLE/compositing suite for Linux that can handle almost any format via plugins.)

Cheers,
-- Alastair
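Alastair's callback alternative could look something like the following in C. This is a hedged sketch only: the type and member names (dv_callbacks_t, video_cb, audio_cb, user) are hypothetical and not part of any agreed API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table for per-frame delivery of decoded streams.
 * None of these names exist in libdv; this only illustrates the shape of
 * the "pointers to functions called at the end of each frame" idea. */
typedef struct dv_callbacks_s {
    /* invoked once per frame with the uncompressed image */
    void (*video_cb)(void *user, const uint8_t *rgb, int width, int height);
    /* invoked once per frame per audio channel with the decoded PCM */
    void (*audio_cb)(void *user, int channel,
                     const int16_t *pcm, size_t nsamples);
    void *user;  /* opaque pointer passed back to every callback */
} dv_callbacks_t;
```

A playback app would fill in both pointers and let the decoder push data to it; an editing app would more likely use the pointer-returning variant and leave a table like this unused.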
|
From: James B. <ja...@ex...> - 2000-05-01 14:31:04
|
Arne Schirmacher wrote:
...
> 4. Stability, Quality. Right now libdv is a bit, er..., sensitive to its
> input data. If it is invalid data it just crashes. The DV data stream from
> a camcorder is not guaranteed to be error-free, packets may be dropped, the
> program should not crash if it is fed with wrong data.

Could we start a collection of data streams with errors?

-- James Bowman
ja...@ex...
|
From: Arne S. <ar...@sc...> - 2000-05-01 09:42:24
|
Well, I didn't actually write the xdvplay program, I just added a couple of
calls to my AVI library to the existing playdv.c program.
Here is what I think is needed most:
1. make a library. That shouldn't be too difficult. Just put all your *.o files
into a lib and write a public header file. Please make it g++ compatible from
the start (early versions of libraw suffered from the same problem and the
author had to fix it). My first guess would be int dv_decode(void
*compressed_data, void *uncompressed_data, struct some_dv_info *info). The info
struct should have some basic information like width, height, result code etc.
The pointer compressed_data is one full frame of NTSC or PAL data.
Uncompressed_data should be the RGB format as it is used in playdv.c right now.
This would be the first thing to do, just to allow other people to start working.
If we have this I would try to merge it with an existing player software like
xanim for example.
2. make a video4linux compatible framework. I do not have any experience with
v4l or v4l2, but since it is supposed to be a general purpose framework, we
should extend it such that a camcorder with the ieee1394 subsystem and libdv
codec can be used just as any other TV frame grabbing card. Only with superior
quality of course.
When I have my SuSE 6.4 DVD (due May 10), I am going to buy a TV card and play
with this whole v4l thing and also with broadcast2000. Right now my system is
so outdated that these programs do not work at all. Dvgrab runs fine though...
3. Broadcast2000 support. I did not try it yet, don't know what is required for
supporting it, but having DV support in broadcast2000 would be very cool.
4. Stability, Quality. Right now libdv is a bit, er... , sensitive to its input
data. If it is invalid data it just crashes. The DV data stream from a
camcorder is not guaranteed to be error-free, packets may be dropped, the
program should not crash if it is fed with wrong data.
Arne
-----Original Message-----
From: Erik Walthinsen [SMTP:om...@cs...]
Sent: Monday, May 01, 2000 1:21 AM
To: Arne Schirmacher
Cc: lib...@li...
Subject: Re: xdvplay 0.2 is available
On Sun, 30 Apr 2000, Arne Schirmacher wrote:
> This is a slightly modified version of the playdv.c program from the libdv
> project. This version now reads AVI files instead of plain DV data files.
The plan is to convert libdv into a real library in the near future.
Since you've actually written an application with it, you're probably a good
resource for ideas on the exact API. What are your thoughts on that
matter? Check the archives for a few ideas we've had so far (at least,
they're supposed to be in the archives; let me know if you can't find them).
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
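To make Arne's proposal concrete, the entry point he sketches might look like the following header fragment. This is purely illustrative: the struct layout and field names are guesses for discussion, not a committed interface.

```c
#include <stdint.h>

/* Hypothetical info struct for the proposed dv_decode() entry point.
 * Field names are illustrative only. */
typedef struct dv_info_s {
    int width;    /* 720 for both NTSC and PAL DV */
    int height;   /* 480 for NTSC, 576 for PAL */
    int is_pal;   /* nonzero for a PAL frame (144000 bytes vs 120000) */
    int result;   /* 0 on success, nonzero error code */
} dv_info_t;

/* compressed points at one full frame of NTSC or PAL DV data;
 * uncompressed receives RGB as playdv.c produces it today. */
int dv_decode(const void *compressed, void *uncompressed, dv_info_t *info);
```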
|
|
From: Scott F. J. <sc...@fl...> - 2000-05-01 07:17:26
|
API thoughts.

Quicktime4linux will be a natural place to include a libdv codec.
Quicktime supports dvc, and it appears that the 120000 bytes are
just stored on a per-frame basis. With the quicktime4linux library,
one can either read an entire frame into a buffer, or get a file
descriptor and read a frame out of a quicktime file from a stream.

It would make sense for us to support two front-ends on our code
as well: buffer and stream. I realize that for most uses we will want
to focus on the stream method, but if somebody buffers an image, we
shouldn't prevent them from decoding it.

We should create two layers to the API: a decoding (and in the future
encoding) layer, and a file/buffer layer that allows us to write
front-ends for raw DV files, DV AVI files, DV frames stored in
quicktime files, etc. Because of the nature of the unshuffling of
blocks, there is no efficiency in having "scanline" based routines.

We should support generalized output image format converters,
and we should initially supply both RGB and YUV methods.
For example, a "base class" dv_image_t could be:

typedef struct dv_image_s dv_image_t;  // forward typedef so the function
                                       // pointers below can name the type
struct dv_image_s {
    int (*init)(dv_image_t *self);     // Fn to initialize converter
    int (*from_ycrcb_411)(dv_image_t *self, dv_block_t *bl, int row, int column);
    int (*from_ycrcb_420)(dv_image_t *self, dv_block_t *bl, int row, int column);
    void *ClientData;
};

And a skeletal RGBImage converter would include:

dv_image_t *RGBImage_new()
{
    dv_image_t *im = (dv_image_t *)malloc(sizeof(dv_image_t));
    im->init = NULL;
    im->from_ycrcb_411 = dv_ycrcb_411_block;
    im->from_ycrcb_420 = dv_ycrcb_420_block;
    im->ClientData = malloc(sizeof(guint8) * 720 * 576 * 4);
    return im;
}

void RGBImage_del(dv_image_t *im)
{
    free(im->ClientData);
    free(im);
}

Use becomes something like:

dv_image_t *image = RGBImage_new();  // Object to hold output data.
                                     // Contains pointer to RGB converter fn.
FILE *fp = fopen("my.dv", "r");
int status;
while ((status = dv_read_frame(image, fp)) != 0) {
    // Do something with image
}
RGBImage_del(image);

If we find we have a lot of "state" variables, we can
create a "decoder" object, pass the stream to this object,
and store the "state" variables in it:

dv_t *dv = dv_new();
FILE *fp = fopen("my.dv", "r");
dv_set_fp(dv, fp);
dv_set_quality(dv, DV_QUALITY_HIGH);
dv_set_voutput(dv, DV_VIDEO_OUTPUT_RGB);  // vs. DV_VIDEO_OUTPUT_YUV
dv_set_aoutput(dv, DV_AUDIO_OUTPUT_NONE);
dv_read_frame(dv);
isPAL = dv_isPAL(dv);

At the higher level, we can write wrappers around DV stream, AVI,
quicktime, etc. At this level, we may want to be able to seek to a
specific frame, or rewind, or play backwards, etc. When the lower level
routines are called, they should just operate on the buffer as given.

The higher level for raw dv streams is pretty simple, and we may
want to include it in our library. The one for quicktime should probably
be written within quicktime4linux, rather than within libdv.
|
|
From: Erik W. <om...@cs...> - 2000-04-30 23:29:08
|
On Sun, 30 Apr 2000, Arne Schirmacher wrote:
> This is a slightly modified version of the playdv.c program from the libdv
> project. This version now reads AVI files instead of plain DV data files.
The plan is to convert libdv into a real library in the near future.
Since you've actually written an application with it, you're probably a good
resource for ideas on the exact API. What are your thoughts on that
matter? Check the archives for a few ideas we've had so far (at least,
they're supposed to be in the archives; let me know if you can't find them).
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
|
From: Arne S. <ar...@sc...> - 2000-04-30 21:18:52
|
Hi all, download xdvplay 0.2 from http://www.schirmacher.de/arne/xdvplay/xdvplay_0.2.tar.gz . This is a slightly modified version of the playdv.c program from the libdv project. This version now reads AVI files instead of plain DV data files. Arne |
|
From: James B. <ja...@ex...> - 2000-04-30 17:00:45
|
All,

Just a courtesy note on my recent checkins. This is the assembler recode of dv_parse_ac_coeffs_pass0. I measured an approximate 10% improvement running the benchmark with pond.dv. A diff of images between the C and asm versions found no differences, so I'm reasonably confident that the implementations agree.

-- James Bowman
ja...@ex...
|
From: Erik W. <om...@cs...> - 2000-04-30 06:21:11
|
FYI, for anyone interested in changes made to the CVS version of libdv,
there's a libdv-cvs list to which all commit messages go. Latency is
about 5 seconds to my mailbox, which is about as realtime as you're going
to get for at least a little while longer... ;-}
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
|
From: Scott F. J. <sc...@fl...> - 2000-04-30 05:40:12
|
For the current ycrcb_to_rgb "C" version, this appears a bit faster.
We'll still see major improvement with the MMX version, but we may
as well get this one clean and crisp.

Changes: lut used for luma only; chroma scales properly; sum of the
green impacts moved outside the inner loop.
--- ycrcb_to_rgb32.c.orig Sat Apr 29 22:19:28 2000
+++ ycrcb_to_rgb32.c Sat Apr 29 22:25:00 2000
@@ -27,10 +27,6 @@
#include <glib.h>
gint32 ylut[256];
-gint32 impactcbr[256];
-gint32 impactcbg[256];
-gint32 impactcrg[256];
-gint32 impactcrb[256];
static int initted = 0;
static void dv_ycrcb_init()
@@ -40,10 +36,6 @@
i<256;
++i) {
ylut[i] = 298 * ((signed char)(i) + 128 - 16);
- impactcbr[i] = 409 * (signed char)(i);
- impactcbg[i] = 100 * (signed char)(i);
- impactcrg[i] = 208 * (signed char)(i);
- impactcrb[i] = 516 * (signed char)(i);
}
}
@@ -56,12 +48,11 @@
for(i=0;
i<height*180;
i++) {
- int cr = *cr_frame++; // +128
- int cb = *cb_frame++; // +128;
- int cbr = impactcbr[cb];
- int cbg = impactcbg[cb];
- int crg = impactcrg[cr];
- int crb = impactcrb[cr];
+ signed char cr = (signed char)*cr_frame++; // +128
+ signed char cb = (signed char)*cb_frame++; // +128;
+ int cbr = 409 * cb;
+ int cbcrg = 208 * cb + 100 * cr;
+ int crb = 516 * cr;
int j;
for(j=0;
@@ -69,7 +60,7 @@
j++) {
gint32 y = ylut[*y_frame++];
gint32 r = (y + cbr) >> 8;
- gint32 g = (y - cbg - crg) >> 8;
+ gint32 g = (y - cbcrg) >> 8;
gint32 b = (y + crb ) >> 8;
*rgb_frame++ = CLAMP(r,0,255);
*rgb_frame++ = CLAMP(g,0,255);
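For reference, the fixed-point arithmetic this patch is massaging follows the usual ITU-R BT.601 expansion. Below is a standalone sketch of one pixel's conversion, written by me for illustration with the standard coefficient assignment (409 and 516 on Cr/Cb for red/blue, 208 and 100 on Cr/Cb inside green); note the patched file's cr/cb variable naming may differ, as the earlier "impacts swapped" message discusses.

```c
#include <assert.h>

#define CLAMP(v, lo, hi) ((v) < (lo) ? (lo) : (v) > (hi) ? (hi) : (v))

/* BT.601 YCrCb -> RGB with coefficients scaled by 256:
 *   R = 1.164*(Y-16) + 1.596*(Cr-128)                  -> 298, 409
 *   G = 1.164*(Y-16) - 0.813*(Cr-128) - 0.391*(Cb-128) -> 298, 208, 100
 *   B = 1.164*(Y-16) + 2.018*(Cb-128)                  -> 298, 516
 * These are the same constants that appear in the patch. */
static void ycrcb_to_rgb_pixel(int y8, int cr8, int cb8,
                               int *r, int *g, int *b)
{
    int y  = 298 * (y8 - 16);
    int cr = cr8 - 128;
    int cb = cb8 - 128;

    *r = CLAMP((y + 409 * cr) >> 8, 0, 255);
    *g = CLAMP((y - 208 * cr - 100 * cb) >> 8, 0, 255);
    *b = CLAMP((y + 516 * cb) >> 8, 0, 255);
}
```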
|
|
From: Stefan L. <lu...@be...> - 2000-04-29 20:23:44
|
Hi,

so that's my suggested color correction, adapted for the current YUV411.

-- mfg
Stefan Lucke (lu...@be...)
|
From: Scott F. J. <sc...@fl...> - 2000-04-29 09:40:46
|
I made a dumb mistake, and I believe our impacts are swapped in G: 100 and 208 multiplies should read 208 and 100 in the 411 ycrcb_to_rgb stuff. The MMX code will make this moot for most Intel users, but a big whoops, nonetheless. Also, in the C code, it's only worth having a lut for luma. The crcb components seem faster as multiplies. |
|
From: Erik W. <om...@cs...> - 2000-04-28 21:31:02
|
I'm starting to put various documents up on libdv.sourceforge.net/tmp/ in lieu of a better place to put them (that's in the works).

The first document details (most of) the process I went through in designing an optimized MMX routine to do yuv2rgb conversion. Said routine is almost but not quite ready for general use; it needs wrappers to be useful, and I've been sufficiently busy to have not gotten around to that yet ;-( A copy of the source is up as well, in a form designed mostly for MPEG decoders. It takes a 16x16 and two 8x8 planes and outputs a 16x16 RGB plane.

If anyone has similar documents they would like to have put there, send me a copy. They should be OSD-compatibly licensed. There will eventually be a much better place for these; this is just a stopgap.

Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
From: Erik W. <om...@cs...> - 2000-04-28 20:36:35
|
For anyone looking at/working on libdv, please subscribe to the libdv-dev sourceforge list, so we can keep everything together in one place. There's a lot of stuff happening to libdv right now, and it'll help a lot if everyone knows what everyone else is working on.

To subscribe: http://lists.sourceforge.net/mailman/listinfo/libdv-dev

TIA,
Omega

Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
From: Erik W. <om...@cs...> - 2000-04-28 20:33:14
|
On Fri, 28 Apr 2000, Stefan Lucke wrote:
> The following patch includes these modification for both YUV411 and
> YUV420.
The patch applies against the 420, but 411 has been updated to be faster
and more correct (see the patch manager on sourceforge). That said, the
PAL sources I have do look a lot better now, thanks ;-) I'll apply it to
CVS in a couple minutes.
Gah, this bumps my MMX yuv2rgb routines up on the priority list again...
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
|
From: Erik W. <om...@cs...> - 2000-04-28 20:19:13
|
On Fri, 28 Apr 2000, Scott F. Johnston wrote:
> Which is faster: unrolling the loops and growing past 12K
> or leaving the loops in and keeping it under?
It depends on the loops involved. If the bodies are very small (say,
<10 cycles), it's worth unrolling them. If not, leave them as loops. I'm
guessing we won't have too much trouble fitting under 12k regardless, but
if we have to make a choice, we'd pick which ones to unroll along the
lines of total branch count.
> Switching to a block-based ycrcb_to_rgb gave me about 5%
> speed improvement over full-frame conversion. This
> and other changes are with Erik for review.
> (Got rid of place.c, broke PAL decoding, repackaged
> closer to library form, added dv2ppm.c, ...)
I hope to look through that today and try to merge everything into CVS.
> I propose we stay away from kernel hacks for as long
> as possible.
Right, we won't ever depend on kernel hacks. Besides, they don't even
exist yet... They just happen to make certain things faster in certain
situations.
> Ideally we should keep maintaining C-versions
> of each routine, to assist in cross-platform
> development. I'm sure the LinuxPPC folks will want this
> code, and if we keep it populated with too much
> ia32, they may revolt!
Exactly. The C version will always exist for everything, and be the
fallback position. If someone compiles it for an arch for which not
everything has been optimized, they get lots of C....
> We may also want to offer speed vs. quality options for
> our users: One improvement is to skip the third pass AC
> decoding and just return from dv_parse_video_segment()
> without calling dv_parse_ac_coeffs(seg). In playback,
> the additional error is barely noticeable. (I had to use
> dv2ppm to grab frames and compare the results.)
> For some uses, like DV editing, where speed is more
> important than quality, I'd even be willing to forego
> *ALL* the AC decoding. Just give me 8x8 blocks of DC,
> which my tests show runs more than 3x faster-- those
> ducks look awfully blocky, though!
> There may be other intermediate "exit-points" in the decoder that
> we'll want to maintain as options. (Y_ONLY is another
> example: great for video editing when detail is needed,
> but color isn't.)
Yup. I can imagine quite a few options. The problem is the fact that
this means branches. This is where specialization comes into play. You
have various forms of the functions, and at some point a vtable is filled
with pointers to the currently appropriate ones (based on the criteria),
which are called as needed. This method is probably the preferred way of
supporting all Intel ia32 chips from one binary. The way the bitstream
code would be set up would let one compile multiple copies of the same
code with different names, each with some level of processor support
(ia32, MMX, SSE, etc.), which would then be specialized at a higher level.
I need to write up my ideas on the bitstream API and how specialization of
that sort works.
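The specialization scheme described above can be sketched in C roughly as follows. All of the names here (dv_vtable_t, dv_init_ops, the stub functions) are hypothetical illustrations, not libdv code, and the function bodies are empty stand-ins.

```c
/* Sketch of CPU-feature specialization via a vtable of function
 * pointers: fill the table once at init, then hot loops call through
 * it with no per-call branching on processor type. */

typedef struct dv_vtable_s {
    void (*idct_8x8)(short *block);
    void (*ycrcb_to_rgb)(const unsigned char *y, const unsigned char *cr,
                         const unsigned char *cb, unsigned char *rgb, int n);
} dv_vtable_t;

/* Empty stand-ins for the real routines. */
static void idct_8x8_c(short *block)   { (void)block; }  /* portable C */
static void idct_8x8_mmx(short *block) { (void)block; }  /* MMX build  */
static void ycrcb_to_rgb_c(const unsigned char *y, const unsigned char *cr,
                           const unsigned char *cb, unsigned char *rgb, int n)
{ (void)y; (void)cr; (void)cb; (void)rgb; (void)n; }

static dv_vtable_t dv_ops;

/* Select implementations once, based on detected CPU features. */
static void dv_init_ops(int have_mmx)
{
    dv_ops.idct_8x8     = have_mmx ? idct_8x8_mmx : idct_8x8_c;
    dv_ops.ycrcb_to_rgb = ycrcb_to_rgb_c;  /* no MMX version yet */
}
```

The same trick extends to compiling the bitstream code several times under different names (ia32, MMX, SSE) and picking among them here.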
Now I just need to figure out what to do first ;-)
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
|
From: Scott F. J. <sc...@fl...> - 2000-04-28 19:16:21
|
Which is faster: unrolling the loops and growing past 12K, or leaving the loops in and keeping it under?

Switching to a block-based ycrcb_to_rgb gave me about 5% speed improvement over full-frame conversion. This and other changes are with Erik for review. (Got rid of place.c, broke PAL decoding, repackaged closer to library form, added dv2ppm.c, ...)

I propose we stay away from kernel hacks for as long as possible. Ideally we should keep maintaining C versions of each routine, to assist in cross-platform development. I'm sure the LinuxPPC folks will want this code, and if we keep it populated with too much ia32, they may revolt!

We may also want to offer speed vs. quality options for our users: One improvement is to skip the third pass AC decoding and just return from dv_parse_video_segment() without calling dv_parse_ac_coeffs(seg). In playback, the additional error is barely noticeable. (I had to use dv2ppm to grab frames and compare the results.) For some uses, like DV editing, where speed is more important than quality, I'd even be willing to forego *ALL* the AC decoding. Just give me 8x8 blocks of DC, which my tests show runs more than 3x faster -- those ducks look awfully blocky, though!

There may be other intermediate "exit-points" in the decoder that we'll want to maintain as options. (Y_ONLY is another example: great for video editing when detail is needed, but color isn't.)

Erik Walthinsen wrote:
> On Thu, 27 Apr 2000, James Bowman wrote:
> > I took a look at module sizes and found that we're at about 16k of text:
> > we're blowing the code cache, and when code moves around (like when you
> > remove a module) different functions are caching against each other and
> > changing performance in surprising ways.
> Whee! ;-)
> > The code should get smaller as we optimize it, though, so this effect
> > will go away. We should be safe if the decode loop fits in 12k.
> Yeah, we can do that pretty easily. Eventually I expect that a sufficient
> percentage of this will be written in ASM to keep it well below that.
> Then of course we have to worry about blowing the data cache. That means
> all sorts of tricks, most of which aren't set up yet (such as using
> non-cachable pages, which means a kernel hack).
|
From: James B. <ja...@3d...> - 2000-04-28 18:49:04
|
So the file I just checked in does a simple benchmark - testbitstream.c - that extracts bits from a 10k buffer. Comparing the current implementation with the simpler new one, I get the following:

old: 29.0s
new: 20.7s

This isn't really surprising - the new code doesn't have any branches, and doesn't do any swab()s on the input stream. It's true that the new version does more shifts than the old version, but on PentiumII shifts are 1 uop each, so the cost is more than offset by the benefit of zero branch mispredicts.

Running the two versions in the playdv benchmark mode (with dv_parse_ac_coeffs disabled because of the unget issue), I get:

old: 33.2
new: 32.2

-- James Bowman
ja...@ex...
|
From: Alastair M. <ala...@aj...> - 2000-04-28 15:49:24
|
Well, I fixed the last (?) silly bug in my audio routines so that at least 16-bit 48kHz data plays (and can be written to disk) properly. (Recording a 1kHz tone and observing my decoded output in a program that displayed the waveform was very helpful.) The latest is posted at http://www.ajwm.net/backfire/dv_audio/

Note that although the code is there to handle PAL, it needs a table of offsets to be filled in for that to work. But it does work with NTSC. The above URL also has a modified version of playdv.c that plays back the sound (kind of strangely if your system can't keep up a real-time framerate), and a simple standalone program that just plays back the audio from a .dv file.

Erik Walthinsen wrote:
> On Wed, 26 Apr 2000, Alastair Mayer wrote:
> > If anyone has a copy of this, could they summarize the relevant details
> > sufficient for me to add decoding for 32kHz/12-bit and 44.1kHz/16-bit
> > which it allows?
> Ack, I left it at work. I'll summarize it as compared to 314M when I get
> into work tomorrow.

If you get a chance to do this, it'd be great, thanks.

> Did you happen to try your code on pond.dv, from
> ftp.libdv.sourceforge.net? It claims to be a 314M-compliant stream, and
> should be 48kHz/16-bit (captured from Canon Elura), though every couple
> dozen frames it changes to a 61834 stream for a frame, according to the
> APT field. I'm confused on that one, anyone have an idea why that would
> be?

Yes, but something is wrong. My code reports a lot of invalid audio blocks from pond.dv (which may well be my code, but it works fine on the data from my Sony TRV-103), and what audio it does decode comes out as short noise bursts.

-- Alastair
|
From: Erik W. <om...@cs...> - 2000-04-28 07:26:28
|
On Thu, 27 Apr 2000, James Bowman wrote:
> I'm experimenting with a new bitstream implementation that avoids
> branches. Basically it avoids keeping any state about the bitstream
> around, avoids branches, and makes parse.o 2k smaller. It looks like
> this:
The problem seems to be that it does a lot more work overall. It looks
a lot like Michael Hipp's code from mpg123, which has been replaced with
more stateful bitstream code with much success. In fact, I just checked
and it's not even using MMX getbits yet, and it's still faster.
> static inline guint32 bitstream_show_16(bitstream_t * bs) {
> guint32 a, b, c;
> guint r = bs->offset & 7;
> guint8 *s = &bs->buf[bs->offset >> 3];
>
> return (((s[0] << 16) | (s[1] << 8) | s[2]) >> (8 - r)) & 0xffff;
> }
This is the killer, since shifting regular ia32 registers around is quite
expensive (on the order of 3 cycles per shift, IIRC).
> static inline guint32 bitstream_show(bitstream_t * bs, guint32 num_bits)
> {
> return bitstream_show_16(bs) >> (16 - num_bits);
> }
This is limited to getting 16 bits, which is OK for libdv but not for a
lot of other codecs. MPEG requires at least 24.
> Here's the trouble: dv_parse_ac_coeffs calls dv_find_spilled_vlc which
> uses bitstream_unget to push bits back into the stream. Any suggestions
> on an alternative implementation which avoids doing this?
Dunno. Buck hasn't had a chance to explain his parsing code to me fully,
and he's out till Tuesday. Hopefully he can answer this question then
(or sooner, if he gets a hold of his mail).
This is one sub-project that needs tackling in the generic sense, though.
I personally have gathered and/or written some dozen bitstream/getbits
routines, all of which have advantages and disadvantages. Merging them
together into a single, releasable toolkit would have major advantages. I
would expect that mpg123 could gain another 20% by using a proper
implementation. mpeg2dec should gain 2-5%. The key is providing a
specializable header file for inlines, such that one might turn off
certain features like large shows (get rid of next_word) and ungets, in
order to simplify most of the routines. Of course, dealing with MMX and
other stuff without being too much of a pain to work with is critical.
I suppose I should put up some of our ideas on this.... Stay tuned.
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
|
From: James B. <ja...@ex...> - 2000-04-28 07:00:49
|
I'm experimenting with a new bitstream implementation that avoids
branches. Basically it avoids keeping any state about the bitstream
around, avoids branches, and makes parse.o 2k smaller. It looks like
this:
typedef struct bitstream_s {
guint8 *buf;
gint32 offset;
} bitstream_t;
static inline guint32 bitstream_show_16(bitstream_t * bs) {
guint r = bs->offset & 7;
guint8 *s = &bs->buf[bs->offset >> 3];
return (((s[0] << 16) | (s[1] << 8) | s[2]) >> (8 - r)) & 0xffff;
}
static inline guint32 bitstream_show(bitstream_t * bs, guint32 num_bits)
{
return bitstream_show_16(bs) >> (16 - num_bits);
}
static inline void bitstream_flush(bitstream_t * bs, guint32 num_bits) {
bs->offset += num_bits;
}
static inline void bitstream_seek_set(bitstream_t *bs, guint32 offset) {
bs->offset = offset;
}
[I've skipped a couple of functions, but you get the idea...]
Here's the trouble: dv_parse_ac_coeffs calls dv_find_spilled_vlc which
uses bitstream_unget to push bits back into the stream. Any suggestions
on an alternative implementation which avoids doing this?
--
James Bowman
ja...@ex...
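For what it's worth, one property of the stateless design above is that an unget falls out almost for free: since the only state is the bit offset, pushing bits back is just moving the offset backwards. A sketch follows (stdint types substituted for glib's so it stands alone; whether this satisfies dv_find_spilled_vlc's exact needs is untested).

```c
#include <stdint.h>

typedef struct bitstream_s {
    uint8_t *buf;
    int32_t  offset;   /* absolute bit offset into buf */
} bitstream_t;

static inline void bitstream_flush(bitstream_t *bs, uint32_t num_bits) {
    bs->offset += num_bits;
}

/* Because the only state is the bit offset, unget is just flush in
 * reverse -- no saved words or fill counts to repair. */
static inline void bitstream_unget(bitstream_t *bs, uint32_t num_bits) {
    bs->offset -= num_bits;
}
```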
|
|
From: Erik W. <om...@cs...> - 2000-04-28 06:48:40
|
On Thu, 27 Apr 2000, James Bowman wrote:
> I took a look at module sizes and found that we're at about 16k of text:
> we're blowing the code cache, and when code moves around (like when you
> remove a module) different functions are caching against each other and
> changing performance in surprising ways.
Whee! ;-)
> The code should get smaller as we optimize it, though, so this effect
> will go away. We should be safe if the decode loop fits in 12k.
Yeah, we can do that pretty easily. Eventually I expect that a sufficient
percentage of this will be written in ASM to keep it well below that.
Then of course we have to worry about blowing the data cache. That means
all sorts of tricks, most of which aren't set up yet (such as using
non-cachable pages, which means a kernel hack).
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|
|
From: James B. <ja...@ex...> - 2000-04-28 06:44:40
|
I had a slightly weird experience this afternoon. I was running some benchmarks, and found that I was changing about 5% when I built with/without the ycrcb_to_rgb32.o module. Even though it wasn't being called.

I took a look at module sizes and found that we're at about 16k of text: we're blowing the code cache, and when code moves around (like when you remove a module) different functions are caching against each other and changing performance in surprising ways.

The code should get smaller as we optimize it, though, so this effect will go away. We should be safe if the decode loop fits in 12k.

-- James Bowman
ja...@ex...
|
From: Erik W. <om...@cs...> - 2000-04-28 06:27:27
|
On Thu, 27 Apr 2000, Scott F. Johnston wrote:
> I've played around some more with my changes.
> I was puzzled to see that the ycrcb to rgb conversion
> is taking up 25% of the run-time. This seems long.
> I was able to eke out a few percent, but nothing major.
Yeah, it's slow by definition, and C doesn't help. I hope to get a 4:1:1
version of my MMX routine ready at some point, which should reduce it from
~30 cycles/pixel to about 7.5.
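For reference, a plain-C version of that conversion looks like the sketch below, using the common BT.601 integer coefficients (a generic sketch, not libdv's actual routine). In 4:1:1 each Cb/Cr pair is shared by four horizontal luma samples, which is what makes a specialized MMX path worthwhile.

```c
#include <stdint.h>

/* Scalar YCrCb -> RGB reference, 8.8 fixed point, BT.601 coefficients.
 * y is nominally [16,235]; cb/cr are [16,240], centered at 128. */
static uint8_t clamp8(int v) {
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

static void ycrcb_to_rgb(int y, int cb, int cr,
                         uint8_t *r, uint8_t *g, uint8_t *b) {
    int c = 298 * (y - 16);          /* range-expand luma to full scale */
    int d = cb - 128;
    int e = cr - 128;
    *r = clamp8((c + 409 * e + 128) >> 8);
    *g = clamp8((c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp8((c + 516 * d + 128) >> 8);
}
```

The `298 * (y - 16)` term is the dynamic-range expansion mentioned elsewhere in this thread; dropping it (i.e. using `256 * y`) produces the washed-out look of an unstretched decode.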
> The "unshuffle" routine isn't taking up much CPU,
> so there's no need to convert it into a table yet.
> We've got bigger fish to fry.
Sounds good.
> If you want to keep track of the dif sequence number and
> dif blocks by counting rather than pulling from the headers,
> that's fine, but I'd rather keep them "difseq" and "difblock"
> identifiers rather than (i,j,k) indices.
I do want to use the ID fields, but have the option of keeping track
somehow. This morning I was suggesting an API that allows multiple calls
to the decode routine per frame, which would be good for streaming, and
which requires the ID fields be used.
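The multiple-calls-per-frame idea could be sketched as follows: the caller hands over DIF blocks as they arrive, identified by the difseq/difblock numbers pulled from their headers, and the decoder reports when a whole frame has been collected. Counts here are NTSC (10 DIF sequences of 150 blocks); every name is illustrative, not an agreed libdv interface.

```c
#include <string.h>

enum { DIF_BLOCK_SIZE = 80, DIF_SEQS = 10, BLOCKS_PER_SEQ = 150 };

typedef struct {
    int seen;   /* blocks accumulated toward the current frame */
    unsigned char frame[DIF_SEQS * BLOCKS_PER_SEQ * DIF_BLOCK_SIZE];
} dv_stream;

/* Place one 80-byte DIF block by its IDs; returns 1 when the frame is
 * complete, 0 if more blocks are needed, -1 if the IDs are out of range. */
static int dv_feed(dv_stream *s, int difseq, int difblock,
                   const unsigned char *blk) {
    if (difseq < 0 || difseq >= DIF_SEQS ||
        difblock < 0 || difblock >= BLOCKS_PER_SEQ)
        return -1;
    memcpy(s->frame + (difseq * BLOCKS_PER_SEQ + difblock) * DIF_BLOCK_SIZE,
           blk, DIF_BLOCK_SIZE);
    if (++s->seen < DIF_SEQS * BLOCKS_PER_SEQ)
        return 0;
    s->seen = 0;   /* frame complete; reset for the next one */
    return 1;
}
```

Because each block is placed by its ID rather than by arrival order, this shape tolerates blocks arriving in any order within a frame, which is exactly why streaming requires the ID fields.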
> I'm sure we'll use bitstream when decoding the other blocks,
> I'm just doing the initial fread to pull it into a buffer,
> as is done for the video blocks.
OK, that'll change once we have a more stable API for the library...
> Nice ducks.
Yeah, tell that to the property managers. They tried to get rid of them
(there are a *lot*). They failed.
> P.S. Do you like the Elura? I was thinking of getting a Sony PC-100,
> but I don't like where they put the zoom.
I think it's a nice camera. The zoom on the Elura is in a reasonably sane
position, not sure if the PC-100 is the same. In general most of the
controls are properly placed, given the hand position. The two major
problems with the Elura are the upwards-pointing microphone (yeah, I
really want to hear the ballast of our fluorescents over the *actual*
*scene*...), and the lack of certain connectors without attaching a
(presumably) bulbous 'dock' onto the bottom.
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
|
|
From: Erik W. <om...@cs...> - 2000-04-27 23:26:40
|
On Thu, 27 Apr 2000, Scott F. Johnston wrote:
> I'm trying to be more "block" based rather than "frame" based.
> For example, the conversion from YCrCb to RGB now occurs on blocks.
Good. This is what we've been planning to do in general.
> I've also started calling things "dvc" as the video format may
> be known as miniDV, but since "dvb" and "dvd" may be supported
> by this library in the future, I thought I'd start making distinctions.
I suppose that makes sense. Not sure it'll end up supported in this
library or not, probably not. We're discussing the idea of a general
library for common video codec routines, but they wouldn't even have
codec-specific names on them. Then again, the library is called libdv, so
it seems to make sense to stay with dv_*. Buck is on vacation, we'll
discuss it when he gets back Tuesday.
> I've enabled "STRICT_SYNTAX" and instead of keeping track of the
> dif numbers and counting the video block numbers in "i,j,k"
> variables in a macroblock, I'm extracting difseq and difblock from the
> header of each macroblock.
We relied on keeping track because some of our videos didn't have valid ID
bytes. Our Canopus DVRex seems to capture video without correct IDs,
which was throwing me off for a while.
> If you look at "dvc_parse.c" you'll notice that
> I've included stubs for parsing all the blocks, not just the
> video blocks.
> Currently, the audio, subcode, and vaux blocks just pull
> data from the stream and return. (I haven't changed all the
> unsigned char's to guint8's yet, though.)
This should probably be changed to use the bitstream that's available, so
it's more generic.
> I've brought us closer to having a "library" by simplifying
> playdv to only contain a minimal front end and the GTK stuff.
Good.
> I've added dv2ppm to convert the frames to ppm images.
Cool.
> I've replaced "place" with a routine that unshuffles the
> location based on the dif and macroblock numbers.
> This should be converted into a table.
Yeah, the placement code has always been ugly...
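The table conversion Scott suggests could look like the sketch below: run the per-macroblock unshuffle once at startup and cache the results, so the decode loop does a single lookup per macroblock instead of recomputing the placement. The `dvc_unshuffle()` below is NOT the real DV shuffle pattern, just a deterministic stand-in; counts are NTSC (10 DIF sequences of 135 video macroblocks).

```c
enum { DIF_SEQS = 10, MBS_PER_SEQ = 135 };   /* NTSC counts */

typedef struct { short x, y; } mb_pos;       /* pixel offset in the frame */

/* Placeholder with the same shape as the real unshuffle routine. */
static mb_pos dvc_unshuffle(int difseq, int mb) {
    mb_pos p = { (short)((mb % 5) * 32), (short)(difseq * 48 + mb / 5) };
    return p;
}

static mb_pos unshuffle_table[DIF_SEQS][MBS_PER_SEQ];

/* Fill the table once, e.g. from the library's init routine. */
static void build_unshuffle_table(void) {
    for (int q = 0; q < DIF_SEQS; q++)
        for (int m = 0; m < MBS_PER_SEQ; m++)
            unshuffle_table[q][m] = dvc_unshuffle(q, m);
}
```

The table is only 10 x 135 x 4 bytes (~5.4k), so it is cheap relative to the cache-pressure concerns discussed earlier in this thread; a PAL variant would just use 12 DIF sequences.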
> WARNING: I have not written unshuffle for PAL yet, so
> my work has temporarily broken PAL decoding.
If it gets into CVS in that form, I'm sure someone will fix it ;-)
> I've noticed that this decoder does not give the same results
> as the Quicktime decoder. I believe it has to do with compression
> of the dynamic range in the ycrcb to rgb conversion. I haven't
> poked around further to see about stretching the range, but I
> know it's a common thing to do.
Different good or different bad? I know your ycrcb changes made it look
a heckuvalot more colorful...
> I hope I haven't broken anything else.
Just PAL. You'll have to ask the Europeans for forgiveness for that
one... ;-)
I'm familiarizing myself with your changes now, and will try to merge them
into CVS in some form soon.
BTW, you should sign onto the mailing list, libdv-dev. Makes it easier to
discuss this, since everyone else can chime in too ;-)
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
|
|
From: Erik W. <om...@cs...> - 2000-04-27 18:10:17
|
On Thu, 27 Apr 2000, James Bowman wrote:
> dvs *dvs_open(void *(*reader_func)(void*), void *s, enum frame_format)
> Initializes a dv stream, with a given read_func. reader_func(s)
> returns a pointer to the next 400 bytes in the stream. frame_format
> would be the target pixel format: 32-bit RGB, 16-bit RGB, raw YCrCb.
I would lean towards a very simple API that takes a bunch of bytes (in
multiples of 80) and a pointer to a memory region:
int dv_decode(unsigned char *dv,unsigned char *image);
Couple that with a few utility routines:
int dv_video_format(unsigned char *dv); (= DV_411, DV_420, ...)
int dv_video_size(unsigned char *dv,int *w,int *h);
int dv_bytes_per_frame(unsigned char *dv); (= 120,000, 144,000)
int dv_next_frame_offset(unsigned char *dv);
The idea is that this gives full control over the input and output streams
to the application calling the library. A decode loop could look like:
unsigned char *dv, *head, *image;
int format,bpf,w,h;
/* assume that the next bytes are the head of a frame */
dv = malloc(DV_DIF_BLOCK_SIZE);
fread(dv,DV_DIF_BLOCK_SIZE,1,stdin);
format = dv_video_format(dv);
dv_video_size(dv,&w,&h);
switch (format) {
case DV_411:
case DV_420: image = malloc(w*h+(w*h/2));
...
}
bpf = dv_bytes_per_frame(dv);
/* grow the buffer to a full frame, keeping the block already read */
head = dv;
dv = malloc(bpf);
memcpy(dv,head,DV_DIF_BLOCK_SIZE);
free(head);
fread(dv+DV_DIF_BLOCK_SIZE,bpf-DV_DIF_BLOCK_SIZE,1,stdin);
dv_decode(dv,image);
/* go process the image now, yuv2rgb convert and display */
while (!feof(stdin)) {
fread(dv,bpf,1,stdin);
dv_decode(dv,image);
/* process image */
}
Now, this API needs a few more things, since the ideal case from the
performance point of view, on machines without hardware acceleration, is
to do yuv2rgb on a macroblock at a time basis during decode. That means
we'd want a switch (or just a dv_decode variation) to have it output
various forms of RGB, and thus do the specialization for speed.
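One way to get that per-format specialization without a branch in the inner loop is to resolve a format-specific decode function once, up front. The stubs below stand in for the real specialized decoders, and all names are illustrative rather than an agreed API:

```c
typedef enum { DV_OUT_YUV, DV_OUT_RGB32, DV_OUT_RGB16 } dv_out_fmt;
typedef int (*dv_decode_fn)(const unsigned char *dv, unsigned char *image);

/* Stubs standing in for the real specialized decoders; each returns its
 * format tag here purely so the dispatch can be demonstrated. */
static int dv_decode_yuv(const unsigned char *dv, unsigned char *image)
{ (void)dv; (void)image; return DV_OUT_YUV; }
static int dv_decode_rgb32(const unsigned char *dv, unsigned char *image)
{ (void)dv; (void)image; return DV_OUT_RGB32; }
static int dv_decode_rgb16(const unsigned char *dv, unsigned char *image)
{ (void)dv; (void)image; return DV_OUT_RGB16; }

/* Pick the variant once; the caller then invokes it once per frame. */
static dv_decode_fn dv_decoder_for(dv_out_fmt fmt) {
    switch (fmt) {
    case DV_OUT_YUV:   return dv_decode_yuv;
    case DV_OUT_RGB32: return dv_decode_rgb32;
    default:           return dv_decode_rgb16;
    }
}
```

The same shape also lets the macroblock-level yuv2rgb specialization mentioned above live inside each variant, so the RGB paths convert per macroblock while the YUV path skips conversion entirely.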
One thing I do have somewhere is the header file from Adaptec's DV codec.
It's sitting in some mailbox of mine. I'll dig that out and we can see
what they do.
Erik Walthinsen <om...@cs...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
|