Thread: [Mlt-devel] New avsync module
From: Brian M. <pez...@ya...> - 2013-02-26 04:46:20
Dan,

I added a new module: avsync. It has one producer and one consumer. The producer generates a stream with a classic blip/flash pattern. The consumer attempts to detect the blip/flash and calculate the audio/video sync. The consumer will report the calculated lipsync in milliseconds. A positive number indicates that audio leads video. A negative number indicates that audio lags video.

The module can test itself:

melt -silent -profile atsc_720p_30 blipflash -consumer blipflash
avsync = 0.00

Obviously, 0 is perfect sync.

It is interesting to round-trip the pattern through various encodings:

melt -profile atsc_720p_30 blipflash -consumer avformat:test.avi
melt -silent test.avi -consumer blipflash
avsync = -23.02

melt -profile atsc_720p_30 blipflash -consumer avformat:test.mpg
melt -silent test.mpg -consumer blipflash
avsync = -10.02

melt -profile atsc_720p_30 blipflash -consumer avformat:test.mkv
melt -silent test.mkv -consumer blipflash
avsync = 0.00

melt -profile atsc_720p_30 blipflash -consumer avformat:test.mov acodec=ac3
melt -silent test.mov -consumer blipflash
avsync = -5.33

melt blipflash -consumer libdv:test.dv
melt -silent test.dv -consumer blipflash
avsync = -40.00

A sync worse than +/-3ms is pretty bad, in my opinion. Some of these might be worth looking into.

Anyway, the module is obviously up for debate. Let me know your thoughts. I can add, modify or delete as appropriate.

~Brian
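The sign convention described above can be captured in a small helper. This is only a minimal sketch: it assumes the consumer's report is available as a line of the form `avsync = <number>`, as in the examples, and the function name `describe_avsync` is hypothetical, not part of MLT.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLT): interpret one "avsync = N" line
# using the sign convention from the message above:
#   positive = audio leads video, negative = audio lags video.
describe_avsync() {  # $1 = a report line such as "avsync = -23.02"
    printf '%s\n' "$1" | awk '
        $1 == "avsync" && $2 == "=" {
            ms = $3 + 0
            if (ms > 0)      printf "audio leads video by %.2f ms\n", ms
            else if (ms < 0) printf "audio lags video by %.2f ms\n", -ms
            else             print  "audio and video are in sync"
        }'
}
```

For example, `describe_avsync "avsync = -23.02"` reports that audio lags video by 23.02 ms.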
From: Dan D. <da...@de...> - 2013-02-26 06:16:50
Thank you for the contribution. You might be interested to read this old report I did testing a/v sync of the DVEO SDI consumer, drivers, and hardware:
http://www.mltframework.org/bin/view/MLT/LinsysSyncTest

The challenge now is to figure out if the problems lie more in the avformat consumer or producer, or at least how to isolate one to fine tune it. Do you have any thoughts about that? At least with raw DV output, we can check it with the dvthumbs.c I used in that study.

Also, how did you come up with the 3ms number? Why is 33ms - one frame at 30 Hz - not acceptable?
From: Brian M. <pez...@ya...> - 2013-02-26 14:10:26
> Thank you for the contribution. You might be interested to read this
> old report I did testing a/v sync of the DVEO SDI consumer, drivers,
> and hardware:
> http://www.mltframework.org/bin/view/MLT/LinsysSyncTest

Interesting. It is pretty much the same concept as the blip flash. But your method allows a visual comparison. It reminds me of how I have tested lipsync using an oscilloscope. You can run analog audio and video into a scope and trigger on the blip. Then you can use the scope to measure the offset to the video flash.

> The challenge now is to figure out if the problems lie more in the
> avformat consumer or producer, or at least how to isolate one to fine
> tune it. Do you have any thoughts about that? At least with raw DV
> output, we can check it with the dvthumbs.c I used in that study.

If I were going to attack it, I would probably try to build a large test data set and then look for trends. For example, maybe a particular codec or container is always off by some fraction of one frame. Also, we could create a blipflash clip and feed it into ffmpeg directly - bypassing MLT altogether. Matroska seemed to do pretty well. It would be frustrating to focus on MLT only to find out the problem is in libav.

> Also, how did you come up with the 3ms number? Why is 33ms - one frame
> at 30 Hz - not acceptable?

33ms is well below the threshold for human perception. So from that perspective, 33ms isn't bad. But if you have a workflow that includes multiple decode/encode cycles, the offset could stack up into a problem.

Let me rephrase what I was trying to say:

If the offset is better than +/-3ms, then you can probably attribute that to rounding errors in the demux, decode, encode, mux and detection steps of the process. It may not be worth chasing.

If the offset is worse than +/-3ms, then that probably indicates a programming error (not working as intended) and it may be worth trying to fix.

Whether it is good or bad depends on your use case.

~BM
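The two points in this reply, offsets stacking up over repeated encode/decode cycles and the +/-3ms rule of thumb, can be sketched as follows. Both function names are hypothetical, and the stacking model assumes every generation adds the same offset, which is a simplification the thread does not confirm.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MLT.

# Total offset after N encode/decode generations, ASSUMING every generation
# adds the same offset (a simplification; real chains may behave differently).
stacked_offset() {  # $1 = offset per generation in ms, $2 = generations
    awk -v per_gen="$1" -v n="$2" 'BEGIN { printf "%.2f\n", per_gen * n }'
}

# Apply the +/-3 ms rule of thumb; the tolerance can be overridden via $2.
classify_offset() {  # $1 = measured offset in ms
    awk -v ms="$1" -v tol="${2:-3}" 'BEGIN {
        if (ms < 0) ms = -ms            # only the magnitude matters here
        if (ms <= tol) print "probably rounding error"
        else           print "possibly a bug worth chasing"
    }'
}
```

Under that linear assumption, five generations of the AVI round trip (`stacked_offset -23.02 5`) would accumulate to about -115 ms, which is why a per-generation offset that looks harmless can still matter in a long workflow.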
From: Dan D. <da...@de...> - 2013-02-26 06:20:25
On Mon, Feb 25, 2013 at 8:45 PM, Brian Matherly <pez...@ya...> wrote:
> [...]
> melt blipflash -consumer libdv:test.dv
> melt -silent test.dv -consumer blipflash
> avsync = -40.00

It is interesting to note that

melt blipflash -consumer libdv:test.dv

gives 0.00, which lends itself to a possible off-by-one error in the avformat producer since this is PAL, which is 40ms per frame.

--
+-DRD-+
From: Dan D. <da...@de...> - 2013-02-26 06:26:38
On Mon, Feb 25, 2013 at 10:20 PM, Dan Dennedy <da...@de...> wrote:
> It is interesting to note that
>
> melt blipflash -consumer libdv:test.dv

excuse me, I meant

melt -silent libdv:test.dv -consumer blipflash

using the libdv producer instead of avformat.

> gives 0.00, which lends itself to a possible off-by-one error in the
> avformat producer since this is PAL, which is 40ms per frame.

I started to look into this, but like the other isolation problem, now I am not sure whether to look at audio or video initially!

--
+-DRD-+
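The off-by-one reasoning rests on simple arithmetic: at PAL's 25 frames per second one frame lasts 1000 / 25 = 40 ms, so an offset of -40.00 ms is exactly one frame. A minimal sketch of that conversion, with hypothetical function names that are not part of MLT:

```shell
#!/bin/sh
# Hypothetical helpers, not part of MLT.

# Duration of one video frame in milliseconds.
frame_duration_ms() {  # $1 = frames per second
    awk -v fps="$1" 'BEGIN { printf "%.2f\n", 1000 / fps }'
}

# Express a measured avsync offset as a number of frames.
offset_in_frames() {  # $1 = offset in ms, $2 = frames per second
    awk -v ms="$1" -v fps="$2" 'BEGIN { printf "%.2f\n", ms * fps / 1000 }'
}
```

An offset that converts to a whole number of frames, as `offset_in_frames -40.00 25` does, suggests a frame-counting error rather than accumulated rounding.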
From: Dan D. <da...@de...> - 2013-02-26 06:39:55
On Mon, Feb 25, 2013 at 10:26 PM, Dan Dennedy <da...@de...> wrote:
> I started to look into this, but like the other isolation problem, now
> I am not sure whether to look at audio or video initially!

I created a blipflash DV with frame numbers:

melt blipflash -attach dynamictext:#frame# bgcolour=white -consumer libdv:test.dv

Then, I generated an image sequence with the libdv producer:

melt -silent libdv:test.dv -consumer avformat:test-%04d.jpg progressive=1

It looks fine. Then, using the avformat producer:

melt -silent test.dv -consumer avformat:test-%04d.jpg progressive=1

The result shows frame 0 followed by frame 2!

Now, applying an in point:

melt -silent test.dv in=5 -consumer avformat:test-%04d.jpg progressive=1

test-0001.jpg shows frame# 6 instead of 5.

Off-by-one in the video handling of the avformat producer.

--
+-DRD-+
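The frame-number test above can be wrapped in two small functions so that it is easy to repeat for either producer. This is a sketch only: the `melt` arguments are copied from the message, while the function names, the `MELT` override variable, and the output prefixes are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical wrappers around the commands quoted in the message above.
# MELT can be overridden (for example with a stub) to do a dry run.
MELT=${MELT:-melt}

# Step 1: render a DV file whose frames are stamped with their frame number.
make_numbered_dv() {  # $1 = output DV file
    "$MELT" blipflash -attach 'dynamictext:#frame#' bgcolour=white \
        -consumer "libdv:$1"
}

# Step 2: dump every frame as a JPEG so the stamped numbers can be inspected.
# Pass "libdv:test.dv" to force the libdv producer, or plain "test.dv" to
# use the producer melt picks by default (avformat in the thread).
dump_frames() {  # $1 = input clip, $2 = output file prefix
    "$MELT" -silent "$1" -consumer "avformat:$2-%04d.jpg" progressive=1
}
```

Running `dump_frames libdv:test.dv dv` and `dump_frames test.dv avf` and comparing the stamped numbers in `dv-0001.jpg` and `avf-0001.jpg` would reproduce the comparison described in the message.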
From: Brian M. <pez...@ya...> - 2013-02-26 14:18:50
> Off-by-one in the video handling of the avformat producer.

Nice! It looks like the module has been useful already. I hope I didn't just create a whole bunch of work for you :)

My plan was to try to script some automated tests to run weekly on the build server. That way we would know if any avsync regressions occurred. But it didn't occur to me that we might have to fix up some producers/consumers. My plan was to use +/-3ms as my pass/fail criteria (hence my previous comment). But maybe it would make more sense to use one frame duration - at least until everything gets dialed in.

When you run "melt -silent", do you use CTL-C to stop it? Is there any way to tell melt to exit after x frames?

~BM
From: Dan D. <da...@de...> - 2013-02-26 17:32:05
On Tue, Feb 26, 2013 at 6:18 AM, Brian Matherly <pez...@ya...> wrote:
>> Off-by-one in the video handling of the avformat producer.
>
> Nice! It looks like the module has been useful already. I hope I didn't just create a whole bunch of work for you :)

I dug into that problem last night, found the culprit, and made a quick change to fix it. However, as I suspected might be the case, it created a huge regression for some other cases, namely your mkv and mov tests in the originating post. I am talking ~300ms off. I have put a huge amount of effort into a/v sync and seeking over the years, and I am very reluctant to make changes due to the large amount of testing to balance requirements and get it right. Here is a quick rundown of the requirements:

- large variety of formats/codecs
- large variety of files (tools and devices used to produce them, each with a variety of versions)
- a variety of ffmpeg and now libav versions

As if that amount of diversity alone is not enough:

- frame accurate and fast seeking where possible, including AVCHD
- live (network stream and device) input (no seeking required)
- large runs with no memory leaks
- image caching to support filters such as YADIF and telecide that need previous and next frames
- large numbers of files in a composition without consuming RAM or file handles
- a special mode to decode all audio streams in a mux
- various combinations of the above
- cross-platform
- very limited human resources to develop and support

All I can say is thank goodness for FFmpeg and libav without which none of this would be possible, but there is still a lot of work for MLT to make effective and comprehensive usage of their API. I do not plan to revisit this task in the near term. With that said...

> My plan was to try to script some automated tests to run weekly on the build server. That way we would know if any avsync regressions occurred. But it didn't occur to me that we might have to fix up some producers/consumers. My plan was to use +/-3ms as my pass/fail criteria (hence my previous comment). But maybe it would make more sense to use one frame duration - at least until everything gets dialed in.

I would like to develop a plan to test for regression against a baseline so I can try to make a change or improvement and have more automated testing to verify it. Starting with the avformat producer, we need a set of reference inputs, but we do not have them, and it does not make sense to produce them with the MLT avformat consumer. But we can use blipflash producer to generate a raw DV and verify that with dvthumbs.c. Then, we can use the ffmpeg command line tool to generate a variety of encodings. We still do not know if those are correct, but they do provide a best effort baseline against which we can measure deviations to check for regressions, analyze trends, and investigate the hotspots (popular or important formats that show a large offset).

> When you run "melt -silent", do you use CTL-C to stop it? Is there any way to tell melt to exit after x frames?

Yes, Ctrl+C. If you want to go a certain number of frames, set an out point on the producer and terminate_on_pause=1 on the consumer, but the consumer has to be written to exit the consumer thread's loop when that property is set.

--
+-DRD-+
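The answer to the last question, an out point on the producer plus `terminate_on_pause=1` on the consumer, might look like this on the command line. It is a sketch: the `out=` property follows the same pattern as the `in=5` used earlier in the thread, and the wrapper name `measure_avsync` and the `MELT` override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of MLT.
# MELT can be overridden (for example with a stub) to do a dry run.
MELT=${MELT:-melt}

# Measure the sync of a clip and let melt exit by itself at the out point
# instead of waiting for Ctrl+C. This relies on the consumer honouring
# terminate_on_pause, as described in the message above.
measure_avsync() {  # $1 = input clip, $2 = last frame to play (out point)
    "$MELT" -silent "$1" out="$2" -consumer blipflash terminate_on_pause=1
}
```

A scripted test could then call something like `measure_avsync test.avi 250` without any manual intervention; the frame count of 250 is an arbitrary example value.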
From: Brian M. <pez...@ya...> - 2013-02-27 02:49:56
> I am very reluctant to make changes due to the large amount of testing
> to balance requirements and get it right. Here is a quick rundown of
> the requirements:
> [...]
> I do not plan to revisit this task in the near term. With that said...

I hear that. Don't forget:

- maintain compatibility with two forks of the libav/ffmpeg project
- maintain compatibility with the rapidly changing API
- maintain compatibility with all known legacy versions of the libav suite

I didn't mean to trivialize the work you've done. Just keeping up with the API must make you feel a little bit like Sisyphus. And I don't think that anything has to be done immediately. The current performance is doing a good job of meeting many people's needs.

> I would like to develop a plan to test for regression against a
> baseline so I can try to make a change or improvement and have more
> automated testing to verify it. Starting with the avformat producer,
> we need a set of reference inputs, but we do not have them, and it
> does not make sense to produce them with the MLT avformat consumer.
> But we can use blipflash producer to generate a raw DV and verify that
> with dvthumbs.c. Then, we can use the ffmpeg command line tool to
> generate a variety of encodings. We still do not know if those are
> correct, but they do provide a best effort baseline against which we
> can measure deviations to check for regressions, analyze trends, and
> investigate the hotspots (popular or important formats that show a
> large offset).

I like the idea of tracking against a baseline. Let me capture some data and come up with a proposal. We just need to find some success criteria that will be tight enough so that we know when a regression has occurred. I'll chip away at it for a couple of weeks and get back to you.

> Yes, Ctrl+C. If you want to go a certain number of frames, set an out
> point on the producer and terminate_on_pause=1 on the consumer, but the
> consumer has to be written to exit the consumer thread's loop when
> that property is set.

Ah yes. I did implement terminate_on_pause in the blipflash consumer. But I didn't know what it was for. :$

Thanks,

~BM
From: Brian M. <pez...@ya...> - 2013-03-07 05:00:36
> [...]
> Then, we can use the ffmpeg command line tool to generate a variety of
> encodings. We still do not know if those are correct, but they do
> provide a best effort baseline against which we can measure deviations
> to check for regressions, analyze trends, and investigate the hotspots
> (popular or important formats that show a large offset).

I did some testing with ffmpeg and I'm pretty sure that much of the offset is coming from libav itself. For example, if you use libdv to round trip a blipflash through melt, you get an offset of 0. But if you use the dv output from melt and round trip it through ffmpeg as AVI, and then run it back through melt, you get an offset.

So I tested a bunch of formats and I found that the A/V offset through melt is never worse than one video frame. So I set one video frame as the success threshold.

The script is here:
https://github.com/mltframework/mlt-scripts/blob/master/test/test_avsync.sh

It will run every week along with the other automated tests. It tests 7 different output formats. We can add more or refine them. Let me know what formats you think are the most important.

While the test won't tell us if the AV sync gets worse by a few milliseconds, it will tell us if there is a catastrophic AV sync error - which I think is probably the most important thing.

Here are the current results, FYI:

libdv: 0.0ms
avformat-avi: -23.02ms
avformat-dv: -33.38ms
avformat-mkv: 0.0ms
avformat-mov: 12.02ms
avformat-mp4: 12.02ms
avformat-mpg: -10.02ms

Of course, the audio and video codecs probably have some effect on the sync in addition to the container format. I didn't dig too deep into that as most of the tests use default codecs.

~BM
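A one-frame pass/fail rule like the one described in this message could be applied to a results list as sketched below. This is not the logic of test_avsync.sh, which is not shown in the thread; the function name `check_results`, the `name: offset` input format, and the frame rate passed in are all assumptions.

```shell
#!/bin/sh
# Hypothetical checker, not taken from test_avsync.sh.
# Reads "name: offset" lines on stdin (offset in ms, optional "ms" suffix)
# and flags every entry whose magnitude exceeds one frame at the given rate.
# Returns non-zero if any entry fails.
check_results() {  # $1 = frames per second of the test profile
    awk -v fps="$1" '
        BEGIN { limit = 1000 / fps }        # one frame duration in ms
        {
            name = $1; sub(/:$/, "", name)  # drop the trailing colon
            ms = $2 + 0                     # "12.02ms" converts to 12.02
            mag = (ms < 0) ? -ms : ms
            status = (mag <= limit) ? "PASS" : "FAIL"
            if (status == "FAIL") bad = 1
            printf "%s %s %.2f\n", status, name, ms
        }
        END { exit (bad ? 1 : 0) }'
}
```

For example, `printf 'avformat-avi: -23.02ms\n' | check_results 30` prints `PASS avformat-avi -23.02`, while an offset of roughly 300 ms, like the regression mentioned earlier in the thread, would be flagged as a failure.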