#52 SVN 2006-09-02 - Silent crash

Milestone: CVS
Status: closed-accepted
Labels: capture (20)
Priority: 6
Updated: 2006-12-28
Created: 2006-09-02
Private: No

You might already know about this, but just in case:

The current SVN version silently exits after recording
for a moment or two. I see the following assert if I
run it from the command line:

xvidcap: pthread_mutex_lock.c:108: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted

This is on Kubuntu 6.06.1 (64-bit), compiled with gcc
4.0.3. Codec and format are both set to auto, audio
disabled.

Thanks,
-J

Discussion

  • Karl H. Beckers

    Karl H. Beckers - 2006-09-03

    Logged In: YES
    user_id=782084

    (update, accepted)
    Hi,
    yep, this is a major annoyance. Haven't gotten to the bottom
    of this, as it is not always reproducible for me.
    I have the feeling I'm only seeing it on my HT-enabled P4
    with an SMP kernel.
    What CPU are you on? Is it Hyperthreading-enabled or
    multicore? Could you try with a non-SMP kernel?

    Since this is an assertion from within libpthread, I don't
    quite know if I can do anything about it, or whether it
    needs to be fixed upstream.

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-03
    • priority: 5 --> 3
    • assigned_to: nobody --> charly4711
    • status: open --> open-accepted
     
  • Jesse Litton

    Jesse Litton - 2006-09-03

    Logged In: YES
    user_id=210111

    Well, I am using an AMD4400x2 (dual-core) processor... but
    I'm not completely convinced it's a bug in libpthread just
    yet (but I've been wrong before <g>).

    Since I seem to be able to consistently recreate the
    problem, I'll take a crack at throwing some debugging output
    into my local copy and see what falls out. I don't have
    anywhere near your Linux expertise, but another set of eyes
    can't hurt.

    Linux pluto 2.6.15-26-amd64-k8 #1 SMP PREEMPT Thu Aug 3
    03:11:38 UTC 2006 x86_64 GNU/Linux

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-03

    Logged In: YES
    user_id=782084

    didn't say I was "convinced" ... just that it might.

    If it's reproducible for you, testing with non-SMP kernel
    might help to rule it out.
    Have been reading about this error message in a context
    where threads on two CPUs would get a lock on a mutex at the
    same time. This should not be possible with a non-SMP kernel.

     
  • Jesse Litton

    Jesse Litton - 2006-09-04

    Logged In: YES
    user_id=210111

    I looked into it a little bit last night, and must admit to
    being very perplexed. The assert doesn't seem to be
    generated at the same time xvidcap is doing any of its
    locking operations, which I didn't expect. Don't pay
    attention to the specific line numbers of my additional
    debugs below (because they no longer correspond to the
    original source), but the sequence below is what I'm seeing
    before the crash. Also, ignore the fact that it says it's
    in xvc_job_validate()... that was just the last function to
    define DEBUGFUNCTION.

    The code seems to return from every lock operation
    successfully... At first I thought I might not be seeing
    the right info because some of the debugging output was
    going to stdout and might have been buffered... but I
    redirected the important ones to stderr, which should have
    corrected any sequencing mismatch.

    ...
    capture.c captureFrameToImage(): Entering
    capture.c captureFrameToImage(): going to fetch image next
    capture.c XGetZPixmap(): Entering
    capture.c XGetZPixmap(): read 1737904 bytes
    capture.c XGetZPixmap(): Leaving
    capture.c captureFrameToImage(): Leaving
    capture.c TCbCaptureSHM(): calling job->save
    capture.c TCbCaptureSHM(): called job->save
    capture.c TCbCaptureSHM(): pic_no=413 flags=4102 state=4
    VC_REC 4 - VC_STOP 0
    capture.c TCbCaptureSHM(): we're recording
    capture.c TCbCaptureSHM(): before remove_state @ 894
    job.c xvc_job_validate(): Locking mutex @ 607
    job.c xvc_job_validate(): after lock @ 610
    job.c xvc_job_validate(): Unlocking mutex @ 614
    job.c xvc_job_validate(): after unlock @ 617
    capture.c TCbCaptureSHM(): after remove_state @ 897
    capture.c TCbCaptureSHM(): reading an image in a data
    sturctur present
    capture.c captureFrameToImage(): Entering
    capture.c captureFrameToImage(): going to fetch image next
    capture.c XGetZPixmap(): Entering
    capture.c XGetZPixmap(): read 1737904 bytes
    capture.c XGetZPixmap(): Leaving
    capture.c captureFrameToImage(): Leaving
    capture.c TCbCaptureSHM(): calling job->save
    capture.c TCbCaptureSHM(): called job->save
    capture.c TCbCaptureSHM(): pic_no=414 flags=4102 state=4
    VC_REC 4 - VC_STOP 0
    capture.c TCbCaptureSHM(): we're recording
    capture.c TCbCaptureSHM(): before remove_state @ 894
    job.c xvc_job_validate(): Locking mutex @ 607
    job.c xvc_job_validate(): after lock @ 610
    job.c xvc_job_validate(): Unlocking mutex @ 614
    job.c xvc_job_validate(): after unlock @ 617
    capture.c TCbCaptureSHM(): after remove_state @ 897
    capture.c TCbCaptureSHM(): reading an image in a data
    sturctur present
    capture.c captureFrameToImage(): Entering
    capture.c captureFrameToImage(): going to fetch image next
    capture.c XGetZPixmap(): Entering
    capture.c XGetZPixmap(): read 1737904 bytes
    capture.c XGetZPixmap(): Leaving
    capture.c captureFrameToImage(): Leaving
    capture.c TCbCaptureSHM(): calling job->save
    capture.c TCbCaptureSHM(): called job->save
    xvidcap: pthread_mutex_lock.c:108: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
    = 0x41d5d0
    target = 7
    targetCodec = 7
    ncolors = 256
    color_table = 0xb825f0
    colors = 0xb815e0
    win_attr (w/h/x/y) = 826/526/61/438
    area (w/h/x/y) = 826/526/61/438
    xtoffmpeg.c XImageToFFMPEG(): Entering
    xtoffmpeg.c dump32bit(): Entering with image 0xbaecd0
    xtoffmpeg.c dump32bit(): Leaving
    xtoffmpeg.c XImageToFFMPEG(): calling encode_video with
    codec 0xbb7d60, outbuf 0x2aaab2bed010, outbuf size
    -1296117744, output frame 0xc3a130
    xtoffmpeg.c do_video_out(): Entering with format context
    0xbb69e0 output stream 0xba9d20 buffer 0x2aaab2bed010 size 265
    xtoffmpeg.c do_video_out(): Leaving
    capture.c TCbCaptureSHM(): submitting capture of next frame
    in 71 milliseconds
    gnome_ui.c xvc_frame_monitor(): Entering with time = 29
    gnome_ui.c xvc_frame_monitor(): Leaving with percent = 30
    gnome_ui.c do_record_thread(): going for next frame
    gnome_ui.c do_record_thread(): woke up
    ...

    I'm building a non-SMP kernel in the background and will
    give it a try today or tomorrow, per your earlier request.

    -J

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-05

    Logged In: YES
    user_id=782084

    What I don't get is: How come the debug output continues
    after the assertion? This can't be right, can it?

    Also, there are two mutexes (actually three, but the
    update_filename_mutex is no longer needed) and it would be
    good to make sure which one is causing the problem. One
    would be recording_mutex in gnome_ui.c, the other mp in
    xtoffmpeg.c.
    Dunno how much time I'll be able to spend on this today, but
    you might want to try building without audio support to
    avoid use of the second one (used for audio capture), so
    after configure you might want to make clean, edit config.h
    and unset HAVE_FFMPEG_AUDIO, then build.

    Also, I should be initializing recording_mutex to
    PTHREAD_MUTEX_INITIALIZER and don't atm.
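The static initialization mentioned above could look roughly like the following. This is only a minimal sketch, not xvidcap's actual code: `recording_mutex` here is a stand-in with the same name, and `frames_recorded`, `record_one_frame()` and `frames_so_far()` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for recording_mutex: initialised statically, so it
 * is in a defined state before any thread touches it. */
static pthread_mutex_t recording_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical shared counter protected by the mutex. */
static int frames_recorded = 0;

/* Lock, update the shared state, unlock.
 * Returns 0 on success or the pthread error code. */
int record_one_frame(void)
{
    int rc = pthread_mutex_lock(&recording_mutex);
    if (rc != 0)
        return rc;
    frames_recorded++;
    return pthread_mutex_unlock(&recording_mutex);
}

/* Read the counter under the same lock. */
int frames_so_far(void)
{
    int n;
    int rc = pthread_mutex_lock(&recording_mutex);
    assert(rc == 0);
    n = frames_recorded;
    rc = pthread_mutex_unlock(&recording_mutex);
    assert(rc == 0);
    return n;
}
```

For a mutex that is allocated at run time rather than declared statically, `pthread_mutex_init()` would be the equivalent call.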

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-07

    Logged In: YES
    user_id=782084

    (update)
    have not gotten a single one of those on my Athlon XP2400+
    while testing many other things during the last two days.
    With those things out of the way, I'll look into this on my
    P4 HT enabled, next.
    Any news from your side?

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-08

    Logged In: YES
    user_id=782084

    (update)
    haven't gotten a single crash with HAVE_FFMPEG_AUDIO
    #undef'ed, however --audio no is not sufficient. More
    research needed, but it seems the mutex for audio capture is
    the culprit. If you can live without audio, compiling
    without audio support may be a temporary workaround.

     
  • Jesse Litton

    Jesse Litton - 2006-09-09

    Logged In: YES
    user_id=210111

    >> haven't gotten a single crash with HAVE_FFMPEG_AUDIO
    #undef'ed, however --audio no is not sufficient. <<

    I too tried "--audio no", and found it did not stop the asserts.

    I rebuilt my copy with FFMPEG disabled (tried once with it
    def'd to zero, and once with it undef'd completely - binary
    says that it does not have audio support when I prompt
    params), as you suggested. But, it still seems to have the
    same problem. :(

    >> What I don't get is: How come the debug output continues
    after the assertion? This can't be right, can it? <<

    I noticed earlier that if I let all the output stream to the
    console, the assert is always last (as expected) - but when
    I redirect stdout and stderr to a file, it doesn't come out
    last. It's got to be some kind of file buffering issue.
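One likely explanation for the ordering: C stdio fully buffers stdout when it is not attached to a terminal, while stderr is never fully buffered, so the assertion message can reach the file before debug lines that were printed earlier. The sketch below shows two ways to keep debug output in sequence. The helper names `use_line_buffered_stdout()` and `debug_log()` are hypothetical and not part of xvidcap.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Switch stdout to line buffering so each completed line is written out
 * immediately, even when stdout is redirected to a file.
 * Call before the first write to stdout. Returns 0 on success. */
int use_line_buffered_stdout(void)
{
    return setvbuf(stdout, NULL, _IOLBF, 0);
}

/* Hypothetical debug helper: write to stderr and flush right away.
 * Returns the number of characters written, or a negative value on error. */
int debug_log(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    fflush(stderr);
    return n;
}
```

Calling `use_line_buffered_stdout()` once at startup, or routing every debug line through something like `debug_log()`, should make the redirected log match the console order.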

    When running from the console, the last lines I see are:

    capture.c TCbCaptureSHM(): called job->save
    capture.c TCbCaptureSHM(): submitting capture of next frame
    in 72 milliseconds
    gnome_ui.c xvc_frame_monitor(): Entering with time = 28
    gnome_ui.c xvc_frame_monitor(): Leaving with percent = 30
    xvidcap: pthread_mutex_lock.c:108: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

    I will test with my non-SMP kernel in just a little bit.
    It's been pretty crazy lately and I still haven't had the
    chance.

    -J

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-11

    Logged In: YES
    user_id=782084

    (update)
    Lots of things point towards glibc/libpthread. There seem
    to be issues around NPTL. This can potentially be worked
    around by setting LD_ASSUME_KERNEL.

    I'll elaborate a little so I'll be able to find that
    information again myself:
    Linux systems typically have libraries for the different
    threading implementations lying around and the dynamic
    linker determines which ones to use based on compatibility
    information the libraries provide and the running system.
    The information the running system publishes can be tweaked.

    This is explained in more detail here:
    http://people.redhat.com/drepper/assumekernel.html
    (I'll attach it for persistence)

    So if you know what versions require what kernel, you can
    force the dynamic linker to pick a certain one. On ubuntu you
    cannot use eu-readelf, but would rather use objdump like this:

    $ LC_ALL=C objdump -s -j .note.ABI-tag /lib/libpthread.so.0

    /lib/libpthread.so.0: file format elf32-i386

    Contents of section .note.ABI-tag:
    0134 04000000 10000000 01000000 474e5500 ............GNU.
    0144 00000000 02000000 02000000 00000000 ................
    ................^........^........^

    This tells you that this library (the non-nptl version)
    requires kernel 2.2.0+

    $ LC_ALL=C objdump -s -j .note.ABI-tag /lib/tls/libpthread.so.0

    /lib/tls/libpthread.so.0: file format elf32-i386

    Contents of section .note.ABI-tag:
    0134 04000000 10000000 01000000 474e5500 ............GNU.
    0144 00000000 02000000 06000000 00000000 ................
    ................^........^........^

    This tells you that the new implementation requires kernel
    2.6.0+.
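The two dumps above can also be decoded mechanically. The following is a minimal sketch that assumes the standard GNU ABI-tag note layout (three header words, the name "GNU", then OS, major, minor and patch as little-endian 32-bit words). The function names `le32()` and `decode_abi_tag()` are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Read a 32-bit little-endian word (i386 and x86-64 are little-endian). */
static unsigned long le32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Decode the 32 bytes of a .note.ABI-tag section as printed by objdump -s.
 * Layout: namesz (4), descsz (16), type (1), name "GNU\0", then four
 * descriptor words: OS (0 = Linux), major, minor, patch.
 * Returns 0 and fills version[0..2] on success, -1 if the bytes are not
 * a Linux GNU ABI tag. */
int decode_abi_tag(const unsigned char note[32], unsigned long version[3])
{
    assert(note != NULL && version != NULL);

    if (le32(note) != 4 || le32(note + 4) != 16 || le32(note + 8) != 1)
        return -1;                      /* unexpected note header */
    if (memcmp(note + 12, "GNU", 4) != 0)
        return -1;                      /* name must be "GNU" plus NUL */
    if (le32(note + 16) != 0)
        return -1;                      /* OS word is not Linux */

    version[0] = le32(note + 20);       /* major */
    version[1] = le32(note + 24);       /* minor */
    version[2] = le32(note + 28);       /* patch */
    return 0;
}
```

Fed the bytes from the first dump, this yields 2, 2, 0, and from the second dump 2, 6, 0, matching the kernel versions read off by hand.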

    So the following sounds like a promising workaround
    (though I need to test more because I cannot always
    reproduce this):

    $ LD_ASSUME_KERNEL=2.4.19 ~/xvidcap/bin/xvidcap --mf

    If we can confirm that this helps, I'll raise the issue
    upstream.

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-11

    description of LD_ASSUME_KERNEL

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-11

    Logged In: YES
    user_id=782084

    (update)
    just captured 100 mpegs @ 10fps and 50 frames max with
    autocontinue without a single crash on my P4 HTenabled and
    full-fledged compile.
    The workaround looks good to me.

     
  • Jesse Litton

    Jesse Litton - 2006-09-12

    Logged In: YES
    user_id=210111

    It looks like dapper's libm.so.6 requires at least kernel
    2.6.0. So, I'm unable to verify for you. :(

    evil@pluto:~$ LD_ASSUME_KERNEL=2.4.19 xvidcap --mf
    xvidcap: error while loading shared libraries: libm.so.6:
    cannot open shared object file: No such file or directory

    evil@pluto:~$ LC_ALL=C objdump -s -j .note.ABI-tag /lib/libm.so.6

    /lib/libm.so.6: file format elf64-x86-64

    Contents of section .note.ABI-tag:
    0200 04000000 10000000 01000000 474e5500 ............GNU.
    0210 00000000 02000000 06000000 00000000 ................

    I'm guessing libm comes from glibc. I don't see any
    alternate version of that package to install.

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-14

    Logged In: YES
    user_id=782084

    that's very strange, as I'm on dapper, too.
    On my Athlon XP machine I have:

    khb@hosaka:~$ locate libm.so.6
    /lib/tls/libm.so.6
    /lib/tls/i686/cmov/libm.so.6
    /lib/libm.so.6
    khb@hosaka:~$ LC_ALL=C objdump -s -j .note.ABI-tag /lib/tls/i686/cmov/libm.so.6

    /lib/tls/i686/cmov/libm.so.6: file format elf32-i386

    Contents of section .note.ABI-tag:
    0114 04000000 10000000 01000000 474e5500 ............GNU.
    0124 00000000 02000000 06000000 00000000 ................
    khb@hosaka:~$ LC_ALL=C objdump -s -j .note.ABI-tag /lib/tls/libm.so.6

    /lib/tls/libm.so.6: file format elf32-i386

    Contents of section .note.ABI-tag:
    0114 04000000 10000000 01000000 474e5500 ............GNU.
    0124 00000000 02000000 06000000 00000000 ................
    khb@hosaka:~$ LC_ALL=C objdump -s -j .note.ABI-tag /lib/libm.so.6

    /lib/libm.so.6: file format elf32-i386

    Contents of section .note.ABI-tag:
    0114 04000000 10000000 01000000 474e5500 ............GNU.
    0124 00000000 02000000 02000000 00000000 ................
    khb@hosaka:~$ dpkg-query -S /lib/libm.so.6
    libc6: /lib/libm.so.6
    khb@hosaka:~$ dpkg-query -S /lib/tls/libm.so.6
    libc6: /lib/tls/libm.so.6

    and of course I can start xvidcap with the workaround
    without any problems.

    Perhaps you could check with the ubuntu folks?

     
  • Jesse Litton

    Jesse Litton - 2006-09-15

    Logged In: YES
    user_id=210111

    They appear to use a different version for the 64-bit
    distro. The one in my 32-bit chroot is the same as yours.
    I'll rebuild everything as 32-bit and see if the workaround
    works there for me.

    evil@pluto:~$ locate libm.so.6
    /arcane/chroot/lib/tls/libm.so.6
    /arcane/chroot/lib/tls/i686/cmov/libm.so.6
    /arcane/chroot/lib/libm.so.6
    /lib/libm.so.6
    /lib32/libm.so.6
    evil@pluto:~$ LC_ALL=C objdump -s -j .note.ABI-tag /lib32/libm.so.6

    /lib32/libm.so.6: file format elf32-i386

    Contents of section .note.ABI-tag:
    0114 04000000 10000000 01000000 474e5500 ............GNU.
    0124 00000000 02000000 06000000 00000000 ................
    evil@pluto:~$ LC_ALL=C objdump -s -j .note.ABI-tag /arcane/chroot/lib/libm.so.6

    /arcane/chroot/lib/libm.so.6: file format elf32-i386

    Contents of section .note.ABI-tag:
    0114 04000000 10000000 01000000 474e5500 ............GNU.
    0124 00000000 02000000 02000000 00000000 ................

     
  • Jesse Litton

    Jesse Litton - 2006-09-15

    Logged In: YES
    user_id=210111

    So far, your workaround seems to work great when compiling
    to 32-bit; It's not crashed on me when using
    "LD_ASSUME_KERNEL=2.4.19 xvidcap --mf" inside my 32-bit chroot.

    I did verify that when compiled 32-bit but without setting
    the kernel it still crashes regularly (in exactly the same
    manner as when I was compiling 64-bit).

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-16

    Logged In: YES
    user_id=782084

    good!
    I'll bump this upstream (need to find out where exactly
    first, though). I'm not using any special APIs, and one
    threading implementation works while the other doesn't.
    Furthermore, on a single CPU (without multicore or HT)
    everything seems to work alright, but not on SMP, and that
    difference should be transparent to the threading API. I
    think this strongly suggests a bug in the NPTL implementation.

    If you could still test with the non-SMP kernel, I could
    also add that result to the bug.

    Thanks,
    Karl.

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-09-16

    Logged In: YES
    user_id=782084

    (update)
    trying with the ubuntu folks and hope they escalate further
    upstream if required.
    Created bug # 60708 and # 60711 on bugs.ubuntu.com

     
  • Jesse Litton

    Jesse Litton - 2006-09-17

    Logged In: YES
    user_id=210111

    Sorry it took so long, but I finally did test with a non-SMP
    kernel. Even the "AMD64-generic" pre-built (k)Ubuntu kernel
    was SMP-aware, so I had to recompile my own and re-build the
    nvidia drivers for it. I've done it a hundred times under
    Mandrake/Mandriva, but I had to figure out several
    Ubuntu-specific twists (like how the kernel compile fails if
    you don't disable the rt2600 drivers, and that the X server
    is running even at runlevel 3?!?!). It was... fun. :)

    ANYway... The good news is: You're right; the problem does
    not seem to present itself at all when running non-SMP.

    If there's anything else you would like me to test, perhaps
    when the upstream guys have a possible fix, just let me know.

    -J

     
  • Jesse Litton

    Jesse Litton - 2006-11-27

    Logged In: YES
    user_id=210111
    Originator: YES

    It appears that the 32-bit version of edgy now also requires kernel 2.6 for libm.so.6

    So, the workaround no longer works for (K)Ubuntu users. :(

    evil@pluto:~$ dchroot -d
    I: [edgy32 chroot] Running shell: ‘/bin/bash’
    evil@pluto:~$ LC_ALL=C objdump -s -j .note.ABI-tag /lib/libm.so.6

    /lib/libm.so.6: file format elf32-i386

    Contents of section .note.ABI-tag:
    0154 04000000 10000000 01000000 474e5500 ............GNU.
    0164 00000000 02000000 06000000 00000000 ................

     
  • Jesse Litton

    Jesse Litton - 2006-11-27
    • priority: 3 --> 6
     
  • Jesse Litton

    Jesse Litton - 2006-11-27

    Logged In: YES
    user_id=210111
    Originator: YES

    Seeing if the bug system will allow me to bump the priority on this one back up since it completely stops the program from working with newer distros on multicore systems (which is practically every new system nowadays).

    -J

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-12-03

    Logged In: YES
    user_id=782084
    Originator: NO

    (update)
    more research brings up glibc bug # 3328
    http://sources.redhat.com/bugzilla/show_bug.cgi?id=3328

    that one was rejected and points to gcc bug # 29415
    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29415

    That in turn suggests that an addition to gcc after 4.1.1 introduced a bug. And edgy has some additions after 4.1.1:
    gcc version 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)

    Initial testing with gcc 4.0 looks good. Just recorded 50+ videos at 10 fps and 50 frames on my HTenabled P4.
    If you could validate that gcc 4.0 works for you guys, I'll give the ubuntu folks the pointer.

    Karl.

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-12-04

    Logged In: YES
    user_id=782084
    Originator: NO

    (update)
    had another crash with gcc 4.0.4.
    I cannot seem to break a binary compiled with gcc 3.4.
    Please try this:
    - Make sure you're using gcc 3.4 (by changing the symlink or passing CC=gcc-3.4)
    - cd <xvidcap-src-dir>/ffmpeg
    - make distclean
    - cd ..
    - make distclean
    - ./autogen.sh --prefix=/tmp/xvidcap
    - make && make install
    - /tmp/xvidcap/bin/xvidcap --file test-%d.avi --fps 10 --continue yes --frames 50

    Cheers,

    Karl.

     
  • Karl H. Beckers

    Karl H. Beckers - 2006-12-04

    Logged In: YES
    user_id=782084
    Originator: NO

    (update)
    3.4 seems reasonably stable for me. Captured 700+ movies doing that amounting to more than an hour's worth of video with changing capture sizes in between to capture lightly and quickly or slowly and under heavy load. This without a single crash.
    If I read http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29415 correctly, though, what matters is less which compiler the application is built with than which compiler glibc is built with. That would be really bad and I don't think I can verify that quickly.

     