#161 semaphore undo count sometimes screws up

closed-fixed
IPC (12)
5
2008-10-19
2008-06-04
John Hughes
No

Testing alsa applications, which use the rather strange idiom that:

DOWN(sem) = SEMOP (wait for semaphore to go to zero, increment semaphore)

UP(sem) = SEMOP (decrement semaphore)

we find that sometimes the semaphore is left at 1 when a process exits, which should be impossible as all operations are done with the SEM_UNDO flag set.

Tracing the ssi_semexit code shows an adjustment of +1 being applied to a semaphore with value 0, which should also be impossible - with this idiom the undo should either be -1 or zero

Argument:

The semaphore is initially zero. If an application does a DOWN() call then it will have an adjustment of -1. When it does an UP it will have an adjustment of 0.

Discussion

  • John Hughes

    John Hughes - 2008-06-04

    src/pcm/pcm_direct.h from alsa lib

     
  • John Hughes

    John Hughes - 2008-06-05

    demonstrate semundo bug in openssi

     
  • John Hughes

    John Hughes - 2008-06-05

    Logged In: YES
    user_id=166336
    Originator: YES

    so, here's what happens.

    In the base code when a new proc is cloned it either shares it's semaphore undo list - a pthread_create, or it makes a new one - a fork.

    In the OpenSSI code we only know how to deal with fork - we use the pid to find the undo list.

    Here's a test program that shows the problem - when run on a non-openssi system the semaphore val is zero when the subprocess exits (because it's 10 subthreads shared an undo list). When run on an OpenSSI system the semaphore value is left at 9 because each thread gets its own undo list, only one of which is used at process exit.

    It looks like this is going to take a bit of re-architeching to fix.

    File Added: semundo.c

     
  • John Hughes

    John Hughes - 2008-06-06

    patch to fix semundo bug

     
  • John Hughes

    John Hughes - 2008-06-06

    Logged In: YES
    user_id=166336
    Originator: YES

    Ok, the fix was pretty easy - if we can assume that CLONE_SYSVSEM is always used with CLONE_THREAD, i.e. that people that share semaphore undo lists also share a tgid then we just use the tgid in place of the epid.

    Here's a patch.

    Works for me.

    File Added: sem.c.patch

     
  • John Hughes

    John Hughes - 2008-10-19
    • assigned_to: nobody --> hughesj
    • status: open --> closed-fixed
     
  • John Hughes

    John Hughes - 2008-10-19

    Fixed in CVS

     

Log in to post a comment.