spin lock in async mode

Oleg
2014-03-03
2014-04-02
  • Oleg

    Oleg - 2014-03-03

    The logger locks up when used by real time threads. The default synchronization primitive should probably be a mutex with a possible configurable override.

     
  • Andrey Semashev

    Andrey Semashev - 2014-03-04

    The mutex is configurable, you can specify it in multi_thread_model template parameters. But it's not the only mutex that gets locked when log records are emitted. There's another one in the core and one per sink, and you can't customize them. The core uses a shared_mutex, so you shouldn't block unless you modify logging configuration. The mutex in the sinks depends on the sink frontend you use. Async frontend with unbounded_fifo_queue policy uses spin_mutex internally to queue log records.

    On top of that you should also consider that Boost.Log does quite a few memory allocations while processing log records. Memory allocator is another source of possible blocking.

     
  • Oleg

    Oleg - 2014-03-04

    It's the spin lock on the async queue that is the source of the problem. It doesn't really work when the writing and reading threads run at different priorities. That's why, I think, it should, at a minimum, be made configurable. Otherwise the logger becomes unusable in a real-time environment.

     
  • Oleg

    Oleg - 2014-03-05

    Here's a snapshot of several stack frame tops from several threads just to demonstrate my point:

    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86
    #1  0x00007fd9313b499d in boost::log::v2_mt_posix::aux::sem_based_event::wait()
    #2  0x00007fd930ff7aab in boost::log::v2_mt_posix::sinks::asynchronous_sink<boost::log::v2_mt_posix::sinks::text_file_backend,
    
    Thread 49 (Thread 0x7fd922aba700 (LWP 22201)):
    #0  0x00007fd92d6d72d2 in ?? ()
    #1  0x00007fd9313b409f in boost::log::v2_mt_posix::aux::threadsafe_queue_impl_generic::push(boost::log::v2_mt_posix::aux::threadsafe_queue_impl::node_base*) ()
    #2  0x00007fd930fedaa9 in boost::log::v2_mt_posix::sinks::unbounded_fifo_queue::enqueue(boost::log::v2_mt_posix::record_view const&) ()
    
    Thread 43 (Thread 0x7fd91bea9700 (LWP 22207)):
    #0  0x00007fd92d6d72d5 in ?? ()
    #1  0x00007fd9313b409f in boost::log::v2_mt_posix::aux::threadsafe_queue_impl_generic::push(boost::log::v2_mt_posix::aux::threadsafe_queue_impl::node_base*) ()
    #2  0x00007fd930fedaa9 in boost::log::v2_mt_posix::sinks::unbounded_fifo_queue::enqueue(boost::log::v2_mt_posix::record_view const&) ()
    
    Thread 42 (Thread 0x7fd91b4a8700 (LWP 22208)):
    #0  0x00007fd92d6d72d5 in ?? ()
    #1  0x00007fd9313b409f in boost::log::v2_mt_posix::aux::threadsafe_queue_impl_generic::push(boost::log::v2_mt_posix::aux::threadsafe_queue_impl::node_base*) ()
    #2  0x00007fd930fedaa9 in boost::log::v2_mt_posix::sinks::unbounded_fifo_queue::enqueue(boost::log::v2_mt_posix::record_view const&) ()
    

    And there are many more like that, all stuck spinning on the lock.

     

    Last edit: Oleg 2014-03-05
  • Andrey Semashev

    Andrey Semashev - 2014-03-05

    The spin_mutex is guarding just a few instructions, so if you're facing heavy contention on it then maybe you're logging too much (across all worker threads). I doubt any other kind of mutex would improve the situation. A true lock-free queue would help in this case, but it comes with its own difficulties. However, you are free to implement your own queueing strategy and see if it helps. I'd be interested to hear from you about the results.
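
    As an illustration of what such a custom queueing strategy might look like, here is a simplified sketch of a FIFO guarded by a regular mutex and a condition variable instead of a spin lock. The class and method names here are placeholders: Boost.Log's real queueing policies (such as unbounded_fifo_queue) operate on record_view objects and plug into the asynchronous_sink frontend, while this sketch uses std::string just to show the locking discipline.

    ```cpp
    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    // Simplified stand-in for a mutex-based queueing strategy.
    // Contended writers block in the kernel (where priority inheritance
    // or scheduling can kick in) instead of spinning in user space.
    class mutex_fifo_queue
    {
    public:
        void enqueue(std::string rec)
        {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_queue.push(std::move(rec));
            }
            m_cond.notify_one();
        }

        // Blocks until a record is available; the dedicated reader thread
        // sleeps rather than spinning while the queue is empty.
        std::string dequeue()
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cond.wait(lock, [this] { return !m_queue.empty(); });
            std::string rec = std::move(m_queue.front());
            m_queue.pop();
            return rec;
        }

    private:
        std::mutex m_mutex;
        std::condition_variable m_cond;
        std::queue<std::string> m_queue;
    };
    ```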

     
  • Oleg

    Oleg - 2014-03-05

    The problem here seems to be priority inversion. The reader and some of the writer threads run at normal priority; several other writer threads are real-time. If a low-priority thread grabs the lock and then gets preempted, it will never be scheduled again while the higher-priority writers keep spinning. A regular mutex would have avoided this.

    I'm still digging into this and will let you know if my suspicions are confirmed.

     
  • Oleg

    Oleg - 2014-04-02

    We tested this by replacing pthread_spin_lock with pthread_mutex and with interprocess::spin_mutex, which yields on each spin, and the problem went away.
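
    For reference, a yielding spin lock along the lines of Boost.Interprocess's spin_mutex can be sketched in standard C++ (the names below are illustrative, not the actual Boost.Interprocess implementation):

    ```cpp
    #include <atomic>
    #include <thread>

    // Illustrative yielding spin lock: after each failed attempt the thread
    // yields to the scheduler instead of burning CPU, giving a preempted
    // lock holder a chance to run again and thus avoiding the livelock
    // seen when higher-priority writers spin on a lock held by a
    // lower-priority thread.
    class yielding_spin_mutex
    {
    public:
        void lock()
        {
            while (m_flag.test_and_set(std::memory_order_acquire))
                std::this_thread::yield(); // let the lock holder be scheduled
        }

        void unlock()
        {
            m_flag.clear(std::memory_order_release);
        }

    private:
        std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
    };
    ```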

    The use of spin locks is explicitly discouraged for real-time threads in the POSIX docs:

    http://netbsd.gw.com/cgi-bin/man-cgi?pthread_spin_lock+3+NetBSD-6.0

    So the current implementation is definitely broken.

     
  • Andrey Semashev

    Andrey Semashev - 2014-04-02

    The use of spin locks is explicitly discouraged for real-time threads in the POSIX docs

    I've heard the opposite opinion - you can't and shouldn't use normal mutexes in real-time applications because you can't guarantee the execution time within a given time frame if you yield the thread. That said, Boost.Log is not well suited for real-time applications for other reasons.

    So the current implementation is definitely broken.

    It's not broken, it's not well suited for your case, that's it. I'm interested in improving the situation but I don't think a simple replacement of the spin mutex with a regular mutex will benefit all use cases.

     
  • Oleg

    Oleg - 2014-04-02

    I've been doing some reading on this since we hit the problem a few months ago, and the general consensus seems to be that priority inversion cannot be handled in user space (which is where a spin lock lives). The kernel, however, specifically takes steps to resolve it if it gets the chance.
    Our pretty extensive testing only confirms that.
    Funny that you would dismiss your own library as not suited for real-time apps when we've been using it with great success for a long time now.
    Unfortunately we are now going to have to build and maintain our own custom version of Boost.Log with a different mutex, which is a serious inconvenience, while all you would have to do is add a build switch to select the lock implementation.

     
  • Andrey Semashev

    Andrey Semashev - 2014-04-02

    Could you also try how PTHREAD_MUTEX_ADAPTIVE_NP mutex works for your workload?
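
    For anyone trying this, an adaptive mutex is a glibc extension that spins briefly and then falls back to blocking in the kernel - a middle ground between a pure spin lock and a regular mutex. A minimal sketch of creating one (the #ifdef fallback to the default type is only there so the sketch builds on systems without the extension):

    ```cpp
    #include <pthread.h>

    // Initialize *m as a glibc adaptive mutex where available, which
    // spins for a short while before blocking in the kernel. Falls back
    // to the default mutex type on platforms lacking the extension.
    static int init_adaptive_mutex(pthread_mutex_t* m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
    #ifdef PTHREAD_MUTEX_ADAPTIVE_NP
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    #endif
        int rc = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return rc;
    }
    ```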

     
