The logger locks up when used by real-time threads. The default synchronization primitive should probably be a mutex, with a possible configurable override.
The mutex is configurable: you can specify it in the multi_thread_model template parameters. But it's not the only mutex that gets locked when log records are emitted. There is another one in the core and one per sink, and those you can't customize. The core uses a shared_mutex, so it shouldn't block unless you modify the logging configuration. The mutex in the sinks depends on the sink frontend you use; the asynchronous frontend with the unbounded_fifo_queue policy uses a spin_mutex internally to queue log records.
On top of that, you should also consider that Boost.Log does quite a few memory allocations while processing log records. The memory allocator is another possible source of blocking.
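A rough sketch of what the comment above describes, assuming the composite-logger boilerplate that Boost.Log's own *_mt loggers use: a logger whose synchronization primitive is supplied through the multi_thread_model<> template parameter (boost::shared_mutex here, purely as an example, since the threading model may expect a shared-lockable mutex), plus an asynchronous sink that spells out the unbounded_fifo_queue policy. Header names and exact signatures should be verified against your Boost version.

    #include <boost/thread/shared_mutex.hpp>
    #include <boost/log/sources/basic_logger.hpp>
    #include <boost/log/sources/features.hpp>
    #include <boost/log/sources/threading_models.hpp>
    #include <boost/log/sources/severity_feature.hpp>
    #include <boost/log/sinks/async_frontend.hpp>
    #include <boost/log/sinks/unbounded_fifo_queue.hpp>
    #include <boost/log/sinks/text_ostream_backend.hpp>

    namespace src = boost::log::sources;
    namespace sinks = boost::log::sinks;

    // A severity logger whose internal lock is chosen explicitly through
    // multi_thread_model<> instead of the library default.
    class my_logger :
        public src::basic_composite_logger<
            char,
            my_logger,
            src::multi_thread_model< boost::shared_mutex >,  // the configurable mutex
            src::features< src::severity< int > >
        >
    {
        // Generates the forwarding constructors of the final logger type.
        BOOST_LOG_FORWARD_LOGGER_MEMBERS(my_logger)
    };

    // The asynchronous sink frontend; the second template argument is the record
    // queueing policy, which defaults to unbounded_fifo_queue (the spin-mutex one).
    typedef sinks::asynchronous_sink<
        sinks::text_ostream_backend,
        sinks::unbounded_fifo_queue
    > async_sink_t;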
It's the spin lock on the async queue that is the source of the problem. It doesn't really work when the writing and reading threads run at different priorities. That's why, I think, it should at a minimum be made configurable. Otherwise the logger becomes unusable in a real-time environment.
Here's a snapshot of several stack frame tops from several threads just to demonstrate my point:
And there are many more like that, all stuck spinning on the lock.
The spin_mutex is guarding just a few instructions, so if you're facing heavy contention on it, then maybe you're logging too much (across all worker threads). I doubt any other kind of mutex would improve the situation. A true lock-free queue would help in this case, but it has its own difficulties. However, you are free to implement your own queueing strategy and see if it helps. I'd be interested to hear about your results.
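Before writing a custom queueing strategy, a less invasive experiment (using only the policies that ship with the asynchronous frontend) is to swap unbounded_fifo_queue for a bounded policy; whether that actually avoids the contended spin lock in a given workload would have to be measured. A sketch, with the template parameters to double-check against your Boost.Log version:

    #include <boost/log/sinks/async_frontend.hpp>
    #include <boost/log/sinks/bounded_fifo_queue.hpp>
    #include <boost/log/sinks/drop_on_overflow.hpp>
    #include <boost/log/sinks/text_ostream_backend.hpp>

    namespace sinks = boost::log::sinks;

    // At most 1000 queued records; on overflow, new records are dropped rather
    // than blocking the emitting thread (block_on_overflow is the alternative).
    typedef sinks::asynchronous_sink<
        sinks::text_ostream_backend,
        sinks::bounded_fifo_queue< 1000, sinks::drop_on_overflow >
    > bounded_async_sink_t;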
The problem here seems to be priority inversion. The reader and some of the writer threads run at normal priority; several other writer threads are real-time. If a low-priority thread grabs the lock and then gets swapped out, it will never be scheduled again because the higher-priority writers are still spinning. A regular mutex would fix this.
I'm still digging into this and will let you know if my suspicions are confirmed.
We tested this by replacing pthread_spin_lock with a pthread_mutex and with interprocess::spin_mutex, which yields on each spin, and the problem went away.
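For illustration only, here is a minimal "yielding" spin lock of the kind described above, written with standard C++ atomics rather than Boost.Log's internal spin_mutex: a waiter that cannot take the lock gives up its time slice instead of burning CPU, which gives a preempted lock holder a chance to run.

    #include <atomic>
    #include <thread>

    class yielding_spin_mutex
    {
        std::atomic_flag m_flag = ATOMIC_FLAG_INIT;

    public:
        void lock()
        {
            // test_and_set() returns the previous state: true means the lock is held.
            while (m_flag.test_and_set(std::memory_order_acquire))
                std::this_thread::yield();  // cede the CPU instead of spinning hot
        }

        void unlock()
        {
            m_flag.clear(std::memory_order_release);
        }
    };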
The use of spin locks is explicitly discouraged for real-time threads in the POSIX docs:
http://netbsd.gw.com/cgi-bin/man-cgi?pthread_spin_lock+3+NetBSD-6.0
So the current implementation is definitely broken.
I've heard the opposite opinion: you can't and shouldn't use normal mutexes in real-time applications, because you can't guarantee the execution time within a given time frame if you yield the thread. That said, Boost.Log is not well suited for real-time applications for other reasons.
It's not broken; it's just not well suited for your case, that's all. I'm interested in improving the situation, but I don't think a simple replacement of the spin mutex with a regular mutex will benefit all use cases.
I've been doing some reading on this since we hit the problem a few months ago, and the general consensus seems to be that priority inversion cannot be handled in user space (with a spin lock). The kernel, however, specifically takes steps to resolve it if it gets a chance.
Our fairly extensive testing only confirms that.
It's funny that you would dismiss your own library as not suited for real-time apps when we've been using it with great success for a long time now.
Unfortunately, we are now going to have to build and maintain our own custom version of Boost.Log with a different mutex, which is a serious inconvenience, while all you would have to do is add a build switch to select the lock implementation.
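For reference, the standard mechanism for letting the kernel resolve the inversion is a mutex with the priority-inheritance protocol: while a real-time thread is blocked on the lock, the kernel temporarily boosts the priority of the current owner. A sketch of the kind of lock such a build switch could select, using plain POSIX calls with minimal error handling:

    #include <pthread.h>

    class pi_mutex
    {
        pthread_mutex_t m_mutex;

    public:
        pi_mutex()
        {
            pthread_mutexattr_t attr;
            pthread_mutexattr_init(&attr);
            // PTHREAD_PRIO_INHERIT: the owner inherits the priority of the
            // highest-priority thread waiting on this mutex.
            pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
            pthread_mutex_init(&m_mutex, &attr);
            pthread_mutexattr_destroy(&attr);
        }

        ~pi_mutex() { pthread_mutex_destroy(&m_mutex); }

        void lock()   { pthread_mutex_lock(&m_mutex); }
        void unlock() { pthread_mutex_unlock(&m_mutex); }
    };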
Could you also try how a PTHREAD_MUTEX_ADAPTIVE_NP mutex works for your workload?
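For anyone trying that suggestion, a sketch of how such a mutex could be initialized. PTHREAD_MUTEX_ADAPTIVE_NP is a glibc-specific extension (hence not portable): the lock spins briefly in user space and then falls back to sleeping in the kernel, so waiters do not spin indefinitely.

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1
    #endif
    #include <pthread.h>

    // Initializes *m as an adaptive mutex; returns 0 on success.
    static int init_adaptive_mutex(pthread_mutex_t* m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        // Spin for a short, bounded time in user space, then block in the kernel.
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
        int rc = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return rc;
    }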