Is there a good reason for basic_sink_frontend::will_consume() to grab an exclusive lock? All it seems to be doing is calling a const Filter function with no state being changed. We found this to be a major bottleneck in our system so I am wondering if there is a way of avoiding it short of changing the source code.
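To illustrate what I mean, here is roughly the pattern I'd expect, sketched with Boost.Thread's shared_mutex (the class and member names are made up for the example, not the library's actual ones): concurrent will_consume() calls would take a shared lock, and only a filter change would take the exclusive one.

    #include <boost/thread/shared_mutex.hpp>
    #include <boost/thread/locks.hpp>
    #include <boost/function.hpp>

    // Illustrative stand-ins, not Boost.Log's real types.
    struct attribute_values {};
    typedef boost::function< bool (attribute_values const&) > filter_type;

    class sketch_frontend
    {
        mutable boost::shared_mutex m_mutex;
        filter_type m_filter;

    public:
        // Rare: replacing the filter takes the exclusive lock.
        void set_filter(filter_type const& f)
        {
            boost::unique_lock< boost::shared_mutex > lock(m_mutex);
            m_filter = f;
        }

        // Hot path: checking the filter only needs a shared lock,
        // so concurrent will_consume() calls no longer serialize.
        bool will_consume(attribute_values const& values) const
        {
            boost::shared_lock< boost::shared_mutex > lock(m_mutex);
            return m_filter.empty() || m_filter(values);
        }
    };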
In general, I think the library uses locks too aggressively. In our case, and I bet in lots of others, there's no need for locking of any kind, since we set up the logger (filters, exception handlers, etc.) once at initialization time and never change it again. So if the library provided an optimistic locking mode where certain methods (setFilter()) were only allowed before the logging core enters a 'running' state, it would give a major performance boost.
Another serious performance drag is having to allocate heap memory for pretty much every attribute. For instance, the time stamp, which is normally part of every log record, is usually just an integer (wrapped in ptime or something like it). So I don't see why I can't just pass it by value to the sink backend instead of calling 'new' every time.
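For example, something along these lines would already help (freeze() and the surrounding names are hypothetical, just to show the idea): once the core is marked as running, configuration calls are rejected and the filter can be read without any lock at all.

    #include <boost/function.hpp>
    #include <stdexcept>

    struct attribute_values {};
    typedef boost::function< bool (attribute_values const&) > filter_type;

    // Hypothetical sketch of a core that refuses reconfiguration once running.
    class frozen_core
    {
        bool m_running;
        filter_type m_filter;   // written only before freeze(), then treated as immutable

    public:
        frozen_core() : m_running(false) {}

        void set_filter(filter_type const& f)
        {
            if (m_running)
                throw std::logic_error("logging configuration is frozen");
            m_filter = f;
        }

        // Called once, at the end of single-threaded initialization,
        // before any other thread starts logging.
        void freeze() { m_running = true; }

        // No locking: the filter never changes after freeze().
        bool will_consume(attribute_values const& values) const
        {
            return m_filter.empty() || m_filter(values);
        }
    };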
All these are minor and easily fixable flaws in an otherwise great library. But they have such a serious impact in our environment that they may cause us to reconsider using Boost.Log altogether. That would be a shame.
Thanks for the suggestions. I'll have to inspect the code to see whether it is safe to change the lock type to a shared one. This may well be a bug.
As for memory allocation, there is no easy way around it considering the dynamic nature of attributes. I had a few ideas on how to reduce the amount of allocations but they complicate the library quite a lot and I'm not sure how they will work out in the end.
Well, then this is a serious, if not fatal, design flaw that effectively nullifies all the careful optimizations done elsewhere in the library. There's little point in having a super-fast lock-free queue to pass log messages between threads if you have to grab a number of locks and do a ton of memory allocations before you get there.
By 'dynamic nature of attributes' I assume you mean that there's no restriction on attribute types, and yet you have to stuff them into a container of some sort while preserving the original type. Hence the virtual dispatch, hence having to 'new' them. However, once you take my view that this solution is no solution at all for a high-performance C++ logging library and cuts out a wide range of potential customers, I'm sure that alternatives will start to materialize. One obvious one is to have a buffer of some reasonable size and do a placement new into it, as sketched below.
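Roughly something like this, just to sketch the idea (made-up names, not a proposal for an actual interface): a type-erased holder that placement-news small values into an inline buffer and only falls back to the heap for oversized ones.

    #include <new>
    #include <cstddef>

    // Sketch of a type-erased attribute value with a small-buffer optimization.
    class small_value
    {
        struct holder_base
        {
            virtual ~holder_base() {}
        };

        template< typename T >
        struct holder : holder_base
        {
            T value;
            explicit holder(T const& v) : value(v) {}
        };

        static const std::size_t buffer_size = 32;
        union
        {
            char m_buffer[buffer_size];
            long double m_aligner;      // crude way to get a usable alignment
        };
        holder_base* m_ptr;             // points into m_buffer or to heap storage
        bool m_on_heap;

    public:
        template< typename T >
        explicit small_value(T const& v)
        {
            if (sizeof(holder< T >) <= buffer_size)
            {
                m_ptr = new (m_buffer) holder< T >(v);  // no heap allocation
                m_on_heap = false;
            }
            else
            {
                m_ptr = new holder< T >(v);             // rare fallback for big types
                m_on_heap = true;
            }
        }

        ~small_value()
        {
            if (m_on_heap)
                delete m_ptr;
            else
                m_ptr->~holder_base();
        }

    private:
        small_value(small_value const&);              // copying omitted in this sketch
        small_value& operator=(small_value const&);
    };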
I understand your reasoning and I would love to optimize the library further but for many practical applications dynamic memory allocations in the library pose no serious problem. And if a decent memory allocator is used (e.g. tcmalloc) the overhead is not as big as it may seem. Pooling the allocated memory is one of the ideas I have, and at some point I will probably implement something like that.
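Just to illustrate the kind of pooling I mean, here is a trivial example using Boost.Pool (shown only as an illustration of the technique, not a commitment to any particular design):

    #include <boost/pool/pool.hpp>
    #include <new>

    // A stand-in for whatever per-record data would be pooled.
    struct record_data
    {
        long timestamp;
        int severity;
    };

    int main()
    {
        // Fixed-size pool: once warmed up, allocations reuse previously
        // freed chunks instead of going to the global heap every time.
        boost::pool<> record_pool(sizeof(record_data));

        void* chunk = record_pool.malloc();              // taken from the pool
        record_data* rec = new (chunk) record_data();    // constructed in place
        rec->timestamp = 123456789L;
        rec->severity = 2;

        rec->~record_data();
        record_pool.free(chunk);                         // returned to the pool, not the heap
        return 0;
    }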
OTOH, if logging degrades your application performance severely, it is possible that you are overusing logging. After all, you would typically do some useful work besides logging, right? And logging in tight loops is never a good idea.
What you are essentially saying is 'if you think it's too slow get a faster box'. If people choose C++ these days, it's probably for performance reasons, and the few memory allocations (we do use the Intel TBB allocator) and the several mutexes taken on the front-end logging path do pose a serious problem. We have a function that does a bunch of things and takes 5 seconds to execute (over the lifetime of the app), of which 2.3 seconds are spent inside logger::open_record(), and of those, 1.5 seconds are spent blocked on a mutex in will_consume(). The saddest part is that this particular mutex is completely unnecessary in our case. It's only needed if one decides to change the filter in mid-flight, which we (and, I assume, the majority of others) never do. So we are paying dearly for an edge case we're never going to hit, and that's a no-no for a general purpose logging library that's about to become part of Boost. The same goes for having to 'new' every integer we log.
I think Boost.Log is a great library overall, far ahead of anything else I've seen in the C++ world. That just makes these flaws all the more unfortunate.
What you are essentially saying is 'if you think it's too slow get a faster box'.
No, what I'm saying is that there are different use cases and application domains, yours being one of them. Believe me, there are real world applications that perform online update of logging configuration, and these are by no means exceptional. Limiting the library's flexibility to better suit your case at the expense of others is not a solution, not for a general-purpose library like Boost.Log.
As for the memory allocator, I played with TBB a year or so ago and it performed slower than tcmalloc in my tests. I don't know if anything has changed since then. Not that I'm suggesting you switch, but if memory allocation is the bottleneck according to your tests, it might be worth a try.
I'm changing the lock type in will_consume now so that filtering is performed in parallel. Thanks for reporting it. I will probably try to optimize it further by caching the filter in TLS some time in the future.
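The TLS idea would be roughly along these lines (just a sketch with illustrative names, building on a shared-mutex frontend): each thread keeps its own copy of the filter together with a version counter and only refreshes it under the shared lock when the filter has actually changed.

    #include <boost/thread/shared_mutex.hpp>
    #include <boost/thread/locks.hpp>
    #include <boost/thread/tss.hpp>
    #include <boost/function.hpp>

    struct attribute_values {};
    typedef boost::function< bool (attribute_values const&) > filter_type;

    class tls_cached_frontend
    {
        struct cache_entry
        {
            unsigned int version;
            filter_type filter;
            cache_entry() : version(0u) {}
        };

        mutable boost::shared_mutex m_mutex;
        filter_type m_filter;
        unsigned int m_version;    // bumped on every set_filter(); a real
                                   // implementation would make this atomic
        mutable boost::thread_specific_ptr< cache_entry > m_cache;

    public:
        tls_cached_frontend() : m_version(1u) {}

        void set_filter(filter_type const& f)
        {
            boost::unique_lock< boost::shared_mutex > lock(m_mutex);
            m_filter = f;
            ++m_version;
        }

        bool will_consume(attribute_values const& values) const
        {
            cache_entry* cache = m_cache.get();
            if (!cache)
                m_cache.reset(cache = new cache_entry());

            // Fast path: the thread-local copy is current, so no lock is taken.
            if (cache->version != m_version)
            {
                // Slow path: the filter changed; refresh under the shared lock.
                boost::shared_lock< boost::shared_mutex > lock(m_mutex);
                cache->filter = m_filter;
                cache->version = m_version;
            }

            return cache->filter.empty() || cache->filter(values);
        }
    };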
Believe me, there are real world applications that perform online update of logging configuration
I don't doubt there are. My point is that I don't want to pay for somebody else's features. Let them lock the world if they need to. What I'm looking for is a way to communicate my usage pattern to the library: I promise that I'm not going to do certain things, so don't worry about them.
Thanks for the will_consume fix.
BTW, our testing of the TBB and tcmalloc allocators produced results similar to yours.
What I'm looking for is a way to communicate my usage pattern to the library: I promise that I'm not going to do certain things, so don't worry about them.
If that particular lock is still a problem now, you can create your own sink frontend without it.
That's the plan. I wish registering a new sink were a bit easier, though. Right now I have to cut and paste a lot of boilerplate code from init_sink() into my factory sink creator.