1. In the case where Boost.Log is faster, the formatter is still used, although it doesn't contain time stamp.
2. In the "slow" case, compared to the "fast" case, only time stamp attribute and formatter are added.
3. In both "slow" and "fast" cases filtering is either used or not, and if it is, the filters are the same.
Also, how many threads did you use in MT tests and how many cores did you have in the VM?
I have a few thoughts on how to improve formatting scalability. This might improve MT performance quite a bit.
> I've written a custom attribute using ftime and strftime and it performance is slightly better but still 2.5 times slower.
I've seen that different timers on Linux provide performances that differ vastly (some are a few times slower than others). You might want to experiment with different timer functions and see if it helps. I'd start with standard time() function.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The (major) discrepancy in the ftime and custom strftime to log4cxx's time stamp is caching. I've added a simple cache (which compares accuracy to the millisecond and, if they're the same, appends the cached string to the stream.
Boost.Log is now about 50% slower using the cached custom ftime, as opposed to before where it was 250-300% slower. And much, much more using Boost.DateTime.
The standard time() returns a time_t which does not have fractional seconds. I've used ftime() for milliseconds - and gettimeofday() which provides microseconds - to keep it in line with log4cxx & APR (Apache Portable Runtime).
I also 'plugged' in log4cxx & APR's time stamp acquisition and formatting into Boost.Log and the performance was similar to my ftime and custom strftime, about 40-50% slower.
I've managed to get hold of a non-VM Linux machine so I'll run some more tests on it now. And I'll address the other scenarios for Linux then.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Okay, the results where Boost.Log was outperforming log4cxx in linux was on a single-core machine.
In this case:
1. Correct
2. Correct - the Boost.DateTime attribute appears to be very, very slow
3. Filtering is not used.
I tested 1,2,4 and 8 threads.
I've now tested on a native Linux machine with 4 cores:
1. Boost.Log is **marginally slower** (<10%) when only the message and severity attributes are included (and formatted).
2. Boost.Log is **comparable** for 1 and 2 (extra logging) threads and is **faster** for more than 4. Tested with just the message attribute and also with severity attribute.
3. Boost.Log is **5-6 times slower** with the message and time stamp attributes (with and without fractional seconds).
4. Boost.Log is **1.5 times slower** with the message and custom time stamp attributes. Custom using log4cxx's time stamp acquisition and formatting or ftime and custom strfitme fomatter. Both used caching.
5. Boost.Log is **slightly slower** (<15-20%) when filtering on severity level with just the message attribute.
Looks better in several tests (especially, comparing the absolute numbers), but not quite there yet. What's surprising is that it is also worse in other tests. I ran these tests several times, so this is not a measurement error. I cannot explain why it is so.
Andrey, if you have time, could you please run tests using STLPort with the time stamp enabled? I assume you're using the IOStreams with STLPort, it'd be interesting to see how Boost.DateTime performs.
(I haven't set my environment up to use STLPort yet…)
Thanks.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Okay, sorry for the delayed response I have just returned from leave.
Before the break I tested STL Port and the results were similar to yours. Using my custom clock, Boost.Log consistently performs faster than log4cxx. Although Boost.DateTime is slow, log4cxx's time uses buffering which obviously in our test scenarios is very beneficial. My custom clock uses buffering also.
With regard to the use of my code, I have consulted my boss and he is happy for you to use it as you wish. He just mentioned that he doesn't want my time used up maintaining it, but I think from your message you'll be cleaning it up and doing that yourself?
I only tested Boost.Log against our existing internal library. I was going to test against Pantheios but, unfortunately, never got around to it…
Cheers,
Sam
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
That's interesting. Is the following correct:
1. In the case where Boost.Log is faster, the formatter is still used, although it doesn't contain time stamp.
2. In the "slow" case, compared to the "fast" case, only time stamp attribute and formatter are added.
3. In both "slow" and "fast" cases filtering is either used or not, and if it is, the filters are the same.
Also, how many threads did you use in MT tests and how many cores did you have in the VM?
I have a few thoughts on how to improve formatting scalability. This might improve MT performance quite a bit.
> I've written a custom attribute using ftime and strftime and it performance is slightly better but still 2.5 times slower.
I've seen that different timers on Linux provide performances that differ vastly (some are a few times slower than others). You might want to experiment with different timer functions and see if it helps. I'd start with standard time() function.
The (major) discrepancy in the ftime and custom strftime to log4cxx's time stamp is caching. I've added a simple cache (which compares accuracy to the millisecond and, if they're the same, appends the cached string to the stream.
Boost.Log is now about 50% slower using the cached custom ftime, as opposed to before where it was 250-300% slower. And much, much more using Boost.DateTime.
The standard time() returns a time_t which does not have fractional seconds. I've used ftime() for milliseconds - and gettimeofday() which provides microseconds - to keep it in line with log4cxx & APR (Apache Portable Runtime).
I also 'plugged' in log4cxx & APR's time stamp acquisition and formatting into Boost.Log and the performance was similar to my ftime and custom strftime, about 40-50% slower.
I've managed to get hold of a non-VM Linux machine so I'll run some more tests on it now. And I'll address the other scenarios for Linux then.
Okay, the results where Boost.Log was outperforming log4cxx in linux was on a single-core machine.
In this case:
1. Correct
2. Correct - the Boost.DateTime attribute appears to be very, very slow
3. Filtering is not used.
I tested 1,2,4 and 8 threads.
I've now tested on a native Linux machine with 4 cores:
1. Boost.Log is **marginally slower** (<10%) when only the message and severity attributes are included (and formatted).
2. Boost.Log is **comparable** for 1 and 2 (extra logging) threads and is **faster** for more than 4. Tested with just the message attribute and also with severity attribute.
3. Boost.Log is **5-6 times slower** with the message and time stamp attributes (with and without fractional seconds).
4. Boost.Log is **1.5 times slower** with the message and custom time stamp attributes. Custom using log4cxx's time stamp acquisition and formatting or ftime and custom strfitme fomatter. Both used caching.
5. Boost.Log is **slightly slower** (<15-20%) when filtering on severity level with just the message attribute.
: https://sourceforge.net/apps/trac/boost-log/attachment/wiki/LogFourCxxComparison/results_gcc_64bit.txt
I finally managed to investigate the possible cause of our results discrepancy. As I expected, STL plays a major role in the library performance.
Here are results with STL, bundled with MSVC 9.
Test,boost_log Avg Time,log4cxx Avg Time,boost_log/log4cxx Avg Time Ratio
multithreaded_one_sink-100000-stream_formatter-synchronous-16,54.4944,15.6196,3.48884734564
multithreaded_one_sink-100000-stream_formatter-synchronous-2,6.17228,2.32106,2.65925051485
multithreaded_one_sink-100000-stream_formatter-synchronous-4,13.1288,3.97269,3.30476327123
multithreaded_one_sink-100000-stream_formatter-synchronous-8,27.0216,7.78805,3.46962333318
multithreaded_one_sink-500000-stream_formatter-synchronous-16,271.043,77.3134,3.50577002176
multithreaded_one_sink-500000-stream_formatter-synchronous-2,31.3997,12.2201,2.56951252445
multithreaded_one_sink-500000-stream_formatter-synchronous-4,65.4474,20.4628,3.19835995074
multithreaded_one_sink-500000-stream_formatter-synchronous-8,134.113,39.0574,3.43374110924
severity_filter_one_sink-100000-stream_formatter-synchronous-All,2.29414,1.70946,1.34202613691
severity_filter_one_sink-100000-stream_formatter-synchronous-Debug,2.28437,1.68863,1.35279486921
severity_filter_one_sink-100000-stream_formatter-synchronous-Error,0.72986,0.417632,1.74761512528
severity_filter_one_sink-100000-stream_formatter-synchronous-Off,0.0117604,0.00573194,2.05173117653
severity_filter_one_sink-100000-stream_formatter-synchronous-Warning,1.28886,0.852997,1.51097835045
severity_filter_one_sink-500000-stream_formatter-synchronous-All,11.3765,8.46448,1.34402822146
severity_filter_one_sink-500000-stream_formatter-synchronous-Debug,11.4139,8.47666,1.34650912034
severity_filter_one_sink-500000-stream_formatter-synchronous-Error,3.65212,2.12972,1.71483575306
severity_filter_one_sink-500000-stream_formatter-synchronous-Off,0.0539772,0.0252915,2.1342031908
severity_filter_one_sink-500000-stream_formatter-synchronous-Warning,6.48643,4.30585,1.50642265755
By default, MS STL uses safe iterators even in release mode. When I disabled it by defining \_SECURE\_SCL=0, I got these results:
Test,boost_log Avg Time,log4cxx Avg Time,boost_log/log4cxx Avg Time Ratio
multithreaded_one_sink-100000-stream_formatter-synchronous-16,50.9833,16.1329,3.16020678241
multithreaded_one_sink-100000-stream_formatter-synchronous-2,5.93586,2.40918,2.4638507708
multithreaded_one_sink-100000-stream_formatter-synchronous-4,12.1192,4.02978,3.00740983379
multithreaded_one_sink-100000-stream_formatter-synchronous-8,25.0048,8.02236,3.11688829721
multithreaded_one_sink-500000-stream_formatter-synchronous-16,258.451,80.382,3.21528451643
multithreaded_one_sink-500000-stream_formatter-synchronous-2,30.021,12.9795,2.31295504449
multithreaded_one_sink-500000-stream_formatter-synchronous-4,62.0108,20.1239,3.08145041468
multithreaded_one_sink-500000-stream_formatter-synchronous-8,123.769,40.0947,3.08691672465
severity_filter_one_sink-100000-stream_formatter-synchronous-All,2.32958,1.61444,1.44296474319
severity_filter_one_sink-100000-stream_formatter-synchronous-Debug,2.27488,1.62285,1.4017808177
severity_filter_one_sink-100000-stream_formatter-synchronous-Error,0.764154,0.403368,1.89443386684
severity_filter_one_sink-100000-stream_formatter-synchronous-Off,0.0107245,0.00534188,2.00762652849
severity_filter_one_sink-100000-stream_formatter-synchronous-Warning,1.44511,0.829504,1.7421374701
severity_filter_one_sink-500000-stream_formatter-synchronous-All,11.4795,8.11225,1.41508212888
severity_filter_one_sink-500000-stream_formatter-synchronous-Debug,11.437,8.06501,1.41810115549
severity_filter_one_sink-500000-stream_formatter-synchronous-Error,3.76856,2.01278,1.87231590139
severity_filter_one_sink-500000-stream_formatter-synchronous-Off,0.0555998,0.0258023,2.15483890971
severity_filter_one_sink-500000-stream_formatter-synchronous-Warning,6.53946,4.11032,1.59098561669
Looks better in several tests (especially, comparing the absolute numbers), but not quite there yet. What's surprising is that it is also worse in other tests. I ran these tests several times, so this is not a measurement error. I cannot explain why it is so.
Now, there are testing results with STLPort:
Test,boost_log Avg Time,log4cxx Avg Time,boost_log/log4cxx Avg Time Ratio
multithreaded_one_sink-100000-stream_formatter-synchronous-16,10.3824,10.9299,0.949908050394
multithreaded_one_sink-100000-stream_formatter-synchronous-2,1.48811,1.32735,1.12111349682
multithreaded_one_sink-100000-stream_formatter-synchronous-4,2.65505,2.75109,0.965090200611
multithreaded_one_sink-100000-stream_formatter-synchronous-8,5.2386,5.45188,0.960879549807
multithreaded_one_sink-500000-stream_formatter-synchronous-16,51.0906,54.6055,0.935631026179
multithreaded_one_sink-500000-stream_formatter-synchronous-2,7.23628,6.77495,1.06809349146
multithreaded_one_sink-500000-stream_formatter-synchronous-4,12.9446,14.0326,0.922466257144
multithreaded_one_sink-500000-stream_formatter-synchronous-8,25.4704,27.4352,0.928383973873
severity_filter_one_sink-100000-stream_formatter-synchronous-All,0.964215,1.02641,0.939405305872
severity_filter_one_sink-100000-stream_formatter-synchronous-Debug,0.965002,1.01575,0.950038887522
severity_filter_one_sink-100000-stream_formatter-synchronous-Error,0.397464,0.256262,1.5510063919
severity_filter_one_sink-100000-stream_formatter-synchronous-Off,0.0108164,0.00534398,2.02403452109
severity_filter_one_sink-100000-stream_formatter-synchronous-Warning,0.585077,0.507316,1.15327921848
severity_filter_one_sink-500000-stream_formatter-synchronous-All,4.85279,5.13153,0.945680917777
severity_filter_one_sink-500000-stream_formatter-synchronous-Debug,4.99073,5.13659,0.971603729322
severity_filter_one_sink-500000-stream_formatter-synchronous-Error,2.03358,1.26737,1.60456693783
severity_filter_one_sink-500000-stream_formatter-synchronous-Off,0.0543396,0.0257195,2.11277824219
severity_filter_one_sink-500000-stream_formatter-synchronous-Warning,3.00093,2.57053,1.16743628746
These are much better. I once again ensured that MSVC STL is crap.
PS: All tests were run in three iterations, on Windows Vista, on a Core 2 Duo CPU. The revision used was 408.
Wow, doesn't bode well for MSVC STL…
Andrey, if you have time, could you please run tests using STLPort with the time stamp enabled? I assume you're using the IOStreams with STLPort, it'd be interesting to see how Boost.DateTime performs.
(I haven't set my environment up to use STLPort yet…)
Thanks.
Here it goes:
MS STL:
Test,boost_log Avg Time,log4cxx Avg Time,boost_log/log4cxx Avg Time Ratio
multithreaded_one_sink-500000-stream_formatter-synchronous-16,274.982,77.6194,3.54269680003
multithreaded_one_sink-500000-stream_formatter-synchronous-2,31.0337,11.9872,2.58890316337
multithreaded_one_sink-500000-stream_formatter-synchronous-4,65.7671,19.4855,3.37518154525
multithreaded_one_sink-500000-stream_formatter-synchronous-8,134.436,38.9683,3.44988105717
null_backend_formatted-500000-stream_formatter-synchronous,9.25863,7.51481,1.23205110974
null_backend_formatted-500000-stream_formatter-synchronous-severity,11.2151,8.38265,1.33789434129
null_backend_formatted-500000-stream_formatter-synchronous-timestamp,47.8288,8.67058,5.51621690821
null_backend_formatted-500000-stream_formatter-synchronous-timestamp-severity,49.7972,9.78414,5.08958375493
severity_filter_one_sink-500000-stream_formatter-synchronous-All,11.8941,8.43732,1.40970118474
severity_filter_one_sink-500000-stream_formatter-synchronous-Debug,11.9009,8.46729,1.40551463337
severity_filter_one_sink-500000-stream_formatter-synchronous-Error,3.81297,2.10672,1.80990829346
severity_filter_one_sink-500000-stream_formatter-synchronous-Off,0.0538176,0.0264101,2.03776585473
severity_filter_one_sink-500000-stream_formatter-synchronous-Warning,6.68373,4.26425,1.56738699654
MS STL with \_SECURE\_SCL=0 defined
Test,boost_log Avg Time,log4cxx Avg Time,boost_log/log4cxx Avg Time Ratio
multithreaded_one_sink-500000-stream_formatter-synchronous-16,272.442,74.0341,3.67995288658
multithreaded_one_sink-500000-stream_formatter-synchronous-2,30.9445,9.80439,3.15618819733
multithreaded_one_sink-500000-stream_formatter-synchronous-4,64.1603,18.4257,3.48210922787
multithreaded_one_sink-500000-stream_formatter-synchronous-8,132.214,36.9861,3.57469427704
null_backend_formatted-500000-stream_formatter-synchronous,9.03268,7.09685,1.27277313174
null_backend_formatted-500000-stream_formatter-synchronous-severity,10.9823,7.90245,1.38973356364
null_backend_formatted-500000-stream_formatter-synchronous-timestamp,45.6601,8.25821,5.52905532797
null_backend_formatted-500000-stream_formatter-synchronous-timestamp-severity,47.6091,9.16183,5.19646184223
severity_filter_one_sink-500000-stream_formatter-synchronous-All,11.3727,8.00671,1.42039614274
severity_filter_one_sink-500000-stream_formatter-synchronous-Debug,11.3207,7.97542,1.4194487563
severity_filter_one_sink-500000-stream_formatter-synchronous-Error,3.76906,1.99189,1.89220288269
severity_filter_one_sink-500000-stream_formatter-synchronous-Off,0.0581114,0.026011,2.23410864634
severity_filter_one_sink-500000-stream_formatter-synchronous-Warning,6.50412,4.07295,1.59690641918
STLPort:
Test,boost_log Avg Time,log4cxx Avg Time,boost_log/log4cxx Avg Time Ratio
multithreaded_one_sink-500000-stream_formatter-synchronous-16,51.8109,55.479,0.933883090899
multithreaded_one_sink-500000-stream_formatter-synchronous-2,7.46065,6.58715,1.13260666601
multithreaded_one_sink-500000-stream_formatter-synchronous-4,13.1226,13.9511,0.940614001763
multithreaded_one_sink-500000-stream_formatter-synchronous-8,26.6689,27.735,0.961561204255
null_backend_formatted-500000-stream_formatter-synchronous,3.87206,4.67924,0.82749762782
null_backend_formatted-500000-stream_formatter-synchronous-severity,4.63098,5.12503,0.903600564289
null_backend_formatted-500000-stream_formatter-synchronous-timestamp,22.569,5.14729,4.38463735286
null_backend_formatted-500000-stream_formatter-synchronous-timestamp-severity,23.7769,5.48891,4.33180722584
severity_filter_one_sink-500000-stream_formatter-synchronous-All,4.81075,5.11426,0.940654170887
severity_filter_one_sink-500000-stream_formatter-synchronous-Debug,4.81807,5.10213,0.944325213195
severity_filter_one_sink-500000-stream_formatter-synchronous-Error,1.9644,1.26537,1.55243130468
severity_filter_one_sink-500000-stream_formatter-synchronous-Off,0.0554591,0.0252432,2.19699166508
severity_filter_one_sink-500000-stream_formatter-synchronous-Warning,2.93594,2.5407,1.1555634274
These tests were done on revision 409, one iteration each to save time.
One additional observation I have about MS STL is that it severely reduces scalability. In MT tests the second core was never utilized fully.
Sam, I would like to clean up the test code and release it with the library under Boost license. Do you have any objections?
Also, did you test Boost.Log against other libraries? I would be interested to see the results.
Okay, sorry for the delayed response I have just returned from leave.
Before the break I tested STL Port and the results were similar to yours. Using my custom clock, Boost.Log consistently performs faster than log4cxx. Although Boost.DateTime is slow, log4cxx's time uses buffering which obviously in our test scenarios is very beneficial. My custom clock uses buffering also.
With regard to the use of my code, I have consulted my boss and he is happy for you to use it as you wish. He just mentioned that he doesn't want my time used up maintaining it, but I think from your message you'll be cleaning it up and doing that yourself?
I only tested Boost.Log against our existing internal library. I was going to test against Pantheios but, unfortunately, never got around to it…
Cheers,
Sam