Thread: Re: [Quickfix-developers] Details for Solaris / SunPRO 5.3 build

Re: [Quickfix-developers] Details for Solaris / SunPRO 5.3 build

From: Oren M. <or...@qu...> - 2004-07-13 18:55:00

Ok,

I've taken care of most of these in CVS (I haven't touched the build
files yet):

- the macros no longer end with a semicolon, so you must supply one
when using a macro
- trailing commas in enums have been removed
- tm_isdst is now set to -1 after conversions; I'll probably want to
incorporate this more tightly into the Utc classes
- std::make_pair is no longer used
- std::labs is now used, as suggested
- headers like <cctype> are now used instead of <ctype.h>; these
declare the standard library functions in the std namespace, and all
of these calls are now qualified with std::
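A minimal sketch of what these portability rules look like in practice, assuming a C++98-era compiler like SunPRO 5.3. DEFINE_TAG, Price, OrderQty, Side and the helper functions are hypothetical illustrations, not QuickFIX code; the tm_isdst helper is only an assumption about the kind of conversion the change touches.

```cpp
#include <cctype>   // std::toupper, declared in namespace std
#include <cstdlib>  // std::labs
#include <ctime>    // std::tm, std::mktime, std::time_t
#include <map>
#include <string>
#include <utility>  // std::pair

// Hypothetical macro: it does NOT end in a semicolon, so each use site
// supplies one. If the macro already ended in "};", the caller's ';' would
// leave an empty declaration at namespace scope, which strict C++98
// compilers may reject or warn about.
#define DEFINE_TAG(NAME, NUM) struct NAME { enum { tag = NUM }; }

DEFINE_TAG(Price, 44);    // caller provides the semicolon
DEFINE_TAG(OrderQty, 38);

// No trailing comma after the last enumerator (not allowed in C++98).
enum Side { Buy = 1, Sell = 2 };

// C library calls qualified with std:: when using the <cxxx> headers.
long absoluteDifference(long a, long b) { return std::labs(a - b); }

char upperCase(char c)
{
  // Cast through unsigned char so negative char values stay well-defined.
  return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Explicit pair construction instead of std::make_pair.
void addField(std::map<int, std::string>& fields, int tag,
              const std::string& value)
{
  fields.insert(std::pair<const int, std::string>(tag, value));
}

// Hypothetical helper: -1 tells std::mktime to work out whether daylight
// saving time is in effect instead of trusting a stale flag.
std::time_t toTimeT(std::tm value)
{
  value.tm_isdst = -1;
  return std::mktime(&value);
}
```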

A couple of things.

The std::copy change did not work well with gcc.  We may need to write
a configure test that exposes this STL bug and chooses the proper
method of copying maps depending on the result.
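One way such a configure test could be consumed, as a sketch only: the thread does not describe the exact STL bug, and the macro QF_HAVE_WORKING_MAP_COPY is a hypothetical name that a configure script would define when the std::copy/std::inserter idiom works on the local STL.

```cpp
#include <algorithm>
#include <iterator>
#include <map>

// QF_HAVE_WORKING_MAP_COPY is hypothetical: configure would define it only
// when its test program shows that std::copy into a map works correctly.
#ifdef QF_HAVE_WORKING_MAP_COPY
template <typename Map>
void copyMap(const Map& source, Map& destination)
{
  destination.clear();
  std::copy(source.begin(), source.end(),
            std::inserter(destination, destination.end()));
}
#else
template <typename Map>
void copyMap(const Map& source, Map& destination)
{
  // Fallback: range insert, which every conforming STL supports.
  destination.clear();
  destination.insert(source.begin(), source.end());
}
#endif
```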

I noticed you replaced some sprintf calls with ostringstreams.  We 
actually used to do this, but performance testing revealed that the 
sprintf calls were considerably faster.  I don't remember the exact 
numbers, but I believe using the sprintf calls made message 
construction something on the order of 10% faster.  That's pretty 
significant.  Is there a particular reason you changed these?
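For reference, a minimal sketch of the two formatting styles being compared; the function names are hypothetical and no timing claims are made here. The buffer size assumes a 32-bit int (at most 11 characters per number).

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Formatting "tag=value" through an ostringstream...
std::string formatWithStream(int tag, int value)
{
  std::ostringstream stream;
  stream << tag << '=' << value;
  return stream.str();
}

// ...and through sprintf into a stack buffer. 32 bytes holds two 32-bit
// ints (up to 11 characters each), '=' and the terminating NUL.
std::string formatWithSprintf(int tag, int value)
{
  char buffer[32];
  int length = std::sprintf(buffer, "%d=%d", tag, value);
  return std::string(buffer, length);
}
```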

--oren

Re: [Quickfix-developers] Details for Solaris / SunPRO 5.3 build

From: Caleb E. <cal...@gm...> - 2004-07-13 21:01:56

On Tue, 13 Jul 2004 13:54:50 -0500, Oren Miller <or...@qu...> wrote:

> I noticed you replaced some sprintf calls with ostringstreams.  We
> actually used to do this, but performance testing revealed that the
> sprintf calls were considerably faster.  I don't remember the exact
> numbers, but I believe using the sprintf calls made message
> construction something on the order of 10% faster.  That's pretty
> significant.  Is there a particular reason you changed these?

     On Linux I've definitely found this to be the case.  Using
sprintf to do the formatting and just sending char*'s to the iostreams
is considerably faster.  You can optimize even further by collecting
the return value of sprintf and using ostream::write (const char*,
streamsize) instead of ostream::operator<<, which will avoid at least
a strlen call.  Taken together, I think these will give you a good
deal more than a 10% performance improvement.

     I think there is some additional optimization that could be done
inside the QF Field class by reserving an appropriate amount of
storage for m_data in the setString method before assigning to it.  We
have found std::string::append to be a bottleneck when doing this sort
of thing on Linux.  It would have to be a bit of a guesstimate though,
since you don't know the number of digits in the ASCII representation
of the tag.
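A minimal sketch of the reserve idea, using a hypothetical stand-in for the Field class (not QuickFIX's actual class). The reservation happens before the tag is formatted, so the tag's length is only an upper-bound guess: 11 characters covers any 32-bit int.

```cpp
#include <cstdio>
#include <string>

class Field
{
public:
  explicit Field(int tag) : m_tag(tag) {}

  void setString(const std::string& value)
  {
    // Guesstimate: up to 11 characters for the tag, '=', the value, SOH.
    // With enough capacity reserved, the appends below do not reallocate.
    m_data.clear();
    m_data.reserve(11 + 1 + value.size() + 1);

    char tagBuf[16];
    int tagLength = std::sprintf(tagBuf, "%d", m_tag);
    m_data.append(tagBuf, tagLength);
    m_data.append(1, '=');
    m_data.append(value);
    m_data.append(1, '\001');
  }

  const std::string& getData() const { return m_data; }

private:
  int m_tag;
  std::string m_data;
};
```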

-- 
Caleb Epstein
cal...@gm...

Re: [Quickfix-developers] Details for Solaris / SunPRO 5.3 build

From: Oren M. <or...@qu...> - 2004-07-14 05:55:34

Yeah, I think there are quite a few things that can be done.  I changed
the declaration of calculateString from:

std::string calculateString()

to:

std::string& calculateString( std::string& )

This gave some interesting results.  For normal messages it gave a
small to negligible improvement in the message's toString()
implementation, which I rather expected.  But for messages with
repeating groups (MassQuote messages with 10 groups), the performance
of toString() improved by something like 80-90%.  Previously the
machine I was benchmarking on could convert these messages to strings
at a rate of about 12,000+ per second; after the change it was
22,000-23,000+ per second.
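A minimal sketch of the two signatures, using a hypothetical Message class that stores pre-serialized fields in tag order (real FIX ordering rules are ignored here). The by-value version becomes a thin wrapper; the reference version appends into a buffer the caller owns, so nested pieces such as groups can share one string instead of creating and concatenating temporaries.

```cpp
#include <map>
#include <string>

class Message
{
public:
  void setField(int tag, const std::string& serialized)
  { m_fields[tag] = serialized; }

  // Old style: builds and returns a new string on every call.
  std::string calculateString() const
  {
    std::string result;
    return calculateString(result);
  }

  // New style: appends to the caller's string and returns it.
  std::string& calculateString(std::string& str) const
  {
    for (std::map<int, std::string>::const_iterator it = m_fields.begin();
         it != m_fields.end(); ++it)
      str += it->second;
    return str;
  }

private:
  std::map<int, std::string> m_fields;
};
```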

It would be nice to focus on improving construction of messages with
repeating groups.  Although we can convert these to strings rather
quickly, I was only able to construct 1,700 of these message objects
per second on the same hardware.  This is actually quite good
considering the size of the messages, but I think more can be done.
One reason repeating groups are slower is that they have the odd
requirement that their fields be sorted in a specific order.  If we
are talking to an engine that doesn't care about this (say, for
instance, QuickFIX), it would be nice to construct the messages
without this overhead.

Re: [Quickfix-developers] Details for Solaris / SunPRO 5.3 build

From: Oren M. <or...@qu...> - 2004-07-14 12:32:27

Actually, the speed improvement for regular messages was more 
significant than I thought, so the performance has improved all around.

I tried reserving space in the m_data member, but this actually hurt
performance a bit.  The idea is good, but I figured this technique
might be more effective on larger strings.  So instead I tried
reserving space in the calculateString call, and this made a huge
difference.  I can now call toString() on the quote messages at a rate
of 27,000+ per second (up from the previous 23,000+).  Right now I'm
just making a simple assumption that each field will average about 20
characters.  Maybe this can be made smarter, but I think this is a
reasonable assumption for most messages.  I'm happy to err on the side
of reserving too much.  Trading size for speed is a good deal here,
since messages tend to be temporary objects and generally aren't kept
around in large quantities.
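A minimal sketch of that heuristic, assuming the fields are already serialized into a vector (a simplification; the real message stores fields differently). The buffer grows once up front by about 20 characters per field, and the appends only reallocate if the estimate is exceeded.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends pre-serialized fields to str, reserving ~20 characters each.
std::string& calculateString(const std::vector<std::string>& fields,
                             std::string& str)
{
  const std::size_t averageFieldLength = 20;
  str.reserve(str.size() + fields.size() * averageFieldLength);
  for (std::size_t i = 0; i < fields.size(); ++i)
    str += fields[i];
  return str;
}
```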

I've checked these changes in, if anyone wants to take a look at what
was done and maybe suggest some other ideas.

--oren

Re: [Quickfix-developers] Details for Solaris / SunPRO 5.3 build

From: Caleb E. <cal...@gm...> - 2004-07-14 13:10:31

OK, here's an optimization to Field::calculate which improves the
results from the "pt" application by a healthy amount.  When possible,
it uses a fixed-size char buffer and sprintf instead of building up a
string with operator+.  It falls back to the old behavior if the fixed
buffer is too small.

I also changed the checksum calculation to use std::accumulate, which
probably has no real impact on anything other than lines-of-code.

---8<---
  void calculate()
  {
    char buf[64];

    // If the largest possible representation of the field ID (11
    // characters, e.g. "-2147483648"), '=', the value and '\001' fit in
    // the buffer along with sprintf's terminating NUL, use sprintf
    if (13 + m_string.length () < sizeof (buf))
      {
        // The '*' precision argument must be an int, so cast the length
        m_length = sprintf (buf, "%d=%.*s\001", m_field,
                            (int) m_string.length (), m_string.data ());
        m_data.assign (buf, m_length);
      }
    else
      {
        // Buffer too small: fall back to building the string with operator+
        m_data = IntConvertor::convert(m_field) + "=" + m_string + "\001";
        m_length = m_data.length ();
      }

    // Sum the field's bytes for the checksum (std::accumulate is in <numeric>)
    const char* iter = m_data.data ();
    m_total = std::accumulate (iter, iter + m_length, 0);
  }
---8<---
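For anyone who wants to try the technique outside the Field class, here is a self-contained sketch of the same idea: a fixed buffer with a string fallback, plus the FIX CheckSum rule (sum of all bytes modulo 256). serializeField and checksum are hypothetical names, a 32-bit int is assumed, and summing through unsigned char is a deliberate choice so bytes above 127 never contribute negative values where char is signed.

```cpp
#include <cstdio>
#include <numeric>
#include <string>

// Formats "tag=value<SOH>", using a stack buffer when it is sure to fit.
std::string serializeField(int tag, const std::string& value)
{
  char buf[64];
  // 11 chars for the tag, '=' and SOH, plus sprintf's terminating NUL.
  if (13 + value.length() < sizeof(buf))
  {
    int length = std::sprintf(buf, "%d=%.*s\001", tag,
                              static_cast<int>(value.length()), value.data());
    return std::string(buf, length);
  }
  // Fallback for long values: plain string concatenation.
  char tagBuf[16];
  std::sprintf(tagBuf, "%d", tag);
  return std::string(tagBuf) + "=" + value + "\001";
}

// FIX CheckSum: sum of all bytes modulo 256.
int checksum(const std::string& data)
{
  const unsigned char* begin =
      reinterpret_cast<const unsigned char*>(data.data());
  unsigned long total = std::accumulate(begin, begin + data.size(), 0UL);
  return static_cast<int>(total % 256);
}
```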

Before the change:
Converting integers to strings: 
    num: 10000, seconds: 0.006, num_per_second: 1.66667e+06
Converting strings to integers: 
    num: 10000, seconds: 0.001, num_per_second: 1e+07
Converting doubles to strings: 
    num: 10000, seconds: 0.019, num_per_second: 526316
Converting strings to doubles: 
    num: 10000, seconds: 0.028, num_per_second: 357143
Creating Heartbeat messages: 
    num: 10000, seconds: 0.113, num_per_second: 88495.6
Serializing Heartbeat messages to strings: 
    num: 10000, seconds: 0.155, num_per_second: 64516.1
Serializing Heartbeat messages from strings: 
    num: 10000, seconds: 0.275, num_per_second: 36363.6
Creating NewOrderSingle messages: 
    num: 10000, seconds: 0.415, num_per_second: 24096.4
Serializing NewOrderSingle messages to strings: 
    num: 10000, seconds: 0.41, num_per_second: 24390.2
Serializing NewOrderSingle messages from strings: 
    num: 10000, seconds: 0.585, num_per_second: 17094
Creating QuoteRequest messages: 
    num: 10000, seconds: 4.83, num_per_second: 2070.39
Serializing QuoteRequest messages to strings: 
    num: 10000, seconds: 0.646, num_per_second: 15479.9
Serializing QuoteRequest messages from strings: 
    num: 10000, seconds: 4.386, num_per_second: 2279.98
Reading fields from QuoteRequest message: 
    num: 10000, seconds: 1.374, num_per_second: 7278.02
Storing NewOrderSingle messages: 
    num: 10000, seconds: 0.576, num_per_second: 17361.1
Validating NewOrderSingle messages with no data dictionary: 
    num: 10000, seconds: 0.071, num_per_second: 140845
Validating NewOrderSingle messages with data dictionary: 
    num: 10000, seconds: 0.23, num_per_second: 43478.3

After:
Converting integers to strings: 
    num: 10000, seconds: 0.007, num_per_second: 1.42857e+06
Converting strings to integers: 
    num: 10000, seconds: 0.001, num_per_second: 1e+07
Converting doubles to strings: 
    num: 10000, seconds: 0.02, num_per_second: 500000
Converting strings to doubles: 
    num: 10000, seconds: 0.027, num_per_second: 370370
Creating Heartbeat messages: 
    num: 10000, seconds: 0.08, num_per_second: 125000
Serializing Heartbeat messages to strings: 
    num: 10000, seconds: 0.118, num_per_second: 84745.8
Serializing Heartbeat messages from strings: 
    num: 10000, seconds: 0.159, num_per_second: 62893.1
Creating NewOrderSingle messages: 
    num: 10000, seconds: 0.255, num_per_second: 39215.7
Serializing NewOrderSingle messages to strings: 
    num: 10000, seconds: 0.251, num_per_second: 39840.6
Serializing NewOrderSingle messages from strings: 
    num: 10000, seconds: 0.355, num_per_second: 28169
Creating QuoteRequest messages: 
    num: 10000, seconds: 3.131, num_per_second: 3193.87
Serializing QuoteRequest messages to strings: 
    num: 10000, seconds: 0.62, num_per_second: 16129
Serializing QuoteRequest messages from strings: 
    num: 10000, seconds: 2.846, num_per_second: 3513.7
Reading fields from QuoteRequest message: 
    num: 10000, seconds: 1.172, num_per_second: 8532.42
Storing NewOrderSingle messages: 
    num: 10000, seconds: 0.633, num_per_second: 15797.8
Validating NewOrderSingle messages with no data dictionary: 
    num: 10000, seconds: 0.042, num_per_second: 238095
Validating NewOrderSingle messages with data dictionary: 
    num: 10000, seconds: 0.189, num_per_second: 52910.1


-- 
Caleb Epstein
cal...@gm...