From: Robert G. <rpg...@si...> - 2011-10-30 04:15:25
|
I have been working on a large system that reads data off a socket. The data is in Google Protocol Buffer format (thanks to Robert Brown's CL protobuf library), and the socket is being filled by a Java program.

We have been testing this code, and when we set up a simple program that just reads the messages off the socket (on linux, Java server on the same machine), the messages come in disastrously slowly, two orders of magnitude slower than reading the same protocol buffer data out of a file.

Colleagues have programs in Java and Erlang that read from the same socket source at quite acceptable speeds.

Colleagues have also replicated these results on other linux machines, so I don't believe this is an OS problem.

Although I haven't figured out how to measure this carefully, when I look at top, the simple Java and SBCL programs seem to be spending most of their time idle, whereas I believe they should just be burning through the data as fast as possible.

This suggests that somehow I misconfigured the SBCL socket reader, and perhaps it's sleeping between reads, the input buffer is misconfigured (so that when I call READ-SEQUENCE on the stream there's a buffer size mismatch), or something silly like that.

I was hoping someone might have a theory about how this could happen. I created a class that contains a socket and that implements the gray stream API --- could this be what causes it to behave so badly?

(defmethod sb-gray:stream-read-sequence ((stream active-socket-mixin) seq
                                         &optional start end)
  (read-sequence seq (socket-stream stream) :start start :end end))

Any suggestions would be very gratefully received.

thanks,
r |
From: Nikodemus S. <nik...@ra...> - 2011-10-30 09:58:29
|
On 30 October 2011 06:15, Robert Goldman <rpg...@si...> wrote:

> [...]
> Although I haven't figured out how to measure this carefully, when I
> look at top, the simple Java and SBCL programs seem to be spending most
> of their time idle, whereas I believe they should just be burning
> through the data as fast as possible.

Some idling /sounds/ sensible for an IO bound system, really, but...

* I'm not sure I understood correctly, so let me double-check: do both your colleagues' Java and Erlang programs use the same Java server as you do? (So we know the problem is on the Lisp side.)

* I would start by profiling the system using SB-SPROF in all three modes: :CPU, :ALLOC, and :TIME, and giving the results a good long look.

* Without more information, my first guess as to the culprit would be external format conversion.

Cheers,

-- Nikodemus |
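The profiling run suggested above could look roughly like the following sketch. It assumes the socket read loop can be wrapped in a zero-argument function; `profile-in-all-modes` and `read-all-messages` are hypothetical names that do not appear in the thread, and the sample limit is an arbitrary choice.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-sprof))

;; Hypothetical helper: runs THUNK once per SB-SPROF sampling mode and
;; prints a flat report after each run.
(defun profile-in-all-modes (thunk &key (max-samples 40000))
  (dolist (mode '(:cpu :alloc :time))
    (format t "~&;;; SB-SPROF mode ~S~%" mode)
    (sb-sprof:with-profiling (:mode mode
                              :max-samples max-samples
                              :reset t
                              :report :flat)
      (funcall thunk))))

;; Usage (READ-ALL-MESSAGES stands in for the real socket read loop):
;; (profile-in-all-modes (lambda () (read-all-messages *socket-stream*)))
```

For a process that looks idle in top, the :TIME report is probably the one to read first, since it samples on wall-clock time rather than on CPU time consumed.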
From: Robert G. <rpg...@si...> - 2011-10-30 15:56:39
|
On 10/30/11 Oct 30 -4:58 AM, Nikodemus Siivola wrote:

> [...]
> * I'm not sure I understood correctly, so let me double-check: do both
> your colleagues' Java and Erlang programs use the same Java server as
> you do? (So we know the problem is on the Lisp side.)

Yes, they all use the same server.

It seems that this is not simply an SBCL problem, as I originally thought. I ran roughly the same code in ACL and SBCL, and saw the same slowdown of socket traffic versus reading from a file.

I am trying to probe my colleagues about the differences between the CL and Erlang and Java clients.

Some format conversion might well be the issue. Digging deeper in this code, which was supposed to be an optimization of an interchange format that was originally ASCII, shows that the socket streams are (a) bidirectional (the client sends a message to the server to say "I'm done.") and (b) bivalent (the client's message is the string "DONE"). This meant that the socket stream does not provide any useful information to CL about the information it is carrying.

Thank you for the assistance.

best,
R |
From: Philipp M. <ph...@ma...> - 2011-10-30 18:05:43
|
...
>> * I'm not sure I understood correctly, so let me double-check: do both
>> your colleagues' Java and Erlang programs use the same Java server as
>> you do? (So we know the problem is on the Lisp side.)
>
> Yes, they all use the same server.
>
> It seems that this is not simply an SBCL problem, as I originally
> thought. I ran roughly the same code in ACL and SBCL, and saw the same
> slowdown of socket traffic versus reading from a file.
...
> Some format conversion might well be the issue. Digging deeper in this
> code, which was supposed to be an optimization of an interchange format
> that was originally ASCII, shows that the socket streams are (a)
> bidirectional (the client sends a message to the server to say "I'm
> done.") and (b) bivalent (the client's message is the string "DONE").
> This meant that the socket stream does not provide any useful
> information to CL about the information it is carrying.

Perhaps the problem is Nagle?
http://en.wikipedia.org/wiki/Nagle%27s_algorithm

If the Lisp side waits for some time before sending the (small) ACKs, it might explain the bad throughput.

Can you use the sb-posix:setsockopt() function to set TCP_NODELAY (http://linux.die.net/man/7/tcp) and sb-posix:write() for putting the ACK on the line? It might be interesting to see any effects.

Otherwise, how about using Wireshark? If you like, you can send me a pcap file to look at.

Regards,

Phil |
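One way to try the TCP_NODELAY experiment is sketched below. Note that it goes through sb-bsd-sockets and its `sockopt-tcp-nodelay` accessor rather than the sb-posix route mentioned in the message, and it assumes the client connects to an IPv4 dotted-quad address. `connect-nodelay` is a hypothetical name, not something from the poster's code.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-bsd-sockets))

;; Hypothetical helper: connect to a dotted-quad ADDRESS and PORT with
;; Nagle's algorithm disabled. Returns the socket and a binary stream.
(defun connect-nodelay (address port)
  (let ((socket (make-instance 'sb-bsd-sockets:inet-socket
                               :type :stream :protocol :tcp)))
    ;; TCP_NODELAY: small writes from this end go out immediately
    ;; instead of being coalesced.
    (setf (sb-bsd-sockets:sockopt-tcp-nodelay socket) t)
    (sb-bsd-sockets:socket-connect
     socket (sb-bsd-sockets:make-inet-address address) port)
    (values socket
            (sb-bsd-sockets:socket-make-stream
             socket
             :input t :output t
             :element-type '(unsigned-byte 8)
             :buffering :full))))

;; Usage, with a made-up port number:
;; (connect-nodelay "127.0.0.1" 9000)
```

The option only changes how this end sends data, so on the Lisp side it affects the small messages going back to the server. Whether the Java server delays its own writes is governed by the server's socket settings.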
From: Faré <fa...@gm...> - 2011-10-30 16:24:45
|
On Sun, Oct 30, 2011 at 11:56, Robert Goldman <rpg...@si...> wrote:

>>> [...]
>>> Although I haven't figured out how to measure this carefully, when I
>>> look at top, the simple Java and SBCL programs seem to be spending most
>>> of their time idle, whereas I believe they should just be burning
>>> through the data as fast as possible.

I don't know what version of the protobuf client you're using, but it is definitely possible to write vastly inefficient versions of protobuf by interpreting all the structure at runtime instead of compiling it as intended. Which version of the protobuf implementation are you using, and where did you get it? At a cursory glance, is it compiling or interpreting the protobuf structures?

Also, I remember problems in previous socket applications due to either not flushing output buffers properly (the other end thus being idle), or waiting idly for the input buffer to fill or for some terminator to be sent. [For instance, I once experienced a bug wherein old versions of SBCL would demand a full 4096 byte input buffer before they would start external-format translation. Are you using a recent SBCL?]

Regards,

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org |
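The output-flushing pitfall described above is cheap to rule out on the Lisp side. The sketch below assumes the client's "DONE" message is sent as four raw ASCII octets on a binary stream; the thread does not say how that message is actually encoded or terminated, and `send-done` is a hypothetical name.

```lisp
;; Hypothetical helper: write the ASCII octets of "DONE" to a binary
;; output STREAM and push them out of the Lisp-side buffer at once.
(defun send-done (stream)
  (write-sequence (map '(vector (unsigned-byte 8)) #'char-code "DONE")
                  stream)
  ;; Without this call the octets may sit in the stream's output buffer
  ;; while the peer waits for them.
  (finish-output stream)
  (values))
```

FINISH-OUTPUT waits until the buffered data has been handed to the operating system; FORCE-OUTPUT starts the same flush without waiting for it to complete.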
From: Robert G. <rpg...@si...> - 2011-10-31 13:15:43
|
On 10/30/11 Oct 30 -11:24 AM, Faré wrote:

> [...]
> I don't know what version of the protobuf client you're using, but
> it is definitely possible to write vastly inefficient versions of protobuf
> by interpreting all the structure at runtime
> instead of compiling it as intended.
> Which version of the protobuf implementation are you using, and
> where did you get it?
> At a cursory glance, is it compiling or interpreting
> the protobuf structures?

We are using Bob Brown's August version of CL-protobuf. It has specially compiled methods for the different data structures.

Also, I'm inclined to think the fact that we can read these data structures off a file with a 2000-fold speedup suggests it's not the protobuf code that's hurting us. It seems to be socket-specific.

> Also, I remember problems in previous socket applications due to either
> not flushing output buffers properly (the other end thus being idle), or
> waiting idly for the input buffer to fill or for some terminator to be sent.
> [For instance, I once experienced a bug wherein old versions of SBCL
> would demand a full 4096 byte input buffer before they would start
> external-format translation. Are you using a recent SBCL?]

We froze on 1.0.47 a while ago (in order to ensure we didn't have oddities across developers). Is this recent enough to be past that problem? It wouldn't be hard for me to test on 1.0.52, if it's possible there's a later improvement.

Thanks for the suggestions,
r |
From: Rudolf S. <ru...@co...> - 2011-10-31 13:41:25
|
On Oct 31, 2011, at 14:15, Robert Goldman wrote:

> [...]
> Also, I'm inclined to think the fact that we can read these data
> structures off a file with a 2000-fold speedup suggests it's not the
> protobuf code that's hurting us. It seems to be socket-specific.

You are not by any chance using cl:read-sequence to read data from a socket? The CL standard forbids read-sequence from returning with a half-filled buffer unless the stream has hit EOF, which makes read-sequence unusable for sockets in practice.

Rudi |
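Given the blocking behaviour described above, one portable workaround is never to ask READ-SEQUENCE for more octets than the sender has already committed to sending. The sketch below assumes a made-up framing in which each message is preceded by a 4-byte big-endian length. The thread does not say how the real protocol delimits messages, so the framing and the names `read-exactly` and `read-frame` are purely illustrative.

```lisp
;; Read exactly COUNT octets from a binary STREAM. Returns the vector,
;; or NIL if EOF arrives before COUNT octets have been read.
(defun read-exactly (stream count)
  (let* ((buffer (make-array count :element-type '(unsigned-byte 8)))
         (got (read-sequence buffer stream)))
    (and (= got count) buffer)))

;; Hypothetical framing: a 4-byte big-endian length, then the payload.
;; Returns the payload vector, or NIL at EOF or on a truncated frame.
(defun read-frame (stream)
  (let ((header (read-exactly stream 4)))
    (when header
      (read-exactly stream
                    (logior (ash (aref header 0) 24)
                            (ash (aref header 1) 16)
                            (ash (aref header 2) 8)
                            (aref header 3))))))
```

Because each READ-SEQUENCE call asks for a size the peer is known to be sending, the call returns as soon as the frame is complete instead of waiting for a larger fixed-size buffer to fill.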
From: Robert G. <rpg...@si...> - 2011-10-31 14:58:33
|
On 10/31/11 Oct 31 -8:28 AM, Rudolf Schlatte wrote:

> [...]
> You are not by any chance using cl:read-sequence to read data from a socket?
> The CL standard forbids read-sequence from returning with a half-filled
> buffer unless the stream has hit EOF, which makes read-sequence unusable
> for sockets in practice.

We /are/ using READ-SEQUENCE, and this may well be the problem. I ripped the protocol-buffer parsing code out of the loop, and now just call READ-SEQUENCE and throw the bytes on the floor. Still slow.

What's the right replacement for READ-SEQUENCE? A hasty test seems to show READ-BYTE to be equally slow.

Thanks,
r |
From: <rus...@ya...> - 2011-10-31 15:50:28
|
Robert Goldman <rpg...@si...> writes:

> What's the right replacement for READ-SEQUENCE? A hasty test seems to
> show READ-BYTE to be equally slow.

I'm not sure if this will work in the context of your problem, but you can call unix-read directly on the vector-sap of a ub8 vector once you know that there is data to be read. This is unportable of course, but should be fast.

-russ |
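A rough sketch of the direct-read approach described above, using SBCL internals (SB-UNIX and SB-SYS) that are not a stable public interface. `read-available` is a hypothetical name. It bypasses any Lisp stream wrapped around the same descriptor, so it should not be mixed with buffered reads on that stream.

```lisp
;; Issue a single read(2) on file descriptor FD into BUFFER, a simple
;; (unsigned-byte 8) vector. Returns the number of octets read (0 at
;; EOF) and signals an error if the system call fails.
(defun read-available (fd buffer)
  (declare (type (simple-array (unsigned-byte 8) (*)) buffer))
  ;; Pin BUFFER so the garbage collector cannot move it while the
  ;; kernel writes through the raw pointer.
  (sb-sys:with-pinned-objects (buffer)
    (multiple-value-bind (count errno)
        (sb-unix:unix-read fd (sb-sys:vector-sap buffer) (length buffer))
      (or count
          (error "read(2) on fd ~D failed, errno ~D" fd errno)))))

;; Usage with an sb-bsd-sockets socket (SOCKET and BUF are assumed to
;; exist already):
;; (read-available (sb-bsd-sockets:socket-file-descriptor socket) buf)
```

Unlike READ-SEQUENCE, a single read(2) on a blocking descriptor returns as soon as at least one octet is available, so the caller has to cope with short reads.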
From: Dmitry S. <dm...@st...> - 2011-11-01 17:00:30
|
>>>>> Philipp Marek writes:

PM> ...
>>> * I'm not sure I understood correctly, so let me double-check: do both
>>> your colleagues' Java and Erlang programs use the same Java server as
>>> you do? (So we know the problem is on the Lisp side.)
>>
>> Yes, they all use the same server.
>>
>> It seems that this is not simply an SBCL problem, as I originally
>> thought. I ran roughly the same code in ACL and SBCL, and saw the
>> same slowdown of socket traffic versus reading from a file.
PM> ...
>> Some format conversion might well be the issue. Digging deeper in
>> this code, which was supposed to be an optimization of an
>> interchange format that was originally ASCII, shows that the socket
>> streams are (a) bidirectional (the client sends a message to the
>> server to say "I'm done.") and (b) bivalent (the client's message is
>> the string "DONE"). This meant that the socket stream does not
>> provide any useful information to CL about the information it is
>> carrying.

PM> Perhaps the problem is Nagle?
PM> http://en.wikipedia.org/wiki/Nagle%27s_algorithm

PM> If the Lisp side waits for some time before sending the (small)
PM> ACKs, it might explain the bad throughput.

+1. The OP can also try passing :nodelay t to usocket:socket-connect, if usocket is in use...

[...]

PM> Can you use the sb-posix:setsockopt() function to set TCP_NODELAY
PM> (http://linux.die.net/man/7/tcp) and sb-posix:write() for putting
PM> the ACK on the line?

[...]

-- Dmitry Statyvka |