|
From: Chris A. <chr...@be...> - 2007-05-02 19:14:19
|
I have a FIX-based system that was developed on QuickFIX/C++ with Java bindings. I'm interested in using QuickFIX/J for a number of reasons, but I've had problems with it that I don't have with the native QuickFIX.

The most severe problem is QuickFIX/J locking up under heavy load. I have a reproducible case where the client program tries to send nearly 500 QuoteRequest messages through a FIX router, both running QuickFIX. With native QuickFIX the test runs without a hitch, but with QuickFIX/J it fails every time. Anywhere between 10 and 60 messages get through successfully, and then the FIX router just stops.

There is no indication of an error in any logs. I've been looking specifically for any logging that may have come from AbstractIoHandler.exceptionCaught().

Once this error has occurred, no FIX traffic goes through the connection between the client and the router. The socket is still open, but the router doesn't seem to recognize it. Eventually the client reports that the session is closed because the heartbeat and TEST messages are missing.

Without any indication of a problem in the logs, I don't know where to begin looking for a solution. Does anyone have an idea what could be causing this behavior?

Thanks
Chris |
|
From: Steve B. <st...@te...> - 2007-05-02 19:46:08
|
Hi Chris,

Have you looked at a Java thread dump to see if there are any clues there?

Steve |
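For readers who have not taken a thread dump before: on a Unix-like system, sending SIGQUIT to the JVM process (`kill -3 <pid>`) or running `jstack <pid>` prints one, including a "Found one Java-level deadlock" section when the JVM detects a monitor cycle. The same check can be made from inside the process. The following is a minimal sketch, assuming Java 6 or later; the class name `DeadlockProbe` is hypothetical and not part of QuickFIX/J.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper, not part of QuickFIX/J. */
final class DeadlockProbe {

    /** Returns a description of the deadlocked threads, or an empty string if there are none. */
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have ended in the meantime
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "No deadlock detected" : report);
    }
}
```

Calling `describeDeadlocks()` periodically from a watchdog thread and logging a non-empty result is one way to surface a hang that otherwise leaves no trace in the logs. Note that `ThreadInfo.toString()` abbreviates long stack traces, so a full dump is still the better diagnostic.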
|
From: Chris A. <chr...@be...> - 2007-05-02 21:39:33
|
Excellent suggestion Steve, the issue is a thread deadlock.

In the traces below, the session 'BEACONFIX->front-server-200-38' is the router forwarding a response message to the process that is submitting the 500 QuoteRequests. The session 'exchange-server-200-8->exchange-server200' is the recipient of the QuoteRequest messages. The deadlock occurs when the router is processing a QuoteRequest and a response to a QuoteRequest at the same time.

In more abstract terms, the router is bridging between sessions A and B. Message 1 is received from A to forward to B at the same time as Message 2 is received from B to forward to A. QuickFIX/J holds the session lock while processing an incoming message, so the dispatcher for Message 1 holds the lock on A and the dispatcher for Message 2 holds the lock on B. Each message has to be sent to the other session, so the dispatcher for Message 1 is waiting for the lock on B and the dispatcher for Message 2 is waiting for the lock on A.

Why does quickfix.Session.next() need to be synchronized? It looks like the synchronization in Session has been overdone. There may be some code in next() that should be synchronized, but definitely not all of it.
Found one Java-level deadlock:
=============================
"QF/J Session dispatcher: FIX.4.4:BEACONFIX->front-server-200-38":
  waiting to lock monitor 0x080ece3c (object 0xe399f1e8, a quickfix.Session),
  which is held by "QF/J Session dispatcher: FIX.4.4:exchange-server-200-8->exchange-server200"
"QF/J Session dispatcher: FIX.4.4:exchange-server-200-8->exchange-server200":
  waiting to lock monitor 0x080eccbc (object 0xe329fbb8, a quickfix.Session),
  which is held by "QF/J Session dispatcher: FIX.4.4:BEACONFIX->front-server-200-38"

Java stack information for the threads listed above:
===================================================
"QF/J Session dispatcher: FIX.4.4:BEACONFIX->front-server-200-38":
        at quickfix.Session.send(Session.java:1616)
        - waiting to lock <0xe399f1e8> (a quickfix.Session)
        at quickfix.Session.sendToTarget(Session.java:423)
        at headend.quickfix.DefaultMessageSender.send(DefaultMessageSender.java:43)
        at headend.quickfix.DefaultMessageSender.send(DefaultMessageSender.java:63)
        at headend.quickfix.DefaultMessageSender.sendToAll(DefaultMessageSender.java:97)
        at headend.quickfix.AbstractMessageSender.sendToAll(AbstractMessageSender.java:27)
        at headend.clientportal.ResponseMessageHandler.sendMessage(ResponseMessageHandler.java:42)
        at headend.clientportal.ResponseMessageHandler.sendMessage(ResponseMessageHandler.java:37)
        at headend.quickfix.RouteMessageHandler.handleMessage(RouteMessageHandler.java:64)
        at headend.quickfix.AbstractFixListener.processMessage(AbstractFixListener.java:59)
        at headend.quickfix.AbstractFixListener.incomingRouteMessage(AbstractFixListener.java:119)
        at headend.quickfix.AppService.incomingRouteMessage(AppService.java:79)
        at headend.quickfix.AppService.toRoute(AppService.java:85)
        at headend.quickfix.RouteMessageHandler.sendMessage(RouteMessageHandler.java:53)
        at headend.clientportal.ClientPortalAgent$RouteTradingMessageHandler.sendMessage(ClientPortalAgent.java:160)
        at headend.quickfix.RouteMessageHandler.handleMessage(RouteMessageHandler.java:64)
        at headend.quickfix.AbstractFixListener.processMessage(AbstractFixListener.java:59)
        at headend.quickfix.AbstractFixListener.incomingMessage(AbstractFixListener.java:98)
        at headend.quickfix.AppService.incomingMessage(AppService.java:69)
        at headend.quickfix.BasicApplication.fromApp(BasicApplication.java:125)
        at headend.quickfix.QuickfixApplication.fromApp(QuickfixApplication.java:86)
        at quickfix.Session.fromCallback(Session.java:1189)
        at quickfix.Session.verify(Session.java:1143)
        at quickfix.Session.verify(Session.java:1218)
        at quickfix.Session.next(Session.java:670)
        - locked <0xe329fbb8> (a quickfix.Session)
        at quickfix.mina.ThreadPerSessionEventHandlingStrategy$MessageDispatchingThread.run(ThreadPerSessionEventHandlingStrategy.java:75)
"QF/J Session dispatcher: FIX.4.4:exchange-server-200-8->exchange-server200":
        at quickfix.Session.isEnabled(Session.java:489)
        - waiting to lock <0xe329fbb8> (a quickfix.Session)
        at headend.quickfix.DefaultMessageSender.send(DefaultMessageSender.java:40)
        at headend.quickfix.AbstractMessageSender.send(AbstractMessageSender.java:23)
        at headend.clientportal.ResponseMessageHandler.sendMessage(ResponseMessageHandler.java:44)
        at headend.clientportal.ResponseMessageHandler.sendMessage(ResponseMessageHandler.java:37)
        at headend.quickfix.RouteMessageHandler.handleMessage(RouteMessageHandler.java:64)
        at headend.quickfix.AbstractFixListener.processMessage(AbstractFixListener.java:59)
        at headend.quickfix.AbstractFixListener.incomingMessage(AbstractFixListener.java:98)
        at headend.quickfix.AppService.incomingMessage(AppService.java:69)
        at headend.quickfix.BasicApplication.fromApp(BasicApplication.java:125)
        at headend.quickfix.QuickfixApplication.fromApp(QuickfixApplication.java:86)
        at quickfix.Session.fromCallback(Session.java:1189)
        at quickfix.Session.verify(Session.java:1143)
        at quickfix.Session.verify(Session.java:1218)
        at quickfix.Session.next(Session.java:670)
        - locked <0xe399f1e8> (a quickfix.Session)
        at quickfix.mina.ThreadPerSessionEventHandlingStrategy$MessageDispatchingThread.run(ThreadPerSessionEventHandlingStrategy.java:75) |
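The lock cycle described in this message can be shown in isolation. The sketch below is not QuickFIX/J code: it stands in two `ReentrantLock` objects for the two `quickfix.Session` monitors and uses `tryLock` with a timeout instead of `synchronized`, purely so that the demonstration terminates rather than hanging. All names (`CrossSessionLockDemo`, `forward`, `countBlockedForwards`) are hypothetical.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for two session dispatchers forwarding to each other. */
final class CrossSessionLockDemo {

    /** Returns how many of the two dispatchers failed to get the other session's lock. */
    static int countBlockedForwards() {
        ReentrantLock sessionA = new ReentrantLock();
        ReentrantLock sessionB = new ReentrantLock();
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread dispatcherA = new Thread(() -> forward(sessionA, sessionB, barrier, blocked));
        Thread dispatcherB = new Thread(() -> forward(sessionB, sessionA, barrier, blocked));
        dispatcherA.start();
        dispatcherB.start();
        try {
            dispatcherA.join();
            dispatcherB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void forward(ReentrantLock own, ReentrantLock other,
                                CyclicBarrier barrier, AtomicInteger blocked) {
        own.lock(); // like next(): the dispatcher holds its own session's lock
        try {
            barrier.await(); // both dispatchers now hold their own lock
            // Like send() on the other session: needs the other session's lock.
            if (other.tryLock(100, TimeUnit.MILLISECONDS)) {
                other.unlock();
            } else {
                blocked.incrementAndGet(); // with synchronized this would wait forever
            }
            barrier.await(); // keep our own lock until both attempts are over
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (BrokenBarrierException e) {
            throw new IllegalStateException(e);
        } finally {
            own.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked forwards: " + countBlockedForwards());
    }
}
```

Both attempts time out because each dispatcher keeps its own lock while asking for the other one. With intrinsic monitors there is no timeout, which matches the permanent hang seen in the trace.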
|
From: Joerg T. <Joe...@ma...> - 2007-05-03 11:22:37
|
Hi Chris,

On 05/02/07 23:39, Chris Audley wrote:
> Excellent suggestion Steve, the issue is a thread deadlock.

Yes, a very classical example: 1 locks A and tries to lock B; 2 locks B and tries to lock A.

I would try to decouple A and B with some sort of concurrent queue, i.e. the fromApp() callback just puts the received message into the queue, and another thread picks messages from the queue and sends them to the other session.

> Why does quickfix.Session.next() need to be synchronized? It looks like
> the synchronization in Session has been overdone. There may be some code
> in next() that should be synchronized, but definitely not all of it.

OK, you could open an enhancement request to reduce the amount of synchronized code in QF/J. Possibly the amount of synchronization could be reduced, but I am not sure.
Cheers, Jörg

-- Joerg Thoennes http://www.macd.com Tel.: +49 (0)241 44597-24 Macdonald Associates GmbH Managing Director: Roger Macdonald Lothringer Str. 52, D-52070 Aachen Amtsgericht Aachen, HRB 8151, Ust.-Id DE813021663 |
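The queue-based decoupling suggested in this reply might look roughly like the sketch below. It is an illustration under assumptions, not code from the thread or from QuickFIX/J: `ForwardingQueue` is a hypothetical helper, and the `sender` callback stands in for whatever actually sends to the other session (in the router this would presumably wrap a call such as `Session.sendToTarget`, which appears in the stack trace). The point is that the `fromApp()` callback only enqueues and returns, so the dispatcher never waits for the other session's lock while holding its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: hands messages from a callback to a separate sender thread. */
final class ForwardingQueue<M> {
    private final BlockingQueue<M> pending = new LinkedBlockingQueue<>();
    private final Thread worker;

    ForwardingQueue(String name, Consumer<M> sender) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    // Runs on its own thread, so no session dispatcher lock is held here.
                    sender.accept(pending.take());
                }
            } catch (InterruptedException e) {
                // Interrupted by stop(): let the worker end.
            }
        }, name);
        worker.setDaemon(true);
        worker.start();
    }

    /** Call from the callback instead of sending directly; returns without waiting for the send. */
    void enqueue(M message) {
        pending.add(message);
    }

    void stop() {
        worker.interrupt();
    }

    /** Small demo: the "sender" only records what it was asked to forward. */
    static List<String> demo() {
        BlockingQueue<String> forwarded = new LinkedBlockingQueue<>();
        ForwardingQueue<String> toSessionB = new ForwardingQueue<>("forward-to-B", forwarded::add);
        toSessionB.enqueue("m1");
        toSessionB.enqueue("m2");
        toSessionB.enqueue("m3");

        List<String> result = new ArrayList<>();
        try {
            for (int i = 0; i < 3; i++) {
                String message = forwarded.poll(5, TimeUnit.SECONDS);
                if (message != null) {
                    result.add(message);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            toSessionB.stop();
        }
        return result;
    }
}
```

A single worker per target session keeps messages in the order they were enqueued. The trade-off is that the queue here is unbounded and in-memory, so a slow or disconnected target lets it grow, and queued messages are lost if the process exits.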
|
From: Steve B. <st...@te...> - 2007-05-04 15:35:10
|
> Why does quickfix.Session.next() need to be synchronized? It looks like
> the synchronization in Session has been overdone. There may be some code
> in next() that should be synchronized, but definitely not all of it.

Hi Chris,

I'm open to doing lower-level synchronization in next(). One potential issue is the performance cost of more frequent lock acquisition and release, but later versions of the JVM have reportedly optimized this behavior quite a bit. Lower-level synchronization would also be more prone to thread-safety bugs. That could be handled to some extent with a specialized test suite for that purpose.

To solve your problem we'd have to not only do the locking at a lower level but also have separate mutexes for inbound and outbound messages.

Feel free to add an RFE for this.

Steve |
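To illustrate the idea of separate mutexes for inbound and outbound messages, here is a toy sketch. It is emphatically not how `quickfix.Session` is or would be implemented: `SplitLockSession` is a hypothetical class that ignores sequence numbers, the message store, and all other session state that a real change would have to protect. It only shows that when inbound processing and outbound sending use different locks, two sessions can forward to each other at the same time without forming a cycle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.function.Consumer;

/** Hypothetical toy session with one lock per direction. */
final class SplitLockSession {
    private final Object inboundLock = new Object();
    private final Object outboundLock = new Object();
    private final List<String> sent = new ArrayList<>();
    private volatile Consumer<String> application = message -> { };

    void setApplication(Consumer<String> application) {
        this.application = application;
    }

    /** Inbound path: holds only the inbound lock while the callback runs. */
    void next(String message) {
        synchronized (inboundLock) {
            application.accept(message);
        }
    }

    /** Outbound path: holds only the outbound lock, so it never waits on next(). */
    void send(String message) {
        synchronized (outboundLock) {
            sent.add(message);
        }
    }

    int sentCount() {
        synchronized (outboundLock) {
            return sent.size();
        }
    }

    /** Two sessions forward to each other while both are inside next(). */
    static int demo() {
        SplitLockSession a = new SplitLockSession();
        SplitLockSession b = new SplitLockSession();
        CyclicBarrier bothInside = new CyclicBarrier(2);
        a.setApplication(message -> { awaitQuietly(bothInside); b.send(message); });
        b.setApplication(message -> { awaitQuietly(bothInside); a.send(message); });

        Thread dispatcherA = new Thread(() -> a.next("QuoteRequest"));
        Thread dispatcherB = new Thread(() -> b.next("Quote"));
        dispatcherA.start();
        dispatcherB.start();
        try {
            dispatcherA.join();
            dispatcherB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.sentCount() + b.sentCount();
    }

    private static void awaitQuietly(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The barrier forces both callbacks to be running at once, which is exactly the situation that deadlocked in the trace. Each dispatcher holds one inbound lock and briefly takes the other session's outbound lock, so no thread ever waits for a lock whose owner is waiting on it.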