zmq_recv.vi hangs frequently in DEALER.vi from my throughput test

crocket
2013-05-08
2013-05-21
  • crocket

    crocket - 2013-05-08

    In DEALER.vi, I send a batch of messages and then receive back everything that was sent.

    When I encounter a hang, I abort DEALER.vi and ROUTER.vi.
    After aborting them and starting them again, ROUTER.vi can't bind because of an "address in use" error.

    So I have to choose another address for the test.

    Why does it happen so frequently?

     
  • Martijn Jasperse

    Aborting does not guarantee an address will be released. There are several reasons for this, including improper termination resulting in the OS holding the port open (aka the "TIME_WAIT" problem).

    Aborts should not be part of normal operating procedure; is the "hang" an OS-level hang or a block-diagram level hang? I.e. does LabVIEW continue to function, but both sides appear to be waiting on a recv?
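The "address in use" symptom can be reproduced outside LabVIEW. The following is a minimal sketch using plain TCP sockets from the Python standard library, not labview-zmq; the helper names `try_bind` and `demo` are made up for illustration. It only shows the simpler half of the problem (a port still held by a socket that was never closed, as after an abort); it does not reproduce the TIME_WAIT state itself.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on host:port, returning the open socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def demo():
    """Return a list of events describing what happened at each bind attempt."""
    events = []

    # Port 0 asks the OS to pick any free port.
    first = try_bind(0)
    port = first.getsockname()[1]
    events.append("first bind ok")

    # The first socket is still open, like a router that was aborted
    # without closing its socket.
    try:
        second = try_bind(port)
        second.close()
        events.append("second bind ok")
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            raise
        events.append("second bind: address in use")

    # Once the holder closes its socket cleanly, the port can be bound again.
    first.close()
    again = try_bind(port)
    events.append("rebind after close ok")
    again.close()
    return events


if __name__ == "__main__":
    for event in demo():
        print(event)
```

The point of the sketch is that the second bind fails only while something still holds the port; a clean close is what releases it, which is the step an abort skips.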

     
  • crocket

    crocket - 2013-05-08

    Both ROUTER.vi and DEALER.vi wait on a recv forever.

    If I raise the volume above 300,000, the probability of a hang increases noticeably.

    The higher the volume I specify in both VIs, the more likely they are to wait on a recv forever.

    And if I abort those hanging VIs often, LabVIEW crashes.

    It doesn't happen with jzmq or JeroMQ.

    Plus, you should assume that aborts are normal, since my coworker terminates VIs by aborting them. Even I used to do so until I met labview-zmq. It's hard to resist the big friendly button.

     

    Last edit: crocket 2013-05-08
  • Martijn Jasperse

    I've spent some time looking into this, and I believe the cause is incorrect use of the HWM property. If you check the API (http://api.zeromq.org/3-2:zmq-setsockopt), you'll see that the actual HWM can be as much as 60% lower than what you specify. I think messages are being silently dropped during routing once the HWM is reached, causing the dealer to wait forever for messages that are never coming. Whether it hangs or not therefore depends on how quickly messages are removed from the queue, which introduces randomness.

    It seems to work for me when I use an HWM that is 2x the actual number of messages. I noticed your tests in other languages use volume*3. Why did you not do this in the LV test?

    Also, aborts interrupt code execution and circumvent clean-up, which, combined with the lack of destructors, produces undefined behaviour: principally memory leaks and thread mangling. That is why aborts should only be used for debugging or as a last resort. It was hard enough to make it so LabVIEW doesn't crash every time you click abort.
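The HWM explanation in the post above can be illustrated with a toy model. This is a sketch under stated assumptions, not ZeroMQ's actual queueing logic: it assumes the worst case where the whole batch arrives before the reader drains anything, and that the real limit is a full 60% below the configured value. The names `worst_case_capacity`, `route_burst` and `missing_replies` are hypothetical.

```python
from collections import deque


def worst_case_capacity(configured_hwm, shortfall_percent=60):
    """Messages the queue is assumed to hold if the real limit falls short.

    Integer arithmetic is used to avoid floating-point rounding.
    """
    return configured_hwm * (100 - shortfall_percent) // 100


def route_burst(volume, configured_hwm, shortfall_percent=60):
    """Push `volume` messages into a bounded queue with no concurrent reader.

    Returns how many messages survive. Anything beyond the capacity is
    dropped silently, which models the behaviour described in the post.
    """
    capacity = worst_case_capacity(configured_hwm, shortfall_percent)
    queue = deque()
    for seq in range(volume):
        if len(queue) < capacity:
            queue.append(seq)
        # Otherwise the message is dropped and the sender is not told.
    return len(queue)


def missing_replies(volume, hwm_multiplier):
    """Replies the dealer would wait for forever at HWM = volume * multiplier."""
    delivered = route_burst(volume, volume * hwm_multiplier)
    return volume - delivered


if __name__ == "__main__":
    for multiplier in (1, 2, 3, 5):
        print(multiplier, missing_replies(1000, multiplier))
```

Under these pessimistic assumptions a multiplier of 3 or more leaves no messages missing, while smaller multipliers can still lose some. In a real run the reader drains the queue concurrently, so smaller multipliers may work in practice, which matches the observation that hanging depends on how quickly messages are removed.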

     
  • crocket

    crocket - 2013-05-09

    I set RCVHWM and SNDHWM to volume * 5, but if I run DEALER.vi several times in a row, both VIs still end up waiting on recv forever.

    I guess something is going wrong here.

     
  • Martijn Jasperse

    I set the HWMs to volume*5 in both dealer and router, and added a loop counter to "router.vi" to track how many messages it's passing. Running "dealer.vi" on "Run Continuously" has been going for some time (>30min), successfully passing 80 million messages with no hanging.

    There was a memory leak causing LV to run out of memory, which has been corrected in the most recent version (1.4.3.72), but I have not observed waiting forever since changing RCVHWM/SNDHWM in both VIs.

     
  • crocket

    crocket - 2013-05-15

    I ran dealer.vi and router.vi in LabVIEW 2012 on two Windows XP SP3 32-bit systems and one Windows 7 64-bit system.

    It seems that the issue doesn't exist on Windows 7 64-bit.
    But the issue is reproducible on the Windows XP SP3 32-bit systems.

    Can you test it on Windows XP SP3 32-bit?

     
  • Martijn Jasperse

    We no longer have any XP machines, so testing is not particularly easy, and testing inside a VM may not give the same results.
    Regardless, once I got a VM running I observed a sudden sharp increase in memory usage after ~80,000 messages had been routed, sometimes resulting in the OS running out of memory and terminating the process, and otherwise producing error 1097 (a segfault).

    Can you check whether you see a sudden spike in memory usage before it hangs?

     
  • Martijn Jasperse

    I've checked against equivalent C programs. Dealing from C and routing through LabVIEW appears to work fine: 500,000 messages were routed happily.

    However, dealing from LabVIEW and routing through C makes my VM run out of memory within 100,000 messages. This casts suspicion on the dealer code, which is odd because they both use the same send/recv calls under the hood.

    I've attached the test programs; perhaps you could perform the same test to see which side of the procedure is hanging first.

     
  • Martijn Jasperse

    I was getting a bit ahead of myself there, since 10 KB * 40,000 ≈ 390 MB, which was just enough to tip my VM over the edge. So the "sudden spike" was 10 KB per message, building up through the routing process. I had never considered the possibility of malloc() failing, hence the segfault.

    Regardless, after throwing more memory at it I'm seeing it work fine, right up until it runs out of memory. Both dealer and router in LabVIEW are exchanging messages continuously without problems (1.5 million so far). Tested in a Win XP 32-bit SP3 VM with 1 GB RAM.
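The memory estimate in the post above can be checked with a few lines of arithmetic. This is a rough sketch that counts payload bytes only and ignores any per-message overhead in ZeroMQ or LabVIEW; the function names are made up for illustration, and the figures of 10 KB per message, 40,000 messages and 1 GB of RAM come from the post.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def queued_bytes(message_bytes, queued_messages):
    """Payload bytes held in memory if every message stays queued."""
    return message_bytes * queued_messages


def messages_until_exhausted(available_bytes, message_bytes):
    """Whole messages that fit in the given memory, ignoring all overhead."""
    return available_bytes // message_bytes


if __name__ == "__main__":
    total = queued_bytes(10 * KIB, 40000)
    print(total / MIB, "MiB queued for 40,000 messages of 10 KiB")
    print(messages_until_exhausted(1 * GIB, 10 * KIB), "messages fit in 1 GiB")
```

The real ceiling will be lower than the second figure, since the OS, LabVIEW and the library itself also need memory.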

     
  • crocket

    crocket - 2013-05-16

    I ran the executables from dealer-router.zip on my Windows XP system, and it usually hangs after somewhere between 6 million and 200 million messages have been processed. I don't know which side hangs first in that test.

    I installed labview-zmq-1.4.4.73.vip and set the volume as low as 40,000, but my VIs still hang after 82,667 messages.

    Something is very very wrong.

     

    Last edit: crocket 2013-05-16
  • Martijn Jasperse

    I have filed this as ticket #3. However, as it only appears to affect XP systems and I cannot reproduce it locally, it is low priority. This is open-source, unpaid development; you are of course free to debug the source yourself, and I am happy to merge patches if you find a fix.

     
  • crocket

    crocket - 2013-05-21

    I left a note in ticket #3. Please go check.

     
