
#10 Hash modulo is constant => no balancing (figured it out)

open
nobody
None
5
2011-02-08
2011-02-08
No

Hi,

Host: Ubuntu 10.04 i386 (32 bits)
Balance 3.54, tar.gz downloaded from the InLab site, then compiled

I tried to set up load balancing of 4 services listening on 127.0.0.1:8801, 127.0.0.1:8802, 127.0.0.1:8803 and 127.0.0.1:8804 the following way:

-H -b 0.0.0.0 8800 127.0.0.1:8801:500 127.0.0.1:8802:500 127.0.0.1:8803:500 127.0.0.1:8804:500 %

Clients of this service are connecting to port 8800 on the server's public IP address.

According to the logs, no load balancing takes place: all traffic goes to a single server process. Which process receives it sometimes (but not always) changes when I restart balance. I enabled debug output and found the following:

source port 8800
file /var/run/balance/balance.8800.0.0.0.0 already exists
using AF_UNSPEC
the following channels are active:
0 0 127.0.0.1:8801:500
0 1 127.0.0.1:8802:500
0 2 127.0.0.1:8803:500
0 3 127.0.0.1:8804:500
connect from ::c41a:efbf:2232:8ab7 clilen=16
HASH-method: fold returns 3860154307
modulo 4 gives 3
trying group 0 channel 3 ... connect to channel 3 successful

Please note the source address: ::c41a:efbf:2232:8ab7

This value is meaningless and usually changes when I restart balance, so it is most likely memory garbage.

Then I read the code and found that the source address is parsed as AF_INET6, although the function actually receives a sockaddr_in structure rather than a sockaddr_in6. As a result, garbage is printed in the debug output.
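To illustrate the mismatch, here is a minimal sketch (not code from balance; `fill_ipv4_peer`, `format_as_ipv6` and `format_as_ipv4` are hypothetical helper names). It stores an IPv4 peer in a zeroed buffer and then reads it back through a sockaddr_in6 pointer. On common layouts sin_addr sits at byte offset 4 while sin6_addr starts at offset 8, so the IPv6 view never sees the client address at all: it covers sin_zero and whatever bytes follow the 16-byte sockaddr_in. That would be consistent with the logged address starting with `::` followed by changing garbage.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: store an IPv4 peer in a zeroed buffer,
 * similar to what accept() fills in for an IPv4 client. */
void fill_ipv4_peer(struct sockaddr_storage *ss, const char *ip,
                    unsigned short port)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;
    int rc;

    memset(ss, 0, sizeof *ss);
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    rc = inet_pton(AF_INET, ip, &sin->sin_addr);
    assert(rc == 1); /* the sketch expects a valid dotted-quad string */
    (void)rc;
}

/* Mimics the bug: ignores the address family and reads the buffer
 * as if it held a sockaddr_in6. */
const char *format_as_ipv6(const struct sockaddr_storage *ss,
                           char *buf, socklen_t len)
{
    const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ss;
    return inet_ntop(AF_INET6, &sin6->sin6_addr, buf, len);
}

/* Correct view for an IPv4 peer. */
const char *format_as_ipv4(const struct sockaddr_storage *ss,
                           char *buf, socklen_t len)
{
    const struct sockaddr_in *sin = (const struct sockaddr_in *)ss;
    return inet_ntop(AF_INET, &sin->sin_addr, buf, len);
}
```

Because the buffer is zeroed in this sketch, the IPv6 view yields the same string for every client. In a real process the bytes after the sockaddr_in are not guaranteed to be zero, so the printed value can differ between runs.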

It also results in the wrong client address being passed to the hash_fold function, so the modulo becomes a constant value and load balancing breaks, i.e. all traffic is directed to a single channel all the time. Fail-over can still occur, however, since this bug only fixes the initially selected channel number.
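The effect on channel selection can be sketched as follows. This is an illustration only: `fold_bytes` is a stand-in hash written for this example, not the hash_fold implementation from balance, and `pick_channel` is a hypothetical name for the modulo step seen in the debug output. The point is that a deterministic hash over bytes that do not depend on the client always yields the same channel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in fold hash (an assumption, NOT balance's hash_fold):
 * any deterministic hash shows the same effect. */
unsigned int fold_bytes(const unsigned char *data, size_t len)
{
    unsigned int h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + data[i];
    return h;
}

/* Channel index = hash modulo number of channels, matching the
 * "modulo 4 gives N" lines in the debug output. */
unsigned int pick_channel(unsigned int hash, unsigned int nchannels)
{
    assert(nchannels > 0);
    return hash % nchannels;
}
```

With the values from the logs, `pick_channel(3860154307u, 4)` gives 3 for the garbage address, while the fixed build produces 3, 0 and 2 for the three different clients (hashes 4291690323, 2556296 and 4291642362).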

As a temporary workaround I replaced sockaddr_in6 with sockaddr_in and sin6_addr with sin_addr in the appropriate places, but this is not the right way to fix it. After applying these changes at two places in the code (see the attached diff file), load balancing started to work as expected:

source port 8800
file /var/run/balance/balance.8800.0.0.0.0 already exists
using AF_UNSPEC
the following channels are active:
0 0 127.0.0.1:8801:500
0 1 127.0.0.1:8802:500
0 2 127.0.0.1:8803:500
0 3 127.0.0.1:8804:500
connect from 146.2.193.68 clilen=16
HASH-method: fold returns 4291690323
modulo 4 gives 3
trying group 0 channel 3 ... connect to channel 3 successful
argv[0]=/usr/local/bin/balance
bindhost=0.0.0.0
connect from 146.2.193.68 clilen=16
HASH-method: fold returns 4291690323
modulo 4 gives 3
trying group 0 channel 3 ... connect to channel 3 successful
connect from 89.160.167.168 clilen=16
HASH-method: fold returns 2556296
modulo 4 gives 0
trying group 0 channel 0 ... connect to channel 0 successful
argv[0]=/usr/local/bin/balance
bindhost=0.0.0.0
connect from 89.160.167.168 clilen=16
HASH-method: fold returns 2556296
modulo 4 gives 0
trying group 0 channel 0 ... connect to channel 0 successful
connect from 145.236.32.95 clilen=16
HASH-method: fold returns 4291642362
modulo 4 gives 2
trying group 0 channel 2 ... connect to channel 2 successful
...

Please note that the attached workaround only works with IPv4 and is NOT guaranteed to work with IPv6! It won't crash, but it will display wrong source addresses in the debug output and might not balance the traffic properly, because the hash algorithm receives wrong (possibly constant) data. So DO NOT use my patch for IPv6!

I think every place where balance handles an IPv4 or IPv6 data structure should be reviewed. The two structures should not be mixed in a non-type-safe way by casting in many places. Instead, a variable (such as the address family) should distinguish the two cases, and conditionals should clearly separate the code paths and the variables used in each.
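One possible shape for such a separation, as a sketch under assumptions rather than a patch for balance: branch on sa_family once and hand back the correct address bytes and length, so that both the debug output and the hash use the same, correctly typed data. The function names `peer_addr_bytes` and `peer_to_string` are hypothetical and do not exist in balance.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return a pointer to the raw address bytes and
 * their length, chosen by address family. Returns 0 on success and
 * -1 for an unsupported family. */
int peer_addr_bytes(const struct sockaddr *sa, const void **addr,
                    size_t *len)
{
    switch (sa->sa_family) {
    case AF_INET:
        *addr = &((const struct sockaddr_in *)sa)->sin_addr;
        *len = sizeof(struct in_addr);   /* 4 bytes */
        return 0;
    case AF_INET6:
        *addr = &((const struct sockaddr_in6 *)sa)->sin6_addr;
        *len = sizeof(struct in6_addr);  /* 16 bytes */
        return 0;
    default:
        return -1;
    }
}

/* Hypothetical helper: format the peer address for log output using
 * the family stored in the structure itself. Returns NULL on error. */
const char *peer_to_string(const struct sockaddr *sa, char *buf,
                           socklen_t buflen)
{
    const void *addr;
    size_t len;

    if (peer_addr_bytes(sa, &addr, &len) != 0)
        return NULL;
    return inet_ntop(sa->sa_family, addr, buf, buflen);
}
```

The bytes returned by `peer_addr_bytes` could then be passed to the hash function with the matching length, instead of always assuming 16 bytes at the IPv6 offset.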

Discussion

  • Viktor Ferenczi

    Viktor Ferenczi - 2011-02-08

    Patch to fix Balance 3.54 to work properly with IPv4 and hash based load balancing.

     
  • Viktor Ferenczi

    Viktor Ferenczi - 2011-02-08
    • summary: Hash module is constant => not balanced (figured it out) --> Hash modulo is constant => no balancing (figured it out)
     
