Today I discovered that even if a server is marked as
down in SDDB, PCs will still connect to it. This is
important to know if you expect clients to move away
from a server once you mark it down. The good news is
that there is a workaround.
This is what I found. Let me explain using an example.
1/ CEPSWatch (running on a PC) is responsible for
maintaining a list of up to 3 servers.
2/ The Providor Server will supply a list of up to 3
servers to CEPSWatch.
If you know the IP address of the PC, you can get this
list by running "providor -a <PC ipaddress>" on any
server, e.g.:
18:58:16 1002 $ providor -a 64.104.195.10
------------------------<snip>-------------------------
<yourhost ipaddr="64.104.195.10"/>
<useserver ipaddr="64.104.204.6" level="1"/>
<useserver ipaddr="64.104.204.4" level="2"/>
<useserver ipaddr="64.104.204.5" level="3"/>
-------------------------<snip>------------------------
3/ Now assume that you mark server 64.104.204.6 as
down in SDDB. When you re-run this command you will
only see:
<useserver ipaddr="64.104.204.4" level="1"/>
<useserver ipaddr="64.104.204.5" level="2"/>
i.e. servers that are marked down in SDDB are not
returned in the server list.
4/ However, CEPSWatch remembers the list of servers
from the previous time. It adds the current list of
servers to the previous list, removes duplicates, and
comes up with a new list of servers. In this example,
CEPSWatch will end up with the following list:
<useserver ipaddr="64.104.204.4" level="1"/>
<useserver ipaddr="64.104.204.5" level="2"/>
<useserver ipaddr="64.104.204.6" level="3"/>
Notice that the .6 server is carried over into the new
list from the previous list.
5/ CEPSWatch will then ping all 3 servers. Those that
respond are retained in the list and ordered by
response time. So in our example, 64.104.204.6
will remain in the list of servers even though it is
marked down!
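Steps 3/ to 5/ can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the actual CEPSWatch code: the function names, the cap of 3 applied after merging, and the response times are assumptions made for the example.

```python
# Illustrative sketch only; names and response times are hypothetical.

def merge_server_lists(current, previous, limit=3):
    """Start from the current list, then append remembered servers
    that are not already present (duplicates are dropped)."""
    merged = list(current)
    for ip in previous:
        if ip not in merged:
            merged.append(ip)
    # Assumption: the "up to 3 servers" limit is applied after merging.
    return merged[:limit]


def retain_responders(servers, response_ms):
    """Keep servers that answered the ping, ordered by response time.
    response_ms maps ip -> milliseconds, or None for no response."""
    alive = [ip for ip in servers if response_ms.get(ip) is not None]
    return sorted(alive, key=lambda ip: response_ms[ip])


# List remembered from before .6 was marked down in SDDB.
previous = ["64.104.204.6", "64.104.204.4", "64.104.204.5"]
# List returned by the providor after .6 was marked down.
current = ["64.104.204.4", "64.104.204.5"]

merged = merge_server_lists(current, previous)
print(merged)  # .4, .5, then .6 carried over from the previous list

# Made-up ping results: .6 still answers, so it stays in the list.
response_ms = {"64.104.204.4": 12.0, "64.104.204.5": 30.0, "64.104.204.6": 8.0}
print(retain_responders(merged, response_ms))

# If .6 stops answering, it drops out of the list.
response_ms["64.104.204.6"] = None
print(retain_responders(merged, response_ms))
```

As long as the down server keeps answering, nothing in this loop ever removes it.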
Therein lies the problem. We have marked a server
down, but we can't stop clients that previously
connected to that server from continuing to use it.
The workaround: if you mark a server down because it
is unstable and should not be used by the PCs, then
place it in recover mode. In recover mode, the
providor server is not running (i.e. nothing is
listening on tcp/1029 or udp/1029). This means that when
CEPSWatch attempts to talk to it, there will be no
response from the server, and CEPSWatch will therefore
move away from it. This is the surest way to get
clients off a server.
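To confirm that a server in recover mode is really no longer answering, something like the following could be run from a PC or another server. It is a minimal sketch, assuming a plain TCP connect to port 1029 is a fair stand-in for "the providor is listening"; the function name is hypothetical, and udp/1029 is not checked because a UDP probe cannot reliably tell "closed" from "no reply".

```python
import socket


def tcp_port_open(host, port=1029, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port.

    Hypothetical helper: it only tests reachability of the port and
    does not speak the providor protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For the example above, tcp_port_open("64.104.204.6") should return False once the server is in recover mode.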
I have verified this with Greg and he agrees. Greg
will fix this problem in the next version of the
providor. The initial plan is for the server to set the
weighting of each server (this number is currently used
to measure the response from a server) to a
predetermined value so that CEPSWatch will recognize
that as meaning "don't use this server".
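One way the planned fix could look on the client side is sketched below. Everything here is an assumption for illustration: the sentinel value, the (ip, weight) pairs, and the idea that the providor would still list a down server with the sentinel weight rather than omit it. The real design is Greg's to decide.

```python
# Hypothetical sentinel; the actual predetermined value is not yet defined.
DO_NOT_USE_WEIGHT = -1


def merge_with_sentinel(current_entries, previous_ips, limit=3):
    """Merge the current and remembered lists, but never keep a server
    that the providor has flagged with the sentinel weight.

    current_entries: list of (ip, weight) pairs from the providor.
    previous_ips:    list of ips remembered from the previous run.
    """
    blocked = {ip for ip, weight in current_entries
               if weight == DO_NOT_USE_WEIGHT}
    merged = [ip for ip, weight in current_entries if ip not in blocked]
    for ip in previous_ips:
        if ip not in merged and ip not in blocked:
            merged.append(ip)
    return merged[:limit]


current_entries = [
    ("64.104.204.6", DO_NOT_USE_WEIGHT),  # marked down in SDDB
    ("64.104.204.4", 1),
    ("64.104.204.5", 2),
]
previous_ips = ["64.104.204.6", "64.104.204.4", "64.104.204.5"]
print(merge_with_sentinel(current_entries, previous_ips))
```

The point of the sentinel is that the remembered .6 entry is dropped even though the server would still answer a ping.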
Ticket moved from /p/ceps/feature-requests/95/