Thread: [SSI] [Design] Cluster wide TCP and LVS Part 2
From: Aneesh K. K.V <ane...@di...> - 2002-04-15 04:27:48
CLUSTER WIDE TCP/IP: MODIFICATIONS TO bind(), connect() AND close():

(1) bind() requires two features: it should allow load balancing and a
cluster-wide port space. If one is interested in load balancing rather
than the cluster-wide port space, one needs to pass a flag such as
IP_PARALLEL_SERVER. (Where to pass the flag? We will discuss that
later.) If this flag is specified, bind() will neither consult the LVS
database nor register the port and routing information in the LVS
database/routing table. Instead the user should do that manually using
the ipvsadm command. It is assumed that load balancing is needed for
system daemons like telnetd and httpd, which have a startup script.
This startup script can be used to do the above job.

CONSULTING THE LVS ROUTING TABLE/DATABASE:
We will have an NSC service to read, write and delete entries of the
LVS routing table.

(2) Cluster-wide port space:
We will use the above service to see if the port is already occupied.

MODIFICATION TO connect():
To allow a node to make an outgoing connection with the CVIP as the
address, add an entry into the LVS database, using the above NSC
service.

MODIFICATION TO close():
Delete the entry from the LVS database, using the above NSC service.

Anything missing?

-aneesh
From: Bruce W. <br...@ka...> - 2002-04-30 00:06:07
> CLUSTER WIDE TCP/IP:

Let's see if I understand the environment, issues and options. There
are several kinds of listening daemons we may wish to run in the
cluster. All of them no doubt will just do wildcard listens:

  a. daemons that are strictly local, not parallel, and don't want to
     be listening on the VIPs (just the local physical interfaces);
  b. single-instance daemons (not parallel) that might be started
     anywhere in the cluster and mostly want to listen on the VIP
     with a fixed port number;
  c. single-instance daemons (not parallel) that might be started
     anywhere in the cluster, mostly want to listen on the VIP, and
     want to use an ephemeral port number;
  d. parallel-instance daemons (like http) that want to run on every
     node and listen on the VIP and local interfaces, with VIP
     traffic being load-levelled to them.

LVS is just fine at handling case a (not involved), case b (a somewhat
degenerate case of d) and case d (what LVS was meant for). Case c is a
problem because you need a unique port number which the system must
pick, and you need it registered with ldirectord. The question is
whether this is a very interesting case to worry about.

Now, the main place that port space is an issue is with those doing
connects, which often get an ephemeral local port number for the
socket. The port space needs only to be unique w.r.t. the IP address,
so this is a clusterwide issue only when we allow binds/connects on
VIPs (normally the program doesn't specify an IP address and the
system picks the one that it is going to route the traffic out on). If
we want to support the VIP as the address on outgoing connections (I
suspect no one else does this?), we have to provide a clusterwide port
space and must register the port with ldirectord so that the traffic
from the remote node can be directed to the local node that initiated
the connection.

I think we need to discuss the specific goals a little more before
going on to the implementation.
bruce
From: Aneesh K. K.V <ane...@di...> - 2002-04-30 15:26:46
On Tue, 2002-04-30 at 05:23, Bruce Walker wrote:
> [...]
> c. single-instance daemons (not parallel) that might be started
>    anywhere in the cluster, mostly want to listen on the VIP, and
>    want to use an ephemeral port number;
> [...]
I guess (c) and connect() have almost the same characteristics. The
main place the hook is needed is in the get_port functions in the
network stack. When one does bind() it makes a call to inet_bind(),
and to inet_autobind() (same as case c) when one calls sendmsg()
without binding, or does a datagram connect. Both of these calls
(inet_bind() and inet_autobind()) and connect() make a
protocol-specific get_port call. So our hook could basically go into
the protocol-specific get_port. For TCPv4 it looks like:

  static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
  {
          /* snum is the port number; snum == 0 means bind to any
           * free port */
          if (ssi_get_port(sk, snum) < 0) {
                  ret = 1;
                  goto fail;
          }
          ...
  }

ssi_get_port() will make a call to the SSI service. So we will have
two SSI/NSC services: one to add an entry to the LVS table and another
to delete an entry from the LVS table. The first job of the NSC
service is to query the LVS table and see whether the service is
already registered, and, for snum == 0, to get a free port. We will
identify a connect() call by checking whether we have both the
destination and source IP specified.

This NSC service will make a call to the netfilter function

  int nf_setsockopt(struct sock *sk, int pf, int val, char *opt, int len);

Here we can neglect sk because it is not used at all;
  pf  = PF_INET
  val = IP_VS_SO_SET_ADD for bind(), which is basically adding a
        service, and a combination of IP_VS_SO_SET_ADD and
        IP_VS_SO_SET_ADDDEST for connect(), which is basically adding
        a tuple
  opt = optval  /* struct ip_vs_rule_user *urule */
  len = optlen  /* sizeof the above struct */

This is just a rough sketch; a lot of complexity is involved here.
First, I guess, we will try to get bind()/close() to work and then we
will try to get connect(). What I understood is that connect() is not
largely different from bind(). One added complexity is that if the
actual connect fails, we may need to remove the entry from the LVS
table which was added in the get_port routine.
> I think we need to discuss the specific goals a little more before
> going on to the implementation.
>
> bruce
From: John H. <john@Calva.COM> - 2002-05-01 18:40:01
> If we want to support the VIP as
> the address on outgoing connections (I suspect no one else does this?),
> we have to provide a clusterwide port space and must register the
> port with ldirectord so that the traffic from the remote node can
> be directed to the local node that initiated the connection.

I need this. In fact, on UnixWare NSC I'm having to modify some
programs to explicitly bind the CVIP before doing the connect. Why?
Because the other end of the connection knows my address and will
reject the call if it's not from me. I want them to call the CVIP for
reliability, so I have to call from the CVIP.