Thread: [SSI] [Cluster Alias/CVIP] Wish List or is it already there?
From: Aneesh K. K.V <ane...@di...> - 2002-03-26 08:01:58
Hi,

I was going through the cluster documentation of Tru64 to figure out what
exactly CVIP is. It made me put together this wish list. I wonder if any of
this is already there.

1) Since we talk of a Single System Image, I guess we need the external
interface (I mean the interface connected to the external network) of every
node to be shown on each node when I do ifconfig (I guess Brian once
mentioned this. Brian?). That is, on node1, if I do an ifconfig -a I should
get

eth0
eth1
eth2
....

where eth1 and eth2 are interfaces on node2 and node3. I should be able to
do any ifconfig operation on these interfaces, and I should be able to run a
server on node1 that listens on the IP associated with eth1 or eth2 (I guess
this brings in the requirement that bind/connect/listen be clusterwide. Am I
correct, Bruce?). I am not sure how we should name the cluster interconnect
interface in this scheme.

2) It would be great if I could see my cluster alias as an interface on all
the nodes, and if an "ifconfig clu0 down" would bring the cluster IP down.
If you have multiple cluster aliases, ifconfig should show only those
aliases to which the particular node belongs.

3) I should be able to configure a cluster alias and say that node1, node2
and node3 belong to this alias while node4, node5 and node6 belong to
cluster alias2.

4) User-configurable, per-connection load balancing for multi-instance
configurations of servers such as a web server (LVS?).

5) ...... (not there yet, but as I read more about CVIP I will update this
part :) )

-aneesh
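[Editor's note: a rough sketch of the single-system view that items 1) and
2) describe, using hypothetical interface and alias names (eth0..eth2, clu0)
rather than any existing implementation:

    # On node1, a cluster-wide ifconfig would list every node's external
    # NIC plus the cluster alias (names are made up for illustration):
    node1# ifconfig -a
    eth0   ...   # NIC physically on node1
    eth1   ...   # NIC physically on node2, visible/configurable from node1
    eth2   ...   # NIC physically on node3
    clu0   ...   # cluster alias interface, shown only on member nodes

    # Any ifconfig operation should work on a remote interface:
    node1# ifconfig eth2 netmask 255.255.255.0

    # Taking the alias interface down should take the cluster IP down
    # cluster-wide, without touching the per-node physical interfaces:
    node1# ifconfig clu0 down
]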
From: John H. <john@Calva.COM> - 2002-03-26 08:59:06
> I was going through the cluster documentation of Tru64 to figure out
> what exactly CVIP is. It made me put together this wish list. I wonder
> if any of this is already there.

Shouldn't you be looking at the UnixWare NSC doc rather than Tru64? Or is
the cvip implementation the same?

> 1) Since we talk of a Single System Image, I guess we need the external
> interface (I mean the interface connected to the external network) of
> every node to be shown on each node when I do ifconfig (I guess Brian
> once mentioned this. Brian?)

Here's what I get on my NSC system: (net0.[12] is the cluster interconnect,
net1.[12] is the external ether, cvip0 is the cvip and lo0 is the loopback.)

# onall ifconfig -a
(node 1)
lo0: flags=4049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask ff000000
net0.1: flags=24043<UP,BROADCAST,RUNNING,MULTICAST,ICS> mtu 1500
        inet 192.168.255.1 netmask ffffff00 broadcast 192.168.255.255
        ether 00:03:47:40:ce:a3
cvip0: flags=84041<UP,RUNNING,MULTICAST,CVIP> mtu 1500
        inet 213.39.1.236 netmask ffffffe0
net1.1: flags=4043<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 213.39.1.243 netmask ffffffe0 broadcast 213.39.1.255
        ether 00:06:5b:38:49:cf
net1.2: flags=44043<UP,BROADCAST,RUNNING,MULTICAST,REMOTE> mtu 1500
        inet 213.39.1.244 netmask ffffffe0 broadcast 213.39.1.255
        ether 00:06:5b:38:49:ba
net0.2: flags=64043<UP,BROADCAST,RUNNING,MULTICAST,ICS,REMOTE> mtu 1500
        inet 192.168.255.2 netmask ffffff00 broadcast 192.168.255.255
        ether 00:03:47:3f:c9:0d
(node 2)
lo0: flags=4049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask ff000000
net0.2: flags=24043<UP,BROADCAST,RUNNING,MULTICAST,ICS> mtu 1500
        inet 192.168.255.2 netmask ffffff00 broadcast 192.168.255.255
        ether 00:03:47:3f:c9:0d
cvip0: flags=84041<UP,RUNNING,MULTICAST,CVIP> mtu 1500
        inet 213.39.1.236 netmask ffffffe0
net1.2: flags=4043<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 213.39.1.244 netmask ffffffe0 broadcast 213.39.1.255
        ether 00:06:5b:38:49:ba
net0.1: flags=64043<UP,BROADCAST,RUNNING,MULTICAST,ICS,REMOTE> mtu 1500
        inet 192.168.255.1 netmask ffffff00 broadcast 192.168.255.255
        ether 00:03:47:40:ce:a3
net1.1: flags=44043<UP,BROADCAST,RUNNING,MULTICAST,REMOTE> mtu 1500
        inet 213.39.1.243 netmask ffffffe0 broadcast 213.39.1.255
        ether 00:06:5b:38:49:cf

I.e. the order of the output is different but the same information is
printed.

> .... where eth1 and eth2 are interfaces on node2 and node3. I should be
> able to do any ifconfig operation on these interfaces.

Can do on UW NSC.

> I should be able to run a server on node1 that listens on the IP
> associated with eth1 or eth2

Can do on UW NSC.

> 2) It would be great if I could see my cluster alias as an interface on
> all the nodes, and if an "ifconfig clu0 down" would bring the cluster IP
> down.

On UW NSC doing "ifconfig cvip0 down" takes down the cvip, but the
individual node interfaces (net1.1 and net1.2 for me) stay up.

> If you have multiple cluster aliases, ifconfig should show only those
> aliases to which the particular node belongs.
> 3) I should be able to configure a cluster alias and say that node1,
> node2 and node3 belong to this alias while node4, node5 and node6 belong
> to cluster alias2.

AFAIK the cvip (all the cvip's, you can have many) on UW NSC applies to all
nodes.

> 4) User-configurable, per-connection load balancing for multi-instance
> configurations of servers such as a web server (LVS?).

On UW NSC it seems to be round robin.
From: Aneesh K. K.V <ane...@di...> - 2002-03-26 09:15:17
Hi,

On Tue, 2002-03-26 at 14:28, John Hughes wrote:
> > I was going through the cluster documentation of Tru64 to figure out
> > what exactly CVIP is. It made me put together this wish list. I wonder
> > if any of this is already there.
>
> Shouldn't you be looking at the UnixWare NSC doc rather than Tru64? Or is
> the cvip implementation the same?

I don't have a UnixWare NSC here. :) I was trying to look into the
different features that CVIP provides.

> > 1) Since we talk of a Single System Image, I guess we need the external
> > interface (I mean the interface connected to the external network) of
> > every node to be shown on each node when I do ifconfig (I guess Brian
> > once mentioned this. Brian?)
>
> Here's what I get on my NSC system: (net0.[12] is the cluster
> interconnect, net1.[12] is the external ether, cvip0 is the cvip and lo0
> is the loopback.)
>
> # onall ifconfig -a
> (ifconfig -a output for node 1 and node 2, as in John's mail above)
>
> I.e. the order of the output is different but the same information is
> printed.

I didn't know that it is already there. Great.

> > .... where eth1 and eth2 are interfaces on node2 and node3. I should be
> > able to do any ifconfig operation on these interfaces.
>
> Can do on UW NSC.
>
> > I should be able to run a server on node1 that listens on the IP
> > associated with eth1 or eth2
>
> Can do on UW NSC.
>
> > 2) It would be great if I could see my cluster alias as an interface on
> > all the nodes, and if an "ifconfig clu0 down" would bring the cluster
> > IP down.
>
> On UW NSC doing "ifconfig cvip0 down" takes down the cvip, but the
> individual node interfaces (net1.1 and net1.2 for me) stay up.

That's fine. So that is also there :)

> > If you have multiple cluster aliases, ifconfig should show only those
> > aliases to which the particular node belongs.
> > 3) I should be able to configure a cluster alias and say that node1,
> > node2 and node3 belong to this alias while node4, node5 and node6
> > belong to cluster alias2.
>
> AFAIK the cvip (all the cvip's, you can have many) on UW NSC applies to
> all nodes.

What happens if you have a cluster spread across different subnets? Say I
have a three-node cluster, node1 is on subnet1 and node2 is on subnet2, and
the cluster IP I configure falls in subnet1. A client sitting on subnet1
will deliver frames for the cluster IP directly, because both addresses fall
on the same subnet: node1 will answer the ARP for the cluster IP and pick up
the packet. Assume the packet is destined for service A, which is running on
node2 (and is bound to the cluster IP). The IP layer of node1 redirects the
packet to the TCP layer of node2 and the application works smoothly.

Now what happens if node1 goes down? My application is still running on
node2 and the cluster is still up, but further ARP requests from my client
on subnet1 will go unanswered, because node1 is down and there is no other
node in the cluster connected to that subnet.

How do we prevent this? Make it a condition that a cluster cannot spread
across subnets? Or always require the cluster IP to fall in a different
subnet, so that packets are always sent to the router and from there to one
of the nodes in the cluster?

Another solution is to allow a cluster IP to be configured per subnet, i.e.
we should have a cluster IP per subnet. In this case we should also ensure
that each cluster IP contains only nodes that are in that subnet; otherwise
the problem stated above will occur.

> > 4) User-configurable, per-connection load balancing for multi-instance
> > configurations of servers such as a web server (LVS?).
>
> On UW NSC it seems to be round robin.

It is better to have a priority-based weighted round robin. That's what
Tru64 has. It gives me fine control over load balancing.
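[Editor's note: to make the "cluster IP in a separate subnet" option above
concrete, here is a minimal sketch with a made-up cluster IP (10.99.0.1); it
assumes the upstream router is a Linux box and that an admin or routing
daemon repoints the route on failover:

    # The cluster IP lives outside both node subnets, so clients never ARP
    # for it on their own LAN; their packets go to the router. On the
    # router, forward CVIP traffic to one node's physical address:
    route add -host 10.99.0.1 gw 213.39.1.243      # via node1

    # If node1 dies, service survives as long as the route is repointed
    # at another node that also holds the cluster IP:
    route del -host 10.99.0.1
    route add -host 10.99.0.1 gw 213.39.1.244      # via node2
]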
From: John H. <john@Calva.COM> - 2002-03-26 10:26:23
>> (onall ifconfig -a ...)
>
> I didn't know that it is already there. Great.

On UnixWare NSC.

>> AFAIK the cvip (all the cvip's, you can have many) on UW NSC applies to
>> all nodes.
>
> What happens if you have a cluster spread across different subnets? Say
> I have a three-node cluster, node1 is on subnet1 and node2 is on subnet2.

AFAIK on UW NSC you can't do that. You can have multiple cvip's, each on
different subnets.

> (load balancing)
>
>> On UW NSC it seems to be round robin.
>
> It is better to have a priority-based weighted round robin. That's what
> Tru64 has. It gives me fine control over load balancing.

That'd be nice if you have asymmetric clusters. How is the priority set? Is
it a per-node priority or a per-bind priority?
From: Aneesh K. K.V <ane...@di...> - 2002-03-26 10:33:50
Hi,

> > It is better to have a priority-based weighted round robin. That's
> > what Tru64 has. It gives me fine control over load balancing.
>
> That'd be nice if you have asymmetric clusters. How is the priority set?
> Is it a per-node priority or a per-bind priority?

I think it is per node. But it would be nice to have a per-service priority
and weight, so that I can say most web server connections should go to node1
and most database connections to node2, and when node1 reaches its web
server connection limit, send further connections to node3; likewise, when
node2 reaches its database connection limit, send them to node4.

-aneesh
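[Editor's note: LVS can already express per-service weights, since each
virtual service carries its own real-server list and weights. A rough
ipvsadm sketch, with a made-up cluster IP (10.99.0.1); the .243/.244
addresses are the node IPs from John's example, while .245/.246 are invented
to stand in for node3 and node4:

    # Web connections mostly to node1, spillover to node3:
    ipvsadm -A -t 10.99.0.1:80 -s wrr
    ipvsadm -a -t 10.99.0.1:80 -r 213.39.1.243 -g -w 10    # node1, preferred
    ipvsadm -a -t 10.99.0.1:80 -r 213.39.1.245 -g -w 1     # node3, spillover

    # Database connections mostly to node2, spillover to node4:
    ipvsadm -A -t 10.99.0.1:5432 -s wrr
    ipvsadm -a -t 10.99.0.1:5432 -r 213.39.1.244 -g -w 10  # node2, preferred
    ipvsadm -a -t 10.99.0.1:5432 -r 213.39.1.246 -g -w 1   # node4, spillover

Note that wrr interleaves connections in proportion to the weights rather
than filling one node to a hard limit before moving on, so this only
approximates the fill-then-overflow policy described above.]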
From: John H. <john@Calva.COM> - 2002-03-26 10:39:35
>>> (re weighted round robin load balancing)
>>
>> That'd be nice if you have asymmetric clusters. How is the priority set?
>> Is it a per-node priority or a per-bind priority?
>
> I think it is per node. But it would be nice to have a per-service
> priority and weight, so that I can say most web server connections should
> go to node1 and most database connections to node2, and when node1
> reaches its web server connection limit, send further connections to
> node3; likewise for database connections to node4.

I think it'd have to be per service to be useful. And wouldn't you normally
send x% to node 1, y% to node 2, z% to node 3... rather than filling each
node to the limit before going on to the next?
From: Brian J. W. <Bri...@co...> - 2002-03-26 22:24:41
"Aneesh Kumar K.V" wrote:
> ... My application is still running on node2 and the cluster is still up,
> but further ARP requests from my client on subnet1 will go unanswered,
> because node1 is down and there is no other node in the cluster connected
> to that subnet.
>
> How do we prevent this? Make it a condition that a cluster cannot spread
> across subnets? Or always require the cluster IP to fall in a different
> subnet, so that packets are always sent to the router and from there to
> one of the nodes in the cluster?

I think a classic Unix mixed metaphor applies: "Give the user enough rope to
shoot themselves in the foot."

The fear that a user might put together a bad configuration does not justify
restrictions on potentially useful configurations. In the scenario described
above, the sysadmin should make sure that at least two nodes are up on
subnet 1, so that no single point of failure takes down service on that
cluster IP address.

> Another solution is to allow a cluster IP to be configured per subnet,
> i.e. we should have a cluster IP per subnet. In this case we should also
> ensure that each cluster IP contains only nodes that are in that subnet;
> otherwise the problem stated above will occur.

That could cause problems for an unmodified app that wants to read an IP
address out of a configuration file and do a listen on it. If it gets
started on a node that has no access to that cluster interface, the app will
get confused and probably exit.

Part of the SSI philosophy is that apps should be able to run without
modification on any node in the cluster, because it all looks like one big
machine. Sometimes it's acceptable to deviate from this philosophy because
it's not worth the effort to implement it 100%. An example is not allowing a
process to bind to another node's physical IP address, because the process
can just as easily bind to a highly available cluster IP address. To not
allow it the freedom to bind to any cluster IP address, however, breaks the
SSI model too much.

> It is better to have a priority-based weighted round robin. That's what
> Tru64 has. It gives me fine control over load balancing.

The round robin system on NSC was chosen for ease of implementation in the
face of higher priorities. A better system for load-leveling connections is
definitely desirable. Doesn't LVS already have algorithms for this?

--
Brian Watson                 | "Now I don't know, but I been told it's
Linux Kernel Developer       |  hard to run with the weight of gold,
Open SSI Clustering Project  |  Other hand I heard it said, it's
Compaq Computer Corp         |  just as hard with the weight of lead."
Los Angeles, CA              |     -Robert Hunter, 1970

mailto:Bri...@co...
http://opensource.compaq.com/
From: Brian J. W. <Bri...@co...> - 2002-03-26 21:41:02
"Aneesh Kumar K.V" wrote:
> 1) Since we talk of a Single System Image, I guess we need the external
> interface (I mean the interface connected to the external network) of
> every node to be shown on each node when I do ifconfig (I guess Brian
> once mentioned this. Brian?)
>
> That is, on node1, if I do an ifconfig -a I should get
>
> eth0
> eth1
> eth2
> ....
>
> where eth1 and eth2 are interfaces on node2 and node3. I should be able
> to do any ifconfig operation on these interfaces, and I should be able to
> run a server on node1 that listens on the IP associated with eth1 or eth2

Yes, that's my opinion. I dislike the NSC convention of encoding the node
number in the network device name (e.g., net0.1 on node 1, net0.2 on node
2). The SSI philosophy is that the cluster should look like one big machine
with a whole bunch of NICs, so the devices should be numbered that way.

Note that the interface-to-node-number mapping would not necessarily be eth0
on node 1, eth1 on node 2, etc. The interfaces would probably be numbered in
whatever order the cluster learns about them, although a user interface to
rearrange the numbering might be a good idea at some point.

> (I guess this brings in the requirement that bind/connect/listen be
> clusterwide. Am I correct, Bruce?)

Not exactly. I think NIC configuration should be clusterwide, so that a user
can ifconfig any interface from any node in the cluster. OTOH, allowing a
program to do a listen on any interface from anywhere in the cluster may not
be worth the work. Programs should just listen on one of the cluster
interfaces, which can be done from any node in the cluster thanks to LVS.
Correct me if I'm wrong, Kai or Bruce.

The reason for remote bind, connect, listen, etc., is a bit different. Right
now, processes can migrate, but sockets can't. Eventually, I would like to
make sockets migratable, but there could still be situations where two or
more processes on different nodes share a socket. With the current SSI code,
if a process does a socket() call, gets migrated by the load-leveler, then
tries to do a connect(), it'll break. If it finishes the connect() before it
migrates, then file ops like read(), write(), poll(), etc., will work fine.
I just need to do the same for socket ops like connect() and listen().

> 2) It would be great if I could see my cluster alias as an interface on
> all the nodes, and if an "ifconfig clu0 down" would bring the cluster IP
> down. If you have multiple cluster aliases, ifconfig should show only
> those aliases to which the particular node belongs.

As with the physical NICs, a user should be able to ifconfig a cluster
interface from any node in the cluster.

--
Brian Watson                 | "Now I don't know, but I been told it's
Linux Kernel Developer       |  hard to run with the weight of gold,
Open SSI Clustering Project  |  Other hand I heard it said, it's
Compaq Computer Corp         |  just as hard with the weight of lead."
Los Angeles, CA              |     -Robert Hunter, 1970

mailto:Bri...@co...
http://opensource.compaq.com/
From: John H. <john@Calva.COM> - 2002-03-26 23:21:21
> Yes, that's my opinion. I dislike the NSC convention of encoding the node
> number in the network device name (e.g., net0.1 on node 1, net0.2 on node
> 2). The SSI philosophy is that the cluster should look like one big
> machine with a whole bunch of NICs, so the devices should be numbered
> that way.

Personally I think that devices shouldn't be "numbered". What is the point
of knowing that this is the "nth" ethernet device?

In other words "eth23" is no more or less stupid than "net17.23", and
encodes less information. Maybe useless information, but I hate discarding
it before you know whether you need it.
From: Brian J. W. <Bri...@co...> - 2002-03-27 00:29:05
John Hughes wrote:
> Personally I think that devices shouldn't be "numbered". What is the
> point of knowing that this is the "nth" ethernet device?
>
> In other words "eth23" is no more or less stupid than "net17.23", and
> encodes less information. Maybe useless information, but I hate
> discarding it before you know whether you need it.

The reason I dislike it is that it breaks the base naming convention, which
then breaks apps and utilities that depend on the base naming convention.
For a while, I had the "joy" of maintaining the NSC changes to the UnixWare
netcfg command. Most of the changes were required because of the
node-encoded naming convention for network devices.

An alternative way to get node information for network interfaces could be
an option to the where command.

--
Brian Watson                 | "Now I don't know, but I been told it's
Linux Kernel Developer       |  hard to run with the weight of gold,
Open SSI Clustering Project  |  Other hand I heard it said, it's
Compaq Computer Corp         |  just as hard with the weight of lead."
Los Angeles, CA              |     -Robert Hunter, 1970

mailto:Bri...@co...
http://opensource.compaq.com/
From: John H. <john@Calva.COM> - 2002-03-27 10:29:00
> The reason I dislike it is that it breaks the base naming convention,
> which then breaks apps and utilities that depend on the base naming
> convention. For a while, I had the "joy" of maintaining the NSC changes
> to the UnixWare netcfg command. Most of the changes were required because
> of the node-encoded naming convention for network devices.

Ok, I see your point; reducing the number of changes to the base system
seems like a pretty important goal.