From: Changju G. <CG...@no...> - 2006-02-24 18:25:52
Hi Johan,

Sorry, I don't know if anyone is working on this. I surely hope so. I also hope Heartbeat will provide a unique ID for a cluster. I opened a defect on the ID issue.

Regards,
- Changju

>>> <jo...@ca...> 2/24/2006 12:47 am >>>

"Changju Gao" <CG...@no...>
Sent by: evm...@li...
2006-02-24 02:02

To: "Steve Dobbelstein" <st...@us...>
Cc: <evm...@li...>
Subject: Re: [Evms-devel] Re: Novell HA plug-in

Hi, Changju!

Thanks for working on this HA plugin, it is very much needed! How about the heartbeat dynamic configuration, do you know if anyone is working on that?

Rgrds
Johan

Hi Steve,

Thank you for reviewing the files.

I have modified the code to report only the nodes that are configured (instead of the maximum) and tested it on my cluster. Please see the attached source file for details. I figure we can wait until Heartbeat allows dynamic configuration to continue the discussion.

I did run into problems with newly added nodes, but that was with the plugin for Novell's cluster product, not HB2. If I remember correctly, EVMS will retry sending a message a limited number of times; it's not an infinite loop. I saw this behavior quite a few times in the log. Nevertheless, you are welcome to change the return code.

Good to know HB has strictly ordered messages. I guess that will make my life easier if I must break large messages into small pieces.

Please feel free to modify the Makefile.in. I appreciate your efforts there.

Best regards,
- Changju Gao

>>> Steve Dobbelstein <st...@us...> 2/23/2006 4:03 pm >>>

"Changju Gao" <CG...@no...> wrote on 02/21/2006 06:29:05 PM:

> Hi Steve,
>
> Thank you for merging the files and creating the patch.
>
> I have revised the code according to your comments and tested the
> plugin. I said last time that I realized that I have to include evmsccm
> and evms_failover to help facilitate failing containers over. Sorry, I
> had to modify Makefile.in to include the two utilities. Please review
> it for me. Thanks.

Hi, Changju.

I looked over your new files. The changes look good. I have a few replies to your new comments. Rather than put them in the source file, I'm putting them here, since they are now topical comments rather than comments on particular lines of code.

// SLD2 I think you are confusing configuration and membership. The
// SLD2 configuration is the definition of all the nodes in the cluster, whether
// SLD2 they are running or not. Unless HA added some functionality of which I
// SLD2 am not aware, the configuration does not change. What can change is the
// SLD2 membership, i.e., which nodes are currently active. The engine/daemon
// SLD2 does not need to handle changes in the configuration; the configuration
// SLD2 is static. The engine/daemon does need to handle changes in the
// SLD2 membership. Indeed, it needs work there, since it doesn't do it well right now.
// SLD2 It's on my list of things to do.
// SLD2 That said, I suppose you can decide what to report for the nodes defined
// SLD2 as the cluster configuration. Since the engine/daemon will only deal
// SLD2 with nodes that are in the membership, it probably won't do anything
// SLD2 with the other nodes in the configuration, so you probably won't run the
// SLD2 risk of the engine/daemon trying to do something with a node that this
// SLD2 plug-in reports is in the configuration but really isn't part of the HA
// SLD2 cluster configuration. Still, I think it is more correct to report the
// SLD2 nodes that HA says are configured rather than the maximum number of
// SLD2 nodes this plug-in supports. It can prevent problems in the future.
// SLD2 For example, if the engine/daemon ever decided to report all the nodes
// SLD2 that are configured in the cluster, the administrator could get confused
// SLD2 when he/she configured the HA cluster with 2 nodes but EVMS reports that
// SLD2 there are 16.
// CGAO2 I think I understood configuration and membership.
// CGAO2 The EVMS engine/daemon DOES need the configuration when logging (debug)
// CGAO2 information. If a string name cannot be found for a newly added node,
// CGAO2 the engine/daemon will print <null> instead. Sometimes, a null string will
// CGAO2 cause problems inside the engine/daemon, and that's the main reason
// CGAO2 behind the trick here.

Did you actually run into a problem where the engine/daemon tried to print the name of a node but got a NULL string? If so, which node's name was it trying to print? Was it a node that was defined in the heartbeat cluster or one that your plug-in defined?

// CGAO2 I decided to increase the number to 32, which is a physical limitation of my HB2
// CGAO2 implementation. I also prefer to predefine the configuration for those nodes that
// CGAO2 are not configured at the time (when the engine/daemon starts).

I'm sorry. I don't see your point. Can HA indeed handle a change in the configuration of the cluster, for example, adding a new node (converting a 2-node cluster to a 3-node cluster), while it is up and running? If so, then our clustering design is broken. It assumes that a cluster configuration doesn't change.

If the configuration cannot be changed while heartbeat is running, then there is no such thing as a "newly added node". The definition of the cluster configuration remains fixed as long as heartbeat is running. Therefore you don't have to reserve extra slots in your table, and you certainly don't have to report that there are more nodes configured than HA says there are. The engine/daemon won't ask for the name of a node that is not configured.

// SLD Need to handle the case when the message is > HA's MAXLENGTH.
// CGAO Done.
//
// SLD2 Not quite. It is not enough to just fail sending the message. The
// SLD2 message must be sent. The send_msg API has no restrictions on the
// SLD2 size of a message.
// SLD2 If the underlying clustering software has a
// SLD2 restriction on the size of a message, then it is the cluster manager
// SLD2 plug-in's responsibility to split up the message into smaller messages
// SLD2 and reassemble them at the receiving end. I know this work is not
// SLD2 simple.
// CGAO2 I totally agree that such work is not simple. However, I did dig
// CGAO2 into the implementation of heartbeat2 2.0.2. A message doesn't appear to have
// CGAO2 any limitations other than its depth. Taking the function ha_msg_addin() as
// CGAO2 an example, it tries to allocate a buffer big enough to hold the binary data,
// CGAO2 and calls add_binary_field() to add it. So if it fails, most likely the system is
// CGAO2 running out of memory, and returning EAGAIN should be appropriate.
// CGAO2 I tested the implementation rigorously and never lost messages due to
// CGAO2 the size limit.

There are some cases that I bet you didn't test that will generate large messages. For example, if you remotely administer another node and run fsck on a volume, the File System Interface Modules (FSIMs) will report the output of fsck. The FSIMs' internal buffers for the data are 10 KB, which is larger than the MAXLENGTH of 1 KB.

However, looking into the heartbeat code just a little, I see that MAXLENGTH is only used for some internal buffers. The maximum length of a message is MAXMSG, which is defined to be the same as MAXDATASIZE, which is 64 KB. So even though HA does have a limit on the message size, I think it is fair to say that EVMS won't generate messages bigger than 64 KB. So I think you are OK in not having to split the messages.

BTW, if HA had a smaller message limit that could not be ignored, returning EAGAIN would just be asking for an infinite loop. The calling code would simply try again. It would be better to return something like ENOSPC so that the caller knows what action to take, rather than retrying an operation that will just fail again.
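The return-code point above can be illustrated with a minimal sketch, assuming the plug-in checks the message size in its own send path. Everything here is hypothetical: `hb2_send_msg`, `send_with_retry`, `stub_transport` and `HB2_MAX_MSG_SIZE` are illustrative names, not EVMS or heartbeat APIs, and the 64 KB value is an assumption taken from the reading of MAXMSG above. A permanent failure returns `-ENOSPC` so the caller stops immediately, while only `-EAGAIN` is retried, and only a bounded number of times.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed limit: the thread reads MAXMSG == MAXDATASIZE == 64 KB. */
#define HB2_MAX_MSG_SIZE (64 * 1024)

/* Hypothetical transport hook: returns 0 or a negative errno value. */
typedef int (*hb2_transport_fn)(const void *buf, size_t len);

/* Hypothetical stub: fails transiently twice, then succeeds. */
static int transport_calls = 0;
static int stub_transport(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    transport_calls++;
    return transport_calls < 3 ? -EAGAIN : 0;
}

/* Hypothetical send path: an oversized message can never succeed,
 * so report -ENOSPC instead of -EAGAIN. */
static int hb2_send_msg(hb2_transport_fn transport, const void *buf, size_t len)
{
    assert(transport != NULL);
    if (len > HB2_MAX_MSG_SIZE)
        return -ENOSPC;             /* retrying cannot help */
    return transport(buf, len);     /* may return -EAGAIN (e.g. out of memory) */
}

/* Caller-side policy: retry only transient errors, and at most max_tries times. */
static int send_with_retry(hb2_transport_fn transport, const void *buf,
                           size_t len, int max_tries)
{
    int rc = -EAGAIN;
    for (int i = 0; i < max_tries && rc == -EAGAIN; i++)
        rc = hb2_send_msg(transport, buf, len);
    return rc;
}
```

With this split, an oversized message fails once without ever reaching the transport, while a transient failure is retried up to `max_tries` times instead of looping forever.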
// CGAO2 Another reason I am reluctant to implement my own communication protocol
// CGAO2 is that doing so might alter whatever ordering HB2 provides. For example,
// CGAO2 message A arrives at node 1 first, but it's incomplete. While waiting for the remaining
// CGAO2 parts of A, we might get another message from a different source. I don't know
// CGAO2 what kind of ordering HB2 provides, so I might break it no matter how I
// CGAO2 implement the marshaling here.

HA guarantees ordering of the messages. The messages will be received in the same order they are sent, and the messages will be seen in the same order on all the nodes that receive them.

It should be fairly simple to implement your own protocol for splitting messages. Add some fields to your protocol that say which message the fragment is for and perhaps which numbered fragment it is (you shouldn't need an order number, since HA guarantees they will arrive in the same order as they were sent), then reassemble them on the receiving end. If you are worried about message B coming in before all the parts of message A have arrived, you can postpone the delivery of message B until all the fragments of message A arrive. Just a thought. It's a moot point, since we established above that you shouldn't have to split the messages.

I made a couple of tweaks to Makefile.in on my end. I changed the NAME from evms-hb2 to just hb2, the "evms-" being superfluous. I removed the code that adds the settings of HA_MAJOR, HA_MINOR, and HA_PATCH to EVMS_DEFS, since they are not used by any of the code.

I look forward to your comments.

Steve D.
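Although the thread concludes that splitting is unnecessary under a 64 KB limit, the fragmentation scheme Steve outlines can be sketched as follows. This is a hypothetical illustration, not the plug-in's code: `struct frag_hdr`, `struct reasm`, `FRAG_PAYLOAD`, `make_fragment` and `reasm_add` are invented names, and the 1 KB payload size is an arbitrary assumption. Because HA is described above as delivering messages in the order they were sent, the receiver only checks that each fragment carries the next expected index instead of sorting by sequence number. One `struct reasm` is kept per sending node, so fragments from different senders may interleave.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed per-fragment payload size (not a heartbeat constant). */
#define FRAG_PAYLOAD 1024

/* Hypothetical header sent with each fragment. No reordering number:
 * ordered delivery means fragments arrive in send order. */
struct frag_hdr {
    uint32_t msg_id;     /* logical message this fragment belongs to */
    uint16_t frag_index; /* 0-based position of this fragment */
    uint16_t frag_count; /* total number of fragments for msg_id */
    uint32_t len;        /* payload bytes carried by this fragment */
};

/* Hypothetical reassembly state; keep one per sending node. */
struct reasm {
    uint32_t msg_id;
    uint16_t next_index; /* fragment index expected next */
    uint16_t frag_count;
    uint8_t *buf;
    size_t len;
};

/* Number of fragments needed for total_len bytes (at least one). */
static uint16_t frag_count_for(size_t total_len)
{
    if (total_len == 0)
        return 1;
    return (uint16_t)((total_len + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD);
}

/* Describe fragment i of msg in *hdr and return a pointer to its payload. */
static const uint8_t *make_fragment(uint32_t msg_id, const uint8_t *msg,
                                    size_t total_len, uint16_t i,
                                    struct frag_hdr *hdr)
{
    size_t off = (size_t)i * FRAG_PAYLOAD;
    size_t left;

    assert(i < frag_count_for(total_len));
    left = total_len - off;
    hdr->msg_id = msg_id;
    hdr->frag_index = i;
    hdr->frag_count = frag_count_for(total_len);
    hdr->len = (uint32_t)(left < FRAG_PAYLOAD ? left : FRAG_PAYLOAD);
    return msg + off;
}

/* Feed one fragment. Returns 1 and hands the complete message to *out
 * (caller frees it) when the last fragment arrives, 0 if more fragments
 * are needed, or -1 on an unexpected fragment or allocation failure. */
static int reasm_add(struct reasm *r, const struct frag_hdr *h,
                     const uint8_t *payload, uint8_t **out, size_t *out_len)
{
    uint8_t *nb;
    size_t need;

    if (h->frag_index == 0) {        /* first fragment: start a new message */
        free(r->buf);
        r->buf = NULL;
        r->len = 0;
        r->msg_id = h->msg_id;
        r->frag_count = h->frag_count;
        r->next_index = 0;
    }
    /* With ordered delivery, a mismatch means a lost or foreign fragment. */
    if (h->msg_id != r->msg_id || h->frag_index != r->next_index)
        return -1;

    need = r->len + h->len;
    nb = realloc(r->buf, need ? need : 1);
    if (nb == NULL)
        return -1;
    if (h->len > 0)
        memcpy(nb + r->len, payload, h->len);
    r->buf = nb;
    r->len = need;

    if (++r->next_index < r->frag_count)
        return 0;

    *out = r->buf;
    *out_len = r->len;
    r->buf = NULL;
    r->len = 0;
    r->next_index = 0;
    return 1;
}
```

This sketch delivers each message as soon as its last fragment arrives. It does not implement the suggestion of postponing message B until message A is complete; that would need an additional queue of finished messages per receiver.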