From: Changju G. <CG...@no...> - 2004-08-30 21:32:56
Hi, Steve,

Thank you for the help. I got it to work correctly. Most of the problems were caused by my assumption that a message sent by the daemon instance of my plugin should be received by the engine instance. Apparently, that's not right.

The plugin works with Novell Cluster Services.

Best regards,
- Changju

>>> Steve Dobbelstein <st...@us...> 8/30/2004 10:09:32 AM >>>

"Changju Gao" <CG...@no...> wrote on 08/27/2004 06:38:14 PM:

Hi, Changju.

> I am working on an ECE plugin.

Great! Thanks for your interest in contributing to the project. We will be happy to help. May I ask for which clustering platform you are writing your plug-in?

> when the plugin is asked by EVMS engine to send out a message,
> the field "corrolator" is always set to 0.

Most of the time, yes. There are a few rare cases where the Engine will reuse a message and will leave the corrolator as it was set by the ECE.

> According to the document at
> http://evms.sourceforge.net/clustering/ece_api_guide.txt,
> the engine should be able to accept a new correlator set by the
> plugin if correlator comes in as 0.
>
> Tests revealed that the engine is still waiting for messages with
> "corrolator == 0" even if the plugin sets the "corrolator" field to
> something else.

Hmmm. I don't see that on my cluster which is running Linux HA. When the ECE returns from the call to its send_msg() it fills in the corrolator in the message with a non-zero value. The Engine should never be waiting for a corrolator of zero. Zero is not a valid corrolator.

> According to the following code, EVMS puts a talk into the
> "talk_list" before asking a plugin to send the message out. A plugin
> can change the "corrolator" of "talk->say",

Yes. In fact, the ECE *must* change the corrolator in the message if it was called with a zero corrolator in the message. (This is not explicitly stated in the EVMS CLUSTERED ENGINE(ECE) API.) Zero cannot be returned as a valid corrolator.
If zero were considered to be a valid corrolator, then when the ECE is asked to send a message with a zero corrolator it could not distinguish a "valid" corrolator of zero from a request from the sender to generate a new corrolator.

> but not the entry already in the "talk_list", which will later be
> used to screen incoming responses (in handle_response()).

The "talk" is put on the talk_list. "say" is a message that is a field within the talk_t. The talk->say is the message that is sent. The ECE should be updating the corrolator in the talk->say, which is in the "talk" on the talk_list. There is only one message. There are not two copies.

> So if the "corrolator" field is set by a plugin, EVMS will never get
> the response back. And worse, EVMS can easily mistake an incoming
> message for a response. Here is how that can happen.
>
> Node 1 sends message1 to Node 2.
> Node 2 sends a status message to node 2.

I think you meant "Node 2 sends a status message to node 1."

> Node 1 finds the status message "match" message1 in its talk_list
> because the status message's address is "node 2" and
> "corrolator==0". In handle_response(), EVMS sets the rc to 0, even
> after it realizes that the status message is a request, causing
> node 1 to think message1 has been replied to and node 2 to wait for
> a response to the status message that will never come.

The scenario you describe does in fact happen. handle_response() will put a copy of the status request into the "hear" field in the talk_t, even though it is not a response to the command that was sent. It is, however, a response of some sort from the other node. The status request is handled by the caller to wait_for_response(). The caller of wait_for_response() must check to see if what it got back was a reply to the command it sent. If it isn't, then it should handle the incoming request, which includes returning a reply to the other node.
For an example of this, see transact_message() where it has:

    do {
        wait_for_response(talk);
        if (talk->rc == 0) {
            if (!(talk->hear.cmd & COMMAND_RESPONSE)) {
                rc = handle_callback(talk);
            }
        } else {
            rc = talk->rc;
        }
    } while ((rc == 0) && ((talk->hear.cmd & COMMAND_MASK) != cmd));

If you are worried that a status message may get lost because it comes in after the response to the original command, the Engine only sends status messages between the start and end of processing a command.

It looks to me like the ECE might not be updating the zero corrolator field in the message when the message is sent successfully. Or perhaps there was an error in sending the message and an error code was not returned from the ECE's send_msg().

Hope this helps.

Steve D.
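
To make the corrolator rule discussed in this thread concrete, here is a minimal sketch of what the send path of an ECE might do. It is not EVMS code: msg_sketch_t, next_corrolator, new_corrolator() and sketch_send_msg() are made-up names for illustration, and the real message structure and send_msg() signature in the EVMS headers will differ. The sketch only assumes what the thread states: a zero corrolator means "please assign one", the new value is written into the caller's own message (there is no second copy), and zero is never handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the message structure; NOT the real EVMS type. */
typedef struct {
    uint32_t node;        /* destination node (illustrative only) */
    uint32_t corrolator;  /* 0 means "sender wants a new corrolator" */
    uint32_t cmd;         /* command code (illustrative only) */
} msg_sketch_t;

/* Next value to hand out; starts at 1 because 0 is never valid. */
static uint32_t next_corrolator = 1;

/* Return a fresh non-zero corrolator, skipping 0 on wrap-around. */
static uint32_t new_corrolator(void)
{
    uint32_t c = next_corrolator++;
    if (next_corrolator == 0) {
        next_corrolator = 1;
    }
    return c;
}

/* Hypothetical send routine: returns 0 on success, non-zero on error. */
int sketch_send_msg(msg_sketch_t *msg)
{
    assert(msg != NULL);

    if (msg->corrolator == 0) {
        /* Update the caller's message in place, so the entry the caller
         * already queued for matching replies sees the new value. */
        msg->corrolator = new_corrolator();
    }

    /* A real plug-in would hand the message to the cluster transport here
     * and return an error code if that fails; this sketch sends nothing. */
    return 0;
}
```

Called a second time on the same message, the sketch leaves the corrolator it assigned the first time untouched, which corresponds to the rare case mentioned above where the Engine reuses a message.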