Ang: Re: [Evms-devel] Re: Novell HA plug-in

Hi Johan,

Sorry, I don't know if anyone is working on this. I surely hope so.
I also hope Heartbeat will provide a unique ID for a cluster. I
opened a defect on the ID issue.

Regards,

- Changju

>>> <jo...@ca...> 2/24/2006 12:47 am >>>
"Changju Gao" <CG...@no...>
Sent by: evm...@li...
2006-02-24 02:02

        To:      "Steve Dobbelstein" <st...@us...>
        Cc:      <evm...@li...>
        Subject: Re: [Evms-devel] Re: Novell HA plug-in

Hi, Changju!

Thanks for working on this HA plugin, it is very much needed! How 'bout
the heartbeat dynamic configuration, do you know if anyone is working on
that?

 Rgrds Johan

Hi Steve,

Thank you for reviewing the files.

I have modified the code to only report nodes that are configured
(instead of the maximum) and tested on my cluster. Please see
the attached source file for details. I figure we can wait until
Heartbeat allows dynamic configuration to continue the discussion.

I did run into problems with newly added nodes, but it was the plugin
for Novell's cluster product, not HB2.

If I remember correctly, EVMS will only try sending a message a certain
number of times. It's not an infinite loop. I saw this behavior quite a
few times in the log. Nevertheless, you are welcome to change the return
code.

Good to know HB has strictly ordered messages. I guess that will
make my life easier if I must break large messages into small pieces.

Please feel free to modify the Makefile.in. I appreciate your efforts
there.

Best regards,

- Changju Gao

>>> Steve Dobbelstein <st...@us...> 2/23/2006 4:03 pm >>>
"Changju Gao" <CG...@no...> wrote on 02/21/2006 06:29:05 PM:

> Hi Steve,
>
> Thank you for merging the files and creating the patch.
>
> I have revised the code according to your comments and tested the
> plugin. I said last time that I realized that I have to include evmsccm
> and evms_failover to help facilitate failing containers over. Sorry I
> have to modify Makefile.in to include the two utilities. Please review
> it for me. Thanks.

Hi, Changju.

I looked over your new files.  The changes look good.  I have a few
replies to your new comments.  Rather than put them in the source file
I'm putting them here, since they are now topical comments rather than
comments on particular lines of code.

// SLD2 I think you are confusing configuration and membership.  The
// SLD2 configuration is the definition of all the nodes in the cluster
// SLD2 whether they are running or not.  Unless HA added some
// SLD2 functionality of which I am not aware, the configuration does not
// SLD2 change.  What can change is the membership, i.e., which nodes are
// SLD2 currently active.  The engine/daemon does not need to handle
// SLD2 changes in the configuration; the configuration is static.  The
// SLD2 engine/daemon does need to handle changes in the membership.
// SLD2 Indeed, it needs to, since it doesn't do it well right now.
// SLD2 It's on my list of things to do.
// SLD2 That said, I suppose you can decide what to report for the nodes
// SLD2 defined as the cluster configuration.  Since the engine/daemon
// SLD2 will only deal with nodes that are in the membership, it probably
// SLD2 won't do anything with the other nodes in the configuration, so
// SLD2 you probably won't run the risk of the engine/daemon trying to do
// SLD2 something with a node that this plug-in reports is in the
// SLD2 configuration but really isn't part of the HA cluster
// SLD2 configuration.  Still, I think it is more correct to report the
// SLD2 nodes that HA says are configured rather than the maximum number
// SLD2 of nodes this plug-in supports.  It can prevent problems in the
// SLD2 future.  For example, if the engine/daemon ever decided to report
// SLD2 all the nodes that are configured in the cluster, the
// SLD2 administrator could get confused when he/she configured the HA
// SLD2 cluster with 2 nodes but EVMS reports that there are 16.

// CGAO2 I think I understood configuration and membership. EVMS
// CGAO2 engine/daemon DOES need the configuration when logging (debug)
// CGAO2 information. If a string name cannot be found for a newly added
// CGAO2 node, engine/daemon will print <null> instead. Sometimes, a null
// CGAO2 string will cause problems inside the engine/daemon. And that's
// CGAO2 the main reason behind the trick here.

Did you actually run into a problem where the engine/daemon tried to print
the name of a node but got a NULL string?  If so, which node's name was it
trying to print?  Was it a node that was defined in the heartbeat cluster
or one that your plug-in defined?

// CGAO2 I decided to increase the number to 32, which is a physical
// CGAO2 limitation of my HB2 implementation. I also prefer to predefine
// CGAO2 the configuration for those nodes that are not configured at the
// CGAO2 time (when engine/daemon starts).

I'm sorry.  I don't see your point.  Can HA indeed handle a change in the
configuration of the cluster, for example, adding a new node (converting a
2 node cluster to a 3 node cluster), while it is up and running?  If so,
then our clustering design is broken.  It assumes that a cluster
configuration doesn't change.

If the configuration cannot be changed while heartbeat is running, then
there is no such thing as a "newly added node".  The definition of the
cluster configuration remains fixed as long as heartbeat is running.
Therefore you don't have to reserve extra slots in your table and you
certainly don't have to report that there are more nodes configured than
HA says there are.  The engine/daemon won't ask for the name of a node
that is not configured.
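
As a rough illustration of reporting only what HA says is configured, here is a minimal sketch. It is not the plug-in's actual code and not the EVMS cluster manager API: `sketch_node_table`, `sketch_num_config_nodes`, `sketch_node_name`, and `SKETCH_MAX_NODES` are hypothetical names made up for this example. The lookup also returns a placeholder instead of NULL, assuming the worry is a NULL name reaching logging code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the plug-in's table (assumption for the sketch). */
#define SKETCH_MAX_NODES 32

/* Hypothetical node table: capacity is fixed, but only `configured`
 * entries are filled in from what the cluster software reports. */
struct sketch_node_table {
    size_t      configured;
    const char *names[SKETCH_MAX_NODES];
};

/* Report the number of nodes actually configured, not the capacity. */
static size_t sketch_num_config_nodes(const struct sketch_node_table *t)
{
    return t->configured;
}

/* Look up a node name. Never returns NULL: an id outside the configured
 * range (or a missing name) yields a placeholder string instead. */
static const char *sketch_node_name(const struct sketch_node_table *t, size_t id)
{
    if (id >= t->configured || t->names[id] == NULL)
        return "<unknown>";
    return t->names[id];
}
```

With a 2-node cluster the table reports 2 nodes even though it has room for 32, and a lookup for an unconfigured id yields a printable placeholder.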

// SLD need to handle the case when the message is > HA's MAXLENGTH.
// CGAO Done.
//
// SLD2 Not quite.  It is not enough to just fail sending the message.
// SLD2 The message must be sent.  The send_msg API has no restrictions
// SLD2 on the size of a message.  If the underlying clustering software
// SLD2 has a restriction on the size of a message, then it is the
// SLD2 cluster manager plug-in's responsibility to split up the message
// SLD2 into smaller messages and reassemble them at the receiving end.
// SLD2 I know this work is not simple.

// CGAO2 I totally agree that such work is not simple. However, I did dig
// CGAO2 into the implementation of heartbeat2 2.0.2. A message doesn't
// CGAO2 appear to have any limitations other than its depth. Taking
// CGAO2 function ha_msg_addin() as an example, it tries to allocate a
// CGAO2 buffer big enough to hold the binary data, and calls
// CGAO2 add_binary_field() to add it. So if it fails, most likely the
// CGAO2 system is running out of memory and returning an EAGAIN should
// CGAO2 be appropriate.
// CGAO2 I tested the implementation rigorously and never lost messages
// CGAO2 due to size limit.

There are some cases that I bet you didn't test that will generate large
messages.  For example, if you remotely administer another node and run
fsck on a volume, the File System Interface Modules (FSIMs) will report
the output of fsck.  The FSIM's internal buffers for the data are 10 KB,
which is larger than the MAXLENGTH of 1 KB.

However, looking into the heartbeat code just a little, I see that
MAXLENGTH is only used for some internal buffers.  The maximum length of a
message is MAXMSG, which is defined to be the same as MAXDATASIZE, which
is 64 KB.  So even though HA does have a limit to the message size, I
think it is fair to say that EVMS won't generate messages bigger than
64 KB.  So I think you are OK in not having to split the messages.

BTW, had HA had a smaller message limit that could not be ignored,
returning EAGAIN is just asking for an infinite loop.  The calling code
would simply try again.  It would be better to return something like
ENOSPC so that the caller can know what action to take rather than
retrying an operation that will just fail again.
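
A minimal sketch of that distinction, under stated assumptions: the names `sketch_send`, `sketch_send_with_retry`, `sketch_transport_fn`, and `SKETCH_MSG_LIMIT` are hypothetical and are not part of EVMS or heartbeat, and the two sample transports only stand in for the real cluster layer. An oversized message gets a permanent error (ENOSPC), while EAGAIN is reserved for failures that may succeed on a later attempt, and the caller bounds its retries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical size ceiling for the sketch (64 KB, as discussed above). */
#define SKETCH_MSG_LIMIT (64 * 1024)

/* Hypothetical transport hook: returns 0 on success or an errno value. */
typedef int (*sketch_transport_fn)(const void *buf, size_t len);

/* Returns 0 on success, ENOSPC if the message can never fit (retrying is
 * pointless), or whatever the transport reports (for example EAGAIN). */
static int sketch_send(sketch_transport_fn transport, const void *buf, size_t len)
{
    if (len > SKETCH_MSG_LIMIT)
        return ENOSPC;              /* permanent: the same size fails every time */
    return transport(buf, len);     /* may be transient: worth another try */
}

/* Caller side: retry only on EAGAIN, and only up to max_tries attempts. */
static int sketch_send_with_retry(sketch_transport_fn transport,
                                  const void *buf, size_t len, int max_tries)
{
    int rc = EAGAIN;
    for (int i = 0; i < max_tries && rc == EAGAIN; i++)
        rc = sketch_send(transport, buf, len);
    return rc;
}

/* Sample transports used only to exercise the sketch. */
static int sketch_busy_calls = 0;

static int sketch_transport_ok(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;                       /* always succeeds */
}

static int sketch_transport_busy(const void *buf, size_t len)
{
    (void)buf; (void)len;
    sketch_busy_calls++;            /* count attempts */
    return EAGAIN;                  /* always reports a transient failure */
}
```

The point of the split is that a caller that retries on EAGAIN never sees EAGAIN for a condition that cannot clear up by itself.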

// CGAO2 Another reason I am reluctant to implement my own communication
// CGAO2 protocol is that doing so might alter whatever ordering HB2
// CGAO2 provides. For example, message A arrives at node 1 first, but
// CGAO2 it's incomplete. While waiting for the remaining parts of A, we
// CGAO2 might get another message from a different source. I don't know
// CGAO2 what kind of ordering HB2 provides, so I might break it no
// CGAO2 matter how I implement the marshaling here.

HA guarantees ordering of the messages.  The messages will be received in
the same order they are sent, and the messages will be seen in the same
order on all the nodes that receive the message.

It should be fairly simple to implement your own protocol for splitting
messages.  Add some fields to your protocol that say which message the
fragment is for and perhaps which numbered fragment it is (you shouldn't
need an order number since HA guarantees they will arrive in the same
order as they were sent), then reassemble them on the receiving end.  If
you are worried about message B coming in while all the parts of message
A have not arrived, you can postpone the delivery of message B until all
the fragments of message A arrive.  Just a thought.  It's a moot point
since we established above that you shouldn't have to split the messages.
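
For illustration only, the splitting idea could be sketched roughly as follows. Everything here is an assumption made for the example: the `sketch_fragment` layout, the function names, and the tiny `SKETCH_FRAG_PAYLOAD` size are hypothetical and come from neither EVMS nor heartbeat. The sketch assumes in-order delivery and a single sender; with several senders the receiver would need one reassembly state per sender.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_FRAG_PAYLOAD 8    /* tiny on purpose so the example fragments */
#define SKETCH_MAX_MSG      64   /* hypothetical reassembly buffer size */

/* Hypothetical wire format for one fragment. */
struct sketch_fragment {
    unsigned msg_id;             /* which message this fragment belongs to */
    unsigned index;              /* position of this fragment in the message */
    unsigned count;              /* total number of fragments in the message */
    size_t   len;                /* bytes used in data[] */
    char     data[SKETCH_FRAG_PAYLOAD];
};

/* Split msg into fragments. Returns the number of fragments written,
 * or 0 if `out` has too few slots. */
static size_t sketch_split(unsigned msg_id, const char *msg, size_t len,
                           struct sketch_fragment *out, size_t max_frags)
{
    size_t count = (len + SKETCH_FRAG_PAYLOAD - 1) / SKETCH_FRAG_PAYLOAD;
    if (count == 0)
        count = 1;               /* an empty message still sends one fragment */
    if (count > max_frags)
        return 0;
    for (size_t i = 0; i < count; i++) {
        size_t off = i * SKETCH_FRAG_PAYLOAD;
        size_t n = len - off < SKETCH_FRAG_PAYLOAD ? len - off : SKETCH_FRAG_PAYLOAD;
        out[i].msg_id = msg_id;
        out[i].index = (unsigned)i;
        out[i].count = (unsigned)count;
        out[i].len = n;
        memcpy(out[i].data, msg + off, n);
    }
    return count;
}

/* Receiver state for one sender. */
struct sketch_reassembly {
    unsigned msg_id;
    unsigned next;               /* index of the fragment expected next */
    size_t   len;                /* bytes collected so far */
    char     buf[SKETCH_MAX_MSG];
};

/* Feed one fragment. Returns 1 when the message is complete, 0 when more
 * fragments are needed, -1 on a fragment that ordered delivery rules out. */
static int sketch_feed(struct sketch_reassembly *r, const struct sketch_fragment *f)
{
    if (f->index == 0) {         /* first fragment starts a new message */
        r->msg_id = f->msg_id;
        r->next = 0;
        r->len = 0;
    }
    if (f->msg_id != r->msg_id || f->index != r->next)
        return -1;
    if (f->len > SKETCH_FRAG_PAYLOAD || r->len + f->len > SKETCH_MAX_MSG)
        return -1;
    memcpy(r->buf + r->len, f->data, f->len);
    r->len += f->len;
    r->next++;
    return r->next == f->count;
}
```

Because delivery is ordered, the receiver only has to check that each fragment is the one it expects next; no sorting or gap tracking is needed.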

I made a couple of tweaks to Makefile.in on my end.  I changed the NAME
from evms-hb2 to just hb2, the "evms-" being superfluous.  I removed the
addition of the HA_MAJOR, HA_MINOR, and HA_PATCH settings to EVMS_DEFS
since they are not used by any of the code.

I look forward to your comments.

Steve D.