I got incorrect node names from the following code in get_nodes_info() from
http://cvs.sourceforge.net/viewcvs.py/evms/evms2/engine/plugins/rsct/rsct_mem_info.c?view=markup:
if(system("lsrsrc -axd IBM.PeerNode > /tmp/rsct_node_info") == -1){
    LOG_ERROR("ERROR:get_nodes_info() fails ");
}
fstream=fopen("/tmp/rsct_node_info","r");
i=0;
while(fgets(buf,MAX_BUF_SIZE,fstream) != NULL){
    int length;
    info_entry_t* entry;
    entry=(info_entry_t*)malloc(sizeof(info_entry_t));
    memset(entry,0,sizeof(info_entry_t));
    temp=strtok(buf,":"); //name
    strcpy(entry->name,temp);
    get_rid_of_quota(entry->name);
    temp=strtok(NULL,":"); //nodeList
    sscanf(temp+1,"%d",&entry->node_number);
    strtok(NULL,":"); //RSCT version
    strtok(NULL,":"); //class Version
    strtok(NULL,":"); // CritRsrcProtMethod
    temp=strtok(NULL,":"); //NodeNameList in the form of {"xxx"}
    length=strlen(temp);
    strncpy((char*)&entry->nodeid,temp+2,length-4);
    entry->nodeid.bytes[length-4]='\0';
    i++;
The easiest fix is to add the following line after "strtok(NULL,":"); // CritRsrcProtMethod":
    strtok(NULL,":"); // ActivePeerDomain
The extra call skips the ActivePeerDomain field, so that the next token is NodeNameList.
However, I recommend a cleaner fix: replace the above section with the
following, where the "strtok(NULL,":")" lines that only skip fields are
deleted and the lsrsrc command on the top line is expanded to request only
the attributes that are parsed:
if(system("lsrsrc -axd IBM.PeerNode Name NodeList NodeNameList > /tmp/rsct_node_info") == -1){
    LOG_ERROR("ERROR:get_nodes_info() fails ");
}
fstream=fopen("/tmp/rsct_node_info","r");
i=0;
while(fgets(buf,MAX_BUF_SIZE,fstream) != NULL){
    int length;
    info_entry_t* entry;
    entry=(info_entry_t*)malloc(sizeof(info_entry_t));
    memset(entry,0,sizeof(info_entry_t));
    temp=strtok(buf,":"); //name
    strcpy(entry->name,temp);
    get_rid_of_quota(entry->name);
    temp=strtok(NULL,":"); //nodeList
    sscanf(temp+1,"%d",&entry->node_number);
    temp=strtok(NULL,":"); //NodeNameList in the form of {"xxx"}
    length=strlen(temp);
    strncpy((char*)&entry->nodeid,temp+2,length-4);
    entry->nodeid.bytes[length-4]='\0';
    i++;
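For illustration, the parsing step of the cleaner fix could also be written defensively, checking every strtok() result and bounding every copy. This is only a sketch under assumptions: peer_node_t, NAME_BUF_SIZE, copy_trimmed() and parse_peer_node_line() are hypothetical names that are not part of EVMS, the real info_entry_t layout is not shown in this report, and each input line is assumed to have the form "name":{N}:{"node.name"}: with the name in double quotes.

```c
#include <stdio.h>
#include <string.h>

#define NAME_BUF_SIZE 256   /* hypothetical size, not taken from EVMS */

/* Hypothetical stand-in for info_entry_t. */
typedef struct {
    char name[NAME_BUF_SIZE];
    int  node_number;
    char nodeid[NAME_BUF_SIZE];
} peer_node_t;

/* Copy src into dst, dropping `lead` leading and `trail` trailing characters.
 * Returns 0 on success, -1 if src is too short or the result does not fit. */
static int copy_trimmed(char *dst, size_t dst_size, const char *src,
                        size_t lead, size_t trail)
{
    size_t len = strlen(src);

    if (len < lead + trail)
        return -1;
    len -= lead + trail;
    if (len >= dst_size)
        return -1;
    memcpy(dst, src + lead, len);
    dst[len] = '\0';
    return 0;
}

/* Parse one line of the assumed form  "name":{N}:{"node.name"}:
 * The line buffer is modified by strtok(). Returns 0 on success, -1 otherwise. */
int parse_peer_node_line(char *line, peer_node_t *out)
{
    char *name, *node_list, *name_list;

    line[strcspn(line, "\r\n")] = '\0';      /* drop the newline kept by fgets() */

    name      = strtok(line, ":");           /* Name         */
    node_list = strtok(NULL, ":");           /* NodeList     */
    name_list = strtok(NULL, ":");           /* NodeNameList */
    if (name == NULL || node_list == NULL || name_list == NULL)
        return -1;

    memset(out, 0, sizeof *out);
    if (copy_trimmed(out->name, sizeof out->name, name, 1, 1) != 0)
        return -1;                           /* strip the surrounding quotes */
    if (sscanf(node_list, "{%d}", &out->node_number) != 1)
        return -1;
    if (copy_trimmed(out->nodeid, sizeof out->nodeid, name_list, 2, 2) != 0)
        return -1;                           /* strip {" and "} */
    return 0;
}
```

A caller would pass each buffer filled by fgets() to parse_peer_node_line() and skip or log any line for which it returns -1, instead of dereferencing a NULL token.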
Logged In: YES
user_id=1330993
After having made either fix, the nodeids look correct:
c108f1n02:/usr/src/evms-2.3.3/plugins/rsct # ./ecetest
I am running as master
called mycb type=0 num_entries=2
c108f1n03.ppd.pok.ibm.com
c108f1n02.ppd.pok.ibm.com
-------------------
called mycb type=1 num_entries=0
-------------------
hi
called mycb1 type=2 num_entries=2
c108f1n03.ppd.pok.ibm.com
c108f1n02.ppd.pok.ibm.com
-------------------
nodeid=c108f1n02.ppd.pok.ibm.com
str=c108f1n02.ppd.pok.ibm.com maxnodes=2
nodeid is c108f1n02.ppd.pok.ibm.com
nodes[0]=c108f1n03.ppd.pok.ibm.com
nodes[1]=c108f1n02.ppd.pok.ibm.com
membership is
called mycb1 type=0 num_entries=2
c108f1n03.ppd.pok.ibm.com
c108f1n02.ppd.pok.ibm.com
-------------------
clusterid is RSCT
broadcast a message
The RSCT plug-in seems to be working correctly after this fix. EVMS
2.3.3 is installed on 2 nodes, one SLES9 and one RHEL4. I am
able to create a container with CSM and change its properties.
Information about the lsrsrc command:
http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.help.rsct.doc/rsct_books/rsct_linux_tech_ref/bl5trl0819.html
Information about IBM.PeerNode can be found under "Peer Node resource class" in
http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm05/bl5adm0573.html
I suspect entry->nodeid should be the same as entry->name:
c108f1n02:~ # lsrsrc -da IBM.PeerNode Name NodeList NodeIDs NodeNames NodeNameList
Resource Persistent Attributes for IBM.PeerNode
Name:NodeList:NodeIDs:NodeNames:NodeNameList:
"c108f1n02":{1}:{13595822499165139164}:{"c108f1n02.ppd.pok.ibm.com","c108f1n02"}:{"c108f1n02.ppd.pok.ibm.com"}:
"c108f1n04":{2}:{10570954252156351641}:{"c108f1n04"}:{"c108f1n04"}:
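Since the output above carries a header line naming each column, one way to avoid the original bug, where an added attribute shifted the field positions, would be to look a column up by name rather than count strtok() calls. The following is a minimal sketch, assuming the header line is kept in the output (as in the "-da" run above); find_column() is a hypothetical helper and not part of EVMS or RSCT.

```c
#include <string.h>

/* Return the zero-based position of `attr` in a colon-delimited header
 * line such as "Name:NodeList:NodeIDs:NodeNames:NodeNameList:",
 * or -1 if the attribute is not present. */
int find_column(const char *header, const char *attr)
{
    size_t attr_len = strlen(attr);
    const char *p = header;
    int col = 0;

    while (*p != '\0' && *p != '\n') {
        size_t field_len = strcspn(p, ":\n");   /* length of the current field */

        if (field_len == attr_len && strncmp(p, attr, attr_len) == 0)
            return col;
        p += field_len;
        if (*p == ':')                           /* step over the delimiter */
            p++;
        col++;
    }
    return -1;
}
```

With the header shown above, the index returned for "NodeNameList" tells the parser how many fields to skip on each data line, so a newly added attribute would no longer shift the result.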
It is probably best to get entry->nodeid from NodeNames
rather than from NodeNameList or from Name. These 3 can
differ:
Name:NodeList:NodeIDs:NodeNames:NodeNameList:
"c108f1n04":{2}:{10570954252156351641}:{"c108f1n04"}:{"c108f1n04"}:
"c108f1n02":{1}:{13595822499165139164}:{"c108f1n02.ppd.pok.ibm.com","c108f1n02"}:{"c108f1n02.ppd.pok.ibm.com"}:
Only this one of the 3 is documented meaningfully:
NodeNames
A list of names that the node may be referred to within a
Peer Domain through the NodeNameList attribute of any
resource class.
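Because a NodeNames value can hold more than one name, code that reads it has to pick one entry from the list. A minimal sketch of extracting the n-th quoted name from a field such as {"c108f1n02.ppd.pok.ibm.com","c108f1n02"} follows; list_name_at() is a hypothetical helper, and it assumes that names never contain a double quote.

```c
#include <stddef.h>
#include <string.h>

/* Copy the index-th quoted name of a list field like {"a","b"} into dst.
 * Returns 0 on success, -1 if there is no such entry or it does not fit. */
int list_name_at(const char *field, size_t index, char *dst, size_t dst_size)
{
    const char *p = field;
    size_t i;

    for (i = 0; ; i++) {
        const char *open = strchr(p, '"');       /* opening quote of entry i */
        const char *close;
        size_t len;

        if (open == NULL)
            return -1;                           /* fewer entries than requested */
        close = strchr(open + 1, '"');           /* matching closing quote */
        if (close == NULL)
            return -1;                           /* malformed field */
        if (i == index) {
            len = (size_t)(close - open - 1);
            if (len >= dst_size)
                return -1;
            memcpy(dst, open + 1, len);
            dst[len] = '\0';
            return 0;
        }
        p = close + 1;                           /* continue after this entry */
    }
}
```

Which entry to prefer (the first, or the one that matches Name) is a policy choice that this report leaves open.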
The descriptions on the web are probably outdated because they
do not appear in the more recent versions of these documents.
With the command "lsrsrcdef -A p -p 0 -e IBM.PeerNode", I got
the following descriptions from RSCT:
program_name = "Name"
description = "The name of the node. It may be
specified on define as either an IP address or a DNS name. If
a DNS name is specified, it must be resolvable to an IP
address."
program_name = "NodeNames"
description = "The value of this attribute reflects the
list of names that the node may be referred to within a Peer
Domain through the NodeNameList attribute of any resource
resource class."
program_name = "NodeNameList"
description = "This attribute lists the symbolic names
of the nodes where the operational interface of the resource
is available."
"Name" seems to me the best choice. Therefore, entry->nodeid
should be replaced with entry->name wherever it is used.
Is there any chance to get this bug fixed? There is no
indication of anybody working on this at all since the
original report nearly 3 months ago.
I have suggested 2 different fixes. Will anybody be able to
pick a fix?