avnd_comp_config_get_su calls immutil_saImmOmInitialize. avnd does not set ImmutilWrapperProfile, so the default values are used and immutil calls abort() on errors.
There are two problems here: immutil should not abort, and immutil sleeps in its try-again loops.
avnd is event based. immutil is configurable, but errors and try-again logic have to be managed by avnd.
(gdb) bt
#0 0x00007ffac238cb35 in raise () from /lib64/libc.so.6
#1 0x00007ffac238e111 in abort () from /lib64/libc.so.6
#2 0x00000000004051e8 in defaultImmutilError (fmt=0x43fc70 "saImmOmInitialize FAILED, rc = %d") at ../../../../../osaf/tools/safimm/src/immutil.c:70
#3 0x00000000004065e4 in immutil_saImmOmInitialize (immHandle=0x7fff7098d380, immCallbacks=0x0, version=0x7fff7098d3a0)
at ../../../../../osaf/tools/safimm/src/immutil.c:1126
#4 0x0000000000422551 in avnd_comp_config_get_su (su=0x66d8d0) at avnd_compdb.c:1743
#5 0x0000000000436560 in avnd_evt_avd_su_pres_evh (cb=0x6578c0, evt=<optimized out>) at avnd_susm.c:1236
#6 0x000000000042ffe0 in avnd_evt_process (evt=<optimized out>) at avnd_proc.c:278
#7 avnd_main_process () at avnd_proc.c:219
#8 0x0000000000408805 in main (argc=1, argv=0x7fff7098d578) at amfnd_main.c:71
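To make the first problem concrete, here is a minimal sketch of a wrapper with a "fatal errors" switch. It is a simplified stand-in written for illustration, not the real immutil code: `wrapper_profile`, `stub_om_initialize`, `wrapped_om_initialize` and the `RC_*` codes are all hypothetical names, and the sleep between attempts is left out. The only point is that with the fatal flag cleared the error is handed back to the caller instead of ending in abort().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Local placeholder result codes (assumption: not the real SaAisErrorT values). */
typedef enum { RC_OK = 0, RC_TRY_AGAIN = 1, RC_FAILED = 2 } rc_t;

/* Hypothetical model of a wrapper profile. */
struct wrapper_profile {
    int errors_are_fatal;  /* non-zero: abort() when the call finally fails */
    unsigned int n_tries;  /* maximum number of attempts while TRY_AGAIN is returned */
};

/* Stub standing in for the IMM call: replies TRY_AGAIN stub_fail_count times, then OK. */
unsigned int stub_fail_count = 0;

rc_t stub_om_initialize(void)
{
    if (stub_fail_count > 0) {
        stub_fail_count--;
        return RC_TRY_AGAIN;
    }
    return RC_OK;
}

/* Calls the stub up to n_tries times. Aborts only if the profile says errors are fatal. */
rc_t wrapped_om_initialize(const struct wrapper_profile *p)
{
    assert(p != NULL);
    rc_t rc = stub_om_initialize();
    unsigned int tries = 1;

    while (rc == RC_TRY_AGAIN && tries < p->n_tries) {
        rc = stub_om_initialize();
        tries++;
    }
    if (rc != RC_OK && p->errors_are_fatal) {
        fprintf(stderr, "initialize FAILED, rc = %d\n", (int)rc);
        abort();
    }
    return rc; /* non-fatal profile: the caller decides what to do */
}
```

With `errors_are_fatal` set to 0 a caller such as avnd would see the TRY_AGAIN result and could schedule its own retry.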
Tickets: #1728
Tickets: #517
Wiki: ChangeLog-4.7.2
Wiki: ChangeLog-5.0.1
This could happen if the local immnd is down; then there would be a deadlock like this. I think there are some old tickets on devel.opensaf.org that touch on this.
I don't think this should be called a deadlock.
The imm service would reply immediately with TRY_AGAIN on attempts to allocate an imm handle when the local immnd is down.
Any TRY_AGAIN loop has to be limited in number of retries and sleep times, whether in immutil or implemented directly.
So not really a deadlock.
/AndersBj
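A bounded TRY_AGAIN loop of the kind described above could look roughly like the following. This is only a sketch under assumptions: `retry_bounded`, `worst_case_sleep_ms`, `flaky_op` and the `RC_*` codes are hypothetical names invented here, not immutil functions. It shows that with a fixed number of tries and a fixed interval the worst-case blocking time is known in advance.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Local placeholder result codes (assumption: not the real SaAisErrorT values). */
typedef enum { RC_OK = 0, RC_TRY_AGAIN = 1 } rc_t;
typedef rc_t (*op_fn)(void *ctx);

/* Upper bound on the time spent sleeping: one sleep between each pair of attempts. */
unsigned long worst_case_sleep_ms(unsigned int max_tries, unsigned int interval_ms)
{
    return max_tries == 0 ? 0UL : (unsigned long)(max_tries - 1) * interval_ms;
}

/* Runs op at most max_tries times, sleeping interval_ms after each TRY_AGAIN
 * except the last one. Returns the result of the final attempt. */
rc_t retry_bounded(op_fn op, void *ctx, unsigned int max_tries,
                   unsigned int interval_ms, unsigned int *tries_used)
{
    rc_t rc = RC_TRY_AGAIN;
    unsigned int tries = 0;

    while (tries < max_tries) {
        rc = op(ctx);
        tries++;
        if (rc != RC_TRY_AGAIN || tries == max_tries)
            break;
        struct timespec ts = { interval_ms / 1000,
                               (long)(interval_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
    if (tries_used != NULL)
        *tries_used = tries;
    return rc;
}

/* Example operation: replies TRY_AGAIN until the counter in ctx reaches zero. */
rc_t flaky_op(void *ctx)
{
    unsigned int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return RC_TRY_AGAIN;
    }
    return RC_OK;
}
```

Note that this still blocks the calling thread while sleeping, which is the second problem raised in the ticket description for an event-based process.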
Old ticket: https://sourceforge.net/p/opensaf/tickets/395/
Not sure if this one is needed. It contains no logs, nothing...
This kind of problem is also related to imm enhancement ticket [#27]
https://sourceforge.net/p/opensaf/tickets/27/
Not much help right now for solving this ticket.
The idea in (#27) is that a select few operations, such as implementer-set, class-implementer-set, class-implementer-release
could be possible to handle (i.e. forwarded) by the local immnd while it is still syncing, thus not exposing the handle.
I am assuming in this case that the need for the oi-handle-initialize is due to a handle having been exposed.
Related
Tickets: #27
We need some more information on this issue. Is it an application SU or a middleware SU instantiation? I guess this is an application SU instantiation while opensafd is stopping and immnd is not available. Please confirm.
Last edit: Nagendra Kumar 2013-08-14
What triggered this problem was that they powered off SC-2 after a cluster reboot. This caused immnd on some of the PLs to restart (OUT OF ORDER messages). On one of the PLs (PL-5) the immnd failed to contact the immd after the restart, causing the amfnd OmInitialize to time out with TRY_AGAIN.
So the issues here are:
Why did some of the immnds restart when SC-2 was lost?
Why did immnd on PL-5 fail to contact immd after it restarted?
This indicates communication problems in MDS/TIPC.
The question still remains what amfnd should do when IMM communication times out.
Yes, the answer to the first two "why" questions is MDS problems (dropped messages without explicit link loss).
A possible follow-up question is why there were MDS problems.
This error scenario is clearly unusual (or is it just that we don't test power-off of an SC that often?).
/AndersBj
Related
Tickets: #517
The lost messages in this case are so-called fevs (fake evs) messages.
Each fevs message is individually checkpointed to the standby SC and then "broadcast" to all IMMNDs using MDS broadcast
to the service-id of IMMND. As I understand it, this broadcast is not a real broadcast but a multicast to the set of receivers.
I can understand that if the SC being shut down is the active IMMD, it could partially succeed in the multicast of
one fevs message.
The strange thing about this case is that the MDS loss of messages detected at one or more PLs was a gap of 30+ messages.
This shows that it was not a case of the active IMMD being interrupted (in the MDS library) by the power-off.
The issue seems more like a buffering issue. The send buffer at the active SC towards some PLs (but not all) apparently contained 30+ messages
at the time of power-off.
After failover the new active IMMD will resend the two latest fevs messages it received over mbcp.
Here it would actually have needed to resend 30+ fevs messages.
AndersBj
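The arithmetic behind the gap argument can be sketched as follows. The function names and the idea of comparing a plain sequence counter are assumptions made for illustration; this is not the IMMND code. It also assumes the missing messages are the most recent ones, so that a resend of the last few can only help when the gap is no larger than the resend count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of fevs messages missing between the highest count processed so far
 * and the count of an arriving message. 0 means next-in-order or a duplicate. */
uint64_t fevs_gap(uint64_t highest_processed, uint64_t received)
{
    if (received <= highest_processed)
        return 0; /* duplicate or old message */
    return received - highest_processed - 1;
}

/* True if resending the last resend_count messages is enough to fill the gap
 * (assumption: the lost messages are the latest ones sent). */
bool resend_covers_gap(uint64_t gap, uint64_t resend_count)
{
    return gap <= resend_count;
}
```

With a resend of two messages, a gap of one or two can be repaired, while a gap of 30 cannot, which matches the observation above.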
Last edit: Nagendra Kumar 2013-08-19
The reason that imm is not responsive appears to be that the local IMMND is down.
So one question is: why is IMMND down locally?
Did it crash or exit?
Second, if IMMND goes down it will restart and try to get synced.
Such a sync should take less than 60 seconds.
It depends of course on the amount of data and the hardware used.
But a typical normal sync takes 10-20 seconds.
If the sync cannot get started (no IMMD service?) then that would explain it.
Since the problem was reproduced I assume there are syslogs.
/AndersBj
Nagendra Kumar wrote (19 August 2013 06:37):
"The question still remains what amfnd should do when IMM communication times out."
If IMMND is not responsive within the configured timeout, amfnd aborts, which translates into a node reboot. That looks justifiable for amfnd: until it gets its configuration information it cannot proceed, so in the end this looks like a local node issue.
IMMND must respond in real time.
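One way to combine the two positions in this thread (no sleeping inside an event-based process, but escalation once a configured timeout has passed) is a small state record consulted each time an IMM call returns. The sketch below is hypothetical: `imm_retry_state`, `imm_on_result`, the action names and the `RC_*` codes are invented for illustration, and what "escalate" means (abort, node reboot, something else) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local placeholder result codes (assumption: not the real SaAisErrorT values). */
typedef enum { RC_OK = 0, RC_TRY_AGAIN = 1, RC_OTHER_ERROR = 2 } rc_t;

/* What the event loop should do next. */
typedef enum { IMM_ACT_PROCEED, IMM_ACT_RETRY_LATER, IMM_ACT_ESCALATE } imm_action_t;

struct imm_retry_state {
    int pending;               /* non-zero while a TRY_AGAIN sequence is ongoing */
    uint64_t first_failure_ms; /* time of the first TRY_AGAIN in the sequence */
    uint64_t timeout_ms;       /* configured limit before escalating */
};

/* Decides the next action from the latest result and the current time.
 * Never sleeps: on RETRY_LATER the caller is expected to arm a timer and
 * return to its event loop. */
imm_action_t imm_on_result(struct imm_retry_state *st, rc_t rc, uint64_t now_ms)
{
    assert(st != NULL);
    if (rc == RC_OK) {
        st->pending = 0;
        return IMM_ACT_PROCEED;
    }
    if (rc != RC_TRY_AGAIN)
        return IMM_ACT_ESCALATE; /* not a transient error */
    if (!st->pending) {
        st->pending = 1;
        st->first_failure_ms = now_ms;
    }
    if (now_ms - st->first_failure_ms >= st->timeout_ms)
        return IMM_ACT_ESCALATE; /* configured timeout exceeded */
    return IMM_ACT_RETRY_LATER;
}
```

The timeout value itself is a configuration choice and is deliberately not fixed in the sketch.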
In the code I can see that the unresponsive local IMMND cannot be due to a long sync.
TRY_AGAIN is only given when the local IMMND is down.
If the local IMMND were syncing then it would be up.
In that case it would discover the handle to be stale, and it actually replies with SA_AIS_OK in saImmOiImplementerClear in this situation.
So the reason that IMMND is not responding is that it has gone down and not been restarted.
/AndersBj
If we could improve immnd behaviour from 'restarting' to 'resync the data without restart' in case of a counter mismatch (which is very frequent), these kinds of problems may get resolved.
-Nagu
What kind of counter mismatch are you talking about that is "frequent"?
And why is the IMMND not restarted?
/AndersBj
OK, I looked back at the ticket.
This was a fevs count mismatch, and those I claim are rare.
They can happen when MDS drops messages, which is rare.
Or at least it used to be rare.
Perhaps a problem was introduced in MDS in some recent release.
The ticket is lacking both version and milestone so I can't tell.
/AndersBj
The reason immnd is down is the frequent asserts/aborts all around the code.
Logs like "OUT OF ORDER my highest processed" are frequent. immnd can flush its existing data and resync again.
-Nagu
-----Original Message-----
From: Anders Bjornerstedt [mailto:andersbj@users.sf.net]
Sent: 19 August 2013 14:23
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
What kind of counter mismatch are you talking about that is "frequent"?
And why is the IMMND not restarted?
/AndersBj
From: Nagendra Kumar [mailto:nagendra-k@users.sf.net]
Sent: den 19 augusti 2013 10:47
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
If we could change immnd behaviour from 'restarting' to 'resyncing the data without a restart' in case of a counter mismatch (which is very frequent), these kinds of problems might get resolved.
-Nagu
-----Original Message-----
From: Anders Bjornerstedt [mailto:andersbj@users.sf.net]
Sent: 19 August 2013 12:46
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
The reason that imm is not responsive appears to be that the local IMMND is down.
So one question is: why is the IMMND down locally?
Did it crash or exit?
Second, if the IMMND goes down it will restart and try to get synced.
Such a sync should take less than 60 seconds.
It depends of course on the amount of data and the hardware used.
But a typical normal sync takes 10 - 20 seconds.
If the sync cannot get started (no IMMD service?) then that would explain it.
Since the problem was reproduced I assume there are syslogs.
/AndersBj
From: Nagendra Kumar [mailto:nagendra-k@users.sf.net]
Sent: den 19 augusti 2013 06:37
To: [opensaf:tickets]
Subject: [opensaf:tickets] #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
The question still remains what amfnd should do when IMM communication times out.
If Immnd is not responsive within the configured timeout, then Amfnd aborts, which translates into rebooting the node. This looks justifiable for Amfnd: until it gets its configuration information it cannot proceed, so in the end it looks like a local node issue.
Immnd must respond in real time.
Can we please avoid non-constructive comments like "Frequent asserts/aborts all around".
That statement just implies poor quality of the IMMND, which I claim is not justified.
If there are uncontrolled crashes of the immsv then they should be reported in a ticket.
If the IMMND really is crashing so much then trying to "flush data" and resync without restarting would be insane.
Let's try to focus again on this particular case, with the fevs count mismatch, and try
to avoid throwing around ungrounded "blame".
If fevs count problems are frequent then MDS is simply unreliable.
This has NOT been a frequent problem before, so if it IS a problem now then OpenSAF has BIG
problems with MDS. Trying to compensate for newly discovered problems with MDS by having the
immsv re-sync is not the correct order of priority.
The question then is why the frequency of MDS dropping messages has increased so much.
What has changed in MDS or the setup/configuration or test?
So if this is indeed the case (that we now see frequent problems with MDS dropping messages)
then there needs to be a critical ticket looking into that.
Finally, I would still like to know why the IMMND is not restarted,
regardless of how frequent the FEVS count mismatches (i.e. MDS problems) are.
/Anders
From: Nagendra Kumar [mailto:nagendra-k@users.sf.net]
Sent: den 19 augusti 2013 11:08
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
The reason immnd is down is the frequent asserts/aborts all around the code.
Logs like "OUT OF ORDER my highest processed" are frequent. immnd can flush its existing data and resync again.
-Nagu
First there are logs but they are still internal. And immnd is restarted by AMF
Ok, my mistake, I was looking at #516 where saImmOiImplementerClear was the problem.
In that ticket there is talk about "the imm becoming slow".
But in actuality the IMMND is not at all "slow" there.
It is simply not being restarted in case #516.
In #517, the IMMND has apparently been restarted and is syncing, but there the problem seems to be trivial
in that the default timeout is used in immutils. An imm sync can take more than 10 seconds.
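To make the timeout point concrete, here is a minimal sketch of a bounded TRY_AGAIN loop whose total budget is chosen by the caller and which returns the error instead of aborting. It does not use the real immutil API: the return codes, `struct retry_policy`, `call_with_retry`, the simulated `fake_imm` service and the recording `fake_sleep` are all hypothetical names invented for this illustration, and the 400 ms interval and 2 s budget are arbitrary example values. Only the 60 second upper bound for a sync is taken from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in return codes for this sketch; real code would use SaAisErrorT. */
typedef enum { RC_OK, RC_TRY_AGAIN, RC_TIMEOUT } rc_t;

/* Hypothetical retry policy: how long to pause and how long to keep trying. */
struct retry_policy {
    unsigned int interval_ms; /* pause between attempts */
    unsigned int budget_ms;   /* total waiting time allowed */
};

typedef rc_t (*attempt_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int ms);

/* Retry attempt() while it answers RC_TRY_AGAIN and the budget allows.
 * The result is always returned to the caller; nothing here calls abort(). */
static rc_t call_with_retry(const struct retry_policy *p, attempt_fn attempt,
                            void *ctx, sleep_fn do_sleep)
{
    unsigned int waited_ms = 0;

    assert(p != NULL && attempt != NULL && do_sleep != NULL);
    for (;;) {
        rc_t rc = attempt(ctx);
        if (rc != RC_TRY_AGAIN)
            return rc;
        if (p->interval_ms == 0 || waited_ms + p->interval_ms > p->budget_ms)
            return RC_TIMEOUT;
        do_sleep(p->interval_ms);
        waited_ms += p->interval_ms;
    }
}

/* Simulated service for the demo: busy for the first `busy_calls` attempts. */
struct fake_imm {
    unsigned int busy_calls;
    unsigned int calls;
};

static rc_t fake_initialize(void *ctx)
{
    struct fake_imm *imm = ctx;

    imm->calls++;
    return imm->calls > imm->busy_calls ? RC_OK : RC_TRY_AGAIN;
}

/* Demo sleep that only records the requested time instead of blocking. */
static unsigned int slept_ms;

static void fake_sleep(unsigned int ms)
{
    slept_ms += ms;
}
```

With a service that stays busy for 30 attempts (12 seconds at 400 ms per pause), a policy of `{ 400, 2000 }` gives up with `RC_TIMEOUT`, while `{ 400, 60000 }` succeeds. A loop like this still blocks its caller while it waits, so in an event-based daemon it would have to run outside the main event loop.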
About the claim that the fevs-count mismatch problem happens "often":
that claim needs to be evaluated, since if it is true then we don't have a reliable system.
/AndersBj
From: Hans Feldt [mailto:hansfeldt@users.sf.net]
Sent: den 19 augusti 2013 11:41
To: [opensaf:tickets]
Subject: [opensaf:tickets] #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
First there are logs but they are still internal. And immnd is restarted by AMF
Broadly, Amfnd reads configuration from Immnd in the following scenarios:
1. During CSI assignment.
2. During SU instantiation.
3. When Amfnd comes up.
4. While a CLC script is running.
Mostly, we are getting stuck at #2.
One solution for #2, as it looks to me: if Amfnd fails to get the information from the local Immnd (timeout), it can get the information from Amfd by some means, since Amfd already has that information.
Any thoughts?
For more info see: http://devel.opensaf.org/~hafe/AMF/amfnd-start.png
Getting the config from amfd kind of means reverting back to the 3.0 state. Good or bad I don't know yet. But it seems unnecessary to distribute information that is already available locally on the node. At the same time there are some hard to solve problems.
I envision a separate thread in amfnd that reads from IMM and handles TRYAGAIN loops. Such a thread could also be an applier for the case of detecting config changes.
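One possible shape for such a thread, as a rough sketch only: a worker retries the blocking read and reports the outcome over a pipe, so the main event loop never sleeps. Nothing here is existing amfnd or IMM code. `struct cfg_reader`, `cfg_reader_start`, `cfg_reader_collect`, the `CFG_*` codes and the simulated `fake_imm` source are hypothetical names for illustration, and the applier idea is not covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

enum { CFG_OK = 0, CFG_TRY_AGAIN = 1, CFG_FAILED = 2 };

/* Hypothetical reader: a worker thread retries a blocking read and reports
 * the outcome through a pipe that the main event loop can poll. */
struct cfg_reader {
    int (*read_cfg)(void *ctx); /* blocking read, may return CFG_TRY_AGAIN */
    void *ctx;
    unsigned int max_tries;
    long retry_ns;              /* pause between attempts, below one second */
    int notify_fd[2];           /* [0] polled by main loop, [1] written by worker */
    pthread_t tid;
};

static void *cfg_reader_main(void *arg)
{
    struct cfg_reader *r = arg;
    unsigned char result = CFG_FAILED; /* reported if every try says TRY_AGAIN */
    struct timespec pause = { 0, r->retry_ns };
    ssize_t n;

    for (unsigned int i = 0; i < r->max_tries; i++) {
        int rc = r->read_cfg(r->ctx);
        if (rc != CFG_TRY_AGAIN) {
            result = (unsigned char)rc;
            break;
        }
        nanosleep(&pause, NULL); /* sleeping here does not block the main loop */
    }
    /* One byte wakes the main loop and carries the outcome. */
    n = write(r->notify_fd[1], &result, 1);
    (void)n;
    return NULL;
}

static int cfg_reader_start(struct cfg_reader *r)
{
    if (pipe(r->notify_fd) != 0)
        return -1;
    if (pthread_create(&r->tid, NULL, cfg_reader_main, r) != 0) {
        close(r->notify_fd[0]);
        close(r->notify_fd[1]);
        return -1;
    }
    return 0;
}

/* Call when notify_fd[0] is readable: fetch the outcome and clean up. */
static int cfg_reader_collect(struct cfg_reader *r)
{
    unsigned char result = CFG_FAILED;
    ssize_t n = read(r->notify_fd[0], &result, 1);

    pthread_join(r->tid, NULL);
    close(r->notify_fd[0]);
    close(r->notify_fd[1]);
    return n == 1 ? result : CFG_FAILED;
}

/* Simulated IMM for the demo: answers TRY_AGAIN `busy_calls` times, then OK. */
struct fake_imm {
    int busy_calls;
};

static int fake_read(void *ctx)
{
    struct fake_imm *imm = ctx;

    if (imm->busy_calls > 0) {
        imm->busy_calls--;
        return CFG_TRY_AGAIN;
    }
    return CFG_OK;
}
```

In use, the main loop would add `notify_fd[0]` to the descriptors it already polls and call `cfg_reader_collect` once that descriptor becomes readable. Calling `cfg_reader_collect` directly, as a test might, simply blocks until the worker is done.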
Or, if Amfnd fails to get the information from the local Immnd (timeout), it can get the information from any other non-local Immnd.
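The fallback ideas raised in this thread (Amfd, or a non-local Immnd, when the local Immnd times out) could be sketched as an ordered list of sources where only a timeout moves on to the next one. This is an illustration under assumptions, not a design taken from the code base: `struct cfg_source`, `fetch_with_fallback`, the `CFG_RC_*` codes, the two demo sources and the placeholder strings are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum cfg_rc { CFG_RC_OK, CFG_RC_TIMEOUT, CFG_RC_ERROR };

/* Hypothetical configuration source: local Immnd, a remote Immnd, Amfd, ... */
struct cfg_source {
    const char *name;
    enum cfg_rc (*fetch)(const char *su_dn, char *out, size_t out_len);
};

/* Try each source in order. Only a timeout moves on to the next source;
 * success or a hard error ends the search. Returns the index of the source
 * that produced the final result and stores that result in *rc. */
static size_t fetch_with_fallback(const struct cfg_source *srcs, size_t n,
                                  const char *su_dn, char *out, size_t out_len,
                                  enum cfg_rc *rc)
{
    size_t i;

    assert(srcs != NULL && n > 0 && rc != NULL);
    for (i = 0; i < n; i++) {
        *rc = srcs[i].fetch(su_dn, out, out_len);
        if (*rc != CFG_RC_TIMEOUT)
            return i;
    }
    return n - 1; /* every source timed out; *rc is CFG_RC_TIMEOUT */
}

/* Demo source with made-up behaviour: the local Immnd never answers. */
static enum cfg_rc local_immnd_down(const char *su_dn, char *out, size_t out_len)
{
    (void)su_dn;
    (void)out;
    (void)out_len;
    return CFG_RC_TIMEOUT;
}

/* Demo source with made-up behaviour: Amfd hands back a placeholder value. */
static enum cfg_rc amfd_copy(const char *su_dn, char *out, size_t out_len)
{
    (void)su_dn;
    if (out_len == 0)
        return CFG_RC_ERROR;
    snprintf(out, out_len, "%s", "placeholder-config");
    return CFG_RC_OK;
}
```

Treating hard errors as final keeps a misconfiguration from being masked by a stale copy elsewhere. Whether that is the right policy is a judgment call that the thread leaves open.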