Activity for Rafael Odzakow

  • Rafael Odzakow committed [6dcf23]

    smf: update PR with information about faster upgrade [#2017]

  • Rafael Odzakow posted a comment on ticket #2648

    That would work. As long as it is possible to roll back the campaign, it is fine. On 10/20/2017 03:18 PM, Alex Jones wrote: I understand the intention. It makes sense. One of the other solutions I had considered is to put a check at the beginning of SmfCampaign::initExecution(). If the campaign state is EXECUTION_COMPLETED, then just return. What is the point of re-executing a campaign that already completed? Are you OK with that? [tickets:#2648] https://sourceforge.net/p/opensaf/tickets/2648/ smf:...

  • Rafael Odzakow posted a comment on ticket #2648

    A rollback will not work if an unexpected cluster reboot was done before PBE was enabled. SMF loses its runtime data in that case, so your patch would cause issues for rollback. The intention is to be able to test the system and then decide to proceed with rollback or commit. That means reboots are allowed once the campaign is completed together with rollback. I think the issue you are having needs a solution, preferably in SMF, but I'm not sure how that would look yet.

  • Rafael Odzakow posted a comment on ticket #2648

    I cannot comment on moving restorePbe to commit for now. But an SMF rollback is what is used to undo the campaign operations. A reboot would only clear changed IMM data if PBE was off. That leaves software and CLI operations, which would cause incompatibilities. A rollback has to be planned for in a campaign and does not handle errors. So by default SMF takes a backup before starting a campaign to be able to recover.

  • Rafael Odzakow committed [410025]

    smf: refactor smfd folders [#2633]

  • Rafael Odzakow modified ticket #2633

    smf: refactor smfd folders

  • Rafael Odzakow created ticket #2633

    smf: refactor smfd directory structure

  • Rafael Odzakow modified ticket #1572

    smf: Node by node upgrade

  • Rafael Odzakow modified ticket #2622

    base: double start failed

  • Rafael Odzakow committed [6f5eb2]

    base: double start failed [#2622]

  • Rafael Odzakow posted a comment on ticket #2622

    The issue was found on Ubuntu 14.04, where the subsys folder is not created by default. Move the pid removal to be called after pidofproc.

  • Rafael Odzakow committed [0b6e35]

    base: double start failed [#2622]

  • Rafael Odzakow committed [393adc]

    base: double start failed [#2622]

  • Rafael Odzakow modified ticket #2555

    smf: execLevel for balanced upgrade

  • Rafael Odzakow committed [e8c8e1]

    base: double start failed [#2622]

  • Rafael Odzakow modified ticket #2622

    base: double start failed

  • Rafael Odzakow modified ticket #2622

    base: double start failed

  • Rafael Odzakow modified ticket #2622

    [base] double start failed

  • Rafael Odzakow created ticket #2622

    double start failed

  • Rafael Odzakow committed [d02423]

    smf: execLevel for balanced upgrades [#2555]

  • Rafael Odzakow modified ticket #2599

    smf: remove cascading delete for runtime objects

  • Rafael Odzakow modified ticket #2599

    smf: remove cascading delete for runtime objects

  • Rafael Odzakow created ticket #2599

    smf: remove cascading delete for runtime objects

  • Rafael Odzakow modified ticket #2464

    smf: try to wait for opensafd status before executing reboot

  • Rafael Odzakow posted a comment on ticket #2464

    commit f3ef8eebf44f0eab4dcc65f83fe3119a77ef5067 (HEAD -> develop, origin/develop) Author: Rafael Odzakow rafael.odzakow@ericsson.com Date: Mon Sep 25 13:52:03 2017 +0200 smf: try to wait for opensafd status before reboot [#2464]

  • Rafael Odzakow committed [f3ef8e]

    smf: try to wait for opensafd status before reboot [#2464]

  • Rafael Odzakow modified ticket #2464

    smf: try to wait for opensafd status before executing reboot

  • Rafael Odzakow posted a comment on ticket #2464

    Seen again: protecting with mutexes and try-again loops in the opensafd nid script does not help when a node reboot is triggered, because other services will be shutting down and causing unexpected errors.

  • Rafael Odzakow committed [eae3dd]

    smf: try to wait for opensafd status before reboot [#2464]

  • Rafael Odzakow committed [0eea25]

    smf: execLevel for balanced upgrade [#2555]

  • Rafael Odzakow modified ticket #2555

    smf: execLevel for balanced upgrade

  • Rafael Odzakow modified ticket #2555

    smf: execLevel for balanced upgrade

  • Rafael Odzakow modified ticket #2555

    smf: execLevel for balanced upgrade

  • Rafael Odzakow created ticket #2555

    smf: execLevel for balanced upgrade

  • Rafael Odzakow modified ticket #2441

    smf: coredump and syslog flood after immnd crash

  • Rafael Odzakow posted a comment on ticket #2441

    Setting it to minor until it shows up again.

  • Rafael Odzakow modified ticket #2441

    smf: coredump and syslog flood after immnd crash

  • Rafael Odzakow modified ticket #2464

    smf: try to wait for opensafd status before executing reboot

  • Rafael Odzakow posted a comment on ticket #2464

    Solved in base OpenSAF commit a051496719a3c862594af17d88b082031dd53b33 (ticket-2459)

  • Rafael Odzakow committed [afcdbe]

    nid: order of system log print out is not correct [#2541]

  • Rafael Odzakow created ticket #2541

    nid: order of system log print out is not correct

  • Rafael Odzakow modified ticket #2521

    smf: no node locking when procedures are empty

  • Rafael Odzakow posted a comment on ticket #2521

    for rolling upgrades only commit 653edb5d9b217f1a3280b5aed8597fb53ffa5f61

  • Rafael Odzakow committed [653edb]

    smf: no node locking when procedures are empty [#2521]

  • Rafael Odzakow modified ticket #2521

    smf: no node locking when procedures are empty

  • Rafael Odzakow modified ticket #2521

    smf: no node locking when procedures are empty

  • Rafael Odzakow committed [9f8015]

    smf: no node locking when procedures are empty [#2521]

  • Rafael Odzakow modified ticket #2521

    smf: remove node locking with empty procedures

  • Rafael Odzakow created ticket #2521

    smf: remove node locking with empty procedures

  • Rafael Odzakow posted a comment on ticket #2499

    fixed in commit 3e1d1091270fa83cb8efe5458d6050b56f41f001 Author: Rafael Odzakow rafael.odzakow@ericsson.com Date: Fri Jun 30 10:57:36 2017 +0200 smf: 20 seconds timeout in getting node destination is not enough [#2499]

  • Rafael Odzakow modified ticket #2499

    SMF: 20 seconds timeout in getting node destination is not enough

  • Rafael Odzakow committed [3e1d10]

    smf: 20 seconds timeout in getting node destination is not enough [#2499]

  • Rafael Odzakow posted a comment on ticket #2451

    For the node that is not allowed to join the CLM cluster, will this solution also block IMM (and other services) from starting up?

  • Rafael Odzakow modified a comment on ticket #2499

    This issue is, as far as I could see, a bug. In other campaign sequences SMF will wait with rebootTimeout before doing any operation after reboot. In this campaign sequence the first operation type after a reboot was to do a CLI command on a payload node. This timed out because the CLI command is not wrapped in a retry using the rebootTimeout of SMF. SMF does not keep track of all nodes after a cluster reboot; therefore the mechanism for handling a cluster reboot is to wrap all possible operations that...

  • Rafael Odzakow modified a comment on ticket #2499

    This issue is, as far as I could see, a bug. In other campaign sequences SMF will wait with rebootTimeout before doing any operation after reboot. In this campaign sequence the first operation type after a reboot was to do a CLI command on a payload node. This timed out because the CLI command is not wrapped in a retry using the rebootTimeout of SMF. SMF does not keep track of all nodes after a cluster reboot; therefore the mechanism for handling a cluster reboot is to wrap all possible operations that...

  • Rafael Odzakow committed [b48d48]

    smf: 20 seconds timeout in getting node destination is not enough [#2499]

  • Rafael Odzakow posted a comment on ticket #2499

    This issue is, as far as I could see, a bug. In other campaign sequences SMF will wait with rebootTimeout before doing any operation after reboot. In this campaign sequence the first operation type after a reboot was to do a CLI command on a payload node. This timed out because the CLI command is not wrapped in a retry using the rebootTimeout of SMF. SMF does not keep track of all nodes after a cluster reboot; therefore the mechanism for handling a cluster reboot is to wrap any operation that is done...

  • Rafael Odzakow modified ticket #2459

    try-again for opensafd stop

  • Rafael Odzakow posted a comment on ticket #2459

    commit a051496719a3c862594af17d88b082031dd53b33 (ticket-2459) base: Try again for opensafd stop [#2459] Internally opensafd creates a mutex during start/stop to avoid parallel execution. Make the mutex more robust and add a short retry if the mutex is taken.

  • Rafael Odzakow committed [a05149]

    base: Try again for opensafd stop [#2459]

  • Rafael Odzakow posted a comment on ticket #2499

    Going for a short vacation, here is the untested patch. Use rebootTimeout to increase the timeout for it. commit 2ffbd1c5cd3f4193fd631130eef60b17c92892e6 (HEAD -> ticket-2499) Author: Rafael Odzakow rafael.odzakow@ericsson.com Date: Tue Jun 20 16:10:12 2017 +0200 smf: 20 seconds timeout in getting node destination is not enough [#2499] diff --git a/src/smf/smfd/SmfUpgradeStep.cc b/src/smf/smfd/SmfUpgradeStep.cc index 2ffeab110..a99c7661a 100644 --- a/src/smf/smfd/SmfUpgradeStep.cc +++ b/src/smf/smfd/SmfUpgradeStep.cc...

  • Rafael Odzakow posted a comment on ticket #2499

    It should be enough to wrap getNodeDestination in waitForGetNodeDestination in SmfCliCommandAction::execute(). Other getNodeDestination calls either do not need to wait for nodes or have custom code for retry.

  • Rafael Odzakow posted a comment on ticket #2499

    If you have the logs please send them my way.

  • Rafael Odzakow posted a comment on ticket #2499

    waitForNodeDestination already uses smfRebootTimeout. Is it still timing out or was getNodeDestination called without the waitFor wrapper?

  • Rafael Odzakow modified ticket #2459

    try-again for opensafd stop

  • Rafael Odzakow committed [29bb1d]

    base: Try again for opensafd stop [#2459]

  • Rafael Odzakow committed [a7b509]

    base: Try again for opensafd stop [#2459]

  • Rafael Odzakow modified ticket #2464

    smf: try to wait for opensafd status before executing reboot

  • Rafael Odzakow committed [941789]

    smf: try to wait for opensafd status before executing reboot [#2464]

  • Rafael Odzakow created ticket #2464

    smf: try to wait for opensafd status before executing reboot

  • Rafael Odzakow committed [790dab]

    base: Improve state report for opensafd [#2459]

  • Rafael Odzakow modified ticket #2459

    improve state report for opensafd

  • Rafael Odzakow committed [5e1e02]

    base: Improve state report for opensafd [#2459]

  • Rafael Odzakow modified ticket #2459

    improve state report for opensafd

  • Rafael Odzakow created ticket #2459

    graceful shutdown of opensafd

  • Rafael Odzakow created ticket #2441

    smf: coredump and syslog flood after immnd crash

  • Rafael Odzakow posted a comment on ticket #1969

    pushed to develop with commit f9149b49420d989b6ffcaf0f3553c5452e7e2302

  • Rafael Odzakow modified ticket #1969

    smf: One step upgrade with cluster reboot does not wait for nodes to start

  • Rafael Odzakow committed [f9149b]

    smf: cli-command does not wait for nodes to start [#1969]

  • Rafael Odzakow posted a comment on ticket #2419

    I consider the AMF objects as an interface and some external code outside of OpenSAF might be reading that campaignDN attribute.

  • Rafael Odzakow modified ticket #2402

    base: "hardening" use of lockfile in opensafd

  • Rafael Odzakow posted a comment on ticket #2402

    I have seen an issue with the lockfile. Here are some parts from the system log: 21:59:15 SC-1 opensafd: Starting OpenSAF Services(5.2.0 - 8767:c1cc2a915e72:default) (Using TCP) Reboot command is issued from SC-2: 21:59:16 SC-2 osafsmfd[599]: NO STEP: Reboot node for removal safAmfNode=SC-1,safAmfCluster=myAmfCluster SC-1 is not finished with the start of opensaf. This line is missing from SC-1: opensafd: OpenSAF services successfully started 21:59:17 SC-1 opensafd: Stopping OpenSAF Services 21:59:17...

  • Rafael posted a comment on ticket #2419

    It is possible to do it both ways, but I prefer to do this in AMF because it appears that the campaign dn was set on the objects before #2144 and #2145 were introduced. It was set by SMF and most likely the attribute was never used, but I can't say for sure. The safe solution is to keep setting it just as it has been previously. As for turning this on/off during a campaign: if someone external decides to change things in IMM during upgrade, then we cannot guarantee that the campaign will be successful....

  • Rafael committed [a424e4]

    smf: cli-command does not wait for nodes to sta...

  • Rafael modified a comment on ticket #2419

    Suggestion is to disable setting the maintenance campaign attribute on the AMF object...

  • Rafael posted a comment on ticket #2419

    Suggestion is to disable setting the maintenance campaign attribute on the AMF object...

  • Rafael modified a comment on ticket #2419

    Hej, valid question. In the case that we looked at the component recovered automatically...

  • Rafael modified a comment on ticket #2419

    Hej, valid question. In the case that we looked at the component recovered automatically...

  • Rafael modified ticket #1969

    smf: One step upgrade with cluster reboot does not wait for nodes to start

  • Rafael modified ticket #2419

    smf: when fixing ticket #2145 a NBC problem was introduced

  • Rafael posted a comment on ticket #2419

    Hej, valid question. In the case that we looked at the component recovered automatically...

  • Rafael committed [37e738]

    smf: admin owner err_exist on parallel procedur...

  • Rafael committed [37f663]

    smf: admin owner err_exist on parallel procedur...

  • Rafael modified ticket #2288

    smf: admin owner err_exist on parallel procedures

  • Rafael posted a comment on ticket #2288

    Pushed to default branch: HG changeset patch User Rafael Odzakow rafael.odzakow@ericsson.com...

  • Rafael committed [12d8b8]

    smf: admin owner err_exist on parallel procedur...

  • Rafael modified a comment on ticket #2288

    campaign example: <softwareBundle name="safSmfBundle=BundleA"> <removal> <offline...

  • Rafael modified a comment on ticket #2288

    campaign example: <softwareBundle name="safSmfBundle=BundleA"> <removal> <offline...
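
Note on ticket #2499: the comments on that ticket describe wrapping the node-destination lookup in a retry bounded by SMF's rebootTimeout, so that a CLI command issued right after a cluster reboot waits for the node instead of failing on a short fixed timeout. The following is a minimal Python sketch of that idea only, not the SMF C++ code. The names `wait_for` and `fake_get_node_destination`, the polling interval, and the return values are all hypothetical and chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(lookup: Callable[[], Optional[T]],
             timeout_s: float,
             interval_s: float = 1.0) -> Optional[T]:
    """Call lookup() until it returns a non-None value or timeout_s elapses.

    Hypothetical stand-in for a "wait for node destination" wrapper:
    lookup plays the role of the plain destination query, and timeout_s
    plays the role of a reboot timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = lookup()
        if result is not None:
            return result              # node answered, stop retrying
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None                # gave up: the timeout elapsed
        time.sleep(min(interval_s, remaining))


# Simulated node that only becomes reachable on the third query.
attempts = {"n": 0}


def fake_get_node_destination() -> Optional[str]:
    attempts["n"] += 1
    return "dest-of-PL-3" if attempts["n"] >= 3 else None


dest = wait_for(fake_get_node_destination, timeout_s=5.0, interval_s=0.01)
never = wait_for(lambda: None, timeout_s=0.05, interval_s=0.01)
```

In this sketch `dest` holds the simulated destination after three attempts, while `never` is `None` because the lookup never succeeds within the timeout. Bounding the retry by a configurable timeout rather than a fixed 20 seconds is the design point raised in the ticket.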
