After a restart of the secondary master node, fc-brokerd was unable to start dynamic VMs on that node.
The logs showed that fc-brokerd was unable to get the list of available/running VMs on that node.
A restart of fc-brokerd solved the problem.
Please attach a log file (preferably with the daemon started at loglevel 7).
Can it lead to problems if virConnectIsAlive() is called repeatedly to check if the connection still exists?
virConnectIsAlive() seems to be a simple ping function, so you can most probably use it at will.
But can't you simply invalidate the connection handle in the "catch(VirtException &e)..." block to force a reconnect on the next run?
Getting the number of domains running is a pretty basic command. Failing in that context most probably means that you lost the connection to the remote libvirt server and the easiest way to recover from that is to reconnect.
You can then improve on that by following the libvirt Application Development Guide (http://libvirt.org/guide/pdf/Application_Development_Guide.pdf) and doing proper error handling in your catch block. virGetLastError() returns a pointer to the current virError struct, whose "code" member is a virErrorNumber that can be decoded according to http://libvirt.org/html/libvirt-virterror.html
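
To illustrate the "invalidate the handle in the catch block" idea, here is a minimal sketch. It does not reflect the actual fc-brokerd code: DomainPoller, Handle, OpenFn and CountFn are hypothetical names, and VirtException is redefined locally as a stand-in for the daemon's own exception type. The libvirt calls are injected as callbacks so that the sketch compiles without libvirt; in the daemon, the open callback would presumably wrap virConnectOpen() and the count callback virConnectNumOfDomains().

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>

// Stand-in for the daemon's exception type (assumption: the real
// VirtException is defined elsewhere in fc-brokerd).
struct VirtException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical helper that owns the connection handle and drops it
// whenever a libvirt call fails, so the next run reconnects.
class DomainPoller {
public:
    // Opaque handle; in real code this would wrap a virConnectPtr.
    using Handle = std::shared_ptr<void>;
    // Assumed to wrap virConnectOpen(); may return null or throw on failure.
    using OpenFn = std::function<Handle()>;
    // Assumed to wrap virConnectNumOfDomains() and to throw
    // VirtException when libvirt reports an error.
    using CountFn = std::function<int(const Handle &)>;

    DomainPoller(OpenFn open, CountFn count)
        : open_(std::move(open)), count_(std::move(count)) {}

    // Returns the number of running domains, or -1 if this run failed.
    int poll() {
        try {
            if (!conn_) {
                conn_ = open_();          // (re)connect lazily
            }
            if (!conn_) {
                return -1;                // could not connect this time
            }
            return count_(conn_);
        } catch (const VirtException &) {
            conn_.reset();                // invalidate: forces a reconnect on the next run
            return -1;
        }
    }

    bool connected() const { return conn_ != nullptr; }

private:
    OpenFn open_;
    CountFn count_;
    Handle conn_;
};
```

With this shape, a single failed poll costs one run: the handle is dropped in the catch block and the next call to poll() opens a fresh connection. Checking virGetLastError() inside the catch block could later be used to reconnect only on connection-related error codes.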
Problem is still present in fc-brokerd 1.2.11: