While writing some unit tests for PyTango, I noticed an intriguing timeout issue in the C++ API: it takes a long time to start a device server in a subprocess after some unrelated Tango calls in the parent process. More precisely, the Util object instantiation takes 5 seconds.
Here's the code to reproduce this issue:
import os
import tango
from tango.server import Device, DeviceMeta

class Test(Device):
    __metaclass__ = DeviceMeta

# Unrelated tango call
db = tango.Database()

# Run the server in a subprocess (using fork)
os.wait() if os.fork() else Test.run_server()
I'm fairly confident the equivalent C++ code will produce the same issue.
Dear Vincent,
thanks for the bug report.
Actually, it seems to be a bad idea to do a fork in a multithreaded program, as explained at this link:
http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them.
If you use fork, you should instead start your device server with one of the functions from the exec family. This is what the Starter device server does, for instance. (Starter actually does a double fork + execxxx.)
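Translated to code, the double fork + exec pattern looks roughly like this. This is a minimal POSIX sketch, not the actual Starter implementation; the function name and the placeholder command are illustrative:

```python
import os
import sys

def spawn_detached(cmd, argv):
    # First fork: create an intermediate child we can reap immediately.
    pid = os.fork()
    if pid == 0:
        # Second fork: the grandchild is reparented to init once the
        # intermediate process exits, so it outlives the parent cleanly.
        if os.fork() == 0:
            # exec replaces the process image entirely: no inherited
            # CORBA threads, mutexes, or ORB state survive it.
            os.execvp(cmd, argv)
        os._exit(0)  # intermediate child exits right away
    os.waitpid(pid, 0)  # reap the intermediate child (no zombie left)

# Demo with a harmless placeholder command instead of a real device server:
spawn_detached(sys.executable, [sys.executable, "-c", "print('server starts here')"])
```

Because the grandchild calls exec, it starts from a fresh process image, which avoids the multithreaded-fork pitfalls discussed above.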
In the example you provided, the creation of the Database object will initialize CORBA (as a client) and connect to the database CORBA object. This will create some CORBA threads. omniORB will create a special thread called the Scavenger thread, which scans the CORBA threads for idle connections (every 5 s by default, defined by the ORBscanGranularity parameter) and automatically kills the idle connection threads after a while (180 s or 120 s by default).
When the device server is started, Tango tries to destroy the ORB which was previously created, because it was created for a CORBA client only. We need an ORB for a CORBA server in this case.
The 5-second timeout you are observing happens when Tango tries to destroy the previously created ORB. omniORB then waits for each previously created thread to stop, with a timeout corresponding to the ORBscanGranularity parameter (5 seconds by default). But because of what is described in the link I provided before, in particular because of some critical sections/mutexes, the child process cannot stop the threads which were created by the parent process, so it has to wait until the end of this timeout.
One work-around to remove this 5-second wait is to set the ORBscanGranularity environment variable to 0. This will disable the omniORB Scavenger thread and make the ORB destruction fast, BUT you have to be aware that idle connection threads won't be removed any longer if you do that! This might be acceptable in your use case, since this is for unit tests, but please be careful when using it.
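For reference, the environment variable has to be set before omniORB is first initialized, i.e. before the first tango import in the process. A minimal sketch (the tango import is commented out so the snippet stands alone):

```python
import os

# Disable omniORB's Scavenger thread: no periodic scan for idle
# connections, so tearing down the client-side ORB before starting the
# device server no longer blocks for ORBscanGranularity seconds.
# This must be set before tango/omniORB is first initialized.
os.environ["ORBscanGranularity"] = "0"

# import tango  # omniORB reads the variable when it initializes
# db = tango.Database()
```

Setting it from the shell before launching the test process works just as well.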
Please be aware that passing -ORBscanGranularity 0 as an argin parameter when starting the device server will not change the behaviour, since this parameter is taken into account only after the previous ORB has been destroyed; you would still get the timeout in this case. Using the ORBscanGranularity environment variable should make this 5-second wait disappear.
What you could also do is set ORBscanGranularity to 0 in the environment and pass the -ORBscanGranularity argin parameter set to 5 at device server creation. (I am not familiar with PyTango, but I guess there should be a way to pass this kind of parameter at device server creation.) This way, the device server will still run with the Scavenger thread, and idle connection threads will still be automatically removed for your device server.
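Combined, that could look like the sketch below. Note the hedge: the run_server call is hypothetical and commented out, since the exact way to pass extra command-line arguments depends on the PyTango version; check its documentation.

```python
import os

# 1) In the parent/test process: disable the Scavenger thread before
#    omniORB initializes, so destroying the client-side ORB is fast.
os.environ["ORBscanGranularity"] = "0"

# 2) For the device server itself: re-enable the 5 s scan on the
#    command line, which omniORB only reads when the new server-side
#    ORB is created (i.e. after the old one has been destroyed).
server_args = ["Test", "instance", "-ORBscanGranularity", "5"]

# In the forked child one would then run something like
# (hypothetical call, check how your PyTango version passes args):
# Test.run_server(args=server_args)
```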
Hoping this helps a bit.
Kind regards
Reynald (with the help of Manu for trying to understand this issue).
Thanks Reynald for the thorough answer, it's much clearer now.
In this case, we can't really do that. The device classes are created in the test functions (sometimes dynamically), so there is no specific server to run using exec.
Indeed, I just checked with this code:
And the 5-second timeout is still there.
This solution actually works fine!
Thanks a lot for your help.
I can add a bit of information about my experimentation with unit-testing.
The thread approach:
The subprocess approach:
Then I use a library called pytest-xdist that lets me combine those approaches to get the best of both worlds. This lib has a --boxed option to automatically run the collected tests in isolated subprocesses. Since that happens before any kind of CORBA operation, the problem we've been discussing does not apply. The test function can then either run one server in a thread, or several servers in subprocesses, depending on the use case.

Thanks,
/Vincent