I submitted this patch a couple of weeks ago, but it may have fallen
through the cracks. Here it is again, re-integrated with today's CVS.
Our testing group at IBM noticed that they would get intermittent error
messages when running their test suite on PPC machines with a 2.6.18 kernel:
"line 1292: echo: write error: Invalid argument"
We were able to reproduce this by running the following script:
for (( i=1; i<=50; i++ )); do
This problem only occurs when opcontrol --start is used on its own,
rather than the two-step sequence opcontrol --start-daemon; opcontrol --start.
After some debugging we concluded that there is a race condition between:
The problem is that the daemon doesn't trigger the kernel driver setup
until AFTER the child process (the daemon) is forked and the parent
process returns control to opcontrol. This makes it possible for
opcontrol to trigger oprof.c:oprofile_start() in the kernel driver
before oprof.c:oprofile_setup() has completed. When oprofile_start()
wins the race, is_setup==0, and you get the "write error" message.
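
To make the two orderings concrete, here is a small toy model of the
race. It is only a sketch: ModelDriver, setup(), start() and run() are
hypothetical names made up for illustration, not oprofile code. The one
assumption carried over from the kernel side is that a start attempted
while is_setup==0 is rejected with EINVAL, which matches the "Invalid
argument" message above.

```python
import errno


class ModelDriver:
    """Toy stand-in for the kernel driver state (hypothetical, not oprofile code)."""

    def __init__(self):
        self.is_setup = False
        self.started = False

    def setup(self):
        # Models oprofile_setup() completing.
        self.is_setup = True
        return 0

    def start(self):
        # Models oprofile_start() refusing to run before setup has completed.
        if not self.is_setup:
            return -errno.EINVAL
        self.started = True
        return 0


def run(order):
    """Apply the named steps in the given order and collect their return codes."""
    driver = ModelDriver()
    return [getattr(driver, step)() for step in order]


# Daemon wins the race: setup completes before opcontrol triggers start.
daemon_first = run(["setup", "start"])

# opcontrol wins the race: start is attempted while is_setup is still 0.
opcontrol_first = run(["start", "setup"])
```

In this model daemon_first contains two successes, while the first entry
of opcontrol_first is -EINVAL, which is the failure the test suite saw
intermittently.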
The fix for this is to move the call to opd_open_files() to the end of
opd_26_init(), which will cause the parent process to wait for the kernel
driver to complete setup in drivers/oprofile/oprof.c:oprofile_setup()
before returning control to the daemon. The daemon will then fork the
child process, which will create the lock file and return control to
opcontrol, allowing it to check for the lock file.
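
The ordering that the fix establishes can be sketched roughly as
follows. This is an illustration only, not the patch itself:
start_daemon(), the "driver_setup" marker file and the "lock" file are
hypothetical stand-ins for the driver's is_setup flag and the daemon's
lock file, and the real change is in opd_26_init(). The point it shows
is that setup finishes in the parent before fork(), so nothing the
caller does afterwards can run ahead of it.

```python
import os


def start_daemon(state_dir):
    """Hypothetical daemon start-up using the fixed ordering."""
    setup_marker = os.path.join(state_dir, "driver_setup")  # stands in for is_setup
    lock_file = os.path.join(state_dir, "lock")             # stands in for the lock file

    # Step 1 (parent, before fork): trigger driver setup and wait for it.
    with open(setup_marker, "w") as f:
        f.write("1")

    # Step 2: fork the child (the daemon), which creates the lock file.
    pid = os.fork()
    if pid == 0:
        with open(lock_file, "w") as f:
            f.write(str(os.getpid()))
        os._exit(0)  # a real daemon would keep running here

    # Step 3 (parent): hand control back to the caller (opcontrol's role).
    return pid


def driver_is_setup(state_dir):
    """True once the stand-in for driver setup has completed."""
    return os.path.exists(os.path.join(state_dir, "driver_setup"))
```

With this ordering, driver_is_setup() is already true when
start_daemon() returns, regardless of how quickly the child gets
scheduled.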
Although the same race condition probably exists in kernel version 2.4, I
did not change daemon/liblegacy/init.c.
LTC Linux Power Toolchain