I have a distributed 1-Wire (OW) network run by two servers (TinyCore, CentOS 6.x), both running owserver, which my client software reads through the Perl binding of the software. The hardware is USB DS9490 adapters, many DS18B20 sensors, and hubs/switches bought at Hobby-Boards.
Occasionally, one or the other owserver instance stops responding, sometimes making OW::get wait indefinitely, sometimes returning 'undef' after some timeout. The only way to recover seems to be to kill and restart both owserver and the monitoring software.
I took to sending an OW::init command to each server before issuing the OW::get, assuming this would initialize something. This works fine until the owserver dies.
Once a server dies, though (e.g. when I intentionally kill the process for testing purposes), not only does OW::get from the dead server fail, but so does OW::get from the other. Restarting the owserver does not help: reads still fail from both owservers.
Restarting the software performing the reads makes everything work again, provided the owserver has been restarted.
Using an initial OW::init to both servers in one call seems to be the more stable situation with OW::get working at least on the 'surviving' owserver, and actually starting to work again on the restarted owserver (despite new process ID etc). On the other hand this scenario 'seems to increase' the occasions of an OW::get never coming back.
The only reliable way of dealing with the instability that I have found so far is to use cron as a scheduler to start the software for a one-time read. If a previous process of the same software is still running, it is killed, and the owservers are killed and restarted before performing the read.
Is there a recommended way of dealing with errors or, equally importantly, a clean way to reset the entire 1-wire system from within a running application?
Regards
Steffen
First, if owserver stops responding and you are using the DS9490, that could be due to USB problems. Check the kernel logs with "dmesg" in that case. Usually owserver reconnects to a "newly" found DS9490, but sometimes this seems to fail for unknown reasons. In that case, the only way to get it working again is to restart the owserver process. (It could be worse: the DS9490 is also prone to hanging internally on some occasions. This is rare, but when it happens you have to disconnect it from USB completely.)
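To make the "dmesg" check quicker, something like the following could be used to filter the kernel log. This is only a rough sketch: the function name usb_trouble and the keyword list are my own assumptions, and the messages your kernel actually prints may differ (ds2490 is the chip inside the DS9490 and the name of the Linux w1 driver for it, in case that driver is loaded).

```shell
#!/bin/sh
# Keep only kernel log lines that hint at USB trouble or mention ds2490.
# The keyword list is an assumption; adjust it to what your kernel prints.
usb_trouble() {
    grep -iE 'ds2490|usb.*(disconnect|reset|error|timeout)'
}

# Typical use on the machine hosting the adapter:
#   dmesg | usb_trouble | tail -n 20
```

If this shows disconnects or resets around the time owserver stopped answering, the USB side is the likely culprit rather than owserver itself.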
And yes, you should call OW::init only once, and in that call you should always list all the devices you want to access. I don't know why the Perl binding allows connecting multiple times. It's not intended to work that way.
For OW::get not coming back, I recommend tweaking the network and server nodes under the /settings/timeout path of owfs.
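As a rough illustration from the command line, assuming the owshell tools (owread, owwrite) and GNU coreutils "timeout" are installed: the host names, sensor ID and timeout values below are made up, and the helper names bounded and read_sensor are hypothetical. Check the units of the timeout nodes for your owfs version before writing to them.

```shell
#!/bin/sh
# Run a command, but give up after LIMIT seconds.
# GNU timeout exits with status 124 when the limit is hit.
bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Read one value from one owserver without ever blocking forever.
# Example arguments (hypothetical):
#   read_sensor host1:4304 /28.0123456789AB/temperature
read_sensor() {
    server=$1
    path=$2
    bounded 10 owread -s "$server" "$path"
}

# Adjusting the timeout nodes mentioned above (values are assumptions):
#   owwrite -s host1:4304 /settings/timeout/network 5
#   owwrite -s host1:4304 /settings/timeout/server 10
```

A non-zero status from read_sensor (124 for a timeout) can then be used by the calling script to decide whether to restart owserver, instead of waiting on a read that never returns.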