[Refdb-users] Recovering after errors
From: Markus H. <mar...@mh...> - 2006-07-11 21:57:12
Hi Dan,

Dan O'Donnell writes:
> I've been having trouble lately with refdb crashing after errors and
> then being extremely difficult to restart. A typical cause for something
> like this might be an incorrect command-line option (I once used -t for
> the file name instead of -f and was forced to use control c to abort).
> After an error like this, refdb seems to lose contact with its databases.
> Mysql, however, is still working.
>

What command were you trying to run? Do you remember the exact command
line? I'd like to replay what was going wrong here. It is likely that
refdbd lacks a few sanity checks for variable values.

> I've tried several things to get it running again from using refdbctl to
> stop and start, to restarting mysql and apache2, to removing my
> configuration files and reinstalling refdb with the installation script
> (often when I come to do this there is one or more refdbd sessions not
> properly killed off and immune to refdbctl [I kill them manually])
>

refdbctl kills only the process that registered its PID in the
appropriate file. If you bypass refdbctl and start refdbd manually, you
may end up running two processes, one of which can't be killed with
refdbctl. Also, if something goes grossly wrong, you may have the parent
and the child around at the same time. If the child hangs (which it
should never do, of course), you can only kill it manually from the
process list.

> I usually get set and viewstat to work in refdba, and can usually
> selectdb and use whichdb in refdbc. But things hang up the moment I try
> to add any references.
>
> Any ideas what might be making it so unstable? Here's the log of my last
> session. I imagine the problem is the lost version file
> at /usr/local/var/lib/refdb/db/DB_VERSION. In refdbdrc this path is
> given as /usr/var/lib/refdb/db/ so I'm not sure what is telling it to
> look in this (non-existent) directory unless something in the setup has
> missed my original prefix parameter:
>

Is that the log of a failed addref command? You should re-run this test
with the log level set to 7. The "error" message does not seem to have
much to do with the DB_VERSION stuff. The latter is only a means to give
a packaging tool a hint about the database version without having to
look at the database itself (which might require username and password
info). If refdbd can't update this file, it will continue without a
hitch. Neither the file nor the write attempt has any meaning for the
running process.

> > 6:pid=5412:Tue Jul 11 20:13:15 2006:adding client 127.0.0.1 on fd 6
> > 6:pid=5412:Tue Jul 11 20:13:15 2006:server waiting n_max_fd=6
> > 6:pid=5456:Tue Jul 11 20:13:15 2006:serving client on fd 6 with protocol version 4
> > 6:pid=5456:Tue Jul 11 20:13:15 2006:dbi is up
> > 3:pid=5456:Tue Jul 11 20:13:15 2006:could not open version file:
> > 3:pid=5456:Tue Jul 11 20:13:15 2006:/usr/local/var/lib/refdb/db/DB_VERSION
> > 4:pid=5456:Tue Jul 11 20:19:29 2006:error
> > 6:pid=5456:Tue Jul 11 20:19:29 2006:child finished client on fd 6
>

My only advice (until I get more thorough debug info) is to make sure
that you kill all hanging child processes if things go wrong. On many
OSes the process IDs count up, so the child is usually the process with
the higher ID. Other OSes pick random numbers, so you'll have to kill
them all.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
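
P.S. As a rough aid for spotting which child to kill, the log lines quoted above can be scanned for processes that started serving a client but never reported finishing. The Python sketch below is only an illustration: the field layout (level, pid, timestamp, message) is inferred from the eight quoted lines, and the function names and the assumption that "serving client" and "child finished client" bracket a child's lifetime are mine, not anything documented for refdbd.

```python
import re

# Layout inferred from the quoted log: "<level>:pid=<pid>:<timestamp>:<message>".
# The timestamp itself contains colons, so it is matched explicitly.
LOG_LINE = re.compile(
    r"^(?P<level>\d+):pid=(?P<pid>\d+):"
    r"(?P<stamp>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}):"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return (level, pid, timestamp, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return int(m["level"]), int(m["pid"]), m["stamp"], m["msg"]


def unfinished_children(lines):
    """List pids that logged 'serving client' but no 'child finished client'.

    Assumption: these two messages mark the start and end of a child process.
    """
    started, finished = [], set()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip anything that is not a log record
        _, pid, _, msg = rec
        if msg.startswith("serving client") and pid not in started:
            started.append(pid)
        elif msg.startswith("child finished client"):
            finished.add(pid)
    return [pid for pid in started if pid not in finished]


# The log excerpt from this thread, without the quoting prefixes.
SAMPLE = [
    "6:pid=5412:Tue Jul 11 20:13:15 2006:adding client 127.0.0.1 on fd 6",
    "6:pid=5412:Tue Jul 11 20:13:15 2006:server waiting n_max_fd=6",
    "6:pid=5456:Tue Jul 11 20:13:15 2006:serving client on fd 6 with protocol version 4",
    "6:pid=5456:Tue Jul 11 20:13:15 2006:dbi is up",
    "3:pid=5456:Tue Jul 11 20:13:15 2006:could not open version file:",
    "3:pid=5456:Tue Jul 11 20:13:15 2006:/usr/local/var/lib/refdb/db/DB_VERSION",
    "4:pid=5456:Tue Jul 11 20:19:29 2006:error",
    "6:pid=5456:Tue Jul 11 20:19:29 2006:child finished client on fd 6",
]

if __name__ == "__main__":
    print(unfinished_children(SAMPLE))
```

Any pid this reports is only a candidate; check it against the process list before killing anything.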