Thread: [Refdb-users] Recovering after errors
From: Dan O'D. <dan...@ul...> - 2006-07-11 20:37:55
I've been having trouble lately with refdb crashing after errors and then being extremely difficult to restart. A typical cause might be an incorrect command-line option (I once used -t for the file name instead of -f and was forced to use Ctrl-C to abort). After an error like this, refdb seems to lose contact with its databases. MySQL, however, is still working.

I've tried several things to get it running again: using refdbctl to stop and start it, restarting mysql and apache2, and removing my configuration files and reinstalling refdb with the installation script. (Often when I come to do this there are one or more refdbd sessions that were not properly killed off and are immune to refdbctl; I kill them manually.) I can usually get set and viewstat to work in refdba, and can usually selectdb and use whichdb in refdbc. But things hang the moment I try to add any references.

Any ideas what might be making it so unstable? Here's the log of my last session. I imagine the problem is the missing version file at /usr/local/var/lib/refdb/db/DB_VERSION. In refdbdrc this path is given as /usr/var/lib/refdb/db/, so I'm not sure what is telling it to look in this (non-existent) directory, unless something in the setup has missed my original prefix parameter:

> 6:pid=5412:Tue Jul 11 20:13:15 2006:adding client 127.0.0.1 on fd 6
> 6:pid=5412:Tue Jul 11 20:13:15 2006:server waiting n_max_fd=6
> 6:pid=5456:Tue Jul 11 20:13:15 2006:serving client on fd 6 with protocol version 4
> 6:pid=5456:Tue Jul 11 20:13:15 2006:dbi is up
> 3:pid=5456:Tue Jul 11 20:13:15 2006:could not open version file:
> 3:pid=5456:Tue Jul 11 20:13:15 2006:/usr/local/var/lib/refdb/db/DB_VERSION
> 4:pid=5456:Tue Jul 11 20:19:29 2006:error
> 6:pid=5456:Tue Jul 11 20:19:29 2006:child finished client on fd 6
From: Markus H. <mar...@mh...> - 2006-07-11 21:57:12
Hi Dan,

Dan O'Donnell writes:
> I've been having trouble lately with refdb crashing after errors and
> then being extremely difficult to restart. A typical cause for something
> like this might be an incorrect command-line option (I once used -t for
> the file name instead of -f and was forced to use control c to abort).
> After an error like this, refdb seems to lose contact with its databases
> Mysql, however, is still working.
>

What command were you trying to run? Do you remember the exact command line? I'd like to replay what was going wrong here. It is likely that refdbd lacks a few sanity checks for variable values.

> I've tried several things to get it running again from using refdbctl to
> stop and start, to restarting mysql and apache2, to removing my
> configuration files and reinstalling refdb with the installation script
> (often when I come to do this there is one or more refdbd sessions not
> properly killed off and immune to refdbctl [I kill them manually])
>

refdbctl kills only the process that registered its PID in the appropriate file. If you bypass refdbctl and start refdbd manually, you may end up running two processes, one of which can't be killed with refdbctl. Also, if something goes grossly wrong, you may have the parent and the child around at the same time. If the child hangs (which it should never do, of course) you can only kill it manually from the process list.

> I usually get set and viewstat to work in refdba, and can usually
> selectdb and use whichdb in refdbc. But things hang up the moment I try
> to add any references.
>
> Any ideas what might be making it so unstable? Here's the log of my last
> session. I image the problem is the lost version file
> at /usr/local/var/lib/refdb/db/DB_VERSION. In refdbdrc this path is
> given as /usr/var/lib/refdb/db/ so I'm not sure what is telling it to
> look in this (non-existent) directory unless something in the setup has
> missed my original prefix parameter:
>

Is that the log of a failed addref command? You should re-run this test with the log level set to 7. The "error" message does not seem to have much to do with the DB_VERSION stuff. The latter is only a means to give a packaging tool a hint about the database version without having to look at the database itself (which might require username and password info). If refdbd can't update this file, it will continue without a hitch. The file or the write attempt has no meaning for the running process.

>
> > 6:pid=5412:Tue Jul 11 20:13:15 2006:adding client 127.0.0.1 on fd 6
> > 6:pid=5412:Tue Jul 11 20:13:15 2006:server waiting n_max_fd=6
> > 6:pid=5456:Tue Jul 11 20:13:15 2006:serving client on fd 6 with protocol version 4
> > 6:pid=5456:Tue Jul 11 20:13:15 2006:dbi is up
> > 3:pid=5456:Tue Jul 11 20:13:15 2006:could not open version file:
> > 3:pid=5456:Tue Jul 11 20:13:15 2006:/usr/local/var/lib/refdb/db/DB_VERSION
> > 4:pid=5456:Tue Jul 11 20:19:29 2006:error
> > 6:pid=5456:Tue Jul 11 20:19:29 2006:child finished client on fd 6
>

My only advice (until I get more thorough debug info) is to make sure that you kill all hanging child processes if things go wrong. On many OSes the process IDs count up, so the child is usually the process with the higher ID. Other OSes pick random numbers, so you'll have to kill them all.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
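Markus's point is that refdbctl only knows the PID it recorded, so any other refdbd has to be found in the process list. As a minimal sketch of that check, assuming a POSIX ps that accepts "-e -o pid=,comm=" and a hypothetical pidfile path (PIDFILE below is an assumption, not refdbctl's documented location), the following lists refdbd processes and flags the ones refdbctl will not stop:

```python
import subprocess
from pathlib import Path
from typing import Optional

# ASSUMPTION: hypothetical pidfile location; check where your refdbctl
# actually records the PID of the refdbd it started.
PIDFILE = Path("/var/run/refdbd.pid")


def parse_ps(ps_output: str, name: str = "refdbd") -> list[int]:
    """Return sorted PIDs of processes named `name` in `ps -e -o pid=,comm=` output."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == name:
            pids.append(int(parts[0]))
    return sorted(pids)


def unregistered(pids: list[int], registered: Optional[int]) -> list[int]:
    """PIDs other than the one in the pidfile, i.e. ones refdbctl will not stop.

    This also includes children legitimately serving a client right now,
    so inspect them before killing anything.
    """
    return [p for p in pids if p != registered]


def read_pidfile(path: Path = PIDFILE) -> Optional[int]:
    """Read the registered PID, or None if the file is missing or malformed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def main() -> None:
    try:
        out = subprocess.run(["ps", "-e", "-o", "pid=,comm="],
                             capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run ps:", exc)
        return
    pids = parse_ps(out)
    registered = read_pidfile()
    print("refdbd processes:       ", pids)
    print("PID in pidfile:         ", registered)
    print("not stopped by refdbctl:", unregistered(pids, registered))


if __name__ == "__main__":
    main()
```

The list is sorted ascending, so on systems where PIDs count up (as Markus notes), the later entries are usually the more recently forked children.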
From: Daniel O'D. <dan...@ul...> - 2006-07-11 23:30:08
Here's the log-level 7 output of my last session. I tried to add a single entry from a text file, ~/one.ris, using the following commands in refdbc; things just hung and I had to abort with ^C:

selectdb refdbib
addref -f one.ris

(I started refdbc in ~/.)

refdbd.log:
> 7:pid=6427:Tue Jul 11 23:21:05 2006:dbi_driver_dir went to:
> 7:pid=6427:Tue Jul 11 23:21:05 2006:
> 7:pid=6427:Tue Jul 11 23:21:05 2006:dbi is up using default driver dir
> 6:pid=6427:Tue Jul 11 23:21:05 2006:Available libdbi database drivers:
> 6:pid=6427:Tue Jul 11 23:21:05 2006:mysql
> 6:pid=6427:Tue Jul 11 23:21:05 2006:Requested libdbi driver found
> 6:pid=6427:Tue Jul 11 23:21:05 2006:Database directory:
> 6:pid=6427:Tue Jul 11 23:21:05 2006:/usr/var/lib/refdb/db
> 6:pid=6427:Tue Jul 11 23:21:05 2006:application server started
> 6:pid=6427:Tue Jul 11 23:21:05 2006:share extended notes by default
> 7:pid=6427:Tue Jul 11 23:21:05 2006:use /tmp/refdbd_fifo6427 as fifo
> 6:pid=6427:Tue Jul 11 23:21:05 2006:server waiting n_max_fd=5
> 6:pid=6427:Tue Jul 11 23:21:35 2006:adding client 127.0.0.1 on fd 6
> 6:pid=6427:Tue Jul 11 23:21:35 2006:server waiting n_max_fd=6
> 7:pid=6429:Tue Jul 11 23:21:35 2006:try to read from client
> 6:pid=6429:Tue Jul 11 23:21:35 2006:serving client on fd 6 with protocol version 4
> 7:pid=6429:Tue Jul 11 23:21:35 2006:210-21-04-49
> 7:pid=6429:Tue Jul 11 23:21:35 2006:send pseudo-random string to client
> 7:pid=6429:Tue Jul 11 23:21:35 2006:selectdb refdbib -u dan -w 072035094057068069114113082
> 6:pid=6429:Tue Jul 11 23:21:35 2006:dbi is up
> 7:pid=6429:Tue Jul 11 23:21:35 2006:localhost
> 7:pid=6429:Tue Jul 11 23:21:35 2006:dan
> 7:pid=6429:Tue Jul 11 23:21:35 2006:SecretPassWord
> 7:pid=6429:Tue Jul 11 23:21:35 2006:
> 7:pid=6429:Tue Jul 11 23:21:35 2006:3306
> 7:pid=6429:Tue Jul 11 23:21:35 2006:mysql
> 7:pid=6429:Tue Jul 11 23:21:35 2006:/usr/var/lib/refdb/db
> 7:pid=6429:Tue Jul 11 23:21:35 2006:
> 7:pid=6429:Tue Jul 11 23:21:35 2006:refdb
> 7:pid=6429:Tue Jul 11 23:21:35 2006:connected to database server using database:7:pid=6429:Tue Jul 11 23:21:35 2006:refdb
> 3:pid=6429:Tue Jul 11 23:21:35 2006:could not open version file:
> 3:pid=6429:Tue Jul 11 23:21:35 2006:/usr/local/var/lib/refdb/db/DB_VERSION
> 7:pid=6429:Tue Jul 11 23:21:35 2006:Main database looks ok:
> 7:pid=6429:Tue Jul 11 23:21:35 2006:refdb
> 7:pid=6429:Tue Jul 11 23:21:35 2006:localhost
> 7:pid=6429:Tue Jul 11 23:21:35 2006:dan
> 7:pid=6429:Tue Jul 11 23:21:35 2006:SecretPassWord
> 7:pid=6429:Tue Jul 11 23:21:35 2006:
> 7:pid=6429:Tue Jul 11 23:21:35 2006:3306
> 7:pid=6429:Tue Jul 11 23:21:35 2006:mysql
> 7:pid=6429:Tue Jul 11 23:21:35 2006:/usr/var/lib/refdb/db
> 7:pid=6429:Tue Jul 11 23:21:35 2006:
> 7:pid=6429:Tue Jul 11 23:21:35 2006:refdb
> 7:pid=6429:Tue Jul 11 23:21:35 2006:connected to database server using database:7:pid=6429:Tue Jul 11 23:21:35 2006:refdb
> 7:pid=6429:Tue Jul 11 23:21:35 2006:localhost
> 7:pid=6429:Tue Jul 11 23:21:35 2006:dan
> 7:pid=6429:Tue Jul 11 23:21:35 2006:SecretPassWord
> 7:pid=6429:Tue Jul 11 23:21:35 2006:
> 7:pid=6429:Tue Jul 11 23:21:35 2006:3306
> 7:pid=6429:Tue Jul 11 23:21:35 2006:mysql
> 7:pid=6429:Tue Jul 11 23:21:35 2006:/usr/var/lib/refdb/db
> 7:pid=6429:Tue Jul 11 23:21:35 2006:
> 7:pid=6429:Tue Jul 11 23:21:35 2006:refdbib
> 7:pid=6429:Tue Jul 11 23:21:35 2006:connected to database server using database:7:pid=6429:Tue Jul 11 23:21:35 2006:refdbib
> 7:pid=6429:Tue Jul 11 23:21:35 2006:SELECT meta_app,meta_type,meta_dbversion from t_meta
> 7:pid=6429:Tue Jul 11 23:21:35 2006:command processing done, finish dialog now
> 6:pid=6429:Tue Jul 11 23:21:35 2006:child finished client on fd 6
> 6:pid=6427:Tue Jul 11 23:21:35 2006:parent removing client on fd 6
> 6:pid=6427:Tue Jul 11 23:21:35 2006:server waiting n_max_fd=5
> 6:pid=6427:Tue Jul 11 23:21:35 2006:child exited with code 0
> 6:pid=6427:Tue Jul 11 23:21:35 2006:server waiting n_max_fd=5
> 6:pid=6427:Tue Jul 11 23:21:46 2006:adding client 127.0.0.1 on fd 6
> 6:pid=6427:Tue Jul 11 23:21:46 2006:server waiting n_max_fd=6

-- 
Daniel Paul O'Donnell, PhD
Associate Professor and Chair
Director, Digital Medievalist Project <http://www.digitalmedievalist.org/>
Department of English
University of Lethbridge
Lethbridge AB T1K 3M4
Tel. +1 (403) 329-2378
Fax. +1 (403) 382-7191
:@wiglaf (dapper ubuntu)
From: Markus H. <mar...@mh...> - 2006-07-12 11:24:39
Hi Dan,

Daniel O'Donnell <dan...@ul...> was heard to say:

> > 7:pid=6429:Tue Jul 11 23:21:35 2006:selectdb refdbib -u dan -w 072035094057068069114113082

I'm afraid you caught only the selectdb part of the log. I didn't find any hint of you running the addref command. Make sure to kill all instances of refdbd before looking at the log, as there may still be data in the cache as long as one process keeps the file open. Can you retrieve datasets from that database?

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
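Markus checks the log by eye for an addref. At log level 7 the child echoes the command it received (as the selectdb line shows), so the check can be automated. A minimal sketch, assuming the line layout LEVEL:pid=PID:TIMESTAMP:MESSAGE inferred from the excerpts in this thread (not from refdbd documentation); the COMMANDS set is an illustrative subset of refdbc command names, not an exhaustive list:

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional

# Layout inferred from the excerpts in this thread, not from refdbd's docs:
# LEVEL:pid=PID:CTIME-STYLE TIMESTAMP:MESSAGE
LOG_RE = re.compile(
    r"^(?P<level>\d):pid=(?P<pid>\d+):"
    r"(?P<ts>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}):"
    r"(?P<msg>.*)$"
)

# Illustrative subset of client command names (assumption, not exhaustive).
COMMANDS = {"selectdb", "whichdb", "addref", "updateref", "deleteref", "getref"}


@dataclass
class Entry:
    level: int
    pid: int
    ts: str
    msg: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; leading mail-quote markers ('> ') are stripped first."""
    m = LOG_RE.match(line.lstrip("> ").rstrip("\n"))
    if m is None:
        return None
    return Entry(int(m["level"]), int(m["pid"]), m["ts"], m["msg"])


def commands_by_pid(lines: Iterable[str]) -> dict[int, list[str]]:
    """Map each PID to the client command names it logged (needs level 7)."""
    seen = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        words = entry.msg.split()
        if words and words[0] in COMMANDS:
            seen[entry.pid].append(words[0])
    return dict(seen)
```

Run over the excerpt Dan posted, only selectdb turns up (under PID 6429), which agrees with Markus's reading; an addref that reached a child would appear under that child's PID.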
From: Dan O'D. <dan...@ul...> - 2006-07-12 17:14:16
Hi Markus,

That log was cut from the end of a tail -n 500, so it may mean addref is not firing somehow (i.e. there is nothing else in the log). That would agree with what I'm seeing.

After the first error, the db became unusable, so I wiped it and refdb out, refreshed mysql, and reinstalled refdb. This lack of action is coming on a fresh install. refdba is building the db and adding users, whichdb can see it, but addref seems to die.

-d
From: Markus H. <mar...@mh...> - 2006-07-12 17:22:58
Dan O'Donnell <dan...@ul...> was heard to say:

> Hi Markus,
>
> That log was cut from the end of tail n=500, so it may mean addref is
> not firing somehow (i.e. there is nothing else in the log). That would
> agree with what I'm seeing.
>
> After the first error, the db became unusable, so I wiped it and refdb
> out, refreshed mysql, and reinstalled refdb. This lack of action is
> coming on a fresh install. refdba is building the db and adding users,
> whichdb can see it, but addref seems to die.
>

Is this somehow related to the kind of data you try to add?

I see from the end of your log that the server receives a connection request from a client but nothing else happens. Needless to say, I haven't seen this before. Is there a chance to get an ssh guest account on this box to debug this?

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
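What Markus spots at the end of the log is a hand-off that never happens: the parent (pid 6427) logs "adding client 127.0.0.1 on fd 6" at 23:21:46, but no child ever logs "serving client on fd 6" afterwards. A minimal sketch that pairs those messages by file descriptor, assuming the message wording and the 24-character timestamp seen in this thread's logs; the pairing rule is an assumption inferred from the one completed exchange in the excerpt:

```python
import re
from typing import Iterable, Optional

ADDING = re.compile(r"adding client (\S+) on fd (\d+)")
SERVING = re.compile(r"serving client on fd (\d+)")
REMOVING = re.compile(r"parent removing client on fd (\d+)")


def split_entry(line: str) -> Optional[tuple[int, str, str]]:
    """Split 'LEVEL:pid=PID:TIMESTAMP:MESSAGE' into (pid, timestamp, message).

    Assumes the 24-character ctime-style timestamp seen in this thread.
    Returns None for lines that do not fit that layout.
    """
    parts = line.lstrip("> ").rstrip("\n").split(":", 2)
    if len(parts) < 3 or not parts[1].startswith("pid=") or not parts[1][4:].isdigit():
        return None
    rest = parts[2]
    if len(rest) < 25 or rest[24] != ":":
        return None
    return int(parts[1][4:]), rest[:24], rest[25:]


def unanswered_clients(lines: Iterable[str]) -> dict[int, tuple[str, str]]:
    """fd -> (timestamp, address) for connections the parent accepted but that
    no child logged as served and the parent never removed."""
    pending: dict[int, tuple[str, str]] = {}
    for line in lines:
        entry = split_entry(line)
        if entry is None:
            continue
        _pid, ts, msg = entry
        if m := ADDING.match(msg):
            pending[int(m.group(2))] = (ts, m.group(1))
        elif m := SERVING.match(msg):
            pending.pop(int(m.group(1)), None)
        elif m := REMOVING.match(msg):
            pending.pop(int(m.group(1)), None)
    return pending
```

If the log excerpt is cut off right after a connection was accepted, that connection will also show up as pending, so this is a pointer for where to look rather than proof of a hang.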
From: Dan O'D. <dan...@ul...> - 2006-07-12 17:31:03
On Wed, 2006-12-07 at 19:22 +0200, Markus Hoenicka wrote:
> Is this somehow related to the kind of data you try to add?

Shouldn't be. I believe it is a pretty straightforward RIS file (in UTF-8, though).

> I see from the end of your log that the server receives a connection request
> from a client but nothing else happens. Needless to say I haven't seen this
> before. Is there a chance to get a ssh guest account on this box to debug this?

Sending you one privately.
From: Markus H. <mar...@mh...> - 2006-07-12 19:30:26
Dan O'Donnell writes:
> Shouldn't be I believe it is a pretty straight forward RIS file (in
> UTF-8 though).
>

What is your setup in refdbdrc regarding the default character encoding of incoming RIS data? What encoding do you use in your MySQL database? I wonder whether it is possible to confuse refdbd by converting UTF-8 data once too often.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
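Markus's question is whether the data could end up converted to UTF-8 twice. The thread does not establish that refdbd does this; the sketch below is only a generic way to check a RIS file or a value read back from the database for that symptom. Double encoding happens when UTF-8 bytes are misread as Latin-1 and encoded again, so undoing one round succeeds and changes the text. It is a heuristic and can misfire on text that genuinely contains sequences like "Ã©":

```python
def looks_double_encoded(data: bytes) -> bool:
    """Heuristic check for UTF-8 that was converted to UTF-8 a second time.

    'Æ' is C3 86 in UTF-8; misread as Latin-1 and re-encoded it becomes
    C3 83 C2 86 ('Ã' followed by U+0086). Undoing one round
    (text -> Latin-1 bytes -> UTF-8 decode) succeeds and changes the text
    only in that case. Pure ASCII input always returns False.
    """
    try:
        text = data.decode("utf-8")
        once_undone = text.encode("latin-1").decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return once_undone != text
```

For a file, pass its raw bytes, for example looks_double_encoded(open("one.ris", "rb").read()).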
From: Dan O'D. <dan...@ul...> - 2006-07-12 20:29:33
On Wed, 2006-12-07 at 21:27 +0200, Markus Hoenicka wrote:
> Dan O'Donnell writes:
> > Shouldn't be I believe it is a pretty straight forward RIS file (in
> > UTF-8 though).
> >
> What is your setup in refdbdrc regarding the default character encoding of
> incoming RIS data? What encoding do you use in your MySQL database? I
> wonder whether it is possible to confuse refdbd by converting UTF-8
> data once too often?

Well, right now nothing, since I'm trying a new install. But it had been utf8 in and out. (BTW, another possibility may lie in how UTF-8 is referred to in these rc files: sometimes it seems to be utf8, sometimes UTF-8, and I never know which to use when I'm specifying it with -E on the command line.)
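Dan's side question (utf8 or UTF-8?) is not answered in the thread, and which spelling refdbd's -E option and refdbdrc accept is not established here. For comparison only: a codec registry such as Python's treats the spellings as aliases of one encoding, while other components have their own conventions (MySQL, for instance, calls its character set utf8). The helper names below are hypothetical:

```python
import codecs


def canonical_encoding(name: str) -> str:
    """Canonical codec name in Python's registry (case and '-'/'_' insensitive)."""
    return codecs.lookup(name).name


def same_encoding(a: str, b: str) -> bool:
    """True if Python resolves both spellings to the same codec."""
    return canonical_encoding(a) == canonical_encoding(b)
```

This says nothing about refdbd itself; it only shows that "utf8" and "UTF-8" name the same encoding, so the practical question is which spelling each tool in the chain expects.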