From: HaJo S. <ha...@ha...> - 2004-12-24 04:40:06
|
Hi Lionel,

Just discovered two start-up issues with sqlgrey while rebooting my mail server:

- During boot, sqlgrey couldn't access postgresql yet (DBI returned "db is starting up"). sqlgrey then tried to create the from_awl table, failed (db still starting up) and died silently, rather than retrying after a while, which IMO would have been the correct behaviour. I have attached the relevant part of the log below FYI.

- I accidentally started sqlgrey (through /etc/init.d/sqlgrey start) as a non-root user. The init script said:

'Starting SQLgrey: Pid_file "/var/run/sqlgrey.pid" already exists. Overwriting!'

and then '[ OK ]'. However, it was of course not OK: sqlgrey died with a permission-denied error on the PID file, but did not tell me...

Merry Christmas,
HaJo

----------[maillog snippet]---------------
Dec 23 23:19:18 sun sqlgrey[3691]: Process Backgrounded
Dec 23 23:19:18 sun sqlgrey[3691]: 2004/12/23-23:19:17 sqlgrey (type Net::Server::Multiplex) starting! pid(3691)
Dec 23 23:19:20 sun sqlgrey[3691]: Binding to TCP port 2501 on host localhost
Dec 23 23:19:20 sun sqlgrey[3691]: Group Not Defined. Defaulting to EGID '0'
Dec 23 23:19:20 sun sqlgrey[3691]: Setting uid to "91"
Dec 23 23:19:21 sun sqlgrey[3691]: Can't connect to DB: FATAL: The database system is starting up
Dec 23 23:19:21 sun sqlgrey[3691]: Can't connect to DB: FATAL: The database system is starting up
Dec 23 23:19:21 sun sqlgrey[3691]: Warning: couldn't do query: SELECT 1 from from_awl LIMIT 0: FATAL: The database system is starting up, reconnecting to DB
Dec 23 23:19:22 sun sqlgrey[3691]: Can't connect to DB: FATAL: The database system is starting up
Dec 23 23:19:22 sun sqlgrey[3691]: Can't connect to DB: FATAL: The database system is starting up
Dec 23 23:19:22 sun sqlgrey[3691]: Warning: couldn't do query: CREATE TABLE from_awl (sender_name varchar(64) NOT NULL, sender_domain varchar(255) NOT NULL, host_ip varchar(15) NOT NULL, last_seen timestamp NOT NULL, PRIMARY KEY (sender_name, sender_domain, host_ip));: FATAL: The database system is starting up, reconnecting to DB
Dec 23 23:19:22 sun sqlgrey[3691]: Can't connect to DB: FATAL: The database system is starting up
Dec 23 23:19:23 sun sqlgrey[3691]: fatal: Couldn't create table from_awl: FATAL: The database system is starting up
Dec 23 23:36:42 sun postfix/smtpd[7424]: warning: connect to 127.0.0.1:2501: Connection refused

--
HaJo Schatz <ha...@ha...>
http://www.HaJo.Net
PGP-Key: http://www.hajo.net/hajonet/keys/pgpkey_hajo.txt
|
From: Lionel B. <lio...@bo...> - 2004-12-28 21:56:42
|
HaJo Schatz wrote the following on 12/24/04 05:39 :

>Hi Lionel,
>
>Just discovered two start-up issues with sqlgrey while doing a reboot of
>my mail server:
>
>- During boot, sqlgrey couldn't access postgresql yet (DBI returned "db
>is starting up"). sqlgrey then tried to create the from_awl table,
>failed (db still starting up) and died silently (rather than, what would
>IMO have been correct, retrying after a while). I have attached the
>relevant part of the log below fyi.
>

I'll have to see if I can make sqlgrey exit with an error. This is the most desirable behaviour, as scripts launching sqlgrey could then detect the problem and print an error on the console. Retrying after a while is IMHO not worth it (sqlgrey can't decide on its own how long it has to sleep). SQLgrey must be started after the database, and it is the administrator's job to configure the system accordingly.

>- I accidentally started sqlgrey (through /etc/init.d/sqlgrey start) as
>non-root user. The init-script said:
>
>'Starting SQLgrey: Pid_file "/var/run/sqlgrey.pid" already exists.
>Overwriting!'
>
>and then '[ OK ]'. However, it was of course not OK, sqlgrey died with a
> permission-denied on the PID file, did not tell me however...
>

This is related to the problem above: SQLgrey doesn't exit with an error today. I'll have to see if I can make the appropriate checks *before* it forks to daemonize itself.

Thanks for the bug reports. I hope all of you found pleasant surprises under the tree and enjoyed a good time and delicious meals :-)

Lionel (back from numerous enjoyable and delicious meals).
|
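The idea of checking things *before* the daemon forks can also be approximated in the init script itself. The following is only a minimal sketch, not code from the SQLgrey distribution: the function names `can_write_pidfile` and `start_checked` are hypothetical, and the default PID file path is simply the one quoted in the report above.

```shell
#!/bin/sh
# Hypothetical pre-flight check for an init script; not taken from SQLgrey.
# The default path mirrors the PID file named in the bug report.
PIDFILE="${PIDFILE:-/var/run/sqlgrey.pid}"

# Return 0 if the current user can create or overwrite the PID file.
can_write_pidfile() {
    pidfile="$1"
    piddir=$(dirname "$pidfile")
    if [ -e "$pidfile" ]; then
        # An existing file must itself be writable to be overwritten.
        [ -w "$pidfile" ]
    else
        # Otherwise the containing directory must exist and be writable.
        [ -d "$piddir" ] && [ -w "$piddir" ]
    fi
}

# Run the check, then the command given as arguments, and propagate the
# exit status instead of reporting OK unconditionally.
start_checked() {
    if ! can_write_pidfile "$PIDFILE"; then
        echo "Cannot write $PIDFILE (are you root?)" >&2
        return 1
    fi
    "$@"
}
```

An init script could call `start_checked` followed by whatever command it already uses to launch the daemon, and print '[ OK ]' only when the function returns 0. This catches the non-root case early, but it cannot detect failures that happen after the daemon has backgrounded itself.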
From: HaJo S. <ha...@ha...> - 2004-12-29 05:55:04
|
Lionel Bouton wrote:

>> - During boot, sqlgrey couldn't access postgresql yet (DBI returned
>> "db is starting up").

> Retrying after a while is IMHO not worth it (it can't decide on its own
> how much time it has to sleep). SQLgrey must be started after the
> database and this is the administrator's job to configure the system
> accordingly.

Sure -- I have an S85postgresql and an S90sqlgrey. However, postgres seems to take a while to "come up", which currently makes it impossible to rely on a box starting up correctly: you have to intervene manually (and you will still have to if sqlgrey simply exits with an error). Currently, rebooting a box means that you will not be able to receive any mail afterwards until you attend to it...

I think there are only two proper ways of solving this:

1) In the above case, DBI indicates a clear reason why the connection (temporarily) failed. Hence, if DBI returns "DB starting up" as the error, retry within sqlgrey until this error is gone. I actually think that "DB starting up" is not really an error at all...

2) Do this check/delay in the init script before launching sqlgrey, i.e. query the DB and see whether it's responsive.

I think 2) is a bit confusing when you consider what happens if the DB gets restarted while sqlgrey is already running. I know that in such a case sqlgrey will retry. But here's IMHO the inconsistency -- if there's a DB issue at start-up, sqlgrey ignores it. If the issue occurs during execution, sqlgrey takes care of it...

> Lionel (back from numerous enjoyable and delicious meals).

Hajo, big and round by now. And I thought you went skiing...

--
HaJo Schatz <ha...@ha...>
http://www.HaJo.Net
PGP-Key: http://www.hajo.net/hajonet/keys/pgpkey_hajo.txt
|
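Option 1 above amounts to retrying only when the failure is known to be temporary. SQLgrey is written in Perl and its actual reconnect code is not shown in this thread, so the following is just a shell sketch of the idea. The function names `is_transient_db_error` and `retry_while_starting` are hypothetical; the matched text is the message from the maillog earlier in the thread.

```shell
#!/bin/sh
# Sketch only: illustrates "retry while the DB is starting up".
# This is not SQLgrey's implementation.

# Return 0 if the error text looks like PostgreSQL's start-up message.
is_transient_db_error() {
    case "$1" in
        *"database system is starting up"*) return 0 ;;
        *) return 1 ;;
    esac
}

# retry_while_starting MAX_TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND and retries after DELAY seconds, but only while its
# stderr matches the transient message, for at most MAX_TRIES attempts.
# Any other error is reported immediately without retrying.
retry_while_starting() {
    max_tries="$1"; delay="$2"; shift 2
    try=1
    while :; do
        # Capture stderr only; stdout of the command is discarded.
        if err=$("$@" 2>&1 >/dev/null); then
            return 0
        fi
        if ! is_transient_db_error "$err" || [ "$try" -ge "$max_tries" ]; then
            echo "$err" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}

# Possible use (assumes psql is installed and PGUSER is set):
#   retry_while_starting 30 1 psql -U "$PGUSER" -d template1 -c "SELECT 1"
```

Bounding the number of attempts speaks to the worry that the program cannot know how long to sleep: it gives up after a fixed budget, and it fails at once on errors that are not the start-up message.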
From: Lionel B. <lio...@bo...> - 2004-12-29 09:10:43
|
HaJo Schatz wrote the following on 12/29/04 06:54 :

> Lionel Bouton wrote:
>
>>> - During boot, sqlgrey couldn't access postgresql yet (DBI returned
>>> "db is starting up").
>>
>> Retrying after a while is IMHO not worth it (it can't decide on its
>> own how much time it has to sleep). SQLgrey must be started after the
>> database and this is the administrator's job to configure the system
>> accordingly.
>
> Sure -- I have an S85postgresql and a S90sqlgrey. However postgres
> seems to take a while to "come up", which makes it currently
> impossible to rely on a correct start-up of a box, you have to
> intervene manually (and you will still have to if you simply exit
> sqlgrey with an error). Currently, a reboot of a box means that you
> will not be able to receive any mail afterwards until you attend to it...
>
> I think there are only two proper ways of solving this:
> 1) In above case, DBI is indicating a clear reason why the connection
> (temporarily) failed. Hence, if DBI returns "DB starting up" as error,
> re-try within sqlgrey until this error is gone. I actually think that
> "DB starting up" is not really an error at all...

Ok, I'll look into it.

> 2) Do this check/delay in the init-script before launching sqlgrey. Ie
> query the DB and see whether it's responsive.
>
> I think 2) is a bit confusing when thinking about what will happen if
> the db gets restarted while sqlgrey is already running. I know, in
> such a case sqlgrey will re-try. But here's IMHO the inconsistency --
> if there's a DB issue at start-up, sqlgrey ignores it. If the issue
> occurs during execution, sqlgrey takes care of it...
>

I could change that. I believed it would be quite messy to handle the lack of a connection at startup time, but on second thought I think it can be handled cleanly.

>> Lionel (back from numerous enjoyable and delicious meals).
>
> Hajo, big and round by now. And I thought you went skiing...
>

Skiing is scheduled for the last 2 weeks of January :-) I hope I'll find some time to hack on SQLgrey before that.

Lionel.
|
From: Lionel B. <lio...@bo...> - 2005-01-12 23:19:47
|
Lionel Bouton wrote the following on 12/29/04 10:09 :

>> I think there are only two proper ways of solving this:
>> 1) In above case, DBI is indicating a clear reason why the connection
>> (temporarily) failed. Hence, if DBI returns "DB starting up" as
>> error, re-try within sqlgrey until this error is gone. I actually
>> think that "DB starting up" is not really an error at all...
>
> Ok, I'll look into it.

Having the very same problem on my Gentoo system, I looked into the distribution's PostgreSQL init scripts and found some interesting code. People seem to be aware that PostgreSQL isn't available for a short time after the init script exits. They do 2 things:

- wait for postmaster.pid to appear inside the PostgreSQL init script (they seem to know PostgreSQL isn't especially quick to load in some cases; a reboot after a system crash comes to mind).

- for pg_autovacuum, the following code is used in its init script:

    while [ "$CONTINUE" -eq 0 ] && [ $TOO_LONG -lt 10 ]
    do
        psql -U $PGUSER -d template1 -c "SELECT 1" 1> /dev/null 2> /dev/null
        if [ "$?" -eq 0 ]
        then
            CONTINUE=1
        else
            echo -n "."
            TOO_LONG=`expr $TOO_LONG + 1`
            sleep 1
        fi
    done

As I don't want to make sqlgrey dirty with such a hack (after all, it could very well sit in the postgresql init script, for God's sake), there are two courses of action:

- I explain in the FAQ how to modify the postgresql init script to solve the problem, and point to it in the HOWTO when addressing the install with PostgreSQL case (my preferred choice),

- I add a lot of ugly PostgreSQL-specific things to the sqlgrey init script.

On Gentoo, I can help solve this by advising people to make sqlgrey depend on pg_autovacuum; the Gentoo init dependency resolver will solve these problems for me :-)

Best regards,

Lionel.
|
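For the Gentoo suggestion, the dependency would be declared in the `depend()` function of the init script. This is only a sketch of what such a fragment might look like in /etc/init.d/sqlgrey; the thread does not show the actual script, and the service name pg_autovacuum is taken from the message above. `need` is supplied by Gentoo's init framework when it loads the script, so the fragment is not meant to be run on its own.

```shell
# Hypothetical fragment for a Gentoo-style init script.
# "need" is provided by the Gentoo init system, not by this file.
depend() {
    # Start sqlgrey only after pg_autovacuum, whose own init script
    # waits until PostgreSQL answers a test query.
    need pg_autovacuum
}
```

This relies on the pg_autovacuum init script having already waited for the database, so the sqlgrey script needs no PostgreSQL-specific polling of its own.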