From: Alex <mys...@gm...> - 2014-06-30 14:09:04
|
Hi,

I'm using sqlgrey with postfix on three servers, configured using the DBCLUSTER layout as defined in the README. However, when one machine goes down, all three fail, with the following postfix message:

Jun 30 06:14:00 mail03 postfix/smtpd[32601]: NOQUEUE: reject: RCPT from bmail.bridgemailsystem.com[66.206.172.149]: 451 4.3.5 Server configuration problem; from=<mar...@lo...> to=<Rya...@ex...> proto=ESMTP helo=<bmail.bridgemailsystem.com>

Isn't it supposed to continue running on the remaining systems when one of them becomes disconnected?

This is my sqlgrey.conf on one of the slave machines:

loglevel = 3
log_override = whitelist:1,grey:3,spam:2
reconnect_delay = 5
db_type = mysql
db_name = sqlgrey
db_host = ns1.example.com
db_port = default
db_user = sqlgrey
db_pass = mypass
db_cleanup_hostname=ns1.example.com
db_cleandelay = 1800
clean_method = sync
db_cluster = on
read_hosts=localhost,mail02.example.com,mail03.example.com,mail01.example.com
prepend = 1
admin_mail = my...@me...

Any ideas greatly appreciated.

Thanks,
Alex
|
From: Jernej P. <jer...@ar...> - 2014-06-30 14:32:19
|
Dear Alex,

You could use hapolicy instead (http://postfwd.org/hapolicy/index.html) and run multiple instances of sqlgrey on multiple machines.

I am not sure whether I completely understand your setup: you have a three-node cluster with MySQL master-master replication?

We have successfully deployed sqlgrey with a MySQL master-slave configuration, where reads were performed on the slave nodes, while SQL writes were done on the master node. After a while, we ditched sqlgrey in favour of postfwd2 and hapolicy...

cheers, Jernej

On 30/06/14 16:08, Alex wrote:
> Hi,
>
> I'm using sqlgrey with postfix on three servers, configured using the
> DBCLUSTER layout as defined in the README. However, when one machine
> goes down, all three fail, with the following postfix message:
>
> Jun 30 06:14:00 mail03 postfix/smtpd[32601]: NOQUEUE: reject: RCPT from
> bmail.bridgemailsystem.com[66.206.172.149]: 451 4.3.5 Server
> configuration problem; from=<mar...@lo...> to=<Rya...@ex...>
> proto=ESMTP helo=<bmail.bridgemailsystem.com>
>
> Isn't it supposed to continue running on the remaining systems when one
> of them becomes disconnected?
>
> This is my sqlgrey.conf on one of the slave machines:
>
> loglevel = 3
> log_override = whitelist:1,grey:3,spam:2
> reconnect_delay = 5
> db_type = mysql
> db_name = sqlgrey
> db_host = ns1.example.com
> db_port = default
> db_user = sqlgrey
> db_pass = mypass
> db_cleanup_hostname=ns1.example.com
> db_cleandelay = 1800
> clean_method = sync
> db_cluster = on
> read_hosts=localhost,mail02.example.com,mail03.example.com,mail01.example.com
> prepend = 1
> admin_mail = my...@me...
>
> Any ideas greatly appreciated.
> Thanks,
> Alex
>
> ------------------------------------------------------------------------------
> Open source business process management suite built on Java and Eclipse
> Turn processes into business applications with Bonita BPM Community Edition
> Quickly connect people, data, and systems into organized workflows
> Winner of BOSSIE, CODIE, OW2 and Gartner awards
> http://p.sf.net/sfu/Bonitasoft
>
> _______________________________________________
> Sqlgrey-users mailing list
> Sql...@li...
> https://lists.sourceforge.net/lists/listinfo/sqlgrey-users
|
From: Alex <mys...@gm...> - 2014-06-30 19:20:02
|
Hi,

> You could use hapolicy instead (http://postfwd.org/hapolicy/index.html)
> and run multiple instances of sqlgrey on multiple machines.

If it wasn't already clear, I am running an instance of sqlgrey on each machine, all of which talk to one master, the one that happened to go down this morning. This resulted in none of them apparently being able to talk to their own sqlgrey service, and they just started rejecting mail.

> I am not sure whether I completely understand your setup: you have a
> three-node cluster with MySQL master-master replication?

I'm a mysql novice, but I think it's just a master-slave situation. They all should have their own copies of the complete greylist.

> We have successfully deployed sqlgrey with a MySQL master-slave
> configuration, where reads were performed on the slave nodes, while SQL
> writes were done on the master node. After a while, we ditched sqlgrey
> in favour of postfwd2 and hapolicy...

So did you ditch it for this reason? That sounds like how I have it set up here. Is it not possible to create a fault-tolerant sqlgrey system on its own?

Would you be able to send your postfwd2 and hapolicy configs as a reference to get started?

I also realized I made a typo in the configuration file I posted here, which doesn't exist on my production system. Here are the relevant bits. This one has db_host set properly, in case that matters for reference here:

loglevel = 3
log_override = whitelist:1,grey:3,spam:2
reconnect_delay = 5
db_type = mysql
db_name = sqlgrey
db_host = mail01.example.com
db_port = default
db_user = sqlgrey
db_pass = mypass
db_cleanup_hostname=mail01.example.com
db_cleandelay = 1800
clean_method = sync
db_cluster = on
read_hosts=localhost,mail01.example.com,mail02.example.com,mail03.example.com
prepend = 1
admin_mail = my...@me...

Thanks,
Alex
|
From: Jernej P. <jer...@ar...> - 2014-07-01 08:59:22
|
Dear Alex,

On 30/06/14 21:19, Alex wrote:
>> We have successfully deployed sqlgrey with a MySQL master-slave
>> configuration, where reads were performed on the slave nodes, while SQL
>> writes were done on the master node. After a while, we ditched sqlgrey
>> in favour of postfwd2 and hapolicy...
>
> So did you ditch it for this reason? That sounds like how I have it set
> up here. Is it not possible to create a fault-tolerant sqlgrey system on
> its own?

I think that you would need a multi-master SQL setup to be able to use sqlgrey the way you are trying to set it up. The problem is that when the MySQL master goes down, sqlgrey is unable to update the database and it fails.

I don't remember the details of sqlgrey's DB_CLUSTER setup anymore, but as I recall it only offloads the read queries to the slaves, while still relying on the write master to be up all the time. If the write node is down, sqlgrey fails...

IMHO, you have two options:
- use hapolicy in front of sqlgrey with its default action set to DUNNO, so that hapolicy answers DUNNO if sqlgrey fails after a MySQL failure
- set up an HA MySQL cluster where the write node never fails (multi-master etc. - there are a few options available)

I would go with hapolicy: it is easier to maintain, and there is no real hassle if sqlgrey fails...

> Would you be able to send your postfwd2 and hapolicy configs as a
> reference to get started?

Sorry, my setup is site-specific, however both tools have good documentation and live support mailing lists, so no worries. If you hit a barrier, just ask a question there...

cheers, Jernej
|
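For readers unfamiliar with hapolicy: it sits between Postfix and one or more policy servers, forwards each request to the first reachable backend, and answers DUNNO itself if every backend is down, so mail keeps flowing. A rough sketch of the wiring Jernej suggests follows; the port numbers, install path, and backend argument syntax here are assumptions from memory and should be checked against the hapolicy documentation:

```
# /etc/postfix/master.cf -- hapolicy runs under spawn(8)
127.0.0.1:10060 inet  n  n  n  -  0  spawn
  user=nobody argv=/usr/local/bin/hapolicy
    grey1=127.0.0.1:2501 grey2=mail02.example.com:2501

# /etc/postfix/main.cf -- point Postfix at hapolicy instead of sqlgrey
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service inet:127.0.0.1:10060
```

The key property is that a dead backend costs hapolicy only a short connect attempt before it tries the next one or gives up with DUNNO, so Postfix never sees a stalled policy service.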
From: Lionel B. <lio...@bo...> - 2014-07-01 09:39:51
|
Le 01/07/2014 10:59, Jernej Porenta a écrit :
> I think that you would need a multi-master SQL setup to be able to use
> sqlgrey the way you are trying to set it up. The problem is that when
> the MySQL master goes down, sqlgrey is unable to update the database and
> it fails.

It shouldn't. I didn't write the cluster support, but with a single server I coded SQLgrey to handle database failures gracefully and stop greylisting until the database server restarts.

There's one exception: SQLgrey doesn't start correctly if the database server is unavailable; once it runs, it should not fail.

You can consider this a bug in the cluster support (and might want to test SQLgrey without a cluster setup).

Best regards,

Lionel.
|
From: Jernej P. <jer...@ar...> - 2014-07-01 09:50:33
|
On 01/07/14 11:21, Lionel Bouton wrote:
> It shouldn't. I didn't write the cluster support, but with a single
> server I coded SQLgrey to handle database failures gracefully and stop
> greylisting until the database server restarts.
> There's one exception: SQLgrey doesn't start correctly if the database
> server is unavailable; once it runs, it should not fail.

Does "stop greylisting" mean responding with DUNNO, or not responding at all?

If it responds with DUNNO, then postfix continues working normally; otherwise postfix issues a "Server configuration problem" error and defers the mail. This happens with all non-responsive policy servers...

I know that sqlgrey does a great job at reconnecting to failing mysql servers, however I don't know the details behind it...

cheers, J.
|
From: Lionel B. <lio...@bo...> - 2014-07-01 10:29:01
|
Le 01/07/2014 11:50, Jernej Porenta a écrit :
> Does "stop greylisting" mean responding with DUNNO, or not responding at
> all?

DUNNO.

> If it responds with DUNNO, then postfix continues working normally

That's the expected behaviour.

Lionel
|
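The DUNNO exchange discussed above is part of the Postfix policy delegation protocol: Postfix sends a block of name=value attribute lines terminated by an empty line, and the policy server replies with a single action line plus an empty line. A minimal sketch of that framing (the attribute values in the sample request are hypothetical):

```python
def parse_policy_request(blob: str) -> dict:
    """Parse one Postfix policy delegation request:
    name=value lines, terminated by an empty line."""
    attrs = {}
    for line in blob.splitlines():
        if not line:          # empty line ends the request
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs

def policy_response(action: str) -> str:
    """Format a policy reply; Postfix expects 'action=<verdict>' plus an empty line."""
    return "action=%s\n\n" % action

# A request roughly as Postfix would send it (values are hypothetical):
req = ("request=smtpd_access_policy\n"
       "protocol_state=RCPT\n"
       "client_address=192.0.2.1\n"
       "sender=someone@example.org\n"
       "recipient=user@example.net\n"
       "\n")
print(parse_policy_request(req)["client_address"])  # 192.0.2.1
print(repr(policy_response("dunno")))               # 'action=dunno\n\n'
```

A server that answers `action=dunno` lets the current restriction list continue, which is exactly the fail-open behaviour Lionel describes for a database outage.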
From: Dan F. <da...@ha...> - 2014-07-01 12:31:44
|
Hi Alex.

I wrote the DBCluster code. I've just tested the scenario where the sql-master dies, and in my test it continues as expected, simply allowing everything through. I've tested both running sqlgrey and losing the master, and restarting sqlgrey with the master gone. If the master-db doesn't work, it simply seems to keep calm and carry on, in both cases.

When sqlgrey tries to reconnect, it fails and will keep failing. When this happens I can clearly see it in my logfile:

sqlgrey: dbaccess: Using DBIx:DBCluster
sqlgrey: warning: Could not connect to any server in WRITE_HOSTS at ./sqlgrey line 833
sqlgrey: dbaccess: can't connect to DB: Can't connect to MySQL server on '127.0.0.2' (111)
sqlgrey: dbaccess: error: couldn't access optout_domain table: Can't connect to MySQL server on '127.0.0.2' (111)

Perhaps you can find some sqlgrey log output on this issue, as the postfix error you quoted isn't telling me much.

In fact, the ONLY way I have been able to get a "Server configuration problem" in my tests is if I point db_host to a server behind a firewall that DROPS packets. This makes "connect" hang for a very long time, which makes postfix drop the connection due to timeout and cry "Server configuration problem".

- Dan

Alex wrote:
> If it wasn't already clear, I am running an instance of sqlgrey on each
> machine, all of which talk to one master, the one that happened to go
> down this morning. This resulted in none of them apparently being able
> to talk to their own sqlgrey service, and they just started rejecting
> mail.
>
> I'm a mysql novice, but I think it's just a master-slave situation. They
> all should have their own copies of the complete greylist.
>
> So did you ditch it for this reason? That sounds like how I have it set
> up here. Is it not possible to create a fault-tolerant sqlgrey system on
> its own?
|
From: Alex <mys...@gm...> - 2014-07-01 20:24:33
|
Hi,

> I wrote the DBCluster code. I've just tested the scenario where the
> sql-master dies, and in my test it continues as expected, simply
> allowing everything through.

That's good to know. I'd definitely like to see about getting sqlgrey working properly before trying alternatives, so I very much appreciate your help.

> When sqlgrey tries to reconnect, it fails and will keep failing. When
> this happens I can clearly see it in my logfile: [...]

Yes, those are the very same messages I receive:

Jun 30 06:35:16 mail03 sqlgrey: warning: Could not connect to any server in WRITE_HOSTS at /usr/sbin/sqlgrey line 827.
Jun 30 06:35:16 mail03 sqlgrey: dbaccess: can't connect to DB: Can't connect to MySQL server on 'mail02.example.com' (113)
Jun 30 06:35:16 mail03 sqlgrey: dbaccess: error: couldn't access config table: Can't connect to MySQL server on 'mail02.example.com' (113)
Jun 30 06:35:16 mail03 sqlgrey: mail: failed to send:
Jun 30 06:35:16 mail03 sqlgrey: fatal: setconfig error at /usr/sbin/sqlgrey line 195.

I have sqlgrey defined as such in master.cf:

greylist  unix  -       n       n       -       0       spawn
  user=nobody argv=/usr/bin/perl /usr/sbin/sqlgrey

and "check_policy_service inet:127.0.0.1:2501" in main.cf.

> In fact, the ONLY way I have been able to get a "Server configuration
> problem" in my tests is if I point db_host to a server behind a firewall
> that DROPS packets. This makes "connect" hang for a very long time,
> which makes postfix drop the connection due to timeout and cry "Server
> configuration problem".

Have I configured postfix incorrectly? I'll include my sqlgrey.conf again, in hopes it helps.

loglevel = 3
log_override = whitelist:1,grey:3,spam:2
reconnect_delay = 5
db_type = mysql
db_name = sqlgrey
db_host = mail02.example.com
db_port = default
db_user = sqlgrey
db_pass = mypass
db_cleanup_hostname=mail02.example.com
db_cleandelay = 1800
clean_method = sync
db_cluster = on
read_hosts=localhost,mail02.example.com,mail03.example.com,mail01.example.com
prepend = 1
admin_mail = my...@me...

Thanks again,
Alex
|
From: Dan F. <da...@ha...> - 2014-07-03 14:49:00
|
Alex wrote:
> Yes, those are the very same messages I receive:
>
> Jun 30 06:35:16 mail03 sqlgrey: warning: Could not connect to any server
> in WRITE_HOSTS at /usr/sbin/sqlgrey line 827.
> Jun 30 06:35:16 mail03 sqlgrey: dbaccess: can't connect to DB: Can't
> connect to MySQL server on 'mail02.example.com' (113)
> Jun 30 06:35:16 mail03 sqlgrey: dbaccess: error: couldn't access config
> table: Can't connect to MySQL server on 'mail02.example.com' (113)
> Jun 30 06:35:16 mail03 sqlgrey: mail: failed to send:
> Jun 30 06:35:16 mail03 sqlgrey: fatal: setconfig error at
> /usr/sbin/sqlgrey line 195.

I believe error 113 means "no route to host", and that should fail instantly. Which means you are probably not having timeout issues. And then I'm at a loss, since I cannot reproduce the issue here. And if there's nothing else in the logs from sqlgrey indicating errors, well...

I'd go with Lionel's suggestion to try and run sqlgrey without db_clustering to simplify the setup. Though I don't think it'll show any difference, it should be an easy test and it will rule out (or confirm) that it has something to do with db_clustering.

Then I'd try the same with the "spawn" setup you described below. Does it make any difference if you comment out that line and simply run it using

$ /usr/sbin/sqlgrey -d

I'm just thinking that if I cannot reproduce your error, it must be something specific to your setup.

- Dan

> I have sqlgrey defined as such in master.cf:
>
> greylist  unix  -       n       n       -       0       spawn
>   user=nobody argv=/usr/bin/perl /usr/sbin/sqlgrey
>
> and "check_policy_service inet:127.0.0.1:2501" in main.cf.
>
> Have I configured postfix incorrectly? I'll include my sqlgrey.conf
> again, in hopes it helps.
|
From: Alex <mys...@gm...> - 2014-07-03 17:29:55
|
Hi,

> I believe error 113 means "no route to host", and that should fail
> instantly. Which means you are probably not having timeout issues. And
> then I'm at a loss, since I cannot reproduce the issue here. And if
> there's nothing else in the logs from sqlgrey indicating errors, well...

Yes, the server was unreachable because it was down.

> I'd go with Lionel's suggestion to try and run sqlgrey without
> db_clustering to simplify the setup. Though I don't think it'll show any
> difference, it should be an easy test and it will rule out (or confirm)
> that it has something to do with db_clustering.

I don't really see how that's an option, though, because a client could conceivably have to try three different servers before being allowed to connect, meaning up to a fifteen-minute delay before the mail is even accepted, assuming the client even retries that many times, which I doubt it would. That's the whole reason for clustering in the first place.

> Then I'd try the same with the "spawn" setup you described below. Does
> it make any difference if you comment out that line and simply run it
> using
> $ /usr/sbin/sqlgrey -d

That is how I'm running it.

> I'm just thinking that if I cannot reproduce your error, it must be
> something specific to your setup.

The configuration isn't all that complex. Have you tested your environment and know that yours works properly? Could you post your config so I can compare with mine? How did you set up mysql?

I'd really like to stick with sqlgrey if at all possible, so I'd really appreciate your help in figuring this out.

Thanks,
Alex
|
From: Lionel B. <lio...@bo...> - 2014-07-01 21:13:30
|
Hi,

Just a heads-up about this:

> I have sqlgrey defined as such in master.cf:
>
> greylist  unix  -       n       n       -       0       spawn
>   user=nobody argv=/usr/bin/perl /usr/sbin/sqlgrey

That's odd. I didn't expect to see it run like that, and I'm not sure under which circumstances it makes sense. spawn daemons are only launched if something tries to connect to them, and they expect communication on STDIN/OUT/ERR: http://www.postfix.org/spawn.8.html

SQLgrey wasn't designed to work like that and should be launched as a separate service. I'm not sure how you make it work at all, unless this configuration in master.cf is simply not used and SQLgrey is started separately.

Best regards,

Lionel
|
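The intended wiring, pieced together from details elsewhere in this thread (port 2501, `sqlgrey -d`, `check_policy_service`), looks roughly like this; the surrounding restriction list is illustrative, not a recommendation:

```
# Start SQLgrey as a standalone daemon (init script / service manager,
# NOT from master.cf); by default it listens on 127.0.0.1:2501.
/usr/sbin/sqlgrey -d

# /etc/postfix/main.cf -- Postfix talks to the running daemon:
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service inet:127.0.0.1:2501
```

With this layout there is nothing greylist-related in master.cf at all, which is what Lionel is pointing out.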
From: Alex <mys...@gm...> - 2014-07-01 21:18:31
|
Hi,

> Just a heads-up about this:
>
> > I have sqlgrey defined as such in master.cf:
> >
> > greylist  unix  -       n       n       -       0       spawn
> >   user=nobody argv=/usr/bin/perl /usr/sbin/sqlgrey
>
> That's odd. I didn't expect to see it run like that, and I'm not sure
> under which circumstances it makes sense. spawn daemons are only
> launched if something tries to connect to them, and they expect
> communication on STDIN/OUT/ERR: http://www.postfix.org/spawn.8.html
>
> SQLgrey wasn't designed to work like that and should be launched as a
> separate service. I'm not sure how you make it work at all, unless this
> configuration in master.cf is simply not used and SQLgrey is started
> separately.

Ugh, you're right. I'm starting sqlgrey separately as a standalone program. I recall now that I was experimenting with this early on, trying to get it to work the way postfwd works, but had abandoned it.

Thanks,
Alex
|
From: <da...@ha...> - 2014-07-03 20:59:10
|
On 2014-07-03 19:29, Alex wrote:
> > I believe error 113 means "no route to host", and that should fail
>
> Yes, the server was unreachable because it was down.

Yes, sorry, my point was perhaps unclear. I was just trying to say that out of the many errors you could have gotten, you got 113, and that 113 should fail fast and not hang.

But in the meantime, I'd like to revise that statement. I have actually gotten a 113 that hangs now. I finally succeeded in getting it to do so by entering db_host as 192.188.1.3, which for me apparently cannot be routed. And now I'm seeing delays which may support my original "timeout" theory.

So I need you to test something. Change your:

db_host = mail02.example.com

to:

db_host = mail02.example.com;mysql_connect_timeout=1

(same line, no extra spaces), then restart sqlgrey and see if it helps.

Also: what version of sqlgrey are you running?

> > I'd go with Lionel's suggestion to try and run sqlgrey without
> > db_clustering to simplify the setup.
>
> I don't really see how that's an option, though, because a client
> could conceivably have to try three different servers before being
> allowed to connect, meaning up to a fifteen-minute delay before the
> mail is even accepted, assuming the client even retries that many
> times, which I doubt it would. That's the whole reason for clustering
> in the first place.

Well... no. The reason you mention is the reason for using a central sql-server. The reason for db-clustering is the performance of the central sql-server. All your mail-nodes use the same write-host, and so the write host will have the same data as your read hosts.

There's no technical reason why all your mailservers couldn't use one central database, like so:

[mail1] ---> [db] <---- [mail*]

The reason I created db-clustering was because I had some 10 mailservers at the time, with one central database, and bot-nets were hammering sqlgrey, causing the db to hang sometimes due to the sheer amount of lookups. So I set up a mysql slave on each mailserver, had them replicate data from the master, and made sqlgrey read from localhost only. This removed all the "read" load from the db-master.

Under normal load, I can easily point all queries to the db-master without any problems. I just tested with db_cluster=off and I can see select queries going to the master now, instead of localhost. And everything else works fine.

> > Then I'd try the same with the "spawn" setup you described below.
> > Does it make any difference if you comment out that line and simply
> > run it using
> > $ /usr/sbin/sqlgrey -d
>
> That is how I'm running it.

Ah yes. I misread your reply to Lionel. Sorry.

> The configuration isn't all that complex. Have you tested your
> environment and know that yours works properly?

Yes. I take a node out of my production environment temporarily for testing, and everything I test acts as expected.

> Could you post your config so I can compare with mine?

loglevel = 2
reconnect_delay = 5
max_connect_age = 3
connect_src_throttle = 15
awl_age = 32
group_domain_level = 10
db_type = mysql
db_name = sqlgrey
db_host = dbmaster.example.com
db_user = sqlgreyuser
db_pass = password
db_cleandelay = 60
db_cluster = on
read_hosts=localhost
prepend = 0
optmethod = optout
discrimination = on
discrimination_add_rulenr = on
reject_first_attempt = immed
reject_early_reconnect = immed
reject_code = 451

> How did you set up mysql?

1 master and many slaves replicating. Each slave lives on the mailserver-node, together with postfix and sqlgrey. All sqlgreys use localhost for read, master for write.
|
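The replication layout Dan describes (one write master, a read slave on each mail node) can be sketched in MySQL configuration terms. This is a minimal sketch, assuming classic MySQL 5.x binlog replication; the server ids, hostnames, and the choice to replicate only the sqlgrey database are placeholders, not Dan's actual settings:

```
# master (dbmaster.example.com) -- /etc/my.cnf
[mysqld]
server-id    = 1
log-bin      = mysql-bin
binlog-do-db = sqlgrey

# each mailserver node -- /etc/my.cnf
[mysqld]
server-id       = 2        # must be unique per slave
replicate-do-db = sqlgrey
read-only       = 1        # sqlgrey writes go to the master, not here
```

Each slave is then pointed at the master with `CHANGE MASTER TO MASTER_HOST='dbmaster.example.com', ...` followed by `START SLAVE;`, after which sqlgrey's `read_hosts=localhost` queries hit the local replica while writes still go to `db_host`.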
From: Alex <mys...@gm...> - 2014-07-04 01:44:47
|
Hi, On Thu, Jul 3, 2014 at 4:59 PM, <da...@ha...> wrote: > On 2014-07-03 19:29, Alex wrote: > > > I believe the error 113 means "no route to host" and that should fail > > Yes, the server was unreachable because it was down. > > Yes, sorry, my point was perhaps unclear. > I was just trying to say, that out of the many errors you could have > gotten, you got 113. And that 113 should fail fast and not hang. > I'm not sure it actually hung. I realized it was a problem when every mail that was being received was immediately rejected due to "Server configuration error". The messages weren't queued or delayed in any way. All mail on all three systems were immediately being rejected, for more than an hour before I was able to bring the server back and restart sqlgrey on each system. > But in the meantime, id's like to revise that statement. I have actaully > gotten a 113 that hangs now. > I finally succeeded in getting it to do so, by entering db_host as > 192.188.1.3, which for me apparently cannot be routed. > Okay, maybe your definition of "hang" is different than mine, but perhaps we're really talking about the same thing. In any case, when my system fails, it just outright rejects mail across all systems, apparently because it can't talk to the master. > > And now im seeing delays which may support to my original "timeout" > theory. So i need you to test something. > change your: > db_host = mail02.example.com > to: > db_host = mail02.example.com;mysql_connect_timeout=1 > > (same line, no extra spaces) and the restart sqlgrey and see if it helps. > Please confirm that you think I should do this, given the new information about failures above. > > Also. What version of sqlgrey are you running? > sqlgrey-1.8.0 compiled here locally. > > I'd go with Lionel's suggestion to try and run sqlgrey without > > db_clustering to simplify the setup. 
Though i dont think itll show any > > difference, it should be an easy test and it will rule out (or confirm) > > that it has something to do with db_clustering. > > I don't really see how that's an option, though, because a client could > conceivably have to try three different servers before being allowed to > connect, meaning up to a fifteen minute delay before the mail is even > accepted, assuming the client even retries that many times, which I doubt > it would. That's the whole reason for clustering in the first place. > > Well.. No. The reason you mention, is the reason for using a central > sql-server. The reason for db-clustering, is the performance of the central > sql-server. > All your mail-nodes use the same write-host. And so the write host will > have the same data as your readhosts. > > Theres no technical reason why all your mailservers couldnt use one > central database, like so: > > [mail1] ---> [db] <---- [mail*] > > The reason i created dbclustering, was because i had some 10 mailservers > at the time, with one central database and bot-nets were hammering sqlgrey, > causing the db to hang sometimes, due to the sheer amount of lookups. > So i setup a mysql-slave on each mailserver, had them replicate data from > the master and made sqlgrey read from localhost only. This removed all the > "read" load from the db-master. > Yes, okay, I do understand that. I should have written that as well, but my main reason is to avoid users from being greylisted numerous times for sending mail to the same user in the same domain. > > Under normal load, i can easily point all queries to the db-master, > without any problems. I just tested with db_cluster=off and i can see > select queries going to the master now, instead of localhost. And > everything else works fine. > Okay, but if the master dies, then no queries occur, correct? Could you post your config so I can compare with mine? 
> loglevel = 2
> reconnect_delay = 5
> max_connect_age = 3
> connect_src_throttle = 15
> awl_age = 32
> group_domain_level = 10
>
> db_type = mysql
> db_name = sqlgrey
> db_host = dbmaster.example.com
> db_user = sqlgreyuser
> db_pass = password
> db_cleandelay = 60
> db_cluster = on
> read_hosts=localhost
> prepend = 0
> optmethod = optout
> discrimination = on
> discrimination_add_rulenr = on
> reject_first_attempt = immed
> reject_early_reconnect = immed
> reject_code = 451

There are a few options there that I'm not using, and don't recognize, but I don't believe the lack of any of them would cause the issue I'm having, correct? How did you set up MySQL?

> 1 master and many slaves replicating. Each slave lives on the
> mail-server node, together with postfix and sqlgrey.
> All sqlgreys use localhost for read, master for write.

Ah, I think I have it configured for all hosts to write to the one master. How can you have all hosts write to the local database, yet have any kind of synchronization between tables? I'm pretty sure I set it up according to the way it was documented, particularly given I don't know much about replication myself. Hopefully this info helps better isolate where I'm going wrong.

Thanks,
Alex
|
From: Dan F. <da...@ha...> - 2014-07-04 09:44:48
|
Alex wrote:
> Okay, maybe your definition of "hang" is different than mine, but perhaps

By hanging, I mean "any network connection or connection attempt that stalls for more than a few seconds".

You have this whole chain of individual connections:
internet -> postfix -> sqlgrey -> mysql

Each of these has a timeout value, which doesn't have to be the same. So when postfix connects to sqlgrey, it's not going to wait forever for a reply. If sqlgrey's attempt to connect to mysql "hangs" for more seconds than postfix is willing to wait, postfix kills the connection and replies "Server configuration error".

Thus, if your mysql connection attempt doesn't time out fast enough, sqlgrey never gets a chance to reply "dunno" to postfix and allow the mail to go through.

>> db_host = mail02.example.com;mysql_connect_timeout=1
>>
>> (same line, no extra spaces) and then restart sqlgrey and see if it
>> helps.
>
> Please confirm that you think I should do this, given the new information
> about failures above.

Yes, I think you should :). I have tested this with 1.7.4 and 1.8.0 and it works in both cases. What I'm doing is simply adding a connect timeout of 1 second to the mysql connection. So if the connect attempt hangs (as per my earlier definition), it will give up after one second. (Of course you could use more than 1 second, if you worry that your SQL server will ever be slower than 1 second to accept a connection.)

In my tests, this solves the issue, because postfix doesn't have to time out the connection to sqlgrey and everything remains shiny. (shiny = "mails will pass through unhindered while the SQL server is down")

> Yes, okay, I do understand that. I should have written that as well, but
> my main reason is to avoid users being greylisted numerous times for
> sending mail to the same user in the same domain.

For that, you only need 1 SQL server, shared among all mail servers, and sqlgrey running with db_cluster=off.
"db_cluster=on" is only needed if the 1 SQL server can't service all your mail servers fast enough. (I'm not saying that you're doing it wrong, I'm just pointing out the different motivations.)

>> Under normal load, i can easily point all queries to the db-master,
>> without any problems. I just tested with db_cluster=off and i can see
>
> Okay, but if the master dies, then no queries occur, correct?

Correct. But no queries occur in db_cluster=on mode either, if the master dies. sqlgrey defaults back to "allow everything" if db_host (the master) dies, and as such there is no need to do queries anymore until the master is online again.

>> read_hosts=localhost prepend = 0 optmethod = optout discrimination = on
> There are a few options there that I'm not using, and I don't recognize,
> but I don't believe the lack of any of them would cause the issue I'm
> having, correct?

No. There are no undocumented settings here that relate to database connections. In fact, the only option I'd try to change in your case would be prepend. Though I doubt it has any effect, it does change the way sqlgrey responds to postfix, and if postfix doesn't understand the response, you get "Server configuration problem".

>> 1 master and many slaves replicating. Each slave lives on the
>> mailserver-node, together with postfix and sqlgrey. All sqlgrey's use
>> localhost for read, master for write.
>
> Ah, I think I have it configured for all hosts to write to the one
> master. How can you have all hosts write to the local database, yet have
> any kind of synchronization between tables?

Hmm.. Let me just explain MySQL replication real quick:

You have a MySQL server. You do reads and writes and everything is fine. Now you'd like a "replica". So you make a NEW MySQL server, calling it "slave01". Then you instruct slave01 to "replicate" from the master. The slave is actually doing all the work, replication-wise.
The master doesn't know and doesn't care how the slave is doing, whether it's behind or whatever. And you can add more slaves and the master still doesn't know or care. The master doesn't know it's a master. It doesn't "act" differently. It can still do reads and writes, just like when it was stand-alone.

Any statement executed on the master that would change data in any way gets executed on all the slaves as well, via replication. On the slaves, you can do reads (technically they can also do writes, but writing would not be smart, as it causes inconsistencies with the master and can make replication stop dead). If writes WERE to be done on a slave, the changes would NOT be replicated to the master. That's simply not how it works. The slaves copy all INSERT, REPLACE, UPDATE, DELETE, CREATE, ALTER, etc. statements from the master and execute them on themselves.

So now you have 1 server where you can read and write all you like, and X slave servers, which should have the same data as the master, where you can do read queries.

So now that we know that slaves are just a read-only copy of the master, and the master is still just a normal MySQL server, I assume you can see why disabling db-clustering won't change anything, as long as the master doesn't suffer from poor performance. All that happens by setting db_cluster=off is that the slaves won't be used for reads anymore and all read queries will go to the master instead.

Hope that makes it clearer.

- Dan
|
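For reference, the slave-side setup Dan describes boils down to a couple of statements on each mail-server node. This is only a sketch of classic MySQL master-slave replication; the hostname, credentials and binlog coordinates below are placeholders, not values from this thread:

```sql
-- Run once on each slave (mail-server node); values are illustrative.
CHANGE MASTER TO
  MASTER_HOST = 'dbmaster.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'replpass',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;

-- Verify replication is healthy: look for Slave_IO_Running: Yes
-- and Slave_SQL_Running: Yes in the output.
SHOW SLAVE STATUS\G
```

The master additionally needs binary logging enabled and a replication user granted REPLICATION SLAVE; consult the MySQL replication documentation for the full procedure.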
From: Alex <mys...@gm...> - 2014-07-04 14:14:49
|
Hi,

> > Okay, maybe your definition of "hang" is different than mine, but perhaps
>
> By hanging, I mean "any network connection or connection attempt that
> stalls for more than a few seconds".

Okay, that's how I understand it, but that's not what's happening here. There are two scenarios where I get the "451 4.3.5 Server configuration problem" error. The first is if sqlgrey dies on any system; then that system will respond with the error. The second is when mysql is stopped on the master server. (After adding your mysql_connect_timeout=1 option, it no longer fails when mysql dies.)

However, postfix still responds with "server configuration ..." if sqlgrey is dead or inaccessible. This is the issue I need to fix now.

> You have this whole chain of individual connections:
> internet -> postfix -> sqlgrey -> mysql
>
> Each of these has a timeout value, which doesn't have to be the same.
> So when postfix connects to sqlgrey, it's not going to wait forever for a
> reply. If sqlgrey's attempt to connect to mysql "hangs" for more seconds
> than postfix is willing to wait, postfix kills the connection and replies
> "Server configuration error".
>
> Thus, if your mysql connection attempt doesn't time out fast enough,
> sqlgrey never gets a chance to reply "dunno" to postfix and allow the mail
> to go through.

That's assuming sqlgrey is still around to respond. I need to also consider the possibility where sqlgrey dies.

> >> db_host = mail02.example.com;mysql_connect_timeout=1
> >>
> >> (same line, no extra spaces) and then restart sqlgrey and see if it
> >> helps.

Okay, this did appear to solve the problem where the master mysqld is not able to respond. It no longer responds with "Server configuration ...", which is good. I don't see that option in the default documentation. Where is this documented?

> > Please confirm that you think I should do this, given the new
> > information about failures above.
>
> Yes. I think you should :).
> I have tested this with 1.7.4 and 1.8.0 and it works in both cases.
> What I'm doing is simply adding a connect timeout of 1 second to the
> mysql connection. So if the connect attempt hangs (as per my earlier
> definition), it will give up after one second. (Of course you could use
> more than 1 second, if you worry that your SQL server will ever be slower
> than 1 second to accept a connection.)

So then in my setup, where the master mysql daemon is unavailable, each client references their own database? And no updating is occurring since they aren't configured as write servers, correct?

> In my tests, this solves the issue, because postfix doesn't have to
> time out the connection to sqlgrey and everything remains shiny.
>
> (shiny = "mails will pass through unhindered while the sql-server is down")

So postfix was always waiting patiently enough; it was sqlgrey that was responding with failure too quickly?

> > Yes, okay, I do understand that. I should have written that as well,
> > but my main reason is to avoid users being greylisted numerous times
> > for sending mail to the same user in the same domain.
>
> For that, you only need 1 SQL server, shared among all mail servers, and
> sqlgrey running with db_cluster=off.
>
> "db_cluster=on" is only needed if the 1 SQL server can't service all your
> mail servers fast enough.
>
> (I'm not saying that you're doing it wrong, I'm just pointing out the
> different motivations.)

So it's okay to leave it on, correct? Wouldn't this also serve to make it possible for existing entries to be queried through the local copies while the master is unavailable?

> >> Under normal load, i can easily point all queries to the db-master,
> >> without any problems. I just tested with db_cluster=off and i can see
> >
> > Okay, but if the master dies, then no queries occur, correct?
>
> Correct. But no queries occur in db_cluster=on mode either, if the
> master dies.
> sqlgrey defaults back to "allow everything" if db_host (the master)
> dies, and as such there is no need to do queries anymore until the
> master is online again.

Each client has a local copy of the database, no? And by setting read_hosts to contain at least localhost, it should then be able to query the local database, no?

> >> read_hosts=localhost prepend = 0 optmethod = optout discrimination = on
> > There are a few options there that I'm not using, and I don't recognize,
> > but I don't believe the lack of any of them would cause the issue I'm
> > having, correct?
>
> No. There are no undocumented settings here that relate to database
> connections. In fact, the only option I'd try to change in your case
> would be prepend. Though I doubt it has any effect, it does change the
> way sqlgrey responds to postfix. And if postfix doesn't understand the
> response, you get "Server configuration problem".

I don't see where these options are defined either.

> So now that we know that slaves are just a read-only copy of the master,
> and the master is still just a normal MySQL server, I assume you can see
> why disabling db-clustering won't change anything, as long as the master
> doesn't suffer from poor performance. All that happens by setting
> db_cluster=off is that the slaves won't be used for reads anymore and
> all read queries will go to the master instead.

Okay, got it. I think I got confused, but I believe I understood it correctly, in that when the master is down, the slaves can continue to read from their local database. I think it was just the db_cluster terminology that I wasn't understanding there.

Thanks again,
Alex
|
From: <da...@ha...> - 2014-07-04 16:18:27
|
On 2014-07-04 16:14, Alex wrote:
> > By hanging, I mean "any network connection or connection attempt that
> > stalls for more than a few seconds".
>
> Okay, that's how I understand it, but that's not what's happening here.

All evidence so far points to this explanation, including the fact that my timeout fix worked. I'm unsure what you base your assumption on that this is not what's happening, as the logs won't show you this, and you'd need to do something like modifying the sqlgrey code to provide you with debugging information, or use telnet/netcat to talk to sqlgrey & postfix.

> There are two scenarios where I get the "451 4.3.5 Server
> configuration problem" error. The first is if sqlgrey dies on any
> system, then that system will respond with the error.
> That's assuming sqlgrey is still around to respond. I need to also
> consider the possibility where sqlgrey dies.

I've never experienced sqlgrey just dying on me, but if it happens, it is postfix that decides what to respond. It cannot be influenced by sqlgrey. And the error, 451, is a temporary error, so mails will be delivered once sqlgrey is running again. I don't think there's a setting in postfix to choose default answers to policy-daemon failures, so this will be the same issue with any postfix policy daemon that isn't running.

> >> db_host = mail02.example.com;mysql_connect_timeout=1
>
> I don't see that option in the default documentation. Where is this
> documented?

It's not an option. It's a hack I made up for this occasion.

sqlgrey uses a "DSN" internally for connecting to mysql. They look something like this:

DBI:mysql:sqlgrey;host=db.example.com;port=3306

And in sqlgrey, $host is just inserted into this DSN, something like this:

DBI:mysql:sqlgrey;host=$host;port=3306

Which is why, if $host = "127.0.0.2;whatever=3", the DSN will contain

DBI:mysql:sqlgrey;host=127.0.0.2;whatever=3;port=3306

and mysql_connect_timeout happens to be an option you can add to the DSN.
So it's just a hack. It's definitely something we should add as an option in a later version.

> So then in my setup, where the master mysql daemon is unavailable,
> each client references their own database? And no updating is
> occurring since they aren't configured as write servers, correct?

sqlgrey will default to "accept all mail" when the master is unavailable. So there is no need to read anything until the master is back online.

> > In my tests, this solves the issue, because postfix doesn't have to
> > time out the connection to sqlgrey and everything remains shiny.
> >
> > (shiny = "mails will pass through unhindered while the sql-server
> > is down")
>
> So postfix was always waiting patiently enough; it was sqlgrey that
> was responding with failure too quickly?

No, the other way around. sqlgrey may take 3 minutes to get a timeout from its mysql connect(). But postfix "ain't got time for that" and disconnects already after, e.g., 100 seconds. So sqlgrey is too slow to respond to postfix and postfix just disconnects. And THAT'S why you get "Server configuration problem".

> > "db_cluster=on" is only needed if the 1 SQL server can't service all
> > your mail servers fast enough.
> >
> > (I'm not saying that you're doing it wrong, I'm just pointing out the
> > different motivations.)
>
> So it's okay to leave it on, correct?

Yes, it's fine.

> Wouldn't this also serve to make it possible for existing entries to
> be queried through the local copies while the master is unavailable?

No. There is no database "high availability" here. If the master dies, all mail is accepted by default.

> > dies. And as such, there is no need to do queries anymore, until
> > the master is online again.
>
> Each client has a local copy of the database, no? And by setting
> read_hosts to contain at least localhost, it should then be able to
> query the local database, no?

In theory, we could query localhost.
But since sqlgrey will fall back to allowing all mails through, it doesn't matter what is in the database, since the mail will go through anyway. And sqlgrey doesn't really work without being able to write, so it's smarter to just accept all mail.

> > >> read_hosts=localhost prepend = 0 optmethod = optout
> > >> discrimination = on
>
> I don't see where these options are defined either.

I see all of them, with comments, in the sample config that comes with sqlgrey-1.8.0. Have a look there and see if not everything is explained.

> > All that happens by setting db_cluster=off is that the slaves won't be
> > used for reads anymore and all read queries will go to the master
> > instead.
>
> Okay, got it. I think I got confused, but I believe I understood it
> correctly, in that when the master is down, the slaves can continue to
> read from their local database. I think it was just the db_cluster
> terminology that I wasn't understanding there.

Yes. In general (non-sqlgrey) cases, when an SQL master is down, the application can still read from the slaves. sqlgrey just doesn't use this, as sqlgrey NEEDS to be able to write.

Hope that answers everything :)

- Dan
|
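The DSN interpolation Dan describes above is easy to see in miniature. sqlgrey itself is Perl and builds the string for DBI, so the Python below is purely illustrative of the string handling; the DSN shapes are taken from Dan's examples:

```python
def build_dsn(db_name: str, host: str, port: int = 3306) -> str:
    # db_host is pasted verbatim into the DBI DSN, so anything after a
    # ';' in db_host is treated by the driver as an extra DSN attribute.
    return f"DBI:mysql:{db_name};host={host};port={port}"

# Normal case:
print(build_dsn("sqlgrey", "db.example.com"))
# DBI:mysql:sqlgrey;host=db.example.com;port=3306

# The "hack": smuggling a connect timeout in via db_host:
print(build_dsn("sqlgrey", "mail02.example.com;mysql_connect_timeout=1"))
# DBI:mysql:sqlgrey;host=mail02.example.com;mysql_connect_timeout=1;port=3306
```

That second DSN is why appending `;mysql_connect_timeout=1` to db_host works without any change to sqlgrey's code.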
From: Alex <mys...@gm...> - 2014-07-06 03:15:41
|
Hi,

> > By hanging, I mean "any network connection or connection attempt that
> > stalls for more than a few seconds".
> >
> > Okay, that's how I understand it, but that's not what's happening here.
>
> All evidence so far points to this explanation, including the fact that
> my timeout fix worked.
> I'm unsure what you base your assumption on that this is not what's
> happening, as the logs won't show you this, and you'd need to do
> something like modifying the sqlgrey code to provide you with debugging
> information, or use telnet/netcat to talk to sqlgrey & postfix.

It's only based on the fact that there is no stalling or any delay here; it happens immediately when sqlgrey isn't running at all. Hopefully I'm just being pedantic here. I just mean that the connection attempt is never successful if sqlgrey isn't running. It should realize this immediately.

> > That's assuming sqlgrey is still around to respond. I need to also
> > consider the possibility where sqlgrey dies.
>
> I've never experienced sqlgrey just dying on me, but if it happens, it
> is postfix that decides what to respond. It cannot be influenced by
> sqlgrey.
> And the error, 451, is a temporary error, so mails will be delivered
> once sqlgrey is running again.

Okay, right, I should have known that's a temporary error. I do know sqlgrey can't control postfix if it's not running, of course.

> > >> db_host = mail02.example.com;mysql_connect_timeout=1
> > I don't see that option in the default documentation. Where is this
> > documented?
>
> It's not an option. It's a hack I made up for this occasion.
>
> sqlgrey uses a "DSN" internally for connecting to mysql. They look
> something like this:
>
> DBI:mysql:sqlgrey;host=db.example.com;port=3306
>
> And in sqlgrey, $host is just inserted into this DSN, something like this:
> DBI:mysql:sqlgrey;host=$host;port=3306
>
> Which is why, if $host = "127.0.0.2;whatever=3", the DSN will contain
> DBI:mysql:sqlgrey;host=127.0.0.2;whatever=3;port=3306
>
> and mysql_connect_timeout happens to be an option you can add to the DSN.
> So it's just a hack. It's definitely something we should add as an
> option in a later version.

Okay, great, got it. It's also nice to hear another version is intended at some point.

> > In my tests, this solves the issue, because postfix doesn't have to
> > time out the connection to sqlgrey and everything remains shiny.
> >
> > (shiny = "mails will pass through unhindered while the sql-server is
> > down")
>
> So postfix was always waiting patiently enough; it was sqlgrey that was
> responding with failure too quickly?
>
> No, the other way around. sqlgrey may take 3 minutes to get a timeout
> from its mysql connect(). But postfix "ain't got time for that" and
> disconnects already after, e.g., 100 seconds. So sqlgrey is too slow to
> respond to postfix and postfix just disconnects. And THAT'S why you get
> "Server configuration problem".

Right, okay. So is the "mysql_connect_timeout=1" instructing sqlgrey to wait for 1s? Or is that just an on/off thing? I'm trying to understand the postfix interaction part. In other words, postfix must have a fixed-length amount of time it waits, since you mentioned it wasn't adjustable. Hardcoded in sqlgrey is something that makes sure it waits fewer units of time than this postfix timeout default, correct?

> > Each client has a local copy of the database, no? And by setting
> > read_hosts to contain at least localhost, it should then be able to
> > query the local database, no?
>
> In theory, we could query localhost. But since sqlgrey will fall back to
> allowing all mails through, it doesn't matter what is in the database,
> since the mail will go through anyway.
>
> And sqlgrey doesn't really work without being able to write, so it's
> smarter to just accept all mail.

Okay, that's a big help.
So although mysql itself replicates the data between each host, sqlgrey isn't designed to read the data from that local host, and it doesn't make sense to do that.

> > >> read_hosts=localhost prepend = 0 optmethod = optout
> > >> discrimination = on
> > I don't see where these options are defined either.
>
> I see all of them, with comments, in the sample config that comes with
> sqlgrey-1.8.0. Have a look there and see if not everything is explained.

I'll have to look again.

> Hope that answers everything :)

Really appreciate all your hard work, both here and in the code. I've learned so much.

Thanks,
Alex
|
From: Dan F. <da...@ha...> - 2014-07-06 11:08:06
|
> It's only based on the fact that there is no stalling or any delay here;
> it happens immediately when sqlgrey isn't running at all.

You are now talking about how postfix reacts to a missing policy daemon (sqlgrey is a postfix policy daemon). As this is not something I or sqlgrey can influence, this is not what I'm talking about at all.

I am ONLY talking about the issue you specified in your original mail, which was (slightly summarized):
- You had "..configured using the DBCLUSTER.."
- and when "..one machine goes down, all three fail.."
- with error "..4.3.5 Server configuration problem.."

And as such, I believe the issue was a mysql connection attempt that took too long. This is now solved by setting the timeout to 1 second. How postfix reacts to a missing policy daemon is completely out of my hands and out of scope. If we were troubleshooting WHY sqlgrey wasn't running at the time, that would be something else entirely. :)

> Right, okay. So is the "mysql_connect_timeout=1" instructing sqlgrey to
> wait for 1s?

Yes. Hence "mysql_connect_timeout=3" would allow it to wait for up to 3 seconds for the SQL server to respond.

> In other words, postfix must have a fixed-length amount of time it
> waits, since you mentioned it wasn't adjustable.

No. It is adjustable through postfix's config option "smtpd_policy_service_timeout". (The thing I said I didn't think could be adjusted was postfix's default reaction to unexpected errors, i.e. having it do something other than "Server configuration problem".)

> Hardcoded in sqlgrey is something that makes sure it waits fewer units
> of time than this postfix timeout default, correct?

No. sqlgrey should never take more than a few seconds to do anything. The fact that we are hitting the "smtpd_policy_service_timeout" is a bug. sqlgrey simply uses the default timeout value for connecting to mysql, which is way too high when something hangs. We've remedied this by setting it to 1 in your case.

> Okay, that's a big help.
> So although mysql itself replicates the data between each host, sqlgrey
> isn't designed to read the data from that local host, and it doesn't
> make sense to do that.

Correct. Under normal operation, localhost is used for reads. When the master is dead, nothing will be read from localhost either.

- Dan
|
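The interplay Dan describes between postfix's smtpd_policy_service_timeout and the MySQL connect timeout reduces to a single comparison. A minimal sketch of that reasoning (the function and the numbers are illustrative, not sqlgrey code):

```python
def postfix_outcome(policy_timeout_s: float, mysql_connect_timeout_s: float) -> str:
    """What postfix ends up answering when sqlgrey's SQL master is unreachable."""
    if mysql_connect_timeout_s >= policy_timeout_s:
        # sqlgrey is still blocked in its MySQL connect() when postfix
        # gives up on the policy request.
        return "451 Server configuration problem"
    # sqlgrey's connect attempt fails first, so it still has time to
    # fall back to "dunno" and the mail passes.
    return "dunno"

# Default MySQL client connect timeouts can run to minutes, while
# postfix defaults to 100s -- hence the original failures:
print(postfix_outcome(100, 180))  # 451 Server configuration problem
print(postfix_outcome(100, 1))    # dunno (the mysql_connect_timeout=1 fix)
```

In words: the inner timeout must be strictly shorter than the outer one, or postfix gives up before sqlgrey can answer.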
From: Karl O. P. <ko...@me...> - 2014-07-06 12:50:28
|
On 07/06/2014 06:07:58 AM, Dan Faerch wrote:
> How Postfix reacts to a missing policy-daemon is completely out of my
> hands and out of scope. If we were troubleshooting WHY sqlgrey wasn't
> running at the time, that would be something else entirely. :)

If someone was really worried about sqlgrey dying, then there's probably a way to run it from inetd. But that just pushes the problem of a dead daemon back to inetd, so the right thing to do is work from inittab. But why? :-)

Karl <ko...@me...>
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
|
From: Dan F. <da...@ha...> - 2014-07-06 17:37:20
|
Karl O. Pinc wrote:
> On 07/06/2014 06:07:58 AM, Dan Faerch wrote:
>
> If someone was really worried about sqlgrey dying, then there's probably
> a way to run it from inetd. But that just pushes the problem of a dead
> daemon back to inetd, so the right thing to do is work from inittab.

Indeed. I had issues with postgrey 10+ years ago, before I switched to sqlgrey. And I had based an internal policy daemon upon that codebase as well, which then experienced the same problem, and I simply couldn't track down the bug. Sometimes they would just stop responding, though they were still running.

I searched a long time for a way to configure the default policy-daemon response in postfix from "defer_if_permit" to "dunno", but found nothing. I even stared at the postfix source for a while, to see if it was in there as an undocumented option. I couldn't find anything to suggest it.

So I ended up creating an ultra-simple "policy-daemon proxy", whose only job was to talk to the real policy server, have a faster timeout, and always report "dunno" if something goes wrong. A really silly hack, and it just underlines why this option should exist in postfix.

Then I went with "sqlgrey" and all my problems disappeared ;)
|
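The core of a fallback proxy like the one Dan describes fits in a few lines of Python. Everything here is hypothetical (names, ports, timeout), and a production version would also need the listening loop toward postfix, daemonization and logging; this sketch shows only the "forward, and answer dunno on any failure" logic:

```python
import socket

def query_policy(request: bytes, host: str, port: int, timeout: float = 2.0) -> bytes:
    """Forward a postfix policy request to the real policy daemon;
    answer "dunno" if it is down, slow, or talking garbage."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            s.sendall(request)
            reply = b""
            # A policy reply ends with an empty line.
            while not reply.endswith(b"\n\n"):
                chunk = s.recv(4096)
                if not chunk:
                    break
                reply += chunk
            if reply.startswith(b"action="):
                return reply
    except OSError:
        pass  # refused, timed out, reset, unreachable, ...
    # Pass no judgment, so postfix lets the mail through.
    return b"action=dunno\n\n"
```

Wired between postfix and the real policy daemon, this turns any backend failure into a harmless "dunno" instead of a "Server configuration problem".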
From: Alex <mys...@gm...> - 2014-07-17 03:52:00
|
Hi Dan,

I'm hoping you can still help me, because I'm still doing something wrong.

> I searched a long time for a way to configure the default
> policy-daemon response in postfix from "defer_if_permit" to "dunno", but
> found nothing. I even stared at the postfix source for a while, to see
> if it was in there as an undocumented option. I couldn't find anything
> to suggest it.
>
> So I ended up creating an ultra-simple "policy-daemon proxy", whose only
> job was to talk to the real policy server, have a faster timeout, and
> always report "dunno" if something goes wrong. A really silly hack, and
> it just underlines why this option should exist in postfix.
>
> Then I went with "sqlgrey" and all my problems disappeared ;)

I did some tests this evening by basically disconnecting the server with the master mysql database, and it caused all mail on the two remaining systems that were still running to bounce with the "4.3.5 Server configuration problem". You mention here that sqlgrey has solved your problems, and I apparently don't understand how you have it configured to no longer reply with a temporary error and somehow bypass the greylisting. The messages aren't queued; they're just rejected, albeit temporarily, but we can't have this single point of failure...

Thanks again for your help.
Alex
|
From: <da...@ha...> - 2014-07-17 10:59:39
|
On 2014-07-17T05:51:53 CEST, Alex wrote:
> I did some tests this evening by basically disconnecting the server
> with the master mysql database, and it caused all mail on the two
> remaining systems that were still running to bounce with the "4.3.5
> Server configuration problem".

If you made the configuration change on all your hosts, I don't know what you are experiencing, and your mail contains no new information, technical or otherwise, to go on. And that, paired with the fact that I'm fairly certain how this works and can see in my tests that it is indeed working as expected, simply makes me unable to come up with guesses as to what's troubling your system.

What I CAN do is show you how to test better, to pinpoint where the issue may lie. The way I tested this manually was by simply "telnetting" to the sqlgrey service and talking to it. That may be a bit cumbersome, so fortunately Michael Ludvig has included a test script in the tarball, simply called "tester.pl".

On my system, a normal run looks like this:
----
$ ./tester.pl --client-ip 10.0.0.1
action=451 Greylisted for 5 minutes (16)
----

By adding "time" to the beginning of the command, we can see how much time it took to complete. So here's a run where the mysql server has downed its interface for just 10 seconds:
----
$ time ./tester.pl --client-ip 10.0.0.1
action=dunno

real 0m3.062s
user 0m0.056s
sys 0m0.004s
----
"action=dunno" means sqlgrey passes no judgment, which in turn means "let it through". This "conclusion" is reached within 3 seconds (you can see that at the line "real 0m3.062s").

And this is an example of sqlgrey not running:
----
$ time ./tester.pl --client-ip 10.0.0.1
Connect failed: IO::Socket::INET: connect: Connection refused
----

Finding out how long postfix will wait is as simple as:
----
$ postconf smtpd_policy_service_timeout
smtpd_policy_service_timeout = 100s
----
In this case, 100s.
When I point my sqlgrey at a server behind a packet-dropping firewall and rerun the test
----
$ time ./tester.pl --client-ip 10.0.0.1
----
I literally had to Ctrl-C manually after ~6 minutes, which is way more than 100s, of course. So THAT would result in "Server configuration problem".

Another thing that could give "Server configuration problem" would be if any garbage output (i.e. an internal error from sqlgrey) were printed to the socket. But even that would be visible by testing like this.

As the predominant theory (and the only theory with a positive test so far) is the timeout theory, I think you'll have to try running this command while you're experiencing the problem. This should help to either prove or disprove that it's a timeout problem, and may even catch any garbage output if that were the case.

- Dan
|
From: Alex <mys...@gm...> - 2014-07-17 22:30:47
|
Hi,

> > I did some tests this evening by basically disconnecting the server
> > with the master mysql database, and it caused all mail on the two
> > remaining systems that were still running to bounce with the "4.3.5
> > Server configuration problem".
>
> If you made the configuration change on all your hosts, I don't know
> what you are experiencing, and your mail contains no new information,
> technical or otherwise, to go on. And that, paired with the fact that
> I'm fairly certain how this works and can see in my tests that it is
> indeed working as expected, simply makes me unable to come up with
> guesses as to what's troubling your system.

The problem is simply that when the server the master SQL database runs on goes down, mail is stopped on all three systems. The two systems that remain running just respond with temporary bounce messages instead of responding with "dunno" or otherwise letting the message through.

> So here's a run where the mysql server has downed its interface for just
> 10 seconds:
> ----
> $ time ./tester.pl --client-ip 10.0.0.1
> action=dunno
>
> real 0m3.062s
> user 0m0.056s
> sys 0m0.004s
> ----
> "action=dunno" means sqlgrey passes no judgment, which in turn means
> "let it through". This "conclusion" is reached within 3 seconds (you can
> see that at the line "real 0m3.062s").

Okay, I did some more testing. Live testing. At first I was surprised to see the systems continued to deliver mail after stopping the master mysqld on mail02 entirely, because I knew I was having some kind of problem. I monitored it for a while, made sure it was actually continuing to deliver mail (which it was), and looked at the tons of sqlgrey logs reporting it couldn't properly communicate with the database. Then, about seven minutes into my testing, sqlgrey quit and died on all three systems:

Jul 17 18:02:22 mail02 sqlgrey: fatal: setconfig error at /usr/sbin/sqlgrey line 195.
Jul 17 18:02:36 mail03 sqlgrey: fatal: setconfig error at /usr/sbin/sqlgrey line 195.
Jul 17 18:03:03 mail01 sqlgrey: fatal: setconfig error at /usr/sbin/sqlgrey line 195.

The testing began here:

Jul 17 17:53:30 mail01 sqlgrey: dbaccess: error: couldn't get now() from DB:
Jul 17 17:53:32 mail02 sqlgrey: dbaccess: error: couldn't get now() from DB:
Jul 17 17:53:33 mail03 sqlgrey: dbaccess: error: couldn't get now() from DB:

Between those times were hundreds of "Server configuration..." postfix errors because sqlgrey had died.

> And this is an example of sqlgrey not running:
> ----
> $ time ./tester.pl --client-ip 10.0.0.1
> Connect failed: IO::Socket::INET: connect: Connection refused
> ----
>
> Finding out how long postfix will wait is as simple as:
> ----
> $ postconf smtpd_policy_service_timeout
> smtpd_policy_service_timeout = 100s
> ----
> In this case, 100s.

I still have mine set for 1s, but I'd like an "indefinite" option for the case where I'm taking the main mysql system down for maintenance, or an unexpected event occurs where I cannot reach the system for an undetermined amount of time. I don't know if my 7m test hit some magic limit or something else happened, but I can test again if necessary, although I'd like your input first.

> When I point my sqlgrey at a server behind a packet-dropping firewall
> and rerun the test
> ----
> $ time ./tester.pl --client-ip 10.0.0.1
> ----
> I literally had to Ctrl-C manually after ~6 minutes, which is way more
> than 100s, of course. So THAT would result in "Server configuration
> problem". Another thing that could give "Server configuration problem"
> would be if any garbage output (i.e. an internal error from sqlgrey)
> were printed to the socket. But even that would be visible by testing
> like this.

So how do we explain it continuing beyond 100s when you've explicitly defined the timeout period to be 100s?

Thanks,
Alex
|