Re: [Sqlgrey-users] sqlgrey fails when configured with multiple machines


Hi,

On Thu, Jul 3, 2014 at 4:59 PM, <da...@ha...> wrote:

>  On 2014-07-03 19:29, Alex wrote:
>
> > I believe the error 113 means "no route to host" and that should fail
>
>  Yes, the server was unreachable because it was down.
>
>   Yes, sorry, my point was perhaps unclear.
> I was just trying to say, that out of the many errors you could have
> gotten, you got 113. And that 113 should fail fast and not hang.
>

I'm not sure it actually hung. I realized it was a problem when every mail
that was being received was immediately rejected due to "Server
configuration error". The messages weren't queued or delayed in any way.
All mail on all three systems was immediately being rejected, for more
than an hour before I was able to bring the server back and restart sqlgrey
on each system.

> But in the meantime, I'd like to revise that statement. I have actually
> gotten a 113 that hangs now.
> I finally succeeded in getting it to do so, by entering db_host as
> 192.188.1.3, which for me apparently cannot be routed.
>

Okay, maybe your definition of "hang" is different than mine, but perhaps
we're really talking about the same thing. In any case, when my system
fails, it just outright rejects mail across all systems, apparently because
it can't talk to the master.

>
> And now I'm seeing delays, which may support my original "timeout"
> theory. So I need you to test something.
> Change your:
> db_host = mail02.example.com
> to:
> db_host = mail02.example.com;mysql_connect_timeout=1
>
> (same line, no extra spaces), then restart sqlgrey and see if it helps.
>

Please confirm that you think I should do this, given the new information
about failures above.
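
For anyone following along, one way to tell "fails fast" from "hangs" is to time a plain TCP connect to the database host with a short timeout. This is only a minimal Python sketch and not anything sqlgrey itself does; the helper name `probe` and the default port 3306 (the usual MySQL port) are assumptions for illustration.

```python
import socket
import time


def probe(host, port=3306, timeout=1.0):
    """Try a TCP connect and return (reachable, seconds_elapsed).

    `timeout` bounds how long an unroutable host can stall the caller,
    which is the same idea as a connect timeout on the database driver.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, "no route to host"
        # (errno 113 on Linux), DNS failures and timeouts alike.
        reachable = False
    return reachable, time.monotonic() - start
```

Calling `probe("mail02.example.com")` from each mail node while the database is down should show whether the connect attempt returns within about a second or stalls until the timeout.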

>
> Also. What version of sqlgrey are you running?
>

sqlgrey-1.8.0 compiled here locally.

>  > I'd go with Lionel's suggestion to try and run sqlgrey without
> > db_clustering to simplify the setup. Though I don't think it'll show any
> > difference, it should be an easy test and it will rule out (or confirm)
> > that it has something to do with db_clustering.
>
>  I don't really see how that's an option, though, because a client could
> conceivably have to try three different servers before being allowed to
> connect, meaning up to a fifteen minute delay before the mail is even
> accepted, assuming the client even retries that many times, which I doubt
> it would. That's the whole reason for clustering in the first place.
>
> Well, no. The reason you mention is the reason for using a central
> SQL server. The reason for db-clustering is the performance of the central
> SQL server.
> All your mail nodes use the same write host, so the write host will
> have the same data as your read hosts.
>
> There's no technical reason why all your mailservers couldn't use one
> central database, like so:
>
>       [mail1] --->  [db]  <---- [mail*]
>
> The reason I created dbclustering was that I had some 10 mailservers
> at the time, with one central database, and botnets were hammering sqlgrey,
> causing the db to hang sometimes due to the sheer number of lookups.
> So I set up a MySQL slave on each mailserver, had them replicate data from
> the master, and made sqlgrey read from localhost only. This removed all the
> "read" load from the db master.
>

Yes, okay, I do understand that. I should have written that as well, but my
main reason is to keep users from being greylisted numerous times for
sending mail to the same user in the same domain.
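
As I understand the description above, the read/write split boils down to something like the following. This is a rough Python sketch of the idea only, not sqlgrey's actual code; the function name `pick_host` and the "starts with SELECT" test are assumptions made for illustration.

```python
def pick_host(sql, db_host, read_hosts, db_cluster=True):
    """Return the host a statement would be sent to.

    Assumption: with clustering on, reads (SELECT) go to a read host,
    and everything else goes to the single write host (db_host).
    With clustering off, every statement goes to db_host.
    """
    is_read = sql.lstrip().upper().startswith("SELECT")
    if db_cluster and is_read and read_hosts:
        return read_hosts[0]
    return db_host
```

With `db_host="dbmaster.example.com"` and `read_hosts=["localhost"]`, a SELECT would go to localhost and an INSERT to the master, so every node still shares one set of greylisting data through the master.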

>
> Under normal load, I can easily point all queries to the db master
> without any problems. I just tested with db_cluster=off and I can see
> select queries going to the master now, instead of localhost. And
> everything else works fine.
>

Okay, but if the master dies, then no queries occur, correct?

> > Could you post your config so I can compare with mine?
>
> loglevel = 2
> reconnect_delay = 5
> max_connect_age = 3
> connect_src_throttle = 15
> awl_age = 32
> group_domain_level = 10
>
> db_type = mysql
> db_name = sqlgrey
> db_host = dbmaster.example.com
> db_user = sqlgreyuser
> db_pass = password
> db_cleandelay = 60
> db_cluster = on
> read_hosts=localhost
> prepend = 0
> optmethod = optout
> discrimination = on
> discrimination_add_rulenr = on
> reject_first_attempt = immed
> reject_early_reconnect = immed
> reject_code = 451
>

There are a few options there that I'm not using, and I don't recognize,
but I don't believe the lack of any of them would cause the issue I'm
having, correct?

> > How did you set up MySQL?
>
>
> One master and many slaves replicating from it. Each slave lives on a
> mailserver node, together with postfix and sqlgrey.
> All sqlgrey instances use localhost for reads and the master for writes.
>

Ah, I think I have it configured for all hosts to write to the one master.
How can you have all hosts write to the local database, yet have any kind
of synchronization between tables?

I'm pretty sure I set it up according to the way it was documented,
particularly given I don't know much about replication myself.

Hopefully this info helps isolate where I'm going wrong.

Thanks,
Alex