From: Lionel B. <lio...@bo...> - 2015-08-11 22:28:40
|
Le 11/08/2015 22:26, Jean-Michel Pouré - GOOZE a écrit : > Dear Lionel, > > First, I would like to congratulate you for bringing SqlGrey to the > community. I just discovered SqlGrey and it solved my SPAM problems. Happy to be of service. Numerous others have helped too (see the CONTRIB file). > > Together with SPF validation, DNSBL and other filtering techniques, I > now have nearly zero spam. > > I am running Debian and I am a big fan of PostgreSQL. The automatic > population of the database and the SQL code using prepared queries was > a very nice surprise. > > Previously, I tested other grey listing solutions and they don't have > the simplicity and power of SqlGrey. I am surprised to see that some > SqlGrey databases have been running years without incident. This is a testament to the robustness of both the database systems and SQLgrey. > > There might be modifications needed to stand "huge" loads under > PostgreSQL. For example, I think of as much as 10.000 queries per > second. This does not happen very often, I agree. In theory you should be able to reach these levels with PostgreSQL easily even with write-intensive situations like SPAM storms if you use enterprise level SSD to store your database. I've not tested SQLgrey itself at these levels though (you may need to distribute the workload on multiple instances). > > First, I think it should be interesting to : > > * Use special the inet type for IPs and subnets > http://www.postgresql.org/docs/current/static/datatype-net-types.html > > I believe that using the inet type should be compatible with current > code in a fresh database. > > SELECT '216.34.181'::cidr => "216.34.181.0/24" > SELECT '216.34.181.6'::inet => "216.34.181.6" I considered this very early (and in fact I ran some code using these datatypes the very first day I started developing SQLgrey). PostgreSQL is by far my favourite RDBMS but I wanted SQLgrey to avoid being a PostgreSQL-only project so I refrained from using proprietary datatypes. The performance difference should be negligible as a string representation doesn't have much storage overhead (probably <3x) and allows us to use a very simple (and fast) string operation in Perl and equality for matching class C networks using the src column. It is used like this when using greymethod "smart" (default) or "classc". Using a cidr datatype could be faster but I'm not aware of a single SQLgrey installation where this would make a difference. Switching to the cidr datatype could probably give a minor speedup in PostgreSQL, but would stop SQLgrey support (or make it complex) for SQLite and MySQL/MariaDB and would probably bring a minor slowdown in the Perl part. > > * IP4r type could also be used from managing ranges of servers and > spammers.IP4r reported 6000 queries per second on complex projects: > https://www.siafoo.net/article/53 I'm not familiar with the IP4r module but I guess that my answer above would apply to it too. > > * Add indexes on domain_awl and from_awl tables. They are already indexed: the primary key automatically adds a unique index which should be used by all queries (at least it was designed to). > The advantage of CIDR > is that it will be very well indexed. The string index is very efficient too, the only real drawback is the storage. If you have a SQLgrey database where shaving half of the storage requirement would be beneficial the cidr datatype might be interesting to look at but I seriously doubt that such a database exists. Best regards, Lionel |