Re: [Sqlgrey-users] Congratulations & PostgreSQL issue

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Le 11/08/2015 22:26, Jean-Michel Pouré - GOOZE a écrit :
> Dear Lionel,
>
> First, I would like to congratulate you for bringing SqlGrey to the
> community. I just discovered SqlGrey and it solved my SPAM problems.

Happy to be of service. Numerous others have helped too (see the CONTRIB
file).

>
> Together with SPF validation, DNSBL and other filtering techniques, I
> now have nearly zero spam.
>
> I am running Debian and I am a big fan of PostgreSQL. The automatic
> population of the database and the SQL code using prepared queries was
> a very nice surprise. 
>
> Previously, I tested other grey listing solutions and they don't have
> the simplicity and power of SqlGrey. I am surprised to see that some
> SqlGrey databases have been running years without incident.

This is a testament to the robustness of both the database systems and
SQLgrey.

>
> There might be modifications needed to stand "huge" loads under
> PostgreSQL. For example, I think of as much as 10.000 queries per
> second. This does not happen very often, I agree.

In theory you should be able to reach these levels with PostgreSQL
easily even with write-intensive situations like SPAM storms if you use
enterprise level SSD to store your database. I've not tested SQLgrey
itself at these levels though (you may need to distribute the workload
on multiple instances).

>
> First, I think it should be interesting to :
>
> * Use special the inet type for IPs and subnets
> http://www.postgresql.org/docs/current/static/datatype-net-types.html
>
> I believe that using the inet type should be compatible with current
> code in a fresh database.
>
> SELECT '216.34.181'::cidr =>  "216.34.181.0/24"
> SELECT '216.34.181.6'::inet => "216.34.181.6"

I considered this very early (and in fact I ran some code using these
datatypes the very first day I started developing SQLgrey). PostgreSQL
is by far my favourite RDBMS but I wanted SQLgrey to avoid being a
PostgreSQL-only project so I refrained from using proprietary datatypes.
The performance difference should be negligible as a string
representation doesn't have much storage overhead (probably <3x) and
allows us to use a very simple (and fast) string operation in Perl and
equality for matching class C networks using the src column. It is used
like this when using greymethod "smart" (default) or "classc". Using a
cidr datatype could be faster but I'm not aware of a single SQLgrey
installation where this would make a difference.

Switching to the cidr datatype could probably give a minor speedup in
PostgreSQL, but would stop SQLgrey support (or make it complex) for
SQLite and MySQL/MariaDB and would probably bring a minor slowdown in
the Perl part.

>
> * IP4r type could also be used from managing ranges of servers and
> spammers.IP4r reported 6000 queries per second on complex projects:
> https://www.siafoo.net/article/53

I'm not familiar with the IP4r module but I guess that my answer above
would apply to it too.

>
> * Add indexes on domain_awl and from_awl tables.

They are already indexed: the primary key automatically adds a unique
index which should be used by all queries (at least it was designed to).

>  The advantage of CIDR
> is that it will be very well indexed.

The string index is very efficient too, the only real drawback is the
storage. If you have a SQLgrey database where shaving half of the
storage requirement would be beneficial the cidr datatype might be
interesting to look at but I seriously doubt that such a database exists.

Best regards,

Lionel