Re: [Sqlgrey-users] Add primary keys to tables?

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hi,

Le 09/02/2016 14:38, incredibleh0lg a écrit :
> [...]
> Another point made in this discussion was to give a use case: Third party applications using the data in the database. In my case, I’d like to provide a similar functionality as SQLgrey Webinterface provides. So present the user with relevant entries - e.g. in the connect table - and let him decide what to do with it.

I don't think it's a good example because you definitely don't want to
modify the connect table content in the back of SQLgrey : this is the
table it uses to implement the basic greylisting algorithm. So if you
want to modify the connect table you actually want to modify the
greylisting algorithm. It seems to me that the two sane approaches if
that's what you really want would be to :
- implement your changes in SQLgrey itself, not in a separate application,
- replace SQLgrey completely or fork it to implement your own algorithm
if what you want to do is very different or solves a specific problem
which isn't commonly encountered by SQLgrey uers.

The same limitations apply to the config/from_awl/domain_awl tables (awl
means *automatic* white list), if you try to modify them behind
SQLgrey's back you are actually designing a new greylisting program.

The only tables I created for external configuration are the
opt[in|out]_[email|domain] tables.

>  In the current format this means, that the request has to carry the WHOLE entry as parameter to get it moved or deleted.

In fact the connect table is a special case and doesn't even guarantee
that each entry is unique so you can't event select a single entry
deterministically.
The tables that are supposed to be managed externally only have a single
column : so the "WHOLE" entry is actually a single value which is
obviously known by any application which would be designed to insert or
delete entries (update doesn't make sense for these tables)... See below
if "obviously" is not a given for you.

>  This is pretty wasteful and it is the reason why I asked if somebody tested a slightly adjusted schema where generated primary keys are added by the database.In theory, this shouldn’t break the application.

It wouldn't break SQLgrey but it would make developing the application
managing SQLgrey's configuration tables more complex and bug-prone (see
a detailed explanation below).

I think you are probably approaching the SQLgrey database and an
external application's database as a single entity (see why I think so
below too), which I don't think is a good way to approach the external
configuration of SQLgrey.

In a single database for any moderately complex application surrogate
keys are nearly always the best choice for various reasons. For a
database dedicated to a trivial service like SQLgrey this is clearly not
the case.

Let's use a practical example to demonstrate.
Take the optout_email table (the other configuration tables are similar:
all have a single column) and create an application managing its content
(creating entries when users want to optout and deleting them when they
want to activate greylisting again).
If you use surrogate keys in the optout_email table the first question
you have to ask yourself is : does my application separate database's
user objects store :
- user emails,
- references to the surrogate keys in optout_email,
- both
and what is used among them to reference the optout_email entries.

You made clear that don't want to use the user emails to reference
optout_email entries so the only 2 choices for you remaining are the
last ones.
But actually if you store the references to surrogate keys you will most
probably want to keep a copy of the email address too. Not only because
it would be convenient in your application which would probably access
the Postfix user database (or a database used to create the Postfix one
which obviously must have the email addresses itself) but also because
optout_email won't have the email if the user didn't choose to opt out.
If you want to avoid storing the address at all costs you would make the
application and Postfix depend on a SQLgrey change to store all your
users' email addresses to solve this. This is not SQLgrey's role: it's a
greylisting service, not an email user database. If you want to create
your user database you obviously won't try to do so by installing a
greylisting service.

In conclusion there's no real choice once you have chosen to use
surrogate keys : you will have to store both the email and the reference
to the surrogate key in the external database in one way or another.

So now if you really want to use the surrogate keys instead of the email
itself you have to maintain consistency between your reference and the
email addresses stored in two separate databases with 2 different
codebases managed by 2 different sets of people. This is clearly (I hope
so anyway) not something you want to do *at all*. By avoiding surrogate
keys SQLgrey actually forces a potential separate database to be
inherently consistent with its own database.

In case this isn't crystal clear, as an example of the consistency
problems consider these common scenarios :
- admins create entries manually until enough users ask for direct
access and install the configuration application,
- admins use batches to create initial entries or maintain a subset of
the optout_email table in parallel to the configuration application,
these batches probably aren't aware of the existence of the
configuration application and don't know how to update its database,
- after a crash, on restoring the 2 separate databases (SQLgrey and
configuration application) you can get a more recent SQLgrey database
with entries not (yet) referenced by the other restored database version,
- ...
All these scenario imply that there will be entries in optout_email not
yet referenced by the configuration application.
When creating new users with addresses already in optout_email, how will
you create the references to these entries ? The only way is to check
that for each user you create in your application there isn't already an
entry in the optout_email table with the same email address. So for new
users your reference to the optout_email table is the email address and
for existing users the reference is the surrogate key : it's bad design
and a loud and clear invitation to bugs.

Now let's see how you would design the application without surrogate keys.
The application has its user objects with their own email attribute.
- when you want to activate optout, you use your application's email
attribute to find or create an entry in the optout_email table if needed,
- when you want to deactivate it, you find and delete the entry with the
same email,
- when you change the email address of a user, you remove the old entry
(if it exists) and create a new one if applicable.
This is *trivial* logic. An environment where this is difficult to do
would have to prevent you to make the most basic SQL queries imaginable.

Why should we even consider a change only useful to people starting to
develop external tools by shooting themselves in the foot when choosing
their development environment ?

Best regards,

Lionel