Re: [Sqlgrey-users] Add primary keys to tables?

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi.

Thanks @Lionel for the long explanation. I really like to discuss this :)

So let’s go into it.
> On 9 Feb 2016, at 17:28, Lionel Bouton <lio...@bo...> wrote:
> 
> Hi,
> 
> Le 09/02/2016 14:38, incredibleh0lg a écrit :
>> [...]
>> Another point made in this discussion was to give a use case: Third party applications using the data in the database. In my case, I’d like to provide a similar functionality as SQLgrey Webinterface provides. So present the user with relevant entries - e.g. in the connect table - and let him decide what to do with it.
> 
> I don't think it's a good example because you definitely don't want to
> modify the connect table content in the back of SQLgrey : this is the
> table it uses to implement the basic greylisting algorithm. So if you
> want to modify the connect table you actually want to modify the
> greylisting algorithm. It seems to me that the two sane approaches if
> that's what you really want would be to :
> - implement your changes in SQLgrey itself, not in a separate application,
> - replace SQLgrey completely or fork it to implement your own algorithm
> if what you want to do is very different or solves a specific problem
> which isn't commonly encountered by SQLgrey uers.
> 
> The same limitations apply to the config/from_awl/domain_awl tables (awl
> means *automatic* white list), if you try to modify them behind
> SQLgrey's back you are actually designing a new greylisting program.
> 
> The only tables I created for external configuration are the
> opt[in|out]_[email|domain] tables.
Ok, my bad (or more: my English skills :)) I see and treat SQLgrey’s database 
as what it is - a database used by an external tool. The “relevant entries” is 
something I still try to get my head around, as I will have to implement some 
logic *outside* SQLgrey (+ it’s database) to filter entries. This is definitely not 
SQLgrey’s job. It is and will always be the greylisting solution of my choice.
My “problem”, I try to solve, is to provide some kind of integrated UI where
users can manage their own Greylisting settings (opting out, whitelisting one 
specific email etc.) SQLgrey should do what it is really good at: greylisting.

> 
>> In the current format this means, that the request has to carry the WHOLE entry as parameter to get it moved or deleted.
> 
> In fact the connect table is a special case and doesn't even guarantee
> that each entry is unique so you can't event select a single entry
> deterministically.
> The tables that are supposed to be managed externally only have a single
> column : so the "WHOLE" entry is actually a single value which is
> obviously known by any application which would be designed to insert or
> delete entries (update doesn't make sense for these tables)... See below
> if "obviously" is not a given for you.
Fair enough. I was just thinking about an option to get an optional surrogate 
key (to stick with your terminology - every day a new word / term :D) to make 
it easier to address an entry. As far as I understand the logic for whitelisting an 
email currently “sitting in” greylisting, we have to create a corresponding entry 
with the data from the ‘connect' table within the ‘from_awl’ table and delete the 
entry in ‘connect’. At the moment I would have to submit 4 different values. As 
the entries in ‘connect’ are not necessarily unique, a surrogate key would make 
this easier. But life isn’t always easy, is it :)

> 
>> This is pretty wasteful and it is the reason why I asked if somebody tested a slightly adjusted schema where generated primary keys are added by the database.In theory, this shouldn’t break the application.
> 
> It wouldn't break SQLgrey but it would make developing the application
> managing SQLgrey's configuration tables more complex and bug-prone (see
> a detailed explanation below).
> 
> I think you are probably approaching the SQLgrey database and an
> external application's database as a single entity (see why I think so
> below too), which I don't think is a good way to approach the external
> configuration of SQLgrey.
See above, I wasn’t really specific enough. So sorry again :)

> 
> In a single database for any moderately complex application surrogate
> keys are nearly always the best choice for various reasons. For a
> database dedicated to a trivial service like SQLgrey this is clearly not
> the case.
Agreed. Just the point I tried to make in my previous email.

> 
> Let's use a practical example to demonstrate.
> Take the optout_email table (the other configuration tables are similar:
> all have a single column) and create an application managing its content
> (creating entries when users want to optout and deleting them when they
> want to activate greylisting again).
> If you use surrogate keys in the optout_email table the first question
> you have to ask yourself is : does my application separate database's
> user objects store :
> - user emails,
> - references to the surrogate keys in optout_email,
> - both
> and what is used among them to reference the optout_email entries.
> 
> You made clear that don't want to use the user emails to reference
> optout_email entries so the only 2 choices for you remaining are the
> last ones.
> But actually if you store the references to surrogate keys you will most
> probably want to keep a copy of the email address too. Not only because
> it would be convenient in your application which would probably access
> the Postfix user database (or a database used to create the Postfix one
> which obviously must have the email addresses itself) but also because
> optout_email won't have the email if the user didn't choose to opt out.
> If you want to avoid storing the address at all costs you would make the
> application and Postfix depend on a SQLgrey change to store all your
> users' email addresses to solve this. This is not SQLgrey's role: it's a
> greylisting service, not an email user database. If you want to create
> your user database you obviously won't try to do so by installing a
> greylisting service.
Well, I wouldn’t mind to use the email address - although this can be tricky
in itself, depending on how the data gets transfered (http-encoded email 
addresses can be a pain, esp. with + addresses, which I personally use 
extensively).
I absolutely get your point and agree 100%. My application should not 
“hijack” SQLgrey’s schema (and was never planned to do it). Otherwise it 
would be worth implementing my own greylisting service. Which would be 
an interesting challenge in itself, but not very high on my long ToDo list…

> 
> In conclusion there's no real choice once you have chosen to use
> surrogate keys : you will have to store both the email and the reference
> to the surrogate key in the external database in one way or another.
> 
> So now if you really want to use the surrogate keys instead of the email
> itself you have to maintain consistency between your reference and the
> email addresses stored in two separate databases with 2 different
> codebases managed by 2 different sets of people. This is clearly (I hope
> so anyway) not something you want to do *at all*. By avoiding surrogate
> keys SQLgrey actually forces a potential separate database to be
> inherently consistent with its own database.
> 
> In case this isn't crystal clear, as an example of the consistency
> problems consider these common scenarios :
> - admins create entries manually until enough users ask for direct
> access and install the configuration application,
> - admins use batches to create initial entries or maintain a subset of
> the optout_email table in parallel to the configuration application,
> these batches probably aren't aware of the existence of the
> configuration application and don't know how to update its database,
> - after a crash, on restoring the 2 separate databases (SQLgrey and
> configuration application) you can get a more recent SQLgrey database
> with entries not (yet) referenced by the other restored database version,
> - ...
> All these scenario imply that there will be entries in optout_email not
> yet referenced by the configuration application.
> When creating new users with addresses already in optout_email, how will
> you create the references to these entries ? The only way is to check
> that for each user you create in your application there isn't already an
> entry in the optout_email table with the same email address. So for new
> users your reference to the optout_email table is the email address and
> for existing users the reference is the surrogate key : it's bad design
> and a loud and clear invitation to bugs.
Absolutely agreed. And on this special occasion, I think it is absolutely fine to
not normalise the data. It is simply not necessary or helpful. The complete
opposite is the case: this would definitely have an performance impact.

> 
> Now let's see how you would design the application without surrogate keys.
> The application has its user objects with their own email attribute.
> - when you want to activate optout, you use your application's email
> attribute to find or create an entry in the optout_email table if needed,
> - when you want to deactivate it, you find and delete the entry with the
> same email,
> - when you change the email address of a user, you remove the old entry
> (if it exists) and create a new one if applicable.
> This is *trivial* logic. An environment where this is difficult to do
> would have to prevent you to make the most basic SQL queries imaginable.
> 
> Why should we even consider a change only useful to people starting to
> develop external tools by shooting themselves in the foot when choosing
> their development environment ?
Couldn’t agree more. 

Well, I think it was worth asking anyway. Thanks again for the very 
detailed answer.

All the best,
Holger