Re: Database architecture...

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Thomas...

This is the last answer to you over the list, for a further discussion I 
suggest we do that privatly....

I am aware of 3NF, and why you don't always want to follow those rules, I've 
made a lot exceptions on them myself because it can be usefull (mostly for 
performance as you say).
But helas, thats not my point here. I'm not talking about performance here, 
but about using the possibilities that are offered by open-source software, 
and about keeping possibilyties open for future expansions.
Most companies would be helped if a software package could be build that 
incorporates al packages they need, using only one database. Take for exemple 
an addresbook. You need the addresses to make invoices, but not everyone has 
acces (or should have) to the accounting, so you'll need a second one for the 
other employees. But if there's a second program that uses the same tables, 
they could use that.
For that idea to work, you would need a standerd database model, and 
something like addresses would have to be in one table.
There's also something like loginnames, but I'll react to that later...
For sql-ledger to really improof from what it is now (it's the best 
bookkeeping program I've seen) is to make it work together with other 
programs and to let the idea go that everything should work inside of it. 
You've now made an order system inside of sql-ledger, but I havn't seen an 
accounting doing the ordering of hardware yet or placing the orders of 
clients. Usually that's someone else, who has no access to the accounting.
Wouldn't it be great if sql-ledger and phpgroupware could use the same 
database (and i don't mean just the adresbook, but the entire package). With 
enough willpower, it could be done within a month, because it really isn't 
that much work.

But even if you don't do something like that, the continuing expanding of 
sql-ledger as it is happening now, will mean that in the future there are 
going to be more functions for people and businesses. And creating tables 
every time is going to mean that you're database gets more complex and less 
usefull for companies who want to use all the functionality because they have 
to copy persons all over the database and have diffuculties keeping address 
correct.

> This is called 3NF (Codd's "Third Normal Form".)  As to it being redundant
> and the source of 'minor problems' is really a matter of opinion.  Ensuring
> relational integrity by having each entity describe only ONE subject
> (whether customer or vendor or foobar) is the practical application of the
> theory. The entity described by customer is the person's ROLE...this is a
> distinct role, different from the role of vendor, hence, not really
> "redundant". Can the same person have two roles?  Yes...does it happen
> often enough to justify adding complexity to the schema?  Not terribly
> often.

I agree it doesn't really happen that often, but that in intself means that i 
DOES happen...???

> > Other exemple. Say I would like everybody (vendors and customars) to have
> > a login on my website. With what query do I check the login-name?? I
> > cannot create a view because then I cannot garantee that a login name
> > appears only once!
>
> This has been discussed previously and is a valid point.  Relational
> databases use check constraints to prevent corruption.  See the postgres
> man page on ADD CONSTRAINT or CREATE INDEX...you could also add a stanza or
> two to the existing code to trap the error before the parser prints stderr
> to the user's screen.  This is prettier to view than a failed query dumped
> to the screen (and possibly more useful for the end user.)

As far as I know that isn't possible. You cannot create an index on a view, 
only on individual tables, which means that a loginname could excist twice, 
once in vendor, and once in customar.
I think we agree that coding something that checks for that problem isn't the 
ideal solution.

> > Solution: The solution would be to not take people or companies as being
> > there functions. I mean, a vendor is not really a vendor because he
> > himself also buys goods. He is actually a "company" that has a "vendor"
> > relationship with you.
> > To recreate that you make a th folowing tables:
> >
> > -companies (with the fields like customar or vendor)
> > -function (a table with company_nr and function_nr, they are primary key)
> > -function_define (function_nr = primary key, and some explanation fields)
> > -people (probably something like employees like adres and phone,... but
> > without salary)
> > -peoplecompany (a bit like function to tie people to companies, like
> > contacts) -employee (people_nr, salary,...)
> > ( You could also start with departments, but is not necessary)
>
> And this reduces redundancy?  I don't see it.  From this vantage point it
> increases complexity without any real gain as to access the data you would
> need too many joins.  Views would contain subqueries (correlated
> perhaps)... All of this for the rare exception?   I don't see the gain.

I'm going from 5 to 6 tables, so not that much compexity extra. If you're 
counting fields it will actually become less. But when there's a new kind of 
company or person, that's when the advantages of this system come into action.

> It would be easier to simply place every contact in one table and then
> define their role in the context of a transaction.  But is it necessary?
> Or even useful?  It makes cursor selects (without joins) difficult when
> all we want back is a list of vendors...keep it simple is a very good
> motto.

There's something called views. If you create a view calle vendor consisting 
of all companies that sell goods to you, you wouldn't even need to change the 
query's. Personly, I don't mind the joins. I have more problems with tables 
that shouldn't be there, making it more diffucult for people trying to make 
add-ons for you're program.

> > Employee could have a company_nr, but what's the point, I am only
> > ionterested in my employees so I assume there all mine....
> > Even adresses can be separated from people and companies if you're really
> > ambitius.
>
> vendor/customer relations may have drill down tables as well...without
> abandoning the concept of confining each relation to the description of
> a singular entity (in this case the ROLE of the person recorded in the
> relation.)  Although here again, it is not a matter of ambition but the
> usual tradeoff between performance (denormalisation of schema) versus
> reduction of redundancy (in an effort to cut overhead and eliminate data
> errors).  In real world situations people often denormalise because the
> performance gain is worth it.  

I agree that you can have a bad model which gives significant performance 
loss. That's definitly not the case here.
If you're using an index (which was the idea), the db will search 
logaritmicly. Meaning that that if the tables become twice as long (combining 
vendor's and customars) it will need twice as long.
Exemple: You have 1 million customars and 1 million vendor's in you're db.
making the query ( to search a certain vendor using indexnumber) in the db we 
use now using 20 compare statments. The new one will use 42. 42 Compare 
statements is still using no time.
The performance towards reduction of redundancy you're talking about is 
actually called caching. This is where you deliberatly put data in a table, 
to be able to find it faster later. That's not what were discussing now.

If you see this sort of thing as too
> redundant for your taste you might consider the hierarchical model (mumps)
> as an alternative.  It is very fast, less 'redundant' and has a leaner
> programming language.  Of course, it has given way to the relational model
> for some very good reasons (not just Oracle's marketing skills.)

I'm very fond of the relational system, and for good reason. Even dough I 
don't like using 1 to 3NF

> Simpler IS better...

Yes, but what is simpler????

> > Query's
> > Queries will not change much, other then always having to define who you
> > want.
>
> [ ... snip ... ]
>
> No offence Roland, but what you're talking about is reinventing the wheel.
> In my view, the current one (being round - and sound) suffices.  Another
> point is that I've seen alot of people come along and offer suggestions for
> sweeping changes to a product that has already racked up hours of
> development time and has a base of satisified users.  Usually not much
> comes of this. The reason?  The current product does the job...otherwise it
> would not be in wide use.  

I'm not suggesting anything that would take a lot of time coding. In its 
sompelest form, not even the query's would need work, only the db. If it 
needed a totaly new interface, I would be in the wrong place, because this 
interface is practicly finished and really good.

Of course, should you wish to reinvent the wheel
> as an academic exercise, why not?  But I see the existing schema and code
> as already being rather trim and possessing an object oriented bent.  The
> combo of a sound relational structure coupled with an OO design is a clear
> winner.

Finaly!!!! Of course " a sound relational structure coupled with an OO design 
is a clear winnar". That's what I'm suggesting here. You have an object 
called companies, with certain functions in an interaction with you. That's 
what this is all about. There is no object "vendor" because thats a proparty 
of an object, and not the object itself.

Roland