Re: Database architecture...

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

You are both right.

The problem is to know where to stop.

1. re excellence of SQLledger
  I agree that SQLledger is superb. It is much much simpler than
  commercial offerings that I have had to support in the past.
  In general this simplicity is not at the expense of loss of
  function.

  Having a database that you can fully understand because it has only
  a dozen tables is precious.

  Roland's point about increasing database complexity as modules and
  features are added is very apparent in commercial examples where 
  tables are numbered in the many hundreds. I ernestly hope that
  SQL-ledger can avoid falling into this trap. I think Roland's
suggestions
  might help this - but they need to be carefully thought through first.

2. re 3NF
  This is a sound principle which can easily be pushed past the point of
  absurdity.

  For example, why stop at a separate table for addresses? I could (but
won't)
  argue that we should properly normalise addresses into 4 subtables:
   locations -< streets -< towns -< countries on the grounds that this
  would eliminate enormous redundancy (especially in countries with
large
  population and many towns/cities). 

 3. re OOP
  All OOP programming has at its heart the difficulty of deciding how
  to structure your nodel of the problem space. This is almost never
  a single correct or obviuos design.

John O'Gorman

Roland Stoker wrote:
> 
> Hi Thomas...
> 
> This is the last answer to you over the list, for a further discussion I
> suggest we do that privatly....
> 
> I am aware of 3NF, and why you don't always want to follow those rules, I've
> made a lot exceptions on them myself because it can be usefull (mostly for
> performance as you say).
> But helas, thats not my point here. I'm not talking about performance here,
> but about using the possibilities that are offered by open-source software,
> and about keeping possibilyties open for future expansions.
> Most companies would be helped if a software package could be build that
> incorporates al packages they need, using only one database. Take for exemple
> an addresbook. You need the addresses to make invoices, but not everyone has
> acces (or should have) to the accounting, so you'll need a second one for the
> other employees. But if there's a second program that uses the same tables,
> they could use that.
> For that idea to work, you would need a standerd database model, and
> something like addresses would have to be in one table.
> There's also something like loginnames, but I'll react to that later...
> For sql-ledger to really improof from what it is now (it's the best
> bookkeeping program I've seen) is to make it work together with other
> programs and to let the idea go that everything should work inside of it.
> You've now made an order system inside of sql-ledger, but I havn't seen an
> accounting doing the ordering of hardware yet or placing the orders of
> clients. Usually that's someone else, who has no access to the accounting.
> Wouldn't it be great if sql-ledger and phpgroupware could use the same
> database (and i don't mean just the adresbook, but the entire package). With
> enough willpower, it could be done within a month, because it really isn't
> that much work.
> 
> But even if you don't do something like that, the continuing expanding of
> sql-ledger as it is happening now, will mean that in the future there are
> going to be more functions for people and businesses. And creating tables
> every time is going to mean that you're database gets more complex and less
> usefull for companies who want to use all the functionality because they have
> to copy persons all over the database and have diffuculties keeping address
> correct.
> 
> > This is called 3NF (Codd's "Third Normal Form".)  As to it being redundant
> > and the source of 'minor problems' is really a matter of opinion.  Ensuring
> > relational integrity by having each entity describe only ONE subject
> > (whether customer or vendor or foobar) is the practical application of the
> > theory. The entity described by customer is the person's ROLE...this is a
> > distinct role, different from the role of vendor, hence, not really
> > "redundant". Can the same person have two roles?  Yes...does it happen
> > often enough to justify adding complexity to the schema?  Not terribly
> > often.
> 
> I agree it doesn't really happen that often, but that in intself means that i
> DOES happen...???
> 
> > > Other exemple. Say I would like everybody (vendors and customars) to have
> > > a login on my website. With what query do I check the login-name?? I
> > > cannot create a view because then I cannot garantee that a login name
> > > appears only once!
> >
> > This has been discussed previously and is a valid point.  Relational
> > databases use check constraints to prevent corruption.  See the postgres
> > man page on ADD CONSTRAINT or CREATE INDEX...you could also add a stanza or
> > two to the existing code to trap the error before the parser prints stderr
> > to the user's screen.  This is prettier to view than a failed query dumped
> > to the screen (and possibly more useful for the end user.)
> 
> As far as I know that isn't possible. You cannot create an index on a view,
> only on individual tables, which means that a loginname could excist twice,
> once in vendor, and once in customar.
> I think we agree that coding something that checks for that problem isn't the
> ideal solution.
> 
> > > Solution: The solution would be to not take people or companies as being
> > > there functions. I mean, a vendor is not really a vendor because he
> > > himself also buys goods. He is actually a "company" that has a "vendor"
> > > relationship with you.
> > > To recreate that you make a th folowing tables:
> > >
> > > -companies (with the fields like customar or vendor)
> > > -function (a table with company_nr and function_nr, they are primary key)
> > > -function_define (function_nr = primary key, and some explanation fields)
> > > -people (probably something like employees like adres and phone,... but
> > > without salary)
> > > -peoplecompany (a bit like function to tie people to companies, like
> > > contacts) -employee (people_nr, salary,...)
> > > ( You could also start with departments, but is not necessary)
> >
> > And this reduces redundancy?  I don't see it.  From this vantage point it
> > increases complexity without any real gain as to access the data you would
> > need too many joins.  Views would contain subqueries (correlated
> > perhaps)... All of this for the rare exception?   I don't see the gain.
> 
> I'm going from 5 to 6 tables, so not that much compexity extra. If you're
> counting fields it will actually become less. But when there's a new kind of
> company or person, that's when the advantages of this system come into action.
> 
> > It would be easier to simply place every contact in one table and then
> > define their role in the context of a transaction.  But is it necessary?
> > Or even useful?  It makes cursor selects (without joins) difficult when
> > all we want back is a list of vendors...keep it simple is a very good
> > motto.
> 
> There's something called views. If you create a view calle vendor consisting
> of all companies that sell goods to you, you wouldn't even need to change the
> query's. Personly, I don't mind the joins. I have more problems with tables
> that shouldn't be there, making it more diffucult for people trying to make
> add-ons for you're program.
> 
> > > Employee could have a company_nr, but what's the point, I am only
> > > ionterested in my employees so I assume there all mine....
> > > Even adresses can be separated from people and companies if you're really
> > > ambitius.
> >
> > vendor/customer relations may have drill down tables as well...without
> > abandoning the concept of confining each relation to the description of
> > a singular entity (in this case the ROLE of the person recorded in the
> > relation.)  Although here again, it is not a matter of ambition but the
> > usual tradeoff between performance (denormalisation of schema) versus
> > reduction of redundancy (in an effort to cut overhead and eliminate data
> > errors).  In real world situations people often denormalise because the
> > performance gain is worth it.
> 
> I agree that you can have a bad model which gives significant performance
> loss. That's definitly not the case here.
> If you're using an index (which was the idea), the db will search
> logaritmicly. Meaning that that if the tables become twice as long (combining
> vendor's and customars) it will need twice as long.
> Exemple: You have 1 million customars and 1 million vendor's in you're db.
> making the query ( to search a certain vendor using indexnumber) in the db we
> use now using 20 compare statments. The new one will use 42. 42 Compare
> statements is still using no time.
> The performance towards reduction of redundancy you're talking about is
> actually called caching. This is where you deliberatly put data in a table,
> to be able to find it faster later. That's not what were discussing now.
> 
> If you see this sort of thing as too
> > redundant for your taste you might consider the hierarchical model (mumps)
> > as an alternative.  It is very fast, less 'redundant' and has a leaner
> > programming language.  Of course, it has given way to the relational model
> > for some very good reasons (not just Oracle's marketing skills.)
> 
> I'm very fond of the relational system, and for good reason. Even dough I
> don't like using 1 to 3NF
> 
> > Simpler IS better...
> 
> Yes, but what is simpler????
> 
> > > Query's
> > > Queries will not change much, other then always having to define who you
> > > want.
> >
> > [ ... snip ... ]
> >
> > No offence Roland, but what you're talking about is reinventing the wheel.
> > In my view, the current one (being round - and sound) suffices.  Another
> > point is that I've seen alot of people come along and offer suggestions for
> > sweeping changes to a product that has already racked up hours of
> > development time and has a base of satisified users.  Usually not much
> > comes of this. The reason?  The current product does the job...otherwise it
> > would not be in wide use.
> 
> I'm not suggesting anything that would take a lot of time coding. In its
> sompelest form, not even the query's would need work, only the db. If it
> needed a totaly new interface, I would be in the wrong place, because this
> interface is practicly finished and really good.
> 
> Of course, should you wish to reinvent the wheel
> > as an academic exercise, why not?  But I see the existing schema and code
> > as already being rather trim and possessing an object oriented bent.  The
> > combo of a sound relational structure coupled with an OO design is a clear
> > winner.
> 
> Finaly!!!! Of course " a sound relational structure coupled with an OO design
> is a clear winnar". That's what I'm suggesting here. You have an object
> called companies, with certain functions in an interaction with you. That's
> what this is all about. There is no object "vendor" because thats a proparty
> of an object, and not the object itself.
> 
> Roland