Re: [Modeling-users] Pythonic, and non-XML, Model description

Hi,

Thanks for the ModelSet clarification. Further comments below...

> About: Models' being mutable at runtime
>> In general this is not what anyone would want to do, and I was trying
>> to think of times when this would be useful. Can't really, but i
>> *suspect* it might come in handy. But, in fact, imagine some
>> application that is stocking lots of incoming unknown data into a
>> db... let's say the data is XML, but each XML data-schema is unknown a
>> priori. You can create a py model to correspond, at runtime (but saved
>> for subsequent accessing), create the tables, and stock the unknown
>> XML in the db directly as real db tables....
>
> Okay, I now can be more precise here: adding an entity at runtime is
> definitely not a problem. But mutating an entity (changing its
> attributes for example, or its name), or removing an entity should *not*
> be done after an object of the corresponding class has been registered
> within an EditingContext.
>
> (this should be a FAQ)

OK, understood, and probably a good idea to mention it as an FAQ.

...

>> What is wrong with a python list?
>> multiplicity = [0,1]
>> multiplicity = [1,-1]
>
> Well, I understand your point quite well and I'm ok with a list. [1,-1]
> seems however obscure, if we go for a python list what do you think of
> [0,'*'] or maybe better: [0,]  ?

Absolutely fine with me (used -1 to keep it as an integer). I think i prefer
the None version of this, which could also be explicit, i.e. [1,None]
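
To make the [lower, upper] convention concrete, here is a minimal sketch of how such a list could be checked, assuming None as the upper bound means "no limit". The helper name check_multiplicity is hypothetical and not part of PyModel.

```python
def check_multiplicity(multiplicity, count):
    """Return True if `count` related objects satisfy [lower, upper].

    Assumption: an upper bound of None means "no upper limit".
    """
    lower, upper = multiplicity
    if count < lower:
        return False
    return upper is None or count <= upper


print(check_multiplicity([0, 1], 1))       # True: an optional to-one
print(check_multiplicity([1, None], 0))    # False: at least one is required
print(check_multiplicity([1, None], 42))   # True: no upper limit
```

Compared with -1, None cannot be mistaken for a real count, so the upper-bound test stays a plain comparison.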

>>>   - Same for external type's width, precision and scale, such as in:
>>>     'NUMERIC(12,2)' or 'VARCHAR(200)'
>>
>> Same thing here. This would also make it difficult to provide defaults
>> separately for dbtype, and for width or precision.
>
> I disagree: default would be treated exactly the same way: if the width
> or precision/scale cannot be found in the parsed string then the defaults
> apply.

Ah, so you mean that assignment will just map such an initial value onto the
concerned properties, and then work with the separated values from then on.
Well, OK. But it may lead to confusion if you ask for the value of type and
get something other than what you specified (e.g. only 'NUMERIC' without the
qualifiers).
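
For illustration only, the mapping discussed here might look like the sketch below. The function split_db_type and its regexp are hypothetical, not anything in the framework. It also shows the possible confusion: what is kept as the type is just the bare name.

```python
import re

# Hypothetical pattern: a type name, optionally followed by (n) or (n, m).
_SPEC = re.compile(r'^\s*([A-Za-z_]+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$')


def split_db_type(spec):
    """Split 'NUMERIC(12,2)' into ('NUMERIC', (12, 2))."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("cannot parse db type: %r" % (spec,))
    name, first, second = match.groups()
    # Keep only the qualifiers that were actually present in the string.
    params = tuple(int(p) for p in (first, second) if p is not None)
    return name.upper(), params


print(split_db_type('NUMERIC(12,2)'))   # ('NUMERIC', (12, 2))
print(split_db_type('VARCHAR(200)'))    # ('VARCHAR', (200,))
print(split_db_type('NUMERIC'))         # ('NUMERIC', ()) -> defaults would apply
```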

>> So, I would vote to keep them separate (and to change the name of
>> externalType to dbType). I think it will be easier to work with,
>> and clearer for everyone.
>
> I'd really like to keep the former possibility, though. It can co-exist
> with yours, I've no problem with that.
>
> I won't however go crazy if we do not support it ! I suggest we forget
> about my proposal for 'VARCHAR(20)' and that we only consider explicit
> parameters width/precision/scale. Since it's not a hard thing to add, we
> can safely forget it for the moment and add it later if this turns out to
> be a users' request.

I would agree with proceeding this way.

>   Now that I think of it, there's an underlying point that should be
>   discussed for clarification. For example a string/VARCHAR won't get
>   any defaults in the conversion process, unless explicitly stated in
>   the model itself (your AString.defaults['width']) --same for NUMERIC,
>   etc. This way it makes it clear that a String requires a width, that a
>   float requires precision and scale, etc., since we would get an
>   error if neither a default nor an explicit parameter is provided.

Yes, this syntax will make that association clearer. But the validation should
also point out if these are missing.
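
A rough sketch of that behaviour, assuming the explicit parameter wins over the model-level default and that validation reports an error when neither is present. The AString class below is a stand-in written for this example, not the real PyModel class.

```python
class AString:
    """Stand-in for a string attribute; not the real PyModel class."""
    defaults = {}  # deliberately empty: the framework supplies no width

    def __init__(self, name, width=None):
        self.name = name
        self._width = width

    @property
    def width(self):
        # Explicit parameter wins; otherwise fall back to the model default.
        if self._width is not None:
            return self._width
        return self.defaults.get('width')

    def validate(self):
        """Return a list of error strings (empty when the attribute is fine)."""
        if self.width is None:
            return ["%s: a string attribute needs a width "
                    "(explicit or default)" % self.name]
        return []


print(AString('zipCode').validate())        # one error: no width anywhere
AString.defaults['width'] = 20              # model-level default
print(AString('zipCode').validate())        # []
print(AString('street', width=80).width)    # 80
```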

>   What do you think about this? In fact, I do not even know what a good
>   default for width, scale or precision can be. Different DBs have
>   different defaults, and I don't think it's worth registering/tracking
>   them.

Have no idea what reasonable defaults should be...
I feel their main purpose is to make things less error-prone, especially
in the beginning. Programs should probably specify them anyway,
when they know what is best for the program (and, should probably
not rely on default values set by the framework.)

>>>   - Again, same for an entity's parent which could be specified
>>>     along with the entity's name: 'Executive(Employee)'
>>
>> Same. I vote to keep separate, and to rename parentEntity to isAlso,
>> e.g.  Entity('Executive', isAlso='Employee', ...
>
> I know this is a detail, but I would have spontaneously shortened it to
> 'parent'. 'isAlso' vaguely reminds me of something, but I don't know
> what... It sounds familiar. Is this UML jargon?

Not UML -- it is just what i think of (isAlso) when i think of inheritance...
But parent is just fine.

>>>   - What about the possibility to add a '*' to an attribute's name when
>>>     it's required? (may be same for a relationship, equivalent to lower
>>>     bound==1)
>>
>> Same for adding '*' to att name
>
> Ok, again, let's forget it, I added it for completeness but didn't
> really like it either. The explicit parameter 'required' is enough.
>
> Speaking of this term, maybe it's not clear enough: I observed people
> being surprised that '' (empty string) is a valid value for a field that
> is required because they misunderstood it, while it only tells whether
> the attribute can be None (python) / NULL (SQL). In the Attribute API
> there's also the counterpart 'allowsNone', what do you think?

Hmm, i feel isRequired is more logical, while allowsNone is more
implementation-oriented -- but quite clear. The problem with it is that
it reads as a double negative (required => allowsNone = No) which i always
find irritating. Plus, should a formal expression of what values are allowed
for an attribute one day also be specified in the model, then isRequired
will lose the confusion potential you mention, e.g. if the model supports
a regexp for an attribute (which all values must match) then isRequired
will clearly mean that a value respecting this constraint must be present.

Which is a good opportunity to propose a feature request: being able to
specify a regexp for an attribute, against which the framework will
validate all values to be assigned to the attribute (and which does not
exclude providing other declarations to use for data validation for any
attribute, such as type).
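
As a sketch of what this request might look like (the 'pattern' keyword, the Attribute class and validate_value below are all made up for illustration and do not exist in the framework):

```python
import re


class Attribute:
    """Illustrative attribute with an optional regexp constraint."""

    def __init__(self, name, isRequired=False, pattern=None):
        self.name = name
        self.isRequired = isRequired
        self._regex = re.compile(pattern) if pattern is not None else None

    def validate_value(self, value):
        """Raise ValueError if `value` may not be assigned to this attribute."""
        if value is None:
            if self.isRequired:
                raise ValueError("%s is required" % self.name)
            return
        # The whole value must match the pattern, not just a prefix of it.
        if self._regex is not None and not self._regex.fullmatch(str(value)):
            raise ValueError("%s: %r does not match the declared pattern"
                             % (self.name, value))


zipCode = Attribute('zipCode', isRequired=True, pattern=r'\d{5}')
zipCode.validate_value('75011')      # passes silently
try:
    zipCode.validate_value('')       # '' is not None, but fails the regexp
except ValueError as exc:
    print(exc)
```

With a pattern declared, the empty string is no longer silently accepted for a required attribute, which is the confusion mentioned above.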

...

>> See the defaults for sourceAttribute and destinationAttribute in
>> RToOne and RToMany... Also, see the explicit declaration of the
>> foreign keys (with the possibility of automating the name for it).
>
> I'm afraid the naming scheme you propose won't make it. In fact, I was
> thinking of automating the simple case, where only one association
> exists between two entities. In this case, the foreign key only needs to
> be named 'fk<destinationEntityName>' ; this makes it easy for one
> relationship and its possible inverse to actually refer to the same
> source/destination attributes in their joins.
>
> Now a more complex example: suppose you want to model something like
> this:
>
>       A <------->> B
>         <------->>
>
>   (this comes from a "real-life" example, two relationships from A to B
>    having different semantics)
>
> Suppose we first process A's relationships: this will create two
> different FK in B pointing to A's PK. Now we process entity B: there is
> no way to identify which relationship coming from A is the inverse of a
> toOne from B. Hence, this can be automated *if and only if* we also have
> the keyword 'inverse' in the declaration of a relationship.
>
>   BTW, this is where I think automation will not be a real gain and will
>   not be worth the effort, because that sort of model is really not
>   showing up frequently, and because when it shows up you usually want
>   to have names for foreign keys more explicit than
>   fk<destinationEntityName><count>
>
>   (You stated:
>    fk<destinationEntityName><destinationAttributeName><count> but since
>    the destAttr. will always be the same for a given dest. entity (the
>    PK), I removed it here)
>
>>> This could be as handy as the feature you submitted in your proposal:
>>> automatic definition of parent properties which are not overridden in
>>> sub-entities.
>>
>> Yes. Auto generation of foreign keys from a relationship definition does
>> make sense in fact... but the possibility to declare them explicitly (with
>> the desired values for their attributes) should always be there. Since
>> such explicit declaration can be very simple, I prefer to leave it as
>> must be explicit.
>
> My opinion here is:
>
>   - make it possible to automate the simple case (PK and FK), as
>     explained above,
>
>   - always keep the possibility to explicitly write every single
>     property in models, entities, etc.

Here i think the second point is more important than the first, and will
not limit unnecessarily what the framework can handle. The first point
is nice, maybe, but since these would be so simple to declare, there is
not so much to gain.
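
For reference, the automatic naming of the simple case discussed above (fk<destinationEntityName>, with a count when several relations point at the same entity) could be sketched as follows. The function fk_name, and the choice of starting the count at 2, are assumptions made for this example only.

```python
def fk_name(dest_entity, existing_names):
    """Return a foreign key name not yet present in `existing_names`."""
    base = 'fk' + dest_entity
    if base not in existing_names:
        return base
    # Several relations to the same entity: append a running count.
    count = 2
    while base + str(count) in existing_names:
        count += 1
    return base + str(count)


print(fk_name('Store', set()))            # fkStore
print(fk_name('A', {'fkA'}))              # fkA2
print(fk_name('A', {'fkA', 'fkA2'}))      # fkA3
```

Names like fkA2 say nothing about the semantics of each relation, which is one more reason to keep explicit declaration available.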

> You might think that I'm definitely reluctant to impose explicit
> declarations (such as for the FK). I think I am but it's not a
> religion ;)

Well, i can understand that in an ideal world the "db details" are
all handled automatically, and indeed you cannot get to them. But unless
full automation can be guaranteed, then going this route may cause
more problems than it solves. And, will not handle existing databases.
"Optional automation" is OK, but i feel one should always be able to
override it.

On the same line, i also feel that it should optionally be possible for the
client code to set an object id manually. 90% (or more) of the time you just
want it automatic, but sometimes you may want to control it (for db
optimization or whatever other reasons).

> In fact, it's really a matter of usual practices in E-R modeling. This
> is not only /my/ practice, but the way most people seem to deal with
> entity-relationship modeling --they do not care about the db-schema
> PK/FK details unless they need to, for example when defining a model for
> an existing model, or when some complex design is required (just like
> the one I described above).
>
>   Obviously this observation was not made on that framework (there are
> still not enough people using it for such an observation), but it is
> what I observed from the community using Apple's original Enterprise
> Objects Framework(tm).
>
> I'm possibly wrong and maybe I just think that people do not care
> because most of the times *I* usually don't care. In the ZModeler I
> added the ability to make the two relationships necessary for a
> one-to-many association at a single-click reach for that very reason:
> it's a valuable feature in my own eyes, and I think it may be for
> others.

I agree that *most* of the time you want all these things to be managed
automatically. But, again, as long as full automation cannot be guaranteed,
one should always allow the possibility to do whatever the db allows you
to, as long as one takes responsibility for it. This is also similar to
python's way of doing things (that of not "forcing" variables to be private,
for example) and has turned out to be one of the strong points of the
language (that whatever unforeseen situation may arise, you are not locked
in).

>   Last comment on this: I do not really like the idea of imposing the FK
>   to be declared while having an implicit default for the PK, it sounds
>   inconsistent to me.

Yes, here i agree. I am fine with not requiring that FKs are declared
explicitly, and with the framework handling their definition automatically.
But the framework should also recognize if the FKs are declared and
create them accordingly.

...

>> Oh, and some other changes I have not mentioned here are listed in
>> the top of PyModel_3_mr.py, but mostly name changes, and having
>> entities have only one list, that mixes attributes and relations...
>> (your opinion?).
>
> * deleteRule -> delete: ok
>
> * multiplicity lower/upper bound -> multiplicity as a python list: ok
>
> * entity only has attributes that may also be relations: I'm ok for the
>   principle, however i would prefer a distinct term for this, relations
>   being attributes could be a source of confusion. Maybe 'properties' ?

OK for me.

>   More comment on this: having attributes and relations mixed in a
>     common list will require extra-checks when analysing the model to
>     perform the possibly needed automated steps. This is because we will
>     need to have all the entities and their attributes loaded and
>     processed before we can make a decision on whether a declared
>     relation needs automatic generation of FK. We will need to separate
>     attributes from relations anyway.

When processing atts and rels, the logic would be to assign them to 2 different
arrays, and work with those arrays.
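
Something along these lines, as a minimal sketch. The Attribute and Relation classes here are empty placeholders for this example, not the PyModel ones, and split_properties is a hypothetical helper name.

```python
class Attribute:
    def __init__(self, name):
        self.name = name


class Relation:
    def __init__(self, name, destination):
        self.name = name
        self.destination = destination


def split_properties(properties):
    """Split a mixed 'properties' list into (attributes, relations)."""
    attributes = [p for p in properties if isinstance(p, Attribute)]
    relations = [p for p in properties if isinstance(p, Relation)]
    return attributes, relations


# Declaration order is free; the split does not depend on grouping by type.
props = [Attribute('lastName'), Relation('toStore', 'Store'),
         Attribute('firstName')]
atts, rels = split_properties(props)
print([a.name for a in atts])   # ['lastName', 'firstName']
print([r.name for r in rels])   # ['toStore']
```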

>   I do not mean this is complex: it's not. However I'm not sure this
>   enhances readability. In the sample model you clearly first declared
>   attributes, then relationships, and I bet everyone will do it that
>   way, because it would be a real mess to have some attrs, then rels,
>   then attrs, etc.

Well, some may prefer to group in some other logical way than by type...
But, my main reason for this is that having to deal with extra list brackets
inside the Entity constructor may be avoided, making the code a little more
readable.

> * misc.: at some point in the discussion the FK were made class
>          properties in the py-model, but they should not!

Ooops, mistake.

>
> Regarding PyModel_3_mr.py:
>
>   - I'm not sure we want a default for entity.adaptorName

You mean model? OK, no problem.

>   - I'd add a APrimaryKey subclass of Attribute

OK.

The updated sample (4) is below, but you can also get to it, along with the
new PyModel version (4), from http://ruggier.dyndns.org:8080/PyModel/

Cheers, mario

'''
A sample Pythonic OO-RDB Model
(re-expressing testPackages/StoreEmployees/model_StoreEmployees.xml) --
A PyModel must define a global variable called 'model', that is of type Model
'''

from PyModel import *

##
# Set preferred defaults for this model (when different from
# standard defaults, or if we want to make things explicit)

AInteger.defaults['precision'] = 10
AString.defaults['width'] = 20

RToOne.defaults['delete'] = 'cascade'
RToOne.defaults['multiplicity'] = [0,1]
RToOne.defaults['sourceAttribute'] = RToOne.attNameFromRel  # 'fk'+destEnt+destAtt+count
RToOne.defaults['destinationAttribute'] = 'id'

RToMany.defaults['delete'] = 'deny'
RToMany.defaults['multiplicity'] = [0,None]
RToMany.defaults['sourceAttribute'] = 'id'
RToMany.defaults['destinationAttribute'] = RToMany.attNameFromRel  # 'fk'+destEnt+sourceAtt+count

# Note that Relation.attNameFromRel is a callable that calculates the att name
# from the indicated pieces (where count distinguishes between multiple
# relations between the same source and target)

Entity.defaults['properties'] = [
     APrimaryKey('id', isClassProperty=0, isRequired=1)
]

##

_connDict = {}
model = Model('StoreEmployees',connDict=_connDict)
model.entities = [

     #
     Entity('Store',
         properties=[
             AString('corporateName', isRequired=1),
             RToMany('employees', 'Employee')
         ]
     ),

     # Employee and its subclasses SalesClerk and Executive
     Entity('Employee',
         properties=[
             AString('lastName', isRequired=1, usedForLocking=1),
             AString('firstName', isRequired=1, width=50, usedForLocking=1),
             AForeignKey(Attribute.fk('Store'), isClassProperty=0),
             RToMany('toAddresses', 'Address', delete='cascade'),
             RToOne('toStore', 'Store')
         ]
     ),

     Entity('SalesClerk', parent='Employee',
         properties=[
             AString('storeArea')
         ]
     ),

     Entity('Executive', parent='Employee',
         properties=[
             AString('officeLocation', width=5),
             RToMany('marks', 'Mark', delete='cascade')
         ]
     ),

     #
     Entity('Address',
         properties=[
             AString('street', width=80),
             AString('zipCode'),
             AString('town', width=80),
             AForeignKey(Attribute.fk('Employee'), isClassProperty=0),
             RToMany('toEmployee', 'Employee', delete='deny')
         ]
     ),

     #
     Entity('Mark',
         properties=[
             AInteger('month', isRequired=1),
             AInteger('mark', isRequired=1),
             AForeignKey(Attribute.fk('Executive'), isClassProperty=0),
             RToOne('executive', 'Executive')
         ]
     )

]

if __name__ == '__main__':
     print(model.validate())
     print(model.toXML())
     # plus whatever ...

##