From: Sidnei da S. <si...@aw...> - 2004-01-20 16:39:34
On Tue, Jan 20, 2004 at 03:37:50AM +0000, Ian Bicking wrote:
<snip>

| from SQLObject import EditingContext
| ec = EditingContext()
| # every editing context automatically picks up all the SQLObject
| # classes, all magic like.

I would like to see two things here:

+ Being able to pass a list of class names to be picked up.
+ Being able to pass a class and pick up the class plus its
  dependencies (e.g., ForeignKeys).

There's not much reason to pick up all classes if you are only going
to use one or two.

| person = ec.Person.get(1)  # by ID
| ec2 = EditingContext()  # separate transaction
| person2 = ec2.Person.get(1)
| assert person is not person2
| assert person.id == person2.id
| assert person.fname == 'Guy'
| person.fname = 'Gerald'
| assert person2.fname == 'Guy'
| ec.commit()  # SQL is not sent to server

When is the SQL sent to the server, then?

| assert person2.fname == 'Guy'  # Doesn't see changes
| person2.fname = 'Norm'
| # raises exception if locking is turned on; overwrites if locking
| # is not turned on. (Locking enabled on a per-class level)
|
| I'm not at all sure about that example.  Mostly the confusing parts
| relate to locking and when the database lookup occurs (and how late a
| conflict exception may be raised).

The example is fine; I'm also curious about how locking would be
handled.

| Somewhere in here, process-level transactions might fit in.  That is,
| even on a backend that doesn't support transactions, we can still
| delay SQL statements until a commit/rollback is performed.  In turn,
| we can create temporary "memory" objects, which is any object which
| hasn't been committed to the database in any way.  To do this we'll
| need sequences -- to preallocate IDs -- which MySQL and SQLite don't
| really provide :(
|
| Nested transactions...?  Maybe they'd fall out of this fairly easily,
| especially if we define a global context, with global caches etc.,
| then further levels of context will come for free.
|
| We still need to think about an auto-commit mode.
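The process-level transaction idea above (delaying SQL until a
commit/rollback) could be sketched roughly like this -- names and API
are purely illustrative, not real SQLObject:

```python
class EditingContext:
    """Sketch of process-level transactions: SQL statements are
    buffered in the context and only sent to the backend on commit,
    even if the backend itself has no transaction support.
    (Hypothetical API, for illustration only.)"""

    def __init__(self, backend):
        self.backend = backend      # anything with an execute() method
        self._pending = []          # delayed SQL statements

    def execute(self, sql):
        self._pending.append(sql)   # nothing hits the server yet

    def commit(self):
        for sql in self._pending:   # only now is SQL sent
            self.backend.execute(sql)
        self._pending = []

    def rollback(self):
        self._pending = []          # discard; the server never saw them
```

A backend without native transactions still gets all-or-nothing
behaviour up to the point where sending begins -- which is exactly the
limitation Ian notes about preallocating IDs.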
| Maybe the global context would be auto-commit.

As long as there's a way to disable auto-commit if you don't want it,
it seems OK.  Nested transactions would be particularly useful for me.

| Caching
| -------
|
| Really doing transactions right means making caching significantly
| more complex.  If the cache is purely transaction-specific, then we'll
| really be limiting the effectiveness of the cache.  With that in mind,
| a copy-on-write style of object is really called for -- when you fetch
| an object in a transaction, you can use the globally cached instance
| until you write to the object.
|
| Really this isn't copy-on-write, it's more like a proxy object.  Until
| the object is changed, it can delegate all its columns to its global
| object for which it is a proxy.  Of course, traversal via foreign keys
| or joins must also return proxied objects.  As the object is changed
| -- perhaps on a column-by-column basis, or as a whole on the first
| change -- the object takes on the personality of a full SQLObject
| instance.

I like the proxy idea a lot.  +1 on it.

| When the transaction is committed, this transactional object copies
| itself to the global object, and becomes a full proxy.  These
| transactional caches themselves should be pooled -- so that when
| another transaction comes along you have a potentially useful set of
| proxy objects already created for you.  This is a common use case for
| web applications, which have lots of short transactions, which are
| often very repetitive.
|
| In addition to this, there should be more cache control.  This means
| explicit ways to control things like:
|
| 1. Caching of instances:
|    + Application/process-global definition.
|    + Database-level definition.
|    + Transaction/EditingContext-level definition.
|    + Class-level definition.
| 2. Caching of columns:
|    + Class-level.
| 3. Cache sweep frequency:
|    + Application/process-global.
|    + Database-level.
|    + Class-level.
|    + Doesn't need to be as complete as 1; maybe on the class level
|      you could only indicate that a certain class should not be
|      swept.
|    + Sweep during a fetch (e.g., every 100 fetches), by time or fetch
|      frequency, or sweep with an explicit call (e.g., to do sweeps in
|      a separate thread).
| 4. Cache sweep policy:
|    + Maximum age.
|    + Least-recently-used (actually, least-recently-fetched).
|    + Random (the current policy).
|    + Multi-level (randomly move objects to a lower-priority cache,
|      raise level when the object is fetched again).
|    + Target cache size (keep trimming until the cache is small
|      enough).
|    + Simple policy (if enough objects qualify, cache can be of any
|      size).
|    + Percentage culling (e.g., kill 33% of objects for each sweep;
|      this is the current policy).

There's one cache algorithm that was tried on ZODB and that I think
gave one of the best hit rates.  If I recall correctly it was called
'Thor' or something similar.  It takes the object size and number of
hits into account to keep the memory footprint to a minimum while
still doing a good job of caching both big and small objects.

| 5. Batching of updates (whether updates should immediately go to the
|    database, or whether they would be batched until a commit or other
|    signal).
| 6. Natural expiring of objects.  Even if an object must persist
|    because there are still references, we could expire it so that
|    future accesses re-query the database.  To avoid stale data.

Maybe by replacing the object in the cache with a proxy? :)

| Expose some methods of the cache, like getting all objects currently
| in memory.  These would probably be exposed on a class level, e.g.,
| all the Addresses currently in memory via
| ``Address.cache.current()`` or something.  What about when there's a
| cached instance in the parent context, but not in the present
| transaction?

I think the cached instance in the parent context should show up
together with objects cached in the transaction.
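The copy-on-write proxy described under Caching could look roughly
like this minimal sketch (class and attribute names are illustrative,
not SQLObject code): reads fall through to the shared global instance
until the first write, after which changed columns live locally;
commit copies them back and the object becomes a pure proxy again.

```python
class RowProxy:
    """Transactional proxy around a globally cached row object.
    Column reads delegate to the global instance until a column is
    written; written values are then kept locally (copy-on-write).
    Illustrative sketch only."""

    def __init__(self, global_obj):
        # bypass our own __setattr__ while setting up bookkeeping
        object.__setattr__(self, '_global', global_obj)
        object.__setattr__(self, '_local', {})

    def __getattr__(self, name):
        # only called when normal lookup fails, i.e. for column names
        local = object.__getattribute__(self, '_local')
        if name in local:
            return local[name]
        return getattr(object.__getattribute__(self, '_global'), name)

    def __setattr__(self, name, value):
        # first write diverges from the global instance
        self._local[name] = value

    def commit(self):
        # copy changes back to the global object; become a pure proxy
        glob = object.__getattribute__(self, '_global')
        for name, value in self._local.items():
            setattr(glob, name, value)
        self._local.clear()
```

This also fits the expiry point above: an expired cache entry could be
replaced by exactly this kind of proxy, so the next access re-fetches.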
| Columns as Descriptors
| ----------------------
|
| Each column will become a descriptor.  That is, ``Col`` and subclasses
| will return an object with ``__get__`` and ``__set__`` methods.  The
| metaclass will not itself generate methods.
|
| A metaclass will still be used so that the descriptor can be tied to
| its name, e.g., so that with ``fname = StringCol()``, the resultant
| descriptor will know that it is bound to ``fname``.
|
| By using descriptors, introspection should become a bit easier -- or
| at least more uniform with respect to other new-style classes.
| Various class-wide indexes of columns will still be necessary, but
| these should be able to remain mostly private.
|
| To customize getters or setters (which you currently do by defining a
| ``_get_columnName`` or ``_set_columnName`` method), you will pass
| arguments to the ``Col`` object, like::
|
|     def _get_name(self, dbGetter):
|         return dbGetter().strip()
|
|     name = StringCol(getter=_get_name)
|
| This gets rid of ``_SO_get_columnName`` as well.  We can
| transitionally add something to the metaclass to signal an error if a
| spurious ``_get_columnName`` method is sitting around.

Yay!  You don't know how much I've been missing this one. :)

| Construction and Fetching
| -------------------------
|
| Currently you fetch an object with class instantiation, e.g.,
| ``Address(1)``.  This may or may not create a new instance, and does
| not create a table row.  If you want to create a table row, you do
| something like ``Address.new(city='New York', ...)``.  This is
| somewhat in contrast to normal Python, where class instantiation
| (calling a class) will create a new object, while objects are fetched
| otherwise (with no particular standard interface).
|
| To make SQLObject classes more normal in this case, ``new`` will
| become ``__init__`` (more or less), and classes will have a ``get``
| method that gets an already-existent row.  E.g., ``Address.get(1)``
| vs. ``Address(city='New York', ...)``.
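The descriptor-plus-metaclass arrangement from the "Columns as
Descriptors" section above can be sketched like this (a toy version
with illustrative names; it stores values on the instance instead of
going to a database):

```python
class Col:
    """Sketch of a column descriptor; the metaclass below ties each
    descriptor to its attribute name.  Illustration only."""

    def __init__(self, getter=None):
        self.name = None            # filled in by the metaclass
        self.getter = getter        # optional custom getter

    def __get__(self, obj, cls=None):
        if obj is None:
            return self             # class-level access: introspection
        raw = obj.__dict__.get(self.name)
        if self.getter is not None:
            # custom getter receives a callable returning the raw value,
            # standing in for the dbGetter of the proposal
            return self.getter(obj, lambda: raw)
        return raw

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class StringCol(Col):
    pass


class DeclarativeMeta(type):
    """Binds each Col descriptor to the name it was assigned to."""

    def __new__(meta, name, bases, ns):
        for attr, value in ns.items():
            if isinstance(value, Col):
                value.name = attr   # e.g. fname = StringCol() knows 'fname'
        return type.__new__(meta, name, bases, ns)
```

Because the descriptor objects survive on the class, introspection
works the uniform new-style way: ``Person.fname`` returns the bound
``Col`` rather than requiring a generated ``_SO_get_fname`` method.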
| This is perhaps the most significant change in SQLObject usage.
| Because of the different signatures, if you forget to make a change
| someplace you will get an immediate exception, so updating code
| should not be too hard.

+1 all the way.  Because of this, I had to special-case object
creation for SQLObject in my zope3-based app.

/me sees lots of crufty code going away and feels happy

| Extra Table Information
| -----------------------
|
| People have increasingly used SQLObject to create tables, and while it
| can make a significant number of schemas, there are several extensions
| of table generation that people occasionally want.  Since these occur
| later in development, it would be convenient if SQLObject could grow
| as the complexity of the programs using it grows.  Some of these
| extensions are:
|
| * Table name (``_table``).
| * Table type for MySQL (e.g., MyISAM vs. InnoDB).
| * Multi-column unique constraints.  (Other constraints?)
| * Indexes.  (Function or multi-column indexes?)
| * Primary key type.  (Primary key generation?)
| * Primary key sequence names (for Postgres, Firebird, Oracle, etc.).
| * Multi-column primary keys.
| * Naming scheme.
| * Permissions.
| * Locking (e.g., optimistic locking).
| * Inheritance (see Daniel Savard's recent patch).
| * Anything else?

* Enforcing constraints in Python.

Brad B. was chatting with me on IRC yesterday and we came to agree on
an API.  He's writing a proposal (with a patch) and is going to
present it soon.  Basically, when you create a column you would
provide a callable object as a 'constraint' keyword parameter.  This
constraint would then be used to enforce some restrictions.
    def foo_constraint(obj, name, value, values=None):
        # name is the column name
        # value is the value to be set for this column
        # values is a dict of the values to be set for other columns,
        # in the case you are creating an object or modifying more
        # than one column at a time
        # returns True or False
        return True  # placeholder; a real constraint checks the value

    age = IntCol(constraint=foo_constraint)

    class Col:
        def __init__(self, name, constraint):
            self.name = name
            self.constraint = constraint

        def __set__(self, obj, value):
            if not self.constraint(obj, self.name, value):
                raise ValueError(value)
            # Set the value otherwise

| Some of these may be globally defined, or defined for an entire
| database.  For example, typically you'll want to use a common MySQL
| table type for your entire database, even though it's defined on a
| per-table basis.  And while MySQL allows global permission
| declarations, Postgres does not and requires tedious repetitions of
| the permissions for each table -- so while it's applied on a
| per-table basis, it's likely that (at least to some degree) a
| per-database declaration is called for.  Naming schemes are also
| usually database-wide.
|
| As these accumulate -- and by partitioning this list differently, the
| list could be even longer -- it's messy to do these all as special
| class variables (``_idName``, etc).  It also makes the class logic and
| its database implementation details difficult to distinguish.  Some
| of these can be handled elegantly like ``id = StringCol()`` or ``id
| = ("fname", "lname")``.  But the others perhaps should be put into a
| single instance variable, perhaps itself a class::
|
|     class Address(SQLObject):
|         class SQLMeta:
|             mysqlType = 'InnoDB'
|             naming = Underscore
|             permission = {'bob': ['select', 'insert'],
|                           'joe': ['select', 'insert', 'update'],
|                           'public': ['select']}
|         street = StringCol()
|         ....
|
| The metadata is found by its name (``SQLMeta``), and is simply a
| container.  The class syntax is easier to write and read than a
| dictionary-like syntax.
| Or, it could be a proper class/instance and provide a partitioned
| way to handle introspection.  E.g.,
| ``Address.SQLMeta.permission.get('bob')`` or
| ``Address.SQLMeta.columns``.  In this case values that weren't
| overridden would be calculated from defaults (like the default naming
| scheme and so on).

+1 on it being an instance and providing introspection.  -1 on it
being a class.

| I'm not at all certain about how this should look, or if there are
| other things that should go into the class-meta-data object.

I can't think of anything missing for now.

| Joins, Foreign Keys
| -------------------
|
| First, the poorly-named ``MultipleJoin`` and ``RelatedJoin`` (which
| are rather ambiguous) will be renamed ``ManyToOneJoin`` and
| ``ManyToManyJoin``.  ``OneToOneJoin`` will also be added, while
| ``ForeignKey`` remains the related column type.  (Many2Many?
| Many2many?  many2many?)
|
| ForeignKey will be driven by a special validator/converter.  (But will
| this make ID access more difficult?)
|
| Joins will return smart objects which can be iterated across.  These
| smart objects will be related to ``SelectResults``, and allow the
| same features like ordering.  In both cases, an option to retrieve
| IDs instead of objects will be allowed.
|
| These smarter objects will allow, in the case of ManyToManyJoin,
| ``Set``-like operations to relate (or unrelate) objects.  For
| ManyToOneJoin the list/set operations are not really appropriate,
| because they would reassign the relation, not just add or remove
| relations.
|
| It would be nice to make the Join protocol more explicit and public,
| so other kinds of joins (e.g., three-way) could be more accessible.

Sounds pretty good.  I haven't used Joins much, only ForeignKeys.

-- 
Sidnei da Silva <si...@aw...>
http://awkly.org - dreamcatching :: making your dreams come true
http://plone.org/about/team#dreamcatcher

Machines that have broken down will work perfectly when the repairman
arrives.