|
From: Sidnei da S. <si...@aw...> - 2004-01-20 16:39:34
|
On Tue, Jan 20, 2004 at 03:37:50AM +0000, Ian Bicking wrote:
<snip>
| from SQLObject import EditingContext
| ec = EditingContext()
| # every editing context automatically picks up all the SQLObject
| # classes, all magic like.
I would like to see two things here:
+ Being able to pass a list of class names to be picked up.
+ Being able to pass a class and have it picked up along with its
  dependencies (e.g., ForeignKeys).
There's not much reason to pick up all the classes if you are only
going to use one or two.
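For illustration, here's a hypothetical sketch of what such a
constructor could look like. EditingContext is from the proposal, but
this signature and the `_dependencies` introspection hook are invented:

```python
# Hypothetical sketch only -- this constructor signature is invented
# to illustrate the request, not existing SQLObject API.
class EditingContext:
    def __init__(self, classes=None, follow_dependencies=False):
        # classes: optional list of SQLObject classes to pick up;
        # None keeps the proposed behavior of picking up everything.
        self.registry = {}
        for cls in classes or []:
            self._register(cls, follow_dependencies)

    def _register(self, cls, follow_dependencies):
        self.registry[cls.__name__] = cls
        if not follow_dependencies:
            return
        # pull in classes referenced via ForeignKey-style columns;
        # '_dependencies' is a made-up stand-in for that introspection
        for dep in getattr(cls, '_dependencies', ()):
            if dep.__name__ not in self.registry:
                self._register(dep, follow_dependencies)
```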
| person = ec.Person.get(1) # by ID
| ec2 = EditingContext() # separate transaction
| person2 = ec.Person.get(1)
| assert person is not person2
| assert person.id == person2.id
| assert person.fname == 'Guy'
| person.fname = 'Gerald'
| assert person2.fname == 'Guy'
| ec.commit() # SQL is not sent to server
When is the SQL sent to the server, then?
| assert person2.fname == 'Guy' # Doesn't see changes
| person2.fname = 'Norm'
| # raises exception if locking is turned on; overwrites if locking
| # is not turned on. (Locking enabled on a per-class level)
|
| I'm not at all sure about that example. Mostly the confusing parts
| relate to locking and when the database lookup occurs (and how late a
| conflict exception may be raised).
The example looks fine to me; I'm also curious about how locking would
be handled.
| Somewhere in here, process-level transactions might fit in. That is,
| even on a backend that doesn't support transactions, we can still
| delay SQL statements until a commit/rollback is performed. In turn,
| we can create temporary "memory" objects, which is any object which
| hasn't been committed to the database in any way. To do this we'll
| need sequences -- to preallocate IDs -- which MySQL and SQLite don't
| really provide :(
|
| Nested transactions...? Maybe they'd fall out of this fairly easily,
| especially if we define a global context, with global caches etc.,
| then further levels of context will come for free.
|
| We still need to think about an auto-commit mode. Maybe the global
| context would be auto-commit.
As long as there's a way to disable auto-commit if you don't want it,
it seems ok. Nested transactions would be particularly useful for me.
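As a sketch of how delayed statements and nesting could compose (all
names here are invented; this is not SQLObject code, just the shape of
the idea):

```python
# Toy model of process-level transactions: SQL is queued per context
# and only the root (global) context would actually talk to the
# backend. Nested contexts fold their queue into their parent.
class Context:
    def __init__(self, parent=None):
        self.parent = parent
        self.pending = []  # SQL statements delayed until commit

    def execute(self, sql):
        self.pending.append(sql)  # nothing sent to the backend yet

    def begin(self):
        # nested transaction: a child context with its own queue
        return Context(parent=self)

    def commit(self):
        if self.parent is not None:
            # fold our statements into the parent context
            self.parent.pending.extend(self.pending)
        else:
            for sql in self.pending:
                pass  # here the root context would hit the database
        self.pending = []

    def rollback(self):
        self.pending = []  # discard; the backend never saw anything
```

This is also where auto-commit fits: the root context could simply
flush on every `execute` instead of queueing.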
| Caching
| -------
|
| Really doing transactions right means making caching significantly
| more complex. If the cache is purely transaction-specific, then we'll
| really be limiting the effectiveness of the cache. With that in mind,
| a copy-on-write style of object is really called for -- when you fetch
| an object in a transaction, you can use the globally cached instance
| until you write to the object.
|
| Really this isn't copy-on-write, it's more like a proxy object. Until
| the object is changed, it can delegate all its columns to its global
| object for which it is a proxy. Of course, traversal via foreign keys
| or joins must also return proxied objects. As the object is changed
| -- perhaps on a column-by-column basis, or as a whole on the first
| change -- the object takes on the personality of a full SQLObject
| instance.
I like the proxy idea a lot. +1 on it.
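The proxy idea could look roughly like this -- a toy sketch of the
copy-on-write delegation described above, with invented names:

```python
# Attribute reads delegate to the shared global instance until the
# first write, which shadows the column in a transaction-local dict.
class RowProxy(object):
    def __init__(self, global_obj):
        self.__dict__['_global'] = global_obj
        self.__dict__['_local'] = {}  # transaction-local overrides

    def __getattr__(self, name):
        local = self.__dict__['_local']
        if name in local:
            return local[name]
        return getattr(self.__dict__['_global'], name)

    def __setattr__(self, name, value):
        # first write: take on a local personality, column by column
        self.__dict__['_local'][name] = value

    def commit(self):
        # push local changes back onto the shared global object,
        # after which this object is a plain proxy again
        for name, value in self.__dict__['_local'].items():
            setattr(self.__dict__['_global'], name, value)
        self.__dict__['_local'].clear()
```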
| When the transaction is committed, this transactional object copies
| itself to the global object, and becomes a full proxy. These
| transactional caches themselves should be pooled -- so that when
| another transaction comes along you have a potentially useful set of
| proxy objects already created for you. This is a common use case for
| web applications, which have lots of short transactions, which are
| often very repetitive.
|
| In addition to this, there should be more cache control. This means
| explicit ways to control things like:
|
| 1. Caching of instances:
| + Application/process-global definition.
| + Database-level definition.
| + Transaction/EditingContext-level definition.
| + Class-level definition.
| 2. Caching of columns:
| + Class-level.
| 3. Cache sweep frequency:
| + Application/process-global.
| + Database-level.
| + Class-level.
| + Doesn't need to be as complete as 1; maybe on the class level you
| could only indicate that a certain class should not be swept.
| + Sweep during a fetch (e.g., every 100 fetches), by time or fetch
| frequency, or sweep with an explicit call (e.g., to do sweeps in
| a separate thread).
| 4. Cache sweep policy:
| + Maximum age.
| + Least-recently-used (actually, least-recently-fetched).
| + Random (the current policy).
| + Multi-level (randomly move objects to a lower-priority cache,
| raise level when the object is fetched again).
| + Target cache size (keep trimming until the cache is small
| enough).
| + Simple policy (if enough objects qualify, cache can be of any
| size).
| + Percentage culling (e.g., kill 33% of objects for each sweep;
| this is the current policy).
There's one cache algorithm that was tried on ZODB and I think gave
one of the best hit rates. If I recall correctly, it was called 'Thor'
or something similar. It takes the object size and number of hits into
account, keeping the memory footprint to a minimum while still doing a
good job of caching both big and small objects.
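A sweep policy in that spirit could weigh hits against size; the
scoring formula below is invented for illustration, not taken from
Thor or ZODB:

```python
# Toy sweep: evict the entries with the fewest hits per byte until
# the cache fits the target size. The score is a made-up heuristic.
def sweep(cache, target_size):
    """cache maps key -> (obj, size_in_bytes, hit_count)."""
    total = sum(size for _, size, _ in cache.values())
    # cheapest-to-evict first: few hits, big footprint
    victims = sorted(cache, key=lambda k: cache[k][2] / float(cache[k][1]))
    for key in victims:
        if total <= target_size:
            break
        total -= cache[key][1]
        del cache[key]
    return total
```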
| 5. Batching of updates (whether updates should immediately go to the
| database, or whether it would be batched until a commit or other
| signal).
| 6. Natural expiring of objects. Even if an object must persist
| because there are still references, we could expire it so that
| future accesses re-query the database. To avoid stale data.
Maybe by replacing the object in the cache with a proxy? :)
| Expose some methods of the cache, like getting all objects currently
| in memory. These would probably be exposed on a class level, e.g.,
| all the Addresses currently in memory via
| ``Address.cache.current()`` or something. What about when there's a
| cached instance in the parent context, but not in the present
| transaction?
I think the cached instance in the parent context should show up
together with objects cached in the transaction.
| Columns as Descriptors
| ----------------------
|
| Each column will become a descriptor. That is, ``Col`` and subclasses
| will return an object with ``__get__`` and ``__set__`` methods. The
| metaclass will not itself generate methods.
|
| A metaclass will still be used so that the descriptor can be tied to
| its name, e.g., that with ``fname = StringCol()``, the resultant
| descriptor will know that it is bound to ``fname``.
|
| By using descriptors, introspection should become a bit easier -- or
| at least more uniform with respect to other new-style classes.
| Various class-wide indexes of columns will still be necessary, but
| these should be able to remain mostly private.
|
| To customize getters or setters (which you currently do by defining a
| ``_get_columnName`` or ``_set_columnName`` method), you will pass
| arguments to the ``Col`` object, like::
|
| def _get_name(self, dbGetter):
| return dbGetter().strip()
|
| name = StringCol(getter=_get_name)
|
| This gets rid of ``_SO_get_columnName`` as well. We can
| transitionally add something to the metaclass to signal an error if a
| spurious ``_get_columnName`` method is sitting around.
Yay! You don't know how much I've been missing this one. :)
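A rough sketch of how Col-as-descriptor plus a name-binding metaclass
could fit together. The `getter` keyword and `dbGetter` callback mirror
the proposal; the storage details and class names are invented:

```python
# Sketch only: each Col is a descriptor, and the metaclass ties the
# descriptor to the attribute name it was assigned to.
class Col(object):
    def __init__(self, getter=None):
        self.name = None        # filled in by the metaclass
        self.getter = getter

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # stand-in for the real database fetch
        dbGetter = lambda: obj.__dict__['_' + self.name]
        if self.getter is not None:
            return self.getter(obj, dbGetter)
        return dbGetter()

    def __set__(self, obj, value):
        obj.__dict__['_' + self.name] = value

class DeclarativeMeta(type):
    def __init__(cls, name, bases, ns):
        super().__init__(name, bases, ns)
        for attr, value in ns.items():
            if isinstance(value, Col):
                value.name = attr   # bind the descriptor to its name

def _get_name(self, dbGetter):
    return dbGetter().strip()

class Person(metaclass=DeclarativeMeta):
    name = Col(getter=_get_name)
```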
| Construction and Fetching
| -------------------------
|
| Currently you fetch an object with class instantiation, e.g.,
| ``Address(1)``. This may or may not create a new instance, and does
| not create a table row. If you want to create a table row, you do
| something like ``Address.new(city='New York', ...)``. This is
| somewhat in contrast to normal Python, where class instantiation
| (calling a class) will create a new object, while objects are fetched
| otherwise (with no particular standard interface).
|
| To make SQLObject classes more normal in this case, ``new`` will
| become ``__init__`` (more or less), and classes will have a ``get``
| method that gets an already-existent row. E.g., ``Address.get(1)``
| vs. ``Address(city='New York', ...)``. This is perhaps the most
| significant change in SQLObject usage. Because of the different
| signatures, if you forget to make a change someplace you will get an
| immediate exception, so updating code should not be too hard.
+1 all the way. Because of this, I had to special-case object creation
for SQLObject in my zope3-based app.
/me sees lots of crufty code going away and feels happy
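The proposed split can be mocked up with an in-memory table standing
in for the database (everything here is a stand-in, not real
SQLObject internals):

```python
# __init__ creates a new row, like normal Python instantiation;
# get() fetches an already-existent one. _rows fakes the table.
class Address:
    _rows = {}       # stand-in for the database table
    _next_id = 1

    def __init__(self, **columns):
        # class instantiation now *creates* a row
        self.id = Address._next_id
        Address._next_id += 1
        self.__dict__.update(columns)
        Address._rows[self.id] = self

    @classmethod
    def get(cls, id):
        # fetching an existing row is now spelled explicitly
        return cls._rows[id]
```

Because the two call signatures differ, code that still uses the old
`Address(1)` style fails immediately rather than silently.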
| Extra Table Information
| -----------------------
|
| People have increasingly used SQLObject to create tables, and while it
| can make a significant number of schemas, there are several extensions
| of table generation that people occasionally want. Since these occur
| later in development, it would be convenient if SQLObject could grow
| as the complexity of the programs using it grow. Some of these
| extensions are:
|
| * Table name (``_table``).
| * Table type for MySQL (e.g., MyISAM vs. InnoDB).
| * Multi-column unique constraints. (Other constraints?)
| * Indexes. (Function or multi-column indexes?)
| * Primary key type. (Primary key generation?)
| * Primary key sequence names (for Postgres, Firebird, Oracle, etc).
| * Multi-column primary keys.
| * Naming scheme.
| * Permissions.
| * Locking (e.g., optimistic locking).
| * Inheritance (see Daniel Savard's recent patch).
| * Anything else?
* Enforcing constraints in Python. Brad B. was chatting with me on IRC
  yesterday and we came to agree on an API. He's writing a proposal
  (with a patch) and is going to present it soon. Basically, when you
  create a column you would provide a callable object as a
  'constraint' keyword parameter. This constraint would then be used
  to enforce restrictions on the values being set:

      def foo_constraint(obj, name, value, values=None):
          # name is the column name
          # value is the value to be set for this column
          # values is a dict of the values to be set for other
          # columns, in case you are creating an object or modifying
          # more than one column at a time
          # returns True or False
          ...

      age = IntCol(constraint=foo_constraint)

      class Col:
          def __init__(self, name, constraint):
              self.name = name
              self.constraint = constraint

          def __set__(self, obj, value):
              if not self.constraint(obj, self.name, value):
                  raise ValueError(value)
              # Set the value otherwise
| Some of these may be globally defined, or defined for an entire
| database. For example, typically you'll want to use a common MySQL
| table type for your entire database, even though it's defined on a
| per-table basis. And while MySQL allows global permission
| declarations, Postgres does not and requires tedious repetitions of
| the permissions for each table -- so while it's applied on a per-table
| basis, it's likely that (at least to some degree) a per-database
| declaration is called for. Naming schemes are also usually
| database-wide.
|
| As these accumulate -- and by partitioning this list differently, the
| list could be even longer -- it's messy to do these all as special
| class variables (``_idName``, etc). It also makes the class logic and
| its database implementation details difficult to distinguish. Some
| of these can be handled elegantly like ``id = StringCol()`` or ``id
| = ("fname", "lname")``. But the others perhaps should be put into a
| single instance variable, perhaps itself a class::
|
| class Address(SQLObject):
| class SQLMeta:
| mysqlType = 'InnoDB'
| naming = Underscore
| permission = {'bob': ['select', 'insert'],
| 'joe': ['select', 'insert', 'update'],
| 'public': ['select']}
| street = StringCol()
| ....
|
| The metadata is found by its name (``SQLMeta``), and is simply a
| container. The class syntax is easier to write and read than a
| dictionary-like syntax. Or, it could be a proper class/instance and
| provide a partitioned way to handle introspection. E.g.,
| ``Address.SQLMeta.permission.get('bob')`` or
| ``Address.SQLMeta.columns``. In this case values that weren't
| overridden would be calculated from defaults (like the default naming
| scheme and so on).
+1 on it being an instance and providing introspection. -1 on it being
a class.
| I'm not at all certain about how this should look, or if there are
| other things that should go into the class-meta-data object.
I can't think of anything missing right now.
| Joins, Foreign Keys
| -------------------
|
| First, the poorly-named ``MultipleJoin`` and ``RelatedJoin`` (which
| are rather ambiguous) will be renamed ``ManyToOneJoin`` and
| ``ManyToManyJoin``. ``OneToOneJoin`` will also be added, while
| ``ForeignKey`` remains the related column type. (Many2Many?
| Many2many? many2many?)
|
| ForeignKey will be driven by a special validator/converter. (But will
| this make ID access more difficult?)
|
| Joins will return smart objects which can be iterated across. These
| smart objects will be related to ``SelectResults``, and allow the
| same features like ordering. In both cases, an option to retrieve
| IDs instead of objects will be allowed.
|
| These smarter objects will allow, in the case of ManyToManyJoin,
| ``Set`` like operations to relate (or unrelate) objects. For
| ManyToOneJoin the list/set operations are not really appropriate,
| because they would reassign the relation, not just add or remove
| relations.
|
| It would be nice to make the Join protocol more explicit and public,
| so other kinds of joins (e.g., three-way) could be more accessible.
Sounds pretty good. I haven't used Joins much myself, only
ForeignKeys, though.
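A toy sketch of what a set-like ManyToManyJoin result might behave
like -- the class name, the method names, and the linking-table
representation are all invented:

```python
# Iterating yields the related ids; add/discard relate and unrelate
# by touching only the intermediate (linking) table, leaving the
# joined objects themselves alone.
class JoinResult:
    def __init__(self, pairs, owner_id):
        self.pairs = pairs          # stand-in for the linking table
        self.owner_id = owner_id

    def __iter__(self):
        return (b for a, b in self.pairs if a == self.owner_id)

    def add(self, other_id):
        # relate: insert a row into the intermediate table
        self.pairs.add((self.owner_id, other_id))

    def discard(self, other_id):
        # unrelate: delete the linking row only
        self.pairs.discard((self.owner_id, other_id))
```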
--
Sidnei da Silva <si...@aw...>
http://awkly.org - dreamcatching :: making your dreams come true
http://plone.org/about/team#dreamcatcher
Machines that have broken down will work perfectly when the repairman arrives.