From: Sidnei da S. <si...@aw...> - 2004-01-20 16:39:34
On Tue, Jan 20, 2004 at 03:37:50AM +0000, Ian Bicking wrote:
<snip>

| from SQLObject import EditingContext
| ec = EditingContext()
| # every editing context automatically picks up all the SQLObject
| # classes, all magic like.

I would like to see two things here:

+ Being able to pass a list of class names to be picked up.
+ Being able to pass a class and pick up the class plus its
  dependencies (e.g., ForeignKeys).

There's not much reason to pick up all classes if you are only going
to use one or two.

| person = ec.Person.get(1)  # by ID
| ec2 = EditingContext()  # separate transaction
| person2 = ec2.Person.get(1)
| assert person is not person2
| assert person.id == person2.id
| assert person.fname == 'Guy'
| person.fname = 'Gerald'
| assert person2.fname == 'Guy'
| ec.commit()  # SQL is not sent to server

When is the SQL sent to the server, then?

| assert person2.fname == 'Guy'  # Doesn't see changes
| person2.fname = 'Norm'
| # raises exception if locking is turned on; overwrites if locking
| # is not turned on. (Locking enabled on a per-class level)
|
| I'm not at all sure about that example.  Mostly the confusing parts
| relate to locking and when the database lookup occurs (and how late a
| conflict exception may be raised).

The example is fine; I'm also curious about how locking would be
handled.

| Somewhere in here, process-level transactions might fit in.  That is,
| even on a backend that doesn't support transactions, we can still
| delay SQL statements until a commit/rollback is performed.  In turn,
| we can create temporary "memory" objects, which is any object which
| hasn't been committed to the database in any way.  To do this we'll
| need sequences -- to preallocate IDs -- which MySQL and SQLite don't
| really provide :(
|
| Nested transactions...?  Maybe they'd fall out of this fairly easily,
| especially if we define a global context, with global caches etc.,
| then further levels of context will come for free.
|
| We still need to think about an auto-commit mode.
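The process-level transaction idea above (delaying SQL until a
commit/rollback) could be sketched roughly like this -- names and API
are purely illustrative, not real SQLObject:

```python
class EditingContext:
    """Sketch of process-level transactions: SQL statements are
    buffered in the context and only sent to the backend on commit,
    even if the backend itself has no transaction support.
    (Hypothetical API, for illustration only.)"""

    def __init__(self, backend):
        self.backend = backend      # anything with an execute() method
        self._pending = []          # delayed SQL statements

    def execute(self, sql):
        self._pending.append(sql)   # nothing hits the server yet

    def commit(self):
        for sql in self._pending:   # only now is SQL sent
            self.backend.execute(sql)
        self._pending = []

    def rollback(self):
        self._pending = []          # discard; the server never saw them
```

A backend without native transactions still gets all-or-nothing
behaviour up to the point where sending begins -- which is exactly the
limitation Ian notes about preallocating IDs.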
| Maybe the global context would be auto-commit.

As long as there's a way to disable auto-commit if you don't want it,
it seems OK.  Nested transactions would be particularly useful for me.

| Caching
| -------
|
| Really doing transactions right means making caching significantly
| more complex.  If the cache is purely transaction-specific, then we'll
| really be limiting the effectiveness of the cache.  With that in mind,
| a copy-on-write style of object is really called for -- when you fetch
| an object in a transaction, you can use the globally cached instance
| until you write to the object.
|
| Really this isn't copy-on-write, it's more like a proxy object.  Until
| the object is changed, it can delegate all its columns to its global
| object for which it is a proxy.  Of course, traversal via foreign keys
| or joins must also return proxied objects.  As the object is changed
| -- perhaps on a column-by-column basis, or as a whole on the first
| change -- the object takes on the personality of a full SQLObject
| instance.

I like the proxy idea a lot.  +1 on it.

| When the transaction is committed, this transactional object copies
| itself to the global object, and becomes a full proxy.  These
| transactional caches themselves should be pooled -- so that when
| another transaction comes along you have a potentially useful set of
| proxy objects already created for you.  This is a common use case for
| web applications, which have lots of short transactions, which are
| often very repetitive.
|
| In addition to this, there should be more cache control.  This means
| explicit ways to control things like:
|
| 1. Caching of instances:
|    + Application/process-global definition.
|    + Database-level definition.
|    + Transaction/EditingContext-level definition.
|    + Class-level definition.
| 2. Caching of columns:
|    + Class-level.
| 3. Cache sweep frequency:
|    + Application/process-global.
|    + Database-level.
|    + Class-level.
|    + Doesn't need to be as complete as 1; maybe on the class level
|      you could only indicate that a certain class should not be
|      swept.
|    + Sweep during a fetch (e.g., every 100 fetches), by time or fetch
|      frequency, or sweep with an explicit call (e.g., to do sweeps in
|      a separate thread).
| 4. Cache sweep policy:
|    + Maximum age.
|    + Least-recently-used (actually, least-recently-fetched).
|    + Random (the current policy).
|    + Multi-level (randomly move objects to a lower-priority cache,
|      raise level when the object is fetched again).
|    + Target cache size (keep trimming until the cache is small
|      enough).
|    + Simple policy (if enough objects qualify, cache can be of any
|      size).
|    + Percentage culling (e.g., kill 33% of objects for each sweep;
|      this is the current policy).

There's one cache algorithm that was tried on ZODB and that I think
gave one of the best hit rates.  If I recall correctly it was called
'Thor' or something similar.  It takes the object size and number of
hits into account to keep the memory footprint to a minimum while
still doing a good job of caching both big and small objects.

| 5. Batching of updates (whether updates should immediately go to the
|    database, or whether they would be batched until a commit or other
|    signal).
| 6. Natural expiring of objects.  Even if an object must persist
|    because there are still references, we could expire it so that
|    future accesses re-query the database.  To avoid stale data.

Maybe by replacing the object in the cache with a proxy? :)

| Expose some methods of the cache, like getting all objects currently
| in memory.  These would probably be exposed on a class level, e.g.,
| all the Addresses currently in memory via
| ``Address.cache.current()`` or something.  What about when there's a
| cached instance in the parent context, but not in the present
| transaction?

I think the cached instance in the parent context should show up
together with objects cached in the transaction.
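The copy-on-write proxy described under Caching could look roughly
like this minimal sketch (class and attribute names are illustrative,
not SQLObject code): reads fall through to the shared global instance
until the first write, after which changed columns live locally;
commit copies them back and the object becomes a pure proxy again.

```python
class RowProxy:
    """Transactional proxy around a globally cached row object.
    Column reads delegate to the global instance until a column is
    written; written values are then kept locally (copy-on-write).
    Illustrative sketch only."""

    def __init__(self, global_obj):
        # bypass our own __setattr__ while setting up bookkeeping
        object.__setattr__(self, '_global', global_obj)
        object.__setattr__(self, '_local', {})

    def __getattr__(self, name):
        # only called when normal lookup fails, i.e. for column names
        local = object.__getattribute__(self, '_local')
        if name in local:
            return local[name]
        return getattr(object.__getattribute__(self, '_global'), name)

    def __setattr__(self, name, value):
        # first write diverges from the global instance
        self._local[name] = value

    def commit(self):
        # copy changes back to the global object; become a pure proxy
        glob = object.__getattribute__(self, '_global')
        for name, value in self._local.items():
            setattr(glob, name, value)
        self._local.clear()
```

This also fits the expiry point above: an expired cache entry could be
replaced by exactly this kind of proxy, so the next access re-fetches.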
| Columns as Descriptors
| ----------------------
|
| Each column will become a descriptor.  That is, ``Col`` and subclasses
| will return an object with ``__get__`` and ``__set__`` methods.  The
| metaclass will not itself generate methods.
|
| A metaclass will still be used so that the descriptor can be tied to
| its name, e.g., so that with ``fname = StringCol()``, the resultant
| descriptor will know that it is bound to ``fname``.
|
| By using descriptors, introspection should become a bit easier -- or
| at least more uniform with respect to other new-style classes.
| Various class-wide indexes of columns will still be necessary, but
| these should be able to remain mostly private.
|
| To customize getters or setters (which you currently do by defining a
| ``_get_columnName`` or ``_set_columnName`` method), you will pass
| arguments to the ``Col`` object, like::
|
|     def _get_name(self, dbGetter):
|         return dbGetter().strip()
|
|     name = StringCol(getter=_get_name)
|
| This gets rid of ``_SO_get_columnName`` as well.  We can
| transitionally add something to the metaclass to signal an error if a
| spurious ``_get_columnName`` method is sitting around.

Yay!  You don't know how much I've been missing this one. :)

| Construction and Fetching
| -------------------------
|
| Currently you fetch an object with class instantiation, e.g.,
| ``Address(1)``.  This may or may not create a new instance, and does
| not create a table row.  If you want to create a table row, you do
| something like ``Address.new(city='New York', ...)``.  This is
| somewhat in contrast to normal Python, where class instantiation
| (calling a class) will create a new object, while objects are fetched
| otherwise (with no particular standard interface).
|
| To make SQLObject classes more normal in this case, ``new`` will
| become ``__init__`` (more or less), and classes will have a ``get``
| method that gets an already-existent row.  E.g., ``Address.get(1)``
| vs. ``Address(city='New York', ...)``.
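The descriptor-plus-metaclass arrangement from the "Columns as
Descriptors" section above can be sketched like this (a toy version
with illustrative names; it stores values on the instance instead of
going to a database):

```python
class Col:
    """Sketch of a column descriptor; the metaclass below ties each
    descriptor to its attribute name.  Illustration only."""

    def __init__(self, getter=None):
        self.name = None            # filled in by the metaclass
        self.getter = getter        # optional custom getter

    def __get__(self, obj, cls=None):
        if obj is None:
            return self             # class-level access: introspection
        raw = obj.__dict__.get(self.name)
        if self.getter is not None:
            # custom getter receives a callable returning the raw value,
            # standing in for the dbGetter of the proposal
            return self.getter(obj, lambda: raw)
        return raw

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class StringCol(Col):
    pass


class DeclarativeMeta(type):
    """Binds each Col descriptor to the name it was assigned to."""

    def __new__(meta, name, bases, ns):
        for attr, value in ns.items():
            if isinstance(value, Col):
                value.name = attr   # e.g. fname = StringCol() knows 'fname'
        return type.__new__(meta, name, bases, ns)
```

Because the descriptor objects survive on the class, introspection
works the uniform new-style way: ``Person.fname`` returns the bound
``Col`` rather than requiring a generated ``_SO_get_fname`` method.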
| This is perhaps the most significant change in SQLObject usage.
| Because of the different signatures, if you forget to make a change
| someplace you will get an immediate exception, so updating code
| should not be too hard.

+1 all the way.  Because of this, I had to special-case object
creation for SQLObject in my zope3-based app.

/me sees lots of crufty code going away and feels happy

| Extra Table Information
| -----------------------
|
| People have increasingly used SQLObject to create tables, and while it
| can make a significant number of schemas, there are several extensions
| of table generation that people occasionally want.  Since these occur
| later in development, it would be convenient if SQLObject could grow
| as the complexity of the programs using it grows.  Some of these
| extensions are:
|
| * Table name (``_table``).
| * Table type for MySQL (e.g., MyISAM vs. InnoDB).
| * Multi-column unique constraints.  (Other constraints?)
| * Indexes.  (Function or multi-column indexes?)
| * Primary key type.  (Primary key generation?)
| * Primary key sequence names (for Postgres, Firebird, Oracle, etc.).
| * Multi-column primary keys.
| * Naming scheme.
| * Permissions.
| * Locking (e.g., optimistic locking).
| * Inheritance (see Daniel Savard's recent patch).
| * Anything else?

* Enforcing constraints in Python.

Brad B. was chatting with me on IRC yesterday and we came to agree on
an API.  He's writing a proposal (with a patch) and is going to
present it soon.  Basically, when you create a column you would
provide a callable object as a 'constraint' keyword parameter.  This
constraint would then be used to enforce some restrictions.
    def foo_constraint(obj, name, value, values=None):
        # name is the column name
        # value is the value to be set for this column
        # values is a dict of the values to be set for other columns,
        # in the case you are creating an object or modifying more
        # than one column at a time
        # returns True or False
        return True  # placeholder; a real constraint checks the value

    age = IntCol(constraint=foo_constraint)

    class Col:
        def __init__(self, name, constraint):
            self.name = name
            self.constraint = constraint

        def __set__(self, obj, value):
            if not self.constraint(obj, self.name, value):
                raise ValueError(value)
            # Set the value otherwise

| Some of these may be globally defined, or defined for an entire
| database.  For example, typically you'll want to use a common MySQL
| table type for your entire database, even though it's defined on a
| per-table basis.  And while MySQL allows global permission
| declarations, Postgres does not and requires tedious repetitions of
| the permissions for each table -- so while it's applied on a
| per-table basis, it's likely that (at least to some degree) a
| per-database declaration is called for.  Naming schemes are also
| usually database-wide.
|
| As these accumulate -- and by partitioning this list differently, the
| list could be even longer -- it's messy to do these all as special
| class variables (``_idName``, etc).  It also makes the class logic and
| its database implementation details difficult to distinguish.  Some
| of these can be handled elegantly like ``id = StringCol()`` or ``id
| = ("fname", "lname")``.  But the others perhaps should be put into a
| single instance variable, perhaps itself a class::
|
|     class Address(SQLObject):
|         class SQLMeta:
|             mysqlType = 'InnoDB'
|             naming = Underscore
|             permission = {'bob': ['select', 'insert'],
|                           'joe': ['select', 'insert', 'update'],
|                           'public': ['select']}
|         street = StringCol()
|         ....
|
| The metadata is found by its name (``SQLMeta``), and is simply a
| container.  The class syntax is easier to write and read than a
| dictionary-like syntax.
| Or, it could be a proper class/instance and provide a partitioned
| way to handle introspection.  E.g.,
| ``Address.SQLMeta.permission.get('bob')`` or
| ``Address.SQLMeta.columns``.  In this case values that weren't
| overridden would be calculated from defaults (like the default naming
| scheme and so on).

+1 on it being an instance and providing introspection.  -1 on it
being a class.

| I'm not at all certain about how this should look, or if there are
| other things that should go into the class-meta-data object.

I can't think of anything missing for now.

| Joins, Foreign Keys
| -------------------
|
| First, the poorly-named ``MultipleJoin`` and ``RelatedJoin`` (which
| are rather ambiguous) will be renamed ``ManyToOneJoin`` and
| ``ManyToManyJoin``.  ``OneToOneJoin`` will also be added, while
| ``ForeignKey`` remains the related column type.  (Many2Many?
| Many2many?  many2many?)
|
| ForeignKey will be driven by a special validator/converter.  (But will
| this make ID access more difficult?)
|
| Joins will return smart objects which can be iterated across.  These
| smart objects will be related to ``SelectResults``, and allow the
| same features like ordering.  In both cases, an option to retrieve
| IDs instead of objects will be allowed.
|
| These smarter objects will allow, in the case of ManyToManyJoin,
| ``Set``-like operations to relate (or unrelate) objects.  For
| ManyToOneJoin the list/set operations are not really appropriate,
| because they would reassign the relation, not just add or remove
| relations.
|
| It would be nice to make the Join protocol more explicit and public,
| so other kinds of joins (e.g., three-way) could be more accessible.

Sounds pretty good.  I haven't used Joins much, only ForeignKeys.

-- 
Sidnei da Silva <si...@aw...>
http://awkly.org - dreamcatching :: making your dreams come true
http://plone.org/about/team#dreamcatcher

Machines that have broken down will work perfectly when the repairman
arrives.