Re: [Modeling-users] Re: Consistency among different processes?

Federico Heinz <fh...@vi...> wrote:
> On Thu, 2003-09-25 at 10:18, Sebastien Bigaret wrote:
> > To be precise, an object's row is cached as long as one instance for
> > this row is registered in one EC.
>
> OK... So the question WRT cache life seems to be "when does a row get
> deregistered from an EC?". It would seem reasonable to think that every
> time the EC gets a user-level fetch request (as opposed as a fetch
> request due to accessing a fault object), it clears its cache, since the
> application now is obviously interested in another set of objects
> instead of the ones already in memory.

In fact, there are two levels of caching.

1. Within EC:

  - you fetch an object obj1, and modify it,

  - then you submit another query, which returns obj1 as well: there,
    you don't want to overwrite the modifications you've made but not
    yet saved.

2. The database's row cache, held by Database, to which the framework
   refers for various tasks, such as: building fetched objects, computing
   the changes that need to be forwarded to the database, etc.

> This could create problems if the
> application kept references to objects between fetches, but I'd argue
> that doing this is a Bad Thing to boot. Should the need arise for this
> kind of functionality (kind of hard to imagine, but life is weird), a
> method cumulativeFetch() could be added to the EditingContext, which
> fetches without clearing the cache first.

  When you call fetch(), both mechanisms can come into play:

1. -> if the object already exists in the EC, possibly modified, you'll
      get that one.

2. -> otherwise, the database cache is searched for the row, and if
      found, that cached row is used instead of a fresh one from the
      database.

(1.) is probably something you do not want to change,

(2.) can be annoying, and these are the situations that 'refresh' will
     address (in addition to the default mechanism, it will let you do
     whatever you want through a specific delegate if the object
     actually changed, just like with optimistic locking).
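
To make the two lookups concrete, here is a minimal sketch in plain
Python. It is not the framework's code: ToyDatabase, ToyEC and their
methods are made-up names, and a dict plays the role of the real
database.

```python
# Toy model of the two cache levels; NOT the framework's actual code.
# ToyDatabase, ToyEC and all method names here are hypothetical.

class ToyDatabase:
    """Stands in for the shared row cache (level 2)."""
    def __init__(self, backend):
        self.backend = backend      # dict: id -> row, plays the real DB
        self.rows = {}              # cached rows (snapshots)

    def row_for(self, oid):
        # A cached row wins over the backend, even if the backend changed.
        if oid not in self.rows:
            self.rows[oid] = dict(self.backend[oid])
        return self.rows[oid]


class ToyEC:
    """Stands in for an EditingContext (level 1)."""
    def __init__(self, database):
        self.database = database
        self.objects = {}           # id -> object registered in this EC

    def fetch(self, oid):
        # (1.) an object already registered is returned as is,
        #      unsaved modifications included.
        if oid in self.objects:
            return self.objects[oid]
        # (2.) otherwise the object is built from the cached row.
        obj = dict(self.database.row_for(oid))
        self.objects[oid] = obj
        return obj


backend = {1: {"title": "old"}}
db = ToyDatabase(backend)
ec1, ec2 = ToyEC(db), ToyEC(db)

obj1 = ec1.fetch(1)
obj1["title"] = "edited"                    # modified, not saved
assert ec1.fetch(1)["title"] == "edited"    # (1.): the edit survives

backend[1]["title"] = "changed elsewhere"   # another process updates the DB
assert ec2.fetch(1)["title"] == "old"       # (2.): the cached row is used
```

The last line is the annoying case: ec2 never saw the row before, yet it
gets the stale values because another EC keeps the row cached.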

  In fact, clearing the cache cannot be the default, simply because you
  probably don't want the framework to modify the data behind your back.
  Suppose, for example, that when fetching, a previously fetched object
  has been deleted in the meantime (by another application): what should
  the framework do? Should it take the responsibility to delete the
  object in the EC that fetched the data? Or suppose you modified the
  relationships in the EC, and that when fetching, these relationships
  have changed: discarding the data could lead to inconsistencies in the
  graph of objects, since most of the time, relationships have an
  inverse (and constitute bi-directional UML associations). When you say
  that it is a bad thing to hold references to objects between fetches,
  you forget that the objects themselves actually hold references to the
  others they are related to.

Now if you want to be sure to get fresh data (until 'refresh' for fetch
is available), you can make sure that the row is deregistered by calling
ec.dispose() on (each of) your ECs. Be aware that this method invalidates
any object the EC holds and that it discards any updates/deletes/
etc. that are not saved yet. This also has a significant impact on
performance, since every object will need to be refetched and rebuilt.
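
The reason dispose() has to be called on each EC can be sketched with a
toy reference-counted row cache. This is an illustration of the rule
"a row stays cached as long as one instance is registered in one EC",
not the framework's implementation; ToyRowCache, ToyEC, register() and
unregister() are hypothetical names.

```python
# Toy reference-counted row cache; all names are hypothetical.

class ToyRowCache:
    def __init__(self, backend):
        self.backend = backend   # dict: id -> row, plays the real DB
        self.rows = {}           # id -> cached row
        self.users = {}          # id -> number of ECs holding an instance

    def register(self, oid):
        if oid not in self.rows:
            self.rows[oid] = dict(self.backend[oid])
        self.users[oid] = self.users.get(oid, 0) + 1
        return self.rows[oid]

    def unregister(self, oid):
        self.users[oid] -= 1
        if self.users[oid] == 0:      # last instance gone: drop the row
            del self.users[oid]
            del self.rows[oid]


class ToyEC:
    def __init__(self, cache):
        self.cache = cache
        self.objects = {}

    def fetch(self, oid):
        if oid not in self.objects:
            self.objects[oid] = dict(self.cache.register(oid))
        return self.objects[oid]

    def dispose(self):
        # Drops every object, unsaved changes included.
        for oid in self.objects:
            self.cache.unregister(oid)
        self.objects.clear()


backend = {1: {"title": "old"}}
cache = ToyRowCache(backend)
ec1, ec2 = ToyEC(cache), ToyEC(cache)
ec1.fetch(1)
ec2.fetch(1)

backend[1]["title"] = "new"     # changed by another process
ec1.dispose()
stale = ec1.fetch(1)["title"]   # ec2 still keeps the row cached

ec1.dispose()
ec2.dispose()
fresh = ec1.fetch(1)["title"]   # no EC held the row any more
```

Here stale is still "old" and only fresh is "new": disposing a single EC
is not enough while another one holds an instance for the same row.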

  If it's not clear enough, feel free to ask for more ;) Maybe I'm not
  thinking/answering the right way, so if you have a specific example in
  mind, that could help.

> > I see the point. This is the current status. Say you have two instances
> > (so, two address spaces) with two ECs, ec1 and ec2. Both query and update
> > an object obj1.
>
> Your description matches what I figured, and I like the optimistic
> locking idea. Is the implementation of optimistic locking scheduled any
> time soon? Ideas on how much effort it would entail to implement?

  Not now, but I can make a plan for it, say, this week-end if you wish.

> > (In fact, as the documentation says and as you noted, we currently also
> >  have this problem between two different ECs in the same address space
> >  --but this will be solved by delivering notifications to the
> >  appropriate objects)
>
> I must admit I'm kinda skeptic about the notification idea... Assume ec1
> and ec2 above are in the same address space now. When ec1 commits
> changes to object x, it can notify ec2 of this... but what is ec2 going
> to do with this information? If ec2 has uncommitted changes to x, it has
> to resort to the same kind of logic that we'd use in the optimistic lock
> case. In the end, thus, the only thing we gain is that we skip a fetch.
> Not that this is not important performance-wise, the point I want to
> make is that notification alone does not solve the problem, we also need
> the optimistic lock resolution for it to work.

Agreed, simply because the modifications could have been made by any
bash/perl/... script, which won't post any notification ;) Back to the
notifications: at least they could solve the case where the framework
runs in a single address space (this is the case in Zope, for example,
or in any threaded application) and an EC saves changes that you'd like
to see appear in other ECs.
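
A rough sketch of that single-address-space case, in plain Python and
under loud assumptions: ToyCenter, ToyEC, object_did_change() and the
rest are invented for the illustration, and the "save" only posts the
notification. It also shows the point made in the quoted text: an EC
with unsaved edits cannot be refreshed blindly.

```python
# Toy notification between ECs in one address space; names are hypothetical.

class ToyCenter:
    """Minimal observer registry."""
    def __init__(self):
        self.observers = []

    def post(self, sender, oid, row):
        for ec in self.observers:
            if ec is not sender:
                ec.object_did_change(oid, row)


class ToyEC:
    def __init__(self, center):
        self.center = center
        self.objects = {}        # id -> current values
        self.dirty = set()       # ids with unsaved local changes
        self.conflicts = []      # ids left for optimistic-lock style handling
        center.observers.append(self)

    def modify(self, oid, **values):
        self.objects.setdefault(oid, {}).update(values)
        self.dirty.add(oid)

    def save_changes(self):
        for oid in sorted(self.dirty):
            self.center.post(self, oid, dict(self.objects[oid]))
        self.dirty.clear()

    def object_did_change(self, oid, row):
        if oid in self.dirty:
            # Unsaved local edits: the notification alone cannot decide.
            self.conflicts.append(oid)
        elif oid in self.objects:
            self.objects[oid] = dict(row)   # safe to refresh in place


center = ToyCenter()
ec1, ec2, ec3 = ToyEC(center), ToyEC(center), ToyEC(center)
for ec in (ec1, ec2, ec3):
    ec.objects[1] = {"title": "old"}

ec3.modify(1, title="mine")     # ec3 has uncommitted changes
ec1.modify(1, title="new")
ec1.save_changes()              # ec2 is refreshed, ec3 records a conflict
```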

> > Now if you want two
> > different address spaces to be notified of changes made by the other
> > before any attempt to save the changes in the db, we would need a more
> > general notification mechanism which should be able to broadcast changes
> > through the network, but even then, I suspect this is a hard problem to
> > solve.
>
> This would be a nightmare, gazillions of things could go wrong, and they
> would certainly do so in the worst possible sequence. We don't want to
> pursue this.

  I really like the way you put it ;) and totally agree. In fact, this
  also applies to particular situations where the data can be changed by
  any means outside the framework. Such situations require specific and
  specialized use-cases and actions, so I guess it makes sense to leave
  this open (but we still need to provide the tools for handling them,
  such as refresh and optimistic locking).
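
For readers unfamiliar with optimistic locking, the general technique
can be sketched as follows: keep the snapshot taken at fetch time, and
at save time write only if the stored row still matches it, otherwise
hand the decision to a delegate. This is a generic illustration, not a
plan for the framework; save(), StaleRow, on_conflict and merge() are
hypothetical names, and a dict stands in for the database.

```python
# Toy optimistic locking check; names are hypothetical.

class StaleRow(Exception):
    pass


def save(backend, oid, snapshot, new_values, on_conflict=None):
    """Write new_values only if the row still matches the snapshot."""
    current = backend[oid]
    if current != snapshot:
        if on_conflict is None:
            raise StaleRow(oid)
        # The delegate decides what ends up in the database.
        new_values = on_conflict(snapshot, current, new_values)
    backend[oid] = dict(new_values)
    return backend[oid]


def merge(snapshot, current, mine):
    # Keep the other party's values, then reapply only what we changed.
    merged = dict(current)
    merged.update({k: v for k, v in mine.items() if snapshot[k] != v})
    return merged


backend = {1: {"title": "old", "pages": 10}}
snapshot = dict(backend[1])              # taken at fetch time
mine = dict(snapshot, title="mine")      # local, unsaved edit

backend[1]["pages"] = 12                 # a script changes the row meanwhile

result = save(backend, 1, snapshot, mine, on_conflict=merge)
```

Without a delegate the same call raises StaleRow, which is the "default
mechanism" side of the picture; what the default should be is a design
decision left open here.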

> > Another cleaner solution (and maybe the only one that can be
> > guaranteed to be 100% safe) could be to explicitly lock the appropriate
> > row before any attempt to modify an object, and to release the lock only
> > after changes have been made --this is the so-called pessimistic locking
> > strategy.
>
> We could implement this as a method of persistent objects, thus x.lock()
> would perform a locking read on the row until the transaction's done or
> rolled back. Of course, this means that the programmer will have to take
> care of which objects to lock, but such is the fate of the pessimistic
> locking programmer :-)

  Yes, that could be done; this is in fact the very basis for automatic
  pessimistic locking: lock the object as soon as it is modified
  (binding lock() to willChange()), release the lock when it is saved
  and/or refaulted.
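
A minimal sketch of that binding, assuming a toy in-memory lock table in
place of real row locks. Only the names lock() and willChange() come
from this thread; ToyLockTable, ToyEC, ToyObject, set() and RowLocked
are hypothetical, and a real database would typically block the second
writer rather than raise.

```python
# Toy automatic pessimistic locking; names are hypothetical.

class RowLocked(Exception):
    pass


class ToyLockTable:
    """Plays the database's row locks: id -> owning EC."""
    def __init__(self):
        self.owners = {}

    def acquire(self, oid, owner):
        holder = self.owners.setdefault(oid, owner)
        if holder is not owner:
            raise RowLocked(oid)

    def release_all(self, owner):
        for oid in [o for o, h in self.owners.items() if h is owner]:
            del self.owners[oid]


class ToyEC:
    def __init__(self, locks):
        self.locks = locks

    def save_changes(self):
        self.locks.release_all(self)   # locks end with the save


class ToyObject:
    def __init__(self, oid, ec):
        self.oid, self.ec, self.values = oid, ec, {}

    def lock(self):
        self.ec.locks.acquire(self.oid, self.ec)

    def willChange(self):
        self.lock()                    # lock() bound to willChange()

    def set(self, key, value):
        self.willChange()
        self.values[key] = value


locks = ToyLockTable()
ec1, ec2 = ToyEC(locks), ToyEC(locks)
a, b = ToyObject(1, ec1), ToyObject(1, ec2)

a.set("title", "first")          # ec1 now holds the lock on row 1
try:
    b.set("title", "second")
    blocked = False
except RowLocked:
    blocked = True               # ec2 cannot modify the same row

ec1.save_changes()
b.set("title", "second")         # succeeds once ec1 released the lock
```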

-- Sébastien.