Re: [Modeling-users] Re: Consistency among different processes?

Federico Heinz <fh...@vi...> wrote:
> The point I'm trying to make is that applications usually let the user
> perform transactions on a certain part of the database, and there's
> little point to keeping the data from previous transactions around.

Let me rephrase your idea, so that we're sure we're speaking of
the same thing: you consider that fetching is the preliminary phase,
then you modify objects (without fetching explicitly anymore), then you
save changes and finally discard the object(s) you fetched.

  If you want this, you can simply call ec.dispose() once you're done with
  the changes, and the objects will be invalidated and the corresponding
  cached rows will be removed (I assume here that you only have one EC
  at a time). This way, you'll get the exact behaviour you're asking
  for.
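
  A minimal sketch of that fetch / modify / save / dispose cycle. It uses
  a toy stand-in, not the framework's real classes: ToyEC, its fetch() and
  save_changes() methods and the dict-based store are all made up for
  illustration; only the idea of calling dispose() comes from the
  paragraph above.

```python
class ToyEC:
    """Toy stand-in for an editing context; NOT the framework's class."""

    def __init__(self, store):
        self.store = store      # hypothetical backing store: {id: row dict}
        self.objects = {}       # objects registered in this context

    def fetch(self, obj_id):
        # Return the already registered object if there is one,
        # otherwise build it from a copy of the stored row.
        if obj_id not in self.objects:
            self.objects[obj_id] = dict(self.store[obj_id])
        return self.objects[obj_id]

    def save_changes(self):
        # Write every registered object back to the store.
        for obj_id, obj in self.objects.items():
            self.store[obj_id] = dict(obj)

    def dispose(self):
        # Forget everything fetched so far.
        self.objects.clear()


store = {1: {"name": "author1"}, 2: {"name": "author2"}}
ec = ToyEC(store)

author = ec.fetch(1)
author["name"] = "author1 (edited)"
ec.save_changes()
ec.dispose()

# Another process changes the row behind our back.
store[1]["name"] = "author1 (edited elsewhere)"

fresh = ec.fetch(1)   # rebuilt from the store, not taken from a stale copy
```

  Without the dispose() call, the last fetch would hand back the object
  already registered in the context, i.e. the possibly stale copy.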

> For example, imagine a simple app that allows you to delete, insert or
> modify the Author's info. The user of such an app will repeatedly look
> up an Author, change an attribute or two, commit the change, and start
> over again. When the user looks up the second Author, it doesn't make a
> lot of sense to keep the first Author's object around, does it? Yes, it
> is possible that the application will let the user navigate the second
> Author's books, and if the first Author has co-authored a book with the
> second, it might also let the user navigate back to the first Author...
> but I think it's affordable (and in some cases even desirable) that this
> navigation results in the first Author being fetched again from the
> database, instead of just fishing a likely stale copy from the cache.

Agreed, this is possible --but you must also understand that other
people might think differently, esp. wrt performance issues. Here we
just have a couple of objects, so this does not make much difference,
but if you need to work on a bigger set of objects that's another story.

  Speaking of perf., here are some figures I've just produced on my
  installation:

    - fetching 5000 simple objects (one attribute, a to-one and a
      to-many relationship, plus an ID) -> 4.77s

    - fetching the same 5000 objects while they are already fetched
      in the EC: 798 ms

  That's a gain factor of ~6x --just because in the 2nd case, objects
  were not re-created and re-populated with their values.
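
  To illustrate where the gain comes from, here is a toy identity map
  (the Uniquer class and its names are hypothetical, not the framework's
  code, and it measures nothing): it only counts how many objects get
  created on a first and on a second pass over the same rows.

```python
class Uniquer:
    """Hypothetical identity map: one object per row id."""

    def __init__(self):
        self.by_id = {}
        self.built = 0          # how many objects were actually created

    def object_for_row(self, row):
        obj = self.by_id.get(row["id"])
        if obj is None:
            obj = dict(row)     # the costly part: create and populate
            self.by_id[row["id"]] = obj
            self.built += 1
        return obj


rows = [{"id": i, "name": "obj%d" % i} for i in range(5000)]
uniquer = Uniquer()

first = [uniquer.object_for_row(r) for r in rows]
built_first = uniquer.built
second = [uniquer.object_for_row(r) for r in rows]
built_second = uniquer.built - built_first
```

  The second pass creates nothing and returns the very same objects,
  which is the work saved in the 798 ms case.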

> On Thu, 2003-09-25 at 15:40, Sebastien Bigaret wrote:
> > In fact, there are two levels of caching.
> > 1. Within EC:
> >   - you fetch an object obj1, and modify it,
> >   - then you submit another query, which returns obj1 as well: there,
> >     you don't want to override the modification you've made but not
> >     saved.
>
> Hmmm... Well, this is the thing. I mean, what are you doing submitting
> another query while you still have uncommitted changes? I understand
> this is exactly the right behavior if you are traversing a series of
> relationships that brings you back to the original table/entity, but my
> understanding is that when the program says ec.fetch(...), it is
> actually stating that it's done with the results of the last fetch, and
> wants to start anew. So if you try to do a fetch while changes are
> pending, the EC should either commit the changes (don't like it) or
> raise an exception (much better)

Okay, let's go for a little illustration! I want to be able to fetch
while some changes are uncommitted, because I may be in the middle of a
modification process, and now, for example, I need to fetch some other
objects to add them to my initial object's relationships. And I want
both the initial changes and the ones I make after the subsequent
fetch(es) to be applied atomically wrt the db. For example, I may want
to fetch an author, get its books, remove all books from the author's
list, delete the author, fetch another author based on any user
specification (no traversal here, but real fetch), assign the former
author's books to the latter's list of books, then, and only then, save
the changes.
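
As a rough sketch of that scenario, with a toy context working on a
private copy of the data (ToyContext, its methods and the dict layout are
invented for this example and are not the framework's API; "atomic" here
only means that the store is untouched until save_changes()):

```python
import copy


class ToyContext:
    """Toy editing context; names and behaviour are illustrative only."""

    def __init__(self, db):
        self.db = db                      # {"authors": {name: [book titles]}}
        self.work = copy.deepcopy(db)     # private working copy

    def fetch_author(self, name):
        return self.work["authors"][name]     # the author's list of books

    def delete_author(self, name):
        del self.work["authors"][name]

    def save_changes(self):
        # Publish all pending changes in one step.
        self.db.clear()
        self.db.update(copy.deepcopy(self.work))


db = {"authors": {"author1": ["Book A", "Book B"], "author2": []}}
ec = ToyContext(db)

books = list(ec.fetch_author("author1"))   # fetch an author, get its books
ec.fetch_author("author1").clear()         # remove them from the author's list
ec.delete_author("author1")                # delete the author (still unsaved)
other = ec.fetch_author("author2")         # a real fetch, with changes pending
other.extend(books)                        # reassign the books

before_save = copy.deepcopy(db)            # what the store holds before saving
ec.save_changes()                          # only now does the store change
```

If the second fetch raised an exception or forced a commit, the deletion
and the reassignment could not be saved together.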

> > 2. The database's rows cache, held by Database, to which the framework
> >    refer for various tasks, such as: building fetched objects, computing
> >    the changes that needs to be forwarded to the database, etc.
>
> I'm arguing that for most applications this cache ought to be flushed
> with each user-level fetch. I understand that, for single-user
> applications, longer-term caching can be a significant performance
> boost (although it may, as noted in the documentation, lead to memory
> footprint bloat if the application is not restarted regularly). In any
> other environment, I feel that the risk of the cache becoming stale
> seriously outweighs performance concerns.

I think we both could argue endlessly ;) but I do understand your
point. All that I say is that both mechanisms should be supported. And
they are, actually.

[...]
> > 2. -> otherwise, the database cache is searched for the row, and if
> >       found, that one will be used instead.
> > (2.) can be annoying, and that's the situations where this is annoying
> >      that 'refresh' will address (and in addition to the default
> >      mechanism, it will allow you to do whatever you want through a
> >      specific delegate if the object actually changed, just like with
> >      optimistic locking)
>
> I agree that optimistic locking could make the long-term caching of rows
> workable. It will also, however, make conflicts more likely, because
> rows that have been longer in the database cache have a larger
> probability of becoming stale.
>
> >   In fact, clearing the cache cannot be the default, just because you
> >   probably won't want the framework to modify the data behind your back.
> >   Suppose, for example, that when fetching, a previously fetched object
> >   has been deleted in the meantime (by another application): what should
> >   the framework do? Should it take the responsibility to delete the
> >   object in the EC that fetched the data?
>
> If the database doesn't keep a long-term cache, the call to fetch() will
> not return the deleted row. If the deletion takes place after the
> fetch(), of course, we'll have to resort to the whole optimistic locking
> thing. Come to think of it, I think most of my argument rests upon the
> idea that it's desirable to minimize the likelihood of optimistic
> locking conflict occurrence, which sounds intuitively right to me, but I
> don't have any hard data to back it up.

That sounds like a reasonable goal, but others will argue that it is
not their number one priority. In other words, that's an
application-design decision, and I do not want to make this decision
within the framework; again, I think we'd better offer the
developer the choice by giving him the tools. Agreed, however: some of
these tools are still missing.

> >   Not now, but I can make a plan for it, say, this week-end if you wish.
>
> Well, that would be great! I'm trying (and still failing :-) ) to figure
> out which module does what in the framework, so an expert opinion on
> what would need to be done to get optimistic locking and vertical
> mapping working would be a wonderful thing.

Okay, I'll try hard to make this happen this week-end, then. BTW, I know
there should be documentation for the framework's architecture.
Hopefully this will be done one day, but the todo list is sooo long...

> > > I must admit I'm kinda skeptic about the notification idea...
> > Agreed, just because the modifications could have been made by any
> > bash/perl/... script who won't post any modifications ;) Back on the
> > notifications, at least they could solve the case where the framework
> > runs in a single address space (this is the case in Zope, for example,
> > or in any threaded application) and an EC save changes that you'd like
> > to see appear in other ECs.
>
> I'm saying that I don't see how notification could solve conflicts even
> in a single address space. If both ec1 and ec2 have pending
> modifications for obj1, and ec1 commits the change and notifies ec2...
> what will ec2 do with its changes?

You can, for example:

  - either ignore the changes, and rely on optimistic locking (this will
    probably be the default),

  - decide to examine the objects, apply the saved changes, then
    re-apply the uncommitted changes. (Example: you modify a person's
    phone number; at this point you get a notification saying that the
    tel. number and the middle name have changed: you apply those
    changes, then re-apply the uncommitted change for the phone number.)
    [This can maybe be done automatically, although there are some
    subtle points when it comes to relationships --this needs to be
    investigated]

  - ask the user,

  - ... add your application requirements here ;)
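
The second option (apply the saved changes, then re-apply the uncommitted
ones) could look roughly like this for plain attributes. This is only a
sketch: merge_notification() and the dict-based objects are hypothetical,
the notification mechanism itself does not exist yet, and relationships
are deliberately left out.

```python
def merge_notification(snapshot, current, saved):
    """Apply changes saved elsewhere, then re-apply local uncommitted ones.

    snapshot: values as originally fetched
    current:  values as they stand now in this context (maybe modified)
    saved:    values carried by the notification
    """
    # Local uncommitted changes = what differs from the fetched snapshot.
    local = {k: v for k, v in current.items() if snapshot.get(k) != v}
    merged = dict(current)
    merged.update(saved)    # take what the other context saved
    merged.update(local)    # our own pending edits win on conflict
    return merged


snapshot = {"phone": "111", "middle_name": "A"}
current = {"phone": "222", "middle_name": "A"}     # phone edited, not saved
saved = {"phone": "333", "middle_name": "B"}       # what the other EC saved

merged = merge_notification(snapshot, current, saved)
```

Here the middle name saved by the other EC is taken over, while the
phone number keeps the value that is still pending in this context.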

You can think about these notifications as a means to provide
application-specific behaviour for minimizing failures under
optimistic locking strategies, at least when in a single
address-space. Moreover, these notifications are really needed if for
any reason you ''choose'' the no-locking policy (which is the only
policy supported for now, and the reason why the User's Guide details the
problem when using one EC per session).

  Does all this make sense wrt your own claims & requirements?

-- Sébastien.