From: Ray Z. <rz...@co...> - 2001-12-20 13:59:52
I think there are really two pieces to the idea we have ... (1) caching
(2) prevention of duplicate in-memory objects ... where (1) has to do
with avoiding going to the database for a fetch, and (2) has to do with
making sure that if you fetch twice you get two variables pointing to
the same object in memory. I'm not sure if the Cache::Cache stuff
addresses both or only (1). Guess it's time for me to go do some
reading.

At 2:26 PM -0500 12/19/01, Chris Winters wrote:
>On Wed, 2001-12-19 at 13:22, Ray Zimmerman wrote:
>> The issue we were concerned with first came up when we were deciding
>> whether to auto-fetch an Account object with a TransactionItem which
>> "has-a" Account. We realized that if you fetch a bunch of
>> TransactionItems which reference the same Account id, you'll get a
>> bunch of identical copies of the Account object in memory. It would
>> be nice to just fetch one copy and then have future fetches just
>> return a reference to that object.
>
>I hadn't really thought about in-process copies before. My original idea
>had been to just provide an interface to the Cache::Cache hierarchy
>since you'd get for free all sorts of good stuff -- and smarter people
>than me are worrying about how it all works.
>
>I assume the benefit of an in-process cache would be to have a change
>made to one object reflected in all of them. I'm a little wary of this,
>mostly because I tend to think in multi-process (mod_perl) terms.
>However... the Cache::MemoryBackend might be worth checking out.

Yes, I definitely view this as a benefit. The way I think about it is
that there is only one "real" object - the one in the database. It's
just that each process that fetches the object gets copies in memory
that it can use to manipulate the "real" object. Using a cache like
we're proposing makes sure each process only has a single copy. IIRC,
Tangram does not allow you to fetch more than one copy of an object in
a single process.
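To make point (2) concrete, here's a minimal sketch of the "one copy per
process" idea -- what's usually called an identity map. This is
illustrative only (written in Python for brevity, since the pattern is
language-neutral); the IdentityMap and Account names and the db_fetch
parameter are hypothetical, not SPOPS or Cache::Cache API:

```python
class IdentityMap:
    """One in-memory copy per object id within a single process.

    db_fetch stands in for the real database fetch; it is called at
    most once per id, and repeated fetches return the *same* reference.
    """
    def __init__(self, db_fetch):
        self._db_fetch = db_fetch
        self._objects = {}  # id -> the single in-process copy

    def fetch(self, object_id):
        if object_id not in self._objects:
            # First fetch of this id: go to the "database"
            self._objects[object_id] = self._db_fetch(object_id)
        return self._objects[object_id]


class Account:
    def __init__(self, account_id):
        self.account_id = account_id


imap = IdentityMap(db_fetch=Account)
a = imap.fetch(42)
b = imap.fetch(42)
assert a is b  # two fetches, two variables, one object in memory
```

With this in place, fetching a bunch of TransactionItems that reference
the same Account id would hand every item a reference to the same
Account object, rather than identical copies.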
>> I believe this could easily be handled by modifying fetch() to stick
>> objects in a cache when they're fetched and to always look to the
>> cache first when doing a fetch.
>
>Yeah, that's one of the places where the hook is. Basically it goes like
>this:
>
> fetch
>  - check cache and return if exists
>  - fetch object; exit on failure or non-retrieval
>  - add object to cache
>
> save
>  - save object; exit on failure
>  - refresh (add/update) object in cache

I'm not sure this last step would be necessary if the object in the
cache is the only copy of the object in memory.

> remove
>  - remove object; exit on failure
>  - remove object from cache
>
>Fairly straightforward. The main wrinkle with this is
>fetch_group/fetch_iterator. I modified fetch_group a while back to grab
>the objects in one pass rather than grabbing the ID of the objects and
>just calling fetch() for each of them.
>
>The latter is cleaner because we can have the caching and other logic in
>one place (fetch()), but it's a very inefficient use of a database --
>and I normally don't even think about speed :-)
>
>We could make this optional if you're using a cache.... But while I
>haven't thought about this much, my gut feeling is that the current
>design does not support this readily.

Another thing about fetch_group (and I assume fetch_iterator, which I
haven't yet used) is that things get uglier when you start dealing with
inheritance like we do in ESPOPS. You almost have to go back to getting
IDs and then calling fetch() (ESPOPS fetch_group still does this) since
the objects returned may not all be of the same class. E.g. if I want
to fetch all Vehicles owned by Bob, and he owns a Car and a Boat, I
want the objects returned to be of type Car and Boat, not just of type
Vehicle (the base class used to do the fetch_group). I suppose you
could make it smarter and fetch all of the Vehicles in one pass and
then only re-fetch the objects which belong to a sub-class.
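The fetch/save/remove hooks outlined above could be sketched roughly
like this (again a language-neutral illustration in Python, not SPOPS
code; the CachingDatastore name and the dict-as-database backend are
hypothetical stand-ins):

```python
class CachingDatastore:
    """Cache-first fetch, with save/remove keeping the cache in sync.

    The backend dict stands in for the real database; in SPOPS terms
    the cache lookups would wrap the actual persistence calls.
    """
    def __init__(self, backend):
        self.backend = backend           # id -> object (the "database")
        self.cache = {}                  # id -> object (in-process cache)

    def fetch(self, object_id):
        if object_id in self.cache:      # check cache and return if exists
            return self.cache[object_id]
        obj = self.backend.get(object_id)
        if obj is None:                  # exit on failure or non-retrieval
            return None
        self.cache[object_id] = obj      # add object to cache
        return obj

    def save(self, object_id, obj):
        self.backend[object_id] = obj    # save object; exit on failure
        self.cache[object_id] = obj      # refresh (add/update) in cache

    def remove(self, object_id):
        del self.backend[object_id]      # remove object; exit on failure
        self.cache.pop(object_id, None)  # remove object from cache
```

Note that if the cached copy is the only in-memory copy (the identity
map idea), the refresh step in save() is indeed a no-op: the object
being saved and the object in the cache are already the same reference.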
But yeah, fetch_group is likely to complicate the caching picture a
bit, though hopefully not too much.

>It's not inconceivable that much of the security/caching stuff will get
>moved up a level, with the actual persistence getting called through
>callbacks.

Hmmm ... now that sounds interesting.

>> The only complicated thing is keeping a reference count so that the
>> object gets destroyed when the reference in the cache is the only
>> one left.
>
>This is something I'd rather not deal with :-) If the Cache::* framework
>doesn't support this then I think we'd be better off working on that and
>making SPOPS work well with it.

I think I agree.

>Caching is one of those things that I really want to implement but I
>never get around to it because I don't have a pressing need. I also have
>a feeling it will necessitate some redesign (as mentioned above) which,
>while not a bad thing, can be time consuming :-) (OTOH, there'd be the
>benefit of having more users and developers around...)
>
>Isn't the ambivalence oozing off this email? :-)

Yeah, you were supposed to have thought through all of this stuff and
have definitive answers by now :-)

>So if you guys have ideas, I'm all ears. Particularly if they're as
>well-written and thought-out as the object relationship stuff.

Thanks ... I'm not likely to have the time to do anything quite that
complete for caching in the near future, but if I do have more concrete
ideas, don't worry ... I'll pass them along :-)

-- 
 Ray Zimmerman  /  e-mail: rz...@co...  /  428-B Phillips Hall
  Sr Research   /   phone: (607) 255-9645   /   Cornell University
   Associate    /     FAX: (815) 377-3932   /    Ithaca, NY 14853
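For what it's worth, the reference-counting worry can be sidestepped
entirely if the cache holds only weak references: the runtime's own
reference counting then destroys the object (and drops the cache entry)
as soon as the cache holds the last reference. A sketch, using Python's
weakref module purely as an illustration of the mechanism (Perl's weak
references via Scalar::Util could play the same role):

```python
import gc
import weakref

# Hypothetical cached class; instances must be weak-referenceable.
class Account:
    def __init__(self, account_id):
        self.account_id = account_id

# A weak-valued mapping drops its entry as soon as the last *outside*
# reference to the cached object goes away -- no hand-rolled refcounts.
cache = weakref.WeakValueDictionary()

acct = Account(42)
cache[42] = acct
assert 42 in cache       # entry lives while a real reference exists

del acct                 # drop the last outside reference
gc.collect()             # CPython frees it immediately; be explicit anyway
assert 42 not in cache   # the cache entry vanished with the object
```

This gives exactly the "destroyed when the cache reference is the only
one left" behavior without the cache itself having to track anything.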