From: Ray Z. <rz...@co...> - 2001-12-19 18:25:06
Chris,

How's the SPOPS development coming?

Raj Chandran and I have been talking about something that we think
should be included in SPOPS. I know you have some sort of plans for
caching, but I'm not sure what they include and how much is currently
implemented.

The issue we were concerned with first came up when we were deciding
whether to auto-fetch an Account object with a TransactionItem which
"has-a" Account. We realized that if you fetch a bunch of
TransactionItems which reference the same Account id, you'll get a
bunch of identical copies of the Account object in memory. It would
be nice to just fetch one copy and then have future fetches just
return a reference to that object.

I believe this could easily be handled by modifying fetch() to stick
objects in a cache when they're fetched and to always look to the
cache first when doing a fetch.

The only complicated thing is keeping a reference count so that the
object gets destroyed when the reference in the cache is the only
one left.

You'd also need a way to force a fetch to skip the cache ...

I'm sure you've probably thought about all of these issues already.
What do you think?

-- 
 Ray Zimmerman  / e-mail: rz...@co... / 428-B Phillips Hall
  Sr Research  /   phone: (607) 255-9645  /  Cornell University
   Associate  /      FAX: (815) 377-3932 /   Ithaca, NY 14853
From: Chris W. <ch...@cw...> - 2001-12-19 19:25:27
On Wed, 2001-12-19 at 13:22, Ray Zimmerman wrote:
> Chris,
>
> How's the SPOPS development coming?

Not bad -- I keep getting sidetracked from the new object relationship
stuff, but it's coming along. The next version will have some useful (I
hope) import (data and structure) and export (data only) methods along
with a couple of other little items.

> Raj Chandran and I have been talking about something that we think
> should be included in SPOPS. I know you have some sort of plans for
> caching, but I'm not sure what they include and how much is currently
> implemented.

Right now there are hooks in DBI.pm (at least) to use caching if it's
available. When the caching gets on the radar these will probably
change...

> The issue we were concerned with first came up when we were deciding
> whether to auto-fetch an Account object with a TransactionItem which
> "has-a" Account. We realized that if you fetch a bunch of
> TransactionItems which reference the same Account id, you'll get a
> bunch of identical copies of the Account object in memory. It would
> be nice to just fetch one copy and then have future fetches just
> return a reference to that object.

I hadn't really thought about in-process copies before. My original idea
had been to just provide an interface to the Cache::Cache hierarchy
since you'd get for free all sorts of good stuff -- and smarter people
than me are worrying about how it all works.

I assume the benefit of an in-process cache would be to have a change
made to one object reflected in all of them. I'm a little wary of this,
mostly because I tend to think in multi-process (mod_perl) terms.
However... the Cache::MemoryBackend might be worth checking out.

> I believe this could easily be handled by modifying fetch() to stick
> objects in a cache when they're fetched and to always look to the
> cache first when doing a fetch.

Yeah, that's one of the places where the hook is. Basically it goes like
this:

  fetch
    - check cache and return the object if it exists
    - fetch object; exit on failure or non-retrieval
    - add object to cache

  save
    - save object; exit on failure
    - refresh (add/update) object in cache

  remove
    - remove object; exit on failure
    - remove object from cache

Fairly straightforward. The main wrinkle with this is
fetch_group/fetch_iterator. I modified fetch_group a while back to grab
the objects in one pass rather than grabbing the IDs of the objects and
just calling fetch() for each of them.

The latter is cleaner because we can have the caching and other logic in
one place (fetch()), but it's a very inefficient use of a database --
and I normally don't even think about speed :-)

We could make this optional if you're using a cache.... But while I
haven't thought about this much, my gut feeling is that the current
design does not support this readily.

It's not inconceivable that much of the security/caching stuff will get
moved up a level, with the actual persistence getting called through
callbacks.

> The only complicated thing is keeping a reference count so that the
> object gets destroyed when the reference in the cache is the only
> one left.

This is something I'd rather not deal with :-) If the Cache::* framework
doesn't support this then I think we'd be better off working on that and
making SPOPS work well with it.

> You'd also need a way to force a fetch to skip the cache ...

I think passing in 'skip_cache' is currently there already. If it's not,
it will be.

> I'm sure you've probably thought about all of these issues already.
> What do you think?

Caching is one of those things that I really want to implement but I
never get around to it because I don't have a pressing need. I also have
a feeling it will necessitate some redesign (as mentioned above) which,
while not a bad thing, can be time consuming :-) (OTOH, there'd be the
benefit of having more users and developers around...)

Isn't the ambivalence oozing off this email? :-)

So if you guys have ideas, I'm all ears. Particularly if they're as
well-written and thought-out as the object relationship stuff.

Later,

Chris

-- 
Chris Winters (ch...@cw...)
Building enterprise-capable snack solutions since 1988.
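The fetch/save/remove hook order described in this message can be sketched as a thin wrapper around a backend. It is written in Python purely for illustration; CachedStore and its methods are hypothetical names, not the SPOPS or Cache::Cache API, and plain dicts stand in for both the database and the cache. The point is the ordering: the cache is only touched after the persistence step has succeeded.

```python
class CachedStore:
    """Hypothetical wrapper showing where cache hooks sit; not SPOPS code."""

    def __init__(self, backend):
        self.backend = backend  # dict standing in for the database
        self.cache = {}         # dict standing in for a cache object

    def fetch(self, key, skip_cache=False):
        # 1. check the cache and return the object if it exists
        if not skip_cache and key in self.cache:
            return self.cache[key]
        # 2. fetch the object; exit on non-retrieval
        if key not in self.backend:
            return None
        obj = dict(self.backend[key])  # copy, as a real fetch would build
        # 3. add the object to the cache
        self.cache[key] = obj
        return obj

    def save(self, key, obj):
        # 1. save the object; an exception here leaves the cache untouched
        self.backend[key] = dict(obj)
        # 2. refresh (add/update) the object in the cache
        self.cache[key] = obj

    def remove(self, key):
        # 1. remove the object; exit on failure
        if key not in self.backend:
            return False
        del self.backend[key]
        # 2. remove the object from the cache
        self.cache.pop(key, None)
        return True


store = CachedStore({1: {"name": "checking"}})
account = store.fetch(1)
print(store.fetch(1) is account)  # second fetch is served from the cache
```

In this sketch a skip_cache fetch bypasses the lookup but still stores the fresh copy, which is one possible reading of 'skip_cache' rather than a description of what SPOPS does.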
From: Ray Z. <rz...@co...> - 2001-12-20 13:59:52
I think there are really two pieces to the idea we have ...

  (1) caching
  (2) prevention of duplicate in-memory objects

... where (1) has to do with avoiding going to the database for a
fetch, and (2) has to do with making sure that if you fetch twice you
get two variables pointing to the same object in memory.

I'm not sure if the Cache::Cache stuff addresses both or only (1).
Guess it's time for me to go do some reading.

At 2:26 PM -0500 12/19/01, Chris Winters wrote:
>On Wed, 2001-12-19 at 13:22, Ray Zimmerman wrote:
>> The issue we were concerned with first came up when we were deciding
>> whether to auto-fetch an Account object with a TransactionItem which
>> "has-a" Account. We realized that if you fetch a bunch of
>> TransactionItems which reference the same Account id, you'll get a
>> bunch of identical copies of the Account object in memory. It would
>> be nice to just fetch one copy and then have future fetches just
>> return a reference to that object.
>
>I hadn't really thought about in-process copies before. My original idea
>had been to just provide an interface to the Cache::Cache hierarchy
>since you'd get for free all sorts of good stuff -- and smarter people
>than me are worrying about how it all works.
>
>I assume the benefit of an in-process cache would be to have a change
>made to one object reflected in all of them. I'm a little wary of this,
>mostly because I tend to think in multi-process (mod_perl) terms.
>However... the Cache::MemoryBackend might be worth checking out.

Yes, I definitely view this as a benefit. The way I think about it is
that there is only one "real" object - the one in the database. It's
just that each process that fetches the object gets copies in memory
that it can use to manipulate the "real" object. Using a cache like
we're proposing makes sure each process only has a single copy.

IIRC, Tangram does not allow you to fetch more than one copy of an
object in a single process.

>> I believe this could easily be handled by modifying fetch() to stick
>> objects in a cache when they're fetched and to always look to the
>> cache first when doing a fetch.
>
>Yeah, that's one of the places where the hook is. Basically it goes like
>this:
>
>  fetch
>    - check cache and return the object if it exists
>    - fetch object; exit on failure or non-retrieval
>    - add object to cache
>
>  save
>    - save object; exit on failure
>    - refresh (add/update) object in cache

I'm not sure this last step would be necessary if the object in the
cache is the only copy of the object in memory.

>  remove
>    - remove object; exit on failure
>    - remove object from cache
>
>Fairly straightforward. The main wrinkle with this is
>fetch_group/fetch_iterator. I modified fetch_group a while back to grab
>the objects in one pass rather than grabbing the IDs of the objects and
>just calling fetch() for each of them.
>
>The latter is cleaner because we can have the caching and other logic in
>one place (fetch()), but it's a very inefficient use of a database --
>and I normally don't even think about speed :-)
>
>We could make this optional if you're using a cache.... But while I
>haven't thought about this much, my gut feeling is that the current
>design does not support this readily.

Another thing about fetch_group (and I assume fetch_iterator, which I
haven't yet used) is that things get uglier when you start dealing
with inheritance like we do in ESPOPS. You almost have to go back to
getting IDs and then calling fetch (ESPOPS fetch_group still does
this) since the objects returned may not all be of the same class.

E.g., if I want to fetch all Vehicles owned by Bob, and he owns a Car
and a Boat, I want the objects returned to be of type Car and Boat,
not just of type Vehicle (the base class used to do the fetch_group).
I suppose you could make it smarter and fetch all of the Vehicles in
one pass and then only re-fetch the objects which belong to a
sub-class.

But yeah, fetch_group is likely to complicate the caching picture a
bit, though hopefully not too much.

>It's not inconceivable that much of the security/caching stuff will get
>moved up a level, with the actual persistence getting called through
>callbacks.

Hmmm ... now that sounds interesting.

>> The only complicated thing is keeping a reference count so that the
>> object gets destroyed when the reference in the cache is the only
>> one left.
>
>This is something I'd rather not deal with :-) If the Cache::* framework
>doesn't support this then I think we'd be better off working on that and
>making SPOPS work well with it.

I think I agree.

>Caching is one of those things that I really want to implement but I
>never get around to it because I don't have a pressing need. I also have
>a feeling it will necessitate some redesign (as mentioned above) which,
>while not a bad thing, can be time consuming :-) (OTOH, there'd be the
>benefit of having more users and developers around...)
>
>Isn't the ambivalence oozing off this email? :-)

Yeah, you were supposed to have thought through all of this stuff and
have definitive answers by now :-)

>So if you guys have ideas, I'm all ears. Particularly if they're as
>well-written and thought-out as the object relationship stuff.

Thanks ... I'm not likely to have the time to do anything quite that
complete for caching in the near future, but if I do have more
concrete ideas, don't worry ... I'll pass them along :-)

-- 
 Ray Zimmerman  / e-mail: rz...@co... / 428-B Phillips Hall
  Sr Research  /   phone: (607) 255-9645  /  Cornell University
   Associate  /      FAX: (815) 377-3932 /   Ithaca, NY 14853
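The "fetch all of the Vehicles in one pass and then only re-fetch the sub-class objects" idea from this message might look roughly like the sketch below. It is Python for illustration only and does not show how ESPOPS works: the "class" column in the base table, the EXTRA_COLUMNS lookup, and every class and function name are assumptions made for the example.

```python
class Vehicle:
    """Base class; all names here are hypothetical, not ESPOPS classes."""

    def __init__(self, row):
        self.id = row["id"]
        self.owner = row["owner"]


class Car(Vehicle):
    def __init__(self, row):
        super().__init__(row)
        self.doors = row["doors"]


class Boat(Vehicle):
    def __init__(self, row):
        super().__init__(row)
        self.length_ft = row["length_ft"]


CLASSES = {"Vehicle": Vehicle, "Car": Car, "Boat": Boat}

# Base table: shared columns plus an assumed "class" column.
VEHICLE_ROWS = [
    {"id": 1, "owner": "Bob", "class": "Car"},
    {"id": 2, "owner": "Bob", "class": "Boat"},
    {"id": 3, "owner": "Bob", "class": "Vehicle"},
    {"id": 4, "owner": "Ann", "class": "Car"},
]

# Subclass-only columns keyed by id, standing in for subclass tables.
EXTRA_COLUMNS = {1: {"doors": 4}, 2: {"length_ft": 22}, 4: {"doors": 2}}

# Counts per-object re-fetches so the saving over "fetch every id" shows.
stats = {"refetches": 0}


def fetch(cls, base_row):
    """Per-object fetch: loads the subclass columns for one id."""
    stats["refetches"] += 1
    return cls({**base_row, **EXTRA_COLUMNS[base_row["id"]]})


def fetch_group(owner):
    """One pass over the base table, re-fetching only subclass rows."""
    results = []
    for row in VEHICLE_ROWS:
        if row["owner"] != owner:
            continue
        cls = CLASSES[row["class"]]
        if cls is Vehicle:
            results.append(Vehicle(row))  # base row is already complete
        else:
            results.append(fetch(cls, row))
    return results


print([type(v).__name__ for v in fetch_group("Bob")])
```

Because the subclass objects still go through fetch(), that is also where a cache lookup could be placed, while plain Vehicle rows would need the cache check inside fetch_group itself.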