From: Doug B. <dou...@gm...> - 2009-07-15 14:21:20
On Wed, Jul 15, 2009 at 9:48 AM, Gerald Britton <ger...@gm...> wrote:

> On Wed, Jul 15, 2009 at 9:29 AM, Benny Malengier <ben...@gm...> wrote:
> > 2009/7/15 Gerald Britton <ger...@gm...>:
> >> Thanks for the feedback!  The motivation behind my idea is that I see
> >> many patterns like this:
> >>
> >>     for handle in self.db.iter_person_handles():
> >>         person = self.db.get_person_from_handle(handle)
> >>
> >> I attached some grep output showing just how common this is.  If I
> >> write an iter_people method, this becomes:
> >>
> >>     for handle, person in self.db.iter_people():
> >
> > I think you should make it:
> >
> >     for person in self.db.iter_people():
> >
> > after all, person.handle, or person.get_handle(), gives you the
> > handle then.
> >
> >> The other thing is that the iter_... methods at the bottom of the
> >> stack use the bsddb cursor, which returns a tuple (key, data) where
> >> data is pickled(serialized(object)) in our application.  That means
> >> that our cursor method already has the data and already unpickles
> >> it.  In the common pattern, the call get_person_from_handle goes out
> >> to bsddb again to retrieve the data, then unpickles it (again!) and
> >> finally unserializes it.  Using the iter_people method that I am
> >> proposing would eliminate this double work (it will be a measurable
> >> difference and probably perceivable on larger databases.)
> >>
> >> Back to cursor methods for the moment.  Some options:
> >>
> >> 1. Do nothing.  Since the proxy database access must get to a real db
> >> sometime as it dives down the stack, it will eventually hit the bsddb
> >> cursor calls anyway.
> >>
> >> 2. Just implement iter_people (and other object) methods.  Leave the
> >> cursor as is.  Modify existing code following the pattern above to use
> >> the new methods and encourage others to work that way.
> >
> > ok, but how to assure the iter only returns objects that are in the
> > proxy is the key I believe.
>
> yes, good point.  The way I'm working on proxybase, the iter_ methods
> look for predicates include_something and call these if defined to see
> if an object should be returned or not.  By default, the predicates
> cause all non-null objects to be returned.  I committed some
> experimental code to living.py, private.py and referenced.py to
> illustrate the idea, though it is by no means final.  I could also
> make the iter_ methods take a predicate as a parameter like this:
>
>     for handle, person in iter_people(lambda x: not x.get_privacy()):
>         # I only get people NOT marked private
>
> Of course the programmer could make the predicate do whatever is
> required to get a True or False result.  I got this idea from the
> itertools.ifilter function, which I really like (and is very fast --
> very well written it seems).
>

I've been watching the discussion on the database interface, and this all
looks great. I've been especially interested in seeing if these changes will
allow an easier switch of the backend (say, to a SQL engine). The selection
predicate is one place where (I suspect) a different backend could optimize
over what bsddb does. I'm not sure what we can do here to allow these
different kinds of backends. The above example of course implies that the
selection is done in Python after the data is retrieved from the database.

I thought I'd just mention it here in case you all have ideas. Thanks for
all this work!

-Doug

> > This method should only be present if, e.g. in privateproxy, really
> > only the non-private data is returned.
> > However, as you know, privateproxy sanitizes the person object before
> > returning it (to avoid e.g. a private event reference).
> > If you take care of all this in some easy manner, then by all means
> > add these methods. Otherwise, I don't see how it can be achieved.
> > Of course, if you add them to the base class, some people will want to
> > use them in the reports in the future, as they appear to be read only
> > ....
> >
> > Benny
> >> 3. Implement proxy cursors as well.  That would allow code written to
> >> use a cursor directly to work on both real and proxy databases
> >>
> >> 4. ?
> >>
> >> After thinking about it overnight, I'm leaning towards option 2.
> >> We'll then have both iter_something_handles and iter_somethings
> >> methods but will encourage folks to use iter_somethings unless they
> >> really need iter_something_handles.
> >>
> >> On Wed, Jul 15, 2009 at 4:10 AM, Benny Malengier <ben...@gm...> wrote:
> >>> 2009/7/15 Brian Matherly <br...@gr...>:
> >>>>
> >>>>> Here's something to consider:
> >>>>>
> >>>>> The "real" database has a cursor object for running through the
> >>>>> objects in a database.  Currently it serves up a tuple of (handle,
> >>>>> unpickled data) and is working fine.  Two things come to mind:
> >>>>>
> >>>>> 1. There is no cursor object for the proxy databases.  I think
> >>>>> that, for completeness, I should implement one.
> >>>>
> >>>> That's what I like about you. Not only do you suggest something that
> >>>> should be added, but you volunteer yourself to do the work at the
> >>>> same time :)
> >>>>
> >>>>> 2. What is the feeling about adding an option to the cursors to
> >>>>> request that the data be objectified instead of just unpickled?
> >>>>> That might let me do something like this:
> >>>>>
> >>>>>     with db.get_person_cursor(object=True) as cursor:
> >>>>>         for handle, person in cursor:
> >>>>>             # process the person object
> >>>>
> >>>> With "object=True", how would the function be different than your
> >>>> iter_* functions? I can't see how they would be different.
> >>>>
> >>>> With the addition of your iter_* functions, I would be interested to
> >>>> know if we even need the cursors any more. Do they still add real
> >>>> value? How hard would it be to remove them?
> >>>
> >>> First, speed, speed, speed.
> >>>
> >>> The serialize only functions will be quite a bit faster for loops
> >>> where only part of the data is needed and you can check on the tuples
> >>> to discard or not.
> >>> Combined with POS_ variables for all positions in the unpickled data,
> >>> this is powerful.
> >>>
> >>> For tool writers objectified might be handy, but write a clear
> >>> docstring that it gives a penalty!
> >>> If you have time, activate
> >>> http://www.gramps-project.org/docs/gen/gen_db.html so that you see
> >>> what the doc looks like.
> >>>
> >>> Second, iter works on a cursor no, so it cannot be removed, yes?
> >>>
> >>> Benny
> >>>
> >>> _______________________________________________
> >>> Gramps-devel mailing list
> >>> Gra...@li...
> >>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
> >>>
> >>
> >> --
> >> Gerald Britton
> >>
>
> --
> Gerald Britton
>
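
For anyone following along, here is a minimal sketch of the two points in
this thread: the double unpickle in the handle-then-lookup pattern, and an
iter_people that takes a predicate. It rests on loud assumptions: ToyDb and
Person are hypothetical stand-ins (a dict of pickled tuples instead of
bsddb), not the Gramps classes, and the unpickle counter exists only for
illustration. Only the method names iter_person_handles,
get_person_from_handle, iter_people and get_privacy come from the thread.
It is written for Python 3, where the built-in filter plays the role of
Python 2's itertools.ifilter.

```python
import pickle


class Person:
    """Hypothetical stand-in for a person object (not the Gramps class)."""

    def __init__(self, handle, name, private=False):
        self.handle = handle
        self.name = name
        self.private = private

    def get_privacy(self):
        return self.private

    def serialize(self):
        # Flatten the object to a plain tuple before pickling.
        return (self.handle, self.name, self.private)

    @classmethod
    def unserialize(cls, data):
        return cls(*data)


class ToyDb:
    """Hypothetical dict-backed table; values are pickled serialized tuples."""

    def __init__(self, people):
        self._table = {p.handle: pickle.dumps(p.serialize()) for p in people}
        self.unpickle_count = 0  # illustration only: counts unpickle calls

    def _unpickle(self, raw):
        self.unpickle_count += 1
        return pickle.loads(raw)

    def iter_person_handles(self):
        # Assumption from the thread: the cursor walk unpickles each record
        # even though only the key is handed back to the caller.
        for handle, raw in self._table.items():
            self._unpickle(raw)
            yield handle

    def get_person_from_handle(self, handle):
        # Second fetch and second unpickle for the same record.
        return Person.unserialize(self._unpickle(self._table[handle]))

    def iter_people(self, predicate=None):
        # One pass: unpickle once, build the object, then apply the filter.
        for raw in self._table.values():
            person = Person.unserialize(self._unpickle(raw))
            if predicate is None or predicate(person):
                yield person


db = ToyDb([Person("h1", "Ann"), Person("h2", "Bob", private=True),
            Person("h3", "Cy")])

# Common pattern: iterate handles, then look each person up again.
old_style = [db.get_person_from_handle(h) for h in db.iter_person_handles()]
old_cost = db.unpickle_count

# Proposed pattern: iterate objects directly, skipping private people.
db.unpickle_count = 0
public_names = [p.name for p in db.iter_people(lambda p: not p.get_privacy())]
new_cost = db.unpickle_count
```

Note that the predicate here still runs in Python after every record has
been fetched and unpickled, which is exactly the limitation raised above; a
SQL backend would presumably want to push that selection into the query
instead.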