Greetings:

This is simply my $0.02 worth...

If this is so experimental in concept and use, then:

1) create a clone of trunk in a new repository
2) apply patches
3) create a good backup of your data (a *.gpkg file)
4) cd to the clone
5) start Gramps from the clone

I do not know if this is a stupid idea or not.

Sincerely yours,
Rob G. Healey

On Mon, Dec 28, 2009 at 2:07 PM, Gerald Britton <gerald.britton@gmail.com> wrote:
Hmmm.  So let's see if I am following this correctly:

1. We have a working data model today that is mostly monolithic and
groups data into the seven primary objects we are familiar with.

2. Due to the requirements of the bsddb backend, we use
pickle/unpickle to create valid data for bsddb to read and write.

3. In order to prepare the data for pickle/unpickle, we have
serialize/unserialize functions for every object type that we store.

4. When Doug built up the Django implementation, he defined a schema
for gramps data, then built methods to move the data from the seven
primary objects into tables in the new schema and back again.

5. We're beginning to see that item 4 can lead to unnecessary overhead
when one only wants a piece of some object.

6. Because of 5, we're here brainstorming ways to only extract data
for presentation when needed -- thus lazily.

7. This is leading us to question the structure as a whole and begin
to think of ways to fit a proper relational schema into the current
codebase.

Is that about right?  If so, I have no objection to experimenting in
this direction.  I only want to be sure that the end result is
properly "aged" and completely thought through -- from new databases
to imports and exports to views and editors to reports and tools.

Let's just take the time to do it right!  Otherwise it really boils
down to the tail wagging the dog, so to speak.

On Mon, Dec 28, 2009 at 4:04 PM, Benny Malengier
<benny.malengier@gmail.com> wrote:
> Brian, Gerald,
>
> I sometimes think you are so focused on Gramps as it is today that you
> do not see the shortcomings or the ugliness in some of its design.
>
> Take serialize and unserialize in every gen.lib object. This is a
> clear connection between the database and our objects, and serves only
> one purpose: fast conversion from the bsddb layer to the objects.
>
> Both methods are ugly in the gen.lib API, as they serve something that
> people who work with the objects have no reason to care about.
>
> Now, with django, we also have an SQL backend that stores our data.
> Not a fake one as Gerald is suggesting, but a real one. Serialize and
> unserialize don't work there; there is no reason to even have those
> methods. Trying to fit the SQL backend into serialize/unserialize is
> like trying to fit an elephant into a closet.
>
> We need something else. My suggestion is to remove
> serialize/unserialize from the objects in gen.lib and move them to an
> "engine", which is the link between the database and our objects. This
> allows for 'delayed access', since our objects in gen.lib are much more
> than a single 'dataobject' in the data model (also in the eyes of our
> users and developers).
>
> Now, where would the speed increase and advantage of this change be
> for the present bsddb backend?
>
> 1/ If you e.g. filter on family name, you first obtain a person, then
> the name. Everything is unserialized, meaning a MarkerType and a
> PrivacyType are also created. The unserializing itself is fast, but this
> useless class instantiation is 'expensive'. So this would give a small
> benefit: negligible for the editors, but very visible in some tools.
>
> 2/ For the views, some things are too slow, so object creation is
> skipped and only unserialize is done, followed by a lookup into the raw
> data, something like data[2][0]. That is unreadable code, as you need
> to know that index 2 is e.g. the media list, and 0 then the first item.
> This is done in the code because of 1/ above (see the small example
> below).
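> To make the contrast concrete, roughly (the index 2 here is only an
> example; the real tuple layout is whatever serialize produces):
>
>   # what the views do today, skipping object creation:
>   data = db.get_raw_person_data(handle)     # unpickled tuple
>   first_media = data[2][0]                  # you must know 2 == media list
>
>   # what delayed access would let us write instead:
>   person = db.get_person_from_handle(handle)
>   first_media = person.get_media_list()[0]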
>
> Those are small benefits we could do without. For an SQL backend,
> however, doing it differently is a must. As there are now people who
> want to work on it, we should tackle this now.
>
> So the question is: how do we keep the design of the gen.lib objects as
> they are today, while serialize/unserialize are _not_ a direct part of
> the objects? My answer to this is:
>
> 1/ The database is called to create an object, e.g. a person:
> db.get_person_from_handle. Today the database has it easy: it calls the
> unserialize function of the object. With no unserialize in the object,
> this must be done differently.
>
> 2/ The database instantiates the empty object, passing the unpickled
> data, and the object uses an engine to handle that data. My idea is an
> engine that holds a pointer to the database, fills in the direct data,
> and for the subobjects only sets the key:
>  p = Person(datamap)
> where Person, working on the bsddb engine, does:
>  (self.handle, self.id, ...) = Person.engine.unserialize_person(datamap)
> whereas the engine of the djangodb only stores keys.
>
> 3/ When the object is asked for an attribute of a subobject, the object
> creates the subobject via the engine if it is not present yet, and then
> returns the required value (hence the name 'delayed access'). For this
> the object uses the pointer to the database and the key that is known.
>
> Specifically for bsddb:
> 1/ db.get_person_from_handle -> unpickle and pass the data to the
> Person object.
> 2/ The engine unserializes all the data and assigns it to fields; for
> the type fields it sets the key to the unserialized type, but does not
> yet build the Type classes.
> 3/ When a type is requested, e.g. person.marker, a MarkerType is
> created, assigned the values from the key, and the marker is returned
> (a rough sketch follows below).
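> A very rough sketch of what I mean (class and attribute names here are
> only illustrative, not a finished API):
>
>   class BsddbEngine(object):
>       """The single link between bsddb and the gen.lib objects."""
>       def __init__(self, db):
>           self.db = db
>       def unserialize_person(self, datamap):
>           # hand back the direct fields; the marker stays a raw key,
>           # no MarkerType is built yet (positions are illustrative)
>           handle, gramps_id = datamap[0], datamap[1]
>           marker_key = datamap[-2]
>           return handle, gramps_id, marker_key
>
>   class Person(object):
>       engine = None       # set to BsddbEngine(db) or DjangoEngine(db)
>       def __init__(self, datamap):
>           (self.handle, self.gramps_id,
>            self._marker_key) = Person.engine.unserialize_person(datamap)
>           self._marker = None
>       @property
>       def marker(self):
>           # delayed access: only now is a MarkerType instantiated
>           if self._marker is None:
>               self._marker = MarkerType(self._marker_key)
>           return self._marker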
>
> For django this is much more complicated:
> 1/ db.get_person_from_handle -> the django model obtains the
> model.person object and passes it to the Person object.
> 2/ The engine, using the django model, sets the person data; all
> subobjects like noteref or mediaref, and objects of which only the key
> is known (privacykey, markerkey, ....), are not yet retrieved.
> 3/ If the data is actually needed, the engine is used to obtain the
> rest of the data (see the second sketch below).
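> For django the Person sketch above would stay the same; only the engine
> differs. Roughly (model and field names here are guesses, not the real
> web models):
>
>   class DjangoEngine(object):
>       def __init__(self, db):
>           self.db = db
>       def unserialize_person(self, model_person):
>           # keep only the keys; no subobject query is issued here
>           return (model_person.handle, model_person.gramps_id,
>                   model_person.marker_id)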
>
> I think that is a clean way to handle this, and for bsddb it will also
> put the link between bsddb and our gen.lib objects in one single class,
> the bsddb engine.
>
> The start of a patch is in the attachment, but it is just a proof of
> concept: (un)serialize is still part of the objects, and the patch is
> against rev 13409 in the geps013 branch, so it is old. I stopped with
> this attempt when the changes in trunk were too large to be merged back
> into geps013.
>
> Benny
>
> 2009/12/28 Doug Blank <doug.blank@gmail.com>:
>> On Mon, Dec 28, 2009 at 1:55 PM, Gerald Britton <gerald.britton@gmail.com>
>> wrote:
>>>
>>> Would I be mistaken if I said that the problem you are trying to solve
>>> only exists because of the move to a relational structure?
>>
>> No, you would not be mistaken.
>>
>>>
>>> If so,
>>> why worry about lazy unserialization in the main code?  You might wind
>>> up with more efficient use of the relational structure at the cost of
>>> additional overhead for local BSD databases.  I really think we should
>>> take a giant step back here and ponder the future of persistent
>>> storage for gramps data.  Where should it be heading?   Would we move
>>> to rel db because of theoretical niceties or is that really the _only_
>>> way to deliver some specific and tangible benefit to the user?
>>
>> There are pros and cons, but there are a lot of tools for dealing with
>> relational data. Dealing with the db as a clearly defined layer (with
>> schema) is a bonus. Swapping out database backends, with scalable
>> replacements is a bonus. Integration with other relational tools is a bonus.
>> You could write some tools in some other language, once you free the data
>> from DbBsddb. Collaboration, multi-user, cross-platform, backwards and
>> future compatibility are all bonuses. None of this is theoretical.
>>
>>>
>>> Also:
>>>
>>> Before you write the create() method you propose, consider that the
>>> pickle protocol recognizes and calls two "magic" methods, __getstate__
>>> and __setstate__, that are analogous to our current
>>> serialize/unserialize methods.  I suppose that the magic methods were
>>> not available when Don began the project (that was Python 1.5 IIRC).
>>> If we were to use these as intended, we would not need the
>>> serialize/unserialize functions as they exist today, since __getstate__
>>> and __setstate__ would be called automatically from the pickle
>>> load/dump methods.
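>>> Roughly (just a sketch; the state passed around would have to be the
>>> same tuples that serialize/unserialize use today):
>>>
>>>   import pickle
>>>
>>>   class Person(object):
>>>       def __getstate__(self):
>>>           # called by pickle.dump(s) instead of grabbing __dict__
>>>           return self.serialize()
>>>       def __setstate__(self, state):
>>>           # called by pickle.load(s) instead of restoring __dict__
>>>           self.unserialize(state)
>>>
>>>   # blob = pickle.dumps(person); person2 = pickle.loads(blob)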
>>
>> Thanks, I didn't know about getstate/setstate. That might clean up the
>> interface a little; I presume that pickle.dumps(Person()) would just do the
>> right thing. However, either way, it looks like I'd have to extend every
>> gen.lib object to override the state methods. Benny's proposal would put an
>> engine in the middle of this so that one wouldn't have to duplicate all of
>> gen.lib.
>>
>> I don't see a way to reconcile these differences without implementing
>> Benny's engine idea, or just having two interfaces.
>>
>> -Doug
>>
>>>
>>>
>>>
>>> On Mon, Dec 28, 2009 at 1:29 PM, Doug Blank <doug.blank@gmail.com> wrote:
>>> > On Mon, Dec 28, 2009 at 12:49 PM, Gerald Britton
>>> > <gerald.britton@gmail.com>
>>> > wrote:
>>> >>
>>> >> I'm interested in lazy evals in general, but like Brian I wonder if we
>>> >> really have a problem to solve.  I also wonder why we don't just store
>>> >>  the seven primary objects by handle in Django the same way we do in
>>> >> BSDDB.  What is the advantage, especially in light of this thread, of
>>> >> breaking out the objects into individual tables?  Surely Django can
>>> >> take a pickled object and store it by the handle we already use in
>>> >> gramps?  Can we not just treat Django like a storage engine and use it
>>> >> as we use BSDDB?  Granted we would not be exploiting Django's true
>>> >> power, but it might be easier for our project to use it that way.
>>> >> Should we be modifying gramps to conform to Django's models or looking
>>> >> for a way to use Django to store the data in the form we already know?
>>> >>
>>> >
>>> > Gerald,
>>> >
>>> > Good questions, and they get to the heart of the issues. We have a
>>> > problem to solve only if we want to use a relational database. If we
>>> > don't want to allow a relational database to effectively use gen.lib,
>>> > there is no reason to talk about any of this.
>>> >
>>> > Replicating the hierarchical structure of Gramps data in a relational
>>> > database would be an effective way to use gen.lib, but crazy on all
>>> > other counts. It wouldn't be able to use any of the abstract,
>>> > model-based parts of Django, which includes most of the reasoning for
>>> > using Django in the first place.
>>> >
>>> > Gramps-Connect is using Django to exploit the power of relational
>>> > databases, so, no, we won't be replicating the Gramps hierarchical
>>> > structure.
>>> >
>>> > gen.lib is pretty tightly integrated with a pickle-based, hierarchical
>>> > storage backend. I was hoping to be able to take advantage of the
>>> > Proxy database wrappers, but it looks like it is just going to be too
>>> > expensive from Django without some major plumbing changes (as per
>>> > Benny's outline).
>>> >
>>> > As we are hoping to have a read-only version of Gramps-Connect out
>>> > about the same time as Gramps 3.2, I think we'll have two interfaces
>>> > for Gramps-Connect: a direct version replicating proxy logic, and the
>>> > DbDjango interface for running reports. The second will be slow, but
>>> > can be generated in the background, and the results emailed, or put
>>> > into a results queue on the web.
>>> >
>>> > If no one objects, I think I will add the method create() for all
>>> > gen.lib objects that wraps the Object().unserialize(data) idiom. That
>>> > will be a more abstract way for gramps core developers to create
>>> > gen.lib objects, and useful to overload.
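>>> > Roughly (only a sketch of the idea, not the final signature):
>>> >
>>> >   class PrimaryObject(object):        # sketch: the shared base class
>>> >       @classmethod
>>> >       def create(cls, data):
>>> >           # wraps the Object().unserialize(data) idiom in one place,
>>> >           # so DbDjango (or any backend) can overload how objects
>>> >           # are built from its own data
>>> >           obj = cls()
>>> >           obj.unserialize(data)
>>> >           return obj
>>> >
>>> >   # usage: person = Person.create(db.get_raw_person_data(handle))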
>>> >
>>> > -Doug
>>> >
>>> >>
>>> >> On Mon, Dec 28, 2009 at 11:19 AM, Doug Blank <doug.blank@gmail.com>
>>> >> wrote:
>>> >> > On Mon, Dec 28, 2009 at 10:40 AM, Brian Matherly
>>> >> > <brian@gramps-project.org>
>>> >> > wrote:
>>> >> >>
>>> >> >> Hi Doug,
>>> >> >>
>>> >> >> Interesting stuff.
>>> >> >>
>>> >> >> > Benny and I had discussed the notion of applying lazy evaluation
>>> >> >> > a few months ago, and I thought I'd make a little experiment to
>>> >> >> > see how this would work. I've put a patch on:
>>> >> >> >
>>> >> >> > http://www.gramps-project.org/bugs/view.php?id=3476
>>> >> >> >
>>> >> >> > but this is highly experimental. I'll explain briefly the ideas
>>> >> >> > behind this.
>>> >> >>
>>> >> >> <snip>
>>> >> >>
>>> >> >> > Whenever you try to access some part of the lazy object, it will
>>> >> >> > unserialize itself and give you the right part. With this in
>>> >> >> > place, the object uses far less memory, doesn't unserialize
>>> >> >> > anything it doesn't need to, and doesn't do all of the work in a
>>> >> >> > constructor---unless it has to. (You can also turn this on and
>>> >> >> > off by redefining lazy as a function that returns the immediate
>>> >> >> > evaluation of the function on the arguments---useful to see how
>>> >> >> > many times you unserialize a particular type in a session.)
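>>> >> >> > (The shape of it, not the actual patch, is roughly: a lazy()
>>> >> >> > helper wraps the unserialize call and only runs it on first
>>> >> >> > use.)
>>> >> >> >
>>> >> >> >   def lazy(func, *args):
>>> >> >> >       """Defer func(*args) until the value is first requested."""
>>> >> >> >       cache = []
>>> >> >> >       def thunk():
>>> >> >> >           if not cache:
>>> >> >> >               cache.append(func(*args))   # unserialize happens here
>>> >> >> >           return cache[0]
>>> >> >> >       return thunk
>>> >> >> >
>>> >> >> >   # to switch it off (and count calls), redefine lazy to evaluate
>>> >> >> >   # immediately:  def lazy(func, *args): return func(*args)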
>>> >> >>
>>> >> >> I feel like you are always quick to come up with complicated
>>> >> >> solutions to problems that I don't even know exist. What exactly is
>>> >> >> the problem you are solving? Is access to the Gramps database too
>>> >> >> slow for some use case you have? Does this proposal actually solve
>>> >> >> that problem?
>>> >> >
>>> >> > I assure you that I am being as lazy as I can be :) The issue is
>>> >> > reducing the amount of work in database lookups. This manifests
>>> >> > itself in unserializing unnecessary stuff in DbBsddb and in doing
>>> >> > too many queries in a relational layout, as per DbDjango. This isn't
>>> >> > that big of an issue (if one at all) for DbBsddb; it might be faster
>>> >> > using lazy evals. On the other hand, this is a killer for DbDjango.
>>> >> >
>>> >> >>
>>> >> >> With regard to your particular proposed implementation, the whole
>>> >> >> e-mail would have been much more useful if you had provided
>>> >> >> benchmarks for us. Personally, I challenge your whole premise that
>>> >> >> lazy access will actually be faster for most users. Accessing a
>>> >> >> database is expensive. The more data you can get while you're in
>>> >> >> there, the better. Consider the EditPerson dialog. It can get
>>> >> >> pretty much everything it needs with one hit to the database. With
>>> >> >> your proposed solution, the EditPerson dialog will certainly feel
>>> >> >> slower (unless I'm misunderstanding something). Unserializing data,
>>> >> >> on the other hand, isn't that terribly expensive because it uses
>>> >> >> the Python C implementation of the unpickle function.
>>> >> >>
>>> >> >
>>> >> > Yeah, I was too tired last night to even try what I wrote. But see my
>>> >> > previous benchmarks on numbers of queries from DbDjango.
>>> >> >
>>> >> >>
>>> >> >> If you want this idea to grow some legs, it would be great if you
>>> >> >> could run some benchmarks with the following use cases:
>>> >> >>
>>> >> >> 1) Accessing a lot of data about a particular object (as in the
>>> >> >> case of the EditPerson dialog)
>>> >> >>
>>> >> >> 2) Accessing about half of the data about a particular object (as
>>> >> >> in the case of a typical report that doesn't display all possible
>>> >> >> information).
>>> >> >>
>>> >> >> 3) Accessing only one piece of data about a particular object (as
>>> >> >> in the case of a simple report or Gramplet).
>>> >> >>
>>> >> >> Until the benchmark data is available, you are only tempting people
>>> >> >> to fall into a bad habit of premature optimization
>>> >> >> (http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize)
>>> >> >>
>>> >> >
>>> >> > No problem in not prematurely optimizing here! But seeing Gramps
>>> >> > data in a relational layout makes one see very different problems.
>>> >> > I'll try to give more background.
>>> >> >
>>> >> > (BTW, I now have a DbDjango interface that is 100% lazy. Only uses
>>> >> > the data when it needs it.)
>>> >> >
>>> >> > -Doug
>>> >> >
>>> >> >>
>>> >> >> ~Brian
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Gerald Britton
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Gerald Britton
>>
>>
>>
>>
>



--
Gerald Britton
