From: Brian M. <pez...@us...> - 2005-12-14 14:20:09
|
I forgot about the fact that the person maintaines references to the source already. In that case, could the source also maintain back-references to each person that references it? That way you would not need a new table at all. The process to remove a source would be as follows: 1)Look up the source in the source table for each person referenced by the source: 2)Look up the person in the person table 3)Remove the source reference from the person 4)Remove the source from the source table That seems simple. What am I missing? ~Brian --- Alex Roitman <sh...@gr...> wrote: > I am not sure which message to respond to :-) > So I am replying to the last one so far, but I really > mean to continue a broader discussion. > > I just thought I'd clear the details of what we discussed > earlier today on IRC, and I think it addresses most of > the concerns. > > What is being proposed is the table with entries like this: > key: a source_handle, such as 'JHGSDFHSDF2342SJDHFS' > data: a (pickled) Set of the following form: > [ ('Person','IUN78VUYV786V87U'), > ('Place', 'OIU87TY65ADS98ADS'), > ('Family','WI0WF0WF564WF87FE'), > ('Person','blah'), > ('Event','blah2'), ] > and so on. I may have switched the type/handle order in a tuple, > but this is not really important. > > These would identify all the primary objects that > refer to the source. Adding a reference to this source > will set-add a tuple to this record (set-add meaning > that if such tuple already exists then there's no actual > addition -- union operation). Removing will have to be > smarter and make sure that referring object from which > the reference is removed does not have any remaining > references to that source. > > Same table would also hold referring objects' handles > for other references such as Media objects, Places, > Events, Person, and Family. Handles are unique, so > any handle can be the key without worrying of mixing > them up. Example: > > key: a media_handle, such as 'media_handle' > data: a (pickled) Set of the following form: > [ ('Person','234IU234N78VUYV786V87U'), > ('Place', '456OIU87TY65ADS98ADS'), > ('Source','WI0WF0WF564WF87FE'), > ('Person','blah234'), > ('Event','blah234'), ] > > Now, two important concerns are being addressed as follows: > 1. A person 'blah' may refer a given source 15 times. Smart > removal will skip removing the tuple from the set in the > above reference_map data unless the last reference is being > removed. > 2. A person 'blah' may refer the source not only from > the actual primary Person object, but also from its > secondary members such as Name, Address, Attribute, etc. > Currently this is handled by recursive checks through > all the members and their members etc. We will leave > this exactly as it is. This should not have any noticeable > performance penalty, because the primary object is > already in memory and no DB lookup is needed. In essence, > this would be just checking selected fields of selected > members. > > So Julio's concern that multiple references will be mixed up > seems not to really be a concern here -- all we need is > from 100000 people to identify 50 that refer to this source. > Then we can quickly search the 50 correct people for details > of specific refs (can be 15 per person, both from different > members as well as the Person object itself). If all 100000 > people refer to that source then we don't gain anything, > but it's tough luck, but then no other optimization can go > around that either. Our goal is to make search scale as O(K) > where K is the number of existing refs, as opposed to O(N) > where N is the database size. The proposed reference_map achieves > that goal. > > As for Brian's concern: I think you mis-placed the keys in > your example. When Person refers to the source, Person object > already stores source handle (along with more reference-related > data like page number etc). So for a given source we know > exactly which sources it refers. The problem is how to efficiently > do the opposite, i.e. given the source find people referring > it. At present we walk the db (bad). The proposed table would > only store such back references, so given the source we would > be able to identify the persons referring it with a single lookup. > > So, to delete a source we do a single lookup, find handles > of all N objects referring to the source, and then modify those > objects (clean up the refs) and delete the source. We need > as many DB lookups and edits as there are records referencing > the source, and this is a reasonable thing. Plus one initial > lookup and one final edit in a reference_map, for any number of > referring objects. In an alternative setup, we just transferring > same number of edits from Person table to this associate table: > each reference has to be found and deleted, right? > > Finally, Benny's suggestion of Person-Source reference table > would work just fine for the relational database. We're in a > situation both better and worse than that. Better because we > are not limited to a one-to-one relationship of relational > databases. Our Person object can hold as many references > to any number of source as needed, with no additional effort. > This is impossible in a relational db with the fixed number > of columns in a table, hence people usually resort to the > associated tables. We don't have to. > > Worse because we can't ask a simple SQL question of "give > me people referenced by source "blah" and have DB do the > work for us. The proposed reference_map would help us there. > For other kinds of queries we'd have to do some more footwork. > Either additional tables or smart indexes or something > else. > > In any case, having Person-Source table would be OK, except > (1) we don't need it for Person->Source references, and > (2) we would need similar tables for other reference types. > A single table with all the references is arguably better. > Especially since a single lookup is needed for any > reference-related query. Also, with the Person-Source table > multiple references become a problem. Finally, this means > rewriting/rethinking lots of reference-related code, > while the proposed setup is actually adding some code. > > Disclaimer: I know very little about databases, just what > I learned while hacking on gramps, plus some FoxPro hacking > over ten years ago :-) > > Alex > > -- > Alexander Roitman http://www.gramps-project.org > |