2010/12/2 Nick Hall <nick__hall@hotmail.com>

Tim Lyons wrote:
> Gerald Britton-2 wrote:
>> Except that it rapidly becomes unmanageable.  A census is a single
>> source with millions of entries, each of which will have tens of
>> attributes.  Source data key/value pairs are completely insufficient
>> for this purpose.  You would have to make up keys that include some
>> identification information for each census entry you want to record.
>> So, for some census, you would have:
>> JohnDoe_House: brick
>> NancyDrew_House: stone
>> etc.  Since you might have hundreds of members of your family tree in
>> the census, this is clearly unworkable.  Unless you abuse the notion
>> of a Source and make one up for every census entry, that is.  (A
>> method I cannot accept)
>> This is exactly where my suggestion to create Source Reference
>> attributes arises.  And, it is not limited to census data.  It applies
>> equally well to birth, marriage and death registers and many other
>> similar, bulk sources, that contain many columns of data on hundreds,
>> thousands or even millions of individuals.
>>  --<snip>
>> Difficult to search, impossible to compare, except for the most basic
>> data.  Remember that I have censuses with up to 140 key/value pairs
>> per individual.
> I want to argue very strongly for the addition of media and if wanted
> attributes to Source References, and for Source References to become first
> class objects so that they can be shared.

Yes, I have often thought that this would be a good idea.

And I think it is a very, very, very bad idea.
I think many of us mean the same thing but use different words, and use the same words for different things. That does not allow discussion, so let's define a unique vocabulary first and stick to it for the discussion.

Let me try to make some things clear.
You have a person, and you have a thing to store the data of a source. Between the two, you will always have something to store data about the relationship between them. At least that is how I was taught it when doing my master's in applied informatics, and how I have seen it used everywhere I have worked. That is what a source reference is in Gramps today. As a logical conclusion, if we make source references shared, we need a new, unique, non-shared object between the two. A source reference reference? ;-)
Anyway, the word reference in Gramps _always_ indicates the relationship information between two objects, so don't use the word for things you want as a core object!

The discussion will always return to the same thing, how to handle large sources in Gramps. I already discussed that when I started using Gramps, but the admins then did not really think the problem required changes.
In my book, it is best to mimic reality as closely as possible (with a minimum of complexity). So, think outside the box, and change how some objects behave.

My suggestion would be:

Source (Data=publication information) --> Source Content <----- source-object-relation ---> object

Our source-object-relation is the object storing the unique relationship, and we call that at present "source reference".
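
As a rough illustration of the model above (a sketch only, not how Gramps stores things today): the class and field names below are hypothetical, and putting the page number on the source content and a confidence level on the relation are assumptions made just for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Source:
    """The publication itself, e.g. a whole census."""
    title: str
    data: Dict[str, str] = field(default_factory=dict)  # publication information


@dataclass
class SourceContent:
    """One shared piece of a source, e.g. a single census entry."""
    source: Source
    page: str = ""  # assumption: page lives here, not on the relation


@dataclass
class SourceRef:
    """The unique, non-shared relation between an object and source content."""
    content: SourceContent
    confidence: str = "normal"  # assumption: per-relation information


@dataclass
class Person:
    name: str
    source_refs: List[SourceRef] = field(default_factory=list)


census = Source("Example census", {"publisher": "hypothetical"})
entry = SourceContent(census, page="Folio 12")

# Two people point at the same entry, each through their own relation object.
john = Person("John Doe", [SourceRef(entry)])
nancy = Person("Nancy Drew", [SourceRef(entry, confidence="low")])

# Correcting the page once is seen by everybody who uses the entry.
entry.page = "Folio 12, page 3"
```

With this layout the per-person information (here the confidence) stays independent, while the content of the census entry exists only once.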

The question then becomes:
1/ is the above suggestion good enough?
2/ what data must be stored under which object? E.g., does it really make sense to store media in the relationship object? Is the page number not a part of the source content object? Should there still be something stored under the source reference?

What Tim calls a shared source reference is what I would call source content. What Nick calls Citation I call Source Content. If we list what attributes should be stored under which object, it will be more clear what the best name for the object is.

One can make things more complicated with an object model as:

Source (Data=publication information) --> Source Content <----- deduction-process ----> Information <--- information-object-relation ---> object

But I do not think our users would have the time to actually work like that. An important constraint for Gramps is that it must still be easy to create a family tree, even if that means giving up on mimicking reality as closely as possible.

I don't mind changes to the core of Gramps, but they must be _completely_ worked out so we are certain things are sound. So let's discuss, but in the end, first a GEP must be made with a full documentation of all the changes to make working with large sources a joy in Gramps.

So, the next step would be for somebody to list the objects above, indicating which of our present fields go with which object, and which extra fields might be needed.

To end the definition of vocabulary: Attributes are Data with a source (and notes, but those are less important). Data are key/value pairs. As a source does not link to another source, it has Data, not Attributes. Feel free to come up with better names.
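
The vocabulary above could be sketched as follows. This is only an illustration: the class names, the use of inheritance, and the handle strings are assumptions, not the existing Gramps classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Data:
    """A plain key/value pair, with nothing else attached."""
    key: str
    value: str


@dataclass
class Attribute(Data):
    """Data that additionally carries a source (and notes)."""
    source_handles: List[str] = field(default_factory=list)  # hypothetical handles
    notes: List[str] = field(default_factory=list)


# A source cannot cite another source, so it only holds Data.
publication_info = Data("Publisher", "hypothetical publisher")

# An object such as a person holds Attributes, which can be sourced.
house = Attribute("House", "brick", source_handles=["S0001"])
```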


You are suggesting that we convert existing Source References into new
primary "Citation" objects. Where we now have a Source Reference, it
would become a reference to a Citation. A Citation would contain a
reference to a Source.

A Citation object could contain Attributes and Media objects.
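
A minimal sketch of what such a conversion might look like, assuming (purely for illustration) that objects and source references are plain dictionaries and that handles are simple strings. None of the names below are actual Gramps API.

```python
import itertools


def convert_source_refs(objects):
    """Move embedded source references into a separate citation table.

    objects maps an object handle to its list of embedded source
    references, each a dict with 'source' (a source handle) and 'page'.
    Returns (citations, citation_refs): the new primary table and, for
    every object, the list of citation handles it now refers to.
    """
    counter = itertools.count(1)
    citations = {}
    citation_refs = {}
    for obj_handle, source_refs in objects.items():
        citation_refs[obj_handle] = []
        for sref in source_refs:
            handle = "C%04d" % next(counter)  # hypothetical handle format
            citations[handle] = {
                "source": sref["source"],  # a Citation refers to a Source
                "page": sref["page"],
                "attributes": [],          # new: attributes on the citation
                "media": [],               # new: media on the citation
            }
            citation_refs[obj_handle].append(handle)
    return citations, citation_refs


objects = {
    "I0001": [{"source": "S0001", "page": "Folio 12"}],
    "I0002": [{"source": "S0001", "page": "Folio 12"},
              {"source": "S0002", "page": "p. 3"}],
}
citations, citation_refs = convert_source_refs(objects)
```

In this sketch every existing source reference becomes exactly one citation, so the citation table starts out as large as the total number of source references. Merging identical citations so that they are really shared would be a separate step.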

We would create a Citation View, Editor and Selector. Existing editors
that create/edit/delete a Source Reference would need updating to
add/select/remove a Citation.

I think that this has been discussed and rejected in the past, but I'm
not sure why. The Citation table would be large, which may have been a
factor. Does anyone know anything about this?


> At present if you treat a "large source" as a single source, then you have
> problems with managing (manually and by conventions) the components of that
> source. You can store transcripts of parts of the source in Source
> References, but they are copied on each object that refers to them so are
> independent, which causes problems for finding them and updating them. You
> can't store attributes, and you can't store media on the source reference.
> Storing a transcript on a source reference as a shared note does avoid some
> problems, but the information on the source reference is still independent,
> so you can get different representations of the page number in what should
> be the same source reference. As Gerald says, using source data for key
> value pairs rapidly becomes unmanageable, and the data he wants to store are
> actually properties of the source reference.
> At present, if you treat each component as a separate source, then you have
> an unnatural breakdown that causes difficulties in managing the links to
> repositories. It also does not respect the Page number that is a built-in
> property of source references. I agree with Benny that a source is something
> like a book, so this is not the appropriate model. As Gerald says "Unless
> you abuse the notion of a Source and make one up for every census entry,
> that is.  (A method I cannot accept)"
> If you change Gramps so that you treat a 'large source' as a single source,
> but allow media links and if you want, attributes, and they are shared, then
> you don't need any conventions as to how to manage things like pages from
> that source. Each separate thing that is referenced is a separate source
> reference, and these can be shared.
> One of the points Gerald made in his earlier postings was that limitations
> arose from our adherence to GEDCOM. However, in fact, GEDCOM provides both
> multi-media links as well as notes in Source Citations, which are the exact
> equivalent of Source References. I'm afraid I don't understand the
> distinction that is being drawn between 'attributes' and 'data' - it seems
> to me to be rather artificial. I don't think it's important from a
> theoretical point of view whether the things you use for storage in a source
> reference are notes, or attributes or data. Whatever is most convenient for
> the user to use. Similarly, I don't think it is important that 'Data' is not
> exported to GEDCOM. This is surely just a feature of the current code.
> GEDCOM source citations have notes and 'text from source', so I would expect
> data, attributes and notes all to be exported to suitable notes within
> GEDCOM in due course.
> The fact that there have been many previous discussions along these lines in
> the Gramps lists does seem to me to indicate that there is something
> missing. I think that shared source references with media and attributes
> would meet most of the needs.
> I would hope that making source references shared would be more or less
> transparent to most of the screens and reports in Gramps. Obviously you
> would need a new category of display, and you would need to allow the user
> to choose an existing source reference as well as create a new one.
> Technically you would indeed need a source reference reference, but that is
> only a technically awkward name. I imagine that one could even hide the link
> to the intermediate source reference object so that a prototype of shared
> source reference objects might not even need to change most of the rest of
> the code which could continue to refer to a source reference and get to it
> through the s-r-r transparently - just a suggestion - I don't know enough
> about it to know whether it is really feasible or even desirable.
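
For what it is worth, the "transparent s-r-r" idea in the last quoted paragraph could be prototyped in Python by delegating attribute reads. This is only a sketch under assumptions: both class names and all fields are hypothetical, and it says nothing about whether this is feasible in the real code base.

```python
class SharedSourceRef:
    """The shared object: one per referenced page or entry of a source."""

    def __init__(self, source_handle, page):
        self.source_handle = source_handle
        self.page = page


class SourceRefRef:
    """Per-object link (the "s-r-r") that reads through to the shared one."""

    def __init__(self, shared, privacy=False):
        self._shared = shared
        self.privacy = privacy  # example of data that stays per-link

    def __getattr__(self, name):
        # Only called when normal lookup fails, so 'privacy' and
        # '_shared' are found directly and never reach this method.
        if name == "_shared":
            raise AttributeError(name)  # guard against infinite recursion
        return getattr(self._shared, name)


shared = SharedSourceRef("S0001", "Folio 12")
a = SourceRefRef(shared)
b = SourceRefRef(shared, privacy=True)

# Existing code that only reads 'page' keeps working on the link object.
shared.page = "Folio 12, page 3"
```

Note that this only covers reads: assigning `a.page = ...` would set the attribute on the link, not on the shared object, so editors would still have to be aware of the shared object.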

Gramps-devel mailing list