From: Gerald B. <ger...@gm...> - 2010-11-29 17:09:00
I want to open up a discussion about how best to store data from large sources. By "large" I mean sources such as registers, logs, family bibles, censuses, member lists and other things that contain many entries (millions, in the case of a census). Usually, each entry in such a document has several columns of data. For example, a marriage register at a church will have names of the bride, groom, witnesses and possibly parents, date, officiating minister's name and other things. A death register may include cause of death, place and other things. A census may have all sorts of data, including whether the house was brick or frame and how many stories it had, if the person was an employer (and if so, how many employees he had) or employee, whether he was deaf, dumb, blind, crazy (or "lunatic") and how many sheep he had.

One challenge with this sort of data is the document in which it is found. If we treat it as a source (which seems natural to me) then we have a problem storing all the bits of data we find for one individual. You can't use source attributes, since those are shared in Gramps. We could treat the document as a repository and the entries as sources from that repository, but that seems unnatural when you are looking at a book or a bible or a microfilm. Also, since a repository can't have a repository (i.e. you can't nest them) and the real repository (building, web site, etc.) may house many such sources, how can you bring all these pseudo-repositories together under that real repository?

Another challenge is all the bits of data we find in this document. Some should no doubt find their way to other objects: the cause of death to a death event attribute; the witness names to a marriage event attribute; the house construction and size to a residence attribute, perhaps. However, it is still good (very good, I feel) to also keep all this data together and tie it to the source in which it is found.

The Census gramplet addresses this by using event reference attributes. So the fact that the person was a lunatic is recorded in an attribute of the event reference -- Lunatic: yes -- and similarly for the other attributes. This solves the immediate problem for censuses but may not be generally extensible to other documents -- especially if there are multiple documents for some event that disagree with each other. Furthermore, it is not clear to me that this is the best way to handle this data, since the data is not really an attribute of the event but of the source document: it was recorded in the source document at the time of the event.

I'm now wondering if we should add attributes to source references, analogous to event references. If available, this would be a natural place to store all the bits of data for each entry while keeping one source object (the book, film, etc.) at the repositories where it can be found. On the other hand, introducing source reference attributes may introduce challenges for GEDCOM exports and imports.

So let's discuss! What creative ways can we devise to handle these sorts of source documents? Should we extend our data model and, if so, how? If we extend the data model, what are the repercussions?

--
Gerald Britton
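The idea of attributes on source references could be sketched roughly as follows. This is only an illustration of the proposal, not Gramps code: the class names `Source` and `SourceRef`, the `attributes` field and the sample values are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """One shared object per document (book, film, census)."""
    title: str


@dataclass
class SourceRef:
    """Hypothetical per-entry reference that carries its own attributes."""
    source: Source
    page: str
    attributes: dict = field(default_factory=dict)


# One source object, many references, each holding the columns of one entry.
census = Source(title="Example census")
john = SourceRef(census, page="folio 12", attributes={"House": "brick", "Sheep": "4"})
nancy = SourceRef(census, page="folio 98", attributes={"House": "stone"})
```

Both references point at the same `Source`, so the document is still stored once, while the per-entry data stays with the reference that cites it.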
From: Benny M. <ben...@gm...> - 2010-11-30 08:49:01
2010/11/29 Gerald Britton <ger...@gm...>

> So let's discuss! What creative ways can we devise to handle these
> sorts of source documents? Should we extend our data model and, if
> so, how? If we extend the data model, what are the repercussions?

For me,

1. Repository is where you find a source. We should not misuse it.

2. Source is the book/registry, or a part of it. The source holds information, and literal transcripts of a source should hence be stored in this object. A source does _not_ have what we call attributes; a source has "Data". This is not exported to GEDCOM. The Data is not shared; the source is what is shared.

3. An event is something happening to a person/family at a certain time/place. The census event is the census taker passing by and writing info in the census source.

4. You learn information about a person or family from a source, so you want to add that information to the person/family object. You add this information, e.g. an attribute: Description, Blue eyes. The source of this attribute is the census source.

I don't see problems here, except for the fact that you can only store the census data in the source as a note if you want it stored. So there is no 'database scheme' for it. You can use Source Data for key-value pairs.

Now, the other way around. You have a person, and you see a source saying green eyes. You go to attributes and you see blue eyes. You wonder whether there is an error. You click on the attribute to see from what source you have this information, you open the census source, and you look at the data inside of it. If you used a note for the data in the census, you can share it in the source reference, and you know what the census said. If you are uncertain and you want to recheck the census, you go to the repository tab and you see where this census is stored, so that you can check the source again in the repository.

So, in all this, you normally _don't_ check the census event! It seems stupid to me to store data obtained in the census taking there. At most, I would share a note with the transcript there.

So, in my view, the way the census gramplet works is wrong.

Benny
From: Nick H. <nic...@ho...> - 2010-11-30 11:10:50
Benny Malengier wrote:
> For me,
>
> 1. Repository is where you find a source. We should not misuse it

I agree. The repository for a census might be "National Archives".

> 2. Source is the book/registry, or a part of it. The source holds
> information, and literal transcripts of a source should hence be
> stored in this object.

Yes. The source for a census might be "1851 England Census". You could store literal transcripts here, but you would have a large number of them. Wouldn't it be better to store them in a Source Reference, where you would only have the transcript of a page?

> A source does _not_ have what we call attributes, a source has "Data".
> This is not exported to GEDCOM. The Data is not shared, the source is
> what is shared.
>
> 3. An event is something happening to a person/family at a certain
> time/place. Census event is the census taker that passes and writes
> info in the census source.

Yes. A census event will in general have several people attached to it. It will also have a census source with the source reference containing a full reference to its page, and possibly a transcript. (On my ToDo list).

> 4. You learn from a source information about a person or family, so
> you want to add information about the person/family in the
> person/family object. You add this information, eg an attribute:
> Description, Blue eyes. Source of this attribute is the census source.

OK, this is where we have a problem. One of the reasons that I wrote the census add-ons is that it is common to get contradictory information. You want to record all this information against a Person/Census combination. The natural place to store this is as an attribute on the event reference object.

> I don't see problems here, except for the fact that you can only store
> the census data in the source as a note if you want it stored. So
> there is no 'database scheme' for it. You can use Source Data for
> key-value pairs.

I don't like the idea of using Source Data. Storing a transcript against a source and/or source reference as a shared Note is a good idea.

> Now, the other way around. You have a person, and you see a source
> saying green eyes. You go to attributes and you see blue eyes. You
> wonder if there is no error.

Good example. You might have added it from a census or from another source.

> You click on the attribute to from what source you have this
> information, you open the census source, and you look at the data
> inside of it. If you used a note for the data in the census, you can
> share it in the source reference, and you know what the census said.
> If you are uncertain and you want to recheck the census, you go to the
> repository tab and you see where this census is stored to check in the
> repository the source again.

Well, at this point you probably want to stop editing and run some reports to examine your data. The Census report is written just for this purpose - it allows you to compare all census data for a person in a structured way. Once you have evaluated your data you can go back and edit the record.

> So, In all this, you normally _don't_ check the census event!

It's not really a matter of checking an event. We want the data stored in a structured manner so that we can run reports to analyse the data.

> It seems stupid to me to store data obtained in the census taking there.

I was suggesting storing data such as "number of rooms" as attributes of a census event. Again, this is a natural place to store the data and allows convenient access for the Census report and Census editor.

> At most, I would share a note with the transcript there.

I would prefer for transcripts to be stored on the Source Reference rather than the Event. I only suggested storing an image on the Event because it is not possible to store it on the Source Reference.

> So, in my view, the way census gramplet works is wrong.

Well, I see it as: transcripts and images on the Source Reference or Source, maybe shared; data extracted from this source data on People, Families, Events.

Nick.
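The kind of structured comparison Nick describes, where contradictory census values for one person are lined up side by side, might look like the sketch below. It does not use the Census report or any Gramps API; the `find_conflicts` helper, the record layout and the sample values are hypothetical and exist only to show why structured key/value data is easier to analyse than free-text notes.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return {attribute: {value: [census, ...]}} for attributes whose values disagree.

    `records` is a list of (census_name, attributes_dict) pairs for one person.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for census_name, attributes in records:
        for key, value in attributes.items():
            seen[key][value].append(census_name)
    # Keep only the attributes that were recorded with more than one distinct value.
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


# Invented sample data for one person across two censuses.
records = [
    ("1851 census", {"Birthplace": "Leeds", "Occupation": "Weaver"}),
    ("1861 census", {"Birthplace": "Bradford", "Occupation": "Weaver"}),
]
conflicts = find_conflicts(records)
```

Here only `Birthplace` is reported, because the two censuses agree on `Occupation`.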
From: Gerald B. <ger...@gm...> - 2010-11-30 15:19:59
On Tue, Nov 30, 2010 at 3:48 AM, Benny Malengier <ben...@gm...> wrote:
> For me,
>
> 1. Repository is where you find a source. We should not misuse it

Yup, I use it the same way.

> 2. Source is the book/registry, or a part of it. The source holds
> information, and literal transcripts of a source should hence be stored in
> this object.
> A source does _not_ have what we call attributes, a source has "Data". This
> is not exported to GEDCOM. The Data is not shared, the source is what is
> shared.

Yup.

> 3. An event is something happening to a person/family at a certain
> time/place. Census event is the census taker that passes and writes info in
> the census source.

Exactly.

> 4. You learn from a source information about a person or family, so you want
> to add information about the person/family in the person/family object. You
> add this information, eg an attribute: Description, Blue eyes. Source of
> this attribute is the census source.

Yes, but I think it is good to keep the information that you find in one place as well. This is what the census gramplet does and it is very useful, since the data extracted from the census lives independently.

> I don't see problems here, except for the fact that you can only store the
> census data in the source as a note if you want it stored. So there is no
> 'database scheme' for it. You can use Source Data for key-value pairs.

Except that it rapidly becomes unmanageable. A census is a single source with millions of entries, each of which will have tens of attributes. Source data key/value pairs are completely insufficient for this purpose. You would have to make up keys that include some identification information for each census entry you want to record. So, for some census, you would have:

  JohnDoe_House: brick
  NancyDrew_House: stone
  etc.

Since you might have hundreds of members of your family tree in the census, this is clearly unworkable. Unless you abuse the notion of a Source and make one up for every census entry, that is. (A method I cannot accept.)

This is exactly where my suggestion to create Source Reference attributes arises. And it is not limited to census data. It applies equally well to birth, marriage and death registers and many other similar bulk sources that contain many columns of data on hundreds, thousands or even millions of individuals.

> Now, the other way around. You have a person, and you see a source saying
> green eyes. You go to attributes and you see blue eyes. You wonder if there
> is no error. You click on the attribute to from what source you have this
> information, you open the census source, and you look at the data inside of
> it. If you used a note for the data in the census, you can share it in the
> source reference, and you know what the census said. If you are uncertain
> and you want to recheck the census, you go to the repository tab and you see
> where this census is stored to check in the repository the source again.
>
> So, In all this, you normally _don't_ check the census event! It seems
> stupid to me to store data obtained in the census taking there. At most, I
> would share a note with the transcript there.

A note is difficult to search and impossible to compare, except for the most basic data. Remember that I have censuses with up to 140 key/value pairs per individual.

> So, in my view, the way census gramplet works is wrong.

I think it works the best that it can with the current schema. If we had Source Reference Attributes or something similar, we could put the data there instead.

--
Gerald Britton
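The difference between the two storage styles discussed above can be made concrete with a small comparison. This is a sketch with plain dictionaries, not the Gramps schema: the variable and function names are invented, and only the `JohnDoe_House` / `NancyDrew_House` keys come from the message itself.

```python
# Flat Source Data: one dict per source, so every key has to embed the person.
flat_source_data = {
    "JohnDoe_House": "brick",
    "NancyDrew_House": "stone",
}


def houses_from_flat(data):
    """Recover per-person values by parsing the made-up key prefix."""
    result = {}
    for key, value in data.items():
        person, _, attribute = key.partition("_")
        if attribute == "House":
            result[person] = value
    return result


# Per-reference attributes: each entry keeps its own plain keys.
per_reference = {
    "JohnDoe": {"House": "brick"},
    "NancyDrew": {"House": "stone"},
}


def houses_from_refs(refs):
    """Same query, with no key parsing needed."""
    return {person: attrs["House"] for person, attrs in refs.items() if "House" in attrs}
```

Both functions return the same answer, but the flat version depends on a naming convention that every key must follow, which is the part that stops scaling once there are hundreds of people and many columns per entry.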
From: jerome <rom...@ya...> - 2010-12-19 15:25:44
Hi,

Nice work.

I just wonder whether providing the ability to link sources together (source grouping) would not cover most cases. That was the proposed scheme: a sourceref with hlink and group attributes inside a source object.

http://gramps-project.org/wiki/index.php?title=Talk:GEPS_023:_Storing_data_from_large_sources

Example: a book/publication/index gives some source references. The primary source would be the book/publication/index; the child references would be the secondary sources. Large data might be stored in multiple sources, which are shared between persons, events, etc.

Jérôme

--- On Sun 19.12.10, Nick Hall <nic...@ho...> wrote:

> From: Nick Hall <nic...@ho...>
> Subject: Re: [Gramps-devel] Storing data from large sources
> To: "Tim Lyons" <guy...@gm...>
> Cc: gra...@li...
> Date: Sunday 19 December 2010, 15:19
>
> Benny,
>
> Tim has done a lot of work on the GEPS, and it is now at a stage where I
> think that it would be helpful if you could review it.
>
> http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources
>
> My main concern is that the Citation Reference editor may be rather
> complicated and large. What do you think?
>
> Could we combine the Type and Deduction Confidence in the citation
> reference? A Transcript type would imply a high confidence, whereas a
> deduction would be a lower confidence. We could go down to a "Guess"
> which would imply a very low confidence.
>
> I didn't use the Source Confidence when I started to use Gramps, because
> I was unsure which value to choose. With the values in the GEDCOM standard
> (Direct/Primary, Secondary, Questionable, Unreliable) it is more obvious
> how to use them than with the Gramps values (Very High, High, Normal, Low,
> Very Low). Perhaps we could choose more descriptive values for the
> Deduction Type?
>
> Nick.
>
> Tim Lyons wrote:
> > Thanks for your suggestions Benny, I think we may be moving towards a
> > consensus. I have created a GEPS to outline a change.
> >
> > Benny Malengier wrote:
> >> I don't think SourceContent
> >> must be presented in the interface as a core object. Instead a treeview
> >> Source-SourceContent seems more natural to me.
> >
> > Thanks for the suggestion. A treeview for the Source View and the selector
> > works well.
> >
> > Benny Malengier wrote:
> >> I would do the attributes
> >> different.
> >>
> >> Source
> >>   1 Title
> >>   1 Author
> >>   1 Gramps ID
> >>   1 Abbr
> >>   1 Publication Information
> >>   1 Global Confidence
> >>   n Publication Data (key value pairs, eg Publication Date, Publisher, ...)
> >>   n MediaRef (Region, Src, attr, notes) --> Media
> >>   n RepoRef (Type, Callnumber) --> Repo
> >>
> >> SourceContent
> >>   1 Source (GrampsID)
> >>   1 Confidence (5 values)
> >>   1 Volume
> >>   1 Page
> >>   1 LogDate
> >>   1 Linenumber
> >>   1 Position (eg. Upper Left Corner of image)
> >>   n Information (key, value pairs, current Data)
> >>   n NoteIds
> >>   n MediaRef (Region, Src, attr, notes) --> Media
> >
> > I agree except that I wouldn't remove the Notes field from the Source. This
> > would be too awkward for people who are already using it, and is relevant
> > where the source is not 'large'.
> >
> > I wonder whether we should keep Volume/Page instead of separate Volume,
> > Page, Linenumber and Position for this enhancement. There is a proposal
> > (GEPS 018) which would change the fields in the SourceContent according to a
> > Source Type.
> > http://gramps-project.org/wiki/index.php?title=GEPS_018:_Evidence_style_sources
> > I wonder whether it would be better that we wait for this, rather than
> > changing the fields twice. In any case, there are plenty of cases where
> > breakdowns other than the proposed one are more appropriate.
> >
> > Benny Malengier wrote:
> >> SourceContentRef (called Citation in the interface, part of objects with
> >> sources)
> >>   1 Type: Transcript or Deduction
> >>   1 Deduction Confidence (5 values)
> >>   1 Argumentation (one line string)
> >>   n Note
> >
> > I have not included this in the GEPS, because it seems to relate to how
> > deductions are stored, and as such may not be directly related to this
> > enhancement. Also I am concerned that this may make the user experience too
> > complicated. In the GEPS, users who are happy with the existing interface
> > will see little change (change always frightens users); those who want more
> > will be able to use the additional features.
> >
> > Benny Malengier wrote:
> >> So, in this design, one must envision that Source and sourcecontent form
> >> one single editor.
> >
> > I agree, having a single editor makes things simpler for the user and
> > ensures that the workflow does not get more complicated.
> >
> > Benny Malengier wrote:
> >> When adding a citation to eg a person, you obtain a treeview
> >> source-sourcecontent, so if you select a census entry, you immediately see
> >> the data of that entry you stored.
> >
> > I agree - a treeview will make it no more complicated to select an existing
> > Source or SourceContent than it is at present.
> >
> > Benny Malengier wrote:
> >> Confidence is given globally, of the content (same as globally by default),
> >> and of the deduction. The SourceContentRef is there to hold the process of
> >> deducing information you add to eg a person as coming from a source. In many
> >> cases, a pure transcript of the source is done, and no deduction happens, in
> >> which case this object contains nothing of interest. If one however makes a
> >> deduction, then one can store this here specifically. Eg you find the name
> >> Nic__ where you cannot make out what the last hand written letters are, and
> >> you save the name as Nick, with reference to this source. Then the
> >> sourcereference can indicate why you decide to use Nick and not eg Nicki.
> >
> > As I mentioned, I have not included the fields of a SourceContentRef in the
> > GEPS. They could be added if there is a general desire to do so.
> >
> > The GEPS is at
> > http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources
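The three-level model quoted above (Source, SourceContent, SourceContentRef) could be sketched as below. This is a reading of the field lists in the message, trimmed to a few fields, and not the schema that Gramps or the GEPS ended up with. The integer confidence scale, the `effective_confidence` helper and the sample argumentation text are assumptions added for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    title: str
    confidence: int = 2  # assumed scale: 0 (very low) to 4 (very high)


@dataclass
class SourceContent:
    source: Source
    page: str = ""
    information: dict = field(default_factory=dict)  # key/value pairs for one entry
    confidence: Optional[int] = None  # None means "same as the source"

    def effective_confidence(self) -> int:
        """Fall back to the global confidence when none is set on the content."""
        return self.source.confidence if self.confidence is None else self.confidence


@dataclass
class SourceContentRef:
    content: SourceContent
    kind: str = "Transcript"  # or "Deduction"
    argumentation: str = ""


census = Source("Example census", confidence=3)
entry = SourceContent(census, page="folio 12", information={"Name": "Nic__"})
citation = SourceContentRef(
    entry,
    kind="Deduction",
    argumentation="Last letters illegible; read as Nick",  # invented example text
)
```

The `Source` and `SourceContent` are shared, so the transcript is stored once, while each `SourceContentRef` records why a particular conclusion was drawn from it.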
From: Tim L. <guy...@gm...> - 2010-12-19 16:20:59
|
jerome wrote:
>
> I just wonder if to provide the ability to link sources together (source grouping) will not match most cases ? It was the proposal scheme.
>
> sourceref with hlink and group attributes into a source object.
>
> http://gramps-project.org/wiki/index.php?title=Talk:GEPS_023:_Storing_data_from_large_sources
>
> ex: Book/publication/index gives some source references. Primary source will be the Book/publication/index, child references will be the secondary sources. Large data might be stored into multiple sources, which are shared between persons, events, etc ...
>

If I understand your suggestion correctly, this would still leave the current SourceRef, which would contain the Volume/Page and notes, so the problem of updating this information in many different places remains. There would also be no place to store the Volume/Page on the child reference/secondary source. When producing reports from your suggestion, it would not be obvious where to get the primary information from and where to get the secondary information.

In contrast, with the proposal for Citations, the information in the current SourceRef is moved to the Citation, where it is shared so that it only needs to be updated in one place. When reports are produced, or when the information is output to GEDCOM, the Citation gives the detailed information and the Source gives the general information, and it is clear how to combine these to produce a complete reference text.

--
View this message in context: http://gramps.1791082.n4.nabble.com/Storing-data-from-large-sources-tp3063962p3094577.html
Sent from the GRAMPS - Dev mailing list archive at Nabble.com.
|
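The split Tim describes can be sketched in Python, the language Gramps is written in. This is only an illustration under stated assumptions, not the Gramps API: the class names, the fields, the `reference_text` helper and the sample register are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """General information, stored once (hypothetical fields)."""
    title: str
    author: str = ""


@dataclass
class Citation:
    """Detail within a source, shared by every object that cites it."""
    source: Source
    volume_page: str
    note: str = ""


def reference_text(citation: Citation) -> str:
    """Combine general (Source) and detailed (Citation) information."""
    parts = [citation.source.author, citation.source.title,
             citation.volume_page]
    return ", ".join(p for p in parts if p)


# Made-up sample data.
register = Source(title="Marriage Register 1850-1870",
                  author="St Mary's Church")
entry = Citation(source=register, volume_page="Vol. 2, p. 14")

# Two objects cite the same entry by sharing one Citation object.
marriage_event = {"type": "Marriage", "citations": [entry]}
bride = {"name": "Jane Doe", "citations": [entry]}

# Correcting the page once is visible from both places.
entry.volume_page = "Vol. 2, p. 15"
print(reference_text(bride["citations"][0]))
print(reference_text(marriage_event["citations"][0]))
```

Because both objects hold the same `Citation`, there is no second copy of the Volume/Page to fall out of step, which is the point made above about updating in one place.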
From: jerome <rom...@ya...> - 2010-12-19 16:38:27
|
OK, I see. Thank you!

-------------------------------------

Note: I contacted the authors of Gedbas4all; here is Jesper's answer:

"I am also very interested in a cooperation. In preparation for Gedbas4all I took a look at Gramps' data model. One thing that is missing there -- for my private research as well -- is the possibility to create persons from different sources as different persons in Gramps and mark them as "possibly the same". That way it would always be clear which information comes from a source and which information is just an assertion of the researcher.

Our goal for Gedbas4all is a web application. It would be great if there was a desktop application with such functionality, too. A scientific analysis of genealogical information would be much easier. However, I suppose such an extension of Gramps would be too comprehensive and require changes in nearly all parts of the source code.

I took a look at the text of GEPS 23. Together with GEPS 24, good management of complex sources would be a great basis for an efficient capture of genealogical sources like church records, census lists etc. For Gedbas4all we have planned an arbitrarily nested tree structure of sources. For an address book it might look like: book -> page -> entry. Every level can have several media objects or clippings attached (as is possible for sources in Gramps right now).

I would like to keep in touch to advance both projects. Or maybe we can even work together closely on specific points."
--Jesper Zedlitz

Jérôme
|
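The nested structure Jesper mentions (book -> page -> entry, with media at any level) could look roughly like the following. This is a guess at the shape only: `SourceNode`, `walk` and the sample address book are hypothetical names and data, not anything taken from Gedbas4all or Gramps.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class SourceNode:
    """One level of a nested source; any level may carry media."""
    label: str
    media: List[str] = field(default_factory=list)
    children: List["SourceNode"] = field(default_factory=list)

    def add(self, child: "SourceNode") -> "SourceNode":
        """Attach a child level and return it, so calls can be chained."""
        self.children.append(child)
        return child


def walk(node: SourceNode,
         path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the full path to every node, parents before children."""
    here = path + (node.label,)
    yield here
    for child in node.children:
        yield from walk(child, here)


# Made-up sample data.
book = SourceNode("Address book 1900")
page = book.add(SourceNode("Page 12", media=["page12.jpg"]))
page.add(SourceNode("Entry: J. Smith, baker"))

for p in walk(book):
    print(" -> ".join(p))
```

Nesting depth is not fixed here, so a census could use district -> household -> person with the same class.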
From: lcc . <lcc...@gm...> - 2010-12-19 16:46:48
|
I can imagine this being quite a simple thing to implement in Gramps. Marking people as possibly the same? What could be hard about that?

lcc

On 12/19/10, jerome <rom...@ya...> wrote:
> Note, I contacted actors/authors of Gedbas4all, here Jesper's answer:
>
> "I am also very interested in a cooperation. In preparation for Gedbas4all I took a look at Gramps' data model. One thing that is missing there -- for my private research as well -- is the possibility to create persons from different sources as different persons in Gramps and mark them as "possibly the same". That way it would always be clear which information comes from a source and which information is just an assertion of the researcher.
> --<snip>
|
From: Benny M. <ben...@gm...> - 2010-12-20 10:13:22
|
2010/12/19 Nick Hall <nic...@ho...>

> Benny,
>
> Tim has done a lot of work on the GEPS, and it is now at a stage where I think that it would be helpful if you could review it.
>
> http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources
>
> My main concern is that the Citation Reference editor may be rather complicated and large. What do you think?
>

I'll try to find the time to read it this week. A long wiki page that!

> Could we combine the Type and Deduction Confidence in the citation reference? A Transcript type would imply a high confidence, whereas a deduction would be a lower confidence. We could go down to a "Guess" which would imply a very low confidence.
>
> I didn't use the Source Confidence when I started to use Gramps, because I was unsure which value to choose. The values in the GEDCOM standard (Direct/Primary, Secondary, Questionable, Unreliable) are more obvious how to use than the Gramps values (Very High, High, Normal, Low, Very Low). Perhaps we could choose more descriptive values for the Deduction Type?
>

I added it to the infolabel in trunk. Yes, we can rename it, but then it is best to show both of them, I think: High - Direct/Primary .... Or does that look as if we cannot choose?

Benny

> Nick.
>
> Tim Lyons wrote:
> > Thanks for your suggestions Benny, I think we may be moving towards a consensus. I have created a GEPS to outline a change.
> >
> > Benny Malengier wrote:
> >
> >> I don't think SourceContent must be presented in the interface as a core object. Instead a treeview Source-SourceContent seems more natural to me.
> >>
> > Thanks for the suggestion. A treeview for the Source View and the selector works well.
> >
> > Benny Malengier wrote:
> >
> >> I would do the attributes different.
> >>
> >> Source
> >>   1 Title
> >>   1 Author
> >>   1 Gramps ID
> >>   1 Abbr
> >>   1 Publication Information
> >>   1 Global Confidence
> >>   n Publication Data (key value pairs, eg Publication Date, Publisher, ...)
> >>   n MediaRef (Region, Src, attr, notes) --> Media
> >>   n RepoRef (Type, Callnumber) --> Repo
> >>
> >> SourceContent
> >>   1 Source (GrampsID)
> >>   1 Confidence (5 values)
> >>   1 Volume
> >>   1 Page
> >>   1 LogDate
> >>   1 Linenumber
> >>   1 Position (eg. Upper Left Corner of image)
> >>   n Information (key, value pairs, current Data)
> >>   n NoteIds
> >>   n MediaRef (Region, Src, attr, notes) --> Media
> >>
> > I agree except that I wouldn't remove the Notes field from the Source. This would be too awkward for people who are already using it, and is relevant where the source is not 'large'.
> >
> > I wonder whether we should keep Volume/Page instead of separate Volume, Page, Linenumber and Position for this enhancement. There is a proposal (GEPS 018) which would change the fields in the SourceContent according to a Source Type.
> > http://gramps-project.org/wiki/index.php?title=GEPS_018:_Evidence_style_sources
> > I wonder whether it would be better that we wait for this, rather than changing the fields twice. In any case, there are plenty of cases where breakdowns other than the proposed one are more appropriate.
> >
> > Benny Malengier wrote:
> >
> >> SourceContentRef (called Citation in the interface, part of objects with sources)
> >>   1 Type: Transcript or Deduction
> >>   1 Deduction Confidence (5 values)
> >>   1 Argumentation (one line string)
> >>   n Note
> >>
> > I have not included this in the GEPS, because it seems to relate to how deductions are stored, and as such may not be directly related to this enhancement. Also I am concerned that this may make the user experience too complicated.
> > In the GEPS, users who are happy with the existing interface will see little change (change always frightens users); those who want more will be able to use the additional features.
> >
> > Benny Malengier wrote:
> >
> >> So, in this design, one must envision that Source and sourcecontent form one single editor.
> >>
> > I agree, having a single editor makes things simpler for the user and ensures that the workflow does not get more complicated.
> >
> > Benny Malengier wrote:
> >
> >> When adding a citation to eg a person, you obtain a treeview source-sourcecontent, so if you select a census entry, you immediately see the data of that entry you stored.
> >>
> > I agree - a treeview will make it no more complicated to select an existing Source or SourceContent than it is at present.
> >
> > Benny Malengier wrote:
> >
> >> Confidence is given globally, of the content (same as globally by default), and of the deduction. The SourceContentRef is there to hold the process of deducing information you add to eg a person as coming from a source. In many cases, a pure transcript of the source is done, and no deduction happens, in which case this object contains nothing of interest. If one however makes a deduction, then one can store this here specifically. Eg you find the name Nic__ where you cannot make out what the last hand written letters are, and you save the name as Nick, with reference to this source. Then the sourcereference can indicate why you decide to use Nick and not eg Nicki.
> >>
> > As I mentioned, I have not included the fields of a SourceContentRef in the GEPS. They could be added if there is a general desire to do so.
> >
> > The GEPS is at
> > http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources
|
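The Source / SourceContent field lists quoted in this message, and the treeview used to pick an entry, can be sketched as below. Only a few of the listed fields are included, the 0 to 4 confidence scale is an assumption standing in for "5 values", and `selector_tree` is a hypothetical helper, not the Gramps selector.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Source:
    gramps_id: str
    title: str
    author: str = ""
    confidence: int = 2      # assumed scale: 0 (very low) .. 4 (very high)


@dataclass
class SourceContent:
    source_id: str           # Gramps ID of the parent Source
    volume: str = ""
    page: str = ""
    confidence: int = 2
    information: Dict[str, str] = field(default_factory=dict)


def selector_tree(sources: List[Source], contents: List[SourceContent]
                  ) -> List[Tuple[Source, List[SourceContent]]]:
    """Group entries under their source, as a two-level tree would."""
    by_source = defaultdict(list)
    for content in contents:
        by_source[content.source_id].append(content)
    return [(s, by_source[s.gramps_id]) for s in sources]


# Made-up sample data.
census = Source("S0001", "1881 Census")
bible = Source("S0002", "Family bible")
contents = [
    SourceContent("S0001", page="12", information={"House": "brick"}),
    SourceContent("S0001", page="40"),
]

for source, children in selector_tree([census, bible], contents):
    print(source.title)
    for child in children:
        print("   page", child.page, child.information)
```

A source with no stored entries still appears as a top-level row, so users who never record entry-level data see the same list of sources as before.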
From: Tim L. <guy...@gm...> - 2011-01-15 17:44:16
|
Benny Malengier wrote:
>
> 2010/12/19 Nick Hall <nic...@ho...>
>
>> Benny,
>>
>> Tim has done a lot of work on the GEPS, and it is now at a stage where I think that it would be helpful if you could review it.
>>
>> http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources
>>
>> My main concern is that the Citation Reference editor may be rather complicated and large. What do you think?
>>
>
> I'll try to find the time to read it this week. A long wiki page that!
>

Benny, have you been able to have a look at the GEPS? I am still very keen to work on this change, and I believe that Nick would take the lead on it.

--
View this message in context: http://gramps.1791082.n4.nabble.com/Storing-data-from-large-sources-tp3063962p3219311.html
Sent from the GRAMPS - Dev mailing list archive at Nabble.com.
|
From: Nick H. <nic...@ho...> - 2010-11-30 10:15:30
|
Gerald,

I have some comments regarding census data and the census add-ons.

Gerald Britton wrote:
> I want to open up a discussion about how best to store data from large sources. By "large" I mean sources such as registers, logs, family bibles, censuses, member lists and other things that contain many entries (millions, in the case of a census). Usually, each entry in such a document has several columns of data. For example, a marriage register at a church will have names of the bride, groom, witnesses and possibly parents, date, officiating minister's name and other things. A death register may include cause of death, place and other things. A census may have all sorts of data, including whether the house was brick or frame and how many stories it had, if the person was an employer (and if so, how many employees he had) or employee, whether he was deaf, dumb, blind, crazy (or "lunatic") and how many sheep he has.
>
> One challenge with this sort of data is the document in which it is found. If we treat it as a source (which seems natural to me) then we have a problem storing all the bits of data we find for one individual.

A census is clearly an event.

> You can't use source attributes, since those are shared in gramps. We could treat the document as a repository and the entries as sources from that repository, but that seems unnatural when you are looking at a book or a bible or a microfilm. Also, since a repository can't have a repository (i.e. you can't nest them) and the real repository (building, web site, etc.) may house many such sources, how can you bring all these pseudo-repositories together under that real repository?
>
> Another challenge is all the bits of data we find in this document. Some should no doubt find their way to other objects: the cause of death to a death event attribute; the witness names to a marriage event attribute; the house construction and size to a residence attribute, perhaps. However, it is still good (very good, I feel) to also keep all this data together and tie it to the source in which it is found.
>
> The Census gramplet addresses this by using event reference attributes. So the fact that the person was a lunatic is recorded in an attribute of the event reference -- Lunatic: yes -- and similarly, the other attributes. This solves the immediate problem for censuses but may not be generally extensible to other documents -- especially if there are multiple documents for some event that disagree with each other.

Yes, a census is a case where I can't see a census event ever having more than one source. In fact, if the census gramplet is used then only one census source is allowed.

> Furthermore, it is not clear to me that this is the best way to handle this data since the data is not really an attribute of the event but of the source document since the data was recorded in the source document at the time of the event.
>

I see something like "place of birth" to be an attribute of the person and census event. An attribute in the event reference object is a natural and convenient place to store this data.

I see something like "number of rooms in the house" to be an attribute of the census event only.

> I'm now wondering if we should add attributes to source references analogous to event references. If available, this would be a natural place to store all the bits of data for each entry while keeping one source object (the book, film, etc.) at the repositories where it can be found. On the other hand, introducing source reference attributes may introduce challenges for GEDCOM exports and imports.
>

From a census point of view, I don't see a requirement for attributes on a source reference object.

I have had a request to generate a transcript from the census data and record it as a Note in the source reference. This is a good idea, but I have not yet implemented it.

I can also see the case for storing an image of the census page in the source reference. Unfortunately this is not possible so I have suggested we store it on the census event. Also not yet implemented.

> So let's discuss! What creative ways can we devise to handle these sorts of source documents? Should we extend our data model and, if so, how? If we extend the data model, what are the repercussions?
>

Regards,

Nick.
|
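Nick's distinction between data that belongs to the census event and data that belongs to one person's link to it can be illustrated as follows. The classes, the `census_row` helper and the sample household are assumptions for the sake of the example and are not the Census gramplet's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """A census event shared by everyone in the household."""
    description: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class EventRef:
    """One person's link to an event, with per-person attributes."""
    event: Event
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Person:
    name: str
    event_refs: List[EventRef] = field(default_factory=list)


def census_row(person: Person, ref: EventRef) -> Dict[str, str]:
    """Merge household-level and person-level data into one row."""
    row = {"Name": person.name}
    row.update(ref.event.attributes)   # shared by the whole household
    row.update(ref.attributes)         # specific to this person
    return row


# Made-up sample data.
census = Event("Census 1881", attributes={"Rooms in house": "4"})
john = Person("John Doe", [EventRef(census, {"Place of birth": "York"})])
mary = Person("Mary Doe", [EventRef(census, {"Place of birth": "Leeds"})])

for person in (john, mary):
    print(census_row(person, person.event_refs[0]))
```

"Rooms in house" is stored once on the event, while "Place of birth" lives on each event reference, so a transcript row can be rebuilt per person without duplicating the household data.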
From: doug <do...@o2...> - 2010-11-30 13:09:25
|
On 29/11/10 17:08, Gerald Britton wrote:
> I want to open up a discussion about how best to store data from large sources. By "large" I mean sources such as registers, logs, family bibles, censuses, member lists and other things that contain many entries (millions, in the case of a census). Usually, each entry in such a document has several columns of data. For example, a marriage register at a church will have names of the bride, groom, witnesses and possibly parents, date, officiating minister's name and other things. A death register may include cause of death, place and other things. A census may have all sorts of data, including whether the house was brick or frame and how many stories it had, if the person was an employer (and if so, how many employees he had) or employee, whether he was deaf, dumb, blind, crazy (or "lunatic") and how many sheep he has.
>
> One challenge with this sort of data is the document in which it is found. If we treat it as a source (which seems natural to me) then we have a problem storing all the bits of data we find for one individual. You can't use source attributes, since those are shared in gramps. We could treat the document as a repository and the entries as sources from that repository, but that seems unnatural when you are looking at a book or a bible or a microfilm.

There was a long discussion very much along these lines some months ago on the users' list. It seems to me that there's a need to have the possibility of recognising some grouping of sources at a level between the individual source and their current geographical location. Sometimes it's natural, say, someone's personal library housed within a national library. Other times it's certainly a bit artificial, like calling an LDS film a repository, although we can sort of get round that by specifying the type of repository to be Microfilm - for me, not so much a problem as a matter of taste.

> Also, since a repository can't have a repository (i.e. you can't nest them) and the real repository (building, web site, etc.) may house many such sources, how can you bring all these pseudo-repositories together under that real repository?

That's the real problem, I think. It would make more sense if the address of the repository was a Place. Then there would be no difficulty in having many repositories sharing the same Place (geographical location). Also if a repository were moved - say, a private library to another institution - only the Place of the repository need be changed, instead of needing to create and enter a new repository for all the sources located there and having to delete the previous one.

>
> Another challenge is all the bits of data we find in this document. Some should no doubt find their way to other objects: the cause of death to a death event attribute; the witness names to a marriage event attribute;

I don't understand. Doesn't the marriage event with its attributes and references collect all the information in one place? What did you have in mind?

Doug

> the house construction and size to a residence attribute, perhaps. However, it is still good (very good, I feel) to also keep all this data together and tie it to the source in which it is found.
>
> The Census gramplet addresses this by using event reference attributes. So the fact that the person was a lunatic is recorded in an attribute of the event reference -- Lunatic: yes -- and similarly, the other attributes. This solves the immediate problem for censuses but may not be generally extensible to other documents -- especially if there are multiple documents for some event that disagree with each other. Furthermore, it is not clear to me that this is the best way to handle this data since the data is not really an attribute of the event but of the source document since the data was recorded in the source document at the time of the event.
>
> I'm now wondering if we should add attributes to source references analogous to event references. If available, this would be a natural place to store all the bits of data for each entry while keeping one source object (the book, film, etc.) at the repositories where it can be found. On the other hand, introducing source reference attributes may introduce challenges for GEDCOM exports and imports.
>
> So let's discuss! What creative ways can we devise to handle these sorts of source documents? Should we extend our data model and, if so, how? If we extend the data model, what are the repercussions?
>
|
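Doug's idea of giving a repository a Place instead of its own address could work along these lines. This is a sketch of the suggestion only: Gramps repositories do not necessarily work this way, and every class, field and sample value below is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Place:
    title: str


@dataclass
class Repository:
    name: str
    kind: str
    place: Place


@dataclass
class Source:
    title: str
    repository: Repository


def repositories_at(place: Place, repositories: List[Repository]) -> List[str]:
    """Names of all repositories currently housed at the given place."""
    return [r.name for r in repositories if r.place is place]


# Made-up sample data: two repositories share one geographical location.
national = Place("National Library, London")
university = Place("University Library, Leeds")
main_stacks = Repository("National Library collection", "Library", national)
smith_papers = Repository("Smith family library", "Collection", national)
diary = Source("Smith diary 1842", smith_papers)

print(repositories_at(national, [main_stacks, smith_papers]))

# The private library moves: only its Place changes, and the sources
# that point at the repository need no edits.
smith_papers.place = university
print(diary.repository.place.title)
```

The "pseudo-repositories" are brought together by querying on the shared Place, without nesting one repository inside another.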
From: Tim L. <guy...@gm...> - 2010-12-01 21:52:57
|
Gerald Britton-2 wrote:
>
> Except that it rapidly becomes unmanageable. A census is a single source with millions of entries, each of which will have tens of attributes. Source data key/value pairs are completely insufficient for this purpose. You would have to make up keys that include some identification information for each census entry you want to record. So, for some census, you would have:
>
> JohnDoe_House: brick
> NancyDrew_House: stone
>
> etc. Since you might have hundreds of members of your family tree in the census, this is clearly unworkable. Unless you abuse the notion of a Source and make one up for every census entry, that is. (A method I cannot accept)
>
> This is exactly where my suggestion to create Source Reference attributes arises. And, it is not limited to census data. It applies equally well to birth, marriage and death registers and many other similar, bulk sources, that contain many columns of data on hundreds, thousands or even millions of individuals.
> --<snip>
> Difficult to search, impossible to compare, except for the most basic data. Remember that I have censuses with up to 140 key/value pairs per individual.
>

I want to argue very strongly for the addition of media and, if wanted, attributes to Source References, and for Source References to become first class objects so that they can be shared.

At present if you treat a "large source" as a single source, then you have problems with managing (manually and by conventions) the components of that source. You can store transcripts of parts of the source in Source References, but they are copied on each object that refers to them so are independent, which causes problems for finding them and updating them. You can't store attributes, and you can't store media on the source reference. Storing a transcript on a source reference as a shared note does avoid some problems, but the information on the source reference is still independent, so you can get different representations of the page number in what should be the same source reference. As Gerald says, using source data for key value pairs rapidly becomes unmanageable, and the data he wants to store are actually properties of the source reference.

At present, if you treat each component as a separate source, then you have an unnatural breakdown that causes difficulties in managing the links to repositories. It also does not respect the Page number that is a built-in property of source references. I agree with Benny that a source is something like a book, so this is not the appropriate model. As Gerald says "Unless you abuse the notion of a Source and make one up for every census entry, that is. (A method I cannot accept)"

If you change Gramps so that you treat a 'large source' as a single source, but allow media links and if you want, attributes, and they are shared, then you don't need any conventions as to how to manage things like pages from that source. Each separate thing that is referenced is a separate source reference, and these can be shared.

One of the points Gerald made in his earlier postings was that limitations arose from our adherence to GEDCOM. However, in fact, GEDCOM provides both multi-media links as well as notes in Source Citations, which are the exact equivalent of Source References.

I'm afraid I don't understand the distinction that is being drawn between 'attributes' and 'data' - it seems to me to be rather artificial. I don't think it's important from a theoretical point of view whether the things you use for storage in a source reference are notes, or attributes or data. Whatever is most convenient for the user to use. Similarly, I don't think it is important that 'Data' is not exported to GEDCOM. This is surely just a feature of the current code. GEDCOM source citations have notes and 'text from source', so I would expect data, attributes and notes all to be exported to suitable notes within GEDCOM in due course.

The fact that there have been many previous discussions along these lines in the Gramps lists does seem to me to indicate that there is something missing. I think that shared source references with media and attributes would meet most of the needs.

I would hope that making source references shared would be more or less transparent to most of the screens and reports in Gramps. Obviously you would need a new category of display, and you would need to allow the user to choose an existing source reference as well as create a new one. Technically you would indeed need a source reference reference, but that is only a technically awkward name. I imagine that one could even hide the link to the intermediate source reference object so that a prototype of shared source reference objects might not even need to change most of the rest of the code which could continue to refer to a source reference and get to it through the s-r-r transparently - just a suggestion - I don't know enough about it to know whether it is really feasible or even desirable.

--
View this message in context: http://gramps.1791082.n4.nabble.com/Storing-data-from-large-sources-tp3063962p3068128.html
Sent from the GRAMPS - Dev mailing list archive at Nabble.com.
|
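The contrast between Gerald's prefixed keys and a shared source reference that carries its own attributes and media can be made concrete. The sketch below is illustrative only: `SourceReference`, the dictionaries and the sample census are hypothetical, and nothing here reflects how Gramps stores source data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Workaround with one Source and a single key/value store:
# every key has to carry the person's identity.
source_data = {
    "JohnDoe_House": "brick",
    "NancyDrew_House": "stone",
}
# Searching means parsing the key names.
brick_by_key = [key.split("_", 1)[0] for key, value in source_data.items()
                if key.endswith("_House") and value == "brick"]


@dataclass
class SourceReference:
    """One entry in a large source, shareable between objects."""
    source_title: str
    page: str
    attributes: Dict[str, str] = field(default_factory=dict)
    media: List[str] = field(default_factory=list)


census = "1881 Census"
john_entry = SourceReference(census, "p. 12", {"House": "brick"}, ["p12.jpg"])
nancy_entry = SourceReference(census, "p. 40", {"House": "stone"})

# The same entry is shared by a person and by an event.
cited_by = {
    "John Doe": [john_entry],
    "John Doe residence event": [john_entry],
    "Nancy Drew": [nancy_entry],
}

# Searching is a plain filter on attribute values.
brick_entries = [entry for entry in (john_entry, nancy_entry)
                 if entry.attributes.get("House") == "brick"]

# A page correction is made once and seen everywhere the entry is cited.
john_entry.page = "p. 13"
print(brick_by_key, [e.page for e in brick_entries])
print(cited_by["John Doe residence event"][0].page)
```

With 140 key/value pairs per individual, the first form needs 140 prefixed keys per person in one flat dictionary, while the second keeps each person's values in that person's own entry.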
From: Nick H. <nic...@ho...> - 2010-12-01 23:14:58
|
Tim Lyons wrote:
> Gerald Britton-2 wrote:
>
>> Except that it rapidly becomes unmanageable. A census is a single
>> source with millions of entries, each of which will have tens of
>> attributes. Source data key/value pairs are completely insufficient
>> for this purpose. You would have to make up keys that include some
>> identification information for each census entry you want to record.
>> So, for some census, you would have:
>>
>> JohnDoe_House: brick
>> NancyDrew_House: stone
>>
>> etc. Since you might have hundreds of members of your family tree in
>> the census, this is clearly unworkable. Unless you abuse the notion
>> of a Source and make one up for every census entry, that is. (A
>> method I cannot accept)
>>
>> This is exactly where my suggestion to create Source Reference
>> attributes arises. And, it is not limited to census data. It applies
>> equally well to birth, marriage and death registers and many other
>> similar, bulk sources, that contain many columns of data on hundreds,
>> thousands or even millions of individuals.
>> --<snip>
>> Difficult to search, impossible to compare, except for the most basic
>> data. Remember that I have censuses with up to 140 key/value pairs
>> per individual.
>
> I want to argue very strongly for the addition of media and if wanted
> attributes to Source References, and for Source References to become first
> class objects so that they can be shared.

Yes, I have often thought that this would be a good idea.

You are suggesting that we convert existing Source References into new primary "Citation" objects. Where we now have a Source Reference, it would become a reference to a Citation. A Citation would contain a reference to a Source.

A Citation object could contain Attributes and Media objects.

We would create a Citation View, Editor and Selector. Existing editors that create/edit/delete a Source Reference would need updating to add/select/remove a Citation.

I think that this has been discussed and rejected in the past, but I'm not sure why. The Citation table would be large, which may have been a factor. Does anyone know anything about this?

Nick.
|
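A rough sketch of the Citation idea Nick describes above, written with plain Python dataclasses rather than the real Gramps classes. The names `Citation`, `Person` and `attribute_for`, the handles and the sample values are all illustrative assumptions. The point is that each census entry owns its own attributes, so no `JohnDoe_House`-style key prefixes are needed, and that one entry can be shared by several people.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Citation:
    """Hypothetical primary object: one entry within a source."""
    handle: str
    source_handle: str                   # reference to the Source (e.g. the census)
    page: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)
    media_handles: List[str] = field(default_factory=list)


@dataclass
class Person:
    name: str
    citation_handles: List[str] = field(default_factory=list)


# One census source ("S1") with two entries. All values are made up.
citations = {
    "C1": Citation("C1", "S1", page="Folio 12", attributes={"House": "brick"}),
    "C2": Citation("C2", "S1", page="Folio 40", attributes={"House": "stone"}),
}

john = Person("John Doe", ["C1"])
jane = Person("Jane Doe", ["C1"])        # same household entry, shared with John
nancy = Person("Nancy Drew", ["C2"])


def attribute_for(person: Person, key: str) -> List[str]:
    """Collect one attribute from every citation the person refers to."""
    return [citations[h].attributes[key]
            for h in person.citation_handles
            if key in citations[h].attributes]


# Correcting the shared entry once is visible to everyone who cites it.
citations["C1"].page = "Folio 12, line 7"
print(attribute_for(john, "House"), citations[jane.citation_handles[0]].page)
```

Because people hold only handles, the table of Citation objects grows with the number of distinct entries, which is the size concern Nick raises.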
From: Benny M. <ben...@gm...> - 2010-12-02 08:34:48
|
2010/12/2 Nick Hall <nic...@ho...>

> Tim Lyons wrote:
> > I want to argue very strongly for the addition of media and if wanted
> > attributes to Source References, and for Source References to become
> > first class objects so that they can be shared.
>
> Yes, I have often thought that this would be a good idea.

And I think it is a very, very, very bad idea.
I think many of us mean the same thing but use different words, and use the same words for different things. This does not allow discussion, so let's define a unique vocabulary first, and stick to it for the discussion.

Let me try to make some things clear.
You have a person, and you have a thing to store data of a source. Between the two, you will always have something to store data about the relationship between the two. At least that is how I was taught it when doing my master in applied informatics, and how I have seen it used everywhere I have worked. That is what is now source reference in Gramps. As a logical conclusion, if we make source reference shared, we need a new unique non-shared object between the two. A source reference reference? ;-)
Anyway, the word reference in Gramps _always_ indicates the relationship information between two objects, so don't use the word for things you want as core object!

The discussion will always return to the same thing: how to handle large sources in Gramps. I already discussed that when I started using Gramps, but the admins then did not really think the problem required changes.
In my book, it is best to mimic reality as closely as possible (with a minimum of complexity). So, think outside the box, and change how some objects behave.

My suggestion would be:

Source (Data=publication information) --> Source Content <----- source-object-relation ---> object

Our source-object-relation is the object storing the unique relationship, and we call that at present "source reference".

The question then becomes:
1/ is the above suggestion good enough?
2/ what data must be stored under which object? Eg, does it really make sense to store media in the relationship object? Is pagenumber not a part of the source content object? Should there still be something stored under the source-reference?

What Tim calls a shared source reference is what I would call source content. What Nick calls Citation I call Source Content. If we list what attributes should be stored under which object, it will be more clear what the best name for the object is.

One can make things more complicated with an object model as:

Source (Data=publication information) --> Source Content <----- deduction-process ----> Information <--- information-object-relation ---> object

But I do not think our users would have the time to actually work like that. An important constraint for Gramps is that it must still be easy to create a family tree, even if that means giving up the possibility of mimicking reality as closely as possible.

I don't mind changes to the core of Gramps, but they must be _completely_ worked out so we are certain things are sound. So let's discuss, but in the end, first a GEP must be made with full documentation of all the changes, to make working with large sources a joy in Gramps.

So, the next step would be somebody listing the objects above, and indicating which of our present fields go with which object, and what possible extra fields are needed.

To end the definition of vocabulary, Attributes are Data with a source (and notes, but less important). Data are key,value pairs. As a source does not link to another source, it has Data, not Attributes. Feel free to come up with better names.

Benny
|
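Benny's first chain (Source --> Source Content <-- source-object-relation --> object) can be sketched as below. This is an illustration under assumptions, not a proposal for the actual Gramps schema: the class names follow his vocabulary, while the fields, handles and sample values are invented. Note that the relation object is stored on the person and is never shared, whereas the source content it points to can be.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Source:
    handle: str
    title: str
    data: Dict[str, str] = field(default_factory=dict)    # publication information


@dataclass
class SourceContent:
    handle: str
    source_handle: str                                     # Source --> Source Content
    data: Dict[str, str] = field(default_factory=dict)     # key, value pairs


@dataclass
class SourceObjectRelation:
    """The non-shared link; one instance per (object, source content) pair."""
    content_handle: str
    note: str = ""


@dataclass
class Person:
    name: str
    relations: List[SourceObjectRelation] = field(default_factory=list)


def resolve(rel: SourceObjectRelation,
            contents: Dict[str, SourceContent],
            sources: Dict[str, Source]) -> Tuple[Source, SourceContent]:
    """Follow relation -> source content -> source."""
    content = contents[rel.content_handle]
    return sources[content.source_handle], content


sources = {"S1": Source("S1", "Parish marriage register")}
contents = {"SC1": SourceContent("SC1", "S1", {"Witness": "A. Smith"})}
bride = Person("Bride", [SourceObjectRelation("SC1", note="named as bride")])
groom = Person("Groom", [SourceObjectRelation("SC1", note="named as groom")])

source, content = resolve(groom.relations[0], contents, sources)
print(source.title, content.data)
```

Both people point at the same register entry, but each keeps its own relation note, which is the distinction Benny draws between a core object and a reference.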
From: Nick H. <nic...@ho...> - 2010-12-02 12:00:44
|
Benny,

I think we are all suggesting the same thing. You have just described it better, and we have been using different terminology.

The new primary object will be:

SourceContent
    SourceRef
    Date
    Volume/Page
    Note
    Media
    Data

All objects that contain SourceRef will be changed to contain SourceContentRef:

SourceContentRef
    Confidence
    Note

SourceRef will become the reference between a SourceContent and a Source object. I don't think that it will have any content except the source handle.

I think that I would still call "Source Content", "Citation". :)

Creating a SourceContent view would be easy. The SourceContent selector should always display a source title to provide context.

The extra level could be tedious for users when entering a new source. We would have to be careful about the design of the editors.

Would this extra level be confusing for Aunt Martha? A user could create a SourceContent without a Source.

The data stored in the SourceContent object may change with GEPS 018: Evidence style sources.

Has this been discussed before?

Nick.
|
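The conversion Nick outlines above (every existing SourceRef becomes a SourceContentRef pointing at a primary SourceContent) might be prototyped along these lines. Everything here is an assumption for illustration: the `migrate` function, the handle format, and in particular the rule that two old references describe the same content when source and page match exactly. Nick lists a Note on both objects; this sketch simply keeps the old note on the reference.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OldSourceRef:
    """Today's per-object source reference (simplified)."""
    source_handle: str
    page: str
    confidence: int
    note: str = ""


@dataclass
class SourceContent:
    handle: str
    source_handle: str       # plays the role of the new, content-free SourceRef
    page: str


@dataclass
class SourceContentRef:
    content_handle: str
    confidence: int
    note: str = ""


def migrate(old_refs: List[OldSourceRef]
            ) -> Tuple[List[SourceContent], List[SourceContentRef]]:
    """Create one SourceContent per distinct (source, page) and re-point the refs."""
    contents: Dict[Tuple[str, str], SourceContent] = {}
    new_refs: List[SourceContentRef] = []
    for old in old_refs:
        key = (old.source_handle, old.page)
        if key not in contents:
            handle = f"SC{len(contents) + 1:04d}"      # hypothetical handle format
            contents[key] = SourceContent(handle, old.source_handle, old.page)
        new_refs.append(
            SourceContentRef(contents[key].handle, old.confidence, old.note))
    return list(contents.values()), new_refs


old = [
    OldSourceRef("S1", "p. 12", 2),
    OldSourceRef("S1", "p. 12", 3, "second person on the same page"),
    OldSourceRef("S1", "p. 40", 2),
]
contents, refs = migrate(old)
print(len(contents), [r.content_handle for r in refs])
```

Exact matching would not merge "p. 12" with "page 12", which is the inconsistent page-number problem Tim mentions; a real upgrade would probably leave such merging to the user.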
From: Benny M. <ben...@gm...> - 2010-12-02 13:02:52
|
2010/12/2 Nick Hall <nic...@ho...>

> The data stored in the SourceContent object may change with GEPS 018:
> Evidence style sources.
>
> Has this been discussed before?

Many emails, no design.

We could make things more versatile and consider the information learned from a source as an object. So
1 Source can have N SourceContent
1 Information can refer to several SourceContent
1 object can have several Informations.
Like this, Information is like a citation, but also holding the conclusion itself. Anyway, also complicated.

To go to your design, I don't think SourceContent must be presented in the interface as a core object. Instead a treeview Source-SourceContent seems more natural to me. I would do the attributes differently.

Source
    1 Title
    1 Author
    1 Gramps ID
    1 Abbr
    1 Publication Information
    1 Global Confidence
    n Publication Data (key value pairs, eg Publication Date, Publisher, ...)
    n MediaRef (Region, Src, attr, notes) --> Media
    n RepoRef (Type, Callnumber) --> Repo

SourceContent
    1 Source (GrampsID)
    1 Confidence (5 values)
    1 Volume
    1 Page
    1 LogDate
    1 Linenumber
    1 Position (eg. Upper Left Corner of image)
    n Information (key, value pairs, current Data)
    n NoteIds
    n MediaRef (Region, Src, attr, notes) --> Media

SourceContentRef (called Citation in the interface, part of objects with sources)
    1 Type: Transcript or Deduction
    1 Deduction Confidence (5 values)
    1 Argumentation (one line string)
    n Note

So, in this design, one must envision that Source and SourceContent form one single editor. The source editor would have a list of SourceContent, and if you select one, you see to the right the detail about that content.
When adding a citation to eg a person, you obtain a treeview Source-SourceContent, so if you select a census entry, you immediately see the data of that entry you stored.

Confidence is given at three levels: globally for the source, for the content (same as the global value by default), and for the deduction. The SourceContentRef is there to hold the process of deducing the information you add to eg a person as coming from a source. In many cases, a pure transcript of the source is done, and no deduction happens, in which case this object contains nothing of interest. If one however makes a deduction, then one can store this here specifically. Eg you find the name Nic__ where you cannot make out what the last handwritten letters are, and you save the name as Nick, with reference to this source. Then the sourcereference can indicate why you decided to use Nick and not eg Nicki.

Time for somebody else to refine/change things.

Benny
|
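To make Benny's layout above a little more concrete, here is a small sketch of two parts of it: the Source-SourceContent tree that a selector would display, and the confidence that defaults from the source to the content. It is a simplification under assumptions, not the design itself: only a few of the listed fields are modelled, the 0 to 4 confidence scale and the function names are invented, and the argumentation text is a made-up example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Source:
    gramps_id: str
    title: str
    global_confidence: int = 2           # assumed scale: 0 (lowest) to 4 (highest)


@dataclass
class SourceContent:
    source_id: str                       # Gramps ID of the owning Source
    page: str
    confidence: Optional[int] = None     # None means "same as the source"


@dataclass
class SourceContentRef:
    """Shown as "Citation" in the interface; holds the deduction, if any."""
    content: SourceContent
    kind: str = "Transcript"             # "Transcript" or "Deduction"
    deduction_confidence: Optional[int] = None
    argumentation: str = ""


def content_confidence(content: SourceContent, source: Source) -> int:
    """Content confidence falls back to the source's global confidence."""
    if content.confidence is None:
        return source.global_confidence
    return content.confidence


def selector_tree(sources: List[Source],
                  contents: List[SourceContent]) -> Dict[str, List[str]]:
    """Group contents under their source title, as a two-level treeview would."""
    by_id = {s.gramps_id: s for s in sources}
    tree: Dict[str, List[str]] = {s.title: [] for s in sources}
    for c in contents:
        tree[by_id[c.source_id].title].append(c.page)
    return tree


census = Source("S0001", "Census")
entries = [SourceContent("S0001", "Folio 12"),
           SourceContent("S0001", "Folio 40", confidence=4)]

# The Nic__ case: the stored name is a deduction, so the reason is recorded.
ref = SourceContentRef(entries[0], kind="Deduction", deduction_confidence=1,
                       argumentation="Last letters illegible; read as Nick")

print(selector_tree([census], entries))
```

For a pure transcript the reference keeps its defaults and carries nothing of interest, as Benny notes.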
From: Tim L. <guy...@gm...> - 2010-12-03 21:51:30
|
Thanks for your suggestions Benny, I think we may be moving towards a consensus. I have created a GEPS to outline a change.

Benny Malengier wrote:
> I don't think SourceContent must be presented in the interface as a core object. Instead a treeview Source-SourceContent seems more natural to me.

Thanks for the suggestion. A treeview for the Source View and the selector works well.

Benny Malengier wrote:
> I would do the attributes different.
>
> Source
> 1 Title
> 1 Author
> 1 Gramps ID
> 1 Abbr
> 1 Publication Information
> 1 Global Confidence
> n Publication Data (key value pairs, eg Publication Date, Publisher, ...)
> n MediaRef (Region, Src, attr, notes) --> Media
> n RepoRef (Type, Callnumber) --> Repo
>
> SourceContent
> 1 Source (GrampsID)
> 1 Confidence (5 values)
> 1 Volume
> 1 Page
> 1 LogDate
> 1 Linenumber
> 1 Position (eg. Upper Left Corner of image)
> n Information (key, value pairs, current Data)
> n NoteIds
> n MediaRef (Region, Src, attr, notes) --> Media

I agree except that I wouldn't remove the Notes field from the Source. This would be too awkward for people who are already using it, and is relevant where the source is not 'large'.

I wonder whether we should keep Volume/Page instead of separate Volume, Page, Linenumber and Position for this enhancement. There is a proposal (GEPS 018) which would change the fields in the SourceContent according to a Source Type.
http://gramps-project.org/wiki/index.php?title=GEPS_018:_Evidence_style_sources
I wonder whether it would be better that we wait for this, rather than changing the fields twice. In any case, there are plenty of cases where breakdowns other than the proposed one are more appropriate.

Benny Malengier wrote:
> SourceContentRef (called Citation in the interface, part of objects with sources)
> 1 Type: Transcript or Deduction
> 1 Deduction Confidence (5 values)
> 1 Argumentation (one line string)
> n Note

I have not included this in the GEPS, because it seems to relate to how deductions are stored, and as such may not be directly related to this enhancement. Also I am concerned that this may make the user experience too complicated. In the GEPS, users who are happy with the existing interface will see little change (change always frightens users); those who want more will be able to use the additional features.

Benny Malengier wrote:
> So, in this design, one must envision that Source and sourcecontent form one single editor.

I agree, having a single editor makes things simpler for the user and ensures that the workflow does not get more complicated.

Benny Malengier wrote:
> When adding a citation to eg a person, you obtain a treeview source-sourcecontent, so if you select a census entry, you immediately see the data of that entry you stored.

I agree - a treeview will make it no more complicated to select an existing Source or SourceContent than it is at present.

Benny Malengier wrote:
> Confidence is given globally, of the content (same as globally by default), and of the deduction. The SourceContentRef is there to hold the process of deducing information you add to eg a person as coming from a source. In many cases, a pure transcript of the source is done, and no deduction happens, in which case this object contains nothing of interest. If one however makes a deduction, then one can store this here specifically. Eg you find the name Nic__ where you cannot make out what the last hand written letters are, and you save the name as Nick, with reference to this source. Then the sourcereference can indicate why you decide to use Nick and not eg Nicki.

As I mentioned, I have not included the fields of a SourceContentRef in the GEPS. They could be added if there is a general desire to do so.

The GEPS is at
http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources

--
View this message in context: http://gramps.1791082.n4.nabble.com/Storing-data-from-large-sources-tp3063962p3071825.html
Sent from the GRAMPS - Dev mailing list archive at Nabble.com. |
From: Nick H. <nic...@ho...> - 2010-12-19 14:19:14
|
Benny,

Tim has done a lot of work on the GEPS, and it is now at a stage where I think that it would be helpful if you could review it.

http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources

My main concern is that the Citation Reference editor may be rather complicated and large. What do you think?

Could we combine the Type and Deduction Confidence in the citation reference? A Transcript type would imply a high confidence, whereas a deduction would imply a lower confidence. We could go down to a "Guess" which would imply a very low confidence.

I didn't use the Source Confidence when I started to use Gramps, because I was unsure which value to choose. It is more obvious how to use the values in the GEDCOM standard (Direct/Primary, Secondary, Questionable, Unreliable) than the Gramps values (Very High, High, Normal, Low, Very Low). Perhaps we could choose more descriptive values for the Deduction Type?

Nick. |
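Nick's suggestion of folding the Deduction Confidence into the citation Type could look roughly like the following. This is a minimal sketch, not part of the GEPS: the `CitationType` enum and `implied_confidence` function are hypothetical names, and the exact level chosen for "Deduction" is an assumption (the message only says it would be lower than for a transcript).

```python
from enum import Enum


class CitationType(Enum):
    """Hypothetical citation types, following Nick's message."""
    TRANSCRIPT = "Transcript"
    DEDUCTION = "Deduction"
    GUESS = "Guess"


# Confidence implied by each type, using the existing Gramps level names.
# "Low" for a deduction is an assumption made for this sketch.
_IMPLIED = {
    CitationType.TRANSCRIPT: "High",
    CitationType.DEDUCTION: "Low",
    CitationType.GUESS: "Very Low",
}


def implied_confidence(citation_type: CitationType) -> str:
    """Return the confidence level that a citation type would imply."""
    return _IMPLIED[citation_type]
```

With a mapping like this the editor would need only one selector instead of two, at the cost of no longer being able to record, say, a high-confidence deduction.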
From: Frederico M. <fs...@gm...> - 2011-01-01 18:36:56
|
Hello,

Yet again, apologies for the delay in participating; this is an issue that I've previously discussed privately with Tim as well as on the list.

I have a different take on the problem, which doesn't mean that changes aren't welcome - especially given the detailed GEPS that Tim created, thanks a lot for that.

I still have to read the GEPS a few more times to understand it completely, but for now I have some doubts whose clarification will help my understanding of it.

For now let me quickly describe my approach to the "problem that needs to be solved" (http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources). My way of doing things is actually reflected in there, with some objections, which I will tackle in a minute.

> I have a book that details, on page 7:
>
> “In the 1870s B moved to the town of BT. It was here that I's father K was born in 1860. By the time he was 30 he had married.
> His first child M was born there. Shortly afterwards his wife died and two years later he married G. M was 12 before her brother
> I appeared.”

I would create a source for the book, or a source for the book collection (e.g. Baptism Records for a single church can have many books, divided by year range, that cover centuries. I only create one source).

> So I wish to record B, K, and the fact that K was born in 1860, and married around 1890. K's children were M and I, M was born
> around 1890 and I was born around 1900. [Actually, from other sources he was born on 5 Dec 1902.] I need to record page 7
> of this book as the source for all these pieces of information.

I would create a source reference for page 7 of the book, copy it to the clipboard and use it in each event/assertion. I would further add a specific *citation* (TEXT_FROM_SOURCE) to each different source reference that deals only with the specific event. Example: in K's birth event I would add "... It was here that I's father K was born in 1860...", etc. So, each Source Reference contains a different citation, making it unique.

> Some time later I decide I should record a transcript of the source text.

I always add the transcriptions in the Source.

> Some time later, I decide to scan that page of the book, and need to store the scan as the source.

I always add the scans to the Source. Yes, I end up having to look in the Source gallery for the specific scan that supports the Source Reference, although this is mitigated somewhat by the use of citations.

> Later still, I discover that page 212 of the same book details that I married W in 1946.

I would add a source reference to the same source, but with a different page, in the marriage event. I would also add a citation.

> Now I wish to record W, and the marriage of W and I in 1946. The source for all this is page 212 of the book, and this time I record
> the scan against the source.

Same as above.

The objections listed are:

> * The Source Reference does not allow the Media scan to be stored.

I agree. This was one of my original problems - although the scan should be present in the Source, a way to link it to a source reference would be nice.

> * The Source Reference is not shared, there is a separate instance for each place where it occurs (e.g. each event).

I depend on this behaviour, and the GEPS makes explicit mention of the way I do things. I would like to note that what I do is what is already present in GEDCOM (http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#SOURCE_CITATION):

># Actual text from the source that was used in making assertions, for example a date phrase as actually recorded or an applicable
> sentence from a letter, would be appropriate.
># Data that allows an assessment of the relative value of one source over another for making the recorded assertions (primary or
> secondary source, etc.). Data needed for this assessment is how much time from the asserted fact and when the source event was
> recorded, what type of event was cited, and what was the role of this person in the cited event.
>
> -Date when the entry was recorded in source document, ".SOUR.DATA.DATE."
> -Event that initiated the recording, ".SOUR.EVEN."
> -Role of this person in the event, ".SOUR.EVEN.ROLE".

The second paragraph is something not yet supported in Gramps (see http://bugs.gramps-project.org/view.php?id=2918&PHPSESSID=c5e5f69a0d0e4353852a8d5aa8ab66ad and http://bugs.gramps-project.org/view.php?id=2924).

Having said that:

> Note that there is an argument that separate source references for the different events is preferable, because the exact text
> that relates to that particular event can be attached. For example, for the birth event for person K, one could attach: “…the
> town of BT. It was here that I's father K was born in 1860…”. There are two objections to this:
>
> * It is difficult to identify exactly which parts of the text are relevant to each event. Should I’s father be included in the source for K’s birth?

While there will always be the need for some personal criteria (this is far from an exact science), this is no different from other decisions concerning where source references should be added. I do not understand the objection very well though (my fault), but yes: if the supporting information concerning K's birth is derived from that sentence I would use "...It was here that I's father K was born in 1860...". If I knew the place that "here" is supposed to mean I would put it inside brackets. This way I will know exactly why I have K's birth in 1860. Without this sort of event-specific citation (read: TEXT_FROM_SOURCE, a source note added to a source reference) I would have to find out by reading the entire source.

> * It is far too tedious and laborious to devise separate source texts for each event. Given that the original paragraph giving family history
> information (this is a genuine example) is quite short, it is much quicker and easier to include the whole paragraph in each reference.

Well, sharing the Source Text note would be an option... the problem here is that while sharing a supporting citation works for one-liners, not all (and certainly not most, in my experience) sources are like that. And by making Source References something "shared" it stops being possible to provide adequate citations that support the specific event.

> When it comes to adding the scan, the only option I really have is to add it to the source itself, despite the fact that the scan only relates to one page.

True, but the scanned image is from the Source. What I miss is a way to specify in a specific source reference that a certain page from the Source is used (not transferring the scans from sources to source references).

So, for me it is important that any improvement maintains the ability to keep source-reference-specific content. Since I use citations for everything (this is why they exist, and in PAF for example citations have a first-order UI element that helps a lot; I have made a feature request about this), sharing Source References would not work, since changing a citation would mean changing all of them.

A different matter is the way to "split" sources. Again using Church Records as an example, I have often felt the need for a hierarchical classification, similar to what is used in the repositories I use: looking at http://pesquisa.adporto.pt/cravfrontoffice/default.aspx?page=regShow&ID=488904&searchMode=as, on the right side one can see a tree. The organisation is hierarchical, with a top category for the Parish, which contains different "series" (Baptism, Marriages, etc), each containing an "installation unit" (a specific book, for a specific time period). Since sources in Gramps are "flat" this is not entirely different from what was done with Places.

I'm sure that this GEPS will lead to a better way to do things, I'm just presenting some initial considerations that I deem relevant.

Cheers,

Frederico |
From: Gerald B. <ger...@gm...> - 2011-01-01 19:31:54
|
Whew! Your idea of a quick answer and mine are clearly not in alignment! Anyway, I appreciate your insight and for many things your approach mirrors my own. As I originally stated the problem however, I was referring to bulk sources such as censuses or BMD registers that contain data on thousands or even millions of people. Plus, these sources frequently contain interesting information that does not have a natural home in GEDCOM or gramps, other than as textual data.

For example, Canadian censuses often record the construction material of the house where a family lived when it was polled. Now, I may eventually want a Residence event; then, the construction material might be a good event attribute. However, I wish to capture the data from the census *all at once* and *in one place*, including an image of the page where the data is found. Later, I can build other event types from the data. Also, I would like to have key/value pairs for each data point, for ease of comparison with other censuses. This idea forms the foundation of the GEPS, I believe.

So I would have an Event (Census), with a Source (1901 Census of Canada), with a Source Reference (RG31, Alberta, Calgary, District 35, Sub District 1, page 2, line 3) and a matching Source Contents containing all the data points -- as key, value pairs -- on that line in the census (for at least one Canadian census, there are over 100 data points!) plus a Media Object reference to the image of the page itself -- either stored locally or on the Library and Archives Canada site (which is also the Repository for my source).

The Census gramplet does much of what I'm talking about except that it stores the attributes as Event Reference attributes. That has limitations (especially sharability) and I would argue that the number of sheep my g.grandfather had is not an attribute of the Census but rather of my grandfather or perhaps the farm he had at the time.
It is the search for a more general solution that prompted this thread and the GEPS.

-- Gerald Britton |
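The "key/value pairs for each data point, for ease of comparison with other censuses" idea in Gerald's message can be illustrated with plain dictionaries. This is a minimal sketch only: `compare_entries` is a hypothetical helper, and the two census lines below are invented sample data, not taken from any real record.

```python
from typing import Dict, Tuple


def compare_entries(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {key: (value_in_a, value_in_b)} for shared keys whose values differ."""
    shared = sorted(a.keys() & b.keys())  # keys recorded in both entries
    return {key: (a[key], b[key]) for key in shared if a[key] != b[key]}


# Invented example entries for the same household in two censuses.
entry_1901 = {
    "Name": "John Smith",
    "Occupation": "Farmer",
    "House construction": "Frame",
}
entry_1911 = {
    "Name": "John Smith",
    "Occupation": "Farmer",
    "House construction": "Brick",
    "Sheep": "12",  # only recorded in one census, so it is not compared
}

differences = compare_entries(entry_1901, entry_1911)
```

Because every data point has a key, a change such as the house construction shows up directly, which would be much harder if the same facts were buried in a free-text note.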
From: Frederico M. <fs...@gm...> - 2011-01-02 16:15:02
|
Hello,

2011/1/1 Gerald Britton <ger...@gm...>:
> Whew! Your idea of a quick answer and mine are clearly not in
> alignment!

Ehe, this one wasn't so long ago in hindsight, true.

> Anyway, I appreciate your insight and for many things your
> approach mirrors my own. As I originally stated the problem however,
> I was referring to bulk sources such as censuses or BMD registers that
> contain data on thousands or even millions of people. Plus, these
> sources frequently contain interesting information that does not have a
> natural home in GEDCOM or gramps, other than as textual data.

Indeed, my apologies, I almost entirely disregarded your initial message to focus on the GEPS.

> For example, Canadian censuses often record the construction material
> of the house where a family lived when it was polled.

Damn, you guys are thorough!

> Now, I may
> eventually want a Residence event; then, the construction material
> might be a good event attribute. However, I wish to capture the data
> from the census *all at once* and *in one place*, including an image
> of the page where the data is found. Later, I can build other event
> types from the data. Also, I would like to have key/value pairs for
> each data point, for ease of comparison with other censuses. This
> idea forms the foundation of the GEPS, I believe.

Yes, as I added in a later message, my perspective is tinted by my almost complete reliance on Church Records, not Census (much to my sadness).

> So I would have an Event (Census), with a Source (1901 Census of
> Canada), with a Source Reference (RG31, Alberta, Calgary, District 35,
> Sub District 1, page 2, line 3) and a matching Source Contents
> containing all the data points -- as key, value pairs -- on that line
> in the census (for at least one Canadian census, there are over 100
> data points!) plus a Media Object reference to the image of the page
> itself -- either stored locally or on the Library and Archives Canada
> site (which is also the Repository for my source).

I see, it makes sense given the tabular nature of Census... the problem being the implementation of key/value pairs I suppose...

> The Census gramplet does much of what I'm talking about except that it
> stores the attributes as Event Reference attributes. That has
> limitations (especially sharability) and I would argue that the number
> of sheep my g.grandfather had is not an attribute of the Census but
> rather of my grandfather or perhaps the farm he had at the time.

For me it is first of all an "attribute" of the source, which can then be used as something that backs up attributes of a person or event. The number of sheep is no different, I think, from e.g. the number of children a person is said to have in a particular source (say, a death certificate or something like that).

Having said that, I do understand the problem here: when a Census of a particular year is considered a Source, and each entry a SourceReference, all the information present in a single line (let alone a single page, I imagine...) is used to back up numerous events, hence the need to share these SourceRefs (since copying would mean that correcting the census entry would have to be done multiple times)... the only thing that comes to mind is that this information should not be stored in the SourceRef but in the Source, and SourceRefs would then reference it. This works for all kinds of notes, but not for Images. This is something that could be impractical, I'm not sure...

Cheers,

Frederico |
From: Tim L. <guy...@gm...> - 2011-01-01 22:34:57
|
Thanks for your thoughtful reply. I was well aware of the approach you take, and was careful to ensure that nothing that was done would prevent you using Gramps just as you want to.

Frederico Muñoz wrote:
>> I have a book that details, on page 7:
>>
>> “In the 1870s B moved to the town of BT. It was here that I's father K
>> was born in 1860. By the time he was 30 he had married.
>> His first child M was born there. Shortly afterwards his wife died and
>> two years later he married G. M was 12 before her brother
>> I appeared.”
>
> I would create a source reference for page 7 of the book, copy it to
> the clipboard and use it in each event/assertion. I would further add
> a specific *citation* (TEXT_FROM_SOURCE) to each different source
> reference that deals only with the specific event. Example: in K's
> birth event I would add "... It was here that I's father K was born in
> 1860...", etc. So, each Source Reference contains a different
> citation, making it unique.

Yes, that is fine, and if you have the time to extract the specific words for each citation, it is a good approach. However, my point was that the family history part of the page was only those two sentences, and I am going to use those sentences to support many different facts. Sometimes, life is just too short to extract specific words. I think it is much better to help users to provide *some* source citation information, even if they do not take the time and effort to input some perfect data. With the GEPS, the user can input a citation, and then use it repeatedly if they want.

Frederico Muñoz wrote:
>> The objections listed are:
>
>> * The Source Reference does not allow the Media scan to be stored.
>
> I agree. This was one of my original problems - although the scan
> should be present in the Source, a way to link it to a source
> reference would be nice.

You would be able to continue to store the scan in the source. However, if you wanted the scan to be related to the citation, then you would be able to store it there (either as well as in the source, or instead of in the source).

Frederico Muñoz wrote:
>> * The Source Reference is not shared, there is a separate instance for
>> each place where it occurs (e.g. each event).
>
> I depend on this behaviour, and the GEPS makes explicit mention to the
> way I do things. I would like to note that what I do is what it
> already present in GEDCOM

Nothing in the GEPS *forces* the citation to be shared. It would continue to be possible to have separate citation instances. In fact this would be the normal situation, and you would have to explicitly select an existing citation if you wanted it shared.

The proposal is entirely consistent with GEDCOM and the way source citations are used there with text (notes) and multimedia links.

Frederico Muñoz wrote:
>> Note that there is an argument that separate source references for the
>> different events is preferable, because the exact text
>> that relates to that particular event can be attached. For example, for
>> the birth event for person K, one could attach: “…the
>> town of BT. It was here that I's father K was born in 1860…”. There are
>> two objections to this:
>>
>> * It is difficult to identify exactly which parts of the text are
>> relevant to each event. Should I’s father be included in the source for
>> K’s birth?
>
> While there will always be the need for some personal criteria (this
> is far from exact an exact science), this is no different from other
> decisions concerning where should source references be added. I do not
> understand the objection very well though (my fault), but yes: if the
> supporting information concerning K's birth is derived from that
> sentence I would use "...It was here that I's father K was born in
> 1860...". If I knew the place that "here" is supposed to mean I would
> put it inside brackets. This way I will know exactly why I have K's
> birth in 1860. Without this sort of event-specific citations (read,
> TEXT_FROM_SOURCE, a source note added to a source reference) I would
> have to go find out by reading the entire source.

Sorry this wasn't very clear. I agree that it is no different from any other decisions about how source information should be input. The point I was trying to make is that it is rather tedious to work out exactly which parts of the source text relate to each event that you are trying to provide the source for.

Also, think about what happens when you (or someone else) later goes back to the source to check the conclusion. In practice you are likely to want to re-read the whole paragraph, because the doubt in your mind arises from some thought that was not present when you wrote the original source reference. Therefore it is quite likely that you want to check everything, not just the words that you had selected.

Having said all this though, the GEPS would not prevent you carefully citing separate exact words for each source.

Frederico Muñoz wrote:
> Well, sharing the Source Text note would be an option...the problem
> here is that while sharing supporting citation work for one-liners not
> all (and certainly not most in my experience) sources are like that.
> And by making Source References something "shared" it stops being
> possible to provide adequate citations that support the specific
> event.
>
> So, for me it is important that any improvement maintains the ability
> to keep source reference specific content. Since I use citations for
> everything (this is why they exist, and in PAF for example citations
> have a first-order UI element that helps a lot, I have made a feature
> request about this) sharing Source References would not work since
> changing a citation would mean changing all of them.

Just to emphasise the point again, the GEPS does not *force* you to share citations. You could continue to keep them unique to each event or other object. The GEPS *does* make citations first class UI objects, so that they can be examined as you wish.

Frederico Muñoz wrote:
> A different matter is the way to "split" sources. Again using Church
> Records as an example I have often felt the need for an hierarchical
> classification, similar to what is used in the repositories I use:
> looking at
> http://pesquisa.adporto.pt/cravfrontoffice/default.aspx?page=regShow&ID=488904&searchMode=as
> in the right side one can see a tree. The organisation is
> hierarchical, with a top category for the Parish, which contains
> different "series" (Baptism, Marriages, etc), each containing an
> "installation unit" (a specific book, for a specific time period).
> Since sources in Gramps are "flat" this is not entirely different from
> what was done with Places.

I haven't suggested a hierarchical arrangement (except you can regard citations and sources as a two level hierarchy). I think that a hierarchy would be much too complicated, especially for Aunt Martha. The user would have to decide exactly which things were going to be at each level. He would have to decide how the 'location' attribute was going to be used at each level (e.g. at one level, the location refers to volume number, at the next to page, and at the next to line number). This provides more opportunity for confusion, and for inconsistency between different sources within one family tree.

Finally, a hierarchical approach is not consistent with GEDCOM. With the approach in the GEPS, the Volume/Page location in the citation is consistent with GEDCOM, and one can take advice from GEDCOM documents as to how to structure your sources.

--
View this message in context: http://gramps.1791082.n4.nabble.com/Storing-data-from-large-sources-tp3063962p3170512.html
Sent from the GRAMPS - Dev mailing list archive at Nabble.com. |
From: Frederico M. <fs...@gm...> - 2011-01-02 15:56:27
|
Hi Tim, First of all a Happy New Year. I think I've focused too much on the background information in my comment, and too little on the GEPS itself, apologies. I didn't want to convey the idea that I was opposed to the proposed solution, nor that it didn't address my needs (after all it's quite clear that you took explicit care in documenting the approach I use). I'm sending a new batch of comments; I am being a bit of a "devil's advocate" on some points, but only because I think that the more comments the GEPS gets the better. There is also a bigger difference here: I almost never use Census information, so I'm less sensitive to some of the problems that are being addressed, which is something that no doubt tints my perspective. 2011/1/1 Tim Lyons <guy...@gm...>: > > Thanks for your thoughtful reply. > > I was well aware of the approach you take, and was careful to ensure that > nothing that was done would prevent you using Gramps just as you want to. I'm not opposed to changes, even if they mean that the way I use it would need to be changed. I'm not married to "my way" of doing things, so I can easily adapt to any model that satisfies my needs. In any event it's good to have as little impact as possible on existing practices (but that shouldn't be something that gets in the way of a better design). > With the GEPS, the user > can input a citation, and then use it repeatedly if they want. One interesting thing about the proposal is that there will be (correct me if I'm wrong) another layer added ("Source Citation Information") that will *not* be shared and will be specific to each citation. It's not entirely impossible to use the same arguments to advocate that this one should also be shared, ad infinitum... > You would be able to continue to store the scan in the source. However, if > you wanted the scan to be related to the citation, then you would be able to > store it there (either as well as in the source, or instead of in the > source). This is something useful. 
I sometimes use the event gallery, but this would be better. > Nothing in the GEPS *forces* the citation to be shared. It would continue to > be possible to have separate citation instances. In fact this would be the > normal situation, and you would have to explicitly select an existing > citation if you wanted it shared. The proposal is entirely consistent with > GEDCOM and the way source citations are used there with text (notes) and > multimedia links. Yes, I can use different citations. However once the change is made it would be a bit strange to have different citations that are identified by exactly the same citation data and only differ in the notes, gallery, etc. What I mean by this is that I'm more than willing to adapt to any model instead of maintaining my own, and these decisions will influence how people will use Gramps by default (e.g. I do not use a "one source per page" approach because I feel it doesn't fit in the existing model, even if it is possible). > Also, think about what happens when you (or someone else) later > goes back to the source to check the conclusion. In practice you are likely > to want to re-read the whole paragraph, because the doubt in your mind > arises from some thought that was not present when you wrote the original > source reference. Therefore it is quite likely that you want to check > everything, not just the words that you had selected. Do note that it is already possible to share SourceRef notes, so even if the SourceRefs themselves are not shareable the notes are. I actually use this for the one-line examples: I use the same TEXT_FROM_SOURCE note in more than one SourceRef, and changing it once will be reflected in all of them. I'm not saying that this addresses your needs (or even the needs of most users), just that it is possible. > Having said all this though, the GEPS would not prevent you carefully citing > separate exact words for each source. 
That is good, not least because I want to make a GEPS/request/debate about adding first-class UI support for those citations, even if via a popup or something. One of the problems here (and you have been very careful in mentioning it, your GEPS is overall very detailed and thorough) is that these kinds of changes also have an impact on other GEPS and feature requests, or even sometimes depend on them. > Just to emphasise the point again, the GEPS does not *force* you to share > citations. You could continue to keep them unique to each event or other > object. The GEPS *does* make citations first class UI objects, so that they > can be examined as you wish. Sorry, my mistake here. I'm using "citations" wrongly: when I use citations I mean the TEXT_FROM_SOURCE note that is part of a SourceRef, while you mean SourceRefs (and rightly so since they are SOURCE_CITATIONS in GEDCOM). What I mean by "first class UI citations" is a way to quickly see what is the text in the source that a particular source reference contains and that supports that particular event. > I haven't suggested a hierarchical arrangement (except you can regard > citations and sources as a two level hierarchy). I think that a hierarchy > would be much too complicated, especially for Aunt Martha. The user would > have to decide exactly which things were going to be at each level. He would > have to decide how the 'location' attribute was going to be used at each > level (e.g. at one level, the location refers to volume number, at the next > to page, and at the next to line number). This provides more opportunity for > confusion, and for inconsistency between different sources within one family > tree. Finally, a hierarchical approach is not consistent with GEDCOM. With > the approach in the GEPS, the Volume/Page location in the citation is > consistent with GEDCOM, and one can take advice from GEDCOM documents as to > how to structure your sources. 
This is sufficiently different from the main aspects of your GEPS that it can be considered a different issue altogether, I only mentioned it because it could be useful to think about it a bit more (perhaps in a different GEPS). I tend to think that it is possible to have a non-flat approach that is GEDCOM compatible, not unlike the Locations Tree View. However I haven't really thought about it much and it would surely be something not trivial (and, again, that would be entangled with other GEPS, like the "Evidences-type citations", etc). Best regards, Frederico |
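The shared-note behaviour described in the message above (separate SourceRefs that point at one TEXT_FROM_SOURCE note) can be illustrated with a rough Python sketch. The `Note` and `SourceRef` classes here are hypothetical stand-ins written for this illustration only; they are not the actual Gramps classes, and the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """Hypothetical stand-in for a TEXT_FROM_SOURCE note."""
    text: str


@dataclass
class SourceRef:
    """Hypothetical stand-in for an unshared source reference."""
    source_title: str
    page: str
    notes: List[Note] = field(default_factory=list)


# One note object, attached to two separate source references.
shared_note = Note("It was here that I's father K was born in 1860.")
birth_ref = SourceRef("Family history book", "7", [shared_note])
marriage_ref = SourceRef("Family history book", "7", [shared_note])

# Editing the note once is visible through both references,
# even though the references themselves are distinct objects.
shared_note.text = "[In BT] It was here that I's father K was born in 1860."
```

Under these assumptions the two references stay independent (each could carry different extra notes), while the shared text is maintained in one place.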
From: Tim L. <guy...@gm...> - 2011-01-15 17:37:25
|
Frederico Muñoz wrote: > > One interesting thing about the proposal is that there will be > (correct me if I'm wrong) another layer added ("Source Citation > Information") that will *not* be shared and will be specific to each > citation. It's not entirely impossible to use the same arguments to > advocate that this one should also be shared, ad infinitum... > Actually, the original proposed solution (in section 2.3 of the GEPS) does not have any information in the "CitationRef", so there would not be any information that is specific, and hence there would not be an argument for an infinite regress. The design adds information to the CitationRef to support deduction content, but this addition is not an essential part of the GEPS. -- View this message in context: http://gramps.1791082.n4.nabble.com/Storing-data-from-large-sources-tp3063962p3219303.html Sent from the GRAMPS - Dev mailing list archive at Nabble.com. |
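To make the arrangement discussed in this thread a little more concrete, here is a minimal Python sketch of a two-level source/citation model in which sharing a citation is optional. It is only an illustration under stated assumptions: the `Source`, `Citation` and `Event` classes, their fields, and the `events_citing` helper are all hypothetical names invented for this sketch, not the Gramps data model or the design in the GEPS. The reference from an event to a citation is a bare object reference, mirroring the original proposal in which the "CitationRef" carries no information of its own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    """Hypothetical: the book, film or register as a whole."""
    title: str


@dataclass
class Citation:
    """Hypothetical: a location within a source, plus text and scans."""
    source: Source
    page: str
    text: str = ""
    media: List[str] = field(default_factory=list)


@dataclass
class Event:
    """Hypothetical: any object that can cite a source."""
    description: str
    citations: List[Citation] = field(default_factory=list)


def events_citing(citation: Citation, events: List[Event]) -> List[Event]:
    """Return the events that reference this exact citation object."""
    return [e for e in events if any(c is citation for c in e.citations)]


book = Source("Family history book")

# A citation entered once and explicitly reused for two events.
page7 = Citation(book, "page 7", text="It was here that I's father K was born in 1860.")
k_birth = Event("Birth of K", [page7])
m_birth = Event("Birth of M", [page7])

# A citation kept unique to one event, with its own exact wording.
k_marriage = Event(
    "Marriage of K",
    [Citation(book, "page 7", text="By the time he was 30 he had married.")],
)

all_events = [k_birth, m_birth, k_marriage]
```

In this sketch `events_citing(page7, all_events)` finds the two birth events, while the marriage event keeps a separate citation of the same page, so both styles of working coexist against one `Source` object.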