GEDCOM format details - poss a bit off topic

2011-12-17
2013-05-30
  • David Ledger
    2011-12-17

    Would it be true to say that in a GEDCOM file:

    1.   Level 0 records only ever have the @…@ in the 2nd field.
    2.   Level 0 records only have 3 fields apart from HEAD and TRLR records.
    3.   Level 1, 2, & 3 records that include a @…@ have it only as the 3rd field.
    4.   Higher levels never contain a @…@ field.

    My limited GEDCOMs conform to those rules, but I can't be certain that it would always be the case.
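
    One way to test the four statements against a real file is a small checker. The following is only a Python sketch, not the Perl script mentioned later in this thread; the names check_xref_rules and XREF are made up for illustration. It assumes fields are separated by whitespace and that an @…@ field has no spaces or further @ characters inside it.

```python
import re

# Assumption: an xref/pointer field looks like @I12@ (no spaces, no inner '@').
XREF = re.compile(r"^@[^@\s]+@$")

def check_xref_rules(lines):
    """Return (line_no, rule_no, text) for every line that breaks one of the four statements."""
    violations = []
    for line_no, raw in enumerate(lines, start=1):
        fields = raw.lstrip("\ufeff").split()
        if not fields or not fields[0].isdigit():
            continue  # blank or malformed line: not covered by the four statements
        level = int(fields[0])
        # 1-based positions of every field that looks like @...@
        xref_positions = [i + 1 for i, f in enumerate(fields) if XREF.match(f)]
        if level == 0:
            # Statement 1: at level 0 an @...@ may only be the 2nd field.
            if any(p != 2 for p in xref_positions):
                violations.append((line_no, 1, raw))
            # Statement 2: level 0 lines have exactly 3 fields, except HEAD and TRLR.
            is_head_or_trlr = len(fields) >= 2 and fields[1] in ("HEAD", "TRLR")
            if not is_head_or_trlr and len(fields) != 3:
                violations.append((line_no, 2, raw))
        elif level <= 3:
            # Statement 3: at levels 1 to 3 an @...@ may only be the 3rd field.
            if any(p != 3 for p in xref_positions):
                violations.append((line_no, 3, raw))
        elif xref_positions:
            # Statement 4: deeper levels never contain an @...@ field.
            violations.append((line_no, 4, raw))
    return violations

sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "0 TRLR",
]
print(check_xref_rules(sample))
```

    Running it over an exported file (for example check_xref_rules(open(path, encoding="utf-8"))) lists every line that breaks one of the statements, which shows whether a particular set of files conforms. It cannot prove that all files will.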

    Thanks,
    David

     
  • David Ledger
    2011-12-18

    I was rather hoping that someone who has worked a lot with GEDCOM files would just know. I suppose what I really want to know is are those four statements true for GEDCOM files produced by PhpGedView?

    It must be nearly 20 years since I worked regularly with BNF definitions, but the method used in that definition of the GEDCOM format is a poor relation of BNF. It's fine for showing how the data you have should be put in, but no good for going the other way. It would take a long time to work out what results could be obtained from unspecified data.

    David

     
  • Stephen Arnold
    2011-12-18

    David
    The standard clearly identifies the tags and levels allowed. Not having any idea why your 'theory' would be important (or of any real use), I wouldn't take the time to compare it against the standard.  PGV's coding complies with the standard and makes an effort to be sure you can't/don't enter data that would be non-compliant via the GUI. Of course you can enter raw data in any misconfiguration of your choosing.

    However:
    Neither 0 HEAD nor 0 TRLR has an @xref@ "2nd field", so they don't conform to your theory
    and
    generally speaking, your theory is simply a rehash of the rules used to create the standard, although again, I will not take the time to verify each condition as I can't see any significance unless you were writing code yourself - which you have not revealed.
    -Stephen

     
  • David Ledger
    2011-12-18

    I have two small gedcom files where each one shares a single person with my main gedcom file. I have already written a short Perl script to renumber and join two files that works with my files. There have been threads in the forum about the problems of merging trees that never really come up with a universal answer. If all practical PhpGedView gedcom files conform to my 'rules' then the script could be useful to others. I suspect that there is more complexity possible that would stop the script being a generic solution, otherwise merging gedcoms that have only one common person would be easy using my trivial script.

    Those who wrote PhpGedView will have been taking values from gedcom file fields and putting them back from the beginning. I just expected someone to know that you only need to consider the 2nd field in level 0 records, and the 3rd field in level 1, 2 & 3 records when it comes to @xref@, as you won't find a @xref@ anywhere else. … Or not.

    By using the concept of 'renumbering' I have already restricted my script to files that use an xref of
    <single-alpha-char><number>
    like PhpGedView ones. But merging those is useful.
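
    The renumbering idea can be sketched as follows. This is a Python illustration under the same assumptions as the four statements above, not the actual Perl script. The helper names highest_ids and renumber are hypothetical. Xrefs are assumed to have the form <single-alpha-char><number>, and only the 2nd field of level 0 lines and the 3rd field of other lines are examined. Removing the duplicate record for the shared person and the second file's HEAD and TRLR is left out.

```python
import re

# Assumption: xrefs look like @I12@, i.e. <single-alpha-char><number>.
XREF = re.compile(r"@([A-Za-z])(\d+)@")

def highest_ids(lines):
    """Highest number in use per prefix letter, e.g. {"I": 40, "F": 12}."""
    highest = {}
    for line in lines:
        for prefix, num in XREF.findall(line):
            highest[prefix] = max(highest.get(prefix, 0), int(num))
    return highest

def renumber(lines, offsets, shared=None):
    """Shift each xref by the offset for its prefix letter.

    `shared` maps an id in this file to the id of the same person in the
    main file (e.g. {"I2": "I1"}); those ids are replaced, not shifted.
    """
    shared = shared or {}
    out = []
    for line in lines:
        parts = line.rstrip("\r\n").split(" ", 2)
        # 2nd field on level 0 lines, 3rd field on all other lines.
        idx = 1 if parts[0] == "0" else 2
        if idx < len(parts):
            m = XREF.fullmatch(parts[idx])
            if m:
                old = m.group(1) + m.group(2)
                shifted = "%s%d" % (m.group(1), int(m.group(2)) + offsets.get(m.group(1), 0))
                parts[idx] = "@" + shared.get(old, shifted) + "@"
        out.append(" ".join(parts))
    return out

main = ["0 @I1@ INDI", "1 FAMS @F1@", "0 @F1@ FAM", "1 HUSB @I1@"]
other = ["0 @I1@ INDI", "1 FAMC @F1@", "0 @I2@ INDI", "1 FAMS @F1@",
         "0 @F1@ FAM", "1 HUSB @I2@", "1 CHIL @I1@"]

# I2 in the second file is the same person as I1 in the main file.
for line in renumber(other, highest_ids(main), shared={"I2": "I1"}):
    print(line)
```

    Using the highest id in the main file as the offset keeps the second file's ids clear of the first without needing a lookup table for anything except the shared person.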

    David

     
  • Stephen Arnold
    2011-12-18

    David
    Developing a true merging mechanism would be interesting, but it has proven, time and time again, to come down to how willing you are to let algorithms make choices from inexact, non-matching data.  Combining and renumbering two files is quite easy, and there are XLS spreadsheets, free EXE programs, commercial programs (genmerge and genmatcher), open source projects (gedmerge), and genealogy software already available that do an excellent job. For Mac users, GEDITCOM lets you select the master file and devise the primary (I1) XREF and renumber the entire combined GEDCOM as a new version, leaving the previous files untouched.

    True merging software is problematic and even the idea of merging two GEDCOMs needs serious consideration. Given the personal variations in the methods of keeping family trees, even a merge of standard-compliant files is unlikely to be clean and leaves much cleanup, and we have not even discussed sourcing standards.  Personally, I'm all for using the newly acquired data as an excellent lead to reentering the information, not simply adding it to my efforts. PLAC, SOUR and more are never right.
    -Stephen

     
  • David Ledger
    2011-12-19

    Ok, so it's not going to be of use generally. I use Macs and I think I looked at GEDITCOM a few years back. Personally I prefer the simplicity of a Perl script over a GUI app that hides what it's actually doing. For speed of checking things I often browse an HTMLised gedcom file rather than use PhpGedView, but PhpGedView wins hands down for data entry of course.

    Thanks,
    David