GEDCOM validity

Help
2020-04-07
2020-04-13
  • David Ledger

    David Ledger - 2020-04-07

    I'm still having some oddities with my GEDCOM file. If I use the
    'Gedcom checker' I get no Critical errors, but several yards of ordinary
    Errors. They are all similar. This GEDCOM was exported from FindMyPast.

     With two lines of context, one example is:
    

    0000626 2 SOUR @S70@
    0000627 3 PAGE Archive reference: 1D63/3; Archive: Record Office for
    Leicestershire, Leicester & Rutland; Record set: Leicestershire
    Baptisms; Subcategory: Parish Baptisms; Category: Birth, Marriage &
    Death (Parish Registers); Collections from: England, United
    0000628 [[4 CONC Kingdom;]] invalid tag; see INDI @I1003@
    0000629 [[3 REFN klzzwxh:0002]] invalid tag; see INDI @I1003@
    0000630 1 SEX F
    0000631 1 BIRT

    The highlighted error lines are 0000628 and 0000629.

    I've already edited the FindMyPast 'REF' tags to be 'REFN', but that
    hasn't fixed it. Looking at the GEDCOM 5.5 spec available as a pdf at
    https://www.familysearch.org/developers/docs/guides/gedcom
    and at the error, it looks like a PAGE tag can't have a CONC, and that a
    SOUR pointer tag can't have a REFN. Is this the case, in phpgedview at
    least? I can hopefully join the CONC text onto the end of the PAGE tag
    value, but what should happen to the REFN tag values?

    TIA,
    David

     
  • Gerry Kroll

    Gerry Kroll - 2020-04-07

    The 3 PAGE should just be 3 PAGE 1D63/3

    The text following "Archive: " is really a REPO, but you could just stuff it into a NOTE. Making this into a valid REPO would take a lot of work, but if you mention this archive a number of times, you only have to do this once, and you reference that REPO with a 3 REPO @Rxxxx@ as often as needed.
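A minimal sketch of the split described above, assuming the FindMyPast PAGE value is always a run of `Key: value;` fields as in the sample. The function names are hypothetical, and the leftover fields are written to a NOTE at the same level as the PAGE rather than to a real REPO record. It does not re-split long NOTE values into CONC lines.

```python
def split_page_value(value):
    """Split 'Key: value; Key: value;' text into a list of (key, value) pairs."""
    fields = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, rest = part.partition(":")
        if sep:
            fields.append((key.strip(), rest.strip()))
        else:
            # No colon: keep the text, with an empty key.
            fields.append(("", part))
    return fields


def rewrite_page(level, value):
    """Return replacement GEDCOM lines for one PAGE value.

    If an 'Archive reference' field is present, it becomes the PAGE value
    and everything else goes into a sibling NOTE. Otherwise the PAGE line
    is returned unchanged.
    """
    fields = split_page_value(value)
    refs = [v for k, v in fields if k == "Archive reference"]
    if not refs:
        return [f"{level} PAGE {value}"]
    others = "; ".join(
        f"{k}: {v}" if k else v
        for k, v in fields
        if k != "Archive reference"
    )
    lines = [f"{level} PAGE {refs[0]}"]
    if others:
        lines.append(f"{level} NOTE {others}")
    return lines


for line in rewrite_page(
    3, "Archive reference: 1D63/3; Archive: Record Office; Record set: Leicestershire Baptisms;"
):
    print(line)
```

This only works if the export uses ':' and ';' consistently, so it is worth spot-checking the output before overwriting anything.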

    If PhpGedView isn't complaining, why not just ignore the Gedcom Checker complaints? Let whoever takes over from you sort it out. It's obvious what's intended.

     
    • David Ledger

      David Ledger - 2020-04-08

      On 2020-04-08 00:52, Gerry Kroll wrote:

      The 3 PAGE should just be 3 PAGE 1D63/3

      The text following "Archive: " is really a REPO, but you could just
      stuff it into a NOTE. Making this into a valid REPO would take a lot of
      work, but if you mention this archive a number of times, you only have
      to do this once, and you reference that REPO with a 3 REPO @Rxxxx@ as
      often as needed.

      I can program in 'awk' among other things, so sorting such problems is
      quite easy for me. REPO looks like it should only be used in a
      Repository definition and SOUR entries. I can truncate the 'Archive:'
      part and remove the CONCs easily as long as FindMyPast has included that
      data in the SOUR records and its use of the ':' is consistent.

      I'll sort this and see if the REFNs still cause a problem.

      If PhpGedView isn't complaining, why not just ignore the Gedcom Checker
      complaints? Let whoever takes over from you sort it out. It's obvious
      what's intended.

      Because there are many hundreds of these. There may be some real problem
      lurking among them.

      This isn't being done for someone else. When I'm dead I don't know who
      will take it over. I'd like to get it into a good valid state so I can
      upload to Ancestry and FindMyPast where others might be able to gain
      from it. My daughter is interested but I doubt she could be focussed
      enough to develop it. It would have to be hosted by someone else because
      she couldn't manage the AWS system it now runs on.

      Thanks,
      David

       
      • David Ledger

        David Ledger - 2020-04-11

        I've now sorted the PAGE lines that contain more than a page identifier
        in the cases of 'Archive reference:' and another similar one. There are
        still some other cases to do, but they don't flag as errors. I've
        removed the CONC lines after PAGE lines, putting the CONC contents on to
        the end of the PAGE lines (to be sorted later); re-written the 'REFN's
        as 'NOTE REFN's so I know where they came from. I now only have four
        pages of errors.
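The clean-up described above could look roughly like the sketch below. It is written in Python rather than awk, and the function names are hypothetical. GEDCOM defines CONC as joining with no added space, so the default `joiner` is empty. The FindMyPast sample ("United" followed by "CONC Kingdom;") looks as if it needs a space, so that is left as a parameter. Limiting the REFN rewrite to levels above 1 is an assumption.

```python
import re

# level, tag, optional value
LINE_RE = re.compile(r"^(\d+) (\S+) ?(.*)$")


def fold_conc_into_page(lines, joiner=""):
    """Append the value of any CONC directly under a PAGE to that PAGE line."""
    out = []
    page_level = None  # level of the PAGE line still open for folding
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            out.append(line)
            page_level = None
            continue
        level, tag, value = int(m.group(1)), m.group(2), m.group(3)
        if tag == "PAGE":
            page_level = level
            out.append(line)
        elif tag == "CONC" and page_level is not None and level == page_level + 1:
            out[-1] += joiner + value
        else:
            page_level = None
            out.append(line)
    return out


def refn_to_note(line):
    """Rewrite 'n REFN x' as 'n NOTE REFN x' for levels above 1 (assumption)."""
    m = LINE_RE.match(line)
    if m and m.group(2) == "REFN" and int(m.group(1)) > 1:
        return f"{m.group(1)} NOTE REFN {m.group(3)}"
    return line


sample = [
    "2 SOUR @S70@",
    "3 PAGE Collections from: England, United",
    "4 CONC Kingdom;",
    "3 REFN abc",
    "1 SEX F",
]
for line in (refn_to_note(l) for l in fold_conc_into_page(sample, joiner=" ")):
    print(line)
```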

        I have fixed one of my
        0 INDI
        ...
        1 ADDR
        2 CONT
        ...
        reported errors by adding a RESI level:
        0 INDI
        ...
        1 RESI
        2 ADDR
        3 CONC
        ...
        and the checker seems happy with that. To fix another ADDR error where
        there was no data I removed the ADDR using 'Edit raw GEDCOM record' and
        then added the real address using phpgedview. The program added it as a
        level 1 ADDR and then the checker reported it as an error, so phpgedview
        is doing the wrong thing here.
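As a sketch of the RESI fix described above: the hypothetical helper below takes the lines of one INDI record, puts each level-1 ADDR under a new `1 RESI`, and moves its subordinate lines down one level. It assumes the caller passes only INDI records, and that RESI is the right parent, which is what satisfied the checker here.

```python
import re

LINE_RE = re.compile(r"^(\d+) (.*)$")


def wrap_level1_addr_in_resi(record_lines):
    """Move every level-1 ADDR (and its sub-lines) under a new '1 RESI'."""
    out = []
    inside_addr = False  # True while we are below a level-1 ADDR
    for line in record_lines:
        m = LINE_RE.match(line)
        if not m:
            out.append(line)
            inside_addr = False
            continue
        level, rest = int(m.group(1)), m.group(2)
        if level == 1 and (rest == "ADDR" or rest.startswith("ADDR ")):
            out.append("1 RESI")
            out.append(f"2 {rest}")
            inside_addr = True
        elif inside_addr and level >= 2:
            # CONT/CONC and other children of the ADDR go one level deeper.
            out.append(f"{level + 1} {rest}")
        else:
            inside_addr = False
            out.append(line)
    return out


record = [
    "0 @I1@ INDI",
    "1 NAME A /B/",
    "1 ADDR 1 High Street",
    "2 CONT Leicester",
    "1 SEX F",
]
for line in wrap_level1_addr_in_resi(record):
    print(line)
```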

        As a part of the recording of a burial, phpgedview allows a cemetery to
        be added, but CEME has been removed from the standard and 'check'
        reports it as an invalid tag. Any idea what should be used instead?

        Are these (ADDR and CEME) things that should be and are likely to be
        fixed, or have fixes stopped?

        Thanks,
        David

         
  • Gerry Kroll

    Gerry Kroll - 2020-04-11

    You should NOT be editing raw GEDCOM records unless there's no alternative.

    CEME and its accompanying address are sub-records of the BURI and CREM level-1 tags, like so:

    1 BURI
    2 CEME foo
    2 DATE 20 MAR 2020
    2 PLAC Ottawa, ON, Canada
    2 ADDR bar
    

    So: no fix is required. Use PhpGedView's built-in editing to edit the BURI or CREM fact.

     
    • David Ledger

      David Ledger - 2020-04-11

      On 2020-04-11 15:42, Gerry Kroll wrote:

      You should NOT be editing raw GEDCOM records unless there's no alternative.

      CEME and its accompanying address are sub-records of the BURI and CREM
      level-1 tags, like so:

      1 BURI
      2 CEME foo
      2 DATE 20 MAR 2020
      2 PLAC Ottawa, ON, Canada
      2 ADDR bar

      So: no fix is required. Use PhpGedView's built-in editing to edit the
      BURI or CREM fact.

      But the checker gives me:
      0025236 2 NOTE From Burial record
      0025237 1 BURI
      0025238 [[2 CEME St. Margaret's, Leicester, Leicester, England]] (ValueError('No closing quotation')) invalid tag; see INDI @I1958@
      0025239 2 DATE 26 MAR 1816
      0025240 2 PLAC Leicester, St Margaret's, Leicestershire, England

      And also there's the issue of the level 1 ADDR that is generated by
      phpgedview and the checker reports as an error.

      David

       
  • Gerry Kroll

    Gerry Kroll - 2020-04-11

    Exactly WHERE and HOW did you enter the "real address"?

    The GEDCOM Checker doesn't claim to be perfect. It's intended to identify entries in your GEDCOM that might be in error.

    You should ignore the unpaired apostrophe "error" since it's caused by a legitimately unpaired apostrophe. There's just no way the checker can determine that the unpaired apostrophe is not an error. The rules for using such characters in English, never mind other languages, are too complex to even consider implementing them programmatically.

    The only corrective action that might be appropriate is to disable all checks for unpaired apostrophes and unpaired quotation marks. I think this action is not desirable.

     
    • David Ledger

      David Ledger - 2020-04-12

      On 2020-04-11 20:00, Gerry Kroll wrote:

      Exactly WHERE and HOW did you enter the "real address"?

      I used the 'Add new fact' -> Address [ADDR], Add

      I probably should have added a RESI, but I was correcting an empty ADDR
      so didn't consider it.

      What/where is the GEDCOM definition document that phpgedview is designed
      against? I'm using the 5.5 one
      https://edge.fscdn.org/assets/img/document/gedcom55-82e1509bd8dbe7477e3b500e4f62c240.pdf
      linked to on
      https://www.familysearch.org/developers/docs/guides/gedcom

      According to that ADDR is not a valid level 1 item, which makes sense.
      Should ADDR just be removed from the pop-up list for 'Add new fact'?

      The GEDCOM Checker doesn't claim to be perfect. It's intended to
      identify entries in your GEDCOM that might be in error.

      You should ignore the unpaired apostrophe "error" since it's caused by a
      legitimately unpaired apostrophe. There's just no way the checker can
      determine that the unpaired apostrophe is not an error. The rules for
      using such characters in English, never mind other languages, are too
      complex to even consider implementing them programmatically.

      The only corrective action that might be appropriate is to disable all
      checks for unpaired apostrophes and unpaired quotation marks. I think
      this action is not desirable.

      Don't know where the 'No closing quotation' came from. It's not in the
      webpage but it is in the email I sent, copy/pasted from the webpage.
      It's also not in the section repeated below.

      0025236 2 NOTE From Burial record
      0025237 1 BURI
      0025238 [[2 CEME St. Margaret's, Leicester, Leicester, England]] (ValueError('No closing quotation')) invalid tag; see INDI @I1958@
      0025239 2 DATE 26 MAR 1816
      0025240 2 PLAC Leicester, St Margaret's, Leicestershire, England

      The entry was added using 'Burial' from 'Add new fact'.

      Regards,
      David

       
  • Gerry Kroll

    Gerry Kroll - 2020-04-12

    "no closing quotation" is talking about the apostrophe in the text "St Margaret's". The message means that the apostrophe or quotation mark is not paired with a second occurrence of the same character in the same text.

    You're probably right that "ADDR" should be removed from the list of Facts that you can add. I'll check into correcting this, but I need to determine whether there is ever a reason for having a Level 1 ADDR tag. I don't think there is.

     
    • David Ledger

      David Ledger - 2020-04-12

      On 2020-04-12 14:58, Gerry Kroll wrote:

      "no closing quotation" is talking about the apostrophe in the text "St
      Margaret's". The message means that the apostrophe or quotation mark is
      not paired with a second occurrence of the same character in the same text.

      That "no closing quotation" is a red herring. It's not in the email I
      sent as saved in my 'Sent' mailbox, but it is in the copy I got back as
      being my post from the mailing list.

      David

       
  • Gerry Kroll

    Gerry Kroll - 2020-04-12

    Here's a link to the definitive description of the ADDR tag according to the German genealogical society.

    http://wiki-en.genealogy.net/GEDCOM/ADDR-Tag

    You can see that there are legitimate instances of the ADDR tag occurring at Level 1. So: I can't eliminate ADDR from the list of tags you can add. The best that can be done is to make the list context sensitive, but that will take some tricky programming to achieve. I'll see what can be done, but I strongly suggest you don't hold your breath.

     
  • Gerry Kroll

    Gerry Kroll - 2020-04-12

    David:
    Let's let this rest, shall we? I don't have access to your "sent" mailbox, so I can't verify the accuracy of your statement. All I can see is that there is an error message about a missing quotation mark. This is what I see in my Inbox (without the fancy red box) when SourceForge sends me a copy of your post. For "quotation" you should read "single or double quote".

    There are other GEDCOM checkers out there. If you're unhappy with what's available as part of the PhpGedView package, try one of the others. Your dissatisfaction doesn't affect me in the slightest. I am not the author, and I have not had occasion to improve or otherwise change the program. If you want to improve on the wording of the error message, I'm open to suggestions.

     
  • Gerry Kroll

    Gerry Kroll - 2020-04-12

    David:
    I think we're both barking up the wrong tree here. The text "No closing quotation" does not occur anywhere in PhpGedView. Neither does the text "ValueError". I am at a loss to explain exactly where this message is coming from. It certainly is NOT PhpGedView.

     
    • David Ledger

      David Ledger - 2020-04-13

      On 2020-04-12 20:30, Gerry Kroll wrote:

      David:
      I think we're both barking up the wrong tree here. The text "No closing
      quotation" does not occur anywhere in PhpGedView. Neither does the text
      "ValueError". I am at a loss to explain exactly where this message is
      coming from. It certainly is NOT PhpGedView.

      Gerry

      I suspect it's more likely to be an oddity of the sourceforge email system.

      I'm not dissatisfied with phpgedview at all, and I applaud your efforts
      to keep it going. We have a non-ideal GEDCOM spec and a program that
      has its origins in old versions that's maintained by volunteers and
      available for free. Not surprising it has a few dents here and there.
      Personally I'd like to be able to create a GEDCOM as compliant as
      possible when I want to upload to Ancestry / FindMyPast so that others
      can access it, even if that does mean running it through a script to fix
      bits. Obviously it would be better for me if no script were necessary.

      On the ADDR front, as far as I can see it is only valid at level 1 when
      level 0 is SUBM or REPO, so it could be removed from the list available
      for 'Personal Facts and Details' assuming it has its own list like some
      configurable ones do.
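A rough way to list the affected records before changing anything, taking the reading above (level-1 ADDR only under SUBM or REPO) as an assumption rather than as something checked against the spec here. The function name and the `allowed` parameter are hypothetical.

```python
def find_misplaced_addr(lines, allowed=("SUBM", "REPO")):
    """Return (line_number, record_type, line) for each level-1 ADDR
    that sits in a record whose type is not in `allowed`."""
    problems = []
    record_type = None
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        parts = line.split(" ", 3)
        if not parts[0].isdigit():
            continue
        level = int(parts[0])
        if level == 0:
            # "0 @I1@ INDI" carries the tag third; "0 HEAD" carries it second.
            if len(parts) >= 3 and parts[1].startswith("@"):
                record_type = parts[2]
            else:
                record_type = parts[1] if len(parts) > 1 else None
        elif level == 1 and len(parts) > 1 and parts[1] == "ADDR":
            if record_type not in allowed:
                problems.append((number, record_type, line))
    return problems


sample = [
    "0 @I1@ INDI",
    "1 ADDR 1 High Street",
    "0 @R1@ REPO",
    "1 NAME Record Office",
    "1 ADDR Leicester",
    "0 TRLR",
]
for problem in find_misplaced_addr(sample):
    print(problem)
```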

      I have written a bit of php. I did an online furniture sales website
      about 9 years ago, and have written stuff to get the information I want
      from a WordPress database. At the moment I'm learning and consolidating
      my knowledge of AWS Linux servers, vhosts and Let's Encrypt, so I'd
      rather not have to learn the program structure of phpgedview just yet.
      At almost 73 I may not get round to it.

      Regards,
      David

       
  • Gerry Kroll

    Gerry Kroll - 2020-04-13

    David:
    Google says that the error message comes from Python. How Python fits into this puzzle is a mystery.
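For what it's worth, the same message can be reproduced with Python's standard `shlex` module, which raises `ValueError('No closing quotation')` when it is asked to split text containing an unpaired quote character. The sketch below only shows that the wording matches. It does not show which system added the message to the post, and the helper name is hypothetical.

```python
import shlex


def quotation_error(text):
    """Return the ValueError message shlex raises for text, or None if it parses."""
    try:
        shlex.split(text)
    except ValueError as exc:
        return str(exc)
    return None


# The bracketed text from the checker output, with its lone apostrophe.
print(quotation_error("2 CEME St. Margaret's, Leicester, Leicester, England"))
# A line without any quote characters parses without error.
print(quotation_error("2 DATE 26 MAR 1816"))
```

The PLAC line in the same excerpt also contains "St Margaret's" but carries no such message, so whatever produced it seems to have looked only at the text inside the double square brackets.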

     
  • Gerry Kroll

    Gerry Kroll - 2020-04-13

    If you want to remove ADDR from the list of Facts that can be added to individuals, you edit the GEDCOM configuration "Edit" section. It's right there, in two places.

    The list of quick Facts that can be added to Repository records is missing ADDR.

    This should be corrected in the SVN version.

     
