Disappearing GEDCOM - and restoring problem

Help
cv55
2010-06-19
2013-05-30
  • cv55

    cv55 - 2010-06-19

    I am running PGV 4.2.3 on a local machine (Win) as an 'editing environment'. Everything was working fine until yesterday, when I found that the whole GEDCOM file (~10 MB) had "disappeared" and been replaced by a 'new' GEDCOM (~10 KB) with the same name, containing only the records edited or added during my last session. I'm wondering what happened - and how to restore the data with minimum loss…

    (1) I am sure, I did not touch the GEDCOM administration at the time in question, so it was no inadvertent deletion. The only peculiarities are: (a) my GEDCOMs are not in the "/index" directory, but in a separate "/gedcom" directory (the result of a manual GEDCOM "update": When finding PGV not copying a "new GEDCOM already on the server" to "/index", but using it at the location where it is, I left it there for easier backups); (b) I am encountering problems with the virtual machine my browser is running in (must be launched twice to become visible), but I think this doesn't matter here. - Are there experiences with the phenomenon of "disappearing GEDCOMs"?

    (2) I did back up the GEDCOM the day before yesterday, so perhaps I did not lose too much data. I replaced/added the saved "new" data manually and copied the (partially?) restored file to its place. When starting PGV now - without "importing" or "saving" anything - I find the records (apparently) complete: it seems the DB has kept its last state - maybe even completely (this is difficult to verify at the moment). I wonder what to do now - realizing that I'm not completely sure about PGV's manner of operation: When "updating" the GEDCOM with the "uploaded" new file, the DB will be overwritten by the (manually edited) GEDCOM - and so perhaps lose some entries not in the latter one. But when "saving" any edits (usually made to one entry), doesn't PGV change only the respective entry at that moment? Did I understand this right? Or more generally: Is there a possibility to make PGV write the whole DB to the GEDCOM file? - Or better: Is there a possibility to make PGV synchronize DB and GEDCOM mutually?

    Hope I could clarify the question. Thanks,
    volker.

     
  • Stephen Arnold

    Stephen Arnold - 2010-06-19

    volker
    Are you saying the PGV has lost all the data? None of your previous gedcom exists in the display and only the most recent changes? Or, in the TEXT gedcom, in the INDEX folder, your GEDCOM has been replaced/corrupted/trashed? Frankly, if you are still using SYNC to gedcom, you have certainly significantly slowed your system. This procedure requires that the entire GEDCOM be rewritten to disk every time you approve a change. On a large gedcom, this can take 45 seconds or more, depending on the processor, the memory and the speed of the hard drive. It is NOT recommended to retain SYNC.

    If the first, and your gedcom with all changes is retained within the DB, then simply download or export a new gedcom from the DB (after approving all changes) and replace the one you have - remembering that you should NOT be using SYNC.

    If the second, and the DB has been corrupted, then you have a disk or memory problem that will only get worse, and you probably need a new hard drive at the least. Additionally, your backup is probably your only solution, although remember that you just told us that the remaining gedcom contains all your changes. You could simply add (merge) that data with your older backup and perhaps lose little to nothing.
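
A minimal sketch of the "add (merge) that data with your older backup" step, in Python rather than anything PGV ships. It assumes both files are plain GEDCOM text in which every record begins on a level-0 line and carries an xref such as `@I12@`, that a record found in the small recent file should replace the record with the same xref in the backup, and that the header is taken from the backup. The function names `split_records` and `merge_gedcom` are made up for this illustration.

```python
import re

# A level-0 line that opens a record carrying an xref, e.g. "0 @I12@ INDI".
XREF_LINE = re.compile(r"^0\s+(@[^@]+@)\s+\S+")


def split_records(text):
    """Split GEDCOM text into (key, lines) pairs, one pair per level-0 record."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("0 "):
            match = XREF_LINE.match(line)
            # Records without an xref (HEAD, TRLR) are keyed by their own line.
            key = match.group(1) if match else line.strip()
            records.append((key, [line]))
        elif records:
            records[-1][1].append(line)
    return records


def merge_gedcom(backup_text, recent_text):
    """Return the backup with every record from the recent file laid over it."""
    recent = {
        key: lines
        for key, lines in split_records(recent_text)
        if key.startswith("@")  # ignore the recent file's HEAD and TRLR
    }
    merged = []
    seen = set()
    for key, lines in split_records(backup_text):
        if key == "0 TRLR":
            continue  # the trailer is re-added once, at the very end
        merged.append(recent.get(key, lines))
        seen.add(key)
    # Records that exist only in the recent file were added after the backup.
    for key, lines in recent.items():
        if key not in seen:
            merged.append(lines)
    merged.append(["0 TRLR"])
    return "\n".join(line for record in merged for line in record) + "\n"
```

Records deleted after the backup was taken would reappear with this approach, so the result still deserves a manual check before it replaces anything.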

    A regular backup routine, both onto your local disk and onto remote locations, is critical for any computer. Use your free space on Google/Yahoo/MSN/ME.com/etc. and/or a service, and/or purchase an external drive and develop a regular scheme using something like Apple's Time Machine, Retrospect, or cloning software (on the Mac, Carbon Copy Cloner).
    -Stephen

     
  • cv55

    cv55 - 2010-06-20

    Stephen,

    Are you saying the PGV has lost all the data?

    Yes and no: it lost all data of one gedcom, the one I did work with this day - and it did lose only the (previous state, large) gedcom file; the db is/seems complete.

    None of your previous gedcom exists in the display and only the most recent changes?

    There is a second gedcom, which I did not open that day - nothing happened to it.
    Once more: the data loss is very selective: The previous, quite large gedcom was deleted - or perhaps: 'emptied' - and all changed or added entries were written correctly into this file. The daily logs are present and complete (I don't see any conspicuous entries in them).

    still using SYNC to gedcom … requires that the entire GEDCOM be rewritten to disk every time you approve a change … It is NOT recommended to retain SYNC

    I don't yet really understand the connection to my problem. - I presume you mean "GEDCOM configuration" > "Edit Options" > "Synchronize edits into GEDCOM file": Yes, this option is "still" active. Why not? Or: what else? I do not find (or 'feel') a real slowdown at the moment (~16,000 persons). Without synchronizing DB and gedcom, PGV would use the gedcom only as an 'initial data input' and, once the initialization is done, work with the DB(!?) (btw: I remember a note, in the wiki or the forum, saying the opposite: the DB being 'subordinated', being used first of all for indexing purposes… maybe this is a bit off topic here). - More important, at the moment: Is it possible that my SYNC configuration is the cause of my Gedcom disappearance?

    simply download or export a new gedcom from the DB

    a good idea, I forgot about that.

    … then you have a disk or memory problem …

    I will have to check it anyway. But I did not encounter any data loss in other programs (so far…), so I suppose a configuration or operation error of my PGV installation…
    volker

     
  • Stephen Arnold

    Stephen Arnold - 2010-06-20

    Volker
    I thought I had explained the problem and the solution fairly clearly for the scenario in which the DB version of the gedcom is complete and only your text gedcom was trashed, along with the possible reason(s) and the warnings.

    PGV long ago stopped being a text gedcom program. It relies on a few support text files (configs for overall operations, for gedcom configuration settings and for privacy settings, as well as some other support files) and this has been its legacy problem.

    Once again - when you keep SYNC to GEDCOM set to yes, you require the program to write the ENTIRE file to disk each and every time you make a minor change - any change - one character, one date - ANYTHING. Writing the entire file time and time again is wearisome on your system resources - processor, memory and hard drives - and leaves the file quite vulnerable to any glitch in the infrastructure during this write-to-disk period. Any I/O errors will be translated to a corrupted gedcom, including the loss of the entire file. If you have auto-accept set and have multiple users who happen to hit the GEDCOM at the same time, the system can also get confused.
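
The exposure described here, a full rewrite interrupted partway and leaving a truncated file behind, is usually reduced by writing the new content to a temporary file first and renaming it over the original only once the write has finished. The Python sketch below shows that general technique; it is not how PGV writes its file, and `write_atomically` is a name invented for the example.

```python
import os
import tempfile


def write_atomically(path, text):
    """Replace the file at `path` with `text` without ever truncating it in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        # Until this call succeeds, the old file is still complete.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern a crash during the write leaves a stray `.tmp` file next to an intact original, rather than a 10 KB remnant of a 10 MB file. It does nothing about the I/O volume, which still grows with every edit.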

    There is a plethora of reasons NOT to keep the gedcom SYNC setting as YES, beyond those noted above. We considered removing it many times, but it's still there and it's still a problem portion of the program, IMHO. It is a terrible method of maintaining a backup of the real GEDCOM - the one sitting inside the DB - both because of the possible corruption mentioned above and because it encourages a false sense of having performed a backup. It is not a good, or even valid, backup method unless you copy this file regularly to another location, as it is rewritten - in its entirety - with each edit. That's a ton of I/O.

    The corruption is a good indicator that you may be about to experience a hard drive failure. At the very least, back up everything and then run a good HD examination, and check your SMART output - if your drive has this feature.

    Now, again the solution suggested before, if your scenario was as now described:
    Simply download a copy of the gedcom from the DB and replace your text version. You do not need to reimport, as the data in the download and the DB would now be identical as they will again be "in sync". I would recommend turning off the SYNC to GEDCOM 'feature' (crutch) as it is terribly hard on your system, whether you sense it or not.

    Stephen

     
  • cv55

    cv55 - 2010-06-21

    Stephen,
    thanks again for the "download solution"; it did work well. And sorry for not immediately understanding the 'forensic' part of the explanation (perhaps, it had been too implicit for my knowledge of English ;-).
    One last question concerning the gedcom-db-interaction within PGV: What about the  "Edit raw Gedcom" feature? (I am using it quite frequently - e.g. for adding missing tags (PEDI in "add parent"), or for editing manually added ("merged Gedcom") entries.) - What will happen with SYNC deactivated when using it? Where will PGV write the changes made here - to the Gedcom or to the DB?
    volker.

     
  • Stephen Arnold

    Stephen Arnold - 2010-06-21

    volker
    Simplistically, when using PGV, all changes made, regardless of the method used, are written to a pending file in the DB until approved. When these changes are approved (either manually, or via auto-accept) they pass from this suspense file into the DB. If, and only if, you use SYNC does PGV then copy over the entire text gedcom in your specified 'index' folder, replacing (erasing) the original. EVERY TIME.
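
The flow in this paragraph (edits wait in a pending area, approval moves them into the DB, and only with SYNC enabled is the whole text file regenerated) can be modelled in a few lines of Python. This is a toy model for illustration only: the class `GedcomStore`, its attributes, and the in-memory `file_text` standing in for the file on disk are all invented here and do not reflect PGV's actual tables or code.

```python
class GedcomStore:
    """Toy model of the pending -> approved -> optional full-file sync flow."""

    def __init__(self, records, sync_to_file=False):
        self.db = dict(records)      # approved records, keyed by xref
        self.pending = {}            # edits waiting for approval
        self.sync_to_file = sync_to_file
        self.file_text = None        # stands in for the text gedcom on disk
        self.full_rewrites = 0       # how often the whole file was regenerated

    def edit(self, xref, record):
        # Every kind of edit, including a raw-gedcom edit, lands here first.
        self.pending[xref] = record

    def approve_all(self):
        self.db.update(self.pending)
        self.pending.clear()
        if self.sync_to_file:
            # SYNC regenerates the entire file, however small the change was.
            self.file_text = self.export()
            self.full_rewrites += 1

    def export(self):
        body = "\n".join(self.db[xref] for xref in sorted(self.db))
        return "0 HEAD\n" + body + "\n0 TRLR\n"
```

In this model, approving one single-character edit with `sync_to_file=True` costs one complete export, while with it off the DB is updated and the file is left alone until an explicit `export()`.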

    This is a large amount of I/O activity if you make frequent changes, each subject to many glitches. This is pure legacy - a carryover to the days that PGV was an index-retention program - and not DB-driven software. Frankly, it is a feature that should have been removed long ago. There are numerous locks and hooks to keep multiple, simultaneous users from mucking up each others' edits, but despite many efforts - it remains potentially buggy, particularly as the file size grows (10,000 INDI's is a substantial, medium-level gedcom).

    To get the BEST, and latest copy of your data, it has always (since leaving index in v.3) been necessary to download a copy of the DB's version of the gedcom. Even with SYNC enabled, it is possible that some of your changes fail to copy to the text version.

    Hope this better explains the scenario. The DB is always the latest reflection of the gedcom in PGV v.4+ and only if you experience DB I/O problems, power or connection issues or an unknown bug, would that data be corrupted or incorrect. As is always the mantra - Backup, then backup again (and in my world, after you think you've backed up enough - BACKUP again.)

    Stephen

     
  • cv55

    cv55 - 2010-06-21

    Stephen,
    sorry for having written in a way that could be misunderstood. I really got the point, meanwhile. And please let me confirm, that I did decide to use PGV, because it is DB-driven (not although): Because it uses an 'open' standard, SQL, that can be used for the purposes of own needs and special projects, too - instead of proprietary formats or 'hard copied' web pages, and so on. And I really like the lots of 'sophisticated' solutions in it, for the everyday problems of the genealogical work, proving that the programmers are using it themselves.
    When I'm speaking about Gedcom, I do so for three reasons: (a) Gedcom is not perfect, but it is a standard - and using a standard is better than writing the 5th program with the 6th incompatible data format (I presume you know that). (b) For the 'end user', e.g. me, when for example getting a 100-person set of digitized, structured data, it is easier to sit down for 2 hours and write a small script merging them into my Gedcom - instead of sitting down for 2 weeks (or months or more) and writing a pgv-merging-addon (lovely legacy :). (c) Because of the GED in phpGEDview - I simply misunderstood presumably outdated hints or statements in PGV's docs, wiki, or forum about the primacy of Gedcom in PGV. (I got it now, really!)
    Just to complete the mis- or lacking understandings on my side: Initially I did not get, that you suspect a connection between my 'I/O stress test' via pgv's SYNC and the data loss. And I take it seriously.
    - E.g. by deactivating SYNC. And this is the context of my last question: I don't want to rehabilitate Gedcom text files - I simply note, that - SYNC being deactivated - the "Edit raw Gedcom" feature opens a window displaying the data to edit 'like a gedcom entry': Is it? Or is it a 'simulation'? For me, there are good reasons to use this feature. I would only like to know, whether this 'window' writes changes directly to the gedcom file (as it pretends to do) - and past the DB, so to speak -, or whether the changes are written to the DB, as usual…
    Regards, volker.

     
  • Stephen Arnold

    Stephen Arnold - 2010-06-21

    volker
    Asked and answered. PGV works ok without any text gedcom file if SYNC is NOT enabled, although there is an occasional call that might throw an error message.

    Simplistically, when using PGV, all changes made, regardless of the method used, are written to a pending file in the DB until approved. When these changes are approved (either manually, or via auto-accept) they pass from this suspense file into the DB.

    No pretending is done by PGV - it is editing the raw gedcom, but ONLY that gedcom which is DB-housed, not text-based, as previously described. PGV is structured to accept the standard gedcom input structure into its table-defined DB, manipulate the additional data, if changed, and regurgitate a standard-formatted gedcom if and when needed or requested.
    -Stephen

     
  • David Ledger

    David Ledger - 2010-06-21

    As syncing to file is such a large overhead, could there not be a button or menu entry (under MyGedView Portal ?) that triggers a save to file when clicked/selected? It would only need to be displayed when the logged-in person is one allowed to approve or has auto-save set. Much easier for the user than turning Sync on, making a change, and turning Sync off again; or doing a full database backup.

    David

     
  • Stephen Arnold

    Stephen Arnold - 2010-06-21

    David
    Why? A button of this sort already exists - EXPORT or DOWNLOAD from the Manage Gedcom Administrative page. A full "DB" backup is something you don't need often (although we do ours nightly), as other than gedcom facts, the USERs tables, NEWS tables, FAVorites tables and a couple more are not part of the GEDCOM anyway. If you are not regularly scheduling a backup of your data, you are playing with fire. Understand that Exporting a GEDCOM puts a fresh copy in the Index folder on the server while Downloading puts a fresh copy on your local machine.
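
For the "regularly scheduling a backup" part, a small Python sketch of one possible routine: after each Export or Download, copy the fresh gedcom into a separate folder under a timestamped name and keep only the most recent few copies. The function name `backup_copy`, the naming scheme, and the `keep` parameter are assumptions made for this example; nothing here is part of PGV.

```python
import shutil
import time
from pathlib import Path


def backup_copy(source, backup_dir, keep=10):
    """Copy `source` into `backup_dir` under a timestamped name; prune old copies.

    `keep` is the number of copies to retain and must be at least 1.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(f"{source.stem}-*{source.suffix}"))
    for stale in copies[:-keep]:
        stale.unlink()
    return target
```

Pointing `backup_dir` at a different drive or a synced folder is what turns this from a second copy on the same disk into an actual backup.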

    It isn't the overhead so much as the constant rewriting of the entire file to disk, copying over (erasing) the previous version. This SYNC setting will undoubtedly be removed in some future version as it's purely a throwback to the INDEX version of PGV and was retained as a poor-man's backup. It has never been promised to be complete, or accurate, since moving to a DB-driven program.
    -Stephen

     
  • David Ledger

    David Ledger - 2010-06-22

    Why? Because it's quicker and you're more likely to do it. Rather like doing a Save on a WP document periodically while you're writing it. Some people (like me) have more confidence that the hosting company will not lose a file than a database. Belt and braces.

    David

     
  • Stephen Arnold

    Stephen Arnold - 2010-06-22

    Just as fast to hit the export (to create the file on the server) or download to backup on your local drive, and a LOT better practice. However, submit an RFE and maybe someone else will take the temperature and see if adding yet another button in an already convoluted - overly linked interface makes sense. Doesn't - IMHO, but I'm but one guy (opinionated at that, but still just one voice).
    Stephen

     
  • David Ledger

    David Ledger - 2010-06-22

    I'm thinking of using such a button/menu entry while working in user mode rather than admin mode. It would only be available to users who would cause immediate file syncs. While in user mode, when finished working on a family, say, you would just hit 'Sync' rather than going into Admin mode, possibly involving a password entry, navigating to Manage Gedcoms, and doing an Export. Certainly not 'just as fast'.

    David

     
  • Stephen Arnold

    Stephen Arnold - 2010-06-22

    David - enough already. Submit the RFE.
    Personally, I think it's a terrible idea, but just one voice. The RFE will be reviewed and, if there's any/enough interest, acted upon. If not, it will just sit there. That's the way most new features have been added.
    Stephen

     
