Now that fsdb is basically working, it is time to start thinking about how we could make it a local cache for PGV...
At first I thought of starting with pgvcgi and adding an fsdb backend, but I think it is better to start with fsdb and add a pgvcgi backend. The goal is to be able to use the pgv database offline, and sync up when online. So the stable state is where fsdb has a complete copy of the pgv database and you are reading it offline. That means the user sees fsdb, and fsdb has to transparently get info from pgv as needed.
A pgv feature that would really help here would be to request the list of all records that have changed since a certain date. fsdb would always save the last sync time. It could update all the new records in the background while the user is using the GUI.
The tough part is allowing local edits while offline. What may make it easier is to save all modified records to a separate directory, and treat the previously synced baseline as read-only. In most cases there would be no overlap, so when it connects, it would download all new records and then upload all modified records.
Adding a new record offline would have an extra step. The pgv database could have added new records in parallel, so we have to treat the new fsdb IDs as temporary IDs, and request real ones from pgv. That means converting all the IDs before uploading them.
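The ID conversion step above can be sketched as a simple rewrite pass. This is only an illustration of the idea, assuming standard GEDCOM `@xref@` pointer syntax; the class and method names are invented, not the actual GDBI API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: before uploading records created offline, replace
// each temporary xref (e.g. "@TMP1@") with the real id assigned by PGV.
public class XrefRemapper {
    // Rewrite every temporary xref token in a GEDCOM record with its real id.
    // The @...@ delimiters keep "TMP1" from matching inside "TMP10".
    public static String remap(String record, Map<String, String> tempToReal) {
        String out = record;
        for (Map.Entry<String, String> e : tempToReal.entrySet()) {
            out = out.replace("@" + e.getKey() + "@", "@" + e.getValue() + "@");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ids = new HashMap<>();
        ids.put("TMP1", "I512"); // new INDI created offline
        ids.put("TMP2", "F203"); // new FAM it links to, also created offline
        String rec = "0 @TMP1@ INDI\n1 FAMS @TMP2@";
        System.out.println(XrefRemapper.remap(rec, ids));
    }
}
```

The point is that the record and everything that links to it must be rewritten together, which is why the temporary-to-real mapping has to be collected for the whole batch before any record is uploaded.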
The biggest problem is when the same record has been modified on both sides. That gives us the classic 3-way merge. Combining facts is the straightforward case, but when the links to other records have changed, it can get pretty ugly. What if you added a spouse on both sides? Besides choosing the right family link, you should actually remove the unused person. Maybe that is too aggressive; some clean-up utility could find unlinked people later.
To simplify this task, we can start by making the local database read-only when offline.
Are there other issues to work out? Any other ideas in general?
I decided to make a full release, version 13, before starting this new project. I figure it will require moving code around, so it is better to test everything and get a solid base first. Sure enough, I found problems with jLifelines and GenJ support, since I have spent most of my time on PGV and fsdb.
Since this PGV cache is a layer on top of fsdb, which is brand new, I will be working on fsdb itself at the same time as the PGV layer. Right now I am improving the fsdb startup.
Program startup with fsdb is inherently slow. It has to read thousands of small files. The current optimization is to start up threads that read the records and cache mini-records in the background. That would not impact the user if he just looks at specific people or families when he starts up. On the other hand, if he starts by searching for a name or by running a report on the entire database, he has to wait until it reads all the records.
One approach I considered was writing a simple text file with the INDI xref and NAME value on each line. We can read that in quickly. It would solve the name search problem, but not the report problem.
What I am trying to do now is have it read/write a mini-gedcom, which is what I call a text file with mini-records. GDBI uses simplified records, mini-recs, with just the basic info and xrefs, and only uses the full GEDCOM record when it needs the rest of the info. It should be able to read/write that file fairly quickly. Not only does it give us the names for the name search; reports that use the entire database generally need only the info in the mini-recs, so it should make fsdb more practical.
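To show why such a file loads quickly, here is a toy version of a mini-record index: one line per record with just the xref and the display name, so a single sequential read covers the whole database. The tab-separated line format is an assumption for illustration; the real mini-gedcom format in GDBI may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative mini-record index: xref<TAB>name, one record per line.
// Reading this one file replaces opening thousands of small record files.
public class MiniIndex {
    public static void write(Writer w, Map<String, String> namesByXref) throws IOException {
        for (Map.Entry<String, String> e : namesByXref.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
    }

    public static Map<String, String> read(BufferedReader r) throws IOException {
        Map<String, String> m = new LinkedHashMap<>();
        String line;
        while ((line = r.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab > 0) m.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return m;
    }
}
```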
I should have thought of this earlier. I had asked John a while back if he could add a PGV feature to generate a mini-gedcom for GDBI. The name search is not a problem, since the PGV server does the search and sends the results, but it takes a while to cache in all the INDI/FAM relationships. Of course if we get the fsdb layer working, that will no longer be a problem.
I have a basic system working! It reads from the PGV database, and makes a copy of all records in the local FSDB. It is not working that well yet, but it proves it is a reasonable solution.
The way I implemented this is that the user accesses FSDB, and FSDB accesses PGV. That way when PGV is not connected, everything works the same for the user. In this new FSDB/PGV database, every command is a 3-step process:
1. Send request to PGV
2. Save results in FSDB
3. Return results from FSDB
When you are not connected to PGV, it just skips the first 2 steps.
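The 3-step flow can be sketched like this, with plain maps standing in for the real FSDB file store and the PGV CGI connection. The class and method names are illustrative only, not the actual GDBI classes.

```java
import java.util.Map;

// Sketch of the FSDB/PGV combo read path: every request goes through the
// local cache, and PGV is consulted first only when a connection exists.
public class ComboDb {
    private final Map<String, String> fsdb; // local cache (the file store)
    private final Map<String, String> pgv;  // remote DB, or null when offline

    public ComboDb(Map<String, String> fsdb, Map<String, String> pgvOrNull) {
        this.fsdb = fsdb;
        this.pgv = pgvOrNull;
    }

    public String getRecord(String xref) {
        if (pgv != null) {                            // 1. send request to PGV
            String fresh = pgv.get(xref);
            if (fresh != null) fsdb.put(xref, fresh); // 2. save results in FSDB
        }
        return fsdb.get(xref);                        // 3. return results from FSDB
    }
}
```

Because step 3 always reads from FSDB, the caller sees identical behavior online and offline; going offline only changes whether the cache gets refreshed first.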
I am having trouble compiling the latest code.
MemdbContext.Record) in net.sourceforge.gdbi.db.pgvcgi.PgvcgiGedcom cannot be applied to (net.source
.UMemdbContext.Record) in net.sourceforge.gdbi.db.pgvcgi.PgvcgiGedcom cannot be applied to (net.sour
return getPgvcgiGedcom().deleteRecord((PgvcgiRecord) this);
..\net\sourceforge\gdbi\db\pgvcgi\PgvcgiIndi.java:37: net.sourceforge.gdbi.db.pgvcgi.PgvcgiIndi is not abstract and does not override abstract method toPrevIndi() in net.sourceforge.gdbi.api.GdbiIntrI
public class PgvcgiIndi extends PgvcgiRecord
..\net\sourceforge\gdbi\db\pgvcgi\PgvcgiIndi.java:49: cannot find symbol
symbol : method toNextIndi(net.sourceforge.gdbi.db.pgvcgi.PgvcgiIndi)
location: class net.sourceforge.gdbi.db.pgvcgi.PgvcgiGedcom
..\net\sourceforge\gdbi\db\pgvcgi\PgvcgiIndi.java:52: cannot find symbol
symbol : method toPrevIndi(net.sourceforge.gdbi.db.pgvcgi.PgvcgiIndi)
location: class net.sourceforge.gdbi.db.pgvcgi.PgvcgiGedcom
Any ideas on these errors? It seems that PgvcgiRecord should extend the Memdb Record, but I am not sure what your intent was there.
It looks like your PgvcgiRecord.java is bad. It compiles for me, and your line numbers are off by 8.
It looks confusing in db/pgvcgi because I obsoleted most of the files there but did not cvs delete them yet. There is a standard hierarchy of context classes that I have duplicated 3 times, so I am changing the code to use a common copy in util/memdb. The only difference in the 3 copies was calling a few methods in the db interfaces, so I added interface UMemdbIntrGedcom with those methods, and I have the 3 DBs implement that interface.
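The shape of that refactoring looks roughly like the following. The method names here are invented for illustration; the real interface is UMemdbIntrGedcom and its actual methods are not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Rough illustration: the shared context code in util/memdb calls a small
// interface, and each of the 3 database backends implements it, so only one
// copy of the context class hierarchy is needed.
interface BackendSketch {
    String getRecordText(String xref);
    void putRecordText(String xref, String text);
}

// One backend; the fsdb, pgvcgi, and memdb layers would each provide their own.
class FsdbBackendSketch implements BackendSketch {
    private final Map<String, String> files = new HashMap<>();
    public String getRecordText(String xref) { return files.get(xref); }
    public void putRecordText(String xref, String text) { files.put(xref, text); }
}
```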
Ok, when I renamed those files on my local repository I was able to build again.
I have been testing the FSDB/PGV combo using a hard-wired database. Now I have a menu item on the main window to open a database.
When you create a new database, it is a 2-step process. First you have to choose a local folder for FSDB, and then you select a PGV database. Once it is created, FSDB stores the PGV login info. The next time you open the FSDB part, it automatically connects to PGV.
One thing that is awkward right now is that to create an FSDB, you browse to the parent folder and type in the name of the folder you want to create. I think I need to separate open from create and make it more obvious.
It has taken a while to create a user interface that makes it easy to open PGV databases with an FSDB cache. I started with hard-wired values, and then I went to a 2-step process of opening FSDB and PGV separately. Now I combined them by adding an FSDB section on the PGV open dialog. To the user it looks like just another PGV option, but when you specify an FSDB folder, it uses a different kind of database, the FSDB/PGV combo database instead of the standard PGV database.
I have also been working with threads. There is a new one for writing the mini-gedcoms, which are the summary of what is in the FSDB files. There were already threads to read in all the FSDB files, and now there are corresponding ones in FSDB/PGV to slowly populate FSDB from PGV in the background.
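A background-populate thread of the kind described above might look like this. All names are assumptions, not the actual GDBI thread classes; the sketch only shows the general pattern of copying records without blocking the GUI.

```java
import java.util.Iterator;

// Illustrative background-populate thread for the FSDB/PGV combo: walk the
// remote xref list and copy each record into the local cache, staying out
// of the way of interactive requests.
public class Populator extends Thread {
    public interface Copier { void copyToLocal(String xref); }

    private final Iterator<String> xrefs;
    private final Copier copier;

    public Populator(Iterator<String> xrefs, Copier copier) {
        this.xrefs = xrefs;
        this.copier = copier;
        setDaemon(true);           // do not keep the app alive just to sync
        setPriority(MIN_PRIORITY); // the GUI comes first
    }

    @Override public void run() {
        while (xrefs.hasNext()) {
            copier.copyToLocal(xrefs.next());
            Thread.yield();        // let foreground requests through
        }
    }
}
```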
I have been tweaking the FSDB/PGV combo quite a bit. Besides the threads to slowly download the database, I have others that keep FSDB and PGV in sync. I have been fixing lots of bugs since the combo is based on FSDB, which is all new code.
It is almost time to upload a patch for people to try. I want to be sure about the local disk layout first, though. I currently organize the directories by INDI, FAM, and other. Maybe I should separate SOUR. The advantage of having separate directories is that you can get the list of all xrefs for a record type by getting a directory listing. For the types that are mixed together, I have to read in all the records at start-up to know what kind they are.
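The directory-listing trick is just this: the xref list for a type comes from file names alone, with no file contents read. The layout and naming below (one subdirectory per type, one `.ged` file per record) are assumptions for illustration, not the actual FSDB layout.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-type layout: root/INDI/I1.ged, root/FAM/F1.ged, ...
public class FsdbLayout {
    // List all xrefs of one record type by listing that type's directory.
    public static List<String> listXrefs(File root, String type) {
        List<String> xrefs = new ArrayList<>();
        String[] names = new File(root, type).list();
        if (names != null) {
            for (String n : names) {
                int dot = n.lastIndexOf('.');
                xrefs.add(dot > 0 ? n.substring(0, dot) : n); // strip extension
            }
        }
        return xrefs;
    }
}
```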
I released the first test version of GDBI with the FSDB/PGV caching! It is version 13.1 in the "GDBI Patches" package.
Has anyone tried the caching in patch 13.1?
This is how you use the new version of GDBI with PGV caching...
The “Open database” window has a new “Local Cache” section with a directory name field, a browse button, and an offline option. If you do not enter a directory name, it connects to PGV the old way. If you want the new feature, you type in or browse to a directory (folder).
When you enter a new directory name and press OK, it verifies that you want to create a new FSDB directory. These directories are just read-only caches, so there is no harm in adding/deleting them. If you use multiple databases, make sure you do not mix up the FSDB directories though.
When you connect to PGV and start reading data, it stores the records in the FSDB directory. It also slowly syncs all the records in the background. (We may want an option for quickly finishing the sync so you know you have the whole database.)
The next time you open that PGV database with the FSDB cache, you can try the Offline option. That will skip the PGV connection and use the local data. It will be much faster, but it is read-only.
If you switch back to release 13, note that your list of connections will be different. The patch keeps its list in a new registry location; it starts with a copy of the old list and modifies the copy, so changes made under the patch do not show up in release 13.
I am starting to work on the next step, allowing local edits that GDBI will upload the next time it connects to PGV. I will need to save information for each modified GEDCOM record, so I changed the FSDB file format from raw GEDCOM lines to the output of Properties.store(), which is a text file with name/value pairs. That meant adding an FSDB version check so it does not try using the FSDB directories with the old file format.
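A per-record file in the new format might look like the sketch below. Properties.store() is the real mechanism described above, but the key names, the version value, and the extra fields are my assumptions about what such a file could carry.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Sketch of a per-record file: the GEDCOM text plus sync metadata, stored
// as name/value pairs. Properties escapes the embedded newlines for us.
public class RecordFile {
    public static void save(File f, String gedcom, long lastSync, boolean modified)
            throws IOException {
        Properties p = new Properties();
        p.setProperty("fsdb.version", "2");  // assumed format-version key
        p.setProperty("gedcom", gedcom);
        p.setProperty("lastSync", Long.toString(lastSync));
        p.setProperty("modified", Boolean.toString(modified));
        try (OutputStream out = new FileOutputStream(f)) {
            p.store(out, "FSDB record");
        }
    }

    public static Properties load(File f) throws IOException {
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(f)) {
            p.load(in);
        }
        return p;
    }
}
```

A version key like the assumed `fsdb.version` is what lets the loader refuse directories written in the old raw-GEDCOM format instead of misreading them.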
I have not given an update on FSDB/PGV for a while. The first step was caching all records so you can view the data offline. That part is working. The next goal is to use the FSDB cache most of the time and only use PGV to download the records that change. The final goal is to edit offline and sync the changes when you connect later.
What I have been working on lately is adding timestamps on all records. That will help find which records are out of date with PGV and which are modified locally. It also helps internally to avoid copying data between FSDB and PGV when it has not changed. I have also been testing BKEdit by adding info to my own database, which is a great motivator to fix bugs and add minor features.
The next step is to only download PGV records as needed, so I will start using the new PGV CGI “getchanges” command that gets the list of changes since a given date.
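The client side of that incremental sync could be structured as below. The ChangeSource interface stands in for the real "getchanges" CGI call, whose request/response format is not shown in this thread; every name here is hypothetical.

```java
import java.util.List;

// Hypothetical incremental-sync loop: ask PGV what changed since the saved
// last-sync time, download only those records, then advance the timestamp.
public class IncrementalSync {
    public interface ChangeSource {
        List<String> changedXrefsSince(long sinceMillis); // wraps getchanges
        String download(String xref);
    }

    public interface LocalStore {
        void put(String xref, String record);
        long getLastSync();
        void setLastSync(long t);
    }

    // Returns the number of records refreshed. "now" should be captured
    // before the request, so edits made during the sync are picked up later.
    public static int sync(ChangeSource pgv, LocalStore fsdb, long now) {
        List<String> changed = pgv.changedXrefsSince(fsdb.getLastSync());
        for (String xref : changed) {
            fsdb.put(xref, pgv.download(xref));
        }
        fsdb.setLastSync(now);
        return changed.size();
    }
}
```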