2013/2/26 John Ralls <jralls@ceridwen.fremont.ca.us>

On Feb 25, 2013, at 4:28 PM, Benny Malengier <benny.malengier@gmail.com> wrote:


I think you're going too fast.
We already pass the db.DB_PRIVATE flag (in gen/db/write.py), so the environment is opened for a single process. That part of your suggestion we already do.
However, we also initialize the locking subsystem (db.DB_INIT_LOCK), which is needed because GTK is multithreaded: three different gramplets might be reading the database at the same time as a view. Doing a save while the mouse cursor is over a view column will have GTK retrieving data while the save is ongoing.

As far as I remember, that is why we need the locking system. Now, it might be true that we can remove it: I have never seen a deadlock-type error in Gramps, which really amazed me when I started with Gramps, coming from database programming. But then, we don't normally write in two transactions at the same time, and the risk of problems drops quickly when we combine that with short view and write transactions, plus locking the application behind a progress bar during batch transactions.

Anyway, as the doc says:
Initialize the locking subsystem. This subsystem should be used when multiple processes or threads are going to be reading and writing a Berkeley DB database, so that they do not interfere with each other. If all threads are accessing the database(s) read-only, locking is unnecessary. When the DB_INIT_LOCK flag is specified, it is usually necessary to run a 

I do think that GTK being multithreaded requires us to have it. Hence, shorter transactions for import or batch operations are an option, as long as they are done sufficiently intelligently.
I don't think there is a big problem doing this in many tools. On import, though, a crash would leave the data partly imported, and reimporting would cause problems. Other workarounds might be devised there.

I'm not going too fast, 'cause I'm not going anywhere. I seriously don't have time to muck about in the db backend. That said:

Whatever makes you think that Gtk is multithreaded? It *supports* multithreading, but only if the client application is written to be. In fact there's been a lot of discussion on the gtk-dev list over the years about how it's important to have only one thread operating on GtkWidgets. For portability that needs to be the main thread, because OSX and Windows will only send events there.
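Since only one thread should operate on GtkWidgets, the usual pattern in multithreaded PyGObject code is for a worker thread to post its result back to the main loop (with GLib.idle_add) instead of touching widgets itself. The sketch below is a pure-Python analogue so it runs without Gtk installed: a queue.Queue stands in for the main loop, and post_to_main, run_pending, update_label and worker are hypothetical names, not Gtk or Gramps APIs.

```python
import queue
import threading

# Hypothetical stand-in for the Gtk main loop: callbacks posted here are run
# only by the thread that calls run_pending(), the role GLib.idle_add plays.
main_loop_queue = queue.Queue()


def post_to_main(callback, *args):
    """Called from any thread: schedule callback(*args) for the main thread."""
    main_loop_queue.put((callback, args))


def run_pending():
    """Called from the main thread only: run every queued callback."""
    ran = 0
    while True:
        try:
            callback, args = main_loop_queue.get_nowait()
        except queue.Empty:
            return ran
        callback(*args)
        ran += 1


def update_label(label, value):
    # Widget updates belong here; also record whether we are on the main thread.
    label.append((value, threading.current_thread() is threading.main_thread()))


def worker(label):
    # Long computation off the main thread; it never touches the "widget".
    total = sum(range(1_000_000))
    post_to_main(update_label, label, total)


if __name__ == "__main__":
    label = []
    t = threading.Thread(target=worker, args=(label,))
    t.start()
    t.join()
    run_pending()
    print(label)  # [(499999500000, True)]
```

The worker only computes; the single main thread applies every change, which is the discipline the gtk-dev discussions recommend.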


That flag on the DB has always been there. I was under the impression that events are handled asynchronously, and that this implies needing the locking.

So you are saying that although GTK is asynchronous, this is not implemented with multiple threads, so GTK will never interrupt a transaction as long as our transactions don't themselves process events?
So a user can click a 'delete person' button, but the deletion will never actually run during the transaction that is in progress?

Well, removing the flag is easy to test. I'm somewhat worried about the effect of processing events to keep the user interface responsive while a transaction is running. I don't know if we actually do that anywhere except in batch transactions, where the progress dialog is kept responsive.
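The question can be checked with a toy model, assuming a single-threaded main loop: a queued click handler runs only once control returns to the loop, unless the transaction itself pumps events, as a progress dialog typically does with a `while Gtk.events_pending(): Gtk.main_iteration()` loop. ToyMainLoop and batch_transaction are hypothetical names; this is not Gramps' actual code.

```python
from collections import deque


class ToyMainLoop:
    """Hypothetical single-threaded event loop standing in for Gtk's."""

    def __init__(self):
        self.pending = deque()
        self.log = []

    def queue_event(self, handler):
        self.pending.append(handler)

    def iterate_pending(self):
        """Dispatch every queued event, like pumping the main loop."""
        while self.pending:
            self.pending.popleft()()


def batch_transaction(loop, steps, pump_events):
    loop.log.append("begin")
    for i in range(steps):
        loop.log.append(f"write {i}")
        if i == 0:
            # The user clicks "delete person" while the batch is running.
            loop.queue_event(lambda: loop.log.append("delete person"))
        if pump_events:
            # Keeping a progress dialog responsive dispatches the click now,
            # i.e. inside the still-open transaction.
            loop.iterate_pending()
    loop.log.append("commit")


if __name__ == "__main__":
    loop = ToyMainLoop()
    batch_transaction(loop, steps=2, pump_events=False)
    loop.iterate_pending()  # back in the main loop after the callback returns
    print(loop.log)  # ['begin', 'write 0', 'write 1', 'commit', 'delete person']

    loop = ToyMainLoop()
    batch_transaction(loop, steps=2, pump_events=True)
    print(loop.log)  # ['begin', 'write 0', 'delete person', 'write 1', 'commit']
```

Without pumping, the deletion can only happen after commit; with pumping, it lands in the middle of the transaction, which is exactly the case worth worrying about.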


AFAICT Gramps isn't multithreaded: grampsgui.py calls GObject.threads_init(), but that's it. The webapp has a single call to python's threading.Thread.start(), but it's not part of Gramps proper, and I didn't dig into it to see if it's touching the database in both the main and the worker threads.

As Enno pointed out, the problem with having locking turned on and all inclusive transactions is that it doesn't scale. Every record the import touches gets a lock which isn't released until the transaction is either committed or rolled back, and eventually bdb runs out of space in the lock table. IIRC, you can checkpoint at the beginning of the "batch" operation and easily roll back to the checkpoint if you crash in the middle: No need to try to do everything in a single txn.
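The scaling problem can be shown with a toy model under two simplifying assumptions: every record written holds one lock until its transaction commits or aborts (real Berkeley DB locks pages rather than records), and the lock table has a fixed, hypothetical size. ToyLockTable, LockTableFull and import_records are hypothetical names, not Berkeley DB or Gramps APIs.

```python
class LockTableFull(Exception):
    """Raised when a transaction needs more locks than the table holds."""


class ToyLockTable:
    """Hypothetical fixed-size lock table; not the Berkeley DB API."""

    def __init__(self, max_locks):
        self.max_locks = max_locks
        self.held = 0

    def acquire(self):
        if self.held >= self.max_locks:
            raise LockTableFull(f"lock table exhausted at {self.held} locks")
        self.held += 1

    def release_all(self):
        # Commit or abort releases every lock the transaction holds.
        self.held = 0


def import_records(records, table, batch_size=None):
    """Write records, each holding a lock until its transaction ends.

    batch_size=None means one all-inclusive transaction; otherwise the
    transaction is committed every batch_size records. Returns the number
    of commits.
    """
    commits = 0
    in_txn = 0
    try:
        for _record in records:
            table.acquire()
            in_txn += 1
            if batch_size is not None and in_txn == batch_size:
                table.release_all()  # commit this batch
                commits += 1
                in_txn = 0
        if in_txn:
            table.release_all()  # commit the remainder
            commits += 1
    except LockTableFull:
        table.release_all()  # abort: the open transaction's writes are dropped
        raise
    return commits


if __name__ == "__main__":
    records = range(10_000)
    try:
        import_records(records, ToyLockTable(max_locks=1_000))
    except LockTableFull as exc:
        print("single transaction:", exc)  # lock table exhausted at 1000 locks
    print(import_records(records, ToyLockTable(max_locks=1_000), batch_size=500))  # 20
```

The price of batching is the one raised earlier in the thread: after a crash the earlier batches stay committed, so a re-import has to cope with a partially imported database.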

John Ralls