From: Serge N. <Ser...@fr...> - 2007-11-12 20:42:17
On Saturday 10 November 2007, Alex Roitman wrote:
> Serge,
>
> I am copying the devel list, as it is better suited to this topic.
>
> On Fri, 2007-11-09 at 23:04 +0000, Serge Noiraud wrote:
> > On Thursday 25 October 2007, Lee Myers wrote:
> > > What is the largest database anyone has used in Gramps successfully?
> > > I have a lot of problems with programs freezing up during imports,
> > > exports and everything else because of the size of my files.
> > > It would be nice to know if I have to keep looking for a good large
> > > database editor.
> >
> > I have ~124000 people in one of my databases.
> >
> > I have many problems during import with SVN.
> >
> > At 100%, gramps uses 163 logs (log.0000000*) in the database
> > directory to load it. Each log is 10MB in size.
> > At this point, gramps uses approximately 350MB of memory.
> > At this point, 142 logs are freed, then new logs are created.
> > So the maximum space used in the file system is:
> > 163 x 10MB + the real size of the database, which in my case is 500MB.
> > So I need 1.63 + 0.5 = 2.13GB to load it.
> >
> > USER  PID  %CPU %MEM VSZ    RSS    TTY   STAT START TIME   COMMAND
> > serge 8921 83.7 29.7 348064 288692 pts/0 R+   21:25 109:27 python src/gramps.py  (100% file loaded)
> >
> > My biggest problem is "Lock table is out of available locks".

It took me a long time to find the following: after digging in the
Berkeley DB sources, I saw something very interesting. Berkeley DB uses
a shared memory segment, and there is a kernel parameter for it, shmmax,
whose default value is 32MB. I tried a big value (128 000 000) and, on
the next try, my database loaded without error.

At that point set_lk_max_locks and set_lk_max_objects had big values.
With the default value used in gramps (25000) it doesn't work, so 25000
is not enough. It worked with 80000, and perhaps less would do.

I'm running tests with different database sizes to find out how many
locks and how much shared memory are needed. It could take some time,
but I think I'm on the right track. If someone agrees to send me
databases of different sizes, I could run these tests on them,
e.g. 30000 people, 50000, and above 150000. Is it possible to get the
850000-people database?

...
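For illustration, raising these limits when opening the Berkeley DB
environment would look something like this (just a sketch, not the
actual gramps code; the path and the 80000 figure are only the values
from my tests, and the kernel limit is raised separately, e.g. with
"sysctl -w kernel.shmmax=128000000"):

    from bsddb import db   # "from bsddb3 import db" on some setups

    env = db.DBEnv()
    env.set_lk_max_locks(80000)    # the 25000 default was not enough
    env.set_lk_max_objects(80000)
    env.open('/tmp/grampsdb',      # illustrative path
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
             db.DB_INIT_LOG | db.DB_INIT_TXN)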
> > There are several questions to ask:
> >
> > 1 - Do we need a transaction when we import a file?
> >     The transaction is in batch mode, so:
> >     - a batch transaction does not store the commits;
> >     - aborting the session completely becomes impossible;
> >     - undo is also impossible after a batch transaction.
>
> A batch transaction does not use bsddb transactions (txn).
> It is called a transaction in our own sense (the gramps sense), which
> is different from a txn. The batch transaction is not a single huge
> txn that stores all the commits.
>
> Instead, if the transaction is batch (import, tool, etc.) then each
> commit call does this:
>     start txn
>     write data
>     commit txn
>
> So if you have a million commits you will start, write, and end a
> million txns. No problem here.
>
> Now the problem with large databases is not the locks during the
> actual import. It is the problem of building the secondary indices.
> Indeed, http://bugs.gramps-project.org/view.php?id=362 is a good
> report to see this.

After tracing with pdb, I agree with that.

> So here's the background. When we deal with a large import, we first
> remove the secondary indices: reference map, surname, etc. Then we
> import, and then we rebuild the indices. This approach improves the
> performance dramatically. Otherwise imports are slow, because each
> data commit writes into the real data tables *and* into the
> secondary index.
>
> All is fine, up until the data is too large to rebuild the secondary
> index. We could of course remove that "disconnect-import-rebuild"
> trick and do none of this. Then we would be OK, but slow.
> A proper solution would be to find a way to create a secondary index
> without the use of a txn. Since the whole import is not a single txn,
> we should not worry about wrapping the secondary index into a single
> txn either. Corruption of the secondary index is also not fatal:
> just remove it and rebuild again.
>
> The problem is, I could not get this working. If I have a txn-capable
> environment and a txn-aware db table, I cannot create a secondary
> index without a transaction. If anybody has insights, that would be
> great.
>
> Otherwise, I can only see 2 choices:
> 1. Disable the magic: slow but sure on import.
> 2. Disable the magic only if the database is too large. This brings
>    up the question of how large is too large.

Without changing anything:
importing the database takes ~105 minutes (1h45) for 124000 people.
The maximum file space used to load it was 1.9GB; the actual size is
730MB. When we start gramps, loading takes ~40 seconds, selecting the
events view takes 9 seconds (201000 events), and clicking on the
events ID column to sort it takes 13 seconds.
So, for a big database, it's very fast! Don't forget python is an
interpreted language. My CPU is an AMD Athlon(tm) 64 3500+.

> But again, if anybody could fix the real problem, or at least figure
> out what to do, that would be great.
>
> Alex
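For reference, the per-commit txn pattern Alex describes would look
roughly like this with bsddb (a sketch, not the actual gramps code;
batch_put and the record layout are made up):

    from bsddb import db

    def batch_put(env, table, records):
        # each commit is its own short-lived txn, not one huge txn
        # covering the whole import
        for handle, data in records:
            txn = env.txn_begin()                 # start txn
            try:
                table.put(handle, data, txn=txn)  # write data
            except db.DBError:
                txn.abort()
                raise
            txn.commit()                          # commit txn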
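And if the secondary-index rebuild goes through DB.associate(), it
would look something like this (again just a sketch under that
assumption; rebuild_surname_index and find_surname are made-up names,
and person_db stands for the already-open primary table):

    from bsddb import db

    def rebuild_surname_index(env, person_db):
        def find_surname(primary_key, primary_data):
            # extract the secondary key from a primary record
            # (illustration only)
            return primary_data.split('|')[1]
        surname_db = db.DB(env)
        surname_db.open('surname.db', dbtype=db.DB_HASH,
                        flags=db.DB_CREATE)
        # in a txn-capable environment this open/associate pair seems
        # to insist on a txn -- the exact problem Alex describes
        person_db.associate(surname_db, find_surname)
        return surname_db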