Serge,
I am copying the devel list, as it is better suited to this topic.
On Fri, 2007-11-09 at 23:04 +0000, Serge Noiraud wrote:
> On Thursday, 25 October 2007, Lee Myers wrote:
> > What is the largest database anyone has used in Gramps successfully? I have
> > a lot of problems with programs freezing up during imports, exports and
> > everything else because of the size of my files.
> > It would be nice to know if I have to keep looking for a good large
> > database editor.
> >
>
> I have ~ 124000 in one of my databases:
>
> I do indeed have many problems during import with SVN.
>
> At 100%, gramps uses 163 logs ( log.0000000* ) in the database directory to load it.
> Each log is 10MB in size.
> At this point, gramps uses approximately 350MB of memory.
> At this point, 142 logs are freed, then we have new logs.
> So the maximum size used in the file system is:
> 163x10MB + the real size of the database, which in my case is 500MB.
> So I need 1.63+.5=2.13GB to load it.
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> serge 8921 83.7 29.7 348064 288692 pts/0 R+ 21:25 109:27 python src/gramps.py ( 100% file loaded )
>
> My biggest problem is "Lock table is out of available locks".
>
> A few months ago, to load it I needed to modify the following lines like this:
>
> --- src/GrampsDb/_GrampsBSDDB.py.orig 2007-08-31 15:24:51.000000000 +0200
> +++ src/GrampsDb/_GrampsBSDDB.py 2007-08-31 15:26:23.000000000 +0200
> @@ -405,8 +405,8 @@
>
>          if self.UseTXN:
>              # These env settings are only needed for Txn environment
> -            self.env.set_lk_max_locks(25000)
> -            self.env.set_lk_max_objects(25000)
> +            self.env.set_lk_max_locks(125000)
> +            self.env.set_lk_max_objects(125000)
>              self.env.set_flags(db.DB_LOG_AUTOREMOVE, 1) # clean up unused logs
>
>              # The DB_PRIVATE flag must go if we ever move to multi-user setup
> --- src/GrampsDb/_GrampsDBDir.py.orig 2007-08-31 15:24:15.000000000 +0200
> +++ src/GrampsDb/_GrampsDBDir.py 2007-08-31 15:26:14.000000000 +0200
> @@ -475,8 +475,8 @@
>
>          if self.UseTXN:
>              # These env settings are only needed for Txn environment
> -            self.env.set_lk_max_locks(25000)
> -            self.env.set_lk_max_objects(25000)
> +            self.env.set_lk_max_locks(125000)
> +            self.env.set_lk_max_objects(125000)
>              self.env.set_flags(db.DB_LOG_AUTOREMOVE, 1) # clean up unused logs
>
>              # The DB_PRIVATE flag must go if we ever move to multi-user setup
>
>
> Now with the latest svn, it doesn't work.
> I changed the value, which is now 30000, and set it to 2000000!
> I still can't load it.
> I tried adding self.env.set_lk_max_lockers(125000) in these two files, without success.
>
> The solution given by Robert Cawley, which is to put a DB_CONFIG file in the database directory,
> works, but by the time we hit the problem it is too late: the database is partially loaded.
> So to create this file, we must delete the database, re-create it, set up the DB_CONFIG and
> only then can we import the big file. ( This file doesn't work with the latest svn )
>
> How will aunt Marta do this? How can she know the values to put in this file?
>
> These are not good solutions.
>
> I think the default number of locks, which is 1000, should be sufficient.
> OK for 10000, but not more than that.
>
> The problem is the transaction.
> When we work normally, adding some people, modifying some others, adding a few notes,
> we use few locks. So I think the default ( 1000 ) is OK.
>
> There are several questions to ask:
>
> 1 - Do we need a transaction when we import a file?
> The transaction is in batch mode, so:
> A batch transaction does not store the commits
> Aborting the session completely will become impossible
> Undo is also impossible after batch transaction

A batch transaction is not itself a bsddb transaction (txn).
It is called a transaction in our own sense (the gramps sense), which
is different from a txn. The batch transaction is not a single
huge txn that stores all commits.
Instead, if the transaction is batch (import, tool, etc.), then each
commit call does this:
    start txn
    write data
    commit txn
So if you have a million commits you will start, write, and end
a million txns. No problem here.
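
To make the per-commit cycle concrete, here is a minimal sketch. It uses Python's sqlite3 module purely as a stand-in, not the bsddb calls gramps actually makes, and the `person` table, `open_demo_db` and `batch_import` are made-up names for illustration only.

```python
import sqlite3


def batch_import(conn, records):
    """Write every record in its own short transaction.

    Hypothetical helper: sqlite3 stands in for bsddb here.
    """
    txn_count = 0
    for handle, data in records:
        conn.execute("BEGIN")  # start txn
        conn.execute(
            "INSERT INTO person (handle, data) VALUES (?, ?)",
            (handle, data),
        )  # write data
        conn.execute("COMMIT")  # commit txn
        txn_count += 1
    return txn_count


def open_demo_db():
    # isolation_level=None: the module never opens a transaction on its
    # own, so the explicit BEGIN/COMMIT pairs above are the only ones.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, data TEXT)")
    return conn


if __name__ == "__main__":
    conn = open_demo_db()
    count = batch_import(conn, [("I0001", "Ann"), ("I0002", "Bob")])
    print(count, "records written in", count, "transactions")
```

No transaction here ever holds more than one write, which is why the number of commits by itself should not be what exhausts the locks.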

Now, the problem with large databases is not the locks taken during
the actual import. It is the building of the secondary indices.
The report at http://bugs.gramps-project.org/view.php?id=362
is a good illustration of this.

So here's the background. When we deal with a large import, we first
remove the secondary indices: reference map, surname, etc. Then
we import, and then we rebuild the indices. This approach improves
performance dramatically. Otherwise imports are slow, because
each data commit writes into the real data tables *and* into
the secondary indices.
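
The order of operations can be sketched with a toy in-memory model. This is not bsddb and says nothing about real timings; `ToyTable` and `bulk_import` are hypothetical names, and a plain dict plays the role of the surname index.

```python
class ToyTable:
    """In-memory stand-in for a primary table plus one secondary index."""

    def __init__(self):
        self.primary = {}        # handle -> surname (the "real" data)
        self.surname_index = {}  # surname -> set of handles (secondary index)
        self.index_connected = True

    def _index(self, handle, surname):
        self.surname_index.setdefault(surname, set()).add(handle)

    def commit(self, handle, surname):
        old = self.primary.get(handle)
        self.primary[handle] = surname
        if self.index_connected:
            # With the index connected, every commit is two writes.
            if old is not None and old != surname:
                self.surname_index[old].discard(handle)
            self._index(handle, surname)

    def disconnect_index(self):
        self.index_connected = False
        self.surname_index.clear()

    def rebuild_index(self):
        # One pass over all the data: this is the step that grows
        # with the size of the database.
        for handle, surname in self.primary.items():
            self._index(handle, surname)
        self.index_connected = True


def bulk_import(table, records):
    table.disconnect_index()          # disconnect
    for handle, surname in records:   # import: primary writes only
        table.commit(handle, surname)
    table.rebuild_index()             # rebuild
```

After `bulk_import` the index is connected again, so ordinary commits go back to updating both the data and the index.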

All is fine, up until the data is too large to rebuild
the secondary index: the rebuild runs inside a txn, and that appears
to be where the lock table runs out. We could of course remove that
"disconnect-import-rebuild" trick and do none of this. Then we would
be OK but slow.
A proper solution would be to find a way to create a secondary
index without the use of a txn. Since the whole import is not a single txn,
we should not worry about wrapping the secondary index rebuild in a single
txn either. Any corruption of the secondary index is also not fatal:
just remove it and rebuild again.

The problem is, I could not get this working. If I have a txn-capable
environment and a txn-aware db table, I cannot create a secondary
index without a transaction. If anybody has insights, that would be great.

Otherwise, I can only see two choices:
1. Disable the magic: slow but sure on import.
2. Disable the magic only if the database is too large. This brings
   up the question of how large is too large.
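
Choice 2 would amount to something like the sketch below. The function name, the two strategy labels and the `max_rebuild_records` limit are all assumptions made for illustration; the right value for that limit is exactly the open question.

```python
def choose_import_strategy(record_count, max_rebuild_records):
    """Pick an import strategy from the size of the incoming data.

    max_rebuild_records is hypothetical: it stands for the largest
    import whose index rebuild is still known to succeed.
    """
    if record_count <= max_rebuild_records:
        return "disconnect-import-rebuild"  # fast path ("the magic")
    return "import-with-indices"            # slow but sure
```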

But again, if anybody could fix the real problem, or at least figure out
what to do, it would be great.
Alex
-- 
Alexander Roitman http://gramps-project.org