From: Serge N. <Ser...@fr...> - 2007-11-12 20:42:17
On Saturday 10 November 2007, Alex Roitman wrote:
> Serge,
>
> I am copying the devel list, as it is better suited to this topic.
>
> On Fri, 2007-11-09 at 23:04 +0000, Serge Noiraud wrote:
> > On Thursday 25 October 2007, Lee Myers wrote:
> > > What is the largest database anyone has used in Gramps successfully?
> > > I have a lot of problems with programs freezing up during imports,
> > > exports and everything else because of the size of my files.
> > > It would be nice to know if I have to keep looking for a good large
> > > database editor.
> >
> > I have ~124000 people in one of my databases.
> >
> > I have many problems during import with SVN.
> >
> > At 100%, gramps uses 163 logs (log.0000000*) in the database
> > directory to load it. Each log is 10MB in size.
> > At this point, gramps uses approximately 350MB of memory.
> > At this point, 142 logs are freed, then new logs are created.
> > So the maximum space used in the file system is:
> > 163 x 10MB + the real size of the database, which in my case is 500MB.
> > So I need 1.63 + 0.5 = 2.13GB to load it.
> >
> > USER  PID  %CPU %MEM VSZ    RSS    TTY   STAT START TIME   COMMAND
> > serge 8921 83.7 29.7 348064 288692 pts/0 R+   21:25 109:27 python src/gramps.py  (100% file loaded)
> >
> > My biggest problem is "Lock table is out of available locks".

It took me a long time to find the following: after digging in the
Berkeley DB sources, I saw something very interesting. Berkeley DB uses
a shared memory segment, and there is a kernel parameter for it, shmmax,
whose default value is 32MB. I tried a big value (128 000 000) and, on
the next try, my database loaded without error.

At that point set_lk_max_locks and set_lk_max_objects had big values.
With the default value used in gramps (25000) it doesn't work, so 25000
is not enough. It worked with 80000, and perhaps less would do.

I'm running tests with different database sizes to find out how many
locks and how much shared memory are needed. It could take some time,
but I think I'm on the right track. If someone agrees to send me
databases of different sizes, I could run these tests on them,
e.g. 30000 people, 50000, and above 150000. Is it possible to get the
850000-people database?

...
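For illustration, raising these limits when opening the Berkeley DB
environment would look something like this (just a sketch, not the
actual gramps code; the path and the 80000 figure are only the values
from my tests, and the kernel limit is raised separately, e.g. with
"sysctl -w kernel.shmmax=128000000"):

    from bsddb import db   # "from bsddb3 import db" on some setups

    env = db.DBEnv()
    env.set_lk_max_locks(80000)    # the 25000 default was not enough
    env.set_lk_max_objects(80000)
    env.open('/tmp/grampsdb',      # illustrative path
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
             db.DB_INIT_LOG | db.DB_INIT_TXN)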
> > There are several questions to ask:
> >
> > 1 - Do we need a transaction when we import a file?
> >     The transaction is in batch mode, so:
> >     - a batch transaction does not store the commits;
> >     - aborting the session completely becomes impossible;
> >     - undo is also impossible after a batch transaction.
>
> A batch transaction does not use bsddb transactions (txn).
> It is called a transaction in our own sense (the gramps sense), which
> is different from a txn. The batch transaction is not a single huge
> txn that stores all the commits.
>
> Instead, if the transaction is batch (import, tool, etc.) then each
> commit call does this:
>     start txn
>     write data
>     commit txn
>
> So if you have a million commits you will start, write, and end a
> million txns. No problem here.
>
> Now the problem with large databases is not the locks during the
> actual import. It is the problem of building the secondary indices.
> Indeed, http://bugs.gramps-project.org/view.php?id=362 is a good
> report to see this.

After tracing with pdb, I agree with that.

> So here's the background. When we deal with a large import, we first
> remove the secondary indices: reference map, surname, etc. Then we
> import, and then we rebuild the indices. This approach improves the
> performance dramatically. Otherwise imports are slow, because each
> data commit writes into the real data tables *and* into the
> secondary index.
>
> All is fine, up until the data is too large to rebuild the secondary
> index. We could of course remove that "disconnect-import-rebuild"
> trick and do none of this. Then we would be OK, but slow.
> A proper solution would be to find a way to create a secondary index
> without the use of a txn. Since the whole import is not a single txn,
> we should not worry about wrapping the secondary index into a single
> txn either. Corruption of the secondary index is also not fatal:
> just remove it and rebuild again.
>
> The problem is, I could not get this working. If I have a txn-capable
> environment and a txn-aware db table, I cannot create a secondary
> index without a transaction. If anybody has insights, that would be
> great.
>
> Otherwise, I can only see 2 choices:
> 1. Disable the magic: slow but sure on import.
> 2. Disable the magic only if the database is too large. This brings
>    up the question of how large is too large.

Without changing anything:
importing the database takes ~105 minutes (1h45) for 124000 people.
The maximum file space used to load it was 1.9GB; the actual size is
730MB. When we start gramps, loading takes ~40 seconds, selecting the
events view takes 9 seconds (201000 events), and clicking on the
events ID column to sort it takes 13 seconds.
So, for a big database, it's very fast! Don't forget python is an
interpreted language. My CPU is an AMD Athlon(tm) 64 3500+.

> But again, if anybody could fix the real problem, or at least figure
> out what to do, that would be great.
>
> Alex
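For reference, the per-commit txn pattern Alex describes would look
roughly like this with bsddb (a sketch, not the actual gramps code;
batch_put and the record layout are made up):

    from bsddb import db

    def batch_put(env, table, records):
        # each commit is its own short-lived txn, not one huge txn
        # covering the whole import
        for handle, data in records:
            txn = env.txn_begin()                 # start txn
            try:
                table.put(handle, data, txn=txn)  # write data
            except db.DBError:
                txn.abort()
                raise
            txn.commit()                          # commit txn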
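And if the secondary-index rebuild goes through DB.associate(), it
would look something like this (again just a sketch under that
assumption; rebuild_surname_index and find_surname are made-up names,
and person_db stands for the already-open primary table):

    from bsddb import db

    def rebuild_surname_index(env, person_db):
        def find_surname(primary_key, primary_data):
            # extract the secondary key from a primary record
            # (illustration only)
            return primary_data.split('|')[1]
        surname_db = db.DB(env)
        surname_db.open('surname.db', dbtype=db.DB_HASH,
                        flags=db.DB_CREATE)
        # in a txn-capable environment this open/associate pair seems
        # to insist on a txn -- the exact problem Alex describes
        person_db.associate(surname_db, find_surname)
        return surname_db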