From: TJMcK <tim...@gm...> - 2014-04-28 19:31:37
With my very limited knowledge of databases and how they work (from my recent reading of Oracle-related docs), the speed of a database (and of Gramps), and how fast it can index, is directly proportional to the size of the tables. So to speed up Gramps it may be necessary to split or partition tables (which may be a huge or impossible task??).

Now, is it correct to say that when I see a file like reference_map.db, and it's double the size of a couple of other files and nearly 10x larger than the average db file, that there is a bottleneck here? (This is also the most accessed db file -- almost constant activity.)

Or quite possibly I just need to understand something else about this reference_map.db file and why it's so large... Is it large for other users too?

--
View this message in context: http://gramps.1791082.n4.nabble.com/why-is-reference-map-db-such-a-huge-file-tp4665778.html
Sent from the GRAMPS - User mailing list archive at Nabble.com.
From: Enno B. <enn...@gm...> - 2014-04-28 21:27:56
Tim,

> Or quite possibly I just need to understand something else about this
> reference_map.db file.. and why it's so large... Is it large for other users
> too?

It is large here too, and to me the name suggests that this one, and the one named referenced_map.db, connect all the other ones. That would also explain why it is accessed so much. If that is true, one may expect this file to grow with the number of connections between persons, events, sources, and so forth, and it may indeed be a candidate for optimizing. And I bet you're right that splitting or partitioning it may be a huge task...

regards,

Enno
From: Bruce M. <moo...@gm...> - 2014-04-28 21:50:38
Attachments:
smime.p7s
moore_bw_22.vcf
Oracle/DB2/Postgres/MySQL etc. are designed for dramatically (repeat, dramatically) larger databases, where you can spend tens (or hundreds) of thousands of dollars on a software license. Gramps uses Berkeley DB (BDB), a common (and good) choice for embedded applications. The partitioning functions discussed in the Oracle documentation are generally not going to be available in BDB, and thus not in Gramps.

It is also important to note that the speed of building an index and the speed of accessing a table via an index are two completely different things: building an index may take several hours, while a random access will take a few milliseconds. Building an index is definitely related to table size; accessing an index is, for all practical purposes, unrelated to size. I doubt that BDB offers partitioning.

I don't know the Gramps entity-relationship model, but I suspect that the reference map table contains all of the pointers between people and people, people and citations, citations and sources, people and places, etc., so it probably has at least one row for each row in every other table in the database. It doesn't surprise me that it is quite large. Since all of the accesses likely include the relationship type and the object id, all of the accesses will be index-based, and very, very fast. If Gramps were doing table scans for everything, it would be so slow as to be completely unusable.

If you are interested in improving database performance, I would read up on the BDB DB_CONFIG file and see if there are ways to change caching, page size, and locking. In most embedded applications the development choices are based on settings that will run in all environments, rather than on high performance. You may be able to change some settings that take advantage of more memory on your machine (but which might not run at all on a smaller box). You might also investigate utilities for reorganizing (sorting) tables and/or indexes; I don't know if BDB offers these capabilities. Read the recent thread on loading very large GEDCOM files for a starting place.

Bruce Moore

On 04/28/2014 02:31 PM, TJMcK wrote:
> Or quite possibly I just need to understand something else about this
> reference_map.db file.. and why it's so large... Is it large for other users
> too?
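[For reference, the DB_CONFIG tuning Bruce mentions might look something like the sketch below. The directive names are standard BDB environment directives, but the values are made-up examples, and whether Gramps actually honors any given directive would need testing.]

```
# DB_CONFIG -- placed in the BDB environment directory
# (for Gramps, the folder holding the *.db files)
set_cachesize 0 268435456 1      # one 256 MB cache region (gbytes bytes ncaches)
set_lk_max_locks 25000           # raise lock limits for large databases
set_lk_max_objects 25000
```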
From: TJMcK <tim...@gm...> - 2014-04-29 00:29:19
Thanks Bruce... lots of interesting stuff that gets me on track with my reading. I made the assumption that, since the db utilities I downloaded from Oracle worked with BDB, Oracle was a variation of BDB...

As for optimizing Gramps with DB_CONFIG: I did spend quite a bit of time testing Gramps with this file, but the only noticeable speed improvement was on Gramps startup. I even tried setting the cache to 0 bytes and Gramps ran as normal. But maybe I'll spend some more time researching this... Another concern I have is that Gramps may not be coded to use the DB_CONFIG file for all the parts (modules?) that could use extra RAM. (Again, I thought that what I read regarding the DB_CONFIG file was in the Oracle documentation... so I'll check this again.)

Something that would be of interest would be putting just the reference_map.db on a RAM disk. Would there be any way to "link" a RAM disk and the database folder so they're treated as one and the same? Then I could write a script to copy the reference file into RAM before loading Gramps, and copy it back into the hard-drive db folder after Gramps closes. (And since I discovered that Linux already uses a hidden RAM disk, I'd just use that for the reference db file.)
From: Ron J. <ron...@co...> - 2014-04-29 06:34:57
On 04/28/2014 07:28 PM, TJMcK wrote:
[snip]
> Something that would be of interest would be putting just the
> reference_map.db in a ram disk. Would there be anyway to "link' a ram disk
> and the database folder to think it's one and the same?

unionfs might be what you're looking for.

> Then I could write
> a script to copy the reference file into the ram before loading gramps and
> copy it back into the harddrive db folder after gramps closes.

Why not just mv the file to the ramdisk, then symlink it back into the db directory?

--
"Mathematics deals exclusively with the relations of concepts to each other without consideration of their relation to experience." Albert Einstein
From: TJMcK <tim...@gm...> - 2014-04-29 20:05:51
I knew nothing about symlinks before you mentioned them... But from what I've read, it would have to be a "fast symlink", and that doesn't appear to be a simple process from what I've seen. So far I haven't found any clear info about this for an amateur such as me. I will continue to research this option, as it seems a good one for storing reference_map.db (or even all the *map.db files). And of course I need to test this setup to see if it will even work...
From: Ron J. <ron...@co...> - 2014-04-29 22:53:46
I've been using Linux for 14 years and have never heard of "fast symlinks".

[pause]

Ah, it's a change in the internal format in which the FS stores link info. It's so old that it's the only way I've ever seen symlinks stored.

As for difficulty... pish. Couldn't be simpler! Let's pretend that the RAM disk is mounted at "~/ramdisk". Then, from a command window, what you do is:

  ## Set up
  cd ~/.gramps/grampsdb/{hex-string}
  mv reference_map.db ~/ramdisk
  ls -aFl ~/ramdisk               ## TO VERIFY
  ln -s ~/ramdisk/reference_map.db
  ls -aFl reference_map.db        ## TO VERIFY. (Will be *tiny*)

  ## DO YOUR GENEALOGY STUFF HERE
  gramps

  ## Cleanup
  rm reference_map.db
  mv ~/ramdisk/reference_map.db .
  ls -aFl reference_map.db        ## TO VERIFY. Should be huge.

On 04/29/2014 03:05 PM, TJMcK wrote:
[snip]
> I will continue to research this option as
> this seems to be a good option for storing reference_map.db (or even all the
> map.db files).
From: Ron J. <ron...@co...> - 2014-04-29 23:29:57
Such is the risk of using RAM disks. But really, when was the last time your Linux PC froze or hung? In my case, it's been a *long* time. Anyway, because you're a good and conscientious user, you make frequent backups, so in the very unlikely case that your PC crashes, you won't have lost a huge amount.

On 04/29/2014 06:06 PM, Tom Samstag wrote:
[snip]
> While Ron's directions would work, if your OS crashed or you ran out of power or something else
> caused your computer to power off while you were in the "GENEALOGY STUFF" period, you'd be left with
> a dangling symlink and no database.

--
"Mathematics deals exclusively with the relations of concepts to each other without consideration of their relation to experience." Albert Einstein
From: Tom S. <gra...@mo...> - 2014-04-29 23:34:41
First-time poster here, but I found it important enough to mention: if you're talking about storing your db on a ramdisk (such as a /dev/shm-style disk), remember that there is no persistence. While Ron's directions would work, if your OS crashed, or you ran out of power, or something else caused your computer to power off while you were in the "GENEALOGY STUFF" period, you'd be left with a dangling symlink and no database.

On 2014-04-29 15:53, Ron Johnson wrote:
[snip]
> Let's pretend that the RAM disk is mounted at "~/ramdisk". Then, from a
> command window, what you do is:
[snip]
From: Ken B. <kb...@te...> - 2014-04-30 00:07:01
That would not seem to be the case. I moved all three *map.db files out of the way and then loaded my small database in Gramps. The database opened fine, and the three files were recreated, but at a much smaller size. I have no idea what these files do or whether they are actually loaded at startup. I'll have to experiment with my truly huge database.

Ken.

On 29/04/14 04:06 PM, Tom Samstag wrote:
[snip]
> you'd be left with
> a dangling symlink and no database.
From: TJMcK <tim...@gm...> - 2014-04-30 21:30:20
I have done a fair bit of testing with putting some db files, and/or all db files, on a ramdisk.

*Conclusion*: for the hardware that I use, putting any or all of the db on a ramdisk gives little or no improvement in speed!

*Caution*: I hope that anyone who tried this made backups, because there is a high probability of corruption (I have posted a bug). The files that you move between different folders appear to get corrupted with low-level errors... only your backup will save the testing db.

I've really appreciated the feedback about "large db files"... unfortunately I didn't make any great discoveries to increase the speed of Gramps. But now my next step will be to attempt to profile Gramps... I've installed a couple of Python profilers and KCachegrind to see what they can tell me. Do the developers do this on a regular basis? Has anyone ever done this? Maybe I should start a new topic?