From: TJMcK <tim...@gm...> - 2014-04-28 19:31:37
With my very limited knowledge of databases and how they work (from my recent reading of Oracle-related docs), the speed of a database (and of Gramps), and how fast it can index, is directly proportional to the size of the tables. So to speed up Gramps it may be necessary to split or partition tables (which may be a huge or impossible task??).

Now, is it correct to say that when I see a file like reference_map.db, and it's double the size of a couple of other files and nearly 10x larger than the average db file, that there is a bottleneck here? (This is also the most accessed db file -- almost constant activity.)

Or quite possibly I just need to understand something else about this reference_map.db file and why it's so large... Is it large for other users too?

--
View this message in context: http://gramps.1791082.n4.nabble.com/why-is-reference-map-db-such-a-huge-file-tp4665778.html
Sent from the GRAMPS - User mailing list archive at Nabble.com.
From: Enno B. <enn...@gm...> - 2014-04-28 21:27:56
Tim,

> Or quite possibly I just need to understand something else about this
> reference_map.db file.. and why it's so large... Is it large for other users
> too?

It is large here too, and to me the name suggests that this one, and the one named referenced_map.db, connect all the other ones. That would also explain why it is accessed so much. If that is true, one may expect this file to grow with the number of connections between persons, events, sources, and so forth, and it may indeed be a candidate for optimizing. And I bet you're right that splitting or partitioning it may be a huge task...

regards,

Enno
From: Bruce M. <moo...@gm...> - 2014-04-28 21:50:38
Attachments:
smime.p7s
moore_bw_22.vcf
Oracle/DB2/Postgres/MySQL etc. are designed for dramatically (repeat, dramatically) larger databases, where you can spend tens (or hundreds) of thousands of dollars on a software license. Gramps uses Berkeley DB (BDB), a common (and good) choice for embedded applications. The partitioning functions discussed in the Oracle documentation are generally not going to be available in BDB, and thus not in Gramps.

It is also important to note that the speed of building an index and the speed of accessing a table via an index are two completely different things: building an index may take several hours, while a random access will take a few milliseconds. Building an index is definitely related to table size; accessing an index is, for all practical purposes, unrelated to size. I doubt that BDB offers partitioning.

I don't know the Gramps entity-relationship model, but I suspect that the reference map table contains all of the pointers between people and people, people and citations, citations and sources, people and places, etc., so it probably has at least one row for each row in every other table in the database. It doesn't surprise me that it is quite large. Since all of the accesses likely include the relationship type and the object id, all of the accesses will be index-based, and very, very fast. If Gramps were doing table scans for everything, it would be so slow as to be completely unusable.

If you are interested in improving database performance, I would read up on the BDB DB_CONFIG file and see if there are ways to change caching, page size, and locking. In most embedded applications the development choices are based on settings that will run in all environments, rather than on high performance. You may be able to change some settings that take advantage of more memory on your machine (but which might not run at all on a smaller box). You might also investigate utilities for reorganizing (sorting) tables and/or indexes; I don't know if BDB offers these capabilities. Read the recent thread on loading very large GEDCOM files for a starting place.

Bruce Moore

On 04/28/2014 02:31 PM, TJMcK wrote:
> Or quite possibly I just need to understand something else about this
> reference_map.db file.. and why it's so large... Is it large for other users
> too?
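[For reference, the DB_CONFIG tuning Bruce mentions might look something like the sketch below. The directive names are standard BDB environment directives, but the values are made-up examples, and whether Gramps actually honors any given directive would need testing.]

```
# DB_CONFIG -- placed in the BDB environment directory
# (for Gramps, the folder holding the *.db files)
set_cachesize 0 268435456 1      # one 256 MB cache region (gbytes bytes ncaches)
set_lk_max_locks 25000           # raise lock limits for large databases
set_lk_max_objects 25000
```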
From: TJMcK <tim...@gm...> - 2014-04-29 00:29:19
Thanks Bruce... lots of interesting stuff that gets me on track with my reading. I made the assumption that, since the db utilities I downloaded from Oracle worked with BDB, Oracle was a variation of BDB...

As for optimizing Gramps with DB_CONFIG: I did spend quite a bit of time testing Gramps with this file, but the only noticeable speed improvement was on Gramps startup. I even tried setting the cache to 0 bytes and Gramps ran as normal. But maybe I'll spend some more time researching this... Another concern I have is that Gramps may not be coded to use the DB_CONFIG file for all the parts (modules?) that could use extra RAM. (Again, I thought that what I read regarding the DB_CONFIG file was in the Oracle documentation... so I'll check this again.)

Something that would be of interest would be putting just the reference_map.db on a RAM disk. Would there be any way to "link" a RAM disk and the database folder so they're treated as one and the same? Then I could write a script to copy the reference file into RAM before loading Gramps, and copy it back into the hard-drive db folder after Gramps closes. (And since I discovered that Linux already uses a hidden RAM disk, I'd just use that for the reference db file.)
From: Ron J. <ron...@co...> - 2014-04-29 06:34:57
On 04/28/2014 07:28 PM, TJMcK wrote:
[snip]
> Something that would be of interest would be putting just the
> reference_map.db in a ram disk. Would there be anyway to "link' a ram disk
> and the database folder to think it's one and the same?

unionfs might be what you're looking for.

> Then I could write
> a script to copy the reference file into the ram before loading gramps and
> copy it back into the harddrive db folder after gramps closes.

Why not just mv the file to the ramdisk, then symlink it back into the db directory?

--
"Mathematics deals exclusively with the relations of concepts to each other without consideration of their relation to experience." Albert Einstein
From: TJMcK <tim...@gm...> - 2014-04-29 20:05:51
I knew nothing about symlinks before you mentioned them... But from what I've read, it would have to be a "fast symlink", and that doesn't appear to be a simple process from what I've seen. So far I haven't found any clear info about this for an amateur such as me. I will continue to research this option, as it seems a good one for storing reference_map.db (or even all the *map.db files). And of course I need to test this setup to see if it will even work...
From: Ron J. <ron...@co...> - 2014-04-29 22:53:46
I've been using Linux for 14 years and have never heard of "fast symlinks".

[pause]

Ah, it's a change in the internal format in which the FS stores link info. It's so old that it's the only way I've ever seen symlinks stored.

As for difficulty... pish. Couldn't be simpler! Let's pretend that the RAM disk is mounted at "~/ramdisk". Then, from a command window, what you do is:

  ## Set up
  cd ~/.gramps/grampsdb/{hex-string}
  mv reference_map.db ~/ramdisk
  ls -aFl ~/ramdisk               ## TO VERIFY
  ln -s ~/ramdisk/reference_map.db
  ls -aFl reference_map.db        ## TO VERIFY. (Will be *tiny*)

  ## DO YOUR GENEALOGY STUFF HERE
  gramps

  ## Cleanup
  rm reference_map.db
  mv ~/ramdisk/reference_map.db .
  ls -aFl reference_map.db        ## TO VERIFY. Should be huge.

On 04/29/2014 03:05 PM, TJMcK wrote:
[snip]
> I will continue to research this option as
> this seems to be a good option for storing reference_map.db (or even all the
> map.db files).
From: Ron J. <ron...@co...> - 2014-04-29 23:29:57
Such is the risk of using RAM disks. But really, when was the last time your Linux PC froze or hung? In my case, it's been a *long* time. Anyway, because you're a good and conscientious user, you make frequent backups, so in the very unlikely case that your PC crashes, you won't have lost a huge amount.

On 04/29/2014 06:06 PM, Tom Samstag wrote:
[snip]
> While Ron's directions would work, if your OS crashed or you ran out of power or something else
> caused your computer to power off while you were in the "GENEALOGY STUFF" period, you'd be left with
> a dangling symlink and no database.

--
"Mathematics deals exclusively with the relations of concepts to each other without consideration of their relation to experience." Albert Einstein
From: Tom S. <gra...@mo...> - 2014-04-29 23:34:41
First-time poster here, but I found it important enough to mention: if you're talking about storing your db on a ramdisk (such as a /dev/shm-style disk), remember that there is no persistence. While Ron's directions would work, if your OS crashed, or you ran out of power, or something else caused your computer to power off while you were in the "GENEALOGY STUFF" period, you'd be left with a dangling symlink and no database.

On 2014-04-29 15:53, Ron Johnson wrote:
[snip]
> Let's pretend that the RAM disk is mounted at "~/ramdisk". Then, from a
> command window, what you do is:
[snip]
From: Ken B. <kb...@te...> - 2014-04-30 00:07:01
That would not seem to be the case. I moved all three *map.db files out of the way and then loaded my small database in Gramps. The database opened fine, and the three files were recreated, but at a much smaller size. I have no idea what these files do or whether they are actually loaded at startup. I'll have to experiment with my truly huge database.

Ken.

On 29/04/14 04:06 PM, Tom Samstag wrote:
[snip]
> you'd be left with
> a dangling symlink and no database.
From: TJMcK <tim...@gm...> - 2014-04-30 21:30:20
I have done a fair bit of testing with putting some db files, and/or all db files, on a ramdisk.

*Conclusion*: for the hardware that I use, putting any or all of the db on a ramdisk gives little or no improvement in speed!

*Caution*: I hope that anyone who tried this made backups, because there is a high probability of corruption (I have posted a bug). The files that you move between different folders appear to get corrupted with low-level errors... only your backup will save the testing db.

I've really appreciated the feedback about "large db files"... unfortunately I didn't make any great discoveries to increase the speed of Gramps. But now my next step will be to attempt to profile Gramps... I've installed a couple of Python profilers and KCachegrind to see what they can tell me. Do the developers do this on a regular basis? Has anyone ever done this? Maybe I should start a new topic?