From: <ri...@dr...> - 2003-07-22 11:32:41
Hi!

We're looking at creating an object database on top of JDBM, and so far it seems to be doable, but I have some questions about JDBM. Here we go:

* When I do commit() on a recMgr I thought that the .lg file was emptied, but that doesn't seem to be the case. What exactly happens during a commit()?

* I've made a testcase inserting 10M objects, doing a commit every 100k inserts. However, if I output Runtime.freeMemory() it continuously drops, and doesn't seem to ever recover. Is there a memory leak somewhere? Is JDBM holding onto objects somewhere?

* I tried the same testcase above but first called disableTransactions. After the testcase there are 10M objects, but if I close and open the database it says 0. Is there anything I need to do to make the objects really persist?

* I'd like to implement read/write locks using Doug Lea's concurrency utilities. Is this ok? Would this be interesting for anyone else? Essentially, we *need* read/write locks in order to be able to do runtime backups of the database (which acquire a read lock).

* Do you know of any other OpenSource object databases built on top of JDBM?

* About the project itself, there doesn't seem to be much happening anymore, although it doesn't seem to be dead either. Can anyone give some estimates about the status of it?

That's all so far, but I think more questions will come up along the way.

/Rickard

--
Rickard Öberg  ri...@dr...  Senselogic
Got blog? I do. http://dreambean.com

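A side note on the Runtime.freeMemory() observation above: free memory only recovers after a garbage collection runs, so a steadily falling reading during a bulk insert is not by itself proof of a leak. A minimal, self-contained sketch of a fairer measurement (not JDBM-specific; class and variable names are illustrative):

```java
// Demonstrates why Runtime.freeMemory() falls during a bulk allocation
// and typically only recovers after a GC cycle. System.gc() is a hint,
// not a guarantee, so treat the numbers as indicative only.
public class FreeMemoryProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Allocate a burst of short-lived objects, as an insert loop would.
        for (int i = 0; i < 100000; i++) {
            byte[] scratch = new byte[64];
        }
        long beforeGc = rt.freeMemory();

        // Request a collection; freeMemory() usually recovers afterwards.
        System.gc();
        long afterGc = rt.freeMemory();

        System.out.println("free before GC: " + beforeGc
                + ", free after GC: " + afterGc);
    }
}
```

A real leak check would compare totalMemory() - freeMemory() after forced collections across several commit cycles, rather than watching freeMemory() alone.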
From: <ri...@dr...> - 2003-07-22 13:26:39
Rickard Öberg wrote:
> We're looking at creating an object database on top of JDBM, and so far
> it seems to be doable but I have some questions about JDBM. Here we go:
>
> * When I do commit() on a recMgr I thought that the .lg file was
> emptied, but that doesn't seem to be the case. What exactly happens
> during a commit()?

Calling commit on the cache manager and not the base manager did the trick.

Is there any way I can have a cache mgr for the BTree state and just a base mgr for the objects being stored? I don't want the object state to be cached at all (since I do that on my own anyway), but I really do want the BTree to be cached inside JDBM.

> * I've made a testcase inserting 10M objects, doing a commit every 100k
> of inserts. However, if I output Runtime.freeMemory() it continuously
> drops, and doesn't seem to ever recover. Is there a memory leak
> somewhere? Is JDBM holding onto objects somewhere?

This might be related to the cache mgr, I think.

/Rickard

--
Rickard Öberg  ri...@dr...  Senselogic
Got blog? I do. http://dreambean.com

From: <ri...@dr...> - 2003-08-04 13:04:10
Alex, if you can take a quick peek at these Q's that'd be great.

Rickard Öberg wrote:
> We're looking at creating an object database on top of JDBM, and so far
> it seems to be doable but I have some questions about JDBM. Here we go:
>
> * When I do commit() on a recMgr I thought that the .lg file was
> emptied, but that doesn't seem to be the case. What exactly happens
> during a commit()?

In particular, if we want to do backups, is it enough to always just grab the .db file? Is the .lg file strictly a temporary file? (If yes, would it be possible to specify a tmp directory for JDBM? I don't want to see it at all, really.)

> * I'd like to implement read/write locks using Doug Lea's concurrency
> utilities. Is this ok? Would this be interesting for anyone else?
> Essentially, we *need* read/write locks in order to be able to do
> runtime backups of the database (which acquires a read lock).

After fiddling some more, it seems as though the main methods of JDBM would keep their synchronization, but our layer on top of JDBM would use read/write locks to allow for "long transactions" while backups and optimization jobs are running.

> * Do you know of any other OpenSource object databases built on top of
> JDBM?
>
> * About the project itself, it doesn't seem to be happening much
> anymore, although it doesn't seem to be dead either. Can anyone give
> some estimates about the status of it?

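The locking layer Rickard describes can be sketched as follows. Doug Lea's util.concurrent package (the library mentioned in the thread) later evolved into java.util.concurrent, so this sketch uses the standard library's ReentrantReadWriteLock; the store itself is a hypothetical stand-in, not JDBM's API. Updates take the write lock, while reads and long-running jobs such as backups take the read lock, so a backup excludes writers without blocking other readers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of a read/write-locking layer on top of a record store.
// Ordinary updates take the write lock; reads and backups take the
// read lock, so a backup blocks writers but not concurrent readers.
public class LockedStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, String> records = new HashMap<>();
    private long nextId = 1;

    public long insert(String value) {
        lock.writeLock().lock();
        try {
            long id = nextId++;
            records.put(id, value);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public String fetch(long id) {
        lock.readLock().lock();
        try {
            return records.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A "long transaction" in the sense discussed above: holding the
    // read lock keeps writers out for the whole copy, while other
    // readers proceed concurrently.
    public Map<Long, String> backupSnapshot() {
        lock.readLock().lock();
        try {
            return new HashMap<>(records);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The underlying JDBM methods would keep their own synchronization, as discussed; this layer only serializes backups against writers.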
From: Alex B. <boi...@in...> - 2003-08-04 19:09:54
Rickard,

See comments inline.

> Rickard Öberg wrote:
>
>> We're looking at creating an object database on top of JDBM, and so
>> far it seems to be doable but I have some questions about JDBM. Here we
>> go:
>>
>> * When I do commit() on a recMgr I thought that the .lg file was
>> emptied, but that doesn't seem to be the case. What exactly happens
>> during a commit()?
>
> In particular, if we want to do backups is it enough to always just get
> the .db file? Is the .lg file strictly a temporary file? (if yes, would
> it be possible to specify a tmp directory for jdbm? I don't want to see
> it at all really)

The log file is not just a temporary file. It may hold data from committed transactions that have not yet been synchronized (to disk) into the main database file.

Hence, you cannot back up only the main database file. You would risk getting an incoherent snapshot of your data.

The transaction log is currently synchronized every 10 transactions (specified in TransactionManager.java), but I plan to make that configurable via options passed to the RecordManagerFactory.

>> * I'd like to implement read/write locks using Doug Lea's concurrency
>> utilities. Is this ok? Would this be interesting for anyone else?
>> Essentially, we *need* read/write locks in order to be able to do
>> runtime backups of the database (which acquires a read lock).
>
> After fiddling some more it seems as though the main methods of JDBM
> would keep their synchronization, but that our layer on top of JDBM
> would use read/write locks to allow for "long transactions" when
> backups and optimization jobs are running.

Yes, that's what I would recommend.

>> * Do you know of any other OpenSource object databases built on top of
>> JDBM?
>>
>> * About the project itself, it doesn't seem to be happening much
>> anymore, although it doesn't seem to be dead either. Can anyone give
>> some estimates about the status of it?

There isn't much planned at the moment between now and version 1.0. My ideal wish list for 1.0 would be the following:

1) Key compression in BTree. This has been requested, and some patches were posted earlier but never integrated since there was no thorough testing of the patches. The reason for including this in 1.0 is that it would certainly introduce some incompatible file-level changes, and I would ideally like to get them into 1.0 so that we can call the file format "final" afterwards. (Is anything final in this world?? :-))

2) A compaction utility for the database to reduce data file fragmentation.

3) Small configuration changes (like the transaction synchronization threshold mentioned above).

I see very few changes at the API level, and stability has been good so far, so that's not really an issue for 1.0.

The biggest project in there is the compaction utility, I think, and I'm not sure it's a real requirement for a 1.0 release. We would need some feedback from long-running apps to see how much fragmentation is created and how much performance is lost due to that.

alex

From: Cees de G. <cg...@cd...> - 2003-08-04 23:00:21
[note - random blabberings from some guy who hasn't looked at the code for years. Take with grains of salt. YMMV. No warranties.]

On Mon, 2003-08-04 at 21:09, Alex Boisvert wrote:
> Hence, you cannot only backup the main database file. You would risk
> getting an incoherent snapshot of your data.

Or, at least an old snapshot. While the transaction log is being appended to, the db file is usually in a consistent state. A bit of tweaking (keep the txn log growing while you are backing up the db) might give an easy on-line backup tool.

> The biggest project in there is the compaction utility I think, and I'm
> not sure it's a real requirement for a 1.0 release. We would need some
> feedback from long-running apps to see how much fragmentation is created
> and how much performance is lost due to that.

I have used a similar file format in an old application which, to my surprise, ran for almost 10 years while no one ever looked at the compaction tool I supplied with it... So I don't think it's a real requirement: stability and a stable file format are probably more important than compaction. Compaction is not nice anyway; it gets in the way of 24x7-ness. It's probably better to make the allocation algorithms incrementally smarter so that bigger holes are produced in the database, etcetera.

From: <ri...@dr...> - 2003-08-05 06:05:25
Cees de Groot wrote:
> On Mon, 2003-08-04 at 21:09, Alex Boisvert wrote:
>
>> Hence, you cannot only backup the main database file. You would risk
>> getting an incoherent snapshot of your data.
>
> Or, at least an old snapshot. While the transaction log is being
> appended to, the db file is usually in a consistent state. A bit of
> tweaking (keep the txn log growing while you are backing up the db)
> might give an easy on-line backup tool.

That would be great! In other words: when the db is put into backup mode, the tx log is flushed and then grows until backup mode is switched off. During that time the file can be safely copied.

If you can do this, and also make the log a temp file (i.e. I don't want to see it, just put it in /tmp somewhere), then that'd be terrific.

> So I don't think it's a real requirement - stability and a stable file
> format are probably more important than compaction. Compaction is not
> nice anyway, it sits in the way of 24x7-ness - it's probably better to
> make the allocation algorithms incrementally smarter so that bigger
> holes are produced in the database, etcetera.

If the above is implemented, with tx logs that can grow while the db is being used, then the data can be safely transferred into a new file during that time. That'd work for me.

Usually compaction is more useful at the beginning of a production environment, when objects are being created and removed quite frequently. Later on, objects are mostly just created and updated, so compaction doesn't help as much. (This is for CMS-type functionality, and in line with the experiences we have had anyway. YMMV.)

/Rickard

From: <ri...@dr...> - 2003-08-05 06:00:55
Alex Boisvert wrote:
> The log file is not just a temporary file. It may hold data from
> committed transactions that have not yet been synchronized (to disk)
> into the main database file.
>
> Hence, you cannot only backup the main database file. You would risk
> getting an incoherent snapshot of your data.
>
> The transaction log is currently synchronized every 10 transactions
> (specified in TransactionManager.java) but I plan to make that
> configurable via options passed to the RecordManagerFactory.

Would it be possible to add a method to explicitly flush the tx log? That way I could lock the whole thing, flush the tx log, and then do a backup of a single file. That'd be awesome!

> There isn't much planned at the moment between now and version 1.0.
>
> My ideal wish list for 1.0 would be the following:
>
> 1) Key compression in BTree. This has been requested and some patches
> were posted earlier but never integrated since there was no thorough
> testing of the patches. The reason for inclusion of this in 1.0 is that
> it would certainly introduce some incompatible file-level changes and I
> would ideally like to get them in 1.0 so that we can call the file
> format "final" afterwards. (Is anything final in this world?? :-))
>
> 2) Compaction utility for the database to reduce data file fragmentation.
>
> 3) Small configuration changes (like the transaction synchronization
> threshold mentioned above)

Sounds good to me. I have a couple of generic performance improvements I did for Jisp which might work well here as well, such as an unsynchronized block-oriented ByteArrayOutputStream. It comes in handy especially with large output streams.

> I see very little change at the API level and the stability has been
> good so far so it's not really an issue for 1.0.
>
> The biggest project in there is the compaction utility I think, and I'm
> not sure it's a real requirement for a 1.0 release. We would need some
> feedback from long-running apps to see how much fragmentation is created
> and how much performance is lost due to that.

The problems with regard to fragmentation that we have seen with Jisp are:

*) Since it wasn't paged, we got loads of small holes which were never reused (i.e. a small object is created, then removed, then the hole is never reused again). Compaction every month or so typically gave us about 20-30% extra storage. If I remember correctly, JDBM uses paging (i.e. fixed-size blocks instead of a size equal to the output size), so this should be a non-issue here.

*) We store files in Jisp. This means that the db needs to be able to handle that sometimes really small Java objects are stored, and sometimes multimedia movies in the 50Mb-100Mb range are stored. If such a file is removed, then the hole needs to be reused in a smart way (like chunking it up for many small objects).

But the first three priorities for us are:

1) Stability
2) Stability
3) Stability

We are using this in a CMS which is being beaten on quite heavily (our webhotel has 15 customers whose websites are in one database). We've had some (nightmare) crashes with Jisp, and if JDBM can prove to be more stable we would be truly happy :-)

/Rickard

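The unsynchronized block-oriented ByteArrayOutputStream mentioned above is not shown in the thread, but the idea can be sketched. java.io.ByteArrayOutputStream is synchronized and copies its entire backing array each time it grows; a chunked stream instead appends fixed-size blocks and pays for a single copy only when a contiguous array is requested. A hypothetical sketch (class name and chunk size are illustrative, not Rickard's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an unsynchronized, block-oriented output stream: bytes go
// into fixed-size chunks, so growing never copies already-written data.
public class ChunkedOutputStream {
    private static final int CHUNK_SIZE = 8192;

    private final List<byte[]> chunks = new ArrayList<>(); // full chunks
    private byte[] current = new byte[CHUNK_SIZE];
    private int pos = 0; // write position inside 'current'

    public void write(int b) {
        if (pos == CHUNK_SIZE) {
            chunks.add(current);
            current = new byte[CHUNK_SIZE];
            pos = 0;
        }
        current[pos++] = (byte) b;
    }

    public void write(byte[] src, int off, int len) {
        while (len > 0) {
            if (pos == CHUNK_SIZE) {
                chunks.add(current);
                current = new byte[CHUNK_SIZE];
                pos = 0;
            }
            int n = Math.min(len, CHUNK_SIZE - pos);
            System.arraycopy(src, off, current, pos, n);
            pos += n;
            off += n;
            len -= n;
        }
    }

    public int size() {
        return chunks.size() * CHUNK_SIZE + pos;
    }

    // Single final copy when the caller needs a contiguous array.
    public byte[] toByteArray() {
        byte[] out = new byte[size()];
        int dst = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, out, dst, CHUNK_SIZE);
            dst += CHUNK_SIZE;
        }
        System.arraycopy(current, 0, out, dst, pos);
        return out;
    }
}
```

For large serialized records this avoids both monitor acquisition on every write and the repeated reallocation of a single growing array.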
From: Alex B. <boi...@in...> - 2003-08-05 18:03:26
Rickard Öberg wrote:
>
> Would it be possible to add a method to explicitly flush the tx log?
> That way I could lock the whole thing, flush the tx log, and then do
> backup of a single file. That'd be awesome!

Yes, I will look into that.

> I have a couple of generic performance improvements I
> did for Jisp which might work well here as well, such as an
> unsynchronized block-oriented ByteArrayOutputStream. Comes in handy
> especially with large output streams.

Cool. Assuming we can license the code with the JDBM license -- you would still retain copyright -- then I would gladly integrate it.

> The problem with regard to fragmentation that we have seen with Jisp are:
>
> *) Since it wasn't paged we got loads of small holes which were never
> reused (i.e. small object created, then removed, then hole is never
> reused again). Compaction every month or so typically did give us about
> 20-30% extra storage. If I remember correctly JDBM uses paging (i.e.
> fix-sized blocks instead of size equal to output size), so this should
> be a non-issue here.

Yes, that's correct. Physical allocation is page-based.

> But, the first three priorities for us are:
> 1) Stability
> 2) Stability
> 3) Stability
>
> We are using this in a CMS which is being beat on quite heavily (our
> webhotel has 15 customers whose websites are in one database). We've had
> some (nightmare) crashes with Jisp, and if JDBM can prove to be more
> stable we would be truly happy :-)

I will let you be the judge of that.

alex

From: Alex B. <boi...@in...> - 2003-08-07 21:16:27
Rickard,

I've implemented the enhancement you requested. You can now force the transaction log to be synchronized with the database by doing:

    CacheRecordManager cache;
    BaseRecordManager base;
    TransactionManager txMgr;

    synchronized ( cache ) {
        base = (BaseRecordManager) cache.getRecordManager();
        txMgr = base.getTransactionManager();
        txMgr.synchronizeLog();

        // do the backup while in here
        // (synchronization prevents concurrent modification
        // during backup)

        // leave synchronization when finished
    }

Also, you can now set the maximum number of transactions to be kept in the log before it's synchronized with the main database file, like this:

    synchronized ( cache ) {
        base = (BaseRecordManager) cache.getRecordManager();
        txMgr = base.getTransactionManager();
        txMgr.setMaximumTransactionsInLog( 100 );
    }

Keep in mind that the transaction data is currently kept in memory for performance reasons, so it's wise not to pick a number that's too large.

You can grab the latest CVS code on the main web page. I've updated the current ZIP file.

cheers,
alex

Rickard Öberg wrote:
> Alex Boisvert wrote:
>
>> The log file is not just a temporary file. It may hold data from
>> committed transactions that have not yet been synchronized (to disk)
>> into the main database file.
>>
>> Hence, you cannot only backup the main database file. You would risk
>> getting an incoherent snapshot of your data.
>>
>> The transaction log is currently synchronized every 10 transactions
>> (specified in TransactionManager.java) but I plan to make that
>> configurable via options passed to the RecordManagerFactory.
>
> Would it be possible to add a method to explicitly flush the tx log?
> That way I could lock the whole thing, flush the tx log, and then do
> backup of a single file. That'd be awesome!

From: <ri...@dr...> - 2003-08-08 06:01:49
Alex Boisvert wrote:
> I've implemented the enhancement you requested. You can now force the
> transaction log to be synchronized with the database by doing:
>
>     CacheRecordManager cache;
>     BaseRecordManager base;
>     TransactionManager txMgr;
>
>     synchronized ( cache ) {
>         base = (BaseRecordManager) cache.getRecordManager();
>         txMgr = base.getTransactionManager();
>         txMgr.synchronizeLog();
>
>         // do the backup while in here
>         // (synchronization prevents concurrent modification
>         // during backup)
>
>         // leave synchronization when finished
>     }
>
> Also, you can now set the maximum number of transactions to be kept in
> the log before it's synchronized with the main database file, like this:
>
>     synchronized ( cache ) {
>         base = (BaseRecordManager) cache.getRecordManager();
>         txMgr = base.getTransactionManager();
>         txMgr.setMaximumTransactionsInLog( 100 );
>     }

Excellent stuff! Just what I needed :-)

> Keep in mind the transaction data is currently kept in memory for
> performance reasons, so it's wise not to pick a number that's too large.

I think this ties in with the "memory leaks" I saw. Sometimes, especially during defragmentation and db export (where a graph of objects is extracted from the db), each transaction will consist of >100,000 objects, some of which may be in the >1Mb range. On those occasions a setting of >10 will most likely lead to memory problems.

Any ideas on placing the log file in a temp directory? Since I don't have to bother with it now, really, it'd be best if I couldn't even see it normally.

/Rickard

From: <ri...@dr...> - 2003-08-08 06:44:55
Rickard Öberg wrote:
>> Also, you can now set the maximum number of transactions to be kept in
>> the log before it's synchronized with the main database file, like this:
>>
>>     synchronized ( cache ) {
>>         base = (BaseRecordManager) cache.getRecordManager();
>>         txMgr = base.getTransactionManager();
>>         txMgr.setMaximumTransactionsInLog( 100 );
>>     }

Actually, that's only almost what I needed. What I *really* want is online backups, i.e. I want the tx log to grow indefinitely while I'm doing a backup of the db. It seems like this should be possible if I call synch and then set the maxtxinlog to 100000000000000 (or something like that). But I think I'd prefer using -1, i.e. never sync. When I'm done with the backup I'd then re-set it to 10 (or something like that). Would it be ok to introduce such logic?

Would it work? I.e., is it ok to add stuff to the tx log while copying and reading from the main db (because I still want the db to be usable)?

If this works it'd allow for backups that don't interrupt the usage of the db at all. Which would be nice :-)

/Rickard

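The online-backup sequence proposed above can be outlined as follows. The JDBM calls appear only as comments, since synchronizeLog() and setMaximumTransactionsInLog() come from the CVS snapshot discussed in this thread and the -1 (never sync) value is still only a proposal at this point; the runnable part is the plain file copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Rough sketch of the online-backup sequence discussed above. The JDBM
// calls are shown as comments because they depend on the CVS version
// under discussion; the file handling is plain java.nio and runnable.
public class OnlineBackup {

    public static void backup(Path dbFile, Path backupFile) throws IOException {
        // 1) Briefly exclude writers, flush the log into the .db file,
        //    and switch the log to "grow without synchronizing":
        //
        //    synchronized (cache) {
        //        BaseRecordManager base =
        //            (BaseRecordManager) cache.getRecordManager();
        //        TransactionManager txMgr = base.getTransactionManager();
        //        txMgr.synchronizeLog();
        //        txMgr.setMaximumTransactionsInLog(Integer.MAX_VALUE);
        //        // (or -1 for "never", if that proposal is adopted)
        //    }

        // 2) The .db file is now stable: commits go only to the log,
        //    so it can be copied while the database stays in use.
        Files.copy(dbFile, backupFile, StandardCopyOption.REPLACE_EXISTING);

        // 3) Restore the normal synchronization threshold, e.g.
        //    txMgr.setMaximumTransactionsInLog(10);
    }

    public static void main(String[] args) throws IOException {
        Path db = Files.createTempFile("test", ".db");
        Files.write(db, new byte[] { 1, 2, 3 });
        Path bak = Files.createTempFile("test", ".bak");
        backup(db, bak);
        System.out.println("copied " + Files.size(bak) + " bytes");
    }
}
```

As Alex notes later in the thread, step 1 leaves the log growing in memory, so the backup window should be kept as short as practical.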
From: Alex B. <boi...@in...> - 2003-08-08 17:08:25
Rickard Öberg wrote:
>
> Actually, that's only almost what I needed. What I *really* want is
> online backups, i.e. I want the tx log to grow indefinitely while I'm
> doing backup of the db. It seems like this should be possible if I call
> synch and then set the maxtxinlog to 100000000000000 (or something like
> that). But I think I'd prefer using -1, i.e. never sync. When I'm done
> with the backup I'd then re-set it to 10 (or something like that).

Yes, I thought of the -1 (infinite) but figured that Integer.MAX_VALUE would probably work as well in practice.

In any case, I'll add it since it seems cleaner and clearly spells out the intent of the code.

> Would it be ok to introduce such logic? Would it work? i.e. is it ok to
> add stuff to the tx log while copying and reading (because I still want
> the db to be usable) from the main db?

Yes, it would certainly work. The only danger right now is that synchronizeLog() is never called and therefore you end up consuming all your memory.

I guess keeping the log information in memory is not ideal for all use cases. I'll see if I can change it such that it's configurable.

> If this works it'd allow for backups that don't interrupt the usage of
> the db at all. Which would be nice :-)

Yes, I think that's a very important feature.

alex

From: Alex B. <boi...@in...> - 2003-08-08 17:09:37
Rickard Öberg wrote:
>
> Any ideas on placing the log file in a temp directory? Since I don't
> have to bother with it now, really, it'd be best if I can't even see it
> normally.

I'll work on that too.

alex