From: Brian O'N. <Bri...@di...> - 2001-06-26 17:23:56
I agree that the core should be kept small. However, I think it can be made smaller. When I say a simple persistence mechanism can be built from a small core, I mean that the existing functionality can be developed on top of a core implementation that operates simply on byte arrays.

Although RecordManager does operate on arrays or objects, BTree and HTree cannot operate efficiently on byte arrays. If they did, it would help pave the way for other kinds of persistence models. For now, what I would like is an efficient mapping of byte array keys to values. I don't want the overhead of object serialization because I don't need it.

What I see in version 0.11 is three layers: JDBMHashtable -> HTree -> RecordManager. JDBMHashtable offers the most convenience, and most users should start there. More advanced users will look to the BTree or even the RecordManager directly. Things that I'd like to try out:

1: More use of interfaces. For example, with a RecordManager interface, caching layers can be designed as wrappers instead of being built directly into it. In addition, different kinds of RecordManagers could be plugged into the trees.
2: Use of JDK 1.2 collections interfaces where appropriate. You've also mentioned this.
3: A common interface for BTree and HTree.
4: Split the BTree and HTree layers into two: one operates on byte arrays, and the other operates on objects.
5: Explore sharing object references among tree nodes.

As JDBM becomes more popular it will inevitably grow larger. A cleaner, more flexible foundation will make this growth much easier. The current version number is very low, which indicates to me that you haven't yet locked into any designs.

-----Original Message-----
From: Alex Boisvert [mailto:boi...@in...]
Sent: Monday, June 25, 2001 08:54 PM
To: Brian O'Neill
Cc: jdb...@li...
Subject: Re: [Jdbm-general] Sharing objects

Brian O'Neill wrote:
> I think JDBM is a great little persistence engine with tons of potential.
> It's nice to see such a project like this being actively developed. I do see
> some things that can be improved.

I'm certainly open to improvements, either to the existing core or adding extra functionality. Something I've articulated in the past is that I'd like to see the core of JDBM remain small, so that it remains an interesting choice for small projects that only want a simple persistence engine. However, we can bundle a number of optional helpers/utilities around it to make it attractive to those who need a few more features. For example, support for (or emulation of) the Java 2 Collection classes has been discussed here in the past and I still believe it would make a nice addition.

> When I save a key or value into a JDBM table, the object is serialized in
> its own stream. If this object has a reference to another shared object, the
> original object graph is not preserved across hash table entries. The
> approach that JDBM uses to serialize also has space overhead from all the
> stream headers and class info written to each record.

You're right. Currently, the object is serialized in its own stream (along with its complete object graph) and converted into a byte[] for storage in the RecordManager. As you mention, this implies an overhead for each object placed into the hash table. This overhead is roughly 25 bytes for user-defined serialized objects. For strings, it's 7 bytes (including the string length). It's not so bad, but it can be a concern for databases where a large number of small objects are stored.

> I like the simple interfaces provided in the HTree and BTree, but since they
> do nothing special to preserve object graphs, I think an interface that
> operates on byte[] keys and values is more flexible. A simple object
> serialization strategy could be placed above this level, or a more
> sophisticated persistence model could be developed with less storage
> overhead.
Well, from my experience, the 'simple serialization strategy' you mention can become quite complex, depending on what exactly you have in mind. I'm not against the approach. In fact, I'd like to have such a feature if it really is not too complicated and doesn't impact the size of the core too much. If you have ideas in mind, please send them to this list for discussion.

> Separating the HTree and BTree from the RecordManager is really nice. I
> think more levels of layering will make it easier to develop many kinds of
> persistence models. Making object persistence available at the RecordManager
> level might not be the best place for this, since I consider this to be a
> very "high level" function.

Again, the current object persistence in the RecordManager is really just a utility function which serializes the object into a byte[]. If you want more functionality, you build it on top. This goes for any type of service generally found in an OODBMS, such as collection classes, concurrency policy (pessimistic/optimistic locking), transaction models (nesting, rollback ability, ...), etc. I've argued in the past that many of these features should be part of a separate project, in order to keep JDBM small and simple. I guess it depends on the scope of the overall project and whether you want to create a full-blown OODBMS or simply extend JDBM in ways that are compatible with its original objectives.

>
> I'm going to start mucking with JDBM, to see how feasible it is to have
> higher level tree implementations built upon simpler ones. Has there been
> any other talk of this, or have there been any such implementations?

The Exolab folks have built some services on top of JDBM in the past. I remember seeing locking and extended transaction management in the core library. I should check again to see if we could merge some of the code back into JDBM.

cheers,
alex
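The per-record overhead quoted above for strings can be checked directly. The following is a minimal sketch, not part of JDBM: it serializes a String in its own ObjectOutputStream (one stream per stored record, as described in the thread) and compares the size with writing the same String to a plain DataOutputStream. The class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SerializationOverhead {

    // Bytes used when a String is serialized in its own stream,
    // i.e. one ObjectOutputStream per stored record.
    static int serializedSize(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes); // writes the 4-byte stream header
            out.writeObject(s);   // 1-byte string tag, 2-byte length, then the encoded characters
            out.close();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes used when the same String is written to a plain DataOutput.
    static int rawSize(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);      // 2-byte length, then modified UTF-8 characters
            out.close();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String key = "customer-42";
        System.out.println("serialized: " + serializedSize(key));
        System.out.println("writeUTF:   " + rawSize(key));
        System.out.println("raw UTF-8:  " + key.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For the 11-character key this prints 18, 13 and 11. For an ASCII string of n characters the serialized form takes n + 7 bytes (4-byte stream header, 1-byte string tag, 2-byte length), which matches the 7 bytes mentioned above; a bare writeUTF costs n + 2.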
From: Brian O'N. <Bri...@di...> - 2001-06-27 00:04:14
Okay, no raw byte arrays. How about a ByteArray interface that extends Comparable? Implementations can save the hash code in a field so that it doesn't need to be recalculated each time. The ByteArray interface would need to have writeTo/readFrom methods that operate on a plain DataOutput instead of ObjectOutput.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    public interface ByteArray extends Comparable {
        int hashCode();
        boolean equals(Object obj);
        boolean equals(ByteArray array);
        int compareTo(Object obj);
        int compareTo(ByteArray array);
        void writeTo(DataOutput out) throws IOException;
        void readFrom(DataInput in) throws IOException;
    }

Do you really need JDK 1.1.x support? JDK 1.2 has been out for over two years, and I don't think anyone would attempt to use JDBM in an applet.

-----Original Message-----
From: Alex Boisvert [mailto:boi...@in...]
Sent: Tuesday, June 26, 2001 04:47 PM
To: Brian O'Neill
Cc: jdb...@li...
Subject: Re: [Jdbm-general] Sharing objects

Brian O'Neill wrote:
>
> I agree that the core should be kept small. However, I think it can be made
> smaller. When I say a simple persistence mechanism can be built from a small
> core, I mean that the existing functionality can be developed on top of a
> core implementation that operates simply on byte arrays.

It's not going to make the code much smaller, but I agree it would better serve the need for higher-level services since it won't add the unnecessary overhead of serialization. I agree to make this change if we can agree on the issue below.

>
> Although RecordManager does operate on arrays or objects, BTree and HTree
> cannot operate efficiently on byte arrays. If they did, it would help pave
> the way for other kinds of persistence models. For now, what I would like is an
> efficient mapping of byte array keys to values. I don't want the overhead of
> object serialization because I don't need it.

The issue you will run into if you make BTree/HTree run only on byte arrays is: how do you compare (or get the hash code of) byte[] keys?
For BTree, we could order the byte[] keys according to the value of each byte in the array. For HTree, the hash code can be calculated by hashing the value of each byte in the array. In both cases, we have something that is not necessarily optimal but works.

Alternatively, we could also have a mechanism to 'callback' the higher-level service so that comparison/hashing is done at the higher level, but this would lead to converting byte arrays back into objects for comparison/hashing, which might be slower than comparing the byte[] to start with.

> What I see in version 0.11 is three layers. JDBMHashtable -> HTree ->
> RecordManager. JDBMHashtable offers the most convenience, and most users
> should start there. More advanced users will look to the BTree or even the
> RecordManager directly. Things that I'd like to try out:
>
> 1: More use of interfaces. For example, with a RecordManager interface,
> caching layers can be designed as wrappers instead of built directly into
> it. In addition, different kinds of RecordManagers could be plugged into the
> trees.

Ok.

> 2: Use of JDK1.2 collections interfaces where appropriate. You've also
> mentioned this.

Easy to say, but it requires substantial effort. There's also the issue of supporting JDK 1.1.x. Many smaller platforms don't support JDK 1.2 yet.

> 3: Common interface for BTree and HTree.

For put() and get() it makes sense. But BTrees are ordered and therefore offer ordered traversal, which isn't provided by HTree.

> 4: Split BTree and HTree layers into two. One operates on byte arrays, and
> the other operates on objects.

Ok, if we agree on the issue discussed above.

> 5: Explore sharing object references among tree nodes.

That's where the real fun begins! :)

alex
--
Alex Boisvert   boi...@in...
Project Manager, Intalio Inc.   www.intalio.com
Operate at the Process Level <SM>   (650) 345 2777
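Item 1 (caching built as a wrapper around a RecordManager interface rather than into it) can be sketched as a decorator. This is a minimal, hypothetical example: ByteRecordManager, MemoryRecordManager and CachingRecordManager are invented names, the byte[]-only method set is an assumption rather than JDBM's actual RecordManager API, and the code uses generics and LinkedHashMap, so it targets a modern JDK rather than the JDK 1.1/1.2 platforms discussed in the thread.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical byte-level record manager interface (not JDBM's actual API).
interface ByteRecordManager {
    long insert(byte[] data);
    byte[] fetch(long recid);
    void update(long recid, byte[] data);
    void delete(long recid);
}

// Trivial in-memory store standing in for a file-backed implementation.
class MemoryRecordManager implements ByteRecordManager {
    private final Map<Long, byte[]> records = new HashMap<>();
    private long nextId = 1;
    int fetchCount = 0; // number of fetches that reached this store

    public long insert(byte[] data) {
        long id = nextId++;
        records.put(id, data.clone());
        return id;
    }
    public byte[] fetch(long recid) {
        fetchCount++;
        byte[] d = records.get(recid);
        return d == null ? null : d.clone();
    }
    public void update(long recid, byte[] data) { records.put(recid, data.clone()); }
    public void delete(long recid) { records.remove(recid); }
}

// Caching layer written as a wrapper around any ByteRecordManager.
class CachingRecordManager implements ByteRecordManager {
    private final ByteRecordManager delegate;
    private final Map<Long, byte[]> cache;

    CachingRecordManager(ByteRecordManager delegate, final int maxEntries) {
        this.delegate = delegate;
        // Access-ordered LinkedHashMap: evicts the least recently used entry.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public long insert(byte[] data) {
        long id = delegate.insert(data);
        cache.put(id, data.clone());
        return id;
    }
    public byte[] fetch(long recid) {
        byte[] cached = cache.get(recid);
        if (cached == null) {                 // miss: go to the wrapped store
            cached = delegate.fetch(recid);
            if (cached != null) {
                cache.put(recid, cached);
            }
        }
        return cached == null ? null : cached.clone();
    }
    public void update(long recid, byte[] data) {
        delegate.update(recid, data);         // write-through
        cache.put(recid, data.clone());
    }
    public void delete(long recid) {
        delegate.delete(recid);
        cache.remove(recid);
    }
}
```

Because the cache depends only on the interface, it could wrap any record manager implementation, and a tree written against the interface would not need to know whether caching is present.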
From: Alex B. <boi...@in...> - 2001-06-27 00:27:07
Brian O'Neill wrote:
>
> Okay, no raw byte arrays. How about a ByteArray interface that extends
> Comparable? Implementations can save the hash code in a field so that it
> doesn't need to be recalculated each time. The ByteArray interface would
> need to have writeTo/readFrom methods that operate on a plain DataOutput
> instead of ObjectOutput.
>
>     import java.io.DataInput;
>     import java.io.DataOutput;
>     import java.io.IOException;
>
>     public interface ByteArray extends Comparable {
>         int hashCode();
>         boolean equals(Object obj);
>         boolean equals(ByteArray array);
>         int compareTo(Object obj);
>         int compareTo(ByteArray array);
>         void writeTo(DataOutput out) throws IOException;
>         void readFrom(DataInput in) throws IOException;
>     }

Not sure it works. First, you would have to serialize the byte array in order to maintain the hash code. Second, how would the BTree recreate a ByteArray object in order to compare a persistent object with one being inserted?

I would go with an approach like this:

    // Used with HTree
    interface HashCalculator {
        int hash( byte[] data );
    }

    // Used with BTree
    interface Comparator {
        int compare( byte[] data1, byte[] data2 );
    }

Both interfaces would plug into their respective data structures to handle callbacks. We could provide default implementations for both, which do what I explained in the previous email.

I'm not too worried about caching the hash code, since keys are typically small (or at least should be). If they're large, the calculation time should still be insignificant compared to the I/O cost.

>
> Do you really need JDK1.1.x support? JDK1.2 has been out for over two years,
> and I don't think anyone would attempt to use JDBM in an applet.
>

Does anybody on the list want to keep JDK 1.1.x compatibility? If not, we'll move to JDK 1.2 and above.

alex
--
Alex Boisvert   boi...@in...
Project Manager, Intalio Inc.   www.intalio.com
Operate at the Process Level <SM>   (650) 345 2777
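The HashCalculator/Comparator callbacks, with the default behaviour described earlier in the thread (order by the value of each byte, hash over every byte), might look like the sketch below. It is illustrative only: the comparator interface is renamed ByteComparator here to avoid confusion with java.util.Comparator, ByteKeyDefaults is a hypothetical holder class, and treating bytes as unsigned (0..255) is an assumption, since the emails do not say whether byte values are compared signed or unsigned.

```java
// Callback interfaces as proposed in the email; the comparator is
// renamed ByteComparator to avoid confusion with java.util.Comparator.
interface HashCalculator {
    int hash(byte[] data);
}

interface ByteComparator {
    int compare(byte[] data1, byte[] data2);
}

// Hypothetical default implementations (illustrative only).
final class ByteKeyDefaults {
    private ByteKeyDefaults() {}

    // Polynomial hash over every byte; same formula as java.util.Arrays.hashCode(byte[]).
    static final HashCalculator HASH = new HashCalculator() {
        public int hash(byte[] data) {
            int h = 1;
            for (int i = 0; i < data.length; i++) {
                h = 31 * h + data[i];
            }
            return h;
        }
    };

    // Lexicographic order, treating each byte as unsigned (0..255).
    // A key that is a prefix of a longer key sorts first.
    static final ByteComparator LEXICOGRAPHIC = new ByteComparator() {
        public int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int x = a[i] & 0xFF;
                int y = b[i] & 0xFF;
                if (x != y) {
                    return x - y;
                }
            }
            return a.length - b.length;
        }
    };
}
```

Nothing is cached here, in line with the argument that hashing small keys costs little next to I/O. Unsigned comparison makes 0x80 sort after 0x7F, which matches the order of the stored bytes read as unsigned values; a signed variant would simply drop the `& 0xFF` masks.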
From: Alex B. <boi...@in...> - 2001-06-26 23:47:42
Brian O'Neill wrote:
>
> I agree that the core should be kept small. However, I think it can be made
> smaller. When I say a simple persistence mechanism can be built from a small
> core, I mean that the existing functionality can be developed on top of a
> core implementation that operates simply on byte arrays.

It's not going to make the code much smaller, but I agree it would better serve the need for higher-level services since it won't add the unnecessary overhead of serialization. I agree to make this change if we can agree on the issue below.

>
> Although RecordManager does operate on arrays or objects, BTree and HTree
> cannot operate efficiently on byte arrays. If they did, it would help pave
> the way for other kinds of persistence models. For now, what I would like is an
> efficient mapping of byte array keys to values. I don't want the overhead of
> object serialization because I don't need it.

The issue you will run into if you make BTree/HTree run only on byte arrays is: how do you compare (or get the hash code of) byte[] keys?

For BTree, we could order the byte[] keys according to the value of each byte in the array. For HTree, the hash code can be calculated by hashing the value of each byte in the array. In both cases, we have something that is not necessarily optimal but works.

Alternatively, we could also have a mechanism to 'callback' the higher-level service so that comparison/hashing is done at the higher level, but this would lead to converting byte arrays back into objects for comparison/hashing, which might be slower than comparing the byte[] to start with.

> What I see in version 0.11 is three layers. JDBMHashtable -> HTree ->
> RecordManager. JDBMHashtable offers the most convenience, and most users
> should start there. More advanced users will look to the BTree or even the
> RecordManager directly. Things that I'd like to try out:
>
> 1: More use of interfaces. For example, with a RecordManager interface,
> caching layers can be designed as wrappers instead of built directly into
> it. In addition, different kinds of RecordManagers could be plugged into the
> trees.

Ok.

> 2: Use of JDK1.2 collections interfaces where appropriate. You've also
> mentioned this.

Easy to say, but it requires substantial effort. There's also the issue of supporting JDK 1.1.x. Many smaller platforms don't support JDK 1.2 yet.

> 3: Common interface for BTree and HTree.

For put() and get() it makes sense. But BTrees are ordered and therefore offer ordered traversal, which isn't provided by HTree.

> 4: Split BTree and HTree layers into two. One operates on byte arrays, and
> the other operates on objects.

Ok, if we agree on the issue discussed above.

> 5: Explore sharing object references among tree nodes.

That's where the real fun begins! :)

alex
--
Alex Boisvert   boi...@in...
Project Manager, Intalio Inc.   www.intalio.com
Operate at the Process Level <SM>   (650) 345 2777
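The reply to item 3 (a common interface for put() and get(), with ordered traversal available only on the BTree side) suggests a split like the following. This is a hypothetical sketch: the interface and class names are invented, and java.util.HashMap and TreeMap stand in for HTree and BTree purely to show the shape of the API, not how either tree is implemented.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical common interface: the operations both trees share.
interface KeyValueIndex<K, V> {
    void put(K key, V value);
    V get(K key);
    V remove(K key);
}

// Only an ordered (BTree-style) index adds in-order traversal.
interface OrderedIndex<K, V> extends KeyValueIndex<K, V> {
    Iterator<K> keysInOrder();
}

// In-memory stand-in for a hashed index (plays the role of HTree).
class HashIndex<K, V> implements KeyValueIndex<K, V> {
    private final Map<K, V> map = new HashMap<>();
    public void put(K key, V value) { map.put(key, value); }
    public V get(K key) { return map.get(key); }
    public V remove(K key) { return map.remove(key); }
}

// In-memory stand-in for an ordered index (plays the role of BTree).
class SortedIndex<K extends Comparable<K>, V> implements OrderedIndex<K, V> {
    private final TreeMap<K, V> map = new TreeMap<>();
    public void put(K key, V value) { map.put(key, value); }
    public V get(K key) { return map.get(key); }
    public V remove(K key) { return map.remove(key); }
    public Iterator<K> keysInOrder() { return map.keySet().iterator(); }
}
```

Code written against KeyValueIndex could use either structure, while code that needs sorted iteration would ask for OrderedIndex explicitly, so the HTree never has to pretend to support ordering.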