Currently, the BPage is its own serializer. As I'm fiddling with the BPage serialization (I'm working on key compression right now), I'm finding that I'm having to add functionality to the BPage that really should be factored out into a separate class...
I think that it is time to refactor the BPage serialization into an inner class, or its own standalone class. This change should be backwards compatible (the current BTree implementation doesn't serialize the BPage serializer; it just creates a new one each time the BTree is used).
With the addition of key compression, it becomes necessary to store the state of the BPage serializer...
I could just store compressor information in the BTree state, but it feels kludgy to me.
If no one objects, or has any concerns about this, I'll start refactoring...
Comments? Any preference for inner class vs regular class?
I would also like to be able to extend the BTree implementation. I have use cases where I would like to extend a BTree to expose additional methods and data, rather than needing to look up one of my application objects, get the BTree's recid, fetch the BTree, and then resolve the key to find the value. I have not looked into this further yet, but it seems like the current factory methods are designed to discourage this. Is that true?
I don't know if the factory methods were explicitly selected to discourage extension of BTree... They are more likely a reflection of the needs of the design.
Basically, you have to be able to retrieve a BTree and its BPages from recman and set their recids. You can't do this with a constructor, so you have to use a factory.
The factory, as you've pointed out, does make it difficult to sub-class. If this does become a need, then we would need to implement a Provider for the BTree (similar to what was done with the record manager a while back).
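To make the constructor-vs-factory point concrete, here is a rough sketch. ToyStore and ToyTree are made-up stand-ins, not the real recman or BTree classes, and the method names are only assumptions for illustration. The point is that load() has to hand back an object that came out of the store and then patch in its recid, which a constructor cannot do. The private constructor is also what gets in the way of sub-classing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a record manager: assigns recids and stores objects. */
final class ToyStore {
    private final Map<Long, Object> records = new HashMap<>();
    private long nextRecid = 1;

    long insert(Object obj) {
        long recid = nextRecid++;
        records.put(recid, obj);
        return recid;
    }

    Object fetch(long recid) {
        return records.get(recid);
    }
}

/** Hypothetical tree whose recid is only known once the store has assigned it. */
final class ToyTree {
    private ToyStore store; // not part of the persistent state
    private long recid;     // not part of the persistent state

    private ToyTree() {
        // Private: callers must go through the factory methods below.
    }

    /** Creates a new tree, inserts it into the store, and records its recid. */
    static ToyTree createInstance(ToyStore store) {
        ToyTree tree = new ToyTree();
        tree.store = store;
        tree.recid = store.insert(tree);
        return tree;
    }

    /** Fetches an existing tree from the store and re-attaches store and recid. */
    static ToyTree load(ToyStore store, long recid) {
        ToyTree tree = (ToyTree) store.fetch(recid);
        if (tree == null) {
            throw new IllegalArgumentException("No tree stored under recid " + recid);
        }
        tree.store = store;
        tree.recid = recid;
        return tree;
    }

    long getRecid() {
        return recid;
    }
}
```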
After reading the code while adding support for key compression, I am actually going to retract my question about moving the BPage serializer into its own class. The design of the BTree requires that the serializer that is storing a given BPage be the PARENT of that BPage. It's better to leave that alone, methinks.
1. Any reason you aren't just holding on to the BTree object instead of continuously looking it up? That's what I'm doing in my app (I store all of my indexes in an array in a static variable), and I want to make sure I'm not doing something potentially dangerous.... Could a BTree get corrupted by having multiple instances in memory??? hmmmm... Might be time to get that weak reference cache working.
2. I think it would be a good idea to expose more of the BTree's state via protected get() methods. That way you could add your additional behavior to a wrapper class that has the same package...
I do hold onto BTree references, but the object that holds onto those references is itself a persistent object, so I need to get my "index" object, and then resolve the BTree used to realize the index. I could get rid of that additional fetch if there were a mechanism for subclassing the BTree.
How about using a lazy initialization reference?
Hold on to the recid of the BTree. When the reference is resolved, look in an internal cache, if the object is there already, return it. If not, then do the lookup and return the result. When persisting the reference, just store the recid.
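Something along these lines, as a minimal sketch only. LazyRef, the shared cache map, and the loader callback are all hypothetical names; the loader stands in for whatever actually fetches the BTree by recid. It uses generics and lambdas, so it assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical lazy reference: only the recid would be persisted. */
final class LazyRef<T> {
    private final long recid;             // the only state to store when persisting
    private final Map<Long, T> cache;     // internal cache shared between references
    private final LongFunction<T> loader; // stand-in for the real lookup by recid

    LazyRef(long recid, Map<Long, T> cache, LongFunction<T> loader) {
        this.recid = recid;
        this.cache = cache;
        this.loader = loader;
    }

    /** The value to write out when the owning object is persisted. */
    long recid() {
        return recid;
    }

    /** Returns the cached object if present; otherwise does the lookup and caches it. */
    T get() {
        T value = cache.get(recid);
        if (value == null) {
            value = loader.apply(recid);
            cache.put(recid, value);
        }
        return value;
    }
}
```

The owning object keeps a LazyRef field instead of a direct BTree field, so it pays for the lookup only the first time the tree is actually used.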
That said, doing a lookup for the BTree by recid is very fast. It's almost certainly going to be in the MRU cache, so it will return quite quickly...
I already do the latter. However, given the access patterns, it is relatively likely that the BTree will not be in the cache.
How many different trees do you have? Is it unreasonable to hold them all in memory at the same time?
If you can, then let's just get the weak reference cache working and be done with it...
The number of BTrees is not bounded. There can be N trees per object. The best way to think of this for my application is to consider how many secondary indices you can have in a database system: one per column (or combination of columns) per table.
Yes, let's get the weak reference cache running, and let's integrate a hash table that does not require Longs while we are at it.
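As a starting point for discussion, here is a rough sketch of a weak reference cache keyed by recid. WeakCache is a made-up class, not existing code in the project, and it assumes Java 8 or later. Note that it still boxes keys as Long in a HashMap, so it only covers the first half of the request; a primitive-long hash table would have to replace the map.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that does not prevent its values from being collected. */
final class WeakCache<V> {

    /** Weak reference that remembers its key so stale entries can be removed. */
    private static final class Entry<V> extends WeakReference<V> {
        final long recid;

        Entry(long recid, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.recid = recid;
        }
    }

    private final Map<Long, Entry<V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Returns the cached object, or null if absent or already collected. */
    synchronized V get(long recid) {
        purge();
        Entry<V> entry = map.get(recid);
        return entry == null ? null : entry.get();
    }

    synchronized void put(long recid, V value) {
        purge();
        map.put(recid, new Entry<>(recid, value, queue));
    }

    /** Drops map entries whose referents the garbage collector has cleared. */
    @SuppressWarnings("unchecked")
    private void purge() {
        Entry<V> stale;
        while ((stale = (Entry<V>) queue.poll()) != null) {
            // Remove only if the map still holds this exact entry,
            // so a newer value stored under the same recid survives.
            map.remove(stale.recid, stale);
        }
    }
}
```

As long as the application holds a strong reference to a BTree, get() returns that same instance, which avoids having two copies of one tree in memory. Once nothing else references it, the entry is free to disappear.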
I'd be happy to work on this with you if you want to suggest an approach.
I've upgraded your membership to "developer". You should have full CVS access now. I hope this helps collaboration with Kevin, and other developers.