If I understand B-trees correctly in the abstract, not only are they random access, but the nodes also form a _sorted_ linked list for sequential list.
Are there any examples of indexing into a B-tree's sequential list? For instance, let's say I populated a B-tree with 5000 items. I am interested in pulling out the 4000th item. How would I use JDBM to do that?
B-trees are efficient for lookup based on a key value because they order the entries and use a form of binary-search algorithm to quickly narrow on the correct entry. However, B-trees do not maintain any index position for entries, making them inefficient for such index-based lookup.
Currently, there isn't any efficient data structure in JDBM to support index-based lookups (and still be efficient for insert/remove operations). You could use a B-tree browser to iterate until the Nth entry, but that wouldn't be efficient at all.
How large is you dataset? You could simply use an ArrayList to store recids of objects. This ArrayList can be persisted in JDBM quite easily. Or, for more compact serialization, you could use a long for the recids. An array with 5000 objects would occupy roughly 40K (5000 * 8 bytes) on disk -- which is still manageable to update during a single transaction.
I'm curious, what kind of data do you have that requires index-based lookups?
Thanks for the prompt reply :-)
I was thinking of using JDBM to be the data model behind a Java JTable. At first, the table is displayed AS-IS (unsorted) to the user, but the user can select a column to sort the data by.
I chose B+tree because, for unsorted data, I could insert data into the table using the row # as the key for future retrieval. However, when it comes to sorting, I mimic it by treating the row number as an index into the B+ browser. Thereby, if I wanted to flip between unsorted and sorted data, it would be easy.
I hope that explains it.
If JDBM doesn't support this, I do have a suggestion. Maybe you could add an internal attribute to the B+tree nodes that reflect the least index that is under the particular node. This way, a function such as BTree.browseFromIndex() could traverse down to that index and return a Tuple starting at index I.
It doesn't make complete sense, sorry. Or maybe I don't understand correctly.
If you can use the row index as a key, then you can use BTree.find( key ) to directly obtain the object. But then, you can not insert/delete entries easily because you'll either have to renumber all of the keys or you'll create "holes" in the indexes.
About your suggestion to maintain indexes on the BTree node, this would seriously impact the performance because of the need to update multiple nodes each time a node is inserted (or deleted, depending on how you do it). I don't think it's a viable solution. We should look into implementing a more adequate data structure if this is something we want to achieve.
As for the "holes" in the indexes, there can be none since the table is read only. I do not allow any further insert/delete once the table is loaded.
I was looking for a data structure that has random access as well as direct indexing. I thought B+trees had that capability.