Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.
I'm actually using the C# port of this library, but the two seem to be basically identical. Both also using three lookup dictionarys - the nodeMap which relates integer IDs to the actual node instance, and two symmetrical lookups for getting items from IDs.
All of these collections seem somewhat unnecessary (in fact, the nodeMap field has a TODO comment saying it needs removing). I've tried removing the dependency on these myself but had no luck (ending up with a corrupt tree). I can't say I understand the whole R tree implementation unfortunately!
Has anyone had any luck removing these collections?
I've started using the RTree recently, and I encountered roughly the same situation you describe. But I'm using the Java version and I only see one of these three maps, that being nodeMap.
It seemed to me that removing nodeMap would be too hard for an amateur, because the tree uses it heavily. The funny thing, though, is that I actually needed those two other maps you mention. I need to get an item's ID in order to remove it from the tree, and I need to get an item by ID in order to look it up in the tree. I satisfied the first requirement by simply cramming an ID field into my item class-that wasn't a problem. But you can't cram an item field into an integral ID, so to meet the second requirement, I needed an explicit map.
I thought at first that I could make the tree itself act as that map. I understand that one reason why RTree uses integers as its indexed items, rather than Objects of whatever type, is that that makes its operations especially fast. But why index useless integers very quickly, only to spend that saved time on the maintenance of a separate map from integers to useful Objects? I figured it'd make more sense to hunt through the tree code, replacing integers with objects of a generic type (thus rendering the RTree friendly and type-safe, too!), than to add a separate map. …But when I tried it, I learned that the ID's the tree stores are used in strange ways that I could not easily understand, so after some banging of my head against the metaphorical wall, and lots of unit tests failures, I gave up and added a Trove hash map (of the same type RTree uses internally).
(But the C# version of the library wouldn't use Trove, I guess.)
I don't mean to complain about how hard the tree is to understand-it works fine for me now, and that's all I need from it! :)
The original intention was to use a tree structure rather than a map, as that would reduce memory usage. Frankly I never bothered doing this work, because I found performance and memory usage to be more than acceptable for the original application that this was written for.
With a map, a fair proportion of the memory is overhead (it might use up to twice the space that an array would use). However this overhead can be reduced simply by having fewer nodes (obviously the number of entries in each node would be larger). My tests showed that the number of entries in a node can be made surprisingly large with little effect on performance (100+ entries per node is entirely feasible).
If you have a use case for which replacing the node map with a proper tree would make a significant difference, I'd be interested to see it.
The reason that the rtree stores integers and not objects, is that the original application used column oriented storage, i.e. for n rectangle objects, each containing m fields, the application stored m arrays of length n (mostly containing primitives). The integers stored in the rtree were the indexes into these arrays. This saves significant memory for large values of n.
Hope this makes things clearer…
Well, I wasn't trying to replace the node map with a new tree-I was trying to remove the need for the node map, by letting the RTree store and report the objects that I put into it, rather than arbitrary integers. I guess that would make searches for objects at least a little bit faster. (I'm not very concerned with memory usage.) But the Trove hash map is very fast, so I think things are probably fine as they are.