Thread: [Bigdata-developers] BTree key mismatch questions

There seems to be a bug: KeyDecoder objects cannot decode BTree keys that
were built with the JDK collator. The JDK CollationKey contains zero bytes,
which the KeyDecoder wrongly assumes are separators between the sections of
the BTree key. What is the best way to fix this?
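
One possible fix, sketched here under stated assumptions and not taken from
the Bigdata code, is to escape zero bytes inside each key component instead
of treating every 0x00 as a separator. In this hypothetical encoding a
literal 0x00 is written as 0x00 0xFF, and each component ends with 0x00 0x01.
Assuming keys are compared as unsigned byte strings, this keeps the order of
encoded keys consistent with component-by-component order, which a simple
length prefix would not. The names EscapedKeyCodec, encode, decode and
naiveSplit are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical order-preserving escape for key components that may contain
// 0x00 bytes (for example JDK collation keys). Not the Bigdata KeyDecoder API.
// Encoding: literal 0x00 -> 0x00 0xFF; end of component -> 0x00 0x01.
final class EscapedKeyCodec {
    private EscapedKeyCodec() {}

    static byte[] encode(List<byte[]> components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] component : components) {
            for (byte b : component) {
                out.write(b);
                if (b == 0) {
                    out.write(0xFF); // marks this zero as data, not a separator
                }
            }
            out.write(0x00); // component terminator
            out.write(0x01);
        }
        return out.toByteArray();
    }

    static List<byte[]> decode(byte[] key) {
        List<byte[]> components = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (int i = 0; i < key.length; i++) {
            if (key[i] != 0) {
                current.write(key[i]);
                continue;
            }
            if (i + 1 >= key.length) {
                throw new IllegalArgumentException("truncated escape at offset " + i);
            }
            int marker = key[++i] & 0xFF;
            if (marker == 0xFF) {
                current.write(0); // escaped literal zero
            } else if (marker == 0x01) {
                components.add(current.toByteArray()); // end of component
                current.reset();
            } else {
                throw new IllegalArgumentException("bad escape byte " + marker);
            }
        }
        if (current.size() != 0) {
            throw new IllegalArgumentException("unterminated final component");
        }
        return components;
    }

    // The failure mode reported above: every 0x00 is treated as a separator.
    static List<byte[]> naiveSplit(byte[] key) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= key.length; i++) {
            if (i == key.length || key[i] == 0) {
                parts.add(Arrays.copyOfRange(key, start, i));
                start = i + 1;
            }
        }
        return parts;
    }
}

final class KeyEscapeDemo {
    public static void main(String[] args) {
        byte[] sortKey = Collator.getInstance(Locale.US).getCollationKey("abc").toByteArray();
        boolean hasZero = false;
        for (byte b : sortKey) {
            hasZero |= (b == 0);
        }
        System.out.println("JDK sort key contains 0x00: " + hasZero);
        byte[] key = EscapedKeyCodec.encode(
                Arrays.asList(sortKey, "x".getBytes(StandardCharsets.UTF_8)));
        // The escaped key always decodes back into exactly two components.
        System.out.println("components after decode: " + EscapedKeyCodec.decode(key).size());
    }
}
```

The cost is one extra byte per embedded zero plus a two-byte terminator per
component.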

The bug was detected when the icu4j jar was inadvertently left off the
classpath in a test environment. Choosing the default key encoding based on
whether a class can be loaded is undesirable, because two different virtual
machines could then use different key encodings on the same BTree. Would you
object to always using ICU as the default for now (and possibly adding an
explicit configuration option in the future)?
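
A minimal sketch of the proposed default, assuming a hypothetical property
name (example.btree.keyEncoding) and a KeyEncoding enum that are not part of
Bigdata: ICU is used unless the configuration says otherwise, and a missing
icu4j jar causes an immediate failure instead of a silent switch to the JDK
collator. The classpath check is passed in as a Predicate so that the
missing-jar case can be exercised in tests.

```java
import java.util.Properties;
import java.util.function.Predicate;

// Hypothetical sketch, not Bigdata's API: the key encoding is chosen by an
// explicit property (default ICU) instead of by probing the classpath.
enum KeyEncoding { ICU, JDK }

final class KeyEncodingConfig {
    // Hypothetical property name, for illustration only.
    static final String PROPERTY = "example.btree.keyEncoding";
    // A real ICU4J class; if it cannot be loaded, ICU keys cannot be built.
    static final String ICU_MARKER_CLASS = "com.ibm.icu.text.Collator";

    private KeyEncodingConfig() {}

    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, KeyEncodingConfig.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    static KeyEncoding resolve(Properties props, Predicate<String> classAvailable) {
        KeyEncoding enc = KeyEncoding.valueOf(
                props.getProperty(PROPERTY, KeyEncoding.ICU.name()));
        if (enc == KeyEncoding.ICU && !classAvailable.test(ICU_MARKER_CLASS)) {
            // Fail fast rather than fall back to the JDK collator, which would
            // make this VM write keys in a different encoding than other VMs.
            throw new IllegalStateException(
                    "ICU key encoding configured but icu4j is not on the classpath");
        }
        return enc;
    }

    static KeyEncoding resolve(Properties props) {
        return resolve(props, KeyEncodingConfig::isLoadable);
    }
}
```

Failing fast means a misconfigured VM refuses to use the index rather than
writing keys that other VMs cannot decode.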

What is the role of locale/language and collation in BTree keys? Does the
language choice affect RDF data, the SAIL, the lexicon, or SPARQL queries? The
choice of language appears to be left to the default system properties, which
can also vary from VM to VM. Would you object to always using en_US as the
default for now (and possibly adding an explicit configuration option in the
future)?
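
Pinning the locale could look like the following sketch, which builds the key
collator from an explicit Locale.US and an explicit strength rather than from
Locale.getDefault(). The default locale is derived from JVM system properties
such as user.language and user.country. KeyCollators and the choice of
TERTIARY strength are assumptions for illustration, and java.text.Collator
stands in for whichever collator (JDK or ICU) is configured.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper: the collator used for BTree keys is built from an
// explicit locale and strength, never from the VM's default locale.
final class KeyCollators {
    static final Locale DEFAULT_KEY_LOCALE = Locale.US; // en_US, as proposed
    static final int DEFAULT_STRENGTH = Collator.TERTIARY; // assumption; pinned too

    private KeyCollators() {}

    static Collator forKeys(Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator;
    }

    static Collator forKeys() {
        return forKeys(DEFAULT_KEY_LOCALE, DEFAULT_STRENGTH);
    }

    // Sort-key bytes for a string, as they would be appended to a BTree key.
    static byte[] sortKey(Collator collator, String s) {
        return collator.getCollationKey(s).toByteArray();
    }
}
```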

There seems to be a fairly odd relationship among the three major classes in
BTree key handling: Schema, KeyBuilder, and KeyDecoder. Architecturally, it
would seem that the Schema object for a BTree should contain all the
information needed to create and manipulate keys for that BTree, rather than
having that information distributed over many classes. Would you object to
Schema objects being the sole constructors of KeyBuilders and KeyDecoders
(replacing DefaultKeyBuilderFactory)?
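
The proposal can be sketched as a schema that is the only factory for the
builder and decoder of its keys, so that the two can never disagree about the
key layout. This is a hypothetical, simplified design (LongKeySchema,
newKeyBuilder and newKeyDecoder are invented names, and every field is a
signed long), not the existing Schema, KeyBuilder or KeyDecoder classes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the schema is the only place that knows the key layout
// and the only factory for builders and decoders. All fields are signed longs.
final class LongKeySchema {
    private final List<String> fieldNames;

    LongKeySchema(String... fieldNames) {
        this.fieldNames = Collections.unmodifiableList(Arrays.asList(fieldNames.clone()));
    }

    List<String> fieldNames() { return fieldNames; }

    Builder newKeyBuilder() { return new Builder(); }

    Decoder newKeyDecoder() { return new Decoder(); }

    /** Appends fields in schema order; only the schema can create one. */
    final class Builder {
        private final ByteBuffer buf = ByteBuffer.allocate(8 * fieldNames.size());

        private Builder() {}

        Builder append(long value) {
            // Flip the sign bit so unsigned byte order matches signed long order.
            buf.putLong(value ^ Long.MIN_VALUE);
            return this;
        }

        byte[] build() {
            if (buf.position() != buf.capacity()) {
                throw new IllegalStateException("expected " + fieldNames.size() + " fields");
            }
            return buf.array().clone();
        }
    }

    /** Decodes keys produced by this schema's builders. */
    final class Decoder {
        private Decoder() {}

        long[] decode(byte[] key) {
            if (key.length != 8 * fieldNames.size()) {
                throw new IllegalArgumentException("key does not match schema");
            }
            ByteBuffer buf = ByteBuffer.wrap(key);
            long[] values = new long[fieldNames.size()];
            for (int i = 0; i < values.length; i++) {
                values[i] = buf.getLong() ^ Long.MIN_VALUE; // undo the sign flip
            }
            return values;
        }
    }
}
```

Because Builder and Decoder have private constructors, client code can only
obtain them from a schema, which is what replacing DefaultKeyBuilderFactory
is meant to achieve under this proposal.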

If that is right, then it seems that the Schema object ought to be a property
of the BTree that can be obtained from the BTree itself [BTree.getSchema()],
and that clients should construct Schema objects only when creating a BTree.
Does that make sense?
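
A sketch of that arrangement, assuming the index keeps a small metadata
record (modelled here as java.util.Properties) alongside its data: the
schema, including key encoding and locale, is supplied only to create(),
written into that record, and read back by open(). SchemaOwningIndex,
KeySchemaDescriptor and the schema.* property keys are hypothetical.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.Properties;

// Hypothetical sketch, not Bigdata's BTree: the key schema is fixed at
// creation, stored in the index metadata, and read back on every open.
final class KeySchemaDescriptor {
    final String keyEncoding; // e.g. "ICU" (illustrative value)
    final String localeTag;   // BCP 47 tag, e.g. "en-US"

    KeySchemaDescriptor(String keyEncoding, Locale locale) {
        this(keyEncoding, locale.toLanguageTag());
    }

    private KeySchemaDescriptor(String keyEncoding, String localeTag) {
        this.keyEncoding = Objects.requireNonNull(keyEncoding);
        this.localeTag = Objects.requireNonNull(localeTag);
    }

    void writeTo(Properties metadata) {
        metadata.setProperty("schema.keyEncoding", keyEncoding);
        metadata.setProperty("schema.locale", localeTag);
    }

    static KeySchemaDescriptor readFrom(Properties metadata) {
        String enc = metadata.getProperty("schema.keyEncoding");
        String tag = metadata.getProperty("schema.locale");
        if (enc == null || tag == null) {
            throw new IllegalStateException("index metadata has no key schema");
        }
        return new KeySchemaDescriptor(enc, tag);
    }

    Locale locale() { return Locale.forLanguageTag(localeTag); }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeySchemaDescriptor)) {
            return false;
        }
        KeySchemaDescriptor other = (KeySchemaDescriptor) o;
        return keyEncoding.equals(other.keyEncoding) && localeTag.equals(other.localeTag);
    }

    @Override
    public int hashCode() { return Objects.hash(keyEncoding, localeTag); }
}

final class SchemaOwningIndex {
    private final KeySchemaDescriptor schema;

    private SchemaOwningIndex(KeySchemaDescriptor schema) {
        this.schema = schema;
    }

    /** The only place a client supplies a schema. */
    static SchemaOwningIndex create(Properties metadata, KeySchemaDescriptor schema) {
        if (metadata.getProperty("schema.keyEncoding") != null) {
            throw new IllegalStateException("index already exists");
        }
        schema.writeTo(metadata);
        return new SchemaOwningIndex(schema);
    }

    /** Re-opening never takes a schema from the caller. */
    static SchemaOwningIndex open(Properties metadata) {
        return new SchemaOwningIndex(KeySchemaDescriptor.readFrom(metadata));
    }

    KeySchemaDescriptor getSchema() { return schema; }
}
```

A second VM that opens the same index therefore gets the encoding and locale
recorded at creation time, regardless of its own default locale.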

Fred
