From: Bryan T. <br...@sy...> - 2010-07-15 21:58:56
Fred,
Answers below.
> bigdata/src/java/com/bigdata/btree/DefaultTupleSerializer.java:
> return new DefaultKeyBuilderFactory(new Properties());
^ The actual collator behavior will be captured by the IndexMetadata (the tuple serializer is saved as part of the IndexMetadata). You might add a warning to the constructor.
> bigdata/src/java/com/bigdata/btree/keys/KeyBuilder.java: return
> new DefaultKeyBuilderFactory(null/* properties */)
^ This is the specified behavior - it uses whatever is set in System.properties and otherwise defaults. You might add a warning to the constructor.
What matters in these first two cases is that we apply the collator configuration specified for the particular index or triple store when it is created. Otherwise, these constructor forms could allow an unintended collator configuration to be inherited and made persistent as part of an index, triple store, etc. The critical case for the triple store is handled explicitly in LexiconRelation on line 644 (below), which uses the properties used to create the AbstractTripleStore to set up the collator for the TERM2ID index. This is the only triple store index which can have Unicode data in the key, and hence the only one for which the collator configuration can have any impact. All of the other indices have non-text data in their keys.
    protected IndexMetadata getTerm2IdIndexMetadata(final String name) {
        final IndexMetadata metadata = newIndexMetadata(name);
        metadata.setTupleSerializer(new Term2IdTupleSerializer(getProperties()));
        return metadata;
    }
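To illustrate the inheritance problem in the first two cases, here is a minimal sketch that uses only the JDK. It is not the DefaultKeyBuilderFactory code: the class name, the resolve() helper and the property name are all hypothetical. It only shows the general pattern, where an empty or null Properties object silently picks up whatever the JVM default happens to be.

```java
import java.util.Locale;
import java.util.Properties;

public class LocaleResolution {

    // Hypothetical property name, for illustration only (not a bigdata option).
    static final String LANGUAGE = "example.collator.language";

    /**
     * Returns the locale named in the properties, or the JVM default locale
     * when the properties are null or do not specify one.
     */
    static Locale resolve(final Properties properties) {
        final String lang = properties == null
                ? null
                : properties.getProperty(LANGUAGE);
        return lang != null ? Locale.forLanguageTag(lang) : Locale.getDefault();
    }

    public static void main(final String[] args) {
        final Properties explicit = new Properties();
        explicit.setProperty(LANGUAGE, "de");

        // Explicit configuration: independent of the machine's locale.
        System.out.println(resolve(explicit));

        // Empty or null properties: whatever this JVM defaults to.
        System.out.println(resolve(new Properties()));
        System.out.println(resolve(null));
    }
}
```

If a locale resolved in the second or third way were made persistent with an index, the index would carry the locale of whichever machine created it.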
> bigdata/src/java/com/bigdata/btree/NOPTupleSerializer.java:
> new DefaultKeyBuilderFactory(new Properties()));
^ This should have no impact. The tuple serializer is not used in this case.
> bigdata/src/java/com/bigdata/journal/Name2Addr.java:
> new DefaultKeyBuilderFactory(new Properties())));
^ This one is worth doing something about. The role of Name2Addr is to map from index names to the address of a checkpoint record for the named index. The journals in the federation really should all behave consistently in this regard; otherwise bizarre errors could creep in with different collators on different nodes. E.g., you could have two index names which one node believed to be distinct while another node was using a collator which did not capture the distinction. For this, I think that the only "fix" is to put the collator configuration into the service configuration so that the data services and the metadata service all share it (all services can share it, but these are the ones which could cause a problem).
Can you file an issue on this?
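To make the Name2Addr concern concrete, here is a minimal sketch using only java.text.Collator from the JDK, not the bigdata key builder. The class name and the two index names are made up for illustration. It assumes two nodes configured with different collator strengths for the same locale.

```java
import java.text.Collator;
import java.util.Locale;

public class IndexNameCollation {

    /** Builds an English collator with the given strength. */
    static Collator collator(final int strength) {
        final Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(strength);
        return c;
    }

    /** Returns true if the collator treats the two names as different keys. */
    static boolean distinct(final Collator collator, final String a,
            final String b) {
        return collator.compare(a, b) != 0;
    }

    public static void main(final String[] args) {
        // Hypothetical index names that differ only by case.
        final String a = "myIndex";
        final String b = "MYINDEX";

        // "Node 1": IDENTICAL strength keeps the case difference.
        System.out.println(distinct(collator(Collator.IDENTICAL), a, b));

        // "Node 2": PRIMARY strength ignores the case difference.
        System.out.println(distinct(collator(Collator.PRIMARY), a, b));
    }
}
```

The first node sees two index names, while the second sees one, which is the kind of disagreement a shared service-level collator configuration would rule out.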
Thanks,
Bryan
> -----Original Message-----
> From: Fred Oliver [mailto:fko...@gm...]
> Sent: Thursday, July 15, 2010 5:39 PM
> To: Bryan Thompson
> Cc: Bigdata Developers
> Subject: Re: [Bigdata-developers] BTree key mismatch questions
>
> On Thu, Jul 15, 2010 at 4:29 PM, Bryan Thompson
> <br...@sy...> wrote:
> > Fred,
> >
> >> Yes, please add that locale to the sample configuration files. I
> >> think it should be made reasonably obvious that the locale of the
> >> machine and the locale of the data need not be related.
> >
> > If you don't mind, can you apply and test the edit? If you
> > look in the configuration files (bigdataStandalone.config,
> > bigdataCluster.config, bigdataCluster16.config), you will see
> > the following line in each file. It is part of the section
> > where we declare the properties that will be applied to
> > the triple store created by the batch job:
> >
> > new NV(BigdataSail.Options.COLLATOR,"ASCII"),
> >
> > You should be able to just specify additional properties
> > right there to override the locale, collator, etc.
> > BigdataSail.Options inherits options which include
> > KeyBuilder.Options, so all of the options should be
> > accessible in the BigdataSail.Options namespace. You can
> > also explicitly reference them in the KeyBuilder.Options
> > namespace if you feel that is clearer (but make sure to
> > import that namespace at the top of the configuration file).
>
> OK. That covers a few of the cases (maybe the important
> ones). But there are DefaultKeyBuilderFactories created with
> empty or null properties objects:
>
> bigdata/src/java/com/bigdata/btree/DefaultTupleSerializer.java:
> return new DefaultKeyBuilderFactory(new Properties());
> bigdata/src/java/com/bigdata/btree/keys/KeyBuilder.java: return
> new DefaultKeyBuilderFactory(null/* properties */)
> bigdata/src/java/com/bigdata/btree/NOPTupleSerializer.java:
> new DefaultKeyBuilderFactory(new Properties()));
> bigdata/src/java/com/bigdata/journal/Name2Addr.java:
> new DefaultKeyBuilderFactory(new Properties())));
>
> What are the consequences of unfortunate collators or locales
> in these places?
>
> Fred
>