From: Ian S. <Ian...@ar...> - 2003-06-09 17:36:54
Hi,

Pickle now exceeds shapefile performance for both reading and writing. The main method of PickleDataSource contains a shapefile vs. pickle performance benchmark. The unit test "PickleTest" contains some examples of usage and an example of limitations (see point 4 below). I also committed some documentation for anyone interested.

The implementation process brought several issues/ideas to light:

1) AttributeType and FeatureType should be constrained by the factory pattern (given that Feature follows this pattern). For example, Features is a class which provides the factory pattern. Instead of delegating Feature creation responsibility to a FeatureFactory, make the FeatureType responsible:

    AttributeType att = Features.createAttribute(name,clazz);
    ...
    FeatureType schema = Features.createSchema(name,atts);
    Feature f = schema.createFeature(values);

2) AttributeTypes should allow more policy, perhaps even through specific subtypes, so that they can carry more metadata and validation policy. Metadata can include units, editing policy, persistence delegate, etc. Validation includes range validation, enumeration validation, or, for references to other Features, type validation, etc.

3) It would make sense to allow FeatureCollections to be immutable, to enforce typed membership (i.e. Features must have the same schema), etc. Given the flexibility and logic of the Collections framework, the FeatureCollections framework should be modeled on that work. For instance, FeatureCollection should be the base interface. There could exist FeatureList and FeatureSet with the respective semantics.

4) The biggest challenge in the Feature API will be the use/management of "shared" objects. If FeatureType A includes a FeatureType B attribute, and instances A1 and A2 both refer to the same instance B1, this is important information and should be preserved. Right now, the pickle module ignores the reference identities of objects and saves them as a "value". So if every Feature contained a reference to the same Date object, then after writing and reading back, each Feature would contain its own Date object. This is obviously bad in terms of memory usage and is semantically lossy.

Food for IRC thought.

Ian
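To make point 4 concrete, here is a minimal sketch of the difference between saving a shared object by reference and saving it as a "value". It does not use the pickle module or the Feature API. It assumes plain java.io object serialization as a stand-in, and the Holder class and SharedReferenceDemo are hypothetical names invented for the illustration. Writing both holders into one object stream lets the stream record the shared Date once, while writing each holder into its own stream behaves like the value-only case described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

public class SharedReferenceDemo {

    /** Hypothetical stand-in for a Feature with a single attribute value. */
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        final Date stamp;

        Holder(Date stamp) {
            this.stamp = stamp;
        }
    }

    /** Serializes all holders into a single object stream. */
    static byte[] write(Holder... holders) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Holder h : holders) {
                out.writeObject(h);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back 'count' holders from a single object stream. */
    static Holder[] read(byte[] data, int count) {
        Holder[] result = new Holder[count];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result[i] = (Holder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    /** One stream for both holders: the shared Date is written once and referenced afterwards. */
    static Holder[] roundTripTogether(Holder a, Holder b) {
        return read(write(a, b), 2);
    }

    /** One stream per holder: each copy gets its own Date, as in the value-only case. */
    static Holder[] roundTripSeparately(Holder a, Holder b) {
        return new Holder[] { read(write(a), 1)[0], read(write(b), 1)[0] };
    }

    public static void main(String[] args) {
        Date shared = new Date(0L);
        Holder a1 = new Holder(shared);
        Holder a2 = new Holder(shared);

        Holder[] together = roundTripTogether(a1, a2);
        Holder[] separate = roundTripSeparately(a1, a2);

        System.out.println("one stream, same Date instance: "
                + (together[0].stamp == together[1].stamp));
        System.out.println("separate streams, same Date instance: "
                + (separate[0].stamp == separate[1].stamp));
        System.out.println("separate streams, equal Date values: "
                + separate[0].stamp.equals(separate[1].stamp));
    }
}
```

In the separate-stream case the two dates still compare equal with equals(), but they are no longer the same object, which is the semantic loss described in point 4.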