From: <ch...@op...> - 2004-10-21 19:05:53
Ok, I've spent some good time with the interfaces the Refractions crew has thrown up, and have more thoughts and concerns than I could reasonably organize and put down in one email. So I think I'm going to mostly address the main issue I have with them.

But before I begin, something positive - I really like many of the motivations for these changes. I think our API is in need of another evolutionary leap forward, especially simplified creation, a better starting point for file-based feature sources, better grid coverage integration, integration of FeatureResults/FeatureCollection/FeatureIterator/FeatureReader, and random access and spatial index queries. I support upping fid mapping to the public API, and would actually like to take it further and extend the mapping idea to all objects, not just feature ids. I want most all of these as well, and I'd like to be sure that some sort of joins are actually working, instead of mere ideas thrown into the API that might work.

What I really don't like is this Catalog coming smashing into our API. I've taken a good bit of time to figure out why I feel this way; I went back through the catalog specs, looked at the GeoAPI interfaces, studied the new interfaces, and talked to David a bit. It basically comes down to two things:

1.) I think the catalog specs are not very good specifications.
2.) I feel the way they are being used in the interfaces isn't very close to the intention of the specs.

For the first point, how many catalog implementations do you guys know of? And compare that to how many WFS and WMS implementations you know of. The only catalog implementation I know of is Ionic's, and if you look at their catalog product it's based much more on some other XML technology. I forget what it is at the moment, and I'm off-line so I can't look it up, but it's another web services standard put out by a different organization. I've talked to people in the OGC, and most of them agree that catalogs need to improve.
I admit that this isn't a direct criticism of the catalog spec, but it is an indication of how useful other people have found it. As for my own personal opinion, the catalog specs themselves do not strike me as 'good' specifications. WFS and WMS lay out exactly what they want to do, and provide all the details to do it generically. The catalog specs either come across as incredibly abstract, talking about vague concepts, or incredibly detailed - 'this is how you implement our spec with z39.50' - which comes off to me as 'we wrote a z39.50 catalog, and now we're going to write up how it works'.

The fact that there are multiple specs is definitely a point of confusion, as there are at least 3 substantially different specifications - the abstract spec, 1.*, and 2.0 - and people seem to just select whichever one props up their point (which I could easily do to argue for pages in support of my second point, but which I will attempt to hold off doing). The specs also seem to suffer from trying to prop themselves up by referring to other OGC documents, but in a very superficial way. An example:

'The catalog entry consists of an aggregation of metadata attributes, at least one of which describes the "footprint" of the data referenced. Thus, a catalog entry meets the fundamental definition of a feature. For this reason, the Catalog Entry class realizes the Feature interface, that is, it supports all interface protocols defined on Feature. Since the catalog entries are sub-types of feature, their aggregation, the Catalog, is a sub-type of feature collection. Thus, the Catalog realizes the interface for Feature Collection.'

Now this statement is of course true, but anything with a geographic attribute is a Feature. They go on to suggest that one way to implement a robust catalog is thus to use OGC-compliant feature data stores.
This point actually lends support in defense of what I'm against in the second point - it says that you could have metadata be a feature. But it talks not at all about how one would go about doing that; they just mention that the models may be similar, which for me confuses the issue even more.

Ok, I'm going to stop talking about why I don't like the catalog spec(s), unless anyone feels it necessary to call me on some of this stuff, since I'm not going incredibly in depth, and one could easily level the criticism that I don't like it because I don't understand it. To which I could reply with a quote I recently heard: if you read a spec and don't understand it, don't worry about it - it's most likely that no one else understood it either, and so it's not going to be important. But needless to say, I think the catalog spec(s) is/are bad spec(s).

What this does beg is an examination of how closely GeoTools needs to follow OGC specs. I personally feel we have no obligation to at all. They are not paying us (at one point they were, but even then there was no prerogative to use their specs in our interfaces). We originally used the OGC specs as inspiration for our interfaces because they are very good specs. When reading them it is obvious that a lot of very smart people with a lot of experience have thought through these issues long and hard. We could bootstrap on their knowledge, and focus on implementation instead of abstract notions. And we'd gain in clarity of our interfaces, since anyone who knew the OGC specs would much more easily understand where we were coming from and going. I think this has benefited us enormously. But I feel very strongly that we should not be dogmatic in our use of OGC specs. If they put out bad specs, there's no reason to incorporate them into our interfaces, to blindly follow where they lead. This does of course beg the question of GeoAPI.
I must admit that I am less excited about GeoAPI than I initially was, mostly due to the fact that deegree seems to have dropped off the map. I saw it as a coming together of open source projects, not as having our interfaces voted on by the OGC. My feeling now is we should make use of them, but only where their interfaces are substantially better than ours, where the cost of rewriting is worth the gain. This is obviously just my opinion, and is open for discussion. I very much support Martin's work in GeoAPI, and feel good about using it for the lower-level referencing and coordinate transformation stuff that he's always worked with. The geometry stuff seems like it could be good, allowing us to plug in different implementations. But when we get into datastores and even feature models I'm more hesitant. And I'm hesitant to borrow a Catalog interface that hasn't been tested as far as I know (and is drawn from, in my opinion, bad spec(s)), and attempt to fit it to our needs.

Which leads me to my second point, but before we get there, one last thing on the use of OGC specs. I encourage us to evaluate OGC specs for how they can help us, for whether the interfaces one derives from them are useful for understanding and simplifying things. Open source has the ability to choose from the best out there, and when the best out there isn't sufficient, to work in a community to come up with a better way. We've got a lot of smart people here (who are often distracted with other priorities, understandably, since none of us are paid to work directly on GeoTools), and a lot of people who care about this project. I think we just need to come together and move our architecture forward another step - albeit a bit more slowly than the last major changes, as people do seem to have more commitments, but I think we can move it forward.

Before I move fully into point 2, one more sub-point: the new interfaces completely break backwards compatibility, and not even by just a little.
2.1 has already changed things enough that I can't plug many datastores from 2.1 into my 2.0-based GeoServer (which I'm not incredibly psyched about). This catalog change would require me to rewrite large chunks of code.

Ok, onto point 2. The use of the interfaces is not in line with the intent of the specifications. I will concede that the specs can be argued in support of their use, as the specs make all sorts of broad claims about data access. But what the Catalog spec is actually _useful_ for is implementing search and discovery of geographic resources by their metadata. And by metadata I do not mean the FeatureType. A FeatureType _is_ a form of metadata - it is data about data - but its corollary in the web world is a GML application schema, which is _not_ what catalog services search on. Catalog services search on information about the data: metadata of the type represented by FGDC or ISO 19115, metadata as detailed in Martin's metadata object, or, from the catalog spec:

'The catalog object includes metadata (information like who, what, why, when, where and how) and search engines that let users identify holdings of interest. Catalogs describe and reference content found in storage collections and in other catalogs.'

Metadata is additional data about the actual FeatureType, or rather the full FeatureCollection, made up of the FeatureType and the features. A Catalog is made to query those records. In practice it may return a WFS or WMS link, or many times it will be just a website or someone to contact to get the data. In theory I agree it should be more closely linked, but it still should just be a reference, and not the data itself (despite the spec(s)' confusing bits about how metadata can be a feature, which I think is not worth following at all). I actually am not completely against a Catalog construct in GeoTools, but I am against it being tied up in the datastores, in the actual source of the data.
What I think we are looking for right now is a common way to access grid coverages and feature sources - in OGC terms, a common way to access WFS and WCS. In the OGC world the catalog spec is _not_ the answer to this problem. It is just a way to input a bunch of search terms and get a reference to WFSes and WCSes (or other formats of data repositories), based on data about the holdings. It receives search queries and returns the records. In the OGC world WFS and WCS do not implement Catalog, and I don't think our interfaces should either. A WFS will provide a small bit of meta information - what is held in the FeatureTypeInfo construct of GeoServer - but that's just for catalog servers to crawl and refer to; it's not to be queried directly.

Using the metadata and catalog interfaces can be justified, as the spec is written so vaguely, but it really just confuses the API a lot more. DataStore no longer looks remotely like a WFS, which was its inspiration - getFeatures, transaction, getFeatureType, etc. It just refers to weird CatalogEntries that you have to dig into to start to get your features. Using a query operation and getting a QueryResult? And then I'm supposed to iterate through that to get my CatalogEntries? Cast those to FeatureTypeEntries? Then call getFeatureSource on that? Then all I get back is a FeatureIterator; if I want the bounds later I'm going to have to pass both the FeatureIterator and the FeatureSource around, or else iterate through completely whenever I want the bounds. I know I'm probably misrepresenting things, but the point is I've read the specs and worked with this stuff extensively and it still doesn't make sense to me. And maybe I'm just holding onto the old, but I don't think so - I've been all for changes in the past. And this is ignoring that this stuff is not going to be backwards compatible.

This all said, I do think there may be room somewhere in GeoTools for a Catalog construct.
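To make the complaint above concrete, here is a hypothetical sketch contrasting the two access styles. None of these types or signatures are the real proposed GeoTools interfaces - DataStore, Catalog, CatalogEntry, FeatureTypeEntry and friends are stand-ins invented purely to count the hops - and features are faked as strings.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch, not real GeoTools code: it only contrasts the
 * number of hops in the two access styles discussed above. Every type
 * and method name here is a stand-in invented for illustration.
 */
public class AccessStyles {

    // Old, WFS-inspired DataStore style: one call gets you to features.
    interface DataStore {
        FeatureSource getFeatureSource(String typeName);
    }

    interface FeatureSource {
        List<String> getFeatures(); // feature contents stand in as strings
    }

    // Proposed catalog style: query, iterate, cast, then fetch.
    interface Catalog {
        Iterator<CatalogEntry> query(String terms);
    }

    interface CatalogEntry { }

    interface FeatureTypeEntry extends CatalogEntry {
        FeatureSource getFeatureSource();
    }

    static final FeatureSource ROADS = () -> List.of("road1", "road2");

    /** Old style: ask the DataStore directly. */
    static List<String> direct() {
        DataStore store = typeName -> ROADS;
        return store.getFeatureSource("roads").getFeatures();
    }

    /** Catalog style: same data, but via query-result hops and a cast. */
    static List<String> viaCatalog() {
        Catalog catalog = terms ->
            List.<CatalogEntry>of((FeatureTypeEntry) () -> ROADS).iterator();
        CatalogEntry entry = catalog.query("roads").next();
        FeatureTypeEntry typed = (FeatureTypeEntry) entry; // cast required
        return typed.getFeatureSource().getFeatures();
    }

    public static void main(String[] args) {
        System.out.println("direct:      " + direct());
        System.out.println("via catalog: " + viaCatalog());
    }
}
```

Both paths end at the same features; the catalog path just adds a query, an iteration, and an unchecked cast before you get there.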
But it should be focused on metadata, like that defined in Martin's metadata package. It should be able to archive lots of FGDC/ISO/Dublin Core/etc. metadata, and provide search functionality for it. I actually implemented a proto version of this for z39.50, using the great Lucene toolkit. It would be useful if we could have that construct in GeoTools, and have it refer to FeatureSources, to provide search and discovery functionality. But it should not be all up in our DataStores and GridCoverages. It should be the type of thing where I could easily plug servlet hooks into it and fairly seamlessly implement the Catalog 2.0 HTTP part of the spec, or z39.50 and implement that part of the spec. With DataStores and catalogs as they are proposed, that would not be the case.

So can we get rid of all the Catalog references? And allow David to rewrite his stuff without having to refer to them at all? It actually scared me when, just a couple of days ago, I found out that AbstractDataStore implements Catalog. If we do want a construct that can register and look up DataStores, I think we should use the DataRepository interface and make it suit our needs. To some extent I feel that DataRepository should maybe extend DataStore, or vice versa - that it's just a source of data, one that is more decoupled from the actual data format. I think that is the direction we want to head, and I don't think fitting into Catalog is the way to do this.

This is what Rob A is also interested in: defining FeatureTypes that are no longer coupled with the back-end format. You could map columns into sub-fields, or define new names for the columns. Basically, you tell the DataStore how you'd like to view the data - give it the instructions to make the Schema (FeatureType) to your specifications, based on a number of mappings/joins/etc. from the back-end format.
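The simplest case of that mapping idea - renaming back-end columns into the names you want in your view schema - can be sketched in a few lines. This is not Rob A's design or any existing GeoTools code; the class and method names are invented, and real sub-field mappings and joins would need much more machinery.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the column-mapping idea above: tell a data
 * source how you'd like to view its columns. Only renames are shown;
 * this is not an existing GeoTools API.
 */
public class SchemaMapping {

    /**
     * Project a back-end row into a view, keeping only the mapped
     * columns and exposing them under their view names.
     */
    static Map<String, Object> rename(Map<String, Object> row,
                                      Map<String, String> mapping) {
        Map<String, Object> view = new LinkedHashMap<>();
        // mapping: back-end column name -> desired view name
        mapping.forEach((backendCol, viewName) ->
            view.put(viewName, row.get(backendCol)));
        return view;
    }
}
```

So a row with a `rd_name` column could be presented as a FeatureType with a `name` attribute, without the back-end format ever changing.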
You have your DataRepository, which is your range of possibilities, and then you can define the application schemas you want to derive from it. Ok, I'm getting ahead of myself; I really should save those thoughts for future emails, as I think we've got a number of requirements, and this is going to end up more work than the effort that brought us to DataStore from DataSource. In this email I'd just like to convince everyone once and for all that a Catalog in GeoTools should be about metadata, as defined in Martin's MetaData interface, not some derived definition squeezed into it.

So the problem we do need to tackle is the 'metadata' that defines a source of features or coverages. This is not covered in any OGC spec, and that's because the OGC works in a web-based world, while what we're dealing with here is files and databases. We need a way to register and find the source of data based on parameters specified by users. We need to simplify this so it works across raster and vector representations. I personally don't have super strong feelings on this; I have yet to be fully convinced that a map is not fine (though I will listen to others for sure). Perhaps a way out is to look to JDBC for inspiration, where the URL really is just a map of key/value pairs: the URL prefix would specify the GeoTools data type, and the kvps the values needed. I don't really know; all I'm saying is I think a better solution can be reached if we don't constrain ourselves to follow some fairly random interfaces.

We have some very major problems to solve - random access, joins, complex mappings, uniting raster and vector access - and some minor ones that are worth cleaning up - FeatureCollection/reader/results/iterator confusion, the high and low api, AbstractDataStore, etc. I don't feel a Catalog interface helps us with really any of those, and its presence in the proposed interfaces only obscures the good work that might actually be done in them (which I will have more to say about in future emails).
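The JDBC-inspired idea above could look something like the following sketch: a connection string whose prefix names the data source type and whose query part carries the key/value pairs. The `geotools:` URL syntax is entirely invented for illustration; nothing in GeoTools parses strings like this.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a JDBC-style connection string for data
 * source lookup. The "geotools:type?key=value&..." syntax is invented
 * here for illustration and is not an existing GeoTools convention.
 */
public class ConnectionUrl {

    /** Parse e.g. "geotools:shapefile?url=file:/data/roads.shp". */
    static Map<String, String> parse(String url) {
        Map<String, String> params = new HashMap<>();
        String afterScheme = url.split(":", 2)[1];       // drop "geotools"
        String[] typeAndQuery = afterScheme.split("\\?", 2);
        params.put("dstype", typeAndQuery[0]);            // data source type
        if (typeAndQuery.length > 1) {
            for (String kvp : typeAndQuery[1].split("&")) {
                String[] kv = kvp.split("=", 2);          // key=value pair
                params.put(kv[0], kv.length > 1 ? kv[1] : "");
            }
        }
        return params;
    }
}
```

A registry could then hand `dstype` to the matching factory and pass the remaining map along as its creation parameters, much as JDBC's DriverManager dispatches on the URL prefix.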
I do admit that some sort of greater structure is needed in GeoTools, to register a number of DataSources, but I'd feel better about doing it through DataRepository than bending the definition of Catalog and Metadata.

Thoughts? What do others think about getting rid of this catalog stuff?

Best regards,

Chris

----------------------------------------------------------
This mail sent through IMP: https://webmail.limegroup.com/