From: Cameron S. <ca...@bi...> - 2002-01-24 20:42:23
Summary:
========
Discussion about design of Datasources in Geotools 2.
Ray will be putting together a UML description (in Argo) collating the ideas (to come).

Next IRC meeting: 0930 Tuesday 29 January 2002.
server: irc.openface.ca
port: 6667
channel: #mapbuilder
--------------------------------------------------------
IRC Logs:
========
--> You are now talking on #mapbuilder
<cameron> with you in a tick, just putting liam to bed.
<jago> ok
--- jago is now known as james
<cameron> I'm back. Shall we wait for Ray?
<james> yes, he has just come online judging by messenger
<james> ok, he should be here in a sec
--> Ray (a...@m7...) has joined #mapbuilder
<Ray> mornin
<james> mornin ray
<cameron> evening
<james> looks like it's just the three of us
<Ray> how's the weather
<cameron> That is fine.
<cameron> Weather is coolish - southerly buster has just come through after a few hot days.
<Ray> grand
<james> Miss it ray?
<Ray> I still have flashbacks
<james> I suspect you always will
<cameron> Discussion for today: Data Sources? Anything else?
<james> File structure for GT2, getting the source repository set up
<james> Oh and a new web site, but I'll come to that later
<cameron> Ray, what would you like to contribute?
<james> But let's start with Data Sources
<Ray> Right oh.
<Ray> DataSources:
<Ray> The basic idea is to make an extensible loading system at one end, and an index-driven data repository at the other
<cameron> 1st, thanks Ray for your ideas. I had to restructure my idea of a data source after reading your design.
<Ray> The first proposal was a little brief
<Ray> No worries.
<cameron> I'd like to know what you mean by indexes.
<Ray> I was going to reply to your mail, but I got to rethinking my own design again after reading it
<cameron> hee, hee, hee.
<Ray> ...it's always the way.
<cameron> I felt the same.
<Ray> But there were a couple of ideas thrown into the mix - if we swish these around fer a while, I can throw together a fuller proposal.
<Ray> Indexes :
<cameron> Good idea.
<Ray> If you have a lot of data which needs to be accessed quickly, it's best to sort it so that searching algorithms can be effective
<cameron> OK, got that.
<Ray> So the Index interface was just that. It's an array of references to mapshapes (stored in a DataSource).
<cameron> Are the indexes spatial or textual?
<Ray> Doesn't matter
<Ray> I've been thinking in the abstract
<cameron> So an index is an object unto itself?
<Ray> all it sees are objects - sorting can be accomplished using Comparator objects
<Ray> The index is the interface that the Viewer would see when it wants to know what mapshapes to display
<Ray> note: for "mapshapes", read "geometry" or "geopolygon" or whatever
<james> Need to watch for when the data provider can do a better job of indexing
<Ray> all the viewer wants to know is what the mapshapes are, and sometimes to perform searches on the data
<cameron> So could an index object implement a DataSource?
<Ray> so, an Index is built up around a DataSource
<james> and there may be several indexes
<Ray> Indexes are fairly atomic.
<Ray> There should be multiple Indexes built up around a single DataSource
<Ray> Like building indexes in a database table
<james> I guess they could be a SortedSet from collections
<Ray> yeah
<Ray> Or would use one
<james> Well yes.
<Ray> There could also be Indexes which filter
<Ray> Indexes could be tied together
<Ray> like, a FilteredIndex built around a SortedIndex
<Ray> Actually - that should be the other way round
<james> nope
<cameron> OK, I think I understand now.
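A minimal Java sketch of the index idea Ray describes above: an index is an ordered array of references to rows held elsewhere, sorted with a Comparator, and several indexes can sit on one data source. The class name SortedIndexSketch, its methods, and the sample values are hypothetical and are not Geotools code.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: an index is an ordered array of references to rows owned by the data source. */
class SortedIndexSketch {
    private final Object[][] rows;

    SortedIndexSketch(Object[][] sourceRows, Comparator<Object[]> order) {
        // Shallow copy: the index holds the same row objects, only in a different order.
        rows = sourceRows.clone();
        Arrays.sort(rows, order);
    }

    int size() { return rows.length; }

    Object[] row(int position) { return rows[position]; }

    public static void main(String[] args) {
        // Column 0 stands in for the mapshape, column 1 for an attribute (made-up values).
        Object[][] table = {
            {"shapeA", 30},
            {"shapeB", 10},
            {"shapeC", 20},
        };
        // Two independent indexes over the same rows, like two indexes on one database table.
        SortedIndexSketch byValue = new SortedIndexSketch(
            table, Comparator.comparing((Object[] r) -> (Integer) r[1]));
        SortedIndexSketch byName = new SortedIndexSketch(
            table, Comparator.comparing((Object[] r) -> (String) r[0]));
        for (int i = 0; i < byValue.size(); i++) {
            System.out.println(byValue.row(i)[0] + " / " + byName.row(i)[0]);
        }
    }
}
```

Because only references are reordered, building a second index costs one array of pointers rather than a copy of the data, which is presumably why several indexes per source are cheap enough to consider.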
<james> Imagine filtering for level of detail on a size attribute
<Ray> So - starting at the DataSource :-
<james> you would want to sort sizes first
<Ray> Yes, like that:
<Ray> So: Starting at the DataSource:
<Ray> It contains a 2d array of objects
<Ray> Rows and columns
<Ray> It has a single array of Object[] arrays
<Ray> The first single array is for rows
<Ray> Each offset in the Object[] is a column
<Ray> The first offset in each row is a MapShape, the rest are arbitrary String or Integer values (X, Y, POPULATION)
<Ray> An Index may be built by getting the DataSource's Default Index
<Ray> And building an array of references to each row in the Index using whatever algorithm (sorting or whatever)
<cameron> My next question is whether indexes should be applied privately within a datasource, or whether the index is accessible and changeable for a datasource.
<cameron> What is the use case which requires an index?
<Ray> They're there for speed. They are useful for faster filtering, too.
<james> Selection by attribute
<james> Level of detail rendering
<Ray> Getting the functionality in at the bottom level opens a few doors for expansion and extensibility, etc.
<james> Building classifications on attributes
<james> actually, classifications is a big use
<james> I do a lot of sorting for that already
<Ray> At the moment, filtering is done on the fly on each repaint, which can get a bit slow
<james> very very slow in one application I have
<Ray> A FilteredIndex would attach itself as a FilterChangeListener on the given filter and call its Rebuild() method each time the filter changes
<cameron> OK, so rendering speed is a problem - I didn't realise that.
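A rough Java sketch of the FilteredIndex behaviour described above, where the index registers itself as a listener on a filter and rebuilds only when the filter changes, so repaints can read a ready-made row list. FilterChangeListener and FilteredIndex are names from the discussion, but the signatures, the SizeFilter class, and the rebuildCount field are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener interface; the discussion names it but does not define it. */
interface FilterChangeListener {
    void filterChanged();
}

/** Hypothetical filter on column 1 (a size attribute) that notifies listeners when it changes. */
class SizeFilter {
    private int minSize;
    private final List<FilterChangeListener> listeners = new ArrayList<>();

    SizeFilter(int minSize) { this.minSize = minSize; }

    void addListener(FilterChangeListener listener) { listeners.add(listener); }

    void setMinSize(int newMinSize) {
        minSize = newMinSize;
        for (FilterChangeListener listener : listeners) {
            listener.filterChanged();
        }
    }

    boolean accepts(Object[] row) { return (Integer) row[1] >= minSize; }
}

/** Holds references to the rows that pass the filter; rebuilt only on filter change. */
class FilteredIndex implements FilterChangeListener {
    private final Object[][] source;
    private final SizeFilter filter;
    private List<Object[]> visible = new ArrayList<>();
    int rebuildCount = 0; // exposed only so the sketch can show when rebuilds happen

    FilteredIndex(Object[][] source, SizeFilter filter) {
        this.source = source;
        this.filter = filter;
        filter.addListener(this);
        rebuild();
    }

    @Override
    public void filterChanged() { rebuild(); }

    private void rebuild() {
        List<Object[]> passed = new ArrayList<>();
        for (Object[] row : source) {
            if (filter.accepts(row)) {
                passed.add(row);
            }
        }
        visible = passed;
        rebuildCount++;
    }

    /** A repaint would read this list without running the filter again. */
    List<Object[]> rows() { return visible; }
}
```

The cost of filtering moves from every repaint to every filter change, which matches Ray's remark that the rebuild should cost about one repaint under the current system.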
<james> It has a large number of cities in it, and needs to work out which ones to draw at what zoom level
<Ray> The rebuild should go as fast as a single repaint under the current system, other repaints would go faster
<james> As long as we are not indexing columns that we won't need, yes
<Ray> Nope - indexes would have to be built specifically for each application
<Ray> ...or applied for each application
<james> Now we need to be careful about the name here
<cameron> So does this mean that a filter needs to know about data types in a data source?
<james> What you call a data source, I was calling a FeatureTable
<Ray> Yep
<cameron> I guess we need some definitions.
<james> just a sec
<james> Ian has arrived in his office
<Ray> Although I felt that the FeatureTable/DataSource may need to be extended for, say, complex data sources (like an in-memory compressed file)
<Ray> which Ian?
<james> Turton
<Ray> ah
<james> I'll just grab him
<james> 1min
<cameron> While we wait, I think "Index" is not a good object name. Might want to try "DataSourceIndex" or similar.
<james> OK, both of us are here
<Ray> Sure. Although my fingers do get tired quite easily.
<james> Ian> sounds better with a full name
<cameron> Oh, fine for the typing - comment only applies to the final design.
<Ray> right oh
<james> now, confusion on datasource vs datatable/store
<james> I see a data source as something like a shapefile, or a PostGIS database etc
<Ray> I was thinking of them as the same thing - I saw a DataSource as a FeatureTable, basically, with loading being handled by a Loader
<james> OK, so my idea of a data source is your idea of a loader
<Ray> yeah
<james> which is ok, but I worry that indexes will behave differently
<cameron> Ah, to me a FeatureTable sounds like a Layer within a DataSource, or FeatureCollection.
<Ray> Actually - yeah - shall we call them Loaders and FeatureTables from now on
<Ray> A FeatureCollection would correspond more to one column of a FeatureTable
<james> yes
<james> Things get very very nasty when we have feature collections in the column
<Ray> A FeatureCollection would be split in a FeatureTable implementation
<Ray> Not in a column - AS a column
<james> <ian>not sure about that
<james> yes
<Ray> I had not thought about storing features in something other than a FeatureTable
<james> but what about GML which allows feature collections to contain feature collections
<cameron> I'm still not sure what a Loader is. Is this a DataSource like a WFS?
<james> yes
<Ray> WFS?
<james> Web Feature Server
<Ray> Oh.
<james> An online service for providing features
<Ray> A Loader is a fairly simple interface with just, say a load() method, the implementation of which would handle the details for whatever source the data was coming from
<james> I can see us having a lot of loaders, but only one FeatureTable
<cameron> In that case, can we call a Loader a DataSource - it seems to be more descriptive (to me).
<Ray> yeah
<Ray> right oh
<james> <ian> yes
<james> yes
<cameron> A DataSource is a bit more than just a load() function.
<james> yes
<cameron> It should contain a load(extent) which returns just the features within the extent.
<Ray> ...not as far as the DataSourceController is concerned
<cameron> Possibly also load(extent, layer)
<Ray> the idea is to be able to plug different DataSource(Loader) implementations in to a given FeatureTable or a threaded DataSourceController and let it just load the thing
<james> not layer
<james> layers are probably gone
<Ray> no, Layers are a separate aspect - layers would use an Index
<james> load(extent, featureTable)
<Ray> or themes
<james> i.e. load into FeatureTable features for extent, that are not already in feature table
<cameron> Both the Web Feature Servers, and Web Map Servers have a concept of "Layer".
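One way the load(extent, featureTable) idea above could look in Java, as a sketch only: the source adds to the table just those features that fall inside the requested extent and are not already loaded. The extent is reduced to a plain bounding box here, and every name (PointFeature, FeatureTableSketch, DataSourceSketch, InMemoryDataSource) is hypothetical rather than anything agreed in the meeting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical feature: an id plus a point location. */
class PointFeature {
    final String id;
    final double x;
    final double y;

    PointFeature(String id, double x, double y) {
        this.id = id;
        this.x = x;
        this.y = y;
    }
}

/** Hypothetical feature table keyed by feature id, so repeat loads can be skipped. */
class FeatureTableSketch {
    private final Map<String, PointFeature> rows = new LinkedHashMap<>();

    boolean contains(String id) { return rows.containsKey(id); }

    void add(PointFeature feature) { rows.put(feature.id, feature); }

    int size() { return rows.size(); }
}

/** Hypothetical data source (the "Loader" of the discussion). */
interface DataSourceSketch {
    /** Adds features inside the box that the table does not hold yet; returns how many were added. */
    int load(double minX, double minY, double maxX, double maxY, FeatureTableSketch table);
}

/** Stand-in for a shapefile, database, or WFS: everything is already in memory. */
class InMemoryDataSource implements DataSourceSketch {
    private final List<PointFeature> all = new ArrayList<>();

    InMemoryDataSource add(PointFeature feature) {
        all.add(feature);
        return this;
    }

    @Override
    public int load(double minX, double minY, double maxX, double maxY, FeatureTableSketch table) {
        int added = 0;
        for (PointFeature f : all) {
            boolean inside = f.x >= minX && f.x <= maxX && f.y >= minY && f.y <= maxY;
            if (inside && !table.contains(f.id)) {
                table.add(f);
                added++;
            }
        }
        return added;
    }
}
```

Supporting a new file type would then mean writing one more DataSourceSketch implementation, while the table stays the same.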
<Ray> yes
<james> Ian> WFS and WMS layers are application logic and shouldn't influence our thinking here
<cameron> ok.
<cameron> Any chance we can change the name of FeatureTable to FeatureCollection (to fit in with GeometryCollection name in OGC specs).
<Ray> sure
<james> Hang on
<james> that could get very confusing
<Ray> actually - i think it could
<james> One column of the feature table is a GeometryCollection
<Ray> yes
<cameron> No, A feature contains many attributes, but only one Geometry.
<james> Not always true
<cameron> A Geometry may be complex, like a MultiLine, but it is just one geometry.
<james> GML allows multiple geometries for one feature
<Ray> It would be possible to include functionality in an Index which would present the indexed data as a GeometryCollection
<james> e.g. School -> Boundary, Buildings, Center, Fields
<james> One feature, many geometries (and not just multi part)
<cameron> Ah, yes, point taken.
<james> Yes, I think it should be possible for indexes to look like GeometryCollections
<Ray> I'm not sure of the conversion overhead, but it should be okay if we make Index an abstract superclass instead of an interface and finalize the code there.
<cameron> Um, shouldn't an index look like a FeatureCollection rather than GeometryCollection?
<james> Both
<james> An index which only returns one column of data could be geometry collection
<james> An index which returns ROWS from the table would be a feature collection
<Ray> Well - Viewers and things would not be dealing directly with FeatureCollections to get their Features - they'd have to get an Index first
<Ray> hmm. That's an interesting split
<Ray> What object does the Viewer need to see to be able to draw effectively?
<Ray> for Viewer: read whatever object is doing the mapshape handling (theme or whatever)
<james> lots of things
<james> at a basic level it needs just the geometry
<james> but the styling is going to need some or all of the attributes
<Ray> because my thinking was that the Index would be the point for accessing the data.
<Ray> That's fine, that would be returned by the Index, too
<james> OK, that works
<james> now what about this use case
<Ray> Could add a couple of (probably final) methods to the index superclass - getFeatureCollection and getGeometryCollection
<james> a database has 200 columns, most of which we don't need
<Ray> You wouldn't need to load them all in
<james> we don't want to build a featuretable with all of them in
<Ray> The Loader would filter only those columns you need
<Ray> Timemap works that way at the moment
<james> ok
<james> Ian> What about times when the database can do better indexing than we can
<james> Ian> Can we take advantage of that?
<Ray> Then we don't need indexes
<Ray> depends on the application, I suppose
<Ray> If you know you're gonna get back data from a sorted query, then you're away
<james> Some form of shallow/passthrough index could work
<cameron> Is an index an extension of a FeatureCollection, or a parameter for a FeatureCollection?
<Ray> I would envisage the FeatureTable's default Index as being sorted anyway
<james> Who builds the index?
<Ray> The default Index would be a SortedIndex, which would probably by default be sorted on X, then Y
<Ray> DataSourceController
<james> Ian>The loader or the feature table
<james> I don't like controllers
<Ray> which also handles threading, and takes the info for what classes to create (through factory classes) from Property objects
<Ray> The Index usually builds itself
<Ray> but, on a signal from an event.
<Ray> or when the data loaded changes
<Ray> So.. what don't you like about controllers?
<cameron> No, I prefer to get rid of them too.
<cameron> Hopefully you can roll functionality into another class to make it cleaner.
<james> I'd prefer the classes to be self threaded
<james> I don't see the need for the controller
<Ray> The only thing that needs to be threaded is the Loader
<cameron> yes.
<Ray> But I would shy away from having it thread itself
<james> The featuretable can thread the call to load(extent,table)
<Ray> it could, yes, but that may be too much functionality in one place
<Ray> ...or it may not
<cameron> I agree with James on this one.
<Ray> right oh
<james> If things use events properly they can trigger changes when needed
<Ray> yeah. There's a bit more coding involved, though, in the whole load process, as I saw it.
<cameron> It might be worth trying to make all events start on a new thread.
<Ray> Here's what I saw:
<james> I think we should stay flexible
<Ray> that many threads could get a bit awkward.
<Ray> So -
<Ray> what I was thinking was:
<Ray> Indexes, FeatureTables, Loaders could all be put together in whatever way you wanted, depending on the program. Indexes, and Loaders, though, are all very different implementations of the same thing
<james> Erk!
<Ray> So it would make sense to make factory classes for each of these Class types
<Ray> by "same thing" I mean there are lots of different Index classes and lots of different Loader classes
<james> It needs to be very very easy for someone to add support for a new file type/ data source
<Ray> sorry - that was a bit obscure
<Ray> yes I know
<Ray> so anyway
<Ray> Factory classes :
<Ray> An instance of a factory class would read a Properties file into a java.util.Properties object
<Ray> containing all the classes and an identifier for each
<Ray> the properties file could be stored in the geotools jar
<cameron> How about setting up a "DataSource" interface, and Indexes, FeatureTables and Loaders implement the interface.
<Ray> all the geotools-specific classes would be in the geotools jar, in, say "geotools/res/Indexes.prop"
<Ray> the Factory would look for this property class by default, and also in "res/Indexes.prop"
<Ray> "res/Indexes.prop" would be the list of classes Joe Bloggs makes when he wants his own DataLoading and indexing implementation
<Ray> ---let me get back to the idea of controllers for one second (we can always stick this into DataSource)
<Ray> The idea of a controller was to take its own property object, telling it which datasources to load, and what classes to use for the loading and indexing, and going to the factory classes for that.
<Ray> So... have I gone off on a complete rant or is everyone still there?
<cameron> still listening.
<james> almost still with you
<Ray> ah. Because I think that's it.
<james> I think we need to build some of these things
<cameron> I'm trying to get a common DataSource interface to fit into your model.
<james> get an idea for what works
<Ray> ok
<Ray> I have a couple of questions
<james> fire away
<Ray> actually, I'm not quite sure what the question is, because I'm not sure of the uses GT2 is expected to be put to. I'm still trying to get straight in my head how the thing will fit in with the GT infrastructure
<james> ian> As I see it:
<james> ian> a theme asks the feature table /index for the features to draw (it possibly styles them according to attributes)
<james> It then calls a series of "primitive" drawing methods in the renderer which "draws" them on the viewer
<Ray> Loading would be started by - the applet?
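A small Java sketch of the factory idea Ray outlines above: a java.util.Properties object maps an identifier to a class name, and the factory instantiates the class by reflection. To stay self-contained it reads the properties from a string instead of "geotools/res/Indexes.prop", and it uses JDK collection classes as stand-ins for Index implementations. The name IndexFactorySketch and the identifiers "sorted" and "plain" are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collection;
import java.util.Properties;

/** Hypothetical factory: looks up a class name by identifier and creates an instance of it. */
class IndexFactorySketch {
    private final Properties classNames = new Properties();

    /** propertiesText stands in for the contents of a file such as Indexes.prop. */
    IndexFactorySketch(String propertiesText) throws IOException {
        classNames.load(new StringReader(propertiesText));
    }

    Collection<?> create(String identifier) throws ReflectiveOperationException {
        String className = classNames.getProperty(identifier);
        if (className == null) {
            throw new IllegalArgumentException("No class registered for '" + identifier + "'");
        }
        // Requires a public no-argument constructor on the named class.
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        return (Collection<?>) instance;
    }

    public static void main(String[] args) throws Exception {
        // JDK classes stand in for real Index implementations.
        String props = "sorted=java.util.TreeSet\nplain=java.util.ArrayList\n";
        IndexFactorySketch factory = new IndexFactorySketch(props);
        System.out.println(factory.create("sorted").getClass().getName());
        System.out.println(factory.create("plain").getClass().getName());
    }
}
```

A user-supplied "res/Indexes.prop" could be layered on top by loading a second Properties object whose entries override or extend the defaults, though how the two files are merged is not settled in the discussion.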
<james> Applet, Application, Servlet
<james> yes
<Ray> right oh
<james> now a tricky point in all of this
<james> when to load
<james> and how much
<james> When the view changes do you fetch the features needed for the new display
<james> Or did you already get all of them just in case
<james> The above will change depending on the size and speed of the data we are playing with
<Ray> it should be possible to do both
<james> Indeed
<james> but how do you decide which to do when, that is the hard bit
<james> ian> this should depend on the loader
<james> And, how long do you hold onto features for.
<Ray> not sure about that - the loader wouldn't know what the extents of the map are...
<james> It does know how fast it is, and how much data there is
<Ray> also - the "extent" of the map may depend on more than just X & Y - it may be dependent on, say, a TimeFilter
<james> indeed
<Ray> or any other kind of filter
<james> true
<james> so how much do you get at once, and who decides?
<Ray> would it make sense to say that the Viewer (read:whatever) queries the Index for mapshapes based on an extent
<Ray> actually - no it wouldn't
<Ray> don't query an index
<Ray> it should be relatively static
<james> build an index as a result of a query
<Ray> and only rebuilt when it needs to be
<cameron> In that case, a query should be based on extent, and then an attrib filter put on top of that.
<cameron> (in the index)
<Ray> Would it make sense to, say, have an Extent object with an extensible iswithin() method ?
<james> extensible?
<Ray> em, yes, abstract
<james> to time, or whatever you mean
<Ray> you know, people would tailor their own extent to fit their needs
<Ray> yea
<cameron> The jts has an extent object which provides "iswithin" among others.
<Ray> so it does make sense
<james> almost
<Ray> almost is good enough for most enterprise level application programming
<james> hehe
<james> Anyone know about 'extreme programming'?
<Ray> I think I've heard this before...
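A sketch, in Java, of the abstract Extent with an isWithin() method that Ray floats above, including a subclass that adds a time range to a spatial box. The method signature (x, y, time) and the class names BoxExtent and TimeBoxExtent are guesses for illustration; nothing here is taken from JTS or Geotools.

```java
/** Hypothetical abstract extent: subclasses decide what "within" means. */
abstract class Extent {
    abstract boolean isWithin(double x, double y, long time);
}

/** Purely spatial extent: a bounding box that ignores time. */
class BoxExtent extends Extent {
    private final double minX, minY, maxX, maxY;

    BoxExtent(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    @Override
    boolean isWithin(double x, double y, long time) {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }
}

/** Tailored extent: the same box, restricted to an inclusive time range. */
class TimeBoxExtent extends BoxExtent {
    private final long from;
    private final long to;

    TimeBoxExtent(double minX, double minY, double maxX, double maxY, long from, long to) {
        super(minX, minY, maxX, maxY);
        this.from = from;
        this.to = to;
    }

    @Override
    boolean isWithin(double x, double y, long time) {
        return super.isWithin(x, y, time) && time >= from && time <= to;
    }
}
```

Code that asks for features would depend only on Extent, so an application with a time slider could pass a TimeBoxExtent without the loader or index knowing about time at all.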
<cameron> Ray, I'm trying to work out whether Index, Filter, FeatureTable and all implement one interface (say DataSource).
<Ray> Oh, yeah, I remember, I read an article on it once
<james> It is a way of thinking about code and design that involves minimal UP FRONT effort
<Ray> nope nope nope
<cameron> extreme - no, not me.
<Ray> how do you mean?
<Ray> http://www.extremeprogramming.org/
<james> I'm just saying we should start on the code, set up unit tests, and be prepared to refactor LOTS
<james> the code is the design is the code
<cameron> I'd like a concept of one DataSource (be it an index, WFS, ...)
<Ray> yes that sounds great, not sure what it means, but it sounds great
<james> I agree cameron, almost
<Ray> I definitely think the thing, to be as extensible as possible, should fit together like lego bricks
<james> but I think Indexes are something a little special
<james> Can we let this mull in our heads for a while and move on
<Ray> For most programs, though, they wouldn't be dealing with indexes at all
<james> And come back together in a few days
<Ray> yeah ok
<cameron> Ie a DataSource should have getFeatures(extent) and return a FeatureTable, no matter what the source
<Ray> Shall I put together another, more up to date design?
<james> cameron - yes
<cameron> oh, yeh, good idea.
<Ray> right oh
<james> though probably populate(extent,table) but enough for now
<Ray> Well, it's been nice meeting you Cameron
<cameron> likewise.
<Ray> talk to you soon.
<james> Good to have us all together
<Ray> see you all
<Ray> yeah
<Ray> this was a really good idea
<cameron> OK, what are the actions here?
<Ray> Ray: make more detailed design based on what was discussed
<james> everyone thinks
<Ray> does anybody want this conversation mailed to any mailing lists?
<cameron> yes, good for archiving.
<james> devel should take it
<james> also, new web site will be able to take it soon
<Ray> or at least the good bits
<cameron> I can send this out.
<Ray> grand
<cameron> Shall we organise another meeting in a few days - or maybe a week?
<Ray> yeah
<cameron> Say Tuesday?
<Ray> fine by me
<james> and me
<james> ian> also
<cameron> OK, same time, same place. Will add this to the mail out.
<james> ian> I'll get my own IRC client by then
<cameron> Good.
<Ray> great. See you all then
<cameron> anything else?
<james> I want to open the CVS repository
<cameron> Ray, are you using poseidon for the UML?
<Ray> don't think so
<Ray> nope. Argo
<Ray> haven't even heard of poseidon
<james> Same thing
<cameron> it is the commercial version of argo.
<james> Argo is open version of Poseidon
<Ray> ah
<cameron> Could you email the .argo file when you are done then.
<Ray> sure
<james> Any comments on the directory structure I posted a while back
<cameron> James, did you want to talk about CVS?
<james> I'm going to open the repository
<Ray> I'm gonna sign off now, and get some coffee
<cameron> Um, not sure I saw it.
<james> Good move
<Ray> see you s later
<cameron> bye ray.
<james> everything else can wait
<james> Thanks ray
<-- Ray has quit ()
<james> I'd best go too
<cameron> James, can you send the dir structure again?
<cameron> ok bye.
<james> Will do
<james> Bye
<-- james has quit ()
**** ENDING LOGGING AT Thu Jan 24 22:09:27 2002