From: Ted H. <Ted...@no...> - 2009-09-26 16:09:11
Hello all,

Just wanted to mention that this proposal is conceptually equivalent to http://trac.osgeo.org/geonetwork/wiki/ComponentsAndComposites. We are definitely interested in participating...

Ted

On Sep 26, 2009, at 4:08 AM, Sim...@cs... wrote:

> Apologies, the link is http://trac.osgeo.org/geonetwork/wiki/ComposedMetadataRecords
>
> ________________________________________
> From: Simon Pigot [Sim...@ut...]
> Sent: Saturday, 26 September 2009 7:56 PM
> To: Francois Prunayre
> Cc: Devel geo...@li...
> Subject: Re: [GeoNetwork-devel] Related document indexing eg. kml and wfs indexing
>
> Hi All,
>
> Looks good Francois. With regard to the WFS, the proposal overlaps with the proposal to harvest metadata from a WFS by converting features to ISO metadata fragments which can be linked into records (the ComposedMetadata proposal in the list of proposals on http://trac.osgeo.org/geonetwork/proposals). By comparison, I guess the approach of composing metadata records harvested from a WFS is an attempt to structure the info from the WFS rather than dump it directly into the index for free text search. Both are valid approaches: composing the metadata records requires more work but permits targeted searching, and because it uses a GN harvester and the xlink cache, indexing is still speedy.
>
> It would also be interesting to index content from attached document resources like PDF or DOC files, maybe using the Apache Tika content analysis toolkit too? (http://lucene.apache.org/tika/)
>
> Cheers,
> Simon
>
>
> Francois Prunayre wrote:
>> Hi Thijs,
>>
>> 2009/9/25 Thijs Brentjens <li...@br...>:
>>
>>> Great idea. Could be very powerful! Just to get it right for me: this patch indexes data directly (if referred to in a metadata record) and adds this information to the metadata records to improve search results.
>>>
>> That's the point.
>>
>>
>>> Possible practical issue: for WFS, even if you're using maxFeatures (as in the patch), the indexes could still grow quickly, so I think one would want to use a relatively small number of features for indexing.
>>>
>> True. Another idea could be to remove all non-text fields, which do not seem really useful at first glance.
>>
>>> But if using just a few features, maybe the data returned is not representative enough. So there is some balance to find here (worth experimenting). But still, I think it improves the matching of search results to queries.
>>>
>>> And in some cases, when data changes quickly over time, the indexes may become outdated, possibly resulting in incorrect search results.
>>>
>> True also, but the index is updated for a record every time somebody looks at it (due to the popularity increase), and related documents will be parsed again (maybe we should only update the popularity value in the index, but for the time being the full record is reindexed).
>>
>>> But again: this is just in very rare cases. I think these are just minor issues; things to find out if they really do occur. Do you have some results / demo maybe?
>>>
>> Not really, I just had a try with some WFS I know about.
>>
>>> And to enable this feature, maybe add an extra queryable as well? To search on the data (only), or maybe disable searches on data somehow? Would that be possible?
>>>
>> For that, we could create a specific field in the index: "any" contains the metadata full text info, and another field stores the data info. Easy.
>> Maybe this field could be updated on a regular basis in a background task.
>>
>> Thanks for the comments.
>> Francois.
>>
>>> best regards,
>>> Thijs
>>>
>>> Francois Prunayre wrote:
>>>
>>>> Hi list, this is more of an experiment on how to index related documents which could be referenced in a metadata record.
>>>>
>>>> For example, with a KML document or a related WFS service in the distribution section, we could try to retrieve the document (GML or KML) and index the content of that remote document in the full text search criteria (i.e. any).
>>>> This will slow down the indexing process for sure, but could be useful in some ways.
>>>>
>>>> Attached is a quick patch adding the feature to the indexing mechanism for iso19139 records.
>>>>
>>>> Any thoughts? Is anyone working in that direction?
>>>>
>>>> Ciao.
>>>> Francois
>>>> _______________________________________________
>>>> GeoNetwork-devel mailing list
>>>> Geo...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
>>>> GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork
>>>>
>>>
>>
==== Ted Habermann ===========================
Enterprise Data Systems Group Leader
NOAA, National Geophysical Data Center
V: 303.497.6472 F: 303.497.6513
"If you want to go quickly, go alone.
If you want to go far, go together"
Old Proverb
==== Ted...@no... ==================
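The WFS side of the idea discussed in this thread (fetch a capped number of features with maxFeatures, keep only text values, and store them in a separate index field rather than in "any") can be sketched as below. This is not GeoNetwork code and not Francois's patch, which is not part of this archive; it is a minimal Python sketch under stated assumptions: a WFS 1.1.0 key-value-pair GetFeature request, a GML response that has already been retrieved, and a crude "contains a letter and is not a plain number" test standing in for "non-text fields". The functions `getfeature_url`, `text_values`, `index_fields` and the `data` field name are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Heuristics (assumptions, not GeoNetwork behaviour): a value counts as "text"
# if it contains at least one letter and is not a plain number such as "1e5".
_LETTER = re.compile(r"[^\W\d_]")
_NUMBER = re.compile(r"[-+]?\d+(\.\d+)?([eE][-+]?\d+)?")


def getfeature_url(base_url: str, type_name: str, max_features: int = 10) -> str:
    """Build a WFS 1.1.0 KVP GetFeature request capped by maxFeatures."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "MAXFEATURES": str(max_features),
    }
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def text_values(gml: str) -> list[str]:
    """Return distinct text values from a GML response, in document order.

    Numbers and coordinate strings (e.g. "4.8 45.7") are dropped because they
    contain no letters; duplicates are dropped to keep the index small.
    """
    root = ET.fromstring(gml)
    seen: dict[str, None] = {}  # insertion-ordered set
    for elem in root.iter():  # root itself plus all descendants
        value = (elem.text or "").strip()
        if value and _LETTER.search(value) and not _NUMBER.fullmatch(value):
            seen.setdefault(value, None)
    return list(seen)


def index_fields(metadata_text: str, gml: str) -> dict[str, str]:
    """Keep record text in "any" and remote data text in a separate field.

    "data" is a hypothetical field name standing for the extra index field
    suggested in the thread, so searches could target or exclude data.
    """
    return {"any": metadata_text, "data": " ".join(text_values(gml))}


# Illustrative response; the app namespace and its elements are made up.
SAMPLE = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml" xmlns:app="http://example.org/app">
  <gml:featureMember>
    <app:river>
      <app:name>Rhone</app:name>
      <app:length>812</app:length>
      <app:geom><gml:Point><gml:pos>4.8 45.7</gml:pos></gml:Point></app:geom>
    </app:river>
  </gml:featureMember>
  <gml:featureMember>
    <app:river>
      <app:name>Rhone</app:name>
      <app:status>navigable</app:status>
    </app:river>
  </gml:featureMember>
</wfs:FeatureCollection>"""

if __name__ == "__main__":
    print(getfeature_url("http://example.org/wfs", "app:river", 5))
    print(index_fields("Rivers of France", SAMPLE))
```

In practice the GML would come from fetching the request URL, and a more faithful version of "remove all non-text fields" would consult the feature type's schema (via a DescribeFeatureType request) to keep only string-typed properties instead of relying on the letter heuristic. Deduplicating values is one cheap way to address the index-growth concern raised about maxFeatures.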