From: <Sim...@cs...> - 2009-09-26 10:10:13
Apologies, the link is http://trac.osgeo.org/geonetwork/wiki/ComposedMetadataRecords

________________________________________
From: Simon Pigot [Sim...@ut...]
Sent: Saturday, 26 September 2009 7:56 PM
To: Francois Prunayre
Cc: Devel geo...@li...
Subject: Re: [GeoNetwork-devel] Related document indexing eg. kml and wfs indexing

Hi All,

Looks good Francois. With regard to the WFS, the proposal crosses over with the proposal to harvest metadata from a WFS by converting features to ISO metadata fragments which can be linked into records (ComposedMetadata proposal in the list of proposals on http://trac.osgeo.org/geonetwork/proposals).

I guess by comparison, the approach of composing metadata records harvested from a WFS is an attempt to structure the info from the WFS rather than dump it directly into the index for free text search. Both are valid approaches: composing the metadata records requires more work but permits targeted searching, and because it uses a GN harvester and the xlink cache, indexing is still speedy.

It would also be interesting to index content from attached document resources like pdf or doc files, maybe using the Apache Tika content analysis toolkit too? (http://lucene.apache.org/tika/)

Cheers,
Simon

Francois Prunayre wrote:
> Hi Thijs,
>
> 2009/9/25 Thijs Brentjens <li...@br...>:
>
>> Great idea. Could be very powerful! Just to get it right for me: this patch
>> indexes data directly (if referred to in a metadata record) and adds this
>> information to the metadata records to improve search results.
>>
> That's the point.
>
>
>> Possible practical issue: for WFS, even if you're using maxFeatures (as in
>> the patch), the indexes could still grow quickly, so I think one does want
>> to use a relatively small number of features for indexing.
>>
> True. Another idea could be to remove all non-text fields, which
> don't seem really useful at first glance.
>
>> But if using just
>> a few features, maybe the data returned is not representative enough. So
>> there is some balance to find here (worth experimenting..). But still, I
>> think it improves matching search results to queries.
>>
>> And in some cases, when data could change quickly over time, the indexes may
>> become outdated, possibly resulting in incorrect search results.
>>
> True also, but the index is updated for a record every time somebody
> looks at it (due to the popularity increase), and related documents are
> parsed again (maybe we should only update the popularity value in the
> index, but for the time being the full record is reindexed).
>
>> But again:
>> this is just in very rare cases.. I think these are just minor issues;
>> things to find out if they really do occur. Do you have some results / demo
>> maybe?
>>
> Not really, just had a try with some WFS I know about.
>
>
>
>> And to enable this feature, maybe add an extra queryable as well? To search
>> on the data (only) or maybe disable searches on data somehow? Would that be
>> possible?
>>
> For that, we could create a specific field in the index: "any"
> contains the metadata full text info, and another field stores the data info.
> Easy.
> Maybe this field could be updated on a regular basis in a background task.
>
> Thanks for the comments.
> Francois.
>
>
>> best regards,
>> Thijs
>>
>> Francois Prunayre wrote:
>>
>>> Hi list, this is more an experiment on how to index related documents
>>> which could be referenced in a metadata record.
>>>
>>> For example, given a KML document or a related WFS service in the
>>> distribution section, we could try to retrieve the document (GML
>>> or KML) and index the content of those remote documents in the
>>> full text search criteria (ie. any).
>>> This will slow down the index process for sure but could be useful in some
>>> ways.
>>>
>>> Attached is a quick patch adding the feature to the index mechanism for
>>> iso19139 records.
>>>
>>> Any thoughts? Is anyone working in that direction?
>>>
>>> Ciao.
>>> Francois
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> GeoNetwork-devel mailing list
>>> Geo...@li...
>>> https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
>>> GeoNetwork OpenSource is maintained at
>>> http://sourceforge.net/projects/geonetwork
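Francois's patch is not included in this archive. As a rough sketch of the ideas discussed in the thread (a GetFeature request capped with maxFeatures, dropping non-text values such as numbers and geometry, and storing harvested data text in its own index field next to "any"), here is a small Python example. It is an illustration under stated assumptions, not GeoNetwork code: the function names, the "data" field name, the default of 10 features, the WFS 1.1.0 KVP parameters and the sample GML response are all hypothetical.

```python
"""Sketch: index a few WFS features as plain text in a separate field.

Assumptions (not from the original patch): function names, the "data"
field name, WFS 1.1.0 KVP parameters and the sample GML are illustrative.
"""
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

GML_NS = "{http://www.opengis.net/gml}"


def build_getfeature_url(base_url, type_name, max_features=10):
    """Turn a WFS endpoint from the distribution section into a capped GetFeature request."""
    overrides = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "MAXFEATURES": str(max_features),  # keep the index small
    }
    parts = urlsplit(base_url)
    # KVP names are case-insensitive, so drop existing copies before adding ours.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.upper() not in overrides]
    query = urlencode(kept + list(overrides.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def _is_numeric(value):
    """True if the value parses as a number (treated as a non-text field)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def extract_feature_text(gml_bytes):
    """Collect text values from a GML response, skipping GML elements (geometry) and numbers."""
    values = []
    for elem in ET.fromstring(gml_bytes).iter():
        if elem.tag.startswith(GML_NS):
            continue  # gml:coordinates, gml:Point, gml:featureMember, ...
        text = (elem.text or "").strip()
        if text and not _is_numeric(text):
            values.append(text)
    return values


def build_index_fields(metadata_text, feature_values):
    """Keep metadata full text in "any" and data text in a separate, hypothetical "data" field."""
    unique = dict.fromkeys(feature_values)  # de-duplicate while keeping order
    return {"any": metadata_text, "data": " ".join(unique)}


# Made-up sample GetFeature response used only for illustration.
SAMPLE_GML = b"""<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml" xmlns:topp="http://www.openplans.org/topp">
  <gml:featureMember>
    <topp:states>
      <topp:STATE_NAME>Illinois</topp:STATE_NAME>
      <topp:PERSONS>11430602</topp:PERSONS>
      <topp:the_geom><gml:Point><gml:coordinates>-89.5,40.0</gml:coordinates></gml:Point></topp:the_geom>
    </topp:states>
  </gml:featureMember>
</wfs:FeatureCollection>"""

if __name__ == "__main__":
    print(build_getfeature_url("http://example.org/geoserver/wfs", "topp:states", 5))
    print(build_index_fields("US states metadata", extract_feature_text(SAMPLE_GML)))
```

With the sample response, only "Illinois" ends up in the data field: the population value is dropped as numeric and the coordinates are skipped as GML geometry. Keeping this text out of "any" means ordinary full text searches behave as before, while a query on the separate field could target feature content only, which is roughly what Thijs asked about.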