From: Andrew D. <an...@da...> - 2001-10-16 19:34:44
(I've been trying to write this for weeks and keep getting distracted.)

I have been working on a geographic search engine and, as I mentioned in my earlier "features" message, I am getting fed up with my Perl robot and am trying to use htdig instead. I have been somewhat successful, and have a version without map navigation at http://geotags.com/htdig/

My question is really how much support, if any, I might get from the htdig community for this, either as part of mainstream htdig or from anyone else interested in this kind of thing.

The basic idea is that, given a set of Web pages describing physical objects such as restaurants, bridges, parks etc., the search engine is capable of finding the nearest one. This requires metadata on the page explicitly giving the position being described (though it can sometimes be guessed from things like US zipcodes in the text), and a search algorithm that can score according to geographic distance. The search algorithm is not a true geographic one (for example, "show me all objects completely or partially inside this polygon") but rather a modification of a text search, e.g. "pizza AND restaurant SORTBY distance".

The changes required are (so far):

  - A database element to store a position for each page (I actually store a region code and placename too, but don't use them)
  - An addition to the HTML parser to get the metadata
  - An addition to the CGI parser to get a requested position (map click)
  - A weighting algorithm to calculate geographic distance

I also have a config item to essentially force a ROBOTS NONE if there is no geographic tag on a page, so that I can refrain from indexing untagged pages. I am also trying to add support for a position passed in an experimental HTTP header, which allows one to dispense with the map and potentially generate requests automatically based on current position, e.g. using GPS.

This is essentially what is online at http://geotags.com/htdig/, using a fixed map.
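To make the "SORTBY distance" idea concrete, here is a minimal sketch in Python rather than in htdig's own code. It assumes the text search has already produced a list of matching pages, each with a stored latitude and longitude, and simply orders them by great-circle (haversine) distance from the requested position. The names `Page`, `great_circle_km` and `sort_by_distance` are hypothetical and are not htdig identifiers; a real weighting algorithm might blend distance with the text score instead of sorting on distance alone.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


@dataclass
class Page:
    """Hypothetical record for an indexed page with a stored position."""
    url: str
    lat: float  # degrees, north positive
    lon: float  # degrees, east positive


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(matches, lat, lon):
    """Order pages that already matched the text query, nearest first."""
    return sorted(matches,
                  key=lambda p: great_circle_km(lat, lon, p.lat, p.lon))
```

For example, `sort_by_distance(pizza_matches, 49.25, -123.1)` would return the matching pages nearest that position first, where `pizza_matches` stands for whatever the text query returned.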
I will probably create custom templates and then try to run this version of htsearch using the original Perl as a wrapper, to enable the map zoom/pan features, which require scaling the map clicks according to the currently displayed map. I could either do the map-click scaling externally, passing pure Lat/Long to htsearch, or pass the current map extents to htsearch and do it internally.

Doing the whole thing, including the map manipulations and graphical markup, is somewhat more complicated, and might require hardwiring some map-dependent code, unless it could be done in templates. I haven't really looked at that yet.

-- 
Andrew Daviel
geotags.com etc.
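As a rough illustration of the map-click scaling mentioned above, the following Python sketch converts a pixel click into Lat/Long given the extents of the currently displayed map. It assumes a simple equirectangular map (latitude and longitude vary linearly with pixel position) and a pixel origin at the top-left corner; the function name and parameters are hypothetical, and a map that crosses the 180-degree meridian or uses another projection would need different handling.

```python
def click_to_lat_lon(x, y, width, height, west, south, east, north):
    """Convert a pixel click on an equirectangular map to (lat, lon).

    x, y          -- click position in pixels, origin at top-left
    width, height -- size of the displayed map image in pixels
    west, east    -- longitudes of the left and right map edges (degrees)
    south, north  -- latitudes of the bottom and top map edges (degrees)
    """
    lon = west + (x / width) * (east - west)
    # Pixel y grows downwards, while latitude grows upwards.
    lat = north - (y / height) * (north - south)
    return lat, lon
```

With this done in the wrapper, htsearch only ever sees a plain position; zooming or panning changes the extents passed in, not the search code.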