From: Gilles D. <gr...@sc...> - 2001-10-19 15:29:08
According to Andrew Daviel:
> I have been working on a geographic search engine, and as I mentioned in
> my earlier "features" message, I am getting fed up with my Perl robot, and
> am trying to use htdig.
>
> I have been somewhat successful and have a version without map
> navigation at http://geotags.com/htdig/
>
> My question is really how much support, if any, I might get from the htdig
> community for this, either as mainstream htdig or from anyone else
> interested in this kind of thing.

My question would be: how much support do you need? Right now, all the
developers are pretty strapped for time, so you probably won't get a lot
of development work done for you. However, if you want to make changes to
the C++ code yourself, I'm sure Geoff, myself, or a few others can suggest
the best approaches and where to put in the changes. If they're done in a
general enough way, they could even be incorporated into the 3.2
development code.

> The changes required are (so far)
>
> A database element to store a position for each page
> (I actually store a region code and placename too but don't use them)

This is one of the more involved changes, but certainly feasible. It
would require extending the DocumentRef and DocumentDB classes in
htcommon to handle the new field. To maintain compatibility, you'd want
to add the new field code to the end of the enumeration in DocumentRef,
to avoid shifting over the other codes.

> An addition to the HTML parser to get the metadata

Should be quite easy. This would likely involve both the HTML and
Retriever classes in htdig.

> An addition to the CGI parser to get a requested position (map click)

I don't know what that would involve, but it might be possible with a
front-end wrapper script for htsearch.

> A weighting algorithm to calculate geographic distance

You'd need to work out the specifics of the calculations.
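To make the point about appending to the DocumentRef enumeration concrete, here is a minimal standalone sketch, not the actual htdig code. It assumes a record is stored as (field code, value) pairs, so each code's numeric value is part of the on-disk format. The names DOC_ID, DOC_URL, DOC_TITLE, DOC_GEO_POSITION and decode() are hypothetical placeholders, and the "skip unknown codes" rule belongs to this sketch, not necessarily to DocumentRef's own deserialization.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical field codes standing in for the enumeration in DocumentRef.
// Each code's numeric value is written to disk, so it must never change.
enum DocField {
    DOC_ID,           // 0
    DOC_URL,          // 1
    DOC_TITLE,        // 2
    DOC_GEO_POSITION  // 3: new field, appended after every existing code
};

// A serialized record: a list of (field code, value) pairs.
using Record = std::vector<std::pair<int, std::string>>;

// Decode a record, keeping fields whose codes are <= lastKnown and skipping
// the rest (this is how an older reader would ignore DOC_GEO_POSITION).
std::map<DocField, std::string> decode(const Record& rec, DocField lastKnown) {
    std::map<DocField, std::string> fields;
    for (const std::pair<int, std::string>& field : rec) {
        if (field.first >= 0 && field.first <= lastKnown) {
            fields[static_cast<DocField>(field.first)] = field.second;
        }
    }
    return fields;
}
```

If DOC_GEO_POSITION had instead been inserted before DOC_TITLE, DOC_TITLE would become 3, and every title already stored under code 2 would be read back as a position. That is exactly the shifting the append avoids.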
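For the HTML parser addition, most of the work is turning the meta tag's content string into a position. Here is a minimal sketch that assumes the position is written as "latitude;longitude" in decimal degrees, as in a `<meta name="geo.position" content="49.2;-123.1">` style tag. The tag format, GeoPos and parseGeoPosition() are assumptions for illustration, not existing htdig code.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

struct GeoPos {
    double lat;  // degrees, -90..90
    double lon;  // degrees, -180..180
};

// Parse an assumed "lat;lon" string (decimal degrees). Returns std::nullopt
// if it is malformed or out of range, so the caller can treat the page as
// untagged. Leading spaces are tolerated (strtod skips them); trailing are not.
std::optional<GeoPos> parseGeoPosition(const std::string& content) {
    const std::string::size_type sep = content.find(';');
    if (sep == std::string::npos) return std::nullopt;

    const std::string latStr = content.substr(0, sep);
    const std::string lonStr = content.substr(sep + 1);
    char* end = nullptr;

    const double lat = std::strtod(latStr.c_str(), &end);
    if (end == latStr.c_str() || *end != '\0') return std::nullopt;
    const double lon = std::strtod(lonStr.c_str(), &end);
    if (end == lonStr.c_str() || *end != '\0') return std::nullopt;

    // Written as !(in range) so that NaN values are rejected too.
    if (!(lat >= -90.0 && lat <= 90.0) || !(lon >= -180.0 && lon <= 180.0))
        return std::nullopt;
    return GeoPos{lat, lon};
}
```

In the HTML class, this would run on the content attribute once the meta name matches. The Retriever would then pass the result along to be stored with the page.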
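On the CGI side, an HTML `<input type="image" name="map">` submits a click as two form fields, `map.x` and `map.y`, giving pixel offsets from the image's top-left corner. A wrapper script (or htsearch) then has to convert those pixels into coordinates. Here is a minimal sketch, assuming a plain latitude/longitude (equirectangular) map image with known corner coordinates. The field name `map` and the MapImage and clickToPosition() names are hypothetical.

```cpp
#include <cassert>

struct GeoPos { double lat; double lon; };  // degrees

// Hypothetical description of the clickable map image: its size in pixels
// and the coordinates at its edges.
struct MapImage {
    int widthPx;
    int heightPx;
    double northLat;  // latitude at y = 0
    double southLat;  // latitude at y = heightPx
    double westLon;   // longitude at x = 0
    double eastLon;   // longitude at x = widthPx
};

// Convert a click at pixel (x, y), as reported in map.x / map.y, to a
// position, assuming latitude and longitude vary linearly across the image.
// Maps in other projections would need a different formula.
GeoPos clickToPosition(const MapImage& m, int x, int y) {
    assert(m.widthPx > 0 && m.heightPx > 0);
    const double fx = static_cast<double>(x) / m.widthPx;
    const double fy = static_cast<double>(y) / m.heightPx;
    return GeoPos{m.northLat + fy * (m.southLat - m.northLat),
                  m.westLon + fx * (m.eastLon - m.westLon)};
}
```

The map.x and map.y values would still have to be pulled out of the query string first. A wrapper script could then pass the resulting position on to htsearch.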
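As one possible starting point for the calculations, the sketch below computes great-circle distance with the haversine formula and folds it into a weight that falls off with distance. The weight 1 / (1 + d / falloffKm) and the falloff distance are arbitrary choices for illustration, and the function names are hypothetical. How such a weight should combine with the existing relevance score is a separate design question.

```cpp
#include <cmath>

struct GeoPos { double lat; double lon; };  // degrees

// Great-circle distance in kilometres using the haversine formula,
// assuming a spherical Earth with mean radius 6371 km.
double distanceKm(const GeoPos& a, const GeoPos& b) {
    const double kPi = 3.14159265358979323846;
    const double kEarthRadiusKm = 6371.0;
    const double toRad = kPi / 180.0;
    const double dLat = (b.lat - a.lat) * toRad;
    const double dLon = (b.lon - a.lon) * toRad;
    const double h = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(a.lat * toRad) * std::cos(b.lat * toRad) *
                     std::sin(dLon / 2) * std::sin(dLon / 2);
    // Clamp guards against h drifting slightly above 1 from rounding.
    return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(std::fmin(1.0, h)));
}

// Illustrative weight: 1.0 at the requested point, 0.5 at falloffKm,
// approaching 0 far away. It could scale a page's existing text score.
double geoWeight(const GeoPos& query, const GeoPos& page, double falloffKm) {
    return 1.0 / (1.0 + distanceKm(query, page) / falloffKm);
}
```

A hyperbolic falloff like this never drops a distant page entirely; a hard radius cutoff or an exponential decay are equally reasonable alternatives.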
Right now, the scoring is done in Display::buildMatchList() in htsearch,
but this code may get reorganized in the next month or two (in the 3.2
betas).

> I also have a config item to essentially force a ROBOTS NONE if there
> is no geographic tag on a page, so that I can refrain from indexing
> untagged pages.

Code to support this can go in the HTML class. Adding config attributes
is pretty easy in 3.2, as everything for defining and documenting them
goes into htcommon/defaults.cc. (Lots of examples to choose from in
there!)

> I am also trying to add support for position passed in an experimental
> HTTP header, which allows one to dispense with the map and potentially
> generate requests based on current position automatically, e.g.
> using GPS.

This would affect the HtHTTP and Document classes in htdig, as well as
maybe the Retriever class.

Sounds like an interesting project. I hope you have a C++ programmer to
help you get the changes into the code.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930