From: Wolfgang M. <Wol...@cu...> - 2001-11-01 21:59:21
Hi, Gilles,

Thanks for your fast reply and for saving me from trouble.

> Before you go stripping out code from htsearch and end up having to
> maintain another search program in parallel to htsearch, be sure you
> fully investigate the template capabilities of htsearch. There's almost

OK. In fact, I stumbled across these things *after* I had started to
strip htdig.

> nothing in the output that you can change through some template file
> and/or config attribute. In fact, I recently committed to the 3.1.6
> development code a number of new attributes that take care of the last
> few remaining areas that weren't configurable. See the new attribute
> search_results_contenttype in this coming Sunday's 3.1.6 snapshot, as
> well as the existing add_anchors_to_excerpt attribute.

I've got a CVS checkout of htdig, so I'll update that.

> > What I would like to know: the summary strings you generate, are they
> > well-formed XML (in particular is there for each opening tag also a
> > closing tag? )
>
> You probably want to turn off anchors in excerpts (although they are
> well-formed HTML/XHTML), and all the rest is under control of the
> template files. I think the distributed templates are all well-formed
> with closing tags. Exceptions to this right now are the <br>, <hr>,
> <img>, <input> and <option> tags in the header, footer, nomatch, syntax
> and wrapper HTML files, but you're going to change them anyway. Oh,
> there's also a non-self-terminating <br> tag in long.html and short.html
> that you may want to change to <br/> or remove.

Yes, I rolled my own MRML.html, which generates query-result-element
MRML tags.

The main problem I see is that I think a purely command-line version of
htdig would be cool. Purely command-line in the sense that you can write
a shell script that says:

    /path/to/htsearch --no-cgi --query-string="eggs and bacon" --sorted-by=score >> results.xml

(or some less fancy option style) and there is no interaction.
Would you like me to start working on that some time? Currently, in
fact, I have my ht_cli done (yes, a very ugly hack, and surely not
committable), and I am looking a bit more at the GIFT/popen side of
things, which has me modifying the Perl GIFT client for testing
GIFT/HtDig.

To summarise:

1) I will surely look deeper into templates (...and I will surely
   contribute the templates when they are done).

2) I am waiting for your comments on adding some code that makes it
   possible to call htsearch from a batch job. I would also like to get
   rid of the HTTP headers in the output (or did I overlook something??)

3) By simple modification of a couple of templates I seem to get the XML
   I want out of htsearch, at least for the case where there is no error
   (I tried that in the ten minutes between writing 2) and 3)).

Cheers, and thanks,
Wolfgang

--
Dr. Wolfgang Müller, assistant == teaching assistant
Personal page: http://cui.unige.ch/~vision/members/WolfgangMueller.html
Maintainer, GNU Image Finding Tool (http://www.gnu.org/software/gift)
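
As a rough illustration of points 2) and 3), here is a minimal Python sketch of what a calling program could do with the captured output: drop the header block and check that what remains is well-formed XML. The function names `strip_cgi_headers` and `is_well_formed` and the sample response are hypothetical, not part of htsearch or GIFT. The sketch assumes the captured text always begins with a CGI header block terminated by a blank line.

```python
import xml.etree.ElementTree as ET


def strip_cgi_headers(raw: str) -> str:
    """Return the body of a CGI response, dropping the leading header block.

    Assumption: if a blank line is present, everything before the first one
    is the header block.
    """
    # Accept both CRLF and LF line endings.
    normalized = raw.replace("\r\n", "\n")
    _headers, separator, body = normalized.partition("\n\n")
    # No blank line found: treat the whole text as body.
    return body if separator else normalized


def is_well_formed(xml_text: str) -> bool:
    """Report whether xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical captured output; real template output will differ.
    sample = "Content-type: text/html\r\n\r\n<results><hit>eggs<br/></hit></results>"
    body = strip_cgi_headers(sample)
    print(is_well_formed(body))                       # self-closed <br/> parses
    print(is_well_formed("<results><br></results>"))  # unclosed <br> does not
```

A check like this would flag the unclosed <br> tags mentioned above for long.html and short.html before the output is handed to an XML consumer.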