From: Gilles D. <gr...@sc...> - 2003-11-17 22:49:13
According to Christopher Murtagh:
> On Mon, 2003-11-17 at 16:20, Gilles Detillieux wrote:
> > The -m option MUST be followed by a file name, and this file must
> > be a list of one or more URLs to add to the index. The htdig.html
> > page is a tad misleading, as it shows [url_file] in brackets, which
> > would suggest the file is optional, but the description for -m in
> > http://www.htdig.org/dev/htdig-3.2/htdig.html says "Only index the URLs in
> > the file provided and no others." How will it get the URL(s) if you don't
> > provide a file? The description says nothing about reading from stdin.
> > (htdig 3.1.6 can read from stdin, if a "-" is given, but this is one
> > feature from 3.1.6 that I never got a chance to add to 3.2.0b5 before
> > the feature freeze.)
>
> Hrm, a bit more than a tad misleading: this is what I have in the docs
> that shipped with a 3.2 tarball, regarding having '-' for htdig:
>
> 'Get the list of URLs to start indexing from the STDIN. This will
> override the default start_url and the file supplied to -m [url_file].'
>
> http://lovelace.wcg.mcgill.ca/htdig/docs/ [htdig.html in frame]
>
> Funny thing is that the URL that you provide also has this same
> description, and definitely says it will read from STDIN.
>
> However, htdig is getting the file. When I add 'v's it displays the
> content/title and everything. It just doesn't add it to the index.

Well, right you are. The docs do say it reads from stdin, and in fact it
does. What the documentation doesn't say is that the single "-" that gets
it to read from stdin must come after all the other options. Otherwise,
the "-" causes htdig to stop scanning the argument list for options, so it
wouldn't see your -c option (even if -m wasn't swallowing it!). So, I'm
guessing that htdig is using the default htdig.conf file instead of the
one you want, and so it ends up updating a different database. Is this
right? In any case, you need to follow the -m with a filename, even if
the final "-" overrides it.
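
To make the scanning behaviour concrete, here is a rough sketch that uses the shell's getopts builtin as a stand-in for htdig's argument handling. It is not htdig's code: the function name parse_args and the option string "svm:c:" are assumptions that cover only the flags used in this thread, and I'm assuming htdig scans its arguments in the usual getopt style.

```shell
#!/bin/sh
# Hypothetical stand-in for htdig's option scanning (NOT the real htdig code).
# The option string "svm:c:" is an assumption covering only -s, -v, -m, -c.
parse_args() {
    config="(default htdig.conf)"   # what is used when -c is never seen
    url_file=""
    OPTIND=1                        # reset so the function can be called again
    while getopts "svm:c:" opt; do
        case "$opt" in
            m) url_file=$OPTARG ;;  # -m takes the NEXT word, whatever it is
            c) config=$OPTARG ;;
            s|v) ;;                 # flags without arguments
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))           # drop everything that was scanned as options
    printf 'config=%s url_file=%s rest=%s\n' "$config" "$url_file" "$*"
}

# A lone "-" before -c: scanning stops at "-", so -c is never seen.
parse_args -s -v - -c ads.conf

# -m with no file name: it swallows "-c" as its file name.
parse_args -s -v -m -c ads.conf -

# -m followed by a dummy name, "-" last: both -m and -c are seen.
parse_args -s -v -m foo -c ads.conf -
```

In the first two calls the config stays at the default, and in the second the URL file ends up being "-c". Only the third call sees ads.conf and leaves the lone "-" as the remaining argument.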

The behaviour we actually want to shoot for is what 3.1.6 does, which I
think is much more consistent and logical (and better documented). See
http://www.htdig.org/htdig.html to see what it should be. In the meantime,
you should probably do something like this:

    echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026232' | ./htdig -s -v -m foo -c /www/htdig/install/conf/ads.conf -

The "foo" will be ignored.

> > With the syntax above, htdig will try to open a file called "-c", which
> > it won't find, so it won't add any URLs to the index.
>
> How hard would it be to add it? I suppose I could write the url to a
> temporary file as well.

It shouldn't be hard to do. I just ran out of time earlier to do it before
the feature freeze, as it wasn't the highest priority thing to tackle at
the time (bug fixes came first). It should just take me an hour or so to
compare the 3.1.6 and 3.2.0b5 htdig/htdig.cc code to see what changes are
needed in the latter, then of course to code it, test it, document it and
commit it.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
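
P.S. If you would rather go the temporary-file route you mention, something along these lines might do. Treat it as an untested sketch: the function name index_url and the DRY_RUN switch are made up for illustration, the HTDIG and CONF defaults are just the paths from this thread, and mktemp is assumed to be available on your system.

```shell
#!/bin/sh
# Sketch only. HTDIG and CONF default to the paths used in this thread;
# DRY_RUN is a hypothetical switch that prints the command instead of running it.
HTDIG=${HTDIG:-./htdig}
CONF=${CONF:-/www/htdig/install/conf/ads.conf}

index_url() {
    # Write the single URL to a temporary file for -m to read.
    url_file=$(mktemp) || return 1
    printf '%s\n' "$1" > "$url_file"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "$HTDIG -s -v -m $url_file -c $CONF"
        status=0
    else
        # -m is followed by a real file name, so -c is no longer swallowed.
        "$HTDIG" -s -v -m "$url_file" -c "$CONF"
        status=$?
    fi

    rm -f "$url_file"               # clean up whether or not htdig succeeded
    return $status
}

# Usage (commented out so the sketch has no side effects when sourced):
# index_url 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026232'
```

Since -m gets a real file here, no trailing "-" is needed and nothing is read from stdin.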