|
From: Christopher M. <chr...@mc...> - 2003-11-13 22:10:01
|
Greetings htdig folks,

Recently I've been trying to have htDig purge and re-index items (via a
trigger in Postgres). The purge seems to work, as I no longer see the
item in the search results. However, when I try to re-index, I cannot
bring the page back in unless I do a full index. I've just installed
3.2.0b5 hoping that this would help, but no luck. Here's some output
from my command line attempts to get it to work:


[root@lovelace bin]# ./htpurge -c /www/htdig/install/conf/ads.conf -u http://newfind.mcgill.ca/indexes/ads/?AdsID=10266860

[root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026860' | ./htdig - -s -v -m -c /www/htdig/install/conf/ads.conf

ht://dig Start Time: Thu Nov 13 16:36:02 2003

New server: newfind.mcgill.ca, 80
0:11472:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1026860: (changed) size = 660
htdig: Run complete
htdig: 1 server seen:
htdig: newfind.mcgill.ca:80 1 document

HTTP statistics
===============
Persistent connections : Yes
HEAD call before GET : Yes
Connections opened : 2
Connections closed : 1
Changes of server : 0
HTTP Requests : 3
HTTP KBytes requested : 0.442383
HTTP Average request time : 0 secs
HTTP Average speed : inf KBytes/secs

ht://dig End Time: Thu Nov 13 16:36:03 2003

So although this thing has been purged and re-entered, it no longer
shows up in the query results. Also, it seems that the dbs aren't being
updated after the htpurge and htdig. Again more output from my konsole
(note the moddates and filesizes - also, the filesize of db.docdb
doesn't change between the purge and re-index):

[root@lovelace bin]# ls -ltr /www/htdig/install/var/ads
total 1584
-rw-r--r-- 1 root root 24576 Nov 13 13:35 db.excerpts.work
-rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docs.index.work
-rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docdb.work
-rw-r--r-- 1 root root 16384 Nov 13 16:14 db.words.db_weakcmpr
-rw-r--r-- 1 root root 619520 Nov 13 16:36 db.words.db
-rw-r--r-- 1 root root 655360 Nov 13 16:36 db.excerpts
-rw-r--r-- 1 root root 172032 Nov 13 16:36 db.docs.index
-rw-r--r-- 1 root root 344064 Nov 13 16:38 db.docdb

[root@lovelace bin]# ./htpurge -c /www/htdig/install/conf/ads.conf -u http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825

[root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825' | ./htdig - -s -v -m -c /www/htdig/install/conf/ads.conf

ht://dig Start Time: Thu Nov 13 17:05:14 2003

New server: newfind.mcgill.ca, 80
0:11475:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825: (changed) size = 336
htdig: Run complete
htdig: 1 server seen:
htdig: newfind.mcgill.ca:80 1 document

HTTP statistics
===============
Persistent connections : Yes
HEAD call before GET : Yes
Connections opened : 2
Connections closed : 1
Changes of server : 0
HTTP Requests : 3
HTTP KBytes requested : 0.442383
HTTP Average request time : 0 secs
HTTP Average speed : inf KBytes/secs

ht://dig End Time: Thu Nov 13 17:05:14 2003

[root@lovelace bin]# ls -ltr /www/htdig/install/var/ads
total 1584
-rw-r--r-- 1 root root 24576 Nov 13 13:35 db.excerpts.work
-rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docs.index.work
-rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docdb.work
-rw-r--r-- 1 root root 16384 Nov 13 16:14 db.words.db_weakcmpr
-rw-r--r-- 1 root root 619520 Nov 13 16:36 db.words.db
-rw-r--r-- 1 root root 655360 Nov 13 16:36 db.excerpts
-rw-r--r-- 1 root root 172032 Nov 13 16:36 db.docs.index
-rw-r--r-- 1 root root 344064 Nov 13 17:05 db.docdb


So, any info or help on this would be much appreciated.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017
|
|
From: Neal R. <ne...@ri...> - 2003-11-14 22:50:54
|
Questions:
1) If you run an htdump -w before and after the purge, do the db.docs
files differ?
For me, they differ by one line.. the URL I purged. I do notice the
dbfiles don't seem to differ in size.
But, the deleted URL won't show up in search results for me (after
the purge).
I'll investigate this further.. but my gut is that the record in the
db.docdb is not being purged, but is instead 'changing state' to
Reference_Obsolete.
As for trying to re-add it.... I can't get your htdig command to work
at all.... it errors for me that it can't find the default htdig.conf
file, even though I gave it the '-c' option. This indicates some error
happening somewhere....
I'll keep digging.
Thanks.
On 13 Nov 2003, Christopher Murtagh wrote:
> Greetings htdig folks,
>
> Recently I've been trying to have htDig purge and re-index items (via a
> trigger in Postgres). The purge seems to work as I no longer see the
> item in the search results, however, when I try to re-index, I cannot
> bring the page back in unless I do a full index. I've just installed
> 3.2.0b5 hoping that this would help, but no luck. Here's some output
> from my command line attempts to get it to work:
>
>
> [root@lovelace bin]# ./htpurge -c /www/htdig/install/conf/ads.conf -u http://newfind.mcgill.ca/indexes/ads/?AdsID=10266860
>
> [root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026860' | ./htdig - -s -v -m -c /www/htdig/install/conf/ads.conf
>
> ht://dig Start Time: Thu Nov 13 16:36:02 2003
>
> New server: newfind.mcgill.ca, 80
> 0:11472:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1026860: (changed) size = 660
> htdig: Run complete
> htdig: 1 server seen:
> htdig: newfind.mcgill.ca:80 1 document
>
> HTTP statistics
> ===============
> Persistent connections : Yes
> HEAD call before GET : Yes
> Connections opened : 2
> Connections closed : 1
> Changes of server : 0
> HTTP Requests : 3
> HTTP KBytes requested : 0.442383
> HTTP Average request time : 0 secs
> HTTP Average speed : inf KBytes/secs
>
> ht://dig End Time: Thu Nov 13 16:36:03 2003
>
> So although this thing has been purged and re-entered, it no longer
> shows up in the query results. Also, it seems that the dbs aren't being
> updated after the htpurge and htdig. Again more output from my konsole
> (note the moddates and filesizes - also, the filesize of db.docdb
> doesn't change between the purge and re-index):
>
> [root@lovelace bin]# ls -ltr /www/htdig/install/var/ads
> total 1584
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.excerpts.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docs.index.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docdb.work
> -rw-r--r-- 1 root root 16384 Nov 13 16:14 db.words.db_weakcmpr
> -rw-r--r-- 1 root root 619520 Nov 13 16:36 db.words.db
> -rw-r--r-- 1 root root 655360 Nov 13 16:36 db.excerpts
> -rw-r--r-- 1 root root 172032 Nov 13 16:36 db.docs.index
> -rw-r--r-- 1 root root 344064 Nov 13 16:38 db.docdb
>
> [root@lovelace bin]# ./htpurge -c /www/htdig/install/conf/ads.conf -u http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825
>
> [root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825' | ./htdig - -s -v -m -c /www/htdig/install/conf/ads.conf
>
> ht://dig Start Time: Thu Nov 13 17:05:14 2003
>
> New server: newfind.mcgill.ca, 80
> 0:11475:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825: (changed) size = 336
> htdig: Run complete
> htdig: 1 server seen:
> htdig: newfind.mcgill.ca:80 1 document
>
> HTTP statistics
> ===============
> Persistent connections : Yes
> HEAD call before GET : Yes
> Connections opened : 2
> Connections closed : 1
> Changes of server : 0
> HTTP Requests : 3
> HTTP KBytes requested : 0.442383
> HTTP Average request time : 0 secs
> HTTP Average speed : inf KBytes/secs
>
> ht://dig End Time: Thu Nov 13 17:05:14 2003
>
> [root@lovelace bin]# ls -ltr /www/htdig/install/var/ads
> total 1584
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.excerpts.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docs.index.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docdb.work
> -rw-r--r-- 1 root root 16384 Nov 13 16:14 db.words.db_weakcmpr
> -rw-r--r-- 1 root root 619520 Nov 13 16:36 db.words.db
> -rw-r--r-- 1 root root 655360 Nov 13 16:36 db.excerpts
> -rw-r--r-- 1 root root 172032 Nov 13 16:36 db.docs.index
> -rw-r--r-- 1 root root 344064 Nov 13 17:05 db.docdb
>
>
> So, any info or help on this would be much appreciated.
>
> Cheers,
>
> Chris
>
> --
> Christopher Murtagh
> Enterprise Systems Administrator
> ISR / Web Communications Group
> McGill University
> Montreal, Quebec
> Canada
>
> Tel.: (514) 398-3122
> Fax: (514) 398-2017
>
>
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Neal R. <ne...@ri...> - 2003-11-14 23:15:01
|
OK, so I forgot the #2 Question...
> Questions:
>
> 1) If you run an htdump -w before and after the purge, do the db.docs
> files differ?
{snip}
2) If you temporarily replace your start_url with the one you want to
re-add, and rerun 'htdig -v -c xxxx' does it add it properly? Does for
me.. and it shows up again in search results.
This seems to indicate that there is a problem properly
parsing/interpreting the command line options:
./htdig - -s -v -m -c
I haven't debugged it yet... but it looks like:
1) This combination of command line options is not playing well together.
2) htpurge is not really deleting any data.. just marking it obsolete.
This is probably due to not properly closing the BDB files at the end of
htpurge.
I've got this kind of thing working in libhtdig & libhtdigphp. How
attached are you to your current implementation?????
I will fix it in the regular binaries.... after I figure it out ;-)
Thanks.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Christopher M. <chr...@mc...> - 2003-11-15 00:49:51
|
On Fri, 2003-11-14 at 18:11, Neal Richter wrote:
> OK, so I forgot the #2 Question...
>
> > Questions:
> >
> > 1) If you run an htdump -w before and after the purge, do the db.docs
> > files differ?

No.

> 2) If you temporarily replace your start_url with the one you want to
> re-add, and rerun 'htdig -v -c xxxx' does it add it properly? Does for
> me.. and it shows up again in search results.

Not that either, unfortunately.

> I've got this kind of thing working in libhtdig & libhtdigphp. How
> attached are you to your current implementation?????

Not terribly. I can change it if needed. Here's what I'm doing:

I've got a couple of stored procedures in PostgreSQL that allow me to
query htdig databases and return indexes. Basically, htdig returns the
following type URLs:

http://newfind.mcgill.ca/indexes/ads/?AdsID=1026194

where the integer at the end of the URL is the primary key of the Ads
table in our database. This allows me to do things like this:

http://newfind.mcgill.ca/ads/?words=jazz+guitar

which is a nifty way of doing full text indexing in PostgreSQL. The only
alternative at the moment in Postgres is to use GiST indexes and
t_search, which is incredibly complex and so poorly documented that even
the core Postgres developers are unable to get it to work.

So, I've got a PL/Perl script (with a PL/pgSQL wrapper) in Postgres that
returns these integers (above) as rows. This means I can do the
following type query:

SELECT * FROM htsearch('"reasonable offer"', 'ads');

and it returns this:

  item_id   |  htdig_order
------------+---------------
  1014752   |       1
  1026970   |       2

All of this is working very nicely. The thing I want to do now is write
the stored procedures that get htdig to re-index a new item. If I can do
this in PHP, that's fine. Just tell me how to build PHP/HtDig with
libhtdigphp and how to use it and I'm there.

At the bottom of this email are the two stored procedures and data type
definitions if you are interested. The PL/pgSQL is a required wrapper
because at the moment, PL/Perl scripts can't return sets, only simpler
data types.

If anyone else is interested in this, I can send all the code and
documentation when it is completely built. I know that there are folks
on the Postgres list that are interested in this when it is done.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017


CREATE TYPE htdig AS (item_id int, htdig_order int);

CREATE OR REPLACE FUNCTION htdig(text, text) RETURNS SETOF htdig AS '
DECLARE
    result text[];
    low integer;
    high integer;
    item htdig%rowtype;
BEGIN
    result := htsearch($1,$2);
    low := 1;
    high := array_upper(result, 1);

    FOR i IN low..high LOOP
        item.item_id := result[i];
        item.htdig_order := i;
        RETURN NEXT item;
    END LOOP;
    RETURN;
END;
' LANGUAGE 'plpgsql' STABLE STRICT;

CREATE OR REPLACE FUNCTION htsearch(text, text) RETURNS text[] AS '
    my $SearchTerms = $_[0];
    my $DBName = $_[1];
    my @Result;
    my $Line;

    $DBName =~ s/[^a-z]//g; #dbname is only allowed letters
    $SearchTerms =~ s/['']/ /g; # remove single quotes (prevent SQL injection)

    open HTSEARCH, "/usr/local/htdig/bin/htsearch -c /usr/local/htdig/conf/${DBName}.conf ''config=${DBName};words=${SearchTerms};matchesperpage=1000;'' |";

    while(<HTSEARCH>) {
        $Line = $_;
        $Line =~ s/[^0-9-]//g;
        chomp($Line);
        push @Result, $Line;
    }
    close HTSEARCH;

    return qq/{/ . (join qq/,/, @Result) . qq/}/;
' LANGUAGE plperlu;
|
|
From: Neal R. <ne...@ri...> - 2003-11-15 01:15:33
|
On 14 Nov 2003, Christopher Murtagh wrote:
> On Fri, 2003-11-14 at 18:11, Neal Richter wrote:
> > OK, so I forgot the #2 Question...
> >
> > > Questions:
> > >
> > > 1) If you run an htdump -w before and after the purge, do the db.docs
> > > files differ?
>
> No.

Ack! This would imply that the 'purged document' is still returned in
the search results AFTER you run htpurge!! True????

I am assuming that you did something like this:

1) index pages
2) htdump -w
3) mv db.docs db.docs1
4) htpurge
5) htdump -w
6) mv db.docs db.docs2
7) diff db.docs1 db.docs2

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
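The seven steps listed in this message can be sketched as a small shell function. This is an editorial illustration, not something posted to the list. The variable names HTDUMP, HTPURGE, CONF and DBDIR are hypothetical, the default paths are the ones quoted earlier in the thread, and the sketch assumes that htdump writes its dump to db.docs inside the database directory, as the mv commands in the message suggest. Step 1 (the initial index) is assumed to have been done already.

```shell
#!/bin/sh
# Hypothetical settings; defaults mirror the paths quoted in the thread.
HTDUMP=${HTDUMP:-./htdump}
HTPURGE=${HTPURGE:-./htpurge}
CONF=${CONF:-/www/htdig/install/conf/ads.conf}
DBDIR=${DBDIR:-/www/htdig/install/var/ads}

# compare_purge URL
# Dump, purge one URL, dump again, then diff the two dumps.
# Assumption: each htdump run leaves its output in $DBDIR/db.docs.
compare_purge() {
    url=$1
    "$HTDUMP" -w -c "$CONF" || return 1                  # step 2
    mv "$DBDIR/db.docs" "$DBDIR/db.docs1" || return 1    # step 3
    "$HTPURGE" -c "$CONF" -u "$url" || return 1          # step 4
    "$HTDUMP" -w -c "$CONF" || return 1                  # step 5
    mv "$DBDIR/db.docs" "$DBDIR/db.docs2" || return 1    # step 6
    diff "$DBDIR/db.docs1" "$DBDIR/db.docs2"             # step 7
}

# Example call (commented out so that sourcing this file changes nothing):
# compare_purge 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825'
```

If the purge took effect, the diff should show the purged URL's record on one side only; an empty diff would point at the purge not having changed the document database.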
|
From: Christopher M. <chr...@mc...> - 2003-11-15 01:40:17
|
On Fri, 2003-11-14 at 20:12, Neal Richter wrote:
> Ack! This would imply that the 'purged document' is still returned in
> the search results AFTER you run htpurge!! True????
>
> I am assuming that you did something like this:
>
> 1) index pages
> 2) htdump -w
> 3) mv db.docs db.docs1
> 4) htpurge
> 5) htdump -w
> 6) mv db.docs db.docs2
> 7) diff db.docs1 db.docs2

Sorry, my bad. I had to do a fresh index first (I had already purged the
same one earlier today). After the fresh index, I did a dump, purged a
record, and diffed a second dump against the first. Here's what I got:

824a825
> 818 u:http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825 t:*** BASSIST WANTED *** \
a:0 m:1068859617 s:336 H: anyone out there play bass? we're a groove/funk\
/jazz/rock improv band with influences from medeski martin wood and bela fleck to phish, \
pink floyd and hendrix... anything and everything in between... improv skills would help...\
email fa...@ho... for details... h: l:1068859617 L:0 b:2 c:1 g:0\
e: n: S: d:1025825 A:
1357a1359
> 2 u:http://newfind.mcgill.ca/indexes/ads/ t: a:2 m:1068859603 s:112334 \
H: h: l:1068859604 L:1403 b:1 c:0 g:0e: n: S: d: \
A:

After the purge, it doesn't show up any more. Then after that, I tried
to re-index it by doing this:

[root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825' | ./htdig - -s -v \
-m -c /www/htdig/install/conf/ads.conf

ht://dig Start Time: Fri Nov 14 20:36:08 2003

New server: newfind.mcgill.ca, 80
0:11476:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825: (changed) size = 336
htdig: Run complete
htdig: 1 server seen:
htdig: newfind.mcgill.ca:80 1 document

HTTP statistics
===============
Persistent connections : Yes
HEAD call before GET : Yes
Connections opened : 2
Connections closed : 1
Changes of server : 0
HTTP Requests : 3
HTTP KBytes requested : 0.442383
HTTP Average request time : 0 secs
HTTP Average speed : inf KBytes/secs

ht://dig End Time: Fri Nov 14 20:36:08 2003

but it still doesn't show up in the search results (even after I changed
my start_url to be 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825').

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017
|
|
From: Gilles D. <gr...@sc...> - 2003-11-17 21:20:57
|
According to Neal Richter:
> 2) If you temporarily replace your start_url with the one you want to
> re-add, and rerun 'htdig -v -c xxxx' does it add it properly? Does for
> me.. and it shows up again in search results.
>
> This seems to indicate that there is a problem properly
> parsing/interpreting the command line options:
>
> ./htdig - -s -v -m -c
>
> I haven't debugged it yet... but it looks like:
>
> 1) This combination of command line options is not playing well together.

No, they won't play well when you don't follow the correct syntax.
The -m option MUST be followed by a file name, and this file must
be a list of one or more URLs to add to the index. The htdig.html
page is a tad misleading, as it shows [url_file] in brackets, which
would suggest the file is optional, but the description for -m in
http://www.htdig.org/dev/htdig-3.2/htdig.html says "Only index the URLs in
the file provided and no others." How will it get the URL(s) if you don't
provide a file? The description says nothing about reading from stdin.
(htdig 3.1.6 can read from stdin, if a "-" is given, but this is one
feature from 3.1.6 that I never got a chance to add to 3.2.0b5 before
the feature freeze.)

With the syntax above, htdig will try to open a file called "-c", which
it won't find, so it won't add any URLs to the index.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Neal R. <ne...@ri...> - 2003-11-17 21:55:14
|
> > 1) This combination of command line options is not playing well together.
>
> No, they won't play well when you don't follow the correct syntax.

Thanks Gilles... should we put the '-' stdin option on our list of
desired features for 3.2.0???

I was looking at the output of htdig.cc:usage(). There is currently no
output for the -m option.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gilles D. <gr...@sc...> - 2003-11-17 21:56:37
|
According to Neal Richter:
> > > 1) This combination of command line options is not playing well together.
> >
> > No, they won't play well when you don't follow the correct syntax.
>
> Thanks Gilles... should we put the '-' stdin option on our list of
> desired features for 3.2.0???
>
> I was looking at the output of htdig.cc:usage(). There is currently no
> output for the -m option.

Yes, we should deal with both of these issues, plus the confusing doc
entry, in the final release.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Christopher M. <chr...@mc...> - 2003-11-17 22:00:15
|
On Mon, 2003-11-17 at 16:20, Gilles Detillieux wrote:
> The -m option MUST be followed by a file name, and this file must
> be a list of one or more URLs to add to the index. The htdig.html
> page is a tad misleading, as it shows [url_file] in brackets, which
> would suggest the file is optional, but the description for -m in
> http://www.htdig.org/dev/htdig-3.2/htdig.html says "Only index the URLs in
> the file provided and no others." How will it get the URL(s) if you don't
> provide a file? The description says nothing about reading from stdin.
> (htdig 3.1.6 can read from stdin, if a "-" is given, but this is one
> feature from 3.1.6 that I never got a chance to add to 3.2.0b5 before
> the feature freeze.)

Hrm, a bit more than a tad misleading. This is what I have in the docs
that shipped with a 3.2 tarball, regarding having '-' for htdig:

'Get the list of URLs to start indexing from the STDIN. This will
override the default start_url and the file supplied to -m [url_file].'

http://lovelace.wcg.mcgill.ca/htdig/docs/ [htdig.html in frame]

Funny thing is that the URL that you provide also has this same
description, and definitely says it will read from STDIN.

However, htdig is getting the file. When I add 'v's it displays the
content/title and everything. It just doesn't add it to the index.

> With the syntax above, htdig will try to open a file called "-c", which
> it won't find, so it won't add any URLs to the index.

How hard would it be to add it? I suppose I could write the url to a
temporary file as well.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017
|
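The temporary-file idea mentioned at the end of this message could look roughly like the sketch below. It is an editorial illustration that has not been run against htdig 3.2.0b5. HTDIG_BIN, HTDIG_CONF and reindex_url are hypothetical names, the default paths are the ones quoted in the thread, and the -s and -v flags are simply carried over from the commands shown earlier. The only behaviour relied on is the documented one quoted above: -m takes a file listing the URLs to index.

```shell
#!/bin/sh
# Hypothetical settings; defaults mirror the paths quoted in the thread.
HTDIG_BIN=${HTDIG_BIN:-./htdig}
HTDIG_CONF=${HTDIG_CONF:-/www/htdig/install/conf/ads.conf}

# reindex_url URL
# Write the URL to a temporary file, hand that file to htdig via -m,
# then remove the file and return htdig's exit status.
reindex_url() {
    url=$1
    url_file=$(mktemp) || return 1
    printf '%s\n' "$url" > "$url_file"
    "$HTDIG_BIN" -s -v -m "$url_file" -c "$HTDIG_CONF"
    status=$?
    rm -f "$url_file"
    return $status
}

# Example call (commented out so that sourcing this file changes nothing):
# reindex_url 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825'
```

Because every option here has its argument and no lone "-" is involved, this form does not depend on how a given htdig version treats stdin.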
|
From: Gilles D. <gr...@sc...> - 2003-11-17 22:49:13
|
According to Christopher Murtagh:
> On Mon, 2003-11-17 at 16:20, Gilles Detillieux wrote:
> > The -m option MUST be followed by a file name, and this file must
> > be a list of one or more URLs to add to the index. The htdig.html
> > page is a tad misleading, as it shows [url_file] in brackets, which
> > would suggest the file is optional, but the description for -m in
> > http://www.htdig.org/dev/htdig-3.2/htdig.html says "Only index the URLs in
> > the file provided and no others." How will it get the URL(s) if you don't
> > provide a file? The description says nothing about reading from stdin.
> > (htdig 3.1.6 can read from stdin, if a "-" is given, but this is one
> > feature from 3.1.6 that I never got a chance to add to 3.2.0b5 before
> > the feature freeze.)
>
> Hrm, a bit more than a tad misleading. This is what I have in the docs
> that shipped with a 3.2 tarball, regarding having '-' for htdig:
>
> 'Get the list of URLs to start indexing from the STDIN. This will
> override the default start_url and the file supplied to -m [url_file].'
>
> http://lovelace.wcg.mcgill.ca/htdig/docs/ [htdig.html in frame]
>
> Funny thing is that the URL that you provide also has this same
> description, and definitely says it will read from STDIN.
>
> However, htdig is getting the file. When I add 'v's it displays the
> content/title and everything. It just doesn't add it to the index.

Well, right you are. And, in fact, this is true, but what the
documentation doesn't say is that the single "-" to get it to read
from stdin must be after all the other options. Otherwise, the "-"
causes htdig to stop scanning the argument list for option arguments,
so it wouldn't see your -c option (even if -m wasn't swallowing it!).
So, I'm guessing here that htdig is using the default htdig.conf file,
instead of the one you want, and so it ends up updating a different
database. Is this right? In any case, you need to follow the -m
with a filename, even if the final "-" overrides it.

The behaviour we actually want to shoot for is what 3.1.6 does, which
I think is much more consistent and logical (and better documented).
See http://www.htdig.org/htdig.html to see what it should be.

In the meantime, you should probably do something like this:

echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026232' |
./htdig -s -v -m foo -c /www/htdig/install/conf/ads.conf -

The "foo" will be ignored.

> > With the syntax above, htdig will try to open a file called "-c", which
> > it won't find, so it won't add any URLs to the index.
>
> How hard would it be to add it? I suppose I could write the url to a
> temporary file as well.

It shouldn't be hard to do. I just ran out of time earlier to do it
before the feature freeze, as it wasn't the highest priority thing to
tackle at the time (bug fixes came first). It should just take me an
hour or so to compare the 3.1.6 and 3.2.0b5 htdig/htdig.cc code to see
what changes are needed in the latter, then of course to code it, test
it, document it and commit it.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
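The two effects described in this message (a lone "-" ending option scanning, and -m consuming whatever follows it) can be reproduced with the shell's own getopts builtin. The sketch below is an editorial illustration and is not htdig's real parser. The option string "svm:c:" and the function name parse_args are assumptions modelled on the flags used in the thread: -s and -v take no argument, -m and -c each take one.

```shell
#!/bin/sh
# parse_args ARGS...
# Toy option parser that reports which config file and URL file it saw,
# plus whatever arguments were left over once option scanning stopped.
parse_args() {
    OPTIND=1
    config="(default htdig.conf)"
    url_file="(none)"
    while getopts "svm:c:" opt; do
        case "$opt" in
            m) url_file=$OPTARG ;;
            c) config=$OPTARG ;;
            *) ;;   # -s, -v and anything unknown: nothing to record
        esac
    done
    shift $((OPTIND - 1))
    printf 'config=%s url_file=%s rest=%s\n' "$config" "$url_file" "$*"
}

# Order used in the original report: the lone "-" comes first, so
# scanning stops at once and -c is never seen.
parse_args - -s -v -m -c /www/htdig/install/conf/ads.conf

# Without the leading "-": -m takes "-c" as its file name, and the
# config path is left over as a plain argument.
parse_args -s -v -m -c /www/htdig/install/conf/ads.conf

# Suggested order: -m gets a placeholder name, -c is parsed, "-" is last.
parse_args -s -v -m foo -c /www/htdig/install/conf/ads.conf -
```

In the first two calls the toy parser never sees a value for -c and so reports the default config. Only the third call picks up the custom config path, which matches the workaround given above.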
|
From: Neal R. <ne...@ri...> - 2003-11-18 01:39:52
|
> Well, right you are. And, in fact, this is true, but what the
> documentation doesn't say is that the single "-" to get it to read
> from stdin must be after all the other options. Otherwise, the "-"
> causes htdig to stop scanning the argument list for option arguments,
> so it wouldn't see your -c option (even if -m wasn't swallowing it!).
> So, I'm guessing here that htdig is using the default htdig.conf file,

This is the behavior that I saw... I purposely deleted my default
htdig.conf file on my system so that I'm absolutely sure what htconf
file is being used (my custom location).

And when I used Christopher's command line it complained about not
finding the default conf file.

Thanks Gilles.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Christopher M. <chr...@mc...> - 2003-11-18 02:34:20
|
On Mon, 2003-11-17 at 17:49, Gilles Detillieux wrote:
> Well, right you are. And, in fact, this is true, but what the
> documentation doesn't say is that the single "-" to get it to read
> from stdin must be after all the other options. Otherwise, the "-"
> causes htdig to stop scanning the argument list for option arguments,
> so it wouldn't see your -c option (even if -m wasn't swallowing it!).
> So, I'm guessing here that htdig is using the default htdig.conf file,
> instead of the one you want, and so it ends up updating a different
> database. Is this right? In any case, you need to follow the -m
> with a filename, even if the final "-" overrides it.
>
> The behaviour we actually want to shoot for is what 3.1.6 does, which
> I think is much more consistent and logical (and better documented).
> See http://www.htdig.org/htdig.html to see what it should be.
>
> In the meantime, you should probably do something like this:
>
> echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026232' |
> ./htdig -s -v -m foo -c /www/htdig/install/conf/ads.conf -
>
> The "foo" will be ignored.

Ahh, thank you for clarifying this and for your workaround. The above
line works, and I can re-index my URL! I spent so much time trying all
sorts of things to get this to work. Now I can move forward on what I
was doing.

Thanks again!

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017
|