|
From: Christopher M. <chr...@mc...> - 2003-11-18 02:34:20
|
On Mon, 2003-11-17 at 17:49, Gilles Detillieux wrote:
> Well, right you are. And, in fact, this is true, but what the
> documentation doesn't say is that the single "-" to get it to read
> from stdin must be after all the other options. Otherwise, the "-"
> causes htdig to stop scanning the argument list for option arguments,
> so it wouldn't see your -c option (even if -m wasn't swallowing it!).
> So, I'm guessing here that htdig is using the default htdig.conf file,
> instead of the one you want, and so it ends up updating a different
> database. Is this right? In any case, you need to follow the -m
> with a filename, even if the final "-" overrides it.
>
> The behaviour we actually want to shoot for is what 3.1.6 does, which
> I think is much more consistent and logical (and better documented).
> See http://www.htdig.org/htdig.html to see what it should be.
>
> In the meantime, you should probably do something like this:
>
> echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026232' |
> ./htdig -s -v -m foo -c /www/htdig/install/conf/ads.conf -
>
> The "foo" will be ignored.

Ahh, thank you for clarifying this and for your workaround. The above line works, and I can re-index my URL! I spent so much time trying all sorts of things to get this to work. Now I can move forward on what I was doing.

Thanks again!

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017 |
|
From: Neal R. <ne...@ri...> - 2003-11-18 01:39:52
|
> Well, right you are. And, in fact, this is true, but what the > documentation doesn't say is that the single "-" to get it to read > from stdin must be after all the other options. Otherwise, the "-" > causes htdig to stop scanning the argument list for option arguments, > so it wouldn't see your -c option (even if -m wasn't swallowing it!). > So, I'm guessing here that htdig is using the default htdig.conf file, This is the behavior that I saw... I purposely deleted my default htdig.conf file on my system so that I'm absolutely sure what htconf file is being used (my custom location). And when I used Christopher's command line it complained about not finding the default conf file. Thanks Gilles. Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |
|
From: Neal R. <ne...@ri...> - 2003-11-18 01:35:32
|
Ok here is what I have found:

The &#XXX; entities are OK inside the db files. They are munged improperly during htsearch display of the excerpt.

ResultFetch::hilight is the driving function, and it calls HtSGMLCodec::instance()->decode(s), which does the improper conversion:

™ --> &#153;

I think that this change to HtSGMLCodec.h looks important:

http://cvs.sourceforge.net/viewcvs.py/htdig/htdig/htcommon/HtSGMLCodec.h?r1=1.1&r2=1.1.2.1

Tue Mar 28 04:06:34 2000 UTC (3 years, 7 months ago) by ghutchis
"Differentiate between codec used for &foo; and numeric form &#nnn; Make sure encoding goes through both but decoding only goes through the preferred text form."

This part of the code is pretty cheese-whizzy, so attention Geoff! Any insights?

I am assuming that at some point this worked properly. I'll pound on it some more tomorrow. It looks like the populated replacements array in the myTextWordCodec object of the singleton HtSGMLCodec object is improperly done, or there is some problem in the order of calls.

HtSGMLCodec.cc

    // Similar to the HtWordCodec class. Each string may contain
    // zero or more of words from the lists. Here we need to run
    // it through two codecs because we might have two different forms
    inline String encode(const String &uncoded) const
    { return myTextWordCodec->encode(myNumWordCodec->encode(uncoded)); }

    // But we only want to decode into one form i.e. &foo; NOT &#nnn;
    String decode(const String &coded) const
    { return myTextWordCodec->decode(coded); }

Intuitively I would think that if encode is as above, then decode should be the reverse of encode:

    return myNumWordCodec->decode(myTextWordCodec->decode(coded));

But this makes the problem worse!

&#XXX; --> &amp;XXX;

Thanks.

Neal

On Mon, 17 Nov 2003, Neal Richter wrote:
>
> I am seeing some HTML entities show up in search result 'blurbs'.
>
> See below.
> Basically any entity of this form &#XXX; gets translated to &#XXX;
>
> ™ --> &#153;
>
> This only happens for numbered entities below 160.
>
>   -->
> © --> ©
> ® --> ®
>
> I'm digging for this code.. looks like
>
> Is there a fix for this in 3.1.X?? Anyone complain about this before????
>
> Thanks!
>
> Example Page:
>
> <HTML>
> <TITLE>Test page
> </TITLE>
> <BODY>
> <h1>HTDIG ™</h1>
> <h2>Use our software — to enhance your website</h2>
> <BR>
> HTDig ™ 3.2.0
> <BR>
>
> The ht://Dig system is a complete world wide web indexing and searching system
> for a domain or intranet.
>
> <BR>
> <BR>
> 1 ‹2 < 3
> <BR>
> © 2003 Neal Richter
> <BR>
> © 2003 HtDig Group
> </BODY>
> </HTML>
>
> Search results:
>
> nealr@westfork htdig-3.2.0b5-bin]$ cgi-bin/htsearch -c conf/htdig.conf
> Enter value for words: htdig
> Content-type: text/html
>
> Enter value for format: long
> <dl><dt><strong><a
> href="http://westfork.rightnow.com/data/test/test2.html">Test page
> </a></strong><img src="/htdig/star.gif" alt="*"><img src="/htdig/star.gif"
> alt="*"><img src="/htdig/star.gif" alt="*"><img src="/htdig/star.gif"
> alt="*">
> </dt><dd> <strong>HTDIG</strong> &#153; USE OUR SOFTWARE &#151; TO
> ENHANCE YOUR WEBSITE <strong>HTDig</strong> &#153; 3.2.0 The ht://Dig
> system is a complete world wide web indexing and searching system for a
> domain or intranet. 1 &#139;2 < 3 © 2003
> Neal Richter © 2003 <strong>HtDig</strong> Group <br>
> <em><a
> href="http://westfork.rightnow.com/data/test/test2.html">http://westfork.rightnow.com/data/test/test2.html</a></em>
> <font size="-1">11/17/03, 384 bytes</font>
> </dd></dl>
>
> Neal Richter
> Knowledgebase Developer
> RightNow Technologies, Inc.
> Customer Service for Every Web Site
> Office: 406-522-1485

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
|
From: Neal R. <ne...@ri...> - 2003-11-17 23:10:22
|
I am seeing some HTML entities show up in search result 'blurbs'.

See below. Basically any entity of this form &#XXX; gets translated to &#XXX;

™ --> &#153;

This only happens for numbered entities below 160.

  -->
© --> ©
® --> ®

I'm digging for this code.. looks like

Is there a fix for this in 3.1.X?? Anyone complain about this before????

Thanks!

Example Page:

<HTML>
<TITLE>Test page
</TITLE>
<BODY>
<h1>HTDIG ™</h1>
<h2>Use our software — to enhance your website</h2>
<BR>
HTDig ™ 3.2.0
<BR>

The ht://Dig system is a complete world wide web indexing and searching system
for a domain or intranet.

<BR>
<BR>
1 ‹2 < 3
<BR>
© 2003 Neal Richter
<BR>
© 2003 HtDig Group
</BODY>
</HTML>

Search results:

nealr@westfork htdig-3.2.0b5-bin]$ cgi-bin/htsearch -c conf/htdig.conf
Enter value for words: htdig
Content-type: text/html

Enter value for format: long
<dl><dt><strong><a href="http://westfork.rightnow.com/data/test/test2.html">Test page
</a></strong><img src="/htdig/star.gif" alt="*"><img src="/htdig/star.gif" alt="*"><img src="/htdig/star.gif" alt="*"><img src="/htdig/star.gif" alt="*">
</dt><dd> <strong>HTDIG</strong> &#153; USE OUR SOFTWARE &#151; TO ENHANCE YOUR WEBSITE <strong>HTDig</strong> &#153; 3.2.0 The ht://Dig system is a complete world wide web indexing and searching system for a domain or intranet. 1 &#139;2 < 3 © 2003 Neal Richter © 2003 <strong>HtDig</strong> Group <br>
<em><a href="http://westfork.rightnow.com/data/test/test2.html">http://westfork.rightnow.com/data/test/test2.html</a></em>
<font size="-1">11/17/03, 384 bytes</font>
</dd></dl>

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
|
From: Gilles D. <gr...@sc...> - 2003-11-17 22:49:13
|
According to Christopher Murtagh:
> On Mon, 2003-11-17 at 16:20, Gilles Detillieux wrote:
> > The -m option MUST be followed by a file name, and this file must
> > be a list of one or more URLs to add to the index. The htdig.html
> > page is a tad misleading, as it shows [url_file] in brackets, which
> > would suggest the file is optional, but the description for -m in
> > http://www.htdig.org/dev/htdig-3.2/htdig.html says "Only index the URLs in
> > the file provided and no others." How will it get the URL(s) if you don't
> > provide a file? The description says nothing about reading from stdin.
> > (htdig 3.1.6 can read from stdin, if a "-" is given, but this is one
> > feature from 3.1.6 that I never got a chance to add to 3.2.0b5 before
> > the feature freeze.)
>
> Hrm, a bit more than a tad misleading; this is what I have in the docs
> that shipped with a 3.2 tarball, regarding having '-' for htdig:
>
> 'Get the list of URLs to start indexing from the STDIN. This will
> override the default start_url and the file supplied to -m [url_file].'
>
> http://lovelace.wcg.mcgill.ca/htdig/docs/ [htdig.html in frame]
>
> Funny thing is that the URL that you provide also has this same
> description, and definitely says it will read from STDIN.
>
> However, htdig is getting the file. When I add 'v's it displays the
> content/title and everything. It just doesn't add it to the index.

Well, right you are. And, in fact, this is true, but what the documentation doesn't say is that the single "-" to get it to read from stdin must be after all the other options. Otherwise, the "-" causes htdig to stop scanning the argument list for option arguments, so it wouldn't see your -c option (even if -m wasn't swallowing it!). So, I'm guessing here that htdig is using the default htdig.conf file, instead of the one you want, and so it ends up updating a different database. Is this right? In any case, you need to follow the -m with a filename, even if the final "-" overrides it.

The behaviour we actually want to shoot for is what 3.1.6 does, which I think is much more consistent and logical (and better documented). See http://www.htdig.org/htdig.html to see what it should be.

In the meantime, you should probably do something like this:

echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026232' |
./htdig -s -v -m foo -c /www/htdig/install/conf/ads.conf -

The "foo" will be ignored.

> > With the syntax above, htdig will try to open a file called "-c", which
> > it won't find, so it won't add any URLs to the index.
>
> How hard would it be to add it? I suppose I could write the url to a
> temporary file as well.

It shouldn't be hard to do. I just ran out of time earlier to do it before the feature freeze, as it wasn't the highest priority thing to tackle at the time (bug fixes came first). It should just take me an hour or so to compare the 3.1.6 and 3.2.0b5 htdig/htdig.cc code to see what changes are needed in the latter, then of course to code it, test it, document it and commit it.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
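The argument-order problem described above can be illustrated with a small sketch. This is not the code in htdig/htdig.cc; it is a hypothetical, simplified scanner (the names `ScanResult` and `scanArgs` are invented here) that assumes getopt-like behaviour: scanning stops at the first non-option argument, and a lone "-" counts as one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Outcome of a simplified, getopt-like scan (hypothetical; not htdig's real code).
struct ScanResult {
    std::string configFile = "htdig.conf"; // stand-in for the compiled-in default
    std::string urlFile;                   // whatever argument -m consumed
    bool readStdin = false;                // set when a lone "-" ends the options
};

// Scan options left to right and stop at the first non-option argument.
// A lone "-" is a non-option, so nothing after it is parsed as an option.
ScanResult scanArgs(const std::vector<std::string>& args) {
    ScanResult r;
    std::size_t i = 0;
    while (i < args.size()) {
        const std::string& a = args[i];
        if (a == "-") { r.readStdin = true; break; }  // stop scanning here
        if (a.size() < 2 || a[0] != '-') break;       // first non-option argument
        if (a == "-c" && i + 1 < args.size()) {
            r.configFile = args[i + 1];               // -c takes the next argument
            i += 2;
        } else if (a == "-m" && i + 1 < args.size()) {
            r.urlFile = args[i + 1];                  // -m takes the next argument,
            i += 2;                                   // even if it looks like "-c"
        } else {
            ++i;                                      // plain flags such as -s, -v
        }
    }
    return r;
}
```

Under these assumptions, `- -s -v -m -c ads.conf` never reaches `-c`, `-s -v -m -c ads.conf` hands "-c" to `-m` as the file name, and `-s -v -m foo -c ads.conf -` picks up both the config file and stdin.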
|
From: Christopher M. <chr...@mc...> - 2003-11-17 22:00:15
|
On Mon, 2003-11-17 at 16:20, Gilles Detillieux wrote:
> The -m option MUST be followed by a file name, and this file must
> be a list of one or more URLs to add to the index. The htdig.html
> page is a tad misleading, as it shows [url_file] in brackets, which
> would suggest the file is optional, but the description for -m in
> http://www.htdig.org/dev/htdig-3.2/htdig.html says "Only index the URLs in
> the file provided and no others." How will it get the URL(s) if you don't
> provide a file? The description says nothing about reading from stdin.
> (htdig 3.1.6 can read from stdin, if a "-" is given, but this is one
> feature from 3.1.6 that I never got a chance to add to 3.2.0b5 before
> the feature freeze.)

Hrm, a bit more than a tad misleading; this is what I have in the docs that shipped with a 3.2 tarball, regarding having '-' for htdig:

'Get the list of URLs to start indexing from the STDIN. This will override the default start_url and the file supplied to -m [url_file].'

http://lovelace.wcg.mcgill.ca/htdig/docs/ [htdig.html in frame]

Funny thing is that the URL that you provide also has this same description, and definitely says it will read from STDIN.

However, htdig is getting the file. When I add 'v's it displays the content/title and everything. It just doesn't add it to the index.

> With the syntax above, htdig will try to open a file called "-c", which
> it won't find, so it won't add any URLs to the index.

How hard would it be to add it? I suppose I could write the url to a temporary file as well.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017 |
|
From: Gilles D. <gr...@sc...> - 2003-11-17 21:56:37
|
According to Neal Richter: > > > 1) This combination of command line options is not playing well together. > > > > No, they won't play well when you don't follow the correct syntax. > > Thanks Gilles... should we put the '-' stdin option on our list of > desired features for 3.2.0??? > > I was looking at the output of htdig.cc:usage(). There is currently no output for the -m option. Yes, we should deal with both of these issues, plus the confusing doc entry, in the final release. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
|
From: Neal R. <ne...@ri...> - 2003-11-17 21:55:14
|
> > 1) This combination of command line options is not playing well together. > > No, they won't play well when you don't follow the correct syntax. Thanks Gilles... should we put the '-' stdin option on our list of desired features for 3.2.0??? I was looking at the output of htdig.cc:usage(). There is currently no output for the -m option. Thanks. Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |
|
From: Gilles D. <gr...@sc...> - 2003-11-17 21:54:33
|
According to BOOTH, Nicholas, FM:
> I've noticed the following which may, or may not be "real" bugs when testing
> the new Beta version:

It's usually best to deal with these one at a time, as you're unlikely to find one person who can answer all questions at once. You can always try again if any of your questions slip through the cracks.

> 1] The htdig.conf file seems to be very, very sensitive to whitespace at the
> end of lines. In particular, with a multiline attribute as illustrated just
> below, if there is white space (tested with [tab]s) after the \ character,
> htdig _and_ htsearch will fail:
>
> server_aliases: www.cbfm.rbs.co.uk=www.cbfm.rbsgrp.net \
>                 www.cib.rbs.co.uk=www.cib.rbsgrp.net

This isn't really a bug, but rather it's pretty much standard behaviour among most Unix-like utilities I've seen that use backslash for multi-line configuration definitions. It's actually pretty logical when you stop and think about it. The backslash escapes the character IMMEDIATELY following it, somehow altering its meaning (usually to include it literally). So, <backslash><newline> would mean take the newline in as part of the definition instead of its usual meaning of the end of the definition. If you go and stick a space or tab after the backslash, then you're escaping that space or tab, not the newline.

> 3] If there is _not_ a return after the last line in the config file then
> htsearch causes a cgi error. Results from apache error log:
>
> Unknown char in line 224: #[Fri Nov 14 23:51:46 2003] [error] [client
> 147.114.74.200] malformed header from script. Bad header=syntax error:
> /var/www/cgi-bin/htsearch32

That's because you're using the wrong editor. If you use vi, it will ensure that the last line ends with a newline. ;-)

Seriously, in 3.2 we moved from a simple format parsed directly in some C++ code, to a more elaborate format allowing blocks of config attributes. That required us going with a more complex parser written in flex and bison. I don't know how much control we have over how these will deal with an improperly terminated final line -- it may be out of our hands. I'll see if I can figure something out.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
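The backslash-newline rule explained above can be sketched in a few lines. This is only an illustration of the general convention, not the flex/bison parser that 3.2 uses; `joinContinuations` is a hypothetical helper name. A physical line is continued only when the backslash is the very last character before the newline, so a trailing tab or space defeats it.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Join physical lines into logical lines. A line continues onto the next one
// only when the backslash is the last character before the newline.
std::vector<std::string> joinContinuations(const std::string& text) {
    std::vector<std::string> logical;
    std::istringstream in(text);
    std::string line;
    std::string current;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\\') {
            line.pop_back();                  // drop the backslash and keep going
            current += line;
        } else {
            logical.push_back(current + line); // definition ends on this line
            current.clear();
        }
    }
    if (!current.empty()) {
        logical.push_back(current);           // file ended in mid-continuation
    }
    return logical;
}
```

With a clean `\` at the end of the first line, the two physical lines of a `server_aliases` definition come back as one logical line; with a tab after the backslash they come back as two, and the second one is no longer a valid attribute definition.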
|
From: Gilles D. <gr...@sc...> - 2003-11-17 21:28:05
|
According to Joe R. Jah: > On Fri, 14 Nov 2003, Gilles Detillieux wrote: > > endings and synonyms are generated a bit differently, and most importantly > > are built in a temporary spot then moved into place, so they don't collide > > with any existing databases until the new one is complete. soundex and > > accents use the same code as metaphone for writing the database from the > > generated word list, so they'd all potentially have the same problem. > > The scrambling of data may not have been bad enough to cause a segfault > > (yet) but may have led to corrupt databases. If you had existing > > databases for accents and/or soundex in place, built by 3.1.x, before > > regenerating them for 3.2.0b5, you should remove them and try again, > > just as for metaphone. > > Thanks; I see. It'd be nice if it could be made to quit, spitting a > warning, instead of dumping core. Better still, it should probably truncate or unlink an existing database before proceeding, especially if there's a version number mismatch. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
|
From: Gilles D. <gr...@sc...> - 2003-11-17 21:20:57
|
According to Neal Richter:
> 2) If you temporarily replace your start_url with the one you want to
> re-add, and rerun 'htdig -v -c xxxx' does it add it properly? Does for
> me.. and it shows up again in search results.
>
> This seems to indicate that there is a problem properly
> parsing/interpreting the command line options:
>
> ./htdig - -s -v -m -c
>
> I haven't debugged it yet... but it looks like:
>
> 1) This combination of command line options is not playing well together.

No, they won't play well when you don't follow the correct syntax. The -m option MUST be followed by a file name, and this file must be a list of one or more URLs to add to the index. The htdig.html page is a tad misleading, as it shows [url_file] in brackets, which would suggest the file is optional, but the description for -m in http://www.htdig.org/dev/htdig-3.2/htdig.html says "Only index the URLs in the file provided and no others." How will it get the URL(s) if you don't provide a file? The description says nothing about reading from stdin. (htdig 3.1.6 can read from stdin, if a "-" is given, but this is one feature from 3.1.6 that I never got a chance to add to 3.2.0b5 before the feature freeze.)

With the syntax above, htdig will try to open a file called "-c", which it won't find, so it won't add any URLs to the index.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
|
From: Jean-Sebastien M. <jsm...@mv...> - 2003-11-17 14:35:16
|
Developers,

I found a bug in v3.1.6, and probably in all later versions too. Here it is:

If you enter a "restrict" value in the URL for htsearch (not in the config file), it will be compared UNENCODED to the ENCODED URLs in htdig's database. For example, the following query:

http://www.mvpix.com/cgi-bin/perl/search?words=%2A&restrict=/photos/021/Netherland%20Antilles/Bonaire/Places/Urban/&method=and&sort=date&format=short

Will never match:

http://www.mvpix.com/photos/021/Netherland%20Antilles/Bonaire/Places/Urban/Industry/20030511-062204.jpg.html

I've fixed htsearch temporarily with the following code, but some thought probably should be given on how to address this. I suspect the solution is to compare both strings in their unencoded form. My snippet:

root@dent:/mnt/lan/src/htdig-3.1.6$ diff htsearch/htsearch.cc-orig htsearch/htsearch.cc
23a24
> #include "URL.h"
169,170c170,174
<     if (input.exists("restrict"))
<         config.Add("restrict", input["restrict"]);
---
>     if (input.exists("restrict")) {
>         String restrict_url = input["restrict"];
>         encodeURL(restrict_url, "-_./");
>         config.Add("restrict", restrict_url);
>     }
root@dent:/mnt/lan/src/htdig-3.1.6$

Another side-effect of using 'config.Add("restrict", input["restrict"]);' un-encoded is that any spaces will be treated as ORs later on by this line: 'urllist.Create(config["restrict"], "| \t\r\n\001");'.

BTW, this same bug affects the "exclude" value too.

Thanks,
js.

On Sun, Nov 16, 2003 at 11:15:09PM -0500, Jean-Sebastien Morisset wrote:
> Guys,
>
> Shouldn't the following change to v3.1.6 work?
>
> ---START---
>
> root@dent:/mnt/lan/src/htdig-3.1.6$ diff htsearch/htsearch.cc-orig htsearch/htsearch.cc
> 220c220
> < urllist.Create(config["restrict"], "| \t\r\n\001");
> ---
> > urllist.Create(config["restrict"], "|\t\r\n\001");
>
> ---END---
>
> It seems to have fixed the OR problem, but now I'm not getting any
> matches. I've added "<!--RESTRICT:$(RESTRICT)-->" to the nomatch.html
> file, and here is what it gives me:
>
> <!--RESTRICT:/photos/021/Netherland Antilles-->
>
> So it appears the space made it in there, but I don't understand why
> htsearch isn't matching the URLs with it.
>
> Any ideas? I've tried a whole bunch of things, but nothing has worked so
> far...
>
> BTW, here's a snippet from rundig showing the URLs it should match:
>
> 307:307:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Transportation/Flying/: **-*-*******-*****-********- size = 6914
> 308:308:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Transportation/Automobiles/: **-*-*******-*****-********- size = 6934
> 309:309:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Objects/Industrial/: **-*-*******-*****-********- size = 6895
> 310:310:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Objects/Still%20Life/: **-*-*******-*****-********- size = 6898
>
> Thanks,
> js.
>
> On Sun, Nov 16, 2003 at 05:10:19PM -0500, Jean-Sebastien Morisset wrote:
> > Hi,
> >
> > I'm trying to use a restrict value with spaces - for example:
> >
> > restrict=/photos/021/Netherland%20Antilles/Bonaire/
> >
> > Unfortunately, htdig v3.1.6 reads this as "/photos/021/Netherland" OR
> > "Antilles/Bonaire/" when I would like it to read it as a single string.
> > Is there a way to have it treat spaces as part of the string?

--
Jean-Sebastien Morisset, Sr. UNIX Administrator <jsm...@mv...>
Personal Home Page <http://jsmoriss.mvlan.net/>
JS & Melanie's Homebrewery <http://brewery.mvlan.net/>
Underwater and Travel Photographs <http://www.mvpix.com/> |
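The mismatch reported above (a decoded restrict value compared against percent-encoded stored URLs) can be reproduced outside ht://Dig with a short sketch. The helpers `percentEncode` and `matchesRestrict` are hypothetical and written for this illustration only; they mirror the idea of the `encodeURL(restrict_url, "-_./")` call in the patch, not its implementation, and they assume the CGI layer has already decoded the value once.

```cpp
#include <cctype>
#include <string>

// Percent-encode every byte that is neither alphanumeric nor listed in `safe`.
// Hypothetical helper; it only mirrors the idea of encodeURL(value, "-_./").
std::string percentEncode(const std::string& in, const std::string& safe = "-_./") {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || safe.find(static_cast<char>(c)) != std::string::npos) {
            out += static_cast<char>(c);   // keep safe characters as they are
        } else {
            out += '%';                    // e.g. ' ' becomes "%20"
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Substring test used as a stand-in for the restrict comparison: the decoded
// CGI value is encoded first so both sides are in the same form.
bool matchesRestrict(const std::string& storedUrl, const std::string& restrictValue) {
    return storedUrl.find(percentEncode(restrictValue)) != std::string::npos;
}
```

One caveat of this direction: a value that is already encoded would have its "%" encoded again, which is one reason comparing both strings in decoded form, as suggested in the message, may be the more robust fix.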
|
From: Jean-Sebastien M. <jsm...@mv...> - 2003-11-17 04:15:58
|
Guys,

Shouldn't the following change to v3.1.6 work?

---START---

root@dent:/mnt/lan/src/htdig-3.1.6$ diff htsearch/htsearch.cc-orig htsearch/htsearch.cc
220c220
< urllist.Create(config["restrict"], "| \t\r\n\001");
---
> urllist.Create(config["restrict"], "|\t\r\n\001");

---END---

It seems to have fixed the OR problem, but now I'm not getting any matches. I've added "<!--RESTRICT:$(RESTRICT)-->" to the nomatch.html file, and here is what it gives me:

<!--RESTRICT:/photos/021/Netherland Antilles-->

So it appears the space made it in there, but I don't understand why htsearch isn't matching the URLs with it.

Any ideas? I've tried a whole bunch of things, but nothing has worked so far...

BTW, here's a snippet from rundig showing the URLs it should match:

307:307:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Transportation/Flying/: **-*-*******-*****-********- size = 6914
308:308:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Transportation/Automobiles/: **-*-*******-*****-********- size = 6934
309:309:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Objects/Industrial/: **-*-*******-*****-********- size = 6895
310:310:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Objects/Still%20Life/: **-*-*******-*****-********- size = 6898

Thanks,
js.

On Sun, Nov 16, 2003 at 05:10:19PM -0500, Jean-Sebastien Morisset wrote:
> Hi,
>
> I'm trying to use a restrict value with spaces - for example:
>
> restrict=/photos/021/Netherland%20Antilles/Bonaire/
>
> Unfortunately, htdig v3.1.6 reads this as "/photos/021/Netherland" OR
> "Antilles/Bonaire/" when I would like it to read it as a single string.
> Is there a way to have it treat spaces as part of the string?

--
Jean-Sebastien Morisset, Sr. UNIX Administrator <jsm...@mv...>
Personal Home Page <http://jsmoriss.mvlan.net/>
JS & Melanie's Homebrewery <http://brewery.mvlan.net/>
Underwater and Travel Photographs <http://www.mvpix.com/> |
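To see why the separator string matters, here is a minimal sketch of splitting a value on a set of separator characters. `splitOnAny` is a hypothetical stand-in written for this illustration; it is not the list class behind `urllist.Create`, and it assumes only that the second argument is treated as a set of single-character separators.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split `value` on any character found in `separators`, skipping empty pieces.
std::vector<std::string> splitOnAny(const std::string& value,
                                    const std::string& separators) {
    std::vector<std::string> parts;
    std::size_t start = value.find_first_not_of(separators);
    while (start != std::string::npos) {
        // `end` is npos for the last piece; substr then takes the rest.
        std::size_t end = value.find_first_of(separators, start);
        parts.push_back(value.substr(start, end - start));
        start = value.find_first_not_of(separators, end);
    }
    return parts;
}
```

With the space included in the separator set, "/photos/021/Netherland Antilles" becomes two alternatives; with the space removed it stays one pattern, which matches the observation that the OR problem went away. The single pattern still contains a literal space, though, so it would not match stored URLs that contain "%20".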
|
From: Gabriele B. <bar...@in...> - 2003-11-16 22:29:55
|
At 22.14 16/11/2003 +0000, BOOTH, Nicholas, FM wrote: >Lastly, are the cookies.txt mechanism and check_unique_md5 actually known >to work? I answer to the 'cookies.txt' question. I programmed it, trying to follow the standards and it should work fine. Please if you find any bug, let us know. Thanks a lot for your precious info. Ciao, -Gabriele -- Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer Current Location: Melbourne, Victoria, Australia bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno |
|
From: BOOTH, N. F. <Nic...@rb...> - 2003-11-16 22:15:22
|
I've noticed the following which may, or may not be "real" bugs when testing the new Beta version:

1] The htdig.conf file seems to be very, very sensitive to whitespace at the end of lines. In particular, with a multiline attribute as illustrated just below, if there is white space (tested with [tab]s) after the \ character, htdig _and_ htsearch will fail:

server_aliases: www.cbfm.rbs.co.uk=www.cbfm.rbsgrp.net \
                www.cib.rbs.co.uk=www.cib.rbsgrp.net

2] I can't seem to get any sensible changes to results with htsearch using url_seed_score:

url_seed_score: cbfm|fmintranet|cib. *500,+1000 \
                manufacturing.|retail|technology.|wealthmanagement.|rbs.|group *.1,

Even stupidly high factors don't seem to have an effect (like 100,000). (Tried with and without commas and spaces separating values.)

3] If there is _not_ a return after the last line in the config file then htsearch causes a cgi error. Results from apache error log:

Unknown char in line 224: #[Fri Nov 14 23:51:46 2003] [error] [client 147.114.74.200] malformed header from script. Bad header=syntax error: /var/www/cgi-bin/htsearch32

4] If you search for a phrase and it forms part of a longer string then the results are not highlighted in the extract displayed. This is most apparent when the second word is singular, but it finds a plural result.

Search for "animal feedstuff"
finds "animal feedstuff"s --- no highlight
finds "animal feedstuff" --- highlight as expected

Hope this makes sense!

Lastly, are the cookies.txt mechanism and check_unique_md5 actually known to work?

Running 3.2.0b5 on:
Linux lon3561xus 2.4.9-31smp #1 SMP Tue Feb 26 06:55:00 EST 2002 i686 unknown

It has happily indexed a multi-server intranet with about <50k pages, including parsing PDFs and Word docs - but, as ever, seems limited by my web server responses/network latency, so this took over 18 hours.

I'm really very happy with what I've seen so far - especially the phrase search which is crucial for me to keep this product in place.

Best regards

Nicholas Booth
Royal Bank of Scotland, Corporate Banking
280 Bishopsgate
London |
|
From: Joe R. J. <jj...@cl...> - 2003-11-15 04:09:04
|
On Fri, 14 Nov 2003, Gilles Detillieux wrote:
> Date: Fri, 14 Nov 2003 12:08:00 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: "ht://Dig developers list" <htd...@li...>
> Subject: Re: [htdig-dev] 3.2.0b5 Testing
>
> According to Joe R. Jah:
> > On Thu, 13 Nov 2003, Gilles Detillieux wrote:
> > > According to Joe R. Jah:
> > > > htfuzzy metaphone dumps core, but it works fine with endings, etc.
> > >
> > > Just a hunch, but I'd guess that you had a metaphone database left over
> > > from 3.1.6, and that the newer DB code in 3.2.0b5 doesn't like it. We
> > > had put some tests for this in some of the other programs, but maybe not
> > > this one. Try deleting the databases and running htfuzzy metaphone again.
> > > The same would probably go for soundex and accents databases, if you use
> > > either of those.
> >
> > Spot on! I had an older db.metaphone.db left over; none of the other
> > supported algorithms had any problem:
> > soundex
> > accents
> > endings
> > synonyms
>
> endings and synonyms are generated a bit differently, and most importantly
> are built in a temporary spot then moved into place, so they don't collide
> with any existing databases until the new one is complete. soundex and
> accents use the same code as metaphone for writing the database from the
> generated word list, so they'd all potentially have the same problem.
> The scrambling of data may not have been bad enough to cause a segfault
> (yet) but may have led to corrupt databases. If you had existing
> databases for accents and/or soundex in place, built by 3.1.x, before
> regenerating them for 3.2.0b5, you should remove them and try again,
> just as for metaphone.
Thanks; I see. It'd be nice if it could be made to quit, spitting a
warning, instead of dumping core.
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
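
For readers wondering why endings and synonyms escaped the problem: building the new database in a temporary spot and then moving it into place means a stale file is never opened for update. The following is a minimal sketch of that general pattern in Python. It is not htfuzzy's code; the function name and the one-word-per-line file format are assumptions made for illustration.

```python
import os
import tempfile


def write_database_atomically(path, records):
    """Build the new file beside `path`, then rename it over the old one.

    Illustrative only: `records` is just a list of words written one per
    line, standing in for whatever a real database would contain.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the work file in the same directory so the rename stays on
    # one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".work")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for word in records:
                handle.write(word + "\n")
        # The old file is replaced in a single step; it is never read or
        # partially overwritten.
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```

With this approach an incompatible old file is simply discarded rather than interpreted, which is presumably why only the databases written in place were at risk.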
|
|
From: Lachlan A. <lh...@us...> - 2003-11-15 03:28:10
|
Greetings Neil,

I don't know 3.1.6 well, but I think the answer is "yes, much". 3.2
has an entry for every occurrence of every word in every document.

One of my (long term) plans is to put in the option to record words
only once per document like 3.1.6 did. That obviously means phrase
searching is impossible, but I could imagine lots of people accepting
the tradeoff. Neal Richter is also planning to optimise the database
format in due course.

Cheers,
Lachlan

On Sat, 15 Nov 2003 08:53, Neil Kohl wrote:
> i/o problems haven't been an issue with 3.1.6. Does 3.2 do more
> writing?

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
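To make the tradeoff concrete, here is a toy comparison in Python of the two indexing strategies described above. It is only a sketch: the data structures are plain dictionaries, not the ht://Dig word database, and all function names are invented for this example.

```python
from collections import defaultdict


def index_every_occurrence(documents):
    """Map word -> list of (doc_id, position); one entry per occurrence."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index


def index_once_per_document(documents):
    """Map word -> set of doc_ids; one entry per word per document."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def phrase_match(index, first, second):
    """Documents where `second` directly follows `first`.

    This needs word positions, so it only works on the per-occurrence index.
    """
    following = set(index.get(second, []))
    return {doc for doc, pos in index.get(first, []) if (doc, pos + 1) in following}


docs = {1: "animal feedstuff for animal farms", 2: "feedstuff animal"}
full = index_every_occurrence(docs)
compact = index_once_per_document(docs)
full_entries = sum(len(entries) for entries in full.values())
compact_entries = sum(len(entries) for entries in compact.values())
print(full_entries, compact_entries, phrase_match(full, "animal", "feedstuff"))
```

The per-occurrence index holds more entries (and so costs more writes), but only it can tell that document 1 contains the phrase while document 2 merely contains both words.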
|
From: Christopher M. <chr...@mc...> - 2003-11-15 01:40:17
|
On Fri, 2003-11-14 at 20:12, Neal Richter wrote:
> Ack! This would imply that the 'purged document' is still returned in
> the search results AFTER you run htpurge!! True????
>
> I am assuming that you did something like this:
>
> 1) index pages
> 2) htdump -w
> 3) mv db.docs db.docs1
> 4) htpurge
> 5) htdump -w
> 6) mv db.docs db.docs2
> 7) diff db.docs1 db.docs2

Sorry, my bad. I had to do a fresh index first (I had already purged the
same one earlier today). After the fresh index, I did a dump, purged a
record and diffed the second dump. Here's what I got:

824a825
> 818 u:http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825 t:*** BASSIST WANTED *** \
a:0 m:1068859617 s:336 H: anyone out there play bass? we're a groove/funk\
/jazz/rock improv band with influences from medeski martin wood and bela fleck to phish, \
pink floyd and hendrix... anything and everything in between... improv skills would help...\
email fa...@ho... for details... h: l:1068859617 L:0 b:2 c:1 g:0\
e: n: S: d:1025825 A:
1357a1359
> 2 u:http://newfind.mcgill.ca/indexes/ads/ t: a:2 m:1068859603 s:112334 \
H: h: l:1068859604 L:1403 b:1 c:0 g:0e: n: S: d: \
A:

After the purge, it doesn't show up any more.

Then after that, I tried to re-index it by doing this:

[root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825' | ./htdig - -s -v \
-m -c /www/htdig/install/conf/ads.conf

ht://dig Start Time: Fri Nov 14 20:36:08 2003

New server: newfind.mcgill.ca, 80
0:11476:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825: (changed) size = 336
htdig: Run complete
htdig: 1 server seen:
htdig: newfind.mcgill.ca:80 1 document

HTTP statistics
===============
Persistent connections : Yes
HEAD call before GET : Yes
Connections opened : 2
Connections closed : 1
Changes of server : 0
HTTP Requests : 3
HTTP KBytes requested : 0.442383
HTTP Average request time : 0 secs
HTTP Average speed : inf KBytes/secs

ht://dig End Time: Fri Nov 14 20:36:08 2003

but it still doesn't show up in the search results (even after I changed
my start_url to be 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825').

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017
|
|
From: Neal R. <ne...@ri...> - 2003-11-15 01:15:33
|
On 14 Nov 2003, Christopher Murtagh wrote:
> On Fri, 2003-11-14 at 18:11, Neal Richter wrote:
> > OK, so I forgot the #2 Question...
> >
> > > Questions:
> > >
> > > 1) If you run an htdump -w before and after the purge, do the db.docs
> > > files differ?
>
> No.

Ack! This would imply that the 'purged document' is still returned in
the search results AFTER you run htpurge!! True????

I am assuming that you did something like this:

1) index pages
2) htdump -w
3) mv db.docs db.docs1
4) htpurge
5) htdump -w
6) mv db.docs db.docs2
7) diff db.docs1 db.docs2

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
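For anyone repeating the before/after comparison in steps 2) to 7), a raw diff can be noisy when other fields change between dumps. Here is a rough Python sketch that compares only the URLs. It assumes (this is an assumption, not documented behaviour) that each record in the dump is a single line containing a u:<url> field; the function names are invented for this example.

```python
import re

# Assumed record layout: whitespace-separated fields, one of which is u:<url>.
URL_FIELD = re.compile(r"\bu:(\S+)")


def urls_in_dump(dump_text):
    """Collect the u: field of every record line in a dump."""
    urls = set()
    for line in dump_text.splitlines():
        match = URL_FIELD.search(line)
        if match:
            urls.add(match.group(1))
    return urls


def purged_urls(before_text, after_text):
    """URLs present in the first dump but missing from the second."""
    return urls_in_dump(before_text) - urls_in_dump(after_text)


before = (
    "818\tu:http://example.com/ads/?AdsID=1\tt:one\n"
    "2\tu:http://example.com/ads/\tt:\n"
)
after = "2\tu:http://example.com/ads/\tt:\n"
print(purged_urls(before, after))
```

An empty result after an htpurge run would suggest the record is still in the document database.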
|
From: Christopher M. <chr...@mc...> - 2003-11-15 00:49:51
|
On Fri, 2003-11-14 at 18:11, Neal Richter wrote:
> OK, so I forgot the #2 Question...
>
> > Questions:
> >
> > 1) If you run an htdump -w before and after the purge, do the db.docs
> > files differ?

No.

> 2) If you temporarily replace your start_url with the one you want to
> re-add, and rerun 'htdig -v -c xxxx' does it add it properly? Does for
> me.. and it shows up again in search results.

Not that either, unfortunately.

> I've got this kind of thing working in libhtdig & libhtdigphp. How
> attached are you to your current implementation?????

Not terribly. I can change it if needed. Here's what I'm doing:

I've got a couple of stored procedures in PostgreSQL that allow me to
query htdig databases and return indexes. Basically, htdig returns URLs
of the following type:

http://newfind.mcgill.ca/indexes/ads/?AdsID=1026194

where the integer at the end of the URL is the primary key of the Ads
table in our database. This allows me to do things like this:

http://newfind.mcgill.ca/ads/?words=jazz+guitar

which is a nifty way of doing full text indexing in PostgreSQL. The only
alternative at the moment in Postgres is to use GiST indexes and
t_search, which is incredibly complex and so poorly documented that even
the core Postgres developers are unable to get it to work.

So, I've got a PL/Perl script (with a PL/pgSQL wrapper) in Postgres that
returns these integers (above) as rows. This means I can do the
following type of query:

SELECT * FROM htsearch('"reasonable offer"', 'ads');

and it returns this:

  item_id | htdig_order
----------+-------------
  1014752 |           1
  1026970 |           2

All of this is working very nicely. The thing I want to do now is write
the stored procedures that get htdig to re-index a new item. If I can do
this in PHP, that's fine. Just tell me how to build PHP/HtDig with
libhtdigphp and how to use it and I'm there.

At the bottom of this email are the two stored procedures and data type
definitions if you are interested. The PL/pgSQL is a required wrapper
because at the moment, PL/Perl scripts can't return sets, only simpler
data types.

If anyone else is interested in this, I can send all the code and
documentation when it is completely built. I know that there are folks
on the Postgres list that are interested in this when it is done.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

CREATE TYPE htdig AS (item_id int, htdig_order int);

CREATE OR REPLACE FUNCTION htdig(text, text) RETURNS SETOF htdig AS '
DECLARE
  result text[];
  low integer;
  high integer;
  item htdig%rowtype;
BEGIN
  result := htsearch($1,$2);
  low := 1;
  high := array_upper(result, 1);
  FOR i IN low..high LOOP
    item.item_id := result[i];
    item.htdig_order := i;
    RETURN NEXT item;
  END LOOP;
  RETURN;
END;
' LANGUAGE 'plpgsql' STABLE STRICT;

CREATE OR REPLACE FUNCTION htsearch(text, text) RETURNS text[] AS '
  my $SearchTerms = $_[0];
  my $DBName = $_[1];
  my @Result;
  my $Line;
  $DBName =~ s/[^a-z]//g; #dbname is only allowed letters
  $SearchTerms =~ s/['']/ /g; # remove single quotes (prevent SQL injection)
  open HTSEARCH, "/usr/local/htdig/bin/htsearch -c /usr/local/htdig/conf/${DBName}.conf ''config=${DBName};words=${SearchTerms};matchesperpage=1000;'' |";
  while(<HTSEARCH>) {
    $Line = $_;
    $Line =~ s/[^0-9-]//g;
    chomp($Line);
    push @Result, $Line;
  }
  close HTSEARCH;
  return qq/{/ . (join qq/,/, @Result) . qq/}/;
' LANGUAGE plperlu;
|
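One thing worth noting about the PL/Perl function above is that the search terms end up inside a shell command line. As a point of comparison only, here is a Python sketch of the same idea that passes htsearch an argument list instead of a shell string and pulls the AdsID values out of the result URLs. The paths and the query-string layout are copied from the function above; the function names, and the assumption that the output contains URLs ending in ?AdsID=<number>, are mine.

```python
import re
import subprocess


def build_htsearch_command(db_name, search_terms,
                           binary="/usr/local/htdig/bin/htsearch"):
    """Return an argv list; no shell is involved, so quotes need no escaping."""
    safe_db = re.sub(r"[^a-z]", "", db_name)  # dbname is only allowed letters
    query = f"config={safe_db};words={search_terms};matchesperpage=1000;"
    return [binary, "-c", f"/usr/local/htdig/conf/{safe_db}.conf", query]


def extract_item_ids(output):
    """Return (item_id, rank) pairs in the order the results appear.

    Assumes result URLs of the form ...?AdsID=<integer>.
    """
    ids = re.findall(r"[?&]AdsID=(\d+)", output)
    return [(int(item_id), rank) for rank, item_id in enumerate(ids, start=1)]


def run_htsearch(db_name, search_terms):
    """Run htsearch and parse its output (requires htsearch to be installed)."""
    completed = subprocess.run(
        build_htsearch_command(db_name, search_terms),
        capture_output=True, text=True, check=True,
    )
    return extract_item_ids(completed.stdout)
```

This does not sanitise characters such as ";" that are meaningful inside the query string itself, so it is a starting point rather than a drop-in replacement.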
|
From: Neal R. <ne...@ri...> - 2003-11-14 23:15:01
|
OK, so I forgot the #2 Question...
> Questions:
>
> 1) If you run an htdump -w before and after the purge, do the db.docs
> files differ?
{snip}
2) If you temporarily replace your start_url with the one you want to
re-add, and rerun 'htdig -v -c xxxx' does it add it properly? Does for
me.. and it shows up again in search results.
This seems to indicate that there is a problem properly
parsing/interpreting the command line options:
./htdig - -s -v -m -c
I haven't debugged it yet... but it looks like:
1) This combination of command line options is not playing well together.
2) htpurge is not really deleting any data.. just marking it obsolete.
This is probably due to not properly closing the BDB files at the end of
htpurge.
I've got this kind of thing working in libhtdig & libhtdigphp. How
attached are you to your current implementation?????
I will fix it in the regular binaries.... after I figure it out ;-)
Thanks.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
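
One plausible reading of the option problem, offered as a guess rather than a diagnosis: if htdig scans its arguments in the usual getopt manner, a bare "-" ends option processing, and an option that expects a value swallows whatever follows it. The Python sketch below uses Python's own getopt module as a stand-in for that behaviour. The option string (with -m and -c each taking an argument) and the file name urls.txt are assumptions for illustration.

```python
import getopt

# Assumed option spec: -s and -v are flags, -m and -c each take a value.
SHORT_OPTIONS = "svm:c:"


def scan(argv):
    """Return (options, leftovers) as a getopt-style parser sees them."""
    return getopt.getopt(argv, SHORT_OPTIONS)


# The order used in the report: the bare "-" comes first.
reported = ["-", "-s", "-v", "-m", "-c", "ads.conf"]
# The same options with "-" moved to the end and a value supplied for -m.
reordered = ["-s", "-v", "-m", "urls.txt", "-c", "ads.conf", "-"]

print(scan(reported))
print(scan(reordered))
```

In the first case no options are recognised at all, which would be consistent with the default htdig.conf being used despite the -c on the command line.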
|
|
From: Neal R. <ne...@ri...> - 2003-11-14 22:50:54
|
Questions:
1) If you run an htdump -w before and after the purge, do the db.docs
files differ?
For me, they differ by one line.. the URL I purged. I do notice the
dbfiles don't seem to differ in size.
But, the deleted URL won't show up in search results for me (after
the purge).
I'll investigate this further.. but my gut is that the record in the
db.docdb is not being purged, but is instead 'changing state' to
Reference_Obsolete.
As for trying to re-add it.... I can't get your htdig command to work
at all.... it errors for me that it can't find the default htdig.conf
file, even though I gave it the '-c' option. This indicates some error
happening somewhere....
I'll keep digging.
Thanks.
On 13 Nov 2003, Christopher Murtagh wrote:
> Greetings htdig folks,
>
> Recently I've been trying to have htDig purge and re-index items (via a
> trigger in Postgres). The purge seems to work as I no longer see the
> item in the search results, however, when I try to re-index, I cannot
> bring the page back in unless I do a full index. I've just installed
> 3.2.0b5 hoping that this would help, but no luck. Here's some output
> from my command line attempts to get it to work:
>
>
> [root@lovelace bin]# ./htpurge -c /www/htdig/install/conf/ads.conf -u http://newfind.mcgill.ca/indexes/ads/?AdsID=10266860
>
> [root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1026860' | ./htdig - -s -v -m -c /www/htdig/install/conf/ads.conf
>
> ht://dig Start Time: Thu Nov 13 16:36:02 2003
>
> New server: newfind.mcgill.ca, 80
> 0:11472:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1026860: (changed) size = 660
> htdig: Run complete
> htdig: 1 server seen:
> htdig: newfind.mcgill.ca:80 1 document
>
> HTTP statistics
> ===============
> Persistent connections : Yes
> HEAD call before GET : Yes
> Connections opened : 2
> Connections closed : 1
> Changes of server : 0
> HTTP Requests : 3
> HTTP KBytes requested : 0.442383
> HTTP Average request time : 0 secs
> HTTP Average speed : inf KBytes/secs
>
> ht://dig End Time: Thu Nov 13 16:36:03 2003
>
> So although this thing has been purged and re-entered, it no longer
> shows up in the query results. Also, it seems that the dbs aren't being
> updated after the htpurge and htdig. Again more output from my konsole
> (note the moddates and filesizes - also, the filesize of db.docdb
> doesn't change between the purge and re-index):
>
> [root@lovelace bin]# ls -ltr /www/htdig/install/var/ads
> total 1584
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.excerpts.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docs.index.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docdb.work
> -rw-r--r-- 1 root root 16384 Nov 13 16:14 db.words.db_weakcmpr
> -rw-r--r-- 1 root root 619520 Nov 13 16:36 db.words.db
> -rw-r--r-- 1 root root 655360 Nov 13 16:36 db.excerpts
> -rw-r--r-- 1 root root 172032 Nov 13 16:36 db.docs.index
> -rw-r--r-- 1 root root 344064 Nov 13 16:38 db.docdb
>
> [root@lovelace bin]# ./htpurge -c /www/htdig/install/conf/ads.conf -u http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825
>
> [root@lovelace bin]# echo 'http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825' | ./htdig - -s -v -m -c /www/htdig/install/conf/ads.conf
>
> ht://dig Start Time: Thu Nov 13 17:05:14 2003
>
> New server: newfind.mcgill.ca, 80
> 0:11475:0:http://newfind.mcgill.ca/indexes/ads/?AdsID=1025825: (changed) size = 336
> htdig: Run complete
> htdig: 1 server seen:
> htdig: newfind.mcgill.ca:80 1 document
>
> HTTP statistics
> ===============
> Persistent connections : Yes
> HEAD call before GET : Yes
> Connections opened : 2
> Connections closed : 1
> Changes of server : 0
> HTTP Requests : 3
> HTTP KBytes requested : 0.442383
> HTTP Average request time : 0 secs
> HTTP Average speed : inf KBytes/secs
>
> ht://dig End Time: Thu Nov 13 17:05:14 2003
>
> [root@lovelace bin]# ls -ltr /www/htdig/install/var/ads
> total 1584
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.excerpts.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docs.index.work
> -rw-r--r-- 1 root root 24576 Nov 13 13:35 db.docdb.work
> -rw-r--r-- 1 root root 16384 Nov 13 16:14 db.words.db_weakcmpr
> -rw-r--r-- 1 root root 619520 Nov 13 16:36 db.words.db
> -rw-r--r-- 1 root root 655360 Nov 13 16:36 db.excerpts
> -rw-r--r-- 1 root root 172032 Nov 13 16:36 db.docs.index
> -rw-r--r-- 1 root root 344064 Nov 13 17:05 db.docdb
>
>
> So, any info or help on this would be much appreciated.
>
> Cheers,
>
> Chris
>
> --
> Christopher Murtagh
> Enterprise Systems Administrator
> ISR / Web Communications Group
> McGill University
> Montreal, Quebec
> Canada
>
> Tel.: (514) 398-3122
> Fax: (514) 398-2017
>
>
> _______________________________________________
> ht://Dig Developer mailing list:
> htd...@li...
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-dev
>
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
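
The guess earlier in this message, that a purged record is not removed but merely changes state to Reference_Obsolete, would explain both observations: the URL vanishes from search results while the database file stays the same size. The following toy model in Python shows the idea. The class, its methods and the state names are invented for illustration and are not the ht://Dig data structures.

```python
REFERENCE_NORMAL = "normal"
REFERENCE_OBSOLETE = "obsolete"


class DocStore:
    """Toy document store that marks records obsolete instead of deleting."""

    def __init__(self):
        self.records = {}  # url -> state

    def add(self, url):
        self.records[url] = REFERENCE_NORMAL

    def purge(self, url):
        # The record stays in storage; only its state changes.
        if url in self.records:
            self.records[url] = REFERENCE_OBSOLETE

    def search(self):
        """Only records in the normal state are visible to searches."""
        return sorted(u for u, s in self.records.items() if s == REFERENCE_NORMAL)

    def stored_count(self):
        """Number of records physically held, visible or not."""
        return len(self.records)


store = DocStore()
store.add("http://example.com/ads/?AdsID=1")
store.add("http://example.com/ads/?AdsID=2")
store.purge("http://example.com/ads/?AdsID=1")
print(store.search(), store.stored_count())
```

If the real code behaves like this, a later re-index would need to reset the state of the existing record rather than insert a new one.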
|
|
From: Neil K. <nk...@ma...> - 2003-11-14 21:53:57
|
Here's a performance quirk I've found.

I was running some timed tests on our development box and noticed that
the 3.2.0b5 release was causing performance to suffer for logged in
users. System load and memory were OK. But the web site and databases
were on the same disk so things were getting disk-bound. Moving the
databases and logging to a separate disk solved the problem and I'm
rerunning the tests.

i/o problems haven't been an issue with 3.1.6. Does 3.2 do more writing?

>> Job well done! It configured/built/ran out of the box on my BSD/OS-4.3.1
>> with gcc 2.95.3 like a charm; It took only 96 minutes to index my site;)

> How does this compare to earlier 3.2.0b4 snapshots, and to 3.1.6?
> Is 3.2.0b5 significantly slower than 3.1 releases, and is it better or
> worse than earlier 3.2 betas?

Neil Kohl
Manager, ACP Online
American College of Physicians
nk...@ac...
215.351.2638, 800.523.1546 x2638
|
|
From: Neal R. <ne...@ri...> - 2003-11-14 21:50:15
|
> That raises a good question as to feature-freeze status. Now that
> 3.2.0b5 is out, are we still in a feature freeze, or are new features
> still allowed? If the freeze is still on, then we need to vote in
> any new features.
>
> This particular one is a pretty simple addition which doesn't break
> anything as far as I can see. The description above could be the
> basis of the defaults.cc entry, which the patch doesn't have yet.
> I vote +1 as well.

As long as the default is OFF.... +1 for me.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Andy L. <al...@ju...> - 2003-11-14 19:00:20
|
Thanks Gilles,

I did overlook the user_agent attribute and have added it.

Any chance of speeding up indexing? I had to interrupt indexing because
it was so slow. I can't even index 30k pages in a 24-hour period. That
hurts. Got to have some speed when indexing.

Regards,
Andy

On Fri, 14 Nov 2003, Gilles Detillieux wrote:
> According to Andy Lewis:
> > Looks like the robots.txt file isn't being parsed properly.
> >
> > I've used the
> > <http://www.jumboclassifieds.com/~alewis/attrs.html#robotstxt_name>
> > robotstxt_name tag and added the same name to my robots.txt file and I
> > still see the
> > default htdig name when indexing.
> >
> > Any ideas? Running the latest beta. Downloaded today.
>
> It seems to me you're confusing the robotstxt_name attribute with
> the user_agent attribute. If by "I still see the default htdig name"
> you mean that's what's showing up in the access_log, then you want to
> change user_agent.
>
> See http://www.htdig.org/dev/htdig-3.2/attrs.html#user_agent
>
> There is a bug in 3.2.0b5 in that it doesn't correctly handle an empty
> Disallow directive, but that doesn't seem to be the issue here. The fix
> for this latter bug is at
>
> ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/robots.0
>
> --
> Gilles R. Detillieux E-mail: <gr...@sc...>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
>
|
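As a reference point for the robots.txt discussion, the sketch below shows the conventional behaviour using Python's standard urllib.robotparser rather than ht://Dig's own parser: the name in a User-agent: line is what a robot matches itself against in robots.txt (independent of whatever HTTP User-Agent header shows up in access_log), and an empty Disallow: line means nothing is disallowed. The robot name "mybot" and the URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named robot is restricted, everyone else
# gets an empty Disallow, which conventionally allows everything.
ROBOTS_TXT = """\
User-agent: mybot
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(robot_name, url, robots_txt=ROBOTS_TXT):
    """Ask the standard-library parser whether robot_name may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot_name, url)


print(allowed("mybot", "http://www.example.com/private/page.html"))
print(allowed("otherbot", "http://www.example.com/private/page.html"))
```

A parser that mishandles the empty Disallow line could end up blocking pages for robots in the "*" group, which is the kind of behaviour the robots.0 patch is meant to address.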