From: Robert R. <ri...@li...> - 2004-07-07 19:26:40
|
Hello List, the main feature that 3.2 will bring over 3.1 is 'phrase searching', which means that the indexing process slows down by up to 75% (comparing 3.2.0b6 to 3.1.6). I haven't looked at the code in depth, so I don't know how hard it would be to implement a config file option that would switch off that new behaviour, making 3.2.0b6 behave exactly like 3.1.6. Would that be worth a try (it could save me from packaging two different versions for the Sarge release)? What do you think, is there any speed (and acceptance) to be gained by such an option? Robert |
From: Lachlan A. <lh...@us...> - 2004-07-07 13:41:50
|
Greetings Joe,

OK, this might take some playing... Could you please copy the attached file to htcommon/URL.cc and send me the output? Unfortunately, I've recently upgraded to Apache 2, which isn't supported by the test suite :(

Does anyone else find make TESTS=t_htdig check fails on any other platforms?

Thanks for your help, Joe!
Lachlan

On Thu, 24 Jun 2004 03:45 am, Joe R. Jah wrote:
> On Wed, 23 Jun 2004, Lachlan Andrew wrote:
> > I think that the attached patch should fix it. If so, I'll have
> > to work out why t_htdig_local *was* working...
>
> No, it doesn't:

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
From: Lachlan A. <lh...@us...> - 2004-07-07 13:14:19
|
Greetings Gilles,

Very true. I didn't mean to under-state the progress you made towards fixing that bug. Perhaps the bug status should be downgraded from "major" to "minor", rather than being closed...

Cheers,
Lachlan

On Wed, 7 Jul 2004 01:25 pm, Gilles Detillieux wrote:
> Well, I suspect there's a judgement call involved in whether bug
> 244867 is fixed or not. If any performance slower than 3.1.6 is
> going to be viewed as unacceptable, then no, it's not fixed yet.
> However, 3.2.0b6 does fix the biggest source of bad performance in
> 3.2.0b5, i.e. the repeated calls to regcomp().

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
From: Gilles D. <gr...@sc...> - 2004-07-07 03:26:30
|
Well, I suspect there's a judgement call involved in whether bug 244867 is fixed or not. If any performance slower than 3.1.6 is going to be viewed as unacceptable, then no, it's not fixed yet. However, 3.2.0b6 does fix the biggest source of bad performance in 3.2.0b5, i.e. the repeated calls to regcomp(). The overall performance will of course vary considerably from machine to machine and configuration to configuration, but now we're getting performance that's in the ballpark of 1.5 to 4 times the runtime of 3.1.6, as opposed to the 4 to 50 times slower we were often seeing before. So, while there's still a lot of room for improvement in performance, I don't think we'll get nearly as many complaints about it as we have in the past.

I think all we can do is leave it to the debian powers that be to make that judgement call for themselves, and decide whether 3.2.0b6 is acceptable as a stable release, rather than an experimental one.

According to lac...@ip...:
> Thanks, Robert.
>
> The latest beta doesn't solve bug 244867, but nor will 3.2.0-release. It
> was decided at the recent committee meeting that we will postpone the optimisation
> until 3.2.1 (unless we stumble across any particular bug causing the problem).
>
> I'd strongly suggest that Sarge ship with packages for both versions, to
> allow users who need features in 3.2 to use them, while allowing those who
> need speed to have it. Would that be possible?
>
> Cheers,
> Lachlan
>
> >I have first tentatively built a debian package of htdig 3.2.0b6
> >
> >Available here:
> >
> >http://users.linuxbourg.ch/ribnitz/debian/htdig_3.2.0b6-1_i386.deb
> >
> >It should solve quite a few problems, please do test, those of you who
> >are on debian..
> >
> >I am unsure however, whether it solves the following problem:
> >
> >http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=244867
> >
> >Once I have a confirmation that it works better than the previous one,
> >I'll start closing bugs.
> >
> >Robert

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
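
[Editorial illustration] The message above names repeated calls to regcomp() as the biggest slowdown in 3.2.0b5. The sketch below shows only the general pattern behind that kind of fix (compile a pattern once, reuse it for every input); it is written in Python rather than ht://Dig's C++, and the function names are hypothetical, not taken from the ht://Dig source. Python's re module also keeps an internal cache of compiled patterns, so the cost difference here is much smaller than with C's regcomp(), which recompiles on every call.

```python
import re


def count_matches_recompiling(pattern, lines):
    """Anti-pattern: the pattern is (re)compiled for every single line."""
    total = 0
    for line in lines:
        if re.compile(pattern).search(line):
            total += 1
    return total


def count_matches_compiled_once(pattern, lines):
    """Preferred shape: compile once, then reuse the compiled object."""
    compiled = re.compile(pattern)
    return sum(1 for line in lines if compiled.search(line))


if __name__ == "__main__":
    # Hypothetical sample input, only for demonstration.
    sample = ["index.html", "notes.txt", "about.html"]
    print(count_matches_recompiling(r"\.html$", sample))
    print(count_matches_compiled_once(r"\.html$", sample))
```

Both functions return the same count; only the amount of compilation work differs.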
From: <lac...@ip...> - 2004-07-07 00:48:51
|
Thanks, Robert. The latest beta doesn't solve bug 244867, but nor will 3.2.0-release. It was decided at the recent committee meeting that we will postpone the optimisation until 3.2.1 (unless we stumble across any particular bug causing the problem). I'd strongly suggest that Sarge ship with packages for both versions, to allow users who need features in 3.2 to use them, while allowing those who need speed to have it. Would that be possible? Cheers, Lachlan >I have first tentatively built a debian package of htdig 3.2.0b6 > >Available here: > >http://users.linuxbourg.ch/ribnitz/debian/htdig_3.2.0b6-1_i386.deb > >It should solve quite a few problems, please do test, those of you who >are on debian.. > >I am usure however, whether it solves the following problem: > >http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=244867 > >Once I have a confirmation that it works better than the previuous one, >I'll start closing bugs. > >Robert > > > >------------------------------------------------------- >This SF.Net email sponsored by Black Hat Briefings & Training. >Attend Black Hat Briefings & Training, Las Vegas July 24-29 - >digital self defense, top technical experts, no vendor pitches, >unmatched networking opportunities. Visit www.blackhat.com >_______________________________________________ >ht://Dig Developer mailing list: >htd...@li... >List information (subscribe/unsubscribe, etc.) >https://lists.sourceforge.net/lists/listinfo/htdig-dev |
From: <Gle...@co...> - 2004-07-06 20:34:16
|
Hi People, I've just written an addon external parsing script that goes with doc2html.pl, allowing htdig to index Word documents using the wvHtml application (part of http://sourceforge.net/projects/wvware/ ) If anyone's interested, it's attached, along with a modified doc2html.pl file. (See attached file: doc2html.pl)(See attached file: word2html.pl) Needs work, but is good enough for my purposes now. Figured it could be handy for someone else. Cheers Glen Ogilvie |
From: Robert R. <ri...@li...> - 2004-07-06 18:14:17
|
Hello, I have first tentatively built a debian package of htdig 3.2.0b6 Available here: http://users.linuxbourg.ch/ribnitz/debian/htdig_3.2.0b6-1_i386.deb It should solve quite a few problems, please do test, those of you who are on debian.. I am unsure however, whether it solves the following problem: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=244867 Once I have a confirmation that it works better than the previous one, I'll start closing bugs. Robert |
From: <End...@np...> - 2004-07-06 04:02:47
|
I will be out of the office starting 05/07/2004 and will not return until 19/07/2004. I will be on leave for 2 weeks commencing 5 July 04 and will return on 19 July 04. This message is intended for the addressee named and may contain confidential information. If you are not the intended recipient, please notify the sender and then delete the message. Views expressed in this message may be those of the individual sender, and are not necessarily the views of the NSW Department of Environment and Conservation. |
From: Tony H. <th...@wr...> - 2004-07-05 23:56:17
|
Hi All For the ht://dig webmaster, the contribs page for Guides has a broken link at the top. Search This! Searching Your Dynamic Site Using PHP3 and ht://Dig <http://www.devshed.com/Server_Side/PHP/Search_This/page1.html> by Colin Viebrock This link goes to a default page on DevShed and misses the article by Colin. A search of devshed for ht://dig reveals two related articles: http://www.devshed.com/c/a/PHP/Search-This%21 - (this is the one that was meant to be linked above) http://www.devshed.com/c/a/Administration/Site-Search-with-HTDIG a more recent article Not sure of the value of the second document as yet. cheers Tony |
From: Ted Stresen-R. <ted...@ma...> - 2004-07-05 08:03:41
|
Hi,

I was in need of a rundig.sh script that was a little more flexible and easy to use than what's in the contrib directory. I wanted one that would allow me to override default values with command line parameters and that would fail gracefully when there was a problem.

I spent about two or three days working on a shell script that would meet these needs and produced the attached. Toward the end of the exercise, while consulting rundig in the /installdir directory for help on how to do things like iterate through the parameters, I realized that rundig more or less does what I need already, and with more elegance (using 'shift' when -c is encountered rather than trying to use sed, like I did...) so I felt rather humbled, but decided to send along my script should someone find it useful.

Pointers are always appreciated, but keep in mind that I'm still marveling over installdir/rundig...

Ted Stresen-Reuter

======================

#! /bin/sh

# This is the directory where all the htdig stuff lives
BASEDIR="/Library/htdig"

# Report destination - who to send the report to
REPORT_DEST="web...@hi..."
export REPORT_DEST

# This is the name (with complete path) of the conf file to use
CONF="$BASEDIR/conf/htdig.conf"

###### You shouldn't need to modify anything below this line

debug=0
stats= opts= alt= main=

for arg
do
    case "$arg" in
    -a) alt="$arg" ;;
    -s) stats="$arg" ;;
    -i) main="$arg" ;;
    *)  opts="$opts $arg" ;;    # e.g. -v or -c config
    esac
done

# if a config file exists, use it, otherwise, use the default
configfile=`echo $opts | sed 's/.*\-c *\([^ ][^ ]*\).*/\1/g'`
if test -z "$configfile"; then
    CONF=$CONF
else
    CONF=$configfile
fi

if test -z "$REPORT_DEST"; then
    echo "No destination has been set for the report. Please set DESTINATION to a valid email address."
    exit 1
else
    if test -d $BASEDIR; then
        if test -r $CONF; then
            # Get the db dir
            new_db_dir=`awk '/^[^#a-zA-Z]*database_dir/ { print $NF }' < $CONF`
            if [ "$new_db_dir" != "" ]; then
                DBDIR=$new_db_dir
                if test -d $DBDIR && test $debug -lt 1; then
                    # This is the name of a temporary report file
                    REPORT=$DBDIR/htdig.report

                    # This is the subject line of the report
                    SUBJECT="htdig report"

                    # This is a little intro to the report
                    echo "This report produced by the script $0 running on `hostname` with the following parameters: $main $alt $stats $opts" > $REPORT

                    ##### Dig phase
                    STARTTIME=`date`
                    echo "Start time: $STARTTIME"
                    echo "rundig: Start time: $STARTTIME" >> $REPORT
                    # by default we use -a?
                    $BASEDIR/bin/htdig $main $alt $stats $opts
                    TIME=`date`
                    echo "Done Digging: $TIME"
                    echo "htdig: Done Digging: $TIME" >> $REPORT

                    ##### Purge Phase
                    # (clean out broken links, etc.)
                    $BASEDIR/bin/htpurge $alt $opts >> $REPORT
                    TIME=`date`
                    echo "Done Purging: $TIME"
                    echo "htpurge: Done Purging: $TIME" >> $REPORT

                    # Move 'em into place.
                    cp $DBDIR/db.docs.index.work $DBDIR/db.docs.index
                    cp $DBDIR/db.docdb.work $DBDIR/db.docdb
                    cp $DBDIR/db.excerpts.work $DBDIR/db.excerpts
                    cp $DBDIR/db.words.db.work $DBDIR/db.words.db
                    test -f $DBDIR/db.words.db.work_weakcmpr && cp $DBDIR/db.words.db.work_weakcmpr $DBDIR/db.words.db_weakcmpr

                    ##### Fuzzy Phase
                    $BASEDIR/bin/htfuzzy $opts endings >> $REPORT
                    $BASEDIR/bin/htfuzzy $opts synonyms >> $REPORT
                    TIME=`date`
                    echo "Done Fuzzying: $TIME"
                    echo "htfuzzy: Done Fuzzying: $TIME" >> $REPORT

                    ##### Cleanup Phase
                    # To get additional statistics, uncomment the following line
                    $BASEDIR/bin/htstat $opts >>$REPORT
                    END=`date`
                    echo "End time: $END"
                    echo "rundig: End time: $END" >> $REPORT
                    echo

                    # Grab the important statistics from the report file
                    # All lines begin with htdig: or htmerge:
                    fgrep "htdig:" $REPORT
                    echo
                    fgrep "htpurge:" $REPORT
                    echo
                    fgrep "htfuzzy:" $REPORT
                    echo
                    fgrep "rundig:" $REPORT
                    echo
                    echo "Total lines in $REPORT: `wc -l $REPORT`"

                    # Send out the report ...
                    mail -s "$SUBJECT - $STARTTIME" $REPORT_DEST < $REPORT

                    # ... and clean up
                    rm $REPORT
                    exit 0
                else
                    if test $debug -eq 1 -o $debug -gt 1; then
                        echo $CONF
                        exit 0
                    else
                        echo "No database_dir has been specified in $CONF"
                        exit 1
                    fi
                fi
            else
                echo "$DBDIR is NOT a directory"
                exit 1
            fi
        else
            if test -z "$configfile"; then
                echo "$CONF is NOT readable."
            else
                echo "$CONF is NOT readable. You need to specify the complete path to the config file when passing it as a parameter to this script."
            fi
            exit 1
        fi
    else
        echo "$BASEDIR is NOT a directory"
        exit 1
    fi
fi |
From: <End...@np...> - 2004-07-05 04:26:39
|
I will be out of the office starting 05/07/2004 and will not return until 19/07/2004. I will be on leave for 2 weeks commencing 5 July 04 and will return on 19 July 04. This message is intended for the addressee named and may contain confidential information. If you are not the intended recipient, please notify the sender and then delete the message. Views expressed in this message may be those of the individual sender, and are not necessarily the views of the NSW Department of Environment and Conservation. |
From: Geoff H. <ghu...@us...> - 2004-07-04 07:17:06
|
STATUS of ht://Dig branch 3-2-x

RELEASES:
3.2.0b6: Scheduled: 31 May 2004.
3.2.0b5: Released: 10 Nov 2003.
3.2.0b4: Cancelled.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.

(Please note that everything added here should have a tracker PR# so we can be sure they're fixed. Geoff is currently trying to add PR#s for what's currently here.)

SHOWSTOPPERS:

KNOWN BUGS:
(none serious. See <http://sourceforge.net/tracker/?atid=104593&group_id=4593&func=browse>.)

PENDING PATCHES (available but need work):
* Gilles's configuration parsing patches need testing before committing.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge. (Is this still pending??)

NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.

TESTING:
* httools programs: (htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen, argument handling for parser/converter, allowing binary output from an external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes to template variables. (Relates to PR#405278.) Should we make sure these config attributes are all documented in defaults.cc, even if they're only set by input parameters and never in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and defaults.cc.
* require.html is not updated to list new features and disk space requirements of 3.2.x (e.g. regex matching, database compression.) PRs# 405280 #405281.
* Htfuzzy could use more documentation on what each fuzzy algorithm does. PR#405714.
* Document the list of all installed files and default locations. PR#405715.

OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765. |
From: Jim <li...@yg...> - 2004-07-04 01:47:24
|
On Fri, 2 Jul 2004, Eli White wrote: > >Based on a quick look, I think it is an easy fix. However you would need > >to touch the source and rebuild at least htsearch. My assumption is that > >if you replace the 100 with 100.0 at line 314 of htsearch/Display.cc (for > >ht://Dig 3.1.6), then you will avoid the overflow and correct the problem > >you are encountering. I haven't actually tested this. If you are able to > >test this and it solves the problem, please let us know. You might also > >consider submitting a bug report if you get a chance just to make sure > >this issue doesn't get lost in the shuffle. > > Hey, whaddya know :) It works. Thanks for the feedback. > Should I still file a bug report on this to get it 'officially > fixed'? (3.1.7 anyone?) Probably not necessary. The thread is archived, a patch is in the 3.1.6 repository (thanks Joe!), and it doesn't look like the bug affects the 3.2 branch. If someone gets around to putting together a 3.1.7 release, I am sure they will check all the existing patches to see what needs to be included. Jim |
From: Joe R. J. <jj...@cl...> - 2004-07-02 18:30:48
|
On Fri, 2 Jul 2004, Gilles Detillieux wrote: > Date: Fri, 2 Jul 2004 11:50:40 -0500 (CDT) > From: Gilles Detillieux <gr...@sc...> > To: Joe R. Jah <jj...@cl...> > Cc: "ht://Dig developers list" <htd...@li...> > Subject: Re: [htdig-dev] Re: [htdig] PERCENT returning 1% on big hits > > According to Joe R. Jah: > > If it works we can fix 3.2.0b6 too: > > > > --- htsearch/Display.cc.orig Fri May 28 06:15:24 2004 > > +++ htsearch/Display.cc Thu Jul 1 15:51:29 2004 > > @@ -362,7 +362,7 @@ > > > > if (maxScore != 0 && maxScore != minScore) > > { > > - int percent = (int)((ref->DocScore() - minScore) * 100 / > > + int percent = (int)((ref->DocScore() - minScore) * 100.0 / > > (maxScore - minScore)); > > if (percent <= 0) > > percent = 1; > > > > Thanks, Joe, but this shouldn't be an issue with the 3.2 code. > Not that your patch would hurt anything, and indeed it makes it more > clear what's going on, but it won't make any difference as far as > generated code. In 3.2, DocScore() and minScore are doubles, not ints, > so the multiplication will be done using doubles as well. The problem > in 3.1.x is DocScore() was an int. Thanks Gilles; I put in the old patch archives: ftp://ftp.ccsf.org/htdig-patches/3.2.0b6/0ld/percent.0 Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... > The 3.1.6 patch seems bang on, as far as I can tell. This certainly > explains some of the complaints we've had in the past about weird > rankings in 3.1.x, but never got to the bottom of. > > -- > Gilles R. Detillieux E-mail: <gr...@sc...> > Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ > Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Gilles D. <gr...@sc...> - 2004-07-02 16:51:00
|
According to Joe R. Jah:
> If it works we can fix 3.2.0b6 too:
>
> --- htsearch/Display.cc.orig Fri May 28 06:15:24 2004
> +++ htsearch/Display.cc Thu Jul 1 15:51:29 2004
> @@ -362,7 +362,7 @@
>
>  if (maxScore != 0 && maxScore != minScore)
>  {
> - int percent = (int)((ref->DocScore() - minScore) * 100 /
> + int percent = (int)((ref->DocScore() - minScore) * 100.0 /
>  (maxScore - minScore));
>  if (percent <= 0)
>  percent = 1;
>

Thanks, Joe, but this shouldn't be an issue with the 3.2 code. Not that your patch would hurt anything, and indeed it makes it more clear what's going on, but it won't make any difference as far as generated code. In 3.2, DocScore() and minScore are doubles, not ints, so the multiplication will be done using doubles as well. The problem in 3.1.x is DocScore() was an int.

The 3.1.6 patch seems bang on, as far as I can tell. This certainly explains some of the complaints we've had in the past about weird rankings in 3.1.x, but never got to the bottom of.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Eli W. <ew...@st...> - 2004-07-02 12:10:43
|
>Based on a quick look, I think it is an easy fix. However you would need
>to touch the source and rebuild at least htsearch. My assumption is that
>if you replace the 100 with 100.0 at line 314 of htsearch/Display.cc (for
>ht://Dig 3.1.6), then you will avoid the overflow and correct the problem
>you are encountering. I haven't actually tested this. If you are able to
>test this and it solves the problem, please let us know. You might also
>consider submitting a bug report if you get a chance just to make sure
>this issue doesn't get lost in the shuffle.

Hey, whaddya know :) It works. Thanks ... amazing what the float/int difference can make.

Should I still file a bug report on this to get it 'officially fixed'? (3.1.7 anyone?)

I'm just surprised that no one else had run into this yet. It would seem to happen quickly on any decent-sized website, if you search on a word that occurs, as a link, on many/all pages (such as nav text).

Thanks again!
Eli |
From: Joe R. J. <jj...@cl...> - 2004-07-01 22:57:32
|
On Thu, 1 Jul 2004, Jim wrote: > Date: Thu, 1 Jul 2004 15:45:10 -0600 (MDT) > From: Jim <li...@yg...> > To: Eli White <ew...@st...> > Cc: htd...@li..., htd...@li... > Subject: [htdig-dev] Re: [htdig] PERCENT returning 1% on big hits > > On Thu, 1 Jul 2004, Eli White wrote: > > > In fact, due to cross-linking of terms, some of those can have a VERY HIGH > > score. > > > > The problem is, that at some point, it seems we passed a threshold, and for > > those extremely high hits, the PERCENT result is messed up. > > > > It appears as if it occurs when it wants to give a score HIGHER than 100%, > > and ends up printing 1% instead. > > Looks like a bug has crept into the handling of PERCENT. As you indicate, > the problem is in fact tied to high scores. The score, which is stored as > an int, is multiplied by 100 (literal), resulting in an overflow whenever > the score is much more than 20 million. If the result ends up negative, > percent is manually set to 1; otherwise you just get whatever garbage > resulted from the overflow. > > Based on a quick look, I think it is an easy fix. However you would need > to touch the source and rebuild at least htsearch. My assumption is that > if you replace the 100 with 100.0 at line 314 of htsearch/Display.cc (for > ht://Dig 3.1.6), then you will avoid the overflow and correct the problem > you are encountering. I haven't actually tested this. If you are able to > test this and it solves the problem, please let us know. You might also > consider submitting a bug report if you get a chance just to make sure > this issue doesn't get lost in the shuffle. 
> > Jim I can't test your fix either, but I have put it as a patch in: ftp://ftp.ccsf.org/htdig-patches/3.1.6/percent.0 If it works we can fix 3.2.0b6 too: --- htsearch/Display.cc.orig Fri May 28 06:15:24 2004 +++ htsearch/Display.cc Thu Jul 1 15:51:29 2004 @@ -362,7 +362,7 @@ if (maxScore != 0 && maxScore != minScore) { - int percent = (int)((ref->DocScore() - minScore) * 100 / + int percent = (int)((ref->DocScore() - minScore) * 100.0 / (maxScore - minScore)); if (percent <= 0) percent = 1; Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... > > Can anyone shed some light on why this is happening? It happens with 3.1.5 > > AND 3.1.6 ... > > > > Some example star/percent/score results to peruse: > > > > Search (A) > > Result 1: 1% - 5 stars - score: 40695460 > > Result 2: 24% - 2 stars - score: 9850704 > > Result 3: 24% - 2 stars - score: 9808377 > > Result 4: 16% - 2 stars - score: 8606539 > > > > Search (B) > > Result 1: 11% - 5 stars - score: 48628560 > > Result 2: 43% - 3 stars - score: 21387546 > > Result 3: 23% - 2 stars - score: 11271058 > > Result 4: 17% - 2 stars - score: 8733621 > > > > Search (C) > > Result 1: 1% - 5 stars - score: 761285184 > > Result 2: 1% - 1 star - score: 51412848 > > Result 3: 1% - 1 star - score: 43010124 > > Result 4: 2% - 1 star - score: 20066298 > > > > And so on ... > > > > Thanks in advance, > > Eli |
From: Jim <li...@yg...> - 2004-07-01 21:45:14
|
On Thu, 1 Jul 2004, Eli White wrote:

> In fact, due to cross-linking of terms, some of those can have a VERY HIGH
> score.
>
> The problem is, that at some point, it seems we passed a threshold, and for
> those extremely high hits, the PERCENT result is messed up.
>
> It appears as if it occurs when it wants to give a score HIGHER than 100%,
> and ends up printing 1% instead.

Looks like a bug has crept into the handling of PERCENT. As you indicate, the problem is in fact tied to high scores. The score, which is stored as an int, is multiplied by 100 (literal), resulting in an overflow whenever the score is much more than 20 million. If the result ends up negative, percent is manually set to 1; otherwise you just get whatever garbage resulted from the overflow.

Based on a quick look, I think it is an easy fix. However you would need to touch the source and rebuild at least htsearch. My assumption is that if you replace the 100 with 100.0 at line 314 of htsearch/Display.cc (for ht://Dig 3.1.6), then you will avoid the overflow and correct the problem you are encountering. I haven't actually tested this. If you are able to test this and it solves the problem, please let us know. You might also consider submitting a bug report if you get a chance just to make sure this issue doesn't get lost in the shuffle.

Jim

> Can anyone shed some light on why this is happening? It happens with 3.1.5
> AND 3.1.6 ...
>
> Some example star/percent/score results to peruse:
>
> Search (A)
> Result 1: 1% - 5 stars - score: 40695460
> Result 2: 24% - 2 stars - score: 9850704
> Result 3: 24% - 2 stars - score: 9808377
> Result 4: 16% - 2 stars - score: 8606539
>
> Search (B)
> Result 1: 11% - 5 stars - score: 48628560
> Result 2: 43% - 3 stars - score: 21387546
> Result 3: 23% - 2 stars - score: 11271058
> Result 4: 17% - 2 stars - score: 8733621
>
> Search (C)
> Result 1: 1% - 5 stars - score: 761285184
> Result 2: 1% - 1 star - score: 51412848
> Result 3: 1% - 1 star - score: 43010124
> Result 4: 2% - 1 star - score: 20066298
>
> And so on ...
>
> Thanks in advance,
> Eli |
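
[Editorial illustration] The overflow described in this message can be reproduced with a small simulation. The sketch below is Python, not the ht://Dig C++ source, and it rests on several assumptions that are not confirmed by the thread: that int is 32 bits and wraps on overflow (in C++ signed overflow is formally undefined, though wrapping is what typical platforms do), that the 3.1.6 expression has the same shape as the one in the 3.2.0b6 patch quoted elsewhere in the thread, and that minScore is 0. The helper names are hypothetical.

```python
def wrap_int32(value):
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def c_div(a, b):
    """Integer division that truncates toward zero, as C and C++ do."""
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient


def percent_int(score, min_score, max_score):
    """Simulates the all-int computation: (score - min) * 100 / (max - min)."""
    product = wrap_int32((score - min_score) * 100)  # overflow happens here
    percent = c_div(product, max_score - min_score)
    return 1 if percent <= 0 else percent


def percent_float(score, min_score, max_score):
    """Simulates the proposed fix: multiplying by 100.0 forces floating point."""
    percent = int((score - min_score) * 100.0 / (max_score - min_score))
    return 1 if percent <= 0 else percent


if __name__ == "__main__":
    # Scores from Search (B) in the quoted report; minScore = 0 is a guess.
    top = 48628560
    for score in (48628560, 21387546, 11271058, 8733621):
        print(score, percent_int(score, 0, top), percent_float(score, 0, top))
```

Under these assumptions the simulated int path gives 11, 43, 23 and 17 for Search (B), which agrees with the percentages Eli reported, while the floating-point path gives 100 for the top hit. Only the top score is large enough to overflow: 48628560 * 100 exceeds 2147483647, the largest 32-bit signed value.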
From: Geoff H. <ghu...@us...> - 2004-06-27 07:17:48
|
STATUS of ht://Dig branch 3-2-x

RELEASES:
3.2.0b6: Scheduled: 31 May 2004.
3.2.0b5: Released: 10 Nov 2003.
3.2.0b4: Cancelled.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.

(Please note that everything added here should have a tracker PR# so we can be sure they're fixed. Geoff is currently trying to add PR#s for what's currently here.)

SHOWSTOPPERS:

KNOWN BUGS:
(none serious. See <http://sourceforge.net/tracker/?atid=104593&group_id=4593&func=browse>.)

PENDING PATCHES (available but need work):
* Gilles's configuration parsing patches need testing before committing.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge. (Is this still pending??)

NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.

TESTING:
* httools programs: (htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen, argument handling for parser/converter, allowing binary output from an external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes to template variables. (Relates to PR#405278.) Should we make sure these config attributes are all documented in defaults.cc, even if they're only set by input parameters and never in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and defaults.cc.
* require.html is not updated to list new features and disk space requirements of 3.2.x (e.g. regex matching, database compression.) PRs# 405280 #405281.
* Htfuzzy could use more documentation on what each fuzzy algorithm does. PR#405714.
* Document the list of all installed files and default locations. PR#405715.

OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765. |
From: Neal R. <ne...@ri...> - 2004-06-24 17:48:19
|
Great! As soon as we stamp 3.2.0b6 as final 3.2.0... we can do some code cleaning and begin work on Unicode and /possibly/ swapping our current search/index-store code for CLucene. It's going to be a fair amount of work cleaning up all the 'char' dependencies. Thanks. On Thu, 24 Jun 2004, Christopher Murtagh wrote: > > This just came up on the Postgres mailing list, and I figured it might > be interesting for the htdig list as well: > > http://oss.software.ibm.com/icu/ > > >From this page: > > "ICU is a mature, widely used set of C/C++ and Java libraries for > Unicode support, software internationalization and globalization > (i18n/g11n). It grew out of the JDK 1.1 internationalization APIs, which > the ICU team contributed, and the project continues to be developed for > the most advanced Unicode/i18n support. ICU is widely portable and gives > applications the same results on all platforms and between C/C++ and > Java software." > > Looks promising. Comments? > > Cheers, > > Chris > > -- > Christopher Murtagh > Enterprise Systems Administrator > ISR / Web Communications Group > McGill University > Montreal, Quebec > Canada > > Tel.: (514) 398-3122 > Fax: (514) 398-2017 > > > > ------------------------------------------------------- > This SF.Net email sponsored by Black Hat Briefings & Training. > Attend Black Hat Briefings & Training, Las Vegas July 24-29 - > digital self defense, top technical experts, no vendor pitches, > unmatched networking opportunities. Visit www.blackhat.com > _______________________________________________ > ht://Dig Developer mailing list: > htd...@li... > List information (subscribe/unsubscribe, etc.) > https://lists.sourceforge.net/lists/listinfo/htdig-dev > Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |
From: Admin <ro...@wi...> - 2004-06-24 17:14:59
|
Hello htdig-dev, I sent a message about a new htdig.org mirror for Estonia about a month ago, but have not received a reply. Did you receive it? -- Best regards, Admin mailto:ro...@wi... |
From: Christopher M. <chr...@mc...> - 2004-06-24 06:30:36
|
This just came up on the Postgres mailing list, and I figured it might be interesting for the htdig list as well:

http://oss.software.ibm.com/icu/

From this page:

"ICU is a mature, widely used set of C/C++ and Java libraries for Unicode support, software internationalization and globalization (i18n/g11n). It grew out of the JDK 1.1 internationalization APIs, which the ICU team contributed, and the project continues to be developed for the most advanced Unicode/i18n support. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software."

Looks promising. Comments?

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017 |
From: Joe R. J. <jj...@cl...> - 2004-06-23 17:45:37
|
On Wed, 23 Jun 2004, Lachlan Andrew wrote:

> Date: Wed, 23 Jun 2004 23:12:35 +1000
> From: Lachlan Andrew <lh...@us...>
> To: Joe R. Jah <jj...@cl...>
> Cc: htd...@li...
> Subject: Re: Make check and htdig warnings
>
> Greetings Joe,
>
> I think that the attached patch should fix it. If so, I'll have to
> work out why t_htdig_local *was* working...

No, it doesn't:

running htdig: expected
file:///tmp/htdig-3.2.0b6/test/htdocs/set1/site4.html
http://localhost:7400/set1/
http://localhost:7400/set1/bad_local.htm
http://localhost:7400/set1/script.html
http://localhost:7400/set1/site%201.html
http://localhost:7400/set1/site2.html
http://localhost:7400/set1/site3.html
http://localhost:7400/set1/sub%2520dir/
http://localhost:7400/set1/sub%2520dir/empty%20file.html
http://localhost:7400/set1/title.html
but got
http://localhost:7400/set1/
http://localhost:7400/set1/bad_local.htm
http://localhost:7400/set1/script.html
http://localhost:7400/set1/site%201.html
http://localhost:7400/set1/site2.html
http://localhost:7400/set1/site3.html
http://localhost:7400/set1/sub%2520dir/
http://localhost:7400/set1/sub%2520dir/empty%20file.html
http://localhost:7400/set1/title.html
FAIL: t_htdig

Attached is test/conf/htdig.conf.tmp file.

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Lachlan A. <lh...@us...> - 2004-06-23 13:13:59
|
Greetings Joe,

I think that the attached patch should fix it. If so, I'll have to work out why t_htdig_local *was* working...

Thanks,
Lachlan

On Tue, 22 Jun 2004 09:34 am, Joe R. Jah wrote:
> url_rewrite_rules: (.*)si[a-z]*[4-9]*\.([a-z]*)tml file:////tmp/htdig-3.2.0b6/test/htdocs/set1/site4.\\2tml

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
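
[Editorial illustration] To see which test URLs the quoted url_rewrite_rules pattern is meant to catch, the rule can be tried out with Python's re module. This is only a sketch under assumptions: it treats the rule as a plain search-and-replace with a back-reference, which may differ in detail from how ht://Dig applies it, and the rewrite() helper is hypothetical. The pattern and replacement are copied from the quoted line, with the config file's \\2 written as \2 for Python.

```python
import re

# Pattern and replacement taken from the quoted url_rewrite_rules line.
PATTERN = re.compile(r"(.*)si[a-z]*[4-9]*\.([a-z]*)tml")
REPLACEMENT = r"file:////tmp/htdig-3.2.0b6/test/htdocs/set1/site4.\2tml"


def rewrite(url):
    """Apply the rule once; URLs that do not match come back unchanged."""
    return PATTERN.sub(REPLACEMENT, url, count=1)


if __name__ == "__main__":
    for url in (
        "http://localhost:7400/set1/site4.html",
        "http://localhost:7400/set1/site3.html",
        "http://localhost:7400/set1/title.html",
    ):
        print(url, "->", rewrite(url))
```

In this sketch only the site4.html URL is rewritten to a file: URL; site3.html and title.html pass through untouched, because the digit class [4-9] excludes 3 and title.html contains no "si". The rewritten string keeps the four slashes from the rule, whereas the test suite's expected list shows three; how ht://Dig normalises that is outside what this sketch covers.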
From: Lachlan A. <lh...@us...> - 2004-06-23 12:49:55
|
Greetings Ted,

I think you're looking for $0 (I'm sure there is a pun to be made here, but I'll refrain...).

Cheers,
Lachlan

On Wed, 23 Jun 2004 09:29 pm, Ted Stresen-Reuter wrote:
> What is the environment variable for the name and/or path of the
> currently executing script?

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |