You can subscribe to this list here.
Message counts by month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | | 47 | 74 | 66 |
| 2002 | 95 | 102 | 83 | 64 | 55 | 39 | 23 | 77 | 88 | 84 | 66 | 46 |
| 2003 | 56 | 129 | 37 | 63 | 59 | 104 | 48 | 37 | 49 | 157 | 119 | 54 |
| 2004 | 51 | 66 | 39 | 113 | 34 | 136 | 67 | 20 | 7 | 10 | 14 | 3 |
| 2005 | 40 | 21 | 26 | 13 | 6 | 4 | 23 | 3 | 1 | 13 | 1 | 6 |
| 2006 | 2 | 4 | 4 | 1 | 11 | 1 | 4 | 4 | | 4 | | 1 |
| 2007 | 2 | 8 | 1 | 1 | 1 | | 2 | | 1 | | | |
| 2008 | 1 | | 1 | 2 | | | 1 | | 1 | | | |
| 2009 | | | 2 | | 1 | | | | | | | |
| 2010 | | | | | | | | | | | | 1 |
| 2011 | | | 1 | | 1 | | | | | 1 | | |
| 2012 | | | | | | | 1 | | | | | |
| 2013 | | | | 1 | | | | | | | | |
| 2016 | 1 | | | | | | | | | | | |
| 2017 | | | | | | | | | | | 1 | |
From: Dan C. <dan...@so...> - 2002-01-23 08:28:35
|
Hi all, This is my first post and I've only been using ht://Dig for 2 days, so please go easy on me. :) First, a little about my system - I'm running ht://Dig 3.1.5 on a Red Hat 7.2 Linux PC. The installed version of the standard sort command is 2.0.14. I was playing around with 3 databases which all worked fine individually. However, I wished to merge them into a single database so that I could search across all 3 at once. Looking through the archive I found a post from Gilles explaining how to do this with htmerge and the -m switch. I tried this and all seemed to work beautifully; I was able to search across all databases.

However, I noticed that some of the documents that SHOULD have shown up in a particular search did not. They showed up in their individual databases, and running the htmerge program with full verbosity showed they were being merged into the conglomerate database. Looking through the db.wordlist file showed that the word I was searching for, 'support', was not correctly sorted, as some instances of 'supported' were mixed in amongst them. I couldn't work out how this could happen until I tried playing with the sort command for a while. As one of the posts from the archive says, sort is assumed to sort across the whole line, but this simply does not produce the intended effect, at least on my (very common) system.
Here's a part of the db.wordlist I get from htmerge as it stands:

    supply i:29 l:796 w:204
    support i:20 l:111 w:2585 c:4
    support i:69 l:35 w:201890 c:4
    support i:70 l:11 w:4650 c:6
    supported i:57 l:710 w:290
    supported i:38 l:797 w:203
    support i:59 l:240 w:760
    support i:18 l:797 w:203
    support i:29 l:799 w:201
    support i:73 l:869 w:131
    sure i:20 l:656 w:344
    surname i:30 l:115 w:1607 c:2
    surname i:31 l:115 w:1607 c:2

Here's what you get when you use the straight /bin/sort on it:

    supply i:29 l:796 w:204
    supported i:38 l:797 w:203
    supported i:57 l:710 w:290
    support i:18 l:797 w:203
    support i:20 l:111 w:2585 c:4
    support i:29 l:799 w:201
    support i:59 l:240 w:760
    support i:69 l:35 w:201890 c:4
    support i:70 l:11 w:4650 c:6
    support i:73 l:869 w:131
    sure i:20 l:656 w:344
    surname i:30 l:115 w:1607 c:2
    surname i:31 l:115 w:1607 c:2

And here's what's actually needed to get correct results from htsearch:

    supply i:29 l:796 w:204
    support i:18 l:797 w:203
    support i:20 l:111 w:2585 c:4
    support i:29 l:799 w:201
    support i:59 l:240 w:760
    support i:69 l:35 w:201890 c:4
    support i:70 l:11 w:4650 c:6
    support i:73 l:869 w:131
    supported i:38 l:797 w:203
    supported i:57 l:710 w:290
    sure i:20 l:656 w:344
    surname i:30 l:115 w:1607 c:2
    surname i:31 l:115 w:1607 c:2

The reason 'supported' appears above 'support' is that the whole line is being used as the key, and the 'e' in 'supported' comes before the 'i' in the next field. Below is a patch for htmerge/words.cc that appends a '--key=1,1' parameter to the sort command in htmerge. This seems to fix the problem. Gilles mentions elsewhere that the intended behaviour is to sort by the first field, then second, etc., so you may wish to include those parameters also. No idea if this would work on any systems apart from my own (or even if the problem exists in different versions, etc). Obviously it would be better if this parameter were in the Makefile or something and was configured as necessary by make, but I don't know enough about it.
59a60,72
>
>
> // START PATCH
> // patch added by Dan Cutting (da...@wo...) 23/01/2002 to make htmerge
> // use first field of wordlist to sort instead of entire line which could lead
> // to incorrectly sorted database. this in turn leads to missing results
> // from searches. NB: this has not been tested on any systems (apart from my own
> // Linux box) and is designed purely for sort version 2.0.14. Other versions
> // will probably also work, but no guarantees! Try it from the command line first.
> command << " --key=1,1";
> /// END PATCH
>
>

Regards,

Dan Cutting
dan...@so...
|
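[Editor's note] Dan's diagnosis can be reproduced with GNU sort directly. This is a minimal sketch, not part of the original thread: the sample lines are taken from his message, GNU coreutils sort is assumed, and the exact whole-line ordering depends on your locale's collation (dictionary-style locales skip blanks, which is what pushes "supported" ahead of "support i:...").

```shell
# Build a tiny wordlist like the db.wordlist excerpts in Dan's message.
cat > wordlist.txt <<'EOF'
support i:20 l:111 w:2585 c:4
supported i:57 l:710 w:290
supply i:29 l:796 w:204
support i:69 l:35 w:201890 c:4
EOF

# Whole-line sort: the key runs past the word into the i:/l:/w: fields,
# so ordering depends on locale collation of the trailing fields.
echo "--- whole-line key ---"
sort wordlist.txt

# First-field-only key, as in the htmerge patch: all "support" entries
# are grouped together regardless of what follows the word.
echo "--- sort --key=1,1 ---"
sort --key=1,1 wordlist.txt
```

With `--key=1,1` the words group correctly ('supply', then every 'support' line, then 'supported'), which is the ordering htsearch expects.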
From: Gilles D. <gr...@sc...> - 2002-01-22 19:30:29
|
According to Neal Richter:
> I'm not getting any success merging databases.
>
> Here's the process:
>
> 1. htdig -c htdig.conf.red
>
>    Searching works fine
>
> 2. htdig -c htdig.conf.blue
>    [htdig.conf.blue has a different database_dir]
>
> 3. htmerge -m htdig.conf.blue
>
> htmerge finishes after some time.

You don't use -c on the 3rd command, so it will default to htdig.conf. What database does that select? That's the one that the "blue" db will be merged into.

> 4. A comparison of the old RED-only db files + BLUE db files to the new
>    RED&BLUE db files looks OK. The new files increase nearly to the sum of
>    the separate sizes of RED and BLUE files.
>
> 5. Searching works fine for words in RED, but words in BLUE returns
>    nothing.
>
> 6. If I make the default database_dir point to BLUE, searching works
>    fine (of course only for words in BLUE)
>
> Ideas?
>
> I am using a DEBUG build.

You don't mention which version of htdig you're using. Whether it's 3.1.x or 3.2.0bx, you should be running the latest snapshot from http://www.htdig.org/files/snapshots/. 3.1.5 and earlier have some bugs in the merging code that can cause a number of problems. Also, for 3.1.x (even the latest 3.1.6 snapshot), you must run htmerge on the individual databases before merging them together, as Jim Cole suggested.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
|
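[Editor's note] Putting Gilles' and Jim's advice together, the working sequence looks like the following sketch. The config file names are the ones from Neal's message; the requirement to run htmerge on each database individually first applies to 3.1.x per Jim's note, and the -c target on the final command is Gilles' point about where -m merges into.

```
# 1. Index and htmerge each database on its own first (required on 3.1.x):
htdig -c htdig.conf.red
htmerge -c htdig.conf.red
htdig -c htdig.conf.blue
htmerge -c htdig.conf.blue

# 2. Merge BLUE into RED, naming the target config explicitly with -c;
#    without -c, htmerge falls back to the default htdig.conf database.
htmerge -c htdig.conf.red -m htdig.conf.blue
```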
From: Jim C. <gre...@yg...> - 2002-01-22 02:55:08
|
Neal Richter's bits of Mon, 21 Jan 2002 translated to:
>Hey,
> I'm not getting any success merging databases.
>
>Here's the process:
>
>1. htdig -c htdig.conf.red
>
>   Searching works fine
>
>2. htdig -c htdig.conf.blue
>   [htdig.conf.blue has a different database_dir]
>
>3. htmerge -m htdig.conf.blue

Hi - You need to run htmerge on the individual databases before you merge the two together (before you run with -m).

Jim
|
From: Neal R. <ne...@ri...> - 2002-01-22 02:28:31
|
Hey, I'm not getting any success merging databases. Here's the process:

1. htdig -c htdig.conf.red

   Searching works fine

2. htdig -c htdig.conf.blue
   [htdig.conf.blue has a different database_dir]

3. htmerge -m htdig.conf.blue

   htmerge finishes after some time.

4. A comparison of the old RED-only db files + BLUE db files to the new RED&BLUE db files looks OK. The new files increase nearly to the sum of the separate sizes of RED and BLUE files.

5. Searching works fine for words in RED, but words in BLUE returns nothing.

6. If I make the default database_dir point to BLUE, searching works fine (of course only for words in BLUE)

Ideas?

I am using a DEBUG build.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
From: Gilles D. <gr...@sc...> - 2002-01-18 21:50:08
|
According to Geoff Hutchison:
> On Fri, 18 Jan 2002, Gilles Detillieux wrote:
> > +    else if (strcmp(word.get(), prefix_suffix) == 0)
> > +    {
> > +        tempWords.Add(new WeightWord(prefix_suffix, 1.0));
> > +    }
>
> This won't work in the case where people set prefix_suffix to '' to run
> prefix matching on everything. We probably want a test something like:
>
> ((strnlen(prefix_suffix) != 0 && strcmp(word.get(), prefix_suffix) == 0) ||
>  (strnlen(prefix_suffix) == 0 && strcmp(word.get(), "*") == 0))
>
> Maybe that can be simplified.
>
> > +    if (strcmp(temp.get(), prefix_suffix) == 0) {
>
> Same problem here.

There was also a little wrinkle in the word separation code if you force a new wildcard character when prefix_match_character is empty. Here's the new patch...

--- htsearch/htsearch.cc.orig	Thu Nov  1 14:45:07 2001
+++ htsearch/htsearch.cc	Fri Jan 18 15:43:24 2002
@@ -444,6 +444,9 @@ setupWords(char *allWords, List &searchW
     String word;  // Why use a char type if String is the new char type!!!
 
     char *prefix_suffix = config["prefix_match_character"];
+    char *wildcard = prefix_suffix;
+    if (*wildcard == '\0')
+        wildcard = "*";
     while (*pos)
     {
         while (1)
@@ -461,12 +464,12 @@ setupWords(char *allWords, List &searchW
                 tempWords.Add(new WeightWord(s, -1.0));
                 break;
             }
-            else if (HtIsWordChar(t) || t == ':' ||
+            else if (HtIsWordChar(t) || t == ':' || t == *wildcard ||
                      (strchr(prefix_suffix, t) != NULL) || (t >= 161 && t <= 255))
             {
                 word = 0;
-                while (t && (HtIsWordChar(t) ||
-                       t == ':' || (strchr(prefix_suffix, t) != NULL) || (t >= 161 && t <= 255)))
+                while (t && (HtIsWordChar(t) || t == ':' || t == *wildcard ||
+                       (strchr(prefix_suffix, t) != NULL) || (t >= 161 && t <= 255)))
                 {
                     word << (char) t;
                     t = *pos++;
@@ -485,6 +488,10 @@ setupWords(char *allWords, List &searchW
             else if (boolean && mystrcasecmp(word.get(), boolean_keywords[2]) == 0)
             {
                 tempWords.Add(new WeightWord("!", -1.0));
+            }
+            else if (strcmp(word.get(), wildcard) == 0)
+            {
+                tempWords.Add(new WeightWord(wildcard, 1.0));
             }
             else
             {
--- htsearch/parser.cc.orig	Thu Jul 26 15:08:52 2001
+++ htsearch/parser.cc	Fri Jan 18 15:24:18 2002
@@ -240,6 +240,24 @@ Parser::perform_push()
         list->isIgnore = 1;
         return;
     }
+    static char *wildcard = config["prefix_match_character"];
+    if (*wildcard == '\0')
+        wildcard = "*";
+    if (strcmp(temp.get(), wildcard) == 0) {
+        Database *docIndex = Database::getDatabaseInstance();
+        docIndex->OpenRead(config["doc_index"]);
+        docIndex->Start_Get();
+        while ((p = docIndex->Get_Next()))
+        {
+            dm = new DocMatch;
+            dm->score = current->weight;
+            dm->id = atoi(p);
+            dm->anchor = 0;
+            list->add(dm);
+        }
+        delete docIndex;
+        return;
+    }
     temp.lowercase();
     p = temp.get();
     if (temp.length() > maximum_word_length)
|
From: Geoff H. <ghu...@ws...> - 2002-01-18 20:54:53
|
On Fri, 18 Jan 2002, Gilles Detillieux wrote:

No, I didn't think it was complicated to code--just to figure out the cleanest way to do it. ;-)

> something that originally seemed very complicated), but I'd appreciate a
> few extra eyeballs on this code, or some other testers. The weight is
> +    else if (strcmp(word.get(), prefix_suffix) == 0)
> +    {
> +        tempWords.Add(new WeightWord(prefix_suffix, 1.0));
> +    }

This won't work in the case where people set prefix_suffix to '' to run prefix matching on everything. We probably want a test something like:

    ((strnlen(prefix_suffix) != 0 && strcmp(word.get(), prefix_suffix) == 0) ||
     (strnlen(prefix_suffix) == 0 && strcmp(word.get(), "*") == 0))

Maybe that can be simplified.

> +    if (strcmp(temp.get(), prefix_suffix) == 0) {

Same problem here.

-Geoff
|
From: Gilles D. <gr...@sc...> - 2002-01-18 20:35:12
|
According to Geoff Hutchison:
> Doing a quick Database lookup inside of parser.cc to get the DocIDs out of
> the index is probably the best solution.

OK, that was almost too easy. The patch is below. I'll commit it, because it works for me (in fact it worked the first try, which is kinda scary for something that originally seemed very complicated), but I'd appreciate a few extra eyeballs on this code, or some other testers. The weight is essentially hard-coded as 1, because I didn't see the point in dragging text_factor or some other attribute into this.

--- htsearch/htsearch.cc.orig	Thu Nov  1 14:45:07 2001
+++ htsearch/htsearch.cc	Fri Jan 18 13:37:35 2002
@@ -486,6 +486,10 @@ setupWords(char *allWords, List &searchW
             {
                 tempWords.Add(new WeightWord("!", -1.0));
             }
+            else if (strcmp(word.get(), prefix_suffix) == 0)
+            {
+                tempWords.Add(new WeightWord(prefix_suffix, 1.0));
+            }
             else
             {
                 // Add word to excerpt matching list
--- htsearch/parser.cc.orig	Thu Jul 26 15:08:52 2001
+++ htsearch/parser.cc	Fri Jan 18 13:53:05 2002
@@ -224,6 +224,7 @@ void
 Parser::perform_push()
 {
     static int maximum_word_length = config.Value("maximum_word_length", 12);
+    static char *prefix_suffix = config["prefix_match_character"];
     String temp = current->word.get();
     String data;
     char *p;
@@ -238,6 +239,21 @@ Parser::perform_push()
         // This word needs to be ignored. Make it so.
         //
         list->isIgnore = 1;
+        return;
+    }
+    if (strcmp(temp.get(), prefix_suffix) == 0) {
+        Database *docIndex = Database::getDatabaseInstance();
+        docIndex->OpenRead(config["doc_index"]);
+        docIndex->Start_Get();
+        while ((p = docIndex->Get_Next()))
+        {
+            dm = new DocMatch;
+            dm->score = current->weight;
+            dm->id = atoi(p);
+            dm->anchor = 0;
+            list->add(dm);
+        }
+        delete docIndex;
         return;
     }
     temp.lowercase();
|
From: Geoff H. <ghu...@ws...> - 2002-01-18 19:21:44
|
On Fri, 18 Jan 2002, Gilles Detillieux wrote:
> docIDs. Wouldn't it be much quicker to get them from the db.docs.index,
> which is keyed by docID in 3.1? It's a smaller database, and you'd just
> need to traverse the "cursor" part of it to get the list of keys.

Sure--this was my plan. However, the DocumentDB "holds" both databases. So either way, you're writing a new method to DocumentDB.

> it, the parser already does some "config" lookups, and I was going to
> add a lookup for prefix_match_character anyway, so why not just lookup
> doc_index too and forget adding extra stuff to pass.

Ah, I see what you're saying. That works. I was imagining passing around a List of all the DocIDs to/from DocumentDB and potentially via htsearch/main.cc to get it through the parser. (Which is why I thought it was a bad idea.) Doing a quick Database lookup inside of parser.cc to get the DocIDs out of the index is probably the best solution.

-Geoff
|
From: Gilles D. <gr...@sc...> - 2002-01-18 19:15:39
|
According to Geoff Hutchison:
> On Fri, 18 Jan 2002, Gilles Detillieux wrote:
> > dummy ResultList with all valid document IDs. All it needs is a method
> > to call to get that list of docIDs - that's the part I need help with.
>
> The reason I suggested in htsearch/main.cpp is that you could get this
> from the DocumentDB class if you're not in the parser. I guess htsearch
> could grab this for the parser as a callback, but this was why I was
> looking to skip the parser entirely.

OK, I've given this some more thought. The db.docdb in 3.1.x is keyed by encoded URLs, so to get a list of docIDs from it, you'd essentially need to read in and decode every record in that database just to get at the docIDs. Wouldn't it be much quicker to get them from the db.docs.index, which is keyed by docID in 3.1? It's a smaller database, and you'd just need to traverse the "cursor" part of it to get the list of keys.

> > dummy record into the db.words.db with a list of cooked-up WordRecords.
> > That would work, but it's not as clean as I'd like.
>
> It could also be a pretty big record.

Yup, which is a big part of the reason this isn't as clean as I'd like.

> > combining * with and or or doesn't really get you anything, but it might
> > be nice to be able to do "* not foo".
>
> True. But we should make sure to remember the balance--how much code will
> we add versus the utility of the feature. I see the utility of "return all
> matches, then sort, restrict, etc." I also see the utility of "* not
> foo," but I'm not sure it's as bulletproof. Should we pass the DocumentDB
> to the parser too?

Well, I'd say either we pass the doc_index filename to the parser, which would only create a database instance and open the database if it needs it, or we do the opening part in main() regardless and pass the Database pointer to the parser. I prefer the former.

Come to think of it, the parser already does some "config" lookups, and I was going to add a lookup for prefix_match_character anyway, so why not just lookup doc_index too and forget adding extra stuff to pass. You know, I just may be able to code this thing after all.
|
From: Gilles D. <gr...@sc...> - 2002-01-18 18:49:29
|
According to Geoff Hutchison:
> On Fri, 18 Jan 2002, Gilles Detillieux wrote:
> > As an aside, we've always operated under the assumption that the word
> > location affected the word score somehow, but I can't find any code in
> > htsearch that does this.
>
> It's not in htsearch. Remember that before 3.2, htsearch did no
> scoring. So check htcommon/WordList.cc:
>     wordRef->WordCount++;
>     wordRef->Weight += int((1000 - location) * weight_factor);
>     if (location < wordRef->Location)
>         wordRef->Location = location;

Sorry, brain fart. Of course the score is calculated in htdig in 3.1. However, it does seem pointless to store locations in db.wordlist and db.words.db if only htdig uses them. Even in an update dig, htdig only adds to db.wordlist for reparsed documents, and schedules the old word records for deletion, so the locations are only used internally by htdig.

> > As far as I can tell, when the info is transferred
> > from WordRecords to DocMatches, the location field is completely ignored.
>
> You might argue that htsearch should use a class that's slimmer than
> WordRecord, but htdig certainly uses location. I had entertained thoughts
> of using the location flag as well for speeding up highlighting in excerpts
> in htsearch, but since it's a 1/1000 location and not a character or word
> location, I scrapped those plans.

Well, yeah, I guess htmerge and htsearch would save memory by using a slimmer class for word records, but you'd also cut down the size of the databases. Hmm. A little too radical a change for 3.1.6, though, I think.
|
From: Geoff H. <ghu...@ws...> - 2002-01-18 18:18:50
|
On Fri, 18 Jan 2002, Gilles Detillieux wrote:
> dummy ResultList with all valid document IDs. All it needs is a method
> to call to get that list of docIDs - that's the part I need help with.

The reason I suggested in htsearch/main.cpp is that you could get this from the DocumentDB class if you're not in the parser. I guess htsearch could grab this for the parser as a callback, but this was why I was looking to skip the parser entirely.

> dummy record into the db.words.db with a list of cooked-up WordRecords.
> That would work, but it's not as clean as I'd like.

It could also be a pretty big record.

> combining * with and or or doesn't really get you anything, but it might
> be nice to be able to do "* not foo".

True. But we should make sure to remember the balance--how much code will we add versus the utility of the feature. I see the utility of "return all matches, then sort, restrict, etc." I also see the utility of "* not foo," but I'm not sure it's as bulletproof. Should we pass the DocumentDB to the parser too?

-Geoff
|
From: Geoff H. <ghu...@ws...> - 2002-01-18 18:08:59
|
On Fri, 18 Jan 2002, Gilles Detillieux wrote:
> As an aside, we've always operated under the assumption that the word
> location affected the word score somehow, but I can't find any code in
> htsearch that does this.

It's not in htsearch. Remember that before 3.2, htsearch did no scoring. So check htcommon/WordList.cc:

    wordRef->WordCount++;
    wordRef->Weight += int((1000 - location) * weight_factor);
    if (location < wordRef->Location)
        wordRef->Location = location;

> As far as I can tell, when the info is transferred
> from WordRecords to DocMatches, the location field is completely ignored.

You might argue that htsearch should use a class that's slimmer than WordRecord, but htdig certainly uses location. I had entertained thoughts of using the location flag as well for speeding up highlighting in excerpts in htsearch, but since it's a 1/1000 location and not a character or word location, I scrapped those plans.

-Geoff
|
From: Gilles D. <gr...@sc...> - 2002-01-18 17:48:46
|
According to Geoff Hutchison:
> At 10:20 AM +0100 1/17/02, J. op den Brouw wrote:
> > <word1> and * not <word2> == <word1> not <word2>
> >
> > Can that be parsed easily?
>
> Not in the current parser.cc. Perhaps I should rephrase that... I'm
> not going to write code to deal with that case and certainly not for
> a "production release."
>
> If someone else thinks they can tackle that type of query in a
> reasonable time-frame and can convince us that it doesn't introduce
> add'l bugs in the query parser, then great. I'm of the opinion that
> the sooner we can go to a new parser, the better.

Well, what I had envisioned is actually a pretty trivial hook in the parser.cc code. It wouldn't optimize the and and or operations as Jesse suggested, but it would just treat the * (or actually whatever prefix_match_character is set to) as a word. All it would take is a simple test in Parser::perform_push() to test if the word in "temp" matches the string in prefix_match_character, and if so, it builds a dummy ResultList with all valid document IDs. All it needs is a method to call to get that list of docIDs - that's the part I need help with. Apart from that, all it would take is a little hook in setupWords() to allow a bare prefix_match_character as a word even though it's shorter than minimum_word_length.

The only simple technique I can think of to get the list of valid docIDs into the parser without actually modifying the parser, is to put a hook in htmerge/words.cc to keep track of all the docIDs and then put in a dummy record into the db.words.db with a list of cooked-up WordRecords. That would work, but it's not as clean as I'd like. A better solution would be the parser hook I mentioned, but then getting that list of docIDs would require looking into one of the other two databases, as you're not going to find it in the word database.

In any case, the eventual outcome must be a ResultList with all valid docIDs, so I don't think that's any more complicated to patch into the parser than right into htsearch's main() function. As Jesse pointed out, combining * with and or or doesn't really get you anything, but it might be nice to be able to do "* not foo". As for the score field in the DocMatch objects in the ResultList, they could be assigned something arbitrarily low, like 1, or text_factor, or text_factor * current->weight.

As an aside, we've always operated under the assumption that the word location affected the word score somehow, but I can't find any code in htsearch that does this. As far as I can tell, when the info is transferred from WordRecords to DocMatches, the location field is completely ignored. Indeed, when I grep for "location" in all the copies of htsearch source I have (right back to 3.0.8b2), the only reference I find to that word is in 3.0.8b2's obsolete htsearch/display.cc module, which isn't used at all, and which contains a section of disabled code that calculates scores based on word location in DocHead. I think it's pretty safe to say the location codes that htdig & htmerge so carefully calculate and manage are entirely useless. Am I missing something here?

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
|
From: Williams, D. A. (DAWilliams) <DAW...@aa...> - 2002-01-18 17:22:16
|
Thanks for the reply. I understand completely about not including it if there is no interest in it. I had trouble getting an earlier 3.2 beta to compile and run (before I tried to monkey with the code) and got pulled off the project for other things. I do hope to be able to work on a 3.2 patch at some point and I'll pass it along, "just in case" the feature seems interesting to anyone else. We had given the wrapper idea some consideration, but it seemed cleaner (and actually pretty easy) to include it in ht://Dig -- there may have been some problems that I don't remember with the wrapper approach. At any rate, thanks for keeping ht://Dig moving forward.

-David
==========================
The views, opinions, and judgments expressed are solely my own. Message contents have not been reviewed or approved by AARP.
|
From: Geoff H. <ghu...@ws...> - 2002-01-18 17:14:27
|
On Fri, 18 Jan 2002, Williams, David A. (DAWilliams) wrote:
> I'm sorry to be late to the party with this. If there is any chance
> that my forced results patch might be included in 3.1.6, I need to point out
> the one bone-headed bug I've run across in it. This section (for htMerge):

I haven't heard many requests for this sort of feature. And while it's interesting, my inclination is to aim for the smallest necessary changes in the production branch. So I'd hesitate to include an optional database like this. (It's not that your code looks bad on a read-through, but there may be unforeseen bugs.) So I would point it towards the 3.2 branch as a possibility. I'm also not sure if this could be better done with a wrapper around htsearch and a separate CGI to retrieve your "suggested links" (even something in Perl would seem to work well).

-Geoff
|
From: Williams, D. A. (DAWilliams) <DAW...@aa...> - 2002-01-18 16:16:48
|
I'm sorry to be late to the party with this. If there is any chance that my forced results patch might be included in 3.1.6, I need to point out the one bone-headed bug I've run across in it. This section (for htMerge):

+
+    ref = db[value];
+    force_data << ref->DocID();
+    if(ref)
+    {

should really be:

+
+    ref = db[value];
+    if(ref)
+    {
+        force_data << ref->DocID();

I've updated the patch file and tar ball of changed files at: http://www.kayakero.net/xfer/htdig/

-David
|
From: Geoff H. <ghu...@ws...> - 2002-01-17 14:00:35
|
At 10:20 AM +0100 1/17/02, J. op den Brouw wrote:
>Maybe with (yet another new option) max_retries which will count the
>retries; if it fails (max_retries is full), then the server is presumed dead.
>But you all know what I mean anyway.

Yes, but each TCP connection is already retried a few times. So you should try a URL over if the server seems dead? How do you do that when there's only one server--wait some length of time and try it again? Maybe. For the sake of 3.1.6 (and getting it out the door), I'd rather go with Gilles' approach--you can turn off the practice of marking a server as dead. Those users who want to keep trying a server will have that ability. The people who complained bitterly that htdig would keep trying a dead server will have the ability to ignore it.

><word1> and * not <word2> == <word1> not <word2>
>
>Can that be parsed easily?

Not in the current parser.cc. Perhaps I should rephrase that... I'm not going to write code to deal with that case and certainly not for a "production release." If someone else thinks they can tackle that type of query in a reasonable time-frame and can convince us that it doesn't introduce add'l bugs in the query parser, then great. I'm of the opinion that the sooner we can go to a new parser, the better.

-Geoff
|
From: J. op d. B. <MSQ...@st...> - 2002-01-17 09:20:37
|
On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> > > 2. way to override "no server" problem
> >
> > I'm not quite sure what you mean by this.
>
> Since 3.1.5, if htdig fails to connect to a server, it sets the "dead
> server" flag and won't try again to contact that server, giving instead a
> bunch of "no server" errors. In most cases, that's the best thing to do,
> but a number of users expressed a preference for the old way, or something
> more fail-safe like waiting a while and trying that server again later.

Maybe with (yet another new option) max_retries which will count the retries; if it fails (max_retries is full), then the server is presumed dead. But you all know what I mean anyway.

> > > 4. a "match all documents" mechanism in htsearch
> >
> > This IMHO, is no small feat unless you hack htsearch to totally bypass the
> > parser and htfuzzy phases for some specific query. (As in, if the query is
> > just '*' and nothing else, it will return all documents and then
> > restrict, exclude, etc. But something like 'foo and * not bar" is subject
> > to the normal parsing.)
> >
> > Again, if this seems like a reasonable workaround, I can write this.

    <word> and * == <word>
    <word> or * == *
    <word1> and * not <word2> == <word1> not <word2>

Can that be parsed easily?

--jesse
--------------------------------------------------------------------
J. op den Brouw                          Johanna Westerdijkplein 75
Haagse Hogeschool                        2521 EN DEN HAAG
Faculty of Engineering                   Netherlands
Electrical Engineering                   +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes
|
From: Gilles D. <gr...@sc...> - 2002-01-16 18:05:04
|
According to Joe R. Jah:
> On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> > According to Geoff Hutchison:
> > > On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> > > > 1. better configure test for regex problems on BSDi
> > >
> > > I don't know if this will ever become "automatic," but it looks like we
> > > can relatively easily have a --with-rx flag to the configure script which
> > > will bypass the included regex code (and use the rx code instead).
> > >
> > > Does this seem like a reasonable workaround?
> >
> > That's reasonable to me. But back in October, you had suggested an
> > automatic test for the machine triplet "*-*-bsdi*". Have you given up on
> > that idea? It seems it's only been BSDI systems that have had problems
> > with the bundled GNU regex code. I think manual overrides with --with-...
> > options are a good idea, but an automatic test for setting the default
> > would be handy and may cut down the number of questions on the list.
>
> I believe --with-rx is the better solution. Administrators of systems
> that have not manifested any problem with the bundled GNU regex code
> would have an option to compare htdig performance with and without it.
> I am not sure if the automatic test for the machine triplet "*-*-bsdi*"
> would work; however, I wouldn't mind having both solutions ;^)

Having both solutions is exactly what I was proposing. The automatic tests
are nice when they work, because they can save a lot of users a lot of
fiddling to get things right, but it's nice to be able to override some of
these tests with a command option should the test fail on a particular
system.

--
Gilles R. Detillieux            E-mail: <gr...@sc...>
Spinal Cord Research Centre     WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:   (204)789-3930
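[Editor's note] The "automatic test" under discussion is just a glob match on the canonical host triplet that configure computes. In configure itself this would be a shell `case` statement; the logic can be sketched in Python for clarity:

```python
from fnmatch import fnmatchcase

def default_to_rx(host_triplet):
    """Return True if the bundled GNU regex should be bypassed by default.

    Only BSDI hosts were reported to have trouble with the bundled regex
    code, so they are detected with the "*-*-bsdi*" triplet pattern
    mentioned in the thread above.
    """
    return fnmatchcase(host_triplet, "*-*-bsdi*")

print(default_to_rx("i386-pc-bsdi4.2"))    # -> True  (BSDI host)
print(default_to_rx("i686-pc-linux-gnu"))  # -> False (unaffected host)
```

A `--with-rx` configure flag would then simply override whatever default this test picks, which is the "both solutions" arrangement Gilles describes.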
From: Joe R. J. <jj...@cl...> - 2002-01-15 23:57:08
|
On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> Date: Tue, 15 Jan 2002 17:05:37 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: Geoff Hutchison <ghu...@ws...>
> Cc: htd...@li...
> Subject: Re: [htdig-dev] Re: Progress towards 3.1.6
>
> According to Geoff Hutchison:
> > On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> > > 1. better configure test for regex problems on BSDi
> >
> > I don't know if this will ever become "automatic," but it looks like we
> > can relatively easily have a --with-rx flag to the configure script which
> > will bypass the included regex code (and use the rx code instead).
> >
> > Does this seem like a reasonable workaround?
>
> That's reasonable to me. But back in October, you had suggested an
> automatic test for the machine triplet "*-*-bsdi*". Have you given up on
> that idea? It seems it's only been BSDI systems that have had problems
> with the bundled GNU regex code. I think manual overrides with --with-...
> options are a good idea, but an automatic test for setting the default
> would be handy and may cut down the number of questions on the list.

I believe --with-rx is the better solution. Administrators of systems
that have not manifested any problem with the bundled GNU regex code
would have an option to compare htdig performance with and without it.
I am not sure if the automatic test for the machine triplet "*-*-bsdi*"
would work; however, I wouldn't mind having both solutions ;^)

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl...

> > > 2. way to override "no server" problem
> >
> > I'm not quite sure what you mean by this.
>
> Since 3.1.5, if htdig fails to connect to a server, it sets the "dead
> server" flag and won't try again to contact that server, giving instead a
> bunch of "no server" errors. In most cases, that's the best thing to do,
> but a number of users expressed a preference for the old way, or something
> more fail-safe like waiting a while and trying that server again later.
>
> > > 4. a "match all documents" mechanism in htsearch
> >
> > This, IMHO, is no small feat unless you hack htsearch to totally bypass the
> > parser and htfuzzy phases for some specific query. (As in, if the query is
> > just '*' and nothing else, it will return all documents and then
> > restrict, exclude, etc. But something like 'foo and * not bar' is subject
> > to the normal parsing.)
> >
> > Again, if this seems like a reasonable workaround, I can write this.
>
> It's certainly reasonable for the purpose in mind (the what's new facility).
> Thanks!
>
> --
> Gilles R. Detillieux            E-mail: <gr...@sc...>
> Spinal Cord Research Centre     WWW:    http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada)     Fax:   (204)789-3930
>
> _______________________________________________
> htdig-dev mailing list
> htd...@li...
> https://lists.sourceforge.net/lists/listinfo/htdig-dev
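[Editor's note] The "wait a while and try that server again later" idea can be sketched as per-server bookkeeping: count consecutive failures, back off between attempts, and only presume the server dead past a limit. All names below are hypothetical, not real htdig internals:

```python
import time

class ServerState:
    """Hypothetical per-server state for the retry idea discussed above.

    Instead of the 3.1.5 one-strike "dead server" flag, a server is only
    presumed dead after max_retries consecutive failures, with a pause
    between attempts.
    """
    def __init__(self, max_retries=3, wait_seconds=30):
        self.max_retries = max_retries
        self.wait_seconds = wait_seconds
        self.failures = 0

    def record_failure(self):
        self.failures += 1

    @property
    def dead(self):
        return self.failures >= self.max_retries

def fetch_with_retries(state, connect):
    """Call connect() until it succeeds or the server is presumed dead."""
    while not state.dead:
        try:
            return connect()
        except OSError:
            state.record_failure()
            if not state.dead:
                time.sleep(state.wait_seconds)
    return None  # server presumed dead; caller logs "no server"
```

This keeps the fail-safe behaviour some users asked for while still giving up eventually, rather than hammering a dead host for the whole dig.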
From: Gilles D. <gr...@sc...> - 2002-01-15 23:05:49
|
According to Geoff Hutchison:
> On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> > 1. better configure test for regex problems on BSDi
>
> I don't know if this will ever become "automatic," but it looks like we
> can relatively easily have a --with-rx flag to the configure script which
> will bypass the included regex code (and use the rx code instead).
>
> Does this seem like a reasonable workaround?

That's reasonable to me. But back in October, you had suggested an
automatic test for the machine triplet "*-*-bsdi*". Have you given up on
that idea? It seems it's only been BSDI systems that have had problems
with the bundled GNU regex code. I think manual overrides with --with-...
options are a good idea, but an automatic test for setting the default
would be handy and may cut down the number of questions on the list.

> > 2. way to override "no server" problem
>
> I'm not quite sure what you mean by this.

Since 3.1.5, if htdig fails to connect to a server, it sets the "dead
server" flag and won't try again to contact that server, giving instead a
bunch of "no server" errors. In most cases, that's the best thing to do,
but a number of users expressed a preference for the old way, or something
more fail-safe like waiting a while and trying that server again later.

> > 4. a "match all documents" mechanism in htsearch
>
> This, IMHO, is no small feat unless you hack htsearch to totally bypass the
> parser and htfuzzy phases for some specific query. (As in, if the query is
> just '*' and nothing else, it will return all documents and then
> restrict, exclude, etc. But something like 'foo and * not bar' is subject
> to the normal parsing.)
>
> Again, if this seems like a reasonable workaround, I can write this.

It's certainly reasonable for the purpose in mind (the what's new facility).
Thanks!

--
Gilles R. Detillieux            E-mail: <gr...@sc...>
Spinal Cord Research Centre     WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:   (204)789-3930
From: Geoff H. <ghu...@ws...> - 2002-01-15 22:49:22
|
On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> 1. better configure test for regex problems on BSDi

I don't know if this will ever become "automatic," but it looks like we
can relatively easily have a --with-rx flag to the configure script which
will bypass the included regex code (and use the rx code instead).

Does this seem like a reasonable workaround?

> 2. way to override "no server" problem

I'm not quite sure what you mean by this.

> 4. a "match all documents" mechanism in htsearch

This, IMHO, is no small feat unless you hack htsearch to totally bypass the
parser and htfuzzy phases for some specific query. (As in, if the query is
just '*' and nothing else, it will return all documents and then
restrict, exclude, etc. But something like 'foo and * not bar' is subject
to the normal parsing.)

Again, if this seems like a reasonable workaround, I can write this.

-Geoff
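[Editor's note] The workaround Geoff describes -- short-circuit before the parser and htfuzzy phases when the query is exactly '*' -- is a one-line special case. A sketch with hypothetical names (htsearch itself is C++):

```python
def run_search(query, all_document_ids, parse_and_match):
    """Sketch of the "match all documents" special case discussed above.

    If the query is exactly '*' (and nothing else), skip parsing and the
    fuzzy phases entirely and return every document; restrict/exclude
    filtering would then apply to this full list. Any other query,
    including ones that merely contain '*', goes through normal parsing
    via parse_and_match().
    """
    if query.strip() == "*":
        return list(all_document_ids)
    return parse_and_match(query)
```

Note how `'foo and * not bar'` never hits the special case and still reaches the normal parser, which is exactly the limitation Geoff points out.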
From: Geoff H. <ghu...@ws...> - 2002-01-15 22:42:36
|
On Tue, 15 Jan 2002, Gilles Detillieux wrote:
> > Would it be possible to remove duplicate pages from the search results
> > before they are output to the html page? This is obviously something that
> > htmerge would do if the databases were to be combined into one.
>
> Good point. This would appear to be a bug in the collections support
> (not the only one!). This should go on our to-do list for 3.2, until
> we can get back to actively developing it.

Yes, as Greg discovered, the collections code just loops over the
databases--it makes no attempt to check that URLs aren't duplicates.
While it will take some work for this, culling the duplicates obviously
speeds up the results (if you do it at the right point, you won't need
to score, etc.).

-Geoff
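[Editor's note] Culling duplicates across collections could be as simple as keeping, per URL, only the best-scoring hit before the results are sorted and displayed. A sketch of that idea (not the actual collections code):

```python
def cull_duplicates(results):
    """Keep one result per URL across multiple collection databases.

    `results` is an iterable of (url, score) pairs as merged from all
    databases; the highest-scoring hit for each URL wins. As noted above,
    doing this before final scoring/sorting avoids wasted work on hits
    that would be discarded anyway.
    """
    best = {}
    for url, score in results:
        if url not in best or score > best[url]:
            best[url] = score
    # Return a display-ready list, best matches first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

For example, a page indexed in three of the pipe-separated databases would show up three times in the raw merged list but only once after culling.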
From: Gilles D. <gr...@sc...> - 2002-01-15 21:35:47
|
According to Greg Lepore:
> I've been experimenting with searching multiple databases from one search
> form via the pipe in 3.2 (11/23 build):
>   <input type="checkbox" name="config"
>    value="aom1|aom2|aom2a|aom3|aom4|aom5|esr">
> (htmerge is not working at the moment). The searches appear to be working
> correctly. One unusual result that I have noticed is that pages which are
> included in multiple databases will appear multiple times in the output. i.e. ...
> Would it be possible to remove duplicate pages from the search results
> before they are output to the html page? This is obviously something that
> htmerge would do if the databases were to be combined into one.

Good point. This would appear to be a bug in the collections support
(not the only one!). This should go on our to-do list for 3.2, until
we can get back to actively developing it.

--
Gilles R. Detillieux            E-mail: <gr...@sc...>
Spinal Cord Research Centre     WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:   (204)789-3930
From: Gilles D. <gr...@sc...> - 2002-01-15 19:42:58
|
According to Geoff Hutchison:
> I've been in the middle of a few projects and haven't turned towards
> ht://Dig much. But I took a look at the checklist I had for 3.1.6 and I
> wasn't sure what needed to be done besides:
>
> * Release notes
> * Maindocs merges
> * Regex/RX issues
>
> And of course the usual sort of pre-release testing and beating... Is this
> all that's left?

Here's my own checklist:

1. better configure test for regex problems on BSDi
2. way to override "no server" problem
3. handle noindex_start & noindex_end as string lists in HTML parser
4. a "match all documents" mechanism in htsearch
5. a way of specifying relative date ranges in htsearch
6. release notes
7. merge maindocs updates into htdoc and vice versa

I was going to tackle 5 today or maybe tomorrow. I'll see if I can come up
with something for 3 after that. I'd really appreciate it if you did 1, 4,
6 & 7. I'm not sure about 2, as there isn't an easy fix to make it work
well, so the best that could be done quickly may be just an attribute to
make it revert to the pre-3.1.5 behaviour.

Some of the testing that's needed most is trying out the last few
attributes I've added, as all I've done so far is make sure the code
compiles and runs with my current set of attributes on my site. So, I
haven't really tested description_meta_tag_names and translate_latin1 at
all, nor the new code for use_doc_date. I was also hoping to hear back
from Alexander Lebedev about completing the audit of questionable
3-letter word expansions in english.0 (from L to Z).

--
Gilles R. Detillieux            E-mail: <gr...@sc...>
Spinal Cord Research Centre     WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:   (204)789-3930
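[Editor's note] For item 5, a relative date range ("everything from the last N days") reduces to computing a cutoff timestamp and comparing document dates against it. A sketch of the idea -- the eventual htsearch input-parameter names were not decided at this point, so these function names are hypothetical:

```python
from datetime import datetime, timedelta

def cutoff_for_relative_range(days_back, now=None):
    """Return the earliest acceptable document date for "last N days"."""
    now = now or datetime.now()
    return now - timedelta(days=days_back)

def within_range(doc_date, days_back, now=None):
    """True if doc_date falls inside the relative date range."""
    return doc_date >= cutoff_for_relative_range(days_back, now)
```

The appeal of relative ranges over absolute start/end dates is that a "what's new this week" search form never needs its hidden date fields updated.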