|
From: Neal R. <ne...@ri...> - 2003-10-14 20:55:37
|
On Tue, 14 Oct 2003, Gilles Detillieux wrote:
> when it's not supposed to. However, in the case of head_before_get,
> I believe the question was raised, but not actually answered, as to
> whether this attribute even needs to be here.
[Snip]
> head_before_get in this latter case. Doesn't keeping this attribute
> just add to code bloat, user confusion, and potential inefficiencies
> for no apparent benefit? Maybe I'm missing something here, as I'm
> not as versed in HTTP/1.1 as you are. It seems to me that htdig should
> always be doing a HEAD before a GET when doing incremental digs through
> persistent connections.
I suppose you could make a case that it is useful during the development
of the spidering code to always see this header separately.. This marginal
debugging benefit could be handled well with #ifdefs.
I'm with you on this one.. we should just kill head_before_get. I would
vote for killing it instead of hacking the logic.
And we should probably be on alert during this process to think about
killing any other configs that look unneeded.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gilles D. <gr...@sc...> - 2003-10-14 19:46:09
|
According to Neal Richter:
> On Tue, 14 Oct 2003, Gilles Detillieux wrote:
> > Speaking of committing to CVS, Neal, what's up with your request to hold off?
...
> > What was the busy work other than adding bug tracker categories, which
> > you mentioned in your earlier message? Are you done and is it OK to
> > commit now? I noticed some people have been doing so.
>
> Nothing really.. the idea was at this point we should have
> a 'Include_in_3.2' bug associated with each 'commit' until we release.
> This is purely for the purpose of 'tracking' the bugs & fixes.
>
> I was wanting to get the 'Include_in_3.2' group added first... and clean
> up the bug list. That's done.
>
> No offense intended there... I was just trying to get some measure of
> organization to the 'Feature-Freeze' state we are in now.
>
> ie.. during 'feature-freeze' there should be a bug created for each
> issue, and any commits should list the bug number in the commit message.
>
> This isn't an attempt to exert control.. just to help organize the
> process.
No offense taken at all. I was just a bit puzzled as to the reason and
the status of your request to hold off. This helps clarify things a lot.
I'm all for organizing the process better, but I think what you just said
needed to be stated explicitly so we all understand what the process ought
to be. Thanks.
--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
|
|
From: Gilles D. <gr...@sc...> - 2003-10-14 19:40:12
|
According to Gabriele Bartolini:
> > Are you sure? I override current config parameters using this type of
> >call inside libhtdig.. works fine.
>
> I am pretty sure. So far indeed, the 'head_before_get' has always been a
> sub-option of the persistent connection feature, which means that head
> before get is not handled if persistent connections are off.
>
> Also, the option works on a per-server configuration as well, which will
> eventually override the global setting.
It would be a good idea, in the general case, for us all to learn how to
properly override config parameters in the code, so that a server block
or URL block definition doesn't override an internal override
when it's not supposed to. However, in the case of head_before_get,
I believe the question was raised, but not actually answered, as to
whether this attribute even needs to be here. If it's only used when
persistent connections are on, and will be fixed to be turned off when
doing an initial dig, then it will only ever take effect when doing an
update (or incremental) dig with persistent connections turned on.
I can't for the life of me imagine any benefit of turning off
head_before_get in this latter case. Doesn't keeping this attribute
just add to code bloat, user confusion, and potential inefficiencies
for no apparent benefit? Maybe I'm missing something here, as I'm
not as versed in HTTP/1.1 as you are. It seems to me that htdig should
always be doing a HEAD before a GET when doing incremental digs through
persistent connections.
By the way, Gabriele, good call on the Accept-Encoding header. It's a
simple, elegant fix to a troublesome bug. You're right that adding
support for gzip encoding is a feature request, and not a bug fix,
and should be done after the upcoming release (not before). Good work.
--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
|
|
From: Neal R. <ne...@ri...> - 2003-10-14 19:37:01
|
On Tue, 14 Oct 2003, Gilles Detillieux wrote:
> Speaking of committing to CVS, Neal, what's up with your request to hold off?
[Snip]
> What was the busy work other than adding bug tracker categories, which
> you mentioned in your earlier message? Are you done and is it OK to
> commit now? I noticed some people have been doing so.
Nothing really.. the idea was at this point we should have
a 'Include_in_3.2' bug associated with each 'commit' until we release.
This is purely for the purpose of 'tracking' the bugs & fixes.
I was wanting to get the 'Include_in_3.2' group added first... and clean
up the bug list. That's done.
No offense intended there... I was just trying to get some measure of
organization to the 'Feature-Freeze' state we are in now.
ie.. during 'feature-freeze' there should be a bug created for each
issue, and any commits should list the bug number in the commit message.
This isn't an attempt to exert control.. just to help organize the
process.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gilles D. <gr...@sc...> - 2003-10-14 18:01:44
|
According to Lachlan Andrew:
> On reflection, I think the behaviour that seems to have been intended
> is better. I've filed a bug report (with patch) to implement:
>
> 1. If allow_numbers is false, words must contain at least one
> non-digit (2001 not a word, X11 is).
> 2. If allow_numbers is true, digits are equivalent to letters.
>
> Comments/testing welcome.
I would agree that this is the desirable behaviour, and it is what 3.1.x
implements. Somewhere in the creation of the WordType class in 3.2, a
few errors were made in porting over the logic of the WordList class in 3.1.
The logical error was in assuming that IsStrictChar() returned false for
digits, when it in fact returns true. Without actually testing your patch
beyond a visual "walk-through", the new logic appears to be correct.
That the WordType class read allow_numbers as Value rather than Boolean was
just bizarre, but I guess an understandable oversight.
I got thoroughly confused in reading your patch, though, because it is
reversed, with the new code appearing in the first file and the old code
in the second, rather than the other way around. Taking that into account,
though, the patch seems right to me. I think it should be committed ASAP.
Speaking of committing to CVS, Neal, what's up with your request to hold off?
According to Neal Richter:
>> Please make the fix but hold off committing it... see
>> previous message, I need to do some busy work on the sourceforge site
>> first.... and we need to wait to hear from anyone with objections ;-)
What was the busy work other than adding bug tracker categories, which
you mentioned in your earlier message? Are you done and is it OK to
commit now? I noticed some people have been doing so.
--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
|
|
From: Neal R. <ne...@ri...> - 2003-10-14 17:49:35
|
Hey all,
Would one of the Solaris developers please investigate these Bugs?
799938 configuration failed on Solaris 2.8
820139 make failing after configure
820139 is slightly different now, he can't get configure to find libstdc++
Basically what is the current state of the build on Solaris 2.8 with
various GCC versions? Do they need to update the automake/autoconf tools
on their system?
Thanks!
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Ted Stresen-R. <ted...@ma...> - 2003-10-14 13:23:23
|
I'm happy to do testing on Mac OS X. I have a Linux box at my disposal
but it is a production machine so it is hard to test on. When
requesting help testing, please provide a test case (the steps one must
take to complete the test) and the intended behavior (so the testers
know what to look for and what shouldn't be appearing).
I try to follow all the threads on this list but lately I seem to be
missing every other message! Bear with me...
Thanks,
Ted Stresen-Reuter
On Monday, October 13, 2003, at 05:37 PM, Lachlan Andrew wrote:
> Greetings all,
>
> On reflection, I think the behaviour that seems to have been intended
> is better. I've filed a bug report (with patch) to implement:
>
> 1. If allow_numbers is false, words must contain at least one
> non-digit (2001 not a word, X11 is).
> 2. If allow_numbers is true, digits are equivalent to letters.
>
> Comments/testing welcome.
>
> Cheers,
> Lachlan
>
> On Tue, 14 Oct 2003 07:59, Neal Richter wrote:
>> Sounds good to me.
>>
>> On Mon, 13 Oct 2003, Lachlan Andrew wrote:
>>> 1. If allow_numbers is true then digits are treated the same as
>>> extra_word_characters.
>>> 2. If allow_numbers is false, then digits are treated as
>>> ("invalid") punctuation.
>>> 3. The default be changed to allow_numbers=true (which is
>>> compatible with the current buggy default behaviour).
>
> --
> lh...@us...
> ht://Dig developer DownUnder (http://www.htdig.org)
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> SourceForge.net hosts over 70,000 Open Source Projects.
> See the people who have HELPED US provide better services:
> Click here: http://sourceforge.net/supporters.php
> _______________________________________________
> ht://Dig Developer mailing list:
> htd...@li...
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-dev
>
|
|
From: Lachlan A. <lh...@us...> - 2003-10-14 12:01:40
|
Greetings,
Thanks for filing the bug report.
As I said in the follow-up, this seems to work for me. Could you
please tell me the exact query, and the site you are searching? That
way I can try to reproduce the bug.
Thanks,
Lachlan
On Tue, 14 Oct 2003 09:53, you wrote:
> When I do a request between double quotes and
> beginning by a badword, HtSearch doesn't find any results,
> but if I begin with a valid keyword, it works great.
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
|
From: Gabriele B. <bar...@in...> - 2003-10-14 05:30:45
|
At 18.10 13/10/2003 -0600, Neal Richter wrote:
>On Tue, 14 Oct 2003, Gabriele Bartolini wrote:
> > 3) Fixing the 'Accept-Encoding' bug of the HTTP/1.1 protocol (see bug
> > #594790). My solution, waiting to correctly handle gzipped contents (if you
> > think it is reasonable doing it now, I could think about that) is just to
> > send an empty 'Accept-Encoding' header, which lets the server know that our
> > user agent is only able to manage document with the 'identity' encoding.
> > Otherwise - the actual case - if no header is sent, the server assumes we
> > can handle every kind of encoding.
>
>I'll leave this up to you. Does your fix solve the problem completely?
I have committed the simple patch, which solves the bug #594790. Now the
request - according to me - becomes a feature request, that is to say to
handle the compressed contents. This should not be a problem, but IMHO could
lead to some portability problems that I would rather avoid now. Indeed, we
should enable it only if zlib is present. Let me know.
Now, the HTTP request always includes an empty Accept-Encoding header (this
informs the server that htdig is only able to manage documents that are not
encoded). Before, no Accept-Encoding was sent, letting the server assume that
the client was capable of handling every content encoding - i.e. zipped
documents with Apache's mod_gzip module.
>1) Is your potential code fix large?
The fix is extremely simple (one line) and should not lead to problems.
>2) Can it be fixed in a timely manner?
Yep. Done.
>3) Is it important enough to delay 3.2?
I refer this question to the feature request. I think this is not urgent and
I consider 3.2 more important than adding this feature, even though I suggest
putting it into the TODO list for the next release.
>4) Does the bug affect enough users to be important in the short-term?
Potentially yes. Indeed, for mod_gzip Apache servers this could have led to
some problems.
Ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
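For readers following the Accept-Encoding fix, here is a minimal sketch of a request that carries the empty header. It is an illustration only, not the one-line change committed to the ht://Dig HTTP classes: build_request is a hypothetical helper, and the host and path in the usage note are placeholders.

```cpp
#include <string>

// Hypothetical helper (not ht://Dig code): builds a minimal HTTP/1.1 request.
std::string build_request(const std::string& method,
                          const std::string& host,
                          const std::string& path) {
    std::string req;
    req += method + " " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    // Header present but with an empty value: the client accepts only the
    // 'identity' (unencoded) content coding. Leaving the header out entirely
    // would let the server assume any encoding is acceptable.
    req += "Accept-Encoding:\r\n";
    req += "\r\n";  // blank line ends the header section
    return req;
}
```

Calling build_request("HEAD", "www.example.org", "/index.html") yields a request whose header section includes the line "Accept-Encoding:" with nothing after the colon.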
|
From: Gabriele B. <bar...@in...> - 2003-10-14 05:23:43
|
> Are you sure? I override current config parameters using this type of
>call inside libhtdig.. works fine.
I am pretty sure. So far indeed, the 'head_before_get' has always been a
sub-option of the persistent connection feature, which means that head
before get is not handled if persistent connections are off.
Also, the option works on a per-server configuration as well, which will
eventually override the global setting.
The workaround I have written - included in the attached patch - is to add
a piece of information to both the Retriever and the Document classes and to
activate (or deactivate) the HeadBeforeGet feature of the HTTP-ish classes
according to:
- the server settings
- the fact that we are now performing an incremental indexing
Basically, htdig sets a RetrieverType variable to be 'Initial' or
'Incremental' in the Retriever object. This object then informs the
Document class that we are performing (or not) an initial indexing. When
retrieving the document, the Document class now always checks for the head
before get option but always activates it when the indexing is set to
'incremental'.
I also added a few lines to the defaults.cc file.
Please give it a look and tell me if the patch sounds good. If yes, I will
apply as soon as possible. I made a few simple tests and it works.
Let me know,
-Gabriele
|
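One reading of the workaround described above, as a minimal sketch rather than the attached patch. RetrieverType and its two values come from the message; ServerSettings, its fields, and use_head_before_get are hypothetical names, and letting an initial dig follow the configured value is an assumption.

```cpp
// Kind of dig being run, as named in the message above.
enum class RetrieverType { Initial, Incremental };

// Hypothetical per-server settings (names are assumptions, not ht://Dig's).
struct ServerSettings {
    bool persistent_connections;  // head_before_get is a sub-option of this
    bool head_before_get;         // configured value for this server
};

// Decide whether to send a HEAD before the GET for a document.
bool use_head_before_get(RetrieverType type, const ServerSettings& s) {
    if (!s.persistent_connections)
        return false;              // not handled without persistent connections
    if (type == RetrieverType::Incremental)
        return true;               // always activated for incremental digs
    return s.head_before_get;      // initial dig: follow the configured option
}
```

Keeping the decision in one function means the per-server value cannot silently override the incremental-dig rule, which is the concern raised about server and URL blocks.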
|
From: Neal R. <ne...@ri...> - 2003-10-14 00:12:38
|
On Tue, 14 Oct 2003, Gabriele Bartolini wrote:
>
> >As of now there are two immediate bugs
> >1) Fixing head_before_get to be default true and override (to false)
> > during a '-i' reindex
> >2) Fixing allow_numbers bug
>
> 3) Fixing the 'Accept-Encoding' bug of the HTTP/1.1 protocol (see bug
> #594790). My solution, waiting to correctly handle gzipped contents (if you
> think it is reasonable doing it now, I could think about that) is just to
> send an empty 'Accept-Encoding' header, which lets the server know that our
> user agent is only able to manage document with the 'identity' encoding.
> Otherwise - the actual case - if no header is sent, the server assumes we
> can handle every kind of encoding.
I'll leave this up to you. Does your fix solve the problem completely?
Let's all evaluate bugs at this stage according to:
1) Is your potential code fix large?
2) Can it be fixed in a timely manner?
3) Is it important enough to delay 3.2?
4) Does the bug affect enough users to be important in the short-term?
Thanks.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Neal R. <ne...@ri...> - 2003-10-14 00:08:07
|
On Tue, 14 Oct 2003, Gabriele Bartolini wrote:
>
> >htdig.cc:156 config->Read(configFile);
> >
> >if(initial > 0)
> > config->Add ("head_before_get", FALSE);
>
> I will. I also think this is not enough. I have to kind of use a 'force'
> variable in order to override every inner specification (i.e. blocks). I
> was thinking of using a class variable for the HTTP class. I will give a
> look at the code this morning.
Are you sure? I override current config parameters using this type of
call inside libhtdig.. works fine.
Thanks.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gabriele B. <bar...@in...> - 2003-10-14 00:01:08
|
>htdig.cc:156 config->Read(configFile);
>
>if(initial > 0)
> config->Add ("head_before_get", FALSE);
I will. I also think this is not enough. I have to kind of use a 'force'
variable in order to override every inner specification (i.e. blocks). I
was thinking of using a class variable for the HTTP class. I will give a
look at the code this morning.
Ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
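The 'force' idea above can be sketched as a lookup order in which internal overrides win over server blocks, which in turn win over the global setting. This is only an illustration under stated assumptions: ConfigSketch and its members are hypothetical and do not mirror ht://Dig's configuration classes, and the server name in the usage note is a placeholder.

```cpp
#include <map>
#include <string>

// Hypothetical configuration holder (not ht://Dig's Configuration class).
struct ConfigSketch {
    std::map<std::string, std::string> forced;   // internal overrides set by the program
    std::map<std::string, std::map<std::string, std::string>> server_blocks;  // per-server values
    std::map<std::string, std::string> global;   // values from the main config file

    // Look up `name` for `server`: forced first, then server block, then global.
    // Returns an empty string when the attribute is not set anywhere.
    std::string find(const std::string& server, const std::string& name) const {
        auto f = forced.find(name);
        if (f != forced.end()) return f->second;
        auto s = server_blocks.find(server);
        if (s != server_blocks.end()) {
            auto v = s->second.find(name);
            if (v != s->second.end()) return v->second;
        }
        auto g = global.find(name);
        return g != global.end() ? g->second : std::string();
    }
};
```

With head_before_get set to "true" in a block for www.example.org, putting "false" into forced makes find() return "false" for that server as well, which is the behaviour a plain Add() on the global settings would not give.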
|
|
From: Gabriele B. <bar...@in...> - 2003-10-13 23:57:14
|
>As of now there are two immediate bugs
>1) Fixing head_before_get to be default true and override (to false)
> during a '-i' reindex
>2) Fixing allow_numbers bug
3) Fixing the 'Accept-Encoding' bug of the HTTP/1.1 protocol (see bug
#594790). My solution, waiting to correctly handle gzipped contents (if you
think it is reasonable doing it now, I could think about that) is just to
send an empty 'Accept-Encoding' header, which lets the server know that our
user agent is only able to manage document with the 'identity' encoding.
Otherwise - the actual case - if no header is sent, the server assumes we
can handle every kind of encoding.
Any comments?
Ciao ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Neal R. <ne...@ri...> - 2003-10-13 23:42:56
|
On Tue, 14 Oct 2003, Gabriele Bartolini wrote:
>
> >If you find a bug during testing
> > 1) Post description of bug to devlist
> > 2) Submit a Sourceforge bug.
> > 3) If the bug is deemed serious we'll mark it as 'Include-in-3.2RC1'
> > category
> > 4) Someone Fix the Bug!
> > 5) When a fix is committed the bug should be retested by a third person
> > 6) When fix is verified, the Sourceforge bug is marked as 'Solved'
>
> Sounds good.
>
> >As of now there are two immediate bugs
> >1) Fixing head_before_get to be default true and override (to false)
> > during a '-i' reindex
>
> I volunteer for this bug. Is it ok?
Great! Add something after line 156 of htdig.cc
htdig.cc:156 config->Read(configFile);
if(initial > 0)
config->Add ("head_before_get", FALSE);
Please add a bug with the Group 'Include-in-3.2', commit the fix and I'll
be the verifier on this one.
Thanks!
> Ciao ciao
> -Gabriele
> --
> Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
> maintainer
> Current Location: Melbourne, Victoria, Australia
> bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
> Inferno
>
>
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Lachlan A. <lh...@us...> - 2003-10-13 22:44:58
|
Greetings all,
On reflection, I think the behaviour that seems to have been intended
is better. I've filed a bug report (with patch) to implement:
1. If allow_numbers is false, words must contain at least one
non-digit (2001 not a word, X11 is).
2. If allow_numbers is true, digits are equivalent to letters.
Comments/testing welcome.
Cheers,
Lachlan
On Tue, 14 Oct 2003 07:59, Neal Richter wrote:
> Sounds good to me.
>
> On Mon, 13 Oct 2003, Lachlan Andrew wrote:
> > 1. If allow_numbers is true then digits are treated the same as
> > extra_word_characters.
> > 2. If allow_numbers is false, then digits are treated as
> > ("invalid") punctuation.
> > 3. The default be changed to allow_numbers=true (which is
> > compatible with the current buggy default behaviour).
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
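The two rules proposed at the top of this message can be sketched as a single check. This is a minimal illustration, not the patch filed with the bug report: is_acceptable_word is a hypothetical name, and the standard <cctype> classifiers stand in for the WordType character tests.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the word check discussed above (not WordType code).
// Returns true if `word` should be accepted as an indexable word.
bool is_acceptable_word(const std::string& word, bool allow_numbers) {
    if (word.empty()) return false;
    bool has_non_digit = false;
    for (unsigned char c : word) {
        if (std::iscntrl(c)) return false;        // control characters are always rejected
        if (!std::isdigit(c)) has_non_digit = true;
    }
    // allow_numbers true: digits count as letters, so an all-digit word passes.
    // allow_numbers false: at least one non-digit is required.
    return allow_numbers || has_non_digit;
}
```

With allow_numbers false, "X11" is accepted and "2001" is rejected; with allow_numbers true, both are accepted.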
|
|
From: Gabriele B. <bar...@in...> - 2003-10-13 22:27:27
|
>If you find a bug during testing
> 1) Post description of bug to devlist
> 2) Submit a Sourceforge bug.
> 3) If the bug is deemed serious we'll mark it as 'Include-in-3.2RC1'
> category
> 4) Someone Fix the Bug!
> 5) When a fix is committed the bug should be retested by a third person
> 6) When fix is verified, the Sourceforge bug is marked as 'Solved'
Sounds good.
>As of now there are two immediate bugs
>1) Fixing head_before_get to be default true and override (to false)
> during a '-i' reindex
I volunteer for this bug. Is it ok?
Ciao ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Neal R. <ne...@ri...> - 2003-10-13 22:01:14
|
Sounds good to me.
Please make the fix but hold off committing it... see
previous message, I need to do some busy work on the sourceforge site
first.... and we need to wait to hear from anyone with objections ;-)
Thanks.
On Mon, 13 Oct 2003, Lachlan Andrew wrote:
> Greetings all,
>
> I have a question about the interpretation of allow_numbers.
> If allow_numbers is false, should digits be considered separators?
> Looking at the code, it seems someone wanted to say that "3G", "Y2K"
> and "X11" would be words, even if allow_numbers is false, because
> they contain at least one letter:
>
> int alpha = 0;
> for(const unsigned char *p =
> (const unsigned char*)(const char*)(char *)word; *p; p++) {
> if(IsStrictChar(*p) || (allow_numbers && IsDigit(*p))) {
> alpha = 1;
> } else if(IsControl(*p)) {
> return status | WORD_NORMALIZE_CONTROL;
> }
> }
>
> //
> // Reject if contains no alpha characters
> //
> if(!alpha) return status | WORD_NORMALIZE_NOALPHA;
>
>
>
> Current behaviour is to *ignore* allow_numbers and to default to
> treating digits as letters [since WORD_TYPE_DIGIT is included in
> IsChar() and IsStrictChar()].
>
> I propose the following behaviour:
>
> 1. If allow_numbers is true then digits are treated the same as
> extra_word_characters.
> 2. If allow_numbers is false, then digits are treated as ("invalid")
> punctuation.
> 3. The default be changed to allow_numbers=true (which is
> compatible with the current buggy default behaviour).
>
> Any objections?
>
> Lachlan
>
> On Sat, 11 Oct 2003 05:56, Neal Richter wrote:
>
> > Everyone: Please let me know what kind of time you'd be willing to
> > put in to get this stuff tested??!!
>
> --
> lh...@us...
> ht://Dig developer DownUnder (http://www.htdig.org)
>
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Neal R. <ne...@ri...> - 2003-10-13 21:59:37
|
Hey all,
So we're in Feature-Freeze for 3.2RC1 starting today.
Here's my TODO list
1) Create a Sourceforge task to track progress of testing
2) Solicit help testing
3) Institute bug-tracking procedure
4) Evaluate current bugs in Sourceforge for inclusion in 3.2RC1.
Bug-tracking Procedure:
If you find a bug during testing
 1) Post description of bug to devlist
 2) Submit a Sourceforge bug.
 3) If the bug is deemed serious we'll mark it as 'Include-in-3.2RC1'
 category
 4) Someone Fix the Bug!
 5) When a fix is committed the bug should be retested by a third person
 6) When fix is verified, the Sourceforge bug is marked as 'Solved'
I'll be adding this category to our Sourceforge bug-tracker soon..
As of now there are two immediate bugs
1) Fixing head_before_get to be default true and override (to false)
 during a '-i' reindex
2) Fixing allow_numbers bug
Thanks!
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Lachlan A. <lh...@us...> - 2003-10-13 13:35:07
|
Greetings all,
I have a question about the interpretation of allow_numbers.
If allow_numbers is false, should digits be considered separators?
Looking at the code, it seems someone wanted to say that "3G", "Y2K"
and "X11" would be words, even if allow_numbers is false, because
they contain at least one letter:
int alpha = 0;
for(const unsigned char *p =
      (const unsigned char*)(const char*)(char *)word; *p; p++) {
  if(IsStrictChar(*p) || (allow_numbers && IsDigit(*p))) {
    alpha = 1;
  } else if(IsControl(*p)) {
    return status | WORD_NORMALIZE_CONTROL;
  }
}
//
// Reject if contains no alpha characters
//
if(!alpha) return status | WORD_NORMALIZE_NOALPHA;
Current behaviour is to *ignore* allow_numbers and to default to
treating digits as letters [since WORD_TYPE_DIGIT is included in
IsChar() and IsStrictChar()].
I propose the following behaviour:
1. If allow_numbers is true then digits are treated the same as
extra_word_characters.
2. If allow_numbers is false, then digits are treated as ("invalid")
punctuation.
3. The default be changed to allow_numbers=true (which is
compatible with the current buggy default behaviour).
Any objections?
Lachlan
On Sat, 11 Oct 2003 05:56, Neal Richter wrote:
> Everyone: Please let me know what kind of time you'd be willing to
> put in to get this stuff tested??!!
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
|
From: Lachlan A. <lh...@us...> - 2003-10-12 07:34:25
|
Greetings all,
It seems that the problem is that words in the word database have a
couple of numeric fields appended to them. My guess is that these
are new since 3.1, but I don't know what they are for.
The solution seems to be to write a second compare function which only
compares the string component of the keys.
Comments, anyone?
BTW, my internet connection has been very flaky of late, so apologies
if I'm slow in replying to anything.
Cheers,
Lachlan
On Sat, 11 Oct 2003 12:33, Lachlan Andrew wrote:
> "Speling" generates permuted forms of the query term, and check
> each to see if it is in the database, like so:
> if (!wordDB.Exists(initial)) // Seems weird, but this is correct
>
> The problem is that, whether or not the permutation is in the
> database, wordDB.Exists(...) always seems to return -1 ("true"),
> which means that the 'if' always fails (despite the comment saying
> it is correct :)
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
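The second compare function suggested above might look roughly like this. It is a sketch under loud assumptions: the key layout (a word followed by two numeric fields) and every name here are hypothetical, since the message itself says the purpose of the numeric fields is unknown.

```cpp
#include <string>

// Hypothetical key layout: the word plus two appended numeric fields.
// The field names are assumptions made for illustration only.
struct WordKeySketch {
    std::string word;
    unsigned int field_a;
    unsigned int field_b;
};

// Compare only the string component, ignoring the numeric fields.
// Returns <0, 0, or >0 in the manner of strcmp.
int compare_word_only(const WordKeySketch& a, const WordKeySketch& b) {
    return a.word.compare(b.word);
}
```

An existence test built on such a comparison would treat two keys with the same word but different numeric fields as a match, which is what the fuzzy lookup needs.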
|
|
From: Lachlan A. <lh...@us...> - 2003-10-11 02:46:19
|
Greetings all,
I propose that we call the next stage "feature freeze", rather than
"code freeze", since we'll be changing the code as we fix bugs.
After that, we could have a week of actual code-freeze, so we can all
re-run our tests on the final product.
Attached is a more completed breakdown of attributes to test. I'll
take those under the headings which I've put (lha) after, and have
already verified those ending with lha.
I agree with Neal that we really need people to say what testing they
have time for, and to pick one or more groups of attributes. Can I
ask one of our multi-lingual developers to test the
internationalisation (even though those attributes are also all in
other groups)?
FYI, I'm also attaching a couple of scripts that I eventually want to
put in .../test -- one is a utility to set an attribute in a
temporary config file, and the other is what I've been using to test
the "fuzzy" rules. (It needs some other files to run, but could form
a template for other people's testing.) It would be nice to build up
a complete test suite to make the next release easier.
Thanks in advance to all those who can chip in.
Lachlan
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
|
From: Lachlan A. <lh...@us...> - 2003-10-11 02:35:53
|
Greetings all,
I have been testing the fuzzy algorithms, and can't get "speling" to
work, and I don't understand "regex".
"Speling" generates permuted forms of the query term, and checks each
to see if it is in the database, like so:
// First transposes
// (these are really common)
initial = stripped;
char temp = initial[pos];
initial[pos] = initial[pos+1];
initial[pos+1] = temp;
if (!wordDB.Exists(initial)) // Seems weird, but this is correct
    words.Add(new String(initial));
// Now let's do deletions
The problem is that, whether or not the permutation is in the
database, wordDB.Exists(...) always seems to return -1 ("true"),
which means that the 'if' always fails (despite the comment saying it
is correct :)
"Regex" has problems with the meta characters being stripped out
before getting to the algorithm. Should the meta characters be part
of "extra_word_characters"? If so, what happens if "regex" and
"exact" are both specified in search_algorithms? When I set
"extra_word_characters=.*[^]\\$", the query ".*vers[ia].*" expands to
all the *numbers* in the word database.
Has anyone got speling or regex to work, or is the person who
wrote either of them still here?
Thanks!
Lachlan
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
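For anyone testing this, the transposition step can be sketched on its own with an explicit boolean membership test, which avoids the confusion over what Exists() returns. This is a minimal illustration, not the Speling source: transposed_matches is a hypothetical name and a std::set stands in for the word database.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: swap each pair of adjacent characters in `word` and
// keep the variants that are present in `dictionary`.
std::vector<std::string> transposed_matches(const std::string& word,
                                            const std::set<std::string>& dictionary) {
    std::vector<std::string> matches;
    for (std::size_t pos = 0; pos + 1 < word.size(); ++pos) {
        std::string candidate = word;
        std::swap(candidate[pos], candidate[pos + 1]);
        // Skip swaps of identical letters, which reproduce the original word.
        if (candidate != word && dictionary.count(candidate) > 0)
            matches.push_back(candidate);
    }
    return matches;
}
```

A test case along these lines (known dictionary, known misspelling, expected expansions) is the kind of thing that could go into the test directory for the fuzzy rules.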
|
|
From: Neal R. <ne...@ri...> - 2003-10-10 22:17:33
|
Keep pounding. I didn't use .NET compilers to make these files, so I'm not
surprised you see these errors.

Before you do lots of changes to the code... isn't there a cl switch to accept
the older C++ no-namespace idioms?

On Fri, 10 Oct 2003 st...@ei... wrote:
> Thanx for the suggestion. Here's what I get:
>
> ================
> cl -nologo -W3 -DZLIB_DLL -MD -I../include -DDBUG_OFF -D_WIN32 -DWIN32
>   -D__WIN32__ -IL:/win32/include/zlib
>   -DDEFAULT_CONFIG_FILE=\"c:\htdig\demo.conf\"
>   -DCOMMON_DIR=\"c:\htdig\demo.db\templates\"
>   -DBIN_DIR=\"c:\htdig\demo.db\bin\" -DCONFIG_DIR=\"c:\htdig\"
>   -DIMAGE_URL_PREFIX=\"/rnt/rnm/img\" -DDATABASE_DIR=\"c:\htdig\demo.db\"
>   -Fowin32/dirent_local.obj -c dirent_local.c
> dirent_local.c
> dirent_local.c(26) : fatal error C1083: Cannot open include file: 'iostream.h': No such file or directory
> make[1]: *** [win32/dirent_local.obj] Error 2
> make[1]: Leaving directory `/home/htdig320b4/htdig-3.2.0b4-20031005/db'
> make: *** [db.build] Error 2
> ================
>
> I'm using the 2003 .NET compiler (Vc7/cl.exe), and it doesn't have iostream.h
> or ostream.h headers (only the C++ versions iostream and ostream). Commenting
> out the '#include <iostream.h>' line in dirent_local.c (probably a bad idea)
> appears to let compilation continue through the BDB code without errors. The
> next error is as follows:
>
> ================
> cl -DHAVE_CONFIG_H -I../db -I. -I../htword -I../htcommon -nologo -W3
>   -DZLIB_DLL -MD -I../include -DDBUG_OFF -D_WIN32 -DWIN32 -D__WIN32__
>   -IL:/win32/include/zlib
>   -DDEFAULT_CONFIG_FILE=\"c:\htdig\demo.conf\"
>   -DCOMMON_DIR=\"c:\htdig\demo.db\templates\"
>   -DBIN_DIR=\"c:\htdig\demo.db\bin\" -DCONFIG_DIR=\"c:\htdig\"
>   -DIMAGE_URL_PREFIX=\"/rnt/rnm/img\" -DDATABASE_DIR=\"c:\htdig\demo.db\"
>   -GX -Fowin32/Configuration.obj -c /Tp Configuration.cc
> Configuration.cc
> c:\cygwin\home\htdig320b4\htdig-3.2.0b4-20031005-B\htlib\htString.h(28) : fatal error C1083: Cannot open include file: 'iostream.h': No such file or directory
> make[1]: *** [win32/Configuration.obj] Error 2
> make[1]: Leaving directory `/home/htdig320b4/htdig-3.2.0b4-20031005-B/htlib'
> make: *** [htlib.build] Error 2
> ================
>
> I attempted to make use of several suggested fixes for htString.h regarding
> ostream and the std namespace from the mailing lists, but none of them seemed
> to work. The issue seems to be that HAVE_STD and HAVE_NAMESPACES checks at the
> top of the file aren't working correctly. I was sure to copy over
> your .h.win32 headers as specified.
>
> With a little tweaking around, it looks like the compile will continue *IF* I
> can get the std namespace working for the appropriate references, and *IF* I
> can get the files to reference iostream and ostream instead of the .h
> versions. Does this sound like the right path? Lots of other files have this
> same conditional block. Any ideas?
>
> Cheers!!
>
>
> > Please try this in a cygwin shell:
> >
> > cp ./db/db.h.win32 ./db/db.h
> > cp ./db/db_config.h.win32 ./db/db_config.h
> > cp ./include/htconfig.h.win32 ./include/htconfig.h
> >
> > make -f Makefile.win32
> >
> > That should fire off a build using Microsoft's compilers...
> >
> > Thanks! Neal.
> >
> > Neal Richter
> > Knowledgebase Developer
> > RightNow Technologies, Inc.
> > Customer Service for Every Web Site
> > Office: 406-522-1485
> >

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: <st...@ei...> - 2003-10-10 21:57:40
|
Thanx for the suggestion. Here's what I get:

================
cl -nologo -W3 -DZLIB_DLL -MD -I../include -DDBUG_OFF -D_WIN32 -DWIN32
  -D__WIN32__ -IL:/win32/include/zlib
  -DDEFAULT_CONFIG_FILE=\"c:\htdig\demo.conf\"
  -DCOMMON_DIR=\"c:\htdig\demo.db\templates\"
  -DBIN_DIR=\"c:\htdig\demo.db\bin\" -DCONFIG_DIR=\"c:\htdig\"
  -DIMAGE_URL_PREFIX=\"/rnt/rnm/img\" -DDATABASE_DIR=\"c:\htdig\demo.db\"
  -Fowin32/dirent_local.obj -c dirent_local.c
dirent_local.c
dirent_local.c(26) : fatal error C1083: Cannot open include file: 'iostream.h': No such file or directory
make[1]: *** [win32/dirent_local.obj] Error 2
make[1]: Leaving directory `/home/htdig320b4/htdig-3.2.0b4-20031005/db'
make: *** [db.build] Error 2
================

I'm using the 2003 .NET compiler (Vc7/cl.exe), and it doesn't have iostream.h
or ostream.h headers (only the C++ versions iostream and ostream). Commenting
out the '#include <iostream.h>' line in dirent_local.c (probably a bad idea)
appears to let compilation continue through the BDB code without errors. The
next error is as follows:

================
cl -DHAVE_CONFIG_H -I../db -I. -I../htword -I../htcommon -nologo -W3
  -DZLIB_DLL -MD -I../include -DDBUG_OFF -D_WIN32 -DWIN32 -D__WIN32__
  -IL:/win32/include/zlib
  -DDEFAULT_CONFIG_FILE=\"c:\htdig\demo.conf\"
  -DCOMMON_DIR=\"c:\htdig\demo.db\templates\"
  -DBIN_DIR=\"c:\htdig\demo.db\bin\" -DCONFIG_DIR=\"c:\htdig\"
  -DIMAGE_URL_PREFIX=\"/rnt/rnm/img\" -DDATABASE_DIR=\"c:\htdig\demo.db\"
  -GX -Fowin32/Configuration.obj -c /Tp Configuration.cc
Configuration.cc
c:\cygwin\home\htdig320b4\htdig-3.2.0b4-20031005-B\htlib\htString.h(28) : fatal error C1083: Cannot open include file: 'iostream.h': No such file or directory
make[1]: *** [win32/Configuration.obj] Error 2
make[1]: Leaving directory `/home/htdig320b4/htdig-3.2.0b4-20031005-B/htlib'
make: *** [htlib.build] Error 2
================

I attempted to make use of several suggested fixes for htString.h regarding
ostream and the std namespace from the mailing lists, but none of them seemed
to work. The issue seems to be that HAVE_STD and HAVE_NAMESPACES checks at the
top of the file aren't working correctly. I was sure to copy over
your .h.win32 headers as specified.

With a little tweaking around, it looks like the compile will continue *IF* I
can get the std namespace working for the appropriate references, and *IF* I
can get the files to reference iostream and ostream instead of the .h
versions. Does this sound like the right path? Lots of other files have this
same conditional block. Any ideas?

Cheers!!

> Please try this in a cygwin shell:
>
> cp ./db/db.h.win32 ./db/db.h
> cp ./db/db_config.h.win32 ./db/db_config.h
> cp ./include/htconfig.h.win32 ./include/htconfig.h
>
> make -f Makefile.win32
>
> That should fire off a build using Microsoft's compilers...
>
> Thanks! Neal.
>
> Neal Richter
> Knowledgebase Developer
> RightNow Technologies, Inc.
> Customer Service for Every Web Site
> Office: 406-522-1485
>
|