From: Gilles D. <gr...@sc...> - 2003-10-21 22:55:13
Hey, guys. I ran into something weird when I was testing out the allow_numbers changes last week, which I haven't quite been able to explain or track down in the code. Of the pages on my site that I was indexing, about a dozen were from a CGI script that puts out a Last-Modified header to set the date appropriately in search results. Because of a recent bug in the script, which I just fixed last week, it turns out that the Last-Modified headers were coming out with no date on them, so htdig was giving them a modtime of 0 (i.e. the epoch). This is different behaviour than htdig 3.1.6, which gave them the current time instead. It may be that the 3.2 code should be fixed to do likewise, as it seems the more sensible behaviour.

However, that's not the weird thing. What was odd is that even though these dozen or so web pages were definitely in the database, and came out into db.docs after an htdump (with a m:0 field), htsearch would not show these in search results. I looked at the code, and the only thing that I can see that would cause this is if the startyear, startmonth or startday input parameters were set, causing the timet_startdate value in Display.cc to be greater than 0. But I didn't set these! I ran htsearch from the command line, so I know I wasn't passing it these values as input parameters, and the config file I used didn't define these as attributes either. I know the problem was the 0 modtime, because when I fixed the CGI script to return a proper Last-Modified header, the pages showed up in htsearch, with no other changes being made.

Does anyone know of anything else that might explain this behaviour? I'd start putting trace prints in htsearch to track this down, but I have too many high-priority things right now to spend much time on ht://Dig right away. htsearch -vvvv didn't give any indication of what might be going on: the URLs in question never showed up in the output at all.
I don't think I'd consider this a showstopper, but it does seem odd that htsearch rejects any modtime value at all when none of those parameters have been specified. This, coupled with the fact that htdig will assign a 0 modtime if it can't parse the Last-Modified header (as opposed to a missing Last-Modified header, which should be taken as the current time if I'm not mistaken), could lead to others having similar problems.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
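
The fallback suggested in the message above (use the current time when the Last-Modified header is missing or cannot be parsed, rather than storing 0) could be sketched roughly as follows. This is only an illustration, not ht://Dig code: the function name `effective_modtime` and the use of `std::optional` (C++17) to represent "no usable header" are assumptions made for the example.

```cpp
#include <cassert>
#include <ctime>
#include <optional>

// Hypothetical helper, not taken from ht://Dig: choose the modification time
// to store for a document, given the outcome of parsing its Last-Modified
// header.
//   parsed == std::nullopt : header was missing or could not be parsed
//   parsed has a value     : header was parsed successfully
std::time_t effective_modtime(std::optional<std::time_t> parsed, std::time_t now)
{
    assert(now > 0);  // the caller is expected to pass a real clock reading
    // Fall back to the current time instead of 0 (the epoch), so a bad
    // header cannot make the document look as if it dates from 1970.
    return parsed.value_or(now);
}
```

With this shape, a caller would pass `std::time(nullptr)` as `now`; whether the real 3.2 code should behave this way is exactly the open question raised in the message.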
From: Lachlan A. <lh...@us...> - 2003-10-22 15:11:45
Greetings Gilles,

In htcommon/defaults.cc, startyear is specified as 1970, so your config file would have to explicitly clear startyear to say no date is given.

The reason for startyear being specified in defaults.cc is that the default value should be in attrs.html, which is automatically generated. The three fixes I can think of (in order of my preference) are:

1. Set the (hard-coded) default value of startday in htsearch/Display.cc to 0 instead of 1. I'm not sure if this would work, and it may break other things.
2. Leave startyear empty in defaults.cc and manually hack attrs.html.
3. Leave startyear undocumented.

Opinions?

Cheers,
Lachlan

On Wed, 22 Oct 2003 08:30, Gilles Detillieux wrote:
> even
> though these dozen or so web pages were definitely in the database,
> and came out into db.docs after an htdump (with a m:0 field),
> htsearch would not show these in search results. I looked at the
> code, and the only thing that I can see that would cause this is if
> the startyear, startmonth or startday input parameters were set,
> causing the timet_startdate value in Display.cc to be greater than
> 0. But I didn't set these! I ran htsearch from the command line,
> so I know I wasn't passing it these values as input parameters, and
> the config file I used didn't define these as attributes either.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
From: Lachlan A. <lh...@us...> - 2003-10-24 10:43:29
Greetings,

I've applied a better patch. The default for startyear is empty, and it is documented that *if* a start/end date is specified then it defaults to 1970.

Gilles, could you please verify that this fixes the bug, and close the report?

Thanks,
Lachlan

On Thu, 23 Oct 2003 00:05, Lachlan Andrew wrote:
> The three fixes I can think of are:
>
> 1. Set the (hard-coded) default value of startday in
>    htsearch/Display.cc to 0 instead of 1. I'm not sure if this
>    would work, and it may break other things.
> 2. Leave startyear empty in defaults.cc and manually hack
>    attrs.html.
> 3. Leave startyear undocumented.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
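
The behaviour described in the message above (no date restriction when nothing is specified; otherwise the missing parts of the start date fall back to 1970-01-01) might look something like the sketch below. It is a minimal illustration under stated assumptions, not the actual Display.cc logic: the `StartDate` struct and the functions `days_from_civil`, `start_bound` and `passes_start_filter` are hypothetical names, the sketch needs C++17, and it works purely in UTC. The thread does not show how Display.cc converts the date to a `time_t`, so this does not attempt to reproduce that arithmetic.

```cpp
#include <cassert>
#include <ctime>
#include <optional>
#include <string>

// Hypothetical stand-in for the startyear/startmonth/startday attributes.
// An empty string means "attribute not set".
struct StartDate {
    std::string year, month, day;
};

// Days since 1970-01-01 for a Gregorian calendar date (pure arithmetic,
// no time zones involved).
long long days_from_civil(int y, int m, int d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);  // treat January and February as part of the previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const long long yoe = y - era * 400;                                  // [0, 399]
    const long long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Lower bound for document modtimes, or nullopt if no start date was given.
std::optional<std::time_t> start_bound(const StartDate& s)
{
    if (s.year.empty() && s.month.empty() && s.day.empty())
        return std::nullopt;  // nothing specified: do not filter by date
    // Something was specified: missing parts default to 1970-01-01.
    const int y = s.year.empty() ? 1970 : std::stoi(s.year);
    const int m = s.month.empty() ? 1 : std::stoi(s.month);
    const int d = s.day.empty() ? 1 : std::stoi(s.day);
    return static_cast<std::time_t>(days_from_civil(y, m, d) * 86400);
}

// True if a document with this modtime survives the start-date filter.
bool passes_start_filter(std::time_t modtime, const StartDate& s)
{
    const std::optional<std::time_t> bound = start_bound(s);
    return !bound || modtime >= *bound;
}
```

The design point is that "no bound" is represented explicitly rather than by a sentinel date, so a document with modtime 0 is only ever excluded when the user actually asked for a start date.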
From: Gilles D. <gr...@sc...> - 2003-10-24 17:36:08
According to Lachlan Andrew:
> Greetings,
>
> I've applied a better patch. The default for startyear is empty,
> and it is documented that *if* a start/end date is specified then it
> defaults to 1970.
>
> Gilles, could you please verify that this fixes the bug, and close the
> report?
...
> On Thu, 23 Oct 2003 00:05, Lachlan Andrew wrote:
> > The three fixes I can think of are:
> >
> > 1. Set the (hard-coded) default value of startday in
> >    htsearch/Display.cc to 0 instead of 1. I'm not sure if this
> >    would work, and it may break other things.
> > 2. Leave startyear empty in defaults.cc and manually hack
> >    attrs.html.
> > 3. Leave startyear undocumented.

Yes, I think keeping all of these empty by default is the best approach, and the one closest to what 3.1.6 uses. I think 3.1.6 with patches has a solid implementation of this, so I'll compare the two Display codes to see what discrepancies I find, and try to figure out whether these are warranted or not. I see no reason to go with option 2 or 3 above.

However, I haven't seen your patch, nor has it come through in the CVS yet if you committed it. The current htcommon/defaults.cc still has a default of 1970. Can I see the fix for this so I can give it a try? It's not attached to the bug report you filed.

Thanks.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
From: Gilles D. <gr...@sc...> - 2003-10-25 23:49:37
According to me:
> According to Lachlan Andrew:
> > I've applied a better patch. The default for startyear is empty,
> > and it is documented that *if* a start/end date is specified then it
> > defaults to 1970.
> >
> > Gilles, could you please verify that this fixes the bug, and close the
> > report?
...
> Yes, I think keeping all of these empty by default is the best approach,
> and the one most like 3.1.6 uses. I think 3.1.6 with patches has a solid
> implementation of this, so I'll compare the 2 Display codes to see what
> discrepancies I find, and try to figure if these are warranted or not.
> I see no reason to go with option 2 or 3 above.
>
> However, I haven't seen your patch, nor has it come through in the CVS
> yet if you committed it. The current htcommon/defaults.cc still has a
> default of 1970. Can I see the fix for this so I can give it a try?
> It's not attached to the bug report you filed.

Never mind about seeing your fix. There are some weird SourceForge delays, such that some messages arrive quickly while others take a day or more. I guess there were similar delays with CVS, but your change did come through yesterday. I thought you were going to change Display.cc, but I see it was just defaults.cc that needed fixing.

Anyway, I've got my changes into Display.cc as well, and it all seems to be working just as it should. I've closed your bug report. Could someone do likewise for bugs 578570 and 829746, as I've applied the bug fixes for these?

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
From: Lachlan A. <lh...@us...> - 2003-10-26 10:19:56
Greetings all,

Thanks for the changes to Display.cc, and for closing the bug report, Gilles.

I've closed 578570, but can't test 829746. Does anyone have root access to a Solaris box? If not, someone will have to close this without testing. Similarly, does anyone have access to a Solaris box running gcc 3.3 to close the -Wno-deprecated issue, 799938?

Does anyone have access to Cygwin to check if 814268 is a common problem? If ht://Dig doesn't work under Cygwin, that is probably a show-stopper (or can Neal's native Win32 port supplant it?).

I personally don't think that the memory usage of htmerge (823866) should be addressed before release, since it sounds like it would require a fairly significant rewrite (although I haven't checked the code at all).

Could someone please check and close 829754 and 829761? Also, 823455 can be removed from "Include in 3.2.0" once the partial fix has been checked.

On Sat, 25 Oct 2003 07:53, Gilles Detillieux wrote:
> I've got my changes into Display.cc as well, and it all
> seems to be working just as it should. I've closed your bug
> report. Could someone do likewise for bugs 578570 and 829746, as
> I've applied the bug fixes for these?

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
From: Neal R. <ne...@ri...> - 2003-10-26 23:02:27
> I've closed 578570, but can't test 829746. Does anyone have root
> access to a Solaris box? If not, someone will have to close this
> without testing. Similarly, does anyone have access to a Solaris box
> running gcc 3.3 to close the -Wno-deprecated issue, 799938.

I can look at the first one at work tomorrow. We have a Solaris 8 box, but I can't do the second, as it doesn't have gcc 3.3 (and it's not my machine, so installing a new gcc would probably lead to bloodshed).

> Does anyone have access to Cygwin to check if 814268 is a common
> problem? If ht://Dig doesn't work under Cygwin, that is probably a
> show-stopper (or can Neal's native Win32 port supplant it?).

I'll check this out as well. I want to support BOTH the Cygwin and native ports.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485