From: Geoff H. <ghu...@ws...> - 2001-12-14 20:43:52
|
I usually pipe the results of htdig (actually rundig.sh) to a log file, so I'm not concerned when something "scrolls off the screen." Additionally, the script outputs the date stamps, so it's pretty easy to generate speed ratings.

Personally, I'd prefer to stay away from too many statistics about the local file access--keep in mind that to collect these, we must add code to collect the statistics, calculate the read rate, check the time, etc. While this may or may not help in profiling, it's certainly going to kill the performance. (Personally, I'm not so keen on the HTTP statistics either, but the latency is a bit longer.)

You ask for the number from HTTP access and the number from local access. This is probably easy enough, but you're clearly not getting any from HTTP at the moment, right?

> My most recent dig (still ongoing) has been indexing for 20 hours and
> has reached 110,000 documents over local file access. 3.1.3 indexed
> 330,000 documents in around 6 hours. I have three weeks before my
> server needs to be up and I am very willing to help locate any possible
> slowdowns (NFS tuning, etc). I'm not a Linux guru but I have plenty of
> time to spend on this.

Please understand this is an extremely unfair comparison. For one, there are bugs in 3.1.3, and features have been added since then that slow down indexing. Even 3.1.6 is reported to be slower than 3.1.3 for these reasons. You also mention using NFS but don't elaborate. It's probably fine to index over "local" NFS disks, but I don't know that it's necessarily better than using the HTTP/1.1 code, depending on the speed and latency of your network.

-Geoff

---------- Forwarded message ----------
Date: Fri, 14 Dec 2001 07:53:09 -0500
From: Greg Lepore <gr...@md...>
To: Geoff Hutchison <ghu...@ws...>
Subject: Re: [htdig-dev] profiling

Geoff,

Well, I guess I wasn't too clear.

1. At the end of a dig with the -s flag htdig displays HTTP statistics:

 (Persistent connections : Yes
  HEAD call before GET : No
  Connections opened : 0
  Connections closed : 0
  Changes of server : 0
  HTTP Requests : 0
  HTTP KBytes requested : 0
  HTTP Average request time : 0 secs
  HTTP Average speed : 0 KBytes/secs)

When you are using local file access, it displays the above, which is accurate but not useful. It would be useful to see the number of documents indexed at this point, which is after the dig is finished and after htdig displays a list of problem URLs. I know -v displays a running count and a final count at the end, but this is before it prints the problem files and the statistics, which usually push the document count off the screen. If the -s flag prints statistics about the dig, surely one of the most important is how many documents were indexed.

Is it possible to add a section on Local File Access Statistics that gives relevant information about the dig? Total time, average transfer speed (which should help diagnose disc read slowdowns), total documents, how many .pdfs, number of links not found, etc. If htdig is using a combination of HTTP and local file access, the number of each would also be nice to have at this point; for instance, when a site is a mixture of dynamic and static pages, and the static pages are being indexed over local file access and the dynamic ones via HTTP.

My most recent dig (still ongoing) has been indexing for 20 hours and has reached 110,000 documents over local file access. 3.1.3 indexed 330,000 documents in around 6 hours. I have three weeks before my server needs to be up and I am very willing to help locate any possible slowdowns (NFS tuning, etc). I'm not a Linux guru but I have plenty of time to spend on this.

At 11:43 PM 12/13/01, you wrote:
>At 8:09 AM -0500 12/13/01, Greg Lepore wrote:
>> 1. When running htdig with the -s flag, it would be nice to see
>> the number of documents indexed (I know it appears at the end of the
>> dig, but it would be nice to have it here as well).
>
>I'm not quite sure what you mean. Do you want the number of documents
>indexed so far? If so, it's probably easier to get this from htdig -v
>rather than anything from -s, which is only called at the end.
>
>> 2. The statistics give no results when using local file access,
>> at least the speed could be displayed.
>
>I'm not sure I understand what you mean by "no results." Do you mean the
>statistics that come up about HTTP access?
>
>> 3. When using local file access, there should be a message when
>> htdig has to use http instead, and then the summaries should be displayed.
>
>Again, I'm not quite sure I follow. This certainly comes up when you're
>running with htdig -v and this seems fine to me. Do you want something
>additional?
>
>-Geoff

Gregory Lepore
Webmaster, State of Maryland
Supervisor, Archives of Maryland Online
410-260-6425 |
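The timestamped-log approach Geoff describes can be set up with a few lines of shell. The sketch below is illustrative only; the log path and rundig location are assumptions, not details from the thread.

  #!/bin/sh
  # Sketch: wrap a dig run in date stamps so indexing speed can be worked
  # out from the log afterwards, without adding statistics code to htdig.
  LOG=/var/log/htdig/rundig.log          # assumed location
  {
      echo "=== rundig started:  `date`"
      /opt/htdig/bin/rundig -v           # assumed install path
      echo "=== rundig finished: `date`"
  } >> "$LOG" 2>&1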
From: Gilles D. <gr...@sc...> - 2001-12-14 19:23:13
|
According to Peter Wurbs:
> I tried to use whatsnew.pl together with htdig 3.2.0b4.
> My config:
> - whatsnew.pl # modified 26 Oct 1998 (c) 1998 Jacques Reynes
> - htdig 3.2.0b4
> - Solaris 7
> - Berkeley DB 4.0.14
> - BerkeleyDB.pm 0.17
>
> I configured it correctly; otherwise I would have had error messages.
> But not a single URL is found in the result HTML page.
> I debugged a little and added print messages to whatsnew.pl.
> My conclusion is that whatsnew doesn't open the database correctly.
> Additionally, no warning is given if I give the wrong parameter as
> the database index file.

The modified date for whatsnew.pl should be an immediate clue. The 3.2 version of htdig didn't start to be developed until the last couple of years, and whatsnew.pl hasn't even kept up with early developments in the 3.1.x series in 1999, so it's pretty hopeless to try to get it to work with 3.2.

There is a different whatsnew script on ftp.ccsf.org that can handle 3.1.x databases, but it's a bit tricky to get it to work because it requires special Perl libraries. There's nothing equivalent for 3.2.

I've proposed modifications to htsearch (for 3.1 and 3.2) which would allow it to be used as part of a what's new facility. As an integral part of the main source tree, it would be able to keep up with new developments and changes to the database structures. However, these modifications have yet to be developed.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930 |
From: Gilles D. <gr...@sc...> - 2001-12-14 19:10:08
|
According to J. op den Brouw:
> On Thu, 13 Dec 2001, Geoff Hutchison wrote:
> > It's something of a disk hog considering you rarely ever revise a
> > file. When we moved, I basically felt it wasn't such a great idea to
> > put them in CVS, but we could change if this is what people want.
>
> It's ideal for reproduction/mirroring. You can use CVS the same
> way as you use it for mirroring the maindocs. Therefore we don't
> need extra software, like mirror or wget.
>
> It would be nice if Joe's patch site would be available by CVS too....
>
> But fair is fair, rsync would be the most preferred.....

The drawback to using CVS is that everyone has to use it. You can't really have an alternate method of updating the files, otherwise the repository doesn't get updated. Another option open to htdig developers is ssh access and the scp command. Personally I like being able to use scp to copy in any contributed works. I find this more convenient than the old method of using CVS. Is there some way of setting up automatic updating/mirroring using ssh access?

> > So far SourceForge hasn't put a quota on CVS (it's about the only
> > thing they *haven't* quota'ed). So it's up to the group. If this is a
> > good idea now that FTP has disappeared, sure.
> >
> > Keep in mind, I still find strange quirks with the CVS on SourceForge
> > in terms of connectivity. Some of my cron jobs hang because the cvs
> > command never exits. (And I don't know how you'd set an alarm in a
> > shell script.)
>
> Is there a possibility for a trial period? I'm currently testing new
> ways for mirroring the htdig sites, since we have lost FTP access,
> and a combination of CVS and wget is getting some successful results.
> Later more on that.

Given the quirks with CVS on SF, I'd be a bit leery about using it, even for a trial period, because it's sort of an all or nothing proposition. Mind you, the maindocs updates have been going pretty well now, haven't they?

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930 |
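Gilles' question about automated mirroring over ssh could be answered with rsync; the following is only a sketch, with the account name, server path, and local directory invented for illustration.

  # Pull the files area over ssh into a local mirror; re-runs copy only
  # what changed, and --delete drops files removed upstream.
  rsync -avz --delete -e ssh \
      user@shell.example.org:/home/groups/h/ht/htdig/htdocs/files/ \
      /var/www/mirror/htdig/files/

  # The same line can run unattended from cron, e.g. daily at 04:15:
  # 15 4 * * * rsync -az --delete -e ssh user@shell.example.org:/home/groups/h/ht/htdig/htdocs/files/ /var/www/mirror/htdig/files/ >/dev/null 2>&1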
From: Peter W. <Pet...@en...> - 2001-12-14 15:09:42
|
Hi,

I tried to use whatsnew.pl together with htdig 3.2.0b4. My config:
- whatsnew.pl # modified 26 Oct 1998 (c) 1998 Jacques Reynes
- htdig 3.2.0b4
- Solaris 7
- Berkeley DB 4.0.14
- BerkeleyDB.pm 0.17

I configured it correctly; otherwise I would have had error messages. But not a single URL is found in the result HTML page. I debugged a little and added print messages to whatsnew.pl. My conclusion is that whatsnew doesn't open the database correctly. Additionally, no warning is given if I give the wrong parameter as the database index file.

Any idea or help? Thanks.

Peter. |
From: J. op d. B. <MSQ...@st...> - 2001-12-14 11:19:17
|
On Thu, 13 Dec 2001, Geoff Hutchison wrote:
> It's something of a disk hog considering you rarely ever revise a
> file. When we moved, I basically felt it wasn't such a great idea to
> put them in CVS, but we could change if this is what people want.

It's ideal for reproduction/mirroring. You can use CVS the same way as you use it for mirroring the maindocs. Therefore we don't need extra software, like mirror or wget.

It would be nice if Joe's patch site would be available by CVS too....

But fair is fair, rsync would be the most preferred..... (A sample anonymous CVS update is sketched below.)

> So far SourceForge hasn't put a quota on CVS (it's about the only
> thing they *haven't* quota'ed). So it's up to the group. If this is a
> good idea now that FTP has disappeared, sure.

> Keep in mind, I still find strange quirks with the CVS on SourceForge
> in terms of connectivity. Some of my cron jobs hang because the cvs
> command never exits. (And I don't know how you'd set an alarm in a
> shell script.)

Is there a possibility for a trial period? I'm currently testing new ways for mirroring the htdig sites, since we have lost FTP access, and a combination of CVS and wget is getting some successful results. Later more on that.

--jesse

--------------------------------------------------------------------
J. op den Brouw                          Johanna Westerdijkplein 75
Haagse Hogeschool                                2521 EN DEN HAAG
Faculty of Engineering                               Netherlands
Electrical Engineering                             +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes |
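For comparison, mirroring a module by anonymous CVS (the approach Jesse suggests for the files section, as is already done for the maindocs) would look roughly like this. The pserver address and module name follow the usual SourceForge layout of the time and are assumptions, not details taken from the thread.

  # One-time checkout of a module from the project's anonymous CVS server.
  cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/htdig login
  cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/htdig checkout maindocs

  # Later updates (e.g. from cron) transfer only the changes:
  cd maindocs && cvs -q update -dP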
From: Geoff H. <ghu...@ws...> - 2001-12-14 06:01:19
|
At 9:42 AM -0600 12/12/01, Gilles Detillieux wrote:
>I'm not too sure of the exact reasons for moving away from CVS for
>files on SourceForge.

It's something of a disk hog considering you rarely ever revise a file. When we moved, I basically felt it wasn't such a great idea to put them in CVS, but we could change if this is what people want.

>Geoff would probably know better than me, as he's the one who coordinated
>the switch to SourceForge.

So far SourceForge hasn't put a quota on CVS (it's about the only thing they *haven't* quota'ed). So it's up to the group. If this is a good idea now that FTP has disappeared, sure.

Keep in mind, I still find strange quirks with the CVS on SourceForge in terms of connectivity. Some of my cron jobs hang because the cvs command never exits. (And I don't know how you'd set an alarm in a shell script.)

-Geoff |
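On the aside about setting an alarm in a shell script: a portable watchdog can be built from a background sleep-and-kill pair. This is only a rough sketch; the timeout value and the cvs invocation are chosen purely for illustration.

  #!/bin/sh
  # Kill the command if it is still running after LIMIT seconds --
  # a poor man's alarm(2) for cron jobs that sometimes hang.
  LIMIT=600

  cvs -q update -dP &            # the command that occasionally hangs
  CMD=$!

  ( sleep "$LIMIT" && kill "$CMD" 2>/dev/null ) &
  WATCHDOG=$!

  wait "$CMD"                    # returns when cvs exits or is killed
  kill "$WATCHDOG" 2>/dev/null   # cancel the watchdog if cvs finished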
From: Geoff H. <ghu...@ws...> - 2001-12-14 06:01:18
|
At 8:09 AM -0500 12/13/01, Greg Lepore wrote: > 1. When running htdig with the -s flag, it would be nice to >see the number of documents indexed (I know it appears at the end of >the dig, but it would be nice to have it here as well). I'm not quite sure what you mean. Do you want the number of documents indexed so far? If so, it's probably easier to get this from htdig -v rather than anything from -s, which is only called at the end. > 2. The statistics give no results when using local file >access, at least the speed could be displayed. I'm not sure I understand what you mean by "no results." Do you mean the statistics that come up about HTTP access? > 3. When using local file access, there should be a message >when htdig has to use http instead, and then the summaries should be >displayed. Again, I'm not quite sure I follow. This certainly comes up when you're running with htdig -v and this seems fine to me. Do you want something additional? -Geoff |
From: didier <dga...@ma...> - 2001-12-13 15:08:55
|
Hi everybody

Greg Lepore wrote:
> I am willing to enable profiling on my recent install of the 12/9 build of
> 3.2 beta 4 to figure out some bottlenecks. I just need to know how and
> what to look for in the results. I have 10 or so collections ranging from
> 200 - 500,000 documents. I tried to search the site but the search has
> been down for a couple of days (Internal Server Error). The system specs
> are: RedHat 7.2, dual 1GHz processors, 1GB of RAM, 5 18GB SCSI hard
> drives, four are RAID 5 (with /opt/ on that partition). If you guys can
> tell me how to re-compile and what to look for I can do it. My initial
> digs (over an NFS mount) were taking from 8-10 times as long as 3.1.3.
> My rough re-design of the site is still available at:
> http://rhobard.com/htdig/
> I still like it....

I'm afraid profiling won't help a lot. First look at CPU usage with top, vmstat, time, whatever. If you have a CPU load around 10-15%, then the bottleneck is the I/O subsystem, not the CPU. I didn't use 3.2 for a long time (I had 1-2% CPU load), but if the db size > RAM, then you're toasted.

Anyway, for profiling (I don't know if it's the right way, but it should work):

  make distclean
  export CFLAGS=-pg
  export CXXFLAGS=-pg
  ./configure --enable-shared=false
  (and so on)

Don't use rundig (you won't get profile info for htdig); rather, use htdig -i and other options. After a long time:

  gprof htdig

(you should have a gmon.out in your current dir).

Geoff, what about for -i: going the old way, ascii --> sort --> htmerge? Or htdig uses a temporary db (half RAM size?) and bulk inserts in the background, if there's such a thing with Berkeley DB.

Didier |
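Pulled together, didier's recipe amounts to roughly the following; the config-file path and the location of the htdig binary inside the build tree are assumptions, so adjust them to the local setup.

  # Rebuild with gprof instrumentation.
  make distclean
  CFLAGS=-pg CXXFLAGS=-pg ./configure --enable-shared=false
  make

  # Run the indexer directly (not via rundig) so gmon.out lands in the
  # current directory.
  ./htdig/htdig -i -c /etc/htdig/htdig.conf

  # Once the dig finishes, produce the flat profile and call graph.
  gprof ./htdig/htdig gmon.out > profile.txt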
From: Greg L. <gr...@md...> - 2001-12-13 13:10:28
|
I am willing to enable profiling on my recent install of the 12/9 build of 3.2 beta 4 to figure out some bottlenecks. I just need to know how and what to look for in the results. I have 10 or so collections ranging from 200 - 500,000 documents. I tried to search the site but the search has been down for a couple of days (Internal Server Error). The system specs are: RedHat 7.2, dual 1GHz processors, 1GB of RAM, 5 18GB SCSI hard drives, four are RAID 5 (with /opt/ on that partition). If you guys can tell me how to re-compile and what to look for I can do it. My initial digs (over an NFS mount) were taking from 8-10 times as long as 3.1.3. Features Requests: 1. When running htdig with the -s flag, it would be nice to see the number of documents indexed (I know it appears at the end of the dig, but it would be nice to have it here as well). 2. The statistics give no results when using local file access, at least the speed could be displayed. 3. When using local file access, there should be a message when htdig has to use http instead, and then the summaries should be displayed. My rough re-design of the site is still available at: http://rhobard.com/htdig/ I still like it.... Gregory Lepore Webmaster, State of Maryland Supervisor, Archives of Maryland Online 410-260-6425 |
From: Gilles D. <gr...@sc...> - 2001-12-12 15:43:02
|
According to J. op den Brouw:
> Okay, I'm currently setting up a wget mirror for the "files" section.
> With a dozen or so commands you can make a nice copy of the files
> section. Instructions follow shortly.
>
> Isn't it better (for mirroring anyway) to put the files section into
> a CVS repository? Updating your local copy would be very simple then.
>
> Gilles Detillieux wrote:
> >
> > No, from the SourceForge site, all we have right now is HTTP access.
> > FTP access has been completely cut off. You will need to use HTTP
> > for mirroring. The files are at http://www.htdig.org/files/

That's a good question. On the old htdig.org site hosted by Verio, before the move to SourceForge, we used CVS for everything including the files section. I'm not too sure of the exact reasons for moving away from CVS for files on SourceForge. (We still use CVS for the source trees and maindocs, though.) I know there are some strange quirks with CVS on SourceForge, so maybe it caused problems with the large binary files in the files section. I've also heard that CVS doesn't deal with binary files all that efficiently (it's hard to "diff" them), so maybe it was a concern over the amount of repository space we'd use up. Geoff would probably know better than me, as he's the one who coordinated the switch to SourceForge.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930 |
From: J. op d. B. <ms...@st...> - 2001-12-11 21:00:16
|
Okay, I'm currently setting up a wget mirror for the "files" section. With a dozen or so commands you can make a nice copy of the files section. Instructions follow shortly.

Isn't it better (for mirroring anyway) to put the files section into a CVS repository? Updating your local copy would be very simple then.

Gilles Detillieux wrote:
> No, from the SourceForge site, all we have right now is HTTP access.
> FTP access has been completely cut off. You will need to use HTTP
> for mirroring. The files are at http://www.htdig.org/files/

--Jesse |
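A wget-based copy along the lines Jesse describes might look like this; the URL is the one given above, while the option set and local directory are only a suggested starting point, not the instructions he promises.

  # Mirror the files section over HTTP; timestamping means re-runs only
  # fetch what has changed since the last pass.
  wget --mirror --no-parent --no-host-directories \
       --directory-prefix=/var/www/mirror/htdig \
       http://www.htdig.org/files/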
From: Gilles D. <gr...@sc...> - 2001-12-11 19:17:08
|
According to J. op den Brouw:
> is there no way HTDIG files are available by FTP? If so, how do
> we make a copy of the files directory? If FTP is no longer
> available, I'll update the mirror-howto page accordingly.

No, from the SourceForge site, all we have right now is HTTP access. FTP access has been completely cut off. You will need to use HTTP for mirroring. The files are at http://www.htdig.org/files/

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930 |
From: Gilles D. <gr...@sc...> - 2001-12-11 15:22:35
|
According to Peter Wurbs:
> Running htdig 3.2.0.b3 under Solaris 2.7 produces the following
> Arithmetic Exception (Core Dumped).

This has been known and fixed since shortly after 3.2.0b3 was released. Try the latest 3.2.0b4 development snapshot from http://www.htdig.org/files/snapshots/

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930 |
From: Peter W. <Pet...@en...> - 2001-12-11 10:41:08
|
Hi,

Running htdig 3.2.0.b3 under Solaris 2.7 produces the following Arithmetic Exception (Core Dumped).

# ./htdig -vvv
ht://dig Start Time: Tue Dec 11 08:50:21 2001
Arithmetic Exception (core dumped)

This is the first run, thus there are no databases. The same error occurs when running all the other commands (htsearch, htpurge ...). Here is the advised debugger run:

(gdb) run
Starting program: /www/htdig-3.2.0b3/bin/htdig

Program received signal SIGFPE, Arithmetic exception.
0xfefb885c in .urem ()
(gdb) bt
#0  0xfefb885c in .urem ()
#1  0xff15bef4 in Dictionary::Add (this=0x3cd3c, name=@0xffbef410, obj=0x3e280) at Dictionary.cc:196
#2  0xff15ae4c in Configuration::AddParsed (this=0x3cd38, name=@0xffbef410, value=@0x3e280) at Configuration.cc:200
#3  0xff15b664 in Configuration::Defaults (this=0x3cd38, array=0x3b568) at Configuration.cc:381
#4  0x21c6c in main (ac=1, av=0xffbefc5c) at htdig.cc:125
(gdb) quit

Are there any ideas what the problem is? Thanks in advance.

Peter |
From: J. op d. B. <MSQ...@st...> - 2001-12-10 16:07:01
|
Hi all,

Is there no way HTDIG files are available by FTP? If so, how do we make a copy of the files directory? If FTP is no longer available, I'll update the mirror-howto page accordingly.

--jesse

--------------------------------------------------------------------
J. op den Brouw                          Johanna Westerdijkplein 75
Haagse Hogeschool                                2521 EN DEN HAAG
Faculty of Engineering                               Netherlands
Electrical Engineering                             +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes |
From: Gilles D. <gr...@sc...> - 2001-12-06 22:57:25
|
According to Joe R. Jah: > Hi Gilles, > > In absence of htdig-notification-date meta tag in any document, htdig > could add a number to the document date and use it instead. The number > could be set to 180 days by default, and set in htdig.conf. Interesting idea, but there's no way the default would be set to anything that would activate this. If we did that, then by default htnotify (and therefore rundig) would send out e-mail notices for all documents that haven't been updated in about 6 months. Do you really think the average htdig user would want this by default? > In absence of htdig-email meta tag in any document, htdig could use the > Link tag or Maintainer meta tag instead; if they were all absent, htdig > could use a htdig-email set in htdig.conf. It could be set to maintainer > by default. Only if set to blank, no notification would be sent in > absence of all of the above. Again, an interesting idea. Good items for a wish list, but not for 3.1.6 IMHO. At least, neither Geoff nor I would likely have the time or inclination to implement this ourselves, and we're not holding up its release for any wish list items that can't be done in short order. So, if you can find someone to implement these in the next couple weeks and submit them to us, I'd certainly consider them, but otherwise they'll just go on a wish list for 3.2. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
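For readers unfamiliar with the mechanism under discussion: htnotify is driven by per-document meta tags, so a page that opts in to expiry notices carries something like the snippet below. The file name, address, and date shown are invented for illustration, and the exact date format htnotify accepts should be checked against the meta tag documentation.

  # Tag a page so htnotify mails the given address once the
  # notification date has passed (values here are placeholders).
  cat >> review-me.html <<'EOF'
  <meta name="htdig-notification-date" content="2002-06-01">
  <meta name="htdig-email" content="webmaster@example.com">
  EOF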
From: Gabriele B. <an...@us...> - 2001-12-06 15:06:54
|
Hi!

>Program received signal SIGSEGV, Segmentation fault
>0x40023822 in HtHTTP::SetRequestCommand (this=0x80e8e08, cmd=@0xbfffe96c)
>at HtHTTP.cc:616
>616        _cookie_jar->SetHTTPRequest_CookiesString(_url, cmd

I feel terrible about this, because I should worry about the net code and especially about the Cookies stuff. I have been studying in Australia for five months and I am going back to Italy next Tuesday. Hopefully I can give it a look by the end of next week, if you don't mind.

Indeed, my laptop broke down almost three months ago, and it took more than 2 months to fix it at Toshiba in Melbourne!!! Of course I lost all of my work, and above all my Linux setup (with all the settings for the development of ht://Dig and my ht://Check). Consider that I have not had the chance to develop my own project during this time either, because everything here at the University is behind firewalls and proxies and I have no access to the Internet. Moreover, I had to redo things I had already done because I lost them during the crash (yes, I know, I am stupid because I did not do any backup).

Having said this, I still feel terrible towards every ht://Dig user and above all the developers (Geoff and Gilles especially). So please be patient, and I swear I will fix it before Xmas for sure!

Ciao
-Gabriele

--
Gabriele Bartolini - Web Programmer
Current Location: Melbourne, Victoria - Australia
an...@us... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> find bin/laden -name osama -exec rm {} \; |
From: Daniel N. <dan...@t-...> - 2001-12-06 09:38:08
|
Hi,

(resending to the developers list, as I got no response on the general list and it's quite important)

I tried to use Cookies with the 2001-11-25 snapshot. However, when I set disable_cookies: false, htdig will crash immediately. gdb says:

Program received signal SIGSEGV, Segmentation fault
0x40023822 in HtHTTP::SetRequestCommand (this=0x80e8e08, cmd=@0xbfffe96c)
    at HtHTTP.cc:616
616        _cookie_jar->SetHTTPRequest_CookiesString(_url, cmd

It seems _cookie_jar is never properly defined (it's only set to 0 at the top). What is the state of cookie support? I assume this crash is a known problem? Is this easy to fix? If so, I might try...

Regards,
Daniel

--
http://www.danielnaber.de |
From: Gilles D. <gr...@sc...> - 2001-12-05 22:23:01
|
According to Joe R. Jah:
> On Fri, 30 Nov 2001, Gilles Detillieux wrote:
> > I don't think the difference between 99 and 104 seconds is significant.
> > This confirms my suspicion that the HAVE_BROKEN_REGEX doesn't do a
> > whole lot. To be sure, though, I think we'd need timings for 112501 +
> > parsedate.0 + ssl.6, remove reference to regex.o in htlib/Makefile, #undef
> > AND #define HAVE_BROKEN_REGEX (i.e. two tests) in include/htconfig.h
> > (but don't remove htlib/regex.h). I suspect the timings for both will
> > be like the 2nd test above, around 143 sec.
>
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #define HAVE_BROKEN_REGEX in include/htconfig.h
>
> htdig: Start digging: Sat Dec 1 00:10:58 PST 2001
> htmerge: Start merging: Sat Dec 1 00:12:44 PST 2001    106
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #undef HAVE_BROKEN_REGEX in include/htconfig.h
>
> htdig: Start digging: Sat Dec 1 00:18:55 PST 2001
> htmerge: Start merging: Sat Dec 1 00:20:38 PST 2001    103
...

OK, these are all around 100 sec, so I guess the main thing is to make sure the bundled htlib/regex.c isn't compiled and the resulting regex.o put into htlib/htlib.a. Removing the reference to regex.o in the Makefile seems to be the key.

> > I suspect the difference between the 143 and the 99-104 sec is due
> > to the inclusion of the bundled regex.h even though you're using
> > the C library regex.o code. It's a wonder this works at all, but
> > there does seem to be some impact on performance.
>
> I am not sure how that 143 came about last time; I can't reproduce it any
> more;-/

Probably some other system activity, or fewer pages in the disk cache when you ran that test. Are you getting times closer to 100 sec now? This would stand to reason. However, to be on the safe side, I think the code should make sure it doesn't use the bundled regex.h if it doesn't use the bundled regex.c. If you mix and match them, there may be problems in some cases we haven't discovered yet. Geoff said he'd look into what other packages do for regex support.

> > > ____________________ 092301 + Armstrong + ssl.4 ____________________
> > > htdig: Start digging: Fri Nov 30 00:18:06 PST 2001
> > > htmerge: Start merging: Fri Nov 30 00:18:44 PST 2001    38 seconds
> > ...
> >
> > This is the part I find a bit troubling, but I don't know what we
> > can do about it. I don't know why Armstrong's patch, which uses rx
> > instead of regex, causes htdig to run 2-3 times faster, unless there
> > are other changes between 092301 and 112501 that account for much of
> > this, but it could well be just implementation efficiencies in one
> > library and not in the other.
>
> I reported the difference in indexing time to the list the very first time
> url_rewrite_rules was integrated in the code. I don't believe at that
> time anything else had changed in the code.

Right you are. The Sep 23 snapshot was just before I committed Geoff's changes for url_rewrite_rules using regex. Since then, very little has changed that should affect htdig performance. I was thinking back to when your Armstrong patch benchmarks were on a snapshot from early or mid-August, and before I had committed a number of parser changes.

> > In your tests above, do you make use of url_rewrite_rules? If so,
> > how do the timings change if you don't use it?
>
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #define HAVE_BROKEN_REGEX in include/htconfig.h
> no url_rewrite_rules
>
> htdig: Start digging: Sat Dec 1 00:40:09 PST 2001
> htmerge: Start merging: Sat Dec 1 00:40:34 PST 2001    25 seconds
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #undef HAVE_BROKEN_REGEX in include/htconfig.h
> no url_rewrite_rules
>
> htdig: Start digging: Sat Dec 1 00:28:50 PST 2001
> htmerge: Start merging: Sat Dec 1 00:29:10 PST 2001    20 seconds
...

OK, I don't think that 5 second difference can be treated as significant given the variations in timings we've seen for other tests. The only way to get more significant results would be to run each test several times and take the mean run time.

It is good to know that the latest code doesn't bog down when you're not using url_rewrite_rules. That suggests we're not seeing the sort of weirdness we were seeing in your profiling of 3.2 several months ago, with the millions of unexplained calls to regcomp.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930 |
From: Joe R. J. <jj...@cl...> - 2001-12-05 07:02:49
|
Hi Gilles, In absence of htdig-notification-date meta tag in any document, htdig could add a number to the document date and use it instead. The number could be set to 180 days by default, and set in htdig.conf. In absence of htdig-email meta tag in any document, htdig could use the Link tag or Maintainer meta tag instead; if they were all absent, htdig could use a htdig-email set in htdig.conf. It could be set to maintainer by default. Only if set to blank, no notification would be sent in absence of all of the above. Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: J. op d. B. <MSQ...@st...> - 2001-12-03 11:37:21
|
Okay, FTP at opdenbrouw.nl is not supported for now. The server was hacked last Wednesday and it took me a day to restore a lot of things. Also, wu-ftpd has a known security hole that was discovered on 29 Nov 2001. It can be used to gain root access, even from anonymous FTP; see http://www.incidents.org

Greetz,
--jesse

--------------------------------------------------------------------
J. op den Brouw                          Johanna Westerdijkplein 75
Haagse Hogeschool                                2521 EN DEN HAAG
Faculty of Engineering                               Netherlands
Electrical Engineering                             +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes |
From: J. op d. B. <MSQ...@st...> - 2001-12-03 11:18:29
|
Yup, thanx, but my server was hacked. I've disabled anonymous FTP and will keep it that way for a while. Will tell the htdig group about it.

On Thu, 29 Nov 2001, Marco Nenciarini wrote:
>
> Hi all,
> I am the Italian mirror maintainer, and I want to report to you that your ftp site
> doesn't work at all.
>
> My logs say that your ftp site has been down since 5 Nov 2001
>
> Best Regards
>
> --
> ---------------------------------------------------------------------
> | Marco Nenciarini | Debian/GNU Linux Developer - Plug Member |
> | mn...@pr... | http://www.prato.linux.it/~mnencia |
> ---------------------------------------------------------------------
> Key fingerprint = FED9 69C7 9E67 21F5 7D95 5270 6864 730D F095 E5E4
>

--jesse

--------------------------------------------------------------------
J. op den Brouw                          Johanna Westerdijkplein 75
Haagse Hogeschool                                2521 EN DEN HAAG
Faculty of Engineering                               Netherlands
Electrical Engineering                             +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes |
From: Geoff H. <ghu...@ws...> - 2001-12-01 23:57:49
|
At 11:55 AM -0600 11/29/01, Gilles Detillieux wrote: >I think to make this all work, we need to rename the bundled regex.h to >something like htregex.h to avoid conflicts, as well as put some hooks >in the bundled regex.c code to disable it all if you need to use the >C library code instead. What do you think, Geoff? I think other packages have some similar trickery, so I'm going to hunt through the autoconf repositories again. >In the end, it might make sense to have a configure option to override >the automatic test for this, because I'm not convinced it will work in >all cases. (However, to the best of my recollection, it is only BSDi >systems that have a problem with the bundled regex code.) If I can find the tests I'm thinking of, they'll have something like configure --with-bundled-regex --without-bundled-regex --with-rx -Geoff |
From: Geoff H. <ghu...@ws...> - 2001-12-01 23:57:48
|
At 12:28 PM -0600 11/30/01, Gilles Detillieux wrote: >Yes, we know. SourceForge has disabled project FTP services. I realized that I may not have made this as well-known as I should have, and I apologize. I put a news item on the SourceForge page (and hence on the main page). It's probably a good idea also to pull together a list of contacts for all of the mirrors in case there are similar announcements in the future. -Geoff |
From: Geoff H. <ghu...@ws...> - 2001-12-01 23:57:46
|
At 4:14 PM -0600 11/27/01, Gilles Detillieux wrote: >subtle conflict on BSDI systems. On Oct. 4, you suggested checking the >system type for "*-*-bsdi*", to make an explicit exception to the test >for these systems. As far as I know, this hasn't been done in either >3.2.x or 3.1.x. I've still been trying to find an overall better solution (as outlined in other messages). -Geoff |