From: Gilles D. <gr...@sc...> - 2003-02-20 18:12:38

According to Gabriele Bartolini:
> Ciao Gilles,
>
> Just a quick note. A few weeks ago I posted a message regarding a patch
> for ht://Dig 3.2 which enables importing cookies through a text file. Can
> you please give it a look and - as usual - find the right words for the
> description in the defaults.cc file?
>
> I wanted to post you the URL of my message on the mailing list archive,
> but... sf.net seems to be encountering some problems right now.
>
> Indeed, I'd love to commit the code to the CVS repository.

Hi, Gabriele. You don't need anyone's permission to commit to CVS, as long as there isn't a feature freeze in place, which there isn't for 3.2.0b4/b5. If you're confident that the patch works, then go ahead and commit it.

I do see one potential memory leak in the patch, though - if result is non-zero, you just give a warning, but cookie_file has still been set to a new HtCookieInFileJar object, which doesn't get deleted. Shouldn't the "delete cookie_file" statement be moved outside of the innermost if clause, and past the end of the else clause?

As for the description of the new attribute, I'm not really clear enough on what the attribute does to describe it adequately. I haven't even seen all your code, just the patch. It seems the purpose of the file is to pre-load the memory-based cookie jar with some preset cookie values, but that's about the extent of my understanding. I've deliberately stayed clear of anything to do with cookies in ht://Dig, because I have absolutely no use for them, and I already feel overwhelmed by the parts of ht://Dig I do understand, so please don't ask me to become an expert on every piece of code added to 3.2. If you add a description to defaults.cc yourself, doing your best to describe it, I'll gladly fix any grammatical errors or ask you about ambiguities I find in the description, but I don't want to have to document things I don't understand or use. Ditto for testing - I can't test cookie support in htdig, because I don't use cookies on my system.

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

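The leak Gilles describes is a common error-path pattern: on failure the code warns and bails out, but the freshly allocated object is never released. A minimal sketch of the broken and fixed shapes (the `CookieJar` class and its `Load` method are illustrative stand-ins, not the actual HtCookieInFileJar API):

```cpp
#include <cstdio>

// Hypothetical stand-in for HtCookieInFileJar: Load() returns non-zero
// on failure, mimicking the patch's 'result' variable.
struct CookieJar {
    int Load(const char *path) { return (path && path[0]) ? 0 : 1; }
};

// Leaky shape: on failure we warn and return, but 'jar' is never freed.
CookieJar *load_cookies_leaky(const char *path) {
    CookieJar *jar = new CookieJar;
    if (jar->Load(path) != 0) {
        std::fprintf(stderr, "warning: could not import cookies\n");
        return nullptr;     // 'jar' leaks here
    }
    return jar;
}

// Fixed shape: the delete is moved onto the failure path, as Gilles
// suggests, so the object is released before the pointer is abandoned.
CookieJar *load_cookies_fixed(const char *path) {
    CookieJar *jar = new CookieJar;
    if (jar->Load(path) != 0) {
        std::fprintf(stderr, "warning: could not import cookies\n");
        delete jar;         // release on every failure path
        return nullptr;
    }
    return jar;
}
```

In modern C++ the same fix falls out of holding the object in a smart pointer, but the 2003-era codebase uses raw new/delete throughout.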
From: Gilles D. <gr...@sc...> - 2003-02-19 18:00:55

According to Martin Laarz:
> I'm trying to compile version 3.2.0b3 on an AIX 5.1L partition using gcc
> 2.9-aix51-020209 on an RS6000 (IBM Regatta). What I need from the newer
> version is the phrases feature.
>
> During compilation I got these warnings: ...
>
> At least I got my binaries, linking -lstdc++ and -lm against them.
>
> But when I try to use them, e.g. htfuzzy to build up the first .db for
> the endings, I get an "illegal instruction" exception and a core, but am
> not able to backtrace.

Well, at the very least, I recommend you dump 3.2.0b3 and try the latest 3.2.0b4 development snapshot from http://www.htdig.org/files/snapshots/

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

From: Neal R. <ne...@ri...> - 2003-02-19 17:32:21

Thanks. I'll give this page a test. What page sizes are you seeing the errors on? I.e., what is your wordlist_page_size set to? Thanks again.

On Wed, 19 Feb 2003, Lachlan Andrew wrote:
> On Friday 14 February 2003 11:16, Neal Richter wrote:
> > Is there something you can tell us about the type of data you are
> > indexing? Are they big pages with lots of repetitive information,
> > giving htdig many similar keys which hash/sort to the same pages?
>
> Greetings Neal,
>
> I've found one page in the Qt documentation which may be causing
> those problems (attached). I hadn't realised it, but the
> valid_punctuation attribute seems to be treated as an *optional*
> word break. (The docs say it is *not* a word break, and that seems
> the intention of WordType::WordToken...) The page has long strings
> with many valid_punctuation symbols, and gives output like
>
>   elliptical 1060 0 1113 34
>   elp 1363 0 131 0
>   elphick 1516 0 750 0
>   elsbs 1372 0 968 4
>   elsbsw 1372 0 968 4
>   elsbswp 1372 0 968 4
>   elsbswpe 1372 0 968 4
>   elsbswpew 1372 0 968 4
>   elsbswpewg 1372 0 968 4
>   elsbswpewgr 1372 0 968 4
>   elsbswpewgrr 1372 0 968 4
>   elsbswpewgrr1 1372 0 968 4
>   elsbswpewgrr1t 1372 0 968 4
>   elsbswpewgrr1twa7 1372 0 968 4
>   elsbswpewgrr1twa7z 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0f 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fk 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkd 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkdrbk 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkdrbke 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkdrbkezb 1372 0 968 4
>   else 225 0 1285 0
>
> Might that be the trouble?
>
> (BTW, zlib 1.1.4 is still giving errors, albeit for a slightly
> different data set.)
>
> Cheers,
> Lachlan

Neal Richter
Knowledgebase Developer, RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

From: Martin L. <mar...@fi...> - 2003-02-19 16:46:22

Hi guys, I'm new to this list, so please don't flame me if I ask this question again:

I'm trying to compile version 3.2.0b3 on an AIX 5.1L partition using gcc 2.9-aix51-020209 on an RS6000 (IBM Regatta). What I need from the newer version is the phrases feature.

During compilation I got these warnings:

  lock_region.c: In function `CDB___lock_dump_object':
  lock_region.c:630: warning: format not a string literal, argument types not checked
  strptime.cc: In function `char *mystrptime (const char *, const char *, tm *)':
  strptime.cc:86: warning: `int len' might be used uninitialized in this function
  In file included from WordDBCompress.cc:22:
  WordDBPage.h: In method `void WordDBPage::insert_btikey (WordDBKey &, BINTERNAL &, int)':
  WordDBPage.h:267: warning: int format, long int arg (arg 2)
  WordDBPage.h:267: warning: int format, long int arg (arg 3)
  WordDBPage.h: In method `void WordDBPage::compress_key (Compressor &, int)':
  WordDBPage.h:323: warning: int format, long int arg (arg 3)
  In file included from WordDBPage.cc:19:
  WordDBPage.h: In method `void WordDBPage::insert_btikey (WordDBKey &, BINTERNAL &, int)':
  WordDBPage.h:267: warning: int format, long int arg (arg 2)
  WordDBPage.h:267: warning: int format, long int arg (arg 3)
  WordDBPage.h: In method `void WordDBPage::compress_key (Compressor &, int)':
  WordDBPage.h:323: warning: int format, long int arg (arg 3)
  WordMonitor.cc: In method `void WordMonitor::TimerStart ()':
  WordMonitor.cc:176: warning: long int format, time_t arg (arg 3)
  defaults.cc:184: warning: white space at end of line in string
  ...
  defaults.cc:2478: warning: white space at end of line in string
  conf_lexer.cxx: In function `int yylex ()':
  conf_lexer.cxx:705: warning: label `find_rule' defined but not used
  conf_lexer.cxx: At top level:
  conf_lexer.cxx:1784: warning: `void *yy_flex_realloc (void *, unsigned int)' defined but not used
  Synonym.cc: In method `int Synonym::createDB (const HtConfiguration &)':
  Synonym.cc:84: warning: choosing `String::operator char * ()' over `String::operator const char * () const'
  Synonym.cc:84: warning:   for conversion from `String' to `const char *'
  Synonym.cc:84: warning:   because conversion sequence for the argument is better
  Server.cc: In method `Server::Server (URL, StringList *)':
  Server.cc:52: warning: choosing `String::operator char * ()' over `String::operator const char * () const'
  Server.cc:52: warning:   for conversion from `String' to `const char *'
  Server.cc:52: warning:   because conversion sequence for the argument is better
  Server.cc:55: warning: choosing `String::operator char * ()' over `String::operator const char * () const'
  Server.cc:55: warning:   for conversion from `String' to `const char *'
  Server.cc:55: warning:   because conversion sequence for the argument is better
  Server.cc:58: warning: choosing `String::operator char * ()' over `String::operator const char * () const'
  Server.cc:58: warning:   for conversion from `String' to `const char *'
  Server.cc:58: warning:   because conversion sequence for the argument is better
  Server.cc:61: warning: choosing `String::operator char * ()' over `String::operator const char * () const'
  Server.cc:61: warning:   for conversion from `String' to `const char *'
  Server.cc:61: warning:   because conversion sequence for the argument is better
  Server.cc:123: warning: `HtHTTP *http' might be used uninitialized in this function
  Display.cc: In method `void Display::setVariables (int, List *)':
  Display.cc:621: warning: choosing `String::operator char * ()' over `String::operator const char * () const'
  Display.cc:621: warning:   for conversion from `String' to `const char *'
  Display.cc:621: warning:   because conversion sequence for the argument is better
  Display.cc:623: warning: choosing `String::operator char * ()' over `String::operator const char * () const'
  Display.cc:623: warning:   for conversion from `String' to `const char *'
  Display.cc:623: warning:   because conversion sequence for the argument is better

At least I got my binaries, linking -lstdc++ and -lm against them. But when I try to use them, e.g. htfuzzy to build up the first .db for the endings, I get an "illegal instruction" exception and a core file, but I am not able to backtrace. When I start htfuzzy very verbosely I get something like this:

  htfuzzy: Selected algorithm: endings
  htfuzzy/endings: Reading rules
  htfuzzy/endings: Creating databases
  Applying regex '^.*.$' to äbte
  äbte with n --> 'äbten'
  Illegal instruction (core dumped)

The same with the digger itself:

  Rejected: URL not in the limits!
  url rejected: (level 1) http://xyz.fiducia.de/fiducia/brettall.nsf/Datum?OpenView
  image: http://xyz.fiducia.de/img/foldoutmenu_arrow_closed.gif
  href: http://xyz.fiducia.de/fiducia/brettall.nsf/kategorie?OpenView (Kategorie)
  Rejected: URL not in the limits!
  url rejected: (level 1) http://xyz.fiducia.de/fiducia/brettall.nsf/kategorie?OpenView
  size = 3530
  Illegal instruction (core dumped)

I'm not a specialist with gdb, but when I backtrace (that's what I know), gdb shows me something like:

  GNU gdb 5.0-aix51-020209
  Copyright 2000 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB. Type "show warranty" for details.
  This GDB was configured as "powerpc-ibm-aix5.1.0.0"...
  Core was generated by `htdig'.
  Program terminated with signal 4, Illegal instruction.
  #0  0x0 in ?? () from (unknown load module)
  (gdb) bt
  #0  0x0 in ?? () from (unknown load module)
  #1  0x100e92a4 in HtHeap::pushDownRoot (this=0x212ed894, root=0) at HtVector.h:65470
  #2  0x100e8ec8 in HtHeap::Remove (this=0x212ed894) at HtHeap.cc:103
  #3  0x100e8464 in Server::pop (this=0x214eac58) at Server.cc:342
  #4  0x100c8d6c in Retriever::Start (this=0x2ff21fb8) at Retriever.cc:428
  #5  0x100012b8 in main (ac=5, av=0x2004d124) at htdig.cc:317
  #6  0x100001dc in __start ()

I know that some people before me had problems on Sun Solaris or HP-UX 11. Is there somebody out there who compiled this horse on a 64-bit (RS6000, big-endian) system and keeps it running? What do I have to take care of? Any suggestions are welcome!

Thanks,
Martin Laarz

---------------------
FIDUCIA AG Karlsruhe/Stuttgart
Wachhausstraße 4
76227 Karlsruhe
Martin Laarz, Wissensmanager Technik
Tel.: 07 21 / 4004 - 2861   mailto: mar...@fi...
Fax:  07 21 / 4004 - 1176   Web: http://www.fiducia.de

From: Gilles D. <gr...@sc...> - 2003-02-19 15:19:48

According to Lachlan Andrew:
> On Friday 14 February 2003 11:16, Neal Richter wrote:
> > Is there something you can tell us about the type of data you are
> > indexing? Are they big pages with lots of repetitive information,
> > giving htdig many similar keys which hash/sort to the same pages?
>
> Greetings Neal,
>
> I've found one page in the Qt documentation which may be causing
> those problems (attached). I hadn't realised it, but the
> valid_punctuation attribute seems to be treated as an *optional*
> word break. (The docs say it is *not* a word break, and that seems
> the intention of WordType::WordToken...)

I guess the docs haven't kept up with what the code does. It used to be that valid_punctuation didn't cause word breaks at all, i.e. these punctuation characters were valid inside a word, and got stripped out but didn't break up the word. However, for some time now, this functionality has been extended to also index each word part, so that something like "post-doctoral" gets indexed as postdoctoral, post and doctoral. This greatly enhances searches for compound words, or parts thereof, but it tends to break down when you're indexing something that's not really words...

> The page has long strings
> with many valid_punctuation symbols, and gives output like
>
>   elliptical 1060 0 1113 34
>   elp 1363 0 131 0
>   elphick 1516 0 750 0
>   elsbs 1372 0 968 4
>   elsbsw 1372 0 968 4
>   elsbswp 1372 0 968 4
>   elsbswpe 1372 0 968 4
>   elsbswpew 1372 0 968 4
>   elsbswpewg 1372 0 968 4
>   elsbswpewgr 1372 0 968 4
>   elsbswpewgrr 1372 0 968 4
>   elsbswpewgrr1 1372 0 968 4
>   elsbswpewgrr1t 1372 0 968 4
>   elsbswpewgrr1twa7 1372 0 968 4
>   elsbswpewgrr1twa7z 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0f 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fk 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkd 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkdrbk 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkdrbke 1372 0 968 4
>   elsbswpewgrr1twa7z1bea0fkdrbkezb 1372 0 968 4
>   else 225 0 1285 0
>
> Might that be the trouble?

Well, I would think that if you're going to feed a bunch of C code into htdig, especially C code containing many pixmaps, then you should probably do so with a severely stripped-down setting of valid_punctuation. This would speed up the process a lot and get rid of a lot of the spurious junk that's getting indexed. However, if the underlying word database is solid, then it shouldn't fall apart no matter how much junk you throw at it. So this might be the trigger that brings the trouble to the surface, but the root cause seems to be a bug somewhere in the code.

> (BTW, zlib 1.1.4 is still giving errors, albeit for a slightly
> different data set.)

Bummer. Have you tried running with no compression at all, and if so, does that work reliably?

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

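The splitting behaviour Gilles describes (index the joined word plus each punctuation-separated part) can be sketched as follows. This is an illustration of the idea only, not htdig's actual WordType code; the function name and the default punctuation set are invented for the example:

```cpp
#include <string>
#include <vector>

// For a token like "post-doctoral", emit the word with punctuation
// stripped ("postdoctoral") plus each punctuation-separated part
// ("post", "doctoral") -- mirroring the behaviour described above.
std::vector<std::string> index_forms(const std::string &token,
                                     const std::string &punct = "-_.") {
    std::string joined, part;
    std::vector<std::string> parts;
    for (char c : token) {
        if (punct.find(c) != std::string::npos) {
            if (!part.empty()) parts.push_back(part);   // end of a part
            part.clear();
        } else {
            joined += c;    // punctuation is stripped from the joined form
            part += c;
        }
    }
    if (!part.empty()) parts.push_back(part);

    std::vector<std::string> forms;
    forms.push_back(joined);
    if (parts.size() > 1)           // only compound words yield extra forms
        for (const std::string &p : parts)
            forms.push_back(p);
    return forms;
}
```

With this scheme, `index_forms("post-doctoral")` yields {"postdoctoral", "post", "doctoral"}. It also makes the pixmap failure mode obvious: a long string with dozens of embedded punctuation characters fans out into dozens of indexed forms, which is why a stripped-down valid_punctuation helps so much on that kind of input.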
From: Lachlan A. <lh...@us...> - 2003-02-19 12:49:20

On Friday 14 February 2003 11:16, Neal Richter wrote:
> Is there something you can tell us about the type of data you are
> indexing? Are they big pages with lots of repetitive information,
> giving htdig many similar keys which hash/sort to the same pages?

Greetings Neal,

I've found one page in the Qt documentation which may be causing those problems (attached). I hadn't realised it, but the valid_punctuation attribute seems to be treated as an *optional* word break. (The docs say it is *not* a word break, and that seems the intention of WordType::WordToken...) The page has long strings with many valid_punctuation symbols, and gives output like

  elliptical 1060 0 1113 34
  elp 1363 0 131 0
  elphick 1516 0 750 0
  elsbs 1372 0 968 4
  elsbsw 1372 0 968 4
  elsbswp 1372 0 968 4
  elsbswpe 1372 0 968 4
  elsbswpew 1372 0 968 4
  elsbswpewg 1372 0 968 4
  elsbswpewgr 1372 0 968 4
  elsbswpewgrr 1372 0 968 4
  elsbswpewgrr1 1372 0 968 4
  elsbswpewgrr1t 1372 0 968 4
  elsbswpewgrr1twa7 1372 0 968 4
  elsbswpewgrr1twa7z 1372 0 968 4
  elsbswpewgrr1twa7z1bea0 1372 0 968 4
  elsbswpewgrr1twa7z1bea0f 1372 0 968 4
  elsbswpewgrr1twa7z1bea0fk 1372 0 968 4
  elsbswpewgrr1twa7z1bea0fkd 1372 0 968 4
  elsbswpewgrr1twa7z1bea0fkdrbk 1372 0 968 4
  elsbswpewgrr1twa7z1bea0fkdrbke 1372 0 968 4
  elsbswpewgrr1twa7z1bea0fkdrbkezb 1372 0 968 4
  else 225 0 1285 0

Might that be the trouble?

(BTW, zlib 1.1.4 is still giving errors, albeit for a slightly different data set.)

Cheers,
Lachlan

From: Lachlan A. <lh...@us...> - 2003-02-19 12:17:34

On Wednesday 19 February 2003 03:00, Gilles Detillieux wrote:
> something from out in left field... Are you running the very
> latest release of the libz code?

Very good point, Gilles. I was running the old code. I've installed zlib1-1.1.4 and am continuing to test. (KDE needs the zlib1 package rather than zlib, but I assume the libraries are the same.)

Cheers,
Lachlan

From: Gilles D. <gr...@sc...> - 2003-02-18 16:01:17

According to Neal Richter:
> > > Question: Your message below points to an error on page 26613.
> > > Your previous message pointed to an error on page 33.
> > > Is the error a moving target? ;-)
> >
> > I think I reduced the data set slightly, but yes, the target does seem
> > to move :( As the attached file shows, it is now complaining about
> > page 46, and I don't think I've changed anything since the last time,
> > except the runtime flags (-v level).
>
> If you are indexing the same data in the same order every time and the
> target moves with trivial changes of code or flags.. that usually means
> MEMORY CORRUPTION!!!
>
> Unfortunately these types of errors are difficult to duplicate on other
> machines/platforms.
>
> Valgrind is a nice open source memory debugger.. have you used it
> before?
>
>   % valgrind htdig xxx yyy
>
> This will definitely help find a memory error, but will also drown you
> in output, since htdig is pretty bad with memory leaks.. but you can
> disable leak detection and look only for memory corruption.

Just a little something from out in left field, as I've only been very loosely following this thread... Are you running the very latest release of the libz code? All this talk of memory corruption and memory leaks, seemingly related to libz (at least as suggested by Lachlan), reminds me of the recent security alerts regarding libz. The alerts referred to a double-free bug, which was presented as a security vulnerability, but if libz wasn't (or isn't) managing allocs and frees properly, couldn't that lead to all sorts of weirdness like this?

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

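For context on why a double-free would match the "moving target" symptom: freeing the same allocation twice corrupts the allocator's bookkeeping, and the damage typically surfaces much later, in unrelated allocations. A deliberately simplified sketch of the defensive fix (this is not zlib's code; the helper is invented for illustration):

```cpp
#include <cstdlib>

// Releasing through a helper that nulls the pointer makes an accidental
// second release harmless: free(NULL) is defined to be a no-op.
static void safe_free(void **p) {
    std::free(*p);
    *p = nullptr;
}

// Returns 0 on success. Without the nulling in safe_free(), the second
// call here would be a genuine double-free with undefined behaviour.
int double_release() {
    void *buf = std::malloc(64);
    safe_free(&buf);   // first release
    safe_free(&buf);   // no-op, because buf is now null
    return buf == nullptr ? 0 : 1;
}
```

Tools like Valgrind flag the unguarded version of this pattern directly ("invalid free"), which is why Neal's suggestion upthread is a good first step.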
From: Lachlan A. <lh...@us...> - 2003-02-18 12:04:49

Greetings Neal,

On Monday 17 February 2003 06:41, Neal Richter wrote:
> First question: What were the results with the pagesize @ 32K??

The run I did didn't produce any errors, but that doesn't give me much confidence. The problem is elusive and I haven't yet played around with the data set (using exclude_urls) to try to reproduce it, as I have had to with 8K.

> If you are indexing the same data in the same order every time
> and the target moves with trivial changes of code or flags.. that
> usually means MEMORY CORRUPTION!!!
> Unfortunately these types of errors are difficult to duplicate on
> other machines/platforms.

True. But is the order actually the same? I haven't read the code thoroughly, but I thought ht://Dig had a timing mechanism in order not to flood a particular server. Couldn't that reorder the search depending on how much time is spent on debugging output?

> Valgrind is a nice open source memory debugger.
> This will definitely help find a memory error, but will also drown
> you in output, since htdig is pretty bad with memory leaks.. but you
> can disable leak detection and look only for memory corruption.

Thanks for the tip. I had used mpatrol previously, which seems more powerful, with the corresponding increase in hassle. Interestingly, it didn't report any leaks or reproduce the error... I'm rerunning without valgrind now to check that the error is actually repeatable.

> I'd also like a link to your test data if you can put it up on a
> high bandwidth server.. and a copy of your conf file and
> command-line htdig options. I'd like to run Insure++ on it.

I'll see what I can do. However, since all the URLs are file:///s, the actual data set will differ for you, unless you overwrite your current filesystem. Remember, this is a *very* pernickety bug!

Now that you have suggested some tools, I'll keep playing around until I have something more concrete to report.

Thanks, as always,
Lachlan

From: Steininger, H. <HSt...@ch...> - 2003-02-17 10:37:40

Hello, this is my first posting to this list. During my internship I am supposed to read out the database files of htdig with Java and hold the content in memory to accelerate searches. I know there's no JDBC driver for the htdig database, so does anyone out there know how to access the files so that I can read them out?

Thanks in advance,
Steininger Herbert

PS: Excuse my English, I'm from Germany.

From: Neal R. <ne...@ri...> - 2003-02-16 19:41:02

First question: what were the results with the page size at 32K?

> On Friday 14 February 2003 11:16, Neal Richter wrote:
> > Question: Your message below points to an error on page 26613.
> > Your previous message pointed to an error on page 33.
> > Is the error a moving target? ;-)
>
> I think I reduced the data set slightly, but yes, the target does seem
> to move :( As the attached file shows, it is now complaining about
> page 46, and I don't think I've changed anything since the last time,
> except the runtime flags (-v level).

If you are indexing the same data in the same order every time and the target moves with trivial changes of code or flags, that usually means MEMORY CORRUPTION!!! Unfortunately these types of errors are difficult to duplicate on other machines/platforms.

Valgrind is a nice open source memory debugger - have you used it before?

  % valgrind htdig xxx yyy

This will definitely help find a memory error, but it will also drown you in output, since htdig is pretty bad with memory leaks. You can disable leak detection, though, and look only for memory corruption.

> When I compile htdig with that flag, the error does not occur.
> However, I'm attaching a trace for htdump from a dig performed
> without the flag, in case that helps. I hope it isn't too big...

Wow, very interesting, since the debug output doesn't do much. FYI: the other way to activate the DEBUG_CMPR flag is to make line 64 of mp_cmpr.c active. This way you can compile htdig normally and reduce the number of variables changing in the experiments. There are 12 spots in mp_cmpr.c where additional code is executed with the flag. Three of them do nothing but a printf with no variable access, so they can be ignored. It would be interesting to comment out each of them except for one and see what the results are. Let us know what you find.

I'd also like a link to your test data if you can put it up on a high-bandwidth server, along with a copy of your conf file and command-line htdig options. I'd like to run Insure++ on it. Thanks.

Neal Richter
Knowledgebase Developer, RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

From: kariyach a. <kar...@ya...> - 2003-02-16 17:08:08

Dear Sir,

I am doing my thesis based on the htdig program. I have a problem with the search result list produced after performing a search: I don't understand what it is or what it looks like - is it a text file or not? Could you please give me some detail about it? I have to use the result list of a search as the input for my other program.

Regards,
Krisanee A.

From: Geoff H. <ghu...@us...> - 2003-02-16 08:14:09
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b5: Next release, First quarter 2003???
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
-- Does Neal's new zlib patch solve this for now?
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
Can anyone reproduce this? I can't! -- Lachlan
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. regex matching, database compression.)
PR#405280, PR#405281.
* TODO.html has not been updated for current TODO list and
completions.
I've tried. Someone "official" please check and remove this -- Lachlan
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
|
|
From: Lachlan A. <lh...@us...> - 2003-02-15 15:12:07

On Friday 14 February 2003 11:16, Neal Richter wrote:
> Question: Your message below points to an error on page 26613.
> Your previous message pointed to an error on page 33.
> Is the error a moving target? ;-)

I think I reduced the data set slightly, but yes, the target does seem to move :( As the attached file shows, it is now complaining about page 46, and I don't think I've changed anything since the last time, except the runtime flags (-v level).

> Is there something you can tell us about the type of data you are
> indexing? Are they big pages with lots of repetitive information..
> giving htdig many similar keys which hash/sort to the same pages?

It is just miscellaneous Linux documentation -- Qt, KDE, xemacs, LaTeX, lots of READMEs. I can't think offhand of any particularly repetitive pages...

> Please recompile just mp_cmpr.c with "gcc -DDEBUG_CMPR [etc]" and
> post the output to a webserver.
> At that point I'll check it out and get you a replacement mp_cmpr.c
> to try to get more information about the page in question...

When I compile htdig with that flag, the error does not occur. However, I'm attaching a trace for htdump from a dig performed without the flag, in case that helps. I hope it isn't too big...

Cheers,
Lachlan

From: Lachlan A. <lh...@us...> - 2003-02-15 04:24:30

Greetings Wim,

This one is a bit too hard for me, so I'm Cc'ing the htdig-dev list. What output do you get from "rundig -v" and from "make check" (both with LD_LIBRARY_PATH set correctly)?

Thanks,
Lachlan

On Friday 14 February 2003 18:35, Wim Alsemgeest wrote:
> Then I reconfigured, ran make and make install with my settings and
> started rundig with my LD_LIBRARY_PATH setting. This was the output:
>
> ==============================================================
> lr006tux:/apps/data/wwwroot/htdig-3.2.0b4-102101 2329$ export LD_LIBRARY_PATH
> lr006tux:/apps/data/wwwroot/htdig-3.2.0b4-102101 2330$ /distr/htdig/3.2.0/bin/rundig
> htpurge: Database is empty!
>
> Abort - core dumped
> ==============================================================
>
> "what core" produces the following output:
>
> ==============================================================
> lr006tux:/apps/data/wwwroot/htdig-3.2.0b4-102101 2334$ what core
> core:
>         db_reclaim.c 11.2 (Sleepycat) 9/10/99
>         bt_curadj.c 11.5 (Sleepycat) 11/10/99
>         SunOS 5.9 Generic May 2002
>         tanh.c 1.14 93/09/07 SMI
> ==============================================================

From: Neal R. <ne...@ri...> - 2003-02-14 23:19:55

Lachlan,

Question: What is your wordlist_page_size set to? The htdig default is zero, and the BDB default of 8K (in most situations) is then used. Although the BDB max page size is 64K, we can't use that yet as a result of a multiplication bug in mp_cmpr I haven't tracked down. I use this as my default:

  wordlist_page_size: 32768

Larger pages are usually more efficient, especially since here we pay the overhead of deflating each page individually before returning the data. If your bug is caused by page overflow, as I suspect, then this change will at least push the bug 'away', so that you may have to index several orders of magnitude more than 50,000 pages to see it. We've got all kinds of problems if we want to try to index 5 million+ pages. I could be wrong, but I'd be interested to see if it makes the problem go away. Thanks!

On Fri, 14 Feb 2003, Lachlan Andrew wrote:
> An error occurs during an htdump straight after htdig. However, I
> haven't yet got it to occur *within* htdig.
>
> Interestingly, the error first reported by htdump is similar to the
> one I last reported,
>
>   WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 23
>   WordDB: PANIC: Input/output error
>   WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run database recovery
>
> but the one by htpurge (and subsequent htdumps) is
>
>   WordDB: CDB___memp_cmpr_read: unexpected compression flag value 0x8 at pgno = 26613
>   WordDB: PANIC: Successful return: 0
>   WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run database recovery
>
> I'll keep looking...
>
> On Friday 14 February 2003 05:05, Neal Richter wrote:
> > Please attempt to reproduce the error using ONLY htdig next.
> >
> > If the error is still present, then the error is in htdig. If the
> > error is not present, then the bug is happening during htpurge.

Neal Richter
Knowledgebase Developer, RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

From: Neal R. <ne...@ri...> - 2003-02-14 22:51:47
On Fri, 14 Feb 2003, Geoff Hutchison wrote:
> On Fri, 14 Feb 2003, Neal Richter wrote:
>
> > What if we had a feature that stripped the querystrs from a URL
> > contained in "bad_querystr" rather than rejecting them?
>
> url_rewrite_rules ?

Ouch. That was a big frying pan that went all the way through my dunce
cap. I guess I need to break down and read all 199 config vars. ;-)

I'm currently working on an extension to libhtdig that will allow
validation/testing of a given URL against these config vars:

    limit_urls_to
    limit_normalized
    exclude_urls
    max_hop_count
    restrict

I'll add code to send the test URL through the rewriter.

It basically duplicates Retriever::IsURLValid without having to
instantiate a Retriever object, and also tests the 'restrict' var. It
will allow the building of a CGI or PHP page for light configuration of
the spidering component of ht://Dig, and will let you test URLs
interactively. This was inspired by a commercial search engine's filter
page.

    http://ai.rightnow.com/htdig/testurl_snapshot.png

If I am missing any config vars, tell me! Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
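As a rough illustration of the validation Neal describes, a URL can be accepted only if it matches at least one "limit" pattern and no "exclude" pattern. This is a sketch of the idea, not libhtdig's actual API: the function name `url_is_valid` and its pattern-list arguments are hypothetical.

```python
import re

def url_is_valid(url, limit_patterns, exclude_patterns):
    """Hypothetical IsURLValid-style check: the URL must match at least
    one limit pattern and must not match any exclude pattern."""
    if not any(re.search(p, url) for p in limit_patterns):
        return False  # outside the limit_urls_to-style whitelist
    if any(re.search(p, url) for p in exclude_patterns):
        return False  # hit an exclude_urls-style blacklist entry
    return True
```

A CGI front end like the one Neal sketches would simply run a user-supplied URL through such a check and report which rule rejected it.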
From: Geoff H. <ghu...@ws...> - 2003-02-14 22:20:13
On Fri, 14 Feb 2003, Neal Richter wrote:
> What if we had a feature that stripped the querystrs from a URL
> contained in "bad_querystr" rather than rejecting them?

url_rewrite_rules ?

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
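Geoff's pointer can be made concrete with a config sketch. Assuming url_rewrite_rules takes whitespace-separated regex/replacement pairs with \1-style backreferences (check the attribute's documentation for the exact syntax), a rule along these lines could drop a session-tracking parameter; the parameter name session_id is taken from the example elsewhere in this thread:

```
# Illustrative sketch only -- verify the regex flavour and pair syntax
# against the url_rewrite_rules documentation before using it.
# Note it only handles session_id appearing after another parameter.
url_rewrite_rules: (.*)&session_id=[0-9]*  \1
```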
From: Neal R. <ne...@ri...> - 2003-02-14 22:03:29
Yo,

Here's an idea for you: what if we had a feature that stripped the
querystrs from a URL contained in "bad_querystr" rather than rejecting
them?

This would allow htdig to better index php/asp etc. pages which may use
the same page for different documents.

Example:

    http://www.xxxx.com/document.php?docid=5&session_id=98721491204724
    http://www.xxxx.com/document.php?docid=5&session_id=09235783432458

These would 'map' to the same URL

    http://www.xxxx.com/document.php?docid=5

but still allow these two URLs to be treated as different pages:

    http://www.xxxx.com/document.php?docid=5
    http://www.xxxx.com/document.php?docid=10

Has this been done? Please hit me with a verbal frying pan if htdig
supports this now.

I've noticed that Google is starting to get smart about being able to
strip some querystrs that are session ids while leaving others alone.
There is no 'out-of-the-box' default way to do this automatically, since
individual web developers can choose any querystr they want to represent
whatever they like.

Thanks, and Happy Valentine's Day to you female HtDigers lurking out
there!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
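The stripping behaviour proposed above can be sketched outside htdig with Python's standard urllib. `strip_params` is a hypothetical helper illustrating the idea, not an existing htdig feature; only the URLs and the session_id parameter from the example are assumed.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, bad_params):
    """Remove the named query parameters from a URL, keeping the rest.
    Sketch of the proposed 'strip bad_querystr' behaviour: two URLs that
    differ only in a stripped parameter map to the same result."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in bad_params]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With `bad_params = {"session_id"}`, the two session-stamped URLs in the example collapse to one, while docid=5 and docid=10 remain distinct documents.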
From: Ted Stresen-R. <ted...@ma...> - 2003-02-14 20:29:15
Hi,

I've got the XSL document properly processing (transforming) the
Description elements of the defaults.xml document, such that the
embedded HTML is retained and "ref" elements of type "attr" are
translated to links. Unfortunately, I don't have all of the
documentation in this file, so links to anything other than attributes
(196 of them) are broken.

    http://www.tedmasterweb.com/htdig/

Always appreciate the feedback...

Ted
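For readers curious what such a translation might look like, a template roughly like the following would turn a ref element of type attr into a link. The element name, attribute name, and target href here are guesses based on Ted's description, not the actual defaults.xml schema or his stylesheet:

```xml
<!-- Hypothetical sketch: link each attr-type ref to its anchor -->
<xsl:template match="ref[@type='attr']">
  <a href="attrs.html#{normalize-space(.)}">
    <xsl:value-of select="."/>
  </a>
</xsl:template>
```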
From: Gilles D. <gr...@sc...> - 2003-02-14 20:21:59
While I was waiting for the other shoe to drop, Frank Passek wrote:
> I encountered the following problem with versions 3.1.6 and 3.2.0b3.
> htdig cannot parse HTTP header lines when there is no blank after the
> colon, as in
>
>     content-type:text/html
>
> This problem may be solved by replacing the following lines in
> htnet/HtHTTP.cc:633ff
>
>     while (*token && !isspace(*token))
>         token++;
>
>     while (*token && isspace(*token))
>         token++;
>
> by
>
>     while (*token && !isspace(*token) && *token != ':')
>         token++;
>
>     while (*token && (isspace(*token) || *token == ':'))
>         token++;
>
> It worked for me, but I had no time to dive deep into the htdig source
> (I had to get it running for a customer), so please check whether this
> is sufficient.

Funny how bug reports seem to come in pairs like this. I just responded
two days ago to someone else who had run into this same problem. The odd
thing is that this code has been the same for years, and just now we get
two reports of this previously unreported problem back to back. When
Oliver reported the problem, he wasn't even using a particularly new web
server (Apache/1.3.27). Anyway, you can see my reply to Oliver here...

    ftp://ftp.ccsf.org/htdig-patches/3.1.6/Document.cc.0

Same fix, so I think I was on the right track. Thanks for confirming it
works.

In the case of htdig 3.1.6, this is a case of a misbehaving server, as
HTTP/1.0 states that there should be a single space after the colon.
However, the htdig 3.2 code implements HTTP/1.1, in which the space (or
spaces) after the colon is optional, so in 3.2 it's a client-side bug.
Either way, though, it's an easy fix to htdig. For 3.2, the change is
made in htnet/HtHTTP.cc rather than htdig/Document.cc.

By the way, if you're still using 3.2.0b3, I highly recommend upgrading
to a recent 3.2.0b4 snapshot. 3.2.0b3 is VERY buggy.

--
Gilles R. Detillieux                E-mail: <gr...@sc...>
Spinal Cord Research Centre         WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba    Winnipeg, MB R3E 3J7 (Canada)
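The tolerant behaviour the patch gives htdig (accepting "name:value" as well as "name: value", per HTTP/1.1's optional whitespace after the colon) can be sketched in a few lines. `split_header_line` is an illustrative stand-in, not htdig's actual parser:

```python
def split_header_line(line):
    """Split an HTTP header line into (name, value), tolerating any
    amount of whitespace (including none) after the colon -- the case
    the patch above makes htdig handle. Name is lowercased, since
    HTTP header names are case-insensitive."""
    name, _, value = line.partition(':')
    return name.strip().lower(), value.strip()
```

Both "content-type:text/html" and "Content-Type: text/html" then yield the same ("content-type", "text/html") pair.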
From: Frank P. <f.p...@we...> - 2003-02-14 18:52:59
Dear htdig developers,

I encountered the following problem with versions 3.1.6 and 3.2.0b3.
htdig cannot parse HTTP header lines when there is no blank after the
colon, as in

    content-type:text/html

This problem may be solved by replacing the following lines in
htnet/HtHTTP.cc:633ff

    while (*token && !isspace(*token))
        token++;

    while (*token && isspace(*token))
        token++;

by

    while (*token && !isspace(*token) && *token != ':')
        token++;

    while (*token && (isspace(*token) || *token == ':'))
        token++;

It worked for me, but I had no time to dive deep into the htdig source
(I had to get it running for a customer), so please check whether this
is sufficient.

Regards
Frank
From: Divyank T. <div...@di...> - 2003-02-14 16:00:57
Hi,

We are interested in putting up a ht://Dig mirror site in our Indian
datacenter. We noticed that you do not have any active mirror sites in
India as of now. We have already set up the mirror as per the
instructions on your mirroring page. I have included below the details
required to be filled in on your mirroring page:

    Organisation:   Directi Web Hosting <http://www.directi.com>
    Country:        India
    Main Site:      http://htdig.mirror.directi.com/
    Developer Site: http://htdig.mirror.directi.com/dev/

About Us: http://www.directi.com/aboutus/corpinfo/

Directi Web Hosting:
  * hosts the official PHP Indian mirror (in.php.net)
  * is a fully operational ICANN Accredited Registrar. There are only
    179 Accredited Registrars worldwide, of which only 90+ are
    operational.
  * has a customer base of over 50,000 customers, growing at the rate
    of thousands of customers every month
  * is the largest web hosting provider in South East Asia
  * is a pioneer and market leader in web hosting automation services

Beliefs: There are certain fundamental beliefs that guide all our
products, services and technology decisions. Directi is a strong
supporter of the Open Source community, Open Source software and Open
Source fundamentals. A large portion of our software work is either Open
Source, or derives from/extends existing Open Source software. Our work
culture and methodology largely derive from the project management
principles and philosophies of the Open Source community.

About the Datacenter:
http://www.directi.com/products/virtual-web-hosting/datacenters/india/

A local mirror at this datacenter will provide up to 6 times faster
speeds for people in India. We have ample free bandwidth and multiple
free servers at our disposal.

Looking forward to hearing from you soon.

Warm Regards,
Divyank Turakhia
President & Director
-------------------------------
Directi Web Hosting
http://www.directi.com
-------------------------------
From: Neal R. <ne...@ri...> - 2003-02-14 00:15:32
Interesting. Thanks for the clarification about your process.

Question: your message below points to an error on page 26613. Your
previous message pointed to an error on page 33:

    WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 33

Is the error a moving target? ;-)

I checked out the code in mp_cmpr.c around the error output (line 289).
Basically I'm trying to figure out whether you are getting page overflow
or not. Is there something you can tell us about the type of data you
are indexing? Are they big pages with lots of repetitive information,
giving htdig many similar keys which hash/sort to the same pages?

Please recompile just mp_cmpr.c with "gcc -DDEBUG_CMPR [etc]" and rerun
htdig & htdump. You could do this quickly by hand via cut-and-paste and
then link everything with make. If you could post the output to a
webserver somewhere, I'd like to look at it.

At that point I'll check it out and get you a replacement mp_cmpr.c to
try to get more information about the page in question...

Thanks!

On Fri, 14 Feb 2003, Lachlan Andrew wrote:
> An error occurs during an htdump straight after htdig. However, I
> haven't yet got it to occur *within* htdig.
>
> Interestingly, the error first reported by htdump is similar to the
> one I last reported,
>
>     WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 23
>     WordDB: PANIC: Input/output error
>     WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
>     database recovery
>
> but the one by htpurge (and subsequent htdumps) is
>
>     WordDB: CDB___memp_cmpr_read: unexpected compression flag value 0x8
>     at pgno = 26613
>     WordDB: PANIC: Successful return: 0
>     WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
>     database recovery
>
> I'll keep looking...
>
> On Friday 14 February 2003 05:05, Neal Richter wrote:
> > Please attempt to reproduce the error using ONLY htdig next.
> >
> > If the error is still present, then the error is in htdig. If the
> > error is not present, then the bug is happening during htpurge.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
From: Lachlan A. <lh...@us...> - 2003-02-13 23:49:47
An error occurs during an htdump straight after htdig. However, I
haven't yet got it to occur *within* htdig.

Interestingly, the error first reported by htdump is similar to the one
I last reported,

    WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 23
    WordDB: PANIC: Input/output error
    WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
    database recovery

but the one by htpurge (and subsequent htdumps) is

    WordDB: CDB___memp_cmpr_read: unexpected compression flag value 0x8
    at pgno = 26613
    WordDB: PANIC: Successful return: 0
    WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
    database recovery

I'll keep looking...

On Friday 14 February 2003 05:05, Neal Richter wrote:
> Please attempt to reproduce the error using ONLY htdig next.
>
> If the error is still present, then the error is in htdig. If the
> error is not present, then the bug is happening during htpurge.