From: Jim C. <gre...@yg...> - 2002-08-31 07:13:44
|
Geoff Hutchison's bits of Fri, 30 Aug 2002 translated to:

>Hmm. I thought that OS X 10.2 used gcc-3.1 for compiling just about
>everything. (There were comments about how the improved PowerPC
>performance would help app speed, etc.)
>
>I guess I'd suggest recompiling from scratch as I know the C++ ABI changed
>between 2.95.x and 3.1 and it'll eventually change again with gcc-3.2.

I have tried both 2.95.2 and 3.1 with the same result.

I think there is a bug in htnotify's readPreAndPostamble(). Both htnotify_prefix_file and htnotify_suffix_file have a default value of "", but the code only checks for NULL when examining the values of prefixfile and suffixfile. The code then proceeds to create ifstream objects using the default values. Finally, the streams are checked with 'if (! in.bad())'; however the ifstream constructor sets failbit, rather than badbit, when it is unable to open the specified file. The result is that the code drops into a while loop and starts extracting from an undefined stream object.

The problem doesn't occur in the 3.2 branch because in addition to checking for NULL prefixfile/suffixfile, the code also checks the values of *prefixfile and *suffixfile.

I think that in both branches checking 'in.bad()' is the wrong thing to do. Verifying that in.good() is true seems most appropriate, though ensuring that in.fail() is not true is enough to prevent the problem I encountered.

Jim
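
For readers unfamiliar with the iostream state flags, here is a minimal standalone sketch (not the htnotify code itself) showing why a '! in.bad()' test lets a never-opened stream through, while in.fail()/in.good() catch it:
------------------------------8<------------------------------
#include <fstream>
#include <iostream>

int main()
{
    // Open a file that does not exist: the ifstream constructor sets
    // failbit, not badbit, when the open fails.
    std::ifstream in("/no/such/prefix/file");

    std::cout << "bad():  " << in.bad()  << "\n";   // prints 0
    std::cout << "fail(): " << in.fail() << "\n";   // prints 1
    std::cout << "good(): " << in.good() << "\n";   // prints 0

    if (!in.bad())
    {
        // The 3.1.6-style check: taken even though the file never opened,
        // so a read loop here would extract from an unusable stream.
    }

    if (in.good())
    {
        // Safe: only entered when the file really opened.
    }
    return 0;
}
------------------------------8<------------------------------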
|
From: <sv...@kb...> - 2002-08-30 15:46:42
|
Hi all.

I have now started work on FTP-handling in htdig. It's still VERY experimental, and I have not tested the code yet. It does compile, though, without any warnings on my RedHat 6.0 Linux intel pentium 3 box

regards
------------------------------------------------------------
Søren Vejrup Carlsen, DWA, Det Kongelige Bibliotek
tlf: (+45) 33 47 48 41
email: sv...@kb...
email: sv...@us...
-------------------------------------------------------------
Non omnia possumus omnes --- Macrobius, Saturnalia, VI, 1, 35
|
From: Geoff H. <ghu...@ws...> - 2002-08-30 14:26:51
|
On Thu, 29 Aug 2002, Jim Cole wrote:

> very little about htnotify. The executables were built with the
> GCC 2 (2.95.2) compiler. It doesn't appear that the compiler
> version changed since the last Developer Tool release, so I guess
> that implies that a library related problem is likely.

Hmm. I thought that OS X 10.2 used gcc-3.1 for compiling just about everything. (There were comments about how the improved PowerPC performance would help app speed, etc.)

I guess I'd suggest recompiling from scratch as I know the C++ ABI changed between 2.95.x and 3.1 and it'll eventually change again with gcc-3.2. If not, you can always run programs through gdb and hit a Control-Z in the middle of a loop to see where it is.

-Geoff
|
From: Jim C. <gre...@yg...> - 2002-08-30 05:44:04
|
After upgrading to 10.2 (and the associated 10.2 Developer Tools), the 3.1.6 htnotify seems to have major problems. I have left it running as long as three hours, and it never completes. It seems to alternate between using excessive CPU and grinding away on the hard drive. It also grabs a lot of RAM. Top shows VSIZE at 1 GB. The machine becomes essentially unusable for minutes at a time. Any idea on where I might start looking for the problem? I know very little about htnotify. The executables were built with the GCC 2 (2.95.2) compiler. It doesn't appear that the compiler version changed since the last Developer Tool release, so I guess that implies that a library related problem is likely. I didn't encounter any problems with the htnotify build from htdig-3.2.0b4-20020825. Other than the above problem, everything looks good based on very light testing. Jim |
|
From: Jim C. <gre...@yg...> - 2002-08-30 05:07:52
|
KomRO - Uwe Becker's bits of Thu, 29 Aug 2002 translated to:

>i have started ./configure in the htdig-source-dir. The script breaked with
>the error message:
>
> installation or configuration problem: C++ compiler cannot create executables

Have you verified that a compiler system (e.g. GCC) is installed on the system? If so, have you verified that the compiler works? If the compilers are there and functional, are they in your path? Do the commands

which gcc
which g++

return anything?

Jim
|
From: KomRO - U. B. <ub...@ca...> - 2002-08-29 12:37:11
|
Hi,

I have started ./configure in the htdig-source-dir. The script broke with the error message:

installation or configuration problem: C++ compiler cannot create executables

The system is Linux (SuSE 7.3, but I have more installed, that's basically compatible with RedHat). How can I solve this problem?

Thank you and best regards
Uwe.
----
Uwe Becker, Giessenbachstr. 10, D-83022 Rosenheim, iNet: http://www.beckeru.de
Fon: 08031-219797, Fax: 08031-15058, Cell: (ohne), Mail: ub...@ca...
|
From: Geoff H. <ghu...@ws...> - 2002-08-28 20:41:01
|
On Wed, 28 Aug 2002, Neal Richter wrote:

> /home/nealr/RNT/htdig/mifluz-merge-20020827/htsearch/../htlib/Object.h(.text+0x1423): undefined
> reference to `Parser::~Parser(void)'
>
> I'm all set to run a couple leak checkers on it when I can get it to
> compile!

It's easy to fix. Just take out the ~Parser() declaration in htsearch/parser.h

That's not to guarantee that htsearch will work. :-(

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
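
The linker error itself comes from a destructor that is declared but never defined. A minimal single-file sketch of that failure mode (hypothetical, not the actual htdig sources):
------------------------------8<------------------------------
class Parser
{
public:
    Parser() {}
    ~Parser();      // declared here but never defined in any source file
};

int main()
{
    Parser p;       // destroying p at the end of main odr-uses ~Parser()
    return 0;
}                   // link fails: undefined reference to `Parser::~Parser()'

// Removing the ~Parser() declaration (or giving it a body) lets the
// compiler supply the destructor, and the link succeeds.
------------------------------8<------------------------------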
|
From: Neal R. <ne...@ri...> - 2002-08-28 20:38:53
|
I'm getting the same error:

g++ -g -O2 -Wall -W -Woverloaded-virtual -fno-rtti -fno-exceptions -o .libs/htsearch Display.o DocMatch.o ResultList.o ResultMatch.o Template.o TemplateList.o WeightWord.o htsearch.o parser.o Collection.o SplitMatches.o HtURLSeedScore.o ../htfuzzy/.libs/libfuzzy.so ../htnet/.libs/libhtnet.so ../htcommon/.libs/libcommon.so ../htword/.libs/libhtword.so ../db/.libs/libhtdb.al ../htlib/.libs/libht.so -lz -Wl,--rpath -Wl,/opt/www/lib/htdig
htsearch.o: In function `StringList::~StringList(void)':
/home/nealr/RNT/htdig/mifluz-merge-20020827/htsearch/../htlib/Object.h(.text+0x1423): undefined reference to `Parser::~Parser(void)'
collect2: ld returned 1 exit status
make[1]: *** [htsearch] Error 1
make[1]: Leaving directory `/home/nealr/RNT/htdig/mifluz-merge-20020827/htsearch'
make: *** [all-recursive] Error 1

I'm all set to run a couple leak checkers on it when I can get it to compile!

On Tue, 27 Aug 2002, Geoff Hutchison wrote:
> On Tuesday, August 27, 2002, at 10:02 PM, Joe R. Jah wrote:
>
> > Configured with-rx on BSD/OS-4.3; it failed to compile htsearch:
> > ------------------------------8<------------------------------
> > /tmp/htdig/mifluz-merge-20020827/htsearch/../htlib/Object.h(.text+0x17cc): undefined
> > reference to `Parser::~Parser(void)'
>
> Hmm. I think I know what's wrong there, as I added a destructor to help
> with some debugging. Still, that's the debugging I'm doing to track down
> the memory leak, so searching isn't so happy right now.
>
> > I randig any way, and got an htpurge.core;( Here is a gdb back trace:
>
> Do you get this with htstat or htdump? Can you also run something like
> "htfuzzy metaphone" to see if programs can read the resulting database?
> (i.e. is htdig writing, but everyone else crashes?)
>
> Unfortunately, the key features to improve indexing performance are also
> really buggy. I'm not sure if it's in mifluz yet, or the interface to
> htdig.
>
> Thanks greatly for the backtrace, I'll go hunt that down.
>
> Anyone else?
>
> -Geoff
>
> -------------------------------------------------------
> This sf.net email is sponsored by: Jabber - The world's fastest growing
> real-time communications platform! Don't just IM. Build it in!
> http://www.jabber.com/osdn/xim
> _______________________________________________
> htdig-dev mailing list
> htd...@li...
> https://lists.sourceforge.net/lists/listinfo/htdig-dev

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Gabriele B. <g.b...@co...> - 2002-08-28 14:46:49
|
Ciao Romain,

as far as I know, htdig doesn't support it yet, but you could easily hack the code to make it work. I have something to complain about this way of negotiating a request by the CMS, because HTTP says that when no Accept is given, every media type is accepted by the client, but ... it's ok.

However, I think this is a good point to analyse for the 3.2 code. We should somehow let the Web server know what kind of media types htdig is able to understand, by listing all of them (default ones plus those managed through external parsers' help).

What d'u think guys?

Ciao
-Gabriele

On Wed, 2002-08-28 at 15:14, rl...@bn... wrote:
>
> I want to index my web site using htdig.
>
> However, my web site, using a CMS, needs the "Accept" HTTP Header, in
> order to render the dynamic content properly.
>
> htdig does not send this Header.
>
> How can I define custom HTTP Headers for the robot:
> using htdig.conf?
> modifying the source code?
>
> PS:
> I am using a compiled htdig v3.1.5 on an AIX v4.3 box
>
> Thanks for your help,
>
> Romain Loréal
>
> -------------------------------------------------------
> This sf.net email is sponsored by: Jabber - The world's fastest growing
> real-time communications platform! Don't just IM. Build it in!
> http://www.jabber.com/osdn/xim
> _______________________________________________
> htdig-general mailing list <htd...@li...>
> To unsubscribe, send a message to <htd...@li...e.net> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html

--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it

> find bin/laden -name osama -exec rm {} ;
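
For illustration only, a sketch of what advertising the supported media types via an Accept header could look like. The helper name, the header set, and the media-type list are assumptions, not htdig's actual request-building code:
------------------------------8<------------------------------
#include <string>
#include <vector>

// Hypothetical helper (not htdig's actual request code): build the request
// headers for a GET, advertising the media types the indexer can handle
// (built-in parsers plus any external ones).
std::string build_request(const std::string &host, const std::string &path,
                          const std::vector<std::string> &accepted_types)
{
    std::string accept;
    for (std::vector<std::string>::size_type i = 0; i < accepted_types.size(); ++i)
    {
        if (i)
            accept += ", ";
        accept += accepted_types[i];
    }
    if (accept.empty())
        accept = "*/*";          // HTTP default: everything is acceptable

    std::string request = "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Accept: " + accept + "\r\n";
    request += "\r\n";
    return request;
}

// Example use (all values illustrative):
//   std::vector<std::string> types;
//   types.push_back("text/html");
//   types.push_back("text/plain");
//   types.push_back("application/pdf");
//   std::string req = build_request("www.example.com", "/", types);
------------------------------8<------------------------------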
|
From: Chad P. <gph...@aa...> - 2002-08-28 14:20:18
|
I decreased my max_doc_size to 100000 and that seems to have helped. I haven't had a dig crash since. We have some large pdf files, so before I had the max_doc_size very high to 8000000.

thanks
chad
|
From: Joe R. J. <jj...@cl...> - 2002-08-28 05:02:01
|
On Tue, 27 Aug 2002, Geoff Hutchison wrote:
> Date: Tue, 27 Aug 2002 23:04:04 -0500
> From: Geoff Hutchison <ghu...@ws...>
> To: Joe R. Jah <jj...@cl...>
> Cc: htdig3-dev <htd...@li...>
> Subject: Re: [htdig-dev] Re: mifluz merge snapshot 2002-08-27
>
> Do you get this with htstat or htdump? Can you also run something like
Here they go:
--------------------------------8<--------------------------------
$ htdump
WordKeyInfo::WordKeyInfo: didn't find key description in config
WordKey::Pack: malloc returned 0
WordKey::Pack: malloc returned 0
WordKey::Pack: malloc returned 0
WordDBEncoded::ShiftValue: what = 9, (idx = 1) >= (length = 1)
Abort (core dumped)
$ gdb htdump htdump.core
GNU gdb
This GDB was configured as "i386-unknown-bsdi4.3"...
Core was generated by `htdump'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libhtnet-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libcommon-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libhtword-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libht-...so...done.
Reading symbols from /usr/lib/libz.so...done.
Reading symbols from /usr/local/lib/libiconv.so.2...done.
Reading symbols from /usr/lib/libstdc++.so.1...done.
Reading symbols from /shlib/libm.so.0.0...done.
Reading symbols from /shlib/libgcc.so.1...done.
Reading symbols from /shlib/libc.so.2...done.
Reading symbols from /shlib/ld-bsdi.so...done.
#0 0x482c548d in kill () from /shlib/libc.so.2
(gdb) bt
#0 0x482c548d in kill () from /shlib/libc.so.2
#1 0x483509b3 in abort () from /shlib/libc.so.2
#2 0x48116544 in WordDBCompress::UncompressIBtree (this=0x8110840, inbuff=0x8116000 "#",
inbuff_length=2032, outbuff=0x81a4a28 "", outbuff_length=8192) at WordDBCompress.cc:219
#3 0x48115d93 in WordDBCompress::UncompressBtree (this=0x8110840, inbuff=0x8116000 "#",
inbuff_length=2032, outbuff=0x81a4a28 "", outbuff_length=8192) at WordDBCompress.cc:726
#4 0x48114aa2 in WordDBCompress::Uncompress (this=0x8110840, inbuff=0x8116000 "#", inbuff_length=2032,
outbuff=0x81a4a28 "", outbuff_length=8192) at WordDBCompress.cc:351
#5 0x481146af in WordDBCompress_uncompress_c (inbuff=0x8116000 "#", inbuff_length=2032,
outbuff=0x81a4a28 "", outbuff_length=8192, user_data=0x8110840) at WordDBCompress.cc:75
#6 0x8089491 in CDB___memp_cmpr_read (dbmfp=0x80aa6c0, bhp=0x81a49f0, db_io=0x8047800, niop=0x80477fc)
at mp_cmpr.c:353
#7 0x80890f2 in CDB___memp_cmpr (dbmfp=0x80aa6c0, bhp=0x81a49f0, db_io=0x8047800, flag=1, niop=0x80477fc)
at mp_cmpr.c:134
#8 0x8088717 in CDB___memp_pgread (dbmfp=0x80aa6c0, bhp=0x81a49f0, can_create=0) at mp_bh.c:214
#9 0x8062db7 in CDB_memp_fget (dbmfp=0x80aa6c0, pgnoaddr=0x8047904, flags=0, addrp=0x8047908)
at mp_fget.c:370
#10 0x8097aed in CDB___bam_search (dbc=0x80cbe00, key=0x8047aec, flags=257, stop=1, recnop=0x0,
exactp=0x80479e4) at bt_search.c:302
#11 0x809039d in __bam_c_search (dbc=0x80cbe00, key=0x8047aec, flags=30, exactp=0x80479e4)
at bt_cursor.c:1828
#12 0x808eabb in __bam_c_get (dbc=0x80cbe00, key=0x8047aec, data=0x8047ad0, flags=30, pgnop=0x8047a3c)
at bt_cursor.c:938
#13 0x80794dc in CDB___db_c_get (dbc_arg=0x80cbf00, key=0x8047aec, data=0x8047ad0, flags=30) at db_cam.c:569
#14 0x4810fe6a in WordCursorOne::WalkNextStep (this=0x80cbd00) at WordDB.h:226
#15 0x4810fd74 in WordCursorOne::WalkNext (this=0x80cbd00) at WordCursorOne.cc:269
#16 0x4810f707 in WordCursorOne::Walk (this=0x80cbd00) at WordCursorOne.cc:158
#17 0x480ce04f in HtWordList::Dump (this=0x8047c38, filename=@0x8047c28) at HtWordList.cc:173
#18 0x804beb4 in main (ac=1, av=0x8047d2c) at htdump.cc:149
#19 0x804b723 in __start ()
(gdb) q
$ htstat
htstat: Total documents: 130
WordKeyInfo::WordKeyInfo: didn't find key description in config
WordList::NotImplemented
Abort (core dumped)
$ gdb htstat htstat.core
GNU gdb
This GDB was configured as "i386-unknown-bsdi4.3"...
Core was generated by `htstat'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libhtnet-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libcommon-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libhtword-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libht-...so...done.
Reading symbols from /usr/lib/libz.so...done.
Reading symbols from /usr/local/lib/libiconv.so.2...done.
Reading symbols from /usr/lib/libstdc++.so.1...done.
Reading symbols from /shlib/libm.so.0.0...done.
Reading symbols from /shlib/libgcc.so.1...done.
Reading symbols from /shlib/libc.so.2...done.
Reading symbols from /shlib/ld-bsdi.so...done.
#0 0x482c548d in kill () from /shlib/libc.so.2
(gdb) bt
#0 0x482c548d in kill () from /shlib/libc.so.2
#1 0x483509b3 in abort () from /shlib/libc.so.2
#2 0x481239ed in WordList::NotImplemented () at WordList.h:427
#3 0x804c03f in main (ac=1, av=0x8047d2c) at ../htword/WordList.h:202
#4 0x804b813 in __start ()
(gdb) q
--------------------------------8<--------------------------------
> "htfuzzy metaphone" to see if programs can read the resulting database?
No apparent Problem:
--------------------------------8<--------------------------------
$ htfuzzy metaphone
$
--------------------------------8<--------------------------------
> (i.e. is htdig writing, but everyone else crashes?)
Htdig took four minutes and 20 seconds to index ~300 documents:
--------------------------------8<--------------------------------
$ ll ../db
-rw-r--r-- 1 jjah www 98304 Aug 27 21:27 db.docdb
-rw-r--r-- 1 jjah www 1648618 Aug 27 21:31 db.docs
-rw-r--r-- 1 jjah www 32768 Aug 27 21:27 db.docs.index
-rw-r--r-- 1 jjah www 860160 Aug 27 21:27 db.excerpts
-rw-r--r-- 1 jjah www 0 Aug 27 21:31 db.worddump
-rw-r--r-- 1 jjah www 1609728 Aug 27 21:27 db.words.db
--------------------------------8<--------------------------------
> Unfortunately, the key features to improve indexing performance are also
> really buggy. I'm not sure if it's in mifluz yet, or the interface to
> htdig.
You are right about indexing performance; htdig-3.1.6 takes ~10 minutes on
my system to index ~10500 documents;)
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Geoff H. <ghu...@ws...> - 2002-08-28 04:10:14
|
Hi,

I had a brief brainstorm on my run today as far as profiling the indexing. Obviously htword/mifluz performance still needs to improve significantly. But another slowdown relative to 3.1 is from the way 3.2 treats hopcounts.

To ensure that restricting indexes by hopcount works correctly, the "queue" for URLs is really a priority queue. URLs with lower hopcounts move up the heap. Of course this requires some sorting and some overhead. Right now, I don't think this needs to happen *unless* we're restricting indexing based on hopcount.

So the proposal is that when we're not restricting by hopcount, the Server objects would switch back to the previous system (i.e. no sorting). I think this should shave a few percent off of indexing. Does this seem like an OK idea? Can anyone come up with an example where this would be a Bad Idea(tm)?

-Geoff
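
To illustrate the trade-off being discussed, here is a minimal sketch (using a hypothetical URLRef type, not the actual Server/queue classes) contrasting a hopcount-ordered priority queue with a plain FIFO:
------------------------------8<------------------------------
#include <iostream>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a queued URL reference.
struct URLRef
{
    std::string url;
    int hopcount;
};

// Heap ordering: lower hopcounts come out first.
struct ByHopcount
{
    bool operator()(const URLRef &a, const URLRef &b) const
    {
        return a.hopcount > b.hopcount;   // smallest hopcount ends up at top()
    }
};

int main()
{
    const char *urls[] = { "http://example.com/deep.html",
                           "http://example.com/",
                           "http://example.com/page.html" };
    const int hops[] = { 3, 0, 1 };

    // Restricting by hopcount: a priority queue pops shallow URLs first
    // (0, 1, 3), at the cost of heap maintenance on every push and pop.
    std::priority_queue<URLRef, std::vector<URLRef>, ByHopcount> byhop;

    // No hopcount restriction: a plain FIFO keeps insertion order (3, 0, 1)
    // and skips the sorting overhead -- the behaviour proposed above.
    std::queue<URLRef> fifo;

    for (int i = 0; i < 3; ++i)
    {
        URLRef ref;
        ref.url = urls[i];
        ref.hopcount = hops[i];
        byhop.push(ref);
        fifo.push(ref);
    }

    while (!byhop.empty())
    {
        std::cout << byhop.top().hopcount << "  " << byhop.top().url << "\n";
        byhop.pop();
    }
    return 0;
}
------------------------------8<------------------------------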
|
From: Geoff H. <ghu...@ws...> - 2002-08-28 04:04:11
|
On Tuesday, August 27, 2002, at 10:02 PM, Joe R. Jah wrote:

> Configured with-rx on BSD/OS-4.3; it failed to compile htsearch:
> ------------------------------8<------------------------------
> /tmp/htdig/mifluz-merge-20020827/htsearch/../htlib/Object.h(.text+0x17cc): undefined
> reference to `Parser::~Parser(void)'

Hmm. I think I know what's wrong there, as I added a destructor to help with some debugging. Still, that's the debugging I'm doing to track down the memory leak, so searching isn't so happy right now.

> I randig any way, and got an htpurge.core;( Here is a gdb back trace:

Do you get this with htstat or htdump? Can you also run something like "htfuzzy metaphone" to see if programs can read the resulting database? (i.e. is htdig writing, but everyone else crashes?)

Unfortunately, the key features to improve indexing performance are also really buggy. I'm not sure if it's in mifluz yet, or the interface to htdig.

Thanks greatly for the backtrace, I'll go hunt that down.

Anyone else?

-Geoff
|
From: Joe R. J. <jj...@cl...> - 2002-08-28 03:02:35
|
On Tue, 27 Aug 2002, Geoff Hutchison wrote:
> Date: Tue, 27 Aug 2002 20:10:54 -0500
> From: Geoff Hutchison <ghu...@ws...>
> To: htdig3-dev <htd...@li...>
> Subject: [htdig-dev] Re: mifluz merge snapshot 2002-08-27
>
> I've posted a revised mifluz-merge snapshot. This version should be much
> more stable and I can confirm that it'll index all of htdig.org
> (including mailing list archives), run htfuzzy, htpurge, htmerge,
> htstat, and htnotify. There's still a memory leak hiding around in
> htsearch, so it often segfaults. Help with leak detection would be
> GREATLY appreciated!
>
> I would hope that at least for indexing, all the bugs are out. Please
> prove me wrong, so we can get this ready for wide-scale release.

Configured with-rx on BSD/OS-4.3; it failed to compile htsearch:
------------------------------8<------------------------------
/tmp/htdig/mifluz-merge-20020827/htsearch/../htlib/Object.h(.text+0x17cc): undefined
reference to `Parser::~Parser(void)'
gmake[1]: *** [htsearch] Error 1
gmake[1]: Leaving directory `/tmp/htdig/mifluz-merge-20020827/htsearch'
gmake: *** [install-recursive] Error 1
------------------------------8<------------------------------

I randig any way, and got an htpurge.core;( Here is a gdb back trace:
------------------------------8<------------------------------
$ gdb htpurge htpurge.core
GNU gdb
Copyright 1998 Free Software Foundation, Inc.
This GDB was configured as "i386-unknown-bsdi4.3"...
Core was generated by `htpurge'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libhtnet-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libcommon-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libhtword-...so...done.
Reading symbols from /usr/local/htdig/3.2/lib/htdig/libht-...so...done.
Reading symbols from /usr/lib/libz.so...done.
Reading symbols from /usr/local/lib/libiconv.so.2...done.
Reading symbols from /usr/lib/libstdc++.so.1...done.
Reading symbols from /shlib/libm.so.0.0...done.
Reading symbols from /shlib/libgcc.so.1...done.
Reading symbols from /shlib/libc.so.2...done.
Reading symbols from /shlib/ld-bsdi.so...done.
#0 0x4815a492 in String::String (this=0x8047be4, s=0x1 <Address 0x1 out of bounds>) at String.cc:53
53 len = strlen(s);
(gdb) bt
#0 0x4815a492 in String::String (this=0x8047be4, s=0x1 <Address 0x1 out of bounds>) at String.cc:53
#1 0x48157de2 in Configuration::Defaults (this=0x80ad000, array=0x80a6c28) at Configuration.cc:379
#2 0x804c11c in main (ac=1, av=0x8047ce8) at htpurge.cc:80
#3 0x804bf83 in __start ()
(gdb) q
------------------------------8<------------------------------
Regards,
Joe
--
 _/ _/_/_/ _/ ____________ __o
 _/ _/ _/ _/ ______________ _-\<,_
 _/ _/ _/_/_/ _/ _/ ......(_)/ (_)
 _/_/ oe _/ _/. _/_/ ah jj...@cl...
|
From: Geoff H. <ghu...@ws...> - 2002-08-28 01:11:02
|
I've posted a revised mifluz-merge snapshot. This version should be much more stable and I can confirm that it'll index all of htdig.org (including mailing list archives), run htfuzzy, htpurge, htmerge, htstat, and htnotify. There's still a memory leak hiding around in htsearch, so it often segfaults. Help with leak detection would be GREATLY appreciated!

I would hope that at least for indexing, all the bugs are out. Please prove me wrong, so we can get this ready for wide-scale release.

This version of ht://Dig is an experimental snapshot of 3.2.0b4 including a new version of the mifluz backend. With help testing this and beating out some bugs, I'll pour it into the main CVS repository. So if you're willing to try very bleeding edge code (i.e. may not compile or run correctly), some feedback would be very helpful.

The snapshots will be left in <http://www.htdig.org/files/snapshots/> but there's no script--I'm rolling them by hand as needed. I doubt I'll ever have time to roll more than one a day, so I'm not concerned by the date stamp. :-)

If you'd like to give it a spin, please see below, particularly about libiconv--which will eventually have to be bundled much like the Berkeley DB or how we bundled librx.

-Geoff

CAVEAT LECTOR:
* Some additional software may be required for compilation and/or running this version. Currently, known requirements are:
  - libiconv: <ftp://ftp.gnu.org/pub/gnu/libiconv/>
  Automake 1.6.3 and Autoconf 2.53 are required for updating/revising config and Makefiles. Older versions will cause problems in building.
* This version is extremely experimental. Don't blame me if it eats your files, disks, RAM, OS, etc. I doubt it, but I'm not making any promises. (NO WARRANTY EXPRESSED OR IMPLIED.)
* This version has received limited testing, so there are likely tons of bugs. Please report them to me or htd...@li... so they can be squashed.
* Not all features are implemented, e.g. revised documentation, faster indexing, faster searching, lower memory requirements, etc. This will happen shortly. Let's fix bugs first.
|
From: Chad P. <gph...@aa...> - 2002-08-27 13:54:11
|
Here are some errors I have from digs:

Error from htdig-3.2.0b4-20020818:

FATAL ERROR:WordDBPage::Uncompress read wrong num worddiffs
FATAL ERROR at file:WordDBPage.cc line:335 !!!
./rundig: line 36: 13869 Segmentation fault $BINDIR/htdig -i $opts $stats $alt

Error from mifluz-merge-20020804:

WordKeyInfo::AddField: there must be exactly two strings separated by a white space (space or tab) in a field description (Word in key description Word/DocID 32/Flags 8/Location 16)
0:2:0:http://www.aafp.org/: --WordListOne::Override() word is zero length
WordListOne::Override() word is zero length
WordListOne::Override() word is zero length

thanks
chad
|
From: Neal R. <ne...@ri...> - 2002-08-26 16:06:24
|
Chad,

Please post your indexing errors to the list. I'm in the process of debugging a particular error and am curious what you are running into.

Thanks!

On Mon, 26 Aug 2002, Chad Phillips wrote:
> Geoff,
>
> I just read the Current Status of snapshot 3.2.0b4-20020825, I noticed there wasn't anything about indexing problems. I can't get a good index of my site with 3.2. 90% of the time it throws some database error about 3/4 of the way through the dig. If I index a small site ( < 10000 docs ) it seems to work great. Anything over about 15,000 and it seems to die pretty regularly. I know you have been doing some work on the database backend.
>
> I guess my question is should I submit error reports for the different types of DB errors I get?
>
> thanks
> chad
>
> -------------------------------------------------------
> This sf.net email is sponsored by: OSDN - Tired of that same old
> cell phone? Get a new here for FREE!
> https://www.inphonic.com/r.asp?r_______________________________________________
> htdig-dev mailing list
> htd...@li...
> https://lists.sourceforge.net/lists/listinfo/htdig-dev

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Chad P. <gph...@aa...> - 2002-08-26 14:38:41
|
Geoff,

I just read the Current Status of snapshot 3.2.0b4-20020825, I noticed there wasn't anything about indexing problems. I can't get a good index of my site with 3.2. 90% of the time it throws some database error about 3/4 of the way through the dig. If I index a small site ( < 10000 docs ) it seems to work great. Anything over about 15,000 and it seems to die pretty regularly. I know you have been doing some work on the database backend.

I guess my question is should I submit error reports for the different types of DB errors I get?

thanks
chad
|
From: Brian W. <bw...@st...> - 2002-08-26 02:25:20
|
I am afraid I have dropped off the radar a bit on this one,
so sorry in advance for this coming out of the blue a bit.
Anyway - I was going through the archives and I stumbled
across this post, which I had missed:
>According to J. op den Brouw:
> > It's a nice patch for those who cannot use syslog facilities, but
> > the patch removes the syslog logging feature. It would be nice
> > to select one of them (or have them both) on compile or run time
> > basis.
> >
> > It's also a patch against 3.1.6. It would be nice if there's a
> > patch for 3.2.0b4-xxxx too.
> >
> > Furthermore, I see a flock() call somewhere. AFAIK, different
> > OS-es use different names and parameter lists. Example
> >
> > HP-UX: int lockf(int fildes, int function, off_t size);
> > Linux 2.2: int flock(int fd, int operation);
>
>I hadn't noticed when I looked at the patch that it completely removed
>the ability to log to syslog(). That's one more reason to reject
>it for 3.1.x. I rejected it over concerns about portability, as you
>pointed out. I don't think it's appropriate for inclusion in 3.1.7
>either for that reason.
Ok.
1) The patch does not remove the ability to do syslog. In my notes
that go with the patch it says:
> * logging_file ( Default: none )
>
> If this is set to "none", then it will log using syslog, otherwise
> this will be assumed to be the path to the log file
The whole way it is set up, it uses the existing default
behaviour if it isn't explicitly activated.
2) If the issue is the portability of flock, would it be
acceptable if I changed it over to using fcntl?
(Mr Google threw up the follwoing page which says that "fcntl() is the
only POSIX-compliant locking mechanism, and is therefore the only
truly portable lock"
http://www.erlenstar.demon.co.uk/unix/faq_3.html
)
3) It should be simple enough to create a patch that works with 3.2.x,
judging by a quick look at the latest Display.cc in the CVS repository.
I *would* like to get it rolled into 3.1.x if I can. I am
more than willing to make any changes required to make this
happen.
Regs
Brian White
-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
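
For reference, a minimal sketch of the fcntl()-based locking mentioned in point 2 above, applied to appending a line to a log file. The function and its behaviour are illustrative assumptions, not the patch itself:
------------------------------8<------------------------------
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

// Minimal sketch: append one line to a log file under an exclusive
// fcntl() write lock (POSIX-portable, unlike flock()).
static int append_log_line(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock lock;
    std::memset(&lock, 0, sizeof(lock));
    lock.l_type   = F_WRLCK;      // exclusive (write) lock
    lock.l_whence = SEEK_SET;
    lock.l_start  = 0;
    lock.l_len    = 0;            // 0 = lock the whole file

    if (fcntl(fd, F_SETLKW, &lock) < 0)   // wait until the lock is granted
    {
        close(fd);
        return -1;
    }

    ssize_t written = write(fd, line, std::strlen(line));

    lock.l_type = F_UNLCK;                // release before closing
    fcntl(fd, F_SETLK, &lock);
    close(fd);
    return written < 0 ? -1 : 0;
}
------------------------------8<------------------------------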
|
|
From: Geoff H. <ghu...@us...> - 2002-08-25 07:13:42
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Geoff H. <ghu...@ws...> - 2002-08-20 19:39:30
|
Some time ago, someone posted a supposed vulnerability in ht://Dig to the BugTraq mailing list about Cross-Site Scripting attacks using the htsearch CGI. To the best of our knowledge, this is not a problem in versions 3.1.5, 3.1.6, 3.2.0b2, 3.2.0b3 or 3.2.0b4 snapshots of ht://Dig. However, we are sending out this security advisory to let you know the issue and how to tell if your htsearch templates could allow a cross-site scripting attack.

* The Problem: (About Cross-Site Scripting)

Cross-site scripting (also known as XSS) is an attack in which a web application gathers malicious data from a user. For example, a link in another website, e-mail, instant message, etc. could call your CGI, collect data and then present an output page in a manner to make it appear as valid content from your website. XSS is the most dangerous for sites where users have authenticated accounts or logins, and could allow remote users to obtain access to data not available to outside users.

* How Does XSS Affect ht://Dig?

Since the htsearch CGI presents web templates containing data from the original query, a query could be constructed which adds HTML code to the template--potentially sending data to remote sites or users, or otherwise hijacking the client's browser. Remember that the HTML would appear to be from _your_ site and would have a "trust rating" associated with your site (e.g. an intranet).

In versions 3.1.5 and later, the htsearch templates were changed to allow variable expansion using the syntax $&(VAR) to HTML-encode all output. This was done to force more standards-compliant HTML as well as to provide proper encoding for special characters, including < > and &. The default templates (headers, footers, no_match pages, etc.) were all changed to use this syntax where appropriate. This "HTML-encoded" output also prevents XSS attacks, as all attempts at inserting XSS queries would result in text, rather than HTML, e.g.

XSS malicious code: <script ...>
htsearch output:    &lt;script ...&gt;

(this would show up on a user's screen as text, rather than being executed by the browser)

* Solutions

As stated, versions 3.1.5, 3.1.6, 3.2.0b2, 3.2.0b3 and snapshots of 3.2.0b4 are *NOT* vulnerable by default. The templates installed use the $&(VAR) syntax for proper HTML expansion. However, if you have upgraded from older versions and have not changed your templates, or you have changed your templates and use other forms of variable expansion, you may be allowing XSS attacks.

Future versions of htsearch will likely make the $&(VAR) HTML-expansion the default, unless other forms (for URL-encoded or URL-decoded output) are specified explicitly. In particular, the following rules should be used (to "protect" user input):

$&(WORDS) not $(WORDS)
$&(LOGICAL_WORDS) not $(LOGICAL_WORDS)
$&(URL) not $(URL)
$&(CONFIG) not $(CONFIG)
$&(RESTRICT) not $(RESTRICT)
$&(EXCLUDE) not $(EXCLUDE)

Once again, to the best of our knowledge, the default installations of versions 3.1.5, 3.1.6, 3.2.0b2, 3.2.0b3 and snapshots of 3.2.0b4 are not vulnerable to XSS. If a repeatable example or exploit can be demonstrated, we would like to know of it, and will respond ASAP with appropriate fixes.

Original BugTraq Posting and my reply:
http://online.securityfocus.com/archive/1/279118
http://online.securityfocus.com/archive/1/281550

For more on htsearch templates or upgrading ht://Dig:
(Current recommended production version is 3.1.6)
(Current 3.2 beta is the latest possible 3.2.0b4 development snapshot)
http://www.htdig.org/hts_templates.html
http://www.htdig.org/RELEASE.html
http://www.htdig.org/where.html
http://www.htdig.org/files/snapshots/

For more about Cross-Site Scripting:
http://www.cert.org/advisories/CA-2000-02.html
http://httpd.apache.org/info/css-security/
http://www.cgisecurity.com/articles/xss-faq.shtml

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
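
The $&(VAR) expansion works by entity-encoding the characters that HTML treats as markup. A minimal standalone sketch of that kind of encoding (not the actual htsearch template code) shows why it defeats the injection described above:
------------------------------8<------------------------------
#include <iostream>
#include <string>

// Minimal sketch of HTML-entity encoding for user-supplied template values.
std::string html_encode(const std::string &in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        switch (in[i])
        {
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '&':  out += "&amp;";  break;
        case '"':  out += "&quot;"; break;
        default:   out += in[i];
        }
    }
    return out;
}

int main()
{
    // A query containing markup is rendered as text, not executed:
    std::cout << html_encode("<script>alert('xss')</script>") << "\n";
    // prints: &lt;script&gt;alert('xss')&lt;/script&gt;
    return 0;
}
------------------------------8<------------------------------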
|
From: Geoff H. <ghu...@ws...> - 2002-08-20 19:27:13
|
Several people have complained that the 3.2 SSL support doesn't always compile when using the configure flag --with-ssl=/path/to/openssl because the configure script can't find the ssl.h header. Attached is a patch that should fix this--it's a bit large because I've had to update configure/configure.in, etc. I also switched from using HAVE_SSL_H to "turn on" the SSL code to HAVE_SSL, which works more reliably. I'm still investigating why the ssl.h header isn't always discovered by the configure script. (I think it doesn't add the include path to search until the Makefiles are written.) Please let me know if it works for you. -Geoff |
|
From: Geoff H. <ghu...@us...> - 2002-08-18 07:13:41
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Geoff H. <ghu...@ws...> - 2002-08-17 18:30:15
|
On Thursday, August 15, 2002, at 03:03 PM, Gilles Detillieux wrote:

> parser has had to take a back seat to this. However, maybe Geoff or Quim
> could shed some light here...
>
> Does the new query parser fix such problem? How's the work coming along
> on it? What sort of help is needed on this?

I believe it does--furthermore even if it didn't, it would be easier to fix the bug than in the old parser.cc.

Anyone who wants my start at htsearch-new.cc can have it. It still needs some work, notably taking the results and feeding them to Display.cc. So far I haven't tackled the issue of collections either. The work shouldn't be bad and I'd be glad to give pointers. However it does need *hours*, which are in short supply in my life right now.

Anyone interested in taking a crack at this? I can promise the result will be *much* more flexible searching in the short and long term. Search performance will also ultimately improve significantly.

-Geoff
|
From: Jim C. <gre...@yg...> - 2002-08-16 22:33:51
|
You probably want to start by double checking your config file. I didn't have any problem starting a dig of http://www.ahisa.com.au using two different 3.2.0b4 snapshots. If there are no obvious problems with the config file, you might try checking it with something like 'cat -v' to look for non-printing characters. If that fails to uncover any problems, you might consider trying a different snapshot.

Jim

Maksymilian Kusmierek's bits of Fri, 16 Aug 2002 translated to:

>What is wrong with this?
>Why it is doing this :
>Making HTTP request on http://0/robots.txt/
>Unable to find the host: 0/robots.txt (port 80)
>
>[root@nextweb htdig]# rundig -vvv
>ht://dig Start Time: Fri Aug 16 19:22:18 2002
> 1:1:http://www.ahisa.com.au
>New server: , 0
> - Persistent connections: enabled
> - HEAD before GET: disabled
> - Timeout: 30
> - Connection space: 0
> - Max Documents: -1
> - TCP retries: 1
> - TCP wait time: 5
>Trying to retrieve robots.txt file
>Making HTTP request on http://0/robots.txt/
>Unable to find the host: 0/robots.txt (port 80)