From: Neal R. <ne...@ri...> - 2003-02-13 18:04:47
Please attempt to reproduce the error using ONLY htdig next. If the
error is still present, then the error is in htdig. If the error is
not present, then the bug is happening during htpurge.

On Thu, 13 Feb 2003, Lachlan Andrew wrote:
> Sorry for the brief report, but I was just heading out to work...
>
> I was running rundig, so yes I started with htdig -i and then ran
> htpurge.
>
> Running htdump was a good suggestion. It gives the same error after
> getting up to 01examplesopengloverlay. htsearch also crashes when
> searching for that string, but seems OK otherwise.
>
> From ddd, it seems that the call to inflate() inside
> CDB___memp_cmpr_inflate() is failing.
>
> Later I'll see if I can reproduce the problem on a smaller data set.
>
> Cheers,
> Lachlan
>
> On Thursday 13 February 2003 10:52, Neal Richter wrote:
> > What does 'htpurge' mean here? Did you start with an empty index
> > via 'htdig -i'?
> >
> > Do you get this error during searching?
> >
> > What exactly are you using htpurge to do?
> >
> > If the page is corrupted you will be able to find the error using
> > the correct search string with htsearch.
> >
> > Another idea is to do the dig and use htdump to dump the WordDB;
> > that should also duplicate the error if the page is truly
> > corrupted.
> >
> > If you can't duplicate the error with htsearch or htdump then
> > htpurge is doing something funky.
> >
> > Need more info!!!! ;-)

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
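The isolation test Neal describes boils down to a short command
sequence. A sketch only, with placeholder paths; the flags are from
memory rather than the man pages, so double-check them against your
installation:

    htdig -i -c /path/to/htdig.conf   # fresh index; do NOT run htpurge yet
    htdump -c /path/to/htdig.conf     # walk the whole WordDB
    # then run htpurge and repeat the htdump

If the htdump straight after a plain htdig already reports 'unable to
uncompress page', the corruption happens during the dig itself; if it
only appears after htpurge has run, htpurge is the culprit.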
From: Lachlan A. <lh...@us...> - 2003-02-13 12:06:06
Sorry for the brief report, but I was just heading out to work...

I was running rundig, so yes I started with htdig -i and then ran
htpurge.

Running htdump was a good suggestion. It gives the same error after
getting up to 01examplesopengloverlay. htsearch also crashes when
searching for that string, but seems OK otherwise.

From ddd, it seems that the call to inflate() inside
CDB___memp_cmpr_inflate() is failing.

Later I'll see if I can reproduce the problem on a smaller data set.

Cheers,
Lachlan

On Thursday 13 February 2003 10:52, Neal Richter wrote:
> What does 'htpurge' mean here? Did you start with an empty index
> via 'htdig -i'?
>
> Do you get this error during searching?
>
> What exactly are you using htpurge to do?
>
> If the page is corrupted you will be able to find the error using
> the correct search string with htsearch.
>
> Another idea is to do the dig and use htdump to dump the WordDB;
> that should also duplicate the error if the page is truly
> corrupted.
>
> If you can't duplicate the error with htsearch or htdump then
> htpurge is doing something funky.
>
> Need more info!!!! ;-)
From: Neal R. <ne...@ri...> - 2003-02-12 23:51:24
What does 'htpurge' mean here? Did you start with an empty index via
'htdig -i'?

Do you get this error during searching?

What exactly are you using htpurge to do?

If the page is corrupted you will be able to find the error using the
correct search string with htsearch.

Another idea is to do the dig and use htdump to dump the WordDB; that
should also duplicate the error if the page is truly corrupted.

If you can't duplicate the error with htsearch or htdump then htpurge
is doing something funky.

Need more info!!!! ;-)

Thanks.

On Thu, 13 Feb 2003, Lachlan Andrew wrote:
> Greetings Neal,
>
> I've just run a dig of about 50000 documents and got:
> ...
> Deleted, not found: ID: 38018 URL:
> file:///usr/share/doc/HTML/en/kdevelop/reference/C/CONTRIB/OR_PRACTICAL_C/12_8.c
> htpurge: 37130
> WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 33
> WordDB: PANIC: Input/output error
> WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run database
> recovery
>
> Any ideas?
>
> Thanks,
> Lachlan
>
> On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> > This dump is happening using the old compression scheme.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
From: Lachlan A. <lh...@us...> - 2003-02-12 23:37:24
Greetings Neal,

I've just run a dig of about 50000 documents and got:
...
Deleted, not found: ID: 38018 URL:
file:///usr/share/doc/HTML/en/kdevelop/reference/C/CONTRIB/OR_PRACTICAL_C/12_8.c
htpurge: 37130
WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 33
WordDB: PANIC: Input/output error
WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run database
recovery

Any ideas?

Thanks,
Lachlan

On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> This dump is happening using the old compression scheme.
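For intuition about the 'unable to uncompress page' message above:
once a compressed page image is damaged on disk, the inflate side
fails no matter how it is re-read. A self-contained zlib sketch
(illustrative only, not htdig code; the flipped byte stands in for
whatever the real corruption was, and build it with 'cc demo.c -lz'):

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        unsigned char page[4096], comp[4200], back[4096];
        uLongf clen = sizeof(comp), blen = sizeof(back);

        memset(page, 'x', sizeof(page));        /* stand-in for a clean page */
        if (compress2(comp, &clen, page, sizeof(page), 6) != Z_OK)
            return 1;
        comp[clen / 2] ^= 0xff;                 /* simulate on-disk corruption */

        /* a damaged stream makes uncompress() fail (almost always
           Z_DATA_ERROR, caught by the adler32 check) */
        if (uncompress(back, &blen, comp, clen) != Z_OK)
            fprintf(stderr, "unable to uncompress page\n");
        return 0;
    }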
From: Koen L. <k.l...@ge...> - 2003-02-12 15:37:18
Dear developers of HT://Dig,

The Gezondheidskiosk is a website for reliable medical information in
the Netherlands, and HT://Dig is one of our main tools: we use it as
our search engine. We use HT://Dig to search several websites of our
participants, and through a series of custom-made tools and scripts
we present this information to our customers. The URL of our website
is http://www.gezondheidskiosk.nl

We would like to fine-tune the results of HT://Dig with the help of
an HtDig expert, someone who can educate us in getting the best
results with HtDig.

!!!We are now looking for an expert on HT://Dig, who can assist us
with and educate us about HT://Dig!!!

We are looking for someone who:
- is interested in analysing our current HT://Dig-based solutions,
- can explain more about the functionality of HT://Dig to us,
- preferably lives in the Netherlands.

Kind regards,

Koen Linders
Contentmanager Stichting Gezondheidskiosk
Postbus 262
2260 AG Leidschendam
+31 (0)70 317 98 21 (Telefoon)
+31 (0)70 317 98 22 (Fax)
From: Wim A. <wal...@ho...> - 2003-02-12 13:10:16
Hello all,

I just installed htdig 3.1.6 on a Solaris 9 machine; htdig 3.2.0b3
did not compile right.

My question is: how do I make or get a db.wordlist? The
db.wordlist.new in /apps/htdig/ is empty.

Please help me with this.

Greetings,
Wim Alsemgeest
From: Lachlan A. <lh...@us...> - 2003-02-12 10:28:34
Greetings Abbie,

If you look a bit earlier in the output, does it say something like
"! UNABLE to convert" or "!! Unable to execute pdftotext at
/.../pdf2html.pl line 34."?

If it says the first, you'll have to edit your file
/opt/www/htdig/bin/doc2html/doc2html.pl -- line 77 should be
something like

    my $PDF2HTML = '/.../pdf2html.pl';  # full pathname of pdf2html.pl script

where '/.../pdf2html.pl' is the path to your pdf2html.pl script.
(Type 'which pdf2html.pl' to find it.)

If it says the second, you'll have to edit your pdf2html.pl file.
At about line 20 should be the line

    my $PDFTOTEXT = "/usr/bin/pdftotext";

(Replace the path with your own path, from 'which pdftotext'.)

Regarding highlighting words, ht://Dig *does* highlight the words in
the excerpt if they are there. If your excerpt is too small to
contain the search terms, you can increase the max_head_length
attribute. In the standard rundig it is 10,000 bytes. If your
max_head_length is longer than your document length and you are
still not getting the words highlighted, let us know.

Out of interest, how did you overcome your earlier problem of
ht://Dig not finding the documents at all?

Cheers,
Lachlan
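For reference, max_head_length is an ordinary htdig.conf attribute. A
sketch of the line (the 10,000-byte value is the rundig default cited
above; raise it to taste):

    max_head_length: 10000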
From: Lachlan A. <lh...@us...> - 2003-02-11 11:05:15
Greetings Neal,

I agree that compressing the excerpts is a good side effect, and have
just updated CVS to use '6' as the default.

Cheers,
Lachlan

On Tuesday 11 February 2003 09:28, Neal Richter wrote:
> compression_level's new default should be '6' (the
> default compression level of gzip).
> This does make the excerpts database compressed also, but I don't
> see a disadvantage there at all. It's pretty well tested at this
> point.
From: Neal R. <ne...@ri...> - 2003-02-10 22:27:12
Ahh, yes. compression_level's new default should be '6' (the default
compression level of gzip).

This does make the excerpts database compressed also, but I don't see
a disadvantage there at all. It's pretty well tested at this point.

Good work. I have the value set in all my xxx.conf files.

Thanks

On Tue, 11 Feb 2003, Lachlan Andrew wrote:
> Greetings Neal,
>
> I have found (what I think is) the problem -- the default
> compression_level is 0, which causes mifluz to be used even when
> the wordlist_compress_zlib flag is true. Is that correct?
> Changing it to 8 causes CDB___memp_cmpr_deflate to be called, and
> seems to fix the problem.
>
> Thanks for your help, and for writing the patch!
> Lachlan
>
> On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> > This dump is happening using the old compression scheme.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
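For intuition about what level '6' buys: it is zlib's default
speed/ratio trade-off, applied here per database page. A
self-contained C sketch (illustrative only; htdig's actual page
compression lives in mp_cmpr.c and is not this code):

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        unsigned char page[8192];        /* stand-in for one database page */
        unsigned char out[8192 + 64];    /* real code should size with compressBound() */
        uLongf outlen = sizeof(out);

        memset(page, 'x', sizeof(page));
        /* level 6 is gzip's default compression level */
        if (compress2(out, &outlen, page, sizeof(page), 6) != Z_OK) {
            fprintf(stderr, "compress2 failed\n");
            return 1;
        }
        printf("8192 -> %lu bytes\n", (unsigned long)outlen);
        return 0;
    }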
From: Lachlan A. <lh...@us...> - 2003-02-10 22:07:51
Greetings Neal,

I have found (what I think is) the problem -- the default
compression_level is 0, which causes mifluz to be used even when the
wordlist_compress_zlib flag is true. Is that correct? Changing it to
8 causes CDB___memp_cmpr_deflate to be called, and seems to fix the
problem.

Thanks for your help, and for writing the patch!
Lachlan

On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> This dump is happening using the old compression scheme.
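In configuration terms, the working setup discussed in this thread
comes down to two htdig.conf lines. A sketch ('6' is the default the
thread settles on; '8' is the value Lachlan tested):

    wordlist_compress_zlib: true
    compression_level: 6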
From: Neal R. <ne...@ri...> - 2003-02-10 18:27:51
This dump is happening using the old compression scheme. Notice that
the code starts out in the HtDig code (Retriever/WordList), then
travels to the BDB code (CDB_*), then pops back into HtDig code
(WordDBCompress & WordDBPage). WordDBCompress & WordDBPage is the
mifluz page compression scheme.

Stick a printf in mp_cmpr.c:CDB___memp_cmpr_deflate. This function
does the zlib page compression for the wordlist_compress_zlib=true
option.

The current mifluz code has issues and may be replaced by a newer
mifluz snapshot in the future.

Thanks.

On Tue, 11 Feb 2003, Lachlan Andrew wrote:
> Greetings Neal,
>
> I've found a repeatable crash in the database code (stack dump below).
> The bug seems to be very sensitive to the code, so I can't easily
> print out debugging information. If you can suggest things to try in
> ddd then I'll happily give them a go. The crash is about 30 mins
> into execution.
>
> Thanks :)
> Lachlan
>
> Program received signal SIGABRT, Aborted.
> 0x4027e621 in kill () from /lib/libc.so.6
> Current language: auto; currently c
> (gdb) where
> #0 0x4027e621 in kill () from /lib/libc.so.6
> #1 0x4027e425 in raise () from /lib/libc.so.6
> #2 0x4027fa53 in abort () from /lib/libc.so.6
> #3 0x401ff45c in std::terminate() () from /usr/lib/libstdc++.so.5
> #4 0x401ff616 in __cxa_throw () from /usr/lib/libstdc++.so.5
> #5 0x401ff862 in operator new(unsigned) () from /usr/lib/libstdc++.so.5
> #6 0x401ff94f in operator new[](unsigned) () from /usr/lib/libstdc++.so.5
> #7 0x4007f24e in Compressor::get_vals(unsigned**, char const*)
>    (this=0xbfffe4a0, pres=0x85c3058, tag=0x2 <Address 0x2 out of
>    bounds>) at WordBitCompress.cc:815
> #8 0x40085268 in WordDBPage::Uncompress_main(Compressor*)
>    (this=0xbfffe500, pin=0xbfffe4a0) at WordDBPage.cc:213
> #9 0x40084fef in WordDBPage::Uncompress(Compressor*, int,
>    __db_cmpr_info*) (this=0xbfffe500, pin=0xbfffe4a0, ndebug=2) at
>    WordDBPage.cc:155
> #10 0x40083547 in WordDBCompress::Uncompress(unsigned char const*,
>    int, unsigned char*, int) (this=0x8245c88, inbuff=0x87d2678 "\004",
>    inbuff_length=2032, outbuff=0x408309b0 "\001", outbuff_length=8192)
>    at WordDBCompress.cc:156
> #11 0x40082e11 in WordDBCompress_uncompress_c (inbuff=0x87d2678
>    "\004", inbuff_length=2032, outbuff=0x408309b0 "\001",
>    outbuff_length=8192, user_data=0x2) at WordDBCompress.cc:48
> #12 0x400fea93 in CDB___memp_cmpr_read (dbmfp=0x82db790,
>    bhp=0x40830978, db_io=0xbfffe680, niop=0xbfffe67c) at mp_cmpr.c:306
> #13 0x400fe832 in CDB___memp_cmpr (dbmfp=0x82db790, bhp=0x40830978,
>    db_io=0xbfffe680, flag=1, niop=0xbfffe67c) at mp_cmpr.c:153
> #14 0x400fdcdf in CDB___memp_pgread (dbmfp=0x82db790, bhp=0x40830978,
>    can_create=0) at mp_bh.c:212
> #15 0x400ffdfb in CDB_memp_fget (dbmfp=0x82db790, pgnoaddr=0xbfffe758,
>    flags=0, addrp=0xbfffe75c) at mp_fget.c:353
> #16 0x400d1747 in CDB___bam_search (dbc=0x82dbab0, key=0xbfffe970,
>    flags=12802, stop=1, recnop=0x0, exactp=0xbfffe844) at
>    bt_search.c:251
> #17 0x400c95a9 in CDB___bam_c_search (dbc=0x82dbab0, key=0xbfffe970,
>    flags=15, exactp=0xbfffe844) at bt_cursor.c:1594
> #18 0x400c8740 in CDB___bam_c_put (dbc_orig=0x82dbd50, key=0xbfffe970,
>    data=0xbfffe990, flags=15) at bt_cursor.c:982
> #19 0x400da9ea in CDB___db_put (dbp=0x80e5f88, txn=0x0,
>    key=0xbfffe970, data=0xbfffe990, flags=0) at db_am.c:508
> #20 0x40092796 in WordList::Put(WordReference const&, int)
>    (this=0xbffff190, arg=@0xbfffe970, flags=0) at WordDB.h:126
> #21 0x4003ab7e in HtWordList::Flush() (this=0xbffff190) at
>    ../htword/WordList.h:118
> #22 0x08056012 in Retriever::parse_url(URLRef&) (this=0xbffff0e0,
>    urlRef=@0x837c560) at Retriever.cc:667
> #23 0x08055612 in Retriever::Start() (this=0xbffff0e0) at
>    Retriever.cc:432
> #24 0x0805daa5 in main (ac=5, av=0xbffff704) at htdig.cc:338
> #25 0x4026c280 in __libc_start_main () from /lib/libc.so.6
> (gdb)
>
> On Tuesday 28 January 2003 09:38, Neal Richter wrote:
> > What DB errors are you speaking of? Turning on
> > wordlist_compress_zlib should be a workaround for the DB errors I
> > know about.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
From: Abbie G. <ag...@th...> - 2003-02-10 15:12:40
When I run ./rundig I have a large set of .txt files as well as .pdfs
to search through. I've done all of the installation process for
converting pdfs to text... however, when I run ./rundig I receive the
error message:

    Deleted, no excerpt: [name of pdf]

for what seems like all of my pdf files. Any ideas what I can do to
fix this?

Also, is it possible to have the words within the document
highlighted for easier use of finding them?

Thanks,

Abbie
From: Ted Stresen-R. <ted...@ma...> - 2003-02-10 14:00:02
The defaults.dtd and defaults.xml documents should also be diff'd and
committed, as I had to make minor changes to both to get the Xalan
processor to process them. As I recall, in the DTD I had to change
one of the attribute types from NUMERIC (which is not, apparently, a
valid attribute type) to CDATA, and in defaults.xml there was one
incorrect tag (it was opened as "href" and closed as just "ref").

Also, we should probably import the htdig.css style sheet into the
archive as well.

Thanks for the positive feedback. It's nice to know that after all
these years using htdig (sometimes for commercial endeavors) I'm
finally able to give something back!

Ted

On Monday, February 10, 2003, at 05:58 AM, Lachlan Andrew wrote:
> * Split attrs.html into categories for faster loading.
> (i.e., commit Ted's scripts)
From: Lachlan A. <lh...@us...> - 2003-02-10 13:07:27
Greetings Neal,

I've found a repeatable crash in the database code (stack dump
below). The bug seems to be very sensitive to the code, so I can't
easily print out debugging information. If you can suggest things to
try in ddd then I'll happily give them a go. The crash is about 30
mins into execution.

Thanks :)
Lachlan

Program received signal SIGABRT, Aborted.
0x4027e621 in kill () from /lib/libc.so.6
Current language: auto; currently c
(gdb) where
#0 0x4027e621 in kill () from /lib/libc.so.6
#1 0x4027e425 in raise () from /lib/libc.so.6
#2 0x4027fa53 in abort () from /lib/libc.so.6
#3 0x401ff45c in std::terminate() () from /usr/lib/libstdc++.so.5
#4 0x401ff616 in __cxa_throw () from /usr/lib/libstdc++.so.5
#5 0x401ff862 in operator new(unsigned) () from /usr/lib/libstdc++.so.5
#6 0x401ff94f in operator new[](unsigned) () from /usr/lib/libstdc++.so.5
#7 0x4007f24e in Compressor::get_vals(unsigned**, char const*)
   (this=0xbfffe4a0, pres=0x85c3058, tag=0x2 <Address 0x2 out of
   bounds>) at WordBitCompress.cc:815
#8 0x40085268 in WordDBPage::Uncompress_main(Compressor*)
   (this=0xbfffe500, pin=0xbfffe4a0) at WordDBPage.cc:213
#9 0x40084fef in WordDBPage::Uncompress(Compressor*, int,
   __db_cmpr_info*) (this=0xbfffe500, pin=0xbfffe4a0, ndebug=2) at
   WordDBPage.cc:155
#10 0x40083547 in WordDBCompress::Uncompress(unsigned char const*,
   int, unsigned char*, int) (this=0x8245c88, inbuff=0x87d2678 "\004",
   inbuff_length=2032, outbuff=0x408309b0 "\001", outbuff_length=8192)
   at WordDBCompress.cc:156
#11 0x40082e11 in WordDBCompress_uncompress_c (inbuff=0x87d2678
   "\004", inbuff_length=2032, outbuff=0x408309b0 "\001",
   outbuff_length=8192, user_data=0x2) at WordDBCompress.cc:48
#12 0x400fea93 in CDB___memp_cmpr_read (dbmfp=0x82db790,
   bhp=0x40830978, db_io=0xbfffe680, niop=0xbfffe67c) at mp_cmpr.c:306
#13 0x400fe832 in CDB___memp_cmpr (dbmfp=0x82db790, bhp=0x40830978,
   db_io=0xbfffe680, flag=1, niop=0xbfffe67c) at mp_cmpr.c:153
#14 0x400fdcdf in CDB___memp_pgread (dbmfp=0x82db790, bhp=0x40830978,
   can_create=0) at mp_bh.c:212
#15 0x400ffdfb in CDB_memp_fget (dbmfp=0x82db790, pgnoaddr=0xbfffe758,
   flags=0, addrp=0xbfffe75c) at mp_fget.c:353
#16 0x400d1747 in CDB___bam_search (dbc=0x82dbab0, key=0xbfffe970,
   flags=12802, stop=1, recnop=0x0, exactp=0xbfffe844) at
   bt_search.c:251
#17 0x400c95a9 in CDB___bam_c_search (dbc=0x82dbab0, key=0xbfffe970,
   flags=15, exactp=0xbfffe844) at bt_cursor.c:1594
#18 0x400c8740 in CDB___bam_c_put (dbc_orig=0x82dbd50, key=0xbfffe970,
   data=0xbfffe990, flags=15) at bt_cursor.c:982
#19 0x400da9ea in CDB___db_put (dbp=0x80e5f88, txn=0x0,
   key=0xbfffe970, data=0xbfffe990, flags=0) at db_am.c:508
#20 0x40092796 in WordList::Put(WordReference const&, int)
   (this=0xbffff190, arg=@0xbfffe970, flags=0) at WordDB.h:126
#21 0x4003ab7e in HtWordList::Flush() (this=0xbffff190) at
   ../htword/WordList.h:118
#22 0x08056012 in Retriever::parse_url(URLRef&) (this=0xbffff0e0,
   urlRef=@0x837c560) at Retriever.cc:667
#23 0x08055612 in Retriever::Start() (this=0xbffff0e0) at
   Retriever.cc:432
#24 0x0805daa5 in main (ac=5, av=0xbffff704) at htdig.cc:338
#25 0x4026c280 in __libc_start_main () from /lib/libc.so.6
(gdb)

On Tuesday 28 January 2003 09:38, Neal Richter wrote:
> What DB errors are you speaking of? Turning on
> wordlist_compress_zlib should be a workaround for the DB errors I
> know about.
From: Lachlan A. <lh...@us...> - 2003-02-10 12:02:54
Greetings all,

Here is my list of *bare essentials* for 3.2.0b5 and a tentative
release schedule to get it out in about three weeks. Comments on both
are invited...

TO-DO LIST
~~~~~~~~~~
SHOWSTOPPER:
* Still need thorough testing of the database, with Neal's zlib patch.
  Any ideas how we can test this thoroughly?

TESTING:
* httools programs:
  (htload a test file, check a few characteristics, htdump and compare)
* Test field-restricted searching

DOCUMENTATION:
* Split attrs.html into categories for faster loading.
  (i.e., commit Ted's scripts)
* require.html is not updated to list new features and disk space
  requirements of 3.2.x (e.g. regex matching, database compression).
  PRs #405280, #405281.
* Document the list of all installed files and default locations.
  PR#405715.
* (Update version number from 3.2.0b4 to 3.2.0b5)

TENTATIVE SCHEDULE
~~~~~~~~~~~~~~~~~~
Fri 14 Feb: Last additions to the above TO-DO list
Fri 21 Feb: Feature freeze.
            No new features added. All available development effort
            for testing, bug fixes, documentation and
            configure/install issues.
Fri 28 Feb: Code freeze. Testing and documenting only.
            Update version number throughout.
Fri  7 Mar: Release.

On Tuesday 04 February 2003 01:55, Geoff Hutchison wrote:
> > Is there a list of tasks which *must* be completed before the
> > release of 3.2.0b4/5?
> The STATUS file is the list
> I would definitely say that this zlib compression issue is a
> "showstopper" at the moment.
>
> Why don't you propose a list of what you think is essential for
> 3.2.0b5.
From: Lachlan A. <lh...@us...> - 2003-02-10 10:13:23
Greetings Ted and all,

Looks great! I propose that we mark that item as "done". Any
seconders? (That's not to say you shouldn't keep making the
improvements you have in mind, Ted.)

Cheers,
Lachlan

On Monday 10 February 2003 17:42, Ted Stresen-Reuter wrote:
> Latest version of effort to address "splitting attrs.html" posted
> here: http://www.tedmasterweb.com/htdig/
> Please take a look and let me know what you think.
From: Ted Stresen-R. <ted...@ma...> - 2003-02-10 06:42:28
Hi,

The latest version of the effort to address "splitting attrs.html" is
posted here: http://www.tedmasterweb.com/htdig/

All links are working as needed. There are some enhancements that
should still be added, but nothing terribly difficult. Will be
researching using Perl to process the XSL document rather than using
Xalan. If time permits, will also attempt to write a Perl script to
do what the XSL document currently does, so that everything is more
portable. Will also be researching ways to improve processing speed
(seems slow to me).

If you have a chance, please take a look and let me know what you
think.

Ted
From: Geoff H. <ghu...@us...> - 2003-02-09 08:14:10
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b5: Next release, tentatively 1 Feb 2003.
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
-- Does Neal's new zlib patch solve this for now?
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set, but working fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Field-restricted searching. (e.g. PR#460833)
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. regex matching, database compression.)
PRs# 405280 #405281.
* TODO.html has not been updated for current TODO list and
completions.
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
From: Lachlan A. <lh...@us...> - 2003-02-08 23:15:59
Greetings all,

Attached is a patch for field-restricted searches. Could people
please test it before I commit it, especially people who have an
external parser and can test the new handling of <meta...> tags?

Questions that arose from doing this:

1. Could/should the tests for '|| (t >= 161 && t <= 255)' be made
   part of HtIsWordChar? I assume they are for accented letters.
2. Should 'prefix_match_character' really be a string, not a char?
3. What (if anything) should I do with the 'author' field, other than
   put the words in the word list?
4. This *doesn't* allow field-restricted parenthesised queries like
   'Harry and Potter and title:(fan and club)'. Is that OK?
5. Should 'exact:' inhibit fuzzy rules, as the comment in htsearch.cc
   suggests? If not, what should it do?
6. I've made some annotations to STATUS. Could someone else please
   check these too, and delete the entries if they are really
   resolved?

Thanks,
Lachlan
From: Lachlan A. <lh...@us...> - 2003-02-08 21:50:14
Thanks for your bug report.

Which ht://Dig version are you using, and what is your operating
system? It will also help if you could send the config.log file.

Cheers,
Lachlan

On Saturday 08 February 2003 03:59, xx28 wrote:
> Dear Sir,
>
> I ran the make command and got the following:
>
> CC -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\"
> -I../htlib -I../htcommon -I../db/dist -I../include -g HtRegex.cc
> "htString.h", line 124: Warning: String literal converted to char*
> in initialization.
> "HtRegex.cc", line 32: Error: Too many arguments in call to
> "regfree()"
> [deleted]
> 6 Error(s) and 2 Warning(s) detected.
> *** Error code 6
> make: Fatal error: Command failed for target `HtRegex.o'
>
> Any idea why it failed?
From: Mitch M. <mi...@th...> - 2003-02-08 09:00:34
Organisation: The Online Record Store
URL: www.theonlinerecordstore.com
Location: Houston, TX, USA
Main Site: N/A
Developer Site: N/A
Files: http://mirrors.theonlinerecordstore.com/htdig
Patch Archive: N/A

cheers,
Mitch
From: xx28 <xx...@dr...> - 2003-02-07 16:59:58
Dear Sir,

I ran the make command and got the following:

CC -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\"
-I../htlib -I../htcommon -I../db/dist -I../include -g HtRegex.cc
"htString.h", line 124: Warning: String literal converted to char* in initialization.
"HtRegex.cc", line 32: Error: Too many arguments in call to "regfree()".
"HtRegex.cc", line 44: Error: Too many arguments in call to "regfree()".
"HtRegex.cc", line 50: Error: Too many arguments in call to "regcomp()".
"HtRegex.cc", line 56: Error: Too many arguments in call to "regerror()".
"HtRegex.cc", line 58: Error: Too many arguments in call to "regerror()".
"HtRegex.cc", line 86: Warning: String literal converted to char* in formal argument str in call to String::operator<<(char*).
"HtRegex.cc", line 101: Error: Too many arguments in call to "regexec()".
6 Error(s) and 2 Warning(s) detected.
*** Error code 6
make: Fatal error: Command failed for target `HtRegex.o'

Any idea why it failed?

Thanks,
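The 'Too many arguments' errors suggest the Solaris CC is seeing
regex prototypes that differ from the POSIX ones HtRegex.cc was
written against. For comparison, here is a minimal self-contained
sketch of the standard POSIX calls and their argument counts (the
pattern and strings are made up for illustration):

    #include <regex.h>
    #include <stdio.h>

    int main(void)
    {
        regex_t re;
        char msg[128];
        int rc = regcomp(&re, "^ht.*dig", REG_EXTENDED | REG_ICASE); /* 3 args */
        if (rc != 0) {
            regerror(rc, &re, msg, sizeof(msg));                    /* 4 args */
            fprintf(stderr, "regcomp: %s\n", msg);
            return 1;
        }
        if (regexec(&re, "ht://Dig", 0, NULL, 0) == 0)              /* 5 args */
            printf("match\n");
        regfree(&re);                                               /* 1 arg */
        return 0;
    }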
From: Budd, S. E <s....@im...> - 2003-02-07 14:29:24
Hello,

Below is a SINGLE line of output in a -vv dig log for htdig
3.2.0b4.20020126. Notice that the first URL on the line has an md5
sum (which means it found something, I hope) but no size. The next
URL appears on the same line.

181:251:3:http://www.ic.ac.uk/P360.htm: f11ca1ef64495ee31d5366e25adc85f0 182:252:3:http://www.ic.ac.uk/P3033.htm: f0f688146a7ae856b27337500542cf4e --------******--* size = 21425

Something not flushing buffers, or what?
From: Lachlan A. <lh...@us...> - 2003-02-06 09:52:56
Thanks for that, Geoff. Are these actually implemented? It seems to
me that exact: sets the isExact flag, which is never actually read.
(I was expecting it to disable the fuzzy algorithms, but it doesn't,
and grep doesn't show it being used anywhere.)

Anyway, I'll leave that alone for now, and ask for feedback once the
patch is finished...

Cheers,
Lachlan

On Tuesday 04 February 2003 01:17, Geoff Hutchison wrote:
> > Could someone who knows what exact: and hidden: mean please
> > explain what they are for?
>
> These are fuzzy algorithms essentially. You could have
> endings:blah. You're right that it's undocumented, and it should
> probably be taken out of the parser. (Nice idea to have per-word
> fuzzy possibilities, but maybe not the right way to do it.)
From: Lachlan A. <lac...@ip...> - 2003-02-05 21:33:46
Greetings Abbie,

Did you actually include the string "<http://www...parsers>" in
htdig.conf? It shouldn't be there -- it's just a pointer to the
documentation on the external_parsers attribute...

Good luck,
Lachlan

On Thursday 06 February 2003 08:17, Abbie Greene wrote:
> I have so far done the following:
>
> Added to htdig.conf:
> external_parsers <http://www.htdig.org/attrs.html#external_parsers>
> : application/pdf->text/html
> /opt/www/htdig/bin/doc2html/doc2html.pl
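For anyone hitting the same problem: with the documentation link
removed, the corrected attribute would look something like this
single (possibly wrapped) htdig.conf line, using the paths from
Abbie's own message:

    external_parsers: application/pdf->text/html /opt/www/htdig/bin/doc2html/doc2html.pl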