|
From: Gilles D. <gr...@sc...> - 2002-08-02 16:50:04
|
According to Natalija Stevens:
> Hi, my current weighting in the .conf file is
>
> title_factor: 100
> keywords_factor: 50
> text_factor: 1
> other _factors are set to 0
>
> At the top of the pages I have <meta name="keywords" content="blah, blah,
> blah">
>
> I have also set in the conf file the line that equates keywords and
> htdig-keywords, which I found on this discussion group.
>
> My problem is that as a result of a search I first get all pdf and doc files
> (marked with four stars), then the rest of the search. This bothers me, as some
> of these pdf and doc files are not really relevant to the search; they might
> just have the search word mentioned in one or two places in the text.
>
> The rest of the search is marked with three or fewer stars, although those
> with search words in the title should really get 4 stars rather than 3 and 2.

What version of htdig are you running? If 3.1.x, did you reindex after
changing the factors in your config file?

What external parser or converter are you using to index pdf and doc files?
Do these scripts output <title> and meta keywords tags from info extracted
from the pdf or doc files?

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba
Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Natalija S. <Nat...@fi...> - 2002-08-02 15:53:31
|
Hi, my current weighting in the .conf file is

title_factor: 100
keywords_factor: 50
text_factor: 1
other _factors are set to 0

At the top of the pages I have <meta name="keywords" content="blah, blah, blah">

I have also set in the conf file the line that equates keywords and
htdig-keywords, which I found on this discussion group.

My problem is that as a result of a search I first get all pdf and doc files
(marked with four stars), then the rest of the search. This bothers me, as some
of these pdf and doc files are not really relevant to the search; they might
just have the search word mentioned in one or two places in the text.

The rest of the search is marked with three or fewer stars, although those
with search words in the title should really get 4 stars rather than 3 and 2.

Can you help with this please?

Natalija
|
|
From: Geoff H. <ghu...@ws...> - 2002-07-31 02:04:44
|
> Does anyone know exactly how to disable the BDB compression for
> the word-database db.words.db?

<http://www.htdig.org/dev/htdig-3.2/attrs.html#wordlist_compress>

-Geoff
|
|
From: Neal R. <ne...@ri...> - 2002-07-30 21:31:02
|
Hey,

Does anyone know exactly how to disable the BDB compression for the
word-database db.words.db?

I'm getting an error after indexing the 131,731st document:

FATAL ERROR:Compressor::get_vals invalid comptype
FATAL ERROR at file:WordBitCompress.cc line:827 !!!

It's having a problem with putting the string "ocean" into the wordDB during
a words.flush(). The string is already in the database; it's contained in a
number of other documents.

After loading a page from the database, it looks like it's expecting a '0' or
'1' as a comptype in Compressor::get_vals(). The actual comptype is '3'.

Here's the backtrace:

#0 0x4025b3e0 in Compressor::get_vals (this=0xbfff4dbc,pres=0x83fd430, tag=0x4031e360 "NumField2") at WordBitCompress.cc:827
#1 0x4026218a in WordDBPage::Uncompress_main (this=0xbfff4e1c,pin=0xbfff4dbc) at WordDBPage.cc:213
#2 0x402600b0 in WordDBPage::Uncompress (this=0xbfff4e1c, pin=0xbfff4dbc,ndebug=0) at WordDBPage.cc:155
#3 0x4025ee1a in WordDBCompress::Uncompress (this=0x848c740,inbuff=0x83fa208 "\004", inbuff_length=2032, outbuff=0x41031758 "\002",outbuff_length=8192) at WordDBCompress.cc:127
#4 0x4025e6e9 in WordDBCompress_uncompress_c (inbuff=0x83fa208 "\004",inbuff_length=2032, outbuff=0x41031758 "\002", outbuff_length=8192,user_data=0x848c740) at WordDBCompress.cc:48
#5 0x4023db09 in CDB___memp_cmpr_read (dbmfp=0x848dbe0, bhp=0x41031720,db_io=0xbfff4f50, niop=0xbfff4f4c) at mp_cmpr.c:288
#6 0x4023d889 in CDB___memp_cmpr (dbmfp=0x848dbe0, bhp=0x41031720,db_io=0xbfff4f50, flag=1, niop=0xbfff4f4c) at mp_cmpr.c:139
#7 0x4023cfa0 in CDB___memp_pgread (dbmfp=0x848dbe0, bhp=0x41031720,can_create=0) at mp_bh.c:214
#8 0x4023eeb4 in CDB_memp_fget (dbmfp=0x848dbe0, pgnoaddr=0xbfff5038,flags=0, addrp=0xbfff503c) at mp_fget.c:353
#9 0x40214fd7 in CDB___bam_search (dbc=0x848e148, key=0xbfff51e8,flags=12802, stop=1, recnop=0x0, exactp=0xbfff5100) at bt_search.c:251
#10 0x4020d85b in CDB___bam_c_search (dbc=0x848e148, key=0xbfff51e8,flags=15, exactp=0xbfff5100) at bt_cursor.c:1594
#11 0x4020c713 in CDB___bam_c_put (dbc_orig=0x848dfe0, key=0xbfff51e8,data=0xbfff51d0, flags=15) at bt_cursor.c:982
#12 0x4021d0a4 in CDB___db_put (dbp=0x83f69b0, txn=0x0, key=0xbfff51e8,data=0xbfff51d0, flags=0) at db_am.c:508
#13 0x4026e371 in WordList::Put (this=0x83f200c, arg=@0x848aef8,flags=0) at ../htword/WordDB.h:128
#14 0x402719b9 in WordList::Override (this=0x83f200c,wordRef=@0x848aef8) at ../htword/WordList.h:118
#15 0x40276eda in HtWordList::Flush (this=0x83f200c) at HtWordList.cc:84

Thanks! Any thoughts/insights would be appreciated!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Geoff H. <ghu...@ws...> - 2002-07-29 01:24:12
|
>> NEEDED FEATURES:
>> * Handle local_urls through file:// handler, for mime.types support.
>
> About a month ago I submitted a patch for this, but haven't heard
> back about it. If it is not acceptable, I'd be happy to fix it, but

No, I just haven't updated the STATUS file. Sorry. I'll take care of that
ASAP.

-Geoff
|
|
From: Lachlan A. <lh...@ee...> - 2002-07-28 23:35:25
|
On Sun, Jul 28, 2002 at 12:12:49PM -0700, it was written
> From: Geoff Hutchison <ghu...@us...>
> Date: Sun, 28 Jul 2002 00:13:41 -0700
> Subject: [htdig-dev] Current Status as of snapshot 3.2.0b4-20020728
>
> NEEDED FEATURES:
> * Handle local_urls through file:// handler, for mime.types support.

About a month ago I submitted a patch for this, but haven't heard back about
it. If it is not acceptable, I'd be happy to fix it, but by leaving it in the
"needed features" section, I'm afraid there may be duplicated effort.

Cheers,
Lachlan

--
Lachlan Andrew lh...@ee...
Phone: +613 8344-3816 Fax: +613 8344-6678
Department of Electrical and Electronic Engineering
University of Melbourne, Victoria, 3010 AUSTRALIA
CRICOS Provider Code 00116K
|
|
From: Geoff H. <ghu...@ws...> - 2002-07-28 14:57:39
|
> My last try to patch 3.1.6 a little failed, so I still have no htdig that
> can use DBs bigger than 2 GB.

I wouldn't try this. It may or may not be worth trying to upgrade the entire
db/ subdirectory of the Berkeley DB, but just changing the configure script
is asking for trouble. If the code to support large files on Linux is in the
source, it'll be set in the configure script too.

> ignored them cold-blooded) and finally I got the following error ...
> md5.cc:16: implicit declaration of function `int malloc(...)'

OK, I wish I had seen this before now, since the snapshot has already been
taken. But the fix for this is pretty easy. The source is missing an
"#include <stdlib.h>" at the top, which is the header that declares malloc.

I'm moving this discussion to htdig-dev since I think it makes much more
sense there.

-Geoff
|
|
From: Geoff H. <ghu...@us...> - 2002-07-28 07:13:42
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Geoff H. <ghu...@ws...> - 2002-07-27 16:28:31
|
> Basically the HTTP library should send a "Proxy-Authorization"
> request header and the task should be done. Any suggestion? Any
> comments? Can I go on with that?

If it works, yes. I don't think there's a distinct problem with adding a
feature like this. The 3.2 code is hardly in a feature freeze at the moment.
;-)

-Geoff
|
|
From: <no...@so...> - 2002-07-24 21:36:12
|
Patches item #586158, was opened at 2002-07-24 21:36
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=304593&aid=586158&group_id=4593

Category: devel
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Andy Bach (afbach)
Assigned to: Nobody/Anonymous (nobody)
Summary: fix for ssl/https connection

Initial Comment:
I found that https connections failed on the 2nd connect for 3.2b4 on Solaris
and linux. I appear to have fixed it by changing the SSLConnection
destructor, adding a ctx = NULL; to htnet/SSLConnection.cc:

SSLConnection::~SSLConnection()
{
    if ( ctx != NULL )
        SSL_CTX_free(ctx);
    ctx = NULL;
}

I'd added much debug to track it down to the failure in
SSLConnection::Connect, at the SSL_new(ctx) call for the seg fault. ctx had
the same value as during the first connection, yet that connection was
closed. SSLConnection::InitSSL only inits if ctx is NULL so ...

Anyway, it seems to have fixed it. I did have to manually alter
include/htconfig.h to #define HAVE_SSL_H, HAVE_LIBSSL and HAVE_LIBCRYPTO even
though configure found ssl and -lssl -lcrypto got added to the Makefiles as
needed.
|
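The failure mode in this report can be sketched without OpenSSL. This is an illustration only: FakeCtx, fake_ctx_new, fake_ctx_free and Connection are hypothetical stand-ins, not htdig or OpenSSL names, and the sketch assumes (as the report suggests) that the context pointer outlives a single connection and that initialization is skipped whenever the pointer is non-NULL.

```cpp
#include <cstddef>

// Hypothetical stand-in for an SSL context; not an OpenSSL type.
struct FakeCtx { int id; };

static int live_contexts = 0;  // number of contexts currently allocated

FakeCtx *fake_ctx_new() { ++live_contexts; return new FakeCtx{live_contexts}; }
void fake_ctx_free(FakeCtx *c) { --live_contexts; delete c; }

// Hypothetical connection wrapper. The context pointer is shared
// (static), which is the assumption that makes the stale pointer
// visible to the next connection.
class Connection {
public:
    // Lazily create the context, but only when the pointer is NULL.
    void init() { if (ctx == NULL) ctx = fake_ctx_new(); }

    // Release the context and reset the pointer, so the next init()
    // allocates a fresh context instead of reusing a freed address.
    void close() {
        if (ctx != NULL) fake_ctx_free(ctx);
        ctx = NULL;
    }

    bool has_context() const { return ctx != NULL; }

private:
    static FakeCtx *ctx;
};

FakeCtx *Connection::ctx = NULL;
```

Without the reset in close(), a second init() would see a non-NULL but already freed pointer and skip the allocation, which matches the crash on the second connect described above.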
|
From: Gabriele B. <an...@ti...> - 2002-07-23 13:45:10
|
Ciao guys,
I have been asked to add a feature to ht://Check regarding the proxy
authorization (and therefore to ht://Dig). If I am not wrong, I should just
add the handling of a new configuration attribute, maybe called
http_proxy_authorization, very similar to the authorization attribute we
already have.
Basically the HTTP library should send a "Proxy-Authorization" request
header and the task should be done. Any suggestion? Any comments? Can I go
on with that?
Thanks
-Gabriele
--
Gabriele Bartolini - Web Programmer
Current Location: Prato, Tuscany, Italia
an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> find bin/laden -name osama -exec rm {} \;
|
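A minimal sketch of the header this proposal describes, assuming the new attribute would hold "user:password" credentials and that the Basic scheme is used, as with ordinary HTTP authorization. The function names base64_encode and proxy_authorization_header are hypothetical and are not taken from the ht://Dig or ht://Check sources.

```cpp
#include <cstddef>
#include <string>

// Standard Base64 encoding (RFC 4648 alphabet, '=' padding).
std::string base64_encode(const std::string &in) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    while (i + 2 < in.size()) {  // full 3-byte groups
        unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8) |
                     static_cast<unsigned char>(in[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
        i += 3;
    }
    std::size_t rest = in.size() - i;  // 0, 1 or 2 leftover bytes
    if (rest == 1) {
        unsigned n = static_cast<unsigned char>(in[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Build the request header line for Basic proxy credentials given
// as "user:password" (hypothetical helper, for illustration only).
std::string proxy_authorization_header(const std::string &credentials) {
    return "Proxy-Authorization: Basic " + base64_encode(credentials) + "\r\n";
}
```

The resulting line would be sent alongside the other request headers whenever a proxy is configured; an ordinary Authorization header, if any, would still be built separately for the origin server.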
|
From: Jim C. <gre...@yg...> - 2002-07-23 06:40:35
|
We...@ao...'s bits of Mon, 15 Jul 2002 translated to:

>Hi!
>I joined this list several months ago when I first started looking at htdig
>3.1.5. I was having a problem at the time with segmentation faults when
>running rundig. I was only able to get around this through lowering the
>max_doc size to the default and removing some 7 or 8 of the urls from my list
>of 384. I'm not sure why this made a difference, but it did.

Do any of the documents that you removed consistently cause a crash if
indexed alone? If so, I would be willing to try them on a couple of other
platforms in order to see if the problem is general or specific to your
setup.

Depending on your circumstances, you might also want to try one of the 3.2.x
snapshots (http://www.htdig.org/files/snapshots/). They are not recommended
for production unless you need features not in 3.1.6, but a number of people
are using them with some success.

Jim
|
|
From: Geoff H. <ghu...@us...> - 2002-07-21 07:13:56
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: <And...@wi...> - 2002-07-19 22:25:32
|
Hey. Sort of a newbie on some of this, but on Intel sol 2.7, gcc 2.95.3,
htdig-3.2.0b4-20020714, I'm trying to get the ssl stuff to work (openssl
0.9.6c). Though I configure w/ --with-ssl and configure seems to recognize
it, HAVE_SSL_H isn't set in the config process (HAVE_SSL is, but then I don't
get ssl connections). Document.cc seems to want HAVE_SSL_H to enable the ssl
sockets, so I set it by hand. All compiles fine, but my 2nd socket connection
dumps core. htdig succeeds at finding /robots.txt:

- Persistent connections: enabled
- HEAD before GET: enabled
- Timeout: 30
- Connection space: 0
- Max Documents: -1
- TCP retries: 1
- TCP wait time: 5
- Accept-Language:
Trying to retrieve robots.txt file
Creating an HtHTTPSecure object
Making HTTPS request on https://wiwbei.wiwb.circ7.dcn/robots.txt
Making a HEAD call before the GET
Try to get through to host wiwbei.wiwb.circ7.dcn (port 443)
1 - Open of the connection ok
Assigned the remote host wiwbei.wiwb.circ7.dcn
Assigned the port 443
Header line: HTTP/1.1 200 OK
Header line: Date: Fri, 19 Jul 2002 22:21:58 GMT
Header line: Server: Stronghold/3.0 Apache/1.3.12 C2NetEU/3011 (Unix) tomcat/1.0 mod_ssl/2.6.4 OpenSSL/0.9.5a mod_perl/1.22
...

but on the next attempt, I get a core dump:

> wiwbei.wiwb.circ7.dcn supports HTTP persistent connections (30)
0:2:0:https://wiwbei.wiwb.circ7.dcn/software/:
Creating an HtHTTPSecure object
Making HTTPS request on https://wiwbei.wiwb.circ7.dcn/software/
Making a HEAD call before the GET
Try to get through to host wiwbei.wiwb.circ7.dcn (port 443)
3 - Open of the connection ok
Assigned the remote host wiwbei.wiwb.circ7.dcn
Assigned the port 443

This happens w/ and w/o persistent_connections enabled. I can use openssl
s_client ... to retrieve GET /software/ from the link above.

a

Andy Bach, Sys. Mangler
Internet: and...@wi...
VOICE: (608) 261-5738 FAX 264-5030
"To understand recursion, we must first understand recursion."
|
|
From: Neal R. <ne...@ri...> - 2002-07-19 17:44:34
|
Hey all,

So here's an update on libhtdig. If you aren't familiar with it, it's a
library version of htdig. There are two libraries, libhtdig.so and
libhtdigphp.so. The PHP version contains PHP wrappers and can be loaded as a
PHP module directly and queried using PHP calls.

We're into QA of it currently. It's holding up very well, with no major
problems so far. It's indexing 150K documents exported out of a database.

Indexing time is about 0.1 to 0.3 seconds per document (avg 4k) on a
dual-processor PIII-800MHz.

The excerpts DB file is 390 MB and the words DB file is 120 MB, so about 3.3K
of archive space per document.

I'll be submitting a snapshot of the code soon, including updates for native
Win32 binaries (using separate makefiles).

The current code is based on an older 3.2.0b4 snapshot from about April. The
next project will be to use xxdiff to merge the changes into the current
snapshot.

Thanks.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Scott G. <sgi...@su...> - 2002-07-17 07:46:56
|
A client's Web site which is indexed with htdig was having problems
with some corrupt PDF files. Unfortunately, the errors indicated that
the PDF files weren't indexing properly, but not which files were
causing the problem. This patch to pdf2html.pl has it check the error
code of the PDF conversion programs, and report an error to stdout and
exit with a failure code if that program exits with a failure code:
http://www.suspectclass.com/~sgifford/htdig/htdig-3.1.6-pdf2html-checkerrorcode.patch
Adding to the confusion, xpdf's pdftotext doesn't exit with an error
code when it fails to parse a document. This patch to xpdf fixes
that:
http://www.suspectclass.com/~sgifford/htdig/xpdf-1.01-pdftotext-exitstatus.patch
With these two patches, the stderr output of htdig includes the
temporary filename and a URL of the document whose conversion failed,
making tracking down problems much easier.
Let me know if you have any problems, questions, etc.
----ScottG.
|
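The idea behind the first patch (check the converter's exit status and name the failing document) can be sketched as follows. This is not the pdf2html.pl patch itself, which is Perl; run_and_get_exit_code and conversion_error are hypothetical helper names, and the sketch assumes a POSIX system where std::system runs the command through the shell.

```cpp
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Run a shell command and return its exit code, or -1 if it could not
// be started or did not exit normally (POSIX only).
int run_and_get_exit_code(const std::string &command) {
    int status = std::system(command.c_str());
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}

// Build the error text for a failed conversion, naming both the
// temporary file and the URL so the bad document can be found.
// Returns an empty string when the exit code signals success.
std::string conversion_error(int exit_code, const std::string &tmp_file,
                             const std::string &url) {
    if (exit_code == 0) return "";
    return "conversion of " + tmp_file + " (" + url + ") failed with exit code " +
           std::to_string(exit_code);
}
```

A caller would run the converter, pass its exit code to conversion_error, print any non-empty message, and itself exit with a failure code so the problem propagates up to htdig.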
|
From: <We...@ao...> - 2002-07-15 17:02:48
|
Hi!

I joined this list several months ago when I first started looking at htdig
3.1.5. I was having a problem at the time with segmentation faults when
running rundig. I was only able to get around this through lowering the
max_doc size to the default and removing some 7 or 8 of the urls from my list
of 384. I'm not sure why this made a difference, but it did.

I am now looking to upgrade to 3.1.6 in the hopes of not having to do this.
Unfortunately I am still getting the segmentation fault and core dump. I
attempted the ./configure --with-rx suggested in the FAQ with similar
results. I ran the gdb backtrace but was told there was "no stack".

I'm running with apache 1.3.26 on BSDI and I'm not doing anything fancy in
the configuration at all.

Any ideas?

Thanks in advance,
Wendy
|
|
From: Geoff H. <ghu...@us...> - 2002-07-14 07:13:54
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistant mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: codewuzz <cod...@gm...> - 2002-07-12 13:00:35
|
Hi,

I just wanted to tell you that I had some problems running the autotools:

autoconf
automake --foreign
autoheader
aclocal

with the newer versions:

autoconf (GNU Autoconf) 2.53
autoheader (GNU Autoconf) 2.53
automake (GNU automake) 1.5
aclocal (GNU automake) 1.5

$ autoconf
configure.in:83: warning: AC_ARG_PROGRAM invoked multiple times
configure.in:108: warning: AC_PROG_LEX invoked multiple times
configure.in:158: error: do not use LIBOBJS directly, use AC_LIBOBJ (see section `AC_LIBOBJ vs. LIBOBJS'
------------------------------------------------------------------------------

whereas the older versions of these tools work fine:

autoconf 2.13
autoheader 2.13
automake 1.4
aclocal 1.4
|
|
From: <no...@so...> - 2002-07-10 11:29:43
|
Patches item #579581, was opened at 2002-07-10 11:29
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=304593&aid=579581&group_id=4593

Category: stable
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Seb Wills (sebwills)
Assigned to: Nobody/Anonymous (nobody)
Summary: bug fix: parsing of date meta header

Initial Comment:
I found a bug in htdig/Retriever.cc which prevents it from correctly parsing
YYYY-MM-DD dates in date meta headers (it requires a space, rather than
allowing a hyphen, between YYYY and MM). The patch (below) is trivial. This
is for the 3.1.6 source - I haven't checked whether the problem is fixed in
the 3.2 source.

casi@sphinx$ diff -c Retriever.cc~ Retriever.cc
*** Retriever.cc~ Thu Jan 31 23:47:17 2002
--- Retriever.cc Wed Jul 10 00:38:05 2002
***************
*** 1139,1145 ****
  year += 1900;
  else if (year >= 19100) // seen some programs do it, why not check?
  year -= (19100-2000);
! while (isspace(*s))
  s++;
  // get month...
--- 1139,1145 ----
  year += 1900;
  else if (year >= 19100) // seen some programs do it, why not check?
  year -= (19100-2000);
! while (*s == '-' || isspace(*s))
  s++;
  // get month...
|
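The effect of the one-line change can be seen in a simplified, standalone parser. This is only a sketch of the technique (skip hyphens as well as whitespace between the fields); skip_separators, read_number and parse_ymd are hypothetical names and do not reproduce the surrounding Retriever.cc logic, such as the two-digit year handling.

```cpp
#include <cctype>

// Skip the separator between date fields. Accepting '-' as well as
// whitespace lets "YYYY-MM-DD" parse the same way as "YYYY MM DD".
static const char *skip_separators(const char *s) {
    while (*s == '-' || std::isspace(static_cast<unsigned char>(*s)))
        s++;
    return s;
}

// Read a run of decimal digits into value and advance s past it.
// Returns false if s does not start with a digit.
static bool read_number(const char *&s, int &value) {
    if (!std::isdigit(static_cast<unsigned char>(*s))) return false;
    value = 0;
    while (std::isdigit(static_cast<unsigned char>(*s)))
        value = value * 10 + (*s++ - '0');
    return true;
}

// Parse "YYYY<sep>MM<sep>DD" into three integers.
// Returns false if any of the three fields is missing.
bool parse_ymd(const char *s, int &year, int &month, int &day) {
    s = skip_separators(s);
    if (!read_number(s, year)) return false;
    s = skip_separators(s);
    if (!read_number(s, month)) return false;
    s = skip_separators(s);
    return read_number(s, day);
}
```

With only the isspace() test in skip_separators, the month field of "2002-07-10" would start at the hyphen and fail to parse, which is the behaviour the report describes.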
|
From: codewuzz <cod...@gm...> - 2002-07-09 12:37:14
|
Hi,
I'm having a problem compiling ht://Dig through KDevelop 2.1: during
./configure, everything crashes after "creating db_config.h".
It's quite odd because it works fine in the console. I don't know whether this
is really a KDevelop bug; maybe so.
I'm using the latest version, 3.2.0b4, mainly coding to add some meta tags
and so on. I wanted to compile everything through KDevelop for the ease of adding
files to the project. I also wonder whether it would be possible to include a .prj
for htdig in CVS. I know it shouldn't be necessary, but maybe it's my
way out of this.
Up until now I have just been coding in the existing files of the project, but
I believe I should tidy up some parts of my code into a library of their own. That
makes sense, especially since I want to do some further extensions.
Maybe this can be added as a package/plugin to the contributions section later.
I tried to add my own files directly to configure.in, makefile.config,
and Makefile.am. However, running automake didn't seem to be a good idea, as
I got some problems with the htfuzzy lib having a dependency on itself.
Actually, running automake seems to be a bad idea even without modifying the code.
Regards,
CodeWuzz.
Stack Trace:
<zip>
read(13, "creating db_config.h\n", 1024) = 21
gettimeofday({1026212413, 167458}, NULL) = 0
write(3, "8\5\5\0\323\21\200\2\4 \0\0\377\377\377\0\276\356\377\377"..., 1248) = 1248
gettimeofday({1026212413, 168388}, NULL) = 0
gettimeofday({1026212413, 168803}, NULL) = 0
write(3, "C\5\5\0\16\3\200\2G\r\200\2\0\0\0\0\17\0\17\0B\r\7\0\16"..., 1464) = 1464
ioctl(3, 0x541b, [0]) = 0
gettimeofday({1026212413, 169094}, NULL) = 0
select(17, [3 4 6 7 9 11 13 16], [], NULL, {0, 105014}) = 1 (in [3], left {0, 100000})
gettimeofday({1026212413, 177561}, NULL) = 0
ioctl(3, 0x541b, [64]) = 0
read(3, "\16\371\375\377\20\3\200\2\0\0>\10\270\357\30\t<\371\377"..., 64) = 64
ioctl(3, 0x541b, [0]) = 0
ioctl(3, 0x541b, [0]) = 0
gettimeofday({1026212413, 177897}, NULL) = 0
select(17, [3 4 6 7 9 11 13 16], [], NULL, {0, 96211}) = 0 (Timeout)
gettimeofday({1026212413, 272865}, NULL) = 0
ioctl(3, 0x541b, [0]) = 0
gettimeofday({1026212413, 273003}, NULL) = 0
select(17, [3 4 6 7 9 11 13 16], [], NULL, {0, 1105}) = 0 (Timeout)
gettimeofday({1026212413, 282547}, NULL) = 0
gettimeofday({1026212413, 282656}, NULL) = 0
ioctl(3, 0x541b, [0]) = 0
gettimeofday({1026212413, 282784}, NULL) = 0
select(17, [3 4 6 7 9 11 13 16], [], NULL, {0, 293979}) = 1 (in [13], left {0, 50000})
read(13, "db_config.h is unchanged\n", 1024) = 25
gettimeofday({1026212413, 543902}, NULL) = 0
write(3, "8\5\5\0\323\21\200\2\4 \0\0\377\377\377\0\256\356\377\377"..., 1264) = 1264
gettimeofday({1026212413, 544878}, NULL) = 0
gettimeofday({1026212413, 545289}, NULL) = 0
write(3, "C\5\5\0\16\3\200\2[\r\200\2\0\0\0\0\17\0\17\0B\r\7\0\16"..., 1464) = 1464
ioctl(3, 0x541b, [0]) = 0
gettimeofday({1026212413, 545551}, NULL) = 0
select(17, [3 4 6 7 9 11 13 16], [], NULL, {0, 31212}) = 1 (in [3], left {0, 30000})
gettimeofday({1026212413, 553859}, NULL) = 0
ioctl(3, 0x541b, [64]) = 0
read(3, "\16\371r\0\20\3\200\2\0\0>\10\270\357\30\t<\371\377\277"..., 64) = 64
ioctl(3, 0x541b, [0]) = 0
ioctl(3, 0x541b, [0]) = 0
gettimeofday({1026212413, 554192}, NULL) = 0
select(17, [3 4 6 7 9 11 13 16], [], NULL, {0, 22571}) = 2 (in [13 16], left {0, 30000})
--- SIGCHLD (Child exited) ---
wait4(1792, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 1792
write(8, "\0\7\0\0", 4) = 4
write(8, "\0\0\0\0", 4) = 4
write(8, "\0\0\0\0", 4) = 4
write(8, "\0\0\0\0", 4) = 4
sigreturn() = ? (mask now [RTMIN])
read(13, "\n\nNow you must run \'make\' follow"..., 1024) = 54
gettimeofday({1026212413, 560847}, NULL) = 0
--- SIGSEGV (Segmentation fault) ---
rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DFL}, 8) = 0
alarm(3) = 0
write(2, "KCrash: crashing.... crashRecurs"..., 47) = 47
getpid() = 1786
write(2, "KCrash: Application Name = kdeve"..., 64) = 64
fork() = 4064
close(9) = 0
getrlimit(0x7, 0xbfffe488) = 0
|
|
From: Scott G. <sgi...@su...> - 2002-07-08 08:12:57
|
My multiple noindex_start/noindex_end patch has been updated. The old one
had some typos that I didn't notice, which caused crashes on some
systems.
You can get the new version at:
http://www.suspectclass.com/~sgifford/htdig/htdig-3.1.6-multiple-noindex.patch
More information about this patch is available at:
http://www.suspectclass.com/~sgifford/htdig/htdig-3.1.6-multiple-noindex.README
For posterity, the old version is available at:
http://www.suspectclass.com/~sgifford/htdig/htdig-3.1.6-multiple-noindex-0.1.patch
-----ScottG.
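
For readers unfamiliar with the idea, here is a minimal sketch of what honoring several noindex_start/noindex_end pairs could look like. It is not taken from the patch above (ht://Dig is written in C++). The function name `strip_noindex`, the case-sensitive matching, and the marker strings in the usage note are assumptions made purely for illustration.

```python
def strip_noindex(text, starts, ends):
    """Drop every region delimited by a (starts[i], ends[i]) marker pair.

    Hypothetical helper: markers are matched case-sensitively, and a region
    with no closing marker is assumed to run to the end of the text.
    """
    if len(starts) != len(ends):
        raise ValueError("start and end markers must pair up one-to-one")
    if any(not m for m in list(starts) + list(ends)):
        raise ValueError("markers must be non-empty strings")

    kept = []
    pos = 0
    while pos < len(text):
        # Find the earliest start marker at or after the current position.
        best = None
        for start, end in zip(starts, ends):
            i = text.find(start, pos)
            if i != -1 and (best is None or i < best[0]):
                best = (i, start, end)
        if best is None:
            kept.append(text[pos:])  # no more excluded regions
            break
        i, start, end = best
        kept.append(text[pos:i])     # keep the text before the region
        j = text.find(end, i + len(start))
        if j == -1:
            break                    # unterminated region: drop the rest
        pos = j + len(end)
    return "".join(kept)
```

Called with two illustrative pairs, e.g. `strip_noindex(page, ["<!--htdig_noindex-->", "<noindex>"], ["<!--/htdig_noindex-->", "</noindex>"])`, it returns the page text with both kinds of region removed.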
|
|
From: Geoff H. <ghu...@us...> - 2002-07-07 07:14:02
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Jim C. <gre...@yg...> - 2002-07-02 10:36:14
|
LEGISMEX's bits of Mon, 1 Jul 2002 translated to:
>
>does any searcher exist that could one use for platform
>Internet information server 5.0 ?

Some people have had luck building ht://Dig under various Windows
platforms. See http://www.htdig.org/FAQ.html#q2.6

If you are just looking for something that can index sites served up by
IIS, ht://Dig should work fine for that purpose.

Jim
|
|
From: LEGISMEX <ala...@it...> - 2002-07-01 16:46:08
|
Is there a search engine that can be used on the Internet Information Server 5.0 platform?

Ing. Alberto Angel Garza
|