| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | | 47 | 74 | 66 |
| 2002 | 95 | 102 | 83 | 64 | 55 | 39 | 23 | 77 | 88 | 84 | 66 | 46 |
| 2003 | 56 | 129 | 37 | 63 | 59 | 104 | 48 | 37 | 49 | 157 | 119 | 54 |
| 2004 | 51 | 66 | 39 | 113 | 34 | 136 | 67 | 20 | 7 | 10 | 14 | 3 |
| 2005 | 40 | 21 | 26 | 13 | 6 | 4 | 23 | 3 | 1 | 13 | 1 | 6 |
| 2006 | 2 | 4 | 4 | 1 | 11 | 1 | 4 | 4 | | 4 | | 1 |
| 2007 | 2 | 8 | 1 | 1 | 1 | | 2 | | 1 | | | |
| 2008 | 1 | | 1 | 2 | | | 1 | | 1 | | | |
| 2009 | | | 2 | | 1 | | | | | | | |
| 2010 | | | | | | | | | | | | 1 |
| 2011 | | | 1 | | 1 | | | | | 1 | | |
| 2012 | | | | | | | 1 | | | | | |
| 2013 | | | | 1 | | | | | | | | |
| 2016 | 1 | | | | | | | | | | | |
| 2017 | | | | | | | | | | | 1 | |
|
From: Lachlan A. <lh...@us...> - 2003-03-09 09:23:03
|
Greetings Jim,

Attached is a hack which explicitly stops the recursion in OS X. Does it work? (Neal, would it be better in one of the other functions in the loop?)

I don't know why a different OS should crash in a different place. Does OS X support pread? Type man pread.

Are you having any luck with the other errors in 'make check'?

Thanks!
Lachlan

On Friday 28 February 2003 11:23, Jim Cole wrote:
> Hi - Just a follow up on the issue of zlib version. I installed
> the 1.1.4 version of zlib available via Fink and rebuilt
> everything. Using the newer version of zlib, I encounter the same
> problem (i.e. a segfault from htdig with a very deep stack trace).
> I did perform a distclean and verified the use of the 1.1.4 version
> libz via otool.
>
> Jim
>
> On Thursday, February 27, 2003, at 04:41 AM, Lachlan Andrew wrote:
> > On Thursday 27 February 2003 11:16, Jim Cole wrote:
> >> The backtrace is attached. The problem does not occur if I turn
> >> off compression.
> >
> > Thanks. My guess is that (part of) the reason for the very deep
> > recursion is that it's trying to allocate a block of len=8247
> > bytes, when the page size is only 8192:
> > #3244 0x00070958 in CDB___memp_alloc (dbmp=0xa98c30,
> > memreg=0xa99f60, mfp=0xc84e98, len=8247, offsetp=0x0,
> > retp=0xbfffd900) at mp_alloc.c:88
> >
> > I used to get the error
> > Unable to allocate %lu bytes from mpool shared region
> > at some stage too, which is generated inside CDB___memp_alloc.
> > From memory, that was when I was using 1.1.3.
> >
> > If that is really the problem, it can be fixed by testing
> > explicitly whether len>pagesize (if the pagesize is available
> > somewhere...).
|
|
From: Geoff H. <ghu...@us...> - 2003-03-09 08:15:24
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b5: Next release, First quarter 2003???
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
-- Does Neal's new zlib patch solve this for now?
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set, but working fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
Can anyone reproduce this? I can't! -- Lachlan
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. regex matching, database compression).
PR#405280, PR#405281.
* TODO.html has not been updated for current TODO list and
completions.
I've tried. Someone "official" please check and remove this -- Lachlan
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
|
|
From: Lachlan A. <lh...@us...> - 2003-03-05 11:52:29
|
Just following up on that, the problem seems to be that
dbmfp->mfp->last_pgno doesn't get read in properly when the database
is reopened. I *assume* that it should be the last page actually
used, but at some point the free list allocates 71 (which was used
in the database when it was written in the previous program) despite
dbmfp->mfp->last_pgno having been reset to 33. I added the if
statement below in the diagnostic section at the end of
CDB___memp_cmpr_alloc, and the only place it reports an error is the
one that causes the crash...
#ifdef DEBUG_CMPR
  fprintf(stderr, "CDB___memp_cmpr_alloc:: reuse free page %d from weakcmpr\n",
          *pgnop);
  if (*pgnop > dbmfp->mfp->last_pgno)
    fprintf(stderr, "*******ERROR?? dbmfp->mfp->last_pgno %d, allocating %d\n",
            dbmfp->mfp->last_pgno, *pgnop);
#endif
Good night :)
Lachlan
On Wednesday 05 March 2003 22:08, Lachlan Andrew wrote:
> I've got as far as finding that, at some point, page 3 has a chain
> 3->71->34, but that page 27 is then allocated the chain 27->70->71,
> so page 3 gets corrupted. I'm about to start looking for the
> free-list.
|
|
From: Lachlan A. <lh...@us...> - 2003-03-05 11:08:47
|
Greetings Neal,

Have you had any luck reproducing this bug?

I've got as far as finding that, at some point, page 3 has a chain 3->71->34, but that page 27 is then allocated the chain 27->70->71, so page 3 gets corrupted. I'm about to start looking for the free-list.

Cheers,
Lachlan

On Friday 28 February 2003 06:14, Neal Richter wrote:
> Great! I'll test and fix this ASAP.
|
|
From: Lachlan A. <lac...@ip...> - 2003-03-04 13:07:57
|
This has been fixed in the latest snapshot. If you don't want to download that, just put the line
#define DBL_MAX 1e37
at the start of Display.cc

Cheers,
Lachlan

On Tuesday 04 March 2003 20:34, Erick Papadakis wrote:
> Just tried installing, and on MAKE, it gives me the following
> error:
>
> Display.cc:55: `DBL_MAX' undeclared (first use this function)
> Display.cc:55: (Each undeclared identifier is reported only once
|
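For anyone hitting the same build error, here is a slightly fuller sketch of that workaround; the #ifndef guard and the <float.h> include are additions for safety and are not part of the original suggestion:

// Near the top of htsearch/Display.cc: use the system definition of DBL_MAX
// when the headers provide it, and fall back to the value suggested above.
#include <float.h>
#ifndef DBL_MAX
#define DBL_MAX 1e37
#endif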
|
From: Erick P. <eri...@ya...> - 2003-03-04 09:34:04
|
Just tried installing, and on MAKE, it gives me the following error:
Display.cc:55: `DBL_MAX' undeclared (first use this function)
Display.cc:55: (Each undeclared identifier is reported only once
for each function it appears in.)
make[1]: *** [Display.o] Error 1
make[1]: Leaving directory `/home/MYUSER/htdig-3.2.0b3/htsearch'
make: *** [all-recursive] Error 1
What am I doing wrong? I followed the instructions to a T. Ran a
./configure with the necessary options (I notice there is no CONFIG file
to edit unlike the HTDIG 3.1.6 version, so I did this with the
--OPTION=VALUE settings).
Any ideas would be appreciated.
Thanks,
Erick
|
|
From: Geoff H. <ghu...@ws...> - 2003-03-04 04:15:24
|
On Monday, March 3, 2003, at 09:37 PM, Geoff Hutchison wrote:
> for the maximum number of words in a document. But 2^24 gives us a
> good 16-million words, which is good enough for War and Peace. (I'm
> checking at the moment.)
Well, we might get by on less than that:
(These are the Project Gutenberg etext editions of _War and Peace_ and
the King James Bible.)
localhost: ghutchis% wc wrnpc10.txt
67337 566237 3282452 wrnpc10.txt
localhost: ghutchis% wc bible11.txt
114385 822894 4959549 bible11.txt
So I'd guess that 2^20 should be more than enough words. Or does
someone have a nice long document to prove me wrong?
-Geoff
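A throwaway snippet (not part of ht://Dig) that double-checks the arithmetic, using the word counts quoted from wc above:

// Compares 16-, 20- and 24-bit word-location limits against the two word
// counts quoted above (566237 for War and Peace, 822894 for the KJV Bible).
#include <cstdint>
#include <cstdio>

int main() {
    const std::uint32_t war_and_peace = 566237;   // words in wrnpc10.txt
    const std::uint32_t kjv_bible     = 822894;   // words in bible11.txt
    const int widths[] = {16, 20, 24};
    for (int bits : widths) {
        std::uint32_t locations = 1u << bits;     // distinct word locations
        std::printf("%2d bits: %8u locations; War and Peace %s, Bible %s\n",
                    bits, locations,
                    war_and_peace <= locations ? "fits" : "overflows",
                    kjv_bible     <= locations ? "fits" : "overflows");
    }
    return 0;
}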
|
|
From: Geoff H. <ghu...@ws...> - 2003-03-04 03:37:49
|
> That could have its own problems. If they are labelled -1, -2, ...
> then phrase searching would have to match *backwards* for negative
> numbers. Then if true positions overflowed into negative numbers,
> ...very negative number, then it is essentially starting from a very
> large (unsigned) location. Thoughts?

It's pretty easy to come up with an n-bit integer that should be long enough for practical purposes. 2^16 = 65,536 which is probably still a bit too small for the maximum number of words in a document. But 2^24 gives us a good 16-million words, which is good enough for War and Peace. (I'm checking at the moment.)

> Regarding flexibility, we could make htsearch treat words separated
> by "invalid" punctuation (but no spaces) as a phrase, and make the
> default valid_punctuation empty. That way people who want the
> current functionality can have it (except queries where words are not
> separated by spaces but *should* match those words separately?) but
> the default would be less buggy for phrase searches.

Sounds sensible to me--but I think we need more than one or two voices on this. But just to make sure I'm clear on what you want to do...

status-quo -> status (location 0) + quo (location 1)

And there's no entry for "statusquo".

>> For some people, punctuation has meaning. Let's say we have part
>> numbers or dates. "3/24/03" isn't really the same as "32403" and
>> I'm not sure the phrase search works well either.
>
> Ah, yes. All three would be too short to be indexed... But isn't
> that what extra_word_characters is for?

Yes. But my point is that we should eventually work out a WordToken class or something that wraps up all these attributes and can be generalized for Unicode-type issues.

-Geoff
|
|
From: Geoff H. <ghu...@us...> - 2003-03-02 08:18:52
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b5: Next release, First quarter 2003???
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
-- Does Neal's new zlib patch solve this for now?
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set, but working fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
Can anyone reproduce this? I can't! -- Lachlan
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. regex matching, database compression).
PR#405280, PR#405281.
* TODO.html has not been updated for current TODO list and
completions.
I've tried. Someone "official" please check and remove this -- Lachlan
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
|
|
From: Lachlan A. <lh...@us...> - 2003-02-28 22:55:33
|
Thanks for your explanations, Geoff :) More questions follow.

On Saturday 01 March 2003 04:51, Geoff Hutchison wrote:
> > 1. location must be in the range 0-1000?
> That's a 3.1-ism.
>
> > 2. Could we add "meta" information
> > at successive locations starting from, say, location 10,000?
>
> Actually, now that I think about it, a better idea is to use
> negative word locations for META information.
> As for some other arbitrary
> number--we might actually have documents that long (esp. with PDF
> indexing).

That could have its own problems. If they are labelled -1, -2, ... then phrase searching would have to match *backwards* for negative numbers. Then if true positions overflowed into negative numbers, the phrases wouldn't match. (If such overflow is impossible with n-bit numbers, we could use *unsigned* locations, and count forward from 2^(n-1) for meta information.) If we count *forward* from a very negative number, then it is essentially starting from a very large (unsigned) location. Thoughts?

> > 3. With phrase searching, do we still need valid_punctuation?
> > For example, "post-doctoral"
>
> This is a strange example. What if I had a hyphenated word? I don't
> know that your "phrase conversion" is the best solution. What we do
> need is a flexible "word parser" that addresses some of these
> issues.

I suppose a key is how often people do phrase searches vs word searches. Optionally-hyphenated words are trouble-prone since the status-quo gives oh-so-many false-negatives for non-hyphenated phrase-queries applied to over-hyphenated text... (The suggestion was based on what google does.)

Regarding flexibility, we could make htsearch treat words separated by "invalid" punctuation (but no spaces) as a phrase, and make the default valid_punctuation empty. That way people who want the current functionality can have it (except queries where words are not separated by spaces but *should* match those words separately?) but the default would be less buggy for phrase searches.

> For some people, punctuation has meaning. Let's say we have part
> numbers or dates. "3/24/03" isn't really the same as "32403" and
> I'm not sure the phrase search works well either.

Ah, yes. All three would be too short to be indexed... But isn't that what extra_word_characters is for?

> > 4. Does anybody know what the existing external parsers do about
> > words less than the minimum length?
> I don't think most external parsers bother with the config file.
|
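A purely illustrative sketch of that unsigned-location scheme: body words count up from 0 and meta words count forward from 2^(n-1). The 24-bit width and these helper names are assumptions made for the example, not ht://Dig code:

// Illustration only: split an n-bit unsigned location space so META words
// start at 2^(n-1) and ordinary body words start at 0, keeping phrase
// matching a simple "location + 1" test on both sides of the boundary.
#include <cstdint>

const int           kLocationBits = 24;                         // assumed width
const std::uint32_t kMetaBase     = 1u << (kLocationBits - 1);  // 2^(n-1)

inline std::uint32_t bodyLocation(std::uint32_t pos) { return pos; }
inline std::uint32_t metaLocation(std::uint32_t pos) { return kMetaBase + pos; }
inline bool isMetaLocation(std::uint32_t loc)        { return loc >= kMetaBase; }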
|
From: Geoff H. <ghu...@ws...> - 2003-02-28 17:59:50
|
> 1. Why do the documentation for external_parser and the comments
> before Retriever::got_word both say that the word location must be
> in the range 0-1000?

That's a 3.1-ism. The documentation is wrong. Oops.

> first word of any *other* entry. Could we add "meta" information at
> successive locations starting from, say, location 10,000?

Actually, now that I think about it, a better idea is to use negative word locations for META information. This would leave "0" empty and make it impossible to match across the boundary, but fix phrase searching for META words. As for some other arbitrary number--we might actually have documents that long (esp. with PDF indexing).

> 3. With phrase searching, do we still need valid_punctuation? For
> example, "post-doctoral" currently gets entered as three words at the
> *same* location: "post", "doctoral" and "postdoctoral". Would it be
> better to convert queries for post-doctoral into the phrase "post

This is a strange example. What if I had a hyphenated word? I don't know that your "phrase conversion" is the best solution. What we do need is a flexible "word parser" that addresses some of these issues. After all, Unicode raises even more problems about "what is a word."

> "the non-smoker" will match "the smoker", since all the words are
> given the same position in the database, but a search for "the non
> smoker" won't match "the non-smoker". This also reduces the size of

For some people, punctuation has meaning. Let's say we have part numbers or dates. "3/24/03" isn't really the same as "32403" and I'm not sure the phrase search works well either.

Yes, reducing the database size would improve speed. Perhaps Gilles can comment on the motivations for the compound-word additions. (I'm having a hard time pulling them up in my mail archive or on the web.)

> 4. Does anybody know what the existing external parsers do about words
> less than the minimum length? Because they are passed the

I don't think most external parsers bother with the config file. Remember that any word should go through HtWordList and this should throw out words that are too long, too short, in the bad_words list, etc.

-Geoff
|
|
From: Lachlan A. <lh...@us...> - 2003-02-28 12:23:28
|
Greetings all,

I'm checking through phrase searching, and have found several possible bugs. First, some questions...

1. Why do the documentation for external_parser and the comments before Retriever::got_word both say that the word location must be in the range 0-1000? The HTML parser doesn't stick to that. If locations are just scaled down (rather than reduced modulo 1001), that will break the phrase searches. Is there a maximum in practice?

2. Every "meta" data entry (<title>, <meta ...> etc.) gets added as if it starts at location 0. This gives *heaps* of false-positives, because the second word of *any* entry is deemed adjacent to the first word of any *other* entry. Could we add "meta" information at successive locations starting from, say, location 10,000?

3. With phrase searching, do we still need valid_punctuation? For example, "post-doctoral" currently gets entered as three words at the *same* location: "post", "doctoral" and "postdoctoral". Would it be better to convert queries for post-doctoral into the phrase "post doctoral" in queries, and simply the words post and doctoral at successive locations in the database? As it stands, a search for "the non-smoker" will match "the smoker", since all the words are given the same position in the database, but a search for "the non smoker" won't match "the non-smoker". This also reduces the size of the database (marginally in most cases, but significantly for pathological documents). Now that there is phrase searching, is there any benefit of the current approach?

4. Does anybody know what the existing external parsers do about words less than the minimum length? Because they are passed the configuration file, they *could* omit them. Currently the HTML parser omits them, but that introduces false-positives into phrase queries, and I want to fix that.

Thanks!
Lachlan
|
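To make question 3 concrete, a toy sketch of the two indexing schemes being compared for "post-doctoral"; the function names are invented for illustration and this is not the htdig parser:

// Toy comparison of the two schemes discussed in question 3.
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Posting;   // (word, location)

// Current behaviour: three entries, all at the same location, so
// "the non-smoker" also matches "the smoker".
std::vector<Posting> currentScheme(int loc) {
    std::vector<Posting> p;
    p.push_back(Posting("post", loc));
    p.push_back(Posting("doctoral", loc));
    p.push_back(Posting("postdoctoral", loc));
    return p;
}

// Proposed behaviour: two words at successive locations, so the phrase
// "post doctoral" matches and the false positives above disappear.
std::vector<Posting> proposedScheme(int loc) {
    std::vector<Posting> p;
    p.push_back(Posting("post", loc));
    p.push_back(Posting("doctoral", loc + 1));
    return p;
}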
|
From: Jim C. <li...@yg...> - 2003-02-28 05:08:19
|
Hi - I can't currently duplicate this result; however, make check is pretty much a disaster on both systems that I have tried (Red Hat 8.0 and OS X). On the Red Hat box, I receive all sorts of error messages regarding modules and about half of the tests end up failing. Perhaps this is due to the fact that I am running a 2.0.x version of Apache? I don't have time to dig into it at the moment. Under OS X, I can't even build all of the test programs; WordType::instance shows up as an undefined symbol when linking testnet.

Jim

On Thursday, February 27, 2003, at 05:43 AM, Lachlan Andrew wrote:
> Greetings all,
>
> I've found a (much) smaller data set that gives me errors.
> What do other people get with:
> ./configure --enable-tests
> make
> make check
> make check
>
> I get t_htmerge failing the second time, with a read error on page 3
> (if I remove the redirection of stderr).
>
> Cheers,
> Lachlan
|
|
From: Jim C. <li...@yg...> - 2003-02-28 00:23:58
|
Hi - Just a follow up on the issue of zlib version. I installed the 1.1.4 version of zlib available via Fink and rebuilt everything. Using the newer version of zlib, I encounter the same problem (i.e. a segfault from htdig with a very deep stack trace). I did perform a distclean and verified the use of the 1.1.4 version libz via otool.

Jim

On Thursday, February 27, 2003, at 04:41 AM, Lachlan Andrew wrote:
> On Thursday 27 February 2003 11:16, Jim Cole wrote:
>
>> The backtrace is attached. The problem does not occur if I turn off
>> compression.
>
> Thanks. My guess is that (part of) the reason for the very deep
> recursion is that it's trying to allocate a block of len=8247
> bytes, when the page size is only 8192:
> #3244 0x00070958 in CDB___memp_alloc (dbmp=0xa98c30, memreg=0xa99f60,
> mfp=0xc84e98, len=8247, offsetp=0x0, retp=0xbfffd900) at
> mp_alloc.c:88
>
> I used to get the error
> Unable to allocate %lu bytes from mpool shared region
> at some stage too, which is generated inside CDB___memp_alloc. From
> memory, that was when I was using 1.1.3.
>
> If that is really the problem, it can be fixed by testing explicitly
> whether len>pagesize (if the pagesize is available somewhere...).
|
|
From: Neal R. <ne...@ri...> - 2003-02-27 19:12:47
|
Great! I'll test and fix this ASAP.

On Thu, 27 Feb 2003, Lachlan Andrew wrote:
> Greetings all,
>
> I've found a (much) smaller data set that gives me errors.
> What do other people get with:
> ./configure --enable-tests
> make
> make check
> make check
>
> I get t_htmerge failing the second time, with a read error on page 3
> (if I remove the redirection of stderr).
>
> Cheers,
> Lachlan

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Lachlan A. <lh...@us...> - 2003-02-27 12:44:24
|
Greetings all,
I've found a (much) smaller data set that gives me errors.
What do other people get with:
./configure --enable-tests
make
make check
make check
I get t_htmerge failing the second time, with a read error on page 3
(if I remove the redirection of stderr).
Cheers,
Lachlan
|
|
From: Lachlan A. <lh...@us...> - 2003-02-27 12:30:34
|
Greetings Wim,

I notice that your run script is only redirecting standard output. Are there any extra errors being reported to the screen? (You might like to put a 2>&1 before 'tee' to redirect standard error too.)

I'm Cc'ing this to htdig-dev, because I'm stuck!! Sorry :(
Lachlan

---------- Forwarded Message ----------

Subject: Re: how to get/create a db.wordlist for htdig 3.1.6
Date: Thu, 27 Feb 2003 11:46:18 +0100
From: "Wim Alsemgeest" <wal...@ho...>
To: lh...@us...

Hi again Lachlan,

Well I downloaded the new version of htdig and compiled it as you asked:
./configure --enable-tests
make
make install
make check
This did give errors, because htdig could not find apache and things like that. Then I configured and did a make by my script "run". Again I found errors with make check.

I added the following files to this email. I hope it will help you to find the reason of my problem.
configure.out
make.out
make-install.out
make-check.out
configure.log
run

Greetings Wim

From: Lachlan Andrew <lh...@us...>
>Reply-To: lh...@us...
>To: "Wim Alsemgeest" <wal...@ho...>
>Subject: Re: how to get/create a db.wordlist for htdig 3.1.6
>Date: Wed, 26 Feb 2003 23:57:38 +1100
>
>Greetings Wim,
>
>"Snapshot" is not a program. The snapshots are just the latest
>development code, released every week. When new features have just
>been added they can be buggy, but at the moment we are about to
>release a new beta, and so the snapshots are fairly reliable.
>
>You can download the latest from
><http://www.htdig.org/files/snapshots/htdig-3.2.0b4-20030223.tar.gz>
>and unpack it with the standard
> tar -zxf htdig-3.2.0b4-20030223.tar.gz
> cd htdig-3.2.0b4-20030223
> ./configure --enable-tests
> make
> make install
> make check
>sequence.
>
>I hope this helps,
>Lachlan
>
>On Wednesday 26 February 2003 17:34, you wrote:
> > Then I searched my system for the snapshot software you
> > mentioned, but could not find it. What is the purpose of
> > 'snapshot'? Where do I get a version for solaris-9?
|
|
From: Lachlan A. <lh...@us...> - 2003-02-27 11:41:40
|
On Thursday 27 February 2003 11:16, Jim Cole wrote:
> The backtrace is attached. The problem does not occur if I turn off
> compression.
Thanks. My guess is that (part of) the reason for the very deep
recursion is that it's trying to allocate a block of len=8247
bytes, when the page size is only 8192:
#3244 0x00070958 in CDB___memp_alloc (dbmp=0xa98c30, memreg=0xa99f60,
mfp=0xc84e98, len=8247, offsetp=0x0, retp=0xbfffd900) at
mp_alloc.c:88
I used to get the error
Unable to allocate %lu bytes from mpool shared region
at some stage too, which is generated inside CDB___memp_alloc. From
memory, that was when I was using 1.1.3.
If that is really the problem, it can be fixed by testing explicitly
whether len>pagesize (if the pagesize is available somewhere...).
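A minimal sketch of that explicit test; len and pagesize stand for the values visible inside CDB___memp_alloc, and the error handling here is an assumption rather than the real mifluz code:

/* Refuse an allocation that can never fit in a single mpool page instead of
 * recursing; illustrative only, not the actual mp_alloc.c change. */
#include <errno.h>
#include <stdio.h>

int check_fits_in_page(unsigned long len, unsigned long pagesize)
{
    if (len > pagesize) {
        fprintf(stderr,
            "CDB___memp_alloc: request of %lu bytes exceeds page size %lu\n",
            len, pagesize);
        return ENOMEM;      /* caller should fail rather than keep recursing */
    }
    return 0;
}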
|
|
From: Jim C. <li...@yg...> - 2003-02-27 00:16:53
|
On Tuesday, February 25, 2003, at 05:55 AM, Lachlan Andrew wrote:
> The OS X crash is probably a good thing to look at, if it occurs early
> in the dig. (a) Could you please post (or mail me) the complete
> backtrace? (b) Does it still occur without compression? (c) Are you
> using zlib-1.1.4? (I have had core dumps with earlier versions, but
> not since upgrading.)

The backtrace is attached. The problem does not occur if I turn off compression; by turning off compression, I mean that I changed your provided config file so that wordlist_compress and wordlist_compress_zlib are false and compression_level is commented out. OS X is still using 1.1.3; according to Apple, the vulnerability that drove the move to 1.1.4 did not affect their system. When I get a chance, I will try rebuilding everything against the 1.1.4 zlib available via fink and see if that makes any difference.

Jim
|
|
From: Neal R. <ne...@ri...> - 2003-02-27 00:15:52
|
FYI: What the Hell is Overture up to???

Feb 18: "Overture announced that they bought AltaVista today for $140M in cash and stock"
http://news.com.com/2100-1023-984968.html?tag=fd_top

Feb 25: "Hot off the heels of buying Altavista, Overture today announced it would buy Fast Search. Fast Search, a Norwegian company which manages AllTheWeb.com, will get $70 million in cash with up to $30 million in performance bonuses over the next three years. The deal is expected to close by April."
http://rss.com.com/2100-1023-985850.html?type=pt&part=rss&tag=feed&subj=news

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Jim C. <li...@yg...> - 2003-02-27 00:09:32
|
Hi - The problem does not occur when running only htdig with -i

Jim

On Tuesday, February 25, 2003, at 04:30 PM, Neal Richter wrote:
> Jim,
> Does the error happen when you run htdig -i twice (NOT using
> rundig)?
>
> Thanks.
>
> On Mon, 24 Feb 2003, Jim Cole wrote:
>> Hi - I was able to repeat the problem again. The second time around I
>> made a point of catching the page numbers. They were the same as those
>> listed in your log file.
>>
>> Jim
>>
>> On Sunday, February 23, 2003, at 06:21 AM, Lachlan Andrew wrote:
>>> OK, now try this on for size...
>>>
>>> If I run the attached rundig script, with -v and the attached
>>> .conf script on the attached directory (51 copies of the attached
>>> file hash) with an empty .../var/htdig-crash1 directory, then all
>>> is well. However, if I run it a *second* time, it gives the attached
>>> log file.
>>>
>>> This is odd since the script uses -i which is supposed to ignore the
>>> contents of the directory. (On another note, should -i also ignore
>>> the db.log file? It currently doesn't.)
>>>
>>> Neal, can you (or anyone else) replicate this behaviour?
>>>
>>> Thanks!
>>> Lachlan
>>>
>>> On Sunday 23 February 2003 16:50, Lachlan Andrew wrote:
>>>> Whoops! I didn't make clean after installing the new libraries.
>>>> Now that I have, I haven't been able to reproduce the
>>>> problem.
>>> <rundig><valid_punct.conf><directory><hash><log.first-200-lines>

> Neal Richter
> Knowledgebase Developer
> RightNow Technologies, Inc.
> Customer Service for Every Web Site
> Office: 406-522-1485
|
|
From: Lachlan A. <lh...@us...> - 2003-02-26 22:28:26
|
Greetings,

What version of ht://Dig are you using? Try running htdig (or rundig) with -v (or -vv) and seeing what it tells you.

Cheers,
Lachlan

On Thursday 27 February 2003 03:14, Joerg Frenzel wrote:
> I have installed htdig, doc2html and catdoc.
> The test is running fine. But if I want to find doc-files on my
> search.html I got no results.
|
|
From: Lachlan A. <lh...@us...> - 2003-02-26 22:21:19
|
On Thursday 27 February 2003 04:36, Neal Richter wrote:
> On Wed, 26 Feb 2003, Lachlan Andrew wrote:
> > 1) The -i option doesn't remove the _weakcmpr file.
> > 2) I've just run htdig on an existing database *without* -i
> > and it also complained about weakcmpr problems.
> > (I've forgotten whether I ran htpurge after the first run,
> > so I'm running it again without it.)
>
> #1 is easy to fix.

Yes. While we're at it, we should remove db.log (="url_log"). I was just thinking it might give you/us some insight into the cause of the problem.

For #2, I have run htdig again without -i and without having purged the database, but after 'touch'ing each html file. It complains:
WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 40435
WordDB: PANIC: Input/output error
Whenever this appears, it appears twice.

> #3
> What is htpurge being run for???? Isn't it used to remove
> entries from the index? I know that htpurge is called immediately
> after htdig in rundig... my question is WHY???!!!

Entries are created for all of the pages referred to during the dig, even if they don't exist. Purging gets rid of these useless entries.

> How are you guys using it?

../bin/htpurge -v -c <file>.conf

> An interesting test would be to establish two test datasets that
> are exact duplicates of each other at different URLs on your
> server.
>
> %htdig -i URL1
> %htdig -i URL2
>
> This would access, expand and rewrite nearly every page in the
> WordDB. If there are problems rewriting/expanding pages, they may
> show up.

If -i works, the database should be erased before being accessed in the second dig, shouldn't it?

Regards,
Lachlan
|
|
From: Neal R. <ne...@ri...> - 2003-02-26 17:35:04
|
On Wed, 26 Feb 2003, Lachlan Andrew wrote:
> Greetings all,
>
> Just for the record:
> 1) The -i option doesn't remove the _weakcmpr file.
> Neal, what effect will that have?
> 2) I've just run htdig on an existing database *without* -i and
> it also complained about weakcmpr problems.
> (I've forgotten whether I ran htpurge after the first run, so
> I'm running it again without it.)
> 3) There is still a (different) problem with pagesize 32k. The
> htdig ran OK, but the second htpurge complained near the end.
#1 is easy to fix.
Note that there is no word_db_weakcmp config variable....
Changes near htdig.cc:279
const String word_filename = config->Find("word_db");
String word_weakcmp_filename = word_filename;       // non-const: we append below
word_weakcmp_filename.append("_weakcmpr");
if (initial)
{
    unlink(word_filename);                           // remove the word database
    unlink(word_weakcmp_filename);                   // ...and its _weakcmpr side file
}
#3
What is htpurge being run for???? Isn't it used to remove entries from
the index? I know that htpurge is called immediately after htdig in
rundig... my question is WHY???!!!
How are you guys using it?
What happens when you try and use it to remove URLs from the index,
and try to add more URLs after purging??
An interesting test would be to establish two test datasets that are
exact duplicates of each other at different URLs on your server.
%htdig -i URL1
%htdig -i URL2
This would access, expand and rewrite nearly every page in the WordDB.
If there are problems rewriting/expanding pages, they may show up.
Thanks!
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Joerg F. <jo...@ne...> - 2003-02-26 15:54:44
|
Hi,

I have problems in finding results for my search items. I have installed htdig, doc2html and catdoc. Until yesterday I had problems in running catdoc correctly. I solved this problem and with the following input I tested it:

Under htdig/bin you find the program catdoc:
./catdoc ../examples/Die_Technik.doc
My example is stored under htdig/examples.

The test is running fine. But if I want to find doc-files on my search.html I got no results.

I hope you can help me. I have no idea where the error is located.

Best regards
Jörg Frenzel

Please mail to: jo...@ne...
|