From: Lachlan Andrew <lha@us...> - 2004-05-08 12:42:46
It's really good to hear from you! Good luck with the PhD -- you are
a brave man doing that part time!
Using Clucene is a very good idea. (I'm Cc'ing this to the clucene
developers' list, to hear their opinions.) We had been talking about
doing that eventually anyway. We're already using another project's
'guts' by using Mifluz/BDB, so I don't see any ideological problem
there -- it's just a matter of scale. In fact, I'd be perfectly
happy to be subsumed by Clucene, as long as we keep backward
compatibility with ht://Dig. As you say, the role of ht://Dig
(spidering and user interface) is complementary to that of Clucene.
The big problem is the amount of work, but all of the options are a
lot of work. I can really only afford to spend a couple of hours a
week on ht://Dig. To be viable, I think we need at least four times
that (not including support for 3.1.x and 3.2, or developers adding
new features). The fact that only two people have so far responded
to my mail reflects our dire straits...
On Sun, 2 May 2004 05:26 pm, Neal Richter wrote:
> There is another alternative to either flushing out the
> inefficient cruft in 3.2.0 or backporting to 3.1.6.
> We could look at integrating with Clucene.
> It's worth considering... but would be a lot of work. We would
> have to carefully examine which htdig configs we could still
> The advantage is that CLucene is under active development by
> experienced search-engine people, I believe one of the participants
> is an original Altavista developer. It's a fairly small code
> base, and it's LGPL.
> The disadvantages are that at the moment there is no DB
> compression, it's not an enduser application (where HtDig is), and
> it will be a lot of work.
> Would we all be satisfied if we used a different project's 'guts'?
> For that matter we could look at moving our spidering code to use a
> different library.
ht://Dig developer DownUnder (http://www.htdig.org)
On Sat, 8 May 2004, Lachlan Andrew wrote:
> there -- it's just a matter of scale. In fact, I'd be perfectly
> happy to be subsumed by Clucene, as long as we keep backward
> compatibility with ht://Dig. As you say, the role of ht://Dig
> (spidering and user interface) is complementary to that of Clucene.
I believe the CLucene developers are more focused on providing a
indexing/searching library that others can build an application with.
And given everyone's time constraints, it makes more sense long-term to
throw in with them and participate.
> The big problem is the amount of work, but all of the options are a
> lot of work. I can really only afford to spend a couple of hours a
> week on ht://Dig. To be viable, I think we need at least four times
It will be no trivial amount of work to convert htdig to use CLucene as
the index/search guts.... unless we make the decision that all
configuration options related to the current search/index code are
eligible for removal... we'll keep the ones we can, change the
definitions of some and others will die.
If we try and be fully backwards compatible with the current subset of
search/index configs... we'll be asking for trouble.
I'm subscribed to the CLucene list... it's quite active and motivated...
and they are working on UTF-8 support... which is motivation enough to
consider the switch.
I do think we should consider releasing 3.2.0 after a bit of
RightNow Technologies, Inc.
Customer Service for Every Web Site