From: Neal R. <ne...@ri...> - 2005-10-18 17:23:45
I've been lax in checking in myself. Anthony Arnone and I have started work on HtDig 4.0.

Here is a blog that Anthony has been keeping on HtDig 4.0 development:
http://htdig.blogspot.com/

There is a new branch in CVS:
http://cvs.sourceforge.net/viewcvs.py/htdig/htdig/?only_with_tag=htdig_4_0

This is an older design document. I'll get an updated one put on the blog ASAP:
http://opensource.rightnow.com/htdig4_refactor_design.pdf

Basically, the idea is to rip out the existing word-index and searching code and replace it with CLucene while preserving as much of htdig's configurability as possible. The function of the spider will be nearly unchanged. The db.doc.index will still exist, but that's the only thing Berkeley DB will be used for. I've removed the hacked version of BDB in 4.0 CVS.

What do we do about 3.2? My vote is to call it 'final', update the website, and move forward. I could do this, and I have posted this thought in the past, but no consensus emerged and I have no desire to be heavy-handed.

After having looked at many commercial implementations of search engines over the past few years, and having followed Nutch a bit, I am still convinced that HtDig has plenty of legs.

3.2 has become a roadblock to progress. We know it has issues, and various people have made valiant efforts to address them. From what I have seen working with the 'general' list, plenty of users try moving to 3.2 and then move back to 3.1.6. On the other hand, people like Christopher Murtagh and myself have used it as a cog in a larger application.

My thought process for 4.0 is to get the htdig developers to concentrate on building an application for web servers rather than trying to do it all and also maintain the inverted index code. The Lucene community has already cracked that nut.

Maybe this will get development kick-started again, since it's 100% obvious that none of us is interested in furthering the current 3.2 code, for whatever reason.

Thanks.

On Sat, 15 Oct 2005, Gustave Stresen-Reuter wrote:
> It's been pretty quiet on the list lately. Is the party over?
>
> Ted

-- 
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485