From: Neal R. <ne...@ri...> - 2005-10-20 16:00:34
An updated design doc is here:
http://opensource.rightnow.com/htdig4_refactor_design.pdf

Thanks

On Tue, 18 Oct 2005, Neal Richter wrote:

>
> I've been lax in checking-in myself.
>
> Anthony Arnone and I have started work on HtDig 4.0
>
> Here is a blog that Anthony has been keeping on Htdig 4.0 development.
>
> http://htdig.blogspot.com/
>
> There is a new branch in CVS.
>
> http://cvs.sourceforge.net/viewcvs.py/htdig/htdig/?only_with_tag=htdig_4_0
>
> This is an older design document.. I'll get an updated one put on the blog
> ASAP.
> http://opensource.rightnow.com/htdig4_refactor_design.pdf
>
> Basically the idea is to rip out the existing word-index and searching
> code and replace it with CLucene while preserving as much of htdig
> configurability as possible. The function of the spider will be nearly
> unchanged. The db.doc.index will still exist, but that's the only thing
> Berkeley DB will be used for.
>
> I've removed the hacked version of BDB in 4.0 CVS.
>
> What do we do about 3.2? My vote is to call it 'final', update the
> website and move forward. I could do this, and have posted this thought
> in the past.. no consensus emerged and I have no desire to be
> heavy-handed.
>
> After having looked at many commercial implementations of search engines
> over the past few years and following Nutch a bit.. I am still convinced
> that HtDig has plenty of legs.
>
> 3.2 has become a road-block to progress. We know it has issues, and
> various people have made valiant efforts to address them. From working
> with the 'general' list some, plenty of users try moving to 3.2 then move
> back to 3.1.6.
>
> On the other hand, people like Christopher Murtagh and myself have used it
> as a cog in a larger application.
>
> My thought process for 4.0 is to get the htdig developers to concentrate
> on building an application for web-servers rather than trying to do it all
> and maintain the inverted index code... the Lucene community has already
> cracked that nut.
>
> Maybe this will get development kick-started again, since it's 100%
> obvious that we're all not interested in furthering the current 3.2 code
> for whatever reason.
>
> Thanks.
>
> On Sat, 15 Oct 2005, Gustave Stresen-Reuter wrote:
>
> > It's been pretty quiet on the list lately. Is the party over?
> >
> > Ted
> >

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485