RE: [htdig-dev] Bayesian Algorithm Part II


On Wed, 17 Oct 2001, Quim Sanmarti wrote:

> > That's when I began to wonder if anybody in the htdig
> > development community had looked into implementing 'bayesian'
> > searching, or if htdig could do 'it', hence my vague post.
> 
> The htdig databases are positively not prepared to deal with such
> techniques, IMHO. They are not intended to. The power of htdig is based on
> 'classical' boolean queries.

I don't know that I'd call them "classical" anymore, but I agree that
I'd design a different word database backend if I wanted to do Bayesian
queries. But I think you can get pretty high quality results without
them--AFAIK, Google doesn't use them, and most people see Google as the
target.
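To make the backend point concrete: a boolean index only has to record which documents contain a word, while a probabilistic ranker also needs per-document term counts, document lengths and collection-wide term frequencies. Below is a minimal Python sketch, assuming "Bayesian" is read as a query-likelihood model with a Dirichlet prior (one common Bayesian-flavoured approach; it is not necessarily what the commercial packages do, and it is not anything in ht://Dig). The function names and the mu value are illustrative only.

```python
import math
from collections import Counter

def build_index(docs):
    """Build per-document term counts plus collection-wide counts.

    A boolean index only needs the set of doc ids per word; the
    probabilistic score below also needs term frequencies and lengths.
    """
    doc_tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    coll_tf = Counter()
    for tf in doc_tf.values():
        coll_tf.update(tf)
    coll_len = sum(coll_tf.values())
    return doc_tf, coll_tf, coll_len

def boolean_and(doc_tf, query):
    """Classic boolean AND: a document either matches every term or it doesn't."""
    terms = query.lower().split()
    return {d for d, tf in doc_tf.items() if all(t in tf for t in terms)}

def dirichlet_score(doc_tf, coll_tf, coll_len, doc_id, query, mu=10.0):
    """log P(query | doc) using a Dirichlet-prior smoothed language model."""
    tf = doc_tf[doc_id]
    doc_len = sum(tf.values())
    score = 0.0
    for t in query.lower().split():
        p_coll = coll_tf[t] / coll_len
        if p_coll == 0:
            continue  # term absent from the whole collection: no evidence either way
        # Posterior-mean estimate: observed counts plus mu "pseudo-counts"
        # spread according to the collection-wide distribution.
        score += math.log((tf[t] + mu * p_coll) / (doc_len + mu))
    return score

def rank(docs, query, mu=10.0):
    """Return every doc id, best match first (no hard yes/no cutoff)."""
    doc_tf, coll_tf, coll_len = build_index(docs)
    scores = {d: dirichlet_score(doc_tf, coll_tf, coll_len, d, query, mu) for d in docs}
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

With docs = {"a": "bayesian search engine", "b": "boolean search", "c": "cooking recipes"}, boolean_and(..., "bayesian search") returns only {"a"}, while rank(docs, "bayesian search") orders all three as a, b, c. The extra statistics it reads are exactly what a boolean-only word database would not need to store.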

> > My theory was that most new market trends (worth paying
> > attention to) are usually already, or quickly will be, reflected in the open
> > source development community.
> My perception is, if anything, the opposite. Open source has traditionally
> been bound to research and innovation. It's now being used by companies as
> an innovation channel, so market trends emerge from there later...

This depends a lot on the development effort. Certainly gcc has some true
innovation and is a great example of getting truly fantastic people
together--I doubt you could ever afford to pay for all the development on
gcc.

In this case, I think parts of ht://Dig could be used for research
purposes and I think there are several research-grade algorithms that
could be implemented without too much effort (n-gram fuzzy algorithms come
to mind).
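As a rough illustration of the kind of n-gram fuzzy matching mentioned above, here is a minimal Python sketch that compares words by their character trigrams and ranks vocabulary words by Dice similarity. The padding convention, the 0.5 threshold and all function names are assumptions chosen for illustration; this is not an existing ht://Dig fuzzy algorithm.

```python
def trigrams(word):
    """Character trigrams, padded so word starts and ends count as features."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def dice(a, b):
    """Dice coefficient over trigram sets: 1.0 = identical, 0.0 = nothing shared."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def fuzzy_lookup(query, vocabulary, threshold=0.5):
    """Vocabulary words similar to query, best first (ties broken alphabetically)."""
    scored = [(dice(query, w), w) for w in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for s, w in scored if s >= threshold]
```

For example, fuzzy_lookup("serch", ["search", "server", "church"]) keeps only "search" (similarity 8/13), since "server" (6/13) and "church" (4/13) fall below the threshold. In a real indexer the trigram sets would be precomputed and stored in their own index rather than recomputed per lookup.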

On the other hand, the number of active contributors to the project right
now is extremely low and so I think we'd need an infusion of "fresh
brains" before this could happen.

As Quim pointed out as well, there are other packages which attempted to
tackle Bayesian searches.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/