This project started as a response to a request at Access To Insight dot Org for a search engine that could run from a CD-ROM.
I've uploaded a working test. I generated the index by running the simple Apache Lucene demo on the 'majjhima' directory. To run this project on another set of files, run Apache Lucene's demo on any directory you like and replace 'majjhima' with that directory. If you need to modify the code, you will need to package it as a JAR and sign the JAR before the applet will be granted permission to read from and write to (a temp file on) the local file system.
At the moment, I am unable to continue development work (I'm typing this from an Internet cafe in Copenhagen). However, the next steps will be to make the results page more interesting (with a summary, title, etc.) as well as to write a custom indexer.
> It is Java-based, but I don't think it is free,
> at least I couldn't
> find it outside the CD-ROM.
I wonder if the code is open (to look at).
I've found O'Reilly a very friendly company
to work with. They are not as protectionist
as many others (and of course, that's why
I use and pay for their products).
> In any case, if we would like a web server
> and servlet container, we could try:
This is a common container. I've indirectly used
it with a number of products (JBoss and Cocoon
come to mind). When I first talked with John, I
was considering integrating a web browser with
a web server. I'm surprised something like this
doesn't exist. Many of the Mozilla modules are
written with C and Java equivalents. There is a
Swing JEditorPane which handles basic HTML.
However interesting, I think that's beyond
the scope of this project.
> What would be the benefits of running a webserver
> versus the applet?
It would be possible to run the exact same
code on the server (ATI) as on the CDROM. We
would have much more control of the output
(we can generate HTML dynamically which is
supported by ALL browsers). On the other hand,
I think it is over the heads of most users,
and introduces a number of other concerns
(security, port conflicts, etc.).
> Do we need to perform more complex stuff
> than what it is available via the applet?
Not really, but I'll give you an example of
the difficulties. Let me explain a bit how
it works.
There are two components: The Indexer and
the Searcher. The Indexer need only be run
once (on John's machine, perhaps). This
generates the index which allows very fast
searches at a later time.
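
To make the index-once, search-many split concrete, here is a toy inverted index in plain Java. It is only an illustration of the idea and is NOT Lucene or the project's code; the class name `ToyIndex` and the document names used with it are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only: this is NOT Lucene and not the project's code.
final class ToyIndex {
    // term -> sorted set of the document names that contain the term
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // "Indexer" step: run once per document, ahead of time.
    void add(String docName, String text) {
        // Split on anything that is not a Unicode letter or digit.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // "Searcher" step: a single-term lookup that never rescans the documents.
    List<String> search(String term) {
        TreeSet<String> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs);
    }
}
```

The expensive work (reading and tokenizing every file) happens in `add`; `search` is just a map lookup, which is why the index only has to be built once.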
The Searcher is loaded into the browser as
an applet. It reads the index (directory)
from the local media and performs the query
from the user input against the index. We
then generate a temp HTML file on the local
harddrive containing the search hits. We
then tell the browser to open the temp file
in the active window.
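
A minimal sketch of the temp-file step, assuming the hits are already available as a list of relative file paths. The class name `HitsPage`, its methods, and the page layout are hypothetical, not the applet's actual code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: renders search hits as HTML and writes them to a temp file.
final class HitsPage {
    // Escape the characters that matter inside HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build the whole page in memory: a hit count followed by one link per hit.
    static String render(String query, List<String> hitPaths) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><meta charset=\"UTF-8\">")
            .append("<title>Search results</title></head><body>\n");
        html.append("<p>").append(hitPaths.size()).append(" hit(s) for \"")
            .append(escape(query)).append("\"</p>\n<ul>\n");
        for (String path : hitPaths) {
            String p = escape(path);
            html.append("<li><a href=\"").append(p).append("\">")
                .append(p).append("</a></li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }

    // Write the page to a new temp file and return its location.
    static Path writeTemp(String query, List<String> hitPaths) throws IOException {
        Path file = Files.createTempFile("hits", ".html");
        Files.write(file, render(query, hitPaths).getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

In an applet, the returned path could then be turned into a URL and handed to `AppletContext.showDocument`, which is the standard way for an applet to ask the browser to load a page. Note that relative links in a temp file resolve against the temp directory, so a real version would probably need absolute paths to the documents.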
The applet must be signed for all of that
to work (read/write permissions). The
application can simply bypass all the
security restrictions, but because it is
not running in a browser window, it has to
jump through hoops to open the temp HTML
file in the default system browser.
All of that works (more or less). However,
it's the trouble at that level that keeps me
from supporting a few browser/platform
combinations. IE/Mac comes to mind.
You'll notice that for very obvious
queries, the HTML file is quite large
and takes a bit of time to generate and
load in a browser. This isn't necessary at
all. However, the alternatives are interesting.
(1) Generate multiple pages with 10-20 hits
per page. (2) Generate one page with 10-20
hits and embed the applet in that page, add
(next, previous) buttons in the applet and
requery and retrieve the appropriate hits. I'm sure
you could come up with others.
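
Either alternative comes down to slicing the full hit list into pages. A rough sketch of that arithmetic, with a hypothetical `HitPager` class (the name and signatures are assumptions, not existing project code):

```java
import java.util.List;

// Hypothetical helper for splitting a hit list into fixed-size pages.
final class HitPager {
    // Number of pages needed to show totalHits at pageSize hits per page.
    static int pageCount(int totalHits, int pageSize) {
        if (pageSize <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and totalHits >= 0");
        }
        return (int) (((long) totalHits + pageSize - 1) / pageSize);
    }

    // Hits for the zero-based pageIndex; empty if the index is past the end.
    static <T> List<T> page(List<T> hits, int pageIndex, int pageSize) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageIndex >= 0");
        }
        long start = (long) pageIndex * pageSize; // long avoids int overflow
        int from = (int) Math.min(start, hits.size());
        int to = (int) Math.min(start + pageSize, hits.size());
        return hits.subList(from, to); // a view onto the original list
    }
}
```

For option (1) each page would be written to its own file; for option (2) the next/previous buttons would simply change `pageIndex` and re-render.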
Of course, there are a bunch of cosmetic features
that I'm not ready to care about yet (feel free
to implement them), such as searching on the
"return key" instead of only after a button
click. This list is
While not (yet) required by ATI, I'd like to get
this Unicode compliant.
solutions, but I've come to see it as a very
powerful language. It's just annoying because
there is no type checking and there are no good
However, I have been considering embedding a
hidden version of Vicaya in a web page and
accessing the Lucene classes via vanilla HTML
The hits page can be generated completely
dynamically, without temp files and all the
features we require. It's four years since I
last decided LiveConnect was a dead end.
Perhaps it's matured since then.
This adds a compatibility issue, but perhaps it will
work better than the Java applet alone.
> It works fine on Fedora Core
> (formerly Red Hat Linux) and
> Firefox, with the Java plug-in from Sun.
I'm very happy to hear that. We'll add the
good news to the supported
> I noticed that Vicaya is not in CVS,
> what are your plans
> regarding this?
I am a big fan of CVS. However, as of today,
this is a one-man development show. Perhaps
this is chicken-and-egg. I think the
lack of a web page keeps developers away,
but maybe CVS might help attract them. Before
CVS though, is the web page.
I will split the current discussion into different topics.
> This adds a compatibility issue, but perhaps it will
> work better than the Java applet alone.
I would put compatibility as one of the top priorities; we want this CD-ROM to be usable by anybody and as self-contained as possible.
> I think the
> lack of a web page keeps developers away,
> but maybe CVS might help attract them. Before
> CVS though, is the web page.
I will try to get a web page ready.
Even if it is a one-man show, I would recommend that you upload it to CVS. You don't need to grant write permissions to anybody, and having it in CVS will allow other people to generate patches against it and send them to you.
Plus, if you get hit by the proverbial truck, the code will be there for somebody else to take. :-)