What it's about

  • Alex


    This project started as a response to a request at Access To Insight dot Org for a search engine that could run from a CD-ROM.

    I've uploaded a working test. I generated the index by running Apache Lucene's simple demo on the 'majjhima' directory. To run this project on another set of files, run Apache Lucene's demo on any directory you like and replace 'majjhima' with that directory. If you need to modify the code, you will need to package it as a JAR and sign the JAR before the applet will be granted permission to read from and write (a temp file) to the local file system.

    At the moment, I am unable to continue development work (I'm typing this from an Internet cafe in Copenhagen). However, the next steps will be to make the results page more interesting (with a summary, title, etc.) as well as to write a custom indexer.

    • Alex

      > It is Java-based, but I don't think it is free,
      > at least I couldn't
      > find it outside the CD-ROM.

      I wonder if the code is open (to look at).
      I've found O'Reilly a very friendly company
      to work with. They are not as protectionist
      as many others (and of course, that's why
      I use and pay for their products).

      > In any case, if we would like a web server
      > and servlet container, we could try:
      > http://jetty.mortbay.org/jetty

      This is a common container. I've indirectly used
      it with a number of products (JBoss and Cocoon
      come to mind). When I first talked with John, I
      was considering integrating a web browser with
      a web server. I'm surprised something like this
      doesn't exist. Many of the Mozilla modules are
      written with C and Java equivalents. There is a
      Swing JEditorPane which handles basic HTML.
      However interesting, I think that's beyond
      the scope of this project.

      > What would be the benefits of running a webserver
      > versus the applet?

      It would be possible to run the exact same
      code on the server (ATI) as on the CD-ROM. We
      would have much more control of the output
      (we can generate HTML dynamically which is
      supported by ALL browsers). On the other hand,
      I think it is over the heads of most users,
      and it introduces a number of other concerns
      (security, port conflicts, etc.).

      > Do we need to perform more complex stuff
      > than what it is available via the applet?

      Not really, but I'll give you an example of
      the difficulties. Let me explain a bit how
      Vicaya works.

      There are two components: The Indexer and
      the Searcher. The Indexer need only be run
      once (on John's machine, perhaps). This
      generates the index which allows very fast
      searches at a later time.
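
      As an illustration only, the index-once, search-many idea can be
      sketched with a toy inverted index. This is not Lucene's API and
      not Vicaya's code; the class ToyIndex, its methods, and the file
      names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a real index; this is NOT Lucene's API.
final class ToyIndex {
    // term -> sorted set of document names that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // "Indexer" step: run once per document, ahead of time.
    void add(String docName, String text) {
        // Split on anything that is not a Unicode letter or digit.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue; // a leading separator produces one empty token
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
        }
    }

    // "Searcher" step: one map lookup, with no rescanning of the documents.
    List<String> search(String term) {
        Set<String> docs = postings.get(term.toLowerCase(Locale.ROOT));
        if (docs == null) {
            return Collections.emptyList();
        }
        return new ArrayList<>(docs);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        // Hypothetical file names and contents, for illustration only.
        index.add("doc1.html", "Right view, right effort");
        index.add("doc2.html", "Right speech");
        System.out.println(index.search("right"));
        System.out.println(index.search("effort"));
    }
}
```

      The expensive work (tokenizing every document) happens in add();
      search() only reads the prebuilt map, which is why the index can
      be shipped on read-only media.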

      The Searcher is loaded into the browser as
      an applet. It reads the index (directory)
      from the local media and performs the query
      from the user input against the index. We
      then generate a temp HTML file on the local
      hard drive containing the search hits. We
      then tell the browser to open the temp file
      in the active window.
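
      The temp-file step described above could look roughly like the
      following. This is a minimal sketch, assuming the hits are already
      available as a list of document paths; HitPageWriter and its
      methods are hypothetical names, not Vicaya's actual classes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not taken from the Vicaya source.
final class HitPageWriter {
    // Escape the characters that are special in HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build a minimal results page with one link per hit.
    static String render(String query, List<String> hitPaths) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head>")
            .append("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">")
            .append("<title>Results for ").append(escape(query))
            .append("</title></head><body>\n<ul>\n");
        for (String path : hitPaths) {
            String p = escape(path);
            html.append("<li><a href=\"").append(p).append("\">")
                .append(p).append("</a></li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }

    // Write the page to a temp file; the caller then points the browser at it.
    static Path writeTemp(String query, List<String> hitPaths) throws IOException {
        Path file = Files.createTempFile("hits", ".html");
        Files.write(file, render(query, hitPaths).getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical query and hit paths, for illustration only.
        Path page = writeTemp("dukkha",
                Arrays.asList("majjhima/doc1.html", "majjhima/doc2.html"));
        System.out.println(page.toUri());
    }
}
```

      Because the temp file lives outside the CD-ROM tree, real links
      would need to be absolute URLs rather than the relative paths used
      here. In an applet, the resulting file URL would presumably be
      handed to the browser through something like
      java.applet.AppletContext.showDocument(URL); that step is left out
      because it needs a live browser.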

      The applet must be signed for all of that
      to work (read/write permissions). The
      application can simply bypass all the
      security restrictions, but because it is
      not running in a browser window, it has to
      jump through hoops to open the temp HTML
      file in the default system browser.

      All of that works (more or less). However,
      it is this low-level plumbing that keeps me from
      supporting a few browser/platform combinations.
      IE/Mac comes to mind.

      You'll notice that for very broad
      queries, the HTML file is quite large
      and takes a bit of time to generate and
      load in a browser. This isn't necessary at
      all. However, the alternatives are interesting.
      (1) Generate multiple pages with 10-20 hits
      per page. (2) Generate one page with 10-20
      hits and embed the applet in that page, add
      (next, previous) buttons in the applet and
      requery and retrieve the appropriate hits. I'm sure
      you could come up with others.
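
      Both alternatives rest on the same slicing of the hit list into
      pages. A minimal sketch, assuming the hits are held in a list and
      pages are numbered from zero (HitPager and its methods are
      hypothetical names):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical paging helper; not taken from the Vicaya source.
final class HitPager {
    // Number of pages needed to show totalHits at pageSize hits per page.
    static int pageCount(int totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalHits + pageSize - 1) / pageSize; // ceiling division
    }

    // Hits for a zero-based page index; an out-of-range page yields an empty list.
    static <T> List<T> page(List<T> hits, int pageIndex, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (pageIndex < 0) {
            return Collections.emptyList();
        }
        long from = (long) pageIndex * pageSize; // long avoids int overflow
        if (from >= hits.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 1; i <= 45; i++) {
            hits.add(i); // stand-ins for 45 search hits
        }
        System.out.println(pageCount(hits.size(), 20)); // prints 3
        System.out.println(page(hits, 2, 20));          // prints [41, 42, 43, 44, 45]
    }
}
```

      For option (1) each page would be written out as its own HTML
      file; for option (2) the next and previous buttons would only
      change the page index and render again.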

      Of course, there are a bunch of cosmetic features
      that I'm not ready to care about yet (feel free
      to implement them), such as searching on the
      "return key" instead of only after a button click.
      This list is inexhaustible though.

      While not (yet) required by ATI, I'd like to get
      this Unicode compliant.

      Javascript. I used to scoff at any Javascript
      solutions, but I've come to see it as a very
      powerful language. It's just annoying because
      there is no type checking and there are no good

      However, I have been considering embedding a
      hidden version of Vicaya in a web page and
      accessing the Lucene classes via vanilla HTML
      forms, JavaScript events and LiveConnect.
      The hits page can be generated completely
      dynamically, without temp files and with all
      the features we require. It's four years since I
      last decided LiveConnect was a dead end.
      Perhaps it's matured since then.

      This adds a compatibility issue, but perhaps it will
      work better than the Java applet alone.

      > It works fine on Fedora Core
      > (formerly Red Hat Linux) and
      > Firefox, with the Java plug-in from Sun.

      I'm very happy to hear that. We'll add the
      good news to the supported
      platform list.

      > I noticed that Vicaya is not in CVS,
      > what are your plans
      > regarding this?

      I am a big fan of CVS. However, as of today,
      this is a one-man development show. Perhaps
      this is chicken-and-egg. I think the
      lack of a web page keeps developers away,
      but maybe CVS might help attract them. Before
      CVS though, is the web page.

      • Hugo Gayosso

        I will split the current discussion into separate topics.

    • Hugo Gayosso

      > This adds a compatibility issue, but perhaps it will
      > work better than the Java applet alone.

      I would put compatibility as one of the top priorities; we want this CD-ROM to be usable by anybody and as self-contained as possible.

    • Hugo Gayosso

      > I think the
      > lack of a web page keeps developers away,
      > but maybe CVS might help attract them. Before
      > CVS though, is the web page.

      I will try to get a web page ready.

      Even if it is a one-man show, I would recommend that you upload it to CVS. You don't need to grant write permission to anybody, and having it in CVS will allow other people to generate patches against it and send them to you.

      Plus, if you get hit by the proverbial truck, the code will be there for somebody else to take over. :-)