User defined fields + script2javaViaParam

  • wuuru

    wuuru - 2005-03-01

    Hello, are these forum and project alive at all?

    Now I am working on creating a catalogue on CD with search capability, and I decided to use Vicaya because I didn't find anything better. Still, Vicaya seems to be far from perfect too.
    So now I am trying to implement two features I need for my project:

    1) User-defined fields in HTML files.

    A feature for searching a production catalogue, so that one could search for fields like product_name or product_number rather than performing full-text search.

    Lucene already supports fielded data, and one can use queries like "field1:value" provided that field1 exists in the index. The Vicaya indexer, however, only creates the fields "url", "summary", "title" and "contents".

    So, for my purposes I modified the Vicaya indexer and now it understands the following notation in any HTML file:

    <!-- @vicaya fieldname value -->

    If anyone finds this option interesting, I could send a simple patch for the authors to add it to the project tree.
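
    To make the notation concrete, here is a minimal sketch of how such comments could be picked out of a page. It is not the actual patch: the class name, the method name and the regular expression are assumptions made for illustration, and it only collects the name/value pairs without touching Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Vicaya: collects "<!-- @vicaya name value -->" pairs.
class VicayaFieldComments {
    // The first word after @vicaya is the field name; the rest, up to "-->", is the value.
    private static final Pattern FIELD_COMMENT =
        Pattern.compile("<!--\\s*@vicaya\\s+(\\S+)\\s+(.*?)\\s*-->", Pattern.DOTALL);

    static Map<String, String> extractFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD_COMMENT.matcher(html);
        while (m.find()) {
            // A later comment with the same field name overwrites the earlier one.
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<html><!-- @vicaya name trousers --><!-- @vicaya size L --></html>";
        System.out.println(extractFields(html)); // prints {name=trousers, size=L}
    }
}
```

    An indexer could then turn each pair into an index field; ordinary comments without the @vicaya marker are ignored.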

    2) A different user interface. I would like to have my own search form in HTML and Javascript rather than the look offered by the applet. For this purpose, I see that on the project page there already exists a "script2javaViaParam" demo with a simple "Reloader" class that demonstrates how to pass some input from Javascript to a Java applet and get back the output. However, this is only a demo. Has anyone added this feature to Vicaya?

    3) Non-ASCII text in HTML attributes and comments.
    Since I work with documents in Russian (Cyrillic), the indexer crashed on seeing Russian (KOI8-R) characters in HTML attributes such as <img alt="russian text"> or
    <meta name="description" content="russian text">. As a workaround, I modified the definition of <LET> in HTMLParser.jj so that it includes Russian letters too. For me it works just fine, but if one needs to use non-ASCII characters other than Russian, the indexer will still crash.

    • Alex

      Alex - 2005-03-01

      Hi Grigoriy,

      Yes, I have gone on a long hiatus (moved to Denmark, got a job, became frustrated with the various browser/jvm/os incompatibilities).

      (1) By user defined fields, do you mean defined by the programmer (such as yourself), a web/cdrom administrator, or the end user (or HTML developer)? Are you searching HTML or XML?

      I have another side project that uses a growing XML file and performs searches based on specific XPATH/XQueries. I would like to align that with Lucene. Or better yet, I would like to codify the Lucene DB files into XML and load them into Javascript.

      I assume the file you sent me by email was such a patch. I'll give it a look-see (perhaps I'll give Vicaya another run in the following weekend).

      (2) The demo was created because the Mac cannot handle LiveConnect, script calls to Java, or any real interaction between java and javascript. So, methinks, I'll reload the java with params set dynamically with javascript. BUT, this didn't work in Firefox on the Mac (at that time). Specifically, Firefox on the Mac does not load javascript with GET parameters when the protocol is file://


      I am willing to try again without the GET fields, writing directly to the DOM in javascript instead (which may load a java applet, I don't know). Would that be enough?

      (3) I'd like to see that code (maybe you've sent it to me). I intend to get full Unicode support into Vicaya.

      I won't be truly active in the project for a while yet, but I may try to help with your specific concerns if they are in line with our goals. Nonetheless, I invite you to join the development. The least I can do is add the project to CVS -- I have been waiting for SVN. Oh, well.

      • wuuru

        wuuru - 2005-03-02

        Thank you for your reply!

        I have received your email too.

        (1) I mean fields defined by the HTML developer, in the form of HTML comments. In my project, we have a production catalogue with a separate HTML file for each catalogue item. The data for such an item will include fields such as name, item number, item size and so on. So, it will be possible to add the following text to the HTML file:
          <!-- @vicaya name trousers -->
          <!-- @vicaya code TL564 -->
          <!-- @vicaya size L -->

        Then the user will be able not only to search for any page that contains the words "size" and "L", but also to specify that the "L" he is looking for is exactly the value of the field "size" by typing a query such as "name:trousers size:L".
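
        The difference between the two kinds of search can be sketched in plain Java, without Lucene. Everything here is an assumption for illustration (the class and method names are made up), and it treats every field:value clause as required, which is a simplification: Lucene's query parser combines clauses with OR unless told otherwise.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of fielded matching; Lucene does the real work in Vicaya.
class FieldedQuerySketch {
    // Splits "name:trousers size:L" into {name=trousers, size=L}; words without a colon are ignored.
    static Map<String, String> parse(String query) {
        Map<String, String> clauses = new LinkedHashMap<>();
        for (String part : query.trim().split("\\s+")) {
            int colon = part.indexOf(':');
            if (colon > 0 && colon < part.length() - 1) {
                clauses.put(part.substring(0, colon), part.substring(colon + 1));
            }
        }
        return clauses;
    }

    // True only if every clause equals the page's value for that field (ignoring case).
    static boolean matches(Map<String, String> pageFields, String query) {
        for (Map.Entry<String, String> clause : parse(query).entrySet()) {
            String actual = pageFields.get(clause.getKey());
            if (actual == null || !actual.equalsIgnoreCase(clause.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> page = new HashMap<>();
        page.put("name", "trousers");
        page.put("size", "L");
        System.out.println(matches(page, "name:trousers size:L"));
    }
}
```

        A page that merely mentions "size L" in its text, without a "size" field, would not match "size:L" here.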

        The patch I've sent to you does just this: it looks for comments of the form <!-- @vicaya name value -->
        and creates the appropriate fields with doc.add(Field.Text()). Another solution would be to keep these data in XML, but I have implemented the simplest one.

        What I will do now is either add a selection list to your applet, so that the user just selects "size" from the list of fields and then looks for "L", or do the same with an HTML form and then pass the query "size:L" to your applet. I am inclined to prefer the latter.


        I've seen the screenshots such as "Hits can be displayed in customizable HTML"

        Is this really generated by any version of Vicaya, or is it only supposed to look like this in the future? If you really have a working version of this, please give it to me! Even if it is not completely ready. I don't mind Firefox not allowing ?param=value in local URLs, because passing parameters via document.write("<param>") is enough.

        (3) I have modified only one line of HTMLParser.jj:
          | < #LET:     ["A"-"Z","a"-"z","а"-"я","А"-"Я","0"-"9"] >
        This allows KOI8-R Russian letters (Аа Бб Вв Гг etc.) to appear in HTML attribute values. This is not the right way to go in general, however, because characters of other alphabets will still make the parser fail. However, this is OK for my current purposes.
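
        A more general test than listing each alphabet could rely on Java's own Unicode tables. The sketch below is only an illustration in plain Java (the class and method names are made up); the <LET> token in HTMLParser.jj is a JavaCC character list, so the grammar itself would still need wider character ranges rather than a call to this method.

```java
// Hypothetical illustration, not part of HTMLParser.jj.
class LetterCheckSketch {
    // Accepts letters and digits of any alphabet known to Unicode.
    static boolean isLet(int codePoint) {
        return Character.isLetterOrDigit(codePoint);
    }

    // True if every character of the string would pass the check above.
    static boolean allLet(String s) {
        return s.codePoints().allMatch(LetterCheckSketch::isLet);
    }

    public static void main(String[] args) {
        // Russian and Greek words, written as Unicode escapes to stay independent of the file encoding.
        System.out.println(allLet("\u043F\u0440\u0438\u0432\u0435\u0442"));
        System.out.println(allLet("\u03B1\u03B2\u03B3"));
    }
}
```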

      • wuuru

        wuuru - 2005-03-02


        The second use of user-defined fields in my project is restricting the search to a given section of the CD catalogue (such as "About the company", "Staff", "List of products" etc.). For instance, I will add the field
        <!-- @vicaya section staff --> to every HTML page that is part of the section "staff" and give the user a list of sections to choose from to restrict the search.
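
        One way the restriction might be passed on is to wrap whatever the user typed in a required clause on the "section" field, using Lucene's "+" (required) operator and parentheses. This is only a sketch: the class and method names are hypothetical, and it assumes section values are single words such as "staff".

```java
// Hypothetical helper, not part of Vicaya: builds a Lucene query string limited to one section.
class SectionFilterSketch {
    static String restrictToSection(String userQuery, String section) {
        // No section selected: leave the user's query untouched.
        if (section == null || section.trim().isEmpty()) {
            return userQuery;
        }
        // Both the section clause and the user's query are marked as required.
        return "+section:" + section.trim() + " +(" + userQuery + ")";
    }

    public static void main(String[] args) {
        System.out.println(restrictToSection("size:L", "staff")); // prints +section:staff +(size:L)
    }
}
```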

