Hello, are this forum and project still alive at all?
I am currently working on a catalogue on CD with search capability, and I decided to use Vicaya because I didn't find anything better. Still, Vicaya seems to be far from perfect too.
So now I am trying to implement two features I need for my project:
1) User-defined fields in HTML files.
A feature for searching a product catalogue, so that one can search by fields like product_name or product_number rather than performing a full-text search.
Lucene already supports fielded data, and one can use queries like "field1:value" provided that field1 exists in the index. The Vicaya indexer, however, only creates the fields "url", "summary", "title" and "contents".
So, for my purposes I modified the Vicaya indexer, and now it understands the following notation in any HTML file:
<!-- @vicaya fieldname value -->
If anyone finds this option interesting, I could send a simple patch to the authors so that they can add it to the project tree.
3) Non-ASCII text in HTML attributes and comments.
Since I work with documents in Russian (Cyrillic), the indexer crashed when it encountered Russian (KOI8-R) characters in HTML attributes such as <img alt="russian text"> or
<meta name="description" content="russian text">. As a workaround, I modified the definition of <LET> in HTMLParser.jj so that it includes Russian letters too. For me it works just fine, but if someone needs to use non-ASCII characters other than Russian, the indexer will still crash.
Yes, I have gone on a long hiatus (moved to Denmark, got a job, became frustrated with the various browser/jvm/os incompatibilities).
(1) By user-defined fields, do you mean defined by the programmer (such as yourself), a web/CD-ROM administrator, or the end user (or HTML developer)? Are you searching HTML or XML?
I assume the file you sent me by email was such a patch. I'll give it a look-see (perhaps I'll give Vicaya another run in the following weekend).
(3) I'd like to see that code (maybe you've sent it to me). I intend to get full Unicode support into Vicaya.
I won't be truly active in the project for a while yet, but I may try to help with your specific concerns if they are in line with our goals. Nonetheless, I invite you to join the development. The least I can do is add the project to CVS -- I have been waiting for SVN. Oh, well.
Thank you for your reply!
I have received your email too.
(1) I mean fields defined by the HTML developer, in the form of HTML comments. In my project, we have a product catalogue with a separate HTML file for each catalogue item. The data for such an item will include fields such as name, item number, item size and so on. So, it will be possible to add the following text to the HTML file:
<!-- @vicaya name trousers -->
<!-- @vicaya code TL564 -->
<!-- @vicaya size L -->
Then the user will be able not only to search for any page that contains the words "size" and "L", but also to specify that the "L" they are looking for is exactly the value of the field "size", by typing a query such as "name:trousers size:L".
The patch I've sent you does just this: it looks for comments in the form of <!-- @vicaya name value -->
and creates the appropriate fields with doc.add(Field.Text()). Another solution would be to keep these data in XML, but I have implemented the simplest one.
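To illustrate the idea for anyone who has not seen the patch, here is a minimal sketch of the comment extraction. It is not the patch itself: the class name VicayaCommentFields, the method extract and the regular expression are my own assumptions, and it only collects name/value pairs into a map. In the indexer, each pair would then be handed to doc.add(Field.Text(name, value)) as described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Vicaya: collects
// "<!-- @vicaya name value -->" comments from an HTML string.
final class VicayaCommentFields {
    // Group 1 is the field name (first run of non-whitespace after "@vicaya"),
    // group 2 is the rest of the comment with surrounding whitespace dropped.
    private static final Pattern FIELD_COMMENT =
        Pattern.compile("<!--\\s*@vicaya\\s+(\\S+)\\s+(.*?)\\s*-->", Pattern.DOTALL);

    // Returns the fields in document order; a repeated name keeps its last value.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD_COMMENT.matcher(html);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<html><!-- @vicaya name trousers -->\n"
                    + "<!-- @vicaya code TL564 --><!-- @vicaya size L --></html>";
        // In the indexer, each entry would become doc.add(Field.Text(name, value)).
        extract(html).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A regular expression is enough for a sketch like this, but it does not cope with malformed comments (for example, a name written directly against the closing "-->"), so a real indexer would probably do this inside the HTML parser instead.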
What I will do now is either add a selection list to your applet, so that the user just selects "size" from the list of fields and then searches for "L", or do the same with an HTML form and then pass the query "size:L" to your applet. I am inclined to prefer the latter.
I've seen the screenshots, such as "Hits can be displayed in customizable HTML": https://sourceforge.net/project/screenshots.php?group_id=113008&ssid=1891
Is this really generated by some version of Vicaya, or is it only how it is supposed to look in the future? If you really have a working version of this, please send it to me, even if it is not completely ready. I don't mind Firefox not allowing ?param=value in local URLs, because passing parameters via document.write("<param>") is enough.
(3) I have modified only one line of HTMLParser.jj:
| < #LET: ["A"-"Z","a"-"z","а"-"я","А"-"Я","0"-"9"] >
This allows KOI8-R Russian letters (Аа Бб Вв Гг etc.) to appear in HTML attribute values. This is not the right way to go in general, however, because characters from other alphabets will still make the parser fail. For my current purposes, though, it is OK.
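For the general case, one possible direction is to stop listing ranges and ask Java whether a character is a letter or digit. The sketch below only shows that check on already decoded text; the class AttributeChars and its method are made up for illustration, and how (or whether) this can be wired into HTMLParser.jj is not something I have tried. Note also that, as far as I can tell, the letter ё falls outside the "а"-"я" range, which such a check would also cover.

```java
// Hypothetical helper, not part of Vicaya: accepts any Unicode letter or digit
// instead of the fixed ASCII and Cyrillic ranges used in the <LET> workaround.
final class AttributeChars {
    // True if the string is non-empty and every code point is a letter or digit.
    static boolean isLetterOrDigitOnly(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its encoding.
        String russian = "\u0431\u0440\u044e\u043a\u0438"; // Cyrillic word
        String greek = "\u03c4\u03b9\u03bc\u03ae";         // Greek word
        System.out.println(isLetterOrDigitOnly(russian));
        System.out.println(isLetterOrDigitOnly(greek));
        System.out.println(isLetterOrDigitOnly("TL564"));
        System.out.println(isLetterOrDigitOnly("two words"));
    }
}
```

This assumes the input has already been decoded from KOI8-R (or any other encoding) into Java characters before it reaches the check.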
The second use of user-defined fields in my project is restricting the search to a given section of the CD catalogue (such as "About the company", "Staff", "List of products" etc.). For instance, I will add the field
<!-- @vicaya section staff --> to every HTML page that is part of the section "staff" and give the user a list of sections to select from for restricting the search.
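A rough sketch of how the selected section could be combined with whatever the user typed, using Lucene's "+" (required) operator and parentheses for grouping. The class FieldedQuery and the method restrictTo are hypothetical names of mine, and special characters in the user's input are not escaped here.

```java
// Hypothetical helper, not part of Vicaya: builds a Lucene query string that
// requires both the chosen field value and the user's own query.
final class FieldedQuery {
    static String restrictTo(String field, String value, String userQuery) {
        // A value containing whitespace is quoted so that it stays attached to the field.
        String term = value.matches(".*\\s.*") ? "\"" + value + "\"" : value;
        return "+" + field + ":" + term + " +(" + userQuery.trim() + ")";
    }

    public static void main(String[] args) {
        // The section comes from the selection list, the rest from the text box.
        System.out.println(restrictTo("section", "staff", "sales manager"));
    }
}
```

The resulting string could then be passed to the applet in the same way as a query like "size:L".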