Yeah...the search engine...
That's the last piece of code I am reworking before I can release the first beta of PTR.
You may have noticed that I dropped Lucene and moved to an implementation of Boyer-Moore.
Lucene was terrible on the Unicode files: it didn't find all terms in a document and couldn't even tell you how many occurrences it found.
Now I use a Boyer-Moore implementation which is quite fast; however, it searches through the files directly, without using an index. Even with today's computing power, it takes about 15 seconds for my notebook (1.5 GHz) to scan through all files...
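For illustration only, here is a minimal sketch of such an index-free scan. It uses the Horspool simplification of Boyer-Moore, which is not necessarily the variant used in PTR, and the function names build_shift_table and find_all are hypothetical. Python strings are sequences of Unicode code points, so Pali diacritics need no special handling as long as text and pattern use the same normalization form.

```python
def build_shift_table(pattern):
    """Bad-character shifts for Boyer-Moore-Horspool (hypothetical helper)."""
    m = len(pattern)
    # For every character except the last one, store the distance from its
    # rightmost occurrence to the end of the pattern.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def find_all(text, pattern):
    """Return the start offsets of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare the pattern against the current window from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Skip ahead based on the text character under the pattern's last
        # position; characters not in the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    sample = "dhammapada dhamma"
    print(find_all(sample, "dhamma"))
```

Because no index is built, every search reads all the text again, which is why the scan time grows with the size of the whole collection.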
Right now I am trying to present the search results in a similar way to the CSCD software, i.e. showing the number of occurrences, the words which contain your search term, and the books which contain matches.
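A rough sketch of how those three pieces of information could be collected in one pass. This is an assumption about the shape of the result, not the PTR code: summarize_matches and the dictionary keys are hypothetical names, the books are assumed to be already loaded as a mapping from title to plain text, and the text is assumed to use precomposed (NFC) characters so that `\w+` keeps words with diacritics together.

```python
import re
from collections import Counter


def summarize_matches(books, term):
    """Summarize a substring search over {title: text} (hypothetical helper)."""
    total = 0
    words = Counter()
    matching_books = []
    for title, text in books.items():
        count = text.count(term)  # non-overlapping occurrences in this book
        if count == 0:
            continue
        total += count
        matching_books.append(title)
        # Collect every whole word that contains the search term.
        for word in re.findall(r"\w+", text):
            if term in word:
                words[word] += 1
    return {"occurrences": total, "words": dict(words), "books": matching_books}


if __name__ == "__main__":
    demo = {"Book A": "dhamma dhammapada", "Book B": "buddha"}
    print(summarize_matches(demo, "dhamma"))
```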
However, for that I am scanning text files (not HTML), and my idea for the future is a kind of web service: PTR makes a request and gets the result via the internet. That way you won't need to download a search index etc., and we could provide at least one version of the Pali Text Reader which browses and searches online, making the download package very small...
I hope everything went well with the checked-in source?
Wednesday, August 16, 2006, 11:38:26 AM, you wrote:
LL> This is correct. Reason: I kept going back and forth on whether:
LL> 1.) I should prepackage the HTML directly -> faster load times, limited to just one format
LL> 2.) or whether I should include the aalekh binaries -> doing conversion "on the fly" which takes
LL> longer to load a single page but allows for variant output files to be generated (like RTF, XML, DOC,
LL> Finally (in the recent release), I decided that most users will not understand why loading each of
LL> their pages takes so long. They will, in any case, copy Pali text passages from the rendered page and
LL> paste them into whatever other document they need.
LL> An alternative could be something like your converter: deliver or look for CSCD original aalekh
LL> decoded files. Do the conversion on the fly in the background either after installation or during
LL> What are your ideas about this matter?
There seems to be one more reason to use converted files. That is search. The search engine wouldn't search
in aalekh-encoded files, so keeping all files in HTML (or any other format it understands) is the only option we have.
Pavel mailto: email@example.com