Hi Pavel,



Hello Lennart,

Friday, August 18, 2006, 1:11:44 PM, you wrote:


LL> Right now, I am trying to prepare the search results in a similar
LL> way to the CSCD software, i.e. showing the number of occurrences, the
LL> words which contain your search term, and the books which contain
LL> matches.

LL> However, for that, I am scanning text files (not HTML), and my idea
LL> for the future is a kind of web service, so that PTR makes a
LL> request and gets the result via the internet. That way, you won't need
LL> to download a search index etc., and could at least provide one
LL> version of the Pali Text Reader which browses and searches online,
LL> making the download package very small...
Is there a way to create the search index on the client computer from the
source files? Have you tried this?

Yes, I went down this path for some time. I stripped out all the Pali words and organised them by book. The (unoptimized) index file was still 64 MB! Not so good :-) But once I loaded that file into memory, search results were immediate :-)
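
To illustrate what I mean by a book-level index, here is a minimal sketch in Python. It is not the actual PTR code: the function names, the sample "books" and the simple \w+ tokenizer are assumptions made up for this example, and real Pali text would need more careful word splitting.

```python
from collections import defaultdict
import re

# Assumed tokenizer: runs of word characters. Hypothetical, for illustration.
WORD = re.compile(r"\w+")

def build_index(books):
    """Map each lowercased word to the set of book names containing it."""
    index = defaultdict(set)
    for name, text in books.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index

def lookup(index, term):
    """Return the sorted list of books that contain the exact word."""
    return sorted(index.get(term.lower(), ()))

# Made-up sample data standing in for the real text files.
books = {
    "book_a": "evam me sutam ekam samayam",
    "book_b": "sutam me ekam",
}
index = build_index(books)
print(lookup(index, "sutam"))
```

Once the whole mapping sits in memory, a lookup is a single dictionary access, which is why results come back immediately; the price is the size of the index itself.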

However, the next step was to incorporate word positions as well, in order to check whether a combination of terms matches within one sentence. But I stopped the effort there and digressed a little, playing a bit with concordance stuff:

http://paliconcordance.nibbanam.com
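
For the word-position step mentioned above, a rough sketch of the idea could look like this. Again, this is only an illustration under assumptions: splitting sentences on ". ! ?" and the names used here are hypothetical, not what I actually built.

```python
from collections import defaultdict
import re

WORD = re.compile(r"\w+")
SENTENCE_END = re.compile(r"[.!?]+")  # assumed sentence delimiter

def build_positional_index(books):
    """Map word -> list of (book, sentence_no, word_no) positions."""
    index = defaultdict(list)
    for name, text in books.items():
        for s_no, sentence in enumerate(SENTENCE_END.split(text)):
            for w_no, word in enumerate(WORD.findall(sentence.lower())):
                index[word].append((name, s_no, w_no))
    return index

def same_sentence(index, term_a, term_b):
    """Return sorted (book, sentence_no) pairs where both terms occur."""
    hits_a = {(book, s) for book, s, _ in index.get(term_a.lower(), ())}
    hits_b = {(book, s) for book, s, _ in index.get(term_b.lower(), ())}
    return sorted(hits_a & hits_b)

# Made-up sample data.
books = {
    "book_a": "evam me sutam. ekam samayam bhagava.",
    "book_b": "sutam ekam. me.",
}
pos_index = build_positional_index(books)
print(same_sentence(pos_index, "sutam", "ekam"))
```

Storing one entry per word occurrence instead of one per word and book is what makes such an index grow even further beyond the book-level one.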

Finally, I came back to a simpler approach, which is now a plain file-based search using Boyer-Moore... It's fast and reliable, and it can look up occurrences and the surrounding text passages quickly. Processor speed and available memory mean that scanning all 217 files for each lookup is neither impractical nor tedious.
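
As a sketch of this kind of scan, here is the Horspool simplification of Boyer-Moore in Python (bad-character shifts only, so not necessarily the exact variant in PTR). The function names and the context width are assumptions for this example; in practice each text file would be read into a string and passed to it.

```python
def horspool_find_all(text, pattern):
    """Return the start offsets of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: shift distance when the last character of the
    # current window is c. Characters not in the table shift by m.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits

def with_context(text, pattern, width=10):
    """Return (offset, snippet) pairs with `width` characters on each side."""
    m = len(pattern)
    return [
        (pos, text[max(0, pos - width):pos + m + width])
        for pos in horspool_find_all(text, pattern)
    ]

sample = "evam me sutam ekam samayam"  # made-up sample text
for offset, snippet in with_context(sample, "sam"):
    print(offset, repr(snippet))
```

The longer the search term, the larger the average shift, so most of each file is skipped rather than compared character by character.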




A web service will require a connection to the internet and a web server
set up for this purpose.


Sure. But I have prepared two editions of the PaliText Reader for the BETA release. One edition, at 4 MB, allows online browsing of the Pali texts from this website: http://tipitaka.nibbanam.com. The "full download" includes searchable text files and an HTML version of the Pali texts, for a download of 70 MB. You can even switch between the two editions later on by downloading the extra files separately. That way, the user can decide: a small footprint on his system means he needs an online connection, while a large download up front means he can work offline. I hope that this two-way strategy leaves room for everyone's needs.

For the online edition, an online (web-service-based) search still needs to be implemented... So that is the only drawback right now if someone goes for the online edition: no search is possible (yet).


LL> I hope everything went well with the checked-in source?
I was very busy with my converter, which is now complete. The next task will be
creating the installer for the Pocket PC version. Only after that will I
install TortoiseSVN and update the source code.

okay,

best wishes Lennart