Re: [eXist-TEIXML] eXist for linguistic purposes

Hi Wolfgang,

On Thursday 8 September 2011 17:47:06, Wolfgang Meier wrote:
>
> Instead of just highlighting the match, we could tag all preceding and
> following tokens without much additional cost. This could probably
> also include the relative position of the token to the match, so you
> would end up with something like <context pos="-1">...</context>
> <match>...</match> <context pos="+1">...</context>.
>

This sounds great! Such a 'native' segmentation would undoubtedly 
perform much faster. Additionally, I guess it would facilitate further 
interaction with such collocation data as well. For example, if a 
collocation table shows that "great" occurs at position 3 after the 
search term "eXist", I can imagine that users would want a link from 
there to "exact proximity searches", where "eXist" occurs exactly 3 
words before "great". That's something the Lucene search syntax doesn't 
support, does it?
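
To make the idea concrete, here is a rough sketch in Python (not XQuery, and not based on how eXist or Lucene work internally) of the two steps I have in mind: building a collocation table keyed by relative position, and then looking up "exact proximity" hits. The function names `tokenize`, `collocates_by_position` and `exact_proximity`, the whitespace tokenizer and the sample sentence are all made up for illustration.

```python
from collections import Counter


def tokenize(text):
    # Naive assumption: lowercase and split on whitespace only.
    return text.lower().split()


def collocates_by_position(tokens, term, window=3):
    """Count each (relative position, token) pair around every match of term."""
    table = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        for offset in range(-window, window + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                table[(offset, tokens[j])] += 1
    return table


def exact_proximity(tokens, first, second, distance):
    """Indexes of `first` where `second` sits exactly `distance` tokens later."""
    return [
        i
        for i, tok in enumerate(tokens)
        if tok == first
        and 0 <= i + distance < len(tokens)
        and tokens[i + distance] == second
    ]


tokens = tokenize("eXist is a great database and eXist runs really great queries")
table = collocates_by_position(tokens, "exist")
hits = exact_proximity(tokens, "exist", "great", 3)
```

With the relative position already tagged on each context token, the second step would amount to a simple filter on that position value, which is why the proposed markup looks so attractive to me.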

>
> I suppose Lucene does store the total number of words per indexed
> document somewhere (it should be relevant for computing weights), so
> we could add a function to retrieve it.
>

Ditto: would be very useful!

> P.S.: I plan to integrate your improved version of the kwic module. I
> just wanted to test it on some of my existing apps first to see if it
> breaks backwards compatibility or not.

That's nice to hear. Please make sure to test the version at 
<http://www.kantl.be/ctb/download/kwic.xql>, which has some improvements 
and fixes some dumb errors, compared to the one I posted on eXist-open.

Kind regards,

Ron
