-
I need the term frequency for a single extent (e.g., sentence).
I retrieve a set of sentences using the #combine[sentence](q) type query.
And now I'd like to be able to tell what the term frequency is for certain words in each of the sentences.
Obviously, I could run the query #combine[sentence](w) for each word 'w', but there has to be a better way.
There's the UnigramLM class...
2009-08-27 09:15:36 UTC in The Lemur Toolkit
-
I don't even need the count of a word in the extent; I just need the probability of the word in the extent. But I'd rather not run a query for each word to get that.
2009-08-25 08:54:14 UTC in The Lemur Toolkit
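A minimal sketch of what "the probability of the word in the extent" could mean, assuming a plain maximum-likelihood estimate (term count divided by extent length) and assuming the extent's tokens are already available as a list. The function name is hypothetical and nothing here calls the Lemur or Indri API; it only shows that one pass over the tokens yields the probabilities for every word at once.

```python
from collections import Counter


def extent_unigram_model(extent_tokens):
    """Maximum-likelihood unigram model of one extent: P(w | extent) = tf(w) / |extent|."""
    n = len(extent_tokens)
    if n == 0:
        return {}
    return {word: count / n for word, count in Counter(extent_tokens).items()}


# Hypothetical extent, already tokenized.
model = extent_unigram_model("good test word word".split())
print(model["word"])  # 0.5
```

Words that do not occur in the extent are simply absent from the dictionary; a smoothed estimate would need collection statistics as well.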
-
Hi,
Is there a way to get statistics for a ScoredExtentResult?
To be specific, what I need is to count the number of occurrences of certain words in extents (e.g., sentences) I get as a result from a query.
Can I do that without having to manually go over each of the extents counting the occurrences?
Thank you,
Eyal.
2009-08-25 08:19:57 UTC in The Lemur Toolkit
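One way to picture the counting step, sketched in Python rather than against the toolkit itself. It assumes each result supplies a begin and end token position within its document and that the document's token sequence can be fetched once; the half-open `[begin, end)` convention, the function name, and the sample data are all assumptions for illustration.

```python
from collections import Counter


def extent_term_counts(doc_tokens, begin, end, terms):
    """Count each term of interest inside the token window [begin, end)."""
    window = Counter(doc_tokens[begin:end])
    return {term: window[term] for term in terms}


# Hypothetical document with two sentence extents.
doc = "the cat sat . the dog saw the cat .".split()
extents = [(0, 4), (4, 10)]

for begin, end in extents:
    print(extent_term_counts(doc, begin, end, ["the", "cat"]))
# {'the': 1, 'cat': 1}
# {'the': 2, 'cat': 1}
```

The extents are still visited one by one, but a single pass per extent covers every word of interest, so no per-word query is needed.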
-
I have two queries that illustrate my problem:
1. hypertension
2. brontosauruses
For both of these, the only documents retrieved are those containing an exact match to the query word. If I run -query="hypertens" I get an empty set, and for -query="hypertension" I get all the documents containing hypertension.
The same holds for the second query.
I don't...
2009-08-23 20:40:12 UTC in The Lemur Toolkit
-
I've made the change in the typedef but still some of the stopwords from some of the docs make their way into the index.
Should the change in the typedef be accompanied by the addition of "const" before "char *"?
Or any other changes?
[I'm using Lemur 4.9]
Thanks.
2009-08-09 16:06:08 UTC in The Lemur Toolkit
-
I have the same problem under Linux.
The addition of 'const' before 'char*' doesn't solve the problem for me.
Is there a solution to this problem for Linux?
Thanks.
2009-08-09 08:01:04 UTC in The Lemur Toolkit
-
Hi,
I have a slightly theoretical question regarding the calculation of clarity.
I take a very simple test example to see if I calculate the clarity score correctly.
D1=time time
D2=only time
D3=watch now
q=time
The setup is:
<parameters>
<index>/home/usr/tmpIndex2/tmpIndex.key</index>
<retModel>2</retModel>...
2009-08-01 13:20:21 UTC in The Lemur Toolkit
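A minimal sketch for checking a clarity score by hand on the toy collection above. It assumes the usual definition (the KL divergence, in bits, between a query language model and the collection model), a uniform document prior, and Jelinek-Mercer smoothing with an illustrative weight of 0.6. The function names and the smoothing weight are assumptions; this is not the toolkit's implementation, and its result need not match what a particular parameter file produces.

```python
import math
from collections import Counter

docs = {
    "D1": ["time", "time"],
    "D2": ["only", "time"],
    "D3": ["watch", "now"],
}
query = ["time"]
LAMBDA = 0.6  # assumed smoothing weight, for illustration only


def collection_model(docs):
    """P(w | C): relative frequency of each word over the whole collection."""
    counts = Counter(w for tokens in docs.values() for w in tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def doc_model(tokens, coll, lam):
    """P(w | D) = lam * tf/|D| + (1 - lam) * P(w | C)."""
    tf = Counter(tokens)
    return {w: lam * tf[w] / len(tokens) + (1 - lam) * p for w, p in coll.items()}


def query_model(docs, query, lam=LAMBDA):
    """P(w | Q) = sum over D of P(w | D) * P(D | Q), with a uniform prior on D."""
    coll = collection_model(docs)
    models = {d: doc_model(t, coll, lam) for d, t in docs.items()}
    # Query likelihood per document; assumes every query term occurs in the collection.
    likelihood = {d: math.prod(m[q] for q in query) for d, m in models.items()}
    norm = sum(likelihood.values())
    posterior = {d: l / norm for d, l in likelihood.items()}
    return {w: sum(posterior[d] * models[d][w] for d in docs) for w in coll}


def clarity(docs, query, lam=LAMBDA):
    """Sum over w of P(w | Q) * log2(P(w | Q) / P(w | C))."""
    coll = collection_model(docs)
    qm = query_model(docs, query, lam)
    return sum(p * math.log2(p / coll[w]) for w, p in qm.items() if p > 0)


print(clarity(docs, query))
```

Printing the intermediate `query_model` values alongside the final score makes it easier to see which step disagrees with the toolkit's output.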
-
When I use IndriBuildIndex for indexing files, the index created stores the doc-numbers as if they were part of the text.
For instance, a single file contains:
<DOC>
<DOCNO>D1</DOCNO>
<TEXT>
test
</TEXT>
</DOC>
The build parameter file:
<parameters>
<index>/home/usr/tmpIndex</index>
<corpus>...
2009-08-01 13:04:28 UTC in The Lemur Toolkit
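For reference, a small sketch of the separation one would expect for a document in this layout: the DOCNO value kept as an identifier and only the TEXT body tokenized. This is plain Python with regular expressions, not the indexer's parser, and the function name is hypothetical.

```python
import re

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>(.*?)</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def parse_trec(raw):
    """Map each DOCNO to the whitespace tokens found inside its TEXT element."""
    parsed = {}
    for match in DOC_RE.finditer(raw):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        if docno is None:
            continue  # skip documents without an identifier
        text = TEXT_RE.search(body)
        tokens = text.group(1).split() if text else []
        parsed[docno.group(1).strip()] = tokens
    return parsed


sample = """<DOC>
<DOCNO>D1</DOCNO>
<TEXT>
test
</TEXT>
</DOC>"""

print(parse_trec(sample))  # {'D1': ['test']}
```

If the index vocabulary contains the identifier as a term, the file was probably treated as plain text rather than parsed along these lines.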
-
This is not quite the same.
If we take this example:
<DOC>
<DOCNO> K1 </DOCNO>
<TEXT>
good test word word
</TEXT>
</DOC>
<DOC>
<DOCNO> K2 </DOCNO>
<TEXT>
good test test test test test word word word word
</TEXT>
</DOC>
./IndriRunQuery -index=/home/user/tmpIndex -query="good test" -trecFormat=true...
2009-07-11 10:10:45 UTC in The Lemur Toolkit
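To reason about how the two documents might score for "good test", here is a rough sketch of query-likelihood scoring with Dirichlet smoothing, taking the mean log probability over the query terms. The smoothing choice, the value of mu, and the function name are assumptions for illustration; the post is cut off before its output, so this does not attempt to reproduce it.

```python
import math
from collections import Counter

docs = {
    "K1": "good test word word".split(),
    "K2": "good test test test test test word word word word".split(),
}
MU = 2500.0  # assumed Dirichlet prior; adjust to match the actual run


def dirichlet_log_score(query, doc_tokens, coll_tf, coll_len, mu=MU):
    """Mean over query terms of log((tf + mu * P(q | C)) / (|D| + mu)).

    Assumes every query term occurs somewhere in the collection.
    """
    tf = Counter(doc_tokens)
    total = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        total += math.log((tf[term] + mu * p_coll) / (len(doc_tokens) + mu))
    return total / len(query)


coll_tf = Counter(w for tokens in docs.values() for w in tokens)
coll_len = sum(coll_tf.values())

for name, tokens in docs.items():
    print(name, dirichlet_log_score(["good", "test"], tokens, coll_tf, coll_len))
```

With a large mu relative to such short documents, the collection statistics dominate and the two scores end up close together; a smaller mu lets the within-document counts matter more.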