This would be very useful for gathering word frequencies, a critical parameter in many psycholinguistic experiments.
Currently the system returns only the first 1000 results and does not return the total number of results for a query.
Anonymous
I understand that the limit, which can be changed, was introduced for safety reasons; on the other hand, you can still get all the hits with poliqarpc, which accesses the corpus files directly instead of going through the Poliqarp protocol.
What about handling queries for token (word) frequencies in a special way? Aren't the frequencies stored explicitly somewhere in the corpus files?
We are thinking about building a dedicated frequency module. Perhaps for such queries the result format could be different (only frequencies, without the actual matches in the text)?
Yes, I was about to suggest it :-).
For our corpora we always build frequency lists (with poliqarpc and a simple script by Jakub Wilk), but we are uneasy about making them available in raw form.
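Building such a frequency list essentially comes down to counting identical tokens among the query hits. The sketch below is a hypothetical illustration, not Jakub Wilk's script: it assumes the hits have already been exported as plain text with one token per line, which is an assumption and not a documented poliqarpc output format. The function names `frequency_list` and `write_frequency_list` are made up for this example.

```python
import sys
from collections import Counter
from typing import Iterable, TextIO


def frequency_list(tokens: Iterable[str]) -> list[tuple[str, int]]:
    """Count tokens and return (token, count) pairs, most frequent first.

    Assumption: each input item is one token (e.g. one line of exported
    query results). Blank lines are ignored; ties are sorted alphabetically
    so the output is deterministic.
    """
    counts = Counter(t.strip() for t in tokens if t.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_frequency_list(lines: Iterable[str], out: TextIO = sys.stdout) -> None:
    """Write a tab-separated 'count<TAB>token' list, one entry per line."""
    for token, count in frequency_list(lines):
        out.write(f"{count}\t{token}\n")
```

For example, `write_frequency_list(sys.stdin)` would turn a one-token-per-line export into a frequency list with only counts and no matched contexts, close to the frequency-only result format suggested above.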