I have indexed a corpus consisting of a single file (it's M01 from FLOB, so a total of about 2000 words).
Words with multiple forms are not being indexed correctly. This affects the freq0 file and also the lists you get in the Word Query dialog.
In the freq0 file, I get, for instance, three lines for "of" instead of the one line I ought to have:
of 1 1
of 46 1
of 1 1
This should, surely, be
of 48 3
(the "forms" are distinguished by POS tags -- respectively, ???, IO, and II22).
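To make the expected arithmetic explicit, here is a minimal sketch of the merge I would expect. It assumes each freq0 line reads "word, total frequency, number of forms", which is only inferred from the examples above. The function name `merge_freq_lines` is made up for illustration and is not part of the indexer.

```python
def merge_freq_lines(lines):
    """Combine lines of the assumed form 'word count nforms' by word.

    Counts and form totals for the same word are summed; words are
    returned in order of first appearance.
    """
    merged = {}
    for line in lines:
        word, count, nforms = line.split()
        total, forms = merged.get(word, (0, 0))
        merged[word] = (total + int(count), forms + int(nforms))
    return [f"{word} {total} {forms}" for word, (total, forms) in merged.items()]


# The three lines actually observed in the freq0 file for the M01 corpus.
observed = ["of 1 1", "of 46 1", "of 1 1"]
print(merge_freq_lines(observed))  # ['of 48 3']
```

Summing the three observed lines this way gives the single "of 48 3" entry, i.e. 48 occurrences spread over 3 forms.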
This shows up in the WQ dialog too: each form of "of" appears as a separate line in the top half of the word-list, rather than the forms being combined into one entry in the top half and then separated out in the bottom half. See enclosed graphic.
This is carried over into the XML listing saved from this dialog as well.
Since it is a consistent problem, I'm guessing it has to do with the indexer rather than the client. I don't get this bug with bigger corpora, e.g. the full whack of FLOB (whose freq0 has one line, "of 34094 14"), and it's fine with the BNC too.
Andrew.
bug graphic
Logged In: YES
user_id=1036552
Originator: NO
I tested this with 1.22 and got the correct behaviour, so whatever the problem is, it appears to have gone away. Strange: I can't explain how it could have arisen. I suspected the addkeys setup, but it is quite correct.
Andrew, will you test this when 1.22 appears and report the result here? Then, if it really has gone away, I'll close the bug.
Logged In: YES
user_id=1460495
Originator: YES
No change in 1.22. There are still three separate listings for "of" in both the freq0 file and in the result I get from the Word Query dialog lookup ...
Logged In: YES
user_id=1036552
Originator: NO
I can reproduce this now: I missed it before because it doesn't happen with the debug version. That also makes it rather hard to track down, but I'll catch it one way or another!