Hello All!
I've been using Xaira with some success for the last couple of months on a lemmatised and tagged corpus of around 1.2 million words, but I have been largely unable to use the morphological tagging.
I have tried two different approaches. I began by giving each word a single morphological tag that packed all of its features into one string (in theory there were about 1300 possible tags in this scheme), so that the XML looked like this:
<w xml:id="w39" lemma="byrja" ana="sng3eþ">byrjaði</w>
When I ran the indexer, it threw up lots of "Task stack too deep, switched off." errors. The resulting corpus was usable in Xaira, but it tended to crash when I tried to use the tags. Even when it didn't crash, the length limit on queries made this approach infeasible: a query for any noun, for example, had to list over a hundred different possible tags, so even the simplest queries were too long to run.
I've now split the tags up into separate attributes, so that the XML looks like this:
<w xml:id="w39" lemma="byrja" cls="verb" mood="ind" voice="act" gen_pers="3" num="sg" tense="past">byrjaði</w>
However, the indexer still throws up thousands of errors, and when the corpus is opened in Xaira, the program crashes if I try to use any of the morphological tags in the Word Query dialogue. If I instead use the XML Query dialogue and specify a <w> element with the correct attribute, it always returns no results. The corpus works fine if I exclude all the morphological tags when indexing, but including even one of them causes these problems.
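In case it helps with diagnosis, here is a minimal sketch of how the attribute inventory on the <w> elements could be checked independently of Xaira. It uses only the Python standard library. The function name attribute_inventory and the one-word sample are illustrative assumptions, and it assumes the <w> elements are not in a namespace (a TEI default namespace would need the qualified tag name instead).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def attribute_inventory(xml_text, tag="w"):
    """Map each attribute name found on <tag> elements to the set of values it takes.

    Hypothetical helper for sanity-checking a corpus file before indexing;
    it is not part of Xaira.
    """
    inventory = defaultdict(set)
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself if it matches `tag`.
    for elem in root.iter(tag):
        for name, value in elem.attrib.items():
            inventory[name].add(value)
    return dict(inventory)


# Illustrative sample: a single word wrapped in an assumed <s> element.
sample = (
    '<s><w xml:id="w39" lemma="byrja" cls="verb" mood="ind" voice="act" '
    'gen_pers="3" num="sg" tense="past">byrjaði</w></s>'
)

if __name__ == "__main__":
    for name, values in sorted(attribute_inventory(sample).items()):
        print(name, len(values), sorted(values))
```

Run over the real corpus file, this would show how many distinct values each attribute carries, which may help to tell whether the indexer is choking on the size of the tag set or on something else. Note that ElementTree reports xml:id under its full namespace-qualified name.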
Does anyone happen to be familiar with these errors and could suggest a solution or a workaround? I'd be extremely grateful for any suggestions!
Thanks in advance,
Tam Blaxter