Anonymous - 2011-03-12


For my dissertation I am building a POS tagger for Romanian. I want to restrict the possible outcomes more tightly than a dictionary alone allows, by applying context-based disambiguation rules. This would let me train on a smaller corpus and also fix mistakes that the tagger makes consistently.

So I want to be able to give the POS tagger the set of possible outcomes for each word of the input separately. Since training a model doesn't use the tagdict, I figured I could make this possible with small modifications to the code. Let me know if I am wrong.

But before digging into this further, I need to know whether, when tagging a sentence, the constraints that the tagdict places on one word can affect the outcomes for the next word in the chain. Because if not, all this work would be pointless, since I could just use the probabilities of the outcomes (though this would of course be much less precise).
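
To make the question concrete, here is a toy sketch of the behaviour I am asking about. It is not the tagger's code: the Viterbi-style decoder, the name constrained_viterbi, the two tags and all the probabilities are invented for illustration, and I am assuming a decoder that scores whole tag sequences. Each word gets its own optional set of allowed tags, which is the per-word restriction I described above.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy model: every number below is made up for illustration.
START: Dict[str, float] = {"NOUN": 0.5, "VERB": 0.5}
EMIT: List[Dict[str, float]] = [
    {"NOUN": 0.6, "VERB": 0.4},  # word 0: ambiguous, slightly prefers NOUN
    {"NOUN": 0.5, "VERB": 0.5},  # word 1: fully ambiguous on its own
]
TRANS: Dict[Tuple[str, str], float] = {
    ("NOUN", "NOUN"): 0.2, ("NOUN", "VERB"): 0.8,
    ("VERB", "NOUN"): 0.7, ("VERB", "VERB"): 0.3,
}


def constrained_viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    start: Dict[str, float],
    allowed: List[Optional[Set[str]]],
) -> List[str]:
    """Return the best tag sequence; allowed[i] restricts word i (None = no restriction)."""
    # best maps each surviving tag of the current word to (score, best path ending in it).
    best = {
        tag: (start[tag] * e, [tag])
        for tag, e in emissions[0].items()
        if allowed[0] is None or tag in allowed[0]
    }
    for i in range(1, len(emissions)):
        nxt = {}
        for tag, e in emissions[i].items():
            if allowed[i] is not None and tag not in allowed[i]:
                continue  # this tag is ruled out for word i
            # Pick the best predecessor among the tags that survived for word i-1.
            score, path = max(
                ((s * transitions[(prev, tag)] * e, p) for prev, (s, p) in best.items()),
                key=lambda cand: cand[0],
            )
            nxt[tag] = (score, path + [tag])
        best = nxt
    if not best:
        raise ValueError("the constraints rule out every tag sequence")
    return max(best.values(), key=lambda cand: cand[0])[1]


unconstrained = constrained_viterbi(EMIT, TRANS, START, [None, None])
word0_forced = constrained_viterbi(EMIT, TRANS, START, [{"VERB"}, None])
print(unconstrained)  # ['NOUN', 'VERB']
print(word0_forced)   # ['VERB', 'NOUN']
```

In this toy, restricting only the first word to VERB flips the tag chosen for the second word from VERB to NOUN, even though the second word itself is unrestricted. What I need to know is whether the tagger behaves like this, or whether each word's outcome is decided independently of the constraints on its neighbours.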