Re: [Gramadoir-devel] number of loops / number of ambiguous words, in rules
From: Kevin P. S. <sca...@sl...> - 2005-03-18 23:48:43
On 17:30 Fri 18 Mar , Myriam Lechelt wrote:
> Hi (again)!
>
> I know rules from aonchiall are applied two times.
> Is it to disambiguate those which couldn't be disambiguated the first
> time? because we only could disambiguate words with unambiguous ones.
> If I have :
> <B>[Pp]etit</B> <N>ANYTHING</N>:<J m s>
>
> If the noun after "petit" is ambiguous, "petit" is not going to be
> disambiguated.
> So, is it the reason why rules are applied two times? and two times, is
> it enough? (maybe it is because kevin you have good results with gaelic
> ;-) )

Yes, this is the idea (and two passes is enough for 99% of the cases
in Irish). I've thought about more complicated schemes that would
allow you to ask for certain rules to be applied more times, etc.,
but it hasn't been of vital importance yet. If you want, you can
simply add more calls to the aonchiall function around line 677 of
Gramadoir.pm.in in the "engine" directory. For example:

    sub add_tags_real
    {
        my $self = $_[0];
        my $sentences = unchecked_xml(@_);
        foreach my $sentence (@$sentences) {
            comhshuite($sentence);
            aonchiall($sentence);
            aonchiall($sentence);
            aonchiall($sentence);
            aonchiall($sentence);
            unigram($sentence) if $self->{'unigram_tagging'};
        }
        return $sentences;
    }

will do four passes, and so on, for experimentation. If you find the
number of passes is important, it will be easy enough to make it
customizable per language; please let me know.

> And I don't think it is possible, but could we work on three or more
> ambiguous words?
> You surely had (or have) the problem of several ambiguous words which
> follow each other...

All rules in all of the input files can contain as many words of
context as you need. It is pretty unusual to need four and I'm not
sure I've ever used five. But each rule in aonchiall-fr.in is only
allowed to disambiguate one word (so one <B> tag per rule).
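The idea of making the number of aonchiall passes customizable could be sketched roughly as follows. This is only an illustration, not code from Gramadoir.pm.in: the 'aonchiall_passes' key and the name add_tags_sketch are hypothetical, and comhshuite, aonchiall and unigram are replaced by stubs that merely record that they were called.

```perl
use strict;
use warnings;

# Stub stand-ins for the engine's real routines (assumption: the real
# ones take a sentence and modify it in place). Here they only log calls.
my @calls;
sub comhshuite { push @calls, 'comhshuite' }
sub aonchiall  { push @calls, 'aonchiall' }
sub unigram    { push @calls, 'unigram' }

# Same shape as add_tags_real, except that the number of aonchiall
# passes is read from a hypothetical 'aonchiall_passes' setting,
# defaulting to the two passes discussed in this thread.
sub add_tags_sketch {
    my ($self, $sentences) = @_;
    my $passes = $self->{'aonchiall_passes'} // 2;
    foreach my $sentence (@$sentences) {
        comhshuite($sentence);
        aonchiall($sentence) for 1 .. $passes;
        unigram($sentence) if $self->{'unigram_tagging'};
    }
    return $sentences;
}

add_tags_sketch({ aonchiall_passes => 4, unigram_tagging => 1 },
                ['one sentence']);
print join(' ', @calls), "\n";
# prints: comhshuite aonchiall aonchiall aonchiall aonchiall unigram
```

A per-language value would then only need to be set once, wherever the language's other options are configured, instead of editing the engine by hand.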
Since it is common to have context words that have not yet been
disambiguated, you just leave them without any markup:

    <B>[Aa]s</B> an:<S>

"an" is either the article or an interrogative word in Irish; it is
tricky to disambiguate. The rule above disambiguates "as" and this
rule appears before any work is done on "an", so no tags are provided
for "an". On the other hand, in this case, it is clear if you speak
Irish that the "an" ought to be the article, so you might want to say
something like this:

    <B>[Aa]s</B> <B>an</B>:<S>,<T>

Is this what you're asking about? For now though, you have to do
this in two steps:

    <B>[Aa]s</B> an:<S>
    <S>[Aa]s</S> <B>an</B>:<T>

-Kevin
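To see why a rule that needs already-disambiguated context can depend on a second pass, here is a small toy model of the two-step "as an" example. It is a sketch under several assumptions and not the real aonchiall implementation: rules are assumed to be tried in file order within a pass, the data structures are invented for illustration, and the alternative tags X (for "as") and Q (for the interrogative "an") are placeholders, not real tags. The rule for "an" is deliberately listed first, so it cannot fire until "as" has been resolved.

```perl
use strict;
use warnings;

# Toy rule set. Each pattern element has a word regex, plus either a
# required context tag or a 'target' flag (the single <B> word).
my @rules = (
    # <S>[Aa]s</S> <B>an</B>:<T>
    { pattern => [ { re => qr/^[Aa]s$/, tag => 'S' },
                   { re => qr/^an$/,    target => 1 } ],
      result  => 'T' },
    # <B>[Aa]s</B> an:<S>
    { pattern => [ { re => qr/^[Aa]s$/, target => 1 },
                   { re => qr/^an$/ } ],
      result  => 'S' },
);

# True if $rule matches the tokens beginning at index $start.
sub rule_matches_at {
    my ($rule, $tokens, $start) = @_;
    my @pat = @{ $rule->{pattern} };
    return 0 if $start + @pat > @$tokens;
    for my $k (0 .. $#pat) {
        my ($elem, $tok) = ($pat[$k], $tokens->[$start + $k]);
        return 0 unless $tok->{word} =~ $elem->{re};
        my @tags = @{ $tok->{tags} };
        if ($elem->{target}) {
            # The target must still be ambiguous and allow the result tag.
            return 0 unless @tags > 1 && grep { $_ eq $rule->{result} } @tags;
        }
        elsif (defined $elem->{tag}) {
            # Tagged context must already be resolved to exactly that tag.
            return 0 unless @tags == 1 && $tags[0] eq $elem->{tag};
        }
        # Untagged context words match whatever their current tags are.
    }
    return 1;
}

# Apply every rule once, in order; return how many words were resolved.
sub one_pass {
    my ($tokens) = @_;
    my $changed = 0;
    for my $rule (@rules) {
        for my $start (0 .. $#$tokens) {
            next unless rule_matches_at($rule, $tokens, $start);
            my ($k) = grep { $rule->{pattern}[$_]{target} }
                      0 .. $#{ $rule->{pattern} };
            $tokens->[$start + $k]{tags} = [ $rule->{result} ];
            $changed++;
        }
    }
    return $changed;
}

sub fresh_sentence {
    return [ { word => 'as', tags => [ 'S', 'X' ] },
             { word => 'an', tags => [ 'T', 'Q' ] } ];
}

sub show {
    my ($tokens) = @_;
    my @parts;
    push @parts, $_->{word} . '/' . join('|', @{ $_->{tags} }) for @$tokens;
    return join ' ', @parts;
}

my $s = fresh_sentence();
one_pass($s);
print show($s), "\n";   # prints: as/S an/T|Q
one_pass($s);
print show($s), "\n";   # prints: as/S an/T
```

In this model, listing the "as" rule before the "an" rule would resolve both words in a single pass, which is one way rule order and the number of passes can trade off against each other.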