Re: [Languagetool] How to write a rule with an optional adverb token?


Hi Dominique,

> Hi Marcin
>
> I still have a problem with this way of skipping tokens (adverbs
> for example).
>
> 1/ One problem is that it not only skips adverbs, but it also
> skips nouns and adjectives in the above example, because the
> regexp has to list not only the postag "A" to skip, but also all
> the possible postags of the next token.  So skipping can be
> too greedy. What if I want to skip only adverbs?

Well, it seems that this isn't possible with skipping as it works right
now. We would have to rewrite the pattern matching a bit, for example to
include your idea of repetition (ranging from 0 up to the length of the
sentence minus the number of tokens in the pattern). This could be
specified to occur at the end of the pattern, so that no closing token
of the skipping would be included. Feel free to propose a patch to the
XML Schema and Java files: the classes you need to look at are
AbstractPatternRule (especially testAllReadings()) and PatternRule
(especially match()). Frankly, I'm not sure how to implement it;
probably you could change only some bits of the code. Right now it
simply calculates the possible range of skipped tokens and checks each
of them against the exceptions. You could use the same range but check
for a positive condition, yet I'm not sure that would work easily. The
code is pretty short but the concepts involved are tricky... I guess it
should go around line 172 of AbstractPatternRule, adding an OR condition
after prevElement.isMatchedByScopeNextException, something like:

|| !prevElement.matchesAPositiveCondition()

where matchesAPositiveCondition would be computed as AND rather than
as OR (all positive conditions have to be met). I'm not sure if that
would make any difference; I have no time to play with this.
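
To make the idea a bit more concrete, here is a rough standalone sketch
of skipping with a positive condition instead of an exception. It does
not use any LanguageTool classes: Token, SkipSketch and matchAfterSkip
are names made up for illustration, and the real testAllReadings() /
match() code is organized differently. A token counts as skippable if
any of its readings has the postag "A", which would also cover point 2
below (tokens with multiple tags).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class SkipSketch {

    /** A token with all its POS tags (hypothetical stand-in, not a LanguageTool class). */
    static final class Token {
        final String word;
        final Set<String> tags;

        Token(String word, String... tags) {
            this.word = word;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    /**
     * Returns the index of the first token at or after 'from' that satisfies
     * 'next'. Every token passed over on the way must satisfy 'skippable'
     * (the positive condition); otherwise the match fails and -1 is returned.
     * This is a simple non-backtracking scan.
     */
    static int matchAfterSkip(List<Token> sentence, int from,
                              Predicate<Token> skippable, Predicate<Token> next) {
        for (int i = from; i < sentence.size(); i++) {
            Token t = sentence.get(i);
            if (next.test(t)) {
                return i;          // closing token of the pattern found
            }
            if (!skippable.test(t)) {
                return -1;         // a non-adverb is in the way: no match
            }
        }
        return -1;                 // ran out of tokens
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
            new Token("he", "PRP"),
            new Token("very", "A"),
            new Token("well", "A", "N"),   // ambiguous: adverb or noun
            new Token("runs", "V"));
        // Skip only tokens that have an adverb reading, then expect a verb.
        System.out.println(matchAfterSkip(sentence, 1,
            t -> t.tags.contains("A"), t -> t.tags.contains("V")));
    }
}
```

The difference from the current behaviour is only in the test applied to
each token in the skipped range: instead of rejecting tokens that match
an exception, it rejects tokens that fail the positive condition.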

> 2/ It does not work well if a token can have multiple tags.
> We saw for example that if postag to skip can have both "A"
> (adverb) and "SENT_END", the regex has to list all those
> cases ("A|SENT_END|...") But it's impossible to know what
> are all the other possible tags besides "A" and SENT_END.
> Some adverb words can also be verbs, or nouns, etc.
> The disambiguator cannot always give a single tag to each
> token. Such adverbs would not be skipped.

Same as above...

> 3/ It does not seem to allow me to skip with a token which
> is either an adverb (postag) or a specific token value. For
> example, I don't see how to skip a token which is either an
> adverb (postag "A") or specific word "foo".

There is a quick hack possible: just assign a new special POS tag to the 
word "foo" in the disambiguator, and you will be able to use the tag in 
the rules.

Note that using the disambiguator this way may simplify your rules a
lot, and it allows a certain "cascade" of rules.
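
In case the cascade idea is unclear, here is a toy illustration in plain
Java. It is not how the disambiguator is actually written (that is done
in its own rule files), and TagCascadeSketch, disambiguate, skippable and
the tag name X_FOO are all invented for the example. The first step adds
a special tag to the word "foo"; the second step is a rule-side check
that only needs a postag regexp.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class TagCascadeSketch {

    /** Made-up special tag; any name not used by the tagger would do. */
    static final String SPECIAL_TAG = "X_FOO";

    /** Step 1 (disambiguator stage): give the word "foo" an extra tag. */
    static Set<String> disambiguate(String word, Set<String> tags) {
        Set<String> out = new HashSet<>(tags);
        if (word.equals("foo")) {
            out.add(SPECIAL_TAG);
        }
        return out;
    }

    /** Step 2 (rule stage): skippable if any tag matches the postag regexp. */
    static boolean skippable(Set<String> tags) {
        Pattern postag = Pattern.compile("A|" + SPECIAL_TAG);
        for (String tag : tags) {
            if (postag.matcher(tag).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

The rule no longer has to mix a postag test with a test on the token
value; both cases are expressed through tags.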

Regards
Marcin
