Re: [gate-users] Adding Rules to the Tokeniser


Great.
First question is easy. Write a negative rule with higher priority that 
does nothing if it finds that XXX contains the letter Q (there are lots of 
ways you could check that). See the section called "hints and tips" (or 
something like that) in the JAPE chapter of the user guide.
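Since you also asked about Java: if you'd rather do the Q check in Java on 
the RHS than with a negative rule, the logic itself is trivial. Below is a 
plain Java sketch, not GATE API code. `CodeFilter` and `isAcceptableCode` 
are made-up names, and it assumes you have already pulled the matched XXX 
string out of the document.

```java
import java.util.Locale;

// Hypothetical helper, not part of GATE: decides whether a candidate
// three-letter prefix (the XXX part) should be kept.
final class CodeFilter {
    private CodeFilter() {}

    // Returns true only if xxx is exactly three letters and contains no Q.
    // Treating 'q' and 'Q' alike is an assumption; drop the toUpperCase
    // call if you want a case-sensitive check.
    static boolean isAcceptableCode(String xxx) {
        if (xxx == null || xxx.length() != 3) {
            return false;
        }
        for (int i = 0; i < xxx.length(); i++) {
            if (!Character.isLetter(xxx.charAt(i))) {
                return false;
            }
        }
        return xxx.toUpperCase(Locale.ROOT).indexOf('Q') < 0;
    }
}
```

You would call something like this on the covered text and only create the 
annotation when it returns true. The negative rule keeps the same check 
inside JAPE instead, which is usually tidier.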

Second question: I'm not quite sure what you mean. The tokeniser doesn't 
have a whole-word parameter; perhaps you mean the gazetteer. Basically, 
no: it's up to you to define the behaviour you want in the rules. For 
example, you can check on the RHS of the rule whether the thing you're 
looking at is a whole word (by comparing its length with that of the Token 
starting at the same place, by looking at the spaces around it, or 
something like that).

Diana

Storey, Jeff wrote:
> Diana,
>
> That worked exactly like you said it would. I have two small follow-up
> questions if you don't mind...
>
> First, if I wanted to restrict XXX to not contain the letter Q, could I
> do that in the JAPE rule or is that something I would need to validate
> in my Java code (or maybe Java code in the JAPE rule)?
>
> Second, does the transducer have a whole word only parameter similar to
> the tokeniser?
>
> Thanks again for all of the help.
>
> Jeff 
> -----Original Message-----
> From: Diana Maynard [mailto:d.m...@dc...] 
> Sent: Thursday, November 30, 2006 9:28 AM
> To: Storey, Jeff
> Cc: gat...@li...
> Subject: Re: [gate-users] Adding Rules to the Tokeniser
>
> Hi Jeff
> If I'm not mistaken, the XXX and the ##### will be annotated as two 
> separate tokens if XXX consists of letters and ### numbers. So an 
> alternative method could be to just search for a pattern such as 
> ({Token.kind == word, Token.length == 3}{Token.kind == number})
> making sure that you add SpaceToken in your input annotations (so that 
> if the word and number have a space between them, they won't get 
> recognised by your rule).
> Then you could double check on the RHS of the rule that your 3 letter 
> "word" matches one of your 3 letter Lookups.
> Diana
>
>
> Storey, Jeff wrote:
>   
>> Ian,
>>
>> Thanks, that makes sense. The only problem now is that I'm
>> annotating a lot more data than I need (so ANNIE is using up more
>> memory/time than I would like). In this particular case, I'm looking for
>> any annotations that match the pattern:
>> XXX######## (where X is a letter and # is a digit).
>> To get this pattern recognized in a JAPE file, I am using one Gazetteer
>> to pick up the XXX combinations, since that is only part of a word. The
>> biggest problem is that this picks up pretty much every 3-letter
>> combination. Is there a way I can find terms that START with XXX and
>> only annotate those?
>>
>> Thanks.
>> Jeff
>>