From: Ian R. <i.r...@dc...> - 2005-08-22 06:41:35
On Sun, 21 Aug 2005 sr...@ug... wrote:
>
> The docs make frequent mention of the ANNIE English Tokenizer (as
> distinct from the ordinary DefaultTokenizer). It seems that the gui,
> for example, reads in creole.xml, which in turn defines the ANNIE
> English Tokenizer as the DefaultTokenizer plus some extra
> post-processing routines.

Terminology - the GATE Unicode Tokeniser is the SimpleTokeniser class.
The ANNIE English Tokeniser is the DefaultTokeniser class.

> My question is this: How do I call this from code? In other words,
> how do I use the ANNIE English Tokenizer when I'm using GATE as a
> library?

The English tokeniser is a processing resource the same as any other, so
you can instantiate it in the usual way via the factory, e.g.

  LanguageAnalyser tok = (LanguageAnalyser)Factory.createResource(
      "gate.creole.tokeniser.DefaultTokeniser",
      Factory.newFeatureMap());

The DefaultTokeniser PR internally creates a SimpleTokeniser and a JAPE
transducer for the postprocessing.

Ian

-- 
Ian Roberts               | Department of Computer Science
i.r...@dc...              | University of Sheffield, UK