$ hfst-info
No tests selected; printing known data
HFST info version: 0.1
HFST packaging: hfst 3.7.1
HFST version: 3.7.1
HFST long version: 300070001
HFST configuration revision: $Revision: 3900 $
OpenFst supported
SFST supported
Unicode support: glib
$ uname -a
Linux mike-HP-ProBook-6560b 3.13.0-30-generic #54-Ubuntu SMP Mon Jun 9 22:45:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
This patch adds a simple hfst-pmatch preprocessor that copies the filename of a binary transducer into a pmatch regular expression file. Because the patch is applied with -p0, the source directory must be named "hfst/" and the command must be run from the directory that contains it:
$ patch -p0 -i hfst-with-pmatch-pproc.patch
The patch modifies configure.ac in hfst/ and Makefile.am in hfst/tools/src/, and adds hfst-pmatch-proc.cc to hfst/tools/src/. Once the patch is applied, build and install the patched hfst with:
$ make && sudo make install
In a language directory (e.g. langs/sme), create a regexp.pmatch file. Write the placeholder @InsertAnalyserBin at the place where the analyser belongs; the preprocessor replaces it with a reference to the binary transducer, for example @bin"analyser-gt-desc.hfst". An example:
/// (e.g. regexp.pmatch in langs/sme/tools/preprocessor)
Define Terminator {!} | {?} | {.} | {,} | {;} | {:};
Define WhiteSpace Whitespace EndTag(WS) ;
Define FormatMarkUp [{<} | {</}] Alpha+ [{>}] EndTag(Format) ;
Define Deliminator # | WhiteSpace | Terminator | FormatMarkUp ;
Define Word LC(Deliminator) @InsertAnalyserBin RC(Deliminator) EndTag(SamWord) ;
Define TOP Word | FormatMarkUp | WhiteSpace ;
///
Next, run the preprocessor with the analyser as input (here in langs/sme/tools/preprocessor):
$ cd langs/sme/tools/preprocessor
$ hfst-pmatch-pproc -i ../../src/analyser-gt-desc.hfst
The preprocessor replaces the placeholder with the analyser path, compiles the resulting pmatch file, and writes the result to regexp.hfst in binary hfst format:
/// (temporary pmatch file to be compiled, see @bin"../../src/analyser-gt-desc.hfst")
Define Terminator {!} | {?} | {.} | {,} | {;} | {:};
Define WhiteSpace Whitespace EndTag(WS) ;
Define FormatMarkUp [{<} | {</}] Alpha+ [{>}] EndTag(Format) ;
Define Deliminator # | WhiteSpace | Terminator | FormatMarkUp ;
Define Word LC(Deliminator) @bin"../../src/analyser-gt-desc.hfst" RC(Deliminator) EndTag(SamWord) ;
Define TOP Word | FormatMarkUp | WhiteSpace ;
///
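The placeholder substitution described above can be imitated with sed. This is only a sketch of the text replacement, not the actual hfst-pmatch-pproc implementation, and it does not compile anything. The temporary directory, the file name expanded.pmatch and the variable names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: emulate the @InsertAnalyserBin substitution with sed.
# The real hfst-pmatch-pproc may work differently internally.
set -eu

workdir=$(mktemp -d)                          # hypothetical scratch directory
analyser='../../src/analyser-gt-desc.hfst'    # path taken from the example above

# A one-line stand-in for regexp.pmatch containing the placeholder.
cat > "$workdir/regexp.pmatch" <<'EOF'
Define Word LC(Deliminator) @InsertAnalyserBin RC(Deliminator) EndTag(SamWord) ;
EOF

# Replace every @InsertAnalyserBin with @bin"<path>".
# '|' is used as the sed delimiter so the slashes in the path need no escaping.
sed "s|@InsertAnalyserBin|@bin\"$analyser\"|g" \
    "$workdir/regexp.pmatch" > "$workdir/expanded.pmatch"

# Show the expanded definition.
cat "$workdir/expanded.pmatch"
```

The expanded line should match the Define Word line of the temporary pmatch file shown above.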
Finally, run hfst-pmatch with the compiled regexp.hfst:
$ echo "Mun ja son." | hfst-pmatch regexp.hfst
<samword>mun+Pron+Pers+Sg1+Nom</samword><ws> </ws><samword>ja+CC</samword><ws> </ws><samword>son+Pron+Pers+Sg3+Nom</samword>.
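If only the analyses are needed, the tagged output can be post-processed with standard text tools. The following is a minimal sketch that works on the sample line above rather than calling hfst-pmatch; the variable names are assumptions, and it relies on the analyses themselves containing no "<" character.

```shell
#!/bin/sh
# Sketch only: extract the contents of the <samword> elements, one per line.
set -eu

# Sample line copied from the hfst-pmatch output above.
tagged='<samword>mun+Pron+Pers+Sg1+Nom</samword><ws> </ws><samword>ja+CC</samword><ws> </ws><samword>son+Pron+Pers+Sg3+Nom</samword>.'

# Keep only the <samword> elements (one match per line), then strip the tags.
analyses=$(printf '%s\n' "$tagged" \
    | grep -o '<samword>[^<]*</samword>' \
    | sed -e 's|<samword>||' -e 's|</samword>||')

printf '%s\n' "$analyses"
```

In a real pipeline the output of hfst-pmatch would be piped into the grep and sed stage instead of the fixed sample string.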