| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | | | | | | 4 | 2 | | | | | |
| 2011 | | | 2 | | | | | | 1 | | | |
|
From: Tommi A. P. <tom...@he...> - 2011-09-01 22:15:54
|
Hi all,

I just committed changes to the HFST trunk that will make the HFST tools use the metadata headers more extensively. While this change is both backward and forward compatible with the HFST header parsing algorithm, I expect there will be a few breakages here and there, so be cautious while updating, and report bugs to the designated bug mail address or to SourceForge's bug tracker.

The list of metadata headers planned and in use will be in doc/hfst3-metadata-header-registry.rst and in some similarly named spot in the HFST wiki; suggestions and corrections are very welcome. While most of the metadata in the list is just ancillary end-user information, some of it can be used for extended functionality, such as Compression, which could be coupled with the foma format to store original, untouched (gzipped) foma transducers in the HFST3 transducer container.

--
Tommi A. Pirinen, Bachelor of computer science and Master of language, speech and translation technology <http://www.helsinki.fi/~tapirine/> |
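To illustrate how a Compression entry in the metadata header could let a container carry an untouched gzipped payload, here is a minimal sketch. The byte layout (a magic prefix, a 2-byte length, NUL-terminated key/value pairs), the key name "compression" and both function names are assumptions made for this example; they are not taken from the HFST sources or from the header registry.

```python
import gzip

MAGIC = b"HFST\x00"  # assumed magic prefix, for illustration only


def write_container(payload, metadata, compress=False):
    """Serialize metadata as NUL-terminated key/value pairs, then the payload."""
    meta = dict(metadata)
    if compress:
        meta["compression"] = "gzip"  # hypothetical key name
        payload = gzip.compress(payload)
    header = b"".join(
        key.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
        for key, value in meta.items()
    )
    # 2-byte little-endian header length: an assumption of this sketch
    return MAGIC + len(header).to_bytes(2, "little") + header + payload


def read_container(blob):
    """Return (metadata, payload), decompressing only if the header says so."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a container with the assumed magic prefix")
    pos = len(MAGIC)
    header_len = int.from_bytes(blob[pos:pos + 2], "little")
    pos += 2
    # Splitting "k\0v\0" leaves one empty trailing element, which is dropped.
    fields = blob[pos:pos + header_len].split(b"\x00")[:-1]
    meta = {
        fields[i].decode("utf-8"): fields[i + 1].decode("utf-8")
        for i in range(0, len(fields), 2)
    }
    payload = blob[pos + header_len:]
    if meta.get("compression") == "gzip":
        payload = gzip.decompress(payload)
    return meta, payload
```

A reader that does not understand the compression key can still parse the header and skip the payload, which is the property that makes purely informational keys and functional keys coexist in one header.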
|
From: Francis T. <ft...@pr...> - 2011-03-17 09:41:45
|
On Thu, 17 Mar 2011 at 01:31 +0200, Flammie Pirinen wrote:
> The HFST's new major version has been released as stable package now.
> New features include foma integration, dynamic linking to external
> backend libraries, xfst scripting language support (via foma) and more.
> Downloads are available in <http://sf.net/projects/hfst/>. Bugs and
> regressions can be reported to sourceforge's bug tracking system, via
> email to hfs...@he..., or at HFST's IRC channel #hfst on
> Freenode networks.

Excellent news! Congratulations! :)

Note for Apertium users: some programs have changed their names or options, so if you're using HFST in your language pair, you'll probably need to do some work on the Makefile. If you have any questions, feel free to ask on #hfst.

Fran |
|
From: Flammie P. <fl...@ik...> - 2011-03-16 23:31:26
|
HFST's new major version has now been released as a stable package. New features include foma integration, dynamic linking to external backend libraries, xfst scripting language support (via foma) and more.

Downloads are available at <http://sf.net/projects/hfst/>. Bugs and regressions can be reported to SourceForge's bug tracking system, via email to hfs...@he..., or on HFST's IRC channel #hfst on the Freenode network.

--
Flammie, computer scientist bachelor, linguist master, free software Finnish localiser, and more! <http://www.iki.fi/flammie/> |
|
From: Tommi A. P. <tom...@he...> - 2010-07-14 06:50:03
|
[Sorry for the slow answer]

On 2010-07-05, Brian Croom wrote:
> I'm looking into restoring the lookup functionality into libhfst that
> hasn't made it over from HFST2 yet, but I'm not sure what philosophy
> the library should be following. Should lookup/analysis functions
> (and support functions for tokenizing input strings) from the backend
> libraries be driving the lookup? SFST and foma both have such
> functions exposed, while OpenFST does not directly. This approach
> leads to considerable variance in the lookup operation with different
> backends as e.g. foma honors flag diacritics for its lookup while
> SFST does not.

Variance is not a bad thing here. One of the design decisions in HFST3 is that we can have all sorts of more or less limited backends to the library; a missing implementation will raise an exception, and the programmer can then recover from it (e.g. by converting or, in this case, by using composition and extracting paths). So I would go for using the underlying functions where possible. Of course, where functionality actually differs there should be different functions or signatures; otherwise it would be too confusing, I think. So for flag diacritics and different tokenizations there could be specialized functions.

> So would it instead be preferred to follow HFST2 in
> using HFST-specific methods for performing lookups and input string
> tokenization?

As long as it's still possible, this can be done as well. But the main HFST-specific lookup we need is most likely the one for optimized lookup transducers.

> I'm also wondering what design decisions have been made regarding
> the role of HFST2's Symbol and Key layers in the new library version.
> The code currently seems to have traces of key table usage which has
> been removed.

I think the aim is to reduce the complexity as much as possible, as the symbol and key distinction wasn't used for anything in the HFST2 tools.

> And what about HfstTransducer's is_trie member
> variable? Does it have any relation to the Trie class in
> HfstTokenizer.h?

The is_trie variable is for optimizations. Some algorithms, such as union, are an order of magnitude faster when operating on two trie-shaped transducers, or on a trie-shaped and a path-shaped transducer. The trie backing the default tokenizer is just a lightweight implementation of the data structure, with no relation to the trie-shaped transducers of the main library; it's probably the fastest and simplest way to perform left-to-right longest-match tokenization.

--
Tommi A. Pirinen, Bachelor of computer science and Master of language, speech and translation technology <http://www.helsinki.fi/~tapirine/> |
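The left-to-right longest-match tokenization mentioned above can be sketched with a small trie of known multi-character symbols. This is a generic illustration of the technique, not the code in HfstTokenizer.h; the class and function names and the example symbols are made up for the sketch.

```python
class Trie:
    """A minimal character trie holding the known multi-character symbols."""

    def __init__(self):
        self.children = {}
        self.is_symbol = False

    def add(self, symbol):
        node = self
        for ch in symbol:
            node = node.children.setdefault(ch, Trie())
        node.is_symbol = True

    def longest_match(self, text, start):
        """Return the end index of the longest symbol at text[start:],
        or start itself when no known symbol begins there."""
        node, best = self, start
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_symbol:
                best = i + 1
        return best


def tokenize(text, trie):
    """Split text left to right, always taking the longest known symbol."""
    tokens, pos = [], 0
    while pos < len(text):
        end = trie.longest_match(text, pos)
        if end == pos:
            end = pos + 1  # unknown character: emit it as a single token
        tokens.append(text[pos:end])
        pos = end
    return tokens


symbols = Trie()
for sym in ("+N", "+Pl", "@U.CASE.NOM@"):  # example symbols only
    symbols.add(sym)
print(tokenize("cat+N+Pl", symbols))  # ['c', 'a', 't', '+N', '+Pl']
```

Each input position is visited at most once per candidate symbol length, so the cost is bounded by the input length times the longest symbol, with no backtracking over already emitted tokens.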
|
From: Brian C. <bri...@gm...> - 2010-07-05 20:40:53
|
Hi,

I'm looking into restoring the lookup functionality into libhfst that hasn't made it over from HFST2 yet, but I'm not sure what philosophy the library should be following. Should lookup/analysis functions (and support functions for tokenizing input strings) from the backend libraries be driving the lookup? SFST and foma both have such functions exposed, while OpenFST does not directly. This approach leads to considerable variance in the lookup operation with different backends, as e.g. foma honors flag diacritics for its lookup while SFST does not. So would it instead be preferred to follow HFST2 in using HFST-specific methods for performing lookups and input string tokenization?

I'm also wondering what design decisions have been made regarding the role of HFST2's Symbol and Key layers in the new library version. The code currently seems to have traces of key table usage which has been removed.

And what about HfstTransducer's is_trie member variable? Does it have any relation to the Trie class in HfstTokenizer.h?

Thanks for the help,
--Brian Croom |
|
From: Krister L. <kri...@he...> - 2010-06-29 14:37:28
|
Brian Croom wrote:
> I also have a question about foma transducer I/O. The HFST specific
> write/read functions in foma's io.c work only with a plain-text format,
> while foma's native functions gzip the entire thing. Is there a reason
> for HFST's not doing the same thing?

Our transducers are designed to be used in the input stream of the command line functions, so that they can be piped from one command line tool to the next. However, the gzip library does not seem to work with streams. Since gzipping is only a disk-saving operation, we decided that it was not crucial for our purposes at present.

Our current speller transducers are normally modest in size, i.e. around or below 10 MB. If the size of the transducers starts to grow above several hundred MB, e.g. for weighted tagger or grammar language models, it may still become interesting to provide gzipping, simply to speed up the loading of model files from the disk.

Regards,
Krister |
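For comparison, gzip framing as such can be produced and consumed chunk by chunk, without seeking, which is what a pipe between command line tools requires. The sketch below uses Python's standard zlib module and says nothing about the behaviour of the C library used by foma or HFST at the time; the function names are made up for the example.

```python
import zlib

# Adding 16 to the window size selects gzip framing instead of raw zlib framing.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def gzip_chunks(chunks):
    """Compress an iterable of byte chunks incrementally into a gzip stream."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, GZIP_WBITS)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()  # emits any buffered data and the gzip trailer


def gunzip_chunks(chunks):
    """Decompress a gzip stream that arrives as arbitrary byte chunks."""
    decomp = zlib.decompressobj(GZIP_WBITS)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
```

Because both generators only ever look at the current chunk, they can sit between a reader on standard input and a writer on standard output.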
|
From: Brian C. <bri...@gm...> - 2010-06-28 21:25:42
|
Hi all,

For those who don't know me, I am a GSoC student and have been developing the hfst-proc tool for Apertium to allow integrating morphological analysis/generation with HFST transducers into the Apertium MT pipeline. A secondary goal of my project is to get foma transducers working with hfst-proc, so I have started working on the tools needed to convert foma transducers to the HFST optimized lookup format.

After working with the HfstInput/OutputStream classes a bit, I have a question about the design of the HFST3 header processing. On the input side of things, header processing is currently split between the HfstInputStream frontend, where detection of the transducer type is done, and the backend implementation classes, which are also aware of the header so that they can skip past it when loading. Writing the header is also done by the backend implementations. My understanding of the header is that it is supposed to encapsulate the actual transducer. If that is the case, would it not be more sensible to have all the header processing handled in the frontend classes, leaving the implementation classes unaware of the header?

I also have a question about foma transducer I/O. The HFST-specific write/read functions in foma's io.c work only with a plain-text format, while foma's native functions gzip the entire thing. Is there a reason for HFST's not doing the same thing?

Cheers,
--Brian Croom |
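The arrangement proposed here, with the frontend consuming the whole header and the backends seeing only the bare transducer, could look roughly like the sketch below. The header layout (magic prefix, 2-byte length, NUL-terminated key/value pairs), the "type" key and every function name are assumptions for illustration, not the HfstInputStream API.

```python
import io

MAGIC = b"HFST\x00"  # assumed magic prefix, for illustration only


def read_header(stream):
    """Consume the header from the stream and return its key/value pairs."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing header")
    header_len = int.from_bytes(stream.read(2), "little")  # assumed width
    fields = stream.read(header_len).split(b"\x00")[:-1]
    return {
        fields[i].decode("utf-8"): fields[i + 1].decode("utf-8")
        for i in range(0, len(fields), 2)
    }


# Hypothetical backend loaders: each one sees only the transducer payload.
def load_sfst(stream):
    return ("SFST", stream.read())


def load_foma(stream):
    return ("FOMA", stream.read())


BACKENDS = {"SFST": load_sfst, "FOMA": load_foma}


def open_transducer(stream):
    """Frontend: parse the header, pick a backend, hand over the rest."""
    meta = read_header(stream)
    loader = BACKENDS.get(meta.get("type"))
    if loader is None:
        raise ValueError("unsupported or missing type: %r" % meta.get("type"))
    # The stream is now positioned just past the header.
    return loader(stream)


header = b"type\x00FOMA\x00"
blob = MAGIC + len(header).to_bytes(2, "little") + header + b"payload"
print(open_transducer(io.BytesIO(blob)))  # ('FOMA', b'payload')
```

With this split, adding a header field never touches backend code, and a backend's native reader can be reused unchanged on the remainder of the stream.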
|
From: Tommi P. <tom...@he...> - 2010-06-02 13:14:58
|
Just a heads up that I deleted openfst from the libhfst source tree, since bundling Apache-licenced libraries is considered evil. If you wish to continue using OpenFST with hfst3, you need to have libfst and libfstmain installed, as well as the openfst headers, in a nice location where autotools can find them. In source code, use #include <fst/fstlib.h> instead of #include "openfst-1.1/src/include/fst/fstlib.h".

--
Tommi A Pirinen, Bachelor of computer science and Master of language technology |