From: Brian C. <bri...@gm...> - 2010-06-28 21:25:42
Hi all,

For those who don't know me, I am a GSoC student developing the hfst-proc tool for Apertium, which integrates morphological analysis and generation with HFST transducers into the Apertium MT pipeline. A secondary goal of my project is to get foma transducers working with hfst-proc, so I have started on the tools needed to convert foma transducers to the HFST optimized lookup format.

After working with the HfstInput/OutputStream classes a bit, I have a question about the design of the HFST3 header processing. On the input side, header processing is currently split between the HfstInputStream frontend, where the transducer type is detected, and the backend implementation classes, which are also aware of the header so that they can skip past it when loading. Writing the header is also done by the backend implementations. My understanding is that the header is supposed to encapsulate the actual transducer. If that is the case, would it not be more sensible to handle all header processing in the frontend classes and leave the implementation classes unaware of the header?

I also have a question about foma transducer I/O. The HFST-specific read/write functions in foma's io.c work only with a plain-text format, while foma's native functions gzip the entire thing. Is there a reason the HFST-specific functions don't do the same?

Cheers,
--Brian Croom
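
To make the header question above concrete, here is a minimal C++ sketch of the proposed split. It is only an illustration under stated assumptions: the header layout (a NUL-terminated "HFST3" magic followed by a NUL-terminated type name), the type names, and the functions read_header and read_payload are all hypothetical and are not taken from the HFST sources. The point is only that the frontend consumes the entire header, so the backend receives a stream already positioned at the transducer data.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical backend identifiers; the real HFST type names may differ.
enum class BackendType { Sfst, OpenFst, Foma };

// Frontend: reads and validates the whole header, then reports which
// backend should load the payload. The layout assumed here
// ("HFST3\0" + "<TYPE>\0") is an illustration, not the real HFST3 format.
BackendType read_header(std::istream& in) {
    std::string magic, type;
    if (!std::getline(in, magic, '\0') || magic != "HFST3")
        throw std::runtime_error("not an HFST3 stream");
    if (!std::getline(in, type, '\0'))
        throw std::runtime_error("truncated header");
    if (type == "SFST") return BackendType::Sfst;
    if (type == "OPENFST") return BackendType::OpenFst;
    if (type == "FOMA") return BackendType::Foma;
    throw std::runtime_error("unknown transducer type: " + type);
}

// Backend stand-in: knows nothing about the header and simply reads
// whatever remains in the stream as the transducer payload.
std::string read_payload(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With this arrangement a caller would invoke read_header first and then hand the same stream to whichever backend loader matches the returned type; writing would mirror this, with the frontend emitting the header before the backend writes its data.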