A new version of the apertium-tagger-training-tools has been released (version 1.0.0).
Apertium-tagger-training-tools provides a set of tools to train, in an unsupervised way, the hidden-Markov-model-based part-of-speech taggers used by the Apertium MT platform. To that end, training uses information not only from the source language but also from the target language and from the remaining modules of the Apertium MT platform. ...
Spanish-Portuguese data have been adapted to Apertium 3.0 and released as apertium-es-pt 1.0.3.
Note that, as in previous releases, apertium-es-pt supports both Brazilian and European Portuguese.
Apertium 3.0, the latest version of the open-source machine translation platform Apertium (packages apertium and lttoolbox), has just been released, together with language-pair data for the Spanish-Catalan pair (version 1.0.4), adapted for use with the new version.
Earlier versions worked with single-byte character sets such as ISO-8859-1, which restricted their use to some Western languages. Version 3.0 of Apertium has been completely reworked to be fully Unicode-capable: both the text to be translated and the language-pair data needed to translate it can now be encoded in Unicode. ...
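To illustrate the limitation described above, here is a minimal Python sketch (not part of Apertium; the helper name `fits_iso_8859_1` and the sample strings are chosen only for this example). It checks whether a piece of text can be represented in the single-byte ISO-8859-1 character set, and shows that the same text can always be encoded as UTF-8, one of the Unicode encodings.

```python
def fits_iso_8859_1(text: str) -> bool:
    """Return True if every character of `text` exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Illustrative samples only: Catalan fits in ISO-8859-1, Greek does not.
samples = ["català", "Ελληνικά"]

for sample in samples:
    utf8_bytes = sample.encode("utf-8")  # always succeeds for valid text
    print(sample, fits_iso_8859_1(sample), len(utf8_bytes))
```

Text for which the check returns False could not be handled by a pipeline restricted to ISO-8859-1, which is the kind of restriction a Unicode-capable version removes.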
apertium-transfer-tools provides a set of tools for the automatic generation of shallow-transfer machine translation (MT) rules from parallel corpora. The generated transfer rules (in XML format) can be directly used by the Apertium MT platform (http://www.apertium.org).
Although this package is aimed at generating Apertium transfer rules, it can be adapted to generate shallow-transfer rules for other MT platforms. Moreover, some of the tools it provides can be used for other purposes, such as extracting bilingual phrase pairs or symmetrizing previously computed alignments. ...
A new version of the apertium-tagger-training-tools has been released (version 0.9.2).
This release does not add new features; only minor changes have been made to make it compatible with the latest version of lttoolbox and apertium (version 2.0).
A set of new packages has been released as part of the Apertium project, as a result of work funded by the Generalitat de Catalunya (Catalan autonomous government):
* lttoolbox-2.0 and apertium-2.0: a new version of the Apertium machine translation toolbox (with backward compatibility with Apertium 1.0 language-pair data files). The new package features:
- An enhanced structural-transfer ("translation rules") module able to perform more complicated operations in order to treat less-related language pairs such as English-Catalan
- An experimental, optional lexical selection module which tries to deal with polysemous words (under development, not currently used by any language package)
- A new format for dictionaries that allows for a more powerful definition of inflection paradigms and supports multiple equivalents in bilingual dictionaries (to be used in connection with the lexical selection module).
- A new translation script that allows for easy maintenance. ...
This new release of Apertium includes:
* An enhanced structural-transfer module able to perform more complicated operations in order to treat less-related language pairs.
* A new translation script that allows for easy maintenance.
This is a preview of the upcoming Apertium 2.0 release, which will include other improvements.
A new version (1.0) of the Aranese Occitan - Catalan linguistic data for the apertium open-source machine translation system (www.sf.net/projects/apertium/) has just been released.
A new version (0.8) of the French - Catalan linguistic data for the apertium open-source machine translation system (www.sf.net/projects/apertium/) has just been released.
A new version (0.9) of the Occitan (Aranese) - Catalan linguistic data for the apertium open-source machine translation system (www.sf.net/projects/apertium/) has just been released.
Thanks to the work of Fran Tyers, Sergio Talens, and other Debian people, Apertium (http://www.apertium.org) packages have started to be available as Debian Linux packages ready to be installed as binaries for a number of platforms. They are part of the "unstable" distribution.
Currently, Debian distributes version 1.0 of the apertium and lttoolbox packages. Updated versions (1.0.3) will soon be available. ...
The CVS tree of Apertium contains linguistic data to build a Swedish-Danish MT system (module apertium-sv-da). These data are incomplete and may contain a number of errors: they belong to an abandoned project and have not been tested. The Apertium team welcomes developers for this pair or other Scandinavian language pairs. The relatedness of these languages makes it feasible to build a reasonable MT system based on Apertium.
This package contains a simple Perl script to evaluate Apertium-based machine translation (MT) systems.
The evaluation consists of the computation (at document level) of the word error rate (WER) and the position-independent word error rate (PER) between a translation performed by the Apertium MT system and a reference translation obtained by post-editing the system output.
This package can be easily adapted to evaluate other MT systems.
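The two measures mentioned above can be sketched in Python as follows. This is not the Perl script from the package, only a minimal illustration under stated assumptions: tokens are obtained by splitting on whitespace, WER is the word-level edit distance divided by the reference length, and PER uses one common bag-of-words formulation; the actual script may tokenize or normalize differently. The function names `wer` and `per` are chosen for this example.

```python
from collections import Counter


def edit_distance(ref: list, hyp: list) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of `hypothesis` against `reference` (document level)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent word error rate: word order is ignored.

    Assumed formulation: (max(|ref|, |hyp|) - matching words) / |ref|.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Illustrative strings only: `mt_output` stands for raw system output and
# `post_edited` for the reference obtained by post-editing it.
mt_output = "the cat on the mat sat"
post_edited = "the cat sat on the mat"
print(wer(post_edited, mt_output), per(post_edited, mt_output))
```

Because PER ignores word order, it is never higher than WER under these definitions; a large gap between the two suggests that many errors are reorderings rather than wrong word choices.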
lttoolbox-1.0.3 and apertium-1.0.3 have been uploaded. They correct minor bugs that had been observed.
Bug fixed in lttoolbox-1.0.3:
- Blank characters other than ISO-8859-1 number 32 (" ") broke multiword expressions. [Fixed]
Bug fixed in apertium-1.0.3:
- Preference rules in the tagger definition file (.tsx) did not work properly. [Fixed]
Affected part-of-speech taggers should be recompiled to take advantage of the new apertium release.
First release of French-Catalan language pair data for Apertium (6500 lemmata, 60 transfer rules). The French-Catalan language pair is actively being developed with the support of the Generalitat de Catalunya; new versions will be released before the end of 2006.
Apertium-tagger-training-tools is a new software package useful to train the part-of-speech taggers used within the open-source machine translation system Apertium.
Using this package, you will be able to train, in an unsupervised way, the part-of-speech tagger for a given language using information from another language by means of the apertium MT toolbox. In particular, the package may simplify the initial building of a machine translation system for a new pair of languages. ...
Apertium has recently received funding from the Generalitat de Catalunya (the government of the autonomous community of Catalonia in Spain) to develop new language pairs (Occitan-Catalan, French-Catalan) and an improved transfer architecture to include more difficult pairs such as English–Catalan. The new transfer architecture, which will be released in late 2006, will deal with polysemic words having more than one possible translation and will be able to do more extensive syntactical transformations.
Package apertium-oc-ca provides linguistic data for translation between the Aranese variant of Occitan and the Catalan language; this is the fourth language-pair package available for Apertium. This language pair is actively being developed; therefore, new releases with more data will soon be made available.
A new version of the lttoolbox lexical processing package (1.0.2) has been uploaded. It corrects some observed bugs.
In recent weeks, the Apertium team has released:
* Version 1.0.1 of the Apertium engine (packages lttoolbox and apertium).
* Version 0.9 of the Spanish-Portuguese package (package apertium-es-pt; developers welcome!).
* Documentation on how to install apertium and how to add data to an existing language pair.