From: <silvia.gnome@...> - 2008-04-08 10:36:29
|
Hi!

I'm new to the list so, first of all, I'll briefly introduce myself. My name is Sílvia and I'm from Barcelona. I've been a Linux user for about 3 years (I currently use Ubuntu) and I've been working as a full-time freelance translator for about half a year (for now, 80% under a virtualised Windows). I'm also involved in the GNOME Translation Project; I've been translating and proofreading GNOME files into Catalan for a couple of years.

In the Catalan translation team, we generally use Gtranslator or Gedit to translate .po files; however, these two editors don't support TMs. In addition, I've now been hired to translate OpenOffice documentation, together with a few more translators, and the project is so large (I've translated it before as a volunteer...) that having a TM is extremely important in order to avoid inconsistencies.

So, the question is: is OmegaT suitable for translating .po files? Do I need to tweak the files in any way, or are they fully supported by the application?

What about sharing TMs? Since the OOo translation is a collaborative project, we would like to share our TMs as they grow. Can this be done using OmegaT?

Any suggestions and opinions will be greatly appreciated!

Thanks,

Sílvia |
From: Marc P. <mail@...> - 2008-04-08 10:47:26
|
This has come up several times recently; try searching the forum archives.

Those of you who have experience of handling .po in OmT: is there any chance of the situation/procedure being documented and included in the manual?

Marc |
From: Jean-Christophe H. <fusion@...> - 2008-04-08 11:40:12
|
On 8 Apr 08, at 19:47, Marc Prior wrote:

> This has come up several times recently; try searching the forum archives.
>
> Those of you who have experience of handling .po in OmT: is there any chance of the situation/procedure being documented and included in the manual?

I planned to propose to Vito a chapter about each file format's idiosyncrasies but lacked time. That is on my to-do list for the next version of the addendum to Vito's work.

Basically, PO files work like any other files: put them in /source/ and OmegaT will parse the contents of all the msgid strings.

The problem is that PO files sometimes come with legacy translation data in msgstr (fuzzies or old translations). OmegaT not only ignores such contents but _overwrites them_ with the contents of msgid when creating the translated file.

The fact that OmegaT ignores such contents is known, and fixing this is an existing RFE. The fact that OmegaT overwrites the contents of msgstr is a registered bug.

Basically, OmegaT works well with PO files that are created from scratch, without anything in msgstr (i.e. with no legacy translation data). PO files created from file formats that OmegaT does not support directly, with the aid of Okapi (.NET or Mono) or po4a (Debian, Perl utility) for example, should be handled without problems in OmegaT.

Files that include legacy data should be processed to remove the legacy data sets (pairs of already translated msgid/msgstr) or to remove fuzzy data (the ancestor of the TM for PO files...). This can be done with the aid of some tools in the Translate Toolkit (Python).

Obviously PO support is currently not optimal, especially considering that a lot of PO-based processes include legacy data, but current development will probably lead to better handling of such "bilingual" data sets (including XLIFF). I leave that part to Alex and Didier for comments.
Sílvia Miranda wrote:

> In the Catalan translation team, we generally use Gtranslator or Gedit to translate .po files; however, these two editors don't support TMs. In addition, I've now been hired to translate OpenOffice documentation, together with a few more translators, and the project is soo large (I've translated it before as a volunteer...) that having a TM is extremely important in order to avoid inconsistencies.

OpenOffice.org has a dedicated l10n list where OmegaT and PO have been extensively discussed. I suggest you check their archives, where you'll basically see a variation of the above contents. We (I work in the French and Japanese teams) had a lot of questions recently with the shift to Pootle, and a lot has been clarified thanks to the Translate Toolkit team's explanations.

> So, the question is: is OmegaT suitable for translating .po files? Do I need to tweak the files in any way, or are they fully supported by the application?

Yes, OmegaT is suitable. Yes, you need to "tweak" the files with the tools provided by the Translate Toolkit.

> What about sharing TMs? Since the OOo translation is a collaborative project, we would like to share our TMs as they grow. Can this be done using OmegaT?

Yes, this is possible in OmegaT. The Italian team does it (see the l10n list archives), and the Japanese team does it too. Keep in mind that OmegaT is not (yet) designed for multiuser tasks, so team reviewing and data sharing can sometimes be a relatively management-heavy process.

But despite its drawbacks, OmegaT is the only acceptable multiplatform free CAT tool available on the market, and it does a very decent job for the tasks it is designed for (namely single-user translation of document files as opposed to preprocessed multilingual localization files)...

Jean-Christophe Helary
------------------------------------
http://mac4translators.blogspot.com/ |
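The overwriting behaviour described above suggests a simple sanity check on a PO file produced by OmegaT: look for entries whose msgstr is an exact copy of msgid. The following is only a minimal sketch under stated assumptions. The function name `source_copied_entries` is hypothetical, the regular expression handles only entries whose msgid and msgstr each fit on a single line, and some hits may be legitimate (strings such as "OK" are often identical in both languages).

```python
import re

# Assumption: only simple entries are matched, i.e. a one-line msgid
# immediately followed by a one-line msgstr. Multi-line strings, plural
# forms and msgctxt are not handled by this sketch.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def source_copied_entries(po_text):
    """Return msgids whose msgstr is an exact copy of the source text."""
    return [msgid for msgid, msgstr in ENTRY.findall(po_text)
            if msgid and msgstr == msgid]


sample = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "Open file"
msgstr "Obre el fitxer"

msgid "Save file"
msgstr "Save file"
'''

print(source_copied_entries(sample))  # prints ['Save file']
```

The header entry (empty msgid) is skipped on purpose, since its msgid and msgstr lines are both empty.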
From: Samuel M. <leuce@...> - 2008-04-08 12:35:22
|
Jean-Christophe Helary wrote:

> Problem is that PO files sometimes come with legacy translation data in msgstr (fuzzies or old translations)...

From a PO file user's perspective, they are not "old translations" but "existing translations". Such a PO file is considered partially translated. OmegaT assumes all PO files are 100% untranslated, and OmegaT assumes that the PO file will be 100% translated by the time the translator passes it on to the next person or to the client.

It may be prudent not to refer to fuzzies and existing translations as "legacy" data. From a PO file's perspective, data is legacy if its source text exists in the PO file but not in the file from which the PO file was created. And some PO files do contain such legacy strings (the OmegaT equivalent would be "orphaned strings"). Translations of strings in the PO file that also exist in the original file are considered current translations, not legacy translations (even if from OmegaT's perspective they are "pre-existing" translations).

Samuel |
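The distinction drawn above can be made concrete with a small classifier. This is a rough sketch, not how any existing tool works: the names `classify_entries` and `_field` are hypothetical, entries are assumed to be separated by blank lines, and plural forms and msgctxt are ignored. It relies on two gettext conventions: a `#, fuzzy` flag line marks a fuzzy entry, and a `#~` prefix marks an obsolete entry (the "legacy" case in the sense used above).

```python
import re


def _field(lines, keyword):
    """Join the quoted pieces of one field (keyword line plus continuations)."""
    pieces, active = [], False
    for ln in lines:
        if ln.startswith(keyword + " "):
            active = True
            pieces.append(ln[len(keyword) + 1:].strip()[1:-1])
        elif active and ln.startswith('"'):
            pieces.append(ln[1:-1])
        else:
            active = False
    return "".join(pieces)


def classify_entries(po_text):
    """Count PO entries as translated, fuzzy, untranslated or obsolete."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0, "obsolete": 0}
    for block in re.split(r"\n\s*\n", po_text.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        if any(ln.startswith("#~") for ln in lines):
            counts["obsolete"] += 1          # source string no longer exists
        elif not _field(lines, "msgid"):
            continue                         # header or comment-only block
        elif any(ln.startswith("#,") and "fuzzy" in ln for ln in lines):
            counts["fuzzy"] += 1             # has a msgstr, but needs review
        elif _field(lines, "msgstr"):
            counts["translated"] += 1        # a current, existing translation
        else:
            counts["untranslated"] += 1
    return counts


sample = '''
#: app.c:10
msgid "Open"
msgstr "Obre"

#, fuzzy
msgid "Save as"
msgstr "Desa"

msgid "Close"
msgstr ""

#~ msgid "Old string"
#~ msgstr "Cadena antiga"
'''

print(classify_entries(sample))
```

In these terms, only the last entry of the sample is legacy data; the first is a current translation that a PO-aware tool would keep.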
From: Vito S. <smolejv@...> - 2008-04-09 12:43:21
|
OmegaT sees the world of translations as separated into sources and targets, with a few selected files straddling the divide. The one and only translation memory is bound in its context to the source of the project, i.e. anything I add in the form of TMX files is conditional upon being accepted (Enter, Enter ... Enter) when the suitable/identical segment turns up in the source. (Of course there are ways around it, but the fact that I have to go around the block to achieve it is a good enough argument.)

The point I want to make is that bilingual material is, for whatever reason (memory problems with big(ger) TMs?), frowned upon. That includes XLF and PO material among others, both on the input and on the output side. I already wrote an RFE suggesting that at least XLF be accepted as a template-able format. The same could of course apply to PO: whatever is already there in the msgstr, be it legacy, my past sins, or whatever you may want to call it (I call it my assets). And if it is indeed memory considerations behind this separation into project TM and the rest of the world, oh well, I may write another RFE (while being sure I would not be the first one to address this point).

JC, re documenting it: this could be a nice HowTo. And there are some other subjects that need to be explained in a little more detail, starting with the spellchecker etc. I would invite everybody to add his/her two cents to http://sourceforge.net/tracker/?atid=912617&group_id=68187&func=browse

Note: first check the points already entered, so we don't get duplicates.

Regards,

Vito |
From: Didier B. <d.briel@...> - 2008-04-11 09:12:05
|
-----Original Message-----
>From: Om...@ya... [mailto:Om...@ya...] On Behalf Of Jean-Christophe Helary
>Sent: Tuesday, April 08, 2008 1:40 PM
>To: Om...@ya...
>Subject: Re: [OmT] Is OmegaT suitable for editing .po files?

>Obviously PO support is currently not optimal, especially when considering that a lot of PO based processes include legacy data, but current development will probably lead to better handling of such "bilingual" data sets (including XLIFF). I leave that part to Alex and Didier for comments.

We're certainly interested in handling bilingual documents generally (there are RFEs on that). First, we have to introduce the concept of bilingual documents in OmegaT. Currently, OmegaT always replaces text in target documents; it never adds.

When something is done, it will probably be for XLIFF first, because that can be done generically for XML. The issue with PO is that everything is hard-coded (we do not have a generic "PO parser").

Didier |
From: Samuel M. (Home) <leuce@...> - 2008-04-08 19:43:13
|
Sílvia Miranda wrote:

> So, the question is: is OmegaT suitable for translating .po files? Do I need to tweak the files in any way, or are they fully supported by the application?

You can translate PO in OmegaT, but...

Use the Translate Toolkit's pocount to determine how much of the PO file is translated. This will help determine whether you're going to follow the long route or the short route. If very little of the PO file is untranslated, you can follow the short route. By "very little" I mean the actual number, not necessarily the percentage.

With both routes, the PO file must be preprocessed. With the short route, no post-processing is required, but the translation process itself contains potentially annoying repetitive steps. With the long route, the annoyingly repetitive steps are reduced, but command-line post-processing is required (not complicated, though).

SHORT ROUTE

1. Create a TMX file from the PO file using the Translate Toolkit's po2tmx tool. This TMX will contain only strings from the PO file that are marked as "translated" (not fuzzy or untranslated).

2. Use the TMX file as a reference TM in OmegaT (put it in the /tm/ folder), and reload the project in OmegaT. You can have other reference TMs also, of course.

3. Put the PO file in the /source/ folder of OmegaT.

4. Translate as normal. When you encounter a 100% match from the TM, this is likely a string which was translated in the original PO file. With OmegaT you can't autotranslate from a reference TM, so when you encounter a 100% match from the TM, just insert/replace that match.

5. You can close and reopen OmegaT any number of times during the translation, but *do not* deliver the PO file to the client unless you have translated the entire file. This is because OmegaT inserts the source text into the msgstr field for all segments that are untranslated, and if the msgstr is not empty (and there is no fuzzy marker), many PO tools will regard that string as "translated".

LONG ROUTE

1. Extract all fuzzy marked translations from the PO file using the Translate Toolkit's pofilter tool (you can also use a Gettext tool, but I'm not sure which). You now have the original PO file and a file containing only the fuzzy marked strings.

2. Open the file with the fuzzy marked strings in a regular PO editing tool such as PoEdit, or in a word processor or plain-text editor, and proofread the translations. If a string is too complicated to proofread, delete the entire string from the file. Also remove from that file the lines that say "fuzzy".

3. Merge the proofread file back into the original file (I think you can use pomerge for it, or some Gettext tool). This process will replace the original fuzzy strings with the corrected fuzzy strings, but it will not touch the original fuzzy strings that were not corrected.

Steps 1 - 3 may be skipped if there are very few fuzzies or to save time... but sometimes not skipping them can also save time (depending on the usefulness of the fuzzy strings in the PO file).

4. Create a TMX from the updated PO file, using po2tmx. Put the TMX file in the /tm/ folder of the OmegaT project.

5. Extract the untranslated strings from the updated PO file, using pofilter (or your favourite Gettext tool). Put this file into the /source/ folder, and translate it as usual. Ensure that you translate the entire file completely before moving on to step 6.

6. Merge the newly translated PO file with the originally updated PO file. I think pomerge is for this. This will replace all untranslated strings in the originally updated PO file with the new translations for those strings.

I'm not sure how the Gettext tools work. It may be necessary to create a separate PO file containing only the translated strings, and at step 6 you throw the two PO files together (the one containing only initially translated strings, and the one translated by OmegaT).
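The merge in step 6 can be illustrated with a few lines of Python. This is a sketch of the idea only, not of how pomerge or any Gettext tool is implemented: the function name `merge_translations` is hypothetical, and the regular expression covers only entries whose msgid and msgstr each fit on one line. Entries that already have a translation in the original file are left untouched; only empty msgstr fields are filled.

```python
import re

# Assumption: one-line msgid directly followed by a one-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def merge_translations(original_po, translated_po):
    """Fill empty msgstr fields in original_po from translated_po, by msgid."""
    new = {msgid: msgstr
           for msgid, msgstr in ENTRY.findall(translated_po)
           if msgid and msgstr}

    def fill(match):
        msgid, msgstr = match.group(1), match.group(2)
        if msgid and not msgstr and msgid in new:
            return 'msgid "%s"\nmsgstr "%s"' % (msgid, new[msgid])
        return match.group(0)  # keep existing translations as they are

    return ENTRY.sub(fill, original_po)


original = 'msgid "Open"\nmsgstr "Obre"\n\nmsgid "Close"\nmsgstr ""\n'
translated = 'msgid "Close"\nmsgstr "Tanca"\n\nmsgid "Open"\nmsgstr "Obrir"\n'

print(merge_translations(original, translated))
```

Matching on msgid alone is a simplification; real PO files can contain the same msgid in different contexts, which is one reason to prefer the dedicated tools for actual work.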
Remember, most of the Translate Toolkit's tools work not only on single files but on whole directories of files.

Sorry I don't have time now to write down the exact syntax for the toolkit tools, but here's a hint: you need to use a template.

> What about sharing TMs? Since the OOo translation is a collaborative project, we would like to share our TMs as they grow. Can this be done using OmegaT?

Yes. For translators using something other than OmegaT, you would share the TMs created by OmegaT. For translators using OmegaT, they would put the TMX files in the /tm/ folder of the project.

The Translate Toolkit is here: http://translate.sourceforge.net/wiki/toolkit/index

Samuel |
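For readers unfamiliar with TMX, the shared files are plain XML containing source/target pairs. The sketch below builds a minimal TMX 1.4 document from a list of translated pairs using only the Python standard library. It is an illustration of the file structure, not the po2tmx implementation: the function name `pairs_to_tmx`, the tool name "po-sketch" and the header values are assumptions, and whether OmegaT accepts such a bare-bones file unchanged has not been tested here.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, source_lang, target_lang):
    """Build a minimal TMX 1.4 string from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "po-sketch",       # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "po",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")     # one translation unit per pair
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(pairs_to_tmx([("Open file", "Obre el fitxer")], "en", "ca"))
```

Because each translation unit is self-contained, several such files from different translators can simply sit side by side in the /tm/ folder.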
From: Sílvia M. <silvia.gnome@...> - 2008-04-08 20:39:40
|
Hi,

Thanks a lot for your explanations. They were extremely helpful (by the way, I wasn't able to find messages related to this topic in the archive...).

> Use the Translate Toolkit's pocount to determine how much of the PO file is translated. This will help determine whether you're going to follow the long route or the short route. (...)

Phew, that looks quite complicated (well, not complicated but long)... Alright, I'll have to start testing it (I'll always have to use the long way, since most files are not completely untranslated).

> Yes. For translators using something else than OmegaT, you would share the TMs created by OmegaT. For translators using OmegaT, they would put the TMX files in the /tm/ folder of the project.

Oh, so you can have multiple tmx files and OmegaT will use them all as a TM? That's great!

Thanks again! I'll start "investigating" :)

Sílvia

---
Sílvia Miranda
Blog: http://silvia.badall.net |
From: Samuel M. (Home) <leuce@...> - 2008-04-09 18:23:24
|
Sílvia Miranda wrote:

> Samuel wrote:
>> Use the Translate Toolkit's pocount to determine how much of the PO file is translated. This will help determine whether you're going to follow the long route or the short route. (...)

> Phew, that looks quite complicated (well, not complicated but long)... Alright, I'll have to start testing it (I'll always have to use the long way, since most files are not completely untranslated).

The command line in Windows is:

    python pocount yourfiles > yourfiles.txt

("yourfiles" is a directory with PO files in it)

OR:

    python pocount yourfile.po > yourfiles.txt

The problem in Windows is that the path support is rather primitive, so in Windows I have to copy my PO files to the folder where pocount (or in some cases pocount.py) is stored before I can run this thing.

Another way to determine how much of the PO file (if it is a single PO file) is translated is to open it in a real PO editor (like PoEdit), in which you can sort strings by translation status. But pocount is useful for nested, nested, nested folders full of hundreds of little PO files.

> Oh, so you can have multiple tmx files and OmegaT will use them all as a TM? That's great!

Yes, you can have an unlimited number of reference TMs in OmegaT and OmegaT will use them all (no need to merge your TMs first). However, there is no way of telling OmegaT which TM is better, and if your document contains a lot of 100% matches from those TMs, there is no way in OmegaT to "auto accept" the match (you have to press a button each time a match is found).

Samuel |
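To show what a per-directory count over nested folders involves, here is a small stand-in written with the standard library only. It is not pocount and does not reproduce its output: the function name `count_tree` is hypothetical, only one-line msgid/msgstr pairs are counted, and fuzzy flags are ignored, so a fuzzy entry with a non-empty msgstr is counted as translated.

```python
import os
import re

# Assumption: one-line msgid directly followed by a one-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def count_tree(root_dir):
    """Map each .po file under root_dir to (translated, total) entry counts."""
    report = {}
    for folder, _subdirs, files in os.walk(root_dir):
        for name in sorted(files):
            if not name.endswith(".po"):
                continue
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8") as handle:
                entries = [(msgid, msgstr)
                           for msgid, msgstr in ENTRY.findall(handle.read())
                           if msgid]          # skip the header entry
            done = sum(1 for _msgid, msgstr in entries if msgstr)
            report[os.path.relpath(path, root_dir)] = (done, len(entries))
    return report
```

Calling `count_tree("yourfiles")` returns a dictionary keyed by relative path, which can then be sorted or summed to decide between the short and the long route.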