I was thinking that, for textual verification, it would be useful to
have OmegaT export a text-only version of the project, or possibly of each file.
This would serve as a "draft" of the final version and could be used to
review the translation in a text-only editor to get a feel for the style and
rendering of the text.
Logged In: YES
user_id=168045
Excellent idea. It would be great if OmegaT could export
this at any time (even before you start translating). There
should be an option to retain the original's paragraphing or
to have one segment per line.
Logged In: YES
user_id=168045
I'd like to propose a merge between this RFE and RFE 1521629 "TM
automatic creation from target", because this RFE can be part of a
solution for RFE 1521629.
If OmegaT can export all strings from a file (or an entire project),
then a user can more easily prepare text files for alignment in the
aligner or bligner tools. The user simply puts all source files in
the "source" folder, does the extraction, then puts all the target
files in the "source" folder, does the extraction again, and then
he'll have two text files containing all the segments, ready to be
aligned in his favourite align tool.
The advantage of having OmegaT extract the text as opposed to simply
using File -> Save As -> Text in his word processor, is that the
strings will have been segmented by OmegaT itself, which will lead to
better fuzzy match results in future projects.
The Wordfast extraction tool works like this: it extracts all
segments from all selected documents (processing the documents in
alphabetical order by the document's name) into a document called
Wf_Extracted.Txt (it also puts all repetitive segments in a second
document called Wf_Repetitions.Txt, but that is irrelevant here).
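As a rough illustration of the extraction behaviour described above, here is a minimal Python sketch. It is not Wordfast's or OmegaT's code: the function name `extract_segments` and the sample data are hypothetical, the segments are assumed to have been produced already by the CAT tool's own segmentation, and "repetitive" is assumed to mean a segment that occurs more than once across the selected documents.

```python
from collections import Counter


def extract_segments(documents):
    """Return (all_segments, repeated_segments) for a {name: [segments]} mapping.

    Documents are visited in alphabetical order by name; the order of
    segments inside each document is preserved.
    """
    all_segments = []
    for name in sorted(documents):
        all_segments.extend(documents[name])

    counts = Counter(all_segments)
    # Keep each repeated segment once, in order of first appearance.
    repeated = [s for s in dict.fromkeys(all_segments) if counts[s] > 1]
    return all_segments, repeated


# Hypothetical sample data, keyed by document name.
docs = {
    "b.txt": ["Hello.", "Goodbye."],
    "a.txt": ["Hello.", "How are you?"],
}
all_segments, repeated = extract_segments(docs)
print("\n".join(all_segments))  # one segment per line, a.txt before b.txt
```

Writing one segment per line keeps the two extracted files (source and target) easy to feed into an alignment tool.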
Being able to extract all text in OmegaT-based segments can also
enable a project manager to share the work among translators who do
not necessarily use OmegaT, because the text doesn't get segmented by
foreign tools using foreign segmentation strategies.
Logged In: YES
user_id=545103
You can already export all strings. (Ok, it's a bit of
work.) Just put your file in a project, go through every
segment, and let OmegaT insert the source text
automatically. Save. Open the TM, and presto, all your segments.
Logged In: NO
>>> You can already export all strings. (Ok, it's a bit of work.) <<<
You can do a lot of things with "a bit of work", but the problem is
that if you don't do this regularly or all the time, every time you
want to do it, you first have to figure out where you put that tool
that does it.
Logged In: YES
user_id=915082
The problem is that I usually need to do that in _big_ projects, where hitting [Enter] 6000 times is necessary :)
There are cases where a source file has weird line breaks that can't easily be corrected in the original, so exporting such a file lets you
check all segments beforehand, fix some, possibly translate with that text file, and fix the whole thing after reloading with the
original source file. Also, weird tagging can easily be avoided with such a function.
Logged In: YES
user_id=915082
I think you should go ahead with the merging. Possibly propose a rewrite of the original RFE to clarify the whole thing.
In SVN (/trunk, revision 9745) the script extract_text_content.groovy allows exporting all strings of the project to a single text file.
Didier
Is it possible to modify the script to force output in UTF-8? It exports (tested with the script as included in the OmegaT 4.1.5 package) in the system default encoding, which may be bad for many language pairs. For example, the output for a FR→RU project on Windows with a Russian locale loses French characters with diacritics (question marks appear instead), because this locale uses cp1251, which lacks those characters.
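To make the encoding problem concrete, here is a small Python sketch. It is not the Groovy script itself, and the file name `project_content.txt` and the sample segments are made up for illustration. It shows what encoding to cp1251 does to French accented letters, and how naming UTF-8 explicitly when opening the output file avoids depending on the system default.

```python
import os
import tempfile

# Hypothetical segments from a FR->RU project.
segments = ["Voilà, c'est prêt.", "Готово."]

# cp1251 has no "à" or "ê", so a lossy encode replaces them with "?".
lossy = segments[0].encode("cp1251", errors="replace").decode("cp1251")
print(lossy)

# Naming the encoding explicitly makes the output independent of the locale.
path = os.path.join(tempfile.mkdtemp(), "project_content.txt")
with open(path, "w", encoding="utf-8", newline="\n") as out:
    out.write("\n".join(segments) + "\n")

# Reading the file back as UTF-8 restores both languages intact.
with open(path, encoding="utf-8") as src:
    restored = src.read().splitlines()
```

The same idea should carry over to any JVM-based script: pass the charset explicitly when creating the writer instead of relying on the platform default.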
Yes, that can be done.
Can you create an RFE for it?
Didier
Done:
https://sourceforge.net/p/omegat/feature-requests/1393/