OmegaT - multiplatform CAT tool / Feature Requests / #182 Extracts the content of the projects to text file

Samuel Murray - 2006-06-05

Logged In: YES
user_id=168045

Excellent idea. It would be great if OmegaT could export
this at any time (even before you start translating). There
should be an option to retain the original's paragraphing or
to have one segment per line.

Samuel Murray - 2006-11-08

Logged In: YES
user_id=168045

I'd like to propose a merge between this RFE and RFE 1521629 "TM
automatic creation from target", because this RFE can be part of a
solution for RFE 1521629.

If OmegaT can export all strings from a file (or an entire project),
then a user can more easily prepare text files for alignment in the
aligner or bligner tools. The user simply puts all source files in
the "source" folder, does the extraction, then puts all the target
files in the "source" folder, does the extraction again, and then
he'll have two text files containing all the segments, ready to be
aligned in his favourite align tool.

The advantage of having OmegaT extract the text as opposed to simply
using File -> Save As -> Text in his word processor, is that the
strings will have been segmented by OmegaT itself, which will lead to
better fuzzy match results in future projects.

The Wordfast extraction tool works like this: it extracts all
segments from all selected documents (processing the documents in
alphabetical order by the document's name) into a document called
Wf_Extracted.Txt (it also puts all repetitive segments in a second
document called Wf_Repetitions.Txt, but that is irrelevant here).
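
The extraction behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not Wordfast's or OmegaT's actual code: the function name `extract_segments` and the `documents` mapping are hypothetical, and the segments are assumed to have been produced already by whatever tool does the segmentation.

```python
from collections import Counter


def extract_segments(documents):
    """Return (all_segments, repeated_segments) for a name -> segments mapping.

    `documents` is a hypothetical stand-in for already-segmented files;
    the segmentation itself is not modelled here.
    """
    all_segments = []
    # Process documents in order of document name. sorted() compares
    # code points, which only approximates true alphabetical order.
    for name in sorted(documents):
        all_segments.extend(documents[name])

    counts = Counter(all_segments)
    # Keep each repeated segment once, in order of first appearance.
    repeated = [s for s in dict.fromkeys(all_segments) if counts[s] > 1]
    return all_segments, repeated


docs = {
    "b.txt": ["Save the file.", "Click OK."],
    "a.txt": ["Open the file.", "Click OK."],
}
extracted, repetitions = extract_segments(docs)
print("\n".join(extracted))  # one segment per line
print(repetitions)
```

Writing `extracted` to one file and `repetitions` to another would mirror the two output documents mentioned above.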

Being able to extract all text in OmegaT-based segments can also
enable a project manager to share the work among translators who do
not necessarily use OmegaT, because the text doesn't get segmented by
foreign tools using foreign segmentation strategies.

Henry Pijffers - 2006-11-08

Logged In: YES
user_id=545103

You can already export all strings. (Ok, it's a bit of
work.) Just put your file in a project, go through every
segment, and let OmegaT insert the source text
automatically. Save. Open the TM, and presto, all your segments.

Nobody/Anonymous - 2006-11-09

Logged In: NO

>>> You can already export all strings. (Ok, it's a bit of work.) <<<

You can do a lot of things with "a bit of work", but the problem is
that if you don't do this regularly or all the time, every time you
want to do it, you first have to figure out where you put that tool
that does it.

Jean-Christophe Helary - 2006-11-09

Logged In: YES
user_id=915082

The problem is that I usually need to do that in _big_ projects, where hitting [enter] 6000 times is necessary :)

There are instances where a source file has weird line breaks that can't easily be corrected in the original. Exporting such a file lets you
check all segments beforehand, fix some, possibly translate from that text file, and then fix the whole thing after reloading with the
original source file. Weird tagging can also easily be avoided with such a function.

Jean-Christophe Helary - 2006-11-10

Logged In: YES
user_id=915082

I think you should go ahead with the merging, and possibly propose a rewrite of the original RFE to clarify the whole thing.

Jean-Christophe Helary - 2006-12-11

milestone: --> future

Didier Briel - 2017-03-28

summary: exporting the strings only --> Extracts the content of the projects to text file

Didier Briel - 2017-03-28

In SVN (/trunk, revision 9745) the script extract_text_content.groovy allows exporting all strings of the project to a single text file.

Didier

- Gabix - 2018-06-21
  
  Is it possible to modify the script to force output in UTF-8? The script (tested as included in the OmegaT 4.1.5 package) exports to the system default encoding, which may be bad for many language pairs. For example, the output for a FR→RU project on Windows with a Russian locale loses French characters with diacritics (question marks appear instead), because this locale uses cp1251, which lacks those characters.
  
  - Didier Briel - 2018-06-21
    
    Yes, that can be done.
    
    Can you create an RFE for it?
    
    Didier
    
    - Gabix - 2018-06-21
      
      Done:
      https://sourceforge.net/p/omegat/feature-requests/1393/
      
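
The encoding loss reported above can be reproduced outside OmegaT. The sketch below uses Python rather than the Groovy script itself, and the sample text and the file name `extracted.txt` are made up for illustration. It shows why a cp1251 default turns French diacritics into question marks while the Russian text survives, and how an explicit UTF-8 encoding avoids that.

```python
import os
import tempfile

text = "déjà vu / дежавю"

# cp1251 (the Windows Cyrillic code page) has no "é" or "à", so encoding
# with errors="replace" substitutes "?" for them; the Cyrillic survives.
lossy = text.encode("cp1251", errors="replace").decode("cp1251")
print(lossy)

# Writing with an explicit UTF-8 encoding does not depend on the system
# default, so the text round-trips unchanged.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extracted.txt")  # hypothetical output file
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        restored = f.read()
print(restored == text)
```

The same principle applies in any language: name the output encoding explicitly instead of relying on the platform default.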
