
#182 Extracts the content of the projects to text file

Milestone: future
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-03-28
Created: 2006-05-07
Private: No

I was thinking that for textual verification it would be useful to
have OmegaT export a text-only version of the project, possibly per file.

This would serve as a "draft" of the final version and could be used to
review the translation in a text-only editor to get a feel for the text
style and rendering.

Discussion

  • Samuel Murray

    Samuel Murray - 2006-06-05

    Logged In: YES
    user_id=168045

    Excellent idea. It would be great if OmegaT could export
    this at any time (even before you start translating). There
    should be an option to retain the original's paragraphing or
    to have one segment per line.
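The two layouts requested above can be sketched as follows. This is a minimal illustration in Python, not OmegaT code: the function name `render` and the data model (a list of paragraphs, each a list of segment strings) are assumptions made for the example.

```python
def render(paragraphs, one_segment_per_line=False):
    """Render segments as plain text.

    paragraphs: list of paragraphs, each a list of segment strings
    (a hypothetical data model, not OmegaT's internal one).
    """
    if one_segment_per_line:
        # Flatten all paragraphs and put every segment on its own line.
        return "\n".join(seg for para in paragraphs for seg in para) + "\n"
    # Keep the original paragraphing: join the segments of a paragraph
    # with spaces and separate paragraphs with a blank line.
    return "\n\n".join(" ".join(para) for para in paragraphs) + "\n"
```

With `[["A.", "B."], ["C."]]` as input, the default mode keeps "A. B." together as one paragraph, while `one_segment_per_line=True` puts each of the three segments on its own line.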

     
  • Samuel Murray

    Samuel Murray - 2006-11-08

    Logged In: YES
    user_id=168045

    I'd like to propose a merge between this RFE and RFE 1521629 "TM
    automatic creation from target", because this RFE can be part of a
    solution for RFE 1521629.

    If OmegaT can export all strings from a file (or an entire project),
    then a user can more easily prepare text files for alignment in the
    aligner or bligner tools. The user simply puts all source files in
    the "source" folder, does the extraction, then puts all the target
    files in the "source" folder, does the extraction again, and then
    he'll have two text files containing all the segments, ready to be
    aligned in his favourite align tool.

    The advantage of having OmegaT extract the text as opposed to simply
    using File -> Save As -> Text in his word processor, is that the
    strings will have been segmented by OmegaT itself, which will lead to
    better fuzzy match results in future projects.

    The Wordfast extraction tool works like this: it extracts all
    segments from all selected documents (processing the documents in
    alphabetical order by the document's name) into a document called
    Wf_Extracted.Txt (it also puts all repetitive segments in a second
    document called Wf_Repetitions.Txt, but that is irrelevant here).

    Being able to extract all text in OmegaT-based segments can also
    enable a project manager to share the work among translators who do
    not necessarily use OmegaT, because the text doesn't get segmented by
    foreign tools using foreign segmentation strategies.
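The Wordfast-style extraction described above (all segments from all documents, processed in alphabetical order by document name, collected into one text) could look roughly like this. It is a hedged sketch in Python: `extract_all` and the `segments_by_file` mapping are hypothetical names, the segmentation itself is assumed to have been done already, and case-insensitive ordering is an assumption about what "alphabetical" means.

```python
def extract_all(segments_by_file):
    """Concatenate the segments of all documents, one segment per line.

    segments_by_file: dict mapping a document name to its list of
    already-segmented strings (hypothetical input, for illustration).
    """
    lines = []
    # Process documents in alphabetical order by name, ignoring case.
    for name in sorted(segments_by_file, key=str.casefold):
        lines.extend(segments_by_file[name])
    return "".join(line + "\n" for line in lines)
```

Running this once on the source documents and once on the target documents would give the two line-per-segment texts that an alignment tool expects.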

     
  • Henry Pijffers

    Henry Pijffers - 2006-11-08

    Logged In: YES
    user_id=545103

    You can already export all strings. (Ok, it's a bit of
    work.) Just put your file in a project, go through every
    segment, and let OmegaT insert the source text
    automatically. Save. Open the TM, and presto, all your segments.

     
  • Nobody/Anonymous

    Logged In: NO

    >>> You can already export all strings. (Ok, it's a bit of work.) <<<

    You can do a lot of things with "a bit of work", but the problem is
    that if you don't do this regularly or all the time, every time you
    want to do it, you first have to figure out where you put that tool
    that does it.

     
  • Jean-Christophe Helary

    Logged In: YES
    user_id=915082

    The problem is that I usually need to do that in _big_ projects, where hitting [enter] 6000 times is necessary :)

    There are instances where a source file has weird line breaks that can't easily be corrected in the original, so exporting such a file allows
    you to check all segments beforehand, fix some, possibly translate with that text file, and fix the whole thing after reloading with the
    original source file. Also, weird tagging can easily be avoided with such a function.

     
  • Jean-Christophe Helary

    Logged In: YES
    user_id=915082

    I think you should go ahead with the merging. Possibly propose a rewrite of the original RFE to clarify the whole thing.

     
  • Jean-Christophe Helary

    • milestone: --> future
     
  • Didier Briel

    Didier Briel - 2017-03-28
    • summary: exporting the strings only --> Extracts the content of the projects to text file
     
  • Didier Briel

    Didier Briel - 2017-03-28

    In SVN (/trunk, revision 9745) the script extract_text_content.groovy allows exporting all strings of the project to a single text file.

    Didier

     
    • Gabix

      Gabix - 2018-06-21

      Is it possible to modify the script to force output in UTF-8? It exports (tested with the script as included in the OmegaT 4.1.5 package) to the system default encoding, which may be bad for many language pairs. For example, the output for a FR→RU project on Windows with a Russian locale loses French characters with diacritics (question marks are written instead), because this locale uses cp1251, which does not contain those characters.

       
      • Didier Briel

        Didier Briel - 2018-06-21

        Yes, that can be done.

        Can you create an RFE for it?

        Didier
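The encoding problem reported above can be reproduced outside OmegaT. The following is a small Python illustration, not the Groovy script itself (whose contents are not shown in this thread): `encode_export` is a hypothetical helper, and `errors="replace"` is used to mimic an encoder that writes a question mark for every unmappable character.

```python
def encode_export(text, encoding):
    """Encode exported text, replacing unmappable characters with '?'."""
    return text.encode(encoding, errors="replace")


french = "Café, façade, élève"

# cp1251 (the Windows Cyrillic code page) has no é, ç or è,
# so these characters are lost when it is the output encoding.
lossy = encode_export(french, "cp1251").decode("cp1251")

# UTF-8 can represent every character, so the text round-trips intact.
safe = encode_export(french, "utf-8").decode("utf-8")

print(lossy)
print(safe)
```

Choosing UTF-8 explicitly, instead of relying on the system default, makes the export independent of the locale of the machine it runs on.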

         
