
#1312 CORE_SRX_ERROR_LOADING_SEGMENTATION_CONFIGUnexpected

6.0.2
open
nobody
None
5
2026-02-15
2026-01-02
No

I'm running OmegaT-6.0.2_0_dda2c9e, and I've got a project with project-specific segmentation rules. The first time I open OmegaT, there's a CORE_SRX_ERROR_LOADING_SEGMENTATION_CONFIGUnexpected error, but the project-specific segmentation rules work.

If I then close OmegaT and restart it, it starts without those project-specific segmentation rules.

I've also attached the logs of a third, subsequent restart.

This of course makes 6.0.2 unusable for projects with project-specific segmentation rules... So hopefully this is an easy fix? ;)

I've also attached the "correct" segmentation.conf, which I luckily managed to retrieve from the Google Drive version history. (I made these segmentation rules when I was brand new to regex, so I assume they're not optimal, but I don't think that would be the issue?)

4 Attachments

Discussion

  • Erik De Boeck

    Erik De Boeck - 2026-01-26

    Any takers? :)

     
    • Hiroshi Miura

      Hiroshi Miura - 2026-01-27

      OmegaT 6.0.2 and 6.1.0 dropped support for segmentation.conf because of a very serious security issue.

      OmegaT has a built-in converter that comes from the DGT OmegaT project, and it may be incomplete.

      You can convert your custom rules into SRX using the migrator:
      https://github.com/omegat-org/segmentation-migrator

      The migrator itself has a security hole, because it uses the legacy logic to read a legacy segmentation.conf, but it should read the file correctly and write a segmentation.srx that OmegaT recognizes.

       
      • Erik De Boeck

        Erik De Boeck - 2026-01-27

        I'd love to, but I'm having trouble finding the mentioned omegat-segmentation-migrator-fat.jar or omegat-segmentation-migrator.zip files...

         
  • Hiroshi Miura

    Hiroshi Miura - 2026-01-27

    @t_cordonnier Please check the XSLT you contributed for converting segmentation.conf into SRX.

     
  • Hiroshi Miura

    Hiroshi Miura - 2026-01-27

    The tool is not released yet. I have converted your segmentation.conf with the tool. If it works, I will proceed with the project.

     
    • Erik De Boeck

      Erik De Boeck - 2026-01-28

      Alas... It doesn't throw any errors or exceptions, but it doesn't work either...

      For fun, I scattered gibberish around segmentation.srx (inside tags, before the first tag, in the final tag), and there are no complaints. So it looks like nothing is actually done with segmentation.srx: it isn't read at all.

      Could that be?

      (I'm using OmegaT-6.0.2_0_768deab.)
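
One way to separate "the file is broken" from "the file is never read" is to run segmentation.srx through an independent XML parser. The following is a minimal Python sketch, not part of OmegaT. `check_well_formed` is a hypothetical helper name, and it only tests XML well-formedness, not whether OmegaT actually loads the file.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def check_well_formed(xml_data: Union[str, bytes]) -> Optional[str]:
    """Return None if xml_data parses as XML, else the parser's error message.

    Hypothetical helper for illustration; it says nothing about SRX validity.
    """
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        return str(err)
    return None
```

It could be called as `check_well_formed(Path("segmentation.srx").read_bytes())`, with the path adjusted to wherever the file lives. If this reports an error for the deliberately damaged file while OmegaT logs nothing, that would support the suspicion that the file is not being parsed. Note that stray text between elements is still well-formed XML, so only gibberish inside a tag or before the first tag would be caught this way.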

       

      Last edit: Erik De Boeck 2026-01-28
      • Erik De Boeck

        Erik De Boeck - 2026-02-08

        I came up with a quick and dirty fix: I wrote a .ps1 file that copies the segmentation.conf file into the omegat\ directory every time before starting OmegaT:

        # Restore the project-specific segmentation rules before each launch.
        Copy-Item `
          -Path "C:\Users\Erik\Google Drive\Scheidsrechteren\Vertalingen\OmegaT\segmentation.conf" `
          -Destination "C:\Users\Erik\Google Drive\Scheidsrechteren\Vertalingen\OmegaT\IFAF Engels-Nederlands\omegat\" `
          -Force
        # Then start OmegaT.
        java -jar "C:\PortbleApps\OmegaT\OmegaT_6.0.2_Without_JRE\OmegaT.jar"
        

        But that obviously doesn't solve the underlying issue.

         

        Last edit: Erik De Boeck 2026-02-08
  • Hiroshi Miura

    Hiroshi Miura - 2026-02-14

    @t_cordonnier please help.

     
  • Hiroshi Miura

    Hiroshi Miura - 2026-02-15

    When I opened the attached segmentation.conf with OmegaT 6.1.0 Beta Weekly, I got these messages:

    09:44:48.507: FINE: Unknown languagerulename '{}' 
    09:44:48.507: FINE: Unknown languagerulename '{}' 
    09:44:48.507: FINE: Unknown languagerulename '{}' 
    09:44:48.507: FINE: Unknown languagerulename '{}' 
    
     
    • Hiroshi Miura

      Hiroshi Miura - 2026-02-15

      Here is the grep output for segmentation.conf and the generated segmentation.srx:

      $ grep -A1 -n language segmentation.conf
      7:     <void property="language">
      8-      <string>IFAF-Engels</string>
      --
      322:     <void property="language">
      323-      <string>Engels</string>
      --
      556:     <void property="language">
      557-      <string>Catalaans</string>
      --
      1080:     <void property="language">
      1081-      <string>Tsjechisch</string>
      --
      6123:     <void property="language">
      6124-      <string>Duits</string>
      --
      9949:     <void property="language">
      9950-      <string>Spaans</string>
      --
      10443:     <void property="language">
      10444-      <string>Fins</string>
      --
      10597:     <void property="language">
      10598-      <string>Frans</string>
      --
      10891:     <void property="language">
      10892-      <string>Italiaans</string>
      --
      12105:     <void property="language">
      12106-      <string>Japans</string>
      --
      12142:     <void property="language">
      12143-      <string>Nederlands</string>
      --
      15896:     <void property="language">
      15897-      <string>Pools</string>
      --
      21829:     <void property="language">
      21830-      <string>Russisch</string>
      --
      21883:     <void property="language">
      21884-      <string>Zweeds</string>
      --
      22627:     <void property="language">
      22628-      <string>Slowaaks</string>
      --
      28200:     <void property="language">
      28201-      <string>Chinees</string>
      --
      28247:     <void property="language">
      28248-      <string>Standaard</string>
      --
      28297:     <void property="language">
      28298-      <string>Segmentatie van tekstbestanden</string>
      --
      28324:     <void property="language">
      28325-      <string>segmentatie voor HTML, XHTML, ODF en Infix</string>
      

      And for the generated SRX XML file:

      $ grep -n languagerulename segmentation.srx 
      6:      <languagerule languagerulename="IFAF-Engels">
      108:      <languagerule languagerulename="English">
      198:      <languagerule languagerulename="Catalan">
      404:      <languagerule languagerulename="Czech">
      2414:      <languagerule languagerulename="German">
      3936:      <languagerule languagerulename="Spanish">
      4130:      <languagerule languagerulename="Finnish">
      4188:      <languagerule languagerulename="French">
      4302:      <languagerule languagerulename="Italian">
      4784:      <languagerule languagerulename="Japanese">
      4794:      <languagerule languagerulename="Dutch">
      6292:      <languagerule languagerulename="Polish">
      8658:      <languagerule languagerulename="Russian">
      8676:      <languagerule languagerulename="Swedish">
      8970:      <languagerule languagerulename="Slovak">
      11192:      <languagerule languagerulename="Chinese">
      11206:      <languagerule languagerulename="Standaard">
      11220:      <languagerule languagerulename="Segmentatie van tekstbestanden">
      11226:      <languagerule languagerulename="segmentatie voor HTML, XHTML, ODF en Infix">
      11234:      <languagemap languagerulename="IFAF-Engels" languagepattern="EN-GB"/>
      11235:      <languagemap languagerulename="English" languagepattern="EN.*"/>
      11236:      <languagemap languagerulename="Catalan" languagepattern="CA.*"/>
      11237:      <languagemap languagerulename="Czech" languagepattern="CS.*"/>
      11238:      <languagemap languagerulename="German" languagepattern="DE.*"/>
      11239:      <languagemap languagerulename="Spanish" languagepattern="ES.*"/>
      11240:      <languagemap languagerulename="Finnish" languagepattern="FI.*"/>
      11241:      <languagemap languagerulename="French" languagepattern="FR.*"/>
      11242:      <languagemap languagerulename="Italian" languagepattern="IT.*"/>
      11243:      <languagemap languagerulename="Japanese" languagepattern="JA.*"/>
      11244:      <languagemap languagerulename="Dutch" languagepattern="NL.*"/>
      11245:      <languagemap languagerulename="Polish" languagepattern="PL.*"/>
      11246:      <languagemap languagerulename="Russian" languagepattern="RU.*"/>
      11247:      <languagemap languagerulename="Swedish" languagepattern="SV.*"/>
      11248:      <languagemap languagerulename="Slovak" languagepattern="SK.*"/>
      11249:      <languagemap languagerulename="Chinese" languagepattern="ZH.*"/>
      11250:      <languagemap languagerulename="Standaard" languagepattern=".*"/>
      11251:      <languagemap languagerulename="Segmentatie van tekstbestanden" languagepattern=".*"/>
      11252:      <languagemap languagerulename="segmentatie voor HTML, XHTML, ODF en Infix" languagepattern=".*"/>
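
The `languagemap` entries in this listing can be cross-checked outside OmegaT. Below is a rough Python sketch, assuming each `languagepattern` is matched against the whole language code, case-insensitively. OmegaT's actual matching may differ, and `load_srx_names` and `rules_for` are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace prefix such as '{uri}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def load_srx_names(xml_data):
    """Return (rule_names, maps, dangling) read from SRX text.

    maps is a list of (languagepattern, languagerulename) pairs in file order;
    dangling lists mapped names that have no matching <languagerule>.
    """
    root = ET.fromstring(xml_data)
    rule_names = [el.get("languagerulename") for el in root.iter()
                  if _local(el.tag) == "languagerule"]
    maps = [(el.get("languagepattern"), el.get("languagerulename"))
            for el in root.iter() if _local(el.tag) == "languagemap"]
    dangling = [name for _, name in maps if name not in rule_names]
    return rule_names, maps, dangling


def rules_for(lang_code, maps):
    """List the rule names whose pattern matches lang_code, in file order."""
    return [name for pattern, name in maps
            if re.fullmatch(pattern, lang_code, re.IGNORECASE)]
```

Under these assumptions, a code such as `EN-GB` would match `IFAF-Engels`, then `English`, then the three `.*` entries, so a self-invented rule name is reachable as long as a `languagemap` entry points to it.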
      
       
    • Hiroshi Miura

      Hiroshi Miura - 2026-02-15

      There is a bug in the debug log message template. After fixing it, I got:

      10:40:05.225: FINE: Unknown languagerulename 'IFAF-Engels' 
      10:40:05.226: FINE: Unknown languagerulename 'Standaard' 
      10:40:05.226: FINE: Unknown languagerulename 'Segmentatie van tekstbestanden' 
      10:40:05.227: FINE: Unknown languagerulename 'segmentatie voor HTML, XHTML, ODF en Infix' 
      

      segmentation.conf depends on the locale OmegaT was running in when it was written. I run in an English locale, and "Standaard" is not the English name, so it is considered unknown.

      OmegaT tries to use the standard English names in segmentation.srx, but the built-in conf -> srx converter contributed by @t_cordonnier does not handle localized rule names, so the segmentation.srx it produces does not match what OmegaT expects.
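
The mismatch can be illustrated with the names from the two grep listings above. This is a small Python sketch. The table only pairs each Dutch name in segmentation.conf with the name that appears at the same position in the generated segmentation.srx. It is not OmegaT's actual lookup, and `split_rule_names` is a hypothetical helper.

```python
# Dutch rule names from segmentation.conf, paired with the English names found
# at the same position in the generated segmentation.srx (per the grep output).
CONVERTED = {
    "Engels": "English",
    "Catalaans": "Catalan",
    "Tsjechisch": "Czech",
    "Duits": "German",
    "Spaans": "Spanish",
    "Fins": "Finnish",
    "Frans": "French",
    "Italiaans": "Italian",
    "Japans": "Japanese",
    "Nederlands": "Dutch",
    "Pools": "Polish",
    "Russisch": "Russian",
    "Zweeds": "Swedish",
    "Slowaaks": "Slovak",
    "Chinees": "Chinese",
}


def split_rule_names(names):
    """Return (converted, unchanged): names with and without a table entry."""
    converted, unchanged = [], []
    for name in names:
        if name in CONVERTED:
            converted.append(CONVERTED[name])
        else:
            unchanged.append(name)
    return converted, unchanged
```

Run over the 19 names in segmentation.conf, the names left unchanged are IFAF-Engels, Standaard, Segmentatie van tekstbestanden and segmentatie voor HTML, XHTML, ODF en Infix, which are the same four names reported as unknown in the debug log.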

       
      • Hiroshi Miura

        Hiroshi Miura - 2026-02-15

        Even though the warning is recorded in the debug log, loading segmentation.conf appears to work.

         
  • Erik De Boeck

    Erik De Boeck - 2026-02-15

    How can I activate and find the debug logs?

    Furthermore, since I 'invented' a language ("IFAF-Engels"), won't OmegaT still complain about that unrecognized language, even once the localisation problems are fixed?

     
