
#742 Malformed strings in message files cause crash

Unknown
wip
None
Unknown
Bug
Unknown
Unknown
Unknown
Unknown
10 hours ago
1 day ago
No

Just tried one of the Arabic locales and ended up with this:
WIKINDX version : 6.12.1
PHP version : 8.2.0
Script : /Applications/MAMP/htdocs/wikindx6/trunk/src/core/messages/wikindx-ar.php
Line : 761
Code : 0
Message : Unmatched ')'

At that line, there is:
\'C:\\Program Files\\\example\\\\').';

The 4 backslashes at the end should be 3. It occurs elsewhere in that file (e.g., line 762) and may well be endemic in many language files.
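
To illustrate why the fourth backslash matters: in a PHP single-quoted literal, only a doubled backslash and a backslash followed by a single quote are escape sequences. Four backslashes before the quote are therefore read as two literal backslashes followed by the closing quote, which leaves the ')' outside the string. The following is a minimal Python sketch of that rule, not PHP's actual lexer. The function name scan_single_quoted is hypothetical, and the sketch assumes the fragment quoted above starts inside a single-quoted literal.

```python
def scan_single_quoted(body: str) -> tuple[str, str]:
    """Scan text that starts just inside a PHP single-quoted literal.

    Returns (decoded value, remainder after the closing quote).
    Raises ValueError if the literal is never closed.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # Only \\ and \' are escapes; any other backslash stays literal.
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ("\\", "'"):
            out.append(body[i + 1])
            i += 2
        elif ch == "'":
            # An unescaped quote closes the literal.
            return "".join(out), body[i + 1:]
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated single-quoted literal")


# Fragment from line 761 as reported (four trailing backslashes).
broken = r"\'C:\\Program Files\\\example\\\\').';"
# Same fragment with three trailing backslashes, as the report suggests.
fixed = r"\'C:\\Program Files\\\example\\\').';"

print(scan_single_quoted(broken)[1])  # ).';  the ')' is now outside the string
print(scan_single_quoted(fixed)[1])   # ;     the literal closes where intended
```

With four backslashes, the code left over after the literal begins with a stray ')', which matches the reported "Unmatched ')'" message.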

As the new locale is already written to the users table, it is impossible to get out of this without manually editing the users table.
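
Because a message file that fails to parse leaves the installation stuck, one possible safeguard is to syntax-check every catalog before it is shipped. The sketch below is an assumption, not part of WIKINDX: it relies on the php command-line binary being on the PATH and uses its -l (syntax check only) option. The function name lint_catalogs and the injectable run parameter are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def lint_catalogs(paths: Iterable[Path], run: Callable = subprocess.run) -> dict[str, str]:
    """Return {path: lint output} for every file that `php -l` rejects."""
    failures = {}
    for path in paths:
        # `php -l FILE` only parses the file; it exits non-zero on a syntax error.
        result = run(["php", "-l", str(path)], capture_output=True, text=True)
        if result.returncode != 0:
            failures[str(path)] = (result.stdout + result.stderr).strip()
    return failures


if __name__ == "__main__":
    # Assumed layout, taken from the script path in the report above.
    catalogs = sorted(Path("src/core/messages").glob("wikindx-*.php"))
    for name, message in lint_catalogs(catalogs).items():
        print(f"{name}: {message}")
```

A check like this only detects files that do not parse. It does not detect strings that parse but have been mistranslated or wrongly encoded.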

Discussion

  • Stéphane Aulery

    I'm taking the raw file as it comes out of Transifex. I'm going to see if there's a way in Transifex to protect specific sequences because it's the automatic translation that's doing a poor job.

     
    • Stéphane Aulery

      Stéphane Aulery - 16 hours ago

      Hi Mark,

      The PHP output of Transifex is buggy for two reasons:

      • Transifex's PHP input reader is buggy. It reads PHP code with custom rules and doesn't parse it exactly as the PHP interpreter does, so illegal escape sequences are loaded into the source strings. After that, garbage in, garbage out.
      • For the specific case reported, the machine translation is confused by the Arabic source string. For now, I've simply deleted the erroneous string.

      Since I have no way to control the output of Transifex, I changed the input/output format to JSON. In this operation, half of the translations are lost due to the change in encoding of the misinterpreted sources.

      I'm not immediately deleting the PHP catalogs until you've tested the new version.

      Transifex added a JSON suffix to the filename and I can't figure out how to get rid of it without reloading all 352 catalogs to avoid losing the translation memories. :-((

      That's enough for today. I haven't dealt with the time zone.

      Regards,

       
      • Mark Grimshaw

        Mark Grimshaw - 10 hours ago

        Sounds like a significant problem. I've done a quick check with the latest SVN and all seems fine. Arabic (multiple versions I think) is listed—I chose one or two of these without the issues previously reported.

        Mark

         
        • Stéphane Aulery

          Stéphane Aulery - 10 hours ago

          Thank you, a first step in the right direction. After spending the night on it, I'm hesitant.

          Even in the newly exported JSON files, there were incorrectly encoded characters. I could only correct them manually, searching for the strings one by one for hours. I simply deleted the erroneous translations. Transifex, once again, exports without properly respecting the encoding of the target format. So, now I have serious doubts about the platform's reliability.

          Should we redo all the translations or not? Should we simply remove Arabic because the automatic translator doesn't seem up to the task? Won't this break again later?

          Our placeholder system in strings is also flawed, non-standard, and therefore poorly recognized. Variable substitution is poorly implemented. We also can't specify text sequences not to be translated.

          So, since we're losing half the translations, I'm wondering if I shouldn't just fix our system and abandon Transifex.

           
  • Stéphane Aulery

    • status: open --> wip
     
