#92 Grep cannot handle UTF-16 and UTF-32 pas files

Group: Closed
Status: closed-wont-fix
Labels: None
Priority: 5
Updated: 2019-02-26
Created: 2018-11-25
Private: No

When Grep tries to open a UTF-16 file with a little-endian BOM or a UTF-32 file with a big-endian BOM, it displays the error message "TStringList.LoadFromFile failed to read [filename]".
This worked fine with the Delphi 7 Grep version.

Example files are in the PasDoc tests:
error_bom_utf16_le.pas
error_bom_utf32_be.pas

(Of course, the Delphi 7 version did not know anything about Unicode and ignored the BOM, so it "worked".)

reported by Shlomo Abuisak

2 Attachments

Discussion

  • Thomas Mueller

    Thomas Mueller - 2018-11-25
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,9 +1,9 @@
    -When Grep tries to open UTF-16 with  little endian BOM or UTF-32  with BOM, it displays an error message "TStringList.LoadFromFile failed to read [filename]".
    +When Grep tries to open UTF-16 with  little endian BOM or UTF-32  with big endian BOM, it displays an error message "TStringList.LoadFromFile failed to read [filename]".
     This worked fine with the Delphi 7 Grep version.
    
     Example files are in the PasDoc tests:
     * error_bom_utf16_le.pas
    -* error_bom_utf32_le.pas
    +* error_bom_utf32_be.pas
    
     (Of course the  Delphi 7 did not know anything about Unicode and ignored the BOM, so it "worked".)
    
    • Attachments has changed:

    Diff:

    --- old
    +++ new
    @@ -0,0 +1,2 @@
    +error_bom_utf16_le.pas (492 Bytes; text/plain)
    +error_bom_utf32_be.pas (328 Bytes; text/plain)
    
     
  • Jeroen W. Pluimers

    Please add

    • "steps to reproduce" including: which file set to search (does it fail only on opened files, or only on disk files; only current; only in project; only in project group; all?)
    • the exact GExperts versions in which it fails and in which it works
     
  • limelect

    limelect - 2018-12-28

    It works with the D7 Grep.
    On newer Delphi versions it fails while searching the content of PAS files in a directory.

     
  • Jeroen W. Pluimers

    Which D7 grep? The one in GExperts 1.12? Or another one? Any one?
    Which newer Delphi grep?

    Please state relevant GExperts versions so it is easier to track down the differences.

    If it failed in GExperts version X for Delphi 7 and version Y for Delphi 7, did you do a bisection search to find the lowest GExperts version in which it starts to fail?

    Note that I do not have any D7. I run Delphi 2007 and XE8, so any help to narrow down the exact GExperts versions in which this works or fails is welcome.

    Reference: https://plus.google.com/+ThomasMueller/posts/15yLTeyo7q1 / https://web.archive.org/web/20181230134142/https://plus.google.com/+ThomasMueller/posts/15yLTeyo7q1

     

    Last edit: Jeroen W. Pluimers 2018-12-30
  • Thomas Mueller

    Thomas Mueller - 2019-02-26

    I cannot reproduce the problem with the error_bom_utf16_le.pas file. It works fine in Delphi 2007.
    The other one fails as reported.

     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26

    UTF-32 is not supported at all by the underlying SynUnicode TUnicodeStringList, which is used in Delphi < 2009. An update to the latest version might solve this problem, as https://github.com/SynEdit/SynEdit/blob/master/Source/SynUnicode.pas#L84 lists UTF-32 BOMs as well as UTF-16 ones.

     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26

    UTF-16 LE also works fine with Delphi 10.3; UTF-32 BE fails there as well. The latter is not supported by Delphi 10.3's TEncoding, which only supports UTF-7, UTF-8 and UTF-16. A test that reads the file into a TStringList leaves Count at 0, which is why GExperts fails to read it.
    Again, using the latest SynUnicode might solve that problem.
    I'm not sure it is worth the trouble.

    Does anybody really use UTF-32?

     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26
    • status: open --> accepted
    • assigned_to: Thomas Mueller
    • Group: New --> Need_More_Info
     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26

    Does anybody really use UTF-32 source files?
    (I simply don't know.)

     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26

    Using SynUnicode for non-Unicode Delphi versions may solve the problem for these (not tested yet).

    But for Unicode Delphi versions, SynUnicode just sets TUnicodeStringList = TStringList, assuming the latter is Unicode aware, so it won't help.

     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26

    No, SynUnicode doesn't work for non-Unicode Delphi versions either. It simply does not check for UTF-32 BOMs, even though they are declared in the source code.

    I'm closing this case. Too much trouble for too little gain.

     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26
    • status: accepted --> closed-wont-fix
    • Group: Need_More_Info --> Closed
     
  • Thomas Mueller

    Thomas Mueller - 2019-02-26

    Closed as won't fix, unless somebody submits a working patch.
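
For anyone considering such a patch, the missing piece described in this thread is a BOM check that also knows about UTF-32. The following is only a minimal sketch of that check, written in Python rather than Delphi. It is not GExperts, SynUnicode or TEncoding code. The names `detect_bom` and `decode_with_bom` are hypothetical, and the `cp1252` fallback for files without a BOM is an assumption.

```python
import codecs

# BOM table with the UTF-32 signatures first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE), so UTF-32 must be tested before UTF-16.
_BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM is found."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "cp1252") -> str:
    """Decode using the BOM if present, else the fallback (an assumed ANSI code page)."""
    encoding, bom_len = detect_bom(data)
    if encoding is None:
        return data.decode(fallback)
    # Skip the BOM itself so it does not end up in the decoded text.
    return data[bom_len:].decode(encoding)
```

The ordering of the table is the design choice that matters. One caveat of this approach: a UTF-16 LE file whose first character is NUL would be misread as UTF-32 LE, which seems unlikely for source files.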

     
