#267 Add support for C++Builder 13 Florence

6.36
closed
Build (36)
2
2026-02-22
2025-09-17
Anonymous
No

C++Builder 13 Florence is available: https://sourceforge.net/p/owlnext/news/2025/09/announcing-the-availability-of-rad-studio-13-florence/
So we can add support in OWLNext, OCFNext, CoolPrj and OWLExt.

Related

Bugs: #632
News: 2025/09/announcing-the-availability-of-rad-studio-13-florence
News: 2026/02/owlnext-7020-64428-and-63613-updates
Wiki: OWLNext_Stable_Releases

Discussion

  • Vidar Hasfjord

    Vidar Hasfjord - 2025-10-27
    • labels: --> Build
     
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-10-27

    Note removal of "tmschema.h" in this toolset version [bugs:#632].

     

    Related

    Bugs: #632


    Last edit: Vidar Hasfjord 2025-10-28
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-10-29

    Support has now been added in "owllink.h" on branches/6.44 [r8542]. Typo was fixed in [r8543].

     

    Related

    Commit: [r8542]
    Commit: [r8543]

  • Sebastian Ledesma

    Added support for command line tools: BCMAKE.BAT, BC.MAK
    Enhanced owllink.h
    Commit [r8544]

     
    👍
    1

    Related

    Commit: [r8544]

  • Ognyan Chernokozhev

    @sebas_ledesma, @vattila When trying to build the trunk with C++ Builder 13, I get errors:

    1> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__algorithm\comp.h(41,18): error E5133: invalid operands to binary expression ('const owl::TCodePages::TCodePage' and 'const owl::TCodePages::TCodePage') [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\owl.cbproj]
    1>      41 |     return __lhs < __rhs;
    1>         |            ~~~~~ ^ ~~~~~
    1> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__type_traits\invoke.h(179,25): Hint warning H5849: in instantiation of function template specialization 'std::__less<>::operator()<owl::TCodePages::TCodePage, owl::TCodePages::TCodePage>' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\owl.cbproj]
    1>     179 |                { return static_cast<_Fp&&>(__f)(static_cast<_Args&&>(__args)...); }
    1>         |                         ^
    1> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__algorithm\lower_bound.h(40,14): Hint warning H5849: in instantiation of function template specialization 'std::__invoke<std::__less<void, void> &, owl::TCodePages::TCodePage &, const owl::TCodePages::TCodePage &>' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\owl.cbproj]
    1>      40 |     if (std::__invoke(__comp, std::__invoke(__proj, *__m), __value)) {
    1>         |              ^
    1> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__algorithm\lower_bound.h(89,15): Hint warning H5849: in instantiation of function template specialization 'std::__lower_bound_bisecting<std::_ClassicAlgPolicy, std::__wrap_iter<owl::TCodePages::TCodePage *>, owl::TCodePages::TCodePage, std::__identity, std::__less<void, void>>' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\owl.cbproj]
    1>      89 |   return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
    1>         |               ^
    1> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__algorithm\lower_bound.h(97,15): Hint warning H5849: in instantiation of function template specialization 'std::__lower_bound<std::_ClassicAlgPolicy, std::__wrap_iter<owl::TCodePages::TCodePage *>, std::__wrap_iter<owl::TCodePages::TCodePage *>, owl::TCodePages::TCodePage, std::__identity, std::__less<void, void>>' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\owl.cbproj]
    1>      97 |   return std::__lower_bound<_ClassicAlgPolicy>(__first, __last, __value, __comp, __proj);
    1>         |               ^
    1> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__algorithm\lower_bound.h(103,15): Hint warning H5849: in instantiation of function template specialization 'std::lower_bound<std::__wrap_iter<owl::TCodePages::TCodePage *>, owl::TCodePages::TCodePage, std::__less<void, void>>' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\owl.cbproj]
    1>     103 |   return std::lower_bound(__first, __last, __value, __less<>());
    1>         |               ^
    

    Any ideas how to fix this?

     
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-12-18

    @jogybl wrote:

    Any ideas how to fix this [TCodePage related compilation bug]?

    TCodePage lacks a less-than operator, so presumably cannot be used with std::lower_bound without an explicit predicate (or a less-than operator needs to be defined for the type).
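
A minimal sketch of the two options mentioned above. It uses a hypothetical `CodePage` struct rather than the real `owl::TCodePages::TCodePage`, whose members are not shown in this thread. Either the less-than operator is declared next to the type, so that it is visible where `std::lower_bound` is instantiated, or an explicit predicate is passed.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for owl::TCodePages::TCodePage; the member is an assumption.
struct CodePage {
  unsigned id;
};

// Option 1: declare the operator next to the type (i.e. in the header), so that
// it is visible wherever std::lower_bound is instantiated.
inline bool operator<(const CodePage& a, const CodePage& b) { return a.id < b.id; }

// Relies on operator< implicitly. The vector must be sorted by id.
inline bool ContainsById(const std::vector<CodePage>& sorted, unsigned id) {
  const CodePage key{id};
  const auto it = std::lower_bound(sorted.begin(), sorted.end(), key);
  return it != sorted.end() && !(key < *it);
}

// Option 2: pass an explicit predicate; no operator< is needed for this call.
inline bool ContainsByIdWithPredicate(const std::vector<CodePage>& sorted, unsigned id) {
  const auto it = std::lower_bound(sorted.begin(), sorted.end(), id,
    [](const CodePage& cp, unsigned value) { return cp.id < value; });
  return it != sorted.end() && it->id == id;
}
```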

     

    Last edit: Vidar Hasfjord 2025-12-18
    • Ognyan Chernokozhev

      Interesting that the same code compiles with C++ Builder 12 and Visual C++

      I can try to add the operator to see if that helps.

       
      👍
      1
      • Ognyan Chernokozhev

        Actually, the operators are there, but defined in the .cpp and not in the header.

         
        👍
        1
        • Ognyan Chernokozhev

          Moving the operators from the .cpp to the .h solves this issue. ([r8583])

           
          👍
          2

          Related

          Commit: [r8583]

  • Ognyan Chernokozhev

    With C++ Builder 13, CoolPrj fails to build with error

    13>   In file included from ..\..\include\coolprj/cooledit.h:15:
    13> ..\..\include\coolprj/source.h(380,62): error E1927: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>     380 |     return D<M>::template TNotificationDispatch<N>::template Encode(sendMessage, nullptr, std::forward<A>(a)...);
    13>         |                                                              ^
    
     
  • Ognyan Chernokozhev

    And C++ Builder 12 fails to compile CoolPrj with the error

    13>   ..\..\include\coolprj/textbuffer.h(41,10): error E2771: 'operator<=' cannot be the name of a variable or data member [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    

    I guess it cannot handle the 'spaceship' operator?

     
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-12-19

    @jogybl wrote:

    With C++ Builder 13, CoolPrj fails to build with error [...] E1927: a template argument list is expected after a name prefixed by the template keyword

    The standard apparently requires a template argument list after Encode, since it is prefixed by the template keyword (which is required for disambiguation, because Encode is a dependent name). We want argument list type deduction here, since specifying it is a little bit awkward (Encode<decltype(sendMessage)>). Fortunately, according to Copilot, the standard allows an empty list to be specified, i.e. Encode<>, preserving the deduction. I've tested that this fix works with the Microsoft compilers. Hopefully, C++Builder 13 accepts it too.

    Note: The fix needs to be applied to QueryViews as well at line 399.
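
The shape of the fix can be sketched as follows. The names `Dispatcher`, `Notification`, `Send` and `AddCode` are hypothetical and only mirror the structure of the failing line in source.h; they are not the CoolPrj declarations.

```cpp
#include <utility>

// Hypothetical dispatcher; stands in for the nested TNotificationDispatch<N>.
struct Dispatcher {
  template <int N>
  struct Notification {
    // F and A are deduced from the function arguments.
    template <class F, class... A>
    static int Encode(F f, A&&... a) { return f(N, std::forward<A>(a)...); }
  };
};

template <class D, int N, class F, class... A>
int Send(F f, A&&... a) {
  // Notification and Encode are dependent names, so the "template" keyword is
  // needed for disambiguation. The empty list "<>" supplies the template
  // argument list that the keyword calls for, while leaving F and A deduced.
  return D::template Notification<N>::template Encode<>(f, std::forward<A>(a)...);
}

inline int AddCode(int code, int x) { return code + x; }
```

With these definitions, `Send<Dispatcher, 40>(AddCode, 2)` deduces the arguments of `Encode` from the call, as the original code intended.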

     
    • Ognyan Chernokozhev

      Thanks, that works.

      The next error is

      13> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\unordered_map(895,13): error E4632: no viable overloaded '=' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
      13>     895 |     __ref() = std::forward<_ValueTp>(__v);
      13>         |     ~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
      13> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__hash_table(1263,44): Hint warning H5849: in instantiation of function template specialization 'std::__hash_value_type<int, owl::TFont>::operator=<const std::pair<const int, owl::TFont> &, 0>' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
      13>    1263 |         __cache->__upcast()->__get_value() = *__first;
      13>         |                                            ^
      13> C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\unordered_map(1725,12): Hint warning H5849: in instantiation of function template specialization 'std::__hash_table<std::__hash_value_type<int, owl::TFont>, std::__unordered_map_hasher<int, std::__hash_value_type<int, owl::TFont>, std::hash<int>, std::equal_to<int>>, std::__unordered_map_equal<int, std::__hash_value_type<int, owl::TFont>, std::equal_to<int>, std::hash<int>>, std::allocator<std::__hash_value_type<int, owl::TFont>>>::__assign_unique<const std::pair<const int, owl::TFont> *>' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
      13>    1725 |   __table_.__assign_unique(__il.begin(), __il.end());
      13>         |            ^
      13> ..\..\source\coolprj\cooledit.cpp(2287,16): Hint warning H6232: in instantiation of member function 'std::unordered_map<int, owl::TFont>::operator=' requested here [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
      13>    2287 |   FontVariants = {};
      13>         |                ^
      
       
      • Ognyan Chernokozhev

        A possible solution: Replace the line
        FontVariants = {};
        with
        FontVariants.clear();
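
A small sketch of why `clear()` sidesteps the error. The trace above shows `FontVariants = {}` going through the initializer-list assignment of `std::unordered_map`, which assigns to existing elements. `Font` here is a hypothetical stand-in; whether `owl::TFont` has exactly these properties is an assumption.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for owl::TFont: copy-constructible but not copy-assignable.
struct Font {
  explicit Font(int h) : height(h) {}
  Font(const Font&) = default;
  Font& operator=(const Font&) = delete;
  int height;
};

inline std::size_t ResetFonts(std::unordered_map<int, Font>& fonts) {
  // fonts = {};   // Selects operator=(std::initializer_list<value_type>), whose
  //               // implementation may assign to existing elements.
  fonts.clear();   // Only destroys the elements; no assignment is involved.
  return fonts.size();
}
```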

         
        👍
        1
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-12-19

    @jogybl wrote:

    And C++ Builder 12 fails to compile CoolPrj [...]

    This is deliberate. C++Builder 12 is not C++20 compliant and for this reason no longer supported on the trunk. In my latest work on CoolPrj, I started using C++20 features not supported by this compiler. The errors are hence expected.

     
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-12-20

    @jogybl wrote:

    Interesting that the same code compiles with C++ Builder 12 and Visual C++

    Apparently, those compilers delay the relevant error‑checking until instantiation time, which makes them more permissive, if not strictly standard‑compliant.

     
  • Ognyan Chernokozhev

    The next set of errors:

    13> "C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj" (Clean;Build target) (1) ->
    13> (_CLANGCoreCompile target) -> 
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(425,29): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__locale(167,24): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__locale(172,52): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(1035,27): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(919,22): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(991,29): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(1007,26): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(1013,34): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(788,20): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\fstream(849,20): error E4974: implicit instantiation of undefined template 'std::codecvt<char8_t, char, _Mbstatet>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\istream(332,18): error E4974: implicit instantiation of undefined template 'std::ctype<char8_t>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\istream(332,26): error E4974: implicit instantiation of undefined template 'std::ctype<char8_t>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    13>   C:\Program Files (x86)\Embarcadero\Studio\37.0\include\x86_64-w64-mingw32\c++\v1\__locale(172,52): error E4974: implicit instantiation of undefined template 'std::ctype<char8_t>' [C:\Work\OWLNext\Subversion\trunk\build\embarcadero\coolprj.cbproj]
    
     

    Last edit: Ognyan Chernokozhev 2025-12-21
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-12-22

    @jogybl wrote:

    The next set of errors [related to std::codecvt and Unicode]

    See "Unicode support in CoolPrj" [feature-requests:#54], especially my last comment. The Embarcadero toolset may not support the std::codecvt features I've used to implement support for reading and writing Unicode. The best solution may be to rewrite the code to no longer rely on these (deprecated) facilities. I surmise that rewriting these functions from scratch (with the help of AI) may not take much effort. Feel free to take ownership of the ticket.

     

    Related

    Feature Requests: #54

    • Ognyan Chernokozhev

      I am trying it with Copilot:

      "To refactor the code to avoid using std::codecvt, you need to replace all text encoding conversions that currently rely on std::codecvt facets with alternative approaches. Since C++17, std::codecvt is deprecated and not recommended for new code. The most portable and future-proof way is to use platform APIs or third-party libraries (like ICU, iconv, or Windows API for Windows-only code).
      Given your code is Windows-centric (uses _TCHAR, MultiByteToWideChar, etc.), the most direct replacement is to use Windows API functions for conversions between UTF-8, UTF-16, and ANSI.
      Below is a refactored version of the relevant conversion functions, removing all use of std::codecvt and replacing them with Windows API calls. The templates and specializations for ConvertTo_ are replaced with explicit functions for each conversion."

      #include <windows.h>
      
      // ... (rest of your includes)
      
      namespace CoolPrj {
      
      // --- Conversion helpers using Windows API ---
      
      // ANSI (ACP) <-> UTF-16
      inline std::wstring AnsiToUtf16(const std::string& s) {
        if (s.empty()) return {};
        int len = MultiByteToWideChar(CP_ACP, 0, s.data(), static_cast<int>(s.size()), nullptr, 0);
        std::wstring ws(len, L'\0');
        MultiByteToWideChar(CP_ACP, 0, s.data(), static_cast<int>(s.size()), ws.data(), len);
        return ws;
      }
      
      inline std::string Utf16ToAnsi(const std::wstring& ws) {
        if (ws.empty()) return {};
        int len = WideCharToMultiByte(CP_ACP, 0, ws.data(), static_cast<int>(ws.size()), nullptr, 0, nullptr, nullptr);
        std::string s(len, '\0');
        WideCharToMultiByte(CP_ACP, 0, ws.data(), static_cast<int>(ws.size()), s.data(), len, nullptr, nullptr);
        return s;
      }
      
      // UTF-8 <-> UTF-16
      inline std::wstring Utf8ToUtf16(const std::string& s) {
        if (s.empty()) return {};
        int len = MultiByteToWideChar(CP_UTF8, 0, s.data(), static_cast<int>(s.size()), nullptr, 0);
        std::wstring ws(len, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.data(), static_cast<int>(s.size()), ws.data(), len);
        return ws;
      }
      
      inline std::string Utf16ToUtf8(const std::wstring& ws) {
        if (ws.empty()) return {};
        int len = WideCharToMultiByte(CP_UTF8, 0, ws.data(), static_cast<int>(ws.size()), nullptr, 0, nullptr, nullptr);
        std::string s(len, '\0');
        WideCharToMultiByte(CP_UTF8, 0, ws.data(), static_cast<int>(ws.size()), s.data(), len, nullptr, nullptr);
        return s;
      }
      
      // --- Replace ConvertTo_ specializations ---
      
      // UTF-16 -> ANSI
      template <>
      auto ConvertTo_<char, wchar_t>(const std::wstring_view& w) -> std::string {
        return Utf16ToAnsi(std::wstring(w));
      }
      
      // ANSI -> UTF-16
      template <>
      auto ConvertTo_<wchar_t, char>(const std::string_view& n) -> std::wstring {
        return AnsiToUtf16(std::string(n));
      }
      
      // UTF-8 -> UTF-16
      inline std::u16string Utf8ToUtf16_u16(const std::string& s) {
        std::wstring ws = Utf8ToUtf16(s);
        return std::u16string(ws.begin(), ws.end());
      }
      
      template <>
      auto ConvertTo_<char16_t, char8_t>(const std::u8string_view& n) -> std::u16string {
        // reinterpret_cast to std::string for Windows API
        return Utf8ToUtf16_u16(std::string(reinterpret_cast<const char*>(n.data()), n.size()));
      }
      
      // UTF-16 -> UTF-8
      inline std::u8string Utf16ToUtf8_u8(const std::u16string& u) {
        std::wstring ws(u.begin(), u.end());
        std::string s = Utf16ToUtf8(ws);
        return std::u8string(reinterpret_cast<const char8_t*>(s.data()), s.size());
      }
      
      template <>
      auto ConvertTo_<char8_t, char16_t>(const std::u16string_view& u) -> std::u8string {
        return Utf16ToUtf8_u8(std::u16string(u));
      }
      
      // UTF-8 <-> UTF-32 and UTF-32 <-> UTF-8
      // These are less common and not directly supported by Windows API; you may use manual conversion or a library like ICU if needed.
      // For now, you can use std::wstring as an intermediate (assuming UCS-2/UTF-16 surrogate pairs are not present).
      
      template <>
      auto ConvertTo_<char32_t, char8_t>(const std::u8string_view& n) -> std::u32string {
        std::wstring ws = Utf8ToUtf16(std::string(reinterpret_cast<const char*>(n.data()), n.size()));
        return std::u32string(ws.begin(), ws.end());
      }
      
      template <>
      auto ConvertTo_<char8_t, char32_t>(const std::u32string_view& u) -> std::u8string {
        std::wstring ws(u.begin(), u.end());
        std::string s = Utf16ToUtf8(ws);
        return std::u8string(reinterpret_cast<const char8_t*>(s.data()), s.size());
      }
      
      // --- Update ToAnsi_, ToUtf16_, ToUtf8_, ToUtf32_ to use new helpers ---
      
      auto ToAnsi_(const std::string_view& n) {
        return n; // Already ANSI
      }
      
      auto ToAnsi_(const std::wstring_view& w) {
        return ConvertTo_<char>(w);
      }
      
      auto ToUtf16_(const std::string_view& n) {
        return ConvertTo_<wchar_t>(n);
      }
      
      auto ToUtf16_(const std::wstring_view& w) {
        return w;
      }
      
      auto ToUtf8_(const std::wstring_view& w) {
        return ConvertTo_<char8_t>(std::u16string_view(reinterpret_cast<const char16_t*>(w.data()), w.size()));
      }
      
      auto ToUtf8_(const std::string_view& n) {
        return ToUtf8_(ToUtf16_(n));
      }
      
      template <class TStringView>
      auto ToUtf32_(const TStringView& s) {
        return ConvertTo_<char32_t, char8_t>(ToUtf8_(s));
      }
      
      // ... (rest of your code remains unchanged)
      
      } // namespace CoolPrj
      

      Key changes and reasoning:

      • All std::codecvt-based conversions are replaced with Windows API (MultiByteToWideChar, WideCharToMultiByte).
      • The conversion templates are now explicit functions for each conversion direction.
      • For UTF-32, the conversion is simplified and may not handle surrogate pairs perfectly; for full Unicode support, consider using a library like ICU.
      • The rest of your code (line readers, file I/O, etc.) can remain unchanged, as long as you use these new conversion helpers.

      Note:
      If you need cross-platform support, consider using a library like ICU or utfcpp for encoding conversions. The above solution is Windows-specific.
      Let me know if you want the full file with all changes applied, or if you need further adjustments for cross-platform support.
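
Regarding the surrogate-pair caveat above: copying UTF-16 code units one by one into a UTF-32 string, as the suggested `ConvertTo_<char32_t, char8_t>` does, yields wrong results for characters outside the Basic Multilingual Plane. A minimal sketch of a decoder that combines surrogate pairs is shown below. The function name `Utf16ToUtf32` and the choice to replace unpaired surrogates with U+FFFD are assumptions of this sketch, not part of CoolPrj.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Decodes UTF-16 into UTF-32, combining each surrogate pair into one code point.
// Unpaired surrogates are replaced with U+FFFD (a policy chosen for this sketch).
inline std::u32string Utf16ToUtf32(std::u16string_view in) {
  std::u32string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    const char32_t c = in[i];
    const bool isHigh = c >= 0xD800 && c <= 0xDBFF;
    const bool isLow = c >= 0xDC00 && c <= 0xDFFF;
    if (isHigh && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      const char32_t low = in[i + 1];
      out.push_back(0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00));
      ++i; // the low surrogate has been consumed as well
    } else if (isHigh || isLow) {
      out.push_back(0xFFFD); // unpaired surrogate
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```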

       
      👍
      1
      • Ognyan Chernokozhev

        Attached is the latest iteration of the code changes implemented by Copilot Agent that finally compiles

         
        • Ognyan Chernokozhev

          Unfortunately, my trial of C++ Builder 13 expired a long time ago, so I cannot test the changes.

           
  • Vidar Hasfjord

    Vidar Hasfjord - 2026-02-09

    @jogybl wrote:

    Attached is the latest iteration of the code changes implemented by Copilot Agent that finally compiles

    Alas, it doesn't work, though. I just did a quick test with Visual Studio 2026, and UTF-16LE fails to load correctly in CoolDemo. UTF-32 handling seems incomplete as well.

    Perhaps AI assistance will work better if you ask it to create simple standard C++ functions from scratch, only based on Win32, without providing the existing code (which is confusing even for me who wrote it, due to the intricate stream interoperability).

    Functions needed are:

    • detection routine for UTF-8 without BOM (heuristic; if it has no BOM and decodes as UTF-8 without error, it is assumed to be UTF-8, otherwise assume current ANSI code page),
    • detection routine for BOM for all the UTF formats (including Little-Endian and Big-Endian variants), and
    • corresponding reading routines, e.g. into corresponding C++ standard string types, with
    • subsequent conversion to Win32 runtime string (ANSI/Unicode string, depending on build mode).
     
    • Ognyan Chernokozhev

      Moral of the story: Do not rely on AI to effectively fix code that you do not understand.

       
      😄
      1
  • Vidar Hasfjord

    Vidar Hasfjord - 2026-02-09

    @jogybl wrote:

    Moral of the story: Do not rely on AI to effectively fix code that you do not understand.

    The following attempt, using Microsoft Copilot in "think deeper" mode, looks more promising. I haven't tested it though.

    (Edit: I've now done some tests, and the code seems to work well, handling all the UTF encoding variants. And it is lightning fast compared to the stream solution.)


    File encoding helpers and readers (Win32 + ISO C++20)

    Below are compact, self-contained C++20 functions that:

    • Detect BOM for UTF-8/UTF-16/UTF-32 (LE/BE).
    • Heuristically detect UTF‑8 without BOM (valid UTF‑8 sequences and presence of multi‑byte sequences).
    • Read a file (Win32 CreateFile/ReadFile) into bytes and then into:
    • std::wstring (UTF‑16 native on Windows),
    • std::basic_string<TCHAR> (runtime string depending on UNICODE macro),
    • std::string (ANSI) if desired.
    • Convert UTF‑8/ANSI/UTF‑16LE/UTF‑16BE/UTF‑32LE/UTF‑32BE to std::wstring using only Win32 APIs and small helper code (no external libraries).

    Copy the code below into a single .cpp file and compile with a Windows-targeting toolchain (MSVC or clang-cl). It uses only Win32 APIs and standard C++20.


    Key design notes

    • BOM detection returns an enum identifying encoding and BOM length.
    • UTF‑8 heuristic: validates byte sequences and requires at least one multi‑byte sequence to avoid misclassifying ASCII as UTF‑8 (ASCII is valid UTF‑8 but often intended as ANSI).
    • UTF‑16 BE / UTF‑32 BE: bytes are swapped to little-endian before conversion.
    • UTF‑32 → UTF‑16: implemented manually (handles BMP and surrogate pairs).
    • Conversion to runtime string: std::basic_string<TCHAR> is produced by converting the std::wstring to ANSI when UNICODE is not defined, or returning std::wstring content when UNICODE is defined.

    Code

    // win_text_readers.cpp
    // Requires: Windows.h, vector, string, optional <tchar.h> for TCHAR
    // Compile: cl /std:c++20 win_text_readers.cpp /link
    
    #include <windows.h>
    #include <vector>
    #include <string>
    #include <optional>
    #include <cstdint>
    #include <cstring> // for memcpy
    #include <utility> // for std::swap
    #include <tchar.h> // for TCHAR and _T macro
    
    // -----------------------------
    // Encoding detection result
    // -----------------------------
    enum class FileEncoding {
        Unknown,
        ANSI,           // current code page (CP_ACP)
        UTF8_BOM,
        UTF8_NoBOM,
        UTF16_LE_BOM,
        UTF16_BE_BOM,
        UTF32_LE_BOM,
        UTF32_BE_BOM
    };
    
    struct BomInfo {
        FileEncoding encoding;
        size_t bom_length;
    };
    
    // -----------------------------
    // Read file bytes (Win32)
    // -----------------------------
    static std::optional<std::vector<uint8_t>> ReadFileBytes(const std::wstring& path) {
        HANDLE h = CreateFileW(path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return std::nullopt;
    
        LARGE_INTEGER size{};
        if (!GetFileSizeEx(h, &size) || size.QuadPart < 0) {
            CloseHandle(h);
            return std::nullopt;
        }
    
        std::vector<uint8_t> data;
        data.resize(static_cast<size_t>(size.QuadPart));
        DWORD read = 0;
        bool ok = true;
        if (size.QuadPart > 0) {
            if (!ReadFile(h, data.data(), static_cast<DWORD>(data.size()), &read, nullptr) ||
                read != data.size()) {
                ok = false;
            }
        }
        CloseHandle(h);
        if (!ok) return std::nullopt;
        return data;
    }
    
    // -----------------------------
    // BOM detection
    // -----------------------------
    static BomInfo DetectBOM(const std::vector<uint8_t>& data) {
        // BOM signatures:
        // UTF-8:       EF BB BF
        // UTF-16 LE:   FF FE
        // UTF-16 BE:   FE FF
        // UTF-32 LE:   FF FE 00 00
        // UTF-32 BE:   00 00 FE FF
        if (data.size() >= 3 &&
            data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
            return { FileEncoding::UTF8_BOM, 3 };
        }
        if (data.size() >= 4 &&
            data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00) {
            return { FileEncoding::UTF32_LE_BOM, 4 };
        }
        if (data.size() >= 4 &&
            data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF) {
            return { FileEncoding::UTF32_BE_BOM, 4 };
        }
        if (data.size() >= 2 &&
            data[0] == 0xFF && data[1] == 0xFE) {
            return { FileEncoding::UTF16_LE_BOM, 2 };
        }
        if (data.size() >= 2 &&
            data[0] == 0xFE && data[1] == 0xFF) {
            return { FileEncoding::UTF16_BE_BOM, 2 };
        }
        return { FileEncoding::Unknown, 0 };
    }
    
    // -----------------------------
    // UTF-8 validation heuristic (no BOM)
    // - returns true if bytes are valid UTF-8 and contain at least one multi-byte sequence
    // -----------------------------
    static bool IsValidUtf8AndHasMultiByte(const std::vector<uint8_t>& data) {
        size_t i = 0;
        bool hasMulti = false;
        while (i < data.size()) {
            uint8_t c = data[i];
            if (c <= 0x7F) {
                // ASCII
                ++i;
                continue;
            }
            // multi-byte sequences
            size_t expected = 0;
            if ((c & 0xE0) == 0xC0) expected = 2;         // 110x xxxx
            else if ((c & 0xF0) == 0xE0) expected = 3;    // 1110 xxxx
            else if ((c & 0xF8) == 0xF0) expected = 4;    // 1111 0xxx
            else return false; // invalid leading byte
    
            if (i + expected > data.size()) return false;
            // check continuation bytes
            for (size_t j = 1; j < expected; ++j) {
                if ((data[i + j] & 0xC0) != 0x80) return false;
            }
            hasMulti = true;
            i += expected;
        }
        // If file is empty or only ASCII, we treat ASCII as not "definitive" UTF-8 for heuristic
        return hasMulti;
    }
    
    // -----------------------------
    // Helpers: swap bytes for BE -> LE conversion
    // -----------------------------
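    static void SwapBytesInPlace16(uint8_t* p, size_t count) {
        // Swap each complete 2-byte unit; a trailing odd byte is left untouched.
        for (size_t i = 0; i + 1 < count; i += 2) {
            std::swap(p[i], p[i + 1]);
        }
    }
    static void SwapBytesInPlace32(uint8_t* p, size_t count) {
        // Swap each complete 4-byte unit; bytes of a trailing partial unit are left untouched.
        for (size_t i = 0; i + 3 < count; i += 4) {
            std::swap(p[i + 0], p[i + 3]);
            std::swap(p[i + 1], p[i + 2]);
        }
    }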
    
    // -----------------------------
    // UTF-32 (LE) -> std::wstring (UTF-16) conversion
    // Handles surrogate pairs
    // -----------------------------
    static std::wstring Utf32LeBytesToWstring(const uint8_t* bytes, size_t byteCount) {
        std::wstring out;
        if (byteCount % 4 != 0) return out;
        size_t count = byteCount / 4;
        out.reserve(count);
        for (size_t i = 0; i < count; ++i) {
            uint32_t cp = uint32_t(bytes[i*4 + 0]) |
                          (uint32_t(bytes[i*4 + 1]) << 8) |
                          (uint32_t(bytes[i*4 + 2]) << 16) |
                          (uint32_t(bytes[i*4 + 3]) << 24);
            if (cp <= 0xFFFF) {
                // BMP (but exclude surrogate code points)
                if (cp >= 0xD800 && cp <= 0xDFFF) {
                    // invalid code point in UTF-32 for direct mapping; replace with U+FFFD
                    out.push_back(0xFFFD);
                } else {
                    out.push_back(static_cast<wchar_t>(cp));
                }
            } else if (cp <= 0x10FFFF) {
                // surrogate pair
                uint32_t v = cp - 0x10000;
                wchar_t high = static_cast<wchar_t>((v >> 10) + 0xD800);
                wchar_t low  = static_cast<wchar_t>((v & 0x3FF) + 0xDC00);
                out.push_back(high);
                out.push_back(low);
            } else {
                // invalid -> replacement char
                out.push_back(0xFFFD);
            }
        }
        return out;
    }
    
    // -----------------------------
    // Convert bytes (after BOM removal/byte-swapping) to std::wstring
    // Uses MultiByteToWideChar for UTF-8 and ANSI; direct reinterpret for UTF-16LE bytes.
    // -----------------------------
    static std::wstring BytesUtf8ToWstring(const uint8_t* bytes, size_t byteCount) {
        if (byteCount == 0) return {};
        // MultiByteToWideChar with CP_UTF8
        int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                         reinterpret_cast<LPCCH>(bytes),
                                         static_cast<int>(byteCount),
                                         nullptr, 0);
        if (needed == 0) {
            // conversion failed
            return {};
        }
        std::wstring out;
        out.resize(needed);
        int got = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      reinterpret_cast<LPCCH>(bytes),
                                      static_cast<int>(byteCount),
                                      out.data(), needed);
        if (got == 0) return {};
        return out;
    }
    
    static std::wstring BytesAnsiToWstring(const uint8_t* bytes, size_t byteCount) {
        if (byteCount == 0) return {};
        int needed = MultiByteToWideChar(CP_ACP, 0,
                                         reinterpret_cast<LPCCH>(bytes),
                                         static_cast<int>(byteCount),
                                         nullptr, 0);
        if (needed == 0) return {};
        std::wstring out;
        out.resize(needed);
        int got = MultiByteToWideChar(CP_ACP, 0,
                                      reinterpret_cast<LPCCH>(bytes),
                                      static_cast<int>(byteCount),
                                      out.data(), needed);
        if (got == 0) return {};
        return out;
    }
    
    static std::wstring BytesUtf16LeToWstring(const uint8_t* bytes, size_t byteCount) {
        if (byteCount % 2 != 0) return {};
        size_t wcharCount = byteCount / 2;
        std::wstring out;
        out.resize(wcharCount);
        // memcpy is valid here because wchar_t is a 2-byte UTF-16LE code unit on Windows
        if (byteCount != 0) memcpy(out.data(), bytes, byteCount);
        return out;
    }
    
    // -----------------------------
    // Public: Read text file into std::wstring (UTF-16 native)
    // Detects BOM, heuristically detects UTF-8 without BOM, otherwise assumes ANSI.
    // -----------------------------
    static std::optional<std::wstring> ReadTextFileToWstring(const std::wstring& path) {
        auto bytesOpt = ReadFileBytes(path);
        if (!bytesOpt) return std::nullopt;
        auto& bytes = *bytesOpt;
        // Detect BOM
        BomInfo b = DetectBOM(bytes);
        size_t offset = b.bom_length;
        switch (b.encoding) {
        case FileEncoding::UTF8_BOM:
            return BytesUtf8ToWstring(bytes.data() + offset, bytes.size() - offset);
        case FileEncoding::UTF16_LE_BOM:
            return BytesUtf16LeToWstring(bytes.data() + offset, bytes.size() - offset);
        case FileEncoding::UTF16_BE_BOM: {
            // byte-swap a copy to little-endian so the LE converter can be reused
            std::vector<uint8_t> copy(bytes.begin() + offset, bytes.end());
            SwapBytesInPlace16(copy.data(), copy.size());
            return BytesUtf16LeToWstring(copy.data(), copy.size());
        }
        case FileEncoding::UTF32_LE_BOM: {
            return Utf32LeBytesToWstring(bytes.data() + offset, bytes.size() - offset);
        }
        case FileEncoding::UTF32_BE_BOM: {
            std::vector<uint8_t> copy(bytes.begin() + offset, bytes.end());
            SwapBytesInPlace32(copy.data(), copy.size());
            return Utf32LeBytesToWstring(copy.data(), copy.size());
        }
        case FileEncoding::Unknown:
        default:
            // No BOM: heuristic for UTF-8 without BOM
            if (IsValidUtf8AndHasMultiByte(bytes)) {
                auto s = BytesUtf8ToWstring(bytes.data(), bytes.size());
                if (!s.empty()) return s;
                // fall through to ANSI if the conversion failed
            }
            // Assume ANSI (current code page)
            return BytesAnsiToWstring(bytes.data(), bytes.size());
        }
    }
    
    // -----------------------------
    // Convert std::wstring to runtime string (std::basic_string<TCHAR>)
    // If UNICODE is defined, returns wstring; otherwise converts to ANSI using WideCharToMultiByte(CP_ACP).
    // -----------------------------
    static std::basic_string<TCHAR> WstringToTString(const std::wstring& w) {
    #ifdef UNICODE
        return w; // TCHAR is wchar_t here, so no conversion is needed
    #else
        if (w.empty()) return {};
        int needed = WideCharToMultiByte(CP_ACP, 0, w.data(), static_cast<int>(w.size()),
                                         nullptr, 0, nullptr, nullptr);
        if (needed == 0) return {};
        std::string s;
        s.resize(needed);
        int got = WideCharToMultiByte(CP_ACP, 0, w.data(), static_cast<int>(w.size()),
                                      s.data(), needed, nullptr, nullptr);
        if (got == 0) return {};
        return s; // TCHAR is char here
    #endif
    }
    
    // -----------------------------
    // Convenience: Read file into runtime string (TCHAR)
    // -----------------------------
    static std::optional<std::basic_string<TCHAR>> ReadTextFileToTString(const std::wstring& path) {
        auto wopt = ReadTextFileToWstring(path);
        if (!wopt) return std::nullopt;
        return WstringToTString(*wopt);
    }
    
    // -----------------------------
    // Example usage (uncomment for a simple test)
    // -----------------------------
    /*
    #include <iostream>
    
    int wmain(int argc, wchar_t** argv) {
        if (argc < 2) {
            std::wcerr << L"Usage: reader <path>\n";
            return 1;
        }
        auto w = ReadTextFileToWstring(argv[1]);
        if (!w) {
            std::wcerr << L"Failed to read file\n";
            return 1;
        }
        // Print as wide (console must support wide output)
        std::wcout << *w << L"\n";
    
        auto t = ReadTextFileToTString(argv[1]);
        if (t) {
    #ifdef UNICODE
            std::wcout << L"TCHAR string length: " << t->size() << L"\n";
    #else
            std::cout << "TCHAR string length: " << t->size() << "\n";
    #endif
        }
        return 0;
    }
    */
    

    Quick usage guidance

    • To read into Windows native UTF‑16: call ReadTextFileToWstring(path).
    • To read into runtime TCHAR string: call ReadTextFileToTString(path).
    • If you need std::string (UTF‑8): convert std::wstring with WideCharToMultiByte(CP_UTF8, ...).
    • The heuristic won’t classify pure ASCII as UTF‑8 (it’s valid UTF‑8 but ambiguous); it requires at least one multi‑byte sequence to claim UTF‑8 without BOM. This reduces false positives for ANSI files that are plain ASCII.
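    The rule in the last bullet can be sketched in portable C++. This is only an illustration of the idea, not the IsValidUtf8AndHasMultiByte function from the listing: the name LooksLikeUtf8 is hypothetical, and the strictness shown (rejecting overlong forms, surrogates and values above U+10FFFF) is an assumption about what "valid" should mean.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true only if the buffer is well-formed UTF-8
// AND contains at least one multi-byte sequence (pure ASCII returns false).
bool LooksLikeUtf8(const std::uint8_t* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    bool sawMultiByte = false;
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b = p[i];
        if (b < 0x80) { ++i; continue; }              // ASCII byte
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;                            // stray continuation or invalid lead byte
        if (n - i < len) return false;                // sequence truncated at end of buffer
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        // Smallest code point that legitimately needs `len` bytes.
        static const std::uint32_t minCp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < minCp[len]) return false;            // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;              // beyond Unicode range
        sawMultiByte = true;
        i += len;
    }
    return sawMultiByte;
}
```

    With this rule, the bytes of "café" encoded as UTF‑8 are accepted, while the same word in a single-byte ANSI code page (a lone 0xE9) is rejected and would take the ANSI path.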

    Limitations & notes

    • The heuristic is intentionally conservative: without a BOM, a file is decoded as UTF‑8 only if it is well-formed UTF‑8 and contains at least one multi-byte sequence; everything else, including ASCII-only files, is decoded as ANSI (CP_ACP).
    • The code assumes wchar_t is 2 bytes (Windows). It uses Win32 APIs (MultiByteToWideChar, WideCharToMultiByte) for conversions.
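    • Error handling is minimal: std::nullopt is returned only when the file cannot be read. A failed conversion yields an empty string, which the caller cannot distinguish from an empty file; you can extend with richer diagnostics if needed.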
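    • Buffer lengths are passed to the Win32 conversion APIs as int, so inputs of 2 GiB or more are not handled correctly. For very large files you may want streaming (chunked) processing rather than reading the entire file into memory.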
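    If the 2-byte wchar_t assumption is a concern, the UTF-32 path can be written against char16_t instead. The following is a minimal sketch, not part of the listing: Utf32LeBytesToU16 is a hypothetical name, and it mirrors the replacement-character policy of Utf32LeBytesToWstring (U+FFFD for surrogates and out-of-range values, empty result for a length that is not a multiple of 4).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical portable counterpart of Utf32LeBytesToWstring:
// decodes UTF-32LE bytes into UTF-16 code units without relying on sizeof(wchar_t).
std::u16string Utf32LeBytesToU16(const std::uint8_t* bytes, std::size_t byteCount) {
    assert(bytes != nullptr || byteCount == 0);
    std::u16string out;
    if (byteCount % 4 != 0) return out;               // malformed length: empty result
    out.reserve(byteCount / 4);
    for (std::size_t i = 0; i < byteCount; i += 4) {
        // Assemble one little-endian 32-bit code point.
        const std::uint32_t cp = std::uint32_t(bytes[i]) |
                                 (std::uint32_t(bytes[i + 1]) << 8) |
                                 (std::uint32_t(bytes[i + 2]) << 16) |
                                 (std::uint32_t(bytes[i + 3]) << 24);
        const bool isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (isSurrogate || cp > 0x10FFFF) {
            out.push_back(char16_t(0xFFFD));          // invalid: replacement character
        } else if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp)); // BMP: one code unit
        } else {
            const std::uint32_t v = cp - 0x10000;     // supplementary: surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

    For example, U+1F600 (bytes 00 F6 01 00) becomes the pair 0xD83D 0xDE00. On Windows the result can be copied into a std::wstring unit by unit, since both types hold 16-bit UTF-16 code units there.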
     

    Last edit: Vidar Hasfjord 2026-02-09