
#813 std::cout splits up UTF-8 characters so that garbage characters are displayed

Milestone: v1.0 (example)
Status: open
Owner: nobody
Priority: 5
Updated: 2019-09-25
Created: 2019-09-24
Creator: xamid
Private: No

std::cout does not behave well on UTF-8 encoded strings longer than 1024 bytes. Such a string is split into chunks, and each chunk is printed before the multi-byte characters cut at a chunk boundary are reassembled, even though the splitting happens internally in std::cout when a single large string is passed. This code should explain the issue:

#include <iostream>
#include <sstream>

void myFaultyOutput();
void simulatedFaultyBehavior();

int main()
{
    myFaultyOutput();
    //simulatedFaultyBehavior();
}

void myFaultyOutput() {
    std::stringstream ss; // Note that ss is built correctly (which could be shown by saving ss.str() to a file).
    ss << "...";
    for (int i = 0; i < 20; i++) {
        for (int j = 0; j < 341; j++)
            ss << u8"\u301A";
        ss << "\n..";
    }
    std::cout << ss.str() << std::endl; // Problem occurs here, with cout.
    // Note that converting ss.str() to std::wstring and using std::wcout results in std::wcout
    // displaying nothing at all, not even later ASCII output, until the application is restarted.
}

// Reproduces the problem on well-behaved systems. Imagine the output without the newlines, while the wrongly split characters remain.
void simulatedFaultyBehavior() {
    std::stringstream ss;
    int amount = 2000;
    for (int j = 0; j < amount; j++)
        ss << u8"\u301A";
    std::string s = ss.str();
    std::cout << "s.length(): " << s.length() << std::endl; // amount * 3
    while (s.length() > 1024) {
        std::cout << s.substr(0, 1024) << std::endl;
        s = s.substr(1024);
    }
    std::cout << s << std::endl; // std::endl already flushes the stream
}

I also documented it here: https://stackoverflow.com/q/58080623/3410351, where information helpful for a fix might appear.
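To make the boundary arithmetic concrete, here is a minimal sketch that checks whether a cut at byte offset 1024 falls inside a character for the first line built by myFaultyOutput(). The helper names (isContinuationByte, splitsCharacter, buildFirstLine) are hypothetical and not part of the original report, and the 1024-byte chunk size is taken from the description above rather than from any library documentation. U+301A is written as explicit bytes so the sketch also compiles under C++20, where u8 literals have a different type.

```cpp
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuationByte(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// True if cutting s directly before byte offset pos would split a multi-byte character.
bool splitsCharacter(const std::string& s, std::size_t pos) {
    return pos < s.size() && isContinuationByte(static_cast<unsigned char>(s[pos]));
}

// Hypothetical helper: rebuilds the first output line of myFaultyOutput(),
// i.e. "..." followed by 341 copies of U+301A (3 bytes each in UTF-8).
std::string buildFirstLine() {
    std::string line = "...";
    for (int j = 0; j < 341; j++)
        line += "\xE3\x80\x9A"; // UTF-8 encoding of U+301A
    return line;
}
```

The line is 3 + 341 * 3 = 1026 bytes long, and offset 1024 is the second byte of the last character, so splitsCharacter(buildFirstLine(), 1024) reports that a 1024-byte cut lands inside a character.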

Discussion

  • xamid

    xamid - 2019-09-24

    I used app.exe > out.txt 2>&1, which generates a file without these formatting issues. So it seems that std::cout always does this splitting, and whatever receives the byte sequence is responsible for reassembling the characters correctly? Unfortunately, nothing on Windows seems to handle this except file streams.
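If the receiving side does not reassemble split characters, one possible workaround is to do the splitting yourself, on character boundaries. The following is only a sketch under stated assumptions: utf8SafeChunks is a hypothetical helper (not a standard or vendor API), the input is assumed to be valid UTF-8, and whether writing such chunks one at a time actually avoids the garbled console output has not been verified here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split s into pieces of at most maxBytes bytes without cutting inside a
// UTF-8 character. Assumes s is valid UTF-8 and maxBytes >= 4.
std::vector<std::string> utf8SafeChunks(const std::string& s, std::size_t maxBytes) {
    if (maxBytes == 0)
        maxBytes = 1; // guard so the loop below always makes progress
    std::vector<std::string> chunks;
    std::size_t start = 0;
    while (start < s.size()) {
        std::size_t end = start + maxBytes;
        if (end >= s.size()) {
            end = s.size();
        } else {
            // Move the cut point back while it sits on a continuation byte (10xxxxxx).
            while (end > start && (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80)
                --end;
            if (end == start)
                end = start + maxBytes; // invalid input: fall back to a hard cut
        }
        chunks.push_back(s.substr(start, end - start));
        start = end;
    }
    return chunks;
}
```

Each chunk returned by utf8SafeChunks(ss.str(), 1024) could then be written with std::cout << chunk << std::flush, so that no single write ends in the middle of a character.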

     
  • xamid

    xamid - 2019-09-25

    I just posted an answer to my issue which I think clarifies it a lot: https://stackoverflow.com/a/58099629

     
