
#2 copy constructor of empty string throws std::bad_array_new_length

1.0
closed
None
2018-03-07
2017-04-08
Grynca
No

Hello,
first of all, thanks for such a great library.
I am currently trying to use it in my codebase instead of std::string.
It mostly works, but I ran into one small problem.

utf8_string us(""); 
utf8_string us2(us);

This throws std::bad_array_new_length.

Is this intended behavior or a bug?

Thanks in advance for your response.
Grynca

Discussion

  • DuffsDevice

    DuffsDevice - 2017-04-09

    Thank you! That really encourages me!
    This is definitely a bug, but I can't reproduce it (yet). Do you have the latest version? Sorry for the somewhat silly question :)

    What compilation flags do you use?

    Greetings,
    Jakob

     

    Last edit: DuffsDevice 2017-04-09
  • Grynca

    Grynca - 2017-04-09

    Yes,
    I tested it with version tinyutf8_1.35.zip (BSD-3), dated 2017-02-12.
    I was compiling in CLion with GCC, but I also reproduced it with a simple

    g++ tinyutf8.cpp test.cpp -I . -std=c++14 -o test
    

    The problem seems to be that indices_len is set to -1 here:

    utf8_string::utf8_string( const char* str , size_type len ) :
        utf8_string()
    {
        // Reset some attributes, or else 'get_num_bytes_of_utf8_char' will not work, since it thinks the string is empty
        this->buffer_len = -1;
        this->indices_len = -1;
        if( str && str[0] )
        {
        ...
        }
    }
    

    and is then used as the allocation size in the copy constructor, where it is converted to a huge size_t value:

    indices_of_multibyte( str.indices_len ? new size_type[str.indices_len] : nullptr )
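To see why the copy fails, here is a small self-contained sketch that mirrors the quoted expression outside the library. It assumes size_type is std::size_t (not confirmed for tinyutf8 1.35), and `try_copy_indices` and `sentinel` are hypothetical names. Because -1 is non-zero, the ternary takes the allocation branch, and the requested length times sizeof(size_type) overflows. With g++ the new-expression then throws std::bad_array_new_length, matching the report; other compilers may throw plain std::bad_alloc instead, which is why both are caught.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Assumption: size_type is std::size_t, as for std::string.
using size_type = std::size_t;

// Mirrors the quoted copy-constructor expression
//   str.indices_len ? new size_type[str.indices_len] : nullptr
// and reports which exception, if any, the allocation raises.
std::string try_copy_indices(size_type indices_len) {
    try {
        size_type* indices = indices_len ? new size_type[indices_len] : nullptr;
        delete[] indices;  // deleting a null pointer is a no-op
        return "ok";
    } catch (const std::bad_array_new_length&) {
        // g++: indices_len * sizeof(size_type) overflows, so new[] rejects the length
        return "bad_array_new_length";
    } catch (const std::bad_alloc&) {
        // some compilers report the overflow (or a genuine out-of-memory) as bad_alloc
        return "bad_alloc";
    }
}

// -1 stored in an unsigned size_type wraps around to the largest representable value.
const size_type sentinel = static_cast<size_type>(-1);
```

Calling `try_copy_indices(0)` returns "ok" (nothing is allocated), while `try_copy_indices(sentinel)` never succeeds, which is exactly the situation of a copied empty string whose indices_len still holds the sentinel.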
    

    My current workaround was to change the first constructor like this:

    inline utf8_string::utf8_string( const char* str , size_type len ) :
            utf8_string()
    {
        // Reset some attributes, or else 'get_num_bytes_of_utf8_char' will not work, since it thinks the string is empty
        this->buffer_len = -1;
        this->indices_len = -1;
    
        if(str)
        {
            if (!str[0]) {
                clear();
            }
            else {
            ...
            }
        }
     }
    

    It seems to work, but I am not really sure whether it introduces new problems somewhere else ;]
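The invariant the workaround restores can be illustrated with a toy class. This is a hypothetical, heavily simplified stand-in (`toy_utf8_string`, `bytes` and `multibyte_count` are made-up names), not tiny_utf8's real layout: it only shows that once an empty or null input leaves indices_len at 0 instead of the -1 sentinel, the same `len ? new T[len] : nullptr` pattern in the copy constructor becomes safe.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, heavily simplified stand-in for utf8_string; NOT tiny_utf8's code.
// It keeps only the byte buffer and a table with the byte offset of each
// multibyte character, which is enough to show the empty-string invariant.
class toy_utf8_string {
public:
    using size_type = std::size_t;

    toy_utf8_string() = default;

    explicit toy_utf8_string(const char* str) {
        // Sentinels, mirroring the quoted constructor.
        buffer_len = static_cast<size_type>(-1);
        indices_len = static_cast<size_type>(-1);
        if (!str || !str[0]) {
            clear();  // the workaround: null or empty input must end in a valid empty state
            return;
        }
        buffer_len = std::strlen(str);
        buffer = new char[buffer_len];
        std::memcpy(buffer, str, buffer_len);

        // Count UTF-8 lead bytes of multibyte sequences (11xxxxxx), then store their offsets.
        indices_len = 0;
        for (size_type i = 0; i < buffer_len; ++i)
            if ((static_cast<unsigned char>(str[i]) & 0xC0) == 0xC0) ++indices_len;
        if (indices_len) {
            indices = new size_type[indices_len];
            size_type k = 0;
            for (size_type i = 0; i < buffer_len; ++i)
                if ((static_cast<unsigned char>(str[i]) & 0xC0) == 0xC0) indices[k++] = i;
        }
    }

    // Same allocation pattern as the quoted copy constructor; safe only because
    // indices_len can no longer hold the -1 sentinel. (Not exception-safe if the
    // second allocation throws; a real implementation would guard that.)
    toy_utf8_string(const toy_utf8_string& other)
        : buffer_len(other.buffer_len),
          indices_len(other.indices_len),
          buffer(other.buffer_len ? new char[other.buffer_len] : nullptr),
          indices(other.indices_len ? new size_type[other.indices_len] : nullptr) {
        if (buffer_len) std::memcpy(buffer, other.buffer, buffer_len);
        if (indices_len) std::memcpy(indices, other.indices, indices_len * sizeof(size_type));
    }

    toy_utf8_string& operator=(const toy_utf8_string&) = delete;  // omitted for brevity

    ~toy_utf8_string() { clear(); }

    // Releases both arrays and resets the lengths to a consistent empty state.
    void clear() {
        delete[] buffer;
        delete[] indices;
        buffer = nullptr;
        indices = nullptr;
        buffer_len = 0;
        indices_len = 0;
    }

    size_type bytes() const { return buffer_len; }
    size_type multibyte_count() const { return indices_len; }

private:
    size_type buffer_len = 0;
    size_type indices_len = 0;
    char* buffer = nullptr;
    size_type* indices = nullptr;
};
```

With this toy, the two lines from the original report (construct from "" and then copy) produce an empty copy instead of an exception.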

    Thanks for the quick response, btw.

     

    Last edit: Grynca 2017-04-09
  • DuffsDevice

    DuffsDevice - 2017-04-09

    Hi Grynca,

    Now I know why I couldn't reproduce it: I wasn't on the latest release, but on the current nightly build. Shame on me :) The bug is fixed there, but the nightly may have some other bugs now :|

    You can now download the latest nightly in the Files section. I'm currently writing my bachelor's thesis in computer science, so development of tiny_utf8 is not going too well at the moment; I'm sorry about that.
    In my bachelor's thesis I discuss easy ways to test C++ programs. For that, I wrote a testing framework, which I plan to apply to tiny_utf8 as well. So far, though, I have not done any unit testing, which I'd love to do. Maybe I'll find some time for that within the next 3 weeks.

    You can at least try the nightly build: it simplifies a lot while being a little more efficient in both space and time. The table of indices of codepoints > 127 has been moved to the end of the buffer (which reduces the number of pointers), and memory allocations have been greatly reduced by using a capacity field instead. Furthermore, the multibyte indices table is now RLE as well, in order to use the datatype with the smallest possible width.
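The last point, storing multibyte indices in the narrowest integer type, can be illustrated with a small sketch. It assumes each table entry is a byte offset into the buffer, so the entry width only has to cover the buffer length. `index_width_for` and `table_bytes` are hypothetical names, this is not the nightly's actual code, and the RLE part of its scheme is not modelled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT tiny_utf8's implementation: the number of bytes each
// table entry needs so that every byte offset in [0, buffer_len) fits.
std::size_t index_width_for(std::size_t buffer_len) {
    const std::uint64_t n = buffer_len;  // widen so the shifts below are valid on 32-bit targets
    if (n <= (std::uint64_t{1} << 8))  return 1;  // offsets 0..255 fit in std::uint8_t
    if (n <= (std::uint64_t{1} << 16)) return 2;  // offsets fit in std::uint16_t
    if (n <= (std::uint64_t{1} << 32)) return 4;  // offsets fit in std::uint32_t
    return 8;                                     // fall back to std::uint64_t
}

// Size in bytes of a table holding `entries` offsets into a buffer of `buffer_len` bytes.
std::size_t table_bytes(std::size_t entries, std::size_t buffer_len) {
    return entries * index_width_for(buffer_len);
}
```

For a short string, each entry then costs 1 or 2 bytes instead of the 8 bytes a std::size_t index would take on a typical 64-bit platform.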

    I hope this helps for now...

    Jakob

     
  • DuffsDevice

    DuffsDevice - 2018-03-07

    Now fixed on GitHub :D

     
  • DuffsDevice

    DuffsDevice - 2018-03-07
    • status: open --> closed
     
