tiny-utf8 / Tickets / #2 copy constructor of empty string throws std::bad_array_new_length

DuffsDevice - 2017-04-09

Thank you! That really encourages me!
This is definitely a bug, but I can't reproduce it (yet). Do you have the latest version? Sorry for this kind of stupid question :)

What compilation flags do you use?

Greetings,
Jakob

Last edit: DuffsDevice 2017-04-09

Yes,
I tested it with version tinyutf8_1.35.zip (BSD-3), 2017-02-12.
I was compiling in CLion with gcc, but I also reproduced it with a simple

g++ tinyutf8.cpp test.cpp -I . -std=c++14 -o test

The problem seems to be that indices_len is set to -1 here:

utf8_string::utf8_string( const char* str , size_type len ) :
    utf8_string()
{
    // Reset some attributes, or else 'get_num_bytes_of_utf8_char' will not work, since it thinks the string is empty
    this->buffer_len = -1;
    this->indices_len = -1;
    if( str && str[0] )
    {
    ...
    }
}

and is then used in the copy constructor as the allocation size (converted to size_t, -1 becomes a huge integer):

indices_of_multibyte( str.indices_len ? new size_type[str.indices_len] : nullptr )
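To make the failure mode concrete, here is a minimal sketch that is independent of tiny-utf8. The helper try_allocate and the enum alloc_result are hypothetical names invented for this illustration; the sketch also assumes size_type is std::size_t. It only mirrors the allocation expression quoted above and reports how it ended. GCC reported std::bad_array_new_length in this ticket; other toolchains might surface the same overflow as a plain std::bad_alloc, so both are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper, not part of tiny-utf8: repeats the copy constructor's
// allocation expression and reports how it ended.
enum class alloc_result { ok, bad_length, bad_alloc };

alloc_result try_allocate(std::size_t indices_len) {
    try {
        // Same shape as: str.indices_len ? new size_type[str.indices_len] : nullptr
        // The volatile pointer keeps an optimizer from eliding the allocation.
        std::size_t* volatile p =
            indices_len ? new std::size_t[indices_len] : nullptr;
        delete[] p;
        return alloc_result::ok;
    } catch (const std::bad_array_new_length&) {
        // The requested element count cannot be represented as a byte size.
        return alloc_result::bad_length;
    } catch (const std::bad_alloc&) {
        // Some implementations report the overflow as a generic allocation failure.
        return alloc_result::bad_alloc;
    }
}
```

Calling try_allocate(0) takes the nullptr branch and succeeds, while try_allocate(static_cast<std::size_t>(-1)) requests more bytes than size_t can express and therefore cannot succeed.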

My current workaround was to change the first constructor like this:

inline utf8_string::utf8_string( const char* str , size_type len ) :
    utf8_string()
{
    // Reset some attributes, or else 'get_num_bytes_of_utf8_char' will not work, since it thinks the string is empty
    this->buffer_len = -1;
    this->indices_len = -1;

    if(str)
    {
        if (!str[0]) {
            clear();
        }
        else {
            ...
        }
    }
}

It seems to work, but I am not really sure whether it will introduce new problems somewhere else ;]
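One way to reason about that worry is to check that no construction path leaves the -1 sentinel behind. The sketch below uses a hypothetical stand-in class, mini_string, which is not tiny-utf8's real implementation; it only keeps the indices_len and indices_of_multibyte members named in this ticket and assumes that clear() would reset the length to 0. Note that in the workaround as posted, a null str would skip clear() and keep the sentinel, so the sketch resets it for both the null and the empty case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for illustration only; not tiny-utf8's utf8_string.
struct mini_string {
    using size_type = std::size_t;

    size_type  indices_len = 0;
    size_type* indices_of_multibyte = nullptr;

    explicit mini_string(const char* str) {
        indices_len = static_cast<size_type>(-1);  // sentinel, as in the ticket
        if (!str || !str[0]) {
            indices_len = 0;  // reset the sentinel on every "empty" path
            return;
        }
        const size_type n = std::strlen(str);
        // Simplification: treat every byte >= 0xC0 as the start of a multibyte sequence.
        size_type count = 0;
        for (size_type i = 0; i < n; ++i)
            if (static_cast<unsigned char>(str[i]) >= 0xC0) ++count;
        indices_len = count;
        if (count) {
            indices_of_multibyte = new size_type[count];
            size_type k = 0;
            for (size_type i = 0; i < n; ++i)
                if (static_cast<unsigned char>(str[i]) >= 0xC0)
                    indices_of_multibyte[k++] = i;
        }
    }

    // Same allocation expression as the copy constructor quoted in the ticket.
    mini_string(const mini_string& other)
        : indices_len(other.indices_len),
          indices_of_multibyte(other.indices_len
                                   ? new size_type[other.indices_len]
                                   : nullptr) {
        if (indices_len)
            std::copy_n(other.indices_of_multibyte, indices_len,
                        indices_of_multibyte);
    }

    mini_string& operator=(const mini_string&) = delete;  // kept out of the sketch

    ~mini_string() { delete[] indices_of_multibyte; }
};
```

With the sentinel reset up front, copying a mini_string built from "" or from a null pointer takes the nullptr branch of the allocation expression instead of requesting a huge array.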

Thanks for the quick response, by the way.

Last edit: Grynca 2017-04-09

DuffsDevice - 2017-04-09

Hi Grynca,

I now know why I couldn't reproduce it: I was not on the latest release, but on the current nightly build. Shame on me :) The bug is fixed there, but the nightly may have some other bugs now :|

You can download the latest nightly in the file section now. I'm currently writing my bachelor's thesis in computer science, so the development of tiny_utf8 is not going too well at the moment; I'm sorry for that.
In my bachelor's thesis I discuss ways to test C++ programs the easy way. In the course of that, I wrote a testing framework, which I plan to apply to tiny_utf8 as well. But for now, I have not done any unit testing, which I'd love to do. Maybe I will find some time for that within the next 3 weeks.

You can at least try the nightly build; it simplifies a lot while being a little bit more efficient in both space and time. The table of indices to codepoints > 127 has been moved to the end of the buffer (which brings down the number of pointers), and memory allocations have been greatly reduced by using a capacity field instead. Furthermore, the multibyte indices table is now RLE as well, in order to use the data type with the least possible width.

I hope this helps for now...

Jakob

DuffsDevice - 2018-03-07

Now fixed on GitHub :D

DuffsDevice - 2018-03-07

status: open --> closed
