copy constructor of empty string throws std::bad_array_new_length
Lightweight C++11 library for embedding Unicode
Hello,
First of all, thanks for such a great library.
I am currently trying to use it in my codebase instead of std::string.
It mostly works, but I encountered one small problem.
utf8_string us("");
utf8_string us2(us);
This throws std::bad_array_new_length.
Is this the desired behavior or a bug?
Thanks in advance for your response.
Grynca
Thank you! That really encourages me!
This is definitely a bug, but I can't reproduce it (yet). Do you have the latest version? Sorry for this kind of stupid question :)
What compilation flags do you use?
Greetings,
Jakob
Last edit: DuffsDevice 2017-04-09
Yes,
I tested it with version tinyutf8_1.35.zip (BSD-3) 2017-02-12.
I was compiling in CLion with gcc, but reproduced it with simple
The problem seems to be indices_len being set to -1 here:
and then used in the copy constructor as the allocation size (converted to size_t, it becomes a huge integer):
My current workaround was to change the first constructor like this:
It seems to work, but I am not really sure whether it will introduce new problems somewhere else ;]
Thanks for the quick response, btw.
Last edit: Grynca 2017-04-09
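For anyone following along, here is a minimal standalone sketch of the mechanism described above: a signed length of -1 that is converted to size_t and then used as an array allocation size. This is not tiny-utf8's code. The names IndexTable, naive_copy, guarded_copy and CopyResult are made up for illustration, and the guard shown is only one possible way to avoid the problem, not necessarily the workaround or the fix that was actually applied. Depending on the compiler, the failure may surface as std::bad_array_new_length or as its base class std::bad_alloc, so the sketch reports both.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the string's internal state.
// This is NOT tiny-utf8's real layout.
struct IndexTable {
    std::ptrdiff_t indices_len; // signed length; -1 plays the role of the bad value
};

enum class CopyResult { Ok, BadArrayNewLength, BadAlloc };

// The pointer escapes through this volatile sink so the compiler
// cannot optimize the allocation away.
static void* volatile g_sink = nullptr;

// Naive copy: uses the signed length directly as the allocation size.
CopyResult naive_copy(const IndexTable& src) {
    try {
        // -1 converted to an unsigned type becomes the largest size_t value.
        std::size_t n = static_cast<std::size_t>(src.indices_len);
        std::size_t* buf = new std::size_t[n]; // n * sizeof(size_t) is not representable
        assert(buf != nullptr);
        g_sink = buf;
        delete[] buf;
        return CopyResult::Ok;
    } catch (const std::bad_array_new_length&) {
        return CopyResult::BadArrayNewLength;
    } catch (const std::bad_alloc&) {
        return CopyResult::BadAlloc;
    }
}

// Guarded copy (an assumption, one possible fix): treat a non-positive
// length as "nothing to copy" and skip the allocation entirely.
CopyResult guarded_copy(const IndexTable& src) {
    if (src.indices_len <= 0) {
        return CopyResult::Ok;
    }
    return naive_copy(src);
}
```

Calling naive_copy with indices_len == -1 reports an allocation failure, while guarded_copy with the same input returns CopyResult::Ok because it never reaches the allocation.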
Hi Grynca,
I know now why I couldn't reproduce it: I was not on the latest release, but on the current nightly build. Shame on me :) The bug is fixed there, but the nightly may have some other bugs now :|
You can download the latest nightly in the file section now. I'm currently writing my bachelor's thesis in computer science, so the development of tiny_utf8 is not going too well at the moment. I'm sorry for that.
In my bachelor's thesis I discuss ways to test C++ programs the easy way. As part of that, I wrote a testing framework, which I plan to apply to tiny_utf8 as well. For now, though, I have not done any unit testing, which I'd love to do. Maybe I'll find some time for that within the next 3 weeks.
You can at least try the nightly build; it simplifies a lot while being a little bit more efficient in both space and time. The table of indices to codepoints > 127 has been moved to the end of the buffer (which brings down the number of pointers), and memory allocations have been greatly reduced by using a capacity field instead. Furthermore, the multibyte indices table is now RLE as well, in order to use the data type with the least possible width.
I hope this helps for now...
Jakob
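As a rough illustration of the "data type with the least possible width" idea mentioned above, the sketch below picks the narrowest unsigned integer width that can still hold the largest index to be stored. It is only one reading of that sentence. The helpers index_width and table_bytes are hypothetical and are not part of tiny-utf8's API, and the nightly build's real layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes per table entry needed to store any
// index in the range [0, max_index].
constexpr std::size_t index_width(std::uint64_t max_index) {
    return max_index <= UINT8_MAX  ? 1
         : max_index <= UINT16_MAX ? 2
         : max_index <= UINT32_MAX ? 4
                                   : 8;
}

// Hypothetical helper: total bytes for a table of `entries` indices
// when each entry uses the narrowest sufficient width.
std::size_t table_bytes(std::size_t entries, std::uint64_t max_index) {
    const std::size_t width = index_width(max_index);
    assert(entries <= SIZE_MAX / width); // guard against size overflow
    return entries * width;
}
```

With this scheme, a short string whose largest index fits in one byte pays one byte per table entry rather than sizeof(size_t), which is where the space saving would come from.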
Now fixed on GitHub :D