#174 Array doubling wrong in vector, string, etc.

closed-accepted
5
2014-08-16
2008-04-14
Herve Bronnimann
No

The array doubling computation in vector, string, and possibly all the other containers that use capacity is typically as follows:

const size_type __old_size = size();
size_type __len = __old_size + (max)(__old_size, __n);

This is plain wrong. When sizeof(T) is 1, old_size is greater than 2^31 = max_size() / 2 and n is a small number (e.g., 1 for push_back), clearly old_size is larger than n, resulting in old_size + old_size which overflows and (wrapping around) ends up at less than the old_size.

The "correct" code in such instances is:

const size_type __old_size = size();
size_type __len = (max)(2 * __old_size, __old_size + __n);

And I say correct between quotes, because this is still wrong for, e.g., push_back in strings with more than max_size()/2 length, as it will produce a quadratic behavior (resize one by one with realloc at each resize). In order to really take care of the overflows and of this problem, one ought to (I hope this is the really correct code):

if (n > max_size() - __old_size
throw length_error();
const size_type __old_size = size();
size_type __len = (__n < __old_size) ? 2 * __old_size : old_size + n;
if (__len > max_size() || __len < __old_size)
__len = max_size(); // overflow

In addition, when the allocate(__len) throws a bad_alloc, it would be graceful to catch that exception and try to allocate(__old_size + n). Perhaps doing that only if __len == max_size().

The bug is exposable with a short program such as:

main()
{
typedef std::string S;
S s; s.resize(2 * (s.max_size() / 3), 'a');
s.push_back('a'); // blows
}

I can't even survey all the places where that happens, I'll

Discussion

  • Logged In: YES
    user_id=36092
    Originator: YES

    Ooops, I left the previous sentence unfinished... I meant: I'll trust the stlport maintainers to know their code :)

     
    • assigned_to: nobody --> dums
    • status: open --> open-accepted
     
  • Logged In: YES
    user_id=1096600
    Originator: NO

    Thanks for trusting us but it is not really necessary. This problem simply do not apply for node oriented containers. In fact only vector/string containers are not node oriented so this problem is limited to those 2 containers.

    Moreover I don't think we need to catch bad_alloc exception and allocate less memory in this case. If you do so you will really consume all available memory making the system almost unusable. I prefer STLport to throw bad_alloc exception when it is still possible to work with it.

    So the only problem left is the wrong size computation for containers containing numerous values. Once again this is not so easy to reproduce. For instance I am working on a system with only 500 Mo Ram, I have bad_alloc exception before I am able to fall on the size computation issue. Of course we still have to fix it but it is not as critical as it might seem.

     
  • Logged In: YES
    user_id=36092
    Originator: YES

    Re: catching bad_alloc, I do agree. Re: reproducible, I've been able to obtain a segfault on a machine with > 4GB of ram. Re: severity: it is a nuisance, but I agree it's fairly circumstancial. If it's just vector and string, then it's easy to fix. That's good to know. (There is no doubling in _hashtable, though?)

     
  • Logged In: YES
    user_id=615813
    Originator: NO

    Well, what about change doubling (in places where it happens) to following algorithm?

    if sizeof(T) > PAGE_SIZE
    allocate( old_size + n );
    else
    if old_size < PAGE_SIZE
    allocate( old_size + max(old_size, n) );
    else
    allocate( old_size + (n / PAGE_SIZE + 1) * PAGE_SIZE );

    But I'm not sure, because this is a bit like what deque has. In any case some performance measure should be made.

     
    • status: open-accepted --> pending-accepted
     
  • Logged In: YES
    user_id=1096600
    Originator: NO

    I apply your fix. The major problem of the modif is that I made a _M_compute_next_size method that contains your proposed computation algo and I used it everywhere. Doing so result in having a length_error exception each time container is supposed to be larger than max_size. C++ Standard is not clear on exception specifications so I don't know if it is bad...

    I also discover several other issues in basic_string implementation like this one for instance in basic_string::insert method, previous code was:

    if (size() > max_size() - __s.size())
    this->_M_throw_length_error();

    The problem of this test is that __s.size() might be higher than this->max_size() so max_size() - __s.size() might be negative which, if transformed in an unsigned value, gives a large value. So I fix it like this:

    if (__s.size() > max_size() - size())
    this->_M_throw_length_error();

    this->size() is necessarily lower or equal to this->max_size() so there is no problem.

    About resize algo, your algo will also delay emission of bad_alloc up to a point where memory will be very limited making management of the bad_alloc exception almost impossible. But maybe there systems better than others on this point. What I saw is that on my windows XP if I try to allocate a big memory chunk the system is like frozen, I expect Linux to be better in this situation.

    About

     
    • status: pending-accepted --> closed-accepted
     
  • Logged In: YES
    user_id=1312539
    Originator: NO

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).