#161 Edge case: string().find(string())

closed-fixed
2
2008-05-11
2008-01-16
howaradr
No

See discussion in thread started with https://sourceforge.net/forum/message.php?msg_id=4723253

Summary thereof:

>>> ISO/IEC 14882-2003 extract
21.3.6.1 basic_string::find [lib.string::find]
size_type find(const basic_string<charT,traits,Allocator>& str,
size_type pos = 0) const;
Effects: Determines the lowest position xpos, if possible, such that both of the following conditions
1
obtain:
— pos <= xpos and xpos + str.size() <= size();
— at(xpos+I) == str.at(I) for all elements I of the string controlled by str.
Returns: xpos if the function can determine such a value for xpos. Otherwise, returns npos.
>>> extract ends

Hence the standard says basic_string::find() returns 0 if str.size()==0 and pos==0.

STLport returns npos in this case.

The relevant code in SVN Trunk version 3388 of stlport\stl\_string.c is below. It is a different overload of find but is used by STLport to implement the overload at issue. It returns npos if this->size()==0, regardless of __n and __pos.

517 template <class _CharT, class _Traits, class _Alloc> __size_type__
518 basic_string<_CharT,_Traits,_Alloc> ::find(const _CharT* __s, size_type __pos,
519 size_type __n) const {
520 const size_t __len = size();
521 if (__pos >= __len || __pos + __n > __len)
522 return npos;

The following addition to the implementation of void StringTest::find() in test\unit\string_test.cpp would form a test case:
CPPUNIT_ASSERT( string().find(string())==0 );

At present there is no test of this case in SVN Trunk StringTest::find().

Discussion

  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-16
    • assigned_to: nobody --> complement
    • status: open --> closed-invalid
     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-16

    Logged In: YES
    user_id=615813
    Originator: NO

    You ignore "if the function can determine such a value for xpos" of Standard.

    For size 0 "at(xpos+I) == str.at(I)" can't be accomplished. So I don't see standard
    fault, and I treat you case as "can't determine such a value".

    string s;

    string::size_type p = s.find( "", 0, 0 );

    ...

    if ( p != string::npos ) { // normal
    char ch = s[p]; // Arghhhhhhhhhh
    }

     
  • Ulrich Eckhardt

    Ulrich Eckhardt - 2008-01-16

    Logged In: YES
    user_id=1522877
    Originator: NO

    Yes, this is indeed a valid bug.

    However, I would go one step further: I'd say the standard says it returns 'pos' if the searched sequence 'str' is empty and 'pos' is a valid index.

    Concerning the danger of returning the invalid index 0 from 'string("").find("")', this is just another case that are N+1 places where you can find the empty string in a string of N elements, e.g. for a 1-element string it can be found at position 0 and 1. Obviously, this presents a danger, because position 1 is not part of the string, but that is a problem with the standard and if user actually uses that index, they are lost anyway. OTOH, if they use 'str.size()' indices starting at the returned position, this works, because 'str.size()==0'.

    Further, look at the definition of e.g. substr(): even there, it requires 'pos <= size()' and _not_ 'pos < size()'. IOW, this behaviour is consistent throughout the string class.

    BTW: Petr, it would have been nice if you had not downright closed the bugreport only because _you_ disagree. This is just respectless.

    Uli

     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-16

    Logged In: YES
    user_id=615813
    Originator: NO

    Again,, not a bug. Remain closed.

    Reason:

    <snip>
    21.3.6.1 basic_string::find [lib.string::find]
    size_type find(const basic_string<charT,traits,Allocator>& str,
    size_type pos = 0) const;
    Effects: Determines the lowest position xpos, if possible, such that both of the
    following conditions obtain:
    - pos <= xpos and xpos + str.size() <= size();
    - at(xpos+I) == str.at(I) for all elements I of the string controlled by str.

    Returns: xpos if the function can determine such a value for xpos. Otherwise, returns npos.

    Notes: Uses traits::eq().
    </snip>

    substr() isn't argument: it hasn't chance to return invalid position like here:

    string s;

    string::size_type p = s.find( "", 0, 0 );

    ...

    if ( p != string::npos ) { // normal
    char ch = s[p]; // Arghhhhhhhhhh
    }

    Reference 'like search', not argument, because for search

    <snip>
    Effects: Finds a subsequence of equal values in a sequence.
    Returns: The first iterator i in the range [first1, last1 - (last2 - first2)) such that for
    any non-negative integer n less than last2 - first2 the following corresponding conditions hold:
    *(i + n) == *(first2 + n), pred(*(i + n), *(first2 + n)) != false.
    Returns last1 if no such iterator is found.
    </snip>

    requirements a bit differ from string's one (return iterator, that may be end(), not integer).

     
  • Francois Dumont

    Francois Dumont - 2008-01-17

    Logged In: YES
    user_id=1096600
    Originator: NO

    Well, I still have to study the Standard by myself but what I can say for the moment is that the code you post Petr is not a problem:

    if ( p != string::npos ) { // normal
    char ch = s[p]; // Arghhhhhhhhhh
    }

    There is no Arghhhhhh here. You know that string()[0] is valid and returns the trailing null character. I am even surprise that for the moment string().at(0) throws and exception, I will check that too.

    Whatever, as signal howaradr, this is an edge case. The best would be to see how other implementations do behave and do the same. We already know how MSVC implementation behave, we only need to check how GLibC++ behaves and if they agree do the same. Otherwise we will have to take our own decision.

     
  • Francois Dumont

    Francois Dumont - 2008-01-17
    • status: closed-invalid --> open-invalid
     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-17

    Logged In: YES
    user_id=615813
    Originator: NO

    > There is no Arghhhhhh here. You know that string()[0] is valid and returns
    > the trailing null character. I am even surprise that for the moment
    > string().at(0) throws and exception, I will check that too.

    Nice. Out-of-string-bound and presence of trailing 0 (implementation specific, we had
    another variant few time ago) is fine?!

    Just say to me, what the _last_ char in 'string s( "1234" );', 0x34 or 0x00?
    And then, _last_ char in 'string s( "" );'?

     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-17
    • status: open-invalid --> closed-invalid
     
  • Ulrich Eckhardt

    Ulrich Eckhardt - 2008-01-17

    Logged In: YES
    user_id=1522877
    Originator: NO

    Francois, I don't agree with your reasoning. The problem is that the code in question was already broken:

    p = x.find(y);
    if(p != string::npos) {
    char c = x[p];
    }

    The problem here is the assumption that y's size is at least one. In fact the successful find() call means that y.size() elements of x from the returned position all match the content of y, so instead of calling x[p] you could as well call y[0], which are both wrong without checking against the according size.

    So, the correct code is this:

    p = x.find(y);
    if(p != string::npos) {
    for( i=0; i!=y.size(); ++i)
    whatever(x[p+i]);
    }

    Or just check against an empty sequence before searching for it because that typically doesn't make any sense in the application logic. Still, returning zero from find() is not unsafe, except for code that is unsafe anyway.

    Hmmm, there is another reason that s[s.size()] or s.at(s.size()) should not be called: it could well be that this is used on a non-const string, and in that case it is never correct, IIRC.

    Uli

     
  • howaradr

    howaradr - 2008-01-17

    Logged In: YES
    user_id=1982193
    Originator: YES

    g++ gives this behaviour:

    $ cat a.cpp
    #include <string>
    #include <iostream>

    int main()
    {
    std::cout<<std::string().find(std::string());
    }

    $ g++ --version
    g++ (GCC) 3.4.6
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    $ g++ a.cpp -o a

    $ ./a
    0

    Note also the RogueWave STL now under apache has the following test case for string::find in stdcxx-4.2.0/tests/strings/21.string.find.cpp

    // exercises:
    // find (const basic_string&)
    static const StringTestCase
    cstr_test_cases [] = {

    #undef TEST
    #define TEST(str, arg, res) { \ __LINE__, -1, -1, -1, -1, -1, \ str, sizeof str - 1, \ arg, sizeof arg - 1, 0, res, 0 \ }

    // +------------------------------------ controlled sequence
    // | +---------------------- sequence to be found
    // | | +----- expected result
    // | | |
    // | | |
    // V V V
    TEST ("ab", "a", 0),

    TEST ("", "", 0),
    ...

    The Roguewave test-bed now under apache looks quite extensive and interesting.

    I have already mentioned that Microsoft VC6 and VC7 also return 0 in this case in the related forum thread. I believe I have also shown there that the standard unambiguously requires a 0 return.

     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-17

    Logged In: YES
    user_id=615813
    Originator: NO

    Discussion moved to stlport-devel mailing list. Stop here.

     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-17
    • status: closed-invalid --> closed-later
     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-27
    • priority: 5 --> 2
    • status: closed-later --> open-accepted
     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-01-27

    Logged In: YES
    user_id=615813
    Originator: NO

    People near Standard commete has opinion opposite to my. Even if it looks like bogus behaviour for me, it should be fixed.

     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-05-11

    Logged In: YES
    user_id=615813
    Originator: NO

    In trunk, r3530.

     
  • Petr Ovtchenkov

    Petr Ovtchenkov - 2008-05-11
    • status: open-accepted --> closed-fixed
     

Log in to post a comment.