#24 Buffer overflow error

RegexKitLite 2.2

I have been receiving semi-frequent U_BUFFER_OVERFLOW_ERROR errors.

I have tracked it down to a tmp buffer being too small. An example case is below:

NSString *text = @"{{see|Aquila}} <h2>Italian</h2> <h3>Noun</h3> {{it-noun|aquil|f|a|e}} # [[eagle]] <h4>Derived terms</h4> *[[aquila arpia]] *[[aquila del Bonelli]] *[[aquila gigante della Nuova Zelanda]] *[[aquila di Haast]] *[[aquila imperiale]] *[[aquila di mare]] *[[aquila di mare a coda bianca]] *[[aquila di mare di Steller]] *[[aquila di mare della testa bianca]] *[[aquila pescatrice africana]] *[[aquila pescatrice del Madagascar]] *[[aquila reale]] *[[aquila spiegata]] *[[aquila urlatrice]] <h3>Proper noun</h3> <i>Aquila</i> # See [[L'Aquila]] ---- <h2>Latin</h2> <h3>Noun</h3> {{la-noun|aquila|aquilae|aquilae|f|first}} # An [[eagle]] # An eagle as the [[standard]] carried by a [[Roman]] [[legion]]. <h4>Descendants</h4> [[Category:la:Birds]] [[zh-min-nan:aquila]] [[co:aquila]] [[de:aquila]] [[el:aquila]] [[fr:aquila]] [[gl:aquila]] [[ko:aquila]] [[hy:aquila]] [[io:aquila]] [[it:aquila]] [[la:aquila]] [[lt:aquila]] [[hu:aquila]] [[ja:aquila]] [[no:aquila]] [[oc:aquila]] [[pl:aquila]] [[ru:aquila]] [[fi:aquila]] [[sv:aquila]] [[tr:aquila]] [[uk:aquila]]";
NSString *regx = @"(\\[{2})(.+?)(]{2})";
NSString *repla = @"<a href=\"$2\">$2</a>";

NSString *matca = [text stringByReplacingOccurrencesOfRegex:regx withString:repla];

The problem is that tempUniCharBufferU16Capacity is too small. A few lines later there is a check for this that increases the buffer size, but even then the size isn't enough.

static NSString *rkl_replaceString(RKLCacheSlot *cacheSlot, id searchString, NSUInteger searchU16Length, NSString *replacementString, NSUInteger replacementU16Length, NSUInteger *replacedCountPtr, int replaceMutable, id *exception, int32_t *status) {
int32_t resultU16Length = 0, tempUniCharBufferU16Capacity = 0;
UniChar *tempUniCharBuffer = NULL;
const UniChar *replacementUniChar = NULL;
id resultObject = NULL;
NSUInteger replacedCount = 0;

// Zero order approximation of the buffer sizes for holding the replaced string or split strings and split strings pointer offsets. As UTF16 code units.
tempUniCharBufferU16Capacity = (int32_t)(16 + (searchU16Length + (searchU16Length >> 1)) + (replacementU16Length * 2));

// Buffer sizes converted from native units to bytes.
size_t stackSize = 0, replacementSize = (replacementU16Length * sizeof(UniChar)), tempUniCharBufferSize = (tempUniCharBufferU16Capacity * sizeof(UniChar));
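To put numbers on "too small", here is a minimal C sketch that evaluates the zero-order estimate above and compares it with the length the replaced text actually needs. `estimate_capacity` copies the arithmetic of the excerpt; `replaced_length` is a hypothetical, hand-written model of replacing `(\[{2})(.+?)(]{2})` with `<a href="$2">$2</a>`, not RegexKitLite or ICU code. It assumes ASCII input (so one byte equals one UTF-16 code unit) and no newlines inside link text.

```c
#include <assert.h>
#include <string.h>

/* Same arithmetic as the "zero order approximation" in rkl_replaceString(),
   in UTF-16 code units: 16 + 1.5 * search length + 2 * replacement length. */
static long estimate_capacity(long searchU16Length, long replacementU16Length) {
    return 16 + (searchU16Length + (searchU16Length >> 1)) + (replacementU16Length * 2);
}

/* Hypothetical model of replacing (\[{2})(.+?)(]{2}) with <a href="$2">$2</a>.
   Returns the length of the result. Assumes ASCII (1 byte == 1 UTF-16 unit)
   and no newlines inside link text. */
static long replaced_length(const char *text) {
    assert(text != NULL);
    long out = 0;
    const char *p = text;
    for (;;) {
        const char *open = strstr(p, "[[");
        if (open == NULL || open[2] == '\0') break;
        /* .+? needs at least one character, so search for "]]" from open + 3. */
        const char *close = strstr(open + 3, "]]");
        if (close == NULL) break;
        long n = (long)(close - (open + 2));   /* length of $2 */
        out += (long)(open - p);                /* unmatched text is copied as-is */
        out += 9 + n + 2 + n + 4;               /* <a href=" + $2 + "> + $2 + </a> */
        p = close + 2;
    }
    return out + (long)strlen(p);               /* text after the last match */
}
```

For ten back-to-back `[[a]]` links (50 units of input, a 19-unit replacement template), the estimate is 16 + 75 + 38 = 129 units while the replaced text needs 170, so the first pass overflows. That first overflow is expected and is what the resize-and-retry handles; the defect in this report is that the size reported after it was still too small.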



    John Engelhart - 2008-12-10
    • status: open --> closed-fixed

    John Engelhart - 2008-12-10

    I've checked in a fix for this bug.

    From the SVN log:

    Fix for a string replacement bug. In short, if the final replaced text is "complicated" (for some value of complicated), it may exceed the buffer size allocated for the replaced text. The code for rkl_replaceAll() was taken from the ICU uregex_replaceAll() function, which is supposed to return the size of a buffer large enough to hold the entire replaced text. Under certain "complicated" replacement conditions, the returned buffer size is too small, causing a second U_BUFFER_OVERFLOW_ERROR; that error should only happen once (the first time). This check-in fixes the bug inherited from the original ICU uregex_replaceAll() and now correctly calculates the buffer size required to hold all of the replaced text.

    Bug: http://sourceforge.net/tracker/index.php?func=detail&aid=2408447&group_id=204582&atid=990188
    ICU bug: http://bugs.icu-project.org/trac/ticket/6656


    Direct URL for updated RegexKitLite.m: http://regexkit.svn.sourceforge.net/viewvc/regexkit/RegexKitLite/RegexKitLite.m?revision=35

    diff result:

    [johne@LAPTOP_10_5] RegexKitLite% svn diff
    Index: RegexKitLite.m
    --- RegexKitLite.m (revision 34)
    +++ RegexKitLite.m (working copy)
    @@ -711,8 +711,9 @@

    // Modified version of the ICU libraries uregex_replaceAll() that keeps count of the number of replacements made.
    static int32_t rkl_replaceAll(RKLCacheSlot *cacheSlot, const UniChar *replacementUniChar, int32_t replacementU16Length, UniChar *replacedUniChar, int32_t replacedU16Capacity, NSUInteger *replacedCount, id *exception, int32_t *status) {
    - NSUInteger replaced = 0;
    - int32_t u16Length = 0;
    + BOOL bufferOverflowed = NO;
    + NSUInteger replaced = 0;
    + int32_t u16Length = 0;
    RKLCDelayedAssert((cacheSlot != NULL) && (replacementUniChar != NULL) && (replacedUniChar != NULL) && (status != NULL), exception, exitNow);

    uregex_reset(cacheSlot->icu_regex, 0, status);
    @@ -721,10 +722,17 @@
    // http://sourceforge.net/tracker/index.php?func=detail&aid=2105213&group_id=204582&atid=990188
    if((cacheSlot->setToLength == 0) && (*status == 8)) { *status = 0; }

    + // This loop originally came from ICU source/i18n/uregex.cpp, uregex_replaceAll.
    + // There is a bug in that code which causes the size of the buffer required for the replaced text to not be calculated correctly.
    + // This contains a work around using the variable bufferOverflowed.
    + // ICU bug: http://bugs.icu-project.org/trac/ticket/6656
    + // http://sourceforge.net/tracker/index.php?func=detail&aid=2408447&group_id=204582&atid=990188
    while(uregex_findNext(cacheSlot->icu_regex, status)) {
    u16Length += uregex_appendReplacement(cacheSlot->icu_regex, replacementUniChar, replacementU16Length, &replacedUniChar, &replacedU16Capacity, status);
    + if(*status == U_BUFFER_OVERFLOW_ERROR) { bufferOverflowed = YES; *status = 0; }
    }
    + if((*status == 0) && (bufferOverflowed == YES)) { *status = U_BUFFER_OVERFLOW_ERROR; }
    u16Length += uregex_appendTail(cacheSlot->icu_regex, &replacedUniChar, &replacedU16Capacity, status);

    if(replacedCount != 0) { *replacedCount = replaced; }
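The effect of the bufferOverflowed workaround in the diff above can be reproduced without ICU. The sketch below is a self-contained C model, under stated assumptions: `MockRegex`, `mock_find_next`, `mock_append_replacement`, `mock_append_tail`, `replace_all_length` and the `MOCK_*` status values are all hypothetical stand-ins. They imitate two behaviors the loop relies on: a call handed a failing status returns immediately (the usual ICU C API convention), and the tail append accepts an earlier overflow status (as the fix assumes when it passes U_BUFFER_OVERFLOW_ERROR into uregex_appendTail). The modeled input is ten matches of 5 units that each become 17 units, i.e. the `[[a]]` to `<a href="a">a</a>` case.

```c
#include <assert.h>

/* Hypothetical stand-ins for ICU status codes; not ICU's real values. */
enum { MOCK_OK = 0, MOCK_OVERFLOW = 1 };

/* Hypothetical input: `matches` identical matches (in_len units each, out_len
   units once replaced) followed by tail_len units of unmatched text. */
typedef struct {
    int matches, in_len, out_len, tail_len;
    int next; /* number of matches found so far */
} MockRegex;

static MockRegex mock_regex(int matches, int in_len, int out_len, int tail_len) {
    MockRegex re = { matches, in_len, out_len, tail_len, 0 };
    return re;
}

/* A failing incoming status makes the search report "no more matches". */
static int mock_find_next(MockRegex *re, int *status) {
    if (*status != MOCK_OK || re->next >= re->matches) return 0;
    re->next++;
    return 1;
}

/* Always reports the full length needed, flagging overflow when it does not fit. */
static int mock_append_replacement(const MockRegex *re, int *capacity, int *status) {
    if (*status != MOCK_OK) return 0;
    if (re->out_len > *capacity) { *capacity = 0; *status = MOCK_OVERFLOW; }
    else { *capacity -= re->out_len; }
    return re->out_len;
}

/* Appends everything after the last match found; tolerates an earlier overflow. */
static int mock_append_tail(const MockRegex *re, int *capacity, int *status) {
    int overflowedAlready = (*status == MOCK_OVERFLOW);
    if (*status != MOCK_OK && !overflowedAlready) return 0;
    /* Matches the loop never reached are copied raw, i.e. unreplaced. */
    int len = (re->matches - re->next) * re->in_len + re->tail_len;
    if (overflowedAlready || len > *capacity) { *capacity = 0; *status = MOCK_OVERFLOW; }
    else { *capacity -= len; }
    return len;
}

/* fixed == 0: original uregex_replaceAll() loop; fixed != 0: revision 35 workaround. */
static int replace_all_length(MockRegex re, int capacity, int fixed, int *status) {
    assert(capacity >= 0);
    int bufferOverflowed = 0, u16Length = 0;
    *status = MOCK_OK;
    while (mock_find_next(&re, status)) {
        u16Length += mock_append_replacement(&re, &capacity, status);
        if (fixed && *status == MOCK_OVERFLOW) { bufferOverflowed = 1; *status = MOCK_OK; }
    }
    if (fixed && *status == MOCK_OK && bufferOverflowed) { *status = MOCK_OVERFLOW; }
    u16Length += mock_append_tail(&re, &capacity, status);
    return u16Length;
}
```

With a 129-unit buffer the original loop stops searching after the first overflow, so the unreached matches are counted at their raw length and it reports 146. Retrying with 146 overflows a second time (it now reports 158): the double U_BUFFER_OVERFLOW_ERROR from the report. With the workaround, the loop keeps counting through every match and reports 170 on the first pass, which then fits.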

