
RegexKit

Tracker: Bugs

Buffer overflow error - ID: 2408447
Last Update: Comment added ( jengelhart )

I have been receiving semi-frequent U_BUFFER_OVERFLOW_ERROR errors.

I have tracked it down to a temporary buffer being too small. An example case
is below:

#import <Foundation/Foundation.h>
#import "RegexKitLite.h"   // RegexKitLite also needs -licucore at link time

int main(void) {
  @autoreleasepool {
    // Example input from the report (adjacent literals are concatenated).
    NSString *text = @"{{see|Aquila}} <h2>Italian</h2> <h3>Noun</h3> "
      @"{{it-noun|aquil|f|a|e}} # [[eagle]] <h4>Derived terms</h4> *[[aquila "
      @"arpia]] *[[aquila del Bonelli]] *[[aquila gigante della Nuova Zelanda]] "
      @"*[[aquila di Haast]] *[[aquila imperiale]] *[[aquila di mare]] *[[aquila di "
      @"mare a coda bianca]] *[[aquila di mare di Steller]] *[[aquila di mare della "
      @"testa bianca]] *[[aquila pescatrice africana]] *[[aquila pescatrice del "
      @"Madagascar]] *[[aquila reale]] *[[aquila spiegata]] *[[aquila urlatrice]] "
      @"<h3>Proper noun</h3> <i>Aquila</i> # See [[L'Aquila]] ---- <h2>Latin</h2> "
      @"<h3>Noun</h3> {{la-noun|aquila|aquilae|aquilae|f|first}} # An [[eagle]] # "
      @"An eagle as the [[standard]] carried by a [[Roman]] [[legion]]. "
      @"<h4>Descendants</h4> [[Category:la:Birds]] [[zh-min-nan:aquila]] "
      @"[[co:aquila]] [[de:aquila]] [[el:aquila]] [[fr:aquila]] [[gl:aquila]] "
      @"[[ko:aquila]] [[hy:aquila]] [[io:aquila]] [[it:aquila]] [[la:aquila]] "
      @"[[lt:aquila]] [[hu:aquila]] [[ja:aquila]] [[no:aquila]] [[oc:aquila]] "
      @"[[pl:aquila]] [[ru:aquila]] [[fi:aquila]] [[sv:aquila]] [[tr:aquila]] "
      @"[[uk:aquila]]";
    NSString *regx  = @"(\\[{2})(.+?)(]{2})";     // [[target]] wiki links
    NSString *repla = @"<a href=\"$2\">$2</a>";   // -> <a href="target">target</a>

    NSString *matca = [text stringByReplacingOccurrencesOfRegex:regx
                                                      withString:repla];
    NSLog(@"%@", matca);
  }
  return 0;
}


The problem is that tempUniCharBufferU16Capacity is too small. A few lines
later there is a check for this, and the buffer size is increased, but even
then the size isn't enough.

static NSString *rkl_replaceString(RKLCacheSlot *cacheSlot, id searchString,
    NSUInteger searchU16Length, NSString *replacementString,
    NSUInteger replacementU16Length, NSUInteger *replacedCountPtr,
    int replaceMutable, id *exception, int32_t *status) {
  int32_t resultU16Length = 0, tempUniCharBufferU16Capacity = 0;
  UniChar *tempUniCharBuffer = NULL;
  const UniChar *replacementUniChar = NULL;
  id resultObject = NULL;
  NSUInteger replacedCount = 0;

  // Zero order approximation of the buffer sizes for holding the replaced
  // string or split strings and split strings pointer offsets. As UTF16 code
  // units.
  tempUniCharBufferU16Capacity = (int32_t)(16 + (searchU16Length +
      (searchU16Length >> 1)) + (replacementU16Length * 2));

  // Buffer sizes converted from native units to bytes.
  size_t stackSize = 0, replacementSize = (replacementU16Length *
      sizeof(UniChar)), tempUniCharBufferSize = (tempUniCharBufferU16Capacity *
      sizeof(UniChar));
  // ... (remainder of rkl_replaceString not shown)
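The zero-order estimate above can be compared with the length the replacement actually needs. The C sketch below is illustrative only: estimate_capacity restates the formula from rkl_replaceString, while replaced_length is a hypothetical helper that assumes ASCII input (one char equals one UTF-16 unit) with no line breaks, and hand-rolls the (\[{2})(.+?)(]{2}) to <a href="$2">$2</a> substitution instead of calling ICU.

```c
#include <stddef.h>
#include <string.h>

/* Restates the zero-order estimate from rkl_replaceString (UTF-16 units). */
static long estimate_capacity(long searchU16Length, long replacementU16Length) {
    return 16 + (searchU16Length + (searchU16Length >> 1)) + (replacementU16Length * 2);
}

/* Hypothetical helper: exact output length after replacing every [[x]] with
   <a href="x">x</a>.  Assumes ASCII input (one char == one UTF-16 unit) and
   no line breaks; the match is done by hand, not by ICU. */
static long replaced_length(const char *s) {
    size_t n = strlen(s), i = 0;
    long out = 0;
    while (i < n) {
        const char *close = NULL;
        /* "[[", at least one character, then the first "]]" (lazy .+?). */
        if (i + 3 <= n && s[i] == '[' && s[i + 1] == '[') {
            close = strstr(s + i + 3, "]]");
        }
        if (close != NULL) {
            long m = (long)(close - (s + i + 2));  /* length of capture group 2 */
            out += 9 + m + 2 + m + 4;              /* <a href=" x "> x </a> */
            i = (size_t)(close - s) + 2;           /* resume after "]]" */
        } else {
            out += 1;                              /* copy one unmatched char */
            i += 1;
        }
    }
    return out;
}
```

For instance, 40 copies of [[a]] give searchU16Length = 200 and replacementU16Length = 19 (the length of the replacement template), so the estimate is 16 + 300 + 38 = 354 units, while each 5-unit link expands to 17 units, 680 in total. Text dense with short links can therefore outgrow the first estimate, which is why the overflow-and-retry path has to compute the final size correctly.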


Submitted by: shogunjp ( shogunjp ) - 2008-12-09 02:48
Priority: 5
Status: Closed
Resolution: Fixed
Assigned to: John Engelhart
Category: RegexKitLite
Group: RegexKitLite 2.2
Visibility: Public


Comment ( 1 )

Date: 2008-12-10 12:32
Sender: jengelhart (Project Admin)

I've checked in a fix for this bug.

From the SVN log:

Fix for a string replacement bug. In short, if the final replaced text is
"complicated" (for some value of complicated), it may exceed the buffer
size allocated for the replaced text. The code for rkl_replaceAll() was
taken from the ICU uregex_replaceAll() function, which is supposed to
return the size of a buffer large enough to hold the entire replaced
text. Under certain "complicated" replacement conditions, the returned
buffer size is too small, causing a second U_BUFFER_OVERFLOW_ERROR, when
that error should only happen once (the first time). This check-in fixes
the bug in the original ICU uregex_replaceAll() and now correctly
calculates the required buffer size to hold all of the replacement text.
Bug:
http://sourceforge.net/tracker/index.php?func=detail&aid=2408447&group_id=204582&atid=990188
ICU bug: http://bugs.icu-project.org/trac/ticket/6656
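The log describes the usual ICU preflighting contract: a call that overflows still reports the full length it needs, so the caller can grow the buffer and retry, and a second overflow should never occur. A minimal C sketch of that caller-side loop follows; producer_fn, produce_with_retry, copy_exact and copy_underreporting are hypothetical names for illustration, not RegexKitLite or ICU functions, and the status codes are stand-ins rather than ICU's UErrorCode values.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in status codes for this sketch (not ICU's UErrorCode values). */
enum { SKETCH_OK = 0, SKETCH_BUFFER_OVERFLOW = 1, SKETCH_MEMORY_ERROR = 2 };

/* Hypothetical ICU-style producer: writes at most `capacity` units into `buf`,
   always returns the total length it needs, and sets *status to
   SKETCH_BUFFER_OVERFLOW when that length exceeds `capacity`. */
typedef int (*producer_fn)(char *buf, int capacity, int *status, void *ctx);

/* Preflight-and-retry: try the estimate, and on overflow retry exactly once
   with the reported length.  A second overflow means the producer
   under-reported its size.  Returns a NUL-terminated heap string (caller
   frees) or NULL with *status describing the failure. */
static char *produce_with_retry(producer_fn produce, void *ctx, int estimate, int *status) {
    int capacity = estimate;
    for (int attempt = 0; attempt < 2; attempt++) {
        char *buf = malloc((size_t)capacity + 1);   /* +1 for the terminator */
        if (buf == NULL) { *status = SKETCH_MEMORY_ERROR; return NULL; }
        *status = SKETCH_OK;
        int needed = produce(buf, capacity, status, ctx);
        if (*status == SKETCH_OK) { buf[needed] = '\0'; return buf; }
        free(buf);
        if (*status != SKETCH_BUFFER_OVERFLOW) { return NULL; }
        capacity = needed;                          /* grow to the reported size */
    }
    return NULL;   /* overflowed twice; *status is still SKETCH_BUFFER_OVERFLOW */
}

/* Hypothetical producer with an honest length: copies the C string in ctx. */
static int copy_exact(char *buf, int capacity, int *status, void *ctx) {
    const char *text = ctx;
    int len = (int)strlen(text);
    memcpy(buf, text, (size_t)(len < capacity ? len : capacity));
    if (len > capacity) { *status = SKETCH_BUFFER_OVERFLOW; }
    return len;
}

/* Hypothetical buggy producer: on overflow it reports 4 units fewer than needed. */
static int copy_underreporting(char *buf, int capacity, int *status, void *ctx) {
    int len = copy_exact(buf, capacity, status, ctx);
    if (*status == SKETCH_BUFFER_OVERFLOW && len > 4) { return len - 4; }
    return len;
}
```

With copy_exact, an estimate of 4 for "hello world" overflows once and succeeds on the retry; with copy_underreporting the retry overflows again and the function returns NULL, the sketch's counterpart of the double U_BUFFER_OVERFLOW_ERROR in this report.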

---

Direct URL for updated RegexKitLite.m:
http://regexkit.svn.sourceforge.net/viewvc/regexkit/RegexKitLite/RegexKitLite.m?revision=35

diff result:

[johne@LAPTOP_10_5] RegexKitLite% svn diff
Index: RegexKitLite.m
===================================================================
--- RegexKitLite.m (revision 34)
+++ RegexKitLite.m (working copy)
@@ -711,8 +711,9 @@
 
 // Modified version of the ICU libraries uregex_replaceAll() that keeps count of the number of replacements made.
 static int32_t rkl_replaceAll(RKLCacheSlot *cacheSlot, const UniChar *replacementUniChar, int32_t replacementU16Length, UniChar *replacedUniChar, int32_t replacedU16Capacity, NSUInteger *replacedCount, id *exception, int32_t *status) {
-  NSUInteger replaced = 0;
-  int32_t u16Length = 0;
+  BOOL bufferOverflowed = NO;
+  NSUInteger replaced = 0;
+  int32_t u16Length = 0;
   RKLCDelayedAssert((cacheSlot != NULL) && (replacementUniChar != NULL) && (replacedUniChar != NULL) && (status != NULL), exception, exitNow);
 
   uregex_reset(cacheSlot->icu_regex, 0, status);
@@ -721,10 +722,17 @@
   // http://sourceforge.net/tracker/index.php?func=detail&aid=2105213&group_id=204582&atid=990188
   if((cacheSlot->setToLength == 0) && (*status == 8)) { *status = 0; }
 
+  // This loop originally came from ICU source/i18n/uregex.cpp, uregex_replaceAll.
+  // There is a bug in that code which causes the size of the buffer required for the replaced text to not be calculated correctly.
+  // This contains a work around using the variable bufferOverflowed.
+  // ICU bug: http://bugs.icu-project.org/trac/ticket/6656
+  // http://sourceforge.net/tracker/index.php?func=detail&aid=2408447&group_id=204582&atid=990188
   while(uregex_findNext(cacheSlot->icu_regex, status)) {
     replaced++;
     u16Length += uregex_appendReplacement(cacheSlot->icu_regex, replacementUniChar, replacementU16Length, &replacedUniChar, &replacedU16Capacity, status);
+    if(*status == U_BUFFER_OVERFLOW_ERROR) { bufferOverflowed = YES; *status = 0; }
   }
+  if((*status == 0) && (bufferOverflowed == YES)) { *status = U_BUFFER_OVERFLOW_ERROR; }
   u16Length += uregex_appendTail(cacheSlot->icu_regex, &replacedUniChar, &replacedU16Capacity, status);
 
   if(replacedCount != 0) { *replacedCount = replaced; }
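The effect of the bufferOverflowed change can be modeled without ICU. The sketch below assumes, as ICU functions conventionally do, that the find and append calls do nothing when *status already holds an error on entry; under that assumption the original loop stops counting at the first overflow, while the patched loop clears the status, keeps summing replacement lengths, and restores the overflow code afterwards. MockRegex, mock_find_next, mock_append_replacement and the workaround flag are all hypothetical, and uregex_appendTail is left out to keep the model small.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in status codes for this model (not ICU's UErrorCode values). */
enum { ST_OK = 0, ST_BUFFER_OVERFLOW = 1 };

/* Hypothetical mock of a compiled regex: the length of each match's
   replacement text, a cursor, and the output capacity still free. */
typedef struct {
    const int32_t *pieces;
    int count;
    int next;
    int32_t remaining;
} MockRegex;

/* Like an ICU call, reports "no match" if *status already holds an error. */
static bool mock_find_next(MockRegex *re, int *status) {
    if (*status != ST_OK) { return false; }
    return re->next < re->count;
}

/* Appends the current replacement: returns its full length, and sets
   ST_BUFFER_OVERFLOW (leaving no capacity) when it does not fit. */
static int32_t mock_append_replacement(MockRegex *re, int *status) {
    if (*status != ST_OK) { return 0; }
    int32_t len = re->pieces[re->next++];
    if (len > re->remaining) { re->remaining = 0; *status = ST_BUFFER_OVERFLOW; }
    else { re->remaining -= len; }
    return len;
}

/* Models the replacement loop of rkl_replaceAll; `workaround` switches the
   bufferOverflowed change on or off.  appendTail is omitted. */
static int32_t replacement_u16_length(MockRegex *re, bool workaround, int *status) {
    bool bufferOverflowed = false;
    int32_t u16Length = 0;
    while (mock_find_next(re, status)) {
        u16Length += mock_append_replacement(re, status);
        if (workaround && *status == ST_BUFFER_OVERFLOW) {
            bufferOverflowed = true;   /* remember the overflow ... */
            *status = ST_OK;           /* ... but keep counting */
        }
    }
    if (workaround && *status == ST_OK && bufferOverflowed) {
        *status = ST_BUFFER_OVERFLOW;  /* report it once the loop is done */
    }
    return u16Length;
}
```

With three 10-unit replacements and 15 units of capacity, the original loop reports 20 units and the patched loop reports the true total of 30; both leave the overflow status set so the caller knows to retry with a larger buffer.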



Attached File

No Files Currently Attached

Changes ( 3 )

Field          Old Value   Date              By
status_id      Open        2008-12-10 12:32  jengelhart
resolution_id  None        2008-12-10 12:32  jengelhart
close_date     -           2008-12-10 12:32  jengelhart