From: Mark D. <mar...@jt...> - 2005-06-22 00:29:42
Those seem reasonable. (I am generally in favor of using separate setters instead of complicated constructors; especially when combined with chaining, I think it is clearer -- although chaining is not applicable here.)

One question: can you clarify why you want a shallow clone? It would seem like the only reason for cloning is to be thread-safe, but a shallow clone doesn't guarantee that as I understand it (and I think thread safety should be at a higher level in this case anyway).

Mark

----- Original Message -----
From: "Andy Heninger" <an...@jt...>
To: <icu...@li...>
Sent: Tuesday, June 21, 2005 15:32
Subject: Re: [icu-design] API Proposal: Break Iteration and UText for ICU4C

> Here are some minor tweaks to the proposal, below, for extending Break
> Iteration to work with UText.
>
> 1. (Suggested by Markus)
>
>    void BreakIterator::setText(UText &text);
> becomes
>    void BreakIterator::setText(UText *text);
>
> This is consistent with all other UText APIs, which are
> uniformly passed around by pointer.
>
> Make it clear in the description that the function is doing
> a shallow clone of the supplied UText.
>
> 2. (Also suggested by Markus)
>
>    void BreakIterator::getUText(UText &fillIn)
> becomes
>    UText *BreakIterator::getUText(
>        UText *fillIn, UErrorCode &errorCode)
>
> Again, make it clear that the function is shallow-cloning the
> internal UText to produce the result. By the usual UText
> scheme for clone and open, a new UText will be allocated
> if NULL is passed in.
>
>
> 3. In the C API
>
>    UBreakIterator *ubrk_openUText(type, locale, UText, status)
>
> I propose dropping this function, and relying on
> doing a ubrk_open() with no input text specified, followed
> by a ubrk_setUText().
>
> The problem is that to keep things symmetric, we would really
> want to have two flavors of ubrk_openUText, one from rules
> and one with a break iterator type, and this starts to blow
> up the number of API functions more than I like.
>
> Also, in practice while actually writing code using break
> iterators, I have found that I always create the break iterator
> with no text, and then set the text later. Break iterators
> are intended to be reused, and doing so naturally tends to
> separate creation from setting the text.
>
>
> -- Andy Heninger
> hen...@us...
>
>
> Andy Heninger wrote:
>
> > ICU4C API Proposal for extending Break Iteration to work with UText.
> > Expires 6/23/05
> >
> > The general idea is to add the necessary functions to allow break
> > iteration to work with text input in the form of UText in addition to the
> > existing text forms (CharacterIterator, UChar *, UnicodeString).
> >
> >
> > //
> > // Additions and changes to the C++ API:
> > //
> >
> > //
> > // Reset the break iterator to operate over the text represented by
> > // the UText. The text boundary is reset to the start.
> > //
> > // Ownership of the UText remains with the caller. The UText need
> > // not be preserved after calling this method, but the underlying
> > // text itself should not be altered while invoking other break
> > // iteration functions over it.
> > //
> > void
> > BreakIterator::setText(UText &text);
> >
> >
> > //
> > // getText() is an existing function of BreakIterator.
> > // When the original input is supplied as a UText,
> > // this function will fail. Because there is no
> > // error status available, return a CharacterIterator
> > // over an empty string in this case.
> > //
> > // A possible alternative: do a CharacterIterator implementation
> > // that wraps up a UText.
> > //
> > CharacterIterator& BreakIterator::getText();
> >
> > //
> > // Get the UText for this break iterator.
> > // The caller-supplied UText will be filled in with
> > // the requested data.
> > //
> > // It would be very dangerous to return a reference to the
> > // internal live UText because that one is reused forever,
> > // across all setText() operations. UTexts are designed to
> > // copy efficiently with a shallow UText::clone().
> > //
> > void BreakIterator::getUText(UText &ut);
> >
> > //
> > // first() is an existing method of BreakIterator.
> > // CharacterIterators, on which the existing BreakIterator
> > // implementation is based, can have a non-zero starting
> > // index.
> > //
> > // UText does not have this capability.
> > //
> > // When switching the internal implementation from
> > // CharacterIterator to UText, we may want to think about
> > // losing the ability for first() to be non-zero.
> > //
> > int32_t BreakIterator::first();
> >
> >
> > //
> > // Additions to the C API
> > //
> >
> > //
> > // Open a break iterator to operate over a UText.
> > //
> > // Identical to the existing function ubrk_open(), except that
> > // the text is supplied as a UText instead of a UChar* and length.
> > //
> > UBreakIterator *ubrk_openUText(
> >         UBreakIteratorType type,
> >         const char *locale,
> >         UText *text,
> >         UErrorCode *status);
> >
> >
> > //
> > // Reset the break iterator to work with new text.
> > //
> > // Ownership of the UText remains with the caller. The UText need
> > // not be preserved after calling this method, but the underlying
> > // text itself should not be altered while invoking other break
> > // iteration functions over it.
> > //
> > void ubrk_setUText(UBreakIterator *bi,
> >                    UText *text,
> >                    UErrorCode *status);
> >
> >
> > Implementation Considerations:
> >
> > The RBBI implementation needs to be switched from being based on
> > CharacterIterator to being based on UText. It's not conceptually hard
> > or tricky, but the changes are extensive. It's a little worrisome to
It's a little worrisome to > > change out the underpinnings of something as heavily used as RBBI, > > replacing it with something brand new and not yet proven, UText. > > > > There are two alternatives that I can think of: > > > > o Do the UText based RBBI implementation, but don't roll it in as > > the main RBBI implementation for ICU 3.4. The new implementation > > would be used only when text was supplied as a UText. > > > > This would probably also include some temporary restrictions on > > the C++ API related to how the C++ class hierarchy would need > > to be arranged. The plain C API could be made to work cleanly. > > > > It would also involve some code bloat from having two copies > > of the RBBI engine. The size isn't huge, though. > > > > o Write a CharacterIterator implementation that wraps up a > > UText. Leave the RBBI engine as-is, working with > > CharacterIterator. > > > > This would be safe, and provide the full RBBI API for UText. > > It would not run as efficiently as a native UText based rbbi > > engine. > > > > > > > > > > > ------------------------------------------------------- > SF.Net email is sponsored by: Discover Easy Linux Migration Strategies > from IBM. Find simple to follow Roadmaps, straightforward articles, > informative Webcasts and more! Get everything you need to get up to > speed, fast. http://ads.osdn.com/?ad_id=3D7477&alloc_id=3D16492&op=3Dcl= ick > _______________________________________________ > icu-design mailing list > icu...@li... > https://lists.sourceforge.net/lists/listinfo/icu-design > > |