From: Ian S. <ian...@oz...> - 2008-05-25 08:22:39
Javier Kohen wrote:
> Hi Ian,
>
> On Sun, 25-05-2008 at 09:19 +1000, Ian Stewart wrote:
>
>>> One of the g_utf8_strlen calls is back. I found that I still need to
>>> test for a zero length string at the start of jump_letter_table,
>>> otherwise it loops forever. The ones within the loop through the
>>> string are gone for good though.
>
> [..]
>
>>> I've made the change suggested to the loop building up the
>>> mhod53_entry list. I didn't like having that extra bit of code after
>>> exiting the loop, but couldn't think of another way of doing it.
>>
>> Sorry, I spoke too soon about the g_utf8_strlen functions. The other one
>> is back, otherwise it goes into an endless loop if there are no
>> alphanumerics in the string.
>>
>> Perhaps someone with a better understanding than me of the way the UTF8
>> strings and functions work can suggest a better way of doing this.
>
> Calling strlen in a loop makes your code at least O(n^2).
>
> Fortunately UTF-8 works like plain ASCII C strings in the sense that the
> NUL character should only appear at the end of the string. Thus in this
> case it suffices to check for *p == '\0'.
>
> I would rewrite the beginning as follows (not tested but small, faster
> and better self-documenting):
>
> +  g_assert (g_utf8_validate (p, -1, NULL));
> +
> +  bool found_alnum_chars = false;
> +  while (*p != '\0') {
> +      chr = g_utf8_get_char (p);
> +      if (g_unichar_isalnum (chr))
> +      {
> +          found_alnum_chars = true;
> +          break;
> +      }
> +      p = g_utf8_find_next_char (p, NULL);
> +  }
> +  if (!found_alnum_chars) {
> +      return '0';
> +  }
>
> I haven't looked up the coding conventions of this project, but return
> '0' is much more clear than return (0x0030) if you are handling text.
> Remember to do the same replacement at the other return points in this
> function. Also, I don't think anyone likes parentheses in return
> statements.
>
> Cheers.

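For anyone following along without GLib to hand, the single-pass idea in the quoted suggestion can be sketched in plain C. This is only an illustration, not the patch: has_ascii_alnum is a hypothetical name, and it swaps g_unichar_isalnum for the ASCII-only isalnum, so a letter such as Æ is not counted here the way it would be in the GLib version. It also assumes the input is valid, NUL-terminated UTF-8.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper (not from the patch): returns true if the
 * NUL-terminated UTF-8 string s contains at least one ASCII
 * alphanumeric. Non-ASCII letters are NOT recognised; the GLib
 * code quoted above uses g_unichar_isalnum for that. */
static bool has_ascii_alnum(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    /* One pass over the string: stop at the terminating NUL
     * instead of calling a strlen-style function per iteration. */
    while (*p != '\0') {
        if (*p < 0x80 && isalnum(*p))
            return true;
        /* Step past the lead byte, then past any continuation
         * bytes, which all have the bit pattern 10xxxxxx. */
        p++;
        while ((*p & 0xC0) == 0x80)
            p++;
    }
    return false;
}
```

An empty string falls straight through the outer loop and returns false, which is the case that otherwise looped forever.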
Thanks, that looks much better.

I wasn't sure about using '0' rather than 0x0030 as the return value, as it's a 16-bit number, but it seems to work OK. I've tested it on my problem cases so far (a string starting with Æ and strings with no alphanumerics) and it handled these.

Attached is the patch as it stands now.

Regards,

Ian.
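On the '0' versus 0x0030 question, a small sketch of why the two should be interchangeable. The names code_unit and fallback_letter are made up for illustration, and the 16-bit unsigned return type is an assumption based on the "16-bit number" remark above rather than on the actual function signature. It also assumes an ASCII-compatible execution character set, which is what UTF-8 text implies.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit return type; the real
 * function's type is not shown in this thread. */
typedef uint16_t code_unit;

static code_unit fallback_letter(void)
{
    /* In C the character constant '0' has type int. With an
     * ASCII-compatible character set its value is 48 (0x30), so
     * the implicit conversion to 16 bits yields 0x0030. */
    return '0';
}
```

So the character constant only changes how the value is spelled in the source, not what the caller receives.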