From: Richard G. <rg...@us...> - 2010-09-24 22:07:14
Dear icu-support,

I have a task I'd like to use the ICU regex facility for, but I'm not sure whether it's possible or how to go about it, and I was hoping some of the ICU experts here could give me some advice.

I have a large list of strings (over 10 million, generally shorter than 15 characters or so), and I want to find all the strings in that list that match a given regular expression. I could do this by passing each of them, one at a time, to a RegexMatcher, but this is likely to be extremely slow.

The list is stored as a trie. What I'm after is a way of telling not just whether a given string matches a regex, but whether any strings that *begin with* a given string match it, so that I can skip whole branches of the trie. For example, in addition to "Does 'bl' match this regex?" I need to be able to ask "Does any string beginning with 'bl' match this regex?"

I don't see anything in the API docs that sounds like it will do this. If I'm right about that, is there a way I could go about cannibalizing the current ICU code to implement it, maybe with an eye toward including it in a future version of ICU?

Thanks for whatever help you can give...

--Rich Gillam
IBM
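The prefix question can be illustrated with a minimal sketch. It assumes the pattern has already been compiled into a deterministic finite automaton (DFA). This is a swapped-in technique, not how ICU's backtracking RegexMatcher works internally. The hand-built transition table below, for the hypothetical pattern `bl[ao]+t`, stands in for that compilation step. All names (`TRANSITIONS`, `any_match_with_prefix`, `trie_matches`, and so on) are illustrative only. After reading a prefix, the DFA is in some state. If no accepting state is reachable from that state, no string beginning with the prefix can match, and the entire trie subtree under it can be skipped. Patterns that use features such as backreferences cannot be expressed as a DFA, so treat this as a sketch of the pruning idea rather than a drop-in solution.

```python
from collections import deque

# Hypothetical hand-built DFA for the pattern bl[ao]+t (not produced by ICU).
# Any transition not listed goes to an implicit dead state, represented as None.
START = 0
ACCEPTING = {4}
TRANSITIONS = {
    (0, "b"): 1,
    (1, "l"): 2,
    (2, "a"): 3, (2, "o"): 3,
    (3, "a"): 3, (3, "o"): 3,
    (3, "t"): 4,
}


def live_states(transitions, accepting):
    """Return the states from which some accepting state is reachable."""
    reverse = {}
    for (src, _ch), dst in transitions.items():
        reverse.setdefault(dst, set()).add(src)
    live = set(accepting)
    queue = deque(accepting)
    while queue:
        state = queue.popleft()
        for prev in reverse.get(state, ()):
            if prev not in live:
                live.add(prev)
                queue.append(prev)
    return live


LIVE = live_states(TRANSITIONS, ACCEPTING)


def step(state, ch):
    """Advance the DFA by one character; None is the dead state."""
    if state is None:
        return None
    return TRANSITIONS.get((state, ch))


def any_match_with_prefix(prefix):
    """True if some string beginning with `prefix` matches the pattern."""
    state = START
    for ch in prefix:
        state = step(state, ch)
    return state in LIVE  # None (dead) is never in LIVE


class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Build a simple character trie from an iterable of strings."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def trie_matches(node, state=START, prefix=""):
    """Yield every word in the trie the DFA accepts, pruning dead branches."""
    if state not in LIVE:
        return  # no string with this prefix can match: skip the whole subtree
    if node.is_word and state in ACCEPTING:
        yield prefix
    for ch, child in sorted(node.children.items()):
        yield from trie_matches(child, step(state, ch), prefix + ch)
```

For example, with `root = build_trie(["bat", "blat", "bloat", "blot", "blue", "bloom", "cat"])`, `list(trie_matches(root))` returns `["blat", "bloat", "blot"]`. The "c" and "blu" branches are abandoned after a single character. `any_match_with_prefix("bl")` is `True` and `any_match_with_prefix("bu")` is `False`. Because each trie edge is followed at most once and dead subtrees are never entered, the cost depends on the part of the trie the pattern can still reach, not on the full 10 million strings.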