From: Deborah G. <gol...@ap...> - 2005-10-31 22:14:40
I'm looking at implementing a tokenization function with ICU, based on the boundary analysis code. One thing I've noticed is that finding many boundaries in a row is expensive, because each boundary costs a separate function call.

Would it make sense to add an API that finds boundaries in batch (i.e., returns an array of them rather than one at a time)? And do the boundary APIs seem like the best approach for tokenization?

Thanks,
Deborah
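A minimal sketch of what a batch call could look like from the caller's side, assuming word-boundary tokenization. It uses the JDK's `java.text.BreakIterator` as a stand-in so it runs without extra libraries. ICU4J's `com.ibm.icu.text.BreakIterator` exposes the same `first()`/`next()`/`DONE` pattern. The helpers `boundaries` and `tokenize` are hypothetical and not part of ICU. Treating any segment that contains a letter or digit as a token is also an assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class BatchBoundaries {
    // Hypothetical batch helper: returns every boundary offset in one array.
    // It still calls next() once per boundary internally, so it only moves
    // the per-boundary loop out of the caller; it does not remove that cost.
    static int[] boundaries(BreakIterator it, String text) {
        it.setText(text);
        int[] buf = new int[16];
        int n = 0;
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            if (n == buf.length) {
                buf = Arrays.copyOf(buf, n * 2); // grow the buffer as needed
            }
            buf[n++] = b;
        }
        return Arrays.copyOf(buf, n); // trim to the number of boundaries found
    }

    // Hypothetical tokenizer: keeps segments between adjacent boundaries that
    // contain at least one letter or digit, dropping spaces and punctuation.
    static List<String> tokenize(String text) {
        int[] b = boundaries(BreakIterator.getWordInstance(Locale.US), text);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < b.length; i++) {
            String seg = text.substring(b[i], b[i + 1]);
            if (seg.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(seg);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world! ICU rocks.";
        int[] b = boundaries(BreakIterator.getWordInstance(Locale.US), text);
        System.out.println(Arrays.toString(b));
        System.out.println(tokenize(text));
    }
}
```

A native batch API inside the library could fill the array in one call and avoid the per-boundary call overhead. The wrapper above only shows the shape such an API might return.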