From: Mark D. <mar...@jt...> - 2005-05-27 16:38:24
Thanks. Yes, you remember right. I don't think there is any need to have syntax like ::[:Lu:]BEGIN; we can just use the normal syntax that is currently used, as in your second example. I agree with you that begin/end is the way to go, even if some of the more advanced uses for it are not implemented in a first pass.

I'll sketch it out a bit for others; this also has relevance to your question #1.

A. In the general case, one can currently have:

    ::filter1;
    ::translit1 (rev1);
    ::translit2 (rev2);
    rule1a; ...
    rule1z;
    ::translit3 (rev3);
    ::translit4 (rev4);
    ::filter2;

B. When you build a transliterator from that, you actually build 2 compounds, one in each direction. Number 1 is:

    filter1; translit1; translit2; translit-temp; translit3; translit4;

and the reverse one is

    filter2; rev4; rev3; translit-temp-rev; rev2; rev1

where translit-temp is built from rule1a..rule1z, and translit-temp-rev is built from reverse(rule1z)...reverse(rule1a). But we'll concentrate on one direction in the following, for simplicity.

C. What we had envisioned was that if you put a translit in the middle of rules, it would cut them into two pieces. So that

    ::filter1;
    ::translit1 (rev1);
    ::translit2 (rev2);
    rule1a; ...
    rule1m;
    ::translitA;
    rule1n; ...
    rule1z;
    ::translit3 (rev3);
    ::translit4 (rev4);

would produce:

    filter1; translit1; translit2; translit-temp1; translitA; translit-temp2; translit3; translit4;

Note that this is recursive: each of the translits can be itself a compound, with filters.

D. Having begin/end would simply act exactly as if we had pulled out all the rules between them, and made a temporary "translit-TempA". Thus

    ::filter1;
    ::translit1 (rev1);
    ::translit2 (rev2);
    rule1a; ...
    rule1m;
    ::begin;
    ::filter3;
    ::translit5;
    rule2a; ...
    rule2z;
    ::end;
    rule1n; ...
    rule1z;
    ::translit3 (rev3);
    ::translit4 (rev4);

would produce:

    filter1; translit1; translit2; translit-temp1; translit-TempA; translit-temp2; translit3; translit4;

where translit-TempA is exactly what we would have gotten from a separate file

    ::filter3;
    ::translit5;
    rule2a; ...
    rule2z;

E. Notice that this means that embedding is almost, but not quite, the same as separating. Filter1 applies to the whole sequence of following actions in the file. And if you embed, the same is true. That is:

    rule0;
    ::begin;
    ::filter1;
    rule1;
    ::begin;
    ::filter2;
    rule2;
    ::end;
    rule3;
    ::end;
    rule4;

produces a compound that looks like:

    (rule0) (filter1 (rule1) (filter2 rule2) (rule3)) (rule4)

where the parentheses enclose a compound. This would be different from just separation:

    (rule0) (filter1 (rule1)) (filter2 rule2) (rule3) (rule4)

F. > 1) They might be an argument for loosening the restriction on having ID calls inside ::BEGIN and ::END (can CompoundTransliterators nest?).

Yes, that is what we envisioned, as in C above.

G. > 2) I'd probably be inclined to implement the named blocks by registering them with the framework-- otherwise, you have to maintain a second name registry in TransliteratorParser (and have TransliteratorRuleParser get access to it to make the $ syntax work). Is it worth the extra effort to have the namespace for named blocks be local?

We tossed that back and forth. I think it would be fine to use the same kind of hack that Java uses for anonymous inner classes, e.g.

    ID: any-foo
    ...
    ::begin "internal1"
    ...
    ::end

creates a registered id called any-foo$internal1. We had wanted to put all the named ones at the top, so that it was clear that they were not part of the overall flow.

Mark

----- Original Message -----
From: Richard T.
Gillam
To: icu...@li...
Sent: Friday, May 27, 2005 08:34
Subject: RE: [icu-design] API Proposal: Multiple passes in RBT rules

Mark--

Thanks as always for your insightful comments.

Re filters:

I hadn't really thought about filters. If I remember right, you can have filters in two places in a normal set of rules: a global filter at the beginning (and/or a reverse global filter at the end) and a filter on an individual ID rule. With ::BEGIN/::END, I think these would devolve to the same thing as far as any ::BEGIN/::END blocks are concerned. In other words,

    abc > xyz;
    ::[:Lu:]BEGIN;
    ABC > XYZ;
    DEF > ZYX;
    ::END;
    def > zyx;

would be equivalent to

    abc > xyz;
    ::BEGIN;
    ::[:Lu:];
    ABC > XYZ;
    DEF > ZYX;
    ::END;
    def > zyx;

Of the two, I'd be more inclined to allow people to stick filters on the BEGIN, but I could go either way. Mostly, though, I'm wondering whether this buys us anything. Since the inner set of rules is specified inline at the call site, and since the inner set of rules can't (currently) include any ID rules of its own, you could just have the left-hand sides of the inner rules operate on the characters they should operate on. Using a filter would be syntactic sugar. Am I misunderstanding something here?

Re nesting:

Does nesting buy us anything? What would it mean? Consider the following example:

    rule1;
    ::BEGIN;
    rule2;
    rule3;
    ::END;
    rule4;

In my proposal, this means:

- Go through the whole string and apply rule1 wherever it applies.
- Go back to the beginning, then go through the whole string and apply rule2 and rule3 wherever they apply (if they have overlapping matches, the normal behavior applies-- a match earlier in the string wins, and if both rules match at the same place, rule2 wins).
- Go back to the beginning again, and apply rule4 wherever it applies.

If we have nesting, this seems like it'd mean something like:

- Apply rule1.
- Go back to the beginning and apply rule2 and rule3 to the whole string.
- THEN RESUME WHERE YOU LEFT OFF and apply rule4 to the remainder.

But what does "resume where you left off" mean? How would you know when/where to do the BEGIN/END block relative to rule1 and rule4? One possibility might be to apply the rules from the inside out-- first do rule2 and rule3, then go back to the beginning and do rule1 and rule4, but this doesn't seem intuitive.

So is there another meaning of nesting you had in mind? If not, levels of nesting are irrelevant:

    rule1;
    ::BEGIN;
    rule2;
    ::BEGIN;
    rule3;
    ::END;
    rule4;
    ::END;
    rule5;

is exactly the same as

    ::BEGIN; rule1; ::END;
    ::BEGIN; rule2; ::END;
    ::BEGIN; rule3; ::END;
    ::BEGIN; rule4; ::END;
    ::BEGIN; rule5; ::END;

...which is what my current implementation of toRules() will print out. (Some of the ::BEGINs and ::ENDs are redundant: you can actually express the same thing without the ::BEGIN and ::END around rules 1, 3, and 5.)

I could, of course, make the syntax more regular by requiring that normal conversion rules always have to appear inside ::BEGIN and ::END, but this breaks backward compatibility and makes everything more verbose.

Re named blocks:

I was going to argue here more or less the same way I was arguing against the filter thing: How often would I want to reset to the beginning of the string and apply some set of rules more than once? If you did it more than once with different filters, it might make a little more sense, but even then, what it mainly gives you is a decrease in verbosity.

But I didn't know (or had forgotten) about the ability to call a transliterator as part of the right-hand side of a conversion rule ("a $1 > b &any-tamil($1) ;"). Then you're applying the rules to a different string, and you might want to use the same set of rules in multiple places. This seems potentially very useful, and it seems like a good argument for the ::BEGIN/::END syntax (people can always use "::Null;" as a separator anyway).

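To make the pass-by-pass reading under "Re nesting" concrete, here is a minimal Python sketch. It is not ICU code: rules are modeled as plain (match text, replacement) pairs with no contexts, variables, or filters, and the names split_passes, run_pass, and transliterate are made up for illustration. It only shows that, under this reading, every ::BEGIN or ::END acts as a pass boundary, so the nesting depth does not matter.

```python
from typing import List, Tuple, Union

Rule = Tuple[str, str]      # (literal text to match, replacement text); a simplification
Item = Union[str, Rule]     # either a rule or one of the markers "::BEGIN" / "::END"


def split_passes(items: List[Item]) -> List[List[Item]]:
    """Group rules into passes; every ::BEGIN or ::END closes the current pass."""
    passes: List[List[Item]] = []
    current: List[Item] = []
    for item in items:
        if item in ("::BEGIN", "::END"):
            if current:
                passes.append(current)
                current = []
        else:
            current.append(item)
    if current:
        passes.append(current)
    return passes


def run_pass(text: str, rules: List[Rule]) -> str:
    """One left-to-right pass: at each position the first matching rule wins."""
    out = []
    i = 0
    while i < len(text):
        for lhs, rhs in rules:
            if lhs and text.startswith(lhs, i):
                out.append(rhs)       # replacement text is not rescanned in this pass
                i += len(lhs)
                break
        else:
            out.append(text[i])       # no rule matched here; copy the character
            i += 1
    return "".join(out)


def transliterate(text: str, items: List[Item]) -> str:
    """Run each pass over the whole string, restarting at the beginning each time."""
    for rules in split_passes(items):
        text = run_pass(text, rules)
    return text


nested = ["r1", "::BEGIN", "r2", "::BEGIN", "r3", "::END", "r4", "::END", "r5"]
print(split_passes(nested))

items = [("a", "b"), "::BEGIN", ("b", "c"), ("bb", "d"), "::END", ("c", "a")]
print(transliterate("ab", items))
```

In the second example the first pass turns "ab" into "bb", the block then rewrites both characters with its first rule (the earlier rule wins at the same position), and the last pass runs over the result from the start again.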
I'm with you that doing that at the same time I'm doing everything seems like biting off more than I can chew, but I think it's worth using ::BEGIN/::END syntax to maintain forward compatibility with this. It also solves the filter problem, since you could define the rule set and give it a name and then apply filters to the name the same way you can with any other ID call.

Two other thoughts on named blocks:

1) They might be an argument for loosening the restriction on having ID calls inside ::BEGIN and ::END (can CompoundTransliterators nest?).

2) I'd probably be inclined to implement the named blocks by registering them with the framework-- otherwise, you have to maintain a second name registry in TransliteratorParser (and have TransliteratorRuleParser get access to it to make the $ syntax work). Is it worth the extra effort to have the namespace for named blocks be local?

Thoughts?

--Rich
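For the registration question in 2), the any-foo$internal1 naming suggested in the reply at the top of this thread could be sketched roughly as below. This is a Python illustration only, under the assumption that the framework registry behaves like a flat map from ID to rule set; register_with_named_blocks and the dict-based registry are hypothetical and do not correspond to any ICU API.

```python
from typing import Dict, List


def register_with_named_blocks(
    registry: Dict[str, List[str]],
    parent_id: str,
    main_rules: List[str],
    named_blocks: Dict[str, List[str]],
) -> List[str]:
    """Register a rule set plus its named blocks in one shared registry.

    Each named block gets the ID "<parent_id>$<block name>", mirroring the
    way Java names inner classes, so no second, local registry is needed.
    Returns the IDs that were registered, in registration order.
    """
    registered = []
    for block_name, block_rules in named_blocks.items():
        block_id = f"{parent_id}${block_name}"
        if block_id in registry:
            raise ValueError(f"ID already registered: {block_id}")
        registry[block_id] = block_rules
        registered.append(block_id)
    registry[parent_id] = main_rules
    registered.append(parent_id)
    return registered


registry: Dict[str, List[str]] = {}
ids = register_with_named_blocks(
    registry,
    "any-foo",
    ["rule1;", "rule2;"],
    {"internal1": ["rule3;", "rule4;"]},
)
print(ids)
```

The trade-off is the one raised in 2): the block names become globally visible, but the parser does not have to carry its own name table.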