From: Carsten B. <ca...@gm...> - 2012-08-05 12:52:49
Hi,

I've got another question regarding GDL and substitution: How do I match and replace two consecutive identical characters? What I want to do is basically this:

    Consonant Same_Consonant > Consonant Gemination_Diacritic

Since, according to the documentation, it's not possible to use referencing with @ or $ on the LHS, I'm a little stumped. Is this maybe solvable with some flag, or a recursive rule?

Regards
Carsten

--
Currently developing Tagāti Book G: http://bit.ly/Ml5xb8
From: Sharon C. <sha...@si...> - 2012-08-05 19:30:41
There's not really a good way built into the language; normally you'd have to list all the pairs using separate rules. But here's another idea:

    table(glyph)
      g_b {type = 1};
      g_c {type = 2};
      g_d {type = 3};
      g_f {type = 4};
      etc.
      Consonant = (g_b, g_c, g_d, g_f, ...);
    endtable;

    table(sub)
      Consonant Consonant > Consonant Gemination_Diacritic / _ {type == @2.type} _ ;
    endtable;

Actually, making a set of separate rules is probably slightly more efficient: matching rules with a finite state machine is faster than testing a glyph attribute. But maybe the code isn't as neat. Either way, you have to write out a long list.
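The glyph-table idea above still needs one distinct `type` value per consonant. A minimal Python sketch for producing those lines might look like the following. The glyph names are the ones from the example; the list contents and the helper name `glyph_table` are assumptions made for illustration, not part of GDL or its tools.

```python
# Hypothetical consonant glyph names; replace with the font's actual names.
CONSONANTS = ["g_b", "g_c", "g_d", "g_f"]


def glyph_table(glyphs):
    """Return GDL glyph-table text giving each glyph a distinct `type` value."""
    lines = ["table(glyph)"]
    # Number from 1 so that each glyph gets its own value; two slots then
    # compare equal on `type` only when they hold the same glyph.
    for number, name in enumerate(glyphs, start=1):
        lines.append(f"  {name} {{type = {number}}};")
    lines.append(f"  Consonant = ({', '.join(glyphs)});")
    lines.append("endtable;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(glyph_table(CONSONANTS))
```

The output can be pasted into the GDL source in place of the hand-numbered list.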
From: Shriramana S. <sa...@gm...> - 2012-08-06 03:01:38
Hi Sharon (or Martin and others),

On Sun, Aug 5, 2012 at 6:22 PM, Carsten Becker <ca...@gm...> wrote:
> Since according to the documentation it's not possible to use
> referencing with @ or $ on the LHS

Is there a prohibitive reason why referencing cannot be used in the context? I seem to have encountered a situation similar to Carsten's some time back...

@Carsten: I think it would be $, and not @, which references the whole glyph and not just its index.

--
Shriramana Sharma
From: Martin H. <mar...@si...> - 2012-08-06 03:39:47
Dear Shriramana,

> Is there a prohibitive reason by which it is not possible to use
> referencing in the context? I seem to have encountered a situation
> similar to Carsten's some time back...

There is nothing to stop us from deciding to support this. But the cost would be considerable and would outweigh the benefit. The regular expression engine is deterministic, and adding back-references would be a radical change: nearly everything would have to be rewritten to make the engine non-deterministic.

One option might be to add a slot attribute .glyphid or some such that could be tested:

    cons > geminate / cons _ {glyphid == @1.glyphid};

But that would take a while to appear (since it would have to be added to the compiler and the engine and then trickle down via releases) and would be pretty slow, so perhaps you would be better off doing something like:

    #define GEM(x, y) x > y / x _
    GEM(g_a, g_a_gem);
    GEM(g_b, g_b_gem);
    etc.

That will run much faster than any constraint test would.

Yours,
Martin
From: Shriramana S. <sa...@gm...> - 2012-08-06 07:04:21
Hi Martin, thanks for your reply.

On Mon, Aug 6, 2012 at 9:09 AM, Martin Hosken <mar...@si...> wrote:
> One option might be to add a slot attribute .glyphid or somesuch that could be tested:
> cons > geminate / cons _ {glyphid == @1.glyphid};

Another option might be to allow some kind of foreach construct in the language. But perhaps that would deviate somewhat from the existing language model. And without LHS referencing, it still would not be useful for much more than geminates. (Neither would glyphid slot attributes.) Honestly, though, I cannot imagine what real case other than geminates would need to be supported.

In my Tamil Brahmi GDL, I had to do 18 series of:

    KA + clsVowelSigns -> KA-series
    NGA + clsVowelSigns -> NGA-series

and so on. (But that was because the glyph developer had made pre-composed glyphs, which I realized later are not necessary.) But if GDL had for loops and treated classes like arrays, we could always use a two-dimensional array:

    for x = 1 to 18:
      for y = 1 to 12:
        cons[x] vs[y] > syllable[x][y]

;-)

> #define GEM(x, y) x > y / x _
> GEM(g_a, g_a_gem);
> GEM(g_b, g_b_gem);
> etc.

Well, if it came to typing everything out, this is only a little less cumbersome than the full version. I'm sure right now we can use Python (or some such scripting language) to output the requisite statements using *its* for loop.

--
Shriramana Sharma
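The Python route mentioned above could be sketched roughly as follows. This is only an illustration: the glyph list and the helper name `gem_rules` are hypothetical, and the `_gem` suffix simply follows the naming in Martin's macro example.

```python
# Hypothetical base glyph names; the "_gem" suffix follows the macro example.
BASE_GLYPHS = ["g_a", "g_b", "g_c"]


def gem_rules(glyphs, suffix="_gem"):
    """Return one GEM(...) macro invocation per base glyph."""
    return [f"GEM({name}, {name}{suffix});" for name in glyphs]


if __name__ == "__main__":
    # The macro definition is copied from the thread; the loop does the typing.
    print("#define GEM(x, y) x > y / x _")
    print("\n".join(gem_rules(BASE_GLYPHS)))
```

The list still has to be written once, but only as glyph names rather than as complete rules.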
From: Carsten B. <ca...@gm...> - 2012-08-06 07:29:07
On Mon 06 Aug 2012 09:03:54 CEST, Shriramana Sharma wrote:
> Well if it came to typing everything out, this is only a little less
> cumbersome than the full version.

I typed it all out in the end. And while that certainly looks more cumbersome than a few lines that determine character properties on the fly, it's still just 30-odd lines like the one below, so that's entirely manageable.

    U+0070 AnyVirama U+0070 > @1:(1 2) _ U+0313; // PA (key: p)

Carsten
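For anyone who would rather not type the 30-odd lines by hand, a rule of the shape quoted above could be generated from a small table. The sketch below is an assumption-laden illustration: only the PA entry comes from the thread, the TA entry is a made-up placeholder, and `gemination_rule` is a hypothetical helper name.

```python
# Only the PA entry comes from the thread; TA is a hypothetical placeholder.
CONSONANTS = [
    (0x0070, "PA", "p"),
    (0x0074, "TA", "t"),
]


def gemination_rule(codepoint, name, key, diacritic=0x0313):
    """Format one rule in the same shape as the typed-out line in the thread."""
    cp = f"U+{codepoint:04X}"      # e.g. 0x70 -> "U+0070"
    mark = f"U+{diacritic:04X}"    # diacritic inserted for the doubled letter
    return f"{cp} AnyVirama {cp} > @1:(1 2) _ {mark}; // {name} (key: {key})"


if __name__ == "__main__":
    for codepoint, name, key in CONSONANTS:
        print(gemination_rule(codepoint, name, key))
```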