From: Tino D. <ma...@ti...> - 2024-06-14 12:17:20
G'day,

Questions like these should really go to the whole mailing list, so I've added it.

The pipeline can handle language variations in a few ways.

There is the FST variant, to handle different scripts (e.g. Latin vs. Cyrillic) and false friends, which apertium-oci-fra uses for the _gascon mode. More recently, there is the preferences system, to handle semantic or preferential differences.

Both are documented at https://wiki.apertium.org/wiki/Dialectal_or_standard_variation - and the mailing list and IRC can answer further questions.

-- Tino Didriksen

On Tue, 4 Jun 2024 at 17:44, Aure Séguier <a.s...@lo...> wrote:
> Hello,
>
> I am Aure Séguier. I contribute to the Occitan Apertium as part of my
> work at the Congrès permanent de la lenga occitana.
>
> As we are thinking about adding other varieties to it (first enriching
> Aranese Occitan, but later also adding Limousin and Provençal), we are
> thinking about how variety is managed more broadly. In that context, I
> have a question about morphosyntactic analysis (Hectòr Alòs told me you
> were the person to ask).
>
> Is it possible to write disambiguation rules specific to one variety?
> For example, Gascon has enunciative particles ("que", "ne", etc.) that
> do not exist in the other varieties. If we change the variety management
> system, it may no longer be possible to state in the monodix that "que"
> (enunciative) exists only in Gascon. It would then risk being recognised
> in Lengadocian and distorting the translation. There are other specific
> cases too (partitive "de", which is almost never used in Gascon but
> always used in Lengadocian...).
>
> If it is not possible to write variety-specific rules, is it something
> that could be considered for the future? If so, with what workload and
> what skills?
>
> Thanks
> --
> Aure SÉGUIER
> Head of the IT department
> Congrès permanent de la lenga occitana
> +33 (0)5 32 00 00 64
> www.locongres.org
> La Ciutat - Creem!, 5-7 rue de la Fontaine, 64000 Pau
From: Kevin B. U. <unh...@fs...> - 2024-06-15 20:06:04
> On Tue, 4 Jun 2024 at 17:44, Aure Séguier <a.s...@lo...> wrote:
>> Is it possible to write disambiguation rules specific to one variety?
>> For example, Gascon has enunciative particles ("que", "ne", etc.) that
>> do not exist in the other varieties. If we change the variety management
>> system, it may no longer be possible to state in the monodix that "que"
>> (enunciative) exists only in Gascon. It would then risk being recognised
>> in Lengadocian and distorting the translation. There are other specific
>> cases too (partitive "de", which is almost never used in Gascon but
>> always used in Lengadocian...).

If you use the "new" system documented at https://wiki.apertium.org/wiki/Dialectal_or_standard_variation#Overlapping_variants with AP_SETVAR etc., then the variant info is available in all CG files: not just the ones that select bidix/generator choices, but also the disambiguator. So you could have source variant tags as well as target variant tags.

E.g. if you want to say that your source language is Gascon, you could

    export AP_SETVAR='src_gascon'

or something like that, and then in CG, if for example "que" is used as a personal pronoun only in Gascon, you could do

    SELECT pers IF (0 ("que") + (VAR:src_gascon));
    REMOVE pers IF (0 ("que")); # not gascon

Or you could make it more nuanced and feature-based, like

    export AP_SETVAR='src_que_pers,src_other_feature'

    SELECT pers IF (0 ("que") + (VAR:src_que_pers));
    …

(if, say, both Gascon and Bigourdan use "que" as a personal pronoun, but only Gascon has other_feature as well).

With this system, the .dix file is more ambiguous, but it's easy to do early removal of irrelevant analyses from CG.
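As a rough illustration of the feature-based setup described above, here is a minimal shell sketch. The function names features_for_variety and translate_from, and the mapping of varieties to features, are hypothetical and only mirror the example in this message. What comes from the thread itself is the AP_SETVAR variable, its comma-separated value, and the "apertium -d . oci-fra" invocation.

```shell
#!/bin/sh
# Hypothetical mapping from a source variety to the comma-separated list of
# variables that CG rules can test as VAR:<name>. The variety and feature
# names follow the example in the message above and are assumptions.
features_for_variety() {
    case "$1" in
        gascon)    printf '%s\n' 'src_que_pers,src_other_feature' ;;
        bigourdan) printf '%s\n' 'src_que_pers' ;;
        *)         printf 'unknown variety: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Translate stdin with AP_SETVAR set only for this one apertium call.
# Assumes it is run from a compiled apertium-oci-fra checkout.
translate_from() {
    features=$(features_for_variety "$1") || return 1
    AP_SETVAR=$features apertium -d . oci-fra
}
```

With these definitions, something like "echo que | translate_from gascon" would run the pipeline with both features set. Keeping the mapping in one place means a new variety only needs a new case branch, not new CG rules, as long as it reuses existing features.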
From: Kevin B. U. <unh...@fs...> - 2024-06-19 08:53:50
> Occitan can manage variety in its metadix file. My question is, is
> there a way to manage variety in the .rlx file ?

There is :)

> For instance, we have the word "bad", "evil" which is "mal" in
> lengadocian and "mau" in gascon. But "mau" can also be a conjugated
> verb (a pretty rare one). I did this rule in the RLX file : REMOVE V
> IF (0 ("<mau>"i));
> But I would want this rule not to apply to lengadocian, where "mau"
> can only be a conjugated verb.
> Is that possible ? If not, is this something easy to implement ?

Yes. You could for example say that "src_lengadocian" is the variable that signifies that the source language is Lengadocian, and then have one rule that picks the verb if the source language is Lengadocian:

    SELECT V IF (0 ("<mau>"i)) (0 (VAR:src_lengadocian)) ;

and one that removes it if not:

    REMOVE V IF (0 ("<mau>"i)) (NEGATE 0 (VAR:src_lengadocian)) ;

I can't say for certain if this system makes things simpler or not for you compared to metadix, but it allows for a lot more flexibility, with much shorter compile times (since we have just one compiled FST which contains all the variety).
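The two rules above are complementary: for any run, exactly one of them applies to "mau", depending on whether src_lengadocian is set. The following shell sketch is only a toy model of that either/or logic, written to make it easy to check by hand. CG evaluates VAR: conditions itself, and the helper names has_var and rule_for_mau are hypothetical. It assumes AP_SETVAR holds a comma-separated list, as in the earlier example in this thread.

```shell
#!/bin/sh
# Toy model only: CG evaluates VAR: conditions itself; this just mirrors the
# either/or logic of the two rules in plain shell.

# Succeeds if the given name is listed in the comma-separated AP_SETVAR.
has_var() {
    case ",${AP_SETVAR:-}," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Prints which of the two rules would act on the verb reading of "mau".
rule_for_mau() {
    if has_var src_lengadocian; then
        printf '%s\n' 'SELECT V'
    else
        printf '%s\n' 'REMOVE V'
    fi
}
```

Wrapping the list in commas before matching avoids false hits on variables whose names merely contain "src_lengadocian" as a substring.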
From: Aure S. <a.s...@lo...> - 2024-06-19 10:14:49
Thanks a lot!

How can I define src_lengadocian as the variable that means the source language is Lengadocian?

Aure SÉGUIER

_______________________________________________
Apertium-stuff mailing list
Ape...@li...
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
From: Kevin B. U. <unh...@fs...> - 2024-06-19 13:43:41
> How can I define src_lengadocian as the variable that means the source
> language is lengadocian ?

Hm, it kind of depends. In general, if you use variables, you can do

    export AP_SETVAR=src_lengadocian
    echo mau o mal | apertium -d . oci-fra

and that variable will be available to the CG as VAR:src_lengadocian

If you put it in oci-fra.preferences.xml, it will also show up on the web like the "Preferences d'estil" button at https://beta.apertium.org/index.cat.html#?dir=cat-spa

But maybe these source language differences actually *should* be kept as separate pipelines, and shown as different source languages in the language selector in the web UI? In that case, it might actually be simpler to not do variables at all, and just have a separate CG file with Lengadocian rules that runs before the regular CG. So in your oci-fra_lengadocian mode in https://github.com/apertium/apertium-oci-fra/blob/master/modes.xml#L373 instead of

    <program name="lt-proc -w">
      <file name="oc...@le...n"/>
    </program>
    <program name="cg-proc -w" debug-suff="disamb">
      <file name="oci-fra.rlx.bin"/>
    </program>

you would have the general automorf, but two CG disambiguator steps

    <program name="lt-proc -w">
      <file name="oci-fra.automorf.bin"/>
    </program>
    <program name="cg-proc" debug-suff="disamb-lengadocian">
      <file name="oc...@le...n"/>
    </program>
    <program name="cg-proc -w" debug-suff="disamb">
      <file name="oci-fra.rlx.bin"/>
    </program>

and the first CG would just have a few rules for Lengadocian-specific stuff.
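For the separate-pipeline option, the extra CG file could be very small. The sketch below writes such a file and compiles it if cg-comp is available. Everything named here is an assumption for illustration: the helper names, the file names in the usage note, and the definition of the V set via a vblex tag (a real grammar would reuse the definitions from the main .rlx). Because this file would only run in the Lengadocian mode, the rule needs no VAR: condition.

```shell
#!/bin/sh
# Writes a tiny CG-3 grammar holding only Lengadocian-specific rules.
# Assumptions: the verb set V and the vblex tag are illustrative; a real
# grammar would reuse the LIST/SET definitions of the main .rlx file.
write_lengadocian_rules() {
    cat > "$1" <<'EOF'
DELIMITERS = "<.>" "<!>" "<?>" ;
LIST V = vblex ;

SECTION

# In Lengadocian "mau" can only be a verb, so keep just that reading.
SELECT V IF (0 ("<mau>"i)) ;
EOF
}

# Compiles the grammar with cg-comp when it is installed; otherwise it only
# prints a notice, so the script stays usable on machines without CG-3.
compile_rules() {
    if command -v cg-comp >/dev/null 2>&1; then
        cg-comp "$1" "$2"
    else
        printf 'cg-comp not found, skipping compilation\n' >&2
    fi
}
```

A possible use, with hypothetical file names: "write_lengadocian_rules lengadocian-only.rlx && compile_rules lengadocian-only.rlx lengadocian-only.rlx.bin". The compiled file would then be the one referenced by the first cg-proc step of the mode.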
From: Xavi I. <xav...@gm...> - 2024-06-22 07:34:28
Apertium spa-cat heavily uses the preferences system to choose features between variants (and even different standards or language styles within the same dialect) by doing exactly what Kevin and Tino propose. The dictionary is tagged with the features, and then different modes apply different CG files (see https://github.com/apertium/apertium-cat/blob/master/apertium-cat.cat_valencia.prefs.rlx and other similar files) that apply those preferences by default.

-- Xavi Ivars < http://xavi.ivars.me >
From: Aure S. <a.s...@lo...> - 2024-06-18 16:44:40
Hi,

I read the documentation you mentioned but I didn't understand it very well.

Occitan can manage variety in its metadix file. My question is: is there a way to manage variety in the .rlx file?

For instance, we have the word for "bad" or "evil", which is "mal" in Lengadocian and "mau" in Gascon. But "mau" can also be a conjugated verb (a pretty rare one). I wrote this rule in the RLX file:

    REMOVE V IF (0 ("<mau>"i));

But I would want this rule not to apply to Lengadocian, where "mau" can only be a conjugated verb.

Is that possible? If not, is this something easy to implement?

I need an answer to this question in order to know whether it would be worthwhile to change the way we manage variety.

Thanks

Aure SÉGUIER