From: Anara T. <t2...@gm...> - 2024-02-26 12:52:52

Dear Apertium team,

My name is Anara. I am an MA student in Linguistics and Cognitive Studies at the University of Siena (Italy). I am from Kyrgyzstan and a native Kyrgyz speaker, and my area of interest includes rule-based NLP.

Recently, I have discovered your amazing work on Kyrgyz morphological finite-state transducer which got me very interested in what you are doing. I have found that there is an opportunity to apply for Google Summer of Code in your organization.

So, for GSoC 2024 I would like to make a proposal based on the "Develop a prototype MT system for a strategic language pair <https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Adopt_a_language_pair>" idea.

I have seen that Turkish-Kyrgyz and Kazakh-Kyrgyz pairs are at the nursery stage and have some work to be done (e.g. writing transfer rules), while Tatar-Kyrgyz, Kyrgyz-Uzbek, and English-Kyrgyz pairs are at the incubator stage and need more work to be done. I could also suggest new pairs like Bashkort-Kyrgyz or Altay (Southern)-Kyrgyz which however have to be implemented from scratch.

Before starting my coding challenge, I would like to get advice from you, on which language pair needs more attention or has a higher priority for the organization at the moment, or any other advice on choosing a language pair. Among those I have mentioned, I don't have specific preferences and they all sound very exciting to me, but it seems like nursery stage pairs need more attention.

I would be very grateful if you could give me some advice with choosing a language pair and with the idea in general :)

Hope to hear from you!

Best regards,
Anara Tyurekanova

From: Elmurod K. <elm...@gm...> - 2023-01-26 10:03:28

Dear Apertium-Turkic community,

My name is Elmurod Kuriyozov. I am about to complete my PhD in Computational Linguistics at the University of A Coruna (Spain), and I teach at Urgench State University (Uzbekistan). I have done a GSoC with Apertium once and also collaborated on another one.

Recently, the *Agency for Innovative Development of the Republic of Uzbekistan* and the *Scientific and Technological Research Council of Türkiye (TUBITAK)* started an academic collaboration and announced a call for international scientific projects of cooperation between Uzbekistan and Turkey (Türkiye).

Our NLP team is looking for collaborators from Turkish universities to work on the project *"Creating Turkish-Uzbek Machine Translation Models and Tools"*, which, according to the draft we have already created, uses Apertium for the open-source RBMT part, while NMT models will be trained using parallel corpora.

Anyone who is interested is kindly asked to contact us soon, as the deadline is approaching. If you know anyone who might be interested, please help us by forwarding this email to them as well (and thank you!).

Best regards,
Elmurod

--
Elmurod Kuriyozov,
PhD student in Computational Linguistics
University of a Coruna, Spain

From: Jonathan W. <jon...@gm...> - 2022-03-05 13:49:32

Hi, Mikel!

Yes, several of us on this list have been collaborating with that group (i.e., have been part of that group), including as coauthors. It's certainly good to publicise what they're doing to this list in case anyone would like to be involved and hasn't been connected yet.

Here's their website, which lists some of their recent publications:
https://turkicinterlingua.org/

They're also the group leading the initiative to form an ACL SIG (Special Interest Group) for Turkic languages. If anyone is interested in being involved in that effort or in the resulting group, let me know!

--
Jonathan

On Sat, Mar 5, 2022, 01:40 Mikel L. Forcada <ml...@dl...> wrote:
> Did we know about these folks?
> https://twitter.com/til_nlp?t=KjwpJ3HBrThzPGkPR3cGMg&s=09
> _______________________________________________
> Apertium-turkic mailing list
> Ape...@li...
> https://lists.sourceforge.net/lists/listinfo/apertium-turkic

From: Mikel L. F. <ml...@dl...> - 2022-03-05 06:39:45

Did we know about these folks?
https://twitter.com/til_nlp?t=KjwpJ3HBrThzPGkPR3cGMg&s=09

From: Ilnar S. <il...@se...> - 2021-06-13 10:29:54

-------- Original message --------
Subject: Re: [Apertium-turkic] Tatar grammatical forms statistics
Date: 13.06.2021 12:12
From: iln...@po...
To: ape...@li...

On 13.06.2021 11:02, dinar qurbanov wrote:
> ".deps/tat.LR.lexc.hfst"
>
> where is it? i cannot find it in github.

It is a file that is created when you compile apertium-tat. It should not be committed to GitHub.

Ilnar

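A minimal sketch of the counting approach Jonathan describes in the quoted thread: split each segmented form on the morpheme boundary ">" and sum corpus frequencies per morpheme. The input format (pairs of a segmented string and a frequency), the function names, and the "+" prefix for non-initial morphemes are assumptions made for illustration; none of them are taken from apertium-tat or from the corpus tools. Tags such as <cnjcoo> are stripped first so that their closing ">" is not mistaken for a boundary.

```python
import re
from collections import Counter

# Matches Apertium-style tags such as <n> or <cnjcoo>.
TAG = re.compile(r"<[^<>]+>")


def split_morphemes(segmented):
    """Split a segmented form on the bare '>' morpheme boundary.

    Tags are removed first, because their closing '>' would otherwise
    be treated as a boundary.
    """
    without_tags = TAG.sub("", segmented)
    return [m for m in without_tags.split(">") if m]


def count_morphemes(entries):
    """Sum frequencies per morpheme over (segmented_form, frequency) pairs.

    Non-initial morphemes get a '+' prefix (a hypothetical convention)
    so that roots and suffixes can share one frequency list.
    """
    counts = Counter()
    for segmented, freq in entries:
        for position, morpheme in enumerate(split_morphemes(segmented)):
            key = morpheme if position == 0 else "+" + morpheme
            counts[key] += freq
    return counts


if __name__ == "__main__":
    # Hypothetical toy data, not real corpus counts.
    sample = [("китап>лар>ы", 5), ("китап>лар", 3), ("һәм<cnjcoo>", 10)]
    for morpheme, freq in count_morphemes(sample).most_common():
        print(morpheme, freq)
```

Sorting the resulting counter by frequency gives the single combined list of roots and suffixes that dinar asks for in the thread.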
From: dinar q. <qd...@gm...> - 2021-06-13 09:02:20

".deps/tat.LR.lexc.hfst"

where is it? i cannot find it in github.

2016-02-23 19:29 GMT+03:00, Jonathan North Washington <jon...@in...>:
> There is also a way to count morphemes, including bound ones, which
> could be interesting as well.
>
> If you skip just the twol transducer (e.g., use
> .deps/tat.LR.lexc.hfst) and split by > (or %>), this will give you
> what the transducer considers to be separate morphemes (which ignores
> some derivational morphemes, even fairly productive ones, but should
> have all the inflectional ones).
>
> --
> Jonathan
>
> On 23 February 2016 at 05:06, dinar qurbanov <qd...@gm...> wrote:
>> you have made statistics of tags:
>>
>> http://corpus.tatfolk.ru/stat/tatcorpus2.uniq_tags.pdf
>>
>> what i asked was a very little other: i suggest to put suffix
>> morphemes with root morphemes ( like this
>> http://corpus.tatfolk.ru/stat/freq_200.pdf ) in one united list.
>>
>> ( so һәм 1630726 would be nearly after 1672959 +да<cnjcoo> )
>>
>> 2015-09-23 14:12 GMT+03:00 mansur <66...@gm...>:
>>> Hello, Dinar!
>>>
>>> I think, you can make a script to count individual morphemes from
>>> the list I made.
>>>
>>> If you don't want to, then let me know. I'll do it when I have some
>>> spare time.
>>>
>>> With best wishes,
>>> Mansur
>>>
>>> 2015-09-20 23:36 GMT+03:00 dinar qurbanov <qd...@gm...>:
>>>>
>>>> hello.
>>>>
>>>> i would like you make statistics of "individual" morphemes.
>>>>
>>>> that would show "overall" frequencies and most used tatar morphemes.
>>>>
>>>> i think you should see that morphemes are real "atoms" of turkic
>>>> languages, not the "classic" "words".
>>>>
>>>> "classical" grammar of words and word forms is not correct, because
>>>> the "words" are not grammatical structures, but just a phonetics level
>>>> phenomena.
>>>>
>>>> also, by the way, as far as i see it, apertium also has been going by
>>>> "classical" grammar of words and word forms.
>>>>
>>>> you can additionally look for "distributive morphology". also
>>>> "minimalist program", as i assume by the book name and some reviews
>>>> seen by me, maybe something like that. but i could not read it. by the
>>>> way cannot somebody give me electronic version of it ("minimalist
>>>> program")? i planned to ask this first in irc, but since i have come
>>>> to this topic here, i ask this here. i have written a paper partially
>>>> about this topic and i would use "minimalist program" as an example of
>>>> previous works / literature / bibliography, if possible . the paper
>>>> does not have any or almost any new discoveries, just popularises some
>>>> ideas. it is not published in any journal , you can download it here :
>>>>
>>>> http://qdb.wp.kukmara-rayon.ru/2015/04/15/sentence-syntax-trees-should-be-made-from-morphemes-semantically-ordered-trees/
>>>>
>>>> 2015-06-19 10:28 GMT+03:00 mansur <66...@gm...>:
>>>> > Hey, Francis!
>>>> >
>>>> > We are not planning to participate in TurkLang.
>>>> >
>>>> > If you open the main page of the corpus
>>>> > (http://corpus.tatar/index_en.php) you can see in news section
>>>> > that we work quite actively :) Our team is not big, so process
>>>> > goes slower than we want...
>>>> >
>>>> > About plans... We have tons of them :) One of them: our corpus is
>>>> > annotated morphologically (thanks to Apertium project), but I
>>>> > haven't made the full functional search engine for that yet (you
>>>> > know, search by grammatical categories, distance between words and
>>>> > so on). But it is not very difficult, I'm planning to consult a
>>>> > couple database specialists and make it this year. The most
>>>> > annoying part for me will be developing the popup window in
>>>> > javascript so people could mark checkboxes with the "human
>>>> > readable" names of grammatical categories :) For that we also need
>>>> > description of all tags you use in English, Russian and Tatar
>>>> > languages.
>>>> >
>>>> > Today in morphological aspect we only have search by lemma. But
>>>> > our main purpose wasn't morphology, because first of all we wanted
>>>> > to make our corpus statistically rich (like Leipzig Corpora), it
>>>> > gives more information for linguistic research both in theoretical
>>>> > and applied aspects. We are planning to develop additional types
>>>> > of statistical search for Corpus next year.
>>>> >
>>>> > In the Statistics page of our site you can find more statistical
>>>> > information we gathered during the work.
>>>> >
>>>> > With best wishes,
>>>> > Mansur
>>>> >
>>>> > 2015-06-19 8:14 GMT+03:00 Francis Tyers <ft...@pr...>:
>>>> >>
>>>> >> On 2015-06-19 07:05, mansur wrote:
>>>> >> > Hello!
>>>> >> >
>>>> >> > Last year we annotated our corpus
>>>> >> > (http://corpus.tatar/index_en.php) using Apertium's tagger for
>>>> >> > Tatar language (thanks to Ilnar for help).
>>>> >> >
>>>> >> > A couple of days ago I made a frequency list of grammatical
>>>> >> > forms found in our corpus. I'll just put the link here in case
>>>> >> > if somebody is interested in it:
>>>> >> >
>>>> >> > http://corpus.tatar/stat/tatcorpus2.grammatical_forms.pdf
>>>> >> > http://corpus.tatar/stat/tatcorpus2.grammatical_forms.txt
>>>> >>
>>>> >> Wow, this is really cool! :D
>>>> >>
>>>> >> Thanks for sharing it, and I'm glad that the analyser could be of
>>>> >> use (and thanks to Ilnar for helping you out).
>>>> >>
>>>> >> Are you planning to participate in TurkLang this year ? Do you
>>>> >> have any further plans for the corpus ? We would be interested in
>>>> >> helping out in any way possible.
>>>> >>
>>>> >> Regards,
>>>> >>
>>>> >> Fran

From: Sevilay B. <sev...@gm...> - 2020-10-02 17:42:37

Dear Apertiumers,

In addition to what Jonathan said, Iraqi Turkmen is a third official language in Iraq. In Turkey and in many Turkic countries, Iraqi Türkmen poems, songs, fiction and stories have attracted a lot of interest, and people admire them.

So, I believe that obtaining ISO 639-3 registration for the language will be of great benefit to the Iraqi Turkman. People will be able to study in their mother tongue, and this will help them develop more easily in different areas, whether in the language field or in other fields. It will also unite the people there in maintaining their own culture and customs :)

Your support by commenting will be appreciated.

Sincerely,
Sevilay

On Fri, Oct 2, 2020 at 8:28 PM Jonathan Washington <jon...@gm...> wrote:
> Dear colleagues (apologies for cross-posting),
>
> Sevilay (CCed) and I have submitted an application to the ISO 639-3
> registrar for a new three-letter code for Sevilay's native language,
> Iraqi Türkman, to be added to the standard:
> https://iso639-3.sil.org/request/2020-039
>
> The registration authority is currently accepting comments from the
> public (until December 15th), which are taken into consideration when
> the decision is made to approve the request or not. We would like to
> ask you to consider submitting a comment.
>
> Because of how the world works, an ISO code is the next step towards
> recognition of the existence of the language among academics and
> industry. Hence it is also a major prerequisite for providing access
> to language technology, which in turn has the potential to reinforce
> continued use and intergenerational transmission of the language.
>
> One concern those reviewing the application might have is the
> similarity of the language to other Western Oghuz varieties, like
> Turkish and Azerbaycani. This is a valid concern: there is some level
> of mutual intelligibility of the spoken varieties, and many speakers
> of Iraqi Türkman do have some level of exposure to Turkish. However,
> the varieties are linguistically rather divergent, and there are
> distinct literary traditions. Furthermore, official classification of
> Iraqi Türkman as a dialect of Turkish (i.e., denial of the application
> along these lines) runs the risk of denying speakers of Iraqi Türkman
> access to materials in their own language, whether already existing or
> yet to be created.
>
> Please feel free to contact Sevilay and/or me with any questions about
> any of this.
>
> --
> Jonathan

From: Jonathan W. <jon...@gm...> - 2020-10-02 17:29:12

Dear colleagues (apologies for cross-posting),

Sevilay (CCed) and I have submitted an application to the ISO 639-3 registrar for a new three-letter code for Sevilay's native language, Iraqi Türkman, to be added to the standard:
https://iso639-3.sil.org/request/2020-039

The registration authority is currently accepting comments from the public (until December 15th), which are taken into consideration when the decision is made to approve the request or not. We would like to ask you to consider submitting a comment.

Because of how the world works, an ISO code is the next step towards recognition of the existence of the language among academics and industry. Hence it is also a major prerequisite for providing access to language technology, which in turn has the potential to reinforce continued use and intergenerational transmission of the language.

One concern those reviewing the application might have is the similarity of the language to other Western Oghuz varieties, like Turkish and Azerbaycani. This is a valid concern: there is some level of mutual intelligibility of the spoken varieties, and many speakers of Iraqi Türkman do have some level of exposure to Turkish. However, the varieties are linguistically rather divergent, and there are distinct literary traditions. Furthermore, official classification of Iraqi Türkman as a dialect of Turkish (i.e., denial of the application along these lines) runs the risk of denying speakers of Iraqi Türkman access to materials in their own language, whether already existing or yet to be created.

Please feel free to contact Sevilay and/or me with any questions about any of this.

--
Jonathan

From: Aibek M. <aib...@nu...> - 2020-07-11 19:52:12

Thanks, Jonathan and Fran!

I agree that it's just how the language works. I was just thinking that for certain applications ignoring px2pl might be reasonable. In any case, Jonathan's approach is probably the best one (as usual :)

On 11-Jul-20 19:22, Jonathan Washington wrote:
> On Sat, 11 Jul 2020 at 07:08, Francis Tyers <ft...@pr...> wrote:
>> On 2020-07-10 23:30, Aibek Makazhanov wrote:
>>> Wait, but how are we supposed to recover the number of possessors when
>>> there's no context?
>> My first two questions would be:
>> 1) How many instances of that are there in 100 or 1000 randomly selected
>> examples (we can search in corpora).
>> 2) What is the frequency distribution of the different analyses, again
>> we can look in corpora.
> I think what Fran is getting at here is that it's not a big deal that
> the heuristics are going to fail sometimes.
>
>>> And that of possessions, for that matter.
>>>
>>> I can't help but think in "tagging" terms, and when presented with
>>> something like:
>>> "Мысықтарың келді."
>>> - how shall I disambiguate between the three readings of "Мысықтарың" ?
>>>
>>> There might be extra-sentential context, but that's hard to work with.
> If a native speaker can't recover the correct reading from the
> available context, then a computer can't be expected to either. This
> isn't a problem of the tags we choose, but just with how language
> works.
>
>> That depends very much on the formalism. I agree that we shouldn't
>> overengineer things, but in the CG it's possible to look back several
>> "windows" (=sentences) and we can do it in apertium-anaphora too.
>>
>>> Wouldn't it be more practical to analyze all such cases as
>>> ...<n><pl><px2sp>?
> I see what you're saying, i.e., it would be nice to have a <px2sp> tag
> to allow for ambiguity when it really is ambiguous. But note you've
> also assigned <pl> here, which may in fact be the wrong reading, and
> could screw other things up. For example, if you're doing MT, and
> мысықтарың is identified as an anaphor for a subject in the following
> sentence (e.g., Сосын тамақ ішті.), then it might be translated
> incorrectly (as "Then they ate." instead of "Then it ate.").
>
> So in effect, you're not simply saying "this is ambiguous", but
> instead assigning a sometimes-incorrect analysis. This is no better
> than choosing an analysis at random from among the three I proposed,
> which is the worst-case scenario with that approach.
>
> --
> Jonathan
>
>>> I guess that was my original incentive for introducing <px2sp>,
>>> although now I see how it is different from <px3sp>.
>>>
>> Fran

--
Aibek Makazhanov,
National Laboratory Astana, Computer Science Lab,
010000, 53 Kabanbay batyr ave., Astana, Kazakhstan,
+7(7172)709277

From: Jonathan W. <jon...@gm...> - 2020-07-11 18:22:51

On Sat, 11 Jul 2020 at 07:08, Francis Tyers <ft...@pr...> wrote:
>
> On 2020-07-10 23:30, Aibek Makazhanov wrote:
> > Wait, but how are we supposed to recover the number of possessors when
> > there's no context?
>
> My first two questions would be:
> 1) How many instances of that are there in 100 or 1000 randomly selected
> examples (we can search in corpora).
> 2) What is the frequency distribution of the different analyses, again
> we can look in corpora.

I think what Fran is getting at here is that it's not a big deal that the heuristics are going to fail sometimes.

> > And that of possessions, for that matter.
> >
> > I can't help but think in "tagging" terms, and when presented with
> > something like:
> > "Мысықтарың келді."
> > - how shall I disambiguate between the three readings of "Мысықтарың" ?
> >
> > There might be extra-sentential context, but that's hard to work with.

If a native speaker can't recover the correct reading from the available context, then a computer can't be expected to either. This isn't a problem of the tags we choose, but just with how language works.

> That depends very much on the formalism. I agree that we shouldn't
> overengineer things, but in the CG it's possible to look back several
> "windows" (=sentences) and we can do it in apertium-anaphora too.
>
> > Wouldn't it be more practical to analyze all such cases as
> > ...<n><pl><px2sp>?

I see what you're saying, i.e., it would be nice to have a <px2sp> tag to allow for ambiguity when it really is ambiguous. But note you've also assigned <pl> here, which may in fact be the wrong reading, and could screw other things up. For example, if you're doing MT, and мысықтарың is identified as an anaphor for a subject in the following sentence (e.g., Сосын тамақ ішті.), then it might be translated incorrectly (as "Then they ate." instead of "Then it ate.").

So in effect, you're not simply saying "this is ambiguous", but instead assigning a sometimes-incorrect analysis. This is no better than choosing an analysis at random from among the three I proposed, which is the worst-case scenario with that approach.

--
Jonathan

> > I guess that was my original incentive for introducing <px2sp>,
> > although now I see how it is different from <px3sp>.
> >
>
> Fran

From: Jonathan W. <jon...@gm...> - 2020-07-11 18:14:29
|
On Fri, 10 Jul 2020 at 17:42, Aibek Makazhanov <aib...@nu...> wrote:
>
> Hi Jonathan,
>
> yes, мысықтарың has all three readings.
>
> Would that entail following analyses:
> твои кошки [your (sg.) cats]: сенің мысық<n><pl><px2sg>
> ваши кошки [your (pl.) cats]: сендердің мысық<n><pl><px2pl>
> ваша кошка [your (pl.) cat]: сендердің мысық<n><px2pl>

That looks like what I had in mind, yes.

> ...with <px2pl> surfacing as -[Ы]ң for plural "possessions" and -ЛАрЫң for
> singular ones?
>
> That's neat!
> Of course, as <px2pl> and <px2pl><frm>, -[Ы]ңдАр and -[Ы]ңЫздАр still
> have to go ;)

Yep, that'll need to be fixed.

--
Jonathan |
From: Francis T. <ft...@pr...> - 2020-07-11 11:08:18
|
On 2020-07-10 23:30, Aibek Makazhanov wrote:
> Wait, but how are we supposed to recover the number of possessors when
> there's no context?

My first two questions would be:
1) How many instances of that are there in 100 or 1000 randomly selected examples? (We can search in corpora.)
2) What is the frequency distribution of the different analyses? (Again, we can look in corpora.)

> And that of possessions, for that matter.
>
> I can't help but think in "tagging" terms, and when presented with
> something like:
> "Мысықтарың келді."
> - how shall I disambiguate between the three readings of "Мысықтарың" ?
>
> There might be extra-sentential context, but that's hard to work with.

That depends very much on the formalism. I agree that we shouldn't overengineer things, but in CG it's possible to look back several "windows" (= sentences), and we can do it in apertium-anaphora too.

> Wouldn't it be more practical to analyze all such cases as
> ...<n><pl><px2sp>?
> I guess that was my original incentive for introducing <px2sp>,
> although now I see how it is different from <px3sp>.

Fran |
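Editorial sketch: the two corpus questions above could be approached with something like the following. It is only an illustration, under the assumption that a disambiguated sample (one reading per token) is available in Apertium stream format; the sample string, the function names and the simplified regular expression (which ignores escaped characters) are all made up for the example.

```python
import re
from collections import Counter

# Simplified pattern for one lexical unit: ^surface/analysis1/analysis2$
# Assumption: no escaped "^", "/" or "$" inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/^$]+)/([^$]*)\$")


def lexical_units(stream):
    """Yield (surface, [analyses]) for each lexical unit in the stream."""
    for match in LEXICAL_UNIT.finditer(stream):
        yield match.group(1), match.group(2).split("/")


def px2_distribution(stream):
    """Count the tag strings of all readings that carry a <px2...> tag."""
    counts = Counter()
    for _surface, analyses in lexical_units(stream):
        for analysis in analyses:
            if "<px2" in analysis:
                # Keep only the tags, dropping the lemma.
                counts[analysis[analysis.index("<"):]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical hand-disambiguated sample, not real corpus data.
    sample = (
        "^мысықтарың/мысық<n><pl><px2sg><nom>$ "
        "^мысықтарың/мысық<n><px2pl><nom>$ "
        "^мысықтарың/мысық<n><pl><px2sg><nom>$"
    )
    for tags, count in px2_distribution(sample).most_common():
        print(count, tags)
```

On raw (non-disambiguated) analyser output every competing reading would be counted, so the numbers would only say how often the ambiguity occurs, not which reading is the right one.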
From: Aibek M. <aib...@nu...> - 2020-07-10 22:30:25
|
Wait, but how are we supposed to recover the number of possessors when there's no context? And that of possessions, for that matter.

I can't help but think in "tagging" terms, and when presented with something like:
"Мысықтарың келді."
how shall I disambiguate between the three readings of "Мысықтарың"?

There might be extra-sentential context, but that's hard to work with.
Wouldn't it be more practical to analyze all such cases as ...<n><pl><px2sp>?
I guess that was my original incentive for introducing <px2sp>, although now I see how it is different from <px3sp>.

--
Aibek Makazhanov,
National Laboratory Astana,
Computer Science Lab,
010000, 53 Kabanbay batyr ave.,
Astana, Kazakhstan,
+7(7172)709277 |
From: Aibek M. <aib...@nu...> - 2020-07-10 21:42:11
|
Hi Jonathan,

yes, мысықтарың has all three readings.

Would that entail the following analyses:
твои кошки [your (sg.) cats]: сенің мысық<n><pl><px2sg>
ваши кошки [your (pl.) cats]: сендердің мысық<n><pl><px2pl>
ваша кошка [your (pl.) cat]: сендердің мысық<n><px2pl>
...with <px2pl> surfacing as -[Ы]ң for plural "possessions" and -ЛАрЫң for singular ones?

That's neat!
Of course, as <px2pl> and <px2pl><frm>, -[Ы]ңдАр and -[Ы]ңЫздАр still have to go ;)

--
Aibek Makazhanov,
National Laboratory Astana,
Computer Science Lab,
010000, 53 Kabanbay batyr ave.,
Astana, Kazakhstan,
+7(7172)709277 |
From: Jonathan W. <jon...@gm...> - 2020-07-10 18:29:03
|
Hi Aibek,

Thanks for bringing this up. I believe I remember thinking about this or even discussing it with someone a few years back, but nothing ever became of it.

In any case, I think you're right about this problem. One question might be whether it's three-way ambiguous: can мысықтарың be "твои кошки" [your (sg.) cats], "ваша(pl) кошка" [your (pl.) cat] and "ваши(pl) кошки" [your (pl.) cats]? Or just some subset of those?

I like your analogy to the <px3sp> problem. But in e.g. Turkish, there is still reasoning for having <px3pl>. For example, kedileri has three readings:
kedi<n><pl><px3sg>
kedi<n><pl><px3pl>
kedi<n><px3pl>

The first two can be condensed into kedi<n><pl><px3sp>, which is what we do.

Kyrgyz мышыктары only has two readings though:
мышык<n><pl><px3sg>
мышык<n><pl><px3pl>

So there's no need for separate <px3sg> and <px3pl> tags; we just condense both readings into мышык<n><pl><px3sp>.

So I'd say we want <px2sp> *only* if it's not ambiguous with a non-<pl> reading, as with Kyrgyz <px3sp>. But it seems to me it is ambiguous? If this is the case, then we should just fix the morphology in apertium-kaz.

--
Jonathan |
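Editorial sketch: the condensing argument in the message above can be made concrete with a few lines of Python. This is a toy illustration, not how the Apertium transducers implement it; the helper names `parse` and `condense` are invented here, and readings are treated as plain strings of the form lemma<tag><tag>.

```python
import re
from collections import defaultdict


def parse(reading):
    """Split 'lemma<tag1><tag2>' into (lemma, [tag1, tag2])."""
    lemma = reading.split("<", 1)[0]
    tags = re.findall(r"<([^<>]+)>", reading)
    return lemma, tags


def condense(readings, person="3"):
    """Merge readings that differ only in <pxNsg> vs <pxNpl> into <pxNsp>.

    Readings that have no such partner are returned unchanged, so any
    reading that survives next to a merged one marks an ambiguity that
    the merged tag does not cover.
    """
    sg, pl, sp = f"px{person}sg", f"px{person}pl", f"px{person}sp"
    groups = defaultdict(list)
    for reading in readings:
        lemma, tags = parse(reading)
        merged = tuple(sp if tag in (sg, pl) else tag for tag in tags)
        groups[(lemma, merged)].append(reading)
    condensed = []
    for (lemma, tags), members in groups.items():
        if len(set(members)) > 1:
            condensed.append(lemma + "".join(f"<{tag}>" for tag in tags))
        else:
            condensed.append(members[0])
    return condensed


if __name__ == "__main__":
    turkish = ["kedi<n><pl><px3sg>", "kedi<n><pl><px3pl>", "kedi<n><px3pl>"]
    kyrgyz = ["мышык<n><pl><px3sg>", "мышык<n><pl><px3pl>"]
    kazakh = ["мысық<n><pl><px2sg>", "мысық<n><pl><px2pl>", "мысық<n><px2pl>"]
    print(condense(turkish))          # two readings remain
    print(condense(kyrgyz))           # a single <px3sp> reading
    print(condense(kazakh, "2"))      # two readings remain
```

For the Kyrgyz example a single condensed reading is left, whereas the Turkish and Kazakh examples keep a separate non-<pl> reading, which is the situation the message describes as still ambiguous.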
From: Aibek M. <aib...@nu...> - 2020-07-10 16:42:31
|
Hi everyone,

currently Apertium-kaz has 2nd person plural possession (<px2pl>) in its Kazakh lexicon, realized on the surface as:
[Ы]ңд[А]р or [Ы]ң[Ы]зд[А]р (formal)
I really doubt that Kazakh has a dedicated plural for second person possession, though.
Instead, a common workaround is to use something like <pl><px2sg>.

For instance, something like асыңдар is currently analyzed as (among other things): ас<n><px2pl><nom>
I have never heard such usage, but ас<n><pl><px2sg> is common, as in "астарың болсын!" (bon appétit)

In general, I have never heard or seen (in data) [Ы]ңд[А]р or [Ы]ң[Ы]зд[А]р being used for anything other than the respective imperatives, e.g. as ас<v><tv><imp><p2><pl> for асыңдар.
I think that 2nd person possession should be treated like the 3rd person one, i.e. something like <px2sp>.

Any thoughts?

P.S. I think I told this to Ilnar a couple of years ago and wanted to share it with others, but forgot until it came up in conversation with Fran today.

--
Aibek Makazhanov,
National Laboratory Astana,
Computer Science Lab,
010000, 53 Kabanbay batyr ave.,
Astana, Kazakhstan,
+7(7172)709277 |
From: Jonathan W. <jon...@gm...> - 2020-06-30 13:52:47
|
In case anyone missed this and is interested in joining. First session starts in 10 minutes!

--
Jonathan

---------- Forwarded message ---------
From: Mikel L. Forcada <ml...@dl...>
Date: Mon, Jun 29, 2020, 11:34
Subject: [Apertium-stuff] Apertium Online Workshop 2020
To: ape...@li... < ape...@li...>

Dear Apertiumers:

The Apertium Online Workshop is open to all Apertium developers. We will discuss how information flows from one module to another in the Apertium pipeline. It will be held in two two-hour sessions, one tomorrow and the other one on Thursday.

Here's the programme and instructions to sign up:
https://wiki.apertium.org/wiki/Online_Apertium_Workshop_2020

Sorry for the late notice.

All the best,

Mikel Forcada

--
Mikel L. Forcada http://www.dlsi.ua.es/~mlf/
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03690 Sant Vicent del Raspeig
Spain
Office: +34 96 590 9776 |
From: Jonathan W. <jon...@gm...> - 2019-12-26 19:27:19
|
Salom Elmurod,

Siz bilan tanishganimizdan xursandmiz!

We're very glad to hear of your interest in developing Uzbek-language resources in the context of Apertium.

There is an HFST-based morphological analyser of Uzbek, with a fairly large lexicon, but the morphology has not been fully implemented or tested, and the lexicon probably needs to be cleaned some as well:
http://github.com/apertium/apertium-uzb

There are translation systems for Uzbek-Qaraqalpaq as well as Uzbek-Kyrgyz and Turkish-Uzbek:
https://github.com/apertium?utf8=✓&q=uzb

I believe the Uzbek-Qaraqalpaq translation system is the best developed, but they all need additional work. Any of the other pairs you mentioned are also very worthwhile.

In deciding what to focus your work on, one consideration might be what is useful for a given community. For example, Uzbek-Qaraqalpaq could be very useful e.g. for more efficiently expanding the Qaraqalpaq Wikipedia and translating official documents to Qaraqalpaq. The Turkish-Uzbek translation pair could be useful for efficiently expanding the Uzbek Wikipedia. The Uzbek-Kyrgyz/Kyrgyz-Uzbek pair could be useful for supporting Uzbek-speaking communities in Kyrgyzstan, and the same goes for Uzbek-Kazakh/Kazakh-Uzbek and Uzbek-speaking communities in Kazakhstan.

There are new Apertium modules which are not used in any of these pairs, except perhaps one or two of them in Uzbek-Qaraqalpaq: apertium-separable, apertium-recursive, and apertium-anaphora. The first two especially could be very useful for Turkic-Turkic translation.

Others on the apertium-turkic list (CCed, and which you might consider subscribing to) might have thoughts about what is most interesting to work on.

--
Jonathan

On Thu, Dec 26, 2019, 06:12 Elmurod Kuriyozov <elm...@gm...> wrote:
> Dear Apertium staff,
>
> I am Elmurod Kuriyozov, a PhD student in Computational Linguistics at
> Universidade la Coruna, Spain.
> My research work is to create *NLP resources for Turkic languages*, with
> a special focus on Uzbek language, that is because *I come from
> Uzbekistan.*
>
> I have been following your amazing works done for Turkic languages and
> would like to *contribute* to Apertium project by creating more resources
> for Uzbek language.
> My special interests are:
>
> 1. Creating HFST based Morphological analyzer for Uzbek language;
> 2. RBMT for Uzbek - Other Turkic languages, more specifically:
>    - Uzbek-Kara-Kalpak
>    - Uzbek-Uyghur
>    - Uzbek-Kyrgyz
>    - Uzbek-Kazakh
>    - Uzbek-Turkish
>
> One year has been spent to enter the field of NLP and now I would like to
> improve my NLP resources of the native language.
>
> These above are the reasons I am contacting you. Could you please let me
> know if you're interested in working with Uzbek language?
> It would be very kind of you if you could help me by forwarding this email
> to specific people who work with Turkic languages or sharing the contact
> details of them. I am available to do a research of minimum of 3 months as
> required by my International Doctorate.
>
> Will be looking forward to hear from you soon.
>
> Kindest regards
> --
> Elmurod Kuriyozov,
> PhD student in Computational Linguistics
> University of a Coruna, Spain |
From: mansur <66...@gm...> - 2019-11-25 18:32:16
|
Hello!

Recently I noticed quite strange behavior in apertium-tat.

1) In the first group we can see the difference between the previous (several months ago) and the current analysis: +мы vs. +МЫ. Why is it in uppercase now?

4179632c4179632
< ^АНСАТМЫ/АНСАТ<adj>+мы<qst>$^./.<sent>$
---
> ^АНСАТМЫ/АНСАТ<adj>+МЫ<qst>$^./.<sent>$
4179638c4179638
< ^АНСАМБЛЬЛӘРЕ/АНСАМБЛЬ<n><pl><px3sp><nom>+и<cop><aor><p3><pl>$^./.<sent>$
---
> ^АНСАМБЛЬЛӘРЕ/АНСАМБЛЬ<n><pl><px3sp><nom>+И<cop><aor><p3><pl>$^./.<sent>$
4179654c4179654
< ^АНОДЛЫ/АНОД<n><sg>+лы<post>$^./.<sent>$
---
> ^АНОДЛЫ/АНОД<n><sg>+ЛЫ<post>$^./.<sent>$
4179949c4179949
< ^АМБИЦИЯСЕЗ/АМБИЦИЯ<n><sg><sg>+сыз<post>$^./.<sent>$
---
> ^АМБИЦИЯСЕЗ/АМБИЦИЯ<n><sg><sg>+СЫЗ<post>$^./.<sent>$

2) This group of unknown words tends to change case for some letters (morphemes) in lemmas: АЛЛАҺ and АЛлаһ

4180375,4180376c4180375,4180376
< ^АЛЛаҺтаД/*АЛЛАҺТАД$^./.<sent>$
< ^АЛЛаҺ/АЛЛАҺ<n><sg><sg><nom>$^./.<sent>$
---
> ^АЛЛаҺтаД/*АЛЛАҺтаД$^./.<sent>$
> ^АЛЛаҺ/АЛлаһ<n><sg><sg><nom>$^./.<sent>$

Do you see the same behavior? Could you help to fix it?

With best wishes,
Mansur |
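Editorial sketch: regressions of the kind reported above (two analyser outputs that differ only in letter case) could be picked out of two saved output files with a small script along these lines. The function name and the idea of comparing line by line are assumptions made for illustration; nothing here is part of apertium-tat or its test tooling.

```python
def case_only_diffs(old_lines, new_lines):
    """Return (line_number, old, new) for lines that differ only in letter case.

    Lines are compared pairwise, so both inputs are assumed to come from
    analysing the same corpus and to have the same number of lines.
    """
    diffs = []
    for number, (old, new) in enumerate(zip(old_lines, new_lines), start=1):
        # casefold() gives a case-insensitive comparison, also for Cyrillic.
        if old != new and old.casefold() == new.casefold():
            diffs.append((number, old, new))
    return diffs


if __name__ == "__main__":
    old = [
        "^АНСАТМЫ/АНСАТ<adj>+мы<qst>$^./.<sent>$",
        "^АНОДЛЫ/АНОД<n><sg>+лы<post>$^./.<sent>$",
    ]
    new = [
        "^АНСАТМЫ/АНСАТ<adj>+МЫ<qst>$^./.<sent>$",
        "^АНОДЛЫ/АНОД<n><sg>+лы<post>$^./.<sent>$",
    ]
    for number, before, after in case_only_diffs(old, new):
        print(number, before, "->", after)
```

Separating case-only changes from real changes in the analyses makes it easier to see whether the whole difference between two versions comes from case handling.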
From: Jonathan W. <jon...@gm...> - 2019-11-02 02:21:25
|
Hi all,

Below please find a revised CFP for the Machine Translation Special Issue on MT for Low-Resource Languages.

=====

CALL FOR PAPERS: Machine Translation Journal
Special Issue on Machine Translation for Low-Resource Languages
https://www.springer.com/computer/ai/journal/10590/

GUEST EDITORS (Listed alphabetically)
• Alina Karakanta (FBK-Fondazione Bruno Kessler)
• Audrey N. Tong (NIST)
• Chao-Hong Liu (ADAPT Centre/Dublin City University)
• Ian Soboroff (NIST)
• Jonathan Washington (Swarthmore College)
• Oleg Aulov (NIST)
• Xiaobing Zhao (Minzu University of China)

Machine translation (MT) technologies have been improved significantly in the last two decades, with developments in phrase-based statistical MT (SMT) and recently neural MT (NMT). However, most of these methods rely on the availability of large parallel data for training the MT systems, resources which are not available for the majority of language pairs, and hence current technologies often fall short in their ability to be applied to low-resource languages. Developing MT technologies using relatively small corpora still presents a major challenge for the MT community. In addition, many methods for developing MT systems still rely on several natural language processing (NLP) tools to pre-process texts in source languages and post-process MT outputs in target languages. The performance of these tools often has a great impact on the quality of the resulting translation. The availability of MT technologies and NLP tools can facilitate equal access to information for the speakers of a language and determine on which side of the digital divide they will end up. The lack of these technologies for many of the world's languages provides opportunities both for the field to grow and for making tools available for speakers of low-resource languages.

In recent years, several workshops and evaluations have been organized to promote research on low-resource languages. NIST has been conducting Low Resource Human Language Technology evaluations (LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is no training data in the evaluation language. Participants receive training data in related languages, but need to bootstrap systems in the surprise evaluation language at the start of the evaluation. Methods for this include pivoting approaches and taking advantage of linguistic universals. The evaluations are supported by DARPA's Low Resource Languages for Emergent Incidents (LORELEI) program, which seeks to advance technologies that are less dependent on large data resources and that can be quickly pivoted to new languages within a very short amount of time so that information from any language can be extracted in a timely manner to provide situation awareness to emergent incidents. There are also the Workshop on Technologies for MT of Low-Resource Languages (LoResMT) and the Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing (DeepLo), which provide a venue for sharing research and working on the research and development in this field.

This special issue solicits original research papers on MT systems/methods and related NLP tools for low-resource languages in general. LoReHLT, LORELEI, LoResMT and DeepLo participants are very welcome to submit their work to the special issue. Summary papers on MT research for specific low-resource languages, as well as extended versions (>40% difference) of published papers from relevant conferences/workshops are also welcome.

Topics of the special issue include but are not limited to:
* Research and review papers of MT systems/methods for low-resource languages
* Research and review papers of pre-processing and/or post-processing NLP tools for MT
* Word tokenizers/de-tokenizers for low-resource languages
* Word/morpheme segmenters for low-resource languages
* Use of morphological analyzers and/or morpheme segmenters in MT
* Multilingual/cross-lingual NLP tools for MT
* Review of available corpora of low-resource languages for MT
* Pivot MT for low-resource languages
* Zero-shot MT for low-resource languages
* Fast building of MT systems for low-resource languages
* Re-usability of existing MT systems and/or NLP tools for low-resource languages
* Machine translation for language preservation
* Techniques that work across many languages and modalities
* Techniques that are less dependent on large data resources
* Use of language-universal resources
* Bootstrap trained resources for short development cycle
* Entity-, relation- and event-extraction
* Sentiment detection
* Summarization
* Processing diverse languages, genres (news, social media, etc.) and modalities (text, speech, video, etc.)

IMPORTANT DATES
November 26, 2019: Expression of interest (EOI)
February 25, 2020: Paper submission deadline
July 7, 2020: Camera-ready papers due
December, 2020: Publication

SUBMISSION GUIDELINES
o For EOI, please submit via the link: https://forms.gle/mAQH4qaPTuzDhEceA
o For paper submission, please go to the MT journal website https://link.springer.com/journal/10590 and select this special issue
o Authors should follow the "Instructions for Authors"
o Recommended length of paper is 15 pages

=====

--
Jonathan |
From: ogabek y. <oga...@gm...> - 2019-04-06 17:13:52
|
Hello, please give me feedback on my proposal. I am also struggling to write a work plan. http://wiki.apertium.org/wiki/User:Ogabek |
From: Jonathan W. <jon...@gm...> - 2019-03-30 18:41:35
|
Murat is probably right, assuming the sentence is about a "computer game" and not a "computer thought". Which would explain why the Tatar output was уегыз and not уеныгыз.

But I guess the main issues were алғашқы and нинди.

Also, I think bilgesez hasn't returned to IRC since dropping this bug report and leaving.

--
Jonathan |
From: Murat J. <jum...@gm...> - 2019-03-30 09:50:08
|
There is a typo in "Сіздің алғашқы компьютерлік ойыңыздың атауы қандай болды?"

It should be "ойыныңыздың", not "ойыңыздың"

--------------------
Best regards / Урматым менен / С уважением,
Murat Jumashev / Мурат Жумашев |
From: Ilnar S. <il...@se...> - 2019-03-29 22:46:07
|
Hi,

this has been fixed in [1].

Thanks for the bug report.

Ilnar

[1] https://github.com/apertium/apertium-kaz-tat/commit/d4b77517442184c390fdbc56c4b12a4b860f00de

On 3/29/19 3:05 PM, Francis Tyers wrote:
> 11:09 --> bilgesez (b2cd3e14@gateway/web/freenode/ip.178.205.62.20) has joined #apertium
> 11:11 <bilgesez> hey
> 11:11 <+TinoDidriksen> Hullo
> 11:11 <bilgesez> Who's know Russian?
> 11:12 <+TinoDidriksen> A few people. Not sure if any of them are awake.
> 11:15 <bilgesez> I can suggest a Tatar Grammar Manual if you need to upgrade translation
> 11:15 <bilgesez> https://et.ef-cdn.com/EtownResources/anat-tele-grammar/1.4/en/Index.html
> 11:15 <begiak> [ Grammar lab ]
> 11:15 <bilgesez> yes
> 11:19 <bilgesez> I found an error in this translation of Kazakh into Tatar "Сіздің алғашқы компьютерлік ойыңыздың атауы қандай болды?" to "Сезнең #алдан компьютер уегызның исеме кандай булды?"
> 11:20 <bilgesez> but actually right is "Сезнең беренче компьютер уеныгызның исеме нинди булды?"
> 11:28 <-- bilgesez (b2cd3e14@gateway/web/freenode/ip.178.205.62.20) has quit (Quit: Page closed)
>
> Fran

--
GPG: 0xF3ED6A19 |
From: Francis T. <ft...@pr...> - 2019-03-29 12:05:51
|
11:09 --> bilgesez (b2cd3e14@gateway/web/freenode/ip.178.205.62.20) has joined #apertium
11:11 <bilgesez> hey
11:11 <+TinoDidriksen> Hullo
11:11 <bilgesez> Who's know Russian?
11:12 <+TinoDidriksen> A few people. Not sure if any of them are awake.
11:15 <bilgesez> I can suggest a Tatar Grammar Manual if you need to upgrade translation
11:15 <bilgesez> https://et.ef-cdn.com/EtownResources/anat-tele-grammar/1.4/en/Index.html
11:15 <begiak> [ Grammar lab ]
11:15 <bilgesez> yes
11:19 <bilgesez> I found an error in this translation of Kazakh into Tatar "Сіздің алғашқы компьютерлік ойыңыздың атауы қандай болды?" to "Сезнең #алдан компьютер уегызның исеме кандай булды?"
11:20 <bilgesez> but actually right is "Сезнең беренче компьютер уеныгызның исеме нинди булды?"
11:28 <-- bilgesez (b2cd3e14@gateway/web/freenode/ip.178.205.62.20) has quit (Quit: Page closed)

Fran |