Re: [Apertium-turkic] Fwd: Re: apertium-eng-kaz migration

On 13 August 2013 14:51, Mikel Forcada <ml...@dl...> wrote:
> I resend this message after clipping, as it bounced from the list.
>
> Mikel
>
>
> -------- Original message --------
> Subject: Re: [Apertium-turkic] apertium-eng-kaz migration
> Date: Tue, 13 Aug 2013 18:51:54 +0200
> From: Mikel Forcada <ml...@dl...>
> Organization: Universitat d'Alacant
> To: ape...@li..., Aida Sundetova
> <sun...@gm...>
>
>
> Jonathan, Aida:
>
> Jonathan, thanks a million for the changes.
>
> I am checking changes and I have some doubts.
>
> The rule you wrote had a verb жат<vaux><pres><p1><sg>, and it no longer
> works with the new dictionaries, where, to get the same form (жатырмын), we
> seem to need to output жатыр<vaux><pres><p1><sg>. I have changed that (in
> two places), but I am not sure it is correct. Now things work, but they may
> be wrong. Please enlighten me ;-)

The issue here is that the base form of the auxiliary is жат, just
like the verb.  For present tense, it's жатыр, with copula endings,
but you also get, e.g., past tense forms like жатқан (тамақ жеп
жатқанмын), etc.
We have it defined specially in kaz.lexc so that жат<vaux> should get
all the right forms.  Using this will save you having to write special
rules for other tenses later.  If something's broken, it should be
investigated, but it's probably something minor--the larger picture is
being approached correctly with жат<vaux>.  See the following outputs:

$ echo жатырмын | hfst-proc kaz-eng.automorf.hfst
^жатырмын/жат<vaux><pres><p1><sg>$

$ echo "жат<vaux><pres><p1><sg>" | hfst-proc eng-kaz.autogen.hfst
^жат<vaux><pres><p1><sg>/жатырмын$
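
For anyone who wants to script this kind of check rather than eyeball it, here is a minimal Python sketch. It only parses the two output lines shown above and confirms that the analysis of the surface form generates the same form again. The function names (parse_units, split_reading, round_trips) are made up for illustration and are not part of Apertium or HFST, and the parser is deliberately simplified: it ignores escaped characters in the stream format.

```python
import re

# One lexical unit in Apertium stream format: ^left/alt1/alt2$
# Simplification (assumption): no escaped '/', '^' or '$' inside a unit.
LU_RE = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_units(line):
    """Return a list of (left, [alternatives]) pairs found in line."""
    units = []
    for m in LU_RE.finditer(line):
        left = m.group(1)
        alts = [a for a in m.group(2).split("/") if a]
        units.append((left, alts))
    return units


def split_reading(reading):
    """Split 'жат<vaux><pres>' into ('жат', ['vaux', 'pres'])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG_RE.findall(reading)


def round_trips(analysis_line, generation_line):
    """True if some analysis of the surface form generates that form again."""
    surface, readings = parse_units(analysis_line)[0]
    generated = {left: alts for left, alts in parse_units(generation_line)}
    return any(surface in generated.get(r, []) for r in readings)


# The two output lines quoted in this message.
analysis = "^жатырмын/жат<vaux><pres><p1><sg>$"
generation = "^жат<vaux><pres><p1><sg>/жатырмын$"

print(split_reading(parse_units(analysis)[0][1][0]))
print(round_trips(analysis, generation))
```

In practice the two strings would come from piping forms through hfst-proc as above; they are hard-coded here so the sketch runs without the transducers.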

> Aida, I have changed the .t1x file to remove the "if" rule, which was
> causing many regression tests to fail. We have to deal with "if" in .t2x,
> and I think this needs deeper thought.
>
> I have also changed one rule which misplaced the blanks.

Yeah, I noticed that a lot of verbs were getting conditional and that
there were extra spaces in some places, but it didn't seem related to
the migration, so I left it for you guys to figure out.  Glad it was
something known / easy to get out of the way for now.

> I seem to have additional problems with the nominal form needed with the
> verb "want", which I will change later.  I might ask you (Jonathan) for
> assistance on this one.

Nominal forms?  If you're talking about the -{G}{I} forms, I tend to
treat those as participles.  They take person agreement, but verbal
nouns and adjectives get person agreement in Oghuz varieties (at least
Turkish, that is), and the conditional verbal adverbs take person
agreement in Kypchak languages (Kazakh, Kyrgyz, Tatar, etc.).

> Finally, I think I did some research and I had figured out that "керек" was
> an adjective ("necessary", "essential", "required") because of the way it
> worked. Now it is a noun in your dictionaries (as in Turkish and Azeri
> "gerek", where "necessary" is ), and you changed .t1x rules accordingly. I'd
> like to hear your reasons to call it a noun now — I don't have a Kazakh
> reference. Then we can discuss the issue with the copula ;-)

We've had various discussions about this issue, and the major
complicating factor is that many nouns can be used as adjectives and
many adjectives can be used as nouns.  The word керек probably fits
into one of the Kazakh adjective classes nicely, though, so maybe it
should be treated as an adjective.  I wonder if Ilnar has any thoughts
about this.

Either way, my point about using +е<cop> stands.

> Once we sort these out, I'd like us to refine the protocol, so that Aida can
> clearly understand it and stick to it.

A good plan.  This would be a great time and venue to decide on such
things as a group and have us all stick to it.

-- 
Jonathan

> All the best,
>
> Mikel
>
>
>
> On 08/13/2013 03:09 AM, Jonathan North Washington wrote:
>
> Yes, I was able to fix things and commit early in the weekend.  The main
> problem was that I'd gotten some stuff backwards in the update-morphs
> script, so it was looking for the dix's eng stems in kaz.lexc.  I also
> cleaned up a few transfer rules so that some of the regression tests worked
> given some updates in the transducer.
>
> Also, I noticed something and have a suggestion.  Instead of having a bunch
> of different transfer rules for керек in different tenses, you could just
> have one that does керек<adj>+е<cop> and deal with the copula tenses more
> generally, with some combination of existing tricks in the transducer and
> new tricks in the transducer and transfer.  I'm happy to advise further,
> especially if you can find me in IRC.
>
> --
> Jonathan
>
>
> --
> Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
> Departament de Llenguatges i Sistemes Informàtics
> Universitat d'Alacant
> E-03071 Alacant, Spain
> Phone: +34 96 590 9776
> Fax: +34 96 590 9326
>
>
>
>
> _______________________________________________
> Apertium-turkic mailing list
> Ape...@li...
> https://lists.sourceforge.net/lists/listinfo/apertium-turkic
>
