From: Hèctor A. i F. <h....@es...> - 2009-12-28 04:53:36
2009/12/28 Jacob Nordfalk <jac...@gm...>

>> 8 iron course.
>>
>> ^fervojlinio/fero<n><sg><nom>+vojlinio<n><sg><nom>/fervojo<n><sg><nom>+linio<n><sg><nom>$
>>
>> Railway line?
>
> First analysis (fero<n><sg><nom>+vojlinio<n><sg><nom>) was chosen by
> tagger.
> Had it chosen the last it would have been 'Railway line'.
> In general I have the impression that earliest longest match on first word
> should be preferred, but I have to discuss this with Hector, la lingvisto.

"La lingvisto"? "De pa sucat en oli" one would say in Catalan (literally "of bread dipped in oil", that is, a rather second-rate one).

In fact this word type is unusual in Esperanto: three so-called roots in the same word. Generally such words are broken into constructions of the type "fervoja linio" (adjective + noun). In this particular case Google returns slightly more results for "fervojlinio" than for "fervoja linio", which suggests that for Esperanto speakers "fervoj" is becoming a root in its own right and not simply a compound.

In any case, I couldn't say that the earliest longest match would be the better choice. It may well be a good guess because of the many so-called suffixes, almost all of which in fact act as roots and should be analyzed as the main element. (Gledhill's corpus-based "The Grammar of Esperanto", Lincom Europa, does not help with this kind of proposed heuristic.)

>> 8 bronze era.
>> ^bronzepoko/bronzo<n><sg><nom>+epoko<n><sg><nom>$
>>
>> Bronze age.
>
> Yes. I would say its acceptable, that is, understandable in the context.

In fact this example shows one of the mild problems of this approach. I'd say that "epok" used as a suffix should be translated into English as "age", not by its main translation "era".

The whole treatment of compounds in Esperanto is very complex, and their analysis generally takes up half of the big Esperanto grammars. Maybe an approach based on specific rules for typical elements would be more productive and reliable. I'm also thinking of something that could work at least between HE languages, e.g.
the subdivision of typical affixes:

- rediri = re+diri = tell (diri) again (re)
- refari = re+fari = do again
- subgrupo = subgroup
- subfamilio = subfamily
- superfamilio = superfamily
- postklasika = postclassic(al)
- antaŭklasika = preclassic(al)
- praklasika = protoclassic(al)
- reĝeto = little king
- reĝego = big king
- reĝido = king's son
- etc.

(In fact I quite missed this possibility in the translators from Catalan and Spanish into Esperanto.)

Some rules could also help to find alternatives in the dictionaries, e.g.:

- Xeca = Xa (beleca ~ bela = beautiful)
- Xigo = Xiĝo (formigo ~ formiĝo = formation)
- Xadi = Xi (salutadi ~ saluti = to greet)
- etc.

I hope this helps the discussion a bit.

Hèctor
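
The two kinds of rules listed above (splitting off typical affixes, and falling back to a related dictionary form) could be sketched roughly as follows. This is only an illustration in Python, not Apertium code: the function names, the small affix tables, and the toy lexicon are all assumptions made for the example, and the affixes are just the ones mentioned in the message.

```python
# Hypothetical sketch; none of these names come from Apertium.

# Prefixes from the examples above, with a rough English gloss.
PREFIXES = {"re": "again", "sub": "sub", "super": "super",
            "post": "post", "antaŭ": "pre", "pra": "proto"}

# Suffixes from the examples above, with a rough English gloss.
SUFFIXES = {"et": "little", "eg": "big", "id": "offspring of"}

# Dictionary fallbacks: (ending to look for, ending to try instead).
FALLBACKS = [("eca", "a"), ("igo", "iĝo"), ("adi", "i")]

# Toy lexicon standing in for the real dictionary (an assumption).
LEXICON = {"diri", "fari", "grupo", "familio", "klasika", "reĝo"}


def split_prefix(word, lexicon):
    """Return (prefix, rest) if word is a known prefix plus a lexicon word."""
    # Longer prefixes are tried first; the lexicon check prevents
    # "reĝeto" from being misread as re + "ĝeto".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in lexicon:
            return prefix, rest
    return None


def split_suffix(word, lexicon):
    """Return (base_word, suffix) if word is root + known suffix + ending."""
    if not word.endswith(("o", "a", "i")):
        return None
    body, ending = word[:-1], word[-1]
    for suffix in SUFFIXES:
        base = body[:-len(suffix)] + ending
        if body.endswith(suffix) and base in lexicon:
            return base, suffix
    return None


def alternatives(word):
    """Return related forms to look up when the word itself is missing."""
    return [word[:-len(old)] + new
            for old, new in FALLBACKS if word.endswith(old)]
```

With this toy lexicon, `split_prefix("rediri", LEXICON)` gives `("re", "diri")`, `split_suffix("reĝeto", LEXICON)` gives `("reĝo", "et")`, and `alternatives("beleca")` gives `["bela"]`. A real implementation would need the full dictionary and a decision on how the gloss of each affix is combined with the translation of the base word.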