From: Jacob N. <jac...@gm...> - 2009-12-27 22:45:17
2009/12/27 Jimmy O'Regan <jo...@gm...>

> 2009/12/27 Jacob Nordfalk <jac...@gm...>:
> >>
> >> 2009/12/22 Jimmy O'Regan <jo...@gm...>:
> >>
> >> > It seems to do a fair job of getting the decompounding right, but, at
> >> > a glance, the translation results seem - at best - to be 50-50.
> >
> > The main problem here is that the bidix is not mature in the eo->en
> > direction, giving not-so-good translations in general. For example,
> > flanko should be translated as 'side', not 'aspect'. If you take that
> > into account I think it's generally a big improvement.
> >
>
> No, Jacob, I don't, and only a mix of confirmation bias and sheer
> optimism would make you think so.
>
> Let's take the 8s;

I was looking at the top 10, and weighting by frequency (world war /
mondmilito is used 150 times in the corpus, so I would say it's much more
important than words used 8 times), but OK.

> I'll put my guess of the meaning of the Esperanto,
> you rate it.
>
> 8 Song contest.
> ^Kantokonkurso/Kanto<n><sg><nom>+konkurso<n><sg><nom>$
>
> I figure this is good, but only because I can guess 'song contest'
> from the Esperanto.

Yes, it's perfect (as far as my English reaches :-)

> 8 knowledge crown.
> ^konverto/kono<n><sg><nom>+verto<n><sg><nom>$
>
> Convert?

Yes. Bad analysis. This is the only one of the 8s I would say is really bad.

> 8 iron course.
> ^fervojlinio/fero<n><sg><nom>+vojlinio<n><sg><nom>/fervojo<n><sg><nom>+linio<n><sg><nom>$
>
> Railway line?

The first analysis (fero<n><sg><nom>+vojlinio<n><sg><nom>) was chosen by the
tagger. Had it chosen the last one, the result would have been 'Railway line'.
In general I have the impression that the earliest longest match on the first
word should be preferred, but I have to discuss this with Hector, la lingvisto.

> 8 field city.
> ^kampurbo/kampo<n><sg><nom>+urbo<n><sg><nom>$
>
> Camp site?

http://eo.wikipedia.org/wiki/Kampurbo
(http://en.wikipedia.org/wiki/Market_town).
I would say it's acceptable, that is, understandable in context.

> 8 Euro sight.
> ^Eŭrovido/Eŭro<n><sg><nom>+vido<n><sg><nom>$
>
> No idea.

*Eurovision* http://eo.wikipedia.org/wiki/Eŭrovido
I would say it's acceptable, that is, understandable in context.

> 8 crossing war.
> ^krucmilito/kruco<n><sg><nom>+milito<n><sg><nom>$
>
> Crusader?

http://eo.wikipedia.org/wiki/Krucmilito
kruco should be 'cross' (here's an example of the bidix not being mature in
the eo->en direction), thus 'cross war' for 'crusade'.
I would say it's acceptable, that is, understandable in context.

> 8 bronze era.
> ^bronzepoko/bronzo<n><sg><nom>+epoko<n><sg><nom>$
>
> Bronze age.

Yes. I would say it's acceptable, that is, understandable in context.

All in all, both as a help for post-editing and for people who don't
understand Esperanto, the translations above are a help in 6 out of 7 cases.

Had 'Convert' existed in the English .dix files I used as a source a year ago,
when I last mass-added http://traduku.net data, then konverto would also have
been known (and thus not analyzed as a compound). Now only 'konverti' (the
verb) is known.

In Esperanto it might be feasible to have some kind of word POS change
fallback (here: use the verb root 'konverti' when the noun 'konverto' is not
known) before doing a decomposition fallback. Other languages which, like
Esperanto, have a very clear POS marker on the word (-o, -i) might benefit
too.

Well, I hope I don't make the linguists laugh too much at my ideas.

Jacob :-)

--
Jacob Nordfalk
एस्पेरान्तो के हो? http://www.esperanto.org.np/.
Memoraĵoj de KEF -. http://kef.saluton.dk/memorajoj/
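
P.S. To make the two fallback ideas in this mail concrete, here is a minimal
sketch in Python. Everything in it is an assumption made for illustration: the
toy lexicon, the function names (pos_fallback, prefer_longest_first, analyse)
and the hand-written candidate decompositions are hypothetical and are not
Apertium's actual code. It tries a POS-ending swap (konverto -> konverti)
before falling back to decomposition, and among decompositions it prefers the
one whose first element is longest.

```python
# Toy lexicon (assumption): a real system would query the monolingual dictionary.
KNOWN_LEMMAS = {"konverti", "kono", "verto", "fero", "vojlinio", "fervojo", "linio"}

# Esperanto POS endings: -o noun, -i verb (infinitive), -a adjective, -e adverb.
POS_ENDINGS = ("o", "i", "a", "e")


def pos_fallback(word, lexicon):
    """Return a known lemma with the same root but another POS ending, else None."""
    if len(word) < 2 or word[-1] not in POS_ENDINGS:
        return None
    root = word[:-1]
    for ending in POS_ENDINGS:
        candidate = root + ending
        if candidate != word and candidate in lexicon:
            return candidate
    return None


def prefer_longest_first(analyses):
    """Pick the decomposition whose first element is longest (ties: earliest)."""
    return max(analyses, key=lambda parts: len(parts[0]))


def analyse(word, lexicon, compound_analyses):
    """Order of attempts: known word, POS fallback, then decomposition."""
    if word in lexicon:
        return ("known", [word])
    alternative = pos_fallback(word, lexicon)
    if alternative is not None:
        return ("pos-fallback", [alternative])
    if compound_analyses:
        return ("compound", prefer_longest_first(compound_analyses))
    return ("unknown", [word])


# konverto is unknown as a noun, but the verb konverti is in the lexicon.
print(analyse("konverto", KNOWN_LEMMAS, [["kono", "verto"]]))
# fervojlinio has two candidate decompositions; fervojo+linio has the longer first part.
print(analyse("fervojlinio", KNOWN_LEMMAS,
              [["fero", "vojlinio"], ["fervojo", "linio"]]))
```

With the toy data, konverto is resolved through the verb konverti instead of
being split into kono+verto, and fervojlinio is split as fervojo+linio. Whether
"longest first element" is the right preference in general is exactly what I
still need to check with the linguists.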