#1 derive hun-eng and eng-hun (?) properly

Status: open
Owner: nobody
Priority: 5
Updated: 2011-01-10
Created: 2006-10-29
Private: No

Visi Gabor <dzsabor@mir.hu> submitted bug #394865
on Debian BTS:

Using the eng-hun dictionary, definitions that contain
phrases (I mean short, multiword definitions) do not
contain spaces between words. E.g.:

a cake of soap -> egydarabszappan should be "egy darab
szappan"
addict -> rabjavminek should be "rabja vminek"

I'm using Debian/Sid, dict/dictd 1.10.2-3,
dict-freedict-eng-hun 1.3-3.
I dictunzipped /usr/share/dictd/freedict-eng-hun.dict.dz
and it contains the wrong definitions.

Best regards
Kęstutis Biliūnas

Discussion

  • Daniel Darabos

    Daniel Darabos - 2007-12-06

    It is the same for the hun-eng dictionary.

    It has "egy darab szappan" -> "acakeofsoap".

    This makes the two largest dictionaries of the project practically useless. Since they contain huge amounts of multi-word expressions, I would guess that 80-90% of their content is erroneous in this way. Also it is impossible to automatically filter out these errors (well, maybe matching the hun-eng and eng-hun dictionaries against each other could work).

    The originals (I guess) for these very large and high quality dictionaries were available at http://dict.sztaki.hu/ until a few years ago, when it was bought up by a large company. Since then I cannot find a download link for the dictionaries. I don't know how to get them or even how they were licensed at the time they were available.
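
A minimal sketch of the cross-matching idea mentioned above, assuming the headwords in each dictionary still contain their spaces and only the definitions were squashed (which is what the two examples in this ticket suggest). The function names and the in-memory dict layout are hypothetical; the real data would first have to be read from the .dict or TEI files. A definition is restored only when exactly one multiword headword of the opposite dictionary collapses to it, so entries without such a counterpart stay broken.

```python
def build_unsquash_map(headwords):
    """Map each space-stripped multiword headword to its spaced variants."""
    mapping = {}
    for headword in headwords:
        if " " in headword:
            mapping.setdefault(headword.replace(" ", ""), set()).add(headword)
    return mapping


def repair_definitions(entries, other_headwords):
    """Restore spaces in definitions by matching them against the
    headwords of the opposite-direction dictionary.

    Returns (repaired, ambiguous): the repaired mapping and the list of
    headwords whose definition matched more than one spaced candidate.
    """
    mapping = build_unsquash_map(other_headwords)
    repaired, ambiguous = {}, []
    for headword, definition in entries.items():
        candidates = mapping.get(definition, set())
        if len(candidates) == 1:
            repaired[headword] = next(iter(candidates))
        else:
            # No match or several matches: keep the definition unchanged.
            repaired[headword] = definition
            if candidates:
                ambiguous.append(headword)
    return repaired, ambiguous


# Sample data taken from the examples quoted in this ticket.
eng_hun = {"a cake of soap": "egydarabszappan", "addict": "rabjavminek"}
hun_eng = {"egy darab szappan": "acakeofsoap"}

fixed_eng_hun, _ = repair_definitions(eng_hun, hun_eng.keys())
fixed_hun_eng, _ = repair_definitions(hun_eng, eng_hun.keys())
print(fixed_eng_hun)
print(fixed_hun_eng)
```

With this sample, "addict" keeps its squashed definition because no matching Hungarian headword is present, which illustrates why cross-matching can only be a partial repair and regenerating from the source remains preferable.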

     
  • Daniel Darabos

    Daniel Darabos - 2007-12-06

    The issue also afflicts the Turkish-English dictionary. I am worried that maybe there is a bug in one of the tools used for dictionary creation.

     
  • Piotr Banski

    Piotr Banski - 2011-01-08
    • priority: 5 --> 7
     
  • Piotr Banski

    Piotr Banski - 2011-01-08

    The dictionaries are still around and as far as hun-eng is concerned, it's only a matter of better reprocessing. Increasing the priority -- hun<->eng are the only two dictionaries left over as TEI P4.

     
  • Piotr Banski

    Piotr Banski - 2011-01-09

    (Moving from Bugs, summary changed)

    The task is as follows, if I understand correctly: look at the hun-eng directory in our SVN and fix the Perl script that derives both hun-eng and eng-hun from the .zip file present there, so that it produces the proper TEI P4 output (easier than P5, which I can then create from the P4).
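
A rough sketch of what the derivation step described above has to guarantee, written in Python rather than Perl purely for illustration. Neither the Perl script nor the layout of the .zip contents is shown in this ticket, so the input format here (one "Hungarian<TAB>English" pair per line) is hypothetical, as are all function names. The entry markup follows the form/orth and trans/tr layout that I believe TEI P4 dictionaries use; treat it as an assumption to be checked against the existing P4 files. The one point that matters is that runs of whitespace inside a phrase are collapsed to a single space, never deleted.

```python
from xml.sax.saxutils import escape


def parse_line(line):
    """Split a hypothetical 'hungarian<TAB>english' line into two phrases."""
    hun, sep, eng = line.rstrip("\n").partition("\t")
    if not sep:
        raise ValueError("expected a tab-separated pair: %r" % line)
    # Collapse whitespace runs to one space; joining with "" here would
    # reproduce the missing-space bug reported in this ticket.
    return " ".join(hun.split()), " ".join(eng.split())


def tei_entry(headword, translation):
    """Build one entry element (assumed TEI P4 layout) as a string."""
    return (
        "<entry><form><orth>{}</orth></form>"
        "<trans><tr>{}</tr></trans></entry>"
    ).format(escape(headword), escape(translation))


def derive_both(lines):
    """Produce the hun-eng and eng-hun entry lists from the same source."""
    hun_eng, eng_hun = [], []
    for line in lines:
        hun, eng = parse_line(line)
        hun_eng.append(tei_entry(hun, eng))
        eng_hun.append(tei_entry(eng, hun))
    return hun_eng, eng_hun


hun_eng_entries, eng_hun_entries = derive_both(
    ["egy  darab szappan\ta cake of soap\n"]
)
print(hun_eng_entries[0])
print(eng_hun_entries[0])
```

A check along these lines (every multiword source phrase must still contain its spaces in the output) could also serve as a regression test for the fixed script.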

     
  • Piotr Banski

    Piotr Banski - 2011-01-09
    • labels: 719897 -->
    • priority: 7 --> 5
    • assigned_to: micha137 --> nobody
    • summary: missing spaces in definitions (eng-hun) --> derive hun-eng and eng-hun (?) properly
     
  • Piotr Banski

    Piotr Banski - 2011-01-10
    • labels: --> conversion to TEI
     
