#8 update_alphabet() chokes on Latin-1 input

Status: closed-rejected
Owner: nobody
Labels: None
Priority: 5
Updated: 2004-09-18
Created: 2004-09-03
Creator: Anonymous
Private: No

Also called by fmt_newheadword(), update_alphabet()
has no apparent purpose but triggers an assertion if it
receives non-UTF-8 8-bit data. dictfmt continues to
function if update_alphabet() is removed entirely.
Does this function do anything useful?

There is a related restriction in write_hw_to_index():

    if (tolower_alnumspace (word, new_word,
                            allchars_mode, utf8_mode)){
        fprintf (stderr, "'%s' is not a UTF-8 string", word);

However, as update_alphabet can only process UTF-8
text, this condition will never be true.

--Leah <qleah@earthlink.net>

Discussion

    Aleksey Cheusov - 2004-09-18
    • status: open --> closed-rejected
     
    Aleksey Cheusov - 2004-09-18

    Logged In: YES
    user_id=587312

    To build 8-bit or UTF-8 dictionaries that are compatible
    with the dictd server,
    you must specify the --locale option.
    For the Latin-1 charset you can, for example, run dictfmt like this:

    dictfmt --locale de_DE.ISO-8859-1

    assuming that the locale de_DE.ISO-8859-1 is installed on
    your system.

    The update_alphabet function, in turn, is needed to build
    the 00-database-alphabet headword, whose definition contains a
    list of the characters present in the real headwords. This
    information is necessary for UTF-8 dictionaries to work
    correctly with the LEV search strategy.
    It also speeds up the LEV strategy for UTF-8 and ASCII
    dictionaries.
